[PDF] Step-wise target controllability identifies dysregulated pathways of macrophage networks in multiple sclerosis

Abstract

Identifying the nodes that have the potential to influence the state of a network is a relevant question for many complex systems. In many applications it is often essential to test the ability of an individual node to control a specific target subset of the network. In biological networks, this might provide precious information on how single genes regulate the expression of specific groups of molecules in the cell. Taking into account these constraints, we propose an optimized heuristic based on the Kalman rank condition to quantify the centrality of a node as the number of target nodes it can control. By introducing a hierarchy among the nodes in the target set, and performing a step-wise research, we ensure for sparse and directed networks the identification of a controllable driver-target configuration in a significantly reduced space and time complexity. We show how the method works for simple network configurations, then we use it to characterize the inflammatory pathways in molecular gene networks associated with macrophage dysfunction in patients with multiple sclerosis. Results indicate that the targeted secreted molecules can in general be controlled by a large number of driver nodes (51%) involved in different cell functions, i.e. sensing, signaling and transcription. However, during the inflammatory response only a moderate fraction of all the possible driver-target pairs are significantly coactivated, as measured by gene expression data obtained from human blood samples. Notably, they differ between multiple sclerosis patients and healthy controls, and we find that this is related to the presence of dysregulated genes along the controllable walks. Our method, that we name step-wise target controllability, represents a practical solution to identify controllable driver-target configurations in directed complex networks and test their relevance from a functional perspective.

Full PDF

SStep-wise target controllability identifiesdysregulations of macrophage networks in multiplesclerosis

Giulia Bassignana , Jennifer Fransson , Vincent Henry , Olivier Colliot , ViolettaZujovic , Fabrizio De Vico Fallani Sorbonne Universites, UPMC Univ Paris 06, Inserm U-1127, CNRS UMR-7225,Institut du Cerveau et de la Moelle Epini`ere, Hopital Piti´e-Salpˆetri`ere, Paris, France Inria Paris, Aramis Project Team, Paris, France* corresponding author: [email protected]

Abstract

Identifying the nodes able to drive the state of a network is crucial to understand, and eventuallycontrol, biological systems. Despite recent advances, such identification remains difficult becauseof the huge number of equivalent controllable configurations, even in relatively simple networks.Based on the evidence that in many applications it is essential to test the ability of individualnodes to control a specific target subset, we develop a fast and principled method to identifycontrollable driver-target configurations in sparse and directed networks. We demonstrate ourapproach on simulated networks and experimental gene networks to to characterize macrophagedysregulation in human subjects with multiple sclerosis.

December 8, 2020 1/27 a r X i v : . [ q - b i o . M N ] D ec Introduction

For many biological systems, it is crucial to identify the units, such as genes or neurons,with the potential to influence the rest of the network, as this identification can enabledescribing, understanding, and eventually controlling the function of the system [1, 2].Topological descriptors based on network science can indeed be used to quantify suchinfluence in terms of node centrality, such as degree, betweenness, or closeness [3].However, these descriptors only capture the structural properties of the network andneglect their effect on the dynamics, thus limiting our understanding on the actualinfluencing power.Control network theory, linking network structure to dynamics through linear ornonlinear models, has been shown to be a more principled approach for identifying drivernodes in an interconnected system [4, 5]. While theoretically these approaches can givea minimum set of driver nodes sufficient to steer the system into desired states, theirexhaustive identification might be difficult in practice as there exists in general a verylarge number of equivalent controllable walks, even in relatively simple networks [6, 7].In the case of criteria based on the manipulation of controllability matrices [8, 9], thepresence of many walks can for example induce numerical errors due to the differentorders of magnitude in the matrix elements.An alternative solution has recently been proposed to circumvent this limitation,based on the possibility to map the controllability problem onto the maximum cardinalitymatching over the associated graph [10–12]. As a result, it is possible to identify a setof driver nodes - at least for directed networks - with linear, and not exponential, timecomplexity [13]. While this approach elegantly solves numerical issues, it can neverthelessnot tell which configuration, among all the possible ones, is the most relevant. In general,there is a factorial number of equivalent configurations (with the same number of inputs)and enumerating all possible matchings [14] rapidly becomes unfeasible, even for simplegraphs such as trees [6, 7], bipartite graphs [15], or random graphs [16]. Thus, theresearch of alternative strategies to characterize the candidate driver nodes is crucial forthe concrete application of network controllability tools.One possibility would be to reduce the original problem into smaller sub-problemsunder the assumption of specific constraints compatible with the underlying scientificquestion. On the one hand, for many biological, technological and social systems itis desirable to only control a subset of target nodes (or a subsystem) that is essentialfor the system’s mission pertaining to a selected task or function. In this directionsome approaches have been recently proposed, based on filtering of the controllabilitymatrix [17] or adaptation of graph matching [18–21]. However, they do not solvethe problem of multiple driver set configurations. On the other hand, technical andexperimental constraints often limit the possibility to stimulate many driver nodes inparallel, for example in gene expression modulation [22] or brain stimulation [23]. Inthese cases, approaches that focus on the ability of single driver node to control the entirenetwork, such as control centrality [24] or single-node controllability [25], do circumventthe multiplicity issue, but can still suffer from numerical errors and approximate results.To overcome this impasse, we propose an integrated method that combines theadvantages of the previous approaches and quantifies the capacity of a single driver nodeto control a predefined target set. Based on the Kalman controllability condition, ourmethod identifies the part of the target set that can be controlled by a candidate driver.To do so, we introduce a ranking among the target nodes and we iteratively evaluate thecontrollability of the system by adding one target node at a time in a descending order.This eventually finds a univocal controllable configuration corresponding to the highestranking. In the following, we first illustrate how our method, named step-wise targetcontrollability , works for simple network structures and we discuss the potential benefitsfor directed and sparse networks as compared to alternative approaches. Then, we use itDecember 8, 2020 2/27o study molecular networks of macrophage pro-inflammatory activation, derived fromontology-based reconstructions, and identify the driver-target pathway alterations usinggene expression data from blood samples of patients affected by multiple sclerosis (MS)and a matched group of healthy controls (HC).

Step-wise target controllability identifies a controllable subset oftargets

Let G be a directed graph (or network) of N nodes (or vertices) and L links (or edges), and T an arbitrary subset of S < N nodes in the network. The aim is to measure the abilityof each node to drive the state of the target set T from a dynamical system perspective [5].In the case of linear time-invariant dynamics, the number of controllable target nodescan be obtained computing the rank of the target controllability matrix [18, 26, 27]: Q T = (cid:2) CB CAB CA B · · · CA N − B (cid:3) (1)where A is the adjacency matrix of the network, B is a vector identifying the drivernode and C is a matrix selecting the rows of A corresponding to the target, or output,nodes ( Material and methods ).A walk in the network consists of an alternating sequence of vertices and edges.The ( i, j ) entry of Q T indicates how many walks of length j − i [28]. Trivially, all the nodes not traversed by these walks do not contributeto the walk lengths and they can be neglected for the purpose of control. By removingthe irrelevant nodes from the network, A becomes smaller and this results in a targetcontrollability matrix with less columns. Put differently, we avoid the computation ofmatrix exponentials corresponding to non-existing driver-target walks ( Material andmethods ). In practice, this can be of great advantage for reducing the occurrence ofround-off errors during the matrix rank calculations. For example, this is the case forsparse and directed networks, where fewer nodes are reachable as compared to denseand undirected networks.This can be easily appreciated in the following example. Let us consider a directedfull binary tree with h = 6 levels, with the root node as the candidate driver. Withoutloss of generality, we randomly position a target in each level and we rank them accordingto their height in the tree. Then, we introduce a simple cycle among the first threenodes of the tree ( Fig. S1 ). By construction, this configuration is controllable and theentire target set can be fully driven by the driver. However, when considering the entirenetwork the returned rank is deficient. Instead, by removing the part of the networkthat is irrelevant for the control, the rank is full and we retrieve the entire controllableconfiguration, even in the case of larger networks, i.e. up to h = 10 levels.The rank of Q T gives the number of target nodes τ ≤ S controllable by the driver,but there might be in general many possible equivalent configurations. To overcome thisissue, we propose a step-wise procedure that tests the controllability on subproblems ofincreasing size. First, we introduce a hierarchy among the target nodes and relabel themaccording to their importance in a descending order, i.e. t (cid:31) t (cid:31) ... (cid:31) t S . Then, wecreate an empty auxiliary set T (cid:48) and we sequentially include the target nodes accordingto their ordering. At each step, if the rank of Q T (cid:48) is full, the new target node is retained,otherwise it is removed from T (cid:48) . When all the target nodes have been visited, thealgorithm returns the set of controllable targets with highest ranking ( Fig. 1 , Materialand methods ).December 8, 2020 3/27ur method, named step-wise target controllability , not only returns for each candidatedriver the number of controllable target nodes τ corresponding to the configuration withhighest ranking, but also the set T (cid:48) of controllable targets.To explore the limits of our method in terms of computational complexity, weperformed a simulation analysis on synthetic random networks, which vary in numberof nodes, connection density and target size ( Fig. S2 ). Results show that in generalour method is able to retrieve a larger number of controllable targets as compared toa standard approach that computes the rank of the full controllability matrix. Morespecifically, when the target set contains 5% of the network nodes, results are quitestable across different connection densities. For larger target-set sizes our method worksbetter when the connection density is relatively low (0 . . N >

180 and density > . Driver genes are homogeneously distributed in the macrophagenetwork

To test our method in a biological context, we construct a network representing theinteractions between molecules involved in macrophages response to pro-inflammatorystimuli (

Fig. 2 ), with the connections between genes inferred from a previously estab-lished network based on literature [29, 30]. This network is of interest in MS due to thechronic inflammation characteristic of the disease, and the generally destructive effectsof pro-inflammatory macrophages in MS [31]. Hence, dysregulation of macrophages maylead to aggravated inflammation and disease.In order to facilitate biological interpretation of the network, we divide the nodesaccording to molecular function: sensing , signaling , transcription factors or secreted molecules ( Tab. S1 ). We choose the 13 secreted molecules as target nodes becausethey represent the end-products of macrophage pro-inflammatory activation and enablepropagation of inflammation to other cells, thus exacerbating chronic inflammation.To establish a hierarchy among the targets, we use macrophage RNA expression datafrom a group of MS patients and healthy controls. The macrophages were tested withand without activating stimuli to mimic the pro-inflammatory response. We measurethe gene activation as the ratio of the expression between the “pro-inflammatory” and“alert” condition. We then consider the fold change ∆ between the gene activation of MSpatients and HC subjects (

Material and methods ). Genes with larger ∆ values areranked first (

Fig. S3 ).Results show that 51% of the tested network nodes can control at least one target(i.e. τ >

0) and that those drivers tend to be homogeneously distributed across classes(

Tab. S2 ). This indicates a high redundancy in the way the target set can be controlled.Notably, target centrality values are weakly correlated (Spearman rho 0 .

18, p < . k , as defined in Material and methods ,indicating that the most connected genes (e.g. RELA, NFKB1) are not necessarily theones that can most efficiently steer the state of the target set (

Fig. 3a ).Almost all of the driver nodes identified by our method can control the target genesCCL5, CXCL10, CXCL11, IFNA1 and IFNB1, which code for inflammatory chemokinesand cytokines. This implies a high level of co-regulation among these molecules, withmany different actors exerting control over this regulation. Interestingly, the driverswith the highest target control centrality values (SOCS1 and SOCS3, τ = 10) belongto feedback systems that control pro- and anti-inflammatory signal transduction byregulating the signaling process triggered in response to IFN γ [32]. In addition, alldrivers with τ (cid:62) Fig. 3a ). This cluster includes the receptors of IFN γ and theDecember 8, 2020 4/27ignaling molecules responsible for their intracellular effects. This result matches thewell-described effects of IFN γ on chemokine production [33, 34] and overall macrophageactivation [35]. Robustness of driver nodes to random attacks

To assess the stability of our findings to possible errors in the network construction, weperformed a robustness analysis simulating different types of alterations to its nodesand links (

Material and methods ). Results show that removing nodes with higherdegree k leads to a greater reduction of control centrality in the drivers compared tothe removal of low-degree nodes or random removal of nodes ( Fig. 3b ). For example,by attacking 10% of the nodes we lose 5% of the drivers in the latter cases, while welose 20% of the drivers when removing the most connected ones (

Fig. 3b ). This resultconfirms the crucial role of hubs in biological networks in terms of resilience to randomattacks [36] and controllability [37].When perturbing links, the worst condition is given by their random removal. Byattacking 10% of the links around 5% of the drivers are lost. This is intuitively due tothe interruption of driver-target walks and to consequent impossibility to control a nodethat cannot be reached. While randomly rewiring the links has an intermediate impact,adding new links has no effect on the target control centrality of the drivers (

Fig. 3c) .This is of great advantage as it shows that our results will not change if new connectionsare established or provided by the literature.

Gene dysregulation and altered driver-target coactivation in mul-tiple sclerosis

Using step-wise target controllability, we detect potential directed interactions in themacrophage activation network, but we cannot quantify how changes in the driver’s stateaffect those in the targets. To measure driver-target functional interactions, we computethe Spearman correlation between the gene activation of controllable driver-target pairs,for the HC and MS groups (

Material and methods, Tab. S3 ). We call coactivated the genes exhibiting a significant correlation (p < . Fig. 4a, Tab. S3 ). For both HC and MS groups these interactions tend to primarilyinvolve signaling functions

Fig. 4b . However, the number of driver-target coactivationsis lower in the pathological condition (

M S = 19 versus HC = 36). More importantly,they differ from those observed in the HC group ( Fig. 4b) . This is particularly evidentfor target IFNA1, which only exhibits coactivations with signaling and transcriptiondrivers in the MS group (

Fig. 4c ).Because the macrophage network edges are fixed and reconstructed from knownprotein-protein interactions, differences in coactivation can be essentially attributedto altered regulation of transcription. Hence, our hypothesis is that the observedfunctional reorganization can be explained by the dysregulation of specific genes alongthe controllable walks from the drivers to targets. To test this prediction, we examineall the pairs of genes whose coactivation appears or disappears in the MS group (

Fig.4a ). We found that 47 /

51 of these differentially coactivated pairs present at leastone dysregulated gene (i.e., fold change | ∆ | above the 75-percentile, Material andmethods ) on the walk from the driver to the target (

Fig. 5, Tab. S4 ).We find in total 14 dysregulated nodes on any of these walks. The genes that mostfrequently appear are NFKB1, IFNA1 and IFNB1 (36 /

51 walks). They are present onall walks that end with targets CCL5, CXCL10, CXCL11, IFNA1 or IFNB1, i.e., the5 targets that could be controlled by most drivers. This points to their dysregulationbeing a potent disruptor of the normal network functioning. The co-occurrence ofDecember 8, 2020 5/27hese three dysregulated genes can be explained by a feedback loop in which NFKB1activates IFNB1, and IFNA1 and IFBN1 both activate STAT2, which through severalintermediates can influence all three genes (

Fig. S4 ). Indeed, this stems from the factthat all these nodes belong to the main connected component of the network, i.e. asubnetwork in which every node is reachable from any other node.Taken together, these results indicate that the aberrant reorganization of functionalinteractions in the MS group is associated with the presence of dysregulated genes alongthe controllable walks of the macrophage network.

Switch of SOCS-gene coactive drivers reflects dysregulated inflammatoryresponse

Because drivers are crucial for steering the target network’s state, we focus on thesubnetwork specifically involving the dysregulated drivers (IRF8, NFKB1, SOCS1,SOCS3, TLR7) and the walks towards the respective controllable targets

Fig. 6 . Bylooking at how driver and target nodes are differently coactivated in healthy controlsand MS patients, we obtain a much clearer description of the gene dysregulation effects.First, many of the previous results can be now appreciated in finer detail, such as i) thereduction in number of coactivated driver-target pairs in MS, ii) the large number oftargets that can be controlled by SOCS1 and SOCS3, and iii) the potential of NFKB1,IFNA1 and IFNB1 to affect the driver-target functional interactions.Second, we report an interesting mechanism involving the drivers with the highest τ centrality values, i.e. SOCS1 and SOCS3. In the HC group, SOCS1 is coactivatedwith the targets while SOCS3 does not exhibit any significant correlation. In the MSgroup, we observe the opposite, i.e. SOCS1 is silent while SOCS3 becomes coactive.Because both driver genes are dysregulated, the observed “switch” mechanism couldbe therefore associated with the altered pro-inflammatory response of the MS group.Indeed, these two molecules are known to be strong modulators of macrophage response:SOCS1 inhibits the signaling of pro-inflammatory genes while SOCS3 is known to be animportant actor in inflammatory response, with the ratio of the two proteins determiningthe actual effect [38]. Identification of controllable configurations in complex networks

Network controllability refers to the ability to drive an interconnected dynamical systemfrom any initial state to any desired final state in finite time, through a suitable selectionof inputs [4, 5]. In recent years, an increasing number of research groups from differentdisciplines have focused their efforts on identifying the minimum set of driver nodes orquantifying the capacity of single nodes to control the entire network, as well as partsof it [39–49]. Despite being theoretically attractive, network controllability still suffersfrom computational issues that limit its impact in concrete applications. This is mainlydue to the presence of multiple equivalent controllable walks in a network that makethe associated controllability problem ill-posed and/or the resulting solution space verybig [4, 5].To reduce such complexity, we propose a method based on control centrality, which waspreviously designed to quantify the ability of one node to control directed networks [24].First, we define the target control centrality to measure the controllability of a specificpart of the network, i.e., a predefined target set. Because edges are directed, this has theadvantage to ignore the part of the network that is not traversed by the walks connectingthe driver to the target set. Second, we introduce an ordering among the target nodesDecember 8, 2020 6/27nd perform a step-wise controllability test with increasing size. Because of the ranking,only one controllable configuration will be identified, i.e. the one with the highest ranking(

Fig. 1 ). To test the controllability of the driver-targets configuration at each step weadopted the Kalman criterion [4, 8]. However, the entire iterative framework is quiteflexible and other methods, such as Gramian condition [50], Popov-Belevich-Hautuscriterion [21], or feedback vertex set [51], could be used as alternative controllabilitycriteria.The main advantage in terms of time of our method consists in the fact that weidentify a controllable driver-targets configuration by performing j tests when j < S outof S target nodes are controllable, as opposed to a brute-force approach that requires totest all the possible configurations with j targets and that, in the worst-case scenario,leads to (cid:0) jS (cid:1) tests.In addition, because of the step-wise procedure, the method does not need to computethe rank of the full controllability matrix and this allows to minimize possible round-offerrors at least for relatively small networks. Specifically, our results show that thecomputations are reliable when N <

180 and the connection density is < . Fig. S2 ).While these numbers are in general relatively low and impede to scale up to very largenetworks, they still represent ranges that are compatible with typical brain networksobtained from neuroimaging data ( [52–54]).

Control pathways in macrophage molecular networks

The study of the molecular interactions is crucial to the understanding of the basicfunctions of the cell such as proliferation or apoptosis [55,56]. Determining the connectionmechanisms that rule a specific biological function can significantly impact our dailylife by providing new therapeutics to counteract diseases [57–59]. Studying molecularnetworks is however difficult, because in general we do not know the true functionalinteractions of a cell and indirect techniques such as gene co-expression are typicallyemployed to infer such connections [60]. Based on correlation analysis, these methodscannot inform on the causal nature of the interactions. More importantly, the reliabilityof the estimated network critically depends on the number of interactions to number ofdata samples ratio, which is in practice very low [60].To overcome this limitation, we reconstruct the directed gene interactions associatedwith the inflammatory state of the human macrophages by adopting a novel ontology-based approach that integrates the available information from multiple datasets andresults in the literature [61]. Previous studies show that the number of driver nodes inbiological networks is rather high due to their sparse and heterogeneous nature [12, 62].Consistently, we find that a large percentage of genes (51%) can control at least onesecreted molecule in the target set. Our results also confirm that, despite being crucialfor global communication, hubs (e.g. RELA) are not always the most important from anetwork control perspective (

Fig. 3 a). This stems from the theoretical impossibilityto diversify the input signals to all the connected neighbors [12]. The found drivergenes are heterogeneously distributed across the tested gene classes. However, ourmethod highlights SOCS1 and SOCS3 as the drivers with the highest target controlcentrality values, with other IFN γ -response-related genes showing similar values. Thisis in line with the known effects of SOCS-genes and IFN γ on molecules secreted bypro-inflammatory macrophages [32–35], supporting the ability of this method to identifybiologically relevant drivers.Overall, these results uncover the existence of potential causal influences from candi-date driver genes to the secreted molecules in the human macrophage activation network.Because the identified driver nodes are robust to network alterations, notably whenadding new links ( Fig. 3 cblue diamonds), the obtained results are expected to besufficiently resilient to the integration of new gene-gene interactions. Notably, our resultsDecember 8, 2020 7/27re also relatively robust to the deletion of links, a situation which is often associatedwith filtering or thresholding out the weakest or less relevant connections in biologicalnetworks [63] (

Fig. 3c light blue triangles).From a different angle, our approach can be seen as a principled manner to focus onspecific nodes (the drivers) or node pairs (driver-target) in complex networks. This mighthave important consequences when studying genome-wide databases where the highnumber of elements can make prohibitive the assessment of significant gene expressionsand/or co-expressions [64, 65].

Dysregulated genes and aberrant interactions in multiple sclero-sis

Multiple sclerosis is an immune-mediated disease in which the immune system erroneouslyattacks myelin in the central nervous system. There are many neurological symptoms,including motor and cognitive deficits, that can vary in type and severity dependingon the attacked central nervous system regions [66]. The role of macrophages inMS is crucial because of their ability to obtain a pro-inflammatory activation state,including the release of pro-inflammatory cytokines and leading to central nervous systemtissue damage [67]. Hence, dysregulation of macrophages may lead to autoimmunityand persistent inflammatory diseases [68]. While the etiology of MS is still not well-understood there is a large consensus on its genetic basis and on the importance ofunveiling the underlying network mechanisms [69].In this study, we combined network controllability tools and gene expression data todetect the genes responsible for altering the macrophages action in multiple sclerosis.Differently from standard approaches, where the attention is focused on the identificationof the driver nodes in a network, we here propose an alternative way of exploiting networkcontrollability. We first show that the macrophage inflammatory state in the MS groupwas characterized by a drastic alteration of the coactivations in the driver and targetgenes (

Fig. 4 ). Such absence of coordination was in general associated with the presenceof dysregulated genes along the walks from the driver to the target node. Notably, thepathological dysregulation of NFKB1, IFNB1 and IFNA1, which belong to the samefeedback cycle (

Fig. S4 ), critically affects several driver-target functional interactions(

Fig. 5 ).Finally, our approach allows to identify a shift mechanism for dysregulated SOCS1 andSOCS3 drivers, showing opposite coactivation patterns in MS patients compared to thehealthy controls (

Fig. 6 ). These results suggest that experimentally stimulating SOCS3- a strong inducer of pro-inflammatory response - might be more effective for movingthe state of the altered secreted molecules towards physiological configurations. Takentogether, these results might have practical consequences on how to design interventionstrategies and counteract disease phenotype.

Methodological considerations

Our method uses Kalman controllability rank condition [8] to quantify the centralityof the driver nodes. This criterion assumes that the investigated system has a lineardynamics,

Eq. 2 . In our case, this means that the changes in the gene activation wouldfollow a linear trend. While this is in general not true and difficult to ascertain, it appearsthat results from non-linear tests are often dominated by linear relationships [70, 71].Furthermore, a significant fraction of the data analysis and modeling deals exclusivelywith linear approaches as they are simpler, easy to interpret and serve as a prerequisiteof nonlinear behavior [39].Another peculiarity of our approach is the assumption of time-invariant interactionsin the molecular gene network. On the one hand, this assumption allows to betterDecember 8, 2020 8/27xploit the well-established results and tools in network controllability [12]; on the otherhand, it might conflict with existing literature looking for biological connectivity changesbetween conditions or populations such as differential gene coexpression [72]. Here,we hypothesized that the activation state of each node (in terms of gene expression)could eventually change but not the underlying network structure. Thus, our network -obtained from detailed maps of the macrophage cells - would only act as a substrate/proxyfor functional interactions, such as correlated gene activities.Our method requires a specific ordering of the target nodes. While this can be typicallyachieved in many biological applications - by ranking nodes according to their state (e.g.,gene expression [73, 74], brain activity [75, 76]) - challenges might still remain in general.When it is not possible to impose a ranking of the nodes from external knowledge, anotherpossibility is to derive it from the network structure taking into account, for example,node centrality values. However, there might be multiple nodes with the same centralityvalue, which would impede a proper ranking. In those situations, multiple node centralitymeasures could be integrated to get a more heterogeneous distribution (e.g., degree,strength, betweenness, closeness) [3]. Another possibility is to add equally importanttargets at the same iteration step and test if they are simultaneously controllable. Whilethis procedure is suboptimal and may underestimate the number of controllable targets,it still minimizes the computational complexity related to testing multiple driver-targetcombinations.We finally notice that our method is conceived for directed networks only, wherethe dimensionality reduction has a real computational benefit. In fact, in the case ofundirected graphs, it is not possible to remove nodes on the walks from the driver to thetargets since information is bound to span the entire network. Similarly, for directedbut dense networks, the possibility to focus on specific parts of the network, and reducethe computational cost, becomes lower regardless of the topology.To conclude, it is important to mention that extensions of network controllabilitytools to time-varying frameworks do exist [77, 78]. However, in that case networks wouldbe inferred from gene coexpression and therefore affected by statistical uncertainty dueto sample sizes. Further research is needed to seek how to apply network controllabilityin presence of noisy time-varying connections.

Conclusion

In this study, We introduce a method to quantify the ability of candidate driver nodesto drive the state of a target set within a sparse and directed network. Further, weillustrate how this method works for the molecular network associated with the humanmacrophage inflammatory response. The obtained results reveal in a principled waythe genes that are significantly dysregulated in multiple sclerosis. We hope that thismethod can contribute to the identification of the key nodes in biological networks tobetter identify pharmacological targets to counteract human diseases.

Step-wise target controllability

We introduce a method to identify which target nodes in a network that can be controlledfrom a single driver node. To do so, we start by considering the canonical linear time-invariant dynamics on a directed network described by the adjacency matrix A ∈ R N × N ˙ x ( n ) = A x ( n ) + B u ( n ) , y ( n ) = C x ( n ) (2)December 8, 2020 9/27here x ( n ) ∈ R N describes the state of each node at time t , B ∈ R N specifies thedriver node, u ( n ) ∈ R N is its external input (or control) signal, y ( n ) ∈ R S is the outputvector, and C ∈ R S × N is the output matrix identifying the target nodes.Such a system is controllable if it can be guided from any initial state to any desiredfinal state in finite time, with a suitable choice of input. A necessary and sufficientcondition to assess the controllability of Eq. 2 , is that the controllability matrix QQ = (cid:2) B AB A B · · · A N − B (cid:3) (3)has full row rank, i.e. rank( Q ) = N . That is the Kalman rank condition, whichbasically verifies the existence of linearly independent rows in Q [4, 8]. If so, the drivernode can reach and control the dynamics of all the other nodes through independentwalks of length N − T of the network, specified in C andconsisting of S ≤ N nodes, then Eq. 2 can be reduced into a target controllabilitymatrix Q T = CQ ( Eq. 1 ), where C filters the rows of interest corresponding to thetargets. Now, the rank of Q T gives the number τ ≤ S of nodes in the target set thatcan be controlled by the driver.To identify a driver-target configuration, we further introduce a hierarchy among thetarget nodes, so that we can order and relabel them from the most important one to theleast, i.e. t (cid:31) t (cid:31) ... (cid:31) t S . Then we perform the following step-wise procedure for eachcandidate driver node • Step 1.

Initialization – Create a temporary empty target set T (cid:48) ← {} – Set the number of controllable targets τ ← • Step 2.

Repeat until termination criteria are met.

For j ← , ..., S do – Add the j -th target node to the target set T (cid:48) ← T (cid:48) ∪ { t j } – Build the subgraph containing the nodes on walks from the driver to thetargets in T (cid:48) – Compute the rank of the target controllability matrix Q T (cid:48) – If rank( Q T (cid:48) ) is full then τ ← τ + 1 else T (cid:48) ← T (cid:48) \ { t j } – j ← j + 1 • Step 3.

Output τ and T (cid:48) Eventually, the target control centrality τ is the number of controllable targets in T ,and the set T (cid:48) contains the τ controllable targets with highest ranking.Note that in general, the method may underestimate the actual number of controllabletarget nodes because of the occurrence of numerical errors, but not because of the step-wise procedure itself. The Matlab code associated to the step-wise target controllabilityis freely available at https://github.com/BCI-NET/Public Construction of the macrophage activation network

We reconstruct the inflammatory molecular network of the human macrophage byintegrating information from the macrophage signal transduction map [29, 30]. Thismap contains a comprehensive, validated, and annotated map of signal transductionpathways of inflammatory processes in macrophages based on the current literature. Toextract molecular interactions from this map, we used the Hermit software [79], whichDecember 8, 2020 10/27mplements automatic reasoning based on logical rules. We specifically used the rulesimplemented in the molecular network ontology to infer molecular interactions dependingon the process they belong [61, 80]. Because we are interested in the inflammationprocess, we restricted our analysis to a specific subset of 101 genes with known rolesin macrophage pro-inflammatory activation, and for which their regulation in responseto pro-inflammatory stimuli could be confirmed in our data set. These genes wereclassified according to their function in the cell: sensing , signaling , transcription and secreted ( Tab. S1 ), as described in databases such as NCBI Gene [81], UniProt [82]and GeneCards [83]. The full network was thus reduced to only include these genesand their interactions. Due to recent studies, we also opted to exclude two edges (fromSOCS3 to IFNGR1 and to IFNGR2) to represent the involved pathways [38].The resulting network contains N = 101 nodes and L = 211 unweighted directededges representing either activation or inhibition between genes. The total degree k ofeach node in the network is computed by summing the number of incoming and outgoingedges: k i = N (cid:88) j =1 A ij + N (cid:88) j =1 A ji (4)where A ij = 1 if there is an edge between the corresponding genes, and 0 otherwise. Collection of macrophage mRNA expression data

Collection of blood for the study was approved by the French Ethics committee andthe French ministry of research (DC-2012-1535 and AC-2012-1536). Written informedconsent was obtained from all study participants. All patients fulfilled diagnostic criteriafor multiple sclerosis [84], and individuals (multiple sclerosis patients and healthy donors)with any other inflammatory or neurological disorders were excluded from the study.Patients were included in the study only if they were not undergoing treatment.Blood was sampled from 8 MS patients and 8 healthy controls in acid citrate dextrosetubes. From blood samples, peripheral blood mononuclear cells were isolated using FicollPaque Plus ( ) and centrifugation (2200 rpm, 20 min). Cellswere washed in PBS and RPMI +10% FCS. Monocytes were isolated with anti-CD14microbeads ( ) and plated in 12-well plates (500000 cells / well)in RPMI +10% FCS and granulocyte-macrophage colony-stimulating factor (500 U / ml)to induce differentiation into macrophages. After 72h, media was replaced with freshmedia supplemented with granulocyte-macrophage colony-stimulating factor (500 U / ml)to maintain “alert” macrophages or IFN γ (200 U / ml) + upLPS (10 ng / ml) to induce“pro-inflammatory” activation. Cells were lysed after 24h and RNA was extracted withRNeasy Mini Kit ( ).Transcriptome sequencing cDNA libraries were prepared using a stranded mRNApolyA selection (Truseq stranded mRNA kit, ). For each sample,we performed 60 million single-end, 75 base reads on a NextSeq 500 sequencer ( ). RNA-Seq data analyses were performed by GenoSplice technology( ). Sequencing, data quality, reads repartition (e.g., for potentialribosomal contamination), and insert size estimation are performed using FastQC [85],Picard-Tools ( http://broadinstitute.github.io/picard/ ), Samtools [86] and rseqc[87]. Reads were mapped using STARv2.4.0 [88] on the hg19 Human genome assembly.Gene expression regulation study was performed [89]. Briefly, for each gene present inthe FAST DB v2018 1 annotations, reads aligning on constitutive regions (that are notprone to alternative splicing) were counted. Based on these read counts, normalizationwas performed using DESeq2 [90] in R (v.3.2.5) [91].December 8, 2020 11/27 etwork modeling and data analysis In the modeling framework described by

Eq 2 , matrix A corresponds to the molecularnetwork and represents the time-invariant component of the system. The dynamiccomponent is instead represented by the gene activation response in the healthy anddiseased condition ( Fig. 2b ), computed as the ratio in gene expression between the “pro-inflammatory” and “alert” condition. Specifically, x ( n ) represents the gene activation. B is a vector identifying the candidate driver. The control signal u ( n ) is out of the scopeof this work. The output vector y ( n ) and the output matrix C identify the target nodes.We select the genes belonging to the secreted molecules class ( Tab. S1 ) as ourtarget set T . All the nodes in the other classes are then tested separately as potentialdriver nodes by computing their target control centrality τ . To enhance numericalprecision, the logarithmic transformation log ( q + 1) is applied to the elements of thetarget controllability matrix Q T ( Eq. 1 ).The hierarchy among the target nodes is established by computing the fold change ∆between the corresponding gene activation in the two groups:∆ = µ MS µ HC (5)where µ MS and µ HC are group-averages for MS patients and healthy controls, respectively,of the gene activation. Nodes with higher ∆ absolute values are ranked first. Highlypositive ∆ values indicate a too strong inflammatory response (over-activation) in theMS patients with respect to the healthy controls. Highly negative ∆ values indicate atoo weak inflammatory response (under-activation). We define dysregulated genes alongthe controllable driver-target walks as those for which | ∆ | is above the 75 th percentile.We perform a robustness analysis to evaluate the stability of the identified driver nodesto potential errors in the molecular network reconstruction. We simulate attacks withincreasing intensity, i.e. up to 20% of the nodes or edges in the network. When removingnodes, we consider the following cases: i) random deletion, ii) preferential removal ofhigh-degree nodes, and iii) preferential removal of low-degree nodes. Preferential attacksare performed by selecting nodes with a probability p proportional to their degree k , i.e. p ∝ k for high-degree nodes and p ∝ − k for low-degree nodes. When perturbing edges,we test: i) random addition, ii) random deletion, and iii) random rewiring. For each case,we simulated 1000 repetitions and we computed the target control centrality τ for thedriver nodes identified in the original network. Then, we report the percentage of nodesthat cease to be drivers (i.e. τ = 0), that is, the percentage of nodes that are drivers inour analysis, but are no longer able to control any target in the perturbed case. Acknowledgments

We would like to thank Prof. Albert-Laszlo Barabasi for his helpful comments andsuggestions. The research leading to these results has received funding from the Frenchgovernment under management of Agence Nationale de la Recherche as part of the”Investissements d’avenir” program, reference ANR-19-P3IA-0001 (PRAIRIE 3IA Insti-tute) and reference ANR-10-IAIHU-06 (Agence Nationale de la Recherche-10-IA InstitutHospitalo-Universitaire-6), and from the Inria Project Lab Program (project Neuromark-ers). The content is solely the responsibility of the authors and does not necessarilyrepresent the official views of any of the funding agencies.December 8, 2020 12/27 eferences

1. Jeong H, Mason SP, Barab´asi AL, Oltvai ZN. Lethality and Centrality in ProteinNetworks. Nature. 2001;411(6833):41–42. doi:10.1038/35075138.2. Bonifazi P, Goldin M, Picardo MA, Jorquera I, Cattani A, Bianconi G, et al. GABAergicHub Neurons Orchestrate Synchrony in Developing Hippocampal Networks. Science.2009;326(5958):1419–1424. doi:10.1126/science.1175509.3. Newman M. Networks: An Introduction. Oxford, New York: Oxford University Press;2010.4. Rugh WJ, Kailath T. Linear System Theory, 2nd Edition. 2nd ed. Upper Saddle River,NJ: Pearson; 1995.5. Sontag ED. Mathematical Control Theory: Deterministic Finite Dimensional Systems.2nd ed. Texts in Applied Mathematics. New York: Springer-Verlag; 1998.6. Wagner SG. On the Number of Matchings of a Tree. European Journal of Combinatorics.2007;28(4):1322–1330. doi:10.1016/j.ejc.2006.01.014.7. Heuberger C, Wagner S. The Number of Maximum Matchings in a Tree. DiscreteMathematics. 2011;311(21):2512–2542. doi:10.1016/j.disc.2011.07.028.8. Kalman RE. Mathematical Description of Linear Dynamical Systems. Journal of theSociety for Industrial and Applied Mathematics Series A Control. 1963;1(2):152–192.doi:10.1137/0301010.9. Hautus MLJ. Stabilization Controllability and Observability of Linear AutonomousSystems. Indagationes Mathematicae (Proceedings). 1970;73:448–455. doi:10.1016/S1385-7258(70)80049-X.10. Lin CT. Structural Controllability. IEEE Transactions on Automatic Control.1974;19(3):201–208. doi:10.1109/TAC.1974.1100557.11. Shields R, Pearson J. Structural Controllability of Multiinput Linear Systems. IEEETransactions on Automatic Control. 1976;21(2):203–212. doi:10.1109/TAC.1976.1101198.12. Liu YY, Slotine JJ, Barab´asi AL. Controllability of Complex Networks. Nature.2011;473(7346):167–173. doi:10.1038/nature10011.13. Hopcroft JE, Karp RM. A N5/2 Algorithm for Maximum Matchings in Bipartite. In:12th Annual Symposium on Switching and Automata Theory (Swat 1971); 1971. p.122–125.14. Uno T. Algorithms for Enumerating All Perfect, Maximum and Maximal Matchings inBipartite Graphs. In: Leong HW, Imai H, Jain S, editors. Algorithms and Computation.Lecture Notes in Computer Science. Berlin, Heidelberg: Springer; 1997. p. 92–101.15. Liu Y, Liu G. Number of Maximum Matchings of Bipartite Graphs with Positive Surplus.Discrete Mathematics. 2004;274(1):311–318. doi:10.1016/S0012-365X(03)00204-8.16. Zdeborov´a L, M´ezard M. The Number of Matchings in Random Graphs. Journal of Statisti-cal Mechanics: Theory and Experiment. 2006;2006(05):P05003–P05003. doi:10.1088/1742-5468/2006/05/P05003.17. Klickstein IS, Sorrentino F. Control Distance and Energy Scaling of Complex Net-works. IEEE Transactions on Network Science and Engineering. 2019; p. 1–1.doi:10.1109/TNSE.2018.2887042.18. Gao J, Liu YY, D’Souza RM, Barab´asi AL. Target Control of Complex Networks. NatureCommunications. 2014;5:5415. doi:10.1038/ncomms6415.19. Czeizler E, Gratie C, Chiu WK, Kanhaiya K, Petre I. Target Controllability of LinearNetworks. In: Bartocci E, Lio P, Paoletti N, editors. Computational Methods in SystemsBiology. Lecture Notes in Computer Science. Cham: Springer International Publishing;2016. p. 67–81.20. Zhang X, Wang H, Lv T. Efficient Target Control of Complex Networks Based onPreferential Matching. PLoS ONE. 2017;12(4). doi:10.1371/journal.pone.0175375.

December 8, 2020 13/27

1. Li J, Chen X, Pequito S, Pappas GJ, Preciado VM. Structural Target Controllability ofUndirected Networks. In: 2018 IEEE Conference on Decision and Control (CDC). MiamiBeach, FL: IEEE; 2018. p. 6656–6661.22. Lodish H, Berk A, Zipursky SL, Matsudaira P, Baltimore D, Darnell J. Gene Replacementand Transgenic Animals. Molecular Cell Biology 4th edition. 2000;.23. Hallett M. Transcranial Magnetic Stimulation and the Human Brain. Nature.2000;406(6792):147–150. doi:10.1038/35018000.24. Liu YY, Slotine JJ, Barab´asi AL. Control Centrality and Hierarchical Structure inComplex Networks. PLOS ONE. 2012;7(9):e44459. doi:10.1371/journal.pone.0044459.25. Gu S, Pasqualetti F, Cieslak M, Telesford QK, Yu AB, Kahn AE, et al. Con-trollability of Structural Brain Networks. Nature Communications. 2015;6:8414.doi:10.1038/ncomms9414.26. Murota K, Poljak S. Note on a Graph-Theoretic Criterion for Structural Output Controlla-bility. IEEE Transactions on Automatic Control. 1990;35(8):939–942. doi:10.1109/9.58507.27. Commault C, Van der Woude J, Frasca P. Functional Target Controllability of Networks:Structural Properties and Efficient Algorithms. IEEE Transactions on Network Scienceand Engineering. 2019; p. 1–1. doi:10.1109/TNSE.2019.2937404.28. Biggs N. Algebraic Graph Theory. Cambridge University Press; 1974.29. Robert C, Lu X, Law A, Freeman TC, Hume DA. Macrophages.Com: An on-LineCommunity Resource for Innate Immunity Research. Immunobiology. 2011;216(11):1203–1211. doi:10.1016/j.imbio.2011.07.025.30. Raza S, Robertson KA, Lacaze PA, Page D, Enright AJ, Ghazal P, et al. A Logic-BasedDiagram of Signalling Pathways Central to Macrophage Activation. BMC SystemsBiology. 2008;2:36. doi:10.1186/1752-0509-2-36.31. Bitsch A, Kuhlmann T, Costa CD, Bunkowski S, Polak T, Br¨uck W. Tumour Necro-sis Factor Alpha mRNA Expression in Early Multiple Sclerosis Lesions: Correlationwith Demyelinating Activity and Oligodendrocyte Pathology. Glia. 2000;29(4):366–375.doi:10.1002/(SICI)1098-1136(20000215)29:4¡366::AID-GLIA7¿3.0.CO;2-Y.32. McCormick SM, Heller NM. Regulation of Macrophage, Dendritic Cell, and MicroglialPhenotype and Function by the SOCS Proteins. Frontiers in Immunology. 2015;6.doi:10.3389/fimmu.2015.00549.33. Koper OM, Kami´nska J, Sawicki K, Kemona H. CXCL9, CXCL10, CXCL11, andTheir Receptor (CXCR3) in Neuroinflammation and Neurodegeneration. Advancesin Clinical and Experimental Medicine: Official Organ Wroclaw Medical University.2018;27(6):849–856. doi:10.17219/acem/68846.34. Liu J, Guan X, Ma X. Interferon Regulatory Factor 1 Is an Essential and DirectTranscriptional Activator for Interferon { gamma } -Induced RANTES/CCl5 Expres-sion in Macrophages. The Journal of Biological Chemistry. 2005;280(26):24347–24355.doi:10.1074/jbc.M500973200.35. Mosser DM, Edwards JP. Exploring the Full Spectrum of Macrophage Activation. NatureReviews Immunology. 2008;8(12):958–969. doi:10.1038/nri2448.36. Barab´asi AL, Oltvai ZN. Network Biology: Understanding the Cell’s Functional Organi-zation. Nature Reviews Genetics. 2004;5(2):101–113. doi:10.1038/nrg1272.37. Pu CL, Pei WJ, Michaelson A. Robustness Analysis of Network Controllabil-ity. Physica A: Statistical Mechanics and its Applications. 2012;391(18):4420–4425.doi:10.1016/j.physa.2012.04.019.38. Wilson HM. SOCS Proteins in Macrophage Polarization and Function. Frontiers inImmunology. 2014;5. doi:10.3389/fimmu.2014.00357.39. Liu YY, Barab´asi AL. Control Principles of Complex Networks. Reviews of ModernPhysics. 2016;88(3). doi:10.1103/RevModPhys.88.035006. December 8, 2020 14/27

0. Lugagne JB, Sosa Carrillo S, Kirch M, K¨ohler A, Batt G, Hersen P. Balancing aGenetic Toggle Switch by Real-Time Feedback Control and Periodic Forcing. NatureCommunications. 2017;8. doi:10.1038/s41467-017-01498-0.41. Sun X, Hu F, Wu S, Qiu X, Linel P, Wu H. Controllability and Stability Analysis of LargeTranscriptomic Dynamic Systems for Host Response to Influenza Infection in Human.Infectious Disease Modelling. 2016;1(1):52–70. doi:10.1016/j.idm.2016.07.002.42. Wuchty S. Controllability in Protein Interaction Networks. Proceedings of the Na-tional Academy of Sciences of the United States of America. 2014;111(19):7156–7160.doi:10.1073/pnas.1311231111.43. Tang E, Giusti C, Baum GL, Gu S, Pollock E, Kahn AE, et al. Developmental Increasesin White Matter Network Controllability Support a Growing Diversity of Brain Dynamics.Nature Communications. 2017;8(1):1252. doi:10.1038/s41467-017-01254-4.44. Dosenbach NUF, Fair DA, Miezin FM, Cohen AL, Wenger KK, Dosenbach RAT, et al. Dis-tinct Brain Networks for Adaptive and Stable Task Control in Humans. Proceedings of theNational Academy of Sciences. 2007;104(26):11073–11078. doi:10.1073/pnas.0704320104.45. Wagner T, Valero-Cabre A, Pascual-Leone A. Noninvasive Human BrainStimulation. Annual review of biomedical engineering. 2007;9:527–65.doi:10.1146/annurev.bioeng.9.061206.133100.46. Menara T, Gu S, Bassett DS, Pasqualetti F. On Structural Controllability of Symmetric(Brain) Networks. arXiv:170605120 [cs, math]. 2017;.47. Gu S, Betzel RF, Mattar MG, Cieslak M, Delio PR, Grafton ST, et al. OptimalTrajectories of Brain State Transitions. NeuroImage. 2017;148(Supplement C):305–317.doi:10.1016/j.neuroimage.2017.01.003.48. Betzel RF, Gu S, Medaglia JD, Pasqualetti F, Bassett DS. Optimally Controllingthe Human Connectome: The Role of Network Topology. Scientific Reports. 2016;6.doi:10.1038/srep30770.49. Muldoon SF, Pasqualetti F, Gu S, Cieslak M, Grafton ST, Vettel JM, et al.Stimulation-Based Control of Dynamic Brain Networks. PLOS Computational Biology.2016;12(9):e1005076. doi:10.1371/journal.pcbi.1005076.50. Klickstein I, Shirin A, Sorrentino F. Energy Scaling of Targeted Optimal Control ofComplex Networks. Nature Communications. 2017;8(1):15145. doi:10.1038/ncomms15145.51. Za˜nudo JGT, Yang G, Albert R. Structure-Based Control of Complex Networkswith Nonlinear Dynamics. Proceedings of the National Academy of Sciences.2017;doi:10.1073/pnas.1617387114.52. Bullmore E, Sporns O. Complex Brain Networks: Graph Theoretical Analysis ofStructural and Functional Systems. Nature Reviews Neuroscience. 2009;10(3):186–198.doi:10.1038/nrn2575.53. Bullmore ET, Bassett DS. Brain Graphs: Graphical Models of the Human Brain Connec-tome. Annual Review of Clinical Psychology. 2011;7(1):113–140. doi:10.1146/annurev-clinpsy-040510-143934.54. De Vico Fallani F, Richiardi J, Chavez M, Achard S. Graph Analysis of Func-tional Brain Networks: Practical Issues in Translational Neuroscience. Philosophi-cal Transactions of the Royal Society B: Biological Sciences. 2014;369(1653):20130521.doi:10.1098/rstb.2013.0521.55. Taylor IW, Linding R, Warde-Farley D, Liu Y, Pesquita C, Faria D, et al. DynamicModularity in Protein Interaction Networks Predicts Breast Cancer Outcome. NatureBiotechnology. 2009;27(2):199–204. doi:10.1038/nbt.1522.56. Maslov S, Sneppen K. Specificity and Stability in Topology of Protein Networks. Science.2002;296(5569):910–913. doi:10.1126/science.1065103.57. Drier Y, Sheffer M, Domany E. Pathway-Based Personalized Analysis of Can-cer. Proceedings of the National Academy of Sciences. 2013;110(16):6388–6393.doi:10.1073/pnas.1219651110.

December 8, 2020 15/27

8. Menche J, Sharma A, Kitsak M, Ghiassian SD, Vidal M, Loscalzo J, et al. DiseaseNetworks. Uncovering Disease-Disease Relationships through the Incomplete Interactome.Science (New York, NY). 2015;347(6224):1257601. doi:10.1126/science.1257601.59. Tong T, Gao Q, Guerrero R, Ledig C, Chen L, Rueckert D. A Novel GradingBiomarker for the Prediction of Conversion from Mild Cognitive Impairment toAlzheimer’s Disease. IEEE Transactions on Biomedical Engineering. 2017;64(1):155–165.doi:10.1109/TBME.2016.2549363.60. Uygun S, Peng C, Lehti-Shiu MD, Last RL, Shiu SH. Utility and Limitations of UsingGene Expression Data to Identify Functional Associations. PLoS Computational Biology.2016;12(12). doi:10.1371/journal.pcbi.1005244.61. Henry VJ, Sa¨ıs F, Marchadier E, Dibie J, Goelzer A, Fromion V. BiPOm: BiologicalInterlocked Process Ontology for Metabolism. How to Infer Molecule Knowledge fromBiological Process? In: International Conference on Biomedical Ontology, ICBO 2017.Newcastle upon Tyne, United Kingdom; 2017. p. np.62. Ruths J, Ruths D. Control Profiles of Complex Networks. Science. 2014;343(6177):1373–1376. doi:10.1126/science.1242063.63. Fallani FDV, Latora V, Chavez M. A Topological Criterion for Filtering Informa-tion in Complex Brain Networks. PLOS Computational Biology. 2017;13(1):e1005305.doi:10.1371/journal.pcbi.1005305.64. Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, Brown CG, et al.Accurate Whole Human Genome Sequencing Using Reversible Terminator Chemistry.Nature. 2008;456(7218):53–59. doi:10.1038/nature07517.65. Mullighan CG, Goorha S, Radtke I, Miller CB, Coustan-Smith E, Dalton JD, et al.Genome-Wide Analysis of Genetic Alterations in Acute Lymphoblastic Leukaemia. Nature.2007;446(7137):758–764. doi:10.1038/nature05690.66. Hauser SL, Oksenberg JR, Baranzini SE. Multiple Sclerosis. In: Rosenberg’s Molecularand Genetic Basis of Neurological and Psychiatric Disease. Elsevier; 2015. p. 1001–1014.67. Chu F, Shi M, Zheng C, Shen D, Zhu J, Zheng X, et al. The Roles of Macrophagesand Microglia in Multiple Sclerosis and Experimental Autoimmune Encephalomyelitis.Journal of Neuroimmunology. 2018;318:1–7. doi:10.1016/j.jneuroim.2018.02.015.68. Strauss O, Dunbar PR, Bartlett A, Phillips A. The Immunophenotype of Antigen Present-ing Cells of the Mononuclear Phagocyte System in Normal Human Liver – A SystematicReview. Journal of Hepatology. 2015;62(2):458–468. doi:10.1016/j.jhep.2014.10.006.69. Airas L. Hormonal and Gender-Related Immune Changes in Multiple Sclerosis. ActaNeurologica Scandinavica. 2015;132(S199):62–70. doi:10.1111/ane.12433.70. Song L, Langfelder P, Horvath S. Comparison of Co-Expression Measures: MutualInformation, Correlation, and Model Based Indices. BMC Bioinformatics. 2012;13(1):328.doi:10.1186/1471-2105-13-328.71. Steuer R, Kurths J, Daub CO, Weise J, Selbig J. The Mutual Information: Detecting andEvaluating Dependencies between Variables. Bioinformatics. 2002;18(suppl 2):S231–S240.72. Choi JK, Yu U, Yoo OJ, Kim S. Differential Coexpression Analysis Using MicroarrayData and Its Application to Human Cancer. Bioinformatics. 2005;21(24):4348–4355.doi:10.1093/bioinformatics/bti722.73. Zhao J, Yang TH, Huang Y, Holme P. Ranking Candidate Disease Genes from GeneExpression and Protein Interaction: A Katz-Centrality Based Approach. PLOS ONE.2011;6(9):e24306. doi:10.1371/journal.pone.0024306.74. Liseron-Monfils C, Olson A, Ware D. NECorr, a Tool to Rank Gene Importance inBiological Processes Using Molecular Networks and Transcriptome Data. bioRxiv. 2018;p. 326868. doi:10.1101/326868.75. Lohmann G, Margulies DS, Horstmann A, Pleger B, Lepsien J, Goldhahn D, et al.Eigenvector Centrality Mapping for Analyzing Connectivity Patterns in fMRI Data ofthe Human Brain. PLOS ONE. 2010;5(4):e10232. doi:10.1371/journal.pone.0010232.

December 8, 2020 16/27

6. Sen B, Chu SH, Parhi KK. Ranking Regions, Edges and Classifying Tasks inFunctional Brain Graphs by Sub-Graph Entropy. Scientific Reports. 2019;9(1):1–20.doi:10.1038/s41598-019-44103-8.77. Li A, Cornelius SP, Liu YY, Wang L, Barab´asi AL. The Fundamental Advantages ofTemporal Networks. arXiv:160706168 [nlin]. 2016;.78. Zhang Y, Garas A, Scholtes I. Controllability of Temporal Networks: An Analysis UsingHigher-Order Networks. arXiv:170106331 [physics]. 2017;.79. Motik B, Cuenca Grau B, Sattler U. Structured Objects in Owl: Representation andReasoning. In: Proceeding of the 17th International Conference on World Wide Web -WWW ’08. Beijing, China: ACM Press; 2008. p. 555.80. Musen MA. The Prot´eg´e Project: A Look Back and a Look Forward. AI matters.2015;1(4):4–12. doi:10.1145/2757001.2757003.81. Agarwala R, Barrett T, Beck J, Benson DA, Bollin C, Bolton E, et al. DatabaseResources of the National Center for Biotechnology Information. Nucleic Acids Research.2018;46(D1):D8–D13. doi:10.1093/nar/gkx1095.82. Consortium TU. UniProt: A Worldwide Hub of Protein Knowledge. Nucleic AcidsResearch. 2019;47(D1):D506–D515. doi:10.1093/nar/gky1049.83. Stelzer G, Rosen N, Plaschkes I, Zimmerman S, Twik M, Fishilevich S, et al. TheGeneCards Suite: From Gene Data Mining to Disease Genome Sequence Analyses.Current Protocols in Bioinformatics. 2016;54(1):1.30.1–1.30.33. doi:10.1002/cpbi.5.84. Thompson AJ, Banwell BL, Barkhof F, Carroll WM, Coetzee T, Comi G, et al. Diagnosisof Multiple Sclerosis: 2017 Revisions of the McDonald Criteria. The Lancet Neurology.2018;17(2):162–173. doi:10.1016/S1474-4422(17)30470-2.85. Andrews S. FastQC: A Quality Control Tool for High Throughput Sequence Data; 2010.86. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Se-quence Alignment/Map Format and SAMtools. Bioinformatics. 2009;25(16):2078–2079.doi:10.1093/bioinformatics/btp352.87. Wang L, Wang S, Li W. RSeQC: Quality Control of RNA-Seq Experiments. Bioinformatics.2012;28(16):2184–2185. doi:10.1093/bioinformatics/bts356.88. Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, et al.STAR: Ultrafast Universal RNA-Seq Aligner. Bioinformatics. 2013;29(1):15–21.doi:10.1093/bioinformatics/bts635.89. Noli L, Capalbo A, Ogilvie C, Khalaf Y, Ilic D. Discordant Growth of Monozygotic TwinsStarts at the Blastocyst Stage: A Case Study. Stem Cell Reports. 2015;5(6):946–953.doi:10.1016/j.stemcr.2015.10.006.90. Love MI, Huber W, Anders S. Moderated Estimation of Fold Change and Dispersionfor RNA-Seq Data with DESeq2. Genome Biology. 2014;15(12). doi:10.1186/s13059-014-0550-8.91. Team RC. R: A Language and Environment for Statistical Computing; 2014.

December 8, 2020 17/27 igures

Fig 1. Working principle of step-wise target controllability.

Panel a) illustrates anetwork with one driver and a target set T = { t , t , t } of cardinality S = 3. The Kalmancondition informs us that only two targets are controllable from the driver, i.e. τ = rank( Q T ) = 2.However, there might be up to 3 equivalent configurations that are controllable, i.e. { t , t } , { t , t } , and { t , t } . For larger networks, the number of Kalman tests to perform can beprohibitive, i.e. (cid:0) Sτ (cid:1) . Panel b) . By introducing a hierarchy among the target nodes, ourstep-wise method identifies the configuration with the most important nodes by performingonly S tests (see Material and methods ). In this example, the first step considers thesubgraph containing all the walks from the driver to the target set T (cid:48) = { t } . The associatedcontrollability matrix has full rank, i.e. rank( Q T (cid:48) ) = 1. The first target is therefore retainedand the algorithm moves to Step 2, by constructing a new subgraph containing the walks fromthe driver to the target set T (cid:48) = { t , t } . The rank of the new controllability matrix is nowdeficient and t is not retained. In Step 3, the new subgraph contains the walks from the driverto T (cid:48) = { t , t } . Because rank( Q T (cid:48) ) is full and there are no more targets, the algorithm stopsand returns the controllable configuration t , t . December 8, 2020 18/27 ig 2. Molecular network and gene activation associated with the pro-inflammatory state of macrophages . Panel a) shows the molecular network reconstructedthrough ontology-based techniques from the macrophage.com repository [29, 30]. The net-work consists of N = 101 nodes corresponding to genes involved in inflammation; for thesake of interpretablity, they are organized in four classes, depending on their function in thecell. Sensing genes are in the membrane of the cell and start a signaling pathway inside thecell, to the transcription factors, which promote the production of secreted molecules. Thereare L = 211 directed edges representing either activation or inhibition interactions betweenmolecules ( Material and methods ). The size of the nodes is proportional to their totaldegree k . Panel b) shows gene activation computed as the ratio in expression between the“pro-inflammatory” and “alert” states, based on our RNA sequencing data, generated frommonocyte-derived macrophages from blood samples of multiple sclerosis patients ( n = 8) andhealthy controls ( n = 8) ( Material and methods ). Solid lines represent group-averagedvalues, while transparent patches stand for standard deviation.

December 8, 2020 19/27 ig 3. Gene network target control centrality and analysis of robustness for thedriver nodes.

In panel a) the size of the nodes codes the step-wise target control centralityvalues τ . Nodes with τ = 0 are classified as not-drivers and are represented in gray. The insetshows that τ values cannot be merely predicted by node degree k (Spearman rho 0 .

18, p < . b) shows the percentage of driver nodes ( τ >

0) that are lost when removing nodes ina random fashion (black circles), or preferentially attacking high-degree (blue diamonds) orlow-degree nodes (light blue triangles). Panel c) shows the percentage of driver nodes that arelost when randomly rewiring (black circles), adding (blue diamonds) or removing edges (lightblue triangles). December 8, 2020 20/27 ig 4. Altered driver-target coactivation in multiple sclerosis.

Panel a) reports thecoactive driver-target pairs, computed as significant Spearman correlations (p < .

05) betweenthe gene activation of controllable driver-target pairs, for the healthy control (HC - blue squares)and the multiple sclerosis group (MS - red squares). White squares indicate that there is acontrollable walk from the driver to the target, but that their correlation is not significant. Greysquares mean that there is no controllable walk for driver-target pairs. The size of the circlesfor driver nodes codes for their target control centrality values τ . For target genes, circle sizesrepresent the number of driver nodes that can control them. Panel b) Venn diagram showing adecrease in number of driver-target coactivations in the MS patients as compared to HC. Inboth groups, these functional interactions tend to predominantly involve signaling genes. Panel c) subnetwork of the walks from all the drivers coactivated with the target IFNA1. December 8, 2020 21/27 ig 5. Pooled visualization of dysregulated genes along differentially coactivateddriver-target walks.

Highlighted genes indicate all nodes on walks between coactivated driver-target pairs, either in the healthy control group, or in the MS patients group. Dysregulatedgenes are shown in red. Edge thickness is proportional to the number of times they are traversedby walks connecting a driver to a target node (information not reported here).

December 8, 2020 22/27 ig 6. Dyregulated drivers and coactivation switch for SOCS-genes.

The subnetworkincludes all dysregulated drivers (IRF8, NFKB1, SOCS1, SOCS3, TLR7) and their controllabletargets. Panel a) shows coactivated pairs for healthy controls (HC), panel b) shows coactivatedpairs for multiple sclerosis (MS) patients. A coactivation switch can be appreciated betweenthe HC and MS group. SOCS1 and SOCS3 are respectively coactive and silent in HC, whilethey invert their role in the MS group. December 8, 2020 23/27 upplementary figures

Fig S1. Methodological validation of step-wise target controllability.

We start froma simple directed full binary tree, and we add a cycle among the first three nodes. The driveris the root of the tree, while we put a target node in each level of the tree. Target nodes arethen ranked according to their height h . This configuration is fully controllable by constructionregardless of the tree’s height h . However, for h = 5 (i.e. N = 63), the standard procedurecomputing the rank of the full Kalman controllability matrix cannot retrieve all the targets, asthe rank computation is deficient due to numerical errors. Instead, by using the step-wise targetcontrollability we can correctly identify the controllable targets up to h = 10, i.e. N = 2047. December 8, 2020 24/27 ig S2. Average number of targets that can be controlled by a single driver node ,computed on samples of 100 connected and directed random networks, at the varying of sizeand density of the networks, and target set size. Panel a) corresponds to the step-wise targetcontrollability. Panel b) corresponds to a standard procedure that computes the rank of the fullKalman controllability matrix. Results show that in general our method is able to retrieve alarger number of controllable targets as compared to the standard procedure. More specifically,when the target set contains 5% of the network nodes, results are quite stable across differentconnection densities. For larger target-set sizes our method works better when the connectiondensity is relatively low (0 . . N >

180 and density > . Fig S3. Hierarchy among target genes.

Genes corresponding to the 13 secreted moleculesare ranked according to the absolute value of fold change ∆ in the gene activation between themultiple sclerosis (MS) group and the healthy control (HC) group (

Materials and methods ). December 8, 2020 25/27 ig S4. Subnetwork illustrating the feedback cycle between dysregulated genesIFNA1, IFNB1 and NFKB1.

The three nodes belong to the only strongly connectedcomponent (a subnetwork in which every node is reachable from any other node) of the networkhaving more than two nodes. It plays, thus, a central role in the network topology.

December 8, 2020 26/27 upplementary tables

Table 1.

List of all node genes and associated class depending on their functional role.

Table 2.

Distribution of driver nodes across different classes of genes.

Table 3.

Spearman cross-correlation and p-values values for controllable driver-target nodepairs. Values for pairs that were not controllable were replaced by not-a-number (NaN). Bothvalues for multiple sclerosis patients (MS) and healthy controls (HC) are reported.

Table 4.