Automated Discovery of Interactions and Dynamics for Large Networked Dynamical Systems
Yan Zhang, Yu Guo, Zhang Zhang, Mengyuan Chen, Shuo Wang, Jiang Zhang
Beijing Normal University, Beijing, China; State Key Laboratory for Novel Software Technology at Nanjing University, Nanjing, China; Software Institute, Nanjing University, Nanjing, China
* [email protected]

ABSTRACT
Understanding the mechanisms of complex systems is very important. A networked dynamical system, which describes a system as a group of nodes interacting on a given network according to specific dynamic rules, is a powerful tool for modelling complex systems. However, finding such models from time series of behaviours is difficult. Conventional methods work well only on small networks and for certain types of dynamics. This paper proposes a unified framework for automated interaction network and dynamics discovery (AIDD) on various network structures and dynamics, based on a stochastic gradient descent algorithm. The experiments show that AIDD can be applied to large systems with thousands of nodes and is robust against noise and missing information. We further propose a new method to test data-driven models based on control experiments. The results show that AIDD is able to learn the real network dynamics correctly.
Introduction
Living cells, brains, human society, stock markets, global climate systems, and so forth are complex systems composed of many nonlinear interactive units [1–5]. By decomposing a complex system into a static network with dynamics on nodes, networked dynamical system models are powerful tools to describe complex systems, playing a paramount role in understanding their collective behaviours and controlling their functions [2, 3, 6]. However, building such models requires professional knowledge and modelling experience, which hinders the wide application of these methods. The reconstruction of such networked dynamical systems in a data-driven way remains a fundamental problem, i.e., to retrieve the interaction network structure and the node dynamics from time-series data of complex system behaviours without any subjective biases [7, 8]. Although many classical approaches to time series forecasting have been proposed [9, 10], prediction of the behaviours of complex systems with highly nonlinear and long-range correlations, especially those based on a complex network structure, had not been resolved until the recent introduction of graph neural network (GNN) models [11–22]. GNNs are designed particularly for networked dynamical systems. By learning complex functions of information aggregation and propagation on a given network, GNNs can simulate any complex dynamics on such networks. However, a complete graph is always required for most GNN models, which hinders their wider applications [8, 19, 23–26].

How to reveal network structure from time series data is of great importance because the revealed interaction network can not only help us to understand the behaviours of complex systems but can also improve algorithms' explainability and transparency in terms of causality [5, 27–30].
The interdependence relations or causal structure can be obtained by directly calculating statistical measures [31–33], perturbing the system [34, 35], optimising a score function [36, 37], expanding the complex interaction dynamics on a set of basal functions [7, 38, 39], or other methods [40–42]. Among these algorithms, the algorithm for revealing network interactions (ARNI) [43] is one of the most prominent methods. It can not only infer a network with high accuracy but can also be adopted for various nonlinear dynamics. However, one disadvantage is that the performance of the model strongly depends on the choice of the basal functions. If prior knowledge of the basal functions is missing, this approach becomes very time-consuming, limiting its application to larger systems.

Very few studies have been proposed to perform both network inference and time series forecasting together, although some network inference algorithms are capable of forecasting. Implicit and time-variant network structures can also be obtained from deep learning models for forecasting based on an attention mechanism [8, 25, 44–47]. The first framework to derive an explicit network is NRI (neural relational inference) [19], in which an encoder-decoder framework is used. However, the complicated encoding process to infer the connections from time series data has limited scalability and accuracy on larger networks [24]. Further studies promote NRI in several aspects, such as the consideration of constraints [23, 48] and the inference of hidden nodes [49], whereas the problems of scalability and accuracy remain.

Despite the large number of experiments on evaluating and comparing various data-driven models based on prediction tasks, much room for improvement remains. According to the three-layer causal hierarchy, intervention and counterfactual experiments rather than predictions are the gold standards for testing a trained data-driven model as a complete substitution of the original system [27, 50]. That is, downstream tasks such as control experiments on both the learned model and the original system should be tested and compared [51]. However, we found only a few studies that have performed this kind of task [6, 51].

As shown in Fig. 1, this paper proposes a unified framework for automated interaction and dynamics discovery (AIDD). This is a universal framework for learning both the interaction structure and the dynamics of a complex system from time series data. The design of a lightweight network generator and a universal dynamics learning component based on Markov dynamics makes AIDD not only applicable to various networks and dynamics, but also enables it to reconstruct very large networks with high accuracy and robustness. The entire framework is differentiable, so it can be optimised directly by automatic differentiation and machine learning techniques [52]. Beyond the tasks of network inference and time series forecasting, we propose a new method to test a learned data-driven model based on control experiments. Finally, we test the validity of our framework on real gene regulatory networks under noisy, incomplete, and interrupted data, which is close to realistic situations. The results demonstrate that high performance can be obtained.

Figure 1. The workflow and evaluation of Automated Interaction and Dynamics Discovery (AIDD). Workflows showing how our proposed AIDD framework models a complex system and how it can be evaluated on tasks of time series forecasting, interaction network inference, and control experiments. The framework of AIDD is also shown in the inset box. A column of the adjacency matrix for the candidate network is sampled by the network generator. It can be regarded as a mask vector to filter out the supposedly unrelated nodes. Then, the time series information for related nodes is input into the dynamics learning module, which outputs a prediction for the new states of all nodes. After that, the prediction is compared against the data. The loss function can be calculated, and the gradient information can be back-propagated directly. After optimisation, a learned networked dynamical system represented by neural networks is obtained.

Results
Problem Formulation
Suppose the complex system to be considered evolves in discrete time steps. Thus, the dynamics to be reconstructed can be described by a mapping

X^{t+1} = f(X^t, A) + ζ^t,   (1)

where X^t = [X^t_1, X^t_2, ..., X^t_N] ∈ R^{N×D} is the state of the system at time t, N is the number of nodes, D is the dimension of the state space of each single node, A is the adjacency matrix of the interaction network to be inferred, and ζ^t ∈ R^{N×D} is the noise imposed on the nodes. However, Equation 1 can only describe dynamical processes with explicit mathematical forms and cannot be applied to those defined by rule tables or transitional probabilities, such as cellular automata, Boolean dynamics, or Markov chains. Therefore, instead of Equation 1, we use a more general form, a Markov chain {X^t}, to describe the dynamics:

f(X^{t+1} | X^t, A) ≡ P(X^{t+1} | X^t, A),   (2)

where f is the dynamics to be discovered, X^t is the abbreviation for the event that the random variable takes the value x^t ∈ S^N, where S is the state space of each single node and can be either a finite set of discrete values or an infinite set of continuous values, and P is the conditional probability. Equation 2 is compatible with Equation 1 but more general [53]. It can even be extended to non-Markov random processes with finite histories by adding hidden auxiliary variables [54].

However, it is difficult to infer the probabilities in Equation 2, particularly when N is large. Fortunately, the interactions of complex systems are always localised, which means that P(X^{t+1} | X^t, A) can be factorised into local transitional probabilities [55]:

P(X^{t+1} | X^t, A) = ∏_{i=1}^{N} P(X^{t+1}_i | X^t ⊙ A_{·i}),   (3)

where ⊙ is the element-wise product, A_{·i} represents the i-th column of matrix A, and X^t ⊙ A_{·i} is a vector representing the state combination of all neighbour nodes of i.
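The factorised likelihood in Equation 3 can be sketched in a few lines of Python. The snippet below is a toy illustration only: it shows the masking by A_{·i} and the per-node summation of local log-probabilities, with a made-up Laplacian-style local term standing in for f_i. It is not the paper's implementation.

```python
import numpy as np

def joint_log_likelihood(x_t, x_next, A, local_log_prob):
    """Sum of local log-probabilities, one term per node (Equation 3)."""
    total = 0.0
    for i in range(len(x_next)):
        masked_state = x_t * A[:, i]          # X^t ⊙ A_{·i}: keep only neighbours of i
        total += local_log_prob(i, x_next[i], masked_state)
    return total

def toy_log_prob(i, x_i_next, masked_state):
    """Toy local mechanism: node i tends to copy the mean of its neighbours.

    Laplacian log-density up to an additive constant; in this toy, a
    neighbour whose state happens to be exactly 0 is conflated with a
    non-neighbour by the mask.
    """
    k = np.count_nonzero(masked_state)
    mu = masked_state.sum() / k if k > 0 else 0.0
    return -abs(x_i_next - mu)
```

For a 3-node chain with states (1, 2, 3) evolving towards the neighbour means, the joint log-likelihood is simply the sum of the three local terms, which is what makes node-by-node training possible.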
Then

f_i(X^{t+1}_i | X^t ⊙ A_{·i}) ≡ P(X^{t+1}_i | X^t ⊙ A_{·i})   (4)

represents the local dynamics of node i, which is also called a causal mechanism in the literature [55]. Therefore, our task becomes the reconstruction of the network A and the learning of the local dynamics f_i according to the observed time series x = (x^1, x^2, ..., x^T) with T time steps on multiple samples.

Model
We build a neural network framework consisting of two modules to solve the reconstruction problem, as shown in the inset panel of Fig. 1. The first module is a network generator that generates a candidate network adjacency matrix Â(Θ) with parameters Θ. The second module then attempts to simulate the system dynamics f by using a set of neural networks f̂_i(X̂^{t+1}_i | x^t ⊙ Â_{·i}(θ_{·i}), φ_i) for each node i, parameterized by Φ = (φ_1, ..., φ_N), to predict the future states X̂^{t+1}, X̂^{t+2}, ... according to the candidate matrix Â(Θ) and the observed state of the previous time step x^t.

Instead of using the complicated graph network architecture of [19] to generate the candidate network, we directly sample each element of the adjacency matrix, Â_ij ∼ Bernoulli(θ_ij), where θ_ij ∈ [0, 1] represents the probability that the entry in the i-th row and the j-th column of Â takes the value 1. To make the sampling process differentiable, we use the Gumbel-softmax technique [19, 56] to generate the adjacency matrix [24]:

Â_ij = σ((log(θ_ij) + ξ_ij) / τ),   (5)

where ξ_ij ∼ Gumbel(0, 1) is a random number following the standard Gumbel distribution, σ is the softmax function, and τ is a temperature parameter that adjusts the softness of the sampling process. The random numbers generated by Equation 5 have a distribution similar to Bernoulli(θ_ij), especially when τ is small. The simulated sampling process is differentiable such that gradients can be passed through. When τ → 0, Â_ij exactly equals 1 with probability θ_ij, or 0 with probability 1 − θ_ij.

Compared to other network generation mechanisms based on the Hadamard product of two V-dimensional node feature vectors [8, 19, 44], where V ≪ N, our method has higher accuracy in inferring links because more parameters (N × N vs. N × V) are used.
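The sampling step of Equation 5 can be illustrated with the equivalent binary-concrete form of the Gumbel-softmax trick, in which a logistic noise variable (the difference of two Gumbel(0, 1) draws) is added to the log-odds before a tempered sigmoid. This NumPy sketch is illustrative only; the paper's implementation details may differ.

```python
import numpy as np

def sample_adjacency(theta, tau, rng):
    """Differentiable relaxation of A_ij ~ Bernoulli(theta_ij).

    Returns values in (0, 1) that concentrate on {0, 1} as tau -> 0,
    while remaining a smooth function of theta (so gradients flow).
    """
    eps = 1e-9
    u = rng.uniform(eps, 1.0 - eps, size=theta.shape)
    logistic_noise = np.log(u) - np.log(1.0 - u)      # Gumbel(0,1) minus Gumbel(0,1)
    log_odds = np.log(theta + eps) - np.log(1.0 - theta + eps)
    return 1.0 / (1.0 + np.exp(-(log_odds + logistic_noise) / tau))
```

At a moderate temperature the samples are soft masks; lowering τ pushes almost all entries towards hard 0/1 values, matching the limit described after Equation 5.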
However, optimising N × N parameters does not reduce the implementation performance because the matrix elements are independent of each other, which means that Â can be generated column by column, where each column represents the possible neighbourhood of a single node. In this way, our framework has a large improvement in flexibility and computational efficiency compared to encoder-decoder frameworks such as [19] and can be applied to very large networks. The limitations are that the network must be static and that the correlations between the elements of Â are ignored. Further, the introduction of the noise ξ_ij can push the network generator to jump out of local minima during optimisation.

According to Equation 3, the dynamics learning module can also be decomposed into local modules node by node. Each local dynamics learner can be modelled by a feedforward or a recurrent neural network f̂_i(·|φ_i). Both the network structure and the parameters φ_i may or may not be shared by different nodes. If they share the dynamics learning module, they form a graph network [11, 19], as shown in Fig. 1.

Objective function
Finally, the network inference and dynamics learning problem can be solved by minimising the following objective function:

L(Θ, Φ) = E_{Â∼B(Θ)} ( −∑_{t=1}^{T} log P(X̂^{t+1} = x^{t+1} | x^t, Â, Φ) ) + λ ∑_{ij} Â_ij
        ≈ −(1/K) ∑_{k=1}^{K} ∑_{t=1}^{T} ∑_{i=1}^{N} L_i(X̂^{t+1}_i = x^{t+1}_i | x^t, Â^k_{·i}; φ_i) + λ ∑_{ij} Â_ij,   (6)

where

L_i(X̂^{t+1}_i = x^{t+1}_i | x^t, Â^k_{·i}; φ_i) = log f̂_i(X̂^{t+1}_i = x^{t+1}_i | x^t ⊙ Â^k_{·i}; φ_i)   (7)

is the local log-likelihood, K is the number of samples of the matrix Â under a given Θ, and x^t_i is the observed state vector of node i at time t. Thus, the objective function contains two terms: the former is the log-likelihood, which can be decomposed into local terms; the latter is the structural loss that pushes the network to be sparse while avoiding over-fitting [48]. The parameter λ adjusts the relative importance of the structural loss. When we perform multi-step predictions, we can use the X̂^t predicted from the previous time steps instead of x^t in Equation 7 to calculate the loss.

If the state space S of each node is real, the local log-likelihood in Equation 7 can take a mean absolute error (MAE) form,

L_i(X̂^{t+1}_i = x^{t+1}_i | x^t, Â^k_{·i}) = | x^{t+1}_i − f̂_i(x^t ⊙ Â^k_{·i}; φ_i) |,   (8)

by assuming that f̂_i(x^{t+1}_i | x^t ⊙ A_{·i}) is an independent Laplacian distribution (ν/2) exp[−ν |x^{t+1}_i − μ_i(x^t ⊙ Â_{·i} | φ_i)|] for any data point, where the mean value μ_i of the Laplacian distribution is modelled by a neural network f̂_i(x^t ⊙ Â_{·i}; φ_i), which can be regarded as a new form of the dynamics learner [57]. We let ν = 1, under which the log-likelihood reduces to the MAE form of Equation 8 up to an additive constant.

Performances and Comparisons
To test our model and compare it with others, we generate time series data using a set of network dynamics on a variety of graphs as the ground truths. Continuous, discrete, and binary dynamics are all included. Spring (spring dynamics) [19], SIR (an inter-city meta-population SIR epidemic model) [59], and Michaelis–Menten kinetics [60, 61] are examples of continuous dynamics. The coupled map network (CMN) [62, 63] and the voter model [64] are representatives of discrete and binary dynamics, respectively. The details of the dynamical models can be found in Supplementary Section 6. The graphs to be inferred are either generated by models (ER [65] for Erdos–Renyi, WS [4] for Watts–Strogatz, and BA [66] for Barabasi–Albert) or taken from empirical data, including a gene network (for S. cerevisiae, Gene) [67], an inter-city traffic network (City) of China, three social networks (Email, Dorm, and Blog), and a road network (Road) [68]. The details of the empirical networks can be found in Supplementary Section 7. For each model and network, we ran the simulation for various lengths of time. All the reported results are on the testing dataset.

We compare our model to a series of baseline methods for both network inference and single-step forecasting tasks. ARNI [43] and NRI [19] are both state-of-the-art models for network inference and time series forecasting. The former is based on the block-orthogonal regression method, and the latter is based on deep learning and graph networks.

Table 1.
Comparisons of performance on network inference and dynamics prediction tasks between AIDD and other selectedmethods (columns) on different dynamics and networks (rows)
Columns: Type, Model, Network, then per-method metrics: ARNI (AUC, MSE), MI (AUC), PC (AUC), NRI (MSE/ACC, AUC), LSTM (MSE/ACC), OURS (MSE/ACC, AUC).
Con. Spring ER-10 0.5853 1.33E-03 0.7500 0.8250
WS-10 0.5125 1.58E-03 0.6875 0.7875
BA-10 0.5169 1.10E-03 0.6422 0.6571
ER-2000 - - 0.4997 - - - 2.25E-03
WS-2000 - - 0.5002 - - - 5.89E-03
BA-2000 - - 0.5010 - - - 4.54E-03
SIR SIR-371(D) 0.5424* 8.25E-03 0.5027 0.5119 - - 2.28E-03*
Menten Gene-100(D)
WS-10
BA-10
ER-200 0.8441* 4.17E-02 0.5774 0.7648 - - 5.91E-05
WS-200
WS-1000 - -
BA-1000 - - 0.5290 - - -
Bin. Voter ER-10 - - - 0.4552 0.8447* 0.5000 0.5242
WS-10 - - - 0.5250 0.9062* 0.5037 0.6007
BA-10 - - - 0.4607 0.9588* 0.4999 0.6917
WS-1000 - - 0.5470* - - - 0.5317
BA-1000 - - 0.5030 - - - 0.5208*
EMAIL-1133 - - 0.4999 - - - 0.5333*
ROAD-1174 - - 0.5004 - - - 0.5455*
DORM-217(D) - - 0.5219 - - - 0.5735*
BLOG-1224(D) - - 0.4995 - - - 0.5295*
In the network column, we use the network-size format. Networks marked with "D" are directed graphs. All networks generated by models share the same edge density, which is 1% for large networks (size > 10), and 20% and 3% for small networks with sizes smaller than 10 and for ER networks with size = 200, respectively, to avoid isolated nodes. All results are averages over five repeated experiments. The same data volume is shared by the different methods in each row. Items marked "-" indicate that valid results could not be obtained owing to the limitations of the specific method on dynamics, memory, or time consumption. The best results among all compared algorithms in the same row are boldfaced, and the second-best results are marked "*". More parameter settings are shown in Supplementary Section 9.

Two other frequently used statistical metrics, partial correlation (PC) and mutual information (MI), are also compared on the network inference task. In addition, the classic time series forecasting model, long short-term memory (LSTM) [54], is compared on the prediction task. The details of these comparison models can be found in Supplementary Section 5. In all the experiments, we used a graph network with 128 hidden units as the dynamics learning module (see the Methods section for details). The neural network is shared by all nodes. Other parameters are shown in the footnote of Table 1 and Supplementary Section 9.
Figure 2.
Multi-step prediction results of AIDD on CMN dynamics data in a 10-node ER network. In (a), we show the time series data of multi-step predictions and the ground truths for two selected nodes. In (b), we show how the mean square error (MSE) increases with time for CMN dynamics. The parameters are the same as in Table 1.

As shown in Table 1, our model outperforms all the other methods on large networks for both tasks. Compared with the ARNI model, AIDD does not rely on the choice of basis functions; as a result, it can be applied to very diverse dynamics. By using neural networks as a universal estimator of dynamics, AIDD avoids the combinatorial explosion of basis functions. This enables AIDD to have competitive space and time complexity compared with ARNI: in the ARNI model, the cost of finding the interactions of a single node already grows polynomially with N, so revealing the interactions of the entire network multiplies that cost by a further factor of N, whereas AIDD's training can be decomposed node by node. In order to compare performance on time series forecasting, we slightly modified the original ARNI algorithm such that it can also output one-step predictions [43]. Compared to the NRI framework, our model has a much lighter network generation architecture. NRI cannot output any result under the same time and space limitations on networks with more than 30 nodes owing to its computational complexity [19].

The model can also output multi-step predictions by feeding the one-step prediction output back into the model. Figure 2 shows the results for the selected dynamics.

Figure 3.
The performance of AIDD under different factors in network inference and dynamics learning. (a) shows how the number of nodes and the volume of data (the number of samples × the number of time steps, the latter fixed to 100) jointly influence the network inference accuracy on WS networks under CMN dynamics. With the exception of the 100-node networks with 4% edge density, all networks share the same edge density of 1%. (b) shows how performance decreases with edge density. For the experiments in (b), we set the number of nodes to 100, and the sparse matrix parameter λ was set to 0.

In general, AIDD works very well on large sparse networks, and the performance on both tasks decreases as the edge density increases, as shown in Fig. 3(b). We can improve the accuracy by increasing the amount of data. Figure 3(a) shows systematically how the area under the curve (AUC) depends on both network size and data volume; similar contours can be obtained for the MSE errors (see Supplementary Section 2). There is a trade-off between network size and data volume under a given accuracy, as shown in Fig. 3(a). It is interesting to observe that the required data volume is sensitive to network size only when the number of nodes is between 300 and 500; beyond that, a minimal amount of data is sufficient to obtain an acceptable accuracy.

Robustness against noise and hidden nodes
A good data-driven model must be robust against noise and unobservable nodes so that it can be applied to the real world. To show the robustness of AIDD against noise, we plot the change in AUC with the magnitude of noise for Michaelis–Menten kinetics [60, 61], which can describe the dynamics of gene regulatory networks, as shown in Fig. 4. Our model can recover the network structure with 0.85 AUC when the mean magnitude of the noise is 0.3.

In real applications, we can obtain only partial information about the entire system owing to limitations of observation. Thus, a certain proportion of nodes are unobservable or hidden, which requires the inference algorithm to be robust to hidden nodes. We therefore test AIDD on an incomplete network. To generate the incomplete network data as the ground truth, we randomly select a certain percentage of nodes as hidden nodes (Fig. 4(a)), and the time series data of these nodes are removed. The AUC decreases and the MSE increases as the fraction of unobserved nodes increases for both Spring and Voter dynamics, as shown in Fig. 4(b); however, the sensitivity depends on the type of dynamics. We found that when the proportion of missing nodes reaches 50%, the inference accuracy is still above 95%, which shows that our model can achieve superior results in the absence of normally sufficient amounts of data.

Furthermore, we test the ability of AIDD to reveal the unknown network structure of unobservable nodes on CMN and Voter dynamics, with only the number of hidden nodes available. We complete this task by performing the same interaction inference task while setting the states of the unknown nodes to random values. Figure 4(d) shows the AUCs of the link structures of the unknown networks on Voter and CMN dynamics. The results reveal that the network inference accuracy is robust to missing nodes. The algorithm can recover the interactions even for unobservable nodes with over 80% accuracy. The details of the algorithm can be found in [69] and Supplementary Section 3.
Figure 4.
The robustness evaluation of AIDD against noise and missing nodes. (a) shows a schematic ground truth network with missing information on the unobserved nodes (grey nodes). (b) shows the influence of the proportion of unobserved nodes on the accuracy of interaction inference on the partial network of observed nodes, measured by AUC, and the accuracy of dynamics predictions (inset), measured by the MSE of the observable nodes for Spring and CMN, and by the AUC of the observable nodes for Voter dynamics. All the experiments were conducted on an ER network with 100 nodes, and all model-generated networks share the same edge density of 4%. (c) shows the dependence of AUC and MSE on the mean of the noise added to each node for the Michaelis–Menten kinetics (Gene dynamics) on the yeast S. cerevisiae gene network with 100 nodes. (d) shows the ability of AIDD to infer interactions on the entire network (light bars) and on the unobserved partial networks (dark bars). All the experiments were conducted on CMN and Voter dynamics with ER, WS, and BA networks; all networks contain 100 nodes with 10% of nodes randomly selected as unobservable, and all model-generated networks share the same edge density of 4%. The parameters of these experiments are given in Supplementary Section 9.
Control
To further verify that AIDD has learned the ground truth dynamics and that the trained model can replace the original system, we design control experiments. The reasons we choose the control problem as the test bed for our model are that (1) the control problem in a complex network is very important and relevant to many engineering fields [6], and (2) the control problem is more difficult than prediction, because to control a system means to intervene in it; as a result, we stand at least on the second level of the causal hierarchy [50].

Here, our control problem is to find a control law based on the learned network dynamics such that, if the rule is applied to the ground truth model, we can synchronise the whole system by regulating a few nodes. To do this, we divide the nodes of the network into two groups: the driver nodes, which can be manipulated by the controller directly, and the target nodes, whose states cannot be directly controlled but can be influenced indirectly. The controller is designed by optimising a set of parameters such that the control objective, that is, the synchronisation of the whole system, is achieved.

The control experiments consist of two stages, as shown in the bottom part of Fig. 1. In the first stage, we optimise the controller's parameters on the learned network dynamics to achieve the designed objective. In the second stage, we apply the optimised controller to the ground truth model. To verify the learned model, we also optimise another controller on the ground truth model. After that, we compare the deviation curves of the two controllers from the control target on the same ground truth model. If the curves are similar, we conclude that the learned model can substitute for the real model. The detailed methods can be found in Supplementary Section 4.

Two control experiments were designed.
The first problem is to synchronise the movement directions of all masses with spring dynamics on a small BA network with 10 nodes (see Fig. 5(a)). The three nodes with the largest degrees are selected as the driver nodes. The controller, implemented by a neural network, adjusts the forces imposed on the drivers according to the current state of the system at each time step. The experimental results are shown in Fig. 5. The two MSE curves in Fig. 5(b) depict the degree of control achieved by the trained model and the ground truth model, respectively. Their overlap shows that the learned dynamics can be a good substitute for the real system. Both curves approach 0 within 13 time steps, indicating that the controllers achieve the goal within the given time.

The second control experiment requires all the oscillators in a CMN model on a WS network with 10 nodes (see Fig. 5(d)) to take the same value of 0.6, which is the mean of the value range an oscillator can take. The controller is a parameterized neural network, which maps the current state of the system into control signals (see Supplementary Section 4). The signals are the forces imposed directly on the two drivers (the nodes with the largest degrees). From Fig. 5(e) and (f), the control is not fully achieved for all nodes because the MSE curves do not converge to zero. However, the two MSE curves overlap very well, indicating that the surrogate behaves identically to the ground truth model.
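The two-stage protocol above can be sketched with a toy one-dimensional system: a proportional controller is tuned on a slightly mis-specified learned model and then applied to the ground truth, whose deviation curve is then inspected. Everything here (the linear maps, the grid search over a single gain) is a hypothetical stand-in for the paper's neural controllers, not its actual code.

```python
import numpy as np

def rollout_mse(step_fn, controller_gain, x0, target, steps=30):
    """Run a closed loop and record the squared deviation from the target."""
    x, errs = x0, []
    for _ in range(steps):
        u = controller_gain * (target - x)   # simple proportional controller
        x = step_fn(x, u)
        errs.append((x - target) ** 2)
    return np.array(errs)

# Ground truth and a slightly mis-specified learned surrogate (toy 1-D maps).
truth = lambda x, u: 0.9 * x + u
learned = lambda x, u: 0.88 * x + u

# Stage 1: pick the gain that best drives the *learned* model to the target.
gains = np.linspace(0.0, 1.0, 101)
best = min(gains, key=lambda g: rollout_mse(learned, g, 1.0, 0.6)[-1])

# Stage 2: apply that controller to the ground truth and inspect its deviation.
mse_curve = rollout_mse(truth, best, 1.0, 0.6)
```

If the surrogate is faithful, the deviation curve obtained in stage 2 should closely track the curve of a controller tuned directly on the ground truth, which is exactly the comparison made in Fig. 5(c) and (f).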
Gene regulatory network inference
To verify that our algorithm can be applied to real scenarios and not only to toy models, we attempt to infer a real subnetwork structure of the known transcriptional network of yeast S. cerevisiae from time series data of mRNA concentrations generated by GeneNetWeaver (GNW) [67], a well-known simulator of gene dynamics. The networks used by GNW are extracted from known biological interaction networks (Escherichia coli, Saccharomyces cerevisiae, etc.). On these networks, GNW uses a set of dynamical equations to simulate the transcription and translation processes, taking into account many factors close to real situations (see Supplementary Section 6 for details). GNW is therefore a well-known platform for benchmarking and performance assessment of network inference methods.

In the experiment, we used the yeast S. cerevisiae gene network with 100 nodes as the benchmark gene network, and we used the default parameters of DREAM4_In-Silico in the GeneNetWeaver software to generate the data. For the dynamics learner, we use a different neural network for each node because of the heterogeneity of the node dynamics and the existence of latent variables, noise, and perturbations [67]. We compare our method with partial correlation, Bayesian network inference, and mutual information algorithms. Our method outperforms the others on network inference (Fig. 6(a)) in terms of AUC (0.82). It can also predict the dynamics with relatively high accuracy (the mean absolute error (MAE) is 0.038; see Fig. 6). This indicates that our method can perform well on realistic gene regulatory dynamics.
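The AUC scores used throughout these comparisons can be reproduced in spirit by ranking inferred link probabilities against the ground-truth adjacency matrix. The following is a minimal sketch of that evaluation (a rank-based AUC over all entries), not the paper's evaluation code.

```python
import numpy as np

def link_auc(theta, A_true):
    """AUC for network inference.

    Equals the probability that a randomly chosen true edge receives a
    higher inferred score than a randomly chosen non-edge (ties count 0.5).
    """
    scores = theta.ravel()
    labels = A_true.ravel().astype(bool)
    pos, neg = scores[labels], scores[~labels]
    wins = 0.0
    for p in pos:
        wins += np.sum(p > neg) + 0.5 * np.sum(p == neg)
    return wins / (len(pos) * len(neg))
```

A perfect ranking of edges over non-edges gives 1.0, and an uninformative constant score gives 0.5, the chance level against which the AUC values in Table 1 and Fig. 6(a) are read.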
Discussion
Figure 5. The control experiments on learned models. (a) shows the spring network that we studied in the first experiment. The three large nodes are driver nodes, and the others are target nodes. The control objective is to require all masses to have the same movement direction. (b) shows the final movement states of all the target nodes under control. (c) shows two MSE curves evaluating goal achievement versus the time steps of the control: one represents the results of the learned model, and the other those of the ground truth. (d) is the coupled map network that we studied in the second experiment. Two large nodes were selected as driver nodes. The control objective is to require all oscillators to take the same value of 0.6, which is the mean of the value range of all nodes. (e) shows the oscillations of all target nodes during control. (f) shows two MSE curves evaluating goal achievement versus the time steps of the control: one for the trained model and the other for the ground truth. The parameters of these experiments are given in Supplementary Section 9.

In this paper, we propose a unified framework of automated interaction and dynamics discovery, called AIDD, for large networked dynamical systems. We also propose a new standard based on control tasks to evaluate whether the true network dynamics have been learned.

The main highlights of AIDD are scalability, universality, and robustness. The high scalability is reflected by the fact that the model can be applied to large networks with thousands of nodes with more than 90% accuracy, because the training procedure can be carried out node by node. AIDD is a universal framework because it can be applied to various types of dynamics, including continuous, discrete, and binary dynamics. AIDD is robust not only to noisy input signals but also to unobservable nodes: it can recover an entire network with more than 90% accuracy even when some time series data is missing. AIDD was also shown to work well on datasets generated by GeneNetWeaver, which emulates the real environment of gene regulatory network dynamics.

Furthermore, we propose a new method based on control to test the validity of a learned model. We optimise two controllers, one on the real system and the other on the learned model, and then apply both to the real system. If they behave similarly, we conclude that the learned model can be a substitute for the real system. Control experiments on Spring and CMN dynamics on small networks show that well-trained AIDD models can replace the real systems.

This framework has many potential applications. For example, AIDD can be used to infer missing links from dynamics information. AIDD can also be used for time series forecasting. In contrast to other forecasting models, AIDD outputs an explicit binary network, which can provide deeper insights into element interactions and potential causal links, increasing the explainability of the model.

However, AIDD has some drawbacks. First, a large amount of training data, especially time series from diverse initial conditions, is required to obtain a good model. Nevertheless, it is difficult to obtain different time series under a variety of initial conditions.
Although we can split a long time series into segments and treat the first values of each segment as new initial conditions, the diversity of initial conditions obtained in this way is often not high enough to train an AIDD model of high quality. Hence, we may develop new models that are suitable for small data.

Second, all the dynamics considered in this paper are Markovian, but this property is rarely satisfied in real cases. New extensions and experiments on non-Markovian dynamics should be conducted; for example, we could use a recurrent neural network instead of a feed-forward network as the dynamics learning component.

Finally, our network generator samples networks according to the naive mean-field assumption. Although good results have been obtained on network inference, correlations between nodes are ignored. Thus, we could use generative graph models to replace the Bernoulli network generator so that correlations and inductive biases on structures can be considered.
Figure 6. Performance of AIDD and other methods on network inference for the gene regulatory network of yeast S. cerevisiae with 100 nodes. (a) ROC curves of different network inference methods. The compared methods are the Bayesian network (BN), partial correlation (PC), mutual information (MI), and AIDD; the AUC of each method is given in the legend. (b) Comparison between the observed time series of the expression data (real) and the predicted data for selected genes. In this plot, the solid lines represent the predictions and the dotted lines represent the observed data.
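For concreteness, the AUC used in the ROC comparison above can be computed directly from continuous edge scores and a binary ground-truth adjacency matrix via the Mann-Whitney rank statistic. The sketch below is illustrative only: the helper name `network_auc` and the toy arrays are our assumptions, not part of the released code.

```python
import numpy as np

def network_auc(scores, truth):
    """ROC AUC of continuous edge scores against a binary ground-truth
    adjacency matrix, computed via the Mann-Whitney U statistic."""
    s = np.asarray(scores, dtype=float).ravel()
    t = np.asarray(truth).ravel().astype(bool)
    pos, neg = s[t], s[~t]
    # Fraction of (true-edge, non-edge) pairs ranked correctly; ties count half.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# A perfect scorer ranks every true edge above every non-edge:
truth = np.array([[0, 1, 0],
                  [1, 0, 1],
                  [0, 1, 0]])
scores = np.where(truth == 1, 0.9, 0.1)
print(network_auc(scores, truth))  # 1.0
```

An uninformative scorer that assigns the same value to every candidate edge yields an AUC of 0.5, the chance level against which the methods in the figure are compared.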
Methods
AIDD
The framework consists of two parts: a network generator and a dynamics learner. The input of the model is the state information of all nodes at time t, and the output of the model is the predicted state information of all nodes at time t + 1. The inferred adjacency matrix Â can also be read out from the network generator.

The network generator is simply a differentiable matrix sampler parameterized by N parameters Θ, as described in the model sub-section of the main text. The dynamics learners, however, differ between tasks; we describe the details below.

Architecture of the Dynamics Learning module
In all experiments except the final one (the gene regulatory network inference from GeneNetWeaver-generated data), we used the same neural network structure to construct the dynamics learning module, as shown in Fig. 7. The network structure and weights are shared by all nodes. This design has been verified to be suitable for learning various complex dynamics [19, 24]. In all experiments, we set the hidden layer size of the dynamics learner to 128. The parameter τ in the network generator was set to 1. We set K to the number of epochs because Â_{·i}(Θ) is sampled at each epoch.

The λ in the objective function is the coefficient of the sparsity term. In the experiments where the number of nodes is less than 1,000, λ is set to 0; λ is set to 0.001 for larger networks. More details about the parameters can be found in the Supplementary Section 9.
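The per-epoch sampling of the adjacency matrix can be sketched as a binary-concrete (Gumbel-softmax) relaxation of independent Bernoulli edges. The function below is an illustrative numpy sketch, not the repository implementation; in particular, the name `sample_adjacency` and the treatment of τ as an inverse temperature (so that τ → ∞ hardens the samples towards 0/1, matching the hardening described later in the Methods) are our assumptions.

```python
import numpy as np

def sample_adjacency(theta, tau=1.0, rng=None):
    """Gumbel-softmax (binary concrete) sample of an adjacency matrix.

    theta: (N, N) logits for the existence of each directed edge.
    tau:   treated here as an inverse temperature, so tau -> infinity
           pushes every entry towards a hard 0 or 1.
    """
    rng = np.random.default_rng() if rng is None else rng
    # One Gumbel(0, 1) draw per category ("edge" vs. "no edge").
    g1 = -np.log(-np.log(rng.uniform(size=theta.shape)))
    g0 = -np.log(-np.log(rng.uniform(size=theta.shape)))
    # A two-category softmax reduces to a sigmoid of the logit difference;
    # clipping avoids overflow warnings at extreme temperatures.
    z = np.clip(tau * (theta + g1 - g0), -500.0, 500.0)
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
theta = rng.normal(size=(4, 4))
A_soft = sample_adjacency(theta, tau=1.0, rng=rng)   # relaxed entries in [0, 1]
A_hard = sample_adjacency(theta, tau=1e6, rng=rng)   # entries close to 0 or 1
```

Because the sample is a differentiable function of Θ, gradients of the prediction loss can flow back into the generator parameters, which is the property the training procedure relies on.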
Architecture of AIDD for gene regulatory network inference
We adopt a heterogeneous network structure in the task of gene regulatory network inference because the regulation dynamics and parameters are heterogeneous across genes. In detail, the dynamics learning module consists of several multi-layer perceptrons (MLPs), with each MLP corresponding to a single gene. The input of MLP i is the vector produced by the element-wise product of the gene expression vector X^t and the column vector Â_i of the adjacency matrix, which represents the regulation by transcription factors (TFs) acting on gene i. The output of the MLP is X̂_i^{t+1}, the estimated concentration of gene i at time t + 1. The concatenation of all MLP outputs is the estimated gene expression vector X̂^{t+1} at time t + 1. We can then compare the estimate X̂^{t+1} with the real expression values X^{t+1} to compute the loss function, and apply the stochastic gradient descent algorithm to minimise it. The structure diagram of the model is shown in Fig. 8.

Training
We separate the training data into batches and use the stochastic gradient descent algorithm to update the parameters Θ and Φ step by step, with different learning rates lr_θ and lr_φ respectively, until the number of training epochs exceeds a threshold. The gradients can be computed directly by automatic differentiation on the PyTorch platform because all steps in our framework are differentiable. To improve efficiency, we sampled one adjacency matrix at each epoch and updated the parameters immediately. For the complete algorithm, readers are referred to the Supplementary Section 1. We implemented the gradient descent algorithm with the Adam optimizer, and all algorithms were run on an RTX 2080Ti (11 GB).

Figure 7. The feedforward process of the dynamics learning module. It can be divided into five steps. (1) Node to edge: aggregate the original node information to form edge representations. (2) Edge to edge: update the edge representations. (3) Edge to node: aggregate the information on the neighbouring edges of each node to form a new feature vector for that node. (4) Node to node: update the node representations. (5) Output: concatenate the node representations with the input state vector of node i, feed the result into a feedforward network, and output the prediction of the next state of node i.

Network Inference and Time Series Forecasting
After sufficient training, the optimised parameters Θ* and Φ* are obtained. We can then use the network generator to sample complete adjacency matrices as the inferred networks, by setting Θ = Θ* and τ → ∞ to obtain absolute 0s and 1s. A single-time-step prediction of the states of all nodes can be sampled by the dynamics learner, x̂^{t′+1} ∼ P(X̂^{t′+1} | x^{t′}, Â(Θ*), Φ*), where Â(Θ*) is an adjacency matrix sampled with the optimal parameters. Multiple-time-step predictions can be obtained in an independent rollout manner [70], that is, by sampling the states recursively, x̂^{t′+1} ∼ P(X̂^{t′+1} | x̂^{t′}, Â(Θ*), Φ*) for all t′ > 0. Note that x̂^{t′} represents the sample of X̂^{t′} according to the dynamics learner.

The area under the ROC curve (AUC) is used to evaluate network inference, and the mean square error (MSE, for continuous values) or the AUC (for binary values) is used to evaluate time series forecasting. More details on training and evaluation can be found in the Supplementary Sections 8 and 9.

Data availability
We used the default parameters of DREAM4_In-Silico in the GeneNetWeaver software to generate the gene network data; GeneNetWeaver can be downloaded from https://github.com/tschaffter/gnw [67]. Three social networks (Email, Dorm, and Blog) and a road network (Road) [68] can be found at http://networkrepository.com/email-enron-only.php.

Code availability
The AIDD code repository can be found at https://github.com/kby24/AIDD. The repository includes example code for generating data and the algorithms of AIDD.
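To make the gene-regulatory architecture concrete, the forward pass of the heterogeneous per-gene dynamics learner described in the Methods can be sketched as below. This is an illustrative single-hidden-layer numpy version under our own weight layout; the function name `grn_forward` and the `(W1, b1, W2, b2)` convention are assumptions, not the released code.

```python
import numpy as np

def grn_forward(x_t, a_hat, weights):
    """One prediction step of a heterogeneous per-gene dynamics learner.

    x_t:     (N,) gene expression vector at time t.
    a_hat:   (N, N) sampled adjacency matrix; column i masks x_t down to
             the putative regulators of gene i.
    weights: one (W1, b1, W2, b2) tuple per gene -- a separate
             single-hidden-layer MLP for each gene.
    """
    preds = []
    for i, (W1, b1, W2, b2) in enumerate(weights):
        masked = x_t * a_hat[:, i]          # element-wise product with column i
        h = np.tanh(masked @ W1 + b1)       # hidden layer of MLP i
        preds.append(float(h @ W2 + b2))    # estimated expression of gene i at t+1
    return np.array(preds)

rng = np.random.default_rng(0)
N, H = 3, 8
weights = [(rng.normal(size=(N, H)), np.zeros(H),
            rng.normal(size=H), 0.0) for _ in range(N)]
x_t = rng.uniform(size=N)
a_hat = (rng.uniform(size=(N, N)) < 0.5).astype(float)
x_next = grn_forward(x_t, a_hat, weights)   # shape (N,)
```

The masking step is what couples the dynamics learner to the network generator: if column i of the sampled adjacency matrix removes a gene, that gene cannot influence the prediction for gene i, so the prediction loss carries a gradient signal about which edges should exist.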
Figure 8. The architecture of AIDD in the task of gene regulatory network inference. The same network generator is used, but a different dynamics learner is set for each node. In the forward process, the adjacency matrix is generated by the network generator through Gumbel-softmax sampling. Then, the element-wise product between the gene expression vector X^t at time t and column i of the adjacency matrix is calculated as the input of the corresponding dynamics learner i. Subsequently, dynamics learner i, which is an MLP, computes the output X̂_i^{t+1}, the estimate of gene i's expression at time t + 1.

References

1. Siegenfeld, A. F. & Bar-Yam, Y. An introduction to complex systems science and its applications. Complexity (2020).
2. Boccaletti, S., Latora, V., Moreno, Y., Chavez, M. & Hwang, D.-U. Complex networks: structure and dynamics. Phys. Rep., 175–308 (2006).
3. Bullmore, E. & Sporns, O. Complex brain networks: graph theoretical analysis of structural and functional systems. Nat. Rev. Neurosci., 186–198 (2009).
4. Watts, D. J. & Strogatz, S. H. Collective dynamics of 'small-world' networks. Nature, 440–442 (1998).
5. Runge, J., Nowack, P., Kretschmer, M., Flaxman, S. & Sejdinovic, D. Detecting and quantifying causal associations in large nonlinear time series datasets. Sci. Adv., eaau4996 (2019).
6. Liu, Y.-Y. & Barabási, A.-L. Control principles of complex systems. Rev. Mod. Phys., 035006 (2016).
7. Wang, W.-X., Lai, Y.-C. & Grebogi, C. Data based identification and prediction of nonlinear and complex dynamical systems. Phys. Rep., 1–76 (2016).
8. Ha, S. & Jeong, H. Deep learning reveals hidden interactions in complex systems. Preprint at https://arxiv.org/abs/2001.02539 (2020).
9. Brockwell, P. J., Davis, R. A. & Fienberg, S. E. Time Series: Theory and Methods (Springer Science & Business Media, 1991).
10. Benidis, K. et al. Neural forecasting: introduction and literature overview. Preprint at https://arxiv.org/abs/2004.10240 (2020).
11. Scarselli, F., Gori, M., Tsoi, A. C., Hagenbuchner, M. & Monfardini, G. The graph neural network model. IEEE Trans. Neural Netw., 61–80 (2008).
12. Battaglia, P. W. et al. Relational inductive biases, deep learning, and graph networks. Preprint at https://arxiv.org/abs/1806.01261 (2018).
13. Wu, Z. et al. A comprehensive survey on graph neural networks. IEEE Trans. Neural Netw. Learn. Syst., 4–24 (2021).
14. Zhou, J. et al. Graph neural networks: a review of methods and applications. Preprint at https://arxiv.org/abs/1812.08434 (2018).
15. Zhang, Z., Cui, P. & Zhu, W. Deep learning on graphs: a survey. IEEE Trans. Knowl. Data Eng.
16. Yu, B., Yin, H. & Zhu, Z. Spatio-temporal graph convolutional networks: a deep learning framework for traffic forecasting. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI, 3634–3640 (ijcai.org, 2018).
17. Sanchez-Gonzalez, A. et al. Graph networks as learnable physics engines for inference and control. In International Conference on Machine Learning, 4470–4479 (PMLR, 2018).
18. Bapst, V. et al. Unveiling the predictive power of static structure in glassy systems. Nat. Phys., 448–454 (2020).
19. Kipf, T., Fetaya, E., Wang, K.-C., Welling, M. & Zemel, R. Neural relational inference for interacting systems. In International Conference on Machine Learning, 2688–2697 (PMLR, 2018).
20. Zhang, Z. et al. Neural gene network constructor: a neural based model for reconstructing gene regulatory network. Preprint at (2019).
21. Zheng, C., Fan, X., Wang, C. & Qi, J. GMAN: a graph multi-attention network for traffic prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, 1234–1241 (2020).
22. Guo, S., Lin, Y., Feng, N., Song, C. & Wan, H. Attention based spatial-temporal graph convolutional networks for traffic flow forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, 922–929 (2019).
23. Alet, F., Weng, E., Lozano-Pérez, T. & Kaelbling, L. P. Neural relational inference with fast modular meta-learning. Adv. Neural Inf. Process. Syst., 11827–11838 (2019).
24. Zhang, Z. et al. A general deep learning framework for network reconstruction and dynamics learning. Appl. Netw. Sci., 1–17 (2019).
25. Pareja, A. et al. EvolveGCN: evolving graph convolutional networks for dynamic graphs. In Proceedings of the AAAI Conference on Artificial Intelligence, 5363–5370 (2020).
26. Franceschi, L., Niepert, M., Pontil, M. & He, X. Learning discrete structures for graph neural networks. In International Conference on Machine Learning, 1972–1982 (PMLR, 2019).
27. Pearl, J. Causality (Cambridge University Press, 2009).
28. Tank, A., Covert, I., Foti, N., Shojaie, A. & Fox, E. Neural Granger causality for nonlinear time series. Preprint at https://arxiv.org/abs/1802.05842 (2018).
29. Löwe, S., Madras, D., Zemel, R. & Welling, M. Amortized causal discovery: learning to infer causal graphs from time-series data. Preprint at https://arxiv.org/abs/2006.10833 (2020).
30. Glymour, C., Zhang, K. & Spirtes, P. Review of causal discovery methods based on graphical models. Front. Genet., 524 (2019).
31. Peng, J., Wang, P., Zhou, N. & Zhu, J. Partial correlation estimation by joint sparse regression models. J. Am. Stat. Assoc., 735–746 (2009).
32. Stuart, J. M., Segal, E., Koller, D. & Kim, S. K. A gene-coexpression network for global discovery of conserved genetic modules. Science, 249–255 (2003).
33. Stetter, O., Battaglia, D., Soriano, J. & Geisel, T. Model-free reconstruction of excitatory neuronal connectivity from calcium imaging signals. PLoS Comput. Biol., e1002653 (2012).
34. Nitzan, M., Casadiego, J. & Timme, M. Revealing physical interaction networks from statistics of collective dynamics. Sci. Adv., e1600396 (2017).
35. Timme, M. & Casadiego, J. Revealing networks from dynamics: an introduction. J. Phys. A: Math. Theor., 343001 (2014).
36. Liu, H., Kim, J. & Shlizerman, E. Functional connectomics from neural dynamics: probabilistic graphical models for neuronal network of Caenorhabditis elegans. Philos. Trans. R. Soc. B: Biol. Sci., 20170377 (2018).
37. Runge, J. Causal network reconstruction from time series: from theoretical assumptions to practical estimation. Chaos: An Interdiscip. J. Nonlinear Sci., 075310 (2018).
38. Casadiego, J., Nitzan, M., Hallerberg, S. & Timme, M. Model-free inference of direct network interactions from nonlinear collective dynamics. Nat. Commun., 1–10 (2017).
39. Li, L., Xu, D., Peng, H., Kurths, J. & Yang, Y. Reconstruction of complex network based on the noise via QR decomposition and compressed sensing. Sci. Rep., 1–13 (2017).
40. Granger, C. W. Investigating causal relations by econometric models and cross-spectral methods. Econometrica: J. Econom. Soc.
41. Sugihara, G. et al. Detecting causality in complex ecosystems. Science, 496–500 (2012).
42. Ma, C., Chen, H.-S., Lai, Y.-C. & Zhang, H.-F. Statistical inference approach to structural reconstruction of complex networks from binary time series. Phys. Rev. E, 022301 (2018).
43. Pathak, J., Hunt, B., Girvan, M., Lu, Z. & Ott, E. Model-free prediction of large spatiotemporally chaotic systems from data: a reservoir computing approach. Phys. Rev. Lett., 024102 (2018).
44. Velickovic, P. et al. Graph attention networks. In International Conference on Learning Representations (OpenReview.net, 2018).
45. Wang, D. et al. Learning attribute-structure co-evolutions in dynamic graphs. Preprint at https://arxiv.org/abs/2007.13004 (2020).
46. Velickovic, P. et al. Pointer graph networks. Stat, 11 (2020).
47. Wu, Z., Pan, S., Long, G., Jiang, J. & Zhang, C. Graph WaveNet for deep spatial-temporal graph modeling. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19, 1907–1913 (IJCAI, 2019).
48. Li, Y., Meng, C., Shahabi, C. & Liu, Y. Structure-informed graph auto-encoder for relational inference and simulation. In ICML Workshop on Learning and Reasoning with Graph-Structured Data (2019).
49. Ayed, I., de Bézenac, E., Pajot, A., Brajard, J. & Gallinari, P. Learning dynamical systems from partial observations. Preprint at https://arxiv.org/abs/1902.11136 (2019).
50. Pearl, J. & Mackenzie, D. The Book of Why: The New Science of Cause and Effect (Basic Books, 2018).
51. Baggio, G., Bassett, D. S. & Pasqualetti, F. Data-driven control of complex networks. Preprint at https://arxiv.org/abs/2003.12189 (2020).
52. Baydin, A. G., Barak, P., Radul, A. A. & Siskind, J. Automatic differentiation in machine learning: a survey. J. Mach. Learn. Res., 1–43 (2018).
53. Gardiner, C. W. et al. Handbook of Stochastic Methods (Springer, Berlin, 1985).
54. Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput., 1735–1780 (1997).
55. Schölkopf, B. Causality for machine learning. Preprint at https://arxiv.org/abs/1911.10500 (2019).
56. Jang, E., Gu, S. & Poole, B. Categorical reparameterization with Gumbel-softmax. In Fifth International Conference on Learning Representations, ICLR (OpenReview.net, 2017).
57. Doersch, C. Tutorial on variational autoencoders. Preprint at https://arxiv.org/abs/1606.05908 (2016).
58. Lee, J. et al. Deep neural networks as Gaussian processes. In Sixth International Conference on Learning Representations, ICLR (OpenReview.net, 2018).
59. Brockmann, D. & Helbing, D. The hidden geometry of complex, network-driven contagion phenomena. Science, 1337–1342 (2013).
60. Karlebach, G. & Shamir, R. Modelling and analysis of gene regulatory networks. Nat. Rev. Mol. Cell Biol., 770–780 (2008).
61. Barzel, B. & Barabási, A.-L. Network link prediction by global silencing of indirect correlations. Nat. Biotechnol., 720–725 (2013).
62. Garcia, P., Parravano, A., Cosenza, M., Jiménez, J. & Marcano, A. Coupled map networks as communication schemes. Phys. Rev. E, 045201 (2002).
63. Jalan, S., Amritkar, R. & Hu, C.-K. Synchronized clusters in coupled map networks. I. Numerical studies. Phys. Rev. E, 016211 (2005).
64. Campbell, A., Gurin, G. & Miller, W. E. The Voter Decides (Row, Peterson, and Co., 1954).
65. Erdős, P. & Rényi, A. On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci., 17–60 (1960).
66. Barabási, A.-L. & Albert, R. Emergence of scaling in random networks. Science, 509–512 (1999).
67. Schaffter, T., Marbach, D. & Floreano, D. GeneNetWeaver: in silico benchmark generation and performance profiling of network inference methods. Bioinformatics, 2263–2270 (2011).
68. Rossi, R. & Ahmed, N. The network data repository with interactive graph analytics and visualization. In Proceedings of the AAAI Conference on Artificial Intelligence, 4292–4293 (AAAI Press, 2015).
69. Chen, M. et al. Inference for network structure and dynamics from time series data via graph neural network. Preprint at https://arxiv.org/abs/2001.06576 (2020).
70. Tang, C. & Salakhutdinov, R. R. Multiple futures prediction. In Advances in Neural Information Processing Systems, 15398–15408 (2019).

Acknowledgements
We thank Prof. Qinghua Chen, Dr. Lifei Wang, and the workshops of Swarma Club for helpful discussions. We acknowledge the support of the National Natural Science Foundation of China (NSFC) under grant number 61673070.
Author contributions statement
J.Z. conceived the experiments; Y.Z. and Y.G. conducted most of the experiments; Z.Z. and S.W. conducted the experiments on gene networks; M.Y.C. conducted the robustness experiments. J.Z., Y.Z., and Y.G. wrote the manuscript, and all authors reviewed it.
Competing interests