Detecting structural perturbations from time series with deep learning
Edward Laurence∗, Charles Murphy, Guillaume St-Onge, Xavier Roy-Pomerleau, and Vincent Thibeault
Département de Physique, de Génie Physique, et d'Optique, Université Laval, Québec (Québec), Canada G1V 0A6; Centre interdisciplinaire en modélisation mathématique de l'Université Laval, Québec, G1V 0A6, Canada
∗To whom correspondence should be addressed. E-mail: [email protected]
Small disturbances can trigger functional breakdowns in complex systems. A challenging task is to infer the structural cause of a disturbance in a networked system soon enough to prevent a catastrophe. We present a graph neural network approach, borrowed from the deep learning paradigm, to infer structural perturbations from functional time series. We show that our data-driven approach outperforms typical reconstruction methods while meeting the accuracy of Bayesian inference. We validate the versatility and performance of our approach with epidemic spreading, population dynamics, and neural dynamics, on various network structures: random networks, scale-free networks, 25 real food-web systems, and the C. Elegans connectome. Moreover, we report that our approach is robust to data corruption. This work uncovers a practical avenue to study the resilience of real-world complex systems.

self-supervised learning | graph neural networks | complex networks | nonlinear dynamics

Complex systems can shift abruptly and irreversibly into pathological states following a disturbance. Mass extinctions, stock market crashes, and lake eutrophication are concrete examples of such catastrophes [18, 30, 43]. For systems with a clear underlying network structure, stresses may take the form of structural defects such as the removal of nodes or edges. These perturbations bring the system closer to a tipping point until an obvious dynamical shift is observed [41]. To prevent breakdowns from occurring, early detection is essential but notoriously difficult [25]. Indeed, the effect of the disturbance can be negligible and comparable to the system's noise [42]. Even more difficult than early detection of catastrophes is the challenge of inferring the removed edges and nodes. Given observations of a network dynamics, the issue of identifying structural defects has remained largely unexplored. Most closely related are numerous studies that tackle the problem of detecting early-warning signals of functional transitions [10, 42, 9, 48].
Mostly grouped under the critical slowing down paradigm, these methods are far from universal, as many have reported erroneous detections or failed to forecast an eventual catastrophe [5, 47, 19, 22]. Moreover, few early-warning signals take into account the structure of interactions, even though such data may be available, so the structural cause of the disturbance cannot be inferred. Other approaches, coming from the study of dynamical complex networks, investigate the functional effect of removing edges [13, 20, 24]. However, they are restricted to specific dynamical models which are generally inadequate to describe the rich behavior of real complex systems.

Reconstruction methods have frequently been used, especially in neuroscience, to infer a functional structure from complex time series [44, 4, 46, 40, 8, 44, 6]. Most of these methods shine by their simplicity, as they usually do not require any parameter fitting and can readily be applied to almost any type of dynamics. While originally designed to reconstruct whole networks, they nevertheless seem a natural fit to identify structural defects. Yet, the precision of the reconstructed structure is often limited when these methods are applied to real-world datasets. They tend to detect a large variety of functional (dense) relationships between the nodes, which are only indirectly related to the (sparse) structure [12, 35]. Other reconstruction approaches rely on assumptions about dynamical [37, 45] or structural models [50, 32, 34]. Their main assets are their high accuracy on toy models with ground-truth data and their quantification of the confidence interval of the reconstructed structure. However, these approaches are limited to specific instances where the model assumptions are reasonable [7], which limits their scope for real-world applications.

In this paper, we directly tackle the issue of inferring structural perturbations by introducing a method based on graph neural networks oriented toward real-world applications.
We achieve this by training a deep learning model to forecast complex dynamics after being trained on time series of activity with a given original network. Our approach is self-supervised and predicts structural defects without prior knowledge of the nature of the perturbations, nor of the dynamical process that generated the time series. It can thus be applied to various dynamics and networks with minimal modifications. We benchmark its performance on epidemiological, population, and neural dynamics over various synthetic and real networks. We show that it outperforms standard reconstruction methods while achieving a high level of precision, and that it proves to be greatly robust to noisy datasets.

Our work provides a novel method to study and monitor stressed complex networks. Beyond paving the way for bringing deep learning frameworks into the study of dynamical networks, our versatile approach can be used to address concrete problems of real-world systems. To name a few, these include monitoring ecological systems [39], documenting the temporal evolution of gene regulatory networks [17], detecting leaks in water flow networks [38], and evaluating pathological structures in brain disorders [27].
1. Context
Consider a graph $G$ composed of $N$ nodes and a set $E$ of $M = |E|$ edges. We denote the adjacency matrix by $A$, whose element $a_{ij} = 1$ if there exists an interaction from node $j$ to node $i$ and $a_{ij} = 0$ otherwise. We consider $T$ discrete observations of the nodes' activity, represented by a time series $X = \{x(t) \in \mathbb{R}^N\}_{t=1\ldots T}$, where the element $x_i(t) = [x(t)]_i$ is the activity of node $i$ at time $t$. The time series is generated from a hidden and potentially stochastic dynamical mechanism,
$$X = \mathcal{M}[A, x(0), \theta], \qquad (1)$$
where $x(0)$ is the initial state of the system and $\theta$ are unknown parameters. The dynamics can take any form respecting the condition that the time evolution of a node's activity depends only on itself and on the activity of its neighbors. This ensures that the adjacency matrix truly governs the interactions.

We consider the following scenario. For $t < \tau$, the dynamics governed by (1) takes place on the original graph whose edges are $E$. At $t = \tau > 0$, the graph is perturbed by removing a set $dE \subset E$ of edges. The effect of the perturbation is a shift in the adjacency matrix, $A^* = A - P$, where $P$ is called the perturbation matrix. For $t \geq \tau$, the dynamics takes place on the perturbed graph of adjacency matrix $A^*$. Note that the change point $\tau$ is only introduced to simplify the notation and is assumed to be known. Our analysis could have been done equivalently with the more general scheme of two distinct and disconnected time series: the dynamics occurring on the original graph and the dynamics occurring on the perturbed graph.

We now address the task of inferring the perturbation matrix $P$ from the initial adjacency matrix $A$, the time series $X$, and the moment of perturbation $\tau$. Three examples illustrating the task are displayed in Fig. 1. Reasonably, this problem can be solved by inferring the perturbed structure, $A^* \approx \hat{A} = f(X)$, from the observation of the nodes' activity. The estimated perturbation matrix $\hat{P}$ is composed of the original edges missing from the inferred structure, i.e., $\hat{P} = A - \hat{A}$. However, reconstructing networks from time series is a notoriously hard problem and, as yet, no universal method exists [29]. Assuming that the structure $A$ and the nodes' activity before the perturbation ($t < \tau$) are known, properly incorporating these features into the reconstruction process becomes a critical facet of the inference methods, i.e., $\hat{A} = f(A, X)$.

We introduce a graph neural network (GNN) to leverage the given structural information. In recent years, GNNs have been developed for structural datasets [49] with graph-based tasks [23, 15].
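As a concrete illustration of this bookkeeping, the perturbation and its estimate can be written in a few lines of numpy (the graph, the removed edge, and the perfect reconstruction are toy choices for illustration only):

```python
import numpy as np

# Toy undirected graph on 4 nodes: edges (0,1), (1,2), (2,3), (0,3)
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]])

# Perturbation at t = tau: remove the edge (1, 2)
P = np.zeros_like(A)
P[1, 2] = P[2, 1] = 1          # perturbation matrix (symmetric for an undirected graph)
A_star = A - P                 # perturbed adjacency matrix A* = A - P

# Given a reconstruction A_hat of A*, the estimated perturbation is
A_hat = A_star.copy()          # pretend the reconstruction is perfect
P_hat = A - A_hat              # recovers exactly the removed edge
```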
The model is trained on the self-supervised task of forecasting the nodes' activity while optimizing its numerous internal parameters. More precisely, our GNN model forecasts a node's activity from previous dynamical states of the neighbors,
$$\hat{x}(t+1) = \begin{cases} \mathrm{GNN}(A, X^{(a)}(t), \Lambda), & \text{if } t < \tau, \\ \mathrm{GNN}(A - \sigma(\hat{h}_{\mathrm{GNN}}), X^{(a)}(t), \Lambda), & \text{if } t \geq \tau, \end{cases} \qquad (2)$$
where $X^{(a)}(t) = \{x(t')\}_{t'=t-a,\ldots,t}$ is the window of $a$ past steps of activity, $\hat{x}(t+1)$ is the forecasted activity at time $t+1$, $\Lambda$ and $\hat{h}_{\mathrm{GNN}} \in \mathbb{R}^{N \times N}$ are trainable parameters, and $\sigma(\cdot)$ is the sigmoid function. In (2), the graph of interactions is given by the structure $A - \sigma(\hat{h}_{\mathrm{GNN}})$ if the forecast follows the perturbation ($t \geq \tau$), and simply by $A$ otherwise. Hence, the perturbation and the dynamical mechanism are parametrized separately, using $\hat{h}_{\mathrm{GNN}}$ and $\Lambda$ respectively. The inferred perturbation matrix is given by $\hat{P} = \sigma(\hat{h}_{\mathrm{GNN}})$. Note that the sigmoid function $\sigma(\cdot)$ is used to limit the perturbation amplitude between 0 and 1.

During training, examples of previous and future activities are presented to the model, and the error on a loss function $\mathcal{L}[\hat{x}, x]$ between the forecasted activity $\hat{x}$ and the observed activity $x$ is backpropagated through the model
for parameter optimization [see the Materials and Methods for details]. Eventually, the model reaches a stable minimum of the loss function, and the perturbation matrix can be estimated as $\hat{P} = \sigma(\hat{h}_{\mathrm{GNN}})$. Model and training details are provided in the Materials and Methods section. By design, the GNN is inductive, which means that its predictions are independent of the target node's identity and depend only on the states of the target node and its neighbors. The learned dynamical mechanism is shared among all nodes, although the neighborhood activity may differ. It also means that forecasting all $N$ node states is equivalent to operating $N$ independent forecasts, one for each node of the graph. Hence, the number of parameters of $\Lambda$ does not have to scale with the number of nodes in the graph, and the GNN model can be lightweight memory-wise and computationally efficient.

Fig. 1: Examples of GNN predictions and time series for epidemic spreading on a scale-free network (A-B-C), population dynamics on a real ecological network [1] (D-E-F), and neural dynamics on the C. Elegans connectome (G-H-I). Halfway through the time series (black lines in B-E-H), three random edges, shown by the thick red edges in (A-D-G), are removed from the original networks displayed in (A-D-G). The bars of (C-F-I) show the scores predicted by the GNN model on individual edges, with the true removed edges colored in red. The ROC curves of the GNN predictions are shown with the corresponding AUC (A-C: 0.98; D-F: 1.00; G-I: 0.99).
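A minimal numpy sketch of the two-regime forecast of Eq. (2); here, a single linear aggregation step stands in for the trained GNN, and all weights are arbitrary placeholders rather than the paper's architecture:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forecast(A, x, h, t, tau, w_self=0.4, w_neigh=0.2):
    """One-step forecast in the spirit of Eq. (2): before the change point
    tau, the given adjacency A is used; afterwards, the effective structure
    is A - sigmoid(h), where sigmoid(h) plays the role of the inferred
    perturbation matrix P_hat."""
    A_eff = A if t < tau else A - sigmoid(h)
    return sigmoid(w_self * x + w_neigh * (A_eff @ x))

rng = np.random.default_rng(0)
N = 5
A = (rng.random((N, N)) < 0.5).astype(float)
np.fill_diagonal(A, 0)
h = rng.normal(size=(N, N))       # trainable perturbation logits h_GNN
x = rng.random(N)                 # current activity x(t)

x_before = forecast(A, x, h, t=3, tau=10)    # regime t < tau: uses A
x_after = forecast(A, x, h, t=12, tau=10)    # regime t >= tau: uses A - sigmoid(h)
```

In the actual model, the weights and $\hat{h}_{\mathrm{GNN}}$ would be fitted by backpropagating the forecasting loss, so that $\sigma(\hat{h}_{\mathrm{GNN}})$ converges toward the perturbation.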
2. Results
We introduce two types of inference methods to compare with the GNN model. The first type consists of functional reconstruction algorithms. The adjacency matrix after the perturbation, $A^*$, is approximated by ad hoc metrics such as the correlation matrix [4] or Granger causality [14, 6, 8, 40, 44], and the perturbation matrix is estimated by finding the original edges missing from the inferred structure, $\hat{P} = A - \hat{A}$.

For the second type, we assume that the dynamical mechanisms are perfectly known. This removes the challenge of learning the dynamics and focuses the task on detecting the perturbation. In the context of stochastic dynamics, we develop a Bayesian inference method, inspired by Ref. [37], that estimates the perturbation by sampling the posterior distribution $\Pr(P \mid A, X, \theta)$. This method is deeply advantaged compared to the GNN model, as the dynamical mechanism $\mathcal{M}[\cdot, \cdot, \theta]$ is assumed to be known and is used explicitly to compute the posterior distribution. Therefore, it serves as an idealized reference point against which to compare our GNN model.

We measure the performance of the algorithms using the area under the curve (AUC) of the ROC curve. The AUC can be interpreted as the probability that the model distinguishes whether an edge is present in or absent from the perturbation set $dE$, independently of the threshold applied to the estimated perturbation matrix $\hat{P}$. Note that an AUC equal to 0.5 is achieved by a uniform and non-informative baseline, whereas an AUC equal to 1 indicates that all the edges from the ground-truth perturbation have the top scores in the inferred perturbation matrix $\hat{P}$.

We evaluate the GNN model on an epidemic spreading dynamics called the susceptible-infectious-susceptible
model (SIS) [36]. Nodes can be either infected ($x_i = 1$) or susceptible ($x_i = 0$). At each time step, infected nodes can infect their susceptible neighbors with probability $\alpha$. Infected individuals, on the other hand, become susceptible again with probability $\beta$. An example of a time series and a prediction of the perturbations is given in Fig. 1(A,B,C).

Fig. 2: Area under the curve for SIS dynamics on Erdős–Rényi networks. (A) With different network connection probabilities $p$ and $N = 100$, $T = 300$, $\alpha = \beta = 0.$ (B) With different time series lengths and $N = 100$, $p = 0.$, and transition probabilities $\alpha = 0.$, $\beta = 0.$ (C) AUC under various flip probabilities with $N = 100$, $p = 0.$, $T = 300$, $\alpha = \beta = 0.$ Each symbol is the average over 100 independent simulations with similar configurations, and the standard deviation is displayed by a shaded region. The gray dashed lines are the non-informative baseline AUC = 0.5. (D) Index of dispersion $D = \sigma^2/\mu$ of the AUC for the data from (A).

In Fig. 2, we report the AUC for spreading dynamics on Erdős–Rényi random networks [33] for various parameters. It is striking that the GNN performs as well as the Bayesian model, a result that is consistent over all experiments. This may come as a surprise, as the GNN has been given a more difficult task: learning both the dynamics and the perturbation. It suggests that the GNN has simultaneously achieved the task of learning the dynamical mechanism and struggles only as much as the Bayesian method on detecting the perturbation.

Another important result is that the GNN model outperforms the functional reconstruction methods over all experiments. This demonstrates the importance of explicitly considering the dynamical mechanism and the prior network structure, instead of using a purely functional definition of the connectivity. All methods perform best on networks of low connectivity [Fig. 2(A)].
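The SIS update rule just described can be sketched as follows (network size, parameters, and the synchronous-update implementation are illustrative choices, not the exact settings of the paper):

```python
import numpy as np

def sis_step(A, x, alpha, beta, rng):
    """One synchronous SIS update: a susceptible node i becomes infected with
    probability 1 - (1 - alpha)**l_i, where l_i is its number of infected
    neighbors; infected nodes recover with probability beta."""
    l = A @ x                                  # infected neighbors of each node
    p_inf = 1.0 - (1.0 - alpha) ** l
    u = rng.random(x.shape)
    new_inf = (x == 0) & (u < p_inf)
    recover = (x == 1) & (u < beta)
    x_new = x.copy()
    x_new[new_inf] = 1
    x_new[recover] = 0
    return x_new

rng = np.random.default_rng(42)
N, p = 50, 0.1
A = (rng.random((N, N)) < p).astype(int)
A = np.triu(A, 1)
A = A + A.T                                    # undirected Erdos-Renyi graph
x = rng.integers(0, 2, size=N)                 # equiprobable initial state
X = [x]
for _ in range(100):
    X.append(sis_step(A, X[-1], alpha=0.3, beta=0.2, rng=rng))
X = np.array(X)                                # time series, shape (T+1, N)
```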
Perturbations on denser networks are harder to infer, independently of the model used. At least two phenomena explain this. First, the number of edges involved in the perturbation is fixed while the total number of edges grows with the average connectivity of the network. Hence, finding the removed edges among an increasing number of candidates naturally leads to a more difficult task, irrespective of the method employed. Second, the average number of infected nodes monotonically increases with the connectivity of the network [see Supplementary Information]. Therefore, the probability of being infected remains almost the same after the removal of a single edge in dense networks. Consequently, perturbations are barely detectable.

Figure 2(B) shows that increasing the length of the time series leads to better performance, especially for the GNN and Bayesian approaches. Longer time series are equivalent to a larger dataset, hence a more robust inference. This benefits the GNN in two ways, since both the inference of the dynamics and that of the perturbations are enhanced. Building on the results of Fig. 2(A), we conclude that perturbations on dense random networks can be inferred with high confidence (AUC > 0.9) when using sufficiently long time series, e.g., $T \sim 10^4$ entries.

The consistency of the inference can be expressed using the index of dispersion $D = \sigma^2/\mu$ of the AUC over experiments sharing similar settings. We observe that the dispersion decreases with the average AUC [Fig. 2(D)]. This means that the GNN model becomes even more consistent on similar experiments with high AUC outcomes. Moreover, because the dispersion coefficient is highly similar between the GNN model and the Bayesian model, it suggests that the error source is shared. We hypothesize that large AUC variance is produced by inherent fluctuations in the generated dataset rather than by the sensitivity of the algorithms.
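The AUC reported throughout can be computed directly from its probabilistic interpretation, as a Mann-Whitney rank statistic over edge scores (a self-contained sketch with synthetic scores):

```python
import numpy as np

def auc(scores, labels):
    """Probability that a randomly chosen removed edge (label 1) receives a
    higher score than a randomly chosen kept edge (label 0); ties count one
    half (the Mann-Whitney U statistic, which equals the ROC AUC)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos, neg = scores[labels == 1], scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

labels = np.array([1, 1, 0, 0, 0, 0])          # 1 = truly removed edge
perfect = np.array([0.9, 0.8, 0.3, 0.2, 0.1, 0.05])
uninformative = np.full(6, 0.5)
print(auc(perfect, labels))        # -> 1.0: removed edges ranked on top
print(auc(uninformative, labels))  # -> 0.5: non-informative baseline
```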
Networks featuring hubs that dominate the connectivity are common in nature [2]. We benchmark the models on tree-like scale-free networks generated from the Barabási–Albert model [3] (Fig. 3). In the Supplementary Information, we also provide a validation of the GNN model on non-tree scale-free networks.

In general, the GNN model's performance is much higher on scale-free networks than on random networks. This may be due to the tree-like structure of the generated networks. Perturbations tend to break networks up into disconnected components. Smaller, peripheral components tend to deactivate quickly. The removed edges may then be easily identified as the bridges between the dynamically distinct components. This is further supported by their notable contributions to the dynamical likelihood [see Supplementary Information for details].

Fig. 3(A) shows a decrease in AUC for the functional reconstruction methods as the infection probability increases. Based on the previous discussion, increasing the infection probability raises the contrast between the inactive and active components, which would normally help infer the perturbation. To explain this behavior, note that the correlation (14) between an always-active node ($\sum_t x_i(t) \approx T$) and an inactive node ($\sum_t x_j(t) \approx 0$) is roughly equal to zero, but so is the correlation between pairs of active nodes. Similar arguments can be made for the Granger causality. Therefore, for high infection probabilities, structural information becomes hidden from functional reconstruction methods, explaining their poor performance. They perform best when the infection is sparse. This highlights the importance of using techniques that take the nature of the dynamics into consideration.

In Fig. 3(B), the models are tested against increasing network size. We observe that the GNN model's performance is roughly constant up to $10^3$ nodes.
This may come as a surprise when recalling that the perturbation is composed of a single edge and that the model has only 300 time steps to learn both the dynamics and the perturbation. Note that we observe a slight decrease in the performance of the Bayesian model on large networks. However, this trend is due to an insufficient number of sampling steps relative to the network size.

Typical real datasets are noisy [28]. Time series come as a mixture of some hidden dynamical mechanism masked by a layer of noise. We investigate the robustness of the GNN model to noise. To simulate noise, entries of the time series are flipped with probability $p_{\mathrm{flip}}$, i.e., 0 goes to 1 and 1 goes to 0. For $p_{\mathrm{flip}} > 0$, zero
values of the likelihood appear and sampling of the posterior distribution becomes impossible. While the Bayesian model must be revisited, our GNN model remains unchanged, automatically incorporating the effect of noise in its predictions during training. We compare the performance over various flip probabilities on scale-free networks [Fig. 3(C)] and random networks [Fig. 2(C)]. We report that the GNN model is as precise as the corrected Bayesian model. On scale-free networks, it maintains a high AUC over large flip probabilities. This performance is quite remarkable, as the GNN model has to learn the dynamical mechanism from a noisy time series while inferring the missing edge. For random networks, the results show high AUC variances. This indicates that the noise process leads to larger fluctuations in precision, which are shared between the GNN and Bayesian models. We again report that the functional reconstruction methods are outperformed by the GNN approach.

Fig. 3: Area under the curve for SIS dynamics on Barabási–Albert scale-free networks. (A) Predictions over various infection probabilities $\alpha$ on networks of $N = 100$ nodes. (B) AUC over increasing network size with $\alpha = 0.$ (C) AUC over increasing flip probability with $\alpha = 0.$ The likelihood of the Bayesian model in (C) is corrected to integrate the flip mechanism. Dots are averages over 100 simulations with similar configurations, and standard deviations are indicated by shaded regions. Epidemic spreadings are simulated over $T = 300$ time steps with $\beta = 0.$ Perturbations occur at $\tau = 150$ and consist in removing a single random edge.

Population dynamics are valuable tools to study ecological systems. In these systems, each node represents a different species and its population $x_i(t)$ varies in time according to predator-prey relationships.
We benchmark our models on a simple population dynamics given by
$$\dot{x}_i = c_i x_i + x_i \sum_{j=1}^{N} (a_{ij} - a_{ji}) x_j, \qquad (3)$$
where $c_i = c \sum_j (a_{ji} - a_{ij}) \in \mathbb{R}$ depends on the number of interactions that a species has [see Materials and Methods]. Note that introducing a species-dependent parameter $c_i$ raises the difficulty of learning the dynamics, as the dynamical parameters are now specific to each node. Since the dynamics is deterministic, the Bayesian approach developed above no longer applies. Hence, the GNN model is compared with the correlation and Granger causality approaches on 26 real directed ecological networks [example in Fig. 1(D,E,F)]. We also introduce the sampling parameter $\Delta t$, which indicates the time interval between two steps of the time series. A large $\Delta t$ implies a coarser sampling of the time series.

Fig. 4: Average area under the curve of predictions on 25 real ecological networks with predator-prey dynamics. Each marker is associated with a specific real network. The dynamics are simulated over $T = 10^3$ steps with (A) $\Delta t = 1/$ and (B) $\Delta t = 1$. The functional reconstruction algorithms perform better when the dots are in the upper blue region, above the line $x = y$. Random baselines are indicated with dashed gray lines. Averages are computed over 100 runs on each network. The perturbation is the removal of a single random edge at $\tau = 500$.

Our results show that the GNN (Granger) approach reaches an average AUC of 0.95 (0.88) over all simulations for both values of $\Delta t$ [Fig. 4]. The GNN performance suggests that it is a valuable candidate in deterministic contexts, even when the dynamical parameters of each node depend on the local structure. Interestingly, the GNN model maintains a high AUC on all 26 real networks and for large sampling steps $\Delta t$. The GNN performance is mostly unaffected by the characteristics of the network structure.

We challenge the versatility of our GNN model on a neural dynamics simulated on the C. Elegans connectome. First, we simulate the continuous noisy theta model of (4) on the directed network structure [11]. Then, we threshold
the time series to record only the spikes of activity. Doing so cuts off valuable information for learning the dynamics, which greatly increases the difficulty of the inference task [example in Fig. 1(G,H,I)]. The simulated theta model also includes a random noise injection during integration.

Fig. 5: Area under the curve of predictions for neural dynamics on the C. Elegans connectome. The perturbation is composed of three random edges, and the dynamics spans $T = 500$ time steps with $\tau = 250$. Boxplots are computed over 100 simulations. Outliers (outside the 75th percentile plus 1.5 interquartile ranges) are indicated with markers.

The results indicate that the GNN model is highly capable of handling this spiking neural dynamics [Fig. 5]. While 17% of the runs show an AUC below 0.75, most of them are above the values expected from functional reconstruction algorithms. The median AUC is 0.99 (standard deviation 0.19). This demonstrates the GNN's high accuracy in the deterministic context, even in the presence of partial information.
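The spike extraction described above (recording only threshold crossings of the phase) can be sketched as follows; the phase trajectories and the exact threshold here are illustrative stand-ins for the integrated theta model:

```python
import numpy as np

# Toy phase trajectories y_i(t) for 3 neurons over 200 Euler-like steps;
# a real run would integrate the noisy theta model of Eq. (4) instead.
rng = np.random.default_rng(7)
T, N = 200, 3
y = np.cumsum(0.3 + 0.1 * rng.standard_normal((T, N)), axis=0)

# Record a spike when the phase passes near pi, then discard y entirely:
# only the binary spike train x is kept for inference.
x = (np.cos(y) < -0.95).astype(int)
```

Because $x$ reveals the underlying phase only at spike times, the observed state is partial and the binary time series is non-Markovian, which is why the neural experiments use a longer history window $a$.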
3. Discussion
Detecting perturbations of complex systems is a challenge of paramount importance, especially in modern times, with the impacts of climate change on ecosystems and our more-than-ever entangled societies. Yet, the lack of reliable predictive tools to infer the presence and the structural cause of disturbances is of serious concern. Without taking full advantage of the graph structure, the functional effect of perturbations can be so subtle that it remains undetected at both the local and global levels.

In this paper, we have introduced a method relying on recent deep learning advances to infer structural perturbations from time series of activity. The core and original idea is the self-supervised training scheme, which requires prior information neither on the perturbation strategy nor on the dynamical mechanism. Minimal modifications of the model are required to apply the method to datasets of different natures, such as continuous, noisy, or discrete time series.

We have tested the method in three contrasting contexts: epidemic spreading, ecological, and neural dynamics. We show that our approach outperforms standard functional reconstruction methods while being comparable to Bayesian models with a priori knowledge of the dynamical mechanisms. Our results suggest that GNNs are promising candidates to predict structural disturbances in practical applications.

Apart from its effectiveness, there are multiple advantages to using the GNN model. First, the optimization procedure scales, both in speed and accuracy, to large networks, and is fairly robust to the choice of hyperparameters. Second, GNN models are part of an active field of research, so future improvements are expected, e.g., enhanced neural network architectures and improved gradient descent optimizers [26]. Third, GNNs are versatile across different dynamics while being able to cope with high levels of noise.
Finally, they are well adapted to incorporate other types of information collected on the graph that could help the predictions.

Our study provides a new avenue for monitoring complex systems. It can be used to detect disturbances in various classes of complex networks, ranging from ecology to neuroscience, and paves the way for the development of targeted intervention strategies for sustainable management.
4. Materials and Methods
Empirical networks.
Generated networks.
Random networks are generated from the Erdős–Rényi model: $N$ nodes are randomly connected with probability $p$ [33]. The scale-free networks are generated from the Barabási–Albert model [3]. We start with a connected pair of nodes; then, at each step, we add a new node to the network and connect it to $m = 1$ existing node chosen with probability proportional to its degree.

Simulated dynamics.
For all experiments, we first initialize the nodes' activity and run the dynamics for 100 burn-in steps, which are then discarded. Then, we simulate the dynamics for $\tau$ steps. At $t = \tau$, we remove random edges from the original network. The dynamics is then continued until $t = T$. The time series $X = \{x(t) \in \mathbb{R}^N\}_{t=1\ldots T}$ is constructed from the vectors of node activities at each time step.

Susceptible-Infected-Susceptible dynamics (SIS).
Node activity is binary, $x(t) \in \{0, 1\}^N$, where $x_i(t) = 1$ indicates that node $i$ is infected at time step $t$ and $x_i(t) = 0$ that it is susceptible. At each time step, a susceptible node $i$ is infected with probability $1 - (1 - \alpha)^{\ell_i}$, where $\ell_i = \sum_j a_{ij} x_j$ is its number of infected neighbors and $\alpha$ is the transmission probability. Infected nodes recover with constant probability $\beta$. The node activities are initialized with equiprobable binary values.

Lotka-Volterra dynamics.
We simulate mixed ecosystems using the generalized Lotka-Volterra dynamics (3). The inherent growth of species is given by $c_i x_i$. We chose $c_i = c \sum_j (a_{ji} - a_{ij})$ with a constant $c$. The diagonal of the adjacency matrix is set to zero, $a_{ii} = 0$, to prevent intraspecies competition. The initial populations are uniformly distributed between 0 and 1. The integration is done using an eighth-order Runge–Kutta method.

Noisy neural dynamics.
Neural dynamics on the C. Elegans connectome are simulated using the noisy theta model. First, we integrate the ODEs for the phases of the oscillators,
$$\dot{y}_i = [1 - \cos(y_i)] + [1 + \cos(y_i)]\, I_i + \frac{\theta}{N} \sum_{j=1}^{N} a_{ij}\, [1 - \cos(y_j)], \qquad (4)$$
where $y_i(0)$ is initialized uniformly between 0 and $2\pi$. The parameters $I_i$ control the noise level and are sampled from a uniform distribution $U(0.\,, 0.15)$, with $\theta = 5$. We integrate the system using an Euler integrator with $dt = 0.1$. The external current $I_i$ is resampled at each time step. A spike is said to be produced when $\cos(y_i) < -0.95$, i.e., when $y_i$ is close to $\pi$. The continuous variable $y_i$ is transformed into a discrete state variable $x_i = H[-\cos(y_i) - 0.95]$, where $H(y) = 1$ if $y > 0$ and $H(y) = 0$ otherwise. The time series $y$ is then discarded and only $x$ is used for inference.

GNN model architecture.
We train graph neural networks to forecast the activity. For a single forecast of the activity of node $i$, the GNN takes as input an adjacency matrix $A$ and a window $X^{(a)} = \{x_i^{(a)}\}_{i=1\ldots N} \in \mathbb{R}^{a \times N}$ of $a$ consecutive time steps of activity of all $N$ nodes, and it outputs the forecasted activity $\hat{x} = \{\hat{x}_i\}_{i=1\ldots N}$ of all nodes,
$$\hat{x} = \mathrm{GNN}(A, X^{(a)}, \Lambda). \qquad (5)$$
The GNN is formed of two main operations. For an individual node $i$,
$$y_i = h_\Lambda\left( x_i^{(a)} + \sum_{j=1}^{N} a_{ij}\, x_j^{(a)} \right), \qquad (6a)$$
$$\hat{x}_i = F(y_i), \qquad (6b)$$
where (6a) is a parametrized, nonlinear operation called a GINConv layer [49] and $F(\cdot)$ is a trainable function. The operation $h_\Lambda$ is a sequence of linear transformations and ReLU activations whose exact form is detailed in the Supplementary Information. For directed networks, (5) is replaced by a linear transformation of two distinct neural networks,
$$\hat{x} = W\,\big[\mathrm{GNN}(A, X^{(a)}, \Lambda_1) \,\|\, \mathrm{GNN}(A^T, X^{(a)}, \Lambda_2)\big]. \qquad (7)$$
All GNN model parameters $\Lambda$ are randomly initialized using Kaiming's method [16].

GNN training.
We first initialize $\hat{h}_{\mathrm{GNN}}$ so that $\sigma(\hat{h}_{\mathrm{GNN}}) = 10^{-}$ for existing edges of $A$ and approximately 0 otherwise. A training step goes as follows. We slice a window of length $a + 1$ at a randomly chosen time in $X$. The first $a$ values, $X^{(a)}$, are given to the model, while the last value is used as the forecasting target $x$. We used $a = 1$ for the SIS and predator-prey dynamics, and $a = 20$ for the neural dynamics. The neural dynamics gets a longer activity history because the actual state of the system is hidden by the binary spike transformation. Therefore, the information from the observed state is only partial and the time series are non-Markovian. For these reasons, the task is inherently more difficult and requires a longer activity history.

We apply a forward step following the prescription of (2). We optimize the target forecast with a cross-entropy loss for binary dynamics (epidemic, neural) and an absolute error loss for continuous dynamics (ecological), using the RAdam optimizer [26]. We use a learning rate of $10^{-}$ for the parameters $\Lambda$, with a decay of 0.1 after 1000 steps, and a learning rate of $10^{-}$ for the perturbation matrix $\hat{h}_{\mathrm{GNN}}$. The model is optimized over 10 steps. A training run takes roughly 30 seconds on a personal computer for an inference on a network of $N = 100$ nodes with time series of length $T = 300$.

Bayesian model.
We estimate the perturbed structure from the expected value of the posterior distribution as

$$\hat{A} = \sum_{A^*} A^* \Pr(A^* \mid X, A, \theta), \qquad (8)$$

and the perturbation matrix from $\hat{P} = A - \hat{A}$. The posterior can be rewritten using Bayes' theorem,

$$\Pr(A^* \mid X, A, \theta) \propto \Pr(X \mid A, A^*, \theta)\, \Pr(A^* \mid A), \qquad (9)$$

where $\Pr(X \mid A, A^*, \theta)$ is the likelihood and $\Pr(A^* \mid A)$ is the prior distribution, which we consider uniform. Moreover, since only the portion $t > \tau$ is needed to infer the perturbed matrix, we can drop the dependence on $A$ in the likelihood, i.e., $\Pr(X \mid A^*, \theta)$.

The posterior of (8) is sampled using a Metropolis–Hastings scheme [31]. We denote the adjacency matrix at sampling step $\nu$ by $A^*_\nu$, initialized with the unperturbed graph $A^*_0 = A$. At each sampling step, we randomly select an edge $(i,j) \in E$. If the edge $(i,j)$ is present ($[A^*_\nu]_{ij} = 1$), we remove it, $[A^*_{\nu+1}]_{ij} = 0$; otherwise we add it back, $[A^*_{\nu+1}]_{ij} = 1$. Since graphs are undirected for the SIS dynamics, each graph proposition is applied symmetrically, i.e., $[A^*_{\nu+1}]_{ij} = [A^*_{\nu+1}]_{ji}$.

The probability of accepting the proposition $A^*_\nu \to A^*_{\nu+1}$ at step $\nu+1$ is given by

$$\rho = \min\left(1, \frac{\Pr(X \mid A^*_{\nu+1}, \theta)}{\Pr(X \mid A^*_\nu, \theta)}\right). \qquad (10)$$

The likelihood $\Pr(X \mid A^*, \theta)$ of the SIS model can be approximated by the product over individual transition probabilities for each node,

$$\Pr(X \mid A^*, \theta) = \prod_{i=1}^{N} \prod_{t>\tau}^{T} \Pr[x_i(t) \mid x_i(t-1), A^*, X(t-1), \theta], \qquad (11)$$

which is exact if the process is noiseless, $p_{\mathrm{flip}} \equiv q = 0$. For a node $i$ of degree $k$, with $l \equiv \sum_j a_{ij} x_j(t-1)$ infected neighbors, and with $x_i(t-1) \equiv v$ and $x_i(t) \equiv w$, the transition probabilities are

$$\Pr(w \mid v; l, k, \theta) = R(v, q)\, R[w, R(\beta, q)] + [1 - R(v, q)]\, R[w, R(F_{kl}, q)], \qquad (12)$$

where $R(y, z) \equiv y(1-z) + (1-y)z$ and

$$F_{kl} = [q + (1-q)(1-\alpha)]^{l}\, [1 - q + q(1-\alpha)]^{k-l}. \qquad (13)$$

Recall that $\alpha$ and $\beta$ are given SIS parameters, and $q$ is the flip probability. The complete derivation is covered in the Supplementary Information. We sample the posterior distribution over $\Gamma = 2000$ propositions. We estimate the posterior distribution of (8) by counting the relative number of steps held by a given graph, i.e., $\Pr(A^* \mid X, A, \theta) = \Gamma^{-1} \sum_\nu I(A^*_\nu - A^*)$, where $I(\cdot) = 1$ only if the argument is element-wise zero and $I(\cdot) = 0$ otherwise.

Functional reconstruction methods.
The correlation matrix has elements

$$\hat{a}_{ij} = \left| \frac{(x_i - \mu_i)^T (x_j - \mu_j)}{\sigma_i \sigma_j} \right|, \qquad (14)$$

where $\mu_i = \langle x_i \rangle$ is the average activity of node $i$ and $\sigma_i$ is the standard deviation of $x_i$. In the Granger causality method [14], the element $(i,j)$ of the adjacency matrix is estimated as the ratio of the standard deviations of the forecasting errors of two linear models: $F(\cdot)$, trained with solely the activity of node $i$, and $G(\cdot,\cdot)$, trained with the activity of nodes $i$ and $j$,

$$\hat{a}_{ij} = \frac{\mathrm{std}[x_i - F(x_i)]}{\mathrm{std}[x_i - G(x_i, x_j)]}. \qquad (15)$$

For both methods, the estimated adjacency matrix $\hat{A}$ is computed using only the node activity after the perturbation. To suppress initially absent edges, we also perform an element-wise multiplication between $\hat{A}$ and the original adjacency matrix $A$. If the network is undirected, we average the score over in- and out-edges, i.e., $\hat{A} \leftarrow \frac{1}{2}(\hat{A} + \hat{A}^T)$. We then normalize the estimated adjacency matrix and evaluate the perturbation matrix by $\hat{P} = A - \hat{A}$.

Evaluation metrics.
Predicted removed edges are compared to the ground-truth removed edges under a receiver operating characteristic (ROC) curve analysis. For each inference, the models output a weighted perturbation matrix $\hat{P}$. Note that only existing edges of $A$ are assigned a non-zero score because adding edges is prohibited. Perturbation scores are then rescaled between 0 and 1, where a score of 1 marks the edge most likely to have been removed. We then compute the true and false positive rates (TPR/FPR) under a varying threshold $r$ from 0 to 1:

$$\mathrm{TPR}(r) = \frac{N_{\mathrm{TP}}(r)}{N_{\mathrm{TP}}(r) + N_{\mathrm{FN}}(r)}, \qquad (16)$$
$$\mathrm{FPR}(r) = \frac{N_{\mathrm{FP}}(r)}{N_{\mathrm{FP}}(r) + N_{\mathrm{TN}}(r)}, \qquad (17)$$

where the positive edges are those with a score higher than the threshold, i.e., the edges $(i,j)$ with $\hat{P}_{ij} > r$. The AUC is computed as the area under the ROC curve and is independent of the threshold $r$. The AUC can be interpreted as the probability that the model distinguishes whether an edge belongs to the perturbation set $dE$.

Acknowledgments.
This work was funded by the Fonds de recherche du Québec–Nature et technologies, the Natural Sciences and Engineering Research Council of Canada, and Sentinel North, financed by the Canada First Research Excellence Fund. The authors want to thank Antoine Allard, Patrick Desrosiers, and Louis J. Dubé for their support and helpful comments, Mohamed Babine for fruitful discussions and code sharing, and Martin Laprise for useful comments and for allocating computational resources. We acknowledge Calcul Québec for the use of their computing facilities.
References

[1]
R. Angelini and F. Vaz-Velho , Ecosystem structure andtrophic analysis of Angolan fishery landings , Sci. Mar., 75(2011), pp. 309–319.[2]
A.-L. Barab´asi , Scale-free networks: A decade and beyond ,Science, 325 (2009), pp. 412–413.[3]
A.-L. Barab´asi and R. Albert , Emergence of scaling inrandom networks , Science, 286 (1999), pp. 509–512.[4]
B. Barzel and A.-L. Barab´asi , Network link prediction byglobal silencing of indirect correlations , Nat. Biotechnol., 31(2013), pp. 720–725.[5]
M. C. Boerlijst, T. Oudman, and A. M. de Roos, Catastrophic collapse can occur without early warning: Examples of silent catastrophes in structured ecological models, PLoS ONE, 8 (2013), p. e62033.
[6]
S. L. Bressler and A. K. Seth, Wiener–Granger causality: a well established methodology, NeuroImage, 58 (2011), pp. 323–329.
[7]
I. Brugere, B. Gallagher, and T. Y. Berger-Wolf, Network structure inference, a survey: Motivations, methods, and applications, ACM Computing Surveys (CSUR), 51 (2018), pp. 1–39.
[8]
Y. Chen, S. L. Bressler, and M. Ding , Frequency decom-position of conditional Granger causality and application tomultivariate neural field potential data , J. Neurosci. Methods,150 (2006), pp. 228–237.[9]
C. F. Clements, M. A. McCarthy, and J. L. Blanchard , Early warning signals of recovery in complex systems , Nat.Commun., 10 (2019), p. 1681.[10]
V. Dakos, S. R. Carpenter, E. H. van Nes, and M. Schef-fer , Resilience indicators: prospects and limitations for earlywarnings of regime shifts , Philos. Trans. R. Soc. B, 370 (2015),p. 20130263.[11]
G. B. Ermentrout and N. Kopell , Parabolic bursting in anexcitable system coupled with a slow oscillation , SIAM Journalon Applied Mathematics, 46 (1986), pp. 233–253.[12]
S. Feizi, D. Marbach, M. M´edard, and M. Kellis , Net-work deconvolution as a general method to distinguish directdependencies in networks , Nat. Biotechnol., 31 (2013), p. 726.[13]
J. Gao, B. Barzel, and A.-L. Barab´asi , Universal resiliencepatterns in complex networks , Nature, 530 (2016), pp. 307–312.[14]
C. W. Granger , Investigating causal relations by economet-ric models and cross-spectral methods , Econometrica, (1969),pp. 424–438.[15]
W. Hamilton, Z. Ying, and J. Leskovec , Inductive repre-sentation learning on large graphs , in NeurIPS, 2017, pp. 1024–1034.[16]
K. He, X. Zhang, S. Ren, and J. Sun , Delving deep intorectifiers: Surpassing human-level performance on Imagenetclassification , in Proceedings of the ICCV, 2015, pp. 1026–1034.[17]
M. Hecker, S. Lambeck, S. Toepfer, E. Van Someren, andR. Guthke , Gene regulatory network inference: data inte-gration in dynamic models—a review , Biosystems, 96 (2009),pp. 86–103.[18]
O. Hoegh-Guldberg, P. J. Mumby, A. J. Hooten, R. S.Steneck, P. Greenfield, E. Gomez, C. D. Harvell, P. F.Sale, A. J. Edwards, K. Caldeira, et al. , Coral reefs underrapid climate change and ocean acidification , Science, 318(2007), pp. 1737–1742.[19]
G. J¨ager and M. F¨ullsack , Systematically false positives inearly warning signal analysis , PloS one, 14 (2019), p. e0211072.[20]
J. Jiang, Z.-G. Huang, T. P. Seager, W. Lin, C. Grebogi,A. Hastings, and Y.-C. Lai , Predicting tipping points inmutualistic networks through dimension reduction , Proc. Natl.Acad. Sci. U.S.A., 115 (2018), pp. E639–E647.[21]
M. Kaiser and C. C. Hilgetag , Nonoptimal component place-ment, but short processing paths, due to long-distance projec-tions in neural systems , PLOS Comput. Biol., 2 (2006).[22]
S. K´efi, V. Dakos, M. Scheffer, E. H. Van Nes, and M. Ri-etkerk , Early warning signals also precede non-catastrophictransitions , Oikos, 122 (2013), pp. 641–648. [23]
T. N. Kipf and M. Welling , Semi-supervised classifi-cation with graph convolutional networks , arXiv preprintarXiv:1609.02907, (2016).[24]
E. Laurence, N. Doyon, L. J. Dub´e, and P. Desrosiers , Spectral dimension reduction of complex dynamical networks ,Phys. Rev. X, 9 (2019), p. 011042.[25]
T. M. Lenton , Early warning of climate tipping points , Nat.Clim. Change, 1 (2011), pp. 201–209.[26]
L. Liu, H. Jiang, P. He, W. Chen, X. Liu, J. Gao, andJ. Han , On the variance of the adaptive learning rate andbeyond , arXiv preprint arXiv:1908.03265, (2019).[27]
M.-E. Lynall, D. S. Bassett, R. Kerwin, P. J. McKenna,M. Kitzbichler, U. Muller, and E. Bullmore , Functionalconnectivity and brain networks in schizophrenia , Journal ofNeuroscience, 30 (2010), pp. 9477–9487.[28]
M. E. Mann and J. M. Lees , Robust estimation of back-ground noise and signal detection in climatic time series , Clim.Change, 33 (1996), pp. 409–445.[29]
D. Marbach, J. C. Costello, R. K¨uffner, N. M. Vega,R. J. Prill, D. M. Camacho, K. R. Allison, A. Aderhold,R. Bonneau, Y. Chen, et al. , Wisdom of crowds for robustgene network inference , Nat. Methods, 9 (2012), p. 796.[30]
R. M. May, S. A. Levin, and G. Sugihara , Ecology forbankers , Nature, 451 (2008), pp. 893–894.[31]
N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth,A. H. Teller, and E. Teller , Equation of state calcula-tions by fast computing machines , J. Chem. Phys., 21 (1953),pp. 1087–1092.[32]
M. E. Newman, Estimating network structure from unreliable measurements, Phys. Rev. E, 98 (2018), p. 062321.
[33]
M. E. Newman, Networks, Oxford University Press, 2018.
[34]
L. Pan, T. Zhou, L. L¨u, and C.-K. Hu , Predicting missinglinks and identifying spurious links via likelihood analysis , Sci.Rep., 6 (2016), pp. 1–10.[35]
H.-J. Park and K. Friston , Structural and functional brainnetworks: From connections to cognition , Science, 342 (2013),p. 1238411.[36]
R. Pastor-Satorras, C. Castellano, P. Van Mieghem,and A. Vespignani , Epidemic processes in complex networks ,Rev. Mod. Phys., 87 (2015), p. 925.[37]
T. P. Peixoto , Network reconstruction and community detec-tion from dynamics , Phys. Rev. Lett., 123 (2019), p. 128301.[38]
R. P´erez, V. Puig, J. Pascual, A. Peralta, E. Landeros,and L. Jordanas , Pressure sensor distribution for leak de-tection in Barcelona water distribution network , Water Sci.Tech.-W. Sup., 9 (2009), pp. 715–721.[39]
M. M. Pires , Rewilding ecological communities and rewiringecological networks , Perspect. Ecol. Conser., 15 (2017), pp. 257–265.[40]
A. Roebroeck, E. Formisano, and R. Goebel , Mappingdirected influence over the brain using Granger causality andfMRI , NeuroImage, 25 (2005), pp. 230–242.[41]
M. Scheffer, Foreseeing tipping points, Nature, 467 (2010), pp. 411–412.
[42]
M. Scheffer, J. Bascompte, W. A. Brock, V. Brovkin, S. R. Carpenter, V. Dakos, H. Held, E. H. Van Nes, M. Rietkerk, and G. Sugihara, Early-warning signals for critical transitions, Nature, 461 (2009), p. 53.
[43]
D. W. Schindler , Recent advances in the understanding andmanagement of eutrophication , Limnol. Oceanogr., 51 (2006),pp. 356–363.[44]
A. K. Seth, A. B. Barrett, and L. Barnett , Grangercausality analysis in neuroscience and neuroimaging , J. Neu-rosci., 35 (2015), pp. 3293–3297.[45]
S. G. Shandilya and M. Timme , Inferring network topologyfrom complex dynamics , New J. Phys., 13 (2011), p. 013004.[46]
A. Sheikhattar, S. Miran, J. Liu, J. B. Fritz, S. A.Shamma, P. O. Kanold, and B. Babadi , Extracting neuronalfunctional network dynamics via adaptive Granger causalityanalysis , Proc. Natl. Acad. Sci. U.S.A., 115 (2018), pp. E3869–E3878.[47]
T. Wilkat, T. Rings, and K. Lehnertz , No evidence forcritical slowing down prior to human epileptic seizures , Chaos,29 (2019), p. 091104.[48]
C. Wissel , A universal law of the characteristic return timenear thresholds , Oecologia, 65 (1984), pp. 101–107.[49]
K. Xu, W. Hu, J. Leskovec, and S. Jegelka , How powerfulare graph neural networks? , arXiv preprint arXiv:1810.00826,(2018).[50]
J.-G. Young, F. S. Valdovinos, and M. E. Newman, Reconstruction of plant–pollinator networks from observational data, bioRxiv, (2019), p. 754077.

Supplementary Information

Detecting structural perturbations from time series with deep learning
Edward Laurence, Charles Murphy, Guillaume St-Onge, Xavier Roy-Pomerleau, and Vincent Thibeault
Département de Physique, de Génie Physique, et d'Optique, Université Laval, Québec (Québec), Canada G1V 0A6
Centre interdisciplinaire en modélisation mathématique de l'Université Laval, Québec, G1V 0A6, Canada

June 11, 2020
The input of the GNN model for the target node $i$ is its current activity $x_i^{(a)}$ and the activity of its neighbors $x_j^{(a)}$ for all $j$ with $a_{ij} = 1$. Here, $x_j^{(a)}$ denotes a slice of $a$ consecutive time steps of the activity of node $j$. The length $a$ is a parameter that must be set beforehand. In practice, it is best set to the expected memory of the dynamical mechanism; for instance, the memory of a Markovian process is $a = 1$. If $a$ is too small, the input activity will not be sufficient to forecast the future activity. On the contrary, if $a$ is much larger than the memory of the dynamical system, the model should learn to vanish the weights allocated to the oldest activity steps. A large $a$ also exposes the model to overfitting by finding superfluous patterns in previous states of activity.

For a single forward pass, the GNN model applies the following operations:

$$y_i = h_\Lambda\Big( x_i^{(a)} + \sum_{j=1}^{N} a_{ij} x_j^{(a)} \Big), \qquad (1a)$$
$$\hat{x}_i = F(y_i), \qquad (1b)$$

where $h_\Lambda(\cdot)$ is a trainable layer and $F(\cdot)$ is a function that may have trainable parameters. The operation of Eq. (1a) is called a GINConv [4]. It aggregates the neighborhood of node $i$ by summing its neighbors' activity and then applying the non-linear operation $h_\Lambda(\cdot)$. Thanks to the sum over $j$, the layer is independent of the number of neighbors of node $i$. It makes the GNN model inductive, as neither the size of the graph nor the number of neighbors is hard-coded in the layers.

During training, a loss function is computed between the model output and the ground-truth future activity,

$$\mathcal{L}(\hat{x}_i, x_i). \qquad (2)$$

The spreading and the neural dynamics use the binary cross-entropy,

$$\mathcal{L}(\hat{x}_i, x_i) = -x_i \log(\hat{x}_i) - (1 - x_i)\log(1 - \hat{x}_i), \qquad (3)$$

while the predator–prey dynamics uses the L1 loss,

$$\mathcal{L}(\hat{x}_i, x_i) = |x_i - \hat{x}_i|. \qquad (4)$$

The GINConv layer is the core operation of the GNN model. By explicitly using the adjacency matrix $A$ in Eq. (1a), the model is able to backpropagate the error into the adjacency matrix $A$, which is crucial for inferring the perturbation. Note that the backpropagation requires the mathematical operations applied by the layers to be differentiable, which is satisfied when using the sum aggregation.

The GINConv could have been equivalently written as a sparse operation on the node's neighborhood $\mathcal{N}_i$,

$$y_i = h_\Lambda\Big( x_i + \sum_{j \in \mathcal{N}_i} x_j \Big), \qquad (5)$$

where the independence from the number of nodes $N$ in the graph still holds. Yet, slicing the node activity beforehand on the neighborhood of node $i$ would break the differentiability with respect to the adjacency matrix $A$; the essential backpropagation into $A$ would fail.

Table 1 summarizes the mathematical operations of the model for the different dynamics. There is no guarantee that this selection of hyperparameters, e.g., the number of linear transformations or the number of hidden dimensions, is the best possible configuration for our task, nor that it will fit a different dynamics. From our experiments, similar results were achieved with other sets of hyperparameters; worse performance resulted from a bad choice of learning rates rather than from an inappropriate model design. An ablation study could be made in the future to validate our observations.

It is worth noting that other GNN layers could have been used instead of the GINConv, such as the Graph Attention layer (GAT) [3] and the SAGEConv [1]. We have used the GINConv mostly for convenience: most deep learning frameworks for GNN models use a sparse representation of the adjacency matrix, which makes the adjacency matrix not differentiable.
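To make this point concrete, here is a minimal sketch (not the paper's code: a toy scalar layer $h(y) = w\,y$ with identity $F$ and made-up numbers stands in for the real $h_\Lambda$) showing that, with a dense adjacency matrix, the forecasting loss is an ordinary differentiable function of each entry $a_{ij}$, which can be checked by finite differences:

```python
# Finite-difference check that the loss of Eqs. (1a)-(1b) and (4) is
# differentiable in the dense adjacency entries a_ij.
# The layer h, the weights, and the toy graph are illustrative assumptions.

def forward(A, x, w):
    # Eq. (1a)-(1b) with the toy layer h(y) = w * y and identity F
    N = len(x)
    return [w * (x[i] + sum(A[i][j] * x[j] for j in range(N)))
            for i in range(N)]

def loss(A, x, target, w):
    # L1 loss of Eq. (4), summed over nodes
    pred = forward(A, x, w)
    return sum(abs(p - t) for p, t in zip(pred, target))

def grad_a(A, x, target, w, i, j, eps=1e-6):
    # centered finite difference in the (i, j) adjacency entry
    A[i][j] += eps
    up = loss(A, x, target, w)
    A[i][j] -= 2 * eps
    down = loss(A, x, target, w)
    A[i][j] += eps
    return (up - down) / (2 * eps)

A = [[0.0, 1.0], [1.0, 0.0]]   # toy two-node graph
x = [1.0, 2.0]
target = [0.0, 0.0]
g = grad_a(A, x, target, 0.5, 0, 1)   # sensitivity of the loss to a_01
```

This sensitivity to individual adjacency entries is exactly what gradient descent exploits when backpropagating into $A$.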
Only a few layers have been coded by the PyTorch community to operate on a dense adjacency matrix.

The function $F(\cdot)$ is an activation layer used to control the shape of the output and its bounds. It receives as input $y_i$, i.e., the output of the GINConv. We list the functions used for the selected dynamics,

SIS: $F(y) = \dfrac{1}{1 + e^{-y}}$; (6)

Predator–prey: $F(y) = Wy + c$; (7)

Neural dynamics: $F(y) = \dfrac{1}{1 + e^{-(Wy + c)}}$, (8)

with detailed dimensions in Table 1. For the binary dynamics (SIS and neural dynamics), we use the sigmoid function to bound the output between 0 and 1. In doing so, the output can be interpreted as the probability that the next activity is the active state. For the predator–prey dynamics, the output is not required to be bounded; we therefore simply apply a linear transformation to match the desired output size. Note that $W$ and $c$ are trainable parameters.

Table 1: Layer-by-layer description of the GNN models for each dynamics. For each sequence, the operations are applied in order from top to bottom. A linear operation Linear(n, m) is defined as $f(x) = Wx + c$ with $W \in \mathbb{R}^{m \times n}$ and $c \in \mathbb{R}^m$. The ReLU operation is the element-wise maximum ReLU(x) = max(0, x).

Dynamics        h_Λ(·)            F(·)             Number of parameters
SIS             Linear(1, 16)     Sigmoid          1 185
                ReLU
                Linear(16, 64)
                ReLU
                Linear(64, 1)
Predator–prey   Linear(1, 64)     Linear(128, 1)*  37 569
                ReLU
                Linear(64, 128)
                ReLU
                Linear(128, 64)
Neural          Linear(20, 64)    Linear(64, 1)*   17 441
                ReLU              Sigmoid
                Linear(64, 64)
                ReLU
                Linear(64, 32)

* The linear function is twice the size of $h_\Lambda(\cdot)$ because the networks are directed and the outputs of two GINConvs are concatenated.

For directed networks, we use two GINConvs so that the GNN model can distinguish between in- and out-edges,

$$y_i^{(1)} = h_\Lambda^{(1)}\Big( x_i + \sum_{j=1}^{N} a_{ij} x_j \Big), \qquad (9a)$$
$$y_i^{(2)} = h_\Lambda^{(2)}\Big( x_i + \sum_{j=1}^{N} a_{ji} x_j \Big), \qquad (9b)$$
$$\hat{x}_i = F\big(y_i^{(1)} \,\|\, y_i^{(2)}\big), \qquad (9c)$$

where $\cdot\|\cdot$ is the concatenation operator. This technique is essential to correctly infer on directed networks.
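As a concrete illustration, here is a minimal numpy sketch of the forward pass of Eqs. (1a)–(1b) for the SIS case. The hidden sizes, random weights, and the toy graph are illustrative assumptions, not the trained models of Table 1:

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(n, m):
    # a Linear(n, m) operation as in Table 1: f(x) = W x + c
    W, c = rng.normal(size=(m, n)), rng.normal(size=m)
    return lambda x: W @ x + c

relu = lambda y: np.maximum(0.0, y)
sigmoid = lambda y: 1.0 / (1.0 + np.exp(-y))

# toy h_Lambda: Linear(1, 4) -> ReLU -> Linear(4, 1); sizes are assumptions
l1, l2 = linear(1, 4), linear(4, 1)
h = lambda y: l2(relu(l1(y)))

def gnn_forecast(A, x):
    # Eq. (1a): aggregate through the dense adjacency, x_i + sum_j a_ij x_j,
    # then Eq. (1b): apply h_Lambda and the sigmoid activation F (SIS case)
    return np.array([sigmoid(h(np.array([x[i] + A[i] @ x])))[0]
                     for i in range(len(x))])

A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])   # toy undirected path graph
x = np.array([1., 0., 1.])     # current activity of the three nodes
xhat = gnn_forecast(A, x)      # forecasted activity, each in (0, 1)
```

Because the aggregation is a plain sum through the dense matrix, nodes with identical neighborhood inputs (here nodes 0 and 2) receive identical forecasts, reflecting the inductive character of the layer.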
L2 weight decay is often used to prevent overfitting. In our task, we found that using weight decay impairs the learning of the perturbation. When applying a weight decay on the trainable perturbation matrix $\hat{h}_{\mathrm{GNN}}$, we penalize the model on the norm $|\hat{h}_{\mathrm{GNN}}|$. It therefore pushes the elements to zero, $[\hat{h}_{\mathrm{GNN}}]_{ij} \to 0$, while the forecast is computed from the perturbed structure,

$$\hat{x} = \mathrm{GNN}\big(A - \sigma(\hat{h}_{\mathrm{GNN}}), x, \Lambda\big). \qquad (10)$$

Hence, the direction $[\hat{h}_{\mathrm{GNN}}]_{ij} \to 0$ implies $[\sigma(\hat{h}_{\mathrm{GNN}})]_{ij} \to 0.5$. We would rather look for the direction $[\sigma(\hat{h}_{\mathrm{GNN}})]_{ij} \to 1$ for removed edges, or restrict the weight decay to the parameters $\Lambda$ without harming the perturbation. From our experiments, using L2 weight decay did not lead to any noticeable improvements.

We explore how the model is affected by removing a large number of edges. From Fig. 1(C,D), we conclude that the GNN model performs almost as well as the Bayesian model for multiple-edge perturbations. A slight deviation between the GNN and the Bayesian model is noticeable for scale-free networks; this issue is detailed in the next section. Moreover, the performance measured with the AUC is quite stable up to 15 removed edges. In the considered scale-free networks of Fig. 1(D), the perturbation represents 15% of all edges. In Fig. 1(B), we present the AUC for loopy scale-free graphs. We conclude again that the GNN model performs as well as the Bayesian approach.
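The AUC values used in these comparisons follow the threshold sweep of Eqs. (16)–(17) of the main text; equivalently, the AUC is the probability that a removed edge outranks a kept one. A minimal sketch with made-up scores (the helper name and the toy inputs are illustrative assumptions):

```python
def roc_auc(scores, removed):
    # scores: perturbation score per existing edge, rescaled to [0, 1]
    # removed: 1 if the edge belongs to the perturbation set dE, else 0
    # The ranking form below equals the area under the TPR/FPR curve
    # of Eqs. (16)-(17), with ties counted as half a success.
    pos = [s for s, r in zip(scores, removed) if r == 1]
    neg = [s for s, r in zip(scores, removed) if r == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# perfectly separable toy case: removed edges all score higher
auc = roc_auc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])  # -> 1.0
```

A value of 0.5 corresponds to random guessing, e.g., when all edges receive the same score.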
Interestingly, for scale-free networks, we observe that some removed edges can be much more challenging to infer than others. This emergent aspect is caused by removing core edges that disconnect the network into small inactive components. If certain edges have been removed deep within the smaller components of the disconnected tree structure, they are practically impossible to infer because of the absence of activity: the only time frame in which to infer them is just after the perturbation, while the activity is rapidly decaying in the periphery. We show an example in Fig. 2, where an edge from the perturbation does not induce a large likelihood drop. This edge connects two nodes whose activity vanishes after the perturbation, which explains how difficult it is to infer.
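The disconnection mechanism can be checked directly: a breadth-first search over a toy graph (assumed here for illustration, not one of the studied networks) confirms which nodes are cut off from the active core after an edge removal, and thus which remaining edges carry no signal.

```python
from collections import deque

def component(adj, src):
    # BFS over an adjacency dict {node: set(neighbours)};
    # returns the set of nodes reachable from src
    seen, queue = {src}, deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u] - seen:
            seen.add(v)
            queue.append(v)
    return seen

# a path graph 0-1-2-3; removing the core edge (1, 2) splits it in two
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
adj[1].discard(2)
adj[2].discard(1)
cut_off = component(adj, 2)  # nodes 2 and 3 are separated from node 0
```

Edges inside `cut_off`, such as (2, 3), produce no observable activity once the component dies out, matching the hard-to-infer edges discussed above.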
We investigate the effect of decreasing the length of the perturbed time series. More precisely, the moment of perturbation $\tau$ is moved closer to the end of the time series. In Fig. 1(A), we have simulated the dynamics over a fixed $T = 200$ time steps and executed the perturbation at various times $t = \tau$.

Figure 1: AUC on various schemes with SIS dynamics. In (A), we have varied when the change point $\tau$ occurs; the length $T - \tau$ of the perturbed time series is displayed on the x-axis (scale-free graphs, $N = 100$, $m = 1$, with $T = 200$). (B) AUC for loopy scale-free graphs ($N = 100$, $m = 3$, $T = 200$, $\alpha = 0.1$). (C–D) AUC as a function of the number of removed edges (C: random graphs with $N = 100$ and $\beta = 0.1$; D: scale-free graphs with $N = 100$, $m = 1$).

We now detail the likelihood of the SIS model. Consider the transition probability $\Pr[x_i(t) \mid x_i(t-1), A, X(t-1), \theta]$ for the noiseless SIS model, $p_{\mathrm{flip}} = 0$. If node $i$ is infected at time $t-1$, i.e., $x_i(t-1) = 1$, then it may have recovered with probability $\beta$, i.e., $x_i(t) = 0$, or stayed infected with probability $1 - \beta$, i.e., $x_i(t) = 1$. Its contribution to the transition probability is

$$x_i(t-1)\big[ x_i(t)(1-\beta) + (1 - x_i(t))\beta \big]. \qquad (11)$$

If the node is susceptible at time $t-1$, i.e., $x_i(t-1) = 0$, it may get infected with probability $1 - (1-\alpha)^{l_i}$, where $l_i = \sum_j a_{ij} x_j(t-1)$ is the number of infected neighbors, or stay susceptible with probability $(1-\alpha)^{l_i}$. Its contribution to the transition probability is

$$[1 - x_i(t-1)]\big[ x_i(t)\big(1 - (1-\alpha)^{l_i}\big) + (1 - x_i(t))(1-\alpha)^{l_i} \big]. \qquad (12)$$

Combining (11) and (12), the transition probabilities are

$$\Pr[w \mid v; A, X(t-1), \alpha, \beta] = v\, R(w, \beta) + [1 - v]\, R(w, g_l), \qquad (13a)$$

where $w \equiv x_i(t)$, $v \equiv x_i(t-1)$, $g_l = (1-\alpha)^{l_i}$, and

$$R(y, z) = y(1-z) + (1-y)z. \qquad (13b)$$

Figure 2: Example of inference on the SIS dynamics with $\alpha = 0.12$. (A) GNN scores $\hat{P}_{ij}$ of removed and non-removed edges. (B) Bayesian scores. (C) Relative likelihood drop $1 - \Pr(X \mid A^*, \theta)/\Pr(X \mid A, \theta)$ when removing each edge, ordered as in (A–B). (D) The network used for the example, where the red edges and the black edge are the perturbation. The black edge is a false negative according to the model; only the red edges have been correctly predicted by the GNN model with a high score. The nodes are colored according to their average activity after the perturbation.

Because this is a Markov process, the likelihood of a time series is the product of the transition probabilities,

$$\Pr(X \mid A, \theta) = \prod_{i=1}^{N} \prod_{t>\tau}^{T} \Pr[x_i(t) \mid x_i(t-1), A, X(t-1), \theta]. \qquad (14)$$

We now consider the SIS model with a flip probability $p_{\mathrm{flip}}$. We introduce the observed variables $X$ and the noiseless activity $Y$, i.e., the true activity before executing the flips. Since we do not observe the whole state of the system ($X$ and $Y$), correlation remains between $X(t)$ and the observations earlier than $X(t-1)$, and $X$ alone is non-Markovian. However, we expect the correlation to decrease quickly with time, so we use a Markovian approximation to write the likelihood. In other words, we use only $X(t-1)$ to describe $X(t)$, as before.

First, consider that node $i$ is truly infected at time $t-1$, i.e., $y_i(t-1) = 1$. Assuming that we observe $x_i(t-1)$, the flip succeeded with probability $p_{\mathrm{flip}}$ and failed with probability $1 - p_{\mathrm{flip}}$,

$$\Pr(y_i(t-1) = 1 \mid v) = v[1 - p_{\mathrm{flip}}] + [1 - v]\, p_{\mathrm{flip}} \equiv R[v, p_{\mathrm{flip}}], \qquad (15)$$

where $v = x_i(t-1)$. The node may then recover with probability $\beta$ or stay infected with probability $1 - \beta$. Second, the flip process succeeds with probability $p_{\mathrm{flip}}$ or fails with probability $1 - p_{\mathrm{flip}}$. This composition is enclosed in $R[x_i(t), R(\beta, p_{\mathrm{flip}})]$. Therefore, the first contribution to the transition probability is

$$R[x_i(t-1), p_{\mathrm{flip}}]\; R[x_i(t), R(\beta, p_{\mathrm{flip}})]. \qquad (16)$$

The second contribution to the transition probability assumes that node $i$ is truly susceptible at time $t-1$, $y_i(t-1) = 0$, which occurs with probability $1 - R[x_i(t-1), p_{\mathrm{flip}}]$. Again, two processes form the step: the possible infection of node $i$ and the flip process with probability $p_{\mathrm{flip}}$. Therefore, the second contribution is

$$\big(1 - R[x_i(t-1), p_{\mathrm{flip}}]\big)\; R[x_i(t), R(F_i, p_{\mathrm{flip}})], \qquad (17)$$

where $F_i$ is the probability that node $i$ does not get infected by its neighbors, i.e., $\Pr(y_i(t) = 0 \mid y_i(t-1) = 0, A, \alpha, \beta)$. To obtain $F_i$, we need to consider the complete neighborhood of node $i$. If a neighbor $j$ is observed infected, $x_j(t-1) = 1$, then two events could have occurred:

1. Node $j$ is infected at time $t-1$, i.e., $y_j(t-1) = 1$, in which case the flip failed with probability $1 - p_{\mathrm{flip}}$. It fails to infect node $i$ with probability $1 - \alpha$;
2. Node $j$ is susceptible at time $t-1$, i.e., $y_j(t-1) = 0$, in which case the flip noise succeeded with probability $p_{\mathrm{flip}}$, and it fails to infect node $i$ with probability 1.

As we observe $l_i$ infected nodes, the contribution of the observed infected neighbors is

$$[p_{\mathrm{flip}} + (1 - p_{\mathrm{flip}})(1-\alpha)]^{l_i}. \qquad (18)$$

For a node $i$ with $k_i$ neighbors, a similar argument can be made for the $k_i - l_i$ observed susceptible neighbors of node $i$, i.e., $x_j(t-1) = 0$ for all $j$ with $a_{ij} = 1$. The total contribution is

$$[(1 - p_{\mathrm{flip}}) + p_{\mathrm{flip}}(1-\alpha)]^{k_i - l_i}. \qquad (19)$$

Combining (18) with (19), we obtain

$$F_i = [p_{\mathrm{flip}} + (1 - p_{\mathrm{flip}})(1-\alpha)]^{l_i}\, [1 - p_{\mathrm{flip}} + p_{\mathrm{flip}}(1-\alpha)]^{k_i - l_i}. \qquad (20)$$

Table 2: Summary properties of ecological networks. Average, minimum, and maximum values of structural properties of the 25 food webs we have used. The average local clustering is the average of the fraction of existing over possible triangles through each node [2].

Network property           Mean    Min     Max
Nodes                      76.2    14      249
Edges                      487.4   38      3313
Density                    0.097   0.0293  0.263
Average local clustering   0.10    0.016   0.358

From (16) and (17), the transition probabilities of the SIS model with flips are

$$\Pr[w \mid v; A, X(t-1), \alpha, \beta] = R[v, p_{\mathrm{flip}}]\; R[w, R(\beta, p_{\mathrm{flip}})] + \big(1 - R[v, p_{\mathrm{flip}}]\big)\; R[w, R(F_i, p_{\mathrm{flip}})], \qquad (21)$$

where $w \equiv x_i(t)$, $v \equiv x_i(t-1)$, and $F_i$ is given by (20). As a reminder, correlations remain between the observed states of a node separated by more than one time step; therefore, the likelihood is only approximately equal to the product of the transition probabilities [Eq. (14)].

References

[1]
W. Hamilton, Z. Ying, and J. Leskovec , Inductive representation learning on large graphs , in NeurIPS, 2017, pp. 1024–1034.[2]
M. E. Newman , Networks , Oxford university press, 2018.[3]
P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio, Graph attention networks, arXiv preprint arXiv:1710.10903, (2017).
[4]
K. Xu, W. Hu, J. Leskovec, and S. Jegelka, How powerful are graph neural networks?, arXiv preprint arXiv:1810.00826, (2018).