Reconstructing large networks with time-varying interactions
Chun-Wei Chang, Takeshi Miki, Masayuki Ushio, Hsiao-Pei Lu, Fuh-Kwo Shiah, Chih-hao Hsieh
11 arXiv: Research article Reconstructing large networks with time-varying interactions
Authors:
Chun-Wei Chang , Takeshi Miki , Masayuki Ushio , Hsiao-Pei Lu , Fuh-Kwo Shiah , Chih-hao Hsieh Affiliations: National Center for Theoretical Sciences, Taipei 10617, Taiwan Research Center for Environmental Changes, Academia Sinica, Taipei 11529, Taiwan Institute of Oceanography, National Taiwan University, Taipei 10617, Taiwan Department of Environmental Engineering and Ecology, Faculty of Advanced Science and Technology, Ryukoku University, Otsu, Shiga, 520-2194, Japan Center for Biodiversity Science, Ryukoku University, Otsu, Shiga, 520-2194, Japan Hakubi Center, Kyoto University, Kyoto, Kyoto 606-8501, Japan Center for Ecological Research, Kyoto University, Otsu, Shiga 520-2113, Japan Department of Biotechnology and Bioindustry Sciences, National Cheng Kung University, Tainan, Taiwan Institute of Ecology and Evolutionary Biology, Department of Life Science, National Taiwan University, Taipei 10617, Taiwan * Correspondence to: Chih-hao Hsieh;
Email: [email protected]
Running title:
Reconstruction of high-dimensional networks
Keywords:
Interaction network, Network topology, Dynamical stability, Microbial community.
Number of words in abstract: 196 Number of words in main text: 4252 Number of cited references: 47 Number of tables & figures: 4 figures and 1 table
Abstract
Reconstructing interactions from observational data is a critical need for investigating natural biological networks, wherein network dimensionality (i.e. number of interacting components) is usually high and interactions are time-varying. These pose a challenge to existing methods that can quantify only small interaction networks or assume static interactions under steady state. Here, we proposed a novel approach to reconstruct high-dimensional, time-varying interaction networks using empirical time series. This method, named “multiview distance regularized S-map”, generalized the state space reconstruction to accommodate high dimensionality and overcome difficulties in quantifying massive interactions with limited data. When we evaluated this method using the time series generated from a large theoretical model involving hundreds of interacting species, estimated interaction strengths were in good agreement with theoretical expectations. As a result, reconstructed networks preserved important topological properties, such as centrality, strength distribution and derived stability measures. Moreover, our method effectively forecasted the dynamic behavior of network nodes. Applying this method to a natural bacterial community helped identify keystone species from the interaction network and revealed the mechanisms governing the dynamical stability of bacterial community. Our method overcame the challenge of high dimensionality and disentangled complex time-varying interactions in large natural dynamical systems. Introduction
Interaction network critically determines the dynamics and stability of dynamical systems (Albert et al. et al. et al. et al. et al. et al. et al. et al. et al. et al. et al. More recently, empirical dynamic modeling (EDM) based on state space reconstruction (SSR) of dynamical attractor was proposed to quantify interactions in nonlinear dynamical systems. Convergent cross mapping (CCM) (Sugihara et al. et al. et al. et al. et al. et al. m ). This restriction arises due to the “curse of dimensionality” (Bellman 1957); high dimensionality likely makes data too sparse to correctly depict neighboring relationships among data points (Hastie et al. et al. et al. et al. et al. m ), time-varying interaction networks in large, nonlinear dynamical systems (e.g., ecological communities). The advance of our approach is developing a novel distance measure (Fig. 1) that quantifies the neighboring relationships among high-dimensional data points in SSR. We refer to this measure as “multiview distance” because it is determined by ensembling numerous distances measured in various low-dimensional, topologically equivalent SSRs (i.e., multiview embeddings (Ye & Sugihara 2016)). Through multiview distance, our approach links two existing EDM methods, multiview embedding (Ye & Sugihara 2016) and regularized S-map (Cenci et al. E ) much smaller than network dimensionality (i.e., E ≪ m ). This method, named “multiview distance regularized S-map” (abbreviated as MDR S-map, hereafter), is summarized in two steps (Fig. 1): (i) measuring multiview distances among m -dimensional data from various E -dimensional multiview SSR that embeds various combinations of variables (Ye & Sugihara 2016); and (ii) determining m -dimensional interaction matrix at each time point by plug-in multiview distances to S-map algorithm with regularization constraints (Cenci et al. I. MDR S-map algorithm and implementation ). More detail statistical properties of MDR-S-map were also offered in SI Text,
II.
Statistical properties of MDR S-map.
Through the two-step procedure of MDR S-map, we reconstructed time-varying, high-dimensional interaction networks for large dynamical systems using time series data. We tested the MDR S-map method using simulated time-series data from a stochastic population dynamic model ( m =117 variables and N =100 time points; Methods and SI Text, III.
Parameterization of theoretical models ), where interaction strengths and network properties are exactly known. In particular, we examined whether the reconstructed networks preserved the three types of network properties (Methods), including: i) topological properties of network node; ii) topological properties of the entire network; and iii) network stability measures. Then, we employed this method to analyze empirical bacterial community data (dominant OTUs (=operational taxonomic units) with m =136 and N =90) collected from a natural coastal environment (Martin-Platero et al. m =117 and 136 for simulated and empirical bacterial community data, respectively) were more than the length of time series data ( N =100 and 90 for simulated and empirical data, respectively), a situation that very likely occurs in real-word datasets concerning large interaction networks. Based on analysis of a bacterial community, we demonstrated a real-world application of MDR S-map for reconstructing large, time-varying interaction networks and unveiling the interplay among network properties, dynamics and stability. Materials and Methods
MDR S-map
Our method of reconstructing large interaction networks using time series data was based on EDM (an approach rooted in attractor reconstruction). For attractor reconstruction, a critical parameter is the optimal embedding dimension (Kennel et al. et al. et al. et al. I . MDR S-map algorithm and implementation and II.
Statistical properties of MDR S-map . Assessing MDR S-map based on a theoretical network model
We validated the MDR S-map method using simulated time series data generated from a theoretical network model, as that interaction strengths and network topological properties are known a priori . This network model initially consists of 1000 mutually interacting species. Let N i ( k ) denote the population size of species i ( i = 1, 2, …, 1000) at time step t ( t = 0, 1, 2, …, n ), and N ( t ) = ( N ( t ), N ( t ), …, N ( t )) T . Population dynamics of species i followed the multi-species Ricker model, ( ) ( 1) ( ) exp 1 ( ) i i i i N t N k r e t + = + MN ------(Eq. 3.), where r is intrinsic growth rate of species i , e i is a vector taking zero values for all except for the i th entity, M represents time-independent interaction matrix with the size 1000 x 1000 ( m ij represent the effect of species j on species i ). The detail parameterization of the multi-species Ricker model was offered in SI Text, III. Parameterization of theoretical models . Analyzing model interaction networks
For model time series, we only used the last 100 time steps out of the 1000 steps simulations for further analyses. Because the model populations had chaotic dynamics and only some of these populations could co-exist until the end of model computations, we only selected dominant species with relative abundance > 0.1% (calculated over the last 100 steps) for further analyses. These dominant species had obvious temporal fluctuations, which is necessary for applying EDM. In total, 117 species were selected. Then, we applied the MDR S-map to reconstruct the interaction networks from the model time series and compared the reconstructed networks with the theoretical networks derived from the Jacobian matrix of the difference equation models. To make reasonable comparisons, we multiplied the theoretical expectation of Jacobian ( tt XY + ) by the ratio of standard deviations, YX , because time series data were normalized with respect to the temporal standard deviations prior to performing MDR S-map. Finally, we examined whether the estimated interaction strengths were consistent with the scaled theoretical Jacobians. Based on the reconstructed interaction networks, we aimed to verify whether or not: 1) the interaction strengths can be successfully recovered at each time point; and 2) topological properties of network can be preserved. In particular, we focused on three types of topological properties quantified in directed and weighted networks. First, we considered the topological properties that evaluated the importance of each network node, including in-, out-, and all interaction strengths (Barrat et al. topological indices summarizing the entire interaction network, including mean transitivity (Barrat et al. et al. Analysis of empirical bacterial time series data
We applied MDR S-map on empirical time series data of natural bacterial communities in Canoe Beach, Boston, MA, USA (Martin-Platero et al.
II-2.
Statistical properties of MDR S-map based on percentage data )). The reconstructed interaction networks enabled us to estimate how stability of the bacteria community changed through time and why. To investigate causal mechanisms underlying structural stability of the bacterial community, we examined causal relationships between structural instability (i.e., trace of interaction matrix) versus summary statistics for network topology (e.g., mean and S.D. of interaction strength), ecological properties characterizing bacterial diversity (e.g., Shannon diversity) and physicochemical environments (e.g., nutrients and salinity). To examine relationships, we applied linear analysis (temporal correlation) to determine the statistical association and nonlinear analysis (CCM) to identify causality. For linear correlation analysis, we calculated the Pearson correlation coefficient between pairs of time series and tested the significance using a stationary bootstrap that accommodates autocorrelations in time series. For nonlinear causality test, we performed CCM analysis (Chang et al. Identifying causal variables by CCM in SI Text I-1.).
Computation
All analyses were done with R (ver. 3.1.2). Simplex projection and CCM analyses were implemented using the rEDM (Version 1.2.3) package (Ye et al.
Results and Discussion
One-step forward predictability of network nodes and optimal regularization algorithms
Prior to analyzing the reconstructed networks, we evaluated the forecasting ability of MDR S-map for the dynamics of network nodes (e.g., one-step forward forecast of future node states), as forecasting skills represent a proxy of reliability for reconstructed dynamical systems (Cenci et al. et al. methods in empirical bacteria time series (Table 1). Specifically, the MDR S-map results, based on classical elastic-net regularization, had the best performance in both in-sample and out-of-sample forecasts. The MDR S-map results, based on adaptive elastic-net regularization, also outperformed all existing EDM methods for out-of-sample forecast, but had similar performance as the regularized S-map for in-sample forecast (Table 1). It is noteworthy that existing EDM methods (e.g. multivariate S-map) have been demonstrated to outperform other linear time series analyses on forecasting nonlinear dynamical systems (Sugihara et al. et al. et al. Evaluating the quality of reconstructed interaction strength using simulated datasets MDR S-map correctly quantified the strengths of time-varying interactions (i.e., Jacobians b ij ( t ) quantifying the influence of node j on i at time t ) in high-dimensional ( m = 117) interaction networks. The estimated interaction strengths were highly consistent with theoretical expectations derived from models (Fig. 2). Such strong consistency held throughout all analyzed time points (Pearson r with mean SD=0.79 r =0.930; Fig. S2A) obtained from temporal medians of all interaction strengths (i.e., long-term b ij = median( b ij ( t )); t =1, 2, …, n ). In contrast, the theoretical long-term medians had a weak negative relationship with interaction strengths inferred from the correlation coefficients between each pair of time series (Pearson r =−0.081 and p <0.01 in Fig. S2B), suggesting that correlation between time series provided no clear information for knowing true interaction strength in nonlinear systems, as reported (Sugihara et al. et al. et al. IV.
Testing robustness of the MDR S-map against data noise ). In addition, the consistency between MDR S-map estimations and theoretical expectations persisted, even when some critical network nodes (e.g., dominant species ranked in top 10% of abundance) were artificially removed from the analysis (Fig. S8). Therefore, we inferred that the reconstruction of network subgraphs was still reliable, even if some critical nodes (Fig. S8) or external environmental forcing (Figs. S4-S5) were unobservable or excluded from analyses for practical reasons. Alternate EDM approaches were unable to recover entire networks, as these methods cannot accommodate a large number of interactions in SSR models. A recent study (Ushio 2020) quantified interaction by increasing embedding dimension in regularized S-map (Cenci et al. interactions with moderate accuracy, is generally difficult to estimate network topology and stability measures (Figs. S9-S11; SI Text, V. Importance of embedding dimension in the estimation of network topology ). Nevertheless, we recognized that our method had some limitations . For example, reconstructed networks did not reveal all existing interactions (i.e., false-negatives). That is, the number of interactions (i.e., edge number) detected in our reconstructed network may be less than that in the model network (median edge number=2098 and 889 for model and reconstructed networks, respectively) due to the difficulty in estimating weak interactions (Fig. 2).
Evaluating the reconstructed network properties using simulated datasets
Topological properties of network nodes The reconstructed interaction networks preserved key topological properties of network nodes, such as strength (i.e., weighted degree), distribution of strength, and centrality, as demonstrated using an example showing the reconstructed network observed at time=950 (Fig. 3A-D). Topological indices derived at all time points also agreed with theoretical expectations (Table S1), although indices were slightly underestimated (slope <1) due to the bias of regularization (Hastie et al.
II. Statistical properties of MDR S-map ). Similarly, the estimated out- and all-strengths (in-strength + out-strength) of each node also show strong positive associations with the theoretically expectations (Table S1). Consequently, the reconstructed cumulative distributions of in-strengths and all-strengths preserved the shape (the second or higher order moments) of theoretical distributions (all p >.05 for Kolmogorov-Smirnov test; Fig. 3B and Table S1). However, the shape of reconstructed out-strength distribution was not preserved, due to fewer nodes with weak out-strength (Table S1). In addition to node strength, more complex centrality indices that evaluate the importance of nodes in networks (Newman 2004), including eigenvector centrality (Fig. 3C), authority (Fig. 3D), and hub (Table S1), were also highly consistent with theoretical expectations. Topological properties of entire network The reconstructed interaction networks well preserved the topological properties of entire interaction networks. The topological property summarizing entire networks, such as transitivity, was well preserved (reconstructed:0.15 .01 and true:0.16 .08; permutation test p -value=0.511). Other topological properties, such as mean and standard deviation of interaction strengths, although slightly underestimated, had temporal dynamics that were highly associated with the dynamics of theoretically expected values (Fig. 3E, F). That is, these reconstructed topological properties were relatively correct; in a practical sense, due to this high consistency, we inferred that our approach was capable of monitoring topological changes of interaction networks in time. Network stability measures Stability measures derived from reconstructed interaction matrices (Jacobian matrices), including local Lyapunov instability (i.e., the norm of complex dominant eigenvalue; Fig. 3G) and structural instability (matrix trace; Fig. 3H), also had strong positive associations with theoretical expectations. More specifically, the estimated and theoretical local Lyapunov instability exhibited clear in-phase dynamics with concordant peaks and valleys; however, this concordance only implied a ranked but not quantitative relationship, as revealed by high Spearman and low Pearson correlations, respectively (Fig. 3G). In contrast, both ranked and quantitative relationship were preserved for structural instability (Fig. 3H). Therefore, our inferences on network stability were still reliable, although our method cannot reliably quantify all weak interactions that have long been suggested important for stability (McCann et al. et al.
Applications to a real-world example of bacteria community Keystone bacterial species identified by node centrality As an empirical example, we reconstructed bacterial interaction networks (Movie S1) in a natural coastal environment. On average, the key OTUs exerting strong effects on others (out-strength) mainly belonged to order Flavobacteriales and order Rhodobacterales (Fig. S12) in which many members are copiotrophic species (Fuhrman et al. et al. et al.
Deciphering causal mechanisms governing dynamical stability of communities Reconstruction of high-dimensional interaction networks revealed causal mechanisms underlying dynamical stability of natural bacterial communities. Firstly, structural instability derived from the bacterial interaction networks was affected by mean interaction strength ( p -value of CCM defined in Method, p CCM =0.027 in Fig. 4A). Moreover, there was a marginally significant negative association between structural instability and mean interaction strength (Pearson r =−0.250, p bootstrap =0.098). Because mean interaction strengths were mostly positive, this negative association implied that bacteria communities became more stable if more facilitative interactions occurred in the communities. Moreover, facilitative interactions dominated the interaction networks under productive environments, as revealed by high concentrations of chlorophyll a (Fig. 4B) and nutrients (e.g., silicate in Fig. 4C). Likely, nutrients facilitated the growth of primary producers producing abundant organic matters, which bring common benefits for various bacteria involved in various stages of organic decomposition and enhance the facilitative interactions. In addition, the bacterial community was more structurally stable when the Shannon diversity of bacterial community was higher (Fig. 4D), confirming previous findings of positive biodiversity effects on ecosystem stability (Chang et al. Concluding remarks
The MDR S-map approach proposed in this study overcame the curse of dimensionality in network reconstruction. To our best knowledge, this is the first study demonstrating feasibility of quantifying interaction networks using time series data alone in high-dimensional nonlinear dynamical systems. Although the number of interactions cannot be exactly recovered, due to a lack of statistical power to differentiate some weak interactions from the absence of interaction, our method correctly quantified most critical interactions for evaluating network topology and stability. This analytical framework was applied in bacteria communities (Fig. 4) and can be easily extended to other real-world systems for searching key nodes or interactions from large networks, as long as time series of network nodes are available. Therefore, we appeal to collect high-quality time series data from various systems. As such, reconstruction of diverse types of interaction networks is expected to improve our understanding regarding complex interactions and emergent properties of large dynamical networks involving enormous numbers of interacting components. Acknowledgments:
We are grateful for comments from Po-Ju Ke, Cheng-han Tsai and Kazuhiro Takemoto. This study was supported by National Taiwan University, National Center for Theoretical Sciences, Foundation for the Advancement of Outstanding Scholarship, and Ministry of Science and Technology, Taiwan (to CHH).
References
Albert, R., Jeong, H. & Barabási, A.-L. (2000). Error and attack tolerance of complex networks.
Nature , 406, 378-382. Barrat, A., Barthélemy, M., Pastor-Satorras, R. & Vespignani, A. (2004). The architecture of complex weighted networks.
Proc. Natl. Acad. Sci. U.S.A. , 101, 3747-3752. Bellman, R.E. (1957).
Dynamic Programming . Princeton University Press. Berry, D. & Widder, S. (2014). Deciphering microbial interactions and detecting keystone species with co-occurrence networks.
Front. Microbiol. , 5. Bonacich, P. (1987). Power and Centrality: A Family of Measures.
Am. J. Sociol. , 92, 1170-1182. Cenci, S., Medeiros, L.P., Sugihara, G. & Saavedra, S. (2020). Assessing the predictability of nonlinear dynamics under smooth parameter changes.
J. R. Soc. Interface , 17, 20190627. Cenci, S. & Saavedra, S. (2019). Non-parametric estimation of the structural stability of non-equilibrium community dynamics.
Nat. Ecol. Evol. , 3, 912-918. Cenci, S., Sugihara, G. & Saavedra, S. (2019). Regularized S‐map for inference and forecasting with noisy ecological time series.
Methods Ecol. Evol. , 10, 650-660. Chang, C.-W., Ushio, M. & Hsieh, C.-h. (2017). Empirical dynamic modeling for beginners.
Ecol. Res. , 32, 785-796. Chang, C.-W., Ye, H., Miki, T., Deyle, E.R., Souissi, S., Anneville, O. et al. (2020). Long-term warming destabilizes aquatic ecosystems through weakening biodiversity-mediated causal networks.
Global Change Biol. , 26, 6413-6423. Craft, C. (2007). Freshwater input structures soil properties, vertical accretion, and nutrient accumulation of Georgia and U.S tidal marshes.
Limnol. Oceanogr. , 52, 1220-1230. Csardi, G. & Nepusz, T. (2006). The igraph software package for complex network research.
InterJournal, complex systems , 1695, 1-9. Deyle, E.R., Fogarty, M., Hsieh, C.-h., Kaufman, L., MacCall, A.D., Munch, S.B. et al. (2013). Predicting climate effects on Pacific sardine.
Proc. Natl. Acad. Sci. U.S.A. , 110, 6430-6435. Deyle, E.R., May, R.M., Munch, S.B. & Sugihara, G. (2016). Tracking and forecasting ecosystem interactions in real time.
Proc. Royal Soc. B , 283, 20152258. Emmerson, M. & Yearsley, J.M. (2004). Weak interactions, omnivory and emergent food-web properties.
Proc. Royal Soc. B , 271, 397-405. Freilich, M.A., Wieters, E., Broitman, B.R., Marquet, P.A. & Navarrete, S.A. (2018). Species co-occurrence networks: Can they reveal trophic and non-trophic interactions in ecological communities?
Ecology , 99, 690-699. Fry, B. (2006).
Stable isotope ecology . Springer. Fuhrman, J.A., Cram, J.A. & Needham, D.M. (2015). Marine microbial community dynamics and their ecological interpretation.
Nat. Rev. Microbiol. , 13, 133-146. González, J.M., Fernández-Gómez, B., Fernàndez-Guerra, A., Gómez-Consarnau, L., Sánchez, O., Coll-Lladó, M. et al. (2008). Genome analysis of the proteorhodopsin-containing marine bacterium
Polaribacter sp. MED152 (Flavobacteria).
Proc. Natl. Acad. Sci. U.S.A. , 105, 8724-8729. Griffin, A.S., West, S.A. & Buckling, A. (2004). Cooperation and competition in pathogenic bacteria.
Nature , 430, 1024-1027. Hastie, T., Tibshirani, R. & Friedman, J. (2009).
The elements of statistical learning: data mining, inference, and prediction . Springer Science & Business Media. Hoegh-Guldberg, O. & Bruno, J.F. (2010). The impact of climate change on the world’s marine ecosystems.
Science , 328, 1523-1528. Hsieh, C.-h., Glaser, S.M., Lucas, A.J. & Sugihara, G. (2005). Distinguishing random environmental fluctuations from ecological catastrophes for the North Pacific Ocean.
Nature , 435, 336-340. Kéfi, S., Domínguez-García, V., Donohue, I., Fontaine, C., Thébault, E. & Dakos, V. (2019). Advancing our understanding of ecological stability.
Ecol. Lett. , 22, 1349- Phys. Rev. A , 45, 3403-3411. Kleinberg, J.M. (1999). Authoritative sources in a hyperlinked environment.
J. ACM , 46, 604–632. Kolaczyk, E.D. & Csárdi, G. (2014).
Statistical analysis of network data with R . Springer. Martin-Platero, A.M., Cleary, B., Kauffman, K., Preheim, S.P., McGillicuddy, D.J., Alm, E.J. et al. (2018). High resolution time series reveals cohesive but short-lived communities in coastal plankton.
Nat. Commun. , 9, 266. McCann, K., Hastings, A. & Huxel, G.R. (1998). Weak trophic interactions and the balance of nature.
Nature , 395, 794. Montoya, J.M., Pimm, S.L. & Solé, R.V. (2006). Ecological networks and their fragility.
Nature , 442, 259-264. Morinière, E.C.d.l., Pollux, B.J.A., Nagelkerken, I., Hemminga, M.A., Huiskes, A.H.L. & Velde, G.v.d. (2003). Ontogenetic dietary changes of coral reef fishes in the mangrove-seagrass-reef continuum: stable isotopes and gut-content analysis.
Mar. Ecol. Prog. Ser. , 246, 279-289. Newman, M.E.J. (2004). Analysis of weighted networks.
Phys. Rev. E , 70, 056131. Rogers, T.L., Munch, S.B., Stewart, S.D., Palkovacs, E.P., Giron-Nava, A., Matsuzaki, S.-i.S. et al. (2020). Trophic control changes with season and nutrient loading in lakes.
Ecol. Lett. , 23, 1287-1297. Shalizi, C.R. (2006). Methods and Techniques of Complex Systems Science: An Overview. In:
Complex Systems Science in Biomedicine (eds. Deisboeck, TS & Kresh, JY). Springer US Boston, MA, pp. 33-114. Song, C., Von Ahn, S., Rohr, R.P. & Saavedra, S. (2020). Towards a probabilistic understanding about the context-dependency of species interactions.
Trends Ecol. Evol. , 35, 384-396. Strogatz, S.H. (2001). Exploring complex networks.
Nature , 410, 268-276. Sugihara, G., Grenfell, B., May, R.M., Chesson, P., Platt, H.M. & Williamson, M. (1990). Distinguishing error from chaos in ecological time series.
Philos. Trans. R. Soc. Lond. B. , 330, 235-251. Sugihara, G., May, R., Ye, H., Hsieh, C.-h., Deyle, E., Fogarty, M. et al. (2012). Detecting causality in complex ecosystems.
Science , 338, 496-500. Ushio, M. (2020). Interaction capacity underpins community diversity. bioRxiv , 2020.2004.2008.032524. Ushio, M., Hsieh, C., Masuda, R., Deyle, E., Ye, H., Chang, C. et al. (2018). Fluctuating interaction network and time-varying stability of anatural fish community.
Nature , 554, 360-363. Xiao, Y., Angulo, M.T., Friedman, J., Waldor, M.K., Weiss, S.T. & Liu, Y.-Y. (2017). Mapping the ecological networks of microbial communities.
Nat. Commun. , 8, 2042. Ye, H., Adam Clark, Ethan Deyle, Steve Munch, Oliver Keyes, Jun Cai et al. (2013). rEDM: Applications of empirical dynamic modeling (EDM) from time series. Ye, H., Deyle, E.R., Gilarranz, L.J. & Sugihara, G. (2015). Distinguishing time-delayed causal interactions using convergent cross mapping.
Sci. Rep. , 5, 14750. Ye, H. & Sugihara, G. (2016). Information leverage in interconnected ecosystems: Overcoming the curse of dimensionality.
Science , 353, 922-925. Yu, Z., Gan, Z., Huang, H., Zhu, Y. & Meng, F. (2020). The varying bacterial interactions revealed by regularized S-map.
Appl. Environ. Microbiol. , AEM.01615-01620. Zinzalla, G. & Thurston, D.E. (2009). Targeting protein–protein interactions for therapeutic intervention: a challenge for the future.
Future Med. Chem. , 1, 65-93. Zou, H. & Zhang, H.H. (2009). On the adaptive elastic-net with a diverging number of parameters.
Ann. Stat. , 37, 1733-1751. Fig. 1. Schematic illustration for MDR S-map analysis procedure.
The MDR S-map consists of two steps. The aim of the first step is to obtain the multiview distance that operates at the optimal embedding dimension ( E ). (1-1) Neighborhood relationships [inferred from the distance, d M (.)] among high-dimensional data points X ( t ) were recovered in numerous low-dimensional state space reconstruction (SSR; M , M …, M C ) (i.e., multiview SSR). Among these SSRs, (1-2) we computed distances ( L norm) among data points under the optimal embedding dimension E . Collecting all these distances, (1-3) we obtained the multiview distances ( d E ) and (1-4) determined the data weights ( W E ) for the following S-map analysis. In the second step, we estimated the high-dimensional interaction strength ( B ) by S-map, based on locally weighted least square optimization (argmin B (|| . || ) ) with regularization ( λ [.]) incorporated. Specifically, (2-1) the weights, w E =exp(−θ d E /mean( d E )), derived in the first step were plugged-in the S-map optimization algorithm. (2-2) Under the constraint of regularization ( λ and α are the penalty factors of regularization), (2-3) we solved the high-dimensional local linear coefficients to approximate interaction strengths at each time step. Detailed definitions of each variable are reported in Methods and SI Text, I. MDR S-map algorithm and implementation . Fig. 2. Comparisons of interaction strengths estimated by an MDR S-map, with true strengths derived from model networks. ( A-D ) Interaction strengths of time-varying interaction networks were effectively reconstructed at every time step with high prediction skills (Pearson r and Mantel r that examine prediction skills for each corresponding pair of Jacobian elements and the entire interaction matrix, respectively); the panels are examples from four arbitrary time points at 950, 960, 970, and 980, whereas the conclusions remained for all other time points. Here, the grey lines represent the 1:1 lines, and black lines represents the slope of quantile regression based on median. The strengths were slightly underestimated (slope<1) because the penalized algorithm used in regularization leads to minor biased (underestimated) estimators. Because strength of weak interactions cannot be precisely estimated, it caused false positive and false negative findings presented as the vertical and horizontal parts, respectively, of the crosses, near the origins. Fig. 3 . Preserved topological properties in the weighted, directed network reconstructed by MDR S-map. ( A-D ) The topological properties of network nodes in the reconstructed network observed at time=950 are shown as examples. The estimated ( A ) in-strength (weighted in-degree) of each node was positively correlated with theoretical expectations. Therefore, after centralizing by mean strength, ( B ) the shape of cumulative distribution of estimated strengths was not different from that of theoretical strengths (Kolmogorov-Smirnov test p =0.947). Topological indices inferring the centrality of network nodes, including ( C ) eigenvector centrality and ( D ) authority, demonstrated significant positive correlations between the estimations and theoretical expectations. Results of topological properties of network nodes observed at the other time points were summarized (Table S1). For each time point, the topological properties characterizing the whole network were also quantified, including ( E ) mean and ( F ) standard deviation of interaction strength. Temporal dynamics of reconstructed topological properties had strong positive correlations with that of theoretical expectations. Similarly, indices inferring network stability, including local ( G ) Lyapunov instability and ( H ) structural instability, also demonstrated concordant temporal fluctuations in reconstructed versus theoretical networks. Fig. 4. Reconstructed interaction networks from empirical daily time series revealed causal mechanisms determining structural stability in a coastal bacterial community ( A ) There was a significant negative association between structural instability and mean interaction strength. In this system, there were more positive (facilitative) interactions when ( B ) primary production (approximated by chlorophyll a concentration) and ( C ) nutrient concentration (e.g., silicate) were high. Moreover, structural instability decreased with increasing Shannon diversity index ( D ). In addition, local environmental disturbances caused by terrestrial freshwater input destabilized the dynamics of marine bacterial community ( E ). Correlation analysis with stationary bootstrap ( r and p bootstrap ) and causality analysis with Convergent Cross Mapping ( ρ CCM and p CCM ) were used to decipher the mechanisms (see Methods). Table 1. Comparison of forecast performance among various EDM methods.
Three types of forecast skill, Pearson r correlation coefficient, rMSE (root mean square error), and MAE (mean absolute error), were calculated and averaged for all individual nodes (i.e., 117 species and 135 OTUs in the simulated and empirical datasets, respectively). Here, we summarized the computed forecast skills as mean±SD. Irrespective of forecast skill types, the MDR S-map had the best performance in both leave-one-out cross-validation (in-sample) and out-of-sample forecasts. The Pearson r for out-of-sample forecast was not calculated, as only two test samples were used in this analysis. Data
Forecast skill
Type of evaluation
Univariate S-map Multivariate S-map
Regularized S-map (classical elastic-net)
MDR S-map (classical elastic-net) MDR S-map (adaptive elastic-net) Model
Pearson r In-sample Out-of-sample - - - - - rMSE In-sample Out-of-sample MAE
In-sample Out-of-sample Bacterial community
Pearson r In-sample 0.48 rMSE In-sample 0.90 MAE
In-sample 0.54 Supporting Information for
Reconstructing large networks with time-varying interactions
5 Chun-Wei Chang, Takeshi Miki, Masayuki Ushio, Hsiao-Pei Lu, Fuh-Kwo Shiah, Chih-hao Hsieh*. *Correspondence to: Chih-hao Hsieh; Email:[email protected] 10
This file includes:
Supplementary Information Text I to V I. MDR S-map algorithm and implementation
I-1. Step 1: Obtaining multiview distance and weights I-2. Step 2: Estimation of high-dimensional interaction strength using regularized S-map I-3. Parameter selection based on cross-validation
I-4. State dependency of estimated interaction strength I-5. Dimensional difference in the two-step procedure of MDR S-map
20 II. Statistical properties of MDR S-map
II-1. Statistical properties of interaction strengths estimated by ordinary and regularized S-map II-2. Statistical properties of MDR S-map based on percentage data
III. Parameterization of theoretical model 25
IV. Testing robustness of the MDR S-map against data noise
IV-1. Random noises generation IV-2. Robustness of MDR S-map to noises
V. Importance of embedding dimension in the estimation of network topology 30 2 Figures S1 to S12 Tables S1 to S2 Legends for Movies S1 SI References Other supplementary materials for this manuscript include the following:
Movies S1
10 3
Supplementary Information Text
I. MDR S-map algorithm and implementation
This section offered the detail algorithm for the implementation of multiview distance regularized S-map (MDR S-map), which can be achieved in two steps.
I-1. Step 1: Obtaining multiview distance and weights
5 The implementation of multiview SSR In the first step, we determined the multiview distance that depicted the neighbors of a high-dimensional system state via the ensemble of numerous distances measured in various low-dimensional state space reconstructions at the optimal embedding dimension ( E ). Practically, the optimal embedding dimension ( E i ) for a network node, x ( i ) was obtained using 10 univariate simplex projection (Sugihara & May 1990) that optimized a one-step forward forecast, predicting the future states of x ( i ) . With the determined optimal embedding dimension, numerous low-dimensional SSRs were built by embedding various combinations of causal variables and their time-lags (maximal lag=3, following multiview SSR (Ye & Sugihara 2016)). For instance, to make one-step forward forecast on one of the m network nodes x ( i ) , we 15 considered a subset of variables ( )( ) ( ) { , ,..., } Ei ii i x x x from all the q i causal variables of x ( i ) . Generally, the number of causal variables affecting x ( i ) ( q i ) and network dimensionality ( m ) were much larger than the optimal embedding dimension ( E i ) in large dynamical systems, i.e., m ≥ q i >> E i .
20 Identifying causal variables by CCM Practically, all causal variables from m network nodes are determined using convergent cross-mapping (CCM) (Sugihara et al. x ( i ) to all other m -1 network nodes (Chang et al. ρ ( L ) (correlation coefficients between observations and predictions) when increasing time series library, L . Specially, we tested convergence by testing the significance of: 1) monotonic increasing trend in ρ ( L ) using Kendall’s τ test; and 2) converged cross-mapping skill ρ ( L max ) using a Student’s t -test, where the significance of CCM requires that both tests are significant (i.e., max( p Kendall-test , p t -test )< α =0.05) (Chang et al. Multiview distance and weights derived based on the ensemble approach With the determined causal variables and the optimal embedding dimension of the target node i , we selected various subsets of causal variables to build numerous low-dimensional SSR ( M c ; c denotes any one of the combinations consisting of the causal variables) operated at the 10 optimal embedding dimension, E i . For all these SSR reconstructed for node i , we calculated the distance (L norm) between every pair of embedded states observed at various time points, e.g., ( , )( , ) ( , ) ( ) ( , ,..., )( ) Ei c ic i c ic u u X t x x x t = and, ( , )( , ) ( , ) ( ) ( , ,..., )( ) Ei c ic i c ic v v X t x x x t = , where ( , ) k c i x the k -th causal variable selected for the node i in the variable combination c , respectively. :{ , ,... } an, d u v u vN t t tT t t tt , using their Euclidean distance ( ( ), ( )) c cu v d X t X t . Finally, 15 we collected the distances measured under various SSR, M c and then calculated the ensemble multiview distance, ( ( ), ( )) E m u m v d X t X t , as the weighted average among all these distances, i.e., ( ( ), ( )) ( ( ), ( ))
E c cm u m v c u vc d X t X t w d X t X t = , to approximate the distance measured among high-dimensional system states, (1) (2) ( ) ( ) ( , ,..., )( ) mm X t x x x t = . Here, the weight, c w , is proportional to the forecast skill that evaluates the performance of one-step forward forecasts 20 for the future values of x ( i ) under M c, using the correlation coefficients ( ρ ) between model forecast and observations, (i.e., ( ), 1 c c cc w M w = ). In practical implementation, we randomly generated 10000 SSR from combinations of causal variables and retained only the 5 top 100 SSR with the highest one-step forecast skills ( c =1, 2, …, 100). This strategy considers the computational efficiency, since the number of low-dimensional SSR, M c , also grows combinatorically with the number of causal variables q . Finally, we prepared a N × N multiview distance matrix that collected all pairwise ensemble distances for the analyses in the next step. Again, this distance matrix was determined at the optimal embedding dimension for each node. 5 Remarks of Step 1 The computation of multiview distance under numerous low-dimensional SSR, i.e., ( ( ), ( )) E m u m v d X t X t , recovered the information of high-dimensional SSR, but avoided the computation of distance measures using all network nodes, i.e., ( ( ), ( )) m u m v d X t X t . This step 10 is critical, as mean distance among data points increases with data dimensionality, or equivalently, the sample size required to maintain constant mean distance among randomly generated data points grows exponentially with increasing dimensionality, known as the curse of dimensionality (Bellman 1957; Hall et al. X m ( t ) that are required for SSR-based approaches (Sugihara et al. et al. I-2. Step 2: Estimation of high-dimensional interaction strength using regularized S-map
Quantitative definition of interaction strength in S-map 20 The second step is to estimate interaction strengths based on the neighboring relationships determined in the first step (i.e., multiview distance and weights). In this step, we adopted an intuitive definition of interaction strength: the strength of node x ( j ) affecting node x ( i ) at time t is the response of x ( i ) at t +1 to subtle changes of x ( j ) at previous time step t , i.e., 6 ( ) ( ) ( ) ( ) i j x t x t + . In general, we could always identify a time-recursive function, F i that linked the response of x ( i ) ( t +1) to all interacting nodes observed at previous time step t , ( ) ( ) (( 1 () )) mi i x Xt F t + = , { , ,..., }: N t t tT t , (1) (2) ( ) ( ), ( ), , )( ) ( )( Tmm
X x t x t x tt = , ( ) 1 : i m F → , and i = 1, 2,…, m . Then, the interaction strength of x ( j ) affecting x ( i ) at time t is equivalent to the partial derivative of the time-recursive function F ( i ) with respect to x ( j ) , 5 ( ) ( ) ( ) /( ) mi j X tF x , evaluated at time t (i.e., the Jacobian of F ( i ) ). The derived interaction strength based on time-recursive function quantifies the interaction that occurred at the time scale between two adjacent time points (e.g., sampling interval of time series). Therefore, changing the sampling interval alters the time-recursive function and thus infers the interaction strength at various time scales. 10
S-map algorithm based on local linear approximation In reality, the time-recursive function F ( i ) is unknown and highly complicated and thus cannot be parametrically specified. To solve this problem, S-map (Sugihara et al. et al. F ( i ) locally using linear combinations of embedded variables 15 observed at each time point t k , i.e., ( ) ( ) ( ) ( )0 ) ) ) ( ) ((( ( ) ( ) )( k k mi i i sk t k k t isTm m k ksm F t f t t B b tX X x tX = = = , : , , , k N t T t t t , where x (0) = 1 is the intercept term and ( ) tk i f is the function that locally approximates the unknown function F ( i ) at time t k . Based on this local linear approximation, the interaction strengths of x ( j ) affecting x ( i ) at time t k , ( ) ( ) ( ( )) ( )/ jki k xX tF t , can be approximated by the Jacobians of these locally approximated linear functions, (i.e., local linear 20 coefficients), ( ) ( ) ( ) ( )( ) /( ) k i jt k k ij km f t x t b tX = . Therefore, we can estimate the interaction strengths (Jacobians) by estimating these local linear coefficients, ( ) ij k b t . To do so, S-map proposed a local weighted least square estimator to solve the local linear coefficients, 7
2( ) ( ) ( ) ( ) ˆ arg min ( ) k k k i i i it B t t
B Y B = −
W X --------(eq. 1), where (1) (2) ( ) ( , ,..., ) m X X X = X is a N × m data matrix collecting the time series of all network nodes; ( ) ( ) ( ) ( )1 2 ( ( ), ( ),..., ( )) i i i i TN X x t x t x t = and ( ) ( ) ( ) ( )2 3 1 ( ( ), ( ),..., ( )) i i i i TN Y x t x t x t + = are N ×1 vectors representing the current and one-step forward time series data of the node x ( i ) , respectively. ( ) k it W is a N × N diagonal ensemble weight matrix in which each diagonal entity is 5 the weight obtained from the exponential decay function of Euclidean distance, ( ( ), ( )) m q m k d X t X t , measured under SSR, ( ( ), ( ))exp 0, m q m kq d X t X tw d = − : , , , q N t T t t t , where θ is a state-dependency (nonlinearity) parameter and m q m kq d d X t X tN = is the mean distance. Here, the target point is weighted by zero ( w kk =0) to exclude the target data points from the estimating equation (i.e., leave-one-out cross-10 validation). The Jacobians are solved for each time point; thus, a time series of interaction strengths (i.e. time-varying interaction network) are estimated. Solving high-dimensional interaction strength using regularization As explained in the first step, the high dimensionality of network ( m ) is vulnerable to 15 the curse of dimensionality. To tackle this issue, we replaced the distance d measured in high-dimensional space (dimension= m ) by ensemble multiview distances d E (derived in the first step) measured at the optimal embedding dimension (dimension= E ) to determine the local weight matrix k Et W . Furthermore, the optimization based on Eq. 1 suffered from overfitting problems and has no unique solution if the dimensionality ( m ) is larger than the time series length ( N ) 20 (Bühlmann & Van De Geer 2011). Thus, we constrained the optimization by introducing 8 regularization (e.g., elastic-net (Cenci et al. ( ) ˆ k it B , ˆ arg min ( ) (1 ) k k k k k i E i i i it B t t t t B Y B B B = − + + −
W X ----------(eq. 2). Here, λ is the penalized factor, and α is the adjusted parameter balancing the regularization using L (||.|| ) or L (||.||) norm of the parameter vector ( ) k it B . The penalization on L norm 5 automatically shrinks parameters ( ) ij k b t to zero (Tibshirani 1996), if some variables x ( j ) have no or very weak effect on x ( i ) at time t k . Based on the algorithm of parameter shrinkage, the embedded variables do not need to be determined in prior as in existing multivariate EDM methods; instead, the variables having substantial effects are automatically (estimated ij b ) selected at each time point. To eliminate potential false positives, we applied adaptive elastic-10 net regularization that further shrinks the estimated Jacobians (Zou & Zhang 2009). In summary, the proposed two-step procedure of MDR S-map quantified high-dimensional interaction strengths and reconstructed interaction networks. I-3. Parameter selection based on cross-validation
15 Practically, the solution of Eq. 2 depends on nonlinearity parameter θ , penalized factor λ , and adjusted parameter α in the MDR S-map algorithm. The best parameter combination ( θ i , λ i , α i ) for each node i and estimated interaction strengths are those that minimize rMSE of the one-step forecast on ( ) ( 1) i k x t + . More specifically, the parameters were determined by leave-one-out cross-validation, in which data points collected at the testing time step t k were excluded 20 (weighted by zero) from the dataset ( Y ( i ) and X ) to obtain a one-step forecast on ( ) ( 1) i k x t + , : , , , k N t T t t t . In addition, we keep self-regulation effect of each node ( ˆ ii in Eq. 2) in the final estimation, which the estimation of ˆ ii underwent no penalization, i.e., 9 ˆ0 but 0 i ii = . To test the generalization ability of our methods, the selected parameters were applied for fitting those time series data involved in the cross-validation (i.e., library or in-sample) and also for evaluating data not involved in parameter selections (i.e., out-of-sample) (Cenci et al. t , in which each 5 entity ˆ ij ij J = was estimated as S-map coefficient representing the strength of node j affecting node i . In addition to cross-validation, we then applied out-of-sample prediction to evaluate the performance of S-map model trained by library dataset (Cenci et al. ( ) i Y and X in the eq. 2.). Estimations of out-of-samples interaction strengths were achieved by plugging-in their neighborhood relationships with respective to library data (i.e., the weights derived from ensemble distance, d E ( X library , X out-of-sample )) when solving Eq. 2. It is noteworthy that the out-of-samples predictions in interaction strengths highlighted the capacity 15 of MDR S-map to predict the one-step forward future state of entire interaction networks. I-4. State dependency of estimated interaction strength
The estimation of interaction strengths by MDR S-map is based on Jacobian ( ) ( ) ( ) ( ) i j x t x t + , considering the overall effects of abundance changes of nodes j on the 20 abundance of nodes i . This measurement changed through time, i.e., time-varying, if either one of these following scenarios occurs: i) the abundance of x ( i ) or x ( j ) is time-varying; or ii) the per capita effects of x ( j ) are time-varying. Scenario (i) usually occurs in nonlinear dynamical systems, whereas Scenario (ii) is often observed in empirical systems, but often ignored in 10 many theoretical analysis for the reason of simplicity. Certainly, a combination of (i) and (ii) are also possible. Although the per capita effects can be isolated in simple theoretical systems via the scaling of abundances (Berlow et al. I-5. Dimensional difference in the two-step procedure of MDR S-map
Unlike existing EDM methods, MDR S-map uses ensemble neighborhood relationships obtained from various views of low-dimensional SSRs (i.e., multiview distance and weights, d E and W E , obtained at the optimal dimension, E ) but accesses high-dimensional estimators of 10 interaction strengths using regularization. As such, we constructed high-dimensional, time-varying interaction networks in large dynamical systems. It is noteworthy that the dimensionality used in the first step ( E ) and the second step ( m ) were different. Under the optimal embedding dimension at the first step, the reconstructed manifold was the topological invariant to the original manifold (Sauer et al. m ) interacting with the target node ( x ( i ) ) we are concerned about, it is very likely that the effective dimensionality is not so high, as many effects are redundant; consequently, the optimal embedding dimension E is always less than m . Therefore, we do not
20 need to limit the number of potential interacting variables in Eq. 2 in the statistical inferences of interaction strengths, as the number of interacting nodes is not equal to the dimensionality of dynamical attractor reconstructed from the view of the target node x ( i ) . This reasoning, setting the dimensionality differently for different parts of algorithm, is the main difference between our method and existing EDM methods.
25 11 MDR S-map based on the ensemble approach (i.e., the first step; Fig. 1) effectively determines the neighboring relationship among high-dimensional data points at the optimal embedding dimension. The neighboring relationship is critical to investigating the dynamical behavior of nonlinear systems (Sugihara & May 1990), but cannot be directly inferred from the distance measured in high-dimensional state space, as the curse of dimensionality makes 5 the median distance among data points increase with their dimensionality (Bellman 1957). Combining an ensemble approach with the regularization (i.e., the second step; Fig. 1) makes the interaction strengths of high-dimensional networks explicitly quantified. This quantification needs no prior knowledge or complicated procedures to select variables as in previous studies (Deyle et al. et al.
II. Statistical properties of MDR S-map
II-1. Statistical properties of interaction strengths estimated by ordinary and regularized S-map
In this section, we derived the statistical expectations of original and regularized S-map coefficients. Considering a dynamical system presented by time-recursive function using 5 embedded state variables, ( ) ( ) 1 ( ( )), (1, 2,..., )( 1) , : mi i mi x t F X t t n F + = →
Here, ( ) m X t is a point at native state space (not through taking lags) observed at time t , (1) (2) ( ) ( )=( ( ), ( ), ..., ( )) m Tm X t x t x t x t . While F i is nonlinear and thus difficult to estimate reliably; thus, we estimated the function nonparametrically (Perretti et al. t , such that ( ) ( ) ( ) ( )0 ) ) ) ( )( ( ) ( ( ) ( ( ) T im m mi i i st tm ss
F t f t t B b tX X X x t = = = , : , , , k N t T t t t . Statistically, the local linear coefficients, ( ) it B , can be estimated by applying weighted-least square locally at each data point X tk, which repeatedly solves the least square optimization using the whole dataset X t (dim( X t )= N * m ), albeit with different weights for different data points, 15 depending on how close the library data points X tq are to the target points X tk (inferred from d ( X tq , X tk )) in State Space Reconstruction (SSR). Then, the estimator of ( ) k it B at every time step can be solved by local weighted least square,
2( ) ( ) ( ) ( ) ( ) 1 ( ) ( ) ˆ arg min ( ) =( ) k k k k k i i i i T i T i it B t t t t
B Y B Y − = − W X X W X X W , where, ( ) ( ) ( ) ( )2 3 1 ( ( ), ( ),..., ( )) , i i i i TN
Y x t x t x t + = diag( , ,..., ), k t n w w w = W and 20 ( ( ), ( ))exp q kq d X t X tw d = − . This estimator is an unbiased estimator because ( ) ( ) 1 ( ) ( ) ( ) 1 ( ) ( ) ( ) ˆE( | ) ( ) E( ) ( ) k k k k k k k i T i T i i T i T i i it t t t t t t B Y B B − − = = =
X X W X X W X W X X W X .
13 As explained in the Methods section, due to high dimensionality, the weight matrix in our method was not directly calculated from all the variables using the whole dimension. Instead, we determined the weight matrix based on the ensembles collected from low-dimensional SSR under the optimal embedding dimension (see Methods). Nevertheless, the replacement of multiview weight matrix will not make this estimator biased. 5 Next, we extended the statistical derivation of S-map coefficients that accommodated the regularization using elastic-net. In regularization, we included the penalty factor ( λ ) that added constraints on least square optimization and shrunk the estimated parameters. ˆ arg min ( ) (1 ) k k k k k i i i i i it B t t t t B Y B B B = − + + −
W X
Here, we only demonstrated a special case of elastic-net regularization using ridge regression 10 (i.e., α =1), in which the estimator had close form. ˆ arg min ( ) ( ) k k k k k k i i i i i T i T i it B t t t t n t B Y B B Y − = − + = + W X X W X I X W
Obviously, this estimator was biased and this biased estimator is always underestimated, as the penalty factor λ is positive. ( ) ( ) 1 ( ) ( ) ( ) ( ) 1 ( ) ˆE( | ) ( ) ( ) k k k k k k k i T i T i i i T i it t n t t t t n t B B B B − − = + = − +
X X W X I X W X X W X I
15 Although this regularized estimator is biased (i.e., slightly lower accuracy), it provides greater precision (lower variance) by avoiding overfitting, making the fitted S-map model have better generalization ability for predicting the dataset out of library (out-of-sample). This purpose of improving forecast precision was addressed in the previous study (Cenci et al. N ) is less than the number of variables ( m ); this difficulty hinders reconstruction of high-dimensional 14 interactional networks (Tibshirani 1996). II-2. Statistical properties of MDR S-map based on percentage data
In this section, we derived the statistical properties of S-map coefficients, based on the percentilized time series data ( P t ). Firstly, we formulated the time-recursive function G ( i ) used 5 in the S-map, but used percentage data instead. ( ) ( ) T ( )( )( ) 1 ( ) ( )( ) ( 1) ( )( )( ) ( ) and (( ( ) )( ) ) i i itii i kt tk kk p t G t P tx tp t c x t c x tx Pt − + = = = = Again, the time-recursive function is locally approximated by linear function, with linear coefficients ( ) it . Similarly, we solved the weighted-least square optimization to obtain the estimator of local linear coefficients, 10 ( ) ( ) ( ) ( ) ( ) 1 ( ) 1 ( ), , 1 ˆ arg min ( ) arg min ( ) k k k k k i i i i i i it t p p t t p t t t Y Y − − + = − = −
W P W C C X , where =diag( , ,..., ) N t t t t c c c C . Assuming the weight matrix ( ) ( ), k k i it p w t = W D W derived from proportional data, with diag(1 ,1 ,...,1 ) w n = + + + D is different from the original weight matrix with some derivations δ s . ( ) ( ) ( ) ( ) ( ) 1 ( ) ( ) 1, , , ˆ arg min ( ) ( ) , where , k k k k k t i i i i T i T i it t p t t t p t p t t t Y Y + − − = − = W Q X X W X X W Q Q C C
15 Then, we derive the expectation of estimator, ( ) ˆ k it as ( ) ( ) 1 ( ) ( ) ( ) 1 ( ) ( ), , , , ˆE( | ) ( ) E( ) ( ) ( ) k k k k k k i T i T i i T i T i it t p t p t t p t p t t Y B − − = =
X X W X X W Q X W X X W Q X ( ) 1 ( ) ( ) ( ) ( ) 1 ( ) ( ), , , ,biased due to percentilization ( ) ( ( ) ) ( ) ( ) k k k k k k k
T i T i i i T i T i it p t p n Q t t t p t p Q t
B B B − − = + = +
X W X X W I D X X W X X W D X
11 11 1 1 diag( , ,..., ) n nn t tt t t tQ t t t c cc c c cc c c ++ ++ + + −− − D , ( ) ˆ k it is unbiased only if Q t is an identity matrix (or D Q is a zero matrix). Based on this formulation, the estimator may be biased because there are 20 15 unequal weightings operating on ( )1 it X + and X by t −+ C and t − C , respectively. We inferred that for the estimation for percentage data to be reliable, a statistical requirement was that fluctuations in total biomass ( c t ) between adjacent time points are not too strong. It is also noteworthy that the difference in weight matrices, W tk and W tk , p , itself, will not bias the estimation, unless t c + − and t c − are very different. Finally, we extend this derivation by 5 including the penalty factor ( λ ) introduced by regularization based on α =1. ( ) ( ) ( ) 1 ( ) ( ) 1 ( ) ( ), , ,biased due to regularization biased due to percentilization+regularization ˆE( | ) ( ) ( ) ( ) k k k k k k k i i T i i T i T i it t t p n t t p n t p Q t B B B − − = − + + +
X X W X I X W X I X W D X
The derived estimation is still biased. The bias comes from two sources: One is caused by parameter shrinkage controlled by penalty factor ( λ ); another is caused by the interaction between percentilization ( D Q ) and parameter shrinkage. Correction of these biases is 10 complicated and requires the information of ratio of total abundance at adjacent time points ( +1 tt cc ) that is usually missing when using relative abundance data. Nonetheless, based on our results (Fig. S3), the estimated interaction strength remained positively correlated with the theoretical expectation, although the bias was not corrected. 15 16 III. Parameterization of theoretical models
We validated MDR S-map method using simulated time series data generated from the multi-species Ricker model, ( ) ( 1) ( ) exp 1 ( ) i i i i N t N k r e t + = + MN ------(Eq. 3.), where r is intrinsic growth rate of species i , e i is a vector taking zero values for all except for 5 the i th entity, M represents time-independent interaction matrix with the size 1000 x 1000 ( m ij represent the effect of species j on species i ). The species-specific intrinsic growth rate r i was generated as ( )
10 00 i r r r v U = + , where r = 1.5 represents the average growth rate in the community, and r v = 0.5 represents 10 the size of inter-specific variation of the growth rate. Here, U is a random variable following uniform distribution Uniform (-1, 1). The interaction matrix M = ( m ij ), effect of sp.j on sp.i was generated as, ( )( ) max1 1
1 1~ (0, , ) 1 1, 11 if 0 2 , 1 11 else iiij T Iji m jiij ji m m i nm N I i n i j nm v U mm i n j im v U = − − + + = −− + , where N T is a random variable following truncated normal distribution N T ( μ =0, σ=σ I , ς=I max ) 15 with probability density function involving the truncation threshold ς , ( ) ( ; , ) if ( )( | , , ) , ( ) 0 else ( ) ( ) T T N T TTN T TN N g N f N Ng Nf N g NF F = − = =− −
In this function, f N and F N are the probability density function and cumulative distribution function of normal distribution Normal ( μ , σ ), respectively. Here, we let the mean interaction strength μ=0 , the inter-specific variation of interaction strength σ I = 2.0, and the maximum 20 interaction strength I max = 0.5 (i.e., truncation threshold ς ). In addition, v m = 0.9 determines the maximum difference of the interaction strength between m ij and m ji . With this definition, the 17 interaction matrix includes competition ( m ij < 0 & m ji < 0) and trophic-interactions between resource sp. j and consumer sp. i ( m ij > 0 & m ji < 0) only. Also note that the trophic interaction can occur only when i < j . Finally, we generated initial population size as ( ) ( ) 1 i T N t n N = + , where n =0.1 and N T is a truncated normal random variable with μ =0, σ =0.2, and ς =0.9. 5 IV. Testing robustness of the MDR S-map against data noise
IV-1. Random noises generation
To test the robustness of MDR S-map method to data noise, we considered three types of random noises, including: i) observation noise, ii) process noise, and iii) stochastic 10 environmental forcing. Firstly, for observation noise, we added noises into the (true) population size at each time step t . The inclusion of noises is carried out after the generation of time series data based on Eq. 3., ( ) ( ) ( ) 1 1 , : , , ,
Ni k i k O k n t N t E U tj n t T t t = + , where E O represents the size of observation noise. This is the conversion from the (true) 15 population size vector N ( t ) into the observed vector n (t) = ( n ( t ), n ( t ), …, n ( t )) T . For process noises and stochastic environmental forcing, we modeled the intrinsic growth rate of each species i ( r i in Eq. 3) as a function of time-dependent process noises (from unknown mechanisms) ( ) i t and environmental forcing env ( t ) as follows, ( ) 1 ( )+ ( ) i i E P i r t r E env t E t = + , 20 where r i , E E , and E P is the constant intrinsic growth rate of species i , species-independent effect size of environmental forcing, and species-independent effect size of process noise, respectively. In addition, env ( t ) and ( ) i t are time-dependent environmental forcing with first-order autocorrelation (i.e., red noise) and time- and species-dependent noises without 18 autocorrelation (i.e., white noise), respectively. Unlike observation noise, both process noises and stochastic environmental forcing were added in the iterative computation of difference equations (Eq. 3). Specially, the process noises were generated as ( ) i t U = and the environmental forcing env ( t ) was generated as (1) 0.1,( ) ( 1) (1 ) 2, env env env envenv t env t v U t == − + −
5 where ρ env = 0.8 represents the degree of autocorrelation and v env = 0.5 represents the maximum noise size. Based on various magnitude of E E , E P , and E O , we generated the observed vectors n using identical initial values and parameters N (0), M , and r i ( k ). In total, we considered four levels for each noise type, including zero, low, medium, and high level of noise (E E = E P = E O = 0.05, 0.1, and 0.15 for low, medium, and high level, respectively). Then, we analyzed the 10 time series data generated from the different levels of noise combinations (i.e., 4 =64 combinations). Results for testing influences of data stochasticity on MDR S-map were summarized in Figs. S4-7. IV-2. Robustness of MDR S-map to random noises
15 The estimated interaction strengths by MDR S-map were robust against low to moderate levels of noise, including observation noises, process noises and stochastic environmental forcing (Fig. S4). Among various types of noises, the MDR S-map was relatively more sensitive to observation noises than the other types of noises; nevertheless, reasonable positive associations between estimated and theoretical interaction strengths were 20 still maintained, even under moderate observation noise. Interestingly, however, the negative influences of observational noises were relieved when process noises exist (Fig. S5), although process noises undermined the performance of one-step forward forecast (Fig. S6). These analyses indicate that process noises, while bringing negative impacts on one-step forward 19 forecast, facilitate quantification of interaction strengths. This finding, although seemingly counter-intuitive, can be explained with dynamical system theory; facilitative effects from process noises might help to better recover the local geometry of attractor when time series data are limited and sparse under high-dimensional settings. That is, process noises randomly travel data points to the states that otherwise will not be visited if observations are limited, and 5 thus relieves data sparsity. Indeed, the positive influences of process noises become less visible when increasing observations (doubling the length of time series) (Fig. S7). Finally, our method was also robust to unseen environmental forcing (here, modeled as red noise), indicating that the estimated interaction strengths are still reliable even if some important external forces were not included in the analysis for practical reasons. 10 20
V. Importance of embedding dimension in the estimation of network topology
In this study, we emphasize that an important feature of MDR S-map was its capacity of accommodating a large number of network nodes while still operating at the optimal embedding dimension through multiview ensemble approach. This was critically different from the approach used in a recent study (Ushio 2020) that quantified interactions by 5 embedding all causal nodes into regularized S-map (Cenci et al. over-embedding (Kennel et al. et al. et al. E < 20 and N = 122 × 5 = 610 time points used in the study (Ushio 2020)); however, it remains an open question concerning how many data points are needed for the over-embedding approach to operate. In 25 21 summary, our analysis revealed the strong negative impacts of high-dimensionality on network reconstructions, especially for network topology, if the SSR was not operated at the optimal embedding dimension. As explained in previous sections, this was because precise neighborhood measures (e.g., distance) required for SSR-based approaches (Sugihara et al. et al. SI references
Bühlmann, P. & Van De Geer, S. (2011).
Statistics for high-dimensional data: methods, theory and applications . Springer Science & Business Media. Bellman, R.E. (1957).
Dynamic Programming . Princeton University Press. Berlow, E.L., Neutel, A.-M., Cohen, J.E., De Ruiter, P.C., Ebenman, B., Emmerson, M. et al.
5 (2004). Interaction strengths in food webs: issues and opportunities.
J. Anim. Ecol. , 73, 585-598. Cenci, S. & Saavedra, S. (2019). Non-parametric estimation of the structural stability of non-equilibrium community dynamics.
Nat. Ecol. Evol. , 3, 912-918. Cenci, S., Sugihara, G. & Saavedra, S. (2019). Regularized S‐map for inference and 10 forecasting with noisy ecological time series.
Methods Ecol. Evol. , 10, 650-660. Chang, C.-W., Ushio, M. & Hsieh, C.-h. (2017). Empirical dynamic modeling for beginners.
Ecol. Res. , 32, 785-796. Deyle, E.R., May, R.M., Munch, S.B. & Sugihara, G. (2016). Tracking and forecasting ecosystem interactions in real time.
Proc. Royal Soc. B , 283, 20152258. 15 Habeeb, R.L., Trebilco, J., Wotherspoon, S. & Johnson, C.R. (2005). Determining natural scales of ecological systems.
Ecol. Monogr. , 75, 467-487. Hall, P., Marron, J.S. & Neeman, A. (2005). Geometric representation of high dimension, low sample size data.
J. Roy. Stat. Soc. Ser. B. (Stat. Method.) , 67, 427-444. Kennel, M.B., Brown, R. & Abarbanel, H.D.I. (1992). Determining embedding dimension for 20 phase-space reconstruction using a geometrical construction.
Phys. Rev. A , 45, 3403-3411. Liebert, W., Pawelzik, K. & Schuster, H.G. (1991). Optimal embeddings of chaotic attractors from topological considerations.
EPL , 14, 521-526. Perretti, C.T., Munch, S.B. & Sugihara, G. (2013). Model-free forecasting outperforms the 25 correct mechanistic model for simulated and experimental data.
Proc. Natl. Acad. Sci. U.S.A. , 110, 5253-5257. Sauer, T., Yorke, J. & Casdagli, M. (1991). Embedology.
J. Stat. Phys. , 65, 579-616. Shalizi, C.R. (2006). Methods and Techniques of Complex Systems Science: An Overview. In:
Complex Systems Science in Biomedicine (eds. Deisboeck, TS & Kresh, JY). 30 Springer US Boston, MA, pp. 33-114. Sugihara, G., Grenfell, B., May, R.M., Chesson, P., Platt, H.M. & Williamson, M. (1990). Distinguishing error from chaos in ecological time series.
Philos. Trans. R. Soc. Lond. B. , 330, 235-251. Sugihara, G., May, R., Ye, H., Hsieh, C.-h., Deyle, E., Fogarty, M. et al. (2012). Detecting 35 causality in complex ecosystems.
Science , 338, 496-500. Sugihara, G. & May, R.M. (1990). Nonlinear forecasting as a way of distinguishing chaos from measurement error in time series.
Nature , 344, 734-741. 23 Suzuki, K., Yoshida, K., Nakanishi, Y. & Fukuda, S. (2017). An equation-free method reveals the ecological interaction networks within complex microbial ecosystems.
Methods Ecol. Evol. , 8, 1774-1785. Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso.
J. Roy. Stat. Soc. Ser. B. (Stat. Method.) , 58, 267-288. 5 Ushio, M. (2020). Interaction capacity underpins community diversity. bioRxiv , 2020.2004.2008.032524. Ye, H. & Sugihara, G. (2016). Information leverage in interconnected ecosystems: Overcoming the curse of dimensionality.
Science , 353, 922-925. Zou, H. & Zhang, H.H. (2009). On the adaptive elastic-net with a diverging number of 10 parameters.
Ann. Stat. , 37, 1733-1751. Fig. S1. Examples illustrating that interaction strengths estimated for out-of-samples observed at T=997 and 998 were consistent with theoretical expectations.
Based on our analysis, the one-step-forward forecast can not only be used to predict future values of network 5 nodes (Table 1), but also to reconstruct the whole interaction networks at future time points. Here, the grey line represents the 1:1 lines, and black lines represents the slope of quantile regression based on median. In Panel ( B ), most of the points are distributed along the 1:1 line; however, a few outliers are influential to the estimation of regression slope, which make the regression line deviate from the grey line and decrease Pearson r . In summary, once we know 10 25 the state of network nodes at previous time points, we can incorporate this information to the nonparametric MDR S-map model established by library time series and forecast the interaction networks for one or a few steps forward. Fig. S2. Long-term median of theoretical expectations was consistent with interaction strength estimated by MDR S-map, but not with the interaction strength inferred from correlation analyses. Long-term median of interaction (i.e., Jacobian) matrices was calculated as the temporal median of each interaction strength observed at all time points. ( A ) Since the results for each time point had a strong positive relationship between MDR S-map estimations 26 and theoretical expectations (Fig. 2), the estimated long-term medians of interaction strengths corresponded well to the theoretical medians. However, ( B ) these theoretical medians had a very weak, even negative relationship with the correlation coefficients obtained from time series pairs. Here, the grey line represents the 1:1 lines, whereas the black line represents the slope of quantile regression based on median. 5 27 Fig. S3. Interaction strengths estimated based on percentage (or relative abundance) data.
Estimations of interaction networks using MDR S-map had good correspondence with theoretical expectations for each time point, T=950 and 960 as examples (
A-B ) and their long-5 term median ( C ). However, these relationships were not as strong as estimations based on absolute abundance data, as percentilization introduced biases caused by unequally weighted data at adjacent time points (see details in SI Text, II. Statistical properties of MDR S-map ). Here, the grey line represented the 1:1 lines and black line represented the slope of quantile regression, based on the median.
10 28
Fig. S4. Interaction strengths estimated by MDR S-map are robust to different types of noises.
The prediction skills of estimated interaction strength were evaluated by calculating the temporal medians of ( A ) Pearson r and ( B ) rMSE between estimated and theoretical interaction strengths. Prediction skills varied with different types of noises, including observation noises 5 (OB), process noises (PS), and stochastic environmental forcing (EF) (see details in SI Text, IV. Testing robustness of the MDR S-map against data noise ). In this analysis, each type of noise was independently added (i.e. without the other types of noises). Among these types of noises, observation noises had the strongest impacts on prediction skills. Nonetheless, the positive relationships between the estimations and theoretical expectations remained robust, 10 even under high levels of observation noises. Fig S5. Interaction strengths estimated by MDR S-map are robust to synergistic effects of various types of noises . Prediction skills of estimated interaction strength were evaluated by calculating the temporal medians of ( A ) Pearson r and ( B ) rMSE between the estimated and theoretical interaction strengths (see details in SI Text, IV. Testing robustness of the MDR S-map against data noise ). These prediction skills consistently declined when elevating observation noises. However, both process noises and stochastic environmental forcing 5 (modeled as red noises) buffer against the negative impacts of observation noises. Therefore, the estimations of interaction strengths reached the highest skills under strong process noises, especially when observation noises exist. The presence of process noises is mostly beneficial to our estimations, in particular when observation noises are strong. Similarly, although the presence of unseen environmental forcing also has positive influences on prediction skills, its 10 influence is usually minor and sometimes even becomes negative when there are strong process noises. 31
Fig. S6. One-step forward forecast skills for noisy data.
The one-step forecast skills evaluated by ( A , C ) Pearson r and ( B , D ) rMSE between the predictions of MDR S-map and time series data (similar to the forecast skills shown in Table 1) varies with different types of 5 noises, including observation noises (OB), process noises (PS), and stochastic environmental forcing (EF) which were independently added (i.e. without other types of noises) (see details in SI Text, IV. Testing robustness of the MDR S-map against data noise ). Although estimations of interaction strengths improved when involving process noises (Fig. S4), performance of one-step forward forecast declined when elevating process noises in both leave-one-out cross-10 validation (In-sample, A - B ) and out-of-sample forecast (Out-of-sample, C-D ). 32
Fig. S7. The robustness of MDR S-map to random noises using longer time series (n=200).
The prediction skills of estimated interaction strength, ( A ) Pearson r and ( B ) rMSE, were calculated to evaluated performance of MDR S-map using longer time series ( n =200). Here, various types of noises, including observation noises (OB) and process noises (PS) were 5 included (see details in SI Text, IV. Testing robustness of the MDR S-map against data noise ). Similar to the analysis based on the shorter time series used in main texts ( n =100), low to medium levels of process error still improved the prediction skill for inferring interaction strengths in the absence of observation error. However, strong process noises do not have positive influences on the estimations of interaction strength; this was in contrast to the positive 10 effects of strong process noises observed in the analyses using shorter time series. In addition, for the synergistic effects of various types of noises ( C and D are the prediction skills using Pearson r and rMSE, respectively), process noises buffered against the negative impacts of observation error, as observed in the analysis of short time series. However, this contrasted with the strong process error leading to lower prediction skill than adding a medium level of 15 process error in the absence of observation error. 33 Fig. S8. Interaction strengths estimated from incomplete network nodes.
Removal of some critical species will not influence estimations of interaction strengths among the remaining species. Here, dominant species with the top 10% relative abudnace (12 species out of 117 species) were removed from our analysis. However, removal of these critical species had very 5 minor effects on the estimated interaction strengths among the remaining species (i.e., sub-networks). Compared to networks reconstructed using complete nodes, there was minor reduction in Pearson r , approximately 0.010, 0.013 and 0.09 for the networks reconstructed at ( A ) T=950, ( B ) T=960 and ( C ) the longterm medians (Fig. S2), respectively. That is, reconstruction of sub-network was still reliable, even if some network nodes or edges were not 10 involved in the MDR S-map analysis for practical reseaons. Here, the grey line represents the 1:1 line, and black line represents the slope of quantile regression based on median. 34 Fig. S9. MDR S-map is robust to random noises and outperforms over-embedding regularized S-map in the estimations of node centrality indices.
The four centrality indices, all strength (i.e., in- + out- strength;
A-C ), eigenvector centrality (
D-F ), hub (
G-I ), and authority ( j - l ), were calculated for each node. Calculated indices changed with various types of noises, 5 including observation noises (OB; A , D , G , and J ), process noises (PS; B , E , H , and K ), and unseen environmental forcing (EF; C , F , I , and L ). To evalutate the performance of MDR S-map (black solid curcle) and over-embedding regularized S-map (blue open triangle; See details in SI Text, V. Importance of embedding dimension in the estimation of network topology ), we calculated the median of time-varying correlation coefficients that measured correlations 10 between theoretical and estimated nodes centrality at each time point. The error bars present 95% bootstrapped confidence intervals. We concluded that the estimation of node centrality based on MDR-S-map was robust to random noises. Throughout all noise scenarios, MDR S-map outperformed over-embeddeding regularized S-map in the estimation of node centrality. 35
Fig. S10. MDR S-map was robust to random noises and outperformed over-embedding regularized S-map in the estimations of network topology and stability indices.
Network topology, including (
A-C ) mean and (
D-F ) standard deviation of interaction strength and network stability indices, including (
G-I ) Lyapunov instability (
J-L ) and structual instability, 5 were calculated from time-varying interaciton matrix. The calculated indices changed with various types of noises, including observation noises (OB; A , D , G , and J ), process noises (PS; B , E , H , and K ), and unseen environmental forcing (EF; C , F , I , and L ). To evaluate the performance of MDR S-map (black solid circle; See details in SI Text, V. Importance of embedding dimension in the estimation of network topology ) and over-embedding regularized 10 S-map (blue open triangle), we calculated temporal correlations among these time-varying indices. The error bars represent 95% bootstrapped confidence intervals. The estimated network topology and stability based on MDR-S-map was robust to random noises. Throughout all noise scenarios, MDR S-map outperformed over-embeddeding regularized S-map in estimation of network topology. For estimation of network stability, the MDR S-map has 15 greater performance in most scenarios, except for a few scenarios where an over-emebedding S-map performed similarly. Fig. S11. MDR S-map was robust to random noises and outperformed over-embedding regularized S-map in estimation of individual interaction strengths.
Here, three types of noises, including observation noises (OB; A , D , G , and J ), process noises (PS; B , E , H , and K ), and unseen environmental forcing (EF; C , F , I , and L ), were considered as presented in Fig. S4. 5 To evaluate method performance of MDR S-map (black solid curcle) and over-embedding regularized S-map (blue open triangle; See details in SI Text, V. Importance of embedding dimension in the estimation of network topology ), we calculated the temporal median of the time-varying skills, including (
A-C ) regression slope, (
D-F ) Pearson correlation, (
G-I ) mean absoulte error (MAE), and (
J-L ) root mean square error (rMSE) computed from the compairson 10 between estimated and theoretical interaction strengths at each time point. The error bars represent 95% bootstrapped confidence interval. The estimation of interaction strength based on MDR-S-map was robust to random noises. Throughout all noise scenarios, MDR S-map siginicantly outperformed over-embeddeding regularized S-map (non-overlapped confidence 37 intervals for regression slope, Pearson r , MAE, and some rMSE), although differences in Pearson correlation were usually not large (< 0.2 throughout the scenarios). We concluded that over-embedding regularized S-map roughly quantified the interaction strength, although problems caused by high dimensionality were not fully addressed. 38 Fig. S12. Importance of bacteria OTUs inferred from the reconstructed interaction networks.
Importance of each of the 137 bacteria OTUs was evaluated by ( A ) strengths of 5 outward interactions affecting the other species (i.e., out-strength) and ( B ) hub considering not only the interaction strengths of the node itself, but also authority of the nodes it affected. The points indicate species ranked by their temporal mean of the topological indices. Grey bars indicate the temporal standard deviation ( Table S1. Comparison of the topological properties characterizing the network nodes in reconstructed versus theoretical networks.
For the interaction network reconstructed at every time point, we calculated the topological indices for each node and compared them to those calculated from theoretical networks. To evaluate the performance of estimations, we computed the Pearson r and the slope of median-based regression by comparing estimations 5 and theoretical expectations. The interaction network can be evaluated at each time point; for demonstration purposes, we summarized these indices in this table (mean SD; median). In particular, we applied a Kolmogorov-Smirnov test (K-S test) to test whether the shape of the cumulative distribution of estimated strengths (in-, out-, and all) differed from the distribution of theoretical expectations. There was rarely significant difference between estimations and 10 expectations, except for the distribution of out-strengths.
Type In-strength Out-strength All strength Eigenvector centrality Hub Authority Pearson r .05; 0.90 0.97 .01; 0.97 0.96 .02; 0.96 0.76 .38; 0.96 0.83 .22; 0.94 0.77 .27; 0.91 Slope 0.87 .05; 0.87 0.89 .07; 0.88 0.85 .09; 0.84 0.85 .45; 1.00 0.69 .34; 0.84 0.75 .39; 0.83 p -value of K-S test 0.77 .19; 0.79 0.06 .11; 0.004 0.36 .35; 0.17 -- -- --
15 40
Table S2. Averaged node centrality index of dominant bacterial OTU ranked by mean out-strength.
The importance of OTU may be different 16 when concerning different centrality indices. Therefore, we labeled the top 10 highest mean values with respect to each centrality index by red 17 background. 18
OTU Out- strength In- strength All strength Hub Authority Eigenvector centrality Phylum Class Order Family Genus
1 1.99 0.07 2.06 0.14 0.01 0.06 Unidentifiable 2 1.86 0.32 2.18 0.04 0.04 0.06 Bacteroidetes Flavobacteria Flavobacteriales Flavobacteriaceae 3 1.83 0.56 2.39 0.03 0.05 0.10 Proteobacteria Alphaproteobacteria Rickettsiales SAR11 Pelagibacter 4 1.52 -0.05 1.48 0.10 0.01 0.05 Bacteroidetes Flavobacteria Flavobacteriales Flavobacteriaceae 5 1.46 0.05 1.52 0.23 0.02 0.21 Bacteroidetes Flavobacteria Flavobacteriales Flavobacteriaceae 6 1.42 0.15 1.57 0.14 0.01 0.12 Actinobacteria Actinobacteria Actinomycetales Microbacteriaceae 7 1.34 -0.01 1.33 0.13 0.05 0.15 Proteobacteria Alphaproteobacteria Rhodobacterales Rhodobacteraceae 8 1.25 0.30 1.55 0.02 0.01 0.04 Proteobacteria Alphaproteobacteria Rhodobacterales Rhodobacteraceae Loktanella 9 1.25 0.69 1.94 0.06 0.05 0.08 Firmicutes Clostridia Clostridiales 10 1.23 0.33 1.56 0.08 0.06 0.14 Unidentifiable 11 1.19 0.49 1.68 0.07 0.01 0.07 Proteobacteria Epsilonproteobacteria 12 1.18 0.10 1.28 0.10 0.01 0.06 Bacteroidetes Flavobacteria Flavobacteriales Cryomorphaceae Crocinitomix 13 1.15 0.13 1.28 0.06 0.00 0.05 Proteobacteria Gammaproteobacteria Vibrionales Vibrionaceae Vibrio 14 1.14 0.08 1.21 0.18 0.03 0.11 Proteobacteria Gammaproteobacteria 15 1.04 1.16 2.21 0.29 0.02 0.21 Bacteroidetes Sphingobacteria Sphingobacteriales 16 1.04 0.06 1.10 0.09 0.02 0.09 Bacteroidetes Flavobacteria Flavobacteriales 17 0.98 0.05 1.03 0.04 0.05 0.09 Proteobacteria Gammaproteobacteria Oceanospirillales Halomonadaceae Cobetia 18 0.94 0.32 1.26 0.04 0.05 0.08 Unidentifiable 19 0.94 -0.21 0.72 0.01 0.06 0.08 Proteobacteria Alphaproteobacteria Rhodobacterales Rhodobacteraceae Sulfitobacter 20 0.92 0.12 1.05 0.10 0.02 0.12 Bacteroidetes 21 0.87 0.35 1.22 0.15 0.01 0.11 Proteobacteria Gammaproteobacteria Pseudomonadales Moraxellaceae Psychrobacter 22 0.87 0.01 0.88 0.04 0.00 0.04 Bacteroidetes Flavobacteria Flavobacteriales Flavobacteriaceae 23 0.84 0.11 0.95 0.01 0.01 0.04 Proteobacteria Epsilonproteobacteria Campylobacterales Campylobacteraceae Arcobacter 24 0.82 0.72 1.54 0.07 0.01 0.07 Bacteroidetes Flavobacteria Flavobacteriales Flavobacteriaceae Tenacibaculum 25 0.80 0.04 0.84 0.04 0.02 0.04 Bacteroidetes Flavobacteria Flavobacteriales Flavobacteriaceae 26 0.63 0.33 0.95 0.02 0.03 0.07 Unidentifiable 27 0.62 0.39 1.01 0.04 0.07 0.06 Bacteroidetes Sphingobacteria Sphingobacteriales Saprospiraceae 28 0.61 0.06 0.67 0.03 0.03 0.05 Proteobacteria 29 0.60 0.53 1.13 0.01 0.07 0.09 Proteobacteria Gammaproteobacteria Methylococcales Methylococcaceae Methylocaldum 30 0.60 1.04 1.64 0.04 0.15 0.20 Proteobacteria Alphaproteobacteria Rhodobacterales Rhodobacteraceae 31 0.59 0.12 0.71 0.03 0.05 0.07 Proteobacteria Alphaproteobacteria Rhodospirillales Rhodospirillaceae 32 0.58 0.06 0.64 0.05 0.03 0.08 Unidentifiable
33 0.58 0.54 1.12 0.04 0.07 0.13 Bacteroidetes Flavobacteria Flavobacteriales Flavobacteriaceae Tenacibaculum 34 0.54 0.47 1.01 0.07 0.02 0.07 Verrucomicrobia Verrucomicrobiae Verrucomicrobiales 35 0.49 -0.04 0.45 0.06 0.01 0.04 Proteobacteria Alphaproteobacteria Rhodobacterales Rhodobacteraceae 36 0.48 0.53 1.01 0.02 0.02 0.04 Proteobacteria Epsilonproteobacteria Campylobacterales Helicobacteraceae Sulfurovum 37 0.47 0.42 0.89 0.03 0.05 0.06 Cyanobacteria Cyanobacteria Family VIII GpVIII 38 0.46 0.21 0.67 0.04 0.05 0.08 Bacteroidetes 39 0.45 0.82 1.28 0.02 0.01 0.03 Cyanobacteria Cyanobacteria Family II GpIIa 40 0.45 0.19 0.64 0.07 0.07 0.10 Unidentifiable 41 0.45 0.12 0.56 0.01 0.02 0.05 Unidentifiable 42 0.44 0.16 0.60 0.05 0.07 0.07 Proteobacteria Alphaproteobacteria Rhodobacterales Rhodobacteraceae 43 0.43 0.45 0.88 0.09 0.04 0.13 Cyanobacteria Cyanobacteria 44 0.43 0.44 0.87 0.04 0.04 0.09 Bacteroidetes Flavobacteria Flavobacteriales 45 0.42 0.19 0.61 0.02 0.08 0.08 Proteobacteria Gammaproteobacteria 46 0.42 0.22 0.64 0.02 0.00 0.03 Planctomycetes Planctomycetacia Planctomycetales Planctomycetaceae Planctomyces 47 0.40 0.38 0.78 0.02 0.04 0.04 Bacteroidetes Flavobacteria Flavobacteriales Flavobacteriaceae 48 0.40 0.78 1.18 0.01 0.04 0.04 Proteobacteria Epsilonproteobacteria Campylobacterales Campylobacteraceae Arcobacter 49 0.38 0.00 0.37 0.08 0.02 0.09 Proteobacteria Gammaproteobacteria 50 0.37 0.29 0.67 0.02 0.02 0.02 Planctomycetes Planctomycetacia Planctomycetales Planctomycetaceae Planctomyces 51 0.35 0.24 0.59 0.05 0.02 0.07 Bacteroidetes Sphingobacteria Sphingobacteriales Saprospiraceae Lewinella 52 0.35 0.66 1.01 0.03 0.06 0.08 Proteobacteria Alphaproteobacteria Rickettsiales SAR11 Pelagibacter 53 0.34 0.25 0.60 0.01 0.02 0.03 Proteobacteria Alphaproteobacteria Rhodobacterales Rhodobacteraceae Loktanella 54 0.34 0.42 0.77 0.04 0.02 0.05 Cyanobacteria Cyanobacteria Family II GpIIa 55 0.34 -0.97 -0.63 0.01 0.13 0.08 Bacteroidetes Flavobacteria Flavobacteriales Flavobacteriaceae 56 0.34 0.30 0.63 0.03 0.01 0.04 Bacteroidetes Flavobacteria Flavobacteriales Flavobacteriaceae 57 0.33 1.60 1.93 0.01 0.07 0.13 Bacteroidetes Flavobacteria Flavobacteriales Flavobacteriaceae 58 0.32 0.15 0.47 0.02 0.02 0.04 Proteobacteria Gammaproteobacteria 59 0.30 0.83 1.13 0.02 0.05 0.04 Proteobacteria Gammaproteobacteria 60 0.30 0.47 0.77 0.03 0.03 0.03 Proteobacteria Gammaproteobacteria 61 0.29 0.11 0.40 0.04 0.02 0.06 Proteobacteria Alphaproteobacteria Rickettsiales SAR11 Pelagibacter 62 0.28 0.09 0.37 0.06 0.05 0.06 Cyanobacteria Cyanobacteria 63 0.25 0.45 0.70 0.03 0.04 0.03 Verrucomicrobia Verrucomicrobiae Verrucomicrobiales Verrucomicrobiaceae Roseibacillus 64 0.22 0.29 0.51 0.04 0.03 0.06 Bacteroidetes Sphingobacteria Sphingobacteriales 65 0.22 0.11 0.33 0.06 0.02 0.05 Bacteroidetes Flavobacteria Flavobacteriales Flavobacteriaceae 66 0.22 0.41 0.63 0.02 0.04 0.05 Bacteroidetes Flavobacteria Flavobacteriales Flavobacteriaceae 67 0.21 0.14 0.35 0.02 0.03 0.03 Bacteroidetes Flavobacteria Flavobacteriales Flavobacteriaceae Gilvibacter 68 0.20 0.58 0.79 0.01 0.03 0.05 Bacteroidetes Flavobacteria Flavobacteriales Flavobacteriaceae 69 0.19 -1.42 -1.23 0.04 0.15 0.11 Bacteroidetes Flavobacteria Flavobacteriales 70 0.19 0.44 0.63 0.02 0.05 0.04 Proteobacteria Gammaproteobacteria Thiotrichales Piscirickettsiaceae Methylophaga 71 0.17 0.22 0.38 0.06 0.07 0.08 Proteobacteria Gammaproteobacteria
72 0.17 -0.02 0.14 0.04 0.02 0.05 Proteobacteria Gammaproteobacteria 73 0.16 0.25 0.40 0.01 0.05 0.04 Proteobacteria Deltaproteobacteria Desulfobacterales Desulfobulbaceae 74 0.16 0.39 0.55 0.12 0.01 0.11 Proteobacteria Gammaproteobacteria 75 0.16 0.35 0.51 0.05 0.03 0.07 Proteobacteria Gammaproteobacteria 76 0.16 0.68 0.84 0.05 0.11 0.11 Proteobacteria Epsilonproteobacteria Campylobacterales Helicobacteraceae Sulfurovum 77 0.15 0.19 0.34 0.06 0.01 0.10 Bacteroidetes 78 0.14 0.28 0.42 0.04 0.01 0.04 Bacteroidetes Flavobacteria Flavobacteriales Flavobacteriaceae Krokinobacter 79 0.14 0.01 0.15 0.00 0.00 0.01 Proteobacteria Gammaproteobacteria 80 0.14 0.22 0.36 0.01 0.01 0.02 Verrucomicrobia Verrucomicrobiae Verrucomicrobiales Verrucomicrobiaceae Roseibacillus 81 0.13 0.00 0.12 0.02 0.06 0.06 Proteobacteria 82 0.13 0.42 0.55 0.04 0.05 0.04 Bacteroidetes Flavobacteria Flavobacteriales Flavobacteriaceae Winogradskyella 83 0.12 0.03 0.16 0.02 0.03 0.03 Proteobacteria Alphaproteobacteria Rhodobacterales Rhodobacteraceae 84 0.12 0.79 0.91 0.03 0.05 0.08 Unidentifiable 85 0.11 0.09 0.20 0.02 0.00 0.02 Bacteroidetes Flavobacteria Flavobacteriales Flavobacteriaceae 86 0.11 0.84 0.95 0.01 0.01 0.03 Cyanobacteria Cyanobacteria Family II GpIIa 87 0.10 0.31 0.40 0.03 0.02 0.04 Bacteroidetes Flavobacteria Flavobacteriales Flavobacteriaceae 88 0.10 -0.04 0.05 0.03 0.06 0.05 Proteobacteria Alphaproteobacteria Rhodobacterales Rhodobacteraceae 89 0.09 1.55 1.64 0.06 0.04 0.09 Bacteroidetes Flavobacteria Flavobacteriales Flavobacteriaceae 90 0.09 0.11 0.20 0.02 0.03 0.02 Bacteroidetes Flavobacteria Flavobacteriales Flavobacteriaceae 91 0.07 0.08 0.15 0.09 0.01 0.12 Verrucomicrobia Verrucomicrobiae Verrucomicrobiales 92 0.05 0.47 0.51 0.05 0.03 0.05 Bacteroidetes Flavobacteria Flavobacteriales Flavobacteriaceae 93 0.04 0.63 0.67 0.02 0.02 0.04 Proteobacteria Gammaproteobacteria 94 0.04 0.14 0.19 0.02 0.02 0.05 Bacteroidetes Flavobacteria Flavobacteriales Flavobacteriaceae 95 0.04 0.38 0.42 0.03 0.02 0.04 Bacteroidetes Flavobacteria Flavobacteriales Flavobacteriaceae 96 0.02 0.04 0.07 0.18 0.01 0.12 Unidentifiable 97 0.02 0.17 0.19 0.02 0.07 0.09 Proteobacteria Gammaproteobacteria 98 0.02 0.75 0.77 0.06 0.12 0.20 Proteobacteria Betaproteobacteria Burkholderiales Oxalobacteraceae 99 0.02 0.27 0.29 0.03 0.02 0.05 Proteobacteria Gammaproteobacteria Alteromonadales Colwelliaceae Colwellia 100 0.02 0.70 0.72 0.02 0.04 0.04 Proteobacteria Epsilonproteobacteria Campylobacterales Campylobacteraceae Arcobacter 101 0.01 0.54 0.55 0.02 0.01 0.03 Actinobacteria Actinobacteria Actinomycetales 102 0.01 0.48 0.49 0.06 0.00 0.06 Proteobacteria Deltaproteobacteria Desulfobacterales Desulfobulbaceae Desulfotalea 103 0.00 0.45 0.45 0.03 0.01 0.03 Bacteroidetes Flavobacteria Flavobacteriales Flavobacteriaceae 104 -0.01 0.38 0.38 0.09 0.05 0.08 Bacteroidetes Flavobacteria Flavobacteriales Flavobacteriaceae Lutimonas 105 -0.01 0.60 0.59 0.04 0.03 0.03 Fusobacteria Fusobacteria Fusobacteriales Fusobacteriaceae 106 -0.01 0.06 0.05 0.02 0.07 0.08 Planctomycetes Planctomycetacia Planctomycetales Planctomycetaceae Planctomyces 107 -0.01 0.06 0.04 0.05 0.01 0.04 Proteobacteria Alphaproteobacteria Rhodobacterales Rhodobacteraceae 108 -0.04 0.48 0.44 0.02 0.04 0.04 Proteobacteria Gammaproteobacteria Thiotrichales Piscirickettsiaceae Methylophaga 109 -0.04 0.19 0.15 0.04 0.00 0.03 Bacteroidetes Flavobacteria Flavobacteriales Flavobacteriaceae 110 -0.05 -0.28 -0.33 0.01 0.03 0.03 Actinobacteria Actinobacteria Actinomycetales
111 -0.05 0.36 0.31 0.05 0.01 0.04 Proteobacteria 112 -0.06 0.44 0.38 0.02 0.01 0.04 Proteobacteria Gammaproteobacteria 113 -0.06 0.29 0.23 0.02 0.03 0.03 Proteobacteria Gammaproteobacteria 114 -0.07 0.14 0.08 0.05 0.01 0.05 Proteobacteria Gammaproteobacteria 115 -0.07 0.55 0.48 0.03 0.01 0.05 Unidentifiable 116 -0.07 0.31 0.24 0.03 0.05 0.10 Bacteroidetes Flavobacteria Flavobacteriales 117 -0.11 0.42 0.31 0.02 0.02 0.03 Unidentifiable 118 -0.11 0.27 0.16 0.05 0.03 0.05 Unidentifiable 119 -0.19 0.41 0.21 0.05 0.05 0.05 Proteobacteria Epsilonproteobacteria Campylobacterales Helicobacteraceae Sulfurovum 120 -0.21 0.06 -0.15 0.02 0.01 0.03 Proteobacteria Alphaproteobacteria Rhodobacterales Rhodobacteraceae 121 -0.21 1.13 0.92 0.03 0.09 0.11 Bacteroidetes Flavobacteria Flavobacteriales Flavobacteriaceae 122 -0.23 0.25 0.02 0.03 0.09 0.07 Proteobacteria Gammaproteobacteria 123 -0.24 0.59 0.35 0.03 0.03 0.04 Bacteroidetes Flavobacteria Flavobacteriales Flavobacteriaceae 124 -0.25 0.52 0.27 0.10 0.07 0.13 Bacteroidetes Flavobacteria Flavobacteriales Flavobacteriaceae Tenacibaculum 125 -0.25 0.26 0.01 0.03 0.04 0.03 Unidentifiable 126 -0.27 0.32 0.05 0.04 0.06 0.06 Proteobacteria Gammaproteobacteria 127 -0.27 0.86 0.59 0.05 0.02 0.06 Bacteroidetes Flavobacteria Flavobacteriales Flavobacteriaceae 128 -0.28 0.37 0.09 0.12 0.01 0.09 Bacteroidetes Flavobacteria Flavobacteriales Flavobacteriaceae Polaribacter 129 -0.28 0.30 0.02 0.07 0.05 0.08 Bacteroidetes Flavobacteria Flavobacteriales Flavobacteriaceae Tenacibaculum 130 -0.33 0.19 -0.14 0.03 0.07 0.08 Planctomycetes Planctomycetacia Planctomycetales Planctomycetaceae Rhodopirellula 131 -0.35 0.51 0.15 0.04 0.06 0.08 Bacteroidetes Flavobacteria Flavobacteriales Flavobacteriaceae 132 -0.38 0.55 0.17 0.05 0.02 0.04 Proteobacteria Gammaproteobacteria 133 -0.39 0.01 -0.38 0.05 0.03 0.04 Verrucomicrobia Verrucomicrobiae Verrucomicrobiales Verrucomicrobiaceae Haloferula 134 -0.59 0.69 0.11 0.04 0.04 0.08 Actinobacteria Actinobacteria 135 -0.62 0.28 -0.34 0.05 0.01 0.03 Planctomycetes Planctomycetacia Planctomycetales Planctomycetaceae 136 -0.63 0.79 0.15 0.08 0.07 0.08 Bacteroidetes Flavobacteria Flavobacteriales Flavobacteriaceae Winogradskyella
19 20 44
Legends for Movies S1
21 Time-varying interaction network of empirical bacterial communities sampled daily in Canoe 22 Beach, Boston (see Methods). For demonstration, we only included the reconstructed bacterial 23 interaction networks for the first 30 days. 24 2528