Machine learning dismantling and early-warning signals of disintegration in complex systems
Marco Grassia, Manlio De Domenico∗, Giuseppe Mangioni∗

Dip. Ingegneria Elettrica, Elettronica e Informatica, Università degli Studi di Catania, Italy
CoMuNe Lab, Fondazione Bruno Kessler, Via Sommarive 18, 38123 Povo (TN), Italy

∗To whom correspondence should be addressed. E-mail: [email protected]; [email protected]
From physics to engineering, biology and social science, natural and artificial systems are characterized by interconnected topologies whose features – e.g., heterogeneous connectivity, mesoscale organization, hierarchy – affect their robustness to external perturbations, such as targeted attacks to their units. Identifying the minimal set of units to attack to disintegrate a complex network, i.e. network dismantling, is a computationally challenging (NP-hard) problem which is usually attacked with heuristics. Here, we show that a machine trained to dismantle relatively small systems is able to identify higher-order topological patterns, allowing it to disintegrate large-scale social, infrastructural and technological networks more efficiently than human-based heuristics. Remarkably, the machine assesses the probability that the next attacks will disintegrate the system, providing a quantitative method to quantify systemic risk and detect early-warning signals of a system's collapse. This demonstrates that machine-assisted analysis can be effectively used for policy and decision making to better quantify the fragility of complex systems and their response to shocks.

Introduction
Several empirical systems consist of nonlinearly interacting units, whose structure and dynamics can be suitably represented by complex networks ( ). Heterogeneous connectivity ( ), mesoscale (19, 31), higher-order (9, 26) and hierarchical ( ) organization, efficiency in information exchange ( ) and multiplexity (10, 17, 18, 25) are distinctive features of biological molecules within the cell ( ), connectomes ( ), mutualistic interactions among species ( ), urban ( ), trade ( ) and social (14, 22, 27) systems.

However, the structure of complex networks can dramatically affect their proper functioning, with crucial effects on collective behavior and phenomena such as synchronization in populations of coupled oscillators ( ), the spreading of infectious diseases (28, 33) and cascade failures ( ), the emergence of misinformation (38, 42) and hate ( ) in socio-technical systems, and the emergence of social conventions ( ). While heterogeneous connectivity is known to make such complex networks more sensitive to shocks and other perturbations occurring to hubs ( ), a clear understanding of the topological factors – and their interplay – responsible for a system's vulnerability still remains elusive. For this reason, the identification of the minimum set of units to target for driving a system towards its collapse – a procedure known as network dismantling – has attracted increasing attention (12, 24, 29, 30, 36) for its practical applications and its implications for policy making. Dismantling is efficient if such a set is small and, simultaneously, the system quickly breaks down into smaller isolated clusters. The problem is, however, NP-hard, and while percolation theory provides the tools to understand large-scale transitions as units are randomly disconnected (7, 13, 32, 35), a general theory of network dismantling is missing and applications mostly rely on approximated theories or heuristics.

Here, we develop a computationally efficient framework – named GDM and conceptually described in Figure 1 – based on machine learning, to provide a scalable solution, tackle the dismantling challenge, and gain new insights about the latent features of the topological organization of complex networks. Specifically, we employ graph convolutional-style layers, overcoming the limitations of classic (Euclidean) deep learning, to operate on graph-structured data. These layers, inspired by the convolutional layers that power most deep learning models nowadays, aggregate the features of each node with those found in its neighborhood by means of a learned non-trivial function, producing high-level node features. While the machine is trained to identify the critical nodes from the dismantling of relatively small systems – which can be easily and optimally dismantled – we show that it exhibits remarkable inductive capabilities, being able to generalize to previously unseen nodes and to far larger networks after the learning phase.
Results
Model architecture
The machine learning framework proposed here consists of a (geometric) deep learning model, composed of graph convolutional-style layers and a regressor (a multilayer perceptron), that is trained to predict attack strategies on small synthetic networks – which can be easily and optimally dismantled – and then used to dismantle large networks, for which the optimal solution cannot be found in reasonable time. To give an insight, the graph convolutional-style layers aggregate the features of each node with those found in its neighborhood by means of a learned non-trivial function, as they are inspired by the convolutional layers that power most of the (Euclidean) deep learning models nowadays. More practically, the (higher-order) node features are propagated through the neural network when many layers are stacked: the deeper the architecture, i.e., the more convolutional layers, the farther the features propagate, capturing the importance of the neighborhood of each node.
Figure 1: Training a machine to learn complex topological patterns for network dismantling.
To build our training data, we generate and dismantle small networks optimally and compute the node features. After the model is trained, it can be fed the target network (again, with its nodes' features) and it will assign each node $n$ a value $p_n$, the probability that it belongs to the (sub-)optimal dismantling set. Nodes are then ranked and removed until the dismantling target is reached.

Specifically, we stack a variable number of state-of-the-art layers, namely Graph Attention Networks (GAT) ( ), which are based on the self-attention mechanism (also known as intra-attention) that was shown to improve performance in natural language processing tasks ( ). These layers are able to handle the whole neighborhood of a node without any sampling, which is one of the major limitations of other popular convolutional-style layers (e.g., GraphSage ( )), and, thanks to the attention mechanism, to assign to the features of each neighboring node a relative importance factor that depends on the node itself. The model takes as input one network at a time, plus the features of its nodes, and returns a scalar value $p_n$ between zero and one for every node $n$. During the dismantling of a network, nodes are sorted and removed (if they belong to the LCC) in descending order of $p_n$ until the target is reached.
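To make the architecture concrete, the following is a minimal sketch – not the authors' exact implementation – of such a model in PyTorch with PyTorch Geometric (the library the Supplementary Materials report using): stacked GAT layers compute high-level node features, and a small multilayer perceptron maps them to a per-node score $p_n \in (0, 1)$. Layer counts, widths and head counts are illustrative placeholders.

```python
# Minimal sketch of a GDM-style model (illustrative, not the authors' code).
import torch
import torch.nn.functional as F
from torch import nn
from torch_geometric.nn import GATConv


class GDMSketch(nn.Module):
    def __init__(self, in_features: int, hidden: int = 32, heads: int = 4, num_layers: int = 3):
        super().__init__()
        self.convs = nn.ModuleList()
        dim = in_features
        for _ in range(num_layers):
            # Each GAT layer aggregates a node's features with its neighbors',
            # weighting every neighbor through learned attention coefficients.
            self.convs.append(GATConv(dim, hidden, heads=heads, concat=True))
            dim = hidden * heads  # concatenated attention heads
        self.regressor = nn.Sequential(  # MLP mapping high-level features to p_n
            nn.Linear(dim, hidden), nn.ELU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor, edge_index: torch.Tensor) -> torch.Tensor:
        # One extra layer = one extra hop of propagated neighborhood information.
        for conv in self.convs:
            x = F.elu(conv(x, edge_index))
        return self.regressor(x).squeeze(-1)  # one score per node, in (0, 1)
```

At attack time, one would run the model once, rank all nodes by the predicted score (the approach is static), and remove them in descending order, skipping nodes outside the current LCC, until the dismantling target is reached.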
Dismantling real-world systems

In our experiments, we dismantle empirical complex systems of high societal or strategic relevance, our main goal being to learn an efficient attack strategy. To validate the goodness of such a strategy, we compare against state-of-the-art dismantling methods, such as Generalized Network Dismantling (GND) ( ), Explosive Immunization (EI) ( ), CoreHD ( ), Min-Sum (MS) ( ) and Collective Influence (CI) ( ), using node degree, k-core value and local clustering coefficient as node features.

We refer the reader to the Supplementary Materials for a detailed description of our models, for additional discussion and experiments (also on large real-world and synthetic networks, plus results in table form), and for an extensive list of the real-world test networks, which include biological, social, infrastructure, communication, trophic and technological ones.

To quantify the goodness of each method in dismantling the network, we consider the Area Under the Curve (AUC) encoding changes in the Largest Connected Component (LCC) size across the attacks. The LCC size is commonly used in the literature to quantify the robustness of a network, because systems need the existence of a giant cluster to work properly. The AUC indicator has the advantage of accounting for how quickly, overall, the LCC is disintegrated: the lower the area under the curve, the more efficient the network dismantling.

As a representative example, we show in Figure 2a the result of the dismantling process for the corruption network ( ), built from corruption scandals in Brazil, as a function of the number of removed units. Results are shown for GDM and the cutting-edge algorithms mentioned above. In Figures 2b and 2c, instead, we show the structure before and after dismantling, respectively. Our framework disintegrates the network faster than the other methods: to verify whether this feature is general, we perform a thorough analysis of several empirical systems.

Figure 3 shows the performance of each dismantling method on each empirical system considered in this study, allowing for an overall comparison. On average, our approach outperforms the others. For instance, Generalized Network Dismantling's cumulative AUC is ∼ higher and the Min-Sum algorithm is outscored by a significant margin, which is remarkable considering that our approach is static – i.e., predictions are made at the beginning of the attack – while the other ones are dynamic – i.e., the structural importance of the nodes is (re)computed during the attacks. For a more extensive comparison with these approaches, we also introduce a node reinsertion phase using a greedy algorithm which reinserts, a posteriori, those nodes that belong to smaller components of the (virtually) dismantled system and whose removal is not actually needed to reach the desired target ( ). Once again, our approach outperforms the other algorithms: even without accounting for the reinsertion phase, GDM performs comparably with GND + reinsertion and outscores the others, highlighting how it is able to identify the more critical nodes of a network.

Note that we compute the AUC value by integrating the $LCC(x)/|N|$ values using Simpson's rule.
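For concreteness, this AUC indicator can be sketched as follows; the curve below is a toy LCC trajectory, and the function names are ours, not part of any released code.

```python
# Minimal sketch of the dismantling AUC: integrate LCC(x)/|N| with Simpson's rule.
import numpy as np
from scipy.integrate import simpson  # assumes SciPy >= 1.6 (older releases: simps)


def dismantling_auc(lcc_sizes, num_nodes):
    """Lower is better: the faster the LCC shrinks, the smaller the area."""
    y = np.asarray(lcc_sizes, dtype=float) / num_nodes  # LCC(x) / |N|
    x = np.arange(len(y))  # x = number of removed nodes
    return simpson(y, x=x)


lcc_curve = [100, 98, 95, 60, 22, 9, 5]  # toy trajectory for a 100-node network
print(dismantling_auc(lcc_curve, num_nodes=100))
```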
Figure 2: Dismantling the Brazilian corruption network. (a) GDM and state-of-the-art algorithms with reinsertion of the nodes are compared. The network before (b) and after (c) a GDM attack is shown. The color of the nodes represents (from dark red to white) the attack order, while their size represents their betweenness value. In the attacked network, darker nodes do not belong to the LCC, and their contour color represents the component they belong to.

An interesting feature of our framework is that it can enhance existing heuristics based on node descriptors, by employing the same measure as the only node feature. It is plausible to assume that our framework learns correlations among node features. To probe this hypothesis, in the Supplementary Materials we analyze the configuration models of the same networks analyzed so far: those models keep the observed connectivity distribution while destroying topological correlations. We observe that dismantling performance drops on these models, confirming that the existing topological correlations are learned and, consequently, exploited by the machine.
Early-warning signals of systemic collapse
Another relevant output of our method is the calculation of a damage score that can be used to predict the impact of future attacks on the system. Accordingly, we introduce an early-warning estimator that can be used to inform policy and decision making in applications where complex interconnected systems – such as water management systems, power grids, communication systems and public transportation networks – are subject to potential failures or targeted attacks. We define $\Omega$, namely the Early Warning, a value between 0 and 1, calculated as follows. We first simulate the dismantling of the target network using our approach and call $S_o$ the set of virtually removed nodes that cause the percolation of the network. Then, we sum the $p_n$ values predicted by our model for each node $n \in S_o$ and define

$$\Omega_m = \sum_{n \in S_o} p_n.$$

The value of the Early Warning $\Omega$ for the network after the removal of a generic set $S$ of nodes is given by

$$\Omega = \begin{cases} \Omega_s / \Omega_m & \text{if } \Omega_s \le \Omega_m \\ 1 & \text{otherwise,} \end{cases}$$

where $\Omega_s = \sum_{n \in S} p_n$.

The rationale behind this definition is that the system will tolerate a certain amount of damage before it collapses: this value is captured by $\Omega_m$. $\Omega$ will quickly reach values close to 1 when nodes with a key role in the integrity of the system are removed. Of course, the system could be heavily harmed by removing many less relevant nodes (e.g., the peripheral ones) with an attack that causes a small decrease in LCC size over time, and probably get a low value of $\Omega$. However, this kind of attack does not need an early-warning signal, since it does not cause an abrupt disruption of the system and can be easily detected.
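A minimal sketch of this computation is given below; `p` maps nodes to their predicted $p_n$, `dismantling_set` plays the role of $S_o$ and `removed` the role of $S$. The names and toy values are illustrative.

```python
# Minimal sketch of the Early Warning descriptor Omega.
def early_warning(p, dismantling_set, removed):
    omega_m = sum(p[n] for n in dismantling_set)  # damage budget before percolation
    omega_s = sum(p[n] for n in removed)          # predicted damage inflicted so far
    return min(omega_s / omega_m, 1.0)            # Omega_s / Omega_m, capped at 1


p = {0: 0.9, 1: 0.8, 2: 0.1, 3: 0.05}             # toy p_n values
print(early_warning(p, dismantling_set={0, 1}, removed={1, 3}))  # -> 0.5
```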
Why do we need an Early Warning signal?

In Figure 4 we show a toy example meant to explain why the Largest Connected Component size may not be enough to determine the state of a system. The toy-example network in Figure 4a is composed of two cliques (fully connected sub-networks) connected by a few border nodes (bridges) that also belong to the respective cliques. Many dismantling approaches (like the degree- and betweenness-based heuristics, or even ours) would remove those bridge nodes first, meaning that the network would eventually break in two, as shown in Figure 4b. Now, when most of the bridge nodes have been removed, the LCC is still quite large, as it still includes most of the nodes, yet it takes just a few more removals of the bridges to break the network in two. While $\Omega$ is able to capture the imminent system disruption (i.e., the $\Omega$ value gets close to 1 very fast), the LCC size is not, and one would notice only when it is too late. Moreover, the LCC curve during the initial part of the attack is exactly the same as the one in Figure 4c, showing the removal of nodes in inverse degree (or betweenness) order, which does not cause the percolation of the system. Again, $\Omega$ captures this difference and does not grow, meaning that a slow degradation should be expected.

Tests on real-world systems
We test our method on key infrastructure networks and predict the collapse of the system under various attack strategies (see Fig. 5 for details). Remarkably, while the LCC size decreases slowly without providing a clear alarm signal until the system is heavily damaged and collapses, $\Omega$ grows faster when critical nodes are successfully attacked, reaching warning levels well before the system is disrupted, as highlighted by the First Response Time, defined as the time occurring between the system's collapse and an early-warning signal of 50% (i.e., $\Omega = 0.5$). Moreover, the first-order derivative of $\Omega$ tracks the importance of the nodes being attacked, providing a measure of the attack intensity over time.

Discussion
Our results show that using machine learning to learn network dismantling comes with a series of advantages. While the ultimate theoretical framework is still missing, our framework allows one to learn directly from the data, at variance with traditional approaches which rely on the definition of new heuristics, metrics or algorithms. An important advantage of our method, typical of data-driven modeling, is that it can be further improved by simply re-tuning the parameters of the underlying model and training again: conversely, existing approaches require the (re)definition of heuristics and algorithms, which is more demanding in terms of human effort. Remarkably, the computational complexity of dismantling networks with our framework is considerably low: just $O(N + E)$, where $N$ is the system's size and $E$ the number of connections – which drops to $O(N)$ for sparse networks. This feature allows for applications to systems consisting of millions of nodes while keeping excellent performance in terms of computing time and accuracy. Last but not least, from a methodological perspective, it is worth remarking that our framework is general enough to be adapted and applied to other interesting NP-hard problems on networks, opening the door to new opportunities and promising research directions in complexity science, together with very recent results employing machine learning, for instance, to predict extreme events ( ).

The impact of our results is broad. On the one hand, we provide a framework which disintegrates real systems more efficiently and faster than state-of-the-art approaches: for instance, applications to covert networks might allow one to hinder communications and information exchange between harmful individuals. On the other hand, we provide a quantitative descriptor of damage which is more predictive than existing ones, such as the size of the largest connected component: our measure allows one to estimate the potential system's collapse due to subsequent damage, providing policy and decision makers with a quantitative early-warning signal for triggering a timely response to systemic emergencies in water management systems, power grids, communication and public transportation networks.

References
1. Réka Albert, Hawoong Jeong, and Albert-László Barabási. Error and attack tolerance of complex networks. Nature, 406(6794):378, 2000.
2. Luiz G. A. Alves, Giuseppe Mangioni, Isabella Cingolani, Francisco Aparecido Rodrigues, Pietro Panzarasa, and Yamir Moreno. The nested structural organization of the worldwide trade multi-layer network. Scientific Reports, 9(1):2866, 2019.
3. Alex Arenas, Albert Díaz-Guilera, Jurgen Kurths, Yamir Moreno, and Changsong Zhou. Synchronization in complex networks. Physics Reports, 469(3):93–153, 2008.
4. Albert-László Barabási and Réka Albert. Emergence of scaling in random networks. Science, 286(5439):509–512, 1999.
5. Andrea Baronchelli. The emergence of consensus: a primer. Royal Society Open Science, 5(2):172189, 2018.
6. Marc Barthelemy. The statistical physics of cities. Nature Reviews Physics, 1:406–415, 2019.
7. Amir Bashan, Yehiel Berezin, Sergey V Buldyrev, and Shlomo Havlin. The extreme vulnerability of interdependent spatially embedded networks. Nature Physics, 9(10):667, 2013.
8. Danielle S Bassett and Olaf Sporns. Network neuroscience. Nature Neuroscience, 20(3):353, 2017.
9. Austin R Benson, David F Gleich, and Jure Leskovec. Higher-order organization of complex networks. Science, 353(6295):163–166, 2016.
10. Stefano Boccaletti, Ginestra Bianconi, Regino Criado, Charo I Del Genio, Jesús Gómez-Gardeñes, Miguel Romance, Irene Sendiña-Nadal, Zhen Wang, and Massimiliano Zanin. The structure and dynamics of multilayer networks. Physics Reports, 544(1):1–122, 2014.
11. Stefano Boccaletti, Vito Latora, Yamir Moreno, Martin Chavez, and D-U Hwang. Complex networks: Structure and dynamics. Physics Reports, 424(4-5):175–308, 2006.
12. Alfredo Braunstein, Luca Dall'Asta, Guilhem Semerjian, and Lenka Zdeborová. Network dismantling. Proceedings of the National Academy of Sciences, 113(44):12368–12373, 2016.
13. Sergey V Buldyrev, Roni Parshani, Gerald Paul, H Eugene Stanley, and Shlomo Havlin. Catastrophic cascade of failures in interdependent networks. Nature, 464(7291):1025, 2010.
14. Damon Centola, Joshua Becker, Devon Brackbill, and Andrea Baronchelli. Experimental evidence for tipping points in social convention. Science, 360(6393):1116–1119, 2018.
15. Aaron Clauset, Cristopher Moore, and Mark EJ Newman. Hierarchical structure and the prediction of missing links in networks. Nature, 453(7191):98, 2008.
16. Pau Clusella, Peter Grassberger, Francisco J. Pérez-Reche, and Antonio Politi. Immunization and targeted destruction of networks using explosive percolation. Physical Review Letters, 117:208301, 2016.
17. Manlio De Domenico, Clara Granell, Mason A Porter, and Alex Arenas. The physics of spreading processes in multilayer networks. Nature Physics, 12(10):901–906, 2016.
18. Manlio De Domenico, Albert Solé-Ribalta, Emanuele Cozzo, Mikko Kivelä, Yamir Moreno, Mason A Porter, Sergio Gómez, and Alex Arenas. Mathematical formulation of multilayer networks. Physical Review X, 3(4):041022, 2013.
19. Santo Fortunato. Community detection in graphs. Physics Reports, 486(3-5):75–174, 2010.
20. Roger Guimera and Luis A Nunes Amaral. Functional cartography of complex metabolic networks. Nature, 433(7028):895, 2005.
21. William L. Hamilton, Rex Ying, and Jure Leskovec. Inductive representation learning on large graphs. 2017.
22. Neil F Johnson, M Zheng, Yulia Vorobyeva, Andrew Gabriel, Hong Qi, Nicolás Velásquez, Pedro Manrique, Daniela Johnson, E Restrepo, Chaoming Song, et al. New online ecology of adversarial aggregates: ISIS and beyond. Science, 352(6292):1459–1463, 2016.
23. NF Johnson, R Leahy, N Johnson Restrepo, N Velasquez, M Zheng, P Manrique, P Devkota, and S Wuchty. Hidden resilience and adaptive dynamics of the global online hate ecology. Nature, 573(7773):261–265, 2019.
24. Maksim Kitsak, Lazaros K Gallos, Shlomo Havlin, Fredrik Liljeros, Lev Muchnik, H Eugene Stanley, and Hernán A Makse. Identification of influential spreaders in complex networks. Nature Physics, 6(11):888, 2010.
25. Mikko Kivelä, Alex Arenas, Marc Barthelemy, James P Gleeson, Yamir Moreno, and Mason A Porter. Multilayer networks. Journal of Complex Networks, 2(3):203–271, 2014.
26. Renaud Lambiotte, Martin Rosvall, and Ingo Scholtes. From networks to optimal higher-order models of complex systems. Nature Physics, 15(4):313–320, 2019.
27. David Lazer, Alex Pentland, Lada Adamic, Sinan Aral, Albert-László Barabási, Devon Brewer, Nicholas Christakis, Noshir Contractor, James Fowler, Myron Gutmann, et al. Computational social science. Science, 323(5915):721–723, 2009.
28. Joan T Matamalas, Alex Arenas, and Sergio Gómez. Effective approach to epidemic containment using link equations in complex networks. Science Advances, 4(12):eaau4212, 2018.
29. Flaviano Morone and Hernán A Makse. Influence maximization in complex networks through optimal percolation. Nature, 524(7563):65, 2015.
30. Flaviano Morone, Byungjoon Min, Lin Bo, Romain Mari, and Hernán A Makse. Collective influence algorithm to find influencers via optimal percolation in massively large social media. Scientific Reports, 6:30062, 2016.
31. Mark EJ Newman. Communities, modules and large-scale structure in networks. Nature Physics, 8(1):25–31, 2012.
32. Saeed Osat, Ali Faqeeh, and Filippo Radicchi. Optimal percolation on multiplex networks. Nature Communications, 8(1):1540, 2017.
33. Romualdo Pastor-Satorras, Claudio Castellano, Piet Van Mieghem, and Alessandro Vespignani. Epidemic processes in complex networks. Reviews of Modern Physics, 87(3):925, 2015.
34. Di Qi and Andrew J Majda. Using machine learning to predict extreme events in complex systems. Proceedings of the National Academy of Sciences, 117(1):52–59, 2020.
35. Filippo Radicchi. Percolation in real interdependent networks. Nature Physics, 11(7):597, 2015.
36. Xiao-Long Ren, Niels Gleinig, Dirk Helbing, and Nino Antulov-Fantulin. Generalized network dismantling. Proceedings of the National Academy of Sciences, 116(14):6554–6559, 2019.
37. Haroldo V Ribeiro, Luiz G A Alves, Alvaro F Martins, Ervin K Lenzi, and Matjaž Perc. The dynamical structure of political corruption networks. Journal of Complex Networks, 6(6):989–1003, 2018.
38. Massimo Stella, Emilio Ferrara, and Manlio De Domenico. Bots increase exposure to negative and inflammatory content in online social systems. Proceedings of the National Academy of Sciences, 115(49):12435–12440, 2018.
39. Samir Suweis, Filippo Simini, Jayanth R Banavar, and Amos Maritan. Emergence of structural and dynamical properties of ecological mutualistic networks. Nature, 500(7463):449, 2013.
40. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. 2017.
41. Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph attention networks. 2018.
42. Soroush Vosoughi, Deb Roy, and Sinan Aral. The spread of true and false news online. Science, 359(6380):1146–1151, 2018.
43. Duncan J Watts and Steven H Strogatz. Collective dynamics of 'small-world' networks. Nature, 393(6684):440, 1998.
44. Yang Yang, Takashi Nishikawa, and Adilson E Motter. Small vulnerable sets determine large network cascades in power grids. Science, 358(6365):eaan3184, 2017.
45. Lenka Zdeborová, Pan Zhang, and Hai-Jun Zhou. Fast and simple decycling and dismantling of networks. Scientific Reports, 6(1), 2016.
Figure 3: Dismantling empirical complex systems. Per-method cumulative area under the curve (AUC) of real-world network dismantling; the lower, the better. The dismantling target for each method is a fixed fraction of the network size. Each value is scaled to the one of our approach (GDM) for the same network. GND stands for Generalized Network Dismantling (with cost matrix W = I), MS stands for Min-Sum, EI stands for Explosive Immunization and CI for Collective Influence. +R means that the reinsertion phase is performed. CoreHD and CI are compared to the +R algorithms as they include the reinsertion phase. Also, note that some values are clipped (limited) to 3x for the MS heuristic to improve visualization.

Figure 4: Toy example meant to explain why the LCC is not sufficient to evaluate the state of the system. (a) A toy-example network composed of two cliques connected by bridges; the size of the nodes represents their betweenness value and the color (from dark red to white) represents their importance to the system's health according to our method. (b) Degree- or betweenness-based attack. (c) Inverse degree- or betweenness-based attack. The LCC decreases at the same rate during the initial part of both attacks shown; the $\Omega$ values, instead, do not, and reach warning levels before the system suddenly collapses.
Figure 5: Early warning due to network dismantling of real infrastructures. Three empirical systems, namely the European power grid (left), the North-American power grid (middle) and the London public transport (right), are repeatedly attacked using a degree-based heuristic, i.e., hubs are damaged first. A fraction of the most vulnerable stations is shown for the original systems and for some representative damaged states (i.e., before and after the critical point for the system's collapse) in the top of the figure. The plots show the behavior of the largest (LCC) and second-largest (SLCC) connected components, as well as the behavior of $\Omega$, the Early Warning descriptor introduced in this study, and the $p_n$ value of each removed node (PI). Transitions between green and red areas indicate the percolation point of the corresponding systems, found through the SLCC peak. We also show the first response time in arbitrary units (AU), to highlight how our framework allows one to anticipate the system's collapse, allowing for timely emergency response.
Supplementary Materials
Here, we provide additional examples and details about the results discussed in the main text. We begin by detailing the architecture of the employed deep learning model and the way it is trained. Secondly, we discuss the computational complexity of our framework, and then we show more toy and real-world examples of the dismantling process and of the Early Warning $\Omega$. Lastly, we list the test networks used to evaluate our approach.

Deep Learning Model
How the model works
A simplistic but more practical understanding of how a model with $L$ GAT network layers assigns the $p_n$ value to each node $n$ can be achieved by considering the $L$-hop neighborhood of the node. Consider a tree of height $L$ with node $n$ as root and where each node's neighbors are its children. That is, each level $l + 1$ of the tree is populated with the neighbors of the nodes at level $l$. For instance, at level 1 we have $n$'s neighbors.

Now, $n$'s high-level node features ($h_n^L$) are computed by aggregating the information from the $L$-hop neighborhood in a bottom-up fashion. Each GAT network layer processes a level of the tree, so the deeper the model, the farther the information comes from. That is, the model starts from the bottom of the tree (i.e., the nodes at $L$ hops from $n$) to compute the high-level node features of each node at layer $L$, and goes up until the root (node $n$) is reached. This means that the model is able to aggregate in $h_n^L$ the information from the whole $L$-hop neighborhood of $n$, which also accounts for the different importance each node has in that neighborhood, thanks to the GAT's self-attention mechanism. The basic idea is somehow similar to the Collective Influence approach, with the main differences being that the geometric deep learning model learns a weighted sum function from the training data to aggregate many node features, whereas Collective Influence just sums the degrees, and also that the model aggregates the whole $L$-hop neighborhood ball, not just its frontier. These high-level features ($h_n^L$) are then fed to a regressor that returns $p_n$, the node's structural-importance indicator used in our work.

The actual implementation of our model relies on the PyTorch Geometric library ( ) on top of PyTorch ( ), while the handling of the graphs (i.e., the implementation of the data structures, the removal of the nodes and the computation of the connected components) is performed using graph-tool ( ).

Training
We train our models in a supervised manner. Our training data is composed of small synthetic networks generated using the Barabási-Albert (BA), the Erdős-Rényi (ER) and the Static Power law generative models implemented in iGraph ( ) and NetworkX ( ). Each synthetic network is dismantled optimally using brute force, and nodes are assigned a numeric label (the learning target) that depends on their presence in the optimal dismantling set(s). That is, all combinations of nodes of increasing length are removed until we find at least one that shrinks the Largest Connected Component (LCC) to a given target size; then, the label of each node is computed as the number of optimal sets it belongs to, divided by the total number of optimal sets. For example, if there is only one set of optimal size, we assign a label value of 1 to the nodes in that set and 0 to all other nodes. This is meant to teach the model that some nodes are more critical than others, since they belong to many optimal dismantling sets.

We stress that the training label is arbitrary and others may work better for other training sets or targets. Moreover, while we train on a general-purpose dataset that includes both power law and ER networks, the training networks can also be chosen to fit the target networks, e.g., by using networks from similar domains or with similar characteristics.
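The labeling step can be sketched as follows (our reading of the procedure, using NetworkX and illustrative names); the exhaustive search is only feasible because the training networks are very small.

```python
# Minimal sketch of optimal-dismantling label generation (illustrative).
from itertools import combinations

import networkx as nx


def lcc_size(g: nx.Graph) -> int:
    return max((len(c) for c in nx.connected_components(g)), default=0)


def optimal_dismantling_labels(g: nx.Graph, target_fraction: float) -> dict:
    target = target_fraction * g.number_of_nodes()
    for k in range(1, g.number_of_nodes() + 1):
        # All removal sets of size k that shrink the LCC below the target.
        optimal = [s for s in combinations(g.nodes, k)
                   if lcc_size(g.subgraph(set(g.nodes) - set(s))) <= target]
        if optimal:  # smallest k wins: these are the optimal sets
            return {n: sum(n in s for s in optimal) / len(optimal) for n in g.nodes}
    return {}


g = nx.barabasi_albert_graph(10, 1, seed=42)  # toy training network
print(optimal_dismantling_labels(g, target_fraction=0.3))
```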
Node features

Considering that the model can process any combination of features, one could just choose to stuff in every suitable node metric that comes to mind and, since Deep Neural Networks are proven to learn feature importance, let them do the rest. On the other hand, it could also be tempting to use no features at all (e.g., a constant value for every node), since Kipf et al. ( ) showed that their Graph Convolutional Network (GCN), a particular type of convolutional-style graph neural network, can learn to linearly separate communities based on the network structure alone and on minimal supervision (one labelled node per community), meaning that convolutional-style neural networks can leverage the network topology to assign a higher-level node feature that describes its role in the network.

We argue that, while the first idea could make sense for scenarios where training data is abundant and the features are cheap to compute, and while the second shows worse (with respect to models with simple features) but still interesting performance, it makes sense to perform some feature selection a priori to keep the computational complexity of the attack low and also to speed up the learning process. With that in mind, we pick the node degree (plus its Pearson's chi-square statistic, $\chi^2$, over the neighborhood), the $k$-coreness and the local clustering coefficient as node features.
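A sketch of these features with NetworkX is given below; the chi-square term is our interpretation of "Pearson's chi-square statistic over the neighborhood" (here, of neighbor degrees against their mean) and may differ from the authors' implementation.

```python
# Minimal sketch of the node features (degree, chi-square over the
# neighborhood, k-coreness, local clustering); illustrative only.
import networkx as nx


def node_features(g: nx.Graph) -> dict:
    deg = dict(g.degree())
    core = nx.core_number(g)   # k-coreness, computable in O(N + E)
    clust = nx.clustering(g)   # local clustering coefficient
    feats = {}
    for n in g:
        neigh = [deg[v] for v in g[n]]
        mean = sum(neigh) / len(neigh) if neigh else 0.0
        # Assumed form: Pearson chi-square of neighbor degrees vs. their mean.
        chi2 = sum((d - mean) ** 2 / mean for d in neigh) if mean else 0.0
        feats[n] = [deg[n], chi2, core[n], clust[n]]
    return feats


print(node_features(nx.karate_club_graph())[0])
```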
Parameters

We run a grid search to test various combinations of model parameters, reported here, and select the models that better fit the dismantling target (i.e., lower area under the curve or lower number of removals).

• Convolutional-style layers: Graph Attention Network layers.
  – Number of layers;
  – Output channels for each layer, sometimes with a decreasing value between consecutive layers;
  – Multi-head attention with concatenated heads;
  – Dropout probability: fixed;
  – Leaky ReLU angle of the negative slope: fixed;
  – Each layer learns an additive bias;
  – Each layer is coupled with a linear layer with the same number of input and output channels;
  – Activation function: Exponential Linear Unit (ELU). The input at each convolutional layer is the sum of the outputs of the GAT and the linear layers.
• Regressor: Multi-Layer Perceptron.
  – Number of layers;
  – Number of neurons per layer, sometimes with a decreasing value between consecutive layers.
• Learning rate: fixed.
• Epochs: we train each model for a fixed number of epochs.

Computational complexity
The computational complexity of our approach mainly depends on two elements: 1) the computational complexity of the node features used and 2) the computational complexity of the convolutional-style layers in the model. In particular, the convolutional-style layers that we employ, i.e., the Graph Attention Network, scale as $O(N + E)$, where $N$ is the number of nodes and $E$ is the number of edges in the network. Considering that real-world networks are usually sparse, we assume that $O(E) \approx O(N)$, so $O(N + E) \approx O(N)$, and the computational complexity of our approach is the maximum between this and the computational complexity of the features. Given that, the most expensive feature we compute in our experiments is the $k$-coreness, which is $O(N + E)$, so the computational complexity of the approach detailed above is $O(N)$. For what concerns the computational complexity of the brute force performed during the training-set generation, it is irrelevant, as it is a highly parallelizable one-time task that is performed on very small networks. Moreover, since the neural models can generalize, there is no need to train them for each dismantling, and the actual time spent training is negligible.

Other Results
Understanding GDM’s behavior
Before testing on real-world networks, we investigate the behavior of our approach by dismantling some toy-example networks. To this aim, we employ the same low computational complexity node features from the main paper (also detailed above).

The first toy example, shown in Figure 1a, is a network built from three ego networks joined by a bridge. The betweenness-based heuristic, and also our common sense, would suggest removing the bridge first, reducing the LCC size to one third of the initial value, and then removing the nodes at the center of the disconnected ego networks left, for a total of four removals. Instead, our model predicts a different strategy and removes only the cores of the ego sub-networks, reaching the same LCC size with just three removals, as shown in Figure 2a. (The betweenness-based heuristic removes nodes by descending betweenness centrality order; node betweenness is a centrality measure that captures the importance of a node to the shortest paths through the network.)

At this point, we want to probe whether the model is just learning to remove the nodes in descending degree order, as the previous example would suggest. If that were the case, in our second toy-example network, composed of a clique with an appended tail as illustrated in Figure 1b, the model would remove the nodes in the clique first, given their high degree. Instead, the tail is detached first, meaning that the predicted strategy differs from the degree-based one, and both the degree- and betweenness-based heuristics are outperformed, as shown in Figure 2b.

Figure 1: Toy examples: (a) three bridged ego networks; (b) tailed clique. The color of the nodes represents (from dark red to white) the removal order of the predicted strategy, while their size represents their betweenness value.
Figure 2: Dismantling the toy-example networks using our approach, GDM, and the degree- and betweenness-based heuristics as comparison: (a) three bridged ego networks; (b) tailed clique.

Dismantling real-world complex systems
Enhancement of metric-based heuristics
In order to better understand how our framework is able to outperform cutting-edge algorithms, we compare existing node metric-based heuristics (e.g., removal of nodes in degree order) against GDM models that employ the corresponding node metric as the only node feature. As an example, in Figure 3 we display the enhancement of the degree- and the betweenness-based heuristics in the left and right columns, respectively. These GDM-enhanced heuristics effectively outperform the vanilla ones, highlighting the fact that the model is able to capture the importance of the nodes thanks to the feature propagation discussed before. This also gives an important insight, as the model seems to learn correlations between node features.
Dismantling of configuration model rewired networks
We investigate further whether the model is learning correlations among node features by dismantling configuration model rewirings of the networks in our test set. (The configuration model of a network keeps the observed connectivity distribution while destroying topological correlations, meaning that feature correlations are lost.) If that is the case, the dismantling power of our approach should drop on such rewired networks. In Figure 4 we show the dismantling of the configuration models, with the original instance as comparison. In all the tested networks, there is a severe performance drop. For instance, in Figure 4b it takes just a few removals to dismantle the original instance of the Moreno crime network, while the LCC size of the rewired networks after the same number of removals is still very large. This result confirms our insight: existing topological correlations are learned and, consequently, exploited by the machine.
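One way to reproduce this test, sketched below with NetworkX, is to sample degree-preserving rewirings via double edge swaps (one standard way of approximating the configuration model) and then dismantle both versions; the function name and swap counts are illustrative.

```python
# Minimal sketch: degree-preserving rewiring that destroys correlations.
import networkx as nx


def degree_preserving_rewiring(g: nx.Graph, swaps_per_edge: int = 10, seed: int = 0) -> nx.Graph:
    r = g.copy()
    nswap = swaps_per_edge * r.number_of_edges()
    nx.double_edge_swap(r, nswap=nswap, max_tries=100 * nswap, seed=seed)
    return r  # same degree sequence, topological correlations scrambled


g = nx.karate_club_graph()
rewired = degree_preserving_rewiring(g)
# The degree sequence is preserved, while higher-order structure is not.
assert sorted(d for _, d in g.degree()) == sorted(d for _, d in rewired.degree())
```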
Dismantling results

In the main paper we compare our approach with the state-of-the-art algorithms. In Table 1 we report the same results in numerical form. The table also includes other commonly used static attack approaches, which remove the nodes in descending order of importance according to some node centrality metric. While many heuristics fall in this category, we compare with the removal of nodes in descending degree ( ), betweenness ( ) and PageRank ( ) order. Our approach outperforms all these static approaches by a significant margin, even the ones with higher computational complexity (e.g., the betweenness-based one).

Dismantling of large networks
While in the main paper we compare our approach on small and medium size networks, in this section we extend the comparison against the more promising state-of-the-art algorithms (GND and MS, with and without reinsertion, and CoreHD) to large networks with millions of nodes and edges.

As shown in Figure 5 and in Table 2, the results from the main paper are confirmed even for these networks, although with smaller margins. This is still impressive, as the proposed approach is static while the others recompute the nodes' structural importance during the dismantling process, which involves many removals for these networks and changes the network topology drastically, confirming the validity of our approach.

In Table 3, we also report the prediction (if any) and dismantling time of each of the above-mentioned methods, to give a better idea of what their different computational complexities mean and translate into.
Dismantling curves
In Figure 6, we display the dismantling of most of our test networks and compare with the state-of-the-art algorithms and with the heuristics introduced in the previous paragraph. As previously mentioned, one of the advantages of our approach is that we can choose the best model to reach a given objective. As an example, we show the models that lower the area under the curve (GDM AUC) and those that lower the number of removals (GDM Removals).
More Early Warning Ω examples

In addition to the example applications of $\Omega$ illustrated in the main paper, we also test whether it can detect the collapse of other systems. In particular, we show the SciGRID European power grid (eu-powergrid) under random failures and under degree-based or Min-Sum + Reinsertion attacks in Figure 8, and also various American roads under Generalized Network Dismantling + Reinsertion attacks in Figure 9. In all these scenarios, $\Omega$ is able to detect the system damage and reaches warning levels before the system collapse actually happens, even in the case of multiple large connected components detaching from the largest one as the attack goes on.

Dataset
In Table 4 we list the test networks used in our experiments, with their category and size (number of nodes and edges). These networks model systems from various domains (e.g., biological, infrastructure and social data, and so on), and range from a few hundred nodes to more than one million. For more about each network, we refer the reader to the original source.

Test environment
Here we detail the environment where our experiments were performed and the tools used. All experiments ran on a shared machine equipped with two Intel Xeon E5-2620 CPUs, 128 GB of RAM and a dual-GPU nVidia Tesla K80 (with 12 GB of VRAM each). More details about the drivers used and the full package dependency list of our code can be found in the code package.

Concerning the other algorithms used in our comparison (i.e., GND, EGND, MS, CoreHD and EI), we use the authors' official code with default parameters. Specifically, we use an identity weight input matrix for both GND and EGND (and the relative fine-tuning algorithm), and K trials for the EGND.

References
1. Jdk dependency network dataset – KONECT, September 2016.
2. Advogato network dataset – KONECT, April 2017.
3. Brightkite network dataset – KONECT, April 2017.
4. Caenorhabditis elegans network dataset – KONECT, April 2017.
5. Catster/dogster familylinks/friendships network dataset – KONECT, April 2017.
6. Citeseer network dataset – KONECT, April 2017.
7. Crime network dataset – KONECT, April 2017.
8. Dblp co-authorship network dataset – KONECT, April 2017.
9. Dblp network dataset – KONECT, April 2017.
10. Digg friends network dataset – KONECT, April 2017.
11. Digg network dataset – KONECT, October 2017.
12. Douban network dataset – KONECT, April 2017.
13. Eu institution network dataset – KONECT, April 2017.
14. Florida ecosystem dry network dataset – KONECT, April 2017.
15. Florida ecosystem wet network dataset – KONECT, April 2017.
16. Gnutella network dataset – KONECT, April 2017.
17. Google.com internal network dataset – KONECT, April 2017.
18. Gowalla network dataset – KONECT, April 2017.
19. Hamsterster full network dataset – KONECT, April 2017.
20. Human protein (figeys) network dataset – KONECT, April 2017.
21. Human protein (stelzl) network dataset – KONECT, April 2017.
22. Human protein (vidal) network dataset – KONECT, April 2017.
23. Hyves network dataset – KONECT, April 2017.
24. Internet topology network dataset – KONECT, April 2017.
25. Jung and javax dependency network dataset – KONECT, April 2017.
26. Linux network dataset – KONECT, April 2017.
27. Little rock lake network dataset – KONECT, April 2017.
28. Notre dame network dataset – KONECT, April 2017.
29. Openflights network dataset – KONECT, April 2017.
30. Protein network dataset – KONECT, April 2017.
31. Route views network dataset – KONECT, April 2017.
32. Slashdot threads network dataset – KONECT, April 2017.
33. Slashdot zoo network dataset – KONECT, April 2017.
34. Stanford network dataset – KONECT, April 2017.
35. Train bombing network dataset – KONECT, April 2017.
36. Twitter (icwsm) network dataset – KONECT, October 2017.
37. Twitter lists network dataset – KONECT, April 2017.
38. Uc irvine messages network dataset – KONECT, April 2017.
39. Us power grid network dataset – KONECT, April 2017.
40. Wordnet network dataset – KONECT, April 2017.
41. Caenorhabditis elegans (neural) network dataset – KONECT, January 2018.
42. Ciaodvd trust network dataset – KONECT, January 2018.
43. Erdős network dataset – KONECT, March 2018.
44. Filmtrust trust network dataset – KONECT, January 2018.
45. Political blogs network dataset – KONECT, January 2018.
46. Wikipedia links (kn) network dataset – KONECT, January 2018.
47. Wikipedia links (li) network dataset – KONECT, January 2018.
48. Lada A. Adamic and Natalie Glance. The political blogosphere and the 2004 US election: Divided they blog. In Proc. Int. Workshop on Link Discov., pages 36–43, 2005.
49. Reka Albert, Hawoong Jeong, and Albert-Laszlo Barabási. Internet: Diameter of the world-wide web. Nature, 401(6749):130–131, 1999.
50. Vladimir Batagelj and Andrej Mrvar. Pajek datasets, July.
51. Kurt Bollacker, Steve Lawrence, and C. Lee Giles. CiteSeer: An autonomous Web agent for automatic retrieval and identification of interesting publications. In Proc. Int. Conf. on Autonomous Agents, pages 116–123, 1998.
52. Thomas Brinkhoff. A framework for generating network-based moving objects. GeoInformatica, 6, 2000.
53. CAIDA. Ipv4 routed /24 as links dataset.
54. Eunjoon Cho, Seth A. Myers, and Jure Leskovec. Friendship and mobility: User movement in location-based social networks. In Proc. Int. Conf. on Knowledge Discovery and Data Mining, pages 1082–1090, 2011.
55. Munmun De Choudhury, Yu-Ru Lin, Hari Sundaram, K. Selçuk Candan, Lexing Xie, and Aisling Kelliher. How does the data sampling strategy impact the discovery of information diffusion in social media? In ICWSM, pages 34–41, 2010.
56. Munmun De Choudhury, Hari Sundaram, Ajita John, and Dorée Duncan Seligmann. Social synchrony: Predicting mimicry of user actions in online social media. In Proc. Int. Conf. on Comput. Science and Engineering, pages 151–158, 2009.
57. Stéphane Coulomb, Michel Bauer, Denis Bernard, and Marie-Claude Marsolier-Kergoat. Gene essentiality and the topology of protein interaction networks. Proceedings of the Royal Society B: Biological Sciences, 272(1573):1721–1725, 2005.
58. Gabor Csardi and Tamas Nepusz. The igraph software package for complex network research. InterJournal, Complex Systems:1695, 2006.
59. Manlio De Domenico, Albert Solé-Ribalta, Sergio Gómez, and Alex Arenas. Navigability of interconnected networks under random failures. Proceedings of the National Academy of Sciences, 111(23):8351–8356, 2014.
60. Jordi Duch and Alex Arenas. Community detection in complex networks using extremal optimization. Phys. Rev. E, 72(2):027104, 2005.
61. Rob M. Ewing, Peter Chu, Fred Elisma, Hongyan Li, Paul Taylor, Shane Climie, Linda McBroom-Cerajewski, Mark D. Robinson, Liam O'Connor, Michael Li, Rod Taylor, Moyez Dharsee, Yuen Ho, Adrian Heilbut, Lynda Moore, Shudong Zhang, Olga Ornatsky, Yury V. Bukhman, Martin Ethier, Yinglun Sheng, Julian Vasilescu, Mohamed Abu-Farha, Jean-Philippe P. Lambert, Henry S. Duewel, Ian I. Stewart, Bonnie Kuehl, Kelly Hogue, Karen Colwill, Katharine Gladwish, Brenda Muskat, Robert Kinach, Sally-Lin L. Adams, Michael F. Moran, Gregg B. Morin, Thodoros Topaloglou, and Daniel Figeys. Large-scale mapping of human protein–protein interactions by mass spectrometry. Molecular Systems Biology, 3, 2007.
62. Christiane Fellbaum, editor. WordNet: an Electronic Lexical Database. MIT Press, 1998.
63. Matthias Fey and Jan E. Lenssen. Fast graph representation learning with PyTorch Geometric. In ICLR Workshop on Representation Learning on Graphs and Manifolds, 2019.
64. Guobing Guo, Jia Zhang, and Neil Yorke-Smith. A novel Bayesian similarity measure for recommender systems. In Proc. Int. Joint Conf. on Artif. Intell., pages 2619–2625, 2013.
65. Guobing Guo, Jie Zhang, Daniel Thalmann, and Neil Yorke-Smith. ETAF: An extended trust antecedents framework for trust prediction. In Proc. Int. Conf. Adv. in Soc. Netw. Anal. and Min., pages 540–547, 2014.
66. Vicenç Gómez, Andreas Kaltenbrunner, and Vicente López. Statistical analysis of the social network and discussion threads in Slashdot. In Proc. Int. World Wide Web Conf., pages 645–654, 2008.
67. Aric A. Hagberg, Daniel A. Schult, and Pieter J. Swart. Exploring network structure, dynamics, and function using NetworkX. In Proceedings of the 7th Python in Science Conference (SciPy2008), pages 11–15, Pasadena, CA USA, August 2008.
68. Jing-Dong J Han, Denis Dupuy, Nicolas Bertin, Michael E Cusick, and Marc Vidal. Effect of sampling on topology predictions of protein-protein interaction networks. Nature Biotechnology, 23(7):839–844, 2005.
69. Brian Hayes. Connecting the dots. Can the tools of graph theory and social-network studies unravel the next big plot? American Scientist, 94(5):400–404, 2006.
70. T. Hogg and K. Lerman. Social dynamics of Digg. EPJ Data Science, 1(5), 2012.
71. Petter Holme, Beom Jun Kim, Chang No Yoon, and Seung Kee Han. Attack vulnerability of complex networks. Phys. Rev. E, 65:056109, 2002.
72. Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.
73. Jérôme Kunegis, Andreas Lommatzsch, and Christian Bauckhage. The Slashdot Zoo: Mining a social network with negative edges. In Proc. Int. World Wide Web Conf., pages 741–750, 2009.
74. Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. Graphs over time: Densification laws, shrinking diameters and possible explanations. In Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining (KDD '05), pages 177–187, New York, NY, USA, 2005. Association for Computing Machinery.
75. Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. Graph evolution: Densification and shrinking diameters. ACM Trans. Knowledge Discovery from Data, 1(1):1–40, 2007.
76. Jure Leskovec and Andrej Krevl. SNAP Datasets: Stanford large network dataset collection, June 2014.
77. Jure Leskovec, Kevin Lang, Anirban Dasgupta, and Michael W. Mahoney. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics, 6(1):29–123, 2009.
78. Michael Ley. The DBLP computer science bibliography: Evolution, research issues, perspectives. In Proc. Int. Symposium on String Processing and Information Retrieval, pages 1–10, 2002.
79. Feifei Li, Dihan Cheng, Marios Hadjieleftheriou, George Kollios, and Shang-Hua Teng. On trip planning queries in spatial databases. In Claudia Bauzer Medeiros, Max J. Egenhofer, and Elisa Bertino, editors, Advances in Spatial and Temporal Databases, pages 273–290, Berlin, Heidelberg, 2005. Springer Berlin Heidelberg.
80. Penn State University Libraries. Digital chart of the world server, 2006.
81. Neo D. Martinez, John J. Magnuson, Timothy Kratz, and M. Sierszen. Artifacts or attributes? Effects of resolution on the Little Rock Lake food web. Ecological Monographs, 61:367–392, 1991.
82. Paolo Massa, Martino Salvetti, and Danilo Tomasoni. Bowling alone and trust decline in social network sites. In Proc. Int. Conf. Dependable, Autonomic and Secure Computing, pages 658–663, 2009.
83. Carsten Matke, Wided Medjroubi, and David Kleinhans. SciGRID - An Open Source Reference Model for the European Transmission Network (v0.2), July 2016.
84. Julian McAuley and Jure Leskovec. Learning to discover social circles in ego networks. In Advances in Neural Information Processing Systems, pages 548–556, 2012.
85. Flaviano Morone and Hernán A Makse. Influence maximization in complex networks through optimal percolation. Nature, 524(7563):65, 2015.
86. Tore Opsahl, Filip Agneessens, and John Skvoretz. Node centrality in weighted networks: Generalizing degree and shortest paths. Social Networks, 32(3):245–251, 2010.
87. Tore Opsahl and Pietro Panzarasa. Clustering in weighted networks. Social Networks, 31(2):155–163, 2009.
88. Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The PageRank citation ranking: Bringing order to the web. Technical report, Stanford InfoLab, 1999.
89. Gergely Palla, Illés J. Farkas, Péter Pollner, Imre Derényi, and Tamás Vicsek. Directed network modules. New J. Phys., 9(6):186, 2007.
90. Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in PyTorch. 2017.
91. Tiago P. Peixoto. The graph-tool python library. figshare, 2014.
92. Haroldo V Ribeiro, Luiz G A Alves, Alvaro F Martins, Ervin K Lenzi, and Matjaž Perc. The dynamical structure of political corruption networks. Journal of Complex Networks, 6(6):989–1003, 2018.
93. Matei Ripeanu, Ian Foster, and Adriana Iamnitchi. Mapping the Gnutella network: Properties of large-scale peer-to-peer systems and implications for system design. IEEE Internet Comput. J., 6, 2002.
94. Ryan A. Rossi and Nesreen K. Ahmed. The network data repository with interactive graph analytics and visualization. In AAAI, 2015.
95. Jean-François Rual, Kavitha Venkatesan, Tong Hao, Tomoko Hirozane-Kishikawa, Amélie Dricot, Ning Li, Gabriel F. Berriz, Francis D. Gibbons, Matija Dreze, and Nono Ayivi-Guedehoussou. Towards a proteome-scale map of the human protein–protein interaction network. Nature, (7062):1173–1178, 2005.
96. U. Stelzl, U. Worm, M. Lalowski, C. Haenig, F. H. Brembeck, H. Goehler, M. Stroedicke, M. Zenkner, A. Schoenherr, S. Koeppen, J. Timm, S. Mintzlaff, C. Abraham, N. Bock, S. Kietzmann, A. Goedde, E. Toksöz, A. Droege, S. Krobitsch, B. Korn, W. Birchmeier, H. Lehrach, and E. E. Wanker. A human protein–protein interaction network: A resource for annotating the proteome. Cell, 122:957–968, 2005.
97. Michael PH Stumpf, Carsten Wiuf, and Robert M May. Subnets of scale-free networks are not scale-free: Sampling properties of networks. Proceedings of the National Academy of Sciences of the United States of America, 102(12):4221–4224, 2005.
98. Robert E. Ulanowicz, Johanna J. Heymans, and Michael S. Egnotovich. Network analysis of trophic dynamics in South Florida ecosystems, FY 99: The graminoid ecosystem. Annual Report to the United States Geological Service Biological Resources Division, Ref. No. [UMCES] CBL 00-0176, Chesapeake Biological Laboratory, University of Maryland, 2000.
99. Lovro Šubelj and Marko Bajec. Software systems through complex networks science: Review, analysis and applications. In Proc. Int. Workshop on Software Mining, pages 9–16, 2012.
100. Duncan J. Watts and Steven H. Strogatz. Collective dynamics of 'small-world' networks. Nature, 393(6684):440–442, 1998.
101. John G. White, E. Southgate, J. N. Thomson, and S. Brenner. The structure of the nervous system of the nematode Caenorhabditis elegans. Phil. Trans. R. Soc. Lond., 314:1–340, 1986.
102. Bart Wiegmans. GridKit: European and North-American extracts, March 2016.
103. Jaewon Yang and Jure Leskovec. Defining and evaluating network communities based on ground-truth. In Proc. ACM SIGKDD Workshop on Mining Data Semantics, page 3. ACM, 2012.
104. R. Zafarani and H. Liu. Social computing data repository at ASU, 2009.
105. Beichuan Zhang, Raymond Liu, Daniel Massey, and Lixia Zhang. Collecting the Internet AS-level topology. SIGCOMM Computer Communication Review, 35(1):53–61, 2005.
[Figure: six panels showing LCC size vs. number of removed nodes, with curves for GDM, Betweenness Centrality and Degree: (a) arenas-meta degree, (b) arenas-meta betweenness, (c) foodweb-baywet degree, (d) foodweb-baywet betweenness, (e) inf-USAir97 degree, (f) inf-USAir97 betweenness.]
Figure 3: Comparison of degree and betweenness vanilla heuristics with their GDM-enhanced versions on the arenas-meta, foodweb-baywet and inf-USAir97 networks.
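The curves in Figures 3, 4, 6 and 7 all report the size of the largest connected component (LCC) as a function of the number of removed nodes. As a point of reference, the following is a minimal sketch of how such a curve is produced for the adaptive degree heuristic, assuming networkx; it is not the paper's implementation.

# Minimal sketch of a dismantling curve (normalized LCC size vs. removals)
# for the adaptive degree heuristic. Assumes networkx.
import networkx as nx

def dismantling_curve(graph, target_fraction=0.1):
    g = graph.copy()
    n = g.number_of_nodes()
    curve = []
    while g.number_of_nodes() > 0:
        lcc_size = len(max(nx.connected_components(g), key=len))
        curve.append(lcc_size / n)  # normalized LCC size, i.e. LCC(x)/|N|
        if lcc_size <= target_fraction * n:  # dismantling target reached
            break
        # adaptive degree: recompute degrees after every removal
        hub, _ = max(g.degree, key=lambda pair: pair[1])
        g.remove_node(hub)
    return curve

# Example: curve for a small synthetic scale-free graph
curve = dismantling_curve(nx.barabasi_albert_graph(500, 3))

A GDM-enhanced variant would rank nodes by the machine's scores rather than by degree.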
[Figure: four panels showing LCC size vs. number of removed nodes: (a) arenas-meta network, (b) Moreno crime network, (c) opsahl-openflights network, (d) route-views network.]
Figure 4: Dismantling of original networks (dark blue) and configuration model rewirings for each (light blue).
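The light-blue baselines in Figure 4 are degree-preserving randomizations of each network. A minimal sketch of one configuration-model rewiring, assuming networkx (the paper's exact rewiring procedure may differ):

# Sketch of a configuration-model null model: same degree sequence, random
# wiring, with parallel edges collapsed and self-loops dropped.
import networkx as nx

def configuration_rewiring(graph, seed=0):
    degree_sequence = [d for _, d in graph.degree]
    rewired = nx.configuration_model(degree_sequence, seed=seed)
    rewired = nx.Graph(rewired)  # collapse parallel edges
    rewired.remove_edges_from(nx.selfloop_edges(rewired))
    return rewired

Dismantling curves of the original network and of several such rewirings (e.g., via the dismantling_curve sketch above) can then be overlaid as in Figure 4.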
Columns (heuristics), in order: GDM, GND, EGND, Adaptive degree, EI σ1, Pagerank, Degree, Betweenness, MS, EI σ2, GDM +R, GND +R, CoreHD, MS +R, CI.
ARK201012 LCC: 100.0 99.7 100.1 103.3 128.4 103.1 104.9 123.3 130.9 3883.7 94.5 87.6 92.6 95.8 114.6
advogato: 100.0 108.0 105.5 101.8 111.6 150.1 113.4 114.8 112.6 494.5 94.8 97.5 102.1 102.7 98.8
arenas-meta: 100.0 129.0 141.9 103.6 120.4 114.8 116.4 142.4 120.5 579.3 90.8 92.5 95.5 94.9 96.9
cfinder-google: 100.0 160.4 246.5 99.5 233.7 113.5 141.3 377.9 682.8 1609.3 67.5 105.6 101.0 166.9 114.0
corruption: 100.0 99.3 126.9 157.3 236.3 147.5 400.1 166.7 864.8 1141.9 97.6 147.4 138.6 139.6 176.6
dblp-cite: 100.0 113.3 121.7 113.5 111.7 114.7 131.6 119.0 139.8 533.5 103.9 108.5 132.3 132.5 117.1
dimacs10-celegansneural: 100.0 85.0 95.9 103.1 105.7 116.4 120.8 125.1 117.5 182.2 94.2 103.8 111.6 110.3 99.7
dimacs10-polblogs: 100.0 107.5 97.1 102.1 115.5 112.5 117.9 114.8 107.5 262.3 98.4 108.4 106.0 104.9 104.6
econ-wm1: 100.0 130.3 114.4 109.8 128.0 131.0 129.4 132.7 107.7 309.3 99.6 109.4 106.0 105.9 126.3
ego-twitter: 100.0 116.8 115.8 108.9 103.0 107.8 108.8 133.3 167.3 6017.4 98.8 98.2 114.4 111.7 103.9
eu-powergrid: 100.0 75.9 89.1 138.8 73.8 180.1 163.5 174.5 290.9 3313.0 64.4 66.5 83.4 92.8 109.4
foodweb-baydry: 100.0 104.5 99.5 98.1 103.0 120.5 122.3 109.4 104.4 125.2 97.8 98.0 101.2 99.3 110.6
foodweb-baywet: 100.0 110.2 108.4 99.6 103.9 123.6 125.4 112.9 106.8 128.3 98.5 108.5 102.1 101.8 113.0
inf-USAir97: 100.0 112.4 117.8 130.4 147.0 117.1 139.1 128.6 164.0 633.6 100.1 117.2 103.7 107.6 129.8
internet-topology: 100.0 95.6 95.8 99.1 113.9 109.2 131.4 122.9 138.6 3879.9 94.8 84.7 100.2 101.7 103.0
librec-ciaodvd-trust: 100.0 113.1 115.5 117.6 129.4 120.5 139.8 114.9 126.6 634.5 104.3 114.4 124.4 126.3 126.1
librec-filmtrust-trust: 100.0 108.9 118.3 117.7 112.8 131.8 148.4 158.9 168.7 1308.2 89.7 95.5 106.8 98.6 98.0
linux: 100.0 97.9 101.1 116.2 84.5 176.0 190.8 365.1 150.0 1035.2 78.3 71.4 74.1 80.1 92.1
loc-brightkite: 100.0 100.2 100.3 98.6 97.7 104.3 110.9 122.1 106.7 593.9 89.5 99.7 92.1 92.4 93.0
maayan-Stelzl: 100.0 144.1 133.0 102.5 114.3 113.4 127.7 137.0 111.7 1269.6 96.3 113.4 107.1 105.2 105.4
maayan-figeys: 100.0 104.3 120.2 100.7 155.9 127.3 146.9 153.4 129.5 1656.6 98.0 100.1 123.7 123.4 99.5
maayan-foodweb: 100.0 111.5 94.6 114.7 147.8 118.9 123.8 126.2 154.6 268.7 100.0 125.5 136.1 144.4 173.9
maayan-vidal: 100.0 111.0 106.7 103.3 101.6 109.1 110.6 123.9 114.1 843.9 90.1 102.5 95.6 97.9 97.3
moreno crime projected: 100.0 105.8 86.0 191.2 139.2 157.6 218.8 180.6 976.7 2103.3 82.7 88.8 100.3 104.1 126.2
moreno propro: 100.0 115.9 123.6 115.6 87.9 126.1 123.5 146.7 145.2 1985.3 90.7 94.6 92.2 93.1 96.3
moreno train: 100.0 104.9 104.9 107.1 124.0 149.5 156.0 134.7 176.9 408.8 100.0 109.7 115.6 120.3 211.6
munmun digg reply LCC: 100.0 116.3 108.6 98.5 109.4 106.5 108.3 117.5 98.9 556.8 95.6 104.0 99.0 98.4 98.5
opsahl-openflights: 100.0 101.2 106.2 127.2 109.9 123.2 135.4 123.6 157.3 807.7 84.4 92.0 102.6 111.3 120.9
opsahl-powergrid: 100.0 36.9 69.4 148.6 37.0 173.4 180.9 183.9 164.3 1508.1 43.1 42.1 51.4 52.5 65.6
opsahl-ucsocial: 100.0 122.1 116.1 99.9 118.5 105.9 109.9 109.8 108.8 342.0 97.0 106.1 105.8 106.0 101.7
oregon2 010526: 100.0 106.8 101.5 108.8 131.1 101.6 130.5 114.6 162.0 3247.5 90.0 80.5 113.0 112.8 95.1
p2p-Gnutella06: 100.0 128.5 120.4 108.5 108.6 111.6 125.1 118.4 108.7 274.0 101.4 120.4 110.1 108.4 109.1
p2p-Gnutella31: 100.0 133.6 NaN 109.1 112.7 110.3 123.1 129.5 109.2 474.4 102.3 121.6 110.4 108.8 109.8
pajek-erdos: 100.0 112.2 107.5 103.3 119.9 103.3 104.6 106.7 122.8 2790.7 98.2 106.9 116.7 113.9 101.0
petster-hamster: 100.0 92.5 90.9 122.7 103.8 135.1 127.2 123.8 166.7 402.6 91.5 93.3 96.2 96.5 98.6
power-eris1176: 100.0 199.1 218.0 340.2 171.7 253.5 622.5 430.2 632.6 1957.4 86.6 154.8 161.3 157.8 153.7
route-views: 100.0 99.3 99.1 101.8 133.2 103.5 103.5 112.3 131.5 4340.9 94.0 82.0 93.0 95.2 112.5
slashdot-threads: 100.0 100.1 102.2 99.5 122.5 104.6 105.4 114.2 117.6 1495.8 96.1 95.8 115.7 115.1 97.9
slashdot-zoo: 100.0 99.2 100.1 95.6 120.9 103.3 106.8 124.0 112.4 683.8 95.2 97.7 106.9 105.9 96.5
subelj jdk: 100.0 107.6 110.2 115.1 113.0 144.2 181.5 346.9 144.6 1275.7 80.9 84.7 84.8 81.0 103.4
subelj jung-j: 100.0 102.1 111.6 122.0 118.4 150.7 185.7 334.6 143.1 1295.0 80.1 88.5 82.9 72.2 101.5
web-EPA: 100.0 148.4 157.6 102.2 141.1 104.9 109.7 137.7 158.1 1471.9 101.1 115.6 133.8 132.8 107.3
web-webbase-2001: 100.0 127.6 130.0 165.0 196.6 216.3 165.4 207.7 3603.1 55066.4 64.7 50.1 76.6 82.6 80.9
wikipedia link kn: 100.0 107.3 102.6 103.7 113.2 124.8 143.6 140.3 128.8 NaN 92.9 98.0 113.9 113.5 96.8
wikipedia link li: 100.0 120.3 145.4 132.8 151.8 120.5 165.2 110.6 211.9 1049.4 107.2 151.0 177.5 174.5 157.8
Average: 100.0 111.7 115.4 119.1 123.6 128.7 151.1 158.9 273.3 2596.4 91.5 100.8 106.9 108.7 112.1
Table 1: Per-method area under the curve (AUC) of real-world networks dismantling. The lower the better. The dismantling target for each method is a fixed fraction of the network size. We compute the AUC value by integrating the LCC(x)/|N| values using Simpson's rule, and each value is scaled to the one of our approach (GDM) for the same network. +R means that the reinsertion phase is performed. CoreHD and CI are compared to the other +R algorithms as they include the reinsertion phase. The EGND value for p2p-Gnutella31 is missing as the computation was killed after 10 days.
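The AUC scores in Tables 1 and 2 can be reproduced from the dismantling curves as sketched below, assuming SciPy's Simpson integration; the exact integration grid used in the paper is not specified here.

# Sketch of the relative AUC score: integrate LCC(x)/|N| with Simpson's rule
# and rescale so that GDM's own curve maps to 100. Assumes numpy and scipy.
import numpy as np
from scipy.integrate import simpson

def dismantling_auc(curve):
    # curve[x] = LCC size after x removals, divided by |N|
    return simpson(curve, x=np.arange(len(curve)))

def relative_auc(curve, gdm_curve):
    return 100.0 * dismantling_auc(curve) / dismantling_auc(gdm_curve)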
[Figure: per-heuristic cumulative AUC bars (GDM, GND, MS, GDM +R, GND +R, MS +R, CoreHD) for citeseer, com-dblp, digg-friends, douban, email-EuAll, hyves, loc-gowalla, munmun_twitter_social, petster-catdog-household, tech-RL-caida, twitter_LCC, wordnet-words.]
Figure 5: Dismantling empirical complex large systems. Per-method cumulative area under the curve (AUC) of real-world networks dismantling. The lower the better. The dismantling target for each method is a fixed fraction of the network size. We compute the AUC value by integrating the LCC(x)/|N| values using Simpson's rule, and each value is scaled to the one of our approach (GDM) for the same network. GND stands for Generalized Network Dismantling (with cost matrix W = I) and MS stands for Min-Sum. +R means that the reinsertion phase is performed. Also, note that some values are clipped (limited) to 3x for the MS heuristic to improve visualization.

Columns (heuristics), in order: GDM, GND, MS, GDM +R, GND +R, MS +R, CoreHD.
citeseer: 100.0 102.2 111.2 92.8 91.3 95.0 94.3
com-dblp: 100.0 109.6 184.5 91.5 108.3 92.4 91.2
digg-friends: 100.0 100.9 140.5 97.0 103.6 120.7 121.2
douban: 100.0 120.8 132.7 102.6 129.3 131.6 132.9
email-EuAll: 100.0 97.0 192.1 100.0 100.0 147.4 148.5
hyves: 100.0 109.3 133.6 101.6 109.6 131.9 133.6
loc-gowalla: 100.0 103.2 105.4 89.7 91.9 91.0 90.5
munmun twitter social: 100.0 105.2 140.5 100.2 112.4 138.5 137.3
petster-catdog-household: 100.0 100.7 164.7 95.4 98.0 143.4 144.7
tech-RL-caida: 100.0 104.8 147.2 86.9 94.3 82.8 80.2
twitter LCC: 100.0 93.6 98.8 85.3 81.4 83.0 84.7
wordnet-words: 100.0 120.4 234.5 100.0 110.8 111.0 109.7
Average: 100.0 105.6 148.8 95.3 102.6 114.1 114.1

Table 2: Per-method area under the curve (AUC) of real-world large networks dismantling. The lower the better. The dismantling target for each method is a fixed fraction of the network size. We compute the AUC value by integrating the LCC(x)/|N| values using Simpson's rule, and each value is scaled to the one of our approach (GDM) for the same network. +R means that the reinsertion phase is performed. CoreHD is compared to the other +R algorithms as it includes the reinsertion phase.
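The reinsertion (+R) phase shrinks the removal set by re-adding removed nodes whose return does not re-grow the largest component above the dismantling target. Below is a minimal greedy sketch, assuming networkx; the published heuristics use more refined selection criteria.

# Greedy sketch of a reinsertion phase: re-add removed nodes one by one,
# keeping a node out only if its return pushes the LCC above the target.
import networkx as nx

def reinsert(graph, removed, target_fraction=0.1):
    n = graph.number_of_nodes()
    g = graph.copy()
    g.remove_nodes_from(removed)
    final_removals = []
    for node in removed:  # candidate order is a free choice in this sketch
        g.add_node(node)
        g.add_edges_from((node, nbr) for nbr in graph.neighbors(node) if nbr in g)
        lcc_size = len(max(nx.connected_components(g), key=len))
        if lcc_size > target_fraction * n:  # reinsertion re-grows the LCC
            g.remove_node(node)
            final_removals.append(node)
    return final_removals  # the reduced removal set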
Columns: Prediction time (GDM, CoreHD); Dismantle time (GDM, GND, MS).
citeseer: 00:00:03.4 00:00:22.9 01:30:17.1 03:43:51.6 01:26:21.5
com-dblp: 00:00:02.9 00:00:14.9 00:22:30.7 04:57:25.6 00:59:38.4
digg-friends: 00:00:02.8 00:00:19.9 00:08:01.9 00:30:55.5 01:11:37.4
douban: 00:00:01.3 00:00:06.1 00:01:10.1 00:03:34.8 00:11:40.4
email-EuAll: 00:00:02.4 00:00:07.8 00:00:10.7 00:01:14.9 00:09:49.0
hyves: 00:00:13.5 00:00:36.6 03:08:02.7 08:21:22.8 02:03:26.9
loc-gowalla: 00:00:02.0 00:00:15.9 00:17:22.3 01:27:28.0 00:46:15.0
munmun twitter social: 00:00:04.3 00:00:14.3 00:00:53.5 00:07:53.4 00:29:13.9
petster-catdog-household: 00:00:03.9 00:00:40.6 00:44:20.5 03:58:17.1 02:16:02.8
tech-RL-caida: 00:00:01.8 00:00:12.1 00:07:23.7 04:14:34.1 00:29:30.8
twitter LCC: 00:00:04.4 00:00:13.0 00:32:01.0 05:33:36.3 00:19:18.8
wordnet-words: 00:00:01.4 00:00:12.1 00:03:34.0 01:23:52.1 00:22:28.5
Table 3: Real-world large networks dismantling timings. The lower the better. Time format is HH:MM:SS.s. MS and GND do not have prediction time as they refresh the predictions during the dismantling, while there is no CoreHD dismantling column as we use our dismantler.
[Figure: 18 panels showing LCC size vs. number of removed nodes, with curves GDM AUC and GDM: (a) arenas-meta, (b) corruption, (c) douban, (d) econ-wm1, (e) foodweb-baywet, (f) hyves, (g) inf-USAir97, (h) librec-ciaodvd-trust, (i) maayan-foodweb, (j) maayan-Stelzl, (k) moreno-crime-projected, (l) opsahl-openflights, (m) p2p-Gnutella06, (n) petster-hamster, (o) power-eris1176, (p) tech-RL-caida, (q) twitter LCC, (r) wordnet-words.]
Figure 6: Dismantling of some networks in our test set. We compare against the algorithms without reinsertion in Tables 1 and 2 and show both the models with lower area under the curve (GDM AUC) and with lower number of removals (GDM).
[Figure: 18 panels showing LCC size vs. number of removed nodes, with curves GDM +R AUC, GDM +R and CI: (a) arenas-meta, (b) corruption, (c) douban, (d) econ-wm1, (e) foodweb-baywet, (f) hyves, (g) inf-USAir97, (h) librec-ciaodvd-trust, (i) maayan-foodweb, (j) maayan-Stelzl, (k) moreno-crime-projected, (l) opsahl-openflights, (m) p2p-Gnutella06, (n) petster-hamster, (o) power-eris1176, (p) tech-RL-caida, (q) twitter LCC, (r) wordnet-words.]
Figure 7: Dismantling of some networks in our test set. We compare against the algorithms with reinsertion phase in Tables 1 and 2 and show both the models with lower area under the curve (GDM +R AUC) and with lower number of removals (GDM +R).
[Figure: three panels showing robustness value vs. attack progress, with curves CC, SLCC and PI, for (a) Random, (b) Degree, (c) MS +R attacks.]
Figure 8: Early Warning values for the SciGRID European powergrid under random failures and targeted attacks.
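The CC and SLCC traces in Figures 8-10 can be tracked during any attack sequence. Below is a minimal sketch under the assumption that CC and SLCC denote the normalized sizes of the largest and second-largest connected components; the prediction-based indicator PI requires the trained GDM model and is omitted.

# Sketch of CC/SLCC robustness traces under random failures. Assumes networkx;
# swap the shuffled order for any ranking to model a targeted attack instead.
import random
import networkx as nx

def robustness_trace(graph, seed=0):
    rng = random.Random(seed)
    g = graph.copy()
    n = g.number_of_nodes()
    order = list(g.nodes)
    rng.shuffle(order)  # random-failure attack order
    trace = []
    for node in order[:-1]:  # stop before the graph becomes empty
        g.remove_node(node)
        sizes = sorted((len(c) for c in nx.connected_components(g)), reverse=True)
        cc = sizes[0] / n
        slcc = sizes[1] / n if len(sizes) > 1 else 0.0
        trace.append((cc, slcc))
    return trace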
[Figure: three panels showing robustness value vs. attack progress, with curves CC, SLCC and PI, for (a) California Roads, (b) North-America roads, (c) San Francisco roads.]
Figure 9: Ω values for three different American road networks under GND +R attacks (with cost matrix W = I).
[Figure: three panels showing robustness value vs. attack progress, with curves CC, SLCC and PI, for (a) Internet topology (tech-RL-caida), (b) University of Notre Dame website hyperlinks network, (c) Stanford University website hyperlinks network.]
Figure 10: Ω values for three different internet networks under GND +R attacks (with cost matrix W = I).

Network  Name  Category  |N|  |E|  References
ARK201012 LCC  CAIDA ARK (Dec 2010) (LCC)  Infrastructure  29.3K  78.1K  ( )
advogato  Advogato trust network  Social  6.5K  43.3K  (2, 82)
arenas-meta  C. elegans  Metabolic  453  2.0K  (4, 60)
cfinder-google  Google.com internal  Hyperlink  15.8K  149.5K  (17, 89)
citeseer  CiteSeer  Citation  384.4K  1.7M  (6, 51)
com-dblp  DBLP co-authorship  Coauthorship  317.1K  1.0M  (8, 103)
corruption  Corruption Scandals  Social  309  3.3K  ( )
dblp-cite  DBLP citation  Citation  12.6K  49.6K  (9, 78)
digg-friends  Digg friends  Social  279.6K  1.5M  (10, 70)
dimacs10-celegansneural  C. elegans (neural)  Neural  297  2.1K  (41, 100, 101)
dimacs10-polblogs  Political blogs (LCC)  Hyperlink  1.2K  16.7K  (45, 48)
douban  Douban social network  Social  154.9K  327.2K  (12, 104)
econ-wm1  Economic network WM1  Economic  260  2.6K  ( )
ego-twitter  Twitter lists  Social  23.4K  32.8K  (37, 84)
email-EuAll  EU institution email  Communication  265.2K  365.6K  (13, 75)
eu-powergrid  SciGRID Power Europe  Power  1.5K  1.8K  ( )
foodweb-baydry  Florida ecosystem dry  Trophic  128  2.1K  (14, 98)
foodweb-baywet  Florida ecosystem wet  Trophic  128  2.1K  (15, 98)
gridkit-eupowergrid  GridKit Power Europe  Power  13.8K  17.3K  ( )
gridkit-north america  GridKit Power North-America  Power  16.2K  20.2K  ( )
hyves  Hyves social network  Social  1.4M  2.8M  (23, 104)
inf-USAir97  US Air lines (1997)  Infrastructure  332  2.1K  (50, 94)
internet-topology  Internet (AS) topology  Infrastructure  34.8K  107.7K  (24, 105)
librec-ciaodvd-trust  CiaoDVD trust network  Social  4.7K  33.1K  (42, 65)
librec-filmtrust-trust  FilmTrust trust network  Social  874  1.3K  (44, 64)
linux  Linux source code files  Software  30.8K  213.7K  ( )
loc-brightkite  Brightkite friendships  Social  58.2K  214.1K  (3, 54)
loc-gowalla  Gowalla friendships  Social  196.6K  950.3K  (18, 54)
london transport multiplex aggr  Aggregated London Transportation network  Transport  369  430  ( )
maayan-Stelzl  Human protein (Stelzl)  Metabolic  1.7K  3.2K  (21, 96)
maayan-figeys  Human protein (Figeys)  Metabolic  2.2K  6.4K  (20, 61)
maayan-foodweb  Little Rock Lake food web  Trophic  183  2.5K  (27, 81)
maayan-vidal  Human protein (Vidal)  Metabolic  3.1K  6.7K  (22, 95)
moreno crime projected  Crime (projection)  Social  754  2.1K  ( )
moreno propro  Protein  Metabolic  1.9K  2.3K  (30, 57, 68, 97)
moreno train  Train bombing terrorist contacts  Human contact  64  243  (35, 69)
munmun digg reply LCC  Digg social network replies (LCC)  Communication  29.7K  84.8K  (11, 56)
munmun twitter social  Twitter follows (ICWSM)  Social  465.0K  833.5K  (36, 55)
opsahl-openflights  OpenFlights  Infrastructure  2.9K  15.7K  (29, 86)
opsahl-powergrid  US power grid  Infrastructure  4.9K  6.6K  (39, 100)
opsahl-ucsocial  UC Irvine messages  Communication  1.9K  13.8K  (38, 87)
oregon2 010526  Autonomous systems Oregon-2  Infrastructure  11.5K  32.7K  ( )
p2p-Gnutella06  Gnutella P2P, August 8 2002  Computer  8.7K  31.5K  (76, 93)
p2p-Gnutella31  Gnutella P2P, August 31 2002  Computer  62.6K  147.9K  (16, 93)
pajek-erdos  Erdős co-authorship network  Coauthorship  6.9K  11.8K  (43, 50)
petster-catdog-household  Catster/Dogster familylinks (LCC)  Social  324.9K  2.6M  ( )
petster-hamster  Hamsterster full  Social  2.4K  16.6K  ( )
power-eris1176  Power network problem  Power  1.2K  9.9K  ( )
roads-california  California Road Network  Infrastructure  21.0K  21.7K  ( )
roads-northamerica  North-America Road Network  Infrastructure  175.8K  179.1K  ( )
roads-sanfrancisco  San Francisco Road Network  Infrastructure  175.0K  221.8K  ( )
route-views  Autonomous systems AS-733  Infrastructure  6.5K  13.9K  (31, 75)
slashdot-threads  Slashdot threads  Communication  51.1K  117.4K  (32, 66)
slashdot-zoo  Slashdot Zoo  Social  79.1K  467.7K  (33, 73)
subelj jdk  JDK dependency network  Software  6.4K  53.7K  ( )
subelj jung-j  JUNG and Javax dependency network  Software  6.1K  50.3K  (25, 99)
tech-RL-caida  Internet router network  Infrastructure  190.9K  607.6K  ( )
twitter LCC  Twitter users (LCC)  Social  532.3K  694.6K  ( )
web-EPA  Pages linking to epa.gov  Hyperlink  4.3K  8.9K  ( )
web-NotreDame  Notre Dame web pages  Hyperlink  325.7K  1.1M  (28, 49)
web-Stanford  Stanford University web pages  Hyperlink  281.9K  2M  (34, 77)
web-webbase-2001  Web network  Hyperlink  16.1K  25.6K  ( )
wikipedia link kn  Wikipedia links (KN)  Hyperlink  29.5K  278.7K  ( )
wikipedia link li  Wikipedia links (LI)  Hyperlink  49.1K  294.3K  ( )
wordnet-words  WordNet lexical network  Lexical  146.0K  657.0K  (40, 62)