Severability of mesoscale components and local time scales in dynamical networks
Yun William Yu, Jean-Charles Delvenne, Sophia N. Yaliraki, Mauricio Barahona
Yun William Yu, Jean-Charles Delvenne, Sophia N. Yaliraki, and Mauricio Barahona
Department of Computer and Mathematical Sciences, University of Toronto, Toronto, ON, Canada
Institute of Information and Communication Technologies, Electronics and Applied Mathematics, Université catholique de Louvain, Belgium
Department of Chemistry, Imperial College London, United Kingdom
Department of Mathematics, Imperial College London, United Kingdom
(Dated: June 5, 2020)

A major goal of dynamical systems theory is the search for simplified descriptions of the dynamics of a large number of interacting states. For overwhelmingly complex dynamical systems, the derivation of a reduced description of the entire dynamics at once is computationally infeasible. Other complex systems are so expansive that, despite the continual onslaught of new data, only partial information is available. To address this challenge, we define and optimise a local quality function, severability, for measuring the dynamical coherency of a set of states over time. The theoretical underpinnings of severability lie in our local adaptation of the Simon-Ando-Fisher time-scale separation theorem, which formalises the intuition of local wells in the Markov landscape of a dynamical process, or the separation between a microscopic and a macroscopic dynamics. Finally, we demonstrate the practical relevance of severability by applying it to examples drawn from power networks, image segmentation, social networks, metabolic networks, and word association.
Complex dynamical systems composed of a large number of interconnected components are omnipresent, whether in biology (genetic and biochemical networks, interconnected neurons in the brain), technology (power, communication and computer networks) or human interactions (economy and social networks) [20, 27–29, 42, 66]. The complexity of such dynamical networks is the result of the subtle interdependence between the dynamics of the individual agents and the network-mediated interactions between them [13, 22, 62, 65]. As data science becomes pervasive, ever increasing amounts of relational data (in many cases dynamic) are collected and analyzed, and scientists are faced with the challenge of extracting consistent patterns and useful simplified representations in a scalable way.

The search for a simplified description of a complex system that retains essential features of the dynamics of the original is fundamental in complex systems analysis, and many approaches exist, including classical methods such as model reduction for linear systems [8, 44, 50, 52] or time scale separation techniques [10, 35, 59]. Typically, model reduction (or dimensionality reduction) techniques output a simplified spectral description based on dominant eigenmodes, which are generally difficult to interpret because their internal states are global combinations of all original states, destroying the original interconnection structure. Time scale separation techniques are applicable when dynamical systems are dominated by different kinds of behaviour at long and short times. This coexistence of time scales allows the use of several simplified descriptions, each best suited at a given time scale. Unfortunately, despite the explosion of data-collection capabilities, many systems are sufficiently complex that only partial data is available.
Often, only the structure of interconnection is available, necessitating a network-centric approach; alternately, sometimes only part of the network is seen or computationally tractable to work with, motivating a local approach. In this paper, we extend and strengthen the time scale separation approach for complex, heterogeneous network dynamics, focusing on local interactions.

The theory of time scale separation was first explored in detail in the framework of system dynamics by Simon, Ando and Fisher [2, 59], who considered the existence of a partition of states into components with sufficiently low dynamical influence between them. At short times, the cross-influence between components can be neglected, so the dynamics of the global system can be approximated by the dynamics of disconnected components. At long times, the states inside each component evolve to their dominant mode, and the dynamics can be accurately described by the aggregated (or lumped) system, where all the states within each component are collapsed into a single value. A particularly vivid illustration can be found in Markov chains (or random walks) with coexisting time scales, where there are groups of states which mix fast to a quasi-stationary state, yet exhibit low escape probability from each group. However, the Simon-Ando approach is global in nature, depending on all parameters of the global system: the escape time between groups must be larger than all mixing times to quasi-stationary convergence within each group.
Furthermore, the groups of states must be an exact partition of the system, including all nodes and assigning each node to a single group. Such a uniformity has little reason to emerge spontaneously in large, complex, heterogeneous networks, and can only be artificially imposed by the grouping, splitting or trimming of naturally coherent dynamic structures.

Our work extends the time scale separation approach for complex dynamical systems, and supersedes previous arguments in three ways: it provides a method to detect dynamically coherent structures at all time scales; it does not require global knowledge of the system; and it allows each state to belong to several overlapping coherent structures or to remain unassigned to any component. For ease of terminology, we will often use the term 'component' to refer to dynamically coherent structures. We focus below on Markov random dynamics, or equivalently random walks on weighted networks, not only because this is an important problem in its own right, but because a significant number of linear and nonlinear dynamics (see Methods) can be reduced to random walks. Furthermore, random walks admit an intuitive description in terms of flows of probabilities. In this framework, the severability quality function for evaluating a component measures how similarly the full component exchanges probability flows with its surroundings, compared to the component aggregated into a single node. This is measured as a local property of the component, independently from the rest of the system. In contrast, constructs such as lumped states [19] or macro-states [36] consist in a global partitioning of the states, based on the accuracy of the global reduced dynamics. The link with network-centric concepts such as clusters [32] and communities [24] is made explicit below.

I. RESULTS

A. Severability: mixing and retention in Markov landscapes
To introduce our method, we draw an analogy from energy landscapes [64]. In particular, we consider the Markov landscape defined by the transition matrix of the standard random walk on a graph, where the nodes (or vertices) of the graph correspond to states and the landscape reflects the transition probabilities between them. Markov landscapes are analogous to energy landscapes, although they lack a potential energy function pointing downwards to a minimum energy state. Still, the notions of wells, barriers and roughness translate easily and helpfully to the language of time scale separation. In this picture, a well is a group of states surrounded by high barriers (hence with a long escape time), whereas roughness inside the well is related to the mixing time (low roughness implies a fast mixing time). An illustration of such a landscape can be found in Figure 1a, where we present a 3D representation of the luminosity landscapes of three paintings with very different characteristics; from a well compartmentalised painting by van Doesburg to a rough, featureless excerpt of Monet. In this case, the barriers and roughness are obtained from differences in luminosity of adjacent pixels. If a random walker (e.g. De Gennes' ant in a labyrinth [23]) is allowed to explore van Doesburg's luminosity landscape, the observed dynamics will reveal the presence of severable components in the state space; on the other hand, no such components would be expected in Monet's landscape.

Mathematically, a subset of states of a system is defined to be a severable component if it has both high barriers and low roughness, as extracted by the behaviour of the random walkers on the underlying landscape.
As shown below in a precise sense (see Section III A), such a severable component can be understood as a mesoscale dynamical structure, i.e., a set of states that behave coherently in the eyes of the external environment and which capture a relevant description of the system sitting between individual nodes and the global system. To formalise these notions, we borrow the concepts of mixing and retention from Markov chain quasi-stationarity [15], as follows. First, we introduce a measure of the mixing over a set of states C by appealing to a random walker restricted to those states; C is poorly mixing over a timespan t if the random walker's position at Markov times 0 and t are strongly correlated. More precisely, we measure mixing by defining a quantity 0 ≤ µ(C, t) ≤ 1 that compares the distribution of the walker over C at time t with the quasi-stationary distribution reached at long times, should the walkers remain in C (Eq. (5) in Methods). The mixing µ is thus inversely related to the roughness of the landscape over C, since the exploration of C is hindered by the roughness of the landscape. Secondly, we characterise the retention over the set C, which is directly related to the height of the barriers separating the set of states C from the rest of the system, i.e., random walkers tend to stay within C if it is hard for them to escape. This is quantified by ρ(C, t), a number between 0 and 1 defined as the probability of a walker not escaping by time t (Eq. (4) in Methods). Both ρ and µ therefore range from 0 to 1, where the value of 1 corresponds to perfect retention or mixing, respectively.

We now simply define the severability of the set C at time scale t as

σ(C, t) = (ρ(C, t) + µ(C, t)) / 2.   (1)

Severability can be understood as a compound function that balances mixing and retention for a given set of states C over the time scale t. If C corresponds to a mesoscale dynamical structure, its severability will peak at some time t_max, below which the walkers are poorly mixed and beyond which retention is degraded.
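These quantities can be sketched in code for the standard random walk. Since Eq. (5) defining µ is not reproduced in this excerpt, the mixing function below uses one plausible stand-in, 1 minus the total-variation distance to the quasi-stationary distribution; treat that choice, and the toy network, as assumptions for illustration.

```python
import numpy as np

def transition_matrix(A):
    """Row-normalise an adjacency matrix into P = D^{-1} A."""
    return A / A.sum(axis=1, keepdims=True)

def retention(P, C, t):
    """rho(C, t): probability that a walker started uniformly in C
    has not escaped C by Markov time t (Eq. (4) in Methods)."""
    Q = P[np.ix_(C, C)]                       # substochastic block of P on C
    return np.linalg.matrix_power(Q, t).sum() / len(C)

def mixing(P, C, t):
    """mu(C, t): closeness of the walker's distribution over C at time t
    to the quasi-stationary distribution of C.  The exact Eq. (5) is not
    reproduced in this excerpt; 1 minus total variation is an assumed form."""
    Q = P[np.ix_(C, C)]
    vals, vecs = np.linalg.eig(Q.T)
    qs = np.abs(vecs[:, np.argmax(vals.real)].real)
    qs /= qs.sum()                            # quasi-stationary distribution of Q
    p = np.full(len(C), 1.0 / len(C)) @ np.linalg.matrix_power(Q, t)
    p /= p.sum()                              # condition on staying within C
    return 1.0 - 0.5 * np.abs(p - qs).sum()

def severability(P, C, t):
    """sigma(C, t) = (rho(C, t) + mu(C, t)) / 2, as in Eq. (1)."""
    return 0.5 * (retention(P, C, t) + mixing(P, C, t))

# Two dense 4-node blocks joined by one weak edge; the first block is a
# well-severable component over a window of Markov times.
A = np.kron(np.eye(2), np.ones((4, 4)) - np.eye(4))
A[3, 4] = A[4, 3] = 0.1
P = transition_matrix(A)
block = [0, 1, 2, 3]
print([round(severability(P, block, t), 3) for t in (1, 4, 16)])
```

A set straddling the weak bridge scores lower than the block itself, since it both leaks probability quickly and mixes poorly.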
In a connected network, the individual node and the entire graph will respectively have good severabilities for Markov times 0 and ∞ for trivial reasons: at t = 0, retention and mixing will be perfect for any individual node because nothing has diffused, and at t = ∞, the probabilities will have reached the ultimate stationary distribution, implying perfect mixing coupled with the always perfect retention of the entire graph. At intermediate timescales, severable structures are of intermediate size, based on a combination of mixing and retention; on grid graphs, these optimally severable structures slowly expand with Markov time, as higher times allow for mixing of larger diameter regions (Figure 1). Less uniform graphs have more interesting substructures; optimally severable structures remain so over a range of Markov times before jumping in size to another plateau.
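The practical payoff of a severable structure, made precise in Section III B, is that it can be replaced by a single aggregated state with little effect on the rest of the system. The sketch below lumps one block of a two-block toy graph using quasi-stationary weights and compares the reduced chain against the full one; this particular aggregation rule is a plausible choice for illustration, not necessarily the paper's exact construction.

```python
import numpy as np

def lump(P, C):
    """Replace the states in C by one aggregated node whose outgoing
    probabilities are the quasi-stationary average of the rows of C.
    (An assumed reduction for illustration.)"""
    out = [i for i in range(P.shape[0]) if i not in C]
    Q = P[np.ix_(C, C)]
    vals, vecs = np.linalg.eig(Q.T)
    w = np.abs(vecs[:, np.argmax(vals.real)].real)
    w /= w.sum()                        # quasi-stationary weights on C
    m = len(out)
    Pt = np.zeros((m + 1, m + 1))       # state 0 is the aggregated node
    Pt[0, 0] = w @ Q.sum(axis=1)        # one-step retention of C
    Pt[0, 1:] = w @ P[np.ix_(C, out)]
    Pt[1:, 0] = P[np.ix_(out, C)].sum(axis=1)
    Pt[1:, 1:] = P[np.ix_(out, out)]
    return Pt, out

# Two dense 4-node blocks with a weak bridge; lump the first block.
A = np.kron(np.eye(2), np.ones((4, 4)) - np.eye(4))
A[3, 4] = A[4, 3] = 0.1
P = A / A.sum(axis=1, keepdims=True)
C = [0, 1, 2, 3]
Pt, out = lump(P, C)

# Impulse at an outside node; evolve both chains and compare the
# probabilities seen by the rest of the system.
x = np.zeros(8); x[7] = 1.0
y = np.zeros(5); y[1 + out.index(7)] = 1.0
for _ in range(20):
    x, y = x @ P, y @ Pt
err = np.abs(x[out] - y[1:]).max()
print(err)   # small when C is highly severable
```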
[Fig. 1 panel labels: severability vs. Markov time for single nodes, 16-node components, 64-node components, and the entire network; panel title: "Barriers and potential wells of the Markovian landscape".]
FIG. 1. (a) Left column: small excerpts from three paintings: Theo van Doesburg's Composition in dissonances (1919) (top), Paul Klee's Ancient Sound (1925) (middle), and Claude Monet's The Japanese Footbridge (1920–22) (bottom). Right column: associated luminosity landscapes obtained from the transition matrix derived from the graph representation of each image. Visualizing the luminosity landscape in van Doesburg's painting reveals coherent spatio-temporal structures insulated by high barriers, and within which no obstacle would slow down a random walker. On the other extreme, Monet's rough landscape, when looking at luminosity (i.e. the perceived brightness), is almost featureless with no obvious components. Klee's landscape is intermediate, with significant internal roughness yet noticeable barriers. The balance between the barrier height and the intrinsic roughness translates over to the emergence of components: barrier heights are inversely related to inter-component connection strength and determine escape time, whereas the roughness is inversely related to intra-component connection strength and determines mixing time. (b) The "Markovian" landscape of a hierarchical random graph with three levels and groups of sizes 16, 64, and 256. If a pair of nodes are in the same lowest-level size-16 component, they are connected with probability p = 0.…; the connection probabilities at the two higher levels are p = 0.… and p = 0.…, and the mean degree is ⟨k⟩ = 16. The severability of a single node (blue circles), 16-node component (green triangles), 64-node component (red diamond), and the entire network (cyan square) are represented as a function of the time evolved for the Markov process. The succession of optimal severabilities at different time scales reveals the hierarchy of mesoscale structures containing a node of interest.

This notion is illustrated in Figure 1b, where we show how the process of diffusion of information on a very simple model of a network, given by a hierarchical random graph with three levels of 16, 64 and 256 nodes, leads to severable components of the state space.
As the time of diffusion increases, the random walkers gain sufficient probability to overcome the barriers of the landscape, and hence diffuse to larger portions of the network, so that the optimal severable components grow from being single nodes at very short times, through each of the intermediate levels over different time scales, to the entire network at long times.

The components of an interconnected dynamical system with high severability have a precise mathematical meaning in terms of local time scale separation (see the local time scale theorem in Section III B in Methods). Briefly, the existence of a local time scale separation for a group of states in the dynamics of the random walker allows for a simplified model for the dynamical behaviour of the group of nodes C when excited by an impulse, i.e. an arrival of probability mass into C. High retention and high mixing (implying high severability) at time t_max guarantee: (i) the effect of C on the rest of the system when given an impulse can be neglected altogether for time scales less than t_max, and (ii) the subsystem C can be accurately approximated to first order by a single state that aggregates all the states of C for all times beyond t_max.

In summary, the set C can be thought of as a structure of intermediate size whose dynamical response to an impulse permits accurate simplified descriptions. Our local time scale theorem (Section III B) is inspired by Simon and Ando's classic result for global time scale separation, yet it differs from it in that it seeks to find the conditions under which one can reproduce correctly the behaviour of a severable component at different time scales, independently of the rest of the system. When the full interconnected system can be partitioned into components with comparable time scales, we recover Simon and Ando's global theorem (see Supp. Inf. B), demonstrating that our local time scale theorem generalizes their result.

B. Mesoscale components in power networks
As a first application, we consider the synchronization dynamics of coupled nonlinear phase oscillators with Kuramoto-like sinusoidal coupling [6], which is found in areas as diverse as laser physics, biological synchrony of cells and animals, and power networks [61]. For our example, we will apply severability to a standard power network benchmark. Power networks are composed of two types of nodes: generator buses, which deliver power, and load buses, which consume power. The internal state of each node i is described by a voltage, which oscillates with a frequency θ̇_i around a nominal value (e.g. 50 or 60 Hz). The nonlinear dynamics of bus i can be modelled as

M_i θ̈_i + D_i θ̇_i = P_i − ∑_j A_ij sin(θ_i − θ_j),   (2)

where M_i is an inertia (zero for some buses), D_i is a damping coefficient, P_i is the power being injected into or withdrawn from the network at node i, and A_ij indicates the strength of the (symmetric) interaction between i and j [22]. Given sufficient coupling strength between the nodes, and depending on properties of the coupling matrix A, the network converges to a stationary state where all angles in the system oscillate at constant frequency ω, keeping relatively small constant angle differences with respect to one another.

Although this system is inherently nonlinear, severability is of use here. In Figure 2 we show the results of the application of our analysis to linearized discrete-time random walk dynamics based on node strengths A_ij, which is equivalent to the continuous-time nonlinear dynamics for small deviations around the synchronized state (see Supp. Inf. E for details). Our example is a classic test case for power networks, the IEEE RTS96 test system, composed of three identical copies of the RTS24 test system interlinked with a few extra edges and one extra node. Previous work has used time-scale-based identification of global partitions into slow-coherent areas based on global edge-counting or spectral methods [3, 53].
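As an aside, the swing dynamics of Eq. (2) can be simulated directly. The sketch below integrates a toy 3-bus system with forward Euler; the parameter values are illustrative assumptions, not the IEEE RTS96 benchmark data.

```python
import numpy as np

def swing_step(theta, omega, M, D, P, A, dt=0.01):
    """One forward-Euler step of Eq. (2):
    M_i theta_i'' + D_i theta_i' = P_i - sum_j A_ij sin(theta_i - theta_j)."""
    coupling = (A * np.sin(theta[:, None] - theta[None, :])).sum(axis=1)
    domega = (P - D * omega - coupling) / M
    return theta + dt * omega, omega + dt * domega

A = np.array([[0., 5., 5.],
              [5., 0., 5.],
              [5., 5., 0.]])            # symmetric coupling strengths
M = np.ones(3)                           # inertias
D = np.ones(3)                           # damping coefficients
P = np.array([0.6, -0.3, -0.3])          # injected/withdrawn power (sums to 0)

theta = np.array([0.3, -0.2, 0.1])
omega = np.zeros(3)
for _ in range(20000):
    theta, omega = swing_step(theta, omega, M, D, P, A)

# With strong enough coupling the angles lock: frequencies settle and the
# pairwise angle differences become small constants.
print(np.round(theta - theta.mean(), 3))
```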
These global partitions correctly recover the expected components, but by nature require information about the entire network. In contrast, severability recovers the expected components based solely on local information and provides a validation of the components in terms of their dynamical response. More precisely, Fig. 2 shows that the fully nonlinear simulations of the model (2) can be well represented by the aggregated angle variables within the components found with severability: the aggregation of angle variables within the 'correct' components has little effect on the dynamics of the other variables of the system, whereas aggregation of an 'incorrect' subgroup results in major discrepancies from the full dynamical evolution. This result justifies the simplification of using random walk dynamics in severability, even for more complicated systems.

More generally, using random walks to model higher-order dynamics provides a general framework to capture central features of many other dynamics taking place on a network (Section III D).

C. On severable components and cliques
Given a network-centric view of severable components, it may come as no surprise that there are some similarities between network community detection and the discovery of severable components. Network communities are groups of nodes with strong connectivity within the group and lower connectivity with the rest of the network. Communities are often captured with local or global metrics that relate the number of edges crossing the boundaries of a community with the edges inside the communities, such as modularity [47], SBM maximum likelihood [33], OSLOM [41] or conductance [58], sometimes with random walks as a computational tool [1, 60]. Most of those criteria essentially capture the retention part, ρ, of severability. A few references [30, 32] also analyse the conductance internal to the cluster, which is a combinatorial criterion capturing essentially the mixing part µ of severability for t = 1. Although not every network has an easily measured dynamical interaction taking place on it (e.g. social networks), we can endow the graph with the standard random walk and apply severability to those dynamics. Indeed, the standard random walk on a network can be used to approximate such things as opinion dynamics, information diffusion, and consensus problems, as the dynamics of the random walk is deeply related to the structure of the graph [18, 43, 45, 54, 63]. As shown in Fig. 1b, the expectation is that such communities are detected as meaningful mesoscale components observed during information diffusion. To validate this idea further, we have used the standard LFR synthetic benchmark network model for community detection, where networks are constructed as dense random Erdős-Rényi graphs interconnected by sparse random links at several levels of coarseness [24, 40]. As shown in Supp. Inf. F, our greedy algorithm for finding severable components recovers the communities with high fidelity, comparable or superior to other state-of-the-art methods [38]. We remark that severable components are found from the local diffusive dynamics without global information from the graph.

FIG. 2. Illustration of the three-component RTS96 power network test system, which is composed of three copies of the RTS24 benchmark. Above (a), we have highlighted the three severable components as segments (at t = 64, severability of 0.753, 0.753, and 0.807 for green, purple, and black respectively), whereas below (d) we arbitrarily partitioned into two connected components (at t = 64, severability of 0.611 and 0.646 for green and black respectively). (b,e) Full dynamics of the power network with starting phase angles chosen to match within a component. (c,f) Full dynamics of the power network using instead a collapsed state representing all of the black component. When the collapsed component is highly severable (top), the reduced representation matches the original system much better than when using arbitrary partitions (bottom).

However, communities in social networks are often characterised by a clique-like structure, showing for instance a low diameter and a high density of triangles [49]. While clique-like structures emerge as particular cases of severable components, severability may detect long-range structures that are not akin to communities. An example of this is shown in Fig. 3a, where a ring-of-rings network is correctly revealed by severability. Such non-cliquelike structures are present in other areas of application, including transportation networks, images, and protein structures [57]. Another illustration is provided by biochemical networks, and one canonical metabolic pathway is the citric acid cycle. When we analyze the citrate pathway schematic (map00020) in the KEGG database [31] using severability, the search for high severability structures detects the Krebs citric acid cycle.
These structures do not fit the standard definition of communities, and indeed are not detected as such by most community detection algorithms [57]. However, as exemplified by the Krebs cycle, they are nonetheless dynamical structures of importance. Thus, although severable components and network communities share some characteristics, they are different concepts built on different ideas, the former by coherency of dynamics, and the latter by the density of clique-like structure.

D. Word association as a diffusion: overlaps and orphans
An important feature of severability as a means to analyzing interconnected systems is that it allows the possibility of overlaps (a node can belong to more than one group) and orphans (a node can belong to no group, as every group that includes it has higher severability without it). To illustrate these features, we turn to a word association network (the University of South Florida Free Association Norms dataset [46]), previously used to highlight the existence of overlapping network communities [49]. To build this network, researchers presented words to participants, who were then asked for the first word that came to mind. Hence each node in the network corresponds to a word, and directed links between nodes are weighted according to the proportion of responses linking those two words. For example, when cued with 'science', 21.4% of participants wrote 'biology'. The very construction of the network is reminiscent
FIG. 3. (a) Ring of rings. Heavy lines (within rings) correspond to undirected links with weight 2, while light lines between rings correspond to links with weight 1. Severability is able to recover the seeded ring structure (at Markov times 3 ≤ t < …). (b) The citrate (Krebs) cycle pathway from the KEGG schematic (map00020), whose metabolites include citrate, cis-aconitate, isocitrate, oxalosuccinate, 2-oxoglutarate, succinyl-CoA, succinate, fumarate, S-malate, oxaloacetate, acetyl-CoA, phosphoenolpyruvate and pyruvate. Severability recovers the cycle (for … < t ≤ 21) and adds acetyl-CoA from 21 < t ≤ ….

of a random walk process representing the mental association based on similarity of meaning and contextual usage, thus making severability able to incorporate the weight and directionality of the network in a natural way. As severability is a local method, it is not necessary to analyse the entire graph to find components. Rather, by analysing increasing horizons on the network, an expanding view of associated meanings presents itself from a particular vantage point. Figure 4 shows the word 'nature' and the components it belongs to (with maximum search size S = 50 and Markov time t = 2), as well as the components and orphan nodes to which 'nature' is directly linked (see Supp. Inf. L for further details). By permitting overlapping components, we are able to recover the different contexts and meanings associated with a single word.

E. Locality in image segmentation: zooming and cropping
In Fig. 5 we apply severability optimisation to the identification of stained neurons in a cell-fluorescence image, in order to illustrate visually a central aspect of the method; namely, that it does not rely on global information in order to detect mesoscale components faithfully. Below we show that the results are similar whether the algorithm is run on only some part of the image, or on its entirety.

Image segmentation divides images into subsets of adjacent pixels of similar color or luminosity, and is particularly used for medical and biological imaging. Some of the existing segmentation methods are based on a nominal diffusion dynamics taking place on the lattice graph of pixels [26, 57]. In this view, a segment can be seen as a particular case of a severable component, as already suggested in our initial view of the paintings in Fig. 1. To carry out our analysis, we have followed a classic protocol to generate a lattice graph from the image by assigning an edge between pixels weighted by a function of the difference in luminosity and distance (up to a cutoff) [7, 58, 67] (for details see Supp. Inf. G). The severable subsets are self-consistent in that they are found robustly from diffusions starting from any of the members of the set; as these are strongly severable subsets, they tend to be found regardless of which member of the set is used as a starting point (unlike the word association communities of the last section). Both the cells and patches of the background are found as severable components. Furthermore, because severability does not depend on global information, the results do not change significantly when the algorithm is run only on a smaller section of the image: only segments that lie on the edges of the image are affected by cropping.
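The image-to-graph protocol described above can be sketched as follows. The Gaussian weight on luminosity differences and the specific parameters are common choices from the segmentation literature, assumed here for illustration rather than taken from Supp. Inf. G.

```python
import numpy as np

def image_to_adjacency(img, radius=2, sigma_lum=0.1):
    """Build a lattice-graph adjacency matrix from a grayscale image:
    pixels within `radius` are linked with a weight decaying in their
    luminosity difference (a common choice, assumed for illustration)."""
    h, w = img.shape
    n = h * w
    A = np.zeros((n, n))
    for y in range(h):
        for x in range(w):
            i = y * w + x
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if (dy, dx) == (0, 0) or not (0 <= yy < h and 0 <= xx < w):
                        continue
                    if dy * dy + dx * dx > radius * radius:
                        continue          # distance cutoff
                    j = yy * w + xx
                    dlum = img[y, x] - img[yy, xx]
                    A[i, j] = np.exp(-(dlum / sigma_lum) ** 2)
    return A

# Toy image: a bright square on a dark background.
img = np.zeros((8, 8))
img[2:6, 2:6] = 1.0
A = image_to_adjacency(img)
P = A / A.sum(axis=1, keepdims=True)      # random-walk transition matrix
# Edges within the square (or within the background) carry weight ~1;
# edges crossing the luminosity boundary are exponentially suppressed,
# so the square and the background emerge as well-retained components.
```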
This feature is of potential application to evolving networks, as communities are stable against perturbations and do not need to be recomputed fully when new nodes are added outside of a local neighbourhood.

We note that while other completely local methods [30] exhibit a similar commutativity, 'mostly' local methods like OSLOM (Order Statistics Local Optimization Method) do not [41]. Though OSLOM is based on local order statistics, when partitioning it takes into account some global information, making it noncommutative. In Appendix J, we show that OSLOM behaves differently on a ring of small-world networks vs. a single small-world network.
II. DISCUSSION
Real-life dynamics emerging from the interaction of many elementary nodes can sometimes be seen as the interconnection of mesoscale dynamical structures, whose evolution over a time scale of interest can be represented as a single aggregated state interacting with its surroundings. We have introduced a measure for the detection of such severable components, which can be well approximated by their aggregated variables. The formal theory, a generalization of Simon and Ando's classic theory to local time scale separation, is illustrated on the particular case of Markov chains, which are representative of a larger class of dynamics, including consensus and synchronization.

FIG. 4. (a) The five components that the word 'nature' belongs to. Nodes and links are coloured by component identification; coloured ovals represent multiple component membership. (b) A broader view of the component landscape surrounding 'nature', depicting also components connected to, but not containing, 'nature', including three orphan nodes (see SI for details). Nodes belonging to just one of the components are combined into a single block labelled by the most central word of the component, while nodes belonging to more than one component are separately mentioned in the gray ovals. Note that in many cases, the words used to label the components possess multiple labels themselves. Communities were found by optimizing severability for Markov time t = 2 and search size S = 50.

Severable components, which can coexist at several sizes and time scales, overlap and leave orphan nodes. This dynamical concept is connected to other more particular notions encountered in several classes of systems, including basins (energy landscapes), slow-coherent areas (power networks), segments (image processing), communities (social network analysis), and rings (biochemical networks), which can be understood as structures with a locally coherent dynamics. On the other hand, other kinds of meso-structures in complex networks (e.g., block models, or roles in ecological systems [4, 11, 12]) are global in nature, and do not fall under the condition of locality required in this paper. Hence, while locality is an advantage for large systems, by definition truly global characteristics cannot be thus discovered.
However, we have shown in this paper that many classes of structures are in fact locally defined, demonstrating the applicability of severability.

Engineering disciplines traditionally operate by plugging together smaller components, usually seen as black boxes with simple external behaviour regardless of their internal complexity, in order to generate complex systems with controlled behaviour. One may argue that many natural systems are built similarly. In this perspective, we aim here to reverse this process: although complex systems are often too large to analyse in their entirety, our approach here is to try to find whether there exist suitable intermediate dynamical components which provide a proper understanding and representation of the complex global dynamics. In this sense, severability serves the role of a local coarse-graining mechanism for the dynamics as observed from a given subset of states in the system. Appealing to the coexistence of local time scales in Markov processes as a means to reveal severable components establishes mathematical connections between diffusion processes and model reduction, linking in a precise sense good mixing and retention in a subsystem to its accurate approximation through coarse-graining while preserving the Markov property.

As Big Data continues to proliferate, severability provides a first step towards the definition of new methods able to tackle the huge wealth of data being collected in all areas of science, technology and social life, much of which comes with a naturally endowed dynamics. Undoubtedly, more challenges lie on the road ahead, such as the treatment of more sophisticated node dynamics, for example when the dynamics are strongly nonlinear or non-Markovian [16, 55]. Yet the importance of dynamics as a key to characterising networks will undoubtedly persist.
FIG. 5. (Panels: starting image; cells; background.) Neocortical pyramidal neurons, stained with a fluorescent dye, with resolution reduced to 102 × 102 and converted to grayscale by luminosity. (Cyan) At Markov time t = 32, segments largely corresponding to cells were found (see S.I. for details). (Yellow) Furthermore, repeating the procedure with a cropped subregion of the image gives largely the same results, with some minor variations along the borders. This commutativity is a key feature of local methods. Despite the fact that severability is not specifically designed for image analysis, the severable components found are of good quality.

Ultimately, we hope that the framework of severable components (code available at https://github.com/yunwilliamyu/severability) provides not only a specific solution to recovering mesoscale structures when the dynamics are roughly Markovian, but also a meaningful and practical starting point for more sophisticated methods capable of tackling these more difficult problems.

III. METHODS

A. Formal definition of Severability
The definition of severability uses concepts of graph theory and Markov chains. A graph G is a set V of n nodes (or vertices, or states in the Markov chain terminology) together with another set E of links (or edges) between vertices. We assume that every node has at least one outgoing edge, and that all edges are labelled with a positive weight. The weighted, directed graph is encoded as an adjacency matrix A, where A_ij is the weight of the edge going from i to j. The (weighted) out-degree d_i of node i is the sum of the weights of the edges leaving i. The out-degrees can be compiled in the vector A 1, where 1 is the n × 1 all-ones vector, and D = diag(A 1) is the diagonal matrix of out-degrees.

On a given graph G, we define a random process in discrete time. A random walker starts from a node i at time 0 and jumps at time t = 1 to any out-neighbour j with probability A_ij / d_i, proportional to the edge weight. Successive jumps at t = 1, 2, 3, . . . define a Markov chain, or random walk, on the graph. The probability of presence of the random walker evolves as

x(t + 1) = x(t) D^{-1} A ≡ x(t) P,   (3)

where x(t) is the 1 × n normalised probability vector and P is the transition matrix, the rows of which are nonnegative and sum to one. Provided that the graph is strongly connected and aperiodic (i.e. there is no integer k > 1 such that every cycle has a length that is a multiple of k), any initial probability distribution converges to a unique stationary distribution, which is a solution of the fixed-point equation x(∞) = x(∞) P.

Given a connected subset C ⊂ V with k nodes, let Q be the submatrix of P corresponding to the nodes in C. Then we define the retention of the subset C over time t, ρ(C, t), as the probability for a random walker starting with a uniform probability distribution in C not to have escaped by time t:

ρ(C, t) = (1/k) 1^T Q^t 1.   (4)

To define mixing, let q_i^(t) be the i-th row of the matrix Q^t. Note that because C is connected, q_i^(t) ≠ 0.
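Before turning to mixing, the random-walk matrix P = D^{-1} A and the retention of Eq. (4) can be sketched in a few lines of NumPy. This is an illustrative sketch on a made-up toy graph, not the authors' released implementation:

```python
import numpy as np

def transition_matrix(A):
    """Row-normalise a nonnegative adjacency matrix: P = D^{-1} A."""
    d = A.sum(axis=1)                 # out-degrees (assumed positive)
    return A / d[:, None]

def retention(P, C, t):
    """rho(C, t): probability that a walker started uniformly in C
    has not escaped C after t steps (Eq. (4))."""
    Q = P[np.ix_(C, C)]               # substochastic block of P on C
    Qt = np.linalg.matrix_power(Q, t)
    return Qt.sum() / len(C)          # (1/k) * 1^T Q^t 1

# Toy graph: a 3-node clique weakly linked to a 2-node tail.
A = np.array([[0, 1, 1, 0.05, 0],
              [1, 0, 1, 0,    0],
              [1, 1, 0, 0,    0],
              [0, 0, 1, 0,    1],
              [0, 0, 0, 1,    0]], dtype=float)
P = transition_matrix(A)
C = [0, 1, 2]
print(retention(P, C, 5))             # close to 1: the clique retains the walker
```

Since Q is substochastic, Q^{t+1} 1 ≤ Q^t 1 entrywise, so the retention computed this way is nonincreasing in t.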
Thus the normalised row of Q^t, q_i^(t) / (q_i^(t) 1), is the probability distribution at time t for a random walker starting from node i, conditional upon the walker remaining in C between 0 and t. We can then define the internal mixing μ(C, t) as

μ(C, t) = 1 − (1/k) Σ_{i=1}^{k} ‖ q̄ − q_i^(t) / (q_i^(t) 1) ‖_TV,   (5)

where q̄ is the arithmetic mean over the unit-normalised rows of Q^t, and we have used the fact that the total variation distance norm is given by

‖v‖_TV = (1/2) Σ_i |v_i|.   (6)

The internal mixing term μ approaches 1 as the probability distribution of a random walker starting somewhere uniformly at random within the community approaches the quasi-stationary distribution on that subset of nodes [15]. Both ρ and μ are defined to range from 0 to 1, where the value of 1 corresponds to perfect retention or mixing, respectively. We define the severability as a compound function of both retention and mixing:

σ(C, t) = ( ρ(C, t) + μ(C, t) ) / 2,   (7)

which can be understood as the quality of the subset C to be considered as a separate dynamical mesostructure over time t. Severability has an intrinsic resolution parameter t, corresponding to the Markov horizon; as t increases, the random walker will diffuse to larger parts of the graph, as reflected by the iterations of the submatrix Q. Note that, from the above definitions, σ(C, t) depends only upon the out-links from nodes within C; hence it is a purely local function.

The particular form assumed for retention, mixing and severability is justified by the mathematical properties stated in III B, and proved in the Supp. Inf.

B. Local time scale separation theorem
1. Background: Simon and Ando's global time scale separation theorem
In 1961, Simon and Ando established a time scale separation theorem, both for general linear systems and Markov chains in particular [59], which we present now. Given a Markov chain

x(t + 1) = x(t) P,   (8)

let us split the nodes in two sets, x(t) = ( x_1(t)  x_2(t) ), with a corresponding partition of P,

P = [ P_11  P_12 ]
    [ P_21  P_22 ] .   (9)

Fix an arbitrary ε > 0, which will serve as a requested standard of approximation. Assume that P is close to a perfectly decoupled transition matrix

P̃ = [ P̃_11  0    ]
    [ 0     P̃_22 ] .   (10)

Simon and Ando proved that there is a small enough δ(ε, P̃_11, P̃_22) > 0 and a time T(ε, P̃_11, P̃_22) such that if ‖P − P̃‖ ≤ δ, then two kinds of approximations are valid for the trajectories of x(t):

• On the one hand, for all times t ≤ T, the decoupled approximation

x_dec(t) = ( x_dec,1(t)  x_dec,2(t) ) = ( x_1(0) P_11^t   x_2(0) P_22^t )   (11)

is within ε in norm from the actual solution x(t):

‖x(t) − x_dec(t)‖ < ε,  t < T.   (12)

• On the other hand, and more importantly, for all times t, and in particular for all t > T, the aggregated probabilities

x_tot(t) = ( x_1,tot(t)  x_2,tot(t) ) = ( x_1(t) 1   x_2(t) 1 )

are within ε in norm from the approximation

x_tot,approx(t) = ( x_1,tot(0)  x_2,tot(0) ) [ λ_1  δ_1 ]^t
                                             [ δ_2  λ_2 ] ,   (13)

for some real values λ_1, λ_2, δ_1, δ_2.

• Moreover, for times t > T, x_i(t) can be reconstructed as x_i,tot(t) v_i with an error bounded by ε, for some v_i.

Which norms are chosen in the statement above is irrelevant, as all norms of vectors or matrices of a given size are equivalent up to a factor, making the statement true for any choice of them. For simplicity we stated here the two-block case of a Markov chain dynamics, although the theory holds for general linear systems split into an arbitrary number of blocks. It is important to notice that the required δ depends not only on the given ε but also potentially on all the entries of the diagonal blocks P̃_ii, as is apparent for example in our own proof of Simon-Ando's theorem (Supp. Inf. III B 1). The theorem can therefore only be applied globally, with full knowledge of the dynamics. It is desirable to decouple this global condition into local conditions to be satisfied by each diagonal block P_ii to reach the required accuracy ε, and severability offers one practical way to achieve this, as shown below.
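The theorem can be made concrete numerically. The sketch below, our own illustrative example with arbitrarily chosen rates (internal mixing 0.1, weak coupling 0.001), simulates a 4-state chain close to a block-diagonal matrix and checks that at short times each block evolves almost independently, while the aggregated block mass changes only slowly:

```python
import numpy as np

# Two 2-state blocks with fast internal mixing (0.1) and weak coupling (0.001)
P = np.array([[0.899, 0.100, 0.001, 0.000],
              [0.100, 0.900, 0.000, 0.000],
              [0.000, 0.000, 0.900, 0.100],
              [0.001, 0.000, 0.100, 0.899]])
P11 = P[:2, :2]                      # diagonal block of the first component

x = np.array([1.0, 0.0, 0.0, 0.0])   # start at node 1
x_dec = x[:2].copy()                 # decoupled approximation, as in Eq. (11)
for _ in range(50):
    x = x @ P
    x_dec = x_dec @ P11

# Short-time regime: the decoupled approximation tracks the true block-1 state
print(np.abs(x[:2] - x_dec).max())   # small compared to the coupling horizon
# ...while the aggregated mass of block 1 has barely moved
print(x[:2].sum())                   # still close to 1
```

At times of order 1/0.001 the decoupled approximation degrades, but the 2-state aggregated dynamics of Eq. (13) remains accurate, which is the content of the second part of the theorem.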
2. Statement of the local time scale theorem
Following the same notation as above, consider the first block P_11 (denoted Q in the main text and Methods A), which describes the set of states C with severability σ(t). The local dynamics of x_1(t) is described by the open dynamical system

x_1(t + 1) = x_1(t) P_11 + u_1(t),
y_1(t) = x_1(t) P_12,   (14)

where u_1, defined for all t ≥ 0, is the input into the subsystem C, i.e., the in-flow of probability from the environment, and y_1 is the output of the subsystem, by which it influences the environment through an outflow of probability. By the environment we mean the rest of the state space, described by x_2, itself governed by an open system equation of the same kind as Eq. (14). The global dynamics on ( x_1(t)  x_2(t) ) can be understood as the feedback interconnection of the two systems, related by the equations u_1(t) = y_2(t), u_2(t) = y_1(t).

Equation (14) describes a relationship between the input sequence u_1(t) and the output sequence y_1(t). An alternative way to describe an open system in linear response theory is by its impulse response. In our notation, the impulse response can be written as

g(t) = P_11^{t−1} P_12 for all t ≥ 1, and zero for t ≤ 0.

The impulse response fully characterizes the input-output relationship, in that the output generated by any input sequence u(t) is obtained by the convolution product y = u ∗ g (defined as y(t) = u(t) g(0) + u(t − 1) g(1) + u(t − 2) g(2) + . . .) for all t ≥ 0, assuming that there is no probability in C initially, x_1(0) = 0. A non-zero initial state can be incorporated by adding an artificial input u(−1) with x_1(0) = u(−1). In order to approximate the output y described by Eq. (14), we need to approximate the function g(t) by another impulse response h(t) between the same input and output spaces, measured with a given norm. A common metric used in the open systems literature is the one-norm

‖g − h‖_1 = Σ_{t≥0} ‖g(t) − h(t)‖,   (15)

whenever it is defined. Of course, given matrices g(t), h(t), we can choose any matrix norm for ‖g(t) − h(t)‖, as they all relate within a constant factor only dependent on the dimension of the matrix. If the approximation is only meant to be valid on a time interval [t_0, t_1], then we can restrict the sum in Eq. (15) to t ∈ [t_0, t_1], denoted ‖g − h‖_{1,[t_0,t_1]}. An error in the impulse response committed in replacing g by h will result in an error in the output y in the following way, as one can show from elementary algebra:

sup_{0 ≤ t ≤ t_1} ‖(u ∗ g)(t) − (u ∗ h)(t)‖ ≤ sup_t ‖u(t)‖ · ‖g − h‖_{1,[0,t_1]},

where t_1 can be infinity.

Our local time scale separation theorem makes two statements regarding the approximability of the impulse response of the nodes C, before and after an arbitrarily chosen time T. The first one, at short times, follows directly from the high retention implied by a high severability at time T, whereas the second one, at long times, requires a more careful analysis. The theorems are proved in Supp. Inf. A.

Local Time Scale Theorem (Short times). The system represented by Eq. (14) can be approximated until time T by the trivial response y(t) = 0, with accuracy ‖g − 0‖_{1,[0,T]} = O(1 − σ(T)).

Local Time Scale Theorem (Long times). The system described by Eq. (14) can be approximated by a one-state system of the following form:

x_C(t + 1) = λ x_C(t) + u(t) b,
y(t) = x_C(t) d,   (16)

where λ is the dominant eigenvalue of P_11 and b, d are appropriate vectors, whose corresponding impulse response is h(t) = b λ^{t−1} d. Vectors b, d are found from the dominant eigenvectors P_11 b = λ b, v P_11 = λ v, and d = v P_12, normalised so that v b = 1 = v 1. The approximation is valid for all times (including obviously t > T), and the error summed over all times is ‖g − h‖_1 = O(1 − σ(T)).

For any given input signal u(t) bounded by ‖u(t)‖ ≤ K for all t ≥ −1, the exact model described by Eq. (14) and the one-dimensional model given by Eq. (16) deliver outputs whose difference is at all times bounded by O(1 − σ(T)) K.

The constants contained inside O(·) in these statements may depend on the dimension of x_1 (the number of nodes in C), but neither on the specific entries of P nor on T. In view of these statements, the best time scale separation is given by T = t_max, at which severability peaks, and the error of the resulting approximations is 1 − σ(t_max).

Assuming now that the global network is split into two or several blocks, one may combine the different local approximations and obtain the following version of the classic Simon-Ando theorem: given a global dynamics described by Eqs. (8) and (9), suppose that we find a common time T at which both 1 − σ(T, P_11) ≤ δ and 1 − σ(T, P_22) ≤ δ; then the short-term and long-term dynamics can be approximated as in Eqs. (11) and (13) with error bounded by ε = O(δ), where the hidden constant only depends on the total number of nodes (Supp. Inf. B). The generalisation to more than two components is straightforward. This version highlights the role of the severability of each component and the need to find a common global time scale T (possibly suboptimal for each component separately) at which each component simultaneously reaches a high severability, for a global time scale separation to emerge.

See Supp. Inf. C for a toy example of comparative application of the global and local time scale separation theorems.

C. Computational aspects of Severability optimisation
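As a preliminary to the search procedure, it helps to see how severability itself is evaluated from Eqs. (4)-(7), and how a component can be grown from a seed. The sketch below is a deliberately simplified illustration (plain greedy growth, without the Kernighan-Lin switches of the actual algorithm, and not the released implementation):

```python
import numpy as np

def severability(P, C, t):
    """sigma(C, t) = (retention + mixing) / 2, following Eqs. (4)-(7).
    Assumes C is connected, so rows of Q^t are nonzero."""
    Q = np.linalg.matrix_power(P[np.ix_(C, C)], t)
    rho = Q.sum() / len(C)                      # retention, Eq. (4)
    rows = Q / Q.sum(axis=1, keepdims=True)     # conditioned distributions
    qbar = rows.mean(axis=0)
    tv = 0.5 * np.abs(rows - qbar).sum(axis=1)  # total variation, Eq. (6)
    mu = 1.0 - tv.mean()                        # internal mixing, Eq. (5)
    return 0.5 * (rho + mu)

def greedy_component(P, seed, t, max_size=20):
    """Grow C from `seed`, adding at each step the neighbouring node
    that most increases severability; stop when no addition helps."""
    C = [seed]
    while len(C) < max_size:
        neigh = {j for i in C for j in np.flatnonzero(P[i]) if j not in C}
        if not neigh:
            break
        best = max(neigh, key=lambda j: severability(P, C + [j], t))
        cur = severability(P, C, t) if len(C) > 1 else 0.0
        if severability(P, C + [best], t) <= cur:
            break
        C.append(best)
    return sorted(C)
```

On a graph made of two cliques joined by a single weak edge, this procedure grows the seed's clique and stops before absorbing the other one, since adding a node across the weak edge lowers both retention and mixing.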
We apply a semi-greedy search algorithm to find the optimal component C for a starting node n, at a chosen Markov time t and with a set search size S (see Appendix D 1 in the Supp. Inf. for a detailed flowchart).

Briefly, the algorithm proceeds as follows. Without loss of generality, define σ(C) = σ(Q, t). Initially, only n ∈ C. Aggregate nodes greedily, except let every third step be a Kernighan-Lin switch of a single node on the boundary of C to maximise σ(C) [34]. After the initial semi-greedy optimisation, the intermediate component C that has maximal severability is fine-tuned using Kernighan-Lin switches to find a local maximum. If n is in the resulting component, the algorithm stops; otherwise, start over with a different neighbour of the starting node. If all neighbours of n have been attempted without success, declare n an orphan. For the word association network, every neighbour of "nature" was attempted for the first step, giving the overlapping communities.

A detailed description of other computational aspects of the implementation is given in Supp. Inf. D.

D. Markov chain equivalence of dynamical systems
Markov chains, or random walks, are characterized by a dynamics of the form x(t + 1) = x(t) P, where P is any nonnegative square matrix with all rows summing to one. To every such dynamics we can associate a dual consensus dynamics y(t + 1) = P y(t) acting on the column vector y(t), the entries of which are positions, or opinions, of agents; the opinions converge to the same value if and only if the corresponding random walk converges to a unique stationary distribution.

Positive linear systems are common in economics, biology and chemistry, where variables naturally take nonnegative values. Such systems are characterized by an evolution y(t + 1) = P y(t), or x(t + 1) = x(t) P, where P is only required to be nonnegative. Under the same connectivity conditions on the network underlying P, we know that there is a unique dominant eigenvalue λ and corresponding left and right eigenvectors, u = λ^{-1} u P and v = λ^{-1} P v respectively, all of which are positive by virtue of the Perron-Frobenius theorem.

This property allows a normalization that transforms the dynamics into a consensus, or random walk, dynamics. The new matrix is P̃ = λ^{-1} D_v^{-1} P D_v, where D_v is the diagonal matrix associated to v. It is readily observed that P̃ is a valid transition matrix, and is equivalent to P except for a global scaling λ^{-1} and a change of variable on every node. In particular, it has the same eigenvectors and acts on the same underlying network as P.
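The normalization P̃ = λ^{-1} D_v^{-1} P D_v can be checked in a few lines; the sketch below uses a small, arbitrarily chosen positive matrix and NumPy's eigendecomposition for the Perron eigenpair:

```python
import numpy as np

# An arbitrary positive matrix that is not row-stochastic
P = np.array([[1.0, 2.0],
              [3.0, 1.0]])

# Dominant (Perron) eigenvalue and right eigenvector, positive by Perron-Frobenius
w, V = np.linalg.eig(P)
idx = np.argmax(w.real)
lam = w[idx].real
v = np.abs(V[:, idx].real)        # Perron eigenvector, taken with positive sign

# Normalization: P_tilde = lam^{-1} D_v^{-1} P D_v is a valid transition matrix
Dv = np.diag(v)
P_tilde = (np.linalg.inv(Dv) @ P @ Dv) / lam
print(P_tilde.sum(axis=1))        # each row sums to 1
```

The row sums equal one because each row sum of D_v^{-1} P D_v is v_i^{-1} (P v)_i = λ, which is exactly cancelled by the λ^{-1} prefactor.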
This transformation has an elegant information-theoretic interpretation as the random walk with maximal entropy rate (if P is a zero-one matrix) or a free energy (if the nonnegative entries are interpreted as exponential energy barriers along the edges) [17, 56].

Markov chains are also defined in continuous time, following an equation of the form ẋ(t) = x(t) L, where the continuous-time transition matrix has nonpositive diagonal terms, nonnegative off-diagonal terms, and zero-sum rows. One also has continuous-time consensus, and any positive continuous-time linear system, characterized by a matrix L with nonpositive diagonal and nonnegative off-diagonal terms, can be similarly normalized to a continuous-time Markov chain, which can be sampled to a discrete-time Markov chain.

Some non-linear systems can be linearized around a fixed point. Classic theorems such as Hartman-Grobman's ensure that the nonlinear and linearized systems are equivalent up to a change of variables in a neighbourhood of the fixed point. Kuramoto oscillators, and power network dynamics, linearize to consensus dynamics.

ACKNOWLEDGMENTS
The authors would like to thank Antoine Delmotte, Michael Schaub, Arnaud Browet, Florian Dörfler and Renaud Lambiotte for code and/or discussions. Neuron fluorescence imagery is courtesy of Simon Schultz and Marie-Therese Vasilache. Y.W.Y. was partially supported by an Imperial Marshall Scholarship during the early years of this work. J.-C. D. is partly supported by the Flagship European Research Area Network (FLAG-ERA) Joint Transnational Call "FuturICT 2.0". This work is partially supported by the Engineering and Physical Sciences Research Council of the United Kingdom.
Author Contributions
M.B. and S.Y. conceived the project and guided the research. J.-C.D. developed the theoretical analyses. Y.W.Y. implemented and designed the experimental analyses. The manuscript was jointly written by all authors.

[1] Andersen, R., F. Chung, and K. Lang, 2007, Internet Mathematics (1), 35.
[2] Ando, A., and F. M. Fisher, 1963, International Economic Review (1), 53.
[3] Avramovic, B., P. V. Kokotovic, J. R. Winkelman, and J. H. Chow, 1980, Automatica (6), 637.
[4] Beguerisse-Díaz, M., G. Garduño-Hernández, B. Vangelov, S. N. Yaliraki, and M. Barahona, 2014, Journal of The Royal Society Interface (101), 20140940.
[5] Blondel, V. D., J. L. Guillaume, R. Lambiotte, and E. Lefebvre, 2008, Journal of Statistical Mechanics: Theory and Experiment, P10008.
[6] Boccaletti, S., V. Latora, Y. Moreno, M. Chavez, and D.-U. Hwang, 2006, Physics Reports (4), 175.
[7] Browet, A., P. Absil, and P. Van Dooren, 2011, Combinatorial Image Analysis, 358-371.
[8] Bultheel, A., and M. Van Barel, 1986, Journal of Computational and Applied Mathematics (3), 401.
[9] Chung, F., 1997, Spectral Graph Theory, number 92 in Regional Conference Series in Mathematics (Amer. Mathematical Society).
[10] Coderch, M., A. Willsky, S. Sastry, and D. Castanon, 1983, IEEE Transactions on Automatic Control (11), 1017.
[11] Cooper, K., and M. Barahona, 2010, arXiv preprint arXiv:1012.2726.
[12] Cooper, K., and M. Barahona, 2011, arXiv preprint arXiv:1103.5582.
[13] Cornelius, S. P., W. L. Kath, and A. E. Motter, 2013, Nature Communications.
[14] Danon, L., A. Díaz-Guilera, J. Duch, and A. Arenas, 2005, Journal of Statistical Mechanics: Theory and Experiment (09), P09008, ISSN 1742-5468, URL http://iopscience.iop.org/1742-5468/2005/09/P09008/.
[15] Darroch, J. N., and E. Seneta, 1965, Journal of Applied Probability (1), 88, ISSN 00219002.
[16] Delvenne, J.-C., R. Lambiotte, and L. E. Rocha, 2015, Nature Communications.
[17] Delvenne, J.-C., and A.-S. Libert, 2011, Physical Review E (4), 046117.
[18] Delvenne, J.-C., M. T. Schaub, S. N. Yaliraki, and M. Barahona, 2013, Dynamics On and Of Complex Networks (Springer), volume 2, chapter The stability of a graph partition: A dynamics-based framework for community detection, pp. 221-242.
[19] Derisavi, S., H. Hermanns, and W. H. Sanders, 2003, Information Processing Letters (6), 309.
[20] Dorfler, F., and F. Bullo, 2011, in Decision and Control and European Control Conference (CDC-ECC), 2011 50th IEEE Conference on (IEEE), pp. 7099-7104.
[21] Dörfler, F., and F. Bullo, 2012, SIAM Journal on Control and Optimization (3), 1616.
[22] Dörfler, F., M. Chertkov, and F. Bullo, 2013, Proceedings of the National Academy of Sciences (6), 2005.
[23] de Gennes, P. G., 1976, La Recherche (72), 919.
[24] Girvan, M., and M. E. J. Newman, 2002, Proc. Nat. Acad. Sci. USA (12), 7821.
[25] Good, B. H., Y. de Montjoye, and A. Clauset, 2010, Phys. Rev. E (4), 046106, URL http://link.aps.org/doi/10.1103/PhysRevE.81.046106.
[26] Grady, L., 2006, Pattern Analysis and Machine Intelligence, IEEE Transactions on (11), 1768.
[27] Heidemann, J., M. Klier, and F. Probst, 2012, Computer Networks (18), 3866.
[28] Helbing, D., 2013, Nature (7447), 51.
[29] Ideker, T., and N. J. Krogan, 2012, Molecular Systems Biology (1), 565.
[30] Jeub, L. G., P. Balachandran, M. A. Porter, P. J. Mucha, and M. W. Mahoney, 2015, Physical Review E (1), 012821.
[31] Kanehisa, M., and S. Goto, 2000, Nucleic Acids Research (1), 27, URL http://nar.oxfordjournals.org/content/28/1/27.abstract.
[32] Kannan, R., S. Vempala, and A. Vetta, 2004, J. ACM (3), 497-515.
[33] Karrer, B., and M. E. Newman, 2011, Physical Review E (1), 016107.
[34] Kernighan, B. W., and S. Lin, 1970, Bell Syst. Tech. J. (2), 291-307.
[35] Kokotovic, P. V., J. O'Reilly, and H. K. Khalil, 1986, Singular Perturbation Methods in Control: Analysis and Design (Academic Press, Inc., Orlando, FL, USA), ISBN 0124176356.
[36] Korenblum, D., and D. Shalloway, 2003, Physical Review E (5), 056704.
[37] Lancichinetti, A., and S. Fortunato, 2009, Phys. Rev. E (1), 016118, URL http://link.aps.org/doi/10.1103/PhysRevE.80.016118.
[38] Lancichinetti, A., and S. Fortunato, 2009, Physical Review E (5), 056117.
[39] Lancichinetti, A., S. Fortunato, and J. Kertész, 2009, New Journal of Physics (3), 033015, ISSN 1367-2630, URL http://iopscience.iop.org/1367-2630/11/3/033015.
[40] Lancichinetti, A., S. Fortunato, and F. Radicchi, 2008, Phys. Rev. E (4), 046110, URL http://link.aps.org/doi/10.1103/PhysRevE.78.046110.
[41] Lancichinetti, A., F. Radicchi, J. J. Ramasco, and S. Fortunato, 2011, PLoS ONE (4), e18961.
[42] Majdandzic, A., B. Podobnik, S. V. Buldyrev, D. Y. Kenett, S. Havlin, and H. E. Stanley, 2014, Nature Physics (1), 34.
[43] Masuda, N., M. A. Porter, and R. Lambiotte, 2016, arXiv preprint arXiv:1612.03281.
[44] Moore, B., 1981, Automatic Control, IEEE Transactions on (1), 17.
[45] Morarescu, I., and A. Girard, 2009, Automatic Control, IEEE Transactions on (99), 1862.
[46] Nelson, D. L., C. L. McEvoy, and T. A. Schreiber, 1998, The University of South Florida word association, rhyme, and word fragment norms.
[47] Newman, M. E. J., and M. Girvan, 2004, Phys. Rev. E (2), 026113, URL http://link.aps.org/doi/10.1103/PhysRevE.69.026113.
[48] Olfati-Saber, R., J. A. Fax, and R. M. Murray, 2007, Proceedings of the IEEE (1), 215.
[49] Palla, G., I. Derenyi, I. Farkas, and T. Vicsek, 2005, Nature (7043), 814, ISSN 0028-0836, URL http://dx.doi.org/10.1038/nature03607.
[50] Pernebo, L., and L. M. Silverman, 1982, Automatic Control, IEEE Transactions on (2), 382.
[51] Radicchi, F., C. Castellano, F. Cecconi, V. Loreto, and D. Parisi, 2004, Proc. Nat. Acad. Sci. USA (9), 2658.
[52] Roberts, J. D., 1980, International Journal of Control (4), 677.
[53] Romeres, D., F. Dörfler, and F. Bullo, 2013, in European Control Conference, Zürich, Switzerland (Citeseer).
[54] Rosvall, M., and C. T. Bergstrom, 2008, Proc. Nat. Acad. Sci. USA (4), 1118.
[55] Rosvall, M., A. V. Esquivel, A. Lancichinetti, J. D. West, and R. Lambiotte, 2014, Nature Communications.
[56] Ruelle, D., and G. Gallavotti, 1978, Thermodynamic Formalism, volume 112 (Addison-Wesley, Reading).
[57] Schaub, M., J. Delvenne, S. Yaliraki, and M. Barahona, 2012, PLoS ONE (2), e32210.
[58] Shi, J., and J. Malik, 2000, Pattern Analysis and Machine Intelligence, IEEE Transactions on (8), 888.
[59] Simon, H. A., and A. Ando, 1961, Econometrica: Journal of the Econometric Society, 111.
[60] Spielman, D. A., and S.-H. Teng, 2013, SIAM Journal on Computing (1), 1.
[61] Strogatz, S. H., 2000, Physica D: Nonlinear Phenomena (1), 1.
[62] Strogatz, S. H., 2001, Nature (6825), 268, ISSN 0028-0836, URL http://dx.doi.org/10.1038/35065725.
[63] Van Dongen, S., 2008, SIAM Journal on Matrix Analysis and Applications (1), 121.
[64] Wales, D. J., 2006, International Reviews in Physical Chemistry (1-2), 237.
[65] Watson, R. A., and J. B. Pollack, 2005, Artificial Life (4), 445.
[66] Watts, D. J., and S. H. Strogatz, 1998, Nature (6684), 440.
[67] Wu, Z., and R. Leahy, 1993, Pattern Analysis and Machine Intelligence, IEEE Transactions on (11), 1101.

Supplementary Information
Appendix A: Proof of the local time scale separation theorem
Proof.
On the short time scale, the validity of the approximation is given by Σ_{t=0}^{T} ‖g(t)‖, where g(t) = P_11^{t−1} P_12 for t ≥ 1 (and g(0) = 0). We notice that Σ_{t=1}^{T} P_11^{t−1} P_12 1 expresses the probabilities of escape within time T. In fact the retention ρ(T) as introduced in Eq. (4) can be expressed as

1 − ρ(T) = (1/k) 1^T ( Σ_{t=0}^{T−1} P_11^t ) P_12 1 = Σ_{t=0}^{T} ‖g(t)‖   (A1)

for the choice of matrix norm ‖A‖ = Σ_ij |A_ij| / k, given that all entries of g(t) are nonnegative. The short time scale local theorem results from 1 − ρ(T) ≤ 2 (1 − σ(T)), which follows directly from the definition of severability.

Equation (16) generates an impulse response h(t) = b λ^{t−1} d for t ≥ 1. The difference Δ(t) = ‖P_11^{t−1} P_12 − b λ^{t−1} d‖ decays to zero exponentially, provided that P_11 has dominant eigenvalue λ and eigenvectors v = λ^{-1} v P_11 (normalised so that the entries of the row vector v sum to one) and b = λ^{-1} P_11 b (normalised so that v b = 1), with d = v P_12. Indeed this guarantees that P_11^t behaves as b λ^t v for high t.

If P_11 were perfectly stochastic, then P_11 1 = 1 and we would have b = 1 and λ = 1. As P_11 is almost stochastic, we expect that 1 − λ and 1 − b are in O(1 − σ(t)) for all t, which we can indeed prove in the following way. It is well known from Perron-Frobenius theory that the dominant eigenvalue of a matrix with positive entries sits between the minimum and maximum row sum. Therefore 1 − k (1 − ρ(t)) ≤ λ^t ≤ λ < 1, hence 1 − λ ≤ 1 − λ^t ≤ k (1 − ρ(t)) = O(1 − σ(t)). To evaluate b, let us call P̃(t) the row-normalized matrix derived from P_11^t, where every row is scaled so as to sum to one. Then the distance (in any norm) between any two rows of P̃(t) is in O(1 − μ(t)), by the definition of internal mixing in Eq. (5). The distance between P_11^t and P̃(t), on the other hand, is in O(1 − ρ(t)), by definition of the retention. Therefore the distance between any two rows of P_11^t is in O(1 − σ(t)), thus P_11^t = 1 v' + O(1 − σ(t)) for some positive row vector v'. Premultiplying this equality by v, we get λ^t v = v' + O(1 − σ(t)), thus v' = v + O(1 − σ(t)). Postmultiplying instead by b gets b = 1 + O(1 − σ(t)), as required.

Consider the remainder R = P_11 − b λ v, so that R^t = P_11^t − b λ^t v from spectral decomposition properties. We find from the above that R^t = P̃(t) − 1 v + O(1 − σ(t)) = O(1 − σ(t)).

Choosing the matrix norm ‖A‖ = sup_i Σ_j |A_ij|, which happens to be submultiplicative (‖AB‖ ≤ ‖A‖ ‖B‖ for all A, B), and using the identity Σ_{t≥0} z^t = (1 − z)^{-1}, applied here to z = R^T (for which the identity is valid because all eigenvalues of R^T have absolute value strictly less than one, being the non-dominant eigenvalues of P_11^T), we obtain

‖g − h‖_1 = Σ_{t≥1} Δ(t)
= Σ_{t≥1} ‖R^{t−1} P_12‖
≤ (1 − ‖R^T‖)^{-1} ( ‖P_12‖ + ‖R P_12‖ + · · · + ‖R^{T−1} P_12‖ )
= (1 − ‖R^T‖)^{-1} O( ‖(I + P_11 + · · · + P_11^{T−1}) P_12‖ )
= O(1) O(1 − σ(T)),

using also R^t P_12 = O(‖P_11^t P_12‖) and ‖Σ_i A^(i)‖ ≤ Σ_i ‖A^(i)‖ ≤ k ‖Σ_i A^(i)‖ for any family of k-by-k nonnegative matrices A^(i).

We obtain the same result ‖g − h̃‖_1 = O(1 − σ(T)) for h̃(t) = b̃ λ^{t−1} d, with b̃ = 1. This is because ‖h̃ − h‖_1 = O(1 − σ(T)), easily obtained from the fact that b̃ − b = O(1 − σ(T)).
This approximation has the nice property of preserving the flow of probability: it describes the behaviour of a single super-node that aggregates all the input probability flows, expelling a small fraction 1 − λ of the stored probability mass at every step to the nodes in the rest of the system, with weights given by d.

Therefore different approximations rule the short-term behaviour (where a large retention matters) and the long-term behaviour (where fast mixing also matters).

The proof highlights that the theorem is robust with respect to the choice of norms in the definition of severability and in the statement of the theorems, as it changes the hidden constants in a way that only depends on k, the number of nodes. The specific definition chosen for severability in this article is motivated by simplicity, convenient computation and good practical results.

Appendix B: From local to global: a proof of Simon-Ando's theorem
We now provide a proof of Simon and Ando's global time scale theorem, stated in terms of the severability of the components. Assume that a partition of the network into two components reveals a common time scale T at which each severability is higher than 1 − ε. In the short run, every component can ignore the other and evolve separately, with a resulting error of order O(ε). Let us turn to the long run case.

For times t ≥ T, one may write x_1(t) = x_1(t − T) P_11^T + O(ε) (as Σ_{k=t−T}^{t} u_1(k), the probability mass leaking in from component x_2, is in O(ε)). From high mixing, all rows of P_11^T are ε-close to one another, and ε-close to a multiple of the dominant eigenvector v_1 (the quasi-stationary distribution). The same holds for x_2, ε-close to a multiple of v_2. The full state trajectory x(t) = ( x_1(t)  x_2(t) ) thus remains ε-close to a trajectory of the form ( α(t) v_1  β(t) v_2 ), and therefore it is enough to know the two-dimensional trajectory (α(t), β(t)) (in fact one-dimensional in the set of probability measures, because subject to the constraint α(t) + β(t) = 1) to reconstruct x(t) approximately. This means that S, the image of the set of all probability measures under the map P^T, is invariant under P, has diameter 1 − O(ε) in the direction { ( α v_1  (1 − α) v_2 ) | α ∈ R }, and is 'thin' in that every point of S is O(ε)-away from that direction.

Now consider the two-dimensional dominant eigenspace of P, generated by the dominant left eigenvector (the stationary distribution) w^(1) of eigenvalue 1 and the second left eigenvector w^(2) (normalised to unit norm) of eigenvalue λ = 1 − O(ε). The intersection of that space with the set of probability measures is one-dimensional, of the form S_0 = { w^(1) + γ w^(2) | w^(1) + γ w^(2) ≥ 0 }. On this eigen-set, the dynamics takes the simple, exact form x(t) = w^(1) + λ^t γ w^(2).

Given that S_0 ⊆ S, we know that every point x = ( x_1  x_2 ) ∈ S_0 is O(ε)-approximated by the projection Proj(x) = ( x_1 1 v_1   x_2 1 v_2 ). Therefore, the aggregated dynamics obtained in replacing w^(1) and w^(2) by their approximations in terms of v_1 and v_2 induces a one-dimensional aggregated dynamics on the direction ( α v_1  (1 − α) v_2 ), where x_1 and x_2 are replaced by their aggregations x_1 1 v_1 and x_2 1 v_2, and the projected dynamics is given by Proj(x(t)) = Proj(w^(1)) + λ^t γ Proj(w^(2)).

The trajectory initiated by a point x ∈ S_0, and the trajectory generated from its projection Proj(x) by this projected dynamics, remain O(ε)-close at all times. On the other hand, any point x in S is O(ε)-close to a point x_0 in S_0, and those two points remain O(ε)-close when both are iterated by P, as P contracts the 1-distance (or total variation distance; the induced 1-norm of P is 1).

Now we can conclude. The trajectory initiated by any point in S (iterated by the exact dynamics P) remains O(ε)-close at all times to some trajectory in the eigen-set S_0, which itself remains O(ε)-close at all times to the projected, aggregated dynamics on the direction ( α v_1  (1 − α) v_2 ). Therefore any trajectory in S generated by the actual dynamics P is O(ε)-close at all times to the one-dimensional dynamics on the aggregated quantities.

A closer look would show that the projected dynamics taken from the approximation given by the Local Time Scale separation theorem on each block separately is not strictly identical to the aggregated dynamics presented here, but the trajectories generated by the two one-dimensional dynamics are again O(ε)-close at all times.

In the above, all hidden constants in the O(·) notation are dependent on the specific norms used to measure distances, thus dependent on the number of nodes in each block, but on nothing else.

This completes the proof of Simon and Ando's global time scale theorem, as given in Methods (see Section III B 1), since arbitrarily small perturbations from a fixed, block-diagonal transition matrix P̃ lead to arbitrarily high severability over arbitrarily large intervals of time.

The global nature of the theorem reveals itself in the fact that it needs, simultaneously at time T, a high mixing and a high retention in every component, thus shedding light on the conditions required for global time-scale separation to hold.

See the next Appendix for a simple example showing that δ(ε, P̃_11, P̃_22) and T(ε, P̃_11, P̃_22), as described in the classic statement of Simon-Ando's theorem, indeed depend on the global information (P̃_11, P̃_22).

Appendix C: Global vs Local time scale separation theorems: an example
We apply our version of Simon-Ando's theorem (formulated in terms of severabilities) to a toy example of four nodes separated into two blocks, or mesoscale components. We then modify the example so that Simon-Ando's global theorem does not apply any more, but our local theorem still applies.

Consider

P = [ 1−η₁−δ    η₁       δ        0
      η₁        1−η₁     0        0
      0         0        1−η₂     η₂
      δ         0        η₂       1−η₂−δ ],   (C1)

which is δ-close to the block-diagonal matrix

P̃ = [ 1−η₁     η₁       0        0
      η₁       1−η₁     0        0
      0        0        1−η₂     η₂
      0        0        η₂       1−η₂ ].   (C2)

Let us compare the trajectories generated by the two initial conditions (1, 0, 0, 0) and (0, 1, 0, 0), which both lead to the same aggregation (probability 1 in the first block).

If η₁ ≪ δ and η₂ ≪ δ ≪ 1, then it is clear that their trajectories will remain very different, even at the aggregated level, for a long time: at times of the order 1/δ, the first trajectory will be concentrated mostly on the second block (and so will the aggregated trajectory), while the second trajectory will stay confined in the first block. If η₁ ≪ δ ≪ η₂ ≪ 1, then at times of the order 1/δ the first trajectory will be equally split between the two blocks, while the second trajectory will again be confined in the first block.

Thus if we want to reach a given accuracy in Simon-Ando's theorem, for instance ε = 0.1, we need to take δ of the order of min(η₁, η₂), which shows the global dependency of δ on the 'internal details' of both blocks. The transition between the short-time regime and the long-time regime occurs at time T = O(1/δ).

In our language, the severability of each block i = 1, 2 is high at times t between O(1/ηᵢ) and O(1/δ) (if δ ≪ ηᵢ indeed; otherwise the severability remains low at all times). We see indeed that these intervals overlap before time O(1/δ). We can therefore apply our version of Simon-Ando's theorem, as we have simultaneous high severability in each block for some time t.

This also shows the intrinsically asymptotic nature of Simon-Ando's original theorem: as δ is decreased, the peak of severability for each block extends into a plateau stretching until 1/δ, eventually forcing an overlap of plateaus for small enough δ.

If we consider a slightly more complicated example:

P = [ 1−η₁−δ₁   η₁       δ₁       0
      η₁        1−η₁     0        0
      0         0        1−η₂     η₂
      δ₂        0        η₂       1−η₂−δ₂ ],   (C3)

with δ₁ ≪ η₁ ≪ δ₂ ≪ η₂, then Simon-Ando's theorem cannot be formally applied, because it assumes a fixed block-diagonal structure and an arbitrarily small perturbation of it. We find the same conclusion in the language of severability: the severability of each block i = 1, 2 is high between O(1/ηᵢ) and O(1/δᵢ). As these intervals do not coincide, we indeed cannot apply our version of Simon-Ando's theorem.

Our local time scale theorem is nevertheless applicable to each block separately, and allows us to identify them as mesoscale components reaching high severability at different time scales. This shows that the local time scale theorem is of wider applicability and is a more relevant tool to identify components with dynamical coherence in a complex, heterogeneous dynamical system.

Appendix D: Computational aspects of Severability

1. Severability optimization flowchart
[Flowchart of Fig. 6: read an initial node n₀ and a maximum size S; sort the neighbours nᵢ of n₀ by the severability σ([n₀, nᵢ]) of the pair; for each candidate pair in turn, call the function F([n₀, nᵢ]), which grows a component by greedy aggregation (adding the neighbouring node that maximises severability), performing a Kernighan-Lin step (adding a neighbouring node or removing a boundary node, whichever maximises σ) every third iteration, until the size bound S is reached; the best component C_k = argmaxᵢ σ(Cᵢ) is then refined by single additions or removals while they increase σ; the resulting component C is returned, and accepted if it contains n₀.]
FIG. 6. Flowchart of the optimisation procedure to find the most severable component to which a node n belongs. For clarity,the Markov time t is assumed to be constant in this diagram.
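The inner function F of this flowchart can be sketched in a few lines. The sketch below is illustrative rather than the authors' implementation: `score` stands in for the severability σ(·) at a fixed Markov time, `neighbours` is a plain adjacency mapping, and the iteration cap is an added safeguard not present in the original flowchart.

```python
def grow_component(neighbours, seed, score, max_size):
    """Greedy sketch of the function F in Fig. 6: grow a component from
    `seed`, adding at each step the neighbouring node that maximises the
    quality function `score` (a stand-in for severability), with a
    Kernighan-Lin style step every third iteration that may also remove
    a node.  Returns the best-scoring component encountered."""
    comp = {seed}
    best, best_score = set(comp), score(comp)
    i = 1
    while len(comp) < max_size and i <= 3 * max_size:  # cap added as a safeguard
        frontier = set().union(*(neighbours[v] for v in comp)) - comp
        if not frontier:
            break
        moves = [comp | {v} for v in frontier]          # greedy aggregation
        if i % 3 == 0 and len(comp) > 1:                # Kernighan-Lin step:
            moves += [comp - {v} for v in comp]         # removals allowed too
        comp = max(moves, key=score)
        if score(comp) > best_score:                    # track argmax_i score(C_i)
            best, best_score = set(comp), score(comp)
        i += 1
    return best
```

On a toy graph of two triangles joined by a single edge, seeding at a triangle node with the size bound set to 3 recovers that triangle.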
2. Computational Complexity
Let n be the number of nodes in a graph. The severability of a component C of size k for a Markov time t can be computed in P(k, t) = O(k³ log t) time, where the cubic term comes from schoolbook matrix multiplication. Computing the mixing and retention given Q^t are both O(k²) operations, so the total cost is dominated by the matrix exponentiation.

The cost can be reduced using fast matrix multiplication techniques; for instance, using Strassen's method, the total cost would only be O(k^2.81 log t). Alternatively, for large t, matrix diagonalisation can first be employed, which makes the t term negligible, giving an O(k³) solution.

However, finding good components is more involved than simply computing the severability of a single set of nodes. The cost of the component optimisation algorithm described in Appendix D1 is more difficult to characterise, as it depends strongly on the number of nodes neighbouring the putative component throughout the procedure. In pathological cases, the cost is O(nS · P(S, t)) = O(nS⁴ log t), where S is the maximum number of nodes permitted in the component and n is the size of the graph. Luckily, this upper bound is attained only in complete graphs, and so is of little relevance, as most real networks are far sparser. Moreover, by specifying the maximum component size S, one can cap the computational resources spent trying to find a component.

Potential optimizations include using a random walk to highlight likely candidate neighbours; for instance, by choosing only the l nodes that a random walker uniformly distributed in C would most likely walk to in the next step, or, for the removal of nodes, the l nodes in C that carry the least density of probability.
Such an algorithm would only cost O(S · P(S, t)), a significant improvement.

More subtly, the computational cost of the matrix powers might also be reduced by taking advantage of the fact that Q(C∗) for each of the neighbouring components is effectively a rank-2 perturbation of Q(C). Furthermore, as briefly mentioned in the discussion, severability is only one way of quantifying the mixing and retention of random walkers; other, alternative measures may be found that are quicker to compute.
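The basic computation discussed above can be sketched as follows. The half-and-half combination of retention and mixing used here is schematic (the precise definition of severability is given in the main text), and the mixing is measured as a worst-case total-variation distance purely for illustration; the cost profile, however, matches the one just described.

```python
import numpy as np

def severability_sketch(P, C, t):
    """Illustrative severability of node set C under row-stochastic P at
    Markov time t.  matrix_power uses repeated squaring, i.e. O(log t)
    k-by-k multiplications, the dominant O(k^3 log t) step."""
    Q = P[np.ix_(C, C)]                    # k x k substochastic block of C
    Qt = np.linalg.matrix_power(Q, t)      # dominant cost: O(k^3 log t)
    rho = Qt.sum() / len(C)                # retention (mass kept in C): O(k^2)
    rows = Qt / Qt.sum(axis=1, keepdims=True)
    # mixing as worst-case total-variation distance to the mean row: O(k^2)
    mu = 0.5 * np.abs(rows - rows.mean(axis=0)).sum(axis=1).max()
    return 0.5 * (rho + 1.0 - mu)          # schematic combination
```

A block that retains all walkers and mixes them perfectly scores 1; a disconnected, leaky node pair scores far lower.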
3. Benchmarking against community detection methods
Optimal component cover.
To compare against benchmarks with overlapping components, it is necessary to generate a list of components that covers the network. Simply taking the optimal component of each node is suboptimal, because the list then contains many duplicate components. Instead, we chose the following naive method:

1. Let components = ∅ be the set of components; let covered be the set of nodes that have been assigned to at least one component.

2. Choose a node x that is more connected to unassigned nodes than to nodes in covered. If no such node exists, end.

3. Find the optimal component C(x) for x, and add C(x) to components and the nodes in C(x) to covered.

4. Repeat from step 2.

Partitioning.
To compare severability with partitioning methods, it is necessary to turn the optimal component cover into a partition. To do so, first order the components of the cover arbitrarily; where a node appears in multiple components, always assign it to the first component it appears in. This procedure obviously depends on the ordering of the components; however, in networks with a well-defined partition structure, the method works sufficiently well, as demonstrated on the LFR benchmark.
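The cover and partitioning procedures above can be sketched as follows; `optimal_component` is a placeholder for the severability optimisation of Appendix D1.

```python
def component_cover(nodes, neighbours, optimal_component):
    """Naive cover procedure: repeatedly pick a node more connected to
    uncovered than to covered nodes, and take its optimal component.
    `optimal_component` stands in for the optimisation of Appendix D1."""
    components, covered = [], set()
    while True:
        candidates = [x for x in nodes
                      if len(neighbours[x] - covered) > len(neighbours[x] & covered)]
        if not candidates:
            return components
        comp = optimal_component(candidates[0])
        components.append(comp)
        covered |= comp

def cover_to_partition(cover):
    """First-occurrence rule: each node is assigned to the first component
    of the (arbitrarily ordered) cover that contains it."""
    assignment = {}
    for label, comp in enumerate(cover):
        for v in comp:
            assignment.setdefault(v, label)
    return assignment
```

On a graph of two triangles joined by a single edge, the cover consists of the two triangles, and the induced partition assigns each node to its triangle.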
Choice of Markov Time.
For hierarchical networks, the Markov time serves as a useful resolution parameter, allowing severability to pick out optimal component structure at different levels. However, existing metrics [14, 39] require the selection of a single time t. For partitions, this can be done by choosing the Markov time that minimises the number of singleton and overlapping vertices, but other values of t could be chosen.

Quantifying similarity of partitions.
To compare partitions across different methods, normalised mutual information [14] has been employed. To compare component covers, we use a generalisation of normalised mutual information that allows for overlapping nodes [39]. We refer to the generalised variant simply as "normalised mutual information", without loss of precision, since only the generalised variant can be used in the benchmarks with overlapping components.
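For reference, the standard (non-overlapping) normalised mutual information between two partitions can be computed as below; the overlapping generalisation of Ref. [39] is more involved and is not reproduced here.

```python
import math
from collections import Counter

def nmi(part_a, part_b):
    """Normalised mutual information NMI = 2 I(A;B) / (H(A) + H(B))
    between two partitions, each a dict mapping node -> community label."""
    nodes = part_a.keys() & part_b.keys()
    n = len(nodes)
    ca = Counter(part_a[v] for v in nodes)              # community sizes in A
    cb = Counter(part_b[v] for v in nodes)              # community sizes in B
    joint = Counter((part_a[v], part_b[v]) for v in nodes)
    mi = sum(c / n * math.log(c * n / (ca[a] * cb[b]))  # mutual information
             for (a, b), c in joint.items())
    ha = -sum(c / n * math.log(c / n) for c in ca.values())  # entropy H(A)
    hb = -sum(c / n * math.log(c / n) for c in cb.values())  # entropy H(B)
    return 2 * mi / (ha + hb) if ha + hb else 1.0
```

Identical partitions score 1; comparing against the trivial one-community partition scores 0.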
Appendix E: Linearization and discretization of a network of Kuramoto oscillators
For power networks, in a number of situations of practical relevance [20, 21], e.g. when operating in the regime where the frequencies θ̇ᵢ have almost synchronized, the term Mᵢθ̈ᵢ can be reasonably neglected and one may linearize around the steady-state trajectory to obtain

Δθ̇ᵢ = Σⱼ Dᵢ⁻¹ Aᵢⱼ Δθⱼ = Σⱼ Lᵢⱼ Δθⱼ,   (E1)

where Aᵢᵢ is defined as −Σ_{j≠i} Aᵢⱼ. The matrix L is called the Laplacian of the network, as it plays the same role on graphs as the Laplace operator in continuous space. It is important to note that this equation also fully characterizes the consensus model of opinion dynamics [48], the heat equation, and random walkers diffusing through the network in continuous time [9]; to wit, the θᵢ represent, respectively, converging opinions, equalizing temperatures, or the expected fraction of walkers on node i at any given time.

In order to build a discrete-time random walk to which our framework can be directly applied, we choose a timestep δ proportional to 10⁻ᵖ, where p is the smallest natural number such that the modified adjacency matrix A = δL + I is strictly positive. We then measure the severability of the random-walk dynamics on the graph defined by the modified adjacency matrix A.

Appendix F: Variants of the LFR benchmark

1. Unweighted, undirected, non-overlapping LFR networks
We analyse a class of networks in which components are extremely unevenly sized, a situation in which many popular partitioning methods perform suboptimally. These multi-scale networks are randomly constructed such that both the degree and component-size distributions follow power laws, with exponents γ and β, respectively. Additional parameters include the total number of nodes N, the average degree ⟨k⟩, the maximum degree k_max, and the intrinsic parameter µ (not to be confused with the mixing µ(C, t), which is part of severability). The fraction of links from a node to other nodes within the same component is given by 1 − µ [40]. Graph-generation parameters were chosen at values typical of real networks: γ = 2, β = 2, N = 1000, ⟨k⟩ = 15, and k_max = 50 [40]. Severability optimisation was performed with a maximum search size S = 50, and partitions were generated from the component cover.

As can be seen in Figure 7, severability performs well, always finding the natural component structure up until around µ = 0.5, when components are no longer defined in a strong sense [51]; that severability begins failing at µ = 0.5 is therefore expected.
2. Unweighted, undirected, overlapping LFR networks
Further extensions to the LFR benchmark were implemented to allow for components to overlap [37]. In Figure 8, we compare the component covers found by severability to the pre-seeded components. For the optimisation, maximum search sizes S = 50 and S = 100 were used for the upper and lower panels, respectively. The parameters chosen were identical to those used for the evaluation of k-clique percolation [49] in figure 6 of Ref. [38]. Comparison with those results shows that severability performs comparably for the smaller component sizes, but significantly better for larger components.
3. Weighted, directed, overlapping LFR networks
Severability also loses no accuracy when direction and weight are added to the benchmark [37] (as seen in Figure 9). This is expected, since the Markov-chain formulation naturally includes both. For the optimisation shown, the maximum search size was S = 100.

Appendix G: Image processing
The image in Fig. 5 of the main text was pre-processed by reducing the image resolution to a more convenient size and converting to a network using standard methods. Briefly, we connect only adjacent pixels (under the maximum metric) with link weight w = exp[−(ΔI)²/σ_I²], where ΔI is the difference in luminosity and σ_I is an adjustable parameter controlling the exponential weight decay. Here we used σ_I = 20. Severability was optimized with Markov time t = 32 and maximum size S = 200. In a post-processing step, segments with too high a mixing µ or too low a retention ρ were discarded; where a component C′ was completely embedded in another component C (C′ ⊂ C), we kept only the one with the higher severability. Components were then inductively merged if they overlapped by more than 20 pixels, until no more merges were possible. Merging is generally relevant when a feature of the network is much larger than the maximum search size; in this case the optimisation method gives overlapping patches of the background, which can then be pieced together. The segments were ordered by average luminosity, and the darker patches were assigned to the background.

FIG. 7. Comparison of severability with modularity and Infomap on the LFR benchmarks with exponents γ = 2, β = 2, average degree ⟨k⟩ = 20, and maximum component size of 50. Severability optimisation was performed with a maximum search size of 50 and Markov time t = 3 (a value determined by minimising the number of orphan and overlapping nodes). Modularity was optimised using both simulated annealing [25], which is extremely slow but gives good results, and the faster heuristic of Blondel et al. [5]. Each point is an average over ten random realisations.

FIG. 8. Severability at Markov time t = 4 on an unweighted, undirected, overlapping variant of the LFR benchmark [37]. The networks have 1000 nodes; the other parameters are τ₁ = 2, τ₂ = 1, ⟨k⟩ = 20, k_max = 50. Panels correspond to µ_t = 0.1 and µ_t = 0.3, with component sizes (s_min, s_max) = (10, 500) in the upper panels and (20, 100) in the lower panels. Each point is an average over five random realisations.

FIG. 9. Severability at Markov time t = 4 on a weighted, directed, overlapping variant of the LFR benchmark [37]. The networks have 1000 nodes; the other parameters are τ₁ = 2, τ₂ = 1, ⟨k⟩ = 20, k_max = 50, s_min = 20, s_max = 100, and a fixed weight-mixing parameter µ_w. Panels correspond to µ_t = 0.1 and µ_t = 0.3. Each point is an average over five random realisations.
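A minimal sketch of the pixel-graph construction of Appendix G, assuming the squared-exponential weight form given above:

```python
import numpy as np

def image_to_weights(I, sigma=20.0):
    """Connect each pixel of a greyscale image I to its 8 neighbours
    (maximum metric) with weight exp(-(dI)^2 / sigma^2), where dI is the
    luminosity difference.  Returns a dict mapping pixel pairs to weights;
    each undirected link is stored once."""
    h, w = I.shape
    W = {}
    for y in range(h):
        for x in range(w):
            # the four "forward" directions cover each neighbour pair once
            for dy, dx in ((0, 1), (1, 0), (1, 1), (1, -1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    dI = float(I[y, x]) - float(I[ny, nx])
                    W[((y, x), (ny, nx))] = np.exp(-dI**2 / sigma**2)
    return W
```

Pixels of equal luminosity get weight 1; a luminosity gap equal to sigma gets weight e⁻¹, so edges of the image become weak links that severability can cut.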
FIG. 10. Ring of rings. As in Figure 3, heavy lines (within rings) correspond to undirected links of weight 2, and light lines between rings to links of weight 1. Severability is able to recover the seeded ring structure (at Markov times 3 ≤ t < …).

Appendix H: Ring-of-rings
We also examined the results of running several other popular graph partitioning methods on the ring-of-rings network shown in Figure 3. Infomod, Infomap, and Modularity were all unable to recover the ring structure of the graph (Figure 10).
Appendix I: Square lattice
As a negative control, it is instructive to consider a network in which there is clearly no structure. For that, we chose a regular 2-D square lattice with each node connected to all 8 of its neighbours (including diagonal links). We visualise this using a uniformly coloured discrete image, in which each pixel is connected to all of the adjacent pixels with links of equal strength. As can be seen in Figure 11, after accounting for symmetry considerations, all components found are transients, which is the expected result.

Additionally, these images strongly suggest a relationship between severability optimisation and diffusion. This is of course quite closely related both to the dependence of severability on random-walk dynamics and to the optimisation procedure outlined in Appendix D1. Along these lines, the optimisation procedure we outlined can be thought of as a modified random walk in which previously explored states are immediately accessible to the random walker, but probability barriers in the "energy landscape" are magnified.

FIG. 11. (top) Correlation of Markov time with component size on a square lattice; the fitted line is S(t) = 5.5948t − 1.688 with R² = 0.98928. (bottom) The transient components found by severability on a regular lattice at a sequence of increasing Markov times. Each block is connected to all eight of its neighbouring blocks by a single undirected edge of weight 1.

FIG. 12. Ring of small-world networks of sizes 5, 10, 20, and 40 generated using the Watts-Strogatz model. (a) At different times, severability recovers each small-world network, as expected. Additionally, when the largest small world is in isolation, severability still correctly recovers it as a component. (b) OSLOM recovers the three larger small worlds, but splits up the smallest one, even though in other experiments it can recover 5-cliques. Additionally, when the largest small world is given in isolation, OSLOM breaks it up, giving three overlapping communities instead.
Appendix J: Ring of small-worlds: commutativity & locality
We further explore commutativity, as in Figure 5, by looking at a ring of small-world networks and comparing against OSLOM. We first generate small-world networks using the Watts-Strogatz model: each node is first connected to its 2 nearest neighbours on both sides, and every edge is then rewired with independent probability 0.1, but such that multi-edges cannot exist, so a small world with a total of 5 nodes will not be rewired away from a 5-clique.

Note that whereas severability gives the same results when looking at a single small-world network as for a ring of four of them, OSLOM does not. Some of this is equivalent behaviour, as OSLOM chooses not to consider the entire network as a valid community. For the small worlds of size 5, 10, and 20, OSLOM returns all individual nodes, which is as valid an answer as the entire network. However, for the largest of the small-world networks, of size 40, OSLOM chooses to split it into 3 pieces, which is not what it chose in the ring of 4 small worlds. Severability always recovers the small world at the appropriate times, as it is truly local.

Additionally, OSLOM demonstrates trouble when the scales of the networks are very different. It is unable to recover the 5-clique of the smallest small world, despite the 5-clique being recoverable when the other communities are of the same size. This comes from the same resolution being implicitly imposed on all communities by OSLOM.
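The small-world construction described above can be sketched as follows; the rewiring convention (which endpoint of a rewired edge is kept) is an illustrative choice, as the text does not specify it.

```python
import random

def watts_strogatz(n, k=2, p=0.1, seed=0):
    """Ring of n nodes, each joined to its k nearest neighbours on both
    sides; every edge is then rewired with probability p, disallowing
    self-loops and multi-edges, so a 5-node network (already a 5-clique
    for k = 2) is never rewired.  Edges are returned as frozensets."""
    rng = random.Random(seed)
    edges = {frozenset((i, (i + j) % n)) for i in range(n)
             for j in range(1, k + 1) if i != (i + j) % n}
    for e in list(edges):
        if rng.random() < p:
            u = min(e)  # keep one endpoint, rewire the other (a convention)
            choices = [v for v in range(n)
                       if v != u and frozenset((u, v)) not in edges]
            if choices:  # no valid target means the edge stays (5-clique case)
                edges.remove(e)
                edges.add(frozenset((u, rng.choice(choices))))
    return edges
```

Since every rewiring removes one edge and adds one, the edge count is preserved, and the 5-node case stays a complete graph.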
FIG. 13. A 5-clique of 5-cliques attached to a small-world network of size 50 generated using the Watts-Strogatz model. (above) At Markov time 2, the 5-cliques are recovered, but not the size-50 small world. (below) At Markov time 16, the size-50 small world is recovered, but the 5-cliques aggregate into a 5-clique of 5-cliques. At no single time are both the 5-cliques and the size-50 small world simultaneously recovered, because they exist on different time scales.
Appendix K: Co-existence of different timescales

Appendix L: Word Association Extended
Figure 4 only depicted the components including "nature" and the orphans directly connected to that word. However, this is only a small snippet of the entire network. Here, we display all the other components that have at least one link to "nature" but do not include the word itself. As with Figure 4, the maximum search size is S = 50 and the Markov time is t = 2.