Analysis of contagion maps on a class of networks that are spatially embedded in a torus
BARBARA I. MAHLER ∗, ULRIKE TILLMANN †, AND MASON A. PORTER ‡

Abstract.
A spreading process on a network is influenced by the network’s underlying spatial structure, and it is insightful to study the extent to which a spreading process follows such structure. We consider a threshold contagion on a network whose nodes are embedded in a manifold and where the network has both ‘geometric edges’, which respect the geometry of the underlying manifold, and ‘non-geometric edges’ that are not constrained by that geometry. Building on ideas from Taylor et al. [45], we examine when a contagion propagates as a wave along a network whose nodes are embedded in a torus and when it jumps via long non-geometric edges to remote areas of the network. We build a ‘contagion map’ for a contagion spreading on such a ‘noisy geometric network’ to produce a point cloud, and we study the dimensionality, geometry, and topology of this point cloud to examine qualitative properties of this spreading process. We identify a region in parameter space in which the contagion propagates predominantly via wavefront propagation. We consider different probability distributions for constructing non-geometric edges (reflecting different decay rates with respect to the distance between nodes in the underlying manifold) and examine the effect of such choices on the qualitative properties of the spreading dynamics. Our work generalizes the analysis in Taylor et al. and consolidates contagion maps both as a tool for investigating spreading behavior on spatial networks and as a technique for manifold learning.

Key words. spreading dynamics, contagions, spatial networks, manifold learning, topological data analysis
AMS subject classifications.
1. Introduction.
Spreading dynamics are ubiquitous in many situations, including social settings and biological processes. The spreading of a contagious disease or of an idea between people are two obvious examples, and various other phenomena also give rise to spreading processes on networks [30, 36].

The spreading of real-world contagions is often guided by the geometry of the underlying domain [10, 22, 42, 50]. One example is the spread of contagious diseases. Historically, such diseases spread gradually along part of the earth’s surface. Similarly, when means of transportation and communication are limited, information typically disseminates via entities that are physically close. In such cases, contagions often propagate as a wavefront, passing between geometrically close entities. However, with modern transportation and communication technology, there are now many scenarios where a contagion can also spread via connections that are not intrinsically geometric [1, 6], even in the presence of a well-defined underlying geometry, such as the spherical surface of the earth. Examples of such scenarios include the spreading of an infectious disease via passengers traveling on a long-distance flight and the dissemination of information via social media. In these examples, a contagion jumps across space to distant locations, rather than following the geometry of an underlying domain.

∗ Mathematical Institute, University of Oxford, Oxford OX2 6GG, UK ([email protected]). This author acknowledges a studentship from the EPSRC under grant EP/G03706X/1.
† Mathematical Institute, University of Oxford, Oxford OX2 6GG, UK, and The Alan Turing Institute, 96 Euston Road, London NW1 2DB, UK ([email protected]). This author was supported by the EPSRC through grants EP/R018472-1 (TDA network) and EP/N510129/1 (ATI).
‡ Department of Mathematics, University of California Los Angeles, California 90095, USA ([email protected]).
arXiv [cs.SI], Feb.

One way to study such phenomena is to consider contagion models on networks that are embedded in some underlying geometric space [39]. In particular, one can consider networks that have both geometric edges, which respect the geometry of the underlying space in the sense that they can only connect nodes that are close to each other according to the space’s metric, and non-geometric edges, which are not constrained by the underlying geometry and can connect nodes that are far from each other. Following terminology from [45], we refer to such networks as noisy geometric networks. It is interesting and important to ask [30, 36, 40] what propagation pattern(s) a contagion follows and how much such patterns are influenced by the structure of the underlying space. Two fundamental spreading mechanisms that can occur on a noisy geometric network are wavefront propagation (WFP) and the appearance of new clusters (ANC). Wavefront propagation is the spreading of a contagion along the structure of the underlying space via geometric edges. The appearance of new clusters occurs when long-range, non-geometric edges connect activated nodes with nodes in a region of the network that has been unaffected by the contagion and thereby lead to a new cluster of activated nodes in this previously unaffected region. We can view the non-geometric edges as ‘bridges’ that can accelerate the spreading process considerably, especially if they are ‘long’. In a threshold model (a type of ‘complex contagion’ [8, 30]), in which a sufficient fraction or number of nodes in a focal node’s neighborhood need to be active to activate that node, bridges also need to be sufficiently ‘wide’ to encourage ANC.

Taylor et al. [45] used methods from topological data analysis and nonlinear dimensionality reduction to study spreading behavior of a threshold contagion model on noisy geometric networks.
They explored the occurrence of WFP and ANC on a noisy ring lattice, and they examined the extent to which the spreading process follows the ring structure. To investigate the extent to which a complex contagion on a network adheres to the structure of the underlying space, they introduced the notion of a contagion map, which maps each node of a network to a point in $\mathbb{R}^n$ based on its activation times in different realizations of a contagion process. It thereby produces a point cloud that one can view as a geometrical distortion of the network that reflects the contagion’s spreading behavior. To see if they could identify the structure of the underlying space, Taylor et al. examined the geometry, dimensionality, and topology of such point clouds. They compared their results with a bifurcation analysis of the contagion on noisy ring lattices and found that the contagion map successfully recovers the geometry, dimensionality, and topology of the underlying space exactly when the contagion propagates predominantly by WFP. This illustrates that one can use contagion maps to illuminate propagation patterns of spreading processes on noisy geometric networks whose underlying space is known. Moreover, they found that on noisy ring lattices, WFP occurs for a wide range of the network and contagion parameters, suggesting that contagion maps are a viable tool for inferring the structure of the underlying space of a noisy geometric network from contagion dynamics on it. That is, one may be able to use contagion maps as a technique for manifold learning.

We follow the approach of [45], and we build on their ideas through a study of a new example. We still use a threshold contagion model, but we consider a more complicated family of noisy geometric networks. Our networks, which one can construe as geometrically embedded in a flat torus, are similar to the Kleinberg small-world model [29].
We use a contagion map to construct a point cloud that represents the dynamics of the contagion from a set of different initial conditions. We then examine the structure of this point cloud in three different ways: topologically (via the homology of a space built on the point cloud), geometrically (via distances between pairs of points), and with respect to dimensionality (via the approximate embedding dimension). We compare our findings to the topological and geometric structure,
2. Network model. A geometric network is a network whose nodes are embedded in some metric space and whose edges, called geometric edges, can occur only between pairs of nodes that are sufficiently close in this space [2, 3].

One can build a noisy geometric network from a geometric network by adding so-called non-geometric (i.e., ‘noisy’) edges between pairs of nodes that can be distant from each other in the underlying metric space. For example, in synthetic noisy geometric networks, the nodes may be located on a manifold that is embedded in an ambient Euclidean space. One can place geometric edges between all or some of the node pairs that are at distances from each other that are below some fixed threshold, and one can then add non-geometric edges uniformly at random (see Figure 1(a)) or following some other probabilistic or deterministic rule.

As another example (see Figure 1(b)), one can add noise to node locations in the ambient space and place a non-geometric edge between any two nodes that are close in the ambient space but not close with respect to geodesic distance along the manifold. Many nonlinear dimension-reduction techniques, such as diffusion maps and Isomap [4, 9, 16, 44, 46], start by inferring a proximity network from point cloud data, such as by connecting each point to its k nearest neighbors, with the goal of finding underlying low-dimensional structure of the point cloud. One can view such proximity networks as noisy geometric networks, as it is possible for nodes to be adjacent even when they are not close on the underlying manifold, and nonlinear dimension-reduction techniques seek to find purely geometric structures in such networks.
Fig. 1. Examples of noisy geometric networks. (a) Nodes lie on a 2D sphere embedded in $\mathbb{R}^3$. There are geometric edges (blue) between nodes that are close to each other on the sphere, and we place non-geometric edges (red) uniformly at random. (b) The noisy Swiss roll. Nodes correspond to a noisy sample from a bounded 2D manifold that is embedded in $\mathbb{R}^3$ as a roll. There are geometric edges (blue) between nodes that are close to each other on the manifold, and there are non-geometric edges (red) between nodes that are close to each other in $\mathbb{R}^3$ but not close to each other on the manifold. (c,d) A Kleinberg-like small-world network. It consists of (c) a regular lattice of geometric edges (blue) that we (d) ‘wrap up’ into a torus. We place non-geometric edges (red) according to some probability distribution.

We consider a family of noisy geometric networks that are embedded geometrically in a 2D manifold, with the nodes spread evenly on the surface of a flat torus. The torus has Betti numbers $\beta_0 = 1$, $\beta_1 = 2$, and $\beta_2 = 1$ (we define Betti numbers in Definition 8.3 in the Supplementary Material), so there are multiple nontrivial topological features that one can take into account when comparing the structure of a contagion map to that of the underlying space.

Our noisy geometric network is a variant of the Kleinberg small-world model [29] (see Figure 1(c,d)). We start with a periodic square lattice of $N = n \times n$ nodes:
\[ V = (\mathbb{Z} \times \mathbb{Z}) / (n\mathbb{Z} \times n\mathbb{Z}), \]
so $(i_x, i_y) = (kn + i_x, ln + i_y) \in V$ for all $k, l \in \mathbb{Z}$. We define the periodic lattice distance $\mu_{\mathrm{per}}$ between nodes $(i_x, i_y), (j_x, j_y) \in V$ to be
\[ \mu_{\mathrm{per}}((i_x, i_y), (j_x, j_y)) = |i_x - j_x|_{\mathrm{per}} + |i_y - j_y|_{\mathrm{per}}, \tag{2.1} \]
where
\[ |a|_{\mathrm{per}} = \min \{ b \in \{0, 1, \ldots, n-2, n-1\} \mid b = kn + a \text{ or } b = kn - a \text{ for some } k \in \mathbb{Z} \}, \]
and we take the sum of the two residues in (2.1) in $\mathbb{Z}$. The periodic lattice distance is the regular lattice distance, but opposite sides of the lattice are considered to be close to each other.
We can thus think of the lattice as being ‘wrapped up’ into a 2D torus, which has no boundary. In other words, we are using periodic boundary conditions.

We fix $p \in \mathbb{R}_{>0}$ and place a geometric edge between any two nodes whose (periodic) Euclidean distance from each other is within $p$. That is, we place a geometric edge between nodes $i = (i_x, i_y)$ and $j = (j_x, j_y) \in V$ if and only if
\[ |i_x - j_x|^2 + |i_y - j_y|^2 \le p^2. \]
We call the number of geometric edges that are incident to a node $i$ its geometric degree, which we denote by $d_G(i)$.

We give each node $q \in \mathbb{N}$ ‘non-geometric stubs’, and we connect pairs of stubs to build non-geometric edges as follows. We connect a non-geometric stub from node $i$ to a stub from node $j$ with a probability that is proportional to $\mu_{\mathrm{per}}(i, j)^{-\gamma}$, where $\gamma \in \mathbb{R}_{\ge 0}$ is a fixed parameter. We call the number of non-geometric edges incident to a node $i$ its non-geometric degree, which we denote by $d_{NG}(i)$ (where $d_{NG}(i) = q$, by definition). The degree of a node $i$ is $d(i) = d_G(i) + d_{NG}(i)$, and the class of networks that we just defined consists of regular networks of uniform degree $d = d(i)$ for all $i \in V$. When $\gamma = 0$, we match the non-geometric stubs uniformly at random. For $\gamma > 0$, non-geometric edges tend to connect nodes that are close with respect to the periodic lattice distance, and this tendency becomes more pronounced as $\gamma$ increases.
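The geometric part of this construction can be sketched in a few lines of Python. The sketch below is an illustration only, not the authors' code: the function names (`per`, `mu_per`, `geometric_neighbors`, `stub_weight`) are hypothetical, it assumes $n > 2p$ so that wrapped offsets do not collide, and it computes only the unnormalized stub-pairing weight $\mu_{\mathrm{per}}(i, j)^{-\gamma}$ for $i \neq j$ rather than performing the stub matching itself.

```python
import math

def per(a, n):
    """Periodic residue |a|_per: distance from a to the nearest multiple of n."""
    r = a % n
    return min(r, n - r)

def mu_per(i, j, n):
    """Periodic lattice distance of equation (2.1) between nodes i and j."""
    return per(i[0] - j[0], n) + per(i[1] - j[1], n)

def geometric_neighbors(i, n, p):
    """Nodes within periodic Euclidean distance p of node i (assumes n > 2p)."""
    reach = int(math.floor(p))
    out = set()
    for dx in range(-reach, reach + 1):
        for dy in range(-reach, reach + 1):
            # Small tolerance guards against rounding when p is irrational.
            if (dx, dy) != (0, 0) and dx * dx + dy * dy <= p * p + 1e-9:
                out.add(((i[0] + dx) % n, (i[1] + dy) % n))
    return out

def stub_weight(i, j, n, gamma):
    """Unnormalized weight mu_per(i, j)^(-gamma) for pairing stubs (i != j)."""
    return mu_per(i, j, n) ** (-gamma)

# Geometric degrees for the three values of p used later in the paper.
degrees = [len(geometric_neighbors((0, 0), 50, p)) for p in (1, math.sqrt(2), 2)]
```

Under these assumptions, the three choices $p = 1, \sqrt{2}, 2$ give geometric degrees of 4, 8, and 12, respectively.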
3. Contagion model. A contagion on a network is a dynamical process in which nodes become successively ‘activated’, starting from some initial condition. A common type of initial condition is that a set of ‘seed’ nodes are active at time $t = 0$ [27, 38]. We examine the Watts threshold model (WTM) [48] (see also [18, 47]), one of the simplest and best-studied models for a contagion on a network.

Let $V$ denote the set of nodes of a network, and let $A = (A_{ij})_{i,j \in V}$ be the adjacency matrix of the network. In our contagion, each node can be either active or inactive, and we denote the state of node $i \in V$ at time $t$ by $\eta_i(t)$, which takes the value 1 if it is active and the value 0 if it is inactive. We call the set of nodes that are active at time $t = 0$ a contagion seed, and we denote the seed set by $S \subseteq V$. That is, $\eta_s(0) = 1$ for all $s \in S$ and $\eta_i(0) = 0$ for all $i \in V \setminus S$. If $S$ consists of a single node, the initial condition is called ‘node seeding’; if $S$ consists of a node together with its neighbors, the initial condition is called ‘cluster seeding’. For a given homogeneous threshold $T$, we update node states synchronously in discrete time steps according to the following rule. If $\eta_i(t) = 1$, then $\eta_i(t+1) = 1$. If $\eta_i(t) = 0$, then $\eta_i(t+1) = 1$ if and only if $f_i > T$, where
\[ f_i = \frac{1}{d} \sum_{j \in V} A_{ij} \eta_j(t). \]
In other words, a node activates at a time step if the fraction of its neighbors that are active is larger than $T$ at the previous time step. Once a node is active, it stays active forever. For a fixed homogeneous threshold $T$ and a given seed $S$, this contagion is a deterministic and monotonic process, which eventually reaches a stable state in which either all of the nodes are active or some nodes are inactive and will never activate. (It is ‘monotonic’ in the sense that a node that activates stays active forever.) For a given network and a given seed set $S$, this deterministic process is one ‘realization’ of the contagion.
Formally, a realization $R$ is the nested sequence of subsets of $V$ that are active at successive time steps: $R = \{S = S_0, S_1, \ldots, S_m, \ldots\}$ such that $S_t = \{i \in V \mid \eta_i(t) = 1\}$.
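As a minimal sketch of the update rule above (not the authors' implementation), the following Python function runs one realization of the WTM and records activation times. The names `run_wtm` and `cluster_seed` and the adjacency-list representation are assumptions made for illustration. The fraction $f_i$ is computed with each node's own degree, which coincides with $1/d$ on the regular networks of Section 2.

```python
def cluster_seed(neighbors, j):
    """Cluster seeding: node j together with all of its neighbors."""
    return {j} | set(neighbors[j])

def run_wtm(neighbors, seed, threshold):
    """Synchronous Watts threshold model.

    neighbors: dict mapping each node to a list of adjacent nodes.
    Returns a dict {node: activation time}; nodes that never activate
    are absent from the result.
    """
    active = set(seed)
    activation = {s: 0 for s in active}
    t = 0
    while True:
        # Evaluate every inactive node against the state at time t.
        newly = [
            i for i, nbrs in neighbors.items()
            if i not in active and nbrs
            and sum(k in active for k in nbrs) / len(nbrs) > threshold
        ]
        if not newly:  # stable state reached
            return activation
        t += 1
        active.update(newly)
        activation.update((i, t) for i in newly)

# Example: a ring of six nodes with node seeding at node 0.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
times = run_wtm(ring, {0}, 0.4)
```

On this ring, a threshold of 0.4 lets the contagion spread one step per time unit in both directions, whereas a threshold of 0.5 stops it at the seed because the rule requires $f_i$ to be strictly larger than $T$.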
4. Methods.
We construct a point cloud by mapping the network nodes to points in $\mathbb{R}^N$ based on their activation times during different realizations of the contagion dynamics. This so-called contagion map was first studied in [45] and is inspired by approaches, such as diffusion maps and Isomap [4, 9, 16, 44, 46], from nonlinear dimension reduction. One can construe a point cloud that is the image of a contagion map as a distortion of an underlying network structure that reflects the contagion dynamics. We analyze the structure of this point cloud from three different perspectives (topologically, geometrically, and with respect to dimensionality) and compare it to the structure of the underlying network. We expect the structure of the point cloud in $\mathbb{R}^N$ to resemble the structure of the underlying network when the contagion spreads predominantly via WFP. We perform a bifurcation analysis to identify regions in the parameter space for which WFP is the predominant mode of propagation, and we use the results of this analysis to validate that the structure of the underlying network is recovered in the point cloud whenever the contagion spreads predominantly via WFP.

To compare the topology, geometry, and dimensionality of a point cloud to the structure of the network on which it is based, we need to specify this structure precisely. Specifically, we need to choose a metric space associated with the network, and we need to specify the locations of the network’s nodes in this metric space.

We consider the torus as the Cartesian product of two circles:
\[ \mathbb{T}^2 = \tfrac{1}{2\pi} S^1 \times \tfrac{1}{2\pi} S^1 \subset \mathbb{C} \times \mathbb{C} \cong \mathbb{R}^4. \tag{4.1} \]
We evenly distribute the nodes of our network on this torus $\mathbb{T}^2$. The $N = n^2$ nodes of our network are the points on $\mathbb{T}^2$ with coordinates
\[ \frac{1}{2\pi} \left( \cos\frac{2\pi x}{n}, \sin\frac{2\pi x}{n}, \cos\frac{2\pi y}{n}, \sin\frac{2\pi y}{n} \right), \quad x, y \in \{0, 1, \ldots, n-1\}. \tag{4.2} \]
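The node placement in (4.2) can be written out directly. The following Python sketch is illustrative only; the names `torus_point`, `torus_nodes`, and `euclidean` are hypothetical. Each circle factor has radius $1/(2\pi)$ and therefore circumference 1.

```python
import math

def torus_point(x, y, n):
    """Coordinates (4.2) of lattice node (x, y) on the torus in R^4."""
    r = 1.0 / (2.0 * math.pi)
    ax = 2.0 * math.pi * x / n
    ay = 2.0 * math.pi * y / n
    return (r * math.cos(ax), r * math.sin(ax), r * math.cos(ay), r * math.sin(ay))

def torus_nodes(n):
    """All N = n^2 node positions, keyed by lattice coordinates (x, y)."""
    return {(x, y): torus_point(x, y, n) for x in range(n) for y in range(n)}

def euclidean(u, v):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

pts = torus_nodes(4)
nearest = euclidean(pts[(0, 0)], pts[(1, 0)])  # chord between lattice neighbors
```

Distances computed this way are chord lengths in the ambient space $\mathbb{R}^4$, not geodesic distances along the torus.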
4.1. Contagion maps. Consider our WTM contagion, with homogeneous threshold $T$, on one instantiation of our Kleinberg-like network for some fixed parameter values $n$, $p$, $q$, and $\gamma$. For a given seed set, the contagion dynamics is a deterministic process, and we can record the activation times of the nodes. We consider several realizations of the contagion dynamics initialized with different seeds. We denote the set of realizations by $J = \{R_1, R_2, \ldots, R_{|J|}\}$, and we denote the activation time of node $i \in V$ in realization $R_j \in J$ by $x_j^{(i)}$. If node $i$ is never activated in realization $R_j$, we set $x_j^{(i)} = 2N$ (i.e., larger than any actual activation time).

The regular contagion map associated to the set $J$ of realizations is a function from the set $V$ of nodes to $\mathbb{R}^{|J|}$. It is defined by
\[ i \mapsto x^{(i)} = [x_1^{(i)}, x_2^{(i)}, \ldots, x_{|J|}^{(i)}]. \]
The regular contagion map associated to $J$ maps each node in $V$ to a vector in $\mathbb{R}^{|J|}$ that records its activation times during each of the realizations.

We take $J$ to be the same size as $V$ and choose the seed sets to be the clusters around the different nodes, such that the seed that initializes realization $R_j \in J$ is $S^{(j)} = \{j\} \cup \{k \mid A_{jk} \neq 0\}$. In this case, the activation time of node $i$ in realization $j$ is a proxy for a distance between nodes $i$ and $j$. To see this, consider the realization of a contagion with homogeneous threshold $T = 0$ initialized with a single seed node $\{j\}$, and observe that the activation time of node $i$ is exactly the length of a shortest path between $i$ and $j$. For cluster seeding of our contagion, the activation time of node $i$ during realization $j$ may not be precisely the shortest-path distance between $i$ and $j$; it depends on how the contagion spreads. Moreover, $x_j^{(i)} \neq x_i^{(j)}$ in general. With this in mind, we define the reflected contagion map, which maps $i \mapsto y^{(i)} = [x_i^{(1)}, x_i^{(2)}, \ldots, x_i^{(|J|)}]$, and the symmetric contagion map, which maps $i \mapsto [x_1^{(i)} + x_i^{(1)}, \ldots, x_{|J|}^{(i)} + x_i^{(|J|)}]$.

The traditional use of the term ‘bifurcation’ [19] is to describe situations in dynamical-systems theory in which a system’s qualitative behavior changes, as a function of one or more parameters, in a mathematically quantifiable way (e.g., as expressed using a normal form), such as the onset of a limit cycle at a critical value of a parameter. The notion of bifurcation that we examine in the present paper is somewhat different in flavor from classical bifurcations, but we nevertheless examine qualitative changes in dynamics as we adjust parameters in a model.

4.2. Geometry. To quantify the similarity of the geometric structure of a contagion map to that of the network on which it is based, we calculate the Pearson correlation coefficient of pairwise distances between points of the point cloud and pairwise distances between corresponding nodes of the network. We use Euclidean distance in $\mathbb{R}^4$ for the nodes and Euclidean distance in $\mathbb{R}^N$ for points in the point cloud.

Recall that the nodes lie on (4.1) at points with coordinates (4.2). Let
\[ w^{(i)} = \frac{1}{2\pi} \left( \cos\frac{2\pi i_x}{n}, \sin\frac{2\pi i_x}{n}, \cos\frac{2\pi i_y}{n}, \sin\frac{2\pi i_y}{n} \right) \]
denote the point in $\mathbb{T}^2$ that is associated with node $i = (i_x, i_y)$. The distance between two such points is
\[ d\left(w^{(i)}, w^{(j)}\right) = \sqrt{\sum_{k=1}^{4} \left(w_k^{(i)} - w_k^{(j)}\right)^2} = \frac{1}{\pi} \left( \sin^2\frac{(i_x - j_x)\pi}{n} + \sin^2\frac{(i_y - j_y)\pi}{n} \right)^{1/2}, \]
and the distance between the corresponding points, $x^{(i)}$ and $x^{(j)}$, in the point cloud is
\[ d\left(x^{(i)}, x^{(j)}\right) = \sqrt{\sum_{k=1}^{N} \left(x_k^{(i)} - x_k^{(j)}\right)^2}. \]
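To make the definitions above concrete, the following Python sketch builds the regular, reflected, and symmetric contagion maps from recorded activation times and evaluates the point-cloud distance. It is a sketch under stated assumptions, not the authors' code: the names `activation_matrix`, `contagion_maps`, and `cloud_distance` are hypothetical, each realization is assumed to be stored as a dictionary from nodes to activation times, and realization $j$ is assumed to be seeded around the $j$th node so that the matrix of activation times is square.

```python
import math

def activation_matrix(realizations, nodes):
    """Matrix x with x[i][j] = activation time of node i in realization j.

    Nodes that never activate in a realization receive the value 2N.
    """
    never = 2 * len(nodes)
    return [[r.get(i, never) for r in realizations] for i in nodes]

def contagion_maps(x):
    """Regular, reflected, and symmetric contagion maps (one row per node)."""
    n = len(x)
    regular = [list(row) for row in x]
    reflected = [[x[k][i] for k in range(n)] for i in range(n)]
    symmetric = [[a + b for a, b in zip(regular[i], reflected[i])]
                 for i in range(n)]
    return regular, reflected, symmetric

def cloud_distance(u, v):
    """Euclidean distance between two points of the point cloud."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Toy example with three nodes; node 2 never activates in realization 0.
nodes = [0, 1, 2]
runs = [{0: 0, 1: 1}, {1: 0, 0: 1, 2: 1}, {2: 0, 1: 2}]
x = activation_matrix(runs, nodes)
reg, refl, sym = contagion_maps(x)
```

In this toy example, the regular map is not symmetric in $i$ and $j$, whereas the matrix of the symmetric map is symmetric by construction.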
Given ordered sets, $D_{\mathrm{net}}$ and $D_{\mathrm{map}}$, of pairwise distances between nodes of the network and points in the point cloud, respectively, we compute the Pearson correlation coefficient between these sets:
\[ \rho = \frac{\sum_{i=1}^{N} \sum_{j=i+1}^{N} \left[ d(w^{(i)}, w^{(j)}) - \overline{d(w^{(i)}, w^{(j)})} \right] \left[ d(x^{(i)}, x^{(j)}) - \overline{d(x^{(i)}, x^{(j)})} \right]}{\sqrt{\sum_{i=1}^{N} \sum_{j=i+1}^{N} \left[ d(w^{(i)}, w^{(j)}) - \overline{d(w^{(i)}, w^{(j)})} \right]^2} \sqrt{\sum_{i=1}^{N} \sum_{j=i+1}^{N} \left[ d(x^{(i)}, x^{(j)}) - \overline{d(x^{(i)}, x^{(j)})} \right]^2}}, \]
where
\[ \overline{d(w^{(i)}, w^{(j)})} = \frac{\sum_{i=1}^{N} \sum_{j=i+1}^{N} d(w^{(i)}, w^{(j)})}{(N^2 - N)/2} \]
is the mean pairwise distance between nodes of the network and $\overline{d(x^{(i)}, x^{(j)})}$ denotes the mean pairwise distance between points in the contagion map. Progressively larger Pearson correlation coefficients $\rho$ indicate progressively more similar geometric structures between the contagion map and its associated network.

4.3. Topology. We examine the topology of a contagion map by considering the persistent homology (PH) of the Vietoris–Rips (VR) filtration (see Definition 8.12 in the Supplementary Material) on its associated point cloud. We calculate PH using the software package Ripser. (Ripser is publicly available at https://github.com/Ripser/ripser.) We seek to quantify the extent to which topological features of a torus appear in the barcode that represents PH in a given dimension.
To do this, we calculate the Wasserstein distance $W[d]$ (see Definition 8.9 in the Supplementary Material) between this barcode and a ‘model barcode’ that represents topological features of a torus in the given dimension. As a ‘model barcode’, we choose the one that corresponds to the PH of the VR filtration on the regular point cloud on the torus in formula (4.2). Smaller Wasserstein distances correspond to more ‘torus-like’ point clouds, recovering the topology of the manifold in which the network’s nodes are embedded. Roughly speaking, a barcode exhibits the topological features of a torus when it has two dominant bars in dimension 1 and one dominant bar in dimension 2 (as well as one bar that never dies in dimension 0). We work with networks of $N = 50 \times 50$ nodes (see Section 2). Due to the computational complexity of computing 2D persistent homology of a VR filtration on 2500 points (it involves building up to about $1.6 \times 10^{12}$ simplices), we compute PH only up to dimension 1, which requires building only up to about $2.6 \times 10^{9}$ simplices for a given point cloud.

The Wasserstein distance between barcodes is sensitive to scaling. Consider, for instance, two barcodes that have the same number of bars, such that the relative lengths of the bars within each barcode are the same. Although these two barcodes represent the exact same topological features, albeit of different sizes, the Wasserstein distance between them is nonzero. Similarly, two barcodes that represent very similar topological features, but are at very different scales, may be at a larger Wasserstein distance from each other than two barcodes that represent different features but are close in ‘scale’. See Figure 3 for an illustration of this phenomenon.

In the present application, this sensitivity to scaling can manifest as follows. The model-torus barcode corresponding to regularly-spaced points on a torus constructed as the Cartesian product of two circles of circumference 1 has relatively short bars. For our contagion, larger values of $T$ entail slower spreading. So if we have two contagion maps, both of which arise from spreading by WFP without ANC, but one with large $T$ (slow propagation) and one with small $T$ (fast propagation), then both contagion maps have the same (torus-like) shape, but the former is much ‘larger’ than the latter. This implies, in turn, that the former’s corresponding barcode is farther away than the latter’s from the model-torus barcode. Similarly, when there are many non-geometric edges, there is fast spreading via ANC.
So although the contagion map should look like a cluster of points rather than a torus in this case, the Wasserstein distance from the corresponding barcode to the model-torus barcode may still be small, simply by virtue of the size of the point cloud, rather than its shape.

To counteract the above scaling issue, we ‘calibrate’ all barcodes before calculating the Wasserstein distance. We find the longest bar in each barcode and divide the birth and death times of all bars by that length. This yields barcodes whose longest bar has length exactly 1, so one can construe them to be at the same ‘scale’. The Wasserstein distance between these calibrated barcodes allows one to systematically compare topological features of the corresponding point clouds.

Our use of the term ‘scale’ differs from existing uses in topological data analysis. Two example uses of ‘scale’ in TDA are for the persistence of a topological feature in a filtration and the point in a filtration at which a feature appears.

[Figure 2: two barcode panels; axes: filtration step, homology classes.]
Fig. 2. (a) Uncalibrated and (b) calibrated barcodes of the PH of the VR filtration on the regularly-spaced point cloud on the torus from formula (4.2). The Wasserstein distance between the two barcodes is about …, although they are identical aside from the scale.

[Figure 3: six barcode panels (a)–(f); axes: filtration step, homology classes.]
Fig. 3. Illustrative examples to demonstrate the sensitivity of the Wasserstein distance to barcode ‘scales’. Barcodes (d), (e), and (f) are the calibrated versions of barcodes (a), (b), and (c), respectively. Barcodes (a) and (c) (and consequently (d) and (f)) represent 1D topological features of a torus, whereas barcode (b) (and consequently also (e)) does not. However, in the uncalibrated versions of the barcodes, the Wasserstein distance $W[d]$ between barcodes (a) and (b) is about …, and the Wasserstein distance $W[d]$ between barcodes (a) and (c) is about …, suggesting that (a) and (b), rather than (a) and (c), have similar topological features. By contrast, the Wasserstein distances between the calibrated versions of the barcodes illustrate the true topological proximities: about … between barcodes (d) and (e) and about … between barcodes (d) and (f).

[Figure 4: three barcode panels (a)–(c); axes: filtration step, homology classes.]
Fig. 4. Calibrated barcodes for PH of the VR filtration on contagion maps arising from a Kleinberg-like small-world network with parameter values $N = 2500$, $d_G = 8$, $d_{NG} = 2$, and $\gamma = 0$ for different values of the contagion threshold $T$. In (a), we use a small threshold ($T = 0.…$), and we observe fast spreading via both WFP and ANC. In (b), we use a threshold of $T = 0.…$, for which ANC is unlikely and we expect spreading to follow WFP. In (c), $T = 0.…$, which is a large threshold, for which we expect little spreading of a contagion. (See Figure 14 to locate these parameter values in the $(d_{NG}, T)$-parameter space and identify their associated spreading regimes.) In panel (b), we observe two relatively long bars, which represent the 1D topological features of the torus. In panel (a), we still observe some torus-like features (in the form of two slightly dominant bars), despite the fast spreading via ANC. This illustrates that WFP can still affect contagion maps noticeably, even in the presence of ANC. In panel (c), there are many bars that are born and die at the same time. This reflects the fact that little spreading occurs, as most nodes do not activate in most realizations of the contagion. The Wasserstein distances from the calibrated barcode representing PH of the VR filtration on the regularly-spaced point cloud on the torus (see Figure 2) are about … in panel (a), about … in panel (b), and about … in panel (c).

To compute Wasserstein distance, we use the software package Hera. Hera currently provides the fastest algorithm for computing Wasserstein distances.
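The calibration step described in this section amounts to a single rescaling. The following Python sketch illustrates it under the assumption that a barcode is stored as a list of finite (birth, death) pairs; the function name `calibrate` is hypothetical, and the treatment of bars that never die is not specified here, so the sketch simply assumes that there are none.

```python
def calibrate(barcode):
    """Rescale a barcode so that its longest bar has length exactly 1.

    barcode: list of (birth, death) pairs with finite death times.
    """
    longest = max(death - birth for birth, death in barcode)
    if longest <= 0:
        raise ValueError("barcode has no bar of positive length")
    return [(birth / longest, death / longest) for birth, death in barcode]

# Two barcodes that differ only by a factor of 3 calibrate to the same barcode.
small = [(0.0, 2.0), (1.0, 2.0)]
large = [(0.0, 6.0), (3.0, 6.0)]
same = calibrate(small) == calibrate(large)
```

Because both birth and death times are divided by the same length, calibrated barcodes of point clouds that differ only in size coincide, which is the property that the comparison with the model-torus barcode relies on.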
4.4. Dimensionality. We determine the approximate embedding dimension $P$ of a point cloud by finding the smallest dimension such that we lose less than 5% of the variance when projecting to that dimension using principal component analysis (PCA) [44]. That is, for each $p \in \{1, 2, \ldots\}$, we project the point cloud $\{x^{(i)} \in \mathbb{R}^N\}_{i \in V}$ to $\mathbb{R}^p$ using PCA, resulting in a point cloud $\{\hat{x}_p^{(i)} \in \mathbb{R}^p\}_{i \in V}$.

We then estimate the extent to which this projection preserves the original point cloud by calculating the residual variance [11, 46]
\[ R_p = 1 - \left(\rho^{(p)}\right)^2, \]
where $\rho^{(p)}$ is the Pearson correlation coefficient between the pairwise Euclidean distances of points in $\{x^{(i)} \in \mathbb{R}^N\}_{i \in V}$ and corresponding pairwise Euclidean distances between points in $\{\hat{x}_p^{(i)} \in \mathbb{R}^p\}_{i \in V}$ (see Section 4.2). The approximate embedding dimension $P$ is the smallest dimension for which the residual variance is less than 5%; that is, $P = \min\{p \mid R_p < 0.05\}$.

In practice, we put a cap of 100 on $P$, so if the approximate embedding dimension is 100 or larger, we record it to be 100. Because we consider the torus to be embedded in $\mathbb{R}^4$, an approximate embedding dimension of $P = 4$ indicates that the contagion map recovers the dimensionality of the torus.

Hera is publicly available at https://bitbucket.org/grey_narn/hera.
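The residual-variance criterion can be sketched as follows. This Python example is an illustration under stated assumptions, not the authors' implementation: the PCA projection itself is not implemented and is instead passed in as a hypothetical callable `project(p)` that returns the point cloud projected to $\mathbb{R}^p$ (for example, from a numerical library), and the names `pairwise`, `pearson`, `residual_variance`, and `embedding_dimension` are chosen for illustration. The same `pearson` helper applies to the geometric comparison of Section 4.2.

```python
import math
from itertools import combinations

def pairwise(points):
    """Pairwise Euclidean distances in a fixed order (i < j)."""
    return [math.dist(u, v) for u, v in combinations(points, 2)]

def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def residual_variance(original, projected):
    """R_p = 1 - rho^2 for pairwise distances before and after projection."""
    return 1.0 - pearson(pairwise(original), pairwise(projected)) ** 2

def embedding_dimension(original, project, cap=100, tol=0.05):
    """Smallest p with R_p < tol, recorded as cap if it is cap or larger."""
    for p in range(1, cap):
        if residual_variance(original, project(p)) < tol:
            return p
    return cap

# Toy check: a planar grid in R^3 whose principal axes are the coordinate axes,
# so dropping trailing coordinates plays the role of the PCA projection.
cloud = [(float(x), float(y), 0.0) for x in range(4) for y in range(4)]
P = embedding_dimension(cloud, lambda p: [pt[:p] for pt in cloud])
```

For this planar toy cloud, projecting to one dimension discards too much of the distance structure, whereas projecting to two dimensions preserves all pairwise distances.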
5. Numerical Experiments.

5.1. Experiments in the $(d_{NG}, T)$ parameter space. We construct Kleinberg-like small-world networks (as detailed in Section 2) for the following parameter values: $N = 2500$ nodes (i.e., $n = 50$), geometric degrees of $d_G = 4, 8, 12$ (corresponding to $p = 1, \sqrt{2}, 2$), non-geometric degrees of $d_{NG} = 0, 1, 2, \ldots, 25$, and distance decay parameter $\gamma = 0, 0.1, 0.2, 0.3, \ldots, 3$. For each of these $3 \times 26 \times 31 = 2418$ networks and for each threshold value $T = 0, \ldots, 1$, we run the contagion model (see Section 3) with cluster seeding around each of its 2500 nodes and record the activation times of each node. Using the activation times of each node in each of these realizations as coordinates, we map the nodes of a network to a point cloud in $\mathbb{R}^{2500}$ via the symmetric contagion map (see Section 4.1).

We compute the quantitative measures for the similarity of these point clouds to the underlying torus in terms of geometry (see Section 4.2), topology (see Section 4.3), and dimensionality (see Section 4.4) when we place non-geometric edges uniformly at random (i.e., when $\gamma = 0$). We illustrate our results by separately displaying the values of the Pearson correlation coefficient $\rho$, the Wasserstein distance $W[d]$, and embedding dimension $P$ in the $(d_{NG}, T)$ parameter space for each value of $d_G$ (see Figure 5). When examining topological similarity, we only cover the case $d_G = 8$, as computing PH of the VR filtration on a point cloud of 2500 points is extremely time-consuming because of the large number of simplices involved. Brighter regions in our plots signify larger Pearson correlation coefficients $\rho$ (in the geometry computation), smaller Wasserstein distances $W[d]$ (in the topology computation), and lower approximate embedding dimensions $P$. In each plot, we can identify a region in the parameter space for which $\rho$ is large and $W[d]$ and $P$ are small, indicating that WFP dominates for these parameter values.

The first column of each plot in Figure 5 (e.g., see the yellow bar in panel (a)) shows our results for $d_{NG} = 0$, which corresponds to a purely geometric network. In this case, network formation is deterministic and we can analytically determine the network dynamics (in particular, the presence versus absence of WFP). (See Section 6 for details.)
We can see in the first column of each plot that $\rho$, $W[d]$, and $P$ take only extreme values for $d^{\mathrm{NG}} = 0$ and that the transition between extreme values occurs at the same threshold $T$ for all three quantities. Below this threshold, the Pearson correlation coefficients are large and the Wasserstein distances and approximate embedding dimensions are small ($P = 4$, to be precise). Above this threshold, the Pearson correlation coefficients are small and $P$ is large (at the cap of 100), and these values of $T$ yield ‘infinite activation times’ of nodes in the plot for the Wasserstein distance. The observations that we describe in this paragraph are consistent with our analytical considerations, which demonstrate that spreading (by WFP) can occur only below this threshold.

There is a band along the transition between the region in which we expect WFP (see Figures 9–11) and the region in which we do not. This band is dark in the plots of the geometric and the topological structure, and it is bright in the plot of dimensionality. This implies that the point cloud is low-dimensional for the corresponding parameter combinations, but that it does not exhibit torus-like structure in terms of geometry or topology. Although this was not discussed in [45], one can also observe such a band for a threshold contagion on their noisy ring lattices.

Some irregularities and outliers in our figures are likely due to the probabilistic nature of non-geometric edges in our network construction. One example is that of the non-white spots within the white region in Figure 5(e). These correspond to parameter combinations for which our bifurcation analysis (see Section 6) suggests that we should expect infinite activation times, but all nodes end up having finite activation times in all realizations.
[Figure 5 panels: (a) Pearson correlation coefficient ($\rho$) for $d^{(\mathrm{G})} = 4$; (b) embedding dimension ($P$) for $d^{(\mathrm{G})} = 4$; (c) Pearson correlation coefficient ($\rho$) for $d^{(\mathrm{G})} = 8$; (d) embedding dimension ($P$) for $d^{(\mathrm{G})} = 8$; (e) Wasserstein distance from torus barcode ($W[d]$) for $d^{(\mathrm{G})} = 8$; (f) Pearson correlation coefficient ($\rho$) for $d^{(\mathrm{G})} = 12$; (g) embedding dimension ($P$) for $d^{(\mathrm{G})} = 12$. Axes: $d^{(\mathrm{NG})}$ and threshold ($T$).]

Fig. 5. (a) Geometry (as quantified by the Pearson correlation coefficient) and (b) dimensionality (as determined by the approximate embedding dimension, which we cap at 100) of contagion maps from Kleinberg-like small-world networks with parameter values $N = 2500$, $d^{\mathrm{G}} = 4$, $\gamma = 0$, and $d^{\mathrm{NG}} = 0, 1, 2, \ldots, 25$ and contagion thresholds $T$ from $0$ to $1$. (c) Pearson correlation coefficient and (d) approximate embedding dimension (which we cap at 100) of contagion maps from Kleinberg-like small-world networks with parameter values $N = 2500$, $d^{\mathrm{G}} = 8$, $\gamma = 0$, and $d^{\mathrm{NG}} = 0, 1, 2, \ldots, 25$ and contagion thresholds $T$ from $0$ to $1$. (e) Wasserstein distances between the scaled barcode to measure PH of the VR filtration on the regularly-spaced point cloud on the torus (see formula (4.2)) and the scaled barcodes to measure PH of the VR filtration on the contagion maps from Kleinberg-like small-world networks with parameter values $N = 2500$, $d^{\mathrm{G}} = 8$, $\gamma = 0$, and $d^{\mathrm{NG}} = 0, 1, 2, \ldots, 25$ and contagion thresholds $T$ from $0$ to $1$. The white regions correspond to parameter combinations for which there are nodes that do not activate (i.e., they have ‘infinite’ activation times) in some realizations of the contagion. (f) Pearson correlation coefficient and (g) approximate embedding dimension (which we cap at 100) of contagion maps from Kleinberg-like small-world networks with parameter values $N = 2500$, $d^{\mathrm{G}} = 12$, $\gamma = 0$, and $d^{\mathrm{NG}} = 0, 1, 2, \ldots, 25$ and contagion thresholds $T$ from $0$ to $1$.

Effect of the decay parameter $\gamma$ on contagion maps.
In our Kleinberg-like small-world networks (see Section 2), recall that we regulate the range of non-geometric edges using the decay parameter $\gamma \in \mathbb{R}_{\geq 0}$. Each node has a fixed number of non-geometric stubs, and we connect two stubs that emanate from nodes $i$ and $j$ to form a non-geometric edge with a probability proportional to $\mu_{\mathrm{per}}(i, j)^{-\gamma}$. For $\gamma = 0$, we match the non-geometric stubs uniformly at random, regardless of the distance between the corresponding nodes, so the length of the non-geometric edges can take any value with equal probability. For $\gamma > 0$, non-geometric edges have a bias to connect nodes that are close to each other with respect to the periodic lattice distance. This bias becomes more pronounced for progressively larger $\gamma$, so larger values of $\gamma$ tend to yield shorter non-geometric edges.

The speed of a contagion on Kleinberg networks depends significantly on the parameter $\gamma$ [12, 17]. We examine the effect of $\gamma$ on the shape of a contagion map. For fixed values of the geometric degree $d^{\mathrm{G}}$ and non-geometric degree $d^{\mathrm{NG}}$ of our networks and threshold $T$ of our contagion, we vary the value of $\gamma$. Specifically, we choose a geometric degree of $d^{\mathrm{G}} = 8$, a non-geometric degree of $d^{\mathrm{NG}} = 2$, and one value for the contagion threshold $T$ for each predicted spreading regime when $\gamma = 0$. We use the value $T = 0.05$ for the regime in which we expect both WFP and ANC when $\gamma = 0$, the value $T = 0.25$ for the regime in which we expect WFP but no ANC when $\gamma = 0$, and the value $T = 0.4$ for the regime in which we expect neither WFP nor ANC when $\gamma = 0$. See Figure 14 to locate these parameter values in the $(d^{\mathrm{NG}}, T)$ parameter plane and identify their respective associated spreading regimes. For each of these three values for $T$, we let $\gamma$ vary from 0 to 3 in increments of 0.1. For each value for $\gamma$, we map the nodes of the associated network via the contagion map using the given value for $T$ and analyze the resulting point cloud as described in Section 4.

For $T = 0.05$, the Pearson correlation coefficient increases significantly in an almost linear fashion as we increase $\gamma$, whereas the Wasserstein distance and the approximate embedding dimension both decrease. This arises from the fact that non-geometric edges change in function from drivers of ANC to contributors to WFP. For $\gamma = 0$, we expect fast spreading of the contagion that is dominated by ANC. This leads to a contagion map whose image is a cluster of tightly bunched points that are fairly evenly distributed in the region that they occupy. In particular, the pairwise distances between the points are not influenced much by the pairwise distances between their corresponding nodes. Such a cluster of points has a high approximate embedding dimension $P$, because its points are distributed with roughly constant density across the region that they occupy, so the point cloud does not have an intrinsic lower dimension than its ambient space.

For progressively larger $\gamma$, the non-geometric edges tend to become shorter and contribute increasingly to WFP, instead of facilitating spreading across large distances in a network. They thereby produce a point cloud that is still contained in a small volume (because the spreading is still fast with such a low threshold), but with pairwise distances between points that become increasingly faithful to the pairwise distances between their corresponding nodes.

For $T = 0.25$, the Pearson correlation coefficient starts out large and increases further for progressively larger $\gamma$. By contrast, the Wasserstein distance is small throughout the range of $\gamma$, with a slight decrease at the lower end of the range. The approximate embedding dimension is $P = 4$ for all values of $\gamma$ that we considered. This relative stability of all three measures stems from the fact that, for this value of $T$, WFP dominates over ANC even when $\gamma = 0$, as the non-geometric edges are not sufficiently numerous to drive ANC. For progressively larger $\gamma$, the gradually shortening non-geometric edges only contribute increasingly to WFP.

For $T = 0.4$, the Pearson correlation coefficient and the approximate embedding dimension remain, respectively, fairly small and fairly large for all values of $\gamma$. The Wasserstein distance decreases steadily as we increase $\gamma$.
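The distance bias that $\gamma$ introduces can be illustrated with a minimal sketch. It is not the stub-matching construction of Section 2: it only weights the possible partners of a single node by $\mu_{\mathrm{per}}(i, j)^{-\gamma}$ and reports the resulting mean edge length. The function names are hypothetical, and the Manhattan-type periodic distance on a $50 \times 50$ lattice is an assumption, because the precise form of $\mu_{\mathrm{per}}$ is defined elsewhere in the paper.

```python
def periodic_distance(a, b, side):
    """Manhattan-type distance on a side x side torus lattice (assumed metric)."""
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    return min(dx, side - dx) + min(dy, side - dy)


def expected_edge_length(gamma, side=50):
    """Mean length of one non-geometric edge from node (0, 0) when its partner
    is drawn with probability proportional to distance ** (-gamma)."""
    origin = (0, 0)
    distances = [
        periodic_distance(origin, (x, y), side)
        for x in range(side)
        for y in range(side)
        if (x, y) != origin
    ]
    weights = [d ** (-gamma) for d in distances]
    total = sum(weights)
    return sum(d * w for d, w in zip(distances, weights)) / total


# Mean edge lengths for a few decay parameters; gamma = 0 is the uniform case.
lengths = [expected_edge_length(g) for g in (0.0, 1.0, 2.0, 3.0)]
```

Under these assumptions, the mean length shrinks as $\gamma$ grows, which matches the statement that larger values of $\gamma$ tend to yield shorter non-geometric edges.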
Fig. 6. Pearson correlation coefficient ($\rho$) between point-cloud distances and node–node distances as we increase $\gamma$ from 0 to 3 in increments of 0.1 for Kleinberg-like small-world networks with parameters $N = 2500$, $d^{\mathrm{G}} = 8$, and $d^{\mathrm{NG}} = 2$ and contagion thresholds of (a) $T = 0.05$, (b) $T = 0.25$, and (c) $T = 0.4$. In panel (d), we show the plots for all three values of $T$.

Fig. 7. Wasserstein distance ($W[d]$) between scaled barcodes as we increase $\gamma$ from 0 to 3 in increments of 0.1 for Kleinberg-like small-world networks with parameters $N = 2500$, $d^{\mathrm{G}} = 8$, and $d^{\mathrm{NG}} = 2$ and contagion thresholds of (a) $T = 0.05$, (b) $T = 0.25$, and (c) $T = 0.4$. In panel (d), we show the plots for all three values of $T$.

Fig. 8. Approximate embedding dimension ($P$) as we increase $\gamma$ from 0 to 3 in increments of 0.1 for Kleinberg-like small-world networks with parameters $N = 2500$, $d^{\mathrm{G}} = 8$, and $d^{\mathrm{NG}} = 2$ and contagion thresholds of (a) $T = 0.05$, (b) $T = 0.25$, and (c) $T = 0.4$. In panel (d), we show the plots for all three values of $T$.
6. Bifurcation analysis.
We conduct a bifurcation analysis for the spreading behavior of the WTM contagion (see Section 3 for its definition), which we initialize with cluster seeding on the family of Kleinberg-like small-world networks that we described in Section 2. The results of this bifurcation analysis give a guideline for interpreting our prior numerical computations. We want to determine analytically which combinations of network parameter values $d^{\mathrm{G}}$ and $d^{\mathrm{NG}}$ (see Section 2) and threshold parameter value $T$ (see Section 3) allow the contagion to spread by WFP and which allow it to spread by ANC. That is, we want to identify regions in parameter space for which the spreading dynamics follow specific regimes that are characterized by the presence or absence of WFP and ANC. We are especially interested in the region of parameter space for which there is WFP but no ANC, as this region should comprise the parameter combinations for which the contagion map exhibits structural features of a torus.

We consider networks with $N = 2500$ nodes exclusively, because (at least locally and sufficiently early in the contagion process) the total size of the network should not affect the contagion behavior. At later stages, a contagion that saturates a network will speed up earlier for smaller networks, as the active region of the network is then proportionately larger with respect to the total network size. Additionally, we restrict our analysis to the case where $\gamma = 0$ (i.e., we place non-geometric edges uniformly at random).

We fix the geometric degree $d^{\mathrm{G}}$ and examine the spreading behavior as we vary the non-geometric degree $d^{\mathrm{NG}}$ and the threshold $T$. The possible values for $d^{\mathrm{G}}$ are constrained by the number of nodes that are within a distance $p \in \mathbb{R}_{>0}$ of a given node (see Section 2). For a given $p$, the corresponding $d^{\mathrm{G}}$ is 1 less than the number of integer lattice points that lie inside (or on) a circle of radius $p$ that is centered at the origin.
This number is approximately equal to the area of the circle, and the problem of determining it is known as the Gauss Circle Problem [20]. The three smallest values of the geometric degree $d^{\mathrm{G}}$ are 4, 8, and 12, which correspond to $1 \leq p < \sqrt{2}$, $\sqrt{2} \leq p < 2$, and $2 \leq p < \sqrt{5}$, respectively, in the definition of our Kleinberg-like small-world network.
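The correspondence between the radius $p$ and the geometric degree can be checked by direct counting. The following is a minimal sketch; the function name `geometric_degree` is hypothetical, and passing $p^2$ as an integer (rather than $p$ itself) is a convenience that we assume here to avoid floating-point comparisons at radii such as $\sqrt{2}$.

```python
from math import isqrt


def geometric_degree(p_squared):
    """Number of integer lattice points (x, y) != (0, 0) with x^2 + y^2 <= p^2.

    This is one less than the number of lattice points inside or on the circle
    of radius p centered at the origin.
    """
    r = isqrt(p_squared)  # no lattice point farther than floor(p) along an axis
    count = sum(
        1
        for x in range(-r, r + 1)
        for y in range(-r, r + 1)
        if x * x + y * y <= p_squared
    )
    return count - 1  # exclude the node itself


# p = 1, sqrt(2), 2, and sqrt(5) correspond to p^2 = 1, 2, 4, and 5.
degrees = [geometric_degree(p2) for p2 in (1, 2, 4, 5)]
```

For $p^2 = 1, 2, 4$, this count reproduces the three smallest geometric degrees 4, 8, and 12; the value of $p^2$ between two consecutive radii (e.g., $p^2 = 3$) leaves the degree unchanged.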
We consider the networks for $d^{\mathrm{G}} = 4, 8, 12$ individually, and we work out the maximum threshold for which a Kleinberg network with $\gamma = 0$ can support sustained spreading via only geometric edges.

If $d^{\mathrm{G}} = 4$, then for WFP to occur, the threshold $T$ needs to be small enough to allow spreading via a single edge. Therefore, for variable $d^{\mathrm{NG}}$, WFP can occur only if the threshold $T$ is smaller than
\[ T^{\mathrm{WFP}} = \frac{1}{4 + d^{\mathrm{NG}}} . \]

Fig. 9. (a) Purely geometric network with geometric neighbors up to radius $r = 1$ around a node. (This corresponds to $p = 1$ and $q = 0$ in our Kleinberg-like small-world networks.) The geometric degree is $d^{\mathrm{G}} = 4$, and the non-geometric degree is $d^{\mathrm{NG}} = 0$. We color the nodes of the contagion seed in dark red, the nodes that activate during the first time step in a moderately dark color, and the nodes that activate during the second time step in a light color. (b) A node with its four direct neighbors, which are the nodes that are within Euclidean distance $p = 1$ from it. (c) Bifurcation diagram (titled ‘WFP bifurcation for $d^{(\mathrm{G})} = 4$’) for the occurrence of WFP in a network with $d^{\mathrm{G}} = 4$. The axes are the non-geometric degree $d^{\mathrm{NG}}$ and the contagion threshold $T$. WFP occurs only in the region below the curve.

If $d^{\mathrm{G}} = 8$, then for WFP to occur, the threshold $T$ needs to be small enough to allow spreading via 3 edges. Therefore, for variable $d^{\mathrm{NG}}$, WFP can occur only if the threshold $T$ is smaller than
\[ T^{\mathrm{WFP}} = \frac{3}{8 + d^{\mathrm{NG}}} . \]

Fig. 10. (a) Purely geometric network with geometric neighbors up to radius $r = \sqrt{2}$ around a node. (This corresponds to $p = \sqrt{2}$ and $q = 0$ in our Kleinberg-like small-world networks.) The geometric degree is $d^{\mathrm{G}} = 8$, and the non-geometric degree is $d^{\mathrm{NG}} = 0$.
We color the nodes of the contagion seed in dark red, the nodes that activate during the first time step in a moderately dark color, and the nodes that activate during the second time step in a light color. (b) A node with its eight direct neighbors, which are the nodes that are within Euclidean distance $p = \sqrt{2}$ from it. (c) Bifurcation diagram (titled ‘WFP bifurcation for $d^{(\mathrm{G})} = 8$’) for the occurrence of WFP in a network with $d^{\mathrm{G}} = 8$. The axes are the non-geometric degree $d^{\mathrm{NG}}$ and the contagion threshold $T$. WFP occurs only in the region below the curve.

If $d^{\mathrm{G}} = 12$, then for WFP to occur, the threshold $T$ needs to be small enough to allow spreading via 4 edges. Therefore, for variable $d^{\mathrm{NG}}$, WFP can occur only if the threshold $T$ is smaller than
\[ T^{\mathrm{WFP}} = \frac{4}{12 + d^{\mathrm{NG}}} . \]
Fig. 11. (a) Purely geometric network with geometric neighbors up to radius $r = 2$ around a node. (This corresponds to $p = 2$ and $q = 0$ in our Kleinberg-like small-world networks.) The geometric degree is $d^{\mathrm{G}} = 12$, and the non-geometric degree is $d^{\mathrm{NG}} = 0$. We color the nodes of the contagion seed in dark red, the nodes that activate during the first time step in a moderately dark color, and the nodes that activate during the second time step in a light color. (b) A node with its twelve direct neighbors, which are the nodes that are within Euclidean distance $p = 2$ from it. (c) Bifurcation diagram (titled ‘WFP bifurcation for $d^{(\mathrm{G})} = 12$’) for the occurrence of WFP in a network with $d^{\mathrm{G}} = 12$. The axes are the non-geometric degree $d^{\mathrm{NG}}$ and the contagion threshold $T$. WFP occurs only in the region below the curve.

There does not seem to be a closed form for $T^{\mathrm{WFP}}$ that holds for general values of $d^{\mathrm{G}}$. One needs to find the maximum threshold that allows spreading by WFP for each value of $d^{\mathrm{G}}$ individually by finding the edges that can support spreading from the contagion seed.

The activation of an inactive node by ANC occurs, by definition, exclusively via non-geometric edges. That is, a node activates by ANC when it is adjacent to more than $T \times (d^{\mathrm{G}} + d^{\mathrm{NG}})$ active nodes by non-geometric edges and all of its geometric neighbors are inactive. Consequently, if the threshold $T$ is larger than or equal to the ratio of the non-geometric degree to the total degree (i.e., $T \geq \frac{d^{\mathrm{NG}}}{d^{\mathrm{G}} + d^{\mathrm{NG}}}$), then ANC is impossible. If $T < \frac{d^{\mathrm{NG}}}{d^{\mathrm{G}} + d^{\mathrm{NG}}}$, then ANC is possible. When $\frac{d^{\mathrm{NG}} - 1}{d^{\mathrm{G}} + d^{\mathrm{NG}}} \leq T < \frac{d^{\mathrm{NG}}}{d^{\mathrm{G}} + d^{\mathrm{NG}}}$, ANC can occur in principle, but only if all of the non-geometric edges of an inactive node that has no active geometric neighbors ‘reach into’ contagion clusters. This is very unlikely to occur in practice, so a threshold $T$ for which ANC is possible in principle is not a good indicator in practice for the presence of ANC. We will explore this issue.

We define the horizon
\[ H^{\mathrm{ANC}} = \frac{d^{\mathrm{NG}}}{d^{\mathrm{G}} + d^{\mathrm{NG}}} \]
of ANC to be the boundary between thresholds for which ANC is possible in theory and thresholds for which ANC is impossible. Using the horizon as a boundary curve for ANC generates an ‘idealized’ bifurcation diagram that tends to overestimate the size of the region of the parameter space for which ANC occurs.

In practice, one needs to think about the probability of a number $k$ among all non-geometric edges of a given node reaching into clusters of active nodes. If we place non-geometric edges uniformly at random (which occurs when $\gamma = 0$ in the construction of our Kleinberg networks), the probability for a non-geometric edge of an inactive node to be incident to an active node at time $t$ is $\frac{q(t)}{N - 1}$, where $q(t)$ is the number of active nodes at time $t$. Consequently, if the non-geometric degree is $d^{\mathrm{NG}}$, the expected number of non-geometric edges of an inactive node that are incident to active nodes is $\frac{q(t)}{N - 1} d^{\mathrm{NG}}$. It follows that the maximum threshold for which one can expect every node that is inactive before time $t$ to activate via ANC at time $t$ is
\[ T = \frac{\frac{q(t)}{N - 1} \, d^{\mathrm{NG}}}{d^{\mathrm{G}} + d^{\mathrm{NG}}} . \tag{6.1} \]
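The WFP thresholds and the ANC horizon derived above can be evaluated directly. The following minimal sketch uses hypothetical function names; the dictionary `WFP_EDGES` simply records the edge counts stated in the text for $d^{\mathrm{G}} = 4, 8, 12$, and the classification uses the idealized horizon $H^{\mathrm{ANC}}$, so it tends to overestimate the ANC region.

```python
# Number of edges via which WFP must be able to spread, as stated in the text
# for each of the three smallest geometric degrees.
WFP_EDGES = {4: 1, 8: 3, 12: 4}


def t_wfp(d_g, d_ng):
    """Critical threshold below which WFP can occur."""
    return WFP_EDGES[d_g] / (d_g + d_ng)


def anc_horizon(d_g, d_ng):
    """Horizon H^ANC: ANC is impossible for thresholds at or above this value."""
    return d_ng / (d_g + d_ng)


def idealized_regime(d_g, d_ng, threshold):
    """Return (WFP possible, ANC possible in principle) for one parameter choice."""
    wfp = threshold < t_wfp(d_g, d_ng)
    anc = threshold < anc_horizon(d_g, d_ng)
    return wfp, anc


# Example: d_G = 8 and d_NG = 2 with the three thresholds used in Section 5.
regimes = {T: idealized_regime(8, 2, T) for T in (0.05, 0.25, 0.4)}
```

For $d^{\mathrm{G}} = 8$ and $d^{\mathrm{NG}} = 2$, this gives $T^{\mathrm{WFP}} = 0.3$ and $H^{\mathrm{ANC}} = 0.2$, so the thresholds $T = 0.05$, $0.25$, and $0.4$ fall into the regimes with both WFP and ANC, with WFP only, and with neither, respectively.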
However, for ANC to occur at a certain time step, it is not necessary for every inactive node to activate via ANC at that time step. It suffices for any inactive node that is sufficiently far away from the contagion to activate via ANC. This is a weaker requirement, so it can generally be met for thresholds above (6.1). Consequently, we expect (6.1) to be a lower bound for $T^{\mathrm{ANC}}$, the critical threshold for ANC to occur.

The numerator in (6.1) depends linearly on the number of active nodes $q(t)$, which is time-dependent. This raises the question of what may be a sensible choice for $t$ (and $q(t)$). Intuitively, if ANC occurs towards the end of a spreading process, when large parts of a network are already active, then its contribution to the spreading of the contagion is a minor one and it has only a negligible distortive effect on a contagion map. The activation times of nodes that are infected via ANC late in a contagion process are only mildly shorter than what occurs for spreading purely via WFP, so the points in the image of a contagion map are perturbed only slightly. To obtain a meaningful approximate bound for $T^{\mathrm{ANC}}$, we thus seek to work out a point in the spreading process up to which the occurrence of ANC plays a significant role in the overall spreading behavior and accordingly has a noticeable effect on the contagion map. The later this point occurs, the larger $\frac{q(t)}{N - 1} d^{\mathrm{NG}}$ will be and the larger we expect the critical threshold (i.e., bifurcation point) $T^{\mathrm{ANC}}$ to be. We have
\[ \delta H^{\mathrm{ANC}} = \delta \, \frac{d^{\mathrm{NG}}}{d^{\mathrm{G}} + d^{\mathrm{NG}}} < T^{\mathrm{ANC}} < \frac{d^{\mathrm{NG}}}{d^{\mathrm{G}} + d^{\mathrm{NG}}} = H^{\mathrm{ANC}} \tag{6.2} \]
for some $\delta \in (0, 1)$, where the value of $\delta$ arises from how late in a spreading process the occurrence of ANC plays a significant role. If, for instance, the occurrence of ANC plays a significant role in overall spreading behavior and thus has a noticeable distortive effect on a contagion map only if it takes place by the time that three fifths of the nodes in a network are active, then the bifurcation curve for ANC is bounded below as follows:
\[ T^{\mathrm{ANC}} > \frac{3}{5} \, \frac{d^{\mathrm{NG}}}{d^{\mathrm{G}} + d^{\mathrm{NG}}} . \]

In Figure 12, we compare the idealized bifurcation diagram (using the horizon of ANC as its bifurcation curve) and the diagram that we obtain from (6.1) with the number of active nodes equaling $(3/5) N$, where $N$ is the number of nodes in a network, to our numerical results for the geometry and dimensionality.

Fig. 12. (a) Idealized bifurcation curve and (b) approximate bifurcation curve for WFP and ANC in a Kleinberg-like small-world network with geometric neighbors up to distance $r = \sqrt{2}$ around a node (i.e., with geometric degree $d^{\mathrm{G}} = 8$). The axes are the non-geometric degree $d^{(\mathrm{NG})}$ and the threshold ($T$). The blue curve shows $T^{\mathrm{WFP}}$; the red curve shows the idealized and approximate $T^{\mathrm{ANC}}$ in panels (a) and (b), respectively.
The above argument is independent of the particular geometry that underlies a network, as long as one places the non-geometric edges uniformly at random. In particular, the inequalities (6.2) should also hold for the ring lattice in the computations of Taylor et al. [45]. Indeed, looking at their results (see Figure 6 in [45]), their $T_0^{(\mathrm{ANC})}$ curve (the dotted curve) does seem to sit a bit higher than what they observed in their numerical results, suggesting that $T_0^{(\mathrm{ANC})}$ is indeed bounded above by the idealized bifurcation curve.

If one chooses three fifths of the total number of nodes as the maximum number of nodes that activate before a certain time step $t$ for the occurrence of ANC to be ‘significant’ at time $t$, then one should expect the actual bifurcation curve $T^{\mathrm{ANC}}$ to lie somewhere between the red curves in Figure 12.

To find the actual bifurcation curve $T^{\mathrm{ANC}}$, we need to find (for a given value of $d^{\mathrm{NG}}$) the largest threshold $T$ that realistically allows ANC to arise before the active region of the network is so large that the occurrence of ANC no longer has a significant impact on the spreading dynamics. This amounts to finding a threshold $T$ that is as large as possible. We use the term ‘neighborhood’ and the notation $\mathcal{N}$ for the set of nodes outside which we count nodes being activated via ANC. The neighborhood can consist either of the active nodes only, in which case $|\mathcal{N}| = q(t)$; or it can include some additional nodes around the active region, such that $|\mathcal{N}| > q(t)$.

At a given time $t$, suppose that the $q(t)$ nodes that are active at time $t$ were activated at previous time steps by WFP from the cluster seed and that they form an active cluster of roughly square shape. We denote this set of nodes by
\[ I = \{ i \in V \mid \eta_i(t) = 1 \} , \]
so $|I| = q(t)$. We define the neighborhood $\mathcal{N}(I)$ of this active cluster to be the cluster itself together with the nodes in its periphery of a certain ‘width’.
That is, $\mathcal{N}(I)$ is the active cluster $I$ itself, its boundary, and (depending on the width) some more nodes around it. Given a width $w$, we approximate the number of nodes in $\mathcal{N}(I)$ as
\[ |\mathcal{N}(I)| \approx \left( \sqrt{q(t)} + 2w \right)^2 . \tag{6.3} \]
We can choose any natural number for the width $w$, and one plausible choice is $d^{\mathrm{G}}/2$. If $w = 0$, the neighborhood $\mathcal{N}(I)$ is just $I$ itself (and then $|\mathcal{N}| = q(t)$). To make a sensible choice for the size of the active region of a network at the latest point in a spreading process at which we consider ANC to be significant, we estimate the largest number $q_{\max}$ of active nodes such that the active region together with a periphery of inactive nodes of width $d^{\mathrm{G}}$ takes up at most 90% of a network. We make this estimate by choosing $q_{\max}$ to be the largest integer such that
\[ |\mathcal{N}_{\max}| = \left( \sqrt{q_{\max}} + 2 d^{\mathrm{G}} \right)^2 \leq 0.9 N . \tag{6.4} \]
Consequently,
\[ q_{\max} \leq \left( \sqrt{0.9 N} - 2 d^{\mathrm{G}} \right)^2 . \tag{6.5} \]
For $N = 2500$ and $d^{\mathrm{G}} = 8$, this gives $q_{\max} = 988$.

If we consider ANC to be significant up to time $t$, the corrected bifurcation curve for ANC is
\[ T^{\mathrm{ANC}} = \frac{k}{d^{\mathrm{G}} + d^{\mathrm{NG}}} , \tag{6.6} \]
where $k$ is the largest integer such that, at time $t$, the expected number of nodes outside the neighborhood $\mathcal{N}(I)$ of $I$ that have more than $k$ active neighbors is at least 1. (Footnote: Our use of the term ‘neighborhood’ is different from its usual use in graph theory.) Let $X_k$ be the number of nodes outside $\mathcal{N}(I)$ with more than $k$ active neighbors. It follows the binomial distribution
\[ X_k \sim \mathrm{Bin}\left( N - |\mathcal{N}(I)| , \, P(d_{\mathrm{in}} > k) \right) , \]
so
\[ P(X_k = x) = \binom{N - |\mathcal{N}(I)|}{x} P(d_{\mathrm{in}} > k)^{x} \, P(d_{\mathrm{in}} \leq k)^{N - |\mathcal{N}(I)| - x} , \]
where $d_{\mathrm{in}}$ is the number of edges of a node outside $\mathcal{N}(I)$ that are incident to an active node. Therefore, the expected number of nodes outside $\mathcal{N}(I)$ with more than $k$ active neighbors is
\[ \begin{aligned} E[X_k] &= \sum_{x=0}^{N - |\mathcal{N}(I)|} P(X_k = x) \, x = \left( N - |\mathcal{N}(I)| \right) P(d_{\mathrm{in}} > k) \\ &= \left( N - |\mathcal{N}(I)| \right) \left( 1 - P(d_{\mathrm{in}} \leq k) \right) . \end{aligned} \tag{6.7} \]
As we argue in a remark at the end of this section, it is approximately the case that
\[ d_{\mathrm{in}} \sim \mathrm{Bin}\left( d^{\mathrm{NG}} , \frac{q(t)}{N} \right) . \tag{6.8} \]
That is, $d_{\mathrm{in}}$ approximately follows a binomial distribution. Its associated (approximate) cumulative distribution function is
\[ \begin{aligned} P(d_{\mathrm{in}} \leq k) &\approx \sum_{d=0}^{k} \binom{d^{\mathrm{NG}}}{d} \left( \frac{q(t)}{N} \right)^{d} \left( 1 - \frac{q(t)}{N} \right)^{d^{\mathrm{NG}} - d} \\ &= \left( d^{\mathrm{NG}} - k \right) \binom{d^{\mathrm{NG}}}{k} \int_{0}^{1 - \frac{q(t)}{N}} s^{d^{\mathrm{NG}} - k - 1} (1 - s)^{k} \, ds . \end{aligned} \tag{6.9} \]

To determine the numerator of $T^{\mathrm{ANC}}$ (see formula (6.6)), we seek the largest integer $k$ such that $E[X_k] \geq 1$. That is, using equations (6.7) and (6.9), we seek the largest integer $k$ such that
\[ \frac{N - |\mathcal{N}(I)| - 1}{N - |\mathcal{N}(I)|} \geq \sum_{d=0}^{k} \binom{d^{\mathrm{NG}}}{d} \left( \frac{q(t)}{N} \right)^{d} \left( 1 - \frac{q(t)}{N} \right)^{d^{\mathrm{NG}} - d} . \tag{6.10} \]
We can find $k$ for each value of $d^{\mathrm{NG}}$ and deduce $T^{\mathrm{ANC}}(d^{\mathrm{NG}})$ from that value for a given $d^{\mathrm{G}}$. For $d^{\mathrm{G}} = 8$, this yields the plots in Figure 13 for $T^{\mathrm{ANC}}$ for various choices of the width of the neighborhood $\mathcal{N}(I)$ and the maximum value of $q(t)$ at which we consider ANC to be significant.

Observe that the curves are essentially increasing, but in an oscillatory manner, resulting in a staircase-like shape. This shape arises from the fact that, as we increase $d^{\mathrm{NG}}$ from 0 to 25 in integer increments, it affects both the largest integer $k$ such that inequality (6.10) is satisfied (i.e., the numerator of formula (6.6)) and the denominator of formula (6.6). For progressively larger values of $d^{\mathrm{NG}}$, in the sum on the right-hand side of the inequality (6.10), we see that $\binom{d^{\mathrm{NG}}}{d}$ is progressively larger and $\left( 1 - \frac{q(t)}{N} \right)^{d^{\mathrm{NG}} - d}$ is progressively smaller, where the latter factor is dominant. This explains the overall increasing tendency of the curve. For a given value of $k$, an increase of $d^{\mathrm{NG}}$ leads to a steady decrease of $T^{\mathrm{ANC}}$, explaining the small intervals of decreases of the curve.
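The procedure in (6.3)–(6.10) can be sketched numerically as follows. This is a minimal illustration rather than the code that produced Figure 13: the function names are hypothetical, the width passed to `t_anc` is a free choice, and we assume that $T^{\mathrm{ANC}} = 0$ when no integer $k$ satisfies inequality (6.10).

```python
from math import comb, floor, sqrt


def q_max(n, d_g, fraction=0.9):
    """Largest integer q with (sqrt(q) + 2 * d_g)**2 <= fraction * n; see (6.4)-(6.5)."""
    side = sqrt(fraction * n) - 2 * d_g  # assumed positive for the inputs of interest
    return floor(side * side)


def neighborhood_size(q, width):
    """Approximation (6.3) for the number of nodes in the neighborhood N(I)."""
    return (sqrt(q) + 2 * width) ** 2


def binomial_cdf(k, n, p):
    """P(d_in <= k) for d_in ~ Bin(n, p), i.e., the sum in (6.9)."""
    return sum(comb(n, d) * p**d * (1 - p) ** (n - d) for d in range(k + 1))


def t_anc(n, d_g, d_ng, q, width):
    """Corrected bifurcation value (6.6) using the largest k that satisfies (6.10)."""
    outside = n - neighborhood_size(q, width)  # nodes outside N(I); assumed > 1
    bound = (outside - 1) / outside
    best_k = 0  # assumption: T_ANC = 0 if no k satisfies (6.10)
    for k in range(d_ng):  # k = d_ng never satisfies (6.10), as the CDF equals 1
        if binomial_cdf(k, d_ng, q / n) <= bound:
            best_k = k
    return best_k / (d_g + d_ng)


q_star = q_max(2500, 8)
staircase = [t_anc(2500, 8, d_ng, q_star, 8) for d_ng in range(26)]
```

With $N = 2500$, $d^{\mathrm{G}} = 8$, and a periphery of width $d^{\mathrm{G}}$, this sketch returns $q_{\max} = 988$ and, for $d^{\mathrm{NG}} = 2$, the value $T^{\mathrm{ANC}} = 1/10$. Because $k \leq d^{\mathrm{NG}} - 1$ by construction, every value in `staircase` lies below the horizon $H^{\mathrm{ANC}}$.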
Fig. 13. Behavior of $T^{\mathrm{ANC}}$ in a Kleinberg-like small-world network with geometric neighbors up to distance $r = \sqrt{2}$ around a node (i.e., with geometric degree $d^{\mathrm{G}} = 8$) for various choices of what constitutes the neighborhood of an active cluster of nodes: (a) $|\mathcal{N}(I)| = q(0)$ (pathological case), (b) $|\mathcal{N}(I)| = \left( \sqrt{q(0)} + d^{\mathrm{G}} \right)^2$ (pathological case), (c) $|\mathcal{N}(I)| = 988$, and (d) $|\mathcal{N}(I)| = \left( \sqrt{988} + d^{\mathrm{G}} \right)^2$. The axes in each panel are the non-geometric degree $d^{(\mathrm{NG})}$ and the threshold ($T$).

Note (again) that our central reasoning in the above argument is independent of the geometry that underlies our noisy geometric network, provided we place non-geometric edges uniformly at random. The only point at which the particular geometry of the 2D torus comes into play is in the approximation (6.3), which we calculate by assuming that the contagion cluster $I$ is roughly square-shaped and that its neighborhood $\mathcal{N}(I)$ forms a larger square-shaped area of some width around $I$. If we take this width to be 0 (i.e., if $|\mathcal{N}(I)| = q(t)$), the formula for the expectation (6.7) is the same for any noisy geometric network.

[Figure 14 region labels: ‘no WFP, no ANC’; ‘no WFP, ANC’; ‘WFP, ANC’; ‘WFP, no ANC’. Marked points: (a), (b), (c). Axes: $d^{(\mathrm{NG})}$ and threshold ($T$).]

Fig. 14. Bifurcation diagram for WFP and ANC in a Kleinberg-like small-world network with geometric neighbors up to distance $r = \sqrt{2}$ around a node (i.e., geometric degree $d^{\mathrm{G}} = 8$). The blue curve shows $T^{\mathrm{WFP}}$, and the red curve shows $T^{\mathrm{ANC}}$. The points (a), (b), and (c) mark parameter combinations that represent three different spreading regimes. Point (a) indicates the parameter combination $(d^{\mathrm{NG}}, T) = (2, 0.05)$, for which (due to the small value of the threshold $T$) we expect fast spreading via WFP and ANC; point (b) indicates the parameter combination $(d^{\mathrm{NG}}, T) = (2, 0.25)$, representing spreading predominantly via WFP; and point (c) indicates the parameter combination $(d^{\mathrm{NG}}, T) = (2, 0.4)$, for which we do not expect spreading. If $d^{\mathrm{G}} = 8$ and $d^{\mathrm{NG}} = 2$, the total degree is 10. This implies that a contagion requires 1, 3, and 5 adjacent activated nodes, respectively, to activate an inactive node for (a) $T = 0.05$, (b) $T = 0.25$, and (c) $T = 0.4$.

Remark:
The random variable $d_{\mathrm{in}}$ follows a binomial distribution only approximately, because (with the lack of multi-edges in our networks) the event that a non-geometric edge of a node is incident to an active (or inactive) node is not entirely independent of another one of this node’s non-geometric edges being incident to an active (or inactive) node. Consequently, if we want to determine the probability that a node outside $\mathcal{N}(I)$ has $k$ non-geometric edges that reach into a contagion cluster (i.e., $P(d_{\mathrm{in}} = k)$), we have to pick $k$ of the node’s $d^{\mathrm{NG}}$ non-geometric edges $\{ e_1, e_2, \ldots, e_k, e_{k+1}, \ldots, e_{d^{\mathrm{NG}} - 1}, e_{d^{\mathrm{NG}}} \}$ and calculate consecutively, for each $\ell \in \{ 1, 2, \ldots, d^{\mathrm{NG}} \}$, the probability that $e_\ell$ is incident to an active (if $\ell \in \{ 1, \ldots, k \}$) or inactive (if $\ell \in \{ k + 1, \ldots, d^{\mathrm{NG}} \}$) node, given the incidence statuses of all $e_j$ with $j < \ell$.

The precise probability of a node having $k$ non-geometric edges that are incident to an active node is thus
\[ P(d_{\mathrm{in}} = k) = \binom{d^{\mathrm{NG}}}{k} \frac{ \left( \prod_{k'=0}^{k-1} \left( q(t) - k' \right) \right) \left( \prod_{k'=0}^{d^{\mathrm{NG}} - 1 - k} \left( N - 1 - q(t) - k' \right) \right) }{ \prod_{k'=0}^{d^{\mathrm{NG}} - 1} \left( N - 1 - k' \right) } . \]
However, for $N \gg k$, it is reasonable to approximate the probability that a given non-geometric edge of an inactive node is incident to an active node as $\frac{q(t)}{N}$ (i.e., the number of active nodes divided by the total number of nodes). Consequently, $d_{\mathrm{in}}$ asymptotically follows the binomial distribution and equation (6.9) is correct asymptotically.
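The quality of the binomial approximation in the remark can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and the parameter values ($N = 2500$, $q(t) = 988$, $d^{\mathrm{NG}} = 5$) are chosen here as an example rather than taken from a specific computation in the paper.

```python
from fractions import Fraction
from math import comb, prod


def exact_pmf(k, d_ng, q, n):
    """Exact P(d_in = k) from the remark (edges drawn without replacement)."""
    active = prod(q - j for j in range(k))                    # k' = 0, ..., k - 1
    inactive = prod(n - 1 - q - j for j in range(d_ng - k))   # k' = 0, ..., d_ng - 1 - k
    total = prod(n - 1 - j for j in range(d_ng))              # k' = 0, ..., d_ng - 1
    return Fraction(comb(d_ng, k) * active * inactive, total)


def binomial_pmf(k, d_ng, q, n):
    """Binomial approximation (6.8) with success probability q / n."""
    p = q / n
    return comb(d_ng, k) * p**k * (1 - p) ** (d_ng - k)


n, q, d_ng = 2500, 988, 5
exact = [exact_pmf(k, d_ng, q, n) for k in range(d_ng + 1)]
approx = [binomial_pmf(k, d_ng, q, n) for k in range(d_ng + 1)]
max_gap = max(abs(float(e) - a) for e, a in zip(exact, approx))
```

The exact probabilities are kept as fractions so that they sum to exactly 1; `max_gap` measures the largest pointwise difference between the two distributions, which is small when $N$ is much larger than $d^{\mathrm{NG}}$.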
7. Conclusions and Discussion.
Networks that have some underlying geometry and include both geometric edges (which are short according to that geometry) and non-geometric edges (which can occur between nodes regardless of their distance from each other) arise in many applications [3], including modeling of human communication and transportation. The spreading of a contagion on such a network can be influenced heavily by the underlying geometry, and it is useful to investigate the strength of such influence.

To study this problem, we considered a family of networks whose nodes lie on a 2D torus and whose edges include both geometric edges (which are deterministic and connect nodes that are close to each other on the torus) and non-geometric edges (which are formed randomly and can connect nodes that are far from each other on the torus). Using the Watts threshold model, we investigated the spreading behavior of contagions on this family of networks. We did so by mapping network nodes to a high-dimensional point cloud via a contagion map (following Taylor et al. [45]) and analyzing the structure of this point cloud from three perspectives: geometrically, topologically, and in terms of dimensionality. To examine the point cloud’s geometry and dimensionality, we calculated a Pearson correlation coefficient and the embedding dimension, which are well-established measures and easy to compute. To study the topology of the point cloud, we computed persistent homology of the Vietoris–Rips filtration and then calculated the Wasserstein distance between the corresponding barcode and a reference barcode. This was the most challenging and time-consuming part of our work, as algorithms for the computation of PH are computationally expensive and software development in the field is still relatively young and evolving. A single run on a contagion map took several hours to finish for some parameter combinations. We therefore restricted ourselves to computing PH in dimension 1, although PH in dimension 2 may also be insightful for our problem. In our analysis of the topological structure of the point clouds, we also illustrated the sensitivity of the Wasserstein distance to the overall scale of barcodes and the ensuing need to correct for geometric
Fig. 15. Our numerical results for (a) geometry, (b) topology, and (c) dimensionality superimposed on the bifurcation diagram for WFP and ANC for a Kleinberg-like small-world network with geometric neighbors up to distance $r = \sqrt{2}$ around a node (i.e., geometric degree $d_{\mathrm{G}} = 8$). The transitions between qualitatively different structures according to the numerical results align well with the curves for $T_{\mathrm{WFP}}$ and $T_{\mathrm{ANC}}$. In the absence of WFP, our numerical results cannot distinguish between the presence and absence of ANC. Additionally, in the presence of WFP, the values of our quantifiers that suggest the presence of a toroidal structure of a point cloud are somewhat large even in the region in which we do not expect any ANC, and they become weaker for progressively larger values of $d_{\mathrm{NG}}$. This is a sensible observation, as one can expect toroidal structure only when WFP is present, and such toroidal structure is disturbed by the presence of ANC to an extent that depends on the rate of ANC.

Acknowledgements.
We thank Florian Klimm, Dane Taylor, and two anonymous referees for helpful comments. We are especially thankful to Heather Harrington for numerous helpful discussions at an early stage of this project.
REFERENCES
[1] D. Balcan, V. Colizza, B. Gonçalves, H. Hu, J. J. Ramasco, and A. Vespignani, Multiscale mobility networks and the spatial spreading of infectious diseases, Proceedings of the National Academy of Sciences of the United States of America, 106 (2009), pp. 21484–21489.
[2] M. Barthelemy, Spatial networks, Physics Reports, 499 (2011), pp. 1–101.
[3] M. Barthelemy, Morphogenesis of Spatial Networks, Springer International Publishing, Cham, Switzerland, 2018.
[4] M. Belkin and P. Niyogi, Laplacian eigenmaps for dimensionality reduction and data representation, Neural Computation, 15 (2002), pp. 1373–1396.
[5] M. Boguñá, F. Papadopoulos, and D. Krioukov, Sustaining the Internet with hyperbolic mapping, Nature Communications, 1 (2010), 62.
[6] D. Brockmann and D. Helbing, The hidden geometry of complex, network-driven contagion phenomena, Science, 342 (2013), pp. 1337–1342.
[7] D. Centola, The spread of behavior in an online social network experiment, Science, 329 (2010), pp. 1194–1198.
[8] D. Centola, M. W. Macy, and V. M. Eguíluz, Cascade dynamics of complex propagation, Physica A, 374 (2007), pp. 449–456.
[9] R. R. Coifman, S. Lafon, A. B. Lee, M. Maggioni, B. Nadler, F. Warner, and S. W. Zucker, Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps, Proceedings of the National Academy of Sciences of the United States of America, 102 (2005), pp. 7426–7431.
[10] D. J. Corsi, M. H. Boyle, S. A. Lear, C. K. Chow, K. K. Teo, and S. V. Subramanian, Trends in smoking in Canada from 1950 to 2011: Progression of the tobacco epidemic according to socioeconomic status and geography, Cancer Causes and Control, 25 (2014), pp. 45–57.
[11] T. F. Cox and M. A. Cox, Multidimensional Scaling, CRC Press, Boca Raton, FL, USA, 2010.
[12] R. Ebrahimi, J. Gao, G. Ghasemiesfeh, and G. Schoenebeck, Complex contagions in Kleinberg’s small world model, in Proceedings of the 2015 Conference on Innovations in Theoretical Computer Science, ACM, 2015, pp. 63–72.
[13] H. Edelsbrunner and J. L. Harer, Computational Topology: An Introduction, American Mathematical Society, Providence, RI, USA, 2010.
[14] H. Edelsbrunner and J. L. Harer, Persistent homology – A survey, in Surveys on Discrete and Computational Geometry. Twenty years later, J. E. Goodman, J. Pach, and R. Pollack, eds., vol. 453 of Contemporary Mathematics, American Mathematical Society, 2008, pp. 257–282.
[15] M. Feng and M. A. Porter, Persistent homology of geospatial data: A case study with voting, http://arxiv.org/abs/1902.05911 (2019).
[16] S. Gerber, T. Tasdizen, and R. Whitaker, Robust non-linear dimensionality reduction using successive 1-dimensional Laplacian Eigenmaps, Proceedings of the 24th international conference on Machine learning – ICML ’07, (2007), pp. 281–288.
[17] G. Ghasemiesfeh, R. Ebrahimi, and J. Gao, Complex contagion and the weakness of long ties in social networks: revisited, Proceedings of the fourteenth ACM Conference on Electronic Commerce, 1 (2013), pp. 507–524.
[18] M. Granovetter, Threshold models of collective behavior, American Journal of Sociology, 83 (1978), pp. 1420–1443.
[19] J. Guckenheimer and P. Holmes, Nonlinear Oscillations, Dynamical Systems, and Bifurcations of Vector Fields, Springer-Verlag, Berlin, Germany, 1983.
[20] G. H. Hardy, On the expression of a number as the sum of two squares, Quarterly Journal of Mathematics, 46 (1915), pp. 263–283.
[21] K. D. Harris, C. M. Danforth, and P. S. Dodds, Dynamical influence processes on networks: General theory and applications to social contagion, Physical Review E, 88 (2014), 022816.
[22] P. Hedström, Contagious collectivities: On the spatial diffusion of Swedish trade unions, 1890-1940, American Journal of Sociology, 99 (1994), pp. 1157–1179.
[23] P. D. Hoff, A. E. Raftery, and M. S. Handcock, Latent space approaches to social network analysis, Journal of the American Statistical Association, 97 (2002), pp. 1090–1098.
[24] J. S. Juul and M. A. Porter, Hipsters on networks: How a small group of individuals can lead to an antiestablishment majority, Physical Review E, 99 (2019), 022313.
[25] J. S. Juul and M. A. Porter, Synergistic effects in threshold models on networks, Chaos, 28 (2018), 013115.
[26] M. Kerber, D. Morozov, and A. Nigmetov, Geometry helps to compare persistence diagrams, Journal of Experimental Algorithmics, 22 (2017), 1.4.
[27] I. Z. Kiss, J. C. Miller, and P. L. Simon, Mathematics of Epidemics on Networks: From Exact to Approximate Models, Springer-Verlag, Berlin, Germany, 2017.
[28] J. M. Kleinberg, Navigation in a small world, Nature, 406 (2000), p. 845.
[29] J. M. Kleinberg, The small-world phenomenon: An algorithmic perspective, in Proceedings of the Thirty-Second Annual ACM Symposium on Theory of Computing, ACM, 2000, pp. 163–170.
[30] S. Lehmann and Y.-Y. Ahn, Complex Spreading Phenomena in Social Systems, Springer International Publishing, Cham, Switzerland, 2018.
[31] S. Melnik, J. A. Ward, J. P. Gleeson, and M. A. Porter, Multi-stage complex contagions, Chaos, 23 (2013), 013124.
[32] M. E. J. Newman, Models of the small world, Journal of Statistical Physics, 101 (2000), pp. 819–841.
[33] M. E. J. Newman, Networks, Oxford University Press, second ed., 2018.
[34] S.-W. Oh and M. A. Porter, Complex contagions with timers, Chaos, 28 (2018), 033101.
[35] N. Otter, M. A. Porter, U. Tillmann, P. Grindrod, and H. A. Harrington, A roadmap for the computation of persistent homology, European Physical Journal – Data Science, 6 (2017), 17.
[36] R. Pastor-Satorras, C. Castellano, P. Van Mieghem, and A. Vespignani, Epidemic processes in complex networks, Reviews of Modern Physics, 87 (2015), pp. 925–979.
[37] M. A. Porter, Small-world network, Scholarpedia, 7 (2012), 1739.
[38] M. A. Porter and J. P. Gleeson, Dynamical Systems on Networks: A Tutorial, Frontiers in Applied Dynamical Systems: Reviews and Tutorials, Springer International Publishing, Vol. 4, 2016.
[39] C. J. Rhodes and R. M. Anderson, Epidemic thresholds and vaccination in a lattice model of disease spread, Theoretical Population Biology, 52 (1997), pp. 101–118.
[40] E. M. Rogers, Diffusion of Innovations, Simon and Schuster, New York City, NY, USA, 2010.
[41] M. Á. Serrano, M. Boguñá, and F. Sagués, Uncovering the hidden geometry behind metabolic networks, Molecular BioSystems, 8 (2012), pp. 843–850.
[42] G. W. Shannon, G. F. Pyle, and R. L. Bashshur, The Geography of AIDS: Origins and Course of an Epidemic, The Guilford Press, New York, NY, USA, 1991.
[43] V. de Silva and R. Ghrist, Coverage in sensor networks via persistent homology, Algebraic & Geometric Topology, Mathematical Sciences Publishers, 2007, pp. 339–358.
[44] C. O. S. Sorzano, J. Vargas, and A. P. Montano, A survey of dimensionality reduction techniques, http://arxiv.org/abs/1403.2877, (2014).
[45] D. Taylor, F. Klimm, H. A. Harrington, M. Kramár, K. Mischaikow, M. A. Porter, and P. J. Mucha, Topological data analysis of contagion maps for examining spreading processes on networks, Nature Communications, 6 (2015), 7723.
[46] J. B. Tenenbaum, V. De Silva, and J. C. Langford, A global geometric framework for nonlinear dimensionality reduction, Science, 290 (2000), pp. 2319–2324.
[47] T. W. Valente, Network models of the diffusion of innovations, Computational & Mathematical Organization Theory, 2 (1996), pp. 163–164.
[48] D. J. Watts, A simple model of global cascades on random networks, Proceedings of the National Academy of Sciences of the United States of America, 99 (2002), pp. 5766–5771.
[49] D. J. Watts and S. H. Strogatz, Collective dynamics of ‘small-world’ networks, Nature, 393 (1998), pp. 440–442.
[50] X. J. Xu, X. Zhang, and J. F. F. Mendes, Impacts of preference and geography on epidemic spreading, Physical Review E, 76 (2007), 056109.
[51] A. Zomorodian and G. Carlsson, Computing persistent homology, Discrete and Computational Geometry, 33 (2005), pp. 249–274.

Supplementary Materials.
We present the mathematical background for the methodology that we used in the main text to construct a topological measure for how closely the point clouds that we obtain from contagion maps resemble a torus (see Section 4.3 of the main text). For proofs and further discussion of this theory, see [13]. For a condensed and accessible introduction, see [35].
8. Simplicial Homology.
Definition. An abstract finite simplicial complex is a finite collection $\Sigma$ of finite sets that is closed under inclusion: whenever $\alpha \in \Sigma$ and $\beta \subseteq \alpha$, it follows that $\beta \in \Sigma$. The elements of $\Sigma$ are called simplices. A face of a simplex $\alpha$ is a non-empty proper subset $\beta \subset \alpha$. The dimension of a simplex $\alpha \in \Sigma$ is $|\alpha| - 1$. The 0-dimensional simplices are called vertices, and we denote the set of vertices by $V(\Sigma)$. The dimension of a simplicial complex is the maximum of the dimensions of the simplices that it contains. A simplicial subcomplex $\Omega \subseteq \Sigma$ of a simplicial complex $\Sigma$ is a subcollection of simplices that is itself a simplicial complex. For $n \in \mathbb{N}$, the $n$-skeleton of a simplicial complex is the union of its simplices of dimensions $m \le n$.

To each simplex, we can assign a polytope, which is called its geometric realization. A 0-simplex corresponds to a vertex, a 1-simplex corresponds to an edge, a 2-simplex corresponds to a triangle, a 3-simplex corresponds to a tetrahedron, and so on. One can thereby represent a simplicial complex $\Sigma$ as a subset of the simplex that is spanned by its vertices. See page 53 of [14], and see Figure 16 for examples of geometric realizations.

Fig. 16. Geometric realizations of (a) a 0-simplex, (b) a 1-simplex, (c) a 2-simplex, (d) a 3-simplex, and (e) a 3D simplicial complex.

Let $\Sigma$ be a simplicial complex, let $k$ be an integer, and let $\mathbb{F}$ be some field. A $k$-chain is a linear combination of $k$-simplices in $\Sigma$ over $\mathbb{F}$. We can turn the set $C_k$ of $k$-chains into a vector space by defining addition and scalar multiplication to be component-wise. In topological data analysis, the most common field is $\mathbb{Z}/2\mathbb{Z}$; in this case, one can construe a $k$-chain to be a collection of $k$-simplices in $\Sigma$. When working over $\mathbb{Z}/2\mathbb{Z}$, addition is equivalent to taking the symmetric difference. It is straightforward to check that $C_k$ satisfies the axioms of a vector space with this definition of addition and scalar multiplication and that the 0-vector is the empty set.

The boundary $\partial_k(\alpha)$ of a $k$-simplex $\alpha$ is the alternating sum of its $(k-1)$-dimensional faces. The boundary of a $k$-chain is the sum of the boundaries of its simplices. The boundary of a $k$-chain is a $(k-1)$-chain, so $\partial_k$ defines a function $\partial_k : C_k \to C_{k-1}$.
This function commutes with vector addition and scalar multiplication on $C_k$. That is, $\partial_k$ is a linear map; it is called the boundary operator. We thus have a sequence of vector spaces that are connected by boundary operators:

$$\cdots \xrightarrow{\partial_{k+2}} C_{k+1} \xrightarrow{\partial_{k+1}} C_k \xrightarrow{\partial_k} C_{k-1} \xrightarrow{\partial_{k-1}} \cdots. \tag{8.1}$$

A $k$-chain in the image $B_k$ of $\partial_{k+1}$ is called a $k$-boundary. A $k$-chain in the kernel $Z_k$ of $\partial_k$ is called a $k$-cycle (see Figure 17). A fundamental property of the boundary operator is that the boundary of a boundary is empty. Consequently, the sequence (8.1) is a chain complex.

Lemma. For any integer $k$ and $(k+1)$-chain $d \in C_{k+1}$, we have that $\partial_k \partial_{k+1}(d) = 0$.

That is, the $k$th boundary space $B_k$ is a subspace of the $k$th cycle space $Z_k$, so the following definition makes sense.

Definition. Given a simplicial complex $\Sigma$ and an integer $k$, the $k$th homology $H_k(\Sigma)$ is the quotient vector space of the $k$th cycle space $Z_k(\Sigma)$ by the $k$th boundary space $B_k(\Sigma)$:

$$H_k(\Sigma) = Z_k(\Sigma) / B_k(\Sigma).$$

The $k$th Betti number $\beta_k(\Sigma)$ is the dimension of the $k$th homology of $\Sigma$:

$$\beta_k(\Sigma) = \dim H_k(\Sigma) = \dim Z_k(\Sigma) - \dim B_k(\Sigma).$$

Two $k$-cycles represent the same element of the $k$th homology $H_k$ if they differ only by $k$-boundaries. Roughly speaking, $\beta_n(\Sigma)$ is the number of $n$-dimensional ‘holes’ of the space $\Sigma$. For example, $\beta_0(\Sigma)$ is the number of connected components, $\beta_1(\Sigma)$ is the number of ‘tunnels’, and $\beta_2(\Sigma)$ is the number of ‘voids’ of the geometric realization of $\Sigma$.

Fig. 17. A chain complex consists of a sequence of chain, cycle, and boundary spaces that are connected by boundary operators.
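The definition of Betti numbers above can be made concrete with a small computation. The following is a minimal sketch (not the software used in the main text) that works over $\mathbb{Z}/2\mathbb{Z}$, stores each boundary as a bit vector, and uses the identity $\beta_k = \dim C_k - \operatorname{rank} \partial_k - \operatorname{rank} \partial_{k+1}$. The function names `closure`, `rank_gf2`, and `betti_numbers` are hypothetical, as is the choice to describe a complex by its maximal simplices.

```python
from itertools import combinations


def closure(top_simplices):
    """Return every non-empty face of the given simplices, as sorted tuples."""
    faces = set()
    for simplex in top_simplices:
        vertices = tuple(sorted(simplex))
        for size in range(1, len(vertices) + 1):
            faces.update(combinations(vertices, size))
    return faces


def rank_gf2(rows):
    """Rank over Z/2Z of a matrix whose rows are ints used as bit vectors."""
    pivots = {}  # position of leading bit -> reduced row with that leading bit
    for row in rows:
        while row:
            lead = row.bit_length() - 1
            if lead not in pivots:
                pivots[lead] = row
                break
            row ^= pivots[lead]  # addition over Z/2Z is XOR
    return len(pivots)


def betti_numbers(top_simplices):
    """Betti numbers [beta_0, beta_1, ...] over Z/2Z of the generated complex."""
    simplices = closure(top_simplices)
    top_dim = max(len(s) for s in simplices) - 1
    by_dim = [sorted(s for s in simplices if len(s) == k + 1)
              for k in range(top_dim + 1)]
    # rank[k] = rank of the boundary operator from C_k to C_{k-1}; rank[0] = 0.
    rank = [0] * (top_dim + 2)
    for k in range(1, top_dim + 1):
        index = {face: n for n, face in enumerate(by_dim[k - 1])}
        boundaries = []
        for simplex in by_dim[k]:
            bits = 0
            for face in combinations(simplex, k):
                bits |= 1 << index[face]
            boundaries.append(bits)
        rank[k] = rank_gf2(boundaries)
    # dim Z_k = dim C_k - rank[k] and dim B_k = rank[k + 1].
    return [len(by_dim[k]) - rank[k] - rank[k + 1] for k in range(top_dim + 1)]


hollow_triangle = [(0, 1), (1, 2), (0, 2)]
filled_triangle = [(0, 1, 2)]
# A 7-vertex triangulation of the torus (vertex labels taken modulo 7).
torus = ([(i, (i + 1) % 7, (i + 3) % 7) for i in range(7)]
         + [(i, (i + 2) % 7, (i + 3) % 7) for i in range(7)])

print(betti_numbers(hollow_triangle))  # [1, 1]
print(betti_numbers(filled_triangle))  # [1, 0, 0]
print(betti_numbers(torus))            # [1, 2, 1]
```

The torus example has one connected component, two ‘tunnels’, and one ‘void’, which is the signature that one looks for in a point cloud with toroidal structure.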
Persistent Homology.
Definition. A filtration of a finite simplicial complex $\Sigma$ is a nested sequence of simplicial subcomplexes of $\Sigma$, such that the 0th member of the sequence is the empty complex and the last member of it is all of $\Sigma$. That is,

$$\emptyset = F_0 \subseteq F_1 \subseteq \cdots \subseteq F_n = \Sigma.$$

Given a filtration $F_0 \subseteq F_1 \subseteq \cdots \subseteq F_n$ of a simplicial complex $\Sigma$, for every $i \le j$ and dimension $k$, the inclusion map from $F_i$ to $F_j$ induces a linear map

$$f_k^{i,j} : H_k(F_i) \to H_k(F_j).$$

Therefore, there is a sequence of homologies that are related via these linear maps:

$$0 = H_k(F_0) \to H_k(F_1) \to \cdots \to H_k(F_n) = H_k(\Sigma).$$

One can track the evolution of $\Sigma$ along the filtration through the algebraic structures of the homologies in this sequence.

We can generalize the notion of homology in the setting of a filtration of a simplicial complex.
Definition. For an integer $k$, the $k$th persistent homologies (PH) are the images of the linear maps induced by inclusion:

$$H_k^{i,j} = \operatorname{im} f_k^{i,j}, \qquad 0 \le i \le j \le n, \quad i, j \in \mathbb{N}.$$

The $k$th persistent Betti numbers are the dimensions of these spaces:

$$\beta_k^{i,j} = \dim(H_k^{i,j}).$$

Using Definition 8.5, we can formalize the notion of birth and death of a homology class.
Definition. A nonzero homology class $\xi \in H_k(F_i)$ is born at $F_i$ if $\xi \notin H_k^{i-1,i}$, and it dies at $F_j$ if $f_k^{i,j-1}(\xi) \ne 0$ but $f_k^{i,j}(\xi) = 0$.

For $0 \le i \le j \le n$, one can construe the $k$th PH $H_k^{i,j}$ as the space that consists of all homology classes that are born at or before $F_i$ and are still alive at $F_j$.

We can now finally give a mathematically rigorous definition of persistence.

Definition. Let $\emptyset = F_0 \subseteq F_1 \subseteq \cdots \subseteq F_n = \Sigma$ be a filtration of a simplicial complex $\Sigma$, and let $k$ be an integer. If $\xi \in H_k(F_i)$ is a homology class that is born at $F_i$ and dies at $F_j$, then its persistence (also known as its ‘lifespan’) is the difference $\operatorname{pers}(\xi) = j - i$. If $\xi$ never dies, we note its death as infinite (i.e., $j = \infty$), making its persistence infinite as well (i.e., $\operatorname{pers}(\xi) = \infty$). The interval $[i, j)$ is called a persistence interval.

Barcodes, Persistence Diagrams, and Wasserstein Distance.
Given a filtration of a simplicial complex, the birth or death of a homology class is accompanied by a change of the topological characteristics of the complex. If there are only births (respectively, only deaths) at $j$, then the $j$th Betti number increases (respectively, decreases). Therefore, tracking the change of Betti numbers during a filtration is useful for monitoring the topological evolution of the growing complex. A topological feature that emerges with the birth of a homology class $\xi$ and disappears with the death of $\xi$ is said to ‘correspond to’ $\xi$ and has persistence $\operatorname{pers}(\xi)$. The features that persist for a long time interval are usually considered to be the important features of the complex, although this is not always the case. (For a discussion, see [15].)

One can view the collection of persistence intervals $[i, j)$ as the filtered analog of Betti numbers. Two ways to represent them are with a barcode or with a persistence diagram (see Figure 18).
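To illustrate births, deaths, and persistence intervals, the following minimal sketch computes the persistence intervals of a small filtration over $\mathbb{Z}/2\mathbb{Z}$ with the standard left-to-right column reduction of the boundary matrix. It is an illustration under stated assumptions rather than the software used in the main text: the names `barcode` and `persistent_betti` are hypothetical, the filtration is passed as a list of the simplices that are added at each step, and every face of a simplex is assumed to enter the filtration no later than the simplex itself.

```python
import math
from itertools import combinations


def barcode(steps):
    """Persistence intervals of a filtration over Z/2Z.

    steps[i - 1] lists the simplices that are added when passing from
    F_{i-1} to F_i (with F_0 empty). Returns {dimension: [(birth, death), ...]}.
    """
    # Total order on simplices: by filtration index, faces before cofaces.
    order = []
    for i, added in enumerate(steps, start=1):
        for simplex in sorted((tuple(sorted(s)) for s in added), key=len):
            order.append((simplex, i))
    position = {simplex: n for n, (simplex, _) in enumerate(order)}

    # Column j of the boundary matrix: positions of the faces of simplex j.
    columns = [
        {position[f] for f in combinations(s, len(s) - 1)} if len(s) > 1 else set()
        for s, _ in order
    ]

    # Column reduction; addition over Z/2Z is the symmetric difference.
    owner = {}  # largest row index of a reduced column -> index of that column
    for j, column in enumerate(columns):
        while column and max(column) in owner:
            column ^= columns[owner[max(column)]]
        if column:
            owner[max(column)] = j

    bars = {}
    for j, (simplex, birth) in enumerate(order):
        if columns[j]:
            continue  # this simplex kills a class instead of creating one
        death = order[owner[j]][1] if j in owner else math.inf
        if death > birth:  # skip classes that die in the step that creates them
            bars.setdefault(len(simplex) - 1, []).append((birth, death))
    return {k: sorted(v) for k, v in bars.items()}


def persistent_betti(bars, k, i, j):
    """Number of k-dimensional classes born at or before F_i and alive at F_j."""
    return sum(1 for birth, death in bars.get(k, []) if birth <= i and death > j)


# F_1: three vertices; F_2: two edges; F_3: closing edge; F_4: filled triangle.
steps = [[(0,), (1,), (2,)], [(0, 1), (1, 2)], [(0, 2)], [(0, 1, 2)]]
bars = barcode(steps)
print(bars)  # {0: [(1, 2), (1, 2), (1, inf)], 1: [(3, 4)]}
```

In this example, two connected components die at step 2 when the first edges appear, one component never dies, and the loop that is born at step 3 has persistence $4 - 3 = 1$ because the triangle fills it at step 4.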
Definition. The persistence diagram of the PH (at a given dimension) of a filtered simplicial complex is the multiset

$$\{(i, j) \in \overline{\mathbb{R}}^2 \mid [i, j) \text{ is a persistence interval}\} \cup \{(i, j) \in \overline{\mathbb{R}}^2 \mid i = j\} \subseteq \overline{\mathbb{R}}^2,$$

where $\overline{\mathbb{R}} = \mathbb{R} \cup \{\infty\}$.

Note that all persistence diagrams have equal cardinality. For a given dimension, the collection of persistence intervals of the PH of a filtration is called a barcode (see Figure 18).

Fig. 18. (a) An example of a barcode. Each horizontal bar indicates the lifespan of a homology class. (b) Equivalent persistence diagram. Points in the extended plane $\overline{\mathbb{R}}^2$, where $\overline{\mathbb{R}} = \mathbb{R} \cup \{\infty\}$, mark the persistence intervals.

The definition of barcodes (and, equivalently, the definition of persistence diagrams) depends on a choice of basis for the homology spaces $H_k(F_i)$. The Fundamental Theorem of Persistent Homology [51] guarantees that there exists a choice of basis that defines barcodes uniquely.

One can turn the space of persistence diagrams (equivalently, the space of barcodes) into a metric space by defining the following notion of distance between two persistence diagrams.

Definition. Given two persistence diagrams $D_1$ and $D_2$, a metric $d$ on $\overline{\mathbb{R}}^2$, and a number $p \in [1, \infty]$, the $p$th Wasserstein distance between $D_1$ and $D_2$ is

$$W_p[d](D_1, D_2) := \inf_{\phi : D_1 \to D_2} \left[ \sum_{x \in D_1} d[x, \phi(x)]^p \right]^{1/p},$$

where $\phi$ ranges over all bijections from $D_1$ to $D_2$. If $p = \infty$ and $d = L_\infty$, where $L_\infty((x_1, y_1), (x_2, y_2)) = \sup\{|x_1 - x_2|, |y_1 - y_2|\}$, the Wasserstein distance $W_\infty[L_\infty]$ is called the bottleneck distance.

One property of PH that is central to its utility in applications is that it is stable and therefore robust to noise: A small perturbation of input data induces only a small change in the resulting persistence diagram.
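For very small diagrams, one can evaluate the Wasserstein distance directly from its definition. The sketch below is a brute-force illustration and not the algorithm used for the computations in the main text. It assumes that $d = L_\infty$, that all listed intervals are finite, and that the diagonal is represented by one spare slot per off-diagonal point of the other diagram, so that a point can be matched to its nearest diagonal point at cost $(\text{death} - \text{birth})/2$. The names `l_inf`, `to_diagonal`, and `wasserstein` are hypothetical.

```python
import math
from itertools import permutations


def l_inf(x, y):
    """L-infinity distance between two points (birth, death) of the plane."""
    return max(abs(x[0] - y[0]), abs(x[1] - y[1]))


def to_diagonal(x):
    """L-infinity distance from x = (birth, death) to the nearest diagonal point."""
    return abs(x[1] - x[0]) / 2


def wasserstein(d1, d2, p=2):
    """p-th Wasserstein distance between two small diagrams of finite points.

    The search runs over all (len(d1) + len(d2))! matchings, so it is only
    feasible for a handful of points.
    """
    n, m = len(d1), len(d2)
    size = n + m  # rows n..size-1 and columns m..size-1 are diagonal slots

    def cost(i, j):
        if i < n and j < m:
            return l_inf(d1[i], d2[j])
        if i < n:
            return to_diagonal(d1[i])
        if j < m:
            return to_diagonal(d2[j])
        return 0.0  # a diagonal point matched to a diagonal point

    best = math.inf
    for matching in permutations(range(size)):
        costs = [cost(i, j) for i, j in enumerate(matching)]
        if p == math.inf:
            total = max(costs, default=0.0)
        else:
            total = sum(c ** p for c in costs) ** (1 / p)
        best = min(best, total)
    return best


diagram_a = [(0, 4)]
diagram_b = [(0, 3), (1, 1.5)]
print(wasserstein(diagram_a, diagram_b, p=1))         # 1.25
print(wasserstein(diagram_a, diagram_b, p=math.inf))  # 1.0 (bottleneck distance)
```

In this example, the long bar $(0, 4)$ is matched to $(0, 3)$ at cost 1, and the short bar $(1, 1.5)$ is matched to the diagonal at cost 0.25. Rescaling both diagrams by a common factor rescales the distance by the same factor, which is one way to see the sensitivity to the overall scale of barcodes.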
Čech Complex and Vietoris–Rips Complex.
Persistent homology is a useful tool for analyzing point-cloud data. A point cloud is a set of points, $P = \{x_1, x_2, \ldots, x_l\} \subseteq M$, in a metric space $(M, d)$. One can view $P$ as a sample from some subspace of $M$. There are various ways to construct a simplicial complex from a point cloud. Two of the most common constructions are the Čech complex and the Vietoris–Rips complex, which we now describe.

Given a non-negative number $\epsilon$, recall that the closed ball $B_\epsilon(x)$ of radius $\epsilon$ around a point $x \in M$ is the set of points within distance $\epsilon$ from $x$; that is, $B_\epsilon(x) = \{y \in M : d(x, y) \le \epsilon\}$.

Definition. Let $(M, d)$ be a metric space, and let $P = \{x_1, x_2, \ldots, x_l\} \subseteq M$ be a point cloud in $M$. For $\epsilon \ge 0$, the Čech complex $C_\epsilon$ at $\epsilon$ associated with $P$ is the simplicial complex whose simplices are sets of points in $P$ whose closed $(\epsilon/2)$-balls have non-empty intersection. The set $\{y_1, \ldots, y_k\} \subseteq P$ is a $(k-1)$-simplex of $C_\epsilon$ if and only if

$$\bigcap_{i=1}^{k} B_{\epsilon/2}(y_i) \ne \emptyset.$$

This set of simplices is closed under taking subsets. That is, the conditions for a simplicial complex are indeed satisfied by this definition. The 0-simplices of $C_\epsilon$ correspond precisely to the points $x_1, x_2, \ldots, x_l$; the 1-simplices are the pairs of points that are within distance $\epsilon$ of each other; the 2-simplices are the triples of points whose $(\epsilon/2)$-balls have a point of common intersection; and so on. In particular, $C_0$ is the set of points in $P$. For sufficiently large $\epsilon$ (to be precise, for $\epsilon$ at least twice the diameter of $P$, so that every $(\epsilon/2)$-ball contains all of $P$), the Čech complex $C_\epsilon$ is the $(l-1)$-simplex $\{x_1, x_2, \ldots, x_l\}$ together with all of its faces. If $\epsilon_1 \le \epsilon_2$, then $C_{\epsilon_1} \subseteq C_{\epsilon_2}$, and increasing $\epsilon$ incrementally from 0 to a large enough value gives a filtration of Čech complexes associated with the point cloud $P$:

$$\emptyset \subseteq C_0 \subseteq \cdots \subseteq C_{\epsilon_{\mathrm{large}}}.$$

For $M = \mathbb{R}^n$, the following result, known as the Nerve Theorem, states that the Čech complex $C_\epsilon$ associated with a point cloud $P$ is topologically faithful to the union of the closed $(\epsilon/2)$-balls around the points in $P$ in the sense that it has the same homotopy type. Intuitively, two spaces have the same homotopy type if one can transform them into each other by bending, compressing, and expanding them (without having to do any cutting or gluing).

Theorem. For a point cloud $P = \{x_1, x_2, \ldots, x_l\} \subseteq \mathbb{R}^n$ and $\epsilon \ge 0$, the Čech complex $C_\epsilon$ is homotopy equivalent to the union of the closed $(\epsilon/2)$-balls around the points in $P$.
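As a rough illustration of the Čech condition, the sketch below builds the Čech complex of a point cloud in the plane up to dimension 2. It relies on an elementary fact that holds in Euclidean space: closed disks of radius $\epsilon/2$ around a set of points share a point exactly when the smallest disk that encloses those points has radius at most $\epsilon/2$. The names `enclosing_radius` and `cech_complex` are hypothetical, and the restriction to at most three points in $\mathbb{R}^2$ is an assumption that keeps the intersection test simple.

```python
import math
from itertools import combinations


def enclosing_radius(points):
    """Radius of the smallest closed disk that contains 1, 2, or 3 planar points."""
    if len(points) == 1:
        return 0.0
    if len(points) == 2:
        return math.dist(points[0], points[1]) / 2
    a, b, c = sorted(math.dist(p, q) for p, q in combinations(points, 2))
    if c * c >= a * a + b * b:
        # Right, obtuse, or degenerate triangle: the longest side is a diameter.
        return c / 2
    (x1, y1), (x2, y2), (x3, y3) = points
    area = abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)) / 2
    return a * b * c / (4 * area)  # circumradius of an acute triangle


def cech_complex(points, eps):
    """Simplices (as index tuples) of dimension at most 2 of the Cech complex."""
    simplices = []
    for size in (1, 2, 3):
        for indices in combinations(range(len(points)), size):
            if enclosing_radius([points[i] for i in indices]) <= eps / 2:
                simplices.append(indices)
    return simplices


cloud = [(0, 0), (4, 0), (2, 3)]
print((0, 1, 2) in cech_complex(cloud, 4.0))  # False
print((0, 1, 2) in cech_complex(cloud, 4.4))  # True
```

At $\epsilon = 4$, all three edges are present but the triangle is not, because the circumradius $13/6 \approx 2.17$ exceeds $\epsilon/2 = 2$. The three disks of radius 2 then surround a hole, and the Čech complex (a hollow triangle) records this hole, in line with the Nerve Theorem.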
The Nerve Theorem justifies why, when a point cloud $P$ is a sample of some subspace of $M$, the Čech filtration can reveal features of this subspace. We expect that features that have a long lifespan in the Čech filtration are likely to correspond to features of the underlying space.

Whether the balls of a certain radius around a set of points in $P \subseteq M$ have a point of common intersection depends on the entire metric space $M$ and the position of $P$ in it. Checking for a point of common intersection is computationally intensive, so it can be impractical (or even infeasible) to construct the Čech filtration associated with a point cloud.

The following construction of a simplicial complex from a point cloud depends only on the pairwise distances of the points. Therefore, it is computationally more efficient than constructing a Čech filtration and hence more useful in practice.

Definition. Let $(M, d)$ be a metric space, and let $P = \{x_1, x_2, \ldots, x_l\} \subseteq M$ be a point cloud in $M$. For $\epsilon \ge 0$, the Vietoris–Rips (VR) complex $R_\epsilon$ at $\epsilon$ associated with $P$ is the simplicial complex whose simplices are sets of points in $P$ that are pairwise within distance $\epsilon$. The set $\{y_1, \ldots, y_k\} \subseteq P$ is a $(k-1)$-simplex of $R_\epsilon$ if and only if $d(y_i, y_j) \le \epsilon$ for all $i, j \in \{1, \ldots, k\}$.

As with $C_\epsilon$, the 0-simplices of $R_\epsilon$ correspond precisely to the points $x_1, x_2, \ldots, x_l$, and the 1-simplices of $R_\epsilon$ are the pairs of points that are within distance $\epsilon$ of each other. Therefore, the 1-skeleton of $R_\epsilon$ is the same as that of $C_\epsilon$. In the definition of higher-dimensional simplices, only the pairwise distances of points play a role. The simplicial complex $R_\epsilon$ is the maximal simplicial complex one can build on its 1-skeleton; its $k$-simplices are the $(k+1)$-cliques of its 1-skeleton. Consequently, the 1-skeleton of $R_\epsilon$ completely determines the entire simplicial complex. This is an attractive quality from a computational point of view, because it implies that it is possible to store a VR complex as a graph (its 1-skeleton).

If $\epsilon_1 \le \epsilon_2$, then $R_{\epsilon_1} \subseteq R_{\epsilon_2}$; increasing $\epsilon$ incrementally from 0 to a value larger than the maximum distance between any pair of points in a point cloud gives a filtration of VR complexes whose 0th member is the collection of 0-simplices and whose final member is the $(l-1)$-simplex $\{x_1, x_2, \ldots, x_l\}$ together with its faces. That is,

$$\emptyset \subseteq R_0 \subseteq \cdots \subseteq R_{\epsilon_{\mathrm{large}}}.$$

Let’s now return to the special case $M = \mathbb{R}^n$.
Although a VR complex $R_\epsilon$ associated with a point cloud $P \subseteq \mathbb{R}^n$ is not a faithful representation of the union of balls around the points in $P$ and may not even be topologically equivalent to a subspace of $\mathbb{R}^n$, VR complexes provide a good approximation in the light of persistence, as the following lemma, due to de Silva and Ghrist [43], shows.

Lemma. For $M = \mathbb{R}^n$ and any $\epsilon \ge 0$, we have that

$$R_\epsilon \subseteq C_{\epsilon\sqrt{2}} \subseteq R_{\epsilon\sqrt{2}}.$$

Consequently, any topological feature that persists between $R_\epsilon$ and $R_{\epsilon\sqrt{2}}$ in the VR filtration is also a feature of the Čech complex $C_{\epsilon\sqrt{2}}$.
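Because a VR complex is determined by pairwise distances, one can build it by listing the cliques of its 1-skeleton. The following minimal sketch does this by exhaustive search for a point cloud in Euclidean space; it is meant only to illustrate the definition, and dedicated PH software uses far more efficient constructions. The name `vietoris_rips` and the cutoff parameter `max_dim` are hypothetical.

```python
import math
from itertools import combinations


def vietoris_rips(points, eps, max_dim=2):
    """Simplices (as index tuples) of dimension at most max_dim of the VR complex."""
    n = len(points)
    # The 1-skeleton: pairs of points that are within distance eps of each other.
    close = {(i, j) for i, j in combinations(range(n), 2)
             if math.dist(points[i], points[j]) <= eps}
    simplices = [(i,) for i in range(n)]
    for size in range(2, max_dim + 2):
        for indices in combinations(range(n), size):
            # A set of points spans a simplex exactly when it is a clique.
            if all(pair in close for pair in combinations(indices, 2)):
                simplices.append(indices)
    return simplices


cloud = [(0, 0), (4, 0), (2, 3)]
print(len(vietoris_rips(cloud, 3.7)))         # 5
print((0, 1, 2) in vietoris_rips(cloud, 4.0))  # True
```

For these three points, the triangle belongs to $R_4$ because all pairwise distances are at most 4, although the closed disks of radius 2 around the points have no common point (the circumradius of the triangle is $13/6 \approx 2.17$), so the triangle does not belong to $C_4$. It does belong to $C_{4\sqrt{2}}$, because $2\sqrt{2} \approx 2.83 \ge 13/6$, which is consistent with the inclusion $R_\epsilon \subseteq C_{\epsilon\sqrt{2}}$.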