Steady and ranging sets in graph persistence
Mattia G. Bergomi, Massimo Ferri, Antonella Tavaglione
Veos Digital, Milan, Italy
ARCES and Dept. of Mathematics, Univ. of Bologna, Italy

Abstract.
Generalised persistence functions (gp-functions) are defined on $(\mathbb{R}, \le)$-indexed diagrams in a given category. A sufficient condition for stability is also introduced. In the category of graphs, a standard way of producing gp-functions is proposed: steady and ranging sets for a given feature. The example of steady and ranging hubs is studied in depth; their meaning is investigated in three concrete networks.

Keywords:
Persistence, hub, network.
Weighted graphs are a common data structure in many real-world scenarios. It is also customary to make use of persistent homology for analysis, classification, comparison and retrieval. However, this technique is by its very nature limited to the analysis of weighted simplicial complexes. Of course, the graph itself is a one-dimensional complex; however, it often turns out that the relevant information is carried not by its topology, but rather by more concealed graph-theoretical structures. A common choice to overcome this issue is to associate auxiliary simplicial complexes to the graph, see for instance [2]. This strategy has been successfully applied in many interesting applications, e.g. [21,18,23,24,25,7,22,4,26].

It is possible to define and compute persistence in categories other than simplicial complexes or topological spaces [3,1] and, in a different sense, [20,15]. The present paper introduces a further class of generalised persistence functions (gp-functions), defined on $(\mathbb{R}, \le)$-indexed diagrams in a given category, that can be described via persistence diagrams. Additionally, we display a specific way of building gp-functions for filtered graphs, introducing the concepts of steady and ranging sets.

We are therefore rather far from the categorifications of [5,17,19,10], in that we aim to provide a simpler and more agile tool for direct use on graphs (without a passage through simplicial complexes) and possibly on other structures naturally arising from applications.

Section 2 is dedicated to recalling persistence diagrams and categorical persistence functions, and to introducing gp-functions. Section 3 focuses on graphs: it defines balanced gp-functions, for which stability holds, and above all defines steady and ranging sets with respect to given features in a graph; this is the core of the paper. The feature which is studied in depth in Section 4 is that of being a hub, i.e. a vertex whose degree is higher than that of each of its neighbors. This is illustrated in Section 5 by three concrete examples: steady and ranging hubs in a network of airports, in the network of characters of Les Misérables, and in a network of languages. An Appendix contains examples showing that the main gp-functions of the paper are not balanced.
Persistent topology has produced several concepts and tools: barcodes, extended persistence, zig-zag persistence, persistence modules and many more, but in our opinion persistence diagrams are the most effective for analysis and comparison of shapes, where the term “shape” has a very wide meaning. In Section 2.1 we recall their definition in the classical topological context. Section 2.2 is a brief overview of the extension to a broad categorical context, given in [3,1]. Section 2.3 finally contains the main new concept of the present paper: generalised persistence functions. All the following sections will be based on them.
The main objects of study in persistent homology [11] are filtered spaces, i.e. pairs $(X, f)$ where $X$ is a topological space (mostly the space of a simplicial complex) and $f : X \to \mathbb{R}$ is a map called filtering function: sublevel sets $X_u = f^{-1}\big((-\infty, u]\big)$ are compared through the homology morphisms induced by inclusion, in particular through the so-called Persistent Betti Number functions. Out of such a function a persistence diagram (see Def. 1) can be built [8, Sect. 2]; out of the persistence diagram, in turn, the Persistent Betti Number function can be recovered [8].

Persistence diagrams are the most widely used “fingerprints” of filtered spaces. The bottleneck distance between persistence diagrams yields an effective lower bound to distances between filtered spaces; this makes persistence diagrams a powerful tool in shape classification, analysis and retrieval. The strategic advantage of the generalisation started in [3,1] consists in the fact that categorical persistence functions (see Sect. 2.2) can also be represented by persistence diagrams.

In $\mathbb{R} \times (\mathbb{R} \cup \{+\infty\})$ set $\Delta = \{(u, v) \mid u = v\}$, $\Delta^+ = \{(u, v) \mid u < v\}$ and $\bar{\Delta}^+ = \Delta \cup \Delta^+$. In a multiset, the multiplicity of an element is the number of times that the element appears.

Definition 1. [8,6] A persistence diagram $D$ is a multiset of points of $\bar{\Delta}^+$ where every point of the diagonal $\Delta$ appears with infinite multiplicity. The points of $D$ belonging to $\Delta^+$ are called cornerpoints; they are said to be proper if both their coordinates are finite, cornerpoints at infinity otherwise. A persistence diagram is said to be finite if so is its set of cornerpoints. We shall only consider finite persistence diagrams.

Definition 2.
Given persistence diagrams $D, D'$, let $\Gamma$ be the set of all bijections between $D$ and $D'$. We define the bottleneck (formerly matching) distance as the real number

$$d(D, D') = \inf_{\gamma \in \Gamma} \sup_{p \in D} \| p - \gamma(p) \|_\infty .$$

This distance checks the maximum displacement between corresponding points for a given matching, either between cornerpoints of the two diagrams or between cornerpoints and their own projections on the diagonal $\Delta$, and takes the minimum among these maxima. Minima and maxima are actually attained because of the required finiteness.

We briefly recall from [3,1] some definitions that we shall use in the paper.
Definition 3. [3, Def. 3.2] Let $\mathbf{C}$ be a category. A lower-bounded function $p : \mathrm{Morph}(\mathbf{C}) \to \mathbb{Z}$ is a categorical persistence function if, for all $u_1 \to u_2 \to v_1 \to v_2$, the following inequalities hold:
1. $p(u_1 \to v_1) \le p(u_2 \to v_1)$ and $p(u_2 \to v_2) \le p(u_2 \to v_1)$.
2. $p(u_2 \to v_1) - p(u_1 \to v_1) \ge p(u_2 \to v_2) - p(u_1 \to v_2)$.

The archetypal categorical persistence functions are Persistent Betti Numbers (see [11] for their definition and properties). Still, this definition has a much wider range; for instance it includes functions induced by weakly directed properties, e.g. functions counting clique communities, blocks, edge-blocks in a weighted graph [1].
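As an illustration of the two conditions of Def. 3 in the special case of the category $(\mathbb{R}, \le)$, the following minimal sketch tests them on a finite grid of levels. The function names `is_persistence_function` and `barcode_rank` are hypothetical and not taken from the cited works; the barcode-counting function is used only as a familiar example that should pass the check, and a positive answer on a grid is evidence, not a proof.

```python
import math


def is_persistence_function(p, levels):
    """Test conditions 1 and 2 of Def. 3 on all grid quadruples
    u1 <= u2 < v1 <= v2 (assumption: checking a finite grid only)."""
    grid = sorted(set(levels))
    for u1 in grid:
        for u2 in (x for x in grid if x >= u1):
            for v1 in (x for x in grid if x > u2):
                for v2 in (x for x in grid if x >= v1):
                    # Condition 1: non-decreasing in the first argument,
                    # non-increasing in the second.
                    if not (p(u1, v1) <= p(u2, v1) and p(u2, v2) <= p(u2, v1)):
                        return False
                    # Condition 2: the "mixed difference" inequality.
                    if p(u2, v1) - p(u1, v1) < p(u2, v2) - p(u1, v2):
                        return False
    return True


def barcode_rank(bars):
    """Return p(u, v) = number of bars [b, d) with b <= u and v < d."""
    def p(u, v):
        return sum(1 for b, d in bars if b <= u and v < d)
    return p


bars = [(0, 3), (1, 2), (1, math.inf)]
grid = [0, 0.5, 1, 1.5, 2, 3, 4]
ok = is_persistence_function(barcode_rank(bars), grid)
bad = is_persistence_function(lambda u, v: v - u, grid)
```

Here `ok` is expected to hold, while `v - u` fails condition 1 because it increases in the second argument.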
Remark 1.
There is a standard way of associating a persistence diagram to a categorical persistence function; see [3, Sect. 3.9]. By [1, Prop. 1], the discontinuity sets of a categorical persistence function are either vertical or horizontal (possibly unbounded) segments with end-points in the cornerpoints. This means that categorical persistence functions have the appearance of superimposed triangles, typical of Persistent Betti Number functions. In fact, the two conditions of Def. 3 correspond to Prop. 1 and Lemma 1 of [12], where that behaviour of the discontinuities of “size functions” (what would later be called 0-th Persistent Betti Number functions) was studied.
Definition 4. [3] A persistence function is a categorical persistence function on the category $(\mathbb{R}, \le)$.

So a persistence function maps each pair of real numbers $u \le v$ to an integer $p(u, v)$ such that, given $u_1 \le u_2 \le v_1 \le v_2$, the following inequalities hold.
1. $p(u_1, v_1) \le p(u_2, v_1)$ and $p(u_2, v_2) \le p(u_2, v_1)$, that is to say $p$ is non-decreasing in the first argument and non-increasing in the second.
2. $p(u_2, v_1) - p(u_1, v_1) \ge p(u_2, v_2) - p(u_1, v_2)$.

Definition 5. [5, Sect. 1.3] An $(\mathbb{R}, \le)$-indexed diagram is any functor from the category $(\mathbb{R}, \le)$ to an arbitrary category $\mathbf{C}$. $(\mathbb{R}, \le)$-indexed diagrams form a category, $\mathbf{C}^{(\mathbb{R}, \le)}$. The $(\mathbb{R}, \le)$-indexed diagram is said to be monic if all morphisms of its image are monomorphisms of $\mathbf{C}$.

Definition 6. Assume that a map $p$ is given, which assigns to each monic $(\mathbb{R}, \le)$-indexed diagram $M$ in a category $\mathbf{C}$ a categorical persistence function $p_M$ on $(\mathbb{R}, \le)$, such that $p_M = p_{M'}$ for $M$ naturally isomorphic to $M'$. All the resulting categorical persistence functions $p_M$ are called generalised persistence functions in $\mathbf{C}$ (gp-functions for brevity). The map $p$ itself is called a gp-function generator.

Remark 2. The mapping assumed in Def. 6 can be easily shaped into a functor between suitable categories, but we shall not make use of this property.

Every gp-function can be represented by a persistence diagram by the already quoted construction of [3, Sect. 3.9], so gp-functions can be compared through the bottleneck distance of the respective diagrams. Moreover, $(\mathbb{R}, \le)$-indexed diagrams can be compared through the interleaving distance [5, Def. 3.4], extending the interleaving distance in the topological setting [8]. Thus, it makes sense to discuss stability [5, Sect. 5] and universality [17, Sect. 5].
There is a wide class of gp-functions for which stability follows by definition: the gp-functions built by composing a categorical persistence function (on a category $\mathbf{C}$ with finite colimits) with $(\mathbb{R}, \le)$-indexed diagrams in $\mathbf{C}$ [3, Thm. 3.27]. Universality is also guaranteed if $\mathbf{C}$ satisfies suitable conditions [1, Prop. 5].

Remark 3.
Even with a well-behaved category (e.g. Graph), gp-functions might not enjoy stability, since their values may not depend on the single morphisms, but on the structure of the whole $(\mathbb{R}, \le)$-indexed diagram. This will unfortunately be the case for the examples of Sect. 4.

Let Graph be the category having finite simple undirected graphs as objects and injective simplicial applications as morphisms, seen as a subcategory of the category of finite simplicial complexes. In what follows, a graph will be considered as the pair of its vertex set and edge set, i.e. $G = (V, E)$, $G' = (V', E')$ and so on. We consider $(\mathbb{R}, \le)$-indexed diagrams in Graph that are constant on a finite set of left-closed, right-open intervals. Because of the choice of monomorphisms as the only acceptable morphisms, every such $(\mathbb{R}, \le)$-indexed diagram is monic and can be seen, up to natural isomorphisms, as a filtration of a graph $G$ coming from a filtering function $f : V \cup E \to \mathbb{R} \cup \{+\infty\}$. Moreover, we shall limit our study to $(\mathbb{R}, \le)$-indexed diagrams whose associated filtration has no isolated vertices at any level. In other words, the filtering function $f$ takes value $+\infty$ if a vertex is isolated, and the minimum of its values on the edges incident to the vertex otherwise. Thus, $f$ is determined by its restriction to $E$; therefore the weighted graphs considered here are pairs $(G, f)$ with $f : E \to \mathbb{R}$.

A gp-function in Graph (Def. 6) $p_M$, where $M$ is an $(\mathbb{R}, \le)$-indexed diagram, will be denoted $p_{(G,f)}$, where $M$ corresponds to the filtration produced by the weighted graph $(G, f)$. The associated persistence diagram will be denoted by $D(f)$, for the sake of simplicity and if no confusion may occur.

In general, gp-functions are not stable unless they come from a categorical persistence function on Graph, i.e. there is no guarantee that the bottleneck distance between their persistence diagrams be a lower bound for their interleaving distance. All the same, there is a condition (Def. 7) which implies stability in the sense of Thm. 1.
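The filtration convention adopted above for weighted graphs can be made concrete with a small sketch. It assumes, purely for illustration, that a weighted graph is stored as a dictionary mapping each edge (a pair of vertex labels) to its weight; the names `vertex_values` and `sublevel_graph` are hypothetical.

```python
import math


def vertex_values(vertices, f):
    """Extend f from edges to vertices: minimum over the incident edges,
    +infinity for an isolated vertex."""
    values = {x: math.inf for x in vertices}
    for (a, b), weight in f.items():
        values[a] = min(values[a], weight)
        values[b] = min(values[b], weight)
    return values


def sublevel_graph(f, u):
    """Return (vertices, edges) of the subgraph induced by the edges e
    with f(e) <= u; by construction it has no isolated vertices."""
    edges = {e for e, weight in f.items() if weight <= u}
    vertices = {x for e in edges for x in e}
    return vertices, edges


# Toy weighted graph (assumed data): a path a-b-c plus an isolated vertex d.
f = {("a", "b"): 1, ("b", "c"): 2}
values = vertex_values({"a", "b", "c", "d"}, f)
level_1 = sublevel_graph(f, 1)
```

In this toy example the isolated vertex `d` receives the value $+\infty$ and therefore never enters any sublevel graph, in agreement with the requirement that no level contains isolated vertices.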
Definition 7.
Let $p$ be a gp-function generator on Graph. The map $p$ itself and the resulting gp-functions are said to be balanced if the following condition is satisfied. Let $(G, f)$ and $(G', f')$ be two weighted graphs, and $p_{(G,f)}$, $p_{(G',f')}$ their associated gp-functions. If an isomorphism $\psi : G \to G'$ exists such that $\sup_{e \in E} |f(e) - f'(\psi(e))| \le h$ (with $h > 0$), then for all $(u, v) \in \Delta^+$ the inequality $p_{(G,f)}(u - h, v + h) \le p_{(G',f')}(u, v)$ holds.

Let $(G, f)$, $(G', f')$ be as above. Let also $\mathcal{H}$ be the (possibly empty) set of graph isomorphisms between $G$ and $G'$. We can now bring to Graph some definitions given in [14,9,17].
Definition 8.
The natural pseudodistance of $(G, f)$ and $(G', f')$ is

$$\delta\big((G, f), (G', f')\big) = \begin{cases} +\infty & \text{if } \mathcal{H} = \emptyset, \\ \inf_{\phi \in \mathcal{H}} \sup_{e \in E} |f(e) - f'(\phi(e))| & \text{otherwise.} \end{cases}$$

Some simple adjustments of the proof of [9, Thm. 29] and of its preceding lemmas yield the following theorem.

Theorem 1 (Stability).
Let $p$ be a balanced gp-function generator in Graph and $(G, f)$, $(G', f')$ be two weighted graphs. Then we have

$$d\big(D(f), D(f')\big) \le \delta\big((G, f), (G', f')\big),$$

where $D(f)$ and $D(f')$ are the persistence diagrams realized by the gp-functions $p_{(G,f)}$ and $p_{(G',f')}$ respectively. □

Through [13, Thm. 5.8], this also implies stability with respect to the interleaving distance. Universality is generally not granted for stable persistence functions: it needs ad hoc constructions.

3.2 Steady and ranging sets
Given a weighted graph $(G, f)$, any function $F : 2^{V \cup E} \to \{true, false\}$ is called a feature. We call $F$-set any $X \subset V \cup E$ such that $F(X) = true$. Given a real number $u$, we denote by $G_u$ the subgraph of $G$ induced by the edge set $f^{-1}(-\infty, u]$. We shall say that $X \subset V \cup E$ is an $F$-set at level $w \in \mathbb{R}$ if it is an $F$-set of the subgraph $G_w$.

Definition 9.
Let $F$ be a feature. A set $X \subseteq V \cup E$ is a steady $F$-set (s$F$-set for brevity) at $(u, v) \in \Delta^+$ if it is an $F$-set at all levels $w$ with $u \le w \le v$. We call $X$ a ranging $F$-set (r$F$-set) at $(u, v)$ if there exist levels $w \le u$ and $w' \ge v$ at which it is an $F$-set.

Let $S^F_{(G,f)}(u, v)$ be the set of s$F$-sets at $(u, v)$ and let $R^F_{(G,f)}(u, v)$ be the set of r$F$-sets at $(u, v)$.

Remark 4. Of course, steady implies ranging; this is due to the “≤” and “≥” signs in the definitions. With strict inequalities the implication fails. Actually, there are features $F$ for which steady is equivalent to ranging: the ones for which a set can be an $F$-set only in a (possibly unbounded) interval. A simple example is the feature $F$ which assigns true only to singletons consisting of a vertex of a fixed degree.

Lemma 1. If $u \le u' < v' \le v$, then
1. $S^F_{(G,f)}(u, v) \subseteq S^F_{(G,f)}(u', v')$,
2. $R^F_{(G,f)}(u, v) \subseteq R^F_{(G,f)}(u', v')$,
where the equalities hold if $G_u = G_{u'}$ and $G_v = G_{v'}$. Moreover $S^F_{(G,f)}(u, v) = \emptyset = R^F_{(G,f)}(u, v)$ if $G_u = \emptyset$.

Proof. By the very definitions of steady and ranging $F$-set.

Definition 10.
Let $F$ be a feature. For any graph $G$ and any filtering function $f : E \to \mathbb{R}$, we define $\sigma^F_{(G,f)} : \Delta^+ \to \mathbb{R}$ as the function which assigns to $(u, v) \in \Delta^+$ the number $|S^F_{(G,f)}(u, v)|$, and $\varrho^F_{(G,f)} : \Delta^+ \to \mathbb{R}$ as the function which assigns to $(u, v) \in \Delta^+$ the number $|R^F_{(G,f)}(u, v)|$. We denote by $\sigma^F$ and $\varrho^F$ the maps assigning $\sigma^F_{(G,f)}$ and $\varrho^F_{(G,f)}$ respectively to the $(\mathbb{R}, \le)$-indexed diagram corresponding to $(G, f)$.

Proposition 1.
The maps $\sigma^F$ and $\varrho^F$ are gp-function generators.

Proof. We prove conditions 1 and 2 of Def. 3, recalling that the source category is $(\mathbb{R}, \le)$, so the existence of a morphism $u \to v$ (with $u \ne v$) simply means that $u < v$. Assume $u_1 < u_2 < v_1 < v_2$. Let $(G, f)$ be any weighted graph.

– (Condition 1 for $\sigma^F$) By Lemma 1, $S^F_{(G,f)}(u_1, v_1) \subseteq S^F_{(G,f)}(u_2, v_1)$, so $|S^F_{(G,f)}(u_1, v_1)| \le |S^F_{(G,f)}(u_2, v_1)|$. Also $S^F_{(G,f)}(u_2, v_2) \subseteq S^F_{(G,f)}(u_2, v_1)$ and $|S^F_{(G,f)}(u_2, v_2)| \le |S^F_{(G,f)}(u_2, v_1)|$.

– (Condition 2 for $\sigma^F$) By Lemma 1, $S^F_{(G,f)}(u_1, v_1) \subseteq S^F_{(G,f)}(u_2, v_1)$, so $|S^F_{(G,f)}(u_2, v_1)| - |S^F_{(G,f)}(u_1, v_1)|$ is the number of s$F$-sets at $(u_2, v_1)$ which fail to be $F$-sets at some $w$ with $u_1 \le w \le u_2$. Analogously for $|S^F_{(G,f)}(u_2, v_2)| - |S^F_{(G,f)}(u_1, v_2)|$. Now, every s$F$-set at $(u_2, v_2)$ which fails to be an $F$-set at $w$ with $u_1 \le w \le u_2$ is also an s$F$-set at $(u_2, v_1)$ failing at the same $w$. So $S^F_{(G,f)}(u_2, v_1) - S^F_{(G,f)}(u_1, v_1) \supseteq S^F_{(G,f)}(u_2, v_2) - S^F_{(G,f)}(u_1, v_2)$ and $|S^F_{(G,f)}(u_2, v_1)| - |S^F_{(G,f)}(u_1, v_1)| \ge |S^F_{(G,f)}(u_2, v_2)| - |S^F_{(G,f)}(u_1, v_2)|$.

– (Condition 1 for $\varrho^F$) The argument is the same as for $\sigma^F$.

– (Condition 2 for $\varrho^F$) By Lemma 1, $R^F_{(G,f)}(u_1, v_1) \subseteq R^F_{(G,f)}(u_2, v_1)$, so $|R^F_{(G,f)}(u_2, v_1)| - |R^F_{(G,f)}(u_1, v_1)|$ is the number of r$F$-sets at $(u_2, v_1)$ which fail to be $F$-sets at all levels $w$ with $w \le u_1$. Analogously for $|R^F_{(G,f)}(u_2, v_2)| - |R^F_{(G,f)}(u_1, v_2)|$. Now, every r$F$-set at $(u_2, v_2)$ which fails to be an $F$-set at all levels $w$ with $w \le u_1$ is also an r$F$-set at $(u_2, v_1)$ failing at the same levels $w$. So $R^F_{(G,f)}(u_2, v_1) - R^F_{(G,f)}(u_1, v_1) \supseteq R^F_{(G,f)}(u_2, v_2) - R^F_{(G,f)}(u_1, v_2)$ and $|R^F_{(G,f)}(u_2, v_1)| - |R^F_{(G,f)}(u_1, v_1)| \ge |R^F_{(G,f)}(u_2, v_2)| - |R^F_{(G,f)}(u_1, v_2)|$.

The value of both functions $\sigma^F_{(G,f)}$ and $\varrho^F_{(G,f)}$ at a point $P$ on a vertical (resp. horizontal) discontinuity line is the same as the value at the points in a right (resp. upper) neighborhood of $P$.

Of course, there are many features which give valid but meaningless gp-functions: the features $F$ such that, if $X$ is an $F$-set at level $u$, then it is an $F$-set also at level $v$ for all $v > u$.

We still do not know which hypothesis on $F$ would imply that $\sigma^F_{(G,f)}$ or $\varrho^F_{(G,f)}$ is balanced (Def. 7). Such features exist: one is the already mentioned feature $F$ which assigns true only to singletons consisting of a vertex of a fixed degree.

We now give an example of the framework presented in Section 3.2. Given any graph $G$, we define $Eu : 2^{V \cup E} \to \{true, false\}$ to yield true on a set $A$ if and only if $A$ is a set of vertices whose induced subgraph of $G$ is nonempty, connected, Eulerian and maximal with respect to these properties; in that case $A$ is said to be a $Eu$-set of $G$. Let now $(G, f)$ be a weighted graph. We apply Def. 9 to feature $Eu$, in the modified version with one strict inequality.

Definition 11.
For any real number $w$, the subset $A \subseteq V$ is a $Eu$-set at level $w$ if it is a $Eu$-set of the subgraph $G_w$. It is a steady $Eu$-set (an s$Eu$-set) at $(u, v) \in \Delta^+$ if it is a $Eu$-set at all levels $w$ with $u \le w < v$. It is a ranging $Eu$-set (an r$Eu$-set) at $(u, v)$ if there exist levels $w \le u$ and $w' \ge v$ at which it is a $Eu$-set. $S^{Eu}_{(G,f)}(u, v)$ and $R^{Eu}_{(G,f)}(u, v)$ are respectively the sets of s$Eu$-sets and of r$Eu$-sets at $(u, v)$. We define $\sigma^{Eu}_{(G,f)} : \Delta^+ \to \mathbb{R}$ as the function which assigns to $(u, v) \in \Delta^+$ the number $|S^{Eu}_{(G,f)}(u, v)|$, and $\varrho^{Eu}_{(G,f)} : \Delta^+ \to \mathbb{R}$ as the function which assigns to $(u, v) \in \Delta^+$ the number $|R^{Eu}_{(G,f)}(u, v)|$.

We denote by $\sigma^{Eu}$ and $\varrho^{Eu}$ the maps assigning $\sigma^{Eu}_{(G,f)}$ and $\varrho^{Eu}_{(G,f)}$ respectively to the $(\mathbb{R}, \le)$-indexed diagram corresponding to $(G, f)$.

Fig. 1: Example of the functions $\sigma^{Eu}_{(G,f)}$ and $\varrho^{Eu}_{(G,f)}$, coinciding for this particular weighted graph.

Proposition 2.
The maps $\sigma^{Eu}$ and $\varrho^{Eu}$ are gp-function generators.

Proof. By Proposition 1.

Fig. 1 shows these two functions (coincident in this case) for a weighted graph. Neither $\sigma^{Eu}$ nor $\varrho^{Eu}$ is balanced (see the Appendix).

Although the informal concept of hub is intuitively clear, it is not as easy to formalize in graph-theoretical terms. The simple idea of a vertex with (locally) maximum degree is not entirely satisfactory: in a social network it is common to find users with a lot of contacts, with whom, however, they interact poorly. Even a high sum of traffic intensities (e.g. the number of messages exchanged between a user and its connections) is not enough to bestow on a vertex the central role meant by the word hub.

We shall use local degree prevalence as the feature used for building two gp-function generators: for any graph $G$ we define $H : 2^{V \cup E} \to \{true, false\}$ to yield true only on singletons containing a vertex whose degree is greater than those of its neighbors. Such a vertex is called an $H$-vertex or simply a hub. This feature, combined with the generalised persistence framework and the notion of ranging and steady feature, allows for the identification of those vertices whose role is indeed central throughout the filtration of a given weighted graph $(G, f)$.

Importantly, we preserve the flexibility granted in the realm of classical persistence: as one of the many possible variations, we could consider a vertex to be a hub if the sum of values of $f$ on the edges incident to it (instead of the degree) is greater than the corresponding sum at each of its neighbors.

Our proposal is to build persistence diagrams in our generalised framework, and thereafter use the selection procedure presented in [16] (see 5.1) to identify relevant cornerpoints, thus identifying the “persistent” hubs of a given weighted graph.

Definition 12.
For any real number $w$, a vertex is a hub (or $H$-vertex) at level $w$ if it is an $H$-vertex of the subgraph $G_w$. It is a steady hub (or s$H$-vertex) at $(u, v) \in \Delta^+$ if it is an $H$-vertex at all levels $w$ with $u \le w \le v$. It is a ranging hub (or r$H$-vertex) at $(u, v) \in \Delta^+$ if there exist levels $w \le u$ and $w' \ge v$ at which it is an $H$-vertex. $S^{H}_{(G,f)}(u, v)$ and $R^{H}_{(G,f)}(u, v)$ are respectively the sets of s$H$-vertices and of r$H$-vertices at $(u, v)$. We define $\sigma^{H}_{(G,f)} : \Delta^+ \to \mathbb{R}$ as the function which assigns to $(u, v) \in \Delta^+$ the number $|S^{H}_{(G,f)}(u, v)|$, and $\varrho^{H}_{(G,f)} : \Delta^+ \to \mathbb{R}$ as the function which assigns to $(u, v) \in \Delta^+$ the number $|R^{H}_{(G,f)}(u, v)|$.

We denote by $\sigma^{H}$ and $\varrho^{H}$ the maps assigning $\sigma^{H}_{(G,f)}$ and $\varrho^{H}_{(G,f)}$ respectively to the $(\mathbb{R}, \le)$-indexed diagram corresponding to $(G, f)$.

Proposition 3. $\sigma^{H}$ and $\varrho^{H}$ are gp-function generators.

Proof. By Proposition 1.

Fig. 2: A weighted graph $(G, f)$ and its functions $\sigma^{H}_{(G,f)}$ and $\varrho^{H}_{(G,f)}$ (right).

Fig. 2 shows an example of the two gp-functions. Also $\sigma^{H}$ and $\varrho^{H}$ are not balanced (see the Appendix).

Persistent hubs
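Before turning to the applications, the notions of steady and ranging hub (Def. 12) can be made concrete with a minimal sketch. It assumes a weighted graph stored as a dictionary from edges (pairs of vertex labels) to weights, and it uses the fact that the sublevel graph $G_w$ only changes at the edge weights, so finitely many levels need to be inspected. The function names and the toy graph are hypothetical, and this is not the code used for the experiments.

```python
from collections import defaultdict


def hubs_at(f, w):
    """Hubs of G_w: vertices whose degree exceeds that of every neighbour."""
    adj = defaultdict(set)
    for (a, b), weight in f.items():
        if weight <= w:
            adj[a].add(b)
            adj[b].add(a)
    return {x for x, nbrs in adj.items()
            if all(len(nbrs) > len(adj[y]) for y in nbrs)}


def steady_hubs(f, u, v):
    """Vertices that are hubs at every level w with u <= w <= v."""
    # G_w is constant between consecutive weights: check u and each weight in (u, v].
    levels = [u] + sorted(c for c in set(f.values()) if u < c <= v)
    result = hubs_at(f, levels[0])
    for w in levels[1:]:
        result &= hubs_at(f, w)
    return result


def ranging_hubs(f, u, v):
    """Vertices that are hubs at some level <= u and at some level >= v."""
    critical = set(f.values())
    early = set()
    for c in critical:
        if c <= u:
            early |= hubs_at(f, c)
    late = hubs_at(f, v)
    for c in critical:
        if c > v:
            late |= hubs_at(f, c)
    return early & late


# Toy graph (assumed data): c is a hub at level 1, loses the role at level 2
# when a reaches the same degree, and regains it at level 3.
f = {("c", "a"): 1, ("c", "b"): 1, ("a", "d"): 2, ("c", "e"): 3}
sigma_1_3 = len(steady_hubs(f, 1, 3))   # value of the steady count at (1, 3)
rho_1_3 = len(ranging_hubs(f, 1, 3))    # value of the ranging count at (1, 3)
```

In this toy example the vertex `c` is a ranging hub but not a steady hub at $(1, 3)$, which is the kind of difference between $\sigma^{H}$ and $\varrho^{H}$ that the diagrams of this Section are meant to expose.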
In this Section we present a first approach to hub detection implementable on real-world graphs. We consider this work in progress a sort of exploration of the meaning of steady and ranging hubs in different contexts; however, we will not compare our results to a ground truth.

In the following examples, instead of the functions $\sigma^{H}_{(G,f)}$ and $\varrho^{H}_{(G,f)}$, we will only show the corresponding persistence diagrams, to make the selection procedure clearer.

It is well known in persistence that noise is represented by cornerpoints close to the diagonal $\Delta$. However, not all cornerpoints close to $\Delta$ necessarily represent noise, so how wide is the strip along $\Delta$ that should be discarded? A smart, simple answer is offered in [16], where a remarkable application to segmentation of very noisy data is given. We summarize it here for a given persistence diagram $D$.

Call diagonal gap a maximal region of the form $\{(u, v) \in \Delta^+ \mid a < v - u < b\}$ where no cornerpoints of $D$ lie; $b - a$ is its width. We can then form a hierarchy of diagonal gaps by decreasing width; out of it we get a hierarchy of sets of cornerpoints: we can consider the cornerpoints lying above the first, widest gap as the most relevant. Empirically, we may decide that also the cornerpoints sitting above the second, or the third widest gap are relevant, and so on. Equivalently, we consider the cornerpoints below the chosen gap to be ignored as a possible result of noise. In Fig. 3 it is possible to observe how the selection of cornerpoints above the widest diagonal gap allows one to trace back those maxima (or classes of maxima, depending on the multiplicity of the cornerpoints) that are more relevant with respect to the trend of the time series.

Fig. 3: Selecting maxima in a time series. Left: flow of the Nile from 1871 to 1970. Data freely available at vincentarelbundock.github.io. Right: cornerpoints selected by considering the widest diagonal gap (in yellow).

In the next Sections we apply this selection criterion to the persistence diagrams corresponding to the functions $\sigma^{H}_{(G,f)}$ and $\varrho^{H}_{(G,f)}$, computed for some networks and some filtering functions. The vertices identified by the cornerpoints so selected will be called persistent hubs, in particular persistent steady hubs or persistent ranging hubs.

A first search for relevant hubs has been carried out on a set of 44 major North-American cities (41 in the US, three in Canada; the ones in capital letters in the Amtrak railway map; see Table 1). The edges connect cities between which there have been flights in a randomly chosen but fixed week (June 11 to 17, 2018). Flight data have been obtained from Google Flights by selecting direct flights with Business Class; distances have been found at Prokerala.com. A single vertex has been considered for each city with more than one airport.
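The selection by diagonal gaps summarised above, which is applied to all the diagrams that follow, can be sketched as below. The sketch assumes that a diagram is given as a list of cornerpoints $(u, v)$, that the diagonal itself counts as the lower end of the first gap, and that cornerpoints at infinity are always retained; these conventions, as well as the function name `select_above_gap`, are assumptions made for the example and not a description of the implementation of [16].

```python
import math


def select_above_gap(cornerpoints, rank=1):
    """Return the cornerpoints lying above the rank-th widest diagonal gap."""
    finite = [(u, v) for u, v in cornerpoints if math.isfinite(v)]
    infinite = [(u, v) for u, v in cornerpoints if not math.isfinite(v)]
    # Distinct persistences v - u, with 0 standing for the diagonal.
    pers = sorted({0.0} | {v - u for u, v in finite})
    # Each gap is stored as (width, lower end a, upper end b).
    gaps = [(b - a, a, b) for a, b in zip(pers, pers[1:])]
    if rank > len(gaps):
        raise ValueError("the diagram has fewer diagonal gaps than requested")
    gaps.sort(key=lambda g: g[0], reverse=True)
    _, _, upper = gaps[rank - 1]
    # Keep what lies on or above the upper boundary of the chosen gap.
    return [pt for pt in finite if pt[1] - pt[0] >= upper] + infinite


# Toy diagram (assumed data) with persistences 1, 2, 8 and 10.
points = [(0, 1), (1, 3), (1, 9), (2, 12)]
above_widest = select_above_gap(points)
above_second = select_above_gap(points, rank=2)
```

In the toy diagram the widest gap separates persistence 2 from persistence 8, so the last two cornerpoints are selected; asking for the second widest gap keeps only the most persistent one.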
Vertices (degree)

Albuquerque (13)      Atlanta (42)       Baltimore (16)              Boston (30)
Buffalo (8)           Cheyenne (0)       Chicago (40)                Cincinnati (19)
Cleveland (13)        Dallas (41)        Denver (39)                 Detroit (35)
El Paso (7)           Houston (40)       Indianapolis (17)           Jacksonville (12)
Kansas City (19)      Las Vegas (23)     Los Angeles (37)            Memphis (11)
Miami (30)            Milwaukee (14)     Mobile (3)                  Montreal (16)
New Orleans (16)      New York (35)      Oakland/Emeryville (7)      Philadelphia (34)
Phoenix (35)          Pittsburgh (14)    Portland (25)               Sacramento (16)
Salt Lake City (33)   San Antonio (17)   San Diego (26)              San Francisco (35)
Seattle (34)          St. Louis (17)     St. Paul-Minneapolis (38)   Tampa (19)
Toronto (26)          Tucson (10)        Vancouver (18)              Washington (32)
Table 1: The towns considered as vertices and the respective degrees in the graph.

As filtering functions we used:
– distance,
– number of flights in the fixed week,
– their product,
and their opposites (plus their maximum). For each such choice we looked for steady and ranging hubs, for a total of twelve different persistence diagrams. Note that the same vertex can contribute to several cornerpoints of the persistence diagram of $\sigma^{H}_{(G,f)}$, whereas this cannot happen for $\varrho^{H}_{(G,f)}$.

Next, we report results whose interest resides in the identification of hubs which do not rank very high by degree. In particular, we do not find it of particular interest that Atlanta, Dallas, Chicago and Houston often turn out to be persistent ranging or steady hubs, since they have the highest degrees in the graph (42, 41, 40 and 40 respectively).
Fig. 4: Filtering function: distance; steady hubs. Persistent steady hubs above the widest diagonal gap: two cornerpoints represent Atlanta, one Dallas and one Seattle.

The first occurrence of a persistent hub which is rather far from having one of the highest degrees is with the filtering function distance: Seattle is just twelfth in the degree rank, but appears above the widest diagonal gap as a steady hub (Figure 4). Persistent steady hubs are: Atlanta (with two cornerpoints), Dallas, Seattle.

Surprisingly, if we use the opposite of distance (added to the maximum distance, for ease of representation), the cornerpoints corresponding to vertices with highest degrees are located under the widest diagonal gap (Figure 5). Persistent steady hubs are: Los Angeles, San Francisco, Seattle.
Fig. 5: Filtering function: max distance minus distance; steady hubs. Persistent steady hubs above the widest diagonal gap: Los Angeles, San Francisco, Seattle.

New York City has the eighth highest degree (35, together with Detroit, Phoenix and San Francisco). Still, we would expect it to appear as a hub, in the common sense of the term. In fact, it occurs as one of the few ranging hubs when the filtering functions (max minus number of flights) and distance · (max minus number of flights) are used.

Ranging hubs for (max minus number of flights): Atlanta, Chicago, Dallas, New York.

Ranging hubs for the product filtering function are Atlanta, Chicago, Dallas, New York, Vancouver.

Steady hubs                   Cosette, Courfeyrac, Enjolras, Marius, Myriel, Valjean
Ranging hubs                  Cosette, Courfeyrac, Enjolras, Marius, Myriel, Valjean
Clique-community centrality   Enjolras, Fantine, Gavroche, Marius, Valjean

Table 2: Hubs in the Les Misérables character co-occurrence network. Comparison of the results obtained via the steady and ranging persistence construction and via clique-community centrality.
A classical benchmark for the analysis of hubs in co-occurrence graphs is given by Les Misérables. The network representing the co-occurrence of its characters is freely available at Graphistry. The graph has 77 major characters as vertices; each of the 254 edges joins two characters which appear together in at least one scene; the weight on an edge is the number of common occurrences. We used the inverse of the weight as a filtering function. We compare our results with the ones of [24], where the notion of clique-community centrality was used to spot particularly important characters: see Table 2.

Our method spots Cosette as a hub, whereas clique-community centrality does not. On the other hand, our technique misses Gavroche and Fantine. Both methods miss Javert. We are particularly puzzled by the result of Kurlin’s selection method: above the second widest diagonal gap (the first obviously isolates Jean Valjean) we find only Enjolras.
The website TerraLing.com contains much information, consisting of 165 properties, about several languages. It was used in an interesting piece of research [22] on persistent cycles in language families. Unfortunately the amount of information varies quite a lot from language to language. We analysed the mutual relations of 19 languages (18 of the European Union plus Turkish: Table 3) for which at least 50% of the 165 properties are checked. The graph is the complete one with 19 vertices. The filtering function defined on each edge is the opposite of the normalised quantity of common properties of the two languages that it connects. Ranging and steady hubs coincide and are: Castilian, Catalan, Dutch, English, Portuguese, Swedish.
Languages

Castilian    Catalan    Czech       Croatian   Danish
Dutch        English    Finnish     French     Galician
German       Greek      Hungarian   Italian    Polish
Portuguese   Romanian   Swedish     Turkish
Table 3: The 19 considered languages.

Apart from the presence of English, which might also be biased by the great quantity of information available, we have no key for interpreting these results. For this and for the previous applications, we would very much like to set up a research project with specific experts.

Fig. 6: $\sigma^{Eu}$ is not balanced: filtering function $f$ left, $f'$ right.

We introduced gp-functions in a fairly general setting and studied their stability. We then restricted our scope to the category of graphs, where we defined steady and ranging sets according to features relative to the given graphs. Particular attention has been given to steady and ranging hubs in a graph. We also tried to apply this notion to the vertices of a network of airports, to the characters of Les Misérables and to a set of languages.

Acknowledgments

We are indebted to Diego Alberici, Emanuele Mingione, Pierluigi Contucci, Patrizio Frosini, Lorenzo Zuffi and above all Pietro Vertechi for many fruitful discussions. Article written within the activity of INdAM-GNSAGA.

Fig. 7: $\varrho^{Eu}$ is not balanced: filtering function $f$ left, $f'$ right.

Appendix: Instability
In order to show that some of the proposed gp-functions are not balanced, so that their persistence diagrams do not enjoy stability, we give examples which do not satisfy Def. 7.
Fig. 8: $\sigma^{H}$ is not balanced: filtering function $f$ left, $f'$ right.

The gp-function generator $\sigma^{Eu}$ is not balanced, as the example of Fig. 6 shows: in fact, the maximum absolute value of the weight difference on the same edges is 1, and $\sigma^{Eu}_{(G,f)}(2.\ldots - 1,\ 10 + 1) = 1 > \sigma^{Eu}_{(G,f')}(2.\ldots,\ 10)$.

Fig. 9: $\varrho^{H}$ is not balanced: filtering function $f$ left, $f'$ right.

Also the gp-function generator $\varrho^{Eu}$ is not balanced, as the example of Fig. 7 shows: in fact, the maximum absolute value of the weight difference on the same edges is 1, and $\varrho^{Eu}_{(G,f)}(7.\ldots - 1,\ 10 + 1) = 1 > \varrho^{Eu}_{(G,f')}(7.\ldots,\ 10)$.

$\sigma^{H}$ is not a balanced gp-function generator, as the example of Fig. 8 shows: the maximum absolute value of the weight difference on the same edges is 2, but $\sigma^{H}_{(G,f)}(4 - 2,\ \ldots) > \sigma^{H}_{(G,f')}(4,\ \ldots)$ … “>” is substituted by “≥” in the definition of hub (which we do not think to be a good idea).

Also $\varrho^{H}$ is not a balanced gp-function generator, as the example of Fig. 9 shows: the maximum absolute value of the weight difference on the same edges is 2, but $\varrho^{H}_{(G,f)}(5 - 2,\ \ldots) > \varrho^{H}_{(G,f')}(5,\ \ldots)$.

References
1. M. G. Bergomi, M. Ferri, P. Vertechi, and L. Zuffi. Beyond topological persistence: Starting from networks. arXiv preprint arXiv:1901.08051, 2020.
2. M. G. Bergomi, M. Ferri, and L. Zuffi. Topological graph persistence. arXiv preprint arXiv:1707.09670, 2020.
3. M. G. Bergomi and P. Vertechi. Rank-based persistence. Theory and applications of categories, 35(9):228–260, 2020.
4. A. S. Blevins and D. S. Bassett. Reorderability of node-filtered order complexes. Physical Review E, 101(5):052311, 2020.
5. P. Bubenik and J. A. Scott. Categorification of persistent homology. Discrete & Computational Geometry, 51(3):600–627, 2014.
6. F. Chazal, D. Cohen-Steiner, M. Glisse, L. J. Guibas, and S. Y. Oudot. Proximity of persistence modules and their diagrams. In SCG '09: Proceedings of the 25th annual symposium on Computational geometry, pages 237–246, New York, NY, USA, 2009. ACM.
7. S. Chowdhury and F. Mémoli. Persistent path homology of directed networks. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1152–1169. SIAM, 2018.
8. D. Cohen-Steiner, H. Edelsbrunner, and J. Harer. Stability of persistence diagrams. Discr. Comput. Geom., 37(1):103–120, 2007.
9. M. d'Amico, P. Frosini, and C. Landi. Natural pseudo-distance and optimal matching between reduced size functions. Acta Applicandae Mathematicae, 109(2):527–554, 2010.
10. V. de Silva, E. Munch, and A. Stefanou. Theory of interleavings on [0, ∞)-actegories. arXiv preprint arXiv:1706.04095, 2017.
11. H. Edelsbrunner and J. Harer. Persistent homology—a survey. In Surveys on discrete and computational geometry, volume 453 of Contemp. Math., pages 257–282. Amer. Math. Soc., Providence, RI, 2008.
12. P. Frosini and C. Landi. Size functions and formal series. Appl. Algebra Engrg. Comm. Comput., 12(4):327–349, 2001.
13. P. Frosini, C. Landi, and F. Mémoli. The persistent homotopy type distance. Homology, Homotopy and Applications, 21(2):231–259, 2019.
14. P. Frosini and M. Mulazzani. Size homotopy groups for computation of natural size distances. Bull. of the Belg. Math. Soc., 6(3):455–464, 1999.
15. W. Kim and F. Mémoli. Generalized persistence diagrams for persistence modules over posets. arXiv preprint arXiv:1810.11517, 2018.
16. V. Kurlin. A fast persistence-based segmentation of noisy 2d clouds with provable guarantees. Pattern recognition letters, 83:3–12, 2016.
17. M. Lesnick. The theory of the interleaving distance on multidimensional persistence modules. Foundations of Computational Mathematics, pages 1–38, 2015.
18. L.-D. Lord, P. Expert, H. M. Fernandes, G. Petri, T. J. Van Hartevelt, F. Vaccarino, G. Deco, F. Turkheimer, and M. L. Kringelbach. Insights into brain architectures from the homological scaffolds of functional connectivity networks. Frontiers in Systems Neuroscience, 10, 2016.
19. S. Y. Oudot. Persistence theory: from quiver representations to data analysis, volume 209. American Mathematical Society Providence, RI, 2015.
20. A. Patel. Generalized persistence diagrams. Journal of Applied and Computational Topology, 1(3-4):397–419, 2018.
21. G. Petri, P. Expert, F. Turkheimer, R. Carhart-Harris, D. Nutt, P. J. Hellyer, and F. Vaccarino. Homological scaffolds of brain functional networks. Journal of The Royal Society Interface, 11(101):20140873, 2014.
22. A. Port, I. Gheorghita, D. Guth, J. M. Clark, C. Liang, S. Dasu, and M. Marcolli. Persistent topology of syntax. Mathematics in Computer Science, 12(1):33–50, 2018.
23. M. W. Reimann, M. Nolte, M. Scolamiero, K. Turner, R. Perin, G. Chindemi, P. Dłotko, R. Levi, K. Hess, and H. Markram. Cliques of neurons bound into cavities provide a missing link between structure and function. Frontiers in Computational Neuroscience, 11:48, 2017.
24. B. Rieck, U. Fugacci, J. Lukasczyk, and H. Leitte. Clique community persistence: A topological visual analysis approach for complex networks. IEEE Transactions on Visualization and Computer Graphics, 24(1):822–831, 2018.
25. A. E. Sizemore, C. Giusti, A. Kahn, J. M. Vettel, R. F. Betzel, and D. S. Bassett. Cliques and cavities in the human connectome. Journal of Computational Neuroscience, 44(1):115–145, Feb 2018.
26. A. D. Vijay, M. Zhenyu, X. Kelin, and M. Yuguang. Weighted persistent homology for osmolyte molecular aggregation and hydrogen-bonding network analysis.