Steady and ranging sets in graph persistence

Abstract

Generalised persistence functions (gp-functions) are defined on (R,≤) -indexed diagrams in a given category. A sufficient condition for stability is also introduced. In the category of graphs, a standard way of producing gp-functions is proposed: steady and ranging sets for a given feature. The example of steady and ranging hubs is studied in depth; their meaning is investigated in three concrete networks.


Mattia G. Bergomi, Massimo Ferri, Antonella Tavaglione
Veos Digital, Milan, Italy
ARCES and Dept. of Mathematics, Univ. of Bologna, Italy

Keywords: Persistence, hub, network.

Weighted graphs are a common data structure in many real-world scenarios. It is also customary to make use of persistent homology for analysis, classification, comparison and retrieval. However, this technique is by its very nature limited to the analysis of weighted simplicial complexes. Of course, the graph itself is a one-dimensional complex; however, it often turns out that the relevant information is not the one carried by its topology, but rather the one carried by more concealed graph-theoretical structures. A common choice to overcome this issue is to associate auxiliary simplicial complexes to the graph, see for instance [2]. This strategy has been successfully applied in many interesting applications, e.g. [21,18,23,24,25,7,22,4,26].

It is possible to define and compute persistence in other categories than simplicial complexes or topological spaces [3,1] and, in a different sense, [20,15]. The present paper introduces a further class of generalized persistence functions (gp-functions), defined on $(\mathbb{R}, \le)$-indexed diagrams in a given category, that can be described via persistence diagrams. Additionally, we display a specific way of building gp-functions for filtered graphs, introducing the concepts of steady and ranging sets.

We are therefore rather far from the categorifications of [5,17,19,10], in that we aim to provide a simpler and more agile tool for a direct use on graphs (without a passage through simplicial complexes) and possibly on other structures naturally arising from applications.

Section 2 is dedicated to recalling persistence diagrams and categorical persistence functions, and to introducing gp-functions. Section 3 focuses on graphs: it defines balanced gp-functions, for which stability holds, and above all defines steady and ranging sets with respect to given features in a graph; this is the core of the paper. The feature which is studied in depth in Section 4 is the one of being a hub, i.e. a vertex whose degree is higher than the one of its neighbors. This is illustrated in Section 5 by three concrete examples: steady and ranging hubs in a network of airports, the network of characters of Les Misérables and the one of a set of languages. An Appendix contains examples showing that the main gp-functions of the paper are not balanced.

Persistent topology has produced several concepts and tools: barcodes, extended persistence, zig-zag persistence, persistence modules and many more, but in our opinion persistence diagrams are the most effective for analysis and comparison of shapes, where the term “shape” has a very wide meaning. In Section 2.1 we recall their definition in the classical topological context. Section 2.2 is a brief overview of the extension to a broad categorical context, given in [3,1]. Section 2.3 finally contains the main new concept of the present paper: generalised persistence functions. All the following sections will be based on them.

The main objects of study in persistent homology [11] are filtered spaces, i.e. pairs $(X, f)$ where $X$ is a topological space (mostly the space of a simplicial complex) and $f : X \to \mathbb{R}$ is a map called filtering function: sublevel sets $X_u = f^{-1}\big((-\infty, u]\big)$ are compared through the homology morphisms induced by inclusion, in particular through the so-called Persistent Betti Number functions. Out of such a function a persistence diagram (see Def. 1) can be built [8, Sect. 2]; out of the persistence diagram, in turn, the Persistent Betti Number function can be recovered [8].

Persistence diagrams are the most widely used “fingerprints” of filtered spaces. The bottleneck distance between persistence diagrams yields an effective lower bound to distances between filtered spaces; this makes persistence diagrams a powerful tool in shape classification, analysis and retrieval. The strategic advantage of the generalisation started in [3,1] consists in the fact that also categorical persistence functions (see Sect. 2.2) can be represented by persistence diagrams.

In $\mathbb{R} \times (\mathbb{R} \cup \{+\infty\})$ set $\Delta = \{(u, v) \mid u = v\}$, $\Delta^+ = \{(u, v) \mid u < v\}$ and $\bar{\Delta}^+ = \Delta \cup \Delta^+$. In a multiset, the multiplicity of an element will be the number of times that the element appears.

Definition 1. [8,6] A persistence diagram $D$ is a multiset of points of $\bar{\Delta}^+$ where every point of the diagonal $\Delta$ appears with infinite multiplicity. The points of $D$ belonging to $\Delta^+$ are called cornerpoints; they are said to be proper if both their coordinates are finite, cornerpoints at infinity otherwise. A persistence diagram is said to be finite if so is its set of cornerpoints. We shall only consider finite persistence diagrams.

Definition 2. Given persistence diagrams $D, D'$, let $\Gamma$ be the set of all bijections between $D$ and $D'$. We define the bottleneck (formerly matching) distance as the real number

$$d(D, D') = \inf_{\gamma \in \Gamma} \sup_{p \in D} \| p - \gamma(p) \|_\infty .$$

This distance checks the maximum displacement between corresponding points for a given matching, either between cornerpoints of the two diagrams or between cornerpoints and their own projections on the diagonal $\Delta$, and takes the minimum among these maxima. Minima and maxima are actually attained because of the requested finiteness.

We briefly recall from [3,1] some definitions that we shall use in the paper.
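As an illustration of the bottleneck distance of Def. 2, the following is a minimal brute-force sketch in Python; it is neither an efficient algorithm nor the authors' implementation. It assumes that both diagrams are finite and contain only proper cornerpoints, given as `(u, v)` pairs, and the name `bottleneck_bruteforce` is hypothetical. Each cornerpoint may be matched either with a cornerpoint of the other diagram or with the diagonal, at a cost equal to its max-norm distance $(v - u)/2$ from $\Delta$; since the diagrams are finite, the infimum is a minimum over finitely many matchings.

```python
from itertools import permutations


def bottleneck_bruteforce(d1, d2):
    """Bottleneck distance between two small diagrams of proper cornerpoints.

    d1, d2: lists of (u, v) pairs with u < v (assumption: no points at infinity).
    Rows are the points of d1 followed by len(d2) diagonal slots; columns are
    the points of d2 followed by len(d1) diagonal slots.
    """
    n, m = len(d1), len(d2)
    size = n + m
    if size == 0:
        return 0.0

    def cost(i, j):
        p = d1[i] if i < n else None  # None stands for a diagonal slot
        q = d2[j] if j < m else None
        if p is None and q is None:
            return 0.0  # diagonal matched with diagonal
        if q is None:
            return (p[1] - p[0]) / 2  # p sent to the diagonal
        if p is None:
            return (q[1] - q[0]) / 2  # q sent to the diagonal
        return max(abs(p[0] - q[0]), abs(p[1] - q[1]))  # max-norm displacement

    # Minimum over all matchings of the largest displacement in the matching.
    return min(
        max(cost(i, perm[i]) for i in range(size))
        for perm in permutations(range(size))
    )


# Toy diagrams, invented for the example.
print(bottleneck_bruteforce([(0, 4)], [(1, 4)]))          # 1
print(bottleneck_bruteforce([(0, 4), (2, 3)], [(0, 4)]))  # 0.5
```

The factorial number of matchings restricts this sketch to diagrams with a handful of cornerpoints; it is meant only to make the definition concrete.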

Definition 3. [3, Def. 3.2] Let $\mathbf{C}$ be a category. A lower-bounded function $p : \mathrm{Morph}(\mathbf{C}) \to \mathbb{Z}$ is a categorical persistence function if, for all $u_1 \to u_2 \to v_1 \to v_2$, the following inequalities hold:
1. $p(u_1 \to v_1) \le p(u_2 \to v_1)$ and $p(u_2 \to v_2) \le p(u_2 \to v_1)$.
2. $p(u_2 \to v_1) - p(u_1 \to v_1) \ge p(u_2 \to v_2) - p(u_1 \to v_2)$.

The archetypal categorical persistence functions are Persistent Betti Numbers (see [11] for their definition and properties). Still, this definition has a much wider range; for instance it includes functions induced by weakly directed properties, e.g. functions counting clique communities, blocks, edge-blocks in a weighted graph [1].

Remark 1. There is a standard way of associating a persistence diagram to a categorical persistence function; see [3, Sect. 3.9]. By [1, Prop. 1], the discontinuity sets of a categorical persistence function are either vertical or horizontal (possibly unbounded) segments with end-points in the cornerpoints. This means that categorical persistence functions have the appearance of superimposed triangles, typical of Persistent Betti Number functions. In fact, the two conditions of Def. 3 correspond to Prop. 1 and Lemma 1 of [12], where that behaviour of the discontinuities of “size functions” (what would later be called 0-th Persistent Betti Number functions) was studied.
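For real-valued indices, where a morphism $u \to v$ simply means $u \le v$, the two inequalities of Def. 3 can be tested numerically. The Python sketch below is only a sanity check on a finite sample grid (passing it proves nothing outside the grid); the names `is_persistence_function` and `bar_count` are hypothetical, and the bar-counting function is a toy example chosen for illustration.

```python
from itertools import combinations_with_replacement


def is_persistence_function(p, grid):
    """Check conditions 1 and 2 of Def. 3 for all u1 <= u2 <= v1 <= v2 in grid."""
    pts = sorted(grid)
    pairs = list(combinations_with_replacement(pts, 2))  # all (a, b) with a <= b
    for u1, u2 in pairs:
        for v1, v2 in pairs:
            if u2 > v1:
                continue  # keep only chains u1 <= u2 <= v1 <= v2
            # Condition 1: non-decreasing in the first argument,
            # non-increasing in the second.
            if not (p(u1, v1) <= p(u2, v1) and p(u2, v2) <= p(u2, v1)):
                return False
            # Condition 2: the "second difference" inequality.
            if p(u2, v1) - p(u1, v1) < p(u2, v2) - p(u1, v2):
                return False
    return True


def bar_count(bars):
    """Toy function: number of bars (b, d) with b <= u and v < d."""
    def p(u, v):
        return sum(1 for b, d in bars if b <= u and v < d)
    return p


GRID = range(0, 8)
good = bar_count([(1, 5), (2, 3), (4, 7)])  # invented bars
bad = lambda u, v: v                        # increases with v: violates condition 1

print(is_persistence_function(good, GRID))  # True
print(is_persistence_function(bad, GRID))   # False
```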

Definition 4. [3] A persistence function is a categorical persistence function on the category $(\mathbb{R}, \le)$.

So a persistence function maps each pair of real numbers $u \le v$ to an integer $p(u, v)$ such that, given $u_1 \le u_2 \le v_1 \le v_2$, the following inequalities hold.
1. $p(u_1, v_1) \le p(u_2, v_1)$ and $p(u_2, v_2) \le p(u_2, v_1)$, that is to say $p$ is non-decreasing in the first argument, and non-increasing in the second.
2. $p(u_2, v_1) - p(u_1, v_1) \ge p(u_2, v_2) - p(u_1, v_2)$.

Definition 5. [5, Sect. 1.3] An $(\mathbb{R}, \le)$-indexed diagram is any functor from the category $(\mathbb{R}, \le)$ to an arbitrary category $\mathbf{C}$. $(\mathbb{R}, \le)$-indexed diagrams form a category, $\mathbf{C}^{(\mathbb{R}, \le)}$. The $(\mathbb{R}, \le)$-indexed diagram is said to be monic if all morphisms of its image are monomorphisms of $\mathbf{C}$.

Definition 6. Assume that a map $p$ is given, which assigns to each monic $(\mathbb{R}, \le)$-indexed diagram $M$ in a category $\mathbf{C}$ a categorical persistence function $p_M$ on $(\mathbb{R}, \le)$, such that $p_M = p_{M'}$ for $M$ naturally isomorphic to $M'$. All the resulting categorical persistence functions $p_M$ are called generalised persistence functions in $\mathbf{C}$ (gp-functions for brevity). The map $p$ itself is called a gp-function generator.

Remark 2. The mapping assumed in Def. 6 can be easily shaped into a functor between suitable categories, but we shall not make use of this property.

Every gp-function can be represented by a persistence diagram by the already quoted construction of [3, Sect. 3.9], so gp-functions can be compared through the bottleneck distance of the respective diagrams. Moreover, $(\mathbb{R}, \le)$-indexed diagrams can be compared through the interleaving distance [5, Def. 3.4], extending the interleaving distance in the topological setting [8]. Thus, it makes sense to discuss stability [5, Sect. 5] and universality [17, Sect. 5].
There is a wide class of gp-functions for which stability follows by definition: the gp-functions built by composing a categorical persistence function (on a category $\mathbf{C}$ with finite colimits) with $(\mathbb{R}, \le)$-indexed diagrams in $\mathbf{C}$ [3, Thm. 3.27]. Universality is also guaranteed if $\mathbf{C}$ respects suitable conditions [1, Prop. 5].

Remark 3. Even with a well-behaved category (e.g. Graph), gp-functions might not enjoy stability, since their values may not depend on the single morphisms, but on the structure of the whole $(\mathbb{R}, \le)$-indexed diagram. This will be unfortunately the case for the examples of Sect. 4.

Let Graph be the category having finite simple undirected graphs as objects and injective simplicial maps as morphisms, seen as a subcategory of the category of finite simplicial complexes. In what follows, a graph will be considered as the pair of its vertex set and edge set, i.e. $G = (V, E)$, $G' = (V', E')$ and so on. We consider $(\mathbb{R}, \le)$-indexed diagrams in Graph that are constant on a finite set of left-closed, right-open intervals. Because of the choice of monomorphisms as the only acceptable morphisms, every such $(\mathbb{R}, \le)$-indexed diagram is monic and can be seen, up to natural isomorphisms, as a filtration of a graph $G$ coming from a filtering function $f : V \cup E \to \mathbb{R} \cup \{+\infty\}$. Moreover, we shall limit our study to $(\mathbb{R}, \le)$-indexed diagrams whose associated filtration has no isolated vertices at any level. In other words, the filtering function $f$ takes value $+\infty$ if a vertex is isolated, and the minimum of its values on the edges incident to the vertex otherwise. Thus, $f$ is determined by its restriction to $E$; therefore the weighted graphs considered here are pairs $(G, f)$ with $f : E \to \mathbb{R}$.

A gp-function in Graph (Def. 6) $p_M$, where $M$ is an $(\mathbb{R}, \le)$-indexed diagram, will be denoted $p_{(G,f)}$, where $M$ corresponds to the filtration produced by the weighted graph $(G, f)$. The associated persistence diagram will be denoted by $D(f)$, for the sake of simplicity and if no confusion may occur.

In general, gp-functions are not stable unless they come from a categorical persistence function on Graph, i.e. there is no guarantee that the bottleneck distance between their persistence diagrams be a lower bound for their interleaving distance. All the same, there is a condition (Def. 7) which implies stability in the sense of Thm. 1.

Definition 7. Let $p$ be a gp-function generator on Graph. The map $p$ itself and the resulting gp-functions are said to be balanced if the following condition is satisfied. Let $(G, f)$ and $(G', f')$ be two weighted graphs, and $p_{(G,f)}$, $p_{(G',f')}$ their associated gp-functions. If an isomorphism $\psi : G \to G'$ exists, such that $\sup_{e \in E} |f(e) - f'(\psi(e))| \le h$ ($h > 0$), then for all $(u, v) \in \Delta^+$ the inequality $p_{(G,f)}(u - h, v + h) \le p_{(G',f')}(u, v)$ holds.

Let $(G, f)$, $(G', f')$ be as above. Let also $\mathcal{H}$ be the (possibly empty) set of graph isomorphisms between $G$ and $G'$. We can now take to Graph some definitions given in [14,9,17].

Definition 8. The natural pseudodistance of $(G, f)$ and $(G', f')$ is

$$\delta\big((G, f), (G', f')\big) = \begin{cases} +\infty & \text{if } \mathcal{H} = \emptyset \\ \inf_{\varphi \in \mathcal{H}} \sup_{e \in E} |f(e) - f'(\varphi(e))| & \text{otherwise.} \end{cases}$$

Some simple adjustments of the proof of [9, Thm. 29] and of its preceding lemmas yield the following theorem.

Theorem 1 (Stability). Let $p$ be a balanced gp-function generator in Graph and $(G, f)$, $(G', f')$ be two weighted graphs. Then we have

$$d\big(D(f), D(f')\big) \le \delta\big((G, f), (G', f')\big),$$

where $D(f)$ and $D(f')$ are the persistence diagrams realized by the gp-functions $p_{(G,f)}$ and $p_{(G',f')}$ respectively. □

Through [13, Thm. 5.8], this also implies stability with respect to the interleaving distance. Universality is generally not granted for stable persistence functions: it needs ad hoc constructions.

3.2 Steady and ranging sets

Given a weighted graph $(G, f)$, any function $F : 2^{V \cup E} \to \{true, false\}$ is called a feature. We call $F$-set any $X \subset V \cup E$ such that $F(X) = true$. Given a real number $u$, we denote by $G_u$ the subgraph of $G$ induced by the edge set $f^{-1}\big((-\infty, u]\big)$. We shall say that $X \subset V \cup E$ is an $F$-set at level $w \in \mathbb{R}$ if it is an $F$-set of the subgraph $G_w$.

Definition 9. Let $F$ be a feature. A set $X \subseteq V \cup E$ is a steady $F$-set (s$F$-set for brevity) at $(u, v) \in \Delta^+$ if it is an $F$-set at all levels $w$ with $u \le w \le v$. We call $X$ a ranging $F$-set (r$F$-set) at $(u, v)$ if there exist levels $w \le u$ and $w' \ge v$ at which it is an $F$-set.
Let $S^F_{(G,f)}(u, v)$ be the set of s$F$-sets at $(u, v)$ and let $R^F_{(G,f)}(u, v)$ be the set of r$F$-sets at $(u, v)$.

Remark 4. Of course, steady implies ranging; this is due to the “≤” and “≥” signs in the definitions. With strict inequalities the implication fails. Actually, there are features $F$ for which steady is equivalent to ranging: the ones for which a set can be an $F$-set only in a (possibly unbounded) interval. A simple example is the feature $F$ which assigns true only to singletons consisting of a vertex of a fixed degree.

Lemma 1. If $u \le u' < v' \le v$, then
1. $S^F_{(G,f)}(u, v) \subseteq S^F_{(G,f)}(u', v')$
2. $R^F_{(G,f)}(u, v) \subseteq R^F_{(G,f)}(u', v')$
where the equalities hold if $G_u = G_{u'}$ and $G_v = G_{v'}$. Moreover $S^F_{(G,f)}(u, v) = \emptyset = R^F_{(G,f)}(u, v)$ if $G_u = \emptyset$.
Proof. By the definitions themselves of steady and ranging $F$-set.

Definition 10.

Let $F$ be a feature. For any graph $G$ and any filtering function $f : E \to \mathbb{R}$, we define $\sigma^F_{(G,f)} : \Delta^+ \to \mathbb{R}$ as the function which assigns to $(u, v) \in \Delta^+$ the number $|S^F_{(G,f)}(u, v)|$, and $\varrho^F_{(G,f)} : \Delta^+ \to \mathbb{R}$ as the function which assigns to $(u, v) \in \Delta^+$ the number $|R^F_{(G,f)}(u, v)|$. We denote by $\sigma^F$ and $\varrho^F$ the maps assigning $\sigma^F_{(G,f)}$ and $\varrho^F_{(G,f)}$ respectively to the $(\mathbb{R}, \le)$-indexed diagram corresponding to $(G, f)$.

Proposition 1.

The maps $\sigma^F$ and $\varrho^F$ are gp-function generators.

Proof. We prove conditions 1 and 2 of Def. 3, recalling that the source category is $(\mathbb{R}, \le)$, so the existence of a morphism $u \to v$ (with $u \ne v$) simply means that $u < v$. Assume $u_1 < u_2 < v_1 < v_2$. Let $(G, f)$ be any weighted graph.

– (Condition 1 for $\sigma^F$) By Lemma 1, $S^F_{(G,f)}(u_1, v_1) \subseteq S^F_{(G,f)}(u_2, v_1)$, so $|S^F_{(G,f)}(u_1, v_1)| \le |S^F_{(G,f)}(u_2, v_1)|$. Also $S^F_{(G,f)}(u_2, v_2) \subseteq S^F_{(G,f)}(u_2, v_1)$ and $|S^F_{(G,f)}(u_2, v_2)| \le |S^F_{(G,f)}(u_2, v_1)|$.

– (Condition 2 for $\sigma^F$) By Lemma 1, $S^F_{(G,f)}(u_1, v_1) \subseteq S^F_{(G,f)}(u_2, v_1)$, so $|S^F_{(G,f)}(u_2, v_1)| - |S^F_{(G,f)}(u_1, v_1)|$ is the number of s$F$-sets at $(u_2, v_1)$ which fail to be $F$-sets at some $w$ with $u_1 \le w \le u_2$. Analogously for $|S^F_{(G,f)}(u_2, v_2)| - |S^F_{(G,f)}(u_1, v_2)|$. Now, every s$F$-set at $(u_2, v_2)$ which fails to be an $F$-set at $w$ with $u_1 \le w \le u_2$ is also an s$F$-set at $(u_2, v_1)$ failing at the same $w$. So $S^F_{(G,f)}(u_2, v_1) \setminus S^F_{(G,f)}(u_1, v_1) \supseteq S^F_{(G,f)}(u_2, v_2) \setminus S^F_{(G,f)}(u_1, v_2)$ and $|S^F_{(G,f)}(u_2, v_1)| - |S^F_{(G,f)}(u_1, v_1)| \ge |S^F_{(G,f)}(u_2, v_2)| - |S^F_{(G,f)}(u_1, v_2)|$.

– (Condition 1 for $\varrho^F$) The argument is the same as for $\sigma^F$.

– (Condition 2 for $\varrho^F$) By Lemma 1, $R^F_{(G,f)}(u_1, v_1) \subseteq R^F_{(G,f)}(u_2, v_1)$, so $|R^F_{(G,f)}(u_2, v_1)| - |R^F_{(G,f)}(u_1, v_1)|$ is the number of r$F$-sets at $(u_2, v_1)$ which fail to be $F$-sets at all levels $w$ with $w \le u_1$. Analogously for $|R^F_{(G,f)}(u_2, v_2)| - |R^F_{(G,f)}(u_1, v_2)|$. Now, every r$F$-set at $(u_2, v_2)$ which fails to be an $F$-set at all levels $w$ with $w \le u_1$ is also an r$F$-set at $(u_2, v_1)$ failing at the same levels $w$. So $R^F_{(G,f)}(u_2, v_1) \setminus R^F_{(G,f)}(u_1, v_1) \supseteq R^F_{(G,f)}(u_2, v_2) \setminus R^F_{(G,f)}(u_1, v_2)$ and $|R^F_{(G,f)}(u_2, v_1)| - |R^F_{(G,f)}(u_1, v_1)| \ge |R^F_{(G,f)}(u_2, v_2)| - |R^F_{(G,f)}(u_1, v_2)|$.

The value of both functions $\sigma^F_{(G,f)}$ and $\varrho^F_{(G,f)}$ at a point $P$ on a vertical (resp. horizontal) discontinuity line is the same as the value at the points in a right (resp. upper) neighborhood of $P$.

Of course, there are many features which give valid but meaningless gp-functions: the features $F$ such that, if $X$ is an $F$-set at level $u$, then it is an $F$-set also at level $v$ for all $v > u$.

We still do not know which hypothesis on $F$ would imply that $\sigma^F_{(G,f)}$ or $\varrho^F_{(G,f)}$ are balanced (Def. 7). Such features exist: one is the already mentioned feature $F$ which assigns true only to singletons consisting of a vertex of a fixed degree.

We now give an example of the framework exposed in Section 3.2. Given any graph $G$, we define $Eu : 2^{V \cup E} \to \{true, false\}$ to yield true on a set $A$ if and only if $A$ is a set of vertices whose induced subgraph of $G$ is nonempty, connected, Eulerian and maximal with respect to these properties; in that case $A$ is said to be a $Eu$-set of $G$. Let now $(G, f)$ be a weighted graph. We apply Def. 9 to feature $Eu$, in the modified version with one strict inequality.

Definition 11.

For any real number $w$, the subset $A \subseteq V$ is a $Eu$-set at level $w$ if it is a $Eu$-set of the subgraph $G_w$. It is a steady $Eu$-set (an s$Eu$-set) at $(u, v) \in \Delta^+$ if it is a $Eu$-set at all levels $w$ with $u \le w < v$. It is a ranging $Eu$-set (an r$Eu$-set) at $(u, v)$ if there exist levels $w \le u$ and $w' \ge v$ at which it is a $Eu$-set.
$S^{Eu}_{(G,f)}(u, v)$ and $R^{Eu}_{(G,f)}(u, v)$ are respectively the sets of s$Eu$-sets and of r$Eu$-sets at $(u, v)$. We define $\sigma^{Eu}_{(G,f)} : \Delta^+ \to \mathbb{R}$ as the function which assigns to $(u, v) \in \Delta^+$ the number $|S^{Eu}_{(G,f)}(u, v)|$ and $\varrho^{Eu}_{(G,f)} : \Delta^+ \to \mathbb{R}$ as the function which assigns to $(u, v) \in \Delta^+$ the number $|R^{Eu}_{(G,f)}(u, v)|$.
We denote by $\sigma^{Eu}$ and $\varrho^{Eu}$ the maps assigning $\sigma^{Eu}_{(G,f)}$ and $\varrho^{Eu}_{(G,f)}$ respectively to the $(\mathbb{R}, \le)$-indexed diagram corresponding to $(G, f)$.

Fig. 1: Example of the functions $\sigma^{Eu}_{(G,f)}$ and $\varrho^{Eu}_{(G,f)}$, coinciding for this particular weighted graph.

Proposition 2.

The maps $\sigma^{Eu}$ and $\varrho^{Eu}$ are gp-function generators.
Proof. By Proposition 1.

Fig. 1 shows these two functions (coincident in this case) for a weighted graph. Neither $\sigma^{Eu}$ nor $\varrho^{Eu}$ is balanced (see the Appendix).

Although the informal concept of hub is intuitively clear, it is not as easy to formalize in graph-theoretical terms. The simple idea of a vertex with (locally) maximum degree is not entirely satisfactory: in a social network it is common to find users with a lot of contacts, with whom, however, they interact poorly. Even a high sum of traffic intensities (e.g. the number of messages exchanged between a user and its connections) is not enough to bestow on a vertex the central role meant by the word hub.

We shall use local degree prevalence as the feature for building two gp-function generators: for any graph $G$ we define $H : 2^{V \cup E} \to \{true, false\}$ to yield true only on singletons containing a vertex whose degree is greater than the ones of its neighbors. Such a vertex is called an $H$-vertex or simply a hub. This feature, combined with the generalized persistence framework and the notion of ranging and steady feature, allows for the identification of those vertices whose role is indeed central throughout the filtration of a given weighted graph $(G, f)$.

Importantly, we preserve the flexibility granted in the realm of classical persistence: as one of the many possible variations, we could consider a vertex to be a hub if the sum of values of $f$ on the edges incident to it (instead of the degree) is greater than the sum at its neighbors.

Our proposal is to build persistence diagrams in our generalized framework, and thereafter use the selection procedure presented in [16] (see Sect. 5.1) to identify relevant cornerpoints, thus identifying the “persistent” hubs of a given weighted graph.

Definition 12.

For any real number $w$, a vertex is a hub (or $H$-vertex) at level $w$ if it is an $H$-vertex of the subgraph $G_w$. It is a steady hub (or s$H$-vertex) at $(u, v) \in \Delta^+$ if it is an $H$-vertex at all levels $w$ with $u \le w \le v$. It is a ranging hub (or r$H$-vertex) at $(u, v) \in \Delta^+$ if there exist levels $w \le u$ and $w' \ge v$ at which it is an $H$-vertex.
$S^H_{(G,f)}(u, v)$ and $R^H_{(G,f)}(u, v)$ are respectively the sets of s$H$-vertices and of r$H$-vertices at $(u, v)$. We define $\sigma^H_{(G,f)} : \Delta^+ \to \mathbb{R}$ as the function which assigns to $(u, v) \in \Delta^+$ the number $|S^H_{(G,f)}(u, v)|$ and $\varrho^H_{(G,f)} : \Delta^+ \to \mathbb{R}$ as the function which assigns to $(u, v) \in \Delta^+$ the number $|R^H_{(G,f)}(u, v)|$.
We denote by $\sigma^H$ and $\varrho^H$ the maps assigning $\sigma^H_{(G,f)}$ and $\varrho^H_{(G,f)}$ respectively to the $(\mathbb{R}, \le)$-indexed diagram corresponding to $(G, f)$.

Proposition 3. $\sigma^H$ and $\varrho^H$ are gp-function generators.
Proof. By Proposition 1.

Fig. 2: A weighted graph $(G, f)$ and its functions $\sigma^H_{(G,f)}$ and $\varrho^H_{(G,f)}$ (right).

Fig. 2 shows an example of the two gp-functions. Also $\sigma^H$ and $\varrho^H$ are not balanced (see the Appendix).

5 Persistent hubs

In this Section we present a first approach to hub detection implementable on real-world graphs. We consider this work in progress a sort of exploration of the meaning of steady and ranging hubs in different contexts; however, we will not compare our results to a ground truth.

In the following examples, instead of the functions $\sigma^H_{(G,f)}$ and $\varrho^H_{(G,f)}$, we will only show the corresponding persistence diagrams, to make the selection procedure clearer.

It is well-known in persistence that noise is represented by cornerpoints close to the diagonal $\Delta$. However, not all cornerpoints close to $\Delta$ necessarily represent noise; how wide, then, is the strip along $\Delta$ that should be discarded? A smart, simple answer is offered in [16], where a remarkable application to segmentation of very noisy data is given. We summarize it here for a given persistence diagram $D$.

Call diagonal gap a maximal region of the form $\{(u, v) \in \Delta^+ \mid a < v - u < b\}$ where no cornerpoints of $D$ lie; $b - a$ is its width. We can then form a hierarchy of diagonal gaps by decreasing width; out of it we get a hierarchy of sets of cornerpoints: we can consider the cornerpoints lying above the first, widest gap as the most relevant. Empirically, we may decide that also the cornerpoints sitting above the second, or the third widest gap are relevant, and so on. Equivalently, we consider the cornerpoints below the chosen gap to be ignored as a possible result of noise. In Fig. 3 it is possible to observe how the selection of cornerpoints above the widest diagonal gap makes it possible to trace back those maxima (or classes of maxima, depending on the multiplicity of the cornerpoints) that are more relevant with respect to the trend of the time series.

Fig. 3: Selecting maxima in a time series. Left. Flow of the Nile from 1871 to 1970. Data freely available at vincentarelbundock.github.io. Right. Cornerpoints selected by considering the widest diagonal gap (in yellow).

In the next Sections we apply this selection criterion to the persistence diagrams corresponding to the functions $\sigma^H_{(G,f)}$ and $\varrho^H_{(G,f)}$, computed for some networks and some filtering functions. The vertices identified by the cornerpoints selected in this way will be called persistent hubs, in particular persistent steady hubs or persistent ranging hubs.

A first attempt at the search for relevant hubs has been carried out on a set of 44 major North-American cities (41 in the US, three in Canada; the ones in capital letters in the Amtrak railway map; see Table 1). The edges connect cities between which there have been flights in a randomly chosen but fixed week (June 11 to 17, 2018). Flight data have been obtained from Google Flights by selecting direct flights with Business Class; distances have been found at Prokerala.com. A single vertex has been considered for each city with more than one airport.
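The selection criterion summarised above can be sketched in a few lines of Python. This is an illustration under assumptions, not the implementation of [16]: cornerpoints are assumed to be proper and given as `(u, v)` pairs, the diagonal is treated as persistence 0 (so that the lowest gap starts at $\Delta$), and the name `select_above_gap` and the sample diagram are hypothetical. Gaps are ranked from 0, so `rank=0` denotes the widest one.

```python
def select_above_gap(cornerpoints, rank=0):
    """Return (selected cornerpoints, (a, b)) for the rank-th widest diagonal gap.

    A diagonal gap is an empty strip a < v - u < b between two consecutive
    persistence values; the diagonal contributes the value 0 (assumption).
    """
    if not cornerpoints:
        return [], None
    # Distinct persistence values v - u, in increasing order.
    values = [0.0] + sorted({v - u for u, v in cornerpoints})
    # Each gap is stored as (width, a, b).
    gaps = [(values[i + 1] - values[i], values[i], values[i + 1])
            for i in range(len(values) - 1)]
    gaps.sort(key=lambda g: g[0], reverse=True)  # widest first (stable on ties)
    _, a, b = gaps[rank]
    # Cornerpoints "above the gap" are those with persistence at least b.
    selected = [(u, v) for u, v in cornerpoints if v - u >= b]
    return selected, (a, b)


# Invented diagram: three short-lived cornerpoints and two long-lived ones.
diagram = [(0, 1), (2, 2.5), (1, 7), (3, 10), (4, 4.25)]
print(select_above_gap(diagram))          # ([(1, 7), (3, 10)], (1, 6))
print(select_above_gap(diagram, rank=1))  # ([(3, 10)], (6, 7))
```

Choosing a larger `rank` keeps fewer or more cornerpoints depending on where the corresponding gap sits, which mirrors the empirical choice between the first, second or third widest gap described in the text.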

Vertices (degree)

Albuquerque (13)       Atlanta (42)         Baltimore (16)               Boston (30)
Buffalo (8)            Cheyenne (0)         Chicago (40)                 Cincinnati (19)
Cleveland (13)         Dallas (41)          Denver (39)                  Detroit (35)
El Paso (7)            Houston (40)         Indianapolis (17)            Jacksonville (12)
Kansas City (19)       Las Vegas (23)       Los Angeles (37)             Memphis (11)
Miami (30)             Milwaukee (14)       Mobile (3)                   Montreal (16)
New Orleans (16)       New York (35)        Oakland/Emeryville (7)       Philadelphia (34)
Phoenix (35)           Pittsburgh (14)      Portland (25)                Sacramento (16)
Salt Lake City (33)    San Antonio (17)     San Diego (26)               San Francisco (35)
Seattle (34)           St. Louis (17)       St. Paul-Minneapolis (38)    Tampa (19)
Toronto (26)           Tucson (10)          Vancouver (18)               Washington (32)

Table 1: The towns considered as vertices and the respective degrees in the graph.

As filtering functions we used:
– distance
– number of flights in the fixed week
– their product
and their opposites (+ their maximum). For each such choice we looked for steady and ranging hubs, for a total of twelve different persistence diagrams. Note that the same vertex can contribute to several cornerpoints of the persistence diagram of $\sigma^H_{(G,f)}$, whereas this cannot happen for $\varrho^H_{(G,f)}$.

Next, we report results where the interest resides in the identification of hubs which do not rank very high by their degree. In particular, we do not find it of particular interest that Atlanta, Dallas, Chicago and Houston turn out to be often persistent ranging or steady hubs, since they have the highest degrees in the graph (42, 41, 40 and 40 respectively).
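The notions of hub, steady hub and ranging hub (Def. 12) can be turned into a short computation. The following Python sketch is an illustration under stated assumptions, not the code used for the experiments: a weighted graph is assumed to be a dictionary from vertex pairs to edge weights, all function names are hypothetical, and the toy graph is invented for the example. It relies on the fact that the sublevel graph $G_w$ only changes when $w$ crosses an edge weight, so only finitely many levels need to be inspected.

```python
from collections import defaultdict


def hubs_at(weights, level):
    """Vertices of G_level whose degree exceeds that of every neighbor."""
    adj = defaultdict(set)
    for (a, b), w in weights.items():
        if w <= level:  # edge belongs to the sublevel graph
            adj[a].add(b)
            adj[b].add(a)
    adj = dict(adj)
    return {x for x, nbrs in adj.items()
            if all(len(nbrs) > len(adj[y]) for y in nbrs)}


def hub_levels(weights):
    """Distinct edge weights and the hubs of the sublevel graph at each of them."""
    levels = sorted(set(weights.values()))
    return levels, {t: hubs_at(weights, t) for t in levels}


def steady_hubs(weights, u, v):
    """Hubs at all levels w with u <= w <= v."""
    levels, hubs = hub_levels(weights)
    below = [t for t in levels if t <= u]
    if not below:
        return set()  # G_u is empty
    # G_u equals the sublevel graph at the largest weight <= u.
    relevant = [below[-1]] + [t for t in levels if u < t <= v]
    return set.intersection(*[hubs[t] for t in relevant])


def ranging_hubs(weights, u, v):
    """Hubs at some level w <= u and at some level w' >= v."""
    levels, hubs = hub_levels(weights)
    early = [t for t in levels if t <= u]
    if not early:
        return set()
    rep_v = max(t for t in levels if t <= v)  # G_v equals G_{rep_v}
    before = set().union(*[hubs[t] for t in early])
    after = set().union(*[hubs[t] for t in levels if t >= rep_v])
    return before & after


# Invented toy graph: "a" is a hub at levels 2 and 5, "b" only at level 4.
W = {("a", "b"): 1, ("a", "c"): 2, ("b", "d"): 3,
     ("b", "e"): 4, ("a", "g"): 5, ("a", "h"): 5}

print(steady_hubs(W, 2, 2.5))   # {'a'}
print(steady_hubs(W, 2, 3))     # set()
print(ranging_hubs(W, 2, 5))    # {'a'}
```

In this toy graph the vertex "a" stops being a hub at levels 3 and 4 and becomes one again at level 5, so it is ranging but not steady at (2, 5); the values of $\sigma^H_{(G,f)}$ and $\varrho^H_{(G,f)}$ are the sizes of the returned sets.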

Fig. 4: Filtering function: distance; steady hubs. Persistent steady hubs above the widest diagonal gap: two cornerpoints represent Atlanta, one Dallas and one Seattle.

The first occurrence of a persistent hub which is rather far from having the highest degrees is with the filtering function distance: Seattle is just twelfth in the degree rank, but appears above the widest diagonal gap as a steady hub (Figure 4). Persistent steady hubs are: Atlanta (with two cornerpoints), Dallas, Seattle.

Surprisingly, if we use the opposite of distance (summed to the maximum distance, for ease of representation), the cornerpoints corresponding to vertices with highest degrees are located under the widest diagonal gap (Figure 5). Persistent steady hubs are: Los Angeles, San Francisco, Seattle.

Fig. 5: Filtering function: max distance minus distance; steady hubs. Persistent steady hubs above the widest diagonal gap: Los Angeles, San Francisco, Seattle.

New York City has the eighth highest degree (35, together with Detroit, Phoenix and San Francisco). Still, we would expect it to appear as a hub, in the common sense of the term. In fact, it occurs as one of the few ranging hubs when the filtering functions (max minus number of flights) and distance · (max minus number of flights) are used.

Ranging hubs for (max minus number of flights): Atlanta, Chicago, Dallas, New York.

Ranging hubs for the product filtering function are Atlanta, Chicago, Dallas, New York, Vancouver.

Steady hubs                    Cosette, Courfeyrac, Enjolras, Marius, Myriel, Valjean
Ranging hubs                   Cosette, Courfeyrac, Enjolras, Marius, Myriel, Valjean
Clique-community centrality    Enjolras, Fantine, Gavroche, Marius, Valjean

Table 2: Hubs in Les Misérables characters co-occurrence. Comparing results obtained via the steady and ranging persistence construction and clique-community centrality.

A classical benchmark for the analysis of hubs in co-occurrence graphs is given by Les Misérables. The network representing the co-occurrence of its characters is freely available at Graphistry. The graph has 77 major characters as vertices; each of the 254 edges joins two characters which appear together in at least one scene; the weight on an edge is the number of common occurrences. We used the inverse of the weight as a filtering function. We compare our results with the ones of [24], where the notion of clique-community centrality was used to spot particularly important characters: see Table 2.

Our method spots Cosette as a hub, whereas clique-community centrality does not. On the contrary, our technique misses Gavroche and Fantine. Both methods miss Javert. We are particularly puzzled by the result of Kurlin's selection method: above the second widest diagonal gap (the first obviously isolates Jean Valjean) we find only Enjolras.

The website TerraLing.com contains much information, consisting of 165 properties, about several languages. It was used in an interesting research [22] on persistent cycles in language families. Unfortunately the amount of information varies quite a lot from language to language. We analysed the mutual relations of 19 languages (18 of the European Union plus Turkish: Table 3) for which at least 50% of the 165 properties are checked. The graph is the complete one with 19 vertices. The filtering function defined on each edge is the opposite of the normalised quantity of common properties of the two languages that it connects.

Ranging and steady hubs coincide and are: Castilian, Catalan, Dutch, English, Portuguese, Swedish.

Languages

Castilian     Catalan     Czech        Croatian    Danish
Dutch         English     Finnish      French      Galician
German        Greek       Hungarian    Italian     Polish
Portuguese    Romanian    Swedish      Turkish

Table 3: The 19 considered languages.

Apart from the presence of English, which might also be biased by the great quantity of information available, we have no key for interpreting these results. For this and for the previous applications, we would very much like to set up a research collaboration with specific experts.

We introduced gp-functions in a fairly general setting and studied their stability. We have then restricted our scope to the category of graphs, where we have defined steady and ranging sets according to features relative to the given graphs. Particular attention has been given to steady and ranging hubs in a graph. We also tried to apply this notion to the vertices of a network of airports, to the characters of Les Misérables and to a set of languages.

Acknowledgments

We are indebted to Diego Alberici, Emanuele Mingione, Pierluigi Contucci, Patrizio Frosini, Lorenzo Zuffi and above all Pietro Vertechi for many fruitful discussions. Article written within the activity of INdAM-GNSAGA.

Appendix: Instability

Fig. 6: $\sigma^{Eu}$ is not balanced: filtering function $f$ left, $f'$ right.

Fig. 7: $\varrho^{Eu}$ is not balanced: filtering function $f$ left, $f'$ right.

In order to show that some of the proposed gp-functions are not balanced, so that their persistence diagrams do not enjoy stability, we give examples which do not satisfy Def. 7.

Fig. 8: $\sigma^{H}$ is not balanced: filtering function $f$ left, $f'$ right.

The gp-function generator $\sigma^{Eu}$ is not balanced, as the example of Fig. 6 shows: in fact, the maximum absolute value of the weight difference on the same edges is 1, and

$\sigma^{Eu}_{(G,f)}(2.\ldots - \ldots,\ 10 + 1) = 1 > \sigma^{Eu}_{(G,f')}(2.\ldots,\ \ldots)$

Fig. 9: $\varrho^{H}$ is not balanced: filtering function $f$ left, $f'$ right.

Also the gp-function generator $\varrho^{Eu}$ is not balanced, as the example of Fig. 7 shows: in fact, the maximum absolute value of the weight difference on the same edges is 1, and

$\varrho^{Eu}_{(G,f)}(7.\ldots - \ldots,\ 10 + 1) = 1 > \varrho^{Eu}_{(G,f')}(7.\ldots,\ \ldots)$

$\sigma^{H}$ is not a balanced gp-function generator, as the example of Fig. 8 shows: the maximum absolute value of the weight difference on the same edges is 2, but

$\sigma^{H}_{(G,f)}(4 - \ldots,\ \ldots) > \sigma^{H}_{(G,f')}(4,\ \ldots)$

… “>” is substituted by “≥” in the definition of hub (which we do not think is a good idea).

Also $\varrho^{H}$ is not a balanced gp-function generator, as the example of Fig. 9 shows: the maximum absolute value of the weight difference on the same edges is 2, but

$\varrho^{H}_{(G,f)}(5 - \ldots,\ \ldots) > \varrho^{H}_{(G,f')}(5,\ \ldots)$

References

1. M. G. Bergomi, M. Ferri, P. Vertechi, and L. Zuffi. Beyond topological persistence: Starting from networks. arXiv preprint arXiv:1901.08051, 2020.
2. M. G. Bergomi, M. Ferri, and L. Zuffi. Topological graph persistence. arXiv preprint arXiv:1707.09670, 2020.
3. M. G. Bergomi and P. Vertechi. Rank-based persistence. Theory and applications of categories, 35(9):228–260, 2020.
4. A. S. Blevins and D. S. Bassett. Reorderability of node-filtered order complexes. Physical Review E, 101(5):052311, 2020.
5. P. Bubenik and J. A. Scott. Categorification of persistent homology. Discrete & Computational Geometry, 51(3):600–627, 2014.
6. F. Chazal, D. Cohen-Steiner, M. Glisse, L. J. Guibas, and S. Y. Oudot. Proximity of persistence modules and their diagrams. In SCG '09: Proceedings of the 25th annual symposium on Computational geometry, pages 237–246, New York, NY, USA, 2009. ACM.
7. S. Chowdhury and F. Mémoli. Persistent path homology of directed networks. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1152–1169. SIAM, 2018.
8. D. Cohen-Steiner, H. Edelsbrunner, and J. Harer. Stability of persistence diagrams. Discr. Comput. Geom., 37(1):103–120, 2007.
9. M. d'Amico, P. Frosini, and C. Landi. Natural pseudo-distance and optimal matching between reduced size functions. Acta Applicandae Mathematicae, 109(2):527–554, 2010.
10. V. de Silva, E. Munch, and A. Stefanou. Theory of interleavings on [0, ∞)-actegories. arXiv preprint arXiv:1706.04095, 2017.
11. H. Edelsbrunner and J. Harer. Persistent homology - a survey. In Surveys on discrete and computational geometry, volume 453 of Contemp. Math., pages 257–282. Amer. Math. Soc., Providence, RI, 2008.
12. P. Frosini and C. Landi. Size functions and formal series. Appl. Algebra Engrg. Comm. Comput., 12(4):327–349, 2001.
13. P. Frosini, C. Landi, and F. Mémoli. The persistent homotopy type distance. Homology, Homotopy and Applications, 21(2):231–259, 2019.
14. P. Frosini and M. Mulazzani. Size homotopy groups for computation of natural size distances. Bull. of the Belg. Math. Soc., 6(3):455–464, 1999.
15. W. Kim and F. Mémoli. Generalized persistence diagrams for persistence modules over posets. arXiv preprint arXiv:1810.11517, 2018.
16. V. Kurlin. A fast persistence-based segmentation of noisy 2d clouds with provable guarantees. Pattern recognition letters, 83:3–12, 2016.
17. M. Lesnick. The theory of the interleaving distance on multidimensional persistence modules. Foundations of Computational Mathematics, pages 1–38, 2015.
18. L.-D. Lord, P. Expert, H. M. Fernandes, G. Petri, T. J. Van Hartevelt, F. Vaccarino, G. Deco, F. Turkheimer, and M. L. Kringelbach. Insights into brain architectures from the homological scaffolds of functional connectivity networks. Frontiers in Systems Neuroscience, 10, 2016.
19. S. Y. Oudot. Persistence theory: from quiver representations to data analysis, volume 209. American Mathematical Society Providence, RI, 2015.
20. A. Patel. Generalized persistence diagrams. Journal of Applied and Computational Topology, 1(3-4):397–419, 2018.
21. G. Petri, P. Expert, F. Turkheimer, R. Carhart-Harris, D. Nutt, P. J. Hellyer, and F. Vaccarino. Homological scaffolds of brain functional networks. Journal of The Royal Society Interface, 11(101):20140873, 2014.
22. A. Port, I. Gheorghita, D. Guth, J. M. Clark, C. Liang, S. Dasu, and M. Marcolli. Persistent topology of syntax. Mathematics in Computer Science, 12(1):33–50, 2018.
23. M. W. Reimann, M. Nolte, M. Scolamiero, K. Turner, R. Perin, G. Chindemi, P. Dłotko, R. Levi, K. Hess, and H. Markram. Cliques of neurons bound into cavities provide a missing link between structure and function. Frontiers in Computational Neuroscience, 11:48, 2017.
24. B. Rieck, U. Fugacci, J. Lukasczyk, and H. Leitte. Clique community persistence: A topological visual analysis approach for complex networks. IEEE Transactions on Visualization and Computer Graphics, 24(1):822–831, 2018.
25. A. E. Sizemore, C. Giusti, A. Kahn, J. M. Vettel, R. F. Betzel, and D. S. Bassett. Cliques and cavities in the human connectome. Journal of Computational Neuroscience, 44(1):115–145, Feb 2018.
26. A. D. Vijay, M. Zhenyu, X. Kelin, and M. Yuguang. Weighted persistent homology for osmolyte molecular aggregation and hydrogen-bonding network analysis.