Homological Scaffold via Minimal Homology Bases

Abstract

The homological scaffold leverages persistent homology to construct a topologically sound summary of a weighted network. However, its crucial dependency on the choice of representative cycles hinders the ability to trace back global features onto individual network components, unless one provides a principled way to make such a choice. In this paper, we apply recent advances in the computation of minimal homology bases to introduce a quasi-canonical version of the scaffold, called minimal, and employ it to analyze data both real and in silico. At the same time, we verify that, statistically, the standard scaffold is a good proxy of the minimal one for sufficiently complex networks.


Guerra et al.

RESEARCH


Marco Guerra, Alessandro De Gregorio, Ulderico Fugacci, Giovanni Petri and Francesco Vaccarino
* Correspondence: [email protected]
Department of Mathematical Sciences, Politecnico di Torino, Torino, Italy
Full list of author information is available at the end of the article


Keywords:

Persistent Homology; Topological Data Analysis; Network Skeletonization

AMS Subject Classification:

Network science has long represented the cornerstone theory in dealing with complex, heterogeneous multi-agent systems. Network descriptions have found wide applications and had a significant impact on a wide range of fields ([1, 2]), including social networks ([3, 4]), epidemiology ([5, 6]), biology ([7, 8]), and neuroscience ([9, 10, 11]).

In recent years, new approaches to the analysis of networks and, more generally, complex interacting systems have emerged which leverage topological techniques ([12, 13, 14, 15]). These techniques are generally referred to as Topological Data Analysis (TDA) ([16, 17]). TDA is a relatively modern subject, based on classical Algebraic Topology ([18, 19]), that was sparked by a handful of seminal works in the late 1990s ([20, 21, 22, 23, 24]). TDA typically endows a large variety of datasets with a notion of shape (more properly, with a topological structure) and, based on that, studies the data in terms of its topological features. This field is undergoing a rapid expansion thanks to its rooting in the powerful languages of homological algebra and category theory, which provide strong formal foundations, as well as to the wide variety of applications it has found, which span material science ([25, 26]), biology and chemistry ([27, 28, 29, 30, 31, 32]), sensor networks ([33]), cosmology ([34]), medicine and neuroscience ([35, 36, 37, 38, 39, 40, 41, 42, 43]), manufacturing and engineering ([44, 45, 46]), social sciences ([47, 48]), and network science itself ([49, 50, 51, 52]).

The most central tool in TDA is undoubtedly Persistent Homology ([16, 53]). The theory of (or around) persistence has recently been proposed as a framework for the topological skeletonization of spaces, particularly weighted graphs and networks ([54, 55, 56, 57]).

In [40], the generators of persistent homology are used to build one instance of network skeletonization called the homological scaffold.
However, the method has a serious drawback, namely the large degree of arbitrariness in the choice of one representative cycle from the many equivalent generating cycles of the same homology class. This is unfortunately a direct consequence of homology classes being equivalence classes, and it affects all attempts to localize cycles ([58, 42]). In this work, we set out to address this issue by searching for a form of canonicity in the choice of generators, namely by computing minimal representatives of homology bases.

Minimal homology bases have long been investigated ([59, 60]), with a breakthrough only coming thanks to the introduction of a first efficient algorithm for the computation of bases in dimension one ([61]). Here, we leverage said minimal bases to propose a new approach to network skeletonization, the minimal scaffold, which overcomes the limitation of the previous one. While the minimal scaffold is not unique in the most general case, we provide strong guarantees and caveats on when and to what degree it is well-defined. We then show a few applications of the novel method, concluding the paper with a comparison between our construction and the previous one.

Outline of the Paper

The paper is organized as follows. Section 2 provides a brief overview of the main concepts in Topological Data Analysis. Section 3 describes the original approach to network skeletonization by means of persistent homology, and highlights the deficiencies which we wish to address. In Section 4, the topic of computing minimal representatives of a homology basis is worked out. Section 5 introduces the main concept of this work, the minimal scaffold. In Section 6, the issue of uniqueness is discussed, with some results stated, leading to a more refined version of the minimal scaffold. Section 7 showcases some applications of the minimal scaffold. In light of its computational complexity, we further carry out in Section 8 a statistical comparison between the minimal and original scaffolds, providing some heuristic guarantees and caveats. Section 9 concludes the discussion.

Glossary

List of symbols and their common usage throughout the paper.

Symbol                              Meaning
$\mathcal{C}$                       A point cloud in $\mathbb{R}^d$
$K$                                 A simplicial complex
$\mathcal{F}$                       A filtration of simplicial complexes $(K_\varepsilon)_{\varepsilon = 1, \dots, M}$
$W$                                 A non-negatively weighted finite graph
$V$                                 The set of vertices of a graph
$E$                                 The set of edges of a graph
$\mathrm{VR}(W)$                    The Vietoris-Rips complex of graph $W$
$C_k(K)$                            The vector space over $\mathbb{Z}_2$ of chains of $k$-simplices of the complex $K$
$\partial_k$                        The boundary operator between $C_k(K)$ and $C_{k-1}(K)$
$H_1(K)$                            The 1st homology group of complex $K$
$\beta_1(K)$                        The dimension of $H_1(K)$
$PH_1(\mathcal{F})$                 The 1-dimensional persistent homology of filtration $\mathcal{F}$
$\mu$                               A function assigning non-negative weights to edges and cycles
$\mathcal{B}$                       A minimal homology cycle basis
$\tilde{\mathcal{B}}$               A minimal homology cycle basis with draws
$\mathcal{B}^*$                     The disjoint union of minimal cycle bases across a filtration
$\tilde{\mathcal{B}}^*$             The disjoint union of minimal cycle bases with draws across a filtration
$V_i$                               A set of homologous, equally minimal variants of a basis cycle
$\mathcal{H}(W)$                    The homological scaffold of weighted graph $W$
$\mathcal{H}_{\min}(W)$             The minimal homological scaffold of weighted graph $W$
$\tilde{\mathcal{H}}_{\min}(W)$     The minimal homological scaffold with draws of weighted graph $W$

In this section we introduce the minimum amount of mathematics necessary to understand the rest of the paper. We refer to classical textbooks on the subject for further reading ([18, 19, 53, 16]).

Simplicial complexes

Thanks to their proven flexibility in a plethora of applicative contexts, simplicial complexes are the most widely adopted mathematical structure for encoding unorganized, large-size and high-dimensional data. In purely combinatorial terms, a (finite) simplicial complex $K$ on a finite set $V$ is a collection of non-empty subsets of $V$, called simplices, with the property of being closed under inclusion, i.e., every non-empty subset of a simplex of $K$ is itself a simplex of $K$. Given a simplicial complex $K$, the elements of $V$ are called vertices of $K$ and a simplex $\sigma \in K$ is called a $k$-simplex (equivalently, a simplex of dimension $k$) if it consists of $k + 1$ vertices. The dimension of a simplicial complex $K$ is the largest dimension of the simplices in $K$.

Even if the abstract definition of a simplicial complex just given is able to capture a variety of datasets not necessarily endowed with a geometrical realization, it is worth mentioning that, intuitively, a simplicial complex is nothing but a collection of well-glued bricks, its simplices. From this perspective, a $k$-simplex can be seen as the convex hull of $k + 1$ geometrically independent points. For instance, a 1-simplex is an edge, a 2-simplex is a triangle, a 3-simplex is a tetrahedron, and so on.
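The combinatorial definition translates almost literally into code. The following is a minimal sketch, not taken from the paper: simplices are stored as frozensets of vertices, and the function names (`closure`, `is_closed_under_inclusion`, `dimension`) are illustrative choices.

```python
from itertools import combinations

def closure(generators):
    """Smallest simplicial complex containing the given simplices:
    every non-empty subset of each generator, stored as a frozenset."""
    simplices = set()
    for simplex in generators:
        vertices = tuple(simplex)
        for size in range(1, len(vertices) + 1):
            simplices.update(frozenset(face) for face in combinations(vertices, size))
    return simplices

def is_closed_under_inclusion(simplices):
    """True if every non-empty proper subset of every simplex is also present."""
    return all(
        frozenset(face) in simplices
        for simplex in simplices
        for size in range(1, len(simplex))
        for face in combinations(simplex, size)
    )

def dimension(simplices):
    """A k-simplex has k + 1 vertices; the complex takes the largest k."""
    return max(len(simplex) for simplex in simplices) - 1

# A filled triangle {0, 1, 2} glued to a dangling edge {2, 3}.
K = closure([(0, 1, 2), (2, 3)])
print(len(K), dimension(K), is_closed_under_inclusion(K))

# Dropping the edge {0, 1} leaves the triangle without one of its faces,
# so the collection is no longer a simplicial complex.
broken = K - {frozenset({0, 1})}
print(is_closed_under_inclusion(broken))
```

Generating a complex from a few top-level simplices, as `closure` does, is usually more convenient than listing every face by hand.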

Homology

Homology is a topological tool which provides invariants for shape description and characterization. Given a simplicial complex $K$, it is possible to associate to it a collection of vector spaces $C_k(K)$ over a field, in our case $\mathbb{Z}_2$, whose bases are indexed by the $k$-simplices so that, loosely speaking, we say that these spaces are generated by the $k$-simplices of $K$. These spaces are connected by boundary operators $\partial_k : C_k(K) \to C_{k-1}(K)$ mapping each $k$-simplex $\sigma$ to the sum of the $(k-1)$-simplices of $K$ strictly contained in $\sigma$. We denote by $Z_k(K) := \ker \partial_k$ the space of the $k$-cycles of $K$ and by $B_k(K) := \operatorname{Im} \partial_{k+1}$ the space of the $k$-boundaries of $K$. Then, since $\partial_k \partial_{k+1} = 0$ (so that $B_k(K) \subseteq Z_k(K)$), the quotient

$$H_k(K) := Z_k(K) / B_k(K)$$

defines a vector space called the $k$-th homology group of $K$. We will call two $k$-cycles homologous if they belong to the same homology class.

Roughly speaking, homology reveals the presence of “holes” in a shape. A non-null element of $H_k(K)$ is an equivalence class of cycles that are not the boundary of any collection of $(k+1)$-simplices of $K$. Such classes represent, in dimension 0, the connected components of the complex $K$, in dimension 1, its tunnels and its loops, in dimension 2, the shells surrounding voids or cavities, and so on.

Persistent homology

An intrinsic limitation of homology concerns the need for working with a single simplicial complex representing the dataset under investigation. However, in real-world applications, the presence of noise and of measurement errors makes the choice and construction of a single steady representation very hard in practice.
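Before turning to filtrations, it may help to see how little is needed to compute homology for one fixed complex. The following is a minimal sketch, not taken from the paper: it computes Betti numbers over $\mathbb{Z}_2$ as $\dim C_k - \operatorname{rank} \partial_k - \operatorname{rank} \partial_{k+1}$, storing the rows of each boundary matrix as integer bitmasks. The function names are illustrative, and the input is assumed to be closed under inclusion and given as frozensets of vertices.

```python
from itertools import combinations

def rank_gf2(rows):
    """Rank over Z_2 of a matrix whose rows are encoded as integer bitmasks."""
    pivots = {}  # highest set bit -> reduced row having that leading bit
    for row in rows:
        while row:
            lead = row.bit_length() - 1
            if lead not in pivots:
                pivots[lead] = row
                break
            row ^= pivots[lead]  # addition over Z_2 is XOR
    return len(pivots)

def boundary_rank(simplices, k):
    """Rank of the boundary operator from k-chains to (k-1)-chains."""
    if k == 0:
        return 0  # vertices have empty boundary
    faces = [s for s in simplices if len(s) == k]
    column = {face: i for i, face in enumerate(faces)}
    rows = []
    for s in simplices:
        if len(s) == k + 1:
            # The boundary of a k-simplex is the sum of its (k-1)-faces.
            rows.append(sum(1 << column[frozenset(f)] for f in combinations(s, k)))
    return rank_gf2(rows)

def betti(simplices, k):
    """dim H_k = dim C_k - rank(boundary_k) - rank(boundary_{k+1})."""
    n_k = sum(1 for s in simplices if len(s) == k + 1)
    return n_k - boundary_rank(simplices, k) - boundary_rank(simplices, k + 1)

# A hollow triangle has one loop; filling it in kills the loop.
hollow = {frozenset(s) for s in [(0,), (1,), (2,), (0, 1), (1, 2), (0, 2)]}
filled = hollow | {frozenset((0, 1, 2))}
print(betti(hollow, 0), betti(hollow, 1), betti(filled, 1))
```

The answer depends on which simplices are included, which is exactly the sensitivity to a single fixed representation described above.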

Persistent homology ([22, 53]), currently one of the main tools in Topological Data Analysis, aims at solving this issue through a multi-scale study of a dataset and of its homological features, by associating to it a sequence of simplicial complexes. The concept of filtration captures exactly the idea of analyzing a dataset at different thresholds of a parameter on which it depends. More formally, given a simplicial complex $K$, a filtration $\mathcal{F}$ of $K$ is a sequence of its subcomplexes such that

$$\emptyset \subseteq K_1 \subseteq \cdots \subseteq K_M = K$$

Given a filtration of a simplicial complex $K$, persistent homology keeps track of the evolution of the non-null non-homologous cycles of $K$ and, by associating a lifespan to each of them, is able to discriminate the relevant information from the noise. Formally, for $p, q = 1, \dots, M$ with $p < q$, the persistent homology group $H_k^{p,q}(\mathcal{F})$ on $(p, q)$ of a filtration $\mathcal{F}$ consists of the image of the linear map between $H_k(K_p)$ and $H_k(K_q)$ induced by the inclusion of $K_p$ into $K_q$. So, more intuitively, the elements of $H_k^{p,q}(\mathcal{F})$ represent the cycles of $K$ which survive from step $p$ to step $q$.

Given a filtration of finite simplicial complexes $\mathcal{F}$, we define its $k$-dimensional persistent homology classes as the homology classes of $\bigoplus_\varepsilon H_k(K_\varepsilon)$ modulo the maps induced by the inclusion of simplicial complexes. More properly, $h \in H_k(K_p)$ and $h' \in H_k(K_q)$ with $p \le q$ are equivalent if and only if $\iota^{*\,p,q}_k(h) = h'$, where $\iota^{*\,p,q}_k$ denotes the linear map between $H_k(K_p)$ and $H_k(K_q)$ induced by the inclusion of $K_p$ into $K_q$.
We call $k$-dimensional persistent homology $PH_k(\mathcal{F})$ the space spanned by the $k$-dimensional persistent homology classes.

As proven in [23], a basis of $PH_k(\mathcal{F})$ is in bijective correspondence with a finite set of intervals of the form $\{(p, q),\ p < q,\ p, q \in \mathbb{Z} \cup \{\infty\}\}$, referred to as persistence pairs. We define a set of $k$-dimensional generator cycles of the persistent homology as a set of $k$-cycles of $K_M$ whose persistent homology classes form a basis of $PH_k(\mathcal{F})$.

The information about the “life” of each homology class can be collected in a visual, informative representation of the topological structure of the input, the persistence barcode: a plot consisting of a bar for each homological feature appearing throughout the filtration, stretching from its birth to its death value. An equivalent way to depict the same information is through the persistence diagram: the multi-set (i.e., multiple instances of the same element are allowed) of points in $\mathbb{R}^2$ consisting of all the (birth, death) pairs, i.e., pairs of values $p < q$ such that a $k$-dimensional homology class arises at filtration step $p$ and becomes zero at step $q$. Persistent homology owes its popularity as a descriptor to the immediacy and power of these visual representations of the homological information but, even more, to the fact that the retrieved features are provably stable. In fact, by defining a notion of distance among persistence diagrams or barcodes, it can be shown that similar datasets necessarily have similar homological features ([24]).

Building (filtered) complexes

In many applications, one is not directly called to deal with a simplicial complex, but has instead access to data in the form of point clouds in a metric space or of weighted graphs. For example, data may be obtained as a sample of some (unknown) ground truth, i.e., an undisclosed manifold of dimension usually much lower than the space it is embedded in ([16]). Another typical subject of application is network science ([49, 52]): in this setting, the input is in the form of a weighted graph. Notice that in this case it is not mandatory that the graph can be embedded in some metric space, i.e., that the edge weighting respects the triangle inequality. Networks are not necessarily representations of geometrical entities, and still the topological approach extends naturally to this context.

In both these cases, one needs to provide a suitable simplicial complex resting on the given structure. The subject has been addressed extensively (see, for example, [53]); here, we simply review the most typical scheme, called the Vietoris-Rips complex. Given a graph $G = (V, E)$, its flag or clique complex is the simplicial complex $\mathrm{Flag}(G)$ whose simplices coincide with the cliques of $G$.

Given a point cloud $V \subset \mathbb{R}^n$ and a fixed value $\varepsilon > 0$, one can build a graph $G_\varepsilon$ with a vertex for every point in $V$, and an edge between two vertices whenever the distance between the corresponding points is less than or equal to $\varepsilon$. Analogously, given a weighted graph $G = (V, E)$, one can build a subgraph $G_\varepsilon$ on the same vertex set, with only those edges that have weight less than or equal to $\varepsilon$. In either case, one can define the Vietoris-Rips complex $VR_\varepsilon$ of parameter $\varepsilon$ as the flag complex $\mathrm{Flag}(G_\varepsilon)$ of the graph $G_\varepsilon$. Furthermore, since the Vietoris-Rips complexes $VR_\varepsilon$ form an increasing sequence of simplicial complexes as $\varepsilon$ varies, the family $(VR_\varepsilon)$ gives rise to a filtration, called the filtered Vietoris-Rips complex (see Fig. 1).

Figure 1: (a) An example of a Vietoris-Rips filtration of simplicial complexes with parameter $\varepsilon$, and the corresponding barcode for 0- and 1-dimensional persistent homology. (b) The persistence pairs of the above filtration. $H_0$: (0, 0.18), (0, 0.31), (0, 0.33), (0, 0.34), (0, 0.35), (0, 0.35), (0, ∞); $H_1$: (0.39, 0.51). (c) Two equivalent representatives of the (only) generator of $PH_1$.

The homological scaffold originated from the intuition that traditional, graph-theoretical tools in network analysis were naturally able to capture significant properties ([62]), but proved not as effective in detecting multi-agent and large-scale interactions. Interest in searching for alternative descriptors of network relations arose, and soon works were published which leveraged invariants offered by computational topology ([63, 14, 13]).

In proposing the scaffold ([40]), the authors pointed out that homology might be able to summarize network mesoscale structures well, i.e., features living between the purely local connections and the global statistics, to which previous methodologies were blind. Furthermore, this structure could be analyzed over the continuous, full range of interaction intensities, without the need for ad-hoc domain-specific thresholds.

Homological cycles intuitively describe obstruction patterns. The presence of nontrivial homology within a given region of a network highlights its structure as non-contractible, binding signals to flow over constrained channels, which in turn play the role of bridges.

To test the method, the homological scaffold was computed from resting-state fMRI data for 15 healthy volunteers who were infused with either placebo or psilocybin: the scaffold discriminated the two groups, and provided meaningful insight into the impact of the psychoactive substance on the pattern of information flow in the brain [40].

Given a non-negatively weighted finite graph $W = (V, E, w : E \to \mathbb{R}_+)$, let $\mathcal{F}$ be a filtration of simplicial complexes as above. Let $\{b_i\}$ be a set of 1-dimensional generator cycles of the persistent homology. Since we are over $\mathbb{Z}_2$, each of the $b_i$'s is completely identified by its support, which is a set of edges of $E$. In particular, we can depict the set $\{b_i\}$ as a matrix whose rows are indexed by $E$ and whose columns are the $b_i$'s. The row sums, as natural numbers, form a new weighting function on the edges of $W$, the new weights counting precisely in how many persistent cycles an edge appears along the filtration.

Definition 3.1

Suppose $W$ and $\mathcal{F}$ as above, and consider a set $\{b_i\}$ of 1-dimensional generator cycles of the persistent homology. Consider the function $h_W : E \to \mathbb{R}_+$

$$h_W := \sum_i \mathbf{1}_{e \in b_i} \tag{1}$$

where by $\mathbf{1}_{e \in b_i}$ we denote the indicator function $E \to \mathbb{R}_+$ such that $\mathbf{1}_{e \in b_i}(e') = 1$ if $e'$ appears in $b_i$, and 0 otherwise. Then the homological scaffold of $W$ is the weighted graph $\mathcal{H}(W)$ such that
- its vertex set coincides with the vertex set of $W$
- its edge set is a subset of the edge set of $W$, consisting of edges with nonzero value for $h_W$
- its weight function is the restriction of $h_W$ to $E$.

In accordance with the above definition, building the homological scaffold of a weighted network $W$ is a method of network compression or skeletonization. The definition also implies that edge weights are assigned by the number of basis cycles the edge belongs to.

In the example of Fig. 2(a), a filtration of simplicial complexes arising from a point cloud is depicted, together with generators of the persistent homology group, each at the scale at which it is born. In Fig. 2(b), the corresponding homological scaffold is represented: one can see that the scaffolding procedure amounts to stacking generators of $PH_1$, i.e., cycles in the network, each yielding unitary weight.

Figure 2: (a) A point cloud in [0 , and the generators of $PH_1$, plotted on the filtration step they appear at (scale reported on the axis below). (b) The resulting homological scaffold. Edges in blue have weight 1, each belonging to only one generator. The edge in green has weight 2, as it belongs to two generators.

In the following, we shall sometimes refer to the homological scaffold as the loose, or original, scaffold, to contrast it with the new definition of scaffold to follow.

As anticipated in the introduction, it is apparent that there is a substantial source of arbitrariness in this definition. Several different representative cycles exist which form a basis of the persistent homology (as a consequence of several different cycles belonging to the same homology class), and hence one must make a choice. For example, Fig. 3(a) depicts one specific cycle whose homology class generates (part of) the persistent homology group of the point cloud. At the same time, any other choice of edges forming a cycle around the hole is homologically equivalent and, in principle, legitimate.

In the original paper, the authors resorted to using the cycles as output by the JavaPlex implementation ([64]) of the persistent homology algorithm (based on the original implementation of [21]), and a posteriori checked the selected cycles for consistency. However, in principle, this means that the same simplicial complex written with two different orderings of the simplices could lead to different choices of generators and, therefore, to different scaffolds.

As such, we must be careful about the nodes and edges output by the algorithm: while the presence of a generator undeniably denotes that an obstruction pattern exists, we cannot be as confident about its precise location in the network or the constituents that provide bridges around it. The homological scaffold defined in this way introduces noise in the localization of mesoscale patterns onto individual nodes and edges, a process which, if accurate, could provide valuable insight as to the functional role of single players in a network.

Figure 3: A simplicial complex $K$ with $\dim H_1(K) = 1$. Its homological scaffold (on a subset of the filtration steps, for clarity) is reported in panel (a): the chosen generator meanders around the hole. Furthermore, a different ordering of the list of simplices fed to the algorithm could return a different cycle. In panel (b), the shortest representative cycle is chosen: this choice is stable with respect to any ordering of the input, while at the same time endowing the generator with some metric and geometric meaning.

In this work, we try to work around the problem of cycle choice and give a stricter definition, by requiring that, among all possible representatives, those of minimal total length are chosen (e.g., Fig. 3(b)).

The original algorithm reported a computational complexity of the order $O(n^3)$ to obtain representatives of basis cycles.

The search for minimality in the computation of the scaffold was made feasible by the introduction of efficient algorithms to compute the minimal representatives of a homology basis in dimension one.

It is known that in dimension higher than one, minimal representatives of a homology basis will remain elusive. Indeed, Chen and Freedman ([65]) proved that the problem of obtaining these minimal representatives is computationally intractable, being at least as hard as the notoriously NP-Hard Nearest Codeword Problem. Furthermore, it is even NP-Hard to approximate within any constant factor, meaning that, unless P = NP, no polynomial-time algorithm exists to obtain an approximate minimal basis that differs from the exact one by at most a multiplicative constant. In the light of this, we must necessarily restrict our attention to the 1-dimensional case, i.e., computing minimal representatives of a basis of $H_1$.

Given a simplicial complex $K$, let us consider $C_1$, the vector space generated by the 1-simplices of $K$, and $Z_1$, the vector space of 1-cycles, i.e., $Z_1 = \ker \partial_1$. Given a 1-cycle $b \in Z_1$, let $\mu(b)$ be its length, i.e., the sum of the weights of the 1-simplices that form it, and denote by $[b]$ the homology class $b$ belongs to. Finally, let $\beta_1 := \dim H_1(K)$. We want to obtain a set of $\beta_1$ cycles in $Z_1$

$$\{b_1, \dots, b_{\beta_1}\} = \operatorname*{argmin}_{\mathrm{Span}\{[b_i]\} = H_1} \sum_i \mu(b_i) \tag{2}$$

that is, a set of cycles of minimal length whose homology classes span $H_1(K)$. In accordance with the literature, we call this set a minimal homology basis, with a slight abuse of terminology, as it would be more appropriate to call it a minimally-represented homology basis.

In 2018, Dey et al. ([61]) introduced a polynomial-time algorithm to obtain said representatives. Building on the work of Horton ([66]), de Pina ([67]), and Mehlhorn et al. ([68]), the algorithm sets off to compute a basis of the space of cycles. Then, it applies a cohomological technique called simplex annotation ([69]) to lift a basis of cycles to a basis of the homology group $H_1$, while at the same time enforcing the minimal length constraint. A sketch of the algorithm follows.

Algorithm: MinBasis($K$)
• A basis of the cycle group $Z_1$ is found via a spanning tree. Each edge in the complement of the spanning tree identifies a candidate cycle ([66]).
• An annotation of the edges is computed via matrix reduction ([69]). This yields the dimension $\beta_1$ of $H_1$, as well as an efficient tool to determine if two cycles $b_1$ and $b_2$ are linearly dependent in $H_1$ ($[b_1] = [b_2]$).
• A set of support vectors is generated which maintains a basis of the orthogonal complement in $H_1$ of the minimal basis cycles.
• Iteratively for each dimension of $H_1$, the candidate set of cycles is parsed in search of cycles $b$ that are linearly independent in homology from the previous ones (exploiting the support vectors). Among these, the $\mu$-shortest one is added to the minimal basis.
• The set of support vectors is updated for the remaining dimensions to enforce that it remains a basis of the orthogonal complement of the basis.
• The last two steps above are repeated until completion of the minimal basis.

Call $\mathcal{B} = \{b_i\}$ the output of MinBasis on input $K$.

Theorem (3.1, [61]) Cycles in $\mathcal{B}$ form a minimal homology basis of $H_1(K)$.

Notice that the minimal homology basis is guaranteed to exist, as we only work with finite simplicial complexes, which implies the existence of a finite number of bases. However, it need not, in general, be unique. Several different cycles of the same minimal length may all belong to the same homology class of a basis cycle. Heuristically, this is especially true when the input complex is unweighted (equivalently, has equal weights for every edge), in which case the length of a cycle is the number of edges that form it. Furthermore, there exist cases in which different sets of cycles of minimal length generate the same homology space, and are not even pairwise homologous. We will treat the problem of the uniqueness of the minimal basis in more detail in the following, and account for it explicitly in the construction of the minimal scaffold.

The computational complexity of the above procedure is evaluated ([61]) to $O(n^2 \beta_1 + n^\omega)$, where $n$ is the number of simplices in $K$ and $\omega$ is the fast matrix multiplication exponent, which as of 2014 is bounded by 2.373 ([61, 70, 71]). Since $\beta_1 \le n$ and $\omega < 3$, this yields a worst-case complexity of $O(n^3)$ in the number of simplices for general complexes, which we recall is itself of order 3 in the number of points in the worst case.

In this section, we introduce an alternative definition for the homological scaffold, which we call minimal, based on the minimal representatives obtained above and aimed at overcoming the arbitrariness in the cycle choice of the previous definition. After addressing the simplest case, we analyze its uniqueness properties and introduce a second, more refined, definition.

Let $\mathcal{F}$ be the filtration of simplicial complexes induced by a non-negatively weighted finite graph $W$.
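For all filtration steps $\varepsilon$, define, as per (2), $\mathcal{B}^\varepsilon := \{b_i^\varepsilon\}$ the minimal homology basis of $H_1(K_\varepsilon)$. Take the disjoint union of minimal bases for $\varepsilon$ varying on all filtration steps

$$\mathcal{B}^* := \coprod_\varepsilon \mathcal{B}^\varepsilon$$

Definition 5.1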

Suppose $W$, $\mathcal{F}$ and $\mathcal{B}^*$ as above. Similarly to the loose case, define the function $h_{W,\min} : E \to \mathbb{R}_+$ as

$$h_{W,\min} := \sum_{b \in \mathcal{B}^*} \mathbf{1}_{e \in b} \tag{3}$$

Then, we define the minimal scaffold of $W$ as the weighted graph $\mathcal{H}_{\min}(W)$ whose:
- vertex set coincides with the vertex set of $W$
- edge set is a subset of the edge set of $W$, consisting of edges with nonzero value for $h_{W,\min}$
- weight function is the restriction of $h_{W,\min}$ to $E$.

The minimal scaffold amounts, again, to the stacking of generator cycles across a filtration. However, two differences are to be noted with respect to the loose definition. First, we require the representative cycles to be minimal. Second, we point out that while the loose scaffold is built by aggregating the generator cycles of $PH_1(\mathcal{F})$, the minimal scaffold is built by independently computing a minimal basis for each $H_1(K_\varepsilon)$, for all $\varepsilon$. Notice that, since cycles are modified throughout a filtration, it would be meaningless to talk about a minimal representative over a certain persistence interval. This also means that its computation can be effectively parallelized by assigning different filtration steps to different jobs, and later recombining the outputs.

Figure 4: (a) The same point cloud of Fig. 2. Along the filtration we show the evolution of minimal generators, which can get progressively shorter as new edges are introduced. For example, at $\varepsilon$ = 0 . , the pentagonal cycle gets cut to a shorter quadrilateral, albeit with an individual longer edge. This evolution is accounted for in the minimal scaffold, which displays the triangle-rich structure mentioned above. (b) The resulting minimal scaffold (weights not reported).

An interesting phenomenon that descends directly from the above peculiarity is that the minimal scaffold of random point clouds tends to display a more pronounced triangular structure (clustering) around cycles.
Indeed, as longer (or, in non-metrical filtrations, later) edges are introduced, a cycle can be shortened (by the triangle inequality) by a longer edge which cuts a corner. Since at each step the algorithm records the minimal representative, upon aggregating the minimal scaffold one finds each cycle in its progressively shorter version, and the history of the shortening is visible as a padding of triangles around it (see for example Fig. 4(a)).

We remark that, if there is no ambiguity in the construction of a filtration of simplicial complexes from a point cloud, or from a weighted graph, we will indifferently speak of the scaffold as a function of either of them ($\mathcal{H}_{\min}(\mathcal{C})$, or $\mathcal{H}_{\min}(W)$, or $\mathcal{H}_{\min}(\mathcal{F})$).

We have mentioned that the scaffold amounts to a change in weighting in the input graph

$$h_{W,\min} : E \to \mathbb{R}_+$$

altering the original weights of the edges. Additionally, considering node strength (i.e., the sum of the weights of the edges incident to a given node), it can equally be considered as a function

$$\mathcal{H}_{\min} : V \to \mathbb{R}_+$$

assigning weights to nodes. Considering the reliability of the choice of edges in the procedure, this explains why the minimal scaffold can be utilized to associate mesoscopic features with single nodes and links.

Computational Complexity

For large input sizes, the cost of assembling the minimal basis cycles into the scaffold is negligible with respect to the cost of computing such minimal bases. We know that each run of Dey's algorithm costs $O(|K|^3)$ in the worst case ([61]), and in the worst case $|K|$ is itself $O(n^3)$, where $n$ is the number of points.

The number of filtration steps has an upper bound of $O(n^2)$ (i.e., the number of edges) in the worst case, as in general every edge may carry a different weight. Hence Dey's algorithm has to be run once for each edge in the worst case.

This yields a theoretical worst-case complexity of order $O(n^9 \cdot n^2) = O(n^{11})$. Therefore, while the minimal scaffold is undeniably computable in polynomial time, its practical computation is often hindered by its dire lack of scalability, especially if compared against the loose version, which has a far more favourable complexity.

A comparison of running times is carried out in Fig. 5, which clearly shows that computing the minimal scaffold on an ordinary machine can quickly become troublesome.
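The two counts entering this estimate can be checked on a small worst-case instance. The sketch below is an illustration under stated assumptions, not the authors' code: edges are keyed by frozensets, only simplices up to dimension 2 are counted (these are the ones that determine $H_1$), and the names `filtration_steps` and `two_skeleton_size` are hypothetical.

```python
from itertools import combinations
from math import comb

def filtration_steps(weights):
    """One filtration step per distinct edge weight, in increasing order."""
    return sorted(set(weights.values()))

def two_skeleton_size(vertices, weights, eps):
    """Number of vertices, edges and triangles of the flag complex of the
    subgraph that keeps only edges of weight <= eps."""
    edges = {e for e, w in weights.items() if w <= eps}
    triangles = sum(
        1
        for t in combinations(vertices, 3)
        if all(frozenset(pair) in edges for pair in combinations(t, 2))
    )
    return len(vertices) + len(edges) + triangles

# Worst case: a complete graph on n vertices whose edge weights are all distinct.
n = 6
vertices = list(range(n))
weights = {frozenset(p): i + 1.0 for i, p in enumerate(combinations(vertices, 2))}

steps = filtration_steps(weights)
largest = two_skeleton_size(vertices, weights, steps[-1])
print(len(steps), comb(n, 2))                # steps vs. number of edges, O(n^2)
print(largest, n + comb(n, 2) + comb(n, 3))  # |K| at the last step, O(n^3)
# Crude proxy for the total cost: one cubic-time run per filtration step.
print(sum(two_skeleton_size(vertices, weights, eps) ** 3 for eps in steps))
```

Bounding every term of the final sum by the last one recovers the product of the number of steps and $|K|^3$ used in the estimate above.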

Figure 5: The running times of computing the minimal and loose scaffolds for Watts-Strogatz weighted random graphs. For all instances, the number of nodes N is indicated on the x-axis. The number of stubs k is N/ , and the rewiring probability is p = 0 . .

Implementation

We have written a Python implementation of Dey's algorithm, together with a library for the computation of the minimal scaffold. The code is available on GitHub at [72], with some usage examples. It allows for shared-memory multi-threaded parallelism across filtration steps to improve computation times, while still being suitable for ordinary desktop workstations.
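The library itself is not reproduced here. As a rough illustration of the aggregation in equation (3), and of why it parallelizes across filtration steps, the following sketch may help. Everything in it is an assumption made for the example: the function names, the representation of a cycle as a set of edge tuples, and the `min_basis` argument, which stands in for any routine returning a minimal basis for one step.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def scaffold_weights(bases):
    """Edge weights of the scaffold: for each edge, the number of cycles
    containing it, summed over the bases of all filtration steps."""
    weights = Counter()
    for basis in bases:
        for cycle in basis:
            weights.update(cycle)  # each edge of the cycle contributes 1
    return weights

def minimal_scaffold(steps, min_basis, workers=4):
    """Compute a basis for every filtration step independently, then aggregate."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        bases = list(pool.map(min_basis, steps))
    return scaffold_weights(bases)

# Toy stand-in for a minimal-basis routine: precomputed bases for two steps.
pentagon = {(0, 1), (1, 2), (2, 3), (3, 4), (0, 4)}
quadrilateral = {(0, 1), (1, 2), (2, 3), (0, 3)}  # after edge (0, 3) cuts a corner
precomputed = {"step 1": [pentagon], "step 2": [quadrilateral]}

h = minimal_scaffold(["step 1", "step 2"], precomputed.__getitem__)
for edge, weight in sorted(h.items()):
    print(edge, weight)
```

Edges that never appear in a basis cycle are simply absent from the result, matching the requirement that the scaffold keep only edges with nonzero weight. For CPU-bound pure-Python work, a process pool may be a better fit than threads; that choice is left open here.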

The uniqueness of the minimal scaffold depends on the uniqueness of the minimal basis. Indeed, if there exists only one possible set $B^*$ of cycles forming a minimal basis, then the scaffold is uniquely determined. Two issues affect the uniqueness of set $B^*$.

Draws

The first one arises when two or more different and homologous basis cycles are of the same minimal length. This case is relatively simple to work around: we modify the definition of minimal scaffold to keep track of all variants of minimal basis cycles, dividing the weight equally among them.

Specifically, to account for this issue we have slightly modified Dey’s algorithm. In its last step described above, one is concerned with finding all cycles whose annotation is not orthogonal to the given support vector: among these, the one with minimal length is chosen as a basis cycle. Instead, we keep track of all such cycles with the same minimal length. This does not alter the complexity, as one needs to check all possible cycles anyway. We call this case a draw.
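The handling of draws described above can be illustrated with a short Python sketch. It assumes that the candidate cycles have already been filtered (for example by the non-orthogonality test) and are given as lists of vertices; the helper names `minimal_variants` and `weights_with_draws` are hypothetical, and lengths are compared exactly, which with floating-point lengths might call for a tolerance.

```python
from collections import defaultdict
from fractions import Fraction


def edge(u, v):
    """Canonical (sorted) representation of an undirected edge."""
    return (u, v) if u <= v else (v, u)


def cycle_edges(vertices):
    """Edges of the closed walk through `vertices` (last vertex joins the first)."""
    k = len(vertices)
    return [edge(vertices[i], vertices[(i + 1) % k]) for i in range(k)]


def minimal_variants(candidates, length):
    """Keep every candidate attaining the minimal length (a draw if more than one)."""
    best = min(length(c) for c in candidates)
    return [c for c in candidates if length(c) == best]


def weights_with_draws(variant_sets):
    """Each set of variants carries total weight 1, split equally among its cycles."""
    h = defaultdict(Fraction)
    for variants in variant_sets:
        share = Fraction(1, len(variants))
        for cyc in variants:
            for e in cycle_edges(cyc):
                h[e] += share
    return dict(h)


if __name__ == "__main__":
    # Two equally short cycles that differ only around a "diamond" 2-3-4-5.
    b1 = [0, 1, 2, 3, 4]
    b2 = [0, 1, 2, 5, 4]
    longer = [0, 1, 2, 3, 4, 6]
    variants = minimal_variants([longer, b1, b2], length=len)
    for e, w in sorted(weights_with_draws([variants]).items()):
        print(e, w)
```

In this toy input the edges shared by both variants receive weight 1, while the four edges around the diamond receive weight 1/2 each.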

Therefore, we modify set $B$ to become a set of sets of cycles. Given complex $K$, we define a minimal basis with draws

$$\tilde{B} := \bigcup_{i=1}^{\beta_1(K)} \{ b_{i,1}, \dots, b_{i,n_i} \}$$

where for all $i = 1, \dots, \beta_1(K)$, the cycles $b_{i,j}$ with $j = 1, \dots, n_i$ are homologous and have the same minimal length. Furthermore, for every choice of $j_i \in \{1, \dots, n_i\}$, $\mathrm{Span}_i \{ b_{i,j_i} \} = H_1(K)$. Call $V_i := \{ b_{i,1}, \dots, b_{i,n_i} \}$ each set of draws, i.e., variants of the $i$th minimal basis cycle, $\forall i = 1, \dots, \beta_1(K)$.

In the example of Fig. 6(a) and (b), we have set $\tilde{B} = \{ \{ b_{1,1}, b_{1,2} \} \}$, whereas set $B$ might have indifferently been equal to $\{ b_{1,1} \}$ or to $\{ b_{1,2} \}$, whichever happened to come first in the search.

The minimal scaffold is modified accordingly. Given the usual filtration $F$, let $\tilde{B}_\varepsilon$ be the minimal basis with draws of $H_1(K_\varepsilon)$. Again, we aggregate all variants of minimal basis cycles along the filtration

$$\tilde{B}^* := \coprod_{\varepsilon} \tilde{B}_\varepsilon$$

Then, we define the weighting function with draws $\tilde{h}_{W,\min} : E \mapsto \mathbb{R}^+$

$$\tilde{h}_{W,\min}(e) := \sum_{V \subset \tilde{B}^*} \frac{1}{|V|} \sum_{b \in V} \mathbb{1}_{e \in b} \qquad (4)$$

and the resulting minimal scaffold with draws $\tilde{H}_{\min}(W)$ is built from $\tilde{h}_{W,\min}$ as in Definition 3.

The meaning of the above definition is that all variants of all minimal basis cycles are taken into account when building the scaffold, and the weights are assigned dividing each variant's contribution by the cardinality of its set of draws, for each filtration step. In the example of Fig. 6(c), the two cycles forming the variants of the only generator are multiplied by a factor of $1/2$ and then summed: therefore, common edges outside the diamond are assigned weight 1, consistently with the minimal scaffold in definition (3), whereas the four edges forming the perimeter of the diamond each get assigned weight $1/2$.

With the introduction of draws, we settle the case when ambiguity arises among individual cycles, without interactions. As an example, we can state the following result.
Proposition. If $F$ is such that, for all $\varepsilon$ in the filtration, each basis cycle belongs to a different connected component of $K_\varepsilon$, then the minimal scaffold with draws $\tilde{H}_{\min}(F)$ is unique.

Pathological cases

The other issue arises when there exist sets of minimal cycles that are not linearly independent. Suppose that three different cycles generate a homology group of dimension two, i.e., three minimal cycles are pairwise independent in homology, but dependent as a triple. In this case, two generators are sufficient to span $H_1$ and, if their lengths are arranged pathologically, there is no principled way to choose two out of the three.

Suppose for example that three cycles $b_1$, $b_2$ and $b_3$ are such that

$$\mu(b_1) < \mu(b_2) = \mu(b_3) \quad \text{and} \quad [b_1] = [b_2] + [b_3]$$

In this case, both bases $\{ b_1, b_2 \}$ and $\{ b_1, b_3 \}$ span the same homology space, and are of equal minimal length. The minimality criterion fails in this case.

One could believe that such a configuration can only happen in the most general spaces, and that by imposing some mild hypotheses on the input data one could rule the pathology out. In fact the opposite is true, this degeneracy being possible even after enforcing very strong conditions on the data.

Counterexample

Even if $W$ is planar and an isometric embedding $W \hookrightarrow \mathbb{R}^2$ exists (i.e., the input planar weighted graph can be accurately drawn onto the plane), the minimal scaffold $\tilde{H}_{\min}(W)$ need not be unique.

In fact, consider complex $K$ arising from the geometric, planar graph in Fig. 6(d). Its homology $H_1(K)$ is generated by two cycles; since the outer cycle $b_1$ is the shortest, and the two inner ones $b_2$ and $b_3$ are of equal length, the minimality criterion cannot decide between $\{ b_1, b_2 \}$ and $\{ b_1, b_3 \}$, as both are acceptable minimal bases. The minimal scaffold (with or without draws) is not unique in this case.

Clearly, the same could happen with more than three cycles, with a larger number of possibly ambiguous configurations. Therefore, if we allow for a high degree of symmetry in the input, this pathology could arise even in the rather tame context of planar graphs in $\mathbb{R}^2$. This issue is rather delicate, in the sense that not only is the algorithm unable to make a principled choice; it is not even capable of detecting when such a configuration takes place. In fact, this is more a feature of homology than a flaw in the skeletonization framework: what our eyes see as different cycles are in fact homologically equivalent, and it is impossible to use homology to tell them apart.

We however remark that, for complexes arising from real-world data, this type of configuration is actually pathological. Indeed, the following generality result holds.

Proposition

Assume a point cloud $C = \{ X_i \}$ such that $X_i \sim U([0,1]^d)$ i.i.d. Then, almost surely, the minimal scaffold $H_{\min}(W)$ (with or without draws) is unique.

If the input point cloud is sampled uniformly at random in some $\mathbb{R}^d$, then edge lengths are distributed according to an absolutely continuous probability law. Therefore, given two edges $e_1$ and $e_2$, $P[\mu(e_1) = \mu(e_2)] = 0$. The same holds for any two non-identical cycles, and any two homology bases (being but finite sets of edges): the probability of them sharing the exact same length is zero. By finiteness of the input, at least one minimal homology basis exists and, by the above reasoning, almost surely this basis is unique for each filtration step. Then, with probability 1 the minimal scaffold is unique.
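The role of absolute continuity in the argument above can be checked empirically with a small Python sketch. It compares a point cloud sampled uniformly in the unit square with a highly symmetric grid, and tests whether any two edges share exactly the same length. The sample size, the seed, and the function names are illustrative assumptions; floating-point arithmetic only approximates the almost-sure statement.

```python
import itertools
import math
import random


def edge_lengths(points):
    """Euclidean lengths of all edges of the complete graph on `points`."""
    return [math.dist(p, q) for p, q in itertools.combinations(points, 2)]


def has_ties(lengths):
    """True if at least two edges have exactly the same length."""
    return len(set(lengths)) < len(lengths)


rng = random.Random(0)
random_cloud = [(rng.random(), rng.random()) for _ in range(50)]
grid_cloud = [(float(i), float(j)) for i in range(3) for j in range(3)]

if __name__ == "__main__":
    print("random cloud has ties:", has_ties(edge_lengths(random_cloud)))
    print("grid cloud has ties:  ", has_ties(edge_lengths(grid_cloud)))
```

The grid produces many equal edge lengths, which is the kind of symmetry that makes equal-length cycles possible, whereas the random sample is expected to produce none.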

Figure 6

Top panel: (a) A simplicial complex $K$. (b) Two homologous and equally minimal generators of $H_1(K)$. (c) The minimal scaffold with draws $\tilde{H}_{\min}(K)$. The weight is equally divided among the variants of the minimal representative. Bottom panel: (d) A simplicial complex $K$ on the represented point cloud. $H_1(K)$ has dimension 2. (e) $\mu(b_1) < \mu(b_2) = \mu(b_3)$. A minimal basis can either be composed of $\{ b_1, b_2 \}$ or $\{ b_1, b_3 \}$, hence it is not unique.

This result is actually quite general: whenever we can assume our input data to be subject to noise, then we are in principle allowed to rule out pathological same-length cycles. In these cases, the minimal scaffold is unique.

We remark that this uniqueness result is compatible with the phenomenon of the concentration of measure: while for a very high-dimensional space or a very large number of points we know from theory that the distribution of edge lengths concentrates towards its mean value, the probability of two edges (and hence two cycles) having the same length is still zero. One needs to be careful, however, that the probability of two cycles differing in length by less than some $\epsilon > 0$ is nonzero for any such $\epsilon$.

In summary, the minimal scaffold with draws $\tilde{H}_{\min}$ is well-defined up to some pathological circumstances, where it may depend on the ordering of the input.

As illustrative examples, we show here a few applications of the minimal scaffold. Through it, we obtain meaningful subsets of known networks in neuroscience, and rank their constituents by their “topological importance”.

Figure 7

The top 25 neurons by relative node strength in the minimal scaffold over average strength in C. Elegans (mean . ). Four neurons show a significantly higher relative strength than the others.

The C. Elegans dataset is a correlation network of neural activations of the nematode worm Caenorhabditis elegans. C. Elegans has become a model organism due to the unique characteristic of each individual sharing the exact same nervous system structure.

The input consists of a symmetric weighted adjacency matrix over 297 nodes, each representing a neuron. Edge weights represent (quantized) time correlations between the firing of neurons, ranging from 1 to 70.

The minimal homological scaffold of its brain map highlights the geometry of the obstruction patterns, i.e., the precise areas where nervous stimuli are less likely to flow. We stress the improvement obtained by the minimal scaffold over the loose one, in that it is not only able to identify the presence of a “grey area” in the network, but it can also provide a reliable boundary for it, and identify which neurons and inter-neuron links are responsible for information flowing around the obstruction.

As an interesting example, we see in Fig. 7 the top 25 neurons ranked in descending order of relative node strength (sum of weights of incident edges) with respect to the average node strength. We can identify four nodes, labeled 81, 260, 36, and 37, which hold a significantly higher relative strength than the rest. This implies their presence in many minimal cycles across several scales, hence suggesting that they play a crucial role in the fabric of information flow within the nematode’s brain.

The same type of analysis was repeated on the correlation network of brain activities in an 88-parcel atlas of the human brain, obtained through fMRI imaging at resting state. The data is courtesy of the Human Connectome Project ([73]).

Again, the minimal scaffold identifies which regions and links in the human brain are key bridges for the flow of information. Two parcels stand out (Fig. 8(a)) as particularly relevant for network topology.

For a relatively small network such as this, we can visualize the scaffold as a proper subnetwork by a chord diagram (Fig. 8(b)), with edge weight represented by color intensity and node strength by the size and color of the vertex. We stress that, starting from a virtually complete graph over 88 nodes, we reduce the size from 3828 edges to just 191, while preserving the topological structure.

We can, as well, leverage libraries in computational neuroscience ([74]) to embed the scaffold in the actual human brain, with regions correctly located, projected on the three coordinate planes. In Fig. 8(c), for visualization purposes color intensities represent log-weight in the scaffold.

Figure 8

(a) The top 25 brain regions in the human brain by relative node strength in the minimal scaffold over average strength (mean . ). Two regions show significantly higher importance. (b) The chord diagram of the minimal scaffold. Node size represents node strength, edge color intensity represents weight in the scaffold. (c) The minimal scaffold embedded in the human brain, with regions accurately located, projected on the three coordinate planes. Edge color represents log-weight in the minimal scaffold (log scale for visualization purposes).

As the last contribution of this work, we consider a comparison between the minimal and loose scaffolds.

We have already pointed out that the minimal scaffold in general offers superior guarantees as a tool, both for network analysis and network skeletonization. On the other hand, the loose scaffold clearly has an advantage in terms of computational complexity: while it is in principle viable for most of the applications where persistent homology has been employed, the minimal scaffold, even adopting filtration-wise parallelization, requires a vastly larger amount of computational power, which effectively limits its range of application, unless run on dedicated, high-performance infrastructures.

A reasonable question to ask is the following. If one is interested not in the exact structure of the scaffold, but only in its statistical behaviour, could the loose scaffold provide a sufficient approximation of the minimal one? As a more concrete example, if instead of wondering exactly which nodes in a network are the most topologically important one is interested in the distribution of the degree sequence of the minimal scaffold, could the loose one come to one’s help?

To answer this question, we have performed comparisons of several graph metrics in the two scaffolds of C. Elegans. Further, to gain insight into the general case, we have sampled two families of random graphs at different parameter values, one for geometric graphs (Random Geometric Graph), and one for non-geometric graphs (Weighted Watts-Strogatz).

C. Elegans

For the C. Elegans dataset, we have compared the following graph metrics of the minimal and loose scaffolds:

1 Degree Sequence
2 Node Strength
3 Betweenness Centrality
4 Closeness Centrality
5 Eigenvector Centrality
6 Clustering Coefficients
7 Edge weights

Results (reported in the Table of Fig. 9(c)) indicate that, for metrics 1 to 5, the two scaffolds are very well correlated. So for example the cheap, loose scaffold is a reliable proxy of the distribution of the “true” degree sequence (scatterplot in Fig. 9(d)).

We instead observe poor correlation of edge weights and clustering coefficients. The first one is not unexpected, since the edge weighting procedure is conceptually different in the two scaffolds: while in the minimal one we consider a different basis for each filtration step, the loose scaffold considers bases of the persistent homology space, drastically reducing the number of cycles considered. To make it clearer, in general set $B^*$ has cardinality much larger than the dimension of $PH_1$. It is therefore understandable that the distributions of edge weights do not generally agree.

Clustering coefficients, on the other hand, are a measure of how “triangular” a graph is around a given node. As remarked in Section 5, another consequence of assembling the scaffold from the minimal bases of the $H_1$’s is that a large number of artificial triangles appear around cycles. In this case too, therefore, the poor correlation is easily explained.

Random Graphs

Drawing inspiration from [52], we repeat the analysis on random graph samples. [52] divides random networks into two categories: those created from edge weighting schemes and those created from points in the Euclidean space. We have chosen to analyze the weighted Watts-Strogatz (WS) model as representative of the first class, and the geometric random model as representative of the second.

Figure 9

Correlations between the minimal and loose scaffold. (a) Comparison in the weighted Watts-Strogatz model. Degree sequence and betweenness centrality in the two scaffolds are compared, using Pearson and Spearman correlation coefficients. Each box is computed over a sample of 30 weighted Watts-Strogatz random graphs, with parameters as reported on the x-axis: the pair (N, k) indicates a WS model on N nodes, with k stubs to rewire. The rewiring probability is . . The cyan x’s and the green diamonds represent the average correlation value against the loose and minimal null models, respectively. (b) Comparison in the random geometric model. Again, Pearson and Spearman correlation coefficients of the degree sequence and betweenness centrality in the two scaffolds are compared. Each box is computed over a sample of 30 random geometric graphs, with parameters as reported on the x-axis: the pair (N, t) indicates a graph on N nodes sampled uniformly at random in the $[0,1]^2$ square. t is the connectivity distance threshold. The cyan x’s and the green diamonds represent the average correlation value against the loose and minimal null models, respectively. (c) Correlation tests for several network metrics show significant capabilities of the standard scaffold to reproduce certain statistical properties of the minimal one in C. Elegans. At the same time, due to different construction mechanisms, others are unreliable. (d) Scatterplot of the degree sequence of neurons of C. Elegans in the minimal scaffold versus in the loose one.

Table of Fig. 9(c):

| Metric | Spearman Corr | Spearman p-val | Pearson Corr | Pearson p-val |
|---|---|---|---|---|
| Node Degree | 0.953148 | 3.1842e-155 | 0.975559 | 3.4463e-196 |
| Node Strength | 0.772330 | 4.3712e-60 | 0.700653 | 3.7250e-45 |
| Betw. Centrality | 0.952098 | 7.7348e-154 | 0.986412 | 1.8813e-233 |
| Closeness Centrality | 0.921274 | 5.1143e-123 | 0.960413 | 8.7695e-166 |
| Eigenvector Centrality | 0.880711 | 9.5943e-98 | 0.858564 | 1.3911e-87 |
| Clustering Coefficients | 0.412889 | 1.1778e-13 | 0.358577 | 1.9337e-10 |
| Edge Weights | 0.226321 | 1.3586e-09 | 0.086226 | 0.0224 |
We remark that weighting needs to be introduced in order to compute persistence; while for geometric graphs this simply requires computing the Euclidean distance, for the Watts-Strogatz model it requires an ad-hoc procedure that is described in detail in the supplemental material of [52].

We briefly recall that a WS graph is parametrized by the number of nodes, by the number of stubs to rewire, and by the rewiring probability. A random geometric graph is instead parametrized by the number of points to sample (uniformly) in $[0,1]^d$, and by a cutoff value that acts as distance threshold, beyond which no edge is introduced.

In both cases, we observe good agreement on key statistics, as reported in Fig. 9(a) and (b). Each box is obtained by computing the correlation of the reported statistic on a sample of 30 random graphs of the reported model, with parameters as indicated on the x-axis.

For comparison, two null models are built for each instance of the minimal and loose scaffolds in the sample, by constructing an Erdős-Rényi random graph on the same vertex set, one with the same number of edges as the minimal scaffold, and one with the same number as the loose one. The correlation of each statistic is computed between the minimal scaffold and the loose null model and between the loose scaffold and the minimal null model. The average of these correlations is reported on the boxplots to act as a baseline value, highlighting that the two scaffolding procedures agree with each other by more than just statistical noise.
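The comparison pipeline described above can be sketched in plain Python: sample a random geometric graph, build an Erdős-Rényi null model with a prescribed number of edges on the same vertex set, and correlate degree sequences. This is a minimal sketch under stated assumptions and not the authors' code: the two edge sets compared in the usage example are hypothetical stand-ins for the minimal and loose scaffolds (which are not computed here), and all function names are illustrative.

```python
import itertools
import math
import random


def random_geometric_graph(n, threshold, dim=2, seed=None):
    """Edges (i, j) -> Euclidean length for points uniform in [0, 1]^dim,
    keeping only the pairs at distance <= threshold."""
    rng = random.Random(seed)
    pts = [tuple(rng.random() for _ in range(dim)) for _ in range(n)]
    return {(i, j): math.dist(pts[i], pts[j])
            for i, j in itertools.combinations(range(n), 2)
            if math.dist(pts[i], pts[j]) <= threshold}


def erdos_renyi_gnm(n, m, seed=None):
    """Null model: m edges drawn uniformly at random among all vertex pairs."""
    rng = random.Random(seed)
    return set(rng.sample(list(itertools.combinations(range(n), 2)), m))


def degree_sequence(n, edges):
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return deg


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else float("nan")


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x, y):
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    n = 60
    graph_a = random_geometric_graph(n, 0.30, seed=1)        # stand-in for one scaffold
    graph_b = {e for e, w in graph_a.items() if w <= 0.20}   # stand-in for the other
    null_b = erdos_renyi_gnm(n, len(graph_b), seed=2)        # same number of edges as graph_b
    da, db, dn = (degree_sequence(n, g) for g in (graph_a, graph_b, null_b))
    print("a vs b:      ", pearson(da, db), spearman(da, db))
    print("a vs null(b):", pearson(da, dn), spearman(da, dn))
```

The second printed line plays the role of the baseline: a correlation against a null model that only matches the edge count of the other graph.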

We provided a new method of network analysis and skeletonization, based on the computation of minimal homology bases. This new construction fills a significant gap in previous literature, in that it yields, in all but some pathological cases, a well-defined and unique subgraph, acting as a reasonable ground truth for comparison with the previous construction. It can be employed in a range of applications, both to identify crucial and weak links in a network, and to obtain compressed and topologically sound representations of the input. It also allows one to evaluate the reliability of other scaffolding procedures with respect to said ground truth: we have observed that, for some applications, the loose scaffold can be deemed a sufficiently accurate tool, without incurring as cumbersome a computational load.

We foresee that the subject of homological skeletonization is not yet concluded. Other approaches to finding canonical generators of homology are possible (for example in [54] and [75]), and we plan to investigate them further in subsequent works.

Availability of data and material

The C. Elegans dataset analysed during the current study is available and included in the GitHub repository MinScaffold, https://github.com/marcoguerra192/MinScaffold. The Human Connectome Project dataset is available from the page https://github.com/marcoguerra192/MinScaffold.

Competing interests

The authors declare that they have no competing interests.

Funding

MG, ADG, UF, and FV acknowledge the support from the Italian MIUR Award “Dipartimento di Eccellenza 2018-2022” - CUP: E11G18000350001 and the SmartData@PoliTO center for Big Data and Machine Learning. GP acknowledges partial support from Intesa Sanpaolo Innovation Center. The funder had no role in study design, data collection, and analysis, decision to publish, or preparation of the manuscript.

Author’s contributions

MG, ADG, UF, GP, and FV conceived and designed the study, performed the analysis and wrote the manuscript. All authors read and approved the final manuscript.

Acknowledgements

The authors acknowledge Iacopo Iacopini for kindly sharing a Python library for plotting simplicial complexes, available on GitHub (github.com/iaciac/py-draw-simplicial-complex). We further acknowledge the Python library Nilearn ([74]) for the brain image visualization code. We would also like to thank Paola Siri for useful discussions.

Author details

Department of Mathematical Sciences, Politecnico di Torino, Torino, Italy. ISI Foundation, Torino, Italy.

References

1. Newman, M.E.: The structure and function of complex networks. SIAM review (2), 167–256 (2003)2. Barrat, A., Barthelemy, M., Pastor-Satorras, R., Vespignani, A.: The architecture of complex weightednetworks. Proceedings of the national academy of sciences (11), 3747–3752 (2004)3. Granovetter, M.S.: The strength of weak ties. Social networks, pp. 347–367. Elsevier (1977) uerra et al. Page 23 of 25

4. Vega-Redondo, F.: Complex social networks. Cambridge University Press (2007)5. Pastor-Satorras, R., Castellano, C., Van Mieghem, P., Vespignani, A.: Epidemic processes in complex networks.Reviews of modern physics (3), 925 (2015)6. Colizza, V., Barrat, A., Barth´elemy, M., Vespignani, A.: The role of the airline transportation network in theprediction and predictability of global epidemics. Proceedings of the National Academy of Sciences (7),2015–2020 (2006)7. Girvan, M., Newman, M.E.: Community structure in social and biological networks. Proceedings of the nationalacademy of sciences (12), 7821–7826 (2002)8. Alon, U.: Biological networks: the tinkerer as an engineer. Science (5641), 1866–1867 (2003)9. Bassett, D.S., Sporns, O.: Network neuroscience. Nature neuroscience (3), 353 (2017)10. Bullmore, E., Sporns, O.: Complex brain networks: graph theoretical analysis of structural and functionalsystems. Nature reviews neuroscience (3), 186–198 (2009)11. Bassett, D.S., Bullmore, E.: Small-world brain networks. The neuroscientist (6), 512–523 (2006)12. Horak, D., Maleti´c, S., Rajkovi´c, M.: Persistent homology of complex networks. Journal of StatisticalMechanics: Theory and Experiment (03), 03034 (2009). doi:10.1088/1742-5468/2009/03/p0303413. Patania, A., Petri, G., Vaccarino, F.: The shape of collaborations. EPJ Data Science (1), 18 (2017).doi:10.1140/epjds/s13688-017-0114-814. Lee, H., Chung, M.K., Kang, H., Kim, B.-N., Lee, D.S.: Discriminative persistent homology of brain networks.In: 2011 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, pp. 841–844 (2011).IEEE15. Rieck, B., Fugacci, U., Lukasczyk, J., Leitte, H.: Clique community persistence: A topological visual analysisapproach for complex networks. IEEE Transactions on Visualization and Computer Graphics (1), 822–831(2018). doi:10.1109/TVCG.2017.274432116. Ghrist, R.: Elementary applied topology. Createspace (2014)17. 
Patania, A., Vaccarino, F., Petri, G.: Topological analysis of data. EPJ Data Science (1), 7 (2017)18. Hatcher, A.: Algebraic topology. Cambridge University Press (2002)19. Munkres, J.R.: Elements of algebraic topology. Perseus Books (1984)20. Frosini, P.: A distance for similarity classes of submanifolds of a euclidean space. Bulletin of the AustralianMathematical Society (3), 407–415 (1990)21. Delﬁnado, C.J.A., Edelsbrunner, H.: An incremental algorithm for betti numbers of simplicial complexes on the3-sphere. Computer Aided Geometric Design (7), 771–784 (1995). doi:10.1016/0167-8396(95)00016-Y. GridGeneration, Finite Elements, and Geometric Design22. Edelsbrunner, H., Letscher, D., Zomorodian, A.: Topological persistence and simpliﬁcation. Discrete &Computational Geometry (4), 511–533 (2002). doi:10.1007/s00454-002-2885-223. Zomorodian, A.J., Carlsson, G.: Computing persistent homology. Discrete & Computational Geometry (2),249–274 (2005)24. Cohen-Steiner, D., Edelsbrunner, H., Harer, J.: Stability of persistence diagrams. Discrete & ComputationalGeometry (1), 103–120 (2007). doi:10.1007/s00454-006-1276-525. Hiraoka, Y., Nakamura, T., Hirata, A., Escolar, E.G., Matsue, K., Nishiura, Y.: Hierarchical structures ofamorphous solids characterized by persistent homology. Proceedings of the National Academy of Sciences ofthe United States of America (26), 7035–7040 (2016)26. Lee, Y., Barthel, S.D., D(cid:32)lotko, P., Moosavi, S.M., Hess, K., Smit, B.: Quantifying similarity of pore-geometry innanoporous materials. Nature communications (1), 1–8 (2017)27. Chan, J.M., Carlsson, G., Rabadan, R.: Topology of viral evolution. Proceedings of the National Academy ofSciences (46), 18566–18571 (2013)28. Chung, M.K., Bubenik, P., Kim, P.T.: Persistence diagrams of cortical surface data. In: Information Processingin Medical Imaging, pp. 386–397 (2009). Springer29. 
Dequeant, M.-L., Ahnert, S., Edelsbrunner, H., Fink, T., Glynn, E., Hattem, G., Kudlicki, A., Mileyko, Y.,Morton, J., Mushegian, A., et al. : Comparison of pattern detection methods in microarray time series of thesegmentation clock. PLoS One (8), 2856 (2008)30. Wang, Y., Agarwal, P.K., Brown, P., H, E., Rudolph, J.: Coarse and reliable geometric alignment for proteindocking. In: In Proceedings of Paciﬁc Symposium on Biocomputing, vol. 10, pp. 65–75 (2005)31. Martin, S., Thompson, A., Coutsias, E.A., Watson, J.-P.: Topology of cyclo-octane energy landscape. Journalof Chemical Physics (23), 234115 (2010)32. Phinyomark, A., Khushaba, R.N., Ib´a˜nez-Marcelo, E., Patania, A., Scheme, E., Petri, G.: Navigating features: atopologically informed chart of electromyographic features space. Journal of The Royal Society Interface (137), 20170734 (2017)33. De Silva, V., Ghrist, R.: Coverage in sensor networks via persistent homology. Algebraic & Geometric Topology (1), 339–358 (2007)34. van de Weygaert, R., Vegter, G., Edelsbrunner, H., Jones, B.J.T., Pranav, P., Park, C., Hellwing, W.A.,Eldering, B., Kruithof, N., Bos, E.G.P.P., et al.: Alpha, Betti and the Megaparsec Universe: On the Topology ofthe Cosmic Web, pp. 60–101. Springer, Berlin, Heidelberg (2011)35. Patania, A., Selvaggi, P., Veronese, M., Dipasquale, O., Expert, P., Petri, G.: Topological gene expressionnetworks recapitulate brain anatomy and function. Network Neuroscience (3), 744–762 (2019)36. Li, L., Cheng, W.-Y., Glicksberg, B.S., Gottesman, O., Tamler, R., Chen, R., Bottinger, E.P., Dudley, J.T.:Identiﬁcation of type 2 diabetes subgroups through topological analysis of patient similarity. ScienceTranslational Medicine (311), 311–174 (2015)37. Giusti, C., Pastalkova, E., Curto, C., Itskov, V.: Clique topology reveals intrinsic geometric structure in neuralcorrelations. Proceedings of the National Academy of Sciences of the United States of America (44),13455–13460 (2015)38. 
Wang, Y., Ombao, H., Chung, M.K.: Topological data analysis of single-trial electroencephalographic signals.Annals of Applied Statistics (3), 1506–1534 (2017) uerra et al. Page 24 of 25

39. Yoo, J., Kim, E.Y., Ahn, Y.M., Ye, J.C.: Topological persistence vineyard for dynamic functional brain connectivity during resting and gaming stages. Journal of Neuroscience Methods (15), 1–13 (2016)

40. Petri, G., Expert, P., Turkheimer, F., Carhart-Harris, R., Nutt, D., Hellyer, P.J., Vaccarino, F.: Homological scaffolds of brain functional networks. Journal of The Royal Society Interface (101), 20140873 (2014). doi:10.1098/rsif.2014.0873

41. Ibáñez-Marcelo, E., Campioni, L., Phinyomark, A., Petri, G., Santarcangelo, E.L.: Topology highlights mesoscopic functional equivalence between imagery and perception: The case of hypnotizability. NeuroImage, 437–449 (2019)

42. Lord, L.-D., Expert, P., Fernandes, H.M., Petri, G., Van Hartevelt, T.J., Vaccarino, F., Deco, G., Turkheimer, F., Kringelbach, M.L.: Insights into brain architectures from the homological scaffolds of functional connectivity networks. Frontiers in systems neuroscience, 85 (2016)

43. Ibáñez-Marcelo, E., Campioni, L., Manzoni, D., Santarcangelo, E.L., Petri, G.: Spectral and topological analyses of the cortical representation of the head position: Does hypnotizability matter? Brain and behavior (6), 01277 (2019)

44. Guo, W., Banerjee, A.G.: Toward automated prediction of manufacturing productivity based on feature selection using topological data analysis. In: IEEE International Symposium on Assembly and Manufacturing, pp. 31–36 (2016)

45. Phinyomark, A., Petri, G., Ibáñez-Marcelo, E., Osis, S.T., Ferber, R.: Analysis of big data in gait biomechanics: Current trends and future directions. Journal of medical and biological engineering (2), 244–260 (2018)

46. Campbell, E., Phinyomark, A., Al-Timemy, A.H., Khushaba, R.N., Petri, G., Scheme, E.: Differences in emg feature space between able-bodied and amputee subjects for myoelectric control. In: 2019 9th International IEEE/EMBS Conference on Neural Engineering (NER), pp. 33–36 (2019). IEEE

47. Patania, A., Petri, G., Vaccarino, F.: The shape of collaborations. EPJ Data Science (1), 18 (2017)

48. Benson, A.R., Abebe, R., Schaub, M.T., Jadbabaie, A., Kleinberg, J.: Simplicial closure and higher-order link prediction. Proceedings of the National Academy of Sciences (48), 11221–11230 (2018)

49. Petri, G., Scolamiero, M., Donato, I., Vaccarino, F.: Topological strata of weighted complex networks. PloS one (6) (2013)

50. Patania, A., Vaccarino, F., Petri, G.: Topological analysis of data. EPJ Data Science (1), 7 (2017)

51. Donato, I., Gori, M., Pettini, M., Petri, G., De Nigris, S., Franzosi, R., Vaccarino, F.: Persistent homology analysis of phase transitions. Physical Review E (5), 052138 (2016)

52. Sizemore, A., Giusti, C., Bassett, D.S.: Classification of weighted networks through mesoscale homological features. Journal of Complex Networks (2), 245–273 (2017)

53. Edelsbrunner, H., Harer, J.: Computational topology: An introduction. American Mathematical Society (2010)

54. Kurlin, V.: A one-dimensional homologically persistent skeleton of an unstructured point cloud in any metric space. Computer Graphics Forum (5), 253–262 (2015). doi:10.1111/cgf.12713

55. Kalisnik, S., Kurlin, V., Lesnik, D.: A higher-dimensional homologically persistent skeleton. Advances in Applied Mathematics, 113–142 (2019)

56. Ge, X., Safa, I.I., Belkin, M., Wang, Y.: Data skeletonization via Reeb graphs. Advances in Neural Information Processing Systems 24, 837–845 (2011)

57. Chazal, F., Huang, R., Sun, J.: Gromov–hausdorff approximation of filamentary structures using reeb-type graphs. Discrete & Computational Geometry (3), 621–649 (2015)

58. Sizemore, A.E., Giusti, C., Kahn, A., Vettel, J.M., Betzel, R.F., Bassett, D.S.: Cliques and cavities in the human connectome. Journal of computational neuroscience (1), 115–145 (2018)

59. Obayashi, I.: Volume-optimal cycle: Tightest representative cycle of a generator in persistent homology. SIAM Journal on Applied Algebra and Geometry (4), 508–534 (2018)

60. Dey, T., Sun, J., Wang, Y.: Approximating loops in a shortest homology basis from point data. Proceedings of the Annual Symposium on Computational Geometry (2009). doi:10.1145/1810959.1810989

61. Dey, T.K., Li, T., Wang, Y.: Efficient algorithms for computing a minimal homology basis. In: Latin American Symposium on Theoretical Informatics, pp. 376–398 (2018). Springer

62. Baronchelli, A., Ferrer-i-Cancho, R., Pastor-Satorras, R., Chater, N., Christiansen, M.H.: Networks in cognitive science. Trends in cognitive sciences (7), 348–360 (2013)

63. Lum, P.Y., Singh, G., Lehman, A., Ishkanov, T., Vejdemo-Johansson, M., Alagappan, M., Carlsson, J., Carlsson, G.: Extracting insights from the shape of complex data using topology. Scientific reports, 1236 (2013)

64. Tausz, A., Vejdemo-Johansson, M., Adams, H.: JavaPlex: A research software package for persistent (co)homology. In: Hong, H., Yap, C. (eds.) Proceedings of ICMS 2014. Lecture Notes in Computer Science 8592, pp. 129–136 (2014)

65. Chen, C., Freedman, D.: Hardness results for homology localization. Discrete & Computational Geometry (3), 425–448 (2011). doi:10.1007/s00454-010-9322-8

66. Horton, J.: A polynomial-time algorithm to find the shortest cycle basis of a graph. SIAM Journal on Computing (2), 358–366 (1987). doi:10.1137/0216026

67. de Pina, J.C.: Applications of shortest path methods (1995)

68. Kavitha, T., Mehlhorn, K., Michail, D., Paluch, K.: A faster algorithm for minimum cycle basis of graphs. In: Díaz, J., Karhumäki, J., Lepistö, A., Sannella, D. (eds.) Automata, Languages and Programming, pp. 846–857. Springer, Berlin, Heidelberg (2004)

69. Busaryev, O., Cabello, S., Chen, C., Dey, T.K., Wang, Y.: Annotating simplices with a homology basis and its applications. In: Scandinavian Workshop on Algorithm Theory, pp. 189–200 (2012). Springer

70. Coppersmith, D., Winograd, S.: Matrix multiplication via arithmetic progressions. Journal of Symbolic Computation (3), 251–280 (1990). doi:10.1016/S0747-7171(08)80013-2. Computational algebraic complexity editorial

71. Le Gall, F.: Powers of tensors and fast matrix multiplication. In: Proceedings of the 39th International Symposium on Symbolic and Algebraic Computation. ISSAC '14, pp. 296–303. ACM, New York, NY, USA (2014). doi:10.1145/2608628.2608664. http://doi.acm.org/10.1145/2608628.2608664

72. Guerra, M., De Gregorio, A.: Minimal Scaffold repository MinScaffold (2019). https://github.com/marcoguerra192/MinScaffold

73. Termenon, M., Jaillard, A., Delon-Martin, C., Achard, S.: Reliability of graph analysis of resting state fmri using test-retest dataset from the human connectome project. Neuroimage (15), 172–187 (2016). doi:10.1016/j.neuroimage.2016.05.062

74. Abraham, A., Pedregosa, F., Eickenberg, M., Gervais, P., Mueller, A., Kossaifi, J., Gramfort, A., Thirion, B., Varoquaux, G.: Machine learning for neuroimaging with scikit-learn. Frontiers in Neuroinformatics 8