Homological Scaffold via Minimal Homology Bases
Marco Guerra, Alessandro De Gregorio, Ulderico Fugacci, Giovanni Petri, Francesco Vaccarino
Guerra et al.
RESEARCH
Marco Guerra, Alessandro De Gregorio, Ulderico Fugacci, Giovanni Petri and Francesco Vaccarino*
*Correspondence: [email protected]
Department of Mathematical Sciences, Politecnico di Torino, Torino, Italy
Full list of author information is available at the end of the article
Abstract
The homological scaffold leverages persistent homology to construct a topologically sound summary of a weighted network. However, its crucial dependency on the choice of representative cycles hinders the ability to trace back global features onto individual network components, unless one provides a principled way to make such a choice. In this paper, we apply recent advances in the computation of minimal homology bases to introduce a quasi-canonical version of the scaffold, called minimal, and employ it to analyze both real and in silico data. At the same time, we verify that, statistically, the standard scaffold is a good proxy of the minimal one for sufficiently complex networks.
Keywords:
Persistent Homology; Topological Data Analysis; Network Skeletonization
AMS Subject Classification:
Network science has long been the cornerstone theory for dealing with complex, heterogeneous multi-agent systems. Network descriptions have found wide application and have had a significant impact on a wide range of fields ([1, 2]), including social networks ([3, 4]), epidemiology ([5, 6]), biology ([7, 8]), and neuroscience ([9, 10, 11]).

In recent years, new approaches to the analysis of networks and, more generally, of complex interacting systems have emerged that leverage topological techniques ([12, 13, 14, 15]). These techniques are generally referred to as Topological Data Analysis (TDA) ([16, 17]). TDA is a relatively modern subject, based on classical Algebraic Topology ([18, 19]) and sparked by a handful of seminal works in the late 1990s ([20, 21, 22, 23, 24]). TDA typically endows a large variety of datasets with a notion of shape (more properly, with a topological structure) and, based on that, studies the data in terms of their topological features. The field is undergoing a rapid expansion, thanks both to its roots in the powerful languages of homological algebra and category theory, which provide strong formal foundations, and to the wide variety of applications it has found, spanning materials science ([25, 26]), biology and chemistry ([27, 28, 29, 30, 31, 32]), sensor networks ([33]), cosmology ([34]), medicine and neuroscience ([35, 36, 37, 38, 39, 40, 41, 42, 43]), manufacturing and engineering ([44, 45, 46]), social sciences ([47, 48]), and network science itself ([49, 50, 51, 52]).

The most central tool in TDA is undoubtedly Persistent Homology ([16, 53]). The theory of (or around) persistence has recently been proposed as a framework for the topological skeletonization of spaces, particularly weighted graphs and networks ([54, 55, 56, 57]). In [40], the generators of persistent homology are used to build one instance of network skeletonization, called the homological scaffold.
However, the method has a serious drawback: the large degree of arbitrariness in choosing one representative cycle among the many equivalent generating cycles of the same homology class. This is unfortunately a direct consequence of homology classes being equivalence classes, and it affects all attempts to localize cycles ([58, 42]). In this work, we address this issue by seeking a form of canonicity in the choice of generators, namely by computing minimal representatives of homology bases. Minimal homology bases have long been investigated ([59, 60]), with a breakthrough coming only with the introduction of a first efficient algorithm for computing such bases in dimension one ([61]). Here, we leverage these minimal bases to propose a new approach to network skeletonization, the minimal scaffold, which overcomes the limitation of the original one. While the minimal scaffold is not unique in the most general case, we provide strong guarantees and caveats on when and to what degree it is well defined. We then show a few applications of the novel method, concluding the paper with a comparison between our construction and the previous one.
Outline of the Paper
The paper is organized as follows. Section 2 provides a brief overview of the main concepts in Topological Data Analysis. Section 3 describes the original approach to network skeletonization by means of persistent homology and highlights the deficiencies we wish to address. Section 4 addresses the computation of minimal representatives of a homology basis. Section 5 introduces the main concept of this work, the minimal scaffold. Section 6 discusses the issue of uniqueness, stating some results that lead to a more refined version of the minimal scaffold. Section 7 showcases some applications of the minimal scaffold. In light of its computational complexity, Section 8 carries out a statistical comparison between the minimal and original scaffolds, providing some heuristic guarantees and caveats. Section 9 concludes the discussion.
Glossary
List of symbols and their common usage throughout the paper.

Symbol | Meaning
$C$ | A point cloud in $\mathbb{R}^d$
$K$ | A simplicial complex
$F$ | A filtration of simplicial complexes $(K_\varepsilon)_{\varepsilon=1,\dots,M}$
$W$ | A non-negatively weighted finite graph
$V$ | The set of vertices of a graph
$E$ | The set of edges of a graph
$\mathrm{VR}(W)$ | The Vietoris-Rips complex of graph $W$
$C_k(K)$ | The vector space over $\mathbb{Z}_2$ of chains of $k$-simplices of the complex $K$
$\partial_k$ | The boundary operator between $C_k(K)$ and $C_{k-1}(K)$
$H_1(K)$ | The 1st homology group of complex $K$
$\beta_1(K)$ | The dimension of $H_1(K)$
$PH_1(F)$ | The 1-dimensional persistent homology of filtration $F$
$\mu$ | A function assigning non-negative weights to edges and cycles
$B$ | A minimal homology cycle basis
$\tilde{B}$ | A minimal homology cycle basis with draws
$B^*$ | The disjoint union of minimal cycle bases across a filtration
$\tilde{B}^*$ | The disjoint union of minimal cycle bases with draws across a filtration
$V_i$ | A set of homologous, equally minimal variants of a basis cycle
$\mathcal{H}(W)$ | The homological scaffold of weighted graph $W$
$\mathcal{H}_{\min}(W)$ | The minimal homological scaffold of weighted graph $W$
$\tilde{\mathcal{H}}_{\min}(W)$ | The minimal homological scaffold with draws of weighted graph $W$

In this section we introduce the minimum amount of mathematics necessary for understanding the rest of the paper. We refer to classical textbooks on the subject for further reading ([18, 19, 53, 16]).
Simplicial complexes
Thanks to their proven flexibility in a plethora of applied contexts, simplicial complexes are the most widely adopted mathematical structure for encoding unorganized, large-size and high-dimensional data. In purely combinatorial terms, a (finite) simplicial complex $K$ on a finite set $V$ is a collection of non-empty subsets of $V$, called simplices, that is closed under inclusion, i.e., every non-empty subset of a simplex of $K$ is itself a simplex of $K$. Given a simplicial complex $K$, the elements of $V$ are called vertices of $K$, and a simplex $\sigma \in K$ is called a $k$-simplex (equivalently, a simplex of dimension $k$) if it consists of $k+1$ vertices. The dimension of a simplicial complex $K$ is the largest dimension of the simplices in $K$.

Even though the abstract definition just given captures a variety of datasets not necessarily endowed with a geometric realization, it is worth mentioning that, intuitively, a simplicial complex is nothing but a collection of well-glued bricks, its simplices. From this perspective, a $k$-simplex can be seen as the convex hull of $k+1$ geometrically independent points. For instance, a 1-simplex is an edge, a 2-simplex is a triangle, a 3-simplex is a tetrahedron, and so on.
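As a minimal illustration of the combinatorial definition above, the sketch below stores a complex as a family of vertex tuples. It checks closure under inclusion and computes the dimension. The representation and the function names are our own illustrative choices, not taken from the paper's code.

```python
from itertools import combinations


def is_simplicial_complex(simplices):
    """True iff every non-empty proper face of every simplex is also in the family."""
    family = {frozenset(s) for s in simplices}
    for s in family:
        for k in range(1, len(s)):  # faces with 1 .. len(s) - 1 vertices
            for face in combinations(s, k):
                if frozenset(face) not in family:
                    return False
    return True


def dimension(simplices):
    """A k-simplex has k + 1 vertices; the complex has the largest simplex dimension."""
    return max(len(s) for s in simplices) - 1


# A filled triangle on vertices a, b, c, listed together with all of its faces.
triangle = [("a",), ("b",), ("c",), ("a", "b"), ("a", "c"), ("b", "c"), ("a", "b", "c")]
print(is_simplicial_complex(triangle))      # True
print(dimension(triangle))                  # 2
print(is_simplicial_complex([("a", "b")]))  # False: the vertices a and b are missing
```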
Homology
Homology is a topological tool that provides invariants for shape description and characterization. Given a simplicial complex $K$, one can associate to it a collection of vector spaces $C_k(K)$ over a field, in our case $\mathbb{Z}_2$, whose bases are indexed by the $k$-simplices; loosely speaking, these spaces are generated by the $k$-simplices of $K$. They are connected by boundary operators $\partial_k : C_k(K) \to C_{k-1}(K)$, mapping each $k$-simplex $\sigma$ to the sum of the $(k-1)$-simplices of $K$ strictly contained in $\sigma$. We denote by $Z_k(K) := \ker \partial_k$ the space of the $k$-cycles of $K$ and by $B_k(K) := \operatorname{Im} \partial_{k+1}$ the space of the $k$-boundaries of $K$. Then, since $\partial_k \partial_{k+1} = 0$, the quotient
$$H_k(K) := Z_k(K) / B_k(K)$$
defines a vector space called the $k$th homology group of $K$. We call two $k$-cycles homologous if they belong to the same homology class.

Roughly speaking, homology reveals the presence of "holes" in a shape. A non-null element of $H_k(K)$ is an equivalence class of cycles that are not the boundary of any collection of $(k+1)$-simplices of $K$. Such classes represent, in dimension 0, the connected components of $K$; in dimension 1, its tunnels and loops; in dimension 2, the shells surrounding voids or cavities; and so on.

Persistent homology
An intrinsic limitation of homology is the need to work with a single simplicial complex representing the dataset under investigation. However, in real-world applications, noise and measurement errors make the choice and construction of a single, fixed representation very hard in practice.
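For a single complex, the homology defined above reduces to linear algebra over $\mathbb{Z}_2$:
$$\beta_1(K) = \dim Z_1(K) - \dim B_1(K) = (|E| - \operatorname{rank}\partial_1) - \operatorname{rank}\partial_2.$$
The sketch below applies this formula to a small complex. It assumes that simplices are given as sorted tuples of integer vertices and encodes $\mathbb{Z}_2$ vectors as Python integer bitmasks. The helper names are hypothetical.

```python
def gf2_rank(rows):
    """Rank over Z_2 of row vectors encoded as int bitmasks (Gaussian elimination)."""
    pivots = {}  # leading-bit position -> stored row with that leading bit
    rank = 0
    for row in rows:
        while row:
            lead = row.bit_length() - 1
            if lead not in pivots:
                pivots[lead] = row
                rank += 1
                break
            row ^= pivots[lead]  # Z_2 subtraction clears the shared leading bit
    return rank


def boundary_rows(simplices, faces):
    """One bitmask over `faces` per simplex: the (k-1)-faces obtained by dropping a vertex."""
    index = {f: i for i, f in enumerate(faces)}
    rows = []
    for s in simplices:
        mask = 0
        for i in range(len(s)):
            mask |= 1 << index[s[:i] + s[i + 1:]]
        rows.append(mask)
    return rows


def betti_1(vertices, edges, triangles):
    """beta_1 = dim Z_1 - dim B_1 = (|E| - rank d_1) - rank d_2."""
    rank_d1 = gf2_rank(boundary_rows(edges, [(v,) for v in vertices]))
    rank_d2 = gf2_rank(boundary_rows(triangles, edges))
    return len(edges) - rank_d1 - rank_d2


square = [(0, 1), (1, 2), (2, 3), (0, 3)]
print(betti_1(range(4), square, []))           # 1: one loop
filled = [(0, 1), (0, 2), (1, 2)]
print(betti_1(range(3), filled, [(0, 1, 2)]))  # 0: the triangle fills the hole
```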
Persistent homology ([22, 53]), currently one of the main tools in Topological Data Analysis, aims at solving this issue through a multi-scale study of a dataset and of its homological features, by associating to it a sequence of simplicial complexes. The concept of filtration captures exactly the idea of analyzing a dataset at different thresholds of a parameter on which it depends. More formally, given a simplicial complex $K$, a filtration $F$ of $K$ is a sequence of its subcomplexes such that
$$\emptyset \subseteq K_1 \subseteq \dots \subseteq K_M = K.$$

Given a filtration of a simplicial complex $K$, persistent homology keeps track of the evolution of the non-null, non-homologous cycles of $K$ and, by associating a lifespan to each of them, is able to discriminate relevant information from noise. Formally, for $p, q = 1, \dots, M$ with $p < q$, the $k$-dimensional persistent homology group $H^{p,q}_k(F)$ of $F$ on $(p, q)$ consists of the image of the linear map between $H_k(K_p)$ and $H_k(K_q)$ induced by the inclusion of $K_p$ into $K_q$. More intuitively, the elements of $H^{p,q}_k(F)$ represent the cycles of $K$ that survive from step $p$ to step $q$.

Given a filtration of finite simplicial complexes $F$, we define its $k$-dimensional persistent homology classes as the homology classes of $\bigoplus_\varepsilon H_k(K_\varepsilon)$ modulo the maps induced by the inclusion of simplicial complexes. More precisely, $h \in H_k(K_p)$ and $h' \in H_k(K_q)$ with $p \le q$ are equivalent if and only if $\iota^{p,q}_{k*}(h) = h'$, where $\iota^{p,q}_{k*}$ denotes the linear map between $H_k(K_p)$ and $H_k(K_q)$ induced by the inclusion of $K_p$ into $K_q$.
We call $k$-dimensional persistent homology $PH_k(F)$ the space spanned by the $k$-dimensional persistent homology classes.

As proven in [23], a basis of $PH_k(F)$ is in bijective correspondence with a finite set of intervals of the form $\{(p, q) : p < q,\ p, q \in \mathbb{Z} \cup \{\infty\}\}$, referred to as persistence pairs. We define a set of $k$-dimensional generator cycles of the persistent homology as a set of $k$-cycles of $K_M$ whose persistent homology classes form a basis of $PH_k(F)$.

The information about the "life" of each homology class can be collected in a visual, informative representation of the topological structure of the input, the persistence barcode: a plot consisting of a bar for each homological feature appearing throughout the filtration, stretching from its birth to its death value. An equivalent way to depict the same information is the persistence diagram: the multi-set (i.e., multiple instances of the same element are allowed) of points in $\mathbb{R}^2$ consisting of all the (birth, death) pairs, i.e., pairs of values $p < q$ such that a $k$-dimensional homology class arises at filtration step $p$ and becomes zero at step $q$. Persistent homology owes its popularity as a descriptor to the immediacy and power of these visual representations of homological information but, even more, to the fact that the retrieved features are provably stable. In fact, by defining a notion of distance between persistence diagrams or barcodes, it can be shown that similar datasets necessarily have similar homological features ([24]).

Building (filtered) complexes
In many applications, one does not deal directly with a simplicial complex, but instead has access to data in the form of point clouds in a metric space or of weighted graphs. For example, data may be obtained as a sample of some (unknown) ground truth, i.e., an undisclosed manifold of dimension usually much lower than that of the space it is embedded in ([16]). Another typical field of application is network science ([49, 52]): in this setting, the input is a weighted graph. Notice that in this case the graph need not be embeddable in some metric space, i.e., the edge weighting need not satisfy the triangle inequality. Networks are not necessarily representations of geometric entities, and yet the topological approach extends naturally to this context.

In both cases, one needs to build a suitable simplicial complex on top of the given structure. The subject has been addressed extensively (see, for example, [53]); here, we simply review the most typical scheme, called the Vietoris-Rips complex. Given a graph $G = (V, E)$, its flag, or clique, complex is the simplicial complex $\mathrm{Flag}(G)$ whose simplices coincide with the cliques of $G$.

Given a point cloud $V \subset \mathbb{R}^n$ and a fixed value $\varepsilon > 0$, one can build a graph $G_\varepsilon$ with a vertex for every point in $V$ and an edge between two vertices whenever the distance between the corresponding points is less than or equal to $\varepsilon$. Analogously, given a weighted graph $G = (V, E)$, one can build a subgraph $G_\varepsilon$ on the same vertex set, with only those edges whose weight is less than or equal to $\varepsilon$. In either case, one can define the Vietoris-Rips complex
$\mathrm{VR}_\varepsilon$ of parameter $\varepsilon$ as the flag complex $\mathrm{Flag}(G_\varepsilon)$ of the graph $G_\varepsilon$. Furthermore, since the Vietoris-Rips complexes $\mathrm{VR}_\varepsilon$ form an increasing sequence of simplicial complexes as $\varepsilon$ varies, the family $(\mathrm{VR}_\varepsilon)_\varepsilon$ gives rise to a filtration, called the filtered Vietoris-Rips complex (see Fig. 1).

Figure 1: (a) An example of a Vietoris-Rips filtration of simplicial complexes with parameter $\varepsilon$, and the corresponding barcode for 0- and 1-dimensional persistent homology. (b) The persistence pairs of the above filtration: $H_0$: (0, 0.18), (0, 0.31), (0, 0.33), (0, 0.34), (0, 0.35), (0, 0.35), (0, ∞); $H_1$: (0.39, 0.51). (c) Two equivalent representatives of the (only) generator of $PH_1$.

The homological scaffold originated from the intuition that traditional, graph-theoretical tools in network analysis were naturally able to capture significant properties ([62]), but proved less effective in detecting multi-agent and large-scale interactions. Interest in alternative descriptors of network relations arose, and works soon appeared that leveraged invariants offered by computational topology ([63, 14, 13]).

In proposing the scaffold ([40]), the authors pointed out that homology might be able to summarize well network mesoscale structures, i.e., features living between the purely local connections and the global statistics, to which previous methodologies were blind. Furthermore, this structure could be analyzed over the continuous, full range of interaction intensities, without the need for ad-hoc domain-specific
thresholds.

Homological cycles intuitively describe obstruction patterns. The presence of non-trivial homology within a given region of a network highlights its structure as non-contractible, forcing signals to flow over constrained channels, which in turn play the role of bridges.

To test the method, the homological scaffold was computed from resting-state fMRI data of 15 healthy volunteers who were infused with either placebo or psilocybin: the scaffold discriminated the two groups, and provided meaningful insight into the impact of the psychoactive substance on the pattern of information flow in the brain [40].

Given a non-negatively weighted finite graph $W = (V, E, w : E \to \mathbb{R}^+)$, let $F$ be a filtration of simplicial complexes as above.

Let $\{b_i\}$ be a set of 1-dimensional generator cycles of the persistent homology. Since we work over $\mathbb{Z}_2$, each $b_i$ is completely identified by its support, which is a set of edges of $E$. In particular, we can represent the set $\{b_i\}$ as a matrix whose rows are indexed by $E$ and whose columns are the $b_i$'s. The row sums, taken as natural numbers, form a new weighting function on the edges of $W$: the new weight of an edge counts precisely how many persistent cycles it appears in along the filtration.

Definition 3.1
Suppose $W$ and $F$ as above, and consider a set $\{b_i\}$ of 1-dimensional generator cycles of the persistent homology. Consider the function $h_W : E \to \mathbb{R}^+$
$$h_W := \sum_i \mathbb{1}_{e \in b_i} \qquad (1)$$
where by $\mathbb{1}_{e \in b_i}$ we denote the indicator function $E \to \mathbb{R}^+$ such that $\mathbb{1}_{e \in b_i}(e') = 1$ if $e'$ appears in $b_i$, and 0 otherwise.
Then the homological scaffold of $W$ is the weighted graph $\mathcal{H}(W)$ such that
- its vertex set coincides with the vertex set of $W$;
- its edge set is the subset of the edge set of $W$ consisting of the edges with nonzero value of $h_W$;
- its weight function is the restriction of $h_W$ to this edge set.

In accordance with the above definition, building the homological scaffold of a weighted network $W$ is a method of network compression, or skeletonization. The definition also implies that the weight of an edge is the number of basis cycles it belongs to.

In the example of Fig. 2(a), a filtration of simplicial complexes arising from a point cloud is depicted, together with the generators of the persistent homology group, each at the scale at which it is born. In Fig. 2(b), the corresponding homological scaffold is represented: one can see that the scaffolding procedure amounts to stacking generators of $PH_1$, i.e., cycles in the network, each contributing unit weight.

Figure 2: (a) A point cloud in $[0, 1]^2$ and the generators of $PH_1$, plotted at the filtration step they appear at (scale reported on the axis below). (b) The resulting homological scaffold. Edges in blue have weight 1, each belonging to only one generator. The edge in green has weight 2, as it belongs to two generators.

In the following, we shall sometimes refer to the homological scaffold as the loose,
or original scaffold, to contrast it with the new definition of scaffold that follows. As anticipated in the introduction, there is a substantial source of arbitrariness in this definition.

Several different sets of representative cycles form a basis of the persistent homology (since several different cycles belong to the same homology class), and hence one must make a choice. For example, Fig. 3(a) depicts one specific cycle whose homology class generates (part of) the persistent homology group of the point cloud. At the same time, any other choice of edges forming a cycle around the hole is homologically equivalent and, in principle, legitimate.

In the original paper, the authors used the cycles output by the JavaPlex implementation ([64]) of the persistent homology algorithm (based on the original implementation of [21]), and checked the selected cycles for consistency a posteriori. However, in principle, this means that the same simplicial complex, written with two different orderings of its simplices, could lead to different choices of generators and, therefore, to different scaffolds.

As such, we must be careful with the nodes and edges output by the algorithm: while the presence of a generator undeniably denotes that an obstruction pattern exists, we cannot be as confident about its precise location in the network, or about the constituents that provide bridges around it. The homological scaffold defined in this way introduces noise in the localization of mesoscale patterns onto individual nodes and edges, a process which, if accurate, could provide valuable insight as to
the functional role of single players in a network.

Figure 3: A simplicial complex $K$ with $\dim H_1(K) = 1$. Its homological scaffold (on a subset of the filtration steps, for clarity) is reported in panel (a): the chosen generator meanders around the hole. Furthermore, a different ordering of the list of simplices fed to the algorithm could return a different cycle. In panel (b), the shortest representative cycle is chosen: this choice is stable with respect to any ordering of the input, while at the same time endowing the generator with some metric and geometric meaning.

In this work, we try to work around the problem of cycle choice and give a stricter definition, by requiring that, among all possible representatives, those of minimal total length are chosen (e.g., Fig. 3(b)).

The original algorithm reported a computational complexity of the order O ( n ) to obtain representatives of basis cycles.

The search for minimality in the computation of the scaffold was made feasible by the introduction of efficient algorithms to compute the minimal representatives of a homology basis in dimension one.

In dimensions higher than one, minimal representatives of a homology basis are known to remain elusive. Indeed, Chen and Freedman ([65]) proved that the problem of obtaining these minimal representatives is computationally intractable, being at least as hard as the notoriously NP-hard Nearest Codeword Problem. Furthermore, it is NP-hard even to approximate within any constant factor, meaning that, unless P = NP, no polynomial-time algorithm can obtain an approximate minimal basis that differs from the exact one by at most a multiplicative constant. In light of this, we must necessarily restrict our attention to the 1-dimensional case, i.e., to computing minimal representatives of a basis of $H_1$.

Given a simplicial complex $K$, let us consider $C_1$, the vector space generated by the 1-simplices of $K$, and $Z_1$, the vector space of 1-cycles, i.e., $Z_1 = \ker \partial_1$. Given a 1-cycle
$b \in Z_1$, let $\mu(b)$ be its length, i.e., the sum of the weights of the 1-simplices that form it, and denote by $[b]$ the homology class $b$ belongs to. Finally, let $\beta_1 := \dim H_1(K)$. We want to obtain a set of $\beta_1$ cycles in $Z_1$
$$\{b_1, \dots, b_{\beta_1}\} = \operatorname*{argmin}_{\operatorname{Span}\{[b_i]\} = H_1(K)} \sum_i \mu(b_i) \qquad (2)$$
that is, a set of cycles of minimal total length whose homology classes span $H_1(K)$. In accordance with the literature, we call this set a minimal homology basis, with a slight abuse of terminology, as it would be more appropriate to call it a minimally-represented homology basis.

In 2018, Dey et al. ([61]) introduced a polynomial-time algorithm to obtain these representatives. Building on the work of Horton ([66]), de Pina ([67]), and Mehlhorn et al. ([68]), the algorithm first computes a basis of the space of cycles. Then, it applies a cohomological technique called simplex annotation ([69]) to lift a basis of cycles to a basis of the homology group $H_1$, while at the same time enforcing the minimal length constraint. A sketch of the algorithm follows.

Algorithm: MinBasis($K$)
• A basis of the cycle group $Z_1$ is found via a spanning tree. Each edge in the complement of the spanning tree identifies a candidate cycle ([66]).
• An annotation of the edges is computed via matrix reduction ([69]). This yields the dimension $\beta_1$ of $H_1$, as well as an efficient tool to determine whether two cycles $b_1$ and $b_2$ are linearly dependent in $H_1$ ($[b_1] = [b_2]$).
• A set of support vectors is generated that maintains a basis of the orthogonal complement in $H_1$ of the minimal basis cycles.
• Iteratively, for each dimension of $H_1$, the candidate set of cycles is parsed in search of cycles $b$ that are linearly independent in homology from the previous ones (exploiting the support vectors). Among these, the $\mu$-shortest one is added to the minimal basis.
• The set of support vectors is updated for the remaining dimensions, so that it remains a basis of the orthogonal complement of the basis.
• The last two steps above are repeated until the minimal basis is complete.

Call $B = \{b_i\}$ the output of MinBasis on input $K$.

Theorem (3.1, [61]). The cycles in $B$ form a minimal homology basis of $H_1(K)$.

Notice that a minimal homology basis is guaranteed to exist, since we only work with finite simplicial complexes, which admit only a finite number of bases. However, it need not, in general, be unique. Several different cycles of the same minimal length may belong to the homology class of a basis cycle. Heuristically, this is especially true when the input complex is unweighted (equivalently, has equal weights on every edge), in which case the length of a cycle is the number of edges that form it. Furthermore, there are cases in which different sets of cycles of minimal length generate the same homology space without even being pairwise homologous. We treat the problem of the uniqueness of the minimal basis in more detail in the following, and account for it explicitly in the construction of the minimal scaffold.

The computational complexity of the above procedure is evaluated in [61] as O ( n β + n ω ), where $n$ is the number of simplices in $K$ and $\omega$ is the fast matrix multiplication exponent, which as of 2014 is bounded by 2.37 ([61, 70, 71]). This yields a worst-case complexity of O ( n ) in the number of simplices for general complexes, which, we recall, is itself of order 3 in the number of points in the worst case.

In this section, we introduce an alternative definition of the homological scaffold, which we call minimal. It is based on the minimal representatives obtained above and aims at overcoming the arbitrariness in the cycle choice of the previous definition. After addressing the simplest case, we analyze its uniqueness properties and introduce a second, more refined, definition.

Let $F$ be the filtration of simplicial complexes induced by a non-negatively weighted finite graph $W$.
For all filtration steps $\varepsilon$, define, as per (2), $B_\varepsilon := \{b^\varepsilon_i\}$ as the minimal homology basis of $H_1(K_\varepsilon)$. Take the disjoint union of the minimal bases for $\varepsilon$ varying over all filtration steps:
$$B^* := \coprod_\varepsilon B_\varepsilon.$$

Definition 5.1
Suppose $W$, $F$ and $B^*$ as above. Similarly to the loose case, define the function $h_{W,\min} : E \to \mathbb{R}^+$ as
$$h_{W,\min} := \sum_{b \in B^*} \mathbb{1}_{e \in b} \qquad (3)$$
Then, we define the minimal scaffold of $W$ as the weighted graph $\mathcal{H}_{\min}(W)$ whose
- vertex set coincides with the vertex set of $W$;
- edge set is the subset of the edge set of $W$ consisting of the edges with nonzero value of $h_{W,\min}$;
- weight function is the restriction of $h_{W,\min}$ to this edge set.

The minimal scaffold amounts, again, to the stacking of generator cycles across a filtration. However, it differs from the loose definition in two respects. First, we require the representative cycles to be minimal. Second, while the loose scaffold is built by aggregating the generator cycles of $PH_1(F)$, the minimal scaffold is built by independently computing a minimal basis of $H_1(K_\varepsilon)$ for each $\varepsilon$. Notice that, since cycles are modified throughout a filtration, it would be meaningless to talk about a minimal representative over a certain persistence interval. This also means that the computation can be effectively parallelized by assigning different filtration steps to different jobs, and later recombining the outputs.

Figure 4: (a) The same point cloud of Fig. 2. Along the filtration we show the evolution of minimal generators, which can get progressively shorter as new edges are introduced. For example, at $\varepsilon = 0.\ldots$, the pentagonal cycle gets cut to a shorter quadrilateral, albeit with an individually longer edge. This evolution is accounted for in the minimal scaffold, which displays the triangle-rich structure discussed in the text. (b) The resulting minimal scaffold (weights not reported).

An interesting phenomenon that descends directly from the above peculiarity is that the minimal scaffold of random point clouds tends to display a more pronounced triangular structure (clustering) around cycles.
Indeed, as longer (or, in non-metric filtrations, later) edges are introduced, a cycle can be shortened (by the triangle inequality) by a longer edge that cuts a corner. Since at each step the algorithm records the minimal representative, upon aggregating the minimal scaffold one finds each cycle in its progressively shorter versions, and the history of the shortening is visible as a padding of triangles around it (see, for example, Fig. 4(a)).

We remark that, if there is no ambiguity in the construction of a filtration of simplicial complexes from a point cloud or from a weighted graph, we will indifferently speak of the scaffold as a function of either of them ($\mathcal{H}_{\min}(C)$, $\mathcal{H}_{\min}(W)$, or $\mathcal{H}_{\min}(F)$).

We have mentioned that the scaffold amounts to a change of weighting in the input graph, $h_{W,\min} : E \to \mathbb{R}^+$, altering the original weights of the edges. Additionally, by considering node strength (i.e., the sum of the weights of the edges incident to a given node), it can equally be considered as a function $\mathcal{H}_{\min} : V \to \mathbb{R}^+$ assigning weights to nodes. Given the reliability of the choice of edges in the procedure, this explains why the minimal scaffold can be used to associate mesoscopic features with single nodes and links.

Computational Complexity
For large input sizes, the cost of assembling the minimal basis cycles into the scaffold is negligible with respect to the cost of computing the minimal bases. We know that each run of Dey's algorithm costs O ( | K | ) in the worst case ([61]), and in the worst case $|K|$ is itself $O(n^3)$, where $n$ is the number of points.

The number of filtration steps has an upper bound of $O(n^2)$ (i.e., the number of edges) in the worst case, as in general every edge may carry a different weight. Hence, in the worst case, Dey's algorithm has to be run once for each edge.

This yields a theoretical worst-case complexity of order O ( n n ) = O ( n ). Therefore, while the minimal scaffold is undeniably computable in polynomial time, its practical computation is often hindered by its dire lack of scalability, especially compared with the loose version, which has a far more favourable complexity.

A comparison of running times is carried out in Fig. 5, which clearly shows that computing the minimal scaffold on an ordinary machine can quickly become troublesome.
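The assembly step is a single counting pass, which is why its cost is negligible. The sketch below assumes that each minimal basis is a list of cycles and that each cycle is a collection of edges (vertex pairs). `basis_at` is a hypothetical callable standing in for any per-step minimal basis routine; it is not part of the library at [72]. The sketch performs three steps:
- it evaluates the filtration steps, i.e. the distinct edge weights, in a thread pool, mirroring the per-step parallelism described above;
- it accumulates $h_{W,\min}$ as in (3);
- it derives node strengths from the result.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def filtration_steps(weights):
    """Filtration steps of a weighted graph: its distinct edge weights, increasing."""
    return sorted(set(weights.values()))


def minimal_scaffold_weights(weights, basis_at, max_workers=4):
    """h_{W,min}(e): number of minimal basis cycles containing e, summed over all steps.

    `weights` maps edges (frozensets of two vertices) to non-negative weights.
    `basis_at(eps)` is a hypothetical per-step routine returning a minimal basis
    of H_1(K_eps) as a list of cycles, each cycle an iterable of edges.
    """
    steps = filtration_steps(weights)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        bases = list(pool.map(basis_at, steps))  # steps are independent jobs
    h = Counter()
    for basis in bases:  # disjoint union B* over all steps
        for cycle in basis:
            h.update({frozenset(e) for e in cycle})  # indicator of e in b
    return dict(h)  # edges with zero weight are absent from the scaffold


def node_strength(h):
    """Sum of the scaffold weights of the edges incident to each node."""
    strength = Counter()
    for edge, w in h.items():
        for v in edge:
            strength[v] += w
    return dict(strength)


# Toy example: a unit-weight square whose diagonal (0, 2) enters at eps = 2.
square = [(0, 1), (1, 2), (2, 3), (0, 3)]
W = {frozenset(e): 1.0 for e in square}
W[frozenset((0, 2))] = 2.0
toy_bases = {1.0: [square], 2.0: []}  # hand-computed: at eps = 2 the square is filled
h = minimal_scaffold_weights(W, toy_bases.get)
print(sorted((tuple(sorted(e)), w) for e, w in h.items()))
# [((0, 1), 1), ((0, 3), 1), ((1, 2), 1), ((2, 3), 1)]
print(sorted(node_strength(h).items()))  # [(0, 2), (1, 2), (2, 2), (3, 2)]
```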
Figure 5: The running times of computing the minimal and loose scaffolds for Watts-Strogatz weighted random graphs. For all instances, the number of nodes $N$ is indicated on the x-axis. The number of stubs is $k = N/\ldots$, and the rewiring probability is $p = 0.\ldots$

Implementation
We have written a Python implementation of Dey’s algorithm, together with alibrary for the computation of the minimal scaffold. The code is available on GitHubat [72], with some usage examples. It allows for shared-memory multi-threadedparallelism across filtration steps to improve computation times, while still beingsuitable for ordinary desktop workstations.
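The library at [72] is not reproduced here. As a self-contained illustration only, the sketch below computes a minimal homology basis of $H_1$ for the 2-skeleton of $\mathrm{VR}_\varepsilon$ of a weighted graph at a single threshold. It uses a simpler greedy strategy instead of the annotation-based algorithm of [61]:
- candidate cycles are built Horton-style from shortest paths;
- candidates are sorted by length $\mu$;
- a candidate is kept when it is independent in homology of the boundaries and of the cycles kept so far, i.e., when it raises the $\mathbb{Z}_2$ rank of that set.

Greedy selection is optimal over the candidate set because independence in $H_1$ defines a matroid. We assume, following Horton's construction ([66]), that this candidate set contains a minimal basis. All function names are hypothetical, and vertices are assumed to be mutually comparable (e.g., integers).

```python
import heapq
from itertools import combinations


def vr_complex(weights, eps):
    """2-skeleton of VR_eps: edges with weight <= eps and the triangles (3-cliques) they span."""
    edges = sorted(tuple(sorted(e)) for e, w in weights.items() if w <= eps)
    edge_set = set(edges)
    verts = sorted({v for e in edges for v in e})
    tris = [t for t in combinations(verts, 3)
            if {(t[0], t[1]), (t[0], t[2]), (t[1], t[2])} <= edge_set]
    return edges, tris


def shortest_path_edges(adj, source):
    """Dijkstra from source; maps each reached vertex to the edge set of one shortest path."""
    dist, paths = {source: 0.0}, {source: frozenset()}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                paths[v] = paths[u] | {tuple(sorted((u, v)))}
                heapq.heappush(heap, (d + w, v))
    return paths


def minimal_h1_basis(weights, eps):
    """Greedy minimal homology basis of H_1(VR_eps) over Z_2 (illustrative sketch)."""
    edges, tris = vr_complex(weights, eps)
    idx = {e: i for i, e in enumerate(edges)}
    mu = {e: weights[frozenset(e)] for e in edges}
    adj = {v: [] for e in edges for v in e}
    for a, b in edges:
        adj[a].append((b, mu[(a, b)]))
        adj[b].append((a, mu[(a, b)]))

    pivots = {}  # leading bit -> reduced Z_2 vector (int bitmask over edges)

    def insert(vec):
        """Reduce vec against stored vectors; store it and return True iff it is independent."""
        while vec:
            lead = vec.bit_length() - 1
            if lead not in pivots:
                pivots[lead] = vec
                return True
            vec ^= pivots[lead]
        return False

    for a, b, c in tris:  # triangle boundaries span B_1, the zero class of H_1
        insert((1 << idx[(a, b)]) | (1 << idx[(a, c)]) | (1 << idx[(b, c)]))

    candidates = {}  # bitmask -> length mu(b)
    for v in adj:  # Horton-style: path(v, x) + (x, y) + path(y, v), summed over Z_2
        paths = shortest_path_edges(adj, v)
        for x, y in edges:
            if x not in paths:
                continue  # edge lies in another connected component
            cycle = paths[x] ^ paths[y] ^ {(x, y)}
            if cycle:
                candidates[sum(1 << idx[e] for e in cycle)] = sum(mu[e] for e in cycle)

    basis = []
    for mask, length in sorted(candidates.items(), key=lambda kv: (kv[1], kv[0])):
        if insert(mask):  # independent in homology of everything kept so far
            basis.append([e for e in edges if mask >> idx[e] & 1])
    return basis


domino = {frozenset(e): 1.0 for e in [(0, 1), (1, 2), (2, 3), (0, 3), (1, 4), (4, 5), (2, 5)]}
print(minimal_h1_basis(domino, 1.0))
# [[(0, 1), (0, 3), (1, 2), (2, 3)], [(1, 2), (1, 4), (2, 5), (4, 5)]]: two squares, not the hexagon
domino[frozenset((0, 2))] = 2.0
domino[frozenset((1, 5))] = 2.0  # the diagonals fill both squares with triangles
print(minimal_h1_basis(domino, 2.0))  # []
```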
The uniqueness of the minimal scaffold depends on the uniqueness of the minimal basis. Indeed, if there exists only one possible set $B^*$ of cycles forming a minimal basis, then the scaffold is uniquely determined. Two issues affect the uniqueness of the set $B^*$.

Draws
The first one arises when two or more different and homologous basis cycles have the same minimal length. This case is relatively simple to work around: we modify the definition of minimal scaffold to keep track of all variants of minimal basis cycles, dividing the weight equally among them.

Specifically, to account for this issue we have slightly modified Dey's algorithm. In its last step described above, one is concerned with finding all cycles whose annotation is not orthogonal to the given support vector: among these, the one with minimal length is chosen as a basis cycle. Instead, we keep track of all such cycles with the same minimal length. This does not alter the complexity, as one needs to check all possible cycles anyway. We call this case a draw.
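The modified last step and the equal splitting of weight among draws can be sketched as follows. This is a hypothetical illustration, not the code of [72]: each candidate cycle is assumed to be a tuple `(length, annotation, edges)`, with the annotation a 0/1 tuple over $\mathbb{Z}_2$, so that "not orthogonal to the support vector" means an odd inner product. Lengths are compared exactly, as in the definition of a draw; with floating-point weights a tolerance may be preferable.

```python
from collections import defaultdict
from fractions import Fraction


def select_with_draws(candidates, support):
    """Modified last step: among candidates whose Z2 annotation is not
    orthogonal to `support`, return ALL those of minimal length.

    candidates: list of (length, annotation, edges); annotation is a 0/1 tuple.
    """
    eligible = [c for c in candidates
                if sum(a & s for a, s in zip(c[1], support)) % 2 == 1]
    if not eligible:
        return []
    best = min(c[0] for c in eligible)
    return [c for c in eligible if c[0] == best]  # more than one: a draw


def weights_with_draws(draw_sets):
    """Each set V of equally minimal variants (collected over all filtration
    steps) contributes 1/|V| to every edge of each of its variants."""
    weight = defaultdict(Fraction)
    for V in draw_sets:
        share = Fraction(1, len(V))
        for _, _, edges in V:
            for u, v in edges:
                weight[frozenset((u, v))] += share
    return dict(weight)
```

For two variants of equal length that share every edge outside a diamond and go around it on opposite sides, the shared edges end up with weight 1 and each of the four diamond edges with weight 1/2.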
Therefore, we modify set $B$ to become a set of sets of cycles. Given complex $K$, we define a minimal basis with draws
$$\tilde{B} := \bigcup_{i=1}^{\beta_1(K)} \big\{ \{ b_{i,1}, \dots, b_{i,n_i} \} \big\}$$
where, for all $i = 1, \dots, \beta_1(K)$, the cycles $b_{i,j}$ with $j = 1, \dots, n_i$ are homologous and have the same minimal length. Furthermore, for every choice of $j_i \in \{1, \dots, n_i\}$, $\mathrm{Span}_i \{ [b_{i,j_i}] \} = H_1(K)$. Call $V_i := \{ b_{i,1}, \dots, b_{i,n_i} \}$ each set of draws, i.e., the variants of the $i$-th minimal basis cycle, for all $i = 1, \dots, \beta_1(K)$.

In the example of Fig. 6(a) and (b), we have $\tilde{B} = \{ \{ b_{1,1}, b_{1,2} \} \}$, whereas set $B$ might indifferently have been equal to $\{ b_{1,1} \}$ or to $\{ b_{1,2} \}$, whichever happened to come first in the search.

The minimal scaffold is modified accordingly. Given the usual filtration $F$, let $\tilde{B}_\varepsilon$ be the minimal basis with draws of $H_1(K_\varepsilon)$. Again, we aggregate all variants of minimal basis cycles along the filtration
$$\tilde{B}^* := \coprod_{\varepsilon} \tilde{B}_\varepsilon .$$
Then, we define the weighting function with draws $\tilde{h}_{W,min}: E \to \mathbb{R}^+$,
$$\tilde{h}_{W,min}(e) := \sum_{V \in \tilde{B}^*} \frac{1}{|V|} \sum_{b \in V} \mathbf{1}_{e \in b} \tag{4}$$
and the resulting minimal scaffold with draws $\tilde{H}_{min}(W)$ is built from $\tilde{h}_{W,min}$ as in Definition 3.

The meaning of the above definition is that all variants of all minimal basis cycles are taken into account when building the scaffold, and the weights are assigned by dividing each variant's contribution by the cardinality of its set of draws, at each filtration step. In the example of Fig. 6(c), the two cycles forming the variants of the only generator are each multiplied by a factor of 1/2 and then summed: therefore, the common edges outside the diamond are assigned weight 1, consistently with the minimal scaffold of Definition 3, whereas the four edges forming the perimeter of the diamond each get assigned weight 1/2.

With the introduction of draws, we settle the case in which ambiguity arises among individual cycles, without interactions. As an example, we can state the following result.
Proposition. If $F$ is such that, for all $\varepsilon$ in the filtration, each basis cycle belongs to a different connected component of $K_\varepsilon$, then the minimal scaffold with draws $\tilde{H}_{min}(F)$ is unique.

Pathological cases
The other issue arises when there exist sets of minimal cycles that are not linearly independent. Suppose that three different cycles generate a homology group of dimension two, i.e., that three minimal cycles are pairwise independent in homology, but threewise dependent. In this case, two generators are sufficient to span $H_1$ and, if their lengths are arranged pathologically, there is no principled way to choose two out of the three.

Suppose for example that three cycles $b_1$, $b_2$ and $b_3$ are such that
$$\mu(b_1) < \mu(b_2) = \mu(b_3) \quad \text{and} \quad [b_1] = [b_2] + [b_3].$$
In this case, both bases $\{b_1, b_2\}$ and $\{b_1, b_3\}$ span the same homology space, and are of equal minimal length. The minimality criterion fails in this case.

One could believe that such a configuration can only happen in the most general spaces, and that by imposing some mild hypotheses on the input data one could rule the pathology out. In fact the opposite is true: this degeneracy is possible even after enforcing very strong conditions on the data.

Counterexample
Even if $W$ is planar and an isometric embedding $W \hookrightarrow \mathbb{R}^2$ exists (i.e., the input planar weighted graph can be accurately drawn onto the plane), the minimal scaffold $\tilde{H}_{min}(W)$ need not be unique.

In fact, consider complex $K$ arising from the geometric, planar graph in Fig. 6(d). Its homology $H_1(K)$ is generated by two cycles; since the outer cycle $b_1$ is the shortest, and the two inner ones $b_2$ and $b_3$ are of equal length, the minimality criterion cannot decide between $\{b_1, b_2\}$ and $\{b_1, b_3\}$, as both are acceptable minimal bases. The minimal scaffold (with or without draws) is not unique in this case.

Clearly, the same could happen with more than three cycles, with a larger number of possibly ambiguous configurations. Therefore, if we allow for a high degree of symmetry in the input, this pathology could arise even in the rather tame context of planar graphs in $\mathbb{R}^2$. This issue is rather delicate, in the sense that not only is the algorithm unable to make a principled choice; it is not even capable of detecting when such a configuration takes place. In fact, this is more of a feature of homology than a flaw in the skeletonization framework: what our eyes see as different choices are in fact homologically equivalent, and it is impossible to use homology to tell them apart.

We however remark that, for complexes arising from real-world data, this type of configuration is actually pathological. Indeed, the following generality result holds.

Proposition
Assume a point cloud $C = \{X_i\}$ such that $X_i \sim U([0,1]^d)$ i.i.d. Then, almost surely, the minimal scaffold $H_{min}(W)$ (with or without draws) is unique.

If the input point cloud is sampled uniformly at random in $[0,1]^d \subset \mathbb{R}^d$, then edge lengths are distributed according to an absolutely continuous probability law. Therefore, given two edges $e_1$ and $e_2$, $P[\mu(e_1) = \mu(e_2)] = 0$. The same holds for any two non-identical cycles, and any two homology bases (these being but finite sets of edges): the probability of them sharing the exact same length is zero. By finiteness of the input, at least one minimal homology basis exists and, by the above reasoning, almost surely this basis is unique for each filtration step. Then, with probability 1, the minimal scaffold is unique.
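Both situations can be reproduced on toy data. The sketch below assumes homology classes encoded as $\mathbb{Z}_2$ bitmasks, with $[b_1] = [b_2] + [b_3]$ and hypothetical lengths $\mu(b_1) = 3$, $\mu(b_2) = \mu(b_3) = 5$: it enumerates all bases of minimal total length and finds two. It then checks a uniformly sampled cloud and a symmetric square for exactly tied edge lengths. With floating-point samples, "almost surely" becomes "with overwhelming probability".

```python
import random
from itertools import combinations
from math import dist


def gf2_rank(vectors):
    """Rank over Z2 of homology classes encoded as integer bitmasks."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)  # clears b's leading bit from v if it is set
        if v:
            basis.append(v)
    return len(basis)


def minimal_bases(classes, lengths, dim):
    """All index sets of `dim` cycles forming a basis of minimal total length."""
    bases = [S for S in combinations(range(len(classes)), dim)
             if gf2_rank([classes[i] for i in S]) == dim]
    best = min(sum(lengths[i] for i in S) for S in bases)
    return [S for S in bases if sum(lengths[i] for i in S) == best]


def has_tied_edge_lengths(points):
    """True if two distinct pairs of points lie at exactly the same distance."""
    lengths = [dist(p, q) for p, q in combinations(points, 2)]
    return len(set(lengths)) < len(lengths)


def uniform_cloud(n, d, seed=0):
    """n points sampled uniformly in [0, 1)^d with a fixed seed."""
    rng = random.Random(seed)
    return [tuple(rng.random() for _ in range(d)) for _ in range(n)]
```

Here `minimal_bases([0b11, 0b01, 0b10], [3, 5, 5], 2)` returns both `(0, 1)` and `(0, 2)`, i.e. $\{b_1, b_2\}$ and $\{b_1, b_3\}$, while a random cloud shows no tied lengths and the corners of a unit square do.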
Figure 6
Top panel: (a) A simplicial complex $K$. (b) Two homologous and equally minimal generators of $H_1(K)$. (c) The minimal scaffold with draws $\tilde{H}_{min}(K)$. The weight is equally divided among the variants of the minimal representative. Bottom panel: (d) A simplicial complex $K$ on the represented point cloud; $H_1(K)$ has dimension 2. (e) $\mu(b_1) < \mu(b_2) = \mu(b_3)$. A minimal basis can be composed of either $\{b_1, b_2\}$ or $\{b_1, b_3\}$, hence it is not unique.

This result is actually quite general: whenever we can assume our input data to be subject to noise, we are in principle allowed to rule out pathological same-length cycles. In these cases, the minimal scaffold is unique.

We remark that this uniqueness result is compatible with the phenomenon of concentration of measure: while for a very high-dimensional space or a very large number of points we know from theory that the distribution of edge lengths concentrates around its mean value, the probability of two edges (and hence two cycles) having exactly the same length is still zero. One needs to be careful, however: the probability of two cycles differing in length by less than some ε > 0 is in general positive, and it increases with ε.

In summary, the minimal scaffold with draws $\tilde{H}_{min}$ is well-defined except in some pathological circumstances, where it may depend on the ordering of the input.

As illustrative examples, we show here a few applications of the minimal scaffold. Through it, we obtain meaningful subsets of known networks in neuroscience, and rank their constituents by their "topological importance".
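The ranking by "topological importance" used in the applications relies on node strength in the scaffold divided by the average strength. A minimal sketch, assuming scaffold edge weights stored in a dictionary keyed by `frozenset({u, v})` (a hypothetical format, not necessarily that of [72]):

```python
def node_strength(edge_weights):
    """Sum of the scaffold weights of the edges incident to each node.

    edge_weights maps frozenset({u, v}) -> weight.
    """
    strength = {}
    for edge, w in edge_weights.items():
        for node in edge:
            strength[node] = strength.get(node, 0) + w
    return strength


def top_by_relative_strength(edge_weights, k=25):
    """The k nodes with largest strength, paired with strength / mean strength."""
    strength = node_strength(edge_weights)
    mean = sum(strength.values()) / len(strength)
    ranked = sorted(strength, key=strength.get, reverse=True)
    return [(node, strength[node] / mean) for node in ranked[:k]]
```

In this sketch the mean is taken over the nodes that appear in the scaffold; averaging over every node of the original network (including those with no scaffold edges) would lower the mean and raise all relative strengths by the same factor, leaving the ranking unchanged.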
Figure 7
The top 25 neurons by relative node strength in the minimal scaffold over average strength in C. Elegans (mean …). Four neurons show a significantly higher relative strength than the others.

The C. Elegans dataset is a correlation network of neural activations of the nematode worm Caenorhabditis Elegans. C. Elegans has become a model organism due to the unique characteristic of each individual sharing the exact same nervous system structure.

The input consists of a symmetric weighted adjacency matrix over 297 nodes, each representing a neuron. Edge weights represent (quantized) time correlations between the firing of neurons, ranging from 1 to 70.

The minimal homological scaffold of its brain map highlights the geometry of the obstruction patterns, i.e., the precise areas where nervous stimuli are less likely to flow. We stress the improvement obtained by the minimal scaffold over the loose one: it is not only able to identify the presence of a "grey area" in the network, but it can also provide a reliable boundary for it, and identify which neurons and inter-neuron links are responsible for information flowing around the obstruction.

As an interesting example, Fig. 7 shows the top 25 neurons ranked in descending order of relative node strength (sum of weights of incident edges) with respect to the average node strength. We can identify four nodes, labeled 81, 260, 36, and 37, which hold a significantly higher relative strength than the rest. This indicates their presence in many minimal cycles across several scales, hence suggesting that they play a crucial role in the fabric of information flow within the nematode's brain.

The same type of analysis was repeated on the correlation network of brain activities in an 88-parcel atlas of the human brain, obtained through fMRI imaging at resting state. The data is courtesy of the Human Connectome Project ([73]).

Again, the minimal scaffold identifies which regions and links in the human brain are key bridges for the flow of information. Two parcels stand out (Fig. 8(a)) as particularly relevant for network topology.

For a relatively small network such as this, we can visualize the scaffold as a proper subnetwork by a chord diagram (Fig. 8(b)), with edge weight represented by color intensity and node strength by the size and color of the vertex. We stress that, starting from a virtually complete graph over 88 nodes, we reduce the size from 3828 edges to just 191 (about 5% of the original edges), while preserving the topological structure.

We can also leverage libraries in computational neuroscience ([74]) to embed the scaffold in the actual human brain, with regions correctly located, projected on the three coordinate planes. In Fig. 8(c), for visualization purposes, color intensities represent log-weight in the scaffold.

Figure 8
(a) The top 25 brain regions in the human brain by relative node strength in the minimal scaffold over average strength (mean …). Two regions show significantly higher importance. (b) The chord diagram of the minimal scaffold. Node size represents node strength, edge color intensity represents weight in the scaffold. (c) The minimal scaffold embedded in the human brain, with regions accurately located, projected on the three coordinate planes. Edge color represents log-weight in the minimal scaffold (log scale for visualization purposes).

As the last contribution of this work, we consider a comparison between the minimal and loose scaffolds.

We have already pointed out that the minimal scaffold in general offers superior guarantees as a tool, both for network analysis and network skeletonization. On the other hand, the loose scaffold clearly has an advantage in terms of computational complexity: while it is in principle viable for most of the applications where persistent homology has been employed, the minimal scaffold, even adopting filtration-wise parallelization, requires a vastly larger amount of computational power, which effectively limits its range of application, unless run on dedicated, high-performance infrastructures.

A reasonable question to ask is the following. If one is interested not in the exact structure of the scaffold, but only in its statistical behaviour, could the loose scaffold provide a sufficient approximation of the minimal one? In a more concrete example, if instead of asking exactly which nodes in a network are the most topologically important, one is interested in the distribution of the degree sequence of the minimal scaffold, could the loose one come to one's help?

To answer this question, we have performed comparisons of several graph metrics in the two scaffolds of C. Elegans. Further, to gain insight into the general case, we have sampled two families of random graphs at different parameter values, one for geometric graphs (Random Geometric Graph), and one for non-geometric graphs (Weighted Watts-Strogatz).
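The comparisons between the two scaffolds are expressed through Pearson and Spearman correlation coefficients of a metric evaluated node by node on both. A minimal, dependency-free sketch of both coefficients follows; ties receive average ranks, and both coefficients are undefined for constant input (the code then divides by zero). In practice `scipy.stats.pearsonr` and `scipy.stats.spearmanr` compute the same coefficients and also return p-values.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Applied to the degree sequences of the minimal and loose scaffolds (listed in the same node order), these give the kind of coefficients reported for C. Elegans.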
C. Elegans
For the C. Elegans dataset, we have compared the following graph metrics of the minimal and loose scaffolds:
1. Degree Sequence
2. Node Strength
3. Betweenness Centrality
4. Closeness Centrality
5. Eigenvector Centrality
6. Clustering Coefficients
7. Edge Weights
Results (reported in the table of Fig. 9(c)) indicate that, for metrics 1 to 5, the two scaffolds are well correlated (Spearman coefficients between 0.77 and 0.95). For example, the cheap, loose scaffold is a reliable proxy for the distribution of the "true" degree sequence (scatterplot in Fig. 9(d)).

We instead observe poor correlation of edge weights and clustering coefficients. The former is not unexpected, since the edge weighting procedure is conceptually different in the two scaffolds: while in the minimal one we consider a different basis for each filtration step, the loose scaffold considers bases of the persistent homology space, drastically reducing the number of cycles considered. To make this clearer, in general set $B^*$ has cardinality much larger than the dimension of $PH_1$. It is therefore understandable that the distributions of edge weights do not generally agree.

Clustering coefficients, on the other hand, measure how "triangular" a graph is around a given node. As remarked in Section 5, another consequence of assembling the scaffold from the minimal bases of the $H_1$'s is that a large number of artificial triangles appear around cycles. In this case too, therefore, the poor correlation is easily explained.

Random Graphs
Drawing inspiration from [52], we repeat the analysis on random graph samples. [52] divides random networks into two categories: those created from edge weighting schemes and those created from points in Euclidean space. We have chosen to analyze the weighted Watts-Strogatz (WS) model as representative of the first class, and the random geometric model as representative of the second.

Figure 9
Panels (a) Watts-Strogatz Model and (b) Random Geometric Model: Pearson and Spearman correlations of Degree Sequence and Betweenness Centrality. Panel (c):

Metric                    Spearman Corr   Spearman p-val   Pearson Corr   Pearson p-val
Node Degree               0.953148        3.1842e-155      0.975559       3.4463e-196
Node Strength             0.772330        4.3712e-60       0.700653       3.7250e-45
Betw. Centrality          0.952098        7.7348e-154      0.986412       1.8813e-233
Closeness Centrality      0.921274        5.1143e-123      0.960413       8.7695e-166
Eigenvector Centrality    0.880711        9.5943e-98       0.858564       1.3911e-87
Clustering Coefficients   0.412889        1.1778e-13       0.358577       1.9337e-10
Edge Weights              0.226321        1.3586e-09       0.086226       0.0224

Correlations between the minimal and loose scaffolds. (a) Comparison in the weighted Watts-Strogatz model. Degree sequence and betweenness centrality in the two scaffolds are compared, using Pearson and Spearman correlation coefficients. Each box is computed over a sample of 30 weighted Watts-Strogatz random graphs, with parameters as reported on the x-axis: the pair (N, k) indicates a WS model on N nodes, with k stubs to rewire. The rewiring probability is …. The cyan x's and the green diamonds represent the average correlation value against the loose and minimal null models, respectively. (b) Comparison in the random geometric model. Again, Pearson and Spearman correlation coefficients of the degree sequence and betweenness centrality in the two scaffolds are compared. Each box is computed over a sample of 30 random geometric graphs, with parameters as reported on the x-axis: the pair (N, t) indicates a graph on N nodes sampled uniformly at random in the $[0,1]^2$ square, where t is the connectivity distance threshold. The cyan x's and the green diamonds represent the average correlation value against the loose and minimal null models, respectively. (c) Correlation tests for several network metrics show that the standard scaffold can reproduce certain statistical properties of the minimal one in C. Elegans. At the same time, due to different construction mechanisms, others are unreliable. (d) Scatterplot of the degree sequence of neurons of C. Elegans in the minimal scaffold versus the loose one.
We remark that weighting needs to be introduced in order to compute persistence; while for geometric graphs this simply requires computing the Euclidean distance, for the Watts-Strogatz model it requires an ad-hoc procedure that is described in detail in the supplemental material of [52].

We briefly recall that a WS graph is parametrized by the number of nodes, by the number of stubs to rewire, and by the rewiring probability. A random geometric graph is instead parametrized by the number of points to sample (uniformly) in $[0,1]^d$, and by a cutoff value that acts as a distance threshold, beyond which no edge is introduced.

In both cases, we observe good agreement on key statistics, as reported in Fig. 9(a) and (b). Each box is obtained by computing the correlation of the reported statistic on a sample of 30 random graphs of the reported model, with parameters as indicated on the x-axis.

For comparison, two null models are built for each instance of the minimal and loose scaffolds in the sample, by constructing an Erdős-Rényi random graph on the same vertex set, one with the same number of edges as the minimal scaffold, and one with the same number as the loose one. The correlation of each statistic is computed between the minimal scaffold and the loose null model, and between the loose scaffold and the minimal null model. The average of these correlations is reported on the boxplots as a baseline value, highlighting that the two scaffolding procedures agree with each other by more than just statistical noise.
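The random geometric model and the Erdős-Rényi null models described above can be sketched as follows, assuming graphs on nodes `0..n-1` given as edge lists. The null model is taken to be the $G(n, m)$ variant, with exactly $m$ edges, matching "the same number of edges" above; the weighted Watts-Strogatz procedure of [52] is not reproduced here.

```python
import random
from itertools import combinations
from math import dist


def random_geometric_graph(n, t, d=2, seed=None):
    """n points uniform in [0,1]^d; pairs at distance <= t are joined by an
    edge weighted with their Euclidean distance. Returns {(i, j): weight}."""
    rng = random.Random(seed)
    pts = [tuple(rng.random() for _ in range(d)) for _ in range(n)]
    graph = {}
    for i, j in combinations(range(n), 2):
        length = dist(pts[i], pts[j])
        if length <= t:
            graph[(i, j)] = length
    return graph


def gnm_null_model(n, m, seed=None):
    """Erdos-Renyi null model on the same n nodes: m distinct edges drawn
    uniformly among the n*(n-1)/2 possible pairs."""
    rng = random.Random(seed)
    return rng.sample(list(combinations(range(n), 2)), m)


def degree_sequence(n, edges):
    """Degree of each node 0..n-1 for an iterable of (i, j) edges."""
    degree = [0] * n
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    return degree
```

Following the procedure above, the degree sequence of the minimal scaffold would be correlated with that of `gnm_null_model(n, m_loose)`, where `m_loose` is the number of edges of the loose scaffold, and vice versa, to obtain the baseline values.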
We provided a new method of network analysis and skeletonization, based on the computation of minimal homology bases. This new construction fills a significant gap in the previous literature, in that it yields, in all but some pathological cases, a well-defined and unique subgraph, acting as a reasonable ground truth for comparison with the previous construction. It can be employed in a range of applications, both to identify crucial and weak links in a network, and to obtain compressed and topologically sound representations of the input. It also allows one to evaluate the reliability of other scaffolding procedures with respect to said ground truth: we have observed that, for some applications, the loose scaffold can be deemed a sufficiently accurate tool, without incurring such a cumbersome computational load.

We foresee that the subject of homological skeletonization is not yet concluded. Other approaches to finding canonical generators of homology are possible (for example in [54] and [75]), and we plan to investigate them further in subsequent works.
Availability of data and material
The C. Elegans dataset analysed during the current study is available and included in the GitHub repository MinScaffold, https://github.com/marcoguerra192/MinScaffold.
The Human Connectome Project dataset is available from the page https://github.com/marcoguerra192/MinScaffold.
Competing interests
The authors declare that they have no competing interests.
Funding
MG, ADG, UF, and FV acknowledge the support from the Italian MIUR Award “Dipartimento di Eccellenza 2018-2022” - CUP: E11G18000350001 and the SmartData@PoliTO center for Big Data and Machine Learning. GP acknowledges partial support from Intesa Sanpaolo Innovation Center. The funder had no role in study design, data collection, and analysis, decision to publish, or preparation of the manuscript.
Authors’ contributions
MG, ADG, UF, GP, and FV conceived and designed the study, performed the analysis and wrote the manuscript. All authors read and approved the final manuscript.
Acknowledgements
The authors acknowledge Iacopo Iacopini for kindly sharing a Python library for plotting simplicial complexes, available on GitHub (github.com/iaciac/py-draw-simplicial-complex). We further acknowledge the Python library Nilearn ([74]) for the brain image visualization code. We would also like to thank Paola Siri for useful discussions.
Author details
Department of Mathematical Sciences, Politecnico di Torino, Torino, Italy. ISI Foundation, Torino, Italy.
References
1. Newman, M.E.: The structure and function of complex networks. SIAM review (2), 167–256 (2003)
2. Barrat, A., Barthelemy, M., Pastor-Satorras, R., Vespignani, A.: The architecture of complex weighted networks. Proceedings of the national academy of sciences (11), 3747–3752 (2004)
3. Granovetter, M.S.: The strength of weak ties. Social networks, pp. 347–367. Elsevier (1977)
4. Vega-Redondo, F.: Complex social networks. Cambridge University Press (2007)
5. Pastor-Satorras, R., Castellano, C., Van Mieghem, P., Vespignani, A.: Epidemic processes in complex networks. Reviews of modern physics (3), 925 (2015)
6. Colizza, V., Barrat, A., Barthélemy, M., Vespignani, A.: The role of the airline transportation network in the prediction and predictability of global epidemics. Proceedings of the National Academy of Sciences (7), 2015–2020 (2006)
7. Girvan, M., Newman, M.E.: Community structure in social and biological networks. Proceedings of the national academy of sciences (12), 7821–7826 (2002)
8. Alon, U.: Biological networks: the tinkerer as an engineer. Science (5641), 1866–1867 (2003)
9. Bassett, D.S., Sporns, O.: Network neuroscience. Nature neuroscience (3), 353 (2017)
10. Bullmore, E., Sporns, O.: Complex brain networks: graph theoretical analysis of structural and functional systems. Nature reviews neuroscience (3), 186–198 (2009)
11. Bassett, D.S., Bullmore, E.: Small-world brain networks. The neuroscientist (6), 512–523 (2006)
12. Horak, D., Maletić, S., Rajković, M.: Persistent homology of complex networks. Journal of Statistical Mechanics: Theory and Experiment (03), 03034 (2009). doi:10.1088/1742-5468/2009/03/p03034
13. Patania, A., Petri, G., Vaccarino, F.: The shape of collaborations. EPJ Data Science (1), 18 (2017). doi:10.1140/epjds/s13688-017-0114-8
14. Lee, H., Chung, M.K., Kang, H., Kim, B.-N., Lee, D.S.: Discriminative persistent homology of brain networks. In: 2011 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, pp. 841–844 (2011). IEEE
15. Rieck, B., Fugacci, U., Lukasczyk, J., Leitte, H.: Clique community persistence: A topological visual analysis approach for complex networks. IEEE Transactions on Visualization and Computer Graphics (1), 822–831 (2018). doi:10.1109/TVCG.2017.2744321
16. Ghrist, R.: Elementary applied topology. Createspace (2014)
17. Patania, A., Vaccarino, F., Petri, G.: Topological analysis of data. EPJ Data Science (1), 7 (2017)
18. Hatcher, A.: Algebraic topology. Cambridge University Press (2002)
19. Munkres, J.R.: Elements of algebraic topology. Perseus Books (1984)
20. Frosini, P.: A distance for similarity classes of submanifolds of a euclidean space. Bulletin of the Australian Mathematical Society (3), 407–415 (1990)
21. Delfinado, C.J.A., Edelsbrunner, H.: An incremental algorithm for betti numbers of simplicial complexes on the 3-sphere. Computer Aided Geometric Design (7), 771–784 (1995). doi:10.1016/0167-8396(95)00016-Y. Grid Generation, Finite Elements, and Geometric Design
22. Edelsbrunner, H., Letscher, D., Zomorodian, A.: Topological persistence and simplification. Discrete & Computational Geometry (4), 511–533 (2002). doi:10.1007/s00454-002-2885-2
23. Zomorodian, A.J., Carlsson, G.: Computing persistent homology. Discrete & Computational Geometry (2), 249–274 (2005)
24. Cohen-Steiner, D., Edelsbrunner, H., Harer, J.: Stability of persistence diagrams. Discrete & Computational Geometry (1), 103–120 (2007). doi:10.1007/s00454-006-1276-5
25. Hiraoka, Y., Nakamura, T., Hirata, A., Escolar, E.G., Matsue, K., Nishiura, Y.: Hierarchical structures of amorphous solids characterized by persistent homology. Proceedings of the National Academy of Sciences of the United States of America (26), 7035–7040 (2016)
26. Lee, Y., Barthel, S.D., Dłotko, P., Moosavi, S.M., Hess, K., Smit, B.: Quantifying similarity of pore-geometry in nanoporous materials. Nature communications (1), 1–8 (2017)
27. Chan, J.M., Carlsson, G., Rabadan, R.: Topology of viral evolution. Proceedings of the National Academy of Sciences (46), 18566–18571 (2013)
28. Chung, M.K., Bubenik, P., Kim, P.T.: Persistence diagrams of cortical surface data. In: Information Processing in Medical Imaging, pp. 386–397 (2009). Springer
29. Dequeant, M.-L., Ahnert, S., Edelsbrunner, H., Fink, T., Glynn, E., Hattem, G., Kudlicki, A., Mileyko, Y., Morton, J., Mushegian, A., et al.: Comparison of pattern detection methods in microarray time series of the segmentation clock. PLoS One (8), 2856 (2008)
30. Wang, Y., Agarwal, P.K., Brown, P., H, E., Rudolph, J.: Coarse and reliable geometric alignment for protein docking. In: Proceedings of Pacific Symposium on Biocomputing, vol. 10, pp. 65–75 (2005)
31. Martin, S., Thompson, A., Coutsias, E.A., Watson, J.-P.: Topology of cyclo-octane energy landscape. Journal of Chemical Physics (23), 234115 (2010)
32. Phinyomark, A., Khushaba, R.N., Ibáñez-Marcelo, E., Patania, A., Scheme, E., Petri, G.: Navigating features: a topologically informed chart of electromyographic features space. Journal of The Royal Society Interface (137), 20170734 (2017)
33. De Silva, V., Ghrist, R.: Coverage in sensor networks via persistent homology. Algebraic & Geometric Topology (1), 339–358 (2007)
34. van de Weygaert, R., Vegter, G., Edelsbrunner, H., Jones, B.J.T., Pranav, P., Park, C., Hellwing, W.A., Eldering, B., Kruithof, N., Bos, E.G.P.P., et al.: Alpha, Betti and the Megaparsec Universe: On the Topology of the Cosmic Web, pp. 60–101. Springer, Berlin, Heidelberg (2011)
35. Patania, A., Selvaggi, P., Veronese, M., Dipasquale, O., Expert, P., Petri, G.: Topological gene expression networks recapitulate brain anatomy and function. Network Neuroscience (3), 744–762 (2019)
36. Li, L., Cheng, W.-Y., Glicksberg, B.S., Gottesman, O., Tamler, R., Chen, R., Bottinger, E.P., Dudley, J.T.: Identification of type 2 diabetes subgroups through topological analysis of patient similarity. Science Translational Medicine (311), 311–174 (2015)
37. Giusti, C., Pastalkova, E., Curto, C., Itskov, V.: Clique topology reveals intrinsic geometric structure in neural correlations. Proceedings of the National Academy of Sciences of the United States of America (44), 13455–13460 (2015)
38. Wang, Y., Ombao, H., Chung, M.K.: Topological data analysis of single-trial electroencephalographic signals. Annals of Applied Statistics (3), 1506–1534 (2017)
39. Yoo, J., Kim, E.Y., Ahn, Y.M., Ye, J.C.: Topological persistence vineyard for dynamic functional brain connectivity during resting and gaming stages. Journal of Neuroscience Methods (15), 1–13 (2016)
40. Petri, G., Expert, P., Turkheimer, F., Carhart-Harris, R., Nutt, D., Hellyer, P.J., Vaccarino, F.: Homological scaffolds of brain functional networks. Journal of The Royal Society Interface (101), 20140873 (2014). doi:10.1098/rsif.2014.0873
41. Ibáñez-Marcelo, E., Campioni, L., Phinyomark, A., Petri, G., Santarcangelo, E.L.: Topology highlights mesoscopic functional equivalence between imagery and perception: The case of hypnotizability. NeuroImage, 437–449 (2019)
42. Lord, L.-D., Expert, P., Fernandes, H.M., Petri, G., Van Hartevelt, T.J., Vaccarino, F., Deco, G., Turkheimer, F., Kringelbach, M.L.: Insights into brain architectures from the homological scaffolds of functional connectivity networks. Frontiers in systems neuroscience, 85 (2016)
43. Ibáñez-Marcelo, E., Campioni, L., Manzoni, D., Santarcangelo, E.L., Petri, G.: Spectral and topological analyses of the cortical representation of the head position: Does hypnotizability matter? Brain and behavior (6), 01277 (2019)
44. Guo, W., Banerjee, A.G.: Toward automated prediction of manufacturing productivity based on feature selection using topological data analysis. In: IEEE International Symposium on Assembly and Manufacturing, pp. 31–36 (2016)
45. Phinyomark, A., Petri, G., Ibáñez-Marcelo, E., Osis, S.T., Ferber, R.: Analysis of big data in gait biomechanics: Current trends and future directions. Journal of medical and biological engineering (2), 244–260 (2018)
46. Campbell, E., Phinyomark, A., Al-Timemy, A.H., Khushaba, R.N., Petri, G., Scheme, E.: Differences in emg feature space between able-bodied and amputee subjects for myoelectric control. In: 2019 9th International IEEE/EMBS Conference on Neural Engineering (NER), pp. 33–36 (2019). IEEE
47. Patania, A., Petri, G., Vaccarino, F.: The shape of collaborations. EPJ Data Science (1), 18 (2017)
48. Benson, A.R., Abebe, R., Schaub, M.T., Jadbabaie, A., Kleinberg, J.: Simplicial closure and higher-order link prediction. Proceedings of the National Academy of Sciences (48), 11221–11230 (2018)
49. Petri, G., Scolamiero, M., Donato, I., Vaccarino, F.: Topological strata of weighted complex networks. PloS one (6) (2013)
50. Patania, A., Vaccarino, F., Petri, G.: Topological analysis of data. EPJ Data Science (1), 7 (2017)
51. Donato, I., Gori, M., Pettini, M., Petri, G., De Nigris, S., Franzosi, R., Vaccarino, F.: Persistent homology analysis of phase transitions. Physical Review E (5), 052138 (2016)
52. Sizemore, A., Giusti, C., Bassett, D.S.: Classification of weighted networks through mesoscale homological features. Journal of Complex Networks (2), 245–273 (2017)
53. Edelsbrunner, H., Harer, J.: Computational topology: An introduction. American Mathematical Society (2010)
54. Kurlin, V.: A one-dimensional homologically persistent skeleton of an unstructured point cloud in any metric space. Computer Graphics Forum (5), 253–262 (2015). doi:10.1111/cgf.12713
55. Kalisnik, S., Kurlin, V., Lesnik, D.: A higher-dimensional homologically persistent skeleton. Advances in Applied Mathematics, 113–142 (2019)
56. Ge, X., Safa, I.I., Belkin, M., Wang, Y.: Data skeletonization via Reeb graphs. Advances in Neural Information Processing Systems 24, 837–845 (2011)
57. Chazal, F., Huang, R., Sun, J.: Gromov–hausdorff approximation of filamentary structures using reeb-type graphs. Discrete & Computational Geometry (3), 621–649 (2015)
58. Sizemore, A.E., Giusti, C., Kahn, A., Vettel, J.M., Betzel, R.F., Bassett, D.S.: Cliques and cavities in the human connectome. Journal of computational neuroscience (1), 115–145 (2018)
59. Obayashi, I.: Volume-optimal cycle: Tightest representative cycle of a generator in persistent homology. SIAM Journal on Applied Algebra and Geometry (4), 508–534 (2018)
60. Dey, T., Sun, J., Wang, Y.: Approximating loops in a shortest homology basis from point data. Proceedings of the Annual Symposium on Computational Geometry (2009). doi:10.1145/1810959.1810989
61. Dey, T.K., Li, T., Wang, Y.: Efficient algorithms for computing a minimal homology basis. In: Latin American Symposium on Theoretical Informatics, pp. 376–398 (2018). Springer
62. Baronchelli, A., Ferrer-i-Cancho, R., Pastor-Satorras, R., Chater, N., Christiansen, M.H.: Networks in cognitive science. Trends in cognitive sciences (7), 348–360 (2013)
63. Lum, P.Y., Singh, G., Lehman, A., Ishkanov, T., Vejdemo-Johansson, M., Alagappan, M., Carlsson, J., Carlsson, G.: Extracting insights from the shape of complex data using topology. Scientific reports, 1236 (2013)
64. Tausz, A., Vejdemo-Johansson, M., Adams, H.: JavaPlex: A research software package for persistent (co)homology. In: Hong, H., Yap, C. (eds.) Proceedings of ICMS 2014. Lecture Notes in Computer Science 8592, pp. 129–136 (2014)
65. Chen, C., Freedman, D.: Hardness results for homology localization. Discrete & Computational Geometry (3), 425–448 (2011). doi:10.1007/s00454-010-9322-8
66. Horton, J.: A polynomial-time algorithm to find the shortest cycle basis of a graph. SIAM Journal on Computing (2), 358–366 (1987). doi:10.1137/0216026
67. de Pina de J, C.: Applications of shortest path methods (1995)
68. Kavitha, T., Mehlhorn, K., Michail, D., Paluch, K.: A faster algorithm for minimum cycle basis of graphs. In: Díaz, J., Karhumäki, J., Lepistö, A., Sannella, D. (eds.) Automata, Languages and Programming, pp. 846–857. Springer, Berlin, Heidelberg (2004)
69. Busaryev, O., Cabello, S., Chen, C., Dey, T.K., Wang, Y.: Annotating simplices with a homology basis and its applications. In: Scandinavian Workshop on Algorithm Theory, pp. 189–200 (2012). Springer
70. Coppersmith, D., Winograd, S.: Matrix multiplication via arithmetic progressions. Journal of Symbolic Computation (3), 251–280 (1990). doi:10.1016/S0747-7171(08)80013-2. Computational algebraic complexity editorial
71. Le Gall, F.: Powers of tensors and fast matrix multiplication. In: Proceedings of the 39th International Symposium on Symbolic and Algebraic Computation. ISSAC ’14, pp. 296–303. ACM, New York, NY, USA (2014). doi:10.1145/2608628.2608664. http://doi.acm.org/10.1145/2608628.2608664
72. Guerra, M., De Gregorio, A.: Minimal Scaffold repository MinScaffold (2019). https://github.com/marcoguerra192/MinScaffold
73. M. Termenon, A.J. C. Delon-Martin, Achard, S.: Reliability of graph analysis of resting state fmri using test-retest dataset from the human connectome project. Neuroimage (15), 172–187 (2016). doi:10.1016/j.neuroimage.2016.05.062
74. Abraham, A., Pedregosa, F., Eickenberg, M., Gervais, P., Mueller, A., Kossaifi, J., Gramfort, A., Thirion, B., Varoquaux, G.: Machine learning for neuroimaging with scikit-learn. Frontiers in Neuroinformatics 8