Reviews: Topological Distances and Losses for Brain Networks
Moo K. Chung a,b, ∗ , Alexander Smith c , Gary Shiu d a Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, WI, USA b Waisman Laboratory for Brain Imaging and Behavior, University of Wisconsin, Madison, WI, USA c Department of Chemical and Biological Engineering, University of Wisconsin, Madison, WI, USA d Department of Physics, University of Wisconsin, Madison, WI, USA
Abstract
Almost all statistical and machine learning methods for analyzing brain networks rely on distances and loss functions, which are mostly Euclidean or matrix norms. Euclidean and matrix distances may fail to capture subtle underlying topological differences in brain networks. Further, Euclidean distances are sensitive to outliers: a few extreme edge weights may severely affect the distance. It is therefore necessary to use distances and loss functions that recognize the topology of the data. In this review paper, we survey various topological distance and loss functions from topological data analysis (TDA) and persistent homology that can be used in brain network analysis more effectively. Although many recent brain imaging studies are based on TDA methods, possibly due to a lack of awareness of the methods, TDA has not yet been adopted as a mainstream tool in the brain imaging field. The main purpose of this paper is to provide a technical survey of these powerful tools that are immediately applicable to brain network data.

Keywords: Topological distances, topological losses, topological data analysis, persistent homology, brain networks
1. Introduction
There are many similarity measures, distances and loss functions used in discriminating brain networks (Banks and Carley, 1994; Chung et al., 2019c,b; Lee et al., 2012; Chen et al., 2016). Due to the ever increasing popularity of deep learning, which estimates model parameters by minimizing loss functions, there is also renewed interest in studying various loss functions. Many existing distances and losses simply ignore the underlying topology of the networks and often use Euclidean distances. Such distances often fail to capture underlying topological differences: they may lose sensitivity to topological structures such as connected components, modules and cycles in networks. Further, Euclidean norms are sensitive to outliers; a few extreme edge weights may severely affect the distance.

In standard graph theory based brain network analysis, the similarity of networks is measured by differences in graph theory features such as assortativity, betweenness centrality, small-worldness and network homogeneity (Bullmore and Sporns, 2009; Rubinov and Sporns, 2010; Uddin et al., 2008). Comparison of graph theory features appears to reveal changes of structural or functional connectivity associated with different clinical populations (Rubinov and Sporns, 2010). Since dense weighted brain networks are difficult to interpret and analyze, they are often turned into binary networks by thresholding edge weights (He et al., 2008; Wijk et al., 2010). The choice of threshold for the edge weights may alter the network topology.

∗ Correspondence to: M.K. Chung, Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, 1300 University Ave, 4725 Medical Science Center, Madison, 53706, USA
Email address: [email protected] (Moo K. Chung)

Preprint submitted to arXiv February 18, 2021

The multiple comparison correction over every possible connection was used in determining thresholds (Rubinov et al., 2009; Salvador et al., 2005; Wijk et al., 2010). The amount of sparsity of connections was also used in determining thresholds (Bassett, 2006; He et al., 2008; Wijk et al., 2010; Lee et al., 2011c). To address the inherent problem of thresholding, persistent homological brain network analysis was developed in 2011 (Lee et al., 2011a,b). Since then the method has been successfully applied to numerous brain network studies and shown to be a powerful alternative to existing brain network approaches (Chung et al., 2013, 2015a; Lee et al., 2012).

Persistent homology provides a coherent mathematical framework for quantifying brain images and networks (Carlsson and Memoli, 2008; Edelsbrunner and Harer, 2010b). Instead of looking at networks at a fixed threshold or resolution, persistent homology quantifies the overall changes of network topology over multiple scales (Edelsbrunner and Harer, 2010b; Horak et al., 2009; Zomorodian and Carlsson, 2005). In doing so, it reveals the most persistent topological features, which are robust to noise and scale. Persistent homology has been applied to a wide variety of data including sensor networks (Carlsson and Memoli, 2008), protein structures (Gameiro et al., 2015), RNA viruses (Chan et al., 2013) and image segmentation.

In persistent homology based brain network analysis, instead of analyzing networks at one fixed threshold that may not be optimal, we build a collection of nested networks over every possible threshold using the graph filtration, a persistent homological construct (Lee et al., 2011a, 2012; Chung et al., 2013, 2015a). The graph filtration is a threshold-free framework for analyzing a family of graphs, but requires hierarchically building specific nested subgraph structures. The graph filtration shares similarities with existing multi-thresholding or multi-resolution network models that use many different arbitrary thresholds or scales (Achard et al., 2006; He et al., 2008; Lee et al., 2012; Kim et al., 2015; Supekar et al., 2008). Such approaches are mainly used to visually display the dynamic pattern of how graph theoretic features change over different thresholds, and the pattern of change is rarely quantified. Persistent homology can quantify such dynamic patterns in a more coherent mathematical framework. In numerous studies, the persistent homological network approach has been shown to be very robust and to outperform many existing network measures and methods. In Lee et al. (2011a, 2012), persistent homology was shown to outperform eight existing graph theory features such as assortativity, betweenness centrality, clustering coefficient, characteristic path length, small-worldness, modularity and global network homogeneity. In Chung et al. (2017a), persistent homology was shown to outperform various matrix norms. In Wang et al. (2018), persistent homology using the persistence landscape was shown to outperform power spectral density and local variance methods. In Wang et al. (2017b), persistent homology was shown to outperform topographic power maps. In Yoo et al. (2017), center persistency was shown to outperform the network-based statistic and element-wise multiple corrections.

Existing statistical and machine learning methods for brain networks are based on distance and loss functions that are mostly Euclidean or matrix norms. Such distances are geometric in nature and not sensitive enough for the topological signal differences often observed in brain networks. Thus, it is necessary to use distances and loss functions that are topologically more sensitive.
Persistent homology offers a coherent mathematical framework for measuring network distances topologically. Various topological distances such as the Gromov-Hausdorff (GH) distance (Tuzhilin, 2016; Carlsson and Memoli, 2008; Carlsson and Mémoli, 2010; Chazal et al., 2009; Lee et al., 2011a, 2012), bottleneck distances (Lee et al., 2012, 2017; Chung et al., 2015a) and the Kolmogorov-Smirnov (KS) distance (Chung, 2012; Chung et al., 2017b; Lee et al., 2017) are available. The main purpose of this paper is to review such topological distances within the context of persistent homology.
2. Traditional distances and losses
In statistical analysis and machine learning, the loss function acts as a cost function, which needs to be minimized to determine the model fit. Widely used loss functions are often Euclidean, and have proven effective in traditional applications (Bishop, 2006). Here, we describe the most often used loss functions that have been applied in various brain image analysis tasks.

The mean square error (MSE) is the most often used loss. For a given model with n input data x_i, input labels y_i and model outputs ŷ_i, the MSE is given as

MSE = (1/n) ∑_{i=1}^{n} (y_i − ŷ_i)².   (1)

The MSE is based on the L2-norm between the model prediction and the input labels, and is the most often used loss in regression. In the regression setting, the MSE was used in identifying brain imaging predictors of memory performance (Wang et al., 2011), the estimation of kinetic constants from PET data (O'Sullivan and Saha, 1999), the correction of partial volume effects in arterial spin labeling MRI (Kim et al., 2018; Asllani et al., 2008) and EEG signal classification with neural networks (Kottaimalai et al., 2013). It is also the basis of the widely used k-means clustering (Jain et al., 1999), applied in identifying abnormalities in brain tumors (Arunkumar et al., 2019) and modeling state spaces in rsfMRI (Huang et al., 2020b). This loss was also used for image classification of autism spectrum disorder (Heinsfeld et al., 2018) and tumor segmentation of MR brain images (Mittal et al., 2019), among other learning tasks.

One problem associated with the MSE is that it heavily penalizes outliers due to its quadratic loss (Bishop, 2006).
The impact of outliers can be significantly reduced using the mean absolute error (MAE), which is based on the L1-norm:

MAE = (1/n) ∑_{i=1}^{n} |y_i − ŷ_i|.   (2)

The MAE was successfully used in clustering (Jain et al., 1999), tumor segmentation in MRI (Blessy and Sulochana, 2015), age prediction in deep learning (Cole et al., 2017) and the conversion of MRI to CT through deep learning (Wolterink et al., 2017).

Another common loss, often used in probabilistic model building, is the cross entropy L_ent, which was initially used in logistic regression and artificial neural networks. The output of the model is a probability p_i that a given input x_i belongs to a binary class:

L_ent = −∑_{i=1}^{n} [ y_i log p_i + (1 − y_i) log(1 − p_i) ].   (3)

Minimizing the cross entropy is equivalent to maximizing the likelihood of the product of Bernoulli distributions. The use of cross entropy over MSE can lead to faster training times and improved model generalization (Bishop, 2006). The cross entropy has been used in improving brain MRI segmentation (Moeskops et al., 2017), the prediction of intracranial pressure levels after brain injury (Chen et al., 2010) and in a machine learning interface for medical image analysis (Zhang and Kagen, 2017). Logistic regression has found use in a deep learning ensemble regression model for brain disease diagnosis (Suk et al., 2017) and in the classification of brain hemorrhages with convolutional neural networks (Jnawali et al., 2018).

Another broadly applied loss in the support vector machine and other classification tasks is the hinge loss, which uses the L∞-norm (Bishop, 2006). The hinge loss heavily penalizes incorrect classifications, as well as classifications that are correct but near the decision boundary. The hinge loss has been used in the classification of CT brain images with deep learning networks for the identification of Alzheimer's disease (Gao et al., 2017).
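As a concrete illustration, the three losses above can be written in a few lines of numpy. This is a minimal sketch with our own function names (not taken from any of the cited studies); the cross entropy here is averaged over samples:

```python
import numpy as np

def mse(y, y_hat):
    """Mean square error (1): average squared L2 residual."""
    return float(np.mean((np.asarray(y, float) - np.asarray(y_hat, float)) ** 2))

def mae(y, y_hat):
    """Mean absolute error (2): average L1 residual."""
    return float(np.mean(np.abs(np.asarray(y, float) - np.asarray(y_hat, float))))

def cross_entropy(y, p):
    """Binary cross entropy (3): negative Bernoulli log-likelihood, averaged."""
    y, p = np.asarray(y, float), np.asarray(p, float)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

# One outlying residual dominates the MSE far more than the MAE.
y     = np.array([0.0, 0.0, 0.0, 0.0])
y_hat = np.array([0.1, 0.1, 0.1, 10.0])
```

On this toy example the single outlying residual inflates the MSE to about 25.01 while the MAE stays near 2.58, illustrating the quadratic penalty discussed above.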
It has also been applied to data collected from working brains to guide machine learning algorithms (Fong et al., 2018).

2.2. Matrix norms as network distances

Many distance or similarity measures are not metrics, but having metric distances makes networks more stable due to the triangle inequality. Further, existing network distance concepts are often borrowed from metric space theory. Let us start by formulating brain networks as metric spaces. Brain networks are often algebraically represented as matrices of size p × p, where p is a predetermined number of parcellations; often 100-300 parcellations are used for this purpose. It is thus necessary to use distances or losses defined on the connectivity matrices. Consider a weighted graph or network X = (V, w) with node set V = {1, . . . , p} and edge weights w = (w_ij), where w_ij is the weight between nodes i and j. The edge weight is usually given by a similarity measure between the observed data on the nodes. Various similarity measures have been proposed. The correlation or mutual information between measurements for biological or metabolic networks, and the frequency of contact between actors for social networks, have been used as edge weights (Bassett, 2006; Bien and Tibshirani, 2011). We may assume that the edge weights satisfy the metric properties of nonnegativity, identity, symmetry and the triangle inequality:

w_ij ≥ 0,  w_ii = 0,  w_ij = w_ji,  w_ij ≤ w_ik + w_kj.

With these conditions, X = (V, w) forms a metric space. Although the metric property is not necessary for building a network, it offers many nice mathematical properties and an easier interpretation of network connectivity. Further, persistent homology is often built on top of metric spaces, and many real-world networks satisfy the metric properties.

Suppose a measurement vector x_i = (x_{1i}, · · · , x_{ni})⊤ ∈ ℝⁿ is given on node i.
The weight w = (w_ij) between nodes is often given by some bivariate function f: w_ij = f(x_i, x_j). The correlation between x_i and x_j, denoted corr(x_i, x_j), is such a bivariate function. If the weights w = (w_ij) are given by

w_ij = √(1 − corr(x_i, x_j)),

it can be shown that X = (V, w) forms a metric space.

The matrix norm of the difference between networks is often used as a measure of similarity between networks (Banks and Carley, 1994; Zhu et al., 2014). Given two networks X^1 = (V, w^1) and X^2 = (V, w^2), the L_l-norm of the network difference is given by

D_l(X^1, X^2) = ‖w^1 − w^2‖_l = ( ∑_{i,j} |w^1_{ij} − w^2_{ij}|^l )^{1/l}.

Note that L_l is the element-wise Euclidean distance in l dimensions. When l = ∞, the L∞-distance is written as

D_∞(X^1, X^2) = ‖w^1 − w^2‖_∞ = max_{∀i,j} |w^1_{ij} − w^2_{ij}|.

The element-wise differences may not capture additional higher order similarity; for instance, there might be relations between a pair of columns or rows (Zhu et al., 2014). Also, the L1- and L2-distances usually suffer from the problem of outliers: a few outlying extreme edge weights may severely affect the distance. Given two identical networks that only differ in one outlying edge with infinite weight (Figure 1), D_l(X^1, X^2) = ∞ and D_∞(X^1, X^2) = ∞. Thus, the usual matrix norm based distance is sensitive to even a single outlying edge, and it further ignores the underlying topological structure. Topological distances, in contrast, are not sensitive to a few outlying edge weights but to the underlying topology. Thus, there is a need to define distances that are more topological.

Figure 1: Toy networks with an outlying edge weight. All the L_l-norm based distances will give an infinite distance, while topological distances may not be severely affected by such outlying data.
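The L_l and L∞ network distances above are straightforward to compute. A minimal numpy sketch with our own naming, using a small 3-node example standing in for the toy networks of Figure 1 (with a large finite weight in place of the infinite one):

```python
import numpy as np

def network_distance(w1, w2, l=2):
    """Element-wise L_l distance between two edge weight matrices."""
    return float((np.abs(w1 - w2) ** l).sum() ** (1.0 / l))

def network_distance_inf(w1, w2):
    """L_infinity distance: the largest single edge weight difference."""
    return float(np.abs(w1 - w2).max())

# Two identical 3-node networks that differ in a single outlying edge.
w1 = np.array([[0.0, 1.0, 1.0],
               [1.0, 0.0, 1.0],
               [1.0, 1.0, 0.0]])
w2 = w1.copy()
w2[0, 1] = w2[1, 0] = 100.0   # one extreme edge weight
```

A single outlying edge drives every L_l based distance: here D_1 = 198 and D_∞ = 99, even though the two networks are topologically identical (one connected component, same cycle structure).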
The log-Euclidean distance was previously used in measuring the distance between correlation based brain networks (Qiu et al., 2015; Chung et al., 2015a), which are supposed to reside in the space of positive definite symmetric (PDS) matrices. Let S_p be the space of symmetric matrices of size p × p, and let P_p be the space of positive definite symmetric (PDS) matrices. Note that the dimension of S_p and P_p is p(p + 1)/2, i.e., S_p, P_p ⊂ ℝ^{p(p+1)/2}. While S_p is a flat Euclidean space, P_p is a curved manifold. Even though various computations in P_p are fairly involved, the use of the exponential map exp : S_p → P_p and its inverse makes computations in P_p tractable. The exponential map is realized with the matrix exponential. In practice, the spectral decomposition is mainly used in computing the matrix exponential as follows. For X ∈ S_p, there exists an orthogonal matrix Q such that

X = Q⊤ Λ Q,

with Λ = diag(λ_1, · · · , λ_p), the diagonal matrix consisting of the entries λ_1, · · · , λ_p. The matrix exponential is then computed as

exp X = Q⊤ (exp Λ) Q,

where exp Λ is the diagonal matrix consisting of the entries e^{λ_1}, · · · , e^{λ_p}. Similarly, for Y ∈ P_p, we have the decomposition Y = Q⊤ Λ Q with Λ = diag(λ_1, · · · , λ_p). The matrix logarithm of Y is then computed as

log Y = Q⊤ (log Λ) Q.

If a matrix Y is nonnegative definite with zero eigenvalues, its matrix logarithm is not defined since log 0 is not defined. Thus, we cannot apply the logarithm directly to rank-deficient large correlation and covariance matrices obtained from a small number of subjects. One way of applying the logarithm to nonnegative definite matrices is to make the matrix Y diagonally dominant by adding a diagonal matrix αI with a suitable choice of relatively large α (Chan and Wood, 1997). Alternately, we can fit a graphical LASSO-type sparse model and obtain the closest possible positive definite matrices (Chung et al., 2015a; Qiu et al., 2015; Mazumder and Hastie, 2012).

For X, Y ∈ P_p, the log-Euclidean metric is given by the Frobenius norm (Qiu et al., 2015):

D_LE(X, Y) = ‖log X − log Y‖_F = √( tr[ (log(X) − log(Y))² ] ).

The log-Euclidean distance was used in Qiu et al. (2015) to compute the mean of functional brain networks and the variation of each individual network around the mean.
If C_i denotes the brain network of the i-th subject, represented as a PDS correlation matrix, the average of all brain networks within the log-Euclidean framework is given by

C̄ = exp( (1/n) ∑_{i=1}^{n} log C_i ).

The metric was also used in the local linear embedding of brain functional networks (Qiu et al., 2015) and in regression for resting-state functional brain networks in the PDS space (Huang et al., 2020b).
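The log-Euclidean distance and mean above reduce to spectral decompositions. A minimal numpy sketch (function names ours), valid only for positive definite symmetric input:

```python
import numpy as np

def logm(X):
    """Matrix logarithm of a PDS matrix via its spectral decomposition."""
    lam, Q = np.linalg.eigh(X)
    return (Q * np.log(lam)) @ Q.T    # Q diag(log lam) Q^T

def expm(X):
    """Matrix exponential of a symmetric matrix via its spectral decomposition."""
    lam, Q = np.linalg.eigh(X)
    return (Q * np.exp(lam)) @ Q.T

def log_euclidean_distance(X, Y):
    """D_LE(X, Y) = ||log X - log Y||_F."""
    return float(np.linalg.norm(logm(X) - logm(Y), 'fro'))

def log_euclidean_mean(Cs):
    """Average of PDS matrices within the log-Euclidean framework:
    the matrix exponential of the mean of the matrix logarithms."""
    return expm(np.mean([logm(C) for C in Cs], axis=0))
```

For rank-deficient correlation matrices, `np.log(lam)` diverges at zero eigenvalues, which is exactly the failure mode that the diagonal dominance or graphical LASSO regularizations above are meant to avoid.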
Graph matching is a well-established method for matching two different graphs via combinatorial optimization. The method is often used in distributed controls and computer vision (Zavlanos and Pappas, 2008). Suppose two graphs X^1 = (V^1, w^1) and X^2 = (V^2, w^2) with p nodes are given. If A^1 and A^2 are the adjacency matrices of X^1 and X^2, the graph matching cost function is given by

D_GM(X^1, X^2) = min_Q ‖Q A^1 Q⊤ − A^2‖,

where Q is a permutation matrix that shuffles the node indices. Two graphs are isomorphic if D_GM = 0. All isomorphic graphs have the same structure, since one obtains A^2 from A^1 by relabelling the nodes. However, not every two graphs are isomorphic. If λ_{i1} ≥ λ_{i2} ≥ · · · ≥ λ_{ip} are the eigenvalues of A^i, then it can be shown that (Zavlanos and Pappas, 2008)

‖Q A^1 Q⊤ − A^2‖² ≥ ∑_{j=1}^{p} (λ_{1j} − λ_{2j})²,

which is a lower bound for the graph matching cost D_GM.

The main limitation of graph matching is the exponential run time 2^{O(√(p log p))}, which does not scale well for large p (Babai and Luks, 1983). The other limitation is that if the sizes of the node sets do not match, it is difficult to apply a graph matching algorithm. Additional nodes can be augmented to match the node sets, but the augmentation can be somewhat arbitrary (Guo and Srivastava, 2020). Graph matching has been used in matching and averaging heterogeneous tree structures such as brain artery trees and neuronal trees (Guo and Srivastava, 2020). If the same brain parcellation is used across subjects, we do not need to realign node labels, so graph matching has not seen many applications in brain network analysis. However, with the availability of many different parcellations, it might be useful for matching networks across different parcellations.

Many existing approaches for measuring the distance between brain networks assume the sizes of the networks to be the same; otherwise, it is difficult to define distance and loss functions. The canonical correlation is perhaps one of the few statistical similarity measures that enables computing the distance between measurements of different sizes (Hotelling, 1992). It might be useful for comparing brain networks obtained from parcellations of different sizes. Given two vectors X = (x_1, · · · , x_n)⊤ and Y = (y_1, · · · , y_m)⊤, the canonical correlation between X and Y is given by

ρ = max_{a,b} corr(a⊤X, b⊤Y).

Various numerical methods are available for computing ρ.
Numerically, the canonical correlation is mainly computed via the singular value decomposition (SVD). It can be computed using canoncorr in MATLAB and cancor or the CCA package in R. For brain connectivity matrices, we vectorize only the upper triangular part of the connectivity matrices and compute the canonical correlations between the vectorized networks.

The canonical correlation has been widely used in various applications, including deep learning (Andrew et al., 2013) and multiview clustering (Chaudhuri et al., 2009). In brain imaging, it is mainly used in correlating measurements from different modalities. In Avants et al. (2010), fractional anisotropy values from diffusion tensor imaging and cortical thickness from MRI were correlated using canonical correlations. In Correa et al. (2009), fMRI, structural MRI and EEG measurements were correlated using canonical correlations.
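A minimal SVD-based sketch of the computation in numpy (a standard QR-plus-SVD construction; the helper name is ours, not from the cited software):

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between the columns of X (n x p) and Y (n x q),
    computed as the singular values of the product of orthonormal bases
    of the two centered data blocks."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(X)   # orthonormal basis of the column space of X
    Qy, _ = np.linalg.qr(Y)
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)
```

When the column space of Y lies inside that of X (e.g., Y = XA for a full rank A), every canonical correlation equals 1; for independent noise, the leading correlation falls below 1. Note that X and Y may have different numbers of columns, which is what makes the measure usable across parcellations of different sizes.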
3. Preliminary: Persistent homology
We start with the basic mathematical understanding of per-sistent homology.
A high dimensional object can be approximated by point cloud data X consisting of p points. If we connect points whose distances satisfy a given criterion, the connected points start to recover the topology of the object. Hence, we can represent the underlying topology as a collection of subsets of X consisting of connected nodes (Edelsbrunner and Harer, 2010b; Hart, 1999). Given point cloud data X with a rule for connections, the resulting topological space is a simplicial complex and its elements are simplices (Zomorodian, 2009). For point cloud data, the Delaunay triangulation is probably the most widely used method for connecting points. The Delaunay triangulation represents the collection of points in space as a graph whose faces consist of triangles. Another way of connecting point cloud data is the Rips complex, often studied in persistent homology.

Figure 2: A simplicial complex with 5 vertices and the 2-simplex σ = [v_1, v_2, v_3] with a filled-in face (colored gray). After the boundary operation ∂_2, we are only left with the 1-simplices [v_1, v_2] + [v_2, v_3] + [v_1, v_3], which form the boundary of the filled-in triangle. The complex has a single connected component (β_0 = 1) and a single 1-cycle. The dotted red arrows indicate the orientation of the simplices.

Homology is an algebraic formalism that associates a sequence of objects with a topological space (Edelsbrunner and Harer, 2010b). In persistent homology, the algebraic formalism is usually built on top of objects that are hierarchically nested, such as Morse filtrations, graph filtrations and dendrograms. Formally, homology usually refers to homology groups, which are often built on top of a simplicial complex for point cloud and network data (Lee et al., 2014).

The k-simplex σ is the convex hull of k + 1 vertices v_0, v_1, · · · , v_k. A point is a 0-simplex, an edge is a 1-simplex, and a filled-in triangle is a 2-simplex. A simplicial complex is a finite collection of simplices such as points (0-simplices), lines (1-simplices), triangles (2-simplices) and their higher dimensional counterparts (Edelsbrunner and Harer, 2010b). A k-skeleton is a simplicial complex of up to k-simplices. Hence a graph is a 1-skeleton consisting of 0-simplices (nodes) and 1-simplices (edges). There are various simplicial complexes; the most often used simplicial complex in persistent homology is the Rips complex.

Boundary operations are very useful for effectively quantifying persistent homology. Let C_k be the collection of k-simplices. We define the k-th boundary operator

∂_k : C_k → C_{k−1}

that removes the filled-in interior of k-simplices. Consider the filled-in triangle σ = [v_1, v_2, v_3] ∈ C_2 with the three vertices v_1, v_2, v_3 in Figure 2. The boundary operator ∂_2 applied to σ results in the collection of three edges that forms the boundary of σ:

∂_2 σ = [v_1, v_2] + [v_2, v_3] + [v_1, v_3] ∈ C_1.   (4)

If we give a direction or orientation to the edges such that

[v_1, v_2] = −[v_2, v_1],

and use the edge notation e_{ij} = [v_i, v_j], we can write (4) as

∂_2 σ = e_{12} + e_{23} + e_{31} = e_{12} + e_{23} − e_{13}.
We can apply the boundary operation ∂_1 to ∂_2 σ and obtain

∂_1 ∂_2 σ = ∂_1 e_{12} + ∂_1 e_{23} + ∂_1 e_{31} = v_2 − v_1 + v_3 − v_2 + v_1 − v_3 = 0.

Applying the boundary operation twice always results in an empty set. Such an algebraic representation of boundary operations is very useful for effectively quantifying persistent homology.

The boundary operator ∂_k can be represented using the boundary matrix, the higher dimensional version of the incidence matrix (Lee et al., 2018, 2019; Schaub et al., 2018), which shows how (k − 1)-dimensional simplices bound each k-dimensional simplex. The boundary matrix ∂_k is a matrix of size n × m consisting of ones and zeros, where n is the total number of (k − 1)-dimensional simplices and m is the total number of k-dimensional simplices in the simplicial complex. Index the (k − 1)-dimensional simplices as σ_1, · · · , σ_n and the k-dimensional simplices as τ_1, · · · , τ_m. The (i, j)-th entry of ∂_k is one if σ_i is a face of τ_j and zero otherwise; the entry can be −1 depending on the orientation of σ_i. For the simplicial complex in Figure 2, ∂_1 is a 5 × 6 matrix whose j-th column contains −1 and 1 in the rows of the tail and head vertices of the j-th edge, and ∂_2 is a 6 × 1 matrix with entries 1, 1, −1 in the rows of e_{12}, e_{23}, e_{13} and zeros elsewhere. Although we put direction into the boundary matrices by adding signs, the Betti number computation is invariant to the choice of orientation.

With boundary operations, we can build a vector space C_k using the set of k-simplices as a basis. The vector spaces C_k, C_{k−1}, C_{k−2}, · · · are then sequentially nested by the boundary operators (Edelsbrunner and Harer, 2010b):

· · · --∂_{k+1}--> C_k --∂_k--> C_{k−1} --∂_{k−1}--> C_{k−2} --∂_{k−2}--> · · · .   (5)

Such a nested structure is called a chain complex. Let B_k be the collection of boundaries obtained as the image of ∂_{k+1}, i.e.,

B_k = {∂_{k+1} σ : σ ∈ C_{k+1}}.

Let Z_k be the collection of cycles obtained as the kernel of ∂_k, i.e.,

Z_k = {σ ∈ C_k : ∂_k σ = 0}.

For instance, the 1-cycle formed by the edges e_{12}, e_{23}, e_{13} in Figure 2 is the boundary of the filled-in gray colored triangle σ. The boundaries B_k form subgroups of the cycles Z_k, i.e., B_k ⊂ Z_k. We can partition Z_k into cycles that differ from each other by boundaries through the quotient space

H_k = Z_k / B_k,

the k-th homology group. The elements of the k-th homology group are often referred to as k-dimensional cycles or k-cycles. The k-th Betti number β_k is the number of k-dimensional cycles, which is given by the rank of H_k, i.e.,

β_k = rank(H_k) = rank(Z_k) − rank(B_k).   (6)

The 0-th Betti number is the number of connected components, while the 1-st Betti number is the number of cycles. The Betti numbers β_k are usually computed algebraically by reducing the boundary matrix ∂_k to the Smith normal form, which has a block diagonal matrix as a submatrix in the upper left, via Gaussian elimination (Edelsbrunner and Harer, 2010b). For instance, the boundary matrices ∂_1 and ∂_2 in Figure 2 are transformed to the Smith normal forms S(∂_1) and S(∂_2) after Gaussian elimination.
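The boundary relation ∂_1∂_2 = 0 and the Betti computation (6) can be checked numerically. A minimal sketch for a single filled-in triangle (a smaller complex than that of Figure 2; matrix ranks stand in for the Smith normal form reduction):

```python
import numpy as np

# Boundary matrices of a single filled-in triangle [v1, v2, v3]
# with oriented edges e12, e23, e13.
d1 = np.array([[-1,  0, -1],   # v1
               [ 1, -1,  0],   # v2
               [ 0,  1,  1]])  # v3
d2 = np.array([[ 1],           # e12
               [ 1],           # e23: boundary of sigma = e12 + e23 - e13
               [-1]])          # e13

def betti(d_k, d_k_plus_1, n_k):
    """beta_k = rank(Z_k) - rank(B_k) = (n_k - rank(d_k)) - rank(d_{k+1}),
    as in (6), with ranks computed numerically."""
    rank = lambda M: int(np.linalg.matrix_rank(M)) if M.size else 0
    return n_k - rank(d_k) - rank(d_k_plus_1)

d0 = np.zeros((0, 3))          # the 0-th boundary map is the zero map
b0 = betti(d0, d1, 3)          # number of connected components
b1 = betti(d1, d2, 3)          # number of independent 1-cycles
```

The filled-in triangle is connected (β_0 = 1) and its only 1-cycle bounds the filled face (β_1 = 0); dropping the 2-simplex column d2 opens the cycle and raises β_1 to 1. The product d1 @ d1.T is the graph Laplacian (the 0-th Hodge Laplacian discussed next), whose kernel dimension reproduces β_0.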
In the Smith normal form S(∂_k), the number of columns containing only zeros gives rank(Z_k), the number of k-cycles, while the number of rows containing a one gives rank(B_{k−1}), the number of (k − 1)-boundaries. For the complex in Figure 2, rank(Z_0) = 5 and rank(B_0) = 4 are read off from S(∂_1), while rank(Z_1) = 2 and rank(B_1) = 1 are read off from S(∂_2). Thus, we have

β_0 = rank(Z_0) − rank(B_0) = 5 − 4 = 1,
β_1 = rank(Z_1) − rank(B_1) = 2 − 1 = 1.

The Betti numbers can also be computed using the Hodge Laplacian without Gaussian elimination. The standard graph Laplacian is defined as

∆_0 = ∂_1 ∂_1⊤,

which is also called the 0-th Hodge Laplacian (Lee et al., 2018). In general, the k-th Hodge Laplacian is defined as

∆_k = ∂_{k+1} ∂_{k+1}⊤ + ∂_k⊤ ∂_k.

The boundary operator ∂_k only depends on (k − 1)- and k-simplices. Thus, ∆_k is uniquely determined by the (k + 1)-, k- and (k − 1)-simplices. The k-th Laplacian is a sparse n_k × n_k positive semi-definite symmetric matrix, where n_k is the number of k-simplices in the network (Friedman, 1998). The k-th Betti number β_k is the dimension of ker ∆_k, which is obtained by computing the rank of ∆_k. The 0-th Betti number, the number of connected components, is computed from ∆_0, while the 1-st Betti number, the number of cycles, is computed from ∆_1. After a lengthy algebraic derivation, we can show that

∆_k = D − A_u + (k + 1) I_{n_k} + A_l,

where A_u and A_l are the upper and lower adjacency matrices between the k-simplices and D = diag(deg(σ_1), · · · , deg(σ_{n_k})) is the diagonal matrix consisting of the degrees of the simplices σ_j (Muhammad and Egerstedt, 2006).

Figure 3: Rips filtration on the 1-skeleton (Rips complex consisting of only nodes and edges) of point cloud data sampled along an underlying key shaped object. If two points are within the given radius, we connect them with an edge.

The Rips complex has been the main building block for persistent homology and is defined on top of point cloud data (Ghrist, 2008). The
Rips complex is a simplicial complex constructed by connecting two data points if they are within a specified distance ε. Figure 3 shows an example of a Rips complex that approximates an object with a point cloud. Given point cloud data, the Rips complex R_ε is a simplicial complex whose k-simplices correspond to unordered (k + 1)-tuples of points within pairwise distance ε (Ghrist, 2008). While a graph has at most 1-simplices, the Rips complex can have simplices of every dimension k. The Rips complex has the nesting property

R_{ε_0} ⊂ R_{ε_1} ⊂ R_{ε_2} ⊂ · · ·

for 0 = ε_0 ≤ ε_1 ≤ ε_2 ≤ · · · . When ε = 0, the Rips complex is simply the node set V. By increasing the filtration value ε, we connect more nodes, so the size of the edge set increases. Such a nested sequence of Rips complexes is called a Rips filtration, the main object of interest in persistent homology (Edelsbrunner and Harer, 2008). The increasing ε values are called the filtration values.

One major problem of the Rips complex is that as the number of vertices p increases, the resulting simplicial complex becomes very dense. Further, as the filtration value increases, there exists an edge between every pair of vertices and a filled triangle between every triple of vertices. At higher filtration values, the Rips filtration thus becomes a very ineffective representation of the data.

Figure 4: The births and deaths of connected components in the sublevel sets in a Morse filtration (Chung et al., 2009a). We have local minimums a < b < d < f and local maximums c < e. At y = a, we have a single connected component (gray area). As we increase the filtration value to y = b, we have the birth of a new component (second gray area). At the local maximum y = c, the two sublevel sets merge together to form a single component. This is viewed as the death of a component. The process continues until we exhaust all the critical values. Following the Elder rule, we pair births to deaths: (c, b) and (e, d). Other critical values are paired similarly. These paired points form the persistence diagram (PD).

A function is called a
Morse function if all of its critical values are distinct and non-degenerate, i.e., the Hessian does not vanish (Milnor, 1973). For a 1D Morse function y = f(t), define the sublevel set R(y) as

R(y) = f^{−1}(−∞, y] = {t ∈ ℝ : f(t) ≤ y}.

The sublevel set is the part of the domain of f satisfying f(t) ≤ y. As we increase the height y_1 ≤ y_2 ≤ y_3 ≤ · · · , the sublevel sets get bigger such that

R(y_1) ⊂ R(y_2) ⊂ R(y_3) ⊂ · · · .

The sequence of the sublevel sets forms a
Morse filtration with filtration values y_1, y_2, y_3, · · · . Let β_0(y) be the 0-th Betti number of R(y), which counts the number of connected components in R(y). The number of connected components is the most often used topological invariant in applications (Edelsbrunner and Harer, 2008). β_0(y) only changes its value as y passes through critical values (Figure 4). The births and deaths of connected components in the Morse filtration are characterized by the pairing of local minimums and maximums. For 1D Morse functions, we do not have higher dimensional topological features beyond the connected components.

Let us denote the local minimums as g_1, · · · , g_m and the local maximums as h_1, · · · , h_n. Since the critical values of a Morse function are all distinct, we can combine all minimums and maximums and reorder them from the smallest to the largest:

g_1 = z_{(1)} < z_{(2)} < · · · < z_{(m+n)} = h_n,

where z_i is either h_i or g_i and z_{(i)} denotes the i-th smallest number among z_1, · · · , z_{m+n}. For a Morse function on the unbounded domain ℝ, g_1 is smaller than h_1 and g_m is smaller than h_n (Chung et al., 2009a). By keeping track of the births and deaths of components, it is possible to compute topological invariants of sublevel sets such as the 0-th Betti number β_0 (Edelsbrunner and Harer, 2008).

Figure 5: A comparison of the barcodes obtained from a sample with no measurement error (top) and a sample containing a single measurement error (red point). While both point clouds may be taken from the same population, the presence of a single measurement error can dramatically impact the resulting barcodes. B_1 has mostly features with approximately equal lifetimes (τ_i − ξ_i), whereas B_2 has a feature with a large lifetime corresponding to the improperly measured point. Simply looking at the barcodes, it is difficult to differentiate topological signal from noise.
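Counting β_0 over a Morse filtration can be sketched numerically for a function sampled on a regular grid: each component of the sublevel set is a maximal run of grid points below the threshold. The grid and the function here are our own toy example, not from the cited studies:

```python
import numpy as np

def beta0_sublevel(f, y):
    """Number of connected components of the sublevel set {t : f(t) <= y}
    for a 1D function sampled on a regular grid: count the starts of
    maximal runs of below-threshold grid points."""
    below = f <= y
    starts = below & ~np.r_[False, below[:-1]]   # True where a run begins
    return int(starts.sum())

# Toy Morse-like function with local minima near t = -1 and t = 1
# (value 0) and a local maximum at t = 0 (value 1).
t = np.linspace(-2.0, 2.0, 401)
f = (t ** 2 - 1.0) ** 2
```

Sweeping the threshold upward reproduces the birth-merge pattern of Figure 4: below the minima (y < 0) there is no component, between the minima and the interior maximum there are two, and above the maximum they merge into one.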
As we move y from −∞ to ∞, at a local minimum the sublevel set adds a new component so that

β_0(g_i + ε) = β_0(g_i − ε) + 1

for sufficiently small ε. This process is called the birth of the component. The newly born component is identified with the local minimum g_i. Similarly, at a local maximum two components merge into one so that

β_0(h_i + ε) = β_0(h_i − ε) − 1.

This process is called the death of a component. Since the number of connected components only changes when we pass through critical points, we can iteratively compute β_0 at each critical value as

β_0(z_(i+1)) = β_0(z_(i)) ± 1.

The sign depends on whether z_(i+1) is a maximum (−1) or a minimum (+1). To reduce the effect of low signal-to-noise ratio and to obtain a smooth Morse function, either spatial or temporal smoothing is often applied to brain imaging data before persistent homology is applied. In Chung et al. (2015a); Lee et al. (2017), Gaussian kernel smoothing was applied to 3D volumetric images. In Wang et al. (2018), diffusion was applied to temporally smooth data.

3.4. Persistent diagrams & barcodes

In persistent homology, the topology of underlying data can be represented by the birth and death of cycles. In a graph, the 0D and 1D cycles are connected components and cycles (Carlsson et al., 2008). During a filtration, cycles in a homology group appear and disappear. If a cycle appears at birth value ξ and disappears at death value τ, it can be encoded into a point (ξ, τ) (0 ≤ ξ ≤ τ < ∞) in R². If m cycles appear during the filtration of a network X = (V, w), the homology group can be represented by a point set

P(X) = {(ξ_1, τ_1), . . . , (ξ_m, τ_m)}.

This scatter plot is called the persistence diagram (PD) (Cohen-Steiner et al., 2007). A barcode encodes the birth time ξ_i and death time τ_i of a cycle as an interval (ξ_i, τ_i). The PD and barcode encode topologically equivalent information. The length of a bar, τ_i − ξ_i, is the persistence of the cycle (Figure 5) (Edelsbrunner and Harer, 2010b). Longer bars correspond to more persistent cycles that are considered topological signal, while shorter bars correspond to topological noise. However, such interpretation has recently been disputed and it may be possible to have meaningful signal even in shorter bars (Bubenik et al., 2020).

The birth and death of connected components in the sublevel sets of a function can be quantified and visualized by the persistence diagram (PD) (Figure 4). Consider a smooth Morse function y = f(x) with unique critical values a < b < c < d < e < f that estimate the underlying data.
Such a function can be estimated using kernel smoothing methods (Wang et al., 2018). Then consider the Morse filtration that sweeps the y value from −∞ to ∞. Each time the horizontal line touches a local minimum a < b < d < f, a new component that contains the local minimum is born. Each time the line touches a local maximum c < e, two components in the sublevel set merge together. This is considered the death of a component. Following the Elder Rule, when we pass a maximum and merge two components, we pair the maximum (death) with the higher of the minimums of the two components (birth) (Edelsbrunner and Harer, 2008, 2010b; Zomorodian and Carlsson, 2005). Doing so, we are pairing the birth of a component to its death. Such paired points (b, c) and (d, e) form a persistence diagram. In general, the PD can be algebraically represented as the weighted sum of Dirac delta functions

I(x, y) = Σ_{i=1}^m c_i δ(x − ξ_i, y − τ_i)   (7)

with Σ_{i=1}^m c_i = 1. Then I(x, y) becomes a 2D probability distribution satisfying

∫∫_{y > x} I(x, y) dx dy = 1.

Such a probabilistic representation enables us to access the vast library of distance and similarity measures between probability distributions.

Figure 6: Heat kernel smoothing was performed on cortical thickness to obtain a smooth Morse function, which was projected onto a sphere (Chung et al., 2009b). The critical points are identified by comparing cortical thickness around each mesh vertex. The white (black) crosses are local minimums (maximums). They will be paired following the Elder Rule to obtain the reduced PD.

Given a 1D functional signal f observed at time points t_1, ··· , t_n, the critical points can be numerically estimated by checking sign changes in the finite differences f(t_{i+1}) − f(t_i).
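The sweep just described, births at local minima and deaths at local maxima paired by the Elder Rule, can be sketched with a small union-find pass over the sampled values. This is a hypothetical helper (`persistence_0d` is not a function from the papers cited), processing samples in increasing order and merging with already-processed neighbors:

```python
def persistence_0d(f):
    """0D persistence pairs (birth, death) of a sampled 1D function under the
    sublevel-set filtration, with merges resolved by the Elder Rule."""
    n = len(f)
    parent = list(range(n))
    birth = list(f)                      # birth value of each component root

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs, seen = [], [False] * n
    for i in sorted(range(n), key=f.__getitem__):
        seen[i] = True
        for j in (i - 1, i + 1):         # merge with processed neighbors
            if 0 <= j < n and seen[j]:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # the component with the higher minimum (the younger) dies here
                old, young = (ri, rj) if birth[ri] <= birth[rj] else (rj, ri)
                if birth[young] < f[i]:  # skip zero-persistence pairs
                    pairs.append((birth[young], f[i]))
                parent[young] = old
    pairs.append((min(f), float("inf")))  # the global minimum never dies
    return sorted(pairs)

# same toy signal: minima at heights 0 and 1, one maximum at height 2
print(persistence_0d([3, 2, 1, 0, 1, 2, 1.5, 1, 1.5, 2, 3]))
# → [(0, inf), (1, 2)]: the minimum at 1 is paired with the maximum at 2
```

The pair (1, 2) is exactly the Elder Rule pairing: when the two components merge at the maximum, the younger component (born at the higher minimum) dies.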
The estimation of critical points in higher dimensional functions is done similarly by checking the signs of neighboring voxels or nodes (Figure 6) (Osher and Fedkiw, 2003; Chung et al., 2009a).

For higher dimensional Morse functions, saddle points can also create or merge sublevel sets, so we also have to account for them. Since there are no saddle points in 1D Morse functions, we do not need to worry about them for 1D functional signals. The addition of saddle points makes the construction of persistence diagrams much more complex. However, saddle points do not yield a clear statistical interpretation compared to local minimums and maximums. In fact, there are no statistical methods or applications developed for saddle points in the literature. Thus, they are often removed in the reduced Morse filtration (Chung et al., 2009b; Pachauri et al., 2011).

The use of critical points and values within image analysis and computer vision has been relatively limited so far, typically appearing as part of simple image processing tasks such as feature extraction and edge detection (Antoine et al., 1996; Cootes et al., 1993; Sato et al., 1998). The first or second order image derivatives may be used to identify the edges of objects to serve as the contour of an anatomical shape. In this context, image derivatives are computed after image smoothing and thresholded to obtain edges and ridges of images, which are used to identify voxels likely to lie on the boundaries of anatomical objects. Then, the collection of critical points is used as a geometric feature that characterizes anatomical shape. Specific properties of critical values as a topic on its own, however, have received less attention so far. One reason is that it is difficult to construct a streamlined linear analysis framework using critical points or values.
In brain imaging, on the other hand, the use of extreme values has been quite popular in the context of multiple comparison correction using the random field theory (Worsley et al., 1996; Taylor and Worsley, 2008; Kiebel et al., 1999). In the random field theory, the distribution of the extreme value of a statistic is used to compute the p-value for correcting for correlated noise across neighboring voxels.

4. Why is statistical analysis in TDA hard?

The persistence diagrams and equivalent barcodes are the original and most often used descriptors in persistent homology. However, due to the heterogeneous nature of PDs, statistical analysis of PDs and barcodes has been very difficult. It is not even clear how to average PDs, the first critical step in building statistical frameworks. Even the Fréchet mean is not unique, rendering it a challenging statistical issue to perform inference directly on PDs (Bubenik, 2015; Chung et al., 2009a; Heo et al., 2012). To remedy the problem, various methods including the persistence landscape (Bubenik, 2015) and persistence images have been proposed.

Even though barcodes possess similar stability properties as PDs (Bauer and Lesnick, 2013), the statistical analysis of barcodes is difficult. The difficulty is due to the heterogeneous algebraic representation as an unordered multiset of intervals

B = {(ξ_1, τ_1), (ξ_2, τ_2), ··· , (ξ_m, τ_m)},

where ξ_i and τ_i represent the birth and death times of the i-th topological feature. Barcodes may not match across different networks. Also, performing averaging, and subsequently constructing test statistics, is more difficult as there is no unique Fréchet mean in the space of barcodes (Kališnik, 2019; Adcock et al., 2013; Hofer et al., 2019; Turner et al., 2014).

Another important aspect is the inability of the barcode representation, and persistent homology in general, to distinguish between topological features that are intrinsic to the data and noise (Blumberg et al., 2014).
Consider point cloud data coming from some measurements (Figure 5). The four points on top are all clustered together. On the other hand, one measurement (red point) can be far from the cluster as the result of measurement error. The noisy point produces a connected component with large persistence, making the barcode B_2 very different from the barcode B_1 of the data with no measurement error.

Due to all these limitations, barcodes are often made into summary statistics such as the minimum, maximum and average length of a collection of barcodes (Pun et al., 2018). The accumulated persistence function (APF) accumulates the sum of the barcodes into a scalar value:

APF(t) = Σ_{i: μ_i ≤ t} (τ_i − ξ_i),

where μ_i = (τ_i + ξ_i)/2 is the center of the i-th bar. Since the APF is monotonically increasing over the parameter t, it may be possible to construct the KS-distance between APFs and perform the exact topological inference (Chung et al., 2017b). The APF was used in the analysis of brain artery trees (Biscio and Møller, 2019) and in determining if rs-fMRI is topologically stationary over time (?).

Figure 7: A comparison of the barcodes for an ordered set of points (top) and a relatively unordered set of points (bottom). In the structured system, most components have equal birth (ξ_i) and death (τ_i) times, which results in equal lifetimes for each topological feature (τ_i − ξ_i) and a high persistence entropy E(B_1). For the unordered system B_2, the components have different death points. Thus the unordered system will have a lower persistence entropy, i.e., E(B_1) > E(B_2).

Barcodes can also be quantified in terms of their entropy E(B), which is another summary statistic (Atienza et al., 2018). Let l_i = τ_i − ξ_i be the length of the i-th bar and L = Σ_i l_i be the total length of all the bars. Define the fraction of length of the i-th bar as p_i = l_i / L.
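These two barcode summaries, the APF and the entropy built from the length fractions p_i, can be sketched in a few lines on hypothetical finite bars (`apf` and `persistence_entropy` are illustrative names, not implementations from the cited papers):

```python
import math

def apf(bars, t):
    """Accumulated persistence function: total lifetime tau - xi of the
    bars whose center mu = (xi + tau)/2 lies at or below t."""
    return sum(tau - xi for xi, tau in bars if (xi + tau) / 2 <= t)

def persistence_entropy(bars):
    """E(B) = -sum p_i log p_i, with p_i the bar length over total length."""
    lengths = [tau - xi for xi, tau in bars]
    total = sum(lengths)
    return -sum(l / total * math.log(l / total) for l in lengths)

bars = [(0.0, 1.0), (0.5, 1.0), (1.0, 3.0)]   # hypothetical finite bars
print(apf(bars, 1.0))   # 1.5: the two bars centered at 0.5 and 0.75 count
print(persistence_entropy([(0, 1), (0, 1), (0, 1)]))  # equal bars: log 3
```

Note that the APF is monotone in t, and equal-length bars maximize the entropy, matching the discussion of ordered versus unordered systems below.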
Then the entropy E(B) is defined as

E(B) = −Σ_i p_i log p_i.   (8)

This provides a single scalar measurement of the topological disorder of a given system (Atienza et al., 2018). The barcode entropy has been shown to outperform graph theoretic entropies, such as connectivity entropy and Von Neumann entropy, when characterizing complex networks, and has been used in identifying pre-ictal and ictal states in epilepsy patients (Rucco et al., 2016; Merelli et al., 2015, 2016).

Figure 7 illustrates how the entropy differs for two systems. For the ordered system B_1, ignoring the component that dies at ∞, the bars have equal lengths so that p_1 = p_2 = p_3 = 1/3 and E(B_1) = log 3 ≈ 1.1. For the unordered system B_2, the bars have unequal lengths, yielding unequal fractions p_i and a lower entropy E(B_2) ≈ 1.0. Thus, the ordered system has higher persistence entropy E(B_1) > E(B_2). It would be interesting to investigate if brain networks exhibit lower persistence entropy.

The PD and equivalent barcodes are the original and most often used descriptors in persistent homology. Even though the PD possesses desirable bounding properties such as Lipschitz stability with respect to the bottleneck distance, statistical analysis on PDs is difficult since it is not clear-cut how to define the average PD. Further, the Fréchet mean is not unique in PD, rendering it a challenging statistical issue to perform inference directly on PDs (Chung et al., 2009a; Heo et al., 2012). To remedy the problem, the persistent landscape was first proposed in Bubenik (2015), which maps the PD to a Hilbert space.

Given a bar (ξ_i, τ_i), we can define the piecewise linear bump function h_i as

h_i(ε) = max(min(ε − ξ_i, τ_i − ε), 0).   (9)

The geometric representation of the bump function (9) is a right-angled isosceles triangle with height equal to half of the base of the corresponding interval in the barcode (Wang et al., 2019; Bubenik, 2020). Given bars (ξ_1, τ_1), ··· , (ξ_m, τ_m), the persistent landscape λ_k(ε) is defined as

λ_k(ε) = k-th largest value of h_i(ε).

With this new representation, it is possible to have a unique mean landscape λ̄(ε) over subjects, which is simply the average of the persistent landscapes at each filtration value ε. Such an operation enables the construction of the usual test statistic. Persistent landscapes have been applied to epilepsy EEG (Wang et al., 2018, 2019), functional brain networks in fMRI (Stolz et al., 2017, 2018) and neuroanatomical shape modeling in MRI (Garg et al., 2017).
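A direct transcription of the bump function (9) and the landscape definition, evaluated on hypothetical bars (`bump` and `landscape` are illustrative names):

```python
def bump(xi, tau, eps):
    """Piecewise linear bump of the bar (xi, tau), eq. (9):
    a triangle rising from xi, peaking at (xi + tau)/2, vanishing at tau."""
    return max(min(eps - xi, tau - eps), 0.0)

def landscape(bars, k, eps):
    """k-th persistence landscape: the k-th largest bump value at eps."""
    vals = sorted((bump(xi, tau, eps) for xi, tau in bars), reverse=True)
    return vals[k - 1] if k <= len(vals) else 0.0

bars = [(0.0, 1.0), (0.25, 0.75)]   # a tall triangle with a nested one
print(landscape(bars, 1, 0.5))      # 0.5: peak of the taller triangle
print(landscape(bars, 2, 0.5))      # 0.25: peak of the nested triangle
```

Averaging `landscape(bars_s, k, eps)` across subjects s at each fixed ε gives the unique mean landscape used to build test statistics.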
However, since scatter points in the PD are converted to piecewise linear functions, statistical analysis on persistent landscapes is not necessarily any easier than before. Thus, additional transformations on persistent landscapes are needed for statistical analysis. Since λ_1(ε) ≥ λ_2(ε) ≥ λ_3(ε) ≥ ··· at each fixed ε, it is possible to build a monotone sequence of scalar values by computing the area under each function λ_k(ε) and perform the exact topological inference (Wang et al., 2019).

Due to the heterogeneous nature of persistence diagrams (PD), it is difficult to even average PDs and perform statistical analysis. Starting with Chung et al. (2009a), various smoothing methods were applied to PDs such that averaging and statistical analysis can be directly performed on the smoothed PD. Chung et al. (2009a) discretized the PD using a uniform square grid. A concentration map is then obtained by counting the number of points in each pixel, which is equivalent to smoothing the PD with a uniform kernel. Notice that this approach is somewhat similar to voxel-based morphometry (Ashburner and Friston, 2000), where brain tissue density maps are used as a shapeless metric for characterizing the concentration of the amount of tissue. In Reininghaus et al. (2015), the persistence scale-space kernel approach was proposed, where points in the PD are treated as heat sources modeled as Dirac deltas and diffusion is run on the PD. Ignoring symmetry across y = x, diffusion of the heat sources is given by the convolution of the Gaussian kernel

K_σ(x, y) = 1/(2πσ²) e^{−(x² + y²)/(2σ²)}

with the Dirac delta functions as

f(x, y) = Σ_{i=1}^m K_σ ∗ δ(ξ_i, τ_i) = Σ_{i=1}^m 1/(2πσ²) e^{−[(x − ξ_i)² + (y − τ_i)²]/(2σ²)}.

This representation is known as the persistence surface (Adams et al., 2017). The heat diffusion f(x, y) is then discretized by integrating the function within each square grid cell of a given resolution. An example of the discretization is given in Figure 8-c.
A persistence image (PI) is simply the vector of square grid values. There exists a stability result bounding the distance between PIs by the bottleneck distance D_B or Wasserstein distance D_W on PDs (Adams et al., 2017):

||PI − PI′|| ≤ L · D(PD, PD′)   (10)

for some constant L. This is a consequence of kernel smoothing, which reduces the spatial variability.

Figure 8: (a) Persistence diagram (PD) scatter plot. (b) Persistence surface. (c) Persistence image (PI) at multiple grid resolutions. An illustration of the process for converting a PD into a PI representation. The scatter plot presented in (a) is fit with multiple 2-dimensional Gaussians centered at each point in the PD, known as the persistence surface (b). The persistence surface is then discretized into a persistence image (c) by overlaying a grid and assigning each grid square a scalar value equal to the integral of the sum of the Gaussians within the bounds of the grid square.

The vectorized representation enables the PI to be used in various statistical and machine learning applications (Chazal and Michel, 2017). The PI was used in the functional brain network study on schizophrenia (Stolz et al., 2018) and in conjunction with deep learning for autism classification (Rathore et al., 2019). The major theme across these applications is that the PI presents a simple, vectorized representation of the persistence diagram that can be readily used in many statistical and machine learning algorithms (Pun et al., 2018). The PI can be easily manipulated by adjusting the size of the grid as well as the bandwidth σ of the kernel. This allows users to adjust the amount of information needed to represent the PD at multiple scales for application needs. One major drawback of the method is the inherent loss of information from both the grid discretization and the smoothing of the PD.
This loss of information can be detrimental to the analysis of a given dataset, as important features may be oversmoothed or unintentionally grouped together. One method of combating this potential loss of information is to utilize a weighting function w(ξ, τ) on the PI representation to emphasize certain features (Divol and Polonik, 2019; Adams et al., 2017). A popular method is to assign a linear or exponential weight, such as w(ξ, τ) = τ − ξ, to each point in the PD before diffusion via the Gaussian kernel, so that the weight increases with the distance from the diagonal of the persistence diagram. This weights the representation toward features of higher persistence, making the subsequent analysis more robust to low-persistence noise.
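The whole PD-to-PI pipeline can be sketched under simplifying assumptions: here the integral over each grid cell is replaced by a one-point midpoint approximation (value at the cell center times cell area), and the optional linear weight w(ξ, τ) = τ − ξ is applied before smoothing. All names and parameter values are hypothetical, not the implementation of Adams et al. (2017).

```python
import math

def persistence_image(pd_points, grid=(4, 4), extent=(0, 1, 0, 1),
                      sigma=0.1, weighted=False):
    """Sketch of a persistence image: a sum of Gaussians centred at the PD
    points (the persistence surface), discretised by evaluating at each cell
    centre and multiplying by the cell area (one-point approximation of the
    cell integral). weighted=True applies the linear weight tau - xi."""
    nx, ny = grid
    x0, x1, y0, y1 = extent
    dx, dy = (x1 - x0) / nx, (y1 - y0) / ny
    img = []
    for iy in range(ny):
        row = []
        for ix in range(nx):
            cx, cy = x0 + (ix + 0.5) * dx, y0 + (iy + 0.5) * dy
            val = sum(((ti - xi) if weighted else 1.0)
                      * math.exp(-((cx - xi) ** 2 + (cy - ti) ** 2)
                                 / (2 * sigma ** 2))
                      / (2 * math.pi * sigma ** 2)
                      for xi, ti in pd_points)
            row.append(val * dx * dy)
        img.append(row)
    return img

pd_pts = [(0.2, 0.8), (0.3, 0.5)]                # hypothetical PD points
pi = persistence_image(pd_pts)
vec = [v for row in pi for v in row]             # flatten: the PI feature vector
```

Refining the grid or shrinking σ trades vector length against fidelity to the underlying PD, which is the multiscale adjustment discussed above.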
5. Big data, scalability and approximation
The huge volumes of data in various imaging fields including neuroimaging and cosmology necessitate efficient algorithms for performing scalable TDA computation. The computing needs in these diverse fields share many similarities, and we can adapt scalable methods developed in other areas to brain imaging data. In this section, we review some of the scalable schemes for simplifying big data computations. There are several ways in which TDA can help with big data challenges, one of which is dimensionality reduction.

Given n scatter points in a k dimensional space, the amount of information grows exponentially with the number of points. In contrast, persistent homology compresses the data to k functions f_0(ε), f_1(ε), ··· , f_{k−1}(ε) of the filtration value ε. Topological features such as the duration from birth to death of k-cycles, Betti numbers and the number of such k-cycles are used for such functions. If we further bin the filtration parameter into N_bin intervals, the amount of information characterized by persistent homology grows only linearly with k as kN_bin. This huge compression of data can facilitate the search for underlying patterns in big data.

Even so, performing TDA on a large dataset is daunting. Persistent homology does not scale well with increased network size (Figure 9). The computational complexity of persistent homology grows rapidly with the number of simplices (Topaz et al., 2015). With n nodes, the size of the k-skeleton grows as n^{k+1}. Homology calculations are often done by Gaussian elimination, and if there are N_simplices simplices, it takes O(N_simplices³) time to perform. In a d dimensional embedding space, this computational time can be estimated as O(n^{3(k+1)}) (Solo et al., 2018). Thus, a full TDA computation of the Rips complex is exponentially costly; it becomes infeasible when the number of nodes or the dimension of the embedding space is large.
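The binning of topological features into N_bin intervals of the filtration value can be sketched as a Betti curve computed directly from barcodes: β(ε) counts the bars alive at ε. The function name and the toy bars below are illustrative:

```python
def betti_curve(bars, eps_grid):
    """Betti curve: number of bars (xi, tau) alive at each filtration value,
    i.e. beta(eps) = #{i : xi <= eps < tau}."""
    return [sum(1 for xi, tau in bars if xi <= e < tau) for e in eps_grid]

bars = [(0.0, 0.4), (0.1, 0.9), (0.3, 0.5)]   # hypothetical 0D bars
grid = [i / 10 for i in range(10)]            # N_bin = 10 filtration bins
print(betti_curve(bars, grid))
# → [1, 2, 2, 3, 2, 1, 1, 1, 1, 0]
```

Whatever the size of the original point cloud, the barcode is reduced to a length-N_bin vector per homology dimension, which is the kN_bin compression described above.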
This can easily happen when one tries to use brain networks at the voxel-level resolution. Thus, many algorithms have been developed for computing the Rips complex approximately but quickly for large-scale data. There are two broad approaches to making the TDA computation scalable: 1) sampling a subset of the nodes, and 2) constructing alternate filtrations consisting of a significantly smaller number of simplices, while keeping the set of nodes unchanged, that give the same or approximate persistent homologies of the filtration. This section is focused on a detailed review of the first approach. The construction of alternate filtrations such as graph filtrations is reviewed in the next section.

The second approach speeds up the computation by reducing the number of simplices. The alpha filtration based on the Delaunay triangulation is an example of an alternate filtration with a provable guarantee that it consists of a smaller number of simplices than the Rips filtration (Otter et al., 2017). The benefits of a sparser representation than the Rips filtration cannot be overstated, but the challenge is to construct a simplified filtration without spoiling the stability results of persistent homology. By perturbing the metric of the data space, one can construct a sparsified Rips filtration which has O(n) simplices and can be computed in O(n log n) time (Sheehy, 2013).

One crucial difference between geometry and topology is that topology can be accurately estimated from a small sample of the full data, while geometry requires more fine-grained information. This reflects the fact that topology is a more basic notion of shape than geometry. This suggests a sampling procedure can significantly reduce the size of a simplicial representation of data while preserving topological information.

Figure 9: rs-fMRI correlation networks of two subjects from HCP with more than 25000 nodes. Identifying cycles and computing the number of cycles can be computationally demanding in this type of dense correlation network since persistent homology computations are not very scalable.

Since the number of simplices in the Rips complex grows with respect to the number of vertices N_vertices as 2^{O(N_vertices)}, being able to evaluate topology on a small sample of the vertices can greatly speed up the computation. A principled procedure for sampling data points to identify the data's topology is given by witness complexes (de Silva and Carlsson, 2004). The idea is to use a small subset of the given point cloud as landmarks, which form the vertices of the complexes. The remaining non-landmark points are used as witnesses to the existence of simplices spanned by combinations of landmark points.

Let Z be the full point cloud, L ⊂ Z denote the set of landmark points, and W ⊂ Z denote the set of witnesses. W is a dense sample of Z. Then w ∈ W is a weak witness of a simplex σ = [v_0, v_1, ··· , v_k] ⊂ L if for all v ∈ σ and for all v′ ∈ L \ σ, d(w, v) ≤ d(w, v′) (de Silva and Carlsson, 2004). The largest distance from a witness to the vertices of the simplex should be smaller than the distance to all other landmark points excluding the simplex. A k-simplex is thus witnessed by w if it consists of w's (k+1) nearest landmark points. Let d_i(w, σ) be the (i+1)-st smallest distance from w to one of the vertices in σ; note d_k(w, σ) = max_k d(w, v_k). Similarly define d_k(w, L) to be the distance from w to its (k+1)-st closest landmark point in L. Starting with an initial random point v_0, the witness complex can be defined by induction. For vertices v_i ∈ L, we include the k-simplex σ in the witness complex Wit(L, W, ε) at scale ε if all of its faces belong to Wit(L, W, ε) and there exists a witness point w ∈ W such that

d_k(w, σ) ≤ ε + d_k(w, L)   (11)

for filtration value ε, which is analogous to the filtration values in the Rips filtration.
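Inequality (11) for the 1-skeleton (k = 1) can be sketched in a few lines: an edge between two landmarks enters at scale ε if some witness is nearly equidistant to both of them, relative to its second-closest landmark. The function `witness_edges` and the toy circle data are hypothetical, not code from de Silva and Carlsson (2004):

```python
import math
from itertools import combinations

def witness_edges(landmarks, witnesses, eps):
    """1-skeleton of the witness complex: edge (i, j) enters at scale eps if
    some witness w satisfies max(d(w, v_i), d(w, v_j)) <= eps + d_1(w, L),
    where d_1(w, L) is the distance from w to its 2nd-closest landmark;
    this is inequality (11) with k = 1."""
    edges = []
    for i, j in combinations(range(len(landmarks)), 2):
        for w in witnesses:
            d_sigma = max(math.dist(w, landmarks[i]),
                          math.dist(w, landmarks[j]))
            d_L = sorted(math.dist(w, v) for v in landmarks)[1]
            if d_sigma <= eps + d_L:
                edges.append((i, j))
                break
    return edges

# four landmarks on a circle, witnessed by the in-between sample points
L = [(1, 0), (0, 1), (-1, 0), (0, -1)]
W = [(0.707, 0.707), (-0.707, 0.707), (-0.707, -0.707), (0.707, -0.707)]
print(witness_edges(L, W, 0.0))   # the four "adjacent" edges tracing the circle
```

Even at ε = 0 the adjacent edges appear (each witness is equidistant to its two nearest landmarks), recovering the circle from only four landmark vertices; the diagonals only enter at larger ε.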
The witness complex is the largest simplicial complex with vertices in L whose faces are witnessed by points in W (Guibas and Oudot, 2008).

Figure 10: In the witness complex, edges and higher-dimensional simplices are built out of vertices in the landmark set L = {v_1, v_2, . . .} and witnessed by other points according to inequality (11). Point w is a weak witness of the 2-simplex [v_1, v_2, v_3] at filtration scale ε = 0. This 2-simplex belongs to the witness complex Wit(L, W, 0) since σ = [v_1, v_2, v_3] satisfies (11) for k = 2 and its faces [v_1, v_2], [v_1, v_3], [v_2, v_3] satisfy (11) for k = 1.

The witness complex depends on the initial choice of landmarks. We want to minimize the number of landmark points to reduce computational time, yet this landmark set should be representative of the data, as the four points representing the circle in Figure 10. The landmarks are chosen either randomly or through the maxmin algorithm, which tends to select evenly spaced landmarks (de Silva and Carlsson, 2004; Cole and Shiu, 2019). In brain imaging applications, the landmarks can be chosen biologically. In building parcellation-based networks, the centers of individual parcellations can be chosen as landmarks to approximate the overall topology of large-scale brain networks constructed at the voxel level. The size of a witness complex is determined by the size of the landmark set L. Since L represents only a fraction of the points in the given point cloud Z, witness complexes are much smaller than the Rips complexes corresponding to Z. The upper bound |Z|/|L| ≤ 20 was suggested for 2D surface data (de Silva and Carlsson, 2004). Taking |Z|/|L| = 20 and denoting by N_W, N_R the sizes of the corresponding witness complex and Rips complex respectively, the worst-case scaling of the number of simplices in the witness complex is (Otter et al., 2017; Arafat et al., 2019)

N_W ∼ 2^{|L|} ∼ 2^{|Z|/20} ∼ N_R^{1/20}.

Therefore the witness complex presents a huge computational advantage for large data sets.

Witness complexes can be thought of as approximations of the Delaunay triangulation in 2D (de Silva and Carlsson, 2004; Attali et al., 2007; Guibas and Oudot, 2008; Boissonnat et al., 2009b). The advantage of the witness complex is that the ambient dimension does not affect the complexity of the algorithm, while the Delaunay triangulation suffers from the curse of dimensionality. An algorithm was proposed in Boissonnat et al. (2009b) to enrich the set of witnesses and landmarks to preserve the relation between the Delaunay triangulation and the witness complex in higher dimensions.

Figure 11: Left: The Voronoi diagram obtained from the point cloud that samples the key shaped object. Right: The Delaunay complex, which is the dual graph of the Voronoi diagram.

Witness complexes have also been used in neuroscience. Activity patterns in the visual cortex stimulated by natural images were found to have a compatible topological structure, that of a two-sphere. Dabaghian et al. (2012) used witness complexes to model activity in the hippocampus as storing topological information about the local environment.

α-filtrations

The α-filtration (Edelsbrunner and Mücke, 1994) takes its inspiration from the widely used Delaunay triangulation, which is a simplicial complex. The Delaunay triangulation or tetrahedralization has been widely used in the brain imaging field for building surface or volumetric meshes from MRI (Si, 2015; Wang et al., 2017a).

Given a set of points X = {x_1, ··· , x_p} ⊂ R^D, the Voronoi cell around x_k is given by

V_k = {y ∈ R^D | d(y, x_k) ≤ d(y, x_j) for all j ≠ k}.

The Voronoi cell of point x_k includes all points for which x_k is the nearest point among all points in X. The Voronoi cell is a convex set.
The collection of Voronoi cells tessellates the domain R^D and constitutes the Voronoi diagram of X. The Delaunay complex is defined via the Voronoi diagram as

Del(X) = {σ ⊂ X | ∩_{x_k ∈ σ} V_k ≠ ∅}.

An example of this geometric realization is shown in Figure 11. The Delaunay triangulation is then defined as the dual graph of the Voronoi diagram. The minimum spanning tree of the complete graph consisting of node set X and edge weights d(x_k, x_j) is a subset of the Delaunay triangulation.

The size of the Delaunay complex grows with the number of vertices N_vertices as N_vertices log N_vertices for D ≤ 2 and as N_vertices^{⌈D/2⌉} for D ≥ 3, so it suffers from the curse of dimensionality. Developing efficient algorithms for handling the higher dimensional Delaunay complex is a subject of ongoing research (Boissonnat et al., 2009a). Wang et al. (2017b) used the Delaunay triangulation in building a complex from EEG channel locations.

To further speed up the computation, the α-complex is often used over the Rips complex. The α-complex, which is a subcomplex of the Delaunay complex, is defined as follows. For each point x ∈ X and ε > 0, let B_x(ε) be the closed ball with center x and radius ε. The union of these balls ∪_{x∈X} B_x(ε) is the set of points in R^D at distance at most ε from at least one of the points of X. We then intersect each ball with the corresponding Voronoi cell,

R_x(ε) = B_x(ε) ∩ V_x.

The collection of these sets {R_x(ε) | x ∈ X} forms a cover of ∪_{x∈X} B_x(ε), and the α-complex of X at scale ε is defined as

Alpha(X, ε) = {σ ⊂ X | ∩_{x∈σ} R_x(ε) ≠ ∅}.

Since balls and Voronoi cells are convex, the R_x(ε) are also convex. Alpha(X, ε) has the same homology as ∪_{x∈X} B_x(ε). Not only is Alpha(X, ε) a subcomplex of the Delaunay complex, it is also a subcomplex of the Čech complex.

One potentially undesirable aspect of the α-filtration is that the topological objects it identifies die at smaller scales than one would expect when there are outliers. Thus, the α-filtration is sensitive to outliers. We can try to remedy this situation by modifying the α-filtration to a weighted α-filtration that allows balls of different sizes, and building the filtration in a spatially adaptive fashion. In applications, this weighted α-filtration is often motivated by the underlying physical and biological models. For example, in studying the persistent homology of large-scale structure data in cosmology (Biagetti et al., 2020), the gravitational potential depends on the mass density of clusters (0-simplices), suggesting varying ball sizes. In modeling biomolecules, such as proteins, RNA, and DNA, each atom is represented by a ball whose radius reflects the range of its van der Waals interactions and thus depends on the atom type.
For brain network applications, we can adjust the size of the balls to follow the contour of either the gray matter or the white matter of the brain and adaptively build the filtration.

A different modification of the α-filtration can be made that reduces the effect of outliers (Biagetti et al., 2020). In the αDTM-filtration, we assign the simplices a filtration time based on the distance-to-measure (DTM) function. Given a set of points X, the empirical DTM function is given by

DTM(x) = ( (1/k) Σ_{x_i ∈ N_k(x)} ||x − x_i||^p )^{1/p},   (12)

where N_k(x) is the list of the k nearest neighbors in X to x (Chazal et al., 2011). Often p = k = 1, in which case the DTM function gives the distance to the nearest point in X. Increasing k "smooths" this distance function so that it takes small values where x is near many points and large values near outliers. Then, performing a sublevel filtration using the DTM function, outliers will be added relatively late in the filtration, leading to a smaller effect on the persistent homology.

For large-scale images, the DTM function can be evaluated on an image grid (Xu et al., 2019). Evaluating on a grid with high resolution and sufficiently large k involves prohibitive computational expense. One way to get around this is to evaluate the DTM function at certain relevant points of the space. This amounts to choosing an efficient triangulation of the ambient space. In Biagetti et al. (2020), the DTM function was evaluated on the Delaunay complex. This was shown to keep the computational cost low while maintaining the desirable feature of the αDTM-filtration in reducing the adverse effects of outliers.

Another approach to significantly reduce the number of simplices is the Morse filtration. Using combinatorial Morse theory (Forman, 1998), one can reduce the initial complex using geometric and combinatorial methods before performing homology calculations. Combinatorial Morse theory extends the notion of Morse homology for smooth manifolds to discrete datasets. The gradient flow for a smooth manifold is replaced by a partial pairing of cells in a complex. As a result, the Morse complex has a significantly smaller size than the original complex, while preserving all homological information. As demonstrated in Harker et al. (2010), for many complexes the resulting Morse complex is many orders of magnitude smaller than the original. An efficient preprocessing algorithm was developed in Mischaikow and Nanda (2013) that extends combinatorial Morse theory from complexes to filtrations.
This preprocessing algorithm maps any filtration into a Morse filtration with the same persistent homology, thus significantly reducing both the computational time and the memory required for running the persistence algorithm.

The computational overhead in TDA lies in the homology calculations of higher-dimensional simplices. The primary algorithms in persistent homology require Gaussian elimination with cubic runtime in the number of simplices (Edelsbrunner and Harer, 2010b). This can cause an additional computational bottleneck since the number of simplices up to dimension k in the Rips filtration is of order O(n^{k+1}) for n data points (Sheehy, 2013). Thus, it is worthwhile to explore whether the runtime for analyzing data structure can be significantly reduced by representing the nodes and their relations as hypergraphs (Torres et al., 2020). A hypergraph is a generalization of a graph consisting of a vertex set V and a hyperedge set E, where a hyperedge allows polyadic relations between an arbitrary number of vertices. This is in contrast to an ordinary graph, which can only form connections between two vertices. Unlike an edge in a graph or a simplicial complex, a hyperedge can connect any number of vertices. Thus, a hypergraph representation of data consists of only 0D and 1D objects. Figure 12 displays a hypergraph with three nodes v1, v2, v3, where a hyperedge e1 represents the interaction among all three nodes.

Figure 12: Schematic of a graph and a hypergraph with three nodes v1, v2, v3. The edge set of the graph is e1 = [v1, v2], e2 = [v2, v3], e3 = [v1, v3]. The edge set of the hypergraph is given by the hyperedge e1 = [v1, v2, v3] together with a hyperedge e2 joining two of the vertices. A hypergraph allows edges that contain an arbitrary number of vertices, while an edge in a graph is limited to exactly two vertices, preventing it from capturing higher-level interactions. The hyperedge e1 captures the high-level interaction among the three vertices directly; a graph, on the other hand, can only approximate the higher-order interaction with a series of pairwise interactions. A simplex captures the high-level interaction but also assumes pairwise interactions between all vertices.

We can represent a hypergraph using the incidence matrix or the corresponding adjacency matrix. The incidence matrix H = (h_ij) has rows representing vertices v_i and columns representing edges e_j (Estrada and Rodriguez-Velazquez, 2005), defined as

h_ij = 1 if v_i ∈ e_j; 0 if v_i ∉ e_j.

The adjacency matrix A of a given graph or hypergraph is derived from the incidence matrix as

A = HH^T − D,

where D is the diagonal matrix whose diagonal entries are the degrees of the nodes (Estrada and Rodriguez-Velazquez, 2005). The adjacency matrix of a hypergraph can be used to compute certain properties of hypergraphs such as centrality and clustering coefficients (Estrada and Rodriguez-Velazquez, 2005). For the graph in Figure 12, the incidence and adjacency matrices are

H = [1 0 1; 1 1 0; 0 1 1], A = [0 1 1; 1 0 1; 1 1 0],

with rows indexed by v1, v2, v3 and the columns of H by e1, e2, e3. Similarly, the incidence matrix of the hypergraph in Figure 12 has a column of ones for e1 = [v1, v2, v3] and a column indicating the two vertices of e2, and its adjacency matrix again follows from A = HH^T − D.

Simplicial complexes are also capable of capturing higher-level interactions, but they require representing higher-order relationships with high-dimensional simplices and are restricted by the property known as downward inclusion (Torres et al., 2020). Downward inclusion requires that any subset of the (k + 1) vertices of a k-simplex must also form a simplex. This reduces the ability of the simplicial complex to represent abstract relationships, such as those used to understand the cross-link connectivity of functional brain networks over time (Bassett et al., 2014), as it assumes the presence of lower-level connections via downward inclusion.
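A minimal sketch of the incidence-to-adjacency relation A = HH^T − D described above, using the three-node graph of Figure 12 (numpy only); the same function applies unchanged to a hypergraph incidence matrix, since the diagonal of HH^T counts the (hyper)edges containing each vertex.

```python
import numpy as np

# Incidence matrix of the graph in Figure 12: rows are vertices v1, v2, v3;
# columns are the edges e1 = [v1, v2], e2 = [v2, v3], e3 = [v1, v3].
H = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 1]])

def adjacency(H):
    """A = H H^T - D, where D is the diagonal degree matrix.
    (H H^T)_ii counts the (hyper)edges containing v_i, i.e., its degree,
    so subtracting the diagonal leaves the off-diagonal co-membership counts."""
    G = H @ H.T
    return G - np.diag(np.diag(G))

A = adjacency(H)
print(A)  # adjacency of the complete graph K3
```

For a hypergraph, an all-ones column (a hyperedge containing every vertex) contributes 1 to every off-diagonal entry, reflecting shared hyperedge membership rather than pairwise edges.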
It is possible to have a high-order interaction among three regions of the brain without significant pairwise interaction between any two of them. Thus, simplicial-complex-based brain network analysis may enforce more low-level hierarchical dependency than needed.

In Figure 12, we illustrate the difference between a graph, a simplex, and a hypergraph in capturing a high-level relationship among three vertices. The graph can only capture the dyadic relationship between two nodes; for a higher-order interaction among all three nodes v1, v2, v3, an additional model is needed. The simplex represents the higher-level relationship among all three vertices by the filled-in triangle, but requires that all dyadic connections between node pairs exist, which may not necessarily be true in brain networks. The hypergraph, however, captures only the triadic relationship among the vertices without the lower-level dyadic relationships.

A hypergraph representation of data consists of only 0D (vertices) and 1D (hyperedges) objects, making it a low-dimensional representation of higher-order interactions within a dataset. Hypergraphs provide improved flexibility over the simplicial representations used in TDA and accurately capture higher-level relationships in complex networks, such as those used to understand covariant structures in neurodevelopment (Torres et al., 2020; Gu et al., 2017). The hypergraph has found many applications in functional brain imaging studies. Hypergraph topology has been used to differentiate brain patterns during attention-demanding tasks, memory-demanding tasks, and resting states (Davison et al., 2015). It has also been used to understand differences in functional connectivity over the human lifespan (Davison et al., 2016).
It has also been used in the diagnosis of brain disease and in the identification of connectome biomarkers (Jie et al., 2016; Zu et al., 2016). The use of hypergraphs extends beyond functional networks: they have been used in multi-atlas and multi-modal hippocampus segmentation by building hypergraphs that represent topological connections across multiple images over different imaging modalities (Dong et al., 2015).

Many of the methods reported in the literature use simple topological descriptors of hypergraphs, such as the number of cross links, hyperedge sizes or clustering coefficients. There has been little direct application of the concepts of TDA to hypergraphs. However, there are some preliminary works connecting TDA to hypergraphs, including the analysis of Betti numbers of hypergraphs, the cohomology of hypergraphs, and the embedded homology of hypergraphs (Chung and Graham, 1992; Emtander, 2009; Bressan et al., 2016). The application of TDA methods to hypergraphs may yield faster running times, as the homology of high-dimensional simplices is no longer needed to represent high-order interactions, and could result in more meaningful analysis, as constraints such as downward inclusion are no longer required (Lloyd et al., 2016).

6. Persistent homology in brain networks

For adapting persistent homology to brain network data, the graph filtration was introduced (Lee et al., 2011a, 2012). Instead of analyzing networks at one fixed threshold, we build a collection of nested networks over every possible threshold. The graph filtration is a threshold-free framework for analyzing a family of graphs, but requires hierarchically building nested subgraph structures. The graph filtration framework shares similarities with existing multi-thresholding or multi-resolution network models that use somewhat arbitrary multiple thresholds or scales (Achard et al., 2006; He et al., 2008; Kim et al., 2015; Lee et al., 2012).
However, such approaches are mainly exploratory and mostly used to visualize graph feature changes over different thresholds without quantification. Persistent homology, on the other hand, quantifies such feature changes in a coherent way.

The graph filtration was the first type of filtration applied to brain networks and is now considered the de facto baseline filtration in the field (Lee et al., 2011a, 2012). The Euclidean distance is the most often used metric for building filtrations in persistent homology (Edelsbrunner and Harer, 2010b). Most brain network studies also use Euclidean distances for building graph filtrations (Lee et al., 2011b, 2012; Chung et al., 2015a, 2017b; Petri et al., 2014; Khalid et al., 2014; Cassidy et al., 2015; Wong et al., 2016; Anirudh et al., 2016; Palande et al., 2017). Given a weighted network X = (V, w) with edge weights w = (w_ij), the binary network X_ε = (V, w_ε) is a graph consisting of the node set V and the binary edge weights w_ε = (w_{ε,ij}) given by

w_{ε,ij} = 1 if w_ij > ε; 0 otherwise. (13)

Note that Lee et al. (2011a, 2012) define the binary graphs by thresholding the other way, w_{ε,ij} = 1 if w_ij < ε, which is consistent with the definition of the Rips filtration. However, in brain connectivity, a higher value of w_ij indicates stronger connectivity, so we usually threshold as in (13) (Chung et al., 2015a). Note that w_ε is the adjacency matrix of X_ε, which is a simplicial complex consisting of 0-simplices (nodes) and 1-simplices (edges) (Ghrist, 2008). In the metric space X = (V, w), the Rips complex R_ε(X) is a simplicial complex whose (p − 1)-simplices correspond to p-tuples of points that satisfy w_ij ≤ ε pairwise (Ghrist, 2008). While the binary network X_ε has at most 1-simplices, the Rips complex can have simplices of dimension up to p − 1. Thus, the complement X^c_ε, consisting of the edges with w_ij ≤ ε, satisfies X^c_ε ⊂ R_ε(X).
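The thresholding in Eq. (13) can be sketched in a few lines; the 4-node weight matrix below is a toy example, not from any cited dataset.

```python
import numpy as np

def threshold_graph(w, eps):
    """Binary network of Eq. (13): keep edge (i, j) iff w_ij > eps."""
    b = (np.asarray(w, dtype=float) > eps).astype(int)
    np.fill_diagonal(b, 0)  # no self-loops
    return b

# Toy 4-node weighted network (symmetric, zero diagonal).
w = [[0.0, 0.8, 0.2, 0.5],
     [0.8, 0.0, 0.6, 0.1],
     [0.2, 0.6, 0.0, 0.4],
     [0.5, 0.1, 0.4, 0.0]]
print(threshold_graph(w, 0.45))  # only the edges 0.8, 0.6 and 0.5 survive
```

Sweeping eps over the sorted edge weights yields the nested sequence of binary graphs that forms the graph filtration.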
Since a binary network is a special case of the Rips complex, for thresholds ε_0 ≤ ε_1 ≤ ε_2 ≤ ···, we also have

X_{ε_0} ⊃ X_{ε_1} ⊃ X_{ε_2} ⊃ ···

and equivalently

X^c_{ε_0} ⊂ X^c_{ε_1} ⊂ X^c_{ε_2} ⊂ ···.

The sequence of such nested multiscale graphs is defined as the graph filtration (Lee et al., 2011a, 2012). Figure 13 illustrates a graph filtration in a four-node example, while Figure 14 shows the graph filtration of structural covariates of maltreated children over 116 parcellated brain regions.

Figure 13: Schematic of graph filtration and Betti curves. We sort the edge weights in increasing order and threshold the graph sequentially at increasing filtration values to obtain binary graphs. The 0-th Betti number β0, which counts the number of connected components, and the first Betti number β1, which counts the number of cycles, are then plotted over the filtration. The Betti curves are monotone over graph filtrations.

Figure 14: Graph filtrations of maltreated children vs. normal control subjects on FA-values (Chung et al., 2015a). The Pearson correlation is used as filtration values at 0.5, 0.6 and 0.7. Maltreated subjects show much higher correlation of FA-values, indicating a more homogeneous and less varied structural covariate relationship.

Note that X_0 is the complete weighted graph while X_∞ is the node set V. By increasing the threshold value, we are thresholding at higher connectivity, so more edges are removed. Given a weighted graph, there are infinitely many different filtrations, which makes comparisons between two different graph filtrations difficult. For a network Y = (V, z) with the same node set but different edge weights z, with different filtration values λ_0 ≤ λ_1 ≤ λ_2 ≤ ···, we have

Y_{λ_0} ⊃ Y_{λ_1} ⊃ Y_{λ_2} ⊃ ···.

Then how do we compare two different graph filtrations {X_{ε_i}} and {Y_{λ_i}}?
For different ε_j and ε_{j+1}, we can have identical binary graphs, i.e., X_{ε_j} = X_{ε_{j+1}}. For a graph X = (V, w) with q unique positive edge weights, the maximum number of unique filtrations is q + 1. For p nodes, the maximum number of edges is p(p − 1)/2, which is attained by a complete graph. If we order the edge weights in increasing order, we have the sorted edge weights

0 = w_(0) < w_(1) = min_{j,k} w_jk < w_(2) < ··· < w_(q) = max_{j,k} w_jk,

where q ≤ p(p − 1)/2 and the subscript ( ) denotes the order statistic. For all λ < w_(1), X_λ = X_{w_(0)} = X, the complete graph on V. For all w_(r) ≤ λ < w_(r+1) (r = 1, ···, q − 1), X_λ = X_{w_(r)}. For all w_(q) ≤ λ, X_λ = X_{w_(q)}, the vertex set V. Hence, the filtration

X_{w_(0)} ⊃ X_{w_(1)} ⊃ X_{w_(2)} ⊃ ··· ⊃ X_{w_(q)}

is maximal in the sense that we cannot add any filtration X_ε that is not one of the above. Thus, graph filtrations are usually given at the edge weights.

The condition of having unique edge weights is not restrictive in practice. Assuming the edge weights follow some continuous distribution, the probability of any two edge weights being equal is zero. For a discrete distribution, identical edge weights are possible; then we can simply add Gaussian noise or an extremely small increasing sequence of numbers to the q edge weights.

The graph filtration can be quantified using a monotone function f satisfying

f(X_{ε_0}) ≥ f(X_{ε_1}) ≥ f(X_{ε_2}) ≥ ··· (14)

or

f(X_{ε_0}) ≤ f(X_{ε_1}) ≤ f(X_{ε_2}) ≤ ···. (15)

The number of connected components (the zeroth Betti number β0) and the number of cycles (the first Betti number β1) satisfy the monotonicity (Figures 13 and 15). The size of the largest cluster satisfies a similar but opposite relation of monotone change. There are numerous monotone graph theory features (Chung et al., 2015a, 2017b).

For graphs, β1 can be computed easily as a function of β0. Note that the Euler characteristic χ can be computed in two different ways:

χ = β0 − β1 + β2 − ··· = #nodes − #edges + #faces − ···,

where #nodes, #edges and #faces are the numbers of nodes, edges and faces. However, graphs do not have filled faces, and Betti numbers higher than β0 and β1 can be ignored. Thus, for a graph with p nodes and q edges (Adler et al., 2010),

χ = β0 − β1 = p − q,

and hence β1 = β0 − p + q. In a graph, the Betti numbers β0 and β1 are monotone over filtrations on edge weights (Chung et al., 2019a,b).
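Since β0 can be found by union-find and β1 follows from the Euler characteristic identity β1 = β0 − p + q, the Betti curves of a graph filtration can be sketched without any persistence software; the 3-node weight matrix is illustrative only.

```python
import numpy as np

def betti(A):
    """beta0 by union-find on the binary adjacency matrix A;
    beta1 follows from the Euler characteristic: beta1 = beta0 - p + q."""
    A = np.asarray(A)
    p = len(A)
    parent = list(range(p))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    q = 0
    for i in range(p):
        for j in range(i + 1, p):
            if A[i, j]:
                q += 1
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj
    beta0 = len({find(i) for i in range(p)})
    return beta0, beta0 - p + q

# Betti curves over a graph filtration of a 3-node weighted network:
# beta0 increases and beta1 decreases as the threshold grows.
w = np.array([[0.0, 0.8, 0.2],
              [0.8, 0.0, 0.6],
              [0.2, 0.6, 0.0]])
for eps in (0.1, 0.3, 0.7):
    print(eps, betti(w > eps))
```

At eps = 0.1 the triangle is intact (β0 = 1, β1 = 1); by eps = 0.7 only one edge remains (β0 = 2, β1 = 0), matching the monotone behavior described above.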
Figure 15: The Betti curves of the covariance and correlation matrices for the Jacobian determinant (left column) and fractional anisotropy (right column) on 548 (top two rows) and 1856 (bottom two rows) nodes (Chung et al., 2015a). Unlike the covariance, the correlation seems to show a huge group separation between normal and maltreated children visually. However, in all 7 cases except the top right (548-node covariance for FA), statistically significant differences were detected using the rank-sum test on the areas under the Betti plots. The results are similar across different node sizes, indicating the robustness of the proposed method over the changing number of nodes.

When we perform the maximal graph filtration, edges are deleted one at a time. Since an edge has only two end points, the deletion of an edge disconnects the graph into at most two components. Thus, the number of connected components β0 always increases, and the increase is at most by one. Note that p is fixed over the filtration while q decreases by one at each step and β0 increases by at most one. Hence, β1 always decreases, and the decrease is at most by one. Further, the length of the largest cycle, as measured by the number of nodes, also decreases monotonically (Figure 16).

Identifying connected components in a network is important for decomposing the network into disjoint subnetworks. The number of connected components (the 0-th Betti number) of a graph is a topological invariant that measures the number of structurally independent or disjoint subnetworks. There are many existing algorithms, unrelated to persistent homology, for computing the number of connected components, including the Dulmage-Mendelsohn decomposition (Pothen and Fan, 1990), which has been widely used for decomposing sparse matrices into block triangular forms to speed up matrix operations. In graph filtrations, the number of cycles increases or decreases as the filtration value increases.
The pattern of monotone increase or decrease can visually show how the topology of the graph changes over the filtration values. The overall pattern of the Betti curves can be used as a summary measure quantifying how the graph changes over increasing edge weights (Chung et al., 2013) (Figure 13). The Betti curves are related to barcodes: the Betti number equals the number of bars in the barcode at the given filtration value.

Figure 16: The largest cycle at given correlation thresholds on rs-fMRI for two representative HCP subjects (Chung et al., 2019a). As the threshold increases, the length of the cycles decreases monotonically.
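The barcode-to-Betti-curve relation just stated can be sketched in one line: at each filtration value, count the bars that have been born but have not yet died. The bars below are illustrative.

```python
def betti_from_barcode(bars, eps):
    """Betti number at filtration value eps = number of bars alive at eps,
    i.e., bars (birth, death) with birth <= eps < death."""
    return sum(1 for birth, death in bars if birth <= eps < death)

# Three 0D bars; the infinite bar never dies.
bars = [(0.0, float("inf")), (0.0, 0.4), (0.2, 0.9)]
print([betti_from_barcode(bars, e) for e in (0.0, 0.5, 1.0)])  # → [2, 2, 1]
```

Evaluating this count over a grid of filtration values reproduces the Betti curve from the barcode.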
Binary trees have been a popular data structure to analyze using persistent homology in recent years (Bendich et al., 2016; Li et al., 2017). Trees and graphs are 1-skeletons, which are Rips complexes consisting of only nodes and edges. However, trees do not have 1-cycles and can be quantified using 0-cycles (connected components) only; higher-order topological features can simply be ignored. Garside et al. (2020) used somewhat inefficient filtrations in the 2D plane that increase the radius of circles from the root node or points along the circles. Such filtrations produce persistence diagrams (PD) whose points spread over the 2D plane and may create 1-cycles. Such PDs are difficult to analyze since the scatter points do not correspond across different PDs. For 1-skeletons, the graph filtration offers a more efficient alternative (Chung et al., 2019b; Songdechakraiwut and Chung, 2020b).

Consider a tree T = (V, w) with node set V = {1, 2, ···, p} and weighted adjacency matrix w. If we have a binary tree with a binary adjacency matrix, we add edge weights by taking the distance between nodes i and j as the edge weight w_ij and build a weighted tree with w = (w_ij). A tree with p ≥ 2 nodes has p − 1 unique positive edge weights, sorted as w_(1) < w_(2) < ··· < w_(p−1). Thresholding T at ε, define the binary tree T_ε = (V, w_ε) with edge weights w_{ε,ij} = 1 if w_ij > ε and 0 otherwise. Then we have the graph filtration on trees

T_{w_(0)} ⊃ T_{w_(1)} ⊃ T_{w_(2)} ⊃ ··· ⊃ T_{w_(p−1)}. (16)

Since all the edge weights are above the filtration value w_(0) = 0, β0(T_{w_(0)}) =
1. Since no edge weight is above the threshold w_(p−1), β0(T_{w_(p−1)}) = p. Each time we threshold, the tree splits into two and the number of components increases exactly by one (Chung et al., 2019b). Thus, we have

β0(T_{w_(1)}) = 2, β0(T_{w_(2)}) = 3, ···, β0(T_{w_(p−1)}) = p,

and the coordinates of the 0-th Betti curve are given by

(0, 1), (w_(1), 2), (w_(2), 3), ···, (w_(p−1), p), (∞, p).

The 0-cycles (connected components) never die once they are born over the graph filtration. For convenience, we simply set the death value of the 0-cycles at some fixed number c > w_(p−1). Then the PD of the graph filtration is simply

(w_(1), c), (w_(2), c), ···, (w_(p−1), c),

forming 1D scatter points along the horizontal line y = c, which significantly simplifies various operations and analyses on PDs (Songdechakraiwut and Chung, 2020b). Figure 17 illustrates the graph filtration and the corresponding 1D scatter points in the PD of the binary tree used in Garside et al. (2020). A different graph filtration is also possible by taking the edge weight to be the shortest distance from the root node; however, it carries identical topological information.

For a general graph, it is not possible to analytically determine the coordinates of its Betti curves. The best we can do is to compute the number of connected components β0 numerically using the single linkage dendrogram method (SLD) (Lee et al., 2012), the Dulmage-Mendelsohn decomposition (Pothen and Fan, 1990; Chung et al., 2011) or Gaussian elimination (de Silva and Ghrist, 2007; Carlsson and Memoli, 2008; Edelsbrunner et al., 2002).

Instead of performing the graph filtration at the edge level, it is possible to build a different kind of filtration at the node level (Hofer et al., 2020; Wang et al., 2017b). Consider a graph G = (V, E) with nodes V = {1, 2, ···, p} and node weights w_i defined at each node i.
With threshold λ, define a binary network G_λ = (V_λ, E_λ), where V_λ = {i ∈ V : w_i ≤ λ} and E_λ = {(i, j) ∈ E : max(w_i, w_j) ≤ λ}. Note E_λ ⊂ E such that two nodes i and j are connected if max(w_i, w_j) ≤ λ. We include a node of G in G_λ when the threshold λ is above its weight, and we connect two nodes in G_λ with an edge when λ is above the larger of the two node weights. Then, for the q unique node weights sorted as w_(1) < w_(2) < ··· < w_(q), we have the node-based graph filtration

G_{w_(1)} ⊂ G_{w_(2)} ⊂ ··· ⊂ G_{w_(q)}. (17)

The filtration (17) is not affected by reindexing nodes, since the edge weights remain the same regardless of node indexing. Each G_{w_(j)} in (17) consists of clusters of connected nodes; as λ increases, clusters appear and later merge with existing clusters. The pattern of changing clusters in (17) has the following properties. For w_(i) ≤ λ < w_(i+1), G_λ = G_{w_(i)}, so the filtration (17) is maximal in the sense that no more G_λ can be added to (17). As λ increases from w_(i) to w_(i+1), only the node corresponding to the weight w_(i+1) is added to V_{w_(i+1)}.

The node-based graph filtration was applied to the Delaunay triangulation of EEG channels in the meditation study of Wang et al. (2017b). The node weights are the powers at the EEG channels. At each filtration value λ, both the nodes and edges with weights less than or equal to λ are added, and the clusters change as λ increases. Over the years, other filtrations on graphs and networks have been developed, including the clique filtration (Petri et al., 2013; Stolz et al., 2017).

Figure 17: Left: the binary tree used in Garside et al. (2020). Middle: β0-curve over the graph filtration; the edge weights of the tree are used as the filtration values. Right: the points in the persistence diagram all lined up at y = 0.31, which is arbitrarily picked to be larger than the maximum edge weight 0.3034.
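The node-based sublevel construction of (17) can be sketched in a few lines; the 4-node graph and its weights are a toy example, not the EEG data of the cited study.

```python
def node_filtration(node_w, edges, lam):
    """Sublevel set G_lambda of (17): keep node i when w_i <= lam and
    edge (i, j) when max(w_i, w_j) <= lam."""
    V = [i for i, wi in enumerate(node_w) if wi <= lam]
    E = [(i, j) for i, j in edges if max(node_w[i], node_w[j]) <= lam]
    return V, E

# Toy graph: four nodes with weights and four edges.
node_w = [0.1, 0.4, 0.2, 0.9]
edges = [(0, 1), (1, 2), (2, 3), (0, 2)]
print(node_filtration(node_w, edges, 0.5))  # node 3 and its edge are excluded
```

Sweeping lam over the sorted node weights produces the increasing sequence of subgraphs in (17); note the direction of inclusion is opposite to the edge-based graph filtration.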
7. Topological distances
The topological distances and losses are usually built on top of various algebraic representations of persistent homology such as barcodes, PDs and graph filtrations. The Gromov-Hausdorff (GH) distance is possibly the most popular such distance. It was originally used to measure the distance between two metric spaces (Tuzhilin, 2016) and was later adapted to measure distances in persistent homology, dendrograms (Carlsson and Memoli, 2008; Carlsson and Mémoli, 2010; Chazal et al., 2009) and brain networks (Lee et al., 2011a, 2012). The probability distribution of the GH-distance is unknown, so statistical inference on the GH-distance has been done through resampling techniques such as the jackknife, bootstrap or permutations (Lee et al., 2012, 2017; Chung et al., 2015a), which often cause computational bottlenecks for large-scale networks. To bypass the computational bottleneck associated with resampling large-scale networks, the Kolmogorov-Smirnov (KS) distance was introduced in Chung (2012); Chung et al. (2017b); Lee et al. (2017). In this section, we review various topological distances.
7.1. Bottleneck distance

This is perhaps the most often used distance in persistent homology, but it is rarely useful in applications due to the crude nature of the metric. Given two networks X1 = (V1, w1) with m cycles and X2 = (V2, w2) with n cycles, we construct a filtration and obtain the PDs

P(X1) = {(ξ_1, τ_1), ···, (ξ_m, τ_m)} and P(X2) = {(ξ'_1, τ'_1), ···, (ξ'_n, τ'_n)}

through the filtration (Lee et al., 2012; Chung et al., 2019b). The bottleneck distance between the networks is defined as the bottleneck distance of the corresponding PDs and bounds the Hausdorff distance (Cohen-Steiner et al., 2007; Edelsbrunner and Harer, 2008):

D_B(P(X1), P(X2)) = inf_γ sup_{1≤i≤m} ||t_i − γ(t_i)||_∞, (18)

where t_i = (ξ_i, τ_i) ∈ P(X1) and γ is a bijection from P(X1) to P(X2). The infimum is taken over all possible bijections. If t'_j = (ξ'_j, τ'_j) = γ(t_i) for some i and j, the L∞-norm is given by

||t_i − γ(t_i)||_∞ = max(|ξ_i − ξ'_j|, |τ_i − τ'_j|).

The optimal bijection γ is often determined by the Hungarian algorithm (Cohen-Steiner et al., 2007; Edelsbrunner and Harer, 2008). Note that (18) assumes m = n such that a bijection γ exists. If m ≠ n, there is no one-to-one correspondence between the two PDs; then additional points are augmented along the diagonal line to match the unmatched points. This enables us to match short-lived homology classes to zero persistence (Chung et al., 2019b; Cole, 2020). Suppose the two networks share the same node set V1 = V2 with p nodes and the same number q of unique edge weights.
If the graph filtration is performed on the two networks, the number of 0D and 1D cycles that appear and disappear during the filtration is p and 1 − p + q, respectively (Chung et al., 2019b). Thus, their persistence diagrams of 0D and 1D cycles always have the same number of points.

The well-known stability theorem (Cohen-Steiner et al., 2007) states

D_B(P(X1), P(X2)) ≤ ||w1 − w2||_∞.

The stability theorem is established for Morse functions on a compact manifold but should hold for most applications. Since the infinity norm is a crude metric even in Euclidean space, the stability theorem does not imply statistical sensitivity or robustness.

7.2. Wasserstein distance
The bottleneck distance is not necessarily a sensitive metric and usually performs poorly compared to other distances (Lee et al., 2011a; Chung et al., 2017a). A more sensitive distance is the q-Wasserstein distance (Edelsbrunner and Harer, 2010b), which is related to the recently popular optimal transport. The q-Wasserstein distance D_W is defined as

D_W(P(X1), P(X2)) = [ inf_γ Σ_i ||t_i − γ(t_i)||_∞^q ]^{1/q},

where the infimum is taken over all bijections γ between P(X1) and P(X2), with the possibility of augmenting unmatched points to the diagonal (Chung et al., 2019b). It can be shown that the q-Wasserstein distance is bounded, up to a constant C, by a power of ||f − g||_∞, where f and g are the functions generating the filtrations. Again, since the infinity norm is a crude metric, the stability statement simply implies that the distance is well bounded and behaves reasonably; it does not imply statistical sensitivity.

7.3. Gromov-Hausdorff distance

The Gromov-Hausdorff (GH) distance for brain networks was introduced in Lee et al. (2011a, 2012). The GH-distance measures the difference between networks by embedding each network into an ultrametric space that represents the hierarchical clustering structure of the network (Carlsson and Mémoli, 2010). The distance s_ij between the closest nodes in two disjoint connected components R1 and R2 is called the single linkage distance (SLD), defined as

s_ij = min_{l∈R1, k∈R2} w_lk.

Every edge connecting a node in R1 to a node in R2 has the same SLD. The SLD is then used to construct the single linkage matrix (SLM) S = (s_ij). The SLM shows how connected components are merged locally and can be used in constructing a dendrogram. The SLM is an ultrametric, a metric satisfying the stronger triangle inequality s_ij ≤ max(s_ik, s_kj) (Carlsson and Mémoli, 2010).
Thus the dendrogram can be represented as an ultrametric space D = (V, S), which is again a metric space. The GH-distance between networks is then defined through the GH-distance between the corresponding dendrograms. Given two dendrograms D1 = (V, S1) and D2 = (V, S2) with SLMs S1 = (s1_ij) and S2 = (s2_ij),

D_GH(D1, D2) = max_{i,j} |s1_ij − s2_ij|. (19)

The GH-distance is thus the maximum of the single linkage distance differences (Lee et al., 2012); the SLD between two nodes i and j is the shortest distance between the two connected components that contain the nodes, and the edge giving the maximum SLM difference determines the GH-distance between the two networks. Among all topological distances, only the GH-distance uses single linkage clustering; all other distances, such as the bottleneck distance or the KS-distance, do not (Chung et al., 2017a; Lee et al., 2012).

Figure 18: Two topologically distinct graphs may have identical dendrograms, which results in zero GH-distance.

A limitation of the SLM is its inability to discriminate a cycle in a graph. Consider the two topologically different three-node graphs in Figure 18: their SLMs are identical. This lack of uniqueness of SLMs makes the GH-distance incapable of discriminating networks with cycles (Chung, 2012). For statistical inference on the GH-distance, resampling techniques such as the jackknife or permutation tests are often used (Lee et al., 2011a, 2012, 2017).
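A minimal sketch of the SLM and the GH-distance of (19), assuming the edge weights are dissimilarities so that the SLM entry s_ij is the minimax path cost (the threshold at which i and j merge into one cluster); the two 3-node weight matrices are illustrative and chosen to reproduce the degeneracy of Figure 18.

```python
import numpy as np

def slm(w):
    """Single linkage matrix: s_ij is the minimax path cost between i and j,
    i.e., the threshold at which they merge into one cluster (assuming
    dissimilarity edge weights)."""
    s = np.asarray(w, dtype=float).copy()
    p = len(s)
    for k in range(p):                 # Floyd-Warshall-style minimax closure
        for i in range(p):
            for j in range(p):
                s[i, j] = min(s[i, j], max(s[i, k], s[k, j]))
    np.fill_diagonal(s, 0.0)
    return s

def gh_distance(w1, w2):
    """GH-distance between dendrograms as in (19): max_ij |s1_ij - s2_ij|."""
    return np.max(np.abs(slm(w1) - slm(w2)))

# Two different 3-node networks with identical single linkage matrices
# give zero GH-distance, illustrating the limitation shown in Figure 18.
w_a = [[0.0, 0.2, 0.7], [0.2, 0.0, 0.5], [0.7, 0.5, 0.0]]
w_b = [[0.0, 0.2, 0.5], [0.2, 0.0, 0.5], [0.5, 0.5, 0.0]]
print(gh_distance(w_a, w_b))  # → 0.0
```

The heavy edge 0.7 in w_a is bypassed by the cheaper path through the middle node, so both networks yield the same SLM even though their weights differ.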
7.4. Kolmogorov-Smirnov distance

Many TDA distances are not scalable and may not be applicable to large brain networks at the voxel level without simplification of the underlying algebraic representations. Motivated by the need for a scalable topological distance, the Kolmogorov-Smirnov (KS) distance was introduced in Chung et al. (2013, 2015a, 2017b); Lee et al. (2017). The KS-distance is built on top of the graph filtration, a reduced Rips filtration on the 1-skeleton. Given two networks X1 = (V, w1) and X2 = (V, w2), the KS-distance between X1 and X2 is defined as

D_KS(X1, X2) = sup_{ε≥0} |β_i(X1_ε) − β_i(X2_ε)|

for the β0 or β1 curves of the graph filtrations X1_ε and X2_ε. The distance D_KS is motivated by the KS-test for determining the equivalence of two cumulative distribution functions in nonparametric statistics (Böhm and Hornik, 2010; Chung et al., 2017b; Gibbons and Chakraborti, 2011). The distance D_KS can be discretely approximated using a finite number of filtration values ε_1, ···, ε_q:

D_q = sup_{1≤j≤q} |β_i(X1_{ε_j}) − β_i(X2_{ε_j})|.
If we choose a large enough number q of filtration values such that the ε_j include all the sorted edge weights, then (Chung et al., 2017b)

D_KS(X1, X2) = D_q.

This is possible since there are only up to p(p − 1)/2 unique edge weights for p nodes, and the monotone function changes discretely, not continuously. In practice, the ε_j may be chosen uniformly at discrete intervals. The KS-distance treats the two networks in Figure 18 as identical if the Betti number β0 is used as the monotone feature function; to discriminate cycles, the KS-distance on β1 should be used. This example illustrates the need for new, more sophisticated network distances that can discriminate higher-order topological features beyond connected components. The advantage of the KS-distance is that it is easier to interpret than other, less intuitive distances from persistent homology. Due to its simplicity, it is possible to determine its probability distribution exactly (Chung et al., 2017b).
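A minimal sketch of the KS-distance on β0-curves: since the Betti curves only change at edge weights, evaluating the gap at every sorted edge weight of either network recovers the supremum exactly. The two 3-node networks are illustrative only.

```python
import numpy as np

def beta0(w, eps):
    """Number of connected components of the binary graph w > eps (union-find)."""
    A = np.asarray(w) > eps
    p = len(A)
    parent = list(range(p))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i in range(p):
        for j in range(i + 1, p):
            if A[i, j]:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(p)})

def ks_distance(w1, w2):
    """D_KS on beta0-curves, evaluated at every sorted edge weight."""
    w1, w2 = np.asarray(w1, float), np.asarray(w2, float)
    iu = np.triu_indices(len(w1), 1)
    eps_list = sorted(set(w1[iu]) | set(w2[iu]))
    return max(abs(beta0(w1, e) - beta0(w2, e)) for e in eps_list)

# A tightly connected vs. a weakly connected 3-node network.
w1 = [[0.0, 0.9, 0.9], [0.9, 0.0, 0.9], [0.9, 0.9, 0.0]]
w2 = [[0.0, 0.1, 0.1], [0.1, 0.0, 0.1], [0.1, 0.1, 0.0]]
print(ks_distance(w1, w2))  # → 2
```

At eps = 0.1 the first network is still one component while the second has fully disintegrated into three, giving the maximal β0 gap of 2.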
8. Topological inference and learning
It is difficult to directly apply existing statistical and machine learning methods to TDA features such as PDs and barcodes that do not have one-to-one correspondence across features. It is difficult to average such features without first transforming or smoothing them. Thus, there has been a persistent lack of the concept of statistical consistency, and of statistical methods, in the field. One may assume that resampling-based statistical inference with minimal statistical assumptions, such as the permutation test, may work. However, resampling directly on the Rips complex or other derived topological features may not be feasible for large-scale networks. Thus, there is a strong need to develop scalable statistical inference procedures tailored for TDA methods (Chung et al., 2017b, 2019b).

Most TDA features come with stability results showing that the distance between two topological features is bounded by some known distance (Cohen-Steiner et al., 2007; Adams et al., 2017). Such stability results do not necessarily provide the statistical consistency required for building proper test statistics for hypothesis testing in brain imaging (Bubenik et al., 2010). The test statistic T_n, which depends on the sample size n, is consistent if it converges to the true value T in probability as n goes to infinity. For T_n to be consistent, we need

lim_{n→∞} P(|T_n − T| > ε) = 0 for every ε > 0.

Most statistics, such as the sample mean and T-statistics, are consistent. Consistency guarantees the convergence of statistical results when enough samples are used. Existing stability results mostly concern the stability of TDA features, not the stability or consistency of test statistics built on such features. So additional investigation is needed to establish the consistency of statistics built on top of TDA features, which has rarely been done so far.

The easiest way to build consistent statistical procedures in TDA is to use the available topological distances with well-established stability results. Unfortunately, there are no known probability distributions for topological distances except the KS-distance (Chung et al., 2019b). The complicated algebraic forms of most TDA distances do not make building probability distributions on them any easier. The probability distribution of the KS-distance D_q under the null assumption of the equivalence of the two Betti curves β_i(X¹_{ε_j}) and β_i(X²_{ε_j}) is asymptotically given by (Chung et al., 2017b, 2019b)

lim_{q→∞} P(D_q/√(2q) ≥ d) = 2 Σ_{i=1}^∞ (−1)^{i−1} e^{−2i²d²}.   (20)

The p-value under the null is then computed as

p-value = 2e^{−2d_o²} − 2e^{−8d_o²} + 2e^{−18d_o²} − ⋯,

where the observed value d_o is the least integer greater than D_q/√(2q) in the data. For any large value d ≥ 2, the second term is on the order of 10⁻¹⁴ and insignificant. Even for small observed d, the expansion converges quickly. The KS-distance does not assume any statistical distribution. This is perhaps the only known distribution related to topological distances. The main advantage of the method is that it avoids the time-consuming permutation test for large-scale networks. For other distances, resampling techniques such as permutations and bootstraps are needed to determine their statistical distributions.

The permutation test (Fisher, 1966) is the most widely used nonparametric test procedure in brain imaging, and it can be easily adapted for inference on distance and loss functions. It is known as the only exact test since the distribution of the test statistic under the null hypothesis can be computed exactly by evaluating the test statistic over every possible relabeling. Unfortunately, even for modest sample sizes, the total number of permutations is astronomically large and only a small subset of permutations is used in approximating p-values. Thus, permutation tests are all approximate in practice.

The permutation test for two samples is done as follows (Chung et al., 2013; Efron, 1982; Lee et al., 2012). The two-sample setting is probably the one most often encountered in brain networks. Suppose there are m networks X_1, X_2, ⋯, X_m in Group I and n networks Y_1, Y_2, ⋯, Y_n in Group II. We are interested in testing if the networks in the two groups differ. Concatenate all the networks and reindex them as

Z = (Z_1, ⋯, Z_{m+n}) = (X_1, ⋯, X_m, Y_1, ⋯, Y_n).

Following Songdechakraiwut and Chung (2020b), we define the average between-group topological distance as

D_B(Z) = (1/(mn)) Σ_{i=1}^m Σ_{j=1}^n D(X_i, Y_j).

Any topological distance D we reviewed can be used for this purpose. A smaller distance D_B implies smaller group differences.
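The p-value series above converges so quickly that a few terms suffice. A minimal sketch (the truncation depth is our own choice):

```python
import math

def ks_pvalue(d, terms=10):
    """Asymptotic p-value 2 * sum_{i>=1} (-1)^(i-1) exp(-2 i^2 d^2),
    truncated after a few terms; the series converges very rapidly."""
    return 2 * sum((-1) ** (i - 1) * math.exp(-2 * i * i * d * d)
                   for i in range(1, terms + 1))
```

For d = 2, the leading term 2e⁻⁸ dominates and the second term is already negligible, consistent with the remark above.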
Thus, D_B can be used as the test statistic for differentiating the groups. For this, we need to determine the empirical distribution of D_B under the null hypothesis of no group difference. Under the null, the group labels are interchangeable and the sample space can be generated by permutations. The permutations are done as follows. We permute the indices of Z in the symmetric group of degree m + n, i.e., S_{m+n} (Kondor et al., 2007). The permuted networks are indexed as Z_σ, where σ ∈ S_{m+n} is the permutation. We then split Z_σ into two parts

Z_σ = (Z_{σ(1)}, ⋯, Z_{σ(m)}, Z_{σ(m+1)}, ⋯, Z_{σ(m+n)}).

For each permutation σ, we compute D_B(Z_σ). Theoretically, we can compute the distances over all of S_{m+n}, which gives the empirical estimate of the distribution of D_B. However, the number of permutations increases exponentially and it is impractical to generate every possible permutation, so up to tens of thousands of permutations are usually generated to approximate the null distribution. This is an approximate method and care should be taken to guarantee convergence. For faster convergence, we can use the transposition test, which iteratively computes the next permutation from the previous one in a sequential manner (Chung et al., 2019c; Songdechakraiwut and Chung, 2020b).

Many existing prediction and classification models of brain structure and function are based on a specific brain parcellation scheme. Summary measurements across parcellations are compared through various standard loss functions such as correlations (Huang et al., 2020b) and mutual information (Thirion et al., 2014). However, evidence has begun to emerge that brain functions do not correlate well with fixed parcellation boundaries (Rottschy et al., 2012; Eickhoff et al., 2016; Arslan et al., 2018; Kong et al., 2019).
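The two-sample permutation procedure described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: dist is a placeholder for any pairwise network distance, and random shuffles stand in for enumerating S_{m+n}:

```python
import random
from itertools import product

def permutation_pvalue(X, Y, dist, n_perm=1000, seed=0):
    """Approximate permutation p-value for the average between-group
    distance D_B; group labels are shuffled under the null."""
    m, n = len(X), len(Y)

    def d_between(Z):
        # D_B over the first m vs. the remaining n elements
        A, B = Z[:m], Z[m:]
        return sum(dist(a, b) for a, b in product(A, B)) / (m * n)

    Z = list(X) + list(Y)
    observed = d_between(Z)
    rng = random.Random(seed)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(Z)  # one random element of S_{m+n}
        if d_between(Z) >= observed:
            count += 1
    return observed, count / n_perm
```

Here each shuffle implicitly re-splits Z_σ into the first m and last n entries, exactly as in the split above.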
Many existing parcellation schemes give rise to conflicting topological structures over different scales (Lee et al., 2012; Chung et al., 2015a, 2018). The topological structure of a network at one parcellation scale may not carry over to different parcellation scales. For instance, the estimated modular structure usually does not have continuity over different resolution parameters or parcellations (Betzel and Bassett, 2017; Chung, 2018). Thus, there is a need to develop learning frameworks that provide a consistent and complete picture of brain anatomical structure and functional processes regardless of the choice of parcellation. One possible solution is to develop a learning framework across parcellations of different sizes and topology.

Existing prediction models mainly use regressions, such as general linear models (GLM), logistic regressions or mixed-effect models, that incorporate the accumulated effect of features as the sum of predictive variables in correlating cognitive scores (Zhang et al., 2019). Regression models might be reasonable for determining group-level average patterns. However, the underlying network features that matter most, and in what combinations, might simply be too complex to be discovered by regression-based predictive models. These challenges can be addressed through deep learning, which is methodologically well suited to such problems.

Due to the popularity of deep learning, there have been recent attempts to incorporate persistent homology into deep learning (Chen et al., 2019; Hu et al., 2019). Earlier attempts at incorporating topology into learning frameworks were not effective since they mainly fed topological features into classification frameworks (Guo et al., 2018; Garin and Tauzin, 2019). Instead of feeding scalar summaries of topological features, which may not be effective, more efficient approaches use loss and cost functions that explicitly incorporate topology in learning.
For various learning tasks such as clustering, classification and regression, traditional losses mostly based on the Euclidean distance are often used (Chung et al., 2019b). Such loss functions are well suited for measuring similarity and distance between homogeneous imaging data that match across subjects. For heterogeneous data such as brain sulcal patterns (Huang et al., 2020a) or functional activation patterns that may not have voxel-to-voxel correspondence across subjects (Desai et al., 2005), such loss functions are ill-suited or ineffective. Also, it is not straightforward to set up a loss function between brain parcellations of different sizes and topology. However, the topological distances we reviewed can easily handle such heterogeneous data of different sizes and topology.

In addition to the topological distances reviewed so far, we introduce an additional topological loss function in terms of barcodes (Edelsbrunner and Harer, 2010a; Songdechakraiwut and Chung, 2020a). The expensive optimization process involved in the Rips filtration takes O(d³) run-time for d data points (Edmonds and Karp, 1972; Kerber et al., 2017; Edelsbrunner and Harer, 2010a), making it impractical in brain networks with a far larger number of topological features involving hundreds of connected components and thousands of cycles. If we use the graph filtration instead of the Rips filtration, the optimization problem turns into an optimal matching problem with a significantly reduced run-time of O(d log d) (Songdechakraiwut and Chung, 2020a).

Consider a network G = (V, w) comprising a set of nodes V and unique positive symmetric edge weights w = (w_ij). The graph filtration is defined on G. Unlike the Rips complex, there are no higher-dimensional topological features to compute in a graph filtration beyond the 1D topology.
The 0D or 1D barcode corresponding to the network G is a collection of intervals [b_i, d_i] such that each interval tabulates the life-time of a connected component or a cycle that appears at the first filtration value b_i and vanishes at the last filtration value d_i.

The number of connected components β₀ is non-decreasing as ε increases, so we can represent the 0D barcode of the network using only the set of increasing birth values:

I₀(G) : b_1 < b_2 < ⋯ < b_{m₀}.

Similarly, the number of 1-cycles is monotonically decreasing, and thus we can represent the 1D barcode using only the set of increasing death values:

I₁(G) : d_1 < d_2 < ⋯ < d_{m₁}.

Increasing the filtration value ε can result in either the birth of a connected component or the death of a cycle, but not both at the same time. Therefore, the set of 0D birth values I₀(G) and 1D death values I₁(G) partition the edge weight set W such that W = I₀(G) ∪ I₁(G) with I₀(G) ∩ I₁(G) = ∅ (Songdechakraiwut and Chung, 2020a). This decomposition can be used to compute the topological loss efficiently.

A topological loss function that measures the topological similarity between two networks can be defined in terms of the 0D and 1D barcodes. Let G₁ = (V₁, w₁) and G₂ = (V₂, w₂) be networks of the same size. The topological loss L_top(G₁, G₂) is defined by the optimal matching cost

L_top(G₁, G₂) = min_τ Σ_{b ∈ I₀(G₁)} [b − τ(b)]² + Σ_{d ∈ I₁(G₁)} [d − τ(d)]²,

assuming the bijection τ : I₀(G₁) ∪ I₁(G₁) → I₀(G₂) ∪ I₁(G₂) factorizes as τ = τ₀ ⊕ τ₁ into bijections τ₀ : I₀(G₁) → I₀(G₂) and τ₁ : I₁(G₁) → I₁(G₂). L_top is a modified Wasserstein distance between the barcodes of the two networks. It can be shown that the optimal bijection τ₀ is given by matching the i-th smallest birth value in I₀(G₁) to the i-th smallest birth value in I₀(G₂) (Songdechakraiwut and Chung, 2020a).
Similarly, τ₁ is given by matching the i-th smallest death value in I₁(G₁) to the i-th smallest death value in I₁(G₂). For two networks of different sizes, any unmatched 0D birth or 1D death values in the larger network are matched to the largest or smallest edge weights in the smaller network, respectively.

A topological loss function can also be given by the Wasserstein distance between the persistence images of two datasets. Other topological losses are possible as well. In Brüel-Gabrielsson et al. (2019), a topological loss L_{PD_k} on the persistence diagram of k-cycles was introduced as

L_{PD_k}(p, q; i₀) = Σ_{i ≥ i₀} (d_i − b_i)^p ((d_i + b_i)/2)^q,

where (b_i, d_i) is the i-th barcode interval, indexed so that i denotes the i-th most persistent point, and i₀ is the cutoff index. The power p can be increased to more strongly penalize the most persistent features, and the power q serves to weight features that are prominent later in the filtration. We can use L_{PD_k} as a regularizer that penalizes the number of clusters or cycles in a natural way that is difficult to accomplish with more traditional analysis. L_{PD_k} was used in denoising the number of connected components of the MNIST image dataset (Brüel-Gabrielsson et al., 2019). Topological regularizers of this type enable us to use stochastic gradient descent to encourage specific topological features in deep learning tasks.

An important application of deep learning to brain networks is to uncover patterns in networks, as these patterns can reveal the underlying topological signal. However, much of the deep learning effort so far has been limited to using the data structure provided by the local geometry. For example, the various Euclidean loss functions often used in generative models as well as for regularization are defined with only knowledge of the data's local geometry. In contrast, topology encodes the global structure of the geometry of data.
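Under the graph filtration, the 0D birth set I₀(G) consists of the maximal spanning tree edge weights and I₁(G) of the remaining weights (Songdechakraiwut and Chung, 2020a), so the optimal matching for L_top reduces to sorting. The following is a minimal sketch for same-size networks, with illustrative function names and plain adjacency-matrix inputs:

```python
def barcode_sets(adj):
    """Split the edge weights of a network into 0D births I0 (maximal
    spanning tree weights via Kruskal) and 1D deaths I1 (the rest)."""
    p = len(adj)
    edges = sorted(((adj[i][j], i, j) for i in range(p)
                    for j in range(i + 1, p)), reverse=True)
    parent = list(range(p))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    births, deaths = [], []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:          # edge joins two components -> MST edge
            parent[ri] = rj
            births.append(w)
        else:                 # edge closes a cycle
            deaths.append(w)
    return sorted(births), sorted(deaths)

def top_loss(adj1, adj2):
    """L_top for two same-size networks: match i-th smallest births to
    i-th smallest births, deaths to deaths, and sum squared gaps."""
    b1, d1 = barcode_sets(adj1)
    b2, d2 = barcode_sets(adj2)
    return (sum((x - y) ** 2 for x, y in zip(b1, b2)) +
            sum((x - y) ** 2 for x, y in zip(d1, d2)))
```

The dominant cost is the edge sort, which gives the O(d log d) run-time quoted above.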
The purpose of topological learning for brain networks is to develop learning frameworks that exploit the topological information of networks explicitly.

Central to learning tasks such as regression, classification and clustering is the optimization of a loss function. Given observed data X_i with responses y_i, we fit a predictive model with parameters β̂ which enables us to make a prediction ŷ_i = f(β̂; X_i) for each observation. The loss function L assesses the goodness of fit of the model prediction to the observation, mostly through the Euclidean distance or matrix norms such as the sum of squared residuals

L(β) = Σ_{i=1}^n ( y_i − f(β̂; X_i) )².

However, such models are prone to over-fitting if there are more unknown parameters than observations. A regularization term or penalty P(β) is introduced to avoid such overfitting:

β̂ = arg min_β L(β) + λ P(β),

where λ is a tuning parameter that controls the contribution of the regularization term. Traditionally, common regularizers include the L1 and L2 loss functions; however, topological penalties have recently been introduced to penalize or favor specific topological features (Chen et al., 2019). In Songdechakraiwut and Chung (2020b), functional brain networks G_1, ⋯, G_n are regressed in the model

Θ̂ = arg min_Θ (1/n) Σ_{k=1}^n L_F(Θ, G_k) + λ L_top(Θ, T),

where the Frobenius norm L_F measures the goodness of fit between the network model Θ and the individual networks, and T = (V_T, w_T) is a network expressing prior topological knowledge such as the structural brain network on which the functional networks are overlaid. For sufficiently large λ, all the functional networks will be warped to the structural network T and there will be no topological differences among the functional networks.

Topological approaches are relatively unexplored in deep learning.
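The topological penalty L_{PD_k} of Brüel-Gabrielsson et al. (2019) given earlier can serve as the regularizer P(β) in this framework. A minimal sketch of the penalty itself; the function name and the 1-based cutoff convention are our own choices:

```python
def pd_loss(barcode, p=2, q=0, i0=1):
    """Topological penalty sum_{i >= i0} (d_i - b_i)^p * ((d_i + b_i)/2)^q,
    with points ordered by decreasing persistence d_i - b_i.
    i0 is 1-based: i0 = 2 leaves the most persistent feature unpenalized."""
    pts = sorted(barcode, key=lambda bd: bd[1] - bd[0], reverse=True)
    return sum((d - b) ** p * ((d + b) / 2.0) ** q for b, d in pts[i0 - 1:])
```

For example, with i0 = 2 the single most persistent component or cycle is kept free while all shorter-lived features are pushed toward the diagonal, which is the denoising behavior described above.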
Many past works incorporating topological features into convolutional neural networks do so implicitly and fail to identify which topological features are learned (Clough et al., 2019). Recently, more explicit topological losses constructed from persistent homology have seen more success in explicitly learning the underlying topology of data. Applications of topology to deep learning include the extraction of topological features for better learning (Pun et al., 2018; Cole et al., 2020), using differentiable properties of persistence to define topological losses (Clough et al., 2019; Edelsbrunner and Harer, 2010a; Songdechakraiwut and Chung, 2020a) and persistence images (Cole et al., 2020), and deep learning interpretability (Gabrielsson and Carlsson, 2019; Gabella, 2019).

In other applications, we can incorporate topological priors into the model or training data to improve the performance in learning tasks (Chen et al., 2019; Brüel-Gabrielsson et al., 2019). In image segmentation, standard loss functions are often constructed at the voxel level, ignoring higher-level structures (Clough et al., 2019). However, it may be necessary to enforce a global topological prior to avoid the additional image processing needed to correct topological defects in the segmentation results (Chung et al., 2015b).

Topology is also used in designing more robust deep learning models (Brüel-Gabrielsson et al., 2019). Adversarial attacks in machine learning try to deceive models by supplying deceptive input or outlying data (Huang et al., 2017). Machine learning techniques mainly assume that the training and test data are generated from the same statistical distributions. By designing input data that violates such statistical assumptions, we can fool the learning model. Neural networks are vulnerable to adversarial attacks which may be imperceptible to the human eye but can lead the model to misclassify the output (Chakraborty et al., 2018).
It has been speculated that models based on topological features are more robust against adversarial attacks (Gabrielsson and Carlsson, 2019; Carlsson and Gabrielsson, 2018). A topological layer can also be placed in neural networks to protect against adversarial attacks and enhance topological fidelity. In Brüel-Gabrielsson et al. (2019), it was shown that the success rate of adversarial attacks is much smaller on networks with a topological layer. Even though the concept of adversarial attacks has not been popular in brain network modeling, building deep learning models on brain networks that are robust against adversarial attacks would be highly useful.
9. Conclusion & Discussion
In this review paper, we surveyed various topological distance and loss functions based on persistent homology. We showed how such distance and loss functions can be used in a wide variety of learning applications, including regression, clustering and classification. Many TDA approaches can successfully differentiate topological network differences. However, it is often unclear where the difference is localized within the networks. In diagnostics-related problems, such as determining if a subject belongs to a particular clinical population, it is important to detect the topological differences; however, a more important question is localizing the source of the differences. Since there is no one-to-one correspondence between the topological features and the original data, it is often not possible to localize the signals, which has been the biggest limitation of TDA methods in brain imaging applications. Future development of TDA should be directed toward this important but very difficult question.

Compared to TDA methods, geometric methods are more adept at detecting localized signals in brain imaging. Figure 19 displays the sulcal and gyral trees obtained from the brain surface mesh (Huang et al., 2019b), where the trees are treated as heat sources with values +1 and −1, and diffusion is applied to produce a smooth map of the sulcal and gyral trees. Such smooth maps can be easily compared across different subjects. For instance, a two-sample t-statistic can be computed at each mesh vertex, localizing group differences. Such localized signal detection is difficult, if not impossible, with many existing TDA methods.

Figure 19: Top: Sulcal (blue) and gyral (red) patterns of two different subjects. The sulcal pattern can be represented as a forest, a collection of trees (Huang et al., 2020a). Since the nodes of the forests are the surface mesh vertices, the forest can be represented as about a 300000 × 1 vector of node values, smoothed with bandwidth 0.001 (Chung et al., 2008), where sulci are assigned value −1 and gyri are assigned value 1. The smoothing obviates the need for matching heterogeneous data structures, and comparisons of different sulcal patterns across subjects can be done at the mesh vertex level. Such a local geometric solution is not readily available for brain networks.

Persistent homology features are by definition global summary measures, and they might be more useful for tasks that do not involve identifying the source of signal differences. They might be more useful in discrete decision-making tasks such as clustering and classification. In fact, TDA has begun to be more useful in deep learning (Chen et al., 2019).

This paper reviewed how TDA can be used on static brain networks that do not change over time. For dynamically changing brain networks, it is not yet clear how TDA can be applied. Consider the dynamically changing correlation matrices of resting-state fMRI (Figure 20). For each correlation matrix, we can apply persistent homology and obtain a PD, which is a discrete representation of connectivity at one particular time point. It is unclear how to integrate such discrete topological representations over multiple time points in a continuous fashion.
Dynamic-TDA, as introduced in Songdechakraiwut and Chung (2020a), encodes rsfMRI as a time-ordered sequence of Rips complexes and their corresponding barcodes in studying dynamically changing topological patterns over time. Dynamic-TDA may provide some suggestions for extending TDA to time-varying brain networks. This is left as a promising future research direction.

Figure 20: Top: Dynamically changing correlation matrices computed from rs-fMRI using a sliding window of size 60 for a subject (Huang et al., 2019a). Bottom: The constructed correlation matrices are superimposed on top of the white matter fibers, which are colored based on correlations between parcellations. The structural brain network is more or less a tree-like structure without many loops or cycles, while the functional brain networks can have many loops and cycles. Thus, aligning functional brain networks on top of the structural brain network directly will not work. It requires a new topological approach for aligning topologically different networks.

Acknowledgements
This study was supported by NIH grants R01 EB022856 and R01 EB028753 and NSF grant MDS-2010778. We would like to thank Andery Gritsenko of Northeastern University for the computation of the largest cycle in a graph. We would also like to thank Alex Cole of the University of Amsterdam for the discussion of the witness complex. We thank Hyekyung Lee of Seoul National University Hospital for the discussion on sparse filtrations and kernel distances. We would like to thank Robin Henderson of Newcastle University, U.K. for providing the coordinates and connectivity information of the nodes of the binary tree used in Garside et al. (2020). We would like to thank Hill Goldsmith of the University of Wisconsin for providing the rsfMRI data of one subject used in the figure, which came from the study Huang et al. (2019a). We thank Taniguchi Masanobu of Waseda University for the discussion on canonical correlations.
References
Achard, S., Salvador, R., Whitcher, B., Suckling, J., Bullmore, E., 2006. A resilient, low-frequency, small-world human brain functional network with highly connected association cortical hubs. The Journal of Neuroscience 26, 63–72.
Adams, H., Emerson, T., Kirby, M., Neville, R., Peterson, C., Shipman, P., Chepushtanova, S., Hanson, E., Motta, F., Ziegelmeier, L., 2017. Persistence images: A stable vector representation of persistent homology. The Journal of Machine Learning Research 18, 218–252.
Adcock, A., Carlsson, E., Carlsson, G., 2013. The ring of algebraic functions on persistence bar codes. arXiv preprint arXiv:1304.0530.
Adler, R., Bobrowski, O., Borman, M., Subag, E., Weinberger, S., 2010. Persistent homology for random fields and complexes. In: Borrowing strength: theory powering applications – a Festschrift for Lawrence D. Brown. Institute of Mathematical Statistics, pp. 124–143.
Andrew, G., Arora, R., Bilmes, J., Livescu, K., 2013. Deep canonical correlation analysis. In: International conference on machine learning. PMLR, pp. 1247–1255.
Anirudh, R., Thiagarajan, J., Kim, I., Polonik, W., 2016. Autism spectrum disorder classification using graph kernels on multidimensional time series. arXiv preprint arXiv:1611.09897.
Antoine, J., Petra, A., Max, A., 1996. Evaluation of ridge seeking operators for multimodality medical image matching. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Arafat, N. A., Basu, D., Bressan, S., 2019. Topological data analysis with ε-net induced lazy witness complex. In: International Conference on Database and Expert Systems Applications. Springer, pp. 376–392.
Arslan, S., Ktena, S., Makropoulos, A., Robinson, E., Rueckert, D., Parisot, S., 2018. Human brain mapping: A systematic comparison of parcellation methods for the human cerebral cortex. NeuroImage 170, 5–30.
Arunkumar, N., Mohammed, M. A., Abd Ghani, M. K., Ibrahim, D. A., Abdulhay, E., Ramirez-Gonzalez, G., de Albuquerque, V. H. C., 2019.
K-means clustering and neural network for object detecting and identifying abnormality of brain tumor. Soft Computing 23 (19), 9083–9096.
Ashburner, J., Friston, K., 2000. Voxel-based morphometry - the methods. NeuroImage 11, 805–821.
Asllani, I., Borogovac, A., Brown, T. R., 2008. Regression algorithm correcting for partial volume effects in arterial spin labeling MRI. Magnetic Resonance in Medicine: An Official Journal of the International Society for Magnetic Resonance in Medicine 60 (6), 1362–1371.
Atienza, N., González-Díaz, R., Soriano-Trigueros, M., 2018. On the stability of persistent entropy and new summary functions for TDA. arXiv preprint arXiv:1803.08304.
Attali, D., Edelsbrunner, H., Mileyko, Y., 2007. Weak witnesses for Delaunay triangulations of submanifolds. In: Proceedings of the 2007 ACM symposium on Solid and physical modeling. pp. 143–150.
Avants, B., Cook, P., Ungar, L., Gee, J., Grossman, M., 2010. Dementia induces correlated reductions in white matter integrity and cortical thickness: a multivariate neuroimaging study with sparse canonical correlation analysis. NeuroImage 50, 1004–1016.
Babai, L., Luks, E., 1983. Canonical labeling of graphs. In: Proceedings of the fifteenth annual ACM symposium on Theory of computing. pp. 171–183.
Banks, D., Carley, K., 1994. Metric inference for social networks. Journal of Classification 11, 121–149.
Bassett, D., 2006. Small-world brain networks. The Neuroscientist 12, 512–523.
Bassett, D. S., Wymbs, N. F., Porter, M. A., Mucha, P. J., Grafton, S. T., 2014. Cross-linked structure of network evolution. Chaos: An Interdisciplinary Journal of Nonlinear Science 24 (1), 013112.
Bauer, U., Lesnick, M., 2013. Induced matchings and the algebraic stability of persistence barcodes. arXiv preprint arXiv:1311.3681.
Bendich, P., Marron, J., Miller, E., Pieloch, A., Skwerer, S., 2016. Persistent homology analysis of brain artery trees. The Annals of Applied Statistics 10, 198.
Betzel, R., Bassett, D., 2017. Multi-scale brain networks.
NeuroImage, in press.
Biagetti, M., Cole, A., Shiu, G., 2020. The Persistence of Large Scale Structures I: Primordial non-Gaussianity.
Bien, J., Tibshirani, R., 2011. Hierarchical clustering with prototypes via minimax linkage. Journal of the American Statistical Association 106, 1075–1084.
Biscio, C. A., Møller, J., 2019. The accumulated persistence function, a new useful functional summary statistic for topological data analysis, with a view to brain artery trees and spatial point process applications. Journal of Computational and Graphical Statistics 28 (3), 671–681.
Bishop, C. M., 2006. Pattern Recognition and Machine Learning. Springer.
Blessy, S., Sulochana, C. H., 2015. Performance analysis of unsupervised optimal fuzzy clustering algorithm for MRI brain tumor segmentation. Technology and Health Care 23 (1), 23–35.
Blumberg, A. J., Gal, I., Mandell, M. A., Pancia, M., 2014. Robust statistics, hypothesis testing, and confidence intervals for persistent homology on metric measure spaces. Foundations of Computational Mathematics 14 (4), 745–789.
Böhm, W., Hornik, K., 2010. A Kolmogorov-Smirnov test for r samples. Institute for Statistics and Mathematics Research Report Series, Report 105.
Boissonnat, J.-D., Devillers, O., Hornus, S., 2009a. Incremental construction of the Delaunay triangulation and the Delaunay graph in medium dimension. In: Proceedings of the twenty-fifth annual symposium on Computational geometry. pp. 208–216.
Boissonnat, J.-D., Guibas, L. J., Oudot, S. Y., 2009b. Manifold reconstruction in arbitrary dimensions using witness complexes. Discrete & Computational Geometry 42 (1), 37–70.
Bressan, S., Li, J., Ren, S., Wu, J., 2016. The embedded homology of hypergraphs and applications. arXiv preprint arXiv:1610.00890.
Brüel-Gabrielsson, R., Nelson, B. J., Dwaraknath, A., Skraba, P., Guibas, L. J., Carlsson, G., 2019. A topology layer for machine learning. arXiv preprint arXiv:1905.12200.
Bubenik, P., 2015.
Statistical topological data analysis using persistence landscapes. Journal of Machine Learning Research 16, 77–102.
Bubenik, P., 2020. The persistence landscape and some of its properties. In: Topological Data Analysis. Springer, pp. 97–117.
Bubenik, P., Carlsson, G., Kim, P., Luo, Z.-M., 2010. Statistical topology via Morse theory persistence and nonparametric estimation. Algebraic Methods in Statistics and Probability II 516, 75–92.
Bubenik, P., Hull, M., Patel, D., Whittle, B., 2020. Persistent homology detects curvature. Inverse Problems 36 (2), 025008.
Bullmore, E., Sporns, O., 2009. Complex brain networks: graph theoretical analysis of structural and functional systems. Nature Reviews Neuroscience 10, 186–198.
Carlsson, G., Gabrielsson, R. B., 2018. Topological approaches to deep learning. arXiv preprint arXiv:1811.01122.
Carlsson, G., Ishkhanov, T., De Silva, V., Zomorodian, A., 2008. On the local behavior of spaces of natural images. International Journal of Computer Vision 76, 1–12.
Carlsson, G., Memoli, F., 2008. Persistent clustering and a theorem of J. Kleinberg. arXiv preprint arXiv:0808.2241.
Carlsson, G., Mémoli, F., 2010. Characterization, stability and convergence of hierarchical clustering methods. Journal of Machine Learning Research 11, 1425–1470.
Cassidy, B., Rae, C., Solo, V., 2015. Brain activity: conditional dissimilarity and persistent homology. In: IEEE 12th International Symposium on Biomedical Imaging (ISBI). pp. 1356–1359.
Chakraborty, A., Alam, M., Dey, V., Chattopadhyay, A., Mukhopadhyay, D., 2018. Adversarial attacks and defences: A survey. arXiv preprint arXiv:1810.00069.
Chan, G., Wood, A., 1997. Algorithm AS 312: An algorithm for simulating stationary Gaussian random fields. Journal of the Royal Statistical Society: Series C 46, 171–181.
Chan, J. M., Carlsson, G., Rabadan, R., 2013. Topology of viral evolution. Proceedings of the National Academy of Sciences 110 (46), 18566–18571.
Chaudhuri, K., Kakade, S., Livescu, K., Sridharan, K., 2009.
Multi-view clustering via canonical correlation analysis. In: Proceedings of the 26th annual international conference on machine learning. pp. 129–136.
Chazal, F., Cohen-Steiner, D., Guibas, L., Mémoli, F., Oudot, S., 2009. Gromov-Hausdorff stable signatures for shapes using persistence. In: Computer Graphics Forum. Vol. 28. pp. 1393–1403.
Chazal, F., Cohen-Steiner, D., Mérigot, Q., 2011. Geometric inference for probability measures. Foundations of Computational Mathematics 11 (6), 733–751.
Chazal, F., Michel, B., 2017. An introduction to topological data analysis: fundamental and practical aspects for data scientists. arXiv preprint arXiv:1710.04019.
Chen, C., Ni, X., Bai, Q., Wang, Y., 2019. A topological regularizer for classifiers via persistent homology. In: The 22nd International Conference on Artificial Intelligence and Statistics. PMLR, pp. 2573–2582.
Chen, W., Cockrell, C., Ward, K. R., Najarian, K., 2010. Intracranial pressure level prediction in traumatic brain injury by extracting features from multiple sources and using machine learning methods. In: 2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE, pp. 510–515.
Chen, X., Zhang, H., Gao, Y., Wee, C.-Y., Li, G., Shen, D., 2016. High-order resting-state functional connectivity network for MCI classification. Human Brain Mapping 37, 3282–3296.
Chung, F., Graham, R., 1992. Cohomological aspects of hypergraphs. Transactions of the American Mathematical Society 334 (1), 365–388.
Chung, M., 2012. Computational Neuroanatomy: The Methods. World Scientific, Singapore.
Chung, M., 2018. Statistical challenges of big brain network data. Statistics and Probability Letters 136, 78–82.
Chung, M., Adluru, N., Dalton, K., Alexander, A., Davidson, R., 2011. Scalable brain network construction on white matter fibers. In: Proc. of SPIE. Vol. 7962. p. 79624G.
Chung, M., Bubenik, P., Kim, P., 2009a. Persistence diagrams of cortical surface data.
Proceedings of the 21st International Conference on Information Processing in Medical Imaging (IPMI), Lecture Notes in Computer Science (LNCS) 5636, 386–397.
Chung, M., Dalton, K., Davidson, R., 2008. Tensor-based cortical surface morphometry via weighted spherical harmonic representation. IEEE Transactions on Medical Imaging 27, 1143–1151.
Chung, M., Hanson, J., Lee, H., Adluru, N., Alexander, A. L., Davidson, R., Pollak, S., 2013. Persistent homological sparse network approach to detecting white matter abnormality in maltreated children: MRI and DTI multimodal study. MICCAI, Lecture Notes in Computer Science (LNCS) 8149, 300–307.
Chung, M., Hanson, J., Ye, J., Davidson, R., Pollak, S., 2015a. Persistent homology in sparse regression and its application to brain morphometry. IEEE Transactions on Medical Imaging 34, 1928–1939.
Chung, M., Huang, S.-G., Gritsenko, A., Shen, L., Lee, H., 2019a. Statistical inference on the number of cycles in brain networks. In: 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019). IEEE, pp. 113–116.
Chung, M., Lee, H., DiChristofano, A., Ombao, H., Solo, V., 2019b. Exact topological inference of the resting-state brain networks in twins. Network Neuroscience 3, 674–694.
Chung, M., Lee, H., Solo, V., Davidson, R., Pollak, S., 2017a. Topological distances between brain networks. International Workshop on Connectomics in Neuroimaging, 161–170.
Chung, M., Qiu, A., Seo, S., Vorperian, H., 2015b. Unified heat kernel regression for diffusion, kernel smoothing and wavelets on manifolds and its application to mandible growth modeling in CT images. Medical Image Analysis 22, 63–76.
Chung, M., Singh, V., Kim, P., Dalton, K., Davidson, R., 2009b. Topological characterization of signal in brain images using min-max diagrams. MICCAI, Lecture Notes in Computer Science (LNCS) 5762, 158–166.
Chung, M., Vilalta-Gil, V., Lee, H., Rathouz, P., Lahey, B., Zald, D., 2017b. Exact topological inference for paired brain networks via persistent homology. Information Processing in Medical Imaging (IPMI), Lecture Notes in Computer Science (LNCS) 10265, 299–310.
Chung, M., Wang, Y., Wu, G., 2018. Heat kernel smoothing in irregular image domains. International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 5101–5104.
Chung, M., Xie, L., Huang, S.-G., Wang, Y., Yan, J., Shen, L., 2019c. Rapid acceleration of the permutation test via transpositions. In: International Workshop on Connectomics in Neuroimaging. Vol. 11848. Springer, pp. 42–53.
Clough, J., Oksuz, I., Byrne, N., Schnabel, J., King, A., 2019. Explicit topological priors for deep-learning based image segmentation using persistent homology. In: International Conference on Information Processing in Medical Imaging. Springer, pp. 16–28.
Cohen-Steiner, D., Edelsbrunner, H., Harer, J., 2007. Stability of persistence diagrams. Discrete and Computational Geometry 37, 103–120.
Cole, A., 2020. Identifying and exploiting structure in cosmological and string theoretic data. Ph.D. Thesis, University of Wisconsin-Madison.
Cole, A., Loges, G. J., Shiu, G., 2020. Quantitative and Interpretable Order Parameters for Phase Transitions from Persistent Homology.
Cole, A., Shiu, G., 2019. Topological data analysis for the string landscape. Journal of High Energy Physics 2019 (3), 54.
Cole, J. H., Poudel, R. P., Tsagkrasoulis, D., Caan, M. W., Steves, C., Spector, T. D., Montana, G., 2017. Predicting brain age with deep learning from raw imaging data results in a reliable and heritable biomarker. NeuroImage 163, 115–124.
Cootes, T., Hill, A., Taylor, C., Haslam, J., 1993. The use of active shape models for locating structures in medical images. Lecture Notes in Computer Science 687, 33–33.
Correa, N., Li, Y.-O., Adali, T., Calhoun, V., 2009.
Fusion of fMRI, sMRI, and EEG data using canonical correlation analysis. In: 2009 IEEE International Conference on Acoustics, Speech and Signal Processing. pp. 385–388.
Dabaghian, Y., Mémoli, F., Frank, L., Carlsson, G., 2012. A topological paradigm for hippocampal spatial map formation using persistent homology. PLoS Computational Biology 8 (8), e1002581.
Davison, E. N., Schlesinger, K. J., Bassett, D. S., Lynall, M.-E., Miller, M. B., Grafton, S. T., Carlson, J. M., 2015. Brain network adaptability across task states. PLoS Computational Biology 11 (1), e1004029.
Davison, E. N., Turner, B. O., Schlesinger, K. J., Miller, M. B., Grafton, S. T., Bassett, D. S., Carlson, J. M., 2016. Individual differences in dynamic functional brain connectivity across the human lifespan. PLoS Computational Biology 12 (11), e1005178.
de Silva, V., Carlsson, G., 2004. Topological estimation using witness complexes. SPBG 4, 157–166.
de Silva, V., Ghrist, R., 2007. Homological sensor networks. Notices of the American Mathematical Society 54, 10–17.
Desai, R., Liebenthal, E., Possing, E., Waldron, E., Binder, J., 2005. Volumetric vs. surface-based alignment for localization of auditory cortex activation. NeuroImage 26 (4), 1019–1029.
Divol, V., Polonik, W., 2019. On the choice of weight functions for linear representations of persistence diagrams. Journal of Applied and Computational Topology 3 (3), 249–283.
Dong, P., Guo, Y., Shen, D., Wu, G., 2015. Multi-atlas and multi-modal hippocampus segmentation for infant MR brain images by propagating anatomical labels on hypergraph. In: International Workshop on Patch-based Techniques in Medical Imaging. Springer, pp. 188–196.
Edelsbrunner, H., Harer, J., 2008. Persistent homology - a survey. Contemporary Mathematics 453, 257–282.
Edelsbrunner, H., Harer, J., 2010a. Computational Topology. American Mathematical Society.
Edelsbrunner, H., Harer, J., 2010b. Computational Topology: An Introduction. American Mathematical Society.
Edelsbrunner, H., Letscher, D., Zomorodian, A., 2002. Topological persistence and simplification. Discrete and Computational Geometry 28, 511–533.
Edelsbrunner, H., Mücke, E. P., 1994. Three-dimensional alpha shapes. ACM Transactions on Graphics (TOG) 13 (1), 43–72.
Edmonds, J., Karp, R. M., 1972. Theoretical improvements in algorithmic efficiency for network flow problems. Journal of the ACM (JACM) 19 (2), 248–264.
Efron, B., 1982. The Jackknife, the Bootstrap and Other Resampling Plans. Vol. 38. SIAM.
Eickhoff, S., Nichols, T., Laird, A., Hoffstaedter, F., Amunts, K., Fox, P., Bzdok, D., Eickhoff, C., 2016. Behavior, sensitivity, and power of activation likelihood estimation characterized by massive empirical simulation. NeuroImage 137, 70–85.
Emtander, E., 2009. Betti numbers of hypergraphs. Communications in Algebra 37 (5), 1545–1571.
Estrada, E., Rodriguez-Velazquez, J. A., 2005. Complex networks as hypergraphs. arXiv preprint.
Gu, S., Yang, M., Medaglia, J. D., Gur, R. C., Gur, R. E., Satterthwaite, T. D., Bassett, D. S., 2017. Functional hypergraph uncovers novel covariant structures over neurodevelopment. Human Brain Mapping 38 (8), 3823–3835.
Guibas, L., Oudot, S., 2008. Reconstruction using witness complexes. Discrete & Computational Geometry 40, 325–356.
Guo, W., Manohar, K., Brunton, S., Banerjee, A., 2018. Sparse-TDA: Sparse realization of topological data analysis for multi-way classification. IEEE Transactions on Knowledge and Data Engineering 30, 1403–1408.
Guo, X., Srivastava, A., 2020. Representations, metrics and statistics for shape analysis of elastic graphs. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. pp. 832–833.
Harker, S., Mischaikow, K., Mrozek, M., Nanda, V., Wagner, H., Juda, M., Dlotko, P., 2010. The efficiency of a homology algorithm based on discrete Morse theory and coreductions. Image-A: Applicable Mathematics in Image Engineering 1 (1), 41–48.
Hart, J., 1999. Computational topology for shape modeling.
In: Proceedings of the International Conference on Shape Modeling and Applications. pp. 36–43.
He, Y., Chen, Z., Evans, A., 2008. Structural insights into aberrant topological patterns of large-scale cortical networks in Alzheimer's disease. Journal of Neuroscience 28, 4756.
Heinsfeld, A. S., Franco, A. R., Craddock, R. C., Buchweitz, A., Meneguzzi, F., 2018. Identification of autism spectrum disorder using deep learning and the ABIDE dataset. NeuroImage: Clinical 17, 16–23.
Heo, G., Gamble, J., Kim, P., 2012. Topological analysis of variance and the maxillary complex. Journal of the American Statistical Association 107, 477–492.
Hofer, C., Graf, F., Rieck, B., Niethammer, M., Kwitt, R., 2020. Graph filtration learning. In: International Conference on Machine Learning. PMLR, pp. 4314–4323.
Hofer, C. D., Kwitt, R., Niethammer, M., 2019. Learning representations of persistence barcodes. Journal of Machine Learning Research 20 (126), 1–45.
Horak, D., Maletić, S., Rajković, M., 2009. Persistent homology of complex networks. Journal of Statistical Mechanics: Theory and Experiment 2009, P03034.
Hotelling, H., 1992. Relations between two sets of variates. In: Breakthroughs in Statistics. Springer, pp. 162–190.
Hu, X., Li, F., Samaras, D., Chen, C., 2019. Topology-preserving deep image segmentation. In: Advances in Neural Information Processing Systems. pp. 5657–5668.
Huang, S., Papernot, N., Goodfellow, I., Duan, Y., Abbeel, P., 2017. Adversarial attacks on neural network policies. arXiv preprint arXiv:1702.02284.
Huang, S.-G., Chung, M. K., Carroll, I. C., Goldsmith, H. H., 2019a. Dynamic functional connectivity using heat kernel. In: 2019 IEEE Data Science Workshop (DSW). pp. 222–226.
Huang, S.-G., Lyu, I., Qiu, A., Chung, M., 2019b. Fast polynomial approximation to heat diffusion in manifolds. MICCAI 11767, 48–56.
Huang, S.-G., Lyu, I., Qiu, A., Chung, M., 2020a. Fast polynomial approximation of heat kernel convolution on manifolds and its application to brain sulcal and gyral graph pattern analysis. IEEE Transactions on Medical Imaging 39, 2201–2212.
Huang, S.-G., Samdin, S.-T., Ting, C., Ombao, H., Chung, M., 2020b. Statistical model for dynamically-changing correlation matrices with application to brain connectivity. Journal of Neuroscience Methods 331, 108480.
Jain, A. K., Murty, M. N., Flynn, P. J., 1999. Data clustering: a review. ACM Computing Surveys (CSUR) 31 (3), 264–323.
Jie, B., Wee, C.-Y., Shen, D., Zhang, D., 2016. Hyper-connectivity of functional networks for brain disease diagnosis. Medical Image Analysis 32, 84–100.
Jnawali, K., Arbabshirani, M. R., Rao, N., Patel, A. A., 2018. Deep 3D convolution neural network for CT brain hemorrhage classification. In: Medical Imaging 2018: Computer-Aided Diagnosis. Vol. 10575. International Society for Optics and Photonics, p. 105751C.
Kališnik, S., 2019. Tropical coordinates on the space of persistence barcodes. Foundations of Computational Mathematics 19 (1), 101–129.
Kerber, M., Morozov, D., Nigmetov, A., 2017. Geometry helps to compare persistence diagrams. Journal of Experimental Algorithmics (JEA) 22, 1–20.
Khalid, A., Kim, B., Chung, M., Ye, J., Jeon, D., 2014. Tracing the evolution of multi-scale functional networks in a mouse model of depression using persistent brain network homology. NeuroImage 101, 351–363.
Kiebel, S., Poline, J.-P., Friston, K., Holmes, A., Worsley, K., 1999. Robust smoothness estimation in statistical parametric maps using standardized residuals from the general linear model. NeuroImage 10, 756–766.
Kim, K. H., Choi, S. H., Park, S.-H., 2018. Improving arterial spin labeling by using deep learning. Radiology 287 (2), 658–666.
Kim, W., Adluru, N., Chung, M., Okonkwo, O., Johnson, S., Bendlin, B., Singh, V., 2015. Multi-resolution statistical analysis of brain connectivity graphs in preclinical Alzheimer's disease. NeuroImage 118, 103–117.
Kondor, R., Howard, A., Jebara, T., 2007. Multi-object tracking with representations of the symmetric group.
In: International Conference on Artificial Intelligence and Statistics (AISTATS). Vol. 1. p. 5.
Kong, R., Gao, J., Xu, Y., Pan, Y., Wang, J., Liu, J., 2019. Classification of autism spectrum disorder by combining brain connectivity and deep neural network classifier. Neurocomputing 324, 63–68.
Kottaimalai, R., Rajasekaran, M. P., Selvam, V., Kannapiran, B., 2013. EEG signal classification using principal component analysis with neural network in brain computer interface applications. In: 2013 IEEE International Conference on Emerging Trends in Computing, Communication and Nanotechnology (ICECCN). IEEE, pp. 227–231.
Lee, H., Chung, M. K., Kang, H., Lee, D., 2014. Hole detection in metabolic connectivity of Alzheimer's disease using k-Laplacian. In: International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), Lecture Notes in Computer Science. pp. 297–304.
Lee, H., Chung, M., Choi, H., Kang, H., Ha, S., Kim, Y., Lee, D., 2019. Harmonic holes as the submodules of brain network and network dissimilarity. International Workshop on Computational Topology in Image Context, Lecture Notes in Computer Science, 110–122.
Lee, H., Chung, M., Kang, H., Choi, H., Kim, Y., Lee, D., 2018. Abnormal hole detection in brain connectivity by kernel density of persistence diagram and Hodge Laplacian. In: IEEE International Symposium on Biomedical Imaging (ISBI). pp. 20–23.
Lee, H., Chung, M., Kang, H., Kim, B.-N., Lee, D., 2011a. Computing the shape of brain networks using graph filtration and Gromov-Hausdorff metric. MICCAI, Lecture Notes in Computer Science 6892, 302–309.
Lee, H., Chung, M., Kang, H., Kim, B.-N., Lee, D., 2011b. Discriminative persistent homology of brain networks. In: IEEE International Symposium on Biomedical Imaging (ISBI). pp. 841–844.
Lee, H., Kang, H., Chung, M., Kim, B.-N., Lee, D., 2012. Persistent brain network homology from the perspective of dendrogram. IEEE Transactions on Medical Imaging 31, 2267–2277.
Lee, H., Kang, H., Chung, M., Lim, S., Kim, B.-N., Lee, D., 2017. Integrated multimodal network approach to PET and MRI based on multidimensional persistent homology. Human Brain Mapping 38, 1387–1402.
Lee, H., Lee, D., Kang, H., Kim, B.-N., Chung, M., 2011c. Sparse brain network recovery under compressed sensing. IEEE Transactions on Medical Imaging 30, 1154–1165.
Li, Y., Wang, D., Ascoli, G., Mitra, P., Wang, Y., 2017. Metrics for comparing neuronal tree shapes based on persistent homology. PLoS One 12 (8), e0182184.
Lloyd, S., Garnerone, S., Zanardi, P., 2016. Quantum algorithms for topological and geometric analysis of data. Nature Communications 7 (1), 1–7.
Mazumder, R., Hastie, T., 2012. Exact covariance thresholding into connected components for large-scale graphical LASSO. The Journal of Machine Learning Research 13, 781–794.
Merelli, E., Piangerelli, M., Rucco, M., Toller, D., 2016. A topological approach for multivariate time series characterization: the epileptic brain. In: Proceedings of the 9th EAI International Conference on Bio-inspired Information and Communications Technologies (formerly BIONETICS). pp. 201–204.
Merelli, E., Rucco, M., Sloot, P., Tesei, L., 2015. Topological characterization of complex systems: Using persistent entropy. Entropy 17 (10), 6872–6892.
Milnor, J., 1973. Morse Theory. Princeton University Press.
Mischaikow, K., Nanda, V., 2013. Morse theory for filtrations and efficient computation of persistent homology. Discrete & Computational Geometry 50 (2), 330–353.
Mittal, M., Goyal, L. M., Kaur, S., Kaur, I., Verma, A., Hemanth, D. J., 2019. Deep learning based enhanced tumor segmentation approach for MR brain images. Applied Soft Computing 78, 346–354.
Moeskops, P., Veta, M., Lafarge, M. W., Eppenhof, K. A., Pluim, J. P., 2017. Adversarial training and dilated convolutions for brain MRI segmentation. In: Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support.
Springer, pp. 56–64.
Muhammad, A., Egerstedt, M., 2006. Control using higher order Laplacians in network topologies. In: Proc. of 17th International Symposium on Mathematical Theory of Networks and Systems. pp. 1024–1038.
Osher, S., Fedkiw, R., 2003. Level Set Methods and Dynamic Implicit Surfaces. Springer Verlag.
O'Sullivan, F., Saha, A., 1999. Use of ridge regression for improved estimation of kinetic constants from PET data. IEEE Transactions on Medical Imaging 18 (2), 115–125.
Otter, N., Porter, M., Tillmann, U., Grindrod, P., Harrington, H., 2017. A roadmap for the computation of persistent homology. EPJ Data Science 6 (1), 17.
Pachauri, D., Hinrichs, C., Chung, M., Johnson, S., Singh, V., 2011. Topology based kernels with application to inference problems in Alzheimer's disease. IEEE Transactions on Medical Imaging, 1760–1770.
Palande, S., Jose, V., Zielinski, B., Anderson, J., Fletcher, P., Wang, B., 2017. Revisiting abnormalities in brain network architecture underlying autism using topology-inspired statistical inference. In: International Workshop on Connectomics in Neuroimaging. pp. 98–107.
Petri, G., Expert, P., Turkheimer, F., Carhart-Harris, R., Nutt, D., Hellyer, P., Vaccarino, F., 2014. Homological scaffolds of brain functional networks. Journal of The Royal Society Interface 11, 20140873.
Petri, G., Scolamiero, M., Donato, I., Vaccarino, F., 2013. Topological strata of weighted complex networks. PLoS One 8, e66506.
Pothen, A., Fan, C., 1990. Computing the block triangular form of a sparse matrix. ACM Transactions on Mathematical Software (TOMS) 16, 324.
Pun, C. S., Xia, K., Lee, S. X., 2018. Persistent-homology-based machine learning and its applications: a survey. arXiv preprint arXiv:1811.00252.
Qiu, A., Lee, A., Tan, M., Chung, M., 2015. Manifold learning on brain functional networks in aging. Medical Image Analysis 20, 52–60.
Rathore, A., Palande, S., Anderson, J. S., Zielinski, B. A., Fletcher, P. T., Wang, B., 2019. Autism classification using topological features and deep learning: A cautionary tale. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, pp. 736–744.
Reininghaus, J., Huber, S., Bauer, U., Kwitt, R., 2015. A stable multi-scale kernel for topological machine learning. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 4741–4748.
Rottschy, C., Langner, R., Dogan, I., Reetz, K., Laird, A., Schulz, J., Fox, P., Eickhoff, S., 2012. Modelling neural correlates of working memory: a coordinate-based meta-analysis. NeuroImage 60, 830–846.
Rubinov, M., Knock, S. A., Stam, C. J., Micheloyannis, S., Harris, A. W., Williams, L. M., Breakspear, M., 2009. Small-world properties of nonlinear brain activity in schizophrenia. Human Brain Mapping 30, 403–416.
Rubinov, M., Sporns, O., 2010. Complex network measures of brain connectivity: Uses and interpretations. NeuroImage 52, 1059–1069.
Rucco, M., Castiglione, F., Merelli, E., Pettini, M., 2016. Characterisation of the idiotypic immune network through persistent entropy. In: Proceedings of ECCS 2014. Springer, pp. 117–128.
Salvador, R., Suckling, J., Coleman, M. R., Pickard, J. D., Menon, D., Bullmore, E., 2005. Neurophysiological architecture of functional magnetic resonance images of human brain. Cerebral Cortex 15, 1332–1342.
Sato, Y., Nakajima, S., Shiraga, N., Atsumi, H., Yoshida, S., Koller, T., Gerig, G., Kikinis, R., 1998. Three-dimensional multi-scale line filter for segmentation and visualization of curvilinear structures in medical images. Medical Image Analysis 2, 143–168.
Schaub, M., Benson, A., Horn, P., Lippner, G., Jadbabaie, A., 2018. Random walks on simplicial complexes and the normalized Hodge Laplacian. arXiv preprint arXiv:1807.05044.
Sheehy, D. R., 2013. Linear-size approximations to the Vietoris-Rips filtration. Discrete & Computational Geometry 49 (4), 778–796.
Si, H., 2015.
TetGen, a Delaunay-based quality tetrahedral mesh generator. ACM Transactions on Mathematical Software (TOMS) 41, 1–36.
Singh, G., Mémoli, F., Ishkhanov, T., Sapiro, G., Carlsson, G., Ringach, D., 2008. Topological analysis of population activity in visual cortex. Journal of Vision 8, 1–18.
Solo, V., Poline, J., Lindquist, M., Simpson, S., Bowman, D., Chung, M., Cassidy, B., 2018. Connectivity in fMRI: a review and preview. IEEE Transactions on Medical Imaging, in press.
Songdechakraiwut, T., Chung, M., 2020a. Dynamic topological data analysis for functional brain signals. IEEE International Symposium on Biomedical Imaging Workshops, 1–4.
Songdechakraiwut, T., Chung, M., 2020b. Topological learning for brain networks. arXiv preprint arXiv:2012.00675.
Stolz, B., Emerson, T., Nahkuri, S., Porter, M., Harrington, H., 2018. Topological data analysis of task-based fMRI data from experiments on schizophrenia. arXiv preprint arXiv:1809.08504.
Stolz, B., Harrington, H., Porter, M., 2017. Persistent homology of time-dependent functional networks constructed from coupled time series. Chaos: An Interdisciplinary Journal of Nonlinear Science 27, 047410.
Suk, H.-I., Lee, S.-W., Shen, D., Alzheimer's Disease Neuroimaging Initiative, 2017. Deep ensemble learning of sparse regression models for brain disease diagnosis. Medical Image Analysis 37, 101–113.
Supekar, K., Menon, V., Rubin, D., Musen, M., Greicius, M., 2008. Network analysis of intrinsic functional brain connectivity in Alzheimer's disease. PLoS Computational Biology 4 (6), e1000100.
Taylor, J., Worsley, K., 2008. Random fields of multivariate test statistics, with applications to shape analysis. Annals of Statistics 36, 1–27.
Thirion, B., Varoquaux, G., Dohmatob, E., Poline, J.-B., 2014. Which fMRI clustering gives good brain parcellations? Frontiers in Neuroscience 8, 167.
Topaz, C., Ziegelmeier, L., Halverson, T., 2015. Topological data analysis of biological aggregation models. PLoS One, e0126383.
Torres, L., Blevins, A. S., Bassett, D. S., Eliassi-Rad, T., 2020. The why, how, and when of representations for complex systems. arXiv preprint arXiv:2006.02870.
Turner, K., Mileyko, Y., Mukherjee, S., Harer, J., 2014. Fréchet means for distributions of persistence diagrams. Discrete & Computational Geometry 52 (1), 44–70.
Tuzhilin, A., 2016. Who invented the Gromov-Hausdorff distance? arXiv preprint arXiv:1612.00728.
Uddin, L., Kelly, A., Biswal, B., Margulies, D., Shehzad, Z., Shaw, D., Ghaffari, M., Rotrosen, J., Adler, L., Castellanos, F., Milham, M., 2008. Network homogeneity reveals decreased integrity of default-mode network in ADHD. Journal of Neuroscience Methods 169, 249–254.
Wang, G., Wang, Y., Alzheimer's Disease Neuroimaging Initiative, 2017a. Towards a holistic cortical thickness descriptor: heat kernel-based grey matter morphology signatures. NeuroImage 147, 360–380.
Wang, H., Nie, F., Huang, H., Risacher, S., Ding, C., Saykin, A. J., Shen, L., 2011. Sparse multi-task regression and feature selection to identify brain imaging predictors for memory performance. In: 2011 International Conference on Computer Vision. IEEE, pp. 557–562.
Wang, Y., Chung, M., Dentico, D., Lutz, A., Davidson, R., 2017b. Topological network analysis of electroencephalographic power maps. In: International Workshop on Connectomics in NeuroImaging, Lecture Notes in Computer Science (LNCS). Vol. 10511. pp. 134–142.
Wang, Y., Ombao, H., Chung, M., 2018. Topological data analysis of single-trial electroencephalographic signals. Annals of Applied Statistics 12, 1506–1534.
Wang, Y., Ombao, H., Chung, M., 2019. Statistical persistent homology of brain signals. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 1125–1129.
Wijk, B. C. M., Stam, C. J., Daffertshofer, A., 2010. Comparing brain networks of different size and connectivity density using graph theory. PLoS One 5, e13701.
Wolterink, J. M., Dinkla, A. M., Savenije, M. H., Seevinck, P. R., van den Berg, C. A., Išgum, I., 2017.
Deep MR to CT synthesis using unpaired data. In: International Workshop on Simulation and Synthesis in Medical Imaging. Springer, pp. 14–23.
Wong, E., Palande, S., Wang, B., Zielinski, B., Anderson, J., Fletcher, P., 2016. Kernel partial least squares regression for relating functional brain network topology to clinical measures of behavior. In: IEEE International Symposium on Biomedical Imaging (ISBI). pp. 1303–1306.
Worsley, K., Marrett, S., Neelin, P., Vandal, A., Friston, K., Evans, A., 1996. A unified statistical approach for determining significant signals in images of cerebral activation. Human Brain Mapping 4, 58–73.
Xu, X., Cisewski-Kehe, J., Green, S. B., Nagai, D., 2019. Finding cosmic voids and filament loops using topological data analysis. Astronomy and Computing 27, 34–52.
Yoo, K., Lee, P., Chung, M., Sohn, W., Chung, S., Na, D., Ju, D., Jeong, Y., 2017. Degree-based statistic and center persistency for brain connectivity analysis. Human Brain Mapping 38, 165–181.
Zavlanos, M., Pappas, G., 2008. A dynamical systems approach to weighted graph matching. Automatica 44, 2817–2824.
Zhang, G., Cai, B., Zhang, A., Stephen, J., Wilson, T., Calhoun, V., Wang, Y.-P., 2019. Estimating dynamic functional brain connectivity with a sparse hidden Markov model. IEEE Transactions on Medical Imaging 39, 488–498.
Zhang, Y. C., Kagen, A. C., 2017. Machine learning interface for medical image analysis. Journal of Digital Imaging 30 (5), 615–621.
Zhu, X., Suk, H.-I., Shen, D., 2014. Matrix-similarity based loss function and feature selection for Alzheimer's disease diagnosis. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 3089–3096.
Zomorodian, A., 2009. Topology for Computing. Vol. 16 of Cambridge Monographs on Applied and Computational Mathematics. Cambridge University Press, Cambridge.
Zomorodian, A., Carlsson, G., 2005. Computing persistent homology. Discrete and Computational Geometry 33, 249–274.
Zu, C., Gao, Y., Munsell, B., Kim, M., Peng, Z., Zhu, Y., Gao, W., Zhang, D., Shen, D., Wu, G., 2016. Identifying high order brain connectome biomarkers via learning on hypergraph. In: International Workshop on Machine Learning in Medical Imaging. Springer, pp. 1–9.