From trees to barcodes and back again: theoretical and statistical perspectives
FFrom trees to barcodes and back again:theoretical and statistical perspectives
Lida Kanari ∗ , Ad´elie Garin , Kathryn Hess Blue Brain Project, ´Ecole polytechnique f´ed´erale de Lausanne(EPFL), Campus Biotech, 1202 Geneva, Switzerland. Laboratory for topology and neuroscience, Brain Mind Institute,´Ecole polytechnique f´ed´erale de Lausanne (EPFL), 1015 Lausanne,Switzerland.*Correspondence: [email protected] 8, 2020
Abstract
Methods of topological data analysis have been successfully appliedin a wide range of fields to provide useful summaries of the structure ofcomplex data sets in terms of topological descriptors, such as persistencediagrams. While there are many powerful techniques for computing topo-logical descriptors, the inverse problem, i.e., recovering the input datafrom topological descriptors, has proved to be challenging. In this articlewe study in detail the Topological Morphology Descriptor (TMD), whichassigns a persistence diagram to any tree embedded in Euclidean space, anda sort of stochastic inverse to the TMD, the Topological Neuron Synthesis(TNS) algorithm, gaining both theoretical and computational insights intothe relation between the two. We propose a new approach to classifybarcodes using symmetric groups, which provides a concrete language toformulate our results. We investigate to what extent the TNS recoversa geometric tree from its TMD and describe the effect of different typesof noise on the process of tree generation from persistence diagrams. Weprove moreover that the TNS algorithm is stable with respect to specifictypes of noise.
Keywords: tree, topological data analysis, persistence barcode, sym-metric group, inverse methods a r X i v : . [ m a t h . A T ] D ec ontents Introduction
Although geometric approaches to analyzing data have been extensivelyused for many years, the first topological methods for data analysis weredeveloped only recently, e.g., [8], [7], [22], [26], [25] and [4]. TopologicalData Analysis (TDA) is a fairly new field at the intersection of datascience and algebraic topology, the aim of which is to provide robustmathematical, statistical, and algorithmic methods to infer and analyzethe topological and geometric structures underlying complex data. Thesedata are often represented as point clouds in Euclidean or metric spaces,though TDA methods have also been generalized to geometric objectsand graphs. TDA has proved its utility in a wide range of applications inbiology [11], [3], [18], [10], material science [16], and climate science [19],among other fields. Although it is still rapidly evolving, TDA now providesa set of powerful and efficient tools that can be used in combination withor as complements to other data science tools.One of the most promising applications of TDA is to the study of thebrain, where it has served to analyze neuronal morphologies [13], brainnetworks [21], [23] [13], and brain functionality [24]. Motivated by the desireto objectively classify neuronal morphologies, in a previous publication(Kanari and Hess in [13]) we designed a topological signature for trees, theTopological Morphology Descriptor (TMD), that assigns a barcode (i.e., amulti-set of open intervals – called bars – in the real line) to any geometrictree (i.e, any finite binary tree embedded in R ). We showed that the TMDalgorithm effectively determines the reliability of the clustering of randomand neuronal trees. Moreover, using the TMD algorithm, we performed anobjective, stable classification of pyramidal cells in the rat neocortex [14],based only on the shape of their dendrites.A frequent topic of discussion in the context of TDA is how to define aninverse to the process of associating a particular topological descriptor to adataset, i.e., how to design a practical algorithm to recover the input datafrom a topological descriptor, such as a barcode. Oudot and Solomon [20]and Curry et al. [6] have proposed partial solutions to this problem. Themain obstacle that renders this endeavor particularly challenging has provento be the computational complexity of the space of inputs considered. Toavoid this obstacle, it is reasonable to constrain the input space and searchonly for an inverse transformation that is relevant in a specific context, forinstance to look for solutions only in the space of embedded graphs, asin [2].In the context of geometric trees, we have designed an algorithm toreverse-engineer the TMD [12], in order to digitally generate artificial neu-rons, to compensate for the dearth of available biological reconstructions.This algorithm, called Topological Neuron Synthesis (TNS), stochasticallygenerates a geometric tree from a barcode, in a biologically groundedmanner. As shown in [12], the synthesized neurons are statistically in-distinguishable from the corresponding reconstructed neurons in terms ofboth their morphological characteristics and the networks they form.In this article, we further study the properties of this generative al- orithm, from mathematical and statistical perspectives. We perform atheoretical and computational analysis of the TMD and TNS algorithmsand their mathematical properties, in which symmetric groups play a keyrole. In particular, we investigate in detail the extent to which the TNSprovides an inverse to the TMD.First, we carefully define our objects of study – geometric trees, bar-codes, and persistence diagrams – then recall the TMD and TNS algorithms.We also introduce two distinct classifications of geometric trees: into com-binatorial types and into TMD-types. The symmetric groups play animportant role in our classification of trees into TMD-types. These com-plementary descriptions provide us with a language in which to formulateour results on the relationship between the TMD and the TNS.In the next section, we introduce tools to describe the set of geometrictrees that realize a specific barcode, i.e., whose TMD is equal to thatbarcode. In particular we establish an explicit formula for the cardinalityof this set, which we use to describe how the cardinality changes when anew bar is added to a barcode or two bars of a barcode permuted. Cayleygraphs of symmetric groups provide a useful visualization of these effects.We then study the composite of the TNS and TMD algorithms froma theoretical perspective, to quantify the extent to which the TNS actsas an inverse to the TMD. For a given barcode B , we show that, fora reasonable choice of parameter in the TNS, the probability that thebottleneck distance between the barcodes B and TMD ◦ TNS( B ) is greaterthan ε decreases with ε , thus establishing a form of stability for the TNS.We prove, moreover, that the probability that two bars of a barcode B will be permuted by applying TMD ◦ TNS decreases exponentially withthe distance between the terminations of the two bars, which is anotherform of stability. Together these stability results imply that the TNS is anexcellent approximation to a (right) inverse to the TMD.In the final section we present computational results that illustrate thecomplex relationship between a barcode and its possible tree-realizations.In particular, we study the distinguishing characteristics of “biological”geometric trees, i.e., those that arise from digital reconstructions of neurons,as opposed to arbitrary geometric trees. We also show that both thecombinatorial type and the TMD-type of a geometric tree can changesignificantly when applying the composite TNS ◦ TMD, from which itfollows that the TNS is not a left inverse to the TMD. rees Barcodes
TreesTrees
Barcodes Barcodes
TMDTMDTNS TNS
Neurons
Figure 1: The two composites of TMD and TNS. (Left) An illustration of how aneuron (black) is modeled as a tree (dashed red lines). We describe this processand how to extract a barcode from a tree in section 2.3. (Top) The compositeTMD ◦ TNS applied to a barcode B . The new barcode B (cid:48) = TMD ◦ TNS( B )is indicated in dashed lines on top of the barcode B on the right. We show insection 4 that the barcodes B and B (cid:48) will almost certainly be very similar andquantify this similarity. (Bottom) The composite TNS ◦ TMD applied to a tree T .The tree T that we start with is indicated in dashed red lines under the new tree T (cid:48) = TNS ◦ TMD( T ). The trees T and T (cid:48) can be quite different combinatorially,as seen on the right. Precisely what a mathematician means by the terms “tree” and “barcode”can vary depending on context. First, we specify what these terms mean inthis article. We then recall biologically motivated algorithms for generatingbarcodes from trees [13] and trees from barcodes [12], the relation betweenwhich will be made clear in the following sections. A finite rooted tree T is an acyclic, finite, directed graph such that eachvertex is of degree at most 3, with a distinguished vertex r of degree 1,called the root . A vertex v of T is a parent of a vertex w if there is adirected edge from w to v ; the vertex w is then a child of v . Each vertexof T has a single parent, except for the root r , which has no parent, and atmost two children. The non-root vertices of degree 1 are called the leaves of T , and the vertices of degree 3 the branch points of T . A finite tree T is ully specified by its set of vertices, equipped with the partial order “is aparent of”.Our main objects of study in this article are geometric trees , i.e.,embeddings of finite rooted trees in R , which are often used to modelneurons. We assume, moreover, that if a vertex v is the parent of a vertex w , then the distance from the root to v is less than that from the root to w . Let T denote the set of geometric trees.We say that two geometric trees T and T (cid:48) are combinatorially equivalent ,denoted T ∼ comb T (cid:48) , if they are embeddings of the same finite rooted tree.In other words, the combinatorial type of a geometric tree is independentof its embedding in R . A persistence barcode is a finite multi-set B = { ( b i , d i ) } i =0 ,...,n of openintervals, called bars , in the real line, R . We call b i the birth time and d i the death time of the i th interval. For barcodes of geometric trees, generatedwith the TMD algorithm (cf. section 2.3), the birth time b i is the distancefrom the first bifurcation of branch i to the root, while the death time d i is the distance from the branch termination to the root. Let B denote theset of all barcodes.A persistence barcode can equivalently be represented as a multi-set ofpoints in R , called a persistence diagram , where a bar ( b i , d i ) correspondsto a point in R with x -coordinate d i and y -coordinate b i . If B is a barcode,we let PD( B ) denote the associated persistence diagram. Note that, underthis correspondence, the points of PD( B ) lie below the diagonal, since b i is less than d i for every i .We say that a barcode B = { ( b i , d i ) } i =0 ,...,n is strict if the first bar( b , d ) properly contains all the others, i.e., b < b i and d i < d for all i ,and no bars are born or die at the same time, i.e., b i (cid:54) = b j and d i (cid:54) = d j for all i (cid:54) = j . The birth times of a strict barcode admit a total ordering.Without loss of generality, we assume that the bars are ordered by birthvalue, that is b < b < · · · < b n . Let B st denote the set of strict barcodes,and let B st n denote the subset of those strict barcodes with n + 1 bars.We say that two strict barcodes B = { ( b i , d i ) } i =0 ,...,n and B (cid:48) = { ( b (cid:48) i , d (cid:48) i ) } i =0 ,...,n with the same number of bars are equivalent , denoted B ∼ bar B (cid:48) , if theirdeaths occur in the same order ( d i > d j ⇔ d (cid:48) i > d (cid:48) j ), which clearly defines anequivalence relation on B st n . There is a bijection from the set of equivalenceclasses of strict barcodes with n + 1 bars to the symmetric group S n on n Much of the framework that we we develop in this article could be extended to geometrictrees equipped with a different distance function that does not necessarily satisfy this condition,at the price of a more involved combinatorial representation. Since our geometric trees ofinterest – digitally reconstructed neurons equipped with the path-distance from the root – dosatisfy our extra assumption, we prefer to leave this extension to the interested reader. The inversion of the coordinates that we apply here is motivated by the TMD algorithm(cf. section 2.3), which decomposes a geometric tree into bars, starting at the leaves andprogressing down to the root. etters, taking the equivalence class of a barcode B = { ( b i , d i ) } i =0 ,...,n tothe permutation σ B : { , ..., n } → { , ..., n } , where σ B ( i ) = { j | d j ≤ d i } . We denote the equivalence class containing a strict barcode B = { ( b i , d i ) } i =0 ,...,n by ( i ...i n ), where d i k > d i k +1 for all 1 ≤ k < n . Forexample, (2134) corresponds to the barcode with 5 bars shown in Figure 2. b d b d b d b d b d Figure 2: A strict barcode belonging to the equivalence class (2134). One bar( b , d ) contains all the others. The remaining bars are ordered by their birthtimes ( b < b < b < b ). Similarly, the deaths are ordered d > d > d > d ,leading to the notation (2134). The TMD (Topological Morphology Descriptor) is a many-to-one functionfrom the set of geometric trees to the set of barcodes,TMD :
T → B , that encodes the overall shape of the tree, both the topology of the branch-ing structure of a tree and its embedding in R [13]. It is defined recursivelyas follows.Let T be a rooted tree with root r and set N of vertices, with subset L of leaves. Let δ : N → R ≥ be the function that assigns to each vertex itsEuclidean distance to the root r .Intuitively, the output of the TMD algorithm is a barcode, where eachbar represents a branch of the tree. The endpoints of a bar correspond tothe distances to the root from the tip of the branch and from the pointwhere the branch bifurcates from another, longer branch, see Figure 3.Table 1 summarizes the terminology used for the TMD algorithm. For those used to think in terms of persistent homology, the TMD computes the 0-dimensional barcode, or persistence diagram, of the distance function δ . Each bar ( b, d )corresponds to a connected component in the sublevel sets δ − (cid:0) [0 , t ) (cid:1) , that is, a branch of thetree. Note that the birth and death roles are reversed in the TMD algorithm compared to”usual” terminology in persistent homology: the birth corresponds to the bifurcation and thedeath to the termination of a branch. For each v ∈ N (cid:114) L , let L v denote the set of leaves of the subtree of T with root at the branch point v . Let µ : N → R be the function defined by µ ( v ) = (cid:40) max (cid:8) δ ( l ) | l ∈ L v (cid:9) : v ∈ N (cid:114) L,δ ( v ) : v ∈ L. We order the children of any vertex of T by their µ -value: if v , v ∈ N aresiblings, then v is younger than v if µ ( v ) < µ ( v ).The algorithm that extracts the TMD of a geometric tree T proceeds asfollows (Figure 3). Start by creating a set A of active vertices , originally setequal to L , and an empty barcode. For each leaf l , the algorithm proceedsrecursively along its unique path to the root r . At each branch point b ,one applies the standard Elder Rule from topological data analysis [5],removing from A all of the children of b , and adding b to A . One bar isadded to the barcode for each child of b except (any one of) the longest.Each child removed from A corresponds to a path from some leaf l to b ,which is recorded in the barcode as a bar (cid:0) δ ( b ) , δ ( l ) (cid:1) . These operationsare applied iteratively to all the vertices until the root r is reached, atwhich point A contains only r and a leaf l for which µ is maximal amongall leaves, which is recorded in the barcode as a bar (cid:0) , δ ( l ) (cid:1) .If T is a digital reconstruction of a neuron, and the function δ is thepath distance from the soma, then TMD( T ) is actually a strict barcode.Indeed, the probability for two branch points or leaves to be exactly thesame distance from the soma is almost zero, and TMD( T ) always has alongest bar that contains all the others. This observation justifies ourinterest in the subset of strict barcodes.The TMD gives rise to an equivalence relation on T : two geometric trees T and T (cid:48) are TMD-equivalent , denoted T ∼ tmd T (cid:48) , if TMD( T ) ∼ bar TMD( T (cid:48) ). e provide below an in-depth analysis of the TMD-equivalence classes ofgeometric trees. Geometric trees can be combinatorially equivalent withoutbeing TMD-equivalent and vice-versa, cf. Figure 5. The topological neuron synthesis (TNS) algorithm [12] stochastically gen-erates synthetic neurons, in particular for use in digital reconstuctions ofbrain circuitry [17]. In this paper, we focus on the sub-process of the TNSthat stochastically generates a geometric tree from a strict barcode, insuch a way that if a tree T is generated from a barcode B , then TMD( T )is “close to” to B , with respect to an appropriate metric on the set ofbarcodes, up to some stochastic noise, cf. section 4. Henceforth, when werefer to the TNS, we mean this sub-process.To grow geometric trees, the TNS algorithm first initiates growth, thenloops through steps of elongation and branching/termination . Each branchof the tree is elongated as a directed random walk [1] with memory. Ateach step, a growing tip is assigned probabilities to bifurcate, to terminate,or to continue that depend on the path distance from the root and on achosen bar of the selected barcode. Once a bar has been used, it is removedfrom the barcode. The growth of a tree terminates when no bars remainto be used. We now provide further details of the two steps in this process. Bifurcation / Termination
The branching process in the TNS algorithm is based on the concept of a
Galton-Watson tree [9], which is a finite rooted tree recursively generatedas follows. At each step, a number of offspring is independently sampledfrom a distribution. Since a geometric tree consists only of bifurcations,terminations, and continuations, the accepted values for the number ofoffspring are: zero (termination), one (continuation), and two (bifurcation).The Galton-Watson algorithm generates only a combinatorial tree, withno embedding in space, so we modify the traditional process to introducea dependency of the tree growth on the embedding, so that the bifurca-tion/termination probabilities depend on the path distance of the growingtip from the root.The bifurcation/termination step of the growth process of a geometrictree with associated barcode B proceeds as follows. Each growing tip of thetree is assigned a bar ( b i , d i ) sampled from the barcode B and a bifurcationangle a i . The growing tip first checks the probability to bifurcate, then theprobability to terminate. If the growing tip does not bifurcate or terminate,then the branch continues to elongate. The probability to bifurcate dependson b i : as the distance from the root to the growing tip approaches b i , theprobability to bifurcate increases exponentially until it attains a maximumof 1 at b i . Similarly, the probability to terminate depends exponentiallyon d i .The probabilities to bifurcate and terminate are sampled from anexponential distribution e − λx , whose free parameter λ should be wisely hosen. A very steep exponential distribution (high value of λ ) reducesthe variance of the population of geometric trees synthesized based on thesame barcode. On the other hand, a very low value of λ results in treesthat are almost random, since the dependence on the input persistencebarcode is decreased significantly. If we assume that growth takes placein discrete steps of size L , the value of the parameter λ should be of theorder of the step size L , to ensure biologically appropriate variance [12].Assuming L = 1 in some appropriate units, we usually select λ ≈
1, sothat the bifurcation and termination points are stochastically chosen butstill strongly correlated with the input persistence barcodes.Contrary to other neuron synthesis algorithms [15] that sample thebranching and termination probabilities from independent distributions, inthe TNS the correlation of these probabilities is captured in the structureof the barcode. When the growing tip bifurcates, the corresponding baris removed from the input barcode to exclude re-sampling of the sameconditional probability, thus recording the tree’s growth history, whichis essential for reproducing the branching structure. In the event of atermination, the growing tip is deactivated, and the bar that correspondsto this termination point is removed from the reference barcode.At a bifurcation, the directions of the two daughter branches createddepend on the bifurcation angle a i . In this study, we focus primarily onthe combinatorial type and the TMD type of the generated geometric tree,so we do not investigate the effect of bifurcation angles on the growth. Elongation
We now describe how the synthesized trees are embedded in R . A segment of a growing tree is the portion of the tree between a pair of consecutivevertices (parent and child). Each synthesized tree is grown segment bysegment. The direction of a segment, i.e., the vector (cid:126)d from its starting pointto its end point, is a weighted sum of three unit vectors: the cumulative memory (cid:126)m of the directions of previous segments within a branch, a targetvector (cid:126)t , and a random vector (cid:126)r [15]. The memory term is a weighted sumof the previous directions of the branch, with the weights decreasing withdistance from the tip. As long as the memory function decreases faster thanlinearly with the distance from the growing tip, the exact choice of functionis not important [12]. The target vector is chosen at the beginning of eachbranch and depends on the bifurcation angles. The random component isa unit vector sampled uniformly from R at each step. The direction ofthe segment (cid:126)d = ρ(cid:126)r + τ(cid:126)t + µ (cid:126)m, then depends on three weight parameters ρ , τ , and µ , where ρ + τ + µ = 1.An increase of the randomness weight ρ results in a highly tortuousbranch, approaching the limit of a simple random walk when ρ = 1. Ifthe targeting weight τ = 1, the branch will be a straight line in the targetdirection. Different combinations of the three parameters ( τ, ρ, µ ) generatemore or less meandering branches and thus reproduce a large diversity ofgeometric trees. he Elder Rule and TNS The TNS provides a sort of right inverse to the TMD. To recreate a treethat is close to TMD-equivalent to the original, the branch correspondingto a particular bar ( b i , d i ) in the barcode can be attached only to branchescorresponding to bars ( b j , d j ) such that d i < d j and b i > b j . This ruleensures that the Elder rule (at a bifurcation, the longer component survives)holds in the TMD transformation. As a result, only a subset of trees with n branches can be generated by the TNS from a given strict barcode with n bars. TMD TNSGoal
Compute the barcode of a tree based on a distance function Grow a new tree from a barcode
Directionality
From leaves to root From root to leaves
Domains { geometric trees } −→ { barcodes } { barcodes } −→ { geometric trees } Table 1: Summary and terminology of the TMD and TNS algorithms. TheTMD computes the barcode of a tree from the tips of branches towards the root,whereas the TNS grows the tree in the opposite direction, from the root to theleaves.
In this section we provide an in-depth analysis of the set of geometric treesthat realize a specific strict barcode B , i.e., each of which has TMD equalto B . A geometric tree T is a tree-realization of a barcode B if TMD( T ) = B , i.e., T ∈ TMD − ( B ). Examples of tree-realizations are provided in Figure 4B,while Figure 5 shows all the possible combinatorial types of tree-realizationsof a strict barcode with n = 4.In Figures 4 and 5, we encode the combinatorial structure of the tree,i.e., how the branches may be attached to each other, in an adjacencymatrix in which the ( i, j ) coefficient is non-zero if the Elder Rule allowsbar i to be connected to bar j . For example, in Figure 4A, bars 1 − , , (0 , , (0 , , , and 3, from 1 to 2 and 3, and from 2 to 3. B with four bars. Eachrow (left) represents a possible permutation of death times for the correspondingorder of bars in B , i.e., a possible TMD-type. Each barcode can be realized by asubset of all the combinatorial tree types, each represented by a column, with acorresponding adjacency matrix. For any strict barcode B , let T ( B ) denote the set of combinatorialequivalence classes of tree-realizations of B , i.e., T ( B ) = TMD − ( B ) / ∼ comb . We can characterize the equivalence relation on strict barcodes in terms of T ( B ). Lemma 3.1. If B and B (cid:48) are two strict barcodes with the same numberof bars, then B ∼ bar B (cid:48) ⇐⇒ T ( B ) = T ( B (cid:48) ) . Proof.
The order of the deaths in a strict barcode B completely determinesthe set of combinatorial equivalence classes of its possible tree realizations.Indeed, the two pairs of bars in Figure 6(2) lead to the same adjacencypossibilities for their respective branches. Only move (1) in Figure 6, orresponding to switching the order of the deaths of the two bars, modifiesthe permutation equivalence class of the barcode, hence also the set oftrees that return the given barcode. (1)(2) Figure 6: The two possible moves that respect the condition of a realisablebarcode. Move (1) modifies the barcode’s ordering, whereas move (2) does notchange the order of the deaths.
Let B = { ( b i , d i ) } i =0 ,...,n be a strict barcode. In this section we analyzethe tree-realization number of B ,TRN( B ) = T ( B ) , i.e., the number of combinatorial equivalence classes of tree-realizations of B . We provide a formula for TRN( B ) in terms of the index of each bar( b i , d i ), i.e., the number of bars that include ( b i , d i ) strictly:index i ( B ) = { j | b j < b i < d i < d j } = { j < i | d i < d j } . A version of this formula was established by Curry in [5].
Lemma 3.2.
The tree-realization number of a strict barcode B = { ( b i , d i ) } i =0 ,...,n is equal to the product of the indices of its bars, i.e., TRN( B ) = (cid:89) ≤ i ≤ n index i ( B ) . Proof.
Because of the Elder Rule applied in the TMD, one branch canbe attached to another only if its corresponding bar is included in theother bar. This simple observation enables us to prove the lemma by astraightforward recursion on the number of bars.In particular, the maximum tree-realization number for a strict barcodewith n + 1 bars is n !, in the specific case where d n < ... < d < d . We callthis case a strictly ordered barcode.Note that the tree-realization number does not satisfyTRN( B ) = TRN( B (cid:48) ) = ⇒ B ∼ B (cid:48) n general, i.e., the tree-realization number is not a complete invariant ofthe barcode equivalence relation. For instance, barcodes (231) and (312)in Figure 8 both have TRN( B ) = 2 but have different permutation types.The inverse clearly does hold, however:TRN( B ) (cid:54) = TRN( B (cid:48) ) = ⇒ B (cid:54)∼ B (cid:48) , enabling us to detect non-equivalence of barcodes. For the pairs of barcodesstudied in sections 4 and 5 of this paper, though, it is usually true thatif they have same TRN, then they are equivalent, so that we can use theTRN to detect equivalence of barcodes in these special cases. In particular,if a new bar is added to a barcode or two deaths are transposed and noother changes take place, then the tree-realization number does change.Lemma 3.2 enables us to quantify how adding a new bar changestree-realization number. Lemma 3.3.
Let B = { ( b i , d i ) } i =0 ,...n and B (cid:48) = B ∪ { ( b n +1 , d n +1 ) } , where b n +1 > b i for all ≤ i ≤ n , be strict barcodes. If d i > ...d i k − > d n +1 >d i k > ...d i n , then TRN( B (cid:48) ) = TRN( B ) · k. Proof.
The condition on d n +1 implies that the new bar ( b n +1 , d n +1 ) isincluded in exactly k other bars, so its index is k . Example 3.4.
Let B be a barcode with four bars such that b < b < b d > d > d , i.e., its equivalence class is (213). It is easy tosee that TRN( B ) = 3 (see Figure 7). If we add a new bar ( b , d ) such that d > d > d , the equivalence class of the new barcode B (cid:48) is (2143), andbar ( b , d ) is included in ( b , d ), ( b , d ) and ( b , d ), but not in ( b , d )because d > d . Therefore, its index is 3, whence TRN( B (cid:48) ) = 3 · Figure 7: A barcode B in equivalence class (213) is shown in black. There arethree possible combinatorial equivalence classes of trees whose TMD barcodeis B , also represented in black. After adding the extra bar in red, we obtaina new barcode B (cid:48) , in the equivalence class (2143). In a tree-realization of B (cid:48) ,the branch corresponding to the red bar can be attached to any of the branchescorresponding to the 0th, 1st, and 2nd bars, represented on the trees by the redbranches. This leads to nine possible combinatorial equivalence classes of treesfor the barcode (2134). We can also apply Lemma 3.2 to determining how switching the order oftwo consecutive deaths in the barcodes affects the tree realization number. roposition 3.5. Let B = { ( b i , d i ) } i =0 ,...n be a strict barcode in the equiv-alence class ( i ...i n ) . Let B (cid:48) = { ( b (cid:48) i , d (cid:48) i ) } i =0 ,...n be a new barcode obtainedby permuting the deaths d i k and d i k +1 , i.e., b i = b (cid:48) i for all i and d i = d (cid:48) i forall i (cid:54) = i k , i k +1 , while d i k = d (cid:48) i k +1 and d i k +1 = d (cid:48) i k .1. If i k < i k +1 , then index i k +1 ( B (cid:48) ) = index i k +1 ( B ) − , and TRN( B (cid:48) ) = TRN( B )(index i k +1 ( B ) − i k +1 ( B ) .
2. If i k > i k +1 , then index i k +1 ( B (cid:48) ) = index i k +1 ( B ) + 1 , and TRN( B (cid:48) ) = TRN( B )(index i k +1 ( B ) + 1)index i k +1 ( B ) . Proof.
It is enough to prove (1), since (2) then follows by switching theroles of B and B (cid:48) .If i k < i k +1 , then b i k < b i k +1 . Since B is in the equivalence class ( i ...i n ), d i k +1 < d i k as well, whence ( b i k +1 , d i k +1 ) ⊂ ( b i k , d i k ). On the other hand,( b (cid:48) i k +1 , d (cid:48) i k +1 ) (cid:54)⊂ ( b (cid:48) i k , d (cid:48) i k ), but otherwise respects all of the same inclusionrelations as ( b i k +1 , d i k +1 ), so thatindex i k +1 ( B (cid:48) ) = index i k +1 ( B ) − , as desired. Moreover, ( b (cid:48) i k , d (cid:48) i k ) (cid:54)⊂ ( b (cid:48) i k +1 , d (cid:48) i k +1 ), so ( b (cid:48) i k , d (cid:48) i k ) respects exactlythe same inclusion relations as ( b i k , d i k ), i.e.,index i k ( B (cid:48) ) = index i k ( B ) . Because no other bars are affected when passing from B to B (cid:48) , we canconclude. Example 3.6.
In Figure 9, we show all the possible death-transpositionsin a strict barcode with five bars. As an example, take B in the equivalenceclass (2134), so the barcode satisfies d > d > d > d . The index of( b , d ) is 4, because it is included in all the other bars. Permuting d and d leads to a barcode in the equivalence class (2143). The index of the lastbar is now 3 because it is no longer included in the third bar.Recall the bijection σ : B st n → S n from section 2.2. Permuting theorder of the deaths d i and d i +1 corresponds to a transposition ( i, i + 1)in S n (and to move (1) in Figure 6). Studying the allowed moves andtheir effects on the barcode is equivalent to studying the symmetric groupseen as generated by transpositions of type ( i, i + 1), enabling us to createthe following revealing visualization of the effect of switching the order ofdeaths or of adding a new bar by using Cayley graphs of S n .We illustrate the results of Lemma 3.3 and Proposition 3.5 on thetwo Cayley graphs of S and S in Figures 8 and 9. Figure 8 showsthe Cayley graph of S generated by the permutations (12) and (23)and the corresponding equivalence classes of barcodes. The vertices ofthe graph correspond to the permutations in the symmetric group and heir corresponding barcode types, and the edges between them to thetranspositions transforming one permutation into another. The numbernext to each bar is its index. The trees that return a given barcode aresketched next to each vertex of the Cayley graph of S . (12)(23)
213 123 132231 321 312
AA A AAACCD C BEF D BBED
Figure 8: A representation of the space of barcodes with four bars up to permu-tation equivalence classas the Cayley graph of S generated by the transpositions (12) and (23),respectively. Each vertex is an element of the group. The edges represent thetransposition to convert one end point into the other, colored by generator. Thenumber to the right of each bar is its index. All trees T such that TMD( T ) = B are indicated next to a barcode B with the corresponding tree type of Figure 5.The number of such trees can be computed using Lemma 3.2. Figure 9 shows the Cayley graph of S generated by the transpositions(12), (23), and (34), illustrating the effect on tree-realization number bothof switching the order of two deaths and, in comparison with the previousfigure, of adding an extra bar.
33 143 11 43 21 33 211
Figure 9: The Cayley graph representing S generated by (12) , (23) , (34). Forthe four outside elements, we provide the barcode associated to the permutation.The number next to each bar is its index. In this section, we investigate the effect of the composition of the TNS andTMD algorithms from a theoretical perspective. Given a strict barcode B = { ( b i , d i ) } i =0 ,...,n , we apply the TNS to B , for a fixed choice of theparameter λ (cf. section 2.4), obtaining a tree T B , and then compute thebarcode of T B , TMD( T B ), which we denote by B (cid:48) = { ( b (cid:48) i , d (cid:48) i ) } i =0 ,...,n . Toquantify to what extent the TNS acts as an inverse to the TMD, we areinterested in determining how similar B and B (cid:48) are.Expressing the similarity between B and B (cid:48) in terms of the bottleneckdistance enables us to establish one form of stability for the TNS inthe first part of this section. We establish another type of stability forthe TNS in the second part, when we show that the probability thatthe order of two specific bars will be altered upon applying TMD ◦ TNSdecreases exponentially with the distance between the death times of thetwo bars. Together these stability results imply that the TNS is an excellentapproximation to a (right) inverse to the TMD.
Henceforth, we call the endpoints of the bars of the barcode B targets , asthe TNS algorithm either creates a new branch or terminates a branch whenthe distance from the root approaches a birth or death point, respectively. y definition of the TNS algorithm (cf. section 2.4), when approachinga target, there is an exponential probability to bifurcate (create a newbranch) or terminate, depending on λ . It follows that for any bar ( b i , d i ) ofa given barcode B , the distance between b i and b (cid:48) i (the bifurcation distanceand the target bifurcation distance of the i th branch) and the distancebetween d i and d (cid:48) i (the termination distance and the target terminationdistance of the i th branch) should follow an exponential distribution ofparameter λ , | b i − b (cid:48) i | ∼ Exp( λ ) and | d i − d (cid:48) i | ∼ Exp( λ ) . The notion of similarity between barcodes that we consider here is thestandard bottleneck distance . Given two strict barcodes B and B (cid:48) with n + 1 bars, the bottleneck distance between them is d ( B, B (cid:48) ) = inf γ ∈ S n sup i | b i − b (cid:48) γ ( i ) | + | d i − d (cid:48) γ ( i ) | . Lemma 4.1.
Let B be a strict barcode with n bars, and let B (cid:48) = TMD ◦ TNS( B ) . If B ∼ B (cid:48) , then P (cid:0) d ( B, B (cid:48) ) > ε (cid:1) ≤ − (1 − exp( − λε )( λε + 1)) n . (1) Proof.
Considering the case where γ is the identity, we see that d ( B, B (cid:48) ) ≤ sup i | b i − b (cid:48) i | + | d i − d (cid:48) i | . If B ∼ B (cid:48) , the differences between the new and original values of the birthsand deaths all follow an exponential distribution, | b i − b (cid:48) i | ∼ Exp( λ ) and | d i − d (cid:48) i | ∼ Exp( λ ) . The cumulative probability distribution function of | b i − b (cid:48) i | + | d i − d (cid:48) i | is thus given by an Erlang(2 , λ ) distribution P (cid:0) | b i − b (cid:48) i | + | d i − d (cid:48) i | ≤ ε (cid:1) = 1 − (1 + λε ) exp( − λε ) . Because we consider the supremum over i of the sum | b i − b (cid:48) i | + | d i − d (cid:48) i | ,and all of the | b i − b (cid:48) i | + | d i − d (cid:48) i | are i.i.d , it follows from the theory of orderstatistics that P (cid:0) d ( B, B (cid:48) ) ≤ ε (cid:1) ≥ P (sup i | b i − b (cid:48) i | + | d i − d (cid:48) i | ≤ ε ) = (1 − exp( − λε )( λε +1)) n . (2)Considering the probability of the complement leads to the result inEquation 1.Lemma 4.1 implies that the TNS is stable with respect to the bottleneckdistance, in a manner dependent on the parameter λ . To illustrate thisdependence, we plot the function of Equation 1 for different values of λ in Figure 10. The curve obtained for λ = 1 (blue in Figure 10) makes itclear that setting λ = 1 ensures that the TNS gives rise to a diverse family f new trees that are nonetheless topologically not significantly far fromthe original ones, which is the desired goal from a biological perspective.Making an appropriate choice of the parameter λ is thus essential. Figure 10: Upper bound on the probability that the bottleneck distance between B and TNS ◦ TMD( B ) is larger than ε (Equation 1) for various values of λ andfor n = 10. If B ∼ B (cid:48) , the bound by γ = I in the formula for the bottleneck distanceis computed between pairs of points that follow the same exponential law,as the order of bars is preserved. If B (cid:54)∼ B (cid:48) , for example when a switch ofbars occurs, then we cannot assume that the distances between matchedpairs of points in the computed bottleneck distance follow the same law.Change of permutation type between B and B (cid:48) is more frequent for small λ (Figure 14). Therefore, the previous lemma is usually not applicable forsmall values of λ , for which it is any case not particularly useful, as shownin Figure 10. In Figure 11 we summarize graphically the discussion above.The transposition of bars is studied in detail in the next section. Exp( λ ) ∼ Exp( λ ) ∼ Erlang(2 , λ ) Figure 11: A. The (cid:96) -distance between the black bullet and the diamond followsan Erlang(2 , λ ) distribution. The interior of the green square defines a bound forthe (cid:96) -distance from the black bullet that depends on the value of the parameter λ .B. If the endpoints of the bars of B are sufficiently far away from each other and B ∼ TMD ◦ TNS( B ), then, with high probability, taking γ = I will minimizethe (cid:96) -distance between pairs of endpoints of bars. C. If the endpoints of B areinstead close to each other, then it is more likely that B (cid:54)∼ TMD ◦ TNS( B ), sothat the optimal choice of γ (represented by red segments) is not the identity.The red distances do not necessarily follow exponential distributions, so the proofof Lemma 4.1 does not apply. We perform two experiments to illustrate our theoretical results com-putationally. First, we compute the bottleneck distance between inputbarcodes B and output barcodes B (cid:48) , for increasing values of lambda λ from0 .
01 to 2 (see Figure 12A). The computational results (average bottleneckdistances in red, Figure 12A2) fit the curve of the expected mean of theprobability density function well (blue curve).We also compute the cumulative density function for 0 ≤ (cid:15) ≤ ≤ λ ≤
2, which we compare to the computational results (redpoints, Figure 12A3), showing that they match the theoretical prediction(blue colormap) very closely for a wide range of sufficiently large λ (zoom-in, Figure 12A3). However, for very small values of λ , the condition B ∼ B (cid:48) is not always satisfied, leading to the observation that for λ < . λ , where the input barcodes arise bygradually decreasing the death time of one bar of an initial barcode B andthus increasing the distance to the next death time in the sequence (seeFigure 12B). All other bars of the initial barcode B remain the same. Weobserve that the bottleneck distance depends only on the value of λ andnot on the distance between the bars of the input barcode. The PDF can be deduced from Equation 2 in the proof of Lemma 4.1 by taking thederivative of (1 − exp( − λε )( λε + 1)) n . λ . We compute the bottleneckdistance between an input barcode B and an output barcode B (cid:48) for λ = 0 . −
2. A1. From barcode B (in black), a tree (in red) is generated using theTNS which results in a new barcode B (cid:48) = TMD ◦ TNS( B ) (in red). A2. Theaverage bottleneck distance (red points) is compared to the expected mean ofthe probability distribution function found in Lemma 4.1 (blue curve). A3. Thebottleneck distances (red) are compared to the cumulative distribution probabilityfor 0 < (cid:15) <
200 and 0 < λ < B and B (cid:48) as a function of distances between bars in B . B1. We consider barcodes of thesame permutation type for different distances between two bars ( b i , d i ) and ( b j , d j )of the initial barcode B that are consecutive in the order of deaths. B2. Foreach input barcode with increasing d i , distance between death times presentedin x -axis, 100 synthesized barcodes are generated and the bottleneck distancebetween the input and output barcodes is computed ( y -axis), which depends onlyon the value of λ and not on the distance between the bars.22 .2 Transposition stability As the TNS algorithm is a stochastic process, the image of any strict barcode B = { ( b i , d i } under the composite TMD ◦ TNS essentially always differs atleast slightly from B . Here we determine the probability that the ordersof the death times of two specific bars of B and TMD ◦ TNS( B ) = B (cid:48) = { ( b (cid:48) i , d (cid:48) i ) } are different, so that B and TMD ◦ TNS( B ) are not combinatoriallyequivalent, i.e., the associated permutations are different, as long as thebirth times are not also transposed. d (cid:48) i d (cid:48) j d i d j | d i − d (cid:48) i | ∼ Exp( λ ) | d j − d (cid:48) j | ∼ Exp( λ )Figure 13: We are interested in the case where d (cid:48) j < d (cid:48) i when we start from d i < d j .The distances | d i − d (cid:48) i | and | d j − d (cid:48) j | both follow an exponential law of parameter λ . The probability to terminate increases exponentially when approaching d i and d j , as represented by the blue arrows. Lemma 4.2.
Let B be a strict barcode, and let ( b i , d i ) , ( b j , d j ) be bars of B such that d i < d j . Let ( b (cid:48) i , d (cid:48) i ) and ( b (cid:48) j , d (cid:48) j ) denote the corresponding barsin B (cid:48) = TMD ◦ TNS( B ) . The probability that d (cid:48) j < d (cid:48) i is P ( d (cid:48) j < d (cid:48) i ) = 12 exp( − λ ( d j − d i )) . The TNS thus exhibits a sort of “transposition stability”: the probabilitythat the death times of two bars will be transposed decreases exponentiallywith the distance between those death times.
Proof.
We compute P ( d (cid:48) j < d (cid:48) i ) = P ( d (cid:48) j < d (cid:48) i | d i < d j ), the probability that d (cid:48) j < d (cid:48) i given that d i < d j . Observe first that P ( d (cid:48) j < d (cid:48) i ) = P ( d j + ( d i − d (cid:48) i ) < d i + ( d j − d (cid:48) j ))= P ( d j + X i < d i + X j ) = P ( X j − X i > d j − d i ) . Let Y = X j − X i . As X j and X i both follow an exponential law, thedensity function of their difference, Y , is given by f Y ( t ) = λ exp( − λt )when t ≥
0. Therefore, P ( d (cid:48) j < d (cid:48) i ) = P ( X j − X i > d j − d i ) = (cid:90) ∞ d j − d i f Y ( t ) dt = 12 exp( − λ ( d j − d i )) . Remark 4.3.
Since the TNS is based on a stochastic process, multipletranspositions can occur when generating a new tree from a barcode.This makes it challenging to determine the overall probability of changing quivalence classes when computing the composite TMD ◦ TNS. Note thatthe TNS might also affect the birth order, but we will not discuss thispossible effect in this paper. For the following experiments, the selectedexamples do not experience birth-switches, as the cells from which wecomputed the barcodes were chosen with sufficient gaps between birthvalues to avoid such switches.We perform the following computational experiment to evaluate thetransposition stability results. We systematically vary the distance betweentwo bars by changing the death time of a bar in the input barcode andcompute the percentage of order changes that occur for different values oflambda(see Figure 14). We compare the theoretical results (solid lines) tothe computational experiment (scatter points) for five different values oflambda. Note that for this experiment, the birth times are chosen to besufficiently distinct, and only the number of switches due to permutationsthat correspond to death changes are counted. The experimental resultsmatch the theoretical prediction with high accuracy, where we computethe error as the average distance of the computational points from thetheoretical curve ( λ = 10 , error = 0%, λ = 5 , error = 0 . λ =1 , error = 0 . λ = 0 . , error = 0 . λ = 0 . , error = 3%, λ =0 . , error = 5%). Note that the error increases for smaller values of λ ,due to the computational artefacts introduced when λ is small. In this section we present computational results that illustrate the complexrelationship between the equivalence class of a barcode and its possibletree-realizations.We first present four results concerning all geometric trees: a com-putation of the distribution of tree-realization numbers across the set ofequivalence classes of strict barcodes for various numbers of bars, a com-putation of the empirical distribution of combinatorial types of geometrictrees in a synthesized population as a function of the equivalence class ofthe input barcode, a measurement of the diversity of TMD-equivalenceclasses among the realizations of a fixed barcode, and simulations of thefluctations in tree-realization number that can occur as two bars gradually witch the order of their deaths.We conclude by reporting on an experiment that sheds light on thedistinguishing characteristics of “biological” geometric trees, i.e., thosethat arise from digital reconstructions of neurons. We illustrate here how the number of tree-realizations of strict barcodeswith n + 1 bars depends on n . In Figure 15 we present the distribution oftree-realization numbers across equivalence classes of barcodes with n + 1bars, for 1 ≤ n ≤
10. As mentioned in section 3.2, the tree-realizationnumber is maximal for a fixed number of bars if and only if the barcode isstrictly ordered. We observe an exponential-like behavior in the distributionof tree-realizations with the increase of the number of bars. We expect tolink this behavior with intrinsic properties in the space of barcodes in afuture work.
Figure 15: Histogram of tree-realization numbers for equivalence classes of bar-codes with n + 1 bars (1 ≤ n ≤ .2 Empirical distributions of combinatorial typesof trees In this section, we explore computationally the probability to generatedifferent combinatorial tree types (see Figure 5 A-F) with the TNS. Weobserve that this probability depends on the choice of the parameter λ (cf. section 2.4). When λ >
2, the TNS is more likely to generate treeswith all branches connected to the longest branch, due to the design of thealgorithm. On the other hand, for smaller values of λ , the probability togenerate different types of trees is approximately uniform.Focusing on our prefered value of λ , we generated 1000 trees for λ = 1and computed the percentage of each combinatorial tree type that is realizedfor each equivalence class of barcodes with four bars (Figure 16). Thereare six possible equivalence classes of strict barcodes with four bars andsix combinatorial equivalence classes of geometric trees with four branches. Figure 16: Empirical distribution (percentage of 1000 trees) of synthesizedgeometric trees with four branches by combinatorial tree type (columns A - F)for a given input barcode equivalence class (rows), when λ = 1. We observe thatthe distribution is approximately uniform. We now explore the diversity of TMD-equivalence classes of geometric treesthat can be synthesized from a fixed barcode, in the particular case of theTMD of a biologically meaningful tree. For a fixed geometric tree witheight branches arising from a digital reconstruction of layer 4 pyramidal cell, e computed its TMD barcode, to which we applied the TNS with λ = 1to generate a set of 100 geometric trees. We computed the barcode-typeand the persistence diagrams of the synthesized trees (Figure 17).In agreement with the results presented in Figure 12, the persistencediagrams of the synthesized trees (Figure 17B, in blue) are essentiallyindistinguishable from the persistence diagram of the original barcode(Figure 17B, in red). On the other hand, the TMD-equivalence class ofa synthesized tree is not necessarily equal to that of the original tree(Figure 17A). Here we represent the TMD-equivalence class of a tree interms of the permutation σ B corresponding to the equivalence class ofits TMD barcode B . In this graphical representation, we plot birth (orbifurcation) index k (on the x -axis) versus death (or termination) index σ B ( k ) (on the y -axis). A strictly ordered barcode would thus correspondto the set of points along the diagonal in this representation. Figure 17: Barcode-equivalence class, represented by the corresponding per-mutation, (A) and persistence diagram (B) of 100 synthesized cells based ona geometric tree with eight branches, extracted from a layer 4 pyramidal cell.The barcode-equivalence classes of the synthesized trees (represented by bluedots) can differ from that of the original tree due to the stochastic nature ofsynthesis algorithm. The persistence diagrams of the synthesized trees (B, blue)are essentially indistinguishable from those of the original tree (B, red).
Motivated by the theoretical results on the probability to change classesin section 4.2, we analyze here several simulations of gradual switching ofdeath order of two bars and the resulting effect on tree realizations andtheir associated barcodes.Let B be a strict barcode, and let ( b i , d i ) , ( b j , d j ) be bars of B suchthat d i < d j . By Lemma 4.2, for a fixed choice of the parameter λ (cf. section 2.4), the probability that the order of these two bars is reversed n TMD ◦ TNS( B ) depends exponentially on the distance between d i and d j : P ( d (cid:48) j < d (cid:48) i ) = 12 exp( − λ ( d j − d i )) . Thus, when the distance between d i and d j decreases, the probabilitythat the order of bars changes increases. When there is no k such that d i < d k < d j , Proposition 3.5 provides a formula for the tree-realizationnumber of the new barcode obtained when such a switch happens, as longas the order of the birth times is not also reversed.We start with a geometric tree T extracted from a digital reconstructionof a neuron and compute its associated barcode B = TMD( T ). We choosetwo bars ( b i , d i ) and ( b j , d j ) of B that are consecutive in the order ofdeaths and divide the interval ( d i , d j ) into 50 equally sized subintervals.For 0 ≤ k ≤
50, let B k be a barcode that is identical to B , except that its i th bar is (cid:0) b i , d i + k ( d j − d i ) / (cid:1) and its j th bar is (cid:0) b j , d j − k ( d j − d i ) / (cid:1) .An interesting way to visualize this change is to think of B k as migratingalong the edge between B and the barcode with d i and d j permuted in thecorresponding Cayley graph as k increases. The middle point of the edgecorresponds to the non-strict barcode for which the two deaths are equal.Let B (cid:48) k = TMD ◦ TNS( B k ) for all k . Because of the stochastic natureof the TNS algorithm, the permutation equivalence class of B (cid:48) k may bedifferent from that of B k . Figure 18 provides an example of this construction,where the barcodes are represented as persistence diagrams for visualizationpurposes (cf. section 2.2).
Termination distance B i f u r c a t i on d i s t an c e Evolution of PD(B') ′ when bars 4 and 5 switch Figure 18: We begin with a barcode with 8 bars. The death times d i and d i (i.e., the 4th and 5th largest death times) are slowly switching as k increases,represented by red-shifting of the color of the points in the persistence diagram.When k = 0 (in red), we have the original barcode B , and when k = 50 (in blue)we obtain a barcode identical to the original, except that ( b i , d i ) is replaced by( b i , d i ) and ( b i , d i ) by ( b i , d i ). To test whether the barcode B (cid:48) k is equivalent to the original barcode B , we compute its tree-realization number: if TRN( B (cid:48) k ) (cid:54) = TRN( B ), then and B (cid:48) k are not equivalent. Note that for the specific process that givesrise to B k , it is likely that only the studied death-switch could lead to adifference between the tree-realization numbers of the input and outputbarcodes, unless two other deaths are too close to each other in the inputbarcode, as in the last row of Figure 19, which we explain further below.Therefore, the tree-realization number provides a very good indication ofwhether the switch of deaths took place, i.e., if B ∼ B (cid:48) k or not. Indeed,two barcodes that are the same except for two deaths that switched havedifferent tree-realization number, cf. Proposition 3.5. Figure 19 showsseveral examples of the endpoint-switching process described above andthe corresponding evolution of the tree-realization number as k increases.The corresponding permutation type and tree-realization number of eachinitial barcode, and the bars that are switched are listed in Table 2. Permutation TRN Bars that switch B (2 , , , , , , ,
3) 810 4 and 5ˆ B (2 , , , , , , ,
3) 540 4 and 5 B (5 , , , , , ,
3) 12 2 and 3ˆ B (5 , , , , , ,
3) 18 2 and 3 B (5 , , , , , ,
3) 12 3 and 4ˆ B (5 , , , , , ,
3) 18 3 and 4 B (8 , , , , , , ,
5) 20 1 and 2ˆ B (6 , , , , , , ,
5) 40 1 and 2Table 2: For each example displayed in Figure 19, we list the permutation typeand the tree-realization number of the original barcode B and of ˆ B = B , andthe indices of the bars that are switched. The superscript i in B i indicates thecorresponding row of Figure 19. For example, the largest death time of barcode B is the second bar (in order of birth times), and its shortest death is the thirdone. When we switch the 4th and 5th (from largest to smallest) death times in B and ˆ B , the TRN changes from 810 to 540. The top row of Figure 19 illustrates very well the exponential behaviorof changing classes. When the distance between the death times of the twobars is very small (they are the closest when k = 25), the tree-realizationnumber oscillates between its values for two different classes and otherwisestays constant.The two middle rows come from the same biological tree and hencehave the same starting barcode. The difference is that in the second row,the death times of the two bars are already very close, leading to morefrequent changes of equivalence class than in the third row.The bottom row illustrates Remark 4.3 well. Since several bars areclose to each other (represented here by several points in the persistencediagram that are close to each other), applying the TNS algorithm leads tofrequent changes in equivalence classes, leading to the oscillatory behaviorof the tree-realization number curve.
100 200 300 4000100200300400500
Evolution of PD(B') ′ when bars 2 and 3 switch Evolution of PD(B')when bars 3 and 4 switch
Evolution of PD(B') when bars 1 and 2switch
Evolution of PD(B')when bars 4 and 5 switch
TRN(B') as 1 and 2 switchTRN(B') as 4 and 5 switch kkkk k kk0 10 20 30 40 5012131415161718
TRN(B') as 3 and 4 switch k0 10 20 30 40 5012131415161718
TRN(B') as 2 and 3 switch kk B i f u r c a t i on d i s tt an c e B i f u r c a t i on d i s tt an c e B i f u r c a t i on d i s tt an c e B i f u r c a t i on d i s tt an c e Termination distanceTermination distanceTermination distanceTermination distance
Figure 19: On the left, evolution of PD( B (cid:48) k ) as k increases (represented by red-shifting of the point color, from red k = 0 to blue k = 50), for various pairs ofbars. When not clear, we circle in orange the two points that switch. On theright, the corresponding evolution of the tree realization number TRN( B (cid:48) k ) as k increases. For instance, as indicated in Table 2, the tree-realization number of B is 810 and that of ˆ B = B is 540. The barcodes B (cid:48) k exhibit the behaviordescribed in Lemma 4.2, except for the last row, in which death times that aretoo close to each other (circled in purple and green) interfere with the process.Without this interference, the tree-realization numbers should oscillate between20 and 40. When k gets close to 50 (blue), the death time d i (largest death time)starts interfering with the third one d i (circled in purple) in the tree synthesisprocess. 31 .5 Tree-realizations of biological barcodes Since the original objective in developing the TMD was to classify digitalreconstructions of neurons, it is natural to ask whether those barcodesthat arise biologically exhibit any special characteristics compared to thosearising from other sets of geometric trees. In Figure 20 we employ thegraphical representation of permutations introduced in section 5.3 to displayas red dots all possible permutations corresponding to TMD-barcodes ofbiological trees with at most 30 branches arising from a population of digitalreconstructions of neurons. Clearly, only a small fraction of the set of allpossible permutations can be realized as the barcode-equivalence classes ofgeometric trees extracted from digital reconstructions of neurons, as everyblack dot in this plot can arise as a pair (cid:0) k, σ ( k ) (cid:1) for some permutation σ . In future work, we intend to study the biological relevance of thisrestriction.To provide further insight into the subset of TMD-equivalence classesof biological geometric trees within the set of all possible TMD-equivalenceclasses, we computed the tree-realization number as a function of thenumber of bars, for a population of barcodes obtained by applying the TMDto geometric trees extracted from a population of digitally reconstructedneurons. We compared the values obtained to the maximum tree-realizationnumber and to the tree-realization numbers of randomly chosen barcodeswith the same number of bars (Figure 21). Interestingly, the barcodes thatcorrespond to apical dendrites (relatively complex neural trees that performsignificant processing tasks) exhibit a more narrow range of possible tree-realization numbers than random barcodes of the same size. On the otherhand, barcodes of basal dendrites (less complex neuronal trees) exhibit tree-realization numbers similar to those of the randomly generated barcodes. n ! for n + 1 bars) (in red). (B) The log of thetree-realization number for barcodes of apical dendrites (in blue) in comparisonwith random barcodes (in yellow) and the maximum maximum tree-realizationnumber (in red). In this paper we presented and analyzed two algorithms that are relevantin topological data analysis: the TMD, which encodes the structure of ageometric tree in a barcode, and the TNS, which generates a geometric treefrom a barcode. We proved that for a good choice of parameter, the TNSis robust with respect to small perturbations of barcodes; an analogousstability result for the TMD was established in [13].We observed that ordering the bars in the persistence barcode accordingto birth times results in the natural association of a permutation to the arcode, based on death times, giving rise to a meaningful equivalencerelation on the set of barcodes. We also introduced a natural, combinatorialequivalence relation on geometric trees. For any barcode, we analyzedthe set of combinatorial equivalence classes of those geometric trees whoseTMD corresponds that barcode, providing a simple, explicit formula for itscardinality in terms of the permutation associated to the barcode. Cayleygraphs of symmetric groups provide a useful visualization of how thiscardinality varies as bars in the barcode are transposed.We illustrated our theoretical results computationally. In addition, wecomputed the probability for the TNS to generate different combinatorialtree types from a fixed barcode and found it to be a function of theparameter λ on which the TNS depends, a result which can be explainedonly by the stochastic nature of the TNS algorithm. The stochastic natureof the TNS algorithm also leads to variation in the equivalence classesof barcodes associated by the TMD to the trees generated from a fixedbarcode by the TNS. In particular, when starting with the TMD of a“biological” tree (i.e., arising from a digital neuron reconstruction) includingbars with similar birth or death times, we observed an oscillatory behaviorbetween two (or more) different classes states, increasing the variance ofthe generated trees.We also initiated an analysis of the distinctive features of biological treescompared to random trees. We discovered that the barcodes associated bythe TMD to trees representing neuronal morphologies represent a smallfraction of possible equivalence classes of barcodes. It follows that the set ofcombinatorial types of geometric trees that are biologically realized is alsoconstrained, indicating a biological preference for specific tree structures.There is much yet to discover about which geometric or combinatorialfeatures distinguish biological trees among all geometric trees and why.In future work we intend to further investigate the effect of differenttypes of noise on the TNS algorithm. For instance, we have consideredonly the effect of transposing two bars, but other types of changes arecertainly also relevant, such as investigating the effects of switching bothbirths and deaths, as mentioned in Remark 4.3. On a more neuroscientificnote, we intend to continue exploring the distinguishing characteristics ofbiological trees, with the goal of explaining the structural and functionalreasons for the observed geometric and combinatorial constraints.On the mathematical side, we are currently analyzing the structure onthe space of barcodes revealed by the symmetric groups and determiningwhat information can be extracted from the induced stratification ofthis space. This structure on the space of barcodes should also providesignificant insights into the still somewhat mysterious space of geometrictrees, which is of considerable interest to a wide range of mathematicians. Author contributions
Conceptualization by L.K., experiments performed by L.K. and A.G. For-mal analysis and theorem formulation by L.K., A.G and K.H. Supervision y K.H. All authors wrote that paper and agreed to the published versionof the manuscript. Funding
This study was supported by funding to the Blue Brain Project, a researchcenter of the ´Ecole polytechnique f´ed´erale de Lausanne (EPFL), fromthe Swiss government’s ETH Board of the Swiss Federal Institutes ofTechnology. AG and KH gratefully acknowledge the support of SwissNational Science Foundation, grant number CRSII5 177237.
Acknowledgments
The authors would like to thank Justin Curry, Jordan De Sha, and Bren-dan Mallery for fruitful discussions. We also thank LNMC lab for theneuronal reconstructions used for this analysis, as part of the BBP neuronmorphologies dataset.
Conflicts of interest
The authors declare no conflict of interest.
References [1] C. Aslangul, N. Pottier, P. Chvosta, and D. Saint-James. Directedrandom walk with spatially correlated random transfer rates.
Physicalreview. E, Statistical physics, plasmas, fluids, and related interdisci-plinary topics , 47 3:1610–1617, 1993.[2] R. L. Belton, B. T. Fasy, R. Mertz, S. Micka, D. L. Millman, D. Sali-nas, A. Schenfisch, J. Schupbach, and L. Williams. Reconstructingembedded graphs from persistence diagrams. arXiv , abs/1912.08913,2020.[3] H. Byrne, H. Harrington, R. Muschel, G. Reinert, B. J. Stolz, andU. Tillmann. Topological methods for characterising spatial networks:A case study in tumour vasculature. arXiv: Quantitative Methods ,2019.[4] G. Carlsson. Topology and data.
Bulletin of the American Mathemat-ical Society , 46:255–308, 2009.[5] J. Curry. The fiber of the persistence map for functions on the interval. arXiv , Jan 2019.[6] J. Curry, S. Mukherjee, and K. Turner. How many directions determinea shape and other sufficiency results for two topological transforms. arXiv: Algebraic Topology , 2018.
7] H. Edelsbrunner, D. Letscher, and A. Zomorodian. Topological per-sistence and simplification.
Discrete & Computational Geometry ,28:511–533, 2002.[8] P. Frosini and C. Landi. Size functions and morphological transfor-mations.
Acta Applicandae Mathematica , 49:85–104, 1997.[9] F. Galton and H. W. Watson. On the probability of the extinction offamilies. 1875.[10] M. Gameiro, Y. Hiraoka, S. Izumi, M. Kram´ar, K. Mischaikow, andVidit Nanda. A topological measurement of protein compressibility.
Japan Journal of Industrial and Applied Mathematics , 32:1–17, 2015.[11] H. Harrington, E. Feliu, C. Wiuf, and M. P. Stumpf. Cellular com-partments cause multistability in biochemical reaction networks andallow cells to process more information. arXiv: Molecular Networks ,2012.[12] L. Kanari, H. Dictus, A. Chalimourda, W. Van Geit, B. Coste, J. Shill-cock, K. Hess, and H. Markram. Computational synthesis of corticaldendritic morphologies.
Cell , June 2020.[13] L. Kanari, P. D(cid:32)lotko, M. Scolamiero, R. Levi, J. Shillcock, K. Hess,and H. Markram. A topological representation of branching neuronalmorphologies.
Neuroinformatics , 16(1):3–13, Jan 2018.[14] L. Kanari, S. Ramaswamy, Y.g Shi, S. Morand, J. Meystre, R. Perin,M. Abdellah, Y. Wang, K. Hess, and H. Markram. Objective Morpho-logical Classification of Neocortical Pyramidal Cells.
Cerebral Cortex ,29(4):1719–1735, 01 2019.[15] R. Koene, B. Tijms, Peter van Hees, F. Postma, A. Ridder, G. Ra-makers, J. Pelt, and A. V. Ooyen. Netmorph: A framework for thestochastic generation of large scale neuronal networks with realisticneuron morphologies.
Neuroinformatics , 7:195–210, 2009.[16] Y. Lee, S. Barthel, P. D(cid:32)lotko, S. M. Moosavi, K. Hess, and B. Smit.High-throughput screening approach for nanoporous materials genomeusing topological data analysis: Application to zeolites.
Journal ofChemical Theory and Computation , 14:4427 – 4437, 2018.[17] H. Markram, E. Muller, S. Ramaswamy, M. W. Reimann, M. Abdellah,C. A. Sanchez, A. Ailamaki, L. Alonso-Nanclares, N. Antille, S. Arse-ver, G. A. Atenekeng Kahou, T. Berger, A. Bilgili, N. Buncic, A. Chal-imourda, G. Chindemi, J-D Courcol, F. Delalondre, V. Delattre,S. Druckmann, R. Dumusc, J. Dynes, S. Eilemann, E. Gal, M. Gevaert,J-P Ghobril, A. Gidon, J. Graham, A. Gupta, V. Haenel, E. Hay,T. Heinis, J. B. Hernando, M. Hines, L. Kanari, D. Keller, J. Kenyon,G. Khazen, Y. Kim, J. G. King, Z. Kisv´arday, P. Kumbhar, S. Lasserre,J-V Le B´e, B. Magalh˜aes, A. Merch´an-P´erez, J. Meystre, B. R. Mor-rice, J. Muller, A. Mu˜noz-C´espedes, S. Muralidhar, K. Muthurasa,D. Nachbaur, T. H. Newton, . Nolte, A. Ovcharenko, J. Palacios,L. Pastor, R. Perin, R. Ranjan, I. Riachi, J. Rodr´ıguez, J. L. Riquelme,C. R¨ossert, K. Sfyrakis, Y. Shi, J. Shillcock, G. Silberberg, R. Silva, . Tauheed, M. Telefont, M. Toledo-Rodriguez, T. Tr¨ankler, W. VanGeit, J. Villafranca D´ıaz, . Walker, Y. Wang, S. M. Zaninetta, J. De-Felipe, S. L. Hill, I. Segev, and F. Sch¨urmann. Reconstruction andsimulation of neocortical microcircuitry. Cell , 163:456–492, 2015.[18] A. Martino, A. Rizzi, and F. M. F. Mascioli. Supervised approachesfor protein function prediction by topological data analysis. , pages1–8, 2018.[19] G. Muszynski, K. Kashinath, V. Kurlin, Michael F. Wehner, andM. Prabhat. Topological data analysis and machine learning forrecognizing atmospheric river patterns in large climate datasets.
Geo-scientific Model Development , 12:613–628, 2019.[20] S. Oudot and E. Solomon. Inverse problems in topological persistence. arXiv , abs/1810.10813, 2018.[21] M. W. Reimann, M. Nolte, M. Scolamiero, K. Turner, R. Perin,G. Chindemi, P. Dlotko, R. Levi, K. Hess, and H. Markram. Cliquesof neurons bound into cavities provide a missing link between structureand function.
Frontiers in Computational Neuroscience , 11, 2017.[22] V. Robins. Computational topology for point data: Betti numbers of α -shapes. 2002.[23] A. Sizemore, C. Giusti, A. E. Kahn, J. Vettel, R. Betzel, and D. Bas-sett. Cliques and cavities in the human connectome. Journal ofComputational Neuroscience , 44:115 – 145, 2017.[24] B. J. Stolz, H. Harrington, and M. Porter. Persistent homology oftime-dependent functional networks constructed from coupled timeseries.
Chaos , 27 4:047410, 2017.[25] A. Verri, C. Uras, P. Frosini, and M. Ferri. On the use of size functionsfor shape analysis.
Biological Cybernetics , 70:99–107, 2004.[26] A. Zomorodian and G. Carlsson. Computing persistent homology. In
SCG ’04 , 2004., 2004.