[PDF] Graph Reconstruction by Discrete Morse Theory

Abstract

Recovering hidden graph-like structures from potentially noisy data is a fundamental task in modern data analysis. Recently, a persistence-guided discrete Morse-based framework to extract a geometric graph from low-dimensional data has become popular. However, to date, there is very limited theoretical understanding of this framework in terms of graph reconstruction. This paper makes a first step towards closing this gap. Specifically, first, leveraging existing theoretical understanding of persistence-guided discrete Morse cancellation, we provide a simplified version of the existing discrete Morse-based graph reconstruction algorithm. We then introduce a simple and natural noise model and show that the aforementioned framework can correctly reconstruct a graph under this noise model, in the sense that it has the same loop structure as the hidden ground-truth graph, and is also geometrically close. We also provide some experimental results for our simplified graph-reconstruction algorithm.

Full PDF

GGraph Reconstruction by Discrete Morse Theory

Tamal K. Dey ∗ , Jiayuan Wang ∗ , Yusu Wang ∗ Abstract

Recovering hidden graph-like structures from potentially noisy data is a fundamentaltask in modern data analysis. Recently, a persistence-guided discrete Morse-basedframework to extract a geometric graph from low-dimensional data has become popular.However, to date, there is very limited theoretical understanding of this framework interms of graph reconstruction. This paper makes a ﬁrst step towards closing this gap.Speciﬁcally, ﬁrst, leveraging existing theoretical understanding of persistence-guideddiscrete Morse cancellation, we provide a simpliﬁed version of the existing discreteMorse-based graph reconstruction algorithm. We then introduce a simple and naturalnoise model and show that the aforementioned framework can correctly reconstructa graph under this noise model, in the sense that it has the same loop structure asthe hidden ground-truth graph, and is also geometrically close. We also provide someexperimental results for our simpliﬁed graph-reconstruction algorithm.

Recovering hidden structures from potentially noisy data is a fundamental task in modern dataanalysis. A particular type of structure often of interest is the geometric graph-like structure.For example, given a collection of GPS trajectories, recovering the hidden road networkcan be modeled as reconstructing a geometric graph embedded in the plane. Given thesimulated density ﬁeld of dark matters in universe, ﬁnding the hidden ﬁlamentary structuresis essentially a problem of geometric graph reconstruction.Diﬀerent approaches have been developed for reconstructing a curve or a metric graphfrom input data. For example, in computer graphics, much work have been done in extracting1D skeleton of geometric models using the medial axis or Reeb graphs [10, 29, 20, 16, 22, 7].In computer vision and machine learning, a series of work has been developed based on theconcept of principal curves , originally proposed by Hastie and Steutzle [18]. Extensions tographs include the work in [19] for 2D images and in [25] for high dimensional point data.In general, there is little theoretical guarantees for most approaches developed in practiceto extract hidden graphs. One exception is some recent work in computational topology:Aanijaneya et al. [3] proposed the ﬁrst algorithm to approximate a metric graph from aninput metric space with guarantees. The authors of [8, 16] used Reeb-like structures toapproximate a hidden (metric) graph with some theoretical guarantees. These work however ∗ Department of Computer Science and Engineering, The Ohio State University. tamaldey,[email protected], [email protected] a r X i v : . [ c s . C G ] M a r nly handles (Gromov-)Hausdorﬀ-type of noise. When input points are embedded in anambient space, they requires the input points to lie within a small tubular neighborhood ofthe hidden graph. Empirically, these methods do not seem to be eﬀective when the inputcontains ambient noise allowing some faraway points from the hidden graph.Recently, a discrete Morse-based framework for recovering hidden structures was proposedand studied [9, 17, 26]. This line of work computes and simpliﬁes a discrete analog of(un)stable manifolds of a Morse function by using the (Forman’s) discrete Morse theorycoupled with persistent homology for 2D or 3D volumetric data. One of the main issues insuch simpliﬁcation is the inherent obstructions that may occur for cancelling critical pairs.The authors of [26] suggest sidestepping this and consider a combinatorial representationof critical pairs for further processing. The authors in [9] identify a restricted set of pairscalled “cancellable close pairs” which are guaranteed to admit cancellation. This frameworkhas been applied to, for example, extracting ﬁlament structures from simulated dark matterdensity ﬁelds [27] and reconstructing road networks from GPS traces [28].This persistence-guided discrete Morse-based framework has shown to be very eﬀective inrecovering a hidden geometric graph from (non-Hausdorﬀ type) noise and non-homogeneousdata. The method draws upon the global topological structure hidden in the input scalar ﬁeldand thus is particularly eﬀective at identifying junction nodes which has been a challengefor previous approaches that rely mostly on local information. However, to date, theoreticalunderstanding of such a framework remains limited. Simpliﬁcation of a discrete Morsegradient vector ﬁeld using persistence has been studied before. For example, the work of [9]clariﬁes the connection between persistence-pairing and the simpliﬁcation of discrete Morsechain complex (which is closely related, but diﬀerent from the cancellation in the discretegradient vector ﬁeld) for 2D and 3D domains. Bauer et al. [6] obtain several results onpersistence guided discrete Morse simpliﬁcation for combinatorial surfaces. The simpliﬁcationof vertex-edge persistence pairing used in [6] has also been observed in [4] independently forsimplifying Morse functions on surfaces. Leveraging these existing developments, we aim toprovide a theoretical understanding of a persistence-guided discrete Morse based approach toreconstruct a hidden geometric graph. Main contributions and organization of paper.

In Section 3, we start with oneversion of the existing persistence-guided discrete Morse-based graph reconstruction algorithm(as employed in [27, 28, 11]). We show that this algorithm can be signiﬁcantly simpliﬁed whilestill yielding the same output. To establish the theoretical guarantee of the reconstructionalgorithm, we introduce a simple yet natural noise model in Section 4. Intuitively, this noisemodel assumes that we are given an input density ﬁeld ρ : R d → R where densities aresigniﬁcantly higher within a small neighborhood around a hidden graph than outside it. Underthis noise model, we show that the reconstructed graph has the same loop structure as thehidden graph, and is also geometrically close to it; the technical details are in Sections 5 and6 for the general case and the 2-dimensional case (with additional guarantees), respectively.While our noise model is simple, our theoretical guarantees are ﬁrst of a kind developedfor a discrete Morse-based approach applied to graph reconstruction. In fact, prior to this,it was not clear whether a discrete Morse based approach can recover a graph even if thereis no noise, that is, the density function has positive values only on the hidden graph. Forour speciﬁc noise model, it may be possible to develop thresholding strategies perhaps with2heoretical guarantees. However, previous work (e.g, [27, 28]) have shown that discrete Morseapproach succeeds in many cases handling non-homogeneous data where thresholding fails.We have implemented the proposed simpliﬁed algorithm and tested it on several data sets,which generally gives a speed-up of at least a factor of 2 over a state-of-the-art approach. Wepresent more discussions and experimental results in the appendix. For simplicity, we consider only a smooth function f : R d → R . See [13, 21] for more generaldiscussions.For a point p ∈ R d , the gradient vector of f at a point p is ∇ f ( p ) = − [ ∂f∂x · · · ∂f∂x d ] T ,which represents the steepest descending direction of f at p , with its magnitude being therate of change. An integral line of f is a path π : (0 , → R d such that the tangent vectorat each point p of this path equals ∇ f ( p ), which is intuitively a ﬂow line following thesteepest descending direction at any point. A point p ∈ R d is critical if its gradient vectorvanishes, i.e, ∇ f ( p ) = [0 · · · T . A maximal integral line necessarily “starts” and “ends” atcritical points of f ; that is, lim t → π ( t ) = p with ∇ f ( p ) = [0 · · · T , and lim t → π ( t ) = q with ∇ f ( q ) = [0 · · · T . See Figure 1a where we show the graph of a function f : R → R , andthere is an integral line from p (cid:48) to the minimum v .For a critical point p , the union of p and all the points from integral lines ﬂowing into p isreferred to as the stable manifold of p . Similarly, for a critical point q , the union of q and allthe points on integral lines starting from q is called the unstable manifold of q . The stablemanifold of a minimum p intuitively corresponds the basin/valley around p in the terrainof f . The 1-stable manifolds of index ( d −

1) saddles consist of pieces of curves connecting( d − f ); see Figure 1a for an example. Symmetrically, the unstable manifoldof a maximum q corresponds to the mountain around q . The 1-unstable manifolds consist ofa collection of curves connecting 1-saddles to minima, corresponding intuitively to the “valleyridges”.In this paper, we focus on a graph-reconstruction framework using Morse-theory (as in e.g,[17, 9, 27, 28]). Intuitively, the 1-stable manifolds of saddles (mountain ridges) of the densityﬁeld ρ are used to capture the hidden graphs. To implement such an idea in practice, the discrete Morse theory is used for robustness and simplicity contributed by its combinatorialnature; and a simpliﬁcation scheme guided by the persistence pairings is employed to removenoise. Below, we introduce some necessary background notions in these topics. First we brieﬂy describe some notions from discrete Morse theory (originally introduced byForman [15]) in the simplicial setting.A k -simplex τ = { p , . . . , p k } is the convex hull of k + 1 aﬃnely independent points; k iscalled the dimension of τ . A face σ of τ is a simplex spanned by a proper subset of vertices3 a) (b) (c) (d) Figure 1: (a) M and M are maxima (red dots), v and v are minima (blue dots), s is asaddle (green dots) with its stable manifolds ﬂowing to it from M and M . If we put a dropof water at p (cid:48) it will ﬂow to v . If we put it on the other side of the mountain ridge it willﬂow to minimum v . (b) Before cancellation of pair (cid:104) v , e (cid:105) . (c) After cancellation, the pathfrom e to v is inverted, giving rise to a gradient path from e to v , making (cid:104) v , e (cid:105) nowpotentially cancellable. (d) An edge-triangle pair (cid:104) e, t (cid:105) which is not cancellable as there aretwo gradient paths between them.of τ ; σ is a facet of the k -simplex τ , denoted by σ < τ , if its dimension is k − K which is simply a collection of simplicesand all their faces so that if two simplices intersect, they do so in a common face. A discrete(gradient) vector is a pair of simplices ( σ, τ ) such that σ < τ . A Morse pairing in K is acollection of discrete vectors M ( K ) = { ( σ, τ ) } where each simplex appears in at most onepair; simplices that are not in any pair are called critical .Given a Morse pairing M ( K ), a V-path is a sequence τ , σ , τ , . . . , σ (cid:96) , τ (cid:96) , σ (cid:96) +1 , where( σ i , τ i ) ∈ M ( K ) for every i = 1 , . . . , (cid:96) , and each σ i +1 is a facet of τ i for each i = 0 , . . . , (cid:96) . If (cid:96) = 0, the V-path is trivial . This V-path is cyclic if (cid:96) > σ (cid:96) +1 , τ ) ∈ M ( K ); otherwise,it is acyclic in which case we call this V-path a gradient path . We say that a gradient path is avertex-edge gradient path if dimension ( σ i ) = 0, implying that dimension ( τ i ) = 1. Similarly,it is a edge-triangle gradient path if dimension ( σ i ) = 1. A Morse pairing M ( K ) becomes a discrete gradient vector ﬁeld (or equivalently a gradient Morse pairing ) if there is no cyclicV-path induced by M ( K ).Intuitively, given a discrete gradient vector ﬁeld M ( K ), a gradient path τ , σ , . . . , τ (cid:96) , σ (cid:96) +1 is the analog of an integral line in the smooth setting. But diﬀerent from the smooth setting,a maximal gradient path may not start or end at critical simplices. However, those that do(i.e, when τ and σ k +1 are critical simplices) are analogous to maximal integral line in thesmooth setting which “start” and “end” at critical points, and for convenience one can thinkof critical k -simplices in the discrete Morse setting as index- k critical points in the smoothsetting. For example, for a function on R , critical 0-, 1- and 2-simplices in the discrete Morsesetting correspond to minima, saddles and maxima in the smooth setting, respectively.For a critical edge e , we deﬁne its stable manifold to be the union of edge-triangle gradientpaths that ends at e . Its unstable manifold is deﬁned to be the union of vertex-edge gradientpaths that begins with e . While earlier we use “mountain ridges” (1-stable manifolds)to motivate the graph reconstruction framework, algorithmically (especially for the Morsecancellations below), vertex-edge gradient paths are simpler to handle. Hence in our algorithmbelow, we in fact consider the function g ρ = − ρ (instead of the density ﬁeld ρ itself) and the4lgorithm outputs (a subset of) the (vertex-edge paths in the discretesetting) as the recovered hidden graph. Morse cancellation / simpliﬁcation.

One can simplify a discrete gradient vectorﬁeld M ( K ) (i.e, reducing the number of critical simplices) by the following Morse cancellationoperation: A pair of critical simplices (cid:104) σ, τ (cid:105) with dimension ( τ ) = dimension ( σ ) + 1 is cancellable , if there is a unique gradient path τ = τ , σ , . . . , τ (cid:96) , σ (cid:96) +1 = σ starting at the k + 1-simplex τ and ends at the k -simplex σ . The Morse cancellation operation on (cid:104) σ, τ (cid:105) then modiﬁes the vector ﬁeld M ( K ) by removing all gradient vectors ( σ i , τ i ), for i = 1 , . . . , (cid:96) ,while adding new gradient vectors ( σ i , τ i − ), for i = 1 , . . . , (cid:96) + 1. Intuitively, the gradientpath is inverted. Note that τ = τ and σ = σ (cid:96) +1 are no longer critical after the cancellationas they now participate in discrete gradient vectors. If there is no gradient path, or morethan one gradient path between this pair of critical simplices (cid:104) σ, τ (cid:105) , then this pair is notcancellable – the uniqueness condition is to ensure that no cyclic V-paths are formed afterthe cancellation operation. See Figure 1 (b) – (d) for examples. The Morse cancellation can be applied to any sequence of critical simplices pairs as long as theyare cancellable at the time of cancellation. There is no canonical cancellation sequence. Tocancel features corresponding to “noise” w.r.t. an input piecewise-linear function f : | K | → R ,a popular strategy is to guide the Morse cancellation by the persistent homology induced bythe lower-star ﬁltration [17, 27], which we introduce now. Filtrations and lower-star ﬁltration.

Given a simplicial complex K , let S be anordered sequence σ , . . . , σ N of all n simplices in K so that for any simplex σ i ∈ K , all of itsfaces appear before it in S . Then S induces a (simplex-wise) ﬁltration F ( K ) : K ⊂ K ⊂· · · ⊂ K N = K, where K i = (cid:83) j ≤ i σ j is the subcomplex formed by the preﬁx σ , . . . , σ i of S .Passing to homology groups, we have a persistence module H ∗ ( K ) → · · · → H ∗ ( K N ), whichhas a unique decomposition into the direct sum of a set of indecomposable summands thatcan be represented by the set of persistence-pairing P ( K ) induced by F ( K ): Each persistencepair ( σ i , σ j ) ∈ P ( K ) indicates that a new k -th homological class, k = dimension ( σ i ),is created at K i and destroyed at K j ; σ i is thus called a positive simplex as it creates,and σ j a negative simplex . Assuming that there is a simplex-wise function ¯ f : K → R such that ¯ f ( σ i ) ≤ ¯ f ( σ j ) if i < j , then the persistence of the pair ( σ, τ ) is deﬁned aspers( σ ) = pers( τ ) = pers( σ, τ ) = ¯ f ( τ ) − ¯ f ( σ ). Some simplices σ (cid:96) ’s may be unpaired, meaningthat homological features created at K (cid:96) are never destroyed. We augment P ( K ) by adding( σ (cid:96) , ∞ ) for every unpaired simplex σ (cid:96) to it, and set pers( σ (cid:96) , ∞ ) = ∞ .The persistent homology can be deﬁned for any ﬁltration of K . In our setting, there isan input function f : V ( K ) → R deﬁned at the vertices V ( K ) of K whose linear extensionleads to a piecewise-linear (PL) function still denoted by f : | K | → R . To reﬂect topologicalfeatures of f , we use the lower-star ﬁltration of K induced by f : Speciﬁcally, for any vertex v ∈ V ( K ), its lower-star LowSt( v ) is the set of simplicies containing v where v has the highest f value among their vertices. Now sort vertices of K in non-decreasing order of their f -values:5 , . . . , v n . An ordered sequence S = (cid:104) σ , . . . , σ N (cid:105) induces a lower-star ﬁltration F f ( K ) of K w.r.t. f if S can be partitioned to n consecutive pieces (cid:104) σ , . . . , σ I (cid:105) , (cid:104) σ I +1 , . . . , σ I (cid:105) , . . . , (cid:104) σ I n − +1 , . . . , σ N (cid:105) , such that the i -th piece (cid:104) σ I i − +1 , . . . , σ I i (cid:105) equals LowSt( v i ).Now let P f ( K ) be the resulting set of persistence pairs induced by the lower-star ﬁltration F f ( K ). Extend the function f : V ( K ) → R to a simplex-wise function ¯ f : K → R where¯ f ( σ ) = max v ∈ σ f ( v ) (i.e, ¯ f ( σ ) is the highest f-value of any of its vertices). For each pair ( σ, τ ),we measure its persistence by pers( σ, τ ) = ¯ f ( τ ) − ¯ f ( σ ). Every simplex in K contributes to apersistence pair in P f ( K ). However, assuming the value of f is distinct on all vertices, thenthose persistence pairs with zero-persistence are “trivial” in the sense they correspond to thelocal pairing of two simplices from the lower-star of the same vertex. A persistence pair ( σ, τ ) with positive persistence corresponds to a pair of (homological) critical points ( p, q ) for thePL-function f : | K | → R [13] induced by the function f on V ( K ), with p ∈ σ and q ∈ τ . Problem setup.

Suppose we have a domain Ω (which will be a cube in R d in thispaper) and a density function ρ : Ω → R (that “concentrates” around a hidden geometricgraph G ⊂ Ω). In the discrete setting, our input will be a triangulation K of Ω and a densityfunction given as a PL-function ρ : K → R . Our goal is to compute a graph (cid:98) G approximatingthe hidden graph G . In Algorithm 1, we ﬁrst present a known discrete Morse-based graph(1-skeleton) reconstruction framework, which is based on the approaches in [17, 9, 27, 28].Intuitively, we wish to use “mountain ridges” of the density ﬁeld to approximate the hiddengraph, which are computed as the 1-unstable manifolds of g ρ = − ρ , the negation of the densityfunction. Speciﬁcally, after initializing the discrete gradient vector ﬁeld M to be a trivialone, a persistence-guided Morse cancellation step is performed in Procedure PerSimpVF ()to compute a new discrete gradient vector ﬁeld M δ so as to capture only important (highpersistent) features of g ρ . In particular, Morse-cancellation is performed for each pair ofcritical simplices from P ( K ) (if possible) in increasing order of persistence values (for pairswith equal persistence, we use the nested order as in [6]). Finally, the union of the 1-unstablemanifolds of all remaining high-persistence critical edges is taken as the output graph (cid:98) G , asoutlined in Procedure CollectOutputG ().Since we only need 1-unstable manifolds, K is assumed to be a 2-complex. It is pointedout in [11] that in fact, instead of performing Morse-cancellation for all critical pairs involvingedges (i.e, vertex-edge pairs and edge-triangle pairs), one only needs to cancel vertex-edgepairs – This is because only vertex-edge gradient vectors will contribute to the 1-unstablemanifolds, and also new vertex-edge vectors can only be generated while canceling othervertex-edge pairs. Hence in PerSimpVF (), we can consider only vertex-edge pairs ( σ, τ ) ∈ P inorder. Furthermore, it is not necessary to check whether the cancellation is valid or not – itwill always be valid as long as the pairs are processed in increasing orders of persistence [6] .However, we can further simplify the algorithm as follows: First, we replace procedure PerSimpVF () by procedure

PerSimpTree () as shown in Algorithm 2, which is much simpler We remark that though [6] states that the cancellation is not valid in higher dimension or non-manifold2-complexes, all cancellations in

PerSimpVF () are for vertex-edge pairs in a spanning tree which can be viewedas a 1-complex, and thus are always valid. no explicit cancellationoperation any more.

Algorithm 1:

MorseRecon ( K , ρ , δ ) Data:

Triangulation K of Ω, density function ρ : K → R , threshold δ Result:

Reconstructed graph (cid:98) G begin Compute persistence pairings P ( K ) by the lower-star ﬁltration of K w.r.t g ρ = − ρ M = PerSimpVF ( P ( K ) , δ ) (cid:98) G = CollectOutputG ( M ) return (cid:98) G Procedure

PerSimpVF ( P ( K ) , δ ) Set initial discrete gradient ﬁeld M on K to be trivial Rank all persistence pairs in P ( K ) in increasing order of their persistence for each ( σ, τ ) ∈ P ( K ) with pers( σ, τ ) ≤ δ do If possible, perform discrete-Morse cancellation of ( σ, τ ) and update the discretegradient vector ﬁeld M return M Procedure

CollectOutputG ( M ) (cid:98) G = ∅ for each remaining critical edge e with pers( e ) > δ do (cid:98) G = (cid:98) G (cid:83) { e } return (cid:98) G The 1-dimensional simplicial complex T output by procedure PerSimpTree () may havemultiple connected components T = { T , . . . , T k } – In fact, it is known that each T i is atree and T is a forest (see results from [4, 6] as summarized in Lemma 3.2 below). For eachcomponent T i , we deﬁne its sink , denoted by si( T i ), as the vertex v i ∈ T i with the lowestfunction g ρ = − ρ value. Lemma 3.2 also states that the sink of T i would have been the only critical simplex among all simplices in T i , if we had performed the δ -simpliﬁcation asspeciﬁed in procedure PerSimpVF (). Next, we replace procedure

CollectOutputG () by procedure

Treebased-OutputG () shown in Algorithm 2. We use

MorseReconSimp () to denote our simpliﬁedversion of Algorithm 1 (with

PerSimpVF () replaced by

PerSimpTree (), and

CollectOutputG ()replaced by

Treebased-OutputG (). In summary, algorithm

MorseReconSimp ( K, ρ, δ ) works byﬁrst computing all persistence pairs as before. It then collects all vertex-edge persistencepairs ( v, e ) with pers( v, e ) ≤ δ . These edges along with the set of all vertices form a spanningforest T . Then, for every edge e = (cid:104) u, v (cid:105) with pers( e ) > δ , it outputs the 1-unstable manifoldof e , which is simply the union of e and the unique paths from u and v to the sink (root)of the tree containing them respectively. Its time complexity is stated below; note for theprevious algorithm MorseRecon (), the cancellation step can take ˜ O ( n ) time. Theorem 3.1.

The time complexity of our Algorithm

PerSimpVF () is O ( P ert ( K ) + n ) , where P erT ( K ) is the time to compute persistence pairings for K , and n is the total number of lgorithm 2: MorseReconSimp ( K , ρ , δ ) Procedure

PerSimpTree ( P ( K ) , δ ) / ∗ This procedure replaces original

PerSimpVF () ∗ / Π := the set of vertex-edge persistence pairs from P ( K ) Set Π ≤ δ ⊆ Π to be Π ≤ δ = { ( v, e ) ∈ Π | pers( v, e ) ≤ δ } T := (cid:83) ( v,σ ) ∈ Π ≤ δ { σ = (cid:104) u , u (cid:105) , u , u } return T Procedure

Treebased-OutputG ( T ) / ∗ This procedure replaces

CollectOutputG () ∗ / (cid:98) G = ∅ for each edge e = (cid:104) u, v (cid:105) with pers( e ) > δ do Let π ( u ) be the unique path from u to the sink of the tree T i containing u Deﬁne π ( v ) similarly; Set (cid:98) G = (cid:98) G ∪ π ( u ) ∪ π ( v ) ∪ { e } return (cid:98) G vertices and edges in K . We remark that the O ( n ) term is contributed by the step collecting all 1-unstable manifolds,which takes linear time if one avoids revisiting edges while tracing the paths. Justiﬁcation of the modiﬁed algorithm

MorseReconSimp ().

Let M δ denote theresulting discrete gradient ﬁeld after canceling all vertex-edge persistence pairs in P ( K ) withpersistence at most δ ; that is, M δ is the output of the procedure PerSimpVF () (although weonly compute the relevant part of the discrete gradient vector ﬁeld). Using observations from[4, 6], we show that the output T of procedure PerSimpTree () includes all information of M δ .Furthermore, procedure Treebased-OutputG () computes the correct 1-unstable manifolds forall critical edges with persistence larger than δ . Indeed, observe that edges in Morse pairingsfrom M δ (for any δ ≥

0) form a spanning forest of edges in K . Results of [6] imply that theoutput T constructed by our modiﬁed procedure corresponds exactly to this spanning forest: Lemma 3.2.

The following statements hold for the output T of procedure PerSimpTree ()w.r.t any δ ≥ :(i) T is a spanning forest consisting of potentially multiple trees { T , . . . , T k } .(ii) For each tree T i , its sink v i is the only critical simplex in M δ . The collection of v i scorresponds exactly to those vertices whose persistence is bigger than δ .(iii) Any edge with pers( e ) > δ remains critical in M δ (and cannot be contained in T ). Note that, (ii) above implies that for each T i , any discrete gradientpath of M δ in T i terminates at its sink v i . See the right ﬁgure for anillustration. Hence for any vertex v ∈ T i , the path π ( v ) computed inprocedure Treebased-OutputG () is the unique discrete gradient path startingat v . This immediately leads to the following result:8 orollary 3.3. For each critical edge e = (cid:104) u, v (cid:105) with pers( e ) ≥ δ , π ( u ) ∪ π ( v ) ∪ { e } as computed in procedure Treebased-OutputG () is the1-unstable manifold of e in M δ . Hence the output of our simpliﬁed algorithm MorseReconSimp () equals that of the original algorithm

MorseRecon ().

We ﬁrst describe the noise model in the continuous setting where the domain is Ω = [0 , d .We then explain the setup in the discrete setting when the input is a triangulation K of Ω.Given a connected “true graph” G ⊂ Ω, consider a ω -neighborhood G ω ⊆ Ω, meaning that (i) G ⊆ G ω , and (ii) forany x ∈ G ω , d ( x, G) ≤ ω (i.e, G ω is sandwiched between G andits ω -oﬀset). Given G ω , we use cl( G ω ) to denote the closure of itscomplement cl( G ω ) = cl(Ω \ G ω ). See the right ﬁgure, showing G (red graph) with its ω -neighborhood G ω (orange). Deﬁnition 4.1.

A density function ρ : Ω → R is a ( β, ν, ω )-approximation of a connected graph G if the following holds:C-1 There is a ω -neighborhood G ω of G such that G ω deformation retracts to G .C-2 ρ ( x ) ∈ [ β, β + ν ] for x ∈ G ω ; and ρ ( x ) ∈ [0 , ν ] otherwise. Furthermore, β > ν . Intuitively, this noise model requires that the density ρ concentrates around the truegraph G in the sense that the density is signiﬁcantly higher inside G ω than outside it; andthe density ﬂuctuation inside or outside G ω is small compared to the density value in G ω (condition C-2). Condition C-1 says that the neighborhood has the same topology of thehidden graph. Such a density ﬁeld could for example be generated as follows: Imagine thatthere is an ideal density ﬁeld f G : Ω → R where f G ( x ) = β for x ∈ G ω and 0 otherwise.There is a noisy perturbation g : Ω → R whose size is always bounded by g ( x ) ∈ [0 , ν ] forany x ∈ Ω. The observed density ﬁeld ρ = f G + g is an ( β, ν, ω )-approximation of G.In the discrete setting when we have a triangulation K of Ω, we deﬁne a ω -neighborhood G ω to be a subcomplex of K , i.e, G ω ⊆ K , such that (i) G is contained in the underlyingspace of G ω and (ii) for any vertex v ∈ V ( G ω ), d ( v, G) ≤ ω . The outside-region cl( G ω ) ⊆ K is simply the smallest subcomplex of K that contains all simplices from K \ G ω (i.e, allsimplices not in G ω and their faces). A PL-function ρ : K → R ( β, ν, ω )-approximation of Gcan be extended to this setting by requiring the underlying space of G ω deformation retractsto G as in (C-1), and having those density conditions in (C-2) only at vertices of K .We remark that the noise model is still limited – In particular, it does not allow signiﬁcantnon-uniform density distribution. However, this is the ﬁrst time that theoretical guarantees areprovided for a discrete Morse based reconstruction framework, despite that such a frameworkhas been used for diﬀerent applications before. We also give experiments and discussions inAppendix B that the algorithm works beyond this noise model empirical, where thresholdingtype approaches do not work. 9 Theoretical guarantee

In this section, we prove results that are applicable to any dimension. Recall that M δ isthe discrete gradient ﬁeld after the δ -Morse cancellation process , where we perform Morse-cancellation for all vertex-edge persistence pairs from P ( K ). (While our algorithm does notmaintain M δ explicitly, we use it for theoretical analysis.) At this point, all positive edges(i.e, those paired with triangles or unpaired in P ( K )) remain critical in M δ . Some negativeedges (i.e, those paired with vertices in P ( K )) are also critical in M δ – these are exactly thenegative edges with persistence bigger than δ . Treebased-OutputG () only takes the 1-unstablemanifolds of those critical edges (positive or negative) with persistence bigger than δ ; sothose positive edges whose persistence is ≤ δ (if there is any) are ignored.From now on, we use “ under our noise model ” to refer to (1) the input is a ( β, ν, ω )-approximated density ﬁeld w.r.t. G, and (2) δ ∈ [ ν, β − ν ). Let (cid:98) G be the output of algorithm MorseReconSimp ( K, ρ, δ ). The proof of the following result is in Appendix A.1.

Proposition 5.1.

Under our noise model, we have:(i) There is a single critical vertex left after

PerSimpVF () which is in G ω .(ii) Every critical edge considered by Treebased-OutputG () forms a persistence pair with atriangle.(iii) Every critical edge considered by

Treebased-OutputG () is in G ω . Theorem 5.2.

Under our noise model, the output graph satisﬁes ˆ G ⊆ G ω .Proof. Recall that the output graph (cid:98) G consists of the union of 1-unstable manifolds of allthe edges e ∗ , . . . , e ∗ g with persistence larger than δ – By Propositions 5.1 (ii) and (iii), theyare all positive (paired with triangles), and contained inside G ω .Take any i ∈ [1 , g ] and consider e ∗ i = (cid:104) u, v (cid:105) . Without loss of generality, consider thegradient path starting from u : π : u = u , e , u , e , . . . , u s , e s , u s +1 . By Lemma 3.2 andProposition 5.1, u s +1 must be a critical vertex (a sink) and is necessarily the global minimum v , which is also contained inside G ω . We now argue that the entire path π (i.e, all simplicesin it) is contained inside G ω . In fact, we argue a stronger statement: First, we say thata gradient vector ( v, e ) is crossing if v ∈ G ω and e / ∈ G ω (i.e, e ∈ cl( G ω )) – Since v is anendpoint of e , this means that the other endpoint of e must lie in K \ G ω . Claim 5.3.

During the δ -Morse cancellation, no crossing gradient vector is ever produced.Proof. Suppose the lemma is not true: Then let ( v, e ) be the ﬁrst crossing gradient vectorever produced during the δ -Morse cancellation process. Since we start with a trivial discretegradient vector ﬁeld, the creation of ( v, e ) can only be caused by reversing of some gradientpath π (cid:48) connecting two critical simplices v (cid:48) and e (cid:48) while we are performing Morse-cancellationfor the persistence pair ( v (cid:48) , e (cid:48) ). Obviously, pers( v (cid:48) , e (cid:48) ) ≤ δ . On the other hand, due to our( β, ν, ω )-noise model and the choice of δ , it must be that either both v (cid:48) , e (cid:48) ∈ G ω or both v (cid:48) , e (cid:48) ∈ K \ G ω – as otherwise, the persistence of this pair will be larger than β − ν > δ .Now consider this gradient path π (cid:48) connecting v (cid:48) and e (cid:48) in the current discrete gradientvector ﬁeld M (cid:48) . Since the pair ( v, e ) becomes a gradient vector after the inversion of this path,10t must be that ( w, e ) currently is a gradient vector where e = (cid:104) v, w (cid:105) . Furthermore, since thepath π (cid:48) begins and ends with simplices either both in G ω or both outside it, the path π (cid:48) mustcontain a gradient vector ( v (cid:48)(cid:48) , e (cid:48)(cid:48) ) going in the opposite direction crossing inside/outside, thatis, v (cid:48)(cid:48) ∈ G ω and e (cid:48)(cid:48) / ∈ G ω . In other words, it must contain a crossing gradient vector. Thishowever contradicts to our assumption that ( v, e ) would be the ﬁrst crossing gradient vector.Hence the assumption is wrong and no crossing gradient vector can ever be created.As there is no crossing gradient vector during and after δ -Morse cancellation, it follows that π , which is one piece of the 1-unstable manifold of the critical edge e ∗ i , has to be containedinside G ω . The same argument works for the other piece of 1-unstable manifold of e ∗ i (startingfrom the other endpoint of e ∗ i ). Since this is for any i ∈ [1 , g ], the theorem holds.The previous theorem shows that (cid:98) G is close to G in geometry. Next we will show thatthey are also close in topology. Proposition 5.4.

Under our noise model, (cid:98) G is homotopy equivalent to G .Proof. We show that (cid:98) G has the same ﬁrst Betti number as that of G which implies the claimas any two graphs in R d with the same ﬁrst Betti number are homotopy equivalent.The underlying space of ω -neighborhood G ω of G deformation retracts to G by deﬁnition.Observe that, by our noise model, G ω is a sublevel set in the ﬁltration that determines thepersistence pairs. This sublevel set being homotopy equivalent to G must contain exactly g positive edges where g is the ﬁrst Betti number of G . Each of these positive edges pairs with atriangle in G ω . Therefore, pers( e, t ) > δ for each of the g positive edges in G ω . By our earlierresults, these are exactly the edges that will be considered by procedure Treebased-OutputG ().Our algorithm constructs (cid:98) G by adding these g positive edges to the spanning tree each ofwhich adds a new cycle. Thus, (cid:98) G has ﬁrst Betti number g .We have already proved that (cid:98) G is contained in G ω . This fact along with Proposition 5.4can be used to argue that any deformation retraction taking (underlying space) G ω to G also takes (cid:98) G to a subset G (cid:48) ⊆ G where G (cid:48) and G have the same ﬁrst Betti number. In whatfollows, we use G ω to denote also its underlying space. Theorem 5.5.

Let F : G ω × [0 , → G ω be any deformation retraction. Then, the restriction F | (cid:98) G : (cid:98) G × [0 , → G ω is a homotopy from the embedding (cid:98) G to G (cid:48) ⊆ G where G (cid:48) is theminimal subset so that G and G (cid:48) have the same ﬁrst Betti number.Proof. The fact that F | (cid:98) G ( · , (cid:96) ) is continuous for any (cid:96) ∈ [0 ,

1] is obvious from the continuityof F . Only thing that needs to be shown is that F | (cid:98) G ( (cid:98) G,

1) = G (cid:48) . Suppose not. Then, G (cid:48)(cid:48) = F | (cid:98) G ( (cid:98) G,

1) is a proper subset of G which has a ﬁrst Betti number less than that of G .We observe that the cycle in (cid:98) G created by a positive edge e along with the paths to the root ofthe spanning tree is also non-trivial in G ω because this is a cycle created by adding the edge e during persistence ﬁltration and the edge e is not killed in G ω .Therefore, a cycle basis for (cid:98) G is also a homology basis for G ω . Since the map F ( · ,

1) : G ω → G is a homotopy equivalence,it induces an isomorphism in the respective homology groups; in particular, a homology basisin G ω is mapped to a homology basis in G . Therefore, the image G (cid:48)(cid:48) = F | (cid:98) G ( (cid:98) G,

1) must havea basis of cardinality g if (cid:98) G has ﬁrst Betti number g . But, G (cid:48)(cid:48) cannot have a cycle basis ofcardinality g if it is a proper subset of G (cid:48) reaching a contradiction.11 Additional guarantee for 2D

For R , we now show that G ω actually deformation retracts to (cid:98) G , which is stronger thansaying G and (cid:98) G are homotopy equivalent. We are unable to prove this result for dimensionshigher than 2, as our current proof needs that the edge-triangle persistence pairs can alwaysbe canceled (even though our algorithm does not depend on edge-triangle cancellations atall). It would be interesting, as a future work, to see whether a diﬀerent approach can bedeveloped to avoid this obstruction for the special case under our noise model. The mainresult of this section is as follows. Theorem 6.1.

Under our noise model, G ω deformation retracts to G and ˆ G . This main result follows from Proposition 6.2 and Theorem 6.3 below. To prove them,we will show that there exists a partition R := { R i } of the set of triangles in K for whichTheorem 6.3 holds. (This theorem is our main tool in establishing the deformation retract.)We ﬁrst state the results below before giving their proofs. Let B i = ∂R i where ∂ is theboundary operator operating on the 2-chain R i . We also abuse the notations R i and B i todenote the geometric space that is the point-wise union of simplices in the respective chains.Let t i be a triangle in R i whose choice will be explained later. In the following, let H be themaximal set of edges in (cid:98) G whose deletions do not eliminate a cycle (assume that a vertex isdeleted only if all of its edges are deleted). Observe that H necessarily consists of negativeedges forming “hairs” attached to the loops of (cid:98) G and hence to ∪ i B i because of the followingproposition. Proposition 6.2.

Under our noise model, (cid:98) G = ∪ B i (cid:83) H . Theorem 6.3.

Under our noise model, there exists a partition { R i } of triangles in K suchthat, there is a deformation retraction of ∪ i ( R i \ t i ) to (cid:98) G that comprises of two deformationretractions, one from ∪ i ( R i \ t i ) to G ω and another one from G ω to ∪ i B i (cid:83) H which is (cid:98) G . Now we describe the construction of a partition R of the triangles in K to prove Proposition6.2 and Theorem 6.3. For technicality we assume that K is augmented to a triangulation ofa sphere by putting a vertex v at inﬁnity and joining it to the boundary of K with edgesand triangles all of whom have function value ∞ . Let P ( K ) be the collection of persistencepairs of the form either ( σ, τ ) or ( σ, ∞ ) generated from the lower-star ﬁltration F ( K ) asdescribed before. Since K is 2-dimensional, each pair ( σ, τ ) is either a vertex-edge pair or anedge-triangle pair. We order persistence pairs in P ( K ) by their persistence, where ties arebroken via the nested order in the ﬁltration F ( K ), and obtain: P ( K ) = { ( σ , τ ) , . . . , ( σ n , τ n ) , ( α , ∞ ) , . . . , ( α s , ∞ ) } . (1)Starting with a trivial discrete gradient vector ﬁeld M where all simplices in K are critical,the algorithm PerSimpVF () performs Morse cancellations for the ﬁrst m ≤ n persistence pairs( σ , τ ) , . . . , ( σ m , τ m ) in order where pers( σ m , τ m ) ≤ δ but ( σ m +1 , τ m +1 ) > δ , Let M i denotethe gradient vector ﬁeld after canceling ( σ i , τ i ). Recall that in the implementation of thealgorithm we do not need to perform Morse cancellation for any edge-triangle pairs. Howeverin this section, for the theoretical analysis, we will cancel edge-triangle pairs as well. Recall12hat a positive edge is one that creates a 1-cycle, namely, it is either paired with a triangle orunpaired; while a negative edge is one that destroys a 0-cycle (i.e, paired with a vertex).Consider the ordered sequence of edge-triangle persistence pairs, ( e , t ) , . . . , ( e n , t n ), whichis a subsequence of the one in (1). Consider the sequence t , . . . , t n of triangles in K orderedby the above sequence. Recall the standard persistence algorithm [14]. It implicitly associatesa 2-chain with a triangle t when searching for the edge it is about to pair with. This 2-chain isnon-empty if t is a destructor, and is empty otherwise. Let D i denote this 2-chain associatedwith t i for i ∈ [1 , n ]. Initially, the algorithm asserts D i = t i . At any stage, if D i is not empty,the persistence algorithm identiﬁes the edge e in the boundary ∂D i that has been insertedinto the ﬁltration F ( K ) most recently. If e has not been paired with anyone, the algorithmcreates the persistence pair ( e, t i ). Otherwise, if e has already been paired with a triangle, say t i (cid:48) , then D i is updated with D i = D i + D i (cid:48) and the search continues. Given an index j ∈ [1 , n ],we deﬁne a modiﬁed set of chains C ji inductively as follows. For j = 1, C i = t i . Assume that C j − i has been already deﬁned. To deﬁne C ji , similar to the persistence algorithm, check ifthe edge e j − is on the boundary ∂C j − i . If so, deﬁne C ji := C j − i + C j − j − and C ji := C j − i otherwise. The following result is proved in Appendix A.3. Proposition 6.4.

For i ∈ [1 , n ] , e i is in ∂C ii . Furthermore, e i is the most recent edge in ∂C ii according to the ﬁltration order F ( K ) . Procedure

PerSimpVF () also implicitly maintains a 2-chain R ∗ i with each triangle t i .Initially, R ∗ i = t i as in the case of D i . Then, inductively assume that R ∗ i is the 2-chainimplicitly associated with t i when a persistence pair ( e i (cid:48) , t i (cid:48) ) is about to be considered by PerSimpVF () and the boundary ∂R ∗ i contains e i (cid:48) . By reversing a gradient path between t i (cid:48) and e i (cid:48) , it implicitly updates the 2-chain R ∗ i as R ∗ i := R ∗ i + R ∗ i (cid:48) . We observe that R ∗ i isidentical with C i (cid:48) i . Proposition 6.5 below establishes this fact along with some other inductiveproperties useful to prove Theorem 6.3. The proof can be found in Appendix A.4. Proposition 6.5.

Let ( e j , t j ) be the edge-triangle persistence pair PerSimpVF () is about toconsider and let C ji be the -chains deﬁned as above . Then, the following statements hold:(a) For each triangle t i , i = 1 , . . . , n , in the persistence order, the -chain R ∗ i satisﬁes thefollowing conditions:(a.i) R ∗ i = C ji , (a.ii) interpreting R ∗ i as a set of triangles, one hasthat the sets R ∗ i , i = j, . . . , n , partition the set of all triangles in K .(b) There is a gradient path from t i to all edges of the triangles in R ∗ i , and (b.i) the path isunique if the edge is in the boundary ∂R ∗ i for every i = j, . . . , n ; (b.ii) if there is morethan one gradient path from t i to an edge e , then e must be a negative edge. We are now ready to setup the regions R i s needed for Theorem 6.3 and Proposition 6.2.Suppose the ﬁrst m edge-triangle pairs have persistence less than or equal to δ , the parametersupplied to PerSimpVF (). Then, we set R i = R ∗ i as in Proposition 6.5 for i ≥ m + 1 , . . . , n .The proof for Proposition 6.2 is in Appendix A.2.Finally, similar to the vertex-edge gradient vectors, we say that a gradient vector ( e, t ) is crossing if e ∈ G ω and t / ∈ G ω . The following claim can be proved similarly as Claim 5.3.13 laim 6.6. During the δ -Morse cancellation of edge-triangle pairs, no crossing gradientvector is ever produced. Proof of Theorem 6.3.

Set ˆ R i = R i \ t i . Let T be the spanning tree formed by all negativeedges and their vertices. Let L i be the set of edges in R i that has more than one gradient pathfrom t i to them; L i ⊂ T by Proposition 6.5 (b.ii). First, we want to establish a deformationretraction from ∪ ( ˆ R i \ t i ) to G ω . To do this, for k = 0 , , . . . , s , we will deﬁne ˆ R ki inductivelywhere ˆ R k − i deformation retracts to ˆ R ki and ˆ R si ⊆ G ω ∪ L i . Let ˆ R i = ˆ R i . For k = 1 , . . . , s ,consider a positive edge e in ˆ R k − i where (a) e is not in G ω and (b) there is a unique gradientpath in R i from t i to e that passes through triangles all of which are in R i \ ˆ R k − i . If such anedge e exists, then e is necessarily incident to a single triangle, say t , in ˆ R k − i . We collapsethe pair ( e, t ), which is necessarily an edge-triangle gradient vector pair because e is positive.We take ˆ R ki to be ˆ R k − i \ { e, t } . If no such e exists, then either (A) there is no positive edgein ˆ R k − i \ G ω any more; or (B) for each positive edge e (cid:48) ∈ ˆ R k − i \ G ω , (B-1) there is a uniquegradient path from t i to e (cid:48) but this path passes through some triangle in ˆ R k − i ; or (B-2) thereare two gradient paths from t i to e (cid:48) .If there is no positive edge in ˆ R k − i \ G ω any more, then ˆ R k − i ⊆ G ω ∪ L i , as otherwise,there will be at least some triangle from ˆ R k − i \ G ω ∪ L i with at least one boundary edge of itbeing positive. The induction then terminates; we set s = k − e (cid:48) ∈ ˆ R k − i \ L i is an edge not in G ω for which the unique gradient path from t i passes through triangles inˆ R k − i . Let e (cid:48)(cid:48) be the ﬁrst edge in this path that is in ˆ R ik − \ L i . Then, if e (cid:48)(cid:48) (cid:54)∈ G ω , it qualiﬁesfor the conditions (a) and (b) required for e reaching a contradiction. So, assume e (cid:48)(cid:48) ∈ G ω .But, in that case, we have a gradient path that goes into G ω and then comes out to reach e (cid:48) (cid:54)∈ G ω . There has to be a gradient pair in this path where the edge is in G ω and the triangleis not in G ω . This contradicts Claim 6.6. Thus, case (B-1) is not possible. Now consider(B-2): e (cid:48) must be negative by Proposition 6.5 (b.ii). So, it is not possible either.To summarize, the induction terminates in case (A), at which time we would have thatˆ R si ⊆ G ω ∪ L i . Furthermore, this process also establishes a deformation retraction from ˆ R i toˆ R si realized by successive collapses of edge-triangle pairs. Furthermore, by construction, eachcollapsed pair ( e, t ) must be from cl( G ω ), hence ∪ i ˆ R si contains all simplices in G ω . Combinedwith that ˆ R si ⊆ G ω ∪ L i , we have that ∪ i ˆ R si = G ω ∪ L , where L = ∪ i L i is a subset of thespanning tree T . The edges in L being part of a spanning tree cannot form a cycle andthus can be retracted along the tree to G ω , which gives rise to a deformation retraction from ∪ i ( R i \ t i ) to G ω ∪ L and then to G ω , establishing the ﬁrst part of Theorem 6.3.We now show that ( ∪ i ˆ R si ) (cid:84) G ω = G ω deformation retracts to ∪ B i (cid:83) H . Let ˆ L i be theedges in ˆ R si ∩ G ω with more than one gradient path from t i to them. These edges are negativeby Proposition 6.5 (b.ii). Replacing ˆ R i with ˆ R si ∩ G ω and edges in B i ∪ ˆ L i playing the role ofedges in G ω ∪ L i in the above induction, we can obtain that ˆ R si ∩ G ω deformation retracts to B i ∪ ˆ L i . Observe that now instead of Claim 2, we use the fact that no edge-triangle gradientpath crosses B i that consists of only negative and critical edges. To this end, we also observethat ∪ ˆ L i = T ∩ G ω where T is the spanning tree formed by all negative edges, as we onlycollapse edge-triangle pairs that are gradient pairs (hence the participating edges are alwayspositive). This implies that H ⊂ ∪ ˆ L i . Again, edges in ˆ L i (being part of a spanning tree) canbe retracted along the spanning tree till one reaches B i or edges in H . Performing this for14ach i , we thus obtain a deformation retraction from ( ∪ i ˆ R si ) (cid:84) G ω = G ω to ∪ B i (cid:83) ∪ ˆ L i andfurther to ∪ B i (cid:83) H = (cid:98) G . This ﬁnishes the proof of Theorem 6.3.In Appendix B, we also provide some experiments demonstrating the eﬃciency of thesimpliﬁed algorithm, as well as discussion on thresholding strategies. Acknowledgments

We thank Suyi Wang for generously helping with the software.The Enzo dataset used in our experiments is obtained from [1]. The Bone dataset isobtained from [2]. This work is supported by National Science Foundation under grantsCCF-1526513, CCF-1618247, and CCF-1740761, and by National Institute of Health undergrant R01EB022899.

References [1] Enzo code is developed by the Laboratory for Computational Astrophysics at theUniversity of California in San Diego (http://lca.ucsd.edu).[2] CIBC CT dataset archive. .[3] M. Aanjaneya, F. Chazal, D. Chen, M. Glisse, L. Guibas, and D. Morozov. Metric graphreconstruction from noisy data.

International Journal of Computational Geometry &Applications , 22(04):305–325, 2012.[4] D. Attali, M. Glisse, S. Hornus, F. Lazarus, and D. Morozov. Persistence-sensitivesimpliﬁcation of functions on surfaces in linear time.

Presented at TOPOINVIS , 9:23–24,2009.[5] U. Bauer, M. Kerber, J. Reininghaus, and H. Wagner. Phat–persistent homologyalgorithms toolbox.

Journal of Symbolic Computation , 78:76–90, 2017.[6] U. Bauer, C. Lange, and M. Wardetzky. Optimal topological simpliﬁcation of discretefunctions on surfaces.

Discr. Comput. Geom. , 47(2):347–377, 2012.[7] S. Biasotti, D. Giorgi, M. Spagnuolo, and B. Falcidieno. Reeb graphs for shape analysisand applications.

Theoretical Computer Science , 392(1-3):5–22, 2008.[8] F. Chazal, R. Huang, and J. Sun. Gromov–hausdorﬀ approximation of ﬁlamentarystructures using reeb-type graphs.

Discr. Comput. Geom. , 53(3):621–649, 2015.[9] O. Delgado-Friedrichs, V. Robins, and A. Sheppard. Skeletonization and partitioningof digital images using discrete morse theory.

IEEE Trans. Pattern Anal. MachineIntelligence , 37(3):654–666, March 2015.[10] T. Dey and J. Sun. Deﬁning and computing curve-skeletons with medial geodesicfunction. In

Sympos. Geom. Proc. , volume 6, pages 143–152, 2006.[11] T. Dey, J. Wang, and Y. Wang. Improved road network reconstruction using discretemorse theory. In

Proc. 25th ACM SIGSPATIAL . ACM, 2017.1512] T. Dey, J. Wang, and Y. Wang. Graph reconstruction by discrete morse theory. arXivpreprint arXiv:1803.05093 , 2018.[13] H. Edelsbrunner and J. Harer.

Computational Topology - an Introduction . AmericanMathematical Soc., 2010.[14] H. Edelsbrunner, D. Letscher, and A. Zomorodian. Topological persistence and simpliﬁ-cation.

Discr. Comput. Geom. , 28:511–533, 2002.[15] R. Forman. Morse theory for cell complexes.

Advances in mathematics , 134(1):90–145,1998.[16] X. Ge, I. I Safa, M. Belkin, and Y. Wang. Data skeletonization via reeb graphs. In

Advances in Neural Info. Proc. Sys. , pages 837–845, 2011.[17] A. Gyulassy, M. Duchaineau, V. Natarajan, V. Pascucci, E. Bringa, A. Higginbotham,and B. Hamann. Topologically clean distance ﬁelds.

IEEE Trans. Visualization ComputerGraphics , 13(6):1432–1439, Nov 2007.[18] T. Hastie and W. Stuetzle. Principal curves.

Journal of the American StatisticalAssociation , 84(406):502–516, 1989.[19] B. K´egl and A. Krzyzak. Piecewise linear skeletonization using principal curves.

IEEETrans. Pattern Anal. Machine Intelligence , 24(1):59–74, 2002.[20] L. Liu, E. W Chambers, D. Letscher, and T. Ju. Extended grassﬁre transform on medialaxes of 2d shapes.

Computer-Aided Design , 43(11):1496–1505, 2011.[21] J. Milnor.

Morse Theory . Princeton Univ. Press, New Jersey, 1963.[22] M. Natali, S. Biasotti, G. Patan`e, and B. Falcidieno. Graph-based representations ofpoint clouds.

Graphical Models , 73(5):151–164, 2011.[23] M. L. Norman, G. L. Bryan, R. Harkness, J. Bordner, D. Reynolds, B. O’Shea, andR. Wagner. Simulating Cosmological Evolution with Enzo.

ArXiv e-prints , May 2007. arXiv:0705.1556 .[24] B. W. O’Shea, G. Bryan, J. Bordner, M. L. Norman, T. Abel, R. Harkness, andA. Kritsuk. Introducing Enzo, an AMR Cosmology Application.

ArXiv Astrophysicse-prints , March 2004. arXiv:astro-ph/0403044 .[25] U. Ozertem and D. Erdogmus. Locally deﬁned principal curves and surfaces.

Journal ofMachine learning research , 12(Apr):1249–1286, 2011.[26] V. Robins, P. J. Wood, and A. P. Sheppard. Theory and algorithms for constructingdiscrete morse complexes from grayscale digital images.

IEEE Trans. Pattern Anal.Machine Intelligence , 33(8):1646–1658, Aug 2011.[27] T. Sousbie. The persistent cosmic web and its ﬁlamentary structure - I. Theory andimplementation. 414:350–383, June 2011. arXiv:1009.4015 .1628] S. Wang, Y. Wang, and Y. Li. Eﬃcient map reconstruction and augmentation viatopological methods. In

Proc. 23rd ACM SIGSPATIAL , page 25. ACM, 2015.[29] Y. Yan, K. Sykes, E. Chambers, D. Letscher, and T. Ju. Erosion thickness on medialaxes of 3d shapes.

ACM Trans. on Graphics , 35(4):38, 2016.17

Missing proofs

A.1 Proof of Proposition 5.1

Proof of claim (i):

We ﬁrst show that there can be no critical vertex of M δ from theinterior of outside-region cl( G ω ). Indeed, suppose there is a critical vertex v / ∈ G ω , then itforms a persistence pair either with ∞ or with a critical edge e ∈ K in P ( K ). The formercannot happen as the only vertex unpaired by the persistence algorithm is the global minimumof the input function f , which necessarily lies in the neighborhood G ω . So suppose ( v, e ) isthe persistent pair containing v . It then follows that e ∈ cl( G ω ) as it must come after v inthe lower-star ﬁltration F ( K ) induced by f . Hence pers( v, e ) is necessarily smaller than δ under our noise model. In other words, the persistence pairing ( v, e ) should have alreadybeen canceled during the δ -Morse simpliﬁcation process. As a result, there cannot be anycritical vertex left in cl( G ω ).Next, we argue that there is exactly one critical vertex in the neighborhood G ω in the ﬁnaldiscrete gradient vector ﬁeld M δ . First, note that the global minimum v of the PL-function f must remain critical, as v will be paired with ∞ by the persistent homology inducedby the lower-star ﬁltration, and the persistence pairing ( v , ∞ ) remains after the δ -Morsecancellation by Lemma 3.2 (ii). We now prove that there cannot be any other critical vertexin G ω .Assume on the contrary that there is another critical vertex u ∈ G ω . This means wehave a persistence pairing ( u, e (cid:48) ) with pers( u, e (cid:48) ) > δ . It then follows that e (cid:48) ∈ cl( G ω ) by ourassumption on the noise model for f = − ρ . Recall that the low-star ﬁltration F ( K ) is inducedby adding the lower-star of v i s in order where v , v , . . . , v n are sorted in increasing orderof f -values. Speciﬁcally, F ( K ) contains the following sequence: ∅ ⊂ ¯ K ⊂ ¯ K ⊂ · · · ⊂ ¯ K n , where ¯ K i = (cid:83) j ≤ i LowSt( v j ). Assume that e (cid:48) = < v a , v b > such that a < b (i.e, f ( v a ) < f ( v b )).Then ¯ K b is the subcomplex in the ﬁltration F ( K ) when the edge e (cid:48) is ﬁrst included. It followsfrom the persistence algorithm [14] that, there are two connected components C , C ⊂ ¯ K b − that is merged with the addition of e (cid:48) ; that is, v a ∈ C and v b ∈ C . Furthermore, since e (cid:48) forms a persistence pair with u , this means that u must be the global minimum of one ofthe components, say C w.o.l.g, and f ( u ) > f ( z ) where z is the global minimum of the othercomponent C . See Figure 2.Figure 2: Illustration for proof of Proposition 5.1.Hence z ∈ G ω as u ∈ G ω and f ( u ) > f ( z ). This however is not possible. Indeed, note18hat G ω ⊆ ¯ K b − : This is because that e (cid:48) ∈ cl( G ω ) (and thus ¯ f ( e (cid:48) ) > δ > ν ), implying ¯ K b − will contain all vertices (thus as well as simplices they span) whose function value is lowerthan δ , which contains G ω (whose vertices have f values < ν < δ ). Since both u and z are in G ω (which is connected), they are already connected in ¯ K b − , contradictory to that u ∈ C and z ∈ C are two diﬀerent connected components in ¯ K b − . Hence the persistence pairing( u, e ) cannot exist, and there is no critical vertex other than the global minimum v . Proof of claim (ii).

A critical edge forms a persistence pair with either a vertex, or atriangle, or remains unpaired. The last case is not possible as the input domain is simplyconnected. Any remaining critical edge cannot be paired with a (critical) vertex either byclaim (i). Statement (ii) then follows.

Proof of claim (iii).

We now prove Proposition (iii): Let e be a critical edge that isconsidered by Treebased-OutputG (). By Proposition 5.1, e forms a persistence pair with atriangle t . Since e is considered by Treebased-OutputG (), pers( e, t ) > δ . Under our noisemodel, this means that e and t cannot be both in G ω or both in its complement G ω . Then,we have two possibilities: (i) e ∈ G ω and t (cid:54)∈ G ω or (ii) t ∈ G ω and e (cid:54)∈ G ω . In case (i) thereis nothing to prove because e ∈ G ω ; case (ii) is impossible because the function value of t willbe less than that of its pairing edge e . A.2 Proof of Proposition 6.2

First, we need the following obvious claim.

Claim A.1.

Let G be any graph and T ⊆ G be a spanning tree. Let E ⊆ G \ T be a set ofedges. Let G E,T ⊆ G be the subgraph where G E,T = ( T ∪ E ) \ H where H is the largest set ofedges in T whose deletions do not eliminate any cycle. Given T and E , the set H and hence G E,T is unique.

We call G E,T in the above claim to be the minimal subgraph with respect to the edgeset E and the spanning tree T . Now, consider the spanning tree T of the 1-skeleton of K consisting of all negative edges. Addition of positive edges creates cycle. Let E be the set ofall positive edges with persistence more than δ . The graph consisting of edges T ∪ E hascycles. Consider the minimal subgraph G E,T . The graph ˆ G computed by the algorithm isthe graph G E,T plus a maximal set of edges in T whose deletions do not eliminate any cyclein (cid:98) G . Denoting this set of edges as H , one has ˆ G = G E,T ∪ H .On the other hand, the union of the boundaries ∪ i B i of the regions R i consist of onlynegative edges or positive edges that are critical. This is because otherwise the edges haveto be positive and non-critical, but then the two regions containing such edges are alreadymerged, eliminating them from boundaries. Therefore, ∪ i B i can be formed by taking thespanning tree T consisting of negative edges, adding the positive edges E to it and theneliminating all edges that cannot eliminate any cycle. Therefore, ∪ i B i also forms a subgraphof the 1-skeleton of K that is minimal with respect to the same set of positive edges E andthe spanning tree T . Hence, by claim G E,T = ∪ i B i and thus ˆ G = ∪ i B i (cid:83) H .19 .3 Proof of Proposition 6.4 First we show that the chain D i constructed by the persistence algorithm [14] is a subchainof C ii . The chain D i itself is constructed by adding chains as the algorithm searches for theedge e i to be paired with t i . Let D i = t i , D i , . . . , D i k be the ordered sequence of chains thatare added to construct D i .We use double induction. First, assume inductively that, for j < i , D j is a subchain of C jj .It is true initially for j = 1 because C = D = t . To prove the hypothesis for C ii , assumein a nested induction that D i , . . . , D i k − are subchains of C ii . Initially, t i and hence D i isa subchain of C ii . For the nested induction consider the edge e i k which is in the boundary ∂ ( D i + · · · + D i k − ). The persistence pair ( e i k , t i k ) appears before the pair ( e i , t i ). Therefore,the chain C i k i k is added to C i k i and thus becomes a subchain in C ii . But, the chain C i k i k contains D i k as a subchain which is also added to C ii as a result. This establishes that D i is a subchainof C ii .If the edge e i ∈ ∂D i is not in the boundary ∂C ii , it must be the case that, for some i (cid:48) < i , C i (cid:48) i (cid:48) has been added to C ii where ∂C i (cid:48) i (cid:48) contains e i . In that case, ( e i , t i (cid:48) ) must be a persistencepair according to the construction of C ji s. This is impossible because ( e i , t i ) is a persistencepair and i > i (cid:48) .To show that e i is the most recent edge in ∂C ii , assume inductively that e j is the mostrecent edge in ∂C jj for j < i . If e i is not the most recent edge in ∂C ii , let e i (cid:48) be the mostrecent one. Let C j (cid:48) j (cid:48) be the chain that was added to C ii because of which e i (cid:48) was includedin the boundary C ii . Clearly, j (cid:48) < i . Then, e i (cid:48) was in the boundary ∂C j (cid:48) j (cid:48) and by inductivehypothesis i (cid:48) ≤ j (cid:48) . It follows that i (cid:48) ≤ j (cid:48) < i reaching a contradiction that e i is not the mostrecent edge. The proposition then follows. A.4 Proof of Proposition 6.5

We induct on j . When j = 1, we have R ∗ i = t i for every i = 1 , . . . , n . Then, (a) & (b) aresatisﬁed trivially. Assume that they hold for j . Since R ∗ i = C ji by inductive hypothesis,Proposition 6.4 ensures that ∂R ∗ j contains e j , the edge with which t j pairs with. Then, byinductive hypothesis (b), there is a unique gradient path π j from t j to e j . The algorithm PerSimpVF () only reverses π j .The set R ∗ j is a constituent of the sets { R ∗ i } , i = j, . . . , n that partition the set of trianglesin K . Hence, the edge e j ∈ ∂R ∗ j necessarily belongs to another boundary ∂R ∗ j (cid:48) for some j (cid:48) > j . Observe that since each edge can be incident to at most two triangles, there is aunique such j (cid:48) . We update R ∗ j (cid:48) := R ∗ j (cid:48) + R ∗ j . Clearly, the sets R ∗ i , i = j + 1 , . . . , n partitionsthe set of triangles in K , proving claim (a.ii). We argue that R ∗ j (cid:48) = C j +1 j (cid:48) completing theproof that (a) holds after PerSimpVF () processes t j . Before the update it holds inductivelythat R ∗ j (cid:48) = C jj (cid:48) . After PerSimpVF () processes ( e j , t j ), the only 2-chain that gets updated is R ∗ j (cid:48) because e j is only in ∂R ∗ j (cid:48) where j (cid:48) (cid:54) = j . The new 2-chain R ∗ j (cid:48) after the update exactlysatisﬁes the deﬁnition of C j +1 j (cid:48) because R ∗ j (cid:48) := R ∗ j (cid:48) + R ∗ j = C jj (cid:48) + C jj . This proves claim (a.i).To show that claim (b) holds as well, we need to consider the only updated 2-chain R ∗ j (cid:48) .We say that an edge is in a 2-chain if one of its triangles contains the edge in its boundary.Let e be any edge in R ∗ j (cid:48) . If e is in R ∗ j (cid:48) before the update, then we already have a gradient20ath from t j (cid:48) to e by inductive hypothesis. If e is in R ∗ j but not in R ∗ j (cid:48) before the update,we have a gradient path from t j (cid:48) to e in the updated R ∗ j (cid:48) . This gradient path is obtained byconcatenating three sequences, say π , π and π . The sequence π is the gradient path from t j (cid:48) to e j in R ∗ j (cid:48) before the update. Let t be the last triangle shared by a path from t j to e and the unique path from t j to e j before the update. The sequence π is the subsequence ofthe reversed path π j from e j to t with e j and t removed, and the sequence π is the gradientpath from t to e that existed in R ∗ j . This establishes that there is a gradient path from t j (cid:48) toall edges in R ∗ j (cid:48) after the update. Now, to prove (b.i), if e is a boundary edge of R ∗ j (cid:48) after theupdate, it must be the case that e is in the boundary of either R ∗ j or R ∗ j (cid:48) but not in bothbefore the update. In that case, uniqueness of the gradient path from either t j or t j (cid:48) to e before the update implies the uniqueness of the path after the update. Hence, (b.i) holds forupdated R ∗ j (cid:48) and hence for all i > j .To prove claim (b.ii), suppose e ∈ R ∗ j (cid:48) has two gradient paths from t j (cid:48) to it after theupdate. If e is from R ∗ j (cid:48) before the update, then the gradient path from t j (cid:48) to it will notchange. Hence it must be negative in this case. So now suppose e is from R ∗ j but not from R ∗ j (cid:48) before the update. By the argument in the previous paragraph, we now that new gradientpath from t j (cid:48) to e consists of three portions, and it is easy to see that we can have two pathsfrom t j (cid:48) to e only if there were two paths from t j to e in R ∗ j before the update. Hence byinduction hypothesis, e must be negative as well. This proves claim (b.ii), and ﬁnishes theproof of Proposition 6.5. B Experiments

In this section, we perform our algorithm on 2D and 3D density ﬁelds. The 2D density ﬁeldsare generated from the GPS trajectories and the goal is to extract the hidden road networkbehind [28]. There are three 3D density ﬁelds: A synthetic dataset where the ground truth isknown; and two real-life datasets where the noise model assumptions may or may not besatisﬁed: Speciﬁcally, we have the

Enzo dataset [23, 24], which comes from the simulations ofcosmological structure formation in university and where the goal is to extract the ﬁlamentstructure behind; and the

Bone dataset which are Micro CT images of bones from the CTDataset Archive from CIBC [2]. For the

Bone dataset , we crop a portion of one bone since thetriangulation is huge. The sizes of these data sets are in Table 1, where “Athens”, “Beijing”,and “Berlin” represents the three 2D density ﬁelds obtained from large collection of noisyGPS traces in the three respective cities.The input points of the datasets are actually vertices from a 2D or 3D grid, so we obtaina triangulation of input points by simply triangulating each 2D/3D cubic cell. It is possibleto use a cubical complex directly, but using a triangulation allows us to threshold on theinput density function to remove noise and reduce the size of input simplicial complex.There are two parameters used in our experiments: the threshold δ which is the inputparameter of MorseRecon () and

MorseReconSimp () used for persistence-based simpliﬁcation;and the parameter t to threshold the density function as used by the thresholding method(not needed for our algorithm). Both parameters are chosen empirically.Below, we ﬁrst show the eﬃciency of our simpliﬁed algorithm MorseReconSimp () versusthe original algorithm

MorseRecon (). Then, we present some experimental results showing21hat, while the practical data does not fall under our noise model, empirically the algorithmstill works well (as also demonstrated earlier in work such as [27, 28]). Furthermore, as wementioned earlier, thresholding may be able to produce a graph with theoretical guaranteefor input density ﬁelds under our noise model, when it is combined with say, medial axis typeapproaches, to extract the graph after thresholding (even for that it is not yet straightforwardto obtain such theoretical result). However, we will present experiments that show thatsimple thresholding does not work well empirically for non-uniformly sampled input.

Name

Table 1: Size of the triangulations of the datasets.

Running time.

We implemented our simpliﬁed algorithm

MorseReconSimp () and nowcompare its running time with the original algorithm

MorseRecon (). As we mentioned inSection 3 in [11], this algorithm has already been simpliﬁed so that Morse-cancellation is only done for vertex-edge critical pairs. Hence we implemented this improved version of

MorseRecon (), which we refer to as

MorseRecon + () (the version of PerSimpVF () where onlyvertex-edge critical pairs are canceled is referred to as

PerSimpVF + ()). Note that a speedupof a factor of at least 2 has already been reported for MorseRecon + () over MorseRecon () onGPS data [11]. The step of computing the persistence pairing for the negation of densityﬁeld (both in

MorseRecon + () and our MorseReconSimp ()) is done by PHAT software package[5]. The comparison of their running time is shown in Table 2. Speciﬁcally, note the step ofcomputing persistence pairing is common to both the original

MorseRecon () algorithm andour simpliﬁed

MorseReconSimp () version. Furthermore, for 3D data, this step is currentlythe bottleneck (although it may be improved by using persistence algorithm more suitablefor volumetric data, such as DiPha). Hence we report the time for this step separately inthe 3rd column of the table. The 4th and 5th columns of Table 2 show the running time (inseconds) of algorithm

MorseRecon + () (without persistence computation) and of algorithm MorseReconSimp () (without persistence computation). As we can see, our simpliﬁed algorithmis more eﬃcient, generally we see at least a factor of 2 speed-up.

Reconstructions results.

We do not show the reconstruction results for 2D data sets,as these data sets are originally used in both [28, 11] and our output is the same (Corollary3.3). We now show the reconstruction results for 3D data sets. Since it is hard to obtainground truth for real life datasets, we ﬁrst show the result of a synthetic dataset where weknow the ground truth in Figure 3a. This dataset is generated as follows: First we create anarbitrary graph (the black lines) , then we diﬀuse this graph by convolution with a Gaussiankernel to obtain a 3D density ﬁeld. The band width of the Gaussian kernel is 4, comparing22 ame δ Pre-process

PerSimpVF + + CollectOutputG PerSimpTree + Treebased-OutputG

Athens 0.01 12.3 1.2 0.5Beijing 0.1 97.8 13.1 5.4Berlin 10 2.0 0.25 0.17Name δ Pre-process

PerSimpVF + + CollectOutputG PerSimpTree + Treebased-OutputG

ENZO 50 26.5 1.0 0.38Bone 40 869 21.6 8.2

Table 2: Running time (in seconds) of pre-process (column 3, including ﬁltration andpersistence computation ), algorithm

MorseRecon + () w/o persistence part (column 4) andour simpliﬁed algorithm MorseReconSimp () w/o persistence part (column 5).to the radius 50 of the input data. As we can see in Figure 3a, our output graph capturesthe two loops in the ground truth graph, which follows our theoretical results.The reconstruction results for real life data sets Enzo and Bone are given in Figure 3b andin Figure 3c, respectively, overlayed with the original density ﬁeld. There is no single goodthreshold t as the density has a rather non-homogeneous distribution, and structures can existat diﬀerent level of thresholds (which we will show more shortly below). Hence we provide avolume rendering of the input density ﬁeld overlapped with our output reconstruction.We also provide reconstructions of the bone dataset at diﬀerent threshold δ level in Figure4. As we increase the threshold δ , we capture fewer features of the data. (a) (b) (c) Figure 3: In the three ﬁgures above, the red lines are the reconstructions, the volumes showthe volume renderings of the input density functions. (a) Synthetic dataset.The black linesare the ground truth. (b) Enzo dataset. (c) Part of a bone dataset.23 a) 40 (b) 50 (c) 60 (d) 70

Figure 4: (a)-(d): Reconstructions of the bone dataset at diﬀerent threshold δ level (show inthe sub caption). Comparing with the thresholding method.

A natural way to denoise is simplythresholding. However, for many data, due to the non-homogeneousness, no single threshold t can capture all features. For example, as we show in Figures 5 and 6, branching / loopfeatures appear at diﬀerent time as we vary the density threshold t . Furthermore, featuresthat appear at high density threshold t may actually be destroyed at low threshold t . Hencethere is no single good threshold t to capture all features. As an example, see the two loopscircled with blue in Figure 5a for a high density threshold t , which is ﬁlled in a lower threshold t in 5c and 5d respectively. However, we need to lower the threshold t as many new features,such as the loops circled with green shown in 5c only appears in a lower threshold t . Similarly,for the bone dataset, we note that most of the vertical ﬁbers in the lower part can only becaptured at a lower threshold t . However, at that point, the features in the high densityregions (say the top part) are already merged.On the other hand, while we do not yet have theoretical guarantees for the discrete Morsebased graph reconstruction algorithm for such non-homogeneous data sets, we note that itperforms very well empirically, captures these features of diﬀerent density scales. (The redgraphs in both ﬁgures are the output reconstruction by our algorithm MorseReconSimp .) (a) 2 . −

30 (b) 2 . −

30 (c) 1 . −

30 (d) 1 . − Figure 5: (a)-(d): Part of the Enzo dataset. The gray volumes are isovolumes with increasinglower bounds (show in the sub caption), the red lines are the reconstructions. Reliablefeatures existing in high thresholds got killed when we lower it.24 a) 210 (b) 200 (c) 180 (d) 160

Figure 6: (a)-(d): Bone dataset. The blue volumes are isovolumes with increasing lowerbounds (show in the sub caption), the red lines are the reconstructions. Low threshold t captures features in lower part, high threshold tt