Frechet-Like Distances between Two Merge Trees

Abstract

The purpose of this paper is to extend the definition of the Frechet distance, which measures the distance between two curves, to a distance (the Frechet-Like distance) that measures the similarity between two rooted trees. The Frechet-Like distance is defined as follows: two men start from the roots of two trees. When a man reaches a node of degree greater than 2, he creates $k - 1$ copies of himself, where $k$ is the outgoing degree of the node, and each man monitors a man in the other tree (there is a rope between them). The distance is the minimum rope length that allows all the men to keep moving forward (their geodesic distance to the root of the tree increases) until the leaves of both trees are reached. Here, I prove that the Frechet-Like distance between two trees is SNP-hard to compute. I modify the definition of the Frechet-Like distance to measure the distance between two merge trees, and I prove a relation between the interleaving distance and the modified Frechet-Like distance.



Elena Farahbakhsh Touli
April 23, 2020

Introduction

In this paper I am interested in extending the definition of the Frechet distance between curves to a distance between two trees.

The Frechet distance measures the similarity between two curves. It was first defined by Maurice Fréchet [4, 9, 10] and was later studied by many other authors [1, 3, 4, 6, 7].

The intuitive definition of the Frechet distance between two curves is as follows: a man and his dog start from the starting points of two curves, connected by a leash. They can only move forward. The Frechet distance between the curves is the minimum leash length that lets the man and the dog walk from the beginning of their curves to the end without breaking the leash. The mathematical definition of the Frechet distance between two curves follows [4].

Definition 1. [4] Consider two curves $C_1 : [a, b] \to V$ and $C_2 : [a', b'] \to V$, such that $a < b$ and $a' < b'$ and $V$ is a vector space. The Frechet distance between $C_1$ and $C_2$ is the infimum, over all continuous increasing functions $\alpha : [0, 1] \to [a, b]$ and $\beta : [0, 1] \to [a', b']$, of the maximum distance between $C_1(\alpha(t))$ and $C_2(\beta(t))$ over $t \in [0, 1]$:

$$d_F(C_1, C_2) = \inf_{\alpha, \beta} \max_{t \in [0, 1]} d\bigl(C_1(\alpha(t)), C_2(\beta(t))\bigr).$$

The weak Frechet distance is a variant of the Frechet distance in which the man and the dog may also move backward [4]. Both the Frechet distance and the weak Frechet distance between two polygonal curves can be computed in polynomial time [4], but it is NP-hard to compute the Frechet distance between two surfaces [12], and until now no one has defined a Frechet distance between trees. The discrete Frechet distance was discussed by T. Eiter and H. Mannila in 1994 [8]. In 2012, P. K. Agarwal et al. gave an algorithm that computes the discrete Frechet distance between two polygonal curves in subquadratic time [1].
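As a concrete companion to the discrete Frechet distance mentioned above, the following is a minimal Python sketch of the standard dynamic program over couplings of two vertex sequences, in the spirit of Eiter and Mannila [8]. The function name `discrete_frechet` and the representation of a polygonal curve as a sequence of coordinate tuples are assumptions made for illustration; this is the quadratic-time recurrence, not the subquadratic algorithm of [1].

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def discrete_frechet(p: Sequence[Point], q: Sequence[Point]) -> float:
    """Discrete Frechet distance between two vertex sequences (illustrative sketch)."""
    if not p or not q:
        raise ValueError("both curves need at least one vertex")
    n, m = len(p), len(q)
    # ca[i][j] = smallest leash length that lets both walkers reach (p[i], q[j])
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])  # Euclidean distance between the two vertices
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)  # only the walker on q has moved
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)  # only the walker on p has moved
            else:
                # Either walker, or both, advanced by one vertex; nobody moves backward.
                best_prev = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
                ca[i][j] = max(best_prev, d)
    return ca[n - 1][m - 1]
```

For two parallel horizontal segments at vertical distance 1, sampled at the same abscissas, this sketch returns 1.0.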

New work.

In this paper I extend the definition of the Frechet distance between curves to define a similar distance between rooted trees.

This is the first time that a Frechet distance is defined between trees. I call it the Frechet-Like distance because of its similarity to the Frechet distance between curves. The intuitive definition of the Frechet-Like distance is as follows. Two men A and B start from the roots of trees $T_1$ and $T_2$, respectively, and there is a rope between them. When a man reaches a node (a vertex of degree greater than 2), he creates $k - 1$ men similar to himself, where $k$ is the outgoing degree of the node. Each man from $T_1$ ($T_2$) is assigned to walk simultaneously with a man in $T_2$ ($T_1$), with a rope between them; he can stop at a node or go forward with a man from $T_2$ ($T_1$), or the man in $T_2$ ($T_1$) stops and another one goes forward. A man from $T_1$ ($T_2$) can walk with more than one man from $T_2$ ($T_1$) only if the distance between them is not more than $\varepsilon$ and the man in $T_1$ ($T_2$) stops at one point. Also, if two men A and B walk simultaneously with two men A′ and B′, respectively, there is a rope between the nearest common ancestor of A and B and the nearest common ancestor of A′ and B′. The problem is to find the minimum rope length for which one man can start from the root of $T_1$ and another from the root of $T_2$ such that at least one man reaches every leaf of $T_1$ and at least one man reaches every leaf of $T_2$. Each man must either create other men or reach a leaf. In this definition, the length of the rope with one endpoint at $x \in |T_1|$ and the other at $y \in |T_2|$ is defined as $d(x, y)$. Here I take $d(x, y)$ to be the Euclidean distance between $x$ and $y$.

Later I modify the definition of the Frechet-Like distance to a definition between two merge trees. Considering the merge trees $T_f$ and $T_g$, I prove a relation between the modified Frechet-Like distance and the interleaving distance between two merge trees.

Definition 2.

Merge tree. [14, 15]
A merge tree is a rooted tree together with a function defined at each point of the tree. A merge tree $T_h$ is given by a pair $(T, h)$ such that $h : |T| \to \mathbb{R}$ is a monotone function, meaning that for $x, y \in |T|$, if $x < y$ then $h(x) < h(y)$.

Intuitively, a merge tree $(T, h)$ can be obtained as follows: take a tree and one of its nodes $u$, and hang the tree from $u$. Set $h(u) = 0$ for the node $u$ from which the tree hangs; for every other point of the tree, the function value in the merge tree $T_h^u$ is the negative of the distance between $u$ and that point.

The outline of this paper is as follows. Distances between two trees are discussed in Section 2. In Section 3, I define the Frechet-Like distance between trees, giving both the intuition and the mathematical definition. In Section 4, I prove that it is NP-hard to approximate the Frechet-Like distance between rooted trees. Section 5 modifies the Frechet-Like distance for two merge trees and proves a relation between the interleaving distance and the modified Frechet-Like distance. Section 6 is the conclusion.
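The hanging construction of Definition 2 can be sketched in a few lines of Python, restricted to the nodes of the tree. The adjacency format (a dict mapping each node to `(neighbor, edge_length)` pairs) and the name `hang_tree` are hypothetical choices for this sketch; function values at interior points of an edge follow by linear interpolation between its endpoints.

```python
from collections import deque
from typing import Dict, Hashable, List, Optional, Tuple

Adjacency = Dict[Hashable, List[Tuple[Hashable, float]]]


def hang_tree(adj: Adjacency, root: Hashable):
    """Hang an undirected weighted tree from `root` (illustrative sketch).

    Returns (h, parent), where h[v] is minus the path length from root to v
    and parent[v] is the neighbor of v on the path toward the root.
    """
    h: Dict[Hashable, float] = {root: 0.0}
    parent: Dict[Hashable, Optional[Hashable]] = {root: None}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for w, length in adj[v]:
            if w not in h:  # in a tree, each node is reached exactly once
                h[w] = h[v] - length  # moving away from the root lowers h
                parent[w] = v
                queue.append(w)
    return h, parent


# Example: a small tree hung from node "u".
example_adj: Adjacency = {
    "u": [("a", 1.0), ("b", 2.0)],
    "a": [("u", 1.0), ("c", 1.5)],
    "b": [("u", 2.0)],
    "c": [("a", 1.5)],
}
example_h, example_parent = hang_tree(example_adj, "u")
```

With positive edge lengths, every node receives a strictly smaller value than its parent, which matches the monotonicity requirement on $h$.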

Distances between trees have been studied extensively in previous years [2, 5, 13, 14, 15]. The tree edit distance and the tree alignment distance are two well-known distances defined between trees [13]. Both the tree edit distance and the tree alignment distance between two trees are MAX SNP-hard to compute. There is a polynomial-time algorithm for computing the tree alignment distance between two ordered trees if the degree of each node is bounded; however, no polynomial-time algorithm is known for computing the edit distance between ordered trees with bounded degrees. There is a polynomial-time algorithm for computing the tree edit distance between trees of bounded depth [13].

Definition 3.

Tree edit distance [13].
Consider two labeled trees $T_1$ and $T_2$. The tree edit distance is the minimum cost of transforming one tree into the other using three edit operations: add, remove, and rename.

Definition 4.

Tree alignment distance [13].
Consider two labeled trees $T_1$ and $T_2$. The alignment distance between the two trees is obtained as follows: first, nodes are added to $T_1$ and $T_2$ so that the modified trees $T_1'$ and $T_2'$ have the same structure. The associated cost is the cost of relabeling so that $T_1'$ and $T_2'$ also have the same labels. The minimum such cost over all structural changes is the alignment distance.

The following two notes on the tree edit distance and the tree alignment distance are from [13] and [15], respectively.

(Footnotes: $x < y$ means that $x$ is a descendant of $y$. An ordered tree is a rooted tree with an order on the children of each node [13].)

Note 1. The tree alignment distance is always greater than or equal to the tree edit distance. For an illustration, see Figure 1 (b).

Note 2.

Although there is a polynomial-time algorithm for computing the tree alignment distance between ordered labeled trees, the tree alignment distance cannot capture similarities between trees. Figure 1 (a) illustrates this.

Figure 1: (a) The two trees are very similar to each other, but the alignment distance between them is very large, because the tree alignment distance is sensitive to the relationship between children and parents. (b) If relabeling, removing, and adding a node each cost 1, the tree edit distance between the two trees is 2 and the tree alignment distance between them is 4. (c) The two trees (red and black) are completely different, yet the Hausdorff distance between them is small.

Another distance that can be considered between trees is the Hausdorff distance [6]. The Hausdorff distance is defined between two sets of points as follows:

Definition 5.

Hausdorff distance [6].
Given sets $S_1$ and $S_2$ in a space, for each point $s$ in $S_1$ we find the closest point to it in $S_2$ (call it $s'$), and for each point in $S_2$ we find the closest point to it in $S_1$. The Hausdorff distance is the maximum over all the distances found. Formally,

$$d_H(S_1, S_2) = \max\Bigl\{ \sup_{s \in S_1} \inf_{s' \in S_2} d(s, s'),\ \sup_{s' \in S_2} \inf_{s \in S_1} d(s, s') \Bigr\}.$$

If we consider the underlying spaces of trees embedded in Euclidean space, we can define the Hausdorff distance between two trees. However, the Hausdorff distance cannot capture dissimilarities between trees. For example, in Figure 1 (c) the two trees are very different, yet the Hausdorff distance between them is very small.

Another distance that can be considered between trees is the interleaving distance [14, 15]. The interleaving distance is defined between two merge trees $T_f$ and $T_g$ in terms of two continuous maps $\alpha$ and $\beta$:

Definition 6. [2, 14, 15] The interleaving distance between two merge trees $T_f$ and $T_g$ is defined as

$$d_I(T_f, T_g) = \inf\{\delta \mid \text{there is a pair of } \delta\text{-compatible maps between } T_f \text{ and } T_g\},$$

where two continuous maps $\alpha_\delta : |T_f| \to |T_g|$ and $\beta_\delta : |T_g| \to |T_f|$ are $\delta$-compatible if and only if the following conditions are satisfied:
(1) for all $u \in |T_f|$, $g(\alpha_\delta(u)) = f(u) + \delta$,
(2) for all $v \in |T_g|$, $f(\beta_\delta(v)) = g(v) + \delta$,
(3) for all $u \in |T_f|$, $\beta_\delta \circ \alpha_\delta(u) = u^{2\delta}$,
(4) for all $v \in |T_g|$, $\alpha_\delta \circ \beta_\delta(v) = v^{2\delta}$.

In [2], P. K. Agarwal et al. proved that it is NP-hard to compute the interleaving distance between two merge trees, and concluded that it is NP-hard to approximate the Gromov-Hausdorff distance between trees within a factor better than 3. Later, in 2019, E. Farahbakhsh and Y. Wang [15] defined a $\delta$-good map from $T_f$ to $T_g$ as follows:

Definition 7. [15] A map $\alpha_\delta : |T_f| \to |T_g|$ is called a $\delta$-good map if and only if the following conditions are satisfied:
(C1) $\alpha_\delta$ is continuous,
(C2) for every point $u \in |T_f|$, $g(\alpha_\delta(u)) = f(u) + \delta$,
(C3) for every pair of points $v_1 = \alpha_\delta(u_1)$ and $v_2 = \alpha_\delta(u_2)$, if $v_1 \ge v_2$ then $u_1^{2\delta} \ge u_2^{2\delta}$,
(C4) if there is a point $v \in |T_g|$ that is not in the image of $\alpha_\delta$, then $g(v^F) - g(v) \le \delta$.

Using the definition of a $\delta$-good map, they proved the following theorem:

Theorem 1. [15] $d_I(T_f, T_g) \le \delta$ if and only if there is a $\delta$-good map $\alpha_\delta : |T_f| \to |T_g|$.

In this section I define the Frechet-Like distance between two rooted trees. Given two merge trees $T_1$ and $T_2$ rooted at $u$ and $v$, respectively, consider two men A and B who start walking from the points $u$ and $v$, respectively. The two men are connected by a rope. If a man reaches a node (of degree greater than 2), he creates $k - 1$ copies of himself, where $k$ is the outgoing degree of the node. Each man can monitor (be connected by a rope to) just one man at a time, unless he stops and the others move at most $\varepsilon$ away. Here the distance between two men is defined as the distance between the function values of the merge trees at the points where they stand. The Frechet-Like distance is the minimum, over all such walks, of the largest distance between the points where two connected men stand. The mathematical definition of the Frechet-Like distance follows.

Definition 8.

Frechet-Like Distance

For two given rooted trees $T_1$ and $T_2$, I define the Frechet-Like distance as follows:

$$d_{FL}(T_1, T_2) := \min_{R \in \mathcal{R}} \sup_{(x, y) \in R} d(x, y),$$

where $d(x, y)$ is the Euclidean distance between the two points $x$ and $y$, and $\mathcal{R}$ is the set of correspondences $R \subseteq |T_1| \times |T_2|$ satisfying the following conditions:
1) $\forall x \in |T_1|$, $\exists y \in |T_2|$ s.t. $(x, y) \in R$.
1-i) $\forall y \in |T_2|$, $\exists x \in |T_1|$ s.t. $(x, y) \in R$.
2) If $(x_1, y_1) \in R$ and $(x_2, y_2) \in R$ and $x_1 \ge x_2$ and $y_1 \ge y_2$, then
2-i) $\forall x$ s.t. $x_2 \le x \le x_1$, $\exists y$ s.t. $y_2 \le y \le y_1$ and $(x, y) \in R$, and
2-ii) $\forall y$ s.t. $y_2 \le y \le y_1$, $\exists x$ s.t. $x_2 \le x \le x_1$ and $(x, y) \in R$.
3) If $(x_1, y_1) \in R$ and $(x_2, y_2) \in R$, then $(x_1 \sim x_2, y_1 \sim y_2) \in R$.
4) If $x \in |T_1|$ is a leaf, there should be a leaf $y \in |T_2|$ such that $(x, y) \in R$, unless there is a $y'$ such that $(x, y') \in R$ and $(x^N, y') \in R$.
4-i) If $y \in |T_2|$ is a leaf, there should be a leaf $x \in |T_1|$ such that $(x, y) \in R$, unless there is an $x'$ such that $(x', y) \in R$ and $(x', y^N) \in R$.

(Footnotes to Definitions 6 and 7: $u^{2\delta}$ is the ancestor of $u$ in $T_f$ such that $f(u^{2\delta}) - f(u) = 2\delta$. $v^F$ is the nearest ancestor of $v$ that is in the image of $\alpha_\delta$.)

In this section I prove that the Frechet-Like distance between two rooted trees is SNP-hard to compute, by a reduction from UNRESTRICTED-PARTITION. The proof is very similar to the hardness proof for the Gromov-Hausdorff distance between two merge trees in [2].

UNRESTRICTED-PARTITION.

Input: a multiset of positive integers $X = \{a_1, \ldots, a_n\}$ such that $n = 3k$.
Output: Is there a partition of $X$ into $m$ multisets $X_1, \ldots, X_m$ such that, writing $S_j$ for the sum of the elements of $X_j$, we have $S_j = \bigl(\sum_{i=1}^{n} a_i\bigr)/m$ for each multiset $X_j$? [11]

Theorem 2.

The problem UNRESTRICTED-PARTITION is SNP-complete.

Proof.

See [11].

Here, I construct two merge trees $T_f$ and $T_g$ as follows. In the following figure, A and B are two large numbers.

Figure 2: Two trees $T_f$ and $T_g$. A and B are two large numbers.

If I consider two merge trees $T_f$ and $T_g$ whose roots are located at the same point, then $d(x, y) = |f(x) - g(y)|$ as in Definition 9. I now prove the hardness of approximating the Frechet-Like distance through the following lemmas.

(Footnotes: $x_1 \sim x_2$ is the nearest common ancestor of $x_1$ and $x_2$. $x^N$ is the nearest node that is an ancestor of $x$.)

Lemma 1. $d_{FL}(f, g) \le 1$ if UNRESTRICTED-PARTITION is a yes instance.

Proof. If UNRESTRICTED-PARTITION is a yes instance, I can construct a correspondence $R \subseteq |T_f| \times |T_g|$ such that $\sup_{(x, y) \in R} |f(x) - g(y)| \le 1$. Since the instance is a yes instance, I can partition $X$ into multisets $X_1, X_2, \ldots$ of equal sum, with $X_i = \{a_{i,1}, \ldots, a_{i,k_i}\}$. Therefore, I map the subtrees rooted at $\{u_{i_1}, \ldots, u_{i_{k_i}}\}$ to $v_i$, where $u_{i_j}$ corresponds to $a_{i,j}$ in the construction of the tree and $v_i$ corresponds to $X_i$. When I say that I map a point $x \in |T_f|$ to a point $y \in |T_g|$, I mean that $(x, y) \in R$. If $(u_{i_j}, v_i) \in R$ and $(u_{i_k}, v_i) \in R$, then $(u_r, v_i) \in R$ (for an illustration, see Figure 2). Therefore, I can construct a correspondence $R \subseteq |T_f| \times |T_g|$ such that $\sup_{(x, y) \in R} |f(x) - g(y)| \le 1$.

Lemma 2.

If UNRESTRICTED-PARTITION is a no instance, $d_{FL}(f, g) \ge 3$.

Proof. If UNRESTRICTED-PARTITION is a no instance, then, since the edges of length A are too large, we have to find a correspondence $R$ such that for any pair of points $x_1, x_2 \in T_f$ with $x_1 \parallel x_2$, there are two different points $y_1, y_2 \in T_g$ such that $(x_1, y_1) \in R$ and $(x_2, y_2) \in R$. Therefore, the best correspondence that satisfies the conditions of Definition 9 maps $x_1 \sim x_2$ to two different points $y_1$ and $y_2$, as shown in Figure 4. This indicates that the Frechet-Like distance between $T_f$ and $T_g$ cannot be smaller than 3.

From the two lemmas above, we can conclude the following result.

Corollary 1.

Computing a $(3 - \epsilon)$-approximation of the Frechet-Like distance between two merge trees $T_f$ and $T_g$ is NP-hard.

In this section we define a Frechet-Like distance between two merge trees, which we call the modified Frechet-Like distance. It is defined for two merge trees $T_f$ and $T_g$ rooted at $u$ and $v$, respectively.

(Footnote: $x_1 \parallel x_2$ if neither $x_1 \le x_2$ nor $x_2 \le x_1$.)

Definition 9.

Frechet-Like Distance

For two given merge trees $T_f$ and $T_g$, we define the Frechet-Like distance as follows:

$$d_{MFL}(T_f, T_g) := \min_{R \in \mathcal{R}} \sup_{(x, y) \in R} |f(x) - g(y)|,$$

where the correspondence $R \subseteq |T_f| \times |T_g|$ is defined as in Definition 8.

With the following lemma, we prove a relation between the modified Frechet-Like distance and the interleaving distance between merge trees.
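To make the objective in Definition 9 concrete, the sketch below evaluates $\sup_{(x,y) \in R} |f(x) - g(y)|$ for a finite correspondence given on sampled points of the two merge trees. It is an illustration only: the names are hypothetical, each tree is represented just by a dict from sample points to function values, and only the two covering conditions (1 and 1-i) are checked. The order, ancestor, and leaf conditions (2 through 4-i) are assumed to hold and are not verified. The returned value is the cost of the one correspondence supplied, not the minimum over all correspondences.

```python
from typing import Dict, Hashable, Iterable, Tuple


def correspondence_cost(
    f: Dict[Hashable, float],
    g: Dict[Hashable, float],
    pairs: Iterable[Tuple[Hashable, Hashable]],
) -> float:
    """Cost sup |f(x) - g(y)| of a finite correspondence (illustrative sketch).

    f and g map sampled points of T_f and T_g to their function values.
    Raises ValueError if some sampled point of either tree is left unmatched.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("the correspondence is empty")
    uncovered_x = set(f) - {x for x, _ in pairs}  # violates condition 1
    uncovered_y = set(g) - {y for _, y in pairs}  # violates condition 1-i
    if uncovered_x or uncovered_y:
        raise ValueError("every sampled point of both trees must appear in some pair")
    # On a finite sample the supremum is a maximum.
    return max(abs(f[x] - g[y]) for x, y in pairs)


# Hypothetical sample: three points of T_f, two points of T_g.
f_values = {"r": 0.0, "a": -1.0, "b": -2.0}
g_values = {"s": 0.0, "c": -1.5}
cost = correspondence_cost(f_values, g_values, [("r", "s"), ("a", "c"), ("b", "c")])
```

In this example the cost is 0.5, attained by both pairs that involve the point `c`.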

Lemma 3.

If there exists an $\varepsilon$ such that $d_{MFL}(T_f, T_g) \le \varepsilon$, then $d_I(T_f, T_g) \le \varepsilon$.

Proof. By Theorem 1, to prove this lemma it suffices to find an $\varepsilon$-good map $\alpha_\varepsilon : |T_f| \to |T_g|$, that is, a map satisfying the four conditions in the definition of an $\varepsilon$-good map. We construct $\alpha_\varepsilon$ as follows.

Since the Frechet-Like distance between $T_f$ and $T_g$ is not greater than $\varepsilon$, by Definition 9 there is a correspondence $R$ that satisfies the four conditions of Definition 9. To construct the $\varepsilon$-good map, for any pair of points $(x, y) \in R$, if $g(y) = f(x) + \varepsilon$, we map the point $x$ to $y$; in other words, $\alpha_\varepsilon(x) = y$. Otherwise, if $g(y) < f(x) + \varepsilon$, we map $x$ to the point $y'$ such that $y \le y'$ and $g(y') = f(x) + \varepsilon$, which means that $\alpha_\varepsilon(x) = y'$.

Now we need to prove that $\alpha_\varepsilon$ is an $\varepsilon$-good map. To do so, we need to prove that the four conditions of Definition 7 are satisfied for the map $\alpha_\varepsilon$.

C1.

We need to prove that the map $\alpha_\varepsilon$ is continuous. To do so, we use a method similar to the one in [15].

C2.

By the construction of the map $\alpha_\varepsilon$, for any pair of points $(x, y) \in R$ we map $x$ to a point whose function value is $\varepsilon$ higher than that of $x$. Since for every point $x$ in $|T_f|$ there is at least one $y$ such that $(x, y) \in R$, we can conclude that for every point $x$ in $|T_f|$, $g(\alpha_\varepsilon(x)) = f(x) + \varepsilon$, which satisfies condition (C2) of Definition 7.

C3.

If two pairs of points $(x_1, y_1) \in R$ and $(x_2, y_2) \in R$ satisfy $y_1 \le y_2$, we know that $g(y_1) \le g(y_2)$. Therefore, by the construction of the map $\alpha_\varepsilon$, we have $f(x_1) \le f(x_2)$. Two cases can happen:
Case 1: $x_1 \le x_2$, in which case $x_1^{2\varepsilon} \le x_2^{2\varepsilon}$.
Case 2: $x_1 \parallel x_2$. Suppose, for contradiction, that $x_1^{2\varepsilon} \not\le x_2^{2\varepsilon}$; then $x_1^{2\varepsilon} \parallel x_2^{2\varepsilon}$, since $f(x_1) \le f(x_2)$. By the definition of the Frechet-Like distance, the highest $y$ such that $(x_2, y) \in R$ is $y_2$. Therefore, by condition 3 of the Frechet-Like distance, the highest point $y$ in $T_g$ such that $(x_1 \sim x_2, y) \in R$ is $y_2$, and $f(x_1 \sim x_2) - g(y_2) > \varepsilon$. This contradicts the fact that the Frechet-Like distance between $T_f$ and $T_g$ is less than or equal to $\varepsilon$.

Figure 5:

C4.

If there is a point $y \in T_g$ such that no $x \in T_f$ maps to $y$ under the map $\alpha_\varepsilon$, then, since we already proved in C1 that the map is continuous, the point must lie on a branch connecting a leaf (say $y_L$) to the tree, and none of the points $y' \le y$ are in the image of the map $\alpha_\varepsilon$. Now I just need to prove that $g(y^F) - g(y) \le \varepsilon$. Suppose, for contradiction, that $g(y^F) - g(y) > \varepsilon$, and let $x$ be a point with $(x, y^F) \in R$. By condition 4 of the definition of the Frechet-Like distance, $x$ is a leaf. Therefore $(x, y_L) \in R$, which contradicts the fact that $d_{MFL}(T_f, T_g) \le \varepsilon$.

In this paper I extended the Frechet distance between two curves to the Frechet-Like distance between rooted trees. In Section 2, I discussed some distances that have been defined between two trees. I gave a new definition for measuring the similarity between two trees in Section 3. I called the new distance the Frechet-Like distance because of its similarity to the Frechet distance between curves. The hardness of approximation was discussed in Section 4. There, I also proved that although there is a polynomial-time algorithm for computing the Frechet distance between polygonal curves [4], it is NP-hard to approximate the Frechet-Like distance between two trees. The relation between the Frechet-Like distance between two merge trees and the interleaving distance was discussed in Section 5.

References

[1] Pankaj K. Agarwal, Rinat Ben Avraham, Haim Kaplan, and Micha Sharir. Computing the discrete Fréchet distance in subquadratic time. SIAM J. Comput., 43:429–449, 2013.
[2] Pankaj K. Agarwal, Kyle Fox, Abhinandan Nath, Anastasios Sidiropoulos, and Yusu Wang. Computing the Gromov-Hausdorff distance for metric trees. In ISAAC, 2015.
[3] Hugo Alves Akitaya, Maike Buchin, Leonie Ryvkin, and Jérôme Urhausen. The k-Frechet distance. 2019.
[4] Helmut Alt and Michael Godau. Computing the Fréchet distance between two polygonal curves. Int. J. Comput. Geometry Appl., 5:75–91, 1995.
[5] Philip Bille. A survey on tree edit distance and related problems. Theor. Comput. Sci., 337(1-3):217–239, June 2005.
[6] Kevin Buchin, Maike Buchin, and Carola Wenk. Computing the Fréchet distance between simple polygons in polynomial time. volume 2006, pages 80–87, 01 2006.
[7] Maike Buchin, Anne Driemel, and Bettina Speckmann. Computing the Fréchet distance with shortcuts is NP-hard. In Symposium on Computational Geometry, 2013.
[8] Thomas Eiter and Heikki Mannila. Computing discrete Fréchet distance. Technical report, 1994.
[9] G.M.N. Ewing. Calculus of Variations with Applications. Dover Books on Mathematics. Dover Publications, 1985.
[10] M. Maurice Fréchet. Sur quelques points du calcul fonctionnel. Rendiconti del Circolo Matematico di Palermo (1884-1940), 22(1):1–72, Dec 1906.
[11] M. R. Garey and David S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. 1978.
[12] Michael Godau. On the complexity of measuring the similarity between geometric objects in higher dimensions. 1999.
[13] Tao Jiang, Lusheng Wang, and Kaizhong Zhang. Alignment of trees: an alternative to tree edit. 1995.
[14] Dmitriy Morozov, Kenes Beketayev, and Gunther H. Weber. Interleaving distance between merge trees. In Workshop on Topological Methods in Data Analysis and Visualization: Theory, Algorithms and Applications, 2013.
[15] Elena Farahbakhsh Touli and Yusu Wang. FPT-algorithms for computing Gromov-Hausdorff and interleaving distances between trees. In