Frechet-Like Distances between Two Merge Trees
Elena Farahbakhsh Touli
April 23, 2020
Abstract
The purpose of this paper is to extend the definition of the Frechet distance, which measures the distance between two curves, to a distance (the Frechet-Like distance) that measures the similarity between two rooted trees. The definition of the Frechet-Like distance is as follows: two men start from the roots of two trees. When a man reaches a node with degree greater than 2, he constructs $k - 1$ copies of himself, where $k$ is the outgoing degree of the node, and each man monitors a man in the other tree (there is a rope between them). All men go forward (their geodesic distance to the root of the tree increases) until they reach the leaves of the trees. The distance is the minimum length of the ropes between the men and the men whom they monitor for which this is possible. Here, I prove that the Frechet-Like distance between two trees is SNP-hard to compute. I modify the definition of the Frechet-Like distance to measure the distance between two merge trees, and I prove the relation between the interleaving distance and the modified Frechet-Like distance.

Introduction
In this paper I am interested in extending the definition of the Frechet distance between curves to a distance between two trees.

The Frechet distance measures the similarity between two curves. It was first defined by Maurice Fréchet [4, 9, 10]. Later, the Frechet distance attracted attention and was studied by others [1, 3, 4, 6, 7].

The intuitive definition of the Frechet distance between two curves is as follows: a man and his dog start from the starting points of two curves, and a leash connects the dog to the man. They can only go forward. The Frechet distance between the curves is the minimum length of leash with which the man and the dog can travel from the beginning of their curves to the end without breaking the leash. In the following I state the mathematical definition of the Frechet distance between two curves [4].
Definition 1. [4] Suppose that I have two curves $C_1 : [a, b] \to V$ and $C_2 : [a', b'] \to V$, such that $a < b$, $a' < b'$, and $V$ is a vector space. The Frechet distance between $C_1$ and $C_2$ is defined as the infimum, over all continuous increasing functions $\alpha : [0, 1] \to [a, b]$ and $\beta : [0, 1] \to [a', b']$, of the maximum distance between $C_1(\alpha(t))$ and $C_2(\beta(t))$ over $t \in [0, 1]$. In this case, the Frechet distance is defined as follows:
$$d_F(C_1, C_2) = \inf_{\alpha, \beta} \max_{t \in [0, 1]} \{ d(C_1(\alpha(t)), C_2(\beta(t))) \}.$$

The weak Frechet distance is a variant of the Frechet distance in which the man and the dog may also go backward [4]. Both the Frechet distance and the weak Frechet distance between two polygonal curves can be computed in polynomial time [4], but it is NP-hard to compute the Frechet distance between two surfaces [12], and until now no one has defined a Frechet distance between trees. The discrete Frechet distance was discussed by T. Eiter and H. Mannila in 1994 [8]. In 2012, P. K. Agarwal et al. found an algorithm that computes the discrete Frechet distance between two polygonal curves in subquadratic time [1].
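As an illustration of the discrete Frechet distance mentioned above, the following is a minimal Python sketch of the standard quadratic dynamic program over pairs of vertices, in the spirit of Eiter and Mannila [8]. It is not the subquadratic algorithm of [1]. The function name `discrete_frechet` and the representation of a curve as a list of coordinate tuples are assumptions made for this example.

```python
import math


def discrete_frechet(P, Q):
    """Discrete Frechet distance between two vertex sequences P and Q.

    P and Q are non-empty lists of points (tuples of equal dimension).
    Hypothetical helper written for illustration only.
    """
    n, m = len(P), len(Q)
    if n == 0 or m == 0:
        raise ValueError("both curves need at least one vertex")
    # ca[i][j] holds the discrete Frechet distance between the
    # prefixes P[0..i] and Q[0..j].
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(P[i], Q[j])  # Euclidean leash length at (i, j)
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                # Both walkers only move forward: one of them, or both,
                # advanced by one vertex to reach (i, j).
                best_prev = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
                ca[i][j] = max(best_prev, d)
    return ca[n - 1][m - 1]
```

For two parallel horizontal segments at vertical distance 1, sampled at the same abscissas, the sketch returns 1.0, which matches the intuition that a leash of length 1 suffices.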
New work.
In this paper I extend the definition of the Frechet distance between curves to define a similar distance between rooted trees.

This is the first time that a Frechet distance is defined between trees. I call it the Frechet-Like distance because of the similarity of its definition to the Frechet distance between curves. The intuitive definition of the Frechet-Like distance is as follows: two men A and B start from the roots of merge trees $T_1$ and $T_2$ respectively, and there is a rope between them. When a man reaches a node (a vertex with degree greater than 2), he constructs $(k - 1)$ men similar to himself, where $k$ is the outgoing degree of the node. Each man from $T_1$ ($T_2$) is assigned to walk simultaneously (there is a rope between them) with a man in $T_2$ ($T_1$); he can stop at a node or go forward with a man from $T_2$ ($T_1$), or the man in $T_2$ ($T_1$) stops and another one goes forward. A man from $T_1$ ($T_2$) can walk with more than one man from $T_2$ ($T_1$) only if the distance between them is not more than $\varepsilon$ and the man in $T_1$ ($T_2$) stops at one point. Also, if two men A and B walk simultaneously with two men A' and B', respectively, there is a rope between the nearest common ancestor of A and B and the nearest common ancestor of A' and B'. The problem is to find the minimum length of rope with which one man can start from the root of $T_1$ and another from the root of $T_2$ such that at least one man reaches every leaf of $T_1$ and at least one man reaches every leaf of $T_2$. A man should either construct other men or reach a leaf. In this definition, the length of a rope with one endpoint at $x \in |T_1|$ and the other endpoint at $y \in |T_2|$ is defined as $d(x, y)$. Here I consider the Euclidean distance as the distance between $x$ and $y$.

Later I modify the definition of the Frechet-Like distance to a definition between two merge trees. Considering the merge trees $T_f$ and $T_g$, I prove the relation between the modified Frechet-Like distance and the interleaving distance between two merge trees.

Definition 2.
Merge tree [14, 15].
A merge tree is a rooted tree together with a function defined on each point of the tree. A merge tree $T_h$ is defined by a pair $(T, h)$ such that $h : |T| \to \mathbb{R}$ is a monotone function, which means that for $x, y \in |T|$, if $x < y$ then $h(x) < h(y)$.

Intuitively, I can define a merge tree $(T, h)$ as follows: consider a tree and a node $u$ of the tree. Hang the tree from the node $u$. I set the function value $h(u) = 0$ for the node $u$ from which I hang the tree, and for every other point of the tree, the function value of that point in the merge tree $T_h^u$ is the negative of the distance between $u$ and the point.

The outline of this paper is as follows: distances between two trees are discussed in Section 2. In Section 3, I define the Frechet-Like distance between trees, giving both the intuition and the mathematical definition. In Section 4, I prove that it is NP-hard to approximate the Frechet-Like distance between rooted trees. Section 5 modifies the Frechet-Like distance for two merge trees; in that section I also prove the relation between the interleaving distance and the modified Frechet-Like distance between two merge trees. Section 6 is the conclusion.
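The intuitive construction in Definition 2 (hang the tree from a node $u$ and let $h$ be the negative distance from $u$) can be sketched in Python as follows. The adjacency-list representation with edge lengths and the name `hang_tree` are assumptions made for illustration. Only the values of $h$ at the nodes are computed; on the interior of an edge, $h$ would be obtained by linear interpolation between its endpoints.

```python
def hang_tree(adj, root):
    """Return h(node) = -(path length from root to node) for a tree.

    adj maps each node to a list of (neighbor, edge_length) pairs and is
    assumed to describe an undirected tree with positive edge lengths.
    Hypothetical helper written for illustration only.
    """
    h = {root: 0.0}
    stack = [root]
    while stack:
        v = stack.pop()
        for w, length in adj[v]:
            if w not in h:  # w is a child of v once the tree hangs from root
                h[w] = h[v] - length
                stack.append(w)
    return h


# A small example: u has children a and b, and a has a child c.
example = {
    "u": [("a", 1.0), ("b", 2.0)],
    "a": [("u", 1.0), ("c", 0.5)],
    "b": [("u", 2.0)],
    "c": [("a", 0.5)],
}
heights = hang_tree(example, "u")
```

In this example every descendant receives a strictly smaller value than its ancestors, so the resulting function is monotone in the sense of Definition 2.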
Distance between trees is a topic that has been studied in previous years [2, 5, 13, 14, 15]. The tree edit distance and the tree alignment distance are two well-known distances defined between trees [13]. For unordered trees, both the tree edit distance and the tree alignment distance are MAX SNP-hard to compute in general. There is a polynomial-time algorithm for computing the tree alignment distance between two unordered trees if the degree of each node is bounded; however, no polynomial-time algorithm is known for the edit distance between unordered trees with bounded degrees. There is a polynomial-time algorithm for computing the tree edit distance if we consider trees with bounded depth [13].

Definition 3.
Tree edit distance [13].
Consider two labeled trees $T_1$ and $T_2$. The tree edit distance is the minimum cost of transforming one tree into the other using three editing operations: add, remove, and rename.

Definition 4.
Tree alignment distance [13].
Consider two labeled trees $T_1$ and $T_2$. The alignment distance between the two trees is obtained as follows: first I add nodes to $T_1$ and $T_2$ so that the modified trees $T_1'$ and $T_2'$ have the same structure. The associated cost is the cost of changing labels so that $T_1'$ and $T_2'$ also have the same labels. The minimum cost over all such structural changes is the alignment distance.

The following two notes about the tree edit distance and the tree alignment distance are from [13] and [15], respectively.

(Footnote: $x < y$ means that $x$ is a descendant of $y$.)
(Footnote: An ordered tree is a rooted tree in which there is an order among the children of each node [13].)

Note 1. The tree alignment distance is always greater than or equal to the tree edit distance. For an illustration, see Figure 1 (b).
Note 2.
Although there is a polynomial-time algorithm for finding the tree alignment distance between ordered labeled trees, the tree alignment distance cannot capture similarities between trees. Figure 1 (a) illustrates this.

Figure 1: (a) Two trees are very similar to each other, but the alignment distance between them is very large, because the tree alignment distance is sensitive to the relationship between children and parents. (b) If the cost of relabeling, removing, or adding a node is 1, the tree edit distance between the two trees is 2, and the tree alignment distance between them is 4. (c) Two trees (drawn in red and in black) are completely different, yet the Hausdorff distance between them is small.

Another distance that we can consider between trees is the Hausdorff distance [6]. The Hausdorff distance is defined between two sets of points, as follows:
Definition 5.
Hausdorff Distance [6].
For given sets $S_1$ and $S_2$ in a space, for each point $s$ in $S_1$ we find the closest point to it in $S_2$ (call it $s'$), and for each point in $S_2$ we find the closest point to it in $S_1$. The Hausdorff distance is the maximum over all the distances found. The mathematical definition of the Hausdorff distance is as follows:
$$d_H(S_1, S_2) = \max\left\{ \sup_{s \in S_1} \inf_{s' \in S_2} d(s, s'),\ \sup_{s' \in S_2} \inf_{s \in S_1} d(s, s') \right\}.$$

If we consider the underlying spaces of the trees as point sets in Euclidean space, we can define the Hausdorff distance between two trees. However, the Hausdorff distance cannot capture dissimilarities between trees. For example, in Figure 1 (c) the two trees are very different, yet the Hausdorff distance between them is very small.

Another distance that we can consider between trees is the interleaving distance [14, 15]. The interleaving distance is defined between two merge trees $T_f$ and $T_g$ by means of two continuous maps $\alpha$ and $\beta$:

Definition 6. [2, 14, 15] The interleaving distance between two merge trees $T_f$ and $T_g$ is defined as follows:
$$d_I(T_f, T_g) = \inf\{\delta \mid \text{there is a pair of } \delta\text{-compatible maps between } T_f \text{ and } T_g\},$$
where two continuous maps $\alpha^\delta : |T_f| \to |T_g|$ and $\beta^\delta : |T_g| \to |T_f|$ are $\delta$-compatible if and only if the following conditions are satisfied:
(1) For all $u \in |T_f|$, $g(\alpha^\delta(u)) = f(u) + \delta$,
(2) For all $v \in |T_g|$, $f(\beta^\delta(v)) = g(v) + \delta$,
(3) For all $u \in |T_f|$, $\beta^\delta \circ \alpha^\delta(u) = u^{2\delta}$,
(4) For all $v \in |T_g|$, $\alpha^\delta \circ \beta^\delta(v) = v^{2\delta}$.

In [2], P. K. Agarwal et al. proved that it is NP-hard to compute the interleaving distance between two merge trees, and that it is NP-hard to approximate the Gromov-Hausdorff distance between trees within a factor better than 3. Later, in 2019, E. Farahbakhsh Touli and Y.
Wang [15] defined a $\delta$-good map from $T_f$ to $T_g$ as follows:

Definition 7. [15] A map $\alpha^\delta : |T_f| \to |T_g|$ is called a $\delta$-good map if and only if the following conditions are satisfied:
(C1) $\alpha^\delta$ is continuous,
(C2) For every point $u \in |T_f|$, $g(\alpha^\delta(u)) = f(u) + \delta$,
(C3) For every pair of points $v_1 = \alpha^\delta(u_1)$ and $v_2 = \alpha^\delta(u_2)$, if $v_1 \ge v_2$ then $u_1^{2\delta} \ge u_2^{2\delta}$,
(C4) If there is a point $v \in |T_g|$ which is not in the image of $\alpha^\delta$, then $g(v^F) - g(v) \le 2\delta$.

Using the definition of a $\delta$-good map, they proved the following theorem:

Theorem 1. [15] $d_I(T_f, T_g) \le \delta$ if and only if there is a $\delta$-good map $\alpha^\delta : |T_f| \to |T_g|$.

In this section I define the Frechet-Like distance between two rooted trees. Given two merge trees $T_1$ and $T_2$ rooted at $u$ and $v$ respectively, I consider two men A and B who start to walk from the points $u$ and $v$ respectively. The two men are connected by a rope. If a man reaches a node (with degree higher than 2), he copies himself $k - 1$ times, where $k$ is the outgoing degree of the node. Each man can monitor (that is, be connected by a rope to) just one man at a time, unless that man stops and the others move at most $\varepsilon$ away. Here the distance between two men is defined as the distance between the function values of the merge trees at the points where they stand. The Frechet-Like distance is defined as the minimum, over all such walks, of the largest distance between the points where two connected men stand. In the following I state the mathematical definition of the Frechet-Like distance:

Definition 8.
Frechet-Like Distance
For two given rooted trees $T_1$ and $T_2$, I define the Frechet-Like distance as follows:
$$d_{FL}(T_1, T_2) := \min_{R \in \mathcal{R}} \sup_{(x, y) \in R} d(x, y),$$
where $d(x, y)$ is the Euclidean distance between two points $x$ and $y$, and $\mathcal{R}$ is the set of correspondences $R \subseteq |T_1| \times |T_2|$ that satisfy the following conditions:
1) $\forall x \in |T_1|$, $\exists y \in |T_2|$ s.t. $(x, y) \in R$.
1-i) $\forall y \in |T_2|$, $\exists x \in |T_1|$ s.t. $(x, y) \in R$.
2) If $(x_1, y_1) \in R$ and $(x_2, y_2) \in R$ and $x_1 \ge x_2$ and $y_1 \ge y_2$, then
2-i) $\forall x$ s.t. $x_2 \le x \le x_1$, $\exists y$ s.t. $y_2 \le y \le y_1$ and $(x, y) \in R$, and
2-ii) $\forall y$ s.t. $y_2 \le y \le y_1$, $\exists x$ s.t. $x_2 \le x \le x_1$ and $(x, y) \in R$.
3) If $(x_1, y_1) \in R$ and $(x_2, y_2) \in R$, then $(x_1 \sim x_2, y_1 \sim y_2) \in R$.
4) If $x \in |T_1|$ is a leaf, there should be a leaf $y \in |T_2|$ such that $(x, y) \in R$, unless there is a $y'$ such that $(x, y') \in R$ and $(x^N, y') \in R$.
4-i) If $y \in |T_2|$ is a leaf, there should be a leaf $x \in |T_1|$ such that $(x, y) \in R$, unless there is an $x'$ such that $(x', y) \in R$ and $(x', y^N) \in R$.

(Footnote to Definitions 6 and 7: $u^{2\delta}$ is the ancestor of $u$ in $T_f$ such that $f(u^{2\delta}) - f(u) = 2\delta$.)
(Footnote to Definition 7: $v^F$ is the nearest ancestor of $v$ such that $v^F$ is in the image of $\alpha^\delta$.)

In this section I prove that computing the Frechet-Like distance between two rooted trees is SNP-hard, by a reduction from UNRESTRICTED-PARTITION. The proof is very similar to the hardness proof for the Gromov-Hausdorff distance between two merge trees [2].
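To make the source problem of the reduction concrete, the following Python sketch decides small instances of UNRESTRICTED-PARTITION (stated formally just below) by backtracking: it tries to place each integer into one of the parts without exceeding the common target sum. The name `can_partition` and the choice to pass the number of parts as the parameter `parts` are assumptions of this example. The running time is exponential in the worst case, and the sketch plays no role in the reduction itself.

```python
def can_partition(values, parts):
    """Decide whether the multiset `values` of positive integers can be
    split into `parts` multisets that all have the same sum.

    Hypothetical brute-force helper written for illustration only.
    """
    total = sum(values)
    if parts <= 0 or total % parts != 0:
        return False
    target = total // parts
    items = sorted(values, reverse=True)  # large items first prunes earlier
    if items and items[0] > target:
        return False
    sums = [0] * parts

    def place(i):
        if i == len(items):
            # No part exceeds target and the parts sum to parts * target,
            # so every part equals target.
            return True
        tried = set()
        for b in range(parts):
            # Parts with equal current sums are interchangeable; try one.
            if sums[b] in tried or sums[b] + items[i] > target:
                continue
            tried.add(sums[b])
            sums[b] += items[i]
            if place(i + 1):
                return True
            sums[b] -= items[i]
        return False

    return place(0)
```

For instance, the integers 1 through 9 can be split into three parts of sum 15, whereas the multiset {2, 2, 2, 3} cannot be split into three parts of sum 3.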
UNRESTRICTED-PARTITION.
Input: a multiset of positive integers $X = \{a_1, \ldots, a_n\}$ such that $n = 3k$.
Output: Is there a partition of $X$ into $k$ multisets $X_1, \ldots, X_k$ such that, writing $S_j$ for the sum of the elements of the multiset $X_j$, every $S_j = \left(\sum_{i=1}^{n} a_i\right)/k$? [11]

Theorem 2.
The problem UNRESTRICTED-PARTITION is SNP-complete.
Proof.
See [11].

Here, I construct two merge trees $T_f$ and $T_g$ as follows. In the following figure, A and B are two large numbers.

Figure 2: Two trees $T_f$ and $T_g$. A and B are two large numbers.

If I consider two merge trees $T_f$ and $T_g$ whose roots are located at one point, then $d(x, y) = |f(x) - g(y)|$, as in Definition 9. Now, I prove the hardness of approximation of the Frechet-Like distance by the following lemmas.

(Footnote: $x_1 \sim x_2$ is the nearest common ancestor of $x_1$ and $x_2$.)
(Footnote: $x^N$ is the nearest node which is an ancestor of $x$.)

Lemma 1. $d_{FL}(f, g) \le 1$ if UNRESTRICTED-PARTITION is a yes instance.
Proof. If UNRESTRICTED-PARTITION is a yes instance, I can construct a correspondence $R \subseteq |T_f| \times |T_g|$ such that $\sup_{(x, y) \in R} |f(x) - g(y)| \le 1$. Indeed, in a yes instance I can partition $X$ into $X_1, X_2, \ldots, X_k$ such that $S(X_i) = S(X)/k$, where $X_i = \{a_{i,1}, \ldots, a_{i,k_i}\}$. Therefore, I map the subtrees rooted at $\{u_{i_1}, \ldots, u_{i_{k_i}}\}$ to $v_i$, where $u_{i_j}$ corresponds to $a_{i,j}$ in the construction of the tree and $v_i$ corresponds to $X_i$. When I say that I map a point $x \in |T_f|$ to a point $y \in |T_g|$, I mean that $(x, y) \in R$. If $(u_{i_j}, v_i) \in R$ and $(u_{i_k}, v_i) \in R$, I have that $(u_r, v_i) \in R$ (for an illustration, see Figure 2). Therefore, I can construct a correspondence $R \subseteq |T_f| \times |T_g|$ such that $\sup_{(x, y) \in R} |f(x) - g(y)| \le 1$.

Lemma 2.
If UNRESTRICTED-PARTITION is a no instance, $d_{FL}(f, g) \ge 3$.
Proof. If UNRESTRICTED-PARTITION is a no instance, then since the edges of length A are too long, we have to find a correspondence $R$ such that for any pair of points $x_1, x_2 \in T_f$ with $x_1 \parallel x_2$, there are two different points $y_1, y_2 \in T_g$ such that $(x_1, y_1) \in R$ and $(x_2, y_2) \in R$. Therefore, the best correspondence that I can find under the conditions of Definition 9 is one in which $x_1 \sim x_2$ is mapped to two different points $y_1$ and $y_2$, as shown in Figure 4. This indicates that the Frechet-Like distance between $T_f$ and $T_g$ cannot be smaller than 3.

From the two lemmas above, we can conclude the following result.

Corollary 1.
Computing a $(3 - \epsilon)$-approximation of the Frechet-Like distance between two merge trees $T_f$ and $T_g$ is NP-hard.

In this section we define a Frechet-Like distance between two merge trees, which we call the Frechet-Like distance between merge trees. Given two merge trees $T_f$ and $T_g$ rooted at $u$ and $v$ respectively, the distance is defined as follows.

(Footnote: $x_1 \parallel x_2$ if neither $x_1 \le x_2$ nor $x_2 \le x_1$.)

Definition 9.
Frechet-Like Distance
For two given merge trees $T_f$ and $T_g$, we define the Frechet-Like distance as follows:
$$d_{MFL}(T_f, T_g) := \min_{R \in \mathcal{R}} \sup_{(x, y) \in R} |f(x) - g(y)|,$$
where the correspondence $R \subseteq |T_f| \times |T_g|$ is defined as in Definition 8.

By the following lemma, we prove the relation between the Frechet-Like distance and the interleaving distance between merge trees.
Lemma 3.
If there exists an $\varepsilon$ such that $d_{MFL}(T_f, T_g) \le \varepsilon$, then $d_I(T_f, T_g) \le \varepsilon$.
Proof. To prove this lemma we need to find an $\varepsilon$-good map $\alpha^\varepsilon : |T_f| \to |T_g|$, that is, a map satisfying the four conditions in the definition of an $\varepsilon$-good map. First, we construct the map $\alpha^\varepsilon$ as follows.

As the Frechet-Like distance between $T_f$ and $T_g$ is not greater than $\varepsilon$, by Definition 9 there is a correspondence $R$ that satisfies the four conditions of the definition. To construct the $\varepsilon$-good map, for any pair of points $(x, y) \in R$: if $g(y) = f(x) + \varepsilon$, we map the point $x$ to $y$, in other words $\alpha^\varepsilon(x) = y$. Otherwise, if $g(y) < f(x) + \varepsilon$, we map $x$ to the point $y'$ such that $y \le y'$ and $g(y') = f(x) + \varepsilon$, which means that $\alpha^\varepsilon(x) = y'$. The case $g(y) > f(x) + \varepsilon$ cannot occur, because $|f(x) - g(y)| \le \varepsilon$ for every $(x, y) \in R$.

Now, we need to prove that $\alpha^\varepsilon$ is an $\varepsilon$-good map. To do so, we need to prove that the four conditions of Definition 7 are satisfied for the map $\alpha^\varepsilon$.

C1.
We need to prove that the map $\alpha^\varepsilon$ is continuous. To do so, we use a method similar to the one in [15].

C2.
By the construction of the map $\alpha^\varepsilon$, for any pair of points $(x, y) \in R$ we map $x$ to a point whose function value is $\varepsilon$ higher than that of $x$. As for every point $x$ in $|T_f|$ there is at least one $y$ such that $(x, y) \in R$, we can conclude that for every point $x$ in $|T_f|$, $g(\alpha^\varepsilon(x)) = f(x) + \varepsilon$, which satisfies condition (C2) of Definition 7.

C3.
Consider two pairs of points $(x_1, y_1) \in R$ and $(x_2, y_2) \in R$ with $y_1 \le y_2$. Then we know that $g(y_1) \le g(y_2)$. Therefore, by the construction of the map $\alpha^\varepsilon$, we have that $f(x_1) \le f(x_2)$. Two cases can happen:
Case 1: $x_1 \le x_2$, in which case we have that $x_1^{2\varepsilon} \le x_2^{2\varepsilon}$.
Case 2: $x_1 \parallel x_2$. Suppose for contradiction that $x_1^{2\varepsilon} \not\le x_2^{2\varepsilon}$; then $x_1^{2\varepsilon} \parallel x_2^{2\varepsilon}$, as $f(x_1) \le f(x_2)$. By the definition of the Frechet-Like distance, the highest $y$ such that $(x_2, y) \in R$ is $y_2$. Therefore, by condition 3 of the Frechet-Like distance, the highest point $y$ in $T_g$ such that $(x_1 \sim x_2, y) \in R$ is $y_2$, and $f(x_1 \sim x_2) - g(y_2) > \varepsilon$. This contradicts the fact that the Frechet-Like distance between $T_f$ and $T_g$ is less than or equal to $\varepsilon$.

[Figure 5]

C4.
If there is a point $y \in |T_g|$ such that no $x \in |T_f|$ is mapped to $y$ under the map $\alpha^\varepsilon$, then, as we already proved in C1 that the map is continuous, the point must lie on a branch that connects a leaf (say $y_L$) to the tree, and none of the points $y' \le y$ is in the image of the map $\alpha^\varepsilon$. Now, I just need to prove that $g(y^F) - g(y) \le 2\varepsilon$. Suppose for contradiction that $g(y^F) - g(y) > 2\varepsilon$, and let $x$ be a point such that $(x, y^F) \in R$; by condition 4 of the definition of the Frechet-Like distance, $x$ is a leaf. Therefore, $(x, y_L) \in R$, which contradicts the fact that $d_{MFL}(T_f, T_g) \le \varepsilon$.

In this paper I extended the Frechet distance between two curves to the Frechet-Like distance between rooted trees. In Section 2, I discussed some distances that have been defined between two trees. In Section 3, I gave a new definition for measuring the similarity between two trees. I called the new distance the Frechet-Like distance because of the similarity of its definition to the Frechet distance between curves. The hardness of approximation was discussed in Section 4. There, I also proved that although there is a polynomial-time algorithm for computing the Frechet distance between polygonal curves [4], it is NP-hard to approximate the Frechet-Like distance between two trees. The relation between the Frechet-Like distance between two merge trees and the interleaving distance was discussed in Section 5.
References
[1] Pankaj K. Agarwal, Rinat Ben Avraham, Haim Kaplan, and Micha Sharir. Computing the discrete Fréchet distance in subquadratic time. SIAM J. Comput., 43:429–449, 2013.
[2] Pankaj K. Agarwal, Kyle Fox, Abhinandan Nath, Anastasios Sidiropoulos, and Yusu Wang. Computing the Gromov-Hausdorff distance for metric trees. In ISAAC, 2015.
[3] Hugo Alves Akitaya, Maike Buchin, Leonie Ryvkin, and Jérôme Urhausen. The k-Fréchet distance. 2019.
[4] Helmut Alt and Michael Godau. Computing the Fréchet distance between two polygonal curves. Int. J. Comput. Geometry Appl., 5:75–91, 1995.
[5] Philip Bille. A survey on tree edit distance and related problems. Theor. Comput. Sci., 337(1-3):217–239, June 2005.
[6] Kevin Buchin, Maike Buchin, and Carola Wenk. Computing the Fréchet distance between simple polygons in polynomial time. Volume 2006, pages 80–87, 2006.
[7] Maike Buchin, Anne Driemel, and Bettina Speckmann. Computing the Fréchet distance with shortcuts is NP-hard. In Symposium on Computational Geometry, 2013.
[8] Thomas Eiter and Heikki Mannila. Computing discrete Fréchet distance. Technical report, 1994.
[9] G.M.N. Ewing. Calculus of Variations with Applications. Dover Books on Mathematics. Dover Publications, 1985.
[10] M. Maurice Fréchet. Sur quelques points du calcul fonctionnel. Rendiconti del Circolo Matematico di Palermo (1884-1940), 22(1):1–72, Dec 1906.
[11] M. R. Garey and David S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. 1978.
[12] Michael Godau. On the complexity of measuring the similarity between geometric objects in higher dimensions. 1999.
[13] Tao Jiang, Lusheng Wang, and Kaizhong Zhang. Alignment of trees: an alternative to tree edit. 1995.
[14] Dmitriy Morozov, Kenes Beketayev, and Gunther H. Weber. Interleaving distance between merge trees. In Workshop on Topological Methods in Data Analysis and Visualization: Theory, Algorithms and Applications, 2013.
[15] Elena Farahbakhsh Touli and Yusu Wang. FPT-algorithms for computing Gromov-Hausdorff and interleaving distances between trees. In