Translation Invariant Fréchet Distance Queries
Joachim Gudmundsson, André van Renssen, Zeinab Saeidi, Sampson Wong
Abstract
The Fréchet distance is a popular similarity measure between curves. For some applications, it is desirable to match the curves under translation before computing the Fréchet distance between them. This variant is called the Translation Invariant Fréchet distance, and algorithms to compute it are well studied. The query version, however, is much less well understood.

We study Translation Invariant Fréchet distance queries in a restricted setting of horizontal query segments. More specifically, we preprocess a trajectory in $O(n \log n)$ time and space, such that for any subtrajectory and any horizontal query segment we can compute their Translation Invariant Fréchet distance in $O(\mathrm{polylog}\, n)$ time. We hope this will be a step towards answering Translation Invariant Fréchet queries between arbitrary trajectories.

The Fréchet distance is a popular measure of similarity between curves as it takes into account the location and ordering of the points along the curves, and it was introduced by Maurice Fréchet in 1906 [11]. Measuring the similarity between curves is an important problem in many areas of research, including computational geometry [2, 4, 10], computational biology [14, 25], data mining [15, 20, 23], image processing [1, 21] and geographical information science [16, 18, 19, 22].

The Fréchet distance is most commonly described as the dog-leash distance; consider a man standing at the starting point of one trajectory and his dog at the starting point of another trajectory. A leash is required to connect the dog and its owner. Both the man and his dog are free to vary their speed, but they are not allowed to go backward along their trajectory. The cost of a walk is the maximum leash length required to connect the dog and its owner from the beginning to the end of their trajectories. The Fréchet distance is the minimum length of the leash that is needed over all possible walks.
More formally, for two curves $A$ and $B$, each having complexity $n$, the Fréchet distance between $A$ and $B$ is defined as:

$$\delta_F(A, B) = \inf_{\mu} \max_{a \in A} \mathrm{dist}(a, \mu(a)),$$

where $\mathrm{dist}(a, b)$ denotes the Euclidean distance between points $a$ and $b$, and $\mu : A \to B$ is a continuous and non-decreasing function that maps every point $a \in A$ to a point $\mu(a) \in B$.

Since the early 90's the problem of computing the Fréchet distance between two polygonal curves has received considerable attention. In 1992 Alt and Godau [2] were the first to consider the problem and gave an $O(n^2 \log n)$ time algorithm. The only improvement since then is a randomized algorithm with running time $O(n^2 (\log\log n)^2)$ in the word RAM model by Buchin et al. [7]. In 2014 Bringmann [4] showed that, conditional on the Strong Exponential Time Hypothesis (SETH), there cannot exist an algorithm with running time $O(n^{2-\varepsilon})$ for any ε >
0. Even for realistic models of input curves, such as $c$-packed curves [10], exact distance computation requires $n^{2-o(1)}$ time under SETH [4]. Only by allowing a $(1+\varepsilon)$-approximation can one obtain near-linear running times in $n$ and $c$ on $c$-packed curves [5, 10].

For some applications, such as protein matching [14] and handwriting recognition [21], it is desirable to match the two curves under translation before computing the Fréchet distance between them. Formally, we match two polygonal curves $A$ and $B$ under the Fréchet distance by computing the translation $\tau$ so that the Fréchet distance is minimised. This variant is called the Translation Invariant Fréchet distance, and algorithms to compute it are well studied [3, 6, 14, 24]. Algorithms for the Translation Invariant Fréchet distance generally carry higher running times than for the standard Fréchet distance. Moreover, these running times depend on the dimension of the input curves and on whether the input curves are discrete or continuous.

For a discrete sequence of points in two dimensions, Bringmann et al. [6] recently provided an $O(n^{4.66\ldots})$ time algorithm to compute the Translation Invariant Fréchet distance, and showed that the problem has a conditional lower bound of $\Omega(n^4)$ under SETH. For continuous polygonal curves in two dimensions, Alt et al. [3] provided an $O(n^8 \log n)$ time algorithm, and Wenk [24] extended this to an $O(n^{11} \log n)$ time algorithm in three dimensions. If we allow for a $(1+\varepsilon)$-approximation then there is an $O(n^2/\varepsilon^2)$ time algorithm [3], which matches the conditional lower bound for approximating the standard Fréchet distance [4].

For both the standard Fréchet distance and the Translation Invariant Fréchet distance, subquadratic and subquartic time algorithms respectively are unlikely to exist under SETH [4, 6].
However, if at least one of the trajectories can be preprocessed, then the Fréchet distance can be computed much more efficiently.

Querying the standard Fréchet distance between a given trajectory and a query trajectory has been studied [8, 9, 10, 12, 13], but due to the difficult nature of the query problem, data structures only exist for answering a restricted class of queries. There are two results which are most relevant. The first is De Berg et al.'s [9] data structure, which answers Fréchet distance queries between a horizontal query segment and a vertex-to-vertex subtrajectory of a preprocessed trajectory. Their data structure can be constructed in $O(n \log n)$ time using $O(n \log n)$ space such that queries can be answered in $O(\log n)$ time. The second is Driemel and Har-Peled's [10] data structure, which answers approximate Fréchet distance queries between a query trajectory of complexity $k$ and a vertex-to-vertex subtrajectory of a preprocessed trajectory. The data structure can be constructed in $O(n \log n)$ time using $O(n \log n)$ space, and a constant factor approximation to the Fréchet distance can be answered in $O(k \log n \log(k \log n))$ time. In the special case when $k = 1$, the approximation ratio can be improved to $(1+\varepsilon)$ with no increase in preprocessing or query time with respect to $n$. New ideas are required for exact Fréchet distance queries on arbitrary query trajectories. Other query versions for the standard Fréchet distance have also been considered [8, 12, 13].

Querying the Translation Invariant Fréchet distance is less well understood. This is not surprising given the complexity of computing the Translation Invariant Fréchet distance. Nevertheless, in our paper we are able to answer exact Translation Invariant Fréchet queries in a restricted setting of horizontal query segments.
We hope this will be a step towards answering exact Translation Invariant Fréchet queries between arbitrary trajectories.

In this paper, we answer exact Translation Invariant Fréchet distance queries between a subtrajectory (not necessarily vertex-to-vertex) of a preprocessed trajectory and a horizontal query segment. The data structure can be constructed in $O(n \log n)$ time using $O(n \log n)$ space such that queries can be answered in $O(\mathrm{polylog}\, n)$ time. We apply Megiddo's parametric search technique [17] to De Berg et al.'s [9] data structure to optimise the Fréchet distance. We hope that as standard Fréchet distance queries become better understood, similar optimisation methods could lead to improved data structures for the Translation Invariant Fréchet distance as well.

Let $p_1, \ldots, p_n$ be a sequence of $n$ points in the plane. We denote by $\pi = (p_1, p_2, \ldots, p_n)$ the polygonal trajectory defined by this sequence. Let $x_1 \le x_2$ and $y \in \mathbb{R}$, and define $p = (x_1, y)$ and $q = (x_2, y)$ so that $Q = pq$ is a horizontal segment in the plane. Let $u$ and $v$ be two points on the trajectory $\pi$. Then, from [9], the Fréchet distance between $\pi[u, v]$ and $Q$ can be computed by using the formula:

$$\delta_F(\pi[u, v], pq) = \max\{\|up\|, \|vq\|, \delta_{\vec{h}}(\pi[u, v], pq), B(\pi[u, v], y)\}.$$

The first two terms are simply the distance between the starting points of the two trajectories, and the distance between the ending points of the two trajectories. The third term is the directed Hausdorff distance between $\pi[u, v]$ and $Q$, which can be computed from:

$$\delta_{\vec{h}}(\pi[u, v], Q) = \max\Big\{ \max_{p_i.x \in (-\infty, x_1]} \|p - p_i\|,\ \max_{p_i.x \in [x_2, \infty)} \|q - p_i\|,\ \max_i \|y - p_i.y\| \Big\},$$

where the $p_i$ in the formula above are the vertices of the subtrajectory $\pi[u, v]$, and $p_i.x$ are their $x$-coordinates. The formula handles three cases for mapping every point of $\pi[u, v]$ to its closest point on $Q$.
The first term describes mapping points of $\pi[u, v]$ to the left of $p$ to their closest point $p$. The second term describes mapping points of $\pi[u, v]$ to the right of $q$ analogously. The third term describes mapping points of $\pi[u, v]$ that are in the vertical strip between $p$ and $q$ to their orthogonal projection onto $Q$. In later sections we refer to these three terms as $\delta_{\vec{h}}(L)$, $\delta_{\vec{h}}(R)$ and $\delta_{\vec{h}}(M)$ for the left, right, and middle terms of the Hausdorff distance respectively.

The fourth term in our formula for the Fréchet distance is the maximum backward pair distance over all backward pairs. A pair of vertices $(p_i, p_j)$ (with $j > i$) is a backward pair if $p_j$ lies to the left of $p_i$. The backward pair distance of $\pi[u, v]$ can be computed from:

$$B(\pi[u, v], y) = \max_{p_i, p_j \in \pi[u, v]:\ i \le j,\ p_i.x \ge p_j.x} B_{(p_i, p_j)}(y),$$

where $B_{(p_i, p_j)}(y)$ is the backward pair distance for a given backward pair $(p_i, p_j)$ and is defined as

$$B_{(p_i, p_j)}(y) = \min_{x \in \mathbb{R}} \max\{\|p_i - (x, y)\|, \|p_j - (x, y)\|\}.$$

The distance terms in the braces compute the distance between a given point $(x, y)$ and the farthest of $p_i$ and $p_j$. Let us call this the backward pair distance of $(x, y)$. Then the function $B_{(p_i, p_j)}(y)$ denotes the minimum backward pair distance of a given backward pair $(p_i, p_j)$ over all points $(x, y)$ which have the same $y$-coordinate. Taking the maximum over all backward pairs gives us the backward pair distance for $\pi[u, v]$.

In Figure 1, we show for each $y$-coordinate the point with the minimum backward pair distance (left), and the magnitude of this minimum distance (right). We see in the figure that the function $B_{(p_i, p_j)}(y)$ consists of two linear functions joined together in the middle with a hyperbolic function.
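As a minimal sketch of the four-term formula above (not the data structure of [9]), the following Python code evaluates each term by brute force for a vertex-to-vertex subtrajectory, that is, assuming $u$ and $v$ are the first and last vertices of the given point list. The function names are hypothetical, and the quadratic loop over backward pairs is chosen for clarity rather than efficiency.

```python
import math

def backward_pair_distance(pi, pj, y):
    """B_(pi,pj)(y): min over x of the distance from (x, y) to the farther of pi, pj."""
    (xi, yi), (xj, yj) = pi, pj
    a, b = yi - y, yj - y  # vertical offsets of pi and pj from the line at height y
    if xi == xj:
        return max(abs(a), abs(b))
    # x-coordinate on the line where both distances are equal
    xc = (xj * xj - xi * xi + b * b - a * a) / (2.0 * (xj - xi))
    # The max of two convex functions is minimised at one of their own
    # minima (x = xi or x = xj) or at their crossing point xc.
    return min(max(math.hypot(x - xi, a), math.hypot(x - xj, b))
               for x in (xi, xj, xc))

def frechet_to_horizontal_segment(pts, x1, x2, y):
    """Brute-force evaluation of max{|up|, |vq|, directed Hausdorff, backward pairs}.

    Assumes u = pts[0] and v = pts[-1] (vertex-to-vertex case only).
    """
    p, q = (x1, y), (x2, y)
    dist = lambda s, t: math.hypot(s[0] - t[0], s[1] - t[1])
    left = max((dist(p, r) for r in pts if r[0] <= x1), default=0.0)
    right = max((dist(q, r) for r in pts if r[0] >= x2), default=0.0)
    middle = max(abs(y - r[1]) for r in pts)
    back = max(backward_pair_distance(pts[i], pts[j], y)
               for i in range(len(pts)) for j in range(i, len(pts))
               if pts[i][0] >= pts[j][0])
    return max(dist(pts[0], p), dist(pts[-1], q), left, right, middle, back)
```

For a trajectory that doubles back along the segment, such as `[(0, 0), (2, 0), (1, 0), (3, 0)]` against the segment from `(0, 0)` to `(3, 0)`, only the backward pair term is non-zero, which illustrates why the fourth term is needed.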
Figure 1: For each $y$-coordinate, left: the point with minimum backward pair distance; right: the minimum backward pair distance.

We extend the work of De Berg et al. [9] in two ways. First, we provide a method for answering Fréchet distance queries between $\pi[u, v]$ and $Q$ when $u$ and $v$ are not necessarily vertices of $\pi$, and second, we optimise the placement of $Q$ to minimise its Fréchet distance to $\pi[u, v]$. We achieve both of these extensions by carefully applying Megiddo's parametric search technique [17] to compute the optimal Fréchet distance.

In order to apply parametric search, we are required to construct a set of critical values (which we will describe in detail at a later stage) so that an optimal solution is guaranteed to be contained within this set. Since this set of critical values is often large, we need to avoid computing the set explicitly, and instead design a decision algorithm that efficiently searches the set implicitly. Megiddo's parametric search [17] states that if:

• the set of critical values has polynomial size, and
• the Fréchet distance is convex with respect to the set of critical values, and
• a comparison-based decision algorithm decides if a given critical value is equal to, to the left of, or to the right of the optimum,

then there is an efficient algorithm to compute the optimal Fréchet distance in $O(P T_p + T_p T_s \log P)$ time, where $P$ is the number of processors of the (parallel) algorithm, $T_p$ is the parallel running time and $T_s$ is the serial running time of the decision algorithm. For our purposes, $P = 1$ since we run our queries serially, and $T_p = T_s = O(\mathrm{polylog}\, n)$ for the decision versions of our query algorithms.

The first problem we apply parametric search to is the following.
Given any horizontal query segment $Q$ in the plane and any two points $u, v$ on $\pi$ (not necessarily vertices of $\pi$), determine the Fréchet distance between $Q$ and the subtrajectory $\pi[u, v]$.

Let $p_u$ be the first vertex of $\pi$ along $\pi[u, v]$ and let $p_v$ be the last vertex of $\pi$ along $\pi[u, v]$, as illustrated in Figure 2. If $p_u$ and $p_v$ do not exist then $\pi[u, v]$ is a single segment, so the Fréchet distance between $\pi[u, v]$ and $Q$ can be computed in constant time. Otherwise, our goal is to build a Fréchet mapping $\mu : \pi[u, v] \to Q$ which attains the optimal Fréchet distance. We build this mapping $\mu$ in several steps. Our first step is to compute points $p'$ and $q'$ on the horizontal segment $pq$ so that $p' = \mu(p_u)$ and $q' = \mu(p_v)$.

Figure 2: The points $p'$ and $q'$ mapped to the vertices $p_u$ and $p_v$ of the trajectory.

If the point $p'$ is computed correctly, then the mapping $p' \to p_u$ allows us to subdivide the Fréchet computation into two parts without affecting the overall value of the Fréchet distance. In other words, we obtain the following formula:

$$\delta_F(\pi[u, v], pq) = \max\{\delta_F(u p_u, p p'),\ \delta_F(\pi[p_u, v], p' q)\} \qquad (1)$$

We now apply the same argument to $p_v$. We compute $q'$ optimally on the horizontal segment $p' q$ so that mapping $p_v \to q'$ does not increase the Fréchet distance between the subtrajectory $\pi[p_u, v]$ and the truncated segment $p' q$. In other words, we have:

$$\delta_F(\pi[u, v], pq) = \max\{\delta_F(u p_u, p p'),\ \delta_F(\pi[p_u, p_v], p' q'),\ \delta_F(p_v v, q' q)\} \qquad (2)$$

Now that $p_u$ and $p_v$ are vertices of $\pi$, [9] provides an efficient data structure for computing the middle term $\delta_F(\pi[p_u, p_v], p' q')$. The first and last terms have constant complexity and can be handled in constant time.
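Once $p'$ and $q'$ are known, Equation (2) can be evaluated directly. The sketch below is illustrative only: it assumes the middle term is supplied by the caller (in the paper it comes from the data structure of [9]; here `middle_term` is a hypothetical placeholder value), and it relies on the standard fact that the Fréchet distance between two line segments equals the larger of the distance between their start points and the distance between their end points. All function names are assumptions made for this example.

```python
import math

def dist(a, b):
    """Euclidean distance between two points given as (x, y) tuples."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def segment_frechet(a0, a1, b0, b1):
    """Fréchet distance between the segments a0a1 and b0b1."""
    return max(dist(a0, b0), dist(a1, b1))

def split_frechet(u, p_u, p_v, v, p, p_prime, q_prime, q, middle_term):
    """Evaluate Equation (2) for given p' and q'.

    middle_term stands in for delta_F(pi[p_u, p_v], p'q'), which this
    sketch does not compute itself.
    """
    first = segment_frechet(u, p_u, p, p_prime)   # delta_F(u p_u, p p')
    last = segment_frechet(p_v, v, q_prime, q)    # delta_F(p_v v, q' q)
    return max(first, middle_term, last)
```

The two outer terms are the constant-time parts of the formula; all of the remaining difficulty lies in choosing `p_prime` and `q_prime`.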
All that remains is to compute the points $p'$ and $q'$ efficiently.

Theorem 1.
Given a trajectory $\pi$ with $n$ vertices in the plane, there is a data structure that uses $O(n \log n)$ space and preprocessing time, such that for any two points $u$ and $v$ on $\pi$ (not necessarily vertices of $\pi$) and any horizontal query segment $Q$ in the plane, one can determine the exact Fréchet distance between $Q$ and the subtrajectory from $u$ to $v$ in $O(\log n)$ time.

Proof. Decision Algorithm.
Let $S$ be the set of critical values (defined later in this proof), let $s$ be the current candidate for the point $p'$, and let $F(s) = \max(\delta_F(ps, u p_u), \delta_F(sq, \pi[p_u, v]))$ be the minimum Fréchet distance between $pq$ and $\pi[u, v]$ subject to $p_u$ being mapped to $s$. Our aim is to design a decision algorithm that runs in $O(\log n)$ time and decides whether the optimal $p'$ is equal to $s$, to the left of $s$ or to the right of $s$. This is equivalent to proving that all points to one side of $s$ cannot be the optimal $p'$ and may be discarded.

We use the Fréchet distance formula from Section 2 to rewrite $F(s) = \max(\|up\|, \|vq\|, \|p_u s\|, \delta_{\vec{h}}(\pi[p_u, v], sq), B(\pi[p_u, v], y))$. Then we take several cases for which of these five terms attains the maximum value $F(s)$, and in each case we either deduce that $p' = s$ or that all critical values to one side of $s$ may be discarded.

• If $F(s) = \max(\|up\|, \|vq\|, B(\pi[p_u, v], y))$, then $p' = s$. We observe that none of the three terms on the right hand side of the equation depend on the position of $s$. Hence, $F(s) = \max(\|up\|, \|vq\|, B(\pi[p_u, v], y)) \le F(p')$, and since $F(p')$ is the minimum possible value, $F(s) = F(p')$. We have found a valid candidate for $p'$ and can discard all other candidates in the set $S$.
• If $F(s) = \|p_u s\|$ and $p_u$ is to the right (left) of $s$, then $p'$ is to the right (left) of $s$. We will argue this for when $p_u$ is to the right of $s$, but an analogous argument can be used when $p_u$ is to the left. We observe that all points $t$ to the left of $s$ will now have $\|p_u t\| > \|p_u s\|$.
Hence, $F(s) = \|p_u s\| < \|p_u t\| \le F(t)$ for all points $t$ to the left of $s$, therefore all points to the left of $s$ may be discarded.
• If $F(s) = \delta_{\vec{h}}(\pi[p_u, v], sq)$, then $p'$ is to the left of $s$. The directed Hausdorff distance maps every point in $\pi[p_u, v]$ to its closest point on $sq$, so by shortening $sq$ to $tq$ for some point $t$ on $sq$ to the right of $s$, the directed Hausdorff distance cannot decrease. Hence, $F(s) \le F(t)$ for all $t$ to the right of $s$, so all points to the right of $s$ may be discarded.

To determine $q'$ for a fixed candidate $s$ for $p'$, we treat the problem in a similar way. We consider the subtrajectory $\pi[p_u, v]$ and the horizontal line segment $sq$. Defining a function $G(t)$ representing the Fréchet distance when $p_v$ is mapped to $t$, we obtain a similar decision algorithm. The most notable difference is that since we now consider the end of the subtrajectory, the decisions for moving $t$ left and right are reversed.

Convexity.
We will prove that $F(s)$ is convex, and it will follow similarly that $G(t)$ is convex. It suffices to show that $F(s)$ is the maximum of convex functions, since the maximum of convex functions is itself convex. The three terms $\|up\|$, $\|vq\|$, $B(\pi[p_u, v], y)$ are constant. The term $\|p_u s\|$ is an upward hyperbola and is convex. It remains to show that $\delta_{\vec{h}}(\pi[p_u, v], sq)$ is convex.

We observe that the Hausdorff distance $\delta_{\vec{h}}(\pi[p_u, v], sq)$ must be attained at a vertex $p_i$ of $\pi[p_u, v]$, and that each $\delta_{\vec{h}}(p_i, sq)$ as a function of $s$ is a constant function between $p$ and $p_i^*$, and a hyperbolic function between $p_i^*$ and $q$. Thus, the function for each $p_i$ is convex, so the overall Hausdorff distance function is also convex.

Critical Values.
A critical value is a value $c$ which could feasibly attain the minimum value $F(c) = F(p')$. We represent $F(s)$ as the maximum of $n$ simple functions and then argue that the minimum of $F$ can only occur at the minimum of one of these functions, or at the intersection of a pair of these functions.

First, $\|up\|$, $\|vq\|$, $B(\pi[p_u, v], y)$ are constant functions in terms of $s$. Next, $\|p_u s\|$ is a hyperbolic function. Finally, $\delta_{\vec{h}}(\pi[p_u, v], sq)$ is not itself simple, but it can be rewritten as the combination of $n$ simple functions as described in the above section.

Hence, $F(s)$ is the combination (maximum) of $n$ simple functions, and these functions are simple in that they are piecewise constant or hyperbolic. Hence $F(s)$ attains its minimum either at the minimum of one of these $n$ functions, or at a point where two of these functions intersect. Therefore, there are at most $O(n^2)$ critical values for $F(s)$.

Query Complexity.
Computing $q'$ for a given candidate $s$ for $p'$ takes $O(\log n)$ time: We can compute the terms $\|up\|$, $\|p_u s\|$, $\|vq\|$, and $\|p_v q'\|$ in constant time. The terms $B(\pi[p_u, p_v], y)$ and $\delta_{\vec{h}}(\pi[p_u, p_v], sq')$ can be computed in $O(\log n)$ time using the existing data structure by De Berg et al. [9]. We need to determine the time complexity of the sequential algorithm $T_s$, the parallel algorithm $T_p$, and the number of processors $P$. To find $q'$, the decision algorithm takes $T_s = O(\log n)$. The parallel form runs on one processor in $T_p = O(\log n)$. Substituting these values in the running time of the parametric search of $O(P T_p + T_p T_s \log P)$ leads to $O(\log n)$ time.

The above analysis implies that $p'$ itself can be computed in $O(\log n)$ time: For a given $s$, the decision algorithm runs in $T_s = O(\log n)$ as mentioned above. The parallel form of the decision algorithm runs on one processor in $T_p = O(\log n)$. Substituting these values in the running time of the parametric search of $O(P T_p + T_p T_s \log P)$ leads to $O(\log n)$ time.

Preprocessing and Space.
To compute the second term of Formula 2, we use the data structure by De Berg et al. [9]. This data structure uses $O(n \log n)$ space and preprocessing time and supports $O(\log n)$ query time.

We note that the set of critical values can be restricted significantly, while still being guaranteed to contain optimal elements to use as $p'$ and $q'$. Specifically, we can reduce the size of this set from $O(n^2)$ to $O(n)$. Since this does not improve the running time of the above algorithm, details on this improvement are deferred to Appendix A.

We move on to the problem of minimising the Fréchet distance under translations. We first focus on a special case where the horizontal segment can only be translated vertically. In Section 5 we consider arbitrary translations of the horizontal segment.

To this end, let us consider the following problem. Let $\pi$ be a trajectory in the plane with $n$ vertices. We preprocess $\pi$ into a data structure such that for a query specified by

1. two points $u$ and $v$ on the trajectory $\pi$,
2. two vertical lines $x_1$ and $x_2$ such that $\|x_1 - x_2\| = L$,

one can quickly find a horizontal segment $l_y$ that spans the vertical strip between $x_1$ and $x_2$ such that the Fréchet distance between $l_y$ and the subtrajectory $\pi[u, v]$ is minimised; see Figure 3.

Figure 3: Finding a horizontal segment $l_y$ in the vertical strip between $x_1$ and $x_2$ that minimises the Fréchet distance between $l_y$ and $\pi[u, v]$.

In the next theorem, we present a decision problem $D_{\pi[u,v]}(x_1, x_2, l_y^c)$ that, for a given trajectory $\pi$ with two points $u$ and $v$ on $\pi$ and two vertical lines $x = x_1$ and $x = x_2$, returns whether the line $l_y$ is above, below, or equal to the current candidate line $l_y^c$. We then use parametric search to find the $l_y$ that minimises the Fréchet distance.

Theorem 2.
Given a trajectory $\pi$ with $n$ vertices in the plane, there is a data structure that uses $O(n \log n)$ space and preprocessing time, such that for any two points $u$ and $v$ on $\pi$ (not necessarily vertices of $\pi$) and two vertical lines $x = x_1$ and $x = x_2$, one can determine the horizontal segment $l_y$ with left endpoint on $x = x_1$ and right endpoint on $x = x_2$ that minimises its Fréchet distance to the subtrajectory $\pi[u, v]$ in $O(\log n)$ time.

Proof. Decision Algorithm.
Let $l_y^c$ be the current horizontal segment. To decide whether the line segment that minimises the Fréchet distance lies above or below $l_y^c$, we must compute the maximum of the terms that determine the Fréchet distance: $\|up\|$, $\|vq\|$, $\delta_{\vec{h}}(\pi[u, v], pq)$, and $B(\pi[u, v], l_y^c)$. As mentioned in Section 2, we divide the directed Hausdorff distance into three different terms: $\delta_{\vec{h}}(L)$, $\delta_{\vec{h}}(R)$, and $\delta_{\vec{h}}(M)$. We first consider the situation where a single term determines the Fréchet distance, in which we have the following cases:

• $\|up\|$, $\|vq\|$, $\delta_{\vec{h}}(L)$, and $\delta_{\vec{h}}(R)$: Since the argument for these terms is analogous, we focus on $\|up\|$. If $u$ is located above $l_y^c$, the next candidate lies above $l_y^c$ (search continues above $l_y^c$). If $u$ lies below $l_y^c$, the next candidate lies below $l_y^c$ (search continues below $l_y^c$). If $u$ and $p$ have the same $y$-coordinate, we can stop, since moving $l_y^c$ either up or down increases the Fréchet distance.
• $B(\pi[u, v], l_y^c)$: If the midpoint of the perpendicular bisector of the backward pair determining the current Fréchet distance is located above $l_y^c$, the next candidate lies above $l_y^c$, since this is the only way to decrease the distance to the further of the two points of the backward pair. If this midpoint lies below $l_y^c$, the next candidate lies below $l_y^c$. If the midpoint is located on $l_y^c$, we can stop, because the term $B_{(p_i, p_j)}(l_y^c)$ increases by either moving $l_y^c$ up or down.
• $\delta_{\vec{h}}(M)$: If the point with maximum projected distance is located above $l_y^c$, the next candidate lies above $l_y^c$. If the point is below $l_y^c$, the next candidate lies below $l_y^c$.
If the point is on $l_y^c$, then we stop, but unlike in the first case, this maximum term and the overall Fréchet distance must both be zero in this case.

If more than one term determines the current Fréchet distance, we must first determine the direction of the implied movement for each term. If this direction is the same, we move in that direction. If the directions are opposite, we can stop, because moving in either direction would increase the other maximum term, resulting in a larger Fréchet distance.

Convexity.
It suffices to show that the Fréchet distance between $\pi[u, v]$ and $l_y^c$ as a function of $y$ is convex. We show that this function is the maximum of several convex functions, and therefore must be convex. The first two terms for computing the Fréchet distance are $\|up\|$ and $\|vq\|$, which are hyperbolic in terms of $y$. Similarly to the previous section, we handle each of the Hausdorff distances by splitting them up into Hausdorff distances for each vertex $p_i$. The left and right Hausdorff distances $\delta_{\vec{h}}(L)$ and $\delta_{\vec{h}}(R)$ for a single vertex $p_i$ are hyperbolic functions. The middle Hausdorff distance $\delta_{\vec{h}}(M)$ for a single vertex $p_i$ is a shifted absolute value function. In all cases, the Hausdorff distance for a single vertex is convex, so the overall Hausdorff distance is also convex. Finally, the backward pair distance $B(\pi[u, v], l_y^c)$ as a function of $y$ is shown by De Berg et al. [9] to be two rays joined together in the middle with a hyperbolic arc. It is easy to verify that this function is convex.

Critical Values. A horizontal segment $l_y^c$ is a critical value of a decision algorithm if the decision algorithm could feasibly return that $l_y^c = l_y$. These critical values are the $y$-coordinates of the intersection points of two hyperbolic functions for each combination of two terms determining the Fréchet distance, or the minimum point of the upper envelope of two such hyperbolic functions. Therefore, there are only a constant number of critical values for each pair of terms. Each term gives rise to $O(n)$ hyperbolic functions (specifically, $B(\pi[u, v], l_y^c)$ can be of size $\Theta(n)$ in the worst case). Thus, there are $O(n)$ critical values.

Query Complexity.
The decision algorithm runs in $T_s = T_p = O(\log n)$ time since we use Theorem 1 to compute the Fréchet distance for a fixed $l_y^c$. Substituting this in the running time of the parametric search $O(P T_p + T_p T_s \log P)$ leads to a query time of $O(\log n)$.

Preprocessing and Space.
Since we compute the Fréchet distance of the current candidate $l_y^c$ using Theorem 1, we require $O(n \log n)$ space and preprocessing time.

Finally, we consider minimising the Fréchet distance of a horizontal segment under arbitrary placement. Let $\pi$ be a trajectory in the plane with $n$ vertices. We preprocess $\pi$ into a data structure such that for a query specified by two points $u$ and $v$ on $\pi$ and a positive real value $L$, one can quickly determine the horizontal segment $l$ of length $L$ such that the Fréchet distance between $l$ and the subtrajectory $\pi[u, v]$ is minimised.

In the following theorem, we present a decision problem $D_{\pi[u,v]}(L, x)$ that, for a given trajectory $\pi$ with two points $u$ and $v$ on $\pi$, a length $L$ and an $x$-coordinate $x$, returns whether the line $l$ has its left endpoint to the left of, on, or to the right of $x$. We then apply parametric search to this decision algorithm to find the horizontal segment $l$ of length $L$ with minimum Fréchet distance to $\pi[u, v]$.

Theorem 3.
Given a trajectory $\pi$ with $n$ vertices in the plane, there is a data structure that uses $O(n \log n)$ space and preprocessing time, such that for any two points $u$ and $v$ on $\pi$ (not necessarily vertices of $\pi$) and a length $L$, one can determine the horizontal segment $l$ of length $L$ that minimises the Fréchet distance to $\pi[u, v]$ in $O(\log n)$ time.

Proof. Decision Algorithm.
We only need to decide whether $l^c$ should be moved to the left or right, with respect to its current position, for the cases where $D_{\pi[u,v]}(x_1, x_2, l_y)$ stops. We classify the terms that determine the Fréchet distance in two classes:

• $C_1$: This class contains the terms whose value is determined by the distance from a point on $\pi[u, v]$ to $p$ or $q$. Hence, it consists of $\|up\|$, $\|vq\|$, $\delta_{\vec{h}}(R)$, and $\delta_{\vec{h}}(L)$.
• $C_2$: This class contains the terms whose value is determined by the distance from a point on $\pi[u, v]$ to the closest point on $pq$. Hence, it consists of $\delta_{\vec{h}}(M)$ and $B(\pi[u, v], l_y)$.

Next, we show how to decide whether the next candidate line segment lies to the left or right of $l^c$ (i.e., the $x$-coordinate of its left endpoint lies to the left or right of the left endpoint of $l^c$) for each case where $D_{\pi[u,v]}(x_1, x_2, l_y)$ stops.

We decide this by considering each $C_1$ and $C_2$ term and the restriction it places on the next candidate line segment $pq$. After we do this for each individual $C_1$ or $C_2$ term, we take the intersection of all these restrictions. If the intersection is empty, then our placement of $pq$ was optimal, and our decision algorithm stops. Otherwise we can either move $pq$ to the left or to the right to improve the Fréchet distance.

First, consider the $C_1$ terms. Let us assume for now that the $C_1$ term is the distance term $\|up\|$. Then in order to improve the Fréchet distance to $u$, we need to place the horizontal segment $pq$ in such a way that $p$ lies inside the open disk centered at $u$ with radius equal to the current Fréchet distance $d$. A similar condition holds for the other $C_1$ terms: each defines a disk of radius $d$ and the point it maps to in the next candidate needs to lie inside this disk.

Similarly, the $C_2$ terms define horizontal open half-planes.
Consider the term $\delta_{\vec{h}}(M)$. This term is reduced when the vertical projection distance to the line segment is reduced. Hence, if the point defining this term lies above $l^c$, this term can be reduced by moving the line segment upward and thus the half-plane is the half-plane above $l^c$. An analogous statement holds if the point lies below $l^c$. For the term $B(\pi[u, v], l_y)$, we need to consider the midpoint of the bisector, since the implied Fréchet distance is the distance from $l^c$ to the further of the two points defining the bisector. Thus, the half-plane that improves the Fréchet distance is the one that lies on the same side of $l^c$ as this midpoint.

To combine all the terms we do the following: First, we take all disks induced by the $C_1$ terms whose distance is with respect to $q$ and translate them horizontally to the left by a distance of $L$. This ensures that the disks constructed with respect to $p$ can now be intersected with the disks constructed with respect to $q$. We take the intersection of all $C_1$ and $C_2$ terms that defined the stopping condition of the vertical optimisation step. If this intersection is empty, by construction there is no point where we can move $p$ to in order to reduce the Fréchet distance. If it is not empty, we will show that it lies entirely to the left or entirely to the right of $p$ and thus implies the direction in which the next candidate lies.

Now that we have described our general approach, we show which cases can occur and show that for each of them we can determine in which direction to continue (if any).

Figure 4: Determining where $l^c$ should be moved to reduce the Fréchet distance. (a) The midpoint of $pq$ is the midpoint of $uv$. (b) Moving the midpoint of $pq$ towards the midpoint of $uv$.

Case 1. $D_{\pi[u,v]}(x_1, x_2, l_y)$ stops because of terms in $C_1$.
If only a single term of $C_1$ is involved, say $\|up\|$, this implies that the $y$-coordinate of $u$ is the same as that of $l_c$ and thus its disk lies entirely to the left of $p$. Hence, we can reduce the Fréchet distance by moving $l_c$ horizontally towards $u$ and thus we pick our next candidate in that direction. The same argument applies analogously when the $C_1$ term is $\|vq\|$, $\delta_{\vec{h}}(R)$, or $\delta_{\vec{h}}(L)$, since in each case the distance is between a point on the trajectory and $p$ or $q$.

If two terms of $C_1$ are involved, say $\|up\|$ and $\|vq\|$, the intersection of their disks can be empty (see Figure 4(a)) or non-empty (see Figure 4(b)). If it is empty, the midpoint of $pq$ is the same as the midpoint of $uv$, which implies that we cannot reduce the Fréchet distance. If the intersection is not empty, moving the endpoint of the line segment into this region potentially reduces the Fréchet distance. We note that since $\|up\|$ and $\|vq\|$ stopped the vertical optimisation, the points defining them lie on opposite sides of $l_c$. Hence, the intersection of their disks lies entirely to the left or entirely to the right of $p$ and thus determines in which direction the next candidate lies.

If three terms in $C_1$ are involved, we again construct the intersection as described earlier. If this intersection is empty (see Figure 5b), we are again done. If it is not (see Figure 5a), it again determines the direction in which our next candidate lies, as the intersection of three disks is a subset of the intersection of two disks.

Figure 5: The case where we have three $C_1$ terms. (a) The case where we can improve the Fréchet distance. (b) The case where we cannot improve the Fréchet distance.

If there are more than three $C_1$ terms, we reduce this to the case of three $C_1$ terms. If the intersection of these disks is non-empty, then trivially the intersection of a subset of three of them is also non-empty. If the intersection is empty, we select a subset of three whose intersection is also empty. The three disks can be chosen as follows. Insert the disks in some order and stop when the intersection first becomes empty. The set of three disks consists of the last inserted disk and the two extreme disks among the previously inserted disks. Since the boundaries of all the disks pass through a single point and the disks have equal radius, these three disks will have an empty intersection. Hence, the case of more than three disks reduces to the case of three disks.

Case 2. $D_{\pi[u,v]}(x, x, l_y)$ stops because of a term in $C_2$. Since the vertical optimisation stopped, we know that at least two $C_2$ terms are involved and there exists a pair that lies on opposite sides of $l_c$. These two terms define open half-planes whose intersection is empty, hence we cannot reduce the Fréchet distance further.

Case 3. $D_{\pi[u,v]}(x, x, l_y)$ stops because of a term in $C_1$ and a term in $C_2$. We can assume there are at most two $C_1$ terms and at most one $C_2$ term, due to the previous cases.

The $C_2$ term can be either $\delta_{\vec{h}}(M)$ (see Figure 6a, where $h$ is the point at distance $d$) or $B(\pi[u,v], l_y)$ (see Figure 6b, where $(p_i, p_j)$ is the backward pair with distance $d$). The region $R$ shows the intersection of the disk of a single $C_1$ term and the $C_2$ term. We note that since the point of the $C_1$ term and the point of the $C_2$ term lie on opposite sides of $l_c$, this intersection lies either entirely to the left or entirely to the right of $p$ or $q$, determining the direction in which our next candidate must lie.

Figure 6: Reducing the Fréchet distance when it is determined by a term of $C_1$ and a term of $C_2$. (a) The case where the $C_2$ term is $\delta_{\vec{h}}(M)$. (b) The case where the $C_2$ term is $\max_{u \le i \le j \le v,\ p_i.x \ge p_j.x} B_{(p_i,p_j)}(l_c)$.

The same procedure can be applied when there are two $C_1$ terms. Using similar arguments, it can be shown that if the intersection is not empty, the direction to improve the Fréchet distance is uniquely determined.

Convexity.
Next, we show that $D_{\pi[u,v]}(L, x)$ is a convex function with respect to the parameter $x$. Let $l^c_y$ be the current horizontal segment and assume without loss of generality that the decision algorithm moves right to a new segment $l_{y'}$; see Figure 7a. Consider a linear interpolation from $l^c_y$ to $l_{y'}$. Let $l_{y''}$ be the segment at the midpoint of this linear interpolation. Since $D_{\pi[u,v]}(L, x)$ is continuous, and for continuous functions convexity is equivalent to midpoint convexity, we only need to show that $D_{\pi[u,v]}(L, x)$ is midpoint convex.

Consider the two mappings that minimise the Fréchet distance between $\pi[u,v]$ and the horizontal segments $l^c_y$ and $l_{y'}$. Let $r$ be any point on $\pi[u,v]$ and let $a$ and $c$ be the points where $r$ is mapped to on $l^c_y$ and $l_{y'}$. Construct a point $b$ on $l_{y''}$ where $r$ will be mapped to by linearly interpolating $a$ and $c$. Performing this transformation for every point on $\pi[u,v]$, we obtain a valid mapping for $l_{y''}$, though not necessarily one of minimum Fréchet distance.

We bound the distance between $r$ and $b$ in terms of $\|ra\|$ and $\|rc\|$. Consider the parallelogram consisting of $a$, $r$, $c$, and a point $r'$ that is at distance $\|ra\|$ from $c$ and distance $\|rc\|$ from $a$; see Figure 7b. Since $b$ is the midpoint of $ac$, it is also the midpoint of $rr'$ in this parallelogram. By the triangle inequality, $\|rr'\| \le \|ra\| + \|ar'\| = \|ra\| + \|rc\|$, so we can conclude that $\|rb\| \le (\|ra\| + \|rc\|)/2$. Since this holds for every point $r$ on $\pi[u,v]$ and the Fréchet distance is the minimum over all possible mappings, the Fréchet distance of $l_{y''}$ is upper bounded by the average of the Fréchet distances of $l^c_y$ and $l_{y'}$. Therefore, the decision problem is convex.

Critical Values.
An $x$-coordinate $x$ is a critical value of a decision algorithm if the decision algorithm could feasibly return that the left endpoint of $l$ has $x$-coordinate $x$.

For the $C_1$ class, these critical values are determined by up to three $C_1$ terms: the vertices themselves, the midpoint of any pair of vertices, and the center of the circle through the three (translated) points determining the Fréchet distance. Since each term in $C_1$ consists of at most $n$ points, there are $O(n)$ critical values in Case 1.

Figure 7: The decision algorithm is a convex function with respect to the left endpoint of the line segment. (a) Point $r$ is mapped to the three line segments. (b) Upper bounding $\|rb\|$.

For the $C_2$ class, these critical values are the $x$-coordinates of the intersection points and minima of two hyperbolic functions, one for each element of each pair of two terms. Therefore, there are only a constant number of critical values for each pair of terms. Each term gives rise to at most $O(n)$ hyperbolic functions (specifically, $B(\pi[u,v], l_y)$ can be of size $\Theta(n)$ in the worst case). Thus, there are at most $O(n)$ critical values in Case 2.

Using similar arguments, it can be shown that there are at most $O(n)$ critical values in Case 3, as they consist of at most two $C_1$ terms and at most one $C_2$ term.

Query Complexity.
The decision algorithm runs in $T_s = O(\log n)$ time since we use Theorem 2 to compute the optimal placement for a fixed left endpoint. The parallel form of the decision algorithm runs on one processor in $T_p = O(\log n)$ time. Substituting these values in the running time of the parametric search of $O(P T_p + T_p T_s \log P)$ leads to $O(\log n)$ time.

Preprocessing and Space.
Since we use the algorithm of Theorem 2 to compute the optimal placement of $l_c$ for a given $x$-coordinate of its left endpoint, this requires $O(n \log n)$ space and preprocessing time.

In this paper, we answer Translation Invariant Fréchet distance queries between a horizontal query segment and a subtrajectory of a preprocessed trajectory. The most closely related result is that of De Berg et al. [9], which computes the normal Fréchet distance between a subtrajectory and a horizontal query segment. We extend this work in two ways. Firstly, we consider all subtrajectories, not just vertex-to-vertex subtrajectories. Secondly, we compute the optimal translation for minimising the Fréchet distance, thus our approach allows us to compute both the normal Fréchet distance and the Translation Invariant Fréchet distance. All our queries can be answered in polylogarithmic time.

In terms of future work, one avenue would be to improve the query times. While our approach has polylogarithmic query time, the $O(\log n)$ time needed for querying the optimal placement under translation is far from practical. Furthermore, our results use an $O(n \log n)$ size data structure and reducing this would make the approach more appealing.

Other future work takes the form of generalising our queries further. In our most general form, we still work with a fixed length line segment with a fixed orientation. An interesting open problem is to see if we can also determine the optimal length of the line segment efficiently at query time. Allowing the line segment to have an arbitrary orientation seems a difficult problem to generalise our approach to, since the data structures we use assume that the line segment is horizontal. This can be extended to accommodate a constant number of orientations instead, but to extend this to truly arbitrary orientations, given at query time, will require significant modifications and novel ideas.

References

[1] Helmut Alt. The computational geometry of comparing shapes. In Efficient Algorithms, pages 235–248. Springer, 2009.
[2] Helmut Alt and Michael Godau. Computing the Fréchet distance between two polygonal curves. International Journal of Computational Geometry & Applications, 5(2):75–91, 1995.
[3] Helmut Alt, Christian Knauer, and Carola Wenk. Matching polygonal curves with respect to the Fréchet distance. In STACS 2001, 18th Annual Symposium on Theoretical Aspects of Computer Science, Dresden, Germany, February 15-17, 2001, Proceedings, pages 63–74, 2001.
[4] Karl Bringmann. Why walking the dog takes time: Fréchet distance has no strongly subquadratic algorithms unless SETH fails. In Proceedings of the 55th IEEE Annual Symposium on Foundations of Computer Science, pages 661–670, 2014.
[5] Karl Bringmann and Marvin Künnemann. Improved approximation for Fréchet distance on c-packed curves matching conditional lower bounds. International Journal of Computational Geometry & Applications, 27(1-2):85–120, 2017.
[6] Karl Bringmann, Marvin Künnemann, and André Nusser. Fréchet distance under translation: Conditional hardness and an algorithm via offline dynamic grid reachability. In Proceedings of the 30th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 2902–2921, 2019.
[7] Kevin Buchin, Maike Buchin, Wouter Meulemans, and Wolfgang Mulzer. Four Soviets walk the dog: Improved bounds for computing the Fréchet distance. Discrete & Computational Geometry, 58(1):180–216, 2017.
[8] Mark de Berg, Atlas F. Cook, and Joachim Gudmundsson. Fast Fréchet queries. Computational Geometry, 46(6):747–755, 2013.
[9] Mark de Berg, Ali D Mehrabi, and Tim Ophelders. Data structures for Fréchet queries in trajectory data. In Proceedings of the 29th Canadian Conference on Computational Geometry, 2017.
[10] Anne Driemel and Sariel Har-Peled. Jaywalking your dog: computing the Fréchet distance with shortcuts. SIAM Journal on Computing, 42(5):1830–1866, 2013.
[11] Maurice Fréchet. Sur quelques points du calcul fonctionnel. Rendiconti del Circolo Matematico di Palermo (1884-1940), 22(1):1–72, 1906.
[12] Joachim Gudmundsson, Majid Mirzanezhad, Ali Mohades, and Carola Wenk. Fast Fréchet distance between curves with long edges. In Proceedings of the 3rd International Workshop on Interactive and Spatial Computing, pages 52–58, 2018.
[13] Joachim Gudmundsson and Michiel Smid. Fast algorithms for approximate Fréchet matching queries in geometric trees. Computational Geometry, 48(6):479–494, 2015.
[14] Minghui Jiang, Ying Xu, and Binhai Zhu. Protein structure-structure alignment with discrete Fréchet distance. J. Bioinformatics and Computational Biology, 6(1):51–64, 2008.
[15] Eamonn J Keogh and Michael J Pazzani. Scaling up dynamic time warping to massive datasets. In European Conference on Principles of Data Mining and Knowledge Discovery, pages 1–11, 1999.
[16] Patrick Laube. Computational Movement Analysis. Springer Briefs in Computer Science. Springer, 2014.
[17] Nimrod Megiddo. Applying parallel computation algorithms in the design of serial algorithms. In , pages 399–408. IEEE, 1981.
[18] Wouter Meulemans. Similarity measures and algorithms for cartographic schematization. PhD thesis, Technische Universiteit Eindhoven, 2014.
[19] Peter Ranacher and Katerina Tzavella. How to compare movement? A review of physical movement similarity measures in geographic information science and beyond. Cartography and Geographic Information Science, 41(3):286–307, 2014.
[20] Chotirat Ann Ratanamahatana and Eamonn Keogh. Three myths about dynamic time warping data mining. In Proceedings of the 2005 SIAM International Conference on Data Mining, pages 506–510, 2005.
[21] E Sriraghavendra, K Karthik, and Chiranjib Bhattacharyya. Fréchet distance based approach for searching online handwritten documents. In Proceedings of the 9th International Conference on Document Analysis and Recognition, volume 1, pages 461–465, 2007.
[22] Kevin Toohey and Matt Duckham. Trajectory similarity measures. SIGSPATIAL Special, 7(1):43–50, 2015.
[23] Haozhou Wang, Han Su, Kai Zheng, Shazia Sadiq, and Xiaofang Zhou. An effectiveness study on trajectory similarity measures. In Proceedings of the 24th Australasian Database Conference, pages 13–22, 2013.
[24] Carola Wenk. Shape matching in higher dimensions. PhD thesis, Free University of Berlin, Dahlem, Germany, 2003.
[25] Tim Wylie and Binhai Zhu. Protein chain pair simplification under the discrete Fréchet distance. IEEE/ACM Trans. Comput. Biology Bioinform., 10(6):1372–1383, 2013.

Appendix A Improving the Number of Critical Values
In order to show that it suffices to use a set of critical values of size $O(n)$ instead of $O(n^2)$ to compute $p'$ and $q'$, we look more formally at what property a candidate needs to satisfy.

Definition 1. A point $s$ represents $p_u$ if and only if there exists a non-decreasing continuous mapping $\mu : \pi[u,v] \to pq$ such that $\mu$ achieves the Fréchet distance and $\mu(p_u) = s$.

Now we define a collection of points on $pq$ that could feasibly be representatives.

Definition 2. Given any vertex $p_i$ on the subtrajectory $\pi[u,v]$, let $p^*_i$ be the orthogonal projection of vertex $p_i$ onto the horizontal segment $pq$.

Definition 3. Given any two vertices $p_i$ and $p_j$ on the subtrajectory $\pi[u,v]$, let $P_{ij}$ be the perpendicular bisector of $p_i$ and $p_j$. Let $P^*_{ij}$ be the intersection of the perpendicular bisector $P_{ij}$ with the horizontal segment $pq$.

We now have all we need in place to define our set $S$ of candidates for $p'$ and $q'$.

Definition 4. Let $S$ be the set containing the following elements:
1. the points $p$ and $q$,
2. all orthogonal projection points $p^*_i$, and
3. all perpendicular bisector intersection points $P^*_{ij}$.

It now suffices to show that $S$ contains at least one representative for $p_u$. An analogous argument shows that $S$ contains a representative of $p_v$ as well.

Lemma 1.
There exists an element $s \in S$ on $pq$ that represents $p_u$.

Proof. Assume for the sake of contradiction that there is no element $s \in S$ which represents $p_u$. Consider a mapping $\mu$ that achieves the Fréchet distance and consider the point $\mu(p_u)$ on the horizontal segment $pq$. Since $\mu(p_u)$ represents $p_u$, $\mu(p_u)$ cannot be in $S$ and must lie strictly between two consecutive elements of $S$, say $s_L$ to its left and $s_R$ to its right (see Figure 8). Note that it may be the case that $s_L = p$ or $s_R = q$. Since $s_L$ and $s_R$ are elements of $S$, neither can represent $p_u$. Next, we reason about the implications of $s_L$ and $s_R$ not being able to represent $p_u$, before putting these together to obtain a contradiction.

$s_L$ cannot represent $p_u$. This means that no mapping which sends $p_u \to s_L$ achieves the Fréchet distance. Let us take the mapping $\mu$ and modify it into a new mapping $\mu_L$ in such a way that $\mu_L(p_u) = s_L$. We can do so by starting out parametrising $\mu_L$ with a constant speed mapping which sends $u \to p$ and $p_u \to s_L$. Next, we stay fixed at $p_u$ along the subtrajectory and move along the horizontal segment from $s_L$ to $\mu(p_u)$. The red shaded region in Figure 8 describes this portion of the remapping. Now that $\mu_L(p_u) = \mu(p_u)$, we can use the original mapping for the rest.

Since our new mapping $\mu_L$ maps $p_u$ to an element of $S$ that cannot represent it, we know that our modification must increase the Fréchet distance. The only place where the Fréchet distance could have increased is at the line segments where the mapping was changed and here $\mu_L(p_u) = s_L$ maximises the Fréchet distance. Hence, we have $\|p_u s_L\| > d$, where $d$ is the Fréchet distance, as shown in Figure 8. But $\|p_u \mu(p_u)\| \le d$, so we can deduce that $p_u$ is closer to $\mu(p_u)$ than to $s_L$. Therefore, $p_u$ is to the right of $s_L$. Finally, if $s_L$ and $s_R$ were on opposite sides of $p^*_u$, then $s_L$ and $s_R$ would not be consecutive, therefore $p_u$ must be on the same side of $s_L$ and $s_R$. Therefore, $p_u$ is to the right of the entire segment $s_L s_R$.

Figure 8: The point $\mu(p_u)$ lies between two consecutive elements $s_L$ and $s_R$. Distances that are greater than $d$ are thin solid and distances that are at most $d$ are dotted, where $d$ is the Fréchet distance.

$s_R$ cannot represent $p_u$. Again, no mapping which sends $p_u \to s_R$ achieves the Fréchet distance, so we use the same approach and modify $\mu$ into a new mapping $\mu_R$ in such a way that $\mu_R(p_u) = s_R$. To this end, we keep the mapping $\mu_R$ the same as $\mu$ until it reaches $p_u$, and then while staying at $p_u$, we fast-forward the movement from $\mu(p_u)$ along the horizontal segment so that $\mu_R(p_u) = s_R$. Next, we stay at $s_R$ and fast-forward the movement along the subtrajectory, until we reach the first point $T$ on the subtrajectory such that $\mu(T) = s_R$ in the original mapping. From point $T$ onwards we can use the original mapping $\mu$.

Since our new mapping $\mu_R$ maps $p_u$ to an element of $S$ that does not represent it, we cannot have achieved the Fréchet distance. The first change we applied was staying at $p_u$ and fast-forwarding the movement from $\mu(p_u)$ to $s_R$. However, since we know from above that $p_u$ is to the right of the entire segment $s_L s_R$, this fast-forwarding moves closer to $p_u$, so this part cannot increase the Fréchet distance. The second change we applied, staying at $s_R$ and fast-forwarding the movement from $p_u$ to $T$, must therefore be the change that increases the Fréchet distance. Thus, there must be a point on the subtrajectory $\pi[p_u, T]$ which has distance greater than $d$, the Fréchet distance, to the point $s_R$. Since the distance to a point $s_R$ is maximal at vertices of $\pi[p_u, T]$, we can assume without loss of generality that $\|p_i s_R\| > d$ for some vertex $p_i$.
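The reduction to vertices in the last step rests on a standard fact: the distance from a fixed point to a point moving along a line segment is a convex function of the position, so its maximum over each edge is attained at an endpoint. A minimal numerical sketch of this fact follows. The function names and the sample polyline are illustrative assumptions and do not come from the paper.

```python
import math

def max_distance_at_vertices(polyline, s):
    """Largest distance from the fixed point s to any vertex of the polyline."""
    return max(math.dist(v, s) for v in polyline)

def max_distance_sampled(polyline, s, samples_per_edge=100):
    """Largest distance from s to densely sampled points along every edge."""
    best = 0.0
    for (ax, ay), (bx, by) in zip(polyline, polyline[1:]):
        for k in range(samples_per_edge + 1):
            t = k / samples_per_edge
            point = (ax + t * (bx - ax), ay + t * (by - ay))
            best = max(best, math.dist(point, s))
    return best

# Hypothetical subtrajectory and fixed point on the query segment.
path = [(0.0, 0.0), (2.0, 3.0), (5.0, 1.0), (6.0, 4.0)]
s = (3.0, 0.0)
# No sampled interior point is farther from s than the farthest vertex.
print(max_distance_sampled(path, s) <= max_distance_at_vertices(path, s) + 1e-9)
```

The sampling only serves as a sanity check on the sample data. The proof itself needs just the convexity of the distance function along each edge.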
Consider $\mu(p_i)$ in the original mapping. Since $p_i$ is on the subtrajectory $\pi[p_u, T]$, $\mu(p_i)$ must be between $\mu(p_u)$ and $\mu(T) = s_R$. This mapping of $p_i$ to $\mu(p_i)$ is shown as a black dotted line in Figure 8. Using similar logic as before, $\|p_i \mu(p_i)\| \le d$ and $\|p_i s_R\| > d$, so $p_i$ must lie to the left of $s_R$. And since $s_L$ and $s_R$ are consecutive elements of $S$, we deduce that $p_i$ is to the left of the entire segment $s_L s_R$.

Putting these together.
We now have the full diagram as shown in Figure 8. The vertex $p_u$ is to the right of both $s_L$ and $s_R$ and the vertex $p_i$ is to the left of both $s_L$ and $s_R$. We have also inferred that $\|p_u s_L\| > d$ and $\|p_i s_R\| > d$. Moreover, since $\|p_u \mu(p_u)\| \le d$ and $\|p_i \mu(p_i)\| \le d$, we also have that $\|p_u s_R\| \le d$ and $\|p_i s_L\| \le d$, since this just moves these endpoints closer to $p_u$ and $p_i$ respectively.

Finally, we will show that $P^*_{ui}$ lies between $s_L$ and $s_R$, reaching the intended contradiction. We do so by considering the function $f(x) = \|x p_u\| - \|x p_i\|$ for all points $x$ on the segment $s_L s_R$. From our length conditions, we have that $f(s_L) > 0$ and $f(s_R) < 0$. Furthermore, since $f(x)$ is a continuous function, by the intermediate value theorem, there is a point $x$ strictly between $s_L$ and $s_R$ such that $f(x) = 0$. Since $f(x) = 0$, the point $x$ is equidistant from $p_u$ and $p_i$ and therefore lies on both $P_{ui}$ and the horizontal segment $pq$. Therefore $x = P^*_{ui}$ is an element of $S$ strictly between the two consecutive elements $s_L$ and $s_R$, giving us a contradiction.

Note that in the above proof, we require only $P^*_{ui}$ to be in the candidate set when we are computing $p'$, and also only when $(p_u, p_i)$ is a backward pair. This means that for computing $p'$ and $q'$ respectively, we only require the bisector intersections $P^*_{ui}$ and $P^*_{jv}$ to be in $S$, hence reducing the size of $S$ from $O(n^2)$ to O ( nn