A Tail Estimate with Exponential Decay for the Randomized Incremental Construction of Search Structures
Joachim Gudmundsson∗  Martin P. Seybold†

Abstract
We revisit the randomized incremental construction of the Trapezoidal Search DAG (TSD) for a set of n non-crossing segments, e.g. edges from planar subdivisions. It is well known that this point location structure has O(n) expected size and O(n ln n) expected construction time. Our main result is an improved tail bound, with exponential decay, for the size of the TSD: there is a constant such that the probability for a TSD to exceed its expected size by more than this factor is at most 1/e^n. This yields improved bounds on TSD construction and maintenance, i.e. TSD construction takes with high probability O(n ln n) time, and the TSD size can be made worst case O(n) with an expected rebuild cost of O(1). The proposed analysis technique also shows that the expected depth is O(ln n), which partially solves a recent conjecture by Hemmer et al. that is used in the CGAL implementation of the TSD.

Keywords
Randomized Incremental Construction, Data Structures, Tail Bound
The Randomized Incremental Construction (RIC) is one of the most successful and influential paradigms in Computational Geometry. Its simplicity makes the method particularly useful for many, seemingly different problems that ask to compute a defined structure for a given set of objects. The idea is to first permute all n objects, uniformly at random, before inserting them, one at a time, into an initially empty structure under this order. Mulmuley's book [18] gives an excellent introduction to the paradigm.

∗ [email protected]  † [email protected]

A simple one-dimensional geometric problem that can be solved by RIC is to compute the intervals induced by a given set of points on a line (e.g. the x-axis). In this case, the structure for the empty set of points is the interval (−∞, +∞) and, at every point insertion, the interval that contains the point is split into two open intervals (left and right of the point). There are two well known methods to identify the interval that needs to be split for the next point, called maintaining conflict lists and keeping a searchable history of all structures created in the process. Using conflict lists, all points are placed in the initial interval (e.g. they hold a pointer to their interval) and, every time an interval is split, its points are partitioned into the left and right interval (cf. the partitions in quicksort). Using the history structure, one starts with the initial interval and, every time an interval is split, the split point is stored therein together with two pointers to the respective left and right result intervals (cf. binary search trees).

Though the RIC seems unguided, the resulting search structures have surprisingly good expected performance measures on any input, i.e. the expectation is over the random permutations of the objects. The randomized binary search trees, for example, have a worst case size of O(n) and every leaf has expected depth O(ln n).
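The history variant of this one-dimensional construction is small enough to sketch in full. The following Python sketch (our own illustrative code, not from the paper) stores, in every destroyed interval, the split point and two pointers to its successor intervals, so that locating a point is a root-to-leaf descent as in a binary search tree:

```python
import random

class IntervalNode:
    """A node of the history DAG for the 1-D problem: either a leaf
    (a current interval) or a destroyed interval storing its split point."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi       # the open interval (lo, hi)
        self.split = None               # point that destroyed this interval
        self.left = self.right = None   # history children after the split

def locate(node, q):
    """Follow the history from the initial interval down to the current
    interval containing q (cf. binary search trees)."""
    while node.split is not None:
        node = node.left if q < node.split else node.right
    return node

def build_history(points):
    """History-based RIC: insert the points in a uniformly random order;
    every destroyed interval keeps pointers to its two successors."""
    root = IntervalNode(float("-inf"), float("inf"))
    order = list(points)
    random.shuffle(order)               # the random permutation of the objects
    for p in order:
        leaf = locate(root, p)          # search the history for p's interval
        leaf.split = p
        leaf.left = IntervalNode(leaf.lo, p)
        leaf.right = IntervalNode(p, leaf.hi)
    return root
```

Since the final subdivision is independent of the insertion order, `locate` returns the same interval for every permutation; only the length of the descent varies with the permutation.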
A beautiful, simple argument for this is due to Backward Analysis [20]. One fixes an arbitrary search point q (within one leaf) and counts how often q changes its interval during the construction, or equivalently during the deletion of all objects in reverse order. This leads to an expected depth of O(ln n) for q. Moreover, Chernoff's method shows that deviations of more than a constant factor from the expected value are very unlikely, i.e. no more likely than inverse proportional to a polynomial in n whose degree can be made arbitrarily large by increasing the constant. That is, the search path to q has O(ln n) length with high probability (w.h.p.). Since there are only n+1 different search paths, the longest of them is w.h.p. within a constant factor of the expected value. As a result, the tree height is w.h.p. within a constant factor of the optimum.

Tail bounds have immediate algorithmic applications, e.g. when the expected performance measure of a search structure needs to be made a worst case property (within a constant factor). Treaps [22, 24], for example, are a fully-dynamic version of randomized binary search trees with expected logarithmic update time whose shape is, after each update, dictated by a random permutation that is uniform over those of the current set of objects. Since it is easy to maintain the height of the root in Treaps, one can simply rebuild a degraded tree entirely (with a fresh permutation) until the data structure is again within a constant factor of optimum. Given a high probability tail bound for a performance measure of interest, this simple rebuild strategy (to attain worst case guarantees) for dynamic search structures only adds an expected rebuild cost to the update time that is at most a constant.
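A minimal sketch of this rebuild strategy in Python, on a plain binary search tree built by leaf insertions (the tracked measure, the constant c, and all names are illustrative choices of ours, not taken from [22]):

```python
import math
import random

class Node:
    __slots__ = ("key", "left", "right")
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def bst_insert(root, key):
    """Plain BST leaf insertion; returns (root, depth of the new leaf)."""
    if root is None:
        return Node(key), 0
    node, depth = root, 0
    while True:
        depth += 1
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root, depth
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
                return root, depth
            node = node.right

class RebuildingBST:
    """'Rebuild if degraded': whenever the tracked height exceeds
    c * lg(n + 1), rebuild the whole tree from a fresh random permutation.
    Under a high probability tail bound, a rebuild succeeds after an
    expected O(1) number of attempts."""
    def __init__(self, c=4.0, seed=1):
        self.c, self.rng = c, random.Random(seed)
        self.keys, self.root, self.height = [], None, 0

    def add(self, key):
        self.keys.append(key)
        self.root, d = bst_insert(self.root, key)
        self.height = max(self.height, d)
        while self.height > self.c * math.log2(len(self.keys) + 1):
            self._rebuild()

    def _rebuild(self):
        self.rng.shuffle(self.keys)          # fresh uniform permutation
        self.root, self.height = None, 0
        for k in self.keys:
            self.root, d = bst_insert(self.root, k)
            self.height = max(self.height, d)
```

Inserting keys in sorted order repeatedly degrades the tree, yet the invariant height ≤ c·lg(n+1) is restored by the occasional rebuild.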
This demonstrates that tail bounds with polynomial or exponential decay, rather than constant failure probability, are of general interest for maintaining dynamic data structures.

Well known examples of search structures derived from history based RIC include those over Voronoi diagrams and convex polytopes (e.g. Chapters 3.2 and 3.3 in [18]). Point location in Voronoi diagrams, for example, is central to several variants of Nearest Neighbor queries [2]. Moreover, RICs often admit natural extensions to external memory [1] and parallel algorithms [5].

We study the well-known problem of computing a planar subdivision, called trapezoidation, that is induced by a set S of n line segments [9, 18]. Such subdivisions are extendable to more complex, spatial objects, e.g. x-monotone curves [21] or curves of bounded algebraic degree [16]. Every trapezoidation contains O(n + k) faces, where k denotes the number of intersection points of the segments. Trapezoidations are fundamental, e.g., in theoretical [7] and practical [3, 4] algorithms for intersection reporting, which is needed for spatial joins in the overlay construction from map layers.

(We denote the natural, the base two, and the base ten logarithms by ln(·), lg(·), and log(·), respectively.)

Mulmuley [15] gave a conflict list based RIC for the problem, maintaining one endpoint per segment, and Seidel [19] gave the history based RIC that builds the TSD online, both taking O(n ln n + k) expected time. The TSD is the history of trapezoidations that are created during the RIC and allows point location queries that return the trapezoid of the current subdivision that contains the query point. TSDs have a worst-case size of Θ(n²), but their expected size is O(n + k). Moreover, each search path has length O(ln n) w.h.p., and the longest search path (the search depth) is also w.h.p. O(ln n), since there are only O(n²) different search paths (e.g. [9, Chapter 6.4]).
In contrast to Quicksort and Randomized Binary Search Trees, high probability bounds for the TSD construction time are only known under additional assumptions (cf. Section 1.1).

As with Treaps, TSDs allow fully-dynamic updates such that, after each update, the underlying random permutation is uniform over those of the current set of segments. Early algorithms generalize common search tree rotations to abstract, complex structures in order to reuse the point location search and leaf insertion algorithms [17]. Simpler search and recursive top-down update algorithms were described recently [6]. The bounds on the expected insertion and deletion time of both methods, however, require that the update request entails a random object.

Unlike with Treaps, whose tail bound implies that rebuild decisions are highly unlikely, it is non-trivial to determine the length of a longest search path in the TSD. The work of Hemmer et al. [12] shows how to turn the TSD's expected query time into a worst-case bound. They give two (Las Vegas verifier) algorithms to estimate the search depth. Their exact algorithm runs in O(n ln n) expected time and their O(1)-approximation runs in O(n lg n) time. Their CGAL implementation [25], however, refrains from these verifiers and simply uses the TSD depth, which is readily available, to trigger rebuilds. Clearly the TSD depth is an upper bound, since the (combinatorial) paths are a superset of the search paths. They give a family of instances that, under a certain permutation, yield a ratio of Ω(n/ln n) between depth and search depth. In experiments with several instances and orders, however, the two worst case measures only differ by a small constant factor. They conjecture that the TSD depth is O(lg n) with at least constant probability (see the conjecture in [10]).
To the best of our knowledge, not even the expected value of this quantity is known.

The theory developed for RICs led to a tail bound technique [14, 8] that applies as soon as the actual geometric problem under consideration provides a certain boundedness property. To our knowledge, the strongest tail bound to date is from Clarkson et al. [8, Corollary 26], which states the following. Given a function M such that M(j) upper bounds the size of the structure on j objects: if M(j)/j is non-decreasing, then, for all λ > 0, the probability that the history size exceeds λM(n) is at most (e/λ)^{λ/e}. This includes the TSD size for non-crossing segments (k = 0), but also, e.g., the RICs of convex hulls and Delaunay triangulations. Assuming intersecting segments, Matoušek and Seidel [13] show how to use an isoperimetric inequality for permutations to derive a tail bound of O(n^{−c}), given there are at least k ≥ Cn log n intersections in the input (both constants c and C depend on the deviation threshold λ). Mehlhorn et al. [14] show that the general approach can yield a tail bound of at most 1/e^{Ω(k/(n ln n))}, given there are at least k ≥ n ln n ln^{(3)} n intersections in the input segments.

Recently, Sen [23] gave tail estimates for 'conflict graph' based RICs (cf. Chapter 3.4 in [18]) using Freedman's inequality for martingales. That work also shows a lower bound on tail estimates for the runtime, i.e. the total number of 'conflict graph' modifications, for computing the trapezoidation of non-crossing segments, which rules out high probability tail bounds [23, Section 6]. In this variation of the RIC, not only one endpoint per segment is maintained in conflict lists, but edges of a bipartite conflict graph over existing trapezoids and uninserted segments, which contains an edge whenever the two geometric objects intersect (see Appendix and Figure 4 in [23]). Hence this lower bound construction does not translate to the TSD.
We introduce a new and direct technique to analyze the size of the TSD that is based on pairwise events and an inductive application of Chernoff's method. Our main result is a much sharper tail estimate for the TSD size of non-crossing segments (see Table 1).

Technique          | Bound | With Prob. ≥            | Condition
Isoperimetric [13] | O(k)  | 1 − O(1/n^c)            | k ≥ Cn log n
Hoeffding [14]     | O(k)  | 1 − 1/e^{Ω(k/(n ln n))} | k ≥ n ln n ln^{(3)} n
Freedman [23]      | O(k)  | 1 − 1/e^{k/(n α(n))}    | k ≥ n ln n
Hoeffding [8]      | O(n)  | 1 − (e/λ)^{λ/e}         | –
Pairwise Events    | O(n)  | 1 − 1/e^n               | –

Table 1: Tail bounds for the history size of TSDs on n segments. k denotes the number of intersection points and α(n) the inverse of Ackermann's function.

This complements the known high probability bound for the point location cost and shows that the TSD has, with very high probability, size O(j) after every insertion step j. Hence, building the TSD for non-crossing segments takes w.h.p. O(n ln n) time, which strengthens the known expected time bound. The proposed technique also shows that the TSD has O(ln n) expected depth, which partially solves a recent conjecture by Hemmer et al. [10, 12] that is assumed in practice. We believe our technique can be generalized to other RIC based search structures.

Let S be a set of n segments in the plane. We identify the permutations over S with the set of bijective mappings to {1, . . . , n}, i.e. P(S) = {π : S → {1, . . . , n} | π bijective}. The integer π(s) is called the priority of the segment s.

An implicit, infinitesimal shear transformation allows us to assume, without loss of generality, that all distinct endpoints have different x-coordinates (e.g. Chapter 6.3 in [9]). The trapezoidation T(S) is defined by emitting two vertical rays (in negative and positive y-direction) from each end or intersection point until the ray meets the first segment or the bounding rectangle (see Figure 1).
To simplify presentation, we also implicitly move common endpoints infinitesimally along their segment, towards its interior. This gives that non-crossing segments have no points in common, though there may exist some spatially empty trapezoids in T(S). We identify T(S) with the set of faces in this decomposition of the plane. Elements in T(S) are trapezoidal regions whose boundary is defined by at most four segments of S (see Figure 1). Note that the boundaries of the trapezoids in T(S) are solely determined by the set of segments S, irrespective of the permutation.

We will need the following notation. Let γ > 0 be the smallest constant such that |T(S)| ≤ γn holds for any sufficiently large S (see, e.g., Lemma 6.2 in [9], which shows |T(S)| ≤ 3|S| + 1 for non-crossing segments). For a segment s ∈ S, let f(s, S) = {∆ ∈ T(S) : ∆ is bounded by s} denote the set of faces that are bounded by s (i.e. top, bottom, left, or right). Let s_i = π^{−1}(i) be the priority i segment and let S_{≤k} = {s_1, . . . , s_k}.

The expected size of the TSD is typically analyzed by considering Σ_{j=1}^{n} D_j, where the random variable D_j := |f(s_j, S_{≤j})| denotes the number of faces that are created by inserting s_j into the trapezoidation T(S_{≤j−1}), or equivalently that are removed by deleting s_j from T(S_{≤j}) (see Figure 2). Classic Backward Analysis [9] in this context is

Figure 1: Trapezoidations over the segments S = {a = (a.l, a.r), b = (b.l, b.r), c = (c.l, c.r), d = (d.l, d.r)}, where c.l = d.l is a common endpoint. T({a}), T({a, b}), T({a, b, c}), and T({a, b, c, d}) have 4, 7, 10, and 13 faces respectively (cf. leaves in Figure 2).

Figure 2: TSD for the history of trapezoidations under the permutation π = (a, b, c, d) from Figure 1.
TSD node v corresponds to the trapezoid ∆(v), which has the boundaries top(∆(v)) = c, bottom(∆(v)) = b, left(∆(v)) = a.r, and right(∆(v)) = b.r; the spatially empty ∆(u) is due to the common endpoint left(∆(u)) = c.l = d.l = right(∆(u)). The path with heavy line width is not a search path, since d.r is left of a.r.

the following argument. Let S′ ⊆ S be a fixed subset of j segments; then

E_{P(S)}[ D_j | S_{≤j} = S′ ] = (1/j) · Σ_{s ∈ S′} Σ_{∆ ∈ T(S′)} χ(∆ ∈ f(s, S′)) ≤ 4γj/j = 4γ,

where the binary indicator variable χ(∆ ∈ f(s, S′)) is 1 iff the trapezoid ∆ is bounded by segment s. The equality is due to the fact that every segment in S′ is equally likely to be picked for s_j. Since this bound on the conditional expectation does not depend on the actual set S′, we have E[D_j] ≤ 4γ unconditionally for each step j. Since the destruction of a face (of a leaf node) creates at most three search nodes, linearity of expectation gives that the expected number of TSD nodes is at most 12γn.

We define for each 1 ≤ i < j ≤ n an event, i.e. a binary random variable X_{i,j} : P(S) → {0, 1}, by setting

X_{i,j} = 1 if f(s_j, S_{≤j}) contains a trapezoid bounded by s_i, and 0 otherwise.

To simplify presentation, we place the events in a lower triangular matrix and call the set r(j) := {X_{i,j} : 1 ≤ i < j} the events of row j and the set c(i) := {X_{i,j} : i < j ≤ n} the events of column i.

X(π):  row j=2: X_{1,2} = 1;  row j=3: X_{1,3} = 1, X_{2,3} = 1;  row j=4: X_{1,4} = 0, X_{2,4} = 0, X_{3,4} = 1

j    | 1 | 2   | 3      | 4
D_j  | 4 | 5   | 6      | 4
A    | – | {a} | {a, b} | {c}
N    | – | {}  | {}     | {a, b}

Table 2: Outcome of the pairwise events and the partitions for the segments S = {a, b, c, d} and the order π from Figures 1 and 2.

Imagine that the random permutation is built backwards, i.e. by successively choosing one of the remaining elements uniformly at random
For every step j ≥ 2, at least one of the row events occurs, i.e. 0 < Σ_{i<j} X_{i,j}.
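In the one-dimensional interval problem from the introduction, the analogous pairwise events are easy to make concrete (a hypothetical Python illustration of ours, simplifying the segment setting): the j-th inserted point creates two intervals, bounded by its predecessor and successor among the first j points, so in every row j ≥ 2 at least one and at most two events occur.

```python
def pairwise_events(order):
    """X[(i, j)] = 1 iff one of the two intervals created by inserting the
    j-th point is bounded by the i-th point (1-based priorities).  In the
    1-D problem these bounding points are exactly the predecessor and the
    successor of the new point among the first j points."""
    n = len(order)
    X = {}
    for j in range(2, n + 1):
        p = order[j - 1]
        prefix = order[:j - 1]
        smaller = [x for x in prefix if x < p]
        larger = [x for x in prefix if x > p]
        bounding = set()
        if smaller:
            bounding.add(max(smaller))   # predecessor bounds the left interval
        if larger:
            bounding.add(min(larger))    # successor bounds the right interval
        for i in range(1, j):
            X[(i, j)] = 1 if order[i - 1] in bounding else 0
    return X
```

For this 1-D analogue the row sums are thus trivially between 1 and 2; in the segment setting the analogous row bounds involve a constant factor rather than 2.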
Lemma. Let S be a set of non-crossing segments. For every π ∈ P(S) and j ≥ 2, we have D_j(π)/c ≤ Σ_{i<j} X_{i,j}(π), for a fixed constant c > 0.

Theorem. There is a constant λ > 0 such that, for every set S of n non-crossing segments, we have Pr[ Σ_{j=2}^{n} Σ_{i=1}^{j−1} X_{i,j} > λn ] < 1/e^n.

Proof. For the Chernoff method, set t := ln 2 and B := (γ+1)n / ln 2. To leverage Equations (2) and (3) for our events, we regroup the summation terms by column index. Let C_i := Σ_{Y ∈ c(i)} Y for each 1 ≤ i < n.

Corollary 1. The RIC of a TSD for n non-crossing segments takes w.h.p. O(n ln n) time.

Corollary 2. The TSD size, for n non-crossing segments, can be made O(n) with 'rebuild if too large' while merely increasing the expected construction time by an additive constant.

The classical argument shows that any of the O(n²) many search paths has logarithmic length with high probability. Since there are only O(n²) points that differ in their search paths, the high probability bound is strong enough to address each of them in a union bound (e.g. [9, Chapter 6.4]; cf. also [18, Lemma 3.1.5 and Theorem 3.1.4]). However, a DAG on n vertices of degree at most two may well contain Θ(2^n) different paths (cf. Figure 3).

Figure 3: Insertion order for a set of non-crossing segments that results in a TSD with depth Ω(n) and Ω(2^n) paths.

Each root-to-leaf path in the TSD gives rise to a sub-sequence of 'full region' nodes (u_1, . . . , u_m), i.e. those nodes whose associated trapezoids ∆(u_i) are actual faces of the trapezoidation T(S_{≤j}) for some step j ∈ {1, . . . , n} (see Figure 2). The length of this sequence of face transitions is within a factor of three of the path length, since a face destruction inserts at most three edges in the TSD to connect a trapezoid of T(S_{≤j−1}) with the trapezoids of T(S_{≤j}) that replace it.
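The single-path concentration that this union bound builds on can be checked empirically in the one-dimensional interval problem from the introduction. A seeded Monte Carlo sketch (all names and constants are our own illustrative choices):

```python
import random

def search_depth(points, q, rng):
    """Number of times the interval containing q changes while inserting
    the points in a uniformly random order; backward analysis counts the
    same quantity in reverse deletion order."""
    order = list(points)
    rng.shuffle(order)
    lo, hi, depth = float("-inf"), float("inf"), 0
    for p in order:
        if lo < p < hi:          # p splits the current interval of q
            depth += 1
            if p < q:
                lo = p
            else:
                hi = p
    return depth

rng = random.Random(42)
n = 1000
points = list(range(1, n + 1))
q = n / 2 + 0.5              # a query in the middle: E[depth] = 2 * H_{n/2}
trials = [search_depth(points, q, rng) for _ in range(300)]
mean = sum(trials) / len(trials)
```

With n = 1000 the empirical mean stays close to 2·H_{500} ≈ 13.6, and individual trials rarely stray far above it, matching the Chernoff-type concentration of the record counts.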
Our implementation is based on CGAL 5.1.1 and focuses on exhibiting how our proposed tail estimate on the TSD size compares against the known high probability bound for the search depth. Since the computation of the search depth for the intermediary structures entails considerable work, we only compute the search depth for the first insertion steps. Besides this, we also record the depth, which is accessible in constant time throughout the entire construction.

Our experiments comprise one real (NC) and two synthetic (rnd-hor-10K, rnd-10K) data sets. The NC data set is from the OpenStreetMap project and contains all line segments that are associated with streets in New Caledonia, as of June 2020. The rnd-hor-10K data set contains horizontal segments with y-coordinates from a fixed integer range and x-coordinates that are chosen uniformly at random. The rnd-10K data set contains segments whose endpoint coordinates are chosen uniformly at random from a square, resulting in a planar subdivision.

Figures 4, 5, and 6 show the absolute values of size, depth, and search depth during the TSD construction with two different random permutations on the three data sets. The figures also contain a plot of the TSD size relative to n and the TSD depth relative to log n, to make relative deviations from the optimum visually better accessible. In our experiment, depth and search depth are very closely related; the largest discrepancy is observed on the rnd-hor-10K data set (see also Figure 3). Fluctuations of the TSD sizes between the two runs are visually barely distinguishable, as suggested by our exponential tail bound. The depth and search depth show more fluctuation during the construction, yet stay within a small constant factor of log n, as suggested by the known high probability bound.
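For reproduction, the synthetic inputs can be generated along the following lines. This is a sketch under assumptions: the function names, coordinate ranges, and the one-segment-per-y-level placement are our guesses, since the paper's exact constants did not survive extraction.

```python
import random

def make_rnd_hor(num_segments=10_000, seed=1):
    """rnd-hor-10K-style input: horizontal, pairwise non-crossing segments.
    Each segment gets its own integer y-level (guaranteeing non-crossing)
    and a random x-extent; the coordinate ranges are illustrative."""
    rng = random.Random(seed)
    segments = []
    for y in range(num_segments):
        x1, x2 = sorted(rng.uniform(0.0, 1.0) for _ in range(2))
        segments.append(((x1, float(y)), (x2, float(y))))
    return segments

def make_rnd(num_segments=10_000, seed=2):
    """rnd-10K-style input: segment endpoints drawn uniformly from a square
    (such segments may cross; the paper reports the induced subdivision)."""
    rng = random.Random(seed)
    return [((rng.random(), rng.random()), (rng.random(), rng.random()))
            for _ in range(num_segments)]
```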
Acknowledgments

The authors want to thank Wolfgang Mulzer for pointing out an incorrect statement in an earlier draft, Boris Aronov for discussions during his stay, Daniel Bahrdt for the github project OsmGraphCreator, and Raimund Seidel for sharing his excellent lecture notes from a CG course he taught in 1991 at UC Berkeley.

References

[1] Pankaj K. Agarwal, Lars Arge, Jeff Erickson, Paolo Giulio Franciosa, and Jeffrey Scott Vitter. Efficient searching with linear constraints. In Proc. of the 17th Symposium on Principles of Database Systems (PODS'98), pages 169–178, 1998. doi:10.1145/275487.275506.

[2] Pankaj K. Agarwal, Boris Aronov, Sariel Har-Peled, Jeff M. Phillips, Ke Yi, and Wuzhou Zhang. Nearest neighbor searching under uncertainty II. In Proc. of the 32nd Symposium on Principles of Database Systems (PODS'13), pages 115–126, 2013. doi:10.1145/2463664.2465219.

[3] D. S. Andrews, J. Snoeyink, J. Boritz, T. Chan, G. Denham, J. Harrison, and C. Zhu. Further comparison of algorithms for geometric intersection problems. In Proc. 6th International Symposium on Spatial Data Handling, 1994. URL: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.18.2254.

[4] D. S. Andrews and Jack Snoeyink. Geometry in GIS is not combinatorial: Segment intersection for polygon overlay. In Proc. of the 11th Symposium on Computational Geometry (SoCG'95), pages C24–C25, 1995. doi:10.1145/220279.220333.

[5] Guy E. Blelloch, Yan Gu, Julian Shun, and Yihan Sun. Parallelism in randomized incremental algorithms. J. ACM, 67(5):27:1–27:27, 2020. doi:10.1145/3402819.

[6] Milutin Brankovic, Nikola Grujic, André van Renssen, and Martin P. Seybold. A simple dynamization of trapezoidal point location in planar subdivisions. In Proc. 47th International Colloquium on Automata, Languages, and Programming (ICALP'20), pages 18:1–18:18, 2020. doi:10.4230/LIPIcs.ICALP.2020.18.

[7] Timothy M. Chan.
A simple trapezoid sweep algorithm for reporting red/blue segment intersections. In Proc. of the 6th Canadian Conference on Computational Geometry (CCCG'94), pages 263–268, 1994.

[8] Kenneth L. Clarkson, Kurt Mehlhorn, and Raimund Seidel. Four results on randomized incremental constructions. Computational Geometry: Theory and Applications, 3:185–212, 1993. doi:10.1016/0925-7721(93)90009-U.

[9] Mark de Berg, Otfried Cheong, Marc J. van Kreveld, and Mark H. Overmars. Computational Geometry: Algorithms and Applications, 3rd Edition. Springer, 2008. doi:10.1007/978-3-540-77974-2.

[10] Michael Hemmer, Michal Kleinbort, and Dan Halperin. Improved implementation of point location in general two-dimensional subdivisions. In Proc. 20th European Symposium on Algorithms (ESA'12), pages 611–623, 2012. doi:10.1007/978-3-642-33090-2_53.

[11] Michael Hemmer, Michal Kleinbort, and Dan Halperin. Improved implementation of point location in general two-dimensional subdivisions. CoRR, abs/1205.5434, 2012. arXiv:1205.5434.

[12] Michael Hemmer, Michal Kleinbort, and Dan Halperin. Optimal randomized incremental construction for guaranteed logarithmic planar point location. Computational Geometry: Theory and Applications, 58:110–123, 2016. doi:10.1016/j.comgeo.2016.07.006.

[13] Jiří Matoušek and Raimund Seidel. A tail estimate for Mulmuley's segment intersection algorithm. In Proc. 19th International Colloquium on Automata, Languages and Programming (ICALP'92), pages 427–438, 1992. doi:10.1007/3-540-55719-9_94.

[14] Kurt Mehlhorn, Micha Sharir, and Emo Welzl. Tail estimates for the space complexity of randomized incremental algorithms. In Proc. of the 3rd Symposium on Discrete Algorithms (SODA'92), pages 89–93, 1992. URL: http://dl.acm.org/citation.cfm?id=139404.139423.

[15] Ketan Mulmuley. A fast planar partition algorithm, I. J. of Symbolic Computation, 10(3-4):253–280, 1990. doi:10.1016/S0747-7171(08)80064-8.

[16] Ketan Mulmuley.
A fast planar partition algorithm, II. J. ACM, 38(1):74–103, 1991. doi:10.1145/102782.102785.

[17] Ketan Mulmuley. Randomized multidimensional search trees: Lazy balancing and dynamic shuffling. In Proc. of the 32nd Symposium on Foundations of Computer Science (FOCS'91), pages 180–196, 1991. doi:10.1109/SFCS.1991.185368.

[18] Ketan Mulmuley. Computational Geometry: An Introduction Through Randomized Algorithms. Prentice Hall, 1994.

[19] Raimund Seidel. A simple and fast incremental randomized algorithm for computing trapezoidal decompositions and for triangulating polygons. Computational Geometry: Theory and Applications, 1:51–64, 1991. doi:10.1016/0925-7721(91)90012-4.

[20] Raimund Seidel. Backwards Analysis of Randomized Geometric Algorithms, pages 37–67. Springer Berlin Heidelberg, 1993. doi:10.1007/978-3-642-58043-7_3.

[21] Raimund Seidel. Teaching computational geometry. In Proc. of the 5th Canadian Conference on Computational Geometry (CCCG'93), pages 272–272, 1993.

[22] Raimund Seidel and Cecilia R. Aragon. Randomized search trees. Algorithmica, 16(4-5):464–497, 1996. doi:10.1007/BF01940876.

[23] Sandeep Sen. A unified approach to tail estimates for randomized incremental construction. In Proc. of the 36th Symposium on Theoretical Aspects of Computer Science (STACS'19), pages 58:1–58:16, 2019. doi:10.4230/LIPIcs.STACS.2019.58.

[24] Jean Vuillemin. A unifying look at data structures. Communications of the ACM, 23(4):229–239, 1980.