Preprocessing Imprecise Points for the Pareto Front
Ivor van der Hoog, Irina Kostitsyna, Maarten Löffler, Bettina Speckmann
Ivor van der Hoog
Department of Information and Computing Sciences, Utrecht University, the Netherlands
Irina Kostitsyna
Department of Mathematics and Computer Science, TU Eindhoven, the Netherlands
Maarten Löffler
Department of Information and Computing Sciences, Utrecht University, the Netherlands
[email protected]
Bettina Speckmann
Department of Mathematics and Computer Science, TU Eindhoven, the Netherlands
Abstract
In the preprocessing model for uncertain data we are given a set of regions R which model the uncertainty associated with an unknown set of points P. In this model there are two phases: a preprocessing phase, in which we have access only to R, followed by a reconstruction phase, in which we have access to points in P at a certain retrieval cost C per point. We study the following algorithmic question: how fast can we construct the Pareto front of P in the preprocessing model? We show that if R is a set of pairwise-disjoint axis-aligned rectangles, then we can preprocess R to reconstruct the Pareto front of P efficiently. To refine our algorithmic analysis, we introduce a new notion of algorithmic optimality which relates to the entropy of the uncertainty regions. Our proposed uncertainty-region optimality falls on the spectrum between worst-case optimality and instance optimality. We prove that instance optimality is unobtainable in the preprocessing model, whenever the classic algorithmic problem reduces to sorting. Our results are worst-case optimal in the preprocessing phase; in the reconstruction phase, our results are uncertainty-region optimal with respect to real RAM instructions, and instance optimal with respect to point retrievals.

Theory of computation → Design and analysis of algorithms
Keywords and phrases preprocessing, imprecise points, geometric uncertainty, lower bounds, algorithmic optimality, Pareto front
Funding
Ivor van der Hoog: Supported by the Dutch Research Council (NWO); 614.001.504.
Maarten Löffler: Partially supported by the Dutch Research Council (NWO); 614.001.504.
Bettina Speckmann: Partially supported by the Dutch Research Council (NWO); 639.023.208.
In many applications of geometric algorithms to real-world problems the input is inherently imprecise. A classic example is GPS samples used in GIS applications, which carry a significant error. Geometric imprecision can be caused by other factors as well. For example, if a measured object moves during measurement, it may have an error dependent on its speed [18]. Another example comes from I/O-sensitive computations: exact locations may be too costly to store in local memory [3]. Algorithms that can handle imprecise input well have received considerable attention in computational geometry. We continue this line of research by studying the efficient construction of the Pareto front of a collection of imprecise points.
Preprocessing model.
Held and Mitchell [17] introduced the preprocessing model of uncertainty as a model to study the amount of geometric information contained in uncertain points. In this model, the input is a set of geometric (uncertainty) regions R = (R_1, R_2, ..., R_n) with an associated "true" planar point set P = (p_1, p_2, ..., p_n). For any pair (R, P), we say that P respects R if each p_i lies inside its associated region R_i; we assume throughout the paper that P respects R. The preprocessing model has two consecutive phases: a preprocessing phase where we have access only to the set of uncertainty regions R, and a reconstruction phase where we can, for each R_i ∈ R, request the true location p_i in (traditionally constant) C time. The value C can, for example, model the cost of disk retrievals for I/O-sensitive computations [3]. We typically want to preprocess R in O(n log n) time to create some linear-size auxiliary datastructure Ξ. Afterwards, we want to reconstruct the desired output on P using Ξ faster than would be possible without preprocessing.

Löffler and Snoeyink [22] were the first to interpret R as a collection of imprecise measurements of a true point set P. The size of Ξ and the running time of the reconstruction phase together quantify the information about (the Delaunay triangulation of) P contained in R. This interpretation was widely adopted within computational geometry and motivated many recent results for constructing Delaunay triangulations [4, 5, 11, 28], spanning trees [20, 30], convex hulls [15, 16, 23, 25] and other planar decompositions [21, 27] for imprecise points.

Output format.
Classical work in the preprocessing model ultimately aims to preprocess the data in such a way that one can achieve a (near-)linear-time reconstruction phase. Indeed, if the final output structure has linear complexity and must explicitly contain the coordinates of each value in P, then returning the result takes Ω(nC) time. However, this point of view is limiting in two ways. First, certain geometric problems, such as the convex hull or the Pareto front, may have sub-linear output complexity. Second, even if the output has linear complexity, it may be possible to find its combinatorial structure without inspecting the true locations of all points. Consider the example in Figure 1: on the left, we do not need to retrieve any point; on the right, we do not need to retrieve the second point after we retrieve the first. Van der Hoog et al. [27] propose an addition to the preprocessing model to enable a more fine-grained analysis in these situations: instead of returning the desired structure on P explicitly, they instead return an implicit representation of the output. This implicit representation can take the form of a pointer structure which is guaranteed to be isomorphic to the desired output on P, but where each value is a pointer to either a certain (retrieved) point, or to an uncertain (unretrieved) point. In this paper, we study the efficient construction of the Pareto front of a set of imprecise points P, from pairwise-disjoint axis-aligned rectangles R as uncertainty regions, in the preprocessing model with implicit representation.

Figure 1 The Pareto front of P can be implied by the geometry of R (left) or not (right).

Algorithmic efficiency.
To assess the efficiency of any algorithm we generally want to compare its performance to a suitable lower bound. Two common types of lower bounds are worst-case and instance lower bounds. The classical worst-case lower bound takes the minimum, over all algorithms A, of the maximal running time of A for any pair (R, P). The instance lower bound [1, 14] is the minimum over all A, for a fixed instance (R, P), of the running time of A on (R, P). For the Pareto front the worst-case lower bound is trivially Ω(nC); worst-case optimal performance (for us, in the reconstruction phase) is hence easily obtainable. Instance optimality, on the other hand, is unobtainable in classical computational geometry [1]. Consider, for example, binary search for a value q among a set X of sorted numbers. For each instance (X, q), there exists a naive algorithm that guesses the correct answer in constant time. Thus the instance lower bound for binary search is constant, even though there is no algorithm that can perform binary search in constant time in a comparison-based RAM model [13]. Hence we introduce a new lower bound for the preprocessing model, whose granularity falls in between the instance and worst-case lower bound. Our uncertainty-region lower bound is the minimum over all algorithms A, for a fixed input R, of the maximal running time of A on (R, P) for any P that respects R. A detailed discussion of algorithmic efficiency for the preprocessing model can be found in Section 2.

Related work.
Bruce et al. [3] study the efficient construction of the Pareto front of two-dimensional pairwise-disjoint axis-aligned uncertainty rectangles in what would later be the preprocessing model using implicit representation. As their paper is motivated by I/O-sensitive computation, they assume that the retrieval cost C dominates polynomial RAM running time, and both their preprocessing and reconstruction phase use an unspecified polynomial number of RAM instructions. In the reconstruction phase they have a retrieval strategy that iteratively selects a region R_i for which they retrieve p_i to construct Ξ* (since Ξ* is an implicit representation, they do not have to retrieve each p_i ∈ P). Their result is instance optimal under their assumption that C dominates the RAM running time of all parts of their algorithm. We study the same problem without their assumption on C.
We discuss in Section 2 the three possible lower bounds for the preprocessing model: worst case, instance, and our new uncertainty-region lower bound. In Section 3 we present the necessary geometric preliminaries. Then, in Section 4, we prove an uncertainty-region lower bound on the time required for the reconstruction phase. In Section 5 we then show how to preprocess R in O(n log n) time to create an auxiliary structure Ξ. We also explain how to reconstruct the Pareto front of P as an implicit representation Ξ* from Ξ. Our results are worst-case optimal in the preprocessing phase; our reconstruction results are uncertainty-region optimal in the RAM instructions, instance optimal with respect to the retrieval cost C, and an O(log n) factor removed from instance optimal with respect to both. This is the first two-dimensional result in the preprocessing model with better than worst-case optimal performance.

Figure 2 Three collections of grey uncertainty regions where the Pareto front, EMST or Delaunay triangulation of the grey points is implied by the regions, plus an orange region R_n. Depending on the placement of p_n, it can neighbor any grey point in the final structure.

We briefly revisit the definitions of worst-case and instance lower bounds in the preprocessing model and then formally introduce our new uncertainty-region lower bound.
Worst-case lower bounds.
The worst-case comparison-based lower bound of an algorithmic problem P considers each algorithm plus datastructure pair (A, Ξ) which solves P in a competitive setting with respect to their maximal running time:

Worst-case lower bound(P) := min_{(A,Ξ)} max_{(R,P)} Runtime(A, Ξ, R, P).

The number L of distinct outcomes for all instances (R, P) implies a lower bound on the maximal running time for any algorithm A: regardless of preprocessing, auxiliary datastructures and memory used, any comparison-based pointer machine algorithm A can be represented as a decision tree where at each algorithmic step a binary decision is taken [2, 7, 13]. Since there are at least L different outcomes, there must exist a pair (R, P) for which A takes log L steps before A terminates (this lower bound is often referred to as the information-theoretic lower bound, or sometimes the entropy of the problem [1, 6, 7]).

Instance lower bounds.
A stronger lower bound is the instance lower bound [14] (or instance optimal in the random-order setting in [1]). For an extensive overview of instance optimality we refer to Appendix A. For a given instance (R, P), its instance lower bound is:

Instance lower bound(P, R, P) = min_{(A,Ξ)} Runtime(A, Ξ, R, P).

An algorithm A is instance optimal if for every instance (R, P) the runtime of A matches the instance lower bound. Löffler et al. [21] define proximity structures that include quadtrees, Delaunay triangulations, convex hulls, Pareto fronts and Euclidean minimum spanning trees. We prove the following:

Theorem 1.
Let the unspecified retrieval cost C not dominate O(log n) RAM instructions and let R be any set of pairwise-disjoint uncertainty rectangles. Then there exists no algorithm A in the preprocessing model with implicit representation that can construct a proximity datastructure on the true points which is instance optimal.

Proof.
Let R = (R_1, R_2, ..., R_{n-1}) be a set of uncertainty regions for which the implicit datastructure Ξ* can be known in the preprocessing phase. Denote by R_n an uncertainty region for which p_n can neighbor any p_i ∈ (p_1, ..., p_{n-1}). See Figure 2 for an example of the Pareto front, the EMST and the Delaunay triangulation (and, with it, Voronoi diagrams) and Figure 3 for the convex hull. For the set of grey points (p_1, ..., p_{n-1}), their respective structure is known, while the orange point p_n can neighbor any of the grey points. Via the information-theoretic lower bound, there is no algorithm A that for every instance can decide the correct neighbor of p_n in O(C) time. Yet for every instance, there exists a naive algorithm that correctly guesses the constantly many neighbors of p_n and verifies this guess in O(C) time. ◀

(We refer to comparison-based algorithms on an intuitive level: as RAM computations that do not make use of flooring. For a more formal definition we refer to any of [1, 2, 10, 13].)

Figure 3 A collection of n − 1 grey regions; the region R_n is shown in orange. Depending on the placement of p_n, it can neighbor any grey point in the convex hull of all the points.

Uncertainty-region lower bounds.
Worst-case optimality is easily attainable by any algorithm, and we proved that instance optimality is not attainable in the preprocessing model. Yet the examples in Figures 1 and 2 intuitively have a lower bound of Θ(1) and Θ(log n + C) respectively, which is trivial to match via binary search. We capture this intuition for a fixed input R:

Uncertainty-region lower bound(P, R) := min_{(A,Ξ)} max_{P respects R} Runtime(A, Ξ, R, P),

and say an algorithm A is uncertainty-region optimal if for every R, A has a running time that matches the uncertainty-region lower bound. Denote by L(R) the number of distinct outcomes for all P that respect R. Via the information-theoretic lower bound we know:

∀R: log L(R) ≤ Uncertainty-region lower bound(P, R).

For constructing proximity structures in the preprocessing model with implicit representations, the value of log L(R) can range anywhere between 0 and n log n. Consequently, an optimal algorithm cannot necessarily afford to explicitly retrieve the entire point set P.

Throughout the paper, we use the notation R^o, R^t for original and truncated regions respectively (which we define later). When the set is clear from context, we drop the superscript. Let R = (R_1, R_2, ..., R_n) be a sequence of n pairwise-disjoint closed axis-aligned uncertainty rectangles, with underlying point set P. For ease of exposition, we assume R and P lie in general position (no points or region vertices share a coordinate). We denote by [R_i, R_j] := (R_i, R_{i+1}, ..., R_j) a subsequence of j − i + 1 regions and similarly by [p_i, p_j] = (p_i, p_{i+1}, ..., p_j) a subsequence of points. For brevity, with slight abuse of notation, we may refer to points as degenerate rectangles; hence any set R may contain points. Whenever we place a point on a vertex, we mean placing it arbitrarily close to said vertex. A region R_i precedes a region R_j if i < j. Conversely, R_j succeeds R_i.

Figure 4 Left: a collection of uncertainty regions. Green is positive, red is negative and yellow is potential. The horizontal halfslab of a green region is shown. Right: a collection of uncertainty regions before and after truncation; note that we re-indexed the regions and flagged one.

For two points p and q, we say that p (Pareto) dominates q if both its x- and y-coordinates are greater than or equal to the respective coordinates of q. A point p (Pareto) dominates a rectangle R if p dominates its top right vertex. We define the Pareto front of P as the boundary of the set of points that are dominated by a point in P. That is, the Pareto front is the set of points in P that are not dominated by any other point in P, connected by a rectilinear staircase. For any region or point R, we define its horizontal halfslab as the union of all horizontal halflines that are directed leftward, whose apex lies in or on R. We define the vertical halfslab symmetrically using downward vertical halflines. Given a set R without knowledge of P, we say a region R_i ∈ R is (Figure 4, left):
- a negative region if for all choices of P, the point p_i is not part of the Pareto front of P;
- a positive region if for all choices of P, the point p_i is part of the Pareto front; or
- a potential region if it is neither positive nor negative.

Lemma 2.
A region R_i ∈ R is negative if and only if there exists an R_j ∈ R such that the top right vertex of R_i is dominated by the bottom left vertex of R_j. A non-negative region R_i is positive if and only if there is no R_k ∈ R such that R_i intersects either halfslab of R_k.

Proof.
Let R_i and R_j be two axis-aligned rectangular uncertainty regions where the top right vertex of R_i is dominated by the bottom left vertex of R_j. All choices of p_i ∈ R_i are dominated by the top right vertex of R_i; similarly, all choices of p_j ∈ R_j dominate the bottom left vertex of R_j. Hence, via transitivity, p_j always dominates p_i, which implies that R_i is a negative region. If there is no region whose bottom left vertex dominates the top right vertex of R_i, then p_i appears on the Pareto front of P if all regions have their point lie on the bottom left vertex and p_i lies on the top right vertex of R_i. Hence R_i is then not negative.

If R_i is non-negative and there exists a region R_k that contains R_i in its horizontal or vertical halfslab, then R_i cannot be positive: if p_k is placed on the top right vertex of R_k and p_i on the bottom left vertex, p_k must dominate p_i.

Suppose that R_i is not positive and not negative. Then per definition there exists a point placement of p_i, and another true point p_l, such that p_l dominates p_i. In this case, p_l also dominates the bottom left vertex of R_i, yet the uncertainty region R_l cannot be entirely contained in the quadrant that dominates the top right vertex of R_i, else R_i would be negative. Hence R_l must have a halfslab that intersects R_i, which proves the lemma. ◀
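The classification of Lemma 2 is easy to implement naively. Below is a small sketch of our own (not code from the paper): rectangles are given as (x1, y1, x2, y2) tuples, an assumed input format, and each region is tested against the dominance and halfslab conditions of the lemma in O(n^2) total time.

```python
def dominates_point(p, q):
    # p Pareto-dominates q: both coordinates of p are >= those of q.
    return p[0] >= q[0] and p[1] >= q[1]

def intersects_horizontal_halfslab(Ri, Rk):
    # Horizontal halfslab of Rk: all leftward halflines with apex in Rk,
    # i.e. the set { (x, y) : x <= Rk.x2, Rk.y1 <= y <= Rk.y2 }.
    ax1, ay1, ax2, ay2 = Ri
    bx1, by1, bx2, by2 = Rk
    return ax1 <= bx2 and ay1 <= by2 and ay2 >= by1

def intersects_vertical_halfslab(Ri, Rk):
    # Vertical halfslab of Rk: all downward halflines with apex in Rk,
    # i.e. the set { (x, y) : y <= Rk.y2, Rk.x1 <= x <= Rk.x2 }.
    ax1, ay1, ax2, ay2 = Ri
    bx1, by1, bx2, by2 = Rk
    return ay1 <= by2 and ax1 <= bx2 and ax2 >= bx1

def classify(regions):
    # Label each pairwise-disjoint rectangle per Lemma 2.
    labels = []
    for i, Ri in enumerate(regions):
        if any(dominates_point((Rj[0], Rj[1]), (Ri[2], Ri[3]))
               for j, Rj in enumerate(regions) if j != i):
            # Some other region's bottom-left vertex dominates Ri's top-right.
            labels.append("negative")
        elif any(intersects_horizontal_halfslab(Ri, Rk) or
                 intersects_vertical_halfslab(Ri, Rk)
                 for k, Rk in enumerate(regions) if k != i):
            # Ri meets the halfslab of some other region.
            labels.append("potential")
        else:
            labels.append("positive")
    return labels
```

For instance, on the pair {(0, 0, 1, 1), (2, 2, 3, 3)} the first rectangle is negative (its top right vertex is dominated by the other's bottom left vertex) and the second is positive.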
Intuitively, discovering the exact location of a point below B R does not provide additionaluseful information, only discovering that a point lies below B R does. (cid:73) Lemma 3.
Let R be a set of pairwise-disjoint non-negative rectangles. The intersection of a region R_i ∈ R with B_R is a staircase with no top right vertex.

Proof.
Per definition, non-negative regions have a top right vertex that lies above B_R. Their bottom left vertex lies either on B_R or below B_R (since B_R is the Pareto front of all bottom left vertices). Hence the closure of each uncertainty region intersects B_R. The intersection between a connected staircase and an axis-aligned rectangular region is always a connected staircase. Each top vertex of B_R corresponds to a bottom left vertex of a region in R. Each R_i cannot contain such a top vertex since regions are pairwise disjoint. ◀

We formalise the above intuition by defining a procedure
Trunc. Given an original set R^o of n^o pairwise-disjoint axis-aligned rectangles, Trunc(R^o) returns a truncated set R^t where some regions may be flagged (marked with a boolean). Refer to Figure 4. Specifically, each negative region in R^o gets removed; each potential region R_i whose bottom left vertex is below B_{R^o} gets flagged and replaced by the part of R_i above B_{R^o}. By Lemma 3 this results in a rectangular area. All remaining regions are rectangles which touch B_{R^o}. Since they are also disjoint, their intersections with B_{R^o} induce a well-defined order, and Trunc re-indexes the remaining regions according to the top-left to bottom-right ordering of their bottom left vertices. We obtain a set R^t = (R_1, R_2, ..., R_{n^t}) = Trunc(R^o) with n^t ≤ n^o. Observe that B_{R^o} = B_{R^t}. We say R^t is a truncated set if it is the result of a truncation of some set R^o.

Dependency graphs.
Given a truncated set R = R^t, we define a (directed) dependency graph denoted by G(R) as follows. The nodes of the graph correspond to the regions in R. We have two types of directed edges, which we refer to as horizontal and vertical arrows. A region R_i has a vertical arrow to R_j if R_j succeeds R_i and is vertically visible from R_i (that is, there exists a vertical segment connecting R_i and R_j that does not intersect any other region in R). A region R_i has a horizontal arrow to R_j if R_j precedes R_i and is horizontally visible from R_i. Refer to Figure 5. Observe that, if R is a truncated set, any point region p ∈ R has no outgoing arrows, since after truncation the halfslabs of p do not intersect the interior of any rectangle in R. We note an important property of the dependency graph:

Lemma 4.
Let R_i ∈ R be such that R_i is a source in G(R). Then no R_l ∈ R with i < l can have an incoming dependency arrow from a region R_k with k < i, and vice versa.

Figure 5 A truncated set and its horizontal and vertical arrows.
Proof.
Consider such regions R_k, R_i and R_l. Per the ordering of R, the bottom left vertex of R_k lies left of and above the bottom left vertex of R_i. Per definition, R_k can only have a vertical arrow to R_l. The region R_k has a vertical arrow to R_l only if its bottom facet lies above R_l. However, then either its bottom facet intersects R_i (contradicting the assumption that the regions are pairwise disjoint) or it lies above R_i (contradicting the assumption that R_i is a source node in G(R)). The argument for arrows from R_l to R_k is symmetrical. ◀

Corollary 5.
Let R be a truncated set and let R_i and R_j be source nodes in G(R). There is no region in R \ [R_i, R_j] that has a directed path in G(R) to any region in [R_i, R_j].

The Pareto cost function.
We show that for any set R^o, we can construct the Pareto front of the underlying point set using only R^t = Trunc(R^o). To show that we can use R^t to construct Ξ* in uncertainty-region optimal time, we define the Pareto cost function, denoted by CP(R^t, P). In Section 4 we show that CP(R^t, P) is the uncertainty-region lower bound for constructing Ξ* and in Section 5 we show that this lower bound is tight.

Before we can define the Pareto cost function, we define additional concepts (Figure 6). By C we denote the unspecified cost for a retrieval. Whenever we write log we refer to the logarithm base 2. Let R = R^t be a truncated set. For all regions R_i ∈ R, we denote by V_i the subset of [R_i, R_n] that is vertically visible from R_i (including R_i itself) and by H_i the subset of [R_1, R_i] that is horizontally visible from R_i (including R_i itself). Given P, we denote by V_i(P) ⊆ V_i the union of {R_i} with the subset of V_i of regions that are dominated by a point p_j with j ≤ i. The set H_i(P) is defined symmetrically, taking points p_j with i ≤ j.

Intuitively, the truncation operator represents the foresight about the Pareto front of P. Now, given a truncated set R and P, we construct a set ˜R(P) ⊂ R that intuitively represents which regions of R were geometrically interesting in hindsight. Consider, for a given P, all regions that are intersected by the Pareto front of P. Let R_j be such a region; then, given the Pareto front of P \ {p_j}, R_j covers some area above this Pareto front. Hence, the point p_j could be part of the Pareto front of P if it lies in this area. Intuitively, all regions intersected by the Pareto front of P are hereby suitable for further inspection; however, if the regions are positive regions this further inspection might not be required to construct Ξ*.
Similarly, if the region R_j lies above the Pareto front of the points P \ {p_j}, the point p_j cannot be dominated by a point in P \ {p_j} and hence we can conclude that it lies on the Pareto front of P without further inspection. This is why we define ˜R(P) as the subset of R where each region R_i ∈ ˜R(P) is intersected by the Pareto front of P and one of three conditions holds:
- R_i is flagged;
- R_i intersects an edge e with endpoint p_j ∈ P and i ≠ j; and/or
- R_i is not a sink in G(R).

We define the Pareto cost function as:

CP(R, P) = Σ_{R_i ∈ ˜R(P)} ( C + log |V_i(P)| + log |H_i(P)| ).

One is free to compute any auxiliary Ξ in the preprocessing phase, in order to reconstruct a structure Ξ*, isomorphic to the Pareto front, as efficiently as possible. There exists a choice of input R^o where all regions are positive: namely whenever R^o = R^t = Trunc(R^o) and G(R^o) is a graph with no edges. In this case, for every choice of P that respects R^o, the Pareto front of P is isomorphic to B_{R^o}; hence it is possible to construct Ξ* in the preprocessing phase. If R^o has m elements, constructing B_{R^o} has a well-known O(m log m) worst-case lower bound.

In the reconstruction phase an algorithm can use any auxiliary structure Ξ to aid its computation. In the remainder of this section we consider any truncated set R = R^t = Trunc(R^o) of n elements, together with any auxiliary datastructure. We provide an information-theoretical lower bound, which depends on R and P, for both the number of RAM instructions and disk retrievals required to construct Ξ*, regardless of Ξ.

Bruce et al. study in their paper the reconstruction of the Pareto front of P in a variant of (what would later be) the preprocessing model with implicit representation. Bruce et al. present an iterative retrieval strategy that is instance optimal.
Their strategy performs at most three times more retrievals than any algorithm must use to discover the Pareto front of P, and they prove that this factor-3 redundancy is the best anyone can do. Their strategy describes the regions that must be considered in a geometric sense, not an algorithmic sense. That is, at each iteration they can identify a triplet of regions to query. But they have no algorithmic procedure to identify these three regions as such, nor a way to specify beforehand which regions should be considered. In their model this is justifiable, as they assume that the retrieval cost C vastly dominates any RAM instructions and hence identifying the triple each iteration is trivial. In this paper, we drop the assumption that C is enormous and are interested in a retrieval strategy which not only minimizes the number of retrievals, but which can also elect which points to retrieve efficiently.

We note that the query strategy of Bruce et al. produces a result of the same quality as the lemma below and, naturally, our proofs share some elements, which we fully wish to attribute to the work of [3]. The novelty in our result is that for each pair (R, P) we are able to characterize the regions which require a disk retrieval using ˜R(P), which will help us in the reconstruction phase, when we want to identify these regions efficiently.

Lemma 6.
Let R be a truncated set and let P be any point set that respects R. Any algorithm that constructs Ξ* of P must perform at least |˜R(P)| retrievals.

Proof.
Let R_i ∈ ˜R(P). Per definition, R_i is not dominated by a point in P. Hence, given P \ {p_i}, there exists a choice of p_i such that p_i appears on the Pareto front of P. Any algorithm A must spend a disk retrieval on p_i if there also exists a choice of p_i such that it does not appear on the Pareto front, given P \ {p_i}. We consider the three cases for when R_i ∈ ˜R(P).

Figure 6 Left: a region R_i and the set V_i in orange. Middle: for a given set of points, the set V_i(P) is shown in red. Right: the set V_i(P) changes for different P, but always includes R_i.

Let R_i be flagged. Then there exists a choice of p_i such that p_i lies below B_R and hence does not appear on the Pareto front of P. Else, let R_i be intersected by an edge e that has as an endpoint a point p_j with j ≠ i. Then e is either a vertical edge whose top vertex is p_j or a horizontal edge whose right vertex is p_j. In both cases, there exists a choice of p_i for which it does not appear on the Pareto front of P, since it would be dominated by p_j (this is achieved by placing p_i left of the vertical edge, or below the horizontal edge). Lastly, let neither of the first two cases apply and let R_i have at least one outgoing edge in G(R). Then there is at least one region R ∈ H_i ∪ V_i; the argument for this case is illustrated by Figure 7. Denote by R a region in H_i (the case for V_i is symmetrical). Moreover, let R be the region in H_i with the highest index. We 'charge' the region R one disk retrieval. First we show that each region in R gets charged at most twice; then we show this charge is justified.

Suppose that R gets charged by two regions R_i, R_j with R ∈ H_i and R ∈ H_j (the argument for when R lies in two vertical halfslabs is symmetrical) and let i < j. If R lies in H_i and H_j, then R_i must lie in the horizontal halfslab of R_j, which contradicts the assumption that R was the region in H_j with the highest index (see Figure 7, middle).

Second, we show that this charge is justified. Consider R and the two regions R_i and R_l (l < i) that charge R, and all points in P \ {p, p_i, p_l}. Since case (2) does not apply to R_i and R_l, there is no point in P \ {p_i, p_l} whose horizontal or vertical halfslab intersects R_i or R_l; thus no point in P \ {p_i, p_l} can dominate R, R_i or R_l. This implies that, regardless of all other points, there is a choice for p_i, p_l, p where all three points appear on the Pareto front of P (the point placement where p_i and p_l appear on the bottom left vertex of their respective regions and p appears on the top right vertex). However, there also exists a choice where p is dominated by p_l or p_i. Any algorithm must therefore consider at least p, p_i or p_l in order to find out, and this is why the charge is justified. ◀
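Lemma 6 ties the number of retrievals to |˜R(P)|; the Pareto cost function additionally charges two logarithmic search terms per region. As a toy illustration (our own sketch, with the sizes of the sets V_i(P) and H_i(P) assumed to be given as input; identifying ˜R(P) and these sets efficiently is the actual algorithmic challenge addressed in Sections 4 and 5), evaluating CP(R, P) is then a direct sum:

```python
import math

def pareto_cost(retrieval_cost, visible_set_sizes):
    # visible_set_sizes: one (|V_i(P)|, |H_i(P)|) pair per region in ~R(P).
    # CP(R, P) = sum over ~R(P) of C + log|V_i(P)| + log|H_i(P)|, log base 2,
    # matching the definition of the Pareto cost function in Section 3.
    total = 0.0
    for v, h in visible_set_sizes:
        total += retrieval_cost + math.log2(v) + math.log2(h)
    return total
```

For example, with C = 1 and two regions whose visible sets have sizes (2, 1) and (4, 2), the cost is (1 + 1 + 0) + (1 + 2 + 1) = 6.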
By an information-theoreticallower bound (algebraic decision tree or entropy [1, 7]), we have, for any R , that theUncertainty-region lower bound is at least log L ( R ), where L ( R ) is the number of combina-torially different Pareto fronts of point sets that respect R . We prove the following: (cid:73) Lemma 7.
Let R be a truncated set and P be any point set that respects R. Then

Σ_{R_i ∈ ˜R(P)} ( log |V_i(P)| + log |H_i(P)| ) ≤ 2 · log L(R).

Proof.
We show that ∑_{R_i ∈ R̃(P)} log |V_i(P)| ≤ log L(R). By a symmetric argument we have ∑_{R_i ∈ R̃(P)} log |H_i(P)| ≤ log L(R), and the lemma follows. Consider, for a fixed set P, all regions R_i ∈ R̃(P) for which |V_i(P)| ≥ 2 (recall that R_i ∈ V_i(P)) and sort them from lowest index to highest. For ease of exposition we denote these regions as (R_1, R_2, ..., R_m). We create m different, pairwise disjoint vertical slabs as follows: the first slab is bounded by the left facets of R_1 and R_2, the second by the left facets of R_2 and R_3, and the m-th slab is a halfplane (Figure 8). In the degenerate case that a slab has width 0 (this can occur, since after truncation regions can have left vertices that share a coordinate) we give it width ε.

Figure 7 Left: The region R_l charges the blue region and R_i the green. Middle: for R_j, either R_i ∈ H_j or there is another region (yellow) with higher index in H_j.

Figure 8 Left: A pair (R, P) such that the grey points form the Pareto front. Given the Pareto front, we can extract V_i(P) for each i. Middle: based on the sets V_i(P), we create vertical slabs irrespective of the original points P. Right: In each vertical slab, we can create |V_i(P)| combinatorially distinct (partial) Pareto fronts using only points in the vertical slab.

Let R_i = R_1 and R_j = R_2. For all regions R_k ∈ V_i(P), by definition i ≤ k < j. Each of these truncated regions thus has a bottom left vertex that lies left of the bottom left vertex of R_j and right of the bottom left vertex of R_i, which implies that their bottom left vertex lies in the first vertical slab. The result of this observation is that, given R, there are at least |V_i(P)| combinatorially different Pareto fronts contained within the first vertical slab. These Pareto fronts are obtained by placing the points of the regions in V_i(P) \ R_i on their respective bottom left endpoints, and by letting p_i dominate any prefix of these points.

Let R_j = R_2 and R_k = R_3. Via the same argument, each region in V_j(P) has its bottom left vertex in the second vertical slab. Hence, with the same argument as above, there are at least |V_j(P)| combinatorially different Pareto fronts contained within the second slab. Moreover, we created |V_i(P)| different combinatorial outcomes by placing only points in the first vertical slab, using only points preceding p_j. This means that these combinations can be generated whilst no point preceding p_j dominates any point following p_j. This implies that the total number of combinatorially different Pareto fronts contained in both the first and second slab is |V_i(P)| · |V_j(P)|. By applying this argument recursively it follows that ∏_{R_i ∈ R̃(P)} |V_i(P)| ≤ L(R), which concludes the proof. ◀

Given Lemma 6 and Lemma 7 we can immediately conclude the following:

▶
Theorem 8.
Let R be a truncated set and P be any set that respects R. Then CP(R, P) is fewer than three times the uncertainty-region lower bound of R.

We briefly note that for each i, V_i(P) and H_i(P) have at most n elements, and thus by Lemma 6, CP(R, P) is at most a factor log n removed from the instance lower bound.

Theorem 8 gives an uncertainty-region lower bound for any truncated set R. In this section, we show that this lower bound is tight. To that end, we first define additional geometric concepts. First, we introduce the notion of canonical rectangles. Then we define the notion of subproblems. Finally, we show how to use the subproblems of a canonical set to quickly select only regions which lie in R̃(P). We emphasise that in the reconstruction phase we have implicit access to the point set P, meaning that for each region R_i we can request p_i in O(C) time. Thus reading all points in P takes Ω(nC) time, which we aim to avoid.

Let R be a truncated set of n regions and let P respect R. Denote by V_i^next the region strictly right of the vertical slab of R_i with the lowest index; H_i^prev is defined symmetrically using the highest index (refer to Figure 11).

Figure 9 Left: R® with B_R® in red. Middle: the set of regions after truncation. The yellow region is a source and a sink; it splits the problem into two. Right: the canonical set.

For each i, let p_i^xMax (respectively p_i^yMax) be the point in P with maximal x-coordinate (y-coordinate) among points p_k with k ≤ i (with k ≥ i). Throughout this section, we denote by f_i(P) the region succeeding R_i with the lowest index that is not dominated by a point p_k with k ≤ i. The region g_i(P) is the region preceding R_i with the highest index that is not dominated by a point p_k with k ≥ i.

Let R_i ∈ R be both a source and a sink in G(R). By Lemma 4, p_i appears on the Pareto front and connects the Pareto fronts of [p_1, p_{i−1}] and [p_{i+1}, p_n]. Thus, we can split the problem of computing the Pareto front of P into two, and solve each half independently. We say that a truncated set R is culled if G(R) contains no region that is both a source and a sink. Let [R_i, R_j] be a sequence of sinks in G(R), and R* be the smallest rectangle that contains R_i and R_j. Note that R* is disjoint from the regions in R \ [R_i, R_j] and contains all of [R_i, R_j]. We can use R* to capture a "streak" of points which do, or do not, appear on the Pareto front:

▶ Lemma 9.
Let [R_i, R_j] be a sequence of sinks in G(R). If there is no p_k ∈ P preceding p_i that dominates p_i, then there is no point preceding p_i that dominates any point in [p_i, p_j]. If some p_k preceding p_i dominates p_j, then p_k dominates all points in [p_i, p_j]. Similar statements hold for p_k succeeding p_j.

Proof.
Any p_k that dominates some point p_s with s ∈ (i, j), but not p_i or p_j itself, must lie in the interior of R*, but R* contains only points whose regions are sinks in G(R). This contradiction implies all claims of the lemma. ◀

This lemma implies that if both p_i and p_j are not dominated by other points in P, then all the points in [p_i, p_j] appear on the Pareto front of P as a contiguous subsequence, and all regions R_k ∈ [R_i, R_j] are not part of R̃(P). Theorem 8 states that we cannot "afford" to spend any disk retrievals on (p_i, p_{i+1}, ..., p_j). Instead, we should add a pre-stored chain referencing [p_i, p_j] to Ξ* in constant time. This is why, for any maximal sequence of sinks [R_i, R_j] in a truncated and culled set R, we define their compound region R* and we replace [R_i, R_j] in R with R* (refer to Figure 9, right). Let R⋆ be the resulting set of regions. The region R* is a sink in G(R⋆), and a region R has an outgoing arrow to R* in G(R⋆) if and only if it had an outgoing arrow in G(R) to at least one region in [R_i, R_j]. Since R* is just another rectangle, disjoint from all other rectangles in the resulting set, the definitions of truncated and culled still apply to it. We say a set R⋆ is a canonical set if it is truncated, culled, and if there are no two consecutive regions that are sinks in G(R⋆). In the remainder, we assume that R is a truncated set and that its respective canonical set R⋆ is the reconstruction input.
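The compounding step — replacing each maximal run of consecutive sink regions by their smallest enclosing rectangle — can be sketched as follows. This is a simplified model in which a predicate `is_sink` is assumed to be given (the paper derives it from the dependency graph G(R)); rectangles are tuples (x1, y1, x2, y2):

```python
def bounding_box(run):
    """Smallest axis-aligned rectangle containing all rectangles in `run`."""
    return (min(r[0] for r in run), min(r[1] for r in run),
            max(r[2] for r in run), max(r[3] for r in run))

def compound_sinks(rects, is_sink):
    """Replace every maximal run of consecutive sinks (in index order)
    by the smallest rectangle containing the whole run."""
    out, run = [], []
    for r in rects:
        if is_sink(r):
            run.append(r)
        else:
            if run:
                out.append(bounding_box(run))
                run = []
            out.append(r)
    if run:
        out.append(bounding_box(run))
    return out
```

Because a sequence of sinks forms a descending staircase, the compound rectangle R* stays disjoint from all remaining regions, which is what lets the definitions of truncated and culled carry over.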
Let R be a truncated set. We say two indices i < j form a subproblem with respect to a dependency graph G(R) if R_i and R_j are sources in G(R) and if there does not exist a region R_k with i < k < j that is also a source. With slight abuse of notation, we say that [R_i, R_j] is a subproblem of G(R). At later stages we will consider some altered dependency graph G(R_t) and will refer to subproblems [R_l, R_m] of G(R_t).
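Under this definition, once the sources of the dependency graph are known, the subproblems are simply consecutive pairs of sources. A minimal sketch (assuming the source indices have already been extracted from G(R)):

```python
def subproblems(source_indices):
    """Consecutive pairs of source indices form the subproblems [R_i, R_j]."""
    s = sorted(source_indices)
    return [(s[k], s[k + 1]) for k in range(len(s) - 1)]
```

With a single source there is no subproblem, and k sources yield k − 1 subproblems partitioning the index range between the first and last source.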
The algorithm sketch. The core of our algorithm is rather straightforward: it is an iterative strategy, where at each iteration t we have an (implicitly truncated) set R_t and a queue of subproblems of G(R_t). In each iteration, we dequeue a subproblem [R_i, R_j] of G(R_t), retrieve p_i, p_j to replace R_i and R_j, and (implicitly) re-truncate. We maintain the following invariant:

▶ Invariant 1.
For each iteration, when we consider a subproblem [R_i, R_j], we have a pointer to the region R which stores p_{i−1}^xMax and the region R which stores p_{j+1}^yMax.

Observe that for all subproblems [R_i, R_j] of G(R = R⋆), we have p_{i−1}^xMax = p_{i−1} and p_{j+1}^yMax = p_{j+1}. We sketch Algorithm 1. We want to prove that its runtime matches the value CP(R, P) of Theorem 8. This would trivially be true if, for each subproblem [R_i, R_j] of G(R_t), we had R_i, R_j ∈ R̃(P). Unfortunately that is not always the case, and thus we resort to a more involved argument to prove the following theorem. In the remainder of this section, we show that the algorithm's running time is O(A(R, R⋆, P)).

▶ Theorem 10.
Let R be a truncated set, let R⋆ be its respective canonical set, let Ξ be built on R⋆, and let Algorithm 1 run on R⋆ as input. Let Algorithm 1 consider, for each iteration t, a subproblem [R_{i(t)}, R_{j(t)}] with i(t) < j(t) − 1. Let R_A(R⋆, P) = ∪_t {R_{i(t)}, R_{j(t)}}. Let V_i(P) and H_i(P) refer to subsets of R, not R⋆. Then:

A(R, R⋆, P) = ∑_{R_i ∈ R_A(R⋆, P)} ( C + log |V_i(P)| + log |H_i(P)| ) ≤ CP(R, P).

Algorithm 1:
Algorithm sketch, assuming R is canonical.
Result: The pointer structure Ξ*. (Runtime of each step in parentheses.)

Q ← subproblems(G(R))  (Preprocessing)
while Q ≠ ∅ do
  [R_i, R_j] ← Q.DeQueue()  (O(1))
  p_i, p_j ← Retrieve(R_i, R_j)  (2C + O(1))
  p_i^xMax, p_j^yMax ← Compare((p_i, p_{i−1}^xMax), (p_j, p_{j+1}^yMax))  (2C + O(1))
  if p_i not dominated by p_i^xMax, p_j^yMax then Ξ*.Append(p_i after p_{i−1}^xMax)  (O(1))
  if p_j not dominated by p_i^xMax, p_j^yMax then Ξ*.Append(p_{j+1}^yMax after p_j)  (O(1))
  f_i(P) ← gallopingSearch(p_i^xMax, V_i)  (O(log |V_i(P)|))
  g_j(P) ← gallopingSearch(p_j^yMax, H_j)  (O(log |H_j(P)|))
  R_{t+1} ← ImplicitTruncate(R_t − R_i − R_j + p_i + p_j)  (O(1))
  DetermineSubproblems(R_{t+1}, f_i(P), g_j(P))  (O(1))
  foreach subproblem [R_c, R_d] of G(R_{t+1} ∩ [R_i = p_i, R_j = p_j]) do
    Q.Queue([R_c, R_d])  (O(1), charged to [R_c, R_d])
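The 2C terms in the listing count point retrievals: in the reconstruction phase, access to P is only through such retrievals, each at cost C. This accounting can be modelled with a small wrapper (a sketch; the class and method names are ours, not from the paper):

```python
class RetrievalOracle:
    """Models implicit access to the hidden point set P: each distinct
    point retrieval costs C; repeated requests for a point are cached."""

    def __init__(self, points, C=1):
        self._points = points   # hidden: the algorithm must not scan this directly
        self.C = C
        self.cost = 0
        self._cache = {}

    def retrieve(self, i):
        if i not in self._cache:
            self.cost += self.C          # pay C only on first access
            self._cache[i] = self._points[i]
        return self._cache[i]
```

An algorithm that touches only the points charged in Theorem 10 keeps `cost` proportional to the number of charged regions times C, instead of the Ω(nC) cost of reading all of P.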
Proving Theorem 10. This theorem describes an intuitive "runtime allowance" that Algorithm 1 has. We first prove three lemmas about subproblems encountered by Algorithm 1.

▶
Lemma 11.
Let R be a canonical set and R i ∈ R . Algorithm 1 encounters a subproblem [ R i , · ] or [ · , R i ] if and only if R i is intersected by the Pareto front of P . Proof.
The region R_i is not intersected by the Pareto front of P if and only if R_i is dominated by a point p_j ∈ P. Let p_j appear on the Pareto front of P (via transitivity of domination, we can always obtain such a p_j). The iterative procedure must consider p_j before p_i, since R_j prevents R_i from being a source in the dependency graph; but when R_j is considered, R_i is truncated away. Conversely, the graph must always have at least one source, so if R_i is never removed by truncation, it must eventually become a source. ◀

▶ Lemma 12.
Let R be a canonical set. Algorithm 1 encounters only subproblems [R_i, R_j] where either j = i + 1, or R_i ∈ R̃(P), or R_j ∈ R̃(P); moreover, R_i ∉ R̃(P) if and only if |V_i(P)| = |H_i(P)| = 1 (the same holds for R_j).

Proof. If R is a canonical set, then there cannot be any subproblem [R_i, R_j] of G(R) where R_i and R_j are both sinks in G(R). As a consequence, for each [R_i, R_j], either R_i ∈ R̃(P) or R_j ∈ R̃(P), and R_i ∉ R̃(P) implies V_i(P) = H_i(P) = {R_i}.

In later iterations, we cannot immediately guarantee that R_t is canonical, so this argument no longer applies directly. Via Lemma 11 we know that R_i and R_j are both intersected by the Pareto front of P. Thus, R_i, R_j ∉ R̃(P) implies that R_i and R_j are both sinks in the original graph G(R) (as R̃(P) is defined on the original truncated set). Thus R_i ∉ R̃(P) implies V_i(P) = H_i(P) = {R_i}.

What remains is to show that for each subproblem either R_i or R_j does lie in R̃(P). Let i < j − 1. Then, if R_i and R_j are both sinks, by Lemma 4 the region R_{i+1} or R_{j−1} must also be a source, which contradicts the assumption that [R_i, R_j] is a subproblem. ◀

▶ Lemma 13.
Let R be a canonical set. If Algorithm 1 encounters a subproblem [R_i, R_j] followed by a subproblem [R_i = {p_i}, R_k], then R_k ∈ R̃(P).

Proof.
By the argument of Lemma 12, R_k is intersected by the Pareto front of P. Moreover, after the iteration t' in which the algorithm considers [R_i, R_j], the region R_i has no outgoing edges in any iteration t with t' < t. Hence, if [R_i = {p_i}, R_k] is a subproblem, the region R_k has at least one outgoing arrow, and thus R_k ∈ R̃(P). ◀

These three lemmas imply Theorem 10, which we prove via a charging scheme relating the algorithm's runtime to CP(R, P):

Proof of Theorem 10.
Recall that CP(R, P) = ∑_{R_k ∈ R̃(P)} ( C + log |V_k(P)| + log |H_k(P)| ). Let [R_i, R_j] be the first subproblem considered that has R_i as its left boundary. By Lemma 12, at least R_i or R_j is in R̃(P); hence we charge C time to either the term (C + log |V_i(P)| + log |H_i(P)|) or (C + log |V_j(P)| + log |H_j(P)|) in the sum of CP(R, P). Moreover, R_i ∉ R̃(P) implies log |V_i(P)| = log |H_i(P)| = 0; hence including these two terms does not increase the sum's value. For subsequent subproblems [R_i, R_k], Lemma 13 guarantees that R_k ∈ R̃(P). Hence the term (C + log |V_k(P)| + log |H_k(P)|) in the sum of A(R, R⋆, P) can be charged to the term (C + log |V_k(P)| + log |H_k(P)|) in the sum of CP(R, P). ◀

Figure 10
The construction of the subproblem tree. Left: a subproblem of a canonical set with the vertical arrows drawn. Middle: the children of this subproblem with the horizontal arrows drawn. Right: the recursion continued one additional step.
The subproblem tree.
Theorem 10 shows that if we can execute the described algorithm in the specified running time, then CP(R, P) is tight and we have obtained an uncertainty-region optimal algorithm. However, in order to achieve this running time, in each iteration we must determine the new subproblems efficiently. This is why we define a subproblem tree on the original dependency graph G(R). The subproblem tree, denoted by T_R, is a range tree on the interval [1, n] ⊂ Z (Figure 10). The root node of the subproblem tree stores the interval [1, n]. If R is a canonical set, the subproblems of R partition R, and the root node has a child for each subproblem [R_i, R_j], where the child stores the interval [i, j] and a pointer to R_i and R_j. We construct the subsequent children as follows: for each node [i, j], we remove all outgoing arrows from R_i and R_j, and we create a child node for each subproblem of G([R_i, R_j]) without these arrows. Note that each node has at least two children, as removing the outgoing arrows from R_i and R_j creates at least one additional source R_k with k ∈ (i, j), while R_i and R_j remain sources in G([R_i, R_j]).

Here we elaborate on the preprocessing procedure. First, we transform a set R® of m axis-aligned pairwise disjoint rectangles into a truncated set R◻ with n elements in O(m log m) total time. Next, we construct a canonical set R⋆ and the auxiliary data structure Ξ (which consists of the subproblem tree T_R◻ and some additional pointers) in O(n log n) time. Specifically, we define Ξ as follows.

Defining Ξ. Given a canonical set R⋆, let Ξ consist of G(R⋆) and the tree T_R⋆, augmented with the following attributes stored for every region R_i ∈ R (Figure 11):
1. A binary search tree on V_i and H_i from G(R⋆).
2. A pointer to V_i^next and H_i^prev in R⋆.
3. A pointer to the region R_j with the highest index j such that R_i ∈ V_j (the back pointer), and a pointer to the region R_j with the lowest index j such that R_i ∈ H_j (the forward pointer).
4. A pointer to the highest node in T_R⋆ that stores an interval [·, i], and a pointer to the highest node in T_R⋆ that stores an interval [i, ·].
5. If R_i is a compound region, an array of all the regions compounded in R_i.

Creating a truncated set.
We consider the bottom left vertices of all regions in R®, and construct B_R® together with a range tree on the horizontal edges of B_R® [9] in O(m log m) time. For each region R ∈ R® we detect whether R is negative by performing a point location with its top right vertex on the interior of B_R®; if it is negative, then it is discarded. If a region R ∈ R® is not negative, then by Lemma 3 we know that R ∩ B_R® is a staircase of constant complexity, which we compute in logarithmic time using binary search on B_R®. We flag each non-negative R ∈ R® whose interior intersects B_R®, and store its region after truncation. This results in a set R◻ of n pairwise disjoint axis-aligned rectangles, which we sort and re-index based on their intersection with B_R◻ in O(m log m) time, and conclude:

▶ Lemma 14.
For any set R® of m pairwise disjoint axis-aligned rectangles, we can construct its truncated set R◻ of n rectangles in O(m log m) time.

Recall that for any truncated set R◻ we denote by H_i the set of regions R_j in R with j < i which are horizontally visible from R_i, and by V_i the set of regions R_j with j > i which are vertically visible from R_i. In the remainder of the preprocessing phase, we spend O(n log n) time to transform R◻ into a canonical set R⋆, construct G(R◻) and G(R⋆), and construct the data structure Ξ.
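The negativity test behind Lemma 14 can be stated without the range tree: a region is negative exactly when its top right vertex is dominated by the bottom left vertex of some other region, since that region's point then dominates it for every choice of P. A brute-force O(m²) sketch of this test (the paper achieves O(m log m) via point location on B_R®):

```python
def discard_negative(rects):
    """rects: pairwise disjoint axis-aligned rectangles (x1, y1, x2, y2).
    Drop every region whose top right corner (x2, y2) is dominated by the
    bottom left corner of another region: such a region can never contribute
    a point to the Pareto front."""
    return [r for r in rects
            if not any(s[0] >= r[2] and s[1] >= r[3]
                       for s in rects if s is not r)]
```

The surviving regions are exactly those whose top right vertex lies on or above the staircase B_R® of bottom left vertices.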
▶ Observation 1. For any truncated set R◻, a region R_j ∈ R◻ is vertically visible from a region R_i ∈ R◻ if and only if there exists a face or edge in the vertical decomposition of R which is vertically adjacent to both R_i and R_j.

Using Observation 1, we obtain the following through standard computational geometry:
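For intuition, the visibility relation of Observation 1 can also be checked directly: R_j is vertically visible from R_i when their x-ranges overlap and no third rectangle blocks every vertical sightline between them. A quadratic-time sketch under the assumption that the first rectangle lies below the second (the paper instead reads the relation off the vertical decomposition in O(n log n) time):

```python
def x_overlap(a, b):
    lo, hi = max(a[0], b[0]), min(a[2], b[2])
    return (lo, hi) if lo < hi else None

def vertically_visible(a, b, rects):
    """True when some vertical segment joins rectangle a to rectangle b
    without crossing a third rectangle. Assumes a lies below b."""
    span = x_overlap(a, b)
    if span is None or a[3] > b[1]:
        return False
    lo, hi = span
    # x-intervals of rectangles lying strictly between a and b
    blockers = sorted((max(c[0], lo), min(c[2], hi)) for c in rects
                      if c not in (a, b) and c[1] >= a[3] and c[3] <= b[1]
                      and max(c[0], lo) < min(c[2], hi))
    covered = lo
    for s, e in blockers:          # sweep: does the union cover [lo, hi]?
        if s > covered:
            return True            # a gap means an unblocked sightline exists
        covered = max(covered, e)
    return covered < hi
```

This is exactly the "line of sight" used in the proofs of Lemma 18 and Lemma 20; the vertical decomposition simply precomputes all such sightlines at once.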
Lemma 15.
For any truncated set R◻ of n pairwise disjoint axis-aligned rectangles, we can construct its canonical set R⋆ and Ξ in O(n log n) time.

Proof.
A vertical or horizontal decomposition has a number of faces and edges which is linear in the number of input vertices, and can be constructed in O(n log n) time [9]. Given the vertical decomposition of R◻, we can traverse it in linear time to store for each region R_i the set V_i. Similarly we can identify and store H_i for each R_i, and in O(n log n) total time we construct a binary search tree on each set H_i and V_i to obtain Attribute 1. For each set V_i, we identify V_i^next in logarithmic time by searching for the left-most bottom-left endpoint right of the vertical slab through R_i, to obtain Attribute 2.

Through this procedure, we construct the dependency graph G(R◻) in O(n log n) time by iterating over all nodes in this graph. In linear time, we can identify the connected components of G(R◻) and the regions which are both a source and a sink in G(R◻). From Lemma 4 we know that we can solve each connected component of G(R◻) independently and that the solutions must be concatenated through the regions that are both a source and a sink. We store the connected components of G(R◻) as a doubly linked list and remove all regions that are both a source and a sink from R◻ to create a culled set.

Figure 11
Two choices of P for the same set R. The sets V_i(P) and H_j(P) are shown in orange and blue respectively. Left: we show V_i^next, the back pointer, and f_i(P). Right: we show H_j^prev, the forward pointer, and g_j(P).

To transform a culled set into a canonical set, we identify all sinks in the graph in linear time (by checking whether |V_i| = |H_i| = 1) and we iterate over all regions in order of their index. Neighbouring sinks get recursively grouped into a compound region, and this procedure creates a canonical set in linear time. For each region compounding k regions, we construct Attribute 5 in O(k) time. After having compounded all regions, we do a linear-time scan to re-index all the (compound) regions so that all indices are consecutive, and we obtain a canonical set R⋆. During this linear-time scan, we identify for each R_i the region of its back pointer and forward pointer (Attribute 3) in logarithmic time, by searching through the vertical and horizontal decomposition. Moreover, whenever we compound a set [R_i, R_{i+k}] into a region R, we make sure to remove [R_i, R_{i+k}] from G(R◻) and replace it with R (where all arrows pointing to a region in [R_i, R_{i+k}] now point to R). In this way, we simultaneously create G(R⋆).

Lastly, we want to obtain from a canonical set R⋆ its subproblem tree T_R⋆ in O(n) time, using the previously constructed G(R⋆). This can be done as follows: first we identify the subproblems of G(R⋆) in linear time. Then for each subproblem [R_i, R_j] of G(R) we (temporarily) remove all outgoing arrows from R_i and R_j from the graph, and for each node that has an arrow from R_i or R_j we check in constant time whether it becomes a source node. This gives us the child nodes of the node that stores [i, j] in T_R⋆. During this process, we store for each region R_i a pointer to the largest interval [i, ·] in T_R⋆
(which must always exist) in constant additional time per region (Attribute 4). Applying this procedure recursively takes time linear in the number of edges in G(R), which itself is linear in the number of cells of the vertical and horizontal decomposition of R◻, which concludes the lemma. ◀

Lemma 14 and Lemma 15, together with the observation that n ≤ m, immediately imply Theorem 16.

▶ Theorem 16.
For any set R® of m pairwise disjoint axis-aligned rectangles, we can construct its truncated set R◻, its canonical set R⋆, and Ξ in O(m log m) time.

We want to run Algorithm 1, whilst maintaining Invariant 1, in O(A(R, R⋆, P)) time (Theorem 10). First, we argue that the reporting (appending) step of the algorithm is correct:

▶ Lemma 17.
For any iteration t, for any subproblem [R_i, R_j] of G(R_t), the point p_i appears on the Pareto front of P if and only if p_i is not dominated by p_i^xMax or p_j^yMax.

Proof.
Let p_i not be dominated by p_i^xMax and p_j^yMax, but dominated by some point p_k. Then k < i or k > j, because R_i and R_j are both sources in G(R_t). If k < i, then the x-coordinate of p_k is greater than that of p_i, and thus p_i^xMax ≠ p_i. Then the point p_i^xMax has greater x-coordinate than p_i, it lies in some region R ≠ R_i, and since R precedes R_i and contains p_i^xMax, its bottom facet must lie above the top facet of R_i. Thus p_i^xMax dominates p_i, which is a contradiction. If j < k, then p_j^yMax ≠ p_j and the symmetrical argument applies. ◀

The previous lemma implies that if Invariant 1 is maintained, we can iteratively identify points that appear on the Pareto front. Lemma 4 guarantees that for each iteration t, for each subproblem [R_i, R_j], the Pareto front of {p_i^xMax} ∪ [p_i, p_j] ∪ {p_j^yMax} is a connected subchain of the Pareto front of P. Hence we can safely append p_i after p_i^xMax. What remains to show is that we can maintain Invariant 1 and identify the subproblems of R_t efficiently.

Figure 12
An illustration of the argument of Lemma 20. If R_k loses the incoming arrow from X, there must be a directed path from f_i(P) or g_j(P) to R_k, or else R_k = f_i(P) or R_k = g_j(P).

Identifying subproblems.
Consider an iteration t in which we handle subproblem [R_i, R_j], and let [R_k, R_l] be any subproblem of G(R_{t+1}) that is not already a subproblem of G(R_t). It must be that i ≤ k ≤ l ≤ j (Lemma 4). We need to quickly identify these new subproblems.

▶ Lemma 18.
For any truncated set R_t, for any subproblem [R_i, R_j] of G(R_t), either f_i(P) ∈ V_i or f_i(P) = V_i^next.

Proof.
Any region in [R_i, R_j] that is dominated by a point preceding p_i is dominated by p_i^xMax. The point p_{i−1}^xMax cannot dominate R_i, as otherwise R_i would have been removed during a truncation. Hence, f_i(P) is V_i^next or a region preceding it. Suppose, for the sake of contradiction, that f_i(P) is a region preceding V_i^next and not in V_i. Consider any vertical ray from a point in R_i, right of p_i^xMax, that intersects f_i(P) (such a ray must always exist, since f_i(P) precedes V_i^next and is not dominated by p_i^xMax). Since f_i(P) ∉ V_i, this ray must also intersect a region R ∈ V_i (else this ray would be a line of sight to f_i(P), which would imply f_i(P) ∈ V_i). However, then R must precede f_i(P), which contradicts the assumption that f_i(P) was the lowest-indexed region succeeding R_i not dominated by p_i^xMax. ◀

▶ Corollary 19.
Let R_t be a truncated set and [R_i, R_j] be a subproblem. Given Invariant 1 and Ξ, we can identify f_i(P) in O(log |V_i(P)|) time using the folklore galloping search.

Proof.
The data structure Ξ stores for R_i the set V_i as a balanced binary search tree (Attribute 1). The set V_i(P) is a prefix of V_i which ends at f_i(P) ∈ V_i (or, in the case that V_i(P) = V_i, f_i(P) = V_i^next). Thus, given Invariant 1, we can use p_i^xMax to identify V_i(P) in O(log |V_i(P)|) time by using the folklore galloping (exponential) search by Bentley and Yao. If V_i(P) = V_i, we refer to V_i^next, which is stored in Ξ (Attribute 2). ◀

Next, we prove a lemma that helps us to identify the subproblems of G(R_{t+1}):
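The galloping (exponential) search of Bentley and Yao finds the end of such a prefix in time logarithmic in the answer's position rather than in the list's length — exactly the O(log |V_i(P)|) bound used above. A minimal sketch over a sorted array (an array stands in here for the balanced search tree of Attribute 1):

```python
from bisect import bisect_right

def galloping_search(arr, key):
    """Return the number of elements of the sorted list `arr` that are <= key,
    using O(log k) comparisons where k is that count (Bentley-Yao)."""
    hi = 1
    while hi <= len(arr) and arr[hi - 1] <= key:
        hi *= 2                          # gallop: double the probe index
    lo = hi // 2                         # the answer lies in [lo, min(hi, n)]
    return bisect_right(arr, key, lo, min(hi, len(arr)))
```

The doubling phase overshoots the prefix length k by at most a factor two, so the final binary search also runs on a range of size O(k).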
▶ Lemma 20. Let [R_i, R_j] be a subproblem of G(R_t), and denote by v the lowest node in T_R such that the interval [i, j] is stored in v. For any descendant [a, b] of v, there is no region R ∈ [R_a, R_b] that is a source node in G(R_{t+1}) other than possibly R_a, R_b, f_i(P) or g_j(P).

Proof. If f_i(P) equals or succeeds g_j(P), then per definition of f_i(P) and g_j(P) all regions in (R_i, R_j) apart from f_i(P) = g_j(P) are dominated, and therefore removed after the truncation of R_{t+1}. Hence, they cannot be sources in G(R_{t+1}) (Figure 12). Let [a, b] be a descendant of v, and let R_k be a region with k ∈ (a, b) succeeding f_i(P) and preceding g_j(P). Per construction of T_R, each such R_k has at least one incoming arrow from a region X ∈ [R_a, R_b]. The region R_k can only become a source in G(R_{t+1}) if either p_i or p_j dominates X (else X was dominated by p_i^xMax or p_j^yMax before iteration t and does not exist in G(R_t)).

We consider the case where p_i dominates X (Figure 12). If p_i dominates X, then X lies strictly left of the vertical line through p_i, and R_k intersects the vertical halfslab of X.

Figure 13
Left: the first case of the proof of Lemma 21, where p must dominate the remaining regions with an arrow to f_i(P). Right: the second case, where either q sees f_i(P), dominates f_i(P), or the purple region keeps its horizontal arrow to f_i(P).

Similarly, if f_i(P) ≠ R_k, then R_k must lie at least partly right of the vertical line through p_i and below the bottom facet of f_i(P). This means that if R_k lies in the vertical halfslab of X, then it must also lie in the vertical halfslab of f_i(P). The region f_i(P) is therefore a node in G(R_t) with a directed path to R_k, so R_k is not a source node in G(R_{t+1}). ◀

Algorithm 1 runtime.
We further specify the iterative procedure of our algorithm. Our algorithm maintains a queue of subproblems. In iteration t, we dequeue a subproblem [R_i, R_j] of G(R_t), and we denote by v the lowest node in T_R such that the interval [i, j] is stored in v. We can obtain v in constant time via Attribute 4. By Lemma 4, processing [R_i, R_j] does not affect other subproblems which are in the queue before we process [R_i, R_j]. If the algorithm has not yet retrieved p_i nor p_{i−1}^xMax, it retrieves both points using Invariant 1 in 2C time and computes p_i^xMax in constant time. Similarly, we compute p_j^yMax with at most 2C additional time. By Lemma 17, we check in O(1) time whether p_i and p_j appear on the Pareto front, and if so we add them as the respective successor of p_{i−1}^xMax or predecessor of p_{j+1}^yMax. If we have just retrieved p_i, we use galloping search to identify f_i(P) in O(log |V_i(P)|) time (Corollary 19), we set the back pointer (Attribute 3) to null, and (for later use) we store a reference in R_i to f_i(P). If we did not retrieve p_i this iteration, we retrieved it in a prior iteration and we use the pre-stored result f_i(P) in O(1) time. We do the same for g_j(P) in O(C + log |H_j(P)|) time. We briefly remark the following claim.

▶ Lemma 21.
Let [R_i, R_j] be a subproblem of G(R_t) and let f_i(P) precede g_j(P). Then the region f_i(P) is a source in G(R_{t+1}) if and only if: (1) the forward pointer of f_i(P) is null, or (2) the region resulting from the forward pointer has been retrieved in an iteration t' < t.

Proof.
Suppose that the pointer is null. If there is no region R_k for which f_i(P) ∈ H_k(P), then f_i(P) has no incoming horizontal arrows. If there is a region R_k for which f_i(P) ∈ H_k(P), then there is a point p, retrieved in an earlier iteration, that is horizontally visible from f_i(P) and that set the pointer to null (Figure 13, left). The point p dominates all remaining regions with a horizontal arrow to f_i(P). If the region resulting from the forward pointer has been retrieved in an iteration t' < t, then all regions with a horizontal pointer to f_i(P) must have been considered by the algorithm, so f_i(P) is not dominated. By definition, all regions preceding f_i(P) in [R_i, R_j] are dominated by p_i^xMax; thus, if f_i(P) has no incoming horizontal arrows, it must be a source in G(R_{t+1}).

If the pointer is not null and the region resulting from the forward pointer has not yet been retrieved in an earlier iteration, then f_i(P) must have at least one incoming horizontal arrow. Indeed, suppose that all regions with a horizontal pointer to f_i(P) that are not yet

Figure 14
Left: Case 1, where f_i(P) and g_j(P) lie in the same grandchild [a, b]. Right: Case 2, where they do not.

retrieved are dominated by a point q retrieved prior to the current iteration. Then either q dominates f_i(P), contradicting the assumption that f_i(P) precedes g_j(P), or the retrieval of q would have set the forward pointer of f_i(P) to null. ◀

For ease of exposition, we assume f_i(P) and g_j(P) are not compound regions. For compound regions, we refer to Appendix B. We distinguish between two cases based on which children of v contain f_i(P) and g_j(P) (Figure 14). Note that we never add a subproblem [R_a, R_b] if b = a + 1 (as such a subproblem does not satisfy the premise of Theorem 10). Instead, we charge retrieving and comparing p_a and p_b immediately, with at most 4C overhead.

Case 1: f_i(P) and g_j(P) are contained in the same grandchild [a, b] of v. We check in constant time whether f_i(P) and g_j(P) are sources in G(R_t) (by Lemma 21). Note that either f_i(P) or g_j(P) must be a source. Let R_k = f_i(P) and R_l = g_j(P).

If both f_i(P) and g_j(P) are sources, then by Lemma 20 the only three subproblems in G(R_{t+1}) contained in [R_i, R_j] are [R_i = p_i, R_k], [R_k, R_l] and [R_l, R_j = p_j]. In this case p_{k−1}^xMax = p_i^xMax and p_{l+1}^yMax = p_j^yMax. If k = l − 1, we immediately retrieve p_k and p_l in 2C time as the aforementioned overhead. Else, we add to [R_k, R_l] a reference to p_{k−1}^xMax and p_{l+1}^yMax to maintain Invariant 1, and add the subproblem [R_k, R_l] to the queue.

If f_i(P) is a source and g_j(P) is not, by the same reasoning the only subproblems are [R_i, R_k] and [R_k, R_j]. We check if k = j − 1, and otherwise add [R_k, R_j] to the queue with a reference to p_i^xMax. The case where f_i(P) is not a source and g_j(P) is, is symmetric to the previous one.

Case 2: f_i(P) ∈ [R_a, R_b] and g_j(P) ∈ [R_e, R_f] for distinct children [a, b] and [e, f] of v. In this case, per construction of T_R⋆, each child [c, d] of v with b ≤ c < d ≤ e is a subproblem of G(R_{t+1}). We briefly note that either c < d − 1, or [c, d] neighbours a child of v for which this is true (else, regions could have been compounded). Hence, by Theorem 10, if c = d − 1 we charge the (at most 4C) retrieval cost to the neighbour to immediately retrieve p_c and p_d and possibly add them to Ξ* (again as the aforementioned overhead). If c < d − 1, then per construction of T_R⋆, the point p_c appears on the Pareto front of P. Note that since [c, d] is a child of v, p_{c−1}^xMax can only be p_{c−1} or p_i^xMax. We charge O(1) time to the future processing of [R_c, R_d] to provide four pointers to [R_c, R_d] (to maintain Invariant 1) and add [R_c, R_d] to the queue.

What remains is to handle [a, b] and [e, f]; we describe the procedure for [a, b]. We check in constant time whether f_i(P) is a source using Lemma 21. If it is, then by Lemma 20 the only subproblems of G(R_{t+1}) contained in [R_i, R_b] are [R_i, f_i(P)] and [f_i(P), R_b]. We briefly check if [f_i(P), R_b] is a subproblem of length 2. If so, we retrieve the corresponding points to see if they appear on the Pareto front. Else, we add [f_i(P), R_b] to the queue in constant time via the same procedure as Case 1. If f_i(P) is not a source, then [R_i, R_b] is the only subproblem of G(R_{t+1}) in [R_i, R_b] and we handle it similarly. We conclude:

▶ Theorem 22.
Algorithm 1 constructs Ξ* in O(A(R, R*, P)) = Θ(CP(R, P)) time.

References

[1] Peyman Afshani, Jérémy Barbay, and Timothy M. Chan. Instance-optimal geometric algorithms. Journal of the ACM, 64(1):1–38, 2017.
[2] Michael Ben-Or. Lower bounds for algebraic computation trees. In Proc. 15th Annual ACM Symposium on Theory of Computing, pages 80–86, 1983.
[3] Richard Bruce, Michael Hoffmann, Danny Krizanc, and Rajeev Raman. Efficient update strategies for geometric computing with uncertainty. Theory of Computing Systems, 38(4):411–423, 2005.
[4] Kevin Buchin, Maarten Löffler, Pat Morin, and Wolfgang Mulzer. Delaunay triangulation of imprecise points simplified and extended. Algorithmica, 61:674–693, 2011. doi:http://dx.doi.org/10.1007/s00453-010-9430-0.
[5] Kevin Buchin and Wolfgang Mulzer. Delaunay triangulations in O(sort(n)) time and more. Journal of the ACM, 58(2):6, 2011.
[6] Jean Cardinal, Samuel Fiorini, and Gwenaël Joret. Minimum entropy coloring. In Proc. 16th International Symposium on Algorithms and Computation (ISAAC), pages 819–828. Springer, 2005.
[7] Jean Cardinal, Gwenaël Joret, and Jérémie Roland. Information-theoretic lower bounds for quantum sorting. arXiv preprint:1902.06473, 2019.
[8] Timothy M. Chan. Comparison-based time-space lower bounds for selection. ACM Transactions on Algorithms, 6(2):1–16, 2010.
[9] Mark de Berg, Otfried Cheong, Marc van Kreveld, and Mark Overmars. Computational Geometry: Introduction. Springer, 2008.
[10] Erik D. Demaine, Adam C. Hesterberg, and Jason S. Ku. Finding closed quasigeodesics on convex polyhedra. In Proc. 36th International Symposium on Computational Geometry (SoCG). Schloss Dagstuhl–Leibniz-Zentrum für Informatik, 2020.
[11] Olivier Devillers. Delaunay triangulation of imprecise points, preprocess and actually get a fast query time. Journal of Computational Geometry, 2(1):30–45, 2011.
[12] Jeff Erickson et al. Lower bounds for linear satisfiability problems. In SODA, pages 388–395, 1995.
[13] Jeff Erickson, Ivor van der Hoog, and Tillmann Miltzow. Smoothing the gap between NP and ER. In Proc. IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS). IEEE, 2020.
[14] William Evans, David Kirkpatrick, Maarten Löffler, and Frank Staals. Competitive query strategies for minimising the ply of the potential locations of moving points. In Proc. 29th Annual Symposium on Computational Geometry, pages 155–164. ACM, 2013.
[15] William Evans and Jeff Sember. The possible hull of imprecise points. In Proc. 23rd Canadian Conference on Computational Geometry, 2011.
[16] Esther Ezra and Wolfgang Mulzer. Convex hull of points lying on lines in O(n log n) time after preprocessing. Computational Geometry, 46(4):417–434, 2013.
[17] Martin Held and Joseph S. B. Mitchell. Triangulating input-constrained planar point sets. Information Processing Letters, 109(1):54–56, 2008.
[18] Simon H. Kahan. Real-time processing of moving data. 1992.
[19] David G. Kirkpatrick and Raimund Seidel. Output-size sensitive algorithms for finding maximal vectors. In Proc. 1st Annual Symposium on Computational Geometry, pages 89–96, 1985.
[20] Chih-Hung Liu and Sandro Montanari. Minimizing the diameter of a spanning tree for imprecise points. Algorithmica, 80(2):801–826, 2018.
[21] Maarten Löffler and Wolfgang Mulzer. Unions of onions: Preprocessing imprecise points for fast onion decomposition. Journal of Computational Geometry, 5:1–13, 2014.
[22] Maarten Löffler and Jack Snoeyink. Delaunay triangulation of imprecise points in linear time after preprocessing. Computational Geometry, 43(3):234–242, 2010.
[23] Maarten Löffler and Marc van Kreveld. Largest and smallest convex hulls for imprecise points. Algorithmica, 56(2):235, 2010.
[24] Shlomo Moran, Marc Snir, and Udi Manber. Applications of Ramsey's theorem to decision tree complexity. Journal of the ACM, 32(4):938–949, 1985.
[25] Takayuki Nagai, Seigo Yasutome, and Nobuki Tokura. Convex hull problem with imprecise input and its solution. Systems and Computers in Japan, 30(3):31–42, 1999.
[26] Arnold Schönhage. On the power of random access machines. In International Colloquium on Automata, Languages, and Programming, pages 520–529. Springer, 1979.
[27] Ivor van der Hoog, Irina Kostitsyna, Maarten Löffler, and Bettina Speckmann. Preprocessing ambiguous imprecise points. Schloss Dagstuhl–Leibniz-Zentrum für Informatik, 2019.
[28] Marc van Kreveld, Maarten Löffler, and Joseph S. B. Mitchell. Preprocessing imprecise points and splitting triangulations. SIAM Journal on Computing, 39(7):2990–3000, 2010.
[29] Andrew Chi-Chih Yao. A lower bound to finding convex hulls. Journal of the ACM, 28(4):780–787, 1981.
[30] Jiemin Zeng. Integrating Mobile Agents and Distributed Sensors in Wireless Sensor Networks. PhD thesis, Stony Brook University, Stony Brook, NY, 2016.
A Reviewing lower bounds
The folklore worst-case lower bound for an algorithmic problem P with input X is:

Worst-case lower bound(P) := min_A max_X Runtime(A, X),

where each A is an algorithm that solves P, for some definition of solving. Afshani, Barbay and Chan [1] observe that there are three common techniques to prove lower bounds within computational geometry: direct arguments based on counting or information theory; topological arguments, as used by e.g. Yao [29] or Ben-Or [2] (sometimes referred to as algebraic decision tree arguments); or arguments based on Ramsey theory, as used by e.g. Moran, Snir and Manber [24]. The latter two techniques decompose algorithms into decision trees and reason about their depth. In traditional computation models decisions are binary; therefore, without additional information about the decision tree structure of the specific problem P, the best possible lower bound on its tree depth is Ω(log L), where L is the number of distinct outcomes of P.

Models of computation. Applying these techniques to bound the running time of the algorithms A requires a precise definition of the model of computation used for the algorithmic analysis. The classical argument by Ben-Or [2] assumes that the computation can be modeled by an algebraic decision tree, where in each node a binary decision is taken, at which the algorithm branches based on an algebraic test. Afshani, Barbay and Chan investigate a stronger definition of an algorithmic lower bound. They reason that the computational power that comes from the abstract algebraic decision tree model, where algebraic test functions are bounded only in the number of arguments and not in their degree, is too large for a more fine-grained analysis of algorithmic running time. They restrict the class of algorithms that they consider for their competitive analysis to algebraic decision trees where each test is a multilinear function (a function that is linear separately in each of its variables) with a constant number of variables. We share the sentiment that a computational model which allows arbitrary algebraic computations in constant time is unrealistically powerful, but note that the alternative model is perhaps too restrictive, as it becomes difficult, if not impossible, to express computations such as higher-dimensional range searching using only multilinear functions.

Recently, Erickson, van der Hoog and Miltzow [13] noted that computations that involve data structures do not only need to make decisions, but also need to be able to access memory. Memory is inherently discrete: a model that supports only real-valued algebraic decisions either cannot access memory, or can access discrete values with real-valued computations, which would imply that P = PSPACE [26]. Fueled by the desire to analyse algorithms within computational geometry, they (re)define the real RAM. We use their definition of the real RAM to be able to define lower bounds for the preprocessing model (the preprocessing model inherently accesses memory, as it needs to be able to use an auxiliary data structure Ξ). For completeness, we summarize their definition, and how it enables an information-theoretic lower bound even in the presence of a pre-stored structure Ξ, at the end of this section.
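As a small self-contained illustration of the information-theoretic argument referenced above (the code and names are ours, purely for illustration, not part of the formal model): any comparison-based computation that distinguishes L outcomes must, in the worst case, perform at least ⌈log₂ L⌉ comparisons, since its decision tree has L leaves. For sorting n items, L = n!:

```python
import math

def comparison_lower_bound(num_outcomes: int) -> int:
    # A binary decision tree with num_outcomes leaves has depth >= ceil(log2(num_outcomes)).
    return math.ceil(math.log2(num_outcomes))

def merge_sort_counting(a, counter):
    # Merge sort that counts its comparisons, to contrast against the bound.
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left = merge_sort_counting(a[:mid], counter)
    right = merge_sort_counting(a[mid:], counter)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        counter[0] += 1  # one comparison
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

n = 6
bound = comparison_lower_bound(math.factorial(n))  # 6! = 720 outcomes, so bound = 10
counter = [0]
merge_sort_counting([5, 2, 0, 4, 1, 3], counter)
# This particular run spends 11 comparisons; no comparison-based algorithm can
# sort every 6-element input with fewer than 10 comparisons in the worst case.
```

The same counting argument is what carries over to the real RAM below: a pre-stored structure Ξ cannot reduce the number of branching instructions below the logarithm of the number of outcomes.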
Better than worst-case optimality.
A natural refinement of the worst-case lower bound is the instance lower bound. Given an algorithmic problem P with input X, the instance lower bound is defined as:

Instance lower bound(P, X) := min_A Runtime(A, X).

Recall the example in the introduction, where we perform a binary search to see whether a value q is contained in a sorted sequence of numbers X. For each instance (X, q), there exists a "lucky" algorithm that guesses the location of q in X in constant time. Thus, the instance lower bound for binary search is constant, even though no algorithm can perform binary search in constant time in a comparison-based RAM model. Fine-grained algorithmic analysis is desirable, yet instance optimality is unobtainable. It is therefore unsurprising that there is a rich tradition of algorithmic analyses that capture performance better than worst-case optimality. Many attempts parametrize the algorithmic problem to better enable its analysis. For example, there is output-sensitive analysis, as used by Kirkpatrick and Seidel [19], where the running time of the algorithm depends on the size k of the output. Other parameters include geometric restrictions such as fatness, the spread of the input, or the number of reflex vertices in a (simple) polygon. Such parameters are hard to apply in the preprocessing model with implicit representation, as the auxiliary structure Ξ allows one to bypass the natural lower bound that these parameters bring. For example, an output-sensitive lower bound is not applicable, as output of any size can be computed in the preprocessing phase and merely referred to in the reconstruction phase in O(1) time.

Better than worst-case optimality without additional parameters.
Afshani, Barbay andChan propose an alternative definition of instance optimality which is not inherently unob-tainable. They restrict the algorithms A that solve P and consider the input I together witha permutation σ . They analyse the running time of A , conditioned on that it receives input X in the order given by σ . They then compare algorithmic running time based on the worstchoice of σ :Instance lower bound in the order oblivious setting( P , X ) := min A max σ Runtime(
A, X, σ ) . Intuitively, a permutation σ can force the algorithm to make poor decisions by placingthe input in a bad order and they assume that an algorithm receives “the worst orderof processing the input” to avoid the unreasonable computational power that a guessingalgorithm has. The instance lower bound in the order oblivious setting for our binary searchexample would be Ω( n ), as there exists a σ for which X is not a sorted set. Given q and( X, σ ), any algorithm then has to spend linear time to check if q is in X .This definition of lower bound would strictly speaking be applicable to the preprocessingmodel: given P and a permutation σ an algorithm can then only retrieve points in the order σ . However, we would argue that this lower bound is not very compatible with the spirit ofthe model. Per definition, one is free to preprocess R , Therefore, during preprocessing itwould not be unreasonable for an algorithm to decide on a favourable order to retrieve thepoints in P . This is why, amongst many alternative stricter-than-worst-case lower bounddefinitions, we propose another, specifically for the preprocessing model.Uncertainty-region lower bound( P , R ) := min ( A, Ξ) max ( P respects R ) Runtime( A, Ξ , R , P ) , Denote for any fixed algorithmic problem P , by L ( R ) the number of combinatorially distinctoutcomes of P given R . In the remainder of this section we recall the RAM definition of [13] . van der Hoog, I. Kostitsyna, M. Löffler, B. Speckmann. 25 to show that regardless of ( A, Ξ), Ω(log L ( R )) is an uncertainty region lower bound for thetime required by A to solve P . Recalling the real RAM definition.
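As a toy sketch of the control-flow discipline recalled in this section (the instruction set and names below are our own hypothetical choices, purely for illustration): a program is a fixed, indexed sequence of instructions with an integer program counter, and only a comparison instruction may branch, so distinct outcomes correspond to distinct branch sequences.

```python
def run(program, memory):
    # Toy interpreter: a program is a fixed, indexed sequence of instructions;
    # the integer program counter starts at 1. Only a comparison may branch.
    pc = 1
    comparisons = 0
    while True:
        op = program[pc - 1]
        if op[0] == "add":            # memory[i] <- memory[j] + memory[k]
            _, i, j, k = op
            memory[i] = memory[j] + memory[k]
            pc += 1
        elif op[0] == "cmp":          # if memory[i] > 0 goto t, else goto f
            _, i, t, f = op
            comparisons += 1
            pc = t if memory[i] > 0 else f
        elif op[0] == "halt":         # stop and report an outcome
            return op[1], comparisons

# A two-outcome program needs at least ceil(log2(2)) = 1 comparison:
sign_test = [
    ("cmp", 0, 2, 3),        # line 1: branch on the sign of cell 0
    ("halt", "positive"),    # line 2
    ("halt", "nonpositive"), # line 3
]
print(run(sign_test, {0: 5.0}))   # -> ('positive', 1)
print(run(sign_test, {0: -1.0}))  # -> ('nonpositive', 1)
```

Without the comparison instruction, every run of this interpreter would walk the instruction sequence in order and halt at the first outcome, which is exactly the observation used in the counting argument below.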
If the reader is confident in the ability of the RAM model to support such a lower bound, we advise them to skip ahead. Erickson, van der Hoog and Miltzow define the real RAM in two steps. First, they define computations based on the (discrete) word RAM, so that discrete memory can be accessed without unreasonable computational power. Then, they augment the word RAM with separate real-valued computations that only operate on values stored within the discrete memory cells. Their operations include memory manipulation, real arithmetic, and comparisons (where a comparison verifies whether the real value stored in a memory cell is greater than 0). For an extensive overview of the computations that they allow, we refer to Table 1 in [13]. A program on the real RAM consists of a fixed, finite, indexed sequence of read-only instructions. The machine maintains an integer program counter, which is initially equal to 1. At each time step, the machine executes the instruction indicated by the program counter. Every real RAM operation increases the program counter by one, apart from a comparison operation, which ends in a goto statement that can set the program counter to any discrete value.

This model thereby immediately allows the classical information-theoretic lower bound argument, even if there is some pre-stored data Ξ within memory. Indeed, let P be an algorithmic problem with L distinct outcomes, and fix a program (algorithm) that reports the correct outcome. Each outcome may be described by the sequence of instructions that lead to it, together with a halt instruction that tells the program to stop and output the result. Hence, the program only terminates on the correct outcome if it arrived there via a goto statement from a comparison instruction (all other instructions only increase the program counter by 1, so without comparisons the algorithm terminates at the first outcome in the sequence). It follows that any sequence of instructions can be converted into a binary tree, where each node is a comparison instruction and where the leaves of the tree are lines in the sequence that store an outcome with a halt instruction. Hence, regardless of Ξ, there is an outcome stored as a leaf in the tree such that the program requires Ω(log L) comparison instructions before it arrives at that leaf.

B Handling compound regions
We describe the algorithmic procedure for when Algorithm 1 encounters a subproblem [R_i, R_j] where f_i(P) or g_j(P) is a compound region. Let f_i(P) be a compound region R. Then, per definition, R is a sink in the original graph G(R). Consequently, the region R′ in the canonical set R* that succeeds R must have no more remaining incoming vertical arrows (otherwise, R would not have been visible from the just-processed R_i). The region R′ itself cannot be a compound region, since otherwise R and R′ could have been compounded together. We set f_i(P) to be R′ instead, and continue as normal.

We set the compound region R aside with a reference to p^{xMax}_i, and add it to a separate queue that we handle at the algorithm's termination. We charge the O(1) time this takes to the iteration t in which we added R to the special queue. Per definition, for each region R_i there is a unique f_i(P), so R_i gets charged at most once in this manner. It is possible that in a later iteration, when a subproblem [R_{i′}, R_{j′}] is considered by Algorithm 1, the region R is g_{j′}(P). In this case, we do not add R to the queue again, but we do store a reference to p^{yMax}_{j′}, and we charge [R_{i′}, R_{j′}] O(1) time for storing this reference.

For any compound region R that is not dominated by a point in P, there must be an iteration in which a subproblem is considered such that f_i(P) = R or g_j(P) = R, and thus R must be in the special queue. When we process the special queue, we do the following: we use p^{xMax}_i to identify the prefix of the original regions stored in R that are dominated by points preceding R, in O(log |V_i(P)|) time using galloping search (we charge the prior f_i(P), and just as above a region can only get charged once).

At this point, we wish to briefly remark upon a possible ambiguity regarding the runtime O(log |V_i(P)|).
In the premise of Theorem 10 we defined the sets V_i(P) as subsets of the truncated set R, not of the canonical set R* that serves as the input of the algorithm. Note that O(log |V*_i(P)|), where V*_i(P) is the corresponding subset of R*, is smaller than O(log |V_i(P)|), since R* can compound regions in V_i(P) together. Throughout Section 5.3 we performed a galloping search over the outgoing edges in the graph G(R*); hence we spent O(log |V*_i(P)|) ≤ O(log |V_i(P)|) time per search. Here, we perform a galloping search over regions in V_i(P) that are compounded (and thus not in R*), and this is the first point where we use the larger O(log |V_i(P)|) runtime. We wish to emphasise that the runtime analysis of Section 4.1 remains correct, as O(log |V_i(P)|) is an over-estimation of the actual time spent on the galloping search. We continue the argument.

Whenever g_j(P) = R, we similarly use p^{yMax}_j to identify the suffix of the original regions stored in R that are dominated by points in P succeeding R. For the at most two regions that are intersected by the vertical line through p^{xMax}_i and the horizontal line through p^{yMax}_j, respectively, we explicitly retrieve their points in order to determine whether they are dominated or not. We charge this 2C retrieval time to R_i and R_j.
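The galloping (exponential) search used above admits a short self-contained sketch. Assuming a monotone predicate `dominated(i)` (a hypothetical interface for illustration: True exactly on the dominated prefix of the original regions stored in the compound region, False afterwards), the length k of that prefix is found with O(log k) predicate evaluations by a doubling phase followed by a binary search:

```python
def galloping_prefix(n, dominated):
    # Length of the maximal prefix of indices 0..n-1 on which the monotone
    # predicate `dominated` is True (True on a prefix, then False).
    if n == 0 or not dominated(0):
        return 0
    hi = 1
    while hi < n and dominated(hi):  # doubling phase: O(log k) tests
        hi *= 2
    lo, hi = hi // 2, min(hi, n)     # dominated(lo) is known to be True
    # binary search in (lo, hi] for the first index that is not dominated
    while lo + 1 < hi:
        mid = (lo + hi) // 2
        if dominated(mid):
            lo = mid
        else:
            hi = mid
    return hi

print(galloping_prefix(10, lambda i: i < 7))  # -> 7
print(galloping_prefix(5, lambda i: True))    # -> 5 (whole sequence dominated)
print(galloping_prefix(5, lambda i: False))   # -> 0
```

The doubling phase overshoots the answer k by at most a factor of two, which is what keeps the total number of predicate evaluations at O(log k) rather than O(log n).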