Complete and Sufficient Spatial Domination of Multidimensional Rectangles
Tobias Emrich, Hans-Peter Kriegel, Andreas Züfle, Peer Kröger, Matthias Renz
Tobias Emrich, Harman International
Hans-Peter Kriegel, Ludwig Maximilian University of Munich
Andreas Züfle, George Mason University
Peer Kröger, Ludwig Maximilian University of Munich
Matthias Renz, Christian-Albrechts-Universität zu Kiel

Abstract
Rectangles are used to approximate objects, or sets of objects, in a plethora of applications, systems and index structures. Many tasks, such as nearest neighbor search and similarity ranking, require deciding whether objects in one rectangle A may/must/must not be closer to objects in a second rectangle B than objects in a third rectangle R. Regarding this relation of "spatial domination", it can be shown that using minimum and maximum distances it is often impossible to detect spatial domination. This spatial gem provides a necessary and sufficient decision criterion for spatial domination that can be computed efficiently even in higher dimensional space. In addition, this spatial gem provides an example, pseudocode, and an implementation in Python.

1 Introduction

Minimal bounding rectangles (MBRs) are used as object approximations in a plethora of different applications. For example, MBRs are used to bound spatially extended objects, and MBRs are used as spatial keys for spatial access methods such as the R-Tree [1, 2]. MBR approximations have also become very popular for uncertain databases [3, 4, 5, 6, 7] to approximate all possible locations of an uncertain object.

Figure 1: Spatial pruning on MBRs.

Rectangular approximations are commonly integrated into spatial query processing as a filter step, to identify true hits and true drops efficiently based on rectangular approximations only. Most types of spatial/similarity queries, including k-nearest neighbor (kNN) queries, reverse k-nearest neighbor (RkNN) queries, and ranking queries, commonly require the following information. Given three rectangles A, B, and R in a multidimensional space R^d, the task is to determine whether object A is definitely closer to R than B w.r.t. a distance function defined on the objects in R^d. If this is the case, we say A dominates B w.r.t. R. An example of such a situation is depicted in Figure 1. This concept of domination is a central problem to identify true hits and true drops (pruning).
For example, in the case of a kNN query around R, we can prune B if it is dominated by A w.r.t. R, and for an RkNN query around R, we can prune B if A dominates R w.r.t. B.

The domination problem is trivial for point objects. However, applied to rectangles, the domination problem is much more difficult to solve. Traditionally, the minimal distance and maximal distance between rectangles (min/max-dist) are used to decide which object is closer to another object: if the maximum distance between R and A is lower than the minimum distance between R and B, then, for any points r ∈ R, a ∈ A, and b ∈ B, it must hold that a is closer to r than b. While this implication is correct, the backward direction does not hold. Thus, min/max-dist provides a sufficient but not a complete decision criterion. To illustrate this problem, consider the example shown in Figure 2.

Figure 2: MBR pruning example: incompleteness of min/max-dist.

In this example, we can guarantee that point a spatially dominates point b with respect to R. That is, any point in R must be closer to point a than to point b, as any point in R is located on the same side of the equi-distance line H_ab (or Voronoi line) between a and b. Yet, in this example, we still have MaxDist(a, R) > MinDist(b, R); thus, the min/max-dist approach does not allow us to identify this spatial domination. The error of min/max-dist is incurred by using two different locations of R for the computation of MinDist and MaxDist.

Another existing method for detecting spatial domination on rectangles has been proposed in [8], which exploits the convexity of rectangles to check only the combinations of corners of all three rectangles A, B, and R.
The drawback of this approach is its high run-time, which scales exponentially with the dimensionality of the data space.

This spatial gem revisits a necessary and sufficient decision criterion for spatial domination of rectangles [9] that scales efficiently to high dimensionality. We summarize the theory, describe applications, illustrate examples, and provide an implementation in Python.

2 Complete and Sufficient Spatial Domination

The problem of spatial domination is formally defined as follows.
Definition 1 (Domination). Let A, B, R ⊆ R^d be rectangles and dist be an L_P norm measuring the distance between points in R^d, such as the Euclidean distance (P = 2) or the Manhattan distance (P = 1). The rectangle A dominates B w.r.t. R iff for all points r ∈ R it holds that every point a ∈ A is closer to r than any point b ∈ B, i.e.,

$$Dom_R(A, B) \Leftrightarrow \forall r \in R, \forall a \in A, \forall b \in B: dist(a, r) < dist(b, r) \quad (1)$$

Equation 1 enumerates an (uncountably) infinite number of triples of possible point locations and thus cannot be evaluated directly in this form. A decision criterion that is complete and sufficient, and that can be computed in O(d) time, has been proposed in [9] as follows:

Theorem 1 (Complete and Sufficient Domination). Let
A, B, R ⊆ R^d be rectangles and dist be an L_P norm measuring the distance between points in R^d. Further, let MinDist(X, x) and MaxDist(X, x) denote the minimum and maximum distance, respectively, between a (one-dimensional) interval X = [X^min, X^max] and a scalar x:

$$MinDist(X, x) = \begin{cases} 0 & \text{for } x \in X \\ X^{min} - x & \text{for } x < X^{min} \\ x - X^{max} & \text{for } x > X^{max} \end{cases}$$

$$MaxDist(X, x) = \max(|x - X^{min}|, |x - X^{max}|)$$

Then, the following equivalence holds:
$$Dom_R(A, B) \Leftrightarrow \sum_{i=1}^{d} \max_{r \in \{R_i^{min}, R_i^{max}\}} \left( MaxDist(A_i, r)^P - MinDist(B_i, r)^P \right) < 0 \quad (2)$$

Proof.
A detailed proof of Theorem 1 can be found in [9].

Example 1.
Let us revisit the example in Figure 2 using the Euclidean distance (P = 2), and assume the following coordinates of points and rectangles in this example: a = (0, 2), b = (0, 8), R^min = (2, 0), and R^max = (10, 2). Using Equation 2 we obtain, for the first dimension (i = 1):

$$\max_{r \in \{R_1^{min}, R_1^{max}\}} \left( MaxDist(A_1, r)^2 - MinDist(B_1, r)^2 \right)$$
$$= \max_{r \in \{2, 10\}} \left( MaxDist([0, 0], r)^2 - MinDist([0, 0], r)^2 \right)$$
$$= \max(4 - 4, 100 - 100) = \max(0, 0) = 0$$

and for the second dimension (i = 2):

$$\max_{r \in \{R_2^{min}, R_2^{max}\}} \left( MaxDist(A_2, r)^2 - MinDist(B_2, r)^2 \right)$$
$$= \max_{r \in \{0, 2\}} \left( MaxDist([2, 2], r)^2 - MinDist([8, 8], r)^2 \right)$$
$$= \max(4 - 64, 0 - 36) = \max(-60, -36) = -36$$
The sum over the two dimensions yields 0 + (-36) = -36, and since -36 < 0, the inequality of Equation 2 is satisfied; thus, Dom_R(a, b) holds.

3 Implementation

This section provides an implementation of the complete and sufficient spatial domination decision criterion sketched in Section 2 and described in detail in [9]. Pseudocode can be found in Algorithm 1, which takes three d-dimensional rectangles and an L_P norm as input and decides whether A spatially dominates B with respect to R. The algorithm iterates over all dimensions in Line 2. For each dimension, it uses the minimum and maximum points of rectangle R, and compares MinDist and MaxDist of these points to rectangles A and B in Lines 4-5. Algorithms to compute MaxDist and MinDist are shown in Algorithm 2 and Algorithm 3, respectively. The larger of the two values is aggregated into the final result in Lines 6-10.

For an implementation of Algorithm 1 in Python, see https://github.com/azufle/Spatial_Gems_Spatial_Domination.

Algorithm 1:
Complete and Sufficient Domination of MBRs

  input:  P          // employed L_P norm to measure distance
  input:  d          // number of dimensions
  input:  A, B, R    // three rectangles in R^d
  output: a boolean predicate deciding whether A spatially dominates B w.r.t. R

   1: sum ← 0                                                       // initialize the sum
   2: for i = 1 to d do                                             // iterate over each dimension
   3:   // evaluate both extreme coordinates of R in dimension i
   4:   max1 ← MaxDist(A_i, R_i^min)^P − MinDist(B_i, R_i^min)^P    // first argument of the maximum
   5:   max2 ← MaxDist(A_i, R_i^max)^P − MinDist(B_i, R_i^max)^P    // second argument of the maximum
   6:   if max1 ≥ max2 then                                         // add the larger argument to the sum
   7:     sum ← sum + max1
   8:   else
   9:     sum ← sum + max2
  10:   end if
  11: end for
  12: return sum < 0    // True if sum < 0, False otherwise

Algorithm 2: Maximum Distance between an Interval and a Scalar

  input:  X = [X^min, X^max]    // an interval from X^min to X^max
  input:  x                     // a scalar in R
  output: the maximum distance between X and x (over all points in X)

  1: if |x − X^min| ≥ |x − X^max| then
  2:   return |x − X^min|
  3: else
  4:   return |x − X^max|

Algorithm 3:
Minimum Distance between an Interval and a Scalar

  input:  X = [X^min, X^max]    // an interval from X^min to X^max
  input:  x                     // a scalar in R
  output: the minimum distance between X and x (over all points in X)

  1: if x < X^min then
  2:   return X^min − x
  3: else if x ≤ X^max then
  4:   return 0              // case x ∈ X
  5: else
  6:   return x − X^max      // case x > X^max

4 Applications

In this section, we show how the concept of spatial domination can be used to accelerate candidate pruning in similarity search and recommendation systems.
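Before turning to these applications, Algorithms 1-3 can be sketched compactly in Python. The following is a minimal sketch under the definitions above; function and variable names are illustrative and need not match the reference implementation in the repository linked above:

```python
def min_dist(x_min, x_max, x):
    """Minimum distance between the interval [x_min, x_max] and
    the scalar x (Algorithm 3)."""
    if x < x_min:
        return x_min - x
    if x <= x_max:
        return 0  # x lies inside the interval
    return x - x_max

def max_dist(x_min, x_max, x):
    """Maximum distance between the interval [x_min, x_max] and
    the scalar x (Algorithm 2)."""
    return max(abs(x - x_min), abs(x - x_max))

def dominates(a_min, a_max, b_min, b_max, r_min, r_max, p=2):
    """Complete and sufficient test of Theorem 1 (Algorithm 1): True iff
    rectangle A dominates rectangle B w.r.t. rectangle R under the L_p
    norm. Rectangles are given by their minimum/maximum corner points."""
    total = 0
    for i in range(len(r_min)):
        # Per dimension, take the worse of the two extreme
        # coordinates of R (the inner maximum of Equation 2).
        total += max(
            max_dist(a_min[i], a_max[i], r) ** p
            - min_dist(b_min[i], b_max[i], r) ** p
            for r in (r_min[i], r_max[i])
        )
    return total < 0

# Example 1: points a = (0, 2) and b = (0, 8) as degenerate rectangles,
# R = [2, 10] x [0, 2]; the per-dimension contributions are 0 and -36.
print(dominates((0, 2), (0, 2), (0, 8), (0, 8), (2, 0), (10, 2)))  # → True
```

On this instance the classical min/max-dist test is inconclusive, since MaxDist(a, R) = √104 > MinDist(b, R) = √40, while the criterion of Theorem 1 correctly detects the domination.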
The reverse k-nearest neighbor (RkNN) query problem is: given a point q, retrieve all data points that have q as one of their k nearest neighbors. The geometric pruning based solution to this problem introduced in [10] overcomes limitations of other RkNN approaches: it 1) supports arbitrary values of k, 2) can efficiently deal with database updates, and 3) is applicable to feature spaces of arbitrary dimensionality. Its basic operation, determining whether an object or a page region of a spatial index, e.g., an R-tree, can be pruned based on a bisecting hyperplane defined by two multidimensional points, can be efficiently solved with our spatial domination solution.
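This bisector test is the special case of domination in which A and B degenerate to the single points p and q. A minimal self-contained sketch of this special case (the function name is illustrative; points and page regions are assumed to be coordinate tuples):

```python
def bisector_prunes(p, q, e_min, e_max, P=2):
    """True iff every point of the page region E = [e_min, e_max] lies on
    p's side of the bisecting hyperplane of points p and q, i.e., the
    degenerate domination Dom_E({p}, {q}) of Theorem 1 holds."""
    total = 0.0
    for i in range(len(p)):
        # Per dimension, the worst case is attained at an endpoint of E_i.
        total += max(abs(p[i] - r) ** P - abs(q[i] - r) ** P
                     for r in (e_min[i], e_max[i]))
    return total < 0

# Figure 2 instance: every point of E = [2, 10] x [0, 2] is closer to
# p = (0, 2) than to q = (0, 8).
print(bisector_prunes((0, 2), (0, 8), (2, 0), (10, 2)))  # → True
```

Roughly following [10], a candidate page region E can then be discarded for a query point q as soon as k data points p with bisector_prunes(p, q, e_min, e_max) have been found, since no point in E can have q among its k nearest neighbors.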
Multi-preference recommendation and multi-criteria decision making based on top-k query processing has become a hot topic and has been studied extensively over the last two decades. In the context of multi-criteria top-k query processing, computational-geometry-driven top-k query processing models have gained a lot of interest in recent years [11, 12, 13, 14, 15] and could be successfully adapted to several variants of top-k queries in multi-criteria settings [12]. The principal idea of the above-mentioned line of work is to translate object domination according to multi-criteria preferences into hyperplane bisection of the multidimensional preference space. A key operation shared by all these approaches is, given a halfspace S defined by hyperplane bisection of the multidimensional preference space and an axis-parallel rectangle R, to determine which of the following cases is true: case 1) R is completely covered by S, case 2) R intersects S, or case 3) R does not intersect S at all, as illustrated in Figure 3.

Figure 3: R-domination identification based on hyperplane bisection [12].

References

[1] A. Guttman. R-Trees: A dynamic index structure for spatial searching. In ACM SIGMOD, pages 47–57, 1984.
[2] N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger. The R*-Tree: An efficient and robust access method for points and rectangles. In ACM SIGMOD, 1990.
[3] George Beskales, Mohamed A. Soliman, and Ihab F. Ilyas. Efficient search for the top-k probable nearest neighbors in uncertain databases. Proc. VLDB Endow., 1(1):326–339, 2008.
[4] J. Chen and R. Cheng. Efficient evaluation of imprecise location-dependent queries. In ICDE, 2007.
[5] R. Cheng, J. Chen, M. Mokbel, and C. Chow. Probabilistic verifiers: Evaluating constrained nearest-neighbor queries over uncertain data. In ICDE, 2008.
[6] R. Cheng, D. Kalashnikov, and S. Prabhakar. Querying imprecise data in moving object environments. TKDE, 2004.
[7] X. Lian and L. Chen. Probabilistic inverse ranking queries over uncertain data. In DASFAA, 2009.
[8] Tobias Emrich, Hans-Peter Kriegel, Peer Kröger, Matthias Renz, and Andreas Züfle. Incremental reverse nearest neighbor ranking in vector spaces. In International Symposium on Spatial and Temporal Databases, pages 265–282. Springer, 2009.
[9] Tobias Emrich, Hans-Peter Kriegel, Peer Kröger, Matthias Renz, and Andreas Züfle. Boosting spatial pruning: On optimal pruning of MBRs. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, pages 39–50. ACM, 2010.
[10] Yufei Tao, Dimitris Papadias, and Xiang Lian. Reverse kNN search in arbitrary dimensionality. In Proceedings of the Thirtieth International Conference on Very Large Data Bases, pages 744–755. VLDB Endowment, 2004.
[11] Kyriakos Mouratidis. Geometric top-k processing: Updates since MDM'16 [advanced seminar]. In MDM, pages 1–3. IEEE, 2019.
[12] Kyriakos Mouratidis and Bo Tang. Exact processing of uncertain top-k queries in multi-criteria settings. Proceedings of the VLDB Endowment, 11(8):866–879, 2018.
[13] Bo Tang, Kyriakos Mouratidis, and Man Lung Yiu. Determining the impact regions of competing options in preference space. In Proceedings of the 2017 ACM International Conference on Management of Data, pages 805–820. ACM, 2017.
[14] Guolei Yang and Ying Cai. Querying a collection of continuous functions. IEEE Transactions on Knowledge and Data Engineering, 30(9):1783–1795, 2018.
[15] Li Qian, Jinyang Gao, and H. V. Jagadish. Learning user preferences by adaptive pairwise comparison.