Approximate k-flat Nearest Neighbor Search
Wolfgang Mulzer†  Huy L. Nguyễn‡  Paul Seiferth†  Yannik Stein†

Abstract
Let k be a nonnegative integer. In the approximate k-flat nearest neighbor (k-ANN) problem, we are given a set P ⊂ R^d of n points in d-dimensional space and a fixed approximation factor c > 1. Our goal is to preprocess P so that we can efficiently answer approximate k-flat nearest neighbor queries: given a k-flat F, find a point in P whose distance to F is within a factor c of the distance between F and the closest point in P. The case k = 0 corresponds to the well-studied approximate point nearest neighbor problem, for which a plethora of results are known, both in low and high dimensions. The case k = 1 is called approximate line nearest neighbor. In this case, we are aware of only one provably efficient data structure, due to Andoni, Indyk, Krauthgamer, and Nguyễn (AIKN) [2]. For k ≥ 2, we know of no previous results.

We present the first efficient data structure that can handle approximate nearest neighbor queries for arbitrary k. We use a data structure for 0-ANN queries as a black box, and the performance depends on the parameters of the 0-ANN solution: suppose we have a 0-ANN structure with query time O(n^ρ) and space requirement O(n^σ), for ρ, σ > 0. Then we can answer k-ANN queries in time O(n^{k/(k+1−ρ)+t}) and space O(n^{σk/(k+1−ρ)} + n log^{O(1/t)} n). Here, t > 0 is an arbitrarily small constant, and the O-notation hides exponential factors in k, 1/t, and c, and polynomials in d.

Our approach generalizes the techniques of AIKN for 1-ANN: we partition P into clusters of increasing radius, and we build a low-dimensional data structure for a random projection of P. Given a query flat F, the query can be answered directly in clusters whose radius is "small" compared to d(F, P) using a grid.
For the remaining points, the low-dimensional approximation turns out to be precise enough. Our new data structures also give an improvement in the space requirement over the previous result for 1-ANN: we can achieve near-linear space and sublinear query time, a further step towards practical applications where space constitutes the bottleneck.

∗ WM and PS were supported in part by DFG Grants MU 3501/1 and MU 3501/2. YS was supported by the Deutsche Forschungsgemeinschaft within the research training group "Methods for Discrete Structures" (GRK 1408).
† Institut für Informatik, Freie Universität Berlin, {mulzer,pseiferth,yannikstein}@inf.fu-berlin.de.
‡ Simons Institute, UC Berkeley, [email protected].

Introduction

Nearest neighbor search is a fundamental problem in computational geometry, with countless applications in databases, information retrieval, computer vision, machine learning, signal processing, etc. [10]. Given a set P ⊂ R^d of n points in d-dimensional space, we would like to preprocess P so that for any query point q ∈ R^d, we can quickly find the point in P that is closest to q.

There are efficient algorithms if the dimension d is "small" [7, 18]. However, as d increases, these algorithms quickly become inefficient: either the query time approaches linear or the space grows exponentially with d. This phenomenon is usually called the "curse of dimensionality". Nonetheless, if one is satisfied with just an approximate nearest neighbor, whose distance to the query point q lies within some factor c = 1 + ε, ε > 0, of the distance between q and the actual nearest neighbor, there are efficient solutions even for high dimensions. Several methods are known, offering trade-offs between the approximation factor, the space requirement, and the query time (see, e.g., [1, 3] and the references therein).

From a practical perspective, it is important to keep both the query time and the space small.
Ideally, we would like algorithms with almost linear (or at least subquadratic) space requirement and sublinear query time. Fortunately, there are solutions with these guarantees. These methods include locality sensitive hashing (LSH) [11, 12] and a more recent approach that improves upon LSH [3]. Specifically, the latter algorithm achieves query time n^{7/(8c²)+O(1/c³)} and space n^{1+7/(8c²)+O(1/c³)}, where c is the approximation factor.

Often, however, the query object is more complex than a single point. Here, the complexity of the problem is much less understood. Perhaps the simplest such scenario occurs when the query object is a k-dimensional flat, for some small constant k. This is called the approximate k-flat nearest neighbor problem [2]. It constitutes a natural generalization of approximate nearest neighbor, which corresponds to k = 0. In practice, low-dimensional flats are used to model data subject to linear variations. For example, one could capture the appearance of a physical object under different lighting conditions or under different viewpoints (see [4] and the references therein).

So far, the only known algorithm with worst-case guarantees is for k = 1, the approximate line nearest neighbor problem. For this case, Andoni, Indyk, Krauthgamer, and Nguyễn (AIKN) achieve sublinear query time d^{O(1)} n^{1/2+t} and space d^{O(1)} n^{O(1/ε²+1/t²)}, for arbitrarily small t > 0.

For the "dual" version of the problem, where the query is a point but the data set consists of k-flats, three results are known [4, 14, 15]. The first algorithm is essentially a heuristic with some control over the quality of approximation [4]. The second algorithm provides provable guarantees and a very fast query time of (d + log n + 1/ε)^{O(1)} [14]. The third result, due to Mahabadi, is very recent and improves the space requirement of Magen's result [15]. Unfortunately, these algorithms all suffer from very high space requirements, thus limiting their applicability in practice.
In fact, even the basic LSH approach for k = 0 is already too expensive for large datasets, and additional theoretical work and heuristics are required to reduce the memory usage and make LSH suitable for this setting [13, 19]. For k ≥ 2, we know of no results in the theory literature.

Our results.
We present the first efficient data structure for general approximate k-flat nearest neighbor search. Suppose we have a data structure for approximate point nearest neighbor search with query time O(n^ρ + d log n) and space O(n^σ + d log n), for some constants ρ, σ > 0. Then our algorithm achieves query time O(d^{O(1)} n^{k/(k+1−ρ)+t}) and space O(d^{O(1)} n^{σk/(k+1−ρ)} + n log^{O(1/t)} n), where t > 0 can be made arbitrarily small. The constant factors for the query time depend on k, c, and 1/t. Our main result is as follows.

Theorem 1.1.
Fix an integer k ≥ 1 and an approximation factor c > 1. Suppose we have a data structure for approximate point nearest neighbor search with query time O(n^ρ + d log n) and space O(n^σ + d log n), for some constants ρ, σ > 0. Let P ⊂ R^d be a d-dimensional n-point set. For any parameter t > 0, we can construct a data structure with O(d^{O(1)} n^{kσ/(k+1−ρ)} + n log^{O(1/t)} n) space that can answer the following queries in expected time O(d^{O(1)} n^{k/(k+1−ρ)+t}): given a k-flat F ⊂ R^d, find a point p ∈ P with d(p, F) ≤ c·d(P, F).

Algorithm                  ρ                    σ
AINR [3]                   7/(8c²) + O(1/c³)    7/(8c²) + O(1/c³)
LSH1 [1, Theorem 3.2.1]    1/c²                 1/c²
LSH2 [1, Theorem 3.4.1]    O(1/c)               0

The table above gives an overview of some approximate point nearest neighbor structures that can be used in Theorem 1.1. The result by AINR gives the currently best query performance for large enough values of c. For smaller c, an approach using locality sensitive hashing (LSH1) may be preferable. With another variant of locality sensitive hashing (LSH2), the space can be made almost linear, at the expense of a slightly higher query time. The last result (and related practical results, e.g., [13]) is of particular interest in applications, as memory consumption constitutes a major bottleneck in practice. It also improves over the previous algorithm by AIKN for line queries.

Along the way towards Theorem 1.1, we present a novel data structure for k-flat near neighbor reporting when the dimension d is constant. The space requirement in this case is O_d(n log^{O(d)} n) and the query time is O_d(n^{k/(k+1)} log^{d−k−1} n + |R|), where R is the answer set. We believe that this data structure may be of independent interest and may lead to further applications. Our results provide a vast generalization of the result by AIKN and show for the first time that it is possible to achieve provably efficient nearest neighbor search for higher-dimensional query objects.

Our techniques.
Our general strategy is similar to the approach by AIKN. The data structure consists of two main components: the projection structure and the clusters. The projection structure works by projecting the point set to a space of constant dimension and by answering the nearest neighbor query in that space. As we will see, this suffices to obtain a rough estimate for the distance, and it can be used to obtain an exact answer if the point set is "spread out". Unfortunately, this need not be the case. Therefore, we partition the point set into a sequence of clusters. A cluster consists of m points and a k-flat K such that all points in the cluster are "close" to K, where m is a parameter to be optimized. Using a rough estimate from the projection structure, we can classify the clusters as small or large. The points in the large clusters are spread out and can be handled through projection. The points in the small clusters are well behaved and can be handled directly in high dimensions using grids and discretization.

Organization.
In order to provide the curious reader with quick gratification, we give the main data structure, together with the properties of the cluster and the projection structures, in Section 2. Considering these structures as black boxes, this already proves Theorem 1.1. In the remainder of the paper, we describe the details of the helper structures. The necessary tools are introduced in Section 3. Section 4 gives the approximate nearest neighbor algorithm for small clusters. In Section 5, we consider approximate near neighbor reporting for k-flats in constant dimension. This data structure is then used for the projection structures in Section 6.

Main Data Structure and Algorithm Overview

We describe our main data structure for approximate k-flat nearest neighbor search. It relies on various substructures that will be described in the following sections. Throughout, P denotes a d-dimensional n-point set, and c > 1 is the desired approximation factor.

Let K be a k-flat in d dimensions. The flat-cluster C (or cluster, for short) of K with radius α is the set of all points with distance at most α to K, i.e., C = {p ∈ R^d | d(p, K) ≤ α}. A cluster is full if it contains at least m points from P, where m is a parameter to be determined. We call
P α-cluster-free if there is no full cluster with radius α. Let t > 0 be an arbitrarily small parameter. Our data structure requires the following three subqueries.

Q1:
Given a query flat F, find a point p ∈ P with d(p, F) ≤ n^t d(P, F).

Q2:
Assume P is contained in a flat-cluster with radius α. Given a query flat F with d(P, F) ≥ α/n^t, return a point p ∈ P with d(p, F) ≤ c·d(P, F).

Q3:
Assume P is αn^t/(2k+1)-cluster-free. Given a query flat F with d(P, F) ≤ α, find the nearest neighbor p* ∈ P to F.

Briefly, our strategy is as follows: during the preprocessing phase, we partition the point set into a set of full clusters of increasing radii. To answer a query F, we first perform a query of type Q1 to obtain an n^t-approximate estimate r̃ for d(P, F). Using r̃, we identify the "small" clusters. These clusters can be processed using a query of type Q2. The remaining point set contains no "small" full cluster, so we can process it with a query of type Q3.

We will now describe the properties of the subqueries and the organization of the data structure in more detail. The data structure for Q2-queries is called the cluster structure. It is described in Section 4, and it has the following properties.

Theorem 2.1.
Let Q be a d-dimensional m-point set that is contained in a flat-cluster of radius α. Let c > 1 be an approximation factor. Using space O_c(m^σ + d log m), we can build a data structure with the following property. Given a query k-flat F with d(Q, F) ≥ α/n^t and an estimate r̃ with d(Q, F) ∈ [r̃/n^t, r̃], we can find a c-approximate nearest neighbor for F in Q in total time O_c((n^t k)^{k+1}(m^{1−1/k+ρ/k} + (d/k) log m)).

The data structures for Q1 and Q3 are very similar, and we cover them in Section 6. They are called projection structures, since they are based on projecting P into a low-dimensional subspace. In the projected space, we use a data structure for approximate k-flat near neighbor search, to be described in Section 5. The projection structures have the following properties.

Theorem 2.2.
Let P be a d-dimensional n-point set, and let t > 0 be a small enough constant. Using space and preprocessing time O(n log^{O(1/t)} n), we can obtain a data structure for the following query: given a k-flat F, find a point p ∈ P with d(p, F) ≤ n^t d(P, F). A query needs O(n^{k/(k+1)} log^{O(1/t)} n) time, and the answer is correct with high probability.

Theorem 2.3.
Let P be a d-dimensional n-point set, and let t > 0 be a small enough constant. Using space and preprocessing time O(n log^{O(1/t)} n), we can obtain a data structure for the following query: given a k-flat F and α > 0 such that d(F, P) ≤ α and such that P is αn^t/(2k+1)-cluster-free, find an exact nearest neighbor for F in P. A query needs O(n^{k/(k+1)} log^{O(1/t)} n + m) time, and the answer is correct with high probability. Here, m denotes the size of a full cluster.

First, we build a projection structure for Q1 queries on P. This needs O(n log^{O(1/t)} n) space, by Theorem 2.2. Then, we repeatedly find the full flat-cluster C with smallest radius. The m points in C are removed from P, and we build a cluster structure for Q2 queries on this set. By Theorem 2.1, this needs O_c(m^σ + d log m) space. To find C, we check all flats K spanned by k + 1 distinct points of P. In Lemma 3.3 below, we prove that this provides a good enough approximation. In the end, we have n/m point sets Q_1, ..., Q_{n/m}, ordered by decreasing radius, i.e., the cluster for Q_1 has the largest radius. The total space occupied by the cluster structures is O(nm^{σ−1} + (n/m) d log n).

Finally, we build a perfect binary tree T with n/m leaves labeled Q_1, ..., Q_{n/m}, from left to right. For a node v ∈ T, let Q_v be the union of all Q_i assigned to leaves below v. For each v ∈ T, we build a data structure for Q_v to answer Q3 queries. Since each point is contained in O(log n) data structures, the total size is O(n log^{O(1/t)} n), by Theorem 2.3. For pseudocode, see Algorithm 1.

Input: point set P ⊂ R^d, approximation factor c, parameter t > 0
Q ← P
for i ← n/m downto 1 do
    For each (k+1)-element subset V of Q, consider the k-flat K_V defined by V. Let α_V be the radius of the smallest flat-cluster of K_V with exactly m points of Q.
    Choose the flat K = K_V that minimizes α_V and set α_i = α_V.
    Remove from Q the set Q_i of the m points of Q within distance α_i from K.
    Construct a cluster structure C_i for the cluster (K, Q_i).
Build a perfect binary tree T with n/m leaves, labeled Q_1, ..., Q_{n/m} from left to right.
foreach node v ∈ T do
    Build a data structure for Q3 queries, as in Theorem 2.3, for the set Q_v corresponding to the leaves below v.

Algorithm 1:
Preprocessing algorithm. Compared with AIKN [2], we organize the projection structures in a tree to save space.
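The expensive step in Algorithm 1 is the search for the full cluster of smallest radius among the flats spanned by k + 1 points of Q, the candidate set justified by Lemma 3.3. The following minimal sketch, for k = 1 (lines spanned by pairs of points), shows this inner loop; the function names are our own, and no attempt is made to match the asymptotic bounds:

```python
import itertools
import math

def dist_point_to_line(p, a, b):
    """Distance from point p to the line through a and b (any dimension)."""
    d = [bi - ai for ai, bi in zip(a, b)]
    w = [pi - ai for ai, pi in zip(a, p)]
    t = sum(x * y for x, y in zip(w, d)) / sum(x * x for x in d)
    diff = [wi - t * di for wi, di in zip(w, d)]
    return math.sqrt(sum(x * x for x in diff))

def smallest_full_cluster(points, m):
    """Inner loop of Algorithm 1 for k = 1: over all lines spanned by two
    points, find the one whose m-th closest point is nearest, i.e., the
    full cluster (m points) of smallest radius among the candidate flats."""
    best = None  # (radius, (a, b)) for the best line seen so far
    for a, b in itertools.combinations(points, 2):
        dists = sorted(dist_point_to_line(p, a, b) for p in points)
        radius = dists[m - 1]  # radius of the smallest cluster with m points
        if best is None or radius < best[0]:
            best = (radius, (a, b))
    return best
```

With four collinear points and one outlier, the best candidate line contains the four points, so the returned radius is 0 for m = 4.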
Suppose we are given a k-flat F. To find an approximate nearest neighbor for F, we proceed similarly to AIKN [2]. We use Q2 queries on "small" clusters and Q3 queries on the remaining points; for pseudocode, see Algorithm 2.

Input: query flat F
Output: a c-approximate nearest neighbor for F in P
Query the root of T for an n^t-approximate nearest neighbor p_0 of F.    /* type Q1 */
r̃ ← d(p_0, F)
i* ← maximum i ∈ {1, ..., n/m} with α_i > r̃n^t, or 0 if no such i exists
for i ← i* + 1 to n/m do
    Query cluster structure C_i with estimate r̃.    /* type Q2; we have d(Q_i, F) ≥ r̃/n^t ≥ α_i/n^t */
Query the projection structures for an r̃-thresholded nearest neighbor of F in ⋃_{i=1}^{i*} Q_i.    /* type Q3 */
return the closest point to F among all query results.

Algorithm 2:
Finding an approximate nearest neighbor in high dimensions.

First, we perform a query of type Q1 to get an n^t-approximate nearest neighbor p_0 for F, in time O(n^{k/(k+1)} log^{O(1/t)} n). Let r̃ = d(p_0, F). We use r̃ as an estimate to distinguish between "small" and "large" clusters. Let i* ∈ {0, ..., n/m} be the largest integer such that the cluster assigned to Q_{i*} has radius α_{i*} > r̃n^t. For i = i* + 1, ..., n/m, we use r̃ as an estimate for a Q2 query on Q_i. Since |Q_i| = m, and by Theorem 2.1, this needs total time O(n^{t(k+1)+1} m^{−1/k+ρ/k} + (n/m) d log m).

It remains to deal with the points in "large" clusters. The goal is to perform a type Q3 query on ⋃_{1≤i≤i*} Q_i. For this, we start at the leaf of T labeled Q_{i*} and walk up to the root. Each time we encounter a new node v from its right child, we perform a Q3 query on Q_u, where u denotes the left child of v. Let L be the set of all left children we find in this way. Then clearly we have |L| = O(log n) and ⋃_{u∈L} Q_u = ⋃_{1≤i≤i*} Q_i. Moreover, by construction, there is no full cluster with radius less than r̃n^t defined by k + 1 points of Q_u, for any u ∈ L. We will see that this implies that every Q_u is r̃n^t/(2k+1)-cluster-free, so Theorem 2.3 guarantees a total query time of O(n^{k/(k+1)} log^{O(1/t)} n + m) for this step. Among all the points we obtained during the queries, we return the one that is closest to F. A good trade-off point is achieved for m = n·m^{−1/k+ρ/k}, i.e., for m = n^{k/(k+1−ρ)}. This gives the bounds claimed in Theorem 1.1.

Correctness.
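The walk up the tree decomposes the prefix Q_1, ..., Q_{i*} into O(log n) canonical nodes. A minimal sketch of this walk on a perfect binary tree (nodes are represented implicitly by leaf index ranges; all names are illustrative):

```python
def prefix_decomposition(num_leaves, i_star):
    """Collect the starting leaf plus every left sibling encountered while
    walking from leaf i_star up to the root of a perfect binary tree with
    leaves 1..num_leaves.  The returned index ranges (lo, hi) partition the
    prefix {1, ..., i_star}, and there are O(log num_leaves) of them."""
    if i_star == 0:
        return []
    ranges = [(i_star, i_star)]       # the starting leaf itself
    lo, hi = i_star, i_star           # current node = leaves [lo, hi]
    while (lo, hi) != (1, num_leaves):
        size = hi - lo + 1
        if (lo - 1) % (2 * size) == size:
            # we arrive at the parent from its right child:
            # record the left sibling, covering leaves [lo - size, lo - 1]
            ranges.append((lo - size, lo - 1))
            lo -= size
        else:
            hi += size                # arrive from the left child: nothing to add
    return sorted(ranges)
```

For example, with 8 leaves and i* = 6, the walk reports the leaf {6}, its left sibling {5}, and the canonical node {1, ..., 4}.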
Let p* be a point with d(p*, F) = d(P, F). First, suppose that p* ∈ Q_i for some i > i*. Then we have d(p*, F) ≥ r̃/n^t ≥ α_i/n^t, where α_i is the radius of the cluster assigned to Q_i. Since r̃ is a valid n^t-approximate estimate for d(F, Q_i), a query of type Q2 on Q_i gives a c-approximate nearest neighbor, by Theorem 2.1. Now suppose that p* ∈ Q_i for 1 ≤ i ≤ i*. Let u be the node of L with p* ∈ Q_u. Then Theorem 2.3 guarantees that we will find p* when doing a Q3 query on Q_u.

Preliminaries

Partition Trees.
Fix an integer constant r > 1, and let P ⊂ R^d be a d-dimensional n-point set. A simplicial r-partition Ξ for P is a sequence Ξ = (P_1, Δ_1), ..., (P_m, Δ_m) of pairs such that (i) the sets P_1, ..., P_m form a partition of P with n/r ≤ |P_i| ≤ 2⌈n/r⌉, for i = 1, ..., m; (ii) each Δ_i is a relatively open simplex with P_i ⊂ Δ_i, for i = 1, ..., m; and (iii) every hyperplane h in R^d crosses O(r^{1−1/d}) simplices Δ_i in Ξ. Here, a hyperplane h crosses a simplex Δ if h intersects Δ but does not contain it. In a classic result, Matoušek showed that such a simplicial partition always exists and that it can be computed efficiently [6, 16].

Theorem 3.1 (Partition theorem, Theorem 3.1 and Lemma 3.4 in [16]). For any d-dimensional n-point set P ⊂ R^d and for any constant 2 ≤ r ≤ n/2, there exists a simplicial r-partition for P. Furthermore, if r is bounded by a constant, such a partition can be found in time O(n).

Through repeated application of Theorem 3.1, one can construct a partition tree for P. A partition tree T is a rooted tree in which each node is associated with a pair (Q, Δ), such that Q is a subset of P and Δ is a relatively open simplex that contains Q. If |Q| ≥ r, the children of (Q, Δ) constitute a simplicial r-partition of Q. Otherwise, the node (Q, Δ) has |Q| children, where each child corresponds to a point in Q. A partition tree has constant degree, linear size, and logarithmic depth.

Given a hyperplane h, there is a straightforward query algorithm to find the highest nodes in T whose associated simplex does not cross h: start at the root and recurse on all children whose associated simplex crosses h; repeat until there are no more crossings or until a leaf is reached. The children of the traversed nodes whose simplices do not cross h constitute the desired answer.
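The traversal just described can be sketched on a toy instance. The sketch below replaces simplices by open intervals on the line (so a "hyperplane" is a single point x); the class and function names are illustrative only:

```python
class Node:
    """A partition tree node: its cell (here a 1-d open interval standing in
    for a simplex) and children whose cells refine the parent's cell."""
    def __init__(self, cell, children=()):
        self.cell = cell              # (lo, hi), a relatively open interval
        self.children = list(children)

def crosses(cell, x):
    """In 1-d, the hyperplane {x} crosses a cell iff it meets its interior."""
    lo, hi = cell
    return lo < x < hi

def highest_uncrossed(root, x):
    """Recurse only into crossed cells; report the highest uncrossed nodes.
    Crossed leaves are reported too, since they must be inspected directly."""
    result = []
    def visit(node):
        if not crosses(node.cell, x):
            result.append(node)
        elif node.children:
            for child in node.children:
                visit(child)
        else:
            result.append(node)
    visit(root)
    return result
```

On a two-level tree over (0, 8), a query at x = 3 reports the uncrossed cells (0, 2) and (4, 8) and the crossed leaf (2, 4).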
A direct application of Theorem 3.1 yields a partition tree for which this query takes time O(n^{1−1/d+γ}), where γ > 0 is a constant that can be made arbitrarily small by increasing r. In 2012, Chan [5] described a more global construction that eliminates the n^γ factor.

Theorem 3.2 (Optimal Partition Trees [5]). For any d-dimensional n-point set P ⊂ R^d, and for any large enough constant r, there is a partition tree T with the following properties: (i) the tree T has degree O(r) and depth log_r n; (ii) each node is of the form (Q, Δ), where Q is a subset of P and Δ is a relatively open simplex that contains Q; (iii) for each node (Q, Δ), the simplices of the children of Q are contained in Δ and are pairwise disjoint; (iv) the point set associated with a node of depth ℓ has size at most n/r^ℓ; (v) for any hyperplane h in R^d, the number m_ℓ of simplices in T that h intersects at level ℓ obeys the recurrence m_ℓ = O(r^{ℓ(d−1)/d} + r^{(d−2)/(d−1)} m_{ℓ−1} + rℓ log r log n). Thus, h intersects O(n^{1−1/d}) simplices in total. The tree T can be built in expected time O(n log n).

k-flat Discretization. For our cluster structure, we must find k-flats that are close to many points. The following lemma shows that it suffices to check "few" k-flats for this.

Lemma 3.3.
Let P ⊂ R^d be a finite point set with |P| ≥ k + 1, and let F ⊂ R^d be a k-flat. There is a k-flat F′ such that F′ is the affine hull of k + 1 points in P and δ_{F′}(P) ≤ (2k + 1) δ_F(P), where δ_{F′}(P) = max_{p∈P} d(p, F′) and δ_F(P) = max_{p∈P} d(p, F).

Proof.
This proof generalizes the proof of Lemma 2.3 by AIKN [2]. Let Q be the orthogonal projection of P onto F. We may assume that F is the affine hull of Q, since otherwise we could replace F by aff(Q) without affecting δ_F(P). We choose an orthonormal basis for R^d such that F is the linear subspace spanned by the first k coordinates. An affine basis for F′ is constructed as follows: first, take a point p_0 ∈ P whose x_1-coordinate is minimum. Let q_0 be the projection of p_0 onto F, and translate the coordinate system such that q_0 is the origin. Next, choose k additional points p_1, ..., p_k ∈ P such that |det(q_1, ..., q_k)| is maximum, where q_i is the projection of p_i onto F, for i = 1, ..., k. That is, we choose k additional points such that the volume of the k-dimensional parallelepiped spanned by their projections onto F is maximized. The set {q_1, ..., q_k} is a basis for F, since the maximum determinant cannot be 0, by our assumption that F is spanned by Q.

Now fix some point p ∈ P, and let q be its projection onto F. We write q = Σ_{i=1}^k μ_i q_i. Then the point r = Σ_{i=1}^k μ_i p_i + (1 − Σ_{i=1}^k μ_i) p_0 lies in F′. By the triangle inequality, we have

d(p, r) ≤ d(p, q) + d(q, r) ≤ δ_F(P) + d(q, r).    (1)

To upper-bound d(q, r), we first show that all coefficients μ_i lie in [−1, 1].

Claim 3.4.
Take p ∈ P, q ∈ Q, and q_1, ..., q_k as above. Write q = Σ_{i=1}^k μ_i q_i. Then, for i = 1, ..., k, we have μ_i ∈ [−1, 1], and μ_j ≥ 0 for at least one j ∈ {1, ..., k}.

Proof. We first prove that all coefficients μ_i lie in the interval [−1, 1]. Suppose that |μ_i| > 1 for some i ∈ {1, ..., k}. We may assume that i = 1. Using the linearity of the determinant,

|det(q, q_2, ..., q_k)| = |det(μ_1 q_1, q_2, ..., q_k)| = |μ_1| · |det(q_1, q_2, ..., q_k)| > |det(q_1, q_2, ..., q_k)|,

contradicting the choice of q_1, ..., q_k. Furthermore, by our choice of the origin, all points in Q have a non-negative x_1-coordinate. Thus, at least one coefficient μ_j, j ∈ {1, ..., k}, has to be non-negative.

Using Claim 3.4, we can now bound d(q, r). For i = 1, ..., k, we write p_i = q_i + q_i^⊥, where q_i^⊥ is orthogonal to F. Then,

d(q, r) = ‖ Σ_{i=1}^k μ_i q_i − Σ_{i=1}^k μ_i (q_i + q_i^⊥) − (1 − Σ_{i=1}^k μ_i) p_0 ‖
        = ‖ Σ_{i=1}^k μ_i q_i^⊥ + (1 − Σ_{i=1}^k μ_i) p_0 ‖
        ≤ ( Σ_{i=1}^k |μ_i| + |1 − Σ_{i=1}^k μ_i| ) δ_F(P) ≤ 2k δ_F(P),    (2)

since ‖q_1^⊥‖, ..., ‖q_k^⊥‖, ‖p_0‖ ≤ δ_F(P), and since |1 − Σ_{i=1}^k μ_i| ≤ k follows from the fact that at least one μ_i is non-negative. By (1) and (2), we get d(p, F′) ≤ (2k + 1) δ_F(P).

Remark 3.5.
For k = 1, the proof of Lemma 3.3 coincides with the proof of Lemma 2.3 by AIKN [2]. In this case, one can obtain a better bound on d(q, r), since q is a convex combination of q_0 and q_1. This gives δ_{F′}(P) ≤ 2δ_F(P).

Cluster Structure

A k-flat cluster structure consists of a k-flat K and a set Q of m points with d(q, K) ≤ α for all q ∈ Q. Let K: u ↦ A′u + a be a parametrization of K, with A′ ∈ R^{d×k} and a ∈ R^d, such that the columns of A′ constitute an orthonormal basis for K and such that a is orthogonal to K. We are also given an approximation parameter c > 1. The cluster structure uses a data structure for approximate point nearest neighbor search as a black box. We assume that we have such a structure available that can answer c-approximate point nearest neighbor queries in d dimensions with query time O_c(n^ρ + d log n) and space requirement O_c(n^σ + d log n), for some constants ρ, σ > 0. As mentioned in the introduction, the literature offers several data structures for us to choose from.

The cluster structure distinguishes two cases: if the query flat F is close to K, we can approximate F by few "patches" that are parallel to K, such that a good nearest neighbor for the patches is also a good nearest neighbor for F. Since the patches are parallel to K, they can be handled through 0-ANN queries in the orthogonal space K^⊥ and low-dimensional queries inside K. If the query flat is far from K, we can approximate Q by its projection onto K and handle the query with a low-dimensional data structure.

Let K^⊥ be the linear subspace of R^d that is orthogonal to K. Let Q_a be the projection of Q onto K, and let Q_b be the projection of Q onto K^⊥. We compute a k-dimensional partition tree T for Q_a.
As stated in Theorem 3.2, the tree T has O(m) nodes, and it can be computed in time O(m log m). For each node (S_a, Δ) of T, we do the following: we determine the set S ⊆ Q whose projection onto K gives S_a, and we take the projection S_b of S onto K^⊥. Then we build a (d − k)-dimensional c′-ANN data structure for S_b, as given by the assumption, where c′ = (1 − 1/log n) c. See Algorithm 3 for pseudocode.

Input: k-flat K ⊆ R^d, point set Q ⊂ R^d with d(q, K) ≤ α for all q ∈ Q, approximation parameter c > 1
Q_a ← projection of Q onto K
Q_b ← projection of Q onto K^⊥
Build a k-dimensional partition tree T for Q_a, as in Theorem 3.2.
c′ ← (1 − 1/log n) c
foreach node (S_a, Δ) ∈ T do
    S_b ← projection of the points in Q corresponding to S_a onto K^⊥
    Build a (d − k)-dimensional c′-ANN structure for S_b, as given by the assumption.

Algorithm 3:
CreateClusterStructure
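The first two steps of Algorithm 3 only need an orthonormal basis for K. A minimal sketch for k = 1, where K is a line given by an offset a and a unit direction a_dir (hypothetical inputs of our own choosing), of how Q splits into coordinates Q_a along K and residuals Q_b in K^⊥:

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def project_onto_line(p, a, a_dir):
    """Split p - a into its component along K (the line through a with unit
    direction a_dir) and the residual, which lies in the orthogonal space
    K-perp.  Returns (coordinate in K, residual vector)."""
    w = [pi - ai for pi, ai in zip(p, a)]
    t = dot(w, a_dir)
    q_a = [t * di for di in a_dir]                 # component along K
    q_b = [wi - qi for wi, qi in zip(w, q_a)]      # component in K-perp
    return t, q_b

def split_cluster(points, a, a_dir):
    """First two steps of Algorithm 3 for k = 1: Q_a is the (1-dimensional)
    projection onto K, Q_b the projection onto K-perp.  The norm of each
    residual is d(p, K), which is at most alpha for points of a cluster."""
    Q_a, Q_b = [], []
    for p in points:
        t, q_b = project_onto_line(p, a, a_dir)
        Q_a.append(t)
        Q_b.append(q_b)
    return Q_a, Q_b
```

For K the x-axis in R^3, the point (2, 1, 0) splits into the coordinate 2 along K and the residual (0, 1, 0) in K^⊥.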
Lemma 4.1.
The cluster structure can be constructed in total time O_c(m^{1+ρ} + md log m), and it requires O_c(m^σ + d log m) space.

Proof. By Theorem 3.2, the partition tree can be built in O(m log m) time. Thus, the preprocessing time is dominated by the time to construct the c′-ANN data structures at the nodes of the partition tree T. Since the sets on each level of T constitute a partition of Q, and since the sizes of the sets decrease geometrically, the bounds on the preprocessing time and space requirement follow directly from our assumption. Note that, by our choice of c′ = (1 − 1/log n) c, the space requirement and query time of the ANN data structures change only by a constant factor.

We set ε = 1/
100 log n. Let F be the query k-flat, given as F: v ↦ B′v + b, with B′ ∈ R^{d×k} and b ∈ R^d, such that the columns of B′ are an orthonormal basis for F and b is orthogonal to F. Our first task is to find bases for the flats K and F that provide us with information about the relative position of K and F. For this, we take the matrix M = A′^T B′ ∈ R^{k×k}, and we compute a singular value decomposition M = UΣV^T of M [9, Chapter 7.3]. Recall that U and V are orthogonal k×k matrices and that Σ = diag(σ_1, ..., σ_k) is a k×k diagonal matrix with σ_1 ≥ ··· ≥ σ_k ≥ 0. We call σ_1, ..., σ_k the singular values of M. The following lemma summarizes the properties of the SVD that are relevant to us.

Lemma 4.2.
Let M = A′^T B′, and let M = UΣV^T be a singular value decomposition for M. Let u_1, ..., u_k be the columns of U and v_1, ..., v_k the columns of V. Then, (i) u_1, ..., u_k is an orthonormal basis for K (in the coordinate system induced by A′); (ii) v_1, ..., v_k is an orthonormal basis for F (in the coordinate system induced by B′); and (iii) for i = 1, ..., k, the projection of v_i onto K is σ_i u_i and the projection of u_i onto F is σ_i v_i (again in the coordinate systems induced by A′ and B′). In particular, we have σ_1 ≤ 1.

Proof. Properties (i) and (ii) follow since U and V are orthogonal matrices. Property (iii) holds because M = A′^T B′ describes the projection from F onto K (in the coordinate systems induced by A′ and B′) and because M^T = B′^T A′ = VΣU^T describes the projection from K onto F.

We reparametrize K according to U and F according to V. More precisely, we set A = A′U and B = B′V, and we write K: u ↦ Au + a and F: v ↦ Bv + b. The new coordinate system provides a simple representation for the distances between F and K. We begin with a technical lemma that is a simple corollary of Lemma 4.2.

Lemma 4.3.
Let a_1, ..., a_k be the columns of the matrix A; let a_1^∥, ..., a_k^∥ be the columns of the matrix BB^T A, and a_1^⊥, ..., a_k^⊥ the columns of the matrix A − BB^T A. Then, (i) for i = 1, ..., k, the vector a_i^∥ is the projection of a_i onto F, and the vector a_i^⊥ is the projection of a_i onto F^⊥; (ii) for i = 1, ..., k, we have ‖a_i^∥‖ = σ_i and ‖a_i^⊥‖ = √(1 − σ_i²); and (iii) the vectors a_1^∥, ..., a_k^∥, a_1^⊥, ..., a_k^⊥ are pairwise orthogonal. An analogous statement holds for the matrices B, AA^T B, and B − AA^T B.

Proof. Properties (i) and (ii) are an immediate consequence of the definition of A and B and of Lemma 4.2. The set a_1^∥, ..., a_k^∥ is orthogonal by Lemma 4.2(ii). Furthermore, since for any i, j ∈ {1, ..., k} the vector a_i^∥ lies in F and the vector a_j^⊥ lies in F^⊥, the vectors a_i^∥ and a_j^⊥ are orthogonal. Finally, let 1 ≤ i < j ≤ k. Then,

⟨a_i^⊥, a_j^⊥⟩ = ⟨a_i^⊥, a_j^⊥⟩ + ⟨a_i^⊥, a_j^∥⟩ + ⟨a_i^∥, a_j^⊥⟩ + ⟨a_i^∥, a_j^∥⟩ = ⟨a_i, a_j⟩ = 0,

since we already saw that ⟨a_i^⊥, a_j^∥⟩ = ⟨a_i^∥, a_j^⊥⟩ = ⟨a_i^∥, a_j^∥⟩ = 0, and ⟨a_i, a_j⟩ = 0 because the columns of A are orthonormal. The argument for the other matrices is completely analogous.

The next lemma shows how our choice of bases gives a convenient representation of the distances between F and K.

Lemma 4.4.
Take two points $x_F \in K$ and $y_K \in F$ such that $d(F, K) = d(y_K, x_F)$. Write $x_F = Au_F + a$ and $y_K = Bv_K + b$. Then, for any point $x \in K$ with $x = Au + a$, we have
\[ d(F, x)^2 = \sum_{i=1}^{k} (1 - \sigma_i^2)(u - u_F)_i^2 + d(F, K)^2, \]
and for any point $y \in F$ with $y = Bv + b$, we have
\[ d(y, K)^2 = \sum_{i=1}^{k} (1 - \sigma_i^2)(v - v_K)_i^2 + d(F, K)^2. \]

Proof.
We show the calculation for $d(F, x)$. The calculation for $d(y, K)$ is symmetric. Let $x \in K$ with $x = Au + a$ be given. Let $y_x \in F$ be the projection of $x$ onto $F$. Then,
\[ d(F, x)^2 = \|x - y_x\|^2 = \|(x - x_F) + (x_F - y_K) + (y_K - y_x)\|^2 = \|(x - x_F) - (y_x - y_K)\|^2 + \|x_F - y_K\|^2, \]
where the last equality is due to Pythagoras, since $x - x_F$ lies in $K$, $y_x - y_K$ lies in $F$, and $x_F - y_K$ is orthogonal to both $K$ and $F$. Now, we have $y_x = BB^Tx + b$. Similarly, since $y_K$ is the projection of $x_F$ onto $F$, we have $y_K = BB^Tx_F + b$. Thus,
\[ d(F, x)^2 = \big\| (x - x_F) - BB^T(x - x_F) \big\|^2 + d(F, K)^2 = \big\| (A - BB^TA)(u - u_F) \big\|^2 + d(F, K)^2, \]
using the definition of $x$ and $x_F$. By Lemma 4.3, the columns $a^\perp_1, \dots, a^\perp_k$ of the matrix $A - BB^TA$ are pairwise orthogonal and for $i = 1, \dots, k$, we have $\|a^\perp_i\|^2 = 1 - \sigma_i^2$. Pythagoras gives
\[ d(F, x)^2 = \sum_{i=1}^{k} \|a^\perp_i\|^2 (u - u_F)_i^2 + d(F, K)^2 = \sum_{i=1}^{k} (1 - \sigma_i^2)(u - u_F)_i^2 + d(F, K)^2. \]

Input: query $k$-flat $F \subseteq \mathbb{R}^d$; an estimate $\tilde r$ with $d(F, Q) \in [\tilde r/n^t, \tilde r]$.
1. $M \leftarrow A'^T B'$. Compute an SVD $M = U \Sigma V^T$ of $M$ with singular values $1 \geq \sigma_1 \geq \cdots \geq \sigma_k \geq 0$.
2. if $\sigma_k = 1$ then
3.   $f \leftarrow$ projection of $F$ onto $K^\perp$   /* $F$ and $K$ are parallel; $f$ is a point */
4.   $r \leftarrow$ $c'$-ANN for $f$ in $Q_b$
5.   return $r$
6. Reparametrize $K$ according to $U$ and $F$ according to $V$.
/* Near case */
7. $\mathcal{G} \leftarrow$ set of approximate patches obtained by combining Lemmas 4.6 and 4.7
8. $R \leftarrow \emptyset$
9. foreach $G \in \mathcal{G}$ do
10.   $R \leftarrow R \cup$ result of approximate nearest-neighbor query for $G$ as in Lemma 4.8
/* Far case */
11. $R \leftarrow R \cup$ result of approximate nearest-neighbor query as in Lemma 4.11
12. return point in $R$ that minimizes the distance to $F$

Algorithm 4:
QueryClusterStructure

We now give a brief overview of the query algorithm; refer to Algorithm 4 for pseudocode. First, we check for the special case that $F$ and $K$ are parallel, i.e., that $\sigma_1 = \cdots = \sigma_k = 1$. In this case, we need to perform only a single $c'$-ANN query in $Q_b$ to obtain the desired result. If $F$ and $K$ are not parallel, we distinguish two scenarios: if $F$ is far from $Q$, we can approximate $Q$ by its projection $Q_a$ onto $K$. Thus, we take the closest point $x_F$ in $K$ to $F$, and we return an approximate nearest neighbor for $x_F$ in $Q_a$ according to an appropriate metric derived from Lemma 4.4. Details can be found in Section 4.2.2. If $F$ is close to $Q$, we use Lemma 4.4 to discretize the relevant part of $F$ into patches, such that each patch is parallel to $K$ and such that the best nearest neighbor in $Q$ for the patches provides an approximate nearest neighbor for $F$. Each patch can then be handled essentially by an appropriate nearest neighbor query in $K^\perp$. Details follow in Section 4.2.1. We say $F$ and $Q$ are close if $d(F, Q) \leq \alpha/\varepsilon$, and far if $d(F, Q) > \alpha/\varepsilon$. Recall that we chose $\varepsilon = 1/(100 \log n)$.

4.2.1 The case $d(F, Q) \leq \alpha/\varepsilon$

We use our reparametrization of $F$ and $K$ to split the coordinates as follows: recall that $1 \geq \sigma_1 \geq \cdots \geq \sigma_k \geq 0$ are the singular values of $M = A'^T B'$. Pick $l \in \{0, \dots, k\}$ such that $1 \geq \sigma_i \geq \sqrt{1 - \varepsilon}$, for $i = 1, \dots, l$, and $\sqrt{1 - \varepsilon} > \sigma_i \geq 0$, for $i = l + 1, \dots, k$. For a $d \times k$ matrix $X$, let $X_{[i]}$ denote the $d \times i$ matrix with the first $i$ columns of $X$, and $X_{-[i]}$ the $d \times (k - i)$ matrix with the remaining $k - i$ columns of $X$. Similarly, for a vector $v \in \mathbb{R}^k$, let $v_{[i]}$ be the vector in $\mathbb{R}^i$ with the first $i$ coordinates of $v$, and $v_{-[i]}$ the vector in $\mathbb{R}^{k-i}$ with the remaining $k - i$ coordinates of $v$.

The following lemma is an immediate consequence of Lemma 4.4. It tells us that we can partition the directions in $F$ into those that are almost parallel to $K$ and those that are almost orthogonal to $K$. Along the orthogonal directions, we discretize $F$ into few lower-dimensional flats that are almost parallel to $K$. After that, we approximate these flats by few patches that are actually parallel to $K$. These patches are then used to perform the query.

Lemma 4.5.
Let $y \in F$ be a point and $y_K \in F$ with $d(F, K) = d(y_K, K)$. Write $y_K = Bv_K + b$ and $y = Bv + b$. Then, $\|(v - v_K)_{-[l]}\| \leq d(y, K)/\sqrt{\varepsilon}$.

Proof. By Lemma 4.4 and the choice of $l$,
\[ d(y, K)^2 = \sum_{i=1}^{k} (1 - \sigma_i^2)(v - v_K)_i^2 + d(F, K)^2 \geq \sum_{i=l+1}^{k} (1 - \sigma_i^2)(v - v_K)_i^2 \geq \varepsilon \|(v - v_K)_{-[l]}\|^2. \]

Using Lemma 4.5, we can discretize the query $F$ into a set of $l$-flats that are almost parallel to the cluster flat $K$.

Lemma 4.6.
There is a set $\mathcal{L}$ of $O((n^t k^{1/2} \varepsilon^{-5/2})^{k-l})$ $l$-flats such that the following holds: (i) for every $L \in \mathcal{L}$, we have $L \subseteq F$; (ii) for every $L \in \mathcal{L}$ and for every unit vector $u \in L$, the projection of $u$ onto $K$ has length at least $\sqrt{1 - \varepsilon}$; and (iii) if $d(F, Q) \in [\alpha/n^t, \alpha/\varepsilon]$, then there is an $l$-flat $L \in \mathcal{L}$ with $d(L, Q) \leq (1 + \varepsilon)\, d(F, Q)$.

Proof. Let $y_K = Bv_K + b \in F$ be a point in $F$ with $d(F, K) = d(y_K, K)$. Furthermore, let
\[ \tau = \frac{\alpha \varepsilon}{n^t \sqrt{k}} \quad \text{and} \quad o_\tau = \left\lceil \frac{n^t \sqrt{k}}{\varepsilon^{5/2}} \right\rceil. \]
Using $\tau$ and $o_\tau$, we define a set $I$ of index vectors with $I = \{-o_\tau \tau, (-o_\tau + 1)\tau, \dots, o_\tau \tau\}^{k-l}$ and $|I| = O(o_\tau^{k-l}) = O((n^t k^{1/2} \varepsilon^{-5/2})^{k-l})$. For each $i \in I$, we define the $l$-flat $L_i$ as
\[ L_i : w \mapsto B_{[l]} w + B_{-[l]} \big( (v_K)_{-[l]} + i \big) + b. \]
Our desired set of approximate query $l$-flats is now $\mathcal{L} = \{ L_i \mid i \in I \}$.

The set $\mathcal{L}$ meets properties (i) and (ii) by construction, so it remains to verify (iii). For this, we take a point $y_Q \in F$ with $d(F, Q) = d(y_Q, Q)$. We write $y_Q = Bv_Q + b$, and we define $s = (v_Q - v_K)_{-[l]}$. We assumed that $d(y_Q, K) \leq \alpha/\varepsilon$, so Lemma 4.5 gives $\|s\| \leq \alpha/\varepsilon^{3/2}$. It follows that by rounding each coordinate of $s$ to the nearest multiple of $\tau$, we obtain an index vector $i_Q \in I$ with $\|i_Q - s\| \leq \tau \sqrt{k} = \varepsilon\alpha/n^t$.
Hence, considering the point in $L_{i_Q}$ with $w = (v_Q)_{[l]}$, we get
\[ d(L_{i_Q}, Q) \leq d(L_{i_Q}, y_Q) + d(y_Q, Q) \leq \big\| B_{[l]} (v_Q)_{[l]} + B_{-[l]} \big( (v_K)_{-[l]} + i_Q \big) + b - Bv_Q - b \big\| + d(F, Q) \]
\[ = \big\| B_{-[l]} \big( (v_K)_{-[l]} + i_Q - (v_Q)_{-[l]} \big) \big\| + d(F, Q) \overset{(*)}{=} \big\| (v_K - v_Q)_{-[l]} + i_Q \big\| + d(F, Q) = \|i_Q - s\| + d(F, Q) \leq \varepsilon\alpha/n^t + d(F, Q) \overset{(**)}{\leq} (1 + \varepsilon)\, d(F, Q), \]
where in (*) we used that the columns of $B_{-[l]}$ are orthonormal and in (**) we used the assumption $d(F, Q) \geq \alpha/n^t$.

From now on, we focus on an approximate query $l$-flat $L : w \mapsto B_{[l]} w + b$; for brevity, we write $B$ for the $d \times l$ matrix $B_{[l]}$. Our next goal is to approximate $L$ by a set of patches such that each is parallel to $K$.

Lemma 4.7.
There is a set $\mathcal{G}$ of $O((n^t k^{1/2} \varepsilon^{-2})^l)$ patches such that the following holds: (i) every $G \in \mathcal{G}$ is an $l$-dimensional polytope, given by $O(l)$ inequalities; (ii) for every $G \in \mathcal{G}$, the affine hull of $G$ is parallel to $K$; (iii) if $d(L, Q) \in [\alpha/n^t, \alpha/\varepsilon]$, then there exists $G \in \mathcal{G}$ with $d(G, Q) \leq (1 + \varepsilon)\, d(L, Q)$; (iv) for all $G \in \mathcal{G}$ and for all $q \in Q$, we have $d(G, q) \geq (1 - \varepsilon)\, d(L, q)$.

Proof. Let $C = AA^TB$ be the $d \times l$ matrix whose columns $b^\parallel_1, \dots, b^\parallel_l$ constitute the projections of the columns of $B$ onto $K$. By Lemma 4.3, the vectors $b^\parallel_i$ are orthogonal with $\|b^\parallel_i\| = \sigma_i$, for $i = 1, \dots, l$, and the columns $b^\perp_1, \dots, b^\perp_l$ of the matrix $B - C$ also constitute an orthogonal set, with $\|b^\perp_i\| = \sqrt{1 - \sigma_i^2}$, for $i = 1, \dots, l$. Let $z_K$ be a point in $L$ that minimizes the distance to $K$, and write $z_K = Bw_K + b$. Furthermore, let
\[ \tau_i = \frac{\alpha \varepsilon}{n^t \sqrt{l (1 - \sigma_i^2)}}, \text{ for } i = 1, \dots, l, \quad \text{and} \quad o_\tau = \left\lceil \frac{n^t \sqrt{l}}{\varepsilon^2} \right\rceil. \]
We use the $\tau_i$ and $o_\tau$ to define a set $I$ of index vectors as $I = \prod_{i=1}^{l} \{-o_\tau \tau_i, (-o_\tau + 1)\tau_i, \dots, o_\tau \tau_i\}$. We have $|I| = O(o_\tau^l) = O((n^t k^{1/2} \varepsilon^{-2})^l)$. For each index vector $i \in I$, we define the patch $G_i$ as
\[ G_i : w \mapsto Cw + B(w_K + i) + b, \quad \text{subject to } w \in \prod_{j=1}^{l} [0, \tau_j]. \]
Our desired set of approximate query patches is now $\mathcal{G} = \{ G_i \mid i \in I \}$. The set $\mathcal{G}$ fulfills properties (i) and (ii) by construction, so it remains to check (iii). Fix a point $z \in L$. Since $L \subseteq F$, we can write $z = Bw + b = Bv + b$, where the vector $w$ represents the coordinates of $z$ in $L$ and the vector $v$ represents the coordinates of $z$ in $F$. By Lemma 4.4,
\[ d(z, K)^2 = \sum_{i=1}^{k} (1 - \sigma_i^2)(v - v_K)_i^2 + d(F, K)^2, \]
where the vector $v_K$ represents the coordinates of a point in $F$ that is closest to $K$.
By definition of $L$, the last $k - l$ coordinates $v_{-[l]}$ in $F$ are the same for all points $z \in L$, so we can conclude that the coordinates for a closest point to $K$ in $L$ are given by $w_K = (v_K)_{[l]}$ and that
\[ d(z, K)^2 = \sum_{i=1}^{l} (1 - \sigma_i^2)(w - w_K)_i^2 + d(L, K)^2. \qquad (3) \]
Now take a point $z_Q$ in $L$ with $d(z_Q, Q) = d(L, Q)$ and write $z_Q = Bw_Q + b$. Since we assumed $d(L, Q) \leq \alpha/\varepsilon$, (3) implies that for $i = 1, \dots, l$, we have $|(w_Q - w_K)_i| \leq \alpha/\big(\varepsilon \sqrt{1 - \sigma_i^2}\big)$. Thus, if for $i = 1, \dots, l$, we round $(w_Q - w_K)_i$ down to the next multiple of $\tau_i$, we obtain an index vector $i_Q \in I$ with $(w_Q - w_K) - i_Q \in \prod_{i=1}^{l} [0, \tau_i]$. We set $s_Q = (w_Q - w_K) - i_Q$. Considering the point $Cs_Q + B(w_K + i_Q) + b$ in $G_{i_Q}$, we see that
\[ d(G_{i_Q}, z_Q)^2 \leq \big\| Cs_Q + B(w_K + i_Q) + b - Bw_Q - b \big\|^2 = \big\| Cs_Q - B\big((w_Q - w_K) - i_Q\big) \big\|^2 = \big\| (C - B)s_Q \big\|^2 = \sum_{i=1}^{l} (1 - \sigma_i^2)(s_Q)_i^2 \leq \sum_{i=1}^{l} (1 - \sigma_i^2)\tau_i^2 = \varepsilon^2\alpha^2/n^{2t}, \]
using the properties of the matrix $B - C$ stated above. It follows that
\[ d(G_{i_Q}, Q) \leq d(G_{i_Q}, z_Q) + d(z_Q, Q) \leq \varepsilon\alpha/n^t + d(L, Q) \leq (1 + \varepsilon)\, d(L, Q), \]
since we assumed $d(L, Q) \geq \alpha/n^t$. This proves (iii).

Property (iv) is obtained similarly. Let $G_i \in \mathcal{G}$, $q \in Q$, and let $z$ be a point in $G_i$ with $d(z, q) = d(G_i, q)$. Write $z = Cw + B(w_K + i) + b$, where $w \in \prod_{j=1}^{l} [0, \tau_j]$. Considering the point $z_x = B(w + w_K + i) + b$ in $L$, we see that
\[ d(z, z_x) = \|z - z_x\| = \|(C - B)w\| \leq \varepsilon\alpha/n^t. \]
Thus, $d(G_i, q) \geq d(z_x, q) - d(z, z_x) \geq d(L, q) - \varepsilon\alpha/n^t \geq (1 - \varepsilon)\, d(L, q)$.

Finally, we have a patch $G : w \mapsto Cw + b$, and we are looking for an approximate nearest neighbor for $G$ in $Q$. The next lemma states how this can be done.

Lemma 4.8.
Suppose that $d(G, Q) \in [\alpha/n^t, \alpha/\varepsilon]$. We can find a point $\tilde q \in Q$ with $d(G, \tilde q) \leq (1 - 1/(2\log n))\, c\, d(G, Q)$ in total time $O_c\big( (k^2 n^t/\varepsilon)(m^{1-1/k+\rho/k} + (d/k) \log m) \big)$.

Proof. Let $G_a$ be the projection of $G$ onto $K$, and let $g$ be the projection of $G$ onto $K^\perp$. Since $G$ and $K$ are parallel, $g$ is a point, and $G_a$ is of the form $G_a : w \mapsto Cw + a$, with $a \in K$ and $w \in \prod_{i=1}^{l} [0, \tau_i]$. Let $G_a^+ = \{ x \in K \mid d_\infty(x, G_a) \leq \alpha\sqrt{k}/\varepsilon \}$, where $d_\infty(\cdot, \cdot)$ denotes the $\ell_\infty$-distance with respect to the coordinate system induced by $A$. We subdivide the set $G_a^+ \setminus G_a$ into a collection $\mathcal{C}$ of axis-parallel cubes, each with diameter $\varepsilon\alpha/n^t$. The cubes in $\mathcal{C}$ have side length $\varepsilon\alpha/(n^t\sqrt{k})$, the total number of cubes is $O((kn^t/\varepsilon)^k)$, and the boundaries of the cubes lie on $O(k^2 n^t/\varepsilon)$ hyperplanes.

We now search the partition tree $T$ to find the highest nodes $(\Delta, Q)$ in $T$ whose simplices $\Delta$ are completely contained in a single cube of $\mathcal{C}$. This is done as follows: we begin at the root of $T$, and we check for all children $(\Delta, Q)$ and for all boundary hyperplanes $h$ of $\mathcal{C}$ whether the simplex $\Delta$ crosses the boundary $h$. If a child $(\Delta, Q)$ crosses no hyperplane, we label it with the corresponding cube in $\mathcal{C}$ (or with $G_a$). Otherwise, we recurse on $(\Delta, Q)$ with all the boundary hyperplanes that it crosses.

In the end, we have obtained a set $\mathcal{D}$ of simplices such that each simplex in $\mathcal{D}$ is completely contained in a cube of $\mathcal{C}$. The total number of simplices in $\mathcal{D}$ is $s = O\big( (k^2 n^t/\varepsilon)\, m^{1-1/k} \big)$, by Theorem 3.2. For each simplex in $\mathcal{D}$, we query the corresponding $c'$-ANN structure. Let $R \subseteq Q_b$ be the set of the query results. For each point $q_b \in R$, we take the corresponding point $q \in Q$, and we compute the distance $d(q, G)$. We return a point $\tilde q$ that minimizes $d(q, G)$. The query time is dominated by the time for the ANN queries. For each $\Delta \in \mathcal{D}$, let $m_\Delta$ be the number of points in the corresponding ANN structure.
By assumption, an ANN-query takes time $O_c(m_\Delta^\rho + d \log m_\Delta)$, so the total query time is proportional to
\[ \sum_{\Delta \in \mathcal{D}} m_\Delta^\rho + d \log m_\Delta \leq s \Big( \sum_{\Delta \in \mathcal{D}} m_\Delta / s \Big)^\rho + s\, d \log \Big( \sum_{\Delta \in \mathcal{D}} m_\Delta / s \Big) \leq O_c\big( (k^2 n^t/\varepsilon)(m^{1-1/k+\rho/k} + (d/k) \log m) \big), \]
using the fact that $m \mapsto m^\rho + d \log m$ is concave and that $\sum_{\Delta \in \mathcal{D}} m_\Delta \leq m$.

It remains to prove the approximation bound. Take a point $q^*$ in $Q$ with $d(q^*, G) = d(Q, G)$. Since we assumed that $d(Q, G) \leq \alpha/\varepsilon$, the projection $q^*_a$ of $q^*$ onto $K$ lies in $G_a^+$. Let $\Delta^*$ be the simplex in $\mathcal{D}$ with $q^*_a \in \Delta^*$. Suppose that the ANN-query for $\Delta^*$ returns a point $\hat q \in Q$. Thus, in $K^\perp$, we have $d(\hat q_b, g) \leq c'\, d(Q_b^{\Delta^*}, g) \leq c'\, d(q^*_b, g)$, where $\hat q_b$ and $q^*_b$ are the projections of $\hat q$ and $q^*$ onto $K^\perp$ and $Q_b^{\Delta^*}$ is the point set stored in the ANN-structure of $\Delta^*$. By the definition of $\mathcal{C}$, in $K$, we have $d(\hat q_a, G_a) \leq d(q^*_a, G_a) + \varepsilon\alpha/n^t \leq d(q^*_a, G_a) + \varepsilon\, d(q^*, G)$, where $\hat q_a$ is the projection of $\hat q$ onto $K$. By Pythagoras,
\[ d(\hat q, G)^2 = d(\hat q_b, g)^2 + d(\hat q_a, G_a)^2 \leq c'^2\, d(q^*_b, g)^2 + \big( d(q^*_a, G_a) + \varepsilon\, d(q^*, G) \big)^2 \leq c'^2\, d(q^*_b, g)^2 + d(q^*_a, G_a)^2 + (2\varepsilon + \varepsilon^2)\, d(q^*, G)^2 \leq (c' + 3\varepsilon)^2\, d(q^*, G)^2, \]
so that
\[ d(\hat q, G) \leq (c' + 3\varepsilon)\, d(q^*, G) \leq \big( (1 - 1/\log n)c + 3/(100 \log n) \big)\, d(q^*, G) \leq (1 - 1/(2\log n))\, c\, d(q^*, G), \]
recalling that $c' = (1 - 1/\log n)c$ and $\varepsilon = 1/(100 \log n)$. Since $d(\tilde q, G) \leq d(\hat q, G)$, the result follows.

Of all the candidate points obtained through querying patches, we return the one closest to $F$. The following lemma summarizes the properties of the query algorithm.

Lemma 4.9.
Suppose that $d(F, Q) \in [\alpha/n^t, \alpha/\varepsilon]$. Then the query procedure returns a point $\tilde q \in Q$ with $d(F, \tilde q) \leq c\, d(F, Q)$ in total time $O_c\big( (k^2 n^t \varepsilon^{-5/2})^{k+1} (m^{1-1/k+\rho/k} + (d/k) \log m) \big)$.

Proof. By Lemmas 4.6 and 4.7, there exists a patch $G$ with $d(G, Q) \leq (1 + \varepsilon)^2\, d(F, Q)$. For this patch, the algorithm from Lemma 4.8 returns a point $\hat q$ with $d(\hat q, G) \leq (1 - 1/(2\log n))\, c\, d(G, Q)$. Thus, using Lemma 4.7(iv), we have
\[ (1 - \varepsilon)\, d(\hat q, L) \leq d(\hat q, G) \leq (1 - 1/(2\log n))\, c\, (1 + \varepsilon)^2\, d(F, Q), \]
and by our choice of $\varepsilon = 1/(100 \log n)$, we get
\[ (1 - 1/(2\log n))(1 + \varepsilon)^2/(1 - \varepsilon) \leq (1 - 1/(2\log n))(1 + 3\varepsilon)(1 + 2\varepsilon) \leq (1 - 1/(2\log n))\big(1 + 6/(100 \log n)\big) \leq 1. \]
Since $L \subseteq F$, it follows that $d(F, \hat q) \leq d(\hat q, L) \leq c\, d(F, Q)$, and the point returned by the query procedure is at least as close to $F$.

4.2.2 The case $d(F, Q) \geq \alpha/\varepsilon$

If $d(F, Q) \geq \alpha/\varepsilon$, we can approximate $Q$ by its projection $Q_a$ onto $K$ without losing too much. Thus, we can perform the whole algorithm in $K$. This is done by a procedure similar to Lemma 4.8.

Lemma 4.10.
Suppose we are given an estimate $\tilde r$ with $d(F, Q_a) \in [\tilde r/n^t, \tilde r]$. Then, we can find a point $\tilde q \in Q_a$ with $d(F, \tilde q) \leq (1 + \varepsilon)\, d(F, Q_a)$ in time $O\big( (k^{3/2} n^t/\varepsilon)\, m^{1-1/k} \big)$.

Proof. Let $x_F$ be a point in $K$ with $d(F, K) = d(F, x_F)$. Write $x_F = Au_F + a$. Define
\[ C = \prod_{i=1}^{k} \Big( (u_F)_i + \Big[ -\tilde r/\sqrt{1 - \sigma_i^2},\; \tilde r/\sqrt{1 - \sigma_i^2} \Big] \Big). \]
If we take a point $x \in K$ with $d(x, F) \in [\tilde r/n^t, \tilde r]$ and write $x = Au + a$, then Lemma 4.4 gives
\[ d(F, x)^2 = \sum_{i=1}^{k} (1 - \sigma_i^2)(u - u_F)_i^2 + d(F, K)^2, \]
so $u \in C$. We subdivide $C$ into copies of the hyperrectangle $\prod_{i=1}^{k} \big[0,\; \varepsilon\tilde r/\big(n^t \sqrt{k(1 - \sigma_i^2)}\big)\big]$. Let $\mathcal{C}$ be the resulting set of hyperrectangles. The boundaries of the hyperrectangles in $\mathcal{C}$ lie on $O(k^{3/2} n^t/\varepsilon)$ hyperplanes. We now search the partition tree $T$ in order to find the highest nodes $(\Delta, Q)$ in $T$ whose simplices $\Delta$ are completely contained in a single hyperrectangle of $\mathcal{C}$. This is done in the same way as in Lemma 4.8.

This gives a set $\mathcal{D}$ of simplices such that each simplex in $\mathcal{D}$ is completely contained in a hyperrectangle of $\mathcal{C}$. The total number of simplices in $\mathcal{D}$ is $O\big( (k^{3/2} n^t/\varepsilon)\, m^{1-1/k} \big)$, by Theorem 3.2. For each simplex $\Delta \in \mathcal{D}$, we pick an arbitrary point $q \in Q_a$ that lies in $\Delta$, and we compute $d(F, q)$. We return the point $\tilde q \in Q_a$ that minimizes the distance to $F$. The total query time is $O\big( (k^{3/2} n^t/\varepsilon)\, m^{1-1/k} \big)$.

Now let $q^*$ be a point in $Q_a$ with $d(F, Q_a) = d(F, q^*)$, and let $\Delta^*$ be the simplex in $\mathcal{D}$ that contains $q^*$. Furthermore, let $\hat q \in Q_a$ be the point that the algorithm examines in $\Delta^*$. Write $q^* = Au^* + a$ and $\hat q = A\hat u + a$.
Since $q^*$ and $\hat q$ lie in the same hyperrectangle, we have $|(\hat u - u^*)_i| \leq \varepsilon\tilde r/\big(n^t \sqrt{k(1 - \sigma_i^2)}\big)$ for $i = 1, \dots, k$, and moreover $d(F, q^*) \geq \tilde r/n^t$. Hence, by Lemma 4.4,
\[ d(F, \hat q)^2 = \sum_{i=1}^{k} (1 - \sigma_i^2)(\hat u - u_F)_i^2 + d(F, K)^2 \leq \sum_{i=1}^{k} (1 - \sigma_i^2)(u^* - u_F)_i^2 + (2\varepsilon + \varepsilon^2)\, d(F, q^*)^2 + d(F, K)^2 \leq (1 + \varepsilon)^2\, d(F, q^*)^2, \]
so $d(F, \hat q) \leq (1 + \varepsilon)\, d(F, q^*)$. Since $d(F, \tilde q) \leq d(F, \hat q)$, the result follows.

Lemma 4.11.
Suppose we are given an estimate $\tilde r$ with $d(F, Q) \in [\tilde r/n^t, \tilde r]$. Suppose further that $d(F, Q) \geq \alpha/\varepsilon$. Then we can find a $\tilde q \in Q$ with $d(F, \tilde q) \leq c\, d(F, Q)$ in time $O\big( (k^{3/2} n^t/\varepsilon)\, m^{1-1/k} \big)$.

Proof. For any point $q \in Q$, let $q_a \in Q_a$ be its projection onto $K$. Then, $d(q_a, q) \leq \alpha \leq \varepsilon\, d(F, Q)$. Thus, $d(F, Q_a) \in [(1 - \varepsilon)\, d(F, Q),\, (1 + \varepsilon)\, d(F, Q)]$, and we can apply Lemma 4.10. Let $\tilde q_a \in Q_a$ be the result of this query, and let $\tilde q$ be the corresponding point in $Q$. We have
\[ d(F, \tilde q) \leq d(\tilde q, \tilde q_a) + d(F, \tilde q_a) \leq \varepsilon\, d(F, Q) + (1 + \varepsilon)\, d(F, Q_a) \leq \varepsilon\, d(F, Q) + (1 + \varepsilon)^2\, d(F, Q) \leq (1 + 4\varepsilon)\, d(F, Q) \leq c\, d(F, Q), \]
by our choice of $\varepsilon$.

By combining Lemmas 4.1, 4.9, and 4.11, we obtain Theorem 2.1.

5 Approximate k-flat Range Reporting in Low Dimensions

In this section, we present a data structure for low-dimensional $k$-flat approximate near neighbor reporting. In Section 6, we will use it as a foundation for our projection structures. The details of the structure are summarized in Theorem 5.1. Throughout this section, we will think of $d$ as a constant, and we will suppress factors depending on $d$ in the $O$-notation.

Theorem 5.1.
Let $P \subset \mathbb{R}^d$ be an $n$-point set. We can preprocess $P$ into an $O(n \log^{d-k-1} n)$ space data structure for approximate $k$-flat near neighbor queries: given a $k$-flat $F$ and a parameter $\alpha$, find a set $R \subseteq P$ that contains all $p \in P$ with $d(p, F) \leq \alpha$ and no $p \in P$ with $d(p, F) > \big( (4k+3)(d-k-1) + \sqrt{k+1} \big)\alpha$. The query time is $O(n^{k/(k+1)} \log^{d-k-1} n + |R|)$.

Let $E \subset \mathbb{R}^d$ be the $(k+1)$-dimensional subspace of $\mathbb{R}^d$ spanned by the first $k+1$ coordinates, and let $Q$ be the projection of $P$ onto $E$. (We assume general position: any two distinct points in $P$ have distinct projections in $Q$.) We build a $(k+1)$-dimensional partition tree $T$ for $Q$, as in Theorem 3.2. If $d > k+1$, we also build a slab structure for each node of $T$. Let $v$ be such a node, and let $\Xi$ be the simplicial partition for the children of $v$. Let $w > 0$. A $w$-slab $S$ is a closed region in $E$ that is bounded by two parallel hyperplanes of distance $w$. The median hyperplane $\hat h$ of $S$ is the hyperplane inside $S$ that is parallel to the two boundary hyperplanes and has distance $w/2$ from both. A $w$-slab $S$ is full if there are at least $\sqrt{r}$ simplices $\Delta$ in $\Xi$ with $\Delta \subset S$.

Input: point set $P \subset \mathbb{R}^d$
1. if $|P| = O(1)$ then store $P$ in a list and return.
2. $Q \leftarrow$ projection of $P$ onto the subspace $E$ spanned by the first $k+1$ coordinates.
3. $T \leftarrow$ $(k+1)$-dimensional partition tree for $Q$ as in Theorem 3.2.
4. if $d > k+1$ then
5.   foreach node $v \in T$ do
6.     $\Xi_1 \leftarrow$ simplicial partition for the children of $v$
7.     for $j \leftarrow 1$ to $\lfloor \sqrt{r} \rfloor$ do
8.       $D_j \leftarrow$ CreateSlabStructure($\Xi_j$)
9.       $\Xi_{j+1} \leftarrow$ $\Xi_j$ without all simplices inside the slab for $D_j$

Algorithm 5:
CreateSearchStructure
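The heart of CreateSlabStructure is the repeated extraction of a narrowest full slab. The following toy sketch (our own illustration, not part of the paper) replays this loop in one dimension, where simplices become intervals on the line and a $w$-slab becomes a window of width $w$ that is full once it entirely contains at least $\sqrt{r}$ intervals; all function names and the exact-search strategy are ours.

```python
import math

def narrowest_full_window(intervals, s):
    """Toy 1-D analogue of finding a narrowest full slab: the narrowest
    window [lo, hi] that entirely contains at least s intervals.
    For each candidate right boundary hi (a right endpoint), the widest
    admissible left boundary is the s-th largest left endpoint among
    intervals ending by hi; minimize the width over all hi."""
    best = None
    for hi in sorted({right for _, right in intervals}):
        lefts = sorted((l for l, r in intervals if r <= hi), reverse=True)
        if len(lefts) < s:
            continue
        lo = lefts[s - 1]
        if best is None or hi - lo < best[0]:
            best = (hi - lo, lo, hi)
    if best is None:
        return None
    width, lo, hi = best
    chosen = [iv for iv in intervals if lo <= iv[0] and iv[1] <= hi]
    return width, lo, hi, chosen

def extract_slabs(intervals, r):
    """Mimic the iteration of Algorithms 5 and 6: repeatedly remove the
    intervals of a narrowest full window until fewer than sqrt(r) remain.
    (The recursive search structure built per slab is omitted here.)"""
    s = max(1, math.isqrt(r))
    remaining = list(intervals)
    slabs = []
    while len(remaining) >= s:
        width, lo, hi, chosen = narrowest_full_window(remaining, s)
        slabs.append((lo, hi))
        for iv in chosen:
            remaining.remove(iv)
    return slabs, remaining
```

Each extracted window plays the role of one slab structure $D_j$; the real algorithm would additionally project the associated points onto the slab's median hyperplane and recurse in dimension $d - 1$.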
Input: $\Xi_j = (Q_1, \Delta_1), \dots, (Q_{r'}, \Delta_{r'})$
1. $V_j \leftarrow$ vertices of the simplices in $\Xi_j$
2. For each $(k+1)$-subset $V \subset V_j$, find the smallest $w_V > 0$ such that the $w_V$-slab with median hyperplane $\mathrm{aff}(V)$ is full.
3. Let $w_j$ be the smallest $w_V$; let $S_j$ be the corresponding full $w_j$-slab and $\hat h_j = \mathrm{aff}(V)$ its median hyperplane.
4. Find a set $\mathcal{D}_j$ of $\sqrt{r}$ simplices in $S_j$; let $Q_j \leftarrow \bigcup_{\Delta_i \in \mathcal{D}_j} Q_i$ and let $P_j$ be the $d$-dimensional point set corresponding to $Q_j$.
5. $h_j \leftarrow$ the hyperplane orthogonal to $E$ through $\hat h_j$
6. $P' \leftarrow$ projection of $P_j$ onto $h_j$   /* $P'$ is $(d-1)$-dimensional */
7. CreateSearchStructure($P'$)

Algorithm 6:
CreateSlabStructure

The slab structure for $v$ is constructed in several iterations. In iteration $j$, we have a current subset $\Xi_j \subseteq \Xi$ of pairs in the simplicial partition. For each $(k+1)$-set $v_1, \dots, v_{k+1}$ of vertices of simplices in $\Xi_j$, we determine the smallest width of a full slab whose median hyperplane is spanned by $v_1, \dots, v_{k+1}$. Let $S_j$ be the smallest among those slabs, and let $\hat h_j$ be its median hyperplane. Let $\mathcal{D}_j$ be the $\sqrt{r}$ simplices that lie completely in $S_j$. We remove $\mathcal{D}_j$ and the corresponding point set $Q_j = \bigcup_{\Delta_i \in \mathcal{D}_j} Q_i$ from $\Xi_j$ to obtain $\Xi_{j+1}$. Let $P_j \subseteq P$ be the $d$-dimensional point set corresponding to $Q_j$. We project $P_j$ onto the $(d-1)$-dimensional hyperplane $h_j$ that is orthogonal to $E$ and goes through $\hat h_j$. We recursively build a search structure for the $(d-1)$-dimensional projected point set. The $j$th slab structure $D_j$ at $v$ consists of this search structure, the hyperplane $h_j$, and the width $w_j$. This process is repeated until fewer than $\sqrt{r}$ simplices remain; see Algorithms 5 and 6 for details.

Denote by $S(n, d)$ the space for a $d$-dimensional search structure with $n$ points. The partition tree $T$ has $O(n)$ nodes, so the overhead for storing the slabs and partitions is linear. Thus,
\[ S(n, d) = O(n) + \sum_{D} S(n_D, d - 1), \]
where the sum is over all slab structures $D$ and where $n_D$ is the number of points in the slab structure $D$. Since every point appears in $O(\log n)$ slab structures, and since the recursion stops for $d = k+1$, we get

Lemma 5.2.
The search structure for $n$ points in $d$ dimensions needs space $O(n \log^{d-k-1} n)$.

For a query, we are given a distance threshold $\alpha > 0$ and a $k$-flat $F$. For the recursion, we will need to query the search structure with a $k$-dimensional polytope. We obtain the initial query polytope by intersecting the flat $F$ with the bounding box of $P$ extended by $\alpha$ in each direction. With slight abuse of notation, we still call this polytope $F$.

A query for $F$ and $\alpha$ is processed by using the slab structures for small enough slabs and by recursing in the partition tree for the remaining points. Details follow. Suppose we are at some node $v$ of the partition tree, and let $j^*$ be the largest integer with $w_{j^*} \leq (4k+2)\alpha$. For $j = 1, \dots, j^*$, we recursively query each slab structure $D_j$ as follows: let $\tilde F \subseteq F$ be the polytope containing the points in $F$ with distance at most $\alpha + w_j/2$ from $h_j$, and let $F_h$ be the projection of $\tilde F$ onto $h_j$. We query the search structure in $D_j$ with $F_h$ and $\alpha$. Next, we project $F$ onto the subspace $E$ spanned by the first $k+1$ coordinates. Let $\mathcal{D}$ be the simplices in $\Xi_{j^*+1}$ with distance at most $\alpha$ from the projection. For each simplex in $\mathcal{D}$, we recursively query the corresponding child in the partition tree. Upon reaching the bottom of the recursion (i.e., $|P| = O(1)$), we collect all points within distance $\alpha$ from $F$ in the set $R$.

Input: polytope $F$, distance threshold $\alpha > 0$
Output: point set $R \subseteq P$
1. $R \leftarrow \emptyset$
2. if $|P| = O(1)$ then
3.   $R \leftarrow \{ p \in P \mid d(p, F) \leq \alpha \}$
4. else if $d = k+1$ then
5.   Compute polytope $F^\diamond$ as described.
6.   $R \leftarrow R \cup$ all points of $P$ inside $F^\diamond$
7. else
8.   $j^* \leftarrow$ the largest integer with $w_{j^*} \leq (4k+2)\alpha$
9.   for $j \leftarrow 1$ to $j^*$ do
10.    $F_h \leftarrow$ projection of $\tilde F$ onto $h_j$ as described
11.    $R \leftarrow R \cup D_j.\mathrm{query}(F_h, \alpha)$
12.  $\hat F \leftarrow$ projection of $F$ onto the subspace $E$ spanned by the first $k+1$ coordinates
13.  $\mathcal{D} \leftarrow$ simplices in $\Xi_{j^*+1}$
14.  $\mathcal{D}' \leftarrow \{ \Delta \in \mathcal{D} \mid d(\Delta, \hat F) \leq \alpha \}$
15.  foreach $\Delta \in \mathcal{D}'$ do
16.    $R \leftarrow R \cup$ result of recursive query to partition tree node for $\Delta$
17. return $R$

Algorithm 7:
Find a superset $R$ of all points in $P$ with distance less than $\alpha$ from a query polytope $F$.

If $d = k+1$, we approximate the region of interest by the polytope $F^\diamond = \{ x \in \mathbb{R}^d \mid d_1(x, F) \leq \alpha \}$, where $d_1(\cdot, \cdot)$ denotes the $\ell_1$-metric in $\mathbb{R}^d$. Then, we query the partition tree $T$ to find all points of $P$ that lie inside $F^\diamond$. We prove in Lemma 5.4 that $F^\diamond$ is a polytope with $O(d^{O(k)})$ facets; see Algorithm 7 for details. The following two lemmas analyze the correctness and query time of the algorithm.

Lemma 5.3.
The set $R$ contains all $p \in P$ with $d(p, F) \leq \alpha$ and no $p \in P$ with $d(p, F) > \kappa\alpha$, where $\kappa = (4k+3)(d-k-1) + \sqrt{k+1}$.

Proof. The proof is by induction on the size $n$ of $P$ and on the dimension $d$. If $n = O(1)$, we return all points with distance at most $\alpha$ to $F$. If $d = k+1$, we report the points inside the polytope $F^\diamond$ (lines 4–6) using $T$. Since $\|x\|_2 \leq \|x\|_1 \leq \sqrt{k+1}\, \|x\|_2$ holds for all $x \in \mathbb{R}^{k+1}$, the polytope $F^\diamond$ contains all points with distance at most $\alpha$ from $F$ and no point with distance more than $\alpha\sqrt{k+1}$ from $F$. Thus, correctness also follows in this case.

Fig. 1:
The distance error due to the reduction of the dimension in the slab structure $D_j$.

In the general case ($d > k+1$ and $n$ not constant), we prove the lemma individually for the slab structures and for the partition tree. Let $D_j$ be a slab structure and $P_j$ the corresponding $d$-dimensional point set. Fix some point $p \in P_j$ with $d(p, F) \leq \alpha$. To query $D_j$, we take the subpolytope $\tilde F \subseteq F$ with distance at most $\alpha + w_j/2$ from the median hyperplane $h_j$, and we project it onto $h_j$. Let $F_h$ be this projection. Since orthogonal projections can only decrease distances, we have $p \in D_j.\mathrm{query}(F_h, \alpha)$ by induction. Now fix a point $q \in D_j.\mathrm{query}(F_h, \alpha)$.
1) + √ k + 1 . Let ¯ q ∈ R d − be the projection of q onto h j and ¯ q F ∈ F h the closest point to ¯ q in F h . Let q F ∈ (cid:101) F be the corresponding d -dimensionalpoint (see Fig. 1). By triangle inequality and the induction hypothesis, d ( q, F ) = d ( q, (cid:101) F ) ≤ d ( q, ¯ q ) + d (¯ q, ¯ q F ) + d ( ¯ q F , q F ) ≤ w j / k + 3)( d − k −
2) + √ k + 1) α + ( α + w j / . By construction, we have w j ≤ (4 k + 2) α , so d ( q, F ) ≤ ((4 k + 3)( d − k −
1) + √ k + 1) α , asclaimed.Consider now a child in the partition tree queried in line 16, and let P j be the corresponding d -dimensional point set. Since | P j | < | P | , the claim follows by induction. Lemma 5.4.
The query time is $O(n^{k/(k+1)} \log^{d-k-1} n + |R|)$.

Proof. Let $Q(n, d)$ be the total query time. First, let $d > k+1$. We bound the time to query the partition tree $T$. Let $\hat F$ be the projection of $F$ onto $E$. Furthermore, let $V$ be the set of nodes in $T$ that are visited during a query, and let $\mathcal{D}$ be the corresponding simplices. By construction, all simplices in $\mathcal{D}$ have distance at most $\alpha$ from $\hat F$. Consider the $\alpha$-slab $S$ whose median hyperplane contains $\hat F$. We partition $V$ into two sets: the nodes $V_B$ whose simplices intersect $\partial S$, and the nodes $V_C$ whose simplices lie completely in $S$. First, since the simplex for each node in $T$ is contained in the simplex for its parent node, we observe that $V_B$ constitutes a connected subtree of $T$, starting at the root. The nodes of $V_C$ form several connected subtrees, each hanging off a node in $V_B$. Furthermore, by construction, each node from $V$ has at most $\sqrt{r}$ children from $V_B$. Let $V_\ell$ be the set of nodes in $V$ with level $\ell$, for $\ell = 0, \dots, \log_r n$, and let $m_\ell = |V_\ell|$. By Theorem 3.2, we have
\[ |V_\ell \cap V_B| \leq O\big( r^{\ell k/(k+1)} + r^{(k-1)/k} m_{\ell-1} + r \ell \log r \log n \big). \]
Since $|V_\ell \cap V_C| \leq \sqrt{r}\, m_{\ell-1}$, we get
\[ m_\ell = |V_\ell| = O\big( r^{\ell k/(k+1)} + (r^{(k-1)/k} + \sqrt{r})\, m_{\ell-1} + r \ell \log r \log n \big). \]
For any $\delta > \max(0,\, 1/2 - (k-1)/k)$, if we choose $r$ large enough depending on $\delta$, this solves to $m_\ell = O(r^{\ell k/(k+1)} + r^{\ell((k-1)/k + \delta)} \log n)$. Thus, we get
\[ Q(n, d) = \sum_{\ell=0}^{\log_r n} O\big( r^{\ell k/(k+1)} + r^{\ell((k-1)/k + \delta)} \log n \big) \big( O(r) + Q(n/r^\ell, d-1) \big). \qquad (4) \]
For $d = k+1$, we use $T$ directly. Thus, by Theorem 3.2, the query time $Q(n, k+1)$ is $O(f_{k+1}\, n^{k/(k+1)} + |R_F|)$, where $f_{k+1}$ is the number of facets of $F^\diamond$ and $R_F$ is the answer set. We claim that $f_{k+1}$ is bounded by $((2k+2)(5d_0 - k)^{k/2})^{(k+1)/2}$. Recall that $F^\diamond$ is the Minkowski sum of $F$ and the $\ell_1$-ball with radius $\alpha$, as in Algorithm 7, line 5. Initially, $F$ is the intersection of the query $k$-flat with the extended bounding box of $P$. This intersection can be described by at most $d_0 - k + 2d_0 = 3d_0 - k$ oriented half-spaces, where $d_0$ denotes the initial dimension. In each recursive step, we intersect $F$ with the bounding hyperplanes of a slab. Therefore, the descriptive complexity of $F$ in the base case is at most $5d_0 - k$. By duality and the Upper Bound Theorem [17], the $V$-description of $F$ consists of at most $(5d_0 - k)^{k/2}$ vertices. Using that the Minkowski sum of two polytopes with $v_1$ and $v_2$ vertices has at most $v_1 v_2$ vertices, we deduce that $F^\diamond$ has at most $(2k+2)(5d_0 - k)^{k/2}$ vertices. Applying the Upper Bound Theorem again, it follows that $f_{k+1} = ((2k+2)(5d_0 - k)^{k/2})^{(k+1)/2}$, as claimed. Thus, plugging the base case into (4), we get that the overall query time $Q(n, d)$ is bounded by $O(n^{k/(k+1)} \log^{d-k-1} n + |R|)$.

Theorem 5.1 follows immediately from Lemmas 5.2, 5.3, and 5.4.

5.2 k-Flat Nearest Neighbor Queries

We now show how to extend our data structure from Section 5.1 for approximate $k$-flat nearest neighbor queries with multiplicative error $(4k+3)(d-k-1) + \sqrt{k+1}$. That is, given an $n$-point set $P \subset \mathbb{R}^d$, we want to find for any given query flat $F \subset \mathbb{R}^d$ a point $p \in P$ with $d(p, F) \leq ((4k+3)(d-k-1) + \sqrt{k+1})\, d(P, F)$. We reduce this problem to a near neighbor query by choosing an appropriate threshold $\alpha$ that ensures $|R| = O(\sqrt{n})$, using random sampling. For preprocessing, we build the data structure $D$ from Theorem 5.1 for $P$.

Let a query flat $F$ be given. The $F$-rank of a point $p \in P$ is the number of points in $P$ that are closer to $F$ than $p$. Let $X \subseteq P$ be a random sample obtained by taking each point in $P$ independently with probability $1/\sqrt{n}$. The expected size of $X$ is $\sqrt{n}$, and if $x \in X$ is the closest point to $F$ in $X$, then the expected $F$-rank of $x$ is $\sqrt{n}$. Set $\alpha = d(x, F)/\big( (4k+3)(d-k-1) + \sqrt{k+1} \big)$. We query $D$ with $F$ and $\alpha$ to obtain a set $R$. If $d(P, F) \leq \alpha$, then $R$ contains the nearest neighbor. Otherwise, $x$ is a $((4k+3)(d-k-1) + \sqrt{k+1})$-approximate nearest neighbor for $F$. Thus, it suffices to return the nearest neighbor in $R \cup \{x\}$. Since with high probability all points in $R$ have $F$-rank at most $O(\sqrt{n} \log n)$, we have $|R| = O(\sqrt{n} \log n)$, and the query time is $O(n^{k/(k+1)} \log^{d-k-1} n)$. This establishes the following corollary of Theorem 5.1.

Corollary 5.5.
Let P ⊂ R d be an n -point set. We can preprocess P into an O ( n log d − k − n ) space data structure for approximate k -flat nearest neighbor queries: given a flat F , find a point p ∈ P with d ( p, F ) ≤ ((4 k + 3)( d − k −
1) + √ k + 1) d ( P, F ) . The expected query time is O ( n k/ ( k +1) log d − k − n ) . We now describe how to answer queries of type Q1 and Q3 efficiently. Our approach is toproject the points into random subspace of constant dimension and to solve the problem thereusing our data structures from Theorem 5.1 and Corollary 5.5. For this, we need a Johnson-Lindenstrauss-type lemma that bounds the distortion, see Section 6.1.Let < t ≤ / (2 + 40 k ) be a parameter and let P ⊂ R d be a high dimensional n -pointset. Set d (cid:48) = 2 /t + 2 and let M ∈ R d (cid:48) × d be a random projection from R d to R d (cid:48) , scaled by (cid:112) d/ d (cid:48) . We obtain ¯ P ⊂ R d (cid:48) by projecting P using M . We build for ¯ P the data structure D from Corollary 5.5 to answer Q1 queries and D from Theorem 5.1 to answer Q3 queries. Thisneeds O ( n log O ( d (cid:48) ) n ) = O ( n log O (1 /t ) n ) space. For each p ∈ P we write ¯ p for the d (cid:48) -dimensionalpoint M p and ¯ F for the projected flat M F . Projection Structures We use the following variant of the Johnson-Lindenstrauss-Lemma, as proved by Dasgupta andGupta [8, Lemma 2.2].
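The distortion behavior of such a scaled random projection is easy to explore numerically. The following minimal sketch is our own illustration (the Gaussian-plus-QR construction of M, all dimensions, and the test flat are assumptions chosen for the demo, not part of the data structure): it projects a point and a k-flat and compares the distance before and after projection.

```python
import numpy as np

rng = np.random.default_rng(0)
d, d_prime, k = 50, 16, 2   # illustrative dimensions only

# Projection onto a random d'-dimensional subspace, scaled by sqrt(d/d'):
# orthonormalize a Gaussian matrix and keep its d' orthonormal directions.
Q, _ = np.linalg.qr(rng.standard_normal((d, d_prime)))
M = np.sqrt(d / d_prime) * Q.T              # shape (d', d)

def dist_to_flat(p, a, B):
    """Distance from p to the flat {a + B @ lam : lam in R^k}."""
    lam, *_ = np.linalg.lstsq(B, p - a, rcond=None)
    return float(np.linalg.norm(p - a - B @ lam))

# A random k-flat F = {a + B lam} and a point p.
a = rng.standard_normal(d)
B = rng.standard_normal((d, k))
p = rng.standard_normal(d)

orig = dist_to_flat(p, a, B)                 # d(p, F)
proj = dist_to_flat(M @ p, M @ a, M @ B)     # d(p-bar, F-bar)

# With overwhelming probability the projected distance neither grows by more
# than a small constant factor nor shrinks by a large factor.
assert proj <= 3 * orig
assert proj >= orig / 100
print(round(proj / orig, 3))
```

The point of the sketch is that a single random projection preserves point-to-flat distances only up to constant-factor distortion with high probability; the lemma below quantifies exactly this, which is all the query algorithm needs.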
Lemma 6.1 (JL-Lemma). Let d′ < d, and let M ∈ R^{d′×d} be the projection matrix onto a random d′-dimensional subspace, scaled by a factor of √(d/d′). Then, for every vector x ∈ R^d of unit length and every β > 1, we have

1. Pr[‖Mx‖² ≥ β] ≤ exp((d′/2)(1 − β + ln β)), and
2. Pr[‖Mx‖² ≤ 1/β] ≤ exp((d′/2)(1 − 1/β − ln β)) ≤ exp((d′/2)(1 − ln β)).

Lemma 6.2. Let p ∈ R^d be a point and let F ⊂ R^d be a k-flat. For d′ ∈ {8k, …, d−1}, let M ∈ R^{d′×d} be the projection matrix onto a random d′-dimensional subspace, scaled by √(d/d′). Let ¯p = Mp and ¯F = MF be the projections of p and of F, respectively. Then, for any β ≥ 5^5·k,

(i) Pr[d(¯p,¯F) ≤ 3 d(p,F)] ≥ 1 − e^{−d′/2}; and (ii) Pr[d(¯p,¯F) ≥ d(p,F)/β] ≥ 1 − β^{−d′/2}.

Proof. Set q = Mp and K = MF. Defining ∆p = d(p,F) and ∆q = d(q,K), we must bound the probabilities Pr[∆q ≤ 3∆p] for (i) and Pr[∆q ≥ ∆p/β] for (ii).

We begin with (i). Let p∥ be the orthogonal projection of p onto F, and let p⊥ = p − p∥. Let q⊥ = Mp⊥. Then, ∆p = ‖p⊥‖ and ∆q ≤ ‖q⊥‖. By Lemma 6.1(1),

Pr[‖q⊥‖ ≥ 3∆p] = Pr[‖Mp⊥‖/‖p⊥‖ ≥ 3] = Pr[‖M(p⊥/‖p⊥‖)‖² ≥ 9] ≤ exp((d′/2)(1 − 9 + ln 9)) ≤ exp(−d′/2).

Thus, Pr[∆q ≤ 3∆p] ≥ Pr[‖q⊥‖ ≤ 3∆p] ≥ 1 − exp(−d′/2), as desired.

For (ii), choose k orthonormal vectors e1, …, ek such that F = {p∥ + Σ_{i=1}^k λi ei | λi ∈ R}. Set ui = ei·‖p⊥‖/β², and consider the lattice L = {p∥ + Σ_{i=1}^k μi ui | μi ∈ Z} ⊂ F. Let ¯L = ML be the projected lattice. We next argue that with high probability (a) all points in ¯L have distance at least 3‖p⊥‖/β from q; and (b) for i = 1, …, k, we have ‖Mui‖ < ‖p⊥‖/(β√k).

To show (a), we partition L into layers: for j ∈ {0, 1, 2, …}, let

L_j = {p∥ + Σ_{i=1}^k μi ui | μi ∈ {−j, …, j}, max_i |μi| = j} ⊂ L.

Now for any j ∈ N and r = p∥ + Σ_{i=1}^k μi ui ∈ L_j, Pythagoras gives

‖p − r‖ = √(‖p⊥‖² + ‖Σ_{i=1}^k μi ui‖²) = ‖p⊥‖ √(1 + Σ_{i=1}^k μi²/β⁴) ≥ ‖p⊥‖ √(1 + j²/β⁴).

Thus, using Lemma 6.1(2),

Pr[‖M(p − r)‖ ≤ 3‖p⊥‖/β] = Pr[‖M(p − r)‖/‖p − r‖ ≤ 3‖p⊥‖/(β‖p − r‖)] ≤ Pr[‖M((p − r)/‖p − r‖)‖² ≤ 9/(β²(1 + j²/β⁴))] ≤ exp((d′/2)(1 + ln(9/(β²(1 + j²/β⁴))))) ≤ (5/β)^{d′} (1 + j²/β⁴)^{−d′/2},

as 3√e ≤ 5. Now we use a union bound to obtain

Pr[∃r ∈ L : ‖M(p − r)‖ ≤ 3‖p⊥‖/β] = Σ_{j=0}^∞ Pr[∃r ∈ L_j : ‖M(p − r)‖ ≤ 3‖p⊥‖/β].

Grouping the summands into groups of β² consecutive terms, this is

= Σ_{l=0}^∞ Σ_{j=lβ²}^{lβ²+β²−1} Pr[∃r ∈ L_j : ‖M(p − r)‖ ≤ 3‖p⊥‖/β] ≤ (5/β)^{d′} Σ_{l=0}^∞ |L_{≤(l+1)β²}| (1 + l²)^{−d′/2},

where L_{≤a} = ∪_{i=0}^a L_i. Using the rough bound |L_{≤a}| ≤ (3a)^k, this is

≤ (5/β)^{d′} Σ_{l=0}^∞ (3(l+1)β²)^k (1 + l²)^{−d′/2} = 5^{d′} 3^k β^{2k−d′} Σ_{l=0}^∞ (l+1)^k (1 + l²)^{−d′/2}.

For d′ ≥ 2k + 4, we have (l+1)^k (1 + l²)^{−d′/2} ≤ (l+1)^{k−d′/2} ≤ 1/(1+l)², so we can bound the sum by Σ_{l=0}^∞ 1/(1+l)² = π²/6. Thus, we have derived

Pr[∃r ∈ L : ‖M(p − r)‖ ≤ 3‖p⊥‖/β] ≤ 5^{d′} 3^k β^{2k−d′} (π²/6) ≤ 5^{d′+k} β^{2k−d′}, (5)

since π²/6 ≤ 5/3.

To show (b), we use a union bound with Lemma 6.1(1). Recalling ‖u1‖ = ‖p⊥‖/β²,

Pr[∃i = 1, …, k : ‖Mui‖ > ‖p⊥‖/(β√k)] ≤ k Pr[‖Mu1‖ > ‖p⊥‖/(β√k)] = k Pr[‖M(u1/‖u1‖)‖² > β²/k] ≤ k exp((d′/2)(1 − β²/k + ln(β²/k))) ≤ k exp(−β²d′/(4k)), (6)

since β²/k − 1 − ln(β²/k) ≥ β²/(2k) for β²/k ≥ 8. By (5) and (6), and recalling β ≥ 5^5·k and d′ ≥ 8k, the probability that events (a) and (b) do not both happen is at most

5^{d′+k} β^{2k−d′} + k e^{−β²d′/(4k)} ≤ 5^{(1+1/8)d′} β^{(1/4−1)d′} + (β/2) e^{−β²d′/(4k)} ≤ (5^{9/8}/β^{1/4})^{d′} β^{−d′/2} + (1/10) β^{−d′/2} ≤ β^{−d′/2}.

Suppose that (a) and (b) happen. Fix a point w ∈ K, and let ¯r ∈ ¯L be the point in the projected lattice that is closest to w. By (a), d(q, ¯r) > 3‖p⊥‖/β. By (b) and the choice of ¯r, the k-dimensional cube with center ¯r and side length ‖p⊥‖/(β√k) contains w. This cube has diameter ‖p⊥‖/β. By the triangle inequality,

d(q, w) > d(q, ¯r) − d(¯r, w) ≥ (3/β − 1/β)‖p⊥‖ = 2‖p⊥‖/β.

Since w ∈ K was arbitrary, we get ∆q ≥ 2‖p⊥‖/β ≥ ∆p/β, and (ii) follows.

Let a query flat F be given. To answer Q1 queries, we compute ¯F and query D1 with ¯F to obtain a ((4k+3)(d′−k−1) + √(k+1))-approximate nearest neighbor ¯p. We return the original point p. To obtain Theorem 2.2, we argue that if ¯p is a ((4k+3)(d′−k−1) + √(k+1))-approximate nearest neighbor for ¯F, then p is an n^t-approximate nearest neighbor for F with high probability.

Let p∗ ∈ P be a point with d(p∗,F) = d(P,F). Set δ_{p∗} = d(p∗,F) and ¯δ_{p∗} = d(¯p∗,¯F). Denote by A1 the event that ¯δ_{p∗} ≤ 3δ_{p∗}. By Lemma 6.2(i), Pr[A1] ≥ 1 − e^{−d′/2} = 1 − e^{−1/t−1}. Let A2 be the event that for all points p ∈ P with δ_p = d(p,F) > n^t δ_{p∗} we have ¯δ_p = d(¯p,¯F) > 3((4k+3)(d′−k−1) + √(k+1)) δ_{p∗}. For a fixed p ∈ P, by setting β = n^t/(3((4k+3)(d′−k−1) + √(k+1))) in Lemma 6.2(ii), this probability is

Pr[¯δ_p > 3((4k+3)(d′−k−1) + √(k+1)) δ_{p∗}] ≥ 1 − (n^t/(3((4k+3)(d′−k−1) + √(k+1))))^{−d′/2} = 1 − n^{−1−t} (3((4k+3)(2/t+1−k) + √(k+1)))^{1/t+1} ≥ 1 − n^{−1−t/2},

for n large enough. By the union bound, we get Pr[A2] ≥ 1 − n^{−t/2}, so the event A1 ∩ A2 occurs with constant probability. Suppose both events occur. Under A1, the returned point p satisfies ¯δ_p ≤ ((4k+3)(d′−k−1) + √(k+1)) ¯δ_{p∗} ≤ 3((4k+3)(d′−k−1) + √(k+1)) δ_{p∗}, so A2 forces δ_p ≤ n^t δ_{p∗}. Then, p is an n^t-approximate nearest neighbor for F, as desired.

To answer a query of type Q3, we compute the projection ¯F and query D2 with parameter 3α. We obtain a set ¯R ⊂ ¯P in time O(n^{k/(k+1)} log^{O(1/t)} n + |¯R|). Let R ⊂ P be the corresponding d-dimensional set. We return a point p ∈ R that minimizes d(p,F). If δ_{p∗} ≤ α, the event A1 from above implies that ¯δ_{p∗} ≤ 3α, so ¯p∗ ∈ ¯R, and we correctly return p∗.

To bound the size of |¯R|, and thus the running time, we use that P is αn^t/(2k+1)-cluster-free. Let A3 be the event that for all p ∈ P with d(p,F) > αn^t/(2k+1), we have d(¯p,¯F) > 3((4k+3)(d′−k−1) + √(k+1))α. By the definition of cluster-freeness and the guarantee of Theorem 5.1, the event A3 implies |¯R| ≤ m. Using β = n^t/(3(2k+1)((4k+3)(d′−k−1) + √(k+1))) in Lemma 6.2(ii) and doing a similar calculation as above again yields Pr[A3] ≥ 1 − n^{−t/2}. Thus, we can answer queries of type Q3 successfully in time O(n^{k/(k+1)} log^{O(1/t)} n + m) with constant probability, as claimed in Theorem 2.3.

Conclusion

We have described the first provably efficient data structure for general k-ANN. Our main technical contribution consists of two new data structures: the cluster data structure for high-dimensional k-ANN queries, and the projection data structure for k-ANN queries in constant dimension. We have only presented the latter structure for a concrete approximation factor (4k+3)(d−k−1) + √(k+1), but we believe that it is possible to extend it to any fixed approximation factor c > 1. For this, one would need to subdivide the slab structures by a sufficiently fine sequence of parallel hyperplanes. Naturally, the most pressing open question is to improve the query time of our data structure. Also, a further generalization to more general query or data objects would be of interest.

Acknowledgments
This work was initiated while WM, PS, and YS were visiting the Courant Institute of Mathematical Sciences. We would like to thank our host Esther Ezra for her hospitality and many enlightening discussions.
References

[1] A. Andoni. Nearest Neighbor Search: the Old, the New, and the Impossible. PhD thesis, Massachusetts Institute of Technology, 2009.
[2] A. Andoni, P. Indyk, R. Krauthgamer, and H. L. Nguyen. Approximate line nearest neighbor in high dimensions. In Proc. 20th Annu. ACM-SIAM Sympos. Discrete Algorithms (SODA), pages 293–301, 2009.
[3] A. Andoni, P. Indyk, H. L. Nguyen, and I. Razenshteyn. Beyond locality-sensitive hashing. In Proc. 24th Annu. ACM-SIAM Sympos. Discrete Algorithms (SODA), pages 1018–1028, 2014.
[4] R. Basri, T. Hassner, and L. Zelnik-Manor. Approximate nearest subspace search with applications to pattern recognition. In Proc. Conference on Computer Vision and Pattern Recognition (CVPR), pages 1–8, 2007.
[5] T. M. Chan. Optimal partition trees. Discrete Comput. Geom., 47(4):661–690, 2012.
[6] B. Chazelle. The discrepancy method. Randomness and complexity. Cambridge University Press, 2000.
[7] K. L. Clarkson. A randomized algorithm for closest-point queries. SIAM J. Comput., 17(4):830–847, 1988.
[8] S. Dasgupta and A. Gupta. An elementary proof of a theorem of Johnson and Lindenstrauss. Random Structures Algorithms, 22(1):60–65, 2003.
[9] R. A. Horn and C. R. Johnson. Matrix analysis. Cambridge University Press, second edition, 2013.
[10] P. Indyk. Nearest neighbors in high-dimensional spaces. In J. E. Goodman and J. O'Rourke, editors, Handbook of Discrete and Computational Geometry, chapter 39. CRC Press, 2nd edition, 2004.
[11] P. Indyk and R. Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality. In Proc. 30th Annu. ACM Sympos. Theory Comput. (STOC), pages 604–613, 1998.
[12] E. Kushilevitz, R. Ostrovsky, and Y. Rabani. Efficient search for approximate nearest neighbor in high dimensional spaces. In Proc. 30th Annu. ACM Sympos. Theory Comput. (STOC), pages 614–623, 1998.
[13] Q. Lv, W. Josephson, Z. Wang, M. Charikar, and K. Li. Multi-probe LSH: Efficient indexing for high-dimensional similarity search. In Proc. 33rd International Conference on Very Large Data Bases (VLDB), pages 950–961, 2007.
[14] A. Magen. Dimensionality reductions in ℓ2 that preserve volumes and distance to affine spaces. Discrete Comput. Geom., 38(1):139–153, 2007.
[15] S. Mahabadi. Approximate nearest line search in high dimensions. In Proc. 26th Annu. ACM-SIAM Sympos. Discrete Algorithms (SODA), page to appear, 2015.
[16] J. Matoušek. Efficient partition trees. Discrete Comput. Geom., 8(3):315–334, 1992.
[17] J. Matoušek. Lectures on discrete geometry. Springer-Verlag, 2002.
[18] S. Meiser. Point location in arrangements of hyperplanes. Inform. and Comput., 106(2):286–303, 1993.
[19] R. Panigrahy. Entropy based nearest neighbor search in high dimensions. In