Optimal terminal dimensionality reduction in Euclidean space
Shyam Narayanan ∗ Jelani Nelson † October 23, 2018
Abstract
Let ε ∈ (0, 1) and X ⊂ R^d be arbitrary, with |X| having size n > 1. The Johnson-Lindenstrauss lemma states there exists f : X → R^m with m = O(ε^{-2} log n) such that

∀x ∈ X ∀y ∈ X, ‖x − y‖ ≤ ‖f(x) − f(y)‖ ≤ (1 + ε)‖x − y‖.

We show that a strictly stronger version of this statement holds, answering one of the main open questions of [MMMR18]: "∀y ∈ X" in the above statement may be replaced with "∀y ∈ R^d", so that f not only preserves distances within X, but also distances to X from the rest of space. Previously this stronger version was only known with the worse bound m = O(ε^{-4} log n). Our proof is via a tighter analysis of (a specific instantiation of) the embedding recipe of [MMMR18].

∗ Harvard University. [email protected]. Supported by a PRISE fellowship and a Herchel-Smith Fellowship.
† Harvard University. [email protected]. Supported by NSF CAREER award CCF-1350670, NSF grant IIS-1447471, ONR grant N00014-18-1-2562, ONR DORECG award N00014-17-1-2127, an Alfred P. Sloan Research Fellowship, and a Google Faculty Research Award.

1 Introduction

Metric embeddings may play a role in algorithm design when input data is geometric: the technique is applied as a pre-processing step to map input data living in some metric space (X, d_X) into some target space (Y, d_Y) that is algorithmically friendlier. One common approach is that of dimensionality reduction, in which X and Y are both subspaces of the same normed space, but where Y is of much lower dimension. Working with lower-dimensional embedded data then typically results in efficiency gains in terms of memory, running time, and/or other resources. A cornerstone result in this area is the Johnson-Lindenstrauss (JL) lemma [JL84], which provides dimensionality reduction for Euclidean space.

Lemma 1.1 ([JL84]). Let ε ∈ (0, 1) and X ⊂ R^d be arbitrary, with |X| having size n > 1. There exists f : X → R^m with m = O(ε^{-2} log n) such that

∀x, y ∈ X, ‖x − y‖ ≤ ‖f(x) − f(y)‖ ≤ (1 + ε)‖x − y‖.    (1)

It has recently been shown that the dimension m of the target Euclidean space achieved by the JL lemma is best possible, at least for ε ≫ 1/√min(n, d) [LN17] (see also [AK17]). The multiplicative factor on the right-hand side of Eqn. (1), in this case 1 + ε, is referred to as the distortion of the embedding f.
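To make Lemma 1.1 concrete, here is a minimal numpy sketch (our illustration, not code from the paper) of its standard random-projection instantiation; the constant in the choice of m is ad hoc.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, eps = 100, 1000, 0.25
X = rng.normal(size=(n, d))                # an arbitrary n-point set in R^d

m = int(8 * np.log(n) / eps**2)            # m = O(eps^{-2} log n); constant is illustrative
Pi = rng.normal(size=(m, d)) / np.sqrt(m)  # i.i.d. N(0,1) entries, scaled by 1/sqrt(m)
Y = X @ Pi.T                               # f(x) = Pi x

# empirical distortion over all pairs; w.h.p. all ratios lie in [1 - eps, 1 + eps]
# (rescaling f by 1/(1 - eps) then puts them in [1, 1 + O(eps)] as in Eqn. (1))
ratios = [np.linalg.norm(Y[i] - Y[j]) / np.linalg.norm(X[i] - X[j])
          for i in range(n) for j in range(i + 1, n)]
print(min(ratios), max(ratios))
```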
Recent work of Elkin et al. [EFN17] showed a stronger form of Euclidean dimensionality reduction for the case of constant distortion. Namely, they showed that in Eqn. (1), whereas x is taken as an arbitrary point in X, y may be taken as an arbitrary point in R^d. They called such an embedding a terminal embedding. Though rather than achieving terminal distortion 1 + ε, their work only showed how to achieve constant terminal distortion with m = O(log n), for a constant that could be made arbitrarily close to √10 (see [EFN17, Theorem 1]).
Terminal embeddings can be useful in static high-dimensional computational geometry data structural problems. For example, consider nearest neighbor search over some finite database X ⊂ R^d. If one builds a data structure over f(X) for a terminal embedding f, then any future query is guaranteed to be handled correctly (or, at least, the embedding will not be the source of failure). Contrast this with the typical approach, where one uses a randomized embedding oblivious to the input (e.g. random projections) that preserves the distance between any fixed pair of vectors with probability 1 − 1/poly(n). One can verify that such an embedding preserves distances within X during pre-processing, but for any later query there is some non-zero probability that the embedding will fail to preserve the distance between the query point q and some points in X.

Subsequent to [EFN17], work of Mahabadi et al. [MMMR18] gave a construction for terminal dimensionality reduction in Euclidean space achieving terminal distortion 1 + ε, with m = O(ε^{-4} log n). They asked as one of their main open questions (see [MMMR18, Open Problem 3]) whether it is possible to achieve this terminal embedding guarantee with m = O(ε^{-2} log n), which would be optimal given the JL lower bound of [LN17]. Our contribution in this work is to resolve this question affirmatively; the following is our main theorem.

Theorem 1.1.
Let ε ∈ (0, 1) and X ⊂ R^d be arbitrary, with |X| having size n > 1. There exists f : R^d → R^m with m = O(ε^{-2} log n) such that

∀x ∈ X, ∀y ∈ R^d, ‖x − y‖ ≤ ‖f(x) − f(y)‖ ≤ (1 + ε)‖x − y‖.

The embedding we analyze in this work is in fact one that fits within the family introduced in [MMMR18]; our contribution is to provide a sharper analysis. We note that unlike [MMMR18], which provided a deterministic polynomial time construction of the terminal embedding, we here only provide a Monte Carlo polynomial time construction algorithm. Our error probability, though, comes only from mapping the points in X: if the points in X are mapped to R^m well (with low "convex hull distortion", which we define later), which happens with high probability, then our final terminal embedding is guaranteed to have low terminal distortion as a map from all of R^d to R^m.

Remark.
Unlike in the JL lemma, for which the embedding f may be linear, the terminal embedding we analyze here (as was also the case in [EFN17, MMMR18]) is nonlinear, as it must be. To see that it must be nonlinear, consider that any linear embedding with constant terminal distortion, x ↦ Πx for some matrix Π ∈ R^{m×d}, must have Π(x − y) ≠ 0 for any x ∈ X and any y ∈ R^d on the unit sphere centered at x. In other words, one needs ker(Π) = {0}, which is impossible unless m ≥ d.

1.1 Overview of approach

We outline both the approach of previous works as well as our own. In all these approaches, one starts with an embedding f : X → ℓ_2^m with good distortion, then defines an outer extension f_Ext as introduced in [EFN17] and defined explicitly in [MMMR18]. More generally, a terminal embedding from (X, d_X) into (Y, d_Y) with terminal set K ⊂ X and terminal distortion α is a map f : X → Y such that for some c > 0,

d_X(u, w) ≤ c · d_Y(f(u), f(w)) ≤ α · d_X(u, w) for all u ∈ K, w ∈ X.

Definition 1.1. For f : X → R^m and Z ⊃ X, we say g : Z → R^{m′} is an outer extension of f if m′ ≥ m, and g(x) for x ∈ X has its first m entries equal to f(x) and its last m′ − m entries all 0.

In [EFN17, MMMR18] and the current work, a terminal embedding is obtained by, for each u ∈ R^d \ X, defining an outer extension f^(u)_Ext for Z = X ∪ {u} with m′ = m + 1. Since f^(u)_Ext and f^(u′)_Ext act identically on X for any u, u′ ∈ R^d, we can then define our final terminal embedding by f̃(u) = (f(u), 0) for u ∈ X and f̃(u) = f^(u)_Ext(u) for u ∈ R^d \ X, which will have terminal distortion sup_{u ∈ R^d \ X} Dist(f^(u)_Ext), where Dist(g) denotes the distortion of g. The main task is thus creating an outer extension with low distortion for a set Z = X ∪ {u} for some u. In all the cases that follow, we will have f_Ext(x) = (f(x), 0) for x ∈ X, and we will then specify how to embed points u ∉ X.
The construction of Elkin et al. [EFN17], the "EFN extension", is as follows. Suppose f : X → ℓ_p^m is an α-distortion embedding. The EFN extension f_EFN : X ∪ {u} → ℓ_p^{m+1} is then defined by f_EFN(u) = (f(ρ_{ℓ_p}(u)), d(ρ_{ℓ_p}(u), u)), where ρ_d(u) := argmin_{x∈X} d(x, u) for a metric d. In the case that we view X ∪ {u} as living in the metric space ℓ_p^d, Elkin et al. showed that the EFN extension has terminal distortion at most 2^{(p−1)/p} · ((2α)^p + 1)^{1/p} [EFN17, Theorem 1]. If p = 2 and f is obtained via the JL lemma so as to have distortion α ≤ 1 + ε, this implies the EFN extension has terminal distortion at most √10 + O(ε).
The EFN extension does not in general achieve terminal distortion 1 + ε, even if starting with f a perfect isometry. In fact, the bound of [EFN17] showing distortion at most √10 is sharp. Consider for example X = {−1, 0, 2} ⊂ R^1 and the identity map f(x) = x, which has distortion 1 as a map from (X, ℓ_2) to (R, ℓ_2). Then f_EFN(−1) = (−1, 0), f_EFN(0) = (0, 0), f_EFN(2) = (2, 0), and for the terminal point u = 1 (whose closest point in X we take to be 0), f_EFN(1) = (0, 1). Then ‖f_EFN(1) − f_EFN(−1)‖ = √2 = (1/√2)·|1 − (−1)|, whereas ‖f_EFN(1) − f_EFN(2)‖ = √5 = √5·|1 − 2|, so f_EFN has terminal distortion at least (in fact exactly equal to) √10.
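The EFN extension and the example above are easy to check numerically; the following sketch (ours, for illustration) reproduces the √10 bound, with f the identity embedding and ties in the argmin broken toward 0 as in the example.

```python
import numpy as np

X = np.array([-1.0, 0.0, 2.0])

def f_efn(u):
    """EFN extension of the identity embedding: (f(rho(u)), d(rho(u), u))."""
    k = np.argmin(np.abs(X - u))     # rho(u); argmin returns the first minimizer
    return np.array([X[k], np.abs(X[k] - u)])

emb = {x: np.array([x, 0.0]) for x in X}   # points of X map to (f(x), 0)
u = 1.0                                    # terminal point; rho(1) = 0
ratios = [np.linalg.norm(f_efn(u) - emb[x]) / abs(u - x) for x in X]
print(max(ratios) / min(ratios))           # 3.1622... = sqrt(10)
```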
This example in fact shows sharpness for ℓ_p for all p ≥ 1. To achieve terminal distortion 1 + ε, the work of [MMMR18] had to develop a new outer extension, the "MMMR extension", which they based on the following core lemma.

Lemma 1.2 ([MMMR18, Lemma 3.1 (rephrased)]). Let X be a finite subset of ℓ_2^d, and suppose f : X → ℓ_2^m has distortion 1 + γ. Fix some u ∈ R^d, and define x_1 := ρ_{ℓ_2}(u), the closest point in X to u. Then ∃u′ ∈ R^m s.t.

• ‖u′ − f(x_1)‖ ≤ ‖u − x_1‖, and
• ∀x ∈ X, |⟨u′ − f(x_1), f(x) − f(x_1)⟩ − ⟨u − x_1, x − x_1⟩| ≤ 3√γ·(‖x − x_1‖² + ‖u − x_1‖²).

Mahabadi et al. then used the u′ promised by Lemma 1.2 as part of a construction that takes a (1 + γ)-distortion embedding f : X → ℓ_2^m and uses it in a black box way to construct an outer extension f_MMMR : X ∪ {u} → ℓ_2^{m+1}. In particular, they define f_MMMR(u) = (u′, √(‖u − x_1‖² − ‖u′ − f(x_1)‖²)). This map perfectly preserves the distance from u to x_1, since ‖f_MMMR(u) − f_MMMR(x_1)‖² = ‖u′ − f(x_1)‖² + (‖u − x_1‖² − ‖u′ − f(x_1)‖²) = ‖u − x_1‖²; in [MMMR18, Theorem 1.5], it is furthermore shown that f_MMMR preserves distances from u to all of X up to a 1 + O(√γ) factor. Thus one should set γ = Θ(ε²) so that f_MMMR has distortion 1 + ε, which is achieved by starting with an f guaranteed by the JL lemma with m = Θ(ε^{-4} log n). They then showed that this loss is tight, in the sense that there exist X, u, and f : X → R^m, where f has distortion 1 + γ, such that any outer extension f_Ext to domain X ∪ {u} has distortion 1 + Ω(√γ) [MMMR18, Section 3.2]. Thus, seemingly, a new approach is needed to achieve m = O(ε^{-2} log n).

One may be discouraged by the above-mentioned tightness of the γ → Ω(√γ) loss, but in this work we show that, in fact, the MMMR extension can be made to provide 1 + ε distortion with the optimal m = O(ε^{-2} log n)! The tightness result mentioned in the last paragraph is only an obstacle to using the low-distortion property of f in a black box way, as it is only shown that there exist f where the γ → Ω(√γ) loss is necessary. However, the f we are using is not an arbitrary f, but rather the f obtained via the JL lemma. A standard way of proving the JL lemma is to choose Π ∈ R^{m×d} with i.i.d. subgaussian entries, scaled by 1/√m, for m = Θ(ε^{-2} log n) [Ver18, Exercise 5.3.3]. The low-distortion embedding is then f(x) = Πx. We show in this work that by using not just that this f is a low-distortion embedding for X, but rather that it satisfies a stronger property we dub convex hull distortion (which we show x ↦ Πx does satisfy with high probability), one can achieve the desired terminal embedding result with optimal m.
Definition 1.2. For T ⊂ S^{d−1} a subset of the unit sphere in R^d and ε ∈ (0, 1), we say that Π ∈ R^{m×d} provides ε-convex hull distortion for T if

∀x ∈ conv(T), | ‖Πx‖ − ‖x‖ | < ε,

where conv(T) := { Σ_i λ_i t_i : ∀i, t_i ∈ T, λ_i ≥ 0, Σ_i λ_i = 1 } denotes the convex hull of T.

We show that a random Π with subgaussian entries provides ε-convex hull distortion for T with probability at least 1 − δ as long as m = Ω(ε^{-2} log(|T|/(εδ))). We then replace Lemma 1.2 with a new lemma showing that as long as f(x) = Πx not only has 1 + γ distortion on X, but in fact provides γ-convex hull distortion for T = {(x − y)/‖x − y‖ : x ≠ y ∈ X}, then the 3√γ term on the RHS of the second bullet of Lemma 1.2 can be replaced with O(γ) (plus one other technical improvement; see Lemma 3.1). Note |T| ≤ n(n − 1), so log |T| = O(log n). We then show how to use the modified lemma to achieve an outer extension with m = O(ε^{-2} log(n/ε)) = O(ε^{-2} log n). This last equality holds since we may assume ε = Ω(1/√n): otherwise there is a trivial terminal embedding with m = n = O(ε^{-2}) and no distortion. Indeed, if d ≤ n, take the identity map. Else, translate X so that 0 ∈ X; then E := span(X) has dim(E) ≤ n − 1.
By rotation, we may assume E = span{e_1, . . . , e_{n−1}}, so that every x ∈ E can be written as Σ_{i=1}^{n−1} α(x)_i e_i for some vector α(x) ∈ R^{n−1}. We can then define a terminal embedding f̃ : R^d → R^n with f̃(x) = (α(proj_E(x)), ‖proj_{E⊥}(x)‖) for all x ∈ R^d, where proj_E denotes orthogonal projection onto E; a minimal sketch of this map appears below.
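The following numpy sketch (our illustration) implements this zero-distortion terminal embedding, using an SVD to obtain an orthonormal basis of E; distances from any u ∈ R^d to every point of X are preserved exactly.

```python
import numpy as np

def trivial_terminal_embedding(X, tol=1e-12):
    """Zero-distortion terminal embedding into R^n for n points in R^d, useful when d > n."""
    n, d = X.shape
    base = X[0]                                    # translate so that 0 is in X
    U, s, _ = np.linalg.svd((X - base).T, full_matrices=False)
    U = U[:, s > tol]                              # orthonormal basis of E = span(X - base)

    def f(u):
        v = u - base
        alpha = U.T @ v                            # coordinates of proj_E(u); at most n-1 of them
        resid = np.linalg.norm(v - U @ alpha)      # ||proj_{E^perp}(u)||; zero for points of X
        out = np.zeros(n)
        out[:alpha.size] = alpha
        out[-1] = resid
        return out
    return f

rng = np.random.default_rng(1)
X, u = rng.normal(size=(5, 50)), rng.normal(size=50)
f = trivial_terminal_embedding(X)
print(max(abs(np.linalg.norm(f(u) - f(x)) - np.linalg.norm(u - x)) for x in X))  # ~1e-15
```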
2 Preliminaries

For our optimal terminal embedding analysis, we rely on two previous results. The first is the von Neumann minimax theorem [vN28], which was also used in the terminal embedding analysis of [MMMR18]. The theorem states the following:

Theorem 2.1.
Let X ⊂ R^n and Y ⊂ R^m be compact convex sets, and suppose that f : X × Y → R is a continuous function satisfying the following properties:

1. f(·, y) : X → R is convex for any fixed y ∈ Y,
2. f(x, ·) : Y → R is concave for any fixed x ∈ X.

Then we have that min_{x∈X} max_{y∈Y} f(x, y) = max_{y∈Y} min_{x∈X} f(x, y).

The second result is a result of Dirksen [Dir15, Dir16] that provides a uniform tail bound on subgaussian empirical processes. To explain the result, we first make the following definitions:

Definition 2.1.
A semi-metric d on a space X is a function d : X × X → R_{≥0} such that d(x, x) = 0, d(x, y) = d(y, x), and d(x, y) + d(y, z) ≥ d(x, z) for all x, y, z ∈ X. Note that a semi-metric may have d(x, y) = 0 for some x ≠ y.

Definition 2.2.
Given a semi-metric d on T and a subset S ⊂ T, define d(t, S) := inf_{s∈S} d(t, s) for any point t ∈ T.

Definition 2.3.
Given a semi-metric d on T, define

γ_2(T, d) := inf_{{S_r}_{r=0}^∞} sup_{t∈T} Σ_{r≥0} 2^{r/2} · d(t, S_r),

where the infimum runs over all sequences of subsets S_0 ⊂ S_1 ⊂ · · · ⊂ T with |S_0| = 1 and |S_r| ≤ 2^{2^r}.

Definition 2.4.
For any random variable X, we define the subgaussian norm of X as

‖X‖_{ψ_2} := inf{ C ≥ 0 : E(e^{X²/C²}) ≤ 2 }.

It is well known that the subgaussian norm as defined above indeed defines a norm [BLM13]. We also note the following proposition:
Proposition 2.1 ([BLM13]). There exists some constant k such that if X_1, . . . , X_n are i.i.d. subgaussians with mean 0 and subgaussian norm C, then for any a_1, . . . , a_n ∈ R, the sum a_1X_1 + · · · + a_nX_n is subgaussian with subgaussian norm at most k‖a‖_2 C, where ‖a‖_2 = √(a_1² + · · · + a_n²).

Theorem 2.2 ([Dir16, Theorem 3.2]). Let T be a set, and suppose that for every t ∈ T and every 1 ≤ i ≤ m, X_{t,i} is a random variable with finite expected value and variance. For each t ∈ T, let

A_t = (1/m) Σ_{i=1}^m ( X_{t,i}² − E X_{t,i}² ).

Consider the following semi-metric on T: for s, t ∈ T, define d_{ψ_2}(s, t) := max_{1≤i≤m} ‖X_{s,i} − X_{t,i}‖_{ψ_2}, and define ∆_{ψ_2}(T) := sup_{t∈T} max_{1≤i≤m} ‖X_{t,i}‖_{ψ_2}. Then there exist constants c, C > 0 such that for all u ≥ 1,

P[ sup_{t∈T} |A_t| ≥ C ( γ_2²(T, d_{ψ_2})/m + ∆_{ψ_2}(T)·γ_2(T, d_{ψ_2})/√m ) + c ( √u·∆_{ψ_2}²(T)/√m + u·∆_{ψ_2}²(T)/m ) ] ≤ e^{−u}.

3 Construction of the Terminal Embedding

3.1 Convex hull distortion

Here we show that for any set X = {x_1, . . . , x_n} ⊂ R^d of unit norm vectors, there exists Π ∈ R^{m×d} with m = O(ε^{-2} log n) providing ε-convex hull distortion for X, as defined in Definition 1.2. If ε^{-2} > n, this follows by projecting onto a subspace spanning x_1, . . . , x_n and choosing an orthonormal basis. For ε^{-2} < n, our goal is to show that if Π ∈ R^{m×d} is a random matrix with i.i.d. subgaussian entries, normalized by 1/√m, then Π provides ε-convex hull distortion with high probability.

Define T = conv(X). We apply Theorem 2.2 as follows. For some m which we will choose later, let Π′ be a matrix in R^{m×d} with i.i.d. subgaussian entries with mean 0, variance 1, and some subgaussian norm C, and let Π = Π′/√m be the rescaled matrix. Let Π′_i denote the i-th row of Π′, and for any t ∈ R^d and 1 ≤ i ≤ m, let X_{t,i} = ⟨Π′_i, t⟩. Finally, let T_k be the subset of T of points with norm at most 2^{−k}, i.e., T_k = {x ∈ T : ‖x‖ ≤ 2^{−k}}.

First, note that

A_t := (1/m) Σ_{i=1}^m ( ⟨Π′_i, t⟩² − E ⟨Π′_i, t⟩² ) = (1/m) ( ‖Π′t‖² − m‖t‖² ) = ‖Πt‖² − ‖t‖²,

so A_t corresponds to the definition in Theorem 2.2. Note that for any s, t ∈ T and any 1 ≤ i ≤ m, X_{s,i} − X_{t,i} = X_{s−t,i}, which is subgaussian with mean 0, variance ‖s − t‖², and subgaussian norm at most k‖s − t‖C by Proposition 2.1. Therefore, if we define d_{ψ_2}(s, t) = max_{1≤i≤m} ‖X_{s,i} − X_{t,i}‖_{ψ_2} as in Theorem 2.2, then d_{ψ_2}(s, t) ≤ k‖s − t‖C. As a result, we have the following:
Proposition 3.1. ∆_{ψ_2}(T_k) = O(2^{−k}).

Proof. Since any point t ∈ T_k has Euclidean norm at most 2^{−k}, and ‖X_{t,i}‖_{ψ_2} ≤ k‖t‖C by Proposition 2.1, the conclusion is immediate.

We also note the following:

Proposition 3.2. γ_2(T_k, ℓ_2) = O(√log n), where ℓ_2 represents the standard Euclidean distance.

Proof. The majorizing measures theorem [Tal14] gives γ_2(T_k, ℓ_2) = Θ( E_g sup_{x∈T_k} ⟨g, x⟩ ), where g is a d-dimensional vector of i.i.d. standard normal random variables. Also, E_g(sup_{x∈T_k} ⟨g, x⟩) ≤ E_g(sup_{x∈T} ⟨g, x⟩) because T_k ⊂ T. However, since T = conv(X) and ⟨g, x⟩ is a linear function of x, we have sup_{x∈T} ⟨g, x⟩ = sup_{x∈X} ⟨g, x⟩. Since the x_i's are unit norm vectors, each g_i := ⟨g, x_i⟩ is standard normal, which implies (even if the g_i are dependent) that E sup_i |g_i| = O(√log n). This means that γ_2(T_k, ℓ_2) = O( E_g(sup_{x∈T} ⟨g, x⟩) ) = O(√log n), so the proof is complete.

This allows us to bound γ_2(T_k, d_{ψ_2}) as follows.

Corollary 3.1. γ_2(T_k, d_{ψ_2}) = O(√log n).

Proof. As d_{ψ_2}(s, t) = O(‖s − t‖) for any points s, t, the result follows from Proposition 3.2.

Therefore, applying Theorem 2.2 with T = T_k and varying k gives us the following.

Theorem 3.1. Suppose that ε, δ ∈ (0, 1) and m = Θ( ε^{-2} log( n log(2/ε) / δ ) ). Then there exists some constant C such that for all k ≥ 0,

P[ sup_{t∈T_k} | ‖Πt‖² − ‖t‖² | ≥ C( ε² + ε·2^{−k} ) ] ≤ δ/(n log(2/ε)).

Consequently, with probability at least 1 − δ/n, we have that | ‖Πt‖ − ‖t‖ | = O(ε) for all t ∈ T.

Proof.
The first part follows from Theorem 2.2 and the previous propositions. Note that sup_{t∈T_k} |‖Πt‖² − ‖t‖²| = sup_{t∈T_k} |A_t|. Moreover, γ_2²(T_k, d_{ψ_2})/m = O(ε²) and ∆_{ψ_2}(T_k)·γ_2(T_k, d_{ψ_2})/√m = O(ε·2^{−k}). If we let u = ln( n log(2/ε)/δ ), then √u·∆_{ψ_2}²(T_k)/√m = O(ε·4^{−k}) and u·∆_{ψ_2}²(T_k)/m = O(ε²·4^{−k}). The first result now follows immediately from Theorem 2.2, as e^{−u} = δ/(n log(2/ε)). Note that it is possible for T_k to be empty, but in this case sup_{t∈T_k} |A_t| = 0, so the first result is immediate.

For the second part, assume WLOG that ε^{-1} = 2^ℓ for some integer ℓ. Define T′_0, T′_1, . . . , T′_ℓ such that T′_ℓ = T_ℓ and T′_k = T_k \ T_{k+1} for all k < ℓ. Then T′_0, . . . , T′_ℓ forms a partition of T, since ‖x‖ ≤ 1 for all x ∈ T. If t ∈ T′_ℓ, then with probability at least 1 − δ/(n log(2/ε)) we have |‖Πt‖² − ‖t‖²| = O(ε²), and thus ‖Πt‖ = O(ε) and |‖Πt‖ − ‖t‖| = O(ε), as ‖t‖ ≤ ε. If t ∈ T′_k for some k < ℓ, then with probability at least 1 − δ/(n log(2/ε)),

|‖Πt‖ − ‖t‖| · (‖Πt‖ + ‖t‖) = |‖Πt‖² − ‖t‖²| = O(ε·2^{−k}).

Since ‖Πt‖ + ‖t‖ ≥ ‖t‖ ≥ 2^{−(k+1)}, we must have |‖Πt‖ − ‖t‖| = O(ε). Therefore, by union bounding over all 0 ≤ k ≤ ℓ, with probability at least

1 − (ℓ + 1)·δ/(n log(2/ε)) = 1 − (ℓ + 1)·δ/(n(ℓ + 1)) = 1 − δ/n,

we have |‖Πt‖ − ‖t‖| = O(ε) for all t ∈ T. Thus, we are done.

We therefore have the following immediate corollary.
Corollary 3.2.
For ε^{-2} < n and any X = {x_1, . . . , x_n} ⊂ S^{d−1}, with probability at least 1 − 1/poly(n), a randomly chosen Π with m = Ω(ε^{-2} log n) provides ε-convex hull distortion for X.

3.2 Completing the terminal embedding

We note that the methods for completing the terminal embedding in this section are very similar to those in [MMMR18]. Specifically, our proofs of Lemmas 3.1 and 3.2 are based on the proofs of [MMMR18, Lemma 3.1] and [MMMR18, Theorem 1.5], respectively.
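A quick numerical sanity check of Corollary 3.2 (our illustration; the constant in m and the sampling scheme are ad hoc, and random sampling only probes, rather than certifies, the supremum over the hull):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, eps = 40, 1500, 0.25
X = rng.normal(size=(n, d))

# Y = normalized pairwise differences, as in Lemma 3.2 below
diffs = np.array([X[i] - X[j] for i in range(n) for j in range(n) if i != j])
Y = diffs / np.linalg.norm(diffs, axis=1, keepdims=True)

m = int(6 * np.log(n) / eps**2)                 # m = O(eps^{-2} log n)
Pi = rng.normal(size=(m, d)) / np.sqrt(m)

worst = 0.0
for _ in range(500):                            # random sparse points of conv(Y)
    idx = rng.choice(len(Y), size=8, replace=False)
    lam = rng.dirichlet(np.ones(8))
    t = lam @ Y[idx]
    worst = max(worst, abs(np.linalg.norm(Pi @ t) - np.linalg.norm(t)))
print(worst)                                    # should come out well below eps
```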
Lemma 3.1.
Let x_1, . . . , x_n be nonzero points in R^d and let v_i = x_i/‖x_i‖. Suppose that Π provides ε-convex hull distortion for V = {v_1, −v_1, . . . , v_n, −v_n}. Then, for any u ∈ R^d, there exists u′ ∈ R^m such that ‖u′‖ ≤ ‖u‖ and |⟨u′, Πx_i⟩ − ⟨u, x_i⟩| ≤ ε‖u‖·‖x_i‖ for every x_i.

Proof. This statement is trivial for u = 0, so assume u ≠ 0. It suffices to show that there always exists u′ with ‖u′‖ ≤ ‖u‖ and |⟨u′, Πv_i⟩ − ⟨u, v_i⟩| ≤ ε‖u‖ for all v_i; the bound |⟨u′, Πx_i⟩ − ⟨u, x_i⟩| ≤ ε‖u‖·‖x_i‖ then follows by scaling. Now, let B be the ball in R^m of radius ‖u‖, and let Λ be the unit ℓ_1-ball in R^n, where we write λ ∈ Λ as (λ_1, . . . , λ_n). Furthermore, define for u′ ∈ B, λ ∈ Λ,

Φ(u′, λ) := Σ_{i=1}^n ( λ_i(⟨u, v_i⟩ − ⟨u′, Πv_i⟩) − ε|λ_i|·‖u‖ ).

We wish to show that there exists u′ ∈ B such that Φ(u′, λ) ≤ 0 for all λ ∈ Λ; this clearly suffices, as seen by taking λ = ±e_j for 1 ≤ j ≤ n, where e_j is the j-th standard basis vector.

Note that Φ is linear in u′ and concave in λ, and that B and Λ are compact and convex. Then, by the von Neumann minimax theorem [vN28],

min_{u′∈B} max_{λ∈Λ} Φ(u′, λ) = max_{λ∈Λ} min_{u′∈B} Φ(u′, λ).

Thus it suffices to show the right-hand side is nonpositive, i.e. that for any λ there exists u′ ∈ B with Φ(u′, λ) ≤ 0. Define P := Σ_i λ_i v_i and take u′ = ‖u‖·ΠP/‖ΠP‖ (or u′ = 0 if ΠP = 0); it then suffices to show that for all u, λ,

⟨u, P⟩ − ⟨u′, ΠP⟩ = ⟨u, P⟩ − ‖u‖·‖ΠP‖ ≤ ε‖λ‖_1·‖u‖.

But in fact ⟨u, P⟩ ≤ ‖u‖·‖P‖, so having ‖P‖ − ‖ΠP‖ ≤ ε‖λ‖_1 for all λ ∈ Λ is sufficient. For ‖λ‖_1 = 1 this follows from Π providing ε-convex hull distortion for V, as P is then a convex combination of elements of V; for 0 < ‖λ‖_1 < 1 it follows because we can rescale λ so that ‖λ‖_1 = 1, and the case λ = 0 is trivial.
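Although the proof of Lemma 3.1 is non-constructive (via the minimax theorem), the promised u′ is the solution of a convex feasibility problem and can be searched for numerically. Below is a sketch (our illustration) using an epigraph formulation solved with scipy's general-purpose SLSQP solver, standing in for the semidefinite programming approach mentioned in Subsection 3.3; the helper find_u_prime is ours, not from the paper.

```python
import numpy as np
from scipy.optimize import minimize

def find_u_prime(Pi, V, u):
    """Search for the u' promised by Lemma 3.1: ||u'|| <= ||u|| while making
    max_i |<u', Pi v_i> - <u, v_i>| small, where the unit vectors v_i are the rows of V.

    Lemma 3.1 guarantees the optimum is at most eps * ||u|| whenever Pi provides
    eps-convex hull distortion for the v_i and their negations.
    """
    PV = V @ Pi.T                       # row i is (Pi v_i)^T
    b = V @ u                           # b_i = <u, v_i>
    R = np.linalg.norm(u)
    m = Pi.shape[0]

    # epigraph formulation: variables z = (u', t), minimize t
    cons = [
        {"type": "ineq", "fun": lambda z: R**2 - z[:-1] @ z[:-1]},     # ||u'||^2 <= ||u||^2
        {"type": "ineq", "fun": lambda z: z[-1] - (PV @ z[:-1] - b)},  # t >=  (<u', Pi v_i> - <u, v_i>)
        {"type": "ineq", "fun": lambda z: z[-1] + (PV @ z[:-1] - b)},  # t >= -(<u', Pi v_i> - <u, v_i>)
    ]
    z0 = np.zeros(m + 1)
    z0[-1] = np.abs(b).max()            # u' = 0 is feasible with this value of t
    res = minimize(lambda z: z[-1], z0, constraints=cons, method="SLSQP")
    return res.x[:-1], res.x[-1]        # u' and the achieved max deviation
```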
Lemma 3.2. Let x_1, . . . , x_n ∈ R^d be distinct, and let Y = { (x_i − x_j)/‖x_i − x_j‖ : i ≠ j }. Moreover, suppose that Π ∈ R^{m×d} provides ε-convex hull distortion for Y. Then, for any u ∈ R^d, there exists an outer extension f : {x_1, . . . , x_n, u} → R^{m+1} with distortion 1 + O(ε), where f(x_i) = (Πx_i, 0).

Proof. Note that the map x ↦ Πx yields a (1 + ε)-distortion embedding of x_1, . . . , x_n into R^m, as it approximately preserves the norms of all points in Y. Therefore, we just have to verify that there exists f(u) ∈ R^{m+1} such that ‖f(u) − f(x_i)‖ = (1 ± O(ε))‖u − x_i‖ for all i and any u ∈ R^d.

Fix u ∈ R^d, and let x_k be the closest point to u among {x_1, . . . , x_n}. By Lemma 3.1 (applied to the points x_i − x_k), there exists u′ ∈ R^m such that ‖u′‖ ≤ ‖u − x_k‖ and, for all i,

|⟨u′, Π(x_i − x_k)⟩ − ⟨u − x_k, x_i − x_k⟩| ≤ ε‖u − x_k‖·‖x_i − x_k‖.

Next, let f(u) ∈ R^{m+1} be the point (Πx_k + u′, √(‖u − x_k‖² − ‖u′‖²)). If w_i := x_i − x_k, then

‖f(u) − f(x_i)‖² = ‖u − x_k‖² − ‖u′‖² + ‖u′ − Πw_i‖² = ‖u − x_k‖² + ‖Πw_i‖² − 2⟨u′, Πw_i⟩   (2)

and

‖u − x_i‖² = ‖u − x_k‖² + ‖w_i‖² − 2⟨u − x_k, w_i⟩.   (3)

Since ‖u − x_i‖ ≥ ‖u − x_k‖ and, by the triangle inequality, ‖u − x_i‖ ≥ ‖w_i‖ − ‖u − x_k‖, we have ‖u − x_i‖ ≥ (‖u − x_k‖ + ‖w_i‖)/3; this follows from the fact that max(1, x − 1) ≥ (x + 1)/3 for all x ≥ 0, applied with x = ‖w_i‖/‖u − x_k‖. Since ‖Π(x_i − x_k)‖ = (1 ± ε)‖x_i − x_k‖, we also have that

| ‖Πw_i‖² − ‖w_i‖² | = | ‖Π(x_i − x_k)‖² − ‖x_i − x_k‖² | ≤ 3ε‖x_i − x_k‖² = 3ε‖w_i‖²

for all i, assuming ε ≤ 1. Therefore, subtracting Eq. (3) from Eq. (2), we have that

| ‖f(u) − f(x_i)‖² − ‖u − x_i‖² | ≤ 3ε‖w_i‖² + 2|⟨u′, Πw_i⟩ − ⟨u − x_k, w_i⟩| ≤ 3ε‖w_i‖² + 2ε‖u − x_k‖·‖w_i‖ ≤ 3ε(‖w_i‖ + ‖u − x_k‖)² ≤ 27ε‖u − x_i‖²,

as desired.

We summarize the previous results, which allow us to prove our main theorem.

Theorem 3.2.
For all ε ∈ (0, 1) and x_1, . . . , x_n ∈ R^d, there exists m = O(ε^{-2} log n) and a (nonlinear) map f : R^d → R^m such that for all 1 ≤ i ≤ n and u ∈ R^d,

(1 − O(ε))‖u − x_i‖ ≤ ‖f(u) − f(x_i)‖ ≤ (1 + O(ε))‖u − x_i‖.

Proof.
By Corollary 3.2, there exists Π ∈ R^{m×d} with m = Θ(ε^{-2} log n) that provides ε-convex hull distortion for Y, where Y is as defined in Lemma 3.2. By Lemma 3.2, for each u ∈ R^d there exists an outer extension f^(u) : {x_1, . . . , x_n, u} → R^{m+1} with distortion 1 + O(ε) sending each x_i to (Πx_i, 0). We can thus map each x_i to (Πx_i, 0) and map each u ∉ {x_1, . . . , x_n} to f^(u)(u), which gives us a terminal embedding into O(ε^{-2} log n) dimensions with distortion 1 + O(ε).

3.3 A polynomial time algorithm

We briefly note how one can produce a terminal embedding into R^{m+1} with a Monte Carlo randomized polynomial time algorithm, where m = O(ε^{-2} log n). By choosing a random Π as in Subsection 3.1, we obtain, with probability at least 1 − n^{−Θ(1)}, a matrix providing ε-convex hull distortion for the set Y of Lemma 3.2. Then, to map any point u ∈ R^d into R^{m+1}, it suffices to find u′ ∈ R^m such that, with x_k the point of X closest to u, we have ‖u′‖ ≤ ‖u − x_k‖ and, for all i,

|⟨u′, Π(x_i − x_k)⟩ − ⟨u − x_k, x_i − x_k⟩| ≤ ε‖u − x_k‖·‖x_i − x_k‖.

Assuming that Π provides ε-convex hull distortion for Y, such a u′ exists for all u, which means that u′ can be found with semidefinite programming in polynomial time, as noted in [MMMR18]. A minimal end-to-end sketch follows.
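Putting the pieces together, here is an end-to-end sketch (our illustration): it draws the random Π, embeds the points of X as (Πx_i, 0), and extends each query u as in Lemma 3.2, reusing our hypothetical helper find_u_prime from the sketch after Lemma 3.1 in place of the semidefinite program.

```python
import numpy as np
# assumes find_u_prime from the sketch after Lemma 3.1 is in scope

def terminal_embed(Pi, X, u):
    """Map u into R^{m+1} following Theorem 3.2; points of X map to (Pi x, 0)."""
    for x in X:
        if np.array_equal(u, x):
            return np.append(Pi @ x, 0.0)
    k = int(np.argmin(np.linalg.norm(X - u, axis=1)))   # nearest point x_k
    W = np.delete(X, k, axis=0) - X[k]                  # w_i = x_i - x_k
    V = W / np.linalg.norm(W, axis=1, keepdims=True)    # unit directions v_i
    u_prime, _ = find_u_prime(Pi, V, u - X[k])          # Lemma 3.1, applied at x_k
    last = np.sqrt(max(np.linalg.norm(u - X[k])**2 - u_prime @ u_prime, 0.0))
    return np.append(Pi @ X[k] + u_prime, last)

rng = np.random.default_rng(3)
n, d, eps = 20, 200, 0.5
X = rng.normal(size=(n, d))
m = int(6 * np.log(n) / eps**2)                         # ad hoc constant
Pi = rng.normal(size=(m, d)) / np.sqrt(m)               # Monte Carlo choice of Pi
u = rng.normal(size=d)
fu = terminal_embed(Pi, X, u)
errs = [abs(np.linalg.norm(fu - np.append(Pi @ x, 0.0)) / np.linalg.norm(u - x) - 1)
        for x in X]
print(max(errs))                                        # should be O(eps)
```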
References

[AK17] Noga Alon and Bo'az Klartag. Optimal compression of approximate inner products and dimension reduction. In Proceedings of the 58th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 639–650, 2017.

[BLM13] Stéphane Boucheron, Gábor Lugosi, and Pascal Massart. Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, 1st edition, 2013.

[Dir15] Sjoerd Dirksen. Tail bounds via generic chaining. Electron. J. Probab., 20(53):1–29, 2015.

[Dir16] Sjoerd Dirksen. Dimensionality reduction with subgaussian matrices: a unified theory. Found. Comput. Math., 16(5):1367–1396, 2016.

[EFN17] Michael Elkin, Arnold Filtser, and Ofer Neiman. Terminal embeddings. Theoretical Computer Science, 697:1–36, 2017.

[JL84] William B. Johnson and Joram Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics, 26:189–206, 1984.

[LN17] Kasper Green Larsen and Jelani Nelson. Optimality of the Johnson-Lindenstrauss lemma. In Proceedings of the 58th Annual IEEE Symposium on Foundations of Computer Science (FOCS), 2017.

[MMMR18] Sepideh Mahabadi, Konstantin Makarychev, Yury Makarychev, and Ilya P. Razenshteyn. Nonlinear dimension reduction via outer bi-Lipschitz extensions. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing (STOC), pages 1088–1101, 2018.

[Tal14] Michel Talagrand. Upper and Lower Bounds for Stochastic Processes: Modern Methods and Classical Problems. Springer, 2014.

[Ver18] Roman Vershynin. High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge University Press, 2018.

[vN28] John von Neumann. Zur Theorie der Gesellschaftsspiele. Mathematische Annalen, 100:295–320, 1928.