Reconstructing measures on manifolds: an optimal transport approach
Vincent Divol∗

Abstract.
Assume that we observe i.i.d. points lying close to some unknown $d$-dimensional $C^k$ submanifold $M$ in a possibly high-dimensional space. We study the problem of reconstructing the probability distribution generating the sample. After remarking that this problem is degenerate for a large class of standard losses ($L^p$, Hellinger, total variation, etc.), we focus on the Wasserstein loss, for which we build an estimator, based on kernel density estimation, whose rate of convergence depends on $d$ and the regularity $s \le k-1$ of the underlying density, and which matches, up to logarithmic factors, the rates previously obtained in the case where $M$ is a $d$-dimensional cube. The related problem of the estimation of the volume measure of $M$ for the Wasserstein loss is also considered, for which a minimax estimator is exhibited.

1 Introduction

Density estimation is one of the most fundamental tasks in non-parametric statistics. While efficient methods (from both a theoretical and a practical point of view) exist when the ambient space is of low dimension, minimax rates of estimation become increasingly slow as the dimension increases. To overcome this so-called curse of dimensionality, structural assumptions on the underlying probability have to be made in moderate to high dimensions. These may take different forms, including e.g. the existence of a parametric component [LLW07], the single-index model [LZZL13], sparsity assumptions [Tib96], or constraints on the shape of the support. We focus in this work on the latter, namely on the case where the probability distribution $\mu$ generating the observations is assumed to be concentrated around a submanifold $M$ of $\mathbb{R}^D$, of dimension $d$ smaller than $D$. This assumption, known as the manifold assumption, has been fruitfully studied, with an emphasis put on reconstructing different geometric quantities related to the manifold, such as $M$ itself [GPPVW12, AL18, AL19, Div20], its homology groups [NSW08, BRS+], or its reach [AKC+19, BHHS20]. The topic of density estimation in the manifold setting has itself been studied for over thirty years, with the emphasis initially being put on reconstructing the density in the case where the manifold $M$ is given (think for instance of datasets lying on the space of orthogonal matrices), notable works including [Hen90, Pel05, CGK+]. More recently, attention has shifted to the setting where the manifold $M$ is unknown and acts as a nuisance parameter. Kernel density estimators on manifolds are designed in [BS17, WW20], where rates are exhibited, respectively in the case where the manifold has a boundary and in the case where the density is Hölder continuous. In [BH19], kernel density estimators are shown to be minimax, and an adaptive procedure is designed, based on Lepski's method, to estimate the unknown density at a point $x \in \mathbb{R}^D$ which is known to belong to the unknown (and possibly nonsmooth) manifold $M$.

To go beyond the pointwise estimation of $\mu$, even the choice of a relevant loss is nontrivial. Indeed, most standard losses between probability measures (e.g. the $L^p$ distance, the Hellinger distance or the Kullback-Leibler divergence) are degenerate when comparing mutually singular measures, which will typically be the case for measures on two distinct manifolds, even if the manifolds are very close to each other with respect to the Hausdorff distance. This implies that the estimation problem is degenerate from a minimax perspective when such losses are chosen (see Theorem 2.13).

∗ Inria Saclay & Université Paris-Saclay, [email protected]
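This contrast between total-variation-type losses and transport distances can be checked in a short numerical experiment (purely illustrative, not part of the paper's constructions; it uses SciPy's one-dimensional $W_1$ routine):

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
x = rng.uniform(size=1000)   # sample of a measure supported on [0, 1]
y = x + 1e-3                 # the same sample, translated by 0.001

# The two empirical measures are mutually singular (their supports are
# disjoint), hence TV(mu, nu) = 1, its maximal value: a TV-based loss
# cannot see that the supports are close. The Wasserstein distance can:
# translating a measure by t moves it by exactly t in W_1.
print(wasserstein_distance(x, y))   # ~ 1e-3
```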
On the contrary, the Wasserstein distances $W_p$, $1 \le p \le \infty$, are particularly adapted to this problem, as they are by design robust to small metric perturbations of the support of a measure. Apart from this first motivation, the use of Wasserstein distances, and more generally of the theory of optimal transport, has proven to be an efficient tool in widely different recent problems of machine learning, with fast implementations and sound theoretical results (see e.g. [PC19] for a survey). From a statistical perspective, most of the attention has been dedicated to studying rates of convergence between a probability distribution $\mu$ and its empirical counterpart $\mu_n$ [Dud69, DSS13, FG15, SP18, WB19a, L+]. If some regularity is assumed on $\mu$, then it is possible to build estimators with smaller risks than the empirical measure $\mu_n$. Assume for instance that $\mu$ is a probability distribution on the cube $[-1,1]^D$, with density $f$ of regularity $s$ (measured through the Besov scale $B^s_{p,q}$). In this setting, it has been shown in [WB19b] that, given $n$ i.i.d. points of law $\mu$, the minimax rate (up to logarithmic factors) for the estimation of $\mu$ with respect to the Wasserstein distance $W_p$ is of order

$$n^{-\frac{s+1}{2s+D}} \ \text{if } D \ge 3, \qquad n^{-1/2}\log n \ \text{if } D = 2, \qquad n^{-1/2} \ \text{if } D = 1, \qquad (1.1)$$

and that this rate is attained by a modified linear wavelet density estimator. Our main contribution consists in extending the results of [WB19b] by allowing the support of the probability to be any $d$-dimensional compact $C^k$ submanifold $M \subset \mathbb{R}^D$ for $k \ge 2$. More precisely, assume that some probability $\mu$ on $M$ has a lower and upper bounded density $f$ which belongs to the Besov space $B^s_{p,q}(M)$ for some $0 < s \le k-1$, $1 \le p < \infty$, $1 \le q \le \infty$ (see Section 2 for details). We first show (Theorem 3.1) that some weighted kernel density estimator, integrated against the volume measure $\mathrm{vol}_M$ on $M$, attains, for the $W_p$ distance, the rate of estimation

$$n^{-\frac{s+1}{2s+d}} \ \text{if } d \ge 3, \qquad n^{-1/2}\log n \ \text{if } d = 2, \qquad n^{-1/2} \ \text{if } d = 1. \qquad (1.2)$$

In the case where the manifold $M$ is unknown, we do not have access to the volume measure $\mathrm{vol}_M$, so that the latter estimator is not computable. We therefore propose to estimate the volume measure $\mathrm{vol}_M$ in a preliminary step. Such an estimator $\widehat{\mathrm{vol}}_M$ is defined by using local polynomial estimation techniques from [AL19]. We show that this estimator is a minimax estimator of the volume measure up to logarithmic factors (Theorem 3.6), with a risk of order $(\log n/n)^{k/d}$. We then show (Theorem 3.7) that a weighted kernel density estimator integrated against $\widehat{\mathrm{vol}}_M$ attains the rate (1.2). Those rates are significantly faster than the rates of (1.1) if $d \ll D$, and are shown to be minimax up to logarithmic factors.

In Section 2, we define our statistical model and give some preliminary results on Wasserstein distances. In Section 3, we define kernel density estimators on a manifold $M$ and state our main results. Proofs of the main theorems are given in Section 4, while additional proofs are found in the Appendix.

2 Preliminaries

2.1 Notation

For any $d > 0$, we write $\cdot$ for the dot product and $|v|$ for the norm of a vector $v \in \mathbb{R}^d$. The ball centered at $x \in \mathbb{R}^d$ of radius $h > 0$ is denoted by $B(x,h)$. For $\Omega \subset \mathbb{R}^d$ a set and $x \in \mathbb{R}^d$, we let $d(x,\Omega) := \inf\{|x-y|,\ y \in \Omega\}$ be the distance from $x$ to $\Omega$, and we write $B_\Omega(x,h)$ for $B(x,h) \cap \Omega$. Also, we let $\Omega^h := \{x \in \mathbb{R}^d,\ d(x,\Omega) < h\}$ be the $h$-tubular neighborhood of $\Omega$. Given a tensor $A : (\mathbb{R}^d)^i \to \mathbb{R}^d$ of order $i \ge 0$, the operator norm is defined as $\|A\|_{op} := \max\{A[v_1,\dots,v_i],\ |v_1|,\dots,|v_i| \le 1\}$. Also, we let $A^* : \mathbb{R}^d \to \mathbb{R}^d$ denote the adjoint of an operator $A : \mathbb{R}^d \to \mathbb{R}^d$.

Let $D > 0$. We let $\mathcal{M}_d$ be the set of all smooth $d$-dimensional connected submanifolds in $\mathbb{R}^D$ without boundary, endowed with the metric induced by the standard metric on $\mathbb{R}^D$. We denote by $d_g$ the geodesic distance on $M$. The tangent space at a point $x \in M$ is denoted by $T_xM$. It is identified with a $d$-dimensional subspace of $\mathbb{R}^D$, and the orthogonal projection on $T_xM$ is denoted by $\pi_x$. We also let $\tilde\pi_x : \mathbb{R}^D \to T_xM$ be defined by $\tilde\pi_x(y) = \pi_x(y-x)$. We denote by $T_xM^\perp$ the normal space at $x \in M$. If $M_1 \in \mathcal{M}_{d_1}$, $M_2 \in \mathcal{M}_{d_2}$, $x \in M_1$ and $f : M_1 \to M_2$ is a $C^l$ function, then we let $d^l f(x) : (T_xM_1)^l \to T_{f(x)}M_2$ be the $l$-th differential of $f$ at $x$ and $\|f\|_{C^l(M_1)} := \max_{1 \le i \le l} \sup_{x \in M_1} \|d^i f(x)\|_{op}$. For $i = 1$, we write $df$ for $d^1 f$, and if $d_1 \le d_2$, then we define the Jacobian of $f$ at $x \in M_1$ as $Jf(x) = \sqrt{\det(df(x)^* df(x))}$. We let $C^l(M)$ be the space of all $C^l$ functions $f : M \to \mathbb{R}$ (with possibly $l = \infty$), and for $f \in C^1(M)$ we let $\nabla f$ denote the gradient of $f$. We also denote by $\nabla\cdot$ the divergence operator on $M$.

Let $\mathrm{vol}_M$ be the volume measure associated with the Riemannian metric on $M$ (equivalently, $\mathrm{vol}_M$ is the $d$-dimensional Hausdorff measure restricted to $M$). We will denote integration with respect to $d\mathrm{vol}_M(x)$ by $dx$ when the context is clear. For $1 \le p \le \infty$, we let $L^p(M)$ be the set of measurable functions $f : M \to \mathbb{R}$ with finite $p$-norm $\|f\|_{L^p(M)} := (\int |f|^p\, d\mathrm{vol}_M)^{1/p}$ (with the usual modification if $p = \infty$). We say that a locally integrable function $f$ is weakly differentiable if there exists a measurable section $\nabla f$ of the tangent bundle $TM$ (uniquely defined almost everywhere) such that, for all smooth vector fields $w$ on $M$ with compact support, we have

$$\int f\, (\nabla\cdot w)\, d\mathrm{vol}_M = -\int (\nabla f)\cdot w\, d\mathrm{vol}_M.$$
Furthermore, we will denote by $p^* \in [1,\infty]$ the conjugate exponent of $p$, satisfying $\frac{1}{p} + \frac{1}{p^*} = 1$.

The key quantity used to describe the regularity of a manifold $M$ is its reach $\tau(M)$. It is defined as the distance between $M$ and its medial axis, that is, the set of points $x \in \mathbb{R}^D$ for which there are at least two points of $M$ attaining the distance from $x$ to $M$. In particular, the projection $\pi_M$ onto the manifold $M$ is well defined on the tubular neighborhood $M^{\tau(M)}$. Originally introduced in [Fed59], the reach $\tau(M)$ measures both the local regularity of $M$ (namely its curvature) and its global regularity; see e.g. [AKC+19, BHHS20] or [DZ01, Section 6.6] for precise results on the relationships between the reach of a manifold and its geometry. We then measure the regularity of $M$ through the regularity of local parametrizations of $M$ (see [AL19]).

Definition 2.1.
Let $M \in \mathcal{M}_d$, and $\tau_{\min}, L > 0$, $k \ge 2$. Let $r$ be a scale parameter, a fixed fraction of $\tau_{\min} \wedge L$ (its exact value is unimportant, see Remark 2.2(ii)). We say that $M$ is in $\mathcal{M}^k_{d,\tau_{\min},L}$ if $M$ is closed, of reach larger than $\tau_{\min}$, and if, for all $x \in M$, the projection $\tilde\pi_x : M \to T_xM$ is a local diffeomorphism at $x$, with inverse $\Psi_x$ defined on $B_{T_xM}(0,r)$, satisfying $\|\Psi_x\|_{C^k(B_{T_xM}(0,r))} \le L$.

Remark 2.2. (i) For the sake of convenience, we use a definition slightly different from that of [AL19], where the authors assume the existence of local parametrizations $\tilde\Psi_x$ having controlled $C^k$ norms, with $\tilde\Psi_x$ not necessarily equal to the inverse $\Psi_x$ of the orthogonal projection. However, our definition is not restrictive. Indeed, one can write $\Psi_x = \tilde\Psi_x \circ (\tilde\pi_x \circ \tilde\Psi_x)^{-1}$, where the $C^k$ norm of $(\tilde\pi_x \circ \tilde\Psi_x)^{-1}$ is controlled by the inverse function theorem. Therefore, the $C^k$ norm of $\Psi_x$ can always be controlled by the $C^k$ norms of other parametrizations $\tilde\Psi_x$. Both definitions can also be proven to be equivalent to assuming that the function $d(\cdot, M)$ has a controlled $C^k$ norm on $M^{\tau(M)}$, see e.g. [PR84].
(ii) The value of the scale parameter $r$ is chosen for convenience. Other small scales could be used, or the radius $r$ could be added as another parameter of the model, without any substantial gain in doing so.

2.2 Besov spaces on manifolds

Let $M \in \mathcal{M}^k_{d,\tau_{\min},L}$ for some $k \ge 2$ and $\tau_{\min}, L > 0$. As stated in the introduction, minimax rates for the estimation of a given probability will depend crucially on the regularity of its density $f$, which is assumed to belong to some Besov space $B^s_{p,q}(M)$. We first introduce Sobolev spaces $H^l_p(M)$ on $M$ for $l \le k$ an integer; Besov spaces on $M$ are then defined by real interpolation.

Definition 2.3 (Sobolev space on a manifold). Let $0 \le l \le k$, $1 \le p < \infty$ and let $f \in C^\infty(M)$. We let

$$\|f\|_{H^l_p(M)} := \max_{0 \le i \le l} \Big( \int \|d^i f(x)\|_{op}^p \, d\mathrm{vol}_M(x) \Big)^{1/p}. \qquad (2.1)$$

The space $H^l_p(M)$ is the completion of $C^\infty(M)$ for the norm $\|\cdot\|_{H^l_p(M)}$.

Remark 2.4 (the case $p = \infty$). The previous definition cannot be extended to the case $p = \infty$. Indeed, the completion of $C^\infty(M)$ for the norm $\|\cdot\|_{H^l_\infty(M)}$ is equal to $C^l(M)$, whereas for instance $H^0_\infty(M)$ should be equal to $L^\infty(M)$. For $l = 1$, the space $H^1_p(M)$ can equivalently be defined as the space of weakly differentiable functions $f$ with $\|f\|_{H^1_p(M)} < \infty$, and this definition can easily be extended to the case $p = \infty$. In particular, if $f \in H^1_\infty(M)$, then one can verify that $f \circ \Psi_x \in H^1_\infty(B_{T_xM}(0,r))$ for any $x \in M$. It follows from standard results on Sobolev spaces on domains that $f \circ \Psi_x$ is Lipschitz continuous (see e.g. [Bre10, Proposition 9.3]). Hence, $f$ is also locally Lipschitz continuous. By Rademacher's theorem, $f$ is therefore almost everywhere differentiable, and its differential coincides with the weak differential. As a consequence, a function $f \in H^1_\infty(M)$ is Lipschitz continuous, with Lipschitz constant for the distance $d_g$ equal to $\|f\|_{H^1_\infty(M)}$.

For $1 \le p < \infty$, we introduce the negative homogeneous Sobolev norm $\|\cdot\|_{\dot H^{-1}_p(M)}$, defined, for $f \in L^p(M)$ with $\int f\, d\mathrm{vol}_M = 0$, by

$$\|f\|_{\dot H^{-1}_p(M)} := \sup\Big\{ \int fg\, d\mathrm{vol}_M,\ \|\nabla g\|_{L^{p^*}(M)} \le 1 \Big\}, \qquad (2.2)$$

where the supremum is taken over all functions $g \in H^1_{p^*}(M)$.
For $f \in L^p(M)$, the negative Sobolev norm is defined by

$$\|f\|_{H^{-1}_p(M)} := \sup\Big\{ \int fg\, d\mathrm{vol}_M,\ \|g\|_{H^1_{p^*}(M)} \le 1 \Big\}, \qquad (2.3)$$

and the corresponding Banach space is denoted by $H^{-1}_p(M)$.

Proposition 2.5.
Let $1 \le p < \infty$ and $f \in H^{-1}_p(M)$ with $\int f\, d\mathrm{vol}_M = 0$.

(i) We have $c\,\|f\|_{\dot H^{-1}_p(M)} \le \|f\|_{H^{-1}_p(M)} \le \|f\|_{\dot H^{-1}_p(M)}$, where $c$ is a positive constant depending on $d$, $\tau_{\min}$ and the total volume $|\mathrm{vol}_M|$.

(ii) We have $\|f\|_{\dot H^{-1}_p(M)} = \inf\{\|w\|_{L^p(M)},\ \nabla\cdot w = f\}$, where the infimum is taken over all measurable vector fields $w$ on $M$ with finite $p$-norm, and where $\nabla\cdot w = f$ means that $\int fg\, d\mathrm{vol}_M = -\int w\cdot\nabla g\, d\mathrm{vol}_M$ for all $g \in C^\infty(M)$.

Following [Tri92], Besov spaces on a manifold $M$ are defined by real interpolation of Sobolev spaces.

Definition 2.6 (Real interpolation of spaces). Let $A_0, A_1$ be two Banach spaces which continuously embed into some Banach space $A$. We endow the space $A_0 \cap A_1$ with the norm $\|x\|_{A_0 \cap A_1} = \max\{\|x\|_{A_0}, \|x\|_{A_1}\}$ for $x \in A_0 \cap A_1$, and the space $A_0 + A_1$ with the norm $K(x,1)$ for $x \in A_0 + A_1$, where

$$K(x,\lambda) := \inf\{\|x_0\|_{A_0} + \lambda\|x_1\|_{A_1},\ x = x_0 + x_1,\ x_0 \in A_0,\ x_1 \in A_1\}, \quad \lambda \ge 0. \qquad (2.4)$$

For $\theta \in (0,1)$ and $1 \le q \le \infty$, we let

$$\|x\|_{(A_0,A_1)_{\theta,q}} := \Big( \int_0^\infty \lambda^{-\theta q} K(x,\lambda)^q\, \frac{d\lambda}{\lambda} \Big)^{1/q}, \quad x \in A_0 + A_1, \qquad (2.5)$$

and $(A_0,A_1)_{\theta,q} := \{x \in A_0 + A_1,\ \|x\|_{(A_0,A_1)_{\theta,q}} < \infty\}$ (with the usual modification if $q = \infty$). The pair $(A_0,A_1)$ is called a compatible pair, and $(A_0,A_1)_{\theta,q}$ is the real interpolation between $A_0$ and $A_1$ of exponents $\theta$ and $q$.

For $A, B$ two Banach spaces and $F : A \to B$ a bounded operator, we let $\|F\|_{A,B}$ be the operator norm of $F$. Let $(A_0,A_1)$ and $(B_0,B_1)$ be two compatible pairs, and let $F : A_0 + A_1 \to B_0 + B_1$ be a linear map such that the restriction of $F$ to $A_j$ is a bounded linear map into $B_j$ ($j = 0, 1$). Then,

$$\|F\|_{(A_0,A_1)_{\theta,q},(B_0,B_1)_{\theta,q}} \le \|F\|_{A_0,B_0}^{1-\theta}\, \|F\|_{A_1,B_1}^{\theta}. \qquad (2.6)$$

Definition 2.7 (Besov space on a manifold). Let $1 \le p < \infty$, $1 \le q \le \infty$ and $0 < s < k$. The Besov space $B^s_{p,q}(M)$ is defined as $B^s_{p,q}(M) := (L^p(M), H^k_p(M))_{s/k,q}$.

Basic results from interpolation theory then imply that $\|\cdot\|_{B^{s_0}_{p,q}(M)} \le \|\cdot\|_{B^{s_1}_{p,q}(M)}$ if $0 < s_0 \le s_1 < k$ (see e.g. [Lun18]).

2.3 Wasserstein distances

Let $\mathcal{P}$ be the set of finite Borel measures $\mu$ on $\mathbb{R}^D$, with $|\mu|$ the total mass of $\mu$. Let $\mathcal{P}^1$ be the set of measures in $\mathcal{P}$ with $|\mu| = 1$. For $1 \le p \le \infty$, let $\mathcal{P}_p$ be the set of measures $\mu \in \mathcal{P}$ satisfying $(\int |x|^p\, d\mu(x))^{1/p} < \infty$ (with the usual modification if $p = \infty$), and let $\mathcal{P}^1_p = \mathcal{P}_p \cap \mathcal{P}^1$. The pushforward of a measure $\mu$ by a measurable map $\phi : \mathbb{R}^D \to \mathbb{R}^D$ is defined by

$$\phi_\#\mu(A) := \mu(\phi^{-1}(A)) \qquad (2.7)$$

for any Borel set $A \subset \mathbb{R}^D$. For $\rho : \mathbb{R}^D \to [0,\infty)$ a measurable function, we denote by $\rho\cdot\mu$ the measure having density $\rho$ with respect to $\mu$.

Definition 2.8 (Wasserstein distance). Let $1 \le p \le \infty$ and let $\mu, \nu \in \mathcal{P}_p$ have the same total mass. Let $\Pi(\mu,\nu)$ be the set of transport plans between $\mu$ and $\nu$, i.e. measures on $\mathbb{R}^D \times \mathbb{R}^D$ with first marginal equal to $\mu$ and second marginal equal to $\nu$. The cost $C_p(\pi)$ of $\pi \in \Pi(\mu,\nu)$ is defined as $\int |x-y|^p\, d\pi(x,y)$. The $p$-Wasserstein distance between $\mu$ and $\nu$ is defined as

$$W_p(\mu,\nu) := \inf_{\pi \in \Pi(\mu,\nu)} C_p(\pi)^{1/p}, \qquad (2.8)$$

with the usual modification if $p = \infty$.

A crucial point in the study conducted in the next sections is the relation between Wasserstein distances and negative Sobolev norms.
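In dimension one this relation is elementary: for $p = 1$, the $W_1$ distance between two measures on the line equals the $L^1$ distance between their cumulative distribution functions. The following sketch (an illustration with arbitrary sample sizes and grid, not part of the paper's proofs) checks this numerically for empirical measures:

```python
import numpy as np

def w1_sorted(x, y):
    """W_1 between two equal-size empirical measures on R: the optimal
    (monotone) coupling matches order statistics."""
    return float(np.mean(np.abs(np.sort(x) - np.sort(y))))

def w1_cdf(x, y, grid):
    """The same quantity computed as the L^1 distance between CDFs,
    approximated by a Riemann sum on a fine grid."""
    Fx = np.searchsorted(np.sort(x), grid, side="right") / len(x)
    Fy = np.searchsorted(np.sort(y), grid, side="right") / len(y)
    return float(np.sum(np.abs(Fx - Fy)[:-1] * np.diff(grid)))

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = rng.normal(loc=0.5, size=200)
grid = np.linspace(-8.0, 8.0, 40001)
print(w1_sorted(x, y), w1_cdf(x, y, grid))  # the two values agree closely
```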
Proposition 2.9 (Wasserstein distances and negative Sobolev norms). Let $1 \le p < \infty$. Let $M \in \mathcal{M}_d$ be a manifold with reach $\tau(M) \ge \tau_{\min}$, and let $\mu, \nu \in \mathcal{P}^1_p$ be two probability measures supported on $M$, absolutely continuous with respect to $\mathrm{vol}_M$. Assume that $\mu, \nu \ge f_{\min}\cdot\mathrm{vol}_M$ for some $f_{\min} > 0$. Then, identifying measures with their densities, we have

$$W_p(\mu,\nu) \le p^{1-1/p} f_{\min}^{1/p-1} \|\mu-\nu\|_{\dot H^{-1}_p(M)} \le p^{1-1/p} C_{d,\tau_{\min},f_{\min}} \|\mu-\nu\|_{H^{-1}_p(M)}, \qquad (2.9)$$

for some constant $C_{d,\tau_{\min},f_{\min}}$ depending on $d$, $\tau_{\min}$ and $f_{\min}$.

In particular, if $p = 1$, then the first inequality in (2.9) is actually an equality, by the Kantorovitch-Rubinstein duality formula [Vil08, Particular Case 5.16]. This inequality appears in [Pey18] for $p = 2$ and in [San15, Section 5.5.1] for measures having a density with respect to the Lebesgue measure. We carefully adapt their proofs in Appendix B.

2.4 Statistical models

Let $(\mathcal{Y}, \mathcal{H})$, $(\mathcal{X}, \mathcal{G})$ be measurable spaces and let $\mathcal{Q}$ be a subset of the space of probability measures on $(\mathcal{Y}, \mathcal{H})$. Assume that there is a measurable function $\iota : (\mathcal{Y},\mathcal{H}) \to (\mathcal{X},\mathcal{G})$ such that we observe i.i.d. variables $X_1,\dots,X_n \sim \iota_\#\xi$ for some $\xi \in \mathcal{Q}$. Those random variables are all defined on some probability space $(\Omega, \mathcal{F}, \mathbb{P})$, and integration with respect to $\mathbb{P}$ is denoted by $\mathbb{E}$. Let $\vartheta$ be a functional of interest defined on $\mathcal{Q}$, taking its values in some measurable space $(E, \mathcal{E})$. The tuple $(\mathcal{Y},\mathcal{H},\mathcal{Q},\iota,\vartheta)$ is a statistical model. Given i.i.d. observations with law $\iota_\#\xi$ where $\xi \in \mathcal{Q}$, the goal is to produce an estimator $\hat\vartheta$ (depending on the observations and the parameters defining $\mathcal{Q}$) such that its risk $\mathbb{E} L(\hat\vartheta, \vartheta)$ is as small as possible, where $L$ is some measurable loss function $L : E \times E \to [0,\infty]$. The infimum of the risk over all estimators $\hat\vartheta$ is called the minimax risk for the estimation of $\vartheta$ on $\mathcal{Q}$ with respect to the loss $L$:

$$\mathcal{R}_n(\vartheta, \mathcal{Q}, L) := \inf_{\hat\vartheta} \sup_{\xi \in \mathcal{Q}} \mathbb{E} L(\hat\vartheta, \vartheta), \qquad (2.10)$$

where $\hat\vartheta = \hat\vartheta(\iota(X_1),\dots,\iota(X_n))$ and $X_1,\dots,X_n$ is an i.i.d. sample with law $\xi$. We consider the following models, where points are sampled on a manifold, with possibly tubular noise. We fix in the following some parameters $\tau_{\min}, L_s, L_k > 0$, $1 \le q \le \infty$ and $0 < f_{\min} < f_{\max} < \infty$. We also write $\mathcal{M}^k_d$ instead of $\mathcal{M}^k_{d,\tau_{\min},L_k}$.

Definition 2.10 (Noise-free model). Let $d \le D$ be integers, $k \ge 2$, $0 \le s < k$ and $1 \le p < \infty$. Let $M \in \mathcal{M}^k_d$. For $s = 0$, the set $\mathcal{Q}^0(M)$ is the set of probability distributions $\mu$ on $\mathbb{R}^D$ absolutely continuous with respect to the volume measure $\mathrm{vol}_M$, with a density $f$ satisfying $f_{\min} \le f \le f_{\max}$ almost everywhere. For $s > 0$, the set $\mathcal{Q}^s(M)$ is the set of distributions $\mu \in \mathcal{Q}^0(M)$ with density $f \in B^s_{p,q}(M)$ satisfying $\|f\|_{B^s_{p,q}(M)} \le L_s$. The model $\mathcal{Q}^{s,k}_d$ is the union of the sets $\mathcal{Q}^s(M)$ for $M \in \mathcal{M}^k_d$. The statistical model is completed by letting $(\mathcal{Y},\mathcal{H})$ be $\mathbb{R}^D$ endowed with its Borel $\sigma$-algebra, and $\iota$ and $\vartheta$ be the identity.

Remark 2.11. If $\mu \in \mathcal{Q}^s(M)$, then, as $\mu \ge f_{\min}\,\mathrm{vol}_M$, one has $\mathrm{diam}(M) \le C_d/(f_{\min}\tau_{\min}^{d-1})$ for some constant $C_d$ depending only on $d$ [AL18, Lemma 2.2]. In particular, the manifold $M$ is automatically compact.

Definition 2.12 (Tubular noise model). Let $d \le D$ be integers, $k \ge 2$, $0 \le s < k$, $1 \le p < \infty$ and $\gamma \ge 0$. The set $\mathcal{Q}^{s,k}_d(\gamma)$ is the set of probability distributions $\xi$ of random variables $(Y, Z)$ where $Y \sim \mu \in \mathcal{Q}^{s,k}_d$ and $Z \in B(0,\gamma)$ is such that $Z \in T_YM^\perp$. The statistical model is completed by letting $(\mathcal{Y},\mathcal{H})$ be $\mathbb{R}^D \times \mathbb{R}^D$ endowed with its Borel $\sigma$-algebra, $\iota$ be the addition map $\mathbb{R}^D \times \mathbb{R}^D \to \mathbb{R}^D$, and $\vartheta(\xi)$ be the first marginal $\mu$ of $\xi$.

Concretely, an $n$-sample in the tubular noise model is given by $X_1,\dots,X_n$, where $X_i$ is equal to $Y_i + Z_i$, with $Y_i$ supported on some manifold $M$ and $Z_i \in T_{Y_i}M^\perp$ of norm smaller than $\gamma$. The goal is then to reconstruct the law $\mu$ of the $Y_i$'s. We first show that such a task is impossible if the loss function $L$ is larger than the total variation distance TV, which is defined by $\mathrm{TV}(\mu,\nu) := \sup_A |\mu(A) - \nu(A)|$ for $\mu,\nu \in \mathcal{P}^1$, where the supremum is taken over all measurable sets $A \subset \mathbb{R}^D$.

Theorem 2.13.
Let $d \le D$ be integers, $k \ge 2$, $0 \le s < k$, $1 \le p < \infty$. Let $L : \mathcal{P}^1 \times \mathcal{P}^1 \to [0,\infty]$ be a measurable map with respect to the Borel $\sigma$-algebra associated with the total variation distance on $\mathcal{P}^1 \times \mathcal{P}^1$. Assume that $L(\mu,\nu) \ge g(\mathrm{TV}(\mu,\nu))$ for a convex nondecreasing function $g : \mathbb{R} \to [0,\infty]$ with $g(0) = 0$. Then, for any $\tau_{\min} > 0$, if $f_{\min}$ is small enough and $L_k, L_s, f_{\max}$ are large enough, we have

$$\mathcal{R}_n(\mu, \mathcal{Q}^{s,k}_d, L) \ge g(c_d), \qquad (2.11)$$

for some constant $c_d > 0$.

This covers for instance the total variation distance itself (with $g(x) = x$), the Kullback-Leibler divergence (with $g$ quadratic, by Pinsker's inequality) or the $L^p$ distances with respect to some dominating measure. We give a proof of Theorem 2.13, based on Assouad's lemma, in Appendix G. A simple example of a loss $L$ which is not degenerate for mutually singular measures is given by the $W_p$ distance. As stated in the introduction, we will therefore choose this loss, and study $\mathcal{R}_n(\mu, \mathcal{Q}^{s,k}_d(\gamma), W_p)$, the minimax rate of estimation for $\mu$ with respect to $W_p$, where $\vartheta(\xi) = \mu$ is the first marginal of $\xi \in \mathcal{Q}^{s,k}_d(\gamma)$.

Remark 2.14. For $\gamma > 0$, the statistical model $\mathcal{Q}^{s,k}_d(\gamma)$ is not identifiable, in the sense that there exist $\xi_1, \xi_2$ in the model for which $\iota_\#\xi_1 = \iota_\#\xi_2$. Such an equality implies that $W_p(\vartheta(\xi_1), \vartheta(\xi_2)) \le W_p(\vartheta(\xi_1), \iota_\#\xi_1) + W_p(\iota_\#\xi_2, \vartheta(\xi_2)) \le 2\gamma$. This inequality is tight up to a constant. Indeed, take $Y$ a uniform random variable on the unit sphere, let $\xi_1$ be the law of $(Y, 0)$ and $\xi_2$ be the law of $((1+\gamma)Y, -\gamma Y)$. Then, $\xi_1$ and $\xi_2$ are in $\mathcal{Q}^{s,k}_d(\gamma)$ and $\iota_\#\xi_1 = \iota_\#\xi_2$, whereas, by the Kantorovitch-Rubinstein duality formula,

$$W_p(\vartheta(\xi_1), \vartheta(\xi_2)) \ge W_1(\vartheta(\xi_1), \vartheta(\xi_2)) \ge \mathbb{E}[\phi((1+\gamma)Y) - \phi(Y)]$$

for any 1-Lipschitz function $\phi$. Letting $\phi$ be the distance to the unit sphere, we obtain that this distance is larger than $\gamma$. In that sense, $\gamma$ represents the maximal precision for the estimation of $\vartheta(\xi)$.

Remark 2.15. For ease of notation, we will write $a \lesssim b$ to indicate that there exists a constant $C$ depending on the parameters $p, k, \tau_{\min}, L_s, L_k, f_{\min}, f_{\max}$, but not on $s$ and $D$, such that $a \le Cb$, and write $a \asymp b$ to indicate that $a \lesssim b$ and $b \lesssim a$. Also, we will write $c_\alpha$ to indicate that a constant $c$ depends on some parameter $\alpha$.

3 Main results

Before building an estimator in the model $\mathcal{Q}^{s,k}_d(\gamma)$, let us consider the easier problem of the estimation of $\mu$ in the case where $\gamma = 0$ (noise-free model) and the support $M$ is known. Let $\mu \in \mathcal{Q}^s(M)$ and let $Y_1,\dots,Y_n$ be an $n$-sample of law $\mu$. Let $\mu_n = \frac1n \sum_{i=1}^n \delta_{Y_i}$ be the empirical measure of the sample. Identify $\mathbb{R}^d$ with $\mathbb{R}^d \times \{0\}^{D-d}$ and consider a kernel $K : \mathbb{R}^D \to \mathbb{R}$ satisfying the following conditions:

• Condition A: The kernel $K$ is a smooth radial function with support $B(0,1)$ such that $\int_{\mathbb{R}^d} K = 1$.

• Condition B($m$): The kernel $K$ is of order $m$: letting $|\alpha| := \sum_{j=1}^d \alpha_j$ be the length of a multi-index $\alpha = (\alpha_1,\dots,\alpha_d)$, we require that, for all multi-indexes $\alpha_1, \alpha_2$ with $0 \le |\alpha_1| < m$, $0 \le |\alpha_2| < m + |\alpha_1|$, and with $|\alpha_2| > 0$ if $\alpha_1 = 0$, we have

$$\int_{\mathbb{R}^d} \partial^{\alpha_1} K(v)\, v^{\alpha_2}\, dv = 0, \qquad (3.1)$$

where $v^{\alpha_2} = \prod_{j=1}^d v_j^{(\alpha_2)_j}$ and $\partial^{\alpha_1} K$ is the partial derivative of $K$ in the direction $\alpha_1$.

• Condition C($\beta$): The negative part $K_-$ of $K$ satisfies $\int_{\mathbb{R}^d} K_- \le \beta$.

We show in Appendix H that for every integer $m \ge 0$ and every $\beta > 0$, there exists a kernel $K$ satisfying conditions A, B($m$) and C($\beta$). Define the convolution of $K$ with a measure $\nu \in \mathcal{P}$ as

$$K * \nu(x) := \int K(x-y)\, d\nu(y), \quad x \in \mathbb{R}^D, \qquad (3.2)$$

and, for $h > 0$, let $K_h := h^{-d} K(\cdot/h)$. Let $\rho_h := K_h * \mathrm{vol}_M$ and let $\mu_{n,h}$ be the measure with density $K_h * (\mu_n/\rho_h)$ with respect to $\mathrm{vol}_M$. Dividing by $\rho_h$ ensures that $\mu_{n,h}$ is a measure of mass 1. Remark that the computation of $\mu_{n,h}$ requires access to $M$, so that $\mu_{n,h}$ is an estimator on $\mathcal{Q}^s(M)$ but not on $\mathcal{Q}^{s,k}_d$. By linearity, the expectation of $\mu_{n,h}$ is $\mu_h$, the measure having density $K_h * (\mu/\rho_h)$ on $M$.

Theorem 3.1. Let $d \le D$ be integers, $0 < s \le k-1$ with $k \ge 2$, and $1 \le p < \infty$. Let $M \in \mathcal{M}^k_d$ and $\mu \in \mathcal{Q}^s(M)$, with $Y_1,\dots,Y_n$ an $n$-sample of law $\mu$. There exists a constant $\beta$ depending on the parameters of the model such that, if $K$ is a kernel satisfying conditions A, B($k$) and C($\beta$), then the measure $\mu_{n,h}$ satisfies the following:

(i) If $(\log n/n)^{1/d} \lesssim h \lesssim 1$, then, with probability larger than $1 - cn^{-k/d}$, the density of $\mu_{n,h}$ is larger than $f_{\min}/2$ and smaller than $2f_{\max}$ everywhere on $M$.

(ii) If $n^{-1/d} \lesssim h \lesssim 1$, then we have

$$\mathbb{E}\|\mu - \mu_{n,h}\|_{H^{-1}_p(M)} \le \|\mu - \mu_h\|_{H^{-1}_p(M)} + \mathbb{E}\|\mu_{n,h} - \mu_h\|_{H^{-1}_p(M)} \qquad (3.3)$$
$$\lesssim h^{s+1} + \frac{h^{1-d/2}\, I_d(h)}{\sqrt n}, \qquad (3.4)$$

where $I_d(h) = 1$ if $d \ge 3$, $(-\log h)^{1/2}$ if $d = 2$, and $h^{-1/2}$ if $d = 1$.

(iii) Let $h \asymp n^{-1/(2s+d)}$ if $d \ge 3$, and $h \asymp (\log n/n)^{1/d}$ if $d \le 2$. Define $\bar\mu_{n,h} = \mu_{n,h}$ if $\mu_{n,h}$ is a probability measure, and $\bar\mu_{n,h} = \delta_{Y_1}$ otherwise. Then,

$$\mathbb{E} W_p(\bar\mu_{n,h}, \mu) \lesssim \begin{cases} n^{-\frac{s+1}{2s+d}} & \text{if } d \ge 3,\\ n^{-1/2}\log n & \text{if } d = 2,\\ n^{-1/2} & \text{if } d = 1.\end{cases} \qquad (3.5)$$

(iv) Furthermore, for any $0 \le s < k$ and $\tau_{\min} > 0$, if $f_{\min}$ is small enough and if $f_{\max}$ and $L_s$ are large enough, then there exists a manifold $M \in \mathcal{M}^k_d$ such that

$$\mathcal{R}_n(\mu, W_p, \mathcal{Q}^s(M)) \gtrsim \begin{cases} n^{-\frac{s+1}{2s+d}} & \text{if } d \ge 3,\\ n^{-1/2} & \text{if } d \le 2.\end{cases} \qquad (3.6)$$

Remark 3.2. The condition C($\beta$) on the kernel is only used to ensure that the measure $\mu_{n,h}$ has a lower and upper bounded density on $M$. An alternative way to ensure this property is to assume that the density of $\mu$ is Hölder continuous of exponent $\delta$ for some $\delta > 0$. Techniques from [BH19] then imply that $\|\mu_{n,h} - \mu\|_{L^\infty(M)} \lesssim h^\delta + n^{-1/2}h^{-d/2} \ll 1$ for $h$ well chosen. If $sp > d$, then every element of $B^s_{p,q}(M)$ is Hölder continuous (by [Tri92, Theorem 7.4.2]), and condition C($\beta$) is no longer required. However, Theorem 3.1 also holds for non-continuous densities.

Remark 3.3. Let $K$ be a nonnegative kernel satisfying conditions A, B(0) and C($\beta$). It is straightforward to check that $W_p(\mu_n, \mu_{n,h}) \lesssim h$. Therefore, Theorem 3.1(ii) and Proposition 2.9 imply in particular that $W_p(\mu_n, \mu) \lesssim h + \frac{h^{1-d/2} I_d(h)}{\sqrt n}$. By choosing $h$ of order $n^{-1/d}$, we obtain that

$$W_p(\mu_n, \mu) \lesssim \begin{cases} n^{-1/d} & \text{if } d \ge 3,\\ n^{-1/2}(\log n)^{1/2} & \text{if } d = 2,\\ n^{-1/2} & \text{if } d = 1.\end{cases} \qquad (3.7)$$

Such a result was already shown for $p = \infty$ in [TGHS20], with additional logarithmic factors and a proof very different from ours. See also [Div21] for a short proof of this result when $M$ is the flat torus.

In (3.4), a classical bias-variance trade-off appears. Namely, the bias of the estimator is of order $h^{s+1}$, whereas its fluctuations are of order $h^{1-d/2}/\sqrt n$ (at least for $d \ge 3$). Compare with the situation where no manifold structure is assumed, say for the pointwise estimation of a density of class $C^s$ on the cube $[0,1]^d$. It is then well known (see e.g. [Tsy08, Chapter 1]) that the bias of a kernel estimator is of order $h^s$, whereas its fluctuations are of order $h^{-d/2}/\sqrt n$. The supplementary factor $h$ appearing in both the bias and fluctuation terms can be explained by the fact that we are using the norm $H^{-1}_p(M)$ instead of a pointwise norm to quantify the risk of the estimator: in some sense, we are estimating the antiderivative of the density rather than the density itself. This is particularly transparent if $d = 1$ and $p = 1$, where the Wasserstein distance between two measures is given by the $L^1$ distance between their cumulative distribution functions [San15, Proposition 2.17].

Before giving a proof of Theorem 3.1, let us explain how to extend it to the case where the manifold $M$ is unknown and in the presence of tubular noise.
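To make the construction of $\mu_{n,h}$ concrete in a case where $M$ is known, here is a small numerical sketch on the unit circle in $\mathbb{R}^2$ (so $d = 1$). All concrete choices (the circle, the quadrature grid, $n$, $h$) are illustrative. The bump kernel below is not normalized to integrate to 1 as Condition A requires; since each sample point is reweighted by $1/\rho_h(Y_i)$ with $\rho_h$ computed from the same kernel, the normalization cancels and the total mass is still 1.

```python
import numpy as np

def kernel(u):
    """Smooth radial bump supported on B(0, 1) (unnormalized)."""
    out = np.zeros_like(u)
    m = np.abs(u) < 1.0
    out[m] = np.exp(-1.0 / (1.0 - u[m] ** 2))
    return out

# M = unit circle in R^2; vol_M is approximated by a quadrature grid.
t = np.linspace(0.0, 2.0 * np.pi, 2000, endpoint=False)
grid = np.stack([np.cos(t), np.sin(t)], axis=1)
w = np.full(len(t), 2.0 * np.pi / len(t))      # quadrature weights for vol_M

def mu_nh_density(Y, h, x_eval):
    """Density of mu_{n,h} = K_h * (mu_n / rho_h) with respect to vol_M,
    i.e. x -> (1/n) sum_i K_h(x - Y_i) / rho_h(Y_i), with rho_h = K_h * vol_M."""
    def K_h(a, b):                              # matrix of K_h(a_r - b_c); d = 1
        dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
        return kernel(dist / h) / h
    rho = (K_h(Y, grid) * w[None, :]).sum(axis=1)   # rho_h at the sample points
    return (K_h(x_eval, Y) / rho[None, :]).mean(axis=1)

rng = np.random.default_rng(1)
Y = grid[rng.integers(0, len(grid), size=500)]       # n-sample on M
f_hat = mu_nh_density(Y, h=0.3, x_eval=grid)
print((f_hat * w).sum())   # total mass of mu_{n,h}: equals 1 by construction
```

The mass-1 property holds exactly here because $\rho_h$ and the final integral are computed with the same quadrature, mirroring the cancellation in the continuous construction.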
The measure $\mu_{n,h}$ is the measure having density $K_h * (\mu_n/\rho_h)$ with respect to $\mathrm{vol}_M$. Of course, if $M$ is unknown, then so is $\mathrm{vol}_M$, and we therefore propose the following estimation procedure for $\mathrm{vol}_M$, using local polynomial estimation techniques from [AL19]. Let $X_1,\dots,X_n$ be an $n$-sample in the model with tubular noise $\mathcal{Q}^{s,k}_d(\gamma)$, with $X_i = Y_i + Z_i$, $Y_i$ of law $\mu$ and $Z_i \in T_{Y_i}M^\perp$ with $|Z_i| \le \gamma$. Let $\nu_n^{(i)}$ be the empirical measure $\frac{1}{n-1}\sum_{j \ne i} \delta_{X_j - X_i}$. For two positive parameters $\ell$, $\varepsilon$, the local polynomial estimator $(\hat\pi_i, \hat V_{2,i},\dots,\hat V_{m-1,i})$ of order $m$ at $X_i$ is defined as an element of

$$\operatorname*{arg\,min}_{\pi,\ \sup_{2 \le j \le m-1} \|V_j\|_{op} \le \ell}\ \nu_n^{(i)}\left[ \Big| x - \pi(x) - \sum_{j=2}^{m-1} V_j[\pi(x)^{\otimes j}] \Big|^2\, \mathbf{1}\{x \in B(0,\varepsilon)\} \right], \qquad (3.8)$$

where the argmin is taken over all orthogonal projectors $\pi$ of rank $d$ and symmetric tensors $V_j : (\mathbb{R}^D)^j \to \mathbb{R}^D$ of order $j$. Let $\hat T_i$ be the image of $\hat\pi_i$ and let $\hat\Psi_i : v \in \mathbb{R}^D \mapsto X_i + v + \sum_{j=2}^{m-1} \hat V_{j,i}[v^{\otimes j}]$. Let $\angle(T_1, T_2)$ denote the angle between two $d$-dimensional subspaces, defined by $\|\pi_{T_1} - \pi_{T_2}\|_{op}$, where $\pi_{T_i}$ is the orthogonal projection on $T_i$ for $i = 1, 2$. We summarize the results of [AL19] in the following proposition (see Appendix A for details).
Proposition 3.4.
With probability at least $1 - cn^{-k/d}$, if $m \le k$, $(\log n/n)^{1/d} \lesssim \varepsilon \lesssim 1$, $\gamma \lesssim \varepsilon$ and $1 \lesssim \ell \lesssim \varepsilon^{-1}$, we have

$$\max_{1 \le i \le n} \angle(T_{Y_i}M, \hat T_i) \lesssim \varepsilon^{m-1} + \gamma\varepsilon^{-1} \qquad (3.9)$$

and, for all $1 \le i \le n$, if $v \in \hat T_i$ with $|v| \le \varepsilon$, we have

$$|\hat\Psi_i(v) - \Psi_{Y_i} \circ \pi_{Y_i}(v)| \lesssim \varepsilon^m + \gamma \qquad (3.10)$$
$$\|d\hat\Psi_i(v) - d(\Psi_{Y_i} \circ \pi_{Y_i})(v)\|_{op} \lesssim \varepsilon^{m-1} + \gamma\varepsilon^{-1}. \qquad (3.11)$$

Hence, if $\gamma$ is of order at most $\varepsilon^k$, then it is possible to approximate the tangent space at $Y_i$ with precision $\varepsilon^{k-1}$ and the local parametrization with precision $\varepsilon^k$. In particular, the authors of [AL19] show that, with high probability, $\bigcup_{i=1}^n B_{\hat\Psi_i(\hat T_i)}(X_i, \varepsilon)$ is at Hausdorff distance less than $\varepsilon^k + \gamma$ from $M$ (up to a constant). We now define an estimator $\widehat{\mathrm{vol}}_M$ of $\mathrm{vol}_M$ by using an appropriate partition of unity $(\chi_j)_j$, which is built thanks to the next lemma. For $A, B \subset \mathbb{R}^D$, introduce the asymmetric Hausdorff distance $d_H(A|B) := \sup_{x \in A} d(x, B)$ and the Hausdorff distance $d_H(A,B) := d_H(A|B) \vee d_H(B|A)$. We say that a set $S$ is $\delta$-sparse if $|x - y| \ge \delta$ for all distinct points $x, y \in S$.

Lemma 3.5 (Construction of partitions of unity). Let $\delta \lesssim 1$. Let $S \subset M^\delta$ be a set which is $\delta$-sparse, with $d_H(M^\delta | S) \le \delta$. Let $\theta : \mathbb{R}^D \to [0,1]$ be a smooth radial function supported on $B(0,1)$, equal to 1 on $B(0,1/2)$. Define, for $y \in M^\delta$ and $x \in S$,

$$\chi_x(y) = \frac{\theta\big(\frac{y-x}{\delta}\big)}{\sum_{x' \in S} \theta\big(\frac{y-x'}{\delta}\big)}. \qquad (3.12)$$

Then the family of functions $\chi_x : M^\delta \to [0,1]$, for $x \in S$, satisfies: (i) $\sum_{x \in S} \chi_x \equiv 1$, with at most $c_d$ nonzero terms in the sum at any given point of $M^\delta$; (ii) $\|\chi_x\|_{C^l(M^\delta)} \le C_{l,d}\,\delta^{-l}$ for any $l \ge 0$; and (iii) $\chi_x$ is supported on $B_{M^\delta}(x, \delta)$.
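A direct implementation of (3.12) shows the partition-of-unity property numerically. The smooth bump $\theta$ below and the set of centers on a circle are ad hoc illustrative choices, not the paper's construction:

```python
import numpy as np

def bump(u):
    """Smooth radial cutoff theta: equal to 1 on B(0, 1/2), 0 outside B(0, 1)."""
    u = np.asarray(u, dtype=float)
    out = np.zeros_like(u)
    out[u <= 0.5] = 1.0
    mid = (u > 0.5) & (u < 1.0)
    s = 2.0 * (u[mid] - 0.5)              # rescale (1/2, 1) to (0, 1)
    f = lambda a: np.exp(-1.0 / a)        # C^infinity transition ingredient
    out[mid] = f(1.0 - s) / (f(1.0 - s) + f(s))
    return out

def partition_of_unity(S, delta, ys):
    """chi_x(y) as in (3.12): theta((y - x)/delta), normalized over x in S."""
    d = np.linalg.norm(ys[None, :, :] - S[:, None, :], axis=-1)   # |S| x |ys|
    th = bump(d / delta)
    return th / th.sum(axis=0, keepdims=True)

# centers: a sparse set S on the unit circle; query points slightly off the centers
t = np.linspace(0.0, 2.0 * np.pi, 40, endpoint=False)
S = np.stack([np.cos(t), np.sin(t)], axis=1)
ys = np.stack([np.cos(t + 0.03), np.sin(t + 0.03)], axis=1)
chi = partition_of_unity(S, delta=0.5, ys=ys)
print(np.allclose(chi.sum(axis=0), 1.0))   # the chi_x sum to 1 where defined
```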
The proof of Lemma 3.5 is given in Appendix A. Given a set $S_0 \subset M^\delta$ with $d_H(M^\delta | S_0) \le \delta/3$, the farthest point sampling algorithm with parameter $7\delta/3$ outputs a subset $S \subset S_0$ which is $7\delta/3$-sparse and still covers $M^\delta$ at a scale of order $\delta$: the set $S$ then satisfies the hypotheses of Lemma 3.5 (at a constant multiple of the scale $\delta$). The next proposition describes how we may define a minimax estimator $\widehat{\mathrm{vol}}_M$ of the volume measure on $M$ (up to logarithmic factors) using such a partition of unity.

Theorem 3.6 (Minimax estimation of the volume measure on $M$). Let $d \le D$ be integers and $k \ge 2$. Let $\xi \in \mathcal{Q}^{0,k}_d(\gamma)$ and let $X_1,\dots,X_n$ be an $n$-sample of law $\iota_\#\xi$. Let $(\log n/n)^{1/d} \lesssim \varepsilon \lesssim 1$, $\gamma \lesssim \varepsilon$ and $1 \lesssim \ell \lesssim \varepsilon^{-1}$.

(i) Let $\{X_{i_1},\dots,X_{i_J}\}$ be the output of the farthest point sampling algorithm with parameter of order $\varepsilon$ and input $\{X_1,\dots,X_n\}$. With probability larger than $1 - cn^{-k/d}$, there exists a sequence of smooth nonnegative functions $\chi_j$, $1 \le j \le J$, defined on a tubular neighborhood of $M$ of radius of order $\varepsilon$ and taking values in $[0,1]$, such that $\chi_j$ is supported on a ball of radius $\varepsilon$ around $X_{i_j}$, $\|\chi_j\|_{C^1} \lesssim \varepsilon^{-1}$, and $\sum_{j=1}^J \chi_j(z) = 1$ for $z$ in this neighborhood, with at most $c_d$ nonzero terms in the sum.

(ii) Let $\hat\Psi_i$ be the local polynomial estimator of order $m \le k$ with parameters $\varepsilon$ and $\ell$, and $\hat T_i$ the associated tangent space. Let $\widehat{\mathrm{vol}}_M$ be the measure defined by, for all continuous bounded functions $f : \mathbb{R}^D \to \mathbb{R}$,

$$\int f(x)\, d\widehat{\mathrm{vol}}_M(x) = \sum_{j=1}^J \int_{\hat\Psi_{i_j}(\hat T_{i_j})} f(x)\,\chi_j(x)\, dx, \qquad (3.13)$$

where the integration is taken against the $d$-dimensional Hausdorff measure on $\hat\Psi_{i_j}(\hat T_{i_j})$. Then, for $1 \le r \le \infty$, with probability larger than $1 - cn^{-k/d}$, we have, for $\gamma \lesssim \varepsilon$,

$$W_r\Big( \frac{\widehat{\mathrm{vol}}_M}{|\widehat{\mathrm{vol}}_M|}, \frac{\mathrm{vol}_M}{|\mathrm{vol}_M|} \Big) \lesssim \gamma + \varepsilon^m. \qquad (3.14)$$

(iii) In particular, if $m = k$, $\varepsilon \asymp (\log n/n)^{1/d}$ and $\gamma \lesssim \varepsilon$, we obtain that

$$\mathbb{E}\, W_r\Big( \frac{\widehat{\mathrm{vol}}_M}{|\widehat{\mathrm{vol}}_M|}, \frac{\mathrm{vol}_M}{|\mathrm{vol}_M|} \Big) \lesssim \gamma + \Big(\frac{\log n}{n}\Big)^{k/d}. \qquad (3.15)$$

Also, for any $\tau_{\min} > 0$ and $0 \le s < k$, if $f_{\min}$ is small enough, and if $f_{\max}, L_k, L_s$ are large enough, then

$$\mathcal{R}_n\Big( \frac{\mathrm{vol}_M}{|\mathrm{vol}_M|}, \mathcal{Q}^{s,k}_d(\gamma), W_r \Big) \gtrsim \gamma + \Big(\frac{1}{n}\Big)^{k/d}. \qquad (3.16)$$

Let $\hat\rho_h := K_h * \widehat{\mathrm{vol}}_M$.
We define ν̂_{n,h} as the measure having density K_h ∗ (ν_n/ρ̂_h) with respect to the measure vol̂_M, where ν_n = (1/n) Σ_{i=1}^n δ_{X_i} is the empirical measure of the sample (X_1, …, X_n).

Theorem 3.7. Let d ≤ D be integers, 0 < s ≤ k − 1 with k ≥ 2 and 1 ≤ p < ∞. Let ξ ∈ Q^{s,k}_d(γ), with µ the first marginal of ξ, and let X_1, …, X_n be a n-sample of law ι♯ξ. There exists a constant β depending on the parameters of the model such that the following holds. Assume that K is a kernel satisfying conditions A, B(k) and C(β), that (log n/n)^{1/d} ≲ ε ≲ h ≲ 1, γ ≲ ε, 1 ≲ ℓ ≲ ε^{−1}, and consider the estimator vol̂_M defined in (3.13) with parameters m, ε and ℓ. Then,

(i) The measure ν̂_{n,h} is a nonnegative measure with probability larger than 1 − cn^{−k/d}.

(ii) Define ν̃_{n,h} = ν̂_{n,h} if ν̂_{n,h} is a nonnegative measure and ν̃_{n,h} = δ_{X_1} otherwise. Then, with probability larger than 1 − cn^{−k/d},

  W_p(ν̃_{n,h}, µ_{n,h}) ≲ γ + ε^m.  (3.17)

(iii) In particular, let m = ⌈s + 1⌉, ε ≍ (ln n/n)^{1/d}, ℓ ≍ ε^{−1}, and h ≍ n^{−1/(2s+d)} if d ≥ 3, h ≍ (log n/n)^{1/d} if d ≤ 2. Then,

  E W_p(ν̃_{n,h}, µ) ≲ γ + n^{−(s+1)/(2s+d)} if d ≥ 3,  n^{−1/2}(log n)^{1/2} if d = 2,  n^{−1/2} if d = 1.  (3.18)

(iv) Furthermore, if 0 ≤ s < k and τ_min > 0, for any f_min small enough and f_max, L_s, L_k large enough, we have

  R_n(µ, Q^{s,k}_d(γ), W_p) ≳ γ + n^{−(s+1)/(2s+d)} + n^{−k/d} if d ≥ 3,  n^{−1/2} if d ≤ 2.  (3.19)

Remark. There are several considerations worthy of interest concerning the numerical implementation of the estimators vol̂_M and ν̂_{n,h}. In a preprocessing step, one must first solve the optimization problem (3.8) for each element X_{i_j} of the output of the farthest point sampling algorithm. Let N_j be the number of points of the sample at distance less than ε from X_{i_j} (which is with high probability of order nε^d ≍ log n).
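The farthest point sampling routine used in this preprocessing step is the classical greedy scheme: repeatedly add the sample point farthest from the current landmark set, stopping once that distance drops below the parameter. A minimal sketch in plain NumPy (quadratic time; the precise constants of the text are not tracked), together with the neighbor counts N_j:

```python
import numpy as np

def farthest_point_sampling(X, eps):
    """Greedy net: every point of X ends up within eps of a landmark,
    and landmarks are pairwise more than eps apart."""
    idx = [0]                                    # arbitrary first landmark
    d = np.linalg.norm(X - X[0], axis=1)         # distance to landmark set
    while d.max() > eps:
        j = int(d.argmax())                      # farthest remaining point
        idx.append(j)
        d = np.minimum(d, np.linalg.norm(X - X[j], axis=1))
    return idx

rng = np.random.default_rng(0)
t = rng.uniform(0, 2 * np.pi, 2000)
X = np.c_[np.cos(t), np.sin(t)]                  # noiseless sample on a circle
eps = 0.2
idx = farthest_point_sampling(X, eps)
# each landmark keeps its eps-neighborhood for the local fit of (3.8)
N = [int((np.linalg.norm(X - X[j], axis=1) < eps).sum()) for j in idx]
print(len(idx), min(N))
```

The sparsity/density guarantees used throughout (landmarks > eps apart, sample eps-covered) hold by construction of the greedy loop.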
For k = 2, minimizing (3.8) is equivalent to performing a PCA on the N_j neighbors of X_{i_j}, with a corresponding time complexity of order O(N_j) with high probability. For k ≥
3, as the space of orthogonal projectors of rank d is a non-convex manifold, the minimization of the objective function is more delicate. In [ZJRS16], a Riemannian SVRG procedure is proposed to minimize a functional defined on some Riemannian manifold. Their procedure outputs values whose costs are provably close to the minimal value of the objective function, even for non-convex smooth functions. The implementation of such an algorithm is a promising way to minimize (3.8) in practice.

The uniform measure on M can be approximated by considering the empirical measure (Û_M)_N of a N-sample of law Û_M := vol̂_M/|vol̂_M|. To create such a sample, we may use importance sampling techniques to sample according to the measure with density χ_j on Ψ̂_{i_j}(T̂_{i_j}). Eventually, the measure ν̂^{(N)}_{n,h} with density K_h ∗ (ν_n/ρ̂_h) with respect to (Û_M)_N may be used as a proxy for ν̂_{n,h}.

The first step to prove Theorem 3.1 is to study the bias of the estimator, given by the distance ‖·‖_{H^{−1}_p(M)} between µ_h and µ. Write φ̃ for φ/ρ_h. Introduce the operator A_h : B^s_{p,q}(M) → H^{−1}_p(M) defined for φ ∈ L^1(M) and x ∈ M by

  A_h φ(x) := K_h ∗ (φ/ρ_h)(x) − φ(x) = ∫_M K_h(x − y) ( φ̃(y) − φ̃(x) ) dvol_M(y).  (4.1)

Then,

  ‖µ_h − µ‖_{H^{−1}_p(M)} = ‖A_h f‖_{H^{−1}_p(M)} ≤ ‖A_h‖_{B^s_{p,q}(M),H^{−1}_p(M)} ‖f‖_{B^s_{p,q}(M)} ≤ ‖A_h‖_{B^s_{p,q}(M),H^{−1}_p(M)} L_s.  (4.2)

Proposition 4.1.
Let 0 < s ≤ k − 1, 1 ≤ p < ∞, and assume that the kernel K is of order k. Then, if h ≲ 1,

  ‖A_h‖_{B^s_{p,q}(M),H^{−1}_p(M)} ≲ h^{s+1}.  (4.3)

The proof of Proposition 4.1 consists in using the Taylor expansion of a function φ ∈ B^s_{p,q}(M), and in using that all polynomial terms of low order in the Taylor expansion disappear when integrated against K, as the kernel K is of sufficiently large order. Namely, we have the following property, whose proof is given in Appendix C.

Lemma 4.2.
Assume that the kernel K is of order k and let B : (R^D)^j → R be a tensor of order 0 ≤ j < k. Then, for all x ∈ M,

  | ∫_M K_h(x − y) B[(x − y)^{⊗j}] dy | ≲ ‖B‖_op h^k.  (4.4)

In particular,

  |ρ_h(x) − 1| ≲ h^{k−1}  and  ‖ρ_h − 1‖_{C^j(M)} ≲ h^{k−1−j}.  (4.5)

Let us now give a sketch of proof of Proposition 4.1 in the case 0 < s ≤
1. The H^{−1}_p(M)-norm of A_h φ is by definition equal to

  ‖A_h φ‖_{H^{−1}_p(M)} = sup { ∫ (A_h φ) g dvol_M , ‖g‖_{H^1_{p∗}(M)} ≤ 1 }.

Fix g ∈ H^1_{p∗}(M) with ‖g‖_{H^1_{p∗}(M)} ≤ 1. We use the following symmetrization trick:

  ∫ A_h φ(x) g(x) dx = ∫∫ K_h(x − y)( φ̃(y) − φ̃(x) ) g(x) dy dx
    = ∫∫ K_h(y − x)( φ̃(x) − φ̃(y) ) g(y) dy dx  (by swapping the indexes x and y)
    = (1/2) ∫∫ K_h(x − y)( φ̃(y) − φ̃(x) )( g(x) − g(y) ) dy dx,  (4.6)

where, at the last line, we averaged the two previous lines and used that K is an even function. Informally, as K_h(x − y) = 0 if |x − y| ≥ h, and as ρ_h is roughly constant, we expect |φ̃(y) − φ̃(x)| to be of order h^s and |g(x) − g(y)| to be of order h, leading to a bound on ∫ A_h φ(x) g(x) dx of order h^{s+1}. For l ≥
1, the following analogue of the symmetrization trick holds.
Lemma 4.3 (Symmetrization trick) . There exists h (cid:46) such that the following holds. Let ≤ l ≤ k − be even and let K ( l ) ( x ) = R K λ ( x ) (1 − λ ) l − λ − l ( l − d λ for x ∈ R D . Fix x ∈ M andlet φ ∈ C ∞ ( M ) be a function supported in B M ( x , h ) . Define ˜ φ l := d l ( ˜ φ ◦ Ψ x ) ◦ ˜ π x . Let g ∈ L p ∗ ( M ) with k g k L p ∗ ( M ) ≤ . Then, for h (cid:46) , R A h φ ( x ) g ( x )d x is equal to ZZ B M ( x ,h ) K ( l ) h ( x − y )( ˜ φ l ( y ) − ˜ φ l ( x )) [ π x ( x − y )] ⊗ l ( g ( x ) − g ( y )) d y d x + R, (4.7) where R is a remainder term satisfying | R | (cid:46) k ˜ φ k H lp ( M ) h l +1 . Furthermore, if l ≤ k − is even,we have | R | (cid:46) k ˜ φ k H l +1 p ( M ) h l +2 . Lemma 4.4.
Let η ∈ C ∞ ( M ) and let ≤ l ≤ k − . Assume that either l = 0 or that η issupported on B M ( x , h ) . Let η l = d l ( η ◦ Ψ x ) ◦ ˜ π x . Then, for any h (cid:46) , h − d ZZ B M ( x ,h ) {| x − y | ≤ h } k η l ( x ) − η l ( y ) k p op | x − y | p d x d y ! /p (cid:46) Z B M ( x ,h ) k η l +1 ( x ) k p op d x ! /p (cid:46) k η k H l +1 p ( M ) . (4.8)Proofs of Lemma 4.3 and Lemma 4.4 are found in Appendix C. Let φ ∈ C ∞ ( M ) be afunction supported in B M ( x , h ) and g ∈ H p ∗ ( M ) with k g k H p ∗ ( M ) ≤ Case : s is even Let l = s . Assume first that p > g is smooth. We have ZZ B M ( x ,h ) | K λh ( x − y ) | (cid:13)(cid:13)(cid:13) ˜ φ l ( y ) − ˜ φ l ( x ) (cid:13)(cid:13)(cid:13) op | g ( x ) − g ( y ) || x − y | l d x d y (4.9)16 k K k C ( R D ) ( λh ) l +1 − d ZZ B M ( x ,h ) {| x − y | ≤ λh } (cid:13)(cid:13)(cid:13) ˜ φ l ( y ) − ˜ φ l ( x ) (cid:13)(cid:13)(cid:13) op | g ( x ) − g ( y ) || x − y | d x d y ≤ k K k C ( R D ) ( λh ) l +1 ( λh ) − d ZZ B M ( x ,h ) {| x − y | ≤ λh } (cid:13)(cid:13)(cid:13) ˜ φ l ( y ) − ˜ φ l ( x ) (cid:13)(cid:13)(cid:13) p op d x d y ! /p × ( λh ) − d ZZ B M ( x ,h ) {| x − y | ≤ λh } | g ( x ) − g ( y ) | p ∗ | x − y | p ∗ d x d y ! /p ∗ (cid:46) ( λh ) l +1 p ( λh ) − d Z x ∈B M ( x ,h ) (cid:13)(cid:13)(cid:13) ˜ φ l ( x ) (cid:13)(cid:13)(cid:13) p op vol M ( B M ( x, λh ))d x ! /p k g k H p ∗ ( M ) (cid:46) k ˜ φ k H lp ( M ) ( λh ) l +1 (cid:46) k φ k H lp ( M ) ( λh ) l +1 , (4.10)where at the last line, we used Lemma A.1(iii) to control the volume of B M ( x, λh ) and, at thesecond to last line, we used Lemma 4.4. Furthermore, it follows from Leibniz formula for thederivative of a product and Lemma 4.2 that k ˜ φ k H lp ( M ) (cid:46) k φ k H lp ( M ) .As C ∞ ( M ) is dense in H p ∗ ( M ), inequality (4.10) actually holds for every g ∈ H p ∗ ( M ).If p = 1, then every function g ∈ H p ∗ ( M ) with k g k H p ∗ ( M ) ≤ d g (see Remark 2.4). 
Using that d_g(x, y) ≤ 2|x − y| if |x − y| ≤ τ_min/2, the same computation as in the case p∗ < ∞ shows that inequality (4.10) also holds if p∗ = ∞. By integrating inequality (4.10) against λ ∈ (0,
1) and by using Lemma 4.3, we obtain the inequality ‖A_h φ‖_{H^{−1}_p(M)} ≲ h^{s+1} ‖φ‖_{H^s_p(M)}.

Case: s is odd. Similarly, we treat the case where s ≤ k − 1 is odd, by letting l = s −
1. Onceagain, assume first that p > g is smooth. Then, ZZ B M ( x ,h ) | K λh ( x − y ) | (cid:13)(cid:13)(cid:13) ˜ φ l ( y ) − ˜ φ l ( x ) (cid:13)(cid:13)(cid:13) op | g ( x ) − g ( y ) || x − y | l d x d y ≤ ZZ B M ( x ,h ) | K λh ( x − y ) | (cid:13)(cid:13)(cid:13) ˜ φ l ( y ) − ˜ φ l ( x ) (cid:13)(cid:13)(cid:13) op | x − y | | g ( x ) − g ( y ) || x − y | | x − y | l +2 d x d y ≤ k K k C ( R D ) ( λh ) l +2 − d ZZ B M ( x ,h ) {| x − y | ≤ λh } (cid:13)(cid:13)(cid:13) ˜ φ l ( y ) − ˜ φ l ( x ) (cid:13)(cid:13)(cid:13) op | x − y | | g ( x ) − g ( y ) || x − y | d x d y ≤ k K k C ( R D ) ( λh ) l +2 ( λh ) − d ZZ B M ( x ,h ) {| x − y | ≤ λh } (cid:13)(cid:13)(cid:13) ˜ φ l ( y ) − ˜ φ l ( x ) (cid:13)(cid:13)(cid:13) p op | x − y | p d x d y /p × ( λh ) − d ZZ B M ( x ,h ) {| x − y | ≤ λh } | g ( x ) − g ( y ) | p ∗ | x − y | p ∗ d x d y ! /p ∗ ( λh ) l +2 k φ k H sp ( M ) , (4.11)where at last line we used Lemma 4.4 and the inequality k ˜ φ k H lp ( M ) (cid:46) k φ k H lp ( M ) . As in theprevious case, the same inequality holds for g ∈ H p ∗ ( M ) non necessarily smooth and if p = 1.By using Lemma 4.3 and by integrating (4.11) against λ ∈ (0 , k A h φ k H − p ( M ) (cid:46) h s +1 k φ k H sp ( M ) .So far, we have proven that k A h φ k H − p ( M ) (cid:46) h s +1 k φ k H sp ( M ) (4.12)for all integers 0 ≤ s ≤ k − φ a smooth function supported on B M ( x , h ). To obtain theresult when φ is not supported on some ball B M ( x , h ), we use an appropriate partition of unity.Indeed, for δ = h /
8, standard packing arguments show the existence of a set S of cardinality N ≤ c d | vol M | δ − d with d H ( M δ | S ) ≤ δ/
3. By the remark following Lemma 3.5, the output S ofthe farthest point sampling algorithm with parameter 7 δ/ N (cid:46)
1. We consider such a covering ( B M ( x, h )) x ∈ S , withassociated partition of unity ( χ x ) x ∈ S given by Lemma 3.5. Then, k A h φ k H − p ( M ) is bounded by X x ∈ S k A h ( χ x φ ) k H − p ( M ) (cid:46) h s +1 X x ∈ S k χ x φ k H sp ( M ) (cid:46) h s +1 X x ∈ S k χ x k C s ( M ) k φ k H sp ( M ) (cid:46) h s +1 k φ k H sp ( M ) , where the second to last inequality follows from Leibniz rule for the derivative of a product.Also, the last inequality follows from the fact that ( χ x ) | M = χ x ◦ i M , where i M : M → M δ is theinclusion, which is a C k function with controlled C k -norm. Hence, k χ x k C s ( M ) (cid:46) k χ x k C s ( M δ ) (cid:46) C ∞ ( M ) is dense in H sp ( M ), this gives the desired bound on the operator norm of A h : H sp ( M ) → H − p ( M ) for 0 ≤ s ≤ k − B sp,q ( M ), we use the interpolation inequality (2.6). By the reiteration theorem [Lun18,Theorem 1.3.5], for 0 < s < k − B sp,q ( M ) = ( L p ( M ) , H k − p ( M )) s/ ( k − ,q , with an equivalentnorm. Hence, we have, for 0 < s < k − k A h k B sp,q ( M ) ,H − p ( M ) (cid:46) k A h k − θL p ( M ) ,H − p ( M ) k A h k θH k − p ( M ) ,H − p ( M ) (cid:46) h − sk − h k sk − (cid:46) h s +1 , so that Proposition 4.1 is proven for s < k −
1. It remains to prove the inequality in the case s = k −
1. By Fatou’s lemma and the definition of interpolation spaces (2.5), we have, for someconstant C not depending on s , k A h f k B k − p,q ( M ) ≤ lim inf s → k − s Proposition 4.5. Let µ ∈ Q s ( M ) with Y , . . . , Y n a n -sample of law µ . Assume that h (cid:46) andthat nh d (cid:38) . Then, E k µ n,h − µ h k H − p ( M ) (cid:46) n − / h − d/ I d ( h ) , (4.13) where I d ( h ) is defined in Theorem 3.1. Let ∆ be the Laplace-Beltrami operator on M and G : U M → R be a Green’s function,defined on { ( x, y ) ∈ M × M, x = y } (see [Aub82, Chapter 4]). By definition, if f ∈ C ∞ ( M ),then the function Gf : x ∈ M R G ( x, y ) f ( y )d y is a smooth function satisfying ∆ Gf = f ,with ∇ Gf ( x ) = R ∇ x G ( x, y ) f ( y )d y for x ∈ M . Hence, if w = ∇ Gf , then ∇ · w = f , so that,Proposition 2.5 yields k f k H − p ( M ) ≤ k f k ˙ H − p ( M ) ≤ k∇ Gf k L p ( M ) . By linearity, we have k µ n,h − µ h k H − p ( M ) = k K h ( µ n − µ ) k H − p ( M ) ≤ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) n n X i =1 ∇ G (cid:18) K h ∗ (cid:18) δ Y i ρ h ( Y i ) (cid:19)(cid:19) − E (cid:20) ∇ G (cid:18) K h ∗ (cid:18) δ Y i ρ h ( Y i ) (cid:19)(cid:19)(cid:21)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) L p ( M ) . (4.14)The expectation of the L p -norm of the sum of i.i.d. centered functions is controlled thanks tothe next lemma. Lemma 4.6. Let U , . . . , U n be i.i.d. functions on L p ( M ) . Then, E (cid:13)(cid:13)(cid:13) n P ni =1 ( U i − E U i ) (cid:13)(cid:13)(cid:13) pL p ( M ) is smaller than n − p/ R (cid:0) E (cid:2) | U ( z ) | (cid:3)(cid:1) p/ d z if p ≤ ,C p n − p/ R (cid:0) E | U ( z ) | (cid:1) p/ d z + C p n − p R M E [ | U ( z ) | p ] d z if p > . (4.15) Proof. 
If p ≤ 2, one has by Jensen’s inequality E (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n X i =1 ( U i ( z ) − E U i ( z )) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) p ≤ E (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n X i =1 ( U i ( z ) − E U i ( z )) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) p/ ≤ n p/ (cid:16) E | U ( z ) | (cid:17) p/ and (4.15) follows by integrating this inequality against z ∈ M . For p > 2, we use Rosenthalinequality [Ros70, Theorem 3] for a fixed z ∈ M , and then integrate the inequality against z ∈ M . 19t remains to bound E h(cid:12)(cid:12)(cid:12) ∇ G (cid:16) K h ∗ (cid:16) δ Y ρ h ( Y ) (cid:17)(cid:17) ( z ) (cid:12)(cid:12)(cid:12) p i where Y ∼ µ , z ∈ M and p ≥ Lemma 4.7. Let p ≥ . Then, for all z ∈ M and h (cid:46) , E (cid:20)(cid:12)(cid:12)(cid:12)(cid:12) ∇ G (cid:18) K h ∗ (cid:18) δ Y ρ h ( Y ) (cid:19)(cid:19) ( z ) (cid:12)(cid:12)(cid:12)(cid:12) p (cid:21) (cid:46) if d = 1 − log h if p = d = 2 h p + d − dp else . (4.16)A proof of Lemma 4.7 is found in Appendix D. From (4.14), Lemma 4.6 and Lemma 4.7,we obtain, in the case p ≥ d ≥ E k µ n,h − µ h k H − p ( M ) ≤ (cid:18) E k µ n,h − µ h k pH − p ( M ) (cid:19) /p ≤ C p n − / Z E (cid:12)(cid:12)(cid:12)(cid:12) ∇ G (cid:18) K h ∗ (cid:18) δ Y ρ h ( Y ) (cid:19)(cid:19) ( z ) (cid:12)(cid:12)(cid:12)(cid:12) ! p/ d z /p + C p n /p − (cid:18)Z E (cid:20)(cid:12)(cid:12)(cid:12)(cid:12) ∇ G (cid:18) K h ∗ (cid:18) δ Y ω h ( Y ) (cid:19)(cid:19) ( z ) (cid:12)(cid:12)(cid:12)(cid:12) p (cid:21) d z (cid:19) /p (cid:46) n − / | vol M | /p h − d/ + n /p − | vol M | /p h d/p − d . Recalling that | vol M | ≤ f − (cid:46) nh d (cid:38) 1, one can check that this quantity is smallerup to a constant than n − / h − d/ , proving Proposition 4.5 in the case p ≥ d ≥ 3. Asimilar computation shows that Proposition 4.5 also holds if p ≤ d ≤ The proof of (i) is found in Appendix E. Let us now prove (ii). 
If 0 < s ≤ k − 1, by Proposition 4.1 and (4.2), we have

  ‖µ − µ_h‖_{H^{−1}_p(M)} ≤ L_s ‖A_h‖_{B^s_{p,q}(M),H^{−1}_p(M)} ≲ h^{s+1}.

Combining this inequality with Proposition 4.5 yields (3.4). Let us prove (iii). Let E be the event described in (i). If E is realized, then µ̃_{n,h} is equal to µ̂_{n,h}, and it satisfies µ̂_{n,h} ≥ (f_min/2) vol_M. Thus, Proposition 2.9 yields W_p(µ̃_{n,h}, µ) ≲ ‖µ̂_{n,h} − µ‖_{H^{−1}_p(M)}. If E is not realized, we bound W_p(µ̃_{n,h}, µ) by diam(M), which is itself bounded by a constant depending only on the parameters of the model (see [AL18, Lemma 2.2]). Hence,

  E W_p(µ̃_{n,h}, µ) ≤ E[ W_p(µ̃_{n,h}, µ) 1{E} ] + diam(M) P(E^c) ≲ E‖µ̂_{n,h} − µ‖_{H^{−1}_p(M)} + n^{−k/d},

and we conclude thanks to (3.4). Eventually, a proof of (iv) is found in Appendix G.

[Figure 1: Illustration of Lemma 4.8(a).]

Proof of Theorem 3.6(i). Assume that γ ≤ ε/24. Let X = {X_1, …, X_n} and Y = {Y_1, …, Y_n}. By the remark following Lemma 3.5, the existence of a partition of unity satisfying the requirements of Theorem 3.6(i) is ensured as long as d_H(M^{ε/8} | X) ≤ 5ε/24. We have d_H(M^{ε/8} | X) ≤ d_H(M^{ε/8} | Y) + ε/24 ≤ d_H(M | Y) + 4ε/24. Hence, the partition of unity exists if d_H(M | Y) ≤ ε/24. This is satisfied with probability larger than 1 − cn^{−k/d} if ε ≳ (log n/n)^{1/d} by [Aam17, Lemma III.23].

Proof of Theorem 3.6(ii). For ease of notation, we will assume that the output {X_{i_1}, …, X_{i_J}} of the farthest point sampling algorithm is equal to {X_1, …, X_J}. Write ν_j for the measure having density χ_j with respect to the d-dimensional Hausdorff measure on Ψ̂_j(T̂_j).

Lemma 4.8. If (log n/n)^{1/d} ≲ ε ≲ 1 and γ ≲ ε, with probability larger than 1 − cn^{−k/d}, for all j = 1, …, J:

(a) The map Ψ_{Y_j} ∘ π_{Y_j} : B_{T̂_j}(0, ε) → M is a diffeomorphism onto its image, which contains B_M(Y_j, ε).
Let S j : B M ( Y j , ε ) → B ˆ T j (0 , ε ) be the inverse of Ψ Y j ◦ π Y j . Then, ˆΨ j ◦ S j : B M ( Y j , ε ) → ˆΨ j ( ˆ T j ) is also a diffeomorphism on its image, which contains B ˆΨ j ( ˆ T j ) ( X j , ε ) .Furthermore, for all z ∈ B M ( Y j , ε ) , we have | ˆΨ j ◦ S j ( z ) − X j | ≥ | z − Y j | .(b) The measure ( ˆΨ j ◦ S j ) − ν j has a density ˜ χ j on M equal to ˜ χ j ( z ) = χ j ( ˆΨ j ◦ S j ( z )) J ( ˆΨ j ◦ S j )( z ) for z ∈ M , (4.17) where the function is extended by for z ∈ M \B M ( Y j , ε ) . c) For z ∈ B M ( Y j , ε ) , we have | ˆΨ j ◦ S j ( z ) − z | (cid:46) ε m + γ, (4.18) | ˜ χ j ( z ) − χ j ( z ) | (cid:46) ε m + γ. (4.19)A proof of Lemma 4.8 is found in Appendix F. Let ˆ M ε = S Jj =1 B ˆΨ j ( ˆ T j ) ( X j , ε ) be thesupport of c vol M . Lemma 4.9. Let (log n/n ) /d (cid:46) ε (cid:46) and γ (cid:46) ε . Fix ≤ r ≤ ∞ and let φ : M → R , ˜ φ : ˆ M ε → R be functions satisfying φ min ≤ φ, ˜ φ ≤ φ max for some positive constants φ min , φ max > . Assumefurther that for all j = 1 , . . . , J and for all z ∈ M we have, | ˜ φ ( ˆΨ j ◦ S j ( z )) − φ ( z ) | ≤ T (cid:46) .Then, with probability larger than − cn − k/d , we have W r ˜ φ · c vol M | ˜ φ · c vol M | , φ · vol M | φ · vol M | ! (cid:46) C ( T + ε m + γ ) , (4.20) where C depends on φ min and φ max . In particular, inequality (3.14) is a consequence of Lemma 4.9 with φ ≡ ˜ φ ≡ Proof. Assume first that r < ∞ . We have the bound W r ˜ φ · c vol M | ˜ φ · c vol M | , φ · vol M | φ · vol M | ! = 1 | ˜ φ · c vol M | /r W r ˜ φ · c vol M , φ · vol M | ˜ φ · c vol M || φ · vol M | ! ≤ | ˜ φ · c vol M | /r W r J X j =1 ˜ φ · ν j , J X j =1 ( ˆΨ j ◦ S j ) − ( ˜ φ · ν j ) + W r J X j =1 ( ˆΨ j ◦ S j ) − ( ˜ φ · ν j ) , φ · vol M | ˜ φ · c vol M || φ · vol M | ! (4.21)We use Proposition 2.9 to bound the second term in (4.21). By a change of variables,the density of ( ˆΨ j ◦ S j ) − ( ˜ φ · ν j ) is given by ˜ φ j : z ˜ φ ( ˆΨ j ◦ S j ( z )) ˜ χ j ( z ). 
With probability largerthan 1 − cn − k/d , we have for z ∈ M , should ε m + γ be small enough, J X j =1 ˜ χ j ( z ) ≥ J X j =1 χ j ( z ) − Cc d ( ε m + γ ) ≥ − 12 = 12 , where c d is the constant of Lemma 3.5. Therefore, the density of P Jj =1 ( ˆΨ j ◦ S j ) − ( ˜ φ · ν j ) is largerthan φ min / 2. Remark also that ˜ χ j ( z ) ≤ z ∈ M . Hence, we have according to Lemma22.8, | ˜ φ j ( z ) − φ ( z ) χ j ( z ) | ≤ T + 2 φ max | χ j ( z ) − ˜ χ j ( z ) | (cid:46) T + φ max ( ε m + γ ) for some constant C .This gives the bound, || ˜ φ · c vol M | − | φ · vol M || ≤ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) J X j =1 ˜ φ j − φ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) L ( M ) ≤ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) J X j =1 ˜ φ j − φ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) L r ( M ) | vol M | − /r ≤ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) J X j =1 ˜ φ j − φ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) L ∞ ( M ) | vol M | ≤ C | vol M | ( T + φ max ( ε m + γ )) . (4.22)Therefore, φ | ˜ φ · c vol M || φ · vol M | is larger than φ min (cid:18) − C | vol M | T + φ max ( ε m + γ ) φ min | vol M | (cid:19) ≥ φ min − C ( T + φ max ( ε m + γ )) ≥ φ min T, ε and γ are small enough. 
Hence, by Proposition 2.9 and using (4.22), W r J X j =1 ( ˆΨ j ◦ S j ) − ( ˜ φ · ν j ) , φ · vol M | ˜ φ · c vol M || φ · vol M | ≤ r − /r (cid:18) φ min (cid:19) − /r (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) J X j =1 ˜ φ j − φ | ˜ φ · c vol M || φ · vol M | (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) H − r ( M ) ≤ (cid:18) φ min ∨ (cid:19) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) n X j =1 ˜ φ j − φ | ˜ φ · c vol M || φ · vol M | (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) L r ( M ) ≤ (cid:18) φ min ∨ (cid:19) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) J X j =1 ˜ φ j − φ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) L r ( M ) + || φ · vol M | − | ˜ φ · c vol M ||| φ · vol M | k φ k L r ( M ) ≤ (cid:18) φ min ∨ (cid:19) C ( T + φ max ( ε m + γ )) (cid:18) | vol M | /r + | vol M | φ min | vol M | | vol M | /r φ max (cid:19) ≤ C φ min ,φ max ( T + ε m + γ ) , where we used that | vol M | ≤ f − (cid:46) 1, and the constant C φ min ,φ max in the upper bound dependingon φ min and φ max , but not on r .To bound the first term in (4.21), consider the transport plan P Jj =1 (id , ( ˆΨ j ◦ S j ) − ) ( ˜ φ · ν j ) , which has, according to Lemma 4.8, a cost bounded by J X j =1 Z | y − ( ˆΨ j ◦ S j ) − ( y ) | r d( ˜ φ · ν j )( y ) (cid:46) φ max ( ε m + γ ) r | c vol M | . As | c vol M | (cid:46) | vol M | + T + φ max ( ε m + γ ) (cid:46) 1, we obtain the desired bound. By letting r → ∞ ,and remarking that the different constants involved are independent of r , we observe that thesame bound holds for r = ∞ . 23 emark . Inequality (4.22) with φ ≡ φ ≡ c vol M and the volume | vol M | of M : choosing k = m , it is of order ε k + γ with probabilitylarger than 1 − cn − k/d . Proof of Theorem 3.6(iii). Inequality (3.15) is a consequence of Theorem 3.6(ii), whereas the lower bound on theminimax risk (3.16) is proven in Appendix G. Proof of Theorem 3.7. Note first that ˆ ν n,h is indeed a measure of mass 1. 
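The mass-one claim follows from Fubini and the symmetry of K: integrating the density K_h ∗ (ν_n/ρ̂_h) against vol̂_M gives (1/n) Σ_i ρ̂_h(X_i)/ρ̂_h(X_i) = 1, since ρ̂_h = K_h ∗ vol̂_M and K is even. This identity is exact even when vol̂_M is discretized. A toy one-dimensional check, with a Gaussian kernel standing in for the paper's kernel and a weighted grid standing in for vol̂_M:

```python
import numpy as np

rng = np.random.default_rng(1)
h = 0.1
K_h = lambda u: np.exp(-(u / h) ** 2 / 2) / (h * np.sqrt(2 * np.pi))  # even kernel

# discretized volume estimate: weighted grid standing in for vol_M-hat
g = np.linspace(0.0, 1.0, 400)
w = np.full_like(g, (g[-1] - g[0]) / (len(g) - 1))    # quadrature weights

X = rng.uniform(0.2, 0.8, 50)                         # sample points
rho_hat = lambda x: K_h(x[:, None] - g[None, :]) @ w  # rho_hat = K_h * vol_hat

# density of nu_hat w.r.t. the discrete volume measure, on the grid
dens = K_h(g[:, None] - X[None, :]) @ (1.0 / (len(X) * rho_hat(X)))
mass = float(dens @ w)
print(mass)   # = 1, up to floating point, by the Fubini argument above
```

Swapping the two sums reproduces the displayed cancellation term by term, which is why no normalization step is needed in the definition of ν̂_{n,h}.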
We show in Lemma F.2 that T := max j =1 ...J sup z ∈B ( Y j ,ε ) (cid:12)(cid:12)(cid:12)(cid:12) K h ∗ (cid:18) ν n ˆ ρ h (cid:19) ( ˆΨ j ◦ S j ( z )) − K h ∗ (cid:18) µ n ρ h (cid:19) ( z ) (cid:12)(cid:12)(cid:12)(cid:12) satisfies T (cid:46) ε m + γ with probability larger than 1 − cn − k/d . As f min / ≤ K h ∗ µ n ≤ f max on M by Theorem 3.1(i), and as every y ∈ ˆ M ε is in the image of ˆΨ j ◦ S j for some j = 1 . . . J , wehave f min / ≤ K h ∗ ν n ≤ f max on ˆ M ε should ε k + γ be small enough. This proves Theorem3.1(i) and, together with Lemma 4.9, this also proves Theorem 3.7(ii). Theorem 3.7(iii) is aconsequence of Theorem 3.7(ii) and Theorem 3.7(iv) is proven in Appendix G. Acknowledgements I am grateful to Fréderic Chazal, Pascal Massart, Eddie Aamari, Clément Berenfeld and ClémentLevrard for helpful discussions and valuable comments on different mathematical aspects of thiswork. APPENDIX A Geometric properties of C k manifolds with positive reach and their estimators Let M ∈ M kd,τ min ,L for some k ≥ τ min , L > 0. Recall that the angle between two d -dimensional subspaces T and T is given by ∠ ( T , T ) := k π T − π T k op = k π ⊥ T ◦ π T k op , where π T (resp. π T ) is the orthogonal projection on T (resp. T ) and π ⊥ T := id − π T . Lemma A.1. Let x, y ∈ M . The following properties hold:(i) One has | π ⊥ y ( x − y ) | ≤ | x − y | τ min and ∠ ( T x M, T y M ) ≤ | y − x | τ min . ii) If π M ( z ) = x for some z ∈ M τ min , then z − x ∈ T x M ⊥ .(iii) If h ≤ τ min / , then c d h d ≤ vol M ( B M ( x, h )) ≤ C d h d .(iv) If h ≤ r , then B M ( x, h ) ⊂ Ψ x ( B T x M (0 , h )) ⊂ B M ( x, h/ . Also, if u ∈ B T x M (0 , r ) , then | u | ≤ | Ψ x ( u ) − x | ≤ | u | / .(v) There exists a map N x : B T x M (0 , r ) → T x M ⊥ satisfying dN x (0) = 0 , and such that, for u ∈ B T x M (0 , r ) , we have Ψ x ( u ) = x + u + N x ( u ) with | N x ( u ) | ≤ L | u | .(vi) There exist tensors B x , . . . 
, B k − x of operator norm controlled by a constant dependingon L , d , k and τ min , such that, if u ∈ T x M satisfies | u | ≤ C k,d,L , then J Ψ x ( u ) = 1 + P k − i =2 B ix [ u ⊗ i ] + R x ( u ) , with | R x ( u ) | ≤ C k,d,L | u | k .Proof. See Theorem 4.18 in [Fed59], Lemma 6 in [GW03] for (i), Theorem 4.8 in [Fed59] for (ii),and Proposition 8.7 in [AL18] for (iii). See Lemma A.2 in [AL19] for the second inclusion ofballs in (iv), which also implies the second inequality in (iv). The first inclusion as well as thefirst inequality in (iv) follow from the fact that Ψ x is the inverse of ˜ π x , which is 1-Lipschitz.By a Taylor expansion of Ψ x at u = 0, we have Ψ x ( u ) = x + u + N x ( u ), with N x ( u ) = R d Ψ x ( tu )[ u ⊗ ]d t . Hence, | N x ( u ) | ≤ L | u | . Furthermore, as ˜ π x ◦ Ψ x ( u ) = u , we have π x ( N x ( u )) = 0, i.e. N x takes its values in T x M ⊥ . This proves (v).Eventually, we prove (vi). We have d Ψ x ( u ) = id T x M + dN x ( u ), and d Ψ x ( u ) ∗ d Ψ x ( u ) =id T x M +( dN x ( u )) ∗ dN x ( u ). Therefore, J Ψ x ( u ) = q det( d Ψ x ( u ) ∗ d Ψ x ( u )) = q det(id T x M + ( dN x ( u )) ∗ dN x ( u )) . One has dN x ( u ) = dN x (0) + P k − j =2 d j N x (0)( j − [ u ⊗ ( j − ] + R x ( u ), with | R x ( u ) | ≤ C k,L | u | k − and dN x (0) = 0. Hence, ( dN x ( u )) ∗ dN x ( u ) is written as P k − j =2 B j [ u ⊗ j ] + R x ( u ), with | R x ( u ) | ≤ C k,l | u | k . The operator norm of this operator is smaller than, say, 1 / | u | sufficiently small, andwe conclude the proof by writing a Taylor expansion at 0 of the function F p det(id + F ).We now prove Lemma 3.5, on the construction of smooth partitions of unity based onsome set S which is sufficiently sparse and dense over a tubular neighborhood of M . Proof of Lemma 3.5. Consider the functions θ and ( χ x ) x ∈ S as in the statement of the lemma,and, for y ∈ M δ , let Z ( y ) = P x ∈ S θ (cid:16) y − x δ (cid:17) . 
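The normalized construction just defined (a bump θ summed over the landmark set S, then divided by Z) can be exercised numerically. A sketch in one ambient dimension, with an assumed standard C^∞ bump and an illustrative scale 2δ; the exact constants of Lemma 3.5 are not tracked here:

```python
import numpy as np

def theta(u):
    """C-infinity bump supported on {|u| < 1}."""
    u = np.asarray(u, dtype=float)
    out = np.zeros_like(u)
    inside = np.abs(u) < 1
    out[inside] = np.exp(-1.0 / (1.0 - u[inside] ** 2))
    return out

delta = 0.05
S = np.arange(0.0, 1.0 + 1e-9, delta)        # delta-dense landmarks in [0, 1]

def chi(y):
    """Partition of unity chi_x(y) = theta((y - x)/(2*delta)) / Z(y)."""
    vals = theta((y[:, None] - S[None, :]) / (2 * delta))
    Z = vals.sum(axis=1)                     # Z(y) > 0 since S is delta-dense
    return vals / Z[:, None]

y = np.linspace(0.0, 1.0, 500)
parts = chi(y)
print(parts.sum(axis=1).min(), parts.sum(axis=1).max())  # both 1 up to rounding
```

Each χ_x is supported in a ball of radius 2δ around its landmark, and the normalization by Z makes the family sum to one identically, exactly as in the proof.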
As d H ( M δ | S ) ≤ δ , we have Z ( y ) ≥ χ x ( y ) is well defined. The function χ x is smooth and we have P x ∈ S χ x ≡ M δ .One has d l χ x ( y ) which is written as a sum of terms of the form d l − j θ (cid:16) y − x δ (cid:17) d j ( Z − )( y ), and d j ( Z − )( y ) is equal to a sum of terms of the form Z j − j − ( y ) d j Z ( y ) for 1 ≤ j ≤ j . Also,25 (cid:13)(cid:13) d j θ (cid:16) y − x δ (cid:17)(cid:13)(cid:13)(cid:13) op ≤ C j δ − j and (cid:13)(cid:13) d j Z ( y ) (cid:13)(cid:13) op ≤ C j δ − j P x ∈ S {| x − y | ≤ δ } . Hence, as Z ≥ 1, wehave for any l ≥ (cid:13)(cid:13)(cid:13) d l χ x ( y ) (cid:13)(cid:13)(cid:13) op ≤ C l δ − l X x ∈ S {| x − y | ≤ δ } . It remains to bound this sum. If x ∈ B ( y, δ ), then π M ( x ) ∈ B ( π M ( y ) , δ ). Also, for x = x ∈ S ,we have | π M ( x ) − π M ( x ) | ≥ | x − x | − δ ≥ δ . In particular, the balls B M ( π M ( x ) , δ ) for x ∈ S are pairwise disjoint, and are all included in B M ( π M ( y ) , δ ) . Therefore, if 11 δ ≤ τ ( M ) / M ( B M ( π M ( x ) , δ )) ≥ c d δ d , and that X x ∈ S {| x − y | ≤ δ } ≤ X x ∈ S {| x − y | ≤ δ } vol M ( B M ( π M ( x ) , δ )) c d δ d ≤ vol M ( B M ( π M ( y ) , δ )) c d δ d ≤ c d . This concludes the proof.We end this section by detailing the properties of the local polynomial estimators ˆΨ i and ˆ T i defined in [AL19]. In particular, we prove Proposition 3.4. Recall that X i = Y i + Z i with Y i ∈ M and | Z i | ≤ γ . Aamari and Levrard introduce tensors V ∗ j,i which are defined as d j Ψ X i (0) /j !, where d j Ψ X i (0) is the j th differential of Ψ X i at 0 (see the proof of Lemma 2 in[AL19] for details). In particular, we have V ∗ ,i = π Y i . Furthermore, as ˜ π Y j ◦ Ψ Y j = id, we have π Y j ◦ V ∗ j,i = 0 for j ≥ Lemma A.2. 
With probability larger than − cn − k/d , for any ≤ i ≤ n ,(i) We have ∠ ( T Y i M, ˆ T i ) (cid:46) ε m − + γε − .(ii) For v ∈ ˆ T i , we have ˆΨ i ( v ) = X i + v + ˆ N i ( v ) , where ˆ N i : ˆ T i → ˆ T ⊥ i is defined by ˆ N i ( v ) = P m − j =2 ˆ V j,i [ v ⊗ j ] .(iii) For any ≤ j < m , (cid:13)(cid:13)(cid:13) ˆ V j,i ◦ ˆ π i − V ∗ j,i ◦ π Y i (cid:13)(cid:13)(cid:13) op (cid:46) ε m − j + γε − j . (iv) For v ∈ B ˆ T i (0 , ε ) , we have | ˆΨ i ( v ) − Ψ Y i ( π Y i ( v )) | (cid:46) ε m + γ, (A.1) | ˆ N i ( v ) − N Y i ( π Y i ( v )) | (cid:46) ε m + γ, (A.2) (cid:13)(cid:13)(cid:13) d ˆΨ i ( v ) − d (Ψ Y i ◦ π Y i )( v ) (cid:13)(cid:13)(cid:13) op (cid:46) ε m − + γε − (A.3) (cid:13)(cid:13)(cid:13) d ˆ N i ( v ) − d ( N Y i ◦ π Y i )( v ) (cid:13)(cid:13)(cid:13) op (cid:46) ε m − + γε − . (A.4) Proof of Proposition 3.4. Lemma A.2(i) is stated in Theorem 2 in [AL19]. Remark that for x ∈ B ( X i , ε ), with ˜ x = x − X i , (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ˜ x − π (˜ x ) − m − X j =2 V j [ π (˜ x ) ⊗ j ] (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) = (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ˜ x − π (˜ x ) − m − X j =2 π ⊥ ◦ V j [ π (˜ x ) ⊗ j ] (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) + (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) m − X j =2 π ◦ V j [ π (˜ x ) ⊗ j ] (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) 26o that we may always assume that the tensors ˆ V j,i minimizing the criterion (3.8) satisfy ˆ π i ◦ ˆ V j,i =0 for j ≥ 2. This proves Lemma A.2(ii).We prove Lemma A.2(iii) by induction on 2 ≤ j < m . The result for j = 2 is stated in[AL19, Theorem 2]. It is shown in [AL19] (see Equation (3)) that there exist tensors V j,i for1 ≤ j < m satisfying with probability larger than 1 − cn − k/d , (cid:13)(cid:13)(cid:13) V j,i ◦ π Y i (cid:13)(cid:13)(cid:13) op (cid:46) ε m − j + γε − j . 
(A.5)The tensors V j,i are defined by the relations, for y ∈ M close enough to Y i , ( y − Y i = π Y i ( y − Y i ) + P m − j =2 V ∗ j,i [ π Y i ( y − Y i ) ⊗ j ] + R ( y − Y i ) y − Y i − ˆ π i ( y − Y i ) − P m − j =2 ˆ V j,i [ˆ π i ( y − Y i ) ⊗ j ] = P m − j =1 V j,i [ π Y i ( y − Y i ) ⊗ j ] + R ( y − Y i ) , with | R ( y − Y i ) | , | R ( y − Y i ) | (cid:46) ε m , see the proof of Lemma 3 in [AL19]. In particular, for j ≥ π Y i ◦ V ∗ j,i = 0, we see that V j,i ◦ π Y i is written as the sum of ( π Y i − ˆ π i ) ◦ V ∗ j,i + ( V ∗ j,i ◦ π Y i − ˆ V j,i ◦ ˆ π i ) and of a sum of terms proportional toˆ V j ,i [ˆ π i ◦ V ∗ a ,i ◦ π Y i , . . . , ˆ π i ◦ V ∗ a j ,i ◦ π Y i ] , (A.6)where 2 ≤ j < j and a + · · · + a j = j , 1 ≤ a , . . . , a j < j . There exists in particular an indexin the sum which is larger than 2. Assume without loss of generality that a , . . . , a l > a l +1 , . . . , a j = 1, so that ˆ π i ◦ ˆ V a u ,i = 0 for 1 ≤ u ≤ l . Then, (cid:13)(cid:13)(cid:13) ˆ V j ,i [ˆ π i ◦ V ∗ a ,i ◦ π Y i , . . . , ˆ π i ◦ V ∗ a l ,i ◦ π Y i , . . . , ˆ π i ◦ V ∗ a j ,i ◦ π Y i ] (cid:13)(cid:13)(cid:13) op = (cid:13)(cid:13)(cid:13) ˆ V j ,i [ˆ π i ◦ ( V ∗ a ,i − ˆ V a ,i ) ◦ π Y i , . . . , ˆ π i ◦ ( V ∗ a l ,i − ˆ V a l ,i ) ◦ π Y i , . . . , ˆ π i ◦ V ∗ a j ,i ◦ π Y i ] (cid:13)(cid:13)(cid:13) op (cid:46) ‘ l Y u =1 (cid:13)(cid:13)(cid:13) V ∗ a u ,i ◦ π Y i − ˆ V a u ,i ◦ π Y i (cid:13)(cid:13)(cid:13) op (cid:46) ‘ l Y u =1 (cid:18)(cid:13)(cid:13)(cid:13) V ∗ a u ,i ◦ π Y i − ˆ V a u ,i ◦ ˆ π i (cid:13)(cid:13)(cid:13) op + ‘ k π Y i − ˆ π i k op (cid:19) (cid:46) ε − l Y u =1 (cid:16) ε m − a u + γε − a u + ε m − + γε − (cid:17) (cid:46) ε − ( ε lm − ( j − l ) + γ l ε − ( j − l ) ) (cid:46) ε m − j + γε − j , where at the last line we use the induction hypothesis as well as Lemma A.2(i), the fact that P lu =1 a u = j − l and that ‘ (cid:46) ε − . 
As (cid:13)(cid:13)(cid:13) ( π Y i − ˆ π i ) ◦ V ∗ j,i (cid:13)(cid:13)(cid:13) op (cid:46) ε m − + γε − , we obtain that (cid:13)(cid:13)(cid:13) ( V ∗ j,i ◦ π Y i − ˆ V j,i ◦ ˆ π i ) − V j,i ◦ π Y i (cid:13)(cid:13)(cid:13) op (cid:46) ε m − j + γε − j . (cid:13)(cid:13)(cid:13) V ∗ j,i ◦ π Y i − ˆ V j,i ◦ ˆ π i (cid:13)(cid:13)(cid:13) op ≤ (cid:13)(cid:13)(cid:13) ( V ∗ j,i ◦ π Y i − ˆ V j,i ◦ ˆ π i ) − V j,i ◦ π Y i (cid:13)(cid:13)(cid:13) op + (cid:13)(cid:13)(cid:13) V j,i ◦ π Y i (cid:13)(cid:13)(cid:13) op (cid:46) ε m − j + γε − j . We now may prove (A.1). Indeed, for v ∈ B ˆ T i (0 , ε ), ˆΨ i ( v ) = X i + v + P m − j =2 ˆ V j,i [ v ⊗ j ],whereas by a Taylor expansion, Ψ Y i ◦ π Y i ( v ) = Y i + π Y i ( v ) + P m − j =2 V j,i [ π Y i ( v ) ⊗ j ] + R ( v ), with | R ( v ) | (cid:46) ε m . By Lemma A.2(iii), the difference between the two quantities is bounded with highprobability by a sum of terms of order ( ε m − j + γε − j ) | v | j (cid:46) ε m + γ . Inequality (A.2) is directlyimplied by (A.1) and Lemma A.2(i). Inequality (A.3) is proven as (A.1), by noting that, for h ∈ ˆ T i , ( d (Ψ Y j ◦ π Y j )( v )[ h ] = π Y j ( h ) + P m − j =2 jV ∗ j,i [ π Y j ( v ) , π Y j ( h ) ⊗ ( j − ] + R ( v ) hd ˆΨ j ( v )[ h ] = h + P m − j =2 j ˆ V j,i [ v, h ⊗ ( j − ] , with k R ( v ) k op (cid:46) ε m − . Equation (A.4) is shown in a similar way. B Properties of negative Sobolev norms Proof of Proposition 2.5. The second inequality in (i) is trivial. The assertion (ii) is stated in[BCS10, Theorem 2.1] for an open set Ω ⊂ R d , and their proof can be straightforwardly adaptedto the manifold setting. It remains to prove the first inequality in (i). Note that for any g with k∇ g k L p ∗ ( M ) ≤ 1, one has R f g dvol M = R f ( g − R g dvol M )dvol M as R f dvol M = 0. 
Also, by the Poincaré inequality (see [BCH18, Theorem 0.6]),
$$\Big\| g - \int_M g \Big\|_{L^{p^*}(M)} \le C p R^{d/p^*+p}\, \|\nabla g\|_{L^{p^*}(M)} \le C p R^{d/p^*+p},$$
where $R = \max\{d_g(x,y),\ x,y \in M\}$ and $C$ depends on $d$ and on a lower bound $\kappa$ on the Ricci curvature of $M$. Therefore, $\|g - \int_M g\|_{H^1_{p^*}(M)} \le C p R^{d/p^*+p}$. The quantity $\kappa$ can be further lower bounded by a constant depending on $\tau_{\min}$ and $d$. Indeed, a bound on the second fundamental form of $M$ entails a bound on the Ricci curvature according to the Gauss equation (see e.g. [dC92, Chapter 6]), and the second fundamental form is controlled by the reach of $M$, see [NSW08, Proposition 6.1]. As $Cp \le C \vee 1$, to conclude, it suffices to bound the geodesic diameter of $M$. This is done in the following lemma.

Lemma B.1. The geodesic diameter of $M$ satisfies $\sup_{x,y \in M} d_g(x,y) \le c_d\, |\mathrm{vol}_M|\, \tau_{\min}^{1-d}$.

[Figure 2: Illustration of the construction in the proof of Lemma B.1.]

Proof. Consider a covering of $M$ by $N$ open balls of radius $r = \tau_{\min}/8$ centered at points of $M$, and let $x, y \in M$. Such a covering exists with $N \le c_d |\mathrm{vol}_M| r^{-d}$ by standard packing arguments. Let $\gamma : [0,\ell] \to M$ be a unit speed curve between $x$ and $y$. Let $B_0$ be a ball of the covering such that $x \in B_0$. If $y \in B_0$, then $|x-y| \le 2r$, and by [NSW08, Proposition 6.3], we have $d_g(x,y) \le 4r$. Otherwise, let $t_1 = \inf\{t \in [0,\ell],\ \forall t' \ge t,\ \gamma(t') \notin B_0\}$. Then $x_1 := \gamma(t_1)$ belongs to the boundary of $B_0$, and is also in some other ball $B_1$. By the previous argument, we have $d_g(x, x_1) \le 4r$. If $y \in B_1$, then $d_g(x_1, y) \le 4r$ and $d_g(x,y) \le 8r$. Otherwise, we define $t_2 = \inf\{t \in [t_1,\ell],\ \forall t' \ge t,\ \gamma(t') \notin B_1\}$ and we iterate the same argument. At the end, we obtain a sequence $x = x_0, x_1, \dots, x_I$ of points in $M$ with associated balls $B_i$ which contain $x_i$, such that $y \in B_I$ and $d_g(x_i, x_{i+1}) \le 4r$. Furthermore, all the balls $B_i$ are pairwise distinct.
As d g ( x I , y ) ≤ r , we have ‘ ≤ ( I + 1)4 r ≤ ( N + 1)4 r ≤ N r . By letting γ be a geodesic, weobtain in particular ‘ = d g ( x, y ) ≤ N r ≤ c d | vol M | r − d . Proof of Proposition 2.9. Given a measurable map ρ : [0 , → P p , E t a vectorial measure abso-lutely continuous with respect to ρ t (see [San15, Box 4.2]) and v ( x, t ) a time-depending vectorfield, defined as the density of E t with respect to ρ t , we define the Benamou-Brenier functional B p ( ρ, E ) := Z | v ( x, t ) | p d ρ t ( x )d t. (B.1)The Benamou-Brenier formula [BB00, Bre03] asserts that for µ, ν ∈ P p supported on some ballof radius R , W pp ( µ, ν ) = min {B p ( ρ, E ) , ∂ t ρ t + ∇ · E t = 0 , ρ = µ, ρ = ν } , (B.2)where ρ t is supported on the ball of radius R , and the continuity equation ∂ t ρ + ∇ · E = µ − ν has to be understood in the distributional sense, i.e. Z [0 , × R D ∂ t φ ( t, x )d ρ ( t, x ) + Z [0 , × R D ∇ φ ( t, x ) · d E ( t, x ) = 0 , (B.3)29or all φ ∈ C ((0 , × B (0 , R )) with compact support.Assume that µ has a density f and ν has a density f on M . As τ ( M ) > 0, theexistence of a probability measure of mass 1, supported on M , with density larger than f min implies that M is compact, see Remark 2.11. It is in particular included in a ball B (0 , R ) forsome R large enough. Let w be a vector field on M with ∇ · w = µ − ν in a distributional sense,i.e. R ∇ g · w = − R g ( µ − ν ) for all g ∈ C ( M ). Let ρ t = (1 − t ) µ + tν and define E the vectormeasure having density w with respect to Leb × vol M , where Leb is the Lebesgue measure on[0 , ρ, E ) satisfies the continuity equation and E = v · ρ where v ( t, x ) = w ( x )(1 − t ) f ( x )+ tf ( x ) for t ∈ [0 , x ∈ M . Hence, W pp ( µ, ν ) ≤ Z Z p | v | p d ρ = 1 p Z Z | w ( x ) | p | (1 − t ) f ( x ) + tf ( x ) | p ((1 − t ) f ( x ) + tf ( x ))d x d t ≤ p Z | w ( x ) | p d x f p − . By taking the infimum on vector fields w on M satisfying ∇ · w = µ − ν and using Proposition2.5, we obtain the conclusion. 
The second inequality in (2.9) follows from Proposition 2.5. C Proofs of Section 4.1 Proof of Lemma 4.2. We first prove (4.4). Note that if | x − y | ≥ h for x, y ∈ M , then K h ( x − y ) =0. Hence, by a change of variable, using that B M ( x, h ) ⊂ Ψ x ( B T x M (0 , h )) according to LemmaA.1(iv), Z M K h ( x − y ) B [( x − y ) ⊗ j ]d y = Z B TxM (0 ,h ) K h ( x − Ψ x ( v )) B [( x − Ψ x ( v )) ⊗ j ] J Ψ x ( v )d v = Z B TxM (0 , K (cid:18) x − Ψ x ( hv ) h (cid:19) B [( x − Ψ x ( hv )) ⊗ j ] J Ψ x ( hv )d v. As the functions Ψ x and K are C k , according to Lemma A.1(v) and Lemma A.1(vi), we canwrite by a Taylor expansion, for v, u ∈ B T x M (0 , r ), Ψ x ( v ) = x + v + P k − i =2 d i Ψ x (0) i ! [ v ⊗ i ] + R ( x, v ) J Ψ x ( v ) = 1 + P k − i =2 B ix [ v ⊗ i ] + R ( x, v ) K ( v + u ) = K ( v ) + P k − i =1 d i K ( v ) i ! [ u ⊗ i ] + R ( v, u ) B [( v + u ) ⊗ j ] = B [ v ⊗ j ] + P ∅6 = σ ⊂{ ,...,j } B [ v σ , u σ c ] , (C.1)where | R j ( x, v ) | ≤ C j | v | k for j = 1 , | R ( v, u ) | ≤ C | u | k and ( v σ , u σ c ) is the j -tuple whose l thentry is equal to v if l ∈ σ , u otherwise. We obtain that x − Ψ x ( hv ) h = − v − k − X i =2 d i Ψ x (0) i ! [( hv ) ⊗ i ] h − − R ( x, hv ) h − , K (cid:16) x − Ψ x ( hv ) h (cid:17) B [( x − Ψ x ( hv )) ⊗ j ] J Ψ x ( hv ) is written as a sum of termsof the form C i ,i ,i h − i d i K ( v )[( d i Ψ x (0)[( hv ) ⊗ i ]) ⊗ i ] F i [( hv ) ⊗ i ] (C.2)for 0 ≤ i ≤ k − 1, 2 ≤ i ≤ k − j ≤ i ≤ k , where F i is some tensor of order i and k issome integer depending on k and j , plus a remainder term smaller than k B k op | hv | k − j up to aconstant depending on k , j , L k and K . The terms for which i i + i − i ≥ k are smaller than k B k op h k up to a constant, whereas the integrals of the other the terms are null as the kernelis of order k . The first inequality in (4.5) is proven in a similar manner. Let us now bound k ρ h k C j ( M ) . 
Given x ∈ M , it suffices to bound (cid:13)(cid:13) d j ( ρ h ◦ Ψ x )(0) (cid:13)(cid:13) op . We have d j ( ρ h ◦ Ψ x )(0) = h − j Z B TxM (0 ,h ) ( d j K ) h ( x − Ψ x ( v )) J Ψ x ( v )d v. Therefore, using the same argument as before, we obtain that (cid:13)(cid:13) d j ( ρ h ◦ Ψ x )(0) (cid:13)(cid:13) op (cid:46) h k − − j . Proof of Lemma 4.3. Let 0 ≤ l ≤ k − φ ∈ C ∞ ( M ) be supported in B M ( x , h ) forsome h small enough and g ∈ L p ∗ ( M ) with k g k L p ∗ ( M ) ≤ 1. Let x = Ψ x ( u ) ∈ B M ( x , h )and let ˜ φ x = ˜ φ ◦ Ψ x . Recall that ˜ φ l = d l ˜ φ x ◦ ˜ π x . We have K h ( x − Ψ x ( v )) = 0 only if | x − Ψ x ( v ) | ≤ h . Hence, as | x − Ψ x ( v ) | ≥ | u − v | (recall that Ψ x is the inverse of the projection˜ π x ), the function K h ( x − Ψ x ( · )) is supported on B T x M ( u, h ) ⊂ B T x M (0 , r ) =: B for h, h small enough. Thus, A h φ ( x ) = Z B M ( x,h ) K h ( x − y )( ˜ φ ( y ) − ˜ φ ( x ))d y = Z B K h ( x − Ψ x ( v ))( ˜ φ x ( v ) − ˜ φ x ( u )) J Ψ x ( v )d v. We may write˜ φ x ( v ) − ˜ φ x ( u ) = l − X i =1 d i ˜ φ x ( u ) i ! [( v − u ) ⊗ i ] + Z d l ˜ φ x ( u + λ ( v − u ))[( v − u ) ⊗ l ] (1 − λ ) l − ( l − λ. Each term R B K h ( x − Ψ x ( v )) d i ˜ φ x ( u ) i ! [( v − u ) ⊗ i ] J Ψ x ( v )d v is equal to Z M K h ( x − y ) d i ˜ φ x (˜ π x ( x )) i ! [( π x ( y − x )) ⊗ i ]d y, and is therefore of order smaller than h k max ≤ i ≤ l (cid:13)(cid:13)(cid:13) ˜ φ i ( x ) (cid:13)(cid:13)(cid:13) op by Lemma 4.2. 
Hence, A h φ ( x ) isequal to the sum of a remainder term of order h k max ≤ i ≤ l (cid:13)(cid:13)(cid:13) ˜ φ i ( x ) (cid:13)(cid:13)(cid:13) op and of Z Z B K h ( x − Ψ x ( v )) d l ˜ φ x ( u + λ ( v − u ))[( v − u ) ⊗ l ] (1 − λ ) l − ( l − J Ψ x ( v )d v d λ Z Z B K h ( x − Ψ x ( v )) (cid:16) d l ˜ φ x ( u + λ ( v − u )) − d l ˜ φ x ( u ) (cid:17) [( v − u ) ⊗ l ] (1 − λ ) l − ( l − J Ψ x ( v )d v d λ + R ( x ) , where | R ( x ) | (cid:46) h k max ≤ i ≤ l (cid:13)(cid:13)(cid:13) ˜ φ i ( x ) (cid:13)(cid:13)(cid:13) op by Lemma 4.2. We now fix λ ∈ (0 , 1) and write, by achange of variables, and as B T x M ( u, h ) ⊂ B for h , h small enough, U ( x ) := Z B K h ( x − Ψ x ( v )) (cid:16) d l ˜ φ x ( u + λ ( v − u )) − d l ˜ φ x ( u ) (cid:17) [( v − u ) ⊗ l ] J Ψ x ( v )d v = Z B K h (cid:18) x − Ψ x (cid:18) u + w − uλ (cid:19)(cid:19) (cid:16) d l ˜ φ x ( w ) − d l ˜ φ x ( u ) (cid:17) " ( w − u ) ⊗ l λ l J Ψ x (cid:18) u + w − uλ (cid:19) d wλ d Note that | K h ( u ) − K h ( v ) | (cid:46) h − d − | u − v | {| u | ≤ h or | v | ≤ h } , and that, as Ψ x is C , (cid:12)(cid:12)(cid:12)(cid:12) x − Ψ x (cid:18) u + w − uλ (cid:19) − x − Ψ x ( w ) λ (cid:12)(cid:12)(cid:12)(cid:12) ≤ (cid:12)(cid:12)(cid:12)(cid:12) d Ψ x ( u )[ w − u ] − ( x − Ψ x ( w )) λ (cid:12)(cid:12)(cid:12)(cid:12) + L k | w − u | λ ≤ L k | w − u | λ (cid:46) | w − u | λ , whereas, as J Ψ x is Lipschitz continuous, (cid:12)(cid:12)(cid:12)(cid:12) J Ψ x (cid:18) u + w − uλ (cid:19) − J Ψ x ( w ) (cid:12)(cid:12)(cid:12)(cid:12) (cid:46) (cid:12)(cid:12)(cid:12)(cid:12) u + w − uλ − w (cid:12)(cid:12)(cid:12)(cid:12) (cid:46) | w − u | λ . 
Hence, U ( x ) is equal to the sum of λ − l Z B K hλ ( x − Ψ x ( w )) (cid:16) d l ˜ φ x ( w ) − d l ˜ φ x ( u ) (cid:17) [( w − u ) ⊗ l ] J Ψ x ( w ) d w = λ − l Z M K hλ ( x − y ) (cid:16) ˜ φ l ( y ) − ˜ φ l ( x ) (cid:17) [( π x ( y − x )) ⊗ l ]d y, and of a remainder term smaller than λ − l Z B (cid:12)(cid:12)(cid:12)(cid:12) λ − d K h (cid:18) x − Ψ x (cid:18) u + w − uλ (cid:19)(cid:19) J Ψ x (cid:18) u + w − uλ (cid:19) − K hλ ( x − Ψ x ( w )) J Ψ x ( w ) (cid:12)(cid:12)(cid:12)(cid:12) × (cid:13)(cid:13)(cid:13) d l ˜ φ x ( w ) − d l ˜ φ x ( u ) (cid:13)(cid:13)(cid:13) op | w − u | l d w (cid:46) λ − l Z | w − u | (cid:46) λh | w − u | ( λh ) d +1 J Ψ x (cid:18) u + w − uλ (cid:19) + | K hλ ( x − Ψ x ( w )) | | w − u | λ ! × (cid:13)(cid:13)(cid:13) d l ˜ φ x ( w ) − d l ˜ φ x ( u ) (cid:13)(cid:13)(cid:13) op | w − u | l d w (cid:46) h l +1 ( λh ) − d Z | w − u | (cid:46) λh (cid:13)(cid:13)(cid:13) d l ˜ φ x ( w ) − d l ˜ φ x ( u ) (cid:13)(cid:13)(cid:13) op d w. R M A h φ ( x ) g ( x )d x as S + R , where, bythe symmetrization trick (using that l is even) S = ZZ M × M K ( l ) h ( x − y ) (cid:16) ˜ φ l ( y ) − ˜ φ l ( x ) (cid:17) [( π x ( y − x )) ⊗ l ] g ( x )d y d x = ZZ M × M K ( l ) h ( x − y ) (cid:16) ˜ φ l ( x ) − ˜ φ l ( y ) (cid:17) [( π x ( x − y )) ⊗ l ] g ( y )d y d x = 12 ZZ M × M K ( l ) h ( x − y ) (cid:16) ˜ φ l ( y ) − ˜ φ l ( x ) (cid:17) [( π x ( x − y )) ⊗ l ]( g ( x ) − g ( y ))d y d x, and, as A h φ is supported on B M ( x , h + h ) ⊂ B M ( x, h ) if h is small enough, R is smaller upto a constant than, h l +1 ( λh ) − d Z x ∈B M ( x, h ) Z | w − ˜ π x ( x ) | (cid:46) λh (cid:13)(cid:13)(cid:13) d l ˜ φ x ( w ) − d l ˜ φ x (˜ π x ( x )) (cid:13)(cid:13)(cid:13) op | g ( x ) | d w d x (C.3)+ Z M h k max ≤ i ≤ l (cid:13)(cid:13)(cid:13) ˜ φ i ( x ) (cid:13)(cid:13)(cid:13) op | g ( x ) | d x (cid:46) h l +1 ( λh ) − d Z w ∈B M ( x, h ) (cid:13)(cid:13)(cid:13) d l ˜ φ x ( w ) (cid:13)(cid:13)(cid:13) op Z | w − ˜ π x ( x ) | 
(cid:46) λh | g ( x ) | d x d w (C.4)+ h l +1 Z x ∈B M ( x, h ) (cid:13)(cid:13)(cid:13) ˜ φ l ( x ) (cid:13)(cid:13)(cid:13) op | g ( x ) | d x + Z M h k max ≤ i ≤ l (cid:13)(cid:13)(cid:13) ˜ φ i ( x ) (cid:13)(cid:13)(cid:13) op | g ( x ) | d x, where we also used Lemma A.1(iii). By the chain rule,max ≤ i ≤ l (cid:13)(cid:13)(cid:13) ˜ φ i ( x ) (cid:13)(cid:13)(cid:13) op (cid:46) max ≤ i ≤ l (cid:13)(cid:13)(cid:13) d i ˜ φ ( x ) (cid:13)(cid:13)(cid:13) op (cid:46) l X i =1 (cid:13)(cid:13)(cid:13) d i ˜ φ ( x ) (cid:13)(cid:13)(cid:13) op . Hence, applying Hölder’s inequality and using that k g k L p ∗ ( M ) ≤ h l +1 k ˜ φ k H lp ( M ) . To bound the first term in (C.4), remark that by Young’sinequality for integral operators [Sog17, Theorem 0.3.1], if T λh ( g )( y ) = ( λh ) − d R | x − y | (cid:46) λh | g ( x ) | d x ,then kT λh g k L p ∗ ( M ) (cid:46) k g k L p ∗ ( M ) . This yields, by Hölder’s inequality, h l +1 Z w ∈B M ( x, h ) (cid:13)(cid:13)(cid:13) d l ˜ φ x ( w ) (cid:13)(cid:13)(cid:13) op T hλ ( g )(Ψ x ( w ))d w (cid:46) h l +1 k ˜ φ k H lp ( M ) , which concludes the proof of the first statement of Lemma 4.3. To bound the remainder termin terms of k ˜ φ k H l +1 p ( M ) , we bound the second term in (C.3) in the same fashion, while, to boundthe first term, we write, by a change of variables, Z B M ( x , h ) Z | w − ˜ π x ( x ) | (cid:46) λh (cid:13)(cid:13)(cid:13) d l ˜ φ x ( w ) − d l ˜ φ x (˜ π x ( x )) (cid:13)(cid:13)(cid:13) op | g ( x ) | d x d w ≤ Z Z B M ( x , h ) Z | w − ˜ π x ( x ) | (cid:46) λh (cid:13)(cid:13)(cid:13) d l +1 ˜ φ x (˜ π x ( x ) + λ ( w − ˜ π x ( x ))) (cid:13)(cid:13)(cid:13) op | ˜ π x ( x ) − w || g ( x ) | d x d w d λ h Z Z B M ( x , h ) Z | u − ˜ π x ( x ) | (cid:46) λ λh (cid:13)(cid:13)(cid:13) d l +1 ˜ φ x ( u ) (cid:13)(cid:13)(cid:13) op | g ( x ) | d x d uλ d d λ , and this term is bounded as the first term in (C.4) by h ( hλ ) d k ˜ φ k H l +1 p ( M ) , concluding the proofof Lemma 4.3. 
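The cancellations used in the proofs of Lemmas 4.2 and 4.3 come from $K$ being a kernel of order $k$: it has unit mass while its moments of orders $1$ to $k-1$ vanish. As a quick numerical illustration (in dimension one, with the classical fourth-order Gaussian kernel — an illustrative choice of ours, not necessarily the kernel of the paper):

```python
import numpy as np

def gaussian_kernel_order4(t):
    """K(t) = (3 - t^2)/2 * phi(t), with phi the standard normal density.
    A kernel of order 4: unit mass, moments of orders 1-3 vanish."""
    phi = np.exp(-t * t / 2.0) / np.sqrt(2.0 * np.pi)
    return 0.5 * (3.0 - t * t) * phi

# Riemann sums on a fine grid; the integrand is smooth and decays fast,
# so the sums are accurate to many digits.
t = np.linspace(-12.0, 12.0, 200001)
dt = t[1] - t[0]
moments = [float(np.sum(t**j * gaussian_kernel_order4(t)) * dt)
           for j in range(5)]
# moments ~ (1, 0, 0, 0, -3): the fourth moment does not vanish.
```

The surviving fourth moment is what limits the bias to order $h^k$ in Lemma 4.2: Taylor terms of degree below $k$ integrate to zero against $K$, the degree-$k$ term does not.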
Proof of Lemma 4.4. By the chain rule, (cid:13)(cid:13)(cid:13) d l +1 ( η ◦ Ψ x )( u ) (cid:13)(cid:13)(cid:13) op (cid:46) max ≤ i ≤ l +1 (cid:13)(cid:13) d i η (Ψ x ( u )) (cid:13)(cid:13) op forany u ∈ B T x M (0 , h ). Hence, by a change of variables, Z B M ( x ,h ) k η l +1 ( x ) k p op d x (cid:46) Z B Tx M (0 ,h ) max ≤ i ≤ l +1 (cid:13)(cid:13)(cid:13) d i η (Ψ x ( u )) (cid:13)(cid:13)(cid:13) p op d u (cid:46) l +1 X i =1 Z B Tx M (0 ,h ) (cid:13)(cid:13)(cid:13) d i η (Ψ x ( u )) (cid:13)(cid:13)(cid:13) p op d u (cid:46) l +1 X i =1 Z B Tx M (0 ,h ) (cid:13)(cid:13)(cid:13) d i η (Ψ x ( u )) (cid:13)(cid:13)(cid:13) p op J Ψ x ( u )d u (cid:46) k η k pH l +1 p ( M ) , where we used at last line that, by Lemma A.1(vi), J Ψ x ( u ) ≥ / | u | ≤ h if h is smallenough. To prove the first inequality, write h − d ZZ B M ( x ,h ) {| x − y | ≤ h } k η l ( x ) − η l ( y ) k p op | x − y | p d x d y (cid:46) h − d ZZ B Tx M (0 ,h ) {| Ψ x ( u ) − Ψ x ( v ) | ≤ h } (cid:13)(cid:13)(cid:13) d l ( η ◦ Ψ x )( u ) − d l ( η ◦ Ψ x )( v ) (cid:13)(cid:13)(cid:13) p op | Ψ x ( u ) − Ψ x ( v ) | p d u d v (cid:46) h − d Z ZZ B Tx M (0 ,h ) {| u − v | ≤ h } (cid:13)(cid:13)(cid:13) d l +1 ( η ◦ Ψ x )( u + λ ( v − u )) (cid:13)(cid:13)(cid:13) p op d u d v d λ (cid:46) h − d Z ZZ B Tx M (0 , h ) {| w − u | ≤ λh } (cid:13)(cid:13)(cid:13) d l +1 ( η ◦ Ψ x )( w ) (cid:13)(cid:13)(cid:13) p op d u d wλ − d d λ (cid:46) Z Z B Tx M (0 , h ) (cid:13)(cid:13)(cid:13) d l +1 ( η ◦ Ψ x )( w ) (cid:13)(cid:13)(cid:13) p op d w (cid:46) Z B M ( x ,h ) k η l +1 ( x ) k p op d x, where at the second to last line, we used that w = u + λ ( v − u ) is of norm smaller than 2 h if | u | ≤ h and | v − u | ≤ h ≤ h , and, at the last line, we used that J Ψ x ( w ) ≥ / | w | smallenough. D Proof of Lemma 4.7 Lemma 4.7 is heavily based on the following classical control on the gradient of the Greenfunction. 34 emma D.1. 
Let $x, y \in M$. Then
$$|\nabla_x G(x,y)| \lesssim d_g(x,y)^{1-d} \le |x-y|^{1-d}. \quad (D.1)$$

Proof. For $d \ge 2$, a proof of Lemma D.1 is found in [Aub82, Theorem 4.13]. See also [H+] for $d \ge 3$. Constants in their proofs depend on $d$, bounds on the curvature of $M$, $|\mathrm{vol}_M|$ and the geodesic diameter of $M$. As those three last quantities can be further bounded by constants depending on $\tau_{\min}$, $f_{\min}$ and $d$, see Lemma B.1 and [NSW08, Proposition 6.1], this concludes the proof. For $d = 1$, $M$ is isometric to a circle, for which a closed formula for $G$ exists [Bur94], and which satisfies $|\nabla_x G(x,y)| \lesssim 1$.

Proof of Lemma 4.7. Recall that $|\rho_h(x)| \ge 1/2$ for $x \in M$. Therefore, Lemma D.1 yields
$$\Big|\nabla G\Big(K_h * \Big(\frac{\delta_x}{\rho_h}\Big)\Big)(z)\Big| = \Big|\int_M \nabla_z G(z,y)\,\frac{K_h(x-y)}{\rho_h(x)}\,\mathrm{d}y\Big| \lesssim \int_{B_M(x,h)} \|K\|_\infty\, h^{-d}\, |z-y|^{1-d}\,\mathrm{d}y.$$
If $d = 1$, this quantity is smaller than a constant as $\mathrm{vol}_M(B_M(x,h)) \lesssim h^d$ by Lemma A.1(iii). We then obtain directly the result in this case by integrating this inequality against $f(x)\,\mathrm{d}x$. If $d \ge 2$, we use the following argument.
• If $|x-z| \ge 2h$ and $y \in B_M(x,h)$, then $|z-y| \ge |x-z| - h \ge |x-z|/2$. Therefore, by Lemma A.1(iii),
$$\int_{B_M(x,h)} \|K\|_\infty h^{-d}|z-y|^{1-d}\,\mathrm{d}y \le 2^{d-1}\|K\|_\infty h^{-d}|x-z|^{1-d}\,\mathrm{vol}_M(B_M(x,h)) \lesssim |x-z|^{1-d}.$$
• If $|x-z| \le 2h$, then
$$\int_{B_M(x,h)} \|K\|_\infty h^{-d}|z-y|^{1-d}\,\mathrm{d}y \le \int_{B_M(z,3h)} \|K\|_\infty h^{-d}|z-y|^{1-d}\,\mathrm{d}y \le \int_{B_{T_zM}(0,6h)} \|K\|_\infty h^{-d}\,\frac{J\Psi_z(u)}{|z-\Psi_z(u)|^{d-1}}\,\mathrm{d}u \lesssim h^{-d}\int_{B_{T_zM}(0,6h)} \frac{\mathrm{d}u}{|u|^{d-1}} \lesssim h^{1-d},$$
where at the last line we used that $|z - \Psi_z(u)| \ge |u|$ and that $J\Psi_z(u) \lesssim 1$. Hence,
$$\mathbb{E}\big[|\nabla(G(K_h*\delta_X))(z)|^p\big] = \int_M f(x)\,|\nabla(G(K_h*\delta_x))(z)|^p\,\mathrm{d}x \le f_{\max}\Big(\int_{B_M(z,2h)} |\nabla(G(K_h*\delta_x))(z)|^p\,\mathrm{d}x + \int_{M\setminus B_M(z,2h)} |\nabla(G(K_h*\delta_x))(z)|^p\,\mathrm{d}x\Big)$$
$$\lesssim \int_{B_M(z,2h)} h^{(1-d)p}\,\mathrm{d}x + \int_{M\setminus B_M(z,2h)} |z-x|^{(1-d)p}\,\mathrm{d}x \lesssim h^{(1-d)p+d} + \int_{M\setminus B_M(z,2h)} |z-x|^{(1-d)p}\,\mathrm{d}x.$$
The latter integral is bounded by
$$\int_{2h \le |x-z| \le r_0} |z-x|^{(1-d)p}\,\mathrm{d}x + \int_{|x-z| \ge r_0} |z-x|^{(1-d)p}\,\mathrm{d}x \le \int_{2h \le |\Psi_z(u)-z| \le r_0} |z-\Psi_z(u)|^{(1-d)p}\, J\Psi_z(u)\,\mathrm{d}u + |\mathrm{vol}_M|\, r_0^{(1-d)p} \lesssim \int_{h/2 \le |u| \le r_0} |u|^{(1-d)p}\,\mathrm{d}u + 1 \lesssim h^{(1-d)p+d} \quad \text{if } (1-d)p+d < 0,$$
where at the last line we use that $|u| \le |z - \Psi_z(u)| \le 2|u|$. If $d > 2$, or if $d = 2$ and $p > 2$, the condition $(1-d)p+d < 0$ is satisfied. If $d = 2$ and $p = 2$, then $\int_{h/2 \le |u| \le r_0} |u|^{(1-d)p}\,\mathrm{d}u$ is of order $-\log h$, concluding the proof.

E Proof of Theorem 3.1(i)

Let $f$ be the density of $\mu$ and $\tilde f = f/\rho_h$. By Lemma 4.2, $f_{\min}(1 - c h^{k-1}) \le \tilde f \le f_{\max}(1 + c h^{k-1})$ for $h$ small enough. We have
$$K_h * \tilde f(x) = \int_M K_h(x-y)\,\tilde f(y)\,\mathrm{d}y = \int_{B_{T_xM}(0,2h)} K_h(x - \Psi_x(v))\,\tilde f\circ\Psi_x(v)\, J\Psi_x(v)\,\mathrm{d}v$$
$$\ge \int_{B_{T_xM}(0,2h)} K_h(v)\,\tilde f\circ\Psi_x(v)\, J\Psi_x(v)\,\mathrm{d}v - \int_{B_{T_xM}(0,2h)} |K_h(x - \Psi_x(v)) - K_h(v)|\,\tilde f\circ\Psi_x(v)\, J\Psi_x(v)\,\mathrm{d}v. \quad (E.1)$$
By Lemma A.1(v), the quantity $|K_h(x - \Psi_x(v)) - K_h(v)|$ is bounded by $\|K\|_{\mathcal{C}^1(\mathbb{R}^d)}\, h^{-d-1}\, |x - v - \Psi_x(v)| \lesssim |v|^2 h^{-d-1}$, so that the second term in (E.1) is bounded by $C f_{\max} \int_{B_{T_xM}(0,2h)} |v|^2 h^{-d-1}\,\mathrm{d}v \lesssim h$. Also, using that $|J\Psi_x(v) - 1| \le c|v|$ by Lemma A.1, the first term is larger than
$$f_{\min}(1 - c h^{k-1})(1 - c h)\int_{\mathbb{R}^d} K_+(v)\,\mathrm{d}v - f_{\max}(1 + c h)(1 + c h^{k-1})\int_{\mathbb{R}^d} K_-(v)\,\mathrm{d}v$$
$$= f_{\min}(1 - c_1 h)\Big(1 + \int_{\mathbb{R}^d} K_-(v)\,\mathrm{d}v\Big) - f_{\max}(1 + c_1 h)\int_{\mathbb{R}^d} K_-(v)\,\mathrm{d}v = f_{\min}(1 - c_1 h) - \big(f_{\max}(1 + c_1 h) - f_{\min}(1 - c_1 h)\big)\int_{\mathbb{R}^d} K_-(v)\,\mathrm{d}v$$
$$\ge f_{\min}(1 - c_1 h) - \big(f_{\max}(1 + c_1 h) - f_{\min}(1 - c_1 h)\big)\beta \ge f_{\min}/2,$$
if $\beta < f_{\min}/(4(f_{\max} - f_{\min}))$ and $h$ is small enough.
Likewise, we show that K h ∗ ˜ f ( x ) ≤ f max / | K h ∗ ˜ f ( x ) − K h ∗ ( µ n /ρ h )( x ) | is small enough for all x ∈ M with high36robability. Note that K h ∗ ˜ f − K h ∗ ( µ n /ρ h ) is L -Lipschitz with L (cid:46) h − d − . Let t = f min / M by N balls B M ( x j , t/ (2 L )). By standard packing arguments, sucha covering exists with N (cid:46) ( L/t ) d . If | K h ∗ ˜ f ( x j ) − K h ∗ µ n ( x j ) | ≤ t/ j = 1 , . . . , N ,then k K h ∗ ˜ f − K h ∗ µ n k L ∞ ( M ) ≤ t/ Lt/ (2 L ) ≤ t . Hence, using Bernstein inequality [GN15,Theorem 3.1.7], as | K h ( x j − Y i ) | ≤ k K k C ( R D ) h − d and Var( K h ( x j − Y i )) ≤ k K k C ( R D ) h − d , weobtain P ( k K h ∗ ˜ f − K h ∗ µ n k L ∞ ( M ) ≥ t ) ≤ P ( ∃ j, | K h ∗ ˜ f ( x j ) − K h ∗ µ n ( x j ) | ≥ t/ (cid:46) ( L/t ) d P ( | K h ∗ ˜ f ( x j ) − K h ∗ µ n ( x j ) | ≥ t/ (cid:46) h − d ( d +1) exp( − Cnh d ) . Choosing nh d = C log n for C large enough yields the conclusion. F Proofs of Section 4.4 We first prove Lemma 4.8. Proof of (a). The application Ψ Y j ◦ π Y j : B ˆ T j (0 , ε ) → M is a diffeomorphism on B ˆ T j (0 , ε ),as the composition of the diffeomorphisms Ψ Y j and ( π Y j ) | ˆ T j (recall that ∠ ( ˆ T j , T Y j M ) (cid:46) ε m − + γε − (cid:46) B M ( Y j , ε ) ⊂ Ψ Y j ( B T Yj M (0 , ε )) ⊂ (Ψ Y j ◦ π Y j )( B ˆ T j (0 , ε )) . This proves the first part of Lemma 4.8(a). Let S j : B M ( Y j , ε ) → B ˆ T j (0 , ε ) be the inverse ofΨ Y j ◦ π Y j . By Lemma A.2(ii), ˆΨ j is injective on ˆ T j , while, for v ∈ ˆ T j with | v | ≤ ε , (cid:13)(cid:13)(cid:13) id − d ˆΨ j ( v ) (cid:13)(cid:13)(cid:13) op ≤ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) m − X a =2 a ˆ V a,j [ · , v ⊗ ( a − ] (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) (cid:46) ‘ε ≤ / ‘ (cid:46) ε − is small enough. Hence, ˆΨ j : B ˆ T j (0 , ε ) → ˆΨ j ( ˆ T j ) is a diffeomorphism on its image,and ˆΨ j ◦ S j is a diffeomorphism as a composition of diffeomorphisms. 
Note that the inverse ofˆΨ j is given by ˆ π j ( · − X j ), so that B ˆΨ j ( ˆ T j ) ( X j , ε ) ⊂ ˆΨ j ( B ˆ T j (0 , ε )). Furthermore, by Lemma A.1,(Ψ Y j ◦ π Y j )( B ˆ T j (0 , ε )) ⊂ Ψ Y j ( B T Yj (0 , ε )) ⊂ B M ( Y j , ε/ , so that ( ˆΨ j ◦ S j )( B M ( Y j , ε )) contains B ˆΨ j ( ˆ T j ) ( X j , ε ). Furthermore, these inclusions of balls alsohold for any ε ≤ ε , proving that | ˆΨ j ◦ S j ( z ) − X j | ≥ (7 / | z − Y j | for any z ∈ B M ( Y j , ε ). Proof of (b). The formula for the density ˜ χ j follows from a change of variables.37 roof of (c). The inequality (4.18) follows from Proposition 3.4. We now prove that, for z ∈ B M ( Y j , ε ), | π Y i ( z − ˆΨ j ◦ S j ( z )) | (cid:46) ε ( ε m + γ ) . (F.2)Let u ∈ ˆ T j be such that z = Ψ Y j ◦ π Y j ( u ) and y = ˆΨ j ( u ). Recall that X j ∈ T Y j M ⊥ byassumption, so that π Y j ( X j − Y j ) = 0. Also, by Lemma A.1(v), we have Ψ Y j ( π Y j ( u )) = Y j + π Y j ( u ) + N Y j ( π Y j ( u )) with N Y j ( π Y j ( u )) ∈ T Y j M ⊥ , while by Lemma A.2(ii), we have ˆΨ j ( u ) = X j + u + ˆ N j ( u ) with ˆ N j ( u ) ∈ ˆ T ⊥ j . Hence, | π Y j ( z − y ) | = | π Y j ( Y j + π Y j ( u ) + N Y j ( π Y j ( u )) − ( X j + u + ˆ N j ( u ))) | = | π Y j ( N Y j ( π Y j ( u )) − ˆ N j ( u )) |≤ ∠ ( T Y j M, ˆ T j ) | N Y j ( π Y j ( u )) − ˆ N j ( u ) | + | ˆ π j ( N Y j ( π Y j ( u )) − ˆ N j ( u )) | (cid:46) ( ε m − + γε − )( ε m + γ ) + | ˆ π j ( π ⊥ Y j ( N Y j ( π Y j ( u )))) | (cid:46) ( ε m − + γε − )( ε m + γ ) + ∠ ( T Y j M, ˆ T j ) | N Y j ( π Y j ( u )) | (cid:46) ( ε m − + γε − )( ε m + γ + ε ) (cid:46) ( ε m − + γε − )( ε + γ ) , where we used Proposition 3.4 to bound ∠ ( T Y j M, ˆ T j ), Lemma A.2 to bound | N Y j ( π Y j ( u )) − ˆ N j ( u ) | and Lemma A.1 to bound | N Y j ( π Y j ( u )) | . Recalling that γ (cid:46) ε by assumption, we obtain (F.2).To prove inequality (4.19), we first bound | χ j ( ˆΨ j ◦ S j ( z )) − χ j ( z ) | and then bound | J ( ˆΨ j ◦ S j )( z ) − | . 
The first bound is based on the following elementary lemma. Lemma F.1. Let θ : R D → R be a smooth radial function. Then, | θ ( x ) − θ ( y ) | ≤ k θ k C R D ) || x | −| y | | .Proof. As dθ (0) = 0, one can write θ ( x ) = ˜ θ ( | x | ) for some function ˜ θ which is Lipschitzcontinuous with Lipschitz constant k d θ k C R D ) . This implies the conclusion.Recall from the proof of Lemma 3.5 that we have χ j ( z ) = ζ j ( z ) / P Ji =1 ζ i ( z ) where ζ i = θ (cid:16) z − X i ε (cid:17) for some smooth radial function θ , and that furthermore, there is at most c d non-zeroterms in the sum in the denominator, which is always larger than 1. Hence, if we control forevery i = 1 , . . . , J the difference || z − X i | − | ˆΨ j ◦ S j ( z ) − X i | | , then we obtain a control on | χ j ( z ) − χ j ( ˆΨ j ◦ S j ( z )) | . We have by (4.18) and (F.2), || ˆΨ j ◦ S j ( z ) − X i | − | z − X i | | = || ˆΨ j ◦ S j ( z ) − z | + 2( ˆΨ j ◦ S j ( z ) − z ) · ( z − X i ) | (cid:46) ( ε m + γ ) + | ( ˆΨ j ◦ S j ( z ) − z ) · ( z − Y i ) | + | ( ˆΨ j ◦ S j ( z ) − z ) · ( X i − Y i ) | (cid:46) ( ε m + γ ) + | π Y j ( ˆΨ j ◦ S j ( z ) − z ) · π Y j ( z − Y i ) | + | π ⊥ Y j ( ˆΨ j ◦ S j ( z ) − z ) · π ⊥ Y j ( z − Y i ) | + ( ε m + γ ) γ (cid:46) ( ε m + γ ) + ε ( ε m + γ ) | z − Y i | + ( ε k + γ ) | π ⊥ Y j ( z − Y i ) | + ( ε m + γ ) γ. 38y Lemma A.1(i), | π ⊥ Y j ( z − Y i ) | ≤ | ˜ π ⊥ Y j ( z ) | + | ˜ π ⊥ Y j ( Y i ) | (cid:46) ε + | Y i − Y j | and γ, ε m (cid:46) ε . Hence,we obtain that || ˆΨ j ◦ S j ( z ) − X i | − | z − X i | | (cid:46) ( ε m + γ )( ε + | Y i − Y j | ) . (F.3)Therefore, (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) θ (cid:18) z − X i ε (cid:19) − θ ˆΨ j ◦ S j ( z ) − X i ε !(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:46) ( ε m + γ )( ε + | Y i − Y j | ) ε = ( ε m + γ ) | Y i − Y j | ε ! . (F.4)Note also that if | Y i − Y j | ≥ ε , then | z − X i | ≥ | X i − X j |−| z − X j | ≥ ε − ε − γ ≥ ε , while by thesame argument | ˆΨ j ◦ S j ( z ) − X i | ≥ ε . 
Hence, both terms in the left-hand side of (F.4) are nullin that case. Thus, we may assume that | Y i − Y j | ≤ ε , so that (cid:12)(cid:12)(cid:12)(cid:12) θ (cid:16) z − X i ε (cid:17) − θ (cid:18) ˆΨ j ◦ S j ( z ) − X i ε (cid:19)(cid:12)(cid:12)(cid:12)(cid:12) (cid:46) ε m + γ . From the definition of χ j ( z ), and as the function t /t is Lipschitz on [1 , ∞ [, weobtain that | χ j ( z ) − χ j ( ˆΨ j ◦ S j ( z )) | (cid:46) ε m + γ .We now prove a bound on | J ( ˆΨ j ◦ S j )( z ) − | . One has, for u = S j ( z ) ∈ ˆ T j , | J ( ˆΨ j ◦ S j )( z ) − | = | J ˆΨ j ( u ) − J (Ψ Y j ◦ π Y j )( u ) | J (Ψ Y j ◦ π Y j )( u ) . By Lemma A.1(v) and Lemma A.2(ii), (cid:13)(cid:13)(cid:13) id ˆ T j − d (Ψ Y j ◦ π Y j )( u ) (cid:13)(cid:13)(cid:13) op (cid:46) | u | and (cid:13)(cid:13)(cid:13) id ˆ T j − d ˆΨ j ( u ) (cid:13)(cid:13)(cid:13) op (cid:46) | u | . As a consequence, both Jacobians are larger than, say 1 / u small enough, and, asthe function A ∈ R d × d p det( A ) is c d -Lipschitz continuous on the set of matrices withdet( A ) ≥ / k A k op ≤ 2, we have | J ( ˆΨ j ◦ S j )( z ) − | ≤ c d (cid:13)(cid:13)(cid:13) d ˆΨ j ( u ) ∗ d ˆΨ j ( u ) − d (Ψ Y j ◦ π Y j )( u ) ∗ d (Ψ Y j ◦ π Y j )( u ) (cid:13)(cid:13)(cid:13) op . (F.5)Recall that ˆΨ j ( u ) = X j + u + ˆ N j ( u ) and Ψ Y j ◦ π Y j ( u ) = Y j + π Y j ( u ) + N Y j ◦ π Y j ( u ). We maywrite d ˆΨ j ( u ) ∗ d ˆΨ j ( u ) = id ˆ T j + ( d ˆ N j ( u )) ∗ d ˆ N j ( u ) and d (Ψ Y j ◦ π Y j )( u ) ∗ d (Ψ Y j ◦ π Y j )( u ) = ˆ π j π Y j ˆ π j + ( d ( N Y j ◦ π Y j )( u )) ∗ d ( N Y j ◦ π Y j )( u ) . One has (cid:13)(cid:13)(cid:13) id ˆ T j − ˆ π j π Y j ˆ π j (cid:13)(cid:13)(cid:13) op = (cid:13)(cid:13)(cid:13) ˆ π j π ⊥ Y j π ⊥ Y j ˆ π j (cid:13)(cid:13)(cid:13) op ≤ ∠ ( T Y j M, ˆ T j ) (cid:46) ( ε m − + γε − ) (cid:46) ε m + γ (recall that γ (cid:46) ε ). 
Furthermore, by Lemma A.2(iv), (cid:13)(cid:13)(cid:13) ( d ˆ N j ( u )) ∗ d ˆ N j ( u ) − ( d ( N Y j ◦ π Y j )( u )) ∗ d ( N Y j ◦ π Y j )( u ) (cid:13)(cid:13)(cid:13) op ≤ (cid:18)(cid:13)(cid:13)(cid:13) d ˆ N j ( u ) (cid:13)(cid:13)(cid:13) op + (cid:13)(cid:13)(cid:13) d ( N Y j ◦ π Y j )( u ) (cid:13)(cid:13)(cid:13) op (cid:19) (cid:13)(cid:13)(cid:13) d ˆ N j ( u ) − d ( N Y j ◦ π Y j )( u ) (cid:13)(cid:13)(cid:13) op ε ( ε m − + γε − ) (cid:46) ε m + γ. Putting together (F.5) with those two inequalities, we obtain that | J ( ˆΨ j ◦ S j )( z ) − | (cid:46) ε m + γ ,concluding the proof of Lemma 4.8.To conclude the section, we state and prove Lemma F.2, which gives an upper bound onthe quantity T appearing in Lemma 4.9 for φ = K h ∗ ( ν n / ˆ ρ h ) and φ = K h ∗ ( µ n /ρ h ). Lemma F.2. The quantity T = max j =1 ...J sup z ∈B ( Y j ,ε ) | φ ( ˆΨ j ◦ S j ( z )) − φ ( z ) | satisfies T (cid:46) ε m + γ with probability larger than − cn − k/d .Proof. For z ∈ B ( Y j , ε ), we have | φ ( ˆΨ j ◦ S j ( z )) − φ ( z ) | ≤ n n X i =1 (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) K h ∗ δ X i ( ˆΨ j ◦ S j ( z ))ˆ ρ h ( X i ) − K h ∗ δ Y i ( z ) ρ h ( Y i ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) . Fix an index i ∈ { , . . . , n } . By Lemma A.1(i), as X i − Y i ∈ T Y i M ⊥ , we have for z ∈ M , || z − Y i | − | z − X i | | = || X i − Y i | − z − Y i ) · ( X i − Y i ) | ≤ γ + | z − Y i | γτ min . This inequality together with (F.3) and Lemma F.1 yield | K h ( X i − ˆΨ j ◦ S j ( z )) − K h ( Y i − z ) |≤ | K h ( X i − ˆΨ j ◦ S j ( z )) − K h ( X i − z ) | + | K h ( X i − z ) − K h ( Y i − z ) | (cid:46) h − d − (cid:16) ( ε m + γ )( ε + | Y i − Y j | ) + γ + γ | z − Y i | (cid:17) . We may assume that | Y i − Y j | ≤ h and | z − Y i | ≤ h , for otherwise both quantities in theleft-hand site of the above equation are equal to zero. 
Hence, as γ (cid:46) ε (cid:46) h by assumption, wehave | K h ( X i − ˆΨ j ◦ S j ( z )) − K h ( Y i − z ) | (cid:46) h − d ( ε m + γ ) { Y i ∈ B M ( z, h ) } . (F.6)Let us now bound | ˆ ρ h ( ˆΨ j ◦ S j ( X i )) − ρ h ( Y i ) | . By the triangle inequality, and using (4.19) and(F.6), we obtain that this quantity is smaller than J X j =1 Z M | ˜ χ j ( z ) K h ( X i − ˆΨ j ◦ S j ( z )) − χ j ( z ) K h ( Y i − z ) | d z (cid:46) J X j =1 Z M (cid:16) { z ∈ B M ( Y j , ε ) } ( ε m + γ ) | K h ( Y i − z ) | + ˜ χ j ( z ) h − d ( ε m + γ ) { z ∈ B M ( Y i , h ) } (cid:17) d z (cid:46) h − d ( ε m + γ ) J X j =1 Z M { z ∈ B M ( Y j , ε ) } { z ∈ B M ( Y i , h ) } d z ε d h − d ( ε m + γ ) J X j =1 {| Y j − Y i | ≤ h } (cid:46) h − d ( ε m + γ ) J X j =1 {| Y j − Y i | ≤ h } vol M ( B M ( Y j , ε/ (cid:46) h − d ( ε m + γ )vol M ( B M ( Y i , h )) (cid:46) ε m + γ, where we use that { X , . . . , X J } is 7 ε/ { Y , . . . , Y J } is ε/ B M ( Y j , ε/ 8) for | Y j − Y i | ≤ h are pairwise distincts, and are all included in B M ( Y i , h + ε/ ⊂ B M ( Y i , h ). We conclude by Lemma A.1(iii). Letting N ( z, h ) be the number of points Y i belonging to B M ( z, h ), we obtain | φ ( ˆΨ j ◦ S j ( z )) − φ ( z ) | (cid:46) n n X i =1 (cid:16) | K h ( Y i − z ) | ( ε m + γ ) + h − d ( ε m + γ ) { Y i ∈ B M ( z, h ) } (cid:17) (cid:46) N ( z, h ) nh d ( ε m + γ ) . If, for every z ∈ M and some λ > N ( z, h ) ≤ λnh d , then we have the conclusion. Let usbound P = P ( ∃ z ∈ M, N ( z, h ) > λnh d ) . If N ( z, h ) > λnh d , then there exists a point Y i with N ( Y i , h ) ≥ N ( z, h ) > λnh d . Hence, P ≤ n P ( N ( Y , h ) > λnh d ). Conditionally on Y , N ( Y , h ) = 1 + U with U a binomialrandom variable of parameters n − µ ( B M ( Y , h )) ≤ f max vol M ( B M ( Y , h )) (cid:46) h d (seeLemma A.1(iii)). In particular, for λ large enough, the probability P is smaller than n − k/d byHoeffding’s inequality. 
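The object controlled throughout Sections E and F is the kernel density estimator $K_h * \mu_n$, built from the ambient Euclidean distance but normalized by $h^{-d}$ with $d$ the intrinsic dimension. A self-contained numerical sketch on the unit circle ($d = 1$ submanifold of $\mathbb{R}^2$), with a uniform sample and an Epanechnikov kernel — both illustrative assumptions of ours, not the paper's exact setting:

```python
import numpy as np

rng = np.random.default_rng(0)
n, h, d = 20000, 0.2, 1  # sample size, bandwidth, intrinsic dimension

theta = rng.uniform(0.0, 2.0 * np.pi, n)
X = np.column_stack([np.cos(theta), np.sin(theta)])  # i.i.d. uniform on S^1

def kde(z, X, h, d):
    """Kernel density estimate (1/n) sum_i h^{-d} K(|z - X_i| / h),
    with the Epanechnikov kernel K(t) = 0.75 (1 - t^2)_+ ."""
    t = np.linalg.norm(X - z, axis=1) / h
    K = np.where(t <= 1.0, 0.75 * (1.0 - t * t), 0.0)
    return float(K.mean()) / h**d

# The uniform density on the circle w.r.t. arc length is 1/(2*pi) ~ 0.159;
# the estimate should be close to it at any point of the circle.
est = kde(np.array([1.0, 0.0]), X, h, d)
```

The normalization by $h^{-d}$ rather than $h^{-D}$ is what makes the estimator consistent on the manifold: with the ambient exponent the estimate would blow up as $h \to 0$.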
G Lower bounds on minimax risks

In this section, we prove the different lower bounds on minimax risks stated in the article. The main tool used will be Assouad's lemma. Fix as in Section 2.4 a statistical model $(\mathcal{Y}, \mathcal{H}, \mathcal{Q}, \iota, \vartheta)$, where $\vartheta$ is a measurable function on $\mathcal{Q}$ taking its values in some semi-metric space $(E, L)$.

Lemma G.1 (Assouad's lemma [Yu97]). Let $m \ge 1$ be an integer and let $\mathcal{Q}_m = \{\xi_\sigma,\ \sigma \in \{-1,1\}^m\} \subset \mathcal{Q}$ be a set of probability measures. Assume that for all $\sigma, \sigma' \in \{-1,1\}^m$,
$$L(\vartheta(\xi_\sigma), \vartheta(\xi_{\sigma'})) \ge |\sigma - \sigma'|\,\delta, \quad (G.1)$$
where $|\sigma - \sigma'| = \sum_{i=1}^m \mathbf{1}\{\sigma(i) \ne \sigma'(i)\}$ is the Hamming distance between $\sigma$ and $\sigma'$. Then,
$$\mathcal{R}_n(\vartheta, \mathcal{Q}, L) \ge \frac{m\delta}{2}\big(1 - \max\big\{TV(\iota_\sharp\xi_\sigma, \iota_\sharp\xi_{\sigma'}),\ |\sigma-\sigma'| = 1\big\}\big)^n. \quad (G.2)$$
The lower bounds on the minimax rates we prove are actually going to hold on the smaller model of uniform distributions on manifolds.

Definition G.2. Let $k \ge 2$ and $\gamma \ge 0$. The set $\mathcal{Q}^k_d(\gamma)$ is the set of probability distributions $\xi$ of random variables $(Y,Z)$, where $Y$ follows the uniform distribution on some manifold $M \in \mathcal{M}^k_d$ with $f_{\max}^{-1} \le |\mathrm{vol}_M| \le f_{\min}^{-1}$, and $Z \in B(0,\gamma)$ is such that $Z \in T_YM^\perp$. The statistical model is completed by letting $(\mathcal{Y}, \mathcal{H})$ be $\mathbb{R}^D \times \mathbb{R}^D$ endowed with its Borel $\sigma$-algebra, $\iota$ be the addition $\mathbb{R}^D \times \mathbb{R}^D \to \mathbb{R}^D$ and $\vartheta(\xi)$ be the first marginal $\mu$ of $\xi$. We write $\mathcal{Q}^k_d$ for $\mathcal{Q}^k_d(0)$.

One can check that $\mathcal{Q}^k_d(\gamma) \subset \mathcal{Q}^{k,s}_d(\gamma)$, with parameter $L_s = f_{\min}^{-1/p} \vee f_{\max}^{-1/p}$. Therefore, a lower bound on the minimax risk over the model $\mathcal{Q}^k_d(\gamma)$ yields a lower bound on the minimax risk over the model $\mathcal{Q}^{k,s}_d(\gamma)$, should the parameter $L_s$ be large enough. We build a subfamily of manifolds indexed by $\sigma \in \{-1,1\}^m$ following [AL19]. By [AL19, Section C.2], there exists a manifold $M_0 \subset \mathbb{R}^{d+1}$ of reach $2\tau_{\min}$ and of volume $C_d\tau_{\min}^d$ which contains $B_{\mathbb{R}^d}(0, \tau_{\min})$. Let $\delta > 0$ and let $x_1, \dots, x_m \in B_{\mathbb{R}^d}(0, \tau_{\min}/2)$ be $m$ points with $|x_i - x_{i'}| \ge \delta$ for $i \ne i'$ and $c_d(\tau_{\min}/\delta)^d \le m \le C_d(\tau_{\min}/\delta)^d$.
Let $0 < \Lambda < \delta$ and let $\phi: \mathbb{R}^{d+1} \to [0,1]$ be a smooth radial function supported on $\mathcal{B}(0, 1/2)$, with $\phi \equiv 1$ on $\mathcal{B}(0, 1/4)$. Let $e$ be the unit vector in the $(d+1)$th direction. We then let, for $\sigma \in \{-1,1\}^m$,
\[
\Phi_\sigma^\Lambda(x) = x + \sum_{i=1}^m \frac{\sigma_i + 1}{2}\,\Lambda\,\phi\Big(\frac{x - x_i}{\delta}\Big)\,e. \tag{G.3}
\]
Let $M_\sigma^\Lambda = \Phi_\sigma^\Lambda(M_0)$ and $\mu_\sigma^\Lambda$ be the uniform measure on $M_\sigma^\Lambda$. If $\Lambda \le c_{k,d,\tau_{\min}}\delta^k$, then $\mu_\sigma^\Lambda \in \mathcal{Q}_d^k$, provided that $L_k$ is large enough [AL19, Lemma C.13]. If $\sigma_i = 1$, the volume of $\Phi_\sigma^\Lambda(\mathcal{B}_{\mathbb{R}^d}(x_i, \delta))$ satisfies, with $\omega_d$ the volume of the $d$-dimensional unit ball,
\begin{align*}
\Big|\mathrm{vol}_{M_\sigma^\Lambda}\big(\Phi_\sigma^\Lambda(\mathcal{B}_{\mathbb{R}^d}(x_i, \delta))\big) - \omega_d\delta^d\Big| &\le \int_{\mathcal{B}_{\mathbb{R}^d}(x_i,\delta)} |J\Phi_\sigma^\Lambda(x) - 1|\,\mathrm{d}x \\
&\le \int_{\mathcal{B}_{\mathbb{R}^d}(x_i,\delta)} \Bigg|\sqrt{1 + \frac{\Lambda^2}{\delta^2}\Big|\nabla\phi\Big(\frac{x - x_i}{\delta}\Big)\Big|^2} - 1\Bigg|\,\mathrm{d}x \le C_d\,\delta^d\Big(\frac{\Lambda}{\delta}\Big)^2.
\end{align*}
Hence, for $\delta$ small enough, we have $\big||\mathrm{vol}_{M_\sigma^\Lambda}| - C_d\tau_{\min}^d\big| \le m\,C_d\,\delta^d(\Lambda/\delta)^2 \le C_d\tau_{\min}^d/3$, as $m \le C_d(\tau_{\min}/\delta)^d$ and $\Lambda \le c_{k,d,\tau_{\min}}\delta^k$. As a consequence, if $|\sigma - \sigma'| = 1$, with for instance $\sigma_i = 1$ and $\sigma'_i = -1$, then
\[
\mathrm{TV}(\mu_\sigma^\Lambda, \mu_{\sigma'}^\Lambda) \le \max\big(\mu_\sigma^\Lambda(\Phi_\sigma^\Lambda(\mathcal{B}_{\mathbb{R}^d}(x_i, \delta))),\ \mu_{\sigma'}^\Lambda(\mathcal{B}_{\mathbb{R}^d}(x_i, \delta))\big) \le C_{d,\tau_{\min}}\delta^d. \tag{G.4}
\]
We may now prove the different minimax lower bounds using Assouad's lemma on the family $\{\mu_\sigma^\Lambda,\ \sigma \in \{-1,1\}^m\}$.

Proof of Theorem 2.13. As $g$ is nondecreasing and convex, by Jensen's inequality, we may assume without loss of generality that $L = \mathrm{TV}$. Let $\Gamma = |(\mu_\sigma^\Lambda - \mu_{\sigma'}^\Lambda)(B_i)|$, where $B_i = \mathcal{B}_{\mathbb{R}^d}(x_i, \delta)$ and $\sigma(i) \neq \sigma'(i)$. Then, $\mathrm{TV}(\mu_\sigma^\Lambda, \mu_{\sigma'}^\Lambda) \ge |\sigma - \sigma'|\,\Gamma$. Furthermore, if for instance $\sigma(i) = 1$, then $\Gamma \ge \mu_{\sigma'}^\Lambda(B_i) = \omega_d\delta^d/|\mathrm{vol}_{M_{\sigma'}^\Lambda}| \ge c_d\delta^d/\tau_{\min}^d$. By Assouad's Lemma,
\[
\mathcal{R}_n(\mu; \mathcal{Q}_d^{k,s}; \mathrm{TV}) \ge \mathcal{R}_n(\mu; \mathcal{Q}_d^k; \mathrm{TV}) \ge \frac{m}{2}\,\frac{c_d\delta^d}{\tau_{\min}^d}\big(1 - C_{d,\tau_{\min}}\delta^d\big)^n \ge C_d\big(1 - C_{d,\tau_{\min}}\delta^d\big)^n.
\]
We obtain the conclusion by letting $\delta$ go to 0.
Lemma G.3. For any $\tau_{\min} > 0$ and $1 \le r \le \infty$, for $f_{\min}$ small enough and $f_{\max}$, $L_k$ large enough, one has
\[
\mathcal{R}_n\Big(\frac{\mathrm{vol}_M}{|\mathrm{vol}_M|}, \mathcal{Q}_d^k(\gamma), W_r\Big) \gtrsim \gamma + n^{-k/d}. \tag{G.5}
\]

Proof. As $W_r \ge W_1$, we may assume that $r = 1$. Let $\sigma, \sigma' \in \{-1,1\}^m$ with $\sigma(i) \neq \sigma'(i)$. Let $p_\sigma(i) = \mathrm{vol}_{M_\sigma^\Lambda}(\mathcal{B}(x_i, \delta))$ and $U_{\sigma,i}^\Lambda = p_\sigma(i)^{-1}(\mathrm{vol}_{M_\sigma^\Lambda})|_{\mathcal{B}(x_i,\delta)}$. By the Kantorovich–Rubinstein duality formula, $W_1(\mu, \nu) = \max \int f\,\mathrm{d}(\mu - \nu)$, where the maximum is taken over all 1-Lipschitz continuous functions $f: \mathbb{R}^D \to \mathbb{R}$. Let $f: x \mapsto x \cdot e$. Assume for instance that $\sigma(i) = -\sigma'(i) = 1$. We have $f(x) = 0$ for $x \in \mathcal{B}_{M_{\sigma'}^\Lambda}(x_i, \delta)$ and $f(x) = \Lambda$ for $x \in \mathcal{B}_{M_\sigma^\Lambda}(x_i, \delta/4)$. As $p_\sigma(i)^{-1} \ge c\,\delta^{-d}$,
\[
W_1(U_{\sigma,i}^\Lambda, U_{\sigma',i}^\Lambda) \ge p_\sigma(i)^{-1}\,\Lambda\,\omega_d(\delta/4)^d \ge c\,\Lambda.
\]
Note also that
\[
|p_\sigma(i) - p_{\sigma'}(i)| \le \Big|\mathrm{vol}_{M_\sigma^\Lambda}\big(\Phi_\sigma^\Lambda(\mathcal{B}_{\mathbb{R}^d}(x_i, \delta))\big) - \omega_d\delta^d\Big| \le C_d\,\delta^d(\Lambda/\delta)^2.
\]
Furthermore, $\big||\mathrm{vol}_{M_\sigma^\Lambda}| - |\mathrm{vol}_{M_{\sigma'}^\Lambda}|\big| \le \sum_{i=1}^m |p_\sigma(i) - p_{\sigma'}(i)| \le |\sigma - \sigma'|\,C_d\,\delta^d(\Lambda/\delta)^2$. Let $f_i$ be a 1-Lipschitz continuous function such that $W_1(U_{\sigma,i}^\Lambda, U_{\sigma',i}^\Lambda) = \int f_i\,\mathrm{d}(U_{\sigma,i}^\Lambda - U_{\sigma',i}^\Lambda)$. One can choose $f_i$ such that $f_i(x_i) = 0$, so that the maximum of $|f_i|$ on $\mathcal{B}(x_i, \delta)$ is at most $\delta$. One can then change the value of $f_i$ outside the ball without changing the value of the integral, so that $f_i$ is supported on $\mathcal{B}(x_i, 2\delta)$ and is 1-Lipschitz continuous. Consider the function $f$ obtained by gluing together the different functions $f_i$.
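The Kantorovich–Rubinstein duality invoked in the proof can be checked numerically in dimension one, where $W_1$ between two empirical measures of the same size is the mean distance between sorted samples, and the dual value equals the $L^1$ distance between the two cumulative distribution functions. A minimal sketch on simulated Gaussian samples (illustrative only, not part of the argument):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 2000)
y = rng.normal(0.3, 1.0, 2000)

# primal side: the optimal coupling of two n-point empirical measures is monotone
w1_primal = np.mean(np.abs(np.sort(x) - np.sort(y)))

# dual side (Kantorovich-Rubinstein): sup over 1-Lipschitz f of int f d(mu - nu);
# in dimension one this sup equals the integral of |F_mu - F_nu| over the line
grid = np.linspace(-6.0, 6.0, 200_001)
dx = grid[1] - grid[0]
F_x = np.searchsorted(np.sort(x), grid, side="right") / x.size
F_y = np.searchsorted(np.sort(y), grid, side="right") / y.size
w1_dual = np.sum(np.abs(F_x - F_y)) * dx
```

Both quantities are close to the population value $W_1(\mathcal{N}(0,1), \mathcal{N}(0.3,1)) = 0.3$, up to sampling and discretisation error.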
The function $f$ is 1-Lipschitz continuous, so that
\begin{align*}
W_1\big(\mu_\sigma^\Lambda, \mu_{\sigma'}^\Lambda\big) &\ge \sum_{i=1}^m \Big(\frac{p_\sigma(i)}{|\mathrm{vol}_{M_\sigma^\Lambda}|}\,U_{\sigma,i}^\Lambda - \frac{p_{\sigma'}(i)}{|\mathrm{vol}_{M_{\sigma'}^\Lambda}|}\,U_{\sigma',i}^\Lambda\Big)(f) \\
&\ge \sum_{i=1}^m \Bigg(\frac{p_\sigma(i)}{|\mathrm{vol}_{M_\sigma^\Lambda}|}\big(U_{\sigma,i}^\Lambda - U_{\sigma',i}^\Lambda\big)(f) - \frac{|p_\sigma(i) - p_{\sigma'}(i)|\,|U_{\sigma',i}^\Lambda(f)|}{|\mathrm{vol}_{M_\sigma^\Lambda}|} - p_{\sigma'}(i)\,|U_{\sigma',i}^\Lambda(f)|\,\Big|\frac{1}{|\mathrm{vol}_{M_\sigma^\Lambda}|} - \frac{1}{|\mathrm{vol}_{M_{\sigma'}^\Lambda}|}\Big|\Bigg) \\
&\ge \sum_{i=1}^m \frac{p_\sigma(i)}{|\mathrm{vol}_{M_\sigma^\Lambda}|}\,W_1(U_{\sigma,i}^\Lambda, U_{\sigma',i}^\Lambda) - \sum_{i=1}^m c\,|p_\sigma(i) - p_{\sigma'}(i)|\,\delta\,\mathbf{1}\{\sigma(i) \neq \sigma'(i)\} - c\,\delta\,|\sigma - \sigma'|\,\delta^d(\Lambda/\delta)^2 \\
&\ge \sum_{i=1}^m \mathbf{1}\{\sigma(i) \neq \sigma'(i)\}\big(c_1\,\delta^d\Lambda - c_2\,\delta^{d+1}(\Lambda/\delta)^2\big) - c_3\,\delta\,|\sigma - \sigma'|\,\delta^d(\Lambda/\delta)^2 \\
&\ge c\,\delta^d\Lambda\,|\sigma - \sigma'|,
\end{align*}
where the last inequality holds as $\Lambda/\delta \le c_{k,d,\tau_{\min}}\delta^{k-1}$ is small. Taking $\Lambda = c_{k,d,\tau_{\min},L_k}\delta^k$ and $\delta = n^{-1/d}$, we have, by Assouad's Lemma,
\[
\mathcal{R}_n\Big(\frac{\mathrm{vol}_M}{|\mathrm{vol}_M|}, \mathcal{Q}_d^k(\gamma), W_r\Big) \gtrsim m\,\delta^d\,\Lambda\,\big(1 - C_{d,\tau_{\min}}\delta^d\big)^n \gtrsim n^{-k/d}.
\]
Consider now the case $\gamma > 0$. Let $M_0$ be the $d$-dimensional sphere of radius $\tau_{\min}$ and $M_1$ be the $d$-dimensional sphere of radius $\tau_{\min} + \gamma$. Let $Y$ be uniform on $M_0$, and let $\xi_0$ be the law of $(Y, 0)$ and $\xi_1$ be the law of $((1 + \gamma/\tau_{\min})Y, -(\gamma/\tau_{\min})Y)$. Then, $\iota\xi_0 = \iota\xi_1$, whereas $W_1\big(\frac{\mathrm{vol}_{M_0}}{|\mathrm{vol}_{M_0}|}, \frac{\mathrm{vol}_{M_1}}{|\mathrm{vol}_{M_1}|}\big) \ge \gamma$. We conclude by Le Cam's lemma [Yu97] that $\mathcal{R}_n\big(\frac{\mathrm{vol}_M}{|\mathrm{vol}_M|}, \mathcal{Q}_d^k(\gamma), W_r\big) \gtrsim \gamma$.

Proof of Theorem 3.1(iv). Let $a_n = n^{-(s+1)/(2s+d)}$ if $d \ge 3$ and $a_n = n^{-1/2}$ if $d \le 2$. As $W_p \ge W_1$, we may assume without loss of generality that $p = 1$, and, up to rescaling, we assume that $\tau_{\min} = \sqrt{d}$. Consider the manifold $M_0 \subset \mathbb{R}^{d+1}$ containing $\mathcal{B}_{\mathbb{R}^d}(0, \sqrt{d})$ of the previous proof. In particular, $M_0$ contains the cube $[-1,1]^d$. We adapt the proof of Theorem 3 in [WB19b], where the authors consider a family of functions $f_\sigma: [-1,1]^d \to \mathbb{R}$ indexed by $\sigma \in \{-1,1\}^m$, with $f_\sigma = 1 + n^{-1/2}\sum_{j=1}^m \sigma_j\psi_j$, where $(\psi_j)_{j=1,\dots,m}$ are elements of a wavelet basis of $[-1,1]^d$ (see [WB19b, Appendix E] for details on the construction of the wavelet basis).
If $m \lesssim n^{d/(2s+d)}$, then $t_0 \le f_\sigma \le t_1$ for some positive constants $t_0 < 1 < t_1$, and $\|f_\sigma\|_{B_{p,q}^s([-1,1]^d)} \lesssim 1$. Define a function $g_\sigma$ by $g_\sigma(x) = f_\sigma(x)$ if $x \in [-1,1]^d$ and $g_\sigma(x) = 1$ otherwise. The function $g_\sigma$ satisfies $t_0 \le g_\sigma \le t_1$, as well as $\|g_\sigma\|_{B_{p,q}^s(M_0)} \lesssim \|f_\sigma\|_{B_{p,q}^s} + |\mathrm{vol}_{M_0}|^{1/p} \lesssim 1$. Such an inequality is clear for the $\|\cdot\|_{H_p^l(M_0)}$ norm for $l$ an integer, as
\[
\|g_\sigma\|_{H_p^l(M_0)}^p = \|g_\sigma\|_{H_p^l([-1,1]^d)}^p + \|g_\sigma\|_{H_p^l(M_0 \setminus [-1,1]^d)}^p,
\]
while the result follows from interpolation for Besov spaces [Lun18, Corollary 1.1.7]. Also, as $\int_{[-1,1]^d} f_\sigma = 2^d$ (the functions $\psi_j$ having zero mean), we have $\int g_\sigma = |\mathrm{vol}_{M_0}|$, and $g_\sigma/|\mathrm{vol}_{M_0}|$ is larger than $f_{\min} = t_0/|\mathrm{vol}_{M_0}|$ and smaller than $f_{\max} = t_1/|\mathrm{vol}_{M_0}|$. Hence, identifying measures with their densities, the set
\[
\mathcal{Q}_m = \big\{\mu_\sigma = g_\sigma/|\mathrm{vol}_{M_0}|,\ \sigma \in \{-1,1\}^m\big\}
\]
is a subset of $\mathcal{Q}_d^{k,s}$ for $f_{\min}$ small enough and $L_k$, $L_s$, $f_{\max}$ large enough. Furthermore, for $\sigma, \sigma' \in \{-1,1\}^m$, $\mathrm{TV}(\mu_\sigma, \mu_{\sigma'}) = \mathrm{TV}(f_\sigma, f_{\sigma'})/|\mathrm{vol}_{M_0}|$, while $W_1(\mu_\sigma, \mu_{\sigma'}) = W_1(f_\sigma, f_{\sigma'})/|\mathrm{vol}_{M_0}|$ by the Kantorovich–Rubinstein duality formula. Hence, applying Assouad's inequality in the same fashion as in [WB19b, Theorem 3] yields that $\mathcal{R}_n(\mu, \mathcal{Q}_d^{k,s}, W_1) \gtrsim a_n$.

Proof of Theorem 3.7(iv). According to Lemma G.3,
\[
\mathcal{R}_n(\mu, \mathcal{Q}_d^{k,s}(\gamma), W_p) \ge \mathcal{R}_n(\mu, \mathcal{Q}_d^k(\gamma), W_p) \gtrsim \gamma + n^{-k/d},
\]
and according to Theorem 3.1(iv), $\mathcal{R}_n(\mu, \mathcal{Q}_d^{k,s}(\gamma), W_p) \gtrsim a_n$.

H Existence of kernels satisfying conditions A, B(m) and C(β)

The goal of this section is to prove the existence of a kernel $K$ satisfying the conditions A, B($m$) and C($\beta$) stated at the beginning of Section 3. If $K$ is a radial kernel, we have by integration by parts, as $K$ is smooth with compact support,
\[
\int_{\mathbb{R}^d} \partial^{\alpha_1} K(v)\,v^{\alpha_2}\,\mathrm{d}v = C_{\alpha_1,\alpha_2}\int_{\mathbb{R}^d} K(v)\,v^{\alpha_1+\alpha_2}\,\mathrm{d}v = C'_{\alpha_1,\alpha_2}\int_{\mathbb{R}} K(r)\,r^{d+|\alpha_1|+|\alpha_2|-1}\,\mathrm{d}r.
\]
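The conditions of Section H reduce to one-dimensional moment constraints of the form $\int_{\mathbb{R}} K(r)\,r^{d+i-1}\,\mathrm{d}r$. Two of their mechanisms can be checked numerically: the normalisation of the base case, and the fact that an even kernel automatically kills any moment with an odd integrand. The sketch below assumes, for concreteness, an odd dimension $d = 3$ (so that $r^{d-1}$ is even and the normalisation is possible); it is an illustration only.

```python
import numpy as np

d, kappa = 3, 2.0  # assumed odd d, so that r^(d-1) is an even function
r = np.linspace(-1.0, 1.0, 200_001)
dr = r[1] - r[0]

# smooth even bump supported on [-1, 1] (a standard mollifier profile)
inside = np.abs(r) < 1.0
K0 = np.where(inside, np.exp(-1.0 / np.clip(1.0 - r**2, 1e-300, None)), 0.0)

# base case of the induction: normalise K0 so that condition A holds
K = kappa * K0 / np.sum(K0 * r ** (d - 1) * dr)

cond_A = np.sum(K * r ** (d - 1)) * dr   # should equal kappa
moment_1 = np.sum(K * r ** d) * dr       # i = 1: odd integrand, vanishes by parity
```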
Hence, to show the existence of such a kernel, it suffices to find, for every $m \ge 0$, a smooth even function $K: \mathbb{R} \to \mathbb{R}$ supported on $[-1,1]$ satisfying
• Condition A: $\int_{\mathbb{R}} K(r)\,r^{d-1}\,\mathrm{d}r = \kappa$,
• Condition B($m$): $\int_{\mathbb{R}} K(r)\,r^{d+i-1}\,\mathrm{d}r = 0$ for $i = 1, \dots, m$,
• Condition C($\beta$): $\int_{\mathbb{R}} K(r)_-\,r^{d-1}\,\mathrm{d}r \le \beta$, where $K(r)_- = \max(-K(r), 0)$ is the negative part of $K$.
We show by induction on $m$ that for any $\beta > 0$, there exists such a kernel. For $m = 0$, let $K_0$ be any smooth even nonnegative function supported on $[-1,1]$. Letting $K = \kappa K_0/\int_{\mathbb{R}} K_0(r)\,r^{d-1}\,\mathrm{d}r$, we obtain a kernel $K$ satisfying the desired conditions for any $\beta > 0$. Consider now the case $m > 0$, and let $\beta > 0$.
• If $m + d$ is even, then any $K$ satisfying conditions A, B($m-1$) and C($\beta$) will also satisfy B($m$). Indeed, as $K$ is even, we have $\int_{\mathbb{R}} K(r)\,r^{m+d-1}\,\mathrm{d}r = 0$, so that the induction step is proven.
• If $m + d$ is odd, let $K$ be a kernel satisfying conditions A, B($m-1$) and C($\beta/2$).

Lemma H.1. For $i \ge 0$, let $e_i: x \in \mathbb{R} \mapsto x^{i+d-1}$ and fix an integer $m > 0$. For any $a \in \mathbb{R}$, let $F_a$ be the set of smooth functions $f: (1, \infty) \to \mathbb{R}$ with compact support satisfying $\int f e_i = 0$ for $0 \le i < m$ and $\int f e_m = a$. Then,
\[
\inf\Big\{\int |f(r)|\,r^{d-1}\,\mathrm{d}r,\ f \in F_a\Big\} = 0. \tag{H.1}
\]

Assume first the lemma. Let $a = -\tfrac12\int_{\mathbb{R}} K(r)\,r^{m+d-1}\,\mathrm{d}r$ and $f \in F_a$. Then,
\begin{align*}
\int_{\mathbb{R}} \big(K(r) + f(|r|)\big)\,r^{d-1}\,\mathrm{d}r &= \kappa + \int_{\mathbb{R}} f(|r|)\,r^{d-1}\,\mathrm{d}r = \kappa, \\
\int_{\mathbb{R}} \big(K(r) + f(|r|)\big)\,r^{i+d-1}\,\mathrm{d}r &= \int_{\mathbb{R}} f(|r|)\,r^{i+d-1}\,\mathrm{d}r = 0 \quad\text{for } 0 < i < m, \\
\int_{\mathbb{R}} \big(K(r) + f(|r|)\big)\,r^{m+d-1}\,\mathrm{d}r &= \int_{\mathbb{R}} K(r)\,r^{m+d-1}\,\mathrm{d}r + 2\int_1^\infty f(r)\,r^{m+d-1}\,\mathrm{d}r = 0.
\end{align*}
Hence, the kernel $K + f(|\cdot|)$ satisfies the conditions A and B($m$). Also, we have, as $K(r) = 0$ if $|r| \ge 1$,
\[
\int_{\mathbb{R}} \big(K(r) + f(|r|)\big)_-\,r^{d-1}\,\mathrm{d}r = \int_{\mathbb{R}} K(r)_-\,r^{d-1}\,\mathrm{d}r + 2\int_1^\infty f(r)_-\,r^{d-1}\,\mathrm{d}r \le \beta/2 + \int_1^\infty |f(r)|\,r^{d-1}\,\mathrm{d}r,
\]
where we used at the last line that $\int_1^\infty f(r)_-\,r^{d-1}\,\mathrm{d}r = \int_1^\infty f(r)_+\,r^{d-1}\,\mathrm{d}r = \tfrac12\int_1^\infty |f(r)|\,r^{d-1}\,\mathrm{d}r$, a consequence of $\int f e_0 = 0$. Lemma H.1 asserts the existence of $f \in F_a$ with $\int |f(r)|\,r^{d-1}\,\mathrm{d}r \le \beta/2$.
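The quantitative heart of Lemma H.1 is the scaling of the $L^2([r_1, 2r_1])$ distance $\ell$ from $e_m$ to the span of $e_0, \dots, e_{m-1}$, which grows like $r_1^{d+m-1/2}$. This can be checked numerically by Gauss–Legendre quadrature and weighted least squares, here with the hypothetical choice $d = 1$, $m = 2$ (so the exponent is $2.5$); a sketch, not part of the proof:

```python
import numpy as np

def residual_norm(r1, d=1, m=2, n_quad=40):
    """L2([r1, 2 r1]) distance from e_m(r) = r^(m+d-1) to span{e_0, ..., e_(m-1)}."""
    nodes, weights = np.polynomial.legendre.leggauss(n_quad)
    r = r1 + (nodes + 1.0) * r1 / 2.0   # map [-1, 1] onto [r1, 2 r1]
    w = weights * r1 / 2.0              # quadrature weights for that interval
    basis = np.stack([r ** (i + d - 1) for i in range(m)], axis=1)
    target = r ** (m + d - 1)
    sw = np.sqrt(w)
    # weighted least squares = L2 orthogonal projection onto the span
    coef, *_ = np.linalg.lstsq(basis * sw[:, None], target * sw, rcond=None)
    resid = target - basis @ coef
    return np.sqrt(np.sum(w * resid ** 2))

# ell(r1) = C * r1 ** (d + m - 1/2): doubling r1 multiplies ell by 2 ** 2.5
ratio = residual_norm(2.0) / residual_norm(1.0)
```

For $d = 1$ the span is just the polynomials of degree at most $m - 1$, so the residual depends only on the interval length, which makes the power-law ratio exact.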
For such a choice of $f$, the kernel $\tilde K = K + f(|\cdot|)$ also satisfies C($\beta$). Finally, $f$ has compact support, included in $[0, R]$ for some $R > 0$. The rescaled kernel $r \mapsto R^d\tilde K(Rr)$ is supported on $[-1,1]$ and satisfies conditions A, B($m$) and C($\beta$). This concludes the induction step, and the proof of the existence of kernels satisfying conditions A, B($m$) and C($\beta$).

Proof of Lemma H.1. Consider functions $f$ supported on $[r_1, r_2]$ for some $1 < r_1 \le r_2$ to be fixed later. Let $G_{r_1,r_2}$ be the subspace of $L^2([r_1, r_2])$ spanned by the functions $e_i$ for $0 \le i \le m-1$, and let $g_m$ be the projection of $e_m$ on $G_{r_1,r_2}^\perp$, the orthogonal complement of $G_{r_1,r_2}$, with $L^2$ norm $\ell$. The function $f_0 = a\,g_m/\ell^2$ is a polynomial restricted to $[r_1, r_2]$ and satisfies $\int f_0 e_i = 0$ for $0 \le i \le m-1$ and $\int f_0 e_m = (a/\ell^2)\int e_m g_m = a$. Also, we have for any polynomial $P \in G_{r_1,r_2}$, using the substitution $r = r_1 s$,
\[
\|e_m - P\|_{L^2([r_1,r_2])}^2 = \int_{r_1}^{r_2} |r^{m+d-1} - P(r)|^2\,\mathrm{d}r = r_1^{2(d+m)-1}\int_1^{r_2/r_1} \big|s^{m+d-1} - r_1^{-(m+d-1)}P(r_1 s)\big|^2\,\mathrm{d}s.
\]
As $s \mapsto r_1^{-(m+d-1)}P(r_1 s)$ is an element of $G_{1, r_2/r_1}$, letting $r_2 = 2r_1$, we obtain
\[
\ell^2 = \|g_m\|_{L^2([r_1,r_2])}^2 = \min_{P \in G_{r_1,r_2}}\|e_m - P\|_{L^2([r_1,r_2])}^2 = r_1^{2(d+m)-1}\min_{P \in G_{1,2}}\|e_m - P\|_{L^2([1,2])}^2 = C^2\,r_1^{2(d+m)-1},
\]
where $C = C_{m,d} > 0$ is the $L^2([1,2])$ distance between $e_m$ restricted to $[1,2]$ and $G_{1,2}$. The function $f_0$ is not smooth, so that it does not belong to $F_a$. To overcome this issue, we consider a smooth kernel $\rho$ on $\mathbb{R}$ satisfying $\int \rho = 1$ and $\int \rho(r)\,r^i\,\mathrm{d}r = 0$ for $i = 1, \dots, m+d-1$, with support included in $\mathcal{B}(0, r_1/2)$. The map $\rho * f_0$ is supported on $(1, \infty)$ and it is straightforward to check that $\rho * f_0 \in F_a$ for $r_1 > 2$. By Young's inequality, $\|\rho * f_0\|_{L^2(\mathbb{R})} \le \|\rho\|_{L^1(\mathbb{R})}\|f_0\|_{L^2(\mathbb{R})}$, so that, by the Cauchy–Schwarz inequality,
\[
\int |\rho * f_0(r)|\,r^{d-1}\,\mathrm{d}r \le \Big(\int_{r_1/2}^{5r_1/2} r^{2(d-1)}\,\mathrm{d}r\Big)^{1/2}\|\rho * f_0\|_{L^2(\mathbb{R})} \le \big(c_d\,r_1^{2d-1}\big)^{1/2}\,\|\rho\|_{L^1(\mathbb{R})}\,\|f_0\|_{L^2(\mathbb{R})} \le C_{d,m}\,a\,r_1^{-m}.
\]
By letting $r_1$ go to $\infty$, we see that $\inf\{\int |f(r)|\,r^{d-1}\,\mathrm{d}r,\ f \in F_a\} = 0$.

References

[Aam17] Eddie Aamari. Vitesses de convergence en inférence géométrique.
PhD thesis, Paris Saclay, 2017.
[AKC+19] Eddie Aamari, Jisu Kim, Frédéric Chazal, Bertrand Michel, Alessandro Rinaldo, and Larry Wasserman. Estimating the reach of a manifold. Electronic Journal of Statistics, 13(1):1359–1399, 2019.
[AL18] Eddie Aamari and Clément Levrard. Stability and minimax optimality of tangential Delaunay complexes for manifold reconstruction. Discrete & Computational Geometry, 59(4):923–971, 2018.
[AL19] Eddie Aamari and Clément Levrard. Nonasymptotic rates for manifold, tangent space and curvature estimation. The Annals of Statistics, 47(1):177–204, 2019.
[Aub82] Thierry Aubin. Nonlinear Analysis on Manifolds. Monge-Ampère Equations. Grundlehren der mathematischen Wissenschaften. Springer New York, 1982.
[BB00] Jean-David Benamou and Yann Brenier. A computational fluid mechanics solution to the Monge-Kantorovich mass transfer problem. Numerische Mathematik, 84:375–393, 2000.
[BCH18] Gérard Besson, Gilles Courtois, and Sa'ar Hersonsky. Poincaré inequality on complete Riemannian manifolds with Ricci curvature bounded below. Mathematical Research Letters, 25(6):1741–1769, 2018.
[BCS10] Lorenzo Brasco, Guillaume Carlier, and Filippo Santambrogio. Congested traffic dynamics, weak flows and very degenerate elliptic equations. Journal de mathématiques pures et appliquées, 93(6):652–671, 2010.
[BH19] Clément Berenfeld and Marc Hoffmann. Density estimation on an unknown submanifold. arXiv preprint arXiv:1910.08477, 2019.
[BHHS20] Clément Berenfeld, John Harvey, Marc Hoffmann, and Krishnan Shankar. Estimating the reach of a manifold via its convexity defect function. arXiv preprint arXiv:2001.08006, 2020.
[Bre03] Yann Brenier. Extended Monge-Kantorovich theory. In Optimal transportation and applications, pages 91–121. Springer, 2003.
[Bre10] Haim Brezis. Functional Analysis, Sobolev Spaces and Partial Differential Equations. Universitext.
Springer New York, 2010.
[BRS+12] Sivaraman Balakrishnan, Alessandro Rinaldo, Don Sheehy, Aarti Singh, and Larry Wasserman. Minimax rates for homology inference. In Artificial Intelligence and Statistics, pages 64–72, 2012.
[BS17] Tyrus Berry and Timothy Sauer. Density estimation on manifolds with boundary. Computational Statistics & Data Analysis, 107:1–17, 2017.
[Bur94] Heinrich Burkhardt. Sur les fonctions de Green relatives à un domaine d'une dimension. Bulletin de la Société Mathématique de France, 22:71–75, 1894.
[CGK+20] Galatia Cleanthous, Athanasios G. Georgiadis, Gerard Kerkyacharian, Pencho Petrushev, and Dominique Picard. Kernel and wavelet density estimators on manifolds and more general metric spaces. Bernoulli, 26(3):1832–1862, 2020.
[dC92] Manfredo Perdigão do Carmo. Riemannian Geometry. Mathematics (Boston, Mass.). Birkhäuser, 1992.
[Div20] Vincent Divol. Minimax adaptive estimation in manifold inference. arXiv preprint arXiv:2001.04896, 2020.
[Div21] Vincent Divol. A short proof on the rate of convergence of the empirical measure for the Wasserstein distance. arXiv preprint arXiv:2101.08126, 2021.
[DSS13] Steffen Dereich, Michael Scheutzow, and Reik Schottstedt. Constructive quantization: Approximation by empirical measures. In Annales de l'IHP Probabilités et statistiques, volume 49, pages 1183–1203, 2013.
[Dud69] Richard Mansfield Dudley. The speed of mean Glivenko-Cantelli convergence. The Annals of Mathematical Statistics, 40(1):40–50, 1969.
[DZ01] M. C. Delfour and J.-P. Zolésio. Shapes and Geometries: Analysis, Differential Calculus, and Optimization. Advances in Design and Control. Society for Industrial and Applied Mathematics, 2001.
[Fed59] Herbert Federer. Curvature measures. Transactions of the American Mathematical Society, 93(3):418–491, 1959.
[FG15] Nicolas Fournier and Arnaud Guillin. On the rate of convergence in Wasserstein distance of the empirical measure.
Probability Theory and Related Fields, 162(3-4):707–738, 2015.
[GN15] Evarist Giné and Richard Nickl. Mathematical Foundations of Infinite-Dimensional Statistical Models. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 2015.
[GPPVW12] Christopher Genovese, Marco Perone-Pacifico, Isabella Verdinelli, and Larry Wasserman. Minimax manifold estimation. Journal of Machine Learning Research, 13(May):1263–1291, 2012.
[GW03] Joachim Giesen and Uli Wagner. Shape dimension and intrinsic metric from samples of manifolds with high co-dimension. In Proceedings of the nineteenth annual symposium on Computational geometry, pages 329–337, 2003.
[H+96] Tsutomu Hiroshima et al. Construction of the Green function on Riemannian manifold using harmonic coordinates. Journal of Mathematics of Kyoto University, 36(1):1–30, 1996.
[HA05] Matthias Hein and Jean-Yves Audibert. Intrinsic dimensionality estimation of submanifolds in $\mathbb{R}^d$. In Proceedings of the 22nd international conference on Machine learning, pages 289–296. ACM, 2005.
[Hen90] Harrie Hendriks. Nonparametric estimation of a probability density on a Riemannian manifold using Fourier expansions. The Annals of Statistics, pages 832–849, 1990.
[KRW16] Jisu Kim, Alessandro Rinaldo, and Larry Wasserman. Minimax rates for estimating the dimension of a manifold. arXiv preprint arXiv:1605.01011, 2016.
[L+20] Jing Lei et al. Convergence and concentration of empirical measures under Wasserstein distance in unbounded functional spaces. Bernoulli, 26(1):767–798, 2020.
[LJM09] Anna V. Little, Yoon-Mo Jung, and Mauro Maggioni. Multiscale estimation of intrinsic dimensionality of data sets. In , 2009.
[LLW07] Han Liu, John Lafferty, and Larry Wasserman. Sparse nonparametric density estimation in high dimensions using the rodeo. In Artificial Intelligence and Statistics, pages 283–290, 2007.
[Lun18] A. Lunardi. Interpolation Theory.
Publications of the Scuola Normale Superiore. Scuola Normale Superiore, 2018.
[LZZL13] Jicai Liu, Riquan Zhang, Weihua Zhao, and Yazhao Lv. A robust and efficient estimation method for single index models. Journal of Multivariate Analysis, 122:226–238, 2013.
[NSW08] Partha Niyogi, Stephen Smale, and Shmuel Weinberger. Finding the homology of submanifolds with high confidence from random samples. Discrete & Computational Geometry, 39(1-3):419–441, 2008.
[PC19] Gabriel Peyré and Marco Cuturi. Computational optimal transport: With applications to data science. Foundations and Trends® in Machine Learning, 11(5-6):355–607, 2019.
[Pel05] Bruno Pelletier. Kernel density estimation on Riemannian manifolds. Statistics & Probability Letters, 73(3):297–304, 2005.
[Pey18] Rémi Peyre. Comparison between $W_2$ distance and $\dot H^{-1}$ norm, and localization of Wasserstein distance. ESAIM. Control, Optimisation and Calculus of Variations, 24(4), 2018.
[PR84] Jean-Baptiste Poly and Gilles Raby. Fonction distance et singularités. Bulletin des Sciences Mathématiques, 108:187–195, 1984.
[Ros70] Haskell P. Rosenthal. On the subspaces of $L^p$ ($p > 2$) spanned by sequences of independent random variables. Israel Journal of Mathematics, 8(3):273–303, 1970.
[San15] Filippo Santambrogio. Optimal transport for applied mathematicians. Birkäuser, NY, 55:58–63, 2015.
[Sog17] Christopher D. Sogge. Fourier Integrals in Classical Analysis. Cambridge Tracts in Mathematics. Cambridge University Press, 2nd edition, 2017.
[SP18] Shashank Singh and Barnabás Póczos. Minimax distribution estimation in Wasserstein distance. arXiv preprint arXiv:1802.08855, 2018.
[TGHS20] Nicolás García Trillos, Moritz Gerlach, Matthias Hein, and Dejan Slepčev. Error estimates for spectral convergence of the graph Laplacian on random geometric graphs toward the Laplace–Beltrami operator. Foundations of Computational Mathematics, 20(4):827–887, 2020.
[Tib96] Robert Tibshirani. Regression shrinkage and selection via the lasso.
Journal of the Royal Statistical Society: Series B (Methodological), 58(1):267–288, 1996.
[Tri92] H. Triebel. Theory of Function Spaces II. Monographs in Mathematics. Springer Basel, 1992.
[Tsy08] Alexandre B. Tsybakov. Introduction to nonparametric estimation. Springer Science & Business Media, 2008.
[Vil08] Cédric Villani. Optimal transport: old and new, volume 338. Springer Science & Business Media, 2008.
[WB19a] Jonathan Weed and Francis Bach. Sharp asymptotic and finite-sample rates of convergence of empirical measures in Wasserstein distance. Bernoulli, 25(4A):2620–2648, 2019.
[WB19b] Jonathan Weed and Quentin Berthet. Estimation of smooth densities in Wasserstein distance. arXiv preprint arXiv:1902.01778, 2019.
[WW20] Hau-Tieng Wu and Nan Wu. Strong uniform consistency with rates for kernel density estimators with general kernels on manifolds, 2020.
[Yu97] Bin Yu. Assouad, Fano, and Le Cam. In Festschrift for Lucien Le Cam, pages 423–435. Springer, 1997.
[ZJRS16] Hongyi Zhang, Sashank J. Reddi, and Suvrit Sra. Riemannian SVRG: Fast stochastic optimization on Riemannian manifolds.