Eigen-convergence of Gaussian kernelized graph Laplacian by manifold heat interpolation
Xiuyuan Cheng∗   Nan Wu†

Abstract
This work studies the spectral convergence of the graph Laplacian to the Laplace-Beltrami operator when the graph affinity matrix is constructed from N random samples on a d-dimensional manifold embedded in a possibly high dimensional space. By analyzing Dirichlet form convergence and constructing candidate approximate eigenfunctions via convolution with the manifold heat kernel, we prove that, with Gaussian kernel, one can set the kernel bandwidth parameter ε ∼ (log N/N)^{1/(d/2+2)} such that the eigenvalue convergence rate is N^{−1/(d/2+2)} and the eigenvector convergence in 2-norm has rate N^{−1/(d+4)}; when ε ∼ N^{−1/(d/2+3)}, both eigenvalue and eigenvector rates are N^{−1/(d/2+3)}. These rates are up to a log N factor and are proved for finitely many low-lying eigenvalues. The result holds for the un-normalized and random-walk graph Laplacians when data are uniformly sampled on the manifold, as well as for the density-corrected graph Laplacian (where the affinity matrix is normalized by the degree matrix from both sides) with non-uniformly sampled data. As an intermediate result, we prove new point-wise and Dirichlet form convergence rates for the density-corrected graph Laplacian. Numerical results are provided to verify the theory.

Graph Laplacian matrices built from data samples are widely used in data analysis and machine learning. The earlier works include Isomap [2], Laplacian Eigenmap [3], Diffusion Map [10, 30], among others. Apart from being a widely-used unsupervised learning method for clustering analysis and dimension reduction (see, e.g., the review papers [33, 30]), graph Laplacian methods also drew attention via the application in semi-supervised learning [24, 12, 29, 15]. Under the manifold setting, data samples are assumed to lie on low-dimensional manifolds embedded in a possibly high-dimensional ambient space. A fundamental problem is the convergence of the graph Laplacian matrix to the manifold Laplacian operator in the large sample limit.
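As a concrete point of reference for the objects discussed throughout, a minimal construction sketch in NumPy (our own illustration, not code from the paper): given samples X, the kernelized affinity matrix, degree matrix, and the un-normalized and random-walk graph Laplacians are built as follows; the intrinsic dimension d is an input, and the constant prefactors that yield convergence to −∆ are deferred to the normalizations defined later.

```python
import numpy as np

def gaussian_affinity(X, eps, d):
    """W_ij = K_eps(x_i, x_j), with K_eps(x, y) = eps^{-d/2} h(||x - y||^2 / eps)
    and the Gaussian h(xi) = (4*pi)^{-d/2} exp(-xi/4); d = intrinsic dimension."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # pairwise ||x - y||^2
    return (4 * np.pi * eps) ** (-d / 2) * np.exp(-sq / (4 * eps))

def graph_laplacians(W):
    """Un-normalized D - W and random-walk I - D^{-1} W (constant factors omitted)."""
    deg = W.sum(axis=1)
    L_un = np.diag(deg) - W
    L_rw = np.eye(len(W)) - W / deg[:, None]
    return L_un, L_rw
```

Here D − W annihilates the constant vector, and each row of D^{−1}W sums to one; these two facts underlie the zero eigenvalue of both Laplacians below.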
The operator point-wise convergence has been intensively studied and established in a series of works [19, 18, 4, 10, 27], and extended to variant settings, such as different kernel normalizations [23, 36] and general classes of kernels [31, 5, 9]. The eigen-convergence, namely how the empirical eigenvalues and eigenvectors converge to the population eigenvalues and eigenfunctions of the manifold Laplacian, is a more subtle issue and has been studied in [4, 34, 6, 35, 28, 14] (among others) and recently in [32, 7, 11, 8]. The current work proves the eigen-convergence, specifically the consistency of eigenvalues and eigenvectors in 2-norm, for finitely many low-lying eigenvalues of the graph Laplacian constructed using the Gaussian kernel from i.i.d. sampled manifold data. The result covers the un-normalized and random-walk graph Laplacians when the data density is uniform, and the density-corrected graph Laplacian (defined below) with non-uniformly sampled data. For the latter, we also prove new point-wise and Dirichlet form convergence rates as an intermediate result. We overview the main results in Section 1.1 in the context of the literature; they are also summarized in Table 2. The framework of our work follows the variational principle formulation of eigenvalues using the graph and manifold Dirichlet forms.

∗ Department of Mathematics, Duke University. Email: [email protected]
† Department of Mathematics, Duke University. Email: [email protected]
Table 1: List of default notations

M : d-dimensional manifold in R^D
p : data sampling density on M
∆_M : Laplace-Beltrami operator, also written as ∆
µ_k : population eigenvalue of −∆
ψ_k : population eigenfunction of −∆
λ_k : empirical eigenvalue of the graph Laplacian
v_k : empirical eigenvector of the graph Laplacian
∇_M : manifold gradient, also written as ∇
H_t : manifold heat kernel
Q_t : semi-group operator of manifold diffusion, Q_t = e^{t∆}
X : dataset of points used for computing W
N : number of samples in X
ε : kernel bandwidth parameter
K_ε : graph affinity kernel, W_ij = K_ε(x_i, x_j), K_ε(x, y) = ε^{−d/2} h(‖x − y‖²/ε)
h : a function [0, ∞) → R
m_0[h] := ∫_{R^d} h(‖u‖²) du,  m_2[h] := (1/d) ∫_{R^d} ‖u‖² h(‖u‖²) du
W : kernelized graph affinity matrix
D : degree matrix of W, D_ii = Σ_{j=1}^N W_ij
L_un : un-normalized graph Laplacian
L_rw : random-walk graph Laplacian
E_N : graph Dirichlet form
ρ_X : function evaluation operator, ρ_X f = {f(x_i)}_{i=1}^N
W̃ : density-corrected affinity matrix, W̃ = D^{−1} W D^{−1}
D̃ : degree matrix of W̃

Asymptotic notations:
O(·) : f = O(g) means |f| ≤ C|g| in the limit, for some C > 0; O_a(·) declares the dependence of the constant on a
Θ(·) : f = Θ(g) means, for f, g ≥ 0, C₁ g ≤ f ≤ C₂ g in the limit, for some C₁, C₂ > 0
∼ : f ∼ g, same as f = Θ(g)
o(·) : f = o(g) means, for g > 0, |f|/g → 0 in the limit
Ω(·) : f = Ω(g) means, for f, g > 0, f/g → ∞ in the limit
Õ(·) : O(·) multiplied by another factor involving a log, defined each time it is used in the text

When the subscript a is omitted, the constants are absolute ones. f = O(g₁, g₂) means that f = O(|g₁| + |g₂|).

The Dirichlet form-based approach to prove graph Laplacian eigen-convergence was first carried out in [6] under a non-probabilistic setting. [32, 7] extended the approach to the probabilistic setting, where the x_i are i.i.d. samples, using optimal transport techniques.
Our analysis follows the same form-based approach and differs from previous works in the following aspects. Let ε be the (squared) kernel bandwidth parameter corresponding to the diffusion time, N the number of samples, and d the manifold intrinsic dimensionality.

• Leveraging the observation in [10, 27] that the bias error in the point-wise rate of the graph Laplacian can be improved from O(√ε) to O(ε) using a C¹ kernel function, we show that the improved point-wise rate of the Gaussian kernelized graph Laplacian translates into an improved eigen-convergence rate compared with compactly supported kernels (e.g., the indicator function).

• We show that the eigenvalue convergence rate matches the Dirichlet form convergence rate in [9], which is better than the point-wise rate. This leads to faster convergence of the eigenvalues than of the eigenvectors; in particular, the eigenvalue error rate equals the square of the eigenvector rate under the regime ε ∼ N^{−1/(d/2+2)}, both up to a factor of a certain power of log N.

• In obtaining the initial crude eigenvalue lower bound (LB), called Step 1 below, we develop a short proof using the manifold heat kernel to define the "interpolation mapping", which constructs from a vector v a smooth function f on M. The manifold variational form of f, defined via the heat kernel, naturally relates to the graph Dirichlet form of v when the graph affinity matrix is constructed using the Gaussian kernel. The approach of heat kernel interpolation for variational principle eigen-convergence analysis has, to the best knowledge of the authors, not been explored in the literature.

Towards the eigen-convergence, our work also recaps and develops several intermediate results under weaker assumptions on the kernel function (i.e., non-Gaussian), including an improved point-wise convergence rate of the density-corrected graph Laplacian.
The density-corrected graph Laplacian, originally proposed in [10], is an important variant of the kernelized graph Laplacian where the affinity matrix is W̃ = D^{−1} W D^{−1}. In applications, the data distribution p is often not uniform on the manifold, and then the standard graph Laplacian with W recovers the Fokker-Planck operator (weighted Laplacian) with measure p, which involves a drift term depending on ∇_M log p. The density-corrected graph Laplacian, in contrast, recovers the Laplace-Beltrami operator consistently when p satisfies certain regularity conditions, and thus is useful in many applications. In this work, we first prove the point-wise convergence and Dirichlet form convergence of the density-corrected graph Laplacian with W̃, both matching those of the standard graph Laplacian; this can be of independent interest. The eigen-consistency result then extends to such graph Laplacians (with Gaussian kernel function), also achieving the same rate as the standard graph Laplacian with uniform p.

Below, we give an overview of the theoretical results and end the introduction with a further literature review. In the rest of the paper, Section 2 gives the set-up and preliminaries needed in the analysis. Sections 3-5 develop the eigen-convergence of standard graph Laplacians, both the un-normalized and the normalized (random-walk) ones. Section 6 extends to the density-corrected graph Laplacian, and Section 7 gives numerical results. We discuss possible extensions in the last section.

Notations. Default and asymptotic notations like O(·), Ω(·), Θ(·) are listed in Table 1. In this paper, we treat constants determined by h, M, p as absolute ones, including the intrinsic dimension d. We mainly track the number of samples N and the kernel diffusion time parameter ε, and we may emphasize the constant dependence on p or M in certain circumstances, using the subscript notation like O_M(·).
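A small numerical illustration of the density correction described above (our own sketch, with ad-hoc sample size, bandwidth, and tolerances; not the paper's Section 7 experiments): on the unit circle S¹, sample from the non-uniform density p(θ) ∝ 1 + 0.5 sin θ. The random-walk Laplacian built from W̃ = D^{−1}W D^{−1} with the Gaussian kernel at diffusion time ε, so that (1/ε)(I − D̃^{−1}W̃) approximates −∆, should still recover the Laplace-Beltrami spectrum 0, 1, 1, 4, 4, … of the unit circle despite the non-uniform sampling.

```python
import numpy as np

rng = np.random.default_rng(1)
N, eps, d = 1200, 0.02, 1

# Rejection-sample angles from the non-uniform density p(theta) ∝ 1 + 0.5*sin(theta).
theta = np.empty(0)
while theta.size < N:
    cand = rng.uniform(0, 2 * np.pi, 4 * N)
    keep = rng.uniform(size=cand.size) < (1 + 0.5 * np.sin(cand)) / 1.5
    theta = np.concatenate([theta, cand[keep]])[:N]
X = np.stack([np.cos(theta), np.sin(theta)], axis=1)   # points on S^1 in R^2

sq = np.sum((X[:, None] - X[None, :]) ** 2, axis=-1)
W = (4 * np.pi * eps) ** (-d / 2) * np.exp(-sq / (4 * eps))  # Gaussian kernel

deg = W.sum(axis=1)
Wt = W / np.outer(deg, deg)            # density-corrected affinity W~ = D^{-1} W D^{-1}
dt = Wt.sum(axis=1)                    # degree of W~
S = Wt / np.sqrt(np.outer(dt, dt))     # symmetrization of D~^{-1} W~ (same spectrum)
lam = np.sort(1.0 - np.linalg.eigvalsh(S)) / eps
# lam[:5] approximates (0, 1, 1, 4, 4), the low-lying spectrum of -Delta on S^1
```

Without the two-sided normalization, the recovered operator would carry a drift term depending on ∇ log p, shifting the low-lying spectrum away from that of −∆.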
All constant dependences can be tracked in the proofs.

The current paper inherits the probabilistic manifold data setting, namely, the dataset {x_i}_{i=1}^N consists of i.i.d. samples drawn from a distribution on M with density p satisfying the following assumption:

Assumption 1 (Smooth M and p). (A1) M is a d-dimensional compact connected C^∞ manifold (without boundary) isometrically embedded in R^D.
(A2) p ∈ C^∞(M) and uniformly bounded both from below and above, that is, ∃ p_min, p_max > 0 s.t. 0 < p_min ≤ p(x) ≤ p_max < ∞, ∀x ∈ M.

Suppose M is embedded via ι; when there is no danger of confusion, we use the same notation x to denote x ∈ M and ι(x) ∈ R^D. We have the measure space (M, dV): when M is orientable, dV is the Riemannian volume form; otherwise, dV is the measure associated with the local volume form. The smoothness of p and M fulfills many application scenarios, and possible extensions to less regular M or p are postponed. Our analysis first addresses the basic case where p is uniform on M, i.e., p = 1/Vol(M) is a positive constant. For non-uniform p as in (A2), we adopt and analyze the density-corrected graph Laplacian in Section 6. In both cases, the graph Laplacian recovers the Laplace-Beltrami operator ∆_M. Below, we write ∆_M as ∆ and ∇_M as ∇.

Given N data samples, the graph affinity matrix W and the degree matrix D are defined as

W_ij = K_ε(x_i, x_j),  D_ii = Σ_{j=1}^N W_ij.

W is real symmetric, typically W_ij ≥
0, and for the kernelized affinity matrix, W_ij = K_ε(x_i, x_j), where

K_ε(x, y) := ε^{−d/2} h(‖x − y‖²/ε), (1)

for a function h : [0, ∞) → R. The parameter ε > 0 is the kernel bandwidth, and √ε >
0, which corresponds to the scale of the local distance ‖x − y‖ such that h(‖x − y‖²/ε) is of O(1) magnitude. Our results are written with respect to the time parameter ε, which corresponds to the squared local distance length scale.

Table 2: Summary of theoretical results.

- Crude eigenvalue LB: Prop. 4.1 (L_un with W, p uniform), Prop. 4.4 (L_rw with W, p uniform), Prop. 6.6 (L̃_rw with W̃, p non-uniform); h Gaussian; ε^{d/2} = Ω((log N)/N); error bound O(1).
- Eigenvector convergence: Prop. 5.2 (L_un with W); h Gaussian, ε^{d/2} > c_K (log N)/N; error bound: point-wise rate.
- Eigenvalue convergence: Prop. 5.3 (L_un with W); h Gaussian, ε^{d/2} > c_K (log N)/N; error bound: form rate.
- Eigen-convergence: Thm. 5.4 (L_un with W), Thm. 5.5 (L_rw with W), Thm. 6.7 (L̃_rw with W̃); h Gaussian. When ε^{d/2+3} ∼ N^{−1}: λ_k and v_k: Õ(N^{−1/(d/2+3)}). When ε^{d/2+2} = c (log N)/N, c > c_K: λ_k: Õ(N^{−1/(d/2+2)}), v_k: Õ(N^{−1/(d+4)}).
- Point-wise convergence: Thm. 5.1 [27, 9]* (with W), Thm. 6.2 (with W̃); h as in Assump. 2; ε^{d/2} = Ω((log N)/N); error bound: point-wise rate.
- Dirichlet form convergence: Thm. 3.2 [9]* (with W), Thm. 6.3 (with W̃); h as in Assump. 2; ε^{d/2} = Ω((log N)/N); error bound: form rate.

In the last column, the "form rate" is O(ε, sqrt((log N)/(N ε^{d/2}))), and the "point-wise rate" is O(ε, sqrt((log N)/(N ε^{d/2+1}))). Convergence of the first k_max eigenvalues and eigenvectors is concerned, with k_max fixed. "λ_k:" means the error of eigenvalue convergence, "v_k:" means the error of eigenvector convergence (in 2-norm), and Õ(·) indicates the possible involvement of a factor of (log N)^α for some α > 0. In the 2nd (3rd) column, the eigenvector and eigenvalue convergences are proved in Thm. 5.5 (Thm. 6.7) and are not written as separate propositions. *The point-wise convergence and Dirichlet form convergence results of the graph Laplacian with W hold when p satisfies Assump. 1(A2), i.e., when p is not uniform. The form convergence rate may hold when h is not differentiable, e.g., when h = 1_{[0,1]}, c.f. Remark 2.

Our main result of graph Laplacian eigen-convergence considers the kernelized graph affinity computed with

h(ξ) = (4π)^{−d/2} e^{−ξ/4}, ξ ∈ [0, ∞), (2)

and we call such h the Gaussian one. The Gaussian h belongs to a larger family of differentiable functions:

Assumption 2 (Differentiable h). (C1) Regularity. h is continuous on [0, ∞), C¹ on (0, ∞).
(C2) Decay condition. ∃ a, a_k > 0, s.t. |h^{(k)}(ξ)| ≤ a_k e^{−aξ} for all ξ > 0, k = 0, 1.
(C3) Non-negativity. h ≥ 0 on [0, ∞). To exclude the case h ≡ 0, assume ‖h‖_∞ > 0.

Several important intermediate results, which can be of independent interest, only require h to satisfy Assumption 2 or weaker, including
- Point-wise convergence of graph Laplacians, which we call the point-wise rate.
- Convergence of the graph Dirichlet form (2/(m_2 ε N²)) u^T(D − W)u applied to smooth manifold functions, i.e., u = {f(x_i)}_{i=1}^N for f smooth on M, which we call the form rate.
- The eigenvalue upper bound (UB), which matches the form rate.

A summary of results with the needed assumptions is provided in Table 2. The point-wise rate and form rate of the standard graph Laplacian only require differentiability and a decay condition on h, as originally taken in [10], and hold even without Assumption 2(C3) non-negativity. In the literature, the point-wise rate of the random-walk graph Laplacian (∝ I − D^{−1}W) with differentiable and decaying h was shown to be O(ε, sqrt((log N)/(N ε^{d/2+1}))) in [27]. The exposition in [27] was for Gaussian h, but the analysis therein extends directly to general h.
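To make these objects concrete, a minimal experiment (our own; N, ε, and the tolerances are ad hoc): uniform samples on the unit circle (d = 1, p = 1/(2π)) with the Gaussian h of (2). With the normalization L_un = (1/(p ε N))(D − W), i.e., 2/(m_2 p ε N) with m_2 = 2 for Gaussian h, the low-lying eigenvalues of L_un should approach the spectrum 0, 1, 1, 4, 4, … of −∆ on S¹, up to the error rates above.

```python
import numpy as np

rng = np.random.default_rng(7)
N, eps, d = 1500, 0.02, 1
theta = rng.uniform(0, 2 * np.pi, N)           # uniform density p = 1/(2*pi) on S^1
X = np.stack([np.cos(theta), np.sin(theta)], axis=1)

sq = np.sum((X[:, None] - X[None, :]) ** 2, axis=-1)
W = (4 * np.pi * eps) ** (-d / 2) * np.exp(-sq / (4 * eps))  # Gaussian h of Eq. (2)
D = np.diag(W.sum(axis=1))
p = 1 / (2 * np.pi)
L_un = (D - W) / (p * eps * N)                 # = 2/(m2*p*eps*N) * (D - W), m2 = 2

lam = np.sort(np.linalg.eigvalsh(L_un))
# lam[:5] approximates (0, 1, 1, 4, 4), the low-lying spectrum of -Delta on S^1
```

Shrinking ε and growing N jointly along the regimes in Table 2 tightens the agreement; with fixed ε the bias term O(ε) eventually dominates.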
The form rate with differentiable h was shown to be O(ε, sqrt((log N)/(N ε^{d/2}))) in [9] via a V-statistic analysis. [9] also derived the point-wise rate for both the random-walk and the un-normalized graph Laplacian (∝ D − W). The analysis in [9] was mainly developed for kernels with adaptive bandwidth, and higher order regularity of h (C⁴ instead of C¹) was assumed to handle the complication due to the variable kernel bandwidth. For the fixed-bandwidth kernel as in (1), the analysis in [9] can be simplified to proceed under less restrictive conditions on h. We include more details below when quoting these previous results. Our analysis of the density-corrected graph Laplacian assumes W_ij ≥
0, and our main result of eigen-convergence needs h to be Gaussian; thus we include (C3) in Assumption 2 to simplify the exposition.

As shown in Table 2, the eigenvalue UB holds for general differentiable h, while the initial crude eigenvalue LB (to be explained below), and consequently the final eigenvalue and eigenvector convergence rates, need h to be Gaussian. This difference between the eigenvalue UB and LB analysis is due to the subtlety of the variational principle approach in analyzing empirical eigenvalues. To be more specific, by "projecting" the population eigenfunctions to vectors in R^N and using them as "candidate" eigenvectors in the variational form, the form rate directly translates into a rate for the eigenvalue UB (for fixed finitely many low-lying eigenvalues). The eigenvalue LB, however, is more difficult, as has been pointed out in [6]. In [6] and following works taking the variational principle approach, the LB analysis is by "interpolating" the empirical eigenvectors to be functions on M. Unlike the population eigenfunctions, which are known to be smooth, there are fewer properties of the empirical eigenvectors that one can use, and any regularity property of these discrete objects is usually non-trivial to obtain [8].

The interpolation mapping in [6] first assigns a point x_i to a Voronoi cell V_i, assuming that {x_i}_i forms an ε-net of M to begin with (a non-probabilistic setting), and this maps a vector u to a piece-wise constant function P*u on M; next, P*u is convolved with a kernel function which is compactly supported on a small geodesic ball, and this produces "candidate" eigenfunctions, whose manifold differential Dirichlet form is upper bounded by the graph Dirichlet form of u, up to an error, through differential geometry calculations. Under the probabilistic setting of i.i.d.
samples, [32] constructed the mapping P* using a Wasserstein-∞ optimal transport (OT) map, where the ∞-OT distance between the empirical measure (1/N)Σ_i δ_{x_i} and the population measure p dV is bounded by constructing a Voronoi tessellation of M when d ≥
2. This led to an overall eigen-convergence rate of Õ(N^{−1/d}) in [32] when h is compactly supported and satisfies certain regularity conditions and d ≥
2, the Õ indicating a possible factor of a certain power of log N. A typical example is when h is an indicator function h = 1_{[0,1]}, which is called the "ε-graph" in the computer science literature (that ε corresponds to √ε in our notation). The approach was extended to kNN graphs in [7], where the rate of eigenvalue and 2-norm eigenvector convergence was also improved to match the point-wise rate of the ε-graph or kNN graph Laplacians, leading to a rate of Õ(N^{−1/(d+4)}) when ε^{d/2} = Ω((log N)/N). The same rate was shown for ∞-norm consistency of eigenvectors in [8], combined with a Lipschitz regularity analysis of empirical eigenvectors using advanced PDE tools.

In the current work, we take a different approach for the interpolation mapping in the eigenvalue LB analysis, which is based on manifold heat kernels. Our analysis makes use of the fact that at short time and on small local neighborhoods, the heat kernel H_t(x, y) can be approximated by

G_t(x, y) := (4πt)^{−d/2} e^{−d_M(x,y)²/(4t)}, (3)

and consequently by K_t(x, y) when h is Gaussian as in (2). The first approximation H_t ≈ G_t is by classical results on elliptic operators on Riemannian manifolds, c.f. Theorem 2.1. Next, G_t ≈ K_t because K_t replaces the geodesic distance d_M(x, y) with the Euclidean distance ‖x − y‖ in G_t, and the two locally match by d_M(x, y) = ‖x − y‖ + O(‖x − y‖³). These estimates allow us to construct interpolated C^∞(M) functions I_r[v] from a discrete vector v ∈ R^N by convolving with the heat kernel at time r = εδ, where 0 < δ < 1 is a constant determined by the first K = k_max + 1 low-lying population eigenvalues µ_k of −∆. Specifically, δ is inversely proportional to the smallest eigen-gap between the µ_k for k ≤ K (the µ_k assumed to have single multiplicity in the first place, and then the result generalizes to multiplicity greater than one), which is an O(1) constant determined by −∆ and K.
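On the unit circle, the kernel comparisons above can be checked in closed form, since the heat kernel of S¹ is a wrapped Gaussian (method of images). The following is our own illustration of H_t ≈ K_t for small t and small geodesic distance; the particular t and thresholds are arbitrary and not the δ_t of Lemma 2.2.

```python
import numpy as np

def heat_kernel_circle(theta, t, n_images=5):
    """Heat kernel H_t(x, y) on S^1 (circumference 2*pi) at geodesic angle theta:
    a wrapped Gaussian, by the method of images."""
    m = np.arange(-n_images, n_images + 1)
    return np.sum((4 * np.pi * t) ** -0.5 * np.exp(-((theta + 2 * np.pi * m) ** 2) / (4 * t)))

def gaussian_kernel_circle(theta, t):
    """K_t(x, y) of Eqs. (1)-(2), using the Euclidean (chordal) distance
    ||x - y|| = 2 sin(theta/2) between points at geodesic angle theta."""
    chord_sq = (2 * np.sin(theta / 2)) ** 2
    return (4 * np.pi * t) ** -0.5 * np.exp(-chord_sq / (4 * t))
```

For t = 10^{−3} and geodesic distances up to 0.1, the two agree to relative error well below 1%, consistent with the O(‖x − y‖³) matching of chordal and geodesic distances; at geodesic distance π both are exponentially small, matching the sub-Gaussian global decay.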
Applying the variational principle to the operator I − Q_t, where Q_t is the diffusion semi-group operator and Q_t's spectrum is determined by that of −∆, allows us to prove an initial eigenvalue LB smaller than half of the minimum first-K eigen-gap, which is enough for the bootstrap strategy, following [7], to obtain refined eigenvector and eigenvalue consistency rates. In our case, Step 2 matches the eigenvector 2-norm consistency to the point-wise rate, which is standard. In Step 3, leveraging the eigenvector consistency proved in Step 2, we further improve the eigenvalue convergence to match the form rate, and then the refined eigenvalue LB rate matches the eigenvalue UB rate. In the process, the first K empirical eigenvalues are upper bounded by O(1), which follows from the eigenvalue UB proved in the beginning.

As a road map, our eigen-convergence analysis consists of the following four steps:
- Step 0. Eigenvalue UB by the Dirichlet form convergence, up to the form rate.
- Step 1. Initial crude eigenvalue LB, providing eigenvalue error up to the smallest first-K eigen-gap.
- Step 2. 2-norm consistency of eigenvectors, up to the point-wise rate.
- Step 3. Refined eigenvalue consistency, up to the form rate.

Step 1 requires h to be non-negative and currently only covers the Gaussian case. This may be relaxed, since the proof only uses the approximation property of h, namely that K_ε ≈ H_ε. In this work, we restrict to the Gaussian case for simplicity and for the wide use of Gaussian kernels in applications.

As we adopt a Dirichlet form-based analysis, the eigen-convergence result in the current paper is of the same type as in previous works using the variational principle [6, 32, 7]. In particular, the rate concerns the convergence of the first k_max low-lying eigenvalues of the Laplacian, where k_max is a fixed finite integer.
The constants in the big-O notations in the bounds are treated as O(1), and they depend on k_max and these leading eigenvalues and eigenfunctions of the manifold Laplacian. Such results are useful for applications where leading eigenvectors are the primary focus, e.g., spectral clustering and dimension-reduced spectral embedding. An alternative approach is to analyze functional operator consistency [4, 34, 28, 26], which may provide different eigen-consistency bounds, e.g., ∞-norm consistency of eigenvectors using compact embedding of Glivenko-Cantelli function classes [11].

The current work considers noise-less data on M, while robustness of the graph Laplacian against noise in data is important for applications. When manifold data vectors are perturbed by noise in the ambient space, [13] showed that the Gaussian kernel function h has a special property that makes the kernelized graph Laplacian robust to noise (by a modification of the diagonal entries). More recently, [20] showed that bi-stochastic normalization can make the Gaussian kernelized graph affinity matrix robust to high dimensional heteroskedastic noise in data. These results suggest that Gaussian h is a special and useful choice of kernel function for graph Laplacian methods.

Meanwhile, bi-stochastically normalized graph Laplacians have been studied in [23], where the point-wise convergence of the kernel integral operator to the manifold operator was proved. The spectral convergence of the bi-stochastically normalized graph Laplacian for data on the hypertorus was recently proved to be O(N^{−1/(d/2+o(1))}) in [36], and the algorithm uses a symmetric Sinkhorn iteration with accelerated numerical convergence. The density-corrected affinity kernel matrix W̃ = D^{−1}W D^{−1}, which is analyzed in the current work, is similar to the step in the Sinkhorn iteration for matrix scaling.
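For context, bi-stochastic normalization replaces the one-sided division by the degree with a symmetric scaling diag(u) W diag(u) whose rows (and columns) sum to one. A minimal sketch of one standard damped symmetric Sinkhorn-type fixed-point iteration (our own illustration; the accelerated scheme of [36] differs in its details):

```python
import numpy as np

def symmetric_sinkhorn(W, n_iter=500):
    """Find u > 0 such that diag(u) @ W @ diag(u) is (approximately) bi-stochastic,
    via the damped fixed-point iteration u <- sqrt(u / (W u)).
    At the fixed point, u * (W u) = 1 entrywise, i.e., rows of diag(u) W diag(u) sum to 1."""
    u = np.ones(len(W))
    for _ in range(n_iter):
        u = np.sqrt(u / (W @ u))
    return u
```

Compared with W̃ = D^{−1}W D^{−1}, which performs one two-sided degree division, the iteration repeats the rescaling until the row sums actually equal one, and the scaled matrix stays symmetric throughout.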
It would be interesting to explore the connections to these works and to extend our analysis to bi-stochastically normalized graph Laplacians, which may have better properties of spectral convergence and noise-robustness.

2 Set-up and preliminaries
We define the following moment constants of a function h satisfying Assumption 2:

m_0[h] := ∫_{R^d} h(‖u‖²) du,  m_2[h] := (1/d) ∫_{R^d} ‖u‖² h(‖u‖²) du,  m̃ := m_2/(2m_0).

By (C3), h ≥ 0 and h ≢ 0, so m_0[h], m_2[h] >
0. With Gaussian h as in (2), m_0 = 1, m_2 = 2, and m̃ = 1. Denote m_0[h] and m_2[h] by m_0 and m_2 as shorthand.

• The un-normalized graph Laplacian L_un is defined as

L_un := (2/(m_2 p ε N)) (D − W). (4)

Note that the standard un-normalized graph Laplacian is usually D − W, and we divide by the constant m_2 p ε N/2 for the convergence of L_un to −∆.

• The random-walk graph Laplacian L_rw is defined as

L_rw := (1/(m̃ ε)) (I − D^{−1}W), (5)

with the constant normalization ensuring convergence to −∆.

The matrix L_un is real-symmetric, positive semi-definite (PSD), and its smallest eigenvalue is zero. Suppose the eigenvalues of L_un are λ_k, k = 1, 2, ···, sorted in ascending order, that is,

0 = λ_1(L_un) ≤ λ_2(L_un) ≤ ··· ≤ λ_N(L_un).

The matrix L_rw is well-defined when D_ii > 0 for all i, which holds w.h.p. under the regime ε^{d/2} = Ω((log N)/N), c.f. Lemma 3.5. We always work under the ε^{d/2} = Ω((log N)/N) regime, namely the connectivity regime. Because D^{−1}(D − W) is similar to D^{−1/2}(D − W)D^{−1/2}, which is PSD, L_rw is also real-diagonalizable and has N non-negative real eigenvalues, sorted and denoted as 0 = λ_1(L_rw) ≤ λ_2(L_rw) ≤ ··· ≤ λ_N(L_rw). We also have, by the min-max variational formula for real-symmetric matrices,

λ_k(L_un) = min_{L ⊂ R^N, dim(L)=k} sup_{v ∈ L, v ≠ 0} (v^T L_un v)/(v^T v), k = 1, ···, N.

We define the graph Dirichlet form E_N(u) for u ∈ R^N as

E_N(u) = (2/(m_2 ε N²)) u^T (D − W) u = (1/(m_2 ε N²)) Σ_{i,j=1}^N W_ij (u_i − u_j)². (6)

By (4), E_N(u) = (p/N) u^T L_un u, and thus

λ_k(L_un) = min_{L ⊂ R^N, dim(L)=k} sup_{v ∈ L, v ≠ 0} E_N(v) / ((p/N)‖v‖²), k = 1, ···, N. (7)

Similarly, we have

λ_k(L_rw) = min_{L ⊂ R^N, dim(L)=k} sup_{v ∈ L, v ≠ 0} E_N(v) / ((1/(m_0 N²)) v^T D v), k = 1, ···, N.
(8)

To introduce notations for the manifold Laplacian, we define the inner-product on H := L²(M, dV) as ⟨f, g⟩ := ∫_M f(x)g(x) dV(x), for f, g ∈ L²(M, dV). We also use ⟨·, ·⟩_q to denote the inner-product in L²(M, q dV), with q dV a general measure on M (not necessarily a probability measure), that is, ⟨f, g⟩_q := ∫_M f(x)g(x)q(x) dV(x), for f, g ∈ L²(M, q dV). For a smooth connected compact manifold M, the (minus) Laplace-Beltrami operator −∆ has eigen-pairs {µ_k, ψ_k}_{k=1}^∞,

0 = µ_1 < µ_2 ≤ ··· ≤ µ_k ≤ ···, −∆ψ_k = µ_k ψ_k, ⟨ψ_k, ψ_l⟩ = δ_{k,l}, ψ_k ∈ C^∞(M), k, l = 1, 2, ···.

The second eigenvalue µ_2 > 0 by the connectivity of M. When µ_i = ··· = µ_{i+l−1} = µ for some eigenvalue µ of −∆ having multiplicity l, the eigenfunctions ψ_i, ···, ψ_{i+l−1} can be set to be an orthonormal basis of the l-dimensional eigenspace associated with µ. Note that ψ_k ∈ C^∞(M) for generic smooth M.

We leverage the special property of the Gaussian kernel in the ambient space R^D that it locally approximates the manifold heat kernel on M. We start from the notations of the manifold heat kernel. Since M is smooth and compact (without boundary), the Green's function of the heat equation on M exists, namely the heat kernel H_t(x, y) of M. We denote the heat diffusion semi-group operator by Q_t, which can be formally written as Q_t = e^{t∆}, and

Q_t f(x) = ∫_M H_t(x, y) f(y) dV(y), ∀f ∈ L²(M, dV).

By the semi-group property of Q_t, we have the reproducing property

∫_M H_{t₁}(x, y) H_{t₂}(y, z) dV(y) = H_{t₁+t₂}(x, z), ∀x, z ∈ M, ∀t₁, t₂ > 0.

Meanwhile, by the probability interpretation,

∫_M H_t(x, y) dV(y) = 1, ∀x ∈ M, ∀t > 0.

Using the eigenvalues and eigenfunctions {µ_k, ψ_k}_k of −∆, the heat kernel has the expansion representation H_t(x, y) = Σ_{k=1}^∞ e^{−tµ_k} ψ_k(x) ψ_k(y). We will not use the spectral expansion of H_t in our analysis, but only
We will not use the spectral expansion of H t in our analysis, but onlythat ψ k are also eigenfunctions of Q t , that is, Q t ψ k = e − tµ k ψ k , k = 1 , , · · · (9)Next, we derive Lemma 2.2, which characterizes two properties of the heat kernel H t at sufficientlyshort time: First, on a local neighborhood on M , H t ( x, y ) can be approximated by K t ( x, y ) in the leadingorder, where K t is defined as in (1) with Gaussian h ; Second, globally on the manifold the heat kernel H t ( x, y ) has a sub-Gaussian decay. These are based on classical results about heat kernel on Riemannianmanifolds [21, 16, 25, 17], summarized in the following theorem. Theorem 2.1 (Heat kernel parametrix and decay [25, 16]) . Suppose M is as in Assumption 1 (A1), and m > d/ is a positive integer. Then there are positive constants t < , δ < inj ( M ) i.e. the injectiveradius of M , and both t and δ depend on M , and1) Local approximation: There are positive constants C , C which depending on M , and u , · · · , u m ∈ C ∞ ( M ) , where u satisfies that | u ( x, y ) − | ≤ C d M ( x, y ) , ∀ y ∈ M , d M ( y, x ) < δ , nd G t is defined as in (3) , such that, when t < t , for any x ∈ M , | H t ( x, y ) − G t ( x, y ) (cid:32) m (cid:88) l =0 t l u l ( x, y ) (cid:33) | ≤ C t m − d/ , ∀ y ∈ M , d M ( y, x ) < δ . (10)
2) Global decay: There is a positive constant C_3 depending on M such that, when t < t_0,

H_t(x, y) ≤ C_3 t^{−d/2} e^{−d_M(x,y)²/(5t)}, ∀x, y ∈ M. (11)

Part 1) is by the classical parametrix construction of the heat kernel on M, see e.g. Chapter 3 of [25], and Part 2) follows from the classical upper bound of the heat kernel by Gaussian estimates dating back to the 60s [1, 17]. We include a proof of the theorem in Appendix B for completeness.

The theorem directly gives the following lemma (proof in Appendix B), which is useful for our construction of the interpolation mapping using the heat kernel. We denote by B_δ(x) the Euclidean ball in R^D centered at the point x of radius δ.

Lemma 2.2.
Suppose M is as in Assumption 1(A1), and t → 0+. Let δ_t := sqrt((5/2)(d + 2) t log t^{−1}), and let K_t(x, y) be with the Gaussian kernel h, i.e., K_t(x, y) = (4πt)^{−d/2} e^{−‖x−y‖²/(4t)}. Then there is a positive constant ε_0 depending on M such that, when t < ε_0, for any x ∈ M,

H_t(x, y) = K_t(x, y)(1 + O(t (log t^{−1})²)) + O(t), ∀y ∈ B_{δ_t}(x) ∩ M, (12)
H_t(x, y) = O(t), ∀y ∉ B_{δ_t}(x) ∩ M, (13)
H_t(x, y) = O(t^{−d/2}), ∀x, y ∈ M. (14)

The constants in the big-O in all the equations only depend on M and are uniform for all x.

In this section, we consider uniform p on M, and the standard graph Laplacians L_un and L_rw with the kernelized affinity matrix W, W_ij = K_ε(x_i, x_j) defined as in (1). We show the eigenvalue UB for general differentiable h satisfying Assumption 2, not necessarily Gaussian.

Proposition 3.1 (Eigenvalue UB of L_un). Under Assumption 1(A1), p being uniform on M, and Assumption 2, for fixed K ∈ N, if as N → ∞, ε → 0 and ε^{d/2} = Ω((log N)/N), then for sufficiently large N, w.p. > 1 − K N^{−10},

λ_k(L_un) ≤ µ_k + O(ε, sqrt((log N)/(N ε^{d/2}))), k = 1, ···, K.

The proposition holds when the population eigenvalues µ_k have multiplicity greater than one, as long as they are sorted in ascending order. The proof is by constructing a k-dimensional subspace L in (7) spanned by vectors in R^N which are produced by evaluating the population eigenfunctions ψ_k at the N data points. Given X = {x_i}_{i=1}^N, define the function evaluation operator ρ_X applied to f : M → R as

ρ_X : C(M) → R^N, ρ_X f = (f(x_1), ···, f(x_N)).

We will use u_k = (1/√p) ρ_X ψ_k as "candidate" approximate eigenvectors. To analyze E_N((1/√p) ρ_X ψ_k), the following result from [9] shows that it converges to the differential Dirichlet form p^{−1}⟨ψ_k, (−∆)ψ_k⟩_{p²} = pµ_k when p is uniform. The result is stated for a general density p and the weighted Laplacian ∆_q, which is defined as

∆_q := ∆ + (∇q/q) · ∇

for a measure q dV on M.
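The graph Dirichlet form's convergence can be spot-checked numerically (our own illustration, with ad-hoc parameters): on the uniform unit circle, E_N(ρ_X f) as in (6) for f = sin θ should approach ∫_{S¹} |∇f|² p² dV = ∫ cos²θ · (1/(2π))² dθ = 1/(4π).

```python
import numpy as np

rng = np.random.default_rng(11)
N, eps, m2 = 2000, 0.01, 2.0        # m2 = 2 for the Gaussian h of Eq. (2)
theta = rng.uniform(0, 2 * np.pi, N)
X = np.stack([np.cos(theta), np.sin(theta)], axis=1)
f = np.sin(theta)

sq = np.sum((X[:, None] - X[None, :]) ** 2, axis=-1)
W = (4 * np.pi * eps) ** -0.5 * np.exp(-sq / (4 * eps))   # d = 1

# Graph Dirichlet form E_N(rho_X f) = (1/(m2*eps*N^2)) * sum_ij W_ij (f_i - f_j)^2
E_N = np.sum(W * (f[:, None] - f[None, :]) ** 2) / (m2 * eps * N**2)

target = 1 / (4 * np.pi)            # \int_{S^1} cos(t)^2 * (1/(2*pi))^2 dt
```

The bias here is O(ε) and the fluctuation follows the form rate, so the empirical value sits within a few percent of the target at these parameter choices.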
∆_q reduces to ∆ when q is a constant.

Theorem 3.2 (Theorem 3.5 in [9]). Under Assumptions 1 and 2, as N → ∞, ε → 0, ε^{d/2} = Ω((log N)/N), then for any f ∈ C^∞(M), when N is sufficiently large, w.p. > 1 − N^{−10},

E_N(ρ_X f) = ⟨f, −∆_{p²} f⟩_{p²} + O_{p,f}(ε) + O( sqrt( (log N)/(N ε^{d/2}) · ∫_M |∇f|⁴ p² ) ).

The constant in O_{p,f}(·) depends on the C⁴ norms of p and f on M, and that in O(·) is an absolute one.

Proof of Theorem 3.2. The proof is by going through the proof of Theorem 3.5 of [9] under the simplified situation where β = 0 (no normalization by the estimated density is involved). Specifically, the proof uses the concentration of the V-statistic with

V_ij := (1/ε) K_ε(x_i, x_j)(f(x_i) − f(x_j))².

The expectation E V_ij, i ≠ j, equals (1/ε) ∫_M ∫_M K_ε(x, y)(f(x) − f(y))² p(x)p(y) dV(x)dV(y) = m_2[h]⟨f, −∆_{p²} f⟩_{p²} + O_{p,f}(ε). Meanwhile, |V_ij| is bounded by O(ε^{−d/2}), and the variance of V_ij can also be bounded by O(ε^{−d/2}), with the constant as in the theorem, following the calculation in the proof of Theorem 3.5 in [9]. The concentration of (1/(N(N−1))) Σ_{i≠j} V_ij at E V_ij then follows by the decoupling of the V-statistic, and it gives the high probability bound in the theorem.

Note that the results in [9] are proved under the assumption that h is C⁴ rather than C¹, that is, requiring Assumption 2(C1)(C2) to hold for up to the 4th derivative of h. This is because the C⁴ regularity of h is used to handle the complication of the adaptive bandwidth in the other analyses in [9]. With the fixed-bandwidth kernel K_ε(x, y) as defined in (1), C¹ regularity suffices, as originally assumed in [10].

Remark 1.
Since the proof only involves the computation of moments of the $V$-statistic, it is possible to relax the non-negativity of $h$ in Assumption 2(C3) and replace it with certain non-vanishing conditions on $m_0[h]$ and $m_2[h]$, e.g., as in [10] and Assumption A.5 in [9]. Since the non-negativity of $W_{ij}$ is used in other places in the paper, and our eigenvalue LB needs $h$ to be Gaussian, we adopt the non-negativity of $h$ in Assumption 2 for simplicity. The $C^\infty$ regularity of $f$ may also be relaxed, and the constant in $O_{p,f}(\cdot)$ may be improved accordingly. These extensions are not further pursued here.

Remark 2. When $h = \mathbf{1}_{[0,1]}$, using the same method as in the proof of Lemma 8 in [10], one can verify (proof in Appendix C.1) that, for $i\ne j$,
$$ \mathbb{E}V_{ij} = m_2[h]\langle f,-\Delta_pf\rangle_p + O_{p,f}(\epsilon), \quad f\in C^\infty(\mathcal{M}). \qquad (15) $$
The bound and the variance of $V_{ij}$ are again $O(\epsilon^{-d/2})$, and thus the Dirichlet form convergence with $h=\mathbf{1}_{[0,1]}$ has the same rate as in Theorem 3.2. This firstly implies that the eigenvalue UB also has the same rate, following the same proof as Proposition 3.1. The final eigen-convergence rate also depends on the point-wise rate of the graph Laplacian; see more in Remark 3.

In Theorem 3.2 and in below, the $\log N$ factor in the variance error bound is due to the concentration argument. Throughout the paper, the classical Bernstein inequality (Lemma B.1) is used intensively.

To proceed, recalling the definition of $E_N(u)$ in (6), we define the bilinear form, for $u,v\in\mathbb{R}^N$,
$$ B_N(u,v) := \frac14\big(E_N(u+v) - E_N(u-v)\big) = \frac{1}{m_2/2}\frac{1}{\epsilon N^2}u^T(D-W)v, $$
which is symmetric, i.e., $B_N(u,v) = B_N(v,u)$, and $B_N(u,u) = E_N(u)$. The following lemma characterizes the forms $E_N$ and $B_N$ applied to $\rho_X\psi_k$; it is proved in Appendix C.1.

Lemma 3.3. Under Assumption 1 (A1), $p$ being uniform on $\mathcal{M}$, and Assumption 2. As $N\to\infty$, $\epsilon\to0+$, $\epsilon^{d/2}N = \Omega(\log N)$. For fixed $K$, when $N$ is sufficiently large, w.p.
$> 1-K^2N^{-10}$,
$$ E_N\Big(\frac{1}{\sqrt p}\rho_X\psi_k\Big) = p\mu_k + O(\epsilon) + O\Big(\sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big), \quad k=1,\cdots,K, $$
$$ B_N\Big(\frac{1}{\sqrt p}\rho_X\psi_k,\ \frac{1}{\sqrt p}\rho_X\psi_l\Big) = O(\epsilon) + O\Big(\sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big), \quad k\ne l,\ 1\le k,l\le K. \qquad (16) $$
We need to show the linear independence of the vectors $\rho_X\psi_1,\cdots,\rho_X\psi_K$, so that they span a $K$-dimensional subspace of $\mathbb{R}^N$. This holds w.h.p. at large $N$, by the following lemma showing the near-isometry of the evaluation mapping $\rho_X$; it is proved in Appendix C.1.

Lemma 3.4.
Under Assumption 1 (A1), $p$ being uniform on $\mathcal{M}$. For fixed $K$, when $N$ is sufficiently large, w.p. $> 1-K^2N^{-10}$,
$$ \frac1N\Big\|\frac{1}{\sqrt p}\rho_X\psi_k\Big\|^2 = 1 + O\Big(\sqrt{\tfrac{\log N}{N}}\Big), \quad 1\le k\le K; $$
$$ \frac1N\Big(\frac{1}{\sqrt p}\rho_X\psi_k\Big)^T\Big(\frac{1}{\sqrt p}\rho_X\psi_l\Big) = O\Big(\sqrt{\tfrac{\log N}{N}}\Big), \quad k\ne l,\ 1\le k,l\le K. \qquad (17) $$

Proof of Proposition 3.1.
For fixed $K$, consider the intersection of the good events in Lemmas 3.3 and 3.4, which happens w.p. $> 1-2K^2N^{-10}$ with large enough $N$. Let $u_k = \frac{1}{\sqrt p}\rho_X\psi_k$; by (17), the set $\{u_1,\cdots,u_K\}$ is linearly independent.

For any $1\le k\le K$, let $L = \mathrm{Span}\{u_1,\cdots,u_k\}$, so $\dim(L) = k$. By (7), to show the UB of $\lambda_k$ as in the proposition, it suffices to show that
$$ \sup_{v\in L,\ \|v\|^2=N}\frac1pE_N(v) \le \mu_k + O(\epsilon) + O\Big(\sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big). $$
For any $v\in L$ with $\|v\|^2=N$, there are $c_j$, $1\le j\le k$, such that $v = \sum_{j=1}^kc_ju_j$. By (17),
$$ 1 = \frac1N\|v\|^2 = \sum_{j=1}^kc_j^2\Big(1+O\Big(\sqrt{\tfrac{\log N}{N}}\Big)\Big) + \sum_{j\ne l,\,j,l=1}^k|c_j||c_l|\,O\Big(\sqrt{\tfrac{\log N}{N}}\Big) = \|c\|^2\Big(1+O\Big(K\sqrt{\tfrac{\log N}{N}}\Big)\Big), $$
thus $\|c\|^2 = 1+O(\sqrt{\frac{\log N}{N}})$. Meanwhile, $E_N(v) = E_N(\sum_{j=1}^kc_ju_j) = \sum_{j,l=1}^kc_jc_lB_N(u_j,u_l)$, and by (16),
$$ E_N(v) = \sum_{j=1}^kc_j^2\Big(p\mu_j + O\Big(\epsilon,\sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big) + \sum_{j\ne l,\,j,l=1}^k|c_j||c_l|\,O\Big(\epsilon,\sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big) = p\sum_{j=1}^k\mu_jc_j^2 + K^2\|c\|^2O\Big(\epsilon,\sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big) \le \|c\|^2\Big\{p\mu_k + O\Big(\epsilon,\sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big\}, \qquad (18) $$
where, since $K$ is a fixed integer, we incorporate it into the big-$O$. Also, $\mu_k\le\mu_K = O(1)$, and then
$$ \frac1pE_N(v) \le \Big(1+O\Big(\sqrt{\tfrac{\log N}{N}}\Big)\Big)\Big\{\mu_k + O(\epsilon) + O\Big(\sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big\} = \mu_k + O(\epsilon) + O\Big(\sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big), $$
which finishes the proof.

3.2 Random-walk graph Laplacian

We first establish a concentration argument for the degrees $D_i$ in the following lemma, which shows that $D_i > 0$ and that $\frac1ND_i$ concentrates at the value $m_0p >$
$0$. Consequently, $\frac{1}{N^2}u^TDu$ also concentrates, and the deviation is uniformly bounded for all $u\in\mathbb{R}^N$, which will be used in analyzing (8).

Lemma 3.5.
Under Assumption 1 (A1), $p$ uniform, and Assumption 2. Suppose as $N\to\infty$, $\epsilon\to0+$ and $\epsilon^{d/2} = \Omega(\frac{\log N}{N})$. Then, when $N$ is large enough, w.p. $> 1-2N^{-9}$:

1) The degree $D_i$ concentrates for all $i$, namely,
$$ \frac1ND_i = m_0p + O\Big(\epsilon,\sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big), \quad \forall i=1,\cdots,N. \qquad (19) $$
2) The form $\frac{1}{N^2}u^TDu$ concentrates for all $u$, namely,
$$ \frac{1}{N^2}u^TDu = \frac1N\|u\|^2\Big(m_0p + O\Big(\epsilon,\sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big), \quad \forall u\in\mathbb{R}^N. \qquad (20) $$
The constants in the big-$O$'s in (19) and (20) are determined by $(\mathcal{M},h)$ and are uniform for all $i$ and $u$.

Part 2) immediately follows from Part 1), the latter being proved by a standard concentration argument for an independent sum together with a union bound over the $N$ events. With Lemma 3.5, the proof of the following proposition is similar to that of Proposition 3.1; the difference lies in handling the denominator of the Rayleigh quotient in (8). The proofs of Lemma 3.5 and Proposition 3.6 are in Appendix C.1.

Proposition 3.6 (Eigenvalue UB of $L_{rw}$). Suppose $\mathcal{M}$, $p$ uniform, $h$, $K$, $\mu_k$, and $\epsilon$ are under the same conditions as in Proposition 3.1. Then for sufficiently large $N$, w.p. $> 1-2N^{-9}-2K^2N^{-10}$, $D_i>0$ for all $i$, and
$$ \lambda_k(L_{rw}) \le \mu_k + O\Big(\epsilon,\sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big), \quad k=1,\cdots,K. $$

In this section, we prove the $O(1)$ eigenvalue LB in Step 1, first for $L_{un}$; the proof for $L_{rw}$ is similar. We consider, for $t>0$, the operator $L_t$ on $H = L^2(\mathcal{M},dV)$ defined as
$$ L_t := I - Q_t, \quad L_tf(x) = f(x) - \int_{\mathcal{M}}H_t(x,y)f(y)dV(y), \quad f\in H. $$
The semigroup operator $Q_t$ is Hilbert-Schmidt, compact, and has eigenvalues and eigenfunctions as in (9). Thus, the operator $L_t$ is self-adjoint and PSD, and satisfies
$$ L_t\psi_k = (1-e^{-t\mu_k})\psi_k, \quad k=1,2,\cdots. $$
For any $t >$
$0$, the eigenvalues $\{1-e^{-t\mu_k}\}_k$ are ascending from $0$ and have limit point $1$. We denote $\|f\| = \langle f,f\rangle^{1/2}$ for $f\in H$. By the variational principle, we have that when $t >$
$0$, for any $k$,
$$ 1-e^{-t\mu_k} = \inf_{L\subset H,\ \dim(L)=k}\ \sup_{f\in L,\ \|f\|\ne0}\frac{\langle f,L_tf\rangle}{\langle f,f\rangle}. \qquad (21) $$
For the first result, we assume that the $\mu_k$ are all of multiplicity one, for simplicity. When population eigenvalues have multiplicity greater than one, the result extends by considering eigenspaces rather than single eigenvectors in the standard way; see Remark 4.

4.1 Un-normalized graph Laplacian

Proposition 4.1 (Initial crude eigenvalue LB of $L_{un}$). Under Assumption 1 (A1), suppose $p$ is uniform on $\mathcal{M}$, and $h$ is Gaussian. For fixed $k_{max}\in\mathbb{N}$, $K = k_{max}+1$, suppose $\mu_1<\cdots<\mu_K<\infty$ are all of single multiplicity, and define
$$ \gamma_K := \frac12\min_{1\le k\le k_{max}}(\mu_{k+1}-\mu_k); \qquad (22) $$
$\gamma_K>0$ and is a fixed constant. Then there is a constant $c_K$ determined by $\mathcal{M}$ and $k_{max}$ (specifically, $c_K = c\,(\mu_K\gamma_K^{-1})^d\gamma_K^{-2}$, where $c$ is a constant depending on $\mathcal{M}$) such that, if as $N\to\infty$, $\epsilon\to0+$, and $\epsilon^{d/2+2} > c_K\frac{\log N}{N}$, then for sufficiently large $N$, w.p. $> 1-2K^2N^{-10}-4N^{-9}$,
$$ \lambda_k(L_{un}) > \mu_k - \gamma_K, \quad k=2,\cdots,K. $$

Suppose $\{\lambda_k,v_k\}_{k=1}^K$ are the eigenvalues and eigenvectors of $L_{un}$. To construct a test function $f_k$ on $\mathcal{M}$ from the vector $v_k$, we define the interpolation mapping (the terminology "interpolation" is inherited from [6]) by the heat kernel with diffusion time $r$, $0<r<\epsilon$, to be determined. Specifically, define
$$ I_r[u](x) := \frac1N\sum_{j=1}^Nu_jH_r(x,x_j), \quad I_r : \mathbb{R}^N\to C^\infty(\mathcal{M}), $$
and then, for any $t>0$, by the semigroup property of the heat kernel,
$$ \langle I_r[u], Q_tI_r[u]\rangle = \frac{1}{N^2}\sum_{i,j=1}^Nu_iu_jH_{2r+t}(x_i,x_j), \quad \langle I_r[u], I_r[u]\rangle = \frac{1}{N^2}\sum_{i,j=1}^Nu_iu_jH_{2r}(x_i,x_j). \qquad (23) $$
We define the quadratic form
$$ q_s(u) := \frac{1}{N^2}\sum_{i,j=1}^Nu_iu_jH_s(x_i,x_j), \quad s>0,\ u\in\mathbb{R}^N. $$
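The identities in (23) are what make the diffusion times add up: they follow from the semigroup property $\int_{\mathcal M}H_r(x,z)H_r(z,y)dV(z) = H_{2r}(x,y)$ applied under the integral. A quick numerical check of this property on the unit circle (an illustrative setting assumed here, with the heat kernel computed by its Fourier series and the $z$-integral by a periodic quadrature grid):

```python
import numpy as np

def heat_kernel_circle(theta, t, kmax=200):
    # Fourier series of the heat kernel on the unit circle, vectorized over theta
    k = np.arange(1, kmax + 1)[:, None]
    theta = np.atleast_1d(theta)
    return (1.0 + 2.0 * np.sum(np.exp(-(k ** 2) * t) * np.cos(k * theta), axis=0)) / (2 * np.pi)

r = 0.05
x, y = 0.3, 1.2                        # two points on the circle, given by their angles
M = 4000
z = 2 * np.pi * np.arange(M) / M       # quadrature grid for the intermediate variable
# semigroup property: int H_r(x,z) H_r(z,y) dz = H_{2r}(x,y)
lhs = (2 * np.pi / M) * np.sum(heat_kernel_circle(x - z, r) * heat_kernel_circle(z - y, r))
rhs = heat_kernel_circle(x - y, 2 * r)[0]
err = abs(lhs - rhs)                   # quadrature is spectrally accurate here
```

The same computation with diffusion times $r$ and $t+r$ verifies the first identity in (23), since $Q_t$ acts on $I_r[u]$ by replacing $H_r$ with $H_{r+t}$.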
We also define $q_s^{(0)}$ and $q_s^{(2)}$ as below; then for any $u\in\mathbb{R}^N$, $q_s(u) = q_s^{(0)}(u) - q_s^{(2)}(u)$, where
$$ q_s^{(0)}(u) := \frac1N\sum_{i=1}^Nu_i^2\,\frac1N\sum_{j=1}^NH_s(x_i,x_j), \quad q_s^{(2)}(u) := \frac12\frac{1}{N^2}\sum_{i,j=1}^NH_s(x_i,x_j)(u_i-u_j)^2. \qquad (24) $$
We will show that $q_s^{(0)}(u) \approx p\,\frac1N\|u\|^2$, by concentration of the independent sums $\frac1N\sum_{j=1}^NH_s(x_i,x_j)$, and that $q_s^{(2)}(u) \ge 0$ is $O(s)$ when $u$ is an eigenvector with $\|u\|^2 = N$.

Lemma 4.2.
Under Assumption 1 (A1), $p$ being uniform on $\mathcal{M}$. Suppose as $N\to\infty$, $s\to0+$ and $s^{d/2} = \Omega(\frac{\log N}{N})$. Then, when $N$ is large enough, w.p. $> 1-2N^{-9}$,
$$ q_s^{(0)}(u) = \frac1N\|u\|^2\Big(p + O_{\mathcal{M}}\Big(\sqrt{\tfrac{\log N}{Ns^{d/2}}}\Big)\Big), \quad \forall u\in\mathbb{R}^N. $$
The notation $O_{\mathcal{M}}(\cdot)$ indicates that the constant depends on $\mathcal{M}$ and is uniform for all $u$.

Proof of Lemma 4.2. By definition, $q_s^{(0)}(u) = \frac1N\sum_{i=1}^Nu_i^2(D_s)_i$, where $(D_s)_i := \frac1N\sum_{j=1}^NH_s(x_i,x_j)$, and $\{(D_s)_i\}_{i=1}^N$ are $N$ positive random variables. It suffices to show that, with large enough $N$ and w.p. as indicated in the lemma,
$$ (D_s)_i = p + O_{\mathcal{M}}\Big(\sqrt{\tfrac{\log N}{Ns^{d/2}}}\Big), \quad \forall i=1,\cdots,N. \qquad (25) $$
This can be proved by a concentration argument similar to the proof of Lemma 3.5 1), where we use the boundedness (14) of the heat kernel in Lemma 2.2. The proof of (25) is given in Appendix C.2. Note that (25) is a property of the r.v.'s $H_s(x_i,x_j)$ only, which is irrelevant to the vector $u$; thus the threshold of large $N$ in the lemma and the constant in the big-$O$ depend on $\mathcal{M}$ and are uniform for all $u$.

Lemma 4.3.
Under Assumption 1 ($p$ can be non-uniform), $h$ being Gaussian, let $0<\alpha<1$ be a fixed constant. Suppose $\epsilon\to0+$ as $N\to\infty$; then, with sufficiently small $\epsilon$, for any realization of $X$,
$$ 0 \le q_\epsilon^{(2)}(u) = \big(1+O(\epsilon(\log\tfrac1\epsilon)^2)\big)\frac{u^T(D-W)u}{N^2} + \frac{\|u\|^2}{N}O(\epsilon^{10}), \quad \forall u\in\mathbb{R}^N, \qquad (26) $$
and
$$ 0 \le q_{\alpha\epsilon}^{(2)}(u) \le 1.1\,\alpha^{-d/2}\frac{u^T(D-W)u}{N^2} + \frac{\|u\|^2}{N}O(\epsilon^{10}), \quad \forall u\in\mathbb{R}^N. \qquad (27) $$
The constants in the big-$O$'s only depend on $\mathcal{M}$ and are uniform for all $u$ and $\alpha$.

Proof of Lemma 4.3. For any $u\in\mathbb{R}^N$, $q_\epsilon^{(2)}(u) = \frac12\frac{1}{N^2}\sum_{i,j=1}^NH_\epsilon(x_i,x_j)(u_i-u_j)^2 \ge 0$. Since $\epsilon = o(1)$, take $t$ in Lemma 2.2 to be $\epsilon$; when $\epsilon<\epsilon_0$, the three equations hold. By (13), truncating at the Euclidean ball of radius $\delta_\epsilon = \sqrt{4(10+d)\epsilon\log\epsilon^{-1}}$,
$$ q_\epsilon^{(2)}(u) = \frac12\frac{1}{N^2}\sum_{i,j=1}^NH_\epsilon(x_i,x_j)\mathbf{1}_{\{x_j\in B_{\delta_\epsilon}(x_i)\}}(u_i-u_j)^2 + O(\epsilon^{10})\,\frac12\frac{1}{N^2}\sum_{i,j=1}^N(u_i-u_j)^2. $$
By that $\frac{1}{N^2}\sum_{i,j=1}^N(u_i-u_j)^2 \le \frac{2\|u\|^2}{N}$, and applying (12), with the shorthand $\tilde O(\epsilon)$ standing for $O(\epsilon(\log\frac1\epsilon)^2)$,
$$ q_\epsilon^{(2)}(u) = \frac12\frac{1}{N^2}\sum_{i,j=1}^N\big(K_\epsilon(x_i,x_j)(1+\tilde O(\epsilon)) + O(\epsilon^{10})\big)\mathbf{1}_{\{x_j\in B_{\delta_\epsilon}(x_i)\}}(u_i-u_j)^2 + O(\epsilon^{10})\frac{\|u\|^2}{N} = (1+\tilde O(\epsilon))\,\frac12\frac{1}{N^2}\sum_{i,j=1}^NK_\epsilon(x_i,x_j)\mathbf{1}_{\{x_j\in B_{\delta_\epsilon}(x_i)\}}(u_i-u_j)^2 + O(\epsilon^{10})\frac{\|u\|^2}{N}. $$
By the truncation argument for $K_\epsilon(x_i,x_j)$, we have that
$$ \frac12\frac{1}{N^2}\sum_{i,j=1}^NK_\epsilon(x_i,x_j)\mathbf{1}_{\{x_j\in B_{\delta_\epsilon}(x_i)\}}(u_i-u_j)^2 = \frac{u^T(D-W)u}{N^2} + \frac{\|u\|^2}{N}O(\epsilon^{10}). \qquad (28) $$
Putting together, we have
$$ q_\epsilon^{(2)}(u) = (1+\tilde O(\epsilon))\Big(\frac{u^T(D-W)u}{N^2} + \frac{\|u\|^2}{N}O(\epsilon^{10})\Big) + O(\epsilon^{10})\frac{\|u\|^2}{N}, $$
which proves (26).

To prove (27): since $0<\alpha<1$, $\alpha\epsilon<\epsilon<\epsilon_0$, and we can apply Lemma 2.2 with $t$ therein being $\alpha\epsilon$. With a truncation at the $\delta_{\alpha\epsilon}$-Euclidean ball, and by (12),
$$ q_{\alpha\epsilon}^{(2)}(u) = \frac12\frac{1}{N^2}\sum_{i,j=1}^N\big(K_{\alpha\epsilon}(x_i,x_j)(1+\tilde O(\alpha\epsilon)) + O((\alpha\epsilon)^{10})\big)\mathbf{1}_{\{x_j\in B_{\delta_{\alpha\epsilon}}(x_i)\}}(u_i-u_j)^2 + \frac{\|u\|^2}{N}O(\epsilon^{10}) = (1+\tilde O(\epsilon))\,\frac12\frac{1}{N^2}\sum_{i,j=1}^NK_{\alpha\epsilon}(x_i,x_j)\mathbf{1}_{\{x_j\in B_{\delta_{\alpha\epsilon}}(x_i)\}}(u_i-u_j)^2 + \frac{\|u\|^2}{N}O(\epsilon^{10}), $$
where $\epsilon$ is sufficiently small such that $1+\tilde O(\epsilon)$ is less than $1.1$. Note that
$$ K_{\alpha\epsilon}(x,y) = \frac{1}{(4\pi\alpha\epsilon)^{d/2}}e^{-\frac{\|x-y\|^2}{4\alpha\epsilon}} \le \alpha^{-d/2}\frac{1}{(4\pi\epsilon)^{d/2}}e^{-\frac{\|x-y\|^2}{4\epsilon}} = \alpha^{-d/2}K_\epsilon(x,y), \quad \forall x,y\in\mathcal{M}; \qquad (29) $$
then, since $\mathbf{1}_{\{x_j\in B_{\delta_{\alpha\epsilon}}(x_i)\}} \le \mathbf{1}_{\{x_j\in B_{\delta_\epsilon}(x_i)\}}$, and again with (28),
$$ q_{\alpha\epsilon}^{(2)}(u) \le \frac{1.1}{2}\frac{1}{N^2}\sum_{i,j=1}^N\alpha^{-d/2}K_\epsilon(x_i,x_j)\mathbf{1}_{\{x_j\in B_{\delta_\epsilon}(x_i)\}}(u_i-u_j)^2 + \frac{\|u\|^2}{N}O(\epsilon^{10}) = 1.1\,\alpha^{-d/2}\Big(\frac{u^T(D-W)u}{N^2} + \frac{\|u\|^2}{N}O(\epsilon^{10})\Big) + \frac{\|u\|^2}{N}O(\epsilon^{10}), $$
and this proves (27).

Proof of Proposition 4.1.
For fixed $k_{max}$: since $\gamma_K < \mu_K$, define
$$ \delta := 0.1\frac{\gamma_K}{\mu_K} < 0.1; \qquad (30) $$
$\delta > 0$ is a constant determined by $\mathcal{M}$ and $k_{max}$. For $\epsilon >$
$0$, let $r := \frac{\delta\epsilon}{2}$, $t := \epsilon - 2r = (1-\delta)\epsilon$. For $L_{un}v_k = \lambda_kv_k$, where the $v_k$ are normalized s.t.
$$ \frac1Nv_k^Tv_l = \delta_{kl}, \quad 1\le k,l\le N, \qquad (31) $$
let $f_k = I_r[v_k]$, $k=1,\cdots,K$; then $f_k\in C^\infty(\mathcal{M})\subset H$. Because $\epsilon^{d/2+2} > c_K\frac{\log N}{N}$ and $\epsilon = o(1)$, $\epsilon^{d/2} = \Omega(\frac{\log N}{N})$. Thus, under the assumption of the current proposition, the condition needed in Proposition 3.1 is satisfied, and when $N$ is sufficiently large there is an event $E_{UB}$, which happens w.p. $> 1-2K^2N^{-10}$, under which
$$ \lambda_k \le \mu_k + 0.1\mu_K \le 1.1\mu_K, \quad 1\le k\le K. \qquad (32) $$
We first show that $\{f_j\}_{j=1}^K$ are linearly independent, by considering $\langle f_k,f_l\rangle$. By definition, for $1\le k\le K$,
$$ \langle f_k,f_k\rangle = q_{2r}(v_k) = q_{\delta\epsilon}^{(0)}(v_k) - q_{\delta\epsilon}^{(2)}(v_k), $$
and for $k\ne l$, $1\le k,l\le K$,
$$ \langle(f_k\pm f_l),(f_k\pm f_l)\rangle = q_{2r}(v_k\pm v_l) = q_{\delta\epsilon}^{(0)}(v_k\pm v_l) - q_{\delta\epsilon}^{(2)}(v_k\pm v_l). $$
Because $s = \delta\epsilon$ satisfies the condition in Lemma 4.2 under the condition of the proposition, with sufficiently large $N$ there is an event $E^{(0)}$, which happens w.p. $> 1-2N^{-9}$, under which
$$ q_{\delta\epsilon}^{(0)}(v_k) = p + O\Big(\sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big), \quad 1\le k\le K; \quad q_{\delta\epsilon}^{(0)}(v_k\pm v_l) = 2p + O\Big(\sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big), \quad k\ne l,\ 1\le k,l\le K, $$
where we used that the factor $\delta^{-d/2}$ is a fixed constant. Meanwhile, applying (27) in Lemma 4.3 with $\alpha = \delta$, and noting that
$$ \frac{v_k^T(D-W)v_k}{N^2} = p\epsilon\lambda_k; \quad \frac{(v_k\pm v_l)^T(D-W)(v_k\pm v_l)}{N^2} = p\epsilon(\lambda_k+\lambda_l), \quad k\ne l,\ 1\le k,l\le K,
$$
we have that
$$ q_{\delta\epsilon}^{(2)}(v_k) = O(\delta^{-d/2})\,p\epsilon\lambda_k + O(\epsilon^{10}), \quad 1\le k\le K; \quad q_{\delta\epsilon}^{(2)}(v_k\pm v_l) = O(\delta^{-d/2})\,p\epsilon(\lambda_k+\lambda_l) + 2O(\epsilon^{10}), \quad k\ne l, $$
and, by that $\lambda_k,\lambda_l\le1.1\mu_K$, which is a fixed constant, and so is $\delta$, we have that
$$ q_{\delta\epsilon}^{(2)}(v_k) = O(\epsilon), \quad 1\le k\le K; \quad q_{\delta\epsilon}^{(2)}(v_k\pm v_l) = O(\epsilon), \quad k\ne l,\ 1\le k,l\le K. \qquad (33) $$
Putting together, we have that
$$ \langle f_k,f_k\rangle = p + O\Big(\sqrt{\tfrac{\log N}{N\epsilon^{d/2}}},\ \epsilon\Big), \quad 1\le k\le K, $$
$$ \langle f_k,f_l\rangle = \frac14\big(q_{\delta\epsilon}(v_k+v_l) - q_{\delta\epsilon}(v_k-v_l)\big) = O\Big(\sqrt{\tfrac{\log N}{N\epsilon^{d/2}}},\ \epsilon\Big), \quad k\ne l,\ 1\le k,l\le K. \qquad (34) $$
This proves the linear independence of $\{f_j\}_{j=1}^K$ when $N$ is large enough, since $O(\sqrt{\frac{\log N}{N\epsilon^{d/2}}},\epsilon) = o(1)$.

We consider the first $K$ eigenvalues of $L_t$, $t = (1-\delta)\epsilon$. For each $2\le k\le K$, let $L_k = \mathrm{Span}\{f_1,\cdots,f_k\}$, a $k$-dimensional subspace of $H$; then by (21),
$$ 1-e^{-(1-\delta)\epsilon\mu_k} \le \sup_{f\in L_k,\ \|f\|\ne0}\frac{\langle f,L_tf\rangle}{\langle f,f\rangle} = \sup_{f\in L_k,\ \|f\|\ne0}\frac{\langle f,f\rangle - \langle f,Q_tf\rangle}{\langle f,f\rangle}. \qquad (35) $$
For any $f\in L_k$ with $\|f\|\ne0$, there is $c\in\mathbb{R}^k$, $c\ne0$, such that $f = \sum_{j=1}^kc_jf_j$. Thus
$$ f = \sum_{j=1}^kc_jI_r[v_j] = I_r\Big[\sum_{j=1}^kc_jv_j\Big] = I_r[v], \quad v := \sum_{j=1}^kc_jv_j. $$
Because the $v_j$ are orthogonal with $\|v_j\|^2 = N$, we have that
$$ \frac{\|v\|^2}{N} = \|c\|^2, \quad \frac{v^T(D-W)v}{N^2} = \sum_{j=1}^kc_j^2(p\epsilon\lambda_j) \le \lambda_kp\epsilon\|c\|^2. $$
By definition, $\langle f,f\rangle = q_{\delta\epsilon}(v)$ and $\langle f,Q_tf\rangle = q_\epsilon(v)$. We first upper bound the numerator of the r.h.s. of (35).
By that $q_{\delta\epsilon}^{(2)}(v)\ge0$,
$$ \langle f,f\rangle - \langle f,Q_tf\rangle = q_{\delta\epsilon}(v) - q_\epsilon(v) = q_{\delta\epsilon}^{(0)}(v) - q_{\delta\epsilon}^{(2)}(v) - q_\epsilon^{(0)}(v) + q_\epsilon^{(2)}(v) \le \big(q_{\delta\epsilon}^{(0)}(v) - q_\epsilon^{(0)}(v)\big) + q_\epsilon^{(2)}(v). \qquad (36) $$
We have already obtained the good event $E^{(0)}$ when applying Lemma 4.2 with $s = \delta\epsilon$. We apply the lemma again with $s = \epsilon$, which gives, for sufficiently large $N$, an event $E^{(1)}$, which happens w.p. $> 1-2N^{-9}$; under $E^{(0)}\cap E^{(1)}$,
$$ q_{\delta\epsilon}^{(0)}(v) = \|c\|^2\Big(p + O_{\mathcal{M}}\Big(\sqrt{\tfrac{\delta^{-d/2}\log N}{N\epsilon^{d/2}}}\Big)\Big), \quad q_\epsilon^{(0)}(v) = \|c\|^2\Big(p + O_{\mathcal{M}}\Big(\sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big). \qquad (37) $$
We track the constant dependence here: the constant in $O_{\mathcal{M}}(\cdot)$ in Lemma 4.2 depends only on $\mathcal{M}$ (and not on $K$); we use the notation $O_{\mathcal{M}}(\cdot)$ in (37) and below to emphasize that the constant is $\mathcal{M}$-dependent only and independent of $K$. Then (37) gives that
$$ q_{\delta\epsilon}^{(0)}(v) - q_\epsilon^{(0)}(v) = \|c\|^2\,\delta^{-d/2}O_{\mathcal{M}}\Big(\sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big). $$
The bound on $q_\epsilon^{(2)}(v)$ follows from (26) in Lemma 4.3, with the shorthand $\tilde O(\epsilon)$ standing for $O(\epsilon(\log\frac1\epsilon)^2)$:
$$ q_\epsilon^{(2)}(v) = \frac{v^T(D-W)v}{N^2}(1+\tilde O(\epsilon)) + \|c\|^2O(\epsilon^{10}) \le \epsilon\|c\|^2\big(\lambda_kp(1+\tilde O(\epsilon)) + O(\epsilon^9)\big). $$
Thus, (36) continues as
$$ \langle f,f\rangle - \langle f,Q_tf\rangle \le \epsilon\|c\|^2\Big(\lambda_kp(1+\tilde O(\epsilon)) + O(\epsilon^9) + \delta^{-d/2}O_{\mathcal{M}}\Big(\tfrac1\epsilon\sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big). \qquad (38) $$
Next we lower bound the denominator $\langle f,f\rangle$. Here we use (27) in Lemma 4.3, which gives that
$$ 0 \le q_{\delta\epsilon}^{(2)}(v) \le \Theta(\delta^{-d/2})\frac{v^T(D-W)v}{N^2} + \|c\|^2O(\epsilon^{10}) \le \epsilon\|c\|^2\big(\lambda_kp\,\Theta(\delta^{-d/2}) + O(\epsilon^9)\big). $$
Note that we work under the event $E_{UB}$, so that the eigenvalue UB (32) holds; thus $\lambda_kp\,\Theta(\delta^{-d/2}) + O(\epsilon^9) = O(1)$. Together with that $\delta$ is a fixed constant, we have that $q_{\delta\epsilon}^{(2)}(v) = \|c\|^2O(\epsilon)$. Then, again under $E^{(0)}\cap E^{(1)}$,
$$ \langle f,f\rangle = q_{\delta\epsilon}^{(0)}(v) - q_{\delta\epsilon}^{(2)}(v) = \|c\|^2\Big(p + O\Big(\sqrt{\tfrac{\delta^{-d/2}\log N}{N\epsilon^{d/2}}}\Big) - O(\epsilon)\Big) \ge \|c\|^2\Big(p - O\Big(\epsilon,\sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big). $$
Putting together, and by that $\lambda_k\le1.1\mu_K$, we have that
$$ \frac{\langle f,f\rangle - \langle f,Q_tf\rangle}{\langle f,f\rangle} \le \epsilon\,\frac{\lambda_kp + \tilde O(\epsilon) + \delta^{-d/2}O_{\mathcal{M}}\big(\frac1\epsilon\sqrt{\frac{\log N}{N\epsilon^{d/2}}}\big)}{p - O\big(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}}\big)} \le \epsilon\Big(\lambda_k + \tilde O(\epsilon) + C\frac1\epsilon\sqrt{\frac{\log N}{N\epsilon^{d/2}}}\Big), $$
where $C = c(\mathcal{M})\delta^{-d/2}$, and $c(\mathcal{M})$ is a constant only depending on $\mathcal{M}$. We set
$$ c_K := \Big(\frac{C}{0.1\gamma_K}\Big)^2 = \big(10\,c(\mathcal{M})\,\delta^{-d/2}\gamma_K^{-1}\big)^2, $$
and since we assume $\epsilon^{d/2+2} > c_K\frac{\log N}{N}$ in the current proposition, we have
$$ C\frac1\epsilon\sqrt{\frac{\log N}{N\epsilon^{d/2}}} = C\sqrt{\frac{\log N}{N\epsilon^{d/2+2}}} < 0.1\gamma_K. $$
Then, comparing with the l.h.s. of (35), we have that
$$ 1-e^{-(1-\delta)\epsilon\mu_k} \le \frac{\langle f,f\rangle - \langle f,Q_tf\rangle}{\langle f,f\rangle} \le \epsilon\big(\lambda_k + \tilde O(\epsilon) + 0.1\gamma_K\big). $$
By the relation $1-e^{-x} \ge x - \frac{x^2}{2}$ for any $x \ge$
$0$,
$$ 1-e^{-(1-\delta)\epsilon\mu_k} \ge (1-\delta)\epsilon\Big(\mu_k - \frac{(1-\delta)\epsilon\mu_k^2}{2}\Big), $$
and when $\epsilon$ is sufficiently small s.t. $\epsilon\mu_k^2 \le \epsilon(1.1\mu_K)^2 < 0.2\gamma_K$,
$$ 1-e^{-(1-\delta)\epsilon\mu_k} \ge (1-\delta)\epsilon(\mu_k - 0.1\gamma_K) > 0. $$
Note that for $k\ge2$, $\mu_k\ge\mu_2\ge2\gamma_K>0$, because $\mu_1 = 0$. Thus, when $\epsilon$ is sufficiently small such that the $\tilde O(\epsilon)$ term is less than $0.1\gamma_K$, under the good events $E^{(0)}\cap E^{(1)}\cap E_{UB}$, which happen w.p. $> 1-2K^2N^{-10}-4N^{-9}$, we have that
$$ 0 < (1-\delta)(\mu_k - 0.1\gamma_K) \le \lambda_k + \tilde O(\epsilon) + 0.1\gamma_K < \lambda_k + 0.2\gamma_K. $$
Recall that by definition (30), $\delta\mu_K = 0.1\gamma_K$, so that $\delta\mu_k \le \delta\mu_K = 0.1\gamma_K$; also $0<\delta<0.5$. Rearranging the terms gives that $\mu_k < \lambda_k + 0.4\gamma_K < \lambda_k + \gamma_K$. This can be verified for all $2\le k\le K$, and note that the good events $E^{(0)}$ and $E^{(1)}$ are w.r.t. $X$, and $E_{UB}$ is constructed for the fixed $k_{max}$; none is for a specific $k\le K$.

Figure 1: Population eigenvalues $\mu_k$ of $-\Delta$, and empirical eigenvalues $\lambda_k$ of the graph Laplacian matrix $L_N$, where $L_N$ can be $L_{un}$ or $L_{rw}$. The positive integer $k_{max}$ is fixed, and the constant $\gamma_K$ is half of the minimum of the first $K$ eigen-gaps, defined as in (22). The eigenvalue UB and initial LB are proved for $k\le K$, which guarantees (41). The extension to multiplicity greater than one is by defining $\gamma_K$ as in (44).

The counterpart result for the random-walk graph Laplacian is the following proposition. It replaces Proposition 3.1 with Proposition 3.6 in obtaining the eigenvalue UB in the analysis, and consequently the high-probability bound differs slightly.
Proposition 4.4 (Initial crude eigenvalue LB of $L_{rw}$). Under the same conditions and setting — $\mathcal{M}$, $p$ being uniform, $h$ being Gaussian, and $k_{max}$, $\mu_k$, $\epsilon$ the same as in Proposition 4.1 — for sufficiently large $N$, w.p. $> 1-2K^2N^{-10}-6N^{-9}$, $\lambda_k(L_{rw}) > \mu_k - \gamma_K$ for $k=2,\cdots,K$.

The proof is similar to that of Proposition 4.1 and is left to Appendix C.2. The difference lies in that the empirical eigenvectors $v_k$ are $D$-orthonormal rather than orthonormal, and the degree concentration Lemma 3.5 is used to relate $\frac{\|v\|^2}{N}$ with $\frac{1}{N^2}v^TDv$ for an arbitrary vector $v$.

In this section, we obtain the eigen-convergence rates of $L_{un}$ and $L_{rw}$ from the initial crude eigenvalue bound in Step 1. We first derive Steps 2-3 for $L_{un}$; the proof for $L_{rw}$ is similar.

In Step 1, the crude bounds on the eigenvalues (the UB is already with the final rate; the LB is crude) give that, for fixed $k_{max}$ and at large $N$, each $\lambda_k$ falls into the interval $(\mu_k-\gamma_K,\ \mu_k+\gamma_K)$, where $\gamma_K$ is at most half of the smallest of the eigen-gaps $(\mu_2-\mu_1),\ldots,(\mu_{k_{max}+1}-\mu_{k_{max}})$, as illustrated in Fig. 1. This means that $\lambda_k$ is separated from the neighboring $\mu_{k-1}$ and $\mu_{k+1}$ by an $O(1)$ distance. This $O(1)$ initial separation is enough for proving eigenvector consistency up to the point-wise rate, which is a standard argument; see, e.g., the proof of Theorem 2.6 part 2) in [7]. Below we provide an informal explanation and then the formal statement in Proposition 5.2, with a proof for completeness.

We first give an illustrative informal derivation. Take $k=2$ for example; let $L_N = L_{un}$, $L_Nu_k = \lambda_ku_k$, and we want to show that $u_2$ and $\rho_X\psi_2$ are aligned. Define the residual vector
$$ r := L_N(\rho_X\psi_2) - \rho_X(-\Delta)\psi_2 \in \mathbb{R}^N, \quad r(i) = L_N(\rho_X\psi_2)(x_i) - (-\Delta)\psi_2(x_i); $$
note that $(-\Delta)\psi_2 = \mu_2\psi_2$, so $r = L_N(\rho_X\psi_2) - \mu_2\rho_X\psi_2$. The point-wise rate gives an $\ell_\infty$ bound on the residual vector $r$; suppose $\|r\| \le \varepsilon\|\rho_X\psi_2\|$ for some small number $\varepsilon$. Meanwhile, for
Meanwhile, forany l = 1 , , · · · , N , the crude bound of eigenvalues λ gives that λ > µ + γ K , where γ K > O (1) constant determined by k max and M . Because empirical eigenvalues are sorted, λ l for l ≥ γ K away from µ . As a result, | λ l − µ | > γ K > , l (cid:54) = 2 , ≤ l ≤ N. l (cid:54) = 2, u Tl r = u Tl ( L N ( ρ X ψ ) − µ ρ X ψ ) = ( λ l − µ ) u Tl ( ρ X ψ ),which gives that | u Tl ( ρ X ψ ) | = | u Tl r || λ l − µ | ≤ εγ K (cid:107) u l (cid:107) (cid:107) ρ X ψ (cid:107) . This shows that ρ X ψ has O ( ε ) alignment with all the other eigenvectors than u , and since { u , · · · , u N } are orthogonal basis in R N , this guarantees 1 − O ( ε ) alignment between ρ X ψ and u .To proceed, we used the point-wise rate of graph Laplacian with C kernel h as in the next theorem.The analysis of point-wise convergence was given in [27] and [9]: The original theorem in [27] considers thenormalized graph Laplacian ( I − D − W ). The analysis is similar for ( D − W ) and leads to the same rate,which was derived in [9] under the setting of variable kernel bandwidth. These previous works considera fixed point x on M , and since the concentration result has exponentially high probability, it directlygives the version of uniform error bound at every data point x i , which is needed here. Theorem 5.1 ([27, 9]) . Under Assumptions 1 and 2, if as N → ∞ , (cid:15) → , (cid:15) d/ = Ω( log NN ) , then forany f ∈ C ( M ) ,1) When N is large enough, w.p. > − N − , (cid:15) m m (cid:0) ( I − D − W )( ρ X f ) (cid:1) i = − ∆ p f ( x i ) + ε i , sup ≤ i ≤ N | ε i | = O ( (cid:15) ) + O ( (cid:114) log NN (cid:15) d/ ) .
2) When $N$ is large enough, w.p. $> 1-2N^{-9}$,
$$ \frac{2}{\epsilon m_2\,p(x_i)}\frac1N\big((D-W)(\rho_Xf)\big)_i = -\Delta_pf(x_i) + \varepsilon_i, \quad \sup_{1\le i\le N}|\varepsilon_i| = O(\epsilon) + O\Big(\sqrt{\tfrac{\log N}{N\epsilon^{d/2+1}}}\Big). $$
The constants in the big-$O$ notations depend on $\mathcal{M}$, $p$ and the $C^4$ norm of $f$.

Note that Theorem 5.1 holds for non-uniform $p$, while in our eigen-convergence analysis of the graph Laplacian with $W$ below, we only use the result when $p$ is uniform. Meanwhile, similarly to Theorem 3.2, Assumption 2(C3) may be relaxed for Theorem 5.1 to hold, c.f. Remark 1.

Proof of Theorem 5.1.
Consider the $N$ events that $\varepsilon_i$ is less than the error bound. For the $i$-th event, conditioning on $x_i$, Theorem 3.10 in [9] can be directly used to show that the event holds w.p. $> 1-2N^{-10}$ for case 1), the random-walk graph Laplacian. For case 2), the un-normalized graph Laplacian, adopting the same technique as in Theorem 3.8 of [9] proves the same rate for the fixed-bandwidth kernel (Theorem 3.10 is the fixed-bandwidth version of Theorem 3.7 therein) and gives that the event holds w.p. $> 1-2N^{-10}$. Specifically, the proof is by showing the concentration of $\frac{1}{\epsilon N}\sum_{j=1}^NK_\epsilon(x_i,x_j)(f(x_j)-f(x_i))$, which is an independent summation conditioning on $x_i$. The r.v. $H_j := \frac1\epsilon K_\epsilon(x_i,x_j)(f(x_j)-f(x_i))$, $j\ne i$, has expectation $\mathbb{E}H_j = \frac{m_2}{2}p(x_i)\Delta_pf(x_i) + O_{f,p}(\epsilon)$, and $\mathbb{E}H_j^2$ can be shown to be bounded by $\Theta(\epsilon^{-d/2-1})$, while $|H_j|$ is also bounded by $\Theta(\epsilon^{-d/2-1})$, following the same calculation as in the proof of Theorem 3.8 in [9]. This shows that the bias error is $O(\epsilon)$ and the variance error is $O(\sqrt{\frac{\log N}{N\epsilon^{d/2+1}}})$, by classical Bernstein. Same as in Theorem 3.2, $C^2$ regularity and decay up to the 2nd derivative of $h$ are enough here. Strictly speaking, the analysis in [9] is for the "$\frac{1}{N-1}\sum_{j\ne i}$" summation and not the "$\frac1N\sum_{j\ne i}$" one here. However, the difference between $\frac{1}{N-1}$ and $\frac1N$ only introduces an $O(\frac1N)$ relative error, which is of higher order, same as in the proof of Lemma 3.5, and the $i=j$ term cancels out in the summation of $(D-W)\rho_Xf$. Then, by the independence of the $x_i$'s, each $i$-th event holds with the same high probability. The current theorem, in both 1) and 2), follows by a union bound.

We are ready for Step 2 for the un-normalized graph Laplacian $L_{un} = \frac{2}{m_2}\frac{1}{\epsilon pN}(D-W)$.
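The point-wise statement in Theorem 5.1 2) can be illustrated numerically. The sketch below is illustrative, not from the paper; its assumptions are: equispaced points on the unit circle (uniform $p = 1/(2\pi)$, deterministic grid, so only the bias error appears), heat-kernel normalized Gaussian (so $m_0 = 1$ and $\frac{m_2}{2} = 1$), and test function $f = \cos\theta$, for which $-\Delta f = \cos\theta$:

```python
import numpy as np

N, eps = 500, 0.01
theta = 2 * np.pi * np.arange(N) / N
X = np.stack([np.cos(theta), np.sin(theta)], axis=1)
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = (4 * np.pi * eps) ** (-0.5) * np.exp(-sq / (4 * eps))  # heat-normalized Gaussian, d = 1
f = np.cos(theta)                                          # -Delta f = f on the circle
p = 1.0 / (2 * np.pi)
# left-hand side of Theorem 5.1 2) with m_2/2 = 1:
# (1/(eps * p * N)) * ((D - W) rho_X f)_i  ->  -Delta f(x_i)
Lf = (W.sum(1) * f - W @ f) / (eps * p * N)
sup_err = np.max(np.abs(Lf - np.cos(theta)))               # O(eps) bias on this grid
```

With i.i.d. samples instead of a grid, the additional variance term $O(\sqrt{\log N/(N\epsilon^{d/2+1})})$ of the theorem appears on top of the $O(\epsilon)$ bias observed here.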
Here we consider eigenvectors normalized to have 2-norm one, i.e., $L_{un}u_k = \lambda_ku_k$, $u_k^Tu_l = \delta_{kl}$, and we compare $u_k$ to
$$ \phi_k := \frac{1}{\sqrt{pN}}\rho_X\psi_k \in \mathbb{R}^N, \qquad (39) $$
where the $\psi_k$ are the population eigenfunctions, orthonormal in $H = L^2(\mathcal{M},dV)$, same as above.

Proposition 5.2.
Under Assumption 1 (A1), $p$ being uniform on $\mathcal{M}$, and $h$ Gaussian. For fixed $k_{max}\in\mathbb{N}$, $K = k_{max}+1$, assume that the eigenvalues $\mu_k$ for $k\le K$ are all of single multiplicity, $\gamma_K>0$ is as defined in (22), and the constant $c_K$ is as in Proposition 4.1. If as $N\to\infty$, $\epsilon\to0+$, $\epsilon^{d/2+2} > c_K\frac{\log N}{N}$, then for sufficiently large $N$, w.p. $> 1-2K^2N^{-10}-(2K+4)N^{-9}$, there exist scalars $\alpha_k\ne0$, actually $|\alpha_k| = 1+o(1)$, such that
$$ \|u_k - \alpha_k\phi_k\|_2 = O\Big(\epsilon,\ \sqrt{\tfrac{\log N}{N\epsilon^{d/2+1}}}\Big), \quad 1\le k\le k_{max}. $$

Proof of Proposition 5.2.
The proof uses the same approach as that of Theorem 2.6 part 2) in [7]; since our setting is different, we include a proof for completeness.

When $k=1$, we always have $\lambda_1 = \mu_1 = 0$, $u_1$ is the constant vector $u_1 = \frac{1}{\sqrt N}\mathbf{1}_N$, and $\psi_1$ is the constant function, so that $\phi_1 = u_1$ up to a sign. Under the condition of the current proposition, the assumptions of Proposition 4.1 are satisfied, and because $\epsilon^{d/2+2} > c_K\frac{\log N}{N}$ implies that $\epsilon^{d/2} = \Omega(\frac{\log N}{N})$, the assumptions of Theorem 5.1 2) are also satisfied. We apply Theorem 5.1 2) to the $K$ functions $\psi_1,\cdots,\psi_K$. By a union bound, we have that when $N$ is large enough, w.p. $> 1-2KN^{-9}$,
$$ \|L_{un}\phi_k - \mu_k\phi_k\|_\infty = \frac{1}{\sqrt{pN}}\Big(O(\epsilon) + O\Big(\sqrt{\tfrac{\log N}{N\epsilon^{d/2+1}}}\Big)\Big) \quad \text{for } 2\le k\le K. $$
By that $\|v\| \le \sqrt N\|v\|_\infty$ for any $v\in\mathbb{R}^N$, this gives that there is $\mathrm{Err}_{pt} > 0$ s.t.
$$ \|L_{un}\phi_k - \mu_k\phi_k\| \le \mathrm{Err}_{pt}, \quad 2\le k\le K, \quad \mathrm{Err}_{pt} = O(\epsilon) + O\Big(\sqrt{\tfrac{\log N}{N\epsilon^{d/2+1}}}\Big). \qquad (40) $$
The constants in the big-$O$'s depend on the first $K$ eigenfunctions and are absolute ones because $K$ is fixed. Applying Proposition 4.1, and considering the intersection with the good event in Proposition 4.1, we have for each $2\le k\le K$ that $|\mu_k - \lambda_k| < \gamma_K$. By the definition of $\gamma_K$ in (22),
$$ \min_{1\le j\le N,\ j\ne k}|\mu_k - \lambda_j| > \gamma_K > 0, \quad 2\le k\le k_{max}. \qquad (41) $$
For each $k\le k_{max}$, let $S_k = \mathrm{Span}\{u_k\}$ be the one-dimensional subspace of $\mathbb{R}^N$, and let $S_k^\perp$ be its orthogonal complement. We will show that $\|P_{S_k^\perp}\phi_k\|$ is small. By definition, $P_{S_k^\perp}\mu_k\phi_k = \sum_{j\ne k,\,j=1}^N\mu_k(u_j^T\phi_k)u_j$, and meanwhile $P_{S_k^\perp}L_{un}\phi_k = \sum_{j\ne k,\,j=1}^N(u_j^TL_{un}\phi_k)u_j = \sum_{j\ne k,\,j=1}^N\lambda_j(u_j^T\phi_k)u_j$. Subtracting the two gives that $P_{S_k^\perp}(\mu_k\phi_k - L_{un}\phi_k) = \sum_{j\ne k,\,j=1}^N(\mu_k-\lambda_j)(u_j^T\phi_k)u_j$.
By that the $u_j$ are orthonormal vectors, and by (41),
$$ \|P_{S_k^\perp}(\mu_k\phi_k - L_{un}\phi_k)\|^2 = \sum_{j\ne k,\,j=1}^N(\mu_k-\lambda_j)^2(u_j^T\phi_k)^2 \ge \gamma_K^2\sum_{j\ne k,\,j=1}^N(u_j^T\phi_k)^2 = \gamma_K^2\|P_{S_k^\perp}\phi_k\|^2. $$
Then, combined with (40), we have that
$$ \gamma_K\|P_{S_k^\perp}\phi_k\| \le \|P_{S_k^\perp}(\mu_k\phi_k - L_{un}\phi_k)\| \le \|\mu_k\phi_k - L_{un}\phi_k\| \le \mathrm{Err}_{pt}, $$
namely, $\|P_{S_k^\perp}\phi_k\| \le \frac{\mathrm{Err}_{pt}}{\gamma_K}$. By definition, $P_{S_k^\perp}\phi_k = \phi_k - (u_k^T\phi_k)u_k$, where $\|u_k\| = 1$. Note that the $\phi_k$ are unit vectors up to an $O(\sqrt{\frac{\log N}{N}})$ error: because the good event in Proposition 4.1 is contained in that of the eigenvalue UB Proposition 3.1, and specifically that of Lemma 3.4, (17) holds, which means that $|\|\phi_k\|^2 - 1| \le \mathrm{Err}_{norm}$, $1\le k\le K$, where $\mathrm{Err}_{norm} = O(\sqrt{\frac{\log N}{N}})$. Then, one can verify that
$$ |u_k^T\phi_k| = 1 + O(\mathrm{Err}_{norm}, \mathrm{Err}_{pt}) = 1 + o(1), \qquad (42) $$
and then we set $\alpha_k = (u_k^T\phi_k)^{-1}$, and have that
$$ \|\alpha_k\phi_k - u_k\| = \frac{\|P_{S_k^\perp}\phi_k\|}{|u_k^T\phi_k|} \le \frac{O(\mathrm{Err}_{pt})}{1 - O(\mathrm{Err}_{norm},\mathrm{Err}_{pt})} = O(\mathrm{Err}_{pt})\big(1 + O(\mathrm{Err}_{norm},\mathrm{Err}_{pt})\big) = O(\mathrm{Err}_{pt}). $$
The bound holds for each $k\le k_{max}$.

Proposition 5.3.
Under the same condition of Proposition 5.2, where k_max is fixed. Then, for sufficiently large N, with the same indicated high probability,

|µ_k − λ_k| = O( ε, √((log N)/(N ε^{d/2})) ),  2 ≤ k ≤ k_max.

Proof of Proposition 5.3.
We inherit the notations in the proof of Proposition 5.2. Again µ₁ = λ₁ = 0. For 2 ≤ k ≤ k_max, note that

u_k^T (L_un φ_k − µ_k φ_k) = (λ_k − µ_k) u_k^T φ_k,  (43)

and meanwhile, we have shown that u_k = α_k φ_k + ε_k, where |α_k| = 1 + o(1) and ‖ε_k‖₂ = O(Err_pt). Thus the l.h.s. of (43) equals

(α_k φ_k + ε_k)^T (L_un φ_k − µ_k φ_k) = α_k (φ_k^T L_un φ_k − µ_k ‖φ_k‖₂²) + ε_k^T (L_un φ_k − µ_k φ_k) =: ① + ②.

By definition of φ_k, φ_k^T L_un φ_k = (1/(pN)) (ρ_X ψ_k)^T L_un (ρ_X ψ_k) = (1/p) E_N(ρ_X ψ_k). The good event in Proposition 5.2 is under the good event E_UB, under which Lemma 3.3 and Lemma 3.4 hold. Then by (16), E_N(ρ_X ψ_k) = p µ_k + O( ε, √((log N)/(N ε^{d/2})) ); by (17), ‖φ_k‖₂² = 1 + O(√((log N)/N)). Putting together, and by that |α_k| = 1 + o(1) = O(1),

① = α_k (φ_k^T L_un φ_k − µ_k ‖φ_k‖₂²) = O(1) ( µ_k + O( ε, √((log N)/(N ε^{d/2})) ) − µ_k (1 + O(√((log N)/N))) ) = O( ε, √((log N)/(N ε^{d/2})) ).

Meanwhile, by (40), ‖L_un φ_k − µ_k φ_k‖₂ ≤ Err_pt, and then |②| ≤ ‖ε_k‖₂ ‖L_un φ_k − µ_k φ_k‖₂ = O(Err_pt²). Because ε^{d/2+2} > c_K (log N)/N for some c_K > 0, (log N)/(N ε^{d/2+1}) = ε · (log N)/(N ε^{d/2+2}) < ε/c_K, thus Err_pt = O( ε + √((log N)/(N ε^{d/2+1})) ) = O(√ε), and then ② = O(Err_pt²) = O(ε). Back to (43), we have that

|λ_k − µ_k| |u_k^T φ_k| = |① + ②| = O( ε, √((log N)/(N ε^{d/2})) ) + O(ε),

and by (42), |u_k^T φ_k| = 1 + o(1), thus |λ_k − µ_k| = |① + ②|/(1 − o(1)) = O(|① + ②|) = O( ε, √((log N)/(N ε^{d/2})) ). The above holds for all k ≤ k_max.

5.3 Eigen-convergence rate

We are ready to prove the main theorems on eigen-convergence of graph Laplacians, when p is uniform and the kernel function h is Gaussian.
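The residual-over-gap projection bound used in Steps 2 and 3 above holds for any symmetric matrix, and can be checked numerically. A minimal sketch (with a generic symmetric matrix and our own variable names, not the graph Laplacian of the paper):

```python
import numpy as np

# Bound used in Step 2: if ||L phi - mu phi||_2 <= Err and
# min_{j != k} |mu - lambda_j| >= gamma, then ||P_{S_k^perp} phi||_2 <= Err/gamma.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50))
L = (A + A.T) / 2                        # a generic symmetric matrix
lams, U = np.linalg.eigh(L)              # lambda_j ascending, orthonormal u_j

k = 3
u_k = U[:, k]
phi = u_k + 1e-3 * rng.standard_normal(50)   # an approximate eigenvector
phi /= np.linalg.norm(phi)
mu = phi @ L @ phi                       # Rayleigh quotient, proxy for mu_k

res = np.linalg.norm(L @ phi - mu * phi)          # the residual Err
gap = np.min(np.abs(np.delete(lams, k) - mu))     # the spectral gap gamma
perp = np.linalg.norm(phi - (u_k @ phi) * u_k)    # ||P_{S_k^perp} phi||_2
assert perp <= res / gap + 1e-12                  # the projection bound holds
```

The inequality is exactly the computation in the proof of Proposition 5.2: expand the residual in the eigenbasis and lower-bound each off-component coefficient by the gap.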
Theorem 5.4 (eigen-convergence of L_un). Under Assumption 1 (A1), p is uniform on M, and h is Gaussian. For k_max ∈ ℕ fixed, assume that the eigenvalues µ_k for k ≤ K := k_max + 1 are all of single multiplicity, and the constant c_K is as in Proposition 4.1. Consider the first k_max eigenvalues and eigenvectors of L_un, L_un u_k = λ_k u_k, u_k^T u_l = δ_{kl}, and the vectors φ_k defined as in (39). If, as N → ∞, ε → 0, ε^{d/2+2} > c_K (log N)/N, then for sufficiently large N, w.p. > 1 − K² N^{-9} − (2K + 4) N^{-10},

|µ_k − λ_k| = O( ε, √((log N)/(N ε^{d/2})) ),  1 ≤ k ≤ k_max,

and there exist scalars α_k ≠ 0, actually |α_k| = 1 + o(1), such that

‖u_k − α_k φ_k‖₂ = O( ε, √((log N)/(N ε^{d/2+1})) ),  1 ≤ k ≤ k_max.

In particular, when ε ∼ N^{-1/(d/2+3)},

|µ_k − λ_k| = O(N^{-1/(d/2+3)}),  ‖u_k − α_k φ_k‖₂ = O(N^{-1/(d/2+3)} √(log N)),  1 ≤ k ≤ k_max.

When ε = (c′ (log N)/N)^{1/(d/2+2)}, c′ > c_K,

|µ_k − λ_k| = O(((log N)/N)^{1/(d/2+2)}),  ‖u_k − α_k φ_k‖₂ = O(((log N)/N)^{1/(d+4)}),  1 ≤ k ≤ k_max.

Proof.
Under the condition of the theorem, the eigenvector and eigenvalue error bounds have been proved in Proposition 5.2 and Proposition 5.3. For the two specific asymptotic scalings of ε, the rates follow from the bounds involving both ε and N.

Remark 3. With the indicator h = 1_{[0,1)}, the point-wise rate is Err_pt,ind = O( √ε, √((log N)/(N ε^{d/2+1})) ); see [19, 4, 27, 7] among others. While our way of Step 1 cannot be applied to such h, [7] covered this case when d ≥ 2, and provided the eigenvalue and eigenvector consistency up to Err_pt,ind when ε^{d/2} = Ω((log N)/N). The scaling ε^{d/2+2} = Θ̃(N^{-1}) is the optimal one to balance the bias and variance errors in Err_pt,ind, and then it gives the overall error rate as Õ(N^{-1/(d+4)}), which agrees with the eigen-convergence rate in [7]. Here Õ(·) and Θ̃(·) indicate that the constant is possibly multiplied by a factor of a certain power of log N. Meanwhile, we note that, using the Dirichlet form convergence rate, the eigenvalue consistency can be improved to be squared: by Remark 2, the Dirichlet form convergence with indicator h is Err_form,ind = O( ε, √((log N)/(N ε^{d/2})) ). Then, once the initial crude eigenvalue LB is established, in Step 2 the eigenvector 2-norm consistency can be shown to be Err_pt,ind. In Step 3, the eigenvalue consistency for the first k_max eigenvalues can be shown to be O(Err_form,ind, Err_pt,ind²) = O( ε, √((log N)/(N ε^{d/2})) ). This would imply the eigenvalue convergence rate of Õ(N^{-1/(d/2+2)}) under the regime where ε = Θ̃(N^{-1/(d/2+2)}), where the eigenvector consistency remains Õ(N^{-1/(d+4)}), which is the same as with Gaussian h in Theorem 5.4.

Remark 4. The result extends when the population eigenvalues µ_k have multiplicity greater than one. Suppose we consider 0 = µ_(1) < µ_(2) < ⋯ < µ_(M) < ⋯, which are distinct eigenvalues, and µ_(m) has multiplicity l_m ≥ 1. Then let k_max = Σ_{m=1}^M l_m, K = Σ_{m=1}^{M+1} l_m, µ_K = µ_(M+1), and let {µ_k, ψ_k}_{k=1}^K be the sorted eigenvalues and associated eigenfunctions. Step 0, the eigenvalue UB, holds, since Proposition 3.1 does not require single multiplicity. In Step 1, the only place in Proposition 4.1 where single multiplicity of µ_k is used is in the definition of γ_K. Then, by changing to

γ_(M) = (1/2) min_{1 ≤ m ≤ M} (µ_(m+1) − µ_(m)) > 0,  (44)

and defining δ = 0.1 γ_(M)/µ_K, 0 < δ < 1, for fixed M and K, Proposition 4.1 proves that |λ_k − µ_(m)| < γ_(M) for all k ≤ K with µ_k = µ_(m), i.e., m ≤ M + 1. This allows to extend Step 2 Proposition 5.2 by considering the projection P_{S^⊥} where the subspace S in ℝ^N is spanned by the eigenvectors whose eigenvalues λ_k approach µ_k = µ_(m), similarly as in the original proof of Theorem 2.6 part 2) in [7]. Specifically, suppose µ_i = ⋯ = µ_{i+l_m−1} = µ_(m), 2 ≤ m ≤ M, let S^(m) = Span{u_i, ⋯, u_{i+l_m−1}}, and let the index set be I_m := {i, ⋯, i+l_m−1}. For an eigenfunction ψ_k, k ∈ I_m, then µ_k = µ_(m), and similarly as in the proof of Proposition 5.2, one can verify that

‖P_{(S^(m))^⊥}(µ_k φ_k − L_un φ_k)‖₂² = Σ_{j ∉ I_m} (µ_k − λ_j)² (u_j^T φ_k)² ≥ (γ_(M))² Σ_{j ∉ I_m} (u_j^T φ_k)² = (γ_(M))² ‖P_{(S^(m))^⊥} φ_k‖₂²,

which gives that ‖φ_k − P_{S^(m)} φ_k‖₂ = ‖P_{(S^(m))^⊥} φ_k‖₂ ≤ Err_pt/γ_(M), for all k ∈ I_m. By that {φ_k}_{k=1}^K are nearly orthonormal for large N (Lemma 3.4), this proves that there exists an l_m-by-l_m orthogonal transform Q_m, and |α_k| = 1 + o(1), such that ‖u_k − α_k φ′_k‖₂ = O(Err_pt) = O( ε, √((log N)/(N ε^{d/2+1})) ), k ∈ I_m, where φ′_k = Σ_{j=1}^{l_m} (Q_m)_{k,j} φ_{i+j−1}. This proves consistency of the empirical eigenvectors u_k up to the point-wise rate for k ≤ k_max.
Finally, Step 3 Proposition 5.3 extends by considering (43) for u_k and φ′_k, making use of ‖u_k − α_k φ′_k‖₂ = O(Err_pt), the Dirichlet form convergence of E_N(ρ_X ψ_k) (Lemma 3.3), and the fact that {φ′_k}_{k∈I_m} is transformed from {φ_k}_{k∈I_m} by an orthogonal matrix Q_m.

To address the eigen-convergence of L_rw, we define the D/N-weighted 2-norm as ‖u‖²_{D/N} = (1/N) u^T D u, and recall that eigenvectors of L_rw are D-orthogonal. The following theorem is the counterpart of Theorem 5.4 for L_rw, obtaining the same rates.

Theorem 5.5 (eigen-convergence of L_rw). Under the same condition and setting of M, p being uniform, h being Gaussian, and k_max, K, µ_k, ε same as in Theorem 5.4. Consider the first k_max eigenvalues and eigenvectors of L_rw, L_rw v_k = λ_k v_k, v_k^T D v_l = δ_{kl} N m₀ p, i.e., ‖v_k‖²_{D/N} = m₀ p, and the vectors φ_k defined as in (39). Then, for sufficiently large N, w.p. > 1 − K² N^{-9} − (4K + 6) N^{-10}, ‖v_k‖₂ = 1 + o(1), and the same bounds of |µ_k − λ_k| and ‖v_k − α_k φ_k‖₂ as in Theorem 5.4 hold for 1 ≤ k ≤ k_max, with certain scalars α_k satisfying |α_k| = 1 + o(1).

The extension to when µ_k has multiplicity greater than 1 is possible, similarly as in Remark 4. The proof for L_rw uses almost the same method as for L_un, and the difference is that the v_k are no longer orthonormal but D-orthogonal. This is handled by the fact that ‖u‖₂² and (1/(m₀ p)) ‖u‖²_{D/N} agree in relative error up to the form rate, due to the concentration of D_i/N (Lemma 3.5). The detailed proof is left to Appendix C.3.

6 Density-corrected graph Laplacian

We consider p as in Assumption 1 (A2). The density-corrected graph Laplacian is defined as [10]

˜L_rw = (1/(m̃ ε)) (I − D̃^{-1} W̃),  W̃_ij = W_ij/(D_i D_j),  D̃_ii = Σ_{j=1}^N W̃_ij,

where W_ij = K_ε(x_i, x_j) as before, and D is the degree matrix of W.
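As a concrete illustration of the two-sided degree normalization above, the matrices W̃ and D̃^{-1}W̃ can be formed in a few lines. This is a sketch with placeholder data and bandwidth; the ε^{-d/2} prefactor of K_ε cancels inside D̃^{-1}W̃ and is omitted, as is the 1/(m̃ε) scaling of ˜L_rw:

```python
import numpy as np

def density_corrected_operator(X, eps):
    """Sketch of I - D-tilde^{-1} W-tilde from the definition above.

    X: (N, D) data array; eps: squared-bandwidth parameter (placeholder value
    below). The eps^{-d/2} prefactor of K_eps cancels in D-tilde^{-1} W-tilde
    and is omitted; the 1/(m-tilde * eps) scaling of L-tilde_rw is also left out.
    """
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (4 * eps))               # W_ij with Gaussian h(t) = e^{-t/4}
    D = W.sum(axis=1)                         # degrees of W (diagonal of D)
    Wt = W / np.outer(D, D)                   # W-tilde_ij = W_ij / (D_i D_j)
    Dt = Wt.sum(axis=1)                       # D-tilde_ii
    return np.eye(len(X)) - Wt / Dt[:, None]  # I - D-tilde^{-1} W-tilde

# e.g., on points from the unit circle; rows of I - D-tilde^{-1} W-tilde sum to 0
theta = np.linspace(0.0, 2 * np.pi, 200, endpoint=False)
X = np.stack([np.cos(theta), np.sin(theta)], axis=1)
Lmat = density_corrected_operator(X, eps=0.01)
```

Since D̃^{-1}W̃ is row-stochastic by construction, the constant vector is always an exact null vector of the returned matrix, mirroring λ₁ = µ₁ = 0.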
The density-corrected graph Laplacian recovers the Laplace-Beltrami operator when p is not uniform. In this section, we extend the theory of point-wise convergence, Dirichlet form convergence, and eigen-convergence to this graph Laplacian.

6.1 Point-wise convergence of ˜L_rw

This subsection proves Theorem 6.2, which shows that the point-wise rate of ˜L_rw is the same as that of L_rw without the density correction. The result holds for general differentiable h satisfying Assumption 2, and can be of independent interest.

We first establish the counterpart of Lemma 3.5, about the concentration of all N degrees D_i = Σ_{j=1}^N W_ij, when p is not uniform. The deviation bound is uniform for all i and has a bias error of O(ε²).

Lemma 6.1.
Under Assumptions 1 and 2, suppose as N → ∞, ε → 0, ε^{d/2} = Ω((log N)/N). Then,

1) When N is large enough, w.p. > 1 − N^{-10}, D_i > 0 for all i, so that W̃ is well-defined, and

D_i/N = m₀ p̃_ε(x_i) + O( ε², √((log N)/(N ε^{d/2})) ),  p̃_ε := p + m̃ ε (ωp + ∆p),  1 ≤ i ≤ N,  (45)

where ω ∈ C^∞(M) is determined by the manifold extrinsic coordinates, and m̃[h] = m₂[h]/(2 m₀[h]).

2) When N is large enough, w.p. > 1 − 2N^{-10}, D̃_i > 0 for all i, so that ˜L_rw is well-defined, and

Σ_{j=1}^N W_ij/D_j = 1 + O( ε, √((log N)/(N ε^{d/2})) ),  1 ≤ i ≤ N.  (46)

The constants in big-O in parts 1) and 2) depend on (M, p), and are uniform for all i.

The proof is left to Appendix D. The following theorem proves the point-wise rate of ˜L_rw.

Theorem 6.2.
Under Assumptions 1 and 2, if as N → ∞, ε → 0, ε^{d/2+1} = Ω((log N)/N), then for any f ∈ C⁴(M), when N is large enough, w.p. > 1 − 4N^{-9},

(1/(ε m̃)) (I − D̃^{-1} W̃)(ρ_X f)(x_i) = −∆f(x_i) + ε_i,  sup_{1 ≤ i ≤ N} |ε_i| = O(ε) + O(√((log N)/(N ε^{d/2+1}))).

The constants in the big-O notation depend on M, p and the C⁴ norm of f.

The theorem slightly improves the point-wise convergence rate of ˜L_rw proved in [28]. It is proved using the same techniques as in the analysis of the point-wise convergence of L_rw in [27, 9], and we include a proof for completeness.

Proof of Theorem 6.2.
By definition,

−(1/(ε m̃)) (I − D̃^{-1} W̃)(ρ_X f)(x_i) = (1/(ε m̃)) · [ Σ_{j=1}^N W_ij (f(x_j) − f(x_i))/D_j ] / [ Σ_{j=1}^N W_ij/D_j ].  (47)

The proof of Lemma 6.1 has constructed two good events E₁ and E₂ (E₁ is for Part 1) to hold, and Part 2) assumes E₁ and E₂), such that for large enough N, E₁ ∩ E₂ happens w.p. > 1 − 2N^{-10}, under which D_i, D̃_i > 0 for all i, W̃ and ˜L_rw are well-defined, and equations (45), (A.21), and (46) hold. (46) provides the concentration of the denominator of the r.h.s. of (47). We now consider the numerator. Note that, with sufficiently small ε, p̃_ε is uniformly bounded from below by an O(1) constant p′_min. This is because ω, p ∈ C^∞(M) and M is compact, so (ωp + ∆p) is uniformly bounded, and meanwhile p is uniformly bounded from below. Thus, under E₁,

(1/N) Σ_{j=1}^N W_ij (f(x_j) − f(x_i))/(D_j/N) = (1/N) Σ_{j=1}^N W_ij (f(x_j) − f(x_i)) / (m₀ p̃_ε(x_j)(1 + ε_j)),  max_{1 ≤ j ≤ N} |ε_j| = O( ε², √((log N)/(N ε^{d/2})) ),

and the equation equals

(1/N) Σ_{j=1}^N [W_ij (f(x_j) − f(x_i))/(m₀ p̃_ε(x_j))](1 + ε′_j) = (1/N) Σ_{j=1}^N W_ij (f(x_j) − f(x_i))/(m₀ p̃_ε(x_j)) + (1/N) Σ_{j=1}^N W_ij (f(x_j) − f(x_i)) ε′_j/(m₀ p̃_ε(x_j)) =: ① + ②,  max_{1 ≤ j ≤ N} |ε′_j| = O( ε², √((log N)/(N ε^{d/2})) ),

and we analyze the two terms respectively.

To bound |②|, we use W_ij ≥ 0 and p̃_ε(x) ≥ p′_min > 0:

|②| ≤ (1/N) Σ_{j=1}^N W_ij |f(x_j) − f(x_i)| |ε′_j|/(m₀ p̃_ε(x_j)) ≤ (max_{1≤j≤N} |ε′_j|/(m₀ p′_min)) · (1/N) Σ_{j=1}^N W_ij |f(x_j) − f(x_i)|.

We claim that, w.p. > 1 − N^{-9} (we call this good event E₃),

(1/N) Σ_{j=1}^N W_ij |f(x_j) − f(x_i)| = O(√ε),  1 ≤ i ≤ N,  (48)

and the proof is given below.
With (48), under E₃, |②| can be bounded by

|②| = (max_{1≤j≤N} |ε′_j|) O(√ε) = O( ε², √((log N)/(N ε^{d/2})) ) O(√ε) = O( ε^{5/2}, √((log N)/(N ε^{d/2−1})) ).  (49)

The analysis of ① uses the concentration of an independent sum again. Condition on x_i and consider

①′ = (1/(N−1)) Σ_{j≠i, j=1}^N K_ε(x_i, x_j) (f(x_j) − f(x_i))/p̃_ε(x_j) =: (1/(N−1)) Σ_{j≠i, j=1}^N Y_j,

and we have ① = (1/m₀)(1 − 1/N) ①′. Due to the uniform boundedness of p̃_ε from below by p′_min > 0, the |Y_j| are bounded by L_Y = Θ(ε^{-d/2}). We claim that the expectation (proof below) is

E Y_j = ∫_M K_ε(x_i, y) f(y) p(y)/p̃_ε(y) dV(y) − f(x_i) ∫_M K_ε(x_i, y) p(y)/p̃_ε(y) dV(y) = (m₂/2) ε ∆f(x_i) + O(ε²).  (50)

The variance of Y_j is bounded by

E Y_j² = ∫_M K_ε(x_i, y)² ((f(y) − f(x_i))/p̃_ε(y))² p(y) dV(y) ≤ (1/p′_min²) ∫_M K_ε(x_i, y)² (f(y) − f(x_i))² p(y) dV(y) ≤ ν_Y = Θ_{f,p}(ε^{1−d/2}),

which follows the same derivation as in the proof of the point-wise convergence of L_rw without density correction, c.f. Theorem 5.1 1), and can be directly verified by a calculation similar to (52). We attempt the large deviation bound at Θ(√(ν_Y (log N)/N)) ∼ ((log N)/(N ε^{d/2−1}))^{1/2}, which is of smaller order than ν_Y/L_Y = Θ(ε) under the theorem condition that ε^{d/2+1} = Ω((log N)/N). Thus the classical Bernstein inequality gives that, for large enough N, w.p. > 1 − 2N^{-10},

①′ = E Y_j + O(√(ν_Y (log N)/N)) = (m₂/2) ε ∆f(x_i) + O(ε²) + O(√((log N)/(N ε^{d/2−1}))),

and as a result,

① = m̃ ε ∆f(x_i) + O(ε²) + O(√((log N)/(N ε^{d/2−1}))).  (51)

By a union bound over the events needed at the N points, we have that (51) holds at all x_i under a good event E₄ which happens w.p. > 1 − N^{-9}. Putting together, under E₃ and E₄, by (49) and (51), at all x_i,

(1/ε) Σ_{j=1}^N W_ij (f(x_j) − f(x_i))/D_j = m̃ ∆f(x_i) + O(ε) + O(√((log N)/(N ε^{d/2+1}))) + O( ε^{3/2}, √((log N)/(N ε^{d/2+1})) ) = m̃ ∆f(x_i) + O( ε, √((log N)/(N ε^{d/2+1})) ).

Combined with (46), under E₁, E₂, E₃, E₄,

(1/(ε m̃)) [ Σ_{j=1}^N W_ij (f(x_j) − f(x_i))/D_j ] / [ Σ_{j=1}^N W_ij/D_j ] = [ ∆f(x_i) + O( ε, √((log N)/(N ε^{d/2+1})) ) ] / [ 1 + O( ε, √((log N)/(N ε^{d/2})) ) ] = ∆f(x_i) + O( ε, √((log N)/(N ε^{d/2+1})) ).

It remains to establish (48) and (50) to finish the proof of the theorem.

Proof of (48): Define the r.v. Y_j = W_ij |f(x_j) − f(x_i)| and condition on x_i; for j ≠ i, E Y_j = ∫_M K_ε(x_i, y)|f(y) − f(x_i)| p(y) dV(y). Let δ_ε = √(((d+10)/a₂) ε log(1/ε)). For any x ∈ M, K_ε(x, y) = O(ε^{10}) when y ∉ B_{δ_ε}(x), and then

∫_M K_ε(x, y)|f(y) − f(x)| p(y) dV(y) = ∫_{B_{δ_ε}(x)} K_ε(x, y)|f(y) − f(x)| p(y) dV(y) + O(ε^{10}) ‖f‖_∞ ‖p‖_∞ ≤ ∫_{B_{δ_ε}(x)} K_ε(x, y)(‖∇f‖_∞ ‖y − x‖) p(y) dV(y) + O_{f,p}(ε^{10}) = O_{f,p}(√ε) + O_{f,p}(ε^{10}) = O(√ε).
The O_{f,p}(√ε) is obtained because ‖p‖_∞ and ‖∇f‖_∞ are finite constants, and

(1/√ε) ∫_{B_{δ_ε}(x)} K_ε(x, y) ‖y − x‖ dV(y) = ∫_{B_{δ_ε}(x)} ε^{-d/2} h(‖x − y‖²/ε) (‖y − x‖/√ε) dV(y) ≤ ∫_{B_{δ_ε}(x)} ε^{-d/2} a₁ e^{−a₂ ‖x−y‖²/ε} (‖y − x‖/√ε) dV(y) ≤ ∫_{‖u‖ < 1.1 δ_ε, u ∈ ℝ^d} ε^{-d/2} a₁ e^{−0.8 a₂ ‖u‖²/ε} (1.2 ‖u‖/√ε)(1 + O(‖u‖²)) du = O(1),  (52)

where u ∈ ℝ^d is the projected coordinate in the tangent plane T_x(M), and the comparison of ‖x − y‖_{ℝ^D} to ‖u‖ (namely 0.9 ‖x − y‖_{ℝ^D} < ‖u‖ < 1.1 ‖x − y‖_{ℝ^D}) and the volume comparison (namely dV(y) = (1 + O(‖u‖²)) du) hold when δ_ε < δ₁(M), which is a constant depending on M; see, e.g., Lemma A.1 in [9]. Meanwhile, |Y_j| is bounded by L_Y = Θ_f(ε^{-d/2}), and the variance of Y_j is bounded by E Y_j² and then by ν_Y = Θ(ε^{1−d/2}), by a calculation similar to (52). We attempt the large deviation bound at Θ(√(ν_Y (log N)/N)) ∼ ((log N)/(N ε^{d/2−1}))^{1/2}, which is of smaller order than ν_Y/L_Y = Θ(ε) under the theorem condition that ε^{d/2+1} = Ω((log N)/N). Thus, for fixed i, w.p. > 1 − 2N^{-10},

(1/(N−1)) Σ_{j≠i} Y_j = E Y_j + O(√((log N)/(N ε^{d/2−1}))) = O(√ε)(1 + o(1)) = O(√ε).

The term (1/N) Y_i is bounded by O(ε^{-d/2} N^{-1}) = o(√ε).
By the same argument of the independence of x_i from {x_j}_{j≠i} and a union bound over the N events, we have proved (48).

Proof of (50): Note that

p/p̃_ε = 1/(1 + ε m̃ (ω + ∆p/p)) = 1 − ε m̃ (ω + ∆p/p) + ε² r_ε = 1 − ε r + ε² r_ε,

where r := m̃ (ω + ∆p/p) is a deterministic function, r ∈ C^∞(M); r_ε ∈ C^∞(M), and ‖r_ε‖_∞ = O(1) when ε is less than some O(1) threshold, due to the fact that ‖ω + ∆p/p‖_∞ = O(1). Then,

∫_M K_ε(x_i, y) (f p/p̃_ε)(y) dV(y) = ∫_M K_ε(x_i, y) f(y)(1 − ε r + ε² r_ε)(y) dV(y) = ∫_M K_ε(x_i, y) f(y) dV(y) − ε ∫_M K_ε(x_i, y)(f r)(y) dV(y) + ε² ∫_M K_ε(x_i, y)(f r_ε)(y) dV(y) = ( m₀ f(x_i) + (m₂/2) ε (ω f + ∆f)(x_i) + O(ε²) ) − ε ( m₀ (f r)(x_i) + O(ε) ) + O(ε²) = m₀ f(x_i) + ε ( (m₂/2)(ω f + ∆f) − m₀ f r )(x_i) + O(ε²),

and taking f = 1 gives that

∫_M K_ε(x_i, y) (p/p̃_ε)(y) dV(y) = m₀ + ε ( (m₂/2) ω − m₀ r )(x_i) + O(ε²).

Putting together and subtracting the two terms in (50) proves that E Y_j = (m₂/2) ε ∆f(x_i) + O(ε²).

6.2 Dirichlet form convergence

The graph Dirichlet form of the density-corrected graph Laplacian is defined as

˜E_N(u) := (m₀/(m̃ ε)) u^T (D̃ − W̃) u = (m₀/(2 m̃ ε)) Σ_{i,j=1}^N W̃_{i,j} (u_i − u_j)² = (m₀/(2 m̃ ε)) Σ_{i,j=1}^N W_{i,j} (u_i − u_j)²/(D_i D_j).  (53)

We establish the counterpart of Theorem 3.2, which achieves the same form rate. The theorem is for general differentiable h, and can be of independent interest.

Theorem 6.3.
Under Assumptions 1 and 2, if as N → ∞, ε → 0, ε^{d/2} N = Ω(log N), then for any f ∈ C^∞(M), when N is sufficiently large, w.p. > 1 − N^{-10} − N^{-9},

˜E_N(ρ_X f) = ⟨f, −∆f⟩ + O_{p,f}( ε, √((log N)/(N ε^{d/2})) ).

Proof of Theorem 6.3. By definition (53),

˜E_N(ρ_X f) = (m₀/(2 m̃ ε)) (1/N²) Σ_{i,j=1}^N W_{i,j} (f(x_i) − f(x_j))² / ((D_i/N)(D_j/N)).

The following lemma (proved in Appendix D) makes use of the concentration of D_i/N to reduce the graph Dirichlet form to a V-statistic, up to a relative error at the form rate.

Lemma 6.4.
Under the good event in Lemma 6.1 1),

˜E_N(u) = (1/(m₂[h] ε N²)) Σ_{i,j=1}^N W_{i,j} (u_i − u_j)²/(p(x_i) p(x_j)) · (1 + O( ε, √((log N)/(N ε^{d/2})) )),  ∀u ∈ ℝ^N,

and the constant in big-O is determined by (M, p) and is uniform for all u.

We consider under the good event in Lemma 6.1 1), which is called E₁ and happens w.p. > 1 − N^{-10}. Then, applying Lemma 6.4 with u = ρ_X f, we have that

˜E_N(ρ_X f) = [ (1/(m₂ ε N²)) Σ_{i,j=1}^N W_{i,j} (f(x_i) − f(x_j))²/(p(x_i) p(x_j)) ] (1 + O( ε, √((log N)/(N ε^{d/2})) )) =: ③ (1 + O( ε, √((log N)/(N ε^{d/2})) )).  (54)

The term ③ in (54) equals (1/N²) Σ_{i,j=1}^N V_{i,j}, where V_{i,j} := (1/(m₂ ε)) K_ε(x_i, x_j) (f(x_i) − f(x_j))²/(p(x_i) p(x_j)), and V_{i,i} = 0. We follow the same approach as in the proof of Theorem 3.5 in [9] to analyze this V-statistic, and show that (proof in Appendix D)

③ = ⟨f, −∆f⟩ + O_{f,p}( ε, √((log N)/(N ε^{d/2})) ).  (55)

Back to (54), we have shown that, under the intersection of E₁ with the good event of (55),

˜E_N(ρ_X f) = ③ (1 + O( ε, √((log N)/(N ε^{d/2})) )) = ( ⟨f, −∆f⟩ + O( ε, √((log N)/(N ε^{d/2})) ) )(1 + O( ε, √((log N)/(N ε^{d/2})) )) = ⟨f, −∆f⟩ + O( ε, √((log N)/(N ε^{d/2})) ),

and the constant in big-O depends on M, f and p.

6.3 Eigen-convergence of ˜L_rw

In this subsection, let λ_k be the eigenvalues of ˜L_rw and v_k the associated eigenvectors. By (53), recalling that m̃ = m₂/(2m₀), the analogue of (8) is the following:

λ_k = min_{L ⊂ ℝ^N, dim(L)=k} sup_{v ∈ L, v ≠ 0} (1/(ε m̃)) [v^T (D̃ − W̃) v]/(v^T D̃ v) = min_{L} sup_{v} (1/m₀) ˜E_N(v)/(v^T D̃ v),  1 ≤ k ≤ N.  (56)

The methodology is the same as before, with a main difference in the definition of the heat interpolation mapping, which carries weights p(x_j) as in (57).
This gives rise to the p-weighted quadratic form q̃_s(u) defined in (58), for which we derive the concentration argument for q̃_s^{(0)} in (A.33) and the upper bound of q̃_s^{(2)} in Lemma D.2. The other difference is that the D̃-weighted 2-norm is considered, because the eigenvectors are D̃-orthogonal. All the proofs of Steps 0-3 are left to Appendix D.

Step 0. We first establish the eigenvalue UB, based on Lemma 6.1 and the form rate in Theorem 6.3.

Proposition 6.5 (Eigenvalue UB of ˜L_rw). Under Assumptions 1 and 2, for fixed K ∈ ℕ, suppose µ₁ < ⋯ < µ_K < ∞ are all of single multiplicity. If, as N → ∞, ε → 0, and ε^{d/2} = Ω((log N)/N), then for sufficiently large N, w.p. > 1 − N^{-9} − K² N^{-9}, ˜L_rw is well-defined, and

λ_k ≤ µ_k + O( ε, √((log N)/(N ε^{d/2})) ),  k = 1, ⋯, K.

Step 1. Eigenvalue crude LB. We prove with the p-weighted interpolation mapping defined as

Ĩ_r[u] = (1/N) Σ_{j=1}^N (u_j/p(x_j)) H_r(x, x_j) = I_r[ũ],  ũ_i = u_i/p(x_i).  (57)

Then, same as before, ⟨Ĩ_r[u], Ĩ_r[u]⟩ = q_{δε}(ũ), and ⟨Ĩ_r[u], Q_t Ĩ_r[u]⟩ = q_ε(ũ), where for s > 0,

q̃_s(u) := (1/N²) Σ_{i,j=1}^N H_s(x_i, x_j) u_i u_j/(p(x_i) p(x_j)) = q_s(ũ) = q̃_s^{(0)}(u) − q̃_s^{(2)}(u),
q̃_s^{(0)}(u) := (1/N) Σ_{i=1}^N u_i² (1/N) Σ_{j=1}^N H_s(x_i, x_j)/(p(x_i) p(x_j)),  q̃_s^{(2)}(u) := (1/(2N²)) Σ_{i,j=1}^N H_s(x_i, x_j) (u_i − u_j)²/(p(x_i) p(x_j)).  (58)

Proposition 6.6 (Initial crude eigenvalue LB of ˜L_rw). Under Assumption 1, and h is Gaussian. For fixed k_max ∈ ℕ, K = k_max + 1, and µ_k, ε and N satisfying the same condition as in Proposition 4.1, where the definition of c_K is the same except that the constant c in it depends on (M, p). Then, for sufficiently large N, w.p. > 1 − K² N^{-9} − N^{-9}, λ_k > µ_k − γ_K, for k = 2, ⋯, K.

Steps 2-3.
We prove the eigenvector consistency and the refined eigenvalue convergence rate. Define

‖u‖_{D̃}² := Σ_{i=1}^N u_i² D̃_i,  ∀u ∈ ℝ^N.  (59)

The proof uses the same techniques as before, and the differences are in handling the D̃-orthogonality of the eigenvectors and in using the concentration arguments in Lemma 6.1. Same as before, the extension to when µ_k has multiplicity greater than 1 is possible (Remark 4).

Theorem 6.7 (eigen-convergence of ˜L_rw). Under the same condition and setting of M, with p as in Assumption 1 (A2), h being Gaussian, and k_max, K, µ_k, ε same as in Theorem 5.4, where the definition of c_K is the same except that the constant c in it depends on (M, p). Consider the first k_max eigenvalues and eigenvectors of ˜L_rw, ˜L_rw v_k = λ_k v_k, where the v_k are normalized such that m₀ N ‖v_k‖_{D̃}² = 1. Define, for 1 ≤ k ≤ K, φ̃_k := ρ_X(ψ_k)/√N. Then, for sufficiently large N, w.p. > 1 − K² N^{-9} − (4K + 8) N^{-10}, ‖v_k‖₂ = Θ(1), and the same bounds as in Theorem 5.4 hold for |µ_k − λ_k| and ‖v_k − α_k φ̃_k‖₂, for 1 ≤ k ≤ k_max, with certain scalars α_k satisfying |α_k| = 1 + o(1).

Figure 2: Data points are sampled uniformly on S¹. (a) The eigenvalue relative error RelErr_λ, visualized (in log₁₀) as a field on a grid of (log₁₀) N and ε, k_max = 9. The red curve on the left plot indicates the post-selected optimal ε which minimizes the error, and that minimal error as a function of N is plotted on the right in log-log scale. (b) Same plot as (a) for the eigenvector relative error RelErr_v. The relative errors are defined in (60). The empirical errors are averaged over 500 runs of experiments, and the log error values are smoothed over the grid for better visualization. Plots of the raw values are shown in Fig. A.1.

Figure 3: Data points are sampled uniformly on S², same plots as Fig. 2. k_max = 9, and the plots of raw values are shown in Fig. A.2.
7.1 Convergence of L_rw

We test on two simulated datasets, which are uniformly sampled on S¹ (the embedding formula is in Appendix A) and on the unit sphere S² (embedded in ℝ³). For both datasets, we compute over an increasing sequence of sample sizes N and a range of values of ε, where the grid points of both N and ε are evenly spaced in log scale. For each value of N and ε, we generate N data points, construct the kernelized matrix W_ij = K_ε(x_i, x_j) as defined in (1) with Gaussian h, and compute the first 10 eigenvalues λ_k and eigenvectors v_k of L_rw. The errors are computed by

RelErr_λ = Σ_{k=2}^{k_max} |λ_k − µ_k|/µ_k,  RelErr_v = Σ_{k=2}^{k_max} ‖v_k − φ_k‖₂/‖φ_k‖₂,  (60)

where φ_k is as defined by (39). The experiment is repeated for 500 replicas, from which the averaged empirical errors are computed. For the data on S¹, the manifold (in the first 3 coordinates) is illustrated in Fig. 4(a), but the density is uniform here; see more details in Appendix A. The ranges of ε for the two datasets are chosen so that the minimal error over ε for each N is observed, at least for RelErr_λ. Note that for S¹, the population eigenvalues starting from µ₂ are of multiplicity 2, and for S², the multiplicities are 3, 5, ⋯.

The results are shown in Figures 2 and 3. For data on S¹, Fig. 2(a) shows that RelErr_λ as a function of N (with the post-selected best ε) has a convergence order of about N^{-0.4}, which is consistent with the theoretical bound of N^{-1/(d/2+2)} in Theorem 5.5, since d = 1 here. In the left plot of the colored field, the log error values are smoothed over the grid of N and ε, and the best ε scales with N approximately as a power law. The empirical scaling of the optimal ε is less stable to observe: depending on the level of smoothing, the slope of log ε varies between -0.2 and -0.5 (the left plot), while the slope of the best (log) error is always about -0.4 (the right plot). The result without smoothing is shown in Fig. A.1. The eigenvector error in Fig. 2(b) shows a faster order than the theoretical prediction. For the data on S², the eigenvalue convergence shows an order close to the theoretical rate of N^{-1/(d/2+2)} = N^{-1/3} when d = 2. The eigenvector error again shows a faster order than the theory. The small error of the eigenvector estimation at very large values of ε may be due to the symmetry of the simple manifolds S¹ and S². In both experiments, the eigenvector estimation prefers a much larger value of ε than the eigenvalue estimation, which is consistent with the theory.

Figure 4: (a) Randomly sampled data on S¹, the first 3 coordinates shown, colored by the density. (b) Density p and the test function f plotted as functions of the intrinsic coordinate (arc-length) on [0, 1) of S¹. (c) One realization of −˜L_rw(ρ_X f) plotted in comparison with the true function ρ_X(∆f). (d) Log relative error log₁₀ RelErr_pt, as defined in (61), computed over a range of values of ε, averaged over 50 runs of repeated experiments. The two fitted lines show the approximate scaling of RelErr_pt at small ε, where the variance error dominates, and at large ε, where the bias error dominates.

Figure 5: Same eigenvalue and eigenvector relative error plots as Fig. 2, where data are non-uniformly sampled on S¹ as in Fig. 4(a). k_max = 9, and the plots of raw values are shown in Fig. A.3.
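A stripped-down version of the L_rw experiment can be reproduced in a few lines. Here we use the standard unit circle in ℝ² rather than the Appendix A embedding, with N and ε fixed to single placeholder values, and compare the eigenvalues of L_rw against the Laplace-Beltrami spectrum 0, 1, 1, 4, 4, … of S¹; for Gaussian h(t) = e^{-t/4}, the constant m̃[h] works out to 1, so L_rw = (1/ε)(I − D^{-1}W):

```python
import numpy as np

rng = np.random.default_rng(1)
N, eps = 2000, 0.01                           # placeholder grid point of (N, eps)
theta = rng.uniform(0.0, 2 * np.pi, N)        # uniform random samples on S^1
X = np.stack([np.cos(theta), np.sin(theta)], axis=1)

sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
W = np.exp(-sq / (4 * eps))                   # Gaussian h(t) = e^{-t/4}
D = W.sum(axis=1)

# D^{-1} W is similar to the symmetric D^{-1/2} W D^{-1/2}: same eigenvalues.
S = W / np.sqrt(np.outer(D, D))
s = np.linalg.eigvalsh(S)                     # ascending
lam = (1.0 - s[::-1]) / eps                   # eigenvalues of L_rw, ascending

mu = np.array([0.0, 1, 1, 4, 4, 9, 9])        # -Delta on S^1: k^2, multiplicity 2
rel_err = np.abs(lam[1:7] - mu[1:]) / mu[1:]  # the summands of RelErr_lambda in (60)
```

Because µ₂, µ₃, etc. come in multiplicity-2 pairs, individual eigenvectors are determined only up to a rotation within each eigenspace (the orthogonal transform Q_m of Remark 4), so this sketch compares only the eigenvalues.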
7.2 Convergence of ˜L_rw

To examine the density-corrected graph Laplacian, we switch to a non-uniform density on S¹, illustrated in Fig. 4(a). We first investigate the point-wise convergence of −˜L_rw f to ∆f on a test function f: S¹ → ℝ; see more details in Appendix A. The error is computed as

RelErr_pt = ‖−˜L_rw ρ_X f − ρ_X(∆f)‖₂ / ‖ρ_X(∆f)‖₂,  (61)

and the result is shown in Fig. 4. Theorem 6.2 predicts the bias error to be O(ε) and the variance error to be O(ε^{−d/4−1/2}) = O(ε^{−3/4}) since N is fixed, which agrees with Fig. 4(d).

The results of RelErr_λ and RelErr_v are shown in Fig. 5. The order of convergence with the best ε appears to be faster, for both the eigenvalue and eigenvector errors, than that of L_rw (when p is uniform) in Fig. 2, and faster than the theoretical prediction in Theorem 6.7.

8 Discussion

The current result may be extended in several directions. First, for a manifold with smooth boundary, the random-walk graph Laplacian recovers the Neumann Laplacian [10], and one can expect to prove the spectral convergence as well, as in [22]. Second, extension to kernels with variable or adaptive bandwidth [5, 9], and to other normalization schemes, e.g., bi-stochastic normalization [23, 20, 36], would be important to improve the robustness against low sampling density and noise in data, and possibly the spectral convergence as well. Related is the problem of spectral convergence to other manifold diffusion operators, e.g., the Fokker-Planck operator, on L²(M, pdV). It would also be interesting to extend the spectral convergence to more general types of kernel functions h, which are not Gaussian, and possibly not symmetric [37]. At last, further investigation is needed to explain the good spectral convergence observed in experiments, particularly that of the eigenvector convergence and the faster rate with the density-corrected graph Laplacian.
For the eigenvector convergence, the current work focuses on the 2-norm consistency, while the ∞-norm consistency, as has been derived in [11, 8], is also important to study.

Acknowledgement
The authors thank Hau-Tieng Wu for helpful discussion. Cheng thanks Yiping Lu for helpful discussion on the eigen-convergence problem and the proof.
References

[1] Donald Gary Aronson. Bounds for the fundamental solution of a parabolic equation. Bulletin of the American Mathematical Society, 73(6):890–896, 1967.
[2] Mukund Balasubramanian and Eric L. Schwartz. The Isomap algorithm and topological stability. Science, 295(5552):7–7, 2002.
[3] Mikhail Belkin and Partha Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6):1373–1396, 2003.
[4] Mikhail Belkin and Partha Niyogi. Convergence of Laplacian eigenmaps. In Advances in Neural Information Processing Systems, pages 129–136, 2007.
[5] Tyrus Berry and John Harlim. Variable bandwidth diffusion kernels. Applied and Computational Harmonic Analysis, 40(1):68–96, 2016.
[6] Dmitri Burago, Sergei Ivanov, and Yaroslav Kurylev. A graph discretization of the Laplace-Beltrami operator. arXiv preprint arXiv:1301.2222, 2013.
[7] Jeff Calder and Nicolas Garcia Trillos. Improved spectral convergence rates for graph Laplacians on epsilon-graphs and k-NN graphs. arXiv preprint arXiv:1910.13476, 2019.
[8] Jeff Calder, Nicolas Garcia Trillos, and Marta Lewicka. Lipschitz regularity of graph Laplacians on random data clouds. arXiv preprint arXiv:2007.06679, 2020.
[9] Xiuyuan Cheng and Hau-Tieng Wu. Convergence of graph Laplacian with kNN self-tuned kernels. arXiv preprint arXiv:2011.01479, 2020.
[10] Ronald R. Coifman and Stéphane Lafon. Diffusion maps. Applied and Computational Harmonic Analysis, 21(1):5–30, 2006.
[11] David B. Dunson, Hau-Tieng Wu, and Nan Wu. Spectral convergence of graph Laplacian and heat kernel reconstruction in L∞ from random samples. arXiv preprint arXiv:1912.05680v3, 2019.
[12] Ahmed El Alaoui, Xiang Cheng, Aaditya Ramdas, Martin J. Wainwright, and Michael I. Jordan. Asymptotic behavior of ℓp-based Laplacian regularization in semi-supervised learning. In Conference on Learning Theory, pages 879–906, 2016.
[13] Noureddine El Karoui and Hau-Tieng Wu. Graph connection Laplacian methods can be made robust to noise. The Annals of Statistics, 44(1):346–372, 2016.
[14] Justin Eldridge, Mikhail Belkin, and Yusu Wang. Unperturbed: spectral analysis beyond Davis-Kahan. arXiv preprint arXiv:1706.06516, 2017.
[15] M. Flores, J. Calder, and G. Lerman. Algorithms for ℓp-based semi-supervised learning on graphs. arXiv preprint arXiv:1901.05031, 2019.
[16] Alexander Grigor'yan. Gaussian upper bounds for the heat kernel on arbitrary manifolds. Journal of Differential Geometry, 45:33–52, 1997.
[17] Alexander Grigor'yan. Heat Kernel and Analysis on Manifolds, volume 47 of AMS/IP Studies in Advanced Mathematics. American Mathematical Society, Providence, RI, 2009.
[18] Matthias Hein. Uniform convergence of adaptive graph-based regularization. In International Conference on Computational Learning Theory, pages 50–64. Springer, 2006.
[19] Matthias Hein, Jean-Yves Audibert, and Ulrike von Luxburg. From graphs to manifolds – weak and strong pointwise consistency of graph Laplacians. In International Conference on Computational Learning Theory, pages 470–485. Springer, 2005.
[20] Boris Landa, Ronald R. Coifman, and Yuval Kluger. Doubly-stochastic normalization of the Gaussian kernel is robust to heteroskedastic noise. arXiv preprint arXiv:2006.00402, 2020.
[21] Peter Li and Shing Tung Yau. On the parabolic kernel of the Schrödinger operator. Acta Mathematica, 156:153–201, 1986.
[22] Jinpeng Lu. Graph approximations to the Laplacian spectra. Journal of Topology and Analysis, pages 1–35, 2020.
[23] Nicholas F. Marshall and Ronald R. Coifman. Manifold learning with bi-stochastic kernels. IMA Journal of Applied Mathematics, 84(3):455–482, 2019.
[24] Boaz Nadler, Nathan Srebro, and Xueyuan Zhou. Semi-supervised learning with the graph Laplacian: the limit of infinite unlabelled data. Advances in Neural Information Processing Systems, 22:1330–1338, 2009.
[25] Steven Rosenberg. The Laplacian on a Riemannian Manifold: An Introduction to Analysis on Manifolds. Number 31. Cambridge University Press, 1997.
[26] Zuoqiang Shi. Convergence of Laplacian spectra from random samples. arXiv preprint arXiv:1507.00151, 2015.
[27] Amit Singer. From graph to manifold Laplacian: the convergence rate. Applied and Computational Harmonic Analysis, 21(1):128–134, 2006.
[28] Amit Singer and Hau-Tieng Wu. Spectral convergence of the connection Laplacian from random samples. Information and Inference: A Journal of the IMA, 6(1):58–123, 2016.
[29] Dejan Slepčev and Matthew Thorpe. Analysis of p-Laplacian regularization in semisupervised learning. SIAM Journal on Mathematical Analysis, 51(3):2085–2120, 2019.
[30] Ronen Talmon, Israel Cohen, Sharon Gannot, and Ronald R. Coifman. Diffusion maps for signal processing: a deeper look at manifold-learning techniques based on kernels and graphs. IEEE Signal Processing Magazine, 30(4):75–86, 2013.
[31] Daniel Ting, Ling Huang, and Michael Jordan. An analysis of the convergence of graph Laplacians. arXiv preprint arXiv:1101.5435, 2011.
[32] Nicolás García Trillos, Moritz Gerlach, Matthias Hein, and Dejan Slepčev. Error estimates for spectral convergence of the graph Laplacian on random geometric graphs toward the Laplace–Beltrami operator. Foundations of Computational Mathematics, 20(4):827–887, 2020.
[33] Laurens van der Maaten, Eric Postma, and Jaap van den Herik. Dimensionality reduction: a comparative review. Journal of Machine Learning Research, 10(66-71):13, 2009.
[34] Ulrike von Luxburg, Mikhail Belkin, and Olivier Bousquet. Consistency of spectral clustering. The Annals of Statistics, pages 555–586, 2008.
[35] Xu Wang. Spectral convergence rate of graph Laplacian. arXiv preprint arXiv:1510.08110, 2015.
[36] Caroline L. Wormell and Sebastian Reich. Spectral convergence of diffusion maps: improved error bounds and an alternative normalisation. arXiv preprint arXiv:2006.02037, 2020.
[37] Hau-Tieng Wu and Nan Wu. Think globally, fit locally under the manifold setup: asymptotic analysis of locally linear embedding. Annals of Statistics, 46(6B):3805–3837, 2018.
A Details of numerical experiments
In the example of S^1 data, the isometric embedding in R^4 is by

ι(t) = (1/(2π√5)) ( cos(2πt), sin(2πt), (2/3)cos(2π·3t), (2/3)sin(2π·3t) ),

where t ∈ [0, 1) is the intrinsic coordinate of S^1 (arc-length). In the example in Section 7.2 where p is not uniform, the density p(t) equals 1 + sin(2πt) plus a smaller-amplitude higher-frequency sine mode, and the test function f(t) is a fixed linear combination of two sine modes. In the example of S^2 data, samples are on the unit sphere in R^3.

In both plots of the raw error data without smoothing, Figures A.1 and A.2, the slopes of the error convergence rates (about −0.4 and −0.33) are about the same as with smoothing. The slope of the post-selected optimal (log) ε as a function of (log) N changes, due to the closeness of the error values over the multiple values of ε.

B More preliminaries
Throughout the paper, we use the following version of the classical Bernstein inequality, where the tail probability is controlled by ν when t < ν/L.

Figure A.1: Same plots as Fig. 2, where the log error values on the (log) grid of N and ε are without smoothing.

Figure A.2: Same plots as Fig. 3, where the log error values on the (log) grid of N and ε are without smoothing.

Figure A.3: Same plots as Fig. 5, where the log error values on the (log) grid of N and ε are without smoothing.

Lemma B.1 (Classical Bernstein). Let ξ_j, j = 1, …, N, be i.i.d. bounded random variables with E ξ_j = 0. If |ξ_j| ≤ L and E ξ_j² ≤ ν for L, ν > 0, then

Pr[ (1/N) Σ_{j=1}^N ξ_j > t ],  Pr[ (1/N) Σ_{j=1}^N ξ_j < −t ]  ≤  exp{ −t²N / (2(ν + tL)) },  ∀ t > 0.

In particular, when tL < ν, both tail probabilities are bounded by exp{ −Nt²/(4ν) }.

Additional proofs in Section 2:
Proof of Theorem 2.1.
Part 1): We provide a direct verification of (10) based on the parametrix construction for completeness, which is not explicitly included in [25].

First note that there is t₀ > 0, determined by M, s.t. when t < t₀,

∫_M G_t(x, y) dV(y) = ∫_M G_t(y, x) dV(y) ≤ C₁,  ∀ x ∈ M,

for some C₁ > 0 determined by M. This is because ∫_M G_t(x, y) dV(y), up to an O(t) truncation error, equals the integral on B_t := { y ∈ M, d_M(x, y) < δ_t := √(5(d/2+1) t log(1/t)) }. By changing to the projected coordinate u in T_x(M), the integral domain of u is contained in a 1.1δ_t-ball in R^d for small enough δ_t, and then

∫_{B_t} G_t(x, y) dV(y) = (4πt)^{−d/2} ∫_{B_t} e^{−d_M(x,y)²/(4t)} dV(y) ≤ (4πt)^{−d/2} ∫_{u ∈ R^d, ‖u‖ < 1.1δ_t} e^{−0.9‖u‖²/(4t)} (1 + O(δ_t²)) du ≤ Θ(1)(1 + O(t log(1/t))) = O(1).

Next, as has been shown in Chapter 3 of [25], there exist u_l ∈ C^∞(M × M) for l = 0, 1, …, m, where u₀ satisfies the needed property, and we define P_m(t, x, y) = G_t(x, y) (Σ_{l=0}^m t^l u_l(x, y)), with P_m ∈ C^∞((0, ∞), M × M). By Theorem 3.22 of [25],

H_t(x, y) − P_m(t, x, y) = ∫_0^t ds ∫_M Q_m(t − s, x, z) P_m(s, z, y) dV(z),

where by Lemma 3.18 of [25], there is C₂ depending on t₀, and thus determined by M, s.t.

sup_{x,y ∈ M} |Q_m(s, x, y)| ≤ C₂ s^{m−d/2},  ∀ 0 ≤ s ≤ t₀.

As a result, for t < t₀,

|H_t(x, y) − P_m(t, x, y)| ≤ ∫_0^t ds ∫_M |Q_m(t − s, x, z)| G_s(z, y) |Σ_{l=0}^m t^l u_l(z, y)| dV(z) ≤ C₂ t^{m−d/2} (Σ_{l=0}^m ‖u_l‖_∞) ∫_0^t ds ∫_M G_s(z, y) dV(z) ≤ C₂ t^{m−d/2} (Σ_{l=0}^m ‖u_l‖_∞) C₁ t = O(t^{m−d/2+1}).

Part 2) is a classical result proved in several places; see, e.g.,
Theorem 1.1 in [16] combined with sup_{x ∈ M} H_t(x, x) ≤ C t^{−d/2} for some C depending on the manifold, which can be deduced from Part 1). The constant 5 in 5t in the exponential in (11) can be made any constant greater than 4, with the constant C changing accordingly.

Proof of Lemma 2.2. Let m = ⌈d/2 + 3⌉; m is a positive integer with m − d/2 ≥ 3. Since t → 0 and δ_t = o(1), the Euclidean ball of radius δ_t contains the δ_t-geodesic ball and is contained in the (1.1δ_t)-geodesic ball, for small enough t. Then both claims in Theorem 2.1 hold when t < ε₀ for some ε₀ depending on M, and in 1), for y ∈ B_{δ_t}(x) ∩ M, C₂ t^{m−d/2+1} = O(t). Here, choosing a larger m can make this term of higher order in t, yet O(t) is enough for our later analysis.

Proof of (12): We use the shorthand notation Õ(t) to denote O(t log(1/t)). In Theorem 2.1, m is fixed and ‖u_l‖_∞ for l ≤ m are finite constants depending on M, thus

H_t(x, y) = G_t(x, y) (u₀(x, y) + O(t)) + O(t).

Note that d_M(x, y) = ‖x − y‖(1 + O(‖x − y‖²)), and thus when y ∈ B_{δ_t}(x), d_M(x, y)² = O(‖x − y‖²) = O(δ_t²) = Õ(t). By the property of u₀,

u₀(x, y) = 1 + O(d_M(x, y)²) = 1 + Õ(t).

Meanwhile, by the mean value theorem and d_M(x, y) ≥ ‖x − y‖,

e^{−d_M(x,y)²/(4t)} = e^{−‖x−y‖²(1+O(‖x−y‖²))/(4t)} = e^{−‖x−y‖²/(4t)} (1 + O(‖x − y‖⁴/t)),

and then

G_t(x, y) = K_t(x, y)(1 + O(‖x − y‖⁴/t)) = K_t(x, y)(1 + O(t (log(1/t))²)).

Thus, for any y ∈ B_{δ_t}(x) ∩ M,

H_t(x, y) = K_t(x, y)(1 + O(t (log(1/t))²)) (1 + Õ(t) + O(t)) + O(t),

which proves (12), and the constants in the big-O's are all determined by M.

Proof of (13) and (14): When y is outside the δ_t-Euclidean ball, it is outside the δ_t-geodesic ball. Then, by Theorem 2.1 2) and the definition of δ_t,

H_t(x, y) ≤ C t^{−d/2} e^{−δ_t²/(5t)} ≤ C t,

which proves (13). (14) directly follows from (11).

C About graph Laplacians with W

C.1 Proofs in Section 3
Proof of (15) in Remark 2.
We want to show that

(1/ε) ∫_M ∫_M K_ε(x, y)(f(x) − f(y))² p(x) p(y) dV(x) dV(y) = m₂[h] ⟨f, −Δ_p f⟩_p + O(ε).

First consider the case when p is uniform. Denote by B_r(x) the Euclidean ball in R^D centered at x with radius r. When y ∈ B_{√ε}(x) ∩ M, (f(x) − f(y))² = (∇f(x)^T u)² + Q_{x,3}(u) + O(‖u‖⁴), where u ∈ R^d is the local projected coordinate, i.e., with φ_x the projection onto T_x(M), u = φ_x(y − x); also ‖u‖ ≤ ‖y − x‖ < √ε. Q_{x,3}(·) is a third-order polynomial whose coefficients depend on the derivatives of the extrinsic coordinates of M and of f at x. Then,

(1/ε) ∫_M K_ε(x, y)(f(x) − f(y))² dV(y) = ∫_M ε^{−d/2} h(‖x − y‖²/ε) (f(x) − f(y))²/ε dV(y)  (A.1)
= ε^{−d/2} ∫_B̃ h(‖x − y‖²/ε) ( (∇f(x)^T u)²/ε + Q_{x,3}(u)/ε + O(ε) ) (1 + O(ε)) du,  B̃ := φ_x(B_{√ε}(x) ∩ M),

and B̃ ⊂ B_{√ε}(0; R^d), where we used the volume comparison relation dV(y) = (1 + O(‖u‖²)) du. By the metric comparison, ‖y − x‖ = ‖u‖(1 + O(‖u‖²)), thus Vol(B_{√ε}(0; R^d) \ B̃) ≤ Vol(B_{√ε}(0; R^d) \ B_{√ε(1−O(ε))}(0; R^d)) = ε^{d/2} O(ε). Meanwhile, the integration of odd powers of u vanishes on ∫_{B_{√ε}(0; R^d)} du. Thus one can verify that ε^{−d/2} ∫_B̃ h(‖u‖²/ε)(∇f(x)^T u)²/ε du = m₂[h] |∇f(x)|² + O(ε), and ε^{−d/2} ∫_B̃ h(‖u‖²/ε) Q_{x,3}(u)/ε du = O(ε^{3/2}), and thus the l.h.s. of (A.1) equals m₂[h] |∇f(x)|² + O(ε). Integrating over ∫_M dV(x) proves that the bias error is O(ε).
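As a sanity check on the constant m₂[h] appearing above: for the Gaussian choice h(r) = (4π)^{−d/2} e^{−r/4}, which is the kernel used for the spectral results, the moments can be evaluated in closed form,

```latex
m_0[h] = \int_{\mathbb{R}^d} h(\|w\|^2)\, dw
       = (4\pi)^{-d/2} \int_{\mathbb{R}^d} e^{-\|w\|^2/4}\, dw = 1,
\qquad
m_2[h] = \int_{\mathbb{R}^d} w_1^2\, h(\|w\|^2)\, dw = 2,
```

since each coordinate of the density (4π)^{−1/2} e^{−w²/4} has variance 2. After the change of variables u = √ε w, this gives ε^{−d/2} ∫ h(‖u‖²/ε)(∇f(x)^T u)²/ε du = m₂[h] |∇f(x)|², and the normalized constant m̃ = m₂[h]/(2m₀[h]) = 1, which is why no extra constant appears in front of the Laplacian for the Gaussian kernel.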
When p is not uniform, one can similarly show that (1/ε) ∫_M K_ε(x, y)(f(x) − f(y))² p(y) dV(y) = m₂[h] |∇f(x)|² p(x) + O(ε), and the proof extends.

Proof of Lemma 3.3.
Since p is a constant, Δ_p = pΔ. Apply Theorem 3.2 to f = ψ_k, and to (ψ_k ± ψ_l) with k ≠ l, which are K² cases and are all in C^∞(M). Since the set {ψ_k}_{k=1}^K is orthonormal in L²(M, dV),

p^{−1} ⟨ψ_k, −Δ_p ψ_k⟩_p = p μ_k;  p^{−1} ⟨ψ_k ± ψ_l, −Δ_p(ψ_k ± ψ_l)⟩_p = p(μ_k + μ_l),  k ≠ l, 1 ≤ k, l ≤ K.

Under the intersection of the K² good events, which happens with the indicated high probability, (16) holds. The needed threshold of N is the maximum of the K² many ones. These thresholds and the constants in the big-O's depend on p and on ψ_k for k up to K, and K is a fixed integer. This means that these constants are determined by M, and thus are treated as absolute ones.

Proof of Lemma 3.4.
First, for any f ∈ C(M), when N > N_f depending on f, w.p. > 1 − 2N^{−10},

(1/N)‖ρ_X f‖² = ⟨f, f⟩_p + O_f(√(log N/N)).  (A.2)

This is because, by definition, (1/N)‖ρ_X f‖² = (1/N) Σ_{j=1}^N f(x_j)², which is an independent sum of the r.v. Y_j := f(x_j)². E Y_j = ∫_M f(y)² p dV(y) = ⟨f, f⟩_p, with boundedness |Y_j| ≤ L_Y := ‖f‖²_{∞,M}, which is an O_f(1) constant. The variance of Y_j is bounded by E Y_j² = ∫_M f(y)⁴ p dV(y) =: ν_Y, which again is an O_f(1) constant. Since log N/N = o(1), (A.2) follows by the classical Bernstein inequality.

Now consider the K vectors u_k = (1/√p) ρ_X ψ_k. Apply (A.2) to f = (1/√p)ψ_k and (1/√p)(ψ_k ± ψ_l) for k ≠ l, and consider the intersection of the K² good events, which happens w.p. > 1 − 2K²N^{−10} when N exceeds the maximum of the thresholds N_f for the K² cases. By ⟨ψ_k, ψ_l⟩_p = p δ_kl and the polarization formula 4 u_k^T u_l = ‖u_k + u_l‖² − ‖u_k − u_l‖², this gives (17). Both the K² thresholds and all the constants in the big-O's in (17) depend on {ψ_k}_{k=1}^K.

Proof of Lemma 3.5.
Suppose Part 1) has been shown with a uniform constant in the big-O for each i; then under the good event of Part 1), Part 2) holds automatically. In particular, since (19) is a property of the random W_ij only, where the W_ij are determined by the random points x_i and irrelevant to the vector u, the threshold of large N is determined by when Part 1) holds and is uniform for all u.

It suffices to prove Part 1) to finish proving the lemma. For each i, we construct an event under which the bound in (19) holds for D_i, and then apply a union bound. For i fixed,

(1/N) D_i = (1/N) K_ε(x_i, x_i) + (1/N) Σ_{j ≠ i} K_ε(x_i, x_j) =: ① + ②.

By Assumption 2(C2), K_ε(x_i, x_i) = ε^{−d/2} h(0) ≤ Θ(ε^{−d/2}), and thus ① = O(N^{−1} ε^{−d/2}). Consider ②′ := (N − 1)^{−1} Σ_{j ≠ i} K_ε(x_i, x_j), which is an independent sum conditioned on x_i and over the randomness of {x_j}_{j ≠ i}. The (N − 1) r.v. Y_j := K_ε(x_i, x_j), j ≠ i, satisfy (Lemma 8 in [10], Lemma A.6 in [9])

E Y_j = ∫_M K_ε(x_i, y) p dV(y) = p m₀[h] + O(ε),

with |Y_j| ≤ L_Y = Θ(ε^{−d/2}). The variance of Y_j is bounded by

E Y_j² = ∫_M K_ε(x_i, y)² p dV(y) = p ∫_M ε^{−d} h(‖x_i − y‖²/ε)² dV(y),

where, since h(r)² as a function on [0, ∞) also satisfies Assumption 2,

E Y_j² = ε^{−d/2} p (m₀[h²] + O(ε)) ≤ ν_Y = Θ(ε^{−d/2}).

The constants in the big-Θ notation of L_Y and ν_Y are absolute ones depending on M and do not depend on x_i. Since √(log N/(N ε^{d/2})) = o(1), the classical Bernstein inequality gives that when N is sufficiently large, w.p. > 1 − 2N^{−10},

|②′ − E Y_j| = O(√(ν_Y log N/N)) = O(√(log N/(N ε^{d/2}))),  conditioned on x_i.

Under this event, ②′ = O(1), and then ② = (1 − 1/N)②′ gives

② = m₀[h] p + O(ε) + O(√(log N/(N ε^{d/2}))) + O(1/N) = m₀[h] p + O(ε, √(log N/(N ε^{d/2}))),

and then

(1/N) D_i = O(N^{−1} ε^{−d/2}) + m₀[h] p + O(ε, √(log N/(N ε^{d/2}))) = m₀[h] p + O(ε, √(log N/(N ε^{d/2}))).

Since x_i is independent from {x_j}_{j ≠ i}, and the bound is uniform over the location of x_i, we have w.p. > 1 − 2N^{−10} the bound in (19) for each i, and applying a union bound over the N events proves Part 1).

Proof of Proposition 3.6.
Under the condition of the current proposition, Lemma 3.5 applies. For fixed K, take the intersection of the good events in Lemmas 3.5, 3.4 and 3.3, which happens w.p. > 1 − 2K²N^{−10} − 2N^{−10} for large enough N. Same as before, let u_k = (1/√p) ρ_X ψ_k; by Lemma 3.4, the set {u_1, …, u_K} is linearly independent. Let L_k = Span{u_1, …, u_k}; then dim(L_k) = k for each k ≤ K. For any v ∈ L_k, v ≠ 0, there are c_j, 1 ≤ j ≤ k, such that v = Σ_{j=1}^k c_j u_j. Again, by (17), we have (1/N)‖v‖² = ‖c‖²(1 + O(√(log N/N))), and together with Lemma 3.5 2),

(1/(m₀N)) v^T D v = (1/N)‖v‖² (p + O(ε, √(log N/(N ε^{d/2})))) = ‖c‖²(1 + O(√(log N/N)))(p + O(ε, √(log N/(N ε^{d/2})))) = ‖c‖² p (1 + O(ε, √(log N/(N ε^{d/2})))),  (A.3)

and the constant in the O(·) is uniform for all v. For E_N(v), (18) still holds, and since K is fixed it gives

E_N(v) ≤ ‖c‖² ( p μ_k + O(ε, √(log N/(N ε^{d/2}))) ).

Together with (A.3), we have

E_N(v) / ((1/(m₀N)) v^T D v) ≤ (p μ_k + O(ε, √(log N/(N ε^{d/2})))) / (p (1 + O(ε, √(log N/(N ε^{d/2}))))) = μ_k + O(ε, √(log N/(N ε^{d/2}))),

and the r.h.s. upper bounds λ_k(L_rw) by (8).

C.2 Proofs in Section 4

Proof of (25) in Lemma 4.2.
Suppose s is small enough such that Lemma 2.2 holds with ε being s here. For each i, we construct an event under which the bound in (25) holds for (D_s)_i, and then apply a union bound. For i fixed,

(D_s)_i = (1/N) H_s(x_i, x_i) + (1/N) Σ_{j ≠ i} H_s(x_i, x_j) =: ① + ②.

By (14), H_s(x_i, x_i) = O(s^{−d/2}), and thus ① = O(N^{−1} s^{−d/2}). Consider ②′ := (N − 1)^{−1} Σ_{j ≠ i} H_s(x_i, x_j), which is an independent sum conditioned on x_i and over the randomness of {x_j}_{j ≠ i}. The (N − 1) r.v. Y_j := H_s(x_i, x_j), j ≠ i, satisfy E Y_j = ∫_M H_s(x_i, y) p dV(y) = p, and boundedness: again by (14), |Y_j| ≤ L_Y = Θ(s^{−d/2}). The variance of Y_j is bounded by E Y_j² = ∫_M H_s(x_i, y)² p dV(y) = p H_{2s}(x_i, x_i) ≤ ν_Y = Θ(s^{−d/2}). The constants in the big-Θ notation of L_Y and ν_Y are from (14), which only depend on M and not on x_i; we use the notation O_M(·) to stress this. Since √(log N/(N s^{d/2})) = o(1), the classical Bernstein inequality gives that with sufficiently large N, w.p. > 1 − 2N^{−10},

|②′ − p| = O(√(ν_Y log N/N)) = O_M(√(log N/(N s^{d/2}))),  conditioned on x_i.

The rest of the proof is the same as that of Lemma 3.5 1): namely, by ② = (1 − 1/N)②′, one can verify that both ② and then (D_s)_i equal p + O_M(√(log N/(N s^{d/2}))) w.p. > 1 − 2N^{−10}, and then (25) follows from applying a union bound over the N events.

Proof of Proposition 4.4.
The proof is by the same method as that of Proposition 4.1; the difference is that the eigenvectors are D-orthogonal here and normalized differently. Denote λ_k(L_rw) by λ_k, and let

L_rw v_k = λ_k v_k,  normalized s.t. (1/N) v_k^T D v_l = δ_kl,  1 ≤ k, l ≤ N.

Note that this normalization of v_k differs from the one used in the final eigen-convergence rate result, Theorem 5.5, because the current proposition concerns eigenvalues only.

Because ε^{d/2+1} > c_K log N/N, ε^{d/2} = Ω(log N/N), and then the conditions needed in Proposition 3.6 are satisfied. Thus, with sufficiently large N, there is an event E′_UB which happens w.p. > 1 − 2N^{−10} − 2K²N^{−10}, under which D_i > 0 ∀i, s.t. L_rw is well-defined, and (32) holds for λ_k = λ_k(L_rw). Because the good event E′_UB in Proposition 3.6 assumes the good event in Lemma 3.5, (20) also holds for all the v_k and v_k ± v_l, which gives (m₀[h] = 1 because h is Gaussian)

1 = (1/N) v_k^T D v_k = (1/N)‖v_k‖² (p + O(ε, √(log N/(N ε^{d/2})))),  1 ≤ k ≤ K,
(1/N)(v_k ± v_l)^T D (v_k ± v_l) = (1/N)‖v_k ± v_l‖² (p + O(ε, √(log N/(N ε^{d/2})))),  k ≠ l, 1 ≤ k, l ≤ K,

and, equivalently (because p > 0),

(1/N)‖v_k‖² = (1/p)(1 + O(ε, √(log N/(N ε^{d/2})))),  1 ≤ k ≤ K,
(1/N)‖v_k ± v_l‖² = (1/p)(2 + O(ε, √(log N/(N ε^{d/2})))),  k ≠ l, 1 ≤ k, l ≤ K.  (A.4)

We set δ, r, t in the same way, and let f_k = I_r[v_k], f_k ∈ C^∞(M). Because the good event E^(0) only concerns randomness of H_{δε}(x_i, x_j), under E^(0), which happens w.p. > 1 − 2N^{−10},

q^(0)_{δε}(v_k) = (1/N)‖v_k‖² (p + O(√(log N/(N (δε)^{d/2})))) = 1 + O(ε, √(log N/(N ε^{d/2}))),  1 ≤ k ≤ K,
q^(0)_{δε}(v_k ± v_l) = (1/N)‖v_k ± v_l‖² (p + O(√(log N/(N (δε)^{d/2})))) = 2 + O(ε, √(log N/(N ε^{d/2}))),  k ≠ l, 1 ≤ k, l ≤ K.  (A.5)

Next, note that since (D − W) v_k = m̃ ε λ_k D v_k, with Gaussian h, m̃ = 1, and the v_k are D-orthogonal,

(1/N) v_k^T (D − W) v_k = ε λ_k (1/N) v_k^T D v_k = ε λ_k,  1 ≤ k ≤ K,
(1/N)(v_k ± v_l)^T (D − W)(v_k ± v_l) = ε(λ_k + λ_l),  k ≠ l, 1 ≤ k, l ≤ K.  (A.6)

Then, (27) in Lemma 4.3 with α = δ gives

q^(2)_{δε}(v_k) = O(δ^{−d/2}) ε λ_k + O(ε²),  1 ≤ k ≤ K,
q^(2)_{δε}(v_k ± v_l) = O(δ^{−d/2}) ε (λ_k + λ_l) + 2 O(ε²),  k ≠ l, 1 ≤ k, l ≤ K,

and then, same as in (33), they are both O(ε). Together with (A.5), this gives

⟨f_k, f_k⟩ = 1 + O(ε, √(log N/(N ε^{d/2}))) + O(ε),  1 ≤ k ≤ K,
⟨f_k, f_l⟩ = (1/4)(q_{δε}(v_k + v_l) − q_{δε}(v_k − v_l)) = O(ε, √(log N/(N ε^{d/2}))) + O(ε),  k ≠ l, 1 ≤ k, l ≤ K.  (A.7)

Then, due to O(ε, √(log N/(N ε^{d/2}))) = o(1), we have linear independence of {f_j}_{j=1}^K with large enough N. Again, we let L_k = Span{f_1, …, f_k}, and have (35). For any f ∈ L_k, f = Σ_{j=1}^k c_j f_j, f = I_r[v], v := Σ_{j=1}^k c_j v_j,

(1/N) v^T D v = Σ_{j=1}^k c_j² (1/N) v_j^T D v_j = ‖c‖²,

and, since Lemma 3.5 2) holds, (20) applies to v to give (1/N) v^T D v = (1/N)‖v‖² (p + O(ε, √(log N/(N ε^{d/2})))), thus

(1/N)‖v‖² = (‖c‖²/p)(1 + O(ε, √(log N/(N ε^{d/2})))).  (A.8)

Meanwhile, by (A.6),

(1/N) v^T (D − W) v = Σ_{j=1}^k c_j² (1/N) v_j^T (D − W) v_j = Σ_{j=1}^k c_j² ε λ_j ≤ ε λ_k ‖c‖².  (A.9)

With the good event E^(1) same as before (Lemma 4.2 at s = ε), under E^(0) ∩ E^(1), and with the O_M(·) notation meaning that the constant depends on M only and not on K,

q^(0)_ε(v) = (1/N)‖v‖² (p + O_M(√(log N/(N ε^{d/2})))),  q^(0)_{δε}(v) = (1/N)‖v‖² (p + O_M(√(δ^{−d/2} log N/(N ε^{d/2})))),  (A.10)

and then, again,

q^(0)_{δε}(v) − q^(0)_ε(v) = (1/N)‖v‖² O_M(δ^{−d/4} √(log N/(N ε^{d/2}))) = (‖c‖²/p)(1 + O(ε, √(log N/(N ε^{d/2})))) O_M(δ^{−d/4} √(log N/(N ε^{d/2}))) = ‖c‖² O_M(δ^{−d/4} √(log N/(N ε^{d/2}))),

where we used (A.8) to substitute the (1/N)‖v‖² term after the leading (1/N)‖v‖² p term is canceled in the subtraction. The UB of q^(2)_ε(v) is similar as before: by (26) in Lemma 4.3, inserting (A.9), and with the shorthand Õ(ε) standing for O(ε (log(1/ε))²),

q^(2)_ε(v) = (1/N) v^T (D − W) v (1 + Õ(ε)) + ‖c‖² O(ε²) ≤ ε ‖c‖² (λ_k (1 + Õ(ε)) + O(ε)).

Thus we have that

⟨f, f⟩ − ⟨f, Q_t f⟩ ≤ (q^(0)_{δε}(v) − q^(0)_ε(v)) + q^(2)_ε(v) ≤ ε ‖c‖² ( λ_k (1 + Õ(ε)) + O(ε) + δ^{−d/4} O_M((1/ε) √(log N/(N ε^{d/2}))) ) = ε ‖c‖² ( λ_k + Õ(ε) + δ^{−d/4} O_M((1/ε) √(log N/(N ε^{d/2}))) )  (by λ_k ≤ 1.5 μ_K).  (A.11)

To lower bound ⟨f, f⟩, again by (27) in Lemma 4.3, inserting (A.9),

0 ≤ q^(2)_{δε}(v) ≤ Θ(δ^{−d/2}) (1/N) v^T (D − W) v + ‖c‖² O(ε²) ≤ ε ‖c‖² ( λ_k Θ(δ^{−d/2}) + O(ε) ),

and then, since ε(λ_k Θ(δ^{−d/2}) + O(ε)) = O(ε), we again have q^(2)_{δε}(v) = ‖c‖² O(ε). We have derived the formula of q^(0)_{δε}(v) in (A.10) under E^(0) ∩ E^(1), and inserting (A.8),

q^(0)_{δε}(v) = (1/N)‖v‖² (p + O(√(δ^{−d/2} log N/(N ε^{d/2})))) = ‖c‖² (1 + O(ε, √(log N/(N ε^{d/2})))).  (A.12)

Thus,

⟨f, f⟩ = q^(0)_{δε}(v) − q^(2)_{δε}(v) = ‖c‖² (1 + O(ε, √(log N/(N ε^{d/2}))) − O(ε)) ≥ ‖c‖² (1 − O(ε, √(log N/(N ε^{d/2})))).

Together with (A.11), this gives

(⟨f, f⟩ − ⟨f, Q_t f⟩)/⟨f, f⟩ ≤ ε ( λ_k + Õ(ε) + δ^{−d/4} O_M((1/ε) √(log N/(N ε^{d/2}))) ) / (1 − O(ε, √(log N/(N ε^{d/2})))) ≤ ε ( λ_k + Õ(ε) + (C̄/ε) √(log N/(N ε^{d/2})) ),

where the notation C̄ is defined in the same way as in the proof of Proposition 4.1. The rest of the proof is the same, using the intersection of all the needed good events E^(0), E^(1), and E′_UB, which happens w.p. > 1 − 2N^{−10} − 2K²N^{−10} − 2N^{−10}.

C.3 Proofs in Section 5

Proof of Theorem 5.5.
With sufficiently large N , we restrict to the intersection of the good events inProposition 4.4 and the K = k max + 1 good events of applying Theorem 5.1 1) to { ψ k } Kk =1 , which happensw.p. > − K N − − (6 + 4 K ) N − . The good event in Proposition 4.4 is contained in the good event E (cid:48) UB of Proposition 3.6 of the eigenvalue UB, which is again contained in the good event of Lemma 3.5.As a result, D i > i , and thus L rw is well-defined, and (20) holds.Applying (20) to u = v k , and because (cid:107) v k (cid:107) D/N = p , we have that ( m = 0 due to that h is Gaussian) p = (cid:107) v k (cid:107) DN = p (cid:107) v k (cid:107) (1 + O ( (cid:15), (cid:114) log NN (cid:15) d/ )) , ≤ k ≤ K. (A.13)This verifies that (cid:107) v k (cid:107) = 1 + O ( (cid:15), (cid:113) log NN(cid:15) d/ ) = 1 + o (1), for 1 ≤ k ≤ K .Because the good event E (cid:48) UB is under that in Lemma 3.4, (cid:107) φ k (cid:107) = 1 + O ( (cid:113) log NN ), 1 ≤ k ≤ K , andthen, applying (20) to u = φ k , (cid:107) φ k (cid:107) DN = p (cid:107) φ k (cid:107) (1 + O ( (cid:15), (cid:114) log NN (cid:15) d/ )) = p (1 + O ( (cid:15), (cid:114) log NN (cid:15) d/ )) , ≤ k ≤ K. (A.14)Step 2. for L rw : We follow a similar approach as in Proposition 5.2. When k = 1, λ = 0, and v isalways the constant vector, thus the discrepancy is zero. Consider 2 ≤ k ≤ K , by Theorem 5.1 1), andthat (cid:107) u (cid:107) ≤ √ N (cid:107) u (cid:107) ∞ for any u ∈ R N , (cid:107) L rw φ k − µ k φ k (cid:107) = O ( (cid:15), (cid:114) log NN (cid:15) d/ ) , ≤ k ≤ K, (A.15)and then by (20) which holds uniformly for all u ∈ R N , (cid:107) L rw φ k − µ k φ k (cid:107) DN = (cid:107) L rw φ k − µ k φ k (cid:107) √ p (1 + O ( (cid:15), (cid:114) log NN (cid:15) d/ )) = O ( (cid:107) L rw φ k − µ k φ k (cid:107) ) . Thus, there is Err pt >
0, s.t. (cid:107) L rw φ k − µ k φ k (cid:107) DN ≤ Err pt , ≤ k ≤ K, Err pt = O ( (cid:15), (cid:114) log NN (cid:15) d/ ) . (A.16)The constant in big- O depends on first K eigenfunctions, and is an absolute one because K is fixed. Next,same as in the proof of Proposition 5.2, under the good event of Proposition 4.4 and by the definition of γ K as the maximum (half) eigen-gap among { µ k } ≤ k ≤ K , (41) holds for λ k .Let S k = Span { ( DN ) / v k } , S k is a 1-dimensional subspace in R N . Because v j ’s are D -orthogonal, S ⊥ k = Span { ( DN ) / v j , j (cid:54) = k, ≤ j ≤ N } . Note that P S ⊥ k (cid:18) ( DN ) / µ k φ k (cid:19) = ( DN ) / N (cid:88) j (cid:54) = k,j =1 v Tj ( DN ) φ k (cid:107) v j (cid:107) DN µ k v j , (A.17)and because L Trw Dv j = 1 (cid:15) ( I − W D − ) Dv j = 1 (cid:15) ( D − W ) v j = Dλ j v j , (A.18)43 S ⊥ k (cid:18) ( DN ) / L rw φ k (cid:19) = ( DN ) / N (cid:88) j (cid:54) = k,j =1 v Tj ( DN ) L rw φ k (cid:107) v j (cid:107) DN v j = ( DN ) / N (cid:88) j (cid:54) = k,j =1 1 N ( L Trw Dv j ) T φ k (cid:107) v j (cid:107) DN v j = ( DN ) / N (cid:88) j (cid:54) = k,j =1 1 N ( Dv j ) T φ k (cid:107) v j (cid:107) DN λ j v j . (A.19)Subtracting (A.17) and (A.19) gives P S ⊥ k (cid:18) ( DN ) / ( L rw φ k − µ k φ k ) (cid:19) = N (cid:88) j (cid:54) = k,j =1 ( λ j − µ k ) v Tj DN φ k (cid:107) v j (cid:107) DN ( DN ) / v j , and by that v j are D -orthogonal, and (41), (cid:107) P S ⊥ k (cid:18) ( DN ) / ( L rw φ k − µ k φ k ) (cid:19) (cid:107) = N (cid:88) j (cid:54) = k,j =1 | λ j − µ k | | v Tj DN φ k | (cid:107) v j (cid:107) DN ≥ γ K N (cid:88) j (cid:54) = k,j =1 | v Tj DN φ k | (cid:107) v j (cid:107) DN . The square-root of the l.h.s. (cid:107) P S ⊥ k (cid:18) ( DN ) / ( L rw φ k − µ k φ k ) (cid:19) (cid:107) ≤ (cid:107) ( DN ) / ( L rw φ k − µ k φ k ) (cid:107) = (cid:107) L rw φ k − µ k φ k (cid:107) DN ≤ Err pt , and the last inequality is by (A.16). 
This gives that

$$\Big(\sum_{j\ne k,j=1}^{N}\frac{|v_j^T\frac{D}{N}\phi_k|^2}{\|v_j\|_{D/N}^2}\Big)^{1/2} \le \frac{\mathrm{Err}_{pt}}{\gamma_K}.$$

Meanwhile, $P_{S_k^\perp}\big((\frac{D}{N})^{1/2}\phi_k\big) = \sum_{j\ne k,j=1}^{N}\frac{v_j^T(\frac{D}{N})\phi_k}{\|v_j\|_{D/N}^2}(\frac{D}{N})^{1/2}v_j$, and by $D$-orthogonality of the $v_j$ again, $\sum_{j\ne k,j=1}^{N}\frac{|v_j^T\frac{D}{N}\phi_k|^2}{\|v_j\|_{D/N}^2} = \|P_{S_k^\perp}\big((\frac{D}{N})^{1/2}\phi_k\big)\|^2$. Thus,

$$\Big\|P_{S_k^\perp}\Big(\big(\tfrac{D}{N}\big)^{1/2}\phi_k\Big)\Big\| = \Big(\sum_{j\ne k,j=1}^{N}\frac{|v_j^T\frac{D}{N}\phi_k|^2}{\|v_j\|_{D/N}^2}\Big)^{1/2} \le \frac{\mathrm{Err}_{pt}}{\gamma_K} = O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big). \quad (A.20)$$

Finally, define

$$\beta_k := \frac{v_k^T(\frac{D}{N})\phi_k}{\|v_k\|_{D/N}^2}, \quad \beta_k\big(\tfrac{D}{N}\big)^{1/2}v_k = P_{S_k}\big(\tfrac{D}{N}\big)^{1/2}\phi_k, \quad P_{S_k^\perp}\Big(\big(\tfrac{D}{N}\big)^{1/2}\phi_k\Big) = \big(\tfrac{D}{N}\big)^{1/2}\phi_k - P_{S_k}\big(\tfrac{D}{N}\big)^{1/2}\phi_k = \big(\tfrac{D}{N}\big)^{1/2}(\phi_k - \beta_kv_k),$$

and then, together with (A.20),

$$\|\phi_k - \beta_kv_k\|_{D/N} = \Big\|P_{S_k^\perp}\Big(\big(\tfrac{D}{N}\big)^{1/2}\phi_k\Big)\Big\| = O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big).$$

Applying (20) to $u = \phi_k - \beta_kv_k$, $\|\phi_k - \beta_kv_k\| = \big(\bar{p}(1+O(\epsilon, \sqrt{\log N/(N\epsilon^{d/2})}))\big)^{-1/2}\|\phi_k - \beta_kv_k\|_{D/N} = O(\|\phi_k - \beta_kv_k\|_{D/N})$, and we have shown that

$$\|\phi_k - \beta_kv_k\| = O(\|\phi_k - \beta_kv_k\|_{D/N}) = O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big).$$
To finish Step 2, it remains to show that $|\beta_k| = 1 + o(1)$, and then we define $\alpha_k = \beta_k^{-1}$. By definition of $\beta_k$,

$$\|\phi_k\|_{D/N}^2 = \Big\|\big(\tfrac{D}{N}\big)^{1/2}\phi_k\Big\|^2 = \Big\|P_{S_k^\perp}\Big(\big(\tfrac{D}{N}\big)^{1/2}\phi_k\Big)\Big\|^2 + \Big\|\beta_k\big(\tfrac{D}{N}\big)^{1/2}v_k\Big\|^2 = \Big\|P_{S_k^\perp}\Big(\big(\tfrac{D}{N}\big)^{1/2}\phi_k\Big)\Big\|^2 + \beta_k^2\|v_k\|_{D/N}^2;$$

by that $\|v_k\|_{D/N}^2 = \bar{p}$, and (A.14), and (A.20), this gives $\bar{p}(1+o(1)) = o(1) + \beta_k^2\bar{p}$, and thus $\beta_k^2 = 1 + o(1)$.

Step 3 for $L_{rw}$: For $2 \le k \le k_{\max}$, by the relation (A.18),

$$v_k^TD(L_{rw}\phi_k - \mu_k\phi_k) = (L_{rw}^TDv_k)^T\phi_k - \mu_kv_k^TD\phi_k = (\lambda_k - \mu_k)v_k^TD\phi_k,$$

and we have shown that

$$v_k = \alpha_k\phi_k + \varepsilon_k, \quad \alpha_k = 1 + o(1), \quad \|\varepsilon_k\|_{D/N} = O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big).$$

Similar as in the proof of Proposition 5.3,

$$|\lambda_k - \mu_k|\,\big|v_k^T\tfrac{D}{N}\phi_k\big| = \big|v_k^T\tfrac{D}{N}(L_{rw}\phi_k - \mu_k\phi_k)\big| = \big|(\alpha_k\phi_k + \varepsilon_k)^T\tfrac{D}{N}(L_{rw}\phi_k - \mu_k\phi_k)\big| \le |\alpha_k|\,\big|\phi_k^T\tfrac{D}{N}L_{rw}\phi_k - \mu_k\|\phi_k\|_{D/N}^2\big| + \big|\varepsilon_k^T\tfrac{D}{N}(L_{rw}\phi_k - \mu_k\phi_k)\big| =: ① + ②.$$

By (A.14), $\|\phi_k\|_{D/N}^2 = \bar{p}(1+O(\epsilon, \sqrt{\log N/(N\epsilon^{d/2})}))$, and meanwhile, $\phi_k^T\frac{D}{N}L_{rw}\phi_k = \bar{p}\,E_N(\rho_X\psi_k) = \bar{p}\mu_k + O(\epsilon, \sqrt{\log N/(N\epsilon^{d/2})})$ by (16). Thus $① = O(|\phi_k^T\frac{D}{N}L_{rw}\phi_k - \mu_k\|\phi_k\|_{D/N}^2|) = O(\epsilon, \sqrt{\log N/(N\epsilon^{d/2})})$. By (A.16) and the bound of $\varepsilon_k$, $|②| \le \|\varepsilon_k\|_{D/N}\,\|L_{rw}\phi_k - \mu_k\phi_k\|_{D/N} = O(\mathrm{Err}_{pt}^2)$, which is $O(\epsilon)$ as shown in the proof of Proposition 5.3. Finally, by the definition of $\beta_k$, and that $\|v_k\|_{D/N}^2 = \bar{p}$,

$$|\lambda_k - \mu_k||\beta_k| \le \frac{① + ②}{\|v_k\|_{D/N}^2} = \frac{O\big(\epsilon, \sqrt{\log N/(N\epsilon^{d/2})}\big) + O(\epsilon)}{\bar{p}} = O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big).$$

Since $|\beta_k| = 1 + o(1)$, this proves the bound of $|\lambda_k - \mu_k|$, and the argument holds for all $k \le k_{\max}$.
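The relation (A.18), $L_{rw}^TDv_j = D\lambda_jv_j$, and the $D$-orthogonality of the eigenvectors used throughout Step 2 are finite-dimensional linear algebra facts: $L_{rw} = \frac1\epsilon(I - D^{-1}W)$ is conjugate to the symmetric matrix $\frac1\epsilon D^{-1/2}(D-W)D^{-1/2}$, so its right eigenvectors can be taken $D$-orthonormal. A minimal numerical sketch (a small hypothetical affinity matrix; not the paper's kernel construction):

```python
import numpy as np

rng = np.random.default_rng(0)
N, eps = 8, 0.5
# symmetric nonnegative affinity with positive degrees (illustrative)
A = rng.random((N, N))
W = (A + A.T) / 2
D = np.diag(W.sum(axis=1))
# random-walk graph Laplacian L_rw = (I - D^{-1} W) / eps
L = (np.eye(N) - np.linalg.solve(D, W)) / eps

# conjugate symmetric matrix S = D^{-1/2} (D - W) D^{-1/2} / eps
d_isqrt = 1.0 / np.sqrt(np.diag(D))
S = d_isqrt[:, None] * (D - W) * d_isqrt[None, :] / eps
lam, U = np.linalg.eigh(S)
V = d_isqrt[:, None] * U          # right eigenvectors of L_rw

# checks: L_rw V = V diag(lam); V is D-orthonormal; and (A.18): L_rw^T D v = lam D v
assert np.allclose(L @ V, V * lam)
assert np.allclose(V.T @ D @ V, np.eye(N))
assert np.allclose(L.T @ (D @ V), (D @ V) * lam)
```

The conjugation trick is also how one computes these eigenpairs stably in practice, avoiding a non-symmetric eigensolver.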
D About the density-corrected graph Laplacian with $\tilde W$

D.1 Proofs of the point-wise convergence of $\tilde L_{rw}$

Proof of Lemma 6.1.
Part 1): By that $\frac1N D_i = \frac1N(Y_i + \sum_{j\ne i}Y_j)$, $Y_j := K_\epsilon(x_i,x_j)$. For $j\ne i$, $Y_j$ has expectation (Lemma 8 in [10], Lemma A.6 in [9])

$$\int_{\mathcal{M}}K_\epsilon(x_i,y)p(y)dV(y) = m_0p(x_i) + m_2\epsilon\big(\omega p(x_i) + \Delta p(x_i)\big) + O_p(\epsilon^2),$$

where $\omega\in C^\infty(\mathcal{M})$ is determined by manifold extrinsic coordinates. Meanwhile, $K_\epsilon(x_i,x_i) = \epsilon^{-d/2}h(0) = O(\epsilon^{-d/2})$. In the independent sum $N^{-1}\sum_{j\ne i}Y_j$, $|Y_j|$ is bounded by $\Theta(\epsilon^{-d/2})$ and has variance bounded by $\Theta(\epsilon^{-d/2})$. The rest of the proof is the same as in proving Lemma 3.5 1).

Part 2): By part 1), under a good event $E_1$ which happens w.p. $> 1 - N^{-10}$, (45) holds. Because $p(x) \ge p_{\min} > 0$ for all $x\in\mathcal{M}$, we then have

$$\frac1N D_i = m_0p(x_i)(1+\varepsilon_i^{(D)}), \quad \sup_{1\le i\le N}|\varepsilon_i^{(D)}| = O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big). \quad (A.21)$$

Since $O(\epsilon, \sqrt{\log N/(N\epsilon^{d/2})}) = o(1)$, with large enough $N$ and under $E_1$, $D_i > 0$, and then $\tilde W$ is well-defined. Further, by (A.21),

$$\frac1N\sum_{j=1}^N\frac{W_{ij}}{\frac1N D_j} = \frac1N\sum_{j=1}^N\frac{W_{ij}}{m_0p(x_j)(1+\varepsilon_j^{(D)})} = \frac{1}{m_0N}\sum_{j=1}^N\frac{W_{ij}}{p(x_j)}\Big(1+O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big) \quad \text{(by that } p > 0\text{)}.$$

Consider $Y_j = K_\epsilon(x_i,x_j)p^{-1}(x_j)$ (conditioning on $x_i$); for $j\ne i$,

$$\mathbb{E}Y_j = \int_{\mathcal{M}}K_\epsilon(x_i,y)p^{-1}(y)p(y)dV(y) = \int_{\mathcal{M}}K_\epsilon(x_i,y)dV(y) = m_0 + O(\epsilon),$$

and $Y_j$ is bounded by $\Theta(\epsilon^{-d/2})$ and so is its variance, where the constants in big-$\Theta$ depend on $p$. Then, similarly as in proving (45), we have a good event $E_2$ which happens w.p. $> 1 - N^{-10}$, under which

$$\frac{1}{m_0N}\sum_{j=1}^N\frac{W_{ij}}{p(x_j)} = 1+O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big), \quad 1\le i\le N, \quad (A.22)$$

and the constant in big-$O$ depends on $p$ and the function $h$, and is uniform for all $x_i$. Then, under $E_1\cap E_2$,

$$\sum_{j=1}^N\frac{W_{ij}}{D_j} = \Big(1+O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big)\Big(1+O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big) = 1+O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big),$$

which proves (46). Meanwhile, combining (46) and (A.21),

$$N\tilde D_i = \frac{N}{D_i}\sum_{j=1}^N\frac{W_{ij}}{D_j} = \frac{1}{m_0p(x_i)(1+\varepsilon_i^{(D)})}\Big(1+O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big) = \frac{1}{m_0p(x_i)}\Big(1+O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big), \quad (A.23)$$

and thus, under $E_1\cap E_2$, with large $N$, $\tilde D_i > 0$ and $\tilde L_{rw}$ is well-defined.

D.2 Proofs of the form rate
Proof of Lemma 6.4.
As has been shown in the proof of Lemma 6.1, under the good event in Lemma 6.1 1), (45) and then (A.21) hold. With the notation $\varepsilon_i^{(D)}$ as in (A.21), and omitting $h$ in the notations $m_0$, $m_2$, we have that

$$\tilde E_N(u) = \frac{m_0^2}{m_2\epsilon}\frac{1}{N^2}\sum_{i,j=1}^N\frac{W_{i,j}(u_i-u_j)^2}{\frac{D_i}{N}\frac{D_j}{N}} = \frac{1}{m_2\epsilon}\frac{1}{N^2}\sum_{i,j=1}^N\frac{W_{i,j}(u_i-u_j)^2}{p(x_i)p(x_j)(1+\varepsilon_i^{(D)})(1+\varepsilon_j^{(D)})}$$
$$= \frac{1}{m_2\epsilon}\frac{1}{N^2}\sum_{i,j=1}^N\frac{W_{i,j}(u_i-u_j)^2}{p(x_i)p(x_j)}(1+\varepsilon_{ij}), \quad \varepsilon_{ij} = O(\varepsilon_i^{(D)},\varepsilon_j^{(D)}),$$
$$= \frac{1}{m_2\epsilon}\frac{1}{N^2}\sum_{i,j=1}^N\frac{W_{i,j}(u_i-u_j)^2}{p(x_i)p(x_j)}\Big(1+O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big),$$

where the last row uses the non-negativity of $\frac{W_{i,j}(u_i-u_j)^2}{p(x_i)p(x_j)}$.

Proof of (55) in the proof of Theorem 6.3:

Proof.
Proof of (55): By definition, for $i\ne j$,

$$\mathbb{E}V_{i,j} = \frac{1}{m_2\epsilon}\int_{\mathcal{M}}\int_{\mathcal{M}}K_\epsilon(x,y)(f(x)-f(y))^2dV(x)dV(y) = \frac{2}{m_2\epsilon}\int_{\mathcal{M}}f(x)\Big(\int_{\mathcal{M}}K_\epsilon(x,y)(f(x)-f(y))dV(y)\Big)dV(x).$$

By Lemma A.6 in [9], $\int_{\mathcal{M}}K_\epsilon(x,y)(f(x)-f(y))dV(y) = -\epsilon\frac{m_2}{2}\Delta f(x) + O_f(\epsilon^2)$, and thus

$$\mathbb{E}V_{i,j} = \langle f,-\Delta f\rangle + O_f(\epsilon).$$

Meanwhile, by that $p\ge p_{\min}>0$, $0\le V_{ij}\le\frac{\Theta_p(1)}{m_2\epsilon}K_\epsilon(x_i,x_j)(f(x_i)-f(x_j))^2$, and then, by the boundedness and variance calculation in the proof of Theorem 3.5 of [9], one can verify that, with constants depending on $(f,p)$,

$$|V_{ij}|\le L = \Theta(\epsilon^{-d/2}), \quad \mathbb{E}V_{ij}^2\le\nu = \Theta(\epsilon^{-d/2}).$$

Then, by the same decoupling argument to derive the concentration of V-statistics, under a good event $E_3$ which happens w.p. $> 1 - N^{-10}$,

$$\frac{1}{N(N-1)}\sum_{i\ne j,i,j=1}^NV_{ij} = \mathbb{E}V_{ij} + O_{f,p}\Big(\sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big).$$

As a result,

$$③ \text{ in (54)} = \Big(1-\frac1N\Big)\frac{1}{N(N-1)}\sum_{i\ne j,i,j=1}^NV_{ij} = \Big(1-\frac1N\Big)\Big(\langle f,-\Delta f\rangle + O_f(\epsilon) + O_{f,p}\Big(\sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big),$$

which proves (55) because $O(\frac1N)$ is of higher order than $O(\sqrt{\log N/(N\epsilon^{d/2})})$.

D.3 Proofs of the eigen-convergence of $\tilde L_{rw}$

Proof of Proposition 6.5.
The proof is similar to that of Proposition 3.6. We first restrict to the good event $E_1\cap E_2$ in Lemma 6.1, which happens w.p. $> 1 - 2N^{-10}$, under which $\tilde W$ and $\tilde L_{rw}$ are well-defined, and (45) and (46) hold.

Let $u_k = \rho_X\psi_k$. The following lemma, proved below, shows the near $\tilde D$-orthonormality of the vectors $u_k$, and is an analogue of Lemma 3.4.

Lemma D.1.
Under the same assumption of Lemma 6.1, when $N$ is sufficiently large, w.p. $> 1 - 2N^{-10} - K^2N^{-10}$,

$$\|\rho_X\psi_k\|_{\tilde D}^2 = \frac{1}{m_0}\Big(1+O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big), \quad 1\le k\le K;$$
$$(\rho_X\psi_k)^T\tilde D(\rho_X\psi_l) = O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big), \quad k\ne l,\ 1\le k,l\le K. \quad (A.24)$$

Under the good event of Lemma D.1, called $E_4\subset E_1\cap E_2$, $\tilde D_i > 0$ for all $i$, and with large enough $N$, the set $\{\tilde D^{1/2}u_k\}_{k=1}^K$ is linearly independent, and then so is the set $\{u_k\}_{k=1}^K$. Let $L_k = \mathrm{Span}\{u_1,\cdots,u_k\}$; then $\dim(L_k) = k$ for each $k\le K$. For any $v\in L_k$, $v\ne 0$, there are $c_j$, $1\le j\le k$, such that $v = \sum_{j=1}^kc_ju_j$. By (A.24), we have

$$m_0\|v\|_{\tilde D}^2 = \|c\|^2\Big(1+O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big). \quad (A.25)$$

Meanwhile, by defining $\tilde B_N(u,v) := \frac14\big(\tilde E_N(u+v) - \tilde E_N(u-v)\big)$, similarly as in Lemma 3.3, applying Theorem 6.3 to the $K^2$ cases where $f = \psi_k$ and $(\psi_k\pm\psi_l)$ gives that, under a good event $E_5$ which happens w.p. $> 1 - K^2N^{-10}$,

$$\tilde E_N(\rho_X\psi_k) = \mu_k + O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big), \quad k = 1,\cdots,K,$$
$$\tilde B_N(\rho_X\psi_k,\rho_X\psi_l) = O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big), \quad k\ne l,\ 1\le k,l\le K. \quad (A.26)$$

Then, similar as in (18),

$$\tilde E_N(v) = \sum_{j,l=1}^kc_jc_l\tilde B_N(u_j,u_l) = \sum_{j=1}^kc_j^2\Big(\mu_j + O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big) + \sum_{j\ne l,j,l=1}^k|c_j||c_l|\,O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big) = \sum_{j=1}^k\mu_jc_j^2 + \|c\|^2K\,O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big) \le \|c\|^2\Big(\mu_k + O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big). \quad (A.27)$$

Back to the r.h.s. of (56), together with (A.25), we have that

$$\frac{1}{m_0}\frac{\tilde E_N(v)}{v^T\tilde Dv} \le \frac{\mu_k + O\big(\epsilon, \sqrt{\log N/(N\epsilon^{d/2})}\big)}{1 + O\big(\epsilon, \sqrt{\log N/(N\epsilon^{d/2})}\big)} = \mu_k + O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big), \quad (A.28)$$

and this provides an UB of $\lambda_k$.
The bound holds for all $1\le k\le K$, under the good events $E_4\cap E_5$.

Proof of Lemma D.1.
Restrict to the good event $E_1\cap E_2$ in Lemma 6.1, which happens w.p. $> 1 - 2N^{-10}$, under which $\tilde W$ and $\tilde L_{rw}$ are well-defined, and (A.23) holds. Then,

$$\|\rho_X\psi_k\|_{\tilde D}^2 = \frac1N\sum_{i=1}^N\frac{\psi_k(x_i)^2}{m_0p(x_i)}\Big(1+O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big) = \frac{\|\rho_X(p^{-1/2}\psi_k)\|^2}{Nm_0}\Big(1+O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big), \quad 1\le k\le K,$$

$$\|\rho_X(\psi_k\pm\psi_l)\|_{\tilde D}^2 = \frac{\|\rho_X(p^{-1/2}(\psi_k\pm\psi_l))\|^2}{Nm_0}\Big(1+O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big), \quad k\ne l,\ 1\le k,l\le K.$$

Apply (A.2) to the cases $f = p^{-1/2}\psi_k$ and $p^{-1/2}(\psi_k\pm\psi_l)$ for $k\ne l$, and recall that $\langle\psi_k,\psi_l\rangle = \delta_{kl}$; we have

$$\frac1N\|\rho_X(p^{-1/2}\psi_k)\|^2 = 1 + O\Big(\sqrt{\tfrac{\log N}{N}}\Big), \quad \frac1N\|\rho_X(p^{-1/2}(\psi_k\pm\psi_l))\|^2 = 2 + O\Big(\sqrt{\tfrac{\log N}{N}}\Big),$$

under a good event which happens w.p. $> 1 - K^2N^{-10}$ with large enough $N$, and then

$$\|\rho_X\psi_k\|_{\tilde D}^2 = \frac{1}{m_0}\Big(1+O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big), \quad 1\le k\le K,$$
$$\|\rho_X(\psi_k\pm\psi_l)\|_{\tilde D}^2 = \frac{2}{m_0}\Big(1+O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big), \quad k\ne l,\ 1\le k,l\le K,$$

which proves (A.24).
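The object of Lemma D.1 is built from the density-corrected affinity $\tilde W$, which normalizes $W$ by the degree matrix from both sides before forming the random-walk Laplacian. A small numerical sketch of the construction (the non-uniform sampling density, sample size, and bandwidth below are illustrative assumptions, not the paper's experimental setup):

```python
import numpy as np

rng = np.random.default_rng(1)
N, eps, d = 500, 0.05, 1
# non-uniformly sampled points on the unit circle (Beta-distributed angles)
theta = np.sort(rng.beta(2, 2, N)) * 2 * np.pi
X = np.stack([np.cos(theta), np.sin(theta)], axis=1)

d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = eps ** (-d / 2) * np.exp(-d2 / (4 * eps))   # Gaussian kernel K_eps
D = W.sum(axis=1)

# density-corrected affinity: normalize by the degree from both sides
Wt = W / np.outer(D, D)
Dt = Wt.sum(axis=1)
Lt = (np.eye(N) - Wt / Dt[:, None]) / eps       # random-walk Laplacian of W-tilde

# D~^{-1} W~ is row-stochastic, so the constant vector is in the kernel of Lt
assert np.allclose((Wt / Dt[:, None]).sum(axis=1), 1.0)
assert np.allclose(Lt @ np.ones(N), 0.0)
```

The two assertions check the algebraic facts used repeatedly above: $\tilde D^{-1}\tilde W$ is a Markov matrix, so $\lambda_1 = 0$ with constant eigenvector, regardless of the sampling density.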
The proof follows the same strategy of proving Proposition 4.4, where we introduce weights by $p(x_i)$ in the heat kernel interpolation map when constructing candidate eigenfunctions from eigenvectors.

We restrict to the good event $E''_{UB}$ in Proposition 6.5, which is contained in $E_1\cap E_2$ in Lemma 6.1. Under $E''_{UB}$, $D_i > 0$, $\tilde D_i > 0$, and $\tilde L_{rw}$ is well-defined, and, with sufficiently large $N$, $\lambda_k \le \lambda_K \le 1.1\mu_K = O(1)$. Let $\tilde L_{rw}v_k = \lambda_kv_k$, normalized s.t.

$$v_k^T\tilde Dv_l = \delta_{kl}, \quad 1\le k,l\le N.$$

Note that always $\lambda_1 = 0$. Under $E_1\cap E_2$, (A.23) holds, and thus

$$m_0\|u\|_{\tilde D}^2 = \frac{m_0}{N}\sum_{i=1}^Nu_i^2(N\tilde D_i) = \Big(\frac1N\sum_{i=1}^N\frac{u_i^2}{p(x_i)}\Big)\Big(1+O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big), \quad \forall u\in\mathbb{R}^N, \quad (A.29)$$

and the constant in big-$O$ is determined by $(\mathcal{M},p)$ and uniform for all $u$. Define the notation

$$\|u\|_{p^{-1}}^2 := \frac1N\sum_{i=1}^N\frac{u_i^2}{p(x_i)}, \quad \forall u\in\mathbb{R}^N. \quad (A.30)$$

Taking $u$ to be $v_k$ and $(v_k\pm v_l)$ gives that

$$m_0 = \|v_k\|_{p^{-1}}^2\Big(1+O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big), \quad 1\le k\le K,$$
$$2m_0 = \|v_k\pm v_l\|_{p^{-1}}^2\Big(1+O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big), \quad k\ne l,\ 1\le k,l\le K. \quad (A.31)$$

Set $\delta$, $r$, $t$ in the same way as in the proof of Proposition 4.4, and define $\tilde I_r[u]$ as in (57). We have $\langle\tilde I_r[u],\tilde I_r[u]\rangle = \tilde q_{\delta\epsilon}(\tilde u)$, $\langle\tilde I_r[u],Q_t\tilde I_r[u]\rangle = \tilde q_{\epsilon}(\tilde u)$, and (58) for $s > 0$. Next, similar as in the proof of Lemma 4.2, one can show that with large $N$ and w.p. $> 1 - N^{-10}$,

$$\frac1N\sum_{j=1}^N\frac{H_s(x_i,x_j)}{p(x_i)p(x_j)} = \frac{1}{p(x_i)}\Big(1+O_{\mathcal{M},p}\Big(\sqrt{\tfrac{\log N}{Ns^{d/2}}}\Big)\Big), \quad 1\le i\le N, \quad (A.32)$$

where the notation $O_{\mathcal{M},p}(\cdot)$ indicates that the constant depends on $(\mathcal{M},p)$ and is uniform for all $x_i$. Applying (A.32) to $s = \delta\epsilon$ gives that, under a good event $E'_{(0)}$, which happens w.p. $> 1 - N^{-10}$,

$$\tilde q^{(0)}_{\delta\epsilon}(u) = \frac1N\sum_{i=1}^N\frac{u_i^2}{p(x_i)}\Big(1+O_{\mathcal{M},p}\Big(\delta^{-d/2}\sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big) = \|u\|_{p^{-1}}^2\Big(1+O_{\mathcal{M},p}\Big(\delta^{-d/2}\sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big), \quad \forall u\in\mathbb{R}^N. \quad (A.33)$$

Applying (A.32) to $s = \epsilon$ gives the good event $E'_{(1)}$, which happens w.p. $> 1 - N^{-10}$, under which

$$\tilde q^{(0)}_{\epsilon}(u) = \|u\|_{p^{-1}}^2\Big(1+O_{\mathcal{M},p}\Big(\sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big), \quad \forall u\in\mathbb{R}^N. \quad (A.34)$$

The constants in big-$O$ in (A.33) and (A.34) are determined by $(\mathcal{M},p)$ only and uniform for all $u$. We also need an analogue of Lemma 4.3 to upper bound $\tilde q^{(2)}_s$, proved below. The proof follows the same method of Lemma 4.3, and makes use of the uniform boundedness of $p$ from below, and of Lemma 6.4.

Lemma D.2.
Under Assumption 1, $h$ being Gaussian, let $0 < \alpha < 1$ be a fixed constant. Suppose $\epsilon = o(1)$, $\epsilon^{d/2} = \Omega(\frac{\log N}{N})$; then, with sufficiently large $N$, and under the good event $E_1$ of Lemma 6.1 1),

$$0 \le \tilde q^{(2)}_{\epsilon}(u) = \Big(1+O\Big(\epsilon\big(\log\tfrac1\epsilon\big)^{d/2+1}, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big)\big(u^T(\tilde D-\tilde W)u\big) + \|u\|_{p^{-1}}^2\,O(\epsilon^{10}), \quad \forall u\in\mathbb{R}^N, \quad (A.35)$$

and

$$0 \le \tilde q^{(2)}_{\alpha\epsilon}(u) \le 1.1\,\alpha^{-d/2}\big(u^T(\tilde D-\tilde W)u\big) + \|u\|_{p^{-1}}^2\,O(\epsilon^{10}), \quad \forall u\in\mathbb{R}^N. \quad (A.36)$$

The constants in big-$O$ only depend on $(\mathcal{M},p)$ and are uniform for all $u$ and $\alpha$.

We proceed to define $f_k = \tilde I_r[v_k]$, $f_k\in C^\infty(\mathcal{M})$. Next, note that since $(I-\tilde D^{-1}\tilde W)v_k = \epsilon\lambda_kv_k$, and the $v_k$ are $\tilde D$-orthonormal,

$$v_k^T(\tilde D-\tilde W)v_k = \epsilon\lambda_kv_k^T\tilde Dv_k = \epsilon\lambda_k, \quad 1\le k\le K,$$
$$(v_k\pm v_l)^T(\tilde D-\tilde W)(v_k\pm v_l) = \epsilon(\lambda_k+\lambda_l), \quad k\ne l,\ 1\le k,l\le K. \quad (A.37)$$

Taking $\alpha = \delta$ in Lemma D.2, (A.36) then gives

$$\tilde q^{(2)}_{\delta\epsilon}(v_k) = O(\delta^{-d/2})\,\epsilon\lambda_k + O(\epsilon^{10}), \quad 1\le k\le K,$$
$$\tilde q^{(2)}_{\delta\epsilon}(v_k\pm v_l) = O(\delta^{-d/2})\,\epsilon(\lambda_k+\lambda_l) + 2\,O(\epsilon^{10}), \quad k\ne l,\ 1\le k,l\le K,$$

and both are $O(\epsilon)$. Meanwhile, (A.33) and (A.31) give that ($\delta$ being fixed, depending on $K$ and $-\Delta$ only)

$$\tilde q^{(0)}_{\delta\epsilon}(v_k) = \|v_k\|_{p^{-1}}^2\Big(1+O\Big(\sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big) = m_0\Big(1+O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big), \quad 1\le k\le K,$$
$$\tilde q^{(0)}_{\delta\epsilon}(v_k\pm v_l) = \|v_k\pm v_l\|_{p^{-1}}^2\Big(1+O\Big(\sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big) = 2m_0\Big(1+O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big), \quad k\ne l,\ 1\le k,l\le K. \quad (A.38)$$

Putting together with the bounds of $\tilde q^{(2)}_{\delta\epsilon}$, this gives that

$$\langle f_k,f_k\rangle = \tilde q^{(0)}_{\delta\epsilon}(v_k) - \tilde q^{(2)}_{\delta\epsilon}(v_k) = m_0\Big(1+O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big) - O(\epsilon), \quad 1\le k\le K,$$
$$\langle f_k,f_l\rangle = \tfrac14\big(\tilde q_{\delta\epsilon}(v_k+v_l) - \tilde q_{\delta\epsilon}(v_k-v_l)\big) = O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big) + O(\epsilon), \quad k\ne l,\ 1\le k,l\le K. \quad (A.39)$$

Then, due to that $O(\epsilon, \sqrt{\log N/(N\epsilon^{d/2})}) = o(1)$, we have linear independence of $\{f_j\}_{j=1}^K$ with large enough $N$. Same as before, for any $2\le k\le K$, we let $L_k = \mathrm{Span}\{f_1,\cdots,f_k\}$, and have (35). For any $f\in L_k$, $f = \sum_{j=1}^kc_jf_j$, $f = \tilde I_r[v]$, $v := \sum_{j=1}^kc_jv_j$, and

$$v^T\tilde Dv = \sum_{j=1}^kc_j^2\,v_j^T\tilde Dv_j = \|c\|^2.$$

Meanwhile, by (A.29), and $m_0 = 1$ with Gaussian $h$,

$$\|c\|^2 = \|v\|_{\tilde D}^2 = \|v\|_{p^{-1}}^2\Big(1+O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big), \quad (A.40)$$

and by (A.37),

$$v^T(\tilde D-\tilde W)v = \epsilon\sum_{j=1}^k\lambda_jc_j^2 \le \epsilon\|c\|^2\lambda_k. \quad (A.41)$$

Then, as we work under $E'_{(0)}\cap E'_{(1)}$, (A.33) and (A.34) hold.
Applying these to $u = v$ and subtracting the two,

$$\tilde q^{(0)}_{\delta\epsilon}(v) - \tilde q^{(0)}_{\epsilon}(v) = \|v\|_{p^{-1}}^2\,O_{\mathcal{M},p}\Big(\delta^{-d/2}\sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big) = \|c\|^2\Big(1+O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big)O_{\mathcal{M},p}\Big(\delta^{-d/2}\sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big) = \|c\|^2\,O_{\mathcal{M},p}\Big(\delta^{-d/2}\sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big).$$

To bound $\tilde q^{(2)}_{\epsilon}(v)$, by (A.35), and with the shorthand that $\tilde O(\epsilon)$ stands for $O(\epsilon(\log\frac1\epsilon)^{d/2+1})$,

$$\tilde q^{(2)}_{\epsilon}(v) = \Big(1+\tilde O(\epsilon) + O\Big(\sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big)\big(v^T(\tilde D-\tilde W)v\big) + \|v\|_{p^{-1}}^2\,O(\epsilon^{10})$$
$$\le \Big(1+\tilde O(\epsilon) + O\Big(\sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big)\epsilon\|c\|^2\lambda_k + \|c\|^2\Big(1+O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big)O(\epsilon^{10}) \le \epsilon\|c\|^2\Big\{\lambda_k\Big(1+\tilde O(\epsilon) + O\Big(\sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big) + O(\epsilon^{9})\Big\}.$$

Thus we have that

$$\langle f,f\rangle - \langle f,Q_tf\rangle \le \big(\tilde q^{(0)}_{\delta\epsilon}(v) - \tilde q^{(0)}_{\epsilon}(v)\big) + \tilde q^{(2)}_{\epsilon}(v) \le \epsilon\|c\|^2\Big\{\lambda_k\Big(1+\tilde O(\epsilon) + O\Big(\sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big) + O(\epsilon^{9}) + O_{\mathcal{M},p}\Big(\delta^{-d/2}\epsilon^{-1}\sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big\}$$
$$= \epsilon\|c\|^2\Big\{\lambda_k + \tilde O(\epsilon) + O_{\mathcal{M},p}\Big(\delta^{-d/2}\epsilon^{-1}\sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big\}. \quad \text{(by }\lambda_k\le 1.1\mu_K\text{)} \quad (A.42)$$

To lower bound $\langle f,f\rangle$, again by (A.36), (A.40) and (A.41),

$$0 \le \tilde q^{(2)}_{\delta\epsilon}(v) \le \Theta(\delta^{-d/2})\big(v^T(\tilde D-\tilde W)v\big) + \|v\|_{p^{-1}}^2\,O(\epsilon^{10}) \le \epsilon\|c\|^2\big(\lambda_k\,\Theta(\delta^{-d/2}) + O(\epsilon^{9})\big) = \|c\|^2\,O(\epsilon).$$
By (A.33) and (A.40),

$$\tilde q^{(0)}_{\delta\epsilon}(v) = \|v\|_{p^{-1}}^2\Big(1+O\Big(\delta^{-d/2}\sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big) = \|c\|^2\Big(1+O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big). \quad (A.43)$$

Thus,

$$\langle f,f\rangle = \tilde q^{(0)}_{\delta\epsilon}(v) - \tilde q^{(2)}_{\delta\epsilon}(v) = \|c\|^2\Big(1+O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big) - O(\epsilon)\Big) \ge \|c\|^2\Big(1-O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big).$$

The rest of the proof is the same as that in Proposition 4.4, where the constant $C$ is defined as $C = c_{\mathcal{M},p}\,\delta^{-d/2}$, $c_{\mathcal{M},p}$ being a constant determined by $(\mathcal{M},p)$, and then the constant $c$ in the definition of $c_K$ also depends on $p$. The needed good events are $E'_{(0)}$, $E'_{(1)}$, and $E''_{UB}$, and the LB holds for $k\le K$.

Proof of Lemma D.2.
By definition, for any $u\in\mathbb{R}^N$,

$$\tilde q^{(2)}_{\epsilon}(u) = \frac12\frac{1}{N^2}\sum_{i,j=1}^N\frac{H_{\epsilon}(x_i,x_j)}{p(x_i)p(x_j)}(u_i-u_j)^2 \ge 0.$$

Take $t$ in Lemma 2.2 to be $\epsilon$; since $\epsilon = o(1)$, the three equations hold when $\epsilon < \epsilon_0$. By (13), truncating at an Euclidean ball of radius $\delta_\epsilon = \sqrt{c(d)\,\epsilon\log\frac1\epsilon}$ (with $c(d)$ a constant determined by $d$), there is $C_1$, a positive constant determined by $\mathcal{M}$, s.t.

$$\frac12\frac{1}{N^2}\sum_{i,j=1}^N\frac{H_\epsilon(x_i,x_j)}{p(x_i)p(x_j)}\mathbf{1}_{\{x_j\notin B_{\delta_\epsilon}(x_i)\}}(u_i-u_j)^2 \le \frac{C_1\epsilon^{10}}{N^2}\sum_{i,j=1}^N\frac{(u_i-u_j)^2}{p(x_i)p(x_j)}\mathbf{1}_{\{x_j\notin B_{\delta_\epsilon}(x_i)\}}.$$

Since

$$\frac{1}{N^2}\sum_{i,j=1}^N\frac{(u_i-u_j)^2}{p(x_i)p(x_j)} = \frac2N\sum_{i=1}^N\frac{u_i^2}{p(x_i)}\,\frac1N\sum_{j=1}^N\frac{1}{p(x_j)} - 2\Big(\frac1N\sum_{i=1}^N\frac{u_i}{p(x_i)}\Big)^2 \le \frac2N\sum_{i=1}^N\frac{u_i^2}{p(x_i)}\,\frac1N\sum_{j=1}^N\frac{1}{p(x_j)} \le \frac2N\sum_{i=1}^N\frac{u_i^2}{p(x_i)}\,\frac{1}{p_{\min}} = \frac{2}{p_{\min}}\|u\|_{p^{-1}}^2, \quad (A.44)$$

thus,

$$\tilde q^{(2)}_{\epsilon}(u) = \frac12\frac{1}{N^2}\sum_{i,j=1}^N\frac{H_\epsilon(x_i,x_j)}{p(x_i)p(x_j)}\mathbf{1}_{\{x_j\in B_{\delta_\epsilon}(x_i)\}}(u_i-u_j)^2 + \|u\|_{p^{-1}}^2\,O(\epsilon^{10}). \quad (A.45)$$

Apply (12), with the shorthand that $\tilde O(\epsilon)$ stands for $O(\epsilon(\log\frac1\epsilon)^{d/2+1})$:

$$\tilde q^{(2)}_{\epsilon}(u) = \frac12\frac{1}{N^2}\sum_{i,j=1}^N\frac{K_\epsilon(x_i,x_j)(1+\tilde O(\epsilon)) + O(\epsilon^{10})}{p(x_i)p(x_j)}\mathbf{1}_{\{x_j\in B_{\delta_\epsilon}(x_i)\}}(u_i-u_j)^2 + \|u\|_{p^{-1}}^2\,O(\epsilon^{10})$$
$$= (1+\tilde O(\epsilon))\frac12\frac{1}{N^2}\sum_{i,j=1}^N\frac{K_\epsilon(x_i,x_j)}{p(x_i)p(x_j)}\mathbf{1}_{\{x_j\in B_{\delta_\epsilon}(x_i)\}}(u_i-u_j)^2 + O(\epsilon^{10})\frac{1}{N^2}\sum_{i,j=1}^N\frac{(u_i-u_j)^2}{p(x_i)p(x_j)} + \|u\|_{p^{-1}}^2\,O(\epsilon^{10})$$
$$= (1+\tilde O(\epsilon))\frac12\frac{1}{N^2}\sum_{i,j=1}^N\frac{K_\epsilon(x_i,x_j)}{p(x_i)p(x_j)}\mathbf{1}_{\{x_j\in B_{\delta_\epsilon}(x_i)\}}(u_i-u_j)^2 + \|u\|_{p^{-1}}^2\,O(\epsilon^{10}). \quad \text{(by (A.44))}$$
The truncation for $K_\epsilon(x_i,x_j)$ gives that $K_\epsilon(x_i,x_j)\mathbf{1}_{\{x_j\notin B_{\delta_\epsilon}(x_i)\}} = O(\epsilon^{10})$, and then, similarly as in (A.45),

$$\frac12\frac{1}{N^2}\sum_{i,j=1}^N\frac{K_\epsilon(x_i,x_j)}{p(x_i)p(x_j)}\mathbf{1}_{\{x_j\in B_{\delta_\epsilon}(x_i)\}}(u_i-u_j)^2 = \frac12\frac{1}{N^2}\sum_{i,j=1}^N\frac{K_\epsilon(x_i,x_j)}{p(x_i)p(x_j)}(u_i-u_j)^2 - \|u\|_{p^{-1}}^2\,O(\epsilon^{10}). \quad (A.46)$$

By Lemma 6.4, and $m_2 = 2$ with Gaussian $h$, we have that under the good event $E_1$ of Lemma 6.1 1),

$$\tilde E_N(u) = \frac{1}{2\epsilon}\frac{1}{N^2}\sum_{i,j=1}^N\frac{W_{i,j}(u_i-u_j)^2}{p(x_i)p(x_j)}\Big(1+O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big), \quad \forall u\in\mathbb{R}^N,$$

and the constant in big-$O$ is determined by $(\mathcal{M},p)$ and uniform for all $u$. This gives that

$$\frac12\frac{1}{N^2}\sum_{i,j=1}^N\frac{K_\epsilon(x_i,x_j)}{p(x_i)p(x_j)}(u_i-u_j)^2 = \epsilon\tilde E_N(u)\Big(1+O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big), \quad (A.47)$$

and as a result, together with (A.46),

$$\tilde q^{(2)}_{\epsilon}(u) = (1+\tilde O(\epsilon))\Big(\epsilon\tilde E_N(u)\Big(1+O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big) - \|u\|_{p^{-1}}^2\,O(\epsilon^{10})\Big) + \|u\|_{p^{-1}}^2\,O(\epsilon^{10}) = \epsilon\tilde E_N(u)\Big(1+\tilde O(\epsilon) + O\Big(\sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big) + \|u\|_{p^{-1}}^2\,O(\epsilon^{10}).$$
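The truncation to the ball $B_{\delta_\epsilon}(x_i)$ used in this proof relies on the Gaussian kernel having only polynomially small mass beyond radius $\delta_\epsilon = \sqrt{c(d)\,\epsilon\log\frac1\epsilon}$. A quick one-dimensional check (the constant $c = 40$ below is an illustrative choice making the tail smaller than $\epsilon^{10}$; it is not a constant from the paper):

```python
import math

def gaussian_tail_mass(eps, c=40.0):
    """Mass of the 1D kernel (4*pi*eps)^{-1/2} exp(-s^2 / (4*eps))
    outside |s| > delta_eps = sqrt(c * eps * log(1/eps))."""
    delta = math.sqrt(c * eps * math.log(1.0 / eps))
    # for Z ~ N(0, 2*eps): P(|Z| > delta) = erfc(delta / (2*sqrt(eps)))
    return math.erfc(delta / (2.0 * math.sqrt(eps)))

# tail mass is below eps^{c/4} = eps^{10}, i.e. negligible vs. any O(eps) term
for eps in [0.1, 0.01, 0.001]:
    assert gaussian_tail_mass(eps) < eps ** 10
```

Since $\mathrm{erfc}(x) \le e^{-x^2}$ and here $x^2 = \frac{c}{4}\log\frac1\epsilon$, the tail is $\le \epsilon^{c/4}$, which is the mechanism behind the $O(\epsilon^{10})$-type remainders above.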
Recall that $\tilde E_N(u) = \frac1\epsilon u^T(\tilde D-\tilde W)u$; this proves (A.35).

To prove (A.36), since $0 < \alpha\epsilon < \epsilon_0$, apply Lemma 2.2 with $t = \alpha\epsilon$, and similarly as in (A.45),

$$\tilde q^{(2)}_{\alpha\epsilon}(u) = \frac12\frac{1}{N^2}\sum_{i,j=1}^N\frac{H_{\alpha\epsilon}(x_i,x_j)}{p(x_i)p(x_j)}\mathbf{1}_{\{x_j\in B_{\delta_{\alpha\epsilon}}(x_i)\}}(u_i-u_j)^2 + \|u\|_{p^{-1}}^2\,O(\epsilon^{10})$$
$$= \frac12\frac{1}{N^2}\sum_{i,j=1}^N\frac{K_{\alpha\epsilon}(x_i,x_j)(1+\tilde O(\alpha\epsilon)) + O((\alpha\epsilon)^{10})}{p(x_i)p(x_j)}\mathbf{1}_{\{x_j\in B_{\delta_{\alpha\epsilon}}(x_i)\}}(u_i-u_j)^2 + \|u\|_{p^{-1}}^2\,O(\epsilon^{10}) \quad \text{(by (12))}$$
$$= (1+\tilde O(\epsilon))\frac12\frac{1}{N^2}\sum_{i,j=1}^N\frac{K_{\alpha\epsilon}(x_i,x_j)}{p(x_i)p(x_j)}\mathbf{1}_{\{x_j\in B_{\delta_{\alpha\epsilon}}(x_i)\}}(u_i-u_j)^2 + \|u\|_{p^{-1}}^2\,O(\epsilon^{10}). \quad \text{(by (A.44))}$$

Then, using (29), (A.46) and (A.47),

$$\tilde q^{(2)}_{\alpha\epsilon}(u) \le (1+\tilde O(\epsilon))\,\alpha^{-d/2}\,\frac12\frac{1}{N^2}\sum_{i,j=1}^N\frac{K_\epsilon(x_i,x_j)}{p(x_i)p(x_j)}\mathbf{1}_{\{x_j\in B_{\delta_{\alpha\epsilon}}(x_i)\}}(u_i-u_j)^2 + \|u\|_{p^{-1}}^2\,O(\epsilon^{10})$$
$$= (1+\tilde O(\epsilon))\,\alpha^{-d/2}\Big(\epsilon\tilde E_N(u)\Big(1+O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big) - \|u\|_{p^{-1}}^2\,O(\epsilon^{10})\Big) + \|u\|_{p^{-1}}^2\,O(\epsilon^{10})$$
$$= \Big(1+\tilde O(\epsilon) + O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big)\alpha^{-d/2}\,\epsilon\tilde E_N(u) + \|u\|_{p^{-1}}^2\,O(\epsilon^{10}),$$

which proves (A.36) because $\tilde O(\epsilon) + O(\epsilon, \sqrt{\log N/(N\epsilon^{d/2})}) = o(1)$, and thus the constant in front of $\alpha^{-d/2}$ is less than 1.1 for sufficiently small $\epsilon$.

Proof of Theorem 6.7.
With sufficiently large $N$, we restrict to the intersection of the good events in Proposition 6.6 and the $K = k_{\max}+1$ good events of applying Theorem 6.2 to $\{\psi_k\}_{k=1}^K$. Because the good event in Proposition 6.6 is already contained in $E''_{UB}$ of Proposition 6.5, and in $E_1\cap E_2$ of Lemma 6.1, the extra good events in addition to what is needed in Proposition 6.6 are those corresponding to the good events in the proof of Theorem 6.2 where $f = \psi_k$, for each $1\le k\le K$; by a union bound, these happen w.p. $> 1 - K\cdot N^{-10}$. This gives the final high probability indicated in the theorem. In addition, $D_i > 0$, $\tilde D_i > 0$ for all $i$, and $\tilde L_{rw}$ is well-defined.

The rest of the proof follows a similar method as that of Theorem 5.5, but differs in the normalization of the eigenvectors and that of the eigenfunctions. With the definitions of $\|u\|_{\tilde D}$ and $\|u\|_{p^{-1}}$ in (59) and (A.30) respectively, as has been shown in (A.29), under $E_1\cap E_2$,

$$\|u\|_{\tilde D}^2 = \|u\|_{p^{-1}}^2\Big(1+O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big), \quad \forall u\in\mathbb{R}^N, \quad (A.48)$$

and the constant in big-$O$ is determined by $(\mathcal{M},p)$ and uniform for all $u$. This also gives that, with sufficiently large $N$,

$$\frac{0.9}{p_{\max}}\frac{\|u\|^2}{N} \le 0.9\,\|u\|_{p^{-1}}^2 \le \|u\|_{\tilde D}^2 \le 1.1\,\|u\|_{p^{-1}}^2 \le \frac{1.1}{p_{\min}}\frac{\|u\|^2}{N}, \quad \forall u\in\mathbb{R}^N, \quad (A.49)$$

because $\|u\|_{p^{-1}}^2 = \frac1N\sum_{i=1}^N\frac{u_i^2}{p(x_i)}$ is upper bounded by $\frac{1}{p_{\min}}\frac{\|u\|^2}{N}$ and lower bounded by $\frac{1}{p_{\max}}\frac{\|u\|^2}{N}$. Applying (A.49) to $u = v_k$ (the eigenvectors being normalized s.t. $\|v_k\|_{\tilde D}^2 = \frac1N$) gives

$$\frac{0.9}{p_{\max}}\|v_k\|^2 \le N\|v_k\|_{\tilde D}^2 = 1 \le \frac{1.1}{p_{\min}}\|v_k\|^2, \quad \text{that is,} \quad \sqrt{\frac{p_{\min}}{1.1}} \le \|v_k\| \le \sqrt{\frac{p_{\max}}{0.9}}, \quad 1\le k\le K,$$

i.e., $\|v_k\| = \Theta(1)$ under the high probability event.

Meanwhile, because the good event $E''_{UB}$ is contained in the one needed in Lemma D.1, as shown in the proof of Lemma D.1, we have that

$$\|\rho_X\psi_k\|_{p^{-1}}^2 = \frac1N\sum_{i=1}^N\frac{\psi_k(x_i)^2}{p(x_i)} = 1+O\Big(\sqrt{\tfrac{\log N}{N}}\Big), \quad 1\le k\le K,$$

where the constant in big-$O$ depends on $(\mathcal{M},p)$ and is uniform for all $k\le K$. By definition, $N\|\tilde\phi_k\|_{p^{-1}}^2 = \|\rho_X\psi_k\|_{p^{-1}}^2$, and then, applying (A.48) to $u = \tilde\phi_k$,

$$\|\tilde\phi_k\|_{\tilde D}^2 = \|\tilde\phi_k\|_{p^{-1}}^2\Big(1+O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big) = \frac1N\Big(1+O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big), \quad 1\le k\le K. \quad (A.50)$$

Step 2 for $\tilde L_{rw}$: When $k=1$, $\lambda_1 = 0$, and $v_1$ is always the constant vector, thus the discrepancy is zero. Consider $2\le k\le K$: by Theorem 6.2 and that $\|u\|\le\sqrt N\|u\|_\infty$,

$$\|\tilde L_{rw}\tilde\phi_k - \mu_k\tilde\phi_k\| = O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big), \quad 2\le k\le K. \quad (A.51)$$

Then, by (A.49), $\sqrt N\,\|\tilde L_{rw}\tilde\phi_k - \mu_k\tilde\phi_k\|_{\tilde D} = O(\|\tilde L_{rw}\tilde\phi_k - \mu_k\tilde\phi_k\|) = O(\epsilon, \sqrt{\log N/(N\epsilon^{d/2})})$; that is, there is $\mathrm{Err}_{pt} > 0$, s.t.

$$\sqrt N\,\|\tilde L_{rw}\tilde\phi_k - \mu_k\tilde\phi_k\|_{\tilde D} \le \mathrm{Err}_{pt}, \quad 2\le k\le K, \quad \mathrm{Err}_{pt} = O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big). \quad (A.52)$$

Meanwhile, because we are under $E''_{UB}$, (41) holds for $\lambda_k$. The proof then proceeds in the same way as Step 2 in Theorem 5.5, replacing $\frac DN$ with $\tilde D$. Specifically, let $S_k = \mathrm{Span}\{\tilde D^{1/2}v_k\}$, $S_k^\perp = \mathrm{Span}\{\tilde D^{1/2}v_j,\ j\ne k,\ 1\le j\le N\}$. We then have

$$P_{S_k^\perp}\big(\tilde D^{1/2}\mu_k\tilde\phi_k\big) = \tilde D^{1/2}\sum_{j\ne k,j=1}^N\frac{v_j^T\tilde D\tilde\phi_k}{\|v_j\|_{\tilde D}^2}\,\mu_kv_j,$$

and because

$$\tilde L_{rw}^T\tilde Dv_j = \tfrac1\epsilon(I-\tilde W\tilde D^{-1})\tilde Dv_j = \tfrac1\epsilon(\tilde D-\tilde W)v_j = \tilde D\lambda_jv_j, \quad (A.53)$$

we also have $P_{S_k^\perp}\big(\tilde D^{1/2}\tilde L_{rw}\tilde\phi_k\big) = \tilde D^{1/2}\sum_{j\ne k,j=1}^N\frac{v_j^T\tilde D\tilde\phi_k}{\|v_j\|_{\tilde D}^2}\lambda_jv_j$. Taking the subtraction $P_{S_k^\perp}\big(\tilde D^{1/2}(\tilde L_{rw}\tilde\phi_k - \mu_k\tilde\phi_k)\big)$ and doing the same calculation as before, by (A.52), gives

$$\big\|P_{S_k^\perp}\big(\tilde D^{1/2}\tilde\phi_k\big)\big\| = \Big(\sum_{j\ne k,j=1}^N\frac{|v_j^T\tilde D\tilde\phi_k|^2}{\|v_j\|_{\tilde D}^2}\Big)^{1/2} \le \frac{\mathrm{Err}_{pt}}{\sqrt N\,\gamma_K} = \frac{1}{\sqrt N}\,O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big). \quad (A.54)$$

We similarly define $\beta_k := \frac{v_k^T\tilde D\tilde\phi_k}{\|v_k\|_{\tilde D}^2}$, so that $\beta_k\tilde D^{1/2}v_k = P_{S_k}\tilde D^{1/2}\tilde\phi_k$, and $P_{S_k^\perp}\big(\tilde D^{1/2}\tilde\phi_k\big) = \tilde D^{1/2}\tilde\phi_k - P_{S_k}\tilde D^{1/2}\tilde\phi_k = \tilde D^{1/2}\big(\tilde\phi_k - \beta_kv_k\big)$. Then, by (A.54), we have $\|\tilde\phi_k - \beta_kv_k\|_{\tilde D} = \|P_{S_k^\perp}(\tilde D^{1/2}\tilde\phi_k)\| = \frac{1}{\sqrt N}O(\epsilon, \sqrt{\log N/(N\epsilon^{d/2})})$, and by (A.49),

$$\|\tilde\phi_k - \beta_kv_k\| = O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big).$$

To finish Step 2, it remains to show that $|\beta_k| = 1+o(1)$, and then we define $\alpha_k = \beta_k^{-1}$.
Note that

$$\|\tilde\phi_k\|_{\tilde D}^2 = \|\tilde D^{1/2}\tilde\phi_k\|^2 = \big\|P_{S_k^\perp}\big(\tilde D^{1/2}\tilde\phi_k\big)\big\|^2 + \big\|P_{S_k}\big(\tilde D^{1/2}\tilde\phi_k\big)\big\|^2 = \big\|P_{S_k^\perp}\big(\tilde D^{1/2}\tilde\phi_k\big)\big\|^2 + \beta_k^2\|v_k\|_{\tilde D}^2. \quad (A.55)$$

By that $\|v_k\|_{\tilde D}^2 = \frac1N$, inserting (A.54) and (A.50) into (A.55),

$$\frac1N\Big(1+O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big) = \Big(\frac{1}{\sqrt N}O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big)\Big)^2 + \frac{\beta_k^2}{N},$$

which gives that $1+o(1) = o(1) + \beta_k^2$ by multiplying $N$ to both sides.

Step 3 for $\tilde L_{rw}$: The proof is the same as Step 3 in Theorem 5.5, replacing $\frac DN$ with $\tilde D$. Specifically, using the relation (A.53), and the eigenvector consistency in Step 2, we have

$$|\lambda_k - \mu_k|\,\big|v_k^T\tilde D\tilde\phi_k\big| \le |\alpha_k|\,\big|\tilde\phi_k^T\tilde D\tilde L_{rw}\tilde\phi_k - \mu_k\|\tilde\phi_k\|_{\tilde D}^2\big| + \big|\varepsilon_k^T\tilde D(\tilde L_{rw}\tilde\phi_k - \mu_k\tilde\phi_k)\big| =: ① + ②,$$

where $\|\varepsilon_k\|_{\tilde D} = \frac{1}{\sqrt N}O(\epsilon, \sqrt{\log N/(N\epsilon^{d/2})})$ and $\alpha_k = 1+o(1)$. By (A.26), $\tilde\phi_k^T\tilde D\tilde L_{rw}\tilde\phi_k = \tilde E_N(\tilde\phi_k) = \frac1N\big(\mu_k + O(\epsilon, \sqrt{\log N/(N\epsilon^{d/2})})\big)$. Together with (A.50), one can show that $N\,① = O(\epsilon, \sqrt{\log N/(N\epsilon^{d/2})})$. For ②, with (A.52), one can verify that $② \le \|\varepsilon_k\|_{\tilde D}\,\|\tilde L_{rw}\tilde\phi_k - \mu_k\tilde\phi_k\|_{\tilde D} = \frac1N O(\mathrm{Err}_{pt}^2) = \frac{O(\epsilon)}{N}$, where we used that $O(\mathrm{Err}_{pt}^2) = O(\epsilon)$ same as before. Putting together, and with the definition of $\beta_k$ above,

$$|\lambda_k - \mu_k||\beta_k| \le \frac{① + ②}{\|v_k\|_{\tilde D}^2} = \frac{\big(O\big(\epsilon, \sqrt{\log N/(N\epsilon^{d/2})}\big) + O(\epsilon)\big)/N}{1/N} = O\Big(\epsilon, \sqrt{\tfrac{\log N}{N\epsilon^{d/2}}}\Big).$$

We have shown that $|\beta_k| = 1+o(1)$; thus the bound of $|\lambda_k - \mu_k|$ is proved, and it holds for all $k\le k_{\max}$.
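The eigenvalue convergence proved above can be observed numerically on the unit circle, where $-\Delta$ has spectrum $0, 1, 1, 4, 4, \ldots$ ($\mu = k^2$ with multiplicity two). The sketch below is an assumed minimal setup, using a uniform grid rather than random samples so that the sampling-error term is suppressed and only the $O(\epsilon)$ bias remains; the tolerances are illustrative:

```python
import numpy as np

N, eps = 1000, 0.01
theta = 2 * np.pi * np.arange(N) / N
X = np.stack([np.cos(theta), np.sin(theta)], axis=1)

d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-d2 / (4 * eps))                      # Gaussian affinity
D = W.sum(axis=1)

# eigenvalues of L_rw = (I - D^{-1} W)/eps via the symmetric conjugate
S = W / np.sqrt(np.outer(D, D))
lam = (1.0 - np.linalg.eigvalsh(S)[::-1]) / eps  # ascending Laplacian spectrum

# low-lying spectrum of -Delta on S^1: 0, 1, 1, 4, 4, ...
assert abs(lam[0]) < 1e-8
assert abs(lam[1] - 1.0) < 0.1 and abs(lam[2] - 1.0) < 0.1
assert abs(lam[3] - 4.0) < 0.4 and abs(lam[4] - 4.0) < 0.4
```

Shrinking $\epsilon$ (while keeping enough neighbors per point) tightens the agreement for the low-lying eigenvalues, consistent with the rates stated in the theorems.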