Metric entropy, n-widths, and sampling of functions on manifolds

Abstract

We first investigate the asymptotics of the Kolmogorov metric entropy and nonlinear n-widths of approximation spaces on some function classes on manifolds and quasi-metric measure spaces. Secondly, we develop constructive algorithms to represent those functions within a prescribed accuracy. The constructions can be based on either spectral information or scattered samples of the target function. Our algorithmic scheme is asymptotically optimal in the sense of nonlinear n-widths and asymptotically optimal up to a logarithmic factor with respect to the metric entropy.


arXiv [math.NA]

Martin Ehler

University of Vienna, Department of Mathematics, Oskar-Morgenstern-Platz 1, A-1090 Vienna

Frank Filbir

Faculty of Mathematics, Technische Universität München, Boltzmannstrasse 3, 85748 Garching, Germany, and Helmholtz Zentrum München, Ingolstädter Landstrasse 1, D-85764 Neuherberg


1. Introduction

In classical computational mathematics, it is customary to represent a function by using finitely many parameters, e.g., the coefficients of some truncated series expansion. Representing a function in terms of binary bits rather than a sequence of real numbers is the problem of quantization. The integers thus obtained are represented as a bit string to which coding techniques can be applied to achieve a final compression. In wireless communication, for instance, one needs to transform an analogue signal into a stream of bits, from which the original signal should be recovered at the receiving end with minimal distortion.

The theory of bit representation of functions pre-dates these modern requirements and was already studied by Kolmogorov. The notion of metric entropy in the sense of Kolmogorov gives a measurement of the minimal number of bits needed to represent an arbitrary function from a compact subset of a function space. Babenko, Kolmogorov, Tikhomirov, Vitushkin, and Yerokhin have given many estimates on the metric entropy for several compact subsets of the standard function spaces, cf. [21], [40, Chapter 2, 3]. Constructive algorithms were derived in [26] to represent functions in suitably defined Besov spaces on the sphere using asymptotically the same number of bits as the metric entropy of these classes, except for a logarithmic factor. A generalization was obtained for compact smooth Riemannian manifolds $X$ and global approximation spaces in [11]. Periodic function classes were considered in [8, 37]. Related measures of complexity are n-widths [20], which were studied for some classical function spaces in [32, 33]; see [6] for the concept of nonlinear n-widths. We also refer to [1, Chapter 1], [10, Sections 1.3, 3, 4], [19, Sections 6.3, 6.4] for related results.

Both concepts, metric entropy and n-widths, are important complexity measures for the analysis of functions on high-dimensional datasets occurring in biology, medicine, and related areas. Many computational schemes are categorized into the field of manifold learning, where functions need to be learned from finitely many training data that are assumed to lie on some (unknown) manifold [2, 4, 30, 35, 36, 38]. While much of the recent research in this direction focuses on the understanding of data geometry, approximation theory methods were introduced in [12, 13, 25, 27, 28].

The purpose of the present paper is to generalize results on metric entropy and n-widths to the context of sampled functions on quasi-metric measure spaces, covering a larger scale of approximation spaces. Indeed, we determine the asymptotics of the metric entropy and nonlinear n-widths for global approximation spaces. We also provide a computational approximation method, and this scheme is asymptotically optimal with respect to the nonlinear n-widths and asymptotically optimal up to a logarithmic factor in the sense of the metric entropy.

In addition to obtaining theoretical bounds on the metric entropy and n-widths, our results have the following notable features:
- The computational scheme is based on a linear approximation operator to asymptotically match the optimal bounds in the sense of n-widths.
- We give explicit schemes for converting the target function into a near minimal number of bits by combining the linear approximation operator with linear quantization, and we derive a reconstruction scheme from such bits to a prescribed accuracy.
- Our constructions can deal with both spectral information and finitely many training data consisting of function evaluations at scattered data points.

In addition, we shall investigate the asymptotics of the metric entropy of local approximation spaces.

The outline of this paper is as follows: In Section 2 we introduce the setting and define metric entropy and n-widths. The asymptotics of the metric entropy of global (and under additional assumptions also local) approximation spaces is determined in Section 2.3. In Section 3, we introduce our approximation schemes for global approximation spaces and compute the asymptotics of the n-widths of global approximation spaces. In Section 4 we verify that linear quantization of the approximation scheme leads to optimal bit representations up to a logarithmic factor for the global approximation space. Local versions of approximation spaces are considered in Section 5. For the reader's convenience, Appendix A contains a list and brief discussion of the technical assumptions used for the main results of the present paper.

Email addresses: [email protected] (Martin Ehler), [email protected] (Frank Filbir)

Preprint submitted to JAT, August 15, 2018

2. Approximation spaces and their metric entropy and n-widths

We first fix the setting and introduce some technical assumptions used throughout the paper. Let $(X, \rho)$ be a quasi-metric space, i.e., a space with a nonnegative symmetric map $\rho$ satisfying $\rho(x,y) = 0$ if and only if $x = y$, and for which the triangle inequality holds at least up to a constant factor $c > 0$, i.e.,

$$\rho(x,y) \le c\,\bigl(\rho(x,z) + \rho(z,y)\bigr), \quad \text{for all } x, y, z \in X.$$

The quasi-metric $\rho$ induces a topology and we assume that $X$ is endowed with a Borel probability measure $\mu$. The system $\{\varphi_k\}_{k=0}^{\infty} \subset L_2(X, \mu)$ is supposed to be an orthonormal basis of continuous real-valued functions with $\varphi_0 \equiv 1$, and $\{\lambda_k\}_{k=0}^{\infty}$ is a sequence of real numbers such that $\lambda_0 = 0$ and $\lambda_k \to \infty$ as $k \to \infty$. Let $N$ be a positive integer; we shall restrict ourselves to $N = 2^n$, where $n$ is some nonnegative integer. The space of diffusion polynomials up to degree $N$ is $\Pi_N := \operatorname{span}\{\varphi_k : \lambda_k \le N\}$, and the generalized heat kernel is

$$G_t(x,y) = \sum_{k=0}^{\infty} \exp(-\lambda_k^2 t)\,\varphi_k(x)\varphi_k(y), \quad t > 0. \tag{1}$$

We use the symbol $\lesssim$ when the left-hand side is bounded by a generic positive constant times the right-hand side, $\gtrsim$ is used analogously, and $\asymp$ means that both $\lesssim$ and $\gtrsim$ hold. Now, we can summarize the technical assumptions that are related to so-called upper and lower Gaussian bounds on the generalized heat kernel:

Definition 2.1 ([3]). Under the above notation, a quasi-metric space $X$ is called a diffusion measure space if there is some $\alpha > 0$ such that the following holds:
(i) For all $x \in X$ and $t > 0$, the closed ball $B_t(x)$ of radius $t$ at $x$ is compact and $\mu(B_t(x)) \lesssim t^{\alpha}$, $x \in X$, $t > 0$.
(ii) There is $c > 0$ such that $|G_t(x,y)| \lesssim t^{-\alpha/2} \exp\bigl(-c\,\rho(x,y)^2 / t\bigr)$, $x, y \in X$, $0 < t \le 1$.
(iii) We have $t^{-\alpha/2} \lesssim G_t(x,x)$, $x \in X$, $0 < t < 1$.

From here on, we suppose that $X$ is a diffusion measure space throughout the present paper. It is also noteworthy that the conditions of a diffusion measure space imply that $\mu(B_t(x)) \asymp t^{\alpha}$, for all $0 < t < 1$, cf. [12]. Thus, the volume of small balls essentially behaves as in $\mathbb{R}^{\alpha}$. The above conditions also imply the following estimate on the Christoffel function (or spectral function),

$$\sum_{\lambda_k \le t} |\varphi_k(x)|^2 \asymp t^{\alpha}, \quad x \in X, \ t \ge 1, \tag{2}$$

see [3, 12, 13] for a discussion and references. By integrating over $X$, we obtain

$$\dim(\Pi_t) \asymp t^{\alpha}. \tag{3}$$

Remark 2.2.

It was pointed out in [3] that all technical assumptions are satisfied when $X \subset \mathbb{R}^d$ is an $\alpha$-dimensional compact, connected Riemannian manifold without boundary, with non-negative Ricci curvature, geodesic distance $\rho$, and $\mu$ being the Riemannian volume measure on $X$ normalized with $\mu(X) = 1$, $\{\varphi_k\}_{k=0}^{\infty}$ are the eigenfunctions of the Laplace-Beltrami operator on $X$, and $\{-\lambda_k^2\}_{k=0}^{\infty}$ are the corresponding eigenvalues arranged in nonincreasing order, see also [18]. In this case, (3) is consistent with Weyl's law. For further discussions, we refer to [12, 13, 27].

Given an arbitrary normed space $Z$ and a subset $Y \subset Z$, we define, for $f \in Z$,

$$E(f, Y, Z) := \inf_{g \in Y} \|f - g\|_Z. \tag{4}$$

As we have pointed out already, we assume $X$ is a diffusion measure space throughout. The notation $|_B$ in the following means restricting functions to the set $B$.

Definition 2.3. For a nonempty ball $B \subset X$ and $1 \le p \le \infty$, the approximation space of order $s > 0$ is

$$A^s(L_p(B)) = \{ f \in L_p(B) : \|f\|_{A^s(L_p(B))} < \infty \}, \tag{5}$$

where the associated norm is given by

$$\|f\|_{A^s(L_p(B))} := \|f\|_{L_p(B)} + \sup_{N \ge 1} N^s E(f, \Pi_N|_B, L_p(B)).$$

The unit ball in $A^s(L_p(B))$ is denoted by

$$\bar{A}^s(L_p(B)) := \{ f \in L_p(B) : \|f\|_{A^s(L_p(B))} \le 1 \}. \tag{6}$$

For a nonempty ball $B \subset X$, let us denote the closure of $\bigcup_N \Pi_N|_B$ in $L_p(B)$ by $X_p(B)$. We can simply switch from $L_p(B)$ to $X_p(B)$ in the definition of the approximation space without changing anything, so that we observe $A^s(L_p(B)) = A^s(X_p(B))$.

Remark 2.4.

In classical situations, the smoothness of a function is related to the accuracy of its approximation; for a more classical characterization of the approximation spaces in terms of pseudo-differential operators and K-functionals, we refer to [25]. In turn, it has also become customary to consider the accuracy of approximation itself as a measurement of smoothness. For classical results on approximation spaces, we refer to [7, Chapter 7] and references therein.

2.2. Metric entropy and n-widths

Metric entropy as studied in [23] refers to the minimal number of bits needed to represent a function $f$ up to precision $\varepsilon$. This number determines the maximal compression when loss of information is bounded by $\varepsilon$. For a more rigorous mathematical exposition, let $Y$ be a compact subset of a metric space $(Z, \varrho)$. Given $\varepsilon > 0$, let $N_\varepsilon(Y)$ be the $\varepsilon$-covering number of $Y$ in $Z$, i.e., the minimal number of balls of radius $\varepsilon$ that cover $Y$. Suppose that $g_1, \ldots, g_{N_\varepsilon(Y)}$ is a list of centers of these balls. Given any $f \in Y$, there is $g_j$ such that $\varrho(f, g_j) \le \varepsilon$. We may then represent $f$ using the binary representation of $j$, and use $g_j$ as the reconstruction of $f$ based on this representation. Any binary enumeration of these centers takes $\lceil \log_2(N_\varepsilon(Y)) \rceil$ many bits, which measures the complexity of $Y$; here $\lceil \cdot \rceil$ denotes the ceiling function.

Definition 2.5.

Let $Y$ be a compact subset of a metric space $(Z, \varrho)$ and, for $\varepsilon > 0$, let $N_\varepsilon(Y)$ be the $\varepsilon$-covering number of $Y$ in $Z$. Then

$$H_\varepsilon(Y, Z) := \log_2(N_\varepsilon(Y)) \tag{7}$$

is called the metric entropy of $Y$ in $Z$.

Thus, the smallest integer not less than the metric entropy is the minimal number of bits necessary to represent any $f \in Y$ with precision $\varepsilon$.

Let us also introduce some alternative notions of complexity. Given some Banach space $Z$, let $n \ge 1$ and let $M_n : \mathbb{R}^n \to Z$ be some mapping, for which $Z_n := M_n(\mathbb{R}^n)$ denotes its range. For a compact subset $Y \subset Z$, we define the worst case error of approximation, consistently with (4), by

$$E(Y, Z_n, Z) := \sup_{f \in Y} E(f, Z_n, Z).$$

We equip $Y$ with some quasi-norm and consider continuous maps $\bar{a} : Y \to \mathbb{R}^n$. For fixed $\bar{a}$, the term $M_n(\bar{a}(f)) \in Z$ is an approximation of $f$ from $Z_n$. The quantity

$$E(Y, \bar{a}, M_n, Z) := \sup_{f \in Y} \|f - M_n(\bar{a}(f))\|_Z$$

is the error of approximation by the nonlinear method of approximation $M_n(\bar{a}(\cdot)) : Y \to Z$. The continuous nonlinear n-width is defined in [6] by

$$d_n(Y, Z) := \inf_{\bar{a}, M_n} E(Y, \bar{a}, M_n, Z),$$

where the infimum runs over all continuous maps $\bar{a} : Y \to \mathbb{R}^n$ and all mappings $M_n : \mathbb{R}^n \to Z$. For further considerations on n-widths, we refer to [6, 9, 14, 29, 31, 39].

In the subsequent sections we shall compute the metric entropy (7) and the continuous nonlinear n-widths of the unit ball $\bar{A}^s(X_p(X))$ given by (6). We refer to [17] for related results.

We shall determine the asymptotics of the metric entropy of global and local approximation spaces, so let $B \subset X$ be some nonempty ball, which is allowed to coincide with $X$. Since $A^s(X_p(B))$ is not finite-dimensional, its unit ball $\bar{A}^s(X_p(B))$ is not compact in the approximation space. Here, we consider the unit ball $\bar{A}^s(X_p(B))$ as a subset of $X_p(B)$, in which it is compact, see [11] for $B = X$; the case $B \subsetneq X$ can be proven analogously.

The following result for the approximation space extends findings in [26] from the sphere to diffusion measure spaces.

Theorem 2.6.

Let $X$ be a diffusion measure space and suppose we are given a nonempty ball $B \subset X$ such that $\{\varphi_k|_B\}_{k=0}^{\infty}$ are linearly independent. If $s > 0$ is fixed, then the unit ball $\bar{A}^s(X_p(B))$ satisfies

$$H_\varepsilon(\bar{A}^s(X_p(B)), X_p(B)) \asymp (1/\varepsilon)^{\alpha/s}, \quad \text{for all } 0 < \varepsilon \le 1, \tag{8}$$

where $\alpha$ is the constant in Definition 2.1.

This result was verified in [11] provided that $B = X$ and $p = \infty$. Note that the linear independence condition is trivially satisfied for $B = X$ since $\{\varphi_k\}_{k=0}^{\infty}$ is even an orthonormal basis. In case $B$ is a proper subset of $X$, the linear independence condition may be hard to check in general. However, if $X$ is a compact, connected, real-analytic Riemannian manifold (equipped with a real analytic Riemannian metric), then the eigenfunctions $\{\varphi_k\}_{k=0}^{\infty}$ of the Laplace-Beltrami operator are real analytic due to the real analytic hypoellipticity of elliptic partial differential operators, cf. [22, Section 5.3]. In this situation, the global linear independence implies the linear independence of the restrictions. For results related to Theorem 2.6, see [9, 31, 32].

The proof of Theorem 2.6 is based on a general Banach space result, also used in [26, Theorem 4.1]. Let $Z$ be a Banach space and $\{\phi_k\}_{k=1}^{\infty} \subset Z$ be a sequence of linearly independent elements whose linear span is dense in $Z$, and define $Z_k := \operatorname{span}\{\phi_1, \ldots, \phi_k\}$ with $Z_0 = \{0\}$. Let $\{\delta_k\}_{k=0}^{\infty}$ be a nonincreasing sequence of positive numbers with $\lim_{k \to \infty} \delta_k = 0$ and define

$$A(Z; \{\delta_k\}_{k=0}^{\infty}, \{\phi_k\}_{k=1}^{\infty}) := \{ f \in Z : E(f, Z_k, Z) \le \delta_k, \text{ for } k = 0, 1, 2, \ldots \}. \tag{9}$$

The following result goes back to Lorentz in [23]:

Theorem 2.7 (Theorem 3.3 in [24]). Let $\{\delta_k\}_{k=0}^{\infty}$ be a nonincreasing sequence of positive numbers such that $\delta_{2k} \le c\,\delta_k$, for $k = 1, 2, \ldots$ and some constant $c \in (0, 1)$. For $\ell \ge 1$, let $M_\ell := \min\{k : \delta_k \le e^{-\ell}\}$; then we have, for $0 < \varepsilon \le 1$,

$$H_\varepsilon\bigl(A(Z; \{\delta_k\}_{k=0}^{\infty}, \{\phi_k\}_{k=1}^{\infty}), Z\bigr) \asymp \sum_{\ell=1}^{L} M_\ell, \tag{10}$$

where $L := 2 + \lfloor \log(1/\varepsilon) \rfloor$.

Proof of Theorem 2.6.
We aim to apply Theorem 2.7 with the function system $\{\varphi_k|_B\}_{k=0}^{\infty}$ and $Z = X_p(B)$. There, the index set is supposed to start with $k = 1$, so we set $\phi_k = \varphi_{k-1}|_B$, $k = 1, 2, \ldots$. To define the sequence $\{\delta_k\}_{k=0}^{\infty}$, we need some preparation. The linear independence assumption yields that (3) implies $\dim(\Pi_N|_B) \asymp N^{\alpha}$. By using $Z_k := \operatorname{span}\{\varphi_0|_B, \ldots, \varphi_{k-1}|_B\}$, we derive, for $N^{\alpha} \le k \le (2N)^{\alpha}$,

$$(2N)^s E(f, \Pi_{2N}|_B, X_p(B)) \lesssim k^{s/\alpha} E(f, Z_k, X_p(B)) \lesssim N^s E(f, \Pi_N|_B, X_p(B)).$$

Therefore, there are constants $C_i \ge 1$, for $i = 1, 2$, such that the definitions $\delta_{1;0} = \tfrac{1}{2}$, $\delta_{1;k} := (2C_1)^{-1} k^{-s/\alpha}$, and $\delta_{2;0} = C_2$, $\delta_{2;k} := C_2 k^{-s/\alpha}$, lead to

$$A(X_p(B); \{\delta_{1;k}\}_{k=0}^{\infty}, \{\phi_k\}_{k=1}^{\infty}) \subset \bar{A}^s(X_p(B)) \subset A(X_p(B); \{\delta_{2;k}\}_{k=0}^{\infty}, \{\phi_k\}_{k=1}^{\infty}),$$

where $\bar{A}^s(X_p(B))$ is the unit ball given by (6). This also yields

$$N_\varepsilon\bigl(A(X_p(B); \{\delta_{1;k}\}_{k=0}^{\infty}, \{\phi_k\}_{k=1}^{\infty})\bigr) \le N_\varepsilon(\bar{A}^s(X_p(B))) \le N_\varepsilon\bigl(A(X_p(B); \{\delta_{2;k}\}_{k=0}^{\infty}, \{\phi_k\}_{k=1}^{\infty})\bigr).$$

Since $\delta_{i;2k} \le c\,\delta_{i;k}$, for $c := 2^{-s/\alpha} \in (0, 1)$, Theorem 2.7 applies to both sequences. We obtain $\sum_{\ell=1}^{L} M_\ell \asymp e^{L\alpha/s}$, so that the choice of $L$ in (10) implies (8).

It is obvious that increased precision requires more bits, and smoother functions can be represented with fewer bits. The growth rate (8) reflects these thoughts in a quantitative fashion and serves as a benchmark for function representation on diffusion measure spaces. The remaining part of the present work is dedicated to developing a scheme that matches the optimality bound at least up to a logarithmic factor.
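As a rough numerical illustration of how Theorem 2.7 produces the rate in (8), the following sketch evaluates $\sum_{\ell=1}^{L} M_\ell$ for the model sequence $\delta_k = k^{-s/\alpha}$ and compares it with $(1/\varepsilon)^{\alpha/s}$. The function names and the parameter values $\alpha = 2$, $s = 1$ are assumptions made for illustration only, and the constants $C_i$ from the proof are ignored.

```python
import math


def entropy_sum(eps, alpha, s):
    """Evaluate sum_{l=1}^{L} M_l from Theorem 2.7 for delta_k = k**(-s/alpha).

    For this model sequence (an assumption of the sketch),
    M_l = min{k >= 1 : delta_k <= exp(-l)} = ceil(exp(l * alpha / s)),
    and L = 2 + floor(log(1/eps)).
    """
    L = 2 + math.floor(math.log(1.0 / eps))
    return sum(math.ceil(math.exp(l * alpha / s)) for l in range(1, L + 1))


def entropy_ratio(eps, alpha, s):
    """Ratio of the sum to the predicted growth (1/eps)**(alpha/s)."""
    return entropy_sum(eps, alpha, s) / (1.0 / eps) ** (alpha / s)


if __name__ == "__main__":
    for eps in (1e-1, 1e-2, 1e-3):
        print(eps, entropy_sum(eps, 2.0, 1.0), entropy_ratio(eps, 2.0, 1.0))
```

In this toy setting the ratio stays between two fixed positive constants as $\varepsilon$ decreases, which is what the relation $\asymp$ in (8) asserts; it does not converge, because $L$ jumps with $\lfloor \log(1/\varepsilon) \rfloor$.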

3. Global function approximation

This section is dedicated to introducing our approximation scheme, discussing its characterization of certain function spaces, and determining the asymptotics of the n-widths. We now collect a few ingredients for our approximation scheme.

Definition 3.1.

We call an infinitely often differentiable and non-increasing function $H : \mathbb{R}_{\ge 0} \to \mathbb{R}$ a low-pass filter if $H(t) = 1$ for $t \le 1/2$ and $H(t) = 0$ for $t \ge 1$.

For a low-pass filter $H$, the kernel

$$K_N(x, y) := \sum_{k=0}^{\infty} H\Bigl(\frac{\lambda_k}{N}\Bigr) \varphi_k(x) \varphi_k(y), \tag{11}$$

cf. [12, 25, 27], is localized, i.e., for fixed $r > \alpha$ and all $x \ne y$ with $N = 1, 2, \ldots$,

$$|K_N(x, y)| \lesssim \frac{N^{\alpha - r}}{\rho(x, y)^r}. \tag{12}$$

Alternatively, we also have, for fixed $r > \alpha$ and all $x, y$ with $N = 1, 2, \ldots$,

$$|K_N(x, y)| \lesssim \frac{N^{\alpha}}{\max\bigl(1, (N \rho(x, y))^r\bigr)}. \tag{13}$$

We find in [12, Inequality (3.12)] that

$$\sup_{y \in X} \int_X |K_N(x, y)| \, d\mu(x) \lesssim 1. \tag{14}$$

For a signed measure $\nu$ on $X$ and $f \in L_1(X, |\nu|)$, we can define, for $N = 1, 2, \ldots$,

$$\sigma_N(f, \nu) := \sum_{k=0}^{\infty} H\Bigl(\frac{\lambda_k}{N}\Bigr) \int_X f(x) \varphi_k(x) \, d\nu(x) \, \varphi_k = \int_X f(x) K_N(x, \cdot) \, d\nu(x). \tag{15}$$

This section is dedicated to introducing further ingredients to develop our approximation scheme. We first aim to replace the integral over diffusion polynomials with a finite sum or at least with an integral over a "simpler" measure.
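To make the objects in (11) and (15) concrete, here is a minimal sketch on the unit circle $[0, 1)$ with normalized Lebesgue measure. It assumes the trigonometric system $1, \sqrt{2}\cos(2\pi k x), \sqrt{2}\sin(2\pi k x)$ with $\lambda = k$, a particular smooth filter built from $\exp(-1/u)$, and an equal-weight rule on equispaced nodes in place of $\nu$. None of these choices comes from the paper; the names `lowpass`, `kernel`, and `sigma` are hypothetical.

```python
import math


def lowpass(t):
    """Smooth, non-increasing filter with H = 1 on [0, 1/2] and H = 0 on [1, inf)."""
    if t <= 0.5:
        return 1.0
    if t >= 1.0:
        return 0.0
    a = math.exp(-1.0 / (1.0 - t))  # vanishes to infinite order as t -> 1
    b = math.exp(-1.0 / (t - 0.5))  # vanishes to infinite order as t -> 1/2
    return a / (a + b)


def kernel(N, x, y):
    """K_N(x, y) on the circle: 1 + 2 * sum_k H(k/N) * cos(2*pi*k*(x - y)).

    Terms with k >= N vanish because H(k/N) = 0 there.
    """
    return 1.0 + 2.0 * sum(
        lowpass(k / N) * math.cos(2.0 * math.pi * k * (x - y)) for k in range(1, N)
    )


def sigma(f, N, y, M):
    """sigma_N(f, nu)(y) as in (15), with nu = equal weights 1/M at the nodes j/M."""
    return sum(f(j / M) * kernel(N, j / M, y) for j in range(M)) / M


if __name__ == "__main__":
    f = lambda x: 1.0 + math.cos(2.0 * math.pi * 3.0 * x)
    print(abs(sigma(f, 8, 0.3, 32) - f(0.3)))   # reproduction of a low-degree f
    print(kernel(32, 0.0, 0.0), kernel(32, 0.0, 0.25))  # diagonal vs. off-diagonal
```

Since $H(k/N) = 1$ for $k \le N/2$, the operator reproduces trigonometric polynomials of degree at most $N/2$, and the off-diagonal kernel values are small compared with the diagonal, in line with (12) and (13).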

Definition 3.2.

A signed Borel measure $\nu$ on $X$ is called a quadrature measure of order $N$ if

$$\int_X f(x) g(x) \, d\mu(x) = \int_X f(x) g(x) \, d\nu(x), \quad \text{for all } f, g \in \Pi_N.$$

Since $\Pi_N$ is finite dimensional, the collection of products is so as well and, hence, there do exist finitely supported quadrature measures, cf. [5], [34, Section 1.5]. Note that we request exact integration of products of functions in $\Pi_N$, which leads us to the following product assumption on the structure of diffusion polynomials, which is also used in [13]: for $f \in L_p(X)$, let

$$\operatorname{dist}(f, \Pi_N)_{L_\infty} := \inf_{h \in \Pi_N} \|f - h\|_{L_\infty}$$

and assume that there is a constant $a \ge 2$ such that

$$\epsilon_N := \sup_{\lambda_\ell, \lambda_k \le N} \operatorname{dist}(\varphi_\ell \varphi_k, \Pi_{aN})_{L_\infty} \tag{16}$$

satisfies $N^m \epsilon_N \to 0$ as $N \to \infty$, for all $m > 0$.

Definition 3.3.

Let the product assumption hold. For fixed $1 \le p \le \infty$, a family $(\mu_N)_{N=1}^{\infty}$ of positive quadrature measures of order $N$, respectively, is called a family of Marcinkiewicz-Zygmund quadrature measures of order $N$, respectively, if

$$\|f\|_{|\mu_N|, L_p(X)} \asymp \|f\|_{L_p(X)}, \quad \text{for all } f \in \Pi_N, \tag{17}$$

where $|\mu_N|$ denotes the total variation measure of $\mu_N$ and $\|f\|_{|\mu_N|, L_p(X)}$ denotes the $L_p$-norm of $f$ with respect to $|\mu_N|$.

Under fairly general assumptions, the results in [12, Theorem 3.1] and [13, Theorem 5.8] imply the existence of a family of finitely supported Marcinkiewicz-Zygmund quadrature measures $(\mu_N)_{N=1}^{\infty}$ of order $N$, respectively, such that $\#\operatorname{supp}(\mu_N) \asymp N^{\alpha}$, where $\alpha$ is as in Definition 2.1.

3.3. Widths and characterization of approximation spaces

Fix $1 \le p \le \infty$ and suppose that $(\mu_N)_{N=1}^{\infty}$ is a family of Marcinkiewicz-Zygmund quadrature measures of order $N$, respectively. According to [12, Inequality (3.13)], we have

$$\|\sigma_N(f, \mu_N)\|_{L_p} \lesssim \|f\|_{|\mu_N|, L_p}, \quad f \in L_p(X, |\mu_N|), \tag{18}$$

and the generic constant can be chosen independently of $f$ and $N$. Later, we shall also need that (17) extends to all functions in $X_p(X)$, i.e., that $\|f\|_{|\mu_N|, L_p} \lesssim \|f\|_{L_p(X)}$ holds for all $f \in X_p(X)$. The latter is fine for $p = \infty$. For $1 \le p < \infty$, we have not yet found any explicit example except for the measure $\mu$ itself. Therefore, we shall simply restrict ourselves to $\mu_N = \mu$, $N = 1, 2, \ldots$, in this case.

We can also estimate

$$\sup_{y \in X} \int_X |K_N(x, y)| \, d|\mu_N|(x) \lesssim 1, \tag{19}$$

cf. [12, Inequality (3.12)]. Note that (19) is the quadrature version of (14). Next, we recall the characterization of $A^s(L_p(X))$ using $\sigma_N$, see [12, 25, 27, 28]:

Theorem 3.4.

Suppose that $1 \le p \le \infty$ and assume that $(\mu_N)_{N=1}^{\infty}$ is a family of Marcinkiewicz-Zygmund quadrature measures of order $N$, respectively, if $p = \infty$. For $1 \le p < \infty$ we choose $\mu_N = \mu$, $N = 1, 2, \ldots$. Assume further that $H$ is a low-pass filter. Then, for all $f \in A^s(L_p(X))$, we have

$$\|f - \sigma_N(f, \mu_N)\|_{L_p(X)} \lesssim N^{-s} \|f\|_{A^s(L_p(X))}, \tag{20}$$

where the generic constant does not depend on $N$ or $f$. On the other hand, if, for $f \in L_p(X)$, there is a generic constant not depending on $N$ such that $\|f - \sigma_N(f, \mu_N)\|_{L_p(X)} \lesssim N^{-s}$, then $f \in A^s(L_p(X))$.

Remark 3.5.

Let us point out again that we suppose $\mu_N = \mu$, $N = 1, 2, \ldots$, for $1 \le p < \infty$. In this case, the term $\sigma_N(f, \mu_N)$ contains the spectral information $\int_X f(x) \varphi_k(x) \, d\mu(x)$. If $p = \infty$ and $\mu_N$ has finite support, then we have an approximation scheme that uses finitely many training data consisting of function evaluations at scattered data points in $\operatorname{supp}(\mu_N)$.

We have already determined the asymptotics of the Kolmogorov metric entropy. Here, we shall determine the n-widths for the global approximation space.

Theorem 3.6.

The continuous nonlinear n-width of the unit ball $\bar{A}^s(X_p(X))$ in $X_p(X)$ satisfies

$$d_n(\bar{A}^s(X_p(X)), X_p(X)) \asymp n^{-s/\alpha}.$$

In order to verify Theorem 3.6 we shall consider two more types of n-widths, cf. [33]. The linear n-width of a subset $Y$ in a Banach space $Z$ is

$$L_n(Y, Z) := \inf_{F_n} \sup_{x \in Y} \|x - F_n(x)\|,$$

where the infimum is taken over all bounded linear operators $F_n$ on $Z$ whose range is of dimension at most $n$. The Bernstein n-width of $Y$ in $Z$ is

$$B_n(Y, Z) := \sup_{Z_{n+1}} \sup\{\lambda : \lambda \bar{Z}_{n+1} \subset Y\},$$

where the outer supremum is taken over all subspaces $Z_{n+1}$ of $Z$ of dimension at least $n + 1$ and $\bar{Z}_{n+1}$ denotes the unit ball in $Z_{n+1}$. The continuous nonlinear n-width is sandwiched by

$$B_n(Y, Z) \le d_n(Y, Z) \le L_n(Y, Z), \tag{21}$$

cf. [6]. We can now take care of the proof. We point out that we shall derive a lower bound on the Bernstein n-width, which is closely related to the Bernstein estimates in [25] for integer derivatives when combined with K-functionals, cf. [6].

Proof of Theorem 3.6.

The operator $\sigma_N$ is bounded on $L_p(X)$ and $\sigma_N(f)$ is an element of $\Pi_N$. Consider $N$ with $N \asymp n^{1/\alpha}$, so that $\dim(\Pi_N) \asymp N^{\alpha}$ yields $L_n(\bar{A}^s(X_p(X)), X_p(X)) \lesssim n^{-s/\alpha}$ for the unit ball $\bar{A}^s(X_p(X))$, cf. Theorem 3.4. Hence, the second inequality in (21) implies $d_n(\bar{A}^s(X_p(X)), X_p(X)) \lesssim n^{-s/\alpha}$.

To establish the lower bound, we take the subspace $\Pi_{N+1}$ and aim to derive a generic constant $c > 0$ such that $c N^{-s} \bar{\Pi}_{N+1} \subset \bar{A}^s(X_p(X))$, where $\bar{\Pi}_{N+1} = \{ f \in \Pi_{N+1} : \|f\|_{L_p(X)} \le 1 \}$ denotes the unit ball of $\Pi_{N+1}$. For $f \in \bar{\Pi}_{N+1}$, let $g := N^{-s} f$. We obtain $\|g\|_{L_p(X)} \le N^{-s}$ and, for $M > N$, we derive $E(g, \Pi_M, L_p(X)) = 0$. The choice $M \le N$ yields

$$E(g, \Pi_M, L_p(X)) \le N^{-s} \|f - \sigma_M(f)\|_{L_p(X)} \lesssim N^{-s},$$

because $\|f - \sigma_M(f)\|_{L_p(X)} \le \|f\|_{L_p(X)} + \|\sigma_M(f)\|_{L_p(X)} \lesssim \|f\|_{L_p(X)}$. Thus,

$$\sup_{M \ge 1} M^s E(g, \Pi_M, L_p(X)) \lesssim 1.$$

Hence, there is a constant $c$ such that $c N^{-s} \bar{\Pi}_{N+1} \subset \bar{A}^s(L_p(X))$. Therefore, we obtain $B_n(\bar{A}^s(X_p(X)), X_p(X)) \gtrsim n^{-s/\alpha}$, so that (21) concludes the proof.

It should be mentioned that upper bounds on linear n-widths for compact Riemannian manifolds were already derived in [16], where also the exact asymptotics were obtained for compact homogeneous manifolds.

4. Bit representation in global approximation spaces

This section is dedicated to verifying that linear quantization of the approximation scheme $\sigma_N(f, \mu_N)$ enables bit representations matching the optimality bounds stated in Theorem 2.6 up to a logarithmic factor. First, we recall the formula (15),

$$\sigma_N(f, \mu_N) = \sum_{k=0}^{\infty} H\Bigl(\frac{\lambda_k}{N}\Bigr) \int_X f(x) \varphi_k(x) \, d\mu_N(x) \, \varphi_k,$$

where $(\mu_N)_{N=1}^{\infty}$ is a family of Marcinkiewicz-Zygmund quadrature measures of order $N$, respectively, if $p = \infty$. Again, if $1 \le p < \infty$, then we choose $\mu_N = \mu$, $N = 1, 2, \ldots$.

Since $H(t) = 1$ for $t \in [0, 1/2]$ and $H(t) = 0$ for $t \ge 1$, we observe that $H(\frac{\lambda_k}{2N}) H(\frac{\lambda_k}{N}) = H(\frac{\lambda_k}{N})$. If $(\nu_N)_{N=1}^{\infty}$ is a family of quadrature measures of order $2N$, respectively, then a straightforward calculation yields

$$\sigma_N(f, \mu_N) = \int_X \sigma_N(f, \mu_N, x) \sum_{k=0}^{\infty} H\Bigl(\frac{\lambda_k}{2N}\Bigr) \varphi_k(x) \, d\nu_N(x) \, \varphi_k, \tag{22}$$

where $\sigma_N(f, \mu_N, x)$ denotes the value of $\sigma_N(f, \mu_N)$ at $x$. The representation (22) involves the quadrature measure $\nu_N$ and the Marcinkiewicz-Zygmund quadrature measure $\mu_N$. To design the final approximation scheme, we fix some $S > 1$, apply the quantization

$$I_N(f, \mu_N, x) = \lfloor N^S \sigma_N(f, \mu_N, x) \rfloor, \tag{23}$$

and define the actual approximation by

$$\sigma^{\circ}_N(f, \mu_N, \nu_N) := N^{-S} \int_X I_N(f, \mu_N, x) \sum_{k=0}^{\infty} H\Bigl(\frac{\lambda_k}{2N}\Bigr) \varphi_k(x) \, d\nu_N(x) \, \varphi_k. \tag{24}$$

In other words, we replace $\sigma_N(f, \mu_N, x)$ in (22) with a number on the grid $N^{-S} \mathbb{Z}$.

We have the following result for the unit ball $\bar{A}^s(L_p(X))$ of the global approximation space given by (5). It extends results in [11] from compact Riemannian manifolds and $p = \infty$ to diffusion measure spaces and to the entire range $1 \le p \le \infty$:

Theorem 4.1.

For the case $p = \infty$, we assume that $(\mu_N)_{N=1}^{\infty}$ is a family of Marcinkiewicz-Zygmund quadrature measures of order $N$, respectively. For $1 \le p < \infty$ we choose $\mu_N = \mu$, $N = 1, 2, \ldots$. Assume further that $H$ is a low-pass filter. We suppose that $(\nu_N)_{N=1}^{\infty}$ are Marcinkiewicz-Zygmund quadrature measures of order $2N$ with $\#\operatorname{supp}(\nu_N) \lesssim N^{\alpha}$. For fixed $s > 0$ and $S > \max(1, s)$, we apply the discretizations (23) and (24). Then there is a constant $c > 0$ such that, for all $f$ in the unit ball $\bar{A}^s(L_p(X))$,

$$\|f - \sigma^{\circ}_N(f, \mu_N, \nu_N)\|_{L_p(X)} \le c N^{-s} \tag{25}$$

holds. For $c N^{-s} = \varepsilon \le 1$, the number of bits needed to represent all integers $\{ I_N(f, \mu_N, x) : x \in \operatorname{supp}(\nu_N) \}$ does not exceed a positive constant (independent of $\varepsilon$) times

$$(1/\varepsilon)^{\alpha/s} \bigl(1 + \log_2(1/\varepsilon)\bigr). \tag{26}$$

Proof of Theorem 4.1.

The triangle inequality yields

$$\|f - \sigma^{\circ}_N(f, \mu_N, \nu_N)\|_{L_p(X)} \le \|f - \sigma_N(f, \mu_N)\|_{L_p(X)} + \|\sigma_N(f, \mu_N) - \sigma^{\circ}_N(f, \mu_N, \nu_N)\|_{L_p(X)}.$$

Since Theorem 3.4 implies $\|f - \sigma_N(f, \mu_N)\|_{L_p(X)} \lesssim N^{-s} \|f\|_{A^s(L_p(X))}$, we only need to take care of the rightmost term. The quantization (23) immediately yields

$$|\sigma_N(f, \mu_N, x) - N^{-S} I_N(f, \mu_N, x)| \le N^{-S}, \quad \text{for all } x \in \operatorname{supp}(\nu_N), \tag{27}$$

so that (22) and (19) imply

$$\|\sigma_N(f, \mu_N) - \sigma^{\circ}_N(f, \mu_N, \nu_N)\|_{L_p(X)} = \Bigl\| \int_X \bigl( \sigma_N(f, \mu_N, y) - N^{-S} I_N(f, \mu_N, y) \bigr) K_{2N}(\cdot, y) \, d\nu_N(y) \Bigr\|_{L_p(X)} \lesssim N^{-S} \le N^{-s}.$$

Hence, we have derived (25).

To tackle (26), we observe that the localization property (13) yields $\|g\|_{L_\infty} \lesssim N^{\alpha} \|g\|_{L_1}$, for all $g \in \Pi_N$, see also [27, Lemma 5.5] for more general Nikolskii inequalities. We apply (27) and then use $\sigma_N(f, \mu_N) \in \Pi_N$ with $L_p \hookrightarrow L_1$, which yields

$$|I_N(f, \mu_N, x)| \lesssim N^S \|\sigma_N(f, \mu_N)\|_{L_\infty(X)} \lesssim N^{S + \alpha} \|\sigma_N(f, \mu_N)\|_{L_p(X)}.$$

Moreover, $\|\sigma_N(f, \mu_N)\|_{L_p(X)} \lesssim \|f\|_{L_p(X)}$ holds, cf. (18). Since $f$ is contained in the ball of radius 1, so that $\|f\|_{L_p(X)} \le 1$, we see that $|I_N(f, \mu_N, x)| \lesssim N^{S + \alpha}$. Thus, the number of bits needed to represent each $I_N(f, \mu_N, x)$ is at most $\log_2(c_1 N^{S + \alpha})$, where $c_1$ is a positive constant. Since $\#\operatorname{supp}(\nu_N) \lesssim N^{\alpha}$, we have $\#\{ I_N(f, \mu_N, x) : x \in \operatorname{supp}(\nu_N) \} \lesssim N^{\alpha}$. Therefore, the total number of bits needed to represent all numbers $\{ I_N(f, \mu_N, x) : x \in \operatorname{supp}(\nu_N) \}$ is at most $c_2 N^{\alpha} \log_2(c_1 N^{S + \alpha})$, where $c_2$ is a positive constant. By using $c N^{-s} = \varepsilon \le 1$, we derive that the number of necessary bits does not exceed

$$c_2 c^{\alpha/s} (1/\varepsilon)^{\alpha/s} \log_2\bigl( c_1 (c/\varepsilon)^{(S + \alpha)/s} \bigr) \lesssim (1/\varepsilon)^{\alpha/s} \log_2\bigl( c_1^{s/(S + \alpha)} c/\varepsilon \bigr) \lesssim (1/\varepsilon)^{\alpha/s} \log_2\bigl( (c_1/\varepsilon)^{s/(S + \alpha)} c/\varepsilon \bigr) \lesssim (1/\varepsilon)^{\alpha/s} \bigl(1 + \log_2(1/\varepsilon)\bigr),$$

which concludes the proof.

Theorem 4.1 yields that our bit representation scheme is optimal with respect to the metric entropy as stated in Theorem 2.6 at least up to a logarithmic factor.
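The quantization (23) and the reconstruction (24) can be tried out in a toy setting. The sketch below again works on the unit circle with the trigonometric system, a filter built from $\exp(-1/u)$, and equal weights on $M = 4N + 1$ equispaced nodes playing the role of both $\mu_N$ and $\nu_N$. All of these choices, the function names, and the fixed-width signed integer encoding used for the bit count are assumptions of this illustration rather than the construction analyzed in Theorem 4.1.

```python
import math


def lowpass(t):
    """Smooth, non-increasing filter with H = 1 on [0, 1/2] and H = 0 on [1, inf)."""
    if t <= 0.5:
        return 1.0
    if t >= 1.0:
        return 0.0
    a = math.exp(-1.0 / (1.0 - t))
    b = math.exp(-1.0 / (t - 0.5))
    return a / (a + b)


def kernel(N, x, y):
    """Filtered kernel on the circle: 1 + 2 * sum_k H(k/N) * cos(2*pi*k*(x - y))."""
    return 1.0 + 2.0 * sum(
        lowpass(k / N) * math.cos(2.0 * math.pi * k * (x - y)) for k in range(1, N)
    )


def quantize(f, N, S, M):
    """Integers I_j = floor(N**S * sigma_N(f)(x_j)) at the nodes x_j = j/M, cf. (23)."""
    nodes = [j / M for j in range(M)]
    sig = [sum(f(x) * kernel(N, x, t) for x in nodes) / M for t in nodes]
    return [math.floor(N ** S * v) for v in sig]


def reconstruct(ints, N, S, y):
    """Evaluate N**(-S) * (1/M) * sum_j I_j * K_{2N}(x_j, y), cf. (24)."""
    M = len(ints)
    return sum(I * kernel(2 * N, j / M, y) for j, I in enumerate(ints)) / (M * N ** S)


def bit_count(ints):
    """Total bits for a fixed-width signed encoding of all integers (one simple choice)."""
    width = math.ceil(math.log2(2 * max(abs(I) for I in ints) + 1))
    return width * len(ints)


def target(x):
    """Hypothetical test function: a trigonometric polynomial of degree 5."""
    return 0.5 * math.cos(2.0 * math.pi * 3.0 * x) + 0.25 * math.sin(2.0 * math.pi * 5.0 * x)


if __name__ == "__main__":
    N, S = 16, 2
    ints = quantize(target, N, S, 4 * N + 1)
    err = max(abs(target(i / 50) - reconstruct(ints, N, S, i / 50)) for i in range(50))
    print(err, N ** (-S), bit_count(ints))
```

Because the test function has degree at most $N/2$, the operator $\sigma_N$ reproduces it and the observed error comes from quantization alone, so it is of order $N^{-S}$ as in (27). The bit count is the number of nodes times the width of one integer, which mirrors the product $N^{\alpha} \log_2(c_1 N^{S+\alpha})$ in the proof with $\alpha = 1$.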

5. Bit representation of locally smooth functions

In the previous section we derived a bit-representation scheme for the global approximation space, i.e., $B = X$. It turns out that the case $B \subsetneq X$ is more involved because we do not have results that characterize $A^s(L_p(B))$ by means of $\sigma_N$. In fact, $\sigma_N$ requires functions to be defined globally so that one would be forced to deal with boundary effects. On the other hand, $B$ itself may not be a diffusion measure space satisfying all required assumptions. We circumvent such difficulties by dealing with a modified approximation space, for which we can construct a bit representation scheme.

Before we can discuss local smoothness, a few technical details need to be introduced, and we make use of $C^{\infty}(X) := \bigcap_{s > 0} A^s(L_\infty(X))$:

Definition 5.1.

We say that $X$ satisfies the smooth cut-off property if for any two concentric balls $B', B$ with $B' \subsetneq B$ there is $\phi \in C^{\infty}(X)$ such that $\phi$ equals 1 on $B'$ and $\phi$ vanishes outside of $B$ with $|\phi(x)| \le 1$, for all $x \in X$.

Note that any smooth manifold satisfies the smooth cut-off property.

Definition 5.2.

For $x \in X$, the local approximation space in $x$ is denoted by $A^s(X_p(X), x)$ and defined as the collection of $f \in X_p(X)$ such that there is an open ball $B$ containing $x$ with $f\phi \in A^s(X_p(X))$, for all $\phi \in C^{\infty}(X)$ with support in $B$.

It turns out that the approximation rate of $\sigma_N(f, \mu_N)$ characterizes the approximation class of $f$, at least when switching to dyadic numbers $N = 2^n$, $n = 1, 2, \ldots$, cf. [12, 25, 27, 28]:

Theorem 5.3.

Let $X$ satisfy the smooth cut-off property. For $p = \infty$, suppose that $(\mu_{2^n})_{n=1}^{\infty}$ is a family of Marcinkiewicz-Zygmund quadrature measures of order $2^n$, respectively. For $1 \le p < \infty$, we choose $\mu_{2^n} = \mu$, $n = 1, 2, \ldots$. If $H$ is a low-pass filter, then the following points are equivalent:
(i) $f \in A^s(X_p(X), x)$,
(ii) there is a ball $B$ centered at $x$ such that

$$\|f - \sigma_{2^n}(f, \mu_{2^n})\|_{L_p(B)} \lesssim 2^{-ns}. \tag{28}$$

Note that the generic constant in (28) may depend on $x$ and $f$. Nonetheless, the above theorem characterizes local approximation by means of $\sigma_{2^n}$.

The local approximation space $A^s(X_p(X), x)$, for $x \in X$, is not endowed with any norm. In view of Theorem 5.3, we fix some ball $B$ and introduce a new approximation space in the following.

Definition 5.4.

Let $B$ be a nonempty ball in $X$. For $p = \infty$, suppose that $(\mu_{2^n})_{n=1}^{\infty}$ is a family of Marcinkiewicz-Zygmund quadrature measures of order $2^n$, respectively. For $1 \le p < \infty$, we choose $\mu_{2^n} = \mu$, $n = 1, 2, \ldots$. If $H$ is a low-pass filter, then we define the local approximation space in $B$ by

$$A^s(X_p(X), B) := \{ f \in X_p(X) : \|f\|_{A^s(X_p(X), B)} < \infty \},$$

where

$$\|f\|_{A^s(X_p(X), B)} := \|f\|_{L_p(X)} + \sup_{n \ge 1} 2^{ns} \|f - \sigma_{2^n}(f, \mu_{2^n})\|_{L_p(B)}.$$

Note that if $p = \infty$, then the space $A^s(X_p(X), B)$ implicitly depends on the family $(\mu_{2^n})_{n=1}^{\infty}$ of Marcinkiewicz-Zygmund quadrature measures of order $2^n$, respectively. As opposed to $A^s(X_p(B))$ defined in (5), the space $A^s(X_p(X), B)$ consists of functions defined globally that inherit approximation properties locally. By definition, we have

$$\|f - \sigma_{2^n}(f, \mu_{2^n})\|_{L_p(B)} \le 2^{-ns} \|f\|_{A^s(X_p(X), B)}. \tag{29}$$

Since $\sigma_{2^n}(f, \mu_{2^n})$ is a diffusion polynomial, we observe that

$$A^s(X_p(X), B)|_B \hookrightarrow A^s(X_p(B)).$$

However, we cannot claim that the reverse embedding also holds.

It should be mentioned that $\sigma_{2^n}(f, \mu_{2^n})$ in (29) approximates $f$ locally, but its definition needs global knowledge of $f$, or at least knowledge of $f$ on $\operatorname{supp}(\mu_{2^n})$ if $p = \infty$. To enable the design of an approximation scheme that involves local information on $f$ exclusively, we define one more approximation space by using some cut-off function:

Definition 5.5.

For some fixed $\phi \in C^\infty(X)$, define
$$\mathcal{A}^s(X^p(X), \phi) := \{ f \in X^p(X) : f\phi \in \mathcal{A}^s(X^p(X)) \} \tag{30}$$
endowed with the norm
$$\|f\|_{\mathcal{A}^s(X^p(X), \phi)} := \|f\|_{L^p(X)} + \sup_{n \ge 1} 2^{ns} E(f\phi, \Pi_{2^n}, L^p(X)).$$

To study local approximation, choose two concentric balls $B'$, $B$ with $B' \subsetneq B$. If $X$ satisfies the smooth cut-off property, then we can fix some $\phi \in C^\infty(X)$ that is one on $B'$ and zero outside of $B$. Definition (30) yields that $f \in \mathcal{A}^s(X^p(X), \phi)$ implies
$$\|f - \sigma_n(f\phi, \mu_n)\|_{L^p(B')} \le \|f\phi - \sigma_n(f\phi, \mu_n)\|_{L^p(X)} \lesssim 2^{-ns} \|f\|_{\mathcal{A}^s(X^p(X), \phi)}.$$
In the subsequent section, we shall consider the unit balls of both spaces,
$$\overline{\mathcal{A}}^s(X^p(X), B) := \{ f \in X^p(X) : \|f\|_{\mathcal{A}^s(X^p(X), B)} \le 1 \}, \tag{31}$$
$$\overline{\mathcal{A}}^s(X^p(X), \phi) := \{ f \in X^p(X) : \|f\|_{\mathcal{A}^s(X^p(X), \phi)} \le 1 \}, \tag{32}$$
and aim to develop approximation schemes requiring only few bits.

5.2. Local bit-representation in approximation classes

To design an approximation scheme for the spaces $\mathcal{A}^s(X^p(X), B)$ and $\mathcal{A}^s(X^p(X), \phi)$, let $B$ be a ball in $X$ and let $B' \subset B$ be another nonempty ball concentric with $B$ and of strictly smaller radius. It will turn out that the following scheme enables us to approximate $f$ on $B'$. For some fixed $S > 1$, we apply the quantization $I_n(f, \mu_n, x)$ as in (23), but in place of (24) we define the local approximation by
$$\sigma_n^{\circ}(f, \mu_n, \nu_n, B) := 2^{-nS} \int_B I_n(f, \mu_n, x) \sum_{k=0}^{\infty} H\Big(\frac{\lambda_k}{2^{n+1}}\Big) \varphi_k(x) \, d\nu_n(x) \, \varphi_k. \tag{33}$$
We have the following result for the ball $\overline{\mathcal{A}}^s(X^p(X), B)$ of the localized approximation space given by (31):

Theorem 5.6.

Suppose that $X$ satisfies the smooth cut-off property and that $(\mu_n)_{n=1}^\infty$ is a family of Marcinkiewicz-Zygmund quadrature measures of order $2^n$, respectively, if $p = \infty$. For $1 \le p < \infty$ we choose $\mu_n = \mu$, $n = 1, 2, \ldots$. Assume further that $H$ is a low-pass filter. Let $B$, $B'$ be two concentric balls, so that $B' \subsetneq B$. We also suppose that $(\nu_n)_{n=1}^\infty$ are Marcinkiewicz-Zygmund quadrature measures of order $2^{n+1}$ with $\#\operatorname{supp}(\nu_n) \lesssim 2^{n\alpha}$. For fixed $s > 0$ and $S > \max(1, s)$, we apply the discretizations (23) and (33). Then there is a constant $c > 0$ such that, for all $f \in \overline{\mathcal{A}}^s(X^p(X), B)$,
$$\|f - \sigma_n^{\circ}(f, \mu_n, \nu_n, B)\|_{L^p(B')} \le c \, 2^{-ns} \tag{34}$$
holds. For $c \, 2^{-ns} = \varepsilon \le 1$, the number of bits needed to represent all integers $\{ I_n(f, \mu_n, x) : x \in \operatorname{supp}(\nu_n) \cap B \}$ does not exceed a positive constant (independent of $\varepsilon$) times
$$(1/\varepsilon)^{\alpha/s} \big(1 + \log_2(1/\varepsilon)\big). \tag{35}$$

Proof of Theorem 5.6.

For $f \in \overline{\mathcal{A}}^s(X^p(X), B)$, we use the localization property (12) with $r = \alpha + S$ and the embedding $L^p \hookrightarrow L^1$ in the compact case to derive, for $y \in B'$,
$$\int_{X \setminus B} \big| \sigma_n(f, \mu_n, x) K_{2^{n+1}}(x, y) \big| \, d|\nu_n|(x) \lesssim 2^{-nS} \|\sigma_n(f, \mu_n)\|_{|\nu_n|, L^p} \lesssim 2^{-nS} \|\sigma_n(f, \mu_n)\|_{|\mu_n|, L^p}. \tag{36}$$
The latter estimate holds because both $(\nu_n)_{n=1}^\infty$ and $(\mu_n)_{n=1}^\infty$ are families of Marcinkiewicz-Zygmund measures. The quantization (23) immediately yields
$$| \sigma_n(f, \mu_n, x) - 2^{-nS} I_n(f, \mu_n, x) | \le 2^{-nS}, \quad \text{for all } x \in \operatorname{supp}(\nu_n). \tag{37}$$
By using (37), (22), and (19), we derive
$$\|\sigma_n(f, \mu_n) - \sigma_n^{\circ}(f, \mu_n, \nu_n, B)\|_{L^p(B')} = \Big\| \sigma_n(f, \mu_n) - \int_B 2^{-nS} I_n(f, \mu_n, x) K_{2^{n+1}}(x, \cdot) \, d\nu_n(x) \Big\|_{L^p(B')}$$
$$\lesssim \Big\| \sigma_n(f, \mu_n) - \int_B \sigma_n(f, \mu_n, x) K_{2^{n+1}}(x, \cdot) \, d\nu_n(x) \Big\|_{L^p(B')} + 2^{-nS}.$$
Next, we make use of (22) and (36) with (18) to obtain
$$\|\sigma_n(f, \mu_n) - \sigma_n^{\circ}(f, \mu_n, \nu_n, B)\|_{L^p(B')} \lesssim \Big\| \int_{X \setminus B} \sigma_n(f, \mu_n, x) K_{2^{n+1}}(x, \cdot) \, d\nu_n(x) \Big\|_{L^p(B')} + 2^{-nS} \lesssim 2^{-nS} \|f\|_{|\mu_n|, L^p} + 2^{-nS} \lesssim 2^{-nS}.$$
Recall that $\mu_n = \mu$, for $1 \le p < \infty$, so that $\|f\|_{|\mu_n|, L^p} \lesssim \|f\|_{L^p(X)}$ holds for the entire range $1 \le p \le \infty$. Note that $\|f\|_{L^p(X)} \le 1$ for $f \in \overline{\mathcal{A}}^s(X^p(X), B)$. The triangle inequality with (29), applied on $B' \subset B$, and the above estimate yield
$$\|f - \sigma_n^{\circ}(f, \mu_n, \nu_n, B)\|_{L^p(B')} \lesssim \|f - \sigma_n(f, \mu_n)\|_{L^p(B')} + \|\sigma_n(f, \mu_n) - \sigma_n^{\circ}(f, \mu_n, \nu_n, B)\|_{L^p(B')} \lesssim 2^{-ns} + 2^{-nS} \lesssim 2^{-ns},$$
where the last step uses $S > s$. This verifies (34).

For the remaining part, we can follow the lines of the proof of Theorem 4.1.

Note that Theorem 5.6 still requires global knowledge of $f$ because we need to build $\sigma_n(f, \mu_n)$. For the sake of completeness, we use a cut-off function to feed in local information only:

Theorem 5.7.

Under the same assumptions as in Theorem 4.1, let $\phi \in C^\infty(X)$ be one on $B'$ and zero outside of $B$. Then there is a constant $c > 0$ such that, for $f \in \overline{\mathcal{A}}^s(X^p(X), \phi)$,
$$\|f - \sigma_n^{\circ}(f\phi, \mu_n, \nu_n)\|_{L^p(B')} \le c \, 2^{-ns} \tag{38}$$
holds. For $c \, 2^{-ns} = \varepsilon \le 1$, the number of bits needed to represent all integers $\{ I_n(f\phi, \mu_n, x) : x \in \operatorname{supp}(\nu_n) \}$ does not exceed a positive constant (independent of $\varepsilon$) times
$$(1/\varepsilon)^{\alpha/s} \big(1 + \log_2(1/\varepsilon)\big). \tag{39}$$

Proof.

By the smooth cut-off property of $\phi$, $f \in \overline{\mathcal{A}}^s(X^p(X), \phi)$ implies $f\phi \in \overline{\mathcal{A}}^s(X^p(X))$, because $|\phi(x)| \le 1$ for all $x \in X$. Thus, Theorem 4.1 applied to $f\phi$ implies (38) and (39).

Appendix A. Summary of the technical assumptions

The asymptotic bounds on the metric entropy in Theorem 2.6 hold for any diffusion measure space $X$ as introduced in Definition 2.1, provided that the restrictions $\{\varphi_k|_B\}_{k=0}^\infty$ are linearly independent on the ball $B \subset X$. Our computational scheme that matches this bound up to a logarithmic factor needs a few additional technical assumptions that are distributed within the present paper. Here, we list all the required assumptions for the sake of completeness:

(I) $X$ is a diffusion measure space (Definition 2.1),

(II) $\epsilon_N$ defined in (16) satisfies $N^m \epsilon_N \to 0$ as $N \to \infty$, for all $m > 0$,

(III) there is a family $(\nu_N)_{N=1}^\infty$ of Marcinkiewicz-Zygmund quadrature measures of order $2N$, respectively, satisfying $\#\operatorname{supp}(\nu_N) \lesssim N^\alpha$ (Definitions 3.2, 3.3 and (22)),

(IV) the smooth cut-off property holds (Definition 5.1).

Condition (I) is the general framework, and the additional conditions are further technical details. Note that (IV) is only needed in Section 5, and it is well-known that this holds for smooth manifolds. All of the above conditions hold for compact homogeneous manifolds, e.g., the sphere and the Grassmann manifold, if the function system $\{\varphi_k\}_{k=0}^\infty$ consists of eigenfunctions of the Laplace operator, cf. [15]. Moreover, it was pointed out in [3] that the conditions are also satisfied for smooth compact Riemannian manifolds with nonnegative Ricci curvature, see also Remark 2.2. Families of Marcinkiewicz-Zygmund quadrature measures $(\mu_N)_{N=1}^\infty$ were then constructed in [13], such that $\#\operatorname{supp}(\mu_N) \asymp N^\alpha$.

Acknowledgment

M.E. has been funded by the Vienna Science and Technology Fund (WWTF) through project VRG12-009. The authors thank H. N. Mhaskar for many fruitful discussions.

References

[1] B. Carl and I. Stephani. Entropy, Compactness and the Approximation of Operators. Cambridge University Press, 1990.
[2] C. K. Chui, F. Filbir, and H. N. Mhaskar. Representation of functions on big data: Graphs and trees. Appl. Comput. Harmon. Anal., 38:489–509, 2015.
[3] C. K. Chui and H. N. Mhaskar. Smooth function extension based on high dimensional unstructured data. Mathematics of Computation, 83(290):2865–2891, 2014.
[4] R. R. Coifman, S. Lafon, A. B. Lee, M. Maggioni, B. Nadler, F. J. Warner, and S. W. Zucker. Geometric diffusions as a tool for harmonic analysis and structure definition of data. Part I: Diffusion maps. Proc. Nat. Acad. Sci., 102:7426–7431, 2005.
[5] P. de la Harpe and C. Pache. Cubature formulas, geometrical designs, reproducing kernels, and Markov operators. In Infinite groups: geometric, combinatorial and dynamical aspects, volume 248, pages 219–267, Basel, 2005. Birkhäuser.
[6] R. A. DeVore, R. Howard, and C. Micchelli. Optimal nonlinear approximation. Manuscripta Mathematica, 63(4):469–478, 1989.
[7] R. A. DeVore and G. G. Lorentz. Constructive Approximation. Springer-Verlag, 1993.
[8] D. Dung. Non-linear approximations using sets of finite cardinality or finite pseudo-dimension. J. Complexity, 17(2):467–492, 2001.
[9] D. Dung and T. Ullrich. $n$-widths and $\varepsilon$-dimensions for high-dimensional approximations. Found. Comput. Math., 13:965–1003, 2013.
[10] D. E. Edmunds and H. Triebel. Function Spaces, Entropy Numbers, Differential Operators. Cambridge University Press, 1996.
[11] M. Ehler and F. Filbir. $\varepsilon$-coverings of Hölder-Zygmund type spaces on data-defined manifolds. Abstract and Applied Analysis, 2014.
[12] F. Filbir and H. N. Mhaskar. A quadrature formula for diffusion polynomials corresponding to a generalized heat kernel. J. Fourier Anal. Appl., 16(5):629–657, 2010.
[13] F. Filbir and H. N. Mhaskar. Marcinkiewicz–Zygmund measures on manifolds. J. Complexity, 27(6):568–596, 2011.
[14] S. Foucart, A. Pajor, H. Rauhut, and T. Ullrich. The Gelfand widths of $\ell_p$-balls for $0 < p \le 1$. J. Complexity, 26:629–640, 2010.
[15] D. Geller and I. Z. Pesenson. Band-limited localized Parseval frames and Besov spaces on compact homogeneous manifolds. J. Geom. Anal., 21(2):334–371, 2011.
[16] D. Geller and I. Z. Pesenson. $n$-widths and approximation theory on compact Riemannian manifolds. In A. Mayeli, P. E. T. Jorgensen, and G. Ólafsson, editors, Commutative and Noncommutative Harmonic Analysis and Applications, volume 603. Contemporary Mathematics, 2013.
[17] D. Geller and I. Z. Pesenson. Kolmogorov and linear widths of balls in Sobolev spaces on compact manifolds. Math. Scand., 115(1):96–122, 2014.
[18] A. Grigor'yan. Estimates of heat kernels on Riemannian manifolds. In B. Davies and Yu. Safarov, editors, Spectral Theory and Geometry. ICMS Instructional Conference, Edinburgh, 1998, volume 273 of London Math. Soc. Lecture Notes, pages 140–225. Cambridge Univ. Press, 1999.
[19] D. D. Haroske and H. Triebel. Distributions, Sobolev Spaces, Elliptic Equations. EMS Publishing House, 2007.
[20] A. N. Kolmogorov. Über die beste Annäherung von Funktionen einer gegebenen Funktionenklasse. Ann. Math., 37:107–111, 1936.
[21] A. N. Kolmogorov and V. M. Tihomirov. $\varepsilon$-entropy and $\varepsilon$-capacity of sets in function spaces. Uspehi Mat. Nauk, 14:3–86, 1959. English transl. in Amer. Math. Soc. Transl. 17, 277–364. MR0112032 (22:2890).
[22] S. G. Krantz and H. R. Parks. A Primer of Real Analytic Functions. Birkhäuser, 1992.
[23] G. G. Lorentz. Metric entropy and approximation. Bull. Amer. Math. Soc., 72(6):903–937, 1966.
[24] G. G. Lorentz, M. v. Golitschek, and Y. Makovoz. Constructive Approximation, Advanced Problems. Springer Verlag, New York, 1996.
[25] M. Maggioni and H. N. Mhaskar. Diffusion polynomial frames on metric measure spaces. Appl. Comput. Harmon. Anal., 24(3):329–353, 2008.
[26] H. N. Mhaskar. On the representation of smooth functions on the sphere using finitely many bits. Appl. Comput. Harmon. Anal., 18(3):215–233, 2005.
[27] H. N. Mhaskar. Eignets for function approximation on manifolds. Appl. Comput. Harmon. Anal., 29:63–87, 2010.
[28] H. N. Mhaskar. A generalized diffusion frame for parsimonious representation of functions on data defined manifolds. Neural Networks, 24:345–359, 2011.
[29] C. A. Micchelli and T. J. Rivlin. Optimal Estimation in Approximation Theory, chapter A survey of optimal recovery, pages 1–54. Plenum Press, 1977.
[30] B. Nadler and M. Galun. Fundamental limitations of spectral clustering. Neural Information Processing Systems, 19:1017–1024, 2007.
[31] E. Novak. Optimal recovery and $n$-widths for convex classes of functions. J. Approx. Theory, 80(3):390–408, 1995.
[32] A. Pinkus. $n$-Widths in Approximation Theory. Springer-Verlag, New York, 1985.
[33] A. Pinkus. $n$-widths of Sobolev spaces in $L^p$. Constr. Approx., 1:15–62, 1985.
[34] T. J. Rivlin. The Chebyshev polynomials. John Wiley and Sons, 1974.
[35] S. T. Roweis and L. K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323–2326, 2000.
[36] B. Schölkopf and A. Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002.
[37] V. N. Temlyakov. Estimates for the asymptotic characteristics of classes of functions with bounded mixed derivative or difference. Proc. Steklov Inst. Math., 189:161–197, 1990.
[38] J. B. Tenenbaum, V. de Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319–2323, 2000.
[39] J. F. Traub and H. Wozniakowski. A General Theory of Optimal Algorithms. Academic Press, New York, 1980.
[40] A. G. Vitushkin.