Metric entropy, n-widths, and sampling of functions on manifolds
Martin Ehler
University of Vienna, Department of Mathematics, Oskar-Morgenstern-Platz 1, A-1090 Vienna
Frank Filbir
Faculty of Mathematics, Technische Universität München, Boltzmannstrasse 3, 85748 Garching, Germany, and Helmholtz Zentrum München, Ingolstädter Landstrasse 1, D-85764 Neuherberg
Abstract
We first investigate the asymptotics of the Kolmogorov metric entropy and nonlinear $n$-widths of approximation spaces on some function classes on manifolds and quasi-metric measure spaces. Secondly, we develop constructive algorithms to represent those functions within a prescribed accuracy. The constructions can be based on either spectral information or scattered samples of the target function. Our algorithmic scheme is asymptotically optimal in the sense of nonlinear $n$-widths and asymptotically optimal up to a logarithmic factor with respect to the metric entropy.
1. Introduction
In classical computational mathematics, it is customary to represent a function by using finitely many parameters, e.g., the coefficients of some truncated series expansion. Representing a function in terms of binary bits rather than a sequence of real numbers is the problem of quantization. The integers thus obtained are represented as a bit string to which coding techniques can be applied to achieve a final compression. In wireless communication, for instance, one needs to transform an analogue signal into a stream of bits, from which the original signal should be recovered at the receiving end with minimal distortion.

The theory of bit representation of functions pre-dates these modern requirements and was already studied by Kolmogorov. The notion of metric entropy in the sense of Kolmogorov measures the minimal number of bits needed to represent an arbitrary function from a compact subset of a function space. Babenko, Kolmogorov, Tikhomirov, Vitushkin, and Yerokhin have given many estimates on the metric entropy for several compact subsets of the standard function spaces, cf. [21], [40, Chapter 2, 3]. Constructive algorithms were derived in [26] to represent functions in suitably defined Besov spaces on the sphere using asymptotically the same number of bits as the metric entropy of these classes, except for a logarithmic factor. A generalization was obtained for compact smooth Riemannian manifolds $\mathbb{X}$ and global approximation spaces in [11]. Periodic function classes were considered in [8, 37]. Related measures of complexity are $n$-widths [20], which were studied for some classical function spaces in [32, 33]; see [6] for the concept of nonlinear $n$-widths. We also refer to [1, Chapter 1], [10, Sections 1.3, 3, 4], [19, Sections 6.3, 6.4] for related results.

Email addresses: [email protected] (Martin Ehler), [email protected] (Frank Filbir)
Preprint submitted to JAT, August 15, 2018

Both concepts, metric entropy and $n$-widths, are important complexity measures for the analysis of functions on high-dimensional datasets occurring in biology, medicine, and related areas. Many computational schemes are categorized into the field of manifold learning, where functions need to be learned from finitely many training data that are assumed to lie on some (unknown) manifold [2, 4, 30, 35, 36, 38]. While much of the recent research in this direction focuses on the understanding of data geometry, approximation theory methods were introduced in [12, 13, 25, 27, 28].

The purpose of the present paper is to generalize results on metric entropy and $n$-widths to the context of sampled functions on quasi-metric measure spaces and to cover a larger scale of approximation spaces. Indeed, we determine the asymptotics of the metric entropy and nonlinear $n$-widths for global approximation spaces. We also provide a computational approximation method, and this scheme is asymptotically optimal with respect to the nonlinear $n$-widths and asymptotically optimal up to a logarithmic factor in the sense of the metric entropy.
In addition to obtaining theoretical bounds on the metric entropy and $n$-widths, our results have the following notable features:
- The computational scheme is based on a linear approximation operator to asymptotically match the optimal bounds in the sense of $n$-widths.
- We give explicit schemes for converting the target function into a near minimal number of bits by combining the linear approximation operator with linear quantization, and we derive a reconstruction scheme from such bits to a prescribed accuracy.
- Our constructions can deal with both spectral information and finitely many training data consisting of function evaluations at scattered data points.

In addition, we shall investigate the asymptotics of the metric entropy of local approximation spaces.

The outline of this paper is as follows: In Section 2 we introduce the setting and define metric entropy and $n$-widths. The asymptotics of the metric entropy of global (and under additional assumptions also local) approximation spaces is determined in Section 2.3. In Section 3, we introduce our approximation schemes for global approximation spaces and compute the asymptotics of the $n$-widths of global approximation spaces. In Section 4 we verify that linear quantization of the approximation scheme leads to optimal bit representations up to a logarithmic factor for the global approximation space. Local versions of approximation spaces are considered in Section 5. For the reader's convenience, Appendix A contains a list and brief discussion of the technical assumptions used for the main results of the present paper.
2. Approximation spaces and their metric entropy and $n$-widths

We first fix the setting and introduce some technical assumptions used throughout the paper. Let $(\mathbb{X},\rho)$ be a quasi-metric space, i.e., a space with a nonnegative symmetric map $\rho$ satisfying $\rho(x,y)=0$ if and only if $x=y$, and the triangle inequality holds at least up to a constant factor $c>0$, i.e.,
\[ \rho(x,y) \le c\,\bigl(\rho(x,z)+\rho(z,y)\bigr), \quad \text{for all } x,y,z\in\mathbb{X}. \]
The quasi-metric $\rho$ induces a topology and we assume that $\mathbb{X}$ is endowed with a Borel probability measure $\mu$. The system $\{\varphi_k\}_{k=0}^\infty \subset L_2(\mathbb{X},\mu)$ is supposed to be an orthonormal basis of continuous real-valued functions with $\varphi_0\equiv 1$, and we fix a nondecreasing sequence of real numbers $\{\lambda_k\}_{k=0}^\infty$ such that $\lambda_0=0$ and $\lambda_k\to\infty$ as $k\to\infty$. Let $N$ be a positive integer and we shall restrict ourselves to $N=2^n$, where $n$ is some nonnegative integer. The space of diffusion polynomials up to degree $N$ is
\[ \Pi_N := \operatorname{span}\{\varphi_k : \lambda_k\le N\}, \]
and the generalized heat kernel is
\[ G_t(x,y) = \sum_{k=0}^\infty \exp(-\lambda_k^2 t)\,\varphi_k(x)\varphi_k(y), \quad t>0. \tag{1} \]
We use the symbol $\lesssim$ when the left-hand side is bounded by a generic positive constant times the right-hand side, $\gtrsim$ is used analogously, and $\asymp$ means that both $\lesssim$ and $\gtrsim$ hold. Now, we can summarize the technical assumptions that are related to so-called upper and lower Gaussian bounds on the generalized heat kernel:

Definition 2.1 ([3]). Under the above notation, a quasi-metric space $\mathbb{X}$ is called a diffusion measure space if there is some $\alpha>0$ such that the following holds:
(i) For all $x\in\mathbb{X}$ and $t>0$, the closed ball $B_t(x)$ of radius $t$ at $x$ is compact and
\[ \mu(B_t(x)) \lesssim t^\alpha, \quad x\in\mathbb{X},\ t>0. \]
(ii) There is $c>0$ such that
\[ |G_t(x,y)| \lesssim t^{-\alpha/2}\exp\Bigl(-c\,\frac{\rho(x,y)^2}{t}\Bigr), \quad x,y\in\mathbb{X},\ 0<t\le 1. \]
(iii) We have
\[ t^{-\alpha/2} \lesssim G_t(x,x), \quad x\in\mathbb{X},\ 0<t<1. \]

From here on, we suppose that $\mathbb{X}$ is a diffusion measure space throughout the present paper. It is also noteworthy that the conditions of a diffusion measure space imply that $\mu(B_t(x))\asymp t^\alpha$, for all $0<t<1$, cf. [12]. Thus, the volume of small balls essentially behaves as in $\mathbb{R}^\alpha$. The above conditions also imply the following estimate on the Christoffel function (or spectral function),
\[ \sum_{\lambda_k\le t} |\varphi_k(x)|^2 \asymp t^\alpha, \quad x\in\mathbb{X},\ t\ge 1, \tag{2} \]
see [3, 12, 13] for a discussion and references. By integrating over $\mathbb{X}$, we obtain
\[ \dim(\Pi_t) \asymp t^\alpha. \tag{3} \]

Remark 2.2.
It was pointed out in [3] that all technical assumptions are satisfied when $\mathbb{X}\subset\mathbb{R}^d$ is an $\alpha$-dimensional compact, connected Riemannian manifold without boundary, with non-negative Ricci curvature, geodesic distance $\rho$, and $\mu$ being the Riemannian volume measure on $\mathbb{X}$ normalized with $\mu(\mathbb{X})=1$, $\{\varphi_k\}_{k=0}^\infty$ are the eigenfunctions of the Laplace-Beltrami operator on $\mathbb{X}$, and $\{-\lambda_k^2\}_{k=0}^\infty$ are the corresponding eigenvalues arranged in nonincreasing order, see also [18]. In this case, (3) is consistent with Weyl's law. For further discussions, we refer to [12, 13, 27].

Given an arbitrary normed space $Z$ and a subset $Y\subset Z$, we define, for $f\in Z$,
\[ E(f,Y,Z) := \inf_{g\in Y}\|f-g\|_Z. \tag{4} \]
As we have pointed out already, we assume $\mathbb{X}$ is a diffusion measure space throughout. The notation $|_B$ in the following means restricting functions to the set $B$.

Definition 2.3. For a nonempty ball $B\subset\mathbb{X}$ and $1\le p\le\infty$, the approximation space of order $s>0$ is
\[ A^s(L_p(B)) = \{f\in L_p(B) : \|f\|_{A^s(L_p(B))}<\infty\}, \tag{5} \]
where the associated norm is given by
\[ \|f\|_{A^s(L_p(B))} := \|f\|_{L_p(B)} + \sup_{N\ge 1} N^s E(f,\Pi_N|_B,L_p(B)). \]
The unit ball in $A^s(L_p(B))$ is denoted by
\[ \overline{A}^s(L_p(B)) := \{f\in L_p(B) : \|f\|_{A^s(L_p(B))}\le 1\}. \tag{6} \]

For a nonempty ball $B\subset\mathbb{X}$, let us denote the closure of $\bigcup_N \Pi_N|_B$ in $L_p(B)$ by $X_p(B)$. We can simply switch from $L_p(B)$ to $X_p(B)$ in the definition of approximation space without changing anything, so that we observe $A^s(L_p(B)) = A^s(X_p(B))$.

Remark 2.4.
In classical situations, the smoothness of a function is related to the accuracy of its approximation, and for a more classical characterization of the approximation spaces in terms of pseudo-differential operators and K-functionals, we refer to [25]. In turn it has also become customary to consider the accuracy of approximation itself as a measurement of smoothness. For classical results on approximation spaces, we refer to [7, Chapter 7] and references therein.

2.2. Metric entropy and $n$-widths

Metric entropy as studied in [23] refers to the minimal number of bits needed to represent a function $f$ up to precision $\varepsilon$. This number determines the maximal compression when loss of information is bounded by $\varepsilon$. For a more rigorous mathematical exposition, let $Y$ be a compact subset of a metric space $(Z,\varrho)$. Given $\varepsilon>0$, let $N_\varepsilon(Y)$ be the $\varepsilon$-covering number of $Y$ in $Z$, i.e., the minimal number of balls of radius $\varepsilon$ that cover $Y$. Suppose that $g_1,\ldots,g_{N_\varepsilon(Y)}$ is a list of centers of these balls. Given any $f\in Y$, there is $g_j$ such that $\varrho(f,g_j)\le\varepsilon$. We may then represent $f$ using the binary representation of $j$, and use $g_j$ as the reconstruction of $f$ based on this representation. Any binary enumeration of these centers takes $\lceil\log_2(N_\varepsilon(Y))\rceil$ many bits, which gives a measure of the complexity of $Y$; here $\lceil\cdot\rceil$ denotes the ceiling function.

Definition 2.5.
Let $Y$ be a compact subset of a metric space $(Z,\varrho)$ and, for $\varepsilon>0$, let $N_\varepsilon(Y)$ be the $\varepsilon$-covering number of $Y$ in $Z$. Then
\[ H_\varepsilon(Y,Z) := \log_2(N_\varepsilon(Y)) \tag{7} \]
is called the metric entropy of $Y$ in $Z$.

Thus, the smallest integer not less than the metric entropy is the minimal number of bits necessary to represent any $f\in Y$ with precision $\varepsilon$.

Let us also introduce some alternative notions of complexity. Given some Banach space $Z$, let $n\ge 1$ and let $M_n:\mathbb{R}^n\to Z$ be some mapping, for which $Z_n := M_n(\mathbb{R}^n)$ denotes its range. For a compact subset $Y\subset Z$, we define the worst case error of approximation, consistently with (4), by
\[ E(Y,Z_n,Z) := \sup_{f\in Y} E(f,Z_n,Z). \]
We endow $Y$ with some quasi-norm and consider continuous maps $\bar a: Y\to\mathbb{R}^n$. For fixed $\bar a$, the term $M_n(\bar a(f))\in Z$ is an approximation of $f$ from $Z_n$. The quantity
\[ E(Y,\bar a,M_n,Z) := \sup_{f\in Y}\|f-M_n(\bar a(f))\|_Z \]
is the error of approximation by the nonlinear method of approximation $M_n(\bar a(\cdot)): Y\to Z$. The continuous nonlinear $n$-width is defined in [6] by
\[ d_n(Y,Z) := \inf_{\bar a,M_n} E(Y,\bar a,M_n,Z), \]
where the infimum runs over all continuous maps $\bar a: Y\to\mathbb{R}^n$ and all mappings $M_n:\mathbb{R}^n\to Z$. For further considerations on $n$-widths, we refer to [6, 9, 14, 29, 31, 39].

In the subsequent sections we shall compute the metric entropy (7) and the continuous nonlinear $n$-widths of the approximation ball $\overline{A}^s(X_p(\mathbb{X}))$ of radius 1 given by (6). We refer to [17] for related results.

We shall determine the asymptotics of the metric entropy of global and local approximation spaces, so let $B\subset\mathbb{X}$ be some nonempty ball, which is allowed to coincide with $\mathbb{X}$. Since $A^s(X_p(B))$ is not finite-dimensional, $\overline{A}^s(X_p(B))$ is not compact in the approximation space. Here, we consider $\overline{A}^s(X_p(B))$ as a subset of $X_p(B)$, in which it is compact, see [11] for $B=\mathbb{X}$; the case $B\subsetneq\mathbb{X}$ can be proven analogously.

The following result for the approximation space extends findings in [26] from the sphere to diffusion measure spaces.

Theorem 2.6.
Let $\mathbb{X}$ be a diffusion measure space and suppose we are given a nonempty ball $B\subset\mathbb{X}$ such that $\{\varphi_k|_B\}_{k=0}^\infty$ are linearly independent. If $s>0$ is fixed, then
\[ H_\varepsilon\bigl(\overline{A}^s(X_p(B)),X_p(B)\bigr) \asymp (1/\varepsilon)^{\alpha/s}, \quad \text{for all } 0<\varepsilon\le 1, \tag{8} \]
where $\alpha$ is the constant in Definition 2.1.

This result was verified in [11] provided that $B=\mathbb{X}$ and $p=\infty$. Note that the linear independence condition is trivially satisfied for $B=\mathbb{X}$ since $\{\varphi_k\}_{k=0}^\infty$ is even an orthonormal basis. In case $B$ is a proper subset of $\mathbb{X}$, the linear independence condition may be hard to check in general. However, if $\mathbb{X}$ is a compact, connected, real-analytic Riemannian manifold (equipped with a real analytic Riemannian metric), then the eigenfunctions $\{\varphi_k\}_{k=0}^\infty$ of the Laplace-Beltrami operator are real analytic due to the real analytic hypoellipticity of elliptic partial differential operators, cf. [22, Section 5.3]. In this situation, the global linear independence implies the linear independence of the restrictions. For results related to Theorem 2.6, see [9, 31, 32].

The proof of Theorem 2.6 is based on a general Banach space result, also used in [26, Theorem 4.1]. Let $Z$ be a Banach space and $\{\phi_k\}_{k=1}^\infty\subset Z$ be a sequence of linearly independent elements whose linear span is dense in $Z$, and define $Z_k := \operatorname{span}\{\phi_1,\ldots,\phi_k\}$ with $Z_0=\{0\}$. Let $\{\delta_k\}_{k=0}^\infty$ be a nonincreasing sequence of positive numbers with $\lim_{k\to\infty}\delta_k=0$ and define
\[ A(Z;\{\delta_k\}_{k=0}^\infty,\{\phi_k\}_{k=1}^\infty) := \{f\in Z : E(f,Z_k,Z)\le\delta_k, \text{ for } k=0,1,\ldots\}. \tag{9} \]
The following result goes back to Lorentz in [23]:

Theorem 2.7 (Theorem 3.3 in [24]). Let $\{\delta_k\}_{k=0}^\infty$ be a nonincreasing sequence of positive numbers such that $\delta_{2k}\le c\,\delta_k$, for $k=1,2,\ldots$ and some constant $c\in(0,1)$. For $\ell\ge 1$, let $M_\ell := \min\{k : \delta_k\le e^{-\ell}\}$, then we have, for $0<\varepsilon\le 1$,
\[ H_\varepsilon\bigl(A(Z;\{\delta_k\}_{k=0}^\infty,\{\phi_k\}_{k=1}^\infty),Z\bigr) \asymp \sum_{\ell=1}^{L} M_\ell, \tag{10} \]
where $L := 2+\lfloor\log(1/\varepsilon)\rfloor$.

Proof of Theorem 2.6.
We aim to apply Theorem 2.7 with the function system $\{\varphi_k|_B\}_{k=0}^\infty$ and $Z=X_p(B)$. There, the index set is supposed to start with $k=1$, so we set $\phi_k=\varphi_{k-1}|_B$, $k=1,2,\ldots$. To define the sequence $\{\delta_k\}_{k=0}^\infty$, we need some preparation. The linear independence assumption yields that (3) implies $\dim(\Pi_N|_B)\asymp N^\alpha$. By using $Z_k := \operatorname{span}\{\varphi_0|_B,\ldots,\varphi_{k-1}|_B\}$, we derive, for $N^\alpha\le k\le(2N)^\alpha$,
\[ (2N)^s E(f,\Pi_{2N}|_B,X_p(B)) \lesssim k^{s/\alpha}E(f,Z_k,X_p(B)) \lesssim N^s E(f,\Pi_N|_B,X_p(B)). \]
Therefore, there are constants $C_i\ge 1$, for $i=1,2$, such that the definitions
\[ \delta_{1;0}=\tfrac12,\quad \delta_{1;k} := (2C_1)^{-1}k^{-s/\alpha}, \qquad\text{and}\qquad \delta_{2;0}=C_2,\quad \delta_{2;k} := C_2k^{-s/\alpha}, \]
lead to
\[ A(X_p(B);\{\delta_{1;k}\}_{k=0}^\infty,\{\phi_k\}_{k=1}^\infty) \subset \overline{A}^s(X_p(B)) \subset A(X_p(B);\{\delta_{2;k}\}_{k=0}^\infty,\{\phi_k\}_{k=1}^\infty), \]
which also yields
\[ N_\varepsilon\bigl(A(X_p(B);\{\delta_{1;k}\}_{k=0}^\infty,\{\phi_k\}_{k=1}^\infty)\bigr) \le N_\varepsilon\bigl(\overline{A}^s(X_p(B))\bigr) \le N_\varepsilon\bigl(A(X_p(B);\{\delta_{2;k}\}_{k=0}^\infty,\{\phi_k\}_{k=1}^\infty)\bigr). \]
Since $\delta_{i;2k}\le c\,\delta_{i;k}$, for $c := 2^{-s/\alpha}\in(0,1)$, Theorem 2.7 applies to both sequences. A direct computation yields $M_\ell\asymp e^{\ell\alpha/s}$ and hence $\sum_{\ell=1}^{L}M_\ell\asymp e^{L\alpha/s}$, so that the choice of $L$ in (10) implies (8).

It is obvious that increased precision requires more bits, and smoother functions can be represented with fewer bits. The exact growth condition (8) reflects these thoughts in a quantitative fashion and serves as a benchmark for function representation on diffusion measure spaces. The remaining part of the present work is dedicated to developing a scheme that matches the optimality bound at least up to a logarithmic factor.
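The growth rate in (8) can be checked numerically from the right-hand side of (10). The following is a minimal sketch, assuming the model sequence $\delta_k = Ck^{-s/\alpha}$ with a hypothetical constant $C\ge 1$; the function name `lorentz_sum` and the parameter values are illustrative choices and not part of the original results.

```python
import math


def lorentz_sum(eps: float, alpha: float, s: float, C: float = 1.0) -> int:
    """Right-hand side of (10) for the model sequence delta_k = C * k**(-s/alpha).

    Assumes C >= 1, so that delta_0 > exp(-ell) and the minimum defining
    M_ell is attained at some k >= 1.
    """
    L = 2 + math.floor(math.log(1.0 / eps))
    total = 0
    for ell in range(1, L + 1):
        # M_ell = min{k : C * k**(-s/alpha) <= exp(-ell)}
        #       = ceil((C * e**ell)**(alpha/s))   (up to floating-point rounding)
        total += math.ceil((C * math.exp(ell)) ** (alpha / s))
    return total


if __name__ == "__main__":
    alpha, s = 2.0, 1.0  # illustrative values only
    for eps in (1e-1, 1e-2, 1e-3):
        value = lorentz_sum(eps, alpha, s)
        ratio = value / (1.0 / eps) ** (alpha / s)
        print(f"eps={eps:g}  sum={value}  ratio to (1/eps)^(alpha/s): {ratio:.2f}")
```

For these parameters the printed ratio stays between fixed positive constants as $\varepsilon$ decreases, which is the content of the relation $\sum_{\ell=1}^{L}M_\ell\asymp(1/\varepsilon)^{\alpha/s}$ used in the proof.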
3. Global function approximation
This section is dedicated to introducing our approximation scheme, discussing its characterization of certain function spaces, and determining the asymptotics of the $n$-widths.

We now collect a few ingredients for our approximation scheme.
Definition 3.1.
We call an infinitely often differentiable and non-increasing function $H:\mathbb{R}_{\ge 0}\to\mathbb{R}$ a low-pass filter if $H(t)=1$ for $t\le 1/2$ and $H(t)=0$ for $t\ge 1$.

For a low-pass filter $H$, the kernel
\[ K_N(x,y) := \sum_{k=0}^\infty H\Bigl(\frac{\lambda_k}{N}\Bigr)\varphi_k(x)\varphi_k(y), \tag{11} \]
cf. [12, 25, 27], is localized, i.e., for fixed $r>\alpha$ and all $x\ne y$ with $N=1,2,\ldots$,
\[ |K_N(x,y)| \lesssim \frac{N^{\alpha-r}}{\rho(x,y)^r}. \tag{12} \]
Alternatively, we also have for fixed $r>\alpha$ and all $x,y$ with $N=1,2,\ldots$,
\[ |K_N(x,y)| \lesssim \frac{N^\alpha}{\max\bigl(1,(N\rho(x,y))^r\bigr)}. \tag{13} \]
We find in [12, Inequality (3.12)] that
\[ \sup_{y\in\mathbb{X}}\int_{\mathbb{X}}|K_N(x,y)|\,d\mu(x) \lesssim 1. \tag{14} \]
For a signed Borel measure $\nu$ on $\mathbb{X}$ and $f\in L_1(\mathbb{X},|\nu|)$, we can define, for $N=1,2,\ldots$,
\[ \sigma_N(f,\nu) := \sum_{k=0}^\infty H\Bigl(\frac{\lambda_k}{N}\Bigr)\int_{\mathbb{X}}f(x)\varphi_k(x)\,d\nu(x)\,\varphi_k = \int_{\mathbb{X}}f(x)K_N(x,\cdot)\,d\nu(x). \tag{15} \]

This section is dedicated to introducing further ingredients to develop our approximation scheme. We first aim to replace the integral over diffusion polynomials with a finite sum or at least with an integral over a “simpler” measure.
Definition 3.2.
A signed Borel measure $\nu$ on $\mathbb{X}$ is called a quadrature measure of order $N$ if
\[ \int_{\mathbb{X}}f(x)g(x)\,d\mu(x) = \int_{\mathbb{X}}f(x)g(x)\,d\nu(x), \quad\text{for all } f,g\in\Pi_N. \]

Since $\Pi_N$ is finite dimensional, so is the collection of products and, hence, there do exist finitely supported quadrature measures, cf. [5], [34, Section 1.5]. Note that we request exact integration of products of functions in $\Pi_N$, which leads us to the following product assumption on the structure of diffusion polynomials, which is also used in [13]: for $f\in L_p(\mathbb{X})$, let
\[ \operatorname{dist}(f,\Pi_N)_{L_\infty} := \inf_{h\in\Pi_N}\|f-h\|_{L_\infty} \]
and assume that there is a constant $a\ge 2$ such that
\[ \epsilon_N := \sup_{\lambda_\ell,\lambda_k\le N}\operatorname{dist}(\varphi_\ell\varphi_k,\Pi_{aN})_{L_\infty} \tag{16} \]
satisfies $N^m\epsilon_N\to 0$ as $N\to\infty$, for all $m>0$.

Definition 3.3.
Let the product assumption hold. For fixed $1\le p\le\infty$, a family $(\mu_N)_{N=1}^\infty$ of positive quadrature measures of order $N$, respectively, is called a family of Marcinkiewicz-Zygmund quadrature measures of order $N$, respectively, if
\[ \|f\|_{|\mu_N|,L_p(\mathbb{X})} \asymp \|f\|_{L_p(\mathbb{X})}, \quad\text{for all } f\in\Pi_N, \tag{17} \]
where $|\mu_N|$ denotes the total variation measure of $\mu_N$ and $\|f\|_{|\mu_N|,L_p(\mathbb{X})}$ denotes the $L_p$-norm of $f$ with respect to $|\mu_N|$.

Under fairly general assumptions, the results in [12, Theorem 3.1] and [13, Theorem 5.8] imply the existence of a family of finitely supported Marcinkiewicz-Zygmund quadrature measures $(\mu_N)_{N=1}^\infty$ of order $N$, respectively, such that $\#\operatorname{supp}(\mu_N)\asymp N^\alpha$, where $\alpha$ is as in Definition 2.1.

3.3. Widths and characterization of approximation spaces

Fix $1\le p\le\infty$ and suppose that $(\mu_N)_{N=1}^\infty$ is a family of Marcinkiewicz-Zygmund quadrature measures of order $N$, respectively. According to [12, Inequality (3.13)], we have
\[ \|\sigma_N(f,\mu_N)\|_{L_p} \lesssim \|f\|_{|\mu_N|,L_p}, \quad f\in L_p(\mathbb{X},|\mu_N|), \tag{18} \]
and the generic constant can be chosen independently of $f$ and $N$. Later, we shall also need that (17) extends to all functions in $X_p(\mathbb{X})$, i.e., that $\|f\|_{|\mu_N|,L_p}\lesssim\|f\|_{L_p(\mathbb{X})}$ holds, for all $f\in X_p(\mathbb{X})$. The latter is fine for $p=\infty$. For $1\le p<\infty$, we have not yet found any explicit example except for the measure $\mu$ itself. Therefore, we shall simply restrict ourselves to $\mu_N=\mu$, $N=1,2,\ldots$, in this case.

We can also estimate
\[ \sup_{y\in\mathbb{X}}\int_{\mathbb{X}}|K_N(x,y)|\,d|\mu_N|(x) \lesssim 1, \tag{19} \]
cf. [12, Inequality (3.12)]. Note that (19) is the quadrature version of (14). Next, we recall the characterization of $A^s(L_p(\mathbb{X}))$ using $\sigma_N$, see [12, 25, 27, 28]:

Theorem 3.4.
Suppose that $1\le p\le\infty$ and assume that $(\mu_N)_{N=1}^\infty$ is a family of Marcinkiewicz-Zygmund quadrature measures of order $N$, respectively, if $p=\infty$. For $1\le p<\infty$ we choose $\mu_N=\mu$, $N=1,2,\ldots$. Assume further that $H$ is a low-pass filter. Then, for all $f\in A^s(L_p(\mathbb{X}))$, we have
\[ \|f-\sigma_N(f,\mu_N)\|_{L_p(\mathbb{X})} \lesssim N^{-s}\|f\|_{A^s(L_p(\mathbb{X}))}, \tag{20} \]
where the generic constant does not depend on $N$ or $f$. On the other hand, if, for $f\in L_p(\mathbb{X})$, there are generic constants not depending on $N$ such that
\[ \|f-\sigma_N(f,\mu_N)\|_{L_p(\mathbb{X})} \lesssim N^{-s}, \]
then $f\in A^s(L_p(\mathbb{X}))$.

Remark 3.5.
Let us point out again that we suppose $\mu_N=\mu$, $N=1,2,\ldots$, for $1\le p<\infty$. In this case, the term $\sigma_N(f,\mu_N)$ contains spectral information $\int_{\mathbb{X}}f(x)\varphi_k(x)\,d\mu(x)$. If $p=\infty$ and $\mu_N$ has finite support, then we have an approximation scheme that uses finitely many training data consisting of function evaluations at scattered data points in $\operatorname{supp}(\mu_N)$.

We have already determined the asymptotics of the Kolmogorov metric entropy. Here, we shall determine the $n$-widths for the global approximation space.

Theorem 3.6.
The continuous nonlinear $n$-width of $\overline{A}^s(X_p(\mathbb{X}))$ in $X_p(\mathbb{X})$ satisfies
\[ d_n\bigl(\overline{A}^s(X_p(\mathbb{X})),X_p(\mathbb{X})\bigr) \asymp n^{-s/\alpha}. \]

In order to verify Theorem 3.6 we shall consider two more types of $n$-widths, cf. [33]. The linear $n$-width of a subset $Y$ in a Banach space $Z$ is
\[ L_n(Y,Z) := \inf_{F_n}\sup_{x\in Y}\|x-F_n(x)\|, \]
where the infimum is taken over all bounded linear operators $F_n$ on $Z$ whose range is of dimension at most $n$. The Bernstein $n$-width of $Y$ in $Z$ is
\[ B_n(Y,Z) := \sup_{Z_{n+1}}\sup\{\lambda : \lambda\overline{Z}_{n+1}\subset Y\}, \]
where the outer supremum is taken over all subspaces $Z_{n+1}$ of $Z$ of dimension at least $n+1$ and $\overline{Z}_{n+1}$ denotes the unit ball in $Z_{n+1}$. The continuous nonlinear $n$-width is sandwiched by
\[ B_n(Y,Z) \le d_n(Y,Z) \le L_n(Y,Z), \tag{21} \]
cf. [6]. We can now take care of the proof. We point out that we shall derive a lower bound on the Bernstein $n$-width, which is closely related to the Bernstein estimates in [25] for integer derivatives when combined with K-functionals, cf. [6].

Proof of Theorem 3.6.
The operator $\sigma_N$ is bounded on $L_p(\mathbb{X})$ and $\sigma_N(f)$ is an element in $\Pi_N$. Consider $N$ with $N\asymp n^{1/\alpha}$, so that $\dim(\Pi_N)\asymp N^\alpha$ yields
\[ L_n\bigl(\overline{A}^s(X_p(\mathbb{X})),X_p(\mathbb{X})\bigr) \lesssim n^{-s/\alpha}, \]
cf. Theorem 3.4. Hence, the second inequality in (21) implies $d_n\bigl(\overline{A}^s(X_p(\mathbb{X})),X_p(\mathbb{X})\bigr)\lesssim n^{-s/\alpha}$.

To establish the lower bound, we take the subspace $\Pi_{N+1}$ and aim to derive a generic constant $c>0$ such that $cN^{-s}\,\overline{\Pi}_{N+1}\subset\overline{A}^s(X_p(\mathbb{X}))$, where $\overline{\Pi}_{N+1} = \{f\in\Pi_{N+1} : \|f\|_{L_p(\mathbb{X})}\le 1\}$. For $f\in\overline{\Pi}_{N+1}$, let $g := N^{-s}f$. We obtain $\|g\|_{L_p(\mathbb{X})}\le N^{-s}$ and, for $M>N$, we derive $E(g,\Pi_M,L_p(\mathbb{X}))=0$. The choice $M\le N$ yields
\[ E(g,\Pi_M,L_p(\mathbb{X})) \le N^{-s}\|f-\sigma_M(f)\|_{L_p(\mathbb{X})} \lesssim N^{-s}, \]
because $\|f\|_{L_p(\mathbb{X})}\le 1$ and $\|\sigma_M(f)\|_{L_p(\mathbb{X})}\lesssim\|f\|_{L_p(\mathbb{X})}$. Thus,
\[ \sup_{M\ge 1}M^sE(g,\Pi_M,L_p(\mathbb{X})) \lesssim 1, \]
so that there is a constant $c$ such that $cN^{-s}\,\overline{\Pi}_{N+1}\subset\overline{A}^s(L_p(\mathbb{X}))$. Therefore, we obtain
\[ B_n\bigl(\overline{A}^s(X_p(\mathbb{X})),X_p(\mathbb{X})\bigr) \gtrsim n^{-s/\alpha}, \]
so that (21) concludes the proof.

It should be mentioned that upper bounds on linear $n$-widths for compact Riemannian manifolds were already derived in [16], where also the exact asymptotics were obtained for compact homogeneous manifolds.
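The ingredients of this section, namely the low-pass filter of Definition 3.1, the kernel (11), and the operator (15) evaluated with a finitely supported measure, can be sketched on the unit circle. This is an illustration under stated assumptions and not the construction of the paper: we take $\mathbb{X}=[0,2\pi)$ with the normalized Lebesgue measure, the basis $1,\sqrt2\cos(jx),\sqrt2\sin(jx)$ with $\lambda=j$ (so $\alpha=1$), a particular smooth filter built from $\exp(-1/v)$, and the uniform measure on $M=4N$ equispaced nodes as quadrature measure. All function names are hypothetical.

```python
import math


def low_pass(t: float) -> float:
    """A smooth, non-increasing filter with H(t) = 1 for t <= 1/2 and H(t) = 0 for t >= 1."""
    if t <= 0.5:
        return 1.0
    if t >= 1.0:
        return 0.0
    u = 2.0 * t - 1.0  # maps (1/2, 1) onto (0, 1)

    def g(v: float) -> float:
        return math.exp(-1.0 / v)  # smooth on (0, inf), flat as v -> 0

    return g(1.0 - u) / (g(u) + g(1.0 - u))


def kernel(N: int, x: float, y: float) -> float:
    """K_N(x, y) from (11) on the circle: 1 + 2 * sum_j H(j/N) * cos(j * (x - y))."""
    return 1.0 + 2.0 * sum(low_pass(j / N) * math.cos(j * (x - y)) for j in range(1, N))


def sigma(N: int, f, y: float) -> float:
    """sigma_N(f, nu)(y) from (15), with nu uniform on M = 4N equispaced nodes (assumed choice)."""
    M = 4 * N
    nodes = [2.0 * math.pi * i / M for i in range(M)]
    return sum(f(x) * kernel(N, x, y) for x in nodes) / M


def sup_error(N: int, f, num_eval: int = 100) -> float:
    """Maximum of |f - sigma_N(f, nu)| over an equispaced evaluation grid."""
    ys = [2.0 * math.pi * i / num_eval for i in range(num_eval)]
    return max(abs(f(y) - sigma(N, f, y)) for y in ys)


if __name__ == "__main__":
    f = lambda x: abs(math.sin(x))  # Lipschitz, not differentiable at 0 and pi
    for N in (4, 8, 16, 32):
        print(N, sup_error(N, f))
```

Since $H(\lambda_k/N)=1$ for $\lambda_k\le N/2$, the operator reproduces trigonometric polynomials of degree at most $N/2$ up to rounding, and the printed errors for the Lipschitz function decrease as $N$ grows, in line with the estimate (20) for $p=\infty$.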
4. Bit representation in global approximation spaces
This section is dedicated to verifying that linear quantization of the approximation scheme $\sigma_N(f,\mu_N)$ enables bit representations matching the optimality bounds stated in Theorem 2.6 up to a logarithmic factor. First, we recall the formula (15),
\[ \sigma_N(f,\mu_N) = \sum_{k=0}^\infty H\Bigl(\frac{\lambda_k}{N}\Bigr)\int_{\mathbb{X}}f(x)\varphi_k(x)\,d\mu_N(x)\,\varphi_k, \]
where $(\mu_N)_{N=1}^\infty$ is a family of Marcinkiewicz-Zygmund quadrature measures of order $N$, respectively, if $p=\infty$. Again, if $1\le p<\infty$, then we choose $\mu_N=\mu$, $N=1,2,\ldots$. Since $H(t)=1$, for $t\in[0,1/2]$, and $H(t)=0$, for $t>1$, we observe that $H(\frac{\lambda_k}{N})H(\frac{\lambda_k}{2N}) = H(\frac{\lambda_k}{N})$. If $(\nu_N)_{N=1}^\infty$ is a family of quadrature measures of order $2N$, respectively, then a straightforward calculation yields
\[ \sigma_N(f,\mu_N) = \int_{\mathbb{X}}\sigma_N(f,\mu_N,x)\sum_{k=0}^\infty H\Bigl(\frac{\lambda_k}{2N}\Bigr)\varphi_k(x)\,d\nu_N(x)\,\varphi_k, \tag{22} \]
where $\sigma_N(f,\mu_N,x)$ denotes the value of $\sigma_N(f,\mu_N)$ at $x$. The representation (22) involves the quadrature measure $\nu_N$ and the Marcinkiewicz-Zygmund quadrature measure $\mu_N$. To design the final approximation scheme, we fix some $S>1$, apply the quantization
\[ I_N(f,\mu_N,x) = \lfloor N^S\sigma_N(f,\mu_N,x)\rfloor, \tag{23} \]
and define the actual approximation by
\[ \sigma_N^\circ(f,\mu_N,\nu_N) := N^{-S}\int_{\mathbb{X}}I_N(f,\mu_N,x)\sum_{k=0}^\infty H\Bigl(\frac{\lambda_k}{2N}\Bigr)\varphi_k(x)\,d\nu_N(x)\,\varphi_k. \tag{24} \]
In other words, we replace $\sigma_N(f,\mu_N,x)$ in (22) with a number on the grid $N^{-S}\mathbb{Z}$.

We have the following result for the ball $\overline{A}^s(L_p(\mathbb{X}))$ of radius 1 of the global approximation space given by (5). It extends results in [11] from compact Riemannian manifolds and $p=\infty$ to diffusion measure spaces and to the entire range $1\le p\le\infty$:

Theorem 4.1.
For the case $p=\infty$, we assume that $(\mu_N)_{N=1}^\infty$ is a family of Marcinkiewicz-Zygmund quadrature measures of order $N$, respectively. For $1\le p<\infty$ we choose $\mu_N=\mu$, $N=1,2,\ldots$. Assume further that $H$ is a low-pass filter. We suppose that $(\nu_N)_{N=1}^\infty$ are Marcinkiewicz-Zygmund quadrature measures of order $2N$ with $\#\operatorname{supp}(\nu_N)\lesssim N^\alpha$. For fixed $s>0$ and $S>\max(1,s)$, we apply the discretizations (23) and (24). Then there is a constant $c>0$ such that, for all $f\in\overline{A}^s(L_p(\mathbb{X}))$,
\[ \|f-\sigma_N^\circ(f,\mu_N,\nu_N)\|_{L_p(\mathbb{X})} \le cN^{-s} \tag{25} \]
holds. For $cN^{-s}=\varepsilon$ with $\varepsilon\le 1$, the number of bits needed to represent all integers $\{I_N(f,\mu_N,x) : x\in\operatorname{supp}(\nu_N)\}$ does not exceed a positive constant (independent of $\varepsilon$) times
\[ (1/\varepsilon)^{\alpha/s}\bigl(1+\log_2(1/\varepsilon)\bigr). \tag{26} \]

Proof of Theorem 4.1.
The triangle inequality yields
\[ \|f-\sigma_N^\circ(f,\mu_N,\nu_N)\|_{L_p(\mathbb{X})} \le \|f-\sigma_N(f,\mu_N)\|_{L_p(\mathbb{X})} + \|\sigma_N(f,\mu_N)-\sigma_N^\circ(f,\mu_N,\nu_N)\|_{L_p(\mathbb{X})}. \]
Since Theorem 3.4 implies $\|f-\sigma_N(f,\mu_N)\|_{L_p(\mathbb{X})}\lesssim N^{-s}\|f\|_{A^s(L_p(\mathbb{X}))}$, we only need to take care of the rightmost term. The quantization (23) immediately yields
\[ |\sigma_N(f,\mu_N,x)-N^{-S}I_N(f,\mu_N,x)| \le N^{-S}, \quad\text{for all } x\in\operatorname{supp}(\nu_N), \tag{27} \]
so that (22) and (19) imply
\[ \|\sigma_N(f,\mu_N)-\sigma_N^\circ(f,\mu_N,\nu_N)\|_{L_p(\mathbb{X})} = \Bigl\|\int_{\mathbb{X}}\bigl(\sigma_N(f,\mu_N,y)-N^{-S}I_N(f,\mu_N,y)\bigr)K_{2N}(\cdot,y)\,d\nu_N(y)\Bigr\|_{L_p(\mathbb{X})} \lesssim N^{-S} \le N^{-s}. \]
Hence, we have derived (25).

To tackle (26), we observe that the localization property (13) yields $\|g\|_{L_\infty}\lesssim N^\alpha\|g\|_{L_1}$, for all $g\in\Pi_N$, see also [27, Lemma 5.5] for more general Nikolskii inequalities. We apply (27) and then use $\sigma_N(f,\mu_N)\in\Pi_N$ with $L_p\hookrightarrow L_1$, which yields
\[ |I_N(f,\mu_N,x)| \lesssim N^S\|\sigma_N(f,\mu_N)\|_{L_\infty(\mathbb{X})} \lesssim N^{S+\alpha}\|\sigma_N(f,\mu_N)\|_{L_p(\mathbb{X})}. \]
By (18) and the extension of (17) to $X_p(\mathbb{X})$, the estimate $\|\sigma_N(f,\mu_N)\|_{L_p(\mathbb{X})}\lesssim\|f\|_{L_p(\mathbb{X})}$ holds. Since $f$ is contained in the ball of radius 1, so that $\|f\|_{L_p(\mathbb{X})}\le 1$, we see that
\[ |I_N(f,\mu_N,x)| \lesssim N^{S+\alpha}. \]
Thus, the number of bits needed to represent each $I_N(f,\mu_N,x)$ is at most $\log_2(c_1N^{S+\alpha})$, where $c_1\ge 1$ is a constant; we may assume $c_1N^{S+\alpha}\ge 1$ since otherwise $I_N(f,\mu_N,x)$ would be zero. Since $\#\operatorname{supp}(\nu_N)\lesssim N^\alpha$, we have
\[ \#\{I_N(f,\mu_N,x) : x\in\operatorname{supp}(\nu_N)\} \lesssim N^\alpha. \]
Therefore, the total number of bits needed to represent all numbers $\{I_N(f,\mu_N,x) : x\in\operatorname{supp}(\nu_N)\}$ is at most $c_2N^\alpha\log_2(c_1N^{S+\alpha})$, where $c_2$ is a positive constant. By using $cN^{-s}=\varepsilon$ and $\varepsilon\le 1$, we derive that the number of necessary bits does not exceed
\[ c_2c^{\alpha/s}(1/\varepsilon)^{\alpha/s}\log_2\bigl(c_1(c/\varepsilon)^{(S+\alpha)/s}\bigr) \lesssim (1/\varepsilon)^{\alpha/s}\log_2\bigl(c_1^{s/(S+\alpha)}c/\varepsilon\bigr) \lesssim (1/\varepsilon)^{\alpha/s}\log_2\bigl((c_1/\varepsilon)^{s/(S+\alpha)}c/\varepsilon\bigr) \lesssim (1/\varepsilon)^{\alpha/s}\bigl(1+\log_2(1/\varepsilon)\bigr), \]
which concludes the proof.

Theorem 4.1 yields that our bit representation scheme is optimal with respect to the metric entropy as stated in Theorem 2.6 at least up to a logarithmic factor.
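The quantization step (23) and the bit count in the proof of Theorem 4.1 can be illustrated with a small computation. The following is a minimal sketch under simplifying assumptions: $S$ and $\alpha$ are taken to be integers so that all grid computations are exact, the generic constants $c_1,c_2$ are hypothetical and set to 1, and $\varepsilon=N^{-s}$ (that is, $c=1$). The function names are illustrative and not taken from the paper.

```python
import math


def quantize(value: float, N: int, S: int) -> int:
    """Linear quantization as in (23): I = floor(N^S * value)."""
    return math.floor(N ** S * value)


def dequantize(index: int, N: int, S: int) -> float:
    """Grid point N^(-S) * I that replaces the original value in (24)."""
    return index / N ** S


def total_bits(N: int, S: int, alpha: int, c1: int = 1, c2: int = 1) -> int:
    """Bits for c2 * N^alpha integers, each bounded in modulus by c1 * N^(S + alpha)."""
    bound = c1 * N ** (S + alpha)
    # Enough bits to index the 2 * bound + 1 integers in [-bound, bound].
    bits_each = (2 * bound).bit_length()
    return c2 * N ** alpha * bits_each


def benchmark(N: int, s: float, alpha: int) -> float:
    """(1/eps)^(alpha/s) * (1 + log2(1/eps)) from (26), with eps = N^(-s)."""
    inv_eps = float(N) ** s
    return inv_eps ** (alpha / s) * (1.0 + math.log2(inv_eps))


if __name__ == "__main__":
    s, S, alpha = 1.5, 2, 1  # illustrative values with S > max(1, s)
    for n in range(1, 9):
        N = 2 ** n
        print(N, total_bits(N, S, alpha), total_bits(N, S, alpha) / benchmark(N, s, alpha))
```

With these choices the ratio between the counted bits and the benchmark (26) stays bounded as $N$ grows, while the quantization error of each value never exceeds $N^{-S}$, in agreement with (27).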
5. Bit representation of locally smooth functions
In the previous section we derived a bit-representation scheme for the global approximation space, i.e., $B=\mathbb{X}$. It turns out that the case $B\subsetneq\mathbb{X}$ is more involved because we do not have results that characterize $A^s(L_p(B))$ by means of $\sigma_N$. In fact, $\sigma_N$ requires functions to be defined globally so that one would be forced to deal with boundary effects. On the other hand, $B$ itself may not be a diffusion measure space satisfying all required assumptions. We circumvent such difficulties by dealing with a modified approximation space, for which we can construct a bit representation scheme.

Before we can discuss local smoothness, a few technical details need to be introduced, and we make use of $C^\infty(\mathbb{X}) := \bigcap_{s>0}A^s(L_\infty(\mathbb{X}))$:

Definition 5.1.
We say that $\mathbb{X}$ satisfies the smooth cut-off property if, for any $s>0$ and any balls $B',B$ with $B'\subsetneq B$, there is $\phi\in C^\infty(\mathbb{X})$ such that $\phi$ equals 1 on $B'$ and $\phi$ vanishes outside of $B$ with $|\phi(x)|\le 1$, for all $x\in\mathbb{X}$.

Note that any smooth manifold satisfies the smooth cut-off property.

Definition 5.2.
For $x\in\mathbb{X}$, the local approximation space in $x$ is denoted by $A^s(X_p(\mathbb{X}),x)$ and defined as the collection of $f\in X_p(\mathbb{X})$ such that there is an open ball $B$ containing $x$ with $f\phi\in A^s(X_p(\mathbb{X}))$, for all $\phi\in C^\infty(\mathbb{X})$ with support in $B$.

It turns out that the approximation rate of $\sigma_N(f,\mu_N)$ characterizes the approximation class of $f$, at least when switching to dyadic numbers $N=2^n$, $n=1,2,\ldots$, cf. [12, 25, 27, 28]:

Theorem 5.3.
Let $\mathbb{X}$ satisfy the smooth cut-off property. For $p=\infty$, suppose that $(\mu_{2^n})_{n=1}^\infty$ is a family of Marcinkiewicz-Zygmund quadrature measures of order $2^n$, respectively. For $1\le p<\infty$, we choose $\mu_{2^n}=\mu$, $n=1,2,\ldots$. If $H$ is a low-pass filter, then the following points are equivalent:
(i) $f\in A^s(X_p(\mathbb{X}),x)$,
(ii) there is a ball $B$ centered at $x$ such that
\[ \|f-\sigma_{2^n}(f,\mu_{2^n})\|_{L_p(B)} \lesssim 2^{-ns}. \tag{28} \]

Note that the generic constant in (28) may depend on $x$ and $f$. Nonetheless, the above theorem characterizes local approximation by means of $\sigma_{2^n}$.

The local approximation space $A^s(X_p(\mathbb{X}),x)$, for $x\in\mathbb{X}$, is not endowed with any norm. In view of Theorem 5.3, we fix some ball $B$ and introduce a new approximation space in the following.

Definition 5.4.
Let $B$ be a nonempty ball in $\mathbb{X}$. For $p=\infty$, suppose that $(\mu_{2^n})_{n=1}^\infty$ is a family of Marcinkiewicz-Zygmund quadrature measures of order $2^n$, respectively. For $1\le p<\infty$, we choose $\mu_{2^n}=\mu$, $n=1,2,\ldots$. If $H$ is a low-pass filter, then we define the local approximation space in $B$ by
\[ A^s(X_p(\mathbb{X}),B) := \{f\in X_p(\mathbb{X}) : \|f\|_{A^s(X_p(\mathbb{X}),B)}<\infty\}, \]
where
\[ \|f\|_{A^s(X_p(\mathbb{X}),B)} := \|f\|_{L_p(\mathbb{X})} + \sup_{n\ge 1}2^{ns}\|f-\sigma_{2^n}(f,\mu_{2^n})\|_{L_p(B)}. \]

Note that if $p=\infty$, then the space $A^s(X_p(\mathbb{X}),B)$ implicitly depends on the family $(\mu_{2^n})_{n=1}^\infty$ of Marcinkiewicz-Zygmund quadrature measures of order $2^n$, respectively. As opposed to $A^s(X_p(B))$ defined in (5), the space $A^s(X_p(\mathbb{X}),B)$ consists of functions defined globally that inherit approximation properties locally. By definition, we have
\[ \|f-\sigma_{2^n}(f,\mu_{2^n})\|_{L_p(B)} \le 2^{-ns}\|f\|_{A^s(X_p(\mathbb{X}),B)}. \tag{29} \]
Since $\sigma_{2^n}(f,\mu_{2^n})$ is a diffusion polynomial, we observe that
\[ A^s(X_p(\mathbb{X}),B)|_B \hookrightarrow A^s(X_p(B)). \]
However, we cannot claim that the reverse embedding also holds.

It should be mentioned that $\sigma_{2^n}(f,\mu_{2^n})$ in (29) approximates $f$ locally but its definition needs global knowledge of $f$, or at least knowledge of $f$ on $\operatorname{supp}(\mu_{2^n})$ if $p=\infty$. To enable the design of an approximation scheme that involves local information on $f$ exclusively, we define one more approximation space by using some cut-off function:

Definition 5.5.
For some fixed $\phi\in C^\infty(\mathbb{X})$, define
\[ A^s(X_p(\mathbb{X}),\phi) := \{f\in X_p(\mathbb{X}) : f\phi\in A^s(X_p(\mathbb{X}))\} \tag{30} \]
endowed with the norm $\|f\|_{A^s(X_p(\mathbb{X}),\phi)} := \|f\|_{L_p(\mathbb{X})} + \sup_{n\ge 0}2^{ns}E(f\phi,\Pi_{2^n},L_p(\mathbb{X}))$.

To study local approximation, choose two concentric balls $B',B$ with $B'\subsetneq B$. If $\mathbb{X}$ satisfies the smooth cut-off property, then we can fix some $\phi\in C^\infty(\mathbb{X})$ that is one on $B'$ and zero outside of $B$. The definition (30) yields that $f\in A^s(X_p(\mathbb{X}),\phi)$ implies
\[ \|f-\sigma_{2^n}(f\phi,\mu_{2^n})\|_{L_p(B')} \le \|f\phi-\sigma_{2^n}(f\phi,\mu_{2^n})\|_{L_p(\mathbb{X})} \lesssim 2^{-ns}\|f\|_{A^s(X_p(\mathbb{X}),\phi)}. \]
In the subsequent section, we shall consider the unit balls of both spaces,
\[ \overline{A}^s(X_p(\mathbb{X}),B) := \{f\in X_p(\mathbb{X}) : \|f\|_{A^s(X_p(\mathbb{X}),B)}\le 1\}, \tag{31} \]
\[ \overline{A}^s(X_p(\mathbb{X}),\phi) := \{f\in X_p(\mathbb{X}) : \|f\|_{A^s(X_p(\mathbb{X}),\phi)}\le 1\}, \tag{32} \]
and aim to develop approximation schemes requiring only few bits.

5.2. Local bit-representation in approximation classes

To design an approximation scheme for the spaces $A^s(X_p(\mathbb{X}),B)$ and $A^s(X_p(\mathbb{X}),\phi)$, let $B$ be a ball in $\mathbb{X}$ and let $B'\subset B$ be another nonempty ball concentric with $B$ and of strictly smaller radius. It will turn out that the following scheme enables us to approximate $f$ on $B'$. For some fixed $S>1$, we apply the quantization $I_{2^n}(f,\mu_{2^n},x)$ as in (23) but in place of (24) we define the local approximation by
\[ \sigma_{2^n}^\circ(f,\mu_{2^n},\nu_{2^n},B) := 2^{-nS}\int_B I_{2^n}(f,\mu_{2^n},x)\sum_{k=0}^\infty H\Bigl(\frac{\lambda_k}{2^{n+1}}\Bigr)\varphi_k(x)\,d\nu_{2^n}(x)\,\varphi_k. \tag{33} \]
We have the following result for the ball $\overline{A}^s(X_p(\mathbb{X}),B)$ of the localized approximation space given by (31):

Theorem 5.6.
Suppose that $\mathbb{X}$ satisfies the smooth cut-off property and that $(\mu_n)_{n=1}^{\infty}$ is a family of Marcinkiewicz-Zygmund quadrature measures of order $2^n$, respectively, if $p = \infty$. For $1 \le p < \infty$ we choose $\mu_n = \mu$, $n = 1, 2, \ldots$. Assume further that $H$ is a low-pass filter. Let $B$, $B'$ be two concentric balls, so that $B' \subsetneq B$. We also suppose that $(\nu_n)_{n=1}^{\infty}$ are Marcinkiewicz-Zygmund quadrature measures of order $2^{n+1}$ with $\#\operatorname{supp}(\nu_n) \lesssim 2^{n\alpha}$. For fixed $s > 0$ and $S > \max(1, s)$, we apply the discretizations (23) and (33). Then there is a constant $c > 0$ such that, for all $f \in \mathcal{A}^s(X^p(\mathbb{X}), B)$,
\[
\|f - \sigma_n^\circ(f,\mu_n,\nu_n,B)\|_{L^p(B')} \le c\, 2^{-ns} \tag{34}
\]
holds. For $c\, 2^{-ns} = \varepsilon \le 1$, the number of bits needed to represent all integers $\{ I_n(f,\mu_n,x) : x \in \operatorname{supp}(\nu_n) \cap B \}$ does not exceed a positive constant (independent of $\varepsilon$) times
\[
(1/\varepsilon)^{\alpha/s} \bigl(1 + \log(1/\varepsilon)\bigr). \tag{35}
\]

Proof of Theorem 5.6.
For $f \in \mathcal{A}^s(X^p(\mathbb{X}), B)$, we use the localization property (12) with $r = \alpha + S$ and the embedding $L^p \hookrightarrow L^1$ in the compact case to derive, for $y \in B'$,
\[
\int_{\mathbb{X} \setminus B} \bigl| \sigma_n(f,\mu_n,x)\, K_{2^{n+1}}(x,y) \bigr|\, d|\nu_n|(x) \lesssim 2^{-nS} \|\sigma_n(f,\mu_n)\|_{|\nu_n|,L^p} \lesssim 2^{-nS} \|\sigma_n(f,\mu_n)\|_{|\mu_n|,L^p}. \tag{36}
\]
The latter estimate holds because both $(\nu_n)_{n=1}^{\infty}$ and $(\mu_n)_{n=1}^{\infty}$ are families of Marcinkiewicz-Zygmund measures. The quantization (23) immediately yields
\[
\bigl| \sigma_n(f,\mu_n,x) - 2^{-nS} I_n(f,\mu_n,x) \bigr| \le 2^{-nS}, \quad \text{for all } x \in \operatorname{supp}(\nu_n). \tag{37}
\]
By using (37), (22), and (19), we derive
\begin{align*}
\|\sigma_n(f,\mu_n) - \sigma_n^\circ(f,\mu_n,\nu_n,B)\|_{L^p(B')}
&= \Bigl\| \sigma_n(f,\mu_n) - \int_B 2^{-nS} I_n(f,\mu_n,x)\, K_{2^{n+1}}(x,\cdot)\, d\nu_n(x) \Bigr\|_{L^p(B')} \\
&\lesssim \Bigl\| \sigma_n(f,\mu_n) - \int_B \sigma_n(f,\mu_n,x)\, K_{2^{n+1}}(x,\cdot)\, d\nu_n(x) \Bigr\|_{L^p(B')} + 2^{-nS}.
\end{align*}
Next, we make use of (22) and (36) with (18) to obtain
\begin{align*}
\|\sigma_n(f,\mu_n) - \sigma_n^\circ(f,\mu_n,\nu_n,B)\|_{L^p(B')}
&\lesssim \Bigl\| \int_{\mathbb{X} \setminus B} \sigma_n(f,\mu_n,x)\, K_{2^{n+1}}(x,\cdot)\, d\nu_n(x) \Bigr\|_{L^p(B')} + 2^{-nS} \\
&\lesssim 2^{-nS} \|f\|_{|\mu_n|,L^p} + 2^{-nS} \lesssim 2^{-nS}.
\end{align*}
Recall that $\mu_n = \mu$ for $1 \le p < \infty$, so that $\|f\|_{|\mu_n|,L^p} \lesssim \|f\|_{L^p(\mathbb{X})}$ holds for the entire range $1 \le p \le \infty$. Note that $\|f\|_{L^p(\mathbb{X})} \le 1$ for $f \in \mathcal{A}^s(X^p(\mathbb{X}), B)$. The triangle inequality with (29) and the above estimate yield
\begin{align*}
\|f - \sigma_n^\circ(f,\mu_n,\nu_n,B)\|_{L^p(B')}
&\lesssim \|f - \sigma_n(f,\mu_n)\|_{L^p(B')} + \|\sigma_n(f,\mu_n) - \sigma_n^\circ(f,\mu_n,\nu_n,B)\|_{L^p(B')} \\
&\lesssim 2^{-ns} + 2^{-nS} \lesssim 2^{-ns},
\end{align*}
where the last step uses $S > s$. This verifies (34). For the remaining part, we can follow the lines of the proof of Theorem 4.1.

Note that Theorem 5.6 still requires global knowledge of $f$ because we need to build $\sigma_n(f,\mu_n)$. For the sake of completeness, we use a cut-off function to feed in local information only:

Theorem 5.7.
Under the same assumptions as in Theorem 4.1, let $\varphi \in C^\infty(\mathbb{X})$ be one on $B'$ and zero outside of $B$. Then there is a constant $c > 0$ such that, for $f \in \mathcal{A}^s(X^p(\mathbb{X}), \varphi)$,
\[
\|f - \sigma_n^\circ(f\varphi,\mu_n,\nu_n)\|_{L^p(B')} \le c\, 2^{-ns} \tag{38}
\]
holds. For $c\, 2^{-ns} = \varepsilon \le 1$, the number of bits needed to represent all integers $\{ I_n(f\varphi,\mu_n,x) : x \in \operatorname{supp}(\nu_n) \}$ does not exceed a positive constant (independent of $\varepsilon$) times
\[
(1/\varepsilon)^{\alpha/s} \bigl(1 + \log(1/\varepsilon)\bigr). \tag{39}
\]

Proof.
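Since the cut-off function satisfies $|\varphi(x)| \le 1$ for all $x \in \mathbb{X}$, every $f \in \mathcal{A}^s(X^p(\mathbb{X}), \varphi)$ satisfies $f\varphi \in \mathcal{A}^s(X^p(\mathbb{X}))$. Because $\varphi$ is one on $B'$, we have $f = f\varphi$ on $B'$. Thus, Theorem 4.1 applied to $f\varphi$ implies (38) and (39).

Appendix A. Summary of the technical assumptions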
The asymptotic bounds on the metric entropy in Theorem 2.6 hold for any diffusion measure space $\mathbb{X}$ as introduced in Definition 2.1, provided that the restrictions $\{\phi_k|_B\}_{k=0}^{\infty}$ are linearly independent on the ball $B \subset \mathbb{X}$. Our computational scheme, which matches this bound up to a logarithmic factor, needs a few additional technical assumptions that are spread throughout the present paper. Here, we list all the required assumptions for the sake of completeness:

(I) $\mathbb{X}$ is a diffusion measure space (Definition 2.1),
(II) $\epsilon_N$ defined in (16) satisfies $N^m \epsilon_N \to 0$ as $N \to \infty$, for all $m > 0$,
(III) there is a family $(\nu_N)_{N=1}^{\infty}$ of Marcinkiewicz-Zygmund quadrature measures of order $2N$, respectively, satisfying $\#\operatorname{supp}(\nu_N) \lesssim N^{\alpha}$ (Definitions 3.2, 3.3 and (22)),
(IV) the smooth cut-off property holds (Definition 5.1).

Condition (I) is the general framework, and the additional conditions are further technical details. Note that (IV) is only needed in Section 5, and it is well known that it holds for smooth manifolds. All of the above conditions hold for compact homogeneous manifolds, e.g., the sphere and the Grassmann manifold, if the function system $\{\phi_k\}_{k=0}^{\infty}$ consists of eigenfunctions of the Laplace operator, cf. [15]. Moreover, it was pointed out in [3] that the conditions are also satisfied for smooth compact Riemannian manifolds with nonnegative Ricci curvature, see also Remark 2.2. Families of Marcinkiewicz-Zygmund quadrature measures $(\mu_N)_{N=1}^{\infty}$ were then constructed in [13], such that $\#\operatorname{supp}(\mu_N) \asymp N^{\alpha}$.

Acknowledgment

M.E. has been funded by the Vienna Science and Technology Fund (WWTF) through project VRG12-009. The authors thank H. N. Mhaskar for many fruitful discussions.
References

[1] B. Carl and I. Stephani. Entropy, Compactness and the Approximation of Operators. Cambridge University Press, 1990.
[2] C. K. Chui, F. Filbir, and H. N. Mhaskar. Representation of functions on big data: Graphs and trees. Appl. Comput. Harmon. Anal., 38:489–509, 2015.
[3] C. K. Chui and H. N. Mhaskar. Smooth function extension based on high dimensional unstructured data. Mathematics of Computation, 83(290):2865–2891, 2014.
[4] R. R. Coifman, S. Lafon, A. B. Lee, M. Maggioni, B. Nadler, F. J. Warner, and S. W. Zucker. Geometric diffusions as a tool for harmonic analysis and structure definition of data. Part I: Diffusion maps. Proc. Nat. Acad. Sci., 102:7426–7431, 2005.
[5] P. de la Harpe and C. Pache. Cubature formulas, geometrical designs, reproducing kernels, and Markov operators. In Infinite groups: geometric, combinatorial and dynamical aspects, volume 248, pages 219–267, Basel, 2005. Birkhäuser.
[6] R. A. DeVore, R. Howard, and C. Micchelli. Optimal nonlinear approximation. Manuscripta Mathematica, 63(4):469–478, 1989.
[7] R. A. DeVore and G. G. Lorentz. Constructive Approximation. Springer-Verlag, 1993.
[8] D. Dung. Non-linear approximations using sets of finite cardinality or finite pseudo-dimension. J. Complexity, 17(2):467–492, 2001.
[9] D. Dung and T. Ullrich. $n$-widths and $\varepsilon$-dimensions for high-dimensional approximations. Found. Comput. Math., 13:965–1003, 2013.
[10] D. E. Edmunds and H. Triebel. Function Spaces, Entropy Numbers, Differential Operators. Cambridge University Press, 1996.
[11] M. Ehler and F. Filbir. $\varepsilon$-coverings of Hölder-Zygmund type spaces on data-defined manifolds. Abstract and Applied Analysis, 2014.
[12] F. Filbir and H. N. Mhaskar. A quadrature formula for diffusion polynomials corresponding to a generalized heat kernel. J. Fourier Anal. Appl., 16(5):629–657, 2010.
[13] F. Filbir and H. N. Mhaskar. Marcinkiewicz–Zygmund measures on manifolds. J. Complexity, 27(6):568–596, 2011.
[14] S. Foucart, A. Pajor, H. Rauhut, and T. Ullrich. The Gelfand widths of $\ell_p$-balls for $0 < p \le 1$. J. Complexity, 26:629–640, 2010.
[15] D. Geller and I. Z. Pesenson. Band-limited localized Parseval frames and Besov spaces on compact homogeneous manifolds. J. Geom. Anal., 21(2):334–371, 2011.
[16] D. Geller and I. Z. Pesenson. $n$-widths and approximation theory on compact Riemannian manifolds. In A. Mayeli, P. E. T. Jorgensen, and G. Ólafsson, editors, Commutative and Noncommutative Harmonic Analysis and Applications, volume 603. Contemporary Mathematics, 2013.
[17] D. Geller and I. Z. Pesenson. Kolmogorov and linear widths of balls in Sobolev spaces on compact manifolds. Math. Scand., 115(1):96–122, 2014.
[18] A. Grigor'yan. Estimates of heat kernels on Riemannian manifolds. In B. Davies and Yu. Safarov, editors, Spectral Theory and Geometry. ICMS Instructional Conference, Edinburgh, 1998, volume 273 of London Math. Soc. Lecture Notes, pages 140–225. Cambridge Univ. Press, 1999.
[19] D. D. Haroske and H. Triebel. Distributions, Sobolev Spaces, Elliptic Equations. EMS Publishing House, 2007.
[20] A. N. Kolmogorov. Über die beste Annäherung von Funktionen einer gegebenen Funktionenklasse. Ann. Math., 37:107–111, 1936.
[21] A. N. Kolmogorov and V. M. Tihomirov. $\varepsilon$-entropy and $\varepsilon$-capacity of sets in function spaces. Uspehi Mat. Nauk, vol. 14 (1959), 3–86. English transl. in Amer. Math. Soc. Transl. 17, 277–364. MR0112032 (22:2890).
[22] S. G. Krantz and H. R. Parks. A Primer of Real Analytic Functions. Birkhäuser, 1992.
[23] G. G. Lorentz. Metric entropy and approximation. Bull. Amer. Math. Soc., 72(6):903–937, 1966.
[24] G. G. Lorentz, M. v. Golitschek, and Y. Makovoz. Constructive Approximation, Advanced Problems. Springer Verlag, New York, 1996.
[25] M. Maggioni and H. N. Mhaskar. Diffusion polynomial frames on metric measure spaces. Appl. Comput. Harmon. Anal., 24(3):329–353, 2008.
[26] H. N. Mhaskar. On the representation of smooth functions on the sphere using finitely many bits. Appl. Comput. Harmon. Anal., 18(3):215–233, 2005.
[27] H. N. Mhaskar. Eignets for function approximation on manifolds. Appl. Comput. Harmon. Anal., 29:63–87, 2010.
[28] H. N. Mhaskar. A generalized diffusion frame for parsimonious representation of functions on data defined manifolds. Neural Networks, 24:345–359, 2011.
[29] C. A. Micchelli and T. J. Rivlin. Optimal Estimation in Approximation Theory, chapter A survey of optimal recovery, pages 1–54. Plenum Press, 1977.
[30] B. Nadler and M. Galun. Fundamental limitations of spectral clustering. Neural Information Processing Systems, 19:1017–1024, 2007.
[31] E. Novak. Optimal recovery and $n$-widths for convex classes of functions. J. Approx. Theory, 80(3):390–408, 1995.
[32] A. Pinkus. $n$-Widths in Approximation Theory. Springer-Verlag, New York, 1985.
[33] A. Pinkus. $n$-widths of Sobolev spaces in $L^p$. Constr. Approx., 1:15–62, 1985.
[34] T. J. Rivlin. The Chebyshev Polynomials. John Wiley and Sons, 1974.
[35] S. T. Roweis and L. K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323–2326, 2000.
[36] B. Schölkopf and A. Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002.
[37] V. N. Temlyakov. Estimates for the asymptotic characteristics of classes of functions with bounded mixed derivative or difference. Proc. Steklov Inst. Math., 189:161–197, 1990.
[38] J. B. Tenenbaum, V. de Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319–2323, 2000.
[39] J. F. Traub and H. Wozniakowski. A General Theory of Optimal Algorithms. Academic Press, New York, 1980.
[40] A. G. Vitushkin.