On Non-Inclusion of Certain Functions in Reproducing Kernel Hilbert Spaces

Abstract

We use a classical characterisation to prove that functions which are bounded away from zero cannot be elements of reproducing kernel Hilbert spaces whose reproducing kernels decay to zero in a suitable way. The result is used to study Hilbert spaces on subsets of the real line induced by analytic translation-invariant kernels which decay to zero at infinity.


arXiv [math.FA]

Toni Karvonen

The Alan Turing Institute, United Kingdom

February 23, 2021


The inclusion or non-inclusion of certain functions, often constants or polynomials, in reproducing kernel Hilbert spaces (RKHSs) has numerous implications in the theory of statistical and machine learning algorithms. See Steinwart and Christmann (2008, p. 142); Lee et al. (2016, Assumption 2); and Karvonen et al. (2019, Proposition 6) for a few specific examples. Non-inclusion of polynomials in an RKHS also explains the phenomena observed in Xu and Stein (2017). Furthermore, error estimates for kernel-based approximation methods typically require that the target function be an element of the RKHS (Wendland, 2005, Chapter 11).

The RKHSs of a number of finitely smooth kernels, such as Matérn and Wendland kernels, are well understood, being norm-equivalent to Sobolev spaces (e.g., Wendland, 2005, Corollary 10.13). With the exception of power series kernels (Zwicknagl and Schaback, 2013), less is known about infinitely smooth kernels. Since the work of Steinwart et al. (2006) and Minh (2010), which is based on explicit computations involving an orthonormal basis of the RKHS, it has been known that the RKHS of the Gaussian kernel does not contain non-trivial polynomials. Recently, Dette and Zhigljavsky (2021) have proved that RKHSs of analytic translation-invariant kernels do not contain polynomials via a connection to the classical Hamburger moment problem. (Dette and Zhigljavsky do not state explicitly that their results apply to all analytic translation-invariant kernels, but this can be seen by inserting the standard bound $|f^{(n)}(x)| \leq C R^n n!$ for analytic functions in their Equation (1.6) and using Stirling's approximation.)

In this note we use a classical RKHS characterisation to furnish a simple proof for the fact that, roughly speaking, functions which are bounded away from zero (e.g., constant functions) cannot be elements of an RKHS whose kernel decays to zero in a certain manner. An analyticity assumption is used to effectively localise this result for domains $\Omega \subset \mathbb{R}$ which contain an accumulation point. We then consider analytic translation-invariant kernels which decay to zero. Although quite simple, it seems that these results have not appeared in the literature. Analyticity of functions in an RKHS has been previously studied by Saitoh (1997, pp. 41–43) and Sun and Zhou (2008). General results concerning existence of RKHSs containing given classes of functions can be found in Aronszajn (1950, Section I.13).

Results

Let $\Omega$ be a set. Recall that a function $K \colon \Omega \times \Omega \to \mathbb{R}$ is a positive-semidefinite kernel if

\[ \sum_{n=1}^{N} \sum_{m=1}^{N} a_n a_m K(x_n, x_m) \geq 0 \]

for any $N \geq 1$, $a_1, \ldots, a_N \in \mathbb{R}$, and $x_1, \ldots, x_N \in \Omega$. By the Moore–Aronszajn theorem a positive-semidefinite kernel induces a unique reproducing kernel Hilbert space, $H_K(\Omega)$, which consists of functions $f \colon \Omega \to \mathbb{R}$. The inner product and norm of this space are denoted $\langle \cdot, \cdot \rangle_K$ and $\|\cdot\|_K$. The kernel is reproducing in $H_K(\Omega)$, which is to say that $f(x) = \langle f, K(\cdot, x) \rangle_K$ for every $f \in H_K(\Omega)$ and $x \in \Omega$. The following theorem characterises the elements of an RKHS; see, for example, Section 3.4 in Paulsen and Raghupathi (2016) for a proof.

Theorem 2.1 (Aronszajn).

Let $K$ be a positive-semidefinite kernel on $\Omega$. A function $f \colon \Omega \to \mathbb{R}$ is contained in $H_K(\Omega)$ if and only if

\[ R(x, y) = K(x, y) - c^2 f(x) f(y) \]

defines a positive-semidefinite kernel on $\Omega$ for some $c > 0$.

If $\Theta$ is a subset of $\Omega$, the RKHS $H_K(\Theta)$ contains those functions $f \colon \Theta \to \mathbb{R}$ for which there exists an extension $f_e \in H_K(\Omega)$ (i.e., $f = f_e|_\Theta$). We begin with a result for general bounded kernels.
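As a finite-sample illustration of the definition and of Theorem 2.1, the following sketch checks the smallest eigenvalue of Gram matrices on a grid. It is not part of the original note: the Gaussian kernel, the grid, the value $c = 0.9$ and all function names are illustrative assumptions. A check on finitely many points can only refute positive-semidefiniteness of $R$; it cannot establish membership in the RKHS.

```python
import numpy as np


def gaussian_kernel(x, y):
    """Gaussian kernel K(x, y) = exp(-(x - y)^2) on all pairs (illustrative choice)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.exp(-(x[:, None] - y[None, :]) ** 2)


def min_eigenvalue(matrix):
    """Smallest eigenvalue of a symmetric matrix."""
    return float(np.linalg.eigvalsh(matrix).min())


# Hypothetical evaluation grid and constant c; neither comes from the note.
points = np.linspace(-3.0, 3.0, 25)
gram = gaussian_kernel(points, points)
c = 0.9

# f = K(., 0) lies in the RKHS with norm sqrt(K(0, 0)) = 1, so
# R = K - c^2 f f is positive-semidefinite for every c <= 1.
f_vals = gaussian_kernel(points, [0.0])[:, 0]
residual_member = gram - c**2 * np.outer(f_vals, f_vals)

# For the constant function f = 1 the same residual is already indefinite
# on this grid for this particular c.
ones = np.ones_like(points)
residual_constant = gram - c**2 * np.outer(ones, ones)

print("min eig, Gram matrix:       ", min_eigenvalue(gram))
print("min eig, R for f = K(., 0): ", min_eigenvalue(residual_member))
print("min eig, R for f = 1:       ", min_eigenvalue(residual_constant))
```

The first two eigenvalues should be non-negative up to rounding error, while the third is clearly negative. Excluding the constant function for every $c > 0$ requires points spreading to infinity, which is what the sequence condition in the next theorem provides.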

Theorem 2.2.

Let $K$ be a bounded positive-semidefinite kernel on $\Omega$ and $(x_n)_{n=1}^{\infty}$ a sequence in $\Omega$ such that

\[ \lim_{\ell \to \infty} |K(x_{\ell+n}, x_{\ell+m})| = 0 \quad \text{for any } n \neq m. \tag{2.1} \]

If $f \colon \Omega \to \mathbb{R}$ satisfies either $f(x_n) \geq \alpha$ or $f(x_n) \leq -\alpha$ for some $\alpha > 0$ and all sufficiently large $n$, then $f \notin H_K(\Omega)$.

Proof. Assume to the contrary that $f \in H_K(\Omega)$. By Theorem 2.1 there exists $c > 0$ such that $R(x, y) = K(x, y) - c^2 f(x) f(y)$ defines a positive-semidefinite kernel on $\Omega$. Therefore the quadratic form

\[ r_{N,\ell} = \sum_{n=1}^{N} \sum_{m=1}^{N} a_n a_m R(x_{\ell+n}, x_{\ell+m}) = \sum_{n=1}^{N} \sum_{m=1}^{N} a_n a_m \bigl( K(x_{\ell+n}, x_{\ell+m}) - c^2 f(x_{\ell+n}) f(x_{\ell+m}) \bigr) \]

is non-negative for every $N \geq 1$ and $\ell \geq 0$ and any $a_1, \ldots, a_N \in \mathbb{R}$. By (2.1) it holds for all sufficiently large $\ell$ that

\[ \max_{\substack{n, m \leq N \\ n \neq m}} |K(x_{\ell+n}, x_{\ell+m})| \leq \tfrac{1}{2} c^2 \alpha^2. \]

Let $C_K = \sup_{x \in \Omega} K(x, x)$ and set $a_1 = \cdots = a_N = 1$. Since $f(x_{\ell+n}) f(x_{\ell+m}) \geq \alpha^2$ for sufficiently large $\ell$, we then have

\[ r_{N,\ell} = \sum_{n=1}^{N} K(x_{\ell+n}, x_{\ell+n}) + \sum_{n \neq m} K(x_{\ell+n}, x_{\ell+m}) - c^2 \sum_{n=1}^{N} \sum_{m=1}^{N} f(x_{\ell+n}) f(x_{\ell+m}) \leq C_K N + \tfrac{1}{2} c^2 \alpha^2 N^2 - c^2 \alpha^2 N^2 = \bigl( C_K - \tfrac{1}{2} c^2 \alpha^2 N \bigr) N, \]

which is negative when $N > 2 C_K / (c^2 \alpha^2)$. It follows that $r_{N,\ell}$ is negative for sufficiently large $N$ and $\ell$, which contradicts the assumption that $f \in H_K(\Omega)$.

An alternative way to prove a similar result in some settings is by appealing to integrability. For example, elements of the RKHS of an integrable translation-invariant kernel on $\mathbb{R}^d$ are square-integrable (Wendland, 2005, Theorem 10.12). Other integrability results can be found in Sun (2005) and Carmeli et al. (2006).

Next we use the fact that RKHSs which consist of analytic functions do not depend on the domain to prove localised versions of the above result for certain subsets of $\mathbb{R}$. The classical results on real analytic functions that we use are collected in Section 1.2 of Krantz and Parks (2002).

Lemma 2.3.

Let $K$ be a positive-semidefinite kernel on $\mathbb{R}$ and $\Omega$ a subset of $\mathbb{R}$ which has an accumulation point. If $H_K(\mathbb{R})$ consists of analytic functions and $f \colon \mathbb{R} \to \mathbb{R}$ is analytic, then $f \in H_K(\mathbb{R})$ if and only if $f|_\Omega \in H_K(\Omega)$.

Proof. If $f \in H_K(\mathbb{R})$, then $f|_\Omega \in H_K(\Omega)$ by definition. Suppose then that $f|_\Omega \in H_K(\Omega)$. Hence there is an analytic function $g \in H_K(\mathbb{R})$ such that $g|_\Omega = f|_\Omega$. The function $f - g$ is analytic and vanishes on $\Omega$. Because an analytic function which vanishes on a set with an accumulation point is identically zero, we conclude that $g = f$ and therefore $f \in H_K(\mathbb{R})$.

Theorem 2.4.

Let $K$ be a bounded positive-semidefinite kernel on $\mathbb{R}$ such that $H_K(\mathbb{R})$ consists of analytic functions, $\Omega$ a subset of $\mathbb{R}$ which has an accumulation point, and $(x_n)_{n=1}^{\infty}$ a sequence in $\Omega$ such that

\[ \lim_{\ell \to \infty} |K(x_{\ell+n}, x_{\ell+m})| = 0 \quad \text{for any } n \neq m. \]

Then a function $f \colon \Omega \to \mathbb{R}$ is not an element of $H_K(\Omega)$ if there exist an analytic function $f_e \colon \mathbb{R} \to \mathbb{R}$ and $\alpha > 0$ such that $f_e|_\Omega = f$ and either $f_e(x_n) \geq \alpha$ or $f_e(x_n) \leq -\alpha$ for all sufficiently large $n$.

Proof. By Lemma 2.3, $f \in H_K(\Omega)$ if and only if $f_e \in H_K(\mathbb{R})$. But by Theorem 2.2 $f_e$ cannot be an element of $H_K(\mathbb{R})$. This proves the claim.

Note that the requirement that $H_K(\mathbb{R})$ consist of analytic functions cannot be simply removed. For example, by Proposition 2.5 the RKHS of the non-analytic kernel $K(x, y) = \exp(-|x - y|)$ on $\mathbb{R}$ does not contain non-trivial polynomials. However, if $\Omega$ is a bounded interval, then $H_K(\Omega)$ is norm-equivalent to the first-order standard Sobolev space and therefore contains all polynomials.

A kernel $K$ on $\mathbb{R}$ is translation-invariant if there is a function $\varphi \colon [0, \infty) \to \mathbb{R}$ such that $K(x, y) = \varphi((x - y)^2)$ for all $x, y \in \mathbb{R}$. For translation-invariant kernels the decay assumption (2.1) can be cast into a less abstract form.

Proposition 2.5.

Let $K$ be a translation-invariant positive-semidefinite kernel on $\mathbb{R}$ for $\varphi \geq 0$ such that $\lim_{r \to \infty} \varphi(r) = 0$. Then a function $f \colon \mathbb{R} \to \mathbb{R}$ is not an element of $H_K(\mathbb{R})$ if there is $R \in \mathbb{R}$ such that (a) $f$ does not change sign on $[R, \infty)$ and $\liminf_{x \to \infty} |f(x)| > 0$ or (b) $f$ does not change sign on $(-\infty, R]$ and $\liminf_{x \to -\infty} |f(x)| > 0$.

Proof. Translation-invariant kernels are bounded because $K(x, x) = \varphi(0)$ for every $x \in \mathbb{R}$. The claim follows from Theorem 2.2 by selecting a sequence $(x_n)_{n=1}^{\infty}$ such that $|x_{\ell+n} - x_{\ell+m}| \to \infty$ as $\ell \to \infty$ for any $n \neq m$ and $x_n \to \infty$ (or $x_n \to -\infty$). For example, $x_n = 1 + \cdots + n$ (or $x_n = -(1 + \cdots + n)$) suffices since then

\[ |x_{\ell+n} - x_{\ell+m}| = \frac{|n - m| (2\ell + n + m + 1)}{2} \geq \ell. \]

Note that this proposition could be slightly generalised by requiring only that $f(x_n)$ be bounded away from zero for large $n$. For example, the function $f(x) = \sin(\pi(2x + \tfrac{1}{2}))$, which is not covered by Proposition 2.5, satisfies $f(x_n) = 1$ for all $n$ if $x_n = \pm(1 + \cdots + n)$.

Let $\varphi_+^{(n)}(0)$ denote the $n$th derivative from the right of $\varphi$ at the origin and define

\[ D^n K_x(y) = \frac{\partial^n}{\partial v^n} K(v, y) \Big|_{v = x} \quad \text{and} \quad D^{n,n} K(x, y) = \frac{\partial^{2n}}{\partial v^n \partial w^n} K(v, w) \Big|_{v = x,\, w = y}. \]

The following lemma has been essentially proved by Sun and Zhou (2008). For completeness we supply a simple proof.

Lemma 2.6.

If $K$ is a translation-invariant positive-semidefinite kernel on $\mathbb{R}$ for $\varphi$ which is analytic on $\mathbb{R}$, then all elements of $H_K(\mathbb{R})$ are analytic.

Proof. Because $K$ is infinitely differentiable on $\mathbb{R} \times \mathbb{R}$, every $f \in H_K(\mathbb{R})$ is infinitely differentiable and satisfies

\[ |f^{(n)}(x)| = |\langle f, D^n K_x \rangle_K| \leq \|f\|_K \|D^n K_x\|_K = \|f\|_K \sqrt{D^{n,n} K(x, x)} \]

for every $n \geq 0$ and $x \in \mathbb{R}$ (Steinwart and Christmann, 2008, Corollary 4.36). From the Taylor expansion

\[ K(x, y) = \sum_{n=0}^{\infty} \frac{\varphi_+^{(n)}(0)}{n!} (x - y)^{2n} \]

it is straightforward to compute that, for any $x \in \mathbb{R}$,

\[ D^{n,n} K(x, x) = (-1)^n \frac{(2n)!}{n!} \varphi_+^{(n)}(0). \]

Since $\varphi$ is analytic, there are positive constants $C$ and $R$ such that $|\varphi_+^{(n)}(0)| \leq C R^n n!$ for every $n \geq 0$. Using $(2n)! \leq 4^n (n!)^2$, it follows that

\[ |f^{(n)}(x)| \leq \|f\|_K \sqrt{\frac{(2n)!}{n!} |\varphi_+^{(n)}(0)|} \leq \|f\|_K \sqrt{C R^n (2n)!} \leq \sqrt{C} \, \|f\|_K (2\sqrt{R})^n n!, \]

which implies that $f$ is analytic on $\mathbb{R}$.

Theorem 2.7.

Let $K$ be a translation-invariant positive-semidefinite kernel on $\mathbb{R}$ for $\varphi \geq 0$ which is analytic on $[0, \infty)$ and satisfies $\lim_{r \to \infty} \varphi(r) = 0$, and let $\Omega$ be a subset of $\mathbb{R}$ which has an accumulation point. Then a function $f \colon \Omega \to \mathbb{R}$ is not an element of $H_K(\Omega)$ if there exists an analytic function $f_e \colon \mathbb{R} \to \mathbb{R}$ such that $f_e|_\Omega = f$ and

\[ \liminf_{x \to -\infty} |f_e(x)| > 0 \quad \text{or} \quad \liminf_{x \to \infty} |f_e(x)| > 0. \tag{2.2} \]

Proof. The claim follows from Lemmas 2.3 and 2.6 and Proposition 2.5. The requirement in Proposition 2.5 that the function should not change sign follows from continuity and (2.2).
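The mechanism behind Theorem 2.2 and Proposition 2.5 can be checked numerically. The sketch below is not part of the original note: it assumes the Gaussian kernel (so $C_K = 1$), the constant function $f = 1$ (so $\alpha = 1$), the value $c = 0.1$, and the sequence $x_n = 1 + \cdots + n$ from the proof of Proposition 2.5; all function names are hypothetical. It evaluates $r_{N,\ell}$ with $a_1 = \cdots = a_N = 1$ on both sides of the threshold $N > 2 C_K / (c^2 \alpha^2) = 200$.

```python
import numpy as np


def gaussian_kernel(x, y):
    """Gaussian kernel K(x, y) = exp(-(x - y)^2) on all pairs (illustrative choice)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.exp(-(x[:, None] - y[None, :]) ** 2)


def triangular(n):
    """The sequence x_n = 1 + ... + n = n (n + 1) / 2."""
    n = np.asarray(n, dtype=float)
    return n * (n + 1) / 2


def quadratic_form(f, c, N, ell):
    """r_{N, ell} for R = K - c^2 f f with a_1 = ... = a_N = 1."""
    x = triangular(ell + np.arange(1, N + 1))
    fx = f(x)
    # The double sum of f(x_n) f(x_m) over n, m equals (sum of f(x_n))^2.
    return float(gaussian_kernel(x, x).sum() - c**2 * fx.sum() ** 2)


def constant_one(x):
    """The constant function f = 1, bounded away from zero with alpha = 1."""
    return np.ones_like(x)


c, ell = 0.1, 3  # hypothetical choices; the gaps x_{k+1} - x_k are at least 5 here
r_small = quadratic_form(constant_one, c, N=50, ell=ell)   # below the threshold
r_large = quadratic_form(constant_one, c, N=250, ell=ell)  # above the threshold
upper_bound = (1.0 - 0.5 * c**2 * 250) * 250               # (C_K - c^2 alpha^2 N / 2) N

# Gap identity used in the proof of Proposition 2.5, for one choice of indices.
n, m = 7, 2
gap = abs(float(triangular(ell + n) - triangular(ell + m)))
identity = abs(n - m) * (2 * ell + n + m + 1) / 2

print(r_small, r_large, upper_bound, gap, identity)
```

For $N = 50$ the form is still positive, so no contradiction arises for this $c$; for $N = 250$ it is negative and lies below the bound derived in the proof of Theorem 2.2. Since the proof applies to every $c > 0$, no choice of $c$ makes $R$ positive-semidefinite.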

Examples

Standard examples of analytic translation-invariant kernels are the Gaussian kernel

\[ K(x, y) = \varphi\bigl((x - y)^2\bigr) \quad \text{for} \quad \varphi(r) = \exp(-r) \]

and the inverse quadratic

\[ K(x, y) = \varphi\bigl((x - y)^2\bigr) \quad \text{for} \quad \varphi(r) = \frac{1}{1 + r}. \]

It is known that the RKHSs of these kernels do not contain non-trivial polynomials on bounded intervals (Minh, 2010; Dette and Zhigljavsky, 2021). These results are special cases of Theorem 2.7, which can be applied to any analytic function whose analytic continuation is bounded away from zero at infinity. For example, the function

\[ f(x) = \exp\Bigl( -\sin(x) + \frac{1}{\sqrt{x}} \Bigr) \]

is in the RKHS of no translation-invariant kernel for which $\varphi \geq 0$ decays to zero at infinity.

The exponential kernel

\[ K(x, y) = \exp(xy) \]

serves as a good example that $\lim_{x \to \infty} K(x, y) = 0$ for infinitely many $y$ is not sufficient for the conclusion of Theorem 2.2 to hold. The RKHS on $\mathbb{R}$ of the exponential kernel consists of analytic functions and contains all polynomials. For any $y < 0$ it holds that $\lim_{x \to \infty} K(x, y) = 0$. However, it is not possible to select a sequence $(x_n)_{n=1}^{\infty}$ for which $K$ satisfies (2.1). For clearly $x_{\ell+n}$ and $x_{\ell+m}$ would have to have opposite signs for all sufficiently large $\ell$ if $n \neq m$. But this would in particular imply that $\operatorname{sgn}(x_{\ell+1}) \neq \operatorname{sgn}(x_{\ell+2})$, $\operatorname{sgn}(x_{\ell+1}) \neq \operatorname{sgn}(x_{\ell+3})$, and $\operatorname{sgn}(x_{\ell+2}) \neq \operatorname{sgn}(x_{\ell+3})$ for sufficiently large $\ell$, which is not possible.

Acknowledgements

The author was supported by the Lloyd's Register Foundation Programme for Data-Centric Engineering at the Alan Turing Institute, United Kingdom. Correspondence with Anatoly Zhigljavsky served as an inspiration for this note. Motonobu Kanagawa and Chris Oates provided helpful comments.

References

Aronszajn, N. (1950). Theory of reproducing kernels. Transactions of the American Mathematical Society, 68(3):337–404.

Carmeli, C., De Vito, E., and Toigo, A. (2006). Vector valued reproducing kernel Hilbert spaces of integrable functions and Mercer theorem. Analysis and Applications, 4(4):377–408.

Dette, H. and Zhigljavsky, A. (2021). Reproducing kernel Hilbert spaces, polynomials and the classical moment problems. arXiv:2101.11968v2.

Karvonen, T., Kanagawa, M., and Särkkä, S. (2019). On the positivity and magnitudes of Bayesian quadrature weights. Statistics and Computing, 29(6):1317–1333.

Krantz, S. G. and Parks, H. R. (2002). A Primer of Real Analytic Functions. Birkhäuser, 2nd edition.

Lee, K.-Y., Li, B., and Zhao, H. (2016). Variable selection via additive conditional independence. Journal of the Royal Statistical Society. Series B (Statistical Methodology), 78(5):1037–1055.

Minh, H. Q. (2010). Some properties of Gaussian reproducing kernel Hilbert spaces and their implications for function approximation and learning theory. Constructive Approximation, 32(2):307–338.

Paulsen, V. I. and Raghupathi, M. (2016). An Introduction to the Theory of Reproducing Kernel Hilbert Spaces. Number 152 in Cambridge Studies in Advanced Mathematics. Cambridge University Press.

Saitoh, S. (1997). Integral Transforms, Reproducing Kernels and Their Applications. Chapman and Hall.

Steinwart, I. and Christmann, A. (2008). Support Vector Machines. Information Science and Statistics. Springer.

Steinwart, I., Hush, D., and Scovel, C. (2006). An explicit description of the reproducing kernel Hilbert spaces of Gaussian RBF kernels. IEEE Transactions on Information Theory, 52(10):4635–4643.

Sun, H. (2005). Mercer theorem for RKHS on noncompact sets. Journal of Complexity, 21(3):337–349.

Sun, H.-W. and Zhou, D.-X. (2008). Reproducing kernel Hilbert spaces associated with analytic translation-invariant Mercer kernels. Journal of Fourier Analysis and Applications, 14(1):89–101.

Wendland, H. (2005). Scattered Data Approximation. Number 17 in Cambridge Monographs on Applied and Computational Mathematics. Cambridge University Press.

Xu, W. and Stein, M. L. (2017). Maximum likelihood estimation for a smooth Gaussian random field model. SIAM/ASA Journal on Uncertainty Quantification, 5(1):138–175.

Zwicknagl, B. and Schaback, R. (2013). Interpolation and approximation in Taylor spaces.