Generalization of the energy distance by Bernstein functions

Abstract

We reprove the well-known fact that the energy distance defines a metric on the space of Borel probability measures with finite first moment on a Hilbert space, using a new approach based on the behavior of the Gaussian kernel on Hilbert spaces and a Maximum Mean Discrepancy analysis. From this new point of view we are able to generalize the energy distance metric to a family of kernels related to Bernstein functions and conditionally negative definite kernels. We also explain what happens to the energy distance for the kernel \|x-y\|^{\alpha} for every \alpha > 2, and we generalize this idea to a family of kernels related to derivatives of completely monotone functions and conditionally negative definite kernels.


arXiv [math.FA], Feb

GENERALIZATION OF THE ENERGY DISTANCE BY BERNSTEIN FUNCTIONS

J. C. GUELLA

CONTENTS

1. Introduction
2. Definitions
3. Conditionally positive definite kernels
4. Inner products defined by CND kernels and derivatives of completely monotone functions
5. Space of functions defined by derivatives of completely monotone functions
6. Proofs
   6.1. Section 3
   6.2. Section 4
   6.3. Section 5

1. INTRODUCTION

A popular method for comparing two probability measures is to embed the space of probabilities (or a subset of it) into a Hilbert space and use the metric provided by the embedding. Currently, there are two main approaches for this task:

(I) The maximum mean discrepancy based on a bounded, continuous, positive definite kernel $K: X \times X \to \mathbb{R}$ that is characteristic [7], [4]. The distance between two Radon regular probabilities $P$ and $Q$ is defined by
$$\mathrm{MMD}(P, Q) := \sqrt{\int_X \int_X K(x, y)\, d[P - Q](x)\, d[P - Q](y)}.$$

(II) The use of a continuous conditionally negative definite kernel $\gamma: X \times X \to \mathbb{R}$ with $\gamma(x, x) = 0$ for every $x \in X$ [19]. The kernel $\gamma$ must additionally satisfy the property that the equality
$$\int_X \int_X -\gamma(x, y)\, d[P - Q](x)\, d[P - Q](y) = 0 \qquad (1.1)$$
holds, for probabilities $P$ and $Q$ that integrate the function $x \mapsto \gamma(x, z)$ for every $z \in X$, only when $P = Q$.

It can be proved that the above double integral is always a nonnegative number, and when this property holds,
$$D_\gamma(P, Q) := \sqrt{\int_X \int_X -\gamma(x, y)\, d[P - Q](x)\, d[P - Q](y)}$$
is a metric on the mentioned subspace of probabilities on $X$.

In this paper, we focus on the second method. The most popular example of this method is the energy distance, initially defined for $X = \mathbb{R}^m$ and $\gamma(x, y) = \|x - y\|^\theta$, where $0 < \theta < 2$, on probabilities that integrate $\|x\|^\theta$ [24], [23]. When $\theta = 2$, the kernel is conditionally negative definite but does not satisfy the additional property of Equation 1.1.

A more geometric approach is when $\gamma$ is a metric on $X$ that satisfies Equation 1.1 (the topology being the one induced by the metric); then $(X, \gamma)$ is a metric space of strong negative type. Examples of such spaces include:
• Hilbert spaces: proved in [12] as a generalization of the energy distance.
• Hyperbolic spaces (finite dimensional): proved in [13].

In some cases, the conditionally negative definite kernel $\gamma$ defines a metric on the set $X$, but $\gamma$ is not of strong negative type. A metric space where we only know that the distance is a conditionally negative definite kernel is called a metric space of negative type. An example of such a space is the real sphere, proved in [5], where it is also proved that the real, complex and quaternionic projective spaces and the Cayley projective plane are not metric spaces of negative type.

In [12], it is also proved that if $(X, \gamma)$ is a metric space of negative type, then $\gamma^\theta$, $0 < \theta < 1$, satisfies Equation 1.1. Interestingly, the kernel $\gamma^\theta$ is a metric on $X$ with the same topology as $\gamma$, so we can rephrase the result of Lyons as $(X, \gamma^\theta)$ being a metric space of strong negative type. We provide more details and generalizations of this property in Corollary 4.3.

The main aim of this paper is to provide a large number of examples of conditionally negative definite kernels that satisfy Equation 1.1, by using Bernstein functions in Theorem 4.1. Our method encompasses all of the above mentioned kernels that satisfy (II). We also provide a new proof that hyperbolic spaces (of any dimension) are metric spaces of strong negative type in Theorem 4.2.

Key words and phrases. Energy distance; Metric spaces of strong negative type; Metrics on probabilities; Bernstein functions; Conditionally negative definite kernels.

In [15], Mattner analyzed the behavior of the kernel $\|x - y\|^\alpha$, for $\alpha > 2$, defined on $\mathbb{R}^m$. What occurs is that we can still provide a metric structure on the space of probabilities under certain integrability assumptions, but we can only compare probabilities that have the same vector mean ($2 < \alpha < 4$) [...] $< \alpha <$ [...]. We also analyze the functions $y \in H \mapsto \int_H \psi(\|x - y\|^2)\, d\mu(x) \in \mathbb{R}$, where $\psi$ is a continuous function that is the difference of two derivatives (of the same order) of a completely monotone function. More precisely, we analyze when these functions are uniquely defined by the measure $\mu$. Section 2 is entirely focused on the definitions that we use. The proofs are presented in Section 6.

2. DEFINITIONS

We recall that a nonnegative measure $\lambda$ on a Hausdorff space $X$ is Radon regular (which we simply refer to as Radon) when it is a Borel measure that is finite on every compact set of $X$ and
(i) (Inner regular) $\lambda(E) = \sup\{\lambda(K) : K \text{ is compact}, K \subset E\}$ for every Borel set $E$;
(ii) (Outer regular) $\lambda(E) = \inf\{\lambda(U) : U \text{ is open}, E \subset U\}$ for every Borel set $E$.
We then say that a complex valued measure $\lambda$ of bounded variation is Radon if its variation is a Radon measure. The vector space of such measures is denoted by $M(X)$. Recall that every Borel measure of finite variation (in particular, every probability measure) on a separable complete metric space is necessarily Radon.

A semi-inner product on a real (complex) vector space $V$ is a bilinear real valued (sesquilinear complex valued) function $(\cdot, \cdot)_V$ defined on $V \times V$ such that $(u, u)_V \geq 0$ for every $u \in V$. When this inequality is an equality only for $u = 0$, we say that $(\cdot, \cdot)_V$ is an inner product. Similarly, a pseudometric on a set $X$ is a symmetric function $d: X \times X \to [0, \infty)$ such that $d(x, x) = 0$ for every $x \in X$ and that satisfies the triangle inequality. If $d(x, y) = 0$ only when $x = y$, then $d$ is a metric on $X$.

A kernel $K: X \times X \to \mathbb{C}$ is called positive definite if for every finite set of distinct points $x_1, \ldots, x_n \in X$ and scalars $c_1, \ldots, c_n \in \mathbb{C}$, we have that
$$\int_X \int_X K(x, y)\, d\lambda(x)\, d\overline{\lambda}(y) = \sum_{i,j=1}^n c_i \overline{c_j} K(x_i, x_j) \geq 0,$$
where $\lambda = \sum_{i=1}^n c_i \delta_{x_i}$. The set of measures on $X$ of this form is denoted by $M_\delta(X)$.

The reproducing kernel Hilbert space (RKHS) of a positive definite kernel $K: X \times X \to \mathbb{C}$ is the Hilbert space $\mathcal{H}_K \subset \mathcal{F}(X, \mathbb{C})$ that satisfies [22]:
(i) $x \in X \mapsto K_y(x) := K(x, y) \in \mathcal{H}_K$;
(ii) $\langle K_y, K_x \rangle = K(x, y)$;
(iii) $\overline{\mathrm{span}}\{K_y : y \in X\} = \mathcal{H}_K$.
When $X$ is a Hausdorff space and $K$ is continuous, it holds that $\mathcal{H}_K \subset C(X)$.

The following widely known result describes how to define a semi-inner product structure on a subspace of $M(X)$ using a continuous positive definite kernel.

Lemma 2.1.

If $K: X \times X \to \mathbb{C}$ is a continuous positive definite kernel and $\mu \in M(X)$ with $\sqrt{K(x, x)} \in L^1(|\mu|)$ (we write $\mu \in M_{\sqrt{K}}(X)$), then
$$z \in X \mapsto K_\mu(z) := \int_X K(x, z)\, d\mu(x) \in \mathbb{C}$$
is an element of $\mathcal{H}_K$, and if $\eta$ is another measure satisfying the same conditions as $\mu$, we have that
$$\langle K_\eta, K_\mu \rangle_{\mathcal{H}_K} = \int_X \int_X K(x, y)\, d\eta(x)\, d\mu(y).$$
In particular, $(\eta, \mu) \in M_{\sqrt{K}}(X) \times M_{\sqrt{K}}(X) \mapsto \langle K_\eta, K_\mu \rangle_{\mathcal{H}_K}$ is a semi-inner product.

We present a generalization of this result to a larger class of measures in Lemma 3.6. Usually the kernel $K$ is bounded, so $M_{\sqrt{K}}(X) = M(X)$. In this case, if the semi-inner product is in fact an inner product, we say that $K$ is integrally strictly positive definite (ISPD), and when it is an inner product on the vector space of measures $\mu \in M(X)$ with $\mu(X) = 0$, we say that $K$ is characteristic. If the kernel $K$ is real valued, it is sufficient to analyze the ISPD and characteristic properties on real valued measures. When the kernel is characteristic, we define the maximum mean discrepancy (MMD) as the metric on the space of probability measures in $M(X)$ given by
$$\mathrm{MMD}_K(P, Q) := \sqrt{\langle K_P - K_Q, K_P - K_Q \rangle_{\mathcal{H}_K}} = \sqrt{\int_X \int_X K(x, y)\, d[P - Q](x)\, d[P - Q](y)}. \qquad (2.2)$$

As mentioned in the introduction, the focus of this paper is to analyze metrics on the space of probabilities using conditionally negative definite kernels. We present a more general definition, which will be useful for the analysis of the energy distance through the kernel $\|x - y\|^\alpha$, $\alpha > 2$, defined on a Hilbert space.
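Both approaches from the introduction can be evaluated on empirical measures. The Python sketch below (standard library only; all helper names are hypothetical, not from the paper) computes the plug-in (V-statistic) values of the double integral in Equation 2.2 with the Gaussian kernel $e^{-r\|x - y\|^2}$, and of $D_\gamma$ with $\gamma(x, y) = \|x - y\|^\theta$, for the signed measure $P_n - Q_m$ built from two samples. It is an illustration under these assumptions, not a bias-corrected estimator.

```python
import math
import random

# Illustrative sketch; helper names are hypothetical, not taken from the paper.


def gaussian_kernel(x, y, r=1.0):
    """Bounded characteristic kernel K(x, y) = exp(-r * ||x - y||^2) on R^m."""
    return math.exp(-r * math.dist(x, y) ** 2)


def double_integral(kernel, points, weights):
    """Integral of kernel(x, y) against lambda x lambda, lambda = sum_i w_i delta_{x_i}.

    math.fsum makes identical terms with opposite signs cancel exactly.
    """
    return math.fsum(
        wi * wj * kernel(xi, xj)
        for xi, wi in zip(points, weights)
        for xj, wj in zip(points, weights)
    )


def signed_difference(sample_p, sample_q):
    """Empirical signed measure P_n - Q_m as (points, weights); its total mass is 0."""
    points = list(sample_p) + list(sample_q)
    weights = [1.0 / len(sample_p)] * len(sample_p) + [-1.0 / len(sample_q)] * len(sample_q)
    return points, weights


def mmd(sample_p, sample_q, r=1.0):
    """Plug-in value of Equation 2.2 with the Gaussian kernel."""
    points, weights = signed_difference(sample_p, sample_q)
    value = double_integral(lambda x, y: gaussian_kernel(x, y, r), points, weights)
    return math.sqrt(max(value, 0.0))  # clip negative rounding noise


def energy_distance(sample_p, sample_q, theta=1.0):
    """Plug-in value of D_gamma with gamma(x, y) = ||x - y||^theta, 0 < theta < 2."""
    points, weights = signed_difference(sample_p, sample_q)
    value = double_integral(lambda x, y: -math.dist(x, y) ** theta, points, weights)
    return math.sqrt(max(value, 0.0))


if __name__ == "__main__":
    rng = random.Random(0)
    p = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]
    q = [(rng.gauss(1, 1), rng.gauss(0, 1)) for _ in range(100)]
    print("MMD:", mmd(p, q), "energy distance:", energy_distance(p, q))
```

The same `double_integral` helper serves both methods; only the kernel and its sign change, which mirrors the difference between (I) and (II).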

Definition 2.2.

Let $\gamma: X \times X \to \mathbb{C}$ be a Hermitian kernel and $P$ a finite dimensional space of functions from $X$ to $\mathbb{C}$. We say that $\gamma$ is $P$-conditionally positive definite ($P$-CPD) if for every finite set of points $x_1, \ldots, x_n \in X$ and scalars $c_1, \ldots, c_n \in \mathbb{C}$ under the restriction that $\sum_{i=1}^n c_i p(x_i) = 0$ for every $p \in P$, we have that
$$\sum_{i,j=1}^n c_i \overline{c_j} \gamma(x_i, x_j) \geq 0.$$

This definition generalizes the concepts of positive definite kernels ($P$ is the zero space) and CPD kernels ($P$ is the set of constant functions). The most important example is when $X$ is a finite dimensional Euclidean space and $P$ is the set of multivariable polynomials on $X$ of degree less than or equal to a constant $k \in \mathbb{N}$ [25], [9], [6]. Sometimes it is more convenient to work with the opposite sign in Definition 2.2; in this case we say that the kernel is $P$-conditionally negative definite ($P$-CND).

In [9], [16], a characterization is proved for the continuous functions $\psi: [0, \infty) \to \mathbb{R}$ such that the kernel
$$(x, y) \in \mathbb{R}^m \times \mathbb{R}^m \mapsto \psi(\|x - y\|^2) \in \mathbb{R}$$
is CPD, for every $m \in \mathbb{N}$, with respect to the family $P$ of multivariable polynomials of degree less than a fixed $\ell \in \mathbb{Z}_+$ (we denote this family by $\pi_{\ell-1}(\mathbb{R}^m)$, where $\pi_{-1}(\mathbb{R}^m) = \{0\}$ and $\pi_0(\mathbb{R}^m) = \{\text{constant functions}\}$). A function $\psi$ satisfies this property if and only if $\psi \in C^\infty(0, \infty)$ and $(-1)^\ell \psi^{(\ell)}$ is a completely monotone function on $(0, \infty)$. A function with this property can be uniquely written as
$$\psi(t) = \int_{(0,\infty)} \frac{e^{-tr} - e_\ell(r)\, \omega_{\ell,\infty}(rt)}{r^\ell}\, d\lambda(r) + \sum_{k=0}^{\ell} a_k t^k, \qquad (2.3)$$
where $\lambda$ is a nonnegative Radon measure on $(0, \infty)$ (not necessarily of finite variation) with
$$\omega_{\ell,\infty}(s) := \sum_{l=0}^{\ell-1} \frac{(-1)^l s^l}{l!}, \qquad e_\ell(s) := e^{-s} \sum_{l=0}^{\ell-1} \frac{s^l}{l!}, \qquad \int_{(0,\infty)} \min\{1, r^{-\ell}\}\, d\lambda(r) < \infty,$$
and $a_k \in \mathbb{R}$, $(-1)^\ell a_\ell \geq 0$. We denote the set of such functions by $CM_\ell$; note that $\omega_{0,\infty}$ is the zero function.
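To see the characterization of [9], [16] at work for $\ell = 2$, take $\psi(t) = t^{3/2}$, so that $\psi(\|x - y\|^2) = \|x - y\|^3$ and $(-1)^2 \psi''(t) = \tfrac{3}{4} t^{-1/2}$ is completely monotone. The kernel should then be $\pi_1(\mathbb{R}^m)$-CPD: its quadratic form is nonnegative whenever the coefficients annihilate constants and linear functions (zero total mass and zero vector mean), while it can be negative otherwise. The Python sketch below (hypothetical helper names, standard library only) checks this on random measures built from blocks $c(\delta_{p+u} + \delta_{p-u} - \delta_{p+w} - \delta_{p-w})$, which always satisfy both constraints. A random check of this kind illustrates the statement; it does not prove it.

```python
import math
import random

# Illustrative sketch; helper names are hypothetical, not taken from the paper.


def quadratic_form(points, coeffs, kernel):
    """sum_{i,j} c_i c_j kernel(x_i, x_j) for real coefficients."""
    return math.fsum(ci * cj * kernel(xi, xj)
                     for xi, ci in zip(points, coeffs)
                     for xj, cj in zip(points, coeffs))


def cubic_kernel(x, y):
    """psi(||x - y||^2) with psi(t) = t^(3/2), an element of CM_2."""
    return math.dist(x, y) ** 3


def pi1_annihilating_measure(num_blocks, dim, rng):
    """Random signed discrete measure with zero total mass and zero vector mean.

    Each block c * (delta_{p+u} + delta_{p-u} - delta_{p+w} - delta_{p-w})
    integrates every polynomial of degree <= 1 to zero, hence so does any sum of blocks.
    """
    points, weights = [], []
    for _ in range(num_blocks):
        p = [rng.gauss(0, 1) for _ in range(dim)]
        u = [rng.gauss(0, 1) for _ in range(dim)]
        w = [rng.gauss(0, 1) for _ in range(dim)]
        c = rng.uniform(-1, 1)
        for sign, shift, weight in ((1, u, c), (-1, u, c), (1, w, -c), (-1, w, -c)):
            points.append(tuple(pi + sign * si for pi, si in zip(p, shift)))
            weights.append(weight)
    return points, weights


if __name__ == "__main__":
    rng = random.Random(0)
    pts, wts = pi1_annihilating_measure(5, 3, rng)
    print("constrained form (nonnegative):", quadratic_form(pts, wts, cubic_kernel))
    x, y = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
    print("delta_x - delta_y:", quadratic_form([x, y], [1.0, -1.0], cubic_kernel))  # -2.0
```

The measure $\delta_x - \delta_y$ has zero mass but nonzero vector mean, so the kernel is CPD but not $\pi_1$-CPD-compatible for it; its form equals $-2\|x - y\|^3 < 0$.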
For instance, the functions i) $(-1)^\ell t^a + p(t)$; ii) $(-1)^{\ell+1} t^\ell \log(t) + p(t)$; iii) $(-1)^\ell (c + t)^a + p(t)$; iv) $e^{-rt} + p(t)$, are elements of $CM_\ell$, for $\ell - 1 < a \leq \ell$, $c > 0$, $r > 0$ and $p \in \pi_{\ell-1}$. Those functions are not only in $CM_\ell$ but also in $C^{\ell-1}([0, \infty))$, and for them we have a characterization similar to, and simpler than, Equation 2.3. In general, a function $\psi \in CM_\ell$ belongs to $C^{\ell-1}([0, \infty))$ if and only if
$$\psi(t) = \int_{(0,\infty)} \frac{e^{-tr} - \omega_{\ell,\infty}(rt)}{r^\ell}\, d\eta(r) + \sum_{k=0}^{\ell} b_k t^k, \qquad (2.4)$$
where $\eta$ is a nonnegative Radon measure on $(0, \infty)$ (not necessarily of finite variation) with
$$\omega_{\ell,\infty}(s) := \sum_{l=0}^{\ell-1} \frac{(-1)^l s^l}{l!}, \qquad \int_{(0,\infty)} \min\{1, r^{-\ell}\}\, d\eta(r) < \infty,$$
$b_k = \psi^{(k)}(0)/k!$ for $k < \ell$, and $(-1)^\ell b_\ell \geq 0$. Moreover, if $\psi \in CM_\ell$, then $\psi(\cdot + c) \in CM_\ell \cap C^{\ell-1}([0, \infty))$ for every $c > 0$. In this case, the measure $\eta_c$ relative to the decomposition given in Equation 2.4 has finite variation and satisfies $d\eta_{c+s}(r) = e^{-sr}\, d\eta_c(r)$ for every $c, s > 0$. This property and the decomposition given in Equation 2.4 are implicitly proved in Theorem 2.[...]. Also, a polynomial $p$ belongs to $CM_\ell$ if and only if $p \in \pi_\ell(\mathbb{R})$ and the constant $(-1)^\ell p^{(\ell)}$ is nonnegative.
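Decompositions of the form (2.4) can be checked numerically for the powers $t^\alpha$, $\ell - 1 < \alpha < \ell$. The sketch below assumes the Cauchy–Saalschütz integral $\int_0^\infty (e^{-s} - \omega_{\ell,\infty}(s))\, s^{-\alpha-1}\, ds = \Gamma(-\alpha)$, which gives $t^\alpha = \Gamma(-\alpha)^{-1} \int_0^\infty (e^{-rt} - \omega_{\ell,\infty}(rt))\, r^{-\alpha-1}\, dr$; since $\Gamma(-\alpha)$ has the sign $(-1)^\ell$, this is (2.4) for $(-1)^\ell t^\alpha$ with $d\eta(r) = r^{\ell-\alpha-1}\, dr / |\Gamma(-\alpha)|$. For $\ell = 1$, $\alpha = 1/2$ it is the identity $-\sqrt{t} = \frac{1}{2\sqrt{\pi}} \int_0^\infty (e^{-rt} - 1)\, r^{-3/2}\, dr$ used in Section 4. The quadrature (midpoint rule after the substitution $r = \tan^2\theta$) and all names are illustrative choices; the tests use $\alpha = \ell - 1/2$, where the transformed integrand is bounded.

```python
import math

# Illustrative sketch; helper names are hypothetical, not taken from the paper.


def omega(ell, s):
    """omega_{ell,infinity}(s): Taylor polynomial of degree ell - 1 of exp(-s)."""
    return math.fsum((-s) ** l / math.factorial(l) for l in range(ell))


def exp_minus_omega(ell, s):
    """exp(-s) - omega_{ell,infinity}(s); uses the tail series for small s to avoid cancellation."""
    if s < 0.5:
        return math.fsum((-s) ** l / math.factorial(l) for l in range(ell, ell + 30))
    return math.exp(-s) - omega(ell, s)


def cm_integral(t, ell, alpha, n=20000):
    """Midpoint rule for int_0^inf (exp(-r t) - omega_ell(r t)) r^(-alpha - 1) dr.

    The substitution r = tan(theta)^2 maps (0, inf) onto (0, pi/2).
    """
    h = (math.pi / 2) / n
    total = 0.0
    for k in range(n):
        theta = (k + 0.5) * h
        tan = math.tan(theta)
        r = tan * tan
        dr_dtheta = 2.0 * tan / math.cos(theta) ** 2
        total += exp_minus_omega(ell, r * t) * r ** (-alpha - 1.0) * dr_dtheta
    return total * h


def power_from_representation(t, ell, alpha):
    """Recover t^alpha, ell - 1 < alpha < ell, from the integral representation."""
    return cm_integral(t, ell, alpha) / math.gamma(-alpha)


if __name__ == "__main__":
    for t in (0.25, 1.0, 4.0):
        print(t, power_from_representation(t, 1, 0.5), math.sqrt(t))
        print(t, power_from_representation(t, 2, 1.5), t ** 1.5)
```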

By Lemma 2.[...], every $\psi \in CM_\ell$ satisfies $|\psi(t)| \lesssim 1 + t^\ell$ (this notation means that $|\psi(t)|/(1 + t^\ell)$ is a bounded function).

3. CONDITIONALLY POSITIVE DEFINITE KERNELS

The following known result states a connection between positive definite kernels and $P$-CPD kernels [25]. A Lagrange basis for $P$ is a basis $\{p_1, \ldots, p_m\}$ of $P$ together with points $\xi_1, \ldots, \xi_m \in X$ such that $p_i(\xi_j) = \delta_{i,j}$. A set of points $\xi_1, \ldots, \xi_m \in X$ is unisolvent with respect to an $m$-dimensional space $P$ if the only function $p \in P$ such that $p(\xi_i) = 0$ for every $i$ is the zero function.

Theorem 3.1.

Let $\xi_1, \ldots, \xi_m \in X$ and $p_1, \ldots, p_m$ be a Lagrange basis for a finite dimensional space $P$ of functions from $X$ to $\mathbb{C}$. A Hermitian kernel $\gamma: X \times X \to \mathbb{C}$ is $P$-CPD if and only if the Hermitian kernel
$$K_\gamma(x, y) := \gamma(x, y) - \sum_{k=1}^m p_k(x) \gamma(\xi_k, y) - \sum_{l=1}^m \overline{p_l(y)} \gamma(x, \xi_l) + \sum_{k,l=1}^m p_k(x) \overline{p_l(y)} \gamma(\xi_k, \xi_l)$$
is positive definite.

This result follows from the fact that if $x_1, \ldots, x_n \in X$ and $c_1, \ldots, c_n \in \mathbb{C}$ are such that $\sum_{i=1}^n c_i p(x_i) = 0$ for every $p \in P$, then
$$\sum_{i,j=1}^n c_i \overline{c_j} K_\gamma(x_i, x_j) = \sum_{i,j=1}^n c_i \overline{c_j} \gamma(x_i, x_j),$$
and conversely, if $z_1, \ldots, z_{m+n} \in X$ (with $z_{n+k} = \xi_k$) and $d_1, \ldots, d_{m+n} \in \mathbb{C}$, then
$$\sum_{i,j=1}^{m+n} d_i \overline{d_j} K_\gamma(z_i, z_j) = \sum_{i,j=1}^{m+n} e_i \overline{e_j} \gamma(z_i, z_j),$$
where $e_i = d_i$ for $i \leq n$ and $e_{n+k} = -\sum_{j=1}^n d_j p_k(z_j)$ for $1 \leq k \leq m$; these coefficients satisfy $\sum_{i=1}^{m+n} e_i p(z_i) = 0$ for every $p \in P$.

Similarly to continuous positive definite kernels, continuous $P$-CPD kernels can be analyzed through their behavior on a certain space of measures.

Definition 3.2.

Let $X$ be a Hausdorff space and $P \subset C(X)$ a finite dimensional vector space. We define the set
$$M_P(X) := \Big\{\mu \in M(X) : \int_X |p(x)|\, d|\mu|(x) < \infty \text{ and } \int_X p(x)\, d\mu(x) = 0 \text{ for every } p \in P\Big\}.$$

Theorem 3.3.

A continuous Hermitian kernel $\gamma: X \times X \to \mathbb{C}$ is $P$-CPD if and only if for every $\mu \in M_P(X)$ for which $\gamma(x, y) \in L^1(|\mu| \times |\mu|)$ and $\gamma(x, \xi_i) \in L^1(|\mu|)$, where $(\xi_i)_{1 \leq i \leq m}$ is unisolvent, we have that
$$\int_X \int_X \gamma(x, y)\, d\mu(x)\, d\mu(y) \geq 0.$$

If we restrict the measures in Theorem 3.3 to those for which $\gamma(x, \xi) \in L^1(|\mu|)$ for every $\xi \in X$, then the kernel $\gamma$ defines a semi-inner product on this vector space. When $P$ is the space generated by a single function $p$, we can simplify the assumptions of Theorem 3.3.
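For $P$ the constant functions, a Lagrange basis is $p_1 \equiv 1$ with any point $\xi_1 = \xi$, and the kernel of Theorem 3.1 reduces to $K_\gamma(x, y) = \gamma(x, y) - \gamma(\xi, y) - \gamma(x, \xi) + \gamma(\xi, \xi)$. The Python sketch below (hypothetical helper names, standard library only) takes the CPD kernel $\gamma(x, y) = -\|x - y\|$ on $\mathbb{R}^2$ and checks numerically the two facts behind Theorem 3.1: the quadratic forms of $K_\gamma$ and $\gamma$ agree when $\sum_i c_i = 0$, and the quadratic form of $K_\gamma$ is nonnegative for arbitrary real coefficients.

```python
import math
import random

# Illustrative sketch; helper names are hypothetical, not taken from the paper.


def gamma_energy(x, y):
    """gamma(x, y) = -||x - y||, conditionally positive definite for P = constants."""
    return -math.dist(x, y)


def k_gamma(gamma, xi):
    """Kernel K_gamma of Theorem 3.1 for P = constants (Lagrange basis p_1 = 1 at xi)."""
    def kernel(x, y):
        return gamma(x, y) - gamma(xi, y) - gamma(x, xi) + gamma(xi, xi)
    return kernel


def quadratic_form(points, coeffs, kernel):
    """sum_{i,j} c_i c_j kernel(x_i, x_j) for real coefficients."""
    return math.fsum(ci * cj * kernel(xi, xj)
                     for xi, ci in zip(points, coeffs)
                     for xj, cj in zip(points, coeffs))


if __name__ == "__main__":
    rng = random.Random(0)
    K = k_gamma(gamma_energy, (0.3, -0.2))
    pts = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(8)]
    c = [rng.uniform(-1, 1) for _ in range(8)]
    print("K_gamma form, arbitrary c:", quadratic_form(pts, c, K))
```

Here $K_\gamma(x, y) = \|x - \xi\| + \|y - \xi\| - \|x - y\|$, which is twice a Lévy Brownian covariance, so positive definiteness is expected.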

Lemma 3.4.

Let $\gamma: X \times X \to \mathbb{C}$ be a continuous Hermitian kernel and $P = [p] \subset C(X)$ a one dimensional vector space. Then $\gamma$ is $P$-CPD if and only if for every $\mu \in M_P(X)$ for which $\gamma(x, y) \in L^1(|\mu| \times |\mu|)$ we have
$$\int_X \int_X \gamma(x, y)\, d\mu(x)\, d\mu(y) \geq 0.$$
Additionally, if $p$ and $\gamma$ are real valued functions such that $p(x) \neq 0$ for every $x \in X$ and the function $\gamma(x, x)/p(x)^2$ is bounded, the following assertions are equivalent:
(i) $\gamma \in L^1(|\mu| \times |\mu|)$;
(ii) the function $x \in X \mapsto \gamma(x, z)$ belongs to $L^1(|\mu|)$ for some $z \in X$;
(iii) the function $x \in X \mapsto \gamma(x, z)$ belongs to $L^1(|\mu|)$ for every $z \in X$.

As a direct consequence of the previous lemma, if the function $\gamma(x, x)/p(x)^2$ is bounded, then the set of measures in $M_P(X)$ that integrate $\gamma(x, y)$ is a vector space and the double integral defines a semi-inner product on it. We focus on the CPD case with real valued $\gamma$ due to its relevance.

Corollary 3.5.

Let $\gamma: X \times X \to \mathbb{R}$ be a continuous CPD kernel such that the function $x \mapsto \gamma(x, x)$ is bounded. The semi-inner product
$$(\mu, \nu) \in M(X, \gamma) \times M(X, \gamma) \mapsto I(\mu, \nu)_\gamma := \int_X \int_X \gamma(x, y)\, d\mu(x)\, d\nu(y) \in \mathbb{R}$$
is well defined on the vector space
$$M(X, \gamma) := \{\eta \in M(X) : \eta(X) = 0,\ \gamma \in L^1(|\eta| \times |\eta|)\}.$$

In the next lemma we improve the condition $\sqrt{K(x, x)} \in L^1(|\mu|)$ and enlarge the set of measures analyzed in Lemma 2.1, at the cost of describing the function $K_\mu$ only outside a set of $|\mu|$-measure zero.

Lemma 3.6.

Let $K: X \times X \to \mathbb{C}$ be a continuous positive definite kernel, and let $\mu \in M(X)$ be such that $K(x, y) \in L^1(|\mu| \times |\mu|)$. Then the set of points
$$X_\mu := \{z \in X : K(\cdot, z) \in L^1(|\mu|)\}$$
satisfies $|\mu|(X - X_\mu) = 0$, and the function
$$z \in X_\mu \mapsto \int_X K(x, z)\, d\mu(x) \in \mathbb{C}$$
is the restriction of an element $K_\mu \in \mathcal{H}_K$. If $\eta$ is a measure with the same conditions as $\mu$ and $K \in L^1(|\mu| \times |\eta|)$, we have that
$$\langle K_\eta, K_\mu \rangle_{\mathcal{H}_K} = \int_X \int_X K(x, y)\, d\eta(x)\, d\mu(y).$$

4. INNER PRODUCTS DEFINED BY CND KERNELS AND DERIVATIVES OF COMPLETELY MONOTONE FUNCTIONS

Since all kernels that we deal with in this section are real valued, we simplify the writing by focusing only on real valued measures (for which we still use the notation $M(X)$). As mentioned in Section 2, this is not a restriction.

In [12], it is proved that on a separable real Hilbert space $H$, the bilinear function $I_{1/2}$ defined as
$$(\mu, \nu) \in M(H; t^{1/2}) \times M(H; t^{1/2}) \mapsto I(\mu, \nu)_{1/2} := \int_H \int_H -\|x - y\|_H\, d\mu(x)\, d\nu(y)$$
defines an inner product on the vector space
$$M(H; t^{1/2}) := \{\eta \in M(H) : \eta(H) = 0,\ \|x\| \in L^1(|\eta|)\}.$$
The function $t \in [0, \infty) \mapsto \psi(t) := \sqrt{t} \in \mathbb{R}$ is an example of a Bernstein function [18]: it is continuous, $\psi \in C^\infty((0, \infty))$, and $\psi'$ is a completely monotone function on $(0, \infty)$ (in our context we do not need to assume that Bernstein functions are nonnegative). In other words, a function $\psi$ is a Bernstein function if and only if $-\psi \in CM_1$, and then it can be written by Equation 2.4 with $\ell = 1$; for the square root,
$$-\sqrt{t} = \frac{1}{2\sqrt{\pi}} \int_{(0,\infty)} (e^{-rt} - 1)\, r^{-3/2}\, dr.$$
So
$$(x, y) \in H \times H \mapsto -\|x - y\|_H = \frac{1}{2\sqrt{\pi}} \int_{(0,\infty)} \big(e^{-r\|x - y\|^2} - 1\big)\, r^{-3/2}\, dr,$$
and this kernel is CPD. The Gaussian kernels $e^{-r\|x - y\|^2}$, $r > 0$, are ISPD on every Hilbert space [8]; hence, by the Fubini–Tonelli Theorem, if $\mu \in M(H)$ with $\mu(H) = 0$ and $\|x\| \in L^1(|\mu|)$, then
$$\int_H \int_H (-1)\|x - y\|_H\, d\mu(x)\, d\mu(y) = \frac{1}{2\sqrt{\pi}} \int_{(0,\infty)} \Big(\int_H \int_H e^{-r\|x - y\|^2}\, d\mu(x)\, d\mu(y)\Big) r^{-3/2}\, dr \geq 0,$$
where the term coming from $-1$ vanishes because $\mu(H) = 0$. Further, the inner double integral is positive whenever $\mu$ is not the zero measure, so the final result is a positive number; this is the key argument to verify that $I_{1/2}$ is an inner product, thus recovering the main result of [12] by a completely different argument. More generally, we have the following result.

Theorem 4.1.

Let $\psi: [0, \infty) \to \mathbb{R}$ be a Bernstein function and $\gamma: X \times X \to [0, \infty)$ a continuous CND kernel such that $x \mapsto \gamma(x, x)$ is a bounded function. Consider the vector space
$$M(X; \gamma, \psi) := \{\eta \in M(X) : \psi(\gamma(x, y)) \in L^1(|\eta| \times |\eta|) \text{ and } \eta(X) = 0\};$$
then the function
$$(\mu, \nu) \in M(X; \gamma, \psi) \times M(X; \gamma, \psi) \mapsto I(\mu, \nu)_{\gamma,\psi} := -\int_X \int_X \psi(\gamma(x, y))\, d\mu(x)\, d\nu(y)$$
defines a semi-inner product on $M(X; \gamma, \psi)$. If $\psi$ is not a linear function and $2\gamma(x, y) = \gamma(x, x) + \gamma(y, y)$ only when $x = y$, then $I(\mu, \nu)_{\gamma,\psi}$ defines an inner product on $M(X; \gamma, \psi)$.
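Theorem 4.1 can be illustrated on discrete measures $\mu = \sum_i c_i \delta_{x_i}$ with $\sum_i c_i = 0$, for which $I(\mu, \mu)_{\gamma,\psi} = -\sum_{i,j} c_i c_j \psi(\gamma(x_i, x_j))$. The Python sketch below (hypothetical helper names, standard library only) evaluates this form on $\mathbb{R}^3$ for $\psi(t) = \sqrt{t}$ with $\gamma(x, y) = \|x - y\|^2$ (the energy distance) and for $\psi(t) = \log(1 + t)$ with the CND kernel $\gamma(x, y) = \|x - y\|$ (the metric $\log(\|x - y\| + 1)$ discussed after Corollary 4.3). It also shows why linear $\psi$ is excluded: for $\psi(t) = t$ and $\gamma(x, y) = \|x - y\|^2$ the form equals $2\|\sum_i c_i x_i\|^2$, which vanishes on nonzero measures with zero vector mean.

```python
import math
import random

# Illustrative sketch; helper names are hypothetical, not taken from the paper.


def bernstein_form(points, coeffs, psi, gamma):
    """I(mu, mu)_{gamma, psi} = -sum_{i,j} c_i c_j psi(gamma(x_i, x_j))."""
    return -math.fsum(ci * cj * psi(gamma(xi, xj))
                      for xi, ci in zip(points, coeffs)
                      for xj, cj in zip(points, coeffs))


def squared_distance(x, y):
    """CND kernel ||x - y||^2."""
    return math.dist(x, y) ** 2


def distance(x, y):
    """CND kernel ||x - y|| (a metric of negative type on a Hilbert space)."""
    return math.dist(x, y)


def zero_mass(coeffs):
    """Subtract the mean so that the discrete measure has total mass 0 (up to rounding)."""
    mean = math.fsum(coeffs) / len(coeffs)
    return [c - mean for c in coeffs]


if __name__ == "__main__":
    rng = random.Random(0)
    pts = [tuple(rng.gauss(0, 1) for _ in range(3)) for _ in range(10)]
    c = zero_mass([rng.uniform(-1, 1) for _ in range(10)])
    print("energy distance form:", bernstein_form(pts, c, math.sqrt, squared_distance))
    print("log(1 + ||x - y||) form:", bernstein_form(pts, c, math.log1p, distance))
```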

We emphasize that by Lemma 3.4 ($p$ is the constant function 1), $\psi(\gamma(x, y)) \in L^1(|\eta| \times |\eta|)$ if and only if $x \mapsto \psi(\gamma(x, z)) \in L^1(|\eta|)$ for some (or every) $z \in X$.

For instance, if $X$ is a real Hilbert space $H$, $\gamma(x, y) = \|x - y\|^2$ and $\psi(t) = t^{a/2}$, $0 < a < 2$, then
$$(\mu, \nu) \in M(H; t^{a/2}) \times M(H; t^{a/2}) \mapsto I(\mu, \nu)_{a/2} := -\int_H \int_H \|x - y\|^a\, d\mu(x)\, d\nu(y)$$
defines an inner product on
$$M(H; t^{a/2}) := \{\eta \in M(H) : \|x\|^a \in L^1(|\eta|) \text{ and } \eta(H) = 0\}.$$
It is relevant to note that the inner product in Theorem 4.1 is usually not complete (hence, $M(X; \gamma, \psi)$ is not a Hilbert space). For instance, in [20] it is proved that the Gaussian kernel can be used to define an inner product on the space of tempered distributions on Euclidean spaces.

Another example occurs on the generalized real hyperbolic space. Let $H$ be a Hilbert space, let
$$\mathbb{H} := \{(x, t_x) \in H \times (0, \infty) : t_x^2 - \|x\|^2 = 1\}$$
be the real hyperbolic space relative to $H$, and consider the kernel
$$((x, t_x), (y, t_y)) \in \mathbb{H} \times \mathbb{H} \mapsto [(x, t_x), (y, t_y)] := t_x t_y - \langle x, y \rangle \in [1, \infty),$$
which satisfies the relation $\cosh(d_{\mathbb{H}}((x, t_x), (y, t_y))) = [(x, t_x), (y, t_y)]$, where $d_{\mathbb{H}}$ is a metric on $\mathbb{H}$. In [3], or Chapter 5 of [1], it is proved that the metric $d_{\mathbb{H}}$ is a CND kernel; hence we can apply Theorem 4.1 with $\gamma = d_{\mathbb{H}}$ and $\psi(t) = t^{a/2}$, $0 < a < 2$, and obtain that
$$(\mu, \nu) \in M(\mathbb{H}; t^{a/2}) \times M(\mathbb{H}; t^{a/2}) \mapsto \mathbb{H}(\mu, \nu)_{a/2} := -\int_{\mathbb{H}} \int_{\mathbb{H}} d_{\mathbb{H}}(x, y)^{a/2}\, d\mu(x)\, d\nu(y)$$
defines an inner product on
$$M(\mathbb{H}; t^{a/2}) := \{\eta \in M(\mathbb{H}) : x \mapsto d_{\mathbb{H}}(x, z)^{a/2} \in L^1(|\eta|) \text{ for some (or every) } z \in \mathbb{H}, \text{ and } \eta(\mathbb{H}) = 0\}.$$
We can also include the case $a = 2$. A proof when $H$ is finite dimensional was provided in [13] using geometric properties of hyperbolic spaces. Our proof relies on a Laurent-type approximation of the function $\operatorname{arccosh}(t)$.

Theorem 4.2.

Let $\mathbb{H}$ be a real hyperbolic space, and consider the vector space
$$M(\mathbb{H}; t) := \{\eta \in M(\mathbb{H}) : x \mapsto d_{\mathbb{H}}(x, z) \in L^1(|\eta|) \text{ for some (or every) } z \in \mathbb{H}, \text{ and } \eta(\mathbb{H}) = 0\}.$$
Then
$$(\mu, \nu) \in M(\mathbb{H}; t) \times M(\mathbb{H}; t) \mapsto \mathbb{H}(\mu, \nu) := -\int_{\mathbb{H}} \int_{\mathbb{H}} d_{\mathbb{H}}(x, y)\, d\mu(x)\, d\nu(y)$$
is an inner product.

A different behavior occurs on the generalized real spheres. Let $H$ be a Hilbert space and let $S_H := \{x \in H : \|x\| = 1\}$ be the real sphere relative to $H$. The kernel $d_{S_H}$ defined on $S_H$ by the relation $\cos(d_{S_H}(x, y)) = \langle x, y \rangle_H$, $x, y \in S_H$, is a metric and defines a CND kernel, as shown in [5]. However, unlike the Hilbert space and the real hyperbolic space, $(S_H, d_{S_H})$ is not a metric space of strong negative type [14]. Gangolli also proved in [5] that the metrics on the other compact two-point homogeneous spaces (real, complex and quaternionic projective spaces and the Cayley projective plane) do not define CND kernels.

The following corollary of Theorem 4.1 connects the setting of metric spaces of strong negative type with the kernels in Theorem 4.1.

Corollary 4.3.

Let $\psi: [0, \infty) \to \mathbb{R}$ be a nonzero Bernstein function such that $\psi(0) = 0$ and $\lim_{t \to \infty} \psi(t)/t = 0$, and let $(X, \gamma)$ be a metric space of negative type. Then
$$(x, y) \in X \times X \mapsto D_{\psi,\gamma}(x, y) := \psi(\gamma(x, y))$$
is a metric on $X$, and $(X, D_{\psi,\gamma})$ is a metric space of strong negative type homeomorphic to $(X, \gamma)$.

As an example of Corollary 4.3, the Bernstein function $\psi(t) = \log(t + 1)$ satisfies $\psi(0) = 0$ and $\lim_{t \to \infty} \psi(t)/t = 0$. In particular, on a Hilbert space $H$, $\log(\|x - y\| + 1)$ is a metric on $H$ that induces the Hilbertian topology, and this metric is of strong negative type. Interestingly, we can apply Corollary 4.3 again to obtain that the same holds for the metric $\log(\log(\|x - y\| + 1) + 1)$.

Returning to the kernel $(x, y) \in H \times H \mapsto \|x - y\|^a$, we may ask what occurs when $a \geq 2$. The case $a = 2$ follows from the identity
$$-\int_H \int_H \|x - y\|^2\, d\mu(x)\, d\nu(y) = 2 \int_H \int_H \langle x, y \rangle_H\, d\mu(x)\, d\nu(y),$$
valid for every $\mu, \nu \in M(H; t) := \{\eta \in M(H) : \|x\|^2 \in L^1(|\eta|) \text{ and } \eta(H) = 0\}$ (expand $\|x - y\|^2 = \|x\|^2 - 2\langle x, y \rangle + \|y\|^2$ and use $\mu(H) = \nu(H) = 0$). This still defines a semi-inner product on $M(H; t)$, but every measure $\eta \in M(H; t)$ such that $\int_H \langle x, y \rangle_H\, d\eta(x) = 0$ for every $y \in H$ is equivalent to the zero measure for this semi-inner product. For an arbitrary measure $\eta \in M(H)$ such that $\|x\| \in L^1(|\eta|)$, the linear functional
$$y \in H \mapsto \int_H \langle x, y \rangle\, d\eta(x) \in \mathbb{R}$$
is continuous, so there exists a vector $v_\eta$, which we call the vector mean of $\eta$, that represents this continuous linear functional; in particular, the double integral above equals $2\langle v_\mu, v_\nu \rangle_H$.

In the case $a > 2$, a different behavior emerges. The double integral does not define a semi-inner product on $M(H; t^{a/2})$; however, it does if we restrict ourselves to the vector space
$$M_2(H; t^{a/2}) := \{\eta \in M(H) : \|x\|^a \in L^1(|\eta|),\ \eta(H) = 0,\ v_\eta = 0\}$$
for $2 < a < 4$. Indeed, using the representation of the $CM_2$ function
$$t^{a/2} = \frac{a(a - 2)}{4\Gamma(2 - a/2)} \int_{(0,\infty)} \frac{e^{-rt} - 1 + rt}{r^{a/2 + 1}}\, dr,$$
by Fubini–Tonelli we obtain that if $\mu, \nu \in M_2(H; t^{a/2})$, then
$$\int_H \int_H \|x - y\|^a\, d\mu(x)\, d\nu(y) = \frac{a(a - 2)}{4\Gamma(2 - a/2)} \int_{(0,\infty)} \Big(\int_H \int_H e^{-r\|x - y\|^2}\, d\mu(x)\, d\nu(y)\Big) \frac{dr}{r^{a/2 + 1}},$$
which is nonnegative when $\nu = \mu$. In particular, we can use the kernel $\|x - y\|^a$, $2 < a < 4$, to define a metric on the space of Radon probability measures on $H$ with finite moment of order $a$ and a fixed vector mean.

More generally, we have the following.

Theorem 4.4.

Let $\ell \in \mathbb{N}$, let $\psi: [0, \infty) \to \mathbb{R}$ be a continuous function in $CM_\ell$, and let $\gamma: X \times X \to [0, \infty)$ be a continuous CND kernel such that $x \mapsto \gamma(x, x)$ is a constant function. Consider the vector space
$$M_\ell(X; \gamma, \psi) := \Big\{\eta \in M(X) : \psi(\gamma(x, y)) \in L^1(|\eta| \times |\eta|),\ \gamma(x, y)^\ell \in L^1(|\eta| \times |\eta|),\ \eta(X) = 0, \text{ and } \int_X \int_X K_{-\gamma}(x, y)^j\, d\eta(x)\, d\eta(y) = 0,\ 1 \leq j \leq \ell - 1\Big\},$$
where $K_{-\gamma}$ is the kernel in Theorem 3.1; then the function
$$(\mu, \nu) \in M_\ell(X; \gamma, \psi) \times M_\ell(X; \gamma, \psi) \mapsto I(\mu, \nu)_{\gamma,\psi} := \int_X \int_X \psi(\gamma(x, y))\, d\mu(x)\, d\nu(y)$$
defines a semi-inner product on $M_\ell(X; \gamma, \psi)$. If $\psi$ is not a polynomial of degree $\ell$ or less and $2\gamma(x, y) = \gamma(x, x) + \gamma(y, y)$ only when $x = y$, then $I(\mu, \nu)_{\gamma,\psi}$ defines an inner product on $M_\ell(X; \gamma, \psi)$.

From Equation 2.4 and the fact that $(-1)^\ell [e^{-tr} - \omega_{\ell,\infty}(rt)] \geq 0$ for every $t, r \geq 0$ and $\ell \in \mathbb{N}$ (this can be easily proved by induction on $\ell$), if $\psi \in CM_\ell$ in Theorem 4.4 also belongs to $C^{\ell-1}[0, \infty)$, we may weaken the requirement $\gamma^\ell \in L^1(|\eta| \times |\eta|)$ to $\gamma^{\ell-1} \in L^1(|\eta| \times |\eta|)$ in the definition of $M_\ell(X; \gamma, \psi)$.

The additional properties required of the function $\gamma(x, x)$ in Theorem 4.4, compared to Theorem 4.1, are related to the fact that the integrals $\int_X \int_X \gamma(x, y)^j\, d\eta(x)\, d\eta(y)$ are difficult to analyze in the general setting of Theorem 4.1. However, if $\psi \in CM_\ell$ in Theorem 4.4 also belongs to $C^{\ell-1}[0, \infty)$ and all of its derivatives up to order $\ell - 1$ [...] $x \mapsto \gamma(x, x)$ is a bounded function in Theorem 4.4. This is the case for the function $(-1)^\ell t^{a/2}$, $2(\ell - 1) < a < 2\ell$.

As an example of Theorem 4.4, if $X$ is a Hilbert space $H$, $\gamma(x, y) = \|x - y\|^2$ and $\psi(t) = (-1)^\ell t^{a/2}$, $2(\ell - 1) < a < 2\ell$, $\ell \in \mathbb{N}$, then
$$(\mu, \nu) \in M_\ell(H; t^{a/2}) \times M_\ell(H; t^{a/2}) \mapsto I(\mu, \nu)_{a/2} := \int_H \int_H (-1)^\ell \|x - y\|^a\, d\mu(x)\, d\nu(y)$$
defines an inner product on the vector space
$$M_\ell(H; t^{a/2}) := \Big\{\mu \in M(H) : \|x\|^a \in L^1(|\mu|),\ \mu(H) = 0, \text{ and } \int_H \langle x, y_1 \rangle \cdots \langle x, y_j \rangle\, d\mu(x) = 0 \text{ for all } y_1, \ldots, y_j \in H,\ 1 \leq j \leq \ell - 1\Big\}.$$
Theorem 4.1 and Theorem 4.4 in the case where $X$ is a Euclidean space $\mathbb{R}^m$ and $\gamma(x, y) = \|x - y\|^2$ were proved in [15].

5. SPACE OF FUNCTIONS DEFINED BY DERIVATIVES OF COMPLETELY MONOTONE FUNCTIONS

As mentioned in [12], the fact that the energy distance defines a metric on a separable Hilbert space can be proved using the proposed method, but it also follows from the fact that if $H$ is a separable Hilbert space, then a measure $\mu \in M(H)$ such that $\|x\|^a \in L^1(|\mu|)$, $a \in (0, \infty) \setminus 2\mathbb{N}$, satisfies
$$\int_H \|x - y\|^a\, d\mu(x) = 0, \quad y \in H, \qquad (5.5)$$
if and only if $\mu$ is the zero measure, as proved in [11], [10].

In [8] it is proved that if $\psi \in CM$ is not a constant function, then
$$\int_H \psi(\|x - y\|^2)\, d\mu(x) = 0, \quad y \in H,$$
if and only if $\mu$ is the zero measure. In this section we prove similar results in a much broader setting, as a consequence of the results presented in Section 4.

Theorem 5.1.

Let $H$ be an infinite dimensional Hilbert space, $\ell \in \mathbb{Z}_+$ and $\phi, \varphi \in CM_\ell$. If a measure $\mu \in M(H)$ such that $\|x\|^{2\ell} \in L^1(|\mu|)$ satisfies
$$\int_H \psi(\|x - y\|^2)\, d\mu(x) = 0 \quad \text{for every } y \in H,$$
where $\psi := \phi - \varphi$, then it must hold that
$$\int_H \psi(\|x - y\|^2 + c)\, d\mu(x) = 0 \quad \text{for every } y \in H,\ c \geq 0.$$
In addition (even if $H$ is not infinite dimensional), $\psi$ is not a polynomial if and only if the only measure $\mu \in M(H)$ with $\|x\|^{2\ell} \in L^1(|\mu|)$ that satisfies
$$\int_H \psi(\|x - y\|^2 + c)\, d\mu(x) = 0 \quad \text{for every } y \in H,\ c \geq 0$$
is the zero measure.

For some functions we can provide a version of Theorem 5.1 on finite dimensional spaces.
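The requirement in Theorem 5.1 that $\psi$ is not a polynomial cannot be dropped. A minimal one-dimensional sketch, assuming $H = \mathbb{R}$ and the illustrative signed measure $\mu = \delta_{-2} - 4\delta_{-1} + 6\delta_0 - 4\delta_1 + \delta_2$ (a fourth finite difference, whose moments of order 0 to 3 vanish): for the polynomial $\psi(t) = t$, $\int (|x - y|^2 + c)\, d\mu(x) = 0$ for every $y$ and $c \geq 0$ although $\mu \neq 0$, while for the non-polynomial choices $\psi(t) = t^{1/2}$ and $\psi(t) = t^{3/2}$ the potential does not vanish at $y = 0$. The measure and helper names are chosen only for this illustration.

```python
import math

# Illustrative sketch; the measure and helper names are hypothetical.
# Fourth finite difference: a nonzero signed measure on R whose moments of order 0..3 vanish.
POINTS = [-2.0, -1.0, 0.0, 1.0, 2.0]
WEIGHTS = [1.0, -4.0, 6.0, -4.0, 1.0]


def potential(y, a, c=0.0):
    """y -> integral of (|x - y|^2 + c)^(a/2) d mu(x), i.e. psi(t) = t^(a/2) shifted by c."""
    return math.fsum(w * ((x - y) ** 2 + c) ** (a / 2) for x, w in zip(POINTS, WEIGHTS))


def moment(k):
    """k-th moment of the measure."""
    return math.fsum(w * x ** k for x, w in zip(POINTS, WEIGHTS))


if __name__ == "__main__":
    print([moment(k) for k in range(5)])       # [0.0, 0.0, 0.0, 0.0, 24.0]
    print(potential(1.3, 2, 0.7))              # zero up to rounding: psi(t) = t is a polynomial
    print(potential(0.0, 1), potential(0.0, 3))  # -4.0 8.0
```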


Lemma 5.2.

Let $\ell \in \mathbb{N}$ and let $H$ be a Hilbert space. A measure $\mu \in M(H)$ such that $\|x\|^{2(\ell-1)} \in L^1(|\mu|)$ and $\psi(\|x - y\|^2) \in L^1(|\mu| \times |\mu|)$ satisfies
$$\int_H \psi(\|x - y\|^2)\, d\mu(x) = 0 \quad \text{for every } y \in H$$
if and only if $\mu$ is the zero measure, when $\psi: [0, \infty) \to \mathbb{R}$ is one of the following functions:
(i) $\psi(t) = t^{a/2}$, $2(\ell - 1) < a < 2\ell$;
(ii) $\psi(t) = t^{\ell-1}\log(t)$, $\ell > 1$;
(iii) $\ell = [\ldots]$ and $\psi \in CM_\ell$ is not a polynomial, $\psi([\ldots]) \leq [\ldots]$;
(iv) $\ell = [\ldots]$ and $\psi \in CM_\ell$ is not a polynomial, $\psi([\ldots]) \leq [\ldots]$, but $\|x\|^{2\ell} \in L^1(|\mu|)$.
We remark that in case (iv) we may withdraw the additional assumption $\|x\|^{2\ell} \in L^1(|\mu|)$ if $\psi \in C^{\ell-1}[0, \infty)$.

6. PROOFS

Section 3.

Proof of Theorem 3.3. The converse is immediate. Suppose that $\gamma$ is $P$-CPD. Since $P$ is finite dimensional and $(\xi_i)_{1 \leq i \leq m}$ is unisolvent, there exists a basis $p_1, \ldots, p_m$ of $P$ such that $p_i(\xi_j) = \delta_{i,j}$. By the integrability assumptions on the functions $p_i$ and $\gamma(x, \xi_j)$, the kernel $K_\gamma$ of Theorem 3.1 belongs to $L^1(|\mu| \times |\mu|)$, and, since $\mu$ annihilates $P$,
$$\int_X \int_X \gamma(x, y)\, d\mu(x)\, d\mu(y) = \int_X \int_X K_\gamma(x, y)\, d\mu(x)\, d\mu(y);$$
the conclusion follows from Lemma 3.6. □

Proof of Lemma 3.4. Let $\mu \in M_P(X)$ be such that $\gamma(x, y) \in L^1(|\mu| \times |\mu|)$, and let $A := \{\xi \in X : \gamma(\cdot, \xi) \in L^1(|\mu|)\}$; by the Fubini–Tonelli Theorem, the complement of $A$ has $|\mu|$-measure zero. If $A \cap \{\xi : p(\xi) \neq 0\} \neq \emptyset$, the result is a consequence of Theorem 3.3, since a single point $\xi$ with $p(\xi) \neq 0$ is unisolvent for $P$. On the other hand, if $A \cap \{\xi : p(\xi) \neq 0\} = \emptyset$, note that the kernel $\gamma$ is positive definite when restricted to the closed set $B := \{\xi : p(\xi) = 0\}$, that $A \subset B$, and that
$$\int_X \int_X \gamma(x, y)\, d\mu(x)\, d\mu(y) = \int_B \int_B \gamma(x, y)\, d\mu(x)\, d\mu(y).$$
The conclusion follows from Lemma 3.6.

Now, under the additional requirements on $p$ and $\gamma$, it is easy to see that $\gamma$ is $[p]$-CPD if and only if the kernel
$$(x, y) \in X \times X \mapsto \beta(x, y) := \frac{\gamma(x, y)}{p(x)p(y)} \in \mathbb{R}$$
is CPD. Note that it is sufficient to prove the three equivalences for the kernel $\beta$ and any measure $\eta \in M(X)$ with $\eta(X) = 0$, because we can take $d\eta = p\, d\mu$. The kernel
$$d(x, y) := \big(-2\beta(x, y) + \beta(x, x) + \beta(y, y)\big)^{1/2}$$
is a pseudometric on $X$, because $d^2$ is a CND kernel with $d(x, x) = 0$ for every $x \in X$, so $d$ satisfies the triangle inequality. Since the function $\beta(x, x)$ is bounded, the relations $\beta \in L^1(|\eta| \times |\eta|)$, $\beta(x, z) \in L^1(|\eta|)$ for some $z \in X$, and $\beta(x, z) \in L^1(|\eta|)$ for every $z \in X$ are respectively equivalent to the relations $d^2 \in L^1(|\eta| \times |\eta|)$, $d(x, z)^2 \in L^1(|\eta|)$ for some $z \in X$, and $d(x, z)^2 \in L^1(|\eta|)$ for every $z \in X$. The conclusion that these three properties are equivalent for the kernel $d^2$ follows directly from the triangle inequality. □

Proof of Lemma 3.6. Assume without loss of generality that $\mu$ is a nonnegative measure. The fact that the set $X_\mu$ satisfies $\mu(X - X_\mu) = 0$ is a consequence of the Fubini–Tonelli Theorem. Since $\mu$ is a Radon measure, there exists an increasing sequence of compact sets $(C_n)_{n \in \mathbb{N}}$ for which $\mu(X - \bigcup_{n \in \mathbb{N}} C_n) = 0$. In particular, by the Dominated Convergence Theorem,
$$\int_X \int_X K(x, y)\chi_{C_n}(x)\chi_{C_n}(y)\, d\mu(x)\, d\mu(y) \to \int_X \int_X K(x, y)\, d\mu(x)\, d\mu(y),$$
because $(\mu \times \mu)(X \times [X - \bigcup_{n \in \mathbb{N}} C_n]) = 0$. The function $\sqrt{K(x, x)}$ belongs to $L^1(\chi_{C_n}\mu)$, so by Lemma 2.1, $K_{\mu_n} \in \mathcal{H}_K$, where $d\mu_n := \chi_{C_n}\, d\mu$, and
$$\langle K_{\mu_n} - K_{\mu_m}, K_{\mu_n} - K_{\mu_m} \rangle_{\mathcal{H}_K} = \int_X \int_X K(x, y)[\chi_{C_n}(x) - \chi_{C_m}(x)][\chi_{C_n}(y) - \chi_{C_m}(y)]\, d\mu(x)\, d\mu(y) \xrightarrow[m,n \to \infty]{} 0.$$
Hence $(K_{\mu_n})_{n \in \mathbb{N}}$ is a Cauchy sequence, and in particular it converges to an element $K_\mu \in \mathcal{H}_K$. Since $\mathcal{H}_K$ is an RKHS, convergence in norm implies pointwise convergence, so
$$K_\mu(z) = \lim_{n \to \infty} K_{\mu_n}(z) = \lim_{n \to \infty} \int_{C_n} K(x, z)\, d\mu(x) = \int_X K(x, z)\, d\mu(x)$$
for every $z \in X_\mu$, which proves our claim. Now, if $K \in L^1(\mu \times \eta)$ and $(D_n)_{n \in \mathbb{N}}$ is a sequence of compact sets chosen for $\eta$ as $(C_n)_{n \in \mathbb{N}}$ was chosen for $\mu$, with $d\eta_n := \chi_{D_n}\, d\eta$, we have that
$$\langle K_{\eta_n}, K_{\mu_n} \rangle_{\mathcal{H}_K} = \int_X \int_X K(x, y)\chi_{D_n}(x)\chi_{C_n}(y)\, d\eta(x)\, d\mu(y).$$
The left hand side of this equality converges to $\langle K_\eta, K_\mu \rangle_{\mathcal{H}_K}$, while the right hand side converges to $\int_X \int_X K(x, y)\, d\eta(x)\, d\mu(y)$ by the Dominated Convergence Theorem. □

Section 4.

Throughout the rest of the paper, we use the well-known fact that a Hermitian kernel $\gamma: X \times X \to \mathbb{C}$ is CND if and only if the kernel $e^{-r\gamma(x,y)}$ is positive definite for every $r > 0$ (page 74 in [1]).

Lemma 6.1 improves Lemma 3.4 in the case where the space of functions considered there is the set of constant functions. We state it for CND kernels instead of CPD kernels because this is how we apply it.
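The equivalence recalled above can be spot-checked numerically. It says that a kernel $\gamma$ is CND if and only if $e^{-r\gamma}$ is positive definite for every $r > 0$. The Python sketch below is only an illustration, and every concrete choice in it is our own and hypothetical rather than taken from the paper: the point sets, the weights $c = (1, -2, 1)$, the value $r = 0.05$, and the helper names `quad_form` and `cholesky_ok`.

The sketch relies on two facts:
- $\|x - y\|$ is CND on Euclidean space. So $\sum_{i,j} c_i c_j \|x_i - x_j\|$ should be nonpositive for zero-sum weights, and Gram matrices of $e^{-r\|x - y\|}$ should admit a Cholesky factorization.
- $|x - y|^3$ is not CND. On the three-point signed measure $\delta_0 - 2\delta_1 + \delta_2$, both the CND inequality and the positivity of $e^{-r|x - y|^3}$ fail.

```python
import math
import random


def quad_form(kernel, points, weights):
    """Return sum_{i,j} c_i c_j kernel(x_i, x_j) for weights c_i at points x_i."""
    return sum(
        ci * cj * kernel(xi, xj)
        for xi, ci in zip(points, weights)
        for xj, cj in zip(points, weights)
    )


def cholesky_ok(matrix):
    """True if the symmetric matrix has a Cholesky factorization (i.e. is positive definite)."""
    n = len(matrix)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:  # non-positive pivot: matrix is not positive definite
                    return False
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return True


dist = math.dist  # Euclidean distance ||x - y|| (Python 3.8+)

# Three points on a line with zero-sum weights: the signed measure delta_0 - 2 delta_1 + delta_2.
line_pts = [(0.0,), (1.0,), (2.0,)]
c = [1.0, -2.0, 1.0]

q1 = quad_form(lambda x, y: dist(x, y), line_pts, c)       # gamma = |x - y|   (CND)
q3 = quad_form(lambda x, y: dist(x, y) ** 3, line_pts, c)  # gamma = |x - y|^3 (not CND)

r = 0.05
p1 = quad_form(lambda x, y: math.exp(-r * dist(x, y)), line_pts, c)
p3 = quad_form(lambda x, y: math.exp(-r * dist(x, y) ** 3), line_pts, c)

# Gram matrices of exp(-s ||x - y||) on a 3 x 3 grid in R^2, for a few values of s.
grid = [(float(i), float(j)) for i in range(3) for j in range(3)]
gram_ok = {
    s: cholesky_ok([[math.exp(-s * dist(x, y)) for y in grid] for x in grid])
    for s in (0.5, 1.0, 2.0)
}

# Random zero-sum weights on the grid: the form for gamma = ||x - y|| must be <= 0.
rng = random.Random(0)
w = [rng.gauss(0.0, 1.0) for _ in grid]
mean_w = sum(w) / len(w)
w = [wi - mean_w for wi in w]
q_grid = quad_form(dist, grid, w)

print(q1, q3)          # -4.0 8.0
print(p1 > 0, p3 < 0)  # True True
print(gram_ok, q_grid)
```

A successful Cholesky factorization is used here only as a cheap finite-dimensional test of strict positive definiteness. It proves nothing about the infinite-dimensional setting treated in this section.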

Lemma 6.1.

Let $\gamma: X \times X \to \mathbb{R}$ be a continuous CND kernel such that $\gamma(x,x)$ is a bounded function, $\mu \in \mathcal{M}(X)$ and $\theta > 0$. Then the following assertions are equivalent:
(i) $\gamma \in L^\theta(|\mu| \times |\mu|)$;
(ii) the function $x \in X \mapsto \gamma(x,z)$ belongs to $L^\theta(|\mu|)$ for some $z \in X$;
(iii) the function $x \in X \mapsto \gamma(x,z)$ belongs to $L^\theta(|\mu|)$ for every $z \in X$.

Proof.

Since $\gamma$ is CND, there exists a CND kernel $\beta: X \times X \to \mathbb{R}$ such that $\beta(x,x) = 0$ for every $x \in X$, $\beta^{1/2}$ is a pseudometric on $X$, and $\gamma(x,y) = \beta(x,y) + \gamma(x,x)/2 + \gamma(y,y)/2$. Since $\gamma(x,x)$ is bounded and $\mu$ is a finite measure, for $\theta \ge 1$ the Minkowski inequality shows that the three conditions on $\gamma$ are respectively equivalent to the three conditions on the CND kernel $\beta$. For $\theta \in (0,1)$ the same equivalence holds, now as a consequence of the general relation on $L^\theta$ spaces
$$\int |f + g|^\theta \le \int |f|^\theta + \int |g|^\theta.$$
In particular, we may suppose that $\gamma$ is a CND kernel with $\gamma(x,x) = 0$ for every $x \in X$.

If $\gamma^\theta \in L^1(|\mu| \times |\mu|)$, then by the Fubini-Tonelli Theorem there exists $z \in X$ for which $\gamma(\cdot, z)^\theta \in L^1(|\mu|)$.

If $x \in X \mapsto \gamma(x,z)^\theta$ belongs to $L^1(|\mu|)$ for some $z \in X$, then for every $y \in X$
$$\gamma(x,y)^\theta = \big(\gamma(x,y)^{1/2}\big)^{2\theta} \le \big(\gamma(x,z)^{1/2} + \gamma(y,z)^{1/2}\big)^{2\theta}.$$
For $\theta \ge 1/2$, the functions inside the parentheses on the right-hand side belong to $L^{2\theta}(|\mu|)$ in the variable $x$. The Minkowski inequality then gives the integrability of $x \in X \mapsto \gamma(x,y)^\theta$. Integrating the Minkowski inequality with respect to $d|\mu|(y)$, we also obtain $\gamma^\theta \in L^1(|\mu| \times |\mu|)$. For $0 < \theta < 1/2$, the proof is the same, except that it uses the general relation above with exponent $2\theta \in (0,1)$. □

Proof of Theorem 4.1. For the first claim it is sufficient to prove that $I(\mu,\mu)_{\gamma,\psi} \ge 0$ […]
$$-\psi(\gamma(x,y)) = a + b\gamma(x,y) + \int_{(0,\infty)} \frac{e^{-r\gamma(x,y)} - 1}{r}\,d\lambda(r),$$
where $b \le 0$ and $\lambda$ is a nonnegative Radon measure such that $\min\{1, r^{-1}\} \in L^1(\lambda)$. Consequently, $-\psi(\gamma(x,y))$ is a CPD kernel, and the conclusion follows from Corollary 3.4. If $\psi$ is not a linear function, then $\lambda((0,\infty)) > 0$, because the representation in Equation 2.4 is unique.

If $\mu \in \mathcal{M}(X; \gamma, \psi)$, then the three functions that describe $\psi(\gamma(x,y))$ are in $L^1(|\mu| \times |\mu|)$. This holds because $e^{-r\gamma(x,y)} - 1 \le 0$ for $r > 0$ and $x, y \in X$, and because $b \le 0$. Since
$$1 - e^{-rt} \le r(1+t)\min\{1, r^{-1}\}, \quad r, t \ge 0,$$
we can apply Fubini-Tonelli and obtain
$$-\int_X \int_X \psi(\gamma(x,y))\,d\mu(x)\,d\mu(y) = b \int_X \int_X \gamma(x,y)\,d\mu(x)\,d\mu(y) + \int_{(0,\infty)} \Big[\int_X \int_X e^{-r\gamma(x,y)}\,d\mu(x)\,d\mu(y)\Big] \frac{d\lambda(r)}{r}.$$
The first double integral is nonpositive by Corollary 3.4. Since $b \le 0$, the first term is therefore nonnegative. Since $2\gamma(x,y) = \gamma(x,x) + \gamma(y,y)$ only when $x = y$, the kernel $e^{-r\gamma(x,y)}$ is ISPD for every $r > 0$. Hence, if $\mu$ is nonzero,
$$\int_X \int_X e^{-r\gamma(x,y)}\,d\mu(x)\,d\mu(y) > 0, \quad r > 0,$$
and the right-hand side above is positive because $\lambda((0,\infty)) > 0$. □

Proof of Theorem 4.2. By Equation 4.[…], for $t \ge 1$,
$$\operatorname{arcCosh}(t) = \log(2) + \log(t) - \sum_{k=1}^{\infty} \frac{(2k)!}{2^{2k}(k!)^2} \frac{t^{-2k}}{2k}.$$
In [1] it is proved that $\log([x,y])$ is a CND kernel on $H$. By [8], the positive definite kernel $[x,y]^{-2k}$ on $H$ is ISPD for every $k \in \mathbb{N}$. Since the series in the arcCosh formula above only contains nonnegative numbers, we may interchange summation and integration for any $\eta \in \mathcal{M}(H; t)$. Consequently, if $\mu$ is not the zero measure,
$$\begin{aligned} -\int_H \int_H d_H(x,y)\,d\mu(x)\,d\mu(y) &= -\int_H \int_H \operatorname{arcCosh}([x,y])\,d\mu(x)\,d\mu(y) \\ &= \int_H \int_H \Big(-\log([x,y]) + \sum_{k=1}^{\infty} \frac{(2k)!}{2^{2k}(k!)^2\,2k}[x,y]^{-2k}\Big)\,d\mu(x)\,d\mu(y) \\ &\ge \int_H \int_H \sum_{k=1}^{\infty} \frac{(2k)!}{2^{2k}(k!)^2\,2k}[x,y]^{-2k}\,d\mu(x)\,d\mu(y) \\ &= \sum_{k=1}^{\infty} \frac{(2k)!}{2^{2k}(k!)^2\,2k} \int_H \int_H [x,y]^{-2k}\,d\mu(x)\,d\mu(y) > 0. \end{aligned}$$
□

Proof of Corollary 4.3. By Remark 3.[…], if $\psi$ satisfies these assumptions, then we can write the kernel $D_{\psi,\gamma}$ as
$$D_{\psi,\gamma}(x,y) = \psi(\gamma(x,y)) = \int_{(0,\infty)} \frac{1 - e^{-r\gamma(x,y)}}{r}\,d\lambda(r),$$
where $\lambda$ is a nonnegative Radon measure such that $\min\{1, r^{-1}\} \in L^1(\lambda)$.
Because $\gamma$ is a metric, we have that
$$1 - e^{-r\gamma(x,y)} \le [1 - e^{-r\gamma(x,z)}] + [1 - e^{-r\gamma(z,y)}], \quad x, y, z \in X,$$
which proves that $D_{\psi,\gamma}(x,y) \le D_{\psi,\gamma}(x,z) + D_{\psi,\gamma}(z,y)$. The topologies are equivalent because $\psi$ is necessarily an increasing function with $\psi(0) = 0$, so $\psi(t_n) \to 0$ if and only if $t_n \to 0$. Finally, $(X, D_{\psi,\gamma})$ has strong negative type: the kernel $\gamma$ is continuous on the metric topology $(X, \gamma)$, $\psi$ is not a linear function, and the remaining requirements of Theorem 4.1 are satisfied. □

To prove the next result, we use the same infinite-dimensional multinomial theorem that was used in [8] to prove that the Gaussian kernel is ISPD on Hilbert spaces. Let $H$ be a real Hilbert space and let $(e_\xi)_{\xi \in I}$ be a complete orthonormal basis for it. Then for every $n \in \mathbb{N}$
$$\langle x, y \rangle^n = \Big(\sum_{\xi \in I} x_\xi y_\xi\Big)^n = \sum_{\alpha \in (I, \mathbb{Z}_+),\ |\alpha| = n} \frac{n!}{\alpha!} x^\alpha y^\alpha, \tag{6.6}$$
with the following notation:
- $x_\xi = \langle x, e_\xi \rangle$;
- $(I, \mathbb{Z}_+)$ is the space of functions from $I$ to $\mathbb{Z}_+$;
- the condition $|\alpha| = n$ means that $\sum_{\xi \in I} \alpha(\xi) = n$, so in particular $\alpha$ vanishes except at finitely many points;
- $\alpha! = \prod_{\xi \in I} \alpha(\xi)!$, which makes sense because $0! = 1$;
- $x^\alpha = \prod_{\alpha(\xi) \neq 0} x_\xi^{\alpha(\xi)}$.

This result can be proved by approximating $\langle x, y \rangle$ on finite-dimensional spaces and applying the multinomial theorem there. The number $\lfloor l \rfloor$ denotes the largest integer less than or equal to $l$.

In Lemma 6.2 we use the following fact, which can be found in [17] and [21]. Let $K: X \times X \to \mathbb{C}$ be a continuous positive definite kernel. A measure $\mu \in \mathcal{M}_{\sqrt{K}}(X)$ satisfies
$$\int_X K(x,y)\,d\mu(x) = 0, \quad y \in X,$$
if and only if $\int_X \int_X K(x,y)\,d\mu(x)\,d\mu(y) = 0$.
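In finite dimensions, (6.6) is the ordinary multinomial theorem applied to $\langle x, y \rangle = \sum_\xi x_\xi y_\xi$. The Python sketch below enumerates the multi-indices $\alpha$ with $|\alpha| = n$ in $\mathbb{R}^3$ and compares both sides of (6.6). It is an illustration only: the vectors and the helper names `multi_indices` and `multinomial_rhs` are hypothetical choices, not part of the paper.

```python
import math
from itertools import product


def multi_indices(dim, n):
    """Yield every alpha in Z_+^dim with |alpha| = alpha_1 + ... + alpha_dim = n."""
    for alpha in product(range(n + 1), repeat=dim):
        if sum(alpha) == n:
            yield alpha


def multinomial_rhs(x, y, n):
    """Right-hand side of (6.6) in R^dim: sum over |alpha| = n of n!/alpha! * x^alpha * y^alpha."""
    total = 0.0
    for alpha in multi_indices(len(x), n):
        coeff = math.factorial(n)
        for a in alpha:
            coeff //= math.factorial(a)  # exact: n! / (alpha_1! ... alpha_dim!) is an integer
        mono = 1.0
        for xi, yi, a in zip(x, y, alpha):
            mono *= (xi * yi) ** a  # x^alpha y^alpha = prod_i (x_i y_i)^{alpha_i}; 0.0 ** 0 == 1.0
        total += coeff * mono
    return total


x = (1.0, 2.0, -1.0)
y = (0.5, -1.0, 3.0)
inner = sum(a * b for a, b in zip(x, y))  # <x, y> = -4.5

for n in range(6):
    print(n, inner ** n, multinomial_rhs(x, y, n))
```

For $d = 3$ and $n = 3$ there are $\binom{5}{2} = 10$ multi-indices with $|\alpha| = n$. In general there are $\binom{n+d-1}{d-1}$ of them in dimension $d$.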

Lemma 6.2.

Let $H$ be a real Hilbert space, $n \in \mathbb{N}$ and $\mu \in \mathcal{M}(H)$. Suppose that $\|x - y\|^{2n} \in L^1(|\mu| \times |\mu|)$. Then
$$\langle x, y \rangle^k \|x\|^{2i} \|y\|^{2j} \in L^1(|\mu| \times |\mu|), \quad k, i, j \in \mathbb{Z}_+,\ k + i + j \le n.$$
Moreover, if $\int_H \int_H \langle x, y \rangle^k\,d\mu(x)\,d\mu(y) = 0$ for every $0 \le k \le n - 1$, then
$$(-1)^n \int_H \int_H \|x - y\|^{2n}\,d\mu(x)\,d\mu(y) = \sum_{l=0}^{\lfloor n/2 \rfloor} \binom{n}{2l}\binom{2l}{l} 2^{n-2l} \int_H \int_H \langle x, y \rangle^{n-2l} \|x\|^{2l} \|y\|^{2l}\,d\mu(x)\,d\mu(y) \ge 0,$$
and
$$\int_H \int_H \|x - y\|^{2m}\,d\mu(x)\,d\mu(y) = 0, \quad 0 \le m \le n - 1.$$

Proof.

By Lemma 6.1, applied to the CND kernel $\|x - y\|^2$ with $\theta = n$, the condition $\|x - y\|^{2n} \in L^1(|\mu| \times |\mu|)$ is equivalent to $\|x\|^{2n} \in L^1(|\mu|)$. Since $|\langle x, y \rangle^k \|x\|^{2i} \|y\|^{2j}| \le \|x\|^{2i+k} \|y\|^{2j+k}$ and $\|x\|^{2i+k} \le \max\{1, \|x\|^{2n}\}$, we obtain the desired integrability.

Note that
$$\|x - y\|^{2m} = (\|x\|^2 + \|y\|^2 - 2\langle x, y \rangle)^m = \sum_{k=0}^{m} \sum_{i=0}^{m-k} \binom{m}{k}\binom{m-k}{i} (-2)^k \langle x, y \rangle^k \|x\|^{2i} \|y\|^{2(m-k-i)}.$$
If $k + 2i \le n - 1$, then by the hypothesis
$$\begin{aligned} 0 &= \int_H \int_H \langle x, y \rangle^{k+2i}\,d\mu(x)\,d\mu(y) = \int_H \int_H \langle x, y \rangle^k \Big(\sum_{\xi \in I} x_\xi y_\xi\Big)^{2i}\,d\mu(x)\,d\mu(y) \\ &= \int_H \int_H \langle x, y \rangle^k \Big(\sum_{|\alpha| = 2i} \frac{(2i)!}{\alpha!} x^\alpha y^\alpha\Big)\,d\mu(x)\,d\mu(y) = \sum_{|\alpha| = 2i} \frac{(2i)!}{\alpha!} \int_H \int_H \langle x, y \rangle^k x^\alpha y^\alpha\,d\mu(x)\,d\mu(y). \end{aligned}$$
It follows that $\int_H \int_H \langle x, y \rangle^k x^\alpha y^\alpha\,d\mu(x)\,d\mu(y) = 0$ for every $\alpha \in (I, \mathbb{Z}_+)$ with $|\alpha| = 2i$, because the kernel inside the double integral is positive definite, continuous and satisfies the conditions of Lemma 2.1. By the fact recalled before this lemma, $y^\alpha \int_H \langle x, y \rangle^k x^\alpha\,d\mu(x) = 0$ for every $y \in H$. Moreover, for every $y \in H$ and $|\alpha| = 2i$ there exists a sequence $(y_l)_{l \in \mathbb{N}}$ converging to $y$ with $y_l^\alpha \neq 0$. Therefore
$$\int_H \langle x, y \rangle^k x^\alpha\,d\mu(x) = 0, \quad y \in H,\ \alpha \in (I, \mathbb{Z}_+),\ |\alpha| = 2i.$$
Then
$$\int_H \int_H \langle x, y \rangle^k \|x\|^{2i} \|y\|^{2(m-k-i)}\,d\mu(x)\,d\mu(y) = \sum_{|\beta| = m-k-i} \sum_{|\alpha| = i} \frac{(m-k-i)!}{\beta!} \frac{i!}{\alpha!} \int_H \int_H \langle x, y \rangle^k x^{2\alpha} y^{2\beta}\,d\mu(x)\,d\mu(y) = 0.$$
By symmetry, the same double integral is zero when $k + 2(m - k - i) \le n - 1$. The quantities $k + 2i$ and $k + 2(m - k - i)$ add up to $2m$. Hence both are at least $n$ only when $m = n$ and $k + 2i = n$, in which case $i = n - k - i$. After simplifying with these equalities, the terms that remain in the sum when $m = n$ are exactly those in the statement of the lemma. The conclusion follows because the kernel $\langle x, y \rangle^{n-2l} \|x\|^{2l} \|y\|^{2l}$ is continuous, positive definite and satisfies the conditions of Lemma 2.1. □

Corollary 6.3.

Let $\gamma: X \times X \to [0,\infty)$ be a continuous CND kernel such that $x \mapsto \gamma(x,x)$ is a constant function and $2\gamma(x,y) = \gamma(x,x) + \gamma(y,y)$ only when $x = y$. Let $n \in \mathbb{N}$ and $\mu \in \mathcal{M}(X)$ be such that $\gamma^n \in L^1(|\mu| \times |\mu|)$. Then the kernel $K^{-\gamma}$ defined in Theorem 3.1 satisfies $(K^{-\gamma})^m \in L^1(|\mu| \times |\mu|)$ for $0 \le m \le n$. Moreover, if
$$\int_X \int_X K^{-\gamma}(x,y)^m\,d\mu(x)\,d\mu(y) = 0, \quad 0 \le m \le n - 1,$$
then
$$(-1)^n \int_X \int_X \gamma(x,y)^n\,d\mu(x)\,d\mu(y) \ge 0 \quad \text{and} \quad \int_X \int_X \gamma(x,y)^m\,d\mu(x)\,d\mu(y) = 0, \quad 0 \le m \le n - 1.$$

Proof.

By the hypothesis on $\gamma$, there exist a Hilbert space $H$ and a continuous injective function $T: X \to H$ such that $\gamma(x,y) = \|T(x) - T(y)\|_H^2 + c$, where $c \ge 0$ is the value of $\gamma$ on the diagonal. If $\mu \in \mathcal{M}(X)$ is a measure satisfying the conditions of the corollary, then the image measure $\mu_T \in \mathcal{M}(H)$ satisfies the conditions of Lemma 6.2. The conclusion follows from standard properties of image measures. □

Proof of Theorem 4.4. By Equation 2.3, we have that
$$\psi(\gamma(x,y)) = \int_{(0,\infty)} \frac{e^{-\gamma(x,y)r} - e_\ell(r)\,\omega_{\ell,\infty}(\gamma(x,y)r)}{r^\ell}\,d\lambda(r) + \sum_{k=0}^{\ell} a_k \gamma(x,y)^k.$$
By the hypothesis, […] $L^1(|\mu| \times |\mu|)$. Corollary 6.3 implies that
$$\int_X \int_X \sum_{k=0}^{\ell} a_k \gamma(x,y)^k\,d\mu(x)\,d\mu(y) = \int_X \int_X a_\ell \gamma(x,y)^\ell\,d\mu(x)\,d\mu(y) \ge 0.$$
On the other hand, Lemma 6.4 allows us to apply Fubini-Tonelli. The terms of $\omega_{\ell,\infty}$ are polynomials in $\gamma$ of degree at most $\ell - 1$, and their double integrals vanish by Corollary 6.3. Hence
$$\int_X \int_X \Big[\int_{(0,\infty)} \frac{e^{-\gamma(x,y)r} - e_\ell(r)\,\omega_{\ell,\infty}(\gamma(x,y)r)}{r^\ell}\,d\lambda(r)\Big]\,d\mu(x)\,d\mu(y) = \int_{(0,\infty)} \frac{1}{r^\ell} \Big[\int_X \int_X e^{-\gamma(x,y)r}\,d\mu(x)\,d\mu(y)\Big]\,d\lambda(r) \ge 0,$$
because the inner double integral is a nonnegative number for every $r > 0$. Since the representation of $\psi$ is unique, $\lambda((0,\infty)) > 0$ whenever $\psi$ is not a polynomial of degree $\ell$ or less. Suppose, in addition, that $2\gamma(x,y) = \gamma(x,x) + \gamma(y,y)$ only when $x = y$. Then, by [8], the inner double integral is positive for every $r > 0$ whenever $\mu$ is not the zero measure, and so the triple integral is positive as well. □

Lemma 6.4.

There exists $M > 0$, depending only on $\ell \in \mathbb{Z}_+$, for which
$$|e^{-rt} - e_\ell(r)\,\omega_{\ell,\infty}(rt)| \le M r^\ell (1 + t^\ell) \min\{1, r^{-\ell}\}, \quad r > 0,\ t \ge 0. \tag{6.7}$$

Proof.

Note that $r^\ell \min\{1, r^{-\ell}\} = \min\{r^\ell, 1\}$.

Case $r \ge 1$: […]

Case $0 < r < 1$: in this case, the right-hand side of Equation 6.7 is $M(1 + t^\ell) r^\ell$, while the left-hand side satisfies
$$|e^{-rt} - e_\ell(r)\,\omega_{\ell,\infty}(rt)| \le |e^{-rt} - \omega_{\ell,\infty}(rt)| + |(e_\ell(r) - 1)\,\omega_{\ell,\infty}(rt)|.$$
The function $[e^{-s} - \omega_{\ell,\infty}(s)]/s^\ell$ is bounded for $s \in (0,\infty)$, which gives the desired inequality for $|e^{-rt} - \omega_{\ell,\infty}(rt)|$. For the other function we have that
$$|(e_\ell(r) - 1)\,\omega_{\ell,\infty}(rt)| \le \sum_{l=0}^{\ell-1} |(e_\ell(r) - 1) r^l|\, \frac{t^l}{l!}.$$
Similarly, since $e_\ell(r) - 1 = -e^{-r} \sum_{k=\ell}^{\infty} r^k/k!$, the functions $(e_\ell(r) - 1) r^l r^{-\ell}$ are bounded for $r \in (0,1)$. This gives the desired inequality for $|(e_\ell(r) - 1)\,\omega_{\ell,\infty}(rt)|$ as well and concludes the proof. □

Section 5.

Proof of Theorem 5.1. Since $H$ is infinite dimensional, let $(e_\iota)_{\iota \in \mathbb{N}}$ be an orthonormal sequence of vectors in $H$. By the Dominated Convergence Theorem, as $\iota \to \infty$,
$$0 = \int_H \psi(\|x - y - r e_\iota\|^2)\,d\mu(x) \to \int_H \psi(\|x - y\|^2 + r^2)\,d\mu(x), \quad y \in H,\ r \in \mathbb{R}.$$
This holds because $\langle x - y, e_\iota \rangle \to 0$ as $\iota \to \infty$ and $|\psi(t)| \le |\phi(t)| + |\varphi(t)| \lesssim (1+t)^\ell$. This proves the first assertion.

Now suppose that $\psi$ is a polynomial of degree $n$. Let $t_1, \ldots, t_N \in \mathbb{R}$ and $c_1, \ldots, c_N \in \mathbb{R}$, not all null, be such that $\sum_{i=1}^{N} c_i p(t_i) = 0$ for every $p \in \pi_{2n}(\mathbb{R})$. If $\|v\| = 1$, then the measure $\mu := \sum_{i=1}^{N} c_i \delta_{t_i v} \in \mathcal{M}(H)$ is nonzero and
$$\int_H \psi(\|x - y\|^2)\,d\mu(x) = \sum_{i=1}^{N} c_i \psi\big(\|y - \langle y, v \rangle v\|^2 + (\langle y, v \rangle - t_i)^2\big) = 0$$
for every fixed $y \in H$, because, as a function of $t_i$, the summand is a polynomial of degree at most $2n$.

For the converse, we first show that it is sufficient to prove the case $\ell = 0$. The function $c \in (0,\infty) \mapsto F(c) := \psi(\|x - y\|^2 + c) \in \mathbb{R}$ is differentiable for every $x, y \in H$, and $F'(c) = \psi'(c + \|x - y\|^2)$. Since $\psi = \phi - \varphi$ and these functions are elements of $CM_\ell$, we have $|\psi'(t + c)| \lesssim (1+t)^{\ell-1}$ for every $c > 0$. In particular, the derivative belongs to $L^1(|\mu|)$ and
$$\int_H \psi'(c + \|x - y\|^2)\,d\mu(x) = 0, \quad y \in H,\ c > 0. \tag{6.8}$$
For every $c > 0$, $\psi'(c + \cdot)$ is also the difference of two functions in $CM_{\ell-1}$. Hence, by induction, we may assume that $\ell = 0$.

Suppose that $\psi$ is not a polynomial and that $\mu$ is a nonzero measure satisfying the equality in the statement of the theorem. The function $\psi(c + \cdot)$ is the difference of two completely monotone functions on $[0,\infty)$. So there exists a measure $\beta_c$ on $[0,\infty)$ for which
$$\psi(c + t) = \int_{[0,\infty)} e^{-rt}\,d\beta_c(r), \quad c > 0,\ t \ge 0,$$
and $d\beta_{c+s}(r) = e^{-rs}\,d\beta_c(r)$ for every $c, s > 0$. Integrating the function in the hypothesis with respect to the measure $d\mu(y)$, we obtain
$$0 = \int_H \int_H \psi(c + \|x - y\|^2)\,d\mu(x)\,d\mu(y) = \int_{[0,\infty)} \int_H \int_H e^{-r\|x - y\|^2}\,d\mu(x)\,d\mu(y)\,d\beta_c(r), \quad c > 0. \tag{6.9}$$
The continuous and bounded function $I_\mu(r) := \int_H \int_H e^{-r\|x - y\|^2}\,d\mu(x)\,d\mu(y)$, $r \ge 0$, is positive for every $r > 0$ […]. Taking $c = s + 1$, we obtain
$$0 = \int_{[0,\infty)} e^{-sr} I_\mu(r)\,d\beta_1(r), \quad s \ge 0.$$
By the uniqueness of the Laplace transform representation, this can only occur if the finite measure $I_\mu\,d\beta_1$ is the zero measure on $[0,\infty)$. Since $I_\mu > 0$ on $(0,\infty)$, this occurs only if $\beta_1$ is a multiple of $\delta_0$ (with $I_\mu(0) = 0$ if this multiple is nonzero). But then $\psi(1 + t)$ does not depend on $t \ge 0$, so $\psi$ is a constant function, which is a contradiction. □

Proof of Lemma 5.2. By Theorem 5.1 we only need to consider the finite-dimensional case. We prove (i) and (ii) by showing that it is sufficient to prove the cases $\ell = 1, 2$, which follow from (iii) and (iv). For the induction argument on (ii) we work in a more general setting, namely $\psi(t) = t^{\ell-1}\log(t) + b t^{\ell-1}$ with $b \in \mathbb{R}$.

Indeed, suppose that $\ell \ge 3$. Then the function $y \in H \mapsto F(y) := \psi(\|x - y\|^2) \in \mathbb{R}$ is twice differentiable in each direction of an orthonormal basis $(e_\iota)_{\iota \in I}$ for $H$, and
$$\frac{\partial^2 F}{\partial e_\iota^2}(y) = 4\psi''(\|x - y\|^2)(y_\iota - x_\iota)^2 + 2\psi'(\|x - y\|^2).$$
Since $\psi \in C^{\ell-[\dots]}([0,\infty)) \cap CM_\ell$ (or $-\psi$ is; the sign makes no difference for the induction step), we have $|\psi'(t)| \lesssim (1+t)^{\ell-1}$ and $|\psi''(t)| \lesssim (1+t)^{\ell-2}$. In particular, the second derivative belongs to $L^1(|\mu|)$. Summing over $\iota$, with $m = \dim(H)$, we obtain
$$0 = \int_H \Big(4\psi''(\|x - y\|^2)\|x - y\|^2 + 2m\,\psi'(\|x - y\|^2)\Big)\,d\mu(x), \quad y \in H. \tag{6.10}$$
When $\psi$ is of type (i), the integrand in this equation is a positive multiple of $\|x - y\|^{a-2}$. When $\psi$ is of type (ii), it is a positive multiple of $\|x - y\|^{2\ell-4}\log(\|x - y\|^2)$ plus a multiple of $\|x - y\|^{2\ell-4}$. This gives the induction step.

Now let $\psi \in CM_\ell$, $\ell = 1, 2$, be arbitrary and not a polynomial. For every $t > 0$, define $\eta_t := t\mu - \tau_t$, where $\tau_t := t\mu(H)\delta_0 + (\delta_{t v_\mu} - \delta_{-t v_\mu})/2$. Here $v_\mu$ is the vector mean of $\mu$, that is,
$$\int_H \langle x, y \rangle\,d\mu(x) = \langle v_\mu, y \rangle, \quad y \in H.$$
In the case $\ell = 1$, $v_\mu$ might not be well defined; in that case we set it equal to the zero vector. Then $\eta_t(H) = 0$ and, whenever it is well defined, $v_{\eta_t} = 0$. By the hypothesis,
$$4\int_H \int_H \psi(\|x - y\|^2)\,d\eta_t(x)\,d\eta_t(y) = 4\int_H \int_H \psi(\|x - y\|^2)\,d\tau_t(x)\,d\tau_t(y) = \psi(0)\big(4t^2\mu(H)^2 + 2\big) - 2\psi(4t^2\|v_\mu\|^2).$$
By Theorem 4.4, this is a nonnegative number for every $t > 0$. […] $(-1)^\ell \psi(t)$ converges to $+\infty$ as $t \to \infty$, so if $\|v_\mu\| \neq 0$ […] $\psi(0) <$ […] Hence $v_\mu = 0$ and $\psi(0) = 0$. In particular, the double integral with respect to $\eta_t$ is zero. Since $\psi$ is not a polynomial, Theorem 4.4 then gives $\mu = \mu(H)\delta_0$. Combining this equality with the initial assumption on $\mu$, we obtain $\mu(H)\psi(\|y\|^2) = 0$ for every $y \in H$. Since $\psi$ is not a polynomial, this can only occur if $\mu$ is the zero measure. The case $\ell =$ […]. □

REFERENCES

[1] C. Berg, J. Christensen, and P. Ressel, Harmonic analysis on semigroups: theory of positive definite and related functions, vol. 100 of Graduate Texts in Mathematics, Springer, 1984.

[2] NIST Digital Library of Mathematical Functions. F. W. J. Olver, A. B. Olde Daalhuis, D. W. Lozier, B. I. Schneider, R. F. Boisvert, C. W. Clark, B. R. Miller, B. V. Saunders, H. S. Cohl, and M. A. McClain, eds.

[3] J. Faraut and K. Harzallah, Distances hilbertiennes invariantes sur un espace homogène, Annales de l'Institut Fourier, 24 (1974), pp. 171–217.

[4] K. Fukumizu, F. R. Bach, and M. I. Jordan, Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces, Journal of Machine Learning Research, 5 (2004), pp. 73–99.

[5] R. Gangolli, Positive definite kernels on homogeneous spaces and certain stochastic processes related to Lévy Brownian motion of several parameters, Annales de l'Institut Henri Poincaré Probabilités et Statistiques, 3 (1967), pp. 121–226.

[6] I. M. Gelfand and N. Y. Vilenkin, Generalized Functions, Vol. 4: Applications of Harmonic Analysis, Academic Press, 1964.

[7] A. Gretton, K. Borgwardt, M. Rasch, B. Schölkopf, and A. Smola, A kernel method for the two-sample-problem, Advances in Neural Information Processing Systems, 19 (2006), pp. 513–520.

[8] J. C. Guella, On Gaussian kernels on Hilbert spaces and kernels on Hyperbolic spaces, arXiv e-prints, (2020), p. arXiv:2007.14697.

[9] K. Guo, S. Hu, and X. Sun, Conditionally positive definite functions and Laplace-Stieltjes integrals, Journal of Approximation Theory, 74 (1993), pp. 249–265.

[10] A. L. Koldobskii, Isometric operators in vector-valued $l_p$-spaces, Journal of Soviet Mathematics, 36 (1987), pp. 420–423.

[11] W. Linde, On Rudin's equimeasurability theorem for infinite dimensional Hilbert spaces, Indiana University Mathematics Journal, 35 (1986), pp. 235–243.

[12] R. Lyons, Distance covariance in metric spaces, Ann. Probab., 41 (2013), pp. 3284–3305.

[13] R. Lyons, Hyperbolic space has strong negative type, Illinois J. Math., 58 (2014), pp. 1009–1013.

[14] R. Lyons, Strong negative type in spheres, Pacific Journal of Mathematics, 307 (2020), pp. 383–390.

[15] L. Mattner, Strict definiteness of integrals via complete monotonicity of derivatives, Transactions of the American Mathematical Society, 349 (1997), pp. 3321–3342.

[16] C. A. Micchelli, Interpolation of scattered data: distance matrices and conditionally positive definite functions, Constructive Approximation, 2 (1984), pp. 11–22.

[17] C. A. Micchelli, Y. Xu, and H. Zhang, Universal kernels, Journal of Machine Learning Research, 7 (2006), pp. 2651–2667.

[18] R. L. Schilling, R. Song, and Z. Vondracek, Bernstein functions: theory and applications, vol. 37, Walter de Gruyter, 2012.

[19] D. Sejdinovic, B. Sriperumbudur, A. Gretton, and K. Fukumizu, Equivalence of distance-based and RKHS-based statistics in hypothesis testing, The Annals of Statistics, (2013), pp. 2263–2291.

[20] C.-J. Simon-Gabriel and B. Schölkopf, Kernel distribution embeddings: Universal kernels, characteristic kernels and kernel metrics on distributions, Journal of Machine Learning Research, 19 (2018), pp. 1–29.

[21] B. K. Sriperumbudur, K. Fukumizu, and G. R. Lanckriet, Universality, characteristic kernels and RKHS embedding of measures, Journal of Machine Learning Research, 12 (2011), pp. 2389–2410.

[22] I. Steinwart and A. Christmann, Support vector machines, Springer Science & Business Media, 2008.

[23] G. J. Székely and M. L. Rizzo, Energy statistics: A class of statistics based on distances, Journal of Statistical Planning and Inference, 143 (2013), pp. 1249–1272.

[24] G. J. Székely, M. L. Rizzo, et al., Testing for equal distributions in high dimension, InterStat, 5 (2004), pp. 1249–1272.

[25] H. Wendland, Scattered data approximation, vol. 17, Cambridge University Press, 2005.

RIKEN Center for Advanced Intelligence Project, Tokyo, Japan