Generalization of the energy distance by Bernstein functions
arXiv: [math.FA], Feb
J. C. GUELLA

ABSTRACT. We reprove the well-known fact that the energy distance defines a metric on the space of Borel probability measures with finite first moment on a Hilbert space. The new approach analyses the behaviour of the Gaussian kernel on Hilbert spaces together with a Maximum Mean Discrepancy argument. From this point of view we are able to generalize the energy distance metric to a family of kernels related to Bernstein functions and conditionally negative definite kernels. We also explain what happens to the energy distance for the kernel $\|x-y\|^{\alpha}$ for every $\alpha > 2$, and we generalize this idea to a family of kernels related to derivatives of completely monotone functions and conditionally negative definite kernels.

CONTENTS
1. Introduction
2. Definitions
3. Conditionally positive definite kernels
4. Inner products defined by CND kernels and derivatives of completely monotone functions
5. Space of functions defined by derivatives of completely monotone functions
6. Proofs
6.1. Section 3
6.2. Section 4
6.3. Section 5
Key words and phrases. Energy distance; Metric spaces of strong negative type; Metrics on probabilities; Bernstein functions; Conditionally negative definite kernels.

1. INTRODUCTION

A popular method to compare two probabilities is to embed the space (or a subset) of probabilities into a Hilbert space and use the metric provided by the embedding. Currently, there are two main approaches for this task:

(I) The maximum mean discrepancy on a bounded, continuous, positive definite kernel $K : X \times X \to \mathbb{R}$ that is characteristic [7], [4]. The distance between two Radon regular probabilities $P$ and $Q$ is defined by
$$MMD(P,Q) := \sqrt{\int_X \int_X K(x,y)\, d[P-Q](x)\, d[P-Q](y)}.$$

(II) The use of a continuous conditionally negative definite kernel $\gamma : X \times X \to \mathbb{R}$ with $\gamma(x,x) = 0$ for every $x \in X$, [19]. The kernel $\gamma$ must additionally satisfy the equality
$$(1.1)\qquad \int_X \int_X -\gamma(x,y)\, d[P-Q](x)\, d[P-Q](y) = 0,$$
for Radon regular probabilities $P$ and $Q$ that integrate the function $x \to \gamma(x,z)$ for every $z \in X$, only when $P = Q$. It can be proved that the above double integral is always a nonnegative number, and when this property holds
$$D_{\gamma}(P,Q) := \sqrt{\int_X \int_X -\gamma(x,y)\, d[P-Q](x)\, d[P-Q](y)}$$
is a metric on the mentioned subspace of probabilities on $X$.

In this paper, we focus on the second method.

The most popular example of this method is the energy distance, initially defined for $X = \mathbb{R}^m$ and $\gamma(x,y) = \|x-y\|^{\theta}$, where $0 < \theta < 2$ and the probabilities must integrate the function $\|x\|^{\theta}$, [24], [23]. When $\theta = 2$, the kernel is conditionally negative definite but does not satisfy the additional property of Equation 1.1.

A more geometrical approach is when $\gamma$ is a metric on $X$ that satisfies Equation 1.1 (the topology is the one from the metric), hence $(X, \gamma)$ is a metric space of strong negative type. Examples of such spaces include:
• Hilbert spaces: proved in [12] as a generalization of the energy distance.
• Hyperbolic spaces (finite dimensional): proved in [13].

In some cases, the conditionally negative definite kernel $\gamma$ may define a metric on the set $X$, but $\gamma$ is not of strong type. A metric space where we only know that the distance is a conditionally negative definite kernel is called a metric space of negative type. An example of such a space is the real sphere, proved in [5], where it is also proved that the real, complex and quaternionic projective spaces and the Cayley projective plane are not metric spaces of negative type.

In [12], it is also proved that if $(X, \gamma)$ is a metric space of negative type then $\gamma^{\theta}$, $0 < \theta < 1$, satisfies Equation 1.1, whether or not the same holds for $\gamma$. Interestingly, the kernel $\gamma^{\theta}$ is a metric on $X$, with the same topology as $\gamma$, so we can rephrase the result of Lyons as $(X, \gamma^{\theta})$ being a metric space of strong negative type. We provide more details and generalizations of this property in Corollary 4.3.

The major aim of this paper is to provide a large number of examples of conditionally negative definite kernels that satisfy Equation 1.1, by using Bernstein functions in Theorem 4.1. Our method encompasses all of the above mentioned kernels that satisfy (II). We also provide a new proof that hyperbolic spaces (of any dimension) are metric spaces of strong negative type in Theorem 4.2.
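The energy distance described above can be sketched numerically for empirical measures. The snippet below is a minimal illustration, not part of the original paper: the function names `mean_power_distance` and `energy_distance_sq` are hypothetical, and the samples are assumed to be tuples of floats in $\mathbb{R}^m$. It evaluates the double integral of Equation 1.1 for $\gamma(x,y) = \|x-y\|^{\theta}$ when $P$ and $Q$ are the empirical measures of two finite samples.

```python
import math
from itertools import product


def mean_power_distance(xs, ys, theta):
    """Average of ||x - y||^theta over all pairs (x, y) in xs x ys."""
    total = sum(math.dist(x, y) ** theta for x, y in product(xs, ys))
    return total / (len(xs) * len(ys))


def energy_distance_sq(xs, ys, theta=1.0):
    """Squared energy distance between the empirical measures of xs and ys.

    This is the double integral of -||x - y||^theta against
    [P - Q] x [P - Q], expanded as 2*E|X-Y| - E|X-X'| - E|Y-Y'|.
    """
    return (2.0 * mean_power_distance(xs, ys, theta)
            - mean_power_distance(xs, xs, theta)
            - mean_power_distance(ys, ys, theta))
```

For the one-dimensional samples `[(-1.0,), (1.0,)]` and `[(-2.0,), (2.0,)]`, which have the same mean, the value is 1 for `theta=1.0` but 0 for `theta=2.0`. This matches the remark that for $\theta = 2$ the kernel cannot separate distinct probabilities.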
In [15], Mattner analysed the behaviour of the kernel $\|x-y\|^{\alpha}$, for $\alpha > 2$, defined on $\mathbb{R}^m$. What occurs is that we can still provide a metric structure on the space of probabilities with certain integrability assumptions, but we can only compare them if they have the same vector mean (case $2 < \alpha < 4$), with further restrictions of the same kind for larger values of $\alpha$. We also study functions of the form
$$y \in \mathcal{H} \to \int_{\mathcal{H}} \psi(\|x-y\|^{2})\, d\mu(x) \in \mathbb{R},$$
where $\psi$ is a continuous function that is the difference of two derivatives (of the same order) of a completely monotone function. More precisely, we analyse when such a function uniquely determines the measure $\mu$. Section 2 is entirely focused on the definitions that we use. The proofs are presented in Section 6.

2. DEFINITIONS
We recall that a nonnegative measure $\lambda$ on a Hausdorff space $X$ is Radon regular (which we simply refer to as Radon) when it is a Borel measure that is finite on every compact set of $X$ and
(i) (Inner regular) $\lambda(E) = \sup\{\lambda(K),\ K \text{ is compact},\ K \subset E\}$ for every Borel set $E$.
(ii) (Outer regular) $\lambda(E) = \inf\{\lambda(U),\ U \text{ is open},\ E \subset U\}$ for every Borel set $E$.

We then say that a complex valued measure $\lambda$ of bounded variation is Radon if its variation is a Radon measure. The vector space of such measures is denoted by $\mathfrak{M}(X)$. Recall that every Borel measure of finite variation (in particular, every probability measure) on a separable complete metric space is necessarily Radon.

A semi-inner product on a real (complex) vector space $V$ is a bilinear real (sesquilinear complex) valued function $(\cdot,\cdot)_V$ defined on $V \times V$ such that $(u,u)_V \geq 0$ for every $u \in V$. When this inequality is an equality only for $u = 0$, we say that $(\cdot,\cdot)_V$ is an inner product. Similarly, a pseudometric on a set $X$ is a symmetric function $d : X \times X \to [0,\infty)$ such that $d(x,x) = 0$ and that satisfies the triangle inequality. If $d(x,y) = 0$ only when $x = y$, then $d$ is a metric on $X$.

A kernel $K : X \times X \to \mathbb{C}$ is called positive definite if for every finite quantity of distinct points $x_1, \ldots, x_n \in X$ and scalars $c_1, \ldots, c_n \in \mathbb{C}$, we have that
$$\int_X \int_X K(x,y)\, d\lambda(x)\, d\overline{\lambda}(y) = \sum_{i,j=1}^{n} c_i \overline{c_j} K(x_i, x_j) \geq 0,$$
where $\lambda = \sum_{i=1}^{n} c_i \delta_{x_i}$. The set of measures on $X$ of this form is denoted by the symbol $\mathfrak{M}_{\delta}(X)$.

The reproducing kernel Hilbert space (RKHS) of a positive definite kernel $K : X \times X \to \mathbb{C}$ is the Hilbert space $\mathcal{H}_K \subset \mathcal{F}(X, \mathbb{C})$, and it satisfies [22]
(i) $x \in X \to K_y(x) := K(x,y) \in \mathcal{H}_K$;
(ii) $\langle K_y, K_x \rangle = K(x,y)$;
(iii) $\overline{\mathrm{span}\{K_y,\ y \in X\}} = \mathcal{H}_K$.
When $X$ is a Hausdorff space and $K$ is continuous it holds that $\mathcal{H}_K \subset C(X)$.

The following widely known result describes how it is possible to define a semi-inner product structure on a subspace of $\mathfrak{M}(X)$ using a continuous positive definite kernel.

Lemma 2.1.
If $K : X \times X \to \mathbb{C}$ is a continuous positive definite kernel and $\mu \in \mathfrak{M}(X)$ with $\sqrt{K(x,x)} \in L^1(|\mu|)$ (written $\mu \in \mathfrak{M}_{\sqrt{K}}(X)$), then
$$z \in X \to K_{\mu}(z) := \int_X K(x,z)\, d\mu(x) \in \mathbb{C}$$
is an element of $\mathcal{H}_K$, and if $\eta$ is another measure with the same conditions as $\mu$, we have that
$$\langle K_{\eta}, K_{\mu} \rangle_{\mathcal{H}_K} = \int_X \int_X K(x,y)\, d\eta(x)\, d\overline{\mu}(y).$$
In particular, $(\eta, \mu) \in \mathfrak{M}_{\sqrt{K}}(X) \times \mathfrak{M}_{\sqrt{K}}(X) \to \langle K_{\eta}, K_{\mu} \rangle_{\mathcal{H}_K}$ is a semi-inner product.

We present a generalization of this result to a larger class of measures in Lemma 3.6. Usually, the kernel $K$ is bounded, so $\mathfrak{M}_{\sqrt{K}}(X) = \mathfrak{M}(X)$. In this case, if the semi-inner product is in fact an inner product we say that $K$ is integrally strictly positive definite (ISPD), and when it is an inner product on the vector space of measures $\mu \in \mathfrak{M}(X)$ with $\mu(X) = 0$, we say that $K$ is characteristic. If the kernel $K$ is real valued, it is sufficient to analyse the ISPD and characteristic properties on real valued measures.

When the kernel is characteristic we define the maximum mean discrepancy (MMD) as the metric on the space of probability measures in $\mathfrak{M}(X)$ given by
$$(2.2)\qquad MMD(P,Q)_K := \sqrt{\langle K_P - K_Q, K_P - K_Q \rangle_{\mathcal{H}_K}} = \sqrt{\int_X \int_X K(x,y)\, d[P-Q](x)\, d[P-Q](y)}.$$

As mentioned in the introduction, the focus of this paper is to analyse metrics on the space of probabilities using conditionally negative definite kernels. We present a more general definition which will be useful for the analysis of the energy distance through the kernel $\|x-y\|^{\alpha}$, $\alpha > 2$, defined on a Hilbert space.
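As an illustration of Equation 2.2, the following sketch evaluates the MMD between the empirical measures of two finite samples. It is an assumption-laden example rather than the author's construction: the names `gaussian_kernel`, `mean_kernel` and `mmd` are hypothetical, samples are tuples of floats, and the Gaussian kernel is used only because it is a standard bounded kernel.

```python
import math
from itertools import product


def gaussian_kernel(x, y, r=1.0):
    """Gaussian kernel exp(-r * ||x - y||^2) on Euclidean tuples."""
    return math.exp(-r * math.dist(x, y) ** 2)


def mean_kernel(xs, ys, kernel):
    """Average of kernel(x, y) over all pairs (x, y) in xs x ys."""
    total = sum(kernel(x, y) for x, y in product(xs, ys))
    return total / (len(xs) * len(ys))


def mmd(xs, ys, kernel=gaussian_kernel):
    """MMD between the empirical measures P (of xs) and Q (of ys).

    Expands the double integral of K against [P - Q] x [P - Q].
    The max(...) guards against tiny negative rounding errors.
    """
    sq = (mean_kernel(xs, xs, kernel)
          + mean_kernel(ys, ys, kernel)
          - 2.0 * mean_kernel(xs, ys, kernel))
    return math.sqrt(max(sq, 0.0))
```

Calling `mmd(xs, xs)` returns zero, and the value is symmetric in its two arguments up to rounding, as expected of a metric.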
Definition 2.2.
Let $\gamma : X \times X \to \mathbb{C}$ be a Hermitian kernel and $P$ a finite dimensional space of functions from $X$ to $\mathbb{C}$. We say that $\gamma$ is $P$-conditionally positive definite ($P$-CPD) if for every finite quantity of points $x_1, \ldots, x_n \in X$ and scalars $c_1, \ldots, c_n \in \mathbb{C}$, under the restriction that $\sum_{i=1}^{n} c_i p(x_i) = 0$ for every $p \in P$, we have that
$$\sum_{i,j=1}^{n} c_i \overline{c_j} \gamma(x_i, x_j) \geq 0.$$

This definition generalizes the concepts of positive definite kernels ($P$ is the zero space) and CPD kernels ($P$ is the set of constant functions). The most important example is when $X$ is a finite dimensional Euclidean space and $P$ is the set of multivariable polynomials on $X$ with degree less than or equal to a constant $k \in \mathbb{N}$, [25], [9], [6]. Sometimes it might be more convenient to work with the opposite sign in Definition 2.2; in this case we say that the kernel is $P$-conditionally negative definite ($P$-CND).

In [9], [16], a characterization is proved for the continuous functions $\psi : [0,\infty) \to \mathbb{R}$ such that the kernel
$$(x,y) \in \mathbb{R}^m \times \mathbb{R}^m \to \psi(\|x-y\|^{2}) \in \mathbb{R}$$
is CPD for $P$ the family of multivariable polynomials of degree less than a fixed $\ell \in \mathbb{Z}_+$ (we denote this family by $\pi_{\ell-1}(\mathbb{R}^m)$, where $\pi_{-1}(\mathbb{R}^m) = \{0\}$ and $\pi_0(\mathbb{R}^m) = \{\text{constant functions}\}$) for every $m \in \mathbb{N}$. A function $\psi$ satisfies this property if and only if $\psi \in C^{\infty}(0,\infty)$ and $(-1)^{\ell} \psi^{(\ell)}$ is a completely monotone function on $(0,\infty)$. A function with this property can be uniquely written as
$$(2.3)\qquad \psi(t) = \int_{(0,\infty)} \frac{e^{-tr} - e_{\ell}(r)\, \omega_{\ell,\infty}(rt)}{r^{\ell}}\, d\lambda(r) + \sum_{k=0}^{\ell} a_k t^k,$$
where $\lambda$ is a nonnegative Radon measure on $(0,\infty)$ (not necessarily with finite variation) with
$$\omega_{\ell,\infty}(s) := \sum_{l=0}^{\ell-1} (-1)^l \frac{s^l}{l!}, \qquad e_{\ell}(s) := e^{-s} \sum_{l=0}^{\ell-1} \frac{s^l}{l!}, \qquad \int_{(0,\infty)} \min\{1, r^{-\ell}\}\, d\lambda(r) < \infty,$$
and $a_k \in \mathbb{R}$, $(-1)^{\ell} a_{\ell} \geq 0$. We denote the set of continuous functions with this property by $CM_{\ell}$. For $\ell = 0$, $\omega_{0,\infty}$ is the zero function.
For instance, the functions
i) $(-1)^{\ell} t^{a} + p(t)$;
ii) $(-1)^{\ell+1} t^{\ell} \log(t) + p(t)$;
iii) $(-1)^{\ell} (c+t)^{a} + p(t)$;
iv) $e^{-rt} + p(t)$,
are elements of $CM_{\ell}$, for $\ell - 1 < a \leq \ell$, $c > 0$ and $p \in \pi_{\ell-1}$. Those functions are not only in $CM_{\ell}$, but they are $\ell - 1$ times differentiable on $[0,\infty)$, and we have a similar and simpler characterization compared to Equation 2.3 for them.

In general, a function $\psi \in CM_{\ell}$ is such that $\psi \in C^{\ell-1}([0,\infty))$ if and only if
$$(2.4)\qquad \psi(t) = \int_{(0,\infty)} \frac{e^{-tr} - \omega_{\ell,\infty}(rt)}{r^{\ell}}\, d\eta(r) + \sum_{k=0}^{\ell} b_k t^k,$$
where $\eta$ is a nonnegative Radon measure on $(0,\infty)$ (not necessarily with finite variation) with
$$\omega_{\ell,\infty}(s) := \sum_{l=0}^{\ell-1} (-1)^l \frac{s^l}{l!}, \qquad \int_{(0,\infty)} \min\{1, r^{-\ell}\}\, d\eta(r) < \infty,$$
$b_k = \psi^{(k)}(0)/k!$ for $k < \ell$ and $(-1)^{\ell} b_{\ell} \geq 0$.

If $\psi \in CM_{\ell}$ then $\psi(\cdot + c) \in CM_{\ell} \cap C^{\ell-1}([0,\infty))$ for every $c > 0$. In this case, the measure $\eta_c$ relative to the decomposition given in Equation 2.4 has finite variation and satisfies $d\eta_{c+s}(r) = e^{-sr}\, d\eta_c(r)$ for every $c, s > 0$. This property and the decomposition given in Equation 2.4 are implicitly proved in Theorem 2.… A polynomial $p$ belongs to $CM_{\ell}$ if and only if $p \in \pi_{\ell}(\mathbb{R})$ and the constant $(-1)^{\ell} p^{(\ell)} \geq 0$.
By Lemma 2.…, a function $\psi \in CM_{\ell}$ satisfies $|\psi(t)| \lesssim 1 + t^{\ell}$ (this notation means that $|\psi(t)|/(1 + t^{\ell})$ is a bounded function).

3. CONDITIONALLY POSITIVE DEFINITE KERNELS
The following known result states a connection between positive definite kernels and $P$-CPD kernels [25]. A Lagrange basis for $P$ is a basis $\{p_1, \ldots, p_m\}$ of $P$ and points $\xi_1, \ldots, \xi_m \in X$ such that $p_i(\xi_j) = \delta_{i,j}$. A set of points $\xi_1, \ldots, \xi_m \in X$ is unisolvent with respect to an $m$-dimensional space $P$ if the only function $p \in P$ such that $p(\xi_i) = 0$ for every $i$ is the zero function.

Theorem 3.1.
Let $\xi_1, \ldots, \xi_m \in X$ and $p_1, \ldots, p_m$ be a Lagrange basis for a finite dimensional space $P$ of functions from $X$ to $\mathbb{C}$. A Hermitian kernel $\gamma : X \times X \to \mathbb{C}$ is $P$-CPD if and only if the Hermitian kernel
$$K^{\gamma}(x,y) := \gamma(x,y) - \sum_{k=1}^{m} p_k(x) \gamma(\xi_k, y) - \sum_{l=1}^{m} \overline{p_l(y)} \gamma(x, \xi_l) + \sum_{k,l=1}^{m} p_k(x) \overline{p_l(y)} \gamma(\xi_k, \xi_l)$$
is positive definite.

This result can be easily seen from the fact that if $x_1, \ldots, x_n \in X$ and $c_1, \ldots, c_n \in \mathbb{C}$ are such that $\sum_{i=1}^{n} c_i p(x_i) = 0$ for every $p \in P$, then
$$\sum_{i,j=1}^{n} c_i \overline{c_j} K^{\gamma}(x_i, x_j) = \sum_{i,j=1}^{n} c_i \overline{c_j} \gamma(x_i, x_j),$$
and conversely, if $z_1, \ldots, z_{m+n} \in X$ (with $z_{n+k} = \xi_k$) and $d_1, \ldots, d_{m+n} \in \mathbb{C}$, then
$$\sum_{i,j=1}^{m+n} d_i \overline{d_j} K^{\gamma}(z_i, z_j) = \sum_{i,j=1}^{m+n} e_i \overline{e_j} \gamma(z_i, z_j),$$
where $e_i = d_i$ for $i \leq n$ and $e_{n+k} = -\sum_{j=1}^{n} d_j p_k(z_j)$ for $1 \leq k \leq m$.

Similar to continuous positive definite kernels, continuous $P$-CPD kernels can be analysed by their behaviour on a certain type of space of measures.

Definition 3.2.
Let $X$ be a Hausdorff space and $P \subset C(X)$ a finite dimensional vector space. We define the set
$$\mathfrak{M}_P(X) := \left\{ \mu \in \mathfrak{M}(X),\ \int_X |p(x)|\, d|\mu|(x) < \infty \text{ and } \int_X p(x)\, d\mu(x) = 0 \text{ for every } p \in P \right\}.$$

Theorem 3.3.
A continuous Hermitian kernel $\gamma : X \times X \to \mathbb{C}$ is $P$-CPD if and only if for every $\mu \in \mathfrak{M}_P(X)$ for which $\gamma(x,y) \in L^1(|\mu| \times |\mu|)$ and $\gamma(x, \xi_i) \in L^1(|\mu|)$, where $(\xi_i)_{1 \leq i \leq m}$ is unisolvent, we have that
$$\int_X \int_X \gamma(x,y)\, d\mu(x)\, d\overline{\mu}(y) \geq 0.$$

If we restrict the measures in Theorem 3.3 to those for which $\gamma(x, \xi) \in L^1(|\mu|)$ for every $\xi \in X$, then the kernel $\gamma$ defines a semi-inner product on this vector space.

When $P$ is the space generated by a single function $p$, we can simplify the assumptions of Theorem 3.3.
Lemma 3.4.
Let $\gamma : X \times X \to \mathbb{C}$ be a continuous Hermitian kernel and $[p] = P \subset C(X)$ be a one dimensional vector space. Then, $\gamma$ is $P$-CPD if and only if for every $\mu \in \mathfrak{M}_P(X)$ for which $\gamma(x,y) \in L^1(|\mu| \times |\mu|)$ we have that
$$\int_X \int_X \gamma(x,y)\, d\mu(x)\, d\overline{\mu}(y) \geq 0.$$
Additionally, if $p$ and $\gamma$ are real valued functions such that $p(x) \neq 0$ for every $x \in X$ and the function $\gamma(x,x)/p(x)^{2}$ is bounded, the following assertions are equivalent:
(i) $\gamma \in L^1(|\mu| \times |\mu|)$;
(ii) The function $x \in X \to \gamma(x,z) \in L^1(|\mu|)$ for some $z \in X$;
(iii) The function $x \in X \to \gamma(x,z) \in L^1(|\mu|)$ for every $z \in X$.
As a direct consequence of the previous Lemma we obtain that if the function $\gamma(x,x)/p(x)^{2}$ is bounded, the set of measures in $\mathfrak{M}_P(X)$ that integrate $\gamma(x,y)$ is a vector space and the double integral defines a semi-inner product on it. We focus on the CPD case with $\gamma$ real valued due to its relevance.

Corollary 3.5.
Let $\gamma : X \times X \to \mathbb{R}$ be a continuous CPD kernel such that the function $\gamma(x,x)$ is bounded. The semi-inner product
$$(\mu, \nu) \in \mathfrak{M}_1(X, \gamma) \times \mathfrak{M}_1(X, \gamma) \to I(\mu, \nu)_{\gamma} := \int_X \int_X \gamma(x,y)\, d\mu(x)\, d\nu(y) \in \mathbb{R}$$
is well defined on the vector space
$$\mathfrak{M}_1(X, \gamma) := \{ \eta \in \mathfrak{M}(X),\ \eta(X) = 0,\ \gamma \in L^1(|\eta| \times |\eta|) \}.$$

In the next lemma we weaken the condition $\sqrt{K(x,x)} \in L^1(|\mu|)$ and enlarge the set of measures analysed in Lemma 2.1, at the cost of describing the function $K_{\mu}$ only up to a set of $|\mu|$ measure zero.

Lemma 3.6.
Let $K : X \times X \to \mathbb{C}$ be a continuous positive definite kernel. Let $\mu \in \mathfrak{M}(X)$ be such that $K(x,y) \in L^1(|\mu| \times |\mu|)$. Then the set of points
$$X_{\mu} := \{ z \in X,\ K(\cdot, z) \in L^1(|\mu|) \}$$
is such that $|\mu|(X - X_{\mu}) = 0$, and the function
$$z \in X_{\mu} \to \int_X K(x,z)\, d\mu(x) \in \mathbb{C}$$
is the restriction of an element $K_{\mu} \in \mathcal{H}_K$. If $\eta$ is a measure with the same conditions as the measure $\mu$ and $K \in L^1(|\mu| \times |\eta|)$, we have that
$$\langle K_{\eta}, K_{\mu} \rangle_{\mathcal{H}_K} = \int_X \int_X K(x,y)\, d\eta(x)\, d\overline{\mu}(y).$$
4. INNER PRODUCTS DEFINED BY CND KERNELS AND DERIVATIVES OF COMPLETELY MONOTONE FUNCTIONS
Since all kernels that we deal with in this section are real valued, we simplify the writing by focusing only on real valued measures (for which we still use the notation $\mathfrak{M}(X)$). As mentioned in Section 2, this is not a restriction.

In [12], it is proved that on a separable real Hilbert space $\mathcal{H}$, the bilinear function $I_{1/2}$ defined as
$$(\mu, \nu) \in \mathfrak{M}_1(\mathcal{H}) \times \mathfrak{M}_1(\mathcal{H}) \to I(\mu, \nu)_{1/2} := \int_{\mathcal{H}} \int_{\mathcal{H}} -\|x-y\|_{\mathcal{H}}\, d\mu(x)\, d\nu(y)$$
defines an inner product on the vector space
$$\mathfrak{M}_1(\mathcal{H}) := \{ \eta \in \mathfrak{M}(\mathcal{H}),\ \eta(\mathcal{H}) = 0,\ \|x\| \in L^1(|\eta|) \}.$$

The function $t \in [0,\infty) \to \psi(t) := \sqrt{t} \in \mathbb{R}$ is an example of a Bernstein function, [18]. It is continuous, $\psi \in C^{\infty}((0,\infty))$ and $\psi'$ is a completely monotone function on $(0,\infty)$ (in our context we do not need to assume that Bernstein functions are nonnegative). In other words, a function $\psi$ is a Bernstein function if and only if $-\psi \in CM_1$, and then it can be written, by Equation 2.4 for $\ell = 1$, as
$$-\sqrt{t} = \frac{1}{2\sqrt{\pi}} \int_{(0,\infty)} \frac{e^{-rt} - 1}{r^{3/2}}\, dr.$$
So,
$$(x,y) \in \mathcal{H} \times \mathcal{H} \to -\|x-y\|_{\mathcal{H}} = \frac{1}{2\sqrt{\pi}} \int_{(0,\infty)} \frac{e^{-r\|x-y\|^{2}} - 1}{r^{3/2}}\, dr,$$
and this kernel is CPD. The Gaussian kernels $e^{-r\|x-y\|^{2}}$, $r > 0$, are ISPD on every Hilbert space [8]. Therefore, by the Fubini-Tonelli Theorem, if $\mu \in \mathfrak{M}(\mathcal{H})$ with $\mu(\mathcal{H}) = 0$ and $\|x\| \in L^1(|\mu|)$, then
$$\int_{\mathcal{H}} \int_{\mathcal{H}} (-1)\|x-y\|_{\mathcal{H}}\, d\mu(x)\, d\mu(y) = \frac{1}{2\sqrt{\pi}} \int_{(0,\infty)} \left( \int_{\mathcal{H}} \int_{\mathcal{H}} e^{-r\|x-y\|^{2}}\, d\mu(x)\, d\mu(y) \right) \frac{1}{r^{3/2}}\, dr \geq 0.$$
Here the constant term $-1$ integrates to zero because $\mu(\mathcal{H}) = 0$. Further, the inner double integral is positive whenever $\mu$ is not the zero measure, implying that the final result is a positive number. This is the key argument to verify that $I_{1/2}$ is an inner product, and it recovers the main result of [12] by a completely different argument. More generally, we have the following result.

Theorem 4.1.
Let $\psi : [0,\infty) \to \mathbb{R}$ be a Bernstein function and $\gamma : X \times X \to [0,\infty)$ be a continuous CND kernel such that $x \to \gamma(x,x)$ is a bounded function. Consider the vector space
$$\mathfrak{M}_1(X; \gamma, \psi) := \{ \eta \in \mathfrak{M}(X),\ \psi(\gamma(x,y)) \in L^1(|\eta| \times |\eta|) \text{ and } \eta(X) = 0 \}.$$
Then the function
$$(\mu, \nu) \in \mathfrak{M}_1(X; \gamma, \psi) \times \mathfrak{M}_1(X; \gamma, \psi) \to I(\mu, \nu)_{\gamma, \psi} := -\int_X \int_X \psi(\gamma(x,y))\, d\mu(x)\, d\nu(y)$$
defines a semi-inner product on $\mathfrak{M}_1(X; \gamma, \psi)$. If $\psi$ is not a linear function and $2\gamma(x,y) = \gamma(x,x) + \gamma(y,y)$ only when $x = y$, then $I(\mu, \nu)_{\gamma, \psi}$ defines an inner product on $\mathfrak{M}_1(X; \gamma, \psi)$.
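The statement of Theorem 4.1 can be checked numerically on discrete measures with total mass zero. The sketch below is only an illustration under stated assumptions: it takes $\gamma(x,y) = \|x-y\|^{2}$ on Euclidean tuples, the names `quadratic_form` and `bernstein_examples` are hypothetical, and the listed functions are standard examples of Bernstein functions rather than a list taken from the paper.

```python
import math


def quadratic_form(points, weights, psi):
    """Return -sum_{i,j} c_i c_j psi(||x_i - x_j||^2) for real weights c.

    With sum(weights) == 0 this is I(mu, mu) from Theorem 4.1 for the
    discrete measure mu = sum_i c_i * delta_{x_i} and gamma = ||x - y||^2.
    """
    total = 0.0
    for x, cx in zip(points, weights):
        for y, cy in zip(points, weights):
            total += cx * cy * psi(math.dist(x, y) ** 2)
    return -total


# Standard Bernstein functions; "linear" is the degenerate case
# excluded from the inner product claim.
bernstein_examples = {
    "sqrt": math.sqrt,
    "log1p": math.log1p,
    "one_minus_exp": lambda t: 1.0 - math.exp(-t),
    "linear": lambda t: t,
}
```

With points `[(0.0,), (1.0,), (2.0,)]` and weights `[1.0, -2.0, 1.0]` (total mass zero and vector mean zero), the three nonlinear examples give a strictly positive value, while the linear one gives zero. This is consistent with the requirement that $\psi$ is not linear for the inner product property.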
We emphasize that by Lemma 3.4 ($p$ is the constant function 1), $\psi(\gamma(x,y)) \in L^1(|\eta| \times |\eta|)$ if and only if $x \to \psi(\gamma(x,z)) \in L^1(|\eta|)$ for some (or every) $z \in X$.

For instance, if $X$ is a real Hilbert space $\mathcal{H}$, $\gamma(x,y) = \|x-y\|^{2}$ and $\psi(t) = t^{a/2}$, $0 < a < 2$, then
$$(\mu, \nu) \in \mathfrak{M}_1(\mathcal{H}; t^{a/2}) \times \mathfrak{M}_1(\mathcal{H}; t^{a/2}) \to I(\mu, \nu)_{a/2} := -\int_{\mathcal{H}} \int_{\mathcal{H}} \|x-y\|^{a}\, d\mu(x)\, d\nu(y)$$
defines an inner product on
$$\mathfrak{M}_1(\mathcal{H}; t^{a/2}) := \{ \eta \in \mathfrak{M}(\mathcal{H}),\ \|x\|^{a} \in L^1(|\eta|) \text{ and } \eta(\mathcal{H}) = 0 \}.$$

It is relevant to say that usually the inner product in Theorem 4.1 is not complete (hence, $\mathfrak{M}_1(X; \gamma, \psi)$ is not a Hilbert space). For instance, in [20] it is proved that the Gaussian kernel can be used to define an inner product on the space of tempered distributions on Euclidean spaces.

Another example occurs on the generalized real hyperbolic space. Let $\mathcal{H}$ be a Hilbert space, let
$$\mathbb{H} := \{ (x, t_x) \in \mathcal{H} \times (0,\infty),\ t_x^{2} - \|x\|^{2} = 1 \}$$
be the real hyperbolic space relative to $\mathcal{H}$, and consider the kernel
$$((x, t_x), (y, t_y)) \in \mathbb{H} \times \mathbb{H} \to [(x, t_x), (y, t_y)] := t_x t_y - \langle x, y \rangle \in [1,\infty),$$
which satisfies the relation
$$\cosh(d_{\mathbb{H}}((x, t_x), (y, t_y))) = [(x, t_x), (y, t_y)],$$
where $d_{\mathbb{H}}$ is a metric on $\mathbb{H}$. In [3] or Chapter 5 in [1], it is proved that the metric $d_{\mathbb{H}}$ on $\mathbb{H}$ is a CND kernel, so we can apply Theorem 4.1 to the kernel $\gamma = d_{\mathbb{H}}$ and $\psi = t^{a/2}$, $0 < a < 2$, and obtain that
$$(\mu, \nu) \in \mathfrak{M}_1(\mathbb{H}; t^{a/2}) \times \mathfrak{M}_1(\mathbb{H}; t^{a/2}) \to \mathbb{H}(\mu, \nu)_{a/2} := -\int_{\mathbb{H}} \int_{\mathbb{H}} d_{\mathbb{H}}(x,y)^{a/2}\, d\mu(x)\, d\nu(y)$$
defines an inner product on
$$\mathfrak{M}_1(\mathbb{H}; t^{a/2}) := \{ \eta \in \mathfrak{M}(\mathbb{H}),\ x \in \mathbb{H} \to d_{\mathbb{H}}(x,z)^{a/2} \in L^1(|\eta|) \text{ for some (or every) } z \in \mathbb{H} \text{ and } \eta(\mathbb{H}) = 0 \}.$$

We can also include the case $a = 2$. A proof when $\mathcal{H}$ is finite dimensional was provided in [13] using geometric properties of hyperbolic spaces. Our proof relies on a Laurent type approximation of the function $\mathrm{arcCosh}(t)$.

Theorem 4.2.
Let $\mathbb{H}$ be a real hyperbolic space, and consider the vector space
$$\mathfrak{M}_1(\mathbb{H}; t) := \{ \eta \in \mathfrak{M}(\mathbb{H}),\ x \in \mathbb{H} \to d_{\mathbb{H}}(x,z) \in L^1(|\eta|) \text{ for some (or every) } z \in \mathbb{H} \text{ and } \eta(\mathbb{H}) = 0 \}.$$
Then
$$(\mu, \nu) \in \mathfrak{M}_1(\mathbb{H}; t) \times \mathfrak{M}_1(\mathbb{H}; t) \to \mathbb{H}(\mu, \nu)_1 := -\int_{\mathbb{H}} \int_{\mathbb{H}} d_{\mathbb{H}}(x,y)\, d\mu(x)\, d\nu(y)$$
is an inner product.

A different behaviour occurs on the generalized real spheres. Let $\mathcal{H}$ be a Hilbert space and let $S^{\mathcal{H}} := \{ x \in \mathcal{H},\ \|x\| = 1 \}$ be the real sphere relative to $\mathcal{H}$. The kernel $d_{S^{\mathcal{H}}}$ defined on $S^{\mathcal{H}}$ by the relation
$$\cos(d_{S^{\mathcal{H}}}(x,y)) = \langle x, y \rangle_{\mathcal{H}}, \quad x, y \in S^{\mathcal{H}},$$
is a metric and defines a CND kernel, as shown in [5]. However, unlike the Hilbert space and the real hyperbolic space, $(S^{\mathcal{H}}, d_{S^{\mathcal{H}}})$ is not a metric space of strong negative type, [14]. Gangolli also proved in [5] that the metric on the other compact two-point homogeneous spaces (real/complex/quaternionic projective spaces and the Cayley projective plane) does not define a CND kernel.

The following Corollary of Theorem 4.1 connects the setting of metric spaces of strong negative type and the kernels in Theorem 4.1.

Corollary 4.3.
Let $\psi : [0,\infty) \to \mathbb{R}$ be a nonzero Bernstein function such that $\psi(0) = 0$, $\lim_{t \to \infty} \psi(t)/t = 0$, and let $(X, \gamma)$ be a metric space of negative type. Then,
$$(x,y) \in X \times X \to D_{\psi, \gamma}(x,y) := \psi(\gamma(x,y))$$
is a metric on $X$ and $(X, D_{\psi, \gamma})$ is a metric space of strong negative type homeomorphic to $(X, \gamma)$.

As an example of Corollary 4.3, the Bernstein function $\psi(t) = \log(t+1)$ satisfies $\psi(0) = 0$ and $\lim_{t \to \infty} \psi(t)/t = 0$. In particular, on a Hilbert space $\mathcal{H}$, $\log(\|x-y\| + 1)$ is a metric on $\mathcal{H}$ that induces the Hilbertian topology, and this metric is of strong negative type. Interestingly, we can apply Corollary 4.3 again to obtain that the same occurs with the metric $\log(\log(\|x-y\| + 1) + 1)$.

Returning to the kernel $(x,y) \in \mathcal{H} \times \mathcal{H} \to \|x-y\|^{a}$, we may ask ourselves what occurs when $a \geq 2$. The case $a = 2$ is simple because
$$-\int_{\mathcal{H}} \int_{\mathcal{H}} \|x-y\|^{2}\, d\mu(x)\, d\nu(y) = 2 \int_{\mathcal{H}} \int_{\mathcal{H}} \langle x, y \rangle_{\mathcal{H}}\, d\mu(x)\, d\nu(y),$$
for every $\mu, \nu \in \mathfrak{M}_1(\mathcal{H}; t) := \{ \eta \in \mathfrak{M}(\mathcal{H}),\ \|x\|^{2} \in L^1(|\eta|) \text{ and } \eta(\mathcal{H}) = 0 \}$. This still defines a semi-inner product on $\mathfrak{M}_1(\mathcal{H}; t)$, but every element of the vector space
$$\mathfrak{M}_2(\mathcal{H}; t) := \left\{ \eta \in \mathfrak{M}_1(\mathcal{H}; t),\ \int_{\mathcal{H}} \langle x, y \rangle_{\mathcal{H}}\, d\eta(x) = 0 \text{ for every } y \in \mathcal{H} \right\} \subset \mathfrak{M}_1(\mathcal{H}; t)$$
is equivalent to the zero measure for this semi-inner product. For an arbitrary measure $\eta \in \mathfrak{M}(\mathcal{H})$ such that $\|x\| \in L^1(|\eta|)$, the linear functional
$$y \in \mathcal{H} \to \int_{\mathcal{H}} \langle x, y \rangle\, d\eta(x) \in \mathbb{R}$$
is continuous, so there exists a vector $v_{\eta}$, which we call the vector mean of $\eta$, that represents the above continuous linear functional.

In the case $a > 2$, a different behaviour emerges. The double integral does not define a semi-inner product on $\mathfrak{M}_1(\mathcal{H}; t^{a/2})$. However, it does if we restrict ourselves to the vector space
$$\mathfrak{M}_2(\mathcal{H}; t^{a/2}) := \{ \eta \in \mathfrak{M}(\mathcal{H}),\ \|x\|^{a} \in L^1(|\eta|),\ \eta(\mathcal{H}) = 0,\ v_{\eta} = 0 \}$$
for $2 < a < 4$. Indeed, using the representation of the $CM_2$ function
$$t^{a/2} = \frac{a(a-2)}{4\Gamma(2 - a/2)} \int_{(0,\infty)} \frac{e^{-rt} - 1 + rt}{r^{a/2+1}}\, dr,$$
by Fubini-Tonelli we obtain that if $\mu, \nu \in \mathfrak{M}_2(\mathcal{H}; t^{a/2})$ then
$$\int_{\mathcal{H}} \int_{\mathcal{H}} \|x-y\|^{a}\, d\mu(x)\, d\nu(y) = \frac{a(a-2)}{4\Gamma(2 - a/2)} \int_{(0,\infty)} \left( \int_{\mathcal{H}} \int_{\mathcal{H}} e^{-r\|x-y\|^{2}}\, d\mu(x)\, d\nu(y) \right) \frac{1}{r^{a/2+1}}\, dr,$$
which is nonnegative when $\mu = \nu$. In particular, we can use the kernel $\|x-y\|^{a}$, $2 < a < 4$, to define a metric on the space of Radon probability measures on $\mathcal{H}$ with finite second moment, but with a fixed vector mean.

More generally, we have the following result.

Theorem 4.4.
Let $\ell \in \mathbb{N}$, $\psi : [0,\infty) \to \mathbb{R}$ be a continuous function in $CM_{\ell}$ and $\gamma : X \times X \to [0,\infty)$ be a continuous CND kernel such that $x \to \gamma(x,x)$ is a constant function. Consider the vector space
$$\mathfrak{M}_{\ell}(X; \gamma, \psi) := \Big\{ \eta \in \mathfrak{M}(X),\ \psi(\gamma(x,y)) \in L^1(|\eta| \times |\eta|),\ \gamma(x,y)^{\ell} \in L^1(|\eta| \times |\eta|) \text{ and } \eta(X) = 0,\ \int_X \int_X K^{-\gamma}(x,y)^{j}\, d\eta(x)\, d\eta(y) = 0,\ 1 \leq j \leq \ell - 1 \Big\},$$
where $K^{-\gamma}$ is the kernel in Theorem 3.1. Then the function
$$(\mu, \nu) \in \mathfrak{M}_{\ell}(X; \gamma, \psi) \times \mathfrak{M}_{\ell}(X; \gamma, \psi) \to I(\mu, \nu)_{\gamma, \psi} := \int_X \int_X \psi(\gamma(x,y))\, d\mu(x)\, d\nu(y)$$
defines a semi-inner product on $\mathfrak{M}_{\ell}(X; \gamma, \psi)$. If $\psi$ is not a polynomial of degree $\ell$ or less and $2\gamma(x,y) = \gamma(x,x) + \gamma(y,y)$ only when $x = y$, then $I(\mu, \nu)_{\gamma, \psi}$ defines an inner product on $\mathfrak{M}_{\ell}(X; \gamma, \psi)$.

From Equation 2.4 and the fact that $(-1)^{\ell} [e^{-tr} - \omega_{\ell,\infty}(rt)] \geq 0$ for every $t, r \geq 0$ and $\ell \in \mathbb{N}$ (this can be easily proved by induction on $\ell$), if $\psi \in CM_{\ell}$ in Theorem 4.4 also belongs to $C^{\ell-1}[0,\infty)$, we may lower the requirement $\gamma^{\ell} \in L^1(|\eta| \times |\eta|)$ to $\gamma^{\ell-1} \in L^1(|\eta| \times |\eta|)$ in the definition of $\mathfrak{M}_{\ell}(X; \gamma, \psi)$.

The fact that we required additional properties on the function $\gamma(x,x)$ in Theorem 4.4 compared to Theorem 4.1 is related to the fact that the integrals $\int_X \int_X \gamma(x,y)^{j}\, d\eta(x)\, d\eta(y)$ are difficult to analyse in the general setting of Theorem 4.1. However, if $\psi \in CM_{\ell}$ in Theorem 4.4 also belongs to $C^{\ell-1}[0,\infty)$ and all of its derivatives up to order $\ell - 1$ vanish at zero, then it is sufficient that $x \to \gamma(x,x)$ is a bounded function in Theorem 4.4. This is the case for the function $(-1)^{\ell} t^{a/2}$, $2(\ell - 1) < a < 2\ell$.

As an example of Theorem 4.4, if $X$ is a Hilbert space $\mathcal{H}$, $\gamma(x,y) = \|x-y\|^{2}$ and $\psi(t) = (-1)^{\ell} t^{a/2}$, $2(\ell - 1) < a < 2\ell$, $\ell \in \mathbb{N}$, then
$$(\mu, \nu) \in \mathfrak{M}_{\ell}(\mathcal{H}; t^{a/2}) \times \mathfrak{M}_{\ell}(\mathcal{H}; t^{a/2}) \to I(\mu, \nu)_{a/2} := \int_{\mathcal{H}} \int_{\mathcal{H}} (-1)^{\ell} \|x-y\|^{a}\, d\mu(x)\, d\nu(y)$$
defines an inner product on the vector space
$$\mathfrak{M}_{\ell}(\mathcal{H}; t^{a/2}) := \Big\{ \mu \in \mathfrak{M}(\mathcal{H}),\ \|x\|^{a} \in L^1(|\mu|),\ \mu(\mathcal{H}) = 0, \text{ and } \int_{\mathcal{H}} \langle x, y_1 \rangle \cdots \langle x, y_j \rangle\, d\mu(x) = 0,\ y_1, \ldots, y_j \in \mathcal{H},\ 1 \leq j \leq \ell - 1 \Big\}.$$

Theorem 4.1 and Theorem 4.4 in the case where $X$ is a Euclidean space $\mathbb{R}^m$ and $\gamma(x,y) = \|x-y\|^{2}$ were proved in [15].

5. SPACE OF FUNCTIONS DEFINED BY DERIVATIVES OF COMPLETELY MONOTONE FUNCTIONS
As mentioned in [12], the fact that the energy distance defines a metric on a separable Hilbert space can be proved using the proposed method, but it also follows from the fact that if $\mathcal{H}$ is a separable Hilbert space, then a measure $\mu \in \mathfrak{M}(\mathcal{H})$ such that $\|x\|^{a} \in L^1(|\mu|)$, $a \in (0,\infty) \setminus 2\mathbb{N}$, satisfies
$$(5.5)\qquad \int_{\mathcal{H}} \|x-y\|^{a}\, d\mu(x) = 0, \quad y \in \mathcal{H},$$
if and only if $\mu$ is the zero measure, as proved in [11], [10].

In [8] it is proved that if $\psi \in CM_0$ is not a constant function, then
$$\int_{\mathcal{H}} \psi(\|x-y\|^{2})\, d\mu(x) = 0, \quad y \in \mathcal{H},$$
if and only if $\mu$ is the zero measure. In this section we prove similar results in a much broader setting, as a consequence of the results presented in Section 4.

Theorem 5.1.
Let $\mathcal{H}$ be an infinite dimensional Hilbert space, $\ell \in \mathbb{Z}_+$ and $\phi, \varphi \in CM_{\ell}$. If a measure $\mu \in \mathfrak{M}(\mathcal{H})$ such that $\|x\|^{2\ell} \in L^1(|\mu|)$ satisfies
$$\int_{\mathcal{H}} \psi(\|x-y\|^{2})\, d\mu(x) = 0, \quad y \in \mathcal{H},$$
where $\psi := \phi - \varphi$, then it must hold that
$$\int_{\mathcal{H}} \psi(\|x-y\|^{2} + c)\, d\mu(x) = 0, \quad y \in \mathcal{H},\ c \geq 0.$$
In addition (even if $\mathcal{H}$ is not infinite dimensional), $\psi$ is not a polynomial if and only if the only measure $\mu \in \mathfrak{M}(\mathcal{H})$ such that $\|x\|^{2\ell} \in L^1(|\mu|)$ that satisfies
$$\int_{\mathcal{H}} \psi(\|x-y\|^{2} + c)\, d\mu(x) = 0, \quad y \in \mathcal{H},\ c \geq 0,$$
is the zero measure.

For some functions we can provide a version of Theorem 5.1 on finite dimensional spaces.
Lemma 5.2.
Let ℓ ∈ N and H be a Hilbert space. A measure µ ∈ M ( H ) such that k x k ( ℓ − ) ∈ L ( | µ | ) and ψ ( k x − y k ) ∈ L ( | µ | × | µ | ) , satisfies Z H ψ ( k x − y k ) d µ ( x ) = y ∈ H when ψ : [ , ∞ ) → R is one of the following functions: ( i ) ψ ( t ) = t a / , ( ℓ − ) < a < ℓ ; ( ii ) ψ ( t ) = t ℓ − log ( t ) , ℓ > ; ( iii ) ℓ = and ψ ∈ CM ℓ is not a polynomial, ψ ( ) ≤ . ( iv ) ℓ = and ψ ∈ CM ℓ is not a polynomial, ψ ( ) ≤ but k x k ℓ ∈ L ( | µ | ) .if and only if µ is the zero measure. We remark that on the case ( iv ) we may withdraw the additional assumption k x k ℓ ∈ L ( | µ | ) if ψ ∈ C ℓ − [ , ∞ ) . 6. P ROOFS
Section 3.
Proof of Theorem 3.3. The converse is immediate. Suppose that $\gamma$ is $P$-CPD. Since $P$ is finite dimensional there exists a basis $p_1, \ldots, p_m \in P$ for it such that $p_i(\xi_j) = \delta_{i,j}$. By the integrability assumptions on the functions $p_i$ and $\gamma(x, \xi_j)$, the kernel $K^{\gamma} \in L^1(|\mu| \times |\mu|)$, and
$$\int_X \int_X \gamma(x,y)\, d\mu(x)\, d\overline{\mu}(y) = \int_X \int_X K^{\gamma}(x,y)\, d\mu(x)\, d\overline{\mu}(y),$$
so the conclusion follows from Lemma 3.6. □

Proof of Lemma 3.4. Let $\mu \in \mathfrak{M}_P(X)$ for which $\gamma(x,y) \in L^1(|\mu| \times |\mu|)$. Let
$$A := \{ \xi \in X,\ \gamma(\cdot, \xi) \in L^1(|\mu|) \},$$
whose complement has $|\mu|$ measure zero by Fubini-Tonelli. If $A \cap \{ \xi,\ p(\xi) \neq 0 \} \neq \emptyset$, the result is a consequence of Theorem 3.3. On the other hand, if $A \cap \{ \xi,\ p(\xi) \neq 0 \} = \emptyset$, note that the kernel $\gamma$ is positive definite when restricted to the closed set $B := \{ \xi,\ p(\xi) = 0 \}$, $A \subset B$, and
$$\int_X \int_X \gamma(x,y)\, d\mu(x)\, d\overline{\mu}(y) = \int_B \int_B \gamma(x,y)\, d\mu(x)\, d\overline{\mu}(y).$$
The conclusion follows from Lemma 3.6.

Now, under the additional requirements on $p$ and $\gamma$, it is easy to see that $\gamma$ is $[p]$-CPD if and only if the kernel
$$(x,y) \in X \times X \to \beta(x,y) := \frac{\gamma(x,y)}{p(x) p(y)} \in \mathbb{R}$$
is CPD. Note that it is sufficient to prove the three equivalences for the kernel $\beta$ and any measure $\eta \in \mathfrak{M}(X)$ with $\eta(X) = 0$, because we can take $d\eta = p\, d\mu$.

The kernel
$$d(x,y) := (-2\beta(x,y) + \beta(x,x) + \beta(y,y))^{1/2}$$
is a pseudometric on $X$, because $d^{2}$ is a CND kernel with $d^{2}(x,x) = 0$ for every $x \in X$, so $d$ satisfies the triangle inequality. Since the function $\beta(x,x)$ is bounded, the relations
$$\beta \in L^1(|\eta| \times |\eta|), \quad \beta(x,z) \in L^1(|\eta|) \text{ for some } z \in X, \quad \beta(x,z) \in L^1(|\eta|) \text{ for every } z \in X$$
are respectively equivalent to the relations
$$d^{2} \in L^1(|\eta| \times |\eta|), \quad d^{2}(x,z) \in L^1(|\eta|) \text{ for some } z \in X, \quad d^{2}(x,z) \in L^1(|\eta|) \text{ for every } z \in X.$$
The conclusion that these three properties are equivalent for the kernel $d$ follows directly from the triangle inequality. □

Proof of Lemma 3.6. Assume without loss of generality that $\mu$ is a nonnegative measure. The fact that the set $X_{\mu}$ satisfies $\mu(X - X_{\mu}) = 0$ follows from the Fubini-Tonelli Theorem. Since $\mu$ is Radon, there exists an increasing sequence of compact sets $(C_n)_{n \in \mathbb{N}}$ for which $\mu(X - \cup_{n \in \mathbb{N}} C_n) = 0$. In particular, by the Dominated Convergence Theorem, the $L^1(\mu \times \mu)$ convergence holds
$$\int_X \int_X K(x,y) \chi_{C_n}(x) \chi_{C_n}(y)\, d\mu(x)\, d\mu(y) \to \int_X \int_X K(x,y)\, d\mu(x)\, d\mu(y),$$
because $\mu \times \mu(X \times [X - \cup_{n \in \mathbb{N}} C_n]) = 0$. The function $\sqrt{K(x,x)} \in L^1(\chi_{C_n} \mu)$, so by Lemma 2.1, $K_{\mu_n} \in \mathcal{H}_K$, where $d\mu_n := \chi_{C_n}\, d\mu$, and
$$\langle K_{\mu_n} - K_{\mu_m}, K_{\mu_n} - K_{\mu_m} \rangle_{\mathcal{H}_K} = \int_X \int_X K(x,y) [\chi_{C_n}(x) - \chi_{C_m}(x)][\chi_{C_n}(y) - \chi_{C_m}(y)]\, d\mu(x)\, d\mu(y) \to 0$$
as $m, n \to \infty$. Hence $(K_{\mu_n})_{n \in \mathbb{N}}$ is a Cauchy sequence, in particular convergent to an element $K_{\mu} \in \mathcal{H}_K$. Since $\mathcal{H}_K$ is an RKHS, convergence in norm implies pointwise convergence, so
$$K_{\mu}(z) = \lim_{n \to \infty} K_{\mu_n}(z) = \lim_{n \to \infty} \int_{C_n} K(x,z)\, d\mu(x) = \int_X K(x,z)\, d\mu(x),$$
for every $z \in X_{\mu}$, which proves our claim.

Now, if $K \in L^1(\mu \times \eta)$, we have that
$$\langle K_{\eta_n}, K_{\mu_n} \rangle_{\mathcal{H}_K} = \int_X \int_X K(x,y) \chi_{D_n}(x) \chi_{C_n}(y)\, d\eta(x)\, d\mu(y),$$
where $(D_n)_{n \in \mathbb{N}}$ is the analogous sequence of compact sets for $\eta$. The left hand side of this equality converges to $\langle K_{\eta}, K_{\mu} \rangle_{\mathcal{H}_K}$, while the right hand side converges to $\int_X \int_X K(x,y)\, d\eta(x)\, d\mu(y)$ by the Dominated Convergence Theorem. □

Section 4.
Throughout the rest of the paper, we use the well-known fact that a Hermitian kernel $\gamma : X \times X \to \mathbb{C}$ is CND if and only if the kernel $e^{-r\gamma(x,y)}$ is positive definite for every $r > 0$, page 74 in [1].

The next Lemma is an improvement of Lemma 3.4 for $P$ the set of constant functions. We use CND instead of CPD because this is how we apply the result.
Lemma 6.1.
Let $\gamma : X \times X \to \mathbb{R}$ be a continuous CND kernel such that $\gamma(x,x)$ is a bounded function, $\mu \in \mathfrak{M}(X)$ and $\theta > 0$. Then, the following assertions are equivalent:
(i) $\gamma \in L^{\theta}(|\mu| \times |\mu|)$;
(ii) The function $x \in X \to \gamma(x,z) \in L^{\theta}(|\mu|)$ for some $z \in X$;
(iii) The function $x \in X \to \gamma(x,z) \in L^{\theta}(|\mu|)$ for every $z \in X$.

Proof.
Since $\gamma$ is CND there exists a CND kernel $\beta : X \times X \to \mathbb{R}$ for which $\beta(x,x) = 0$ for every $x \in X$, $\beta^{1/2}$ is a pseudometric on $X$ and
\[ \gamma(x,y) = \beta(x,y) + \gamma(x,x)/2 + \gamma(y,y)/2. \]
Since $\gamma(x,x)$ is bounded and $\mu$ is a finite measure, for $\theta \geq 1$ the three assertions for $\gamma$ are respectively equivalent to the three assertions for the CND kernel $\beta$ by the Minkowski inequality. If $\theta \in (0,1)$ the same relation holds, but it follows from the general relation on $L^\theta$ spaces
\[ \int |f + g|^\theta \leq \int |f|^\theta + \int |g|^\theta. \]
In particular, we may suppose that $\gamma$ is a CND kernel for which $\gamma(x,x) = 0$ for every $x \in X$.

If $\gamma(x,y)^\theta \in L^1(|\mu| \times |\mu|)$, then there exists $z \in X$ for which $\gamma(x,z)^\theta \in L^1(|\mu|)$ by the Fubini-Tonelli Theorem.

If $x \in X \to \gamma(x,z)^\theta \in L^1(|\mu|)$ for some $z \in X$, then for every $y \in X$
\[ (\gamma(x,y))^\theta = ((\gamma(x,y))^{1/2})^{2\theta} \leq (\gamma(x,z)^{1/2} + \gamma(y,z)^{1/2})^{2\theta}. \]
For $\theta \geq 1/2$, the functions inside the parentheses on the right-hand side of the previous inequality are elements of $L^{2\theta}(|\mu|)$ (in the $x$ variable), so by the Minkowski inequality we obtain the integrability of $x \in X \to \gamma(x,y)^\theta$. Integrating the Minkowski inequality with respect to $d|\mu|(y)$, we also obtain that $\gamma^\theta \in L^1(|\mu| \times |\mu|)$. For $0 < \theta < 1/2$, the proof is the same, but it follows from the general relation on $L^\theta$ spaces mentioned above. □

Proof of Theorem 4.1. For the first claim it is sufficient to prove that $I(\mu,\mu)_{\gamma,\psi} \geq 0$. By Equation 2.4,
\[ -\psi(\gamma(x,y)) = a + b\gamma(x,y) + \int_{(0,\infty)} \frac{e^{-r\gamma(x,y)} - 1}{r}\,d\lambda(r), \]
where $b \leq 0$ and $\lambda$ is a nonnegative Radon measure such that $\min\{1, r^{-1}\} \in L^1(\lambda)$. Consequently, $-\psi(\gamma(x,y))$ is a CPD kernel, and the conclusion is then a consequence of Corollary 3.4.

If $\psi$ is not a linear function, then $\lambda((0,\infty)) > 0$, because the representation in Equation 2.4 is unique.

If $\mu \in \mathfrak{M}(X;\gamma,\psi)$, then the 3 functions that describe $\psi(\gamma(x,y))$ are in $L^1(|\mu| \times |\mu|)$, because $e^{-r\gamma(x,y)} - 1 \leq 0$ for every $r > 0$ and $x, y \in X$, and $b \leq 0$. Since
\[ (1 - e^{-rt}) \leq r(1 + t)\min\{1, r^{-1}\}, \quad r, t \geq 0, \]
we can apply Fubini-Tonelli and obtain that
\[ -\int_X \int_X \psi(\gamma(x,y))\,d\mu(x)\,d\mu(y) = b\int_X \int_X \gamma(x,y)\,d\mu(x)\,d\mu(y) + \int_{(0,\infty)} \left[ \int_X \int_X e^{-r\gamma(x,y)}\,d\mu(x)\,d\mu(y) \right] \frac{1}{r}\,d\lambda(r). \]
The first double integral is nonpositive by Corollary 3.4. Since $2\gamma(x,y) = \gamma(x,x) + \gamma(y,y)$ only when $x = y$, the kernel $e^{-r\gamma(x,y)}$ is ISPD for every $r > 0$. Hence, when $\mu$ is not the zero measure,
\[ \int_X \int_X e^{-r\gamma(x,y)}\,d\mu(x)\,d\mu(y) > 0, \quad r > 0, \]
and the conclusion follows because $\lambda((0,\infty)) > 0$. □

Proof of Theorem 4.2. By equation 4.[…], for $t \geq 1$,
\[ \operatorname{arcCosh}(t) = \log(2) + \log(t) - \sum_{k=1}^{\infty} \frac{(2k)!}{2^{2k}(k!)^2}\,\frac{t^{-2k}}{2k}. \]
In [1] it is proved that $\log([x,y])$ is a CND kernel on $\mathbb{H}$, while by [8] the positive definite kernel $[x,y]^{-2k}$ on $\mathbb{H}$ is ISPD for every $k \in \mathbb{N}$. Since the series appearing in the arcCosh formula above only contains nonnegative numbers, we may interchange summation and integration for any $\eta \in \mathfrak{M}(\mathbb{H}; t)$. Consequently, if $\mu$ is not the zero measure,
\begin{align*}
-\int_{\mathbb{H}} \int_{\mathbb{H}} d_{\mathbb{H}}(x,y)\,d\mu(x)\,d\mu(y) &= -\int_{\mathbb{H}} \int_{\mathbb{H}} \operatorname{arcCosh}([x,y])\,d\mu(x)\,d\mu(y) \\
&= \int_{\mathbb{H}} \int_{\mathbb{H}} \left( -\log([x,y]) + \sum_{k=1}^{\infty} \frac{(2k)!}{2^{2k}(k!)^2\,2k}\,[x,y]^{-2k} \right) d\mu(x)\,d\mu(y) \\
&\geq \int_{\mathbb{H}} \int_{\mathbb{H}} \sum_{k=1}^{\infty} \frac{(2k)!}{2^{2k}(k!)^2\,2k}\,[x,y]^{-2k}\,d\mu(x)\,d\mu(y) \\
&= \sum_{k=1}^{\infty} \frac{(2k)!}{2^{2k}(k!)^2\,2k} \int_{\mathbb{H}} \int_{\mathbb{H}} [x,y]^{-2k}\,d\mu(x)\,d\mu(y) > 0.
\end{align*}
□

Proof of Corollary 4.3. By Remark 3.[…], if $\psi$ satisfies these assumptions then we can write the kernel $D_{\psi,\gamma}$ as
\[ D_{\psi,\gamma}(x,y) = \psi(\gamma(x,y)) = \int_{(0,\infty)} \frac{1 - e^{-r\gamma(x,y)}}{r}\,d\lambda(r), \]
where $\lambda$ is a nonnegative Radon measure such that $\min\{1, r^{-1}\} \in L^1(\lambda)$.
Because $\gamma$ is a metric, we have that
\[ 1 - e^{-r\gamma(x,y)} \leq [1 - e^{-r\gamma(x,z)}] + [1 - e^{-r\gamma(z,y)}], \quad x, y, z \in X, \]
which proves that $D_{\psi,\gamma}(x,y) \leq D_{\psi,\gamma}(x,z) + D_{\psi,\gamma}(z,y)$.

The topologies are equivalent because $\psi$ is necessarily an increasing function with $\psi(0) = 0$, so $\psi(t_n) \to 0$ if and only if $t_n \to 0$. The metric space $(X, D_{\psi,\gamma})$ has strong negative type because the kernel $\gamma$ is continuous on the metric topology $(X,\gamma)$, $\psi$ is not a linear function and the remaining requirements for Theorem 4.1 are satisfied. □

In order to prove the next result, we will use the same infinite dimensional multinomial theorem that was used in [8] to prove that the Gaussian kernel is ISPD on Hilbert spaces. If $H$ is a real Hilbert space and $(e_\xi)_{\xi \in I}$ is a complete orthonormal basis for it, then for every $n \in \mathbb{N}$
\[ \langle x, y \rangle^n = \left( \sum_{\xi \in I} x_\xi y_\xi \right)^n = \sum_{\alpha \in (I,\mathbb{Z}_+),\ |\alpha| = n} \frac{n!}{\alpha!}\,x^\alpha y^\alpha, \tag{6.6} \]
where $x_\xi = \langle x, e_\xi \rangle$, $(I,\mathbb{Z}_+)$ is the space of functions from $I$ to $\mathbb{Z}_+$, and the condition $|\alpha| = n$ means that $\sum_{\xi \in I} \alpha(\xi) = n$ (in particular $\alpha$ must be the zero function except for a finite number of points). Also $\alpha! = \prod_{\xi \in I} \alpha(\xi)!$ (which makes sense because $0! = 1$) and $x^\alpha = \prod_{\alpha(\xi) \neq 0} x_\xi^{\alpha(\xi)}$. This result can be proved using approximations of $\langle x, y \rangle$ on finite dimensional spaces and the multinomial theorem on those spaces. The number $\lfloor l \rfloor$ stands for the largest integer less than or equal to $l$.

In the next lemma we use the fact that for a continuous positive definite kernel $K : X \times X \to \mathbb{C}$, a measure $\mu \in \mathfrak{M}_{\sqrt{K}}(X)$ satisfies
\[ \int_X K(x,y)\,d\mu(x) = 0, \quad y \in X, \]
if and only if $\int_X \int_X K(x,y)\,d\mu(x)\,d\mu(y) = 0$, which can be seen in [17], [21].
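The finite dimensional case of Equation (6.6) is easy to check by direct computation. The following is a minimal sketch in $\mathbb{R}^d$, where the index set $I$ is $\{1, \dots, d\}$ and a multi-index $\alpha$ is a tuple of nonnegative integers; the function names `multi_indices`, `monomial`, `multinomial_side` and `inner` are hypothetical and chosen only for this illustration.

```python
import math
from itertools import product


def multi_indices(dim, n):
    """All alpha in Z_+^dim with |alpha| = alpha_1 + ... + alpha_dim = n."""
    return [a for a in product(range(n + 1), repeat=dim) if sum(a) == n]


def monomial(x, alpha):
    """x^alpha = prod_i x_i^{alpha_i}."""
    return math.prod(xi ** ai for xi, ai in zip(x, alpha))


def inner(x, y):
    """Euclidean inner product <x, y>."""
    return sum(a * b for a, b in zip(x, y))


def multinomial_side(x, y, n):
    """Right-hand side of (6.6): sum over |alpha| = n of n!/alpha! x^alpha y^alpha."""
    total = 0.0
    for alpha in multi_indices(len(x), n):
        alpha_factorial = math.prod(math.factorial(a) for a in alpha)
        coeff = math.factorial(n) / alpha_factorial
        total += coeff * monomial(x, alpha) * monomial(y, alpha)
    return total


if __name__ == "__main__":
    x = (0.5, -1.0, 2.0)
    y = (1.5, 0.25, -0.75)
    for n in range(6):
        # The two columns should agree up to rounding error.
        print(n, inner(x, y) ** n, multinomial_side(x, y, n))
```

Enumerating all tuples with `itertools.product` is wasteful for large $d$ or $n$, but it keeps the correspondence with the index set in (6.6) explicit.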
Lemma 6.2.
Let $H$ be a real Hilbert space, $n \in \mathbb{N}$ and $\mu \in \mathfrak{M}(H)$. Suppose that $\|x - y\|^{2n} \in L^1(|\mu| \times |\mu|)$. Then
\[ \langle x, y \rangle^k \|x\|^{2i} \|y\|^{2j} \in L^1(|\mu| \times |\mu|), \quad k, i, j \in \mathbb{Z}_+, \quad k + i + j \leq n. \]
Moreover, if $\int_H \int_H \langle x, y \rangle^k\,d\mu(x)\,d\mu(y) = 0$ for every $0 \leq k \leq n - 1$, then
\[ (-1)^n \int_H \int_H \|x - y\|^{2n}\,d\mu(x)\,d\mu(y) = \sum_{l=0}^{\lfloor n/2 \rfloor} \binom{n}{2l} \binom{2l}{l} 2^{n-2l} \int_H \int_H \langle x, y \rangle^{n-2l} \|x\|^{2l} \|y\|^{2l}\,d\mu(x)\,d\mu(y) \geq 0, \]
and
\[ \int_H \int_H \|x - y\|^{2m}\,d\mu(x)\,d\mu(y) = 0, \quad 0 \leq m \leq n - 1. \]

Proof.
By Lemma 6.1, the fact that $\|x - y\|^{2n} \in L^1(|\mu| \times |\mu|)$ is equivalent to $\|x\|^{2n} \in L^1(|\mu|)$. Since
\[ |\langle x, y \rangle^k \|x\|^{2i} \|y\|^{2j}| \leq \|x\|^{2i+k} \|y\|^{2j+k} \quad \text{and} \quad \|x\|^{2i+k} \leq \max\{1, \|x\|^{2n}\}, \]
we obtain the desired integrability.

Note that
\[ \|x - y\|^{2m} = (\|x\|^2 + \|y\|^2 - 2\langle x, y \rangle)^m = \sum_{k=0}^{m} \sum_{i=0}^{m-k} \binom{m}{k} \binom{m-k}{i} (-2)^k \langle x, y \rangle^k \|x\|^{2i} \|y\|^{2(m-k-i)}. \]
If $k + 2i \leq n - 1$, then by the hypothesis
\begin{align*}
0 &= \int_H \int_H \langle x, y \rangle^{k+2i}\,d\mu(x)\,d\mu(y) = \int_H \int_H \langle x, y \rangle^k \left( \sum_{\xi \in I} x_\xi y_\xi \right)^{2i} d\mu(x)\,d\mu(y) \\
&= \int_H \int_H \langle x, y \rangle^k \left( \sum_{|\alpha| = 2i} \frac{(2i)!}{\alpha!}\,x^\alpha y^\alpha \right) d\mu(x)\,d\mu(y) \\
&= \sum_{|\alpha| = 2i} \frac{(2i)!}{\alpha!} \int_H \int_H \langle x, y \rangle^k x^\alpha y^\alpha\,d\mu(x)\,d\mu(y).
\end{align*}
But then $\int_H \int_H \langle x, y \rangle^k x^\alpha y^\alpha\,d\mu(x)\,d\mu(y) = 0$ for every $\alpha \in (I,\mathbb{Z}_+)$ with $|\alpha| = 2i$, because the kernel inside the double integral is positive definite, continuous and satisfies the conditions of Lemma 2.1, so every term of the sum is nonnegative. In particular, since for every $y \in H$ and $|\alpha| = 2i$ there exists a sequence $(y_l)_{l \in \mathbb{N}}$ that converges to $y$ with $y_l^\alpha \neq 0$, we have that
\[ \int_H \langle x, y \rangle^k x^\alpha\,d\mu(x) = 0, \quad y \in H, \quad \alpha \in (I,\mathbb{Z}_+), \quad |\alpha| = 2i. \]
Then
\[ \int_H \int_H \langle x, y \rangle^k \|x\|^{2i} \|y\|^{2(m-k-i)}\,d\mu(x)\,d\mu(y) = \sum_{|\beta| = m-k-i} \sum_{|\alpha| = i} \frac{(m-k-i)!}{\beta!}\,\frac{i!}{\alpha!} \int_H \int_H \langle x, y \rangle^k x^{2\alpha} y^{2\beta}\,d\mu(x)\,d\mu(y) = 0. \]
By symmetry, the same double integral is zero when $k + 2(m-k-i) \leq n - 1$. Both relations fail only when $n = m$ and $i = n - i - k$. The remaining terms of the sum when $n = m$ are exactly those in the statement of the lemma, after a simplification using those two equalities. The conclusion follows because the kernel $\langle x, y \rangle^k \|x\|^{2l} \|y\|^{2l}$ is continuous, positive definite and satisfies the conditions of Lemma 2.1. □

Corollary 6.3.
Let $\gamma : X \times X \to [0,\infty)$ be a continuous CND kernel such that $x \to \gamma(x,x)$ is a constant function and $2\gamma(x,y) = \gamma(x,x) + \gamma(y,y)$ only when $x = y$. Then for $n \in \mathbb{N}$ and $\mu \in \mathfrak{M}(X)$ such that $\gamma^n \in L^1(|\mu| \times |\mu|)$, the kernel $K_{-\gamma}$ defined in Theorem 3.1 satisfies $(K_{-\gamma})^m \in L^1(|\mu| \times |\mu|)$, $0 \leq m \leq n$, and if
\[ \int_X \int_X K_{-\gamma}(x,y)^m\,d\mu(x)\,d\mu(y) = 0, \quad 0 \leq m \leq n - 1, \]
then
\[ (-1)^n \int_X \int_X \gamma(x,y)^n\,d\mu(x)\,d\mu(y) \geq 0 \]
and
\[ \int_X \int_X \gamma(x,y)^m\,d\mu(x)\,d\mu(y) = 0, \quad 0 \leq m \leq n - 1. \]

Proof.
By the hypothesis on $\gamma$, there exists a Hilbert space $H$ and a continuous and injective function $T : X \to H$ such that
\[ \gamma(x,y) = \|T(x) - T(y)\|_H^2 + c, \]
where $c \geq 0$ is the value of $\gamma$ on the diagonal. If $\mu \in \mathfrak{M}(X)$ is a measure satisfying the conditions of the Corollary, then the image measure $\mu_T \in \mathfrak{M}(H)$ satisfies the conditions of Lemma 6.2. The conclusion follows by standard properties of image measures. □

Proof of Theorem 4.4. By Equation 2.3, we have that
\[ \psi(\gamma(x,y)) = \int_{(0,\infty)} \frac{e^{-\gamma(x,y)r} - e_\ell(r)\,\omega_{\ell,\infty}(\gamma(x,y)r)}{r^\ell}\,d\lambda(r) + \sum_{k=0}^{\ell} a_k \gamma(x,y)^k. \]
By the hypothesis, the functions appearing in this representation belong to $L^1(|\mu| \times |\mu|)$. Corollary 6.3 implies that
\[ \int_X \int_X \sum_{k=0}^{\ell} a_k \gamma(x,y)^k\,d\mu(x)\,d\mu(y) = \int_X \int_X a_\ell \gamma(x,y)^\ell\,d\mu(x)\,d\mu(y) \geq 0. \]
On the other hand, because of Lemma 6.4 we can apply Fubini-Tonelli, and then
\[ \int_X \int_X \left[ \int_{(0,\infty)} \frac{e^{-\gamma(x,y)r} - e_\ell(r)\,\omega_{\ell,\infty}(\gamma(x,y)r)}{r^\ell}\,d\lambda(r) \right] d\mu(x)\,d\mu(y) = \int_{(0,\infty)} \frac{1}{r^\ell} \left[ \int_X \int_X e^{-\gamma(x,y)r}\,d\mu(x)\,d\mu(y) \right] d\lambda(r) \geq 0, \]
because the inner double integral is a nonnegative number for every $r > 0$.

Since the representation of $\psi$ is unique, if $\psi$ is not a polynomial of degree $\ell$ or less then $\lambda((0,\infty)) > 0$. Also, if $2\gamma(x,y) = \gamma(x,x) + \gamma(y,y)$ only when $x = y$, then by [8] the inner double integral is a positive number for every $r > 0$ when $\mu$ is not the zero measure, and then the triple integral is a positive number as well. □

Lemma 6.4.
There exists an $M > 0$, which only depends on $\ell \in \mathbb{Z}_+$, for which
\[ |e^{-rt} - e_\ell(r)\,\omega_{\ell,\infty}(rt)| \leq M r^\ell (1 + t^\ell) \min\{1, r^{-\ell}\}, \quad r > 0, \quad t \geq 0. \tag{6.7} \]

Proof.
Note that $r^\ell \min\{1, r^{-\ell}\} = \min\{r^\ell, 1\}$.

Case $r \geq 1$: In this case, the right-hand side of Equation 6.7 is $M(1 + t^\ell)$, while the left-hand side satisfies
\[ |e^{-rt} - e_\ell(r)\,\omega_{\ell,\infty}(rt)| \leq 1 + |e_\ell(r)\,\omega_{\ell,\infty}(rt)| \leq 1 + \sum_{l=0}^{\ell-1} |e_\ell(r) r^l|\,t^l / l!. \]
Since each function $|e_\ell(r) r^l|$ is bounded, the result follows from the fact that $t^l \leq 1 + t^\ell$.

Case $r < 1$: In this case, the right-hand side of Equation 6.7 is $M(1 + t^\ell) r^\ell$, while the left-hand side satisfies
\[ |e^{-rt} - e_\ell(r)\,\omega_{\ell,\infty}(rt)| \leq |e^{-rt} - \omega_{\ell,\infty}(rt)| + |(e_\ell(r) - 1)\,\omega_{\ell,\infty}(rt)|. \]
The function $[e^{-s} - \omega_{\ell,\infty}(s)] / s^\ell$ is bounded on $s \in [0,\infty)$, and from this we obtain the desired inequality for $|e^{-rt} - \omega_{\ell,\infty}(rt)|$. For the other function we have that
\[ |(e_\ell(r) - 1)\,\omega_{\ell,\infty}(rt)| \leq \sum_{l=0}^{\ell-1} |(e_\ell(r) - 1) r^l|\,t^l / l!. \]
Similarly, since $e_\ell(r) - 1 = -e^{-r} \sum_{k=\ell}^{\infty} r^k / k!$, the functions $(e_\ell(r) - 1) r^l r^{-\ell}$ are bounded on $r \in (0,1)$, and from this we also obtain the desired inequality for $|(e_\ell(r) - 1)\,\omega_{\ell,\infty}(rt)|$, which concludes the proof. □

Section 5.
Proof of Theorem 5.1. Since $H$ is infinite dimensional, let $(e_\iota)_{\iota \in \mathbb{N}}$ be an orthonormal sequence of vectors in $H$. By the Dominated Convergence Theorem, we have that
\[ 0 = \int_H \psi(\|x - y - r e_\iota\|^2)\,d\mu(x) \to \int_H \psi(\|x - y\|^2 + r^2)\,d\mu(x), \quad y \in H, \quad r \in \mathbb{R}, \]
because $\langle x - y, e_\iota \rangle \to 0$ as $\iota \to \infty$ and $|\psi(t)| \leq |\varphi(t)| + |\phi(t)| \lesssim (1 + t)^\ell$, which proves the first assertion.

Now, if $\psi$ is a polynomial of degree $n$, let $t_1, \dots, t_N \in \mathbb{R}$ and $c_1, \dots, c_N \in \mathbb{R}$ (not all null) be such that $\sum_{i=1}^{N} c_i p(t_i) = 0$ for every $p \in \pi_{2n}(\mathbb{R})$. Then if $\|v\| = 1$, the measure $\mu := \sum_{i=1}^{N} c_i \delta_{t_i v} \in \mathfrak{M}(H)$ is nonzero and
\[ \int_H \psi(\|x - y\|^2)\,d\mu(x) = \sum_{i=1}^{N} c_i\,\psi(\|y - \langle y, v \rangle v\|^2 + (\langle y, v \rangle - t_i)^2) = 0, \]
because $t \mapsto \psi(\|y - \langle y, v \rangle v\|^2 + (\langle y, v \rangle - t)^2)$ is a polynomial of degree $2n$ for every fixed $y \in H$.

For the converse, we first show that it is sufficient to prove the case $\ell = 0$. The function $c \in (0,\infty) \to F(c) := \psi(\|x - y\|^2 + c) \in \mathbb{R}$ is differentiable for every $x, y \in H$, and
\[ \frac{\partial F}{\partial c}(c) = \psi'(c + \|x - y\|^2). \]
Since $\psi = \varphi - \phi$, and those functions are elements of $CM_\ell$, we have that $|\psi'(t + c)| \lesssim (1 + t)^{\ell - 1}$ for every $c > 0$. In particular, the derivative is a function in $L^1(|\mu|)$ and
\[ \int_H \psi'(c + \|x - y\|^2)\,d\mu(x) = 0, \quad y \in H, \quad c > 0. \tag{6.8} \]
Since $\psi'(c + \cdot)$ is also the difference between two functions in $CM_{\ell - 1}$ for every $c > 0$, by induction we may assume that $\ell = 0$.

Suppose that $\psi$ is not a polynomial and $\mu$ is a nonzero measure that satisfies the equality in the statement of the Theorem. The function $\psi(c + \cdot)$ is the difference between two completely monotone functions on $[0,\infty)$, so there exists a measure $\beta_c$ on $[0,\infty)$ for which
\[ \psi(c + t) = \int_{[0,\infty)} e^{-rt}\,d\beta_c(r), \quad c > 0, \quad t \geq 0, \]
and $d\beta_{c+s}(r) = e^{-rs}\,d\beta_c(r)$ for every $c, s > 0$. Integrating the function in the hypothesis with respect to the measure $d\mu(y)$, we obtain that
\[ 0 = \int_H \int_H \psi(c + \|x - y\|^2)\,d\mu(x)\,d\mu(y) = \int_{[0,\infty)} \int_H \int_H e^{-r\|x - y\|^2}\,d\mu(x)\,d\mu(y)\,d\beta_c(r), \quad c > 0. \tag{6.9} \]
The continuous and bounded function $I_\mu(r) := \int_H \int_H e^{-r\|x - y\|^2}\,d\mu(x)\,d\mu(y)$, $r \geq 0$, is positive for every $r > 0$. Taking $c = s + 1$, we obtain
\[ 0 = \int_{[0,\infty)} e^{-sr} I_\mu(r)\,d\beta_1(r), \quad s \geq 0. \]
By the uniqueness of the representation of the Laplace transform, this can only occur if the finite measure $I_\mu\,d\beta_1$ is the zero measure on $[0,\infty)$. The behaviour of $I_\mu$ implies that this occurs if and only if $I_\mu(0) = 0$ and $\beta_1$ is a multiple of $\delta_0$; the latter implies that $\psi$ is a constant function, which is a contradiction. □

Proof of Lemma 5.2. By Theorem 5.1 we only need to focus on the finite dimensional case. We prove (i) and (ii) by showing that it is sufficient to prove the case $\ell = 1$,
2, which will follow from (iii) and (iv). For the induction argument on (ii) we assume a more general setting, namely that $\psi(t) = t^{\ell - 1} \log(t) + b t^{\ell - 1}$, with $b \in \mathbb{R}$.

Indeed, suppose that $\ell \geq 3$. Note then that the function $y \in H \to F(y) := \psi(\|x - y\|^2) \in \mathbb{R}$ is twice differentiable in each direction of an orthonormal basis $(e_\iota)_{\iota \in I}$ for $H$, and
\[ \frac{\partial^2 F}{\partial e_\iota^2}(y) = 4\psi''(\|x - y\|^2)(y_\iota - x_\iota)^2 + 2\psi'(\|x - y\|^2). \]
Since $\psi \in C^{\ell - 1}([0,\infty)) \cap CM_\ell$ (or $-\psi$ is an element; the sign makes no difference for the induction step), we have that $|\psi'(t)| \lesssim (1 + t)^{\ell - 1}$ and $|\psi''(t)| \lesssim (1 + t)^{\ell - 2}$. In particular, the second derivative is a function in $L^1(|\mu|)$, and summing over the $\iota$ variable we obtain (with $m = \dim(H)$)
\[ 0 = \int_H \left[ 4\psi''(\|x - y\|^2)\|x - y\|^2 + 2m\,\psi'(\|x - y\|^2) \right] d\mu(x), \quad y \in H. \tag{6.10} \]
When $\psi$ is a function of type (i) or (ii), the integrand in this equation is equal to a positive multiple of $\|x - y\|^{a - 2}$ (or of $\|x - y\|^{2\ell - 4} \log(\|x - y\|)$ plus a multiple of $\|x - y\|^{2\ell - 4}$), which is the induction argument.

Now, let $\psi$ be an arbitrary function in $CM_\ell$, $\ell = 1$,
2, that is not a polynomial. For every t > η t : = t µ − τ t , where τ t = t µ ( H ) δ − ( δ tv µ − δ − tv µ ) / v µ is the vector mean, that is Z H h x , y i d µ ( x ) = h v µ , y i , y ∈ H . On the case ℓ = v µ might not be well defined, on this case define it as the vectorzero. Then η t ( H ) =
0, and if it is well defined v η t =
0. By the hypothesis we obtain that4 Z H Z H ψ ( k x − y k ) d η t ( x ) d η t ( y ) = Z H Z H ψ ( k x − y k ) d τ t ( x ) d τ t ( y )= ψ ( )( t µ ( H ) + ) − ψ ( t k v µ k ) By Theorem 4.4, this is a nonnegative number for every t > ℓ = ( − ) ψ ( t ) convergesto + ∞ as t → ∞ , so if k v µ k 6 = ψ ( ) < v µ = , ψ ( ) =
0. In particular, we obtain that the double integral with respect to $\eta$ is zero, which by Theorem 4.4 implies that $\mu = \mu(H)\delta_0$, because $\psi$ is not a polynomial. From this equality and the initial assumption on $\mu$ we obtain that $\mu(H)\psi(\|y\|^2) = 0$ for every $y \in H$, which can only occur if $\mu$ is the zero measure because $\psi$ is not a polynomial.

The case $\ell =$ […] □

REFERENCES

[1] C. Berg, J. Christensen, and P. Ressel, Harmonic analysis on semigroups: theory of positive definite and related functions, vol. 100 of Graduate Texts in Mathematics, Springer, 1984.
[2] NIST Digital Library of Mathematical Functions. F. W. J. Olver, A. B. Olde Daalhuis, D. W. Lozier, B. I. Schneider, R. F. Boisvert, C. W. Clark, B. R. Miller, B. V. Saunders, H. S. Cohl, and M. A. McClain, eds.
[3] J. Faraut and K. Harzallah, Distances hilbertiennes invariantes sur un espace homogène, Annales de l'Institut Fourier, 24 (1974), pp. 171–217.
[4] K. Fukumizu, F. R. Bach, and M. I. Jordan, Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces, Journal of Machine Learning Research, 5 (2004), pp. 73–99.
[5] R. Gangolli, Positive definite kernels on homogeneous spaces and certain stochastic processes related to Lévy Brownian motion of several parameters, Annales de l'Institut Henri Poincaré Probabilités et Statistiques, 3 (1967), pp. 121–226.
[6] I. M. Gelfand and N. Y. Vilenkin, Generalized Functions, Vol. 4: Applications of Harmonic Analysis, Academic Press, 1964.
[7] A. Gretton, K. Borgwardt, M. Rasch, B. Schölkopf, and A. Smola, A kernel method for the two-sample-problem, Advances in Neural Information Processing Systems, 19 (2006), pp. 513–520.
[8] J. C. Guella, On Gaussian kernels on Hilbert spaces and kernels on Hyperbolic spaces, arXiv e-prints, (2020), p. arXiv:2007.14697.
[9] K. Guo, S. Hu, and X. Sun, Conditionally positive definite functions and Laplace-Stieltjes integrals, Journal of Approximation Theory, 74 (1993), pp. 249–265.
[10] A. L. Koldobskii, Isometric operators in vector-valued $L^p$-spaces, Journal of Soviet Mathematics, 36 (1987), pp. 420–423.
[11] W. Linde, On Rudin's equimeasurability theorem for infinite dimensional Hilbert spaces, Indiana University Mathematics Journal, 35 (1986), pp. 235–243.
[12] R. Lyons, Distance covariance in metric spaces, Ann. Probab., 41 (2013), pp. 3284–3305.
[13] R. Lyons, Hyperbolic space has strong negative type, Illinois J. Math., 58 (2014), pp. 1009–1013.
[14] R. Lyons, Strong negative type in spheres, Pacific Journal of Mathematics, 307 (2020), pp. 383–390.
[15] L. Mattner, Strict definiteness of integrals via complete monotonicity of derivatives, Transactions of the American Mathematical Society, 349 (1997), pp. 3321–3342.
[16] C. A. Micchelli, Interpolation of scattered data: distance matrices and conditionally positive definite functions, Constructive Approximation, 2 (1984), pp. 11–22.
[17] C. A. Micchelli, Y. Xu, and H. Zhang, Universal kernels, Journal of Machine Learning Research, 7 (2006), pp. 2651–2667.
[18] R. L. Schilling, R. Song, and Z. Vondracek, Bernstein functions: theory and applications, vol. 37, Walter de Gruyter, 2012.
[19] D. Sejdinovic, B. Sriperumbudur, A. Gretton, and K. Fukumizu, Equivalence of distance-based and RKHS-based statistics in hypothesis testing, The Annals of Statistics, (2013), pp. 2263–2291.
[20] C.J. Simon-Gabriel and B. Schölkopf, Kernel distribution embeddings: Universal kernels, characteristic kernels and kernel metrics on distributions, Journal of Machine Learning Research, 19 (2018), pp. 1–29.
[21] B. K. Sriperumbudur, K. Fukumizu, and G. R. Lanckriet, Universality, characteristic kernels and RKHS embedding of measures, Journal of Machine Learning Research, 12 (2011), pp. 2389–2410.
[22] I. Steinwart and A. Christmann, Support vector machines, Springer Science & Business Media, 2008.
[23] G. J. Székely and M. L. Rizzo, Energy statistics: A class of statistics based on distances, Journal of Statistical Planning and Inference, 143 (2013), pp. 1249–1272.
[24] G. J. Székely, M. L. Rizzo, et al., Testing for equal distributions in high dimension, InterStat, 5 (2004), pp. 1249–1272.
[25] H. Wendland, Scattered data approximation, vol. 17, Cambridge University Press, 2005.

Email address: [email protected]
RIKEN Center for Advanced Intelligence Project, Tokyo, Japan