Nonparametric density estimation for intentionally corrupted functional data
Aurore Delaigle* and Alexander Meister†

*School of Mathematics and Statistics and Australian Research Council Centre of Excellence for Mathematical and Statistical Frontiers, University of Melbourne, Australia; email: [email protected]

†Institut für Mathematik, Universität Rostock, D-18051 Rostock, Germany; email: [email protected]

Abstract
We consider statistical models where functional data are artificially contaminated by independent Wiener processes in order to satisfy privacy constraints. We show that the corrupted observations have a Wiener density which determines the distribution of the original functional random variables, masked near the origin, uniquely, and we construct a nonparametric estimator of that density. We derive an upper bound for its mean integrated squared error which has a polynomial convergence rate, and we establish an asymptotic lower bound on the minimax convergence rates which is close to the rate attained by our estimator. Our estimator requires the choice of a basis and of two smoothing parameters. We propose data-driven ways of choosing them and prove that the asymptotic quality of our estimator is not significantly affected by the empirical parameter selection. We examine the numerical performance of our method via simulated examples.
Keywords: classification, convergence rates, differential privacy, infinite-dimensional Gaussian mixtures, Wiener densities.
AMS subject classification 2010:
1 Introduction

Data privacy is an important feature of a database, where the collected data are transformed and released so as to make it difficult to identify individuals participating in a study. Various privatisation methods are in use, resulting in different privacy constraints, of which a popular one is differential privacy. We refer to Wasserman and Zhou (2010) for a statistical introduction to differential privacy. The privatisation mechanism typically has an impact on the statistical analysis of the data, and one of the research directions in statistical privacy is to find ways of ensuring differential privacy while keeping as much of the information as possible from the original database (see e.g. Hall et al., 2013 in the functional data context and Karwa and Slavković, 2016 in the setting of synthetic graphs).

One simple way of ensuring differential privacy is to contaminate the data artificially with additive random noise; see for example Wasserman and Zhou (2010). In the functional data context, Hall et al. (2013) propose a data release mechanism where the observed functional data are contaminated by adding to each function a random Gaussian process (one per functional observation) that is independent of the original data. Proposition 3.3 in Hall et al. (2013) roughly says that the data can be made differentially private whenever the scaling noise factor of the Gaussian process is sufficiently large.

In this paper, we show that if the Gaussian process is a Wiener process and the value of the raw data is masked at the origin, then the contaminated data are differentially private, but they also have a density. This contrasts with the usual functional data setting, where the assumption that all measures which are admitted to be the true image measure of functional random variables are dominated by a known basic measure seems very hard to justify. There exists no canonical basic measure such as the Lebesgue measure for finite-dimensional Euclidean data or the Haar measure for data in general locally compact groups. As a result, inference and descriptive summaries of functional data are often based on pseudo-densities. See for example Delaigle and Hall (2010) and Ciollaro et al. (2016). Recently, Lin et al. (2018) considered the estimation of densities for functions which lie in a dense subset $S$ of the Hilbert space $L^2(D)$, where $D$ is a finite interval. There, $S$ is defined as the (non-closed) linear hull of an orthonormal basis of $L^2(D)$ and does not contain the functional data contaminated by Wiener processes that we consider. Privacy issues for functional data are also discussed in the recent work of Mirshani et al. (2017). Therein, the authors also deduce the existence of a Gaussian density for fixed functional observations, but nonparametric estimation of that density is not studied.

By contrast, with the privatisation process we propose, the privatised functional data have a Radon-Nikodym derivative (thus a true, non pseudo, density) with respect to the Wiener measure.
Exploiting the fact that the contaminating distribution is usually known in this context, we consider statistical inference from such privatised functional data.

To our knowledge, most existing nonparametric approaches for estimating a Wiener density are motivated by diffusion processes. Although these do not include the type of functional data we consider, some of these methods can be applied in our context. See for example Dabo-Niang (2004a), who suggests an orthogonal series estimator, Dabo-Niang (2002, 2004b) and Ferraty and Vieu (2006), who propose a kernel density estimator (see also Prakasa Rao, 2010a, for a generalisation in the case of diffusion processes), and Prakasa Rao (2010b) and Chesneau et al. (2013), who construct a wavelet estimator. See also Baíllo et al. (2011) for a parametric context where the data and the reference measure are Gaussian. However, these methods either suffer from slow logarithmic convergence rates, or are derived under abstract assumptions that seem hard to justify in our context, or seem difficult to implement in practice. We propose a fully data-driven estimator which has fast polynomial convergence rates under simple conditions. Although our estimator is motivated by the privacy setting we consider, our results can be extended to more general cases of functional data which have a Wiener density.

This paper is organised as follows. In Section 2 we introduce our statistical model and show that the Wiener density exists and determines uniquely the image measure of the raw functional random variables masked near zero. Moreover, we prove that the privacy constraints are fulfilled when the noise level is sufficiently large. In Section 3 we construct a nonparametric orthonormal series estimator of the Wiener density and propose data-driven procedures for choosing the basis (Section 3.4) and the smoothing parameters (Section 3.5). In Section 4 we derive an explicit upper bound for the mean integrated squared error of our estimator and show that it achieves polynomial convergence rates under intuitive tail restrictions and metric entropy constraints on the measure of the original data. Functional data problems in which such fast rates are available are rare; usually the achievable rates are only logarithmic or sub-polynomial; see e.g. Dabo-Niang (2004a), Mas (2012) and Meister (2016). Finally, we derive a lower bound on the mean integrated squared error under our intuitive conditions, and we show that choosing the parameters in a data-driven way does not significantly deteriorate the asymptotic performance of our procedure (thus we establish a weak adaptivity result). Numerical simulations are provided in Section 5. The proofs are deferred to the supplement.

2 The model and its basic properties

2.1 The model

We observe functional data $Y_1, \ldots, Y_n$ defined on $[0,1]$, where
\[ Y_j = X_j + \sigma W_j, \qquad j = 1, \ldots, n, \tag{2.1} \]
and the random functions $X_j$ and $W_j$, $j = 1, \ldots, n$, are totally independent. Here, $X_j$ represents the $j$th function of interest, which is corrupted by a standard Wiener process $W_j$ with a deterministic scaling factor $\sigma > 0$.
Unlike typical measurement error problems, where contamination is due to imprecise measurement or unavoidable perturbation, here the data are contaminated artificially and we can assume that $\sigma$ is known.

We assume that the $X_j$'s take their values in $C_{0,0}([0,1])$, where $C_{0,\ell}([0,1])$ denotes the class of $\ell$ times continuously differentiable (or just continuous when $\ell = 0$) functions $f$ defined on $[0,1]$ that are such that $f(0) = 0$. The $X_j$'s have an unknown probability measure $P_X$ on the Borel $\sigma$-field $\mathcal{B}(C_{0,0}([0,1]))$, where $C_{0,0}([0,1])$ is equipped with the uniform norm $\|\cdot\|_\infty$. Throughout we use the notation $V_j = \sigma W_j$, and we use $V$, $W$, $X$ and $Y$ to denote a generic function that has the same distribution as, respectively, the $V_j$'s, $W_j$'s, the $X_j$'s and the $Y_j$'s. Critically here, the functional data $X_j$ are assumed to satisfy $X_j(0) = 0$. Indeed, since $W_j(0) = 0$, then $Y_j(0) = X_j(0)$, and if the value of $X_j$ at zero is not masked, then individuals can be identified from $Y_j(0)$. In practice, if the raw data do not satisfy $X_j(0) = 0$, they can be pre-masked at zero before the contamination step, for example by replacing $X_j$ by $\tilde X_j = X_j - X_j(0)$ or $\tilde X_j = X_j w$, where $w$ is a smooth function such that $w(0) = 0$ and $w(1) = 1$.
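To fix ideas, the following minimal sketch (Python/NumPy) generates privatised curves according to the release mechanism (2.1) on a discrete grid. The grid size, the sample size, the smooth curves $X_j$ and the noise level $\sigma$ are illustrative placeholders, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T, sigma = 5, 101, 0.5            # sample size, grid size, noise level (illustrative)
t = np.linspace(0.0, 1.0, T)
dt = t[1] - t[0]

# Placeholder smooth curves X_j; they vanish at t = 0, i.e. they are pre-masked.
X = np.sin(np.pi * np.outer(rng.uniform(1.0, 3.0, n), t))

# Standard Wiener paths on the grid: W(0) = 0 and independent N(0, dt) increments.
W = np.zeros((n, T))
W[:, 1:] = np.cumsum(rng.normal(0.0, np.sqrt(dt), (n, T - 1)), axis=1)

Y = X + sigma * W                    # released (privatised) data, the model (2.1)
```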
2.2 Existence and uniqueness of the Wiener density

In this section, we show that the $Y_j$'s have a well defined density with respect to the scaled Wiener measure, and that this density characterises the distribution of the $X_j$'s uniquely. Finally, we show that the contamination process ensures differential privacy. To ensure existence of a density, we need the following assumption, which we assume throughout this work:

Assumption 1 $X \in C_{0,2}([0,1])$ almost surely.

Under Assumption 1, it follows from the Cameron-Martin theorem that, for any bounded and continuous functional $\varphi$ from $C_{0,0}([0,1])$ to $\mathbb{R}$,
\begin{align*}
E\{\varphi(Y)\} = E\{\varphi(X + \sigma W)\} &= E\Big\{ \varphi(\sigma W)\, \exp\Big( \frac{1}{\sigma} \int_0^1 X'(t)\,dW(t) \Big) \exp\Big( -\frac{1}{2\sigma^2} \int_0^1 |X'(t)|^2\,dt \Big) \Big\} \\
&= E\Big[ \varphi(V)\, E\Big\{ \exp\Big( \frac{1}{\sigma^2} \int_0^1 X'(t)\,dV(t) \Big) \exp\Big( -\frac{1}{2\sigma^2} \int_0^1 |X'(t)|^2\,dt \Big) \,\Big|\, V \Big\} \Big],
\end{align*}
so that, by integration by parts, we have, a.s.,
\begin{align*}
\frac{dP_Y}{dP_V}(V) &= E\Big[ \exp\Big\{ \frac{1}{\sigma^2} \int_0^1 X'(t)\,dV(t) - \frac{1}{2\sigma^2} \int_0^1 |X'(t)|^2\,dt \Big\} \,\Big|\, V \Big] \\
&= \int \exp\Big\{ \frac{1}{\sigma^2}\, x'(1) V(1) - \frac{1}{\sigma^2} \int_0^1 x''(t) V(t)\,dt - \frac{1}{2\sigma^2} \int_0^1 |x'(t)|^2\,dt \Big\}\,dP_X(x). \tag{2.2}
\end{align*}
Applying the factorization lemma to this conditional expectation, we deduce that there exists a Borel measurable mapping $f_Y: C_{0,0}([0,1]) \to \mathbb{R}$ such that $f_Y(V)$ is equal to the right hand side of (2.2) almost surely. This implies that $f_Y$ is the density of $P_Y$ with respect to $P_V$. Thus the contaminated $Y_j$'s have a density $f_Y$. The next theorem establishes its connection with the measure of the $X_j$'s.
Theorem 2.1. The functional density $f_Y$ in (2.2) characterises the probability measure $P_X$ uniquely.

We deduce from this theorem that inference about $P_X$ (e.g. goodness-of-fit tests or classification problems; see Section 2.3) can be performed via $f_Y$. To use this result in practice, it remains to see whether we can estimate $f_Y$ nonparametrically using the data $Y_1, \ldots, Y_n$. This is what we study in Section 3.

Throughout we use the notation $\langle \cdot, \cdot \rangle$ for the inner product of $L^2([0,1])$ and $\|\cdot\|$ for the corresponding norm, and we make the following assumption:
Assumption 2 For some constant $C_{X,1} \in (0, \infty)$, we have that $\|X'\| \le C_{X,1}$ a.s.

The following proposition shows that if the scaling factor $\sigma$ is large enough, the contaminated data are privatised. For the definition of $(\alpha, \beta)$-privacy we refer to Hall et al. (2013); in our setting this criterion means that
\[ P[x + \sigma W \in B] \le \exp(\alpha) \cdot P[\tilde x + \sigma W \in B] + \beta, \qquad \forall B \in \mathcal{B}(C_{0,0}([0,1])), \]
for all $x, \tilde x \in C_{0,2}([0,1])$ with $\max\{\|x'\|, \|\tilde x'\|\} \le C_{X,1}$.

Proposition 2.1. For any $\alpha, \beta > 0$, choosing $\sigma > 2 C_{X,1} \sqrt{2 \log(2/\beta)}\,/\alpha$ guarantees $(\alpha, \beta)$-privacy of the observation of $Y = X + \sigma W$ under Assumptions 1 and 2.
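As a quick numerical illustration, here is a sketch of the noise-level bound of Proposition 2.1 as reconstructed above, with the constant $c(\beta) = \sqrt{2\log(2/\beta)}$ taken from Hall et al. (2013); the input values are arbitrary examples.

```python
import math

def min_sigma(alpha, beta, C_X1):
    # Any sigma strictly larger than this value yields (alpha, beta)-privacy
    # by Proposition 2.1: sigma > 2 * C_X1 * sqrt(2 * log(2 / beta)) / alpha.
    return 2.0 * C_X1 * math.sqrt(2.0 * math.log(2.0 / beta)) / alpha

print(min_sigma(alpha=1.0, beta=0.01, C_X1=1.0))   # approximately 6.51
```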
2.3 Applications

The existence of a density for contaminated data has important practical applications. One of them is goodness-of-fit testing. Goodness-of-fit tests for functional data have been considered in e.g. Bugni et al. (2009). In our context, using the observed i.i.d. contaminated functional data $Y_1, \ldots, Y_n$, the problem consists in testing the null hypothesis $H_0: X \sim P_X^0$ versus the alternative $H_1: X \not\sim P_X^0$, for some fixed probability measure $P_X^0$ on $\mathcal{B}(C_{0,0}([0,1]))$. By Theorem 2.1, $H_0$ is equivalent to the claim that $Y$ has the functional density $f_Y^0 = d(P_X^0 * P_V)/dP_V$. Using the estimator $\hat f_Y$ of $f_Y$ that we introduce in Section 3, a testing procedure could be based on
\[ T(Y_1, \ldots, Y_n) := \begin{cases} 1, & \text{for } \int \big| \hat f_Y(y) - f_Y^0(y) \big|^2\,dP_V(y) > \rho, \\ 0, & \text{otherwise,} \end{cases} \]
where $\rho$ is a threshold parameter. In Theorem 4.1 we derive an upper bound on the mean integrated squared error of our estimator $\hat f_Y$. Using the Markov inequality, we deduce that the test can attain any given significance level $\alpha > 0$ by taking $\rho$ larger or equal to the ratio of this upper bound and $\alpha$. While this gives some insights about $\rho$, this upper bound does not provide a data-driven rule for selecting $\rho$ in practice. The latter is a difficult problem; for example, it requires deriving the asymptotic distribution of the fully data-driven estimator. This goes beyond the scope of this paper and we leave this issue open for future research.
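Since $P_V$ is known, the integral defining $T$ can be approximated by Monte Carlo over draws from $P_V$. A sketch follows; the two density evaluators and the path sampler are user-supplied placeholders, not functions from the paper.

```python
import numpy as np

def gof_test(f_hat, f0, draw_V, rho, n_mc=10_000, seed=0):
    """Monte Carlo version of the test statistic T: reject H0 when the
    P_V-integral of |f_hat - f0|^2 exceeds the threshold rho.
    f_hat and f0 map a path (or its coefficient vector) to a density value;
    draw_V(rng) samples one path from P_V."""
    rng = np.random.default_rng(seed)
    stat = np.mean([(f_hat(v) - f0(v)) ** 2
                    for v in (draw_V(rng) for _ in range(n_mc))])
    return int(stat > rho)
```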
Another interesting application is classification, which, in our context, can be expressed as follows. We observe training contaminated data pairs $(Y_i, I_i)$, $i = 1, \ldots, n$, where $Y_i = X_i + V_i$, the $X_i$'s come from two distinct populations $\Pi_0$ and $\Pi_1$, and the class label $I_i = k$ if $X_i$ comes from population $\Pi_k$, for $k = 0, 1$. The $V_i$'s are Wiener processes independent of the $X_i$'s, and are identically distributed within each population, but the scaling noise parameter $\sigma$ need not be the same for the two populations. Using these data, the goal is to classify in $\Pi_0$ or $\Pi_1$ a new random curve $Y = X + V$, where $X$ comes from either $\Pi_0$ or $\Pi_1$, but whose class label is unknown.

It is well known in general classification problems that the optimal classifier is the Bayes classifier which, adapted to our context, assigns a curve to $\Pi_1$ if $E(I \mid Y = y) > 1/2$, and to $\Pi_0$ otherwise. In the case where the probability measures $P_{Y,0}$ and $P_{Y,1}$ of the $Y_i$'s that originate from, respectively, $\Pi_0$ and $\Pi_1$, have well defined densities $f_{Y,0}$ and $f_{Y,1}$, the Bayes classifier can be expressed as: assign $Y$ to $\Pi_1$ if $\pi_1 f_{Y,1}(Y) > \pi_0 f_{Y,0}(Y)$, and to $\Pi_0$ otherwise, where $\pi_k = P(I = k)$. In the particular Gaussian case, Baíllo et al. (2011) showed that these densities are well defined and showed how to estimate them.
In our case the $Y_i$'s are generally not Gaussian, but they have functional densities $f_{Y,k} = dP_{Y,k}/dP_V$, for $k = 0, 1$. Since $P_{X,0} \ne P_{X,1}$ implies that $f_{Y,0} \ne f_{Y,1}$ (see Theorem 2.1), these densities can be used for classification of $X$ from observations on $Y$ in the optimal Bayes classifier. There, in practice we classify $Y$ in $\Pi_1$ if $\pi_1 \hat f_{Y,1}(Y) \ge \pi_0 \hat f_{Y,0}(Y)$, and in $\Pi_0$ otherwise, where, for $k = 0, 1$,
$\hat f_{Y,k}$ denotes the estimator of $f_{Y,k}$ from Section 3, constructed from the training data $Y_i$ for which $I_i = k$.

There exist many other classification procedures for functional data, often based on pseudo-densities or finite dimensional approximations. However, Delaigle and Hall (2012) pointed out that, except in the Gaussian case, often such projections do not ensure good finite sample performance. See for example Hall et al. (2001), Ferraty and Vieu (2006), Escabias et al. (2007), Preda et al. (2007) and Shin (2008). See also Dai et al. (2017) for a recent example, where the authors approximate the densities in the two populations by the finite dimensional surrogate densities proposed in Delaigle and Hall (2010); see Delaigle and Hall (2013) for a related classifier.
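Once the two class-wise density estimators are available, the empirical Bayes rule above is straightforward to apply; a minimal sketch, in which the estimators and the prior proportions are placeholders:

```python
def classify(Y_new, f_hat0, f_hat1, pi0, pi1):
    """Empirical Bayes rule of Section 2.3: assign the new curve to
    population Pi_1 when pi1 * f_hat1(Y) >= pi0 * f_hat0(Y), and to Pi_0
    otherwise. f_hat0 and f_hat1 are the class-wise density estimators of
    Section 3, trained on the curves with labels 0 and 1, respectively."""
    return 1 if pi1 * f_hat1(Y_new) >= pi0 * f_hat0(Y_new) else 0
```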
3 Nonparametric estimation of the Wiener density

In this section we consider the problem of estimating the functional density $f_Y$ nonparametrically.

3.1 Existing estimators

Nonparametric estimation of a density for stochastic processes whose probability measure has a Radon-Nikodym derivative with respect to the Wiener measure has been considered by several authors. In Dabo-Niang (2002, 2004b), the author proposes to use a kernel density estimator; see also Prakasa Rao (2010a). This estimator is simple but it suffers from slow logarithmic convergence rates, which are reflected by its practical performance. A wavelet estimator with polynomial convergence rates was proposed by Prakasa Rao (2010b) and Chesneau et al. (2013), but their conditions are quite technical and it is not clear how their parameters can be chosen in practice. Moreover, their theory is derived under abstract high level conditions which might not be easily satisfied in our context.

A simpler estimator is the orthogonal series estimator of Dabo-Niang (2004a), defined as follows. Let $\{\varphi_j\}_{j \in \mathbb{N}}$ denote an orthonormal basis of real-valued functions of $[0,1]$ with each $\varphi_j \in L^2([0,1])$, and let $(H_j)_{j \ge 0}$ denote the scaled Hermite polynomials defined by $H_k(x) = (-1)^k \phi^{(k)}(x)/\{\phi(x)\sqrt{k!}\}$, for all integers $k \ge 0$, where $\phi(x) = \exp(-x^2/2)/\sqrt{2\pi}$. Also, for $x \in C([0,1])$, let
\[ \beta'_{x,\ell} = \int_0^1 \varphi_\ell(t)\,dx(t). \tag{3.1} \]
Using results from Cameron and Martin (1947), the author notes that, as $K \to \infty$, the Fourier-Hermite series $(\Psi_{k_1,\ldots,k_K})_{0 \le k_1 \le K, \ldots, 0 \le k_K \le K}$, where, for $x \in C([0,1])$,
\[ \Psi_{k_1,\ldots,k_K}(x) \equiv H_{k_1,\ldots,k_K}(\beta'_{x,1}, \ldots, \beta'_{x,K}) \equiv \prod_{\ell=1}^K H_{k_\ell}(\beta'_{x,\ell}), \tag{3.2} \]
forms an orthonormal basis of the Hilbert space of all functionals of $C([0,1])$ that are square-integrable with respect to the Wiener measure. This motivated the author to estimate the Wiener density $f_T$ of functional data $T_1, \ldots, T_n$ (that have a Wiener density) by
\[ \hat f_T^K(x) = \sum_{k_1,\ldots,k_K=0}^K \frac{1}{n} \sum_{j=1}^n H_{k_1,\ldots,k_K}(\beta'_{T_j,1}, \ldots, \beta'_{T_j,K}) \cdot H_{k_1,\ldots,k_K}(\beta'_{x,1}, \ldots, \beta'_{x,K}), \tag{3.3} \]
where $K$ is a smoothing parameter. This estimator is very attractive for its simplicity. However, a drawback is that the rates derived by Dabo-Niang (2004a) are logarithmic. In the next two sections, using a two-stage approximation approach (first a sieve approximation of $f_Y$ and then an estimator of the approximation), we are able to introduce a different regularisation scheme which involves two parameters. This increases the flexibility of the estimator, which, as we shall see, enables us to obtain polynomial convergence rates. Moreover, we provide data-driven choices of the basis and the threshold parameters.
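The scaled Hermite polynomials $H_k$ used throughout can be evaluated stably via the normalised three-term recurrence $H_{k+1}(x) = \{x H_k(x) - \sqrt{k}\,H_{k-1}(x)\}/\sqrt{k+1}$, which follows from the classical recurrence for the probabilists' Hermite polynomials after dividing by $\sqrt{k!}$. A minimal sketch:

```python
import numpy as np

def hermite_scaled(K, x):
    """Evaluate H_0, ..., H_K at x, where H_k = He_k / sqrt(k!) and He_k are
    the probabilists' Hermite polynomials; these H_k are orthonormal with
    respect to the standard normal density."""
    x = np.asarray(x, dtype=float)
    H = np.empty((K + 1,) + x.shape)
    H[0] = 1.0
    if K >= 1:
        H[1] = x
    for k in range(1, K):
        H[k + 1] = (x * H[k] - np.sqrt(k) * H[k - 1]) / np.sqrt(k + 1)
    return H
```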
3.2 Finite-dimensional approximation of $f_Y$

Recall from (2.2) that for $V = \sigma W$ with $W$ a standard Wiener process, we have
\[ f_Y(V) = E\Big[ \exp\Big\{ \frac{1}{\sigma^2} \int_0^1 X'(t)\,dV(t) - \frac{1}{2\sigma^2} \int_0^1 |X'(t)|^2\,dt \Big\} \,\Big|\, V \Big], \quad \text{a.s.}, \]
and that our goal is to estimate $f_Y$ from data $Y_1, \ldots, Y_n$. Instead of directly expressing $f_Y$ in the Fourier-Hermite basis at (3.2), we first construct a sieve approximation of $f_Y$, and then (see Section 3.3) we express our sieve approximation in the Fourier-Hermite basis.

Using the notation $\beta'_{x,\ell} = \int_0^1 \varphi_\ell(t)\,dx(t)$ from Equation (3.1), where $\{\varphi_j\}_{j \in \mathbb{N}}$ is a real-valued orthonormal basis of $L^2([0,1])$, we have
\[ \int_0^1 X'(t)\,dV(t) - \int_0^1 |X'(t)|^2\,dt = \sum_{j=1}^\infty \beta'_{X,j} \cdot \beta'_{V,j} - \sum_{j=1}^\infty \beta'^2_{X,j}, \tag{3.4} \]
where the infinite sums should be understood as mean squared limits. Truncating the sums to $m$ terms, with $m \ge 1$, we approximate $f_Y(V)$ by $f_Y^{[m]}(\beta'_{V,1}, \ldots, \beta'_{V,m})$, where, for all $s_1, \ldots, s_m \in \mathbb{R}$,
\[ f_Y^{[m]}(s_1, \ldots, s_m) = E\Big\{ \exp\Big( \frac{1}{\sigma^2} \sum_{j=1}^m \beta'_{X,j}\, s_j - \frac{1}{2\sigma^2} \sum_{j=1}^m \beta'^2_{X,j} \Big) \Big\} = \exp\Big( \frac{1}{2\sigma^2} \sum_{j=1}^m s_j^2 \Big) \int \exp\Big\{ -\frac{1}{2\sigma^2} \sum_{j=1}^m (s_j - x_j)^2 \Big\}\,dP_{X,m}(x_1, \ldots, x_m), \tag{3.5} \]
and where $P_{X,m}$ denotes the measure of $(\beta'_{X,1}, \ldots, \beta'_{X,m})$. The following lemma shows that, as long as $m$ is sufficiently large, $f_Y^{[m]}(\beta'_{V,1}, \ldots, \beta'_{V,m})$ is a good approximation to $f_Y(V)$, where $V$ denotes a generic $V_i \sim P_V$.

Lemma 3.1. Let $\mathcal{A}_m$ denote the $\sigma$-field generated by $\beta'_{V_1,1}, \ldots, \beta'_{V_1,m}$. Under Assumptions 1 and 2,
(a) $f_Y^{[m]}(\beta'_{V_1,1}, \ldots, \beta'_{V_1,m}) = E\{f_Y(V_1) \mid \mathcal{A}_m\}$ a.s.;
(b) we have
\[ E\big| f_Y^{[m]}(\beta'_{V_1,1}, \ldots, \beta'_{V_1,m}) - f_Y(V_1) \big|^2 \le \frac{1}{\sigma^2} \cdot \exp\big( C_{X,1}^2/\sigma^2 \big) \cdot \Big( \sum_{j,j'>m} \big| \langle \varphi_j, \Gamma_X \varphi_{j'} \rangle \big|^2 \Big)^{1/2}, \]
where the linear operator $\Gamma_X: L^2([0,1]) \to L^2([0,1])$ is defined by
\[ \big( \Gamma_X f \big)(t) = E\Big\{ X'(t) \int_0^1 X'(s) f(s)\,ds \Big\}, \qquad t \in [0,1],\ f \in L^2([0,1]). \tag{3.6} \]

Since $\Gamma_X$ is a self-adjoint and positive-semidefinite Hilbert-Schmidt operator, the upper bound in Lemma 3.1(b) is finite for any orthonormal basis $\{\varphi_j\}_j$ of $L^2([0,1])$ and converges to zero as $m \to \infty$. Indeed, Assumption 2 guarantees that $\sum_{j,j'} |\langle \varphi_j, \Gamma_X \varphi_{j'} \rangle|^2 \le E\|X'\|^4 \le C_{X,1}^4 < \infty$. If $X$ (and hence $X'$) is centered, then $\Gamma_X$ coincides with the covariance operator of $X'$.

3.3 Estimating the sieve approximation of $f_Y$

Next we show how to estimate $f_Y^{[m]}$ using a Fourier-Hermite series. For this, let $P_{Y,m}$ and $f_{Y,m}$ denote, respectively, the measure and the $m$-dimensional Lebesgue density of the observed random vectors $(\beta'_{Y_j,1}, \ldots, \beta'_{Y_j,m})$, where
\[ \beta'_{Y_j,k} = \int_0^1 \varphi_k(t)\,dY_j(t) = \beta'_{X_j,k} + \beta'_{V_j,k}, \qquad j = 1, \ldots, n;\ k = 1, \ldots, m. \]
Let $g_\sigma$ denote the $N(0, \sigma^2 I_m)$-density, with $I_m$ the $m \times m$ identity matrix, let $L^2_{g_\sigma}(\mathbb{R}^m)$ denote the Hilbert space of Borel measurable functions $f: \mathbb{R}^m \to \mathbb{R}$ which satisfy $\|f\|^2_{g_\sigma} \equiv \int |f(t)|^2 g_\sigma(t)\,dt < \infty$, and let $\langle \cdot, \cdot \rangle_{g_\sigma}$ denote the inner product of $L^2_{g_\sigma}(\mathbb{R}^m)$. It is easy to deduce from (3.5) that
\[ f_Y^{[m]}(s_1, \ldots, s_m) = f_{Y,m}(s_1, \ldots, s_m)/g_\sigma(s_1, \ldots, s_m), \tag{3.7} \]
and it can be proved that $f_Y^{[m]} \in L^2_{g_\sigma}(\mathbb{R}^m)$. Therefore, if $\Psi_1, \Psi_2, \ldots$ is an orthonormal basis of $L^2_{g_\sigma}(\mathbb{R}^m)$, we can write
\[ f_Y^{[m]} = \sum_{k=1}^\infty \alpha_k \Psi_k, \qquad \alpha_k = \langle \Psi_k, f_Y^{[m]} \rangle_{g_\sigma} = \int \Psi_k(y) f_{Y,m}(y)\,dy = E\{\Psi_k(\beta'_{Y,1}, \ldots, \beta'_{Y,m})\}. \]
Now the sequence $(H_{k_1,\ldots,k_m})_{k_1,\ldots,k_m \ge 0}$ of functions $H_{k_1,\ldots,k_m}(x_1, \ldots, x_m) = \prod_{j=1}^m H_{k_j}(x_j)$ defined at (3.2) forms an orthonormal basis of $L^2_{g_1}(\mathbb{R}^m)$. Thus we can take $\Psi_k(\cdot) = H_{k_1,\ldots,k_m}(\cdot/\sigma)$. To estimate $f_Y^{[m]}$, we replace $\alpha_k$ by $\hat\alpha_k = n^{-1} \sum_{j=1}^n \Psi_k(\beta'_{Y_j,1}, \ldots, \beta'_{Y_j,m})$. Finally, for $U$ a functional random variable independent of $Y_1, \ldots, Y_n$ which has a density with respect to $P_V$, we define our estimator of $f_Y(U)$ by
\[ \hat f_Y^{[m,K]}(U) = \sum_{k_1,\ldots,k_m \ge 0} \frac{1}{n} \sum_{j=1}^n H_{k_1,\ldots,k_m}(\beta'_{Y_j,1}/\sigma, \ldots, \beta'_{Y_j,m}/\sigma)\, H_{k_1,\ldots,k_m}\big( \beta'_{U,1}/\sigma, \ldots, \beta'_{U,m}/\sigma \big) \times \omega_K(k_1 + \cdots + k_m)\, 1\{k_1 + \cdots + k_m \le K\}, \tag{3.8} \]
where $K \ge 0$ is a smoothing parameter and the weight function $\omega_K$ satisfies $0 \le \omega_K(x) \le 1$ for $x \in [0, K]$. The term $\omega_K(k_1 + \cdots + k_m)\, 1\{k_1 + \cdots + k_m \le K\}$ prevents the $k_i$'s from being too large, which controls the variability of the estimator. Using wavelet terminology, the function $\omega_K$ dictates whether the $k_i$'s are chosen by a soft or a hard rule. Specifically, a hard rule corresponds to $\omega_K \equiv 1$: here all $k_i$'s summing to at most $K$ are given equal weight and, as $K$ increases, new indices appear and play as big a role as older ones. For a soft rule, $\omega_K(x)$ is taken to be a smooth decreasing function of $x$, e.g. $\omega_K(x) = 1 - x/(K+1)$; as $K$ increases, new indices start playing a role but have less weight than former ones.
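A direct, brute-force sketch of (3.8) follows. It enumerates all multi-indices with $k_1 + \cdots + k_m \le K$ (their number is $\binom{K+m}{K}$, so this is only feasible for small $m$ and $K$) and takes the projection coefficients $\beta'_{Y_j,\ell}$ and $\beta'_{U,\ell}$ as inputs; the function and argument names are ours, not the paper's.

```python
import numpy as np
from itertools import product

def hermite_scaled(K, x):
    # Orthonormal Hermite polynomials H_k = He_k / sqrt(k!), via recurrence.
    x = np.asarray(x, dtype=float)
    H = np.empty((K + 1,) + x.shape)
    H[0] = 1.0
    if K >= 1:
        H[1] = x
    for k in range(1, K):
        H[k + 1] = (x * H[k] - np.sqrt(k) * H[k - 1]) / np.sqrt(k + 1)
    return H

def f_hat_mK(beta_Y, beta_U, sigma, K, omega=lambda s, K: 1.0):
    """Evaluate the estimator (3.8). beta_Y is an (n, m) array with entries
    beta'_{Y_j, l}; beta_U is the length-m coefficient vector of the path U;
    omega is the weight function (hard thresholding rule by default)."""
    n, m = beta_Y.shape
    HY = hermite_scaled(K, beta_Y / sigma)               # shape (K+1, n, m)
    HU = hermite_scaled(K, np.asarray(beta_U) / sigma)   # shape (K+1, m)
    total = 0.0
    for ks in product(range(K + 1), repeat=m):           # all multi-indices
        s = sum(ks)
        if s > K:
            continue
        coef = np.mean(np.prod([HY[k, :, l] for l, k in enumerate(ks)], axis=0))
        total += omega(s, K) * coef * np.prod([HU[k, l] for l, k in enumerate(ks)])
    return total
```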
A major difference between (3.8) and Dabo-Niang's (2004a) estimator at (3.3) is our regularisation scheme: because of the two-step construction of our estimator (sieve approximation followed by basis expansion), we do not use all the indices $(k_1, \ldots, k_K) \in \{0, \ldots, K\}^K$. Instead we use $(k_1, \ldots, k_m) \in \{0, \ldots, K\}^m$ such that $k_1 + \cdots + k_m \le K$, and we assign a weight $\omega_K(k_1 + \cdots + k_m)$ to each group of $m$ indices. As we will see in the next sections, our use of a second parameter $m$ and the restriction we put on $k_1 + \cdots + k_m$ drastically improve the quality of the estimator, both theoretically and practically. Moreover, in Section 3.4, we introduce a data-driven way of choosing the basis $\{\varphi_j\}_{j \in \mathbb{N}}$ used to construct the coefficients $\beta'_{Y_j,k}$ and $\beta'_{U,k}$.

3.4 Data-driven choice of the $\varphi_j$'s

To compute our estimator in practice, we need to choose the basis $\{\varphi_j\}_j$ used in (3.1). Lemma 3.1(b) implies that if we take the $\varphi_j$'s equal to the eigenfunctions of $\Gamma_X$, ordered such that the sequence of corresponding eigenvalues $(\lambda_j)_j$ decreases monotonically, then
\[ E\big| f_Y^{[m]}(\beta'_{V_1,1}, \ldots, \beta'_{V_1,m}) - f_Y(V_1) \big|^2 \le \frac{1}{\sigma^2} \cdot \exp\big( C_{X,1}^2/\sigma^2 \big) \cdot \Big( \sum_{j>m} \lambda_j^2 \Big)^{1/2}. \tag{3.9} \]
This bound decreases monotonically as $m$ increases, which gives an indication that the first $m$ terms of the basis capture some of the main characteristics of $f_Y$.

Of course, in practice $\Gamma_X$ is unknown and thus the $\varphi_j$'s are unknown. Thus we need to estimate $\Gamma_X$, but a priori this does not seem to be an easy task because, up to some mean terms, $\Gamma_X$ is the covariance function of the first derivative $X'$ of $X$. If we could observe $X'_1, \ldots, X'_n$, we could use standard covariance estimation techniques such as those in Hall and Hosseini-Nasab (2006), Mas and Ruymgaard (2015) and Jirak (2016). However, we only observe the contaminated $Y_j$'s. If the $Y_j$'s were differentiable, we could take their derivative and estimate $\Gamma_X$ and its eigenfunctions as in the references just cited. However, they are not differentiable and we cannot take such a simple approach.

Instead, we propose the following approximation procedure. Let $\{\psi_j\}_j$ denote an orthonormal basis of $L^2([0,1])$, and recall that $\varphi_\ell$ denotes the eigenfunction of $\Gamma_X$ with eigenvalue $\lambda_\ell$, where $\lambda_1 \ge \lambda_2 \ge \cdots$. In the supplement we show that, for all $k \ge 1$,
\[ \sum_{j=1}^\infty \varphi_{\ell,j} \langle \psi_k, \Gamma_X \psi_j \rangle = \lambda_\ell\, \varphi_{\ell,k}, \tag{3.10} \]
where $\varphi_{\ell,j} = \langle \varphi_\ell, \psi_j \rangle$. If we take the $\psi_j$'s to be continuously differentiable and such that $\psi_j(0) = \psi_j(1) = 0$, for example if $\{\psi_j\}_j$ is the Fourier sine basis, then for $j, k = 1, 2, \ldots$ we have
\[ \langle \psi_k, \Gamma_X \psi_j \rangle = M_{j,k} - \sigma^2 \cdot 1\{j = k\}, \tag{3.11} \]
where $M_{j,k} = \int_0^1 \psi'_j(t) \int_0^1 E\{Y(t)Y(s)\}\, \psi'_k(s)\,ds\,dt$ (see the proof in the supplement). We propose to approximate $\varphi_\ell$ by $\sum_{j=1}^M \hat\varphi_{\ell,j} \psi_j$, with $M$ a large positive integer, where $\hat\varphi_{\ell,j}$ denotes an estimator of $\varphi_{\ell,j}$. Next we show how to compute $\hat\varphi_{\ell,1}, \ldots, \hat\varphi_{\ell,M}$ from our data.
First, combining (3.10) and (3.11), we have $\sum_{j=1}^\infty \varphi_{\ell,j}\big( M_{j,k} - \sigma^2 \cdot 1\{j = k\} \big) = \lambda_\ell \varphi_{\ell,k}$, so that
\[ \sum_{j=1}^M \varphi_{\ell,j}\big( M_{j,k} - \sigma^2 \cdot 1\{j = k\} \big) = \lambda_\ell\, \varphi_{\ell,k} + R_{k,\ell}, \tag{3.12} \]
where $R_{k,\ell}$ is a remainder term resulting from the truncation of the sum to $M$ terms. Let $I_M$ and $\mathbf{M}$ denote, respectively, the $M \times M$ identity matrix and the $M \times M$ matrix whose components are defined by $M_{j,k}$, $j, k = 1, \ldots, M$, and let $\Phi_\ell = (\varphi_{\ell,1}, \ldots, \varphi_{\ell,M})^T$ and $R_\ell = (R_{1,\ell}, \ldots, R_{M,\ell})^T$. Then (3.12) implies that $(\mathbf{M} - \sigma^2 I_M)\Phi_\ell = \lambda_\ell \Phi_\ell + R_\ell$. Note that $|R_\ell|$ shrinks to zero as $M \to \infty$ since $|R_\ell| \le C_{X,1}^2 \sum_{j>M} |\varphi_{\ell,j}|$. Thus $(\mathbf{M} - \sigma^2 I_M)\Phi_\ell \approx \lambda_\ell \Phi_\ell$, which motivates us to approximate $\Phi_\ell$ by the unit eigenvector $v_\ell$ of the matrix $\mathbf{M} - \sigma^2 I_M$ corresponding to the $\ell$th largest eigenvalue. Now, $(\mathbf{M} - \sigma^2 I_M) v_\ell = \lambda_\ell v_\ell$ implies that $\mathbf{M} v_\ell = (\lambda_\ell + \sigma^2) v_\ell$. Thus, $v_\ell$ is also the eigenvector of $\mathbf{M}$ corresponding to its $\ell$th largest eigenvalue. Of course, $\mathbf{M}$ is unknown, but it can be estimated by
\[ \hat{\mathbf{M}} = \Big\{ \frac{1}{n} \sum_{\ell=1}^n \int_0^1 \int_0^1 \psi'_j(t)\, Y_\ell(t) Y_\ell(s)\, \psi'_k(s)\,ds\,dt \Big\}_{j,k = 1,\ldots,M}. \tag{3.13} \]
For $\ell = 1, \ldots, M$, let $\hat v_\ell$ denote the $M$ unit eigenvectors of $\hat{\mathbf{M}}$ (ordered so that the corresponding eigenvalues decrease monotonically). We propose to estimate $\Phi_\ell$ by $\hat\Phi_\ell = (\hat\varphi_{\ell,1}, \ldots, \hat\varphi_{\ell,M})^T = \hat v_\ell$. Finally, we estimate $\varphi_\ell$ by $\hat\varphi_\ell = \sum_{j=1}^M \hat\varphi_{\ell,j} \psi_j$.
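The whole basis-estimation step thus reduces to one eigendecomposition of the $M \times M$ matrix $\hat{\mathbf M}$. A sketch using the Fourier sine system $\psi_j(t) = \sqrt{2}\sin(\pi j t)$ (which satisfies $\psi_j(0) = \psi_j(1) = 0$) and simple Riemann sums on the grid; the function name is ours.

```python
import numpy as np

def estimate_basis(Y, t, M=20):
    """Estimate the eigenfunctions phi_l of Gamma_X as in Section 3.4.
    Y: (n, T) array of observed curves on the grid t."""
    n, T = Y.shape
    dt = t[1] - t[0]
    j = np.arange(1, M + 1)[:, None]
    dpsi = np.sqrt(2.0) * np.pi * j * np.cos(np.pi * j * t[None, :])   # psi_j'
    A = (Y[:, None, :] * dpsi[None, :, :]).sum(axis=2) * dt  # A[l, j] ~ int psi_j' Y_l
    M_hat = A.T @ A / n                                      # the matrix (3.13)
    evals, evecs = np.linalg.eigh(M_hat)
    order = np.argsort(evals)[::-1]                          # decreasing eigenvalues
    psi = np.sqrt(2.0) * np.sin(np.pi * j * t[None, :])      # psi_j on the grid
    phi_hat = evecs[:, order].T @ psi                        # row l: estimated phi_l
    return phi_hat, evals[order]
```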
3.5 Choosing $M$, $m$ and $K$

To compute the estimator at (3.8) in practice, we need to choose three parameters: $M$, the parameter used in Section 3.4 to construct the basis functions $\varphi_j$ employed to compute the projections in (3.1); $m$, a parameter which dictates the dimension of our approximation of $f_Y$ by $f_Y^{[m]}$ at (3.5); and $K$, the truncation parameter of our orthogonal series expansion at (3.8). Having the $\hat\varphi_j$'s close to the eigenfunctions of $\Gamma_X$ is likely to give better practical performance, but it is not necessary for the consistency of our estimator. This suggests that the choice of $M$ is not crucial, and we take $M = 20$. By contrast, $m$ and $K$ are important smoothing parameters which influence consistency and need to be chosen with care. We suggest choosing $(m, K)$ by minimising the cross-validation (CV) criterion
\[ \mathrm{CV}(m, K) = \int \big| \hat f_Y(v) \big|^2\,dP_V(v) - \frac{2}{n} \sum_{i=1}^n \hat f_Y^{(-i)}(Y_i), \tag{3.14} \]
with $\hat f_Y^{(-i)}$ defined in the same way as the estimator at (3.8), except that it uses only the data $Y_1, \ldots, Y_{i-1}, Y_{i+1}, \ldots, Y_n$. To compute the integral at (3.14) we generate a large sample (we took a sample of size 10000 in our numerical work) of $V_j$'s from $P_V$ and approximate the integral by the mean of the $|\hat f_Y(V_j)|^2$'s.

As in standard nonparametric density estimation problems, our cross-validation criterion can have multiple local minima, and the global minimum is not necessarily a good choice. In case of multiple local minima, we choose the one that produces the smallest value of $m + K$. Moreover, when minimising $\mathrm{CV}(m, K)$ we discard all pairs of values $(K, m)$ for which more than 50% of the $\hat f_Y^{(-i)}$'s or of the $\hat f_Y$'s are negative. For the non-discarded $(K, m)$'s, we replace each negative $\hat f_Y^{(-i)}(Y_i)$ and $\hat f_Y(V_j)$ by recomputing those estimators, repeatedly replacing $K$ by $K - 1$ and $m$ by $m - 1$ until the recomputed value is positive.
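A sketch of the selection criterion follows, assuming the estimator sketch f_hat_mK from Section 3.3 is in scope; beta_Y and V_coef hold the projection coefficients of the data and of a large Monte Carlo sample from $P_V$ used to approximate the integral in (3.14). All names are ours.

```python
import numpy as np

def cv_score(m, K, beta_Y, V_coef, sigma):
    """Cross-validation criterion (3.14). beta_Y: (n, >=m) coefficients of
    the data; V_coef: (n_mc, >=m) coefficients of paths drawn from P_V."""
    n = beta_Y.shape[0]
    integral = np.mean([f_hat_mK(beta_Y[:, :m], v[:m], sigma, K) ** 2
                        for v in V_coef])
    loo = np.mean([f_hat_mK(np.delete(beta_Y, i, axis=0)[:, :m],
                            beta_Y[i, :m], sigma, K) for i in range(n)])
    return integral - 2.0 * loo

# Minimise over a grid of (m, K) pairs, preferring small m + K among local minima.
```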
4 Theoretical properties

In this section we derive theoretical properties of our estimator. For simplicity we derive our results in the case where the weight function $\omega_K$ in (3.8) is equal to 1. Similar results can be established for a more general weight function, but at the expense of even more technical proofs. In Section 4.1 we derive an upper bound on the mean integrated squared error of our estimator which is valid for all $n$. Next, in Section 4.2 we derive asymptotic properties of our estimator.

4.1 An upper bound on the mean integrated squared error

In the next theorem, we give an upper bound on the mean integrated squared error
\[ R\big( \hat f_Y^{[m,K]}, f_Y \big) = E \int \big| \hat f_Y^{[m,K]}(v) - f_Y(v) \big|^2\,dP_V(v) \]
of the estimator at (3.8) in the case where the orthonormal basis $\{\varphi_j\}_j$ and the parameters $m$ and $K$ are deterministic. Our result is non-asymptotic and is valid for all $n$.

Theorem 4.1. Under Assumptions 1 and 2 and the selection $\omega_K \equiv 1$, we have $R(\hat f_Y^{[m,K]}, f_Y) \le V + B + D$, where
\[ V = \frac{1}{n} \exp\big( K C_{X,1}^2/\sigma^2 \big) \cdot \binom{K+m}{K}, \qquad B = \inf_{h \in \mathcal{H}_{m,K}} \big\| f_Y^{[m]}(\sigma\,\cdot) - h \big\|^2_{g_1}, \]
\[ D = \frac{1}{\sigma^2} \cdot \exp\big( C_{X,1}^2/\sigma^2 \big) \cdot \Big( \sum_{j,j'>m} \big| \langle \varphi_j, \Gamma_X \varphi_{j'} \rangle \big|^2 \Big)^{1/2}, \]
and where $\mathcal{H}_{m,K}$ denotes the linear hull of the $H_{k_1,\ldots,k_m}$'s for which $k_1 + \cdots + k_m \le K$.

In Theorem 4.1, $V$ represents a variance term, while $B$ represents a bias term which depends on smoothness properties of $f_Y^{[m]}$. Both are typical of nonparametric estimators, but the term $D$ is of a different type. It reflects the error of the finite-dimensional approximation of the density $f_Y$ by the function $f_Y^{[m]}$.

4.2 Asymptotic properties

Next we derive asymptotic properties of our density estimator. For this, we need an additional assumption which will be used when dealing with the term $D$ from Theorem 4.1:
Assumption 3 There exist constants $C_{X,2}, C_{X,3} \in (0, \infty)$ and $\gamma > 0$ such that
\[ \sum_{j,j'>m} \Big| \int_0^1 \varphi_j(s) \big( \Gamma_X \varphi_{j'} \big)(s)\,ds \Big|^2 \le C_{X,2} \cdot \exp\big( -C_{X,3}\, m^\gamma \big), \qquad \forall m \in \mathbb{N}. \]

For example, if $X$ is centered and $\{\varphi_j\}_j$ is the principal component basis with eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots$ discussed in Section 3.4, then Assumption 3 is satisfied as soon as $\sum_{j=1}^\infty \exp(C'_{X,3}\, j^\gamma) \cdot \lambda_j^2 < \infty$ for some $C'_{X,3} > C_{X,3}$. In this case, Assumption 3 can be interpreted as an exponential decay of the eigenvalues of $\Gamma_X$; concretely, Assumption 3 is satisfied if there exist some $C''_{X,3} > C'_{X,3} > C_{X,3}$ and some $C'''_{X,2} > 0$ such that $\lambda_j \le C'''_{X,2} \exp(-C''_{X,3}\, j^\gamma/2)$ for all integers $j \ge 1$.
In the next theorem we derive an upper bound on the convergence rate of our estimator $\hat f_Y^{[m,K]}$ as the sample size $n$ tends to infinity. We establish the upper bound uniformly over the class
\[ \mathcal{F}_X = \mathcal{F}_X\big( C_{X,1}, C_{X,2}, C_{X,3}, \gamma, \{\varphi_j\}_j \big) \]
of all admitted image measures of $X$ such that Assumptions 1 to 3 are satisfied for some deterministic orthonormal basis $\{\varphi_j\}_j$ of $L^2([0,1])$.
Theorem 4.2. Assume that $\gamma \in (0,1)$, and select the weight function $\omega_K \equiv 1$ and the parameters $K$ and $m$ such that
\[ K = K_n = \lfloor \gamma (\log n)/\log(\log n) \rfloor, \qquad m = m_n = \lfloor (C_M \cdot \log n)^{1/\gamma} \rfloor, \]
for some finite constant $C_M > 2/C_{X,3}$. Then our estimator $\hat f_Y^{[m,K]}$ satisfies
\[ \limsup_{n \to \infty}\ \sup_{P_X \in \mathcal{F}_X} \log\big\{ R\big( \hat f_Y^{[m,K]}, f_Y \big) \big\} \big/ \log n \le -\gamma. \]
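For illustration, the deterministic choices of Theorem 4.2 are trivial to compute; a sketch (in practice $\gamma$ and $C_M$ are unknown, which is why Section 3.5 selects $(m, K)$ by cross-validation instead):

```python
import math

def theoretical_params(n, gamma, C_M):
    """K_n and m_n as in Theorem 4.2 (requires 0 < gamma < 1, C_M > 2/C_{X,3})."""
    K = math.floor(gamma * math.log(n) / math.log(math.log(n)))
    m = math.floor((C_M * math.log(n)) ** (1.0 / gamma))
    return m, K

print(theoretical_params(5000, gamma=0.5, C_M=4.0))   # example values of (m, K)
```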
Theorem 4.2 shows that the risk of our estimator converges to zero faster than $O(n^{-\gamma'})$ for any $\gamma' < \gamma < 1$. In particular, our estimator achieves polynomial convergence rates, which is usually impossible in problems of nonparametric functional regression or density estimation. In standard problems of that type, where the data range over an infinite-dimensional space, only logarithmic or sub-algebraic rates can usually be achieved (see e.g. Mas, 2012, Chagny and Roche, 2014 and Meister, 2016). In our case the dimension of the data is infinite as well; however, the density $f_Y$ forms an infinite-dimensional Gaussian mixture and its smoothness degree is sufficiently high to overcome the difficulty caused by high dimensionality.

The next theorem provides an asymptotic lower bound for the problem of estimating $f_Y$ nonparametrically. For simplicity we restrict to the case where $C_{X,1} = 1$.
Theorem 4.3. Assume that $\gamma \in (0,1)$ and let $C_{X,1} = 1$ in Assumption 2. Moreover, assume that the orthonormal basis $\{\varphi_j\}_j$ of $L^2([0,1])$ is such that all $\varphi_j$'s are continuously differentiable. Then, for any sequence $(\hat f_n)_n$ of estimators of $f_Y$ computed from the data $Y_1, \ldots, Y_n$, we have
\[ \liminf_{n \to \infty}\ \sup_{P_X \in \mathcal{F}_X} \log\big\{ R\big( \hat f_n, f_Y \big) \big\} \big/ \log n \ge -\gamma + (\gamma - 1)/(\gamma - 2). \]

We learn from the theorem that, in this problem, no nonparametric estimator can reach the parametric squared convergence rate $n^{-1}$. This is significantly different from the simpler problem of nonparametric estimation of one-dimensional Gaussian mixtures, where the parametric rates are achievable up to a logarithmic factor (see Kim, 2014). Note that the upper bound in Theorem 4.2 is usually larger than the lower bound in Theorem 4.3, although the two bounds are very close to each other for $\gamma$ close to 1. Rather than our estimator being suboptimal, we suspect that our lower bound is not sharp enough. Deriving the exact minimax rates seems a very challenging open problem for future research.

As is standard in nonparametric estimation problems requiring the choice of smoothing parameters, Theorem 4.2 was derived under a deterministic choice of $m$ and $K$. Next we establish an asymptotic result in the case where $(\hat m, \hat K)$ is chosen by cross-validation as at (3.14), where minimisation is performed over the mesh
\[ G = \big\{ \lfloor \log n \rfloor, \ldots, \lfloor (\log n)^{1/\gamma_0} \rfloor \big\} \times \big\{ 1, \ldots, \lfloor (\log n)/\log(\log n) \rfloor \big\}, \tag{4.1} \]
for some constant $\gamma_0 \in (0, \gamma)$. The following theorem shows that the convergence rates from Theorem 4.2 can be maintained at least in a weak sense.

Theorem 4.4. Our estimator $\hat f_Y^{[\hat m, \hat K]}$, where $\omega_K \equiv 1$ and $(\hat m, \hat K)$ is selected by cross-validation over the mesh $G$ at (4.1), satisfies
\[ \lim_{n \to \infty}\ \sup_{P_X \in \mathcal{F}_X} P\Big\{ n^{\gamma_1} \int \big| \hat f_Y^{[\hat m, \hat K]}(x) - f_Y(x) \big|^2\,dP_V(x) \ge n^{d} \Big\} = 0, \]
for all $\gamma_1 \in [\gamma_0, \gamma)$ and $d > 0$.

5 Simulations

To illustrate the performance of our density estimation procedure, we performed simulations in different settings. For a grid of $T = 101$ points $0 = t_1 < t_2 < \cdots < t_T = 1$, equispaced by $\Delta t = 1/(T-1)$, we generated data $Y_i(t_k) = \sum_{j=1}^J \sqrt{\lambda_j}\, Z_{ij}\, \phi_j(t_k) + \sigma W_i(t_k)$, where the $Z_{ij}$'s are i.i.d., each $Z_{ij}$ is the average of two independent $U[-0.1, 0.1]$ random variables, $W_i(t_1) = 0$ and, for $k = 2, \ldots, T$, $W_i(t_k) = W_i(t_{k-1}) + \epsilon_{ik}$, where the $\epsilon_{ik}$'s are i.i.d. $\sim N(0, \Delta t)$.
We considered five settings: (i) $J = 20$, $\sigma = 0.$, $\lambda_j = \exp(-j)$ and $\phi_j(t) = \sqrt{2}\sin(\pi t j)$; (ii) same as (i) but with $J = 40$; (iii) same as (ii) but with $\sigma = 0.$; (iv) $\sigma = 0.$ and $\phi_j(t) = \sqrt{2}\sin(\pi t j)\,\kappa(t)$, where $\kappa(t) = 2\exp(10t)/\{1 + \exp(10t)\} - 1$; (v) same as (i) but with $\sigma = 0.$ and $\phi_j(t) = \sqrt{2}\sin(\pi t j)\,\kappa(t)$.
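A sketch of the data-generating process of setting (i); the value of $\sigma$ and the uniform bounds are illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)
T, n, J, sigma = 101, 500, 20, 0.1        # sigma is an illustrative placeholder
t = np.linspace(0.0, 1.0, T)
dt = 1.0 / (T - 1)

lam = np.exp(-np.arange(1, J + 1))                                      # lambda_j = exp(-j)
phi = np.sqrt(2.0) * np.sin(np.pi * np.outer(np.arange(1, J + 1), t))   # phi_j(t_k)

# Z_ij: average of two independent uniforms (bounds are placeholders).
Z = 0.5 * (rng.uniform(-0.1, 0.1, (n, J)) + rng.uniform(-0.1, 0.1, (n, J)))

W = np.zeros((n, T))                       # W_i(t_1) = 0, N(0, dt) increments
W[:, 1:] = np.cumsum(rng.normal(0.0, np.sqrt(dt), (n, T - 1)), axis=1)

Y = (Z * np.sqrt(lam)) @ phi + sigma * W   # Y_i(t_k), setting (i)
```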
In each case we generated $B = 200$ samples of $Y_i(t_k)$'s, of sizes $n = 500$, 1000, 2000 and 5000. Then, for $b = 1, \ldots, B$, using the $b$th sample of $Y_i(t_k)$'s, we computed our density estimator $\hat f_Y^{[m,K]}(V)$ at (3.8) for 10 functions $V$ generated from the same distribution as $\sigma W$, where $m$ and $K$ were chosen by cross-validation by minimisation of (3.14) and where we took the weight function $\omega_K(x) = 1 - x/(K+1)$. The basis functions $\varphi_j$ were computed as in Section 3.4 with $M = 20$ and $\psi_j(t) = \sqrt{2}\sin(\pi t j)$; we denote by DM the resulting estimator. Each time the $m$ and $K$ selected by CV produced a negative estimator $\hat f_Y(v)$ for a new data curve $v$, for that curve $v$ we repeatedly replaced $K$ by $K - 1$ and $m$ by $m - 1$ until the resulting $(m, K)$ was such that $\hat f_Y(v) > 0$.

We compared our estimator with the orthogonal series estimator of Dabo-Niang (2004a) at (3.3), computed using the same estimated $\varphi_j$'s, which we denote by DN. We chose $K$ by minimisation of the cross-validation criterion at (3.14), replacing there our estimator by this estimator and $(m, K)$ by $K$. As for our estimator, each time the selected value of $K$ produced a negative estimator for a new curve $v$, we replaced, for that curve $v$, $K$ by the largest value smaller or equal to $K$ which produced a positive estimator.

We also considered the kernel density estimator of Dabo-Niang (2004b), which requires the choice of a bandwidth. To choose it in practice we considered several versions of cross-validation and a nearest-neighbour bandwidth version of the estimator. However, we encountered major numerical issues with denominators getting too close to zero and did not manage to obtain reasonable results. Therefore we do not consider this estimator in our numerical work.

The results of our simulations are summarised in Table 1 where, for each case and each sample size $n$, we present $10^4$ times the median and the first and third quartiles of the squared error $\mathrm{SE} = \{\hat f_Y(V) - f_Y(V)\}^2$ computed for the $200 \times 10$ values of $V$. As expected from the theory, both estimators improved as the sample size increased, and overall our estimator worked significantly better than Dabo-Niang's (2004a) estimator. In Table 2, for our estimator and that of Dabo-Niang (2004a), we also show the average time (in seconds, averaged over 10 simulated examples) required to compute one density estimator and its associated data-driven smoothing parameters.

Table 1: $10^4$ × median [first quartile, third quartile] of $2 \times 10^3$ values of the SE.

Model  Method  n = 500          n = 1000         n = 2000         n = 5000
(i)    DM      635[145,2242]    492[120,1660]    395[103,1252]    316[86,953]
       DN      891[171,4122]    800[166,3439]    664[125,2970]    527[100,2271]
(ii)   DM      683[152,2427]    506[123,1732]    409[108,1293]    343[94,1051]
       DN      911[179,4133]    823[168,3568]    659[124,2990]    544[101,2420]
(iii)  DM      1134[237,4538]   898[188,3529]    813[175,3237]    784[165,3197]
       DN      1375[209,8046]   1200[186,7325]   1081[174,6611]   1025[177,5574]
(iv)   DM      908[194,3788]    801[172,3158]    744[154,3135]    590[124,2399]
       DN      1468[232,8351]   1151[183,6878]   1097[190,6514]   1052[196,5460]
(v)    DM      849[187,3287]    751[163,2812]    654[143,2500]    565[122,2273]
       DN      1097[170,6389]   1024[172,5817]   914[160,5133]    865[160,4309]

Table 2: Average computational time (in seconds) for computing one density estimator (including the CV choice of smoothing parameters).
Model  Method  n = 500   n = 1000   n = 2000   n = 5000
(i)    DM      94        114        130        198
       DN      42        46         54         77
(ii)   DM      95        113        135        200
       DN      49        55         68         96
(iii)  DM      102       116        138        218
       DN      50        53         71         97
(iv)   DM      104       110        127        191
       DN      46        47         59         82
(v)    DM      91        130        125        182
       DN      41        47         65         100

Recall that our estimator requires the choice by CV of two smoothing parameters $m$ and $K$, whereas that of Dabo-Niang (2004a) requires the choice of one smoothing parameter $K$. It is unsurprising then that our estimator requires longer computational time: this is the price to pay for the additional accuracy brought by choosing, in a data-driven way, two parameters instead of one.

6 Proofs

6.1 Auxiliary results

We first prove the equalities (3.10) and (3.11) from Section 3.4. To prove (3.10), note that since $\varphi_\ell \in L^2([0,1])$, we can write $\varphi_\ell = \sum_{j=1}^\infty \varphi_{\ell,j} \psi_j$. Since $\Gamma_X \varphi_\ell = \lambda_\ell \varphi_\ell$, we deduce that $\sum_{j,k=1}^\infty \varphi_{\ell,j} \langle \psi_k, \Gamma_X \psi_j \rangle \psi_k = \lambda_\ell \sum_{k=1}^\infty \varphi_{\ell,k} \psi_k$. Multiplying both sides of this equality by $\psi_k$ and taking the integral, we obtain (3.10).

To prove (3.11), note that, using Fubini's theorem and integration by parts, we have
\begin{align*}
\langle \psi_k, \Gamma_X \psi_j \rangle &= \int_0^1 \psi_k(t) \big( \Gamma_X \psi_j \big)(t)\,dt = E\Big\{ \int_0^1 \psi_k(t) X'(t)\,dt \int_0^1 \psi_j(s) X'(s)\,ds \Big\} \\
&= \int_0^1 \psi'_k(t) \int_0^1 \big[ E\{X(t)X(s)\} \big]\, \psi'_j(s)\,ds\,dt \\
&= \int_0^1 \psi'_k(t) \int_0^1 \Big( \big[ E\{Y(t)Y(s)\} \big] - \sigma^2 \min(s,t) \Big)\, \psi'_j(s)\,ds\,dt \\
&= \int_0^1 \psi'_k(t) \int_0^1 E\{Y(t)Y(s)\}\, \psi'_j(s)\,ds\,dt - \sigma^2 \int_0^1 \psi_k(t) \psi_j(t)\,dt = M_{j,k} - \sigma^2 \cdot 1\{j = k\},
\end{align*}
where we used the fact that $\int_0^1 \psi'_k(t) \int_0^1 \min(s,t)\, \psi'_j(s)\,ds\,dt = \int_0^1 \psi_k(t) \psi_j(t)\,dt$.

In order to provide a more general, abstract view of a major step (6.17) in the proof of Theorem 4.3, we mention that the supremum of a statistical risk $E_\theta \|\hat\theta - \theta\|^2$ over all $\theta \in \Theta$ is estimated from below by a Bayesian risk with respect to some a priori distribution $Q$ on the parameter space $\Theta$. Therein $\Theta$ is a subset of a separable Hilbert space with the norm $\|\cdot\|$. Moreover, impose that the data distribution has the density $f(\theta; \cdot)$ with respect to some dominating $\sigma$-finite measure $\mu$ on the action space $\Omega$. In order to calculate the smallest Bayesian risk, consider the classical argument that
\begin{align*}
E_Q E_\theta \|\hat\theta - \theta\|^2 &= \int \int \|\hat\theta(\omega) - \theta\|^2 f(\theta; \omega)\,d\mu(\omega)\,dQ(\theta) \\
&= \int \int \big\| \big( \hat\theta(\omega) - \tilde\theta(\omega) \big) + \big( \tilde\theta(\omega) - \theta \big) \big\|^2 f(\theta; \omega)\,d\mu(\omega)\,dQ(\theta) \\
&\ge \int \int \|\tilde\theta(\omega) - \theta\|^2 f(\theta; \omega)\,d\mu(\omega)\,dQ(\theta) \\
&\quad + 2 \int \Big\langle \hat\theta(\omega) - \tilde\theta(\omega),\ \tilde\theta(\omega) \int f(\theta; \omega)\,dQ(\theta) - \int \theta f(\theta; \omega)\,dQ(\theta) \Big\rangle\,d\mu(\omega), \tag{6.1}
\end{align*}
where $\langle \cdot, \cdot \rangle$ denotes the inner product associated with $\|\cdot\|$ and the integrals inside the inner product may be understood as Bochner integrals. Putting
\[ \tilde\theta(\omega) := \int \theta f(\theta; \omega)\,dQ(\theta) \Big/ \int f(\theta; \omega)\,dQ(\theta), \]
the last term in (6.1) vanishes, so that $\tilde\theta$ is the Bayes estimator of $\theta$ with respect to $Q$ and $\|\cdot\|$. Thus the minimal Bayesian risk (the Bayesian risk of $\tilde\theta$) equals
\begin{align*}
E_Q E_\theta \|\tilde\theta - \theta\|^2 &= \int \int \Big\| \int \theta' f(\theta'; \omega)\,dQ(\theta') \Big/ \int f(\theta''; \omega)\,dQ(\theta'') - \theta \Big\|^2 f(\theta; \omega)\,d\mu(\omega)\,dQ(\theta) \\
&= \int \Big\| \int \theta' f(\theta'; \omega)\,dQ(\theta') \Big\|^2 \Big\{ \int f(\theta''; \omega)\,dQ(\theta'') \Big\}^{-2} \int f(\theta; \omega)\,dQ(\theta)\,d\mu(\omega) \\
&\quad - 2 \int \Big\langle \int \theta' f(\theta'; \omega)\,dQ(\theta'),\ \int \theta f(\theta; \omega)\,dQ(\theta) \Big\rangle \Big\{ \int f(\theta''; \omega)\,dQ(\theta'') \Big\}^{-1}\,d\mu(\omega) \\
&\quad + \int \|\theta\|^2 \underbrace{\int f(\theta; \omega)\,d\mu(\omega)}_{=\,1}\,dQ(\theta),
\end{align*}
so that
\[ E_Q E_\theta \|\tilde\theta - \theta\|^2 = \int \|\theta\|^2\,dQ(\theta) - \int \Big\| \int \theta' f(\theta'; \omega)\,dQ(\theta') \Big\|^2 \Big/ \Big\{ \int f(\theta; \omega)\,dQ(\theta) \Big\}\,d\mu(\omega). \]
This corresponds to the lower bound on the minimax risk which is applied in (6.17).
6.2 Proof of Theorem 2.1

Since the measure $P_V$ of $V$ is known, we can identify the measure $P_Y$ from the Radon-Nikodym derivative $f_Y = dP_Y/dP_V$. Suppose there exist two measures $P_X$ and $\tilde P_X$, each of which is a candidate for the true measure of $X$, and both of which lead to the same measure $P_Y$ of $Y = X + V$. Consider the functional characteristic functions $\psi_X$, $\tilde\psi_X$ and $\psi_Y$, defined by
\begin{align*}
\psi_X(t) &= \int \exp\Big\{ i \int_0^1 t(u) x(u)\,du \Big\}\,dP_X(x), \qquad
\tilde\psi_X(t) = \int \exp\Big\{ i \int_0^1 t(u) x(u)\,du \Big\}\,d\tilde P_X(x), \\
\psi_Y(t) &= \int \exp\Big\{ i \int_0^1 t(u) x(u)\,du \Big\}\,dP_Y(x), \\
\psi_V(t) &= \int \exp\Big\{ i \int_0^1 t(u) x(u)\,du \Big\}\,dP_V(x) = \exp\Big\{ -\frac{\sigma^2}{2} \int_0^1 \int_0^1 t(u) \min(u, u')\, t(u')\,du\,du' \Big\},
\end{align*}
for any $t \in L^2([0,1])$. It follows from the independence of $X$ and $V$ that $\psi_X(t) \cdot \psi_V(t) = \psi_Y(t) = \tilde\psi_X(t) \cdot \psi_V(t)$ for all $t \in L^2([0,1])$. Since $\psi_V$ does not vanish anywhere, the above equality implies that $\psi_X = \tilde\psi_X$. Now, for $u \in [0,1]$, choose
\[ t(u) = t_h(u) = h^{-1} \sum_{j=1}^{2m-1} \tau_j \cdot K\big\{ \big( u - j/(2m) \big)/h \big\}, \]
where $m > 1$ is an integer, the $\tau_j$'s are real coefficients, $h \in (0, (2m)^{-1}]$ is a bandwidth parameter, and $K: \mathbb{R} \to \mathbb{R}$ is a kernel function which is non-negative, continuous, supported on the interval $[-1,1]$ and integrates to one.
For any fixed $m$ and $\tau_j$, $j = 1, \ldots, 2m-1$, we have $\lim_{h \to 0} \int_0^1 t_h(u) x(u)\,du = \sum_{j=1}^{2m-1} \tau_j \cdot x(j/(2m))$, for any $x \in C([0,1])$, so that $\psi_X(t_h) = \tilde\psi_X(t_h)$ tend to the characteristic functions of the random vector
\[ X_1^{[m]} = \big( X(1/(2m)), \ldots, X((2m-1)/(2m)) \big) \]
at $\tau = (\tau_1, \ldots, \tau_{2m-1})$ under the probability measures $P_X$ and $\tilde P_X$, respectively, as $h \downarrow 0$. Since $\tau$ can be chosen to be any vector in $\mathbb{R}^{2m-1}$, the above mentioned characteristic functions are equal. It is well known that the characteristic function of any random vector in $\mathbb{R}^{2m-1}$ determines its distribution uniquely, so that the distributions of $X_1^{[m]}$ under the basic measure $P_X$, on the one hand, and $\tilde P_X$, on the other hand, are identical. Thus, for some arbitrary $s \in C([0,1])$,
\[ P_X\big( \big\{ x \in C_{0,0}([0,1]):\ x(j/(2m)) \le s(j/(2m)),\ \forall j = 1, \ldots, 2m-1 \big\} \big) = \tilde P_X\big( \big\{ x \in C_{0,0}([0,1]):\ x(j/(2m)) \le s(j/(2m)),\ \forall j = 1, \ldots, 2m-1 \big\} \big). \tag{6.2} \]
The countable set $Q = \bigcup_{m \in \mathbb{N}} \{k/(2m): k = 1, \ldots, 2m-1\}$ is dense in the interval $[0,1]$, and
\begin{align*}
\big\{ x \in C_{0,0}([0,1]):\ x(u) \le s(u),\ \forall u \in [0,1] \big\} &= \big\{ x \in C_{0,0}([0,1]):\ x(u) \le s(u),\ \forall u \in Q \big\} \\
&= \bigcap_{m \in \mathbb{N}} \big\{ x \in C_{0,0}([0,1]):\ x(j/(2m)) \le s(j/(2m)),\ \forall j = 1, \ldots, 2m-1 \big\}.
\end{align*}
Therefore we obtain that
\[ P_X\big( \big\{ x \in C_{0,0}([0,1]):\ x(u) \le s(u),\ \forall u \in [0,1] \big\} \big) = \lim_{m \to \infty} P_X\big( \big\{ x \in C_{0,0}([0,1]):\ x(j/(2m)) \le s(j/(2m)),\ \forall j = 1, \ldots, 2m-1 \big\} \big). \tag{6.3} \]
The corresponding equality holds true for the measure $\tilde P_X$. Combining (6.2) and (6.3), we deduce that
\[ P_X\big( \big\{ x \in C_{0,0}([0,1]):\ x(u) \le s(u),\ \forall u \in [0,1] \big\} \big) = \tilde P_X\big( \big\{ x \in C_{0,0}([0,1]):\ x(u) \le s(u),\ \forall u \in [0,1] \big\} \big), \]
for any $s \in C([0,1])$. As the collection of the sets $\{x \in C_{0,0}([0,1]): x(u) \le s(u), \forall u \in [0,1]\}$, $s \in C([0,1])$, is stable with respect to intersection and generates the Borel $\sigma$-field $\mathcal{B}(C_{0,0}([0,1]))$, we conclude that $P_X = \tilde P_X$. □

6.3 Proof of Proposition 2.1

Let $x$ and $\tilde x$ be two realizations of the functional random variable $X$. Thanks to Assumptions 1 and 2, we may impose that $x(0) = \tilde x(0) = 0$ and that $\max\{\|x'\|, \|\tilde x'\|\} \le C_{X,1}$. For any $t_1, \ldots, t_n \in [0,1]$,
we introduce the vector $F = \big( x(t_j) - \tilde x(t_j) \big)^T_{j=1,\ldots,n}$ and the matrix $M = \big\{ E W(t_j) W(t_k) \big\}_{j,k=1,\ldots,n}$. According to Proposition 7 in Hall et al. (2013), in order to prove privacy it suffices to show that $\big| M^{-1/2} F \big| \le \sigma\alpha/c(\beta)$, where we may put $c(\beta) = \sqrt{2\log(2/\beta)}$ according to Proposition 3 in Hall et al. (2013). Without any loss of generality we assume that $t_1 \le \cdots \le t_n$, since
\[ \big| (PMP^T)^{-1/2}(PF) \big|^2 = F^T M^{-1} F = \big| M^{-1/2} F \big|^2, \]
for any $n \times n$ permutation matrix $P$. Then,
\[ M = \begin{pmatrix} t_1 & t_1 & t_1 & \cdots & t_1 \\ t_1 & t_2 & t_2 & \cdots & t_2 \\ t_1 & t_2 & t_3 & \cdots & t_3 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ t_1 & t_2 & t_3 & \cdots & t_n \end{pmatrix}. \]
Writing $\Delta_j = (F_j - F_{j-1})/(t_j - t_{j-1})$ if $t_j > t_{j-1}$, and $\Delta_j = x'(t_j) - \tilde x'(t_j)$ if $t_j = t_{j-1}$, and $Y_j = \Delta_j - \Delta_{j+1}$, where we set $F_0 = t_0 = 0$ and $\Delta_{n+1} = 0$, we consider that
\[ \sum_{l=1}^{k-1} t_l Y_l + \sum_{l=k}^n t_k Y_l = \sum_{l=1}^{k-1} t_l (\Delta_l - \Delta_{l+1}) + t_k \Delta_k = t_1 \Delta_1 + \sum_{l=2}^k (t_l - t_{l-1}) \Delta_l = F_k, \]
for all integers $k = 1, \ldots, n$, so that $MY = F$, where $Y = (Y_j)^T_{j=1,\ldots,n}$. We deduce that
\[ F^T M^{-1} F = F^T Y = \sum_{j=1}^n F_j \Delta_j - \sum_{j=1}^n F_j \Delta_{j+1} = \sum_{j=1}^n \frac{(F_j - F_{j-1})^2}{t_j - t_{j-1}}. \tag{6.4} \]
As $F_j - F_{j-1} = \int_{t_{j-1}}^{t_j} \{ x'(t) - \tilde x'(t) \}\,dt$ for $j = 1, \ldots, n$, the Cauchy-Schwarz inequality in $L^2([0,1])$ yields that (6.4) is bounded from above by
\[ \sum_{j=1}^n \int_{t_{j-1}}^{t_j} \big| x'(t) - \tilde x'(t) \big|^2\,dt \le 4 C_{X,1}^2, \]
which completes the proof of the proposition. □

6.4 Proof of Lemma 3.1

(a) Expanding $X'$ in the orthonormal basis $\{\varphi_j\}_j$, we get
\[ \int_0^1 X'(t)\,dV_1(t) = \sum_{j=1}^\infty \langle X', \varphi_j \rangle \cdot \beta'_{V_1,j}, \qquad \|X'\|^2 = \sum_{j=1}^\infty \big| \langle X', \varphi_j \rangle \big|^2, \]
where the infinite sums should be understood as mean squared limits. Since, for any integer $m$, $\mathcal{A}_m$ is a subset of the $\sigma$-field generated by $V_1$, we have that
\begin{align*}
E\big\{ f_Y(V_1) \mid \mathcal{A}_m \big\} &= E\Big\{ \exp\Big( \frac{1}{\sigma^2} \int_0^1 X'(t)\,dV_1(t) - \frac{1}{2\sigma^2} \int_0^1 |X'(t)|^2\,dt \Big) \,\Big|\, \mathcal{A}_m \Big\} \\
&= E\Big\{ \exp\Big( \frac{1}{\sigma^2} \sum_{j=1}^\infty \langle X', \varphi_j \rangle \cdot \beta'_{V_1,j} \Big) \cdot \exp\big( -\|X'\|^2/(2\sigma^2) \big) \,\Big|\, \mathcal{A}_m \Big\} \\
&= E\Big\{ \exp\Big( \frac{1}{\sigma^2} \sum_{j=1}^m \langle X', \varphi_j \rangle \cdot \beta'_{V_1,j} \Big) \cdot \exp\big( -\|X'\|^2/(2\sigma^2) \big) \cdot E\Big\{ \exp\Big( \frac{1}{\sigma^2} \sum_{j>m} \langle X', \varphi_j \rangle \cdot \beta'_{V_1,j} \Big) \,\Big|\, \mathcal{A}_m, X' \Big\} \,\Big|\, \mathcal{A}_m \Big\} \\
&= E\Big\{ \exp\Big( \frac{1}{\sigma^2} \sum_{j=1}^m \langle X', \varphi_j \rangle \cdot \beta'_{V_1,j} \Big) \cdot \exp\big( -\|X'\|^2/(2\sigma^2) \big) \cdot \prod_{j>m} E\Big\{ \exp\Big( \frac{1}{\sigma^2} \langle X', \varphi_j \rangle \cdot \beta'_{V_1,j} \Big) \,\Big|\, X' \Big\} \,\Big|\, \mathcal{A}_m \Big\} \tag{6.5}
\end{align*}
holds true almost surely. Applying, to the last term in (6.5), the fact that
\[ E\big\{ \exp(t\delta) \big\} = \exp(t^2/2), \tag{6.6} \]
for all $\delta \sim N(0,1)$
and $t \in \mathbb{R}$, we deduce that
\begin{align*}
E\big\{ f_Y(V_1) \mid \mathcal{A}_m \big\} &= E\Big\{ \exp\Big( \frac{1}{\sigma^2} \sum_{j=1}^m \langle X', \varphi_j \rangle \cdot \beta'_{V_1,j} \Big) \cdot \exp\big( -\|X'\|^2/(2\sigma^2) \big) \cdot \exp\Big( \frac{1}{2\sigma^2} \sum_{j>m} \big| \langle X', \varphi_j \rangle \big|^2 \Big) \,\Big|\, \mathcal{A}_m \Big\} \\
&= E\Big\{ \exp\Big( \frac{1}{\sigma^2} \sum_{j=1}^m \langle X', \varphi_j \rangle \cdot \beta'_{V_1,j} \Big) \cdot \exp\Big( -\frac{1}{2\sigma^2} \sum_{j=1}^m \big| \langle X', \varphi_j \rangle \big|^2 \Big) \,\Big|\, \mathcal{A}_m \Big\} \\
&= \int \exp\Big( \frac{1}{\sigma^2} \sum_{j=1}^m \langle x', \varphi_j \rangle \cdot \beta'_{V_1,j} - \frac{1}{2\sigma^2} \sum_{j=1}^m \big| \langle x', \varphi_j \rangle \big|^2 \Big)\,dP_X(x) = f_Y^{[m]}(\beta'_{V_1,1}, \ldots, \beta'_{V_1,m})
\end{align*}
almost surely, which completes the proof of part (a).

(b) Using the result of part (a), we have
\[ E\big| f_Y^{[m]}(\beta'_{V_1,1}, \ldots, \beta'_{V_1,m}) - f_Y(V_1) \big|^2 = E\big[ E\big\{ \big| f_Y^{[m]}(\beta'_{V_1,1}, \ldots, \beta'_{V_1,m}) - f_Y(V_1) \big|^2 \mid \mathcal{A}_m \big\} \big] = E\big[ \mathrm{var}\big\{ f_Y(V_1) \mid \mathcal{A}_m \big\} \big]. \tag{6.7} \]
Using Fubini's theorem, we get
\begin{align*}
E\big[ \mathrm{var}\big\{ f_Y(V_1) \mid \mathcal{A}_m \big\} \big] &= E\Big\{ \mathrm{var}\Big( \int \exp\Big( \frac{1}{\sigma^2} \sum_{j=1}^\infty \langle x', \varphi_j \rangle \cdot \beta'_{V_1,j} \Big) \cdot \exp\big( -\|x'\|^2/(2\sigma^2) \big)\,dP_X(x) \,\Big|\, \mathcal{A}_m \Big) \Big\} \\
&= \int \int \exp\big( -\{\|x'_1\|^2 + \|x'_2\|^2\}/(2\sigma^2) \big)\, E\Big\{ \exp\Big( \frac{1}{\sigma^2} \sum_{j=1}^m \langle x'_1 + x'_2, \varphi_j \rangle \cdot \beta'_{V_1,j} \Big) \Big\} \\
&\qquad \cdot \mathrm{cov}\Big\{ \exp\Big( \frac{1}{\sigma^2} \sum_{j>m} \langle x'_1, \varphi_j \rangle \cdot \beta'_{V_1,j} \Big),\ \exp\Big( \frac{1}{\sigma^2} \sum_{j>m} \langle x'_2, \varphi_j \rangle \cdot \beta'_{V_1,j} \Big) \Big\}\,dP_X(x_1)\,dP_X(x_2). \tag{6.8}
\end{align*}
Using (6.6) again, we deduce that
\[ E\Big\{ \exp\Big( \sigma^{-2} \sum_{j=1}^m \langle x'_1 + x'_2, \varphi_j \rangle \cdot \beta'_{V_1,j} \Big) \Big\} = \exp\Big\{ \sum_{j=1}^m \big| \langle x'_1 + x'_2, \varphi_j \rangle \big|^2 \big/ (2\sigma^2) \Big\}, \]
and that
\begin{align*}
&\mathrm{cov}\Big\{ \exp\Big( \frac{1}{\sigma^2} \sum_{j>m} \langle x'_1, \varphi_j \rangle \cdot \beta'_{V_1,j} \Big),\ \exp\Big( \frac{1}{\sigma^2} \sum_{j>m} \langle x'_2, \varphi_j \rangle \cdot \beta'_{V_1,j} \Big) \Big\} \\
&\quad = \exp\Big( \frac{1}{2\sigma^2} \sum_{j>m} \big| \langle x'_1 + x'_2, \varphi_j \rangle \big|^2 \Big) - \exp\Big( \frac{1}{2\sigma^2} \sum_{j>m} \big\{ \big| \langle x'_1, \varphi_j \rangle \big|^2 + \big| \langle x'_2, \varphi_j \rangle \big|^2 \big\} \Big).
\end{align*}
Plugging these equalities into (6.8), we conclude that
\[ E\big[ \mathrm{var}\big\{ f_Y(V_1) \mid \mathcal{A}_m \big\} \big] = \int \int \Big\{ \exp\big( \langle x'_1, x'_2 \rangle/\sigma^2 \big) - \exp\Big( \frac{1}{\sigma^2} \sum_{j=1}^m \langle x'_1, \varphi_j \rangle \langle x'_2, \varphi_j \rangle \Big) \Big\}\,dP_X(x_1)\,dP_X(x_2). \tag{6.9} \]
Let $X_2$ denote an independent copy of $X_1$. Then (6.9) satisfies
\[ E\Big\{ \exp\big( \langle X'_1, X'_2 \rangle/\sigma^2 \big) - \exp\Big( \frac{1}{\sigma^2} \sum_{j=1}^m \langle X'_1, \varphi_j \rangle \langle X'_2, \varphi_j \rangle \Big) \Big\} \le \frac{1}{\sigma^2}\, E\Big| \sum_{j>m} \langle X'_1, \varphi_j \rangle \langle X'_2, \varphi_j \rangle \Big| \cdot \exp\big( C_{X,1}^2/\sigma^2 \big) \le \frac{1}{\sigma^2} \cdot \exp\big( C_{X,1}^2/\sigma^2 \big) \cdot \Big( \sum_{j,j'>m} \big| \langle \varphi_j, \Gamma_X \varphi_{j'} \rangle \big|^2 \Big)^{1/2}, \]
where we used the mean value theorem and the Cauchy-Schwarz inequality. □

6.5 Proof of Theorem 4.1

Let $V \sim P_V$ denote a functional random variable which is independent of $X_1, \ldots, X_n$ and $W_1, \ldots, W_n$, and let $\beta'_{V,j} = \int_0^1 \varphi_j(t)\,dV(t)$. Since $\hat f_Y^{[m,K]}(V)$ is measurable with respect to the $\sigma$-field generated by
$\beta'_{V,1}, \ldots, \beta'_{V,m}, Y_1, \ldots, Y_n$, and as
\[ f_Y^{[m]}(\beta'_{V,1}, \ldots, \beta'_{V,m}) = E\big\{ f_Y(V) \mid \beta'_{V,1}, \ldots, \beta'_{V,m} \big\} = E\big\{ f_Y(V) \mid \beta'_{V,1}, \ldots, \beta'_{V,m}, Y_1, \ldots, Y_n \big\}, \quad \text{a.s.}, \]
by Lemma 3.1(a), we have
\begin{align*}
R\big( \hat f_Y^{[m,K]}, f_Y \big) &= E\big| \hat f_Y^{[m,K]}(V) - f_Y(V) \big|^2 = E\big[ E\big\{ \big| \hat f_Y^{[m,K]}(V) - f_Y(V) \big|^2 \mid \beta'_{V,1}, \ldots, \beta'_{V,m}, Y_1, \ldots, Y_n \big\} \big] \\
&= E\big[ \mathrm{var}\big\{ f_Y(V) \mid \beta'_{V,1}, \ldots, \beta'_{V,m}, Y_1, \ldots, Y_n \big\} \big] + E\big| \hat f_Y^{[m,K]}(V) - f_Y^{[m]}(\beta'_{V,1}, \ldots, \beta'_{V,m}) \big|^2 \\
&= E\big[ \mathrm{var}\big\{ f_Y(V) \mid \beta'_{V,1}, \ldots, \beta'_{V,m} \big\} \big] + E\big| \hat f_Y^{[m,K]}(V) - f_Y^{[m]}(\beta'_{V,1}, \ldots, \beta'_{V,m}) \big|^2 \\
&\le D + E\big| \hat f_Y^{[m,K]}(V) - f_Y^{[m]}(\beta'_{V,1}, \ldots, \beta'_{V,m}) \big|^2, \tag{6.10}
\end{align*}
using also Lemma 3.1(b). Using the definition (3.8) of the estimator $\hat f_Y^{[m,K]}$ and Parseval's identity with respect to the orthonormal basis of the $H_{k_1,\ldots,k_m}$ in $L^2_{g_1}(\mathbb{R}^m)$, we get
\begin{align*}
E\big| \hat f_Y^{[m,K]}(V) - f_Y^{[m]}(\beta'_{V,1}, \ldots, \beta'_{V,m}) \big|^2 &= E\Big\| \sum_{k_1,\ldots,k_m \ge 0} 1\{k_1 + \cdots + k_m \le K\} \cdot \frac{1}{n} \sum_{j=1}^n H_{k_1,\ldots,k_m}(\beta'_{Y_j,1}/\sigma, \ldots, \beta'_{Y_j,m}/\sigma) \cdot H_{k_1,\ldots,k_m} - f_Y^{[m]}(\sigma\,\cdot) \Big\|^2_{g_1} \\
&= \sum_{k_1,\ldots,k_m \ge 0} 1\{k_1 + \cdots + k_m \le K\} \cdot E\Big| \frac{1}{n} \sum_{j=1}^n H_{k_1,\ldots,k_m}(\beta'_{Y_j,1}/\sigma, \ldots, \beta'_{Y_j,m}/\sigma) - \big\langle f_Y^{[m]}(\sigma\,\cdot), H_{k_1,\ldots,k_m} \big\rangle_{g_1} \Big|^2 + B. \tag{6.11}
\end{align*}
Since, from (3.7),
\[ E\Big\{ \frac{1}{n} \sum_{j=1}^n H_{k_1,\ldots,k_m}(\beta'_{Y_j,1}/\sigma, \ldots, \beta'_{Y_j,m}/\sigma) \Big\} = \big\langle f_Y^{[m]}(\sigma\,\cdot), H_{k_1,\ldots,k_m} \big\rangle_{g_1}, \]
it follows that
\[ E\Big| \frac{1}{n} \sum_{j=1}^n H_{k_1,\ldots,k_m}(\beta'_{Y_j,1}/\sigma, \ldots, \beta'_{Y_j,m}/\sigma) - \big\langle f_Y^{[m]}(\sigma\,\cdot), H_{k_1,\ldots,k_m} \big\rangle_{g_1} \Big|^2 = \mathrm{var}\Big( \frac{1}{n} \sum_{j=1}^n H_{k_1,\ldots,k_m}(\beta'_{Y_j,1}/\sigma, \ldots, \beta'_{Y_j,m}/\sigma) \Big) \le \frac{1}{n} \cdot E H^2_{k_1,\ldots,k_m}(\beta'_{Y_1,1}/\sigma, \ldots, \beta'_{Y_1,m}/\sigma). \]
Using the fact that the Hermite polynomials form an Appell sequence (see e.g. Appell, 1880), we deduce that
\begin{align*}
E\big\{ H^2_{k_1,\ldots,k_m}(\beta'_{Y_1,1}/\sigma, \ldots, \beta'_{Y_1,m}/\sigma) \big\} &= E\Big[ \prod_{l=1}^m E\big\{ H^2_{k_l}\big( \beta'_{X_1,l}/\sigma + \beta'_{V_1,l}/\sigma \big) \mid X'_1 \big\} \Big] = E\Big[ \prod_{l=1}^m \Big\{ \sum_{j=0}^{k_l} \frac{1}{j!} \binom{k_l}{j} \big( \beta'_{X_1,l}/\sigma \big)^{2j} \Big\} \Big] \\
&\le E\Big[ \prod_{l=1}^m \big\{ 1 + (\beta'_{X_1,l}/\sigma)^2 \big\}^{k_l} \Big] \le \exp\big( K C_{X,1}^2/\sigma^2 \big), \tag{6.12}
\end{align*}
where the second equality uses the expansion of $H_{k_l}$ at a shifted argument together with the orthonormality of the $H_{k_1,\ldots,k_m}$ with respect to $\langle \cdot, \cdot \rangle_{g_1}$. Using elementary arguments from combinatorics, we also have
\[ \#\big\{ (k_1, \ldots, k_m) \in \mathbb{N}_0^m:\ k_1 + \cdots + k_m \le K \big\} = \binom{K+m}{K}. \]
Combined with (6.12), this implies that the first term in (6.11) is bounded from above by $V$. Combining this with the other derivations above completes the proof of the theorem. □

The next lemma gives an upper bound on the term $B$ defined in Theorem 4.1. It will be used to prove Theorem 4.2.
Lemma 6.1. Under Assumptions 1 and 2, the term $B$ in Theorem 4.1 satisfies
\[ B = O\Big\{ (C_{X,1}/\sigma)^{2(K+1)} \big( 2C_{X,1}/\sigma + \sqrt{2} \big)^{2(K+1)} \big/ (K+1)! \Big\}, \]
where the constants contained in $O(\cdots)$ only depend on $C_{X,1}$ and $\sigma$.

Proof of Lemma 6.1: By Taylor expansion we can write $f_Y^{[m]} = T_{m,K} + R_{m,K}$, where $R_{m,K}$ is a remainder term that will be treated below, and
\begin{align*}
T_{m,K}(s_1, \ldots, s_m) &= E\Big\{ \sum_{k=0}^K \frac{1}{k!} \sigma^{-2k} \Big( \sum_{j=1}^m \beta'_{X,j}\, s_j \Big)^k \exp\Big( -\frac{1}{2\sigma^2} \sum_{j=1}^m \beta'^2_{X,j} \Big) \Big\} \\
&= \sum_{k=0}^K \frac{1}{k!} \sigma^{-2k} \sum_{j_1,\ldots,j_k=1}^m \Big( \prod_{l=1}^k s_{j_l} \Big) \cdot E\Big\{ \Big( \prod_{l=1}^k \beta'_{X,j_l} \Big) \exp\Big( -\frac{1}{2\sigma^2} \sum_{j=1}^m \beta'^2_{X,j} \Big) \Big\}
\end{align*}
(Assumption 2 guarantees integrability of the above terms). Now $T_{m,K}(\sigma\,\cdot)$ is an $m$-variate polynomial of degree $\le K$, so that $T_{m,K}(\sigma\,\cdot)$ is contained in the linear subspace $\mathcal{H}_{m,K}$ of $L^2_{g_1}(\mathbb{R}^m)$. It follows from there that
\[ B \le \big\| R_{m,K}(\sigma\,\cdot) \big\|^2_{g_1}. \tag{6.13} \]
Next, using the Lagrange representation, the remainder term $R_{m,K}$ has the following upper bound:
\[ \big| R_{m,K}(s_1, \ldots, s_m) \big| \le \frac{1}{(K+1)!}\, E\Big[ \Big| \frac{1}{\sigma^2} \sum_{j=1}^m \beta'_{X,j}\, s_j \Big|^{K+1} \cdot \max\Big\{ \exp\Big( \frac{1}{\sigma^2} \sum_{j=1}^m \beta'_{X,j}\, s_j \Big), 1 \Big\} \exp\Big( -\frac{1}{2\sigma^2} \sum_{j=1}^m \beta'^2_{X,j} \Big) \Big], \]
so that, by Jensen's inequality,
\[ \big\| R_{m,K}(\sigma\,\cdot) \big\|^2_{g_1} \le \frac{1}{\{(K+1)!\}^2}\, E\Big[ \Big| \frac{1}{\sigma^2} \sum_{j=1}^m \beta'_{X,j}\, \beta'_{V,j} \Big|^{2(K+1)} \cdot \max\Big\{ \exp\Big( \frac{2}{\sigma^2} \sum_{j=1}^m \beta'_{X,j}\, \beta'_{V,j} \Big), 1 \Big\} \exp\Big( -\frac{1}{\sigma^2} \sum_{j=1}^m \beta'^2_{X,j} \Big) \Big]. \tag{6.14} \]
Conditionally on $X'$, the random variable $\sigma^{-2} \sum_{j=1}^m \beta'_{X,j}\, \beta'_{V,j}$ is normally distributed with mean 0 and variance $\kappa_m = \sum_{j=1}^m \beta'^2_{X,j}/\sigma^2$. Thus, the right hand side of (6.14) can be expressed as
\[ \frac{1}{\{(K+1)!\}^2}\, E\Big( \kappa_m^{K+1} \exp(-\kappa_m)\, E\big[ \delta^{2(K+1)} \cdot \max\big\{ \exp\big( 2\sqrt{\kappa_m}\,\delta \big), 1 \big\} \mid X' \big] \Big), \]
where $\delta \sim N(0,1)$
Thus (6.14) has the following upper bound:
\begin{align*}
&\frac{1}{\{(K+1)!\}^2}\,E\bigl\{\kappa_m^{K+1}\exp(-\kappa_m)\bigr\}\,E\delta^{2(K+1)} + \frac{1}{\{(K+1)!\}^2}\,E\Bigl[\kappa_m^{K+1}\exp(-\kappa_m)\,E\bigl\{\delta^{2(K+1)}\exp(2\sqrt{\kappa_m}\,\delta)\bigm| X'\bigr\}\Bigr] \\
&\quad= \frac{1}{\{(K+1)!\}^2}\,E\bigl\{\kappa_m^{K+1}\exp(-\kappa_m)\bigr\}\,2^{K+1}\Gamma(K+3/2)\big/\sqrt{\pi} \\
&\qquad+ \frac{1}{\{(K+1)!\}^2}\,E\Bigl\{\kappa_m^{K+1}\exp(-\kappa_m)\int s^{2(K+1)}\exp\bigl(2\sqrt{\kappa_m}\,s - s^2/2\bigr)\,ds\Bigr\}\Big/\sqrt{2\pi} \\
&\quad\le O\bigl\{(2C_{X,2}/\sigma^2)^{K+1}/(K+1)!\bigr\} + \frac{1}{\{(K+1)!\}^2}\,E\Bigl\{\kappa_m^{K+1}\exp(\kappa_m)\int\bigl(s + 2\sqrt{\kappa_m}\bigr)^{2(K+1)}\exp\bigl(-s^2/2\bigr)\,ds\Bigr\}\Big/\sqrt{2\pi} \\
&\quad\le O\bigl\{(2C_{X,2}/\sigma^2)^{K+1}/(K+1)!\bigr\} + \frac{1}{\{(K+1)!\}^2}\,E\Bigl\{\kappa_m^{K+1}\exp(\kappa_m)\bigl(\sqrt{2K+2} + 2\sqrt{\kappa_m}\bigr)^{2(K+1)}\Bigr\}\cdot\max\bigl\{1,\ \Gamma(K+3/2)\,2^{K+1}/\sqrt{\pi}\bigr\} \\
&\quad= O\Bigl\{(C_{X,2}/\sigma^2)^{K+1}\bigl(2\sqrt{C_{X,2}}/\sigma + \sqrt{K}\bigr)^{2(K+1)}\big/\{(K+1)!\}^2\Bigr\},
\end{align*}
where we have used Assumption 2, which guarantees that $\kappa_m \le C_{X,2}/\sigma^2$; the fact that $E\delta^{2(K+1)} = \Gamma(K+3/2)\,2^{K+1}/\sqrt{\pi}$; and Minkowski's inequality. \hfill$\Box$
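The Gaussian moment identity used repeatedly in this proof is elementary but easy to get wrong, so the following quick check (ours, purely illustrative) confirms $E\,\delta^{2(K+1)} = 2^{K+1}\Gamma(K+3/2)/\sqrt{\pi} = (2K+1)!!$ for standard normal $\delta$.

```python
# Check (illustration only) of E delta^(2(K+1)) = 2^(K+1) Gamma(K+3/2)/sqrt(pi)
# = (2K+1)!! for delta ~ N(0,1), used at the end of the proof of Lemma 6.1.
import math
from scipy.stats import norm

for K in range(6):
    via_gamma = 2 ** (K + 1) * math.gamma(K + 1.5) / math.sqrt(math.pi)
    double_fact = math.prod(range(1, 2 * K + 2, 2))   # (2K+1)!!
    exact = norm.moment(2 * (K + 1))                   # moment of N(0,1)
    print(K, via_gamma, double_fact, exact)            # all three agree
```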
Proof of Theorem 4.2: Since Assumption 2 holds, we can apply Theorem 4.1. First we consider the variance term $V$. Using Stirling's approximation, we have
\[
\exp\bigl(KC_{X,2}/\sigma^2\bigr)\binom{K+m}{K} \asymp \frac{1}{\sqrt{2\pi}}\exp\bigl(KC_{X,2}/\sigma^2\bigr)\sqrt{\frac{m+K}{mK}}\cdot\bigl(1+m/K\bigr)^K\cdot\bigl(1+K/m\bigr)^m \le \frac{1}{\sqrt{2\pi}}\cdot\exp\bigl\{K\bigl(1 + C_{X,2}/\sigma^2 + \log 2\bigr)\bigr\}\cdot(m/K)^K,
\]
since $m\ge K$ for $n$ sufficiently large. We deduce that
\[
\limsup_{n\to\infty}\ \sup_{P_X\in\mathcal{F}_X}\ (\log V)/\log n = \gamma(1/\gamma - 1) - 1 = -\gamma.
\]
Using Lemma 6.1, an upper bound for $\log B$ is given by $\mathrm{const}\cdot K - K\log K$, where the constant is uniform over all $P_X\in\mathcal{F}_X$, so that
\[
\limsup_{n\to\infty}\ \sup_{P_X\in\mathcal{F}_X}\ (\log B)/\log n = -\gamma.
\]
Finally, under Assumption 3, $D = O\bigl(n^{-C_{X,1}C_M/2}\bigr)$ uniformly over all $P_X\in\mathcal{F}_X$. The assumption $C_M > 2/C_{X,1}$ guarantees that $D$ is asymptotically negligible. \hfill$\Box$
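The Stirling step can be checked numerically. The sketch below (ours, illustrative only) compares $\binom{K+m}{K}$ with the approximation above and with the cruder bound $(2m/K)^K e^K$, which is valid for $m\ge K$ and corresponds to the $\exp\{K(1+\log 2)\}(m/K)^K$ factor.

```python
# Check (illustration only) of the Stirling approximation for C(K+m, K) and of
# the cruder bound C(K+m, K) <= (2m/K)^K * e^K used for the variance term V.
import math

for m, K in [(50, 10), (200, 20), (1000, 30)]:
    exact = math.comb(K + m, K)
    stirling = math.sqrt((K + m) / (2 * math.pi * K * m)) \
        * (1 + m / K) ** K * (1 + K / m) ** m
    crude = (2 * m / K) ** K * math.exp(K)
    print(m, K, exact / stirling, exact <= crude)   # ratio near 1, bound holds
```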
6.6 Proof of Theorem 4.3

We define
\[
f_{\mathcal{K}}(x) = K^{K/2}(m-K)^{(m-K)/2}\Bigl\{\prod_{k\in\mathcal{K}} f_0\bigl(\sqrt{K}\,x_k\bigr)\Bigr\}\cdot\Bigl\{\prod_{k\in\{1,\ldots,m\}\setminus\mathcal{K}}\bigl|f_0\bigl(\sqrt{m-K}\,x_k\bigr)\bigr|\Bigr\},
\]
for all $x\in\mathbb{R}^m$, any integers $m > K > 0$, any subset $\mathcal{K}\subseteq\{1,\ldots,m\}$ with $\#\mathcal{K} = K$, and $f_0 = 1_{(0,1/2]} - 1_{[-1/2,0)}$. Then we introduce the functions
\[
f_\theta(x) = \binom{m}{K}^{-1}\sum_{\mathcal{K}}\bigl|f_{\mathcal{K}}(x)\bigr| + \binom{m}{K}^{-1}\sum_{\mathcal{K}}\theta_{\mathcal{K}}\,f_{\mathcal{K}}(x),
\]
for any vector $\theta = \{\theta_{\mathcal{K}}\}_{\mathcal{K}}$ with $\theta_{\mathcal{K}}\in\{-1,1\}$. All $f_\theta$'s are $m$-variate Lebesgue probability densities. Then we define the probability measure $\tilde P_\theta$ on $\mathcal{B}(\mathbb{R}^m)$ by
\[
\tilde P_\theta(B) = (1-\eta)\cdot 1_B(0) + \eta\int_B f_\theta(x)\,dx, \qquad B\in\mathcal{B}(\mathbb{R}^m),
\]
for some $\eta\in(0,1)$ still to be selected.
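As a quick numerical illustration (ours, for one small configuration, not part of the argument), one can confirm that each signed $f_{\mathcal{K}}$ integrates to $0$ while $|f_{\mathcal{K}}|$ integrates to $1$, which is what makes every $f_\theta$ a probability density. The check exploits the product form of $f_{\mathcal{K}}$, so only one-dimensional integrals are needed.

```python
# Check (illustration only): int f_Kcal = 0 and int |f_Kcal| = 1 for m = 4,
# K = 2; f0 is the odd step function 1_(0,1/2] - 1_[-1/2,0).
import numpy as np
from scipy.integrate import quad

m, K = 4, 2
f0 = lambda s: np.where((s > 0) & (s <= 0.5), 1.0, 0.0) \
    - np.where((s >= -0.5) & (s < 0), 1.0, 0.0)

# One-dimensional factors of f_Kcal, each with its share of the normaliser.
signed, _ = quad(lambda x: np.sqrt(K) * f0(np.sqrt(K) * x), -1, 1, limit=200)
absval, _ = quad(lambda x: np.sqrt(K) * abs(f0(np.sqrt(K) * x)), -1, 1, limit=200)
rest, _ = quad(lambda x: np.sqrt(m - K) * abs(f0(np.sqrt(m - K) * x)), -1, 1, limit=200)

print(signed)                           # ~0: the K signed coordinates vanish
print(absval ** K * rest ** (m - K))    # ~1: so int |f_Kcal| = 1
```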
Now let $\tilde X = (\tilde X_1,\ldots,\tilde X_m)$ be some $m$-dimensional random vector with the measure $\tilde P_\theta$. Then $P_{X,\theta}$ denotes the image measure of the functional random variable $X$ on $\mathcal{B}(C_0([0,1]))$, where
\[
X(t) = \sum_{j=1}^m \tilde X_j\cdot\int_0^t \varphi_j(s)\,ds, \qquad t\in[0,1]. \tag{6.15}
\]
Now we show that $P_{X,\theta}\in\mathcal{F}_X$ for all vectors $\theta$. As the $\varphi_j$'s are continuously differentiable, Assumption 1 holds true. Moreover, the support of each $f_{\mathcal{K}}$ is included in the $m$-dimensional ball around zero with the radius 1. Therefore the measure $\tilde P_\theta$ is also supported on a subset of this ball, so that $\|X'\|^2_{L_2} = |\tilde X|^2 \le 1$, a.s. Hence Assumption 2 is satisfied. We have that
\[
\int_0^1 \varphi_j(s)\bigl(\Gamma_X\varphi_{j'}\bigr)(s)\,ds = 1\bigl\{\max\{j,j'\}\le m\bigr\}\cdot E\tilde X_j\tilde X_{j'} = 1\{j = j'\le m\}\cdot\frac{\eta}{6m},
\]
for all $j, j'\ge 1$, since $f_0$ is an odd function. Putting
\[
\eta = 6\sqrt{m}\,\sqrt{C_{X,2}}\cdot\exp\bigl(-C_{X,1}\,m^\gamma/2\bigr), \tag{6.16}
\]
for $m$ sufficiently large, Assumption 3 is satisfied as well.

Following a usual strategy for the proof of lower bounds, we bound the supremum of the statistical risk from below by the Bayesian risk, where the a priori distribution of $\theta$ is such that all $\theta_{\mathcal{K}}$'s are i.i.d. $\{-1,1\}$-valued random variables with $P(\theta_{\mathcal{K}} = 1) = 1/2$:
\[
\sup_{P_X\in\mathcal{F}_X} R\bigl(\hat f_n, f_Y\bigr) \ge E_\theta\int\bigl|f_{Y,\theta}(v)\bigr|^2\,dP_V(v) - \iint\bigl\{E_\theta f_{Y,\theta}(u)\,f^{(n)}_{Y,\theta}(v)\bigr\}^2\,dP_V(u)\big/E_\theta f^{(n)}_{Y,\theta}(v)\,dP^{(n)}_V(v), \tag{6.17}
\]
where $f_{Y,\theta}$ denotes the density of $Y$ with respect to $P_V$ when $X\sim P_{X,\theta}$, and $P^{(n)}_V$ and $f^{(n)}_{Y,\theta}$ denote the $n$-fold product measure and product density of $P_V$ and $f_{Y,\theta}$, respectively. Note that $P^{(n)}_{Y,\theta}$ is the measure of the observed data. For details on the proof of (6.17), see Section 6.1.

By Lemma 3.1 and Equation (3.7), the $L_2(P_V)$-inner product of $f_{Y,\theta'}$ and $f_{Y,\theta''}$ equals
\begin{align*}
\int f_{Y,\theta'}(v)\,f_{Y,\theta''}(v)\,dP_V(v) &= E f^{[m]}_{Y,\theta'}(\beta'_{V,1},\ldots,\beta'_{V,m})\,f^{[m]}_{Y,\theta''}(\beta'_{V,1},\ldots,\beta'_{V,m}) \\
&= \int\Bigl\{\int g_\sigma(s-x)\,d\tilde P_{\theta'}(x)\Bigr\}\Bigl\{\int g_\sigma(s-x')\,d\tilde P_{\theta''}(x')\Bigr\}\Big/g_\sigma(s)\,ds \\
&= \iint\Bigl\{\int g_\sigma(s-x)\,g_\sigma(s-x')/g_\sigma(s)\,ds\Bigr\}\,d\tilde P_{\theta'}(x)\,d\tilde P_{\theta''}(x') \\
&= \iint\exp\bigl(x^\dagger x'/\sigma^2\bigr)\,d\tilde P_{\theta'}(x)\,d\tilde P_{\theta''}(x') =: \bigl\langle\tilde P_{\theta'},\tilde P_{\theta''}\bigr\rangle_{\exp}, \tag{6.18}
\end{align*}
for all vectors $\theta', \theta''$, since $\tilde X$ coincides with the vector $(\beta'_{X,1},\ldots,\beta'_{X,m})$ from (3.1). Note that $\langle\cdot,\cdot\rangle_{\exp}$ represents an inner product on the linear space of all finite signed measures $Q$ on $\mathcal{B}(\mathbb{R}^m)$ such that the support of the measure $|Q|$ is included in the $m$-dimensional closed unit ball around $0$. By a slight abuse of the notation we write $\langle f_{\mathcal{K}}, f_{\mathcal{K}'}\rangle_{\exp}$ for the corresponding inner product of the signed measures which are induced by the functions $f_{\mathcal{K}}$ and $f_{\mathcal{K}'}$. We show that the $f_{\mathcal{K}}$ form an orthogonal system with respect to this inner product; precisely we have that
\begin{align*}
\langle f_{\mathcal{K}}, f_{\mathcal{K}'}\rangle_{\exp} &= \iint\exp\bigl(x^\dagger x'/\sigma^2\bigr)\,f_{\mathcal{K}}(x)\,f_{\mathcal{K}'}(x')\,dx\,dx' \\
&= 1\{\mathcal{K}=\mathcal{K}'\}\cdot\Bigl[\iint\exp\bigl\{st/(\sigma^2 K)\bigr\}\,f_0(s)\,f_0(t)\,ds\,dt\Bigr]^K\cdot\Bigl[\iint\exp\bigl\{st/\bigl(\sigma^2(m-K)\bigr)\bigr\}\,|f_0(s)|\,|f_0(t)|\,ds\,dt\Bigr]^{m-K} \\
&= 1\{\mathcal{K}=\mathcal{K}'\}\cdot\bigl(16\sigma^2 K\bigr)^{-K}\cdot\bigl\{1\pm o(1)\bigr\}, \tag{6.19}
\end{align*}
if $K$ and $m-K$ tend to infinity as $n\to\infty$.
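The key identity behind (6.18), namely $\int g_\sigma(s-x)g_\sigma(s-x')/g_\sigma(s)\,ds = \exp(x^\dagger x'/\sigma^2)$, follows by completing the square in the Gaussian exponents. The snippet below (ours, illustrative only) verifies it by quadrature in one dimension.

```python
# Quadrature check (illustration only) of the one-dimensional identity
#   int g(s - x) g(s - xp) / g(s) ds = exp(x * xp / sigma^2),
# where g is the N(0, sigma^2) density; this is the identity behind (6.18).
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

sigma, x, xp = 1.3, 0.4, -0.8
g = lambda s: norm.pdf(s, scale=sigma)
val, _ = quad(lambda s: g(s - x) * g(s - xp) / g(s), -np.inf, np.inf)
print(val, np.exp(x * xp / sigma ** 2))   # agree to quadrature accuracy
```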
Combining (6.18), (6.19) and the fact that the $\theta_{\mathcal{K}}$'s are centered random variables, we deduce that the first term in (6.17) equals
\[
E_\theta\int\bigl|f_{Y,\theta}(v)\bigr|^2\,dP_V(v) = \|S\|^2_{\exp} + \eta^2\binom{m}{K}^{-2}\sum_{\mathcal{K}}\bigl\|f_{\mathcal{K}}\bigr\|^2_{\exp}, \tag{6.20}
\]
where $\|\cdot\|_{\exp}$ stands for the norm which is induced by $\langle\cdot,\cdot\rangle_{\exp}$ and the measure $S$ on $\mathcal{B}(\mathbb{R}^m)$ is defined by
\[
S(B) = (1-\eta)\,1_B(0) + \eta\binom{m}{K}^{-1}\sum_{\mathcal{K}}\int_B\bigl|f_{\mathcal{K}}(x)\bigr|\,dx, \qquad B\in\mathcal{B}(\mathbb{R}^m).
\]
The second term in (6.17) is
\begin{align*}
&E_{\theta',\theta''}\int\Bigl\{\int f_{Y,\theta'}(u)\,f_{Y,\theta''}(u)\,dP_V(u)\Bigr\}\,f^{(n)}_{Y,\theta'}(v)\,f^{(n)}_{Y,\theta''}(v)\Big/\bigl\{E_\theta f^{(n)}_{Y,\theta}(v)\bigr\}\,dP^{(n)}_V(v) \\
&\qquad= \|S\|^2_{\exp} + \eta^2\binom{m}{K}^{-2}\sum_{\mathcal{K}}\bigl\|f_{\mathcal{K}}\bigr\|^2_{\exp}\cdot\int\bigl\{E_\theta\,\theta_{\mathcal{K}}\,f^{(n)}_{Y,\theta}(v)\bigr\}^2\Big/E_\theta f^{(n)}_{Y,\theta}(v)\,dP^{(n)}_V(v),
\end{align*}
where $\theta'$ and $\theta''$ denote two independent copies of $\theta$. There we have used the fact that
\[
E_\theta\int f^{(n)}_{Y,\theta}(v)\,dP^{(n)}_V(v) = 1, \qquad E_\theta\,\theta_{\mathcal{K}}\,f^{(n)}_{Y,\theta} = \tfrac12\,E_\theta f^{(n)}_{Y,\theta(\mathcal{K},+)} - \tfrac12\,E_\theta f^{(n)}_{Y,\theta(\mathcal{K},-)}, \tag{6.21}
\]
where $\theta(\mathcal{K},\pm)$ denotes the vector $\theta$ with $\theta_{\mathcal{K}}$ replaced by $\pm 1$.
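The second identity in (6.21) is simply conditioning on the value of the single coordinate $\theta_{\mathcal{K}}$. A small Monte-Carlo illustration (ours, with an arbitrary test functional standing in for $f^{(n)}_{Y,\theta}$) is given below.

```python
# Monte-Carlo illustration (ours) of (6.21): for theta_K uniform on {-1, +1},
# independent of the other coordinates,
#   E[theta_K g(theta)] = (E[g(theta, theta_K = +1)] - E[g(theta, theta_K = -1)]) / 2.
import numpy as np

rng = np.random.default_rng(2)
w = np.array([0.3, -0.5, 0.2])
g = lambda t: np.exp(t @ w)                 # arbitrary test functional
theta = rng.choice([-1.0, 1.0], size=(500_000, 3))
tp, tm = theta.copy(), theta.copy()
tp[:, 1], tm[:, 1] = 1.0, -1.0              # flip coordinate "K" = 1
print(np.mean(theta[:, 1] * g(theta)),
      0.5 * (np.mean(g(tp)) - np.mean(g(tm))))   # agree up to MC error
```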
Hence,
\[
\int E_\theta\,\theta_{\mathcal{K}}\,f^{(n)}_{Y,\theta}(v)\,dP^{(n)}_V(v) = 0.
\]
Together with (6.20) this implies that the right hand side of (6.17) equals
\[
\eta^2\binom{m}{K}^{-2}\sum_{\mathcal{K}}\bigl\|f_{\mathcal{K}}\bigr\|^2_{\exp}\cdot\Bigl[1 - \int\bigl\{E_\theta\,\theta_{\mathcal{K}}\,f^{(n)}_{Y,\theta}(v)\bigr\}^2\Big/E_\theta f^{(n)}_{Y,\theta}(v)\,dP^{(n)}_V(v)\Bigr]. \tag{6.22}
\]
Using (6.21) and the fact that $2E_\theta f^{(n)}_{Y,\theta} = E_\theta f^{(n)}_{Y,\theta(\mathcal{K},+)} + E_\theta f^{(n)}_{Y,\theta(\mathcal{K},-)}$, we establish that
\begin{align*}
1 - \int\bigl\{E_\theta\,\theta_{\mathcal{K}}\,f^{(n)}_{Y,\theta}(v)\bigr\}^2\Big/E_\theta f^{(n)}_{Y,\theta}(v)\,dP^{(n)}_V(v) &\ge 2\int\sqrt{E_\theta f^{(n)}_{Y,\theta(\mathcal{K},+)}(v)}\,\sqrt{E_\theta f^{(n)}_{Y,\theta(\mathcal{K},-)}(v)}\,dP^{(n)}_V(v) - 1 \\
&\ge 2\,E_\theta\int\sqrt{f^{(n)}_{Y,\theta(\mathcal{K},+)}(v)}\,\sqrt{f^{(n)}_{Y,\theta(\mathcal{K},-)}(v)}\,dP^{(n)}_V(v) - 1 \\
&= 2\,E_\theta\Bigl(\int\sqrt{f_{Y,\theta(\mathcal{K},+)}(v)}\,\sqrt{f_{Y,\theta(\mathcal{K},-)}(v)}\,dP_V(v)\Bigr)^n - 1, \tag{6.23}
\end{align*}
by the Cauchy–Schwarz inequality.
The Hellinger affinity between the densities $f_{Y,\theta(\mathcal{K},+)}$ and $f_{Y,\theta(\mathcal{K},-)}$ is bounded from below in terms of the corresponding $\chi^2$-distance, i.e.
\[
\int\sqrt{f_{Y,\theta(\mathcal{K},+)}(v)}\,\sqrt{f_{Y,\theta(\mathcal{K},-)}(v)}\,dP_V(v) \ge 1 - \chi^2\bigl\{f_{Y,\theta(\mathcal{K},+)}, f_{Y,\theta(\mathcal{K},-)}\bigr\},
\]
where $\chi^2(f,g) = \int (f-g)^2/f\,dP_V$. We refer to the book of Tsybakov (2009) for an intensive review of these information distances. We deduce that
\[
f_{Y,\theta(\mathcal{K},+)}(V) = f^{[m]}_{Y,\theta(\mathcal{K},+)}(\beta'_{V,1},\ldots,\beta'_{V,m}) \ge 1-\eta, \qquad\text{a.s.}
\]
Equipped with this inequality and (6.18), we consider that
\[
\chi^2\bigl(f_{Y,\theta(\mathcal{K},+)}, f_{Y,\theta(\mathcal{K},-)}\bigr) \le \frac{1}{1-\eta}\,\bigl\|\tilde P_{\theta(\mathcal{K},+)} - \tilde P_{\theta(\mathcal{K},-)}\bigr\|^2_{\exp} \le \frac{4\eta^2}{1-\eta}\binom{m}{K}^{-2}\bigl\|f_{\mathcal{K}}\bigr\|^2_{\exp}.
\]
Combining this with (6.19), (6.22) and (6.23), we obtain that
\[
\sup_{P_X\in\mathcal{F}_X} R\bigl(\hat f_n, f_Y\bigr) \ge \eta^2\binom{m}{K}^{-1}\bigl(16\sigma^2 K\bigr)^{-K}\cdot\bigl\{1\pm o(1)\bigr\}\cdot\Bigl(1 - \frac{8\eta^2}{1-\eta}\binom{m}{K}^{-2}\bigl(16\sigma^2 K\bigr)^{-K}\cdot\bigl\{1\pm o(1)\bigr\}\Bigr)^n. \tag{6.24}
\]
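The inequality linking the Hellinger affinity and the $\chi^2$-distance invoked above is classical (see Tsybakov, 2009); the check below (ours, illustrative only) draws random discrete densities and confirms $\sum\sqrt{fg}\ge 1-\chi^2(f,g)$.

```python
# Numerical illustration (ours) of the information inequality used above:
# the Hellinger affinity sum(sqrt(f * g)) is at least 1 - chi2(f, g), with
# chi2(f, g) = sum((f - g)^2 / f), for discrete densities f and g.
import numpy as np

rng = np.random.default_rng(1)
for _ in range(5):
    f = rng.random(20); f /= f.sum()
    g = rng.random(20); g /= g.sum()
    affinity = float(np.sum(np.sqrt(f * g)))
    chi2 = float(np.sum((f - g) ** 2 / f))
    print(affinity >= 1 - chi2, round(affinity, 3), round(1 - chi2, 3))
```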
Now we take $m = \lfloor(D_M\log n)^{1/\gamma}\rfloor$ and $K = \lfloor D_K(\log n)/\log(\log n)\rfloor$ for some constants $D_M, D_K > 0$. Whenever $-D_M C_{X,1} - D_K(1/\gamma - 1) - D_K < -1$, the inequality (6.24), together with (6.16), yields that
\[
\liminf_{n\to\infty}\ \sup_{P_X\in\mathcal{F}_X}\ \bigl\{\log R\bigl(\hat f_n, f_Y\bigr)\bigr\}\big/\log n \ \ge\ -D_M C_{X,1} - D_K/\gamma.
\]
We may choose $D_K = \gamma/(2-\gamma)$ and $D_M$ arbitrarily close to the lower limit imposed by the above constraint. \hfill$\Box$

The proof of the adaptivity result follows a usual structure of adaptivity proofs for cross-validation techniques; see e.g. Section 2.5.1 in the book of Meister (2009) for a related proof in the field of density deconvolution.

Let $(m_n, K_n)$ be defined as in the statement of Theorem 4.2 and define the set
\[
G' = \bigl\{(m,K)\in G : R\bigl(\hat f^{[m,K]}_Y, f_Y\bigr) > 2\,R\bigl(\hat f^{[m_n,K_n]}_Y, f_Y\bigr)\bigr\}.
\]
Using the notation $\|g\|^2_{P_V} = \int |g(x)|^2\,dP_V(x)$, for any $g\in L_2(P_V)$, we need to prove that $\lim_{n\to\infty}\sup_{P_X\in\mathcal{F}_X} P\bigl(n^\gamma\|\hat f^{[\hat m,\hat K]}_Y - f_Y\|^2_{P_V}\ge n^d\bigr) = 0$. By Markov's inequality we have
\begin{align*}
P\bigl(n^\gamma\bigl\|\hat f^{[\hat m,\hat K]}_Y - f_Y\bigr\|^2_{P_V}\ge n^d\bigr) &\le \sum_{(m,K)\in G\setminus G'} P\bigl(\bigl\|\hat f^{[m,K]}_Y - f_Y\bigr\|^2_{P_V} > n^{-\gamma+d}\bigr) + P\bigl[(\hat m,\hat K)\in G'\bigr] \\
&\le 2\,\#(G)\cdot n^{\gamma-d}\cdot R\bigl(\hat f^{[m_n,K_n]}_Y, f_Y\bigr) + \sum_{(m,K)\in G'} P\bigl(\hat m = m,\ \hat K = K\bigr). \tag{6.25}
\end{align*}
By Theorem 4.2, the first term in (6.25) converges to 0 as $n\to\infty$, uniformly over $P_X\in\mathcal{F}_X$. It remains to study the second term.

On the event $\{\hat m = m, \hat K = K\}$, we have $\mathrm{CV}(m,K)\le\mathrm{CV}(m_n,K_n)$ and, hence, also
\begin{align*}
&\bigl\|\hat f^{[m,K]}_Y\bigr\|^2_{P_V} - E\bigl\|\hat f^{[m,K]}_Y\bigr\|^2_{P_V} - \Delta_1(m,K) - 2\Delta_2(m,K) + R\bigl(\hat f^{[m,K]}_Y, f_Y\bigr) \\
&\qquad\le \bigl\|\hat f^{[m_n,K_n]}_Y\bigr\|^2_{P_V} - E\bigl\|\hat f^{[m_n,K_n]}_Y\bigr\|^2_{P_V} - \Delta_1(m_n,K_n) - 2\Delta_2(m_n,K_n) + R\bigl(\hat f^{[m_n,K_n]}_Y, f_Y\bigr), \tag{6.26}
\end{align*}
where
\begin{align*}
\Delta_1(m,K) &= \{n(n-1)\}^{-1}\sum_{i\ne i'}\sum_{k\in\mathcal{K}(m,K)}\bar\Xi(i,m,k)\cdot\bar\Xi(i',m,k), \\
\Delta_2(m,K) &= n^{-1}\sum_{i=1}^n\sum_{k\in\mathcal{K}(m,K)}\bigl\{E\,\Xi(1,m,k)\bigr\}\cdot\bar\Xi(i,m,k), \\
\mathcal{K}(m,K) &= \bigl\{k\in\mathbb{N}_0^m : k_1+\cdots+k_m\le K\bigr\},
\end{align*}
$\Xi(j,m,k) = H_k\bigl(\beta'_{Y_j,1}/\sigma,\ldots,\beta'_{Y_j,m}/\sigma\bigr)$ and $\bar\Xi(j,m,k) = \Xi(j,m,k) - E\,\Xi(j,m,k)$.

The first terms of both sides of the inequality at (6.26) can be represented as follows, using the orthonormality of the $H_k$'s:
\begin{align*}
\bigl\|\hat f^{[m,K]}_Y\bigr\|^2_{P_V} - E\bigl\|\hat f^{[m,K]}_Y\bigr\|^2_{P_V} &= \sum_{k\in\mathcal{K}(m,K)}\Bigl\{\Bigl|\frac1n\sum_{i=1}^n\Xi(i,m,k)\Bigr|^2 - E\Bigl|\frac1n\sum_{i=1}^n\Xi(i,m,k)\Bigr|^2\Bigr\} \\
&= \frac{1}{n^2}\sum_{i,i'}\sum_{k\in\mathcal{K}(m,K)}\bigl\{\Xi(i,m,k)\Xi(i',m,k) - E\,\Xi(i,m,k)\Xi(i',m,k)\bigr\} \\
&= (1-1/n)\bigl\{\Delta_1(m,K) + 2\Delta_2(m,K)\bigr\} + \Delta_3(m,K),
\end{align*}
where $\Delta_3(m,K) = n^{-2}\sum_{i=1}^n\sum_{k\in\mathcal{K}(m,K)}\bigl\{\Xi^2(i,m,k) - E\,\Xi^2(i,m,k)\bigr\}$. Together with (6.26) this implies that, for $(m,K)\in G'$, $R\bigl(\hat f^{[m,K]}_Y, f_Y\bigr)/2 \le \Delta(m,K,m_n,K_n)$, where
\begin{align*}
\Delta(m,K,m_n,K_n) &= (1+1/n)\bigl\{\bigl|\Delta_1(m,K)\bigr| + \bigl|\Delta_1(m_n,K_n)\bigr| + 2\bigl|\Delta_2(m,K)-\Delta_2(m_n,K_n)\bigr|\bigr\} \\
&\qquad+ \bigl|\Delta_3(m,K)\bigr| + \bigl|\Delta_3(m_n,K_n)\bigr|.
\end{align*}
Hence the second term in (6.25) has the following upper bound:
\[
2\sum_{(m,K)\in G'}\bigl\{R\bigl(\hat f^{[m,K]}_Y, f_Y\bigr)\bigr\}^{-1}\bigl\{E\,\Delta^2(m,K,m_n,K_n)\bigr\}^{1/2}. \tag{6.27}
\]
In order to bound (6.27), we need a lower bound on $R\bigl(\hat f^{[m,K]}_Y, f_Y\bigr)$.
Theorem 4.1 provides only an upper bound on this term, but an inspection of the proof of this theorem -- in particular (6.10) to (6.12) -- yields that
\[
R\bigl(\hat f^{[m,K]}_Y, f_Y\bigr) \ge B(m,K) + V^*_n(m,K) + D^*(m) - \frac1n\bigl\|f^{[m]}_Y(\sigma\cdot)\bigr\|^2_g, \tag{6.28}
\]
where $B(m,K)$ is the term $B$ from Theorem 4.1 and
\[
D^*(m) = E\,\operatorname{var}\bigl\{f_Y(V)\bigm|\mathcal{A}_m\bigr\}, \qquad V^*_n(m,K) = \frac1n\binom{K+m}{K}.
\]
Here we have used the fact that $E\bigl\{H^2_{k_1,\ldots,k_m}(\beta'_{Y_1,1}/\sigma,\ldots,\beta'_{Y_1,m}/\sigma)\bigr\}\ge 1$, which comes from the first lines of (6.12).
In order to bound (6.27), we also need an upper bound for $E\,\Delta^2(m,K,m_n,K_n)$, which involves $\Delta_1$ to $\Delta_3$. For $\Delta_1$ we have
\begin{align*}
E\bigl|\Delta_1(m,K)\bigr|^2 &= \frac{2}{n(n-1)}\sum_{k,k'\in\mathcal{K}(m,K)}\bigl[\operatorname{cov}\bigl\{\Xi(1,m,k),\Xi(1,m,k')\bigr\}\bigr]^2 \\
&= \frac{2}{n(n-1)}\sum_{k,k'\in\mathcal{K}(m,K)}\bigl[\bigl\langle H_k, H_{k'}f^{[m]}_Y(\sigma\cdot)\bigr\rangle_g - \bigl\langle H_k, f^{[m]}_Y(\sigma\cdot)\bigr\rangle_g\cdot\bigl\langle H_{k'}, f^{[m]}_Y(\sigma\cdot)\bigr\rangle_g\bigr]^2 \\
&\le \frac{4}{n(n-1)}\sum_{k,k'\in\mathcal{K}(m,K)}\bigl\langle H_k, H_{k'}f^{[m]}_Y(\sigma\cdot)\bigr\rangle^2_g + \frac{4}{n(n-1)}\Bigl(\sum_{k\in\mathcal{K}(m,K)}\bigl\langle H_k, f^{[m]}_Y(\sigma\cdot)\bigr\rangle^2_g\Bigr)^2 \\
&\le \frac{4}{n(n-1)}\sum_{k'\in\mathcal{K}(m,K)}\bigl\|H_{k'}f^{[m]}_Y(\sigma\cdot)\bigr\|^2_g + \frac{4}{n(n-1)}\bigl\|f^{[m]}_Y(\sigma\cdot)\bigr\|^4_g \\
&\le \frac{4}{n-1}\,\bigl\|\{f^{[m]}_Y(\sigma\cdot)\}^2\bigr\|_g\cdot\sup_{k\in\mathcal{K}(m,K)}\bigl\|H^2_k\bigr\|_g\cdot V^*_n(m,K) + \frac{4}{n(n-1)}\bigl\|f^{[m]}_Y(\sigma\cdot)\bigr\|^4_g, \tag{6.29}
\end{align*}
where we have used Parseval's identity with respect to the orthonormal system of the $H_k$. For the term involving $\Delta_2$ we have
\begin{align*}
E\bigl|\Delta_2(m,K) - \Delta_2(m_n,K_n)\bigr|^2 &\le \frac1n\,E\Bigl|\sum_{k\in\mathcal{K}(m,K)}\bar\Xi(1,m,k)\bigl\{E\,\Xi(1,m,k)\bigr\} - \sum_{k\in\mathcal{K}(m_n,K_n)}\bar\Xi(1,m_n,k)\bigl\{E\,\Xi(1,m_n,k)\bigr\}\Bigr|^2 \\
&= \frac1n\,E\Bigl|\sum_{k\in\mathbb{N}_0^{\bar m}}\bigl\{1_{\mathcal{K}_{\bar m}(m,K)}(k) - 1_{\mathcal{K}_{\bar m}(m_n,K_n)}(k)\bigr\}\,\bar\Xi(1,\bar m,k)\bigl\{E\,\Xi(1,\bar m,k)\bigr\}\Bigr|^2 \\
&= \frac1n\sum_{k,k'}\bigl\{1_{\mathcal{K}_{\bar m}(m,K)}(k) - 1_{\mathcal{K}_{\bar m}(m_n,K_n)}(k)\bigr\}\bigl\{1_{\mathcal{K}_{\bar m}(m,K)}(k') - 1_{\mathcal{K}_{\bar m}(m_n,K_n)}(k')\bigr\} \\
&\qquad\cdot\bigl\langle H_k, f^{[\bar m]}_Y(\sigma\cdot)\bigr\rangle_g\,\bigl\langle H_{k'}, f^{[\bar m]}_Y(\sigma\cdot)\bigr\rangle_g\,\operatorname{cov}\bigl\{\Xi(1,\bar m,k),\Xi(1,\bar m,k')\bigr\}, \tag{6.30}
\end{align*}
with $\bar m = \max\{m, m_n\}$ and $\mathcal{K}_{\bar m}(m,K) = \bigl\{k\in\mathbb{N}_0^{\bar m} : k_1+\cdots+k_m\le K,\ k_l = 0\ \forall\,l > m\bigr\}$. Here we have used the fact that $E\bigl\{f^{[\bar m]}_Y(\beta'_{V,1},\ldots,\beta'_{V,\bar m})\bigm|\mathcal{A}_m\bigr\} = f^{[m]}_Y(\beta'_{V,1},\ldots,\beta'_{V,m})$, a.s., which follows from Lemma 3.1(a).

Bounding the covariances by $\bigl\langle H_k, H_{k'}f^{[\bar m]}_Y(\sigma\cdot)\bigr\rangle_g$ and applying the Cauchy–Schwarz inequality and Parseval's identity, we get that the right hand side of (6.30) has the following upper bound:
\begin{align*}
&\frac1n\Bigl(\sum_k\bigl|1_{\mathcal{K}_{\bar m}(m,K)}(k) - 1_{\mathcal{K}_{\bar m}(m_n,K_n)}(k)\bigr|\,\bigl\langle H_k, f^{[\bar m]}_Y(\sigma\cdot)\bigr\rangle^2_g\Bigr)^{1/2}\cdot\Bigl(\sum_{k\in\mathcal{K}(\bar m,\bar K)}\bigl\|H_k\cdot f^{[\bar m]}_Y(\sigma\cdot)\bigr\|^2_g\Bigr)^{1/2} \\
&\quad\le V^*_n(\bar m,\bar K)\cdot\bigl\{\bigl\|\mathcal{P}_{H(m,K)}f^{[\bar m]}_Y(\sigma\cdot)\bigr\|^2_g + \bigl\|\mathcal{P}_{H(m_n,K_n)}f^{[\bar m]}_Y(\sigma\cdot)\bigr\|^2_g - 2\bigl\|\mathcal{P}_{H(\underline m,\underline K)}f^{[\bar m]}_Y(\sigma\cdot)\bigr\|^2_g\bigr\}^{1/2} \\
&\qquad\cdot\binom{\bar m+\bar K}{\bar K}^{-1/2}\cdot\bigl\|\{f^{[\bar m]}_Y(\sigma\cdot)\}^2\bigr\|^{1/2}_g\cdot\max\bigl\{\|H^2_k\|^{1/2}_g : k\in\mathcal{K}(\bar m,\bar K)\bigr\}, \tag{6.31}
\end{align*}
where $\bar K = \max\{K, K_n\}$, $\underline m = \min\{m, m_n\}$, $\underline K = \min\{K, K_n\}$ and $\mathcal{P}_{H(m,K)}$ denotes the orthogonal projector onto the linear subspace $H(m,K)$ of $L_{2,g}(\mathbb{R}^{\bar m})$. Since
\[
\bigl\|\mathcal{P}_{H(m,K)}f^{[\bar m]}_Y(\sigma\cdot)\bigr\|^2_g = \bigl\|f^{[\bar m]}_Y(\sigma\cdot)\bigr\|^2_g - B(m,K) = E\bigl|f_Y(V)\bigr|^2 - D^*(\bar m) - B(m,K),
\]
then, using Lemma 3.1(a), the right hand side of (6.31) has the following upper bound:
\[
3\,V^*_n(\bar m,\bar K)\cdot\bigl\{D^*(\underline m) + D^*(m_n) + B(m,\underline K) + B(m_n,\underline K)\bigr\}^{1/2}\cdot\binom{\bar m+\bar K}{\bar K}^{-1/2}\cdot\bigl\|\{f^{[\bar m]}_Y(\sigma\cdot)\}^2\bigr\|^{1/2}_g\cdot\max\bigl\{\|H^2_k\|^{1/2}_g : k\in\mathcal{K}(\bar m,\bar K)\bigr\}, \tag{6.32}
\]
since $B(m,K)$ decreases as $K$ increases. Finally, for the term involving $\Delta_3$ we have
\[
E\bigl|\Delta_3(m,K)\bigr|^2 \le n^{-3}\,E\Bigl\{\sum_{k\in\mathcal{K}(m,K)}\Xi^2(1,m,k)\Bigr\}^2 \le n^{-1}\bigl\{V^*_n(m,K)\bigr\}^2\,\bigl\|\{f^{[m]}_Y(\sigma\cdot)\}^2\bigr\|_g\cdot\max\bigl\{\|H^2_k\|^2_g : k\in\mathcal{K}(m,K)\bigr\}. \tag{6.33}
\]
In order to bound the terms (6.29), (6.32) and (6.33), we need some technical results. Using the explicit sum representation of the Hermite polynomials, we write
\begin{align*}
\int H^\ell_k(x)\,\frac{1}{\sqrt{2\pi}}\exp(-x^2/2)\,dx &= \sum_{i_1,\ldots,i_\ell=0}^{\lfloor k/2\rfloor}(k!)^{\ell/2}\cdot\frac{(-1)^{i_1+\cdots+i_\ell}\,2^{-i_1-\cdots-i_\ell}}{i_1!\cdots i_\ell!\cdot(k-2i_1)!\cdots(k-2i_\ell)!}\int x^{k\ell-2(i_1+\cdots+i_\ell)}\,\frac{1}{\sqrt{2\pi}}\exp(-x^2/2)\,dx \\
&\le 2^\ell\sum_{i=0}^{\lfloor k\ell/2\rfloor}\frac{2^{-i}\,(k!)^{\ell/2}\,\Gamma(k\ell/2+1/2-i)}{i!\,(k\ell-2i)!}\times\sum_{i_1+\cdots+i_\ell=i}\binom{i}{i_1,\ldots,i_\ell}\binom{k\ell-2i}{k-2i_1,\ldots,k-2i_\ell}\,2^{k\ell/2-i}\Big/\sqrt{\pi} \\
&\le (\ell+1)^{\ell/2}\,\ell^{k\ell/2},
\end{align*}
for any $k\in\mathbb{N}$ and any even integer $\ell > 0$.
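For fixed even $\ell$, this says the moments $\int H_k^\ell\,d\Phi$ grow at most geometrically in $k$; the check below (ours, illustrative, for $\ell = 4$) evaluates the moments by Gauss–Hermite quadrature for the weight $e^{-x^2/2}$ and compares with the bound above.

```python
# Gauss-Hermite check (illustration only) that int H_k(x)^l phi(x) dx grows at
# most geometrically in k for fixed even l (here l = 4), consistent with the
# bound (l+1)^(l/2) * l^(k*l/2) derived above; H_k = He_k / sqrt(k!).
import math
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval

nodes, weights = hermegauss(40)           # weight exp(-x^2 / 2)
weights = weights / np.sqrt(2 * np.pi)    # now integrates against N(0,1)

l = 4
for k in range(1, 8):
    coeffs = np.zeros(k + 1)
    coeffs[k] = 1 / math.sqrt(math.factorial(k))   # H_k = He_k / sqrt(k!)
    Hk = hermeval(nodes, coeffs)
    moment = np.sum(weights * Hk ** l)
    print(k, moment, moment <= (l + 1) ** (l / 2) * l ** (k * l / 2))
```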
Furthermore we have
\[
\bigl\|\{f^{[m]}_Y(\sigma\cdot)\}^2\bigr\|^2_g = E\bigl|E\bigl\{f_Y(V)\bigm|\mathcal{A}_m\bigr\}\bigr|^4 \le E f_Y^4(V) \le E\Bigl[\exp\Bigl(-\frac{2}{\sigma^2}\int_0^1|X'(t)|^2\,dt\Bigr)\,E\Bigl\{\exp\Bigl(\frac{4}{\sigma^2}\int_0^1 X'(t)\,dV(t)\Bigr)\Bigm| X\Bigr\}\Bigr] \le \exp\bigl\{(8/\sigma^2)\,C_{X,2}\bigr\},
\]
where we used (6.6) and Assumption 2. Applying these results to (6.29), (6.32) and (6.33), and recalling (6.28), we deduce that (6.27) has the upper bound
\[
(\log n)^{2/\gamma}\,D^K\,\Bigl[\bigl\{n\,R\bigl(\hat f^{[m,K]}_Y, f_Y\bigr)\bigr\}^{-1/2} + \binom{\bar m+\bar K}{\bar K}^{-1/2}\Bigr],
\]
for some global finite constant $D > 0$, so that (6.27) converges to zero uniformly over $P_X\in\mathcal{F}_X$. This completes the proof of the theorem. \hfill$\Box$

Acknowledgments
Delaigle's research was supported by a grant and a fellowship from the Australian Research Council. The authors are grateful to the editors and two referees for their helpful and valuable comments.
References

[1] Appell, P. (1880). Sur une classe de polynômes. Annales scientifiques de l'École Normale Supérieure, 2e série, 119–144.
[2] Baíllo, A., Cuevas, A. and Cuesta-Albertos, J.A. (2011). Supervised classification for a family of Gaussian functional models. Scand. J. Statist., 480–498.
[3] Bugni, F.A., Hall, P., Horowitz, J.L. and Neumann, G.R. (2009). Goodness-of-fit tests for functional data. Econometrics J., 1–18.
[4] Cameron, R.H. and Martin, W.T. (1947). The orthogonal development of non-linear functionals in series of Fourier–Hermite functionals. Ann. Math., 385–392.
[5] Chagny, G. and Roche, A. (2014). Adaptive and minimax estimation of the cumulative distribution function given a functional covariate. Elec. J. Statist., 2352–2404.
[6] Chesneau, C., Kachour, M. and Maillot, B. (2013). Nonparametric estimation for functional data by wavelet thresholding. REVSTAT – Statistical Journal, 211–230.
[7] Ciollaro, M., Genovese, Ch.R. and Wang, D. (2016). Nonparametric clustering of functional data using pseudo-densities. Elec. J. Statist., 2922–2972.
[8] Dabo-Niang, S. (2002). Estimation de la densité en dimension infinie: application aux processus de type diffusion. C.R. Acad. Sci. Paris I, 213–216.
[9] Dabo-Niang, S. (2003). Density estimation in a separable metric space. Pub. Inst. Stat. Univ. Paris, fasc. 1-2, 3–21.
[10] Dabo-Niang, S. (2004a). Density estimation by orthogonal series in an infinite dimensional space: application to processes of diffusion type I. J. Nonparametric Statistics, 171–186.
[11] Dabo-Niang, S. (2004b). Kernel density estimator in an infinite-dimensional space with a rate of convergence in the case of diffusion process. Applied Mathematics Letters, 381–386.
[12] Dabo-Niang, S. and Yao, A.-F. (2013). Kernel spatial density estimation in infinite dimension space. Metrika, 19–52.
[13] Dai, X., Müller, H.-G. and Yao, F. (2017). Optimal Bayes classifiers for functional data and density ratios. Biometrika, 545–560.
[14] Delaigle, A. and Hall, P. (2010). Defining probability density for a distribution of random functions. Ann. Statist., 1171–1193.
[15] Delaigle, A. and Hall, P. (2012). Achieving near-perfect classification for functional data. J. Roy. Statist. Soc., Ser. B, 267–286.
[16] Delaigle, A., Hall, P. and Bathia, N. (2012). Componentwise classification and clustering of functional data. Biometrika, 299–313.
[17] Delaigle, A. and Hall, P. (2013). Classification using censored functional data. J. Amer. Statist. Assoc., 1269–1283.
[18] Escabias, M., Aguilera, A.M. and Valderrama, M.J. (2007). Functional PLS logit regression model. Comput. Statist. Data Anal., 4891–4902.
[19] Ferraty, F. and Vieu, P. (2006). Nonparametric Functional Data Analysis: Theory and Practice. Springer.
[20] Girsanov, I.V. (1960). On transforming a certain class of stochastic processes by absolutely continuous substitution of measures. Theo. Probab. Appl., 285–301.
[21] Hall, P. and Hosseini-Nasab, M. (2006). On properties of functional principal components analysis. J. Roy. Statist. Soc., Ser. B, 109–126.
[22] Hall, P., Poskitt, D. and Presnell, B. (2001). A functional data-analytic approach to signal discrimination. Technometrics, 1–9.
[23] Hall, R., Rinaldo, A. and Wasserman, L. (2013). Differential privacy for functions and functional data. J. Mach. Learn. Research, 703–727.
[24] Jirak, M. (2016). Optimal eigen expansions and uniform bounds. Probab. Theo. Rel. Fields, 753–799.
[25] Karwa, V. and Slavković, A. (2016). Inference using noisy degrees: differentially private β-model and synthetic graphs. Ann. Statist., 87–112.
[26] Kim, A.K.H. (2014). Minimax bounds for estimation of normal mixtures. Bernoulli, 1802–1818.
[27] Lin, Z., Müller, H.-G. and Yao, F. (2018). Mixture inner product spaces and their application to functional data analysis. Ann. Statist., 370–400.
[28] Mas, A. (2012). Lower bound in regression for functional data by small ball probability representation in Hilbert space. Elec. J. Statist., 1745–1778.
[29] Mas, A. and Ruymgaart, F. (2015). High dimensional principal projections. Complex Anal. Operator Theo., 35–63.
[30] Meister, A. (2009). Deconvolution Problems in Nonparametric Statistics. Springer.
[31] Meister, A. (2016). Optimal classification and nonparametric regression for functional data. Bernoulli, 1729–1744.
[32] Mirshani, A., Reimherr, M. and Slavković, A. (2017). Establishing statistical privacy for functional data via functional densities. arXiv:1711.06660.
[33] Prakasa Rao, B.L.S. (2010a). Nonparametric density estimation for functional data by delta sequences. Braz. J. Probab. Stat., 468–478.
[34] Prakasa Rao, B.L.S. (2010b). Nonparametric density estimation for functional data via wavelets. Comm. Stat. – Theo. Meth., 1608–1618.
[35] Preda, C., Saporta, G. and Leveder, C. (2007). PLS classification of functional data. Comput. Statist., 223–235.
[36] Shin, H. (2008). An extension of Fisher's discriminant analysis for stochastic processes. J. Mult. Anal., 1191–1216.
[37] Tsybakov, A.B. (2009). Introduction to Nonparametric Estimation. Springer.
[38] Wasserman, L. and Zhou, S. (2010). A statistical framework for differential privacy. J. Amer. Statist. Assoc., 375–389.
[39] Zhang, X.L., Begleiter, H., Porjesz, B., Wang, W. and Litke, A. (1995). Event related potentials during object recognition tasks. Brain Research Bulletin, 531–538.