[PDF] Nonparametric calibration for stochastic reaction-diffusion equations based on discrete observations

Abstract

Nonparametric estimation for semilinear SPDEs, namely stochastic reaction-diffusion equations in one space dimension, is studied. We consider observations of the solution field on a discrete grid in time and space with infill asymptotics in both coordinates. Firstly, based on a precise analysis of the H\"older regularity of the solution process and its nonlinear component, we show that the asymptotic properties of diffusivity and volatility estimators derived from realized quadratic variations in the linear setup generalize to the semilinear SPDE. In particular, we obtain a rate-optimal joint estimator of the two parameters. Secondly, we derive a nonparametric estimator for the reaction function specifying the underlying equation. The estimate is chosen from a finite-dimensional function space based on a simple least squares criterion. Oracle inequalities with respect to both the empirical and usual L^2-risk provide conditions for the estimator to achieve the usual nonparametric rate of convergence. Adaptivity is provided via model selection.

Full PDF

NNonparametric calibration for stochastic reaction-diﬀusion equations based on discrete observations

Florian Hildebrandt and Mathias TrabsUniversit¨at Hamburg

Nonparametric estimation for semilinear SPDEs, namely stochastic reaction-diﬀusionequations in one space dimension, is studied. We consider observations of the solution ﬁeldon a discrete grid in time and space with inﬁll asymptotics in both coordinates. Firstly,based on a precise analysis of the H¨older regularity of the solution process and its nonlinearcomponent, we show that the asymptotic properties of diﬀusivity and volatility estimatorsderived from realized quadratic variations in the linear setup generalize to the semilinearSPDE. In particular, we obtain a rate-optimal joint estimator of the two parameters.Secondly, we derive a nonparametric estimator for the reaction function specifying theunderlying equation. The estimate is chosen from a ﬁnite-dimensional function space basedon a simple least squares criterion. Oracle inequalities with respect to both the empiricaland usual L -risk provide conditions for the estimator to achieve the usual nonparametricrate of convergence. Adaptivity is provided via model selection. Keywords: inﬁll asymptotics, realized quadratic variation, model selection, semilinear stochasticpartial diﬀerential equations.

In view of a growing number of stochastic partial diﬀerential equation (SPDE) models used in thenatural sciences as well as in mathematical ﬁnance, their data-based calibration has become an in-creasingly active ﬁeld of research during the last few years. Studying stochastic reaction-diﬀusionequations, this article advances the statistical theory for SPDEs based on discret observations in timeand space to a semilinear framework and provides a ﬁrst nonparametric estimator for the reactionfunction. Speciﬁcally, we consider the mild solution X = ( X t ( x ) , x ∈ [0 , , t ≥

0) of the SPDE  dX t ( x ) = (cid:0) ϑ ∂ ∂x X t ( x ) + f ( X t ( x )) (cid:1) dt + σdW t ( x ) ,X t (0) = X t (1) = 0 ,X ( x ) = ξ ( x ) , (1)with dW denoting a white noise in space and time and an initial condition ξ : [0 , → R . Theequation is parameterized in terms of the volatility σ >

0, the diﬀusivity ϑ > f : R → R on which we impose no parametric assumptions. Reaction-diﬀusion equations are typically used to model a scenario where local production of some quantity X with the nonlinear reaction function f competes with a linear diﬀusion eﬀect while undergoing1 a r X i v : . [ m a t h . S T ] F e b nternal ﬂuctuations, see [35] as well as, e.g., [24] for the physical background. Of particular interestis the case where f is a polynomial of odd degree with a negative leading coeﬃcient. With f ≡ σ and ϑ and thenonparametric estimation of f . We study the practically most natural situation where X is observedat a discrete grid { ( t i , y k ) } i =0 ,...,N, k =0 ,...,M ⊂ [0 , T ] × [0 ,

1] in time and space with

T > T → ∞ . Our focus lies on a high frequency regime in time and space where both the number M of spatial observations and the number N of temporal observations tend to inﬁnity.So far, the vast majority of literature on statistics for SPDEs deals with linear equations, see[12, 31] for reviews of various available approaches and observation schemes. The statistical analysis ofsemilinear SPDEs is limited to few works, most of which have only appeared during the last two years:Within the spectral approach , where one considers the observations (cid:104) X t , e k (cid:105) , t ∈ [0 , T ] , k ≤ K → ∞ with ( e k ) k ≥ being the eigenbasis of the diﬀerential operator in the underlying SPDE, Cialenco andGlatt-Holtz [13] considered diﬀusivity estimation for the stochastic two-dimensional Navier-Stokesequation. This theory hast been generalized by Pasemann and Stannat [41] as well as Pasemann et al.[40] to more general equations. Diﬀusivity estimation based on the local measurements approach dueto Altmeyer and Reiß [3] was studied in a semilinear framework, see Altmeyer et al. [1, 2]. Here, theobservations are given by (cid:104) X t , K h (cid:105) , t ∈ [0 , T ] , for a kernel function K h that localizes in space as h → X t = X t + N t with X t being the solutionto the corresponding linear system. Exploiting that the regularity of the nonlinear component N t exceeds the regularity of X t in the Sobolev spaces D (( − A ϑ ) γ ) , γ >

0, where A ϑ = ϑA is the drivingdiﬀerential operator, statistical methods for the linear equation turn out to be applicable also in thesemilinear model.To estimate the parameters σ and ϑ based on fully discrete observations, we extend the realizedquadratic variation based methods developed in Hildebrandt and Trabs [26] and Bibinger and Trabs[7] to the semilinear framework (1). Similar methods have also been applied to various linear SPDEmodels, e.g., in [11, 14, 34, 43, 45]. Speciﬁcally, assuming f ∈ C ( R ) in the model (1), we show thatthe asymptotic properties of the realized quadratic variations based on space, time and the doubleincrements D ik := X t i +1 ( y k +1 ) − X t i +1 ( y k ) − X t i ( y k +1 ) + X t i ( y k )derived for the case f ≡ σ , ϑ ) can be estimated jointly at the rate (cid:16) δ ∨ ∆ / T (cid:17) / with δ := y k +1 − y k , ∆ := t i +1 − t i . This rate is generally slower than the usual parametric rate (

M N ) − / , unless a balanced samplingdesign δ (cid:104) √ ∆ is present. Nevertheless, in view of an immediate extension of the lower bound from[26] to the case T → ∞ , this rate is seen to be optimal up to a logarithmic factor.As for spectral and local observations, our ﬁndings rely on the higher order regularity of the non-linear component of the solution process. However, we quantify the regularity in terms of H¨olderspaces instead of the Sobolev spaces since, being based on power variations, our estimators exploitthe regularity properties of X in the spaces of H¨older continuous functions. Besides the concreteapplication in statistics, our detailed account of the H¨older regularity in time and space also providesstructural insights from a theoretical probabilistic point of view.Similar to [2, 41], our techniques can principally be applied to more general types of semilinearSPDEs, e.g., Burgers’ equation where the nonlinearity is given by F ( u ) = − u ∂∂x u . To facilitate theanalysis, we would have to consider a more regular noise process than a cylindrical Brownian motion.Indeed, in order to obtain the higher order regularity of the nonlinear component, it is necessaryfor the solution process to take values in the domain of the nonlinearity. This is generally not the2ase for equations driven by space-time white noise and we restrict our analysis to nonlinearities ofNemytskii-type where F ( u ) = f ◦ u . Also, a more regular noise process would be necessary to obtainthe existence of a function-valued solution process to equation (1) in space dimension d ≥

2, which isa crucial requirement for dealing with discrete observations.To the authors’ knowledge, the only work on estimation of the nonlinearity in semilinear SPDEswas conducted by Goldys and Maslowski [22] who studied a parametric problem, assuming a fullobservation ( X t ) t ≤ T of a controlled SPDE as T → ∞ . Somewhat related, Pasemann et al. [40] studiedestimation of the diﬀusivity parameter when the nonlinearity is only known up to a ﬁnite-dimensionalnuisance parameter by using a joint maximum likelihood approach for spectral observations. Since itfollows, e.g., from the absolute continuity result in Koski and Loges [30] that even the simple linearfunction f ( x ) = ϑ x for some ϑ < T → ∞ .Nonparametric estimation of the reaction function f turns out to be comparable to nonparametricdrift estimation for ﬁnite-dimensional SDEs. The latter problem, has been addressed in the statisticsliterature for high-frequency observations in multiple works, see, e.g., to [15, 21, 27]. The startingpoint for drift estimation for SDEs is to formulate a regression model based on the discrete observa-tions. Generalizing this approach to the inﬁnite-dimensional SPDE framework, our key insight is theregression-type decomposition X t +∆ − S (∆) X t ∆ = f ( X t ) + “stochastic noise term” + “negligible remainder terms”where ( S ( t )) t ≥ is the strongly continuous semigroup on L ((0 , ϑ ∂ ∂x .Note that, in contrast to the ﬁnite-dimensional situation, the time increments in the response variableshave to be corrected in terms of the semigroup due to its presence in the nonlinear component N t = (cid:82) t S ( t − s ) f ( X s ) ds . Doing so, the contribution of the linear component to the right hand side of theregression model automatically becomes stochastically independent of the covariate X t and can, thus,be treated as stochastic noise. Clearly, computing S (∆) X t is not feasible in the discrete observationscheme, as it depends on the whole spatial process ( X t ( x ) , x ∈ (0 , ϑ which weaddress by employing a plug-in approach. Doing so, we obtain an approximation ˇ S ∆ t ≈ S (∆) X t whichis only based on the discrete data.To estimate f , we adapt the nonparametric least squares approach by [16] which was successfullyapplied to ergodic one-dimensional diﬀusion processes in Comte et al. [15]. Hence, our nonparametricestimator is deﬁned as the minimizer of γ N,M ( g ) := 1 M N N − (cid:88) i =0 M − (cid:88) k =1 (cid:16) g ( X t i ( y k )) − X t i +1 ( y k ) − ˇ S ∆ t i ( y k )∆ (cid:17) over the functions g from a suitable ﬁnite-dimensional approximation space. Working in an ergodicregime for the process ( X t ) t ≥ , we derive O p -type oracle inequalities for the estimator when the riskis either the empirical 2-norm with evaluations at the data points or the usual L -norm on a compactset. These oracle inequalities reﬂect the well-known bias-variance trade-oﬀ in nonparametric statistics.Under suﬃcient conditions on the sampling frequencies δ and ∆, an optimal choice of the dimensionof the approximation space yields the usual nonparametric rate T − α/ (2 α +1) where α is some regularityparameter associated with f . Employing a model selection approach similiar to [15], we obtain anadaptive estimator.While our parametric estimators for ( σ , ϑ ) are constructed as method of moments estimatorsdirectly based on the covariance structure of the discrete observations in the linear case, our non-parametric estimator for f is based on an approximation of the spatially continuous model. As aconsequence, our estimation method requires a ﬁne observation mesh throughout the whole space3omain. In particular, the condition M ∆ → ∞ becomes our fundamental requirement.This article is organized as follows: In Section 2, we provide a thorough introduction of the SPDEmodel and the considered observation scheme. Further, we derive precise results on the H¨older regu-larity in time and space of the solution process and show the higher order regularity of its nonlinearcomponent. In Section 3, we conclude that the asymptotic properties of realized quadratic variationsbased on space, time and double increments as well as the corresponding estimators mainly carry overfrom the linear setting. Section 4 is devoted to nonparametric estimation of f . As a ﬁrst step, weintroduce the approximation spaces which serve as the candidate functions for estimating f . Then,we deﬁne the estimator and derive corresponding oracle inequalities. To that aim, we assume, ﬁrstly,that the diﬀusivity parameter ϑ is known and then employ a plug-in approach. All proofs are collectedin Section 5.1.We use the notations N := { , , . . . } and N := N ∪ { } as well as R + := [0 , ∞ ). For a, b ∈ R weuse the shorthand a ∧ b := min( a, b ) and a ∨ b := max( a, b ) as well as [ a ] := max { k ∈ N : k ≤ a } for a ∈ R + . For two sequences ( a n ) , ( b n ), we write a n (cid:46) b n to indicate that there exists some c > | a n | ≤ c ·| b n | for all n ∈ N and we write a n (cid:104) b n if a n (cid:46) b n (cid:46) a n . Throughout a n , b n → ∞ is meantin the sense of a n ∧ b n → ∞ for n → ∞ . If a n = a for some a ∈ R and all n ∈ N , we write ( a n ) ≡ a .When we write statements like M, N → ∞ , we implicitly assume that M and N depend on a commonindex n ∈ N such that N n , M n → ∞ for n → ∞ . Convergence in probability and convergence indistribution are denoted by P −→ and D −→ , respectively. The total variation distance of two probabilitymeasures P, Q on some measure space (Ω , F ) is denoted by (cid:107) P − Q (cid:107) TV := sup A ∈F | P ( A ) − Q ( A ) | . Fora sequence ( X n ) of real random variables on a probability space (Ω , F , P ), we use the usual stochasticLandau symbols: X n = o p ( a n ) indicates that X n /a n → X n = O p ( a n )means that for any ε > C > P ( | X n /a n | ≥ C ) ≤ ε for all suﬃciently large n ∈ N . Throughout, we work on a probability space (Ω , F , P ) equipped with a ﬁltration ( F t ) t ≥ satisfyingthe usual conditions. In order to introduce the reaction-diﬀusion equations from (1) thoroughly, letus consider the semilinear SPDE dX t = (cid:0) ϑ ∂ ∂x X t + F ( X t ) (cid:1) dt + σdW t , X = ξ, (2)where W denotes a cylindrical Brownian motion on L ((0 , F is givenby F ( u ) = f ◦ u for some f ∈ C ( R ) . For simplicity, we will also refer to the functional by f , i.e., we write f ( u ) = f ◦ u for functions u : [0 , → R . Throughout, the parameters σ and ϑ are strictly positive constants. As usual, theDirichlet boundary conditions in (2) are implemented into the domain of the diﬀerential operator A ϑ := ϑ ∂ ∂x , namely, we deﬁne D ( A ϑ ) := H ((0 , ∩ H ((0 , H k ((0 , L -Sobolev spaces of order k ∈ N and with H ((0 , C ∞ c ((0 , H ((0 , L ((0 , A ϑ by ( S ( t )) t ≥ .Recall that a cylindrical Brownian motion W is deﬁned as a linear mapping L ((0 , (cid:51) u (cid:55)→ W · ( u )such that t (cid:55)→ W t ( u ) is a one-dimensional standard Brownian motion for all normalized u ∈ L ((0 , W t ( u ) , W s ( v )) = ( s ∧ t ) (cid:104) u, v (cid:105) , for u, v ∈ L ((0 , , s, t ≥ W can thus be understood as the anti-derivative in time of space-time white noise.4ithout further notice, in the sequel we will work under the standing assumption that there is aunique mild solution X t = S ( t ) ξ + σ (cid:90) t S ( t − s ) dW s + (cid:90) t S ( t − s ) f ( X s ) ds, t ≥ , (3)such that X := ( X t ) t ≥ is a Markov process with state space E := C ([0 , { u ∈ C ([0 , u (0) = u (1) = 0 } and such that X ∈ C ( R + , E ) holds almost surely for all ξ ∈ E . It follows from [18, Example 7.8] thatthis assumption is fulﬁlled, e.g., when f satisﬁes f ( λ + η )sgn( λ ) ≤ a ( | η | )(1 + | λ | ) , λ, η ∈ R , (4)for some increasing function a : R + → R + . This is particularly satisﬁed when f is a polynomial ofodd degree with a negative leading coeﬃcient. Note that the existence of continuous trajectories isa crucial requirement for dealing with fully discrete observations. Even in the linear setting f ≡ X into its linear and its nonlinear component turns outto be useful. The decomposition is given by X t = S ( t ) ξ + X t + N t , t ≥ , where X t := σ (cid:90) t S ( t − s ) dW s , N t := (cid:90) t S ( t − s ) f ( X s ) ds, t ≥ , and ( X t ) is the mild solution to the associated linear SPDE ( f ≡ ξ ≡ A ϑ has a complete orthonormal system of eigenvectors in L ((0 , − λ (cid:96) , e (cid:96) ) (cid:96) ≥ are given by e (cid:96) ( y ) = √ π(cid:96)y ) , λ (cid:96) = ϑπ (cid:96) , y ∈ [0 , , (cid:96) ∈ N . Employing this sine base, the cylindrical Brownian motion W can be realized via W t = (cid:80) (cid:96) ≥ β (cid:96) ( t ) e (cid:96) in the sense that W t ( · ) = (cid:80) (cid:96) ≥ β (cid:96) ( t ) (cid:104)· , e k (cid:105) for a sequence of independent standard Brownian motions( β (cid:96) ) (cid:96) ≥ . In particular, the linear component of X admits the representation X t ( x )= (cid:88) (cid:96) ≥ u (cid:96) ( t ) e (cid:96) ( x ) , t ≥ , x ∈ [0 , , (5)where ( u (cid:96) ) (cid:96) ≥ are one dimensional independent processes satisfying the Ornstein-Uhlenbeck dynamics du (cid:96) ( t ) = − λ (cid:96) u (cid:96) ( t ) dt + σ dβ (cid:96) ( t ) or, equivalently, u (cid:96) ( t ) = u (cid:96) (0)e − λ (cid:96) t + σ (cid:90) t e − λ (cid:96) ( t − s ) dβ (cid:96) ( s ) , u (cid:96) (0) = (cid:104) ξ, e (cid:96) (cid:105) in the sense of a usual ﬁnite-dimensional stochastic integral. From representation (5) it is evident that( t, x ) (cid:55)→ X t ( x ) is a two-parameter centered Gaussian ﬁeld with covariance structureCov (cid:0) X s ( x ) , X t ( y ) (cid:1) = σ (cid:88) (cid:96) ≥ e − λ (cid:96) | t − s | − e − λ (cid:96) ( t + s ) λ (cid:96) e (cid:96) ( x ) e (cid:96) ( y ) , s, t ≥ , x, y ∈ [0 , . (6)In our analysis, we will always assume that either ξ = 0 or that ξ follows the stationary distributionon E associated with the Markov process X , provided that it exists. For the linear system f ≡ { β (cid:96) , (cid:104) ξ, e (cid:96) (cid:105) , (cid:96) ∈ N } be independent with (cid:104) ξ, e (cid:96) (cid:105) ∼N (0 , σ / (2 λ (cid:96) )) such that each coeﬃcient process t (cid:55)→ (cid:104) X t , e (cid:96) (cid:105) is stationary with covariance function c ( s, t ) := σ λ (cid:96) e − λ (cid:96) | t − s | , s, t ≥

0. For semilinear equations there are generally no explicit expressionsfor the invariant distribution, though its existence can be guaranteed via abstract criteria in a largevariety of cases, see, e.g., Proposition 2.1 below.As a ﬁnal standing assumption, we require that the nonlinearity f and its derivative are at most ofpolynomial growth, i.e., there exist constants c > d ∈ N such that | f ( x ) | , | f (cid:48) ( x ) | ≤ c (1 + | x | d ) , x ∈ R . (7)This assumption will be essential for our whole analysis as it allows to deduce the higher order regu-larity of ( N t ) from properties of X .The set of basic assumptions just introduced is suﬃcient for generalizing realized-quadratic-variation-based estimators for σ and ϑ to the semilinear framework, as long as the time horizon T remainsbounded. When dealing with the case T → ∞ , on the other hand, we need to impose a stricterassumption, that ensures that the error induced by the nonlinearity remains negligible uniformly intime, namely:(B) The process X from (3) with zero or, in case of existence, stationary initial condition satisﬁessup t ≥ E ( (cid:107) X t (cid:107) p ∞ ) < ∞ for any p ≥ f , our analysis will rely on a concentrationinequality derived via the mixing property of a stationary process. Hence, we will later assume:(M) For the Markov process X from (3), there exists a stationary distribution π on E and the mildsolution with X = ξ ∼ π satisﬁes E ( (cid:107) X t (cid:107) p ∞ ) = E ( (cid:107) X (cid:107) p ∞ ) < ∞ for any p ≥

1. Furthermore, X is exponentially β -mixing, i.e., there exist constants L, γ > β X ( t ) := (cid:90) E (cid:107) P t ( u, · ) − π ( · ) (cid:107) TV π ( du ) ≤ L e − γt , t ≥ , (8)where ( P t ) t ≥ is the transition semigroup on E associated with the Markov process X .Suﬃcient conditions for Assumptions (B) and (M) to be satisﬁed are given in the following Propositionwhich is a slight extension of results derived in Goldys and Maslowski [23]. Proposition 2.1.

If there are constants a, b, c, β ≥ such that sgn( x ) f ( x + y ) ≤ − a | x | + b | y | β + c (9) holds for all x, y ∈ R , then Assumptions ( B ) and ( M ) are satisﬁed. A proof for the above proposition is given in Section 5.4. Condition (9) requires that − f has atleast linear growth at inﬁnity and is, not surprisingly, stronger than the general existence condition(4). Still, it covers a large class of systems, including the case where f is a polynomial of odd degreewith a negative leading coeﬃcient.Finally, for the nonparametric estimation of f on a compact set A ⊂ R , we will need that the L ( A )-norm is comparable to the emperical norm induced by the process X . This can be achieved byrequiring the following equivalence condition.(E) For the Markov process X from (3) there exists a stationary distribution π on E and, if ξ ∼ π ,the random variables ξ ( x ) admit a Lebesgue density µ x for each x ∈ (0 , A ⊂ R , there are constants c , c > b ∈ (0 , ) such that µ x ( z ) ≤ c for all z ∈ A, x ∈ (0 , ,µ x ( z ) ≥ c for all z ∈ A, x ∈ ( b, − b ) . b in the above lower bound is due to the degeneracy induced by theDirichlet boundary conditions. Speciﬁcally, Assumption (E) will be used in order to conclude theexistence of constants c, C > c (cid:107) t (cid:107) L ( A ) ≤ E (cid:16) M M − (cid:88) k =1 t (cid:0) X (cid:0) kM (cid:1)(cid:1)(cid:17) ≤ C (cid:107) t (cid:107) L ( A ) holds for all functions t ∈ L ( R ) with support in the compact set A . Assumption (E) is clearly satisﬁedin the case where f is a linear function f ( x ) = ϑ x for some ϑ <

0. Indeed, the correspondingstationary distribution matches the stationary distribution for the case f ≡ λ (cid:96) replaced by˜ λ (cid:96) := λ (cid:96) − ϑ . Thus, the random variables X ( x ) , x ∈ (0 , , are Gaussian and (E) can be checkedby examining their variances. Concerning more general nonlinearities f , there is a large amountof literature concerned with the existence and regularity of Lebesgue densities corresponding to themarginal distributions associated with various SPDE models, see, e.g., [4, 36, 38, 39]. However, to theauthor’s best knowledge, there are so far no estimates on the densities of the random variables X t ( x )that hold uniformly in x ∈ X for some inﬁnite set X ⊂ (0 , f toensure (E) goes beyond the scope of this work and is postponed to further research. Throughout, we assume that the following set of observations derived from a single sample path ofthe process X from (3) is available: we suppose to have ( M + 1)( N + 1) time- and space-discreteobservations { X t i ( y k ) , i = 0 , . . . , N, k = 0 , . . . , M } on a regular grid { ( t i , y k ) } i,k ⊂ [0 , T ] × [0 ,

1] with a time horizon

T >

M, N ∈ N . Moreprecisely, we assume that y k = b + kδ and t i = i ∆ where δ = 1 − bM , ∆ = TN for some ﬁxed b ∈ [0 , / T > T → ∞ .Due to the boundedness of the space domain, we have high frequency observations in space whenever M → ∞ . High frequency observations in time are present when T /N →

0. This is trivially satisﬁedif T is ﬁxed and N → ∞ .Note that the spatial locations y k are equidistant inside a (possibly proper) sub-interval [ b, − b ] ⊂ [0 , b >

0. This is done to prevent undesired boundary eﬀects, which lead to biased estimates.

In this section, we discuss the H¨older regularity of the process ( X t ( y ) , t ≥ , y ∈ [0 , N t ( y ) , t ≥ , y ∈ [0 , α >

0, we consider the H¨older spaces C α := C α ([0 , u ∈ C [ α ] suchthat (cid:107) u (cid:107) C α := [ α ] (cid:88) k =0 (cid:107) u ( k ) (cid:107) ∞ + sup x,y ∈ [0 , | u ( x ) − u ( y ) || x − y | α − [ α ] < ∞ . The H¨older continuous functions with Dirichlet boundary conditions are denoted by C α := { u ∈ C α , u (0) = u (1) = 0 } . X t ( x ) , x ∈ [0 , , t ≥

0) of X is a Gaussian process and E (( X t ( ξ ) − X t ( η )) ) = (cid:88) (cid:96) ≥ σ λ (cid:96) (1 − e − λ (cid:96) t )( e (cid:96) ( ξ ) − e (cid:96) ( η )) ≤ (cid:88) (cid:96) ≥ σ λ (cid:96) ( e (cid:96) ( ξ ) − e (cid:96) ( η )) (cid:104) | ξ − η | , (10) E (( X t ( x ) − X s ( x )) ) = (cid:88) (cid:96) ≥ σ λ (cid:96) (1 − e − λ (cid:96) | t − s | ) (cid:18) − − e − λ (cid:96) | t − s | − λ (cid:96) s (cid:19) e (cid:96) ( x ) ≤ (cid:88) (cid:96) ≥ σ λ (cid:96) (1 − e − λ (cid:96) | t − s | ) (cid:104) (cid:112) | t − s | . (11)From this observation it follows that, almost surely, X t ∈ E for any t ≥ x (cid:55)→ X t ( x ) is 2 γ -H¨oldercontinuous and t (cid:55)→ X t ( x ) is locally γ -H¨older continuous for any γ < /

4. The following propositiongeneralizes this fact to the semilinear setting and shows that, under Assumption (B), the correspondingH¨older norms are L p ( P )-bounded as functions of time. Proposition 2.2.

For any p ∈ [1 , ∞ ) the following hold.(i) For any γ < / , we have X ∈ C ( R + , C γ ) a.s. and, if Assumption (B) is satisﬁed, then sup t ≥ E ( (cid:107) X t (cid:107) pC γ ) < ∞ .(ii) For any γ < / and T > , we have ( X t ) ≤ t ≤ T ∈ C γ ([0 , T ] , E ) a.s. and, if Assumption (B)is satisﬁed, then there exists a constant C > such that E ( (cid:107) X t − X s (cid:107) p ∞ ) ≤ C | t − s | γp for all s, t ≥ .Furthermore, the same results hold for X replaced by f ( X ) where f ( x ) := f ( x ) − f (0) . We remark that a norm bound as in ( i ) with p = 1 is also derived in Cerrai [10, Proposition 4.2]. Toprove the above proposition, we analyze the linear and the nonlinear component of X separately. Theregularity of ( X t ) can be assessed by using properties (10) and (11) together with general techniquesfor the study of continuity properties for linear equations, especially the Garsia-Rodemich-Rumseyinequality, see Lemma 5.1. The regularity of ( N t ) is a consequence of the regularizing property ofthe semigroup ( S ( t )) t ≥ in view of the fact that, due to our basic assumptions, the process f ( X ) iscontinuous as a function of time and space.Having derived the H¨older regularity of the process X and, in particular, of f ( X ), we can use theregularizing impact of ( S ( t )) t ≥ once more to show that the regularity of N t = (cid:82) t S ( t − s ) f ( X s ) ds, t ≥ , exceeds the regularity of X . A related strategy has been pursued by Pasemann and Stannat [41] whostudied the higher order regularity of the nonlinear component of X in the spaces D (( − A ϑ ) ε ) , ε > N t = N t + M t where N t := (cid:90) t S ( t − s ) f ( X s ) ds, M t := (cid:90) t S ( r ) m dr (12)for m ≡ f (0) and f ( x ) = f ( x ) − f (0). Note that f maps E and, in particular, C α into itself. Proposition 2.3.

For any

T > , p ≥ and γ < / the following hold.(i) For any t ≥ , we have N t ∈ C γ and sup t ≤ T (cid:107) A ϑ N t (cid:107) C γ < ∞ almost surely. In particular,if Assumption (B) is satisﬁed, then sup t ≥ E ( (cid:107) A ϑ N t (cid:107) pC γ ) < ∞ . ii) We have ( N t ) t ≤ T ∈ C γ ([0 , T ] , E ) and ddt N t = f ( X t ) + A ϑ N t in E almost surely. In par-ticular, under Assumption (B), there exists C > such that E ( (cid:107) ddt ( N t − N s ) (cid:107) p ∞ ) ≤ C ( t − s ) γp holds for all s, t ≥ .Furthermore, the same results hold for ( N t ) and f instead of ( N t ) and f , provided that we replace E by C ([ b, − b ]) and C γ by C γ ([ b, − b ]) for some b ∈ (0 , ) . Using our precise analysis of the H¨older regularity of the linear and the nonlinear component of thesolution process from Section 2.2, we are now able to carry the central limit theorems for space anddouble increments from Hildebrandt and Trabs [26] as well as the central limit theorem for timeincrements, derived in Bibinger and Trabs [7], over to the semilinear framework. As a consequence,resulting method of moments estimators for the volatility σ and the diﬀusivity ϑ apply in the semilin-ear framework and, under quite general assumptions, their asymptotic properties remain unchanged.As before, X denotes the mild solution of equation (2) with zero or stationary initial condition and weconsider the observation scheme deﬁned in Section 2.1. The constant b deﬁning the minimal distanceof spatial observations to the boundary of [0 ,

1] is assumed to be strictly positive so that Proposition2.3 provides the regularity of the process ( N t ( x ) , x ∈ [ b, − b ] , t ≥

0) in space and time.First, let us consider the realized quadratic variation based on time increments, i.e.,¯ V t := 1 M N √ ∆ N − (cid:88) i =0 M − (cid:88) k =0 ( X t i +1 ( y k ) − X t i ( y k )) . In case f ≡

0, we denote the above expression by V t . An immediate extension of Bibinger and Trabs[7, Theorem 3.4] to the case T → ∞ shows that the central limit theorem √ M N (cid:18) V t − σ √ πϑ (cid:19) D −→ N (cid:18) , Bσ πϑ (cid:19) , N, M → ∞ , (13)with B := 2 + (cid:80) ∞ J =1 (cid:16) √ J − √ J + 1 − √ J − (cid:17) holds under the assumptions M = o (∆ − ρ ) for some ρ < / T M = o (∆ − ) . (14)Note that the condition T M = o (∆ − ) is already covered by M = o (∆ − ρ ) when T is ﬁxed. In the case T → ∞ , this condition becomes necessary in order to neglect the bias. A central limit theorem fortime increments in the case T → ∞ is also proved by Kaino and Uchida [28]. We obtain the followingextension of (13) to the semilinear framework. Theorem 3.1.

Grant assumption (14) . If T is ﬁxed and ﬁnite, the central limit theorem (13) remainsvalid for ¯ V t . If T → ∞ , it remains valid if Assumption (B) is satisﬁed and there exists ρ < / suchthat T M = o (∆ − ρ ) . While the result for a ﬁxed time horizon carries over from the linear setting without any extraassumptions on the interplay between

M, N and T , the additional assumption for the case T → ∞ isconsiderably stricter. In the proof of the above theorem, we show that R t := ¯ V t − V t = o p (1 / √ M N ),which proves the result in view of Slutsky’s Lemma. In fact, it follows from the temporal regularityproperties of the processes ( X t ) and ( N t ), that R t is of the order O p (∆ α ) for any α < /

4. Hence, thereason for the additional assumption in the case T → ∞ is that √ M N ∆ α = √ M T ∆ α − is requiredto tend to 0. 9ext, we consider the realized quadratic variation based on space increments, i.e.,¯ V sp := 1 M N δ N − (cid:88) i =0 M − (cid:88) k =0 ( X t i ( y k +1 ) − X t i ( y k )) . Since the terms indexed by i = 0 do not contribute to the sum if X = 0, in this case, we sum over i ∈ { , . . . , N } instead of { , . . . , N − } . Recall from Hildebrandt and Trabs [26, Theorem 3.3] that,denoting ¯ V sp in the case f ≡ V sp , the central limit theorem √ M N (cid:18) V sp − σ ϑ (cid:19) D −→ N (cid:18) , σ ϑ (cid:19) , M, N → ∞ , (15)holds under the condition N = o ( M ) . (16) Theorem 3.2.

Grant assumption (16) . If T is ﬁxed and ﬁnite, the central limit theorem (15) remainsvalid for ¯ V sp . If T → ∞ , it remains valid under Assumption (B). Although our proof strategy for the above theorem is the same as for time increments, here, theresult carries over from the linear setting with no extra conditions on

M, N and T , at all. Indeed, byusing a summation by parts formula to rewrite R sp := ¯ V sp − V sp , we can proﬁt from the fact that thesecond order spatial increments of ( N t ), namely N t i ( y k +1 ) − N t i ( y k ) + N t i ( y k − ), are of the order O p ( δ ), thanks to the spatial regularity of the process ( N t ).Finally, we consider the realized quadratic variation based on double increments, i.e.,¯ V := 1 M N Φ ϑ ( δ, ∆) N − (cid:88) i =0 M − (cid:88) k =0 ¯ D ik with ¯ D ik := X t i +1 ( y k +1 ) − X t i ( y k +1 ) − X t i +1 ( y k ) + X t i ( y k ) and the renormalizationΦ ϑ ( δ, ∆) := 2 (cid:88) (cid:96) ≥ − e − π ϑ(cid:96) ∆ π ϑ(cid:96) (cid:0) − cos( π(cid:96)δ ) (cid:1) (cid:104) δ ∧ √ ∆ . As discussed in [26] for the linear case, if a so called balanced sampling design is present, i.e. δ/ √ ∆ ≡ r for some r >

0, we can also consider¯ V r := 1 M N √ ∆ N − (cid:88) i =0 M − (cid:88) k =0 ¯ D ik . In the case X = 0, ¯ V and ¯ V r are redeﬁned just like ¯ V sp . Recall from [26, Theorem 3.7] that, denoting¯ V and ¯ V r in the case f ≡ V and V r , respectively, the centrallimit theorem √ M N ( V − σ ) D −→ N (cid:0) , C (cid:0) r/ √ ϑ (cid:1) σ (cid:1) , N, M → ∞ , (17)for some bounded and strictly positive continuous function C ( · ) on [0 , ∞ ] holds under the conditions δ/ √ ∆ → r ∈ { , ∞} or δ/ √ ∆ ≡ r > → T = o ( M ) . (19)10urthermore, in case of a balanced sampling design δ/ √ ∆ ≡ r >

0, [26, Corollary 3.8] states that wehave √ M N (cid:16) V r − ψ ϑ ( r ) σ (cid:17) D −→ N (cid:16) , C (cid:0) r/ √ ϑ (cid:1) ψ ϑ ( r ) σ (cid:17) , N, M → ∞ , (20)where ψ ϑ ( r ) := 2 √ πϑ (cid:16) − e − r ϑ + r √ ϑ (cid:90) ∞ r √ ϑ e − z dz (cid:17) . (21) Theorem 3.3.

Grant assumptions (18) and (19) . If T is ﬁxed and ﬁnite, the central limit theorems (17) and (20) remain valid for ¯ V and ¯ V r , respectively. If T → ∞ , they remain valid if Assumption(B) is satisﬁed and there exists a ∈ (0 , such that T = o ( M a ) . As for space increments, there are essentially no additional assumptions compared to the linearsetting. The inﬂuence induced by the nonlinearity is negligible, in particular, since the double incre-ments computed from the process ( N t ) decay in both ∆ and δ at the same time, as opposed to thedouble increments computed from ( X t ) which are roughly of the order ( δ ∧√ ∆) / , see also Lemma 5.3.It is straightforward to derive asymptotically normal method of moments estimators for σ or ϑ based on the above central limit theorems when one of the parameters is known, as discussed in, e.g.,[7, 11, 14, 26]. Joint estimation of the parameters ( σ , ϑ ) remains possible in the semilinear frameworkas well by exploiting the central limit theorems for ¯ V r . To that aim, one needs to revert to subsetsof the data having a balanced sampling design ˜ δ/ (cid:112) ˜∆ ≡ r for two diﬀerent values of r . Let us brieﬂyrecall the estimation procedure from [26]:Choose v, w ∈ N such that v (cid:104) max(1 , δ / ∆) and w (cid:104) max(1 , √ ∆ /δ ). Then, ˜∆ := v ∆ and ˜ δ := wδ satisfy r := ˜ δ/ (cid:112) ˜∆ (cid:104) . Using double increments on the coarser grid, namely¯ D v,w ( i, k ) := X t i + v ( y k + w ) − X t i ( y k + w ) − X t i + v ( y k ) + X t i ( y k ) , we set V ν := 1( M − w + 1)( N − νv + 1) √ νv ∆ M − w (cid:88) k =0 N − νv (cid:88) i =0 ¯ D νv,w ( i, k ) , ν = 1 , . In the case X = 0 we employ the obvious redeﬁnition of V ν . The ﬁnal estimator for ( σ , ϑ ) is thendeﬁned as (ˆ σ , ˆ ϑ ) := arg min (˜ σ , ˜ ϑ ) ∈ H (cid:16)(cid:0) V − σ ψ ˜ ϑ ( r ) (cid:1) + (cid:0) V − σ ψ ˜ ϑ (cid:0) r √ (cid:1)(cid:1) (cid:17) (22)for some compact set H ⊂ (0 , ∞ ) . In fact, denoting by G r the inverse function of ϑ (cid:55)→ ψ ϑ ( r ) /ψ ϑ ( r/ √ ϑ = G r ( V /V ) , ˆ σ = V /ψ ˆ ϑ ( r ) , provided that V /V lies in the range of ϑ (cid:55)→ ψ ϑ ( r ) /ψ ϑ ( r/ √ V , V ), thelatter is true with probability tending to one. In combination with the analysis in [26] and Theorem3.3, one immediately obtains the following result. Theorem 3.4.

Assume T max( √ ∆ , δ ) → and let H be a compact subset of (0 , ∞ ) × R such that ( σ , ϑ ) lies in its interior. If there exist values v (cid:104) max(1 , δ / ∆) and w (cid:104) max(1 , √ ∆ /δ ) such that wδ/ √ v ∆ is constant, then we have (ˆ σ − σ ) + ( ˆ ϑ − ϑ ) = O p (cid:16) δ ∨ ∆ / T (cid:17) for T, N, M → ∞ and ∆ → . This convergence rate is optimal up to a logarithmic factor. emark . The inverse of the squared rate,

T / ( δ ∨ ∆ / ), is exactly the order of magnitude of thesize of balanced sub-samples of the data. Further, the rate optimality of the estimator can be deducedjust like in [26], where the case f ≡ T is treated: Allowing T → ∞ , thequantity 1 / √ r δ, ∆ ,T with r δ, ∆ ,T :=  T ∆ / , √ ∆ δ (cid:38) ,Tδ · log δ √ ∆ , √ ∆ δ → σ , ϑ ). The above logarithmic factor is probably due totechnical issues. Comparison with the upper bound from Theorem 3.4 shows that the estimator (22)is (almost) rate optimal. The following section discusses nonparametric estimation of f . We adapt an estimation procedure,introduced by Comte et al. [15] in the context of one-dimensional diﬀusions, to the SPDE setting.Note that the parameters ( σ , ϑ ) can be estimated well using the methods analyzed in the previoussection. Thus, in a ﬁrst step, we will assume that these parameters (in fact, only ϑ is necessary) areknown. Later, a plug-in approach will be considered.In contrast to the previous section, we will strictly require that the mild solution X , deﬁned by(3), admits a stationary distribution, denoted by π , and, moreover, that the mixing assumption (M)is satisﬁed. Furthermore, it will be essential for the derivation of our oracle inequalities that we have T → ∞ . From now on, let A ⊂ R be a ﬁxed compact set on which we want to estimate f .Before treating the actual estimation problem, we introduce the approximation spaces serving ascandidate functions for the estimation of f in the following section. In order to estimate f on the set A , we consider a sequence ( S m ) m ∈ N of ﬁnite-dimensional sub-spacesof L ( A ) such that D m := dim( S m ) → ∞ for m → ∞ . For a well chosen m , we will estimate f by the function ˆ f m ∈ S m that minimizes the empirical loss to be deﬁned later. Like in [5], our keyassumption on the approximation spaces ( S m ) is the following.(N) There is a constant C > m ∈ N there is an orthonormal basis ( ϕ λ ) λ ∈ Λ m of S m , | Λ m | = D m , with (cid:13)(cid:13)(cid:13) (cid:88) λ ∈ Λ m ϕ λ (cid:13)(cid:13)(cid:13) ∞ ≤ CD m . It is shown in Birg´e and Massart [8] that Assumption (N) is equivalent to requiring (cid:107) t (cid:107) ∞ ≤ CD m (cid:107) t (cid:107) L ( A ) for all t ∈ S m and m ∈ N .Additionally, it will be necessary for our analysis to require the minimal continuity of the approxi-mation spaces given by the following assumption.(H) For any g ∈ (cid:83) m ∈ N S m , let ¯ g : R → R be the extension of g by zero on the set A c . Then, thefunction ¯ g is piecewise H¨older continuous, i.e., there are constants α > −∞ = a < a <. . . < a L = ∞ , L ∈ N , such that ¯ g | ( a l ,a l +1 ) ∈ C α (( a l , a l +1 )) for any 1 ≤ l ≤ L − A isa closed interval and take A = [ − a, a ] for some a > xample 4.1. [T] The trigonometric spaces S m = span (cid:16)(cid:110) √ a , √ a sin (cid:0) kπa · (cid:1) , √ a cos (cid:0) kπa · (cid:1) , ≤ k ≤ m (cid:111)(cid:17) have dimension D m = 2 m + 1 and property (N) follows directly from the fact that the trigono-metric base functions are uniformly bounded.[P] Piecewise polynomials on a dyadic grid: Most conveniently, these spaces are parameterized interms of a pair m = ( p, r ) with p ∈ N , r ∈ { , . . . , r max } and r max ∈ N being some ﬁxedvalue. Let ( p l ) l ∈ N be the complete orthonormal system in L ([0 , p l is the rescaledLegendre polynomial of degree l for l ∈ N . Further, for p ∈ N and j ∈ {− p , . . . , p − } , let I pj := [ ja − p , ( j + 1) a − p ). Then, for m = ( p, r ), we deﬁne S ( p,r ) := span (cid:16) { ϕ pj,l , l ≤ r, − p ≤ j ≤ p − } (cid:17) with ϕ pj,l ( x ) := (cid:114) p a p l (cid:16) p xa − j (cid:17) I pj ( x ) , x ∈ A. Clearly, dim( S p,r ) = ( r + 1)2 p +1 ≤ ( r max + 1)2 p +1 and property (N) holds with a constant C depending on r max .[W] The dyadic wavelet generated spaces: For arbitrary r ∈ N , there are functions φ, ψ ∈ C r ( R )with support in [0 ,

1] such that (cid:110) √ a φ (cid:16) · a (cid:17) , √ a φ (cid:16) · a + 1 (cid:17) , (cid:114) p a ψ (cid:16) p · a − j (cid:17) , − p ≤ j < p , p ∈ N (cid:111) is a complete orthonormal system in L ( A ), see, e.g., [19]. Then, for the subspace S m = span (cid:16)(cid:110) √ a φ (cid:16) · a (cid:17) , √ a φ (cid:16) · a + 1 (cid:17) , (cid:114) p a ψ (cid:16) p · a − j (cid:17) , − p ≤ j < p , p ≤ m (cid:111)(cid:17) we have dim( S m ) = 2 m +2 by the formula for the ﬁnite geometric sum and property (N) isfulﬁlled.The following deﬁnition due to Baraud et al. [5] proves to be useful in analyzing the nonparametricestimator. We ﬁx an orthonormal basis ( ϕ λ , λ ∈ Λ m ) of S m according to Assumption (N) and deﬁnethe matrices V m , B m ∈ R Λ m × Λ m by V mλ,λ (cid:48) := (cid:107) ϕ λ ϕ λ (cid:48) (cid:107) L ( A ) , B mλ,λ (cid:48) := (cid:107) ϕ λ ϕ λ (cid:48) (cid:107) ∞ . These expressions are convenient in order to express certain estimates, e.g., | ϕ λ ( Z ) ϕ λ (cid:48) ( Z ) | ≤ B mλ,λ (cid:48) and E ( | ϕ λ ( Z ) ϕ λ (cid:48) ( Z ) | ) (cid:46) ( V mλ,λ (cid:48) ) when Z is an A -valued random variable with a bounded Lebesguedensity. Further, let L m := max( ρ ( V m ) , ρ ( B m )) , ρ ( H ) := sup a ∈ R Λ m , (cid:107) a (cid:107)≤ (cid:88) λ,λ (cid:48) | a λ a λ (cid:48) H λ,λ (cid:48) | , H ∈ { V m , B m } . (23)For our main oracle inequalities, we will have to require that L m is asymptotically negligible withrespect to the time horizon T . For the previous examples of approximation spaces, the quantity L m can be linked directly to the dimension D m . In fact, it is shown in [5] that L m (cid:46) D m for [T] and L m (cid:46) D m for [P] and [W]. 13 .2 Construction and analysis of the estimator For a moment, let us consider observations that are discrete in time but continuous in space, i.e., thedata is given by { X t i ( x ) , x ∈ [0 , , i = 0 , . . . , N } . From (3), it is evident that we can decompose X t +∆ = S (∆) X t + σ (cid:90) t +∆ t S ( t + ∆ − s ) dW s + (cid:90) t +∆ t S ( t + ∆ − s ) f ( X s ) ds. By rearranging, we can pass to X t +∆ − S (∆) X t ∆ = f ( X t ) + σ ∆ (cid:90) t +∆ t S ( t + ∆ − s ) dW s + 1∆ (cid:90) t +∆ t (cid:16) S ( t + ∆ − s ) f ( X s ) − f ( X t ) (cid:17) ds, yielding the regression model Y cont i = f ( X t i ) + R cont i + ε cont i , ≤ i ≤ N − , (24)with Y cont i := X t i +1 − S (∆) X t i ∆ ,ε cont i := σ ∆ (cid:90) t i +1 t i S ( t i +1 − s ) dW s ,R cont i := 1∆ (cid:90) t i +1 t i (cid:16) S ( t i +1 − s ) f ( X s ) − f ( X t i ) (cid:17) ds. The main term in the regression model is given by f ( X t i ), ε cont i is the stochastic noise term and R cont i is a negligible bias. Note that the stochastic noise term is stochastically independent of the covariate X t i . The corresponding least squares estimator is deﬁned byˆ f cont m := arg min g ∈ S m γ N ( g ) , γ N ( g ) := 1 N N − (cid:88) i =0 (cid:107) Y cont i − g ( X t i ) (cid:107) L with L := L ((0 , ϑ through the semigroup S ( · )appearing in the response variables Y cont i which, for now, is assumed to be known.Now, let us return to the fully discrete observation scheme described in Section 2.1. In order toderive a discretized version of the estimator discussed in the previous section, we assume that thediscrete observations are recorded at the locations t i = i ∆ and y k = kM for 0 ≤ i ≤ N and 0 ≤ k ≤ M , i.e., the parameter b specifying the margin of the spatial observationwindow is set to b = 0. With observations distributed throughout the whole space domain (0 , x k ( t ) := (cid:104) X t , e k (cid:105) L by their empiricalcounterpart (cid:104) X t , e k (cid:105) M := M (cid:80) M − l =1 X t ( y l ) e k ( y l ). Recall that there is a discrete version of the oth-onormality property for the sine base, see e.g. [26, Section 5] or the proof of [42, Theorem 2.1]. Inparticular, for k ≤ M −

1, we have the relation (cid:104) X t , e k (cid:105) M = 1 M M − (cid:88) l =1 X t ( y l ) e k ( y l ) = (cid:88) (cid:96) ∈I + k x (cid:96) ( t ) − (cid:88) (cid:96) ∈I − k x (cid:96) ( t )14here I + k := k + 2 M · N and I − k := 2 M − k + 2 M · N . In order to approximate the expression S (∆) X t i appearing in the deﬁnition of ˆ f cont m , we deﬁne ˆ S (∆) := ˆ S M (∆) byˆ S (∆) X t := M − (cid:88) (cid:96) =1 e − λ (cid:96) ∆ (cid:104) X t , e (cid:96) (cid:105) M e (cid:96) which only hinges on X t through the discrete data ( X t ( y k ) , k = 1 , . . . , M − Y i = ˆ S (0) f ( X t i ) + R i + ε i . (25)with Y i := ˆ S (0) X t i +1 − ˆ S (∆) X t i ∆ ,R i := R cont i + f ( X t i ) − ˆ S (0) f ( X t i ) + S (∆) X t i − ˆ S (∆) X t i ∆ + ˆ S (0) X t i +1 − X t i +1 ∆ ,ε i := ε cont i . We introduce the corresponding least squares estimatorˆ f m := arg min g ∈ S m N N − (cid:88) i =0 (cid:13)(cid:13) Y i − ˆ S (0) g ( X t i ) (cid:13)(cid:13) L = arg min g ∈ S m N N − (cid:88) i =0 M − (cid:88) k =1 (cid:18) (cid:104) X t i +1 , e k (cid:105) M − e − λ k ∆ (cid:104) X t i , e k (cid:105) M ∆ − (cid:104) g ( X t i ) , e k (cid:105) M (cid:19) (26)which is purely based on the fully discrete observations.Under Assumption (H), it is possible to derive a convenient and intuitive representation for ourestimator ˆ f m based on the following lemma. Lemma 4.2.

Let H : [0 , → R be H¨older continuous in an environment of y k for each ≤ k ≤ M − and set h k := (cid:104) H, e k (cid:105) L . Then, the series H k := (cid:80) l ∈I + k h l − (cid:80) l ∈I − k h l converges and we have (cid:104) H, e k (cid:105) M = H k as well as M M (cid:88) k =1 H ( y k ) = (cid:107) H M (cid:107) L = M − (cid:88) l =1 H l with H M := ˆ S (0) H = (cid:80) M − l =1 H l e l . The H¨older condition in the above lemma can be relaxed to requiring convergence of the Fourierseries of H at y k to H ( y k ) for each 1 ≤ k ≤ M − X t i ( y k ) hit a discontinuity of the extension¯ g of some g ∈ (cid:83) m ∈ N S m with probability zero and, hence, the above lemma is applicable with H := X ti +1 − ˆ S (∆) X ti ∆ − g ( X t i ). In particular, the estimator ˆ f m can, almost surely, be expressed viaˆ f m = arg min g ∈ S m γ N,M ( g ) , γ N,M ( g ) := 1 N M N − (cid:88) i =0 M − (cid:88) k =1 (cid:16) X t i +1 ( y k ) − S ∆ t i ( y k )∆ − g ( X t i ( y k )) (cid:17) (27)where S ∆ t i := ˆ S (∆) X t i . 15he natural empirical norm associated with the discrete observations scheme is given by (cid:107) g (cid:107) N,M := 1

N M N − (cid:88) i =0 M − (cid:88) k =1 g ( X t i ( y k ))and, in the sequel, we derive a bound on E ( (cid:107) ˆ f m − f A (cid:107) N,M ) with f A := f A . As before, π denotes thestationary distribution for X and, for nonrandom g ∈ L ( A ), let (cid:107) g (cid:107) π,M := 1 M M − (cid:88) k =1 E ( g ( X ( y k ))) . Recall that, under Assumption (E), there are constants c, C > c (cid:107) g (cid:107) L ( A ) ≤ (cid:107) g (cid:107) π,M ≤ C (cid:107) g (cid:107) L ( A ) (28)holds for all g ∈ L ( A ). The oracle choice for an estimate of f A from the space S m is given by f m := arg min g ∈ S m (cid:107) f − g (cid:107) L ( A ) . (29) Theorem 4.3.

Grant Assumptions (M), (E), (N) and (H). Further, assume that M ∆ → ∞ as wellas N ∆log N → ∞ , L m = o ( N ∆log N ) and D m ≤ N . Then, for any γ < / , we have E (cid:0) (cid:107) ˆ f m − f A (cid:107) N,M (cid:1) (cid:46) (cid:107) f − f m (cid:107) L ( A ) + D m T + ∆ γ + 1 M ∆ . Remark . Under the same assumptions as in the above theorem, we can obtain the oracle inequality1 N N − (cid:88) i =0 E (cid:0) (cid:107) ˆ f cont m ( X t i ) − f A ( X t i ) (cid:107) L (cid:1) (cid:46) (cid:107) f − f m (cid:107) L ( A ) + D m T + ∆ γ for the estimator ˆ f cont m from (26) based on space-continuous observations. In fact, this result can beobtained without the continuity Assumption (H).We encounter the usual bias-variance trade-oﬀ in nonparametric statistics: When we choose m toosmall the estimator is not suﬃciently versatile, leading to a large bias term (cid:107) f − f m (cid:107) L ( A ) . On the otherhand, when choosing m too large, the estimated function will closely follow the concrete realizationof the data, leading to a large variance term D m /T . Now, assuming that (cid:107) f m − f (cid:107) L ( A ) (cid:104) D − αm forsome α >

0, balancing the bias and the variance term shows that it is optimal to choose D m (cid:104) T α .Under the additional assumption that T (cid:16) ∆ γ + 1 M ∆ (cid:17) → γ < /

2, the last two terms on the right hand side of the oracle inequality in Theorem4.3 can be regarded as negligible remainders and we obtain the usual (squared) nonparametric rate E (cid:0) (cid:107) ˆ f m − f A (cid:107) N,M (cid:1) (cid:46) T − α α +1 . In fact, some caution is necessary in order to prevent a contradiction between D m (cid:104) T α and thecondition L m = o (cid:0) T log N (cid:1) in the theorem, as already pointed out in [15]. When working with [P]or [W], we have L m (cid:104) D m and the conditions can be met at the same time. When working withthe trigonometric spaces [T], we have L m (cid:104) D m and, thus, it is only possible to take D m (cid:104) T α ,provided that α > /

2. For a function f meeting our fundamental requirement f ∈ C ( R ), the k -th16ourier coeﬃcients are generally of the order 1 /k , a faster decay is only present in the exceptionalcase where the function f is periodic on A . Thus, we have (cid:107) f m − f (cid:107) L ( A ) (cid:104) D − αm with α = 1 / T − ˜ α α +1 for any ˜ α < α . In general, the true value ofthe regularity parameter α is unknow, as it is a property of the unknown function f . This issue willbe addressed via an adaptive procedure that chooses the approximation space S m in a data drivenway, see Theorem 4.7.In contrast to the term ∆ γ , γ < /

2, on the right hand side of the oracle inequality, Comte et al.[15] obtain the smaller bound ∆ in their corresponding result in the context of SODEs. In fact, thiserror term arises from bounding the bias terms R cont i in the underlying regression model. The dif-ference in the order of magnitude is due to the fact that for the SPDE model there only is temporalH¨older regularity up to exponent 1 /

4, as opposed to exponent 1 / R i − R cont i .Naturally, it is not present in the result from [15]. In the proof, we show that, for h ≥

0, we have E ( (cid:107) S ( h ) X t − ˆ S ( h ) X t (cid:107) L ) = O (1 /M ). This error is caused by approximating the coeﬃcient processesof X by their empirical counterparts, in order to get an approximation of the semigroup. Due to theroughness of the paths x (cid:55)→ X t ( x ), the corresponding approximation quality is rather poor. A similareﬀect can be observed in Kaino and Uchida [29] where a spectral approximation is used for parametricestimation for the linear equation. The approximation error of the order O (1 /M ) gets further ampli-ﬁed by dividing by the squared renormalization ∆ , leading to the condition M ∆ → ∞ . Under thiscondition, the observation frequency in space is much larger than in time which, in particular, rulesout a balanced sampling design. The error term E ( (cid:107) f ( X t ) − ˆ S (0) f ( X t ) (cid:107) L ) induced by R i − R cont i turns out to be negligible with respect to ∆ γ under the condition M ∆ → ∞ .Figure 1 shows ten exemplary realizations of the estimator ˆ f m with the trigonometric basis [T]when f is the linear function f ( x ) := ϑ x for some ϑ < f is estimated is A = [ − , f iscaptured accurately inside some interval containing the origin, roughly [ − . , . S m are necessarily periodic over [ − , g ∈ (cid:83) m ∈ N S m , we can use Lemma 4.2 and the regression model (25) towrite γ N,M ( g ) − γ N,M ( f ) = (cid:107) g − f (cid:107) N,M + 2 N N − (cid:88) i =0 (cid:104) Y i − ˆ S (0) f ( X t i ) , ˆ S (0) f ( X t i ) − ˆ S (0) g ( X t i ) (cid:105) L = (cid:107) g − f (cid:107) N,M + 2 N N − (cid:88) i =0 (cid:104) ε i + R i , ˆ S (0) f ( X t i ) − ˆ S (0) g ( X t i ) (cid:105) L . By deﬁnition of ˆ f m , we have γ N,M ( ˆ f m ) − γ N,M ( f ) ≤ γ N,M ( f m ) − γ N,M ( f ) and using the aboveexpansion on both sides of this inequality yields (cid:107) ˆ f m − f (cid:107) N,M ≤ (cid:107) f m − f (cid:107) N,M + 2 N N − (cid:88) i =0 (cid:104) ε i + R i , ˆ S (0) ˆ f m ( X t i ) − ˆ S (0) f m ( X t i ) (cid:105) L . Since both ˆ f m and f m are A -supported, if we insert f = f A + f A c in the above equation, thenthe terms (cid:107) f A c (cid:107) N,M on both sides of the inequality cancel. We arrive at the fundamental oracle17igure 1: Ten realizations of the estimator ˆ f m from (27) with the trigonometric basis [T] on A = [ − , f ( x ) = − . · x (black). The barplotshows a histogram of the corresponding discrete observations { X t i ( y k ) } i,k . The sample sizesare given via M = 500 , T = 200 , ∆ = 0 .

05. The dimension of the approximation space waschosen to be D m = 2 m + 1 = 29, which corresponds to D m (cid:104) √ T . The discrete observationsof X are obtained by means of the replacement method with parameter L = 1, see [25]. Theparameter values are σ = ϑ = 0 . (cid:107) ˆ f m − f A (cid:107) N,M ≤ (cid:107) f m − f A (cid:107) N,M + 2 N N − (cid:88) i =0 (cid:104) ε i , ˆ S (0) ˆ f m ( X t i ) − ˆ S (0) f m ( X t i ) (cid:105) L + 2 N N − (cid:88) i =0 (cid:104) R i , ˆ S (0) ˆ f m ( X t i ) − ˆ S (0) f m ( X t i ) (cid:105) L . (30)By treating each of the three terms appearing on the right hand side above individually, we can derivethe following proposition. Proposition 4.5.

Grant Assumptions ( M ) , ( N ) and ( H ) and assume M ∆ → ∞ . For c > , deﬁne Ω N,M,m := Ω

N,M,m,c := (cid:110) (cid:107) t (cid:107) N,M ≥ c (cid:107) t (cid:107) L ( A ) for all t ∈ S m (cid:111) . Then, for any γ < / , we have E (cid:16) (cid:107) ˆ f m − f A (cid:107) N,M Ω N,M,m (cid:17) (cid:46) (cid:107) f A − f m (cid:107) π,M + D m T + ∆ γ + 1 M ∆ . The event Ω

N,M,m has been introduced since the proof requires bounding the stochastic noise term N (cid:80) N − i =0 (cid:104) ε i , ˆ S (0) t ( X t i ) (cid:105) L uniformly over all (cid:107) · (cid:107) N,M -normalized t ∈ S m . The latter is diﬃcult sinceboth the object to be bounded and the norm are random objects. On the event Ω N,M,m , it is suﬃcient18o bound it uniformly over all (cid:107) · (cid:107) L ( A ) -normalized t ∈ S m which is possible thanks to Assumption(N).Under Assumption (E), we can further bound (cid:107) f A − f m (cid:107) π,M (cid:46) (cid:107) f − f m (cid:107) L ( A ) , hence, Proposition 4.5already provides the relevant terms appearing in the oracle inequality from Theorem 4.3. The secondmain step of the proof of the theorem is to verify that the event Ω cN,M,m has negligible probability.To that aim, let us consider Ξ N,M,m := (cid:40)(cid:12)(cid:12)(cid:12) (cid:107) t (cid:107) N,M (cid:107) t (cid:107) π,M − (cid:12)(cid:12)(cid:12) ≤ ∀ t ∈ S m (cid:41) which satisﬁes Ξ N,M,m ⊂ Ω N,M,m, . Since E ( (cid:107) t (cid:107) N,M ) = (cid:107) t (cid:107) π,M and (cid:12)(cid:12)(cid:12) (cid:107) t (cid:107) N,M (cid:107) t (cid:107) π,M − (cid:12)(cid:12)(cid:12) (cid:104) (cid:12)(cid:12)(cid:12) (cid:107) t (cid:107) N,M − (cid:107) t (cid:107) π,M (cid:107) t (cid:107) L ( A ) (cid:12)(cid:12)(cid:12) under Assumption (E), bounding the probability of Ξ cN,M,m is equivalent to deriving a concentrationinequality for (cid:107) t (cid:107) N,M uniformly over all L ( A )-normalized t ∈ S m . This can be done using thestandard techniques for β -mixing sequences, see, e.g., [20]: by means of the mixing assumption (8) in(M), we approximate ( X t , . . . , X t N ) by a process with independent blocks and then apply a variantof Bernstein’s inequality. We obtain the following bound for P (Ξ cN,M,m ). Lemma 4.6.

Grant Assumptions (M), (E), (N) and (H). Then, there are constants

K, K (cid:48) > suchthat P (Ξ cN,M,m ) ≤ K (cid:16) N β X ( q N ∆) + D m exp (cid:16) − K (cid:48) p N L m (cid:17)(cid:17) holds for any p N , q N ∈ N with N = 2 p N q N . In particular, with the constants γ and L from the β -mixing condition (8) as well as ˜ K := K · ( L ∨ , we have P (Ξ cN,M,m ) ≤ ˜ K (cid:16) N exp (cid:0) − γq N ∆ (cid:1) + D m exp (cid:16) − K (cid:48) p N L m (cid:17)(cid:17) . The conclusion of the main theorem is a straightforward consequence of Proposition 4.5 and Lemma4.6, we refer to Section 5.3 for further details on the proof.In order to choose an appropriate approximation space S m in a purely data-driven way, let M N,M := { , . . . , ¯ m } be the indexes of a sequence of approximation spaces S m with dimensions D m , subject tothe nesting assumption S m ⊂ S ¯ m for m ∈ M N,M . We deﬁne the model selection methodˆ m := arg min m ∈M N,M (cid:8) γ N,M ( ˆ f m ) + pen( m ) (cid:9) with pen( m ) := κ σ D m T for some appropriate constant κ >

0. Note that the penalty term is of the order of the stochasticerror in the underlying regression problem such that ˆ m automatically balances the deterministicapproximation error and the stochastic error. The parameter σ in the penalty can be replaced bysome a priori upper bound of the volatility or ˆ σ -dependent. The adaptive estimator for the reactionfunction is then ˆ f ˆ m fulﬁlling( ˆ m, ˆ f ˆ m ) = arg min m ∈M N,M ,f m ∈ S m (cid:8) γ N,M ( f m ) + pen( m ) (cid:9) . Theorem 4.7.

Grant Assumptions (M), (E), (N) and (H) and let κ be suﬃciently large (dependingonly on the constants in Assumption (E)). Further, assume that M ∆ → ∞ as well N ∆log N → ∞ , L ¯ m = o ( N ∆log N ) and D ¯ m ≤ N . Then, for any γ < / we have E ( (cid:107) ˆ f ˆ m − f A (cid:107) N,M (cid:1) (cid:46) inf m ∈M N,M inf f m ∈ S m (cid:110) (cid:107) f m − f (cid:107) L ( A ) + D m T (cid:111) + 1 M ∆ + ∆ γ . f m . The extensions to the adaptive case are straightforward.Next, we assess the quality of ˆ f m in terms of the more intuitive distance measure (cid:107) ˆ f m − f (cid:107) L ( A ) ,rather then (cid:107) ˆ f m − f A (cid:107) N,M . Using the triangle inequality as well as the equivalence of the empiricaland the L ( A )-norm on Ξ N,M,m , we can bound (cid:107) ˆ f m − f (cid:107) L ( A ) ≤ (cid:107) ˆ f m − f m (cid:107) L ( A ) + 2 (cid:107) f m − f (cid:107) L ( A ) = 2 (cid:107) ˆ f m − f m (cid:107) L ( A ) Ξ N,M,m + 2 (cid:107) ˆ f m − f m (cid:107) L ( A ) Ξ cN,M,m + 2 (cid:107) f m − f (cid:107) L ( A ) (cid:104) (cid:107) ˆ f m − f m (cid:107) N,M Ξ N,M,m + 2 (cid:107) ˆ f m − f m (cid:107) L ( A ) Ξ cN,M,m + 2 (cid:107) f m − f (cid:107) L ( A ) . Then, thanks to Proposition 4.5 and Lemma 4.6, it is straightforward, to derive an upper bound inprobability. Bounding E ( (cid:107) ˆ f m − f (cid:107) L ( A ) ), on the other hand, is more challenging since the behavior of (cid:107) ˆ f m − f (cid:107) L ( A ) on the set Ξ cN,M,m is a priori unclear. This issue can be circumvented by consideringthe truncated version ˆ f K N m := ( − K N ) ∨ ( ˆ f m ∧ K N )where ( K N ) is a sequence of positive numbers with K N → ∞ such that K N P (Ξ cN,M,m ) → Corollary 4.8.

Grant Assumptions (M), (E), (N) and (H). Further, assume that M ∆ → ∞ as wellas N ∆log N → ∞ , L m = o ( N ∆log N ) and D m ≤ N . Then, for any γ < / , we have (cid:107) ˆ f m − f (cid:107) L ( A ) = O p (cid:16) (cid:107) f − f m (cid:107) L ( A ) + D m T + ∆ γ + 1 M ∆ (cid:17) with f m from (29) . Furthermore, for a sequence ( K N ) with K N → ∞ and K N /N β → for some β > , we have E ( (cid:107) ˆ f K N m − f (cid:107) L ( A ) ) (cid:46) (cid:107) f − f m (cid:107) L ( A ) + D m T + ∆ γ + 1 M ∆ . f with unknown diﬀusivity and volatility While our nonparametric estimator for f does not hinge on the volatility parameter σ , the diﬀusivityparameter ϑ enters the least square criterion in (27) via the expressions S ∆ t i ( y k ) = M − (cid:88) (cid:96) =1 e − π ϑ(cid:96) (cid:104) X t i , e (cid:96) (cid:105) M e (cid:96) ( y k ) . In practice, this parameter will generally be unknown and has to be replaced by an estimate. To thataim, we make use of the double increments based estimator ˆ ϑ from (22), while omitting the spatialobservations y k / ∈ [ b, − b ] for an arbitrary but ﬁxed b >

0. Recall that the computation of ˆ ϑ doesnot require prior knowledge of the volatility parameter σ and Theorem 3.4 reveals the (squared)convergence rate ( ϑ − ˆ ϑ ) = O p ((∆ / ∨ δ ) /T ). Since the conclusion of Theorem 4.3 is only useful aslong as M ∆ → ∞ , we work in the regime M √ ∆ → ∞ where the (squared) convergence rate is givenby ( ϑ − ˆ ϑ ) = O p (∆ / /T ). Now, based on the estimator ˆ ϑ , we can deﬁne an approximation ˇ S (∆) ofthe discretized semigroup ˆ S (∆), namelyˇ S (∆) u := M − (cid:88) (cid:96) =1 e − ˆ λ (cid:96) ∆ (cid:104) u, e (cid:96) (cid:105) M e (cid:96) with ˆ λ (cid:96) := π ˆ ϑ(cid:96) u : [0 , → R . The corresponding nonparametric estimator for f is then givenby ˇ f m := arg min g ∈ S m N M N − (cid:88) i =0 M − (cid:88) k =1 (cid:16) X t i +1 ( y k ) − ˇ S ∆ t i ( y k )∆ − g ( X t i ( y k )) (cid:17) where ˇ S ∆ t i := ˇ S (∆) X t i . In order to analyze the convergence rate of ˇ f m , we incorporate the approxi-mation of the semigroup into the regression model. Due to ˆ S (0) = ˇ S (0) and in view of (25), it is nowgiven byˆ S (0) X t i +1 − ˇ S (∆) X t i ∆ = ˆ S (0) f ( X t i ) + R (cid:48) i + ε i with R (cid:48) i := R i + ˆ S (∆) X t i − ˇ S (∆) X t i ∆ . Based on this representation, we are now going to show that the approximation of the discretizedsemigroup does not aﬀect the convergence rate of the nonparametric estimator. In fact, since ourerror bound for ˆ ϑ from Theorem 3.4 is a priori only valid in probability sense, the same holds for ˇ f m . Theorem 4.9.

Grant Assumptions (M), (E), (N) and (H). Further, assume that M ∆ → ∞ , T √ ∆ → as well as N ∆log N → ∞ , L m = o ( N ∆log N ) and D m ≤ N . Then, for any γ < / , we have (cid:107) ˇ f m − f A (cid:107) N,M = O p (cid:16) (cid:107) f − f m (cid:107) L ( A ) + D m T + ∆ γ + 1 M ∆ (cid:17) with f m from (29) . Furthermore, the same bound holds for (cid:107) ˇ f m − f (cid:107) L ( A ) .Remark . The proof does not make use of any properties of ˆ ϑ apart from its convergence rate.In general, if ˜ ϑ is any estimator for ϑ with | ˜ ϑ − ϑ | = O p ( a M,N,T ) for some convergence rate a M,N,T ,we get the additional term a N,M,T ∆ / in the above upper bound. For ˆ ϑ , this additional term does notappear in the theorem, as it is dominated by D m /T . We start by proving the results on the H¨older regularity of the linear and nonlinear component of X . The two subsequent sections contain the proofs of our main results, namely on diﬀusivity andvolatility estimation, Section 5.2, and on nonparametric estimation of the nonlinearity f , Section 5.3.Further proofs and auxiliary results are deferred to Section 5.4. X We verify the results on the H¨older regularity of the processes X and ( N t ) claimed in Propositions 2.2and 2.3 of Section 2.2, respectively. To that aim, recall that for s ≥ p ≥ W s,p := W s,p ((0 , s ]-times weakly diﬀerentiable functions u : (0 , → R such that (cid:107) u (cid:107) W s,p := [ s ] (cid:88) k =0 (cid:107) u ( k ) (cid:107) L p + (cid:18)(cid:90) (cid:90) | u ([ s ]) ( ξ ) − u ([ s ]) ( η ) | p | ξ − η | s − [ s ]) p dξ dη (cid:19) /p < ∞ . Further, the Sobolev embedding theorem states that for any s ≥ p ≥ α >

0, we have α ≤ s − p ⇒ W s,p ⊂ C α (31)and the embedding is continuous. Our ﬁrst step is an analysis of the H¨older regularity of the linearcomponent ( X t ). We remark that the norm bounds in statements ( i ) and ( ii ) of the following lemmaare also stated in Cerrai [10] as well as Da Prato and Zabczyk [18]. The remaining results are derivedusing similar techniques. We provide a complete proof for the sake of completeness.21 emma 5.1. For any p ∈ [1 , ∞ ) , the following hold.(i) sup t ≥ E ( (cid:107) X t (cid:107) p ∞ ) < ∞ .(ii) For any γ < / , we have ( X t ) ∈ C ( R + , C γ ) a.s. and sup t ≥ E ( (cid:107) X t (cid:107) pC γ ) < ∞ .(iii) For any γ < / and T > , we have ( X t ) ≤ t ≤ T ∈ C γ ([0 , T ] , E ) a.s. and there exists a constant C > such that E ( (cid:107) X t − X s (cid:107) p ∞ ) ≤ C | t − s | γp for all s, t ≥ .Proof. ( iii ) The property ( X t ) ≤ t ≤ T ∈ C γ ([0 , T ] , E ) is a consequence of Kolmogorov’s criterion and E ( (cid:107) X t − X s (cid:107) p ∞ ) ≤ C | t − s | γp for all p ≥

1. To verify the latter statement, assume, without loss ofgenerality that s, t ∈ ( a, a + 1) for some a ≥ U := ( a, a + 1) × (0 , E ( | X t ( x ) − X s ( y ) | ) (cid:46) (cid:112) | t − s | + | x − y | ≤ (cid:112) | t − s | + (cid:112) | x − y | (cid:46) (( t − s ) + ( x − y ) ) / holds uniformly in x, y ∈ (0 ,

1) and s, t ≥

0. The last step follows from the equivalence of norms on R . Now, since ( t, x ) (cid:55)→ X t ( x ) is a continuous function, the Garsia-Rodemich-Rumsey inequality, seee.g., [17, Theorem B.1.5], provides the following bound on its increments: for any α > , β >

4, thereexists a constant c > a ) such that | X s ( x ) − X t ( y ) | ≤ c (( x − y ) + ( t − s ) ) β − α (cid:18)(cid:90) U×U | X u ( ξ ) − X v ( η ) | α ( | ξ − η | + | u − v | ) β/ dξ dη du dv (cid:19) α (32)for all ( x, s ) , ( y, t ) ∈ U . Note that for x = y , the right hand side of the above inequality is independentof x . Now, choose α = 2 m for some m ∈ N in such a way that α = 2 m > p . Then, by applyingJensen’s inequality to the concave function R + (cid:51) h (cid:55)→ h p/α , we obtain E (sup x | X s ( x ) − X t ( x ) | p ) ≤ c p ( t − s ) β − m p (cid:18)(cid:90) U×U E ( | X u ( ξ ) − X v ( η ) | m )( | ξ − η | + | u − v | ) β/ dξ dη du dv (cid:19) pα (cid:46) c p ( t − s ) β − m p (cid:18)(cid:90) U×U ( | ξ − η | + | u − v | ) m/ ( | ξ − η | + | u − v | ) β/ dξ dη du dv (cid:19) pα . The above integral is ﬁnite as long as β − m <

2. Now, the result follows since for any given γ < / m ∈ N and β < m such that β − m ≤ γ .Assertion ( i ) can be proved similarly by taking s = t and y = 1 in (32) to obtain a bound forsup x | X t ( x ) | = sup x | X t ( x ) − X s ( y ) | . Note that, in order to be able to chose y = 1, we have to modifythe set U by taking, e.g., U = ( a, a + 1) × ( − ε, ε ) for some ε >

0, and extend ( X t ) by deﬁning X t ( z ) := 0 for z / ∈ [0 ,

1] such that ( X t ) is a continuous function on U .( ii ) Clearly, A ϑ is a second order diﬀerential operator whose eigenvalues satisfy the condition (cid:80) (cid:96) ≥ λ − ρ(cid:96) < ∞ for any ρ > /

2. Thus, by [18, Theorem 5.25], ( X t ) ∈ C ( R + , W α,p ) holds forany α > p > /p + α < /

4. Now, by choosing α close to 1 / p suﬃcientlylarge, ( X t ) ∈ C ( R + , C γ ) follows from the Sobolev embedding theorem (31). Now, with the bound(10) for the Gaussian process ( X t ), we get for any h ∈ (0 ,

1) that E ( (cid:107) X t (cid:107) qW h,q ) (cid:46) E ( (cid:107) X t (cid:107) q ∞ ) + (cid:90) (cid:90) E ( | X t ( ξ ) − X t ( η ) | q ) | ξ − η | hq dξ dη (cid:46) E ( (cid:107) X t (cid:107) q ∞ ) + (cid:90) (cid:90) | ξ − η | q/ | ξ − η | hq dξ dη.

22n view of ( i ), this shows that sup t ≥ E ( (cid:107) X t (cid:107) qW h,q ) < ∞ , as long as h < /

2. Further, by the Sobolevembedding theorem, we have (cid:107) X t (cid:107) C γ (cid:46) (cid:107) X t (cid:107) W h,q , provided that h − q > γ . Thus, choosing h ∈ ( γ, )and q > max(( h − γ ) − , p ), we get E ( (cid:107) X t (cid:107) pC γ ) (cid:46) E ( (cid:107) X t (cid:107) q pq W h,q ) ≤ E ( (cid:107) X t (cid:107) qW h,q ) pq by Jensen’s inequality. The claim now follows by taking the supremum over t ≥ f (0) (cid:54) = 0 and, hence, f ( X t ) / ∈ E , we need to regard ( S ( t )) t ≥ as a semigroup actingon the space ˜ E = C ([0 , A ˜ E of A ϑ = ϑ ∆ in ˜ E , i.e., A ˜ E x := A ϑ x for x ∈ D ( A ˜ E ) := { x ∈ ˜ E ∩ D ( A ϑ ) : A ϑ x ∈ ˜ E } . Note that A ˜ E generates a semigroup ( S ˜ E ( t )) t ≥ on ˜ E which is not strongly continuous. Indeed, we have D ( A ˜ E ) ˜ E = E and lim t → S ˜ E ( t ) x = x in ˜ E holds ifand only if x ∈ E . Nevertheless, ( S ˜ E ( t )) t ≥ deﬁnes a so called analytic semigroup on ˜ E which retainsmany properties of C -semigroups. In particular, for any x ∈ ˜ E , it holds that (cid:82) t S ˜ E ( r ) x dr ∈ D ( A ˜ E )and we have the representation S ˜ E ( t ) x − x = A ˜ E (cid:90) t S ˜ E ( r ) x dr. (33)Hence, if r (cid:55)→ (cid:107) A ˜ E S ˜ E ( r ) x (cid:107) ˜ E is integrable over [0 , t ], then S ˜ E ( t ) x − x = (cid:82) t A ˜ E S ˜ E ( r ) x dr . Since thedeﬁnitions of the semigroups S and S ˜ E and their generators agree on the intersection of their domains,respectively, we will refer to both by ( S ( t )) t ≥ and A ϑ from now on. The following inequalities, whichare particular cases of results derived in Sinestrari [44], are our main tool to study the regularity of( N t ). Lemma 5.2.

We ﬁx an element λ ∈ (0 , λ ) . For any α, β, ∈ (0 , \ { } and n ∈ N , there exists aconstant C > such that for all t > :(i) (cid:107) A nϑ S ( t ) x (cid:107) ∞ ≤ C e − λ t t − n (cid:107) x (cid:107) ∞ for all x ∈ ˜ E ,(ii) (cid:107) S ( t ) x (cid:107) C α ≤ C e − λ t t − α/ (cid:107) x (cid:107) ∞ for all x ∈ ˜ E ,(iii) (cid:107) A ϑ S ( t ) x (cid:107) ∞ ≤ Ct − (1 − α/ (cid:107) x (cid:107) C α for all x ∈ C α ,(iv) (cid:107) A nϑ S ( t ) x (cid:107) C β ≤ C e − λ t t − ( n + β − α ) (cid:107) x (cid:107) C α for all x ∈ C α where it is required that either n ≥ or α ≤ β . For a proof of ( i ) , ( ii ) and ( iv ), we refer to [33, Proposition 2.3.1], ( iii ) follows from [44, Proposition1.11]. Further, in order to transfer the spatial to the temporal regularity, of particular importance forour study are the so called intermediate spaces, deﬁned by D A ϑ ( α, ∞ ) := (cid:26) x ∈ ˜ E : (cid:107) x (cid:107) D Aϑ ( α, ∞ ) := (cid:107) x (cid:107) ˜ E + sup t> (cid:107) S ( t ) x − x (cid:107) ˜ E t α < ∞ (cid:27) , α ∈ (0 , , which are Banach spaces with the norm (cid:107) · (cid:107) D Aϑ ( α, ∞ ) . These spaces can be deﬁned for arbitraryanalytic semigroups on a Banach space, see, e.g. [44]. For our concrete choice of A ϑ and ˜ E , they aregiven by the Dirichlet-H¨older spaces D A ϑ ( α, ∞ ) = C α ([0 , , α (cid:54) = 12 , where the norms are equivalent, see Lunardi [32].23 roof of Proposition 2.2. Due to Lemma 5.1, in remains to prove the statements for ( N t ) and, if ξ follows the stationary distribution, for ( ξ t ) t ≥ with ξ t := S ( t ) ξ .( i ) Step 1.

We show (cid:107) N t (cid:107) C γ < ∞ a.s. for all t ≥ t ≥ E ( (cid:107) N t (cid:107) pC γ ) < ∞ : From Lemma 5.2 ( ii ) we have that (cid:107) N t (cid:107) C γ ≤ (cid:90) t (cid:107) S ( t − s ) f ( X s ) (cid:107) C γ ds (cid:46) (cid:90) t e − λ ( t − s ) ( t − s ) − γ (cid:107) f ( X s ) (cid:107) ∞ ds and, consequently, (cid:107) N t (cid:107) C γ (cid:46) sup s ≤ t (cid:107) f ( X s ) (cid:107) ∞ (cid:90) t e − λ r r − γ dr is almost surely ﬁnite by our basic assumptions. Also, using Jensen’s inequality and the fact that r (cid:55)→ a ( r ) := e − λ r r − γ is integrable over R + , we get (cid:107) N t (cid:107) pC γ ≤ (cid:90) t a ( t − s ) (cid:107) f ( X s ) (cid:107) p ∞ ds · (cid:18)(cid:90) t a ( r ) dr (cid:19) p − (cid:46) (cid:90) t a ( t − s ) (cid:107) f ( X s ) (cid:107) p ∞ ds. Thus, Fubini’s theorem and the polynomial growth condition on f from (7) yieldsup t ≥ E ( (cid:107) N t (cid:107) pC γ ) (cid:46) sup s ≥ E ( (cid:107) f ( X s ) (cid:107) p ∞ ) (cid:46) s ≥ E ( (cid:107) X s (cid:107) dp ∞ )which is ﬁnite under Assumption (B). Step 2:

We show ( N t ) ∈ C ( R + , C γ ): In order to verify (cid:107) N t + h − N t (cid:107) C γ → h → N t + h − N t = ( S ( h ) − I ) N t + (cid:90) t + ht S ( t + h − r ) f ( X r ) dr. To treat the ﬁrst term, choose α ∈ ( γ, ). Then, using (33) and property ( iv ) of Lemma 5.2, we canbound (cid:107) ( S ( h ) − I ) N t (cid:107) C γ (cid:46) (cid:90) h (cid:107) A ϑ S ( r ) N t (cid:107) C γ dr ≤ (cid:107) N t (cid:107) C α (cid:90) h e − λ r r − (1+ γ − α ) dr which tends to 0 for h →

0. For the second term, it follows from bound ( ii ) in Lemma 5.2 that (cid:13)(cid:13)(cid:13) (cid:90) t + ht S ( t + h − r ) f ( X r ) dr (cid:13)(cid:13)(cid:13) C γ (cid:46) sup r ≤ T (cid:107) f ( X r ) (cid:107) ∞ (cid:90) t + ht e − λ r r − γ dr which also tends to 0 almost surely for h → Step 3 : Steps 1 and 2 verify claim ( i ) in the case ξ = 0. To treat the case where ξ follows thestationary distribution, we use the fact that X has the same distribution as ˜ X = ( X t ) t ≥ . Again,we have the decomposition ˜ X t = S (1 + t ) ξ + X t + N t and ( i ) has already been proved for the second and third term. For the ﬁrst term, the result followsfrom (cid:107) S (1 + t ) ξ (cid:107) C γ (cid:46) (cid:107) ξ (cid:107) ∞ = (cid:107) X (cid:107) ∞ by inequality ( ii ) in Lemma 5.2. Step 4.

We transfer the result ( i ) from X to f ( X ): First of all, f ( X ) ∈ C ( R + , C γ ) almost surelyholds due to the result for X and the assumption f ∈ C ( R ). Further, we have (cid:107) f ( X t ) (cid:107) C γ = (cid:107) f ( X t ) (cid:107) ∞ + sup ξ (cid:54) = η | f ( X t ( ξ )) − f ( X t ( η )) || ξ − η | γ ≤ (cid:107) f ( X t ) (cid:107) ∞ + (cid:107) f (cid:48) ( X t ) (cid:107) ∞ (cid:107) X t (cid:107) C γ E ( (cid:107) f ( X t ) (cid:107) pC γ ) (cid:46) E ( (cid:107) f ( X t ) (cid:107) p ∞ ) + E ( (cid:107) f (cid:48) ( X t ) (cid:107) p ∞ ) + E ( (cid:107) X t (cid:107) pC γ ) (cid:46) E ( (cid:107) X t (cid:107) dp ∞ ) + E ( (cid:107) X t (cid:107) pC γ ) < ∞ uniformly in t ≥ ii ) Step 1.

We show the claim for ( N t ): Using the same decomposition for the increments of ( N t )as in the proof of ( i ), we get (cid:107) N t − N s (cid:107) ∞ ≤ (cid:107) ( S ( t − s ) − I ) N s (cid:107) ∞ + (cid:90) ts (cid:107) S ( t − r ) f ( X r ) (cid:107) ∞ dr for s < t . For the ﬁrst term, by deﬁnition of the intermediate spaces, it holds that (cid:107) ( S ( t − s ) − I ) N s (cid:107) ∞ (cid:46) (cid:107) N s (cid:107) D Aϑ ( γ, ∞ ) ( t − s ) γ (cid:46) (cid:107) N s (cid:107) C γ ( t − s ) γ . (34)By Lemma 5.2 ( i ) and H¨older’s inequality, we have (cid:13)(cid:13)(cid:13) (cid:90) ts S ( t − r ) f ( X r ) dr (cid:13)(cid:13)(cid:13) p ∞ ≤ (cid:18)(cid:90) ts (cid:107) S ( t − r ) f ( X r ) (cid:107) ∞ dr (cid:19) p (cid:46) (cid:18)(cid:90) ts e − λ ( t − r ) (cid:107) f ( X r ) (cid:107) ∞ dr (cid:19) p ≤ ( t − s ) p − (cid:90) ts e − pλ ( t − r ) (cid:107) f ( X r ) (cid:107) p ∞ dr. (35)By combining (34) and (35), we obtain ( N t ) ≤ t ≤ T ∈ C γ ([0 , T ] , E ) almost surely and, under Assump-tion (B), E ( (cid:107) N t − N s (cid:107) p ∞ ) (cid:46) ( t − s ) γp E ( (cid:107) N s (cid:107) pC γ ) + ( t − s ) p (1 + sup h ≥ E ( (cid:107) X h (cid:107) pd ∞ ) , from which the result for ( N t ) follows due to ( i ). Step 2.

The case where ξ follows the stationary distribution can be treated as in ( i ) since (cid:107) S (1 + t ) ξ − S (1 + s ) ξ (cid:107) ∞ (cid:46) ( t − s ) γ (cid:107) S (1) ξ (cid:107) C γ (cid:46) ( t − s ) γ (cid:107) X (cid:107) ∞ . Step 3.

We transfer the result ( ii ) from X to f ( X ): First of all, the pathwise property is again aconsequence of the assumption f ∈ C ( R ). Next, without loss of generality, assume that d from (7)is given by d = 2 m for some m ∈ N . Then, using the formula a n − b n = ( a − b ) (cid:80) n − k =0 a k b n − − k for a, b ∈ R and n ∈ N , yields | f ( X t ( x )) − f ( X s ( x )) | ≤ (cid:90) X t ( x ) X s ( x ) | f (cid:48) ( h ) | dh (cid:46) (cid:90) X t ( x ) X s ( x ) (1 + h m ) dh = | X t ( x ) − X s ( x ) | + 12 m + 1 | X t ( x ) m +1 − X s ( x ) m +1 | (cid:46) | X t ( x ) − X s ( x ) | (cid:32) m (cid:88) k =0 | X t ( x ) k X s ( x ) m − k | (cid:33)(cid:124) (cid:123)(cid:122) (cid:125) =: Z s,t where we have assumed X t ( x ) ≥ X s ( x ) without loss of generality. Consequently, since ( s, t ) (cid:55)→ (cid:107) Z s,t (cid:107) ∞ is bounded in L p ( P ) for any p ≥ E ( (cid:107) f ( X t ) − f ( X s )) (cid:107) p ∞ ) (cid:46) E ( (cid:107) X t − X s (cid:107) p ∞ ) / E ( (cid:107) Z s,t (cid:107) p ∞ ) / (cid:46) ( t − s ) γp .

25e turn to the excess H¨older regularity of the nonlinear component ( N t ) of X . Since ( N t ) is thepathwise solution of the equation dN t = A ϑ N t + f ( S ( t ) ξ + X t + N t ) , N = 0, the almost sure propertiesare a consequence of the results of Sinestrari [44] on the regularity of solutions to deterministic systems.In the following, we give a direct proof for them, both for the sake of completeness and since we requireits steps in order to bound the respective norms in L p ( P ). Proof of Proposition 2.3. ( i ) Due to Proposition 2.2, we have f ( X t ) ∈ D A ϑ ( γ, ∞ ) = C γ for any γ < /

4. Further, for any ˜ γ ∈ ( γ, ), Lemma 5.2 ( iv ) yields that (cid:107) A ϑ N t (cid:107) C γ (cid:46) (cid:90) t (cid:107) A ϑ S ( t − s ) f ( X s ) (cid:107) C γ ds ≤ (cid:90) t h ( t − s ) (cid:107) f ( X s ) (cid:107) C γ ds with h ( r ) := e − λ r r − γ − γ . Since h is integrable over R + and A ϑ = ϑ ∂ ∂x , the almost sure properties N t ∈ C γ and sup t ≤ T (cid:107) A ϑ N t (cid:107) C γ < ∞ immediately follow from f ( X ) ∈ C ( R + , C γ ), cf. Proposi-tion 2.2. Further, Jensen’s inequality gives (cid:107) A ϑ N t (cid:107) pC γ (cid:46) (cid:90) t h ( t − s ) (cid:107) f ( X s ) (cid:107) pC γ ds (cid:18)(cid:90) t h ( r ) dr (cid:19) p − . Consequently, under Assumption (B), sup t ≥ E ( (cid:107) A ϑ N t (cid:107) pC γ ) (cid:46) sup t ≥ E ( (cid:107) f ( X t ) (cid:107) pC γ ) is ﬁnite, byProposition 2.2.( ii ) In order to prove ddt N t = A ϑ N t + f ( X t ) in E , note that the usual decomposition for theincrements of ( N t ) and formula (33) yield the representation∆ − ( N t +∆ − N t ) − A ϑ N t − f ( X t ) = 1∆ (cid:90) ∆0 ( S ( r ) − I ) A ϑ N t dr + 1∆ (cid:90) t +∆ t (cid:16) S ( t + ∆ − r ) f ( X r ) − f ( X t ) (cid:17) dr. We have (cid:107) ( S ( r ) − I ) A ϑ N t (cid:107) ∞ (cid:46) r γ (cid:107) A ϑ N t (cid:107) C γ and (cid:107) S ( h ) f ( X r ) − f ( X t ) (cid:107) ∞ ≤(cid:107) S ( h )( f ( X r ) − f ( X t )) (cid:107) ∞ + (cid:107) ( S ( h ) − I ) f ( X t ) (cid:107) ∞ (cid:46) (cid:107) f ( X r ) − f ( X t ) (cid:107) ∞ + h γ (cid:107) f ( X t ) (cid:107) C γ . Thus, ( i ) and Proposition 2.2 yield (cid:107) ∆ − ( N t +∆ − N t ) − A ϑ N t − f ( X t ) (cid:107) ∞ (cid:46) ∆ γ → ddt N t now follow from the properties of f ( X t ) provided by Proposition2.2 and (cid:107) A ϑ N t +∆ − A ϑ N t (cid:107) ∞ ≤ (cid:107) ( S (∆) − I ) A ϑ N t (cid:107) ∞ + (cid:90) t +∆ t (cid:107) A ϑ S ( t + ∆ − r ) f ( X r ) (cid:107) ∞ dr (cid:46) ∆ γ (cid:107) A ϑ N t (cid:107) C γ + (cid:90) t +∆ t ( t + ∆ − r ) − γ (cid:107) f ( X r ) (cid:107) C γ dr where the bound on the integrand is taken from result ( iii ) in Lemma 5.2.It remains to analyze the regularity of the process ( M t ). First of all, by (33), we have A ϑ M t = S ( t ) m − m and m ∈ C γ ([ b, − b ]) is trivially fulﬁlled. Further, setting m t := S ( t ) m , we have m t ( x ) = √ π (cid:80) (cid:96) ≥ − λ (cid:96) +1 t (cid:96) +1 e (cid:96) +1 ( x ). The mean value theorem yields m t ( x ) − m t ( y ) = ( x − y )8 (cid:88) (cid:96) ≥ e − λ (cid:96) +1 t cos( π (2 (cid:96) + 1) z )26or some z between x and y . Thanks to the bound on trigonometric series from [26, Lemma A.7],the sum (cid:80) (cid:96) ≥ e − λ (cid:96) +1 t cos( π (2 (cid:96) + 1) z ) is uniformly bounded in t > z ∈ [ b, − b ] and we canconclude sup t ≥ (cid:107) A ϑ M t (cid:107) C γ ([ b, − b ]) < ∞ . The same argument shows that (cid:107) S ( t + ∆) m − S ( t ) m (cid:107) C ([ b, − b ]) (cid:46) sup (cid:96) ≥ − e − λ (cid:96) +1 ∆ (cid:96) + 1 (cid:46) √ ∆ . Hence, (cid:107) ∆ − ( M t +∆ − M t ) − S ( t ) m (cid:107) C ([ b, − b ]) ≤ (cid:90) t +∆ t (cid:107) ( S ( r ) m − S ( t ) m ) (cid:107) C ([ b, − b ]) dr (cid:46) √ ∆and, in particular, ddt M t = S ( t ) m in C ([ b, − b ]) as well as (cid:107) ddt ( M t +∆ − M t ) (cid:107) C ([ b, − b ]) (cid:46) √ ∆ (cid:46) ∆ γ . σ and ϑ In the following, we prove the central limit theorems for the realized quadratic variations in thesemilinear framework, as claimed in Theorems 3.1, 3.2 and 3.3 of Section 3. Central limit theoremsfor the derived method of moments estimators for σ and ϑ follow directly in view of the delta method.Note that the central limit theorems for space and double increments have been derived in [26] forthe linear case f ≡

0, assuming a stationary initial condition. In fact, by employing a perturbationargument together with Slutsky’s Lemma, one can easily show that these central limit theorems re-main valid in the case X = 0 when using the slight modiﬁcation explained in Section 3. We omit adetailed veriﬁcation for the sake of brevity.We start by proving the result for time increments. Proof of Theorem 3.1.

It is suﬃcient to consider the process X t + t + N t + t instead of X t : if ξ followsthe stationary distribution, then X has the same distribution as ˜ X with ˜ X t := X t + t = S ( t ) S ( t ) ξ + X t + t + N t + t for any t > S ( t ) S ( t ) ξ ) t ≥ can be chosen arbitrarily regular by choosing t suﬃciently large. In fact, since the properties of ( N t ) used in the sequel, are the same under each ofthe initial conditions, we can assume ξ = 0 for simplicity. Then, we have¯ V t = 1 M N √ ∆ N − (cid:88) i =0 M − (cid:88) k =0 ( X t i +1 ( y k ) − X t i ( y k )) + 1 M N √ ∆ N − (cid:88) i =0 M − (cid:88) k =0 ( N t i +1 ( y k ) − N t i ( y k )) + 2 M N √ ∆ N − (cid:88) i =0 M − (cid:88) k =0 ( X t i +1 ( y k ) − X t i ( y k ))( N t i +1 ( y k ) − N t i ( y k ))=: V t + R + R . Since V t satisﬁes the claimed central limit theorem (13), due to Slutsky’s lemma, it suﬃces to provethat R and R are of the order o p (1 / √ M N ).If T is ﬁnite, it follows from Lemma 5.1 and Proposition 2.3 that for all γ < / P -almost allrealizations ω ∈ Ω, there exists a constant C = C ( ω, T ) such that | X t i +1 ( y k ) − X t i ( y k ) | ≤ C ∆ γ and | N t i +1 ( y k ) − N t i ( y k ) | ≤ C ∆ for all i ≤ N , k ≤ M and N, M ∈ N . Consequently, R and R are of theorder o p (∆ + γ ) and the statement follows due to the condition M = o (∆ − ρ ) for some ρ < / T → ∞ and Assumption (B) is satisﬁed, Lemma 5.1 and Proposition 2.3 yield E ( | R | ) (cid:46) ∆ / and, by applying the Cauchy-Schwarz inequality to the cross terms, we get E ( | R | ) (cid:46) ∆ + γ for any γ < /

4. The claim follows since √ M N ∆ + γ = √ T M ∆ γ converges to 0 for any γ ∈ ( ρ , ) and thefact that convergence in L ( P ) implies convergence in probability.Next, we prove the result for space increments.27 roof of Theorem 3.2. We only consider the case of a ﬁnite time horizon, the case T → ∞ can betreated similarly by taking expectations. Further, it suﬃces to consider the case ξ = 0, see also theproof of Theorem 3.1. We have¯ V sp = 1 M N δ N (cid:88) i =1 M − (cid:88) k =0 ( X t i ( y k +1 ) − X t i ( y k )) + 1 M N δ N (cid:88) i =1 M − (cid:88) k =0 ( N t i ( y k +1 ) − N t i ( y k )) + 2 M N δ N (cid:88) i =1 M − (cid:88) k =0 ( X t i ( y k +1 ) − X t i ( y k ))( N t i ( y k +1 ) − N t i ( y k ))=: V sp + R + R and the claim follows if R and R are of the order o p (1 / √ M N ).To bound the term R , we use the summation by parts formula M − (cid:88) k =0 a k ( b k +1 − b k ) = − M − (cid:88) k =0 ( a k +1 − a k ) b k +1 + a M − b M − a b . (36)Setting a k := N t i ( y k +1 ) − N t i ( y k ) and b k := X t i ( y k ), we get R = 2 M N δ N (cid:88) i =1 M − (cid:88) k =0 X t i ( y k +1 )( N t i ( y k +2 ) − N t i ( y k +1 ) + N t i ( y k ))+ 1 M N δ N (cid:88) i =1 (cid:0) ( N t i ( y N ) − N t i ( y N − )) X t i ( y N ) + ( N t i ( y ) − N t i ( y )) X t i ( y ) (cid:1) . By Lemma 5.1 and Proposition 2.3, we have ( X t ) ∈ C ( R + , C ([ b, − b ])) and sup t ≤ T (cid:107) A ϑ N t (cid:107) C ([ b, − b ]) < ∞ almost surely. Thus, there exists a random variable C = C ( ω, T ) with | N t i ( y k +2 ) − N t i ( y k +1 ) − N t i ( y k ) | ≤ Cδ , | N t i ( y k +1 ) − N t i ( y k ) | ≤ Cδ and | X t i | (cid:46) C for all i ≤ N, k ≤ M − M, N ∈ N almost surely. It follows that | R | ≤ C δ and | R | (cid:46) C δ hold almost surely and, therefore, the claimfollows from the fact that √ M N δ (cid:104) (cid:112)

N/M tends to 0, by assumption.Finally, we prove the result for double increments. To that aim, let N ik denote the double incrementscomputed from the process ( N t ), i.e., N ik := N t i +1 ( y k +1 ) − N t i +1 ( y k ) − N t i ( y k +1 ) + N t i ( y k ). The ﬁrststep of the proof of Theorem 3.3 is given by the following lemma. Lemma 5.3.

Assume that the constant b from the observation scheme deﬁned in Section 2.1 is strictlypositive and let p ≥ .(i) Let α ∈ (0 , and β ∈ (0 , be such that α + β < . If T is ﬁnite, then there exists a randomvariable C = C ( ω, T ) > such that | N ik | ≤ Cδ α ∆ β holds for all i ≤ N, k ≤ M and N, M ∈ N almost surely. If Assumption (B) is satisﬁed, thenthere exists a constant C > such that E ( | N ik | p ) ≤ C (cid:16) δ α ∆ β (cid:17) p holds for all i ≤ N, k ≤ M , N, M ∈ N uniformly in T > .(ii) Let γ < and ε < / . If T is ﬁnite, then there exists a random variable C = C ( ω, T ) > suchthat | N i ( k +1) − N ik | ≤ Cδ γ ∆ ε olds for all i ≤ N, k ≤ M and N, M ∈ N almost surely. If Assumption (B) is satisﬁed, thenthere exists a constant C > such that E ( | N i ( k +1) − N ik | p ) ≤ C (cid:0) δ γ ∆ ε (cid:1) p holds for all i ≤ N, k ≤ M , N, M ∈ N uniformly in T > .Proof. We write N ik = N ik + M ik where N ik and M ik are the double increments computed from theprocesses ( N t ) and ( M t ) deﬁned by (12), respectively. In the following, these double increments areestimated separately.( i ) For α ∈ (0 , | N ik | ≤ δ α (cid:107) N t i +1 − N t i (cid:107) C α ≤ δ α (cid:18) (cid:107) S (∆) − I ) N t i (cid:107) C α + (cid:13)(cid:13)(cid:13) (cid:90) t i +1 t i S ( t i +1 − s ) f ( X s ) ds (cid:13)(cid:13)(cid:13) C α (cid:19) . Further, using formula (33) and Lemma 5.2 ( iv ) in combination with α + β − ≤ α yields (cid:107) S (∆) − I ) N t i (cid:107) C α = (cid:13)(cid:13)(cid:13) (cid:90) ∆0 A ϑ S ( r ) N t i dr (cid:13)(cid:13)(cid:13) C α ≤ (cid:90) ∆0 (cid:107) S ( r ) (cid:107) L ( C α + β − ,C α ) (cid:107) A ϑ N t i (cid:107) C α + β − dr (cid:46) (cid:90) ∆0 r − − β (cid:107) A ϑ N t i (cid:107) C α + β − dr (cid:46) ∆ β (cid:107) A ϑ N t (cid:107) C α + β − . Similarly, by Lemma 5.2 ( iii ) and H¨older’s inequality, (cid:13)(cid:13)(cid:13) (cid:90) t i +1 t i S ( t i +1 − s ) f ( X s ) ds (cid:13)(cid:13)(cid:13) C α (cid:46) (cid:90) t i +1 t i ( t i +1 − s ) − − β (cid:107) f ( X s ) (cid:107) C α + β − ds (cid:46) (cid:18)(cid:90) t i +1 t i (cid:107) f ( X s ) (cid:107) pC α + β − ds (cid:19) p ∆ − p − − β . Thus, noting α + β − < , Propositions 2.2 and 2.3 yield the claim for the case of a ﬁnite timehorizon and, under Assumption (B), E ( | N ik | p ) (cid:46) δ pα (cid:18) ∆ p β E ( (cid:107) A ϑ N t (cid:107) pC α + β − ) + ∆ p − − p − β (cid:90) t i +1 t i E ( (cid:107) f ( X s ) (cid:107) pC α + β − ) ds (cid:19) (cid:46) δ pα ∆ p β . To verify that M ik is of the claimed order, recall that in the proof of Proposition 2.3 it is shown that ddt M t = S ( t ) m =: m t and that | m t ( x ) − m t ( y ) | (cid:46) | x − y | holds uniformly in t > x, y ∈ [ b, − b ].Thus, we have M ik = (cid:82) t i +1 t i ( m s ( y k +1 ) − m s ( y k )) ds and, consequently, | M ik | (cid:46) ∆ δ (cid:46) δ α ∆ β .( ii ) For γ ∈ (1 , | N i ( k +1) − N ik | ≤ δ γ (cid:107) N t i +1 − N t i (cid:107) C γ . Using the decomposition N t i +1 − N t i = (cid:90) t i S ( t i − s )( f ( X s +∆ ) − f ( X s )) ds + (cid:90) ∆0 S ( t i +1 − s ) f ( X s ) ds,

29e get from Lemma 5.2 ( i ) that (cid:13)(cid:13)(cid:13) (cid:90) t i S ( t i − s )( f ( X s +∆ ) − f ( X s )) ds (cid:13)(cid:13)(cid:13) C γ = (cid:13)(cid:13)(cid:13) (cid:90) t i S ( r )( f ( X t i +1 − r ) − f ( X t i − r )) dr (cid:13)(cid:13)(cid:13) C γ (cid:46) (cid:90) t i e − λ r r − γ (cid:107) f ( X t i +1 − r ) − f ( X t i − r ) (cid:107) ∞ dr. Further, for h < /

2, Lemma 5.2 ( iii ) gives (cid:13)(cid:13)(cid:13) (cid:90) ∆0 S ( t i +1 − s ) f ( X s ) ds (cid:13)(cid:13)(cid:13) C γ (cid:46) (cid:90) ∆0 ( t i +1 − r ) − γ − h (cid:107) f ( X r ) (cid:107) C h dr. Now, the result in case of a ﬁxed T follows from the path regularity of f ( X ). Further, underAssumption (B), we can use Jensen’s and H¨older’s inequality to estimate E ( | N i ( k +1) − N ik | p ) (cid:46) δ γ sup t ≥ E ( (cid:107) f ( X t +∆ ) − f ( X t ) (cid:107) p ∞ ) + δ γ ∆ p − − γ − h p (cid:90) ∆0 E ( (cid:107) f ( X t i +1 − r ) (cid:107) pC h ) dr (cid:46) δ γ ∆ pε + δ γ ∆ p (1 − γ − h ) . The result follows, since one can pick h ∈ (0 , ) such that 1 − γ − h ≥ ε .To estimate | M ik | , recall that in the proof of Proposition 2.3 it is shown that ∂ ∂x M t = ϑ A ϑ M t = ϑ ( S ( t ) − I ) m and that (cid:107) ∂ ∂x M t − ∂ ∂x M s (cid:107) C ([ b, − b ]) = ϑ (cid:107) S ( t ) m − S ( s ) m (cid:107) C ([ b, − b ]) (cid:46) (cid:112) | t − s | . Further,recall that by Taylor’s formula, we have the expansion h ( x + δ ) = h ( x )+ δh (cid:48) ( x )+ (cid:82) x + δx ( x + δ − z ) h (cid:48)(cid:48) ( z ) dz for any h ∈ C ( R ). Hence, we can write δ − ( h ( x + δ ) − h ( x ) + h ( x − δ )) = (cid:90) K δ ( z − x ) h (cid:48)(cid:48) ( z ) dz with K δ ( z ) := δ − K ( δ − z ) and the triangular kernel K ( z ) := (1 − | z | ) {− ≤ z ≤ } . Application to thedouble increments yields M i ( k +1) − M ik = δ (cid:90) x + δx − δ K δ ( z − x ) ∂ ∂z ( M t i +1 ( z ) − M t i ( z )) dz and, consequently, | M i ( k +1) − M ik | (cid:46) δ √ ∆ (cid:46) ∆ γ ∆ ε .The above lemma is the main ingredient for the following proof of the central limit theorem fordouble increments. Proof of Theorem 3.3.

As for time and space increments, we can assume ξ = 0 and the claim followsif we verify | R i | = o p (1 / √ M N ) , i ∈ { , } , with R := 1 M N Φ ϑ ( δ, ∆) N (cid:88) i =1 M − (cid:88) k =0 N ik , R := 1 M N Φ ϑ ( δ, ∆) N (cid:88) i =1 M − (cid:88) k =0 N ik D ik and where D ik are the double increments computed from ( X t ). In the following, we verify the claimunder Assumption (B). The result for the case of a ﬁxed T can be shown analogously by using thepathwise properties of ( N t ) derived in Lemma 5.3. We treat the cases M √ ∆ = O (1) and M √ ∆ → ∞ separately. Case M √ ∆ = O (1) : Using Lemma 5.3 with α = 0 and β = 1 yields E ( N ik ) (cid:46) ∆ and, hence, √ M N E ( | R | ) (cid:46) √ M NM N √ ∆ (cid:88) i,k E ( N ik ) (cid:46) √ M N ∆ / → . β = 1 and α = a < in Lemma 5.3 to bound E ( | D ik N ik | ) ≤ E ( D ik ) / E ( N ik ) / (cid:46) ∆ / ∆ δ a/ , implying √ M N E ( | R | ) → . Case M √ ∆ → ∞ : With β = 1 / α = a +34 < E ( N ik ) (cid:46) ∆ / δ α and, hence, √ M N E ( | R | ) → . To treat the cross terms, we use formula (36) with a k := N ik and b k := H ik := X t i +1 ( y k ) − X t i ( y k ) to deduce N (cid:88) i =1 M − (cid:88) k =0 D ik N ik = − N (cid:88) i =1 M − (cid:88) k =0 ( N i ( k +1) − N ik ) H i ( k +1) + N (cid:88) i =1 N i ( M − H iM − N (cid:88) i =1 N i H i . (37)Since E ( H ik ) (cid:46) √ ∆, Lemma 5.3 gives for any γ < ε < / E (cid:16)(cid:12)(cid:12)(cid:12) M N δ N (cid:88) i =1 M − (cid:88) k =0 ( N i ( k +1) − N ik ) H i ( k +1) (cid:12)(cid:12)(cid:12)(cid:17) ≤ M N δ N (cid:88) i =1 M − (cid:88) k =0 E (( N i ( k +1) − N ik ) ) / E ( H i ( k +1) ) / (cid:46) δ γ ∆ ε √ ∆ δ = δ γ − ∆ ε +1 / . Further, by picking ε and γ in such a way that 2 γ − ε > a , we get √ M N δ γ − ∆ ε +1 / → . Forthe remaining two terms in (37), take α = a +12 < β = 1 / E (cid:16)(cid:12)(cid:12)(cid:12) M N δ N (cid:88) i =1 N i ( M − H iM (cid:12)(cid:12)(cid:12)(cid:17) (cid:46) ∆ / δ α ∆ / M δ (cid:104) ∆ δ γ and, since √ M N ∆ δ α →

0, we obtain √ M N E ( | R | ) →

0, which ﬁnishes the proof. f For the proof of Theorem 4.3, we follow the main proof strategy from Comte et al. [15, Proposition1]. First, we verify our estimate for (cid:107) ˆ f m − f A (cid:107) N,M on the event Ω

N,M,m . Proof of Proposition 4.5.

By applying the Cauchy-Schwarz inequality, Young’s inequality and Lemma4.2 to (30), we can bound (cid:107) ˆ f m − f A (cid:107) N,M ≤(cid:107) f m − f A (cid:107) N,M + 2 N N − (cid:88) i =0 (cid:13)(cid:13) ˆ S (0) (cid:0) ˆ f m ( X t i ) − f m ( X t i ) (cid:1)(cid:13)(cid:13) L (cid:13)(cid:13) R i (cid:13)(cid:13) L + (cid:107) ˆ f m − f m (cid:107) N,M sup t ∈ S m , (cid:107) t (cid:107) N,M =1 N N − (cid:88) i =0 (cid:10) ˆ S (0) t ( X t i ) , ε i (cid:11) L ≤(cid:107) f m − f A (cid:107) N,M + 1 η (cid:107) ˆ f m − f m (cid:107) N,M + ηN N − (cid:88) i =0 (cid:13)(cid:13) R i (cid:13)(cid:13) L + 1 η (cid:107) ˆ f m − f m (cid:107) N,M + η (cid:32) sup t ∈ S m , (cid:107) t (cid:107) N,M =1 N N − (cid:88) i =0 (cid:10) ˆ S (0) t ( X t i ) , ε i (cid:11) L (cid:33) (cid:16) η (cid:17) (cid:107) f m − f A (cid:107) N,M + 4 η (cid:107) ˆ f m − f A (cid:107) N,M + ηN N − (cid:88) i =0 (cid:13)(cid:13) R i (cid:13)(cid:13) L + η (cid:32) sup t ∈ S m , (cid:107) t (cid:107) N,M =1 N N − (cid:88) i =0 (cid:10) ˆ S (0) t ( X t i ) , ε i (cid:11) L (cid:33) for any η >

0. Taking η = 8 and rearranging gives (cid:107) ˆ f m − f A (cid:107) N,M ≤ (cid:107) f m − f A (cid:107) N,M + 16 N N − (cid:88) i =0 (cid:13)(cid:13) R i (cid:13)(cid:13) L + 16 (cid:32) sup t ∈ S m , (cid:107) t (cid:107) N,M =1 N N − (cid:88) i =0 (cid:10) ˆ S (0) t ( X t i ) , ε i (cid:11) L (cid:33) . (38)The claim of the proposition follows by bounding the expectation on Ω N,M,m of the three terms onthe right hand side of the above inequality.For the ﬁrst term, we have E ( (cid:107) f m − f A (cid:107) N,M Ω N,M,m ) ≤ E ( (cid:107) f m − f A (cid:107) N,M ) = (cid:107) f m − f A (cid:107) π,M . To treat the second term, we show that E ( (cid:107) R i (cid:107) L ) (cid:46) M ∆ + ∆ γ (39)holds for any ρ < / γ < /

2: First of all, with f := f − f (0) and := [0 , , we have (cid:13)(cid:13)(cid:13) S ( h ) f ( X s ) − f ( X t ) (cid:13)(cid:13)(cid:13) L (cid:46) (cid:13)(cid:13)(cid:13) S ( h ) f ( X s ) − f ( X s ) (cid:13)(cid:13)(cid:13) ∞ + f (0) (cid:13)(cid:13)(cid:13) S ( h ) − (cid:13)(cid:13)(cid:13) L + (cid:13)(cid:13)(cid:13) f ( X s ) − f ( X t ) (cid:13)(cid:13)(cid:13) ∞ (cid:46) h γ (cid:107) f ( X s ) (cid:107) D A ( γ/ , ∞ ) + f (0) (cid:88) (cid:96) ≥ (1 − e − λ (cid:96) ∆ ) (cid:104) , e (cid:96) (cid:105) + (cid:13)(cid:13)(cid:13) f ( X s ) − f ( X t ) (cid:13)(cid:13)(cid:13) ∞ . Using Jensen’s inequality, D A ϑ ( γ , ∞ ) = C γ and (cid:104) , e (cid:96) (cid:105) (cid:46) (cid:96) − , we get for any γ < / E (cid:16)(cid:13)(cid:13) R cont i (cid:13)(cid:13) L (cid:17) ≤ E (cid:18)(cid:90) t i +1 t i (cid:13)(cid:13)(cid:13) S ( t i +1 − s ) f ( X s ) − f ( X t i ) (cid:13)(cid:13)(cid:13) L ds (cid:19) (cid:46) (cid:90) t i +1 t i ∆ γ E (cid:18)(cid:13)(cid:13)(cid:13) f ( X s ) (cid:13)(cid:13)(cid:13) C γ (cid:19) ds + f (0) √ ∆+ 1∆ (cid:90) t i +1 t i E (cid:18)(cid:13)(cid:13)(cid:13) f ( X s ) − f ( X t i ) (cid:13)(cid:13)(cid:13) ∞ (cid:19) ds (cid:46) ∆ γ in view of Proposition 2.2. Further, by Lemma 5.6, we have E ( (cid:107) f ( X t ) − ˆ S (0) f ( X t ) (cid:107) L ) (cid:46) E ( (cid:107) f ( X t ) (cid:107) C α + (cid:107) f ( X t ) (cid:107) ∞ + (cid:107) f ( X t ) (cid:107) D α ) δ α α +1 with the space D α deﬁned in (45). The expectation on the right hand side is ﬁnite as long as α < / α suﬃciently close to 1 /

4, we get E ( (cid:107) f ( X t ) − ˆ S (0) f ( X t ) (cid:107) L ) (cid:46) δ γ/ = M − γ/ = o (∆ γ )32nder the condition M ∆ → ∞ . To bound ∆ − E ( (cid:107) S ( h ) X t − ˆ S ( h ) X t (cid:107) L ) for h ∈ { , ∆ } , we use theusual decomposition X t = S ( t ) X + X t + N t where we can ﬁx a convenient value for t >

0, due tostationarity. Since the decomposition is trivial for t = 0, we pick t := t = ∆. The linear component X t can easily be treated due to independence in view of Lemma 4.2: E ( (cid:107) S ( h ) X t − ˆ S ( h ) X t (cid:107) L ) = E (cid:16) M − (cid:88) k =1 e − λ k h (cid:16) (cid:88) (cid:96) ∈I + k \{ k } u (cid:96) ( t ) − (cid:88) (cid:96) ∈I − k u (cid:96) ( t ) (cid:17) (cid:17) + E (cid:16) (cid:88) (cid:96) ≥ M e − λ (cid:96) h u (cid:96) ( t ) (cid:17) ≤ (cid:88) (cid:96) ≥ M E ( u (cid:96) ( t )) (cid:46) M and dividing by the squared renormalization ∆ yields the claimed O (1 / ( M ∆ ))-bound. The othertwo terms in the decomposition are of lower order: For S ( t ) X , we have (cid:107) S ( h ) S ( t ) X − ˆ S ( h ) S ( t ) X (cid:107) L = M − (cid:88) k =1 e − λ k h (cid:16) (cid:104) S ( t ) X , e k (cid:105) L − (cid:104) S ( t ) X , e k (cid:105) M (cid:17) + (cid:88) k ≥ M e − λ k ( h + t ) (cid:104) X , e k (cid:105) L . For the ﬁrst sum, Lemma 4.2 and the Cauchy-Schwarz inequality yields (cid:16) (cid:104) S ( t ) X , e k (cid:105) L − (cid:104) S ( t ) X , e k (cid:105) M (cid:17) = (cid:16) (cid:88) l ∈I + k \{ k } e − λ l t (cid:104) X , e l (cid:105) L − (cid:88) l ∈I − k e − λ l t (cid:104) X , e l (cid:105) L (cid:17) ≤ (cid:107) X (cid:107) L (cid:88) l ∈ ( I + k ∪I − k ) \{ k } e − λ l t and, thus, with t = ∆, M − (cid:88) k =1 e − λ k h (cid:16) (cid:104) S ( t ) X , e k (cid:105) L − (cid:104) S ( t ) X , e k (cid:105) M (cid:17) ≤ (cid:107) X (cid:107) L (cid:88) l ≥ M e − λ l ∆ ≤ (cid:107) X (cid:107) L √ ∆ (cid:90) ∞ M √ ∆ e − π ϑx dx (cid:46) (cid:107) X (cid:107) L M ∆ / . The same bound holds for the second sum since (cid:88) k ≥ M e − λ k ( h +∆) (cid:104) X , e k (cid:105) L ≤ (cid:107) X (cid:107) L e − λ M ∆ (cid:46) (cid:107) X (cid:107) L M ∆ (cid:46) (cid:107) X (cid:107) L M ∆ / . Therefore, assuming M ∆ → ∞ , we get∆ − E ( (cid:107) S ( h ) S ( t ) X − ˆ S ( h ) S ( t ) X (cid:107) L ) (cid:46) E ( (cid:107) X (cid:107) L ) 1 M ∆ / = o (cid:16) M ∆ (cid:17) . For the nonlinear part, set B k := (cid:80) (cid:96) ∈I + k \{ k } n (cid:96) ( t ) − (cid:80) (cid:96) ∈I − k n (cid:96) ( t ) with n (cid:96) ( t ) := (cid:104) N t , e (cid:96) (cid:105) L . Then, bythe Cauchy-Schwarz inequality, B k ≤ (cid:16) (cid:88) (cid:96) ∈ ( I + k ∪I − k ) \{ k } λ α(cid:96) n (cid:96) (cid:17)(cid:16) (cid:88) (cid:96) ∈ ( I + k ∪I − k ) \{ k } λ − α(cid:96) (cid:17) ≤ (cid:107) N t (cid:107) D α (cid:16) (cid:88) (cid:96) ∈ ( I + k ∪I − k ) \{ k } λ − α(cid:96) (cid:17) . Since, furthermore, (cid:80) (cid:96) ≥ M n (cid:96) ( t ) ≤ λ − αM (cid:107) N t (cid:107) D α , we have E ( (cid:107) S ( h ) N t − ˆ S ( h ) N t (cid:107) L ) ≤ E (cid:16) M − (cid:88) k =1 B k (cid:17) + E (cid:16) (cid:88) (cid:96) ≥ M n (cid:96) ( t ) (cid:17) (cid:46) E ( (cid:107) N t (cid:107) D α ) (cid:16) (cid:88) k ≥ M λ − αk + λ − αM (cid:17) M α − E ( (cid:107) N t (cid:107) D α ) . Now, by Remark 5.5, we have E ( (cid:107) N t (cid:107) D α ) < ∞ for α = 1 / E ( (cid:107) S ( h ) N t − ˆ S ( h ) N t (cid:107) L ) (cid:46) M ,which ﬁnishes the proof of (39).To treat the third term on the right hand side of (38), consider an orthonormal system { ϕ λ , λ ∈ Λ m } of S m with the property (cid:107) (cid:80) λ ∈ Λ m ϕ λ (cid:107) ∞ ≤ CD m which exists due to Assumption (N). Since onΩ N,M,m , (cid:107) t (cid:107) N,M = 1 implies (cid:107) t (cid:107) L ( A ) ≤ /c, we obtainsup t ∈ S m , (cid:107) t (cid:107) N,M =1 (cid:32) N N − (cid:88) i =0 (cid:10) ˆ S (0) t ( X t i ) , ε i (cid:11) L (cid:33) Ω N,M,m ≤ c sup t ∈ S m , (cid:107) t (cid:107) L ≤ (cid:32) N N − (cid:88) i =0 (cid:10) ˆ S (0) t ( X t i ) , ε i (cid:11) L (cid:33) = 1 c sup α ∈ R Λ m , (cid:107) α (cid:107)≤ (cid:32) (cid:88) λ ∈ Λ m α λ N N − (cid:88) i =0 (cid:10) ˆ S (0) ϕ λ ( X t i ) , ε i (cid:11) L (cid:33) (cid:46) (cid:88) λ ∈ Λ m (cid:32) N N − (cid:88) i =0 (cid:10) ˆ S (0) ϕ λ ( X t i ) , ε i (cid:11) L (cid:33) . To handle the expectation of the above bound, note that ε i = σ ∆ (cid:82) t i +1 t i S ( t i +1 − s ) dW s is independentof F t i and (cid:92) ϕ λ ( X t i ) := ˆ S (0) ϕ λ ( X t i ) is F t i -measurable, implying E (cid:16)(cid:10) (cid:92) ϕ λ ( X t i ) , ε i (cid:11) L |F t i (cid:17) = (cid:90) E (cid:0)(cid:0) (cid:92) ϕ λ ( X t i ) (cid:1) ( x ) ε i ( x ) |F t i (cid:1) dx = (cid:90) (cid:0) (cid:92) ϕ λ ( X t i ) (cid:1) ( x ) E ( ε i ( x ) |F t i ) dx = (cid:90) (cid:0) (cid:92) ϕ λ ( X t i ) (cid:1) ( x ) E ( ε i ( x )) dx = 0 . Hence, for j < i , we have E (cid:16)(cid:10) (cid:92) ϕ λ ( X t i ) , ε i (cid:11) L (cid:10) (cid:92) ϕ λ ( X t j ) , ε j (cid:11) L (cid:17) = E (cid:16) E (cid:16)(cid:10) (cid:92) ϕ λ ( X t i ) , ε i (cid:11) L (cid:12)(cid:12)(cid:12) F t i (cid:17) (cid:68) (cid:92) ϕ λ ( X t j ) , ε j (cid:69) L (cid:17) = 0and, consequently, E  sup t ∈ S m , (cid:107) t (cid:107) N,M =1 (cid:32) N N − (cid:88) i =0 (cid:10) (cid:92) t ( X t i ) , ε i (cid:11) L (cid:33) Ω N,M,m  ≤ (cid:88) λ ∈ Λ m N N − (cid:88) i =0 E (cid:16)(cid:10) (cid:92) ϕ λ ( X t i ) , ε i (cid:11) L (cid:17) . Further, Parseval’s relation yields E (cid:16)(cid:10) (cid:92) ϕ λ ( X t i ) , ε i (cid:11) L (cid:17) = E (cid:32)(cid:16) (cid:88) k ≥ (cid:10) (cid:92) ϕ λ ( X t i ) , e k (cid:11) L (cid:10) ε i , e k (cid:11) L (cid:17) (cid:33) = σ ∆ E (cid:32)(cid:16) (cid:88) k ≥ (cid:104) (cid:92) ϕ λ ( X t i ) , e k (cid:105) L (cid:90) t i +1 t i e − λ k ( t i +1 − s ) dβ k ( s ) (cid:17) (cid:33) = σ ∆ (cid:88) k ≥ E ( (cid:104) (cid:92) ϕ λ ( X t i ) , e k (cid:105) L ) E (cid:18)(cid:16) (cid:90) t i +1 t i e − λ k ( t i +1 − s ) dβ k ( s ) (cid:17) (cid:19) σ ∆ (cid:88) k ≥ − e − λ k ∆ λ k E ( (cid:104) (cid:92) ϕ λ ( X t i ) , e k (cid:105) L ) ≤ σ ∆ E (cid:16) (cid:107) (cid:92) ϕ λ ( X t i ) (cid:107) L (cid:17) = σ M ∆ E (cid:16) M − (cid:88) k =1 ϕ λ ( X t i ( y k )) (cid:17) . Above, we have used independence of the (one-dimensional) stochastic integrals from F t i and pairwiseindependence of { β k , k ≥ } in the third step as well as Lemma 4.2 in the last step. In view ofAssumption (N), we have shown E  sup t ∈ S m , (cid:107) t (cid:107) N,M =1 (cid:32) N N − (cid:88) i =0 (cid:10) ˆ S (0) t ( X t i ) , ε i (cid:11) L (cid:33) Ω N,M,m  ≤ T (cid:13)(cid:13)(cid:13) (cid:88) λ ∈ Λ m ϕ λ (cid:13)(cid:13)(cid:13) ∞ (cid:46) D m T which ﬁnishes the proof.The following proof veriﬁes our bound on the probability of the event Ξ N,M,m . Proof of Lemma 4.6.

We follow the steps of the proof of Lemma 1 in [15] which employs the stan-dard technique for deriving concentration inequalities for β -mixing sequences, see, e.g., Theorem 4in Doukhan [20, Section 1.4.2]. We require Bernstein’s inequality, which states that for independentreal-valued random variables Z , . . . , Z n with | Z i | ≤ B and E ( Z i ) ≤ ν for some constants B, ν > P (cid:0) | S n − E ( S n ) | ≥ ν √ x + Bx (cid:1) ≤ − nx where S n := 1 n n (cid:88) i =1 Z i (40)for any x >

0, see, e.g., Massart [37, Proposition 2.9]. In order to be able to make use of (40)in our context, we need to approximate the sequence X t , . . . , X t N by independent blocks. In fact,using Berbee’s coupling Lemma [6], it can be shown (see, e.g., the discussion following Lemma 5.1in [46]) that there exists a process ( X ∗ i ∆ , ≤ i ≤ N −

1) with the following properties. For every j = 0 , . . . , p N −

1, we have U j, := ( X [2 jq N +1]∆ , . . . , X [(2 j +1) q N ]∆ ) D = ( X ∗ [2 jq N +1]∆ , . . . , X ∗ [(2 j +1) q N ]∆ ) =: U ∗ j, ,U j, := ( X [(2 j +1) q N ]∆ , . . . , X [2( j +1) q N ]∆ ) D = ( X ∗ [(2 j +1) q N ]∆ , . . . , X ∗ [2( j +1) q N ]∆ ) =: U ∗ j, and for each a ∈ { , } , U ∗ ,a , . . . , U ∗ p N − ,a are independent and P ( U j,a (cid:54) = U ∗ j,a ) ≤ β X ( q N ∆) . Here, β X is the β -mixing coeﬃcient of X which is in our case given by (8). Set Ω ∗ := { X i ∆ = X ∗ i ∆ , i =0 , . . . , N − } and P ∗ := P ( · ∩ Ω ∗ ). Clearly, P (Ξ cN,M,m ) ≤ P (Ω ∗ ∩ Ξ cN,M,m ) + P ((Ω ∗ ) c )and, using the union bound, we get P ((Ω ∗ ) c ) ≤ p N β X ( q N ∆) ≤ N β X ( q N ∆) . It remains to show P ∗ (Ξ cN,M,m ) (cid:46) D m exp( − K (cid:48) p N L m ). To that aim, set v N,M ( t ) := 1 N M N − (cid:88) i =0 M − (cid:88) k =1 (cid:0) t ( X i ∆ ( y k )) − E (cid:0) t ( X i ∆ ( y k )) (cid:1)(cid:1)

35o that v N,M ( t ) = (cid:107) t (cid:107) N,M − (cid:107) t (cid:107) π,M . Recall the constants 0 < c < C < ∞ from the implication (28)of Assumption (E). We have P ∗ (cid:0) Ξ cN,M,m (cid:1) = P ∗ (cid:32) sup t ∈ S m \{ } (cid:12)(cid:12)(cid:12) (cid:107) t (cid:107) N,M − (cid:107) t (cid:107) π,M (cid:107) t (cid:107) π,M (cid:12)(cid:12)(cid:12) ≥ (cid:33) ≤ P ∗ (cid:32) sup t ∈ S m \{ } (cid:12)(cid:12)(cid:12) (cid:107) t (cid:107) N,M − (cid:107) t (cid:107) π,M c (cid:107) t (cid:107) L ( A ) (cid:12)(cid:12)(cid:12) ≥ (cid:33) = P ∗ (cid:32) sup t ∈ S m , (cid:107) t (cid:107) L A ) =1 | v N,M ( t ) | ≥ c (cid:33) . Now, each t ∈ S m with (cid:107) t (cid:107) L ( A ) = 1 has a representation t = (cid:80) λ ∈ Λ m α λ ϕ λ with (cid:80) λ ∈ Λ m α λ = 1 and v N,M ( t ) = (cid:88) λ,λ (cid:48) ∈ Λ m α λ α λ (cid:48) v N,M ( ϕ λ ϕ λ (cid:48) ) . On the set {| v N,M ( ϕ λ ϕ λ (cid:48) ) | ≤ V mλλ (cid:48) (2 Cx ) / + 2 B mλλ (cid:48) x, ∀ λ, λ (cid:48) ∈ Λ m } with x := c CL m , we have (cid:88) λ,λ (cid:48) ∈ Λ m | α λ α λ (cid:48) || v N,M ( ϕ λ ϕ λ (cid:48) ) | ≤ Cx ) / ρ ( V m ) + 2 xρ ( B m ) ≤ c √ c C ≤ c c ≤ C and, hence, sup t ∈ S m , (cid:107) t (cid:107) L A ) =1 | v N,M ( t ) | ≤ c is fulﬁlled.Consequently, P ∗ (cid:0) Ξ cN,M,m (cid:1) ≤ P ∗ (cid:16) ∃ λ, λ (cid:48) ∈ Λ m : | v N,M ( ϕ λ ϕ λ (cid:48) ) | ≥ V mλλ (cid:48) (2 Cx ) / + 2 B mλλ (cid:48) x (cid:17) ≤ (cid:88) λ,λ (cid:48) ∈ Λ m P ∗ (cid:16) | v N,M ( ϕ λ ϕ λ (cid:48) ) | ≥ V mλλ (cid:48) (2 Cx ) / + 2 B mλλ (cid:48) x (cid:17) . We decompose v N,M ( ϕ λ ϕ λ (cid:48) ) = v N,M ( ϕ λ ϕ λ (cid:48) ) + v N,M ( ϕ λ ϕ λ (cid:48) ) where v aN,M ( t ) := 1 p N p N − (cid:88) j =0 (cid:16) Z j,a ( t ) − E ( Z j,a ( t )) (cid:17) , Z j,a ( t ) := 1 q N q N (cid:88) i =1 M M − (cid:88) k =1 t ( U ij,a ( y k ))and U ij,a denotes the i -th entry of U j,a . Under P ∗ , the family ( Z ,a ( t ) , . . . , Z p N − ,a ( t )) is independentfor a ∈ { , } by construction and satisﬁes | Z j,a ( ϕ λ ϕ λ (cid:48) ) | ≤ B mλ,λ (cid:48) , E (cid:16) Z j,a ( ϕ λ ϕ λ (cid:48) ) (cid:17) ≤ q N M q N (cid:88) i =1 M − (cid:88) k =1 E (cid:0) ϕ λ ( U ij,a ( y k )) ϕ λ (cid:48) ( U ij,a ( y k )) (cid:1) = (cid:107) ϕ λ ϕ λ (cid:48) (cid:107) π,M ≤ C (cid:107) ϕ λ ϕ λ (cid:48) (cid:107) L ( A ) = C ( V mλ,λ (cid:48) ) where we have used Jensen’s inequality in the second line. Thus, by the Bernstein inequality (40), weget P ∗ (cid:16) | v N,M ( ϕ λ ϕ λ (cid:48) ) | ≥ V mλλ (cid:48) (2 Cx ) / + B mλλ (cid:48) x (cid:17) ≤ (cid:88) a =1 P ∗ (cid:16) | v aN,M ( ϕ λ ϕ λ (cid:48) ) | ≥ V mλλ (cid:48) (2 Cx ) / + B mλλ (cid:48) x (cid:17) ≤ − p N x . Summing up, we have shown P ∗ (cid:0) Ξ cN,M,m (cid:1) ≤ D m exp (cid:18) − p N c CL m (cid:19) which ﬁnishes the proof. 36ased on the previous results, we are now ready to verify the conclusion of the main theorem. Proof of Theorem 4.3.

Consider Ω

N,M,m = Ω

N,M,m, c as deﬁned in Proposition 4.5 with c > N,M,m , we have (cid:107) t (cid:107) N,M ≥ (cid:107) t (cid:107) π,M ≥ c (cid:107) t (cid:107) L ( A ) for all t ∈ S m , implying Ξ N,M,m ⊂ Ω N,M,m . Thus, E (cid:0) (cid:107) ˆ f m − f A (cid:107) N,M (cid:1) = E (cid:0) (cid:107) ˆ f m − f A (cid:107) N,M Ω N,M,m (cid:1) + E (cid:0) (cid:107) ˆ f m − f A (cid:107) N,M Ω cN,M,m (cid:1) (cid:46) (cid:107) f A − f m (cid:107) π,M + D m T + ∆ γ + 1 M ∆ + E (cid:0) (cid:107) ˆ f m − f A (cid:107) N,M Ω cN,M,m (cid:1) (cid:46) (cid:107) f A − f m (cid:107) L ( A ) + D m T + ∆ γ + 1 M ∆ + E (cid:0) (cid:107) ˆ f m − f A (cid:107) N,M Ξ cN,M,m (cid:1) by Proposition 4.5 and Assumption (E). In the following, we conclude the theorem by showing that E ( (cid:107) ˆ f m − f A (cid:107) N,M Ξ cN,M,m ) = o (∆ γ ) . We consider the Hilbert space H N := ( L (0 , N equipped with the inner product (cid:104) u, v (cid:105) H N := N (cid:80) Ni =1 (cid:104) u i , v i (cid:105) L for u, v ∈ H N . Note that (cid:107) t (cid:107) N,M = (cid:107) ¯ t (cid:107) H N with ¯ t := ( ˆ S (0) t ( X ) , . . . , ˆ S (0) t ( X t N − )).Clearly, the vector ( ˆ S (0) ˆ f m ( X ) , . . . , ˆ S (0) ˆ f m ( X ( N − )) is the orthogonal projection in H N of ¯ Y :=( Y , . . . , Y N − ) onto the subspace { ( ˆ S (0) t ( X ) , . . . , ˆ S (0) t ( X t N − )) , t ∈ S m } . Denoting the correspond-ing projection operator by Π m , we have (cid:107) ˆ f m − f A (cid:107) N,M ≤ (cid:107) ˆ f m − f (cid:107) N,M = (cid:107) Π m ¯ Y − ¯ f (cid:107) H N = (cid:107) ( I − Π m ) ¯ f (cid:107) H N + (cid:107) Π m ( ¯ Y − ¯ f ) (cid:107) H N ≤ (cid:107) ¯ f (cid:107) H N + (cid:107) ¯ Y − ¯ f (cid:107) H N since the operator norm of the projections is given by one. Now, E ( (cid:107) ¯ f (cid:107) H N Ξ cN,M,m ) = E ( (cid:107) f (cid:107) N,M Ξ cN,M,m ) ≤ E ( (cid:107) f ( X ) (cid:107) ∞ ) / P (Ξ cN,M,m ) / (cid:46) P (Ξ cN,M,m ) / and, due to Y i = ˆ S (0) f ( X t i ) + R i + ε i , we have E ( (cid:107) ¯ Y − ¯ f (cid:107) H N Ξ cN,M,m ) = 1 N N − (cid:88) i =0 E ( (cid:107) R i + ε i (cid:107) L Ξ cN,M,m ) (cid:46) ( E ( (cid:107) R i (cid:107) L ) / + E ( (cid:107) ε i (cid:107) L ) / ) P (Ξ cN,M,m ) / . It can be shown just like in the proof of Proposition 4.5 that E ( (cid:107) R i (cid:107) L ) = O (1) and an explicitcalculation yields E ( (cid:107) ε i (cid:107) L ) = σ ∆ E (cid:16) (cid:88) (cid:96) ≥ (cid:16) (cid:90) t i +1 t i e − λ (cid:96) ( t i +1 − s ) dβ (cid:96) ( s ) (cid:17) (cid:17)  = σ ∆ (cid:88) (cid:96),(cid:96) (cid:48) ≥ E (cid:18)(cid:16) (cid:90) t i +1 t i e − λ (cid:96) ( t i +1 − s ) dβ (cid:96) ( s ) (cid:17) (cid:16) (cid:90) t i +1 t i e − λ (cid:96) (cid:48) ( t i +1 − s ) dβ (cid:96) (cid:48) ( s ) (cid:17) (cid:19) (cid:46) (cid:16) (cid:88) (cid:96) ≥ − e − λ (cid:96) ∆ λ (cid:96) (cid:17) = O (∆ − ) . Gathering bounds, we have shown E (cid:0) (cid:107) ˆ f m − f A (cid:107) N,M Ξ cN,M,m (cid:1) (cid:46) ∆ − / P (Ξ cN,M,m ) / . D m ≤ N , we get P (Ξ cN,M,m ) (cid:46) N exp( − γq N ∆) + N exp (cid:16) − K (cid:48) N q N L m (cid:17) . Under the condition N ∆log N → ∞ , it is possible to choose ( q N ) such that q N / ( ν log N ∆ ) → ν >

0. Then, we have N exp( − γq N ∆) ≤ N exp( − γν N )) = N − ( γν − . Further, since L m = o ( N ∆log N ), for any β > L m ≤ β N ∆log N for suﬃciently large N , implying K (cid:48) N q N L m ≥ K (cid:48) N ∆4 ν log N L m ≥ K (cid:48) νβ log( N )as well as N exp (cid:16) − K (cid:48) N q N L m (cid:17) ≤ N − K (cid:48) νβ . Hence, for arbitrary α >

0, we can choose ν suﬃciently large and β suﬃciently small such that P (Ξ cN,M,m ) ≤ N − (2 α +3) and, thus, E (cid:0) (cid:107) ˆ f m − f A (cid:107) N,M Ξ cN,M,m (cid:1) (cid:46) ∆ − / P (Ξ cN,M,m ) / (cid:46) T / N α (cid:46) N α = ∆ α T α = o (∆ α ) . From here, the claim follows by choosing α = γ .Next, be prove our risk bound for our adaptive estimator. Proof of Theorem 4.7.

Due to the above argument in the proof of Theorem 4.3, it suﬃces to boundthe estimation error on the event Ξ

N,M, ¯ m . Since γ N,M ( ˆ f ˆ m ) + pen( ˆ m ) ≤ γ N,M ( f m ) + pen( m ) for any m ∈ M N,M and f m ∈ S m , we can modify the fundamental inequalities (30) and (38), respectively, to (cid:107) ˆ f ˆ m − f A (cid:107) N,M + 2pen( ˆ m ) ≤ (cid:107) f m − f A (cid:107) N,M + 2pen( m ) + 16 N N − (cid:88) i =0 (cid:107) R i (cid:107) L + 16 (cid:16) sup t ∈ S ˆ m,m : (cid:107) t (cid:107) N,M =1 N N − (cid:88) i =0 (cid:104) (cid:92) t ( X t i ) , ε i (cid:105) L (cid:17) with S m (cid:48) ,m := span( S m (cid:48) ∪ { f m } ). In view of (39) and Assumption (E), we obtain E (cid:0) (cid:107) ˆ f ˆ m − f A (cid:107) N,M Ξ N,M, ¯ m (cid:1) ≤ (cid:107) f m − f A (cid:107) π,M + 2pen( m ) + O (cid:16) M ∆ + ∆ γ (cid:17) + 16 E (cid:16)(cid:16) sup t ∈ S ˆ m,m : (cid:107) t (cid:107) N,M =1 (cid:16) N N − (cid:88) i =1 (cid:104) (cid:92) t ( X t i ) , ε i (cid:105) L (cid:17) −

18 pen( ˆ m ) (cid:17) Ξ N,M, ¯ m (cid:17) (cid:46) (cid:107) f m − f (cid:107) L ( A ) + pen( m ) + O (cid:16) M ∆ + ∆ γ (cid:17) + (cid:88) m (cid:48) ∈M N,M E (cid:16)(cid:16) Γ( m (cid:48) , m ) −

18 pen( m (cid:48) ) (cid:17) + Ξ N,M, ¯ m (cid:17) (41)with Γ( m (cid:48) , m ) := sup t ∈ S m (cid:48) ,m : (cid:107) t (cid:107) N,M =1 N N − (cid:88) i =0 (cid:104) (cid:92) t ( X t i ) , ε i (cid:105) L . (42)38sing a martingale concentration of N (cid:80) N − i =1 (cid:104) (cid:92) t ( X t i ) , ε i (cid:105) L and a classical chaining argument, Lemma 5.7shows E (cid:16)(cid:16) Γ( m (cid:48) , m ) −

18 pen( m (cid:48) ) (cid:17) + Ξ N,M, ¯ m (cid:17) ≤ σ T e − D m (cid:48) . Therefore, E (cid:0) (cid:107) ˆ f ˆ m − f (cid:107) N,M Ξ N,M, ¯ m (cid:1) (cid:46) (cid:107) f m − f (cid:107) L ( A ) + 2pen( m ) + 32 σ T (cid:88) m (cid:48) ∈M N,M e − D m (cid:48) + O (cid:16) M ∆ + ∆ γ (cid:17) . For our choices of approximation spaces, we have (cid:80) m (cid:48) ∈M N,M e − D m (cid:48) = O (1) such that this term isnegligible.Next, we prove our oracle inequality for the L -risk. Proof of Corollary 4.8.

First, we prove the bound in probability: For any a > (cid:107) f (cid:107) L ( A ) , the triangleinequality yields P (cid:0) (cid:107) ˆ f m − f (cid:107) L ( A ) ≥ a (cid:1) ≤ P (cid:0) (cid:107) ˆ f m − f m (cid:107) L ( A ) ≥ a − (cid:107) f m − f (cid:107) L ( A ) (cid:1) ≤ P (cid:0) Ξ N,M,m (cid:107) ˆ f m − f m (cid:107) L ( A ) ≥ a − (cid:107) f m − f (cid:107) L ( A ) (cid:1) + P (cid:0) Ξ cN,M,m (cid:1) . Now, as in the proof of Theorem 4.3, we have (cid:107) t (cid:107) N,M ≥ (cid:107) t (cid:107) π,M ≥ c (cid:107) t (cid:107) L ( A ) on Ξ N,M,m for all t ∈ S m where c > f m − f m ∈ S m as well as Markov’s inequality andthe triangle inequality, we get P (cid:0) Ξ N,M,m (cid:107) ˆ f m − f m (cid:107) L ( A ) ≥ a − (cid:107) f m − f (cid:107) L ( A ) (cid:1) ≤ P (cid:0) Ξ N,M,m (cid:107) ˆ f m − f m (cid:107) N,M ≥ c a − (cid:107) f m − f (cid:107) L ( A ) ) (cid:1) ≤ E (cid:0) Ξ N,M,m (cid:107) ˆ f m − f m (cid:107) N,M (cid:1) c ( a − (cid:107) f m − f (cid:107) L ( A ) ) ≤ E (cid:0) Ξ N,M,m (cid:107) ˆ f m − f A (cid:107) N,M (cid:1) + 4 (cid:107) f m − f A (cid:107) π,M c ( a − (cid:107) f m − f (cid:107) L ( A ) ) . Now, we set a := K (cid:0) (cid:107) f − f m (cid:107) L ( A ) + D m /T + ∆ γ + ( M ∆ ) − (cid:17) for some K >

1. By Proposition 4.5and Assumption (E), the above bound can be estimated up to a constant by (cid:107) f m − f (cid:107) L ( A ) + D m /T + ∆ γ + ( M ∆ ) − a − (cid:107) f m − f (cid:107) L ( A ) ≤ K − . Using P (Ξ cN,M,m ) (cid:46) N − α for any power α > ε > K >

N,M →∞ P (cid:16) (cid:107) ˆ f m − f A (cid:107) L ( A ) ≥ K (cid:0) (cid:107) f A − f m (cid:107) L ( A ) + D m T + ∆ γ + 1 M ∆ (cid:1)(cid:17) ≤ ε which veriﬁes the claimed bound in probability.Next, we consider the truncated estimator ˆ f K N m : We have (cid:107) ˆ f K N m − f (cid:107) L ( A ) ≤ (cid:107) ˆ f K N m − f (cid:107) L ( A ) Ξ N,M,m + 2( (cid:107) f (cid:107) L ∞ ( A ) + K N ) Ξ cN,M,m and, thus, as soon as K N ≥ (cid:107) f (cid:107) L ∞ ( A ) , we can further bound (cid:107) ˆ f K N m − f (cid:107) L ( A ) ≤ (cid:107) ˆ f m − f (cid:107) L ( A ) Ξ N,M,m + 4 K N Ξ cN,M,m . E ( (cid:107) ˆ f m − f (cid:107) L ( A ) Ξ N,M,m ) (cid:46) (cid:107) f A − f m (cid:107) L ( A ) + D m /T + ∆ γ + ( M ∆ ) − . The expectation ofthe second term, 4 K N P (Ξ cN,M,m ), decreases faster than any negative power of N , thanks to Lemma4.6 and the growth assumption on K N . Thus, 4 K N P (Ξ cN,M,m ) (cid:46) N − γ (cid:46) ∆ γ , which ﬁnishes theproof.The following proof veriﬁes that the convergence rate is not aﬀected when the parameter ϑ appearingin the deﬁnition of ˆ f m is replaced by an appropriate estimator. Proof of Theorem 4.9.

We verify the bound for (cid:107) ˇ f m − f A (cid:107) N,M . The bound for (cid:107) ˇ f m − f A (cid:107) L ( A ) thenfollows as in the proof of Corollary 4.8. We deﬁneΨ hN,M := (cid:110) ( ˆ ϑ − ϑ ) ≤ h ∆ / T (cid:111) . Step 1:

We show that E ( Ψ hN,M ∆ − (cid:107) ˆ S (∆) X t i − ˇ S (∆) X t i (cid:107) L ) (cid:46) hT :For ﬁxed ϑ ∈ (0 , ϑ ), we have ˆ ϑ ≥ ϑ on the event Ψ hN,M as soon as T is suﬃciently large. Thus, we canestimate (cid:107) ˆ S (∆) X t i − ˇ S (∆) X t i (cid:107) L = M − (cid:88) k =1 (e − λ k ∆ − e − ˆ λ k ∆ ) (cid:104) X t i , e k (cid:105) M (cid:46) ( ϑ − ˆ ϑ ) ∆ M − (cid:88) k =1 λ k e − ϑπ k ∆ (cid:104) X t i , e k (cid:105) M ≤ h ∆ / T M − (cid:88) k =1 λ k e − ϑπ k ∆ (cid:104) X t i , e k (cid:105) M on Ψ hN,M . Therefore, E ( Ψ hN,M (cid:107) ˆ S (∆) X t i − ˇ S (∆) X t i (cid:107) L ) (cid:46) h ∆ / T M − (cid:88) k =1 λ k e − ϑπ k ∆ E ( (cid:104) X t i , e k (cid:105) M )and the claim follows from a Riemann sum argument if we show that E ( (cid:104) X t i , e k (cid:105) M ) (cid:46) λ − k . To thataim, we apply the decomposition X t = S ( t ) ξ + X t + N t . As in previous results, S ( t ) ξ is negligiblesince we can choose t arbitrarily large due to stationarity. For the linear part, we have E ( (cid:104) X t i , e k (cid:105) M ) (cid:46) (cid:88) (cid:96) ∈I + k ∪I − k λ (cid:96) (cid:46) (cid:88) (cid:96) ≥ k + 2 (cid:96)M ) ≤ k (cid:88) (cid:96) ≥ (cid:96) ) (cid:46) λ k . Finally, for the nonlinear part, deﬁne n (cid:96) ( t ) := (cid:104) N t , e (cid:96) (cid:105) L . Then, using the Cauchy-Schwarz inequalityand the spaces D α from (45), we have (cid:104) N t , e k (cid:105) M ≤ (cid:16) (cid:88) (cid:96) ∈ ( I + k ∪I − k ) λ α(cid:96) n (cid:96) (cid:17)(cid:16) (cid:88) (cid:96) ∈ ( I + k ∪I − k ) λ − α(cid:96) (cid:17) ≤ (cid:107) N t (cid:107) D α (cid:16) (cid:88) (cid:96) ∈ ( I + k ∪I − k ) λ − α(cid:96) (cid:17) ≤ λ k (cid:107) N t (cid:107) D α (cid:16) (cid:88) (cid:96) ≥ λ − (2 α − (cid:96) (cid:17) (cid:46) λ k (cid:107) N t (cid:107) D α , α > /

4. Now, by picking α ∈ ( , E ( (cid:104) N t , e k (cid:105) M ) (cid:46) λ − k in view of Remark5.5. Step 2 : By Markov’s inequality, we can estimate P (cid:16) (cid:107) ˇ f m − f A (cid:107) N,M ≥ a (cid:17) ≤ P (cid:16) {(cid:107) ˇ f m − f A (cid:107) N,M ≥ a } ∩ Ψ hN,M ∩ Ξ N,M,m (cid:17) + P ((Ψ hN,M ) c ) + P (Ξ cN,M,m ) ≤ a − E (cid:16) Ψ hN,M ∩ Ξ N,M,m (cid:107) ˇ f m − f A (cid:107) N,M (cid:17) + P ((Ψ hN,M ) c ) + P (Ξ cN,M,m )for any a >

0. Now, using Step 1, we can show E (cid:16) Ψ hN,M ∩ Ξ N,M,m (cid:107) ˇ f m − f A (cid:107) N,M (cid:17) (cid:46) (cid:107) f − f m (cid:107) L ( A ) + D m T + ∆ γ + 1 M ∆ + hT just like in the proof of Proposition 4.5. Further, P (Ξ cN,M,m ) converges to 0 under the assumptions ofthis theorem and, thanks to Theorem 3.4, P ((Ψ hN,M ) c ) can be made arbitrarily small by choosing h suﬃciently large. Since h/T (cid:46) D m /T holds for any ﬁxed h , we have shown that, for arbitrary ε > K >

M,N →∞ P (cid:16) (cid:107) ˇ f m − f A (cid:107) N,M ≥ K (cid:0) (cid:107) f A − f m (cid:107) L ( A ) + D m T + ∆ γ + 1 M ∆ (cid:1)(cid:17) < ε. Before turning to the auxiliary results for nonparametric estimation of the nonlinearity f , we provethat condition (9) implies Assumptions (B) and (M). Proof of Proposition 2.1.

First, we sketch the existence proof and show that Assumption (B) is sat-isﬁed for ξ = 0. To that aim, we follow the line of arguments from [18, Theorem 7.7], see also [23,Propositon 6.1]. As before, write m ≡ f (0) as well as f ( x ) = f ( x ) − m and decompose X t = w ( t )+ v ( t )with w ( t ) := X t + (cid:82) t S ( r ) m dr and v ( t ) := S ( t ) ξ + (cid:82) t S ( t − s ) f ( X s ) dt . It follows from Lemma 5.1and (cid:107) S ( r ) m (cid:107) ∞ (cid:46) e − λ r (cid:107) m (cid:107) ∞ (cf. Lemma 5.2) that w ∈ C ( R + , E ) holds almost surely andsup t ≥ E ( (cid:107) w t (cid:107) p ∞ ) < ∞ . (43)Further, since F ( u ) := f ◦ u is a locally Lipschitz continuous function from E into itself, thereexists a solution to equation (2) up to a terminal time t max = t max ( ω ) >

0. Thus, global existencefollows from an a-priori estimate on (cid:107) v ( · ) (cid:107) ∞ . We consider the approximation v n := nR ( n, A ϑ ) S ( t ) ξ + (cid:82) t nR ( n, A ϑ ) S ( t − s ) f ( v ( s ) + w ( s )) ds where R ( n, A ϑ ) := ( nI − A ϑ ) − is the resolvent operator of A ϑ . Then, v n is diﬀerentiable in time, even when v is not. Now, for any x ∈ E and x ∗ ∈ ∂ (cid:107) x (cid:107) , itfollows like in [18, Example 7.8] that (cid:104) A ϑ x, x ∗ (cid:105) ≤ ∂ (cid:107) x (cid:107) is the subdiﬀerential of the norm.Recall that, for a function u ∈ E , the functional h u : E (cid:51) v (cid:55)→ σ u v ( ξ u ) with ξ u ∈ arg max r | u ( r ) | and σ u := sgn( u ( ξ u )) is an element of ∂ (cid:107) u (cid:107) . Thus, setting δ n ( t ) := v (cid:48) n ( t ) − A ϑ v n − f ( v n ( t ) + w ( t )), we canestimate d − dt (cid:107) v n ( t ) (cid:107) ∞ ≤ (cid:10) ddt v n ( t ) , h v n ( t ) (cid:11) = (cid:104) A ϑ v n ( t ) , h v n ( t ) (cid:105) + (cid:104) f ( v n ( t ) + w ( t )) , h v n ( t ) (cid:105) + (cid:104) δ n ( t ) , h v ( t ) (cid:105)≤ (cid:104) f ( v n ( t ) + w ( t )) , h v n ( t ) (cid:105) + (cid:107) δ n ( t ) (cid:107) ∞ ≤ − a (cid:107) v n ( t ) (cid:107) ∞ + b (cid:107) w ( t ) (cid:107) β ∞ + c + (cid:107) δ n ( t ) (cid:107) ∞ . Using Gronwall’s inequality and the fact that v n ( t ) → v ( t ) and δ n ( t ) → (cid:107) v ( t ) (cid:107) ∞ ≤ e − at (cid:107) ξ (cid:107) ∞ + (cid:90) t e − a ( t − s ) ( b (cid:107) w ( s ) (cid:107) β ∞ + c ) ds.

41y Jensen’s inequality, we pass to (cid:107) v ( t ) (cid:107) p ∞ (cid:46) e − apt (cid:107) ξ (cid:107) p ∞ + (cid:90) t e − a ( t − s ) ( b (cid:107) w ( t ) (cid:107) β ∞ + c ) p ds · (cid:18)(cid:90) t e − as ds (cid:19) p − and Fubini’s theorem as well as (43) show that there exists K > ξ = x ∈ E , we have E ( (cid:107) X t (cid:107) p ∞ ) (cid:46) e − apt (cid:107) x (cid:107) p ∞ + K. (44)In particular, Assumption (B) with ξ = 0 is satisﬁed.Further, based on their derivation of lower bounds for the transition densities associated with theMarkov semigroup ( P t ), Goldys and Maslowski [23, Theorem 6.3] show the existence of an invariantmeasure π on E and of constants C, γ > (cid:107) P ∗ t ν − π (cid:107) TV ≤ C (cid:16) (cid:90) E (cid:107) u (cid:107) ∞ ν ( du ) + 1 (cid:17) e − γt with P ∗ t ν := (cid:90) E P t ( u, · ) ν ( du )holds for all probability measures ν on E . Thus, we have (cid:107) P t ( x, · ) − π (cid:107) TV ≤ C ( (cid:107) x (cid:107) ∞ + 1)e − γt and P t ( x, · ) converges weakly to π ( · ) as t → ∞ for all x ∈ E . By Skorokhod’s representation theorem,there exists a probability space on which there are E -valued random variables Z, Z , Z , . . . with Z i ∼ P i ( x, · ) , Z ∼ π and Z i → Z almost surely. Denoting the expectation on the second probabilityspace by ˜ E , Fatou’s Lemma yields (cid:90) E (cid:107) u (cid:107) p ∞ π ( du ) = ˜ E ( (cid:107) Z (cid:107) p ∞ ) ≤ lim inf i →∞ ˜ E ( (cid:107) Z i (cid:107) p ∞ ) = lim inf i →∞ (cid:90) E (cid:107) u (cid:107) p ∞ P i ( x, du ) < ∞ by (44). Thus, if X = ξ ∼ π , then E ( (cid:107) X t (cid:107) p ∞ ) = E ( (cid:107) X (cid:107) p ∞ ) < ∞ and (cid:90) E (cid:107) P t ( u, · ) − π (cid:107) TV π ( du ) ≤ C (cid:16) (cid:90) E (cid:107) u (cid:107) ∞ π ( du ) + 1 (cid:17) e − γt (cid:46) e − γt , as required for (M) as well as (B) in case of a stationary initial condition. f The following proof veriﬁes the connection between the L -norm on [0 ,

1] and its empirical counterpart.

Proof of Lemma 4.2.

In view of Dini’s test, the H¨older condition implies convergence of the Fourierseries of H at the points y k , i.e., ¯ H n ( y k ) := (cid:80) nl =1 h l e l ( y k ) → H ( y k ) as n → ∞ for any 1 ≤ k ≤ M − |(cid:104) H, e k (cid:105) M − (cid:104) ¯ H n , e k (cid:105) M | ≤ M M − (cid:88) l =1 | H ( y l ) − ¯ H n ( y l ) || e k ( y l ) | tends to 0 as n → ∞ . Hence, the sequence (cid:104) ¯ H n , e k (cid:105) M = (cid:80) l ∈I + k ∩ [1 ,n ] h l − (cid:80) l ∈I − k ∩ [1 ,n ] h l converges tothe limit (cid:104) H, e k (cid:105) M , proving the ﬁrst part of the Lemma. In the same way, using e l ( y k ) = ± e j ( y k ) for l ∈ I ± j , one can show that H ( y k ) = (cid:80) M − l =1 H l e l ( y k ). Consequently,1 M M − (cid:88) k =1 H ( y k ) = 1 M M − (cid:88) k =1 (cid:16) M − (cid:88) l =1 H l e l ( y k ) (cid:17) = M − (cid:88) l,l (cid:48) =1 H l H l (cid:48) (cid:104) e l , e l (cid:48) (cid:105) M = M − (cid:88) l =1 H l = (cid:107) H M (cid:107) L . The following lemma analyzes the regularity of X t in the spaces D ε := D (( − A ϑ ) ε ) := (cid:8) u ∈ L ((0 , (cid:88) k ≥ λ εk (cid:104) u, e k (cid:105) < ∞ (cid:9) (45)endowed with the norm (cid:107) u (cid:107) D ε := (cid:107) ( − A ϑ ) ε u (cid:107) L . For ε < /

4, these spaces can be identiﬁed with L -Sobolov spaces on (0 , D ε = W ε, and the norms are equivalent. For a proof of thischaracterization, we refer to, e.g., [9]. 42 emma 5.4. Under Assumption ( M ) , we have E ( (cid:107) X t (cid:107) pD ε ) = E ( (cid:107) X (cid:107) pD ε ) < ∞ and E ( (cid:107) f ( X t ) (cid:107) pD ε ) = E ( (cid:107) f ( X ) (cid:107) pD ε ) < ∞ for all ε < / and p ≥ .Proof. We use the usual decomposition X t = S ( t ) X + X t + N t . By stationarity, we may choose t = 1.As before, E ( (cid:107) X (cid:107) pD ε ) < ∞ can be shown by a direct calculation. Further, E ( (cid:107) S (1) X (cid:107) pD ε ) < ∞ follows from (cid:107) S (1) X (cid:107) D ε = (cid:88) k ≥ e − λ k λ εk (cid:104) X , e k (cid:105) ≤ (cid:107) X (cid:107) L (cid:88) k ≥ e − λ k λ εk (cid:46) (cid:107) X (cid:107) ∞ . To treat N = (cid:82) S (1 − s ) f ( X s ) ds , note that (cid:107) ( − A ϑ ) ε S ( h ) u (cid:107) L = (cid:88) k ≥ λ εk e − λ k h (cid:104) u, e k (cid:105) ≤ sup λ ≥ λ λ ε e − λh (cid:107) u (cid:107) L . The function λ (cid:55)→ λ ε e − λh attains its maximum over R + in λ ∗ := ε/h and is monotonically decreasingon [ λ ∗ , ∞ ). Thus, we have sup λ ≥ λ λ ε e − λh ≤ g ( h ) with g ( h ) := ( ε e h ) ε for h ≤ ε/λ and g ( h ) := λ ε e − λ h for h > ε/λ . Since g ∈ L ( R + ), we can use Jensen’s inequality to show (cid:107) N (cid:107) pD ε ≤ (cid:16) (cid:90) g (1 − s ) (cid:107) f ( X s ) (cid:107) L ds (cid:17) p ≤ (cid:16) (cid:90) g ( s ) ds (cid:17) p − (cid:16) (cid:90) t g (1 − s ) (cid:107) f ( X s ) (cid:107) pL ds (cid:17) (cid:46) (cid:90) g (1 − s ) (cid:107) f ( X s ) (cid:107) pL ds. Therefore, E ( (cid:107) N (cid:107) pD ε ) (cid:46) E ( (cid:107) f ( X ) (cid:107) pL ) (cid:46) E ( (cid:107) f ( X ) (cid:107) p ∞ ) < ∞ by Assumption (M) which shows theclaim for X t . In order to transfer the result to f ( X t ), we estimate (cid:107) f ( X t ) (cid:107) D ε (cid:46) (cid:107) f ( X t ) (cid:107) W ε, = (cid:107) f ( X t ) (cid:107) L + (cid:90) (cid:90) ( f ( X t ( x )) − f ( X t (( y ))) | x − y | ε dxdy ≤ (cid:107) f ( X t ) (cid:107) L + (cid:107) f (cid:48) ( X t ) (cid:107) ∞ (cid:107) X t (cid:107) D ε (cid:46) (cid:107) f ( X t ) (cid:107) L + (cid:107) f (cid:48) ( X t ) (cid:107) ∞ + (cid:107) X t (cid:107) D ε , from where the claim follows by Assumption (M) and (7) in view of the ﬁrst part of this proof. Remark . The treatment of the nonlinear component in the above proof shows that E ( (cid:107) N t (cid:107) pD ε ) < ∞ holds for all ε < (cid:107) ˆ S (0) f ( X t ) − f ( X t ) (cid:107) L appearing in theremainder term R i from the regression model (25). Of particular interest to us is the situation where α is close to 1 / α α +1 can be chosen close to 1 / Lemma 5.6.

Let H ∈ C α ([0 , ∩ D α for some α ∈ (0 , ) . Further, let H M := (cid:80) M − k =1 H k e k where H k := (cid:104) H, e k (cid:105) M = M (cid:80) M − l =1 H ( y l ) e k ( y l ) . Then, there exists a constant C > such that (cid:107) H − H M (cid:107) L ≤ CK δ α α +1 where K := max( (cid:107) H (cid:107) ∞ , (cid:107) H (cid:107) C α , (cid:107) H (cid:107) D α ) .Proof. First of all, by regarding H k as a Riemann sum, we can bound | H k − h k | = (cid:12)(cid:12)(cid:12) M M − (cid:88) l =1 H ( y l ) e k ( y l ) − (cid:90) H ( y ) e k ( y ) dy (cid:12)(cid:12)(cid:12) ≤ M (cid:88) l =0 (cid:90) y l +1 y l | H ( y l ) e k ( y l ) − H ( y ) e l ( y ) | dy (cid:46) ( (cid:107) e k (cid:107) ∞ (cid:107) H (cid:107) C α + (cid:107) H (cid:107) ∞ (cid:107) e k (cid:107) C α ) δ α (cid:46) ( (cid:107) H (cid:107) C α + (cid:107) H (cid:107) ∞ k α ) δ α (cid:46) Kλ αk δ α . (46)43imilarly, since M (cid:80) M − k =1 H ( y k ) = (cid:107) H M (cid:107) L = (cid:80) M − k =1 H k holds by Lemma 4.2, we have (cid:12)(cid:12)(cid:12) (cid:107) H M (cid:107) L − (cid:107) H (cid:107) L (cid:12)(cid:12)(cid:12) = (cid:12)(cid:12)(cid:12) M M − (cid:88) k =1 H ( y k ) − (cid:107) H (cid:107) L (cid:12)(cid:12)(cid:12) ≤ M − (cid:88) k =0 (cid:90) y k +1 y k | H ( y k ) − H ( y ) | dy ≤ (cid:107) H (cid:107) C α δ α ≤ (cid:107) H (cid:107) ∞ (cid:107) H (cid:107) C α δ α (cid:46) K δ α . (47)Also, note that for h k := (cid:104) H, e k (cid:105) L and any R ∈ N , we have (cid:88) l ≥ R h l ≤ λ − αR (cid:88) l ≥ R λ αl h l ≤ (cid:107) H (cid:107) D α λ − αR (cid:46) K /R α . (48)The three inequalities just derived are now used to bound (cid:107) H − H M (cid:107) L = (cid:107) H M (cid:107) L − (cid:107) H (cid:107) L + 2 (cid:104) H − H M , H (cid:105) L ≤ |(cid:107) H M (cid:107) L − (cid:107) H (cid:107) L | + 2 |(cid:104) H − H M , H (cid:105) L | :Due to (47), the ﬁrst term can be bounded by K δ α (cid:46) K δ α α +1 up to a constant. For the secondterm, using Parseval’s identity, we get |(cid:104) H − H M , H (cid:105) L | = (cid:12)(cid:12)(cid:12) M − (cid:88) l =1 ( h l − H l ) h l + ∞ (cid:88) l = M h l (cid:12)(cid:12)(cid:12) ≤ (cid:12)(cid:12)(cid:12) M − (cid:88) l =1 ( h l − H l ) h l (cid:12)(cid:12)(cid:12) + ∞ (cid:88) l = M h l =: T + T . It follows directly from (48) that T (cid:46) K /M (cid:46) K δ α α +1 . To estimate T , we decompose T ≤ (cid:12)(cid:12)(cid:12) M − (cid:88) l =1 ( h l − H l ) h l (cid:12)(cid:12)(cid:12) + (cid:12)(cid:12)(cid:12) M − (cid:88) l = M ( h l − H l ) h l (cid:12)(cid:12)(cid:12) =: T + T for some intermediate value M ∈ { , . . . M − } . Now, using the Cauchy-Schwarz inequality and (46),we get T ≤ (cid:16) M − (cid:88) l =1 λ − αl ( h l − H l ) (cid:17)(cid:16) M − (cid:88) l =1 λ αl h l (cid:17) (cid:46) K M δ α (cid:107) H (cid:107) D α (cid:46) K M δ α and, by (48), T ≤ M − (cid:88) l = M ( h l − H l ) M − (cid:88) l = M h l (cid:46) ( (cid:107) H (cid:107) L + (cid:107) H M (cid:107) L ) ∞ (cid:88) l = M h l (cid:46) K /M α . Balancing the bounds for T and T shows that it is optimal to take M (cid:104) δ − α α +1 and, with thischoice, we obtain the overall bound T (cid:46) K δ α α +1 which ﬁnishes the proof.Finally, we bound the expectation of Γ( m (cid:48) , m ) from (42). Lemma 5.7.

Let C ≥ be a constant satisfying property (28) . If pen( m (cid:48) ) ≥

400 log (cid:0) √ C (cid:1) σ D m (cid:48) T ,then E (cid:16)(cid:16) Γ( m (cid:48) , m ) −

18 pen( m (cid:48) ) (cid:17) + Ξ N,M, ¯ m (cid:17) ≤ σ T e − D m (cid:48) . Proof. Step 1.

For any τ, v > t ∈ (cid:83) m ∈ N S m , we prove that P (cid:16) N N − (cid:88) i =0 (cid:104) (cid:92) t ( X t i ) , ε i (cid:105) L ≥ τ, (cid:107) t (cid:107) N,M ≤ v (cid:17) ≤ e − ( T τ ) / (2 σ v ) . (49)44ince (cid:92) t ( X t i ) = ˆ S (0) t ( X t i ) = M − (cid:88) (cid:96) =1 (cid:104) X t i , e (cid:96) (cid:105) M e (cid:96) and ε i = σ ∆ (cid:90) t i +1 t i S ( t i +1 − s ) dW s , Parseval’s identity yields1 N N − (cid:88) i =0 (cid:104) (cid:91) t ( X i ) , ε i (cid:105) L = 1 N N − (cid:88) i =0 M − (cid:88) (cid:96) =1 (cid:104) t ( X t i ) , e (cid:96) (cid:105) M σ ∆ (cid:90) t i +1 t i e − λ (cid:96) ( t i +1 − s ) dβ (cid:96) ( s ) = σT Y M ( T )with the Itˆo process Y M ( s ) := (cid:80) M − (cid:96) =1 Y (cid:96) ( s ) , s ≥ , and, for (cid:96) = 1 , . . . , M − Y (cid:96) ( s ) := (cid:90) s H (cid:96) ( h ) dβ (cid:96) ( h ) , H (cid:96) ( h ) := N − (cid:88) i =0 (cid:104) t ( X t i ) , e (cid:96) (cid:105) M e − λ (cid:96) ( t i +1 − h ) [ t i ,t i +1 ) ( h ) . For each (cid:96) , the processes Y (cid:96) as well as Y M are martingales with quadratic variation satisfying (cid:104) Y M (cid:105) T = M − (cid:88) (cid:96) =1 (cid:104) Y (cid:96) (cid:105) T = M − (cid:88) (cid:96) =1 N − (cid:88) i =0 (cid:104) t ( X t i ) , e (cid:96) (cid:105) M (cid:90) t i +1 t i e − λ (cid:96) ( t i +1 − s ) ds = M − (cid:88) (cid:96) =1 N − (cid:88) i =0 (cid:104) t ( X t i ) , e (cid:96) (cid:105) M − e − λ (cid:96) ∆ λ (cid:96) ≤ N ∆ (cid:107) t (cid:107) N,M where we used Lemma 4.2 in the last inequality. For any λ >

0, the process (exp( λY Ms − λ (cid:104) Y M (cid:105) s ) , s ≥

0) inherits the martingale property from Y M . Hence, P (cid:16) N N − (cid:88) i =0 (cid:104) (cid:92) t ( X t i ) , ε i (cid:105) L ≥ τ, (cid:107) t (cid:107) M,N ≤ v (cid:17) ≤ P (cid:16) σT Y M ( T ) ≥ τ, (cid:104) Y M (cid:105) T ≤ T v (cid:17) ≤ P (cid:16) exp (cid:16) λY M ( T ) − λ (cid:104) Y M (cid:105) T (cid:17) ≥ exp (cid:16) λ T τσ − λ T v (cid:17)(cid:17) ≤ exp (cid:16) − λ T τσ + λ T v (cid:17)(cid:17) . Choosing the minimizer λ = τσv yields (49). Step 2.

We use a chaining argument to deduce from Step 1 a bound for (41): Due to the nestingassumption, we have on Ξ

N,M, ¯ m thatsup t ∈ S m (cid:48) ,m : (cid:107) t (cid:107) N,M =1 N N − (cid:88) i =0 (cid:104) (cid:92) t ( X t i ) , ε i (cid:105) L (cid:46) sup t ∈ S m (cid:48) ,m : (cid:107) t (cid:107) L =1 N N − (cid:88) i =0 (cid:104) (cid:92) t ( X t i ) , ε i (cid:105) L . Since the dimension of S m (cid:48) ,m is bounded by D m (cid:48) + 1, we can cover the L -unit ball in S m (cid:48) ,m witha ε -net G ε of the maximal size (3 /ε ) D m (cid:48) +1 [37, Lemma 4.14]. Considering a sequence G ε k of ε k -netswith ε k = ε − k , k ≥

1, and some ε >

0, we denote for any t ∈ S m (cid:48) ,m by π k ( t ) the closest element in G ε k and set π ( f ) = 0. We obtain from t = (cid:80) k ≥ ( π k ( t ) − π k − ( t )) and Lemma 4.2 the decomposition1 N N − (cid:88) i =1 (cid:104) (cid:92) t ( X t i ) , ε i (cid:105) L = 1 N N − (cid:88) i =0 (cid:88) k ≥ (cid:104) ˆ S ( π k ( t ) − π k − ( t ))( X t i ) , ε i (cid:105) L . Under Assumption (E) and on the event Ξ

N,M, ¯ m , we have (cid:107) π k ( t ) − π k − ( t ) (cid:107) N,M ≤ (cid:113) C (cid:107) π k ( t ) − π k − ( t ) (cid:107) L ≤ (cid:113) C ε (2 − k + 2 − k +1 ) ≤ √ Cε − k for C from (28). Together with Step 2, we obtain for45 = C − / / τ k ) that P (cid:16)(cid:110) Γ( m (cid:48) , m ) ≥ (cid:88) k ≥ − k τ k (cid:111) ∩ Ξ N,M, ¯ m (cid:17) ≤ (cid:88) k ≥ (cid:88) t k ∈G εk ,t k − ∈G εk − P (cid:16) N N − (cid:88) i =0 (cid:104) ˆ S ( t k − t k − )( X t i ) , ε i (cid:105) L ≥ − k τ k , (cid:107) t k − t k − (cid:107) N,M ≤ √ Cε − k (cid:17) ≤ (cid:88) k ≥ (cid:88) t k ∈G εk ,t k − ∈G εk − e − T τ k / (2 σ ) = (cid:88) k ≥ exp (cid:0) − T τ k / (2 σ ) + log |G ε k | + log |G ε k − | (cid:1) . In view of log |G ε k | ≤ ( D m (cid:48) + 1) log(3 /ε k ) = ( D m (cid:48) + 1)(log(3 /ε ) + k log 2) ≤ /ε )( D m (cid:48) + 1) k wechoose τ k = σ T ( D m (cid:48) + τ + 4 log(3 /ε )( D m (cid:48) + 2) k ) for some τ >

0. Then the above probability isbounded by e − τ − D m (cid:48) (cid:88) k ≥ e − /ε ) k ≤ e − τ − D m (cid:48) . Owing to ( (cid:80) k ≥ − k τ k ) ≤ (cid:80) k ≥ − k τ k = σ T ( D m (cid:48) + τ + 4 log(3 /ε )( D m (cid:48) + 2) (cid:80) k ≥ k − k ) ≤ σ τT + σ T log(3 /ε )) D m (cid:48) . We conclude that P (cid:16)(cid:110) Γ( m (cid:48) , m ) −

50 log(3 /ε ) σ D m (cid:48) T ≥ σ τT (cid:111) ∩ Ξ N,M, ¯ m (cid:17) ≤ e − τ − D m (cid:48) . If pen( m (cid:48) ) ≥

400 log(3 /ε ) σ D m (cid:48) T , we obtain E (cid:16)(cid:16) Γ( m (cid:48) , m ) −

18 pen( m (cid:48) ) (cid:17) + Ξ N,M, ¯ m (cid:17) ≤ (cid:90) ∞ P (cid:16)(cid:110) Γ( m (cid:48) , m ) −

50 log(3 /ε ) σ D m (cid:48) T ≥ r (cid:111) ∩ Ξ N,M, ¯ m (cid:17) dr ≤ e − D m (cid:48) (cid:90) ∞ e − T r/ (2 σ ) dr = 2 σ T e − D m (cid:48) . References [1] Altmeyer, R., Bretschneider, T., Jan´ak, J., and Reiß, M. (2020a). Parameter estimation in anSPDE model for cell repolarisation. arXiv preprint arXiv:2010.06340 .[2] Altmeyer, R., Cialenco, I., and Pasemann, G. (2020b). Parameter estimation for semilinear SPDEsfrom local measurements. arXiv preprint arXiv:2004.14728 .[3] Altmeyer, R. and Reiß, M. (2020). Nonparametric estimation for linear SPDEs from local mea-surements.

Ann. Appl. Probab.

Forthcoming.[4] Bally, V. and Pardoux, E. (1998). Malliavin calculus for white noise driven parabolic SPDEs.

Potential Anal. , 9(1):27–64.[5] Baraud, Y., Comte, F., and Viennet, G. (2001). Adaptive estimation in autoregression or β -mixingregression via model selection. Ann. Statist. , 29(3):839–875.[6] Berbee, H. C. P. (1979).

Random walks with stationary increments and renewal theory . Mathe-matical Centre Tracts. Mathematisch Centrum, Amsterdam.[7] Bibinger, M. and Trabs, M. (2020). Volatility estimation for stochastic PDEs using high-frequencyobservations.

Stochastic Process. Appl. , 130(5):3005–3052.468] Birg´e, L. and Massart, P. (1997). From model selection to adaptive estimation. In

Festschrift forLucien Le Cam , pages 55–87. Springer, New York.[9] Bonforte, M., Sire, Y., and V´azquez, J. L. (2015). Existence, uniqueness and asymptotic be-haviour for fractional porous medium equations on bounded domains.

Discrete Contin. Dyn. Syst. ,35(12):5725–5767.[10] Cerrai, S. (1999). Ergodicity for stochastic reaction-diﬀusion systems with polynomial coeﬃcients.

Stochastics Stochastics Rep. , 67(1-2):17–51.[11] Chong, C. (2020). High-frequency analysis of parabolic stochastic PDEs.

Ann. Statist. ,48(2):1143–1167.[12] Cialenco, I. (2018). Statistical inference for SPDEs: an overview.

Stat. Inference Stoch. Process. ,21(2):309–329.[13] Cialenco, I. and Glatt-Holtz, N. (2011). Parameter estimation for the stochastically perturbedNavier-Stokes equations.

Stochastic Process. Appl. , 121(4):701–724.[14] Cialenco, I. and Huang, Y. (2020). A note on parameter estimation for discretely sampled SPDEs.

Stoch. Dyn. , 20(3).[15] Comte, F., Genon-Catalot, V., Rozenholc, Y., et al. (2007). Penalized nonparametric mean squareestimation of the coeﬃcients of diﬀusion processes.

Bernoulli , 13(2):514–543.[16] Comte, F. and Rozenholc, Y. (2002). Adaptive estimation of mean and volatility functions in(auto-)regressive models.

Stochastic Process. Appl. , 97(1):111–145.[17] Da Prato, G. and Zabczyk, J. (1996).

Ergodicity for inﬁnite-dimensional systems , volume 229 of

London Mathematical Society Lecture Note Series . Cambridge University Press, Cambridge.[18] Da Prato, G. and Zabczyk, J. (2014).

Stochastic Equations in Inﬁnite Dimensions . CambridgeUniversity Press, Cambridge.[19] Daubechies, I. (1992).

Ten lectures on wavelets , volume 61 of

CBMS-NSF Regional ConferenceSeries in Applied Mathematics . Society for Industrial and Applied Mathematics (SIAM), Philadel-phia, PA.[20] Doukhan, P. (1994).

Mixing , volume 85 of

Lecture Notes in Statistics . Springer-Verlag, NewYork. Properties and examples.[21] Galtchouk, L. I. and Pergamenshchikov, S. M. (2015). Eﬃcient pointwise estimation based ondiscrete data in ergodic nonparametric diﬀusions.

Bernoulli , 21(4):2569–2594.[22] Goldys, B. and Maslowski, B. (2002). Parameter estimation for controlled semilinear stochasticsystems: identiﬁability and consistency.

J. Multivariate Anal. , 80(2):322–343.[23] Goldys, B. and Maslowski, B. (2006). Lower estimates of transition densities and bounds onexponential ergodicity for stochastic PDE’s.

Ann. Probab. , 34(4):1451–1496.[24] Haken, H. (2013).

Synergetics: Introduction and advanced topics . Springer Science & BusinessMedia.[25] Hildebrandt, F. (2020). On generating fully discrete samples of the stochastic heat equation onan interval.

Statist. Probab. Lett. , 162. Article 108750.[26] Hildebrandt, F. and Trabs, M. (2019). Parameter estimation for spdes based on discrete obser-vations in time and space. arXiv preprint arXiv:1910.01004 .4727] Hoﬀmann, M. (1999). Adaptive estimation in diﬀusion processes.

Stochastic Process. Appl. ,79(1):135–163.[28] Kaino, Y. and Uchida, M. (2020). Adaptive estimator for a parabolic linear SPDE with a smallnoise. arXiv preprint arXiv:2008.05353 .[29] Kaino, Y. and Uchida, M. (2021). Parametric estimation for a parabolic linear SPDE modelbased on discrete observations.

J. Statist. Plann. Inference , 211:190–220.[30] Koski, T. and Loges, W. (1985). Asymptotic statistical inference for a stochastic heat ﬂowproblem.

Statist. Probab. Lett. , 3:185–189.[31] Lototsky, S. V. (2009). Statistical inference for stochastic parabolic equations: a spectral ap-proach.

Publ. Mat. , 53(1):3–45.[32] Lunardi, A. (1985). Interpolation spaces between domains of elliptic operators and spaces ofcontinuous functions with applications to nonlinear parabolic equations.

Math. Nachr. , 121(1):295–318.[33] Lunardi, A. (2012).

Analytic semigroups and optimal regularity in parabolic problems . SpringerScience & Business Media.[34] Mahdi Khalil, Z. and Tudor, C. (2019). Estimation of the drift parameter for the fractionalstochastic heat equation via power variation.

Mod. Stoch. Theory Appl. , 6(4):397–417.[35] Manthey, R. (1986). Existence and uniqueness of a solution of a reaction-diﬀusion equation withpolynomial nonlinearity and white noise disturbance.

Math. Nachr. , 125:121–133.[36] Marinelli, C., Nualart, E., and Quer-Sardanyons, L. (2013). Existence and regularity of thedensity for solutions to semilinear dissipative parabolic SPDEs.

Potential Anal. , 39(3):287–311.[37] Massart, P. (2007).

Concentration inequalities and model selection , volume 1896 of

Lecture Notesin Mathematics . Springer, Berlin. Lectures from the 33rd Summer School on Probability Theoryheld in Saint-Flour, July 6–23, 2003.[38] Mueller, C. and Nualart, D. (2008). Regularity of the density for the stochastic heat equation.

Electron. J. Probab. , 13(74):2248–2258.[39] Nualart, D. and Quer-Sardanyons, L. (2009). Gaussian density estimates for solutions to quasi-linear stochastic partial diﬀerential equations.

Stochastic Process. Appl. , 119(11):3914–3938.[40] Pasemann, G., Flemming, S., Alonso, S., Beta, C., and Stannat, W. (2020). Diﬀusivity estima-tion for activator-inhibitor models: Theory and application to intracellular dynamics of the actincytoskeleton. arXiv preprint arXiv:2005.09421 .[41] Pasemann, G. and Stannat, W. (2020). Drift estimation for stochastic reaction-diﬀusion systems.

Electron. J. Stat. , 14(1):547–579.[42] Rohde, A. (2004). On the asymptotic equivalence and rate of convergence of nonparametricregression and Gaussian white noise.

Statist. Decisions , 22(3):235–243.[43] Shevchenko, R., Slaoui, M., and Tudor, C. A. (2019). Generalized k -variations and Hurstparameter estimation for the fractional wave equation via Malliavin calculus. arXiv preprintarXiv:1903.02369 .[44] Sinestrari, E. (1985). On the abstract cauchy problem of parabolic type in spaces of continuousfunctions. J. Math. Anal. Appl. , 107(1):16–66. 4845] Torres, S., Tudor, C., Viens, F., et al. (2014). Quadratic variations for the fractional-coloredstochastic heat equation.

Electron. J. Probab. , 19.[46] Viennet, G. (1997). Inequalities for absolutely regular sequences: application to density estima-tion.