Multi-dimensional parameter estimation of heavy-tailed moving averages
Mathias Mørck Ljungdahl∗    Mark Podolskij†

Abstract
In this paper we present a parametric estimation method for certain multi-parameter heavy-tailed Lévy-driven moving averages. The theory relies on recent multivariate central limit theorems obtained in [3] via Malliavin calculus on Poisson spaces. Our minimal contrast approach is related to the papers [15, 14], which propose to use the marginal empirical characteristic function to estimate the one-dimensional parameter of the kernel function and the stability index of the driving Lévy motion. We extend their work to allow for a multi-parametric framework that in particular includes the important examples of the linear fractional stable motion, the stable Ornstein–Uhlenbeck process, certain CARMA(2, 1) models and Ornstein–Uhlenbeck processes with a periodic component, among other models. We present both the consistency and the associated central limit theorem of the minimal contrast estimator. Furthermore, we present a numerical analysis to uncover the finite sample performance of our method.

Keywords: Heavy tails, low frequency, Lévy processes, parametric estimation, limit theorems
AMS 2010 subject classifications : 62F12, 60F05, 60G51, 60G52
1 Introduction

Steadily through the last decades, estimation procedures for various classes of continuous-time moving averages and related processes have been proposed; see e.g. [2, 11, 16] for estimation of the parameters in the linear fractional stable motion model and [9, 10] for the more general class of self-similar processes, among many others. The bedrock of these techniques is of course the underlying limit theory for various functionals of the processes at hand. One such seminal paper is [18], which gives conditions for bounded functionals

∗ Department of Mathematics, Aarhus University, E-mail: [email protected].
† Department of Mathematics, University of Luxembourg, E-mail: [email protected].
of a large class of moving averages, and was later extended in [19] to certain unbounded functions. In a similar framework, [5] gives an almost complete picture of the 'law of large numbers' for the classical case of the power variation functional. The article [4] extends the functionals from power variation to a large class of statistically interesting functionals for a class of symmetric β-stable moving averages. This paper also provides an almost complete picture of the corresponding weak limit theorems, at least in the setting of Appell rank greater than one (such as is the case for power variation and the real part of the characteristic function).

Previous estimation methods suggested in [15, 14, 16] relied on functionals of the one-dimensional marginal law of the process and specific properties of the process at hand. Since the marginal distributions of the considered models have been symmetric β-stable, only the scale and the stability parameters can be estimated via such statistics. In particular, they are typically not sufficient to estimate kernel functions that depend on a multi-dimensional parameter. Indeed, this discrepancy is observed in [15], where the characteristic function of the one-dimensional law is not sufficient and instead the authors have to rely on a combination with other statistics to ensure estimation of all parameters. The aim of this paper is to construct estimators of the kernel function and the stability index in the general setting of a multi-dimensional parameter space. Instead of relying on existing theory [4, 5, 18], which only accounts for the marginal law of the underlying model, we shall use the framework from the recent paper [3], which is tailor-made for the study of Gaussian fluctuations of functionals of multiple heavy-tailed moving averages, to estimate the multi-dimensional parameter.

Let us now define the class of moving average processes for which the underlying limit theory applies.
Let L = (L_t)_{t∈R} be a standard symmetric β-stable Lévy process and consider the model

X_t = ∫_{−∞}^t g(t − s) dL_s,  t ∈ R,  (1.1)

for some measurable g: R → R. Necessary and sufficient conditions for the integral to exist are given in [20], and we mention that in our setting a sufficient condition is ∫_R |g(s)|^β ds < ∞. The kernel function g is assumed to have a power behaviour around 0 and at infinity. More specifically, we shall assume the existence of a constant K > 0 together with powers α > 0 and κ ∈ R for which it holds

|g(x)| ≤ K ( x^κ 1_{[0,1]}(x) + x^{−α} 1_{[1,∞)}(x) )  for all x ∈ R.  (1.2)

We are interested in (scaled) partial sums of multivariate functionals of the vectors ((X_{s+1}, …, X_{s+m}))_{s≥0}:

V_n(X; f) = n^{−1/2} Σ_{s=0}^{n−m} ( f(X_{s+1}, …, X_{s+m}) − E[f(X_1, …, X_m)] ),  (1.3)

where f: R^m → R^d is a suitable Borel function. Adhering to [3, Remark 2.4(iii)] the following result holds. Below, C_b²(R^m, R^d) denotes the space of twice differentiable functions f: R^m → R^d such that f and all of its first and second order derivatives are bounded and continuous.

Theorem 1.1 ([3, Theorem 2.3]) Let (X_t)_{t∈R} be a moving average as in (1.1) with kernel function g satisfying (1.2). Assume that αβ > 2 and κ > −1/β. Let f = (f_1, …, f_d) ∈ C_b²(R^m, R^d) and consider the statistic V_n(X; f) introduced at (1.3). Then, as n → ∞,

Σ_n^{i,j} := Cov( V_n(X; f_i), V_n(X; f_j) ) → Σ^{i,j} := Σ_{s∈Z} Cov( f_i(X_{s+1}, …, X_{s+m}), f_j(X_1, …, X_m) )  (1.4)

for any 1 ≤ i, j ≤ d. Moreover, V_n(X; f) →^L N_d(0, Σ) as n → ∞.

The paper [3] additionally provides Berry–Esseen type bounds for an appropriate distance between probability laws on R^d, but Theorem 1.1 is sufficient for our statistical analysis. We remark that the limit theory for bounded f in the case of m = 1 and general d ∈ N is handled in [19], but it is actually the reverse situation, i.e. m ∈ N and d = 1, which we shall need.
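To make the setting concrete, the model (1.1) can be simulated by combining the Chambers–Mallows–Stuck generator for symmetric β-stable variates with a truncated Riemann-sum discretisation of the stochastic integral. The following sketch is our own illustration (the function names, truncation level and step size are ad hoc choices, and the generator is valid for β ≠ 1), not the authors' code:

```python
import numpy as np

def rsstable(beta, size, rng):
    # Chambers-Mallows-Stuck: standard symmetric beta-stable variates,
    # i.e. E[exp(i u X)] = exp(-|u|^beta); valid for beta != 1.
    V = rng.uniform(-np.pi / 2, np.pi / 2, size)
    W = rng.exponential(1.0, size)
    return (np.sin(beta * V) / np.cos(V) ** (1 / beta)
            * (np.cos((1 - beta) * V) / W) ** ((1 - beta) / beta))

def moving_average(g, beta, n, dt=0.05, cutoff=30.0, seed=1):
    # Riemann-sum approximation of X_t = int_{-inf}^t g(t - s) dL_s at t = 1, ..., n.
    # The integral is truncated at lag `cutoff`, and dL over an interval of
    # length dt is a stable variate with scale dt^{1/beta}.
    rng = np.random.default_rng(seed)
    lags = np.arange(dt, cutoff + dt, dt)
    w = g(lags) * dt ** (1.0 / beta)
    m, steps = len(w), int(round(1.0 / dt))
    eps = rsstable(beta, m + n * steps, rng)
    return np.array([w @ eps[m + t * steps - 1 : t * steps - 1 : -1]
                     for t in range(1, n + 1)])

X = moving_average(lambda u: np.exp(-u), beta=1.5, n=200)  # OU-type kernel
```

The discretised process inherits the heavy tails of the driving noise, so individual sample paths can contain very large values.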
Specifically, f will be the empirical characteristic function of the joint distribution of (X_{s+1}, …, X_{s+m}), which then grants us the ability to estimate parameters which are not determined by the one-dimensional distribution of X; see Examples 2.2–2.6 below.

The paper is organised as follows. In Section 2 we introduce the parametric model, the necessary assumptions and the main theoretical results of the paper, which show the strong consistency and the asymptotic normality of the minimal contrast estimator. Section 3 is devoted to a numerical analysis of the finite sample performance of our estimator. Finally, all proofs are collected in Section 4.

2 The model and main results

In the following we will consider a Lévy-driven moving average X = (X_t)_{t∈R} given by

X_t = ∫_R g_{β,θ}(t − s) dL_s,  t ∈ R,  (2.1)

where L is a symmetric β-stable Lévy process with scale parameter 1 and β ∈ Υ for some open subset Υ ⊆ (0, 2), and {g_{β,θ} | β ∈ Υ, θ ∈ Θ} is a measurable family of functions parametrised by an open subset Υ × Θ ⊆ (0, 2) × R^d for some d ≥ 1. For ease of notation we shall often denote the joint parameter by ξ = (β, θ) and the open subset by Ξ = Υ × Θ.

The main goal of this section is to extend the theory of [14] from a one-dimensional parameter space, i.e. d = 1, to a general multi-dimensional theory. Such multi-dimensional parameter spaces include the important examples of the linear fractional stable motion, the stable Ornstein–Uhlenbeck process, certain CARMA(2, 1) models, and Ornstein–Uhlenbeck processes with a periodic component, among others. One of the main difficulties in extending from d = 1 to d ∈ N is that, quite naturally, the parameters (β, θ) should be identifiable from the (theoretical) statistic, which in the case of [14] is the one-dimensional characteristic function:

φ_{β,θ}(u) = E[e^{iuX_1}] = exp( −‖u g_{β,θ}‖_β^β ).

In general, this function cannot identify a parameter of dimension d > 1; see Example 2.2. But if we instead consider the characteristic function of the joint distribution of (X_1, …, X_m),

φ^m_{β,θ}(u_1, …, u_m) = E[ e^{i Σ_{k=1}^m u_k X_k} ] = exp( −‖ Σ_{k=1}^m u_k g_{β,θ}(· + k) ‖_β^β ),  (2.2)

such an identification may be possible. Let us discuss this in more detail. The underlying stability index β is always identifiable from (2.2), since the stability index of a stable random variable is unique. The problem is then reduced to whether the parametrisation θ ↦ g_{β,θ} of the kernel specifies the distribution of X uniquely. The question now becomes a matter of uniqueness of the spectral representation of moving averages, which has been studied in e.g. [21]. Translating the question to the characteristic functions of the finite dimensional distributions (X_1, …, X_m), m ∈ N, we ask whether the β-norm of linear combinations of translations of the kernel specifies g_{β,θ} uniquely. This is known as Kanter's theorem in the literature and first appeared in [12], but for exposition's sake let us repeat it here. Suppose β ∈ (0, ∞) is not an even integer and let g, h ∈ L^β(R). Then Kanter's theorem states that if for all n ∈ N and u_1, t_1, …, u_n, t_n ∈ R it holds that

‖ Σ_{i=1}^n u_i g(· + t_i) ‖_β^β = ‖ Σ_{i=1}^n u_i h(· + t_i) ‖_β^β,

then there exist ε ∈ {±1} and τ ∈ R such that g = εh(· + τ) almost everywhere. Kanter's theorem then implies that the distribution of X is the same under θ and θ′ if and only if there exist ε ∈ {±1} and τ ∈ R such that εg_{β,θ}(· + τ) = g_{β,θ′} almost everywhere. For many concrete examples of the kernel family {g_ξ | ξ ∈ Ξ} it is straightforward to check that such an identity only occurs if ε = 1, τ = 0 and θ = θ′.

Due to the preceding discussion it is reasonable to make the following assumptions on the family of kernels, and we note that similar identification requirements are often explicitly or implicitly required in the literature.
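The invariance in Kanter's theorem can be illustrated numerically: replacing g by εg(· + τ) leaves every β-norm of linear combinations of translates unchanged. A small sketch (our own illustration with a hypothetical exponential kernel; the β-norm is evaluated by a crude Riemann sum on a wide grid):

```python
import numpy as np

def beta_norm(g, u, t, beta, lo=-20.0, hi=40.0, n=600_000):
    # Riemann-sum approximation of || sum_i u_i g(. + t_i) ||_beta^beta over R
    x = np.linspace(lo, hi, n)
    s = sum(ui * g(x + ti) for ui, ti in zip(u, t))
    return np.sum(np.abs(s) ** beta) * (hi - lo) / (n - 1)

g = lambda x: np.exp(-x) * (x > 0)   # Ornstein-Uhlenbeck-type kernel
h = lambda x: -g(x + 0.7)            # epsilon = -1, tau = 0.7

beta, u, t = 1.5, (1.0, -2.0, 0.5), (0.0, 1.0, 2.5)
a = beta_norm(g, u, t, beta)
b = beta_norm(h, u, t, beta)         # equal up to discretisation error
```

Since the integrand of the second norm is a pure translate and reflection of the first, both Riemann sums agree to within the grid resolution.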
An important remark is that our theory allows for a general m ∈ N instead of only m ∈ {1, 2}, where the statistics in the case m = 2 are often autocorrelations. We denote by ∂_{z_1,z_2} f_ξ the partial derivative of f with respect to the parameters z_1, z_2 evaluated at ξ ∈ Ξ.

Assumption (A)
There exists an m ∈ N such that:

(1) 0 < ‖g_{β,θ}‖_β < ∞ for all (β, θ) ∈ Υ × Θ.

(2) The map θ ↦ φ^m_{β,θ} given in (2.2) is injective.

(3) The function (β, θ) ↦ ‖ Σ_{i=1}^m u_i g_{β,θ}(· + i) ‖_β^β is C²(Υ × Θ) for each u_1, …, u_m ∈ R.

(4) u ↦ ∂_β φ^m_ξ(u), ∂_{θ_1} φ^m_ξ(u), …, ∂_{θ_d} φ^m_ξ(u) are linearly independent continuous functions.

Let us give some remarks about the imposed conditions.

Remark 2.1 (i) The assumption (A)(1) is a necessary and sufficient condition for X to be well-defined and non-degenerate. Moreover, (A)(1) makes it apparent why an explicit dependence on β of the kernel g_{β,θ} can be useful. Such a dependence is also necessary for some processes, such as increments of the linear fractional stable motion; see Example 2.3 below.

(ii) Condition (A)(2) is necessary to ensure that the model (2.1) is parametrised properly. Note that the non-existence of an m ∈ N such that (A)(2) holds would imply that the parameters could never be inferred from any finite data sample, making the inference of θ impossible in practice. The identification of the parameters in a continuous time model from samples at equidistant time points is known in the literature as the aliasing problem.

(iii) Condition (A)(3) is a minimal requirement for our method of proof (see also [14, Assumption (A)]). In particular, it ensures existence of the derivatives in (A)(4).

In order to use Theorem 1.1 we need to make additional assumptions on our kernel, and for this we need to introduce some more notation. Consider a strictly positive weight function w ∈ L¹(R^m_+) and define the weighted inner product and norms

⟨g, h⟩_w = ∫_{R^m_+} g(x) h(x) w(x) dx  and  ‖h‖^p_{w,p} = ∫_{R^m_+} |h(x)|^p w(x) dx,  p ∈ {1, 2}.

Let L^p_w(R^m_+) denote the corresponding Banach L^p-space of Borel functions.

Assumption (B) (1) Assume that for all (β, θ) ∈ Υ × Θ there exist κ ∈ R and α > 0 such that κ > −1/β, αβ > 2 and (1.2) holds for g_{β,θ}.

(2) The functions u ↦ |∂_{ξ_i,ξ_k} φ_ξ(u)|, |∂_{ξ_i} φ_ξ(u)|, i, k ∈ {1, …, d + 1}, are locally dominated in L¹_w(R^m_+). That is, for every ξ₀ ∈ Ξ there exists a neighbourhood Ξ₀ ∋ ξ₀ such that the suprema of these functions over ξ ∈ Ξ₀ are dominated by a function in L¹_w(R^m_+).

Assumption (B)(1) is imposed to ensure that we may employ Theorem 1.1. While (B)(2) may seem strict, it is always satisfied in the one-dimensional case m = 1, and we shall need it to ensure the validity of the implicit function theorem in our setup.

We now demonstrate some examples which satisfy Assumption (A) for m ≥ 2 but not for m = 1.

Example 2.2 (Stable Ornstein–Uhlenbeck process)
Let (X_t)_{t∈R} denote the β-stable Ornstein–Uhlenbeck process with parameter λ > 0 and scale parameter σ > 0. That is, (X_t)_{t∈R} is a stationary solution of the stochastic differential equation

dX_t = −λX_t dt + σ dL_t.
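This SDE is easy to discretise; the following Euler-type sketch (our own, valid for β ≠ 1, with the Chambers–Mallows–Stuck generator supplying the stable increments) produces approximate draws from the stationary law:

```python
import numpy as np

def rsstable(beta, size, rng):
    # standard symmetric beta-stable variates (Chambers-Mallows-Stuck, beta != 1)
    V = rng.uniform(-np.pi / 2, np.pi / 2, size)
    W = rng.exponential(1.0, size)
    return (np.sin(beta * V) / np.cos(V) ** (1 / beta)
            * (np.cos((1 - beta) * V) / W) ** ((1 - beta) / beta))

def stable_ou(beta, lam, sigma, T=8.0, dt=0.02, paths=5000, seed=7):
    # Euler scheme X_{k+1} = X_k (1 - lam dt) + sigma dt^{1/beta} eps_k, X_0 = 0;
    # dL over a step of length dt is a stable variate with scale dt^{1/beta}.
    rng = np.random.default_rng(seed)
    X = np.zeros(paths)
    for _ in range(int(round(T / dt))):
        X = X * (1.0 - lam * dt) + sigma * dt ** (1.0 / beta) * rsstable(beta, paths, rng)
    return X

X = stable_ou(beta=1.5, lam=1.0, sigma=1.0)
# the stationary marginal is symmetric beta-stable with scale^beta =
# sigma^beta / (beta * lam), so Re E[exp(iX)] should be close to exp(-1/1.5)
```

After a horizon of a few multiples of 1/λ the influence of the starting value X_0 = 0 is negligible.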
It has the representation (2.1) with kernel function g_θ(u) = σ exp(−λu) 1_{(0,∞)}(u) and θ = (σ, λ) ∈ (0, ∞)². It is clear that the one-dimensional characteristic function does not characterise the parameter θ, hence Assumption (A)(2) is not satisfied for m = 1. Consider therefore the case m = 2. Here the characteristic function is uniquely determined by θ if the β-norms are. Indeed, using the binomial series one may deduce the following formula:

‖u_1 g_θ + u_2 g_θ(· + 1)‖_β^β = (σ^β / (βλ)) [ u_2^β (1 − exp(−βλ)) + (u_1 + u_2 exp(−λ))^β ],  u_1 > 0, u_2 ≥ 0.

It is then straightforward to check that these equations in u_1 > 0, u_2 ≥ 0 determine θ ∈ (0, ∞)² uniquely. Additionally, (A)(4) can be checked in a manner similar to Example 2.4 below, and we refer to Section 4.4 for the derivation of these statements.

There are a number of alternative estimation methods for a stable Ornstein–Uhlenbeck model. When the stability parameter β is known, λ can be estimated with convergence rate (n/log n)^{1/β}, as has been shown in [24]. In the discrete-time setting of the AR(1) model with heavy-tailed i.i.d. noise, it is known that a Gaussian limit can be obtained, cf. [13], but this method again lacks joint estimation with the parameter β. In a similar framework the paper [1] investigates the asymptotic behaviour of the maximum likelihood estimator. In particular, their results imply that the parameters σ and β can be estimated with √n-precision, while the drift parameter λ has a faster convergence rate of n^{1/β}.

Example 2.3 (Linear fractional stable motion)
Let (Y_t)_{t∈R} be the linear fractional stable motion with self-similarity parameter H ∈ (0, 1), stability index β ∈ (0, 2) and scale parameter σ > 0. That is,

Y_t = ∫_R σ [ (t − s)_+^{H−1/β} − (−s)_+^{H−1/β} ] dL_s.

Consider the low frequency k-th order increment at rate r (k, r ∈ N) defined as

X_i := Δ^r_{i,k} Y = Σ_{j=0}^k (−1)^j \binom{k}{j} Y_{i−rj},  i ≥ rk.

If r = 1 or k = 1 we remove the corresponding index. In the case of k = r = 1, X_i = Δ_i Y = Y_i − Y_{i−1} is simply the increments of Y, and for k = 2 we have that

Δ^r_{i,2} Y = Y_i − 2Y_{i−r} + Y_{i−2r},  i ≥ 2r.

The corresponding kernel of X becomes

g_{β,H,σ}(u) = σ Σ_{j=0}^k (−1)^j \binom{k}{j} (u − rj)_+^{H−1/β},

where x_+ = x ∨ 0 is the positive part and x^a_+ := 0 for all x ≤ 0. We note the asymptotic behaviour

g_{β,H,σ}(u) / (K u^{H−1/β−k}) → 1  as u → ∞

for some constant K > 0. Hence, (1.2) holds with κ = H − 1/β > −1/β and α = k + 1/β − H > 0. In this case Assumption (B) can simply be translated into an assumption on the parameter space Υ × Θ, e.g.

Υ × Θ = { (β, H, σ) | 0 < H < k − 1/β, C^{−1} < σ < C },

for some arbitrary but finite constant C > 1. It is well-known that Y has a version with continuous paths if and only if H − 1/β > 0, so if we want to do inference in the continuous case we have the two parameter inequalities:

0 < H − 1/β  and  H < k − 1/β.  (2.3)

Note that these inequalities never hold simultaneously for k = 1, but they are always satisfied for k ≥ 2, which shows the usefulness of higher order increments. Moreover, the H-self-similarity of Y implies that

E[ |Δ²_{i,k} Y|^p ] / E[ |Δ_{i,k} Y|^p ] = 2^{pH}  for p ∈ (−1, β).

For k = 2 the term Δ²_{4,2} Y is a linear combination of Δ_{4,2} Y, Δ_{3,2} Y, Δ_{2,2} Y. Hence, H is identifiable from the characteristic function of the three-dimensional distribution (X_1, X_2, X_3); in other words, m = 3 in the case k = 2.

Example 2.4 (OU-type model with a periodic component)
The next example we consider is a periodic extension of the stable Ornstein–Uhlenbeck process from Example 2.2. Let θ = (θ_1, θ_2) ∈ (0, ∞)² and consider the kernel function

g_θ(u) = exp( −θ_1 u − θ_2 f(u) ) 1_{(0,∞)}(u),  u ∈ R,

where f: R → R is a bounded measurable function which is either non-negative or non-positive and has period 1, i.e. f(x + 1) = f(x) for all x. If f does not vanish except on a Lebesgue null set, then θ ↦ φ^m_{β,θ} for m = 2 is injective. If, in addition, f is negative, then Assumption (B)(2) is satisfied except possibly at β = 1. We refer to Section 4.4 for the proof of these statements.

Example 2.5 (Modulated OU)
Consider the process X defined at (2.1) with kernel given by

g_θ(s) = θ_1 s exp(−θ_2 s) 1_{(0,∞)}(s),  s ∈ R.  (2.4)

Under the assumptions θ = (θ_1, θ_2) ∈ (0, ∞)² and β ∈ (1, 2) on the parameters, it is possible to prove that θ is not identifiable for m = 1 while it is in the case m = 2. We refer to Section 4.5 for the full exposition of these claims.

Example 2.6 (CARMA processes)
Consider integers p > q ≥ 0. The CARMA(p, q) process (Y_t)_{t∈R} with parameters a_1, …, a_p, b_0, …, b_{q−1} ∈ R driven by L is the solution to X_t = b^⊤ Y_t with

dY_t − A Y_t dt = e_p dL_t,  (2.5)

where e_p and b are the p-dimensional column vectors given by e_p = (0, …, 0, 1)^⊤ and b = (b_0, …, b_{p−1})^⊤, where b_q = 1 and b_i = 0 for all q < i < p, and A is the p × p matrix given by

A = [ 0 1 0 ⋯ 0 ; 0 0 1 ⋯ 0 ; ⋮ ⋮ ⋮ ⋱ ⋮ ; −a_p −a_{p−1} −a_{p−2} ⋯ −a_1 ].

CARMA(p, q) processes fit within the framework of (2.1) since, if the eigenvalues of A have strictly negative real part, then a unique stationary solution of (2.5) exists and is given by

X_t = ∫_R b^⊤ e^{A(t−s)} e_p 1_{[0,∞)}(t − s) dL_s,  t ∈ R,

see [7, Proposition 1]. In this example we discuss a specific three-dimensional sub-class of CARMA(2, 1) processes, which corresponds to the choice λ := −√a_2 and a_1 = 2√a_2 = −2λ. The parameter of interest becomes ξ = (β, b_0, λ), and we further assume that β ∈ (1, 2) and θ := b_0 + λ > 0. In this setting the matrix A is given by

A = ( 0 1 ; −λ² 2λ )

and λ < 0 is the only eigenvalue of A. We thus obtain the Jordan normal form

A = S ( λ 1 ; 0 λ ) S^{−1},  S = ( 1 0 ; λ 1 ),  S^{−1} = ( 1 0 ; −λ 1 ).

Using this representation, elementary matrix algebra yields the identity

g(s) = b^⊤ exp(sA) e_2 1_{[0,∞)}(s) = (1 + θs) exp(λs) 1_{[0,∞)}(s).

In Section 4.6 we show that the parameters of the model are identifiable in the case m = 2.

We note first that the discrete time process (X_t)_{t∈Z} is ergodic according to [8], and so is the sequence

Y_i = f(X_{i+1}, …, X_{i+m}),  i ∈ Z,

for any measurable function f. Hence, we obtain by Birkhoff's ergodic theorem the strong consistency of the (real part of the) joint empirical characteristic function:

φ_n(u_1, …, u_m) = (1/n) Σ_{i=0}^{n−m} cos( Σ_{k=1}^m u_k X_{i+k} ) →^{a.s.} E[ cos( Σ_{k=1}^m u_k X_k ) ] = φ^m_ξ(u_1, …, u_m),  (2.6)

where ξ = (β, θ) ∈ Ξ denotes the unknown parameter of the model. To reduce cumbersome notation we drop the dependence on m in the characteristic function and simply write φ_ξ from now on. For a weight function w introduced in the previous section, we denote by F: L²_w(R^m_+) × Ξ → R the map

F(ψ, ξ) = ‖ψ − φ_ξ‖²_{w,2}.
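To illustrate the map F, the following sketch evaluates a discretised version of F(ψ, ξ) for the stable Ornstein–Uhlenbeck model of Example 2.2 (σ = 1, m = 2, using the closed-form joint β-norm derived there for non-negative arguments). Grid truncation and node counts are our own ad hoc choices. The contrast vanishes at the true parameter and is strictly positive away from it:

```python
import numpy as np

def phi_joint(u1, u2, beta, lam, sigma=1.0):
    # closed-form joint characteristic function from Example 2.2, u1, u2 >= 0
    c = sigma ** beta / (beta * lam)
    return np.exp(-c * (u2 ** beta * (1.0 - np.exp(-beta * lam))
                        + (u1 + u2 * np.exp(-lam)) ** beta))

def contrast(psi, xi, nu=1.0, lo=1e-3, hi=5.0, nodes=60):
    # F(psi, xi) = int (psi - phi_xi)^2 w_nu du over a truncated positive quadrant,
    # with a Gaussian weight w_nu and a plain tensor-product Riemann rule
    u = np.linspace(lo, hi, nodes)
    du = u[1] - u[0]
    U1, U2 = np.meshgrid(u, u)
    w = np.exp(-(U1 ** 2 + U2 ** 2) / (2.0 * nu)) / (2.0 * np.pi * nu)
    diff = psi(U1, U2) - phi_joint(U1, U2, *xi)
    return np.sum(diff ** 2 * w) * du ** 2

xi0 = (1.5, 1.0)                           # true (beta, lambda)
psi0 = lambda a, b: phi_joint(a, b, *xi0)  # "observed" characteristic function
F_true = contrast(psi0, xi0)               # zero at the truth
F_off = contrast(psi0, (1.5, 2.0))         # strictly positive away from the truth
```

In practice ψ is replaced by the empirical characteristic function φ_n, and the same routine then supplies the objective for the numerical minimisation.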
The minimal contrast estimator ξ_n of ξ is then defined as

ξ_n ∈ argmin_{ξ∈Ξ} F(φ_n, ξ) = argmin_{ξ∈Ξ} ∫_{R^m_+} ( φ_n(u) − φ_ξ(u) )² w(u) du,  (2.7)

and we remark that ξ_n can be chosen universally measurable by [23, Theorem 2.17(d)]. To obtain the asymptotic normality of the minimal contrast estimator ξ_n we will show a central limit theorem for the statistic √n( φ_n(u_1, …, u_m) − φ_ξ(u_1, …, u_m) ) using Theorem 1.1, and then apply a functional version of the implicit function theorem. For this purpose we introduce a centred Gaussian field (G_u)_{u∈R^m_+} whose covariance kernel is defined as

Cov(G_u, G_v) = Σ_{l∈Z} Cov( cos(⟨u, Z_0⟩_{R^m}), cos(⟨v, Z_l⟩_{R^m}) ),  (2.8)

where Z_k = (X_{k+1}, …, X_{k+m}). The main theoretical result of the paper is the strong consistency and asymptotic normality of the minimal contrast estimator ξ_n.

Theorem 2.7
Let (ξ_n) be the minimal contrast estimator at (2.7) associated with the true parameter ξ_0 = (β_0, θ_0). Suppose that Assumptions (A) and (B) hold for the underlying family of kernels (g_ξ)_{ξ∈Ξ}. Assume that the weight function w is continuous and ∫_{R^m_+} ‖u‖²_{R^m} w(u) du < ∞.

(i) ξ_n → ξ_0 almost surely as n → ∞.

(ii) The convergence

√n (ξ_n − ξ_0) →^L ( ∇²_ξ F(φ_{ξ_0}, ξ_0) )^{−1} ( ⟨∂_{ξ_i} φ_{ξ_0}, G⟩_w )_{i=1,…,d+1}

holds as n → ∞, where G = (G_u)_{u∈R^m_+} is a continuous zero-mean Gaussian random field with covariance kernel defined by (2.8). In particular, the above limit is a normally distributed (d + 1)-dimensional random vector.

We note that due to Assumption (A)(4) the matrix ∇²_ξ F(φ_{ξ_0}, ξ_0) is invertible. In principle, the normal limit in Theorem 2.7 is explicit up to the knowledge of the parameter ξ_0, but due to the complex covariance kernel of the process G it is hard to apply the central limit theorem to obtain confidence regions. Instead one may use a parametric bootstrap approach, as has been suggested in [15, Section 4.2].

We remark that the convergence rate is √n for all parameters. Due to the non-Markovian structure of the general model (2.1) it is a non-trivial task to assess the optimality of this rate. As we have discussed in Example 2.2, the rate √n can be suboptimal in the particular case of the drift parameter in an Ornstein–Uhlenbeck model.

Remark 2.8 (Extension to general Lévy drivers) If we drop the requirement of estimating β, we can consider a larger class of Lévy drivers. Indeed, according to [3] the statement of Theorem 1.1 still holds for a symmetric Lévy process L which admits a Lévy density ν such that ν(x) ≤ C|x|^{−1−β} for all x ≠ 0. In this case the characteristic function takes on a more complicated form. Indeed, by [20, Theorem 2.7] it holds that

E[ e^{i⟨u, (X_1,…,X_m)⟩_{R^m}} ] = exp( ∫_R ∫_R [ cos(⟨u, x (g_ξ(z + i))_{i=0,…,m−1}⟩_{R^m}) − 1 ] ν(dx) dz ).
In principle, the asymptotic theory of Theorem 2.7 can be extended to this more general setting. However, the proof of the asymptotic normality relies on the existence of a continuous modification of the random field (G_u)_{u∈R^m_+} and on the behaviour of E[G_u²] in u ∈ R^m_+ (cf. Section 4.1), which requires a different treatment compared to the β-stable case.

3 Numerical analysis

In this section we will demonstrate the finite sample performance of our estimator for three examples, which are supposed to highlight different aspects of the minimal contrast approach. First, we will consider the linear fractional stable motion (cf. Example 2.3) and use m = 3 to estimate the three-dimensional parameter of the model. The second example is the generalized modulated OU process, which has not been shown to satisfy the main assumptions of the paper. We will use m = 2 to estimate the three-dimensional parametric model and test how our method works in this framework. The third model is the Ornstein–Uhlenbeck process considered in Example 2.2 with a fixed and known scale parameter σ. In this setting both m = 1 and m = 2 can be used to estimate the drift λ and the stability index β, and the aim of the numerical simulation is to test how the choice of a higher index m affects the performance of the estimator.

Since the weight function w depends on m implicitly via its domain, we need a function which is reasonably compatible between different dimensions, and we consider therefore throughout this study the m-dimensional Gaussian density with zero mean and scaled unit covariance matrix νI_m:

w_ν(u) = (2πν)^{−m/2} exp( −‖u‖²_{R^m} / (2ν) ),  u ∈ R^m,  ν > 0.  (3.1)

The choice of ν varies between the three example processes, and it is a subject for future research to automatically determine an optimal weight.
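Integrals against the Gaussian weight (3.1) can be computed by standard quadrature; the study below uses Gauss–Laguerre nodes on the positive quadrant. As a simpler one-dimensional illustration over the whole real line (our own sketch, with a hypothetical function name), Gauss–Hermite quadrature matches the weight exactly after a change of variables:

```python
import numpy as np

def gaussian_weighted_integral(f, nu=1.0, n=40):
    # Gauss-Hermite rule: int e^{-x^2} h(x) dx ~ sum_i w_i h(x_i); substituting
    # u = sqrt(2 nu) x turns this into int f(u) w_nu(u) du, with w_nu as in (3.1), m = 1.
    x, w = np.polynomial.hermite.hermgauss(n)
    return np.sum(w * f(np.sqrt(2.0 * nu) * x)) / np.sqrt(np.pi)

# sanity check: the second moment of w_nu equals nu
second_moment = gaussian_weighted_integral(lambda u: u ** 2, nu=2.5)
```

Since the rule is exact for polynomials of degree up to 2n − 1, the sanity check reproduces ν up to floating-point error.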
For the computation of the weighted integral in (2.7) we use Gauss–Laguerre quadrature, which is a weighted sum of function values; the number of quadrature nodes will also vary depending on the process. We note additionally that the minimisation involved in computing the minimal contrast estimator at (2.7) has to be done numerically, and for this we use the method of [17], which requires picking a starting point, which naturally will depend on the example kernel at hand. Lastly, we remark that the β-norm of the kernel function is generally not known explicitly, hence the theoretical characteristic function is approximated as well. All tables in this section are based on at least 200 Monte Carlo repetitions.

3.1 Linear fractional stable motion

Recall from the discussion in Example 2.3 that it is prudent to take higher order increments, and we fix throughout k = 2. Moreover, to properly identify the parameters we consider the characteristic function of the three-dimensional joint distribution, hence m = 3. Next, we consider throughout the weight function at (3.1) with ν = 10, and the weighted integral is approximated with 1728 nodes. The starting point for the minimisation algorithm is (β, H, σ) = (1.…, …, …).

The estimator is tested in the continuous case, so only parameter combinations satisfying the inequality H − 1/β > 0 are considered. Table 1 reports the bias and standard deviation in the case of n = 1000 for different parameter combinations, while Table 2 explores the case n = 10 000. We observe a rather good performance of all estimators, with superior results in the setting n = 10 000, as expected from our theoretical statements. We note that the estimator of the scale parameter σ performs best, which is in line with earlier findings of [16].

Table 1:
Absolute value of bias (|Bias|) and standard deviation (Std) for n = 1000 and σ = 0.… for the linear fractional stable motion; columns β̂_n, Ĥ_n, σ̂_n under |Bias| and Std, rows indexed by (H, β). [Table values omitted.]

Table 2:
Absolute value of bias (|Bias|) and standard deviation (Std) for n = 10 000 and σ = 0.… for the linear fractional stable motion; columns β̂_n, Ĥ_n, σ̂_n under |Bias| and Std, rows indexed by (H, β). [Table values omitted.]

3.2 Generalized modulated OU process

The generalized modulated OU process is defined via equation (2.1) with kernel function

g_θ(s) = s^σ exp(−λs) 1_{(0,∞)}(s),  s ∈ R,

where θ = (σ, λ) ∈ (0, ∞)². We recall that this class of kernels has not been shown to satisfy the main assumption of the paper, but it is easily seen that m = 1 is not enough to identify the parameters in θ. We take m = 2 and increase the number of weights to 20 per dimension, so that the weighted integral approximation is based on 20² = 400 nodes. Moreover, the weight function is as in (3.1) with ν = 0.…. Lastly, we pick as starting point for the minimisation algorithm (β, λ, σ) = (1.…, …, …).

Tables 3 and 4 report the finite sample performance of the estimators for n = 10 000 and σ = 0.… and σ = 2, respectively. We observe a good performance of the estimator β̂_n and a very unsatisfactory performance of the estimator σ̂_n. We conjecture that the reason for the suboptimal performance lies in the choice of the weight function w, which may have opposite effects on different parameters of the model, as well as in the minimisation algorithm, since it has a tendency to get stuck in local minima.

Table 3:
Absolute value of bias (|Bias|) and standard deviation (Std) for n = 10 000 and σ = 0.… for the generalized modulated OU kernel; columns β̂_n, λ̂_n, σ̂_n under |Bias| and Std, rows indexed by (β, λ). [Table values omitted.]

Table 4:
Absolute value of bias (|Bias|) and standard deviation (Std) for n = 10 000 and σ = 2 for the generalized modulated OU kernel; columns β̂_n, λ̂_n, σ̂_n under |Bias| and Std, rows indexed by (β, λ). [Table values omitted.]

3.3 Ornstein–Uhlenbeck

In this subsection we consider the Ornstein–Uhlenbeck kernel from Example 2.2 with σ = 1 being fixed and known. In this case Assumption (A) is satisfied for both m = 1 and m = 2, and we will compare the performance for each of these dimensions. Akin to Section 3.2 we pick 20^m, m = 1, 2, nodes in the integral approximation, with weight function chosen as in (3.1) with ν = 1. The starting point for the minimisation algorithm is throughout (β, λ) = (1.…, …).

Tables 5 and 6 demonstrate the simulation results for m = 1 and m = 2, respectively. We observe a rather convincing performance of both estimators in all settings, but the choice m = 1 clearly outperforms the setting m = 2. We conjecture that this has both a theoretical background, i.e. the asymptotic variances in Theorem 2.7(ii) are smaller for m = 1, and a numerical background. Indeed, the minimisation algorithm has a worse performance for higher values of m. For this reason it is advisable to use the minimal m which identifies the parameters of the model.

Table 5:
Absolute value of bias (|Bias|) and standard deviation (Std) of β̂_n, λ̂_n for m = 1 and n ∈ {1000, 10 000}. [Table values omitted.]

Table 6: Absolute value of bias (|Bias|) and standard deviation (Std) of β̂_n, λ̂_n for m = 2 and n ∈ {1000, 10 000}. [Table values omitted.]

4 Proofs
In this section C > 0 denotes a generic constant, which may change from line to line. Recall moreover the shorthand ξ = (β, θ) for the joint parameter.

4.1 The asymptotic Gaussian field

To characterise the covariance of the asymptotic Gaussian field (G_u)_{u∈R^m_+} we define a dependence measure between two m-dimensional stable vectors Y = (∫h_1 dL, …, ∫h_m dL) and Z = (∫g_1 dL, …, ∫g_m dL):

U_{Y,Z}(u, v) := E[ e^{i⟨(u,v),(Y,Z)⟩_{R^{2m}}} ] − E[ e^{i⟨u,Y⟩_{R^m}} ] E[ e^{i⟨v,Z⟩_{R^m}} ],  u, v ∈ R^m.

This is a straightforward multivariate extension of the measure defined in [19]. We now apply Theorem 1.1 in conjunction with the smooth and bounded functions

f_u(x) = cos(⟨u, x⟩_{R^m}),  u, x ∈ R^m,

such that we obtain the finite dimensional convergence of the processes:

( √n( φ_n(u) − φ_ξ(u) ) )_{u∈R^m_+} →^{fidi} (G_u)_{u∈R^m_+}  as n → ∞.

Let Z_0 = (X_1, …, X_m) and Z_ℓ = (X_{ℓ+1}, …, X_{m+ℓ}); then the covariance function R: R^m × R^m → R of G is, cf. (1.4), given by

R(u, v) = Σ_{ℓ∈Z} r_ℓ(u, v),  where for ℓ ∈ Z:  r_ℓ(u, v) = Cov( cos(⟨u, Z_0⟩), cos(⟨v, Z_ℓ⟩) ),  u, v ∈ R^m.

We will now prove that there exists a version of G which is locally Hölder continuous up to any order less than β/2. By Kolmogorov's criterion and Gaussianity it is enough to prove that for any T > 0 there exists a constant C_T ≥ 0 such that

E[ (G_u − G_v)² ] ≤ C_T ‖u − v‖^β  for all u, v ∈ [0, T]^m,  (4.1)

where ‖u − v‖ = Σ_{i=1}^m |u_i − v_i| denotes the ℓ¹-norm throughout the rest of this paper. To prove (4.1) note the decomposition

E[ (G_u − G_v)² ] = R(u, u) − R(u, v) + R(v, v) − R(u, v).

Hence by symmetry it suffices to consider the term

R(u, u) − R(u, v) = Σ_{ℓ∈Z} ( r_ℓ(u, u) − r_ℓ(u, v) ),

and we show that each summand r_ℓ(u, u) − r_ℓ(u, v) is both β-Hölder in (u, v) and summable in ℓ.
Using the standard identity $\cos(x) = (e^{\mathrm{i}x} + e^{-\mathrm{i}x})/2$ and the symmetry of $L$ we deduce the identity
$$
r_\ell(u,u) - r_\ell(u,v) = \tfrac{1}{2}\big[ U_{Z_0,Z_\ell}(u,-u) - U_{Z_0,Z_\ell}(u,-v) \big] + \tfrac{1}{2}\big[ U_{Z_0,Z_\ell}(u,u) - U_{Z_0,Z_\ell}(u,v) \big] .
$$
The two terms in the square brackets are treated very similarly, so we consider only the first one. Before diving into the tedious calculations we recall the following inequalities for $x, y \in \mathbb{R}$:
$$
| e^{-x} - e^{-y} | \le | x - y | \quad \text{if } x, y \ge 0, \tag{4.2}
$$
$$
| x + y |^\beta \le | x |^\beta + | y |^\beta \quad \text{for } \beta \in (0,1], \tag{4.3}
$$
$$
\big| | x |^\beta - | y |^\beta \big| \le | x - y |^\beta \quad \text{for } \beta \in (0,1], \tag{4.4}
$$
$$
\big| | x + y |^\beta - | x |^\beta - | y |^\beta \big| \le 2 \, | x y |^{\beta/2} \quad \text{for } \beta \in (0,2] . \tag{4.5}
$$
Define additionally the two quantities
$$
\rho_i = \int_{\mathbb{R}} | g_\xi(x) \, g_\xi(x+i) |^{\beta/2} \, \mathrm{d}x \qquad \text{and} \qquad \mu_i = \int_{-m}^{\infty} | g_\xi(x+i) |^\beta \, \mathrm{d}x, \qquad i \in \mathbb{Z} .
$$
We shall need the following lemma.
Lemma 4.1. Let $i \in \mathbb{N}$. Then it holds:
(i) $\rho_i \le C \, i^{-\alpha\beta/2}$.
(ii) If $i > m$ then $\mu_i \le C \, (i-m)^{1-\alpha\beta}$.

Proof. (i) follows as in [6, Lemma 4.1]. For (ii) note that if $i > m$ then $x + i > 0$ for any $x > -m$, so according to assumption (1.2),
$$
\mu_i \le C \int_{-m}^{\infty} (x+i)^{-\alpha\beta} \, \mathrm{d}x = \frac{C}{\alpha\beta - 1} \, (i-m)^{1-\alpha\beta},
$$
where we used that $\alpha\beta > 1$. $\Box$

Using the expression for the characteristic function of a symmetric $\beta$-stable random variable we decompose as follows:
$$
U_{Z_0,Z_\ell}(u,-u) - U_{Z_0,Z_\ell}(u,-v)
= \exp\Big( -\Big\| \sum_{i=1}^m u_i \big( g_\xi(i-\cdot) - g_\xi(i+\ell-\cdot) \big) \Big\|_\beta^\beta \Big) - \exp\Big( -2 \Big\| \sum_{i=1}^m u_i \, g_\xi(i-\cdot) \Big\|_\beta^\beta \Big)
$$
$$
\quad - \Big[ \exp\Big( -\Big\| \sum_{i=1}^m \big( u_i \, g_\xi(i-\cdot) - v_i \, g_\xi(i+\ell-\cdot) \big) \Big\|_\beta^\beta \Big) - \exp\Big( -\Big\| \sum_{i=1}^m u_i \, g_\xi(i-\cdot) \Big\|_\beta^\beta - \Big\| \sum_{i=1}^m v_i \, g_\xi(i+\ell-\cdot) \Big\|_\beta^\beta \Big) \Big]
$$
$$
= \exp\Big( \Big\| \sum_{i=1}^m u_i \, g_\xi(i-\cdot) \Big\|_\beta^\beta \Big) \Big[ \exp\Big( -\Big\| \sum_{i=1}^m u_i \, g_\xi(i-\cdot) \Big\|_\beta^\beta \Big) - \exp\Big( -\Big\| \sum_{i=1}^m v_i \, g_\xi(i-\cdot) \Big\|_\beta^\beta \Big) \Big]
$$
$$
\qquad \times \Big[ \exp\Big( -\Big\| \sum_{i=1}^m u_i \big( g_\xi(i-\cdot) - g_\xi(i+\ell-\cdot) \big) \Big\|_\beta^\beta \Big) - \exp\Big( -2 \Big\| \sum_{i=1}^m u_i \, g_\xi(i-\cdot) \Big\|_\beta^\beta \Big) \Big]
$$
$$
\quad + \exp\Big( -\Big\| \sum_{i=1}^m u_i \, g_\xi(i-\cdot) \Big\|_\beta^\beta - \Big\| \sum_{i=1}^m v_i \, g_\xi(i-\cdot) \Big\|_\beta^\beta \Big)
$$
$$
\qquad \times \Big[ \exp\Big( -\Big\| \sum_{i=1}^m u_i \big( g_\xi(i-\cdot) - g_\xi(i+\ell-\cdot) \big) \Big\|_\beta^\beta + 2 \Big\| \sum_{i=1}^m u_i \, g_\xi(i-\cdot) \Big\|_\beta^\beta \Big)
- \exp\Big( -\Big\| \sum_{i=1}^m \big( u_i \, g_\xi(i-\cdot) - v_i \, g_\xi(i+\ell-\cdot) \big) \Big\|_\beta^\beta + \Big\| \sum_{i=1}^m u_i \, g_\xi(i-\cdot) \Big\|_\beta^\beta + \Big\| \sum_{i=1}^m v_i \, g_\xi(i-\cdot) \Big\|_\beta^\beta \Big) \Big]
$$
$$
=: r_\ell^1(u,v) + r_\ell^2(u,v),
$$
where we used that $\| \sum_i v_i \, g_\xi(i+\ell-\cdot) \|_\beta^\beta = \| \sum_i v_i \, g_\xi(i-\cdot) \|_\beta^\beta$ by translation invariance of the Lebesgue measure.

For the first term, $r_\ell^1$, we notice that the exponential term in front is bounded in $u \in [0,T]^m$ (and of course in $\ell \in \mathbb{Z}$ as well); hence by (4.2),
$$
r_\ell^1(u,v) \le C_T \, \bigg| \Big\| \sum_{i=1}^m u_i \, g_\xi(i-\cdot) \Big\|_\beta^\beta - \Big\| \sum_{i=1}^m v_i \, g_\xi(i-\cdot) \Big\|_\beta^\beta \bigg| \times \bigg| \Big\| \sum_{i=1}^m u_i \big( g_\xi(i-\cdot) - g_\xi(i+\ell-\cdot) \big) \Big\|_\beta^\beta - 2 \Big\| \sum_{i=1}^m u_i \, g_\xi(i-\cdot) \Big\|_\beta^\beta \bigg| .
$$
The first absolute-value term will give the Hölder continuity of order $\beta/2$ and the second will ensure summability in $\ell$. For the first term we may bound as follows in the case $\beta \in (0,1]$, using (4.4) and (4.3):
$$
\bigg| \Big\| \sum_{i=1}^m u_i \, g_\xi(i-\cdot) \Big\|_\beta^\beta - \Big\| \sum_{i=1}^m v_i \, g_\xi(i-\cdot) \Big\|_\beta^\beta \bigg|
\le \int_{\mathbb{R}} \Big( \sum_{i=1}^m | u_i - v_i | \, | g_\xi(i-x) | \Big)^\beta \mathrm{d}x
\le \| u - v \|^\beta \sum_{i=1}^m \int_{\mathbb{R}} | g_\xi(i-x) |^\beta \, \mathrm{d}x
\le C_T \, \| u - v \|^{\beta/2},
$$
where the last step uses that $\| u - v \|$ is bounded on $[0,T]^m$. If instead $\beta > 1$, then the map $u \mapsto \| \sum_{i=1}^m u_i \, g_\xi(i-\cdot) \|_\beta^\beta$ is continuously differentiable; hence by the mean value theorem it is Hölder continuous of any order less than or equal to $1$, and since $\beta \in (0,2)$, Hölder continuity of order $\beta/2$ then holds. For the second absolute-value term it follows by (4.5) and (4.3) that
$$
\bigg| \Big\| \sum_{i=1}^m u_i \big( g_\xi(i-\cdot) - g_\xi(i+\ell-\cdot) \big) \Big\|_\beta^\beta - \Big\| \sum_{i=1}^m u_i \, g_\xi(i-\cdot) \Big\|_\beta^\beta - \Big\| {-\sum_{i=1}^m u_i \, g_\xi(i+\ell-\cdot)} \Big\|_\beta^\beta \bigg|
\le 2 \Big\| \Big( \sum_{i=1}^m u_i \, g_\xi(i-\cdot) \Big) \Big( \sum_{k=1}^m u_k \, g_\xi(k+\ell-\cdot) \Big) \Big\|_{\beta/2}^{\beta/2}
\le 2 \, T^\beta \sum_{i,k=1}^m \rho_{\ell+k-i},
$$
which is summable in $\ell$ by Lemma 4.1 and the assumption $\alpha\beta > 2$. We now turn our attention to the more complicated second term $r_\ell^2(u,v)$.
Utilising (4.2) we have that
$$
r_\ell^2(u,v) \le \bigg| \Big\| \sum_{i=1}^m u_i \big( g_\xi(i-\cdot) - g_\xi(i+\ell-\cdot) \big) \Big\|_\beta^\beta - 2 \Big\| \sum_{i=1}^m u_i \, g_\xi(i-\cdot) \Big\|_\beta^\beta + \Big\| \sum_{i=1}^m v_i \, g_\xi(i-\cdot) \Big\|_\beta^\beta + \Big\| \sum_{i=1}^m u_i \, g_\xi(i-\cdot) \Big\|_\beta^\beta - \Big\| \sum_{i=1}^m \big( u_i \, g_\xi(i-\cdot) - v_i \, g_\xi(i+\ell-\cdot) \big) \Big\|_\beta^\beta \bigg|
$$
$$
= \bigg| \int_{-m}^{\infty} \Big[ \Big| \sum_{i=1}^m u_i \big( g_\xi(x+i) - g_\xi(x+i+\ell) \big) \Big|^\beta - \Big| \sum_{i=1}^m \big( u_i \, g_\xi(x+i) - v_i \, g_\xi(x+i+\ell) \big) \Big|^\beta \Big] + \Big[ \Big| \sum_{i=1}^m v_i \, g_\xi(x+i+\ell) \Big|^\beta - \Big| \sum_{i=1}^m u_i \, g_\xi(x+i+\ell) \Big|^\beta \Big] \, \mathrm{d}x \bigg|
$$
$$
\le \int_{-m}^{\infty} \bigg| \Big| \sum_{i=1}^m u_i \big( g_\xi(x+i) - g_\xi(x+i+\ell) \big) \Big|^\beta - \Big| \sum_{i=1}^m \big( u_i \, g_\xi(x+i) - v_i \, g_\xi(x+i+\ell) \big) \Big|^\beta \bigg| \, \mathrm{d}x
+ \int_{-m}^{\infty} \bigg| \Big| \sum_{i=1}^m v_i \, g_\xi(x+i+\ell) \Big|^\beta - \Big| \sum_{i=1}^m u_i \, g_\xi(x+i+\ell) \Big|^\beta \bigg| \, \mathrm{d}x
=: r_\ell^{2,1}(u,v) + r_\ell^{2,2}(u,v) .
$$
We deal first with the second term $r_\ell^{2,2}$. If $\beta \in (0,1]$, then by (4.4) and (4.3),
$$
r_\ell^{2,2}(u,v) \le \int_{-m}^{\infty} \Big| \sum_{i=1}^m (u_i - v_i) \, g_\xi(x+i+\ell) \Big|^\beta \, \mathrm{d}x \le \| u - v \|^\beta \sum_{i=1}^m \mu_{i+\ell},
$$
and by Lemma 4.1(ii) we obtain a bound which is summable in $\ell > m$. If instead $\beta \in (1,2)$, the map
$$
h(u) = \int_{-m}^{\infty} \Big| \sum_{i=1}^m u_i \, g_\xi(x+i+\ell) \Big|^\beta \, \mathrm{d}x, \qquad u \in \mathbb{R}^m,
$$
is continuously differentiable and the absolute value of its derivative is bounded as follows for any $u \in [0,T]^m$ and $\ell > m$:
$$
\Big| \frac{\partial}{\partial u_k} h(u) \Big| \le \beta \int_{-m}^{\infty} \Big| \sum_{i=1}^m u_i \, g_\xi(x+i+\ell) \Big|^{\beta-1} | g_\xi(x+k+\ell) | \, \mathrm{d}x
\le \beta \, T^{\beta-1} \sum_{i=1}^m \int_{-m}^{\infty} | g_\xi(x+i+\ell) |^{\beta-1} \, | g_\xi(x+k+\ell) | \, \mathrm{d}x
\le C \, T^{\beta-1} m \, (\ell-m)^{1-\alpha\beta},
$$
where we have argued as in Lemma 4.1(ii) in the last inequality. Hence, in the case $\beta \in (1,2)$ we obtain by the mean value theorem
$$
r_\ell^{2,2}(u,v) \le \sup_{z \in [0,T]^m} \| \nabla h(z) \| \, \| u - v \| \le C_T \, (\ell-m)^{1-\alpha\beta} \, \| u - v \|,
$$
and as $\alpha\beta > 2$ we have obtained a bound summable in $\ell$. It remains to consider the term $r_\ell^{2,1}$.
Here it follows from the inequality $\big| |x|^\beta - |y|^\beta \big| \le | x^2 - y^2 |^{\beta/2}$, valid for $\beta \in (0,2]$, and the triangle inequality that the integrand of $r_\ell^{2,1}$ is bounded by
$$
\bigg| \Big| \sum_{i=1}^m u_i \big( g_\xi(x+i) - g_\xi(x+i+\ell) \big) \Big|^\beta - \Big| \sum_{i=1}^m \big( u_i \, g_\xi(x+i) - v_i \, g_\xi(x+i+\ell) \big) \Big|^\beta \bigg|
$$
$$
\le \bigg| \sum_{i,k=1}^m u_i u_k \big( g_\xi(x+i) - g_\xi(x+i+\ell) \big) \big( g_\xi(x+k) - g_\xi(x+k+\ell) \big) - \big( u_i \, g_\xi(x+i) - v_i \, g_\xi(x+i+\ell) \big) \big( u_k \, g_\xi(x+k) - v_k \, g_\xi(x+k+\ell) \big) \bigg|^{\beta/2}
$$
$$
= \bigg| \sum_{i,k=1}^m \Big[ (u_i u_k - v_i v_k) \, g_\xi(x+i+\ell) \, g_\xi(x+k+\ell) + u_i (v_k - u_k) \, g_\xi(x+i) \, g_\xi(x+k+\ell) + u_k (v_i - u_i) \, g_\xi(x+i+\ell) \, g_\xi(x+k) \Big] \bigg|^{\beta/2}
$$
$$
\le C_T \, \| u - v \|^{\beta/2} \sum_{i,k=1}^m \Big[ | g_\xi(x+i+\ell) \, g_\xi(x+k+\ell) |^{\beta/2} + | g_\xi(x+i) \, g_\xi(x+k+\ell) |^{\beta/2} \Big] .
$$
Integrating over $x$ and arguing as in Lemma 4.1 we conclude that
$$
r_\ell^{2,1}(u,v) \le C_T \, \| u - v \|^{\beta/2} \Big( (\ell-m)^{1-\alpha\beta} + \sum_{i,k=1}^m \rho_{\ell+k-i} \Big),
$$
which is summable in $\ell$ as $\alpha\beta > 2$.

Lastly, we shall prove that $(G_u)_{u \in \mathbb{R}^m_+}$ has paths in $L^2_w(\mathbb{R}^m_+)$ almost surely, such that $\int_{\mathbb{R}^m_+} G_u^2 \, w(u) \, \mathrm{d}u$ is well-defined. A sufficient criterion for this is $\int_{\mathbb{R}^m_+} \operatorname{Var}[G_u] \, w(u) \, \mathrm{d}u < \infty$, since $G$ is centred. For this we need to study $r_\ell(u,u)$ again. Recall that
$$
r_\ell(u,u) = \tfrac{1}{2} \big( U_{Z_0,Z_\ell}(u,-u) + U_{Z_0,Z_\ell}(u,u) \big) .
$$
As both terms are treated almost identically, it suffices to consider the first one. Here it follows from the inequality $| e^x - 1 | \le e^{|x|} | x |$, $x \in \mathbb{R}$, and (4.5) that
$$
| U_{Z_0,Z_\ell}(u,-u) |
= \bigg| \exp\Big( -\Big\| \sum_{i=1}^m u_i \big( g_\xi(i-\cdot) - g_\xi(i+\ell-\cdot) \big) \Big\|_\beta^\beta \Big) - \exp\Big( -2 \Big\| \sum_{i=1}^m u_i \, g_\xi(i-\cdot) \Big\|_\beta^\beta \Big) \bigg|
$$
$$
\le \exp\Big( -2 \Big\| \sum_{i=1}^m u_i \, g_\xi(i-\cdot) \Big\|_\beta^\beta \Big) \times \bigg| \Big\| \sum_{i=1}^m u_i \big( g_\xi(i-\cdot) - g_\xi(i+\ell-\cdot) \big) \Big\|_\beta^\beta - 2 \Big\| \sum_{i=1}^m u_i \, g_\xi(i-\cdot) \Big\|_\beta^\beta \bigg|
\times \exp\bigg( \bigg| \Big\| \sum_{i=1}^m u_i \big( g_\xi(i-\cdot) - g_\xi(i+\ell-\cdot) \big) \Big\|_\beta^\beta - 2 \Big\| \sum_{i=1}^m u_i \, g_\xi(i-\cdot) \Big\|_\beta^\beta \bigg| \bigg)
$$
$$
\le \exp\Big( -2 \Big\| \sum_{i=1}^m u_i \, g_\xi(i-\cdot) \Big\|_\beta^\beta + 2 \Big\| \Big( \sum_{i=1}^m u_i \, g_\xi(i-\cdot) \Big) \Big( \sum_{i=1}^m u_i \, g_\xi(i+\ell-\cdot) \Big) \Big\|_{\beta/2}^{\beta/2} \Big) \times 2 \, \| u \|^\beta \sum_{i,k=1}^m \rho_{\ell+k-i}
\le 2 \, \| u \|^\beta \sum_{i,k=1}^m \rho_{\ell+k-i},
$$
where we have used the Cauchy–Schwarz inequality in the last line. Summing over $\ell$ yields an element of $L^1_w(\mathbb{R}^m_+)$ by the assumption on the weight function $w$.

In Section 4.1 we saw that the empirical characteristic functions, suitably scaled and centred, converge to a Gaussian process in the finite-dimensional sense. We wish to extend this convergence to integrals of our processes. For this we need to extend [14, Lemma 1] to a multivariate setting. For $x \in \mathbb{R}$ let $\lfloor x \rfloor$ denote the largest integer $l$ such that $l \le x$, and for a vector $u = (u_1, \dots, u_m) \in \mathbb{R}^m$ we let $\lfloor u \rfloor = (\lfloor u_1 \rfloor, \dots, \lfloor u_m \rfloor)$.

Lemma 4.2. Let $(Y^n_u)_{u \in \mathbb{R}^m_+}$ and $(Y_u)_{u \in \mathbb{R}^m_+}$ be continuous random fields with $Y^n \xrightarrow{\mathrm{fidi}} Y$. Assume that $\int_{\mathbb{R}^m_+} \mathbb{E}[|Y^n_u|] \, \mathrm{d}u < \infty$ and $\int_{\mathbb{R}^m_+} \mathbb{E}[|Y_u|] \, \mathrm{d}u < \infty$, and set for $k, \ell, n \in \mathbb{N}$
$$
X_{n,k,\ell} = \int_{[0,\ell]^m} Y^n_{\lfloor u k \rfloor / k} \, \mathrm{d}u \qquad \text{and} \qquad X_{n,\ell} = \int_{[0,\ell]^m} Y^n_u \, \mathrm{d}u .
$$
Suppose that
$$
\lim_{\ell \to \infty} \limsup_{n \to \infty} \int_{\mathbb{R}^{m-1-i}_+} \int_{\ell}^{\infty} \int_{\mathbb{R}^i_+} \mathbb{E}[|Y^n_u|] \, \mathrm{d}u = 0, \qquad
\lim_{k \to \infty} \limsup_{n \to \infty} \mathbb{P}\big( | X_{n,k,\ell} - X_{n,\ell} | > \varepsilon \big) = 0,
$$
where the first convergence holds for all $i \in \{0, \dots, m-1\}$ and the latter for all $\varepsilon, \ell > 0$. Then convergence in distribution holds:
$$
\int_{\mathbb{R}^m_+} Y^n_u \, \mathrm{d}u \xrightarrow{\ \mathcal{L}\ } \int_{\mathbb{R}^m_+} Y_u \, \mathrm{d}u \qquad \text{for } n \to \infty .
$$
Proof.
Observe for each $\ell > 0$ the decomposition
$$
\int_{\mathbb{R}^m_+} Y^n_u \, \mathrm{d}u = X_{n,k,\ell} + ( X_{n,\ell} - X_{n,k,\ell} ) + \sum_{i=0}^{m-1} \int_{\mathbb{R}^{m-1-i}_+} \int_{\ell}^{\infty} \int_{[0,\ell]^i} Y^n_u \, \mathrm{d}u .
$$
Conclude now as in [14, Lemma 1]. $\Box$
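To see concretely what the discretisation $X_{n,k,\ell}$ in Lemma 4.2 does: the field $u \mapsto Y^n_{\lfloor uk \rfloor / k}$ is constant on each grid cell of side $1/k$, so for $\ell k \in \mathbb{N}$ its integral over $[0,\ell]^m$ is exactly $k^{-m}$ times the sum of the field over the grid points. The following deterministic sketch is ours ($m = 2$, with a smooth toy field standing in for $Y^n$) and shows $X_{n,k,\ell} \to X_{n,\ell}$ as the grid refines:

```python
import math

def grid_integral(f, ell, k):
    """Integral over [0, ell]^2 of f evaluated at floor(u*k)/k: the field is
    constant on each grid cell of side 1/k, so this sum is an exact evaluation."""
    pts = [j / k for j in range(int(ell * k))]
    return sum(f(x, y) for x in pts for y in pts) / k ** 2

# toy continuous field Y_u = exp(-(u1 + u2)); exact integral over [0, ell]^2
f = lambda x, y: math.exp(-(x + y))
ell = 4.0
exact = (1.0 - math.exp(-ell)) ** 2

for k in (1, 4, 16, 64):
    approx = grid_integral(f, ell, k)
    print(k, abs(approx - exact))   # error shrinks as the grid refines
```

This is only the deterministic half of the lemma; the probabilistic content lies in controlling the tail integrals and the difference $X_{n,k,\ell} - X_{n,\ell}$ uniformly in $n$.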
First, $\xi_n \xrightarrow{\mathrm{a.s.}} \xi_0$ follows by standard arguments which in particular use Assumption (A), see e.g. [14], where one uses that
$$
\| \varphi_n - \varphi_{\xi_0} \|_w \xrightarrow{\mathrm{a.s.}} 0 \qquad \text{as } n \to \infty, \tag{4.6}
$$
which is a direct consequence of (2.2) and Lebesgue's dominated convergence theorem. To derive the central limit theorem for the estimator, we consider instead the first-order condition
$$
\nabla_\xi F(\varphi, \xi) = 0, \qquad \varphi \in L^2_w(\mathbb{R}^m_+), \ \xi \in \Xi,
$$
which is satisfied at $(\varphi_{\xi_0}, \xi_0)$. The problem may now be viewed from an implicit-function point of view. To this end we recall the implicit function theorem on general Banach spaces. Consider a Fréchet differentiable map $g : U_1 \times U_2 \to B_3$, where $U_1$ and $U_2$ are open subsets of the Banach spaces $B_1$ and $B_2$, respectively, and $B_3$ is an additional Banach space. Let $D_i^{h_i} g(p_1, p_2)$, $i \in \{1,2\}$, denote the partial derivatives at the point $(p_1, p_2) \in U_1 \times U_2$ in the direction $h_i \in B_i$. If $(p_1, p_2) \in U_1 \times U_2$ is a point such that $g(p_1, p_2) = 0$ and the map $h \mapsto D_2^{h} g(p_1, p_2) : B_2 \to B_3$ is a continuous and invertible function, then there exist open subsets $V_1 \subseteq U_1$ and $V_2 \subseteq U_2$ such that $(p_1, p_2) \in V_1 \times V_2$, and a Fréchet differentiable and bijective (implicit) function $\Phi : V_1 \to V_2$ such that, on $V_1 \times V_2$,
$$
g(p_1, p_2) = 0 \iff \Phi(p_1) = p_2 .
$$
In addition, the derivative is given by
$$
D^h \Phi(p) = -\big( D_2 \, g(p, \Phi(p)) \big)^{-1} \big( D_1^h \, g(p, \Phi(p)) \big), \qquad h \in B_1, \ p \in V_1 . \tag{4.7}
$$
As might be apparent, we shall consider the specific setup $g = \nabla_\xi F$, $B_1 = U_1 = L^2_w(\mathbb{R}^m_+)$, $U_2 = \Xi \subseteq B_2 = \mathbb{R}^{d+1}$. We note that Assumption (B)(2) ensures the existence and continuity of the first and second order derivatives of $F$. Moreover, Assumption (A)(4) yields the invertibility of the Hessian $\nabla^2_\xi F(\varphi_{\xi_0}, \xi_0)$. In this case $\Phi(\varphi_n) = \xi_n$ and $\Phi(\varphi_{\xi_0}) = \xi_0$. Hence, by Fréchet differentiability we find that
$$
\sqrt{n} \, (\xi_n - \xi_0) = \sqrt{n} \, \big( \Phi(\varphi_{\xi_0} + (\varphi_n - \varphi_{\xi_0})) - \Phi(\varphi_{\xi_0}) \big)
= D^{\sqrt{n}(\varphi_n - \varphi_{\xi_0})} \Phi(\varphi_{\xi_0}) + \sqrt{n} \, \| \varphi_n - \varphi_{\xi_0} \|_{w,2} \, R(\varphi_n - \varphi_{\xi_0}),
$$
where the remainder term satisfies $R(\varphi_n - \varphi_{\xi_0}) \xrightarrow{\mathrm{a.s.}} 0$ as $\| \varphi_n - \varphi_{\xi_0} \|_{w,2} \xrightarrow{\mathrm{a.s.}} 0$. Recalling the derivative (4.7) and the representation $F(\varphi, \xi) = \langle \varphi - \varphi_\xi, \varphi - \varphi_\xi \rangle_w$, it suffices to prove that
$$
\sqrt{n} \, \| \varphi_n - \varphi_{\xi_0} \|_{w,2} \xrightarrow{\mathcal{L}} \| G \|_{w,2}, \qquad
\big( \langle \partial_{\xi_i} \varphi_{\xi_0}, \sqrt{n} (\varphi_n - \varphi_{\xi_0}) \rangle_w \big)_{i=1,\dots,d+1} \xrightarrow{\mathcal{L}} \big( \langle \partial_{\xi_i} \varphi_{\xi_0}, G \rangle_w \big)_{i=1,\dots,d+1} .
$$
Note that since it is the same underlying sequence of processes, $\sqrt{n}(\varphi_n - \varphi_{\xi_0})$, it is for the last convergence enough to consider a fixed $i \in \{1, \dots, d+1\}$; indeed this requires little modification of Lemma 4.2. We focus on the last convergence, as the first is treated similarly. To use Lemma 4.2 it is sufficient to provide suitable moment estimates for
$$
Y^n_u = \partial_{\xi_i} \varphi_{\xi_0}(u) \, w(u) \, \sqrt{n} \, \big( \varphi_n(u) - \varphi_{\xi_0}(u) \big) =: h(u) \, G^n_u, \qquad u \in \mathbb{R}^m_+, \ n \in \mathbb{N} .
$$
Using arguments as in [3, Proposition 3.3] and the variance estimates from Section 4.1 we deduce that
$$
\mathbb{E}\big[ | Y^n_u |^2 \big] \le \big( \partial_{\xi_i} \varphi_{\xi_0}(u) \, w(u) \big)^2 \sum_{\ell \in \mathbb{Z}} | r_\ell(u,u) | \le C \, \| u \|^\beta \, \big( \partial_{\xi_i} \varphi_{\xi_0}(u) \, w(u) \big)^2 .
$$
Taking the square root we obtain a bound in $L^1(\mathbb{R}^m_+)$ of $\mathbb{E}[|Y^n_u|]$ by the Cauchy–Schwarz inequality, used together with Assumption (B)(2) and the fact that $u \mapsto \| u \|^\beta$ is an element of $L^1_w(\mathbb{R}^m_+)$. Hence the first condition of Lemma 4.2 is satisfied. The second condition is slightly more involved, but let $\ell > 0$ be given and consider any $u, v \in [0,\ell]^m$. Then
$$
\mathbb{E}\big[ | Y^n_u - Y^n_v |^2 \big]^{1/2} \le | h(u) - h(v) | \, \mathbb{E}\big[ (G^n_u)^2 \big]^{1/2} + | h(v) | \, \mathbb{E}\big[ (G^n_u - G^n_v)^2 \big]^{1/2} \le C_T \big( | h(u) - h(v) | + \| u - v \|^{\beta/4} \big),
$$
which by Markov's inequality yields the second condition of Lemma 4.2.

4.4 Proof of statements in Example 2.4

Consider the kernel $g_\theta(u) = \exp(-\theta_1 u - \theta_2 f(u)) \, 1_{(0,\infty)}(u)$ for $\theta = (\theta_1, \theta_2) \in (0,\infty)^2$, where $f$ is a bounded measurable $1$-periodic function which does not vanish except on a Lebesgue null set. Assume moreover that $f$ is either non-positive or non-negative. It is straightforward to see that in this case the one-dimensional characteristic function of $X_0$ does not determine the parameter $\theta$ uniquely. Consider instead the joint characteristic function $\varphi_{\beta,\theta}(u_1, u_2)$ of $(X_0, X_1)$ for the moving average $X$ with kernel $g_\theta$, which is given by
$$
\varphi_{\beta,\theta}(u_1, u_2) = \exp\big( -\| u_1 g_\theta + u_2 g_\theta(\cdot + 1) \|_\beta^\beta \big), \qquad u_1, u_2 \ge 0 .
$$
If $\varphi_{\beta,\theta} = \varphi_{\beta,\tilde\theta}$ for $\theta, \tilde\theta \in (0,\infty)^2$, then the $\beta$-norms must be equal. Recalling the generalised binomial theorem
$$
(x + y)^\beta = \sum_{k=0}^\infty \binom{\beta}{k} x^{\beta-k} y^k, \qquad x > y \ge 0,
$$
we may calculate these norms explicitly for $u_1 > u_2 \ge 0$:
$$
\| u_1 g_\theta + u_2 g_\theta(\cdot+1) \|_\beta^\beta
= u_2^\beta \int_0^1 e^{-\beta(\theta_1 x + \theta_2 f(x))} \, \mathrm{d}x + \int_0^\infty \sum_{k=0}^\infty \binom{\beta}{k} u_1^{\beta-k} u_2^k \exp\big( -(\beta-k)(\theta_1 x + \theta_2 f(x)) - k(\theta_1 (x+1) + \theta_2 f(x+1)) \big) \, \mathrm{d}x
$$
$$
= u_2^\beta \int_0^1 e^{-\beta(\theta_1 x + \theta_2 f(x))} \, \mathrm{d}x + \int_0^\infty \sum_{k=0}^\infty \binom{\beta}{k} u_1^{\beta-k} u_2^k \exp\big( -\beta(\theta_1 x + \theta_2 f(x)) - k \theta_1 \big) \, \mathrm{d}x
$$
$$
= u_2^\beta \int_0^1 e^{-\beta(\theta_1 x + \theta_2 f(x))} \, \mathrm{d}x + \big( u_1 + u_2 e^{-\theta_1} \big)^\beta \int_0^\infty e^{-\beta(\theta_1 x + \theta_2 f(x))} \, \mathrm{d}x,
$$
where we used the $1$-periodicity of $f$, and the last equality follows from the generalised binomial theorem since $u_1 > u_2 \ge u_2 e^{-\theta_1}$. Hence if $\varphi_{\beta,\theta} = \varphi_{\beta,\tilde\theta}$, then for all $u_1 > u_2 \ge 0$,
$$
1 = \frac{u_2^\beta \int_0^1 e^{-\beta(\theta_1 x + \theta_2 f(x))} \, \mathrm{d}x + (u_1 + u_2 e^{-\theta_1})^\beta \int_0^\infty e^{-\beta(\theta_1 x + \theta_2 f(x))} \, \mathrm{d}x}{u_2^\beta \int_0^1 e^{-\beta(\tilde\theta_1 x + \tilde\theta_2 f(x))} \, \mathrm{d}x + (u_1 + u_2 e^{-\tilde\theta_1})^\beta \int_0^\infty e^{-\beta(\tilde\theta_1 x + \tilde\theta_2 f(x))} \, \mathrm{d}x} .
$$
Inserting $u_1 = 1$ and $u_2 = 0$ yields the identity
$$
K := \int_0^\infty e^{-\beta(\theta_1 x + \theta_2 f(x))} \, \mathrm{d}x = \int_0^\infty e^{-\beta(\tilde\theta_1 x + \tilde\theta_2 f(x))} \, \mathrm{d}x;
$$
hence it suffices to prove that $\theta_1 = \tilde\theta_1$: once this is known, the identity $K = \tilde K$ forces $\theta_2 = \tilde\theta_2$, since $f$ is single-signed and non-vanishing, so $K$ is strictly monotone in $\theta_2$. Moreover, inserting the above identity in $\varphi_{\beta,\theta} = \varphi_{\beta,\tilde\theta}$ and differentiating with respect to $u_2$ gives that for all $u_1 > u_2 > 0$,
$$
\beta u_2^{\beta-1} \int_0^1 e^{-\beta(\theta_1 x + \theta_2 f(x))} \, \mathrm{d}x + \beta e^{-\theta_1} (u_1 + u_2 e^{-\theta_1})^{\beta-1} K
= \beta u_2^{\beta-1} \int_0^1 e^{-\beta(\tilde\theta_1 x + \tilde\theta_2 f(x))} \, \mathrm{d}x + \beta e^{-\tilde\theta_1} (u_1 + u_2 e^{-\tilde\theta_1})^{\beta-1} K .
$$
Comparing the growth of both sides in $u_1$ yields $\theta_1 = \tilde\theta_1$ when $\beta \ne 1$. (Similar considerations can be carried out for the Ornstein–Uhlenbeck kernel, albeit easier and more explicit.)

Let us additionally show that $u \mapsto \partial_{\theta_1} \varphi_\xi(u)$ and $u \mapsto \partial_{\theta_2} \varphi_\xi(u)$ are linearly independent if the $1$-periodic function $f$ is negative and bounded and $\beta \ne 1$. Due to their exponential form these derivatives are linearly independent if the following functions (note that we only have an explicit formula when $u_1 > u_2 \ge 0$) are linearly independent in $u_1 > u_2 \ge 0$:
$$
\partial_{\theta_1} \| u_1 g_\theta + u_2 g_\theta(\cdot+1) \|_\beta^\beta = -K_{\theta,1} \, u_2^\beta - K_{\theta,2} \, (u_1 + u_2 e^{-\theta_1})^{\beta-1} u_2 - K_{\theta,3} \, (u_1 + u_2 e^{-\theta_1})^\beta,
$$
$$
\partial_{\theta_2} \| u_1 g_\theta + u_2 g_\theta(\cdot+1) \|_\beta^\beta = K_{\theta,4} \, u_2^\beta + K_{\theta,5} \, (u_1 + u_2 e^{-\theta_1})^\beta,
$$
where the constants $K_{\theta,1}, \dots, K_{\theta,5}$ are strictly positive. Indeed, the only constants which are not in general positive are
$$
K_{\theta,4} = -\beta \int_0^1 f(x) \, e^{-\beta(\theta_1 x + \theta_2 f(x))} \, \mathrm{d}x \qquad \text{and} \qquad K_{\theta,5} = -\beta \int_0^\infty f(x) \, e^{-\beta(\theta_1 x + \theta_2 f(x))} \, \mathrm{d}x,
$$
but they are by our assumption $f < 0$. The main observation needed is that these functions are of different order in $u_1$ and that their leading constants are of opposite sign. Indeed, for $a, b \in \mathbb{R}$ we have that
$$
\Big( a \, \partial_{\theta_1} \| u_1 g_\theta + u_2 g_\theta(\cdot+1) \|_\beta^\beta + b \, \partial_{\theta_2} \| u_1 g_\theta + u_2 g_\theta(\cdot+1) \|_\beta^\beta \Big) \Big/ u_1^\beta \xrightarrow[u_1 \to \infty]{} -a K_{\theta,3} + b K_{\theta,5} .
$$
The constants $a K_{\theta,3}$ and $b K_{\theta,5}$ must then be the same, and we have the following major simplification:
$$
a \, \partial_{\theta_1} \| u_1 g_\theta + u_2 g_\theta(\cdot+1) \|_\beta^\beta + b \, \partial_{\theta_2} \| u_1 g_\theta + u_2 g_\theta(\cdot+1) \|_\beta^\beta = -( a K_{\theta,1} - b K_{\theta,4} ) \, u_2^\beta - a K_{\theta,2} \, (u_1 + u_2 e^{-\theta_1})^{\beta-1} u_2 .
$$
If β > then this is clearly unbounded in u , hence a = 0 , and therefore b = 0 as wellsince K θ, > . If β < then differentiating with respect to u yields the simple equation: aK θ, ( u + u ) β − u for all u > u ≥ ,which yields a = 0 and therefore b = 0 since again K θ, > . Recall the moving average kernel from (2.4). First, we show that the one-dimensionalcharacteristic function is not enough to idenitify θ = ( θ , θ ) . Indeed, we see that for twoparameters θ , ˜ θ ∈ (0 , ∞ ) equality of the one-dimensional characteristic functions gives θ β Γ( β + 1)( βθ ) β +1 = Z ∞ ( θ s exp( − θ s )) β d s = Z ∞ (˜ θ s exp( − ˜ θ s )) β d s = ˜ θ β Γ( β + 1)( β ˜ θ ) β +1 . (4.8)24e claim that the two-dimensional characteristic function is enough to identity θ . Forthis we recall the covariation between X and X , cf. [22, Section 2.7], which is uniquelydetermined by the distribution of ( X , X ) and hence by its joint characteristic function.If θ denotes the underlying parameter for the moving average X , then the covariation is,cf. [22, Proposition 3.5.2], [ X , X ] β = Z R g θ ( s + 1) g θ ( s ) β − d s = θ β Z ∞ ( s + 1)e − θ ( s +1) s β − e − ( β − θ s d s = θ β e − θ hZ ∞ s β e − βθ s d s + Z ∞ s β − e − βθ s d s i = θ β e − θ h Γ( β + 1)( βθ ) β +1 + Γ( β )( βθ ) β i = θ β Γ( β + 1)( βθ ) β +1 e − θ (1 + θ ) , (4.9)where we used the defining property: β Γ( β ) = Γ( β + 1) . Hence if θ and ˜ θ leads to the samedistribution of ( X , X ) , then combining the identities (4.8) and (4.9) yields (1 + θ )e − θ = (1 + ˜ θ )e − ˜ θ . It is straightforward to check that the function x (1 + x )e − x is strictly decreasing on (0 , ∞ ) , and therefore injective, which proves that θ = ˜ θ and therefore θ = ˜ θ as well, cf.(4.8). We consider a CARMA( , ) model of the form X t = Z t −∞ b ⊤ exp( A ( t − s )) e d L s , t ∈ R , where b = ( b , ⊤ , e = (0 , ⊤ , L is a symmetric β -stable Lévy process with β ∈ (1 , ,and A = (cid:18) − λ λ (cid:19) with λ < . 
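The two closed forms obtained above for the kernel $g_\theta(s) = \theta_1 s \, e^{-\theta_2 s}$ — the $\beta$-norm in (4.8) and the covariation in (4.9) — can be confirmed numerically. The following sketch is ours (standard library only; the parameter values are arbitrary): it integrates both sides with a composite trapezoidal rule and also checks the strict monotonicity of $x \mapsto (1+x)e^{-x}$ used in the injectivity step.

```python
import math

def trapezoid(f, a, b, n=100_000):
    # composite trapezoidal rule on [a, b]
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n)))

beta, t1, t2 = 1.5, 2.0, 0.7                 # arbitrary: beta in (1,2), theta = (t1, t2)
g = lambda s: t1 * s * math.exp(-t2 * s)     # the moving-average kernel

# (4.8): integral of g(s)^beta equals t1^beta * Gamma(beta+1) / (beta*t2)^(beta+1)
norm_lhs = trapezoid(lambda s: g(s) ** beta, 0.0, 60.0)
norm_rhs = t1 ** beta * math.gamma(beta + 1) / (beta * t2) ** (beta + 1)
print(norm_lhs, norm_rhs)

# (4.9): the covariation integral equals the same constant times exp(-t2)*(1+t2)
cov_lhs = trapezoid(lambda s: g(s + 1) * g(s) ** (beta - 1), 0.0, 60.0)
cov_rhs = norm_rhs * math.exp(-t2) * (1.0 + t2)
print(cov_lhs, cov_rhs)

# injectivity step: x -> (1+x)*exp(-x) is strictly decreasing on (0, inf)
vals = [(1.0 + 0.1 * i) * math.exp(-0.1 * i) for i in range(1, 100)]
print(all(a > b for a, b in zip(vals, vals[1:])))
```

Truncating the integrals at $s = 60$ is harmless here because the integrands decay exponentially.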
We further assume that $\theta = b_0 + \lambda > 0$; the corresponding kernel is then $g_\xi(x) = (1 + \theta x) \, e^{\lambda x} \, 1_{(0,\infty)}(x)$. Recall the definition of the incomplete gamma function:
$$
\Gamma(\beta; x) = \int_x^\infty y^{\beta-1} e^{-y} \, \mathrm{d}y, \qquad \beta, x > 0 .
$$
The following identity is due to partial integration: $\Gamma(\beta+1; x) = \beta \, \Gamma(\beta; x) + x^\beta e^{-x}$, or in other words,
$$
\Gamma(\beta; x) = \beta^{-1} \big( \Gamma(\beta+1; x) - x^\beta e^{-x} \big) . \tag{4.10}
$$
The one-dimensional characteristic function of $X_0$ uniquely determines the term
$$
\int_{\mathbb{R}} | g_\xi(x) |^\beta \, \mathrm{d}x = \int_0^\infty (1 + \theta x)^\beta \, e^{\lambda \beta x} \, \mathrm{d}x
= \big( \theta \, e^{-\lambda \theta^{-1}} \big)^\beta \int_{\theta^{-1}}^\infty y^\beta \, e^{\lambda \beta y} \, \mathrm{d}y
= -\frac{1}{\lambda \beta} \Big( \frac{-\theta \, e^{-\lambda \theta^{-1}}}{\lambda \beta} \Big)^\beta \, \Gamma\big( \beta+1; -\lambda \beta \theta^{-1} \big) =: c .
$$
Now, we compute the covariation $[X_1, X_0]_\beta$:
$$
[X_1, X_0]_\beta = \int_{\mathbb{R}} g_\xi(x+1) \, g_\xi(x)^{\beta-1} \, \mathrm{d}x
= \int_0^\infty \big( 1 + \theta (x+1) \big) \, e^{\lambda(x+1)} \, (1 + \theta x)^{\beta-1} \, e^{\lambda(\beta-1)x} \, \mathrm{d}x
$$
$$
= -\frac{1}{\lambda \beta} \Big( \frac{-\theta \, e^{-\lambda \theta^{-1}}}{\lambda \beta} \Big)^\beta \, e^{\lambda} \, \Big( \Gamma\big( \beta+1; -\lambda \beta \theta^{-1} \big) - \lambda \beta \, \Gamma\big( \beta; -\lambda \beta \theta^{-1} \big) \Big)
= e^{\lambda} \big( c \, (1 - \lambda) - \beta^{-1} \big),
$$
where we used the formula (4.10). Since $c$ is uniquely determined, the quantity $[X_1, X_0]_\beta$ identifies the parameter $\lambda$: the derivative of $\lambda \mapsto e^{\lambda} ( c (1-\lambda) - \beta^{-1} )$ equals $e^{\lambda} ( -c\lambda - \beta^{-1} )$, and $-c\lambda - \beta^{-1} > 0$ because $c > \int_0^\infty e^{\lambda \beta x} \, \mathrm{d}x = -1/(\lambda\beta)$; in particular this derivative is never equal to $0$.

Acknowledgement
The authors acknowledge financial support from the project "Ambit fields: probabilistic properties and statistical inference" funded by Villum Fonden.
References

[1] Beth Andrews, Matthew Calder and Richard A. Davis. Maximum likelihood estimation for α-stable autoregressive processes. The Annals of Statistics, 37(4):1946–1982, 2009.
[2] Antoine Ayache and Julien Hamonier. Linear fractional stable motion: A wavelet estimator of the α parameter. Statistics & Probability Letters, 82(8):1569–1575, 2012.
[3] Ehsan Azmoodeh, Mathias Mørck Ljungdahl and Christoph Thäle. Multi-dimensional normal approximation of heavy-tailed moving averages. 2020. arXiv:2002.11335
[4] Andreas Basse-O'Connor, Claudio Heinrich and Mark Podolskij. On limit theory for functionals of stationary increments Lévy driven moving averages. Electronic Journal of Probability, 24(79):1–42, 2019.
[5] Andreas Basse-O'Connor, Raphaël Lachièze-Rey and Mark Podolskij. Power variation for a class of stationary increments Lévy driven moving averages. The Annals of Probability, 45(6B):4477–4528, 2017.
[6] Andreas Basse-O'Connor, Mark Podolskij and Christoph Thäle. A Berry–Esseen theorem for partial sums of functions of heavy-tailed moving averages. 2019. arXiv:1904.06065
[7] Peter J. Brockwell, Richard A. Davis and Yu Yang. Estimation for non-negative Lévy-driven CARMA processes. Journal of Business & Economic Statistics, 29(2):250–259, 2011.
[8] Stamatis Cambanis, Clyde D. Hardin Jr. and Aleksander Weron. Ergodic properties of stationary stable processes. Stochastic Processes and Their Applications, 24:1–18, 1987.
[9] R. Dahlhaus. Efficient parameter estimation for self-similar processes. The Annals of Statistics, 17:1749–1766, 1989.
[10] T. T. N. Dang and J. Istas. Estimation of the Hurst and the stability indices of a H-self-similar stable process. Electronic Journal of Statistics, 11:4103–4150, 2017.
[11] Danijel Grahovac, Nikolai N. Leonenko and Murad S. Taqqu. Scaling properties of the empirical structure function of linear fractional stable motion and estimation of its parameters. Journal of Statistical Physics, 158(1):105–119, 2015.
[12] Marek Kanter. The L^p norm of sums of translates of a function. Transactions of the American Mathematical Society, 179:35–47, 1973.
[13] Shiqing Ling. Self-weighted least absolute deviation estimation for infinite variance autoregressive models. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 67(3):381–393, 2005.
[14] Mathias Mørck Ljungdahl and Mark Podolskij. A note on parametric estimation of Lévy moving average processes. Springer Proceedings in Mathematics & Statistics, 294:41–56, 2019.
[15] Mathias Mørck Ljungdahl and Mark Podolskij. A minimal contrast estimator for the linear fractional stable motion. Statistical Inference for Stochastic Processes, 23:381–413, 2020.
[16] Stepan Mazur, Dmitry Otryakhin and Mark Podolskij. Estimation of the linear fractional stable motion. Bernoulli, 26(1):226–252, 2020.
[17] J. A. Nelder and R. Mead. A simplex method for function minimization. The Computer Journal, 7(4):308–313, 1965.
[18] Vladas Pipiras and Murad S. Taqqu. Central limit theorems for partial sums of bounded functionals of infinite-variance moving averages. Bernoulli, 9(5):833–855, 2003.
[19] Vladas Pipiras, Murad S. Taqqu and Patrice Abry. Bounds for the covariance of functions of infinite variance stable random variables with applications to central limit theorems and wavelet-based estimation. Bernoulli, 13(4):1091–1123, 2007.
[20] Balram S. Rajput and Jan Rosinski. Spectral representations of infinitely divisible processes. Probability Theory and Related Fields, 82:451–487, 1989.
[21] Jan Rosinski. On uniqueness of the spectral representation of stable processes. Journal of Theoretical Probability, 7(3):615–634, 1994.
[22] Gennady Samorodnitsky and Murad S. Taqqu. Stable Non-Gaussian Random Processes: Stochastic Models with Infinite Variance. CRC Press, 2000.
[23] Maxwell B. Stinchcombe and Halbert White. Some measurability results for extrema of random functions over random sets.