OPTIMAL STABLE ORNSTEIN-UHLENBECK REGRESSION
HIROKI MASUDA
Abstract.
We prove some efficient inference results concerning estimation of an Ornstein-Uhlenbeck regression model, which is driven by a non-Gaussian stable Lévy process and whose output process is observed at high frequency over a fixed time period. Local asymptotics for the likelihood function is presented, followed by a way to construct an asymptotically efficient estimator through a suboptimal yet very simple preliminary estimator, which enables us to bypass not only numerical optimization of the likelihood function, but also the multiple-root problem.

1. Introduction
The Ornstein-Uhlenbeck (OU) model has been used in a wide variety of applications, such as electric-power consumption modeling [15], [1], [19], and [20], ecology [10], and protein dynamics modeling [3], to mention a few. In this paper, we consider the following OU regression model:
\[
(1.1)\qquad Y_t = Y_0 + \int_0^t (\mu\cdot X_s - \lambda Y_s)\,ds + \sigma J_t, \qquad t\in[0,T],
\]
where $J$ is the symmetric $\beta$-stable (càdlàg) Lévy process characterized by
\[
E\big(e^{iuJ_t}\big) = \exp\big(-t|u|^\beta\big), \qquad t\ge 0,\ u\in\mathbb{R},
\]
and where $X = (X_t)_{t\in[0,T]}$ is an $\mathbb{R}^q$-valued non-random càdlàg function. We suppose that the driving noise $J$ is stochastically independent of the initial variable $Y_0$. Throughout, the terminal sampling time $T>0$ is fixed, and the unknown parameter is $\theta := (\lambda,\mu,\beta,\sigma) \in \Theta$, where $\Theta\subset\mathbb{R}^p$ ($p := q+3$) is a bounded convex domain whose closure satisfies $\overline\Theta \subset \mathbb{R}\times\mathbb{R}^q\times(0,2)\times(0,\infty)$. We denote by $(P_\theta)_{\theta\in\Theta}$ a family of induced image measures of $(J, Y_0)$ on some probability space. We assume that there exists a true value $\theta_0\in\Theta$ inducing the distribution $\mathcal{L}(J, Y_0)$. In the likelihood asymptotics, we assume that the available data is $(X_t)_{t\in[0,T]}$ and $(Y_{t_j})_{j=0}^n$, where $t_j = t_j^n := jh$ with $h = h_n := T/n$.

The model (1.1) thus described can be seen as a continuous-time counterpart of the simple first-order autoregressive exogenous (ARX) model. Nevertheless, any proper form of the efficient estimation result has been missing in the literature, due to the lack of a background theory which can deal with the bounded-domain infill asymptotics. The objective of this paper is to study asymptotically efficient estimation of the parameter $\theta$, assuming that the model is correctly specified.

Let us recall that, when $J$ is a Wiener process ($\beta = 2$), the drift parameters are consistently estimable only when the terminal sampling time tends to infinity, and the associated statistical experiments are known to possess essentially different properties according to the sign of $\lambda$; that is to say, the model is locally asymptotically normal for $\lambda > 0$ (ergodic case), locally asymptotically Brownian functional for $\lambda = 0$ (unit-root case), and locally asymptotically mixed-normal (LAMN) for $\lambda < 0$ (explosive case). In the present non-Gaussian setting, the likelihood is instead built on the $\beta$-stable density. It is worth mentioning that, unlike the case of ARX time-series models and the Gaussian OU models, it does not matter whether the model is ergodic or not. The asymptotic results presented here are uniformly valid over each compact subset of the parameter space $\Theta$. In particular, the sign of the autoregressive parameter $\lambda$ is no longer important, which in turn implies that the conventional unit-root problem (see [16] and the references therein) is not relevant here at all. A simulation sketch of (1.1) closes this section.
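To fix ideas, here is a minimal simulation sketch of the model (1.1) on the sampling grid $t_j = jh$. It assumes that scipy.stats.levy_stable (in its default S1 parameterization, symmetric case) matches the characteristic function $\exp(-|u|^\beta)$ at unit scale; the covariate, the parameter values, and the Euler-type discretization are illustrative choices, not taken from the paper.

```python
import numpy as np
from scipy.stats import levy_stable

rng = np.random.default_rng(1)

# Illustrative parameter values with q = 1 covariate (hypothetical choices).
lam, mu, beta, sigma = 0.5, 1.2, 1.5, 0.3
T, n = 1.0, 5000
h = T / n
t = np.linspace(0.0, T, n + 1)
X = np.cos(2.0 * np.pi * t)  # a non-random cadlag covariate (illustrative)

# By self-similarity, J_{t_j} - J_{t_{j-1}} is distributed as h^{1/beta} * S,
# with S standard symmetric beta-stable; scipy's first shape argument is the
# stability index, the second is the skewness (0 for the symmetric case).
S = levy_stable.rvs(beta, 0.0, size=n, random_state=rng)
dJ = h ** (1.0 / beta) * S

# Euler scheme for dY_t = (mu * X_t - lam * Y_t) dt + sigma dJ_t, Y_0 = 0.
Y = np.zeros(n + 1)
for j in range(n):
    Y[j + 1] = Y[j] + (mu * X[j] - lam * Y[j]) * h + sigma * dJ[j]
```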
2. Asymptotically optimal estimation

2.1. Likelihood asymptotics.
Denote by $\phi_\beta$ the density of the distribution $\mathcal{L}(J_1)$: $P_\theta(J_1\in dy) = \phi_\beta(y)\,dy$. It is known that $\phi_\beta(y) > 0$ for every $y\in\mathbb{R}$, that $\phi_\beta$ is smooth in $(y,\beta)\in\mathbb{R}\times(0,2)$, and that
\[
\sup_{y\in\mathbb{R}} |y|^{\beta+1+k}\,\log^l(1+|y|)\,\big|\partial^k\partial_\beta^l\phi_\beta(y)\big| < \infty, \qquad k,l\in\mathbb{Z}_+.
\]
Here we wrote $\partial^k\partial_\beta^l\phi_\beta(y) := (\partial^k/\partial y^k)(\partial^l/\partial\beta^l)\phi_\beta(y)$; analogous notation for the partial derivatives will be used in the sequel. Integration by parts applied to $t\mapsto e^{\lambda t}Y_t$ provides us with the explicit càdlàg solution process: under $P_\theta$,
\[
Y_t = e^{-\lambda(t-s)}Y_s + \mu\cdot\int_s^t e^{-\lambda(t-r)}X_r\,dr + \sigma\int_s^t e^{-\lambda(t-r)}\,dJ_r, \qquad t > s.
\]
For $x,\lambda\in\mathbb{R}$, we write
\[
\eta(x) = \frac1x\big(1 - e^{-x}\big), \qquad \zeta_j(\lambda) = \frac1h\int_{t_{j-1}}^{t_j} e^{-\lambda(t_j-s)}X_s\,ds.
\]
By the property of the Lévy integral, we have
\[
\log E_\theta\Big\{\exp\Big(iu\,\sigma\int_{t_{j-1}}^{t_j} e^{-\lambda(t_j-s)}\,dJ_s\Big)\Big\}
= -|\sigma u|^\beta \int_{t_{j-1}}^{t_j} e^{-\lambda\beta(t_j-s)}\,ds
= -\big|\sigma h^{1/\beta}\eta(\lambda\beta h)^{1/\beta} u\big|^\beta.
\]
Hence
\[
(2.1)\qquad \epsilon_j(\theta) := \frac{Y_{t_j} - e^{-\lambda h}Y_{t_{j-1}} - \mu\cdot\zeta_j(\lambda)h}{\sigma h^{1/\beta}\eta(\lambda\beta h)^{1/\beta}} \ \overset{P_\theta}{\sim}\ \text{i.i.d.}\ \mathcal{L}(J_1).
\]
Then, the exact log-likelihood function $\ell_n(\theta) = \ell_n\big(\theta; (X_t)_{t\in[0,T]}, (Y_{t_j})_{j=0}^n\big)$ is given by
\[
\ell_n(\theta) = \sum_{j=1}^n \log\Big(\frac{\phi_\beta(\epsilon_j(\theta))}{\sigma h^{1/\beta}\eta(\lambda\beta h)^{1/\beta}}\Big)
= \sum_{j=1}^n \Big(-\log\sigma + \frac1\beta\log(1/h) - \frac1\beta\log\eta(\lambda\beta h) + \log\phi_\beta(\epsilon_j(\theta))\Big).
\]
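Since $\phi_\beta$ admits no closed form, evaluating $\ell_n$ in practice requires a numerical stable density. A minimal sketch follows, assuming scipy.stats.levy_stable for $\phi_\beta$; the left-endpoint quadrature used for $\zeta_j(\lambda)$ is a crude illustrative stand-in, not the paper's prescription.

```python
import numpy as np
from scipy.stats import levy_stable

def loglik(theta, Y, X, h):
    """Exact log-likelihood l_n(theta) from grid data.

    theta = (lam, mu, beta, sigma) with mu of shape (q,);
    Y has shape (n+1,), X has shape (n+1, q) on the same grid.
    """
    lam, mu, beta, sigma = theta
    n = len(Y) - 1
    x = lam * beta * h
    eta = (1.0 - np.exp(-x)) / x if abs(x) > 1e-12 else 1.0
    scale = sigma * h ** (1.0 / beta) * eta ** (1.0 / beta)
    # zeta_j(lam) ~ e^{-lam h} X_{t_{j-1}}: left-endpoint O(h) approximation.
    zeta = np.exp(-lam * h) * X[:-1]
    eps = (Y[1:] - np.exp(-lam * h) * Y[:-1] - h * (zeta @ mu)) / scale
    return np.sum(levy_stable.logpdf(eps, beta, 0.0)) - n * np.log(scale)
```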
To investigate the asymptotic behavior of this random function $\ell_n$, we introduce further notation. We write $a_n \lesssim b_n$ when there exists a universal constant $C$ such that $a_n \le Cb_n$ for every $n$ large enough. For positive functions $a_n(\theta)$ and $b_n(\theta)$, we write $a_n(\theta) \lesssim_u b_n(\theta)$ if $\sup_\theta |a_n(\theta)/b_n(\theta)| \lesssim 1$, where the supremum is taken over the compact set $\overline\Theta$. For a matrix $A$ we write $A^{\otimes 2} = AA^\top$, with $\top$ denoting transposition. We will simply write $\int_j$ instead of $\int_{t_{j-1}}^{t_j}$. Denote by $\to_u$ the uniform convergence of non-random quantities with respect to $\theta$ over $\overline\Theta$. Given continuous random functions $\xi(\theta)$ and $\xi_n(\theta)$, $n\ge 1$, we write $\xi_n(\theta) \xrightarrow{L}_u \xi(\theta)$ if $|P_{\xi_n(\theta)}f - P_{\xi(\theta)}f| \to_u 0$ for every bounded continuous function $f$, where $P_\zeta$ denotes the distribution of $\zeta$; also, $\xi_n(\theta) \xrightarrow{p}_u \xi(\theta)$ if the joint distributions of $\xi_n$ and $\xi$ are well-defined under $P_\theta$ and if $P_\theta\{|\xi_n(\theta) - \xi(\theta)| > \epsilon\} \to_u 0$ for every $\epsilon > 0$ as $n\to\infty$.
Let
\[
(2.2)\qquad \varphi_n = \varphi_n(\theta) := \mathrm{diag}\Big\{\big(\sqrt{n}\,h^{1-1/\beta}\big)^{-1} I_{q+1},\ \frac{1}{\sqrt n}\begin{pmatrix}\varphi_{11,n}(\theta) & \varphi_{12,n}(\theta)\\ \varphi_{21,n}(\theta) & \varphi_{22,n}(\theta)\end{pmatrix}\Big\},
\]
where the real entries $\varphi_{kl,n} = \varphi_{kl,n}(\theta)$ are assumed to be continuous in $\theta$ and to satisfy the following conditions for some finite values $\varphi_{kl} = \varphi_{kl}(\theta)$:
\[
(2.3)\qquad
\begin{aligned}
&\varphi_{11,n}(\theta) \to_u \varphi_{11}(\theta), \qquad \varphi_{21,n}(\theta) \to_u \varphi_{21}(\theta),\\
&s_{12,n}(\theta) := \beta^{-2}\log(1/h_n)\,\varphi_{11,n}(\theta) + \sigma^{-1}\varphi_{12,n}(\theta) \to_u \varphi_{12}(\theta),\\
&s_{22,n}(\theta) := \beta^{-2}\log(1/h_n)\,\varphi_{21,n}(\theta) + \sigma^{-1}\varphi_{22,n}(\theta) \to_u \varphi_{22}(\theta),\\
&\inf_\theta \big|\varphi_{11}(\theta)\varphi_{22}(\theta) - \varphi_{12}(\theta)\varphi_{21}(\theta)\big| > 0, \qquad
\max_{(k,l)} \big|\partial_{(\beta,\sigma)}\varphi_{kl,n}(\theta)\big| \lesssim_u \log(1/h).
\end{aligned}
\]
The matrix $\varphi_n(\theta)$ will turn out to be the right norming with which $u\mapsto \ell_n(\theta + \varphi_n(\theta)u) - \ell_n(\theta)$ under $P_\theta$ has an asymptotically quadratic structure in $\mathbb{R}^p$; see [2] and [5] for previous related studies. Note that $\sqrt n\,h_n^{1-1/\beta} \to_u \infty$ and $|\varphi_{12,n}(\theta)| \vee |\varphi_{22,n}(\theta)| \lesssim \log(1/h)$. By the same reasoning as in [2, page 292], we also have $\inf_\theta |\varphi_{11,n}(\theta)\varphi_{22,n}(\theta) - \varphi_{12,n}(\theta)\varphi_{21,n}(\theta)| \gtrsim 1$ and $|\varphi_n(\theta)| \to_u 0$. Define
\[
f_\beta(y) := \frac{\partial_\beta\phi_\beta}{\phi_\beta}(y), \qquad g_\beta(y) := \frac{\partial\phi_\beta}{\phi_\beta}(y).
\]
We define the block-diagonal random matrix $\mathcal{I}(\theta) = \mathrm{diag}\{\mathcal{I}_{\lambda,\mu}(\theta),\ \mathcal{I}_{\beta,\sigma}(\theta)\}$, where, for a random variable $\epsilon$ such that $\mathcal{L}(\epsilon) = \phi_\beta(y)\,dy$,
\[
(2.4)\qquad \mathcal{I}_{\lambda,\mu}(\theta) := \frac{1}{\sigma^2}E_\theta\{g_\beta(\epsilon)^2\}\,\frac1T\int_0^T \begin{pmatrix} Y_t^2 & -Y_tX_t^\top \\ -Y_tX_t & X_t^{\otimes 2}\end{pmatrix} dt,
\]
\[
(2.5)\qquad \mathcal{I}_{\beta,\sigma}(\theta) := \begin{pmatrix}\varphi_{11} & \varphi_{12}\\ -\varphi_{21} & -\varphi_{22}\end{pmatrix}^{\!\top}
\begin{pmatrix} E_\theta\{f_\beta(\epsilon)^2\} & E_\theta\{\epsilon f_\beta(\epsilon)g_\beta(\epsilon)\} \\ E_\theta\{\epsilon f_\beta(\epsilon)g_\beta(\epsilon)\} & E_\theta\{(1+\epsilon g_\beta(\epsilon))^2\}\end{pmatrix}
\begin{pmatrix}\varphi_{11} & \varphi_{12}\\ -\varphi_{21} & -\varphi_{22}\end{pmatrix}.
\]
Note that $\mathcal{I}(\theta)$ depends on the choice of $\varphi(\theta) = \{\varphi_{kl}(\theta)\}$. In what follows, we assume that
\[
\int_0^T X_t^{\otimes 2}\,dt > 0.
\]
Then we have $\mathcal{I}(\theta) > 0$ $P_\theta$-a.s.: indeed, $\mathcal{I}_{\beta,\sigma}(\theta) > 0$ by construction, and $\mathcal{I}_{\lambda,\mu}(\theta) > 0$ since $\int_0^T Y_t^2\,dt > 0$ a.s. and, for every nonzero $u\in\mathbb{R}^q$,
\[
u^\top\bigg(\int_0^T X_t^{\otimes 2}dt - \Big(\int_0^T Y_tX_t\,dt\Big)\Big(\int_0^T Y_t^2\,dt\Big)^{-1}\Big(\int_0^T Y_tX_t\,dt\Big)^{\!\top}\bigg)u
= \int_0^T (u\cdot X_t)^2\,dt - \Big(\int_0^T Y_t^2\,dt\Big)^{-1}\Big(\int_0^T Y_t(u\cdot X_t)\,dt\Big)^2 > 0
\]
a.s., because the relation $Y = (u\cdot X)\xi$ on $[0,T]$ does not hold with positive probability for any constant real $\xi$.

The normalized score function is given by
\[
\Delta_n(\theta) := \varphi_n(\theta)^\top\partial_\theta\ell_n(\theta).
\]
Let $MN_p(0,\mathcal{I}(\theta)^{-1})$ denote the covariance mixture of the $p$-dimensional normal distribution, corresponding to the characteristic function $u\mapsto E_\theta\{\exp(-u^\top\mathcal{I}(\theta)^{-1}u/2)\}$. Here is the main claim of this section.

Theorem 2.1.
Under the aforementioned setup: (1)
The uniform locally asymptotically mixed normal (LAMN) property holds; that is,
\[
\ell_n(\theta + \varphi_n(\theta)u) - \ell_n(\theta) - \Big(\Delta_n(\theta)[u] - \frac12\mathcal{I}(\theta)[u,u]\Big) \xrightarrow{p}_u 0, \qquad u\in\mathbb{R}^p,
\]
where
\[
(\Delta_n(\theta),\ \mathcal{I}(\theta)) \xrightarrow{L}_u \big(\mathcal{I}(\theta)^{1/2}Z,\ \mathcal{I}(\theta)\big)
\]
with $Z\sim N_p(0, I_p)$ independent of $(Y_0, J)$.

(2) There exists a local maximum $\hat\theta_n$ of $\ell_n$ with probability tending to $1$ such that
\[
\varphi_n(\theta)^{-1}(\hat\theta_n - \theta) \xrightarrow{L}_u MN_p\big(0,\ \mathcal{I}(\theta)^{-1}\big).
\]
As in [2], the particular non-diagonal form of the norming $\varphi_n(\theta)$ is inevitable in the ML estimation. The LAMN property ensures the asymptotic optimality of the MLE; see [9, Theorem 8] for details.

Proof of Theorem 2.1.
The proof is analogous to that of [2, Theorem 1], the argument of which makes use of the general theory of [17]. Although the Fisher information matrix $\mathcal{I}(\theta)$ is random, we do not need to take care of stable convergence in law of the normalized score function at all, which is quite often crucial when concerned with high-frequency sampling of a process with dependent increments. We have $\sup_{t\in[0,T]}|X_t| < \infty$ since $X : [0,T]\to\mathbb{R}^q$ is càdlàg. By means of the localization procedure, we may and do suppose that the driving stable Lévy process does not have jumps of size greater than some fixed threshold: see [14, Section 6.1] for a concise account; in particular, we may suppose that $\sup_{\theta\in\Theta}E_\theta(|J_t|^K) < \infty$ for any $K > 0$ and $t\in[0,T]$. Also to be noted is that, due to the symmetry of $\mathcal{L}(J)$, the removal of large jumps does not change the parametric form of the drift coefficient.

We will complete the proof of Theorem 2.1 by showing the three statements corresponding to the conditions (12), (13), and (14) in [2], which here read
\[
(2.6)\qquad \mathcal{I}_n(\theta) := -\varphi_n(\theta)^\top\partial_\theta^2\ell_n(\theta)\,\varphi_n(\theta) \xrightarrow{p}_u \mathcal{I}(\theta),
\]
\[
(2.7)\qquad \sup_{\theta'\in N_n(c;\theta)} \big|\varphi_n(\theta')^{-1}\varphi_n(\theta) - I_p\big| \to_u 0,
\]
\[
(2.8)\qquad \sup_{\theta_1,\dots,\theta_p\in N_n(c;\theta)} \big|\varphi_n(\theta)^\top\{\partial_\theta^2\ell_n(\theta_1,\dots,\theta_p) - \partial_\theta^2\ell_n(\theta)\}\varphi_n(\theta)\big| \xrightarrow{p}_u 0,
\]
respectively, where (2.7) and (2.8) should hold for all $c > 0$ with
\[
N_n(c;\theta) := \big\{\theta'\in\Theta : |\varphi_n(\theta)^{-1}(\theta' - \theta)| \le c\big\},
\]
and where $\partial_\theta^2\ell_n(\theta_1,\dots,\theta_p)$, $\theta_k\in\Theta$, denotes the $p\times p$ matrix whose $(k,l)$th element is given by $\partial_{\theta_k}\partial_{\theta_l}\ell_n(\theta_k)$. Verification of (2.7) is completely the same as in [2], hence omitted.

To verify (2.6), we first recall that
\[
\ell_n(\theta) = \sum_{j=1}^n \Big(-\log\sigma + \frac1\beta\log(1/h) - \frac1\beta\log\eta(\lambda\beta h) + \log\phi_\beta(\epsilon_j(\theta))\Big).
\]
To look at the entries of $\partial_\theta^2\ell_n(\theta)$, we introduce several shorthands. Writing $l'_n := \log(1/h)$ and $c = c(\theta) := \eta(\lambda\beta h)^{-1/\beta}$, let us omit the subscript $\beta$ and the argument $\epsilon_j$ from the aforementioned notation, such as $g := g_\beta(\epsilon_j)$ and so on, and symbolically write the last display as
\[
\ell_n(\theta) = \sum_{j=1}^n \Big(-\log\sigma + \frac1\beta l'_n + \log c + \log\phi(\epsilon)\Big).
\]
Further, the partial differentiation with respect to a variable will be denoted by the bracketed subscript, such as $\epsilon_{(a)} := \partial_a\epsilon_j(\theta)$ and $\epsilon_{(a,b)} := \partial_a\partial_b\epsilon_j(\theta)$.
Then direct computations give the first-order partial derivatives
\[
\partial_\lambda\ell_n(\theta) = \sum_{j=1}^n \big((\log c)_{(\lambda)} + g\,\epsilon_{(\lambda)}\big), \qquad
\partial_\mu\ell_n(\theta) = \sum_{j=1}^n g\,\epsilon_{(\mu)},
\]
\[
\partial_\beta\ell_n(\theta) = \sum_{j=1}^n \big(-\beta^{-2}l'_n + (\log c)_{(\beta)} + g\,\epsilon_{(\beta)} + f\big), \qquad
\partial_\sigma\ell_n(\theta) = \sum_{j=1}^n \big(-\sigma^{-1} + g\,\epsilon_{(\sigma)}\big),
\]
followed by the second-order ones
\[
\begin{aligned}
\partial_\lambda^2\ell_n(\theta) &= \sum_{j=1}^n \big\{(\partial g)(\epsilon_{(\lambda)})^2 + g\,\epsilon_{(\lambda,\lambda)} + (\log c)_{(\lambda,\lambda)}\big\},\\
\partial_\mu^2\ell_n(\theta) &= \sum_{j=1}^n (\partial g)\,(\epsilon_{(\mu)})^{\otimes 2},\\
\partial_\beta^2\ell_n(\theta) &= \sum_{j=1}^n \big\{2\beta^{-3}l'_n + (\log c)_{(\beta,\beta)} + g\,\epsilon_{(\beta,\beta)} + g_{(\beta)}\epsilon_{(\beta)} + (\partial g)(\epsilon_{(\beta)})^2 + f_{(\beta)} + (\partial f)\,\epsilon_{(\beta)}\big\},\\
\partial_\sigma^2\ell_n(\theta) &= \sum_{j=1}^n \big\{\sigma^{-2} + (\partial g)(\epsilon_{(\sigma)})^2 + g\,\epsilon_{(\sigma,\sigma)}\big\},\\
\partial_\lambda\partial_\mu\ell_n(\theta) &= \sum_{j=1}^n \big\{(\partial g)\,\epsilon_{(\lambda)}\epsilon_{(\mu)} + g\,\epsilon_{(\mu,\lambda)}\big\},\\
\partial_\lambda\partial_\beta\ell_n(\theta) &= \sum_{j=1}^n \big\{(\log c)_{(\lambda,\beta)} + g_{(\beta)}\epsilon_{(\lambda)} + (\partial g)\,\epsilon_{(\beta)}\epsilon_{(\lambda)} + g\,\epsilon_{(\lambda,\beta)}\big\},\\
\partial_\lambda\partial_\sigma\ell_n(\theta) &= \sum_{j=1}^n \big\{(\partial g)\,\epsilon_{(\sigma)}\epsilon_{(\lambda)} + g\,\epsilon_{(\lambda,\sigma)}\big\},\\
\partial_\mu\partial_\beta\ell_n(\theta) &= \sum_{j=1}^n \big\{g_{(\beta)}\epsilon_{(\mu)} + (\partial g)\,\epsilon_{(\beta)}\epsilon_{(\mu)} + g\,\epsilon_{(\beta,\mu)}\big\},\\
\partial_\mu\partial_\sigma\ell_n(\theta) &= \sum_{j=1}^n \big\{(\partial g)\,\epsilon_{(\mu)}\epsilon_{(\sigma)} + g\,\epsilon_{(\mu,\sigma)}\big\},\\
\partial_\beta\partial_\sigma\ell_n(\theta) &= \sum_{j=1}^n \big\{g_{(\beta)}\epsilon_{(\sigma)} + (\partial g)\,\epsilon_{(\beta)}\epsilon_{(\sigma)} + g\,\epsilon_{(\beta,\sigma)}\big\}.
\end{aligned}
\]
It is straightforward to see what the leading term in each expression above is. We do not list all the details here, but for later reference just mention a few of the points:
• We have $\partial^k\log\eta(y) = O(1)$ as $|y|\to 0$ for each $k\in\mathbb{Z}_+$, and $(\log c)_{(\lambda)} = O(h)$, $(\log c)_{(\lambda,\lambda)} = O(h)$, $(\log c)_{(\beta)} = O(h)$, $(\log c)_{(\beta,\beta)} = O(h)$, and so on.
• $\sup_{j\le n}|\partial_\lambda^k\zeta_j(\lambda)| = O(h^k)$, $k\in\mathbb{Z}_+$.
• Recalling the definition (2.1), we obtain asymptotic equivalences for the partial derivatives of $\epsilon_j(\theta)$, valid uniformly in $\theta\in\overline\Theta$: we have $\epsilon_{(\mu,\sigma)} \sim_u \sigma^{-2}h^{1-1/\beta}\zeta_j(\lambda)$, $\epsilon_{(\mu,\lambda)} \sim_u \sigma^{-1}h^{2-1/\beta}\zeta_j(\lambda)/2$, $\epsilon_{(\sigma,\lambda)} = -\sigma^{-2}h^{1-1/\beta}Y_{t_{j-1}} + O^*_p(h \vee h^{2-1/\beta})$, $\epsilon_{(\lambda,\lambda)} = O^*_p(h^{1-1/\beta})$, $\epsilon_{(\beta,\beta)} = O^*_p((l'_n)^2)$, $\epsilon_{(\lambda,\beta)} = O^*_p(l'_n h^{1-1/\beta})$, and so on, where we wrote $A_n(\theta) \sim_u B_n(\theta)$ if $\sup_\theta|A_n(\theta)/B_n(\theta) - 1| \xrightarrow{p}_u 0$,
and where $O^*_p(a_n)$ stands for any random sequence $\chi_{nj}(\theta)$ such that (under the localization)
\[
\sup_n\sup_{j\le n} E_\theta\big(|a_n^{-1}\chi_{nj}(\theta)|^K\big) < \infty \quad \text{for any } K > 0;
\]
likewise, we will occasionally use $O^*_u(a_n)$ if the sequence in question is non-random and the asymptotics is uniformly valid in $\theta\in\overline\Theta$.

Now we write
\[
\mathcal{I}_n(\theta) = \begin{pmatrix}\mathcal{I}_{11,n}(\theta) & \mathcal{I}_{12,n}(\theta)\\ \mathcal{I}_{12,n}(\theta)^\top & \mathcal{I}_{22,n}(\theta)\end{pmatrix},
\]
where $\mathcal{I}_{11,n}(\theta)\in\mathbb{R}^{q+1}\otimes\mathbb{R}^{q+1}$ and $\mathcal{I}_{12,n}(\theta)\in\mathbb{R}^{q+1}\otimes\mathbb{R}^2$. We can deduce $\mathcal{I}_{22,n}(\theta) \xrightarrow{p}_u \mathcal{I}_{\beta,\sigma}(\theta)$ in the same way as in the proof of Eq.(12) in [2]. We will show $\mathcal{I}_{11,n}(\theta) \xrightarrow{p}_u \mathcal{I}_{\lambda,\mu}(\theta)$ and $\mathcal{I}_{12,n}(\theta) \xrightarrow{p}_u 0$. Let $r_n = r_n(\beta) := \sqrt n\,h^{1-1/\beta}$, and denote by $o_{u,p}(1)$ and $O_{u,p}(1)$ any random sequences which tend to zero and are bounded, respectively, in $P_\theta$-probability uniformly in $\theta$. The right continuity of $t\mapsto X_t$ implies that
\[
(2.9)\qquad \lim_{n\to\infty}\sup_{j\le n}\Big|\frac1h\int_j (X_s - X_{t_{j-1}})\,ds\Big| = 0.
\]
We obtain
\[
\begin{aligned}
-r_n^{-2}\,\partial_\mu^2\ell_n(\theta) &= -\frac1n\sum_{j=1}^n \sigma^{-2}(\partial g)\,\zeta_j(\lambda)^{\otimes 2} + o_{u,p}(1)
= \frac1n\sum_{j=1}^n \sigma^{-2}g^2\,X_{t_{j-1}}^{\otimes 2} + o_{u,p}(1),\\
-r_n^{-2}\,\partial_\lambda^2\ell_n(\theta) &= -\frac1n\sum_{j=1}^n \sigma^{-2}(\partial g)\,Y_{t_{j-1}}^2 + o_{u,p}(1) + O_{u,p}(h^{1/\beta})
= \frac1n\sum_{j=1}^n \sigma^{-2}g^2\,Y_{t_{j-1}}^2 + o_{u,p}(1),\\
-r_n^{-2}\,\partial_\lambda\partial_\mu\ell_n(\theta) &= \frac1n\sum_{j=1}^n \sigma^{-2}(\partial g)\,Y_{t_{j-1}}\zeta_j(\lambda) + o_{u,p}(1)
= -\frac1n\sum_{j=1}^n \sigma^{-2}g^2\,Y_{t_{j-1}}X_{t_{j-1}} + o_{u,p}(1).
\end{aligned}
\]
Now we note that, in general, the Burkholder inequality ensures that
\[
\frac{1}{\sqrt n}\sum_{j=1}^n \pi_{j-1}(\theta)\,U(\epsilon_j(\theta)) = O_{u,p}(1)
\]
for any continuous $\pi(x,y;\theta)$ and any $U(\epsilon_j(\theta))$ such that $E_\theta\{U(\epsilon_j(\theta))\} = 0$ ($\theta\in\Theta$) and the left-hand side is continuous in $\theta\in\overline\Theta$, where $\pi_{j-1}(\theta) := \pi(X_{t_{j-1}}, Y_{t_{j-1}};\theta)$; this is a basic device which we will make use of several times below without mention.

Let us turn back to our model: making the compensation $g^2 = E_\theta(g^2) + (g^2 - E_\theta(g^2))$ in the summands of the expression of $-r_n^{-2}\partial^2_{(\lambda,\mu)}\ell_n(\theta)$, noting that $\epsilon_j = \epsilon_j(\theta) \sim$ i.i.d. $\mathcal{L}(J_1)$ under $P_\theta$, and then applying the law of large numbers
\[
(2.10)\qquad \frac1n\sum_{j=1}^n \psi(X_{t_{j-1}}, Y_{t_{j-1}}) \xrightarrow{p} \frac1T\int_0^T \psi(X_t, Y_t)\,dt, \qquad n\to\infty,
\]
valid for any $\psi\in C^1(\mathbb{R}^q\times\mathbb{R})$ with $|\partial_{(x,y)}\psi(x,y)| \lesssim (1+|x|+|y|)^C$, under the Riemann integrability of $t\mapsto(X_t(\omega), Y_t(\omega))$, we can make use of the argument in the last paragraph to conclude that $\mathcal{I}_{11,n}(\theta) \xrightarrow{p}_u \mathcal{I}_{\lambda,\mu}(\theta)$.

We also need to look at $\mathcal{I}_{12,n}(\theta)$, whose entries are linear combinations of the mixed derivatives $\partial_\lambda\partial_\beta\ell_n(\theta)$, $\partial_\lambda\partial_\sigma\ell_n(\theta)$, $\partial_\mu\partial_\beta\ell_n(\theta)$, and $\partial_\mu\partial_\sigma\ell_n(\theta)$, weighted by $r_n^{-1}n^{-1/2}\varphi_{kl,n}(\theta)$. We can deduce $\mathcal{I}_{12,n}(\theta) \xrightarrow{p}_u 0$; let us only mention one entry. Since $|\varphi_{12,n}| \vee |\varphi_{22,n}| \lesssim_u l'_n$, a term-by-term estimation based on the previous bullet points gives
\[
\frac{1}{r_n\sqrt n}\big(\varphi_{12,n}\,\partial_\mu\partial_\beta\ell_n(\theta) + \varphi_{22,n}\,\partial_\mu\partial_\sigma\ell_n(\theta)\big)
= O(h^{1/\beta}) + O_{u,p}\big(n^{-1/2}\vee h^{1/\beta}\big) + O_{u,p}\big((n^{-1/2}\vee h^{1/\beta})\,l'_n\big) + O_{u,p}\big(n^{-1/2}\big) + O_{u,p}\big(n^{-1/2}(l'_n)^2\big) \xrightarrow{p}_u 0.
\]
To verify (2.8), we note that $|\epsilon_j(\theta')| \lesssim_u |\epsilon_j(\theta)| + 1$ for $\theta'\in N_n(c;\theta)$. Hence, for each $k,l,m\in\mathbb{Z}_+$, we have
\[
\frac1n\big|\partial_\beta^k\partial_\sigma^l\partial_\mu^m\ell_n(\theta)\big| \lesssim_u \{\log(1/h)\}^k\,h^{(1-1/\beta)m}\,\frac1n\sum_{j=1}^n (1+|Y_{t_{j-1}}|)^C\,\{1+|\epsilon_j(\theta)|\}^C.
\]
As in the proof of Eq.(14) in [2], for each $c > 0$ we can take $R = R(c) > 0$ such that
\[
\sup_{\theta_1,\dots,\theta_p\in N_n(c;\theta)}\big|\varphi_n(\theta)^\top\{\partial_\theta^2\ell_n(\theta_1,\dots,\theta_p) - \partial_\theta^2\ell_n(\theta)\}\varphi_n(\theta)\big|
\lesssim_u \sup_{\theta',\theta_1,\dots,\theta_p\in N_n(c;\theta)}\big|\varphi_n(\theta)^\top\big\{\partial_\theta^3\ell_n(\theta_1,\dots,\theta_p)[\theta'-\theta]\big\}\varphi_n(\theta)\big|
\]
\[
\lesssim_u \frac{\{\log(1/h)\}^C}{\sqrt n}\sup_{\beta',\beta''\in B(\beta;R/\log(1/h))} h^{3(1/\beta'-1/\beta'')}\sup_{\theta'\in N_n(c;\theta)}\frac1n\sum_{j=1}^n\{1+|\epsilon_j(\theta')|\}^C
\lesssim_u \frac{\{\log(1/h)\}^C}{\sqrt n}\,\frac1n\sum_{j=1}^n\{1+|\epsilon_j(\theta)|\}^C \xrightarrow{p}_u 0.
\]
This shows (2.8), hence the proof of Theorem 2.1 is complete. □
Remark 2.2 (Time scale). (1) We have set the terminal sampling time fixed to be $T$. Note that $\sqrt n\,h^{1-1/\beta} = n^{1/\beta-1/2}\,T^{1-1/\beta}$. If $\beta > 1$ and the process $(X,Y)$ is, informally speaking, more or less stable over the period $[0,T]$, then a longer (resp. shorter) period will lead to a better (resp. worse) performance of estimating $(\lambda,\mu)$. The situation is reversed for $\beta < 1$. The Cauchy case $\beta = 1$ is exceptional (see the snippet after this remark).

(2) We can explicitly associate changing $T$ with changes of the components of $\theta$. Consider the sampling period $[0,1]$ instead of $[0,T]$. Then, changing $t$ to $tT$ in (1.1), we see that the process $Y^T = (Y^T_t)_{t\in[0,1]} := (Y_{tT})_{t\in[0,1]}$ satisfies the same integral equation as in (1.1) except that $\theta = (\lambda,\mu,\beta,\sigma)$ is replaced by $\theta^T = (T\lambda,\ T\mu,\ \beta,\ T^{1/\beta}\sigma)$, $X_t$ by $X^T_t := X_{tT}$, and finally $J_t$ by $J^T_t := T^{-1/\beta}J_{tT}$. As a result, it is straightforward to rewrite our theoretical results for the case of any fixed $[0,T]$.

(3) The present framework allows us to do unit-period-wise (for example, day-by-day) inference for both trend and scale structures, providing a sequence of period-wise estimates with theoretically valid approximate confidence sets. This suggests an aspect of change-point analysis in high-frequency data: if we have a high-frequency sample over $[k-1,k]$ for $k = 1,\dots,[T]$, then we can construct a sequence of estimators $\{\hat\theta_n(k)\}_{k=1}^{[T]}$; it would then be possible in some way to reject the constancy of $\theta$ over $[0,[T]]$ if (some of) the parameter components have changed.
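A short numerical check of the rate comparison in item (1), using the identity $\sqrt n\,h^{1-1/\beta} = n^{1/\beta-1/2}T^{1-1/\beta}$ with the sample size $n$ held fixed while the horizon $T$ is stretched:

```python
import numpy as np

def drift_rate(T, n, beta):
    # sqrt(n) h^{1-1/beta} = n^{1/beta-1/2} T^{1-1/beta}, with h = T/n
    return n ** (1.0 / beta - 0.5) * T ** (1.0 - 1.0 / beta)

for beta in (0.8, 1.0, 1.5):
    ratio = drift_rate(5.0, 10_000, beta) / drift_rate(1.0, 10_000, beta)
    print(beta, ratio)  # < 1 for beta < 1, exactly 1 at beta = 1, > 1 for beta > 1
```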
2.2. One-step estimator. The well-known shortcoming of the classical Cramér-type argument is its local character: the result just tells us the existence of an asymptotically good root of the likelihood equation, but does not provide us with information about which local maximum is the right one in case there are multiple local maxima, or multiple roots of the likelihood equation [11, Section 7.3]. Indeed, the log-likelihood function $\ell_n$ is highly nonlinear and non-concave. In this section, we consider removing the locality by a one-step improvement, which in our case will not only remedy the aforementioned inconvenience about the multiple-root problem, but also enable us to bypass the numerical optimization involving the stable density $\phi_\beta$.

In [2, Section 3], for the $\beta$-stable Lévy process, which is a special case of (1.1) with $\lambda = 0$ and $\mu = 0$, we provided an initial estimator based on the sample median and the method of moments associated with logarithmic and/or lower-order fractional moments. In that paper it was crucial that the model was a Lévy process, for which we could apply the median-adjusted central limit theorem. Here we will take a different route.

2.2.1. Initial rates of convergence.
First we prove a basic result about the classical one-step estimator. Recall the definition (2.2) of the non-diagonal matrix $\varphi_n = \varphi_n(\theta)$. We have seen that the likelihood equation $\partial_\theta\ell_n(\theta) = 0$ admits a root, denoted by $\hat\theta_n$, such that $\varphi_n(\theta)^{-1}(\hat\theta_n - \theta)$ is uniformly asymptotically mixed normal in the sense of Theorem 2.1(2).

We introduce the diagonal matrix $\varphi^\circ_n = \varphi^\circ_n(\theta)$ by
\[
\varphi^\circ_n = \mathrm{diag}(\varphi^\circ_{1,n},\dots,\varphi^\circ_{p,n}) := \mathrm{diag}\Big(\big(\sqrt n\,h^{1-1/\beta}\big)^{-1}I_{q+1},\ \frac{1}{\sqrt n},\ \frac{\log(1/h)}{\sqrt n}\Big).
\]
Suppose that there exists an estimator $\hat\theta^0_n = (\hat\lambda^0_n, \hat\mu^0_n, \hat\beta^0_n, \hat\sigma^0_n)$ satisfying $(\varphi^\circ_n)^{-1}(\hat\theta^0_n - \theta) = O_{u,p}(1)$:
\[
(2.11)\qquad \Big(\sqrt n\,h^{1-1/\beta}(\hat\lambda^0_n - \lambda),\ \sqrt n\,h^{1-1/\beta}(\hat\mu^0_n - \mu),\ \sqrt n\,(\hat\beta^0_n - \beta),\ \frac{\sqrt n}{\log(1/h)}(\hat\sigma^0_n - \sigma)\Big) = O_{u,p}(1).
\]
Then, we define the one-step estimator $\hat\theta^1_n = (\hat\lambda^1_n, \hat\mu^1_n, \hat\beta^1_n, \hat\sigma^1_n)$ starting from $\hat\theta^0_n$ by
\[
(2.12)\qquad \hat\theta^1_n = \hat\theta^0_n + \varphi_n(\hat\theta^0_n)\,\mathcal{I}_n(\hat\theta^0_n)^{-1}\,\Delta_n(\hat\theta^0_n),
\]
which is well-defined with probability tending to $1$ as $n\to\infty$ since (recall the property (2.8)) $\mathcal{I}_n(\hat\theta^0_n) \xrightarrow{p}_u \mathcal{I}(\theta)$ with the limit being a.s. positive definite.
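In code, (2.12) is a single Newton-type step in the $\varphi_n$-normalized coordinates. A minimal sketch, assuming user-supplied callables score(theta) $= \partial_\theta\ell_n(\theta)$, hessian(theta) $= \partial_\theta^2\ell_n(\theta)$ (for instance by numerical differentiation of the log-likelihood sketch above), and phi_n(theta) implementing (2.2):

```python
import numpy as np

def one_step(theta0, score, hessian, phi_n):
    """One-step estimator (2.12): theta1 = theta0 + phi I_n^{-1} Delta_n."""
    phi = phi_n(theta0)
    Delta = phi.T @ score(theta0)           # normalized score Delta_n(theta0)
    I_n = -phi.T @ hessian(theta0) @ phi    # observed information as in (2.6)
    return theta0 + phi @ np.linalg.solve(I_n, Delta)
```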
Theorem 2.3. For any $\hat\theta^0_n$ satisfying (2.11), the one-step estimator $\hat\theta^1_n$ defined by (2.12) is uniformly asymptotically equivalent to the asymptotically efficient MLE $\hat\theta_n$ in the sense that $\varphi_n(\theta)^{-1}(\hat\theta^1_n - \hat\theta_n) = o_{u,p}(1)$. In particular,
\[
(2.13)\qquad \varphi_n(\hat\theta^1_n)^{-1}(\hat\theta^1_n - \theta) \xrightarrow{L}_u MN_p\big(0,\ \mathcal{I}(\theta)^{-1}\big).
\]
In (2.13) we may replace the term $\varphi_n(\hat\theta^1_n)^{-1}$ by $\varphi_n(\hat\theta^0_n)^{-1}$.

Proof of Theorem 2.3.
We only need to show
\[
(2.14)\qquad \varphi_n^{-1}(\hat\theta^1_n - \hat\theta_n) = o_{u,p}(1).
\]
Below we will use the shorthand "hat" for plugging in $\hat\theta^0_n$: $\hat{\mathcal{I}}_n = \mathcal{I}_n(\hat\theta^0_n)$, $\hat\varphi_n = \varphi_n(\hat\theta^0_n)$, $\partial_\theta\hat\ell_n = \partial_\theta\ell_n(\hat\theta^0_n)$, and so on.

Note that (2.3) and (2.11) imply
\[
(2.15)\qquad \big|\varphi_n^{-1}\hat\varphi_n - I_p\big| \xrightarrow{p}_u 0,
\]
so that it suffices to show $\hat\varphi_n^{-1}(\hat\theta^1_n - \hat\theta_n) = o_{u,p}(1)$.

By (2.12) and the second-order Taylor expansion of $\partial_\theta\hat\ell_n$ therein around $\hat\theta_n$, we have, for some random point $\xi_n\in[0,1]$,
\[
\begin{aligned}
\hat\varphi_n^{-1}(\hat\theta^1_n - \hat\theta_n)
&= \hat\varphi_n^{-1}(\hat\theta^0_n - \hat\theta_n) + \hat{\mathcal{I}}_n^{-1}\hat\varphi_n^\top\Big(\partial_\theta\ell_n(\hat\theta_n) + \partial_\theta^2\ell_n\big(\hat\theta_n + \xi_n(\hat\theta^0_n - \hat\theta_n)\big)[\hat\theta^0_n - \hat\theta_n]\Big)\\
&= \hat{\mathcal{I}}_n^{-1}\Big\{\hat\varphi_n^\top\partial_\theta\ell_n(\hat\theta_n) + \Big(\hat{\mathcal{I}}_n + \hat\varphi_n^\top\partial_\theta^2\hat\ell_n\,\hat\varphi_n\Big)\hat\varphi_n^{-1}\varphi^\circ_n\big[(\varphi^\circ_n)^{-1}(\hat\theta^0_n - \hat\theta_n)\big]\\
&\qquad + (\varphi_n^{-1}\hat\varphi_n)^\top\Big(\varphi_n^\top\big\{\partial_\theta^2\ell_n\big(\hat\theta_n + \xi_n(\hat\theta^0_n - \hat\theta_n)\big) - \partial_\theta^2\hat\ell_n\big\}\varphi_n\Big)\varphi_n^{-1}\varphi^\circ_n\big[(\varphi^\circ_n)^{-1}(\hat\theta^0_n - \hat\theta_n)\big]\Big\}\\
&=: \hat{\mathcal{I}}_n^{-1}(\delta_{1,n} + \delta_{2,n} + \delta_{3,n}).
\end{aligned}
\]
We have $\delta_{2,n} \equiv 0$ by the very definition of $\hat{\mathcal{I}}_n$; it is for this term that we would need an additional condition when concerned with a scoring procedure, see Remark 2.4. We also have $\delta_{1,n} \xrightarrow{p}_u 0$: for every $\epsilon > 0$,
\[
P_\theta(|\delta_{1,n}| > \epsilon) \le P_\theta\big(\partial_\theta\ell_n(\hat\theta_n) \ne 0\big) \to_u 0,
\]
since $\hat\theta_n$ is a consistent root of the likelihood equation with probability tending to $1$. Due to (2.15) and $(\varphi^\circ_n)^{-1}(\hat\theta^0_n - \hat\theta_n) = O_{u,p}(1)$, in order to deduce $\delta_{3,n} \xrightarrow{p}_u 0$ it suffices to show
\[
\Big(\varphi_n^\top\big\{\partial_\theta^2\ell_n\big(\hat\theta_n + \xi_n(\hat\theta^0_n - \hat\theta_n)\big) - \partial_\theta^2\hat\ell_n\big\}\varphi_n\Big)\varphi_n^{-1}\varphi^\circ_n \xrightarrow{p}_u 0.
\]
For this, it is in turn sufficient that there exists a positive sequence $\epsilon'_n \to 0$ for which
\[
\epsilon'_n\,\big|\varphi_n^{-1}\varphi^\circ_n\big| \to_u 0, \qquad
\forall c > 0,\quad \max_{1\le k\le p}\ \sup_{\rho\in\Theta:\,|(\varphi^\circ_n)^{-1}(\rho-\theta)|\le c}\Big|\big(\varphi_n^\top\,\partial_{\theta_k}\partial_\theta^2\ell_n(\rho)\,\varphi_n\big)(\hat\theta^0_{k,n} - \hat\theta_{k,n})\Big| = O_{u,p}(1/\epsilon'_n)
\]
hold. Indeed, we can take $\epsilon'_n = n^{-1/4}\log^K(1/h)$ for some $K > 0$:
then the former is obvious, and we can verify the latter as in the proof of Eq.(14) in [2]. Thus we have obtained (2.14), which, combined with Theorem 2.1, ends the proof of Theorem 2.3. □
Remark 2.4. (1) As with the classical Fisher scoring, we could replace the definition of $\hat\theta^1_n$ by
\[
\hat\theta^1_n = \hat\theta^0_n + \varphi_n(\hat\theta^0_n)\,H_n(\hat\theta^0_n)^{-1}\,\Delta_n(\hat\theta^0_n)
\]
for any random function $H_n(\theta)$ for which $H_n(\theta) \xrightarrow{p}_u \mathcal{I}(\theta)$ and $\big(\hat H_n + \hat\varphi_n^\top\partial_\theta^2\hat\ell_n\,\hat\varphi_n\big)\hat\varphi_n^{-1}\varphi^\circ_n \xrightarrow{p}_u 0$.
This is easily seen by inspecting the proof above.

(2) We may repeatedly apply the above argument to obtain a $k$-step estimator. In the literature, one may need a refined one-step estimator when the rate of convergence of the initial estimator is "much" slower than the target one, see [12]; in our situation, just one step is enough to achieve the best.

Remark 2.5.
Having (2.13) in hand, we want to have consistent estimators $\hat{\mathcal{I}}_{\lambda,\mu,n} \xrightarrow{p}_u \mathcal{I}_{\lambda,\mu}(\theta)$ and $\hat{\mathcal{I}}_{\beta,\sigma,n} \xrightarrow{p}_u \mathcal{I}_{\beta,\sigma}(\theta)$, so that
\[
(2.16)\qquad \bigg(\hat{\mathcal{I}}^{1/2}_{\lambda,\mu,n}\,\sqrt n\,h^{1-1/\hat\beta^1_n}\begin{pmatrix}\hat\lambda^1_n - \lambda\\ \hat\mu^1_n - \mu\end{pmatrix},\ \hat{\mathcal{I}}^{1/2}_{\beta,\sigma,n}\,\sqrt n\,\tilde\varphi_n(\hat\theta^1_n)^{-1}\begin{pmatrix}\hat\beta^1_n - \beta\\ \hat\sigma^1_n - \sigma\end{pmatrix}\bigg) \xrightarrow{L}_u N_p(0, I_p),
\]
where $n^{-1/2}\tilde\varphi_n(\theta)$ denotes the lower-right $2\times 2$ block of $\varphi_n(\theta)$; (2.16) can be used for goodness-of-fit testing, in particular variable selection among the components of $X$. In view of the expressions (2.4) and (2.5), we can observe the following.
• The (Riemann) $dt$-integrals are replaced by the corresponding sample quantities (see (2.10)).
• The elements of the form $E_\theta\{H(\epsilon;\beta)\} = \int H(\epsilon;\beta)\phi_\beta(\epsilon)\,d\epsilon$ are estimated through numerical integration involving the density $\phi_\beta(\epsilon)$ and its partial derivatives with respect to $(\beta,\epsilon)$, with the estimate $\hat\beta^1_n$ plugged in for the value of $\beta$; a sketch is given after this list.
• The values $\varphi_{kl}(\theta)$ are estimated by plugging $\hat\theta^1_n$ into (2.3):
\[
\begin{aligned}
&(\hat\beta^1_n)^{-2}\log(1/h)\,\varphi_{11,n}(\hat\theta^1_n) + (\hat\sigma^1_n)^{-1}\varphi_{12,n}(\hat\theta^1_n) \xrightarrow{p}_u \varphi_{12}(\theta),\\
&(\hat\beta^1_n)^{-2}\log(1/h)\,\varphi_{21,n}(\hat\theta^1_n) + (\hat\sigma^1_n)^{-1}\varphi_{22,n}(\hat\theta^1_n) \xrightarrow{p}_u \varphi_{22}(\theta),\\
&\varphi_{11,n}(\hat\theta^1_n) \xrightarrow{p}_u \varphi_{11}(\theta), \qquad \varphi_{21,n}(\hat\theta^1_n) \xrightarrow{p}_u \varphi_{21}(\theta).
\end{aligned}
\]
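For the second bullet, here is a minimal numerical-integration sketch, assuming scipy's levy_stable density; the finite-difference step sizes and the truncation of the integration range are illustrative tuning choices, not prescriptions from the paper.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import levy_stable

def info_moments(beta, db=1e-4, dy=1e-4, lim=200.0):
    """E{f^2}, E{eps f g}, E{(1 + eps g)^2} with f = (d/d beta) log phi_beta
    and g = (d/dy) log phi_beta, evaluated at a plugged-in beta."""
    phi = lambda y: levy_stable.pdf(y, beta, 0.0)

    def f(y):  # central difference in beta
        return (levy_stable.pdf(y, beta + db, 0.0)
                - levy_stable.pdf(y, beta - db, 0.0)) / (2.0 * db) / phi(y)

    def g(y):  # central difference in y
        return (phi(y + dy) - phi(y - dy)) / (2.0 * dy) / phi(y)

    Eff = quad(lambda y: f(y) ** 2 * phi(y), -lim, lim, limit=200)[0]
    Efg = quad(lambda y: y * f(y) * g(y) * phi(y), -lim, lim, limit=200)[0]
    Egg = quad(lambda y: (1.0 + y * g(y)) ** 2 * phi(y), -lim, lim, limit=200)[0]
    return Eff, Efg, Egg
```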
A concrete example satisfying (2.11) will be briefly mentioned below. Borrowing some existing results, partly with slight modifications, we will propose to make use of the least absolute deviation (LAD) estimator for $(\lambda,\mu)$ and the power-variation based estimator for $(\beta,\sigma)$, under the assumption that
\[
(2.17)\qquad \beta \in \Big(\frac23,\ 2\Big).
\]
To avoid technical difficulty, we assume that we observe $\{(\int_{t_{j-1}}^{t_j}X_s\,ds,\ Y_{t_j})\}_{j=1}^n$ in the sequel. An additional constraint on the acceptable range of $\beta$ will be made in Section 2.2.3.

2.2.2. LAD estimator of $(\lambda,\mu)$. Let us recall the key expression:
\[
Y_{t_j} = e^{-\lambda h}Y_{t_{j-1}} + \mu\cdot\int_j e^{-\lambda(t_j-s)}X_s\,ds + \sigma\int_j e^{-\lambda(t_j-s)}\,dJ_s.
\]
We note the approximation of the Riemann integral:
\[
(2.18)\qquad \int_j e^{-\lambda(t_j-s)}X_s\,ds = \int_j X_s\,ds + O^*_u(h^2).
\]
This, combined with the conventional convexity argument, will make the subsequent exposition much simpler. We define the LAD estimator $(\hat\lambda^0_n, \hat\mu^0_n)$ as a minimizer of the random function
\[
(\lambda,\mu) \mapsto \sum_{j=1}^n \Big|Y_{t_j} - e^{-\lambda h}Y_{t_{j-1}} - \mu\cdot\int_j X_s\,ds\Big|.
\]
This is a slight modification of the previously studied approximate LAD estimator in [13] in the context of drift estimation of the ergodic locally stable OU process. Because the technical aspects here are quite analogous to those of [13], we will only mention an outline (a computational sketch is given at the end of this subsection).

We will work under $P_\theta$ below, unless otherwise mentioned. Let
\[
\epsilon'_j = \epsilon'_j(\theta) := h^{-1/\beta}\Big(Y_{t_j} - e^{-\lambda h}Y_{t_{j-1}} - \mu\cdot\int_j X_s\,ds\Big) = \epsilon_j(\theta)\,\sigma\eta(\lambda\beta h)^{1/\beta} + O^*_u(h^{2-1/\beta}).
\]
This will later enable us to approximate $\epsilon'_j(\theta) \approx$ i.i.d. $\mathcal{L}(\sigma\eta(\lambda\beta h)^{1/\beta}J_1) \approx \mathcal{L}(\sigma J_1)$. For brevity we write $x_{j-1} := (-Y_{t_{j-1}},\ h^{-1}\int_j X_s\,ds)$. Introduce the convex random fields
\[
\Lambda'_n(u) := \sum_{j=1}^n \Big(\big|\epsilon'_j - n^{-1/2}u\cdot x_{j-1}\big| - |\epsilon'_j|\Big), \qquad
\Lambda_n(u) := \sum_{j=1}^n \Big(\big|\epsilon_j(\theta)\sigma\eta(\lambda\beta h)^{1/\beta} - n^{-1/2}u\cdot x_{j-1}\big| - \big|\epsilon_j(\theta)\sigma\eta(\lambda\beta h)^{1/\beta}\big|\Big), \qquad u\in\mathbb{R}^{1+q}.
\]
Define the variable $\hat u_n = (\hat u_{1,n}, \hat u_{2,n})\in\mathbb{R}\times\mathbb{R}^q$ by $\hat u_{1,n} := -\sqrt n\,h^{-1/\beta}(e^{-\hat\lambda^0_n h} - e^{-\lambda h})$ and $\hat u_{2,n} := \sqrt n\,h^{1-1/\beta}(\hat\mu^0_n - \mu)$. Then, according to the definition of $(\hat\lambda^0_n, \hat\mu^0_n)$, $\hat u_n$ is a minimizer of $\Lambda'_n$. Further, when $\beta > 2/3$, by the triangle inequality we have, for each $u$,
\[
|\Lambda_n(u) - \Lambda'_n(u)| \lesssim |u|\,O_{u,p}\big(\sqrt n\,h^{2-1/\beta}\big) \lesssim |u|\,O_{u,p}\big(h^{3/2-1/\beta}\big) = |u|\,o_{u,p}(1).
\]
Next let
\[
\Lambda^\sharp_n(u) := \Delta_n[u] - \frac12\Gamma_n[u,u],
\]
where
\[
\Delta_n := \frac{1}{\sqrt n}\sum_{j=1}^n x_{j-1}\,\mathrm{sgn}(\epsilon_j), \qquad
\Gamma_n := 2\phi_{\sigma,\beta}(0)\,\frac1n\sum_{j=1}^n \begin{pmatrix}Y_{t_{j-1}}^2 & -Y_{t_{j-1}}X_{t_{j-1}}^\top\\ -Y_{t_{j-1}}X_{t_{j-1}} & X_{t_{j-1}}^{\otimes 2}\end{pmatrix},
\]
with $\phi_{\sigma,\beta}$ denoting the density of $\mathcal{L}(\sigma J_1)$. Moreover, we have
\[
\Gamma_n \xrightarrow{p}_u \Gamma := 2\phi_{\sigma,\beta}(0)\,\frac1T\int_0^T \begin{pmatrix}Y_t^2 & -Y_tX_t^\top\\ -Y_tX_t & X_t^{\otimes 2}\end{pmatrix} dt.
\]
Then we have the locally asymptotically quadratic structure
\[
(2.19)\qquad \Lambda_n(u) - \Lambda^\sharp_n(u) = o_{u,p}(1)
\]
uniformly in $u\in\mathbb{R}^{1+q}$ over compact sets; the proof is in a quite similar fashion to that of [13, Theorem 2.1], hence omitted.

We now verify the representation
\[
(2.20)\qquad \hat u_n = \Gamma^{-1}\Delta_n + o_{u,p}(1),
\]
from which it will follow that the LAD estimator $(\hat\lambda^0_n, \hat\mu^0_n)$ is $\sqrt n\,h^{1-1/\beta}$-consistent. The a.s. positive definiteness of $\Gamma$ implies that $\mathrm{argmin}\,\Lambda^\sharp_n$ a.s. consists of the single point $\hat u^\sharp_n := \Gamma^{-1}\Delta_n$. Fix any $\epsilon > 0$ and let $\delta_n(u) := \Lambda_n(u) - \Lambda^\sharp_n(u)$. Making use of the convexity of $\Lambda_n$, (2.19), and [7, Lemma 2], we have
\[
(2.21)\qquad P_\theta\big(|\hat u_n - \hat u^\sharp_n| \ge \epsilon\big)
\le P_\theta\Big(\sup_{u:|u-\hat u^\sharp_n|\le\epsilon}|\delta_n(u)| \ge \frac12\inf_{(u,v):|v|=1,\,u=\hat u^\sharp_n+\epsilon v}\big(\Lambda^\sharp_n(u) - \Lambda^\sharp_n(\hat u^\sharp_n)\big)\Big)
\le P_\theta\Big(\sup_{u:|u-\hat u^\sharp_n|\le\epsilon}|\delta_n(u)| \ge \frac{\epsilon^2\lambda_{\min}}{4}\Big)
\]
for each $n\in\mathbb{N}$, where $\lambda_{\min} > 0$ denotes the smallest eigenvalue of $\Gamma$. Given any $K > 0$, the upper bound in (2.21) is bounded by
\[
P_\theta\Big(\sup_{u:|u|\le K+\epsilon}|\delta_n(u)| \ge \frac{\epsilon^2\lambda_{\min}}{4}\Big) + \sup_n P_\theta\big(|\hat u^\sharp_n| \ge K\big).
\]
By (2.19) the first term $\to_u 0$, so that (2.20) follows from $\hat u^\sharp_n = O_{u,p}(1)$. To show $\Delta_n = O_{u,p}(1)$, we apply the Lenglart inequality (e.g. [8, I.3.31]): for any $K, K', K'' > 1$, we have
\[
\sup_n P_\theta(|\Delta_n| \ge K) \lesssim \frac{K'}{K^2} + \frac{K''}{K^2} + \sup_n P_\theta\Big(\frac1n\sum_{j=1}^n (1+Y_{t_{j-1}}^2) \gtrsim K''\,\|X\|_\infty^{-2}\Big).
\]
Since $n^{-1}\sum_{j=1}^n Y_{t_{j-1}}^2 = O_{u,p}(1)$, we conclude (2.20).
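Computationally, the LAD criterion is a median (0.5-quantile) regression of $Y_{t_j}$ on $(Y_{t_{j-1}}, \int_j X_s\,ds)$, so the initial estimator is available from standard software. A minimal sketch using statsmodels' QuantReg follows; the trapezoidal rule stands in for the observed integrals $\int_j X_s\,ds$, and the recovery of $\hat\lambda^0_n$ assumes the fitted autoregression coefficient is positive.

```python
import numpy as np
import statsmodels.api as sm

def lad_initial(Y, X, h):
    """LAD initial estimator (lam0, mu0) by median regression of Y_{t_j}
    on (Y_{t_{j-1}}, int_j X_s ds); Y is (n+1,), X is (n+1, q)."""
    intX = 0.5 * h * (X[:-1] + X[1:])         # trapezoidal int_j X_s ds
    design = np.column_stack([Y[:-1], intX])  # no intercept: the model has none
    fit = sm.QuantReg(Y[1:], design).fit(q=0.5)
    a_hat = fit.params[0]                     # estimate of e^{-lam h}, assumed > 0
    return -np.log(a_hat) / h, fit.params[1:]
```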
Remark 2.6. The structural assumptions could be much weakened for the LAD estimator to be $\sqrt n\,h^{1-1/\beta}$-consistent: the driving noise $J$ can be a symmetric locally $\beta$-stable Lévy process as in [14], [4], and [5]. The assumption that the covariate process $X$ is non-random could be removed if it satisfies a certain continuity-in-probability condition; also, when $X$ is random yet independent of $(Y_0, J)$, then by looking at the $X$-conditional probability measure from the very beginning we could proceed as if $X$ were a non-random process. Further, although it is enough here to look at the tightness (2.11), we could prove that $(\hat\lambda^0_n, \hat\mu^0_n)$ is asymptotically mixed-normally distributed. The same remarks as above apply to the power-variation based estimator $(\hat\beta^0_n, \hat\sigma^0_n)$ mentioned subsequently.

2.2.3. Power-variation based estimator of $(\beta,\sigma)$. We apply the limit theorems for the power-variation statistics via second-order increments [18]; note that we keep assuming (2.17). Pick an $r\in\big(|\beta-1|(\beta\wedge 1),\ \beta/2\big) \subset [0,1)$, and let
\[
V'_n(r) := \sum_{j=2}^n |\Delta_jY - \Delta_{j-1}Y|^r, \qquad
V''_n(r) := \sum_{j=4}^n |\Delta_jY - \Delta_{j-1}Y + \Delta_{j-2}Y - \Delta_{j-3}Y|^r.
\]
We introduce the following simple estimators of $\beta$ and $\sigma$:
\[
\hat\beta^0_n(r) := \frac{r\log 2}{\log\big(V''_n(r)/V'_n(r)\big)}, \qquad
\hat\sigma^0_n(r) := \Big\{\frac{T^{-r/\hat\beta^0_n(r)}\,n^{r/\hat\beta^0_n(r)-1}\,V'_n(r)}{m\big(r,\hat\beta^0_n(r)\big)}\Big\}^{1/r},
\]
where, with $J'$ and $J''$ denoting i.i.d. copies of $J$,
\[
m(r,\beta) := E_\theta\big(|J'_1 - J''_1|^r\big) = 2^{r/\beta}\,\frac{2^r\,\Gamma((r+1)/2)\,\Gamma(1-r/\beta)}{\sqrt\pi\,\Gamma(1-r/2)}.
\]
By [18, Theorem 3] we have both $\sqrt n\,(\hat\beta^0_n(r) - \beta) = O_{u,p}(1)$ and $\frac{\sqrt n}{\log(1/h)}(\hat\sigma^0_n(r) - \sigma) = O_{u,p}(1)$, hence the requirement in (2.11).
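The power-variation estimators are a few lines of numpy; the sketch below transcribes $V'_n(r)$, $V''_n(r)$, $\hat\beta^0_n(r)$, and $\hat\sigma^0_n(r)$ directly, with the tuning exponent $r$ (subject to the constraint above) left to the user.

```python
import numpy as np
from scipy.special import gamma

def m_rbeta(r, beta):
    """m(r, beta) = E|J'_1 - J''_1|^r for standard symmetric beta-stable J."""
    return (2.0 ** (r / beta) * 2.0 ** r * gamma((r + 1.0) / 2.0)
            * gamma(1.0 - r / beta) / (np.sqrt(np.pi) * gamma(1.0 - r / 2.0)))

def pv_initial(Y, T, r):
    """Initial estimators (beta0, sigma0) from second-order increments."""
    n = len(Y) - 1
    dd = np.diff(Y, 2)                          # Delta_j Y - Delta_{j-1} Y, j = 2..n
    V1 = np.sum(np.abs(dd) ** r)
    V2 = np.sum(np.abs(dd[2:] + dd[:-2]) ** r)  # the lag-2 aggregated statistic
    beta0 = r * np.log(2.0) / np.log(V2 / V1)
    sigma0 = (T ** (-r / beta0) * n ** (r / beta0 - 1.0)
              * V1 / m_rbeta(r, beta0)) ** (1.0 / r)
    return beta0, sigma0
```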
Acknowledgement. This work was supported by JSPS KAKENHI Grant Number JP17K05367 and by JST CREST Grant Number JPMJCR14D7, Japan.
References

[1] S. Borovkova and M. D. Schmeck. Electricity price modeling with stochastic time change. Energy Economics, 63:51–65, 2017.
[2] A. Brouste and H. Masuda. Efficient estimation of stable Lévy process with symmetric jumps. Stat. Inference Stoch. Process., 21(2):289–307, 2018.
[3] C. J. Challis and S. C. Schmidler. A stochastic evolutionary model for protein structure alignment and phylogeny. Molecular Biology and Evolution, 29(11):3575–3587, 2012.
[4] E. Clément and A. Gloter. Estimating functions for SDE driven by stable Lévy processes. Ann. Inst. Henri Poincaré Probab. Stat., 55(3):1316–1348, 2019.
[5] E. Clément and A. Gloter. Joint estimation for SDE driven by locally stable Lévy processes. Preprint hal-02125428, 2019.
[6] W. H. DuMouchel. On the asymptotic normality of the maximum-likelihood estimate when sampling from a stable distribution. Ann. Statist., 1:948–957, 1973.
[7] N. L. Hjort and D. Pollard. Asymptotics for minimisers of convex processes. Statistical Research Report, University of Oslo, 1993. Available at arXiv:1107.3806, 2011.
[8] J. Jacod and A. N. Shiryaev. Limit Theorems for Stochastic Processes, volume 288 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, Berlin, second edition, 2003.
[9] P. Jeganathan. Some aspects of asymptotic theory with applications to time series models. Econometric Theory, 11(5):818–887, 1995. Trending multiple time series (New Haven, CT, 1993).
[10] D.-C. Jhwueng and V. Maroulas. Phylogenetic Ornstein-Uhlenbeck regression curves. Statist. Probab. Lett., 89:110–117, 2014.
[11] E. L. Lehmann. Elements of Large-Sample Theory. Springer Texts in Statistics. Springer-Verlag, New York, 1999.
[12] Y. Y. Linke. Refinement of Fisher's one-step estimators in the case of slowly converging initial estimators. Theory Probab. Appl., 60(1):88–102, 2016.
[13] H. Masuda. Approximate self-weighted LAD estimation of discretely observed ergodic Ornstein-Uhlenbeck processes. Electron. J. Stat., 4:525–565, 2010.
[14] H. Masuda. Non-Gaussian quasi-likelihood estimation of SDE driven by locally stable Lévy process. Stochastic Process. Appl., 129(3):1013–1059, 2019.
[15] M. Perninge, V. Knazkins, M. Amelin, and L. Söder. Modeling the electric power consumption in a multi-area system. European Transactions on Electrical Power, 21(1):413–423, 2011.
[16] D. M. M. Samarakoon and K. Knight. A note on unit root tests with infinite variance noise. Econometric Rev., 28(4):314–334, 2009.
[17] T. J. Sweeting. Uniform asymptotic normality of the maximum likelihood estimator. Ann. Statist., 8(6):1375–1381, 1980. Correction: Ann. Statist. (1982), 320.
[18] V. Todorov. Power variation from second order differences for pure jump semimartingales. Stochastic Process. Appl., 123(7):2829–2850, 2013.
[19] H. Verdejo, A. Awerkin, W. Kliemann, and C. Becker. Modelling uncertainties in electrical power systems with stochastic differential equations. International Journal of Electrical Power & Energy Systems, 113:322–332, 2019.
[20] H. Verdejo, A. Awerkin, E. Saavedra, W. Kliemann, and L. Vargas. Stochastic modeling to represent wind power generation and demand in electric power system based on real data. Applied Energy, 173:283–295, 2016.
Faculty of Mathematics, Kyushu University, 744 Motooka Nishi-ku Fukuoka 819-0395, Japan
E-mail address: