Affine Jump-Diffusions: Stochastic Stability and Limit Theorems

Abstract

Affine jump-diffusions constitute a large class of continuous-time stochastic models that are particularly popular in finance and economics due to their analytical tractability. Methods for parameter estimation for such processes require ergodicity in order to establish consistency and asymptotic normality of the associated estimators. In this paper, we develop stochastic stability conditions for affine jump-diffusions, thereby providing the needed large-sample theoretical support for estimating such processes. We establish ergodicity for such models by imposing a "strong mean reversion" condition and a mild condition on the distribution of the jumps, i.e. the finiteness of a logarithmic moment. Exponential ergodicity holds if the jumps have a finite moment of a positive order. In addition, we prove strong laws of large numbers and functional central limit theorems for additive functionals for this class of models.



Xiaowei Zhang † Peter W. Glynn ∗

Key words. affine jump-diffusion; ergodicity; Lyapunov inequality; strong law of large numbers; functional central limit theorem

Affine jump-diffusion (AJD) processes constitute an important class of continuous-time stochastic models that are widely used in finance and econometrics. This class of models is flexible enough to capture various empirical attributes such as stochastic volatility and leverage effects; see, e.g., Barndorff-Nielsen and Shephard (2001). Furthermore, the affine structure permits efficient computation, because the characteristic function of the transient distribution is of exponential-affine form. The transform can then be computed by solving a system of ordinary differential equations (ODEs) of generalized Riccati type; see Duffie et al. (2000). The ability to compute such characteristic functions efficiently leads to significant tractability, both for computing various expectations and probabilities and for using the "method of moments" to calibrate such models; see Singleton (2001), Bates (2006), and Filipović et al. (2013).

† Corresponding author. Department of Management Sciences, College of Business, City University of Hong Kong, Hong Kong. Email: [email protected]
∗ Department of Management Science and Engineering, Stanford University, CA 94305, U.S.
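The Riccati route to the transform can be illustrated on the simplest case. The sketch below assumes a one-dimensional CIR-type process $\mathrm{d}X = (b + \beta X)\,\mathrm{d}t + \sqrt{\alpha X}\,\mathrm{d}W$ without jumps and a real argument $u \le 0$ (a Laplace transform rather than a characteristic function), for which the system reduces to $\partial_t\psi = \beta\psi + \tfrac{1}{2}\alpha\psi^2$, $\partial_t\phi = b\psi$ with $\psi(0) = u$, $\phi(0) = 0$. The function names and the choice of a classical Runge-Kutta integrator are illustrative assumptions, not part of the paper.

```python
import math


def riccati_cir(u, T, b, beta, alpha, steps=2000):
    """Integrate psi' = beta*psi + 0.5*alpha*psi**2 and phi' = b*psi by RK4.

    Starts from psi(0) = u, phi(0) = 0 and returns (phi(T), psi(T)).
    Hypothetical helper for illustration only.
    """
    def rhs(psi):
        # Right-hand sides of the two Riccati equations (no jump term here).
        return b * psi, beta * psi + 0.5 * alpha * psi * psi

    h = T / steps
    phi, psi = 0.0, u
    for _ in range(steps):
        k1p, k1s = rhs(psi)
        k2p, k2s = rhs(psi + 0.5 * h * k1s)
        k3p, k3s = rhs(psi + 0.5 * h * k2s)
        k4p, k4s = rhs(psi + h * k3s)
        phi += h * (k1p + 2 * k2p + 2 * k3p + k4p) / 6.0
        psi += h * (k1s + 2 * k2s + 2 * k3s + k4s) / 6.0
    return phi, psi


def closed_form_cir(u, T, b, beta, alpha):
    """Closed-form solution of the same system, used as a cross-check."""
    denom = 1.0 - alpha * u * (math.exp(beta * T) - 1.0) / (2.0 * beta)
    psi = u * math.exp(beta * T) / denom
    phi = -(2.0 * b / alpha) * math.log(denom)
    return phi, psi


def transform(u, T, x, b, beta, alpha):
    """E_x[exp(u X(T))] = exp(phi(T, u) + psi(T, u) * x) under the assumptions above."""
    phi, psi = riccati_cir(u, T, b, beta, alpha)
    return math.exp(phi + psi * x)
```

For example, `transform(-1.0, 1.0, 0.5, 2.0, -1.0, 1.0)` evaluates the transform at $u = -1$, $T = 1$, $x = 0.5$ for illustrative parameters satisfying $2b > \alpha > 0$ and $\beta < 0$. In higher dimensions or with jumps the same pattern applies, with a vector-valued $\psi$ and an additional integral term in each equation.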

We will adopt the following notation throughout the paper.

• We write $\mathbb{R}^d_+ := \{v \in \mathbb{R}^d : v_i \ge 0,\ i = 1, \ldots, d\}$ and $\mathbb{R}^d_- := \{v \in \mathbb{R}^d : v_i \le 0,\ i = 1, \ldots, d\}$.
• A vector $v \in \mathbb{R}^d$ is treated as a column vector, $v^\intercal$ denotes its transpose, and $\|v\|$ denotes its Euclidean norm.
• For a matrix $A$, $A \succeq 0$ means that $A$ is symmetric positive semidefinite and $A \succ 0$ means that $A$ is symmetric positive definite.
• We write $v_I = (v_i : i \in I)$ and $A_{IJ} = (A_{ij} : i \in I, j \in J)$, where $v \in \mathbb{R}^d$ is a vector, $A \in \mathbb{R}^{d \times d}$ is a matrix, and $I, J \subseteq \{1, \ldots, d\}$ are two index sets.
• We use $0$ to denote a zero vector or a zero matrix, and $\mathrm{Id}(i)$ to denote a matrix whose entries are all zero except the $i$-th diagonal entry, which is 1, regardless of dimension.
• For a set $K \subseteq \mathbb{R}^d$, $I_K(x)$ denotes the indicator function associated with $K$, i.e. $I_K(x) = 1$ if $x \in K$ and 0 otherwise.

Fix a complete probability space $(\Omega, \mathcal{F}, \mathbb{P})$ equipped with a filtration $\{\mathcal{F}_t : t \ge 0\}$ that satisfies the usual hypotheses (Protter 2003, p.3). Suppose that a stochastic process $X = (X(t) : t \ge 0)$ with state space $\mathcal{X} \subseteq \mathbb{R}^d$ satisfies the following stochastic differential equation (SDE)

$$\mathrm{d}X(t) = \mu(X(t))\,\mathrm{d}t + \sigma(X(t))\,\mathrm{d}W(t) + \int_{\mathbb{R}^d} z\,N(\mathrm{d}t, \mathrm{d}z), \qquad X(0) = x \in \mathcal{X}, \tag{1}$$

where $W = (W(t) : t \ge 0)$ is a $d$-dimensional Wiener process and $N(\mathrm{d}t, \mathrm{d}z)$ is a random counting measure on $[0, \infty) \times \mathbb{R}^d$ with compensator measure $\Lambda(X(t-))\,\mathrm{d}t\,\nu(\mathrm{d}z)$; moreover, $\mu : \mathbb{R}^d \to \mathbb{R}^d$, $\sigma : \mathbb{R}^d \to \mathbb{R}^{d \times d}$, and $\Lambda : \mathbb{R}^d \to \mathbb{R}$ are measurable functions, and $\nu$ is a Borel measure on $\mathbb{R}^d$. In the sequel, we will write $\mathbb{P}_x(\cdot) = \mathbb{P}(\cdot \mid X(0) = x)$ and $\mathbb{P}_\eta(\cdot) = \int_{\mathcal{X}} \mathbb{P}(\cdot \mid X(0) = x)\,\eta(\mathrm{d}x)$ for an initial distribution $\eta$; $\mathbb{E}_x$ and $\mathbb{E}_\eta$ denote the corresponding expectation operators.

We call $X$ an AJD if the drift $\mu(x)$, diffusion matrix $\sigma(x)\sigma(x)^\intercal$, and jump intensity $\Lambda(x)$ are all affine in $x$, namely,

$$\begin{aligned}
\mu(x) &= b + \beta x, && b \in \mathbb{R}^d,\ \beta \in \mathbb{R}^{d \times d}, \\
\sigma(x)\sigma(x)^\intercal &= a + \sum_{i=1}^d x_i \alpha_i, && a \in \mathbb{R}^{d \times d},\ \alpha_i \in \mathbb{R}^{d \times d},\ i = 1, \ldots, d, \\
\Lambda(x) &= \lambda + \kappa^\intercal x, && \lambda \in \mathbb{R},\ \kappa \in \mathbb{R}^d.
\end{aligned} \tag{2}$$

This paper is largely motivated by statistical calibration of AJDs. Most calibration procedures that have been applied to AJDs are based on some estimating equation as follows. Let $\Xi$ denote the collection of unknown parameters. For simplicity, we assume that the process $X$ is discretely sampled at time epochs $\{k\Delta : k = 0, 1, \ldots, n\}$ for some $\Delta > 0$. To estimate $\Xi$, one judiciously selects a tractable function $h(x, y; \Xi)$ for which $\mathbb{E}[h(X(0), X(\Delta); \Xi)] = 0$, and then solves the equation

$$\frac{1}{n} \sum_{k=1}^n h\big(X((k-1)\Delta), X(k\Delta); \hat{\Xi}_n\big) = 0$$

to compute the estimate $\hat{\Xi}_n$. In a situation where the dimension of $h$ is greater than the dimension of $\Xi$, one can use the generalized method of moments (Hansen 1982). Typical choices of $h$ include the marginal characteristic function of the conditional distribution of $X(k\Delta)$ given $X((k-1)\Delta)$, and $\mathcal{A}g(x)$ for some tractable function $g$ with enough smoothness, where $\mathcal{A}$ is the operator defined in (10), as in Hansen and Scheinkman (1995). See also Duffie and Glynn (2004) for a choice of $h$ that also utilizes the operator $\mathcal{A}$ but in a context where $X$ is sampled at random times rather than deterministic times.

In order to establish consistency and asymptotic normality of $\hat{\Xi}_n$, it is standard to assume positive Harris recurrence as well as certain moment conditions on the function $h$; see, e.g., Hansen (1982). The SLLNs and FCLTs that we present as part of Theorem 1 and Theorem 2 provide large-sample theoretical support for establishing these asymptotic properties of the estimator. We refer interested readers to Aït-Sahalia (2007) for an extensive survey on various statistical calibration methods for general jump-diffusions and related assumptions for statistical validity.

The following three assumptions are universal throughout the paper.
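Before the assumptions are stated, the estimating-equation recipe described above can be made concrete with a minimal sketch. It assumes the simplest AJD, a one-dimensional Gaussian Ornstein-Uhlenbeck process $\mathrm{d}X = (b + \beta X)\,\mathrm{d}t + \sqrt{a}\,\mathrm{d}W$ with no jumps, whose conditional mean is $\mathbb{E}[X(\Delta) \mid X(0) = x] = Ax + B$ with $A = e^{\beta\Delta}$ and $B = (b/\beta)(A - 1)$. The estimating function $h(x, y; \Xi) = (y - Ax - B)\,(1, x)^\intercal$, the function names, and the parameter values are illustrative choices and are not taken from the paper.

```python
import math
import random


def simulate_ou(n, delta, b, beta, a, x0=0.0, seed=0):
    """Exact Delta-skeleton of dX = (b + beta*X) dt + sqrt(a) dW (hypothetical helper)."""
    rng = random.Random(seed)
    A = math.exp(beta * delta)                         # autoregressive coefficient
    B = (b / beta) * (A - 1.0)                         # intercept of the conditional mean
    sd = math.sqrt(a * (A * A - 1.0) / (2.0 * beta))   # conditional standard deviation
    path = [x0]
    for _ in range(n):
        path.append(A * path[-1] + B + sd * rng.gauss(0.0, 1.0))
    return path


def estimate(path, delta):
    """Solve (1/n) sum_k h(X_{k-1}, X_k; Xi) = 0 for h = (y - A x - B) * (1, x).

    The two sample moment conditions are the least-squares normal equations,
    so (A, B) has a closed form; (b, beta) is then recovered by inversion.
    """
    xs, ys = path[:-1], path[1:]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    A = sxy / sxx
    B = my - A * mx
    beta_hat = math.log(A) / delta
    b_hat = B * beta_hat / (A - 1.0)
    return b_hat, beta_hat
```

With a long sample from a mean-reverting specification ($\beta < 0$), the estimates settle near the true values; the ergodic theory developed in the paper is what justifies this kind of large-sample behaviour for general AJDs.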

Assumption 1. Let $\mathcal{X} = \mathbb{R}^m_+ \times \mathbb{R}^{d-m}$. For each $x \in \mathcal{X}$, there exists a unique $\mathcal{X}$-valued strong solution to the SDE (1) with coefficients (2).

Assumption 2. Let $\mathcal{I} = \{1, \ldots, m\}$ and $\mathcal{J} = \{m+1, \ldots, d\}$ for some $0 \le m \le d$.
(i) $a \succeq 0$ with $a_{\mathcal{I}\mathcal{I}} = 0$;
(ii) $\alpha_i \succeq 0$ and $\alpha_{i,\mathcal{I}\mathcal{I}} = \alpha_{i,ii} \cdot \mathrm{Id}(i)$ for $i \in \mathcal{I}$; $\alpha_i = 0$ for $i \in \mathcal{J}$;
(iii) $b \in \mathbb{R}^m_+ \times \mathbb{R}^{d-m}$;
(iv) $\beta_{\mathcal{I}\mathcal{J}} = 0$ and $\beta_{\mathcal{I}\mathcal{I}}$ has non-negative off-diagonal elements;
(v) $\lambda \in \mathbb{R}_+$, $\kappa_{\mathcal{I}} \in \mathbb{R}^m_+$ and $\kappa_{\mathcal{J}} = 0$;
(vi) $\nu$ is a probability distribution on $\mathcal{X}$.

Assumption 3. $a_{\mathcal{J}\mathcal{J}} \succ 0$ and $2b_i > \alpha_{i,ii} > 0$ for $i = 1, \ldots, m$.

In this paper, we focus on AJDs with canonical state space (Assumption 1) and admissible parameters (Assumption 2). In the absence of jumps (i.e., $\lambda = 0$ and $\kappa = 0$), the existence and uniqueness of a strong solution to the SDE (1) with coefficients (2) is established in Filipović and Mayerhofer (2009). They first prove the existence of a weak solution, then prove pathwise uniqueness of the solution, and finally apply the Yamada–Watanabe theorem (Karatzas and Shreve 1991, Corollary 5.3.23). The same approach is followed in Dawson and Li (2006) to prove the case of AJDs in one or two dimensions.

Clearly, under Assumption 2 both the diffusion matrix and the jump intensity are independent of $x_{\mathcal{J}}$, i.e. $\sigma(x)\sigma(x)^\intercal = a + \sum_{i=1}^m x_i \alpha_i$ and $\Lambda(x) = \lambda + \sum_{i=1}^m x_i \kappa_i$. In financial applications, the first $m$ components $(X_1, \ldots, X_m)$ are often used to model volatility processes and thus are referred to as volatility factors, whereas the other $(d - m)$ components are referred to as dependent factors.

The jumps of the AJDs we study here have finite activity, a consequence of the fact that $\nu$ is assumed to be a probability distribution rather than a $\sigma$-finite measure. Nevertheless, this restriction is imposed merely for mathematical simplicity; the main results could also be proved for the case of infinite activity at the cost of a more involved analysis. One may recognize that the SDE (1) with finite-activity jumps is precisely the model proposed in Duffie et al. (2000), which already covers a substantial number of financial and economic applications.

For a one-dimensional AJD such as the CIR model, the condition $2b_i > \alpha_{i,ii} > 0$ […] $X$ admits a positive transition density. Note that the existence of a transition density for AJDs is established in Filipović et al. (2013), but their proof requires $b_i > \alpha_{i,ii} > 0$ for $i = 1, \ldots, m$, which is stronger than our Assumption 3.

Prior to presenting the main results of the paper, let us review several concepts regarding stochastic stability of a Markov process.

Definition 1. A Markov process $X$ with state space $\mathcal{X}$ is called Harris recurrent if there exists a non-trivial $\sigma$-finite measure $\varphi$ on $\mathcal{X}$ such that $\int_0^\infty I_K(X(t))\,\mathrm{d}t = \infty$, $\mathbb{P}_x$-a.s., for all $x \in \mathcal{X}$ and any measurable set $K$ with $\varphi(K) > 0$.

Definition 2. A Harris recurrent Markov process $X$ is called positive Harris recurrent if it admits a finite invariant measure $\pi$, which can be normalized to a probability measure that is called the stationary distribution of $X$; the measure $\pi$ must necessarily be unique.

Definition 3. For a Markov process $X$ with state space $\mathcal{X}$, a set $K \subseteq \mathcal{X}$ is called uniformly transient if there exists $M < \infty$ such that $\mathbb{E}_x \int_0^\infty I_K(X(t))\,\mathrm{d}t \le M$ for all $x \in \mathcal{X}$. Furthermore, $X$ is called transient if there is a countable cover of $\mathcal{X}$ with uniformly transient sets.

Definition 4. For any measurable function $f : \mathcal{X} \to [1, \infty)$ and any signed measure $\varphi$ on $\mathcal{X}$, define the $f$-norm of $\varphi$ by $\|\varphi\|_f := \sup_{|h| \le f} |\varphi(h)|$, where $\varphi(h) := \int_{\mathcal{X}} h(x)\,\varphi(\mathrm{d}x)$. When $f \equiv 1$, $\|\cdot\|_f$ is called the total variation norm and is denoted by $\|\cdot\|$.

The following concept is also needed to state our condition for stochastic stability for AJDs.

Definition 5. A square matrix is called stable if all its eigenvalues have negative real parts.

The following notation will facilitate the presentation in the sequel. Let $Z$ denote an $\mathbb{R}^d$-valued random variable with distribution $\nu$. For $q > 0$, set $f_q(x) := 1 + \|x\|^q$. For any measurable functions $f : \mathcal{X} \to [1, \infty)$ and $h : \mathcal{X} \to \mathbb{R}$, set $\|h\|_f := \sup_{x \in \mathcal{X}} \{|h(x)| / f(x)\}$. Let $D[0, 1]$ denote the space of right-continuous functions $x : [0, 1] \to \mathbb{R}$ with left limits, endowed with the Skorokhod topology.

A distinctive feature of AJDs, besides the affine structure, relative to other jump-diffusion models is that the jump intensity is state-dependent. This property endows AJDs with greater flexibility in financial modeling but creates technical difficulties for analyzing the dynamics of the process. Indeed, differing theoretical treatments are needed, depending on whether the jump intensity is state-dependent, when we establish Lyapunov inequalities in Section 3. We therefore present our main results in two separate theorems. Theorem 1 covers only AJDs with state-independent jump intensities ($\kappa = 0$), whereas Theorem 2 allows state-dependent jump intensities.

Theorem 1.

If Assumptions 1–3 hold, $\kappa = 0$, $\beta$ is a stable matrix, and $\mathbb{E} \log(1 + \|Z\|) < \infty$, then:

(i) $X$ is positive Harris recurrent and

$$\lim_{t \to \infty} \|\mathbb{P}_x(X(t) \in \cdot) - \pi(\cdot)\| = 0, \qquad x \in \mathcal{X}, \tag{3}$$

where $\pi$ is the stationary distribution of $X$.

If, in addition, $\mathbb{E}\|Z\|^p < \infty$ for some $p > 0$, then:

(ii) For each $q \in (0, p]$, there exist positive finite constants $c_q$ and $\rho_q$ such that

$$\|\mathbb{P}_x(X(t) \in \cdot) - \pi(\cdot)\|_{f_q} \le c_q f_q(x) e^{-\rho_q t}, \qquad t \ge 0,\ x \in \mathcal{X}. \tag{4}$$

(iii) For any measurable function $h : \mathcal{X} \to \mathbb{R}$ with $\|h\|_{f_p} < \infty$,

$$\mathbb{P}_x\left(\lim_{t \to \infty} \frac{1}{t} \int_0^t h(X(s))\,\mathrm{d}s = \pi(h)\right) = 1, \qquad x \in \mathcal{X}, \tag{5}$$

and

$$\mathbb{P}_x\left(\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^n h(X(i\Delta)) = \pi(h)\right) = 1, \qquad x \in \mathcal{X}. \tag{6}$$

(iv) For any measurable function $h : \mathcal{X} \to \mathbb{R}$ with $\|h^q\|_{f_p} < \infty$ for some $q > 2$, there exist non-negative finite constants $\sigma_h$ and $\gamma_h$ such that

$$n^{1/2}\left(\frac{1}{n} \int_0^{n\cdot} h(X(s))\,\mathrm{d}s - \pi(h)\right) \Rightarrow \sigma_h W(\cdot), \tag{7}$$

and

$$n^{1/2}\left(\frac{1}{n} \sum_{i=1}^{\lfloor n\cdot \rfloor} h(X(i\Delta)) - \pi(h)\right) \Rightarrow \gamma_h W(\cdot), \tag{8}$$

as $n \to \infty$ $\mathbb{P}_x$-weakly in $D[0, 1]$ for all $x \in \mathcal{X}$, where $W$ is a one-dimensional Wiener process.

Theorem 2.

If Assumptions 1–3 hold, $\beta + \mathbb{E}(Z)\kappa^\intercal$ is a stable matrix, and $\mathbb{E}\|Z\| < \infty$, then:

(i) $X$ is positive Harris recurrent and (3) holds.

If, in addition, $\mathbb{E}\|Z\|^p < \infty$ for some $p \ge 1$, then:

(ii) For each $q \in [1, p]$, there exist positive finite constants $c_q$ and $\rho_q$ such that (4) holds.
(iii) For any measurable function $h : \mathcal{X} \to \mathbb{R}$ with $\|h\|_{f_p} < \infty$, (5) and (6) hold.
(iv) For any measurable function $h : \mathcal{X} \to \mathbb{R}$ with $\|h^q\|_{f_p} < \infty$ for some $q > 2$, there exist non-negative finite constants $\sigma_h$ and $\gamma_h$ such that (7) and (8) hold as $n \to \infty$ $\mathbb{P}_x$-weakly in $D[0, 1]$ for all $x \in \mathcal{X}$.

We note that $X$ is called ergodic if it has a stationary distribution $\pi$ and the convergence (3) holds, whereas it is called $f$-exponentially ergodic if

$$\|\mathbb{P}_x(X(t) \in \cdot) - \pi(\cdot)\|_f \le c(x) e^{-\rho t}, \qquad t \ge 0,\ x \in \mathcal{X},$$

for some functions $f : \mathcal{X} \to [1, \infty)$, $c : \mathcal{X} \to \mathbb{R}_+$ and some positive finite constant $\rho$. Clearly, $X$ is $f_p$-exponentially ergodic under the assumptions of Theorem 1(ii) or Theorem 2(ii).

The key condition imposed here to establish positive Harris recurrence of AJDs is that $\beta + \mathbb{E}(Z)\kappa^\intercal$ is a stable matrix. If we adopt the convention that $0 \times \infty = 0$, then when $\kappa = 0$ this condition reduces to the stability of $\beta$, regardless of the finiteness of $\mathbb{E}\|Z\|$. The condition that $\beta$ is a stable matrix is typically assumed in the literature, including Sato and Yamazato (1984), Glasserman and Kim (2010), and Jena et al. (2012), in order that the process be mean reverting and have a stationary distribution. However, the first of the three articles works on a special Lévy-driven SDE, whereas the other two study affine diffusions (ADs), so none of them allows state-dependent jump intensities as AJDs do. It can be shown that the stability of $\beta + \mathbb{E}(Z)\kappa^\intercal$ implies that of $\beta$; see Lemma 3 of Zhang et al. (2015). Thus, our condition is stronger and we call it the strong mean reversion condition.

Note that $\mathbb{E}(Z)$ is the mean jump size and $\kappa$ largely determines the magnitude of the jump intensity when the AJD takes on big values.
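The strong mean reversion condition is easy to check numerically in small dimensions. The sketch below is an illustration only, restricted to $d = 2$ with $m = 1$; it uses the standard fact that a real $2 \times 2$ matrix is stable exactly when its trace is negative and its determinant is positive. The function names and the numerical values are hypothetical.

```python
def is_stable_2x2(m):
    """True when both eigenvalues of the real 2x2 matrix m have negative real parts.

    For a real 2x2 matrix this holds exactly when trace < 0 and determinant > 0.
    """
    (p, q), (r, s) = m
    return (p + s) < 0 and (p * s - q * r) > 0


def strong_mean_reversion_matrix(beta, mean_jump, kappa):
    """Return beta + E(Z) kappa^T for 2-dimensional inputs (hypothetical helper)."""
    return [[beta[i][j] + mean_jump[i] * kappa[j] for j in range(2)]
            for i in range(2)]


# Illustrative parameters with one volatility factor (m = 1, d = 2):
# beta_IJ = 0, kappa_J = 0 and kappa_I >= 0, as the admissibility conditions require.
beta = [[-1.0, 0.0], [0.5, -2.0]]
kappa = [2.0, 0.0]

small_jumps = strong_mean_reversion_matrix(beta, [0.3, 0.1], kappa)
large_jumps = strong_mean_reversion_matrix(beta, [0.6, 0.1], kappa)
```

Here `beta` itself is stable in both cases, but only `small_jumps` passes `is_stable_2x2`: with the larger mean jump size the feedback from the state-dependent intensity outweighs the mean reversion in the first coordinate, which is the situation the condition is designed to exclude.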
To some extent, $\mathbb{E}(Z)\kappa^\intercal$ captures the impact of the jumps. Thus, by imposing the stability of $\beta + \mathbb{E}(Z)\kappa^\intercal$, we essentially assume that mean reversion is a dominating factor, more significant than the jumps, in the dynamics of the process. On the other hand, this condition is technically mild. Indeed, we show in Section 3.3 that it cannot be relaxed in general if positive Harris recurrence of an AJD is desired.

In this section, we apply Lyapunov ideas to address the stochastic stability of $X$. A key step in this approach is to judiciously construct suitable Lyapunov functions; see Meyn and Tweedie (1993c) for an extensive treatment of this approach. Nevertheless, we do not directly use the results there because their theory uses a definition of domain that insists on functions inducing martingales, whereas we work with local martingales.

Consider a twice-differentiable function $g : \mathcal{X} \to \mathbb{R}$. By virtue of Itô's formula,

$$\begin{aligned}
g(X(t)) = g(X(0)) &+ \int_0^t \Big[\nabla g(X(s-))^\intercal \mu(X(s-)) + \frac{1}{2} \sum_{i,j=1}^d \frac{\partial^2 g(X(s-))}{\partial x_i \partial x_j} \big(\sigma(X(s-))\sigma(X(s-))^\intercal\big)_{ij}\Big]\,\mathrm{d}s \\
&+ \int_0^t \nabla g(X(s-))^\intercal \sigma(X(s-))\,\mathrm{d}W(s) + \int_0^t \int_{\mathcal{X}} \big(g(X(s-) + z) - g(X(s-))\big)\,N(\mathrm{d}s, \mathrm{d}z).
\end{aligned} \tag{9}$$

By defining operators $\mathcal{G}$, $\mathcal{L}$, and $\mathcal{A}$ on twice-differentiable, appropriately integrable functions $g$ via

$$\begin{aligned}
\mathcal{G}g(x) &:= \nabla g(x) \cdot (b + \beta x) + \frac{1}{2} \sum_{i,j=1}^d \frac{\partial^2 g(x)}{\partial x_i \partial x_j} \Big(a_{ij} + \sum_{k=1}^d \alpha_{k,ij} x_k\Big), \\
\mathcal{L}g(x) &:= (\lambda + \kappa^\intercal x) \int_{\mathcal{X}} \big(g(x + z) - g(x)\big)\,\nu(\mathrm{d}z), \\
\mathcal{A}g(x) &:= \mathcal{G}g(x) + \mathcal{L}g(x),
\end{aligned} \tag{10}$$

we may rewrite (9) as

$$\begin{aligned}
g(X(t)) &= g(X(0)) + \int_0^t \mathcal{A}g(X(s-))\,\mathrm{d}s + S_1(t) + S_2(t), \\
S_1(t) &:= \int_0^t \nabla g(X(s-))^\intercal \sigma(X(s-))\,\mathrm{d}W(s), \\
S_2(t) &:= \int_0^t \int_{\mathcal{X}} \big(g(X(s-) + z) - g(X(s-))\big)\,\tilde{N}(\mathrm{d}s, \mathrm{d}z),
\end{aligned} \tag{11}$$

where $\tilde{N}(\mathrm{d}s, \mathrm{d}z) = N(\mathrm{d}s, \mathrm{d}z) - \Lambda(X(s-))\,\mathrm{d}s\,\nu(\mathrm{d}z)$ is the compensated random measure of $N(\mathrm{d}s, \mathrm{d}z)$.

We introduce some notation to facilitate the construction of the needed Lyapunov inequalities. First, for a $d \times d$ matrix $H \succ 0$, define $\|v\|_H := \sqrt{v^\intercal H v}$. Then, $\|\cdot\|_H$ is a vector norm on $\mathbb{R}^d$ and it is easy to show that

$$\underline{\delta}\,\|v\|^2 \le \|v\|_H^2 \le \bar{\delta}\,\|v\|^2, \qquad v \in \mathbb{R}^d, \tag{12}$$

where $(\delta_i : i = 1, \ldots, d)$ are the eigenvalues of $H$, $\underline{\delta} = \min\{\delta_i : i = 1, \ldots, d\}$ and $\bar{\delta} = \max\{\delta_i : i = 1, \ldots, d\}$. We can then define the following induced matrix norms (see Horn and Johnson (2012, p.340)). For a matrix $A \in \mathbb{R}^{d \times d}$, define

$$|||A||| := \sup\left\{\frac{\|Av\|}{\|v\|} : 0 \ne v \in \mathbb{R}^d\right\} \quad \text{and} \quad |||A|||_H := \sup\left\{\frac{\|Av\|_H}{\|v\|_H} : 0 \ne v \in \mathbb{R}^d\right\}.$$

For each $\Delta > 0$, let $X^\Delta := (X(n\Delta) : n = 0, 1, \ldots)$ denote the $\Delta$-skeleton of $X$.

Proposition 1.

Under Assumptions 1–3, $X^\Delta$ is $\varphi$-irreducible for any $\Delta > 0$, where $\varphi$ is the Lebesgue measure on $\mathcal{X}$.

The proof of Proposition 1 relies on the following result, which is of interest in its own right. It reduces irreducibility of a jump-diffusion process to that of the associated diffusion process.

Lemma 1.

Suppose that $X$ satisfies the SDE (1). (Here, we do not restrict its coefficients $\mu$, $\sigma$, $\Lambda$ to follow the affine form (2).) Let $\tilde{X} = (\tilde{X}(t) : t \ge 0)$ satisfy

$$\mathrm{d}\tilde{X}(t) = \mu(\tilde{X}(t))\,\mathrm{d}t + \sigma(\tilde{X}(t))\,\mathrm{d}W(t), \qquad \tilde{X}(0) = x \in \mathcal{X}, \tag{13}$$

where $W$ is the $d$-dimensional Wiener process in (1). If $\tilde{X}^\Delta$ (resp., $\tilde{X}$) is $\varphi$-irreducible, then $X^\Delta$ (resp., $X$) is $\varphi$-irreducible.

Proof. Consider a measurable $K \subseteq \mathcal{X}$ and let $\tau^*$ denote the first jump time of $X$. Then $\mathbb{P}_x(X(t) = \tilde{X}(t)) = 1$ for $t < \tau^*$. It follows that for any $t > 0$,

$$\begin{aligned}
\mathbb{P}_x(X(t) \in K, \tau^* > t) &= \mathbb{E}_x\Big[\mathbb{E}\big(I(\tilde{X}(t) \in K, \tau^* > t) \,\big|\, \tilde{X}(s), 0 \le s \le t\big)\Big] \\
&= \mathbb{E}_x\Big[I(\tilde{X}(t) \in K)\,\mathbb{P}\big(\tau^* > t \,\big|\, \tilde{X}(s), 0 \le s \le t\big)\Big] \\
&= \mathbb{E}_x\Big[I(\tilde{X}(t) \in K)\,e^{-\int_0^t \Lambda(\tilde{X}(s))\,\mathrm{d}s}\Big].
\end{aligned}$$

Hence, $\mathbb{P}_x(X(t) \in K, \tau^* > t) = 0$ if and only if $\mathbb{P}_x(\tilde{X}(t) \in K) = 0$ for any $t > 0$. It is then clear that the $\varphi$-irreducibility of $\tilde{X}^\Delta$ (resp., $\tilde{X}$) implies that of $X^\Delta$ (resp., $X$).

Proof of Proposition 1.

The key in the proof is to convert the AJD, by a linear transformation used in Filipović and Mayerhofer (2009), into a canonical representation in which the matrices involved are of special form. Specifically, note that if $X$ satisfies the SDE (1) with coefficients (2), then for any nonsingular matrix $A \in \mathbb{R}^{d \times d}$, the linear transformation $Y = AX$ satisfies

$$\mathrm{d}Y(t) = \big(Ab + A\beta A^{-1} Y(t)\big)\,\mathrm{d}t + A\sigma\big(A^{-1} Y(t)\big)\,\mathrm{d}W(t) + \int_{\mathbb{R}^d} Az\,N(\mathrm{d}t, \mathrm{d}z), \qquad Y(0) = Ax, \tag{14}$$

where $N(\mathrm{d}t, \mathrm{d}z)$ has the compensator measure $\Lambda(A^{-1} Y(t-))\,\mathrm{d}t\,\nu(\mathrm{d}z)$. So the drift, diffusion matrix, and intensity of SDE (14) are $Ab + A\beta A^{-1} y$, $A\sigma(A^{-1}y)\sigma(A^{-1}y)^\intercal A^\intercal$, and $\lambda + \kappa^\intercal A^{-1} y$, respectively, which are all affine in $y$. Consequently, the existence and uniqueness of a strong solution to (1) is invariant with respect to nonsingular linear transformations.

Since $\alpha_{i,ii} > 0$ for $i = 1, \ldots, m$, it follows from Lemma 7.1 of Filipović and Mayerhofer (2009) that there exists a nonsingular matrix $A \in \mathbb{R}^{d \times d}$ that maps $\mathbb{R}^m_+ \times \mathbb{R}^{d-m}$ to itself and renders the transformed diffusion matrix in the following block-diagonal form

$$A\sigma\big(A^{-1}y\big)\sigma\big(A^{-1}y\big)^\intercal A^\intercal = \begin{pmatrix} \mathrm{diag}(\alpha_{1,11} y_1, \ldots, \alpha_{m,mm} y_m) & 0 \\ 0 & h + \sum_{i=1}^m y_i \eta_i \end{pmatrix}$$

for some $(d - m) \times (d - m)$ matrices $h \succeq 0$ and $\eta_i \succeq 0$, $i = 1, \ldots, m$. In particular, $A$ is of the form

$$A = \begin{pmatrix} I_m & 0 \\ D & I_{d-m} \end{pmatrix}$$

for some $(d - m) \times m$ matrix $D$, where $I_m$ and $I_{d-m}$ are identity matrices. Moreover, it is straightforward to verify that $Ab$, $A\beta A^{-1}$, and $\kappa^\intercal A^{-1}$ satisfy both Assumption 2 and Assumption 3 in lieu of $b$, $\beta$, and $\kappa$. Hence, we can assume without loss of generality that the diffusion matrix of (1) has the form

$$\sigma(x)\sigma(x)^\intercal = \begin{pmatrix} \mathrm{diag}(\alpha_{1,11} x_1, \ldots, \alpha_{m,mm} x_m) & 0 \\ 0 & a_{\mathcal{J}\mathcal{J}} + \sum_{i=1}^m x_i \alpha_{i,\mathcal{J}\mathcal{J}} \end{pmatrix}. \tag{15}$$

Hence, $\tilde{X}_{\mathcal{I}}(t)$ satisfies

$$\mathrm{d}\tilde{X}_{\mathcal{I}}(t) = \big(b_{\mathcal{I}} + \beta_{\mathcal{I}\mathcal{I}} \tilde{X}_{\mathcal{I}}(t)\big)\,\mathrm{d}t + \mathrm{diag}\Big(\sqrt{\alpha_{1,11} \tilde{X}_1(t)}, \ldots, \sqrt{\alpha_{m,mm} \tilde{X}_m(t)}\Big)\,\mathrm{d}W_{\mathcal{I}}(t), \qquad \tilde{X}_{\mathcal{I}}(0) = x_{\mathcal{I}} \in \mathbb{R}^m_+.$$

Since $2b_i > \alpha_{i,ii}$, $i = 1, \ldots, m$, we can directly verify the conditions of the theorem on p.388 of Duffie and Kan (1996) to conclude that $0 \in \mathbb{R}^m_+$ is not attainable in finite time, i.e. $\tilde{X}_i(t) > 0$ for all $t > 0$ and $i = 1, \ldots, m$, if $\tilde{X}_i(0) > 0$ for all $i = 1, \ldots, m$.

We now consider a bijective transformation $\tilde{Y} := f(\tilde{X})$, where $f : \mathcal{X} \to \mathcal{X}$ is defined as follows: $f_i(x) = 2\sqrt{x_i}$ for $i = 1, \ldots, m$ and $f_i(x) = x_i$ for $i = m+1, \ldots, d$. Then,

$$\frac{\partial f_i(x)}{\partial x_j} = \begin{cases} x_i^{-1/2}, & \text{if } i = j,\ i = 1, \ldots, m, \\ 1, & \text{if } i = j,\ i = m+1, \ldots, d, \\ 0, & \text{otherwise,} \end{cases} \qquad \text{and} \qquad \frac{\partial^2 f_i(x)}{\partial x_k \partial x_l} = \begin{cases} -\frac{1}{2} x_i^{-3/2}, & \text{if } i = k = l,\ i = 1, \ldots, m, \\ 0, & \text{otherwise.} \end{cases}$$

It follows that, by Itô's formula,

$$\mathrm{d}f_i(\tilde{X}(t)) = \zeta_i(\tilde{X}(t))\,\mathrm{d}t + \nabla f_i(\tilde{X}(t))^\intercal \sigma(\tilde{X}(t))\,\mathrm{d}W(t),$$

for $i = 1, \ldots, d$, where

$$\zeta_i(x) = \frac{\partial f_i(x)}{\partial x_i} \mu_i(x) + \frac{1}{2} \frac{\partial^2 f_i(x)}{\partial x_i^2} \big(\sigma(x)\sigma(x)^\intercal\big)_{ii}.$$

Note that we have shown that $x_i > 0$, $i = 1, \ldots, m$, for $x \in \mathcal{X}$, so the function $\zeta(x)$ is well-defined for all $x \in \mathcal{X}$. Let $f^{-1}$ denote the inverse mapping of $f$, i.e. $f_i^{-1}(y) = y_i^2/4$ for $i = 1, \ldots, m$ and $f_i^{-1}(y) = y_i$ for $i = m+1, \ldots, d$. Then,

$$\mathrm{d}\tilde{Y}(t) = \zeta\big(f^{-1}(\tilde{Y}(t))\big)\,\mathrm{d}t + \nabla f\big(f^{-1}(\tilde{Y}(t))\big)\,\sigma\big(f^{-1}(\tilde{Y}(t))\big)\,\mathrm{d}W(t), \tag{16}$$

where $\nabla f := (\partial f_i / \partial x_j)_{1 \le i,j \le d}$ is the Jacobian matrix of $f$. A straightforward calculation reveals that the diffusion matrix of (16) is

$$\nabla f(f^{-1}(y))\,\sigma(f^{-1}(y))\,\sigma(f^{-1}(y))^\intercal\,\nabla f(f^{-1}(y))^\intercal = \begin{pmatrix} \mathrm{diag}(\alpha_{1,11}, \ldots, \alpha_{m,mm}) & 0 \\ 0 & a_{\mathcal{J}\mathcal{J}} + \sum_{i=1}^m \frac{y_i^2}{4} \alpha_{i,\mathcal{J}\mathcal{J}} \end{pmatrix}.$$

Hence, in light of the assumption that $\alpha_{i,ii} > 0$ for $i = 1, \ldots, m$ and $a_{\mathcal{J}\mathcal{J}} \succ 0$, the diffusion matrix of (16) is uniformly elliptic. It is well known that such diffusion processes admit a positive probability density; see, e.g., Theorem 3.3.4 of Davies (1989). Since the mapping $f$ is bijective, we conclude that $\tilde{X}$ also admits a positive transition density, so $\tilde{X}^\Delta$ is $\varphi$-irreducible. This completes the proof in light of Lemma 1.

Proposition 2.

Under Assumptions 1 and 2, $X$ is a stochastically continuous affine process.

Proof. For $T > 0$ and $u \in i\mathbb{R}^d$, define

$$M(t) := e^{\phi(T-t,u) + \psi(T-t,u)^\intercal X(t)},$$

where $\phi : \mathbb{R}_+ \times i\mathbb{R}^d \to \mathbb{C}$ and $\psi : \mathbb{R}_+ \times i\mathbb{R}^d \to \mathbb{C}^d$ are functions that are differentiable with respect to $t$. Applying Itô's formula,

$$\begin{aligned}
M(t) ={}& M(0) + \int_0^t M(s-)\,\psi(T-s,u)^\intercal \sigma(X(s))\,\mathrm{d}W(s) + \int_0^t M(s-) \int_{\mathcal{X}} \big(e^{\psi(T-s,u)^\intercal z} - 1\big)\,N(\mathrm{d}s, \mathrm{d}z) \\
&+ \int_0^t M(s-) \big[-\partial_t \phi(T-s,u) - \partial_t \psi(T-s,u)^\intercal X(s) + \psi(T-s,u)^\intercal \mu(X(s-))\big]\,\mathrm{d}s \\
&+ \frac{1}{2} \int_0^t M(s-)\,\psi(T-s,u)^\intercal \sigma(X(s-))\sigma(X(s-))^\intercal \psi(T-s,u)\,\mathrm{d}s \\
={}& M(0) + \int_0^t M(s-)\,\psi(T-s,u)^\intercal \sigma(X(s))\,\mathrm{d}W(s) + \int_0^t M(s-) \int_{\mathcal{X}} \big(e^{\psi(T-s,u)^\intercal z} - 1\big)\,\tilde{N}(\mathrm{d}s, \mathrm{d}z) \\
&+ \int_0^t M(s-) \Big[-\partial_t \phi(T-s,u) + \psi(T-s,u)^\intercal b + \frac{1}{2} \psi(T-s,u)^\intercal a\,\psi(T-s,u)\Big]\,\mathrm{d}s \\
&+ \int_0^t M(s-) \Big[-\partial_t \psi(T-s,u)^\intercal X(s-) + \psi(T-s,u)^\intercal \beta X(s-) + \frac{1}{2} \sum_{i=1}^d \psi(T-s,u)^\intercal \alpha_i\,\psi(T-s,u)\,X_i(s-)\Big]\,\mathrm{d}s \\
&+ \int_0^t M(s-)\,(\lambda + \kappa^\intercal X(s-)) \int_{\mathcal{X}} \big(e^{\psi(T-s,u)^\intercal z} - 1\big)\,\nu(\mathrm{d}z)\,\mathrm{d}s.
\end{aligned}$$

Hence, if $\phi$ and $\psi$ satisfy the following generalized Riccati equations

$$\begin{aligned}
\partial_t \phi(t,u) &= \psi(t,u)^\intercal b + \frac{1}{2} \psi(t,u)^\intercal a\,\psi(t,u) + \lambda \int_{\mathcal{X}} \big(e^{\psi(t,u)^\intercal z} - 1\big)\,\nu(\mathrm{d}z), \\
\partial_t \psi_i(t,u) &= \psi(t,u)^\intercal \beta_i + \frac{1}{2} \psi(t,u)^\intercal \alpha_i\,\psi(t,u) + \kappa_i \int_{\mathcal{X}} \big(e^{\psi(t,u)^\intercal z} - 1\big)\,\nu(\mathrm{d}z), \qquad i = 1, \ldots, d,
\end{aligned}$$

with $\phi(0,u) = 0$ and $\psi(0,u) = u$, where $\beta_i$ is the $i$-th column of $\beta$, then

$$M(t) = M(0) + \int_0^t M(s-)\,\psi(T-s,u)^\intercal \sigma(X(s))\,\mathrm{d}W(s) + \int_0^t M(s-) \int_{\mathcal{X}} \big(e^{\psi(T-s,u)^\intercal z} - 1\big)\,\tilde{N}(\mathrm{d}s, \mathrm{d}z). \tag{17}$$

It follows from Proposition 6.1 and Proposition 6.4 of Duffie et al. (2003) that under Assumption 2, the preceding generalized Riccati equations have a unique solution $(\phi(\cdot,u), \psi(\cdot,u)) : \mathbb{R}_+ \to \mathbb{C}_- \times \mathbb{C}^m_- \times i\mathbb{R}^{d-m}$ for all $u \in \mathbb{C}^m_- \times i\mathbb{R}^{d-m}$, where $\mathbb{C}^m_- = \{z \in \mathbb{C}^m \mid \mathrm{Re}(z) \in \mathbb{R}^m_-\}$. Hence,

$$\phi(t,u) + \psi(t,u)^\intercal x \in \mathbb{C}_-, \qquad x \in \mathcal{X}, \tag{18}$$

under Assumption 1. Further, Proposition 7.4 of Duffie et al. (2003) asserts that

$$\begin{aligned}
\phi(t+s,u) &= \phi(t,u) + \phi(s, \psi(t,u)), \\
\psi(t+s,u) &= \psi(t, \psi(s,u)),
\end{aligned} \tag{19}$$

for all $t, s \in \mathbb{R}_+$ and $u \in \mathbb{C}^m_- \times i\mathbb{R}^{d-m}$.

In light of (17) and (18), $(M(t) : 0 \le t \le T)$ is a local martingale with $|M(t)| \le 1$ for all $t$, thereby a martingale. So

$$\mathbb{E}_x\big[e^{u^\intercal X(T)}\big] = \mathbb{E}_x[M(T)] = \mathbb{E}_x[M(0)] = e^{\phi(T,u) + \psi(T,u)^\intercal x}, \tag{20}$$

namely the characteristic function $\mathbb{E}_x[e^{u^\intercal X(t)}]$ is exponential-affine in $x$. In addition, it is easy to verify via (19) and (20) the Chapman–Kolmogorov equation

$$\mathbb{P}_x(X(t+s) \in \cdot) = \int_{\mathcal{X}} \mathbb{P}_x(X(t) \in \mathrm{d}y)\,\mathbb{P}_y(X(s) \in \cdot),$$

implying that $X$ is a time-homogeneous Markov process, thereby an affine process by (20).

At last, $\mathbb{E}_x[e^{u^\intercal X(t)}]$ is clearly continuous in $t$ by (20), indicating that $X$ is stochastically continuous.

Proof of Theorem 1(i).

We first show that $X$ is Harris recurrent. Theorem 1.1 of Meyn and Tweedie (1993a) asserts that $X$ is Harris recurrent if (i) $X$ is a Borel right process (Getoor 1975, p.55), and (ii) there exists a petite set $K$ for $X$ such that $\mathbb{P}_x(\tau_K < \infty) = 1$ for all $x \in \mathcal{X}$, where $\tau_K = \inf\{t \ge 0 : X(t) \in K\}$.

For condition (i), we note that $X$ is a Feller process by Theorem 5.1 of Keller-Ressel et al. (2011), Proposition 8.2 of Duffie et al. (2003), and Proposition 2. The Feller property of $X$ trivially implies that $X$ is a Borel right process.

For condition (ii), fix an arbitrary $\Delta > 0$. Then $X^\Delta$ is a Feller chain since $X$ is a Feller process. By Theorem 3.4 of Meyn and Tweedie (1992), the Feller property of $X^\Delta$ and Proposition 1 immediately imply that all compact sets are petite for $X^\Delta$, thereby petite for $X$. In the sequel, we will show that there exists a compact set $K$ such that $\mathbb{P}_x(\tau_K < \infty) = 1$ for all $x \in \mathcal{X}$. To that end, we first establish the following Lyapunov inequality

$$\mathcal{A}g(x) \le -c_1 + c_2 I_K(x), \qquad x \in \mathcal{X}, \tag{21}$$

for some compact set $K$ and some positive finite constants $c_1$ and $c_2$, where $g(x) = \log(1 + \|x\|_H^2)$ for some $d \times d$ matrix $H \succ 0$.

Since $\beta$ is a stable matrix, there exists a $d \times d$ matrix $H \succ 0$ such that $-(H\beta + \beta^\intercal H) \succ 0$. It is straightforward to calculate the gradient and Hessian of $g(x)$ as follows

$$\nabla g(x) = \frac{2Hx}{1 + \|x\|_H^2} \qquad \text{and} \qquad \nabla^2 g(x) = \frac{2(1 + \|x\|_H^2)H - 4Hxx^\intercal H}{(1 + \|x\|_H^2)^2}.$$

Hence,

$$\mathcal{G}g(x) = \frac{2}{1 + \|x\|_H^2} \left[x^\intercal H(b + \beta x) + \frac{1}{2} \sum_{i,j=1}^d \Big(a_{ij} + \sum_{k=1}^d \alpha_{k,ij} x_k\Big) \Big(H - \frac{2Hxx^\intercal H}{1 + \|x\|_H^2}\Big)_{ij}\right]. \tag{22}$$

We note that for any $i, j = 1, \ldots, d$,

$$|(Hxx^\intercal H)_{ij}| = |(Hx)_i (Hx)_j| \le \|Hx\|^2 \le |||H|||^2 \|x\|^2 \le \underline{\delta}^{-1} |||H|||^2 \|x\|_H^2,$$

where the last inequality follows from (12). Hence,

$$\frac{|(Hxx^\intercal H)_{ij}|}{1 + \|x\|_H^2} = O(1), \tag{23}$$

as $\|x\|_H \to \infty$. Therefore, we can rewrite (22) as

$$\mathcal{G}g(x) = \frac{2x^\intercal H\beta x + O(\|x\|_H)}{1 + \|x\|_H^2} = \frac{2x^\intercal H\beta x}{1 + \|x\|_H^2} + o(1),$$

as $\|x\|_H \to \infty$. Moreover, by virtue of (12) and the fact that $-(H\beta + \beta^\intercal H) \succ 0$,

$$-2x^\intercal H\beta x = -x^\intercal (H\beta + \beta^\intercal H) x \ge \underline{\gamma}\,\|x\|^2 \ge \underline{\gamma}\,\bar{\delta}^{-1} \|x\|_H^2,$$

where $\underline{\gamma} > 0$ is the smallest eigenvalue of $-(H\beta + \beta^\intercal H)$. Therefore,

$$\limsup_{\|x\|_H \to \infty} \mathcal{G}g(x) = \limsup_{\|x\|_H \to \infty} \frac{2x^\intercal H\beta x}{1 + \|x\|_H^2} \le -\underline{\gamma}\,\bar{\delta}^{-1}. \tag{24}$$

On the other hand, it is easy to see that $1 + (\|x\|_H + \|z\|_H)^2 \le 2(1 + \|x\|_H^2)(1 + \|z\|_H^2)$ for all $x, z \in \mathbb{R}^d$. Thus,

$$\log\left(\frac{1 + \|x + z\|_H^2}{1 + \|x\|_H^2}\right) \le \log\left(\frac{1 + (\|x\|_H + \|z\|_H)^2}{1 + \|x\|_H^2}\right) \le \log\big(2(1 + \|z\|_H^2)\big). \tag{25}$$

It is easy to see that $\log(2(1 + \|z\|_H^2))$ is integrable on $\mathcal{X}$ with respect to $\nu$, since $\mathbb{E}\log(1 + \|Z\|_H) < \infty$ if and only if $\mathbb{E}\log(1 + \|Z\|) < \infty$ in light of (12). Then, we move the left-hand side of (25) to the right-hand side and apply Fatou's lemma to obtain

$$\limsup_{\|x\|_H \to \infty} \int_{\mathcal{X}} \log\left(\frac{1 + \|x + z\|_H^2}{1 + \|x\|_H^2}\right) \nu(\mathrm{d}z) \le \int_{\mathcal{X}} \limsup_{\|x\|_H \to \infty} \log\left(\frac{1 + \|x + z\|_H^2}{1 + \|x\|_H^2}\right) \nu(\mathrm{d}z) = 0.$$

Since $\kappa = 0$,

$$\limsup_{\|x\|_H \to \infty} \mathcal{L}g(x) = \limsup_{\|x\|_H \to \infty} \lambda \int_{\mathcal{X}} \log\left(\frac{1 + \|x + z\|_H^2}{1 + \|x\|_H^2}\right) \nu(\mathrm{d}z) \le 0. \tag{26}$$

We then conclude from (24) and (26) that there exists $k > 0$ such that

$$\mathcal{A}g(x) = \mathcal{G}g(x) + \mathcal{L}g(x) \le -\tfrac{1}{2}\,\underline{\gamma}\,\bar{\delta}^{-1},$$

for all $x \in \mathcal{X}$ with $\|x\|_H > k$. Then, it is easy to check that the inequality (21) holds by setting $K = \{x \in \mathcal{X} : \|x\|_H \le k\}$, $c_1 = \underline{\gamma}\,\bar{\delta}^{-1}/2$, and $c_2 = \max\{1, \sup_{x \in K}(\mathcal{A}g(x) + c_1)\}$.

We are now ready to show $\mathbb{P}_x(\tau_K < \infty) = 1$ for all $x \in \mathcal{X}$. Define $T_n = \inf\{t \ge 0 : |X(t)| > n\}$. It follows from (11) and (21) that

$$g(X(t \wedge T_n)) \le g(X(0)) + \int_0^{t \wedge T_n} \big[-c_1 + c_2 I_K(X(s-))\big]\,\mathrm{d}s + S_1(t \wedge T_n) + S_2(t \wedge T_n), \qquad n \ge 1. \tag{27}$$

Noting that $|X(t-)| \le n$ is bounded for $t \in [0, T_n)$, $(S_i(t \wedge T_n) : t \ge 0)$ is a martingale, $i = 1, 2$. Hence, by optional sampling,

$$\mathbb{E}_x[g(X(t \wedge \tau_K \wedge T_n))] \le g(x) - c_1 \mathbb{E}_x(t \wedge \tau_K \wedge T_n), \qquad x \in \mathcal{X} \setminus K,\ n \ge 1.$$

Therefore,

$$c_1 \mathbb{E}_x(t \wedge \tau_K \wedge T_n) \le g(x), \qquad x \in \mathcal{X} \setminus K,\ n \ge 1,$$

since $g(x) \ge 0$ for all $x \in \mathcal{X}$. Note that $X$ is non-explosive, so $T_n \to \infty$ as $n \to \infty$ $\mathbb{P}_x$-a.s. for all $x \in \mathcal{X}$. Therefore, by sending $n \to \infty$ and then sending $t \to \infty$, we conclude from the monotone convergence theorem that $c_1 \mathbb{E}_x(\tau_K) \le g(x)$ for $x \in \mathcal{X} \setminus K$. Hence, $\mathbb{P}_x(\tau_K < \infty) = 1$ for all $x \in \mathcal{X}$. Consequently, $X$ is Harris recurrent by Theorem 1.1 of Meyn and Tweedie (1993a).

Theorem 1.2 of Meyn and Tweedie (1993a) states that given the Harris recurrence, $X$ is positive Harris recurrent if $\sup_{x \in K} \mathbb{E}_x(\tau_K(\Delta)) < \infty$. We now show this is indeed the case. For any $\Delta > 0$, let $\tau_K(\Delta) := \Delta + \Theta_\Delta \circ \tau_K$ be the first hitting time on $K$ after $\Delta$, where $\Theta_\Delta$ is the shift operator; see Sharpe (1988, p.8). Then,

$$\mathbb{E}_x(\tau_K(\Delta) - \Delta) = \int_{\mathcal{X}} \mathbb{P}_x(X(\Delta) \in \mathrm{d}y)\,\mathbb{E}_y(\tau_K) \le \int_{\mathcal{X}} c_1^{-1} g(y)\,\mathbb{P}_x(X(\Delta) \in \mathrm{d}y) = c_1^{-1} \mathbb{E}_x g(X(\Delta)), \tag{28}$$

for all $x \in \mathcal{X}$. In addition, it follows from (27) that

$$\mathbb{E}_x g(X(\Delta \wedge T_n)) \le g(x) + (c_2 - c_1)\,\mathbb{E}_x(\Delta \wedge T_n), \qquad x \in \mathcal{X},\ n \ge 1.$$

By Fatou's lemma,

$$\mathbb{E}_x g(X(\Delta)) \le \liminf_{n \to \infty} \mathbb{E}_x g(X(\Delta \wedge T_n)) \le g(x) + (c_2 - c_1)\Delta, \qquad x \in \mathcal{X}. \tag{29}$$

Combining (28) and (29) yields that

$$\mathbb{E}_x(\tau_K(\Delta)) \le c_1^{-1}\big(g(x) + c_2 \Delta\big), \qquad x \in \mathcal{X}.$$

Hence, $\sup_{x \in K} \mathbb{E}_x(\tau_K(\Delta)) < \infty$, which implies that $X$ is positive Harris recurrent by Theorem 1.2 of Meyn and Tweedie (1993a).

Finally, Theorem 6.1 of Meyn and Tweedie (1993b) asserts that if $X^\Delta$ is $\varphi$-irreducible, which is true by Proposition 1, then a positive Harris recurrent process is ergodic, i.e. (3) holds.

Proof of Theorem 2(i).

Following the proof of Theorem 1(i), it suffices to show that the Lyapunov inequality (21) holds under the assumptions of Theorem 2(i). In fact, we prove the following stronger result

$$\mathcal{A}g(x) \le -c_1 g(x) + c_2 I_K(x), \qquad x \in \mathcal{X}, \tag{30}$$

for some compact set $K$ and some positive finite constants $c_1$ and $c_2$, where $g(x) = (1 + \|x\|_H^2)^{p/2}$ for some $d \times d$ matrix $H \succ 0$ and $p \ge 1$.

Since $\mathbb{E}\|Z\| < \infty$, there exists $p \ge 1$ such that $\mathbb{E}\|Z\|^p < \infty$. Since $\beta + \mathbb{E}(Z)\kappa^\intercal$ is stable, there exists a matrix $H \succ 0$ such that

$$-\big[H(\beta + \mathbb{E}(Z)\kappa^\intercal) + (\beta + \mathbb{E}(Z)\kappa^\intercal)^\intercal H\big] \succ 0. \tag{31}$$

It is straightforward to calculate the gradient and Hessian of $g(\cdot)$ as follows

$$\nabla g(x) = \frac{p\,g(x)}{1 + \|x\|_H^2}\,Hx \qquad \text{and} \qquad \nabla^2 g(x) = \frac{p\,g(x)}{1 + \|x\|_H^2} \left[H + \frac{(p-2)Hxx^\intercal H}{1 + \|x\|_H^2}\right].$$

It then follows from (23) that as $\|x\|_H \to \infty$,

$$\begin{aligned}
\mathcal{G}g(x) &= \frac{p\,g(x)}{1 + \|x\|_H^2} \left[x^\intercal H(b + \beta x) + \frac{1}{2} \sum_{i,j=1}^d \Big(a_{ij} + \sum_{k=1}^d \alpha_{k,ij} x_k\Big) \Big(H + \frac{(p-2)Hxx^\intercal H}{1 + \|x\|_H^2}\Big)_{ij}\right] \\
&= p\,g(x) \left(\frac{x^\intercal H\beta x}{\|x\|_H^2} + o(1)\right).
\end{aligned} \tag{32}$$

To analyze the asymptotic behavior of $\mathcal{L}g(x)$, we apply the mean value theorem, namely

$$g(x + z) - g(x) = \nabla g(\xi)^\intercal z = p\,(1 + \|\xi\|_H^2)^{p/2 - 1}\,\xi^\intercal Hz,$$

where $\xi = x + uz$ for some $u \in (0, 1)$. So $\|\xi\|_H$ lies between $\|x\|_H$ and $\|x + z\|_H$, and $\xi^\intercal Hz\,\kappa^\intercal x$ lies between $x^\intercal Hz\,\kappa^\intercal x$ and $(x + z)^\intercal Hz\,\kappa^\intercal x$. It then follows that

$$\frac{\kappa^\intercal x\,(g(x + z) - g(x))}{g(x)} = p \cdot \frac{(1 + \|\xi\|_H^2)^{p/2 - 1}}{(1 + \|x\|_H^2)^{p/2}} \cdot \xi^\intercal Hz\,\kappa^\intercal x \sim p \cdot \frac{x^\intercal Hz\,\kappa^\intercal x}{\|x\|_H^2} \tag{33}$$

as $\|x\|_H \to \infty$ for all $z \in \mathbb{R}^d$. (Here, we use the notation that $f(x) \sim g(x)$ if $\lim_{\|x\|_H \to \infty} f(x)/g(x) = 1$.) Moreover,

$$\begin{aligned}
|g(x + z) - g(x)| &= p\,(1 + \|\xi\|_H^2)^{p/2 - 1}\,|z^\intercal H\xi| \\
&\le p\,(1 + \|\xi\|_H^2)^{p/2 - 1}\,\|z\|\,\|H\xi\| \\
&\le p\,\underline{\delta}^{-1} (1 + \|\xi\|_H^2)^{p/2 - 1}\,\|z\|_H\,|||H|||_H\,\|\xi\|_H \\
&\le p\,\underline{\delta}^{-1} (1 + \|\xi\|_H^2)^{p/2 - 1/2}\,\|z\|_H\,|||H|||_H,
\end{aligned} \tag{34}$$

where the second inequality follows from (12). So

$$\begin{aligned}
\left|\frac{\kappa^\intercal x\,(g(x + z) - g(x))}{g(x)}\right| &\le \|\kappa\|\,\|x\|\,\frac{p\,\underline{\delta}^{-1} (1 + \|\xi\|_H^2)^{p/2 - 1/2}\,\|z\|_H\,|||H|||_H}{(1 + \|x\|_H^2)^{p/2}} \\
&\le p\,\underline{\delta}^{-2}\,|||H|||_H\,\|\kappa\|_H\,\|z\|_H\,\frac{(1 + \|\xi\|_H^2)^{p/2 - 1/2}}{(1 + \|x\|_H^2)^{p/2 - 1/2}},
\end{aligned}$$

where the second inequality follows from (12). Note that

$$1 + \|\xi\|_H^2 = 1 + \|x + uz\|_H^2 \le 2(1 + \|x\|_H^2)(1 + \|uz\|_H^2) \le 2(1 + \|x\|_H^2)(1 + \|z\|_H^2),$$

so

$$\int_{\mathcal{X}} \left|\frac{\kappa^\intercal x\,(g(x + z) - g(x))}{g(x)}\right| \nu(\mathrm{d}z) \le 2^{p/2 - 1/2}\,p\,\underline{\delta}^{-2}\,|||H|||_H\,\|\kappa\|_H \int_{\mathcal{X}} (1 + \|z\|_H^2)^{p/2}\,\nu(\mathrm{d}z) < \infty. \tag{35}$$

By (33) and (35), the dominated convergence theorem dictates that

$$\kappa^\intercal x \int_{\mathcal{X}} (g(x + z) - g(x))\,\nu(\mathrm{d}z) \sim p\,g(x) \cdot \int_{\mathcal{X}} \frac{x^\intercal Hz\,\kappa^\intercal x}{\|x\|_H^2}\,\nu(\mathrm{d}z) = p\,g(x) \cdot \frac{x^\intercal H\,\mathbb{E}(Z)\kappa^\intercal x}{\|x\|_H^2},$$

and thus

$$\mathcal{L}g(x) = (\lambda + \kappa^\intercal x) \int_{\mathcal{X}} (g(x + z) - g(x))\,\nu(\mathrm{d}z) \sim p\,g(x)\,\frac{x^\intercal H\,\mathbb{E}(Z)\kappa^\intercal x}{\|x\|_H^2}, \tag{36}$$

as $\|x\|_H \to \infty$. Combining (32) and (36),

$$\mathcal{A}g(x) = \mathcal{G}g(x) + \mathcal{L}g(x) = p\,g(x) \left(\frac{x^\intercal H(\beta + \mathbb{E}(Z)\kappa^\intercal)x}{\|x\|_H^2} + o(1)\right)$$

as $\|x\|_H \to \infty$. By (31), the definition of the matrix $H$,

$$-2x^\intercal H(\beta + \mathbb{E}(Z)\kappa^\intercal)x = -x^\intercal \big[H(\beta + \mathbb{E}(Z)\kappa^\intercal) + (\beta + \mathbb{E}(Z)\kappa^\intercal)^\intercal H\big] x \ge \underline{\gamma}\,\|x\|^2 \ge \underline{\gamma}\,\bar{\delta}^{-1} \|x\|_H^2,$$

where $\underline{\gamma} > 0$ is the smallest eigenvalue of $-[H(\beta + \mathbb{E}(Z)\kappa^\intercal) + (\beta + \mathbb{E}(Z)\kappa^\intercal)^\intercal H]$. Hence, there exists $k > 0$ such that $\mathcal{A}g(x) \le -\tfrac{1}{4}\,p\,\underline{\gamma}\,\bar{\delta}^{-1} g(x)$ for all $x \in \mathcal{X}$ with $\|x\|_H > k$. Therefore, (30) holds by setting $K = \{x \in \mathcal{X} : \|x\|_H \le k\}$, $c_1 = p\,\underline{\gamma}\,\bar{\delta}^{-1}/4$, and $c_2 = \max\{1, \sup_{x \in K}(\mathcal{A}g(x) + c_1 g(x))\}$.

Proof of Theorem 1(ii).

Note that if $\mathbb{E}\|Z\|^p < \infty$ for some $p > 0$, then $\mathbb{E}\|Z\|^q < \infty$ for all $q \in (0, p]$. We assume that $p \in (0,1)$, since the case $p \ge 1$ can be handled as in the proof of Theorem 2(i). Since $\beta$ is stable, there exists a matrix $H \succ 0$ such that $-(H\beta + \beta^\top H) \succ 0$. We show that $g_q(x) = (1 + \|x\|_H^2)^{q/2}$ satisfies the inequality (30) for some compact set $K$ and some positive finite constants $c_1, c_2$. Note that
$$g_q(x+z) - g_q(x) \le \left((1 + \|x\|_H^2)^{1/2} + \|z\|_H\right)^q - (1 + \|x\|_H^2)^{q/2} \le \|z\|_H^q,$$
where the first inequality follows from the triangle inequality and the second from the subadditivity of $t \mapsto t^q$ on $[0,\infty)$ for $q \in (0,1)$. Likewise, it can be shown that $g_q(x) - g_q(x+z) \le \|z\|_H^q$. Hence, $|g_q(x+z) - g_q(x)| \le \|z\|_H^q$ and
$$\left|\int_{\mathcal{X}}(g_q(x+z) - g_q(x))\,\nu(dz)\right| \le \int_{\mathcal{X}}|g_q(x+z) - g_q(x)|\,\nu(dz) \le \mathbb{E}\|Z\|_H^q < \infty.$$
It follows that with $\kappa = 0$,
$$\mathcal{L} g_q(x) = \lambda\int_{\mathcal{X}}(g_q(x+z) - g_q(x))\,\nu(dz) = O(1),$$
as $\|x\|_H \to \infty$. Moreover, applying (32) to $g_q(x)$,
$$\mathcal{A} g_q(x) = \mathcal{G} g_q(x) + \mathcal{L} g_q(x) = q\,g_q(x)\left(\frac{x^\top H\beta x}{\|x\|_H^2} + o(1)\right),$$
as $\|x\|_H \to \infty$. By the definition of the matrix $H$,
$$-x^\top H\beta x = -\frac{1}{2}x^\top(H\beta + \beta^\top H)x \ge \frac{1}{2}\bar\gamma\|x\|^2 \ge \frac{1}{2}\bar\gamma\bar\delta^{-2}\|x\|_H^2,$$
where $\bar\gamma > 0$ is the smallest eigenvalue of $-(H\beta + \beta^\top H)$. Hence, there exists $k > 0$ such that $\mathcal{A} g_q(x) \le -\frac{1}{4}q\bar\gamma\bar\delta^{-2}g_q(x)$ for all $x \in \mathcal{X}$ with $\|x\|_H > k$. Therefore,
$$\mathcal{A} g_q(x) \le -c_1 g_q(x) + c_2\mathbb{I}_K(x), \qquad x \in \mathcal{X}, \tag{37}$$
where $K = \{x \in \mathcal{X} : \|x\|_H \le k\}$, $c_1 = q\bar\gamma\bar\delta^{-2}/4$, and $c_2 = \max\{1, \sup_{x \in K}(\mathcal{A} g_q(x) + c_1 g_q(x))\}$.

We apply Itô's formula to $e^{c_1 t}g_q(X(t))$. In particular, by (11),
$$\begin{aligned}
e^{c_1 t}g_q(X(t)) = g_q(X(0)) &+ \int_0^t e^{c_1 s}\left[c_1 g_q(X(s-)) + \mathcal{A} g_q(X(s-))\right]ds + \int_0^t e^{c_1 s}\,\nabla g_q(X(s-))^\top\sigma(X(s))\,dW(s) \\
&+ \int_0^t e^{c_1 s}\int_{\mathcal{X}}\left(g_q(X(s-) + z) - g_q(X(s-))\right)\tilde N(ds, dz).
\end{aligned}$$
Clearly, the two stochastic integrals above are both martingales up to time $T_n$, where $T_n = \inf\{t \ge 0 : \|X(t)\| > n\}$. It follows from (37) and the optional sampling theorem that
$$\mathbb{E}_x\left[e^{c_1(t \wedge T_n)}g_q(X(t \wedge T_n))\right] \le g_q(x) + \mathbb{E}_x\int_0^{t \wedge T_n} e^{c_1 s}\cdot c_2\mathbb{I}_K(X(s))\,ds \le g_q(x) + c_2 c_1^{-1}\,\mathbb{E}_x e^{c_1(t \wedge T_n)}.$$
We now apply Fatou's lemma and the monotone convergence theorem to conclude that
$$e^{c_1 t}\,\mathbb{E}_x g_q(X(t)) \le g_q(x) + c_2 c_1^{-1}\cdot\liminf_{n \to \infty}\mathbb{E}_x e^{c_1(t \wedge T_n)} = g_q(x) + c_2 c_1^{-1}e^{c_1 t}. \tag{38}$$
Then we can adopt the argument used in the proof of Theorem 6.1 of Meyn and Tweedie (1993c) to conclude that, because of (38), there exist positive finite constants $d_q$ and $\rho_q$ such that
$$\|\mathbb{P}_x(X(t) \in \cdot) - \pi(\cdot)\|_{g_q + 1} \le d_q(g_q(x) + 1)e^{-\rho_q t}, \qquad t \ge 0,\ x \in \mathcal{X}.$$
By (12), there exist positive constants $d_1$ and $d_2$ such that $d_1 \le \left|\frac{f_q(x)}{g_q(x) + 1}\right| \le d_2$ for all $x \in \mathcal{X}$. Hence,
$$\|\mathbb{P}_x(X(t) \in \cdot) - \pi(\cdot)\|_{f_q} \le c_q f_q(x)e^{-\rho_q t}, \qquad t \ge 0,\ x \in \mathcal{X},$$
where $c_q = d_q d_2/d_1$.

Proof of Theorem 2(ii).

Following the proof of Theorem 1(ii), it suffices to show that (37) holds under the present assumptions. Note that $\mathbb{E}\|Z\|^q < \infty$ for all $q \in [1, p]$ since $\mathbb{E}\|Z\|^p < \infty$. Hence, we can apply the Lyapunov inequality (30) to $g_q(x)$, which results in (37).

The key condition that we impose to establish positive Harris recurrence of $X$ is the strong mean-reversion condition, i.e., $\beta + \mathbb{E}(Z)\kappa^\top$ is a stable matrix. Indeed, this condition cannot be relaxed in general, as illustrated by the following example.

Proposition 3. Suppose that $d = 1$, $m = 1$, and Assumptions 1–3 hold. If $\mathbb{E}|Z| < \infty$ and $\beta + \mathbb{E}(Z)\kappa > 0$, then $X$ is transient.

Proof. The proof also relies on Lyapunov inequalities; see Theorem 3.3 of Stramer and Tweedie (1994). Specifically, transience follows if there exists a bounded function $g$ and a closed set $K$ such that
$$\mathcal{A} g(x) \ge 0, \qquad x \in \mathcal{X} \setminus K, \tag{39}$$
and
$$\sup_{y \in K} g(y) < g(x), \qquad x \in \mathcal{X} \setminus K. \tag{40}$$
Let $g(x) = 1 - e^{-\epsilon x}$ for some $\epsilon > 0$. Obviously, $g$ is bounded for $x \in \mathcal{X} = \mathbb{R}_+$. Then,
$$\begin{aligned}
\mathcal{A} g(x) &= (b + \beta x)g'(x) + \frac{1}{2}(a + \alpha x)g''(x) + (\lambda + \kappa x)\int_{\mathbb{R}_+}(g(x+z) - g(x))\,\nu(dz) \\
&= e^{-\epsilon x}\left[\epsilon(b + \beta x) - \frac{\epsilon^2}{2}(a + \alpha x) + (\lambda + \kappa x)\int_{\mathbb{R}_+}(1 - e^{-\epsilon z})\,\nu(dz)\right] \\
&= e^{-\epsilon x}\left[\left(\epsilon\beta - \frac{\epsilon^2}{2}\alpha + \kappa(1 - \mathbb{E}e^{-\epsilon Z})\right)x + \epsilon b - \frac{\epsilon^2}{2}a + \lambda(1 - \mathbb{E}e^{-\epsilon Z})\right].
\end{aligned}$$
Let $h(\epsilon)$ be the coefficient of $x$ in the brackets above, i.e., $h(\epsilon) := \epsilon\beta - \frac{\epsilon^2}{2}\alpha + \kappa(1 - \mathbb{E}e^{-\epsilon Z})$. Clearly, $h(0) = 0$ and $h'(0) = \beta + \kappa\mathbb{E}(Z) > 0$, yielding that $h(\epsilon) > 0$ for some $\epsilon > 0$. Fixing this $\epsilon$, we see that $\mathcal{A} g(x) \sim e^{-\epsilon x}h(\epsilon)x$ as $x \to \infty$. Hence, there exists $k > 0$ such that $\mathcal{A} g(x) > 0$ for all $x \in \mathcal{X} \setminus K$, where $K := [0, k]$, proving (39). Moreover, (40) is true since $g(x)$ is increasing in $x$.

The “boundary” case, i.e. $\beta + \mathbb{E}(Z)\kappa^\top = 0$, is more complicated, as the behavior of the process may depend on other parameters. We leave its analysis for future research.

In this section, we prove SLLNs and FCLTs for additive functionals of $X$ of the form $\int_0^t h(X(s))\,ds$ or $\sum_{i=1}^n h(X(i\Delta))$ for some function $h$. Limit theorems for both discrete-time and continuous-time Markov processes have been extensively studied in the past; see, e.g., Glynn and Meyn (1996), Kontoyiannis and Meyn (2003), Meyn and Tweedie (2009, chap.17), and references therein. In particular, positive Harris recurrence is “almost” sufficient for a LLN to hold. Conditions for FCLTs, on the other hand, often include exponential ergodicity, or Lyapunov inequalities of a form similar to (30).

Nevertheless, existing FCLTs for discrete-time Markov processes are not applicable to the skeleton chain $X^\Delta$, because they typically require one to establish a “discrete-time” version of the Lyapunov inequality of the form $\mathbb{E}_x[g(X(\Delta))] \le c\,g(x)$ for some constant $c < 1$, some function $g \ge 1$, and all $x$ off a compact set. This is awkward mathematically, given that the transition measure $\mathbb{P}_x(X(\Delta) \in \cdot)$ is not known explicitly. Our approach to establishing (8) is to first consider the scenario in which $X(0)$ follows the stationary distribution. We then apply an FCLT for stationary sequences, i.e., Theorem 3.1 of Ethier and Kurtz (1986, p.351), whose conditions can be verified as a consequence of exponential ergodicity (4). To generalize the FCLT to an arbitrary initial state, we follow an argument similar to one used in Glynn and Meyn (1996).

The asymptotic variances, $\sigma_h^2$ in (7) and $\gamma_h^2$ in (8), can be expressed in terms of the solution to a Poisson equation; see, e.g., Glynn and Meyn (1996). But they typically have no closed form in terms of the parameters $(a, \alpha_1, \ldots, \alpha_d, b, \beta, \lambda, \kappa, \nu)$ of the SDE (1). However, when $h$ is the (vector-valued) identity function, we are indeed able to analytically derive both the asymptotic mean and the asymptotic covariance matrix that appear in the corresponding FCLT (see Corollary 1), thanks to the tractable affine structure.

Proof of Theorem 1(iii) and Theorem 2(iii).

We have established positive Harris recurrence and ergodicity of $X$ in Section 3.1 under the assumptions of Theorem 1(iii) or Theorem 2(iii). So $\pi(|h|) < \infty$ for any measurable function $h : \mathcal{X} \to \mathbb{R}$ with $\|h\|_{f_p} < \infty$. The SLLN (5) then follows from Theorem 2 of Sigman (1990).

For the skeleton chain $X^\Delta$, note that the stationary distribution $\pi$ of $X$ is necessarily invariant for $X^\Delta$. In addition, $X^\Delta$ is $\varphi$-irreducible by Proposition 1, so $X^\Delta$ is positive Harris recurrent. Hence, the SLLN (6) follows from Theorem 17.1.7 of Meyn and Tweedie (2009, p.427).

Proof of Theorem 1(iv) and Theorem 2(iv).

Fix $q > 2$ and $h : \mathcal{X} \to \mathbb{R}$ with $\|h^q\|_{f_p} < \infty$. We have shown in Section 3.2 that there exist a matrix $H \succ 0$, a compact set $K$ and positive finite constants $c_1, c_2$ such that $\mathcal{A} g(x) \le -c_1 g(x) + c_2\mathbb{I}_K(x)$ for all $x \in \mathcal{X}$, where $g(x) = (1 + \|x\|_H^2)^{p/2}$. Thanks to (12), $\|h\|_{f_p} < \infty$ if and only if $\|h\|_g < \infty$. Moreover, we have shown in Section 3.1 that $K$ is a petite set for $X$. It then follows immediately from Theorem 4.4 of Glynn and Meyn (1996) that (7) holds as $n \to \infty$ $\mathbb{P}_x$-weakly in $D[0,1]$ for all $x \in \mathcal{X}$.

We now show that (8) holds $\mathbb{P}_\pi$-weakly in $D[0,1]$, where $\pi$ is the stationary distribution of $X$. This can be done by applying an FCLT for stationary sequences to $\{\bar h(X(n\Delta)) : n = 0, 1, \ldots\}$, which is a mean-zero stationary sequence if $X(0) \sim \pi$, where $\bar h(x) := h(x) - \pi(h)$.

Specifically, let $\mathcal{F}_k$ and $\mathcal{F}^k$ denote the $\sigma$-algebras generated by $(X(n\Delta) : n \le k)$ and $(X(n\Delta) : n \ge k)$, respectively. Let $\varphi(l) := \sup_{\Gamma \in \mathcal{F}^{k+l}}\mathbb{E}_\pi|\mathbb{P}(\Gamma \mid \mathcal{F}_k) - \mathbb{P}(\Gamma)|$ denote the measure of mixing (Ethier and Kurtz 1986, p.346) of $\mathcal{F}_k$ and $\mathcal{F}^{k+l}$ associated with the $L^1$-norm. Then, by Theorem 3.1 and Remark 3.2(b) of Ethier and Kurtz (1986, p.351), it suffices to verify that for some $\epsilon > 0$,
$$\mathbb{E}_\pi\left[\left|\bar h(X(n\Delta))\right|^{2+\epsilon}\right] < \infty \quad\text{and}\quad \sum_{l=0}^{\infty}[\varphi(l)]^{\epsilon/(2+\epsilon)} < \infty. \tag{41}$$
Let $\epsilon = q - 2 > 0$. Then,
$$\mathbb{E}_\pi\left[\left|\bar h(X(n\Delta))\right|^{2+\epsilon}\right] = \pi(|\bar h|^q) \le \|\bar h^q\|_{f_p}\,\pi(f_p) < \infty,$$
verifying the first condition in (41). To verify the second, note that by the Markov property, for any $\Gamma \in \mathcal{F}^{k+l}$ there exists a function $w_\Gamma$ with $|w_\Gamma(\cdot)| \le 1$ such that $\mathbb{P}[\Gamma \mid \mathcal{F}_{k+l}] = w_\Gamma(X((k+l)\Delta))$. If $X(0) \sim \pi$, then for any $\Gamma \in \mathcal{F}^{k+l}$,
$$\begin{aligned}
|\mathbb{P}(\Gamma \mid \mathcal{F}_k) - \mathbb{P}(\Gamma)| &= \left|\mathbb{E}[w_\Gamma(X((k+l)\Delta)) \mid \mathcal{F}_k] - \mathbb{E}[w_\Gamma(X((k+l)\Delta))]\right| \\
&= \left|\int_{\mathcal{X}} w_\Gamma(y)\,\mathbb{P}_{X(k\Delta)}(X(l\Delta) \in dy) - \int_{\mathcal{X}}\int_{\mathcal{X}} w_\Gamma(y)\,\mathbb{P}_x(X((k+l)\Delta) \in dy)\,\pi(dx)\right| \\
&\le \left\|\mathbb{P}_{X(k\Delta)}(X(l\Delta) \in \cdot) - \pi(\cdot)\right\|,
\end{aligned}$$
where $\|\cdot\|$ is the total variation norm, and the inequality follows from Definition 4 and the fact that $|w_\Gamma(\cdot)| \le 1$. It follows that
$$\varphi(l) \le \mathbb{E}_\pi\left\|\mathbb{P}_{X(k\Delta)}(X(l\Delta) \in \cdot) - \pi(\cdot)\right\| \le c_p e^{-\rho_p l\Delta}\,\mathbb{E}_\pi[f_p(X(k\Delta))] = c_p\,\pi(f_p)\,e^{-\rho_p l\Delta},$$
where the second inequality holds because of Theorem 1(ii) and Theorem 2(ii). This immediately implies $\sum_{l=0}^{\infty}[\varphi(l)]^{\epsilon/(2+\epsilon)} < \infty$. Therefore, we conclude that (8) holds $\mathbb{P}_\pi$-weakly in $D[0,1]$.

It remains to show that (8) holds as $n \to \infty$ $\mathbb{P}_x$-weakly in $D[0,1]$ for all $x \in \mathcal{X}$. To that end, we first show that
$$\mathbb{P}_x\left(\lim_{n \to \infty}\sup_{0 \le t \le 1}|Y_n(t) - Y_{n,l}(t)| = 0\right) = 1, \qquad x \in \mathcal{X}, \tag{42}$$
for any positive integer $l$, where $Y_n(t) := n^{-1/2}\sum_{i=1}^{\lfloor nt \rfloor}\bar h(X(i\Delta))$ and $Y_{n,l}(t) := n^{-1/2}\sum_{i=l+1}^{\lfloor nt \rfloor + l}\bar h(X(i\Delta))$. Note that for all sufficiently large $n$,
$$\begin{aligned}
\sup_{0 \le t \le 1}|Y_n(t) - Y_{n,l}(t)| &= \frac{1}{n}\left|\sum_{i=1}^{n}\bar h(X(i\Delta)) - \sum_{i=l+1}^{n+l}\bar h(X(i\Delta))\right| = \frac{1}{n}\left|\sum_{i=1}^{l}\bar h(X(i\Delta)) - \sum_{i=n+1}^{n+l}\bar h(X(i\Delta))\right| \\
&\le \frac{1}{n}\left|\sum_{i=1}^{l}\bar h(X(i\Delta))\right| + \frac{1}{n}\left|\sum_{i=n+1}^{n+l}\bar h(X(i\Delta))\right| \to 0, \qquad \mathbb{P}_x\text{-a.s.},
\end{aligned}$$
as $n \to \infty$ for all $x \in \mathcal{X}$, because $\frac{1}{n}\sum_{i=1}^{n}\bar h(X(i\Delta)) \to \pi(\bar h) < \infty$, $\mathbb{P}_x$-a.s., as $n \to \infty$ for all $x \in \mathcal{X}$, thanks to Theorem 1(iii) and Theorem 2(iii). This completes the proof of (42).

Let $\phi$ be a bounded continuous functional on $D[0,1]$. Then, by (42), for any positive integer $l$, $|\mathbb{E}_x[\phi(Y_n)] - \mathbb{E}_x[\phi(Y_{n,l})]| \to 0$ as $n \to \infty$ for all $x \in \mathcal{X}$. By the Markov property, this limit can be rewritten as
$$\lim_{n \to \infty}\left|\mathbb{E}_x[\phi(Y_n)] - \int_{\mathcal{X}}\mathbb{P}_x(X(l\Delta) \in dy)\,\mathbb{E}_y[\phi(Y_n)]\right| = 0, \qquad x \in \mathcal{X}. \tag{43}$$
On the other hand, note that
$$\left|\int_{\mathcal{X}}\mathbb{P}_x(X(l\Delta) \in dy)\,\mathbb{E}_y[\phi(Y_n)] - \mathbb{E}_\pi[\phi(Y_n)]\right| \le \|\mathbb{P}_x(X(l\Delta) \in \cdot) - \pi(\cdot)\| \cdot \sup_{g \in D[0,1]}|\phi(g)|.$$
Since $\|\mathbb{P}_x(X(l\Delta) \in \cdot) - \pi(\cdot)\| \to 0$ as $l \to \infty$ by Theorem 1(i) and Theorem 2(i), for any $\delta > 0$ we can choose $l$ so large that
$$\left|\int_{\mathcal{X}}\mathbb{P}_x(X(l\Delta) \in dy)\,\mathbb{E}_y[\phi(Y_n)] - \mathbb{E}_\pi[\phi(Y_n)]\right| \le \delta. \tag{44}$$
It then follows from (43) and (44) that $\limsup_{n \to \infty}|\mathbb{E}_x[\phi(Y_n)] - \mathbb{E}_\pi[\phi(Y_n)]| \le \delta$. Since (8) holds $\mathbb{P}_\pi$-weakly in $D[0,1]$, $\lim_{n \to \infty}|\mathbb{E}_\pi[\phi(Y_n)] - \mathbb{E}_\pi[\phi(W)]| = 0$, and thus
$$\limsup_{n \to \infty}|\mathbb{E}_x[\phi(Y_n)] - \mathbb{E}_\pi[\phi(W)]| \le \delta.$$
Sending $\delta \to 0$ yields that (8) holds $\mathbb{P}_x$-weakly in $D[0,1]$ for all $x \in \mathcal{X}$.

Thanks to the affine structure, the asymptotic mean and the asymptotic variance can be derived analytically when $h$ is the identity function, i.e. $h(x) = x$. Note that with $h$ being $\mathbb{R}^d$-valued, the corresponding SLLN and FCLT are multivariate. The calculation follows closely the approach used in Zhang et al. (2015), so we omit the details.

Corollary 1. If Assumptions 1–3 hold and $\mathbb{E}\|Z\| < \infty$, then
$$\mathbb{P}_x\left(\lim_{t \to \infty}\frac{1}{t}\int_0^t h(X(s))\,ds = v\right) = 1, \qquad x \in \mathcal{X},$$
where $v = -(\beta + \mathbb{E}(Z)\kappa^\top)^{-1}(b + \lambda\mathbb{E}(Z))$. Furthermore, if $\mathbb{E}\|Z\|^{2+\epsilon} < \infty$ for some $\epsilon > 0$, then
$$n^{1/2}\left(\frac{1}{n}\int_0^{n\cdot}X(s)\,ds - v\right) \Rightarrow \Sigma^{1/2}W(\cdot),$$
as $n \to \infty$ $\mathbb{P}_x$-weakly in $D_{\mathbb{R}^d}[0,1]$ for all $x \in \mathcal{X}$, where
$$\Sigma = A\left(a + \lambda\mathbb{E}(ZZ^\top)\right)A^\top + \sum_{i=1}^{m} v_i\,A\left(\alpha_i + \kappa_i\mathbb{E}(ZZ^\top)\right)A^\top.$$

Acknowledgments

The first author was partially supported by the Hong Kong Research Grants Council under General Research Fund (ECS 624112). The second author gratefully acknowledges the support and the intellectual environment of the Institute for Advanced Study at the City University of Hong Kong, where this work was completed.
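As a rough numerical illustration of the strong mean-reversion condition and of the limit $v$ in Corollary 1, the following Python sketch checks whether $\beta + \mathbb{E}(Z)\kappa^\top$ is stable, constructs one matrix $H \succ 0$ satisfying (31), and evaluates $v = -(\beta + \mathbb{E}(Z)\kappa^\top)^{-1}(b + \lambda\mathbb{E}(Z))$. It is only a sketch under stated assumptions: the parameter values are hypothetical and have not been checked against Assumptions 1–3, the function names are ours, and solving the Lyapunov equation with right-hand side $-I$ is just one convenient way of obtaining a matrix $H$ for which (31) holds.

```python
import numpy as np


def mean_reversion_matrix(beta, mean_jump, kappa):
    """Return M = beta + E(Z) kappa^T, the matrix in the strong mean-reversion condition."""
    return np.asarray(beta, dtype=float) + np.outer(mean_jump, kappa)


def is_stable(M):
    """A matrix is stable when every eigenvalue has a strictly negative real part."""
    return bool(np.all(np.linalg.eigvals(M).real < 0.0))


def lyapunov_matrix(M):
    """Solve H M + M^T H = -I for H by vectorizing the linear equation."""
    d = M.shape[0]
    identity = np.eye(d)
    # Row-major vec: vec(H M) = (I kron M^T) vec(H), vec(M^T H) = (M^T kron I) vec(H).
    K = np.kron(identity, M.T) + np.kron(M.T, identity)
    H = np.linalg.solve(K, -identity.reshape(-1)).reshape(d, d)
    return 0.5 * (H + H.T)  # remove round-off asymmetry


def long_run_mean(M, b, lam, mean_jump):
    """Return v = -M^{-1} (b + lam * E(Z)), the almost sure limit in Corollary 1."""
    rhs = np.asarray(b, dtype=float) + lam * np.asarray(mean_jump, dtype=float)
    return np.linalg.solve(M, -rhs)


# Hypothetical parameter values (d = 2), chosen only for illustration.
beta = np.array([[-1.0, 0.2], [0.0, -0.5]])
kappa = np.array([0.3, 0.0])
mean_jump = np.array([0.5, 0.1])  # E(Z)
b = np.array([1.0, 0.5])
lam = 2.0

M = mean_reversion_matrix(beta, mean_jump, kappa)
print("stable:", is_stable(M))
H = lyapunov_matrix(M)
print("H positive definite:", bool(np.all(np.linalg.eigvalsh(H) > 0.0)))
print("v =", long_run_mean(M, b, lam, mean_jump))
```

When the stability check fails, the Lyapunov equation need not have a positive definite solution; in the one-dimensional setting of Proposition 3, $\beta + \mathbb{E}(Z)\kappa > 0$ corresponds to the transient case.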

References

Aït-Sahalia, Y. (2007). Estimating continuous-time models using discretely sampled data. In R. Blundell, P. Torsten, and W. K. Newey (Eds.), Advances in Economics and Econometrics, Theory and Applications, Ninth World Congress, Chapter 9. Cambridge University Press.
Aït-Sahalia, Y., J. Cacho-Diaz, and R. J. A. Laeven (2015). Modeling financial contagion using mutually exciting jump processes. J. Financ. Econ. 117(3), 585–606.
Andersen, L. B. G. and V. V. Piterbarg (2007). Moment explosions in stochastic volatility models. Financ. Stoch. 11, 29–50.
Barczy, M., L. Döring, Z. Li, and G. Pap (2014). Stationarity and ergodicity for an affine two-factor model. Adv. Appl. Probab. 46(3), 878–898.
Barndorff-Nielsen, O. E. and N. Shephard (2001). Non-Gaussian Ornstein-Uhlenbeck-based models and some of their uses in financial economics. J. R. Statist. Soc. B 63(2), 167–241.
Bates, D. S. (2006). Maximum likelihood estimation of latent affine processes. Rev. Financ. Stud. 19(3), 909–965.
Berman, A. and R. J. Plemmons (1994). Nonnegative Matrices in the Mathematical Sciences. SIAM, Philadelphia.
Cheridito, P., D. Filipović, and R. L. Kimmel (2007). Market price of risk specification for affine models: Theory and evidence. J. Financ. Econ. 83, 123–170.
Collin-Dufresne, P., R. S. Goldstein, and C. S. Jones (2008). Identification of maximal affine term structure models. J. Finance 63(2), 743–759.
Cox, J. C., J. E. Ingersoll, and S. A. Ross (1985). A theory of the term structure of interest rates. Econometrica 53(2), 385–407.
Dai, Q. and K. J. Singleton (2000). Specification analysis of affine term structure models. J. Finance 55, 1943–1978.
Davies, E. B. (1989). Heat Kernels and Spectral Theory, Volume 92 of Cambridge Tracts in Mathematics. Cambridge University Press.
Dawson, D. A. and Z. Li (2006). Skew convolution semigroups and affine Markov processes. Ann. Probab. 34(3), 1103–1142.
Duffie, D., D. Filipović, and W. Schachermayer (2003). Affine processes and applications in finance. Ann. Appl. Probab. 13(3), 984–1053.
Duffie, D. and P. W. Glynn (2004). Estimation of continuous-time Markov processes sampled at random time intervals. Econometrica 72(6), 1773–1808.
Duffie, D. and R. Kan (1996). A yield-factor model of interest rates. Math. Finance 6, 379–406.
Duffie, D., J. Pan, and K. J. Singleton (2000). Transform analysis and asset pricing for affine jump-diffusions. Econometrica 68(6), 1343–1376.
Errais, E., K. Giesecke, and L. R. Goldberg (2010). Affine point processes and portfolio credit risk. SIAM J. Finan. Math. 1, 642–665.
Ethier, S. N. and T. G. Kurtz (1986). Markov Processes: Characterization and Convergence. John Wiley & Sons, Inc.
Filipović, D. and E. Mayerhofer (2009). Affine diffusion processes: Theory and applications. In H. Albrecher, W. Runggaldier, and W. Schachermayer (Eds.), Radon Ser. Comput. Appl. Math., Volume 8, pp. 1–40.
Filipović, D., E. Mayerhofer, and P. Schneider (2013). Density approximations for multivariate affine jump-diffusion processes. J. Econometrics 176, 93–111.
Gao, X., X. Zhou, and L. Zhu (2018). Transform analysis for Hawkes processes with applications in dark pool trading. Quant. Finance 18(2), 265–282.
Getoor, R. K. (1975). Markov Processes: Ray Processes and Right Processes. Lecture Notes in Mathematics. Springer-Verlag Berlin Heidelberg.
Glasserman, P. and K.-K. Kim (2010). Moment explosions and stationary distributions in affine diffusion models. Math. Finance 20(1), 1–33.
Glynn, P. W. and S. P. Meyn (1996). A Liapounov bound for solutions of the Poisson equation. Ann. Probab. 24(2), 916–931.
Hansen, L. P. (1982). Large sample properties of generalized method of moments estimators. Econometrica 50, 1029–1054.
Hansen, L. P. and J. A. Scheinkman (1995). Back to the future: generating moment implications for continuous-time Markov processes. Econometrica 63(4), 767–804.
Hawkes, A. G. (1971). Spectra of some self-exciting and mutually exciting point processes. Biometrika 58, 83–90.
Heston, S. L. (1993). A closed-form solution for options with stochastic volatility with applications to bond and currency options. Rev. Financ. Stud. 6, 327–343.
Horn, R. A. and C. R. Johnson (2012). Matrix Analysis (2nd ed.). Cambridge University Press.
Jena, R. P., K.-K. Kim, and H. Xing (2012). Long-term and blow-up behaviors of exponential moments in multi-dimensional affine diffusions. Stoch. Proc. Appl. 122, 2961–2993.
Jin, P., J. Kremer, and B. Rüdiger (2017). Exponential ergodicity of an affine two-factor model based on the α-root process. Adv. Appl. Probab. 49, 1144–1169.
Jin, P., B. Rüdiger, and C. Trabelsi (2016). Positive Harris recurrence and exponential ergodicity of the basic affine jump-diffusion. Stoch. Anal. Appl. 34(1), 75–95.
Karatzas, I. and S. E. Shreve (1991). Brownian Motion and Stochastic Calculus (2nd ed.). Springer.
Keller-Ressel, M. (2011). Moment explosions and long-term behavior of affine stochastic volatility models. Math. Finance 21(1), 73–98.
Keller-Ressel, M., W. Schachermayer, and J. Teichmann (2011). Affine processes are regular. Probab. Theor. Relat. Field. 151, 591–611.
Kontoyiannis, I. and S. P. Meyn (2003). Spectral theory and limit theorems for geometrically ergodic Markov processes. Ann. Appl. Probab. 13(1), 304–362.
Lee, R. W. (2004). The moment formula for implied volatility at extreme strikes. Math. Finance 14(3), 469–480.
Masuda, H. (2004). On multidimensional Ornstein-Uhlenbeck processes driven by a general Lévy process. Bernoulli 10, 97–120.
Meyn, S. P. and R. L. Tweedie (1992). Stability of Markovian processes I: criteria for discrete-time chains. Adv. Appl. Probab. 24, 542–574.
Meyn, S. P. and R. L. Tweedie (1993a). Generalized resolvents and Harris recurrence of Markov processes. Contemporary Mathematics 149, 227–250.
Meyn, S. P. and R. L. Tweedie (1993b). Stability of Markovian processes II: continuous-time processes and sampled chains. Adv. Appl. Probab. 25, 487–517.
Meyn, S. P. and R. L. Tweedie (1993c). Stability of Markovian processes III: Foster-Lyapunov criteria for continuous-time processes. Adv. Appl. Probab. 25, 518–548.
Meyn, S. P. and R. L. Tweedie (2009). Markov Chains and Stochastic Stability (2nd ed.). Cambridge University Press.
Protter, P. E. (2003). Stochastic Integration and Differential Equations (2nd ed.). Springer.
Sato, K.-i. and M. Yamazato (1984). Operator-self-decomposable distributions as limit distributions of processes of Ornstein–Uhlenbeck type. Stoch. Proc. Appl. 17, 73–100.
Sharpe, M. (1988). General Theory of Markov Processes. Academic Press.
Sigman, K. (1990). One-dependent regenerative processes and queues in continuous time. Math. Oper. Res. 15(1), 175–189.
Singleton, K. J. (2001). Estimation of affine asset pricing models using the empirical characteristic function. J. Econometrics 102(1), 111–141.
Stramer, O. and R. L. Tweedie (1994). Stability and instability of continuous time Markov processes. In F. P. Kelly (Ed.), Probability, Statistics and Optimization: A Tribute to Peter Whittle, pp. 173–184. John Wiley & Sons, Inc.
Vasicek, O. (1977). An equilibrium characterization of the term structure. J. Financ. Econ. 5, 177–188.
Zhang, X., J. Blanchet, K. Giesecke, and P. W. Glynn (2015). Affine point processes: Approximation and efficient simulation. Math. Oper. Res. 40(4), 797–819.