Diffusion Copulas: Identification and Estimation

Abstract

We propose a new semiparametric approach for modelling nonlinear univariate diffusions, where the observed process is a nonparametric transformation of an underlying parametric diffusion (UPD). This modelling strategy yields a general class of semiparametric Markov diffusion models with parametric dynamic copulas and nonparametric marginal distributions. We provide primitive conditions for the identification of the UPD parameters together with the unknown transformations from discrete samples. Likelihood-based estimators of both parametric and nonparametric components are developed and we analyze the asymptotic properties of these. Kernel-based drift and diffusion estimators are also proposed and shown to be normally distributed in large samples. A simulation study investigates the finite sample performance of our estimators in the context of modelling US short-term interest rates. We also present a simple application of the proposed method for modelling the CBOE volatility index data.


Ruijun Bu ∗, Kaddour Hadri †, Dennis Kristensen ‡

April 2020

JEL Classification: C14, C22, C32, C58, G12

Keywords: Continuous-time model; diffusion process; copula; transformation model; identification; nonparametric; semiparametric; maximum likelihood; sieve; kernel smoothing.

∗ Department of Economics, Management School, University of Liverpool, Liverpool, UK. Email: [email protected].
† Queen's University Management School, Queen's University Belfast, Belfast, Northern Ireland, UK. Email: [email protected].
‡ Department of Economics, University College London, London, UK. Email: [email protected].

arXiv [econ.EM]

Introduction

Most financial time series have fat tails that standard parametric models are not able to generate. One forceful argument for this in the context of diffusion models was provided by Aït-Sahalia (1996b), who tested a range of parametric models against a nonparametric alternative and found that most standard models were inconsistent with observed features in data.

One popular semiparametric approach that allows for more flexibility in terms of marginal distributions, and so allows for fat tails, is to use so-called copula models, where the copula is parametric and the marginal distribution is left unspecified (nonparametric). Joe (1997) showed how bivariate parametric copulas could be used to model discrete-time stationary Markov chains with flexible, nonparametric marginal distributions. The resulting class of semiparametric models is relatively easy to estimate; see, e.g., Chen and Fan (2006). However, most parametric copulas known in the literature have been derived in a cross-sectional setting where they have been used to describe the joint dependence between two random variables with known joint distribution, e.g. a bivariate $t$-distribution. As such, existing parametric copulas may be difficult to interpret in terms of the dynamics they imply when used to model Markov processes. This in turn means that applied researchers may find it difficult to choose an appropriate copula for a given time series.

One could have hoped that copulas with a clearer dynamic interpretation could be developed by starting with an underlying parametric Markov model and then deriving its implied copula. This approach is unfortunately hindered by the fact that the stationary distributions of general Markov chains are not available in closed form and so their implied dynamic copulas are not available in closed form either.
This complicates both the theoretical analysis (such as establishing identification) and the practical implementation of such models.

An alternative approach to modelling fat tails using Markov diffusions is to specify flexible forms for the so-called drift and diffusion terms. Such non-linear features tend to generate fat tails in the marginal distribution of the process. This approach has been widely used to, for example, model short-term interest rates; see, e.g., Aït-Sahalia (1996a,b), Conley et al. (1997), Stanton (1997), Ahn and Gao (1999) and Bandi (2002). These models tend to either be heavily parameterized or involve nonparametric estimators that suffer from low precision in small and moderate samples.

We here propose a novel class of dynamic copulas that resolves the above-mentioned issues: We show how copulas can easily be generated from parametric diffusion processes. The copulas have a clear interpretation in terms of dynamics since they are constructed from an underlying dynamic continuous-time process. At the same time, a given copula-based diffusion can exhibit strong non-linearities in its drift and diffusion terms even if the underlying copula is derived from, for example, a linear model. Furthermore, primitive conditions for identification of the parameters are derived, and this despite the fact that the copulas are implicit. Finally, the models can easily be implemented in practice using existing numerical methods for parametric diffusion processes. This in turn implies that estimators are easy to compute and do not involve any smoothing parameters; this is in contrast to existing semi- and nonparametric estimators of diffusion models.

The starting point of our analysis is to show that there is a one-to-one correspondence between any given semiparametric Markov copula model and a model where we observe a nonparametric transformation of an underlying parametric Markov process.
We then restrict attention to parametric Markov diffusion processes, which we refer to as underlying parametric diffusions (UPDs). Copulas generated from a given UPD have a clear interpretation in terms of dynamic properties. In particular, standard results from the literature on diffusion models can be employed to establish mixing properties and existence of moments for a given model; see, e.g., Chen et al. (2010). Moreover, we are able to derive primitive conditions for the parameters of the copula to be identified together with the unknown transformation.

Once identification has been established, estimation of our copula diffusion models based on a discretely sampled process proceeds as in the discrete-time case. One can estimate the model using either a one-step or a two-step procedure: In the one-step procedure, the marginal distribution and the parameters of the UPD are estimated jointly by sieve maximum likelihood methods as advocated by Chen, Wu and Yi (2009). In the two-step approach, the marginal distribution is first estimated by the empirical cdf, which in turn is plugged into the likelihood function of the model. This is then maximized with respect to the parameters of the UPD. We provide an asymptotic theory for both cases by importing results from Chen, Wu and Yi (2009) and Chen and Fan (2006), respectively. In particular, we provide primitive conditions for their high-level assumptions to hold in our diffusion setting. The resulting asymptotic theory shows $\sqrt{n}$-asymptotic normality of the parametric components. Given the estimates of the parametric component, one can obtain semiparametric estimates of the drift and diffusion functions, and we also provide an asymptotic theory for these.

Our modelling strategy has parametric ascendants: Bu et al. (2011), Eraker and Wang (2015) and Forman and Sørensen (2014) considered parametric transformations of UPDs for modelling short-term interest rates, variance risk premia and molecular dynamics, respectively.
We here provide a more flexible class of models relative to theirs since we leave the transformation unspecified. At the same time, all the attractive properties of their models remain valid: The transition density of the observed process is induced by the UPD and so the estimation of copula-based diffusion models is computationally simple. Moreover, copula diffusion models can easily be employed in asset pricing applications since (conditional) moments are easily computed using the specification of the UPD. Finally, none of these papers fully addresses the identification issue and so our identification results are also helpful in their setting.

There are also similarities between our approach and the one pursued in Aït-Sahalia (1996a) and Kristensen (2010). They developed two classes of semiparametric diffusion models where either the drift or the diffusion term is specified parametrically and the remaining term is left unspecified. The remaining term is then recovered by using the triangular link between the marginal distribution, the drift and the diffusion terms that exists for stationary diffusions. In this way, the marginal distribution implicitly ties down the dynamics of the observed diffusion process. Unfortunately, it is very difficult to interpret the dynamic properties of the resulting semiparametric diffusion model. In contrast, in our setting, the UPD alone ties down the dynamics of the observed diffusion and so these are much better understood. The estimation of copula diffusions is also less computationally burdensome compared to the Pseudo Maximum Likelihood Estimator (PMLE) proposed in Kristensen (2010).

The remainder of this paper is organized as follows. Section 2 outlines our semiparametric modelling strategy. Section 3 investigates the identification issue of our model.
In Section 4, we discuss the estimators of our model while Section 5 investigates their asymptotic properties. Section 6 presents a simulation study to examine the finite sample performance of our estimators. In Section 7, we consider a simple empirical application. Some concluding remarks are given in Section 8. All proofs and lemmas are collected in the Appendices.

Consider a continuous-time process $Y = \{Y_t : t \ge 0\}$ with domain $\mathcal{Y} = (y_l, y_r)$, where $-\infty \le y_l < y_r \le +\infty$. We model $Y$ as a transformation of an underlying process $X$,

$$Y_t = V(X_t), \tag{2.1}$$

where $X = \{X_t : t \ge 0\}$ is a diffusion with domain $\mathcal{X} = (x_l, x_r)$ solving

$$dX_t = \mu_X(X_t;\theta)\,dt + \sigma_X(X_t;\theta)\,dW_t, \tag{2.2}$$

with $W$ a standard Brownian motion, so that the dynamics of $Y$ are known up to the parameter $\theta$ and the function $V$. Note here that we only observe $Y$ while $X$ remains unobserved since we leave $V$ unspecified (unknown to us). For convenience, we collect the unknown components in the structure $S \equiv (\theta, V)$.

The above class of models allows for added flexibility through the transformation $V$, which we treat as a nonparametric object that we wish to estimate together with $\theta$. By allowing for a broad nonparametric class of transformations $V$, our model is richer and more flexible compared to the fully parametric case with known or parametric specifications of $V$. In particular, as we shall see, any given member of the above class of models is able to completely match the marginal distribution of any given time series.

We will require that the underlying Markov process $X$ sampled at $i\Delta$, $i = 1, 2, \ldots$, possesses a transition density $p_X(x|x_0;\theta)$,

$$\Pr(X_\Delta \in A \mid X_0 = x_0) = \int_A p_X(x|x_0;\theta)\,dx, \quad A \subseteq \mathcal{X}. \tag{2.3}$$

Moreover, some of our results require $X$ to be recurrent, a property which can be stated in terms of the so-called scale density and scale measure. These are defined as

$$s(x;\theta) := \exp\left\{-\int_{x^*}^{x} \frac{2\mu_X(z;\theta)}{\sigma_X^2(z;\theta)}\,dz\right\} \quad \text{and} \quad S(x;\theta) := \int_{x^*}^{x} s(z;\theta)\,dz \tag{2.4}$$

for some $x^* \in \mathcal{X}$. We then impose the following:

Assumption 2.1. (i) $\mu_X(\cdot;\theta)$ and $\sigma_X^2(\cdot;\theta) > 0$ [...]; (ii) $S(x;\theta) \to -\infty$ $(+\infty)$ as $x \to x_l$ $(x_r)$; (iii) $\xi(\theta) = \int_{\mathcal{X}} \{\sigma_X^2(x;\theta)\,s(x;\theta)\}^{-1}\,dx < \infty$.

Assumption 2.2.

The transformation $V$ is strictly increasing with inverse $U = V^{-1}$, i.e., $y = V(x) \Leftrightarrow x = U(y)$, and is twice continuously differentiable.

Assumption 2.1(i) provides primitive conditions for a solution to eq. (2.2) to exist and for the transition density $p_X(x|x_0;\theta)$ to be well-defined, while Assumption 2.1(ii) implies that this solution is positive recurrent; see Bandi and Phillips (2003), Karatzas and Shreve (1991, Section 5.5) and McKean (1969, Section 5) for more details. Assumption 2.1(iii) strengthens the recurrence property to stationarity and ergodicity, in which case the stationary marginal density of $X$ takes the form

$$f_X(x;\theta) = \frac{1}{\xi(\theta)\,\sigma_X^2(x;\theta)\,s(x;\theta)}, \tag{2.5}$$

where $\xi(\theta)$ was defined in Assumption 2.1(iii). However, stationarity will not be required for all our results to hold; in particular, some of our identification results and proposed estimators do not rely on stationarity. This is in contrast to the existing literature on dynamic copula models, where stationarity is a maintained assumption.

Assumption 2.2 requires $V$ to be strictly increasing; this is a testable restriction under the remaining assumptions introduced below, which ensures identification: Suppose that $V$ is instead strictly decreasing; we then have $Y_t = \bar{V}(\bar{X}_t)$, where $\bar{V}(x) = V(-x)$ is increasing and $\bar{X}_t = -X_t$ has dynamics $p_X(-x|-x_0;\theta)$.
Assuming that the chosen UPD satisfies $p_X(-x|-x_0;\theta) \neq p_X(x|x_0;\tilde{\theta})$ for $\theta \neq \tilde{\theta}$, we can test whether $V$ is indeed decreasing or increasing.

The smoothness condition on $V$ is imposed so that we can apply Itô's Lemma to the transformation and obtain that the continuous-time dynamics of $Y$ can be written in terms of $S$ as

$$dY_t = \mu_Y(Y_t;S)\,dt + \sigma_Y(Y_t;S)\,dW_t,$$

with

$$\mu_Y(y;S) = \frac{\mu_X(U(y);\theta)}{U'(y)} - \frac{1}{2}\,\sigma_X^2(U(y);\theta)\,\frac{U''(y)}{U'(y)^3}, \tag{2.6}$$

$$\sigma_Y^2(y;S) = \frac{\sigma_X^2(U(y);\theta)}{U'(y)^2}, \tag{2.7}$$

where we have used that, with $U'(y)$ and $U''(y)$ denoting the first two derivatives of $U(y)$, $V'(U(y)) = 1/U'(y)$ and $V''(U(y)) = -U''(y)/U'(y)^3$. In particular, $Y$ is a Markov diffusion process. As can be seen from the above expressions, the dynamics of $Y$, as characterized by $\mu_Y$ and $\sigma_Y^2$, may appear quite complex, with $U$ potentially generating nonlinearities in both the drift and diffusion terms even if $\mu_X$ and $\sigma_X^2$ are linear. We demonstrate this feature in the subsequent subsection, where we present examples of simple UPDs that are able to generate non-linear shapes of $\mu_Y$ and $\sigma_Y^2$ via the non-linear transformation $V$. At the same time, if we transform $Y$ by $U$ we recover the dynamics of the UPD. As a consequence, the transition density of the discretely sampled process $Y_{i\Delta}$, $i = 0, 1, 2, \ldots$, can be expressed in terms of that of $X$ as

$$p_Y(y|y_0;S) = U'(y)\,p_X(U(y)|U(y_0);\theta), \tag{2.8}$$

using standard results for densities of invertible transformations.
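As a rough numerical illustration of eqs. (2.6)-(2.8), the sketch below takes an OU process as the UPD and assumes, purely for illustration, that $U(y) = \log y$ (so that $V = \exp$). The parameter values and function names are hypothetical and are not taken from the paper.

```python
import math

# Hypothetical OU parameters for the UPD dX = KAPPA*(ALPHA - X) dt + SIGMA dW,
# sampled at interval DELTA. These values are illustrative assumptions.
KAPPA, ALPHA, SIGMA, DELTA = 0.5, 0.0, 0.3, 1.0

def U(y):      # assumed inverse transformation U = V^{-1} = log
    return math.log(y)

def dU(y):     # first derivative U'(y)
    return 1.0 / y

def d2U(y):    # second derivative U''(y)
    return -1.0 / y ** 2

def mu_X(x):   # OU drift
    return KAPPA * (ALPHA - x)

def sigma2_X(x):  # OU squared diffusion (constant)
    return SIGMA ** 2

def mu_Y(y):
    # eq. (2.6): drift of the observed process
    return mu_X(U(y)) / dU(y) - 0.5 * sigma2_X(U(y)) * d2U(y) / dU(y) ** 3

def sigma2_Y(y):
    # eq. (2.7): squared diffusion of the observed process
    return sigma2_X(U(y)) / dU(y) ** 2

def p_X(x, x0):
    # exact Gaussian transition density of the OU process over DELTA
    mean = ALPHA + (x0 - ALPHA) * math.exp(-KAPPA * DELTA)
    var = SIGMA ** 2 * (1.0 - math.exp(-2.0 * KAPPA * DELTA)) / (2.0 * KAPPA)
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def p_Y(y, y0):
    # eq. (2.8): transition density of Y induced by the UPD
    return dU(y) * p_X(U(y), U(y0))
```

With this choice of $U$, the linear OU drift becomes $\mu_Y(y) = y\{\kappa(\alpha - \log y) + \sigma^2/2\}$ and the constant diffusion becomes $\sigma_Y^2(y) = \sigma^2 y^2$, so both terms of the observed process are non-linear in $y$.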
By similar arguments, the stationary density of $Y$ satisfies

$$f_Y(y;S) = U'(y)\,f_X(U(y);\theta), \tag{2.9}$$

which shows that any choice of UPD is able to fully adapt to any given marginal density of $Y$ due to the nonparametric nature of $U$.

The above expressions also highlight the following additional theoretical and practical advantages of our modelling strategy: First, for a given choice of $U$, we can easily compute $p_Y(y|y_0;S)$ and $f_Y(y;S)$ since computation of parametric transition densities and stationary densities of diffusion models is in general straightforward, even if they are not available in closed form. Second, $Y$ inherits all its dynamic properties from $X$; and in the modelling of $X$, we can rely on a large literature on parametric modelling of diffusion models. Formally, we have the following straightforward results adapted from Forman and Sørensen (2014).

Proposition 2.1

Suppose that Assumptions 2.1(i)–(ii) and 2.2 hold. Then the following results hold for the model (2.1)-(2.2):

1. If Assumption 2.1(iii) holds, then $X$ is stationary and ergodic and so is $Y$.
2. The mixing coefficients of $X$ and $Y$ coincide.
3. If $E[|X_t|^{q_1}] < \infty$ and $|V(x)| \le B(1 + |x|^{q_2})$ for some $B < \infty$ and $q_1, q_2 \ge$ [...], then $E[|Y_t|^{q_1/q_2}] < \infty$.
4. If $\varphi$ is an eigenfunction of $X$ with corresponding eigenvalue $\rho$ in the sense that $E[\varphi(X_1)|X_0] = \rho\,\varphi(X_0)$, then $\varphi \circ U$ is an eigenfunction of $Y$ with corresponding eigenvalue $\rho$.

The above proposition shows that, given knowledge (or estimates) of $S$, the properties of $Y$ in terms of mixing coefficients, moments, and eigenfunctions are well understood since they are inherited from the specification of $X$. In addition, computations of conditional moments of $Y$ can be done straightforwardly utilizing knowledge of the UPD. For example, for a given function $G$, the corresponding conditional moment can be computed as

$$E[G(Y_{t+s})|Y_t = y] = E[G_X(X_{t+s})|X_t = U(y)],$$

where $G_X(x) := G(V(x))$. The right-hand side involves only $X$, and so standard methods for computing moments of parametric diffusion models (e.g., Monte Carlo methods, solving partial differential equations, Fourier transforms) can be employed. This facilitates the use of our diffusion models in asset pricing, where the price often takes the form of a conditional moment. We refer to Eraker and Wang (2015) for more details on asset pricing applications for our class of models; they take a fully parametric approach but all their arguments carry over to our setting.

The last result of the above proposition will prove useful for our identification arguments since these will rely on the fundamental nonparametric identification results derived in Hansen et al. (1998). Their results involve the spectrum of the observed diffusion process, and the last result of the proposition implies that the spectrum of $Y$ is fully characterized by the spectrum of $X$ together with the transformation.
The eigenfunctions and their eigenvalues are also useful for evaluating long-run properties of $Y$. In our semiparametric approach, the eigenfunctions and corresponding eigenvalues of $Y$ are easily computed from $X$ and so we circumvent the problem of estimating these nonparametrically as done in, for example, Chen, Hansen and Scheinkman (2009) and Gobet et al. (2004).

Our framework is quite flexible and in principle allows for any specification of the UPD for $X$. Many parametric models are available for that purpose, and we here present three specific examples from the literature on continuous-time interest rate modelling.

Example 1: Ornstein-Uhlenbeck (OU) model.

The OU model (cf. Vasicek, 1977) is given by

$$dX_t = \kappa(\alpha - X_t)\,dt + \sigma\,dW_t, \tag{2.10}$$

defined on the domain $\mathcal{X} = (-\infty, +\infty)$. The process is stationary if and only if $\kappa > 0$, in which case $X$ mean-reverts to its unconditional mean $\alpha$. The scale of $X$ is controlled by $\sigma$. Its stationary and transition distributions are both normal, and the corresponding copula of the discretely sampled process is a Gaussian copula with correlation parameter $e^{-\kappa\Delta}$. For this particular model, the resulting drift and diffusion terms of the observed process take the form

$$\mu_Y(y;S) = \frac{\kappa(\alpha - U(y))}{U'(y)} - \frac{\sigma^2}{2}\,\frac{U''(y)}{U'(y)^3}, \qquad \sigma_Y^2(y;S) = \frac{\sigma^2}{U'(y)^2}. \tag{2.11}$$

In Figure 2 (found in Section 6), we plot these two functions with $U$ and $\theta$ fitted to the 7-day Eurodollar interest rate time series used in Aït-Sahalia (1996b). Observe that $U$ generates non-linear behavior in $\mu_Y$ and $\sigma_Y^2$ despite the UPD being a linear Gaussian process.

Example 2: Cox-Ingersoll-Ross (CIR) model.

The CIR process (cf. Cox et al., 1985) is given by

$$dX_t = \kappa(\alpha - X_t)\,dt + \sigma\sqrt{X_t}\,dW_t. \tag{2.12}$$

The process has domain $\mathcal{X} = (0, +\infty)$ and is stationary if and only if $\kappa > 0$, $\alpha > 0$ and $2\kappa\alpha/\sigma^2 \ge 1$. Conditional on $X_{i\Delta}$, $X_{(i+1)\Delta}$ admits a non-central $\chi^2$ distribution with fractional degrees of freedom, while its stationary distribution is a Gamma distribution. To the best of our knowledge, the corresponding dynamic copula has not been analyzed before or used in empirical work. Figure 4 (in Section 6) displays $\mu_Y$ and $\sigma_Y^2$, with $U$ and $\theta$ chosen in the same way as in Example 1. Compared to that example, the resulting drift and diffusion terms of $Y$ exhibit even stronger non-linearities.

Example 3: Nonlinear Drift Constant Elasticity Variance (NLDCEV) model.

The NLDCEV specification (cf. Conley et al., 1997) is given by

$$dX_t = \left(\sum_{i=-k}^{l} \alpha_i X_t^i\right) dt + \sigma X_t^{\beta}\,dW_t \tag{2.13}$$

with domain $\mathcal{X} = (0, +\infty)$. It is easily seen that when $\alpha_{-k} > 0$ and $\alpha_l < 0$ [...] $X$. A popular choice in various studies in finance is $k = 1$ and $l = 2$ or $3$ (cf. Aït-Sahalia, 1996b; Choi, 2009; Kristensen, 2010; Bu, Cheng and Hadri, 2017), in which case the drift has linear or zero mean-reversion in the middle part and much stronger mean-reversion for large and small values of $X$. Meanwhile, the CEV diffusion term is also consistent with most empirical findings on the shape of the diffusion term. Since (2.13) is one of the most flexible parametric diffusions, diffusion processes that are unspecified transformations of (2.13) should represent a very flexible class of diffusion models. As with (2.12), the implied copula of the NLDCEV is new to the copula literature.

Examples 1-2 are attractive from a computational standpoint since the corresponding transition densities are available in closed form, thereby facilitating their implementation. But this comes at the cost of the dynamics being somewhat simple. The NLDCEV model implies more complex and richer dynamics, but on the other hand its transition density is not available in closed form. However, the marginal pdf of the NLDCEV process, as well as more general specifications, can be evaluated in closed form by (2.5). Moreover, closed-form approximations of the transition density of the NLDCEV model developed by, for example, Aït-Sahalia (2002) and Li (2013) can be employed. Alternatively, simulated versions of the transition density can be computed using the techniques developed in, for example, Kristensen and Shin (2012) and Bladt and Sørensen (2014). In either case, an approximate version of the exact likelihood can be easily computed, thereby allowing for simple estimation of even quite complex UPDs.
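To illustrate how the stationary density in (2.5) can be evaluated for a given drift and diffusion, the following sketch computes the scale density in (2.4) by a simple trapezoidal rule and normalises numerically. It is checked against the Gamma stationary density of the CIR model from Example 2. The function names, grid and parameter values are illustrative assumptions; any constant arising from the choice of $x^*$ cancels in the normalisation.

```python
import math

def stationary_density(mu, sigma2, x_min, x_max, n=6000):
    """Evaluate f_X in eq. (2.5) on an equally spaced grid over [x_min, x_max]."""
    h = (x_max - x_min) / n
    xs = [x_min + i * h for i in range(n + 1)]
    ratio = [2.0 * mu(x) / sigma2(x) for x in xs]
    # cumulative trapezoidal integral of 2*mu/sigma^2 from x* = x_min
    cum = [0.0]
    for i in range(1, n + 1):
        cum.append(cum[-1] + 0.5 * h * (ratio[i - 1] + ratio[i]))
    # 1 / (sigma^2 * s) = exp(cum) / sigma^2; shift by max(cum) for stability
    c_max = max(cum)
    g = [math.exp(c - c_max) / sigma2(x) for c, x in zip(cum, xs)]
    # numerical normalising constant (trapezoidal rule)
    norm = sum(0.5 * h * (g[i - 1] + g[i]) for i in range(1, n + 1))
    return xs, [gi / norm for gi in g]

def gamma_pdf(x, shape, rate):
    """Gamma density used as the closed-form CIR benchmark."""
    return rate ** shape * x ** (shape - 1.0) * math.exp(-rate * x) / math.gamma(shape)

# Hypothetical CIR parameters satisfying 2*kappa*alpha/sigma^2 >= 1.
kappa, alpha, sigma = 0.5, 0.06, 0.15
grid, f_num = stationary_density(
    lambda x: kappa * (alpha - x),   # CIR drift
    lambda x: sigma ** 2 * x,        # CIR squared diffusion
    1e-4, 0.6,
)
shape, rate = 2.0 * kappa * alpha / sigma ** 2, 2.0 * kappa / sigma ** 2
```

The same routine could in principle be applied to the NLDCEV drift and diffusion, for which no standard closed-form benchmark is assumed here.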
As already noted in the introduction, copula-based diffusions are related to the class of so-called discrete-time copula-based Markov models; see, for example, Chen and Fan (2006) and references therein. To map the notation and ideas of this literature into our continuous-time setting, we set the sampling time distance $\Delta = 1$ in the remaining part of this section.

Let us first introduce copula-based Markov models, where a given discrete-time, stationary scalar Markov process $Y = \{Y_i : i = 0, 1, \ldots, n\}$ is modelled through a bivariate parametric copula density, say, $c_X(u_0, u; \theta)$, together with its stationary marginal cdf $F_Y$, so that $Y$'s transition density satisfies

$$p_Y(y|y_0;\theta,F_Y) = f_Y(y)\,c_X(F_Y(y_0), F_Y(y);\theta), \tag{2.14}$$

where $f_Y(y) = F_Y'(y)$. An alternative representation of this model is

$$Y_i = F_Y^{-1}(\bar{X}_i), \qquad \bar{X}_{i+1} \mid \bar{X}_i = x_0 \sim c_X(x_0, \cdot\,;\theta), \tag{2.15}$$

so that $Y_i$ is a transformation of an underlying Markov process $\bar{X}_i \in [0, 1]$ with transition density $c_X(x_0, x; \theta)$. Thus, if $c_X(x_0, x; \theta)$ is induced by an underlying Markov diffusion transition density, the corresponding copula-based Markov model falls within our framework.

Conversely, consider a copula-based diffusion and suppose that the UPD $X$ is stationary with marginal cdf $F_X(x;\theta)$. By definition of $Y$, its marginal cdf satisfies

$$F_Y(y) = F_X(U(y);\theta) \Leftrightarrow U(y) = F_X^{-1}(F_Y(y);\theta). \tag{2.16}$$

Substituting the last expression for $U$ into (2.8), we see that $p_Y$ can be expressed in the form of (2.14), where $c_X(u_0, u; \theta)$ is the density function of the (dynamic) copula implied by the discretely sampled UPD $X$,

$$c_X(u_0, u; \theta) = \frac{p_X\left(F_X^{-1}(u;\theta) \mid F_X^{-1}(u_0;\theta); \theta\right)}{f_X\left(F_X^{-1}(u;\theta); \theta\right)}. \tag{2.17}$$

Here, the copula $C_X(u_0, u; \theta)$ of a given Markov process is defined as $C_X(u_0, u; \theta) = \Pr\left(X_0 \le F_X^{-1}(u_0;\theta),\, X_1 \le F_X^{-1}(u;\theta)\right)$, and the corresponding copula density is given by $c_X(u_0, u; \theta) = \partial^2 C_X(u_0, u; \theta)/(\partial u_0\,\partial u)$. Thus, any discretely sampled stationary copula-based diffusion satisfies (2.15) with $\bar{X}_i = F_X(X_i)$.

However, the literature on copula-based Markov models focuses on discrete-time models with standard copula specifications derived from bivariate distributions in an i.i.d. setting. Using copulas that were originally derived in an i.i.d. setting complicates the interpretation of the dynamics of the resulting Markov model, and conditions for the model to be mixing, for example, can be quite complicated to derive; see, e.g., Beare (2010) and Chen, Wu and Yi (2009). This also implies that very few standard copulas can be interpreted as diffusion processes; to our knowledge, the only one is the Gaussian copula, which corresponds to the OU process in Example 1.

The reader may now wonder why we do not simply generate dynamic copulas by first deriving the transition density $p_X(x|x_0;\theta)$ for a given discrete-time Markov model and then obtaining the corresponding Markov copula through eq. (2.17). The reason is that for most discrete-time Markov models the stationary distribution $F_X(x;\theta)$ is not known in closed form. Thus, first of all, $F_X^{-1}(u;\theta)$ and thereby also $c_X$ have to be approximated numerically. Second, since $c_X$ is then not available in closed form, the analysis of which parameters one can identify from the resulting copula model becomes very challenging. And identification in copula-based Markov models is a non-trivial problem: Generally, for a given parametric Markov model, not all parameters are identified from the corresponding copula as given in (2.17) and some of them have to be normalized.
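For the OU model of Example 1, the copula density in (2.17) can be evaluated directly and compared with the standard Gaussian copula density with correlation $e^{-\kappa\Delta}$. The sketch below does this using only the Python standard library. The function names and parameter values are illustrative assumptions. It also shows numerically that the copula does not depend on $\alpha$ or $\sigma$, which is one concrete instance of parameters that cannot be identified from the copula alone.

```python
import math
from statistics import NormalDist

def ou_copula_density(u0, u, kappa, alpha, sigma, delta):
    """Copula density of a discretely sampled OU process via eq. (2.17)."""
    rho = math.exp(-kappa * delta)
    # stationary law of the OU process: N(alpha, sigma^2 / (2 kappa))
    stat = NormalDist(alpha, sigma / math.sqrt(2.0 * kappa))
    x0, x = stat.inv_cdf(u0), stat.inv_cdf(u)
    # transition law of X_delta given X_0 = x0
    trans = NormalDist(alpha + (x0 - alpha) * rho,
                       stat.stdev * math.sqrt(1.0 - rho ** 2))
    return trans.pdf(x) / stat.pdf(x)

def gaussian_copula_density(u0, u, rho):
    """Standard bivariate Gaussian copula density with correlation rho."""
    a, b = NormalDist().inv_cdf(u0), NormalDist().inv_cdf(u)
    expo = -(rho ** 2 * (a * a + b * b) - 2.0 * rho * a * b) / (2.0 * (1.0 - rho ** 2))
    return math.exp(expo) / math.sqrt(1.0 - rho ** 2)
```

For example, `ou_copula_density(0.3, 0.9, 0.5, 0.06, 0.15, 1.0)` and `ou_copula_density(0.3, 0.9, 0.5, -1.0, 2.0, 1.0)` should agree up to rounding error, since only $\kappa\Delta$ enters the copula.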

We here directly generate copulas through an underlying continuous-time diffusion model for $X$. This resolves the aforementioned drawbacks of existing copula-based Markov models: First, we are able to generate highly flexible copulas so far not considered in the literature. Second, given that our copulas are induced by specifying the drift and diffusion functions of $X$, the time series properties are much more easily inferred from our model, cf. Proposition 2.1. Third, by Itô's Lemma, eqs. (2.6)-(2.7) provide us with explicit expressions linking the drift and diffusion terms of the observed diffusion process $Y$ to the UPD through the transformation $V$; this will allow us to derive necessary and sufficient conditions for identification in the following. Fourth, in terms of estimation, the stationary distribution of a given diffusion model has an explicit form, cf. eq. (2.5), which allows us to develop computationally simple estimators of copula diffusion models. Finally, some of our identification results will not require stationarity and so expand the scope for using copula-type models in time series analysis.

Our modelling strategy is also related to the ideas of Aït-Sahalia (1996a) and Kristensen (2010, 2011), where $F_Y$ is left unspecified while either the drift, $\mu_Y$, or the diffusion term, $\sigma_Y^2$, is specified parametrically. As an example, consider the latter case, where $\sigma_Y^2(y;\theta)$ is known up to the parameter $\theta$. Given knowledge of the marginal density $f_Y$ (or a nonparametric estimator of it), the drift can then be recovered as a functional of $f_Y$ and $\sigma_Y^2$ as

$$\mu_Y(y; f_Y, \theta) = \frac{1}{2 f_Y(y)}\,\frac{\partial}{\partial y}\left[\sigma_Y^2(y;\theta)\,f_Y(y)\right].$$

So in their setting $f_Y$ pins down the resulting dynamics of $Y$ in a rather opaque manner.

Suppose that a particular specification of the UPD as given in (2.2) has been chosen. Given the discrete sample of $Y$, the goal is to obtain consistent estimates of $\theta$ together with $V$.
To this end,we ﬁrst have to show that these are actually identiﬁed from data. In order to do so, we need tobe precise about which primitives we can identify from data. Given the primitives, we then wishto recover ( θ, V ). In the cross-sectional literature, one normally take as given the distribution ofdata and then establish a mapping between this and the structural parameters. In our setting, weare able to learn about the transition density of our data, p Y , from the population and so it wouldbe natural to use this as primitive from which we wish to recover ( θ, V ). However, the mappingfrom p Y to ( θ, V ) is not available on closed form in general in our setting and so this identiﬁcationstrategy appears highly complicated. Instead we will take as primitives the drift, µ Y , and diﬀusionterm, σ Y , of Y and then show identiﬁcation of ( θ, V ) from these. This identiﬁcation argumentrelies on us being able to identify µ Y and σ Y in the ﬁrst place, which we formally assume here: Assumption 3.1

The drift, $\mu_Y$, and the diffusion, $\sigma_Y^2$, are nonparametrically identified from the discretely sampled process $Y$.

The above assumption is not completely innocuous and does impose some additional regularity conditions on the Data Generating Process (DGP). We therefore first provide sufficient conditions under which Assumption 3.1 holds. The first set of conditions is due to Hansen et al. (1998), who showed that Assumption 3.1 is satisfied if $Y$ is stationary and its infinitesimal operator has a discrete spectrum. Proposition 2.1(4) is helpful in this regard since it informs us that the spectrum of $Y$ can be recovered from that of $X$. In particular, if $X$ is stationary with a discrete spectrum, then $Y$ will have the same properties. Since the dynamics of $X$ are known to us, the properties of its spectrum are in principle known to us and so this condition can be verified a priori. The second set of primitive conditions comes from Bandi and Phillips (2003): They show that as $\Delta \to 0$ and $n\Delta \to \infty$, the drift and diffusion functions of a recurrent Markov diffusion process are identified. This last result holds without stationarity, but on the other hand requires high-frequency observations.

In order to formally state the above two results, we need some additional notation. Recall that the infinitesimal operator, denoted $L_{X,\theta}$, of a given UPD $X$ is defined as

$$L_{X,\theta}\,g(x) := \mu_X(x;\theta)\,g'(x) + \frac{1}{2}\,\sigma_X^2(x;\theta)\,g''(x),$$

for any twice differentiable function $g(x)$. We follow Hansen et al. (1998) and restrict the domain of $L_{X,\theta}$ to the following set of functions:

$$D(L_{X,\theta}) = \left\{ g \in L^2(f_X) : g' \text{ is a.c., } L_{X,\theta}\,g \in L^2(f_X) \text{ and } \lim_{x \downarrow x_l} \frac{g'(x)}{s(x)} = \lim_{x \uparrow x_r} \frac{g'(x)}{s(x)} = 0 \right\},$$

where a.c. stands for absolutely continuous. The spectrum of $L_{X,\theta}$ is then the set of solution pairs $(\varphi, \rho)$, with $\varphi \in D(L_{X,\theta})$ and $\rho \ge 0$, to the following eigenvalue problem: $L_{X,\theta}\,\varphi = -\rho\,\varphi$. We refer to Hansen et al. (1998) and Kessler and Sørensen (1999) for further discussion and results regarding the spectrum of $L_{X,\theta}$. The following result then holds:

Proposition 3.1

Suppose that Assumption 2.1(i)-(ii) is satisfied. Then Assumption 3.1 holds under either of the following two sets of conditions:

1. Assumption 2.1(iii) holds and $L_{X,\theta}$ has a discrete spectrum, where $\theta$ is the data-generating parameter value.
2. $\Delta \to 0$ and $n\Delta \to \infty$.

Importantly, the above result shows that Assumption 3.1 can be verified without imposing stationarity. Unfortunately, this requires high-frequency information ($\Delta \to 0$). [...]

Recall that the structure $S = (\theta, V)$ contains the objects of interest, and let our model consist of all the structures that satisfy, as a minimum, Assumptions 2.1(i)–(ii) and 2.2. According to (2.6)-(2.7), each structure implies a drift and diffusion term of the observed process. We shall say that two structures $S = (\theta, V)$ and $\tilde{S} = (\tilde{\theta}, \tilde{V})$ are observationally equivalent, a property which we denote by $S \sim \tilde{S}$, if they imply the same drift and diffusion of $Y$, i.e.

$$\forall y \in \mathcal{Y}: \quad \mu_Y(y;S) = \mu_Y(y;\tilde{S}) \quad \text{and} \quad \sigma_Y^2(y;S) = \sigma_Y^2(y;\tilde{S}). \tag{3.1}$$

The structure $S$ is then said to be identified within the model if $S \sim \tilde{S}$ implies $S = \tilde{S}$. In our setting, without suitable normalizations on the parameters of the UPD, identification will generally fail. To see this, observe that any given structure $S$ is observationally equivalent to the following process: Choose any one-to-one transformation $T : \mathcal{X} \mapsto \mathcal{X}$, and rewrite the DGP implied by $S$ as

$$Y_t = \tilde{V}(\tilde{X}_t), \qquad \tilde{V}(x) = V(T(x)), \tag{3.2}$$

where $\tilde{X}_t = T^{-1}(X_t)$ solves

$$d\tilde{X}_t = \mu_{T^{-1}(X)}(\tilde{X}_t;\theta)\,dt + \sigma_{T^{-1}(X)}(\tilde{X}_t;\theta)\,dW_t, \tag{3.3}$$

with

$$\mu_{T^{-1}(X)}(x;\theta) = \frac{\mu_X(T(x);\theta)}{\partial T(x)/\partial x} - \frac{1}{2}\,\sigma_X^2(T(x);\theta)\,\frac{\partial^2 T(x)/\partial x^2}{\left(\partial T(x)/\partial x\right)^3}, \tag{3.4}$$

$$\sigma_{T^{-1}(X)}^2(x;\theta) = \frac{\sigma_X^2(T(x);\theta)}{\left(\partial T(x)/\partial x\right)^2}. \tag{3.5}$$

Suppose now that there exists $\tilde{\theta}$ so that $\mu_{T^{-1}(X)}(x;\theta) = \mu_X(x;\tilde{\theta})$ and $\sigma_{T^{-1}(X)}^2(x;\theta) = \sigma_X^2(x;\tilde{\theta})$. Then the alternative representation (3.2)-(3.3) is a member of our model with structure $\tilde{S} = (\tilde{\theta}, \tilde{V})$, which is observationally equivalent to $S = (\theta, V)$. The following result provides a complete characterization of the class of observationally equivalent structures for a given model:

Theorem 3.2

Suppose that Assumption 3.1 is satisfied. For any two structures $S = (\theta, V)$ and $\tilde S = (\tilde\theta, \tilde V)$ satisfying Assumptions 2.1(i) and 2.2, the following holds: $S \sim \tilde S$ if and only if there exists a one-to-one transformation $T: \mathcal{X} \mapsto \mathcal{X}$ so that

$$\tilde V(x) = V(T(x)) \tag{3.6}$$

and, with $\mu_{T^{-1}(X)}(x; \tilde\theta)$ and $\sigma^2_{T^{-1}(X)}(x; \tilde\theta)$ given in eqs. (3.4)-(3.5),

$$\text{(i)}\ \ \mu_{T^{-1}(X)}(x; \tilde\theta) = \mu_X(x; \theta) \quad \text{and} \quad \text{(ii)}\ \ \sigma^2_{T^{-1}(X)}(x; \tilde\theta) = \sigma^2_X(x; \theta). \tag{3.7}$$

In particular, the data-generating structure is identified if and only if there exists no one-to-one transformation $T$ such that (3.7) holds for $\theta \neq \tilde\theta$.

Note that the above theorem does not require stationarity since it is only concerned with the mapping $S \mapsto (\mu_Y(\cdot; S), \sigma_Y^2(\cdot; S))$, which is well-defined irrespective of whether data is stationary. The first part of the theorem provides an exact characterization of when any two structures are equivalent, namely if there exists a transformation $T$ so that (3.6)-(3.7) hold. The second part comes as a natural consequence of the first part: If there exists no such transformation, then the data-generating structure must be identified.

Unfortunately, the above result may not always be useful in practice since it requires us to search over all possible one-to-one transformations $T$ and for each of these verify that there exists no $\theta \neq \tilde\theta$ for which eq. (3.7) holds. In some cases, it proves useful to first normalize the UPD suitably and then verify eq. (3.7) in the normalized version. First note that for any one-to-one transformation $\bar T(\cdot; \theta): \mathcal{X} \mapsto \bar{\mathcal{X}}$, an equivalent representation of the model is $Y_t = V(\bar X_t)$, where the "normalised" UPD $\bar X_t := \bar T^{-1}(X_t; \theta) \in \bar{\mathcal{X}}$ solves

$$d\bar X_t = \mu_{\bar X}(\bar X_t; \theta)\,dt + \sigma_{\bar X}(\bar X_t; \theta)\,dW_t,$$

with

$$\mu_{\bar X}(\bar x; \theta) = \frac{\mu_X(\bar T(\bar x; \theta); \theta)}{\partial \bar T(\bar x; \theta)/\partial \bar x} - \frac{1}{2}\sigma_X^2(\bar T(\bar x; \theta); \theta)\,\frac{\partial^2 \bar T(\bar x; \theta)/\partial \bar x^2}{(\partial \bar T(\bar x; \theta)/\partial \bar x)^3}, \tag{3.8}$$

$$\sigma^2_{\bar X}(\bar x; \theta) = \frac{\sigma_X^2(\bar T(\bar x; \theta); \theta)}{(\partial \bar T(\bar x; \theta)/\partial \bar x)^2}. \tag{3.9}$$

Given that the above representation is observationally equivalent to the original model, we can still employ Theorem 3.2 but with $\mu_{\bar X}$ and $\sigma^2_{\bar X}$ replacing $\mu_X$ and $\sigma^2_X$. Verifying the identification conditions stated in the second part of the theorem for the normalised versions will in some situations be easier by judicious choice of $\bar T$.

Below, we present three particular normalising transformations that we have found useful in this regard. The chosen transformations allow us to provide easy-to-check conditions for a given UPD to be identified.
For a given UPD, the researcher is free to apply any of the three identification schemes depending on which is the easier one to implement. The three schemes lead to different normalizations/parametrizations, but they all lead to models that are exactly identified (no over-identifying restrictions are imposed) and so are observationally equivalent: The resulting form of $\mu_Y$ and $\sigma^2_Y$ will be identical irrespective of which scheme is employed.

The three transformations that we consider also highlight three alternative modelling approaches: Instead of starting with a parametric UPD as found in the existing literature, such as Examples 1-3, one can alternatively build a UPD with unit diffusion ($\sigma^2_X = 1$), zero drift ($\mu_X = 0$) or known marginal distribution. As we shall see, each of these three modelling approaches is in principle as flexible as the standard approach where the researcher jointly specifies the drift and diffusion term.

In our first identification scheme, we choose to normalize $X_t$ by the so-called Lamperti transform,

$$\bar X_t = \bar T^{-1}(X_t; \theta) := \gamma(X_t; \theta), \qquad \gamma(x; \theta) = \int_{x^*}^{x} \frac{1}{\sigma_X(z; \theta)}\,dz, \quad x^* \in \mathcal{X}.$$

The resulting process is a unit diffusion process,

$$d\bar X_t = \mu_{\bar X}(\bar X_t; \theta)\,dt + dW_t,$$

with domain $\bar{\mathcal{X}} = (\bar x_l, \bar x_r)$, where $\bar x_r = \lim_{x \to x_r^+} \gamma(x; \theta)$ and $\bar x_l = \lim_{x \to x_l^-} \gamma(x; \theta)$, and drift function

$$\mu_{\bar X}(\bar x; \theta) = \frac{\mu_X(\gamma^{-1}(\bar x; \theta); \theta)}{\sigma_X(\gamma^{-1}(\bar x; \theta); \theta)} - \frac{1}{2}\frac{\partial \sigma_X}{\partial x}(\gamma^{-1}(\bar x; \theta); \theta). \tag{3.10}$$

For the unit diffusion version of the UPD, the equivalence condition (3.7)(ii) becomes

$$1 = \sigma^2_{\bar X}(\bar x; \theta) = \sigma^2_{T^{-1}(\bar X)}(\bar x; \tilde\theta) = \frac{1}{(\partial T(\bar x)/\partial \bar x)^2},$$

which can only hold if $T(\bar x) = \bar x + \eta$ for some constant $\eta \in \mathbb{R}$. Thus, we can restrict attention to this class of transformations and (3.7)(i) becomes:

Assumption 3.2.

With $\mu_{\bar X}$ given in (3.10): There exists no $\eta \neq 0$ and $\tilde\theta \neq \theta$ such that $\mu_{\bar X}(\bar x; \tilde\theta) = \mu_{\bar X}(\bar x + \eta; \theta)$ for all $\bar x \in \bar{\mathcal{X}}$.

Assumption 3.2 imposes a normalization condition on the transformed drift function to ensure identification. When verifying Assumption 3.2 for the transformed unit diffusion $\bar X$ defined above, we will generally need to fix some of the parameters that enter $\mu_X(x; \theta)$ and $\sigma^2_X(x; \theta)$ of the original process $X$, see below.

Corollary 3.3

Under Assumptions 2.1(i), 2.2 and 3.1, $S$ is identified if and only if Assumption 3.2 is satisfied.

The above transformation result can be applied to standard parametric specifications when $\gamma(x; \theta)$ is available in closed form. But it also highlights that in terms of modelling copula diffusions, we can without loss of generality build a model where we from the outset restrict $\sigma^2_X = 1$ and only model the drift term $\mu_X$. For example, we could choose the following flexible polynomial drift model where we have already normalized the diffusion term:

$$dX_t = \left(\sum_{i=1}^{l} \alpha_i X_t^i\right)dt + dW_t, \tag{3.11}$$

where $\theta = (\alpha_1, \ldots, \alpha_l)$. Corollary 3.3 shows that this particular copula diffusion specification is identified without further restrictions on $\theta$. Below we apply Corollary 3.3 to some of the standard parametric diffusions introduced earlier:

Example 1 (continued).

The Lamperti transform of the OU process in (2.10) is given by

$$d\bar X_t = \kappa\left(\alpha/\sigma - \bar X_t\right)dt + dW_t.$$

Since $\alpha/\sigma$ is a location shift of $\bar X$, we need to normalize $\alpha/\sigma$ in order for the identification condition 3.3 to be satisfied; one such normalization is $\alpha/\sigma = 0$, leading to the following identified model,

$$d\bar X_t = -\kappa \bar X_t\,dt + dW_t. \tag{3.12}$$

Example 2 (continued).

The Lamperti transform of the CIR diffusion in (2.12) is given by

$$d\bar X_t = \left[\kappa\left(\frac{2\alpha}{\sigma^2 \bar X_t} - \frac{\bar X_t}{2}\right) - \frac{1}{2\bar X_t}\right]dt + dW_t, \tag{3.13}$$

which only depends on $\theta = (\kappa, \alpha^*)$ where $\alpha^* = \alpha/\sigma^2$. Note that the dimension of the parameter vector is reduced from 3 to 2. Crucially, it also suggests that we can only identify $\alpha$ and $\sigma^2$ up to a ratio. Hence, normalization requires fixing either $\alpha$, $\sigma^2$, or their ratio.

Example 3 (continued).

It can be easily verified that the Lamperti transform of the NLDCEV diffusion in (2.13) takes the form

$$d\bar X_t = \left[\sum_{i=-k}^{l} \alpha_i^* \bar X_t^{\frac{i-\beta}{1-\beta}} - \frac{\beta}{2(1-\beta)}\bar X_t^{-1}\right]dt + dW_t, \tag{3.14}$$

where $\alpha_i^* := \alpha_i\,\sigma^{\frac{i-1}{1-\beta}}(1-\beta)^{\frac{i-\beta}{1-\beta}}$, $i = -k, \ldots, l$. Hence, the parameters $\theta = (\beta, \alpha^*_{-k}, \ldots, \alpha^*_{l})$ are identified and the number of parameters is reduced from $l + k + 3$ to $l + k + 2$. Note that just as (2.10) and (2.12) are special cases of (2.13), both (3.12) and (3.13) are special cases of (3.14).

Our second identification strategy transforms $X$ by its scale measure defined in eq. (2.4),

$$\bar X_t := S(X_t; \theta),$$

which brings the diffusion process onto its natural scale,

$$d\bar X_t = \sigma_{\bar X}(\bar X_t; \theta)\,dW_t,$$

where the drift is zero (and so known) while

$$\sigma^2_{\bar X}(\bar x; \theta) = s^2(S^{-1}(\bar x; \theta); \theta)\,\sigma^2_X(S^{-1}(\bar x; \theta); \theta). \tag{3.15}$$

Since the drift term is zero, the identification condition (3.7)(i) becomes

$$0 = -\frac{1}{2}\sigma^2_{\bar X}(T(\bar x); \tilde\theta)\,\frac{\partial^2 T(\bar x)/\partial \bar x^2}{(\partial T(\bar x)/\partial \bar x)^3}, \tag{3.16}$$

which can only hold if $\partial^2 T(\bar x)/\partial \bar x^2 = 0$. We can therefore restrict attention to linear transformations $T(\bar x) = \eta_1 \bar x + \eta_2$, for some constants $\eta_1, \eta_2 \in \mathbb{R}$, in which case (3.7)(ii) becomes:

Assumption 3.3. With $\sigma^2_{\bar X}$ given in (3.15): There exists no $\eta_1 \neq 1$, $\eta_2 \neq 0$ and $\tilde\theta \neq \theta$ such that $\sigma^2_{\bar X}(\bar x; \tilde\theta) = \sigma^2_{\bar X}(\eta_1 \bar x + \eta_2; \theta)/\eta_1^2$ for all $\bar x \in \bar{\mathcal{X}}$.

In comparison to Assumption 3.2, we here have to impose two normalizations to ensure identification. The intuition for this is that setting the drift to zero does not act as a complete normalization of the process: Any additional scale transformation of $\bar X$ still leads to a zero-drift process. Therefore, for this scheme to work we need both a scale and a location normalization.

Theorem 3.4

Under Assumptions 2.1(i)–(ii), 2.2 and 3.1, $S$ is identified if and only if Assumption 3.3 is satisfied.

Compared to the first identification scheme, it is noticeably harder to apply this one to existing parametric diffusion models since the inverse of the scale transform is usually not available in closed form. But, similar to the first identification scheme, the result shows that without loss of flexibility, we can focus on UPDs with zero drift and then model the diffusion term in a flexible manner, e.g.,

$$dX_t = \exp\left(\sum_{i=1}^{l-1} \beta_i X_t^i + \beta_l |X_t|^l\right)dW_t. \tag{3.17}$$

Corollary 3.4 shows that this UPD is identified together with $V$ without any further parameter restrictions on $\theta = (\beta_1, \ldots, \beta_l)$.

Our third identification strategy transforms a given stationary UPD by its marginal cdf,

$$\bar X_t = F_X(X_t; \theta). \tag{3.18}$$

In this case, there is generally no simplification in terms of the drift and diffusion term, which take the form

$$\mu_{\bar X}(\bar x; \theta) = \mu_X(F_X^{-1}(\bar x; \theta); \theta)\,f_X(F_X^{-1}(\bar x; \theta); \theta) + \frac{1}{2}\sigma^2_X(F_X^{-1}(\bar x; \theta); \theta)\,f_X'(F_X^{-1}(\bar x; \theta); \theta) \tag{3.19}$$

and

$$\sigma^2_{\bar X}(\bar x; \theta) = \sigma^2_X(F_X^{-1}(\bar x; \theta); \theta)\,f_X^2(F_X^{-1}(\bar x; \theta); \theta) \tag{3.20}$$

for $\bar x \in \bar{\mathcal{X}} = (0, 1)$. [...] $\bar X_t \sim U(0, 1)$ and we can directly identify the transformation function by $U(y) = F_Y(y)$, c.f. eq. (2.16). The identification condition then takes the form:

Assumption 3.4. With $\mu_{\bar X}(\bar x; \theta)$ and $\sigma^2_{\bar X}(\bar x; \theta)$ given in eqs. (3.19)-(3.20), the following holds:

$$\forall \bar x \in (0, 1): \quad \mu_{\bar X}(\bar x; \theta) = \mu_{\bar X}(\bar x; \tilde\theta) \ \text{ and } \ \sigma^2_{\bar X}(\bar x; \theta) = \sigma^2_{\bar X}(\bar x; \tilde\theta) \ \Leftrightarrow \ \theta = \tilde\theta.$$

Corollary 3.5

Under Assumptions 2.1-2.2 and 3.1, $S$ is identified if and only if Assumption 3.4 is satisfied.

The above result is only useful for showing identification of a given UPD if $F_X^{-1}(\bar x; \theta)$ is available in closed form. But similar to the previous identification schemes, it demonstrates that we can restrict attention to diffusions with known marginal distributions in the model building phase. Specifically, one can choose a known density $f_X(x)$ that describes the stationary distribution of $X$ together with a parametric specification for, say, the drift function. We can then rearrange eq. (2.5) to back out the diffusion term of the UPD:

$$\sigma^2_X(x; \theta) = \frac{2}{f_X(x)}\int_{x_l}^{x} \mu_X(z; \theta) f_X(z)\,dz. \tag{3.21}$$

If the drift is specified so that $\mu_X(\cdot; \theta) \neq \mu_X(\cdot; \tilde\theta)$ for $\theta \neq \tilde\theta$, then Assumption 3.4 will be satisfied for this model. Alternatively, one could choose a parametric specification of the diffusion term and then derive the corresponding drift term of the UPD satisfying

$$\mu_X(x; \theta) = \frac{1}{2 f_X(x)}\frac{\partial}{\partial x}\left[\sigma^2_X(x; \theta) f_X(x)\right].$$

The resulting copula diffusion model is identified, in the sense that Assumption 3.4 is satisfied, as long as the chosen diffusion term satisfies $\sigma^2_X(\cdot; \theta) \neq \sigma^2_X(\cdot; \tilde\theta)$ for $\theta \neq \tilde\theta$.

Below, we apply the third identification scheme to the OU and CIR models:

Example 1 (continued).

The stationary distribution of (2.10) is $N(\alpha, v^2)$ with $v^2 = \sigma^2/(2\kappa)$ and so the marginal density and cdf take the form $f_X(x; \theta) = \frac{1}{v}\phi\left(\frac{x - \alpha}{v}\right)$ and $F_X(x; \theta) = \Phi\left(\frac{x - \alpha}{v}\right)$, where $\phi$ and $\Phi$ denote the density and cdf of the $N(0, 1)$ distribution. Applying the transformation (3.18) yields, after some tedious calculations,

$$d\bar X_t = -2\kappa\,\Phi^{-1}(\bar X_t)\,\phi(\Phi^{-1}(\bar X_t))\,dt + \sqrt{2\kappa}\,\phi(\Phi^{-1}(\bar X_t))\,dW_t,$$

which is independent of $\alpha$ and $\sigma^2$ and these therefore have to be fixed, leaving $\kappa$ as the only free parameter. This is the same finding as with the first identification strategy.

Example 2 (continued).

The stationary distribution of the CIR process is a $\Gamma$-distribution with scale parameter $\omega = 2\kappa/\sigma^2$ and shape parameter $\nu = 2\kappa\alpha/\sigma^2$. Thus, the marginal density and cdf can be written as

$$f_X(x; \theta) = f_X(x; \omega, \nu) = \frac{\omega^{\nu}}{\Gamma(\nu)}x^{\nu - 1}e^{-\omega x}, \qquad F_X(x; \theta) = F_X(x; \omega, \nu) = \frac{1}{\Gamma(\nu)}\gamma(\nu, \omega x),$$

where $\Gamma(\nu)$ is the gamma function and $\gamma(\nu, \omega x)$ is the lower incomplete gamma function. Applying the transformation (3.18) yields

$$\mu_{\bar X}(\bar x; \theta) = \left[\kappa\left(\frac{\nu}{2\kappa} - \frac{\gamma^{-1}(\nu, \bar x\Gamma(\nu))}{2\kappa}\right) + \left(\frac{\nu - 1}{2} - \frac{\gamma^{-1}(\nu, \bar x\Gamma(\nu))}{2}\right)\right]\frac{2\kappa}{\Gamma(\nu)}\,\gamma^{-1}(\nu, \bar x\Gamma(\nu))^{\nu - 1}e^{-\gamma^{-1}(\nu, \bar x\Gamma(\nu))}$$

and

$$\sigma^2_{\bar X}(\bar x; \theta) = 2\kappa\,\gamma^{-1}(\nu, \bar x\Gamma(\nu))\left[\frac{1}{\Gamma(\nu)}\,\gamma^{-1}(\nu, \bar x\Gamma(\nu))^{\nu - 1}e^{-\gamma^{-1}(\nu, \bar x\Gamma(\nu))}\right]^2.$$

Note that $\mu_{\bar X}(\bar x; \theta)$ and $\sigma^2_{\bar X}(\bar x; \theta)$ only depend on $\kappa$ and $\nu$, which means we can only identify $\alpha$ and $\sigma^2$ up to a ratio, say $\alpha^* = \alpha/\sigma^2$. Hence, either $\alpha$ or $\sigma^2$ must be fixed, which is in accordance with what we found when applying the first identification strategy to the CIR. We could, for example, set $\sigma^2 = 2\kappa$, which leads to the following normalized CIR:

$$dX_t = \kappa(\alpha - X_t)\,dt + \sqrt{2\kappa X_t}\,dW_t.$$

In this section we develop two alternative semiparametric estimators of $\theta$ and $V$ for a given specification of the UPD. The first takes the form of a two-step Pseudo Maximum Likelihood Estimator (PMLE). The second is a semiparametric sieve-based ML estimator (SMLE). We consider two different scenarios when developing estimators: In the first one (see Section 4.1), $Y$ is observed at low frequency, which we formally define as the case when $\Delta > 0$ is fixed and $n \to \infty$. In the second one (see Section 4.2), high-frequency data is available so that $\Delta \to 0$ and $n \to \infty$.

To motivate the two estimators, suppose that $U$ is known, in which case the MLE of $\theta$ is given by

$$\hat\theta_{MLE} = \arg\max_{\theta \in \Theta} L_n(\theta, U),$$

where $L_n(\theta, U)$ is the log-likelihood of $\{Y_{i\Delta} : i = 0, 1, \ldots, n\}$,

$$L_n(\theta, U) = \frac{1}{n}\sum_{i=1}^{n}\left\{\log p_X\left(U(Y_{i\Delta}) \mid U(Y_{(i-1)\Delta}); \theta\right) + \log U'(Y_{i\Delta})\right\}, \tag{4.1}$$

where $p_X$ is defined in eq. (2.3). If $U$ is unknown, the above estimator is not feasible and we instead have to estimate it together with $\theta$.

Our PMLE assumes $Y$ is stationary, in which case $U$ satisfies eq. (2.16), where $F_X$ is known up to $\theta$ while $F_Y$ is unknown.
The latter can be estimated by the empirical cdf defined as

$$\tilde F_Y(y) = \frac{1}{n + 1}\sum_{i=0}^{n} I\{Y_{i\Delta} \leq y\},$$

where $I\{\cdot\}$ denotes the indicator function, or alternatively by the following kernel smoothed empirical cdf,

$$\hat F_Y(y) = \frac{1}{n + 1}\sum_{i=0}^{n} \mathcal{K}_h(Y_{i\Delta} - y), \tag{4.2}$$

where $\mathcal{K}_h(y) = \mathcal{K}(y/h)$ with $\mathcal{K}(y) = \int_{-\infty}^{y} K(z)\,dz$, $K$ being a kernel (e.g., the standard normal density), and $h > 0$ a bandwidth. Replacing $F_Y$ in eq. (2.16) with either $\tilde F_Y$ or $\hat F_Y$, we obtain the following two alternative estimators of $U$,

$$\tilde U(y; \theta) = F_X^{-1}(\tilde F_Y(y); \theta); \qquad \hat U(y; \theta) = F_X^{-1}(\hat F_Y(y); \theta). \tag{4.3}$$

Since $\hat F_Y(y) = \tilde F_Y(y) + O(h^2)$, the above two estimators of $U$ will be first-order asymptotically equivalent under appropriate bandwidth conditions. A natural way to estimate $\theta$ in our semiparametric framework would then be to substitute either $\hat U(y; \theta)$ or $\tilde U(y; \theta)$ into $L_n(\theta, U)$. In the latter case, this is not possible since $L_n(\theta, U)$ depends on $U'$ and $\tilde U$ is not differentiable. However, note that

$$U'(y) = \frac{f_Y(y)}{f_X(U(y); \theta)}, \tag{4.4}$$

so that $\log U'(y) = \log f_Y(y) - \log f_X(U(y); \theta)$. Since the first term is parameter independent, it can be ignored and so we arrive at the following semiparametric PMLE,

$$\hat\theta_{PMLE} = \arg\max_{\theta \in \Theta} \bar L_n(\theta, \tilde U(\cdot; \theta)),$$

where $\Theta$ is the parameter space and

$$\bar L_n(\theta, U) = \frac{1}{n}\sum_{i=1}^{n}\left\{\log p_X\left(U(Y_{i\Delta}) \mid U(Y_{(i-1)\Delta}); \theta\right) - \log f_X(U(Y_{i\Delta}); \theta)\right\}$$

is $L_n(\theta, U) - \sum_{i=1}^{n}\log f_Y(Y_{i\Delta})/n$. One can easily check that, by rewriting the above in terms of the implied copula of $X$, this estimator is equivalent to the one analyzed in Chen and Fan (2006).

Our second proposal, the SMLE, replaces the unknown density function $f_Y(y)$ by a sieve approximation $f_{Y,m}(y) \in \mathcal{F}_m$, where $\mathcal{F}_m$ is a finite-dimensional function space reflecting the properties of $f_Y$, $m = 1, 2, \ldots$.
For a given candidate density, we then compute $U(y; f_{Y,m}, \theta) = F_X^{-1}(F_{Y,m}(y); \theta)$ where $F_{Y,m}(y) = \int_{y_l}^{y} f_{Y,m}(z)\,dz$. Substituting this into the likelihood function yields the following semiparametric sieve maximum-likelihood estimator,

$$(\hat\theta_{SMLE}, \hat f_{Y,m}) = \arg\max_{\theta \in \Theta,\, f_{Y,m} \in \mathcal{F}_m} L_n(\theta, U(\cdot; f_{Y,m}, \theta)). \tag{4.5}$$

The above SMLE is identical to the one proposed by Chen, Wu and Yi (2009) for the estimation of copula-based Markov models, except that while they estimate the parameters of a copula function, we estimate those of the drift and diffusion functions of the UPD. In comparison with the PMLE, the numerical implementation of the SMLE involves joint maximization over both $\theta$ and $\mathcal{F}_m$, which is a harder numerical problem and potentially more time-consuming. In terms of statistical efficiency, $\hat\theta_{SMLE}$ will in general reach the semiparametric efficiency bound under stationarity, while the PMLE is inefficient.

Both of the above estimators require us to evaluate $F_X^{-1}(x; \theta)$, which in general is not available in closed form and so has to be computed using numerical methods, e.g., numerical integration or Monte Carlo methods combined with an equation solver. For the SMLE, one can circumvent this issue by directly approximating $U$ instead of $f_Y$: For a given finite-dimensional function space of one-to-one transformations $\mathcal{U}_m$, an alternative to the SMLE in (4.5) is $(\tilde\theta_{SMLE}, \tilde U_m) = \arg\max_{\theta \in \Theta,\, U_m \in \mathcal{U}_m} L_n(\theta, U_m)$. We expect this to be computationally more efficient compared to the density version above; the theoretical analysis of this alternative SMLE is left for future research.

Once an estimator for $\theta$ has been obtained, we can estimate the drift and diffusion terms of $Y$ using the expressions given in (2.6) and (2.7) by replacing $\theta$ and $U$ with their estimators. However, this involves estimating the first and second derivatives of $U$. For the SMLE this is not an issue assuming that $\mathcal{F}_m$ is a differentiable function space.
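The numerical evaluation of $F_X^{-1}$ mentioned above can be illustrated with a small sketch. It assumes the Gamma stationary density of the CIR UPD from Example 2, integrates it by quadrature, and inverts the resulting cdf with a bracketing root finder. The function names, the parameter values and the bracketing interval are hypothetical choices made for illustration only; they are not taken from the paper, and for the Gamma case a library quantile function is of course available and is used here only as a cross-check.

```python
import math

from scipy.integrate import quad
from scipy.optimize import brentq
from scipy.stats import gamma


def cir_stationary_pdf(x, kappa, alpha, sigma):
    """Gamma stationary density of the CIR UPD, written out explicitly."""
    omega = 2.0 * kappa / sigma**2         # omega = 2*kappa/sigma^2
    nu = 2.0 * kappa * alpha / sigma**2    # nu = 2*kappa*alpha/sigma^2
    return omega**nu / math.gamma(nu) * x ** (nu - 1.0) * math.exp(-omega * x)


def cdf_by_quadrature(pdf, x, lower):
    """F_X(x), obtained by integrating the density from the lower boundary."""
    value, _abs_err = quad(pdf, lower, x)
    return value


def inverse_cdf(pdf, p, lower, upper):
    """Solve F_X(x) = p for x with a bracketing root finder on [lower, upper]."""
    return brentq(lambda x: cdf_by_quadrature(pdf, x, lower) - p, lower, upper)


# Hypothetical parameter values, chosen only for illustration.
kappa, alpha, sigma = 0.5, 2.0, 1.0


def pdf(x):
    return cir_stationary_pdf(x, kappa, alpha, sigma)


probs = [0.1, 0.5, 0.9]
numerical = [inverse_cdf(pdf, p, lower=0.0, upper=50.0) for p in probs]

# Cross-check against SciPy's built-in Gamma quantile function.
nu = 2.0 * kappa * alpha / sigma**2
omega = 2.0 * kappa / sigma**2
reference = [gamma.ppf(p, a=nu, scale=1.0 / omega) for p in probs]
print(numerical)
print(reference)
```

For a UPD whose stationary density has no named distribution, the same two helper functions could be applied to any density that can be evaluated pointwise, at the cost of one quadrature call per root-finder iteration.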
For the PMLE, since $\tilde U(y; \theta)$ is not differentiable, we instead use the kernel smoothed version $\hat U(y; \theta)$, leading to the following three-step estimators of the drift and diffusion functions

$$\hat\mu_Y(y) = \frac{\mu_X(\hat U(y); \hat\theta_{PMLE})}{\hat U'(y)} - \frac{1}{2}\sigma^2_X(\hat U(y); \hat\theta_{PMLE})\,\frac{\hat U''(y)}{\hat U'(y)^3}, \tag{4.6}$$

$$\hat\sigma^2_Y(y) = \frac{\sigma^2_X(\hat U(y); \hat\theta_{PMLE})}{\hat U'(y)^2}, \tag{4.7}$$

where $\hat U(y) = F_X^{-1}(\hat F_Y(y); \hat\theta_{PMLE})$.

We now turn to the case where high-frequency data is available; this scenario is formally modelled as $\Delta \to 0$ and $n \to \infty$. The proposed estimators described in the previous section remain valid, but an alternative estimation method is available in this case since the exact density of the underlying UPD, $p_X$, is well-approximated by

$$\hat p_X(x \mid x_0; \theta) = \frac{1}{\sqrt{2\pi\Delta\,\sigma^2_X(x_0; \theta)}}\exp\left[-\frac{(x - x_0 - \mu_X(x_0; \theta)\Delta)^2}{2\sigma^2_X(x_0; \theta)\Delta}\right] \tag{4.8}$$

as $\Delta \to 0$, c.f. Kessler (1997). We then propose to estimate $\theta$ using either the two-step or sieve approach described in the previous section, except that we here replace $p_X(x \mid x_0; \theta)$ with its high-frequency approximation, $\hat p_X(x \mid x_0; \theta)$, in the definition of $L_n(\theta, U)$ and $\bar L_n(\theta, U)$. The advantage of doing so is computational in that $\hat p_X(x \mid x_0; \theta)$ is available in closed form for any given UPD while $p_X(x \mid x_0; \theta)$ generally can only be evaluated using numerical methods, as pointed out earlier.

For most standard UPDs, the parameters can be decomposed into $\theta = (\theta_1, \theta_2)$ so that $\mu_X(x; \theta) = \mu_X(x; \theta_1)$ and $\sigma^2_X(x; \theta) = \sigma^2_X(x; \theta_2)$ only depend on the first and second component, respectively. One could hope to be able to estimate $\theta_1$ and $\theta_2$ separately in this case. For known $U$, this is indeed possible. We could, for example, use least-squares methods similar to Kanaya and Kristensen (2018) where $\theta_1$ and $\theta_2$, respectively, are estimated by the minimizers of the following two least-squares objectives,

$$L^{(\mu)}_{n,\Delta}(\theta_1; U) = \sum_{i=1}^{n} w^{(\mu)}_i\left(U(Y_{i\Delta}) - U(Y_{(i-1)\Delta}) - \mu_X(U(Y_{(i-1)\Delta}); \theta_1)\Delta\right)^2, \tag{4.9}$$

$$L^{(\sigma)}_{n,\Delta}(\theta_2; U) = \sum_{i=1}^{n} w^{(\sigma)}_i\left(\left\{U(Y_{i\Delta}) - U(Y_{(i-1)\Delta})\right\}^2 - \sigma^2_X(U(Y_{(i-1)\Delta}); \theta_2)\Delta\right)^2, \tag{4.10}$$

where $w^{(\mu)}_i = w^{(\mu)}(Y_{(i-1)\Delta}, Y_{i\Delta})$ and $w^{(\sigma)}_i = w^{(\sigma)}(Y_{(i-1)\Delta}, Y_{i\Delta})$ are weighting functions.

This approach, however, faces two complications in our setting: First, after applying any of the three normalizations presented in Section 3 in order to achieve identification, the resulting drift and diffusion of the UPD tend to share parameters. Second, $U$ is unknown and has to be estimated together with $\theta$. In the case of PMLE, $\tilde U(y; \theta)$ in eq. (4.3) generally depends on both $\theta_1$ and $\theta_2$ since $f_X(x; \theta)$ does.
Thus, if we replace $U$ by $\tilde U(y; \theta)$ in the above objectives, we cannot separately estimate $\theta_1$ and $\theta_2$. Similarly, the SMLE requires joint estimation of $U$ together with $\theta$, in which case it would have to be re-estimated for each of the two objectives. In conclusion, these least-squares estimators are rarely useful in practice.

Another alternative approach, inspired by Bandi and Phillips (2007), see also Kristensen (2011), would be to first obtain nonparametric estimates of $\mu_Y$ and $\sigma^2_Y$ and then match these with the ones implied by the copula model,

$$Q^{(\mu)}_{n,\Delta}(S) = \sum_{i=1}^{n} w^{(\mu)}_i\left(\hat\mu_Y(Y_{i\Delta}) - \mu_Y(Y_{i\Delta}; S)\right)^2, \qquad Q^{(\sigma)}_{n,\Delta}(S) = \sum_{i=1}^{n} w^{(\sigma)}_i\left(\hat\sigma^2_Y(Y_{i\Delta}) - \sigma^2_Y(Y_{i\Delta}; S)\right)^2,$$

where $\hat\mu_Y(\cdot)$ and $\hat\sigma^2_Y(\cdot)$ are the first-step nonparametric estimators; see Bandi and Phillips (2007) for their precise forms. This procedure suffers from the same issue as the least-squares one described in the previous paragraph. An additional complication is that it involves multiple smoothing parameters: First, $\hat\mu_Y(\cdot)$ and $\hat\sigma^2_Y(\cdot)$ depend on two bandwidths and converge at slow rates and, second, $\mu_Y(\cdot; S)$ and $\sigma^2_Y(\cdot; S)$ involve derivatives of $U$, so if we replace $U$ by its kernel-smoothed estimator, $\hat U$, the two objective functions will depend on the first and second order derivatives of the kernel density estimator of $f_Y$, which in turn depends on an additional bandwidth. Altogether, these estimators will be complicated to implement due to the multiple bandwidths that the econometrician has to choose. Moreover, their asymptotic analysis and behaviour will be non-standard.

We here establish an asymptotic theory for the proposed estimators in the case of low-frequency data ($\Delta > 0$). [...]

Assumption 4.1. $S$ is identified.

The previous section provided three different sets of primitive conditions for Assumption 4.1 to hold in terms of $(\mu_Y(\cdot; S), \sigma^2_Y(\cdot; S))$.
This combined with Assumption 3.1 then implies that the mapping $(\mu_Y(\cdot; S), \sigma^2_Y(\cdot; S)) \mapsto p_Y(y \mid y_0; S)$ is injective so that different drift and diffusion terms lead to different transition densities. One implication of Assumptions 3.1 and 4.1 is $E[\log p_Y(Y_\Delta \mid Y_0; S)] < E[\log p_Y(Y_\Delta \mid Y_0; S_0)]$ for any $S \neq S_0$, c.f. Newey and McFadden (1994, Lemma 2.2). This ensures that the SMLE identifies $S_0$ in the limit. Regarding the PMLE, we note that it replaces $U$ by $\hat U(y; \theta) = F_X^{-1}(\hat F_Y(y); \theta)$. By the LLN of stationary and ergodic sequences, $\hat U(y; \theta) \to_P U(y; \theta) = F_X^{-1}(F_Y(y); \theta)$, where, by the same arguments as before, $E[\log p_Y(Y_\Delta \mid Y_0; \theta, U(\cdot; \theta))] < E[\log p_Y(Y_\Delta \mid Y_0; \theta_0, U(\cdot; \theta_0))]$. Thus, the PMLE will also in the limit identify $\theta_0$.

Next, we import conditions from Chen et al. (2010) guaranteeing, in conjunction with our own Assumptions 2.1-2.2, that the UPD $X$, and thereby $Y$, is stationary and $\beta$-mixing with mixing coefficients decaying at either polynomial rate (c.f. Corollary 5.5 in Chen et al., 2010) or geometric rate (c.f. Corollary 4.2 in Chen et al., 2010):

Assumption 4.2. (i) $\mu_X$ and $\sigma^2_X$ satisfy

$$\lim_{x \to x_r}\left\{\frac{\mu_X(x; \theta)}{\sigma_X(x; \theta)} - \frac{1}{2}\frac{\partial \sigma_X(x; \theta)}{\partial x}\right\} \leq 0, \qquad \lim_{x \to x_u}\left\{\frac{\mu_X(x; \theta)}{\sigma_X(x; \theta)} - \frac{1}{2}\frac{\partial \sigma_X(x; \theta)}{\partial x}\right\} \geq 0;$$

[...] with $s(x; \theta)$ and $S(x; \theta)$ defined in (2.4),

$$\lim_{x \to x_r}\left\{\frac{s(x; \theta)\,\sigma_X(x; \theta)}{S(x; \theta)}\right\} > 0, \qquad \lim_{x \to x_u}\left\{\frac{s(x; \theta)\,\sigma_X(x; \theta)}{S(x; \theta)}\right\} < 0$$

[...] $X$.

Finally, we impose the same conditions as used in the asymptotic analysis of the PMLE in Chen and Fan (2006) and Chen, Wu and Yi (2009), respectively, on the copula implied by the chosen UPD and the sieve density in the case of SMLE:

Assumption 4.3.
(i) $c_X(u_0, u; \theta)$ defined in (2.17) satisfies the regularity conditions set out in Chen and Fan (2006, A1-A3, A4 or A4', A5-A6); (ii) $c_X(u_0, u; \theta)$ and the sieve space $\mathcal{F}_m$ satisfy Assumptions 3.1-3.4 and 4.1–4.7, respectively, in Chen, Wu and Yi (2009).

We here abstain from stating the precise, mostly technical, conditions and refer the interested reader to Chen and Fan (2006) and Chen, Wu and Yi (2009); broadly speaking their conditions translate into moment bounds and smoothness conditions on the log-transition density of the UPD. These conditions depend on the precise choice of the UPD and so will have to be verified on a case-by-case basis. In Appendix B, we verify the conditions for models in Examples 1–2.

The following result now follows from the general theory of Chen and Fan (2006) and Chen, Wu and Yi (2009), respectively:

Theorem 5.1

Under Assumptions 2.1-2.2, 4.1, 4.2(i) and 4.3(i),

$$\sqrt{n}(\hat\theta_{PMLE} - \theta_0) \to_d N\left(0, B^{-1}\Sigma B^{-1}\right),$$

where $B$ and $\Sigma$ are defined in Chen and Fan (2006, A1 and $A_n^*$). Under Assumptions 2.1-2.2, 4.1, 4.2(ii) and 4.3(ii),

$$\sqrt{n}(\hat\theta_{SMLE} - \theta_0) \to_d N\left(0, I_*^{-1}(\theta_0)\right),$$

where $I_*$ is defined in Chen, Wu and Yi (2009).

Consistent estimators of the asymptotic variances, $B^{-1}\Sigma B^{-1}$ and $I_*^{-1}(\theta_0)$, can be found in Chen and Fan (2006) and Chen, Wu and Yi (2009), respectively.

Next, we discuss the asymptotic properties of the PMLE based on the high-frequency log-likelihood that takes as input $\hat p_X(x \mid x_0; \theta)$ defined in eq. (4.8); a complete analysis of the PMLE and SMLE in a high-frequency setting is left for future research. In the following, we let $T := n\Delta$ denote the sampling range, which will be assumed to diverge as $\Delta \to 0$. [...]

$$\hat\theta_{PMLE} = \arg\max_{\theta \in \Theta}\hat L_n(\theta, \tilde U(\cdot; \theta)),$$

where

$$\hat L_n(\theta, U) = \frac{1}{n}\sum_{i=1}^{n}\left\{\log \hat p_X\left(U(Y_{i\Delta}) \mid U(Y_{(i-1)\Delta}); \theta\right) - \log f_X(U(Y_{i\Delta}); \theta)\right\},$$

and $\tilde U(Y_{i\Delta}; \theta)$ is defined in (4.3). We first specialize the general result of Kanaya (2018, Theorem 2) by choosing $B = \psi = 1$ and $K_h(y) = I\{y \leq 0\}$ in his notation to obtain that under our Assumption 4.2,

$$\sup_{y \in \mathcal{Y}}\left|\tilde F_Y(y) - F_Y(y)\right| = O_P\left(\sqrt{\Delta}/\log \Delta\right) + O_P\left(\log T/\sqrt{T}\right), \tag{5.1}$$

where the two terms on the right-hand side correspond to discretization bias and sampling variance, respectively. By letting $T$ grow sufficiently fast as $\Delta \to 0$, the first term can be ignored. Under regularity conditions on $\mu_X$ and $\sigma^2_X$ so that $(y_0, y) \mapsto \hat p_X\left(F_X^{-1}(y; \theta) \mid F_Y^{-1}(y_0); \theta\right)/f_Y(y)$ satisfies Lipschitz conditions similar to the ones in Chen and Fan (2006), we then obtain

$$\sup_{\theta \in \Theta}\left|\hat L_n(\theta, \tilde U(\cdot; \theta)) - \hat L_n(\theta, U(\cdot; \theta))\right| = O_P\left(\sqrt{\Delta}/\log \Delta\right) + O_P\left(\log T/\sqrt{T}\right),$$

where $U(y; \theta) = F_X^{-1}(F_Y(y); \theta)$. Consistency of the PMLE now follows by extending the arguments of Kessler (1997) to allow for the presence of the parameter-dependent transformation $U(y; \theta)$.

Next, to simplify our discussion of the asymptotic distribution of the PMLE, we consider two special cases:

First, suppose that, after suitable normalizations, $\sigma^2_X(x)$ is known and only $\mu_X(x; \theta)$ is parameter dependent. In this case, we expect that Kessler's results generalize so that $\hat\theta_{PMLE}$ will converge with $\sqrt{T}$-rate towards a Normal distribution, where the asymptotic variance will have to be adjusted to take into account the first-step estimation of $\hat F_Y$.

Next, consider the opposite scenario, where $\mu_X(x)$ is known and only $\sigma^2_X(x; \theta)$ is parameter dependent. With $U$ known, Kessler (1997) shows that $\hat\theta_{PMLE}$ converges with $\sqrt{n}$-rate towards a Normal distribution in this case. Note the faster convergence rate compared to the drift estimator. However, in our setting $U(y; \theta)$ is parameter dependent, and as a consequence this result appears to no longer apply: $U(y; \theta)$ enters $\hat L_n(\theta, U)$ in the same way that $\mu_X$ does and so the score of $\hat L_n(\theta, U(\cdot; \theta))$ will have a component of the same form as in the first case and so will converge with $\sqrt{T}$-rate instead of $\sqrt{n}$-rate. Moreover, the presence of the first-step estimator $\tilde F_Y(y)$, which also converges with $\sqrt{T}$-rate, will generate an additional variance term.
In total, estimators of diffusion parameters appear not to enjoy "super" consistency in our setting due to the way that the unknown transformation $U$ enters the likelihood.

We here analyze the asymptotic properties of the kernel-based estimators of $\mu_Y$ and $\sigma^2_Y$ given in eqs. (4.6)-(4.7). We only do so for the low-frequency case; the analysis of the high-frequency case should proceed in a similar fashion. Our analysis takes as starting point the following regularity conditions on the estimator of the parametric component and the kernel function:

Assumption 4.4.

The transformation function $V$ is four times continuously differentiable.

Assumption 4.5.

The estimator $\hat\theta$ of the parameter of the UPD $X$ is $\sqrt{n}$-consistent.

Assumption 4.6.

The kernel $K$ is differentiable, and there exist constants $D, \omega > 0$ such that

$$\left|K^{(i)}(z)\right| \leq D|z|^{-\omega}, \qquad \left|K^{(i)}(z) - K^{(i)}(\tilde z)\right| \leq D|z - \tilde z|, \qquad i = 0, 1,$$

where $K^{(i)}(z)$ denotes the $i$th derivative of $K(z)$. Moreover, $\int_{\mathbb{R}} K(z)\,dz = 1$, $\int_{\mathbb{R}} zK(z)\,dz = 0$ and $\kappa_2 = \int_{\mathbb{R}} z^2K(z)\,dz < \infty$.

Assumption 4.4 ensures the existence of the 3rd and 4th derivatives of $U(y)$, which in turn ensures that relevant quantities entering the asymptotic distributions of $\hat\mu_Y$ and $\hat\sigma^2_Y$ are well defined. Assumption 4.5 implies that the asymptotic properties of $\hat\mu_Y$ and $\hat\sigma^2_Y$ are determined by the properties of the kernel density estimator alone. The proposed PMLE and SMLE satisfy this condition under our Assumptions 4.1-4.3, but other $\sqrt{n}$-consistent estimators are allowed for. Assumption 4.6 regulates the kernel functions and allows for most standard kernels such as the Gaussian and the Uniform kernels. Using the functional delta-method together with standard results for kernel density estimators, as found in Robinson (1983), we obtain:

Theorem 5.2

Under Assumptions 2.1-2.2, 4.2(i), and 4.4-4.6, we have as $n \to \infty$, $h \to 0$ and $nh^3 \to \infty$,

$$\sqrt{nh^3}\left\{\hat\mu_Y(y) - \mu_Y(y) - h^2 B_{\mu_Y}(y)\right\} \to_d N\left(0, V_{\mu_Y}(y)\right),$$

where

$$B_{\mu_Y}(y) = -\kappa_2\frac{\sigma^2_Y(y) f_Y'''(y)}{4 f_Y(y)}, \qquad V_{\mu_Y}(y) = \frac{\sigma^4_Y(y)}{4 f_Y(y)}\int_{\mathbb{R}} K'(z)^2\,dz.$$

Also, as $n \to \infty$, $h \to 0$ and $nh \to \infty$, we have

$$\sqrt{nh}\left\{\hat\sigma^2_Y(y) - \sigma^2_Y(y) - h^2 B_{\sigma^2_Y}(y)\right\} \to_d N\left(0, V_{\sigma^2_Y}(y)\right),$$

where

$$B_{\sigma^2_Y}(y) = -\kappa_2\frac{\sigma^2_Y(y) f_Y''(y)}{f_Y(y)}, \qquad V_{\sigma^2_Y}(y) = \frac{4\sigma^4_Y(y)}{f_Y(y)}\int_{\mathbb{R}} K(z)^2\,dz.$$

We see that both estimators suffer from smoothing biases, $B_{\mu_Y}(y)$ and $B_{\sigma^2_Y}(y)$. If $h \to$ [...]

In this section, we compare the finite sample performance of our low-frequency semiparametric PMLE with that of a fully parametric PMLE (described below) through Monte Carlo simulations.
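To fix ideas, a single Monte Carlo replication for the semiparametric PMLE can be sketched as follows. The sketch assumes an OU-type UPD normalized to have $N(0, 1)$ marginals, $dX_t = -\kappa X_t\,dt + \sqrt{2\kappa}\,dW_t$, for which the transition density is Gaussian with mean $e^{-\kappa\Delta}x_0$ and variance $1 - e^{-2\kappa\Delta}$. The transformation $V(x) = \exp(x)$, the values of $\kappa$, $\Delta$ and $n$, the function names, and the rescaling of the empirical cdf by ranks divided by the number of observations plus one (which keeps it strictly inside $(0, 1)$) are all illustrative assumptions; this is not the simulation design or code used in the paper.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm, rankdata


def simulate_ou(kappa, delta, n, rng):
    """Exact draws of dX = -kappa*X dt + sqrt(2*kappa) dW at spacing delta."""
    rho = np.exp(-kappa * delta)
    x = np.empty(n + 1)
    x[0] = rng.standard_normal()  # start in the stationary N(0, 1) law
    shocks = np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
    for i in range(n):
        x[i + 1] = rho * x[i] + shocks[i]
    return x


def pmle_kappa(y, delta):
    """Two-step PMLE of kappa: rank-based estimate of U, then pseudo-likelihood."""
    # Step 1: U_tilde = Phi^{-1}(F_tilde_Y), with the rescaled empirical cdf.
    u = norm.ppf(rankdata(y) / (len(y) + 1.0))
    u_prev, u_curr = u[:-1], u[1:]

    # Step 2: maximize mean{log p_X(U_i | U_{i-1}) - log f_X(U_i)} over kappa.
    def neg_objective(kappa):
        rho = np.exp(-kappa * delta)
        log_transition = norm.logpdf(
            u_curr, loc=rho * u_prev, scale=np.sqrt(1.0 - rho**2)
        )
        log_marginal = norm.logpdf(u_curr)
        return -np.mean(log_transition - log_marginal)

    result = minimize_scalar(neg_objective, bounds=(1e-3, 10.0), method="bounded")
    return result.x


rng = np.random.default_rng(12345)
kappa_true, delta, n = 1.0, 1.0, 5000  # hypothetical design values
x = simulate_ou(kappa_true, delta, n, rng)
y = np.exp(x)  # hypothetical transformation V(x) = exp(x)
kappa_hat = pmle_kappa(y, delta)
print(f"true kappa = {kappa_true}, PMLE = {kappa_hat:.3f}")
```

Because the first step only uses ranks, the estimate is invariant to the choice of the increasing transformation $V$; repeating the last five lines over many seeds would give a rough picture of the finite-sample distribution of the estimator under these assumptions.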

We consider the following normalized versions of the UPDs of Examples 1–2,
\[
\text{OU}: \quad dX_t = -\kappa X_t\,dt + \sqrt{2\kappa}\,dW_t, \qquad \theta = \kappa, \tag{6.1}
\]
\[
\text{CIR}: \quad dX_t = \kappa(\alpha - X_t)\,dt + \sqrt{2\kappa X_t}\,dW_t, \qquad \theta = (\kappa, \alpha). \tag{6.2}
\]
The chosen normalizations have the advantage that the marginal distributions of $X$ are invariant to the mean-reversion parameter $\kappa$. Hence, by varying $\kappa$, we can change the persistence level of $X$ (and thus $Y$) while keeping the marginal distributions fixed. In this way, we can examine the impact of persistence on the performance of the proposed estimators of $\theta$, $\mu_Y$ and $\sigma^2_Y$.

Next, we specify the transformation of the DGP of $Y$. This is done by choosing a marginal cdf $F_Y(y; \phi)$, where $\phi$ is a hyperparameter governing the shape of the cdf, which induces the transformation $V(X_t; \phi) = F_Y^{-1}(F_X(X_t; \theta); \phi)$. With $f_Y(y; \phi) = F'_Y(y; \phi)$, the transition density of the true DGP of $Y$ then takes the form
\[
p_Y(y \mid y_0; \theta, \phi) = f_Y(y; \phi)\, c_X\left(F_Y(y_0; \phi), F_Y(y; \phi); \theta\right). \tag{6.3}
\]
We choose $F_Y(y; \phi)$ as a flexible distribution to reflect stylized features such as asymmetry and fat-tailedness of observed financial data. Specifically, we use the Skewed Student-$t$ (SKST) distribution of Hansen (1994) with density
\[
f_Y(y; \phi) =
\begin{cases}
\dfrac{bq}{v}\left(1 + \dfrac{1}{\tau - 2}\left(\dfrac{\frac{b}{v}(y - m) + a}{1 - \lambda}\right)^2\right)^{-(\tau+1)/2} & \text{if } y < m - av/b, \\[2ex]
\dfrac{bq}{v}\left(1 + \dfrac{1}{\tau - 2}\left(\dfrac{\frac{b}{v}(y - m) + a}{1 + \lambda}\right)^2\right)^{-(\tau+1)/2} & \text{if } y \ge m - av/b,
\end{cases} \tag{6.4}
\]
where $v > 0$, $2 < \tau < \infty$, $-1 < \lambda < 1$, $a = 4\lambda q\left(\frac{\tau - 2}{\tau - 1}\right)$, $b^2 = 1 + 3\lambda^2 - a^2$ and $q = \Gamma((\tau + 1)/2) / \left(\sqrt{\pi(\tau - 2)}\,\Gamma(\tau/2)\right)$. The hyperparameter is thus $\phi = (m, v, \lambda, \tau)$, which has to be chosen in order to fully specify the DGP. While $m$ and $v$ are the unconditional mean and standard deviation of the distribution, $\lambda$ controls the skewness and $\tau$ controls the degrees of freedom (hence the fat-tailedness) of the distribution. The distribution reduces to the usual Student-$t$ distribution when $\lambda = 0$. Due to its flexibility in modelling skewness and kurtosis, the SKST distribution is often used in financial modelling (cf. Patton, 2004; Jondeau and Rockinger, 2006; Bu, Fredj and Li, 2017).

The transformed diffusion $Y$ generated by the SKST marginal distribution together with the normalized UPD in (6.1) or (6.2) is referred to as the OU-SKST or the CIR-SKST model, respectively. The true data-generating parameters $\phi$ and $\theta$ are chosen as estimates obtained from fitting the parametric versions of the two models to the 7-day Eurodollar interest rate time series used in Aït-Sahalia (1996b). The estimation is based on a fully parametric two-stage PMLE. In the first stage, the SKST distribution is fitted to the data (as if they were i.i.d.) to obtain $\hat\phi$. We then substitute $F_Y(y; \hat\phi)$ and $f_Y(y; \hat\phi)$ into (6.3), which is then maximized with respect to $\theta$ to obtain $\hat\theta$ for each of the two UPDs. The calibrated parameter values of the marginal SKST distribution are $(\hat m, \hat v, \hat\lambda, \hat\tau) = (0.\ldots, \ldots, \ldots, \ldots)$ … $\hat\kappa = 1.\ldots$ … $(\hat\kappa, \hat\alpha) = (0.\ldots, \ldots)$ … $n = 2202$ and $n = 5505$, respectively, are then generated using $\phi = \hat\phi$ and $\theta = \hat\theta$ as our true data-generating parameters. For both OU-SKST and CIR-SKST, $\theta$ involves the mean-reversion parameter $\kappa$, which controls the level of persistence. We create 3 additional scenarios by multiplying $\kappa$ by factors of 5, 10, and 20 while keeping everything else unchanged. Collectively, we have a total of 8 cases corresponding to 2 sample sizes and 4 persistence levels. The maximum factor 20 is chosen because the implied 1st-order autocorrelation coefficient $\rho \approx \ldots$
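To make the SKST density in (6.4) concrete, the following minimal Python sketch evaluates it and checks numerically that it integrates to one with mean $m$ and variance $v^2$. The function names and the parameter values used in the check are hypothetical and chosen for illustration only; they are not the calibrated values from the paper.

```python
import math

def skst_constants(lam, tau):
    """Constants q, a, b of the skewed Student-t density (Hansen, 1994)."""
    q = math.gamma((tau + 1) / 2) / (math.sqrt(math.pi * (tau - 2)) * math.gamma(tau / 2))
    a = 4 * lam * q * (tau - 2) / (tau - 1)
    b = math.sqrt(1 + 3 * lam ** 2 - a ** 2)
    return q, a, b

def skst_pdf(y, m, v, lam, tau):
    """Density (6.4): mean m, standard deviation v, skewness lam, degrees of freedom tau."""
    q, a, b = skst_constants(lam, tau)
    z = b * (y - m) / v + a
    # The left branch is scaled by (1 - lam), the right branch by (1 + lam).
    s = (1 - lam) if y < m - a * v / b else (1 + lam)
    return (b * q / v) * (1 + (z / s) ** 2 / (tau - 2)) ** (-(tau + 1) / 2)

def integrate(f, lo, hi, n=60_000):
    """Plain trapezoidal rule; adequate for a sanity check, not for production use."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi)) + sum(f(lo + i * h) for i in range(1, n))
    return total * h

if __name__ == "__main__":
    m, v, lam, tau = 1.0, 2.0, 0.3, 5.0   # illustrative values only
    lo, hi = m - 150 * v, m + 150 * v
    print("mass:", integrate(lambda y: skst_pdf(y, m, v, lam, tau), lo, hi))
    print("mean:", integrate(lambda y: y * skst_pdf(y, m, v, lam, tau), lo, hi))
```

Setting `lam = 0` in this sketch gives a symmetric density, in line with the remark that the SKST reduces to the Student-$t$ in that case.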
We compare our low-frequency PMLE of $\theta$ with the corresponding fully parametric PMLE (PPMLE) described above that we used for our calibration. Note that the only difference between the two estimators is that the former estimates the marginal distribution $F_Y$ nonparametrically, while the latter estimates it parametrically.

The relative bias and RMSE (defined as the ratios of the actual bias and the actual RMSE over the true parameter value, respectively) of the estimators of the parametric components of the OU-SKST case are presented in Table 1. Overall, the results from the two estimation methods are comparable and of the same order of magnitude. The semiparametric PMLE tends to do better in terms of bias, while the parametric PMLE dominates in terms of variance. However, as the level of persistence decreases, the performance of the two estimators becomes close to identical.

[Table 1]

The results for the CIR-SKST case are presented in Tables 2 and 3 and are qualitatively very similar to the ones for the OU-SKST case. Overall, the performance of the PMLE is comparable with that of the PPMLE, with very similar estimation errors. Moreover, the gap in the performance of the PMLE relative to the PPMLE appears to narrow when the true DGP gets less persistent.

[Table 2 and 3]

Next, we investigate the performance of the semiparametric estimators of $\mu_Y$ and $\sigma^2_Y$ in eqs. (4.6)-(4.7) relative to their fully parametric counterparts. In Figure 2, we plot their pointwise means and 95% confidence bands from the 500 estimates against the truth for the OU-SKST process with $\kappa = 22.753$ and sample size 2202. First, it is worth noting that $\mu_Y$ and $\sigma^2_Y$ exhibit strong nonlinearities that closely resemble the nonlinearities depicted in, for example, Aït-Sahalia (1996b), Jiang and Knight (1997), and Stanton (1997). Second, the mean estimates from both estimation methods are fairly close to the truth, but the variability of the semiparametric estimators is noticeably larger than that of the parametric ones, especially at the right end of the range. This is not surprising. First, as shown in Theorem 5.2, $\hat\mu_Y$ and $\hat\sigma^2_Y$ converge at a slower than $\sqrt{n}$-rate due to the use of kernel estimators of $f_Y$. Second, from Figure 1 we can see that $f_Y$ has a long right tail, which is difficult to estimate by the kernel estimator in small and moderate samples. Figure 3 presents the same estimators at sample size 5505. At this larger sample size, the bias is even smaller for both methods and the variability of the estimates is also reduced significantly. Overall, although the parametric method obviously has the advantage due to its parametric structure, our semiparametric method also provides fairly satisfactory estimation results.

[Figure 2 and 3]

The drift and diffusion estimators from the two methods where the true DGP is the CIR-SKST process with $\kappa = 15.307$ and the two sample sizes are presented in Figures 4 and 5, respectively. Almost identical qualitative conclusions can be reached.

[Figure 4 and 5]
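The persistence experiment relies on the marginal law of the normalized OU process in (6.1) not depending on $\kappa$. The sketch below illustrates this with an exact discretisation. It assumes the diffusion coefficient is $\sqrt{2\kappa}$, so that the stationary law is $N(0, 1)$ and the lag-one autocorrelation is $e^{-\kappa\Delta}$. The function names and the sampling interval `delta = 0.1` are hypothetical choices for illustration.

```python
import math
import random

def simulate_normalized_ou(n, kappa, delta, seed=0):
    """Exact simulation of dX = -kappa*X dt + sqrt(2*kappa) dW on a grid with step delta."""
    rng = random.Random(seed)
    rho = math.exp(-kappa * delta)       # lag-one autocorrelation
    sd = math.sqrt(1.0 - rho * rho)      # conditional standard deviation
    x = rng.gauss(0.0, 1.0)              # start from the N(0, 1) stationary law
    path = [x]
    for _ in range(n):
        x = rho * x + sd * rng.gauss(0.0, 1.0)
        path.append(x)
    return path

def sample_var_and_acf1(path):
    """Sample variance and first-order autocorrelation of a path."""
    n = len(path)
    mean = sum(path) / n
    var = sum((p - mean) ** 2 for p in path) / n
    cov = sum((path[i] - mean) * (path[i + 1] - mean) for i in range(n - 1)) / n
    return var, cov / var

if __name__ == "__main__":
    for kappa in (1.0, 5.0):
        var, acf1 = sample_var_and_acf1(simulate_normalized_ou(50_000, kappa, 0.1, seed=1))
        print(kappa, round(var, 3), round(acf1, 3), round(math.exp(-kappa * 0.1), 3))
```

Raising `kappa` lowers the autocorrelation while the sample variance stays near one, which is the mechanism used to vary persistence with fixed marginals.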

As an empirical illustration, we here model the time series dynamics of the CBOE Volatility Index data using copula diffusion models. The data consist of the daily VIX index from January 2, 1990 to July 19, 2019 (7445 observations). They are displayed and summarized in Figure 6 and Table 4, respectively. The time series plot shows a clear pattern of mean reversion, and Augmented Dickey-Fuller tests with reasonable lags all rejected the unit root hypothesis at the 5% significance level, which justifies the use of stationary diffusion models. The mean and the standard deviation of the VIX are 19.21 and 7.76, respectively. Meanwhile, the skewness and the kurtosis are 2.12 and 10.85, respectively, suggesting that the stationary distribution deviates quite substantially from normality. This is more formally confirmed by the highly significant Jarque-Bera test statistic with a negligible $p$-value.

[Figure 6 and Table 4]

We focus on whether two well known parametric transformed diffusion models proposed for modelling VIX are supported by the data against their semiparametric alternatives. The two parametric models are the transformed-OU model of Detemple and Osakwe (2000) (DO) and the transformed-CIR model of Eraker and Wang (2015) (EW). Specifically, the DO model is the exponential transform of the OU process, which can be written as
\[
Y_t = \exp(X_t), \qquad dX_t = \kappa(\alpha - X_t)\,dt + \sigma\,dW_t,
\]
and the EW model is a parameter-dependent transformation of the CIR process, which is given by
\[
Y_t = \frac{1}{X_t + \delta} + \varrho, \qquad dX_t = \kappa(\alpha - X_t)\,dt + \sigma\sqrt{X_t}\,dW_t.
\]
Meanwhile, the two semiparametric models we consider are the same two models considered in our simulations, namely, the nonparametrically transformed OU and CIR models, which we denote as NPTOU and NPTCIR, respectively. Their associated normalized UPD processes are given in (6.1) and (6.2).

Importantly, we maintain the assumption that the VIX is a Markov diffusion process. In particular, we rule out jumps and stochastic volatility (SV) in the VIX, which is inconsistent with the empirical findings of, e.g., Kaeck and Alexander (2013). However, their models are fully parametric and so impose much stronger functional form restrictions on the drift and diffusion components compared to our semiparametric approach. Specifically, jumps and SV components are often used to capture extremal events (fat tails). It is possible that these components are needed in explaining the VIX dynamics due to the restrictive drift and diffusion specifications they consider. Our semiparametric approach allows for more flexibility in this respect and so can be seen as a competing approach to capturing the same features in data. An interesting research topic would be to develop tools that allow for formal statistical comparison of our class of models against these alternative ones.
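As a worked illustration of how a parametric transformation shapes the dynamics of the observed process, consider the DO model, $Y_t = \exp(X_t)$, so that $U(y) = \log y$. The display below is a sketch that applies the usual transformation formulas for the drift and diffusion of $Y$ (equivalently, Itô's lemma); it is not a result stated in this form in the text.

```latex
\begin{align*}
U(y) &= \log y, \qquad U'(y) = \frac{1}{y}, \qquad U''(y) = -\frac{1}{y^{2}}, \\
\sigma_Y(y) &= \frac{\sigma_X(U(y))}{U'(y)} = \sigma y, \\
\mu_Y(y) &= \frac{\mu_X(U(y))}{U'(y)} - \frac{1}{2}\,\sigma_X^{2}(U(y))\,\frac{U''(y)}{U'(y)^{3}}
          = \kappa\left(\alpha - \log y\right) y + \frac{1}{2}\,\sigma^{2} y .
\end{align*}
```

The drift of $Y$ is therefore nonlinear in $y$ even though the drift of the underlying OU process is linear, and the diffusion is proportional to the level.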

For each of the two UPDs, we examine whether the parametric specification of the transformation is supported by the data. We do this by testing each of the parametric models against the semiparametric alternative where the transformation is left unspecified. The test is based on a pseudo Likelihood Ratio (pseudo-LR) test statistic defined as the difference between the pseudo log-likelihood (pseudo-LL) of the semiparametric model and the log-likelihood (LL) of the parametric model. Since the model under the alternative is semiparametric and estimated by pseudo-ML, the pseudo-LR test statistic will not follow a $\chi^2$-distribution. We therefore resort to a parametric bootstrap procedure: for each of the two pseudo-LR tests, we simulate 1000 new time series from the parametric model, using as data-generating parameter values the MLEs obtained from the original sample. For each of the 1000 new data sets, of the same size as the original one, we estimate both the parametric model and the semiparametric model and compute the corresponding pseudo-LR statistic. Finally, we use the 95th and 99th quantiles from the simulated distribution of the pseudo-LR statistic as our 5% and 1% bootstrap critical values, respectively.

The pseudo-LL is computed using the log-likelihood given in (4.1) with $U(y)$ and $\log U'(y; \theta)$ replaced by $\tilde U(y; \theta)$ given in (4.3) and $\log \tilde U'(y; \theta) = \log \hat f_Y(y) - \log f_X(\tilde U(y); \theta)$, respectively. Here, $\hat f_Y(y)$ is the kernel density estimator, which requires choosing a bandwidth. There is a lack of consensus on the right procedure for choosing bandwidths for kernel estimators using dependent data. We therefore considered a sequence of bandwidths constructed by multiplying Silverman's rule-of-thumb bandwidth, denoted as $h_S$, by a factor $k$ between 0.75 and 1.75 on a small grid. Visual inspection of these density estimates revealed that with $k$ around 1.5, the resulting density appears to be the most satisfactory in terms of smoothness and the revelation of distributional features of the data. For this reason, we report our inferential results based on the relatively optimal bandwidth $1.5 h_S = 2.\ldots$ … $\kappa$ is estimated for the NPTOU model and only $\kappa$ and $\alpha$ for the NPTCIR model. In addition, while $\kappa$ has the same interpretation (i.e. rate of mean reversion) and scale in all four models, $\alpha$ has different scales in the two transformed CIR models. For both the transformed OU and the transformed CIR classes of models, we can see that the PMLEs of the mean-reversion parameter $\hat\kappa$ are slightly lower than their corresponding MLE estimates. The same difference applies to their standard errors. This shows that parametric (mis-)specification of the stationary distribution does have a quite significant impact on the estimation of the dynamic parameters.

[Table 5]

The lower panel presents the LL values and our pseudo-LR test results. We can see that the EW model has a much higher LL (…) … $p$-values of our pseudo-LR tests, obtained from our bootstrap procedure described above. For both tests, we observe that those critical values are all negative and the $p$-values are both exactly zero. This means that the original pseudo-LRs of 290. …

We propose a novel semiparametric approach for modelling stationary nonlinear univariate diffusions. The class of models can be thought of as Markov copula models where the copula is implied by the UPD model. Primitive conditions for the identification of the UPD parameters together with the unknown transformations from discrete samples are provided. We derive the asymptotic properties for our semiparametric likelihood-based estimators of the UPD parameters and kernel-based drift and diffusion estimators.
Our simulation results suggest that our semiparametric method performs well in finite samples compared to the fully parametric method, and our relatively simple application shows that the parametric assumptions on the transformation function of the well known DO model and EW model are rejected by the data against our nonparametric alternatives. Potential future work under this framework may include extensions to multivariate diffusions and jump-diffusions.

References

Ahn, D.-H., Gao, B., 1999. A parametric nonlinear model of term structure dynamics. Review of Financial Studies 12, 721-762.

Aït-Sahalia, Y., 1996a. Nonparametric pricing of interest rate derivatives. Econometrica 64, 527-560.

Aït-Sahalia, Y., 1996b. Testing continuous-time models of the spot interest rate. Review of Financial Studies 9, 385-426.

Aït-Sahalia, Y., 2002. Maximum likelihood estimation of discretely sampled diffusions: a closed-form approximation approach. Econometrica 70, 223-262.

Bandi, F.M., 2002. Short-term interest rate dynamics: A spatial approach. Journal of Financial Economics 65, 73-110.

Bandi, F.M., Phillips, P.C.B., 2003. Fully nonparametric estimation of scalar diffusion models. Econometrica 71, 241-283.

Beare, B.K., 2010. Copulas and temporal dependence. Econometrica 78, 395-410.

Bladt, M., Sørensen, M., 2014. Simple simulation of diffusion bridges with application to likelihood inference for diffusions. Bernoulli 20, 645-675.

Bu, R., Cheng, J., Hadri, K., 2017. Specification analysis in regime-switching continuous-time diffusion models for market volatility. Studies in Nonlinear Dynamics and Econometrics 21(1), 65-80.

Bu, R., Fredj, J., Li, Y., 2017. An empirical comparison of transformed diffusion models for VIX and VIX futures. Journal of International Financial Markets, Institutions and Money 46, 116-127.

Bu, R., Giet, L., Hadri, K., Lubrano, M., 2011. Modelling multivariate interest rates using time-varying copulas and reducible non-linear stochastic differential equations. Journal of Financial Econometrics 9(1), 198-236.

Chen, X., Fan, Y., 2006. Estimation of copula-based semiparametric time series models. Journal of Econometrics 130, 307-335.

Chen, X., Hansen, L.P., Scheinkman, J., 2009. Nonlinear principal components and long run implications of multivariate diffusions. Annals of Statistics 37, 4279-4312.

Chen, X., Hansen, L.P., Carrasco, M., 2010. Nonlinearity and temporal dependence. Journal of Econometrics 155, 155-169.

Chen, X., Wu, W.B., Yi, Y., 2009. Efficient estimation of copula-based semiparametric Markov models. Annals of Statistics 37, 4214-4253.

Choi, S., 2009. Regime-switching univariate diffusion models of the short-term interest rate. Studies in Nonlinear Dynamics and Econometrics 13(1), Article 4.

Conley, T., Hansen, L., Luttmer, E., Scheinkman, J., 1997. Short-term interest rates as subordinated diffusions. Review of Financial Studies 10, 525-577.

Cox, J., Ingersoll, J., Ross, S., 1985. An intertemporal general equilibrium model of asset prices. Econometrica 53, 363-384.

Detemple, J., Osakwe, C., 2000. The valuation of volatility options. European Finance Review 4, 21-50.

Eraker, B., Wang, J., 2015. A non-linear dynamic model of the variance risk premium. Journal of Econometrics 187, 547-556.

Forman, J.L., Sørensen, M., 2014. A transformation approach to modelling multi-modal diffusions. Journal of Statistical Planning and Inference 146, 56-69.

Gobet, E., Hoffmann, M., Reiß, M., 2004. Nonparametric estimation of scalar diffusions based on low frequency data. Annals of Statistics 32, 2223-2253.

Hansen, B., 1994. Autoregressive conditional density estimation. International Economic Review 35, 705-730.

Hansen, L.P., Scheinkman, J., Touzi, N., 1998. Spectral methods for identifying scalar diffusions. Journal of Econometrics 86, 1-32.

Jiang, G., Knight, J., 1997. A nonparametric approach to the estimation of diffusion processes with an application to a short-term interest rate model. Econometric Theory 13, 615-645.

Joe, H., 1997. Multivariate Models and Dependence Concepts. Chapman & Hall, London.

Jondeau, E., Rockinger, M., 2006. The copula-GARCH model of conditional dependencies - an international stock application. Journal of International Money and Finance 25, 827-853.

Kaeck, A., Alexander, C., 2013. Continuous-time VIX dynamics: On the role of stochastic volatility of volatility. International Review of Financial Analysis 28, 46-56.

Kanaya, S., Uniform convergence rates of kernel-based nonparametric estimators for continuous time diffusion processes: A damping function approach. Econometric Theory 33, 874-914.

Kanaya, S., Kristensen, D., 2016. Estimation of stochastic volatility models by nonparametric filtering. Econometric Theory 32, 861-916.

Karatzas, I., Shreve, S., 1991. Brownian Motion and Stochastic Calculus, …

Proofs

Proof of Theorem 3.2.

From eqs. (3.2)-(3.5), it is obvious that (3.6)-(3.7) imply $S \sim \tilde S$. Now, suppose that $S \sim \tilde S$; this implies that $\mu_Y(y; S) = \mu_Y(y; \tilde S)$ and $\sigma^2_Y(y; S) = \sigma^2_Y(y; \tilde S)$, where $\mu_Y$ and $\sigma^2_Y$ are given in eqs. (2.6)-(2.7). That is, for all $y \in \mathcal{Y}$,
\[
\frac{\mu_X(U(y); \theta)}{U'(y)} - \frac{1}{2}\sigma^2_X(U(y); \theta)\frac{U''(y)}{U'(y)^3} = \frac{\mu_X(\tilde U(y); \tilde\theta)}{\tilde U'(y)} - \frac{1}{2}\sigma^2_X(\tilde U(y); \tilde\theta)\frac{\tilde U''(y)}{\tilde U'(y)^3},
\]
\[
\frac{\sigma^2_X(U(y); \theta)}{U'(y)^2} = \frac{\sigma^2_X(\tilde U(y); \tilde\theta)}{\tilde U'(y)^2}.
\]
Since $V$ is one-to-one we can set $y = V(x)$ in the above to obtain the following for all $x \in \mathcal{X}$,
\[
\frac{\mu_X(U(V(x)); \theta)}{U'(V(x))} - \frac{1}{2}\sigma^2_X(U(V(x)); \theta)\frac{U''(V(x))}{U'(V(x))^3}
= \frac{\mu_X(\tilde U(V(x)); \tilde\theta)}{\tilde U'(V(x))} - \frac{1}{2}\sigma^2_X(\tilde U(V(x)); \tilde\theta)\frac{\tilde U''(V(x))}{\tilde U'(V(x))^3}, \tag{A.1}
\]
\[
\frac{\sigma^2_X(U(V(x)); \theta)}{U'(V(x))^2} = \frac{\sigma^2_X(\tilde U(V(x)); \tilde\theta)}{\tilde U'(V(x))^2}. \tag{A.2}
\]
Define $T(x) = \tilde U(V(x)) \Leftrightarrow T^{-1}(x) = U(\tilde V(x))$, and observe that
\[
U(V(x)) = x, \qquad U'(V(x))V'(x) = 1, \qquad \frac{\partial T(x)}{\partial x} = \tilde U'(V(x))V'(x).
\]
Eq. (A.2) combined with the above implies (3.7)(ii),
\[
\sigma^2_X(x; \theta) = \frac{\sigma^2_X(U(V(x)); \theta)}{U'(V(x))^2 V'(x)^2} = \frac{\sigma^2_X(\tilde U(V(x)); \tilde\theta)}{\tilde U'(V(x))^2 V'(x)^2} = \frac{\sigma^2_X(T(x); \tilde\theta)}{\left(\partial T(x)/\partial x\right)^2} = \sigma^2_{T^{-1}(X)}(x; \tilde\theta). \tag{A.3}
\]
Next, divide through with $V'(x)$ in (A.1) and rearrange to obtain
\[
\begin{aligned}
\mu_X(x; \theta) &= \frac{\mu_X(T(x); \tilde\theta)}{\partial T(x)/\partial x} + \frac{1}{2}\left\{\frac{\sigma^2_X(x; \theta)\,U''(V(x))}{U'(V(x))^3 V'(x)} - \frac{\sigma^2_X(T(x); \tilde\theta)\,\tilde U''(V(x))}{\tilde U'(V(x))^3 V'(x)}\right\} \\
&= \frac{\mu_X(T(x); \tilde\theta)}{\partial T(x)/\partial x} + \frac{1}{2}\sigma^2_X(T(x); \tilde\theta)\left\{\frac{1}{\tilde U'(V(x))^2 V'(x)^2}\frac{U''(V(x))}{U'(V(x))^3 V'(x)} - \frac{\tilde U''(V(x))}{\tilde U'(V(x))^3 V'(x)}\right\},
\end{aligned}
\]
where the second equality uses (A.3). Eq. (3.7)(i) now follows since, using $U'(V(x))V'(x) = 1$ and hence $V''(x) = -U''(V(x))/U'(V(x))^3$,
\[
\begin{aligned}
\frac{1}{\tilde U'(V(x))^2 V'(x)^2}\frac{U''(V(x))}{U'(V(x))^3 V'(x)} - \frac{\tilde U''(V(x))}{\tilde U'(V(x))^3 V'(x)}
&= \frac{1}{\tilde U'(V(x))^3 V'(x)^3}\left[\tilde U'(V(x))\frac{U''(V(x))}{U'(V(x))^3} - \tilde U''(V(x))V'(x)^2\right] \\
&= -\frac{1}{\tilde U'(V(x))^3 V'(x)^3}\left[\tilde U'(V(x))V''(x) + \tilde U''(V(x))V'(x)^2\right] \\
&= -\frac{\partial^2 T(x)/\partial x^2}{\left(\partial T(x)/\partial x\right)^3}.
\end{aligned}
\]

Proof of Theorem 5.1.

We first note that the PMLE takes the same form as the one analyzed in Chen and Fan (2006), with the general copula considered in their work satisfying eq. (2.17). The desired result will follow if we can verify that the conditions stated in their proof are satisfied by our assumptions. First, by Assumption 2.1, the discrete sample $\{X_{i\Delta} : i = 0, 1, \ldots, n\}$ generated by the UPD $X$ is first-order Markovian with marginal density $f_X(x; \theta)$ and transition density $p_X(x \mid x_0; \theta)$. Hence, the copula density $c_X(u_0, u; \theta)$ in (2.17) implied by $X$ is absolutely continuous with respect to the Lebesgue measure on $[0, 1]^2$ due to its continuity in $F_X(x; \theta)$, $f_X(x; \theta)$ and $p_X(x \mid x_0; \theta)$. Moreover, the implied copula is neither the Fréchet-Hoeffding upper nor lower bound due to Assumption 2.1, i.e., $\sigma^2_X(x; \theta) > 0$ for all $x \in \mathcal{X}$. Thus, Chen and Fan (2006, Assumption 1) is satisfied. Second, our Assumption 4.2(i) ensures that $X$ is $\beta$-mixing with polynomial decay rate. Third, by Theorem 2.1, $Y$ is mixing with the same mixing properties as $X$ and so satisfies Chen and Fan (2006, Assumption 1). The remaining conditions are met by Assumption 4.3(i).

For the analysis of the proposed sieve MLE, we note that it takes the same form as the one analyzed in Chen, Wu and Yi (2009) and so their results carry over to our setting. Their Assumption M and assumption of $\beta$-mixing property are satisfied by $Y$ under our Assumptions 2.1, 2.2, and 4.2(ii) together with our Theorem 2.1. The remaining conditions are met by Assumption 4.3(ii).

Proof of Theorem 5.2.

Similar to the proof strategy employed in Lemma C.1, we define
\[
\tilde\mu_Y(y) = \frac{\mu_X(U(y); \theta)}{U'(y)} - \frac{\sigma^2_X(U(y); \theta)\,\hat U''(y)}{2U'(y)^3}, \qquad \tilde\sigma^2_Y(y) = \frac{\sigma^2_X(U(y); \theta)}{\hat U'(y)^2},
\]
and, with $f^{(i)}_Y$ denoting the $i$th derivative of $f_Y$ and similar for other functions, arrive at
\[
\begin{aligned}
&\sqrt{nh^3}\left\{\hat\mu_Y(y) - \mu_Y(y) - h^2\frac{\kappa_2}{2}\frac{f^{(3)}_Y(y)}{f_X(U(y); \theta)}\left[-\frac{\sigma^2_X(U(y); \theta)}{2U'(y)^3}\right]\right\} \\
&\quad = \sqrt{nh^3}\left\{\tilde\mu_Y(y) - \mu_Y(y) - h^2\frac{\kappa_2}{2}\frac{f^{(3)}_Y(y)}{f_X(U(y); \theta)}\left[-\frac{\sigma^2_X(U(y); \theta)}{2U'(y)^3}\right]\right\} + o_p(1) \\
&\quad = -\frac{\sigma^2_X(U(y); \theta)}{2U'(y)^3}\sqrt{nh^3}\left\{\hat U^{(2)}(y) - U^{(2)}(y) - h^2\frac{\kappa_2}{2}\frac{f^{(3)}_Y(y)}{f_X(U(y); \theta)}\right\} + o_p(1),
\end{aligned}
\]
and
\[
\begin{aligned}
&\sqrt{nh}\left\{\hat\sigma^2_Y(y) - \sigma^2_Y(y) - h^2\frac{\kappa_2}{2}\frac{f^{(2)}_Y(y)}{f_X(U(y); \theta)}\left[-\frac{2\sigma^2_X(U(y); \theta)}{U'(y)^3}\right]\right\} \\
&\quad = \sqrt{nh}\left\{\tilde\sigma^2_Y(y) - \sigma^2_Y(y) - h^2\frac{\kappa_2}{2}\frac{f^{(2)}_Y(y)}{f_X(U(y); \theta)}\left[-\frac{2\sigma^2_X(U(y); \theta)}{U'(y)^3}\right]\right\} + o_p(1) \\
&\quad = -\frac{2\sigma^2_X(U(y); \theta)}{U'(y)^3}\sqrt{nh}\left\{\hat U'(y) - U'(y) - h^2\frac{\kappa_2}{2}\frac{f^{(2)}_Y(y)}{f_X(U(y); \theta)}\right\} + o_p(1).
\end{aligned}
\]
These together with (C.1) and (C.2) of Lemma C.1 and Slutsky's Theorem complete the proof.
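The plug-in construction analysed in this proof can be sketched numerically. The Python sketch below assumes a normalized OU UPD, so that $F_X = \Phi$, $f_X = \varphi$ and $\sigma^2_X = 2\kappa$, a Gaussian kernel, and a known $\kappa$. It computes $\hat U(y) = \Phi^{-1}(\hat F_Y(y))$, $\hat U'(y) = \hat f_Y(y)/\varphi(\hat U(y))$ and $\hat\sigma^2_Y(y) = 2\kappa/\hat U'(y)^2$. All function names are hypothetical, and this is an illustration rather than the authors' implementation.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def kernel_cdf_pdf(y, sample, h):
    """Gaussian-kernel estimates of F_Y(y) and f_Y(y) with bandwidth h."""
    n = len(sample)
    cdf = sum(STD_NORMAL.cdf((y - yi) / h) for yi in sample) / n
    pdf = sum(STD_NORMAL.pdf((y - yi) / h) for yi in sample) / (n * h)
    return cdf, pdf

def sigma2_Y_hat(y, sample, kappa, h):
    """Plug-in estimate of sigma_Y^2(y) for a normalized OU underlying diffusion."""
    F_hat, f_hat = kernel_cdf_pdf(y, sample, h)
    u_hat = STD_NORMAL.inv_cdf(F_hat)               # U_hat(y) = F_X^{-1}(F_hat_Y(y))
    u_prime_hat = f_hat / STD_NORMAL.pdf(u_hat)     # U_hat'(y) = f_hat_Y(y) / f_X(U_hat(y))
    return 2.0 * kappa / u_prime_hat ** 2           # sigma_X^2 = 2 * kappa (assumed)

if __name__ == "__main__":
    rng = random.Random(0)
    sample = [rng.gauss(0.0, 1.0) for _ in range(2000)]   # identity transform: Y = X
    h = 1.06 * len(sample) ** (-0.2)                       # rule-of-thumb bandwidth
    print(sigma2_Y_hat(0.0, sample, kappa=1.5, h=h))       # should be near 2 * 1.5
```

With the identity transformation the estimate should be close to $2\kappa$ at every $y$, up to the smoothing bias and sampling noise described in Theorem 5.2.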

Veriﬁcation of conditions for OU and CIR model

We here verify the technical conditions of Chen and Fan (2006) for the normalized versions of the OU and CIR model given in eqs. (6.1) and (6.2), respectively. For both examples, we will require that $U(y; \theta)$, as defined in eq. (2.16), and its first and second-order derivatives w.r.t. $\theta$ are polynomially bounded in $y$. This imposes growth restrictions on the transformation function and is used to easily verify various moment conditions in the following. Also note that the criterion $l(U_{i-1}, U_i; \theta)$ in Chen and Fan (2006) takes the form
\[
l(U_{i-1}, U_i; \theta) := \log p_X\left(U(Y_{i\Delta}; \theta) \mid U(Y_{(i-1)\Delta}; \theta); \theta\right) - \log f_X\left(U(Y_{i\Delta}; \theta); \theta\right),
\]
where $U_i = F_Y(Y_{i\Delta})$, in our notation.

B.1 OU model

Assumption 4.2:

It is easily seen that
\[
\frac{\mu_X(x; \theta)}{\sigma_X(x; \theta)} - \frac{1}{2}\frac{\partial\sigma_X(x; \theta)}{\partial x} = -\sqrt{\frac{\kappa}{2}}\,x
\qquad\text{and}\qquad
\frac{s(x; \theta)\,\sigma_X(x; \theta)}{S(x; \theta)} = \frac{\sqrt{2\kappa}\,\exp\left(x^2/2\right)}{\int_{x^*}^{x}\exp\left(z^2/2\right)dz}.
\]
Assumption 4.2 is verified by taking the relevant limits.

Assumption 4.3:

The implied copula of the normalized OU process is Gaussian, for which Assumption 4.3(i) and 4.3(ii) are satisfied as discussed in Chen and Fan (2006) and Chen, Wu, and Yi (2009), respectively.
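The Gaussian-copula structure can be checked numerically against the decomposition $p_Y(y \mid y_0) = f_Y(y)\,c_X(F_Y(y_0), F_Y(y); \theta)$ in (6.3). The sketch below assumes that the normalized OU process has a $N(0, 1)$ stationary law, so that the copula correlation over a sampling interval $\Delta$ is $\rho = e^{-\kappa\Delta}$. The function names and the numerical values are hypothetical.

```python
import math
from statistics import NormalDist

PHI = NormalDist()

def gaussian_copula_density(u0, u1, rho):
    """Bivariate Gaussian copula density with correlation rho."""
    a, b = PHI.inv_cdf(u0), PHI.inv_cdf(u1)
    r2 = rho * rho
    expo = -(r2 * (a * a + b * b) - 2.0 * rho * a * b) / (2.0 * (1.0 - r2))
    return math.exp(expo) / math.sqrt(1.0 - r2)

def transition_density_Y(y1, y0, F_Y, f_Y, kappa, delta):
    """Transition density of Y implied by an OU copula and a marginal (F_Y, f_Y)."""
    rho = math.exp(-kappa * delta)
    return f_Y(y1) * gaussian_copula_density(F_Y(y0), F_Y(y1), rho)

if __name__ == "__main__":
    kappa, delta, y0, y1 = 2.0, 0.25, -0.4, 0.7   # illustrative values only
    # With a standard normal marginal, Y coincides with X and the result
    # should match the exact OU transition density N(rho * y0, 1 - rho^2).
    p = transition_density_Y(y1, y0, PHI.cdf, PHI.pdf, kappa, delta)
    rho = math.exp(-kappa * delta)
    print(p, NormalDist(rho * y0, math.sqrt(1.0 - rho * rho)).pdf(y1))
```

Replacing `PHI.cdf` and `PHI.pdf` by another marginal cdf and density changes the marginal law of $Y$ while leaving the copula, and hence the dependence structure, unchanged.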

B.2 CIR model

Assumption 4.2:

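We obtain
\[
\frac{\mu_X(x; \theta)}{\sigma_X(x; \theta)} - \frac{1}{2}\frac{\partial\sigma_X(x; \theta)}{\partial x} = \frac{(2\alpha - 1)}{2}\sqrt{\frac{\kappa}{2x}} - \sqrt{\frac{\kappa x}{2}}
\qquad\text{and}\qquad
\frac{s(x; \theta)\,\sigma_X(x; \theta)}{S(x; \theta)} = \frac{\exp\{x\}\,x^{-\alpha}\sqrt{2\kappa}\sqrt{x}}{\int_{x^*}^{x}\exp\{z\}\,z^{-\alpha}\,dz},
\]
and the assumption is verified by taking relevant limits.

Assumption 4.3.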

First observe that p X ( x | x ; θ ) = exp (cid:2) c ( θ ) − c ( θ ) (cid:0) x + e − κ ∆ x (cid:1)(cid:3) x x I α − (cid:0) c ( θ ) √ xx (cid:1) , where I q ( · ) is the so-called modiﬁed Bessel function of the ﬁrst kind and of order q and c ( θ, ∆) > c ( θ, ∆) > f X is here the density of a gamma distributionand so all polynomial moments of X exist. Since U is assumed to be polynomially bounded,this implies that all polynomial moments of Y also exist. All smoothness conditions imposed inChen and Fan (2006) are trivially satisﬁed since p X ( x | x ; θ ) and U ( y ; θ ) are twice continuouslydiﬀerentiable w.r.t their arguments and so will not be discussed any further. Similarly, we havealready shown that Y is geometrically mixing. It remains to verify the moment conditions and theidentifying restrictions imposed in C1-C.5 in Proposition 4.2 and A2-A6 in Chen and Fan (2006). C1 is satisﬁed if we restrict θ = ( α, κ ) to be situated in a compact set on R that contains thetrue value. Observe thatlog p X ( x | x ; θ ) = c ( θ ) − c ( θ ) (cid:0) x + e − κ ∆ x (cid:1) + log (cid:16) x x (cid:17) + log I α − (cid:0) c ( θ ) √ xx (cid:1) . s θ ( x | x ; θ ) : = ∂ log p X ( x ; x ; θ ) ∂θ = ˙ c ( θ ) − ˙ c ( θ ) (cid:0) x + e − κ ∆ x (cid:1) + c ( θ ) ∆ e − κ ∆ x + I (cid:48) α − (cid:0) c ( θ ) √ xx (cid:1) c ( θ ) √ xx ˙ c ( θ ) I α − (cid:0) c ( θ ) √ xx (cid:1) +  ˙ I α − ( c ( θ ) √ xx ) I α − (2 c ( θ ) √ xx )  , where ˙ c ( θ ) = ∂c ( θ ) / ( ∂θ ) and similar for other functions, I (cid:48) α − ( x ) = ∂I α − ( x ) / ( ∂x ), and˙ I α − ( x ) = ∂I α − ( x ) / ( ∂α ). It is easily veriﬁed that (cid:12)(cid:12) I (cid:48) α − ( x ) /I α − ( x ) (cid:12)(cid:12) and (cid:12)(cid:12)(cid:12)(cid:12) I (cid:48) α − ( x ) /I α − ( x ) (cid:12)(cid:12)(cid:12)(cid:12) are both bounded by a polynomial in x . Thus, (cid:107) s X ( x | x ; θ ) (cid:107) is bounded by a polynomial uni-formly in θ ∈ Θ. 
The expressions of $s_x(x \mid x_0; \theta) := \partial \log p_X(x \mid x_0; \theta)/\partial x$ and $s_{x_0}(x \mid x_0; \theta) := \partial \log p_X(x \mid x_0; \theta)/\partial x_0$ are of a similar form and also polynomially bounded. Now, observe that
\[
\begin{aligned}
l_\theta(U_{i-1}, U_i; \theta) := \frac{\partial l(U_{i-1}, U_i; \theta)}{\partial\theta}
&= s_\theta\left(U(Y_{i\Delta}; \theta) \mid U(Y_{(i-1)\Delta}; \theta); \theta\right)
+ s_x\left(U(Y_{i\Delta}; \theta) \mid U(Y_{(i-1)\Delta}; \theta); \theta\right)\dot U(Y_{i\Delta}; \theta) \\
&\quad + s_{x_0}\left(U(Y_{i\Delta}; \theta) \mid U(Y_{(i-1)\Delta}; \theta); \theta\right)\dot U(Y_{(i-1)\Delta}; \theta)
- \frac{\partial \log f_X\left(U(Y_{i\Delta}; \theta); \theta\right)}{\partial\theta}.
\end{aligned}
\]
Given that the model is correctly specified and identified, it follows by standard arguments for MLE that $E[l_\theta(U_{i-1}, U_i; \theta)] = 0$ if and only if $\theta$ equals the true value.

C4. From the above expression of $l_\theta(U_{i-1}, U_i; \theta)$ together with our assumption on $U(y; \theta)$, it is easily checked that it is bounded by a polynomial in $(Y_{i\Delta}, Y_{(i-1)\Delta})$ uniformly in $\theta \in \Theta$. It now follows that $E[\sup_\theta \|l_\theta(U_{i-1}, U_i; \theta)\|^p] < \infty$ for any $p \ge 1$.

C5. The derivatives
\[
l_{\theta,1}(U_{i-1}, U_i; \theta) = \frac{\partial l_\theta(U_{i-1}, U_i; \theta)}{\partial U_{i-1}}, \qquad l_{\theta,2}(U_{i-1}, U_i; \theta) = \frac{\partial l_\theta(U_{i-1}, U_i; \theta)}{\partial U_i}
\]
are again bounded by polynomials in $(Y_{i\Delta}, Y_{(i-1)\Delta})$ and so have all relevant moments.

A1(ii)-(iii).

With $W_{1,i}$ and $W_{2,i}$ defined in (4.2)-(4.3) in Chen and Fan (2006) and $l_{\theta,\theta}(U_{i-1}, U_i; \theta) = \partial^2 l(U_{i-1}, U_i; \theta)/(\partial\theta\,\partial\theta')$, these conditions require
\[
\lim_{n \to \infty} \mathrm{Var}\left(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left\{l_\theta(U_{i-1}, U_i; \theta) + W_{1,i} + W_{2,i}\right\}\right)
\]
and $E[l_{\theta,\theta}(U_{i-1}, U_i; \theta)]$ to have full rank. We have been unable to verify these two conditions due to the complex form of the score and Hessian of the CIR model.

A4.

Observe that $|W_{1,i}| \le E\left[|U_{i-1}|\,\|l_{\theta,1}(U_{i-1}, U_i; \theta)\|\right] < \infty$ and similarly for $W_{2,i}$. Thus, both have all relevant moments. A5-A6 have already been verified above.

Lemma

Lemma C.1

Under Assumptions 2.1-2.2, 4.2(i), and 4.4-4.6, we have as $n \to \infty$, $h \to 0$, $nh \to \infty$,
\[
\sqrt{nh}\left\{\hat U'(y) - U'(y) - h^2\frac{\kappa_2}{2}\frac{f''_Y(y)}{f_X(U(y); \theta)}\right\} \to_d N\left(0, \frac{U'(y)^2}{f_Y(y)}\int_{\mathbb{R}} K(z)^2\,dz\right), \tag{C.1}
\]
and as $n \to \infty$, $h \to 0$, $nh^3 \to \infty$,
\[
\sqrt{nh^3}\left\{\hat U''(y) - U''(y) - h^2\frac{\kappa_2}{2}\frac{f'''_Y(y)}{f_X(U(y); \theta)}\right\} \to_d N\left(0, \frac{U'(y)^2}{f_Y(y)}\int_{\mathbb{R}} K'(z)^2\,dz\right). \tag{C.2}
\]

Proof.

With $\hat F_Y(y)$ given in (4.2), let $\hat f^{(i)}_Y(y) = \hat F^{(i+1)}_Y(y)$, for $i = 1, 2$, be the $i$th derivative of the kernel marginal density estimator. Using standard methods for kernel estimators (cf. Robinson, 1983), we obtain under the assumptions of the lemma that, as $n \to \infty$, $h \to 0$, and $nh^{2i+1} \to \infty$,
\[
\sqrt{nh^{2i+1}}\left\{\hat f^{(i)}_Y(y) - f^{(i)}_Y(y) - h^2\frac{\kappa_2}{2}f^{(i+2)}_Y(y)\right\} \to_d N\left(0, V_i(y)\right), \tag{C.3}
\]
where $V_i(y) = f_Y(y)\int_{\mathbb{R}} K^{(i)}(z)^2\,dz$. Assumptions 2.1 and 4.4 ensure that $f_Y(y)$ is sufficiently smooth so that $f^{(2)}_Y(y)$ and $f^{(3)}_Y(y)$ exist. Assumption 4.2(i) and 4.6 regulate the mixing property of $Y$ and the kernel function, respectively, as required by Robinson (1983).

From (4.4) we have $\hat U'(y) = \hat f_Y(y)/f_X(\hat U(y); \hat\theta)$. Now define $\tilde U'(y) = \hat f_Y(y)/f_X(U(y); \theta)$ and note that Assumption 4.4 and 4.5 together with the delta-method imply $\hat U'(y) - \tilde U'(y) = O_P(1/\sqrt{n}) = o_P(1/\sqrt{nh})$. It then follows that
\[
\begin{aligned}
\sqrt{nh}\left\{\hat U'(y) - U'(y) - h^2\frac{\kappa_2}{2}\frac{f^{(2)}_Y(y)}{f_X(U(y); \theta)}\right\}
&= \sqrt{nh}\left\{o_P\left(1/\sqrt{nh}\right) + \tilde U'(y) - U'(y) - h^2\frac{\kappa_2}{2}\frac{f^{(2)}_Y(y)}{f_X(U(y); \theta)}\right\} \\
&= \frac{1}{f_X(U(y); \theta)}\sqrt{nh}\left\{\hat f_Y(y) - f_Y(y) - h^2\frac{\kappa_2}{2}f^{(2)}_Y(y)\right\} + o_P(1).
\end{aligned}
\]
Using (C.3) and the same arguments as in Kristensen (2011, Proof of Theorem 1), we arrive at (C.1).

Next, observe that
\[
U''(y) = \frac{f'_Y(y)}{f_X(U(y); \theta)} - \frac{f'_X(U(y); \theta)\,f_Y(y)^2}{f_X(U(y); \theta)^3},
\]
where $f'_X(x; \theta)$ and $f'_Y(y)$ are the first derivatives of $f_X(x; \theta)$ and $f_Y(y)$, respectively. Similarly, it is easily checked that
\[
\hat U''(y) = \frac{\hat f'_Y(y)}{f_X(\hat U(y); \hat\theta)} - \frac{f'_X(\hat U(y); \hat\theta)\,\hat f_Y(y)^2}{f_X(\hat U(y); \hat\theta)^3}.
\]
Define
\[
\tilde U''(y) = \frac{\hat f'_Y(y)}{f_X(U(y); \theta)} - \frac{f'_X(U(y); \theta)\,f_Y(y)^2}{f_X(U(y); \theta)^3}
\]
and apply arguments similar to before to obtain
\[
\sqrt{nh^3}\left\{\hat U''(y) - U''(y) - h^2\frac{\kappa_2}{2}\frac{f^{(3)}_Y(y)}{f_X(U(y); \theta)}\right\}
= \frac{1}{f_X(U(y); \theta)}\sqrt{nh^3}\left\{\hat f'_Y(y) - f'_Y(y) - h^2\frac{\kappa_2}{2}f^{(3)}_Y(y)\right\} + o_p(1),
\]
which together with (C.3) yields (C.2).

Tables and Figures

Table 1: Bias and RMSE of κ in the OU-SKST Model

Panels: Bias/κ and RMSE/κ
Columns: True Parameter Value; ρ; PPMLE and PMLE for Sample Size 2202; PPMLE and PMLE for Sample Size 5505
Rows: κ = 1.…; κ = 5.…; κ = 11.377; κ = 22.753
[Numerical entries not recoverable.]

Table 2: Bias and RMSE of κ in the CIR-SKST Model

Panels: Bias/κ and RMSE/κ
Columns: True Parameter Values; ρ; PPMLE and PMLE for Sample Size 2202; PPMLE and PMLE for Sample Size 5505
Rows: (κ, α) = (0.…, …); (κ, α) = (3.…, …); (κ, α) = (7.…, …); (κ, α) = (15.…, …)
[Numerical entries not recoverable.]

Table 3: Bias and RMSE of α in the CIR-SKST Model

Panels: Bias/α and RMSE/α
Columns: True Parameter Values; ρ; PPMLE and PMLE for Sample Size 2202; PPMLE and PMLE for Sample Size 5505
Rows: (κ, α) = (0.…, …); (κ, α) = (3.…, …); (κ, α) = (7.…, …); (κ, α) = (15.…, …)
[Numerical entries not recoverable.]

Table 4: Descriptive Statistics of Daily VIX

Sample Period            January 2, 1990 - July 19, 2019
Sample Size              7445
Mean                     19.21
Median                   17.31
Std Dev.                 7.76
Skewness                 2.12
Kurtosis                 10.85
Jarque-Bera Statistic    24669.26
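The Jarque-Bera statistic in Table 4 can be related to the reported skewness and kurtosis. The sketch below assumes the standard definition JB = n/6 · (S² + (K − 3)²/4) with population-style (divide by n) moments; the paper does not state which variant was used, and the function names are illustrative only.

```python
def descriptive_stats(xs):
    """Mean, standard deviation, skewness and raw kurtosis using divide-by-n moments."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in xs) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in xs) / n  # fourth central moment
    return {
        "n": n,
        "mean": mean,
        "std": m2 ** 0.5,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,  # raw kurtosis (equals 3 for a normal distribution)
    }


def jarque_bera_from_moments(n, skewness, kurtosis):
    """Standard Jarque-Bera statistic from sample size, skewness and raw kurtosis."""
    return n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)


# Plug in the rounded values reported in Table 4 (n = 7445, S = 2.12, K = 10.85).
jb_check = jarque_bera_from_moments(7445, 2.12, 10.85)

# Small illustrative sample (not VIX data) to exercise the moment calculations.
toy_stats = descriptive_stats([1.0, 2.0, 3.0, 4.0, 5.0])
toy_jb = jarque_bera_from_moments(
    toy_stats["n"], toy_stats["skewness"], toy_stats["kurtosis"]
)
```

Because the inputs are rounded to two decimals, `jb_check` agrees with the reported 24669.26 only approximately (to within about 1%).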

Table 5: Model Estimation and Pseudo-LR Test Results

Column groups: Transformed OU (DO, NPTOU); Transformed CIR (EW, NPTCIR)
Parameter rows (κ̂, α, σ, ϱ, δ): [entries not recoverable]
LL: -1.1724 (DO), -1.1579 (NPTOU), -1.1585 (EW), -1.1565 (NPTCIR)
LR: [entries not recoverable]
CV (first level): -52.1521, -23.6766
CV (second level): -30.5511, -10.9027
p-value: 0.0000, 0.0000

Figure 1: Marginal Densities of the Eurodollar Rates. Solid = SKST Density, Dashed = Kernel Density, Dotted = Normal Density

Figure 2: Estimated Drift and Diffusion for the OU-SKST Model (T = 2202). Solid = True Function, Dashed = Mean of Estimates, Dotted = 95% Confidence Bands

Figure 3: Estimated Drift and Diffusion for the OU-SKST Model (T = 5505). Solid = True Function, Dashed = Mean of Estimates, Dotted = 95% Confidence Bands

… (T = 2202). Solid = True Function, Dashed = Mean of Estimates, Dotted = 95% Confidence Bands