Diffusion Copulas: Identification and Estimation
Ruijun Bu ∗ Kaddour Hadri † Dennis Kristensen ‡ April 2020
Abstract
We propose a new semiparametric approach for modelling nonlinear univariate diffusions, where the observed process is a nonparametric transformation of an underlying parametric diffusion (UPD). This modelling strategy yields a general class of semiparametric Markov diffusion models with parametric dynamic copulas and nonparametric marginal distributions. We provide primitive conditions for the identification of the UPD parameters together with the unknown transformations from discrete samples. Likelihood-based estimators of both parametric and nonparametric components are developed and we analyze their asymptotic properties. Kernel-based drift and diffusion estimators are also proposed and shown to be normally distributed in large samples. A simulation study investigates the finite sample performance of our estimators in the context of modelling US short-term interest rates. We also present a simple application of the proposed method for modelling the CBOE volatility index data.

JEL Classification: C14, C22, C32, C58, G12

Keywords: Continuous-time model; diffusion process; copula; transformation model; identification; nonparametric; semiparametric; maximum likelihood; sieve; kernel smoothing.

∗ Department of Economics, Management School, University of Liverpool, Liverpool, UK.
† Queen’s University Management School, Queen’s University Belfast, Belfast, Northern Ireland, UK.
‡ Department of Economics, University College London, London, UK.

Introduction
Most financial time series have fat tails that standard parametric models are not able to generate. One forceful argument for this in the context of diffusion models was provided by Aït-Sahalia (1996b), who tested a range of parametric models against a nonparametric alternative and found that most standard models were inconsistent with observed features in data.

One popular semiparametric approach that allows for more flexibility in terms of marginal distributions, and so allows for fat tails, is to use so-called copula models, where the copula is parametric and the marginal distribution is left unspecified (nonparametric). Joe (1997) showed how bivariate parametric copulas could be used to model discrete-time stationary Markov chains with flexible, nonparametric marginal distributions. The resulting class of semiparametric models is relatively easy to estimate; see, e.g., Chen and Fan (2006). However, most parametric copulas known in the literature have been derived in a cross-sectional setting where they have been used to describe the joint dependence between two random variables with known joint distribution, e.g. a bivariate t-distribution. As such, existing parametric copulas may be difficult to interpret in terms of the dynamics they imply when used to model Markov processes. This in turn means that applied researchers may find it difficult to choose an appropriate copula for a given time series.

One could have hoped that copulas with a clearer dynamic interpretation could be developed by starting with an underlying parametric Markov model and then deriving its implied copula. This approach is unfortunately hindered by the fact that the stationary distributions of general Markov chains are not available in closed form and so their implied dynamic copulas are not available in closed form either.
This complicates both the theoretical analysis (such as establishing identification) and the practical implementation of such models.

An alternative approach to modelling fat tails using Markov diffusions is to specify flexible forms for the so-called drift and diffusion terms. Such non-linear features tend to generate fat tails in the marginal distribution of the process. This approach has been widely used to, for example, model short-term interest rates; see, e.g., Aït-Sahalia (1996a,b), Conley et al. (1997), Stanton (1997), Ahn and Gao (1999) and Bandi (2002). These models tend to either be heavily parameterized or involve nonparametric estimators that suffer from low precision in small and moderate samples.

We here propose a novel class of dynamic copulas that resolves the above-mentioned issues: We show how copulas can easily be generated from parametric diffusion processes. The copulas have a clear interpretation in terms of dynamics since they are constructed from an underlying dynamic continuous-time process. At the same time, a given copula-based diffusion can exhibit strong non-linearities in its drift and diffusion terms even if the underlying copula is derived from, for example, a linear model. Furthermore, primitive conditions for identification of the parameters are derived, and this despite the fact that the copulas are implicit. Finally, the models can easily be implemented in practice using existing numerical methods for parametric diffusion processes. This in turn implies that estimators are easy to compute and do not involve any smoothing parameters; this is in contrast to existing semi- and nonparametric estimators of diffusion models.

The starting point of our analysis is to show that there is a one-to-one correspondence between any given semiparametric Markov copula model and a model where we observe a nonparametric transformation of an underlying parametric Markov process.
We then restrict attention to parametric Markov diffusion processes, which we refer to as underlying parametric diffusions (UPDs). Copulas generated from a given UPD have a clear interpretation in terms of dynamic properties. In particular, standard results from the literature on diffusion models can be employed to establish mixing properties and existence of moments for a given model; see, e.g., Chen et al. (2010). Moreover, we are able to derive primitive conditions for the parameters of the copula to be identified together with the unknown transformation.

Once identification has been established, estimation of our copula diffusion models based on a discretely sampled process proceeds as in the discrete-time case. One can estimate the model using either a one-step or a two-step procedure: In the one-step procedure, the marginal distribution and the parameters of the UPD are estimated jointly by sieve maximum likelihood methods as advocated by Chen, Wu and Yi (2009). In the two-step approach, the marginal distribution is first estimated by the empirical cdf, which in turn is plugged into the likelihood function of the model. This is then maximized with respect to the parameters of the UPD. We provide an asymptotic theory for both cases by importing results from Chen, Wu and Yi (2009) and Chen and Fan (2006), respectively. In particular, we provide primitive conditions for their high-level assumptions to hold in our diffusion setting. The resulting asymptotic theory shows $\sqrt{n}$-asymptotic normality of the parametric components. Given the estimates of the parametric component, one can obtain semiparametric estimates of the drift and diffusion functions, and we also provide an asymptotic theory for these.

Our modelling strategy has parametric antecedents: Bu et al. (2011), Eraker and Wang (2015) and Forman and Sørensen (2014) considered parametric transformations of UPDs for modelling short-term interest rates, variance risk premia and molecular dynamics, respectively.
We here provide a more flexible class of models relative to theirs since we leave the transformation unspecified. At the same time, all the attractive properties of their models remain valid: The transition density of the observed process is induced by the UPD and so the estimation of copula-based diffusion models is computationally simple. Moreover, copula diffusion models can easily be employed in asset pricing applications since (conditional) moments are easily computed using the specification of the UPD. Finally, none of these papers fully addresses the identification issue and so our identification results are also helpful in their setting.

There are also similarities between our approach and the one pursued in Aït-Sahalia (1996a) and Kristensen (2010). They developed two classes of semiparametric diffusion models where either the drift or the diffusion term is specified parametrically and the remaining term is left unspecified. The remaining term is then recovered by using the triangular link between the marginal distribution, the drift and the diffusion terms that exists for stationary diffusions. In this way, the marginal distribution implicitly ties down the dynamics of the observed diffusion process. Unfortunately, it is very difficult to interpret the dynamic properties of the resulting semiparametric diffusion model. In contrast, in our setting, the UPD alone ties down the dynamics of the observed diffusion and so these are much better understood. The estimation of copula diffusions is also less computationally burdensome compared to the Pseudo Maximum Likelihood Estimator (PMLE) proposed in Kristensen (2010).

The remainder of this paper is organized as follows. Section 2 outlines our semiparametric modelling strategy. Section 3 investigates the identification issue of our model.
In Section 4, we discuss the estimators of our model while Section 5 investigates their asymptotic properties. Section 6 presents a simulation study to examine the finite sample performance of our estimators. In Section 7, we consider a simple empirical application. Some concluding remarks are given in Section 8. All proofs and lemmas are collected in the Appendices.

Consider a continuous-time process $Y = \{Y_t : t \ge 0\}$ with domain $\mathcal{Y} = (y_l, y_r)$, where $-\infty \le y_l$
The transformation $V$ is strictly increasing with inverse $U = V^{-1}$, i.e., $y = V(x) \Leftrightarrow x = U(y)$, and is twice continuously differentiable.

Assumption 2.1(i) provides primitive conditions for a solution to eq. (2.2) to exist and for the transition density $p_X(x|x_0;\theta)$ to be well-defined, while Assumption 2.1(ii) implies that this solution is positive recurrent; see Bandi and Phillips (2003), Karatzas and Shreve (1991, Section 5.5) and McKean (1969, Section 5) for more details. Assumption 2.1(iii) strengthens the recurrence property to stationarity and ergodicity, in which case the stationary marginal density of $X$ takes the form

$$ f_X(x;\theta) = \frac{\xi(\theta)}{\sigma_X^2(x;\theta)\, s(x;\theta)}, \tag{2.5} $$

where $\xi(\theta)$ was defined in Assumption 2.1(iii). However, stationarity will not be required for all our results to hold; in particular, some of our identification results and proposed estimators do not rely on stationarity. This is in contrast to the existing literature on dynamic copula models where stationarity is a maintained assumption.

Assumption 2.2 requires $V$ to be strictly increasing; this is a testable restriction under the remaining assumptions introduced below, which ensures identification: Suppose instead that $V$ is strictly decreasing; we then have $Y_t = \bar{V}(\bar{X}_t)$, where $\bar{V}(x) = V(-x)$ is increasing and $\bar{X}_t = -X_t$ has dynamics $p_X(-x|-x_0;\theta)$.
Assuming that the chosen UPD satisfies $p_X(-x|-x_0;\theta) \neq p_X(x|x_0;\tilde{\theta})$ for $\theta \neq \tilde{\theta}$, we can test whether $V$ indeed is decreasing or increasing.

The smoothness condition on $V$ is imposed so that we can employ Ito’s Lemma on the transformation to obtain that the continuous-time dynamics of $Y$ can be written in terms of $\mathcal{S}$ as

$$ dY_t = \mu_Y(Y_t;\mathcal{S})\,dt + \sigma_Y(Y_t;\mathcal{S})\,dW_t, $$

with

$$ \mu_Y(y;\mathcal{S}) = \frac{\mu_X(U(y);\theta)}{U'(y)} - \frac{1}{2}\,\sigma_X^2(U(y);\theta)\,\frac{U''(y)}{U'(y)^3}, \tag{2.6} $$

$$ \sigma_Y^2(y;\mathcal{S}) = \frac{\sigma_X^2(U(y);\theta)}{U'(y)^2}, \tag{2.7} $$

where we have used that, with $U'(y)$ and $U''(y)$ denoting the first two derivatives of $U(y)$, $V'(U(y)) = 1/U'(y)$ and $V''(U(y)) = -U''(y)/U'(y)^3$. In particular, $Y$ is a Markov diffusion process. As can be seen from the above expressions, the dynamics of $Y$, as characterized by $\mu_Y$ and $\sigma_Y^2$, may appear quite complex, with $U$ potentially generating nonlinearities in both the drift and diffusion terms even if $\mu_X$ and $\sigma_X^2$ are linear. We demonstrate this feature in the subsequent subsection, where we present examples of simple UPDs that are able to generate non-linear shapes of $\mu_Y$ and $\sigma_Y^2$ via the non-linear transformation $V$. At the same time, if we transform $Y$ by $U$ we recover the dynamics of the UPD. As a consequence, the transition density of the discretely sampled process $Y_{i\Delta}$, $i = 0, 1, 2, \ldots$, can be expressed in terms of that of $X$ as

$$ p_Y(y|y_0;\mathcal{S}) = U'(y)\, p_X(U(y)|U(y_0);\theta), \tag{2.8} $$

using standard results for densities of invertible transformations.
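The change-of-variables formula (2.8) can be checked numerically with a minimal sketch. The choices below are assumptions made purely for illustration and are not taken from the paper: the UPD is the OU process, whose Gaussian transition density is standard, the transformation is $V(x) = e^x$ (so $U(y) = \log y$), and all parameter values are hypothetical. Under these choices the transition law of $Y$ should be lognormal, which gives an independent reference.

```python
import numpy as np
from scipy.stats import norm, lognorm


def ou_transition_pdf(x, x0, kappa, alpha, sigma, delta):
    """Exact OU transition density p_X(x | x0; theta) over a time step delta."""
    mean = alpha + (x0 - alpha) * np.exp(-kappa * delta)
    var = sigma**2 * (1.0 - np.exp(-2.0 * kappa * delta)) / (2.0 * kappa)
    return norm.pdf(x, loc=mean, scale=np.sqrt(var))


def y_transition_pdf(y, y0, U, dU, kappa, alpha, sigma, delta):
    """Eq. (2.8): p_Y(y | y0) = U'(y) * p_X(U(y) | U(y0))."""
    return dU(y) * ou_transition_pdf(U(y), U(y0), kappa, alpha, sigma, delta)


# Hypothetical transformation V(x) = exp(x), i.e. U(y) = log(y), U'(y) = 1/y.
U = np.log
dU = lambda y: 1.0 / y

# Hypothetical parameter values and conditioning point.
kappa, alpha, sigma, delta = 0.5, 0.0, 0.3, 1.0 / 12.0
y0 = 1.2

grid = np.linspace(1e-4, 6.0, 200_001)
dens = y_transition_pdf(grid, y0, U, dU, kappa, alpha, sigma, delta)

# The density should integrate to one (trapezoid rule on the grid).
total_mass = np.sum(0.5 * (dens[1:] + dens[:-1]) * np.diff(grid))

# Reference: log(Y) given Y0 = y0 is Gaussian, so Y is lognormal.
m = alpha + (np.log(y0) - alpha) * np.exp(-kappa * delta)
v = sigma**2 * (1.0 - np.exp(-2.0 * kappa * delta)) / (2.0 * kappa)
reference = lognorm.pdf(grid, s=np.sqrt(v), scale=np.exp(m))
max_abs_diff = np.max(np.abs(dens - reference))
```

For a nonparametric $U$, the same two functions would be called with an estimate of $U$ and its derivative in place of the logarithm.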
By similar arguments, the stationary density of $Y$ satisfies

$$ f_Y(y;\mathcal{S}) = U'(y)\, f_X(U(y);\theta), \tag{2.9} $$

which shows that any choice of UPD is able to fully adapt to any given marginal density of $Y$ due to the nonparametric nature of $U$.

The above expressions also highlight the following additional theoretical and practical advantages of our modelling strategy: First, for a given choice of $U$, we can easily compute $p_Y(y|y_0;\mathcal{S})$ and $f_Y(y;\mathcal{S})$ since computation of parametric transition densities and stationary densities of diffusion models is in general straightforward, even if they are not available in closed form. Second, $Y$ inherits all its dynamic properties from $X$; and in the modelling of $X$, we can rely on a large literature on parametric modelling of diffusion models. Formally, we have the following straightforward results adopted from Forman and Sørensen (2014).

Proposition 2.1
Suppose that Assumptions 2.1(i)–(ii) and 2.2 hold. Then the following results hold for the model (2.1)-(2.2):
1. If Assumption 2.1(iii) holds, then $X$ is stationary and ergodic and so is $Y$.
2. The mixing coefficients of $X$ and $Y$ coincide.
3. If $E[|X_t|^{q_1}] < \infty$ and $|V(x)| \le B(1 + |x|^{q_2})$ for some $B < \infty$ and $q_1, q_2 \ge 1$, then $E[|Y_t|^{q_1/q_2}] < \infty$.
4. If $\varphi$ is an eigenfunction of $X$ with corresponding eigenvalue $\rho$ in the sense that $E[\varphi(X_1)|X_0] = \rho\,\varphi(X_0)$, then $\varphi \circ U$ is an eigenfunction of $Y$ with corresponding eigenvalue $\rho$.

The above proposition shows that, given knowledge (or estimates) of $\mathcal{S}$, the properties of $Y$ in terms of mixing coefficients, moments, and eigenfunctions are well-understood since they are inherited from the specification of $X$. In addition, computations of conditional moments of $Y$ can be done straightforwardly utilizing knowledge of the UPD. For example, for a given function $G$, the corresponding conditional moment can be computed as

$$ E[G(Y_{t+s})|Y_t = y] = E[G_X(X_{t+s})|X_t = U(y)], \quad \text{where } G_X(x) := G(V(x)). $$

The right-hand side only involves $X$ and so standard methods for computing moments of parametric diffusion models (e.g., Monte Carlo methods, solving partial differential equations, Fourier transforms) can be employed. This facilitates the use of our diffusion models in asset pricing where the price often takes the form of a conditional moment. We refer to Eraker and Wang (2015) for more details on asset pricing applications for our class of models; they take a fully parametric approach but all their arguments carry over to our setting.

The last result of the above proposition will prove useful for our identification arguments since these will rely on the fundamental nonparametric identification results derived in Hansen et al. (1998). Their results involve the spectrum of the observed diffusion process, and the last result of the proposition implies that the spectrum of $Y$ is fully characterized by the spectrum of $X$ together with the transformation.
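The conditional-moment identity above can be sketched in a few lines. The following is an illustrative example only, under assumptions that are not the paper's: the UPD is taken to be the OU process, the transformation is the hypothetical $V(x) = e^x$, and the expectation over the Gaussian transition law is evaluated by Gauss-Hermite quadrature rather than by any method the authors may have used. With $G(y) = y$ the answer is available in closed form (a lognormal mean), which serves as a check.

```python
import numpy as np


def ou_conditional_moments(x0, kappa, alpha, sigma, s):
    """Mean and variance of X_{t+s} given X_t = x0 for the OU process."""
    mean = alpha + (x0 - alpha) * np.exp(-kappa * s)
    var = sigma**2 * (1.0 - np.exp(-2.0 * kappa * s)) / (2.0 * kappa)
    return mean, var


def conditional_moment_y(G, V, U, y, kappa, alpha, sigma, s, n_nodes=60):
    """E[G(Y_{t+s}) | Y_t = y] = E[G(V(X_{t+s})) | X_t = U(y)].

    The Gaussian expectation is computed with probabilists' Gauss-Hermite
    nodes, whose weights sum to sqrt(2 * pi).
    """
    mean, var = ou_conditional_moments(U(y), kappa, alpha, sigma, s)
    nodes, weights = np.polynomial.hermite_e.hermegauss(n_nodes)
    x = mean + np.sqrt(var) * nodes
    return np.sum(weights * G(V(x))) / np.sqrt(2.0 * np.pi)


# Hypothetical inputs: V(x) = exp(x), G(y) = y, one time unit ahead.
kappa, alpha, sigma, s = 0.5, 0.0, 0.3, 1.0
y = 1.2
approx = conditional_moment_y(lambda v: v, np.exp, np.log, y,
                              kappa, alpha, sigma, s)

# Closed-form check: E[exp(X)] = exp(mean + var / 2) for Gaussian X.
mean, var = ou_conditional_moments(np.log(y), kappa, alpha, sigma, s)
exact = np.exp(mean + 0.5 * var)
```

For UPDs without a Gaussian transition law, the quadrature step would be replaced by simulation or another of the methods listed in the text.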
The eigenfunctions and their eigenvalues are also useful for evaluating long-run properties of $Y$. In our semiparametric approach, the eigenfunctions and corresponding eigenvalues of $Y$ are easily computed from $X$ and so we circumvent the problem of estimating these nonparametrically as done in, for example, Chen, Hansen and Scheinkman (2009) and Gobet et al. (2004).

Our framework is quite flexible and in principle allows for any specification of the UPD for $X$. Many parametric models are available for that purpose, and we here present three specific examples from the literature on continuous-time interest rate modelling.

Example 1: Ornstein-Uhlenbeck (OU) model.
The OU model (cf. Vasicek, 1977) is given by

$$ dX_t = \kappa(\alpha - X_t)\,dt + \sigma\,dW_t, \tag{2.10} $$

defined on the domain $\mathcal{X} = (-\infty, +\infty)$. The process is stationary if and only if $\kappa > 0$, in which case $X$ mean-reverts to its unconditional mean $\alpha$. The scale of $X$ is controlled by $\sigma$. Its stationary and transition distributions are both normal, and the corresponding copula of the discretely sampled process is a Gaussian copula with correlation parameter $e^{-\kappa\Delta}$. For this particular model, the resulting drift and diffusion terms of the observed process take the form

$$ \mu_Y(y;\mathcal{S}) = \frac{\kappa(\alpha - U(y))}{U'(y)} - \frac{\sigma^2}{2}\,\frac{U''(y)}{U'(y)^3}, \qquad \sigma_Y^2(y;\mathcal{S}) = \frac{\sigma^2}{U'(y)^2}. \tag{2.11} $$

In Figure 2 (found in Section 6), we plot these two functions with $U$ and $\theta$ fitted to the 7-day Eurodollar interest rate time series used in Aït-Sahalia (1996b). Observe that $U$ generates non-linear behavior in $\mu_Y$ and $\sigma_Y^2$ despite the UPD being a linear Gaussian process.

Example 2: Cox-Ingersoll-Ross (CIR) model.
The CIR process (cf. Cox et al., 1985) is given by

$$ dX_t = \kappa(\alpha - X_t)\,dt + \sigma\sqrt{X_t}\,dW_t. \tag{2.12} $$

The process has domain $\mathcal{X} = (0, +\infty)$ and is stationary if and only if $\kappa > 0$, $\alpha > 0$ and $2\kappa\alpha/\sigma^2 \ge 1$. The conditional distribution of $X_{(i+1)\Delta}$ given $X_{i\Delta}$ is a non-central $\chi^2$ distribution with fractional degrees of freedom, while its stationary distribution is a Gamma distribution. To the best of our knowledge, the corresponding dynamic copula has not been analyzed before or used in empirical work. Figure 4 (in Section 6) displays $\mu_Y$ and $\sigma_Y^2$, with $U$ and $\theta$ chosen in the same way as in Example 1. Compared to that example, the resulting drift and diffusion terms of $Y$ exhibit even stronger non-linearities.

Example 3: Nonlinear Drift Constant Elasticity Variance (NLDCEV) model.
The NLDCEV specification (cf. Conley et al., 1997) is given by

$$ dX_t = \Big( \sum_{i=-k}^{l} \alpha_i X_t^i \Big)\,dt + \sigma X_t^{\beta}\,dW_t \tag{2.13} $$

with domain $\mathcal{X} = (0, +\infty)$. It is easily seen that when $\alpha_{-k} > 0$ and $\alpha_l < 0$, the drift is mean-reverting for small and large values of $X$. A popular choice in various studies in finance is $k = 1$ and $l = 2$ or $3$ (cf. Aït-Sahalia, 1996b; Choi, 2009; Kristensen, 2010; Bu, Cheng and Hadri, 2017), in which case the drift has linear or zero mean-reversion in the middle part and much stronger mean-reversion for large and small values of $X$. Meanwhile, the CEV diffusion term is also consistent with most empirical findings on the shape of the diffusion term. Since (2.13) is one of the most flexible parametric diffusions, diffusion processes that are unspecified transformations of (2.13) should represent a very flexible class of diffusion models. Similar to (2.12), the implied copula of the NLDCEV model is new to the copula literature.

Examples 1-2 are attractive from a computational standpoint since the corresponding transition densities are available in closed form, thereby facilitating their implementation. But this comes at the cost of the dynamics being somewhat simple. The NLDCEV model implies more complex and richer dynamics, but on the other hand its transition density is not available in closed form. However, the marginal pdf of the NLDCEV process, as well as of more general specifications, can be evaluated in closed form by (2.5). Moreover, closed-form approximations of the transition density of the NLDCEV model developed by, for example, Aït-Sahalia (2002) and Li (2013) can be employed. Alternatively, simulated versions of the transition density can be computed using the techniques developed in, for example, Kristensen and Shin (2012) and Bladt and Sørensen (2014). In either case, an approximate version of the exact likelihood can be easily computed, thereby allowing for simple estimation of even quite complex UPDs.
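As a rough illustration of how the stationary density in (2.5) can be evaluated for a given drift and diffusion specification, the sketch below integrates the drift-to-variance ratio numerically. It assumes the standard scale density $s(x;\theta) = \exp\{-\int_{x^*}^{x} 2\mu_X(z;\theta)/\sigma_X^2(z;\theta)\,dz\}$, which is not restated in this excerpt, and it obtains the constant $\xi(\theta)$ by numerical normalisation. The function name, the truncated domain and the parameter values are all hypothetical. The CIR model is used as a check because its stationary law is a Gamma distribution with shape $2\kappa\alpha/\sigma^2$ and rate $2\kappa/\sigma^2$.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import gamma


def stationary_pdf(mu, sigma2, lower, upper, x_star):
    """Build f_X(x) = xi / (sigma2(x) * s(x)) on (lower, upper), cf. eq. (2.5).

    Assumes the standard scale density
    s(x) = exp(-int_{x_star}^{x} 2 * mu(z) / sigma2(z) dz);
    xi is found by numerical normalisation over (lower, upper).
    """
    def unnormalised(x):
        # 1 / s(x) equals exp of the integral of 2 * mu / sigma2 from x_star to x.
        log_inv_scale, _ = quad(lambda z: 2.0 * mu(z) / sigma2(z), x_star, x)
        return np.exp(log_inv_scale) / sigma2(x)

    mass, _ = quad(unnormalised, lower, upper, limit=200)
    return lambda x: unnormalised(x) / mass


# Hypothetical CIR parameters; the upper bound truncates a negligible tail.
kappa, alpha, sigma = 0.5, 0.06, 0.15
f_cir = stationary_pdf(mu=lambda x: kappa * (alpha - x),
                       sigma2=lambda x: sigma**2 * x,
                       lower=0.0, upper=2.0, x_star=alpha)

# Known Gamma stationary law of the CIR process, used as a reference.
shape, rate = 2.0 * kappa * alpha / sigma**2, 2.0 * kappa / sigma**2
points = (0.02, 0.06, 0.12)
numeric = [f_cir(x) for x in points]
closed_form = [gamma.pdf(x, a=shape, scale=1.0 / rate) for x in points]
```

The same function could be pointed at an NLDCEV drift and CEV diffusion term, for which no familiar closed-form reference is available; in that case only the normalisation can be checked.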
As already noted in the introduction, copula-based diffusions are related to the class of so-called discrete-time copula-based Markov models; see, for example, Chen and Fan (2006) and references therein. To map the notation and ideas of this literature into our continuous-time setting, we set the sampling time distance $\Delta = 1$ in the remaining part of this section.

Let us first introduce copula-based Markov models where a given discrete-time, stationary scalar Markov process $Y = \{Y_i : i = 0, 1, \ldots, n\}$ is modelled through a bivariate parametric copula density, say, $c_X(u_0, u;\theta)$, together with its stationary marginal cdf $F_Y$, so that $Y$’s transition density satisfies

$$ p_Y(y|y_0;\theta, F_Y) = f_Y(y)\, c_X(F_Y(y_0), F_Y(y);\theta), \tag{2.14} $$

where $f_Y(y) = F_Y'(y)$. An alternative representation of this model is

$$ Y_i = F_Y^{-1}(\bar{X}_i), \qquad \bar{X}_{i+1}\,|\,\bar{X}_i = x_0 \sim c_X(x_0, \cdot\,;\theta), \tag{2.15} $$

so that $Y_i$ is a transformation of an underlying Markov process $\bar{X}_i \in [0, 1]$ with transition density $c_X(x_0, x;\theta)$. Thus, if $c_X(x_0, x;\theta)$ is induced by an underlying Markov diffusion transition density, the corresponding copula-based Markov model falls within our framework.

Conversely, consider a copula-based diffusion and suppose that the UPD $X$ is stationary with marginal cdf $F_X(x;\theta)$. By definition of $Y$, its marginal cdf satisfies

$$ F_Y(y) = F_X(U(y);\theta) \;\Leftrightarrow\; U(y) = F_X^{-1}(F_Y(y);\theta). \tag{2.16} $$

Substituting the last expression for $U$ into (2.8), we see that $p_Y$ can be expressed in the form of (2.14), where $c_X(u_0, u;\theta)$ is the density function of the (dynamic) copula implied by the discretely sampled UPD $X$,

$$ c_X(u_0, u;\theta) = \frac{p_X\big(F_X^{-1}(u;\theta)\,\big|\,F_X^{-1}(u_0;\theta);\theta\big)}{f_X\big(F_X^{-1}(u;\theta);\theta\big)}. \tag{2.17} $$

Thus, any discretely sampled stationary copula-based diffusion satisfies (2.15) with $\bar{X}_i = F_X(X_i)$.

However, the literature on copula-based Markov models focuses on discrete-time models with standard copula specifications derived from bivariate distributions in an i.i.d. setting. Using copulas that are originally derived in an i.i.d. setting complicates the interpretation of the dynamics of the resulting Markov model, and conditions for the model to be mixing, for example, can be quite complicated to derive; see, e.g., Beare (2010) and Chen, Wu and Yi (2009). This also implies that very few standard copulas can be interpreted as diffusion processes; to our knowledge, the only one is the Gaussian copula, which corresponds to the OU process in Example 1.

The reader may now wonder why we do not simply generate dynamic copulas by first deriving the transition density $p_X(x|x_0;\theta)$ for a given discrete-time Markov model and then obtaining the corresponding Markov copula through eq. (2.17). The reason is that for most discrete-time Markov models the stationary distribution $F_X(x;\theta)$ is not known in closed form. Thus, first of all, $F_X^{-1}(u;\theta)$ and thereby also $c_X$ have to be approximated numerically. Second, since $c_X$ is now not available in closed form, the analysis of which parameters one can identify from the resulting copula model becomes very challenging. And identification in copula-based Markov models is a non-trivial problem: Generally, for a given parametric Markov model, not all parameters are identified from the corresponding copula as given in (2.17) and some of them have to be normalized.

The copula $C_X(u_0, u;\theta)$ for a given Markov process is defined as $C_X(u_0, u;\theta) = \Pr\big(X_0 \le F_X^{-1}(u_0;\theta),\, X_1 \le F_X^{-1}(u;\theta)\big)$. The corresponding copula density is then given by $c_X(u_0, u;\theta) = \partial^2 C_X(u_0, u;\theta)/(\partial u_0\,\partial u)$.
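The construction in (2.17) can be illustrated with the OU model, for which everything is Gaussian. The sketch below, whose function names and parameter values are hypothetical, builds the copula density as the ratio of the OU transition density to its stationary density and compares it with the textbook Gaussian copula density with correlation $e^{-\kappa\Delta}$. It also evaluates the OU copula at two different values of $(\alpha, \sigma)$, which under these assumptions gives the same result and so illustrates the remark that not all parameters are identified from the copula.

```python
import numpy as np
from scipy.stats import norm


def ou_copula_density(u0, u, kappa, alpha, sigma, delta):
    """Eq. (2.17) for the OU model: c_X(u0, u) = p_X(x | x0) / f_X(x)."""
    sd_stat = sigma / np.sqrt(2.0 * kappa)           # stationary std. dev.
    x0 = norm.ppf(u0, loc=alpha, scale=sd_stat)      # F_X^{-1}(u0)
    x = norm.ppf(u, loc=alpha, scale=sd_stat)        # F_X^{-1}(u)
    rho = np.exp(-kappa * delta)
    cond_mean = alpha + rho * (x0 - alpha)
    cond_sd = sd_stat * np.sqrt(1.0 - rho**2)
    p_x = norm.pdf(x, loc=cond_mean, scale=cond_sd)  # transition density
    f_x = norm.pdf(x, loc=alpha, scale=sd_stat)      # stationary density
    return p_x / f_x


def gaussian_copula_density(u0, u, rho):
    """Standard bivariate Gaussian copula density with correlation rho."""
    z0, z = norm.ppf(u0), norm.ppf(u)
    expo = -(rho**2 * (z0**2 + z**2) - 2.0 * rho * z0 * z) / (2.0 * (1.0 - rho**2))
    return np.exp(expo) / np.sqrt(1.0 - rho**2)


# Hypothetical evaluation points and parameters.
delta, kappa = 1.0, 0.5
u0 = np.array([0.1, 0.5, 0.9])
u = np.array([0.3, 0.5, 0.7])

c_a = ou_copula_density(u0, u, kappa, alpha=0.0, sigma=1.0, delta=delta)
c_b = ou_copula_density(u0, u, kappa, alpha=3.0, sigma=0.2, delta=delta)
c_gauss = gaussian_copula_density(u0, u, rho=np.exp(-kappa * delta))
```

Only $\kappa$ enters the resulting copula here, through the correlation $e^{-\kappa\Delta}$.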
We here directly generate copulas through an underlying continuous-time diffusion model for $X$. This resolves the aforementioned drawbacks of existing copula-based Markov models: First, we are able to generate highly flexible copulas so far not considered in the literature. Second, given that our copulas are induced by specifying the drift and diffusion functions of $X$, the time series properties are much more easily inferred from our model, cf. Proposition 2.1. Third, by Ito’s Lemma, eqs. (2.6)-(2.7) provide us with explicit expressions linking the drift and diffusion terms of the observed diffusion process $Y$ to the UPD through the transformation $V$; this will allow us to derive necessary and sufficient conditions for identification in the following. Fourth, in terms of estimation, the stationary distribution of a given diffusion model has an explicit form, cf. eq. (2.5), which allows us to develop computationally simple estimators of copula diffusion models. Finally, some of our identification results will not require stationarity and so expand the scope for using copula-type models in time series analysis.

Our modelling strategy is also related to the ideas of Aït-Sahalia (1996a) and Kristensen (2010, 2011), where $F_Y$ is left unspecified while either the drift, $\mu_Y$, or the diffusion term, $\sigma_Y^2$, is specified parametrically. As an example, consider the latter case where $\sigma_Y^2(y;\theta)$ is known up to the parameter $\theta$. Given knowledge of the marginal density $f_Y$ (or a nonparametric estimator of it), the drift term can then be recovered as a functional of $f_Y$ and $\sigma_Y^2$ as

$$ \mu_Y(y; f_Y, \theta) = \frac{1}{2 f_Y(y)}\,\frac{\partial}{\partial y}\big[\sigma_Y^2(y;\theta)\, f_Y(y)\big]. $$

So in their setting $f_Y$ pins down the resulting dynamics of $Y$ in a rather opaque manner.

Suppose that a particular specification of the UPD as given in (2.2) has been chosen. Given the discrete sample of $Y$, the goal is to obtain consistent estimates of $\theta$ together with $V$.
To this end, we first have to show that these are actually identified from data. In order to do so, we need to be precise about which primitives we can identify from data. Given the primitives, we then wish to recover $(\theta, V)$. In the cross-sectional literature, one normally takes as given the distribution of data and then establishes a mapping between this and the structural parameters. In our setting, we are able to learn about the transition density of our data, $p_Y$, from the population and so it would be natural to use this as the primitive from which we wish to recover $(\theta, V)$. However, the mapping from $p_Y$ to $(\theta, V)$ is not available in closed form in general in our setting and so this identification strategy appears highly complicated. Instead we will take as primitives the drift, $\mu_Y$, and diffusion term, $\sigma_Y^2$, of $Y$ and then show identification of $(\theta, V)$ from these. This identification argument relies on us being able to identify $\mu_Y$ and $\sigma_Y^2$ in the first place, which we formally assume here:

Assumption 3.1
The drift, $\mu_Y$, and the diffusion, $\sigma_Y^2$, are nonparametrically identified from the discretely sampled process $Y$.

The above assumption is not completely innocuous and does impose some additional regularity conditions on the Data Generating Process (DGP). We therefore first provide sufficient conditions under which Assumption 3.1 holds. The first set of conditions is due to Hansen et al. (1998), who showed that Assumption 3.1 is satisfied if $Y$ is stationary and its infinitesimal operator has a discrete spectrum. Proposition 2.1(4) is helpful in this regard since it informs us that the spectrum of $Y$ can be recovered from that of $X$. In particular, if $X$ is stationary with a discrete spectrum, then $Y$ will have the same properties. Since the dynamics of $X$ are known to us, the properties of its spectrum are in principle known to us and so this condition can be verified a priori. The second set of primitive conditions comes from Bandi and Phillips (2003): They show that as $\Delta \to 0$ and $n\Delta \to \infty$, the drift and diffusion functions of a recurrent Markov diffusion process are identified. This last result holds without stationarity, but on the other hand requires high-frequency observations.

In order to formally state the above two results, we need some additional notation. Recall that the infinitesimal operator, denoted $L_{X,\theta}$, of a given UPD $X$ is defined as

$$ L_{X,\theta}\, g(x) := \mu_X(x;\theta)\, g'(x) + \frac{1}{2}\,\sigma_X^2(x;\theta)\, g''(x), $$

for any twice differentiable function $g(x)$. We follow Hansen et al. (1998) and restrict the domain of $L_{X,\theta}$ to the following set of functions:

$$ \mathcal{D}(L_{X,\theta}) = \Big\{ g \in L^2(f_X) : g' \text{ is a.c., } L_{X,\theta}\, g \in L^2(f_X) \text{ and } \lim_{x \downarrow x_l} \frac{g'(x)}{s(x)} = \lim_{x \uparrow x_r} \frac{g'(x)}{s(x)} = 0 \Big\}, $$

where a.c. stands for absolutely continuous. The spectrum of $L_{X,\theta}$ is then the set of solution pairs $(\varphi, \rho)$, with $\varphi \in \mathcal{D}(L_{X,\theta})$ and $\rho \ge 0$, to the following eigenvalue problem, $L_{X,\theta}\,\varphi = -\rho\varphi$. We refer to Hansen et al. (1998) and Kessler and Sørensen (1999) for a further discussion and results regarding the spectrum of $L_{X,\theta}$. The following result then holds:

Proposition 3.1
Suppose that Assumption 2.1(i)-(ii) is satisfied. Then Assumption 3.1 holds under either of the following two sets of conditions:
1. Assumption 2.1(iii) holds and $L_{X,\theta}$ has a discrete spectrum, where $\theta$ is the data-generating parameter value.
2. $\Delta \to 0$ and $n\Delta \to \infty$.

Importantly, the above result shows that Assumption 3.1 can be verified without imposing stationarity. Unfortunately, this requires high-frequency information ($\Delta \to 0$).

Recall that the structure $\mathcal{S} = (\theta, V)$ contains the objects of interest, and let our model consist of all the structures that satisfy, as a minimum, Assumptions 2.1(i)–(ii) and 2.2. According to (2.6)-(2.7), each structure implies a drift and diffusion term of the observed process. We shall say that two structures $\mathcal{S} = (\theta, V)$ and $\tilde{\mathcal{S}} = (\tilde{\theta}, \tilde{V})$ are observationally equivalent, a property which we denote by $\mathcal{S} \sim \tilde{\mathcal{S}}$, if they imply the same drift and diffusion of $Y$, i.e.

$$ \forall y \in \mathcal{Y}: \quad \mu_Y(y;\mathcal{S}) = \mu_Y(y;\tilde{\mathcal{S}}) \quad \text{and} \quad \sigma_Y^2(y;\mathcal{S}) = \sigma_Y^2(y;\tilde{\mathcal{S}}). \tag{3.1} $$

The structure $\mathcal{S}$ is then said to be identified within the model if $\mathcal{S} \sim \tilde{\mathcal{S}}$ implies $\mathcal{S} = \tilde{\mathcal{S}}$. In our setting, without suitable normalizations on the parameters of the UPD, identification will generally fail. To see this, observe that any given structure $\mathcal{S}$ is observationally equivalent to the following process: Choose any one-to-one transformation $T : \mathcal{X} \mapsto \mathcal{X}$, and rewrite the DGP implied by $\mathcal{S}$ as

$$ Y_t = \tilde{V}(\tilde{X}_t), \qquad \tilde{V}(x) = V(T(x)), \tag{3.2} $$

where $\tilde{X}_t = T^{-1}(X_t)$ solves

$$ d\tilde{X}_t = \mu_{T^{-1}(X)}(\tilde{X}_t;\theta)\,dt + \sigma_{T^{-1}(X)}(\tilde{X}_t;\theta)\,dW_t, \tag{3.3} $$

with

$$ \mu_{T^{-1}(X)}(x;\theta) = \frac{\mu_X(T(x);\theta)}{\partial T(x)/(\partial x)} - \frac{1}{2}\,\sigma_X^2(T(x);\theta)\,\frac{\partial^2 T(x)/(\partial x^2)}{\big(\partial T(x)/(\partial x)\big)^3}, \tag{3.4} $$

$$ \sigma^2_{T^{-1}(X)}(x;\theta) = \frac{\sigma_X^2(T(x);\theta)}{\big(\partial T(x)/(\partial x)\big)^2}. \tag{3.5} $$

Suppose now that there exists $\tilde{\theta}$ so that $\mu_{T^{-1}(X)}(x;\theta) = \mu_X(x;\tilde{\theta})$ and $\sigma^2_{T^{-1}(X)}(x;\theta) = \sigma_X^2(x;\tilde{\theta})$. Then the alternative representation (3.2)-(3.3) is a member of our model with structure $\tilde{\mathcal{S}} = (\tilde{\theta}, \tilde{V})$, which is observationally equivalent to $\mathcal{S} = (\theta, V)$. The following result provides a complete characterization of the class of observationally equivalent structures for a given model:
Theorem 3.2
Suppose that Assumption 3.1 is satisfied. For any two structures $\mathcal{S} = (V, \theta)$ and $\tilde{\mathcal{S}} = (\tilde{V}, \tilde{\theta})$ satisfying Assumptions 2.1(i) and 2.2, the following holds: $\mathcal{S} \sim \tilde{\mathcal{S}}$ if and only if there exists a one-to-one transformation $T : \mathcal{X} \mapsto \mathcal{X}$ so that

$$ \tilde{V}(x) = V(T(x)) \tag{3.6} $$

and, with $\mu_{T^{-1}(X)}(x;\tilde{\theta})$ and $\sigma^2_{T^{-1}(X)}(x;\tilde{\theta})$ given in eqs. (3.4)-(3.5),

$$ \text{(i) } \mu_{T^{-1}(X)}(x;\tilde{\theta}) = \mu_X(x;\theta) \quad \text{and} \quad \text{(ii) } \sigma^2_{T^{-1}(X)}(x;\tilde{\theta}) = \sigma_X^2(x;\theta). \tag{3.7} $$

In particular, the data-generating structure is identified if and only if there exists no one-to-one transformation $T$ such that (3.7) holds for $\theta \neq \tilde{\theta}$.

Note that the above theorem does not require stationarity since it is only concerned with the mapping $\mathcal{S} \mapsto (\mu_Y(\cdot\,;\mathcal{S}), \sigma_Y^2(\cdot\,;\mathcal{S}))$, which is well-defined irrespective of whether data is stationary. The first part of the theorem provides an exact characterization of when any two structures are equivalent, namely if there exists a transformation $T$ so that (3.6)-(3.7) hold. The second part comes as a natural consequence of the first part: If there exists no such transformation, then the data-generating structure must be identified.

Unfortunately, the above result may not always be useful in practice since it requires us to search over all possible one-to-one transformations $T$ and for each of these verify that there exists no $\theta \neq \tilde{\theta}$ for which eq. (3.7) holds. In some cases, it proves useful to first normalize the UPD suitably and then verify eq. (3.7) in the normalized version. First note that for any one-to-one transformation $\bar{T}(\cdot\,;\theta) : \bar{\mathcal{X}} \mapsto \mathcal{X}$, an equivalent representation of the model is $Y_t = V(\bar{T}(\bar{X}_t;\theta))$, where the "normalised" UPD $\bar{X}_t := \bar{T}^{-1}(X_t;\theta) \in \bar{\mathcal{X}}$ solves

$$ d\bar{X}_t = \mu_{\bar{X}}(\bar{X}_t;\theta)\,dt + \sigma_{\bar{X}}(\bar{X}_t;\theta)\,dW_t, $$

with

$$ \mu_{\bar{X}}(\bar{x};\theta) = \frac{\mu_X(\bar{T}(\bar{x};\theta);\theta)}{\partial \bar{T}(\bar{x};\theta)/(\partial \bar{x})} - \frac{1}{2}\,\sigma_X^2(\bar{T}(\bar{x};\theta);\theta)\,\frac{\partial^2 \bar{T}(\bar{x};\theta)/(\partial \bar{x}^2)}{\big(\partial \bar{T}(\bar{x};\theta)/(\partial \bar{x})\big)^3}, \tag{3.8} $$

$$ \sigma^2_{\bar{X}}(\bar{x};\theta) = \frac{\sigma_X^2(\bar{T}(\bar{x};\theta);\theta)}{\big(\partial \bar{T}(\bar{x};\theta)/(\partial \bar{x})\big)^2}. \tag{3.9} $$

Given that the above representation is observationally equivalent to the original model, we can still employ Theorem 3.2 but with $\mu_{\bar{X}}$ and $\sigma^2_{\bar{X}}$ replacing $\mu_X$ and $\sigma_X^2$. Verifying the identification conditions stated in the second part of the theorem for the normalised versions will in some situations be easier by judicious choice of $\bar{T}$.

Below, we present three particular normalising transformations that we have found useful in this regard. The chosen transformations allow us to provide easy-to-check conditions for a given UPD to be identified.
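To see concretely why a normalisation is needed, the following worked example applies eqs. (3.4)-(3.5) to the OU model (2.10). It is an illustration only: the affine map $T(x) = a + bx$ with $b > 0$, and the constants $a$ and $b$, are introduced here for the purpose of the example. Since the second derivative of $T$ vanishes, the second term in (3.4) drops out:

```latex
\begin{align*}
\mu_{T^{-1}(X)}(x;\theta)
  &= \frac{\kappa\,(\alpha - a - b x)}{b}
   = \kappa\Big(\frac{\alpha - a}{b} - x\Big), \\
\sigma^2_{T^{-1}(X)}(x;\theta)
  &= \frac{\sigma^2}{b^2}.
\end{align*}
```

The transformed process is therefore again an OU process, now with parameter $\tilde{\theta} = (\kappa, (\alpha - a)/b, \sigma/b)$, and the structure $(\tilde{\theta}, V \circ T)$ is observationally equivalent to $(\theta, V)$. In this sketch the level $\alpha$ and scale $\sigma$ cannot be separated from the unknown transformation, which suggests fixing them (for instance $\alpha = 0$ and $\sigma = 1$) so that only $\kappa$ remains to be identified.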
For a given UPD, the researcher is free to apply any of the three identification schemes depending on which is the easiest to implement. The three schemes lead to different normalizations/parametrizations, but they all lead to models that are exactly identified (no over-identifying restrictions are imposed) and so are observationally equivalent: The resulting form of $\mu_Y$ and $\sigma^2_Y$ will be identical irrespective of which scheme is employed.

The three transformations that we consider also highlight three alternative modelling approaches: Instead of starting with a parametric UPD as found in the existing literature, such as Examples 1-3, one can alternatively build a UPD with unit diffusion ($\sigma^2_X = 1$), zero drift ($\mu_X = 0$) or known marginal distribution. As we shall see, each of these three modelling approaches is in principle as flexible as the standard approach where the researcher jointly specifies the drift and diffusion term.

In our first identification scheme, we choose to normalize $X_t$ by the so-called Lamperti transform,
$$\bar X_t = \bar T^{-1}(X_t; \theta) := \gamma(X_t; \theta), \qquad \gamma(x; \theta) = \int_{x^*}^{x} \frac{1}{\sigma_X(z; \theta)}\,dz, \qquad x^* \in \mathcal{X}.$$
The resulting process is a unit diffusion process, $d\bar X_t = \mu_{\bar X}(\bar X_t; \theta)\,dt + dW_t$, with domain $\bar{\mathcal{X}} = (\bar x_l, \bar x_r)$, where $\bar x_r = \lim_{x \to x_r^-} \gamma(x; \theta)$ and $\bar x_l = \lim_{x \to x_l^+} \gamma(x; \theta)$, and drift function
$$\mu_{\bar X}(\bar x; \theta) = \frac{\mu_X(\gamma^{-1}(\bar x; \theta); \theta)}{\sigma_X(\gamma^{-1}(\bar x; \theta); \theta)} - \frac{1}{2}\,\frac{\partial \sigma_X}{\partial x}(\gamma^{-1}(\bar x; \theta); \theta). \tag{3.10}$$
For the unit diffusion version of the UPD, the equivalence condition (3.7)(ii) becomes
$$1 = \sigma^2_{\bar X}(\bar x; \theta) = \sigma^2_{T^{-1}(\bar X)}(\bar x; \tilde\theta) = \frac{1}{\left(\partial T(\bar x)/\partial \bar x\right)^2},$$
which can only hold if $T(\bar x) = \bar x + \eta$ for some constant $\eta \in \mathbb{R}$. Thus, we can restrict attention to this class of transformations and (3.7)(i) becomes:

Assumption 3.2.
With $\mu_{\bar X}$ given in (3.10): There exist no $\eta \neq 0$ and $\tilde\theta \neq \theta$ such that $\mu_{\bar X}(\bar x; \tilde\theta) = \mu_{\bar X}(\bar x + \eta; \theta)$ for all $\bar x \in \bar{\mathcal{X}}$.

Assumption 3.2 imposes a normalization condition on the transformed drift function to ensure identification. When verifying Assumption 3.2 for the transformed unit diffusion $\bar X$ defined above, we will generally need to fix some of the parameters that enter $\mu_X(x; \theta)$ and $\sigma^2_X(x; \theta)$ of the original process $X$; see below.

Corollary 3.3
Under Assumptions 2.1(i), 2.2 and 3.1, $S$ is identified if and only if Assumption 3.2 is satisfied.

The above transformation result can be applied to standard parametric specifications when $\gamma(x; \theta)$ is available in closed form. But it also highlights that, in terms of modelling copula diffusions, we can without loss of generality build a model where we from the outset restrict $\sigma^2_X = 1$ and only model the drift term $\mu_X$. For example, we could choose the following flexible polynomial drift model where we have already normalized the diffusion term:
$$dX_t = \left(\sum_{i=1}^{l} \alpha_i X_t^i\right) dt + dW_t, \tag{3.11}$$
where $\theta = (\alpha_1, ..., \alpha_l)$. Corollary 3.3 shows that this particular copula diffusion specification is identified without further restrictions on $\theta$. Below we apply Corollary 3.3 to some of the standard parametric diffusions introduced earlier:

Example 1 (continued).
The Lamperti transform of the OU process in (2.10) is given by
$$d\bar X_t = \kappa\left(\alpha/\sigma - \bar X_t\right) dt + dW_t.$$
Since $\alpha/\sigma$ is a location shift of $\bar X$, we need to normalize $\alpha/\sigma$ in order for the identification condition of Corollary 3.3 to be satisfied; one such normalization is $\alpha/\sigma = 0$, leading to the following identified model,
$$d\bar X_t = -\kappa \bar X_t\,dt + dW_t. \tag{3.12}$$

Example 2 (continued).
The Lamperti transform of the CIR diffusion in (2.12) is given by
$$d\bar X_t = \left[\kappa\left(\frac{2\alpha}{\sigma^2 \bar X_t} - \frac{\bar X_t}{2}\right) - \frac{1}{2 \bar X_t}\right] dt + dW_t, \tag{3.13}$$
which only depends on $\theta = (\kappa, \alpha^*)$ where $\alpha^* = \alpha/\sigma^2$. Note that the dimension of the parameter vector is reduced from 3 to 2. Crucially, it also suggests that we can only identify $\alpha$ and $\sigma^2$ up to a ratio. Hence, normalization requires fixing either $\alpha$, $\sigma^2$, or their ratio.

Example 3 (continued).
It can be easily verified that the Lamperti transform of the NLDCEV diffusion in (2.13) takes the form
$$d\bar X_t = \left[\sum_{i=-k}^{l} \alpha_i^* \bar X_t^{\frac{i-\beta}{1-\beta}} - \frac{\beta}{2(1-\beta)} \bar X_t^{-1}\right] dt + dW_t, \tag{3.14}$$
where $\alpha_i^* := \alpha_i \sigma^{\frac{i-1}{1-\beta}} (1-\beta)^{\frac{i-\beta}{1-\beta}}$, $i = -k, ..., l$. Hence, the parameters $\theta = (\beta, \alpha^*_{-k}, ..., \alpha^*_{l})$ are identified and the number of parameters is reduced from $l + k + 3$ to $l + k + 2$. Note that just as (2.10) and (2.12) are special cases of (2.13), both (3.12) and (3.13) are special cases of (3.14).

Our second identification strategy transforms $X$ by its scale measure defined in eq. (2.4), $\bar X_t := S(X_t; \theta)$, which brings the diffusion process onto its natural scale,
$$d\bar X_t = \sigma_{\bar X}(\bar X_t; \theta)\,dW_t,$$
where the drift is zero (and so known) while
$$\sigma^2_{\bar X}(\bar x; \theta) = s^2(S^{-1}(\bar x; \theta); \theta)\,\sigma^2_X(S^{-1}(\bar x; \theta); \theta). \tag{3.15}$$
Since the drift term is zero, the identification condition (3.7)(i) becomes
$$0 = -\frac{1}{2}\,\sigma^2_{\bar X}(T(\bar x); \tilde\theta)\,\frac{\partial^2 T(\bar x)/\partial \bar x^2}{\left(\partial T(\bar x)/\partial \bar x\right)^3}, \tag{3.16}$$
which can only hold if $\partial^2 T(\bar x)/\partial \bar x^2 = 0$. We can therefore restrict attention to linear transformations $T(\bar x) = \eta_1 \bar x + \eta_2$, for some constants $\eta_1, \eta_2 \in \mathbb{R}$, in which case (3.7)(ii) becomes:

Assumption 3.3. With $\sigma^2_{\bar X}$ given in (3.15): There exist no $\eta_1 \neq 1$, $\eta_2 \neq 0$ and $\tilde\theta \neq \theta$ such that $\sigma^2_{\bar X}(\bar x; \tilde\theta) = \sigma^2_{\bar X}(\eta_1 \bar x + \eta_2; \theta)/\eta_1^2$ for all $\bar x \in \bar{\mathcal{X}}$.

In comparison to Assumption 3.2, we here have to impose two normalizations to ensure identification. The intuition for this is that setting the drift to zero does not act as a complete normalization of the process: Any additional scale transformation of $\bar X$ still leads to a zero-drift process. Therefore, for the second scheme to work we need both a scale and a location normalization.

Corollary 3.4
Under Assumptions 2.1(i)–(ii), 2.2 and 3.1, $S$ is identified if and only if Assumption 3.3 is satisfied.

Compared to the first identification scheme, it is noticeably harder to apply this one to existing parametric diffusion models since the inverse of the scale transform is usually not available in closed form. But, similar to the first identification scheme, the result shows that without loss of flexibility, we can focus on UPDs with zero drift and then model the diffusion term in a flexible manner, e.g.,
$$dX_t = \exp\left(\sum_{i=1}^{l-1} \beta_i X_t^i + \beta_l |X_t|^l\right) dW_t. \tag{3.17}$$
Corollary 3.4 shows that this UPD is identified together with $V$ without any further parameter restrictions on $\theta = (\beta_1, ..., \beta_l)$.

Our third identification strategy transforms a given stationary UPD by its marginal cdf,
$$\bar X_t = F_X(X_t; \theta). \tag{3.18}$$
In this case, there is generally no simplification in terms of the drift and diffusion term, which take the form
$$\mu_{\bar X}(\bar x; \theta) = \mu_X(F_X^{-1}(\bar x; \theta); \theta)\, f_X(F_X^{-1}(\bar x; \theta); \theta) + \frac{1}{2}\,\sigma^2_X(F_X^{-1}(\bar x; \theta); \theta)\, f'_X(F_X^{-1}(\bar x; \theta); \theta) \tag{3.19}$$
and
$$\sigma^2_{\bar X}(\bar x; \theta) = \sigma^2_X(F_X^{-1}(\bar x; \theta); \theta)\, f^2_X(F_X^{-1}(\bar x; \theta); \theta) \tag{3.20}$$
for $\bar x \in \bar{\mathcal{X}} = (0, 1)$. By construction, however, the marginal distribution is known, $\bar X_t \sim U(0, 1)$, and we can directly identify the transformation function by $U(y) = F_Y(y)$, c.f. eq. (2.16). The identification condition then takes the form:

Assumption 3.4. With $\mu_{\bar X}(\bar x; \theta)$ and $\sigma^2_{\bar X}(\bar x; \theta)$ given in eqs. (3.19)-(3.20), the following holds:
$$\forall \bar x \in (0, 1) : \mu_{\bar X}(\bar x; \theta) = \mu_{\bar X}(\bar x; \tilde\theta) \text{ and } \sigma^2_{\bar X}(\bar x; \theta) = \sigma^2_{\bar X}(\bar x; \tilde\theta) \iff \theta = \tilde\theta.$$

Corollary 3.5
Under Assumptions 2.1-2.2 and 3.1, $S$ is identified if and only if Assumption 3.4 is satisfied.

The above result is only useful for showing identification of a given UPD if $F_X^{-1}(\bar x; \theta)$ is available in closed form. But similar to the previous identification schemes, it demonstrates that we can restrict attention to diffusions with known marginal distributions in the model building phase. Specifically, one can choose a known density $f_X(x)$ that describes the stationary distribution of $X$ together with a parametric specification for, say, the drift function. We can then rearrange eq. (2.5) to back out the diffusion term of the UPD:
$$\sigma^2_X(x; \theta) = \frac{2}{f_X(x)} \int_{x_l}^{x} \mu_X(z; \theta) f_X(z)\,dz. \tag{3.21}$$
If the drift is specified so that $\mu_X(\cdot; \theta) \neq \mu_X(\cdot; \tilde\theta)$ for $\theta \neq \tilde\theta$, then Assumption 3.4 will be satisfied for this model. Alternatively, one could choose a parametric specification of the diffusion term and then derive the corresponding drift term of the UPD satisfying
$$\mu_X(x; \theta) = \frac{1}{2 f_X(x)} \frac{\partial}{\partial x}\left[\sigma^2_X(x; \theta) f_X(x)\right].$$
The resulting copula diffusion model satisfies Assumption 3.4, and so is identified, as long as the chosen diffusion term satisfies $\sigma^2_X(\cdot; \theta) \neq \sigma^2_X(\cdot; \tilde\theta)$ for $\theta \neq \tilde\theta$.

Below, we apply the third identification scheme to the OU and CIR models:

Example 1 (continued).
The stationary distribution of (2.10) is $N(\alpha, v^2)$ with $v^2 = \sigma^2/(2\kappa)$ and so the marginal density and cdf take the form $f_X(x; \theta) = \frac{1}{v}\phi\left(\frac{x - \alpha}{v}\right)$ and $F_X(x; \theta) = \Phi\left(\frac{x - \alpha}{v}\right)$, where $\phi$ and $\Phi$ denote the density and cdf of the $N(0, 1)$ distribution. Applying the transformation (3.18) yields, after some tedious calculations,
$$d\bar X_t = -2\kappa\,\Phi^{-1}(\bar X_t)\,\phi(\Phi^{-1}(\bar X_t))\,dt + \sqrt{2\kappa}\,\phi(\Phi^{-1}(\bar X_t))\,dW_t,$$
which is independent of $\alpha$ and $\sigma^2$, and these therefore have to be fixed, leaving $\kappa$ as the only free parameter. This is the same finding as with the first identification strategy.

Example 2 (continued).
The stationary distribution of the CIR process is a $\Gamma$-distribution with rate (inverse scale) parameter $\omega = 2\kappa/\sigma^2$ and shape parameter $\nu = 2\kappa\alpha/\sigma^2$. Thus, the marginal density and cdf can be written as
$$f_X(x; \theta) = f_X(x; \omega, \nu) = \frac{\omega^\nu}{\Gamma(\nu)} x^{\nu - 1} e^{-\omega x}, \qquad F_X(x; \theta) = F_X(x; \omega, \nu) = \frac{1}{\Gamma(\nu)}\gamma(\nu, \omega x),$$
where $\Gamma(\nu)$ is the gamma function and $\gamma(\nu, \omega x)$ is the lower incomplete gamma function. Applying the transformation (3.18) and writing $w := \gamma^{-1}(\nu, \bar x\,\Gamma(\nu))$ for brevity yields
$$\mu_{\bar X}(\bar x; \theta) = \left[\kappa\left(\frac{\nu}{2\kappa} - \frac{w}{2\kappa}\right) + \frac{\nu - 1 - w}{2}\right] \frac{2\kappa}{\Gamma(\nu)}\, w^{\nu - 1} e^{-w}$$
and
$$\sigma^2_{\bar X}(\bar x; \theta) = 2\kappa\, w \left[\frac{1}{\Gamma(\nu)}\, w^{\nu - 1} e^{-w}\right]^2.$$
Note that $\mu_{\bar X}(\bar x; \theta)$ and $\sigma^2_{\bar X}(\bar x; \theta)$ only depend on $\kappa$ and $\nu$, which means we can only identify $\alpha$ and $\sigma^2$ up to a ratio, say $\alpha^* = \alpha/\sigma^2$. Hence, either $\alpha$ or $\sigma^2$ must be fixed, which is in accordance with what we found when applying the first identification strategy to the CIR. We could, for example, set $\sigma^2 = 2\kappa$ which leads to the following normalized CIR,
$$dX_t = \kappa(\alpha - X_t)\,dt + \sqrt{2\kappa X_t}\,dW_t.$$

In this section we develop two alternative semiparametric estimators of $\theta$ and $V$ for a given specification of the UPD. The first takes the form of a two-step Pseudo Maximum Likelihood Estimator (PMLE). The second is a semiparametric sieve-based ML estimator (SMLE). We consider two different scenarios when developing estimators: In the first one (see Section 4.1), $Y$ is observed at low frequency, which we formally define as the case when $\Delta > 0$ is fixed as $n \to \infty$. In the second one (see Section 4.2), high-frequency data is available so that $\Delta \to 0$ as $n \to \infty$.

To motivate the two estimators, suppose that $U$ is known, in which case the MLE of $\theta$ is given by
$$\hat\theta_{MLE} = \arg\max_{\theta \in \Theta} L_n(\theta, U),$$
where $L_n(\theta, U)$ is the log-likelihood of $\{Y_{i\Delta} : i = 0, 1, ..., n\}$,
$$L_n(\theta, U) = \frac{1}{n} \sum_{i=1}^{n} \left\{\log p_X\left(U(Y_{i\Delta}) \mid U(Y_{(i-1)\Delta}); \theta\right) + \log U'(Y_{i\Delta})\right\}, \tag{4.1}$$
where $p_X$ is defined in eq. (2.3). If $U$ is unknown, the above estimator is not feasible and we instead have to estimate it together with $\theta$.

Our PMLE assumes $Y$ is stationary in which case $U$ satisfies eq. (2.16), where $F_X$ is known up to $\theta$ while $F_Y$ is unknown.
The latter can be estimated by the empirical cdf defined as
$$\tilde F_Y(y) = \frac{1}{n + 1} \sum_{i=0}^{n} I\{Y_{i\Delta} \leq y\},$$
where $I\{\cdot\}$ denotes the indicator function, or alternatively by the following kernel smoothed empirical cdf,
$$\hat F_Y(y) = \frac{1}{n + 1} \sum_{i=0}^{n} \mathcal{K}_h(y - Y_{i\Delta}), \tag{4.2}$$
where $\mathcal{K}_h(y) = \mathcal{K}(y/h)$ with $\mathcal{K}(y) = \int_{-\infty}^{y} K(z)\,dz$, $K$ being a kernel (e.g., the standard normal density), and $h > 0$ a bandwidth. Replacing $F_Y$ in eq. (2.16) with either $\tilde F_Y$ or $\hat F_Y$, we obtain the following two alternative estimators of $U$,
$$\tilde U(y; \theta) = F_X^{-1}(\tilde F_Y(y); \theta); \qquad \hat U(y; \theta) = F_X^{-1}(\hat F_Y(y); \theta). \tag{4.3}$$
Since $\hat F_Y(y) = \tilde F_Y(y) + O(h^2)$, the above two estimators of $U$ will be first-order asymptotically equivalent under appropriate bandwidth conditions. A natural way to estimate $\theta$ in our semiparametric framework would then be to substitute either $\hat U(y; \theta)$ or $\tilde U(y; \theta)$ into $L_n(\theta, U)$. In the latter case, this is not possible since $L_n(\theta, U)$ depends on $U'$ and $\tilde U$ is not differentiable. Note, however, that
$$U'(y) = \frac{f_Y(y)}{f_X(U(y); \theta)}, \tag{4.4}$$
so that $\log U'(y) = \log f_Y(y) - \log f_X(U(y); \theta)$. Since the first term is parameter independent, it can be ignored and so we arrive at the following semiparametric PMLE,
$$\hat\theta_{PMLE} = \arg\max_{\theta \in \Theta} \bar L_n(\theta, \tilde U(\cdot; \theta)),$$
where $\Theta$ is the parameter space and
$$\bar L_n(\theta, U) = \frac{1}{n} \sum_{i=1}^{n} \left\{\log p_X\left(U(Y_{i\Delta}) \mid U(Y_{(i-1)\Delta}); \theta\right) - \log f_X(U(Y_{i\Delta}); \theta)\right\}$$
is $L_n(\theta, U) - \sum_{i=1}^{n} \log f_Y(Y_{i\Delta})/n$. One can easily check that, by rewriting the above in terms of the implied copula of $X$, this estimator is equivalent to the one analyzed in Chen and Fan (2006).

Our second proposal, the SMLE, replaces the unknown density function $f_Y(y)$ by a sieve approximation $f_{Y,m}(y) \in \mathcal{F}_m$ where $\mathcal{F}_m$ is a finite-dimensional function space reflecting the properties of $f_Y$, $m = 1, 2, ...$.
For a given candidate density, we then compute $U(y; f_{Y,m}, \theta) = F_X^{-1}(F_{Y,m}(y); \theta)$ where $F_{Y,m}(y) = \int_{y_l}^{y} f_{Y,m}(z)\,dz$. Substituting this into the likelihood function yields the following semiparametric sieve maximum-likelihood estimator,
$$(\hat\theta_{SMLE}, \hat f_{Y,m}) = \arg\max_{\theta \in \Theta,\, f_{Y,m} \in \mathcal{F}_m} L_n(\theta, U(\cdot; f_{Y,m}, \theta)). \tag{4.5}$$
The above SMLE is identical to the one proposed by Chen, Wu and Yi (2009) for the estimation of copula-based Markov models, except that while they estimate the parameters of a copula function, we estimate those of the drift and diffusion functions of the UPD. In comparison with the PMLE, the numerical implementation of the SMLE involves joint maximization over both $\theta$ and $\mathcal{F}_m$, which is a harder numerical problem and potentially more time-consuming. In terms of statistical efficiency, $\hat\theta_{SMLE}$ will in general reach the semiparametric efficiency bound under stationarity, while the PMLE is inefficient.

Both of the above estimators require us to evaluate $F_X^{-1}(x; \theta)$, which in general is not available in closed form and so has to be computed using numerical methods, e.g., numerical integration or Monte Carlo methods combined with an equation solver. For the SMLE, one can circumvent this issue by directly approximating $U$ instead of $f_Y$: For a given finite-dimensional function space of one-to-one transformations $\mathcal{U}_m$, an alternative to the SMLE in (4.5) is
$$(\tilde\theta_{SMLE}, \tilde U_m) = \arg\max_{\theta \in \Theta,\, U_m \in \mathcal{U}_m} L_n(\theta, U_m).$$
We expect this to be computationally more efficient compared to the density version above; the theoretical analysis of this alternative SMLE is left for future research.

Once an estimator for $\theta$ has been obtained, we can estimate the drift and diffusion terms of $Y$ using the expressions given in (2.6) and (2.7) by replacing $\theta$ and $U$ with their estimators. However, this involves estimating the first and second derivative of $U$. For the SMLE this is not an issue assuming that $\mathcal{F}_m$ is a differentiable function space.
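Returning to the two-step PMLE defined above, the following Python sketch illustrates it for the simplest case in which the UPD is an OU process normalised to have a standard normal stationary distribution, $dX_t = -\kappa X_t\,dt + \sqrt{2\kappa}\,dW_t$, so that $F_X = \Phi$ does not depend on $\kappa$ and $F_X^{-1}$ is available without numerical inversion. Everything in the block is an illustrative assumption rather than the authors' implementation: the function names, the transformation $V = \exp$ used to generate data, the grid search standing in for a proper optimiser, and the rescaling of the empirical cdf by the number of observations plus one (a common device that keeps the pseudo-observations away from 0 and 1).

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def simulate_ou(kappa, delta, n, rng):
    """Exact draws of dX = -kappa X dt + sqrt(2 kappa) dW at spacing delta (stationary N(0,1))."""
    rho = math.exp(-kappa * delta)
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(n):
        x.append(rho * x[-1] + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0))
    return x


def pseudo_observations(y):
    """First step: U_tilde(Y_i) = Phi^{-1}(rank_i / (N + 1)), N = number of observations."""
    order = sorted(range(len(y)), key=y.__getitem__)
    u = [0.0] * len(y)
    for rank, idx in enumerate(order, start=1):
        u[idx] = STD_NORMAL.inv_cdf(rank / (len(y) + 1))
    return u


def pseudo_loglik(kappa, x, delta):
    """Average of log p_X(x_i | x_{i-1}; kappa) - log f_X(x_i) for the normalised OU."""
    rho = math.exp(-kappa * delta)
    s2 = 1.0 - rho * rho
    total = 0.0
    for x0, x1 in zip(x[:-1], x[1:]):
        log_p = -0.5 * math.log(2.0 * math.pi * s2) - (x1 - rho * x0) ** 2 / (2.0 * s2)
        log_f = -0.5 * math.log(2.0 * math.pi) - 0.5 * x1 * x1
        total += log_p - log_f
    return total / (len(x) - 1)


def pmle_kappa(y, delta, grid):
    """Second step: maximise the pseudo log-likelihood over a grid of kappa values."""
    x = pseudo_observations(y)
    return max(grid, key=lambda k: pseudo_loglik(k, x, delta))


if __name__ == "__main__":
    rng = random.Random(0)
    latent = simulate_ou(kappa=1.0, delta=0.1, n=2000, rng=rng)
    observed = [math.exp(v) for v in latent]  # hypothetical transformation V = exp
    grid = [0.05 * j for j in range(1, 101)]
    print(pmle_kappa(observed, 0.1, grid))
```

Because the first step only uses the ranks of the observations, the estimate is unchanged by any strictly increasing transformation of the data, which mirrors the fact that $V$ is left unspecified. For UPDs whose marginal cdf depends on $\theta$ or lacks a closed-form inverse, the first step would have to be recomputed for each candidate $\theta$ using a numerical solver.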
For the PMLE, since $\tilde U(y; \theta)$ is not differentiable, we instead use the kernel smoothed version $\hat U(y; \theta)$, leading to the following three-step estimators of the drift and diffusion functions,
$$\hat\mu_Y(y) = \frac{\mu_X(\hat U(y); \hat\theta_{PMLE})}{\hat U'(y)} - \frac{1}{2}\,\sigma^2_X(\hat U(y); \hat\theta_{PMLE})\,\frac{\hat U''(y)}{\hat U'(y)^3}, \tag{4.6}$$
$$\hat\sigma^2_Y(y) = \frac{\sigma^2_X(\hat U(y); \hat\theta_{PMLE})}{\hat U'(y)^2}, \tag{4.7}$$
where $\hat U(y) = F_X^{-1}(\hat F_Y(y); \hat\theta_{PMLE})$.

We now turn to the case where high-frequency data is available; this scenario is formally modelled as $\Delta \to 0$ and $n \to \infty$. The proposed estimators described in the previous section remain valid, but an alternative estimation method is available in this case since the exact density of the underlying UPD, $p_X$, is well-approximated by
$$\hat p_X(x \mid x_0; \theta) = \frac{1}{\sqrt{2\pi\Delta\,\sigma^2_X(x_0; \theta)}} \exp\left[-\frac{\left(x - x_0 - \mu_X(x_0; \theta)\Delta\right)^2}{2\sigma^2_X(x_0; \theta)\Delta}\right] \tag{4.8}$$
as $\Delta \to 0$, c.f. Kessler (1997). We then propose to estimate $\theta$ using either the two-step or sieve approach described in the previous section, except that we here replace $p_X(x \mid x_0; \theta)$ with its high-frequency approximation, $\hat p_X(x \mid x_0; \theta)$, in the definition of $L_n(\theta, U)$ and $\bar L_n(\theta, U)$. The advantage of doing so is computational in that $\hat p_X(x \mid x_0; \theta)$ is available in closed form for any given UPD while $p_X(x \mid x_0; \theta)$ generally can only be evaluated using numerical methods as pointed out earlier.

For most standard UPDs, the parameters can be decomposed into $\theta = (\theta_1, \theta_2)$ so that $\mu_X(x; \theta) = \mu_X(x; \theta_1)$ and $\sigma^2_X(x; \theta) = \sigma^2_X(x; \theta_2)$ only depend on the first and second component, respectively. One could hope to be able to estimate $\theta_1$ and $\theta_2$ separately in this case. For known $U$, this is indeed possible. We could, for example, use least-squares methods similar to Kanaya and Kristensen (2018) where $\theta_1$ and $\theta_2$, respectively, are estimated by the minimizers of the following two least-squares objectives,
$$L^{(\mu)}_{n,\Delta}(\theta_1; U) = \sum_{i=1}^{n} w_i^{(\mu)} \left(U(Y_{i\Delta}) - U(Y_{(i-1)\Delta}) - \mu_X(U(Y_{(i-1)\Delta}); \theta_1)\Delta\right)^2, \tag{4.9}$$
$$L^{(\sigma)}_{n,\Delta}(\theta_2; U) = \sum_{i=1}^{n} w_i^{(\sigma)} \left(\left\{U(Y_{i\Delta}) - U(Y_{(i-1)\Delta})\right\}^2 - \sigma^2_X(U(Y_{(i-1)\Delta}); \theta_2)\Delta\right)^2, \tag{4.10}$$
where $w_i^{(\mu)} = w^{(\mu)}(Y_{(i-1)\Delta}, Y_{i\Delta})$ and $w_i^{(\sigma)} = w^{(\sigma)}(Y_{(i-1)\Delta}, Y_{i\Delta})$ are weighting functions.

This approach, however, faces two complications in our setting: First, after applying any of the three normalizations presented in Section 3 in order to achieve identification, the resulting drift and diffusion of the UPD tend to share parameters. Second, $U$ is unknown and has to be estimated together with $\theta$. In the case of PMLE, $\tilde U(y; \theta)$ in eq. (4.3) generally depends on both $\theta_1$ and $\theta_2$ since $f_X(x; \theta)$ does.
Thus, if we replace $U$ by $\tilde U(y; \theta)$ in the above objectives, we cannot separately estimate $\theta_1$ and $\theta_2$. Similarly, the SMLE requires joint estimation of $U$ together with $\theta$ in which case it would have to be re-estimated for each of the two objectives. In conclusion, these least-squares estimators are rarely useful in practice.

Another alternative approach, inspired by Bandi and Phillips (2007), see also Kristensen (2011), would be to first obtain nonparametric estimates of $\mu_Y$ and $\sigma^2_Y$ and then match these with the ones implied by the copula model,
$$Q^{(\mu)}_{n,\Delta}(S) = \sum_{i=1}^{n} w_i^{(\mu)} \left(\hat\mu_Y(Y_{i\Delta}) - \mu_Y(Y_{i\Delta}; S)\right)^2, \qquad Q^{(\sigma)}_{n,\Delta}(S) = \sum_{i=1}^{n} w_i^{(\sigma)} \left(\hat\sigma^2_Y(Y_{i\Delta}) - \sigma^2_Y(Y_{i\Delta}; S)\right)^2,$$
where $\hat\mu_Y(\cdot)$ and $\hat\sigma^2_Y(\cdot)$ are the first-step nonparametric estimators; see Bandi and Phillips (2007) for their precise forms. This procedure suffers from the same issue as the least-squares one described in the previous paragraph. An additional complication is that it involves multiple smoothing parameters: First, $\hat\mu_Y(\cdot)$ and $\hat\sigma^2_Y(\cdot)$ depend on two bandwidths and converge at slow rates and, second, $\mu_Y(\cdot; S)$ and $\sigma^2_Y(\cdot; S)$ involve derivatives of $U$ and so if we replace $U$ by its kernel-smoothed estimator, $\hat U$, the two objective functions will depend on the first and second order derivatives of the kernel density estimator of $f_Y$, which in turn depends on an additional bandwidth. Altogether, these estimators will be complicated to implement due to the multiple bandwidths that the econometrician has to choose. Moreover, their asymptotic analysis and behaviour will be non-standard.

We here establish an asymptotic theory for the proposed estimators in the case of low-frequency data ($\Delta > 0$ fixed).

Assumption 4.1. $S$ is identified.

The previous section provided three different sets of primitive conditions for Assumption 4.1 to hold in terms of $(\mu_Y(\cdot; S), \sigma^2_Y(\cdot; S))$.
This combined with Assumption 3.1 then implies that the mapping $(\mu_Y(\cdot; S), \sigma^2_Y(\cdot; S)) \mapsto p_Y(y \mid y_0; S)$ is injective so that different drift and diffusion terms lead to different transition densities. One implication of Assumptions 3.1 and 4.1 is $E[\log p_Y(Y_\Delta \mid Y_0; S)] < E[\log p_Y(Y_\Delta \mid Y_0; S_0)]$ for any $S \neq S_0$, c.f. Newey and McFadden (1994, Lemma 2.2). This ensures that the SMLE identifies $S_0$ in the limit. Regarding the PMLE, we note that it replaces $U$ by $\hat U(y; \theta) = F_X^{-1}(\hat F_Y(y); \theta)$. By the LLN for stationary and ergodic sequences, $\hat U(y; \theta) \to^P U(y; \theta) = F_X^{-1}(F_Y(y); \theta)$, where, by the same arguments as before, $E[\log p_Y(Y_\Delta \mid Y_0; \theta, U(\cdot; \theta))] < E[\log p_Y(Y_\Delta \mid Y_0; \theta_0, U(\cdot; \theta_0))]$. Thus, the PMLE will also in the limit identify $\theta_0$.

Next, we import conditions from Chen et al. (2010) guaranteeing, in conjunction with our own Assumptions 2.1-2.2, that the UPD $X$, and thereby $Y$, is stationary and $\beta$-mixing with mixing coefficients decaying at either polynomial rate (c.f. Corollary 5.5 in Chen et al., 2010) or geometric rate (c.f. Corollary 4.2 in Chen et al., 2010):

Assumption 4.2. (i) $\mu_X$ and $\sigma^2_X$ satisfy
$$\lim_{x \to x_r} \left\{\frac{\mu_X(x; \theta)}{\sigma_X(x; \theta)} - \frac{1}{2}\frac{\partial \sigma_X(x; \theta)}{\partial x}\right\} \leq 0, \qquad \lim_{x \to x_l} \left\{\frac{\mu_X(x; \theta)}{\sigma_X(x; \theta)} - \frac{1}{2}\frac{\partial \sigma_X(x; \theta)}{\partial x}\right\} \geq 0;$$
(ii) with $s(x; \theta)$ and $S(x; \theta)$ defined in (2.4),
$$\lim_{x \to x_r} \left\{\frac{s(x; \theta)\,\sigma_X(x; \theta)}{S(x; \theta)}\right\} > 0, \qquad \lim_{x \to x_l} \left\{\frac{s(x; \theta)\,\sigma_X(x; \theta)}{S(x; \theta)}\right\} < 0.$$

Finally, we impose the same conditions as used in the asymptotic analysis of the PMLE in Chen and Fan (2006) and Chen, Wu and Yi (2009), respectively, on the copula implied by the chosen UPD and the sieve density in the case of SMLE:

Assumption 4.3.
(i) $c_X(u_0, u; \theta)$ defined in (2.17) satisfies the regularity conditions set out in Chen and Fan (2006, A1-A3, A4 or A4', A5-A6); (ii) $c_X(u_0, u; \theta)$ and the sieve space $\mathcal{F}_m$ satisfy Assumptions 3.1-3.4 and 4.1–4.7, respectively, in Chen, Wu and Yi (2009).

We here abstain from stating the precise, mostly technical, conditions and refer the interested reader to Chen and Fan (2006) and Chen, Wu and Yi (2009); broadly speaking their conditions translate into moment bounds and smoothness conditions on the log-transition density of the UPD. These conditions depend on the precise choice of the UPD and so will have to be verified on a case-by-case basis. In Appendix B, we verify the conditions for the models in Examples 1–2.

The following result now follows from the general theory of Chen and Fan (2006) and Chen, Wu and Yi (2009), respectively:

Theorem 5.1
Under Assumptions 2.1-2.2, 4.1, 4.2(i) and 4.3(i),
$$\sqrt{n}(\hat\theta_{PMLE} - \theta_0) \to^d N\left(0, B^{-1}\Sigma B^{-1}\right),$$
where $B$ and $\Sigma$ are defined in Chen and Fan (2006, A1 and A$^*_n$). Under Assumptions 2.1-2.2, 4.1, 4.2(ii) and 4.3(ii),
$$\sqrt{n}(\hat\theta_{SMLE} - \theta_0) \to^d N\left(0, I_*^{-1}(\theta_0)\right),$$
where $I_*$ is defined in Chen, Wu and Yi (2009).

Consistent estimators of the asymptotic variances, $B^{-1}\Sigma B^{-1}$ and $I_*^{-1}(\theta_0)$, can be found in Chen and Fan (2006) and Chen, Wu and Yi (2009), respectively.

Next, we discuss the asymptotic properties of the PMLE based on the high-frequency log-likelihood that takes as input $\hat p_X(x \mid x_0; \theta)$ defined in eq. (4.8); a complete analysis of the PMLE and SMLE in a high-frequency setting is left for future research. In the following, we let $T := n\Delta$ denote the sampling range, which will be assumed to diverge as $\Delta \to 0$. The high-frequency PMLE is $\hat\theta_{PMLE} = \arg\max_{\theta \in \Theta} \hat L_n(\theta, \tilde U(\cdot; \theta))$ where
$$\hat L_n(\theta, U) = \frac{1}{n} \sum_{i=1}^{n} \left\{\log \hat p_X\left(U(Y_{i\Delta}) \mid U(Y_{(i-1)\Delta}); \theta\right) - \log f_X(U(Y_{i\Delta}); \theta)\right\},$$
and $\tilde U(Y_{i\Delta}; \theta)$ is defined in (4.3). We first specialize the general result of Kanaya (2018, Theorem 2) by choosing $B = \psi = 1$ and $K_h(y) = I\{y \leq 0\}$ in his notation to obtain that under our Assumption 4.2,
$$\sup_{y \in \mathcal{Y}} \left|\tilde F_Y(y) - F_Y(y)\right| = O_P\left(\sqrt{\Delta}/\log \Delta\right) + O_P\left(\log T/\sqrt{T}\right), \tag{5.1}$$
where the two terms on the right-hand side correspond to discretization bias and sampling variance, respectively. By letting $T$ grow sufficiently fast as $\Delta \to 0$, the first term can be ignored. Under regularity conditions on $\mu_X$ and $\sigma^2_X$ so that $(y_0, y) \mapsto \hat p_X\left(F_X^{-1}(y; \theta) \mid F_Y^{-1}(y_0); \theta\right)/f_Y(y)$ satisfies Lipschitz conditions similar to the ones in Chen and Fan (2006), we then obtain
$$\sup_{\theta \in \Theta} \left|\hat L_n(\theta, \tilde U(\cdot; \theta)) - \hat L_n(\theta, U(\cdot; \theta))\right| = O_P\left(\sqrt{\Delta}/\log \Delta\right) + O_P\left(\log T/\sqrt{T}\right),$$
where $U(y; \theta) = F_X^{-1}(F_Y(y); \theta)$. Consistency of the PMLE now follows by extending the arguments of Kessler (1997) to allow for the presence of the parameter-dependent transformation $U(y; \theta)$.

Next, to simplify our discussion of the asymptotic distribution of the PMLE, we consider two special cases:

First, suppose that, after suitable normalizations, $\sigma^2_X(x)$ is known and only $\mu_X(x; \theta)$ is parameter dependent. In this case, we expect that Kessler's results generalize so that $\hat\theta_{PMLE}$ will converge at $\sqrt{T}$-rate towards a Normal distribution, where the asymptotic variance will have to be adjusted to take into account the first-step estimation of $F_Y$.

Next, consider the opposite scenario where $\mu_X(x)$ is known and only $\sigma^2_X(x; \theta)$ is parameter dependent. With $U$ known, Kessler (1997) shows that $\hat\theta_{PMLE}$ converges at $\sqrt{n}$-rate towards a Normal distribution in this case. Note the faster convergence rate compared to the drift estimator. However, in our setting $U(y; \theta)$ is parameter dependent, and as a consequence this result appears to no longer apply: $U(y; \theta)$ enters $\hat L_n(\theta, U)$ in the same way that $\mu_X$ does and so the score of $\hat L_n(\theta, U(\cdot; \theta))$ will have a component of the same form as in the first case and so will converge at $\sqrt{T}$-rate instead of $\sqrt{n}$-rate. Moreover, the presence of the first-step estimator $\tilde F_Y(y)$, which also converges at $\sqrt{T}$-rate, will generate an additional variance term.
In total, estimators of diffusion parameters appear not to enjoy "super" consistency in our setting due to the way that the unknown transformation $U$ enters the likelihood.

We here analyze the asymptotic properties of the kernel-based estimators of $\mu_Y$ and $\sigma^2_Y$ given in eqs. (4.6)-(4.7). We only do so for the low-frequency case; the analysis of the high-frequency case should proceed in a similar fashion. Our analysis takes as starting point the following regularity conditions on the estimator of the parametric component and the kernel function:

Assumption 4.4.
The transformation function V is four times continuously differentiable. Assumption 4.5.
The estimator $\hat\theta$ of the parameter of the UPD $X$ is $\sqrt{n}$-consistent.

Assumption 4.6.
The kernel $K$ is differentiable, and there exist constants $D, \omega >$ such that
$$\left|K^{(i)}(z)\right| \leq D|z|^{-\omega}, \qquad \left|K^{(i)}(z) - K^{(i)}(\tilde z)\right| \leq D|z - \tilde z|, \qquad i = 0, 1,$$
where $K^{(i)}(z)$ denotes the $i$th derivative of $K(z)$. Moreover, $\int_{\mathbb{R}} K(z)\,dz = 1$, $\int_{\mathbb{R}} zK(z)\,dz = 0$ and $\kappa_2 = \int_{\mathbb{R}} z^2 K(z)\,dz < \infty$.

Assumption 4.4 ensures the existence of the 3rd and 4th derivatives of $U(y)$, which in turn ensures that relevant quantities entering the asymptotic distributions of $\hat\mu_Y$ and $\hat\sigma^2_Y$ are well defined. Assumption 4.5 implies that the asymptotic properties of $\hat\mu_Y$ and $\hat\sigma^2_Y$ are determined by the properties of the kernel density estimator alone. The proposed PMLE and SMLE satisfy this condition under our Assumptions 4.1-4.3, but other $\sqrt{n}$-consistent estimators are allowed for. Assumption 4.6 regulates the kernel functions and allows for most standard kernels such as the Gaussian and the Uniform kernels. Using the functional delta-method together with standard results for kernel density estimators, as found in Robinson (1983), we obtain:

Theorem 5.2
Under Assumptions 2.1-2.2, 4.2(i), and 4.4-4.6, we have as $n \to \infty$, $h \to 0$ and $nh^3 \to \infty$,
$$\sqrt{nh^3}\left\{\hat\mu_Y(y) - \mu_Y(y) - h^2 B_{\mu_Y}(y)\right\} \to^d N\left(0, V_{\mu_Y}(y)\right),$$
where
$$B_{\mu_Y}(y) = -\kappa_2\,\frac{\sigma^2_Y(y)\,f'''_Y(y)}{4 f_Y(y)}, \qquad V_{\mu_Y}(y) = \frac{\sigma^4_Y(y)}{4 f_Y(y)} \int_{\mathbb{R}} K'(z)^2\,dz.$$
Also, as $n \to \infty$, $h \to 0$ and $nh \to \infty$, we have
$$\sqrt{nh}\left\{\hat\sigma^2_Y(y) - \sigma^2_Y(y) - h^2 B_{\sigma^2_Y}(y)\right\} \to^d N\left(0, V_{\sigma^2_Y}(y)\right),$$
where
$$B_{\sigma^2_Y}(y) = -\kappa_2\,\frac{\sigma^2_Y(y)\,f''_Y(y)}{f_Y(y)}, \qquad V_{\sigma^2_Y}(y) = \frac{4\sigma^4_Y(y)}{f_Y(y)} \int_{\mathbb{R}} K(z)^2\,dz.$$

We see that both estimators suffer from smoothing biases, $B_{\mu_Y}(y)$ and $B_{\sigma^2_Y}(y)$. If $h \to 0$ sufficiently fast, these biases are asymptotically negligible.

In this section, we compare the finite sample performance of our low-frequency semiparametric PMLE with that of a fully parametric PMLE (described below) through Monte Carlo simulations.
We consider the following normalized versions of the UPDs of Examples 1–2,
$$\text{OU}: \quad dX_t = -\kappa X_t\,dt + \sqrt{2\kappa}\,dW_t, \qquad \theta = \kappa, \tag{6.1}$$
$$\text{CIR}: \quad dX_t = \kappa(\alpha - X_t)\,dt + \sqrt{2\kappa X_t}\,dW_t, \qquad \theta = (\kappa, \alpha). \tag{6.2}$$
The chosen normalizations have the advantage that the marginal distributions of $X$ are invariant to the mean-reversion parameter $\kappa$. Hence, by varying $\kappa$, we can change the persistence level of $X$ (and thus $Y$) while keeping the marginal distributions fixed. In this way, we can examine the impact of persistence on the performance of the proposed estimators of $\theta$, $\mu_Y$ and $\sigma^2_Y$.

Next, we specify the transformation of the DGP of $Y$. This is done by choosing a marginal cdf $F_Y(y; \phi)$, where $\phi$ is a hyperparameter governing the shape of the cdf, which induces the transformation $V(X_t; \phi) = F_Y^{-1}(F_X(X_t; \theta); \phi)$. With $f_Y(y; \phi) = F'_Y(y; \phi)$, the transition density of the true DGP of $Y$ then takes the form
$$p_Y(y \mid y_0; \theta, \phi) = f_Y(y; \phi)\,c_X(F_Y(y_0; \phi), F_Y(y; \phi); \theta). \tag{6.3}$$
We choose $F_Y(y; \phi)$ as a flexible distribution to reflect stylized features such as asymmetry and fat-tailedness of observed financial data. Specifically, we use the Skewed Student-$t$ (SKST) Distribution of Hansen (1994) with density
$$f_Y(y; \phi) = \begin{cases} \dfrac{bq}{v}\left[1 + \dfrac{1}{\tau - 2}\left(\dfrac{\frac{b}{v}(y - m) + a}{1 - \lambda}\right)^2\right]^{-(\tau + 1)/2} & \text{if } y < m - av/b, \\[2ex] \dfrac{bq}{v}\left[1 + \dfrac{1}{\tau - 2}\left(\dfrac{\frac{b}{v}(y - m) + a}{1 + \lambda}\right)^2\right]^{-(\tau + 1)/2} & \text{if } y \geq m - av/b, \end{cases} \tag{6.4}$$
where $v > 0$, $2 < \tau < \infty$, $-1 < \lambda < 1$, $a = 4\lambda q\left(\frac{\tau - 2}{\tau - 1}\right)$, $b^2 = 1 + 3\lambda^2 - a^2$ and $q = \Gamma((\tau + 1)/2)/\left(\sqrt{\pi(\tau - 2)}\,\Gamma(\tau/2)\right)$. The distribution has four parameters, $\phi = (m, v, \lambda, \tau)$, which have to be chosen in order to fully specify the DGP. While $m$ and $v$ are the unconditional mean and standard deviation of the distribution, $\lambda$ controls the skewness and $\tau$ controls the degrees of freedom (hence the fat-tailedness) of the distribution. The distribution reduces to the usual Student-$t$ distribution when $\lambda = 0$. Due to its flexibility in modelling skewness and kurtosis, the SKST distribution is often used in financial modelling (c.f. Patton, 2004; Jondeau and Rockinger, 2006; Bu, Fredj and Li, 2017).

The transformed diffusion $Y$ generated by the SKST marginal distribution together with the normalized UPD in (6.1) or (6.2) is referred to as the OU-SKST or the CIR-SKST model, respectively. The true data-generating parameters $\phi$ and $\theta$ are chosen as estimates obtained from fitting the parametric versions of the two models to the 7-day Eurodollar interest rate time series used in Aït-Sahalia (1996b). The estimation is based on a fully parametric two-stage PMLE. In the first stage, the SKST distribution is fitted to the data (as if they were i.i.d.) to obtain $\hat\phi$. We then substitute $F_Y(y; \hat\phi)$ and $f_Y(y; \hat\phi)$ into (6.3) which is then maximized with respect to $\theta$ to obtain $\hat\theta$ for each of the two UPDs. The calibrated parameter values of the marginal SKST distribution are ( ˆ m, ˆ v, ˆ λ, ˆ τ ) = (0. , . , . , . κ = 1 . κ, ˆ α ) = (0. , . n = 2202 and n = 5505, respectively, are then generated using $\phi = \hat\phi$ and $\theta = \hat\theta$ as our true data-generating parameters. For both OU-SKST and CIR-SKST, $\theta$ involves the mean-reversion parameter $\kappa$ which controls the level of persistence. We create 3 additional scenarios by multiplying $\kappa$ by factors of 5, 10, and 20 while keeping everything else unchanged. Collectively, we have a total of 8 cases corresponding to 2 sample sizes and 4 persistence levels. The maximum factor 20 is chosen because the implied 1st-order autocorrelation coefficient ρ ≈ .
We compare our low-frequency PMLE of $\theta$ with the corresponding fully parametric PMLE (PPMLE) described above that we used for our calibration. Note that the only difference between the two estimators is that the former estimates the marginal distribution $F_Y$ nonparametrically, while the latter estimates it parametrically.

The relative bias and RMSE (defined as the ratios of the actual bias and the actual RMSE over the true parameter value, respectively) of the estimators of the parametric components of the OU-SKST case are presented in Table 1. Overall, the results from the two estimation methods are comparable and of the same order of magnitude. The semiparametric PMLE tends to do better in terms of bias while the parametric PMLE dominates in terms of variance. However, as the level of persistence decreases, the two estimators' performance is close to identical.

[Table 1]

The results for the CIR-SKST case are presented in Tables 2 and 3, which are qualitatively very similar to the ones for the OU-SKST. Overall, the performance of the PMLE is comparable with that of the PPMLE, with very similar estimation errors. Moreover, the gap in the performance of the PMLE relative to the PPMLE appears to narrow when the true DGP gets less persistent.

[Table 2 and 3]

Next, we investigate the performance of the semiparametric estimators of $\mu_Y$ and $\sigma_Y$ in eqs. (4.6)-(4.7) relative to their fully parametric estimators. In Figure 2, we plot their pointwise means and 95% confidence bands from the 500 estimates against the truth for the OU-SKST process with $\kappa = 22.753$ and sample size 2202. First, it is worth noting that $\mu_Y$ and $\sigma_Y$ exhibit strong nonlinearities that closely resemble the nonlinearities depicted in, for example, Aït-Sahalia (1996b), Jiang and Knight (1997), and Stanton (1997). Second, the mean estimates from both estimation methods are fairly close to the truth, but the variability of the semiparametric estimators is noticeably larger than that of the parametric ones, especially at the right end of the range. This is not surprising. First, as shown in Theorem 5.2, $\hat\mu_Y$ and $\hat\sigma_Y$ converge at a slower than $\sqrt{n}$-rate due to the use of kernel estimators of $f_Y$. Second, from Figure 1, we can see that $f_Y$ has a long right tail, which is difficult to estimate by the kernel estimator in small and moderate samples. Figure 3 presents the same estimators at sample size 5505. At this larger sample size, the bias is even smaller for both methods and the variability of these estimates is also reduced significantly. Overall, although the parametric method obviously has the advantage due to its parametric structure, our semiparametric method also provides fairly satisfactory estimation results.

[Figure 2 and 3]

The drift and diffusion estimators from the two methods where the true DGP is the CIR-SKST process with $\kappa = 15.307$ and the two sample sizes are presented in Figures 4 and 5, respectively. Almost identical qualitative conclusions can be reached.

[Figure 4 and 5]
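The semiparametric PMLE used in this comparison can be illustrated for the OU case, where the implied copula is Gaussian. The sketch below is not the authors' implementation: it assumes the normalized OU process has a standard normal stationary law, so that consecutive observations have copula correlation $\rho = e^{-\kappa\Delta}$; the sampling interval `delta`, the grid search, the rescaled empirical CDF and all function names are illustrative choices.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def simulate_transformed_ou(n, kappa, delta, transform, seed=0):
    """Exact discrete sample of a unit-variance OU process, mapped through `transform`."""
    rng = random.Random(seed)
    rho = math.exp(-kappa * delta)
    innov_sd = math.sqrt(1.0 - rho * rho)
    x = rng.gauss(0.0, 1.0)  # draw the initial value from the stationary law
    ys = [transform(x)]
    for _ in range(n):
        x = rho * x + innov_sd * rng.gauss(0.0, 1.0)
        ys.append(transform(x))
    return ys


def normal_scores(ys):
    """Standard normal quantiles of the rescaled empirical CDF, rank / (n + 1)."""
    n = len(ys)
    order = sorted(range(n), key=lambda i: ys[i])
    scores = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        scores[idx] = STD_NORMAL.inv_cdf(rank / (n + 1))
    return scores


def pmle_kappa(ys, delta, grid_size=2000):
    """Maximize the Gaussian-copula pseudo log-likelihood over rho on a grid."""
    z = normal_scores(ys)
    pairs = len(z) - 1
    s_sq = sum(a * a + b * b for a, b in zip(z, z[1:]))
    s_cross = sum(a * b for a, b in zip(z, z[1:]))

    def loglik(rho):
        one_m = 1.0 - rho * rho
        return (-0.5 * pairs * math.log(one_m)
                - (rho * rho * s_sq - 2.0 * rho * s_cross) / (2.0 * one_m))

    candidates = [(k + 0.5) / grid_size for k in range(grid_size)]
    rho_hat = max(candidates, key=loglik)
    return -math.log(rho_hat) / delta


def relative_bias_rmse(estimates, truth):
    """Bias and RMSE divided by the true parameter value, as in Tables 1-3."""
    bias = sum(e - truth for e in estimates) / len(estimates)
    rmse = math.sqrt(sum((e - truth) ** 2 for e in estimates) / len(estimates))
    return bias / truth, rmse / truth
```

Because the estimator depends on the data only through ranks, applying any strictly increasing transformation to the simulated path leaves the estimate of $\kappa$ unchanged, which is the sense in which the marginal distribution is left unspecified.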
As an empirical illustration, we here model the time series dynamics of the CBOE Volatility Index data using copula diffusion models. The data consist of the daily VIX index from January 2, 1990 to July 19, 2019 (7445 observations). They are displayed and summarized in Figure 6 and Table 4, respectively. The time series plot shows a clear pattern of mean reversion, and Augmented Dickey-Fuller tests with reasonable lags all rejected the unit root hypothesis at the 5% significance level, which justifies the use of stationary diffusion models. The mean and the standard deviation of the VIX are 19.21 and 7.76, respectively. Meanwhile, the skewness and the kurtosis are 2.12 and 10.85, respectively, suggesting that the stationary distribution deviates quite substantially from normality. This is more formally confirmed by the highly significant Jarque-Bera test statistic with a negligible $p$-value.

[Figure 6 and Table 4]

We focus on whether two well-known parametric transformed diffusion models proposed for modelling VIX are supported by the data against their semiparametric alternatives. The two parametric models are the transformed-OU model of Detemple and Osakwe (2000) (DO) and the transformed-CIR model of Eraker and Wang (2015) (EW). Specifically, the DO model is the exponential transform of the OU process, which can be written as

$$Y_t = \exp(X_t), \qquad dX_t = \kappa(\alpha - X_t)\,dt + \sigma\,dW_t,$$

and the EW model is a parameter-dependent transformation of the CIR process, which is given by

$$Y_t = \frac{1}{X_t + \delta} + \varrho, \qquad dX_t = \kappa(\alpha - X_t)\,dt + \sigma\sqrt{X_t}\,dW_t.$$

Meanwhile, the two semiparametric models we consider are the same two models considered in our simulations, namely, the nonparametrically transformed OU and CIR models, which we denote as NPTOU and NPTCIR, respectively. Their associated normalized UPD processes are given in (6.1) and (6.2).

Importantly, we maintain the assumption that the VIX is a Markov diffusion process. In particular, we rule out jumps and stochastic volatility (SV) in the VIX, which is inconsistent with the empirical findings of, e.g., Kaeck and Alexander (2013). However, their models are fully parametric and so impose much stronger functional form restrictions on the drift and diffusion components compared to our semiparametric approach. Specifically, jumps and SV components are often used to capture extremal events (fat tails). It is possible that these components are needed in explaining the VIX dynamics due to the restrictive drift and diffusion specifications they consider. Our semiparametric approach allows for more flexibility in this respect and so can be seen as a competing approach to capturing the same features in data.
An interesting research topic would be to develop tools that allow for formal statistical comparison of our class of models against these alternative ones.
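To make the DO and EW specifications concrete, the drift and diffusion of $Y$ implied by a transformation $X = U(Y)$ can be computed from Itô's lemma as $\mu_Y(y) = \mu_X(U(y))/U'(y) - \tfrac{1}{2}\sigma_X^2(U(y))\,U''(y)/U'(y)^3$ and $\sigma_Y(y) = \sigma_X(U(y))/|U'(y)|$. The sketch below applies these formulas to both models; the function names are hypothetical, the parameter values in the usage note are illustrative rather than estimates, and the absolute value is used so that the decreasing EW transformation is also covered.

```python
import math


def transformed_drift_diffusion(y, mu_x, sigma_x, u, du, d2u):
    """Return (mu_Y(y), sigma_Y(y)) implied by X = U(Y) via Ito's lemma."""
    x = u(y)
    sig = sigma_x(x)
    drift = mu_x(x) / du(y) - 0.5 * sig ** 2 * d2u(y) / du(y) ** 3
    diffusion = sig / abs(du(y))
    return drift, diffusion


def do_model(kappa, alpha, sigma):
    """Y = exp(X) with X an OU process, so U(y) = log(y)."""
    return dict(
        mu_x=lambda x: kappa * (alpha - x),
        sigma_x=lambda x: sigma,
        u=math.log,
        du=lambda y: 1.0 / y,
        d2u=lambda y: -1.0 / y ** 2,
    )


def ew_model(kappa, alpha, sigma, delta, varrho):
    """Y = 1 / (X + delta) + varrho with X a CIR process, so U(y) = 1 / (y - varrho) - delta."""
    return dict(
        mu_x=lambda x: kappa * (alpha - x),
        sigma_x=lambda x: sigma * math.sqrt(x),
        u=lambda y: 1.0 / (y - varrho) - delta,
        du=lambda y: -1.0 / (y - varrho) ** 2,
        d2u=lambda y: 2.0 / (y - varrho) ** 3,
    )
```

For example, `transformed_drift_diffusion(20.0, **do_model(4.0, 3.0, 1.0))` should reproduce the closed-form DO drift $y\{\kappa(\alpha - \log y) + \sigma^2/2\}$ and diffusion $\sigma y$, which shows how a simple UPD can generate nonlinear dynamics for the observed process.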
For each of the two UPDs, we examine whether the parametric specification of the transformation is supported by the data. We do this by testing each of the parametric models against the semiparametric alternative where the transformation is left unspecified. The test is based on a pseudo Likelihood Ratio (pseudo-LR) statistic defined as the difference between the pseudo log-likelihood (pseudo-LL) of the semiparametric model and the log-likelihood (LL) of the parametric model. Since the model under the alternative is semiparametric and estimated by pseudo-ML, the pseudo-LR test statistic will not follow a $\chi^2$-distribution. We therefore resort to a parametric bootstrap procedure: For each of the two pseudo-LR tests, we simulate 1000 new time series from the parametric model using as data-generating parameter values the MLEs obtained from the original sample. For each of the 1000 new data sets, of the same size as the original one, we estimate both the parametric model and the semiparametric model and compute the corresponding pseudo-LR statistic. Finally, we use the 95th and 99th quantiles from the simulated distribution of the pseudo-LR statistic as our 5% and 1% bootstrap critical values, respectively.

The pseudo-LL is computed using the log-likelihood given in (4.1) with $U(y;\theta)$ and $\log U'(y;\theta)$ replaced by $\tilde U(y;\theta)$ given in (4.3) and $\log \tilde U'(y;\theta) = \log \hat f_Y(y) - \log f_X(\tilde U(y;\theta);\theta)$, respectively. Here, $\hat f_Y(y)$ is the kernel density estimator, which requires us to choose a bandwidth. There is a lack of consensus on the right procedure for choosing bandwidths for kernel estimators using dependent data. We therefore considered a sequence of bandwidths constructed by multiplying Silverman's rule-of-thumb bandwidth, denoted as $h_S$, by a factor $k$ between 0.75 and 1.75 on a small grid. Visual inspection of these density estimates revealed that with $k$ around 1.5, the resulting density appears to be the most satisfactory in terms of smoothness and the revelation of distributional features of the data. For this reason, we report our inferential results based on the relatively optimal bandwidth $1.5h_S$ = 2 . κ is estimated for the NPTOU model and only $\kappa$ and $\alpha$ for the NPTCIR model. In addition, while $\kappa$ has the same interpretation (i.e. rate of mean reversion) and scale in all four models, $\alpha$ has different scales in the two transformed CIR models. For both the transformed OU and the transformed CIR classes of models, we can see that the PMLEs of the mean-reversion parameter $\hat\kappa$ are slightly lower than their corresponding MLE estimates. The same difference applies to their standard errors. This shows that parametric (mis-)specification of the stationary distribution does have a quite significant impact on the estimation of the dynamic parameters.

[Table 5]

The lower panel presents the LL values and our pseudo-LR test results. We can see that the EW model has a much higher LL (− . − . . . $p$-values of our pseudo-LR tests, obtained from our bootstrap procedure described above. For both tests, we observe that those critical values are all negative and the $p$-values are both exactly zero. This means that the original pseudo-LRs of290 . .

We propose a novel semiparametric approach for modelling stationary nonlinear univariate diffusions. The class of models can be thought of as Markov copula models where the copula is implied by the UPD model. Primitive conditions for the identification of the UPD parameters together with the unknown transformations from discrete samples are provided. We derive the asymptotic properties for our semiparametric likelihood-based estimators of the UPD parameters and kernel-based drift and diffusion estimators.
Our simulation results suggest that our semiparametric method performs well in finite samples compared to the fully parametric method, and our relatively simple application shows that the parametric assumptions on the transformation function of the well-known DO and EW models are rejected by the data against our nonparametric alternatives. Potential future work under this framework may include extensions to multivariate diffusions and jump-diffusions.

References
Ahn, D.-H., Gao, B., 1999. A parametric nonlinear model of term structure dynamics. Review of Financial Studies 12, 721-762.
Aït-Sahalia, Y., 1996a. Nonparametric pricing of interest rate derivatives. Econometrica 64, 527-560.
Aït-Sahalia, Y., 1996b. Testing continuous-time models of the spot interest rate. Review of Financial Studies 9, 385-426.
Aït-Sahalia, Y., 2002. Maximum likelihood estimation of discretely sampled diffusions: a closed-form approximation approach. Econometrica 70, 223-262.
Bandi, F.M., 2002. Short-term interest rate dynamics: A spatial approach. Journal of Financial Economics 65, 73-110.
Bandi, F.M., Phillips, P.C.B., 2003. Fully nonparametric estimation of scalar diffusion models. Econometrica 71, 241-283.
Beare, B.K., 2010. Copulas and temporal dependence. Econometrica 78, 395-410.
Bladt, M., Sørensen, M., 2014. Simple simulation of diffusion bridges with application to likelihood inference for diffusions. Bernoulli 20, 645-675.
Bu, R., Cheng, J., Hadri, K., 2017. Specification analysis in regime-switching continuous-time diffusion models for market volatility. Studies in Nonlinear Dynamics and Econometrics 21(1), 65-80.
Bu, R., Fredj, J., Li, Y., 2017. An empirical comparison of transformed diffusion models for VIX and VIX futures. Journal of International Financial Markets, Institutions and Money 46, 116-127.
Bu, R., Giet, L., Hadri, K., Lubrano, M., 2011. Modelling multivariate interest rates using time-varying copulas and reducible non-linear stochastic differential equations. Journal of Financial Econometrics 9(1), 198-236.
Chen, X., Fan, Y., 2006. Estimation of copula-based semiparametric time series models. Journal of Econometrics 130, 307-335.
Chen, X., Hansen, L.P., Scheinkman, J., 2009. Nonlinear principal components and long run implications of multivariate diffusions. Annals of Statistics 37, 4279-4312.
Chen, X., Hansen, L.P., Carrasco, M., 2010. Nonlinearity and temporal dependence. Journal of Econometrics 155, 155-169.
Chen, X., Wu, W.B., Yi, Y., 2009. Efficient estimation of copula-based semiparametric Markov models. Annals of Statistics 37, 4214-4253.
Choi, S., 2009. Regime-switching univariate diffusion models of the short-term interest rate. Studies in Nonlinear Dynamics and Econometrics 13(1), Article 4.
Conley, T., Hansen, L., Luttmer, E., Scheinkman, J., 1997. Short-term interest rates as subordinated diffusions. Review of Financial Studies 10, 525-577.
Cox, J., Ingersoll, J., Ross, S., 1985. An intertemporal general equilibrium model of asset prices. Econometrica 53, 363-384.
Detemple, J., Osakwe, C., 2000. The valuation of volatility options. European Finance Review 4, 21-50.
Eraker, B., Wang, J., 2015. A non-linear dynamic model of the variance risk premium. Journal of Econometrics 187, 547-556.
Forman, J.L., Sørensen, M., 2014. A transformation approach to modelling multi-modal diffusions. Journal of Statistical Planning and Inference 146, 56-69.
Gobet, E., Hoffmann, M., Reiß, M., 2004. Nonparametric estimation of scalar diffusions based on low frequency data. Annals of Statistics 32, 2223-2253.
Hansen, B., 1994. Autoregressive conditional density estimation. International Economic Review 35, 705-730.
Hansen, L.P., Scheinkman, J., Touzi, N., 1998. Spectral methods for identifying scalar diffusions. Journal of Econometrics 86, 1-32.
Jiang, G., Knight, J., 1997. A nonparametric approach to the estimation of diffusion processes with an application to a short-term interest rate model. Econometric Theory 13, 615-645.
Joe, H., 1997. Multivariate Models and Dependence Concepts. Chapman & Hall, London.
Jondeau, E., Rockinger, M., 2006. The copula-GARCH model of conditional dependencies - an international stock application. Journal of International Money and Finance 25, 827-853.
Kaeck, A., Alexander, C., 2013. Continuous-time VIX dynamics: On the role of stochastic volatility of volatility. International Review of Financial Analysis 28, 46-56.
Kanaya, S., Uniform convergence rates of kernel-based nonparametric estimators for continuous time diffusion processes: A damping function approach. Econometric Theory 33, 874-914.
Kanaya, S., Kristensen, D., 2016. Estimation of stochastic volatility models by nonparametric filtering. Econometric Theory 32, 861-916.
Karatzas, I., Shreve, S., 1991. Brownian Motion and Stochastic Calculus,

Proofs
Proof of Theorem 3.2. From eqs. (3.2)-(3.5), it is obvious that (3.6)-(3.7) imply $S \sim \tilde S$. Now, suppose that $S \sim \tilde S$; this implies that $\mu_Y(y;S) = \mu_Y(y;\tilde S)$ and $\sigma_Y^2(y;S) = \sigma_Y^2(y;\tilde S)$, where $\mu_Y$ and $\sigma_Y^2$ are given in eqs. (2.6)-(2.7). That is, for all $y \in \mathcal{Y}$,

$$\frac{\mu_X(U(y);\theta)}{U'(y)} - \frac{1}{2}\sigma_X^2(U(y);\theta)\frac{U''(y)}{U'(y)^3} = \frac{\mu_X(\tilde U(y);\tilde\theta)}{\tilde U'(y)} - \frac{1}{2}\sigma_X^2(\tilde U(y);\tilde\theta)\frac{\tilde U''(y)}{\tilde U'(y)^3},$$

$$\frac{\sigma_X^2(U(y);\theta)}{U'(y)^2} = \frac{\sigma_X^2(\tilde U(y);\tilde\theta)}{\tilde U'(y)^2}.$$

Since $V$ is one-to-one, we can set $y = V(x)$ in the above to obtain the following for all $x \in \mathcal{X}$,

$$\frac{\mu_X(U(V(x));\theta)}{U'(V(x))} - \frac{1}{2}\sigma_X^2(U(V(x));\theta)\frac{U''(V(x))}{U'(V(x))^3} = \frac{\mu_X(\tilde U(V(x));\tilde\theta)}{\tilde U'(V(x))} - \frac{1}{2}\sigma_X^2(\tilde U(V(x));\tilde\theta)\frac{\tilde U''(V(x))}{\tilde U'(V(x))^3}, \quad \text{(A.1)}$$

$$\frac{\sigma_X^2(U(V(x));\theta)}{U'(V(x))^2} = \frac{\sigma_X^2(\tilde U(V(x));\tilde\theta)}{\tilde U'(V(x))^2}. \quad \text{(A.2)}$$

Define $T(x) = \tilde U(V(x)) \Leftrightarrow T^{-1}(x) = U(\tilde V(x))$, and observe that

$$U(V(x)) = x, \qquad U'(V(x))V'(x) = 1, \qquad \frac{\partial T(x)}{\partial x} = \tilde U'(V(x))V'(x).$$

Eq. (A.2) combined with the above implies (3.7)(ii),

$$\sigma_X^2(x;\theta) = \frac{\sigma_X^2(U(V(x));\theta)}{U'(V(x))^2 V'(x)^2} = \frac{\sigma_X^2(\tilde U(V(x));\tilde\theta)}{\tilde U'(V(x))^2 V'(x)^2} = \frac{\sigma_X^2(T(x);\tilde\theta)}{(\partial T(x)/\partial x)^2} = \sigma_{T^{-1}(X)}^2(x;\tilde\theta). \quad \text{(A.3)}$$

Next, divide through with $V'(x)$ in (A.1) and rearrange to obtain

$$\mu_X(x;\theta) = \frac{\mu_X(T(x);\tilde\theta)}{\partial T(x)/\partial x} + \frac{1}{2}\left\{\frac{\sigma_X^2(x;\theta)\,U''(V(x))}{U'(V(x))^3 V'(x)} - \frac{\sigma_X^2(T(x);\tilde\theta)\,\tilde U''(V(x))}{\tilde U'(V(x))^3 V'(x)}\right\}$$

$$= \frac{\mu_X(T(x);\tilde\theta)}{\partial T(x)/\partial x} + \frac{1}{2}\sigma_X^2(T(x);\tilde\theta)\left\{\frac{1}{\tilde U'(V(x))^2 V'(x)^2}\frac{U''(V(x))}{U'(V(x))^2} - \frac{\tilde U''(V(x))}{\tilde U'(V(x))^3 V'(x)}\right\},$$

where the second equality uses (A.3) and $U'(V(x))V'(x) = 1$. Eq. (3.7)(i) now follows since, using $V''(x) = -U''(V(x))/U'(V(x))^3$,

$$\frac{1}{\tilde U'(V(x))^2 V'(x)^2}\frac{U''(V(x))}{U'(V(x))^2} - \frac{\tilde U''(V(x))}{\tilde U'(V(x))^3 V'(x)} = \frac{1}{\tilde U'(V(x))^3 V'(x)^3}\left[\tilde U'(V(x))\frac{U''(V(x))}{U'(V(x))^3} - \tilde U''(V(x))V'(x)^2\right]$$

$$= -\frac{1}{\tilde U'(V(x))^3 V'(x)^3}\left[\tilde U'(V(x))V''(x) + \tilde U''(V(x))V'(x)^2\right] = -\frac{\partial^2 T(x)/\partial x^2}{(\partial T(x)/\partial x)^3}.$$

Proof of Theorem 5.1. We first note that the PMLE takes the same form as the one analyzed in Chen and Fan (2006), with the general copula considered in their work satisfying eq. (2.17). The desired result will follow if we can verify that the conditions stated in their proof are satisfied by our assumptions: First, by Assumption 2.1, the discrete sample $\{X_{i\Delta} : i = 0, 1, \ldots, n\}$ generated by the UPD $X$ is first-order Markovian with marginal density $f_X(x;\theta)$ and transition density $p_X(x|x_0;\theta)$.
Hence, the copula density $c_X(u_0, u; \theta)$ in (2.17) implied by $X$ is absolutely continuous with respect to the Lebesgue measure on $[0,1]^2$ due to its continuity in $F_X(x;\theta)$, $f_X(x;\theta)$ and $p_X(x|x_0;\theta)$. Moreover, the implied copula is neither the Fréchet-Hoeffding upper nor lower bound due to Assumption 2.1, i.e., $\sigma_X^2(x;\theta) > 0$, $x \in \mathcal{X}$. Thus, Chen and Fan (2006, Assumption 1) is satisfied. Second, our Assumption 4.2(i) ensures that $X$ is $\beta$-mixing with polynomial decay rate. Third, by Theorem 2.1, $Y$ is mixing with the same mixing properties as $X$ and so satisfies Chen and Fan (2006, Assumption 1). The remaining conditions are met by Assumption 4.3(i).

For the analysis of the proposed sieve MLE, we note that it takes the same form as the one analyzed in Chen, Wu and Yi (2009) and so their results carry over to our setting. Their Assumption M and assumption of $\beta$-mixing property are satisfied by $Y$ under our Assumptions 2.1, 2.2, and 4.2(ii) together with our Theorem 2.1. The remaining conditions are met by Assumption 4.3(ii).

Proof of Theorem 5.2.
Similar to the proof strategy employed in Lemma C.1, we define

$$\tilde\mu_Y(y) = \frac{\mu_X(U(y);\theta)}{U'(y)} - \frac{1}{2}\sigma_X^2(U(y);\theta)\frac{\hat U''(y)}{U'(y)^3}, \qquad \tilde\sigma_Y(y) = \frac{\sigma_X(U(y);\theta)}{\hat U'(y)},$$

and, with $f_Y^{(i)}$ denoting the $i$th derivative of $f_Y$ and similar for other functions, arrive at

$$\sqrt{nh^3}\left\{\hat\mu_Y(y) - \mu_Y(y) - h^2\kappa_2\frac{f_Y^{(3)}(y)}{f_X(U(y);\theta)}\left[-\frac{\sigma_X^2(U(y);\theta)}{2U'(y)^3}\right]\right\}$$

$$= \sqrt{nh^3}\left\{\tilde\mu_Y(y) - \mu_Y(y) - h^2\kappa_2\frac{f_Y^{(3)}(y)}{f_X(U(y);\theta)}\left[-\frac{\sigma_X^2(U(y);\theta)}{2U'(y)^3}\right]\right\} + o_p(1)$$

$$= -\frac{\sigma_X^2(U(y);\theta)}{2U'(y)^3}\sqrt{nh^3}\left\{\hat U^{(2)}(y) - U^{(2)}(y) - h^2\kappa_2\frac{f_Y^{(3)}(y)}{f_X(U(y);\theta)}\right\} + o_p(1),$$

and

$$\sqrt{nh}\left\{\hat\sigma_Y(y) - \sigma_Y(y) - h^2\kappa_2\frac{f_Y^{(2)}(y)}{f_X(U(y);\theta)}\left[-\frac{\sigma_X(U(y);\theta)}{U'(y)^2}\right]\right\}$$

$$= \sqrt{nh}\left\{\tilde\sigma_Y(y) - \sigma_Y(y) - h^2\kappa_2\frac{f_Y^{(2)}(y)}{f_X(U(y);\theta)}\left[-\frac{\sigma_X(U(y);\theta)}{U'(y)^2}\right]\right\} + o_p(1)$$

$$= -\frac{\sigma_X(U(y);\theta)}{U'(y)^2}\sqrt{nh}\left\{\hat U'(y) - U'(y) - h^2\kappa_2\frac{f_Y^{(2)}(y)}{f_X(U(y);\theta)}\right\} + o_p(1).$$

These together with (C.1) and (C.2) of Lemma C.1 and Slutsky's Theorem complete the proof.
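The plug-in construction behind these expansions can be sketched in a few lines. The code below is an illustration under stated assumptions rather than the paper's estimator: it takes the UPD marginal to be standard normal (as for a normalized OU process), uses a Gaussian kernel with a rule-of-thumb bandwidth, and treats the UPD parameter as known; the function names and the i.i.d. lognormal sample in the usage note are hypothetical and serve only to check the construction at a single point.

```python
import math
import random
from statistics import NormalDist, quantiles, stdev

STD_NORMAL = NormalDist()


def silverman_bandwidth(sample):
    """Rule-of-thumb bandwidth 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    n = len(sample)
    q1, _, q3 = quantiles(sample, n=4)
    spread = min(stdev(sample), (q3 - q1) / 1.34)
    return 0.9 * spread * n ** (-0.2)


def kernel_cdf(y, sample, h):
    """Smoothed CDF estimate: the integral of the Gaussian kernel density estimate."""
    return sum(STD_NORMAL.cdf((y - yi) / h) for yi in sample) / len(sample)


def kernel_pdf(y, sample, h):
    """Gaussian kernel density estimate at y."""
    return sum(STD_NORMAL.pdf((y - yi) / h) for yi in sample) / (len(sample) * h)


def transformation_estimates(y, sample, h):
    """Return (U_hat(y), U_hat'(y)) when F_X is the standard normal CDF."""
    u_hat = STD_NORMAL.inv_cdf(kernel_cdf(y, sample, h))
    du_hat = kernel_pdf(y, sample, h) / STD_NORMAL.pdf(u_hat)
    return u_hat, du_hat


def diffusion_estimate(y, sample, h, sigma_x):
    """Plug-in estimate sigma_X(U_hat(y)) / U_hat'(y) of the diffusion of Y."""
    u_hat, du_hat = transformation_estimates(y, sample, h)
    return sigma_x(u_hat) / du_hat
```

As a check, if the sample is lognormal, so that $Y = \exp(X)$ with $X$ standard normal, then $U(y) = \log y$ and the estimates at $y = 1$ should be close to $U(1) = 0$ and $U'(1) = 1$; serial dependence in real data would affect the sampling variability but not the construction itself.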
Verification of conditions for OU and CIR model
We here verify the technical conditions of Chen and Fan (2006) for the normalized versions of the OU and CIR models given in eqs. (6.1) and (6.2), respectively. For both examples, we will require that $U(y;\theta)$, as defined in eq. (2.16), and its first- and second-order derivatives w.r.t. $\theta$ are polynomially bounded in $y$. This imposes growth restrictions on the transformation function and is used to easily verify various moment conditions in the following. Also note that the criterion $l(U_{i-1}, U_i; \theta)$ in Chen and Fan (2006) takes the form $l(U_{i-1}, U_i; \theta) := \log p_X\left(U(Y_{i\Delta};\theta) \mid U(Y_{(i-1)\Delta};\theta); \theta\right) - \log f_X(U(Y_{i\Delta};\theta);\theta)$, where $U_i = F_Y(Y_{i\Delta})$, in our notation.

B.1 OU model
Assumption 4.2:
It is easily seen that (cid:110) µ X ( x ; θ ) σ X ( x ; θ ) − ∂σ X ( x ; θ ) ∂x (cid:111) = − (cid:112) κ x and s ( x ; θ ) σ X ( x ; θ ) S ( x ; θ ) =exp (cid:16) x (cid:17) / (cid:82) xx ∗ exp (cid:16) z (cid:17) dz . Assumption 4.2 is verified by taking the relevant limits. Assumption 4.3:
The implied copula of the normalized OU process is Gaussian, for which Assumption 4.3(i) and 4.3(ii) are satisfied as discussed in Chen and Fan (2006) and Chen, Wu, and Yi (2009), respectively.
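The Gaussian form of the copula can be seen from a short calculation. The derivation below is a sketch that assumes the normalized OU process in (6.1) takes the form $dX_t = -\kappa X_t\,dt + \sqrt{2\kappa}\,dW_t$, so that its stationary law is standard normal; eq. (6.1) itself is not reproduced here, so this form is an assumption.

```latex
% Transition law of the (assumed) normalized OU process over a step \Delta:
X_{\Delta} \mid X_0 = x_0 \;\sim\; N\!\left(\rho x_0,\; 1 - \rho^2\right),
\qquad \rho = e^{-\kappa\Delta}.

% Copula density = transition density divided by the stationary density,
% evaluated at x_0 = \Phi^{-1}(u_0) and x = \Phi^{-1}(u):
c_X(u_0, u; \kappa)
  = \frac{p_X(x \mid x_0; \kappa)}{\phi(x)}
  = \frac{1}{\sqrt{1 - \rho^2}}
    \exp\!\left\{ -\frac{\rho^2 \left(x_0^2 + x^2\right) - 2\rho\, x_0 x}{2\left(1 - \rho^2\right)} \right\}.
```

This is the bivariate Gaussian copula density with correlation $\rho = e^{-\kappa\Delta}$, so $\kappa$ enters the copula only through the first-order autocorrelation of the underlying process.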
B.2 CIR model
Assumption 4.2:
We obtain (cid:110) µ X ( x ; θ ) σ X ( x ; θ ) − ∂σ X ( x ; θ ) ∂x (cid:111) = (2 α − (cid:112) κ x − (cid:112) κ x and s ( x ; θ ) σ X ( x ; θ ) S ( x ; θ ) = exp { x } x α √ κ √ x/ (cid:82) xx ∗ exp { z } z α dz and the assumption is verified by taking relevant limits. Assumption 4.3.
First observe that p X ( x | x ; θ ) = exp (cid:2) c ( θ ) − c ( θ ) (cid:0) x + e − κ ∆ x (cid:1)(cid:3) x x I α − (cid:0) c ( θ ) √ xx (cid:1) , where I q ( · ) is the so-called modified Bessel function of the first kind and of order q and c ( θ, ∆) > c ( θ, ∆) > f X is here the density of a gamma distributionand so all polynomial moments of X exist. Since U is assumed to be polynomially bounded,this implies that all polynomial moments of Y also exist. All smoothness conditions imposed inChen and Fan (2006) are trivially satisfied since p X ( x | x ; θ ) and U ( y ; θ ) are twice continuouslydifferentiable w.r.t their arguments and so will not be discussed any further. Similarly, we havealready shown that Y is geometrically mixing. It remains to verify the moment conditions and theidentifying restrictions imposed in C1-C.5 in Proposition 4.2 and A2-A6 in Chen and Fan (2006). C1 is satisfied if we restrict θ = ( α, κ ) to be situated in a compact set on R that contains thetrue value. Observe thatlog p X ( x | x ; θ ) = c ( θ ) − c ( θ ) (cid:0) x + e − κ ∆ x (cid:1) + log (cid:16) x x (cid:17) + log I α − (cid:0) c ( θ ) √ xx (cid:1) . s θ ( x | x ; θ ) : = ∂ log p X ( x ; x ; θ ) ∂θ = ˙ c ( θ ) − ˙ c ( θ ) (cid:0) x + e − κ ∆ x (cid:1) + c ( θ ) ∆ e − κ ∆ x + I (cid:48) α − (cid:0) c ( θ ) √ xx (cid:1) c ( θ ) √ xx ˙ c ( θ ) I α − (cid:0) c ( θ ) √ xx (cid:1) + ˙ I α − ( c ( θ ) √ xx ) I α − (2 c ( θ ) √ xx ) , where ˙ c ( θ ) = ∂c ( θ ) / ( ∂θ ) and similar for other functions, I (cid:48) α − ( x ) = ∂I α − ( x ) / ( ∂x ), and˙ I α − ( x ) = ∂I α − ( x ) / ( ∂α ). It is easily verified that (cid:12)(cid:12) I (cid:48) α − ( x ) /I α − ( x ) (cid:12)(cid:12) and (cid:12)(cid:12)(cid:12)(cid:12) I (cid:48) α − ( x ) /I α − ( x ) (cid:12)(cid:12)(cid:12)(cid:12) are both bounded by a polynomial in x . Thus, (cid:107) s X ( x | x ; θ ) (cid:107) is bounded by a polynomial uni-formly in θ ∈ Θ. 
The expressions of s x ( x | x ; θ ) := ∂ log p X ( x ; x ; θ ) / ( ∂x ) and s x ( x | x ; θ ) := ∂ log p X ( x ; x ; θ ) / ( ∂x ) are on a similar form and also polynomially bounded. Now, observe that l θ ( U i − , U i ; θ ) : = ∂l ( U i − , U i ; θ ) ∂θ = s θ (cid:0) U ( Y i ∆ ; θ ) | U (cid:0) Y ( i − ; θ (cid:1) ; θ (cid:1) + s x (cid:0) U ( Y i ∆ ; θ ) | U (cid:0) Y ( i − ; θ (cid:1) ; θ (cid:1) ˙ U ( Y i ∆ ; θ )+ s x (cid:0) U ( Y i ∆ ; θ ) | U (cid:0) Y ( i − ; θ (cid:1) ; θ (cid:1) ˙ U (cid:0) Y ( i − ; θ (cid:1) − ∂ log f X ( U ( Y i ∆ ; θ ) ; θ ) ∂θ . Given that the model is correctly specified and identified, it follows by standard arguments forMLE that E [ l θ ( U i , U i − ; θ )] = 0 if and only if θ equals the true value. C4 . From the above expression of l θ ( U i , U i − ; θ ) together with our assumption on U ( y ; θ ), it iseasily checked that it is bounded by a polynomial in (cid:0) Y i ∆ , Y ( i − (cid:1) uniformly in θ ∈ Θ. It nowfollows that E [sup θ (cid:107) l θ ( U i , U i − ; θ ) (cid:107) p ] < ∞ for any p ≥ C5. l θ, ( U i − , U i ; θ ) = ∂l θ ( U i − , U i ; θ ) ∂U i − , l θ, ( U i − , U i ; θ ) = ∂l θ ( U i − , U i ; θ ) ∂U i are again bounded by polynomials in (cid:0) Y i ∆ , Y ( i − (cid:1) and so have all relevant moments. A1(ii)-(iii).
With W ,i and W ,i defined in (4.2)-(4.3) in Chen and Fan (2006) and l θ,θ ( U i − , U i ; θ ) = ∂ l ( U i − , U i ; θ ) ∂θ∂θ (cid:48) , lim n →∞ Var (cid:32) √ n n (cid:88) i =1 { l θ ( U i − , U i ; θ ) + W ,i + W ,i } (cid:33) , and E [ l θ,θ ( U i − , U i ; θ )] to have full rank. We have been unable to verify these two conditions dueto the complex form of the score and hessian of the CIR model. A4.
Observe that $|W_{1,i}| \le E\left[|U_{i-1}|\,\|l_{\theta,1}(U_{i-1}, U_i; \theta)\|\right] < \infty$ and similar for $W_{2,i}$. Thus, both have all relevant moments. A5-A6 have already been verified above.
Lemma
Lemma C.1
Under Assumptions 2.1-2.2, 4.2(i), and 4.4-4.6, we have as $n \to \infty$, $h \to 0$, $nh \to \infty$,

$$\sqrt{nh}\left\{\hat U'(y) - U'(y) - h^2\kappa_2\frac{f_Y''(y)}{f_X(U(y);\theta)}\right\} \to_d N\left(0, \frac{U'(y)^2}{f_Y(y)}\int_{\mathbb{R}} K^2(z)\,dz\right), \quad \text{(C.1)}$$

and as $n \to \infty$, $h \to 0$, $nh^3 \to \infty$,

$$\sqrt{nh^3}\left\{\hat U''(y) - U''(y) - h^2\kappa_2\frac{f_Y'''(y)}{f_X(U(y);\theta)}\right\} \to_d N\left(0, \frac{U'(y)^2}{f_Y(y)}\int_{\mathbb{R}} K'(z)^2\,dz\right). \quad \text{(C.2)}$$

Proof. With $\hat F_Y(y)$ given in (4.2), let $\hat f_Y^{(i)}(y) = \hat F_Y^{(i+1)}(y)$, for $i = 0, 1$, be the $i$th derivative of the kernel marginal density estimator. Using standard methods for kernel estimators (cf. Robinson, 1983), we obtain under the assumptions of the lemma that, as $n \to \infty$, $h \to 0$, and $nh^{1+2i} \to \infty$,

$$\sqrt{nh^{1+2i}}\left\{\hat f_Y^{(i)}(y) - f_Y^{(i)}(y) - h^2\kappa_2 f_Y^{(i+2)}(y)\right\} \to_d N(0, V_i(y)), \quad \text{(C.3)}$$

where $V_i(y) = f_Y(y)\int_{\mathbb{R}} K^{(i)}(z)^2\,dz$. Assumptions 2.1 and 4.4 ensure that $f_Y(y)$ is sufficiently smooth so that $f_Y^{(2)}(y)$ and $f_Y^{(3)}(y)$ exist. Assumptions 4.2(i) and 4.6 regulate the mixing property of $Y$ and the kernel function, respectively, as required by Robinson (1983).

From (4.4) we have $\hat U'(y) = \hat f_Y(y)/f_X(\hat U(y);\hat\theta)$. Now define $\bar U'(y) = \hat f_Y(y)/f_X(U(y);\theta)$ and note that Assumptions 4.4 and 4.5 together with the delta-method imply $\hat U'(y) - \bar U'(y) = O_P(1/\sqrt{n}) = o_P(1/\sqrt{nh})$. It then follows that

$$\sqrt{nh}\left\{\hat U'(y) - U'(y) - h^2\kappa_2\frac{f_Y^{(2)}(y)}{f_X(U(y);\theta)}\right\} = \sqrt{nh}\left\{o_P\left(1/\sqrt{nh}\right) + \bar U'(y) - U'(y) - h^2\kappa_2\frac{f_Y^{(2)}(y)}{f_X(U(y);\theta)}\right\}$$

$$= \frac{1}{f_X(U(y);\theta)}\sqrt{nh}\left\{\hat f_Y(y) - f_Y(y) - h^2\kappa_2 f_Y^{(2)}(y)\right\} + o_P(1).$$

Using (C.3) and the same arguments as in Kristensen (2011, Proof of Theorem 1), we arrive at (C.1).

Next, observe that

$$U''(y) = \frac{f_Y'(y)}{f_X(U(y);\theta)} - \frac{f_X'(U(y);\theta)\,f_Y(y)^2}{f_X(U(y);\theta)^3},$$

where $f_X'(x;\theta)$ and $f_Y'(y)$ are the first derivatives of $f_X(x;\theta)$ and $f_Y(y)$, respectively. Similarly, it is easily checked that

$$\hat U''(y) = \frac{\hat f_Y'(y)}{f_X(\hat U(y);\hat\theta)} - \frac{f_X'(\hat U(y);\hat\theta)\,\hat f_Y(y)^2}{f_X(\hat U(y);\hat\theta)^3}.$$

Define $\bar U''(y) = \hat f_Y'(y)/f_X(U(y);\theta) - f_X'(U(y);\theta)\,f_Y(y)^2/f_X(U(y);\theta)^3$ and apply arguments similar to before to obtain

$$\sqrt{nh^3}\left\{\hat U''(y) - U''(y) - h^2\kappa_2\frac{f_Y^{(3)}(y)}{f_X(U(y);\theta)}\right\} = \frac{1}{f_X(U(y);\theta)}\sqrt{nh^3}\left\{\hat f_Y'(y) - f_Y'(y) - h^2\kappa_2 f_Y^{(3)}(y)\right\} + o_p(1),$$

which together with (C.3) yields (C.2).

Tables and Figures
Table 1: Bias and RMSE of κ in the OU-SKST Model
[Panels: Bias/κ and RMSE/κ. Columns: True Parameter Value, ρ, then PPMLE and PMLE for each of the sample sizes 2202 and 5505. Rows: four values of κ, the two largest being κ = 11.377 and κ = 22.753.]

Table 2: Bias and RMSE of κ in the CIR-SKST Model
[Panels: Bias/κ and RMSE/κ. Columns: True Parameter Values, ρ, then PPMLE and PMLE for each of the sample sizes 2202 and 5505. Rows: four values of (κ, α).]

Table 3: Bias and RMSE of α in the CIR-SKST Model
[Panels: Bias/α and RMSE/α. Columns: True Parameter Values, ρ, then PPMLE and PMLE for each of the sample sizes 2202 and 5505. Rows: four values of (κ, α).]

Table 4: Descriptive Statistics of Daily VIX
Sample Period            January 2, 1990 - July 19, 2019
Sample Size              7445
Mean                     19.21
Median                   17.31
Std Dev.                 7.76
Skewness                 2.12
Kurtosis                 10.85
Jarque-Bera Statistic    24669.26
Table 5: Model Estimation and Pseudo-LR Test Results
                 Transformed OU            Transformed CIR
                 DO          NPTOU         EW          NPTCIR
[Upper panel rows: κ̂, α, σ, ϱ, δ]
LL               -1.1724     -1.1579       -1.1585     -1.1565
LR
CV (5%)          -52.1521                  -23.6766
CV (1%)          -30.5511                  -10.9027
p-value          0.0000                    0.0000

Figure 1: Marginal Densities of the Eurodollar Rates. Solid = SKST Density, Dashed = Kernel Density, Dotted = Normal Density
Figure 2: Estimated Drift and Diffusion for the OU-SKST Model (T = 2202). Solid = True Function, Dashed = Mean of Estimates, Dotted = 95% Confidence Bands
Figure 3: Estimated Drift and Diffusion for the OU-SKST Model (T = 5505). Solid = True Function, Dashed = Mean of Estimates, Dotted = 95% Confidence Bands
Figure 4: Estimated Drift and Diffusion for the CIR-SKST Model (T = 2202). Solid = True Function, Dashed = Mean of Estimates, Dotted = 95% Confidence Bands