Modelling volatile time series with v-transforms and copulas
Alexander J. McNeil ∗
The York Management School, University of York
12th January 2021
Abstract
An approach to the modelling of volatile time series using a class of uniformity-preserving transforms for uniform random variables is proposed. V-transforms describe the relationship between quantiles of the stationary distribution of the time series and quantiles of the distribution of a predictable volatility proxy variable. They can be represented as copulas and permit the formulation and estimation of models that combine arbitrary marginal distributions with copula processes for the dynamics of the volatility proxy. The idea is illustrated using a Gaussian ARMA copula process and the resulting model is shown to replicate many of the stylized facts of financial return series and to facilitate the calculation of marginal and conditional characteristics of the model including quantile measures of risk. Estimation is carried out by adapting the exact maximum likelihood approach to the estimation of ARMA processes and the model is shown to be competitive with standard GARCH in an empirical application to Bitcoin return data.
JEL Codes: C52; G21; G28; G32
Keywords: time series; volatility; probability-integral transform; ARMA model; copula
Introduction

In this paper, we show that a class of uniformity-preserving transformations for uniform random variables can facilitate the application of copula modelling to time series exhibiting the serial dependence characteristics that are typical of volatile financial return data. Our main aims are twofold: to establish the fundamental properties of v-transforms and show that they are a natural fit to the volatility modelling problem; and to develop a class of processes using the implied copula process of a Gaussian ARMA model that can serve as an archetype for copula models using v-transforms. Although the existing literature on volatility modelling in econometrics is vast, the models we propose have some attractive features. In particular, as copula-based models, they allow the separation of marginal and serial dependence behaviour in the construction and estimation of models.

∗ Address correspondence to Alexander J. McNeil, The York Management School, University of York, Freboys Lane, York YO10 5GD, UK, +44 (0) 1904 325307, [email protected].

A distinction is commonly made between genuine stochastic volatility models, as investigated by Taylor (1994) and Andersen (1994), and GARCH-type models as developed in a long series of papers by Engle (1982), Bollerslev (1986), Ding et al. (1993), Glosten et al. (1993) and Bollerslev et al. (1994), among others. In the former an unobservable process describes the volatility at any time point while in the latter volatility is modelled as a function of observable information describing the past behaviour of the process; see also the review articles by Shephard (1996) and Andersen and Benzoni (2009). The generalized autoregressive score (GAS) models of Creal et al. (2013) generalize the observation-driven approach of GARCH models by using the score function of the conditional density to model time variation in key parameters of the time series model.
The models of this paper have more in common with the observation-driven approach of GARCH and GAS but have some important differences. In GARCH-type models, the marginal distribution of a stationary process is inextricably linked to the dynamics of the process as well as the conditional or innovation distribution; in most cases, it has no simple closed form. For example, the standard GARCH mechanism serves to create power-law behaviour in the marginal distribution, even when the innovations come from a lighter-tailed distribution such as Gaussian (Mikosch and Stărică, 2000). While such models work well for many return series, they may not be sufficiently flexible to describe all possible combinations of marginal and serial dependence behaviour encountered in applications. In the empirical example of this paper, which relates to log-returns on the Bitcoin price series, the data appear to favour a marginal distribution with sub-exponential tails that are lighter than power tails and this cannot be well captured by standard GARCH models. Moreover, in contrast to much of the GARCH literature, the models we propose make no assumptions about the existence of second-order moments and could also be applied to very heavy-tailed situations where variance-based methods fail.

Let $X_1, \ldots, X_n$ be a time series of financial returns sampled at (say) daily frequency and assume that these are modelled by a strictly stationary stochastic process $(X_t)$ with marginal distribution function (cdf) $F_X$. To match the stylized facts of financial return data described, for example, by Campbell et al.
(1997) and Cont (2001), it is generally agreed that $(X_t)$ should have limited serial correlation, but the squared or absolute processes $(X_t^2)$ and $(|X_t|)$ should have significant and persistent positive serial correlation to describe the effects of volatility clustering.

In this paper, we refer to transformed series like $(|X_t|)$, in which volatility is revealed through serial correlation, as volatility proxy series. More generally, a volatility proxy series $(T(X_t))$ is obtained by applying a transformation $T: \mathbb{R} \to \mathbb{R}$ which (i) depends on a change point $\mu_T$ that may be zero, (ii) is increasing in $X_t - \mu_T$ for $X_t \geq \mu_T$ and (iii) is increasing in $\mu_T - X_t$ for $X_t \leq \mu_T$.

Our approach in this paper is to model the probability-integral transform (PIT) series $(V_t)$ of a volatility proxy series. This is defined by $V_t = F_{T(X)}(T(X_t))$ for all $t$, where $F_{T(X)}$ denotes the cdf of $T(X_t)$. If $(U_t)$ is the PIT series of the original process $(X_t)$, defined by $U_t = F_X(X_t)$ for all $t$, then a v-transform is a function describing the relationship between the terms of $(V_t)$ and the terms of $(U_t)$. Equivalently, a v-transform describes the relationship between quantiles of the distribution of $X_t$ and the distribution of the volatility proxy $T(X_t)$. Alternatively, it characterizes the dependence structure or copula of the pair of variables $(X_t, T(X_t))$. In this paper, we show how to derive flexible, parametric families of v-transforms for practical modelling purposes.

To gain insight into the typical form of a v-transform, let $x_1, \ldots, x_n$ represent the realized data values and let $u_1, \ldots, u_n$ and $v_1, \ldots$
, $v_n$ be the samples obtained by applying the transformations $v_t = F_n^{(|X|)}(|x_t|)$ and $u_t = F_n^{(X)}(x_t)$, where $F_n^{(X)}(x) = \frac{1}{n+1}\sum_{t=1}^n I\{x_t \leq x\}$ and $F_n^{(|X|)}(x) = \frac{1}{n+1}\sum_{t=1}^n I\{|x_t| \leq x\}$ denote scaled versions of the empirical distribution functions of the $x_t$ and $|x_t|$ samples, respectively. The graph of $(u_t, v_t)$ gives an empirical estimate of the v-transform for the random variables $(X_t, |X_t|)$. In the left-hand plot of Figure 1 we show the relationship for a sample of $n = 1043$ daily log-returns of the Bitcoin price series for the years 2016–2019. Note how the empirical v-transform takes the form of a slightly asymmetric 'V'.

The right-hand plot of Figure 1 shows the sample autocorrelation function (acf) of the data given by $z_t = \Phi^{-1}(v_t)$ where $\Phi$ is the standard normal cdf. This reveals a persistent pattern of positive serial correlation which can be modelled by the implied ARMA copula. This pattern is not evident in the acf of the raw $x_t$ data in the centre plot.

To construct a volatility model for $(X_t)$ using v-transforms, we need to specify a process for $(V_t)$. In principle, any model for a series of serially dependent uniform variables can be applied to $(V_t)$. In this paper, we illustrate concepts using the Gaussian copula model implied by the standard ARMA dependence structure. This model is particularly tractable and allows us to derive model properties and fit models to data relatively easily.

There is a large literature on copula models for time series; see, for example, the review papers by Patton (2012) and Fan and Patton (2014). While the main focus of this literature has been on cross-sectional dependencies between series, there is a growing literature on models of serial dependence. First-order Markov copula models have been investigated by Chen and Fan (2006), Chen et al. (2009) and Domma et al.
(2009) while higher-order Markov copula models using D-vines are applied by Smith et al. (2010). These models are based on the pair-copula approach developed in Joe (1996), Bedford and Cooke (2001, 2002) and Aas et al. (2009). However, the standard bivariate copulas that enter these models are not generally effective at describing the typical serial dependencies created by stochastic volatility, as observed by Loaiza-Maya et al. (2018).

Figure 1: Scatterplot of $v_t$ against $u_t$ (left), sample acf of raw data $x_t$ (centre) and sample acf of $z_t = \Phi^{-1}(v_t)$ (right). The transformed data are defined by $v_t = F_n^{(|X|)}(|x_t|)$ and $u_t = F_n^{(X)}(x_t)$ where $F_n^{(X)}$ and $F_n^{(|X|)}$ denote scaled versions of the empirical distribution function of the $x_t$ and $|x_t|$ values, respectively. The sample size is $n = 1043$ and the data are daily log-returns of the Bitcoin price for the years 2016–2019.

The paper is structured as follows. In Section 2, we provide motivation for the paper by constructing a symmetric model using the simplest example of a v-transform. The general theory of v-transforms is developed in Section 3 and is used to construct the class of VT-ARMA processes and analyse their properties in Section 4. Section 5 treats estimation and statistical inference for VT-ARMA processes and provides an example of their application to the Bitcoin return data; Section 6 presents the conclusions. Proofs may be found in Appendix A.
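The empirical construction described above, scaled empirical distribution functions applied to the raw returns and to the absolute returns, is easy to reproduce. The following minimal Python sketch (function names are ours, and the data series is simulated for illustration since the Bitcoin sample itself is not reproduced here) computes the $(u_t, v_t)$ pairs whose scatterplot traces out the empirical v-transform.

```python
import numpy as np

def scaled_ecdf(sample, x):
    """Scaled empirical cdf F_n(x) = (1/(n+1)) * #{t : sample_t <= x}."""
    sample = np.asarray(sample)
    return np.searchsorted(np.sort(sample), x, side="right") / (len(sample) + 1)

def empirical_v_transform(x):
    """Return (u_t, v_t): PITs of the returns and of the |returns| proxy."""
    x = np.asarray(x)
    u = scaled_ecdf(x, x)                   # u_t = F_n^{(X)}(x_t)
    v = scaled_ecdf(np.abs(x), np.abs(x))   # v_t = F_n^{(|X|)}(|x_t|)
    return u, v

# Illustration with simulated heavy-tailed "returns" (NOT the Bitcoin data).
rng = np.random.default_rng(1)
x = rng.standard_t(df=3, size=1043)
u, v = empirical_v_transform(x)
# A scatterplot of (u, v) traces out a V shape with fulcrum near 0.5:
# extreme values of u (returns deep in either tail) map to v near 1.
```

Dividing by $n+1$ rather than $n$ keeps the PIT values strictly inside $(0,1)$, which matters later when $\Phi^{-1}$ is applied to $v_t$.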
A motivating model
Given a probability space $(\Omega, \mathcal{F}, P)$, we construct a symmetric, strictly stationary process $(X_t)_{t \in \mathbb{N} \setminus \{0\}}$ such that, under the even transformation $T(x) = |x|$, the serial dependence in the volatility proxy series $(T(X_t))$ is of ARMA type. We assume that the marginal cdf $F_X$ of $(X_t)$ is absolutely continuous and the density $f_X$ satisfies $f_X(x) = f_X(-x)$ for all $x > 0$. Since $F_X$ and $F_{|X|}$ are both continuous, the properties of the probability-integral (PIT) transform imply that the series $(U_t)$ and $(V_t)$ given by $U_t = F_X(X_t)$ and $V_t = F_{|X|}(|X_t|)$ both have standard uniform marginal distributions. Henceforth we refer to $(V_t)$ as the volatility PIT process and $(U_t)$ as the series PIT process.

Any other volatility proxy series that can be obtained by a continuous and strictly increasing transformation of the terms of $(|X_t|)$, such as $(X_t^2)$, yields exactly the same volatility PIT process. For example, if $\tilde{V}_t = F_{X^2}(X_t^2)$, then it follows from the fact that $F_{X^2}(x) = F_{|X|}(+\sqrt{x})$ for $x \geq 0$ that $\tilde{V}_t = F_{X^2}(X_t^2) = F_{|X|}(|X_t|) = V_t$. In this sense we can think of classes of equivalent volatility proxies, such as $(|X_t|)$, $(X_t^2)$, $(\exp|X_t|)$ and $(\ln(1+|X_t|))$. In fact $(V_t)$ is itself an equivalent volatility proxy to $(|X_t|)$ since $F_{|X|}$ is a continuous and strictly increasing transformation.

The symmetry of $f_X$ implies that $F_{|X|}(x) = 2F_X(x) - 1 = 1 - 2F_X(-x)$ for $x \geq 0$. Hence we find that
$$V_t = F_{|X|}(|X_t|) = \begin{cases} F_{|X|}(-X_t) = 1 - 2F_X(X_t) = 1 - 2U_t, & \text{if } X_t < 0 \\ F_{|X|}(X_t) = 2F_X(X_t) - 1 = 2U_t - 1, & \text{if } X_t \geq 0 \end{cases}$$
which implies that the relationship between the volatility PIT process $(V_t)$ and the series PIT process $(U_t)$ is given by
$$V_t = \mathcal{V}(U_t) = |2U_t - 1| \quad (1)$$
where $\mathcal{V}(u) = |2u - 1|$ is a perfectly symmetric v-shaped function that maps values of $U_t$ close to 0 or 1 to values of $V_t$ close to 1, and values close to 0.5 to values close to 0.
$\mathcal{V}$ is the canonical example of a v-transform. It is related to the so-called tent-map transformation $T(u) = 2\min(u, 1-u)$ by $\mathcal{V}(u) = 1 - T(u)$.

Given $(V_t)$, let the process $(Z_t)$ be defined by setting $Z_t = \Phi^{-1}(V_t)$ so that we have the following chain of transformations:
$$X_t \xrightarrow{F_X} U_t \xrightarrow{\mathcal{V}} V_t \xrightarrow{\Phi^{-1}} Z_t. \quad (2)$$
We refer to $(Z_t)$ as a normalized volatility proxy series. Our aim is to construct a process $(X_t)$ such that, under the chain of transformations in (2), we obtain a Gaussian ARMA process $(Z_t)$ with mean zero and variance one. We do this by working back through the chain.

The transformation $\mathcal{V}$ is not an injection and, for any $V_t > 0$, there are two possible inverse values, $\frac{1}{2}(1 - V_t)$ and $\frac{1}{2}(1 + V_t)$. However, by randomly choosing between these values, we can 'stochastically invert' $\mathcal{V}$ to construct a random variable $U_t$ such that $\mathcal{V}(U_t) = V_t$. This is summarized in Lemma 1, which is a special case of a more general result in Proposition 4.

Lemma 1.
Let $V$ be a standard uniform variable. If $V = 0$ set $U = \frac{1}{2}$. Otherwise let $U = \frac{1}{2}(1 - V)$ with probability 0.5 and $U = \frac{1}{2}(1 + V)$ with probability 0.5. Then $U$ is uniformly distributed and $\mathcal{V}(U) = V$.

This simple result suggests the following algorithm for constructing a process $(X_t)$ with symmetric marginal density $f_X$ such that the corresponding normalized volatility proxy process $(Z_t)$ under the absolute value transformation (or continuous and strictly increasing functions thereof) is an ARMA process. We describe the resulting model as a VT-ARMA process.

Algorithm 1.
1. Generate $(Z_t)$ as a causal and invertible Gaussian ARMA process of order $(p, q)$ with mean zero and variance one.
2. Form the volatility PIT process $(V_t)$ where $V_t = \Phi(Z_t)$ for all $t$.
3. Generate a process of iid Bernoulli variables $(Y_t)$ such that $P(Y_t = 1) = 0.5$.
4. Form the PIT process $(U_t)$ using the transformation $U_t = 0.5\,(1 - V_t)^{I\{Y_t = 0\}}(1 + V_t)^{I\{Y_t = 1\}}$.
5. Form the process $(X_t)$ by setting $X_t = F_X^{-1}(U_t)$.

It is important to state that the use of the Gaussian process $(Z_t)$ as the fundamental building block of the VT-ARMA process in Algorithm 1 has no effect on the marginal distribution of $(X_t)$, which is $F_X$ as specified in the final step of the algorithm. The process $(Z_t)$ is exploited only for its serial dependence structure, which is described by a family of finite-dimensional Gaussian copulas; this dependence structure is applied to the volatility proxy process.

Figure 2 shows a symmetric VT-ARMA(1,1) process with positive AR parameter $\alpha$ and negative MA parameter $\beta$; such a model often works well for financial return data. Some intuition for this observation can be gained from the fact that the popular GARCH(1,1) model is known to have the structure of an ARMA(1,1) model for the squared data process; see, for example, McNeil et al. (2015) (Section 4.2) for more details.

Figure 2: Realizations of length $n = 500$ of $(X_t)$ and $(Z_t)$ for a VT-ARMA(1,1) process with a marginal Student t distribution with $\nu = 3$ degrees of freedom and the ARMA parameters given in the text. ACF plots for $(X_t)$ and $(|X_t|)$ are also shown.

V-transforms

To generalize the class of v-transforms we admit two forms of asymmetry in the construction described in Section 2: we allow the density $f_X$ to be skewed; and we introduce an asymmetric volatility proxy.

Definition 1 (Volatility proxy transformation and profile).
Let $T_1$ and $T_2$ be strictly increasing, continuous and differentiable functions on $\mathbb{R}^+ = [0, \infty)$ such that $T_1(0) = T_2(0)$. Let $\mu_T \in \mathbb{R}$. Any transformation $T: \mathbb{R} \to \mathbb{R}$ of the form
$$T(x) = \begin{cases} T_1(\mu_T - x), & x \leq \mu_T \\ T_2(x - \mu_T), & x > \mu_T \end{cases} \quad (3)$$
is a volatility proxy transformation. The parameter $\mu_T$ is the change point of $T$ and the associated function $g_T: \mathbb{R}^+ \to \mathbb{R}^+$, $g_T(x) = T_2^{-1} \circ T_1(x)$, is the profile function of $T$.

By introducing $\mu_T$ we allow the possibility that the natural change point may not be identical to zero. By introducing different functions $T_1$ and $T_2$ for returns on either side of the change point, we allow the possibility that one or the other may contribute more to the volatility proxy. This has a similar economic motivation to the leverage effects in GARCH models (Ding et al., 1993); falls in equity prices increase a firm's leverage and increase the volatility of the share price.

Clearly the profile function of a volatility proxy transformation is a strictly increasing, continuous and differentiable function on $\mathbb{R}^+$ such that $g_T(0) = 0$. In conjunction with $\mu_T$, the profile contains all the information about $T$ that is relevant for constructing v-transforms. In the case of a volatility proxy transformation that is symmetric about $\mu_T$, the profile satisfies $g_T(x) = x$.

The following result shows how v-transforms $V = \mathcal{V}(U)$ can be obtained by considering different continuous distributions $F_X$ and different volatility proxy transformations $T$ of type (3).

Proposition 1.
Let $X$ be a random variable with absolutely continuous and strictly increasing cdf $F_X$ on $\mathbb{R}$ and let $T$ be a volatility proxy transformation. Let $U = F_X(X)$ and $V = F_{T(X)}(T(X))$. Then $V = \mathcal{V}(U)$ where
$$\mathcal{V}(u) = \begin{cases} F_X\left(\mu_T + g_T\left(\mu_T - F_X^{-1}(u)\right)\right) - u, & u \leq F_X(\mu_T) \\ u - F_X\left(\mu_T - g_T^{-1}\left(F_X^{-1}(u) - \mu_T\right)\right), & u > F_X(\mu_T). \end{cases} \quad (4)$$

The result implies that any two volatility proxy transformations $T$ and $\tilde{T}$ which have the same change point $\mu_T$ and profile function $g_T$ belong to an equivalence class with respect to the resulting v-transform. This generalizes the idea that $T(x) = |x|$ and $T(x) = x^2$ give the same v-transform in the symmetric case of Section 2. Note also that the volatility proxy transformations $T^{(V)}$ and $T^{(Z)}$ defined by
$$T^{(V)}(x) = F_{T(X)}(T(x)) = \mathcal{V}\left(F_X(x)\right), \qquad T^{(Z)}(x) = \Phi^{-1}(T^{(V)}(x)) = \Phi^{-1}\left(\mathcal{V}\left(F_X(x)\right)\right) \quad (5)$$
are in the same equivalence class as $T$ since they share the same change point and profile function.

Definition 2 (v-transform and fulcrum). Any transformation $\mathcal{V}$ that can be obtained from equation (4) by choosing an absolutely continuous and strictly increasing cdf $F_X$ on $\mathbb{R}$ and a volatility proxy transformation $T$ is a v-transform. The value $\delta = F_X(\mu_T)$ is the fulcrum of the v-transform.

In this section we derive a family of v-transforms using construction (4) by taking a tractable asymmetric model for $F_X$ using the construction proposed by Fernández and Steel (1998) and by setting $\mu_T = 0$ and $g_T(x) = kx^\xi$ for $k > 0$ and $\xi > 0$.
This profile function contains the identity profile $g_T(x) = x$ (corresponding to the symmetric volatility proxy transformation) as a special case, but allows cases where negative or positive returns contribute more to the volatility proxy. The choices we make may at first sight seem rather arbitrary, but the resulting family can in fact assume many of the shapes that are permissible for v-transforms, as we will argue.

Let $f$ be a density that is symmetric about the origin and let $\gamma > 0$ be a scalar parameter. Fernández and Steel suggested the model
$$f_X(x; \gamma) = \begin{cases} \frac{2\gamma}{1+\gamma^2}\, f(\gamma x), & x \leq 0 \\ \frac{2\gamma}{1+\gamma^2}\, f\left(\frac{x}{\gamma}\right), & x > 0. \end{cases} \quad (6)$$
This model is often used to obtain skewed normal and skewed Student distributions for use as innovation distributions in econometric models. A model with $\gamma > 1$ is skewed to the right while a model with $\gamma < 1$ is skewed to the left, as might be expected for asset returns. We consider the particular case of a Laplace or double exponential distribution $f(x) = 0.5\exp(-|x|)$ which leads to particularly tractable expressions.

Proposition 2.
Let $F_X(x; \gamma)$ be the cdf of the density (6) when $f(x) = 0.5\exp(-|x|)$. Set $\mu_T = 0$ and let $g_T(x) = kx^\xi$ for $k, \xi > 0$. The v-transform (4) is given by
$$\mathcal{V}_{\delta,\kappa,\xi}(u) = \begin{cases} 1 - u - (1 - \delta)\exp\left(-\kappa\left(-\ln\left(\frac{u}{\delta}\right)\right)^{\xi}\right), & u \leq \delta \\ u - \delta\exp\left(-\kappa^{-1/\xi}\left(-\ln\left(\frac{1-u}{1-\delta}\right)\right)^{1/\xi}\right), & u > \delta \end{cases} \quad (7)$$
where $\delta = F_X(0) = (1 + \gamma^2)^{-1} \in (0, 1)$ and $\kappa = k/\gamma^{\xi+1} > 0$.

It is remarkable that (7) is a uniformity-preserving transformation. If we set $\xi = 1$ and $\kappa = 1$ we get
$$\mathcal{V}_\delta(u) = \begin{cases} (\delta - u)/\delta, & u \leq \delta \\ (u - \delta)/(1 - \delta), & u > \delta \end{cases} \quad (8)$$
which obviously includes the symmetric model $\mathcal{V}_{0.5}(u) = |2u - 1|$. The v-transform $\mathcal{V}_\delta(u)$ in (8) is a very convenient special case and we refer to it as the linear v-transform.

In Figure 3 we show an asymmetric v-transform $\mathcal{V}_{\delta,\kappa,\xi}$ from the family (7). We will use this particular v-transform to illustrate further properties of v-transforms and find a characterization.

Figure 3: An asymmetric v-transform from the family defined in (7). For any v-transform, if $v = \mathcal{V}(u)$ and $u^*$ is the dual of $u$, then the points $(u, 0)$, $(u, v)$, $(u^*, 0)$ and $(u^*, v)$ form the vertices of a square. For the given fulcrum $\delta$, a v-transform can never enter the gray shaded area of the plot.

It is easily verified that any v-transform obtained from (4) consists of two arms or branches, described by continuous and strictly monotonic functions; the left arm is decreasing and the right arm increasing. See Figure 3 for an illustration. At the fulcrum we have $\mathcal{V}(\delta) = 0$. Every point $u \in [0, 1] \setminus \{\delta\}$ has a dual point $u^*$ on the opposite side of the fulcrum such that $\mathcal{V}(u^*) = \mathcal{V}(u)$. Dual points can be interpreted as the quantile probability levels of the distribution of $X$ that give rise to the same level of volatility.
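The family (7) is straightforward to implement, and the uniformity-preserving property can be checked by simulation. The sketch below is ours; the parameter values are arbitrary illustrations, not those used in Figure 3.

```python
import numpy as np

def v_transform(u, delta, kappa=1.0, xi=1.0):
    """The v-transform V_{delta,kappa,xi} of equation (7).

    With kappa = xi = 1 it reduces to the linear v-transform (8);
    with delta = 0.5 in addition, to the symmetric V(u) = |2u - 1|.
    """
    u = np.asarray(u, dtype=float)
    out = np.empty_like(u)
    lo = u <= delta
    out[lo] = 1.0 - u[lo] - (1.0 - delta) * np.exp(
        -kappa * (-np.log(u[lo] / delta)) ** xi)
    hi = ~lo
    out[hi] = u[hi] - delta * np.exp(
        -kappa ** (-1.0 / xi) * (-np.log((1.0 - u[hi]) / (1.0 - delta))) ** (1.0 / xi))
    return out

# Uniformity preservation: if U is standard uniform, so is V(U).
rng = np.random.default_rng(2)
u = rng.uniform(size=100_000)
v = v_transform(u, delta=0.45, kappa=1.3, xi=0.8)
print(np.quantile(v, [0.25, 0.5, 0.75]))  # approximately [0.25, 0.5, 0.75]
```

The two branches are evaluated on disjoint index sets to avoid taking fractional powers of negative arguments.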
We collect these properties together in the following lemma and add one further important property that we refer to as the square property of a v-transform; this property places constraints on the shape that v-transforms can take and is illustrated in Figure 3.

Lemma 2.
A v-transform is a mapping $\mathcal{V}: [0, 1] \to [0, 1]$ with the following properties:

1. $\mathcal{V}(0) = \mathcal{V}(1) = 1$;
2. There exists a point $\delta$ known as the fulcrum such that $0 < \delta < 1$ and $\mathcal{V}(\delta) = 0$;
3. $\mathcal{V}$ is continuous;
4. $\mathcal{V}$ is strictly decreasing on $[0, \delta]$ and strictly increasing on $[\delta, 1]$;
5. Every point $u \in [0, 1] \setminus \{\delta\}$ has a dual point $u^*$ on the opposite side of the fulcrum satisfying $\mathcal{V}(u) = \mathcal{V}(u^*)$ and $|u^* - u| = \mathcal{V}(u)$ (square property).

It is instructive to see why the square property must hold. Consider Figure 3 and fix a point $u \in [0, 1] \setminus \{\delta\}$ with $\mathcal{V}(u) = v$. Let $U \sim U(0, 1)$ and let $V = \mathcal{V}(U)$. The events $\{V \leq v\}$ and $\{\min(u, u^*) \leq U \leq \max(u, u^*)\}$ are the same and hence the uniformity of $V$ under a v-transform implies that
$$v = P(V \leq v) = P(\min(u, u^*) \leq U \leq \max(u, u^*)) = |u^* - u|. \quad (9)$$

The properties in Lemma 2 could be taken as the basis of an alternative definition of a v-transform. In view of (9) it is clear that any mapping $\mathcal{V}$ that has these properties is a uniformity-preserving transformation. We can characterize the mappings $\mathcal{V}$ that have these properties as follows.

Theorem 1.
A mapping $\mathcal{V}: [0, 1] \to [0, 1]$ has the properties listed in Lemma 2 if and only if it takes the form
$$\mathcal{V}(u) = \begin{cases} (1 - u) - (1 - \delta)\Psi\left(\frac{u}{\delta}\right), & u \leq \delta \\ u - \delta\,\Psi^{-1}\left(\frac{1 - u}{1 - \delta}\right), & u > \delta \end{cases} \quad (10)$$
where $\Psi$ is a continuous and strictly increasing distribution function on $[0, 1]$.

Our arguments so far show that every v-transform must have the form (10). It remains to verify that every uniformity-preserving transformation of the form (10) can be obtained from construction (4) and this is the purpose of the final result of this section. This allows us to view Definition 2, Lemma 2 and the characterization (10) as three equivalent approaches to the definition of v-transforms.
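The characterization in Theorem 1 can be explored numerically: take any continuous, strictly increasing cdf $\Psi$ on $[0,1]$ as generator, build $\mathcal{V}$ via (10), and check the square property $|u^* - u| = \mathcal{V}(u)$. In the sketch below the generator $\Psi(x) = x^2$ and the fulcrum $\delta = 0.4$ are arbitrary illustrative choices.

```python
import numpy as np

def v_from_generator(u, delta, psi, psi_inv):
    """Build a v-transform from a generator cdf Psi via equation (10)."""
    u = np.asarray(u, dtype=float)
    out = np.empty_like(u)
    lo = u <= delta
    out[lo] = (1.0 - u[lo]) - (1.0 - delta) * psi(u[lo] / delta)
    out[~lo] = u[~lo] - delta * psi_inv((1.0 - u[~lo]) / (1.0 - delta))
    return out

# Generator Psi(x) = x^2 on [0, 1]: a valid cdf, chosen only for illustration.
psi = lambda x: x ** 2
psi_inv = lambda y: np.sqrt(y)
delta = 0.4

u = np.linspace(0.01, 0.99, 99)
v = v_from_generator(u, delta, psi, psi_inv)

# Square property: for u <= delta the dual point is u* = u + v,
# it lies on the other side of the fulcrum, and V(u*) = V(u).
lo = u <= delta
u_star = u[lo] + v[lo]
v_star = v_from_generator(u_star, delta, psi, psi_inv)
print(np.max(np.abs(v_star - v[lo])))  # ~0 up to floating point error
```

The identity holds exactly: for $u \leq \delta$, $1 - u^* = (1-\delta)\Psi(u/\delta)$, so the right branch of (10) evaluated at $u^*$ returns $u^* - u = \mathcal{V}(u)$.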
Proposition 3.
Let $\mathcal{V}$ be a uniformity-preserving transformation of the form (10) and $F_X$ a continuous distribution function. Then $\mathcal{V}$ can be obtained from construction (4) using any volatility proxy transformation with change point $\mu_T = F_X^{-1}(\delta)$ and profile
$$g_T(x) = F_X^{-1}\left(F_X(\mu_T - x) + \mathcal{V}\left(F_X(\mu_T - x)\right)\right) - \mu_T, \quad x \geq 0. \quad (11)$$

Henceforth we can view (10) as the general equation of a v-transform. Distribution functions $\Psi$ on $[0, 1]$ can be thought of as generators of v-transforms. Comparing (10) with (7) we see that our parametric family $\mathcal{V}_{\delta,\kappa,\xi}$ is generated by $\Psi(x) = \exp(-\kappa(-\ln x)^{\xi})$. This is a 2-parameter distribution whose density can assume many different shapes on the unit interval including increasing, decreasing, unimodal and bathtub-shaped forms. In this respect it is quite similar to the beta distribution, which would yield an alternative family of v-transforms. The uniform distribution function $\Psi(x) = x$ gives the family of linear v-transforms $\mathcal{V}_\delta$.

In applications we construct models starting from the building blocks of a tractable v-transform $\mathcal{V}$ such as (7) and a distribution $F_X$; from these we can always infer an implied profile function $g_T$ using (11). The alternative approach of starting from $g_T$ and $F_X$ and constructing $\mathcal{V}$ via (4) is also possible but can lead to v-transforms that are cumbersome and computationally expensive to evaluate if $F_X$ and its inverse do not have simple closed forms.

If two uniform random variables are linked by the v-transform $V = \mathcal{V}(U)$ then the joint distribution function of $(U, V)$ is a special kind of copula. In this section we derive the form of the copula, which facilitates the construction of stochastic processes using v-transforms.

To state the main result we use the notation $\mathcal{V}^{-1}$ and $\mathcal{V}'$ for the inverse function and the gradient function of a v-transform $\mathcal{V}$.
Although there is no unique inverse $\mathcal{V}^{-1}(v)$ (except when $v = 0$), the fact that the two branches of a v-transform mutually determine each other allows us to define $\mathcal{V}^{-1}(v)$ to be the inverse of the left branch of the v-transform, given by $\mathcal{V}^{-1}: [0, 1] \to [0, \delta]$, $\mathcal{V}^{-1}(v) = \inf\{u : \mathcal{V}(u) = v\}$. The gradient $\mathcal{V}'(u)$ is defined for all points $u \in [0, 1] \setminus \{\delta\}$ and we adopt the convention that $\mathcal{V}'(\delta)$ is the left derivative as $u \to \delta$.

Theorem 2.
Let $V$ and $U$ be random variables related by the v-transform $V = \mathcal{V}(U)$.

1. The joint distribution function of $(U, V)$ is given by the copula
$$C(u, v) = P(U \leq u, V \leq v) = \begin{cases} 0, & u < \mathcal{V}^{-1}(v) \\ u - \mathcal{V}^{-1}(v), & \mathcal{V}^{-1}(v) \leq u < \mathcal{V}^{-1}(v) + v \\ v, & u \geq \mathcal{V}^{-1}(v) + v. \end{cases} \quad (12)$$
2. Conditional on $V = v$, the distribution of $U$ is given by
$$U = \begin{cases} \mathcal{V}^{-1}(v) & \text{with probability } \Delta(v) \text{ if } v \neq 0 \\ \mathcal{V}^{-1}(v) + v & \text{with probability } 1 - \Delta(v) \text{ if } v \neq 0 \\ \delta & \text{if } v = 0 \end{cases} \quad (13)$$
where
$$\Delta(v) = \frac{-1}{\mathcal{V}'(\mathcal{V}^{-1}(v))}. \quad (14)$$
Moreover, $E(\Delta(V)) = \delta$.

Remark 1.
In the case of the symmetric v-transform $\mathcal{V}(u) = |2u - 1|$ the copula in (12) takes the form $C(u, v) = \max\left(\min\left(u + \frac{v - 1}{2}, v\right), 0\right)$. We note that this copula is related to a special case of the tent map copula family $C^T_\theta$ in Remillard (2013) by $C(u, v) = u - C^T(u, 1 - v)$.

For the linear v-transform family the conditional probability $\Delta(v)$ in (14) satisfies $\Delta(v) = \delta$. This implies that the value of $V$ contains no information about whether $U$ is likely to be below or above the fulcrum; the probability is always the same regardless of $V$. In general this is not the case and the value of $V$ does contain information about whether $U$ is large or small.

Part (2) of Theorem 2 is the key to stochastically inverting a v-transform in the general case. Based on this result we define the concept of stochastic inversion of a v-transform. We refer to the function $\Delta$ as the conditional down probability of $\mathcal{V}$.

Definition 3 (Stochastic inversion function of a v-transform). Let $\mathcal{V}$ be a v-transform with conditional down probability $\Delta$. The two-place function $\mathcal{V}^{-1}: [0, 1] \times [0, 1] \to [0, 1]$ defined by
$$\mathcal{V}^{-1}(v, w) = \begin{cases} \mathcal{V}^{-1}(v) & \text{if } w \leq \Delta(v) \\ v + \mathcal{V}^{-1}(v) & \text{if } w > \Delta(v) \end{cases} \quad (15)$$
is the stochastic inversion function of $\mathcal{V}$.

The following proposition, which generalizes Lemma 1, allows us to construct general asymmetric processes that generalize the process of Algorithm 1.

Proposition 4.
Let $V$ and $W$ be iid $U(0, 1)$ variables and let $\mathcal{V}$ be a v-transform with stochastic inversion function $\mathcal{V}^{-1}$. If $U = \mathcal{V}^{-1}(V, W)$, then $\mathcal{V}(U) = V$ and $U \sim U(0, 1)$.

In Section 4 we apply v-transforms and their stochastic inverses to the terms of time series models. To understand the effect this has on the serial dependencies between random variables, we need to consider multivariate componentwise v-transforms of random vectors with uniform marginal distributions and these can also be represented in terms of copulas. We now give a result which forms the basis for the analysis of serial dependence properties. The first part of the result shows the relationship between copula densities under componentwise v-transforms. The second part shows the relationship under the componentwise stochastic inversion of a v-transform; in this case we assume that the stochastic inversion of each term takes place independently given $V$ so that all serial dependence comes from $V$.

Theorem 3.
Let $\mathcal{V}$ be a v-transform and let $U = (U_1, \ldots, U_d)'$ and $V = (V_1, \ldots, V_d)'$ be vectors of uniform random variables with copula densities $c_U$ and $c_V$ respectively.

1. If $V = (\mathcal{V}(U_1), \ldots, \mathcal{V}(U_d))'$ then
$$c_V(v_1, \ldots, v_d) = \sum_{j_1=1}^{2} \cdots \sum_{j_d=1}^{2} c_U(u_{1j_1}, \ldots, u_{dj_d}) \prod_{i=1}^{d} \Delta(v_i)^{I\{j_i=1\}}(1 - \Delta(v_i))^{I\{j_i=2\}} \quad (16)$$
where $u_{i1} = \mathcal{V}^{-1}(v_i)$ and $u_{i2} = \mathcal{V}^{-1}(v_i) + v_i$ for all $i \in \{1, \ldots, d\}$.

2. If $U = (\mathcal{V}^{-1}(V_1, W_1), \ldots, \mathcal{V}^{-1}(V_d, W_d))'$ where $W_1, \ldots, W_d$ are iid uniform random variables that are also independent of $V_1, \ldots, V_d$, then
$$c_U(u_1, \ldots, u_d) = c_V(\mathcal{V}(u_1), \ldots, \mathcal{V}(u_d)). \quad (17)$$

VT-ARMA processes

In this section we study some properties of the class of time series models obtained by the following algorithm, which generalizes Algorithm 1. The models obtained are described as VT-ARMA processes since they are stationary time series constructed using the fundamental building blocks of a v-transform $\mathcal{V}$ and an ARMA process.

Algorithm 2.
1. Generate $(Z_t)$ as a causal and invertible Gaussian ARMA process of order $(p, q)$ with mean zero and variance one.
2. Form the volatility PIT process $(V_t)$ where $V_t = \Phi(Z_t)$ for all $t$.
3. Generate iid $U(0, 1)$ random variables $(W_t)$.
4. Form the series PIT process $(U_t)$ by taking the stochastic inverses $U_t = \mathcal{V}^{-1}(V_t, W_t)$.
5. Form the process $(X_t)$ by setting $X_t = F_X^{-1}(U_t)$ for some continuous cdf $F_X$.

We can add any marginal behaviour in the final step and this allows for an infinitely rich choice. We can, for instance, even impose an infinite-variance or an infinite-mean distribution, such as the Cauchy distribution, and still obtain a strictly stationary process for $(X_t)$. We make the following definitions.

Definition 4 (VT-ARMA and VT-ARMA copula process). Any stochastic process $(X_t)$ that can be generated using Algorithm 2 by choosing an underlying ARMA process with mean zero and variance one, a v-transform $\mathcal{V}$ and a continuous distribution function $F_X$ is a VT-ARMA process. The process $(U_t)$ obtained at the penultimate step of the algorithm is a VT-ARMA copula process.

Figure 4 gives an example of a simulated process using Algorithm 2 and the v-transform $\mathcal{V}_{\delta,\kappa,\xi}$ in (7). The marginal distribution is a heavy-tailed skewed Student distribution of type (6) with degrees of freedom $\nu = 3$ and skewness parameter $\gamma < 1$, which gives rise to more large negative returns than large positive returns. The underlying time series model is an ARMA(1,1) model with AR parameter $\alpha$ and MA parameter $\beta$. See the caption of the figure for full details of the parameters.

In the remainder of this section we concentrate on the properties of VT-ARMA copula processes $(U_t)$ from which related properties of VT-ARMA processes $(X_t)$ may be easily inferred.

Stationary distribution

The VT-ARMA copula process $(U_t)$ of Definition 4 is a strictly stationary process since the joint distribution of $(U_{t_1}, \ldots$
$, U_{t_k})$ for any set of indices $t_1 < \cdots < t_k$ is invariant under time shifts. This property follows easily from the strict stationarity of the underlying ARMA process $(Z_t)$ according to the following result, which uses Theorem 3.

Proposition 5.
Let $(U_t)$ follow a VT-ARMA copula process with v-transform $\mathcal{V}$ and an underlying ARMA($p$, $q$) structure with autocorrelation function $\rho(k)$. The random vector $(U_{t_1}, \ldots, U_{t_k})$ for $k \in \mathbb{N}$ has joint density $c^{Ga}_{P(t_1,\ldots,t_k)}(\mathcal{V}(u_1), \ldots, \mathcal{V}(u_k))$ where $c^{Ga}_{P(t_1,\ldots,t_k)}$ denotes the density of the Gaussian copula $C^{Ga}_{P(t_1,\ldots,t_k)}$ and $P(t_1, \ldots, t_k)$ is a correlation matrix with $(i, j)$ element given by $\rho(|t_j - t_i|)$.

An expression for the joint density facilitates the calculation of a number of dependence measures for the bivariate marginal distribution of $(U_t, U_{t+k})$. In the bivariate case the correlation matrix of the underlying Gaussian copula $C^{Ga}_{P(t,t+k)}$ contains a single off-diagonal value $\rho(k)$ and we simply write $C^{Ga}_{\rho(k)}$. The Pearson correlation of $(U_t, U_{t+k})$ is given by
$$\rho(U_t, U_{t+k}) = 12\int_0^1\!\!\int_0^1 u_1 u_2\, c^{Ga}_{\rho(k)}(\mathcal{V}(u_1), \mathcal{V}(u_2))\, \mathrm{d}u_1\, \mathrm{d}u_2 - 3. \quad (18)$$
This value is also the value of the Spearman rank correlation $\rho_S(X_t, X_{t+k})$ for a VT-ARMA process $(X_t)$ with copula process $(U_t)$ (since the Spearman's rank correlation of a pair of continuous random variables is the Pearson correlation of their copula). The calculation of (18) typically requires numerical integration. However, in the special case of the linear v-transform $\mathcal{V}_\delta$ in (8) we can get a simpler expression, as shown in the following result.

Proposition 6.
Let (U_t) be a VT-ARMA copula process satisfying the assumptions of Proposition 5 with linear v-transform V_δ. Let (Z_t) denote the underlying Gaussian ARMA process. Then

ρ(U_t, U_{t+k}) = (2δ − 1)^2 ρ_S(Z_t, Z_{t+k}) = 6 (2δ − 1)^2 arcsin(ρ(k)/2) / π.  (19)

For the symmetric v-transform V_{0.5}, equation (19) obviously yields a correlation of zero so that, in this case, the VT-ARMA copula process (U_t) is a white noise with an autocorrelation function that is zero, except at lag zero. However, even a markedly asymmetric choice of δ gives ρ(U_t, U_{t+k}) = (2δ − 1)^2 ρ_S(Z_t, Z_{t+k}) with a small multiplier (2δ − 1)^2, so that serial correlations tend to be very weak.

When we add a marginal distribution, the resulting process (X_t) has a different autocorrelation function to (U_t), but the same rank autocorrelation function. The symmetric model of Section 2 is a white noise process. General asymmetric processes (X_t) are not perfect white noise processes but have only very weak serial correlation.

To derive the conditional distribution of a VT-ARMA copula process we use the vector notation U_t = (U_1, ..., U_t)' and Z_t = (Z_1, ..., Z_t)' to denote the histories of the processes up to time point t, and u_t and z_t for realizations. These vectors are related by the componentwise transformation Z_t = Φ^{-1}(V(U_t)). We assume all processes have time index set given by t ∈ {1, 2, ...}.

Proposition 7.
For t > 1 the conditional density f_{U_t | U_{t−1}}(u | u_{t−1}) is given by

f_{U_t | U_{t−1}}(u | u_{t−1}) = φ( (Φ^{-1}(V(u)) − μ_t) / σ_ε ) / ( σ_ε φ(Φ^{-1}(V(u))) )  (20)

where μ_t = E(Z_t | Z_{t−1} = Φ^{-1}(V(u_{t−1}))) and σ_ε is the standard deviation of the innovation process for the ARMA model followed by (Z_t).

When (Z_t) is iid white noise, μ_t = 0, σ_ε = 1 and (20) reduces to the uniform density f_{U_t | U_{t−1}}(u | u_{t−1}) = 1 as expected. In the case of the first-order Markov AR(1) model Z_t = α Z_{t−1} + ε_t, the conditional mean of Z_t is μ_t = α Φ^{-1}(V(u_{t−1})) and σ_ε = √(1 − α²). The conditional density (20) can be easily shown to simplify to f_{U_t | U_{t−1}}(u | u_{t−1}) = c^Ga_α(V(u), V(u_{t−1})), where c^Ga_α(V(u_1), V(u_2)) denotes the copula density derived in Proposition 5. In this special case the VT-ARMA model falls within the class of first-order Markov copula models considered by Chen and Fan (2006), although the copula is new.

If we add a marginal distribution F_X to the VT-ARMA copula model to obtain a model for (X_t) and use similar notational conventions as above, the resulting VT-ARMA model has conditional density

f_{X_t | X_{t−1}}(x | x_{t−1}) = f_X(x) f_{U_t | U_{t−1}}(F_X(x) | F_X(x_{t−1}))  (21)

with f_{U_t | U_{t−1}} as in (20). An interesting property of the VT-ARMA process is that the conditional density (21) can have a pronounced bimodality for values of μ_t in excess of zero, that is, in high-volatility situations where the conditional mean of Z_t is higher than the marginal mean value of zero; in low-volatility situations the conditional density appears more concentrated around zero. This phenomenon is illustrated in Figure 4. The bimodality in high-volatility situations makes sense: in such cases it is likely that the next return will be large in absolute value and relatively less likely that it will be close to zero.
Figure 4: Top left: realization of length n = 500 of (X_t) for a process with a marginal skewed Student distribution (parameters: ν = 3, γ = 0., μ = 0., σ = 1), a v-transform of the form (7) (parameters: δ = 0., κ = 0., ξ = 1.) and an underlying ARMA process (α = 0., β = −0., σ_ε = 0.). Top right: the underlying ARMA process (Z_t) in gray with the conditional mean (μ_t) superimposed in black; horizontal lines at μ_t = 0. (a high value) and μ_t = −0. (a low value). The corresponding conditional densities are shown in the bottom figures with the marginal density as a dashed line.

The conditional distribution function of (X_t) is F_{X_t | X_{t−1}}(x | x_{t−1}) = F_{U_t | U_{t−1}}(F_X(x) | F_X(x_{t−1})) and hence the ψ-quantile x_{ψ,t} of F_{X_t | X_{t−1}} can be obtained by solving

ψ = F_{U_t | U_{t−1}}(F_X(x_{ψ,t}) | F_X(x_{t−1})).  (22)

For ψ < 0.5 the negative of this value is often referred to as the conditional (1 − ψ)-VaR (value-at-risk) at time t in financial applications.

Statistical inference
In the copula approach to dependence modelling, the copula is the object of central interest and marginal distributions are often of secondary importance. A number of different approaches to estimation are found in the literature. As before, let x_1, ..., x_n represent realizations of variables X_1, ..., X_n from the time series process (X_t).

The semi-parametric approach developed by Genest et al. (1995) is very widely used in copula inference and has been applied by Chen and Fan (2006) to first-order Markov copula models in the time series context. In this approach the marginal distribution F_X is first estimated non-parametrically using the scaled empirical distribution function F_n^{(X)} (see definition in Section 1) and the data are transformed onto the (0, 1) scale. This has the effect of creating pseudo-copula data u_t = rank(x_t)/(n + 1), where rank(x_t) denotes the rank of x_t within the sample. The copula is fitted to the pseudo-copula data by maximum likelihood (ML).

As an alternative, the inference-functions-for-margins (IFM) approach of Joe (2015) could be applied. This is also a two-step method, although in this case a parametric model F̂_X is estimated under an iid assumption in the first step and the copula is fitted to the data u_t = F̂_X(x_t) in the second step.

The approach we adopt for our empirical example is to first use the semi-parametric approach to determine a reasonable copula process, then to estimate marginal parameters under an iid assumption, and finally to estimate all parameters jointly using the parameter estimates from the previous steps as starting values.

We concentrate on the mechanics of deriving maximum likelihood estimates (MLEs). The problem of establishing the asymptotic properties of the MLEs in our setting is a difficult one.
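The pseudo-copula data construction just described is easily sketched in code. The following is our own minimal illustration (the function name and example data are not from the paper):

```python
import numpy as np
from scipy.stats import rankdata

def pseudo_copula_data(x):
    """Map a sample onto the (0, 1) scale via scaled ranks,
    u_t = rank(x_t) / (n + 1), as in the semi-parametric approach."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return rankdata(x) / (n + 1)

# Example: heavy-tailed returns are mapped to approximately uniform values
rng = np.random.default_rng(1)
x = rng.standard_t(df=3, size=1000)
u = pseudo_copula_data(x)
```

The division by n + 1 rather than n keeps the pseudo-observations strictly inside (0, 1), which matters when they are subsequently passed through quantile functions such as Φ^{-1}.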
It is similar to, but appears to be more technically challenging than, the problem of showing consistency and efficiency of MLEs for a Box-Cox-transformed Gaussian ARMA process, as discussed in Terasaka and Hosoya (2007). We are also working with a componentwise transformed ARMA process, although in our case the transformation (X_t) → (Z_t) is via the non-linear, non-monotonic volatility proxy transformation T^{(Z)}(x) in (5), which is not differentiable at the change point μ_T. We have, however, run extensive simulations which suggest good behaviour of the MLEs in large samples.

Maximum likelihood estimation of VT-ARMA copula process

We first consider the estimation of the VT-ARMA copula process for a sample of data u_1, ..., u_n. Let θ^(V) and θ^(A) denote the parameters of the v-transform and ARMA model respectively. It follows from Theorem 3 (part 2) and Proposition 5 that the log-likelihood for the sample u_1, ..., u_n is simply the log density of the Gaussian copula under componentwise inverse v-transformation. This is given by

L(θ^(V), θ^(A) | u_1, ..., u_n) = L*(θ^(A) | Φ^{-1}(V_{θ^(V)}(u_1)), ..., Φ^{-1}(V_{θ^(V)}(u_n))) − Σ_{t=1}^{n} ln φ(Φ^{-1}(V_{θ^(V)}(u_t)))  (23)

where the first term L* is the log-likelihood for an ARMA model with a standard N(0,1) marginal distribution. Both terms in the log-likelihood (23) are relatively straightforward to evaluate.

The evaluation of the ARMA likelihood L*(θ^(A) | z_1, ..., z_n) for parameters θ^(A) and data z_1, ..., z_n can be accomplished using the Kalman filter. However, it is important to note that the assumption that the data z_1, ..., z_n are standard normal requires a bespoke implementation of the Kalman filter, since standard software always treats the error variance σ_ε² as a free parameter in the ARMA model. In our case we need to constrain σ_ε² to be a function of the ARMA parameters so that var(Z_t) = 1.
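In the first-order Markov special case the exact Gaussian likelihood is available in closed form, so (23) can be evaluated without a Kalman filter. The sketch below is our own illustration for a VT(1)-AR(1) copula with the linear v-transform (function names are ours); a convenient check is that the log-likelihood is identically zero when α = 0, since the process is then iid uniform:

```python
import numpy as np
from scipy.stats import norm

def v_linear(u, delta):
    """Linear v-transform V_delta with fulcrum delta."""
    u = np.asarray(u, dtype=float)
    return np.where(u <= delta, 1.0 - u / delta, (u - delta) / (1.0 - delta))

def copula_loglik_ar1(u, delta, alpha):
    """Log-likelihood (23) for a VT(1)-AR(1) copula process.

    For an AR(1) with var(Z_t) = 1 the exact likelihood is
    Z_1 ~ N(0, 1) and Z_t | Z_{t-1} ~ N(alpha * z_{t-1}, 1 - alpha^2).
    """
    z = norm.ppf(v_linear(u, delta))
    sig = np.sqrt(1.0 - alpha ** 2)
    # L*: exact log-likelihood of an AR(1) with unit marginal variance
    lstar = norm.logpdf(z[0]) + np.sum(
        norm.logpdf(z[1:], loc=alpha * z[:-1], scale=sig))
    # second term of (23): minus the sum of standard normal log densities
    return lstar - np.sum(norm.logpdf(z))

rng = np.random.default_rng(2)
u = rng.uniform(size=500)  # plays the role of pseudo-copula data
```

In practice one would maximize this function over (δ, α); for ARMA(p, q) with p or q above trivial orders the constrained Kalman-filter recursion described in the text replaces the closed-form AR(1) likelihood.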
For example, in the case of an ARMA(1,1) model with AR parameter α and MA parameter β, this means that σ_ε² = σ_ε²(α, β) = (1 − α²)/(1 + 2αβ + β²). The constraint on σ_ε must be incorporated into the state-space representation of the ARMA model.

Model validation tests for the VT-ARMA copula can be based on the residuals

r_t = z_t − μ̂_t,  z_t = Φ^{-1}(V_{θ̂^(V)}(u_t))  (24)

where z_t denotes the implied realization of the normalized volatility proxy variable and where an estimate μ̂_t of the conditional mean μ_t = E(Z_t | Z_{t−1} = z_{t−1}) may be obtained as an output of the Kalman filter. The residuals should behave like an iid sample from a normal distribution.

Using the estimated model, it is also possible to implement a likelihood-ratio (LR) test for the presence of stochastic volatility in the data. Under the null hypothesis that θ^(A) = 0 the log-likelihood (23) is identically equal to zero. Thus the size of the maximized log-likelihood L(θ̂^(V), θ̂^(A); u_1, ..., u_n) provides a measure of the evidence for the presence of stochastic volatility.

Adding a marginal model

If F_X and f_X denote the cdf and density of the marginal model and the parameters are denoted θ^(M), then the full log-likelihood for the data x_1, ..., x_n is simply

L_full(θ | x_1, ..., x_n) = Σ_{t=1}^{n} ln f_X(x_t; θ^(M)) + L(θ^(V), θ^(A) | F_X(x_1; θ^(M)), ..., F_X(x_n; θ^(M)))  (25)

where the first term is the log-likelihood for a sample of iid data from the marginal distribution F_X and the second term is (23).

When a marginal model is added we can recover the implied form of the volatility proxy transformation using Proposition 3.
If δ̂ is the estimated fulcrum parameter of the v-transform, then the estimated change point is μ̂_T = F̂_X^{-1}(δ̂; θ̂^(M)) and the implied profile function is

ĝ_T(x) = F̂_X^{-1}( F̂_X(μ̂_T − x) + V_{θ̂^(V)}(F̂_X(μ̂_T − x)) ) − μ̂_T.  (26)

Note that it is possible to force the change point to be zero in a joint estimation of marginal model and copula by imposing the constraint F_X(0; θ^(M)) = δ on the fulcrum and marginal parameters during the optimization. However, in our experience, superior fits are obtained when these parameters are unconstrained.

We analyse n = 1043 daily log-returns for the Bitcoin price series for the period 2016–2019; values are multiplied by 100. We first apply the semi-parametric approach of Genest et al. (1995) using the log-likelihood (23), which yields the results in Table 1. Different models are referred to by VT(n)-ARMA(p, q), where (p, q) refers to the ARMA model and n indexes the v-transform: 1 is the linear v-transform V_δ in (8); 3 is the three-parameter transform V_{δ,κ,ξ} in (7); 2 is the two-parameter v-transform given by V_{δ,κ} := V_{δ,κ,1}. In unreported analyses we also tried the three-parameter family based on the beta distribution, but this had negligible effect on the results.

The column marked L gives the value of the maximized log-likelihood. All values are large and positive, showing strong evidence of stochastic volatility in all cases. The model VT(1)-ARMA(1,0) is a first-order Markov model with linear v-transform. The fit of this model is noticeably poorer than the others, suggesting that Markov models are insufficient to capture the persistence of stochastic volatility in the data.
The column marked SW contains the p-value for a Shapiro-Wilk test of normality applied to the residuals from the VT-ARMA copula model; the result is non-significant in all cases.

Model              α        β        δ        κ        ξ        SW     L      AIC
VT(1)-ARMA(1,0)    0.283             0.460                      0.515  37.59  -71.17
                  (0.026)           (0.001)
VT(1)-ARMA(1,1)    0.962    -0.840   0.416                      0.197  92.91  -179.81
                  (0.012)  (0.028)  (0.004)
VT(2)-ARMA(1,1)    0.965    -0.847   0.463    0.920             0.385  94.73  -181.45
                  (0.011)  (0.026)  (0.001)  (0.131)
VT(3)-ARMA(1,1)    0.962    -0.839   0.463    0.881    0.995    0.407  94.82  -179.64
                  (0.012)  (0.028)  (0.001)  (0.123)  (0.154)

Table 1: Analysis of daily Bitcoin return data 2016–2019. Parameter estimates, standard errors (below estimates) and information about the fit: SW denotes the Shapiro-Wilk p-value; L is the maximized value of the log-likelihood and AIC is the Akaike information criterion.

According to the AIC values, the VT(2)-ARMA(1,1) is the best model. We experimented with higher-order ARMA processes but this did not lead to further significant improvements. Figure 5 provides a visual summary of the fit of this model. The pictures in the panels show the QQ-plot of the residuals against normal, acf plots of the residuals and squared residuals, and the estimated conditional mean process (μ̂_t), which can be taken as an indicator of high and low volatility periods. The residuals and absolute residuals show very little evidence of serial correlation and the QQ-plot is relatively linear, suggesting that the ARMA filter has been successful in explaining much of the serial dependence structure of the normalized volatility proxy process.

We now add various marginal distributions to the VT(2)-ARMA(1,1) copula model and estimate all parameters of the model jointly. We have experimented with a number of location-scale families including Student t, Laplace (double exponential) and a double-Weibull family which generalizes the Laplace distribution and is constructed by taking back-to-back Weibull distributions. Estimation results are presented for these 3 distributions in Table 2. All three marginal distributions are symmetric around their location parameters μ and no improvement is obtained by adding skewness using the construction of Fernández and Steel (1998) described in Section 3.1; in fact, the Bitcoin returns in this time period show a remarkable degree of symmetry. In the table the shape and scale parameters of the distributions are denoted η and σ respectively; in the case of Student, an infinite-variance distribution with degree-of-freedom parameter η = 1.
is fitted, but this model is inferior to the models with Laplace and double-Weibull margins; the latter is the favoured model on the basis of AIC values.

Figure 5: Plots for a VT(2)-ARMA(1,1) model fitted to the Bitcoin return data: QQ-plot of the residuals against normal (upper left); acf of the residuals (upper right); acf of the absolute residuals (lower left); estimated conditional mean process (μ̂_t) (lower right).

Figure 6 shows some aspects of the joint fit for the fully parametric VT(2)-ARMA(1,1) model with double-Weibull margin. A QQ-plot of the data against the fitted marginal distribution confirms that the double-Weibull is a good marginal model for these data. Although this distribution is sub-exponential (heavier-tailed than exponential), its tails do not follow a power law and it is in the maximum domain of attraction of the Gumbel distribution (see, for example, McNeil et al., 2015, Chapter 5).

Using (26) the implied volatility proxy profile function ĝ_T can be constructed and is found to lie just below the line y = x, as shown in the upper-right panel. The change point is estimated to be μ̂_T = 0. We can also estimate an implied volatility proxy transformation in the equivalence class defined by ĝ_T and μ̂_T. We estimate the transformation T = T^{(Z)} in (5) by taking T̂(x) = Φ^{-1}( V_{θ̂^(V)}( F_X(x; θ̂^(M)) ) ).

        Student          Laplace          dWeibull
β       -0.842 (0.026)   -0.847 (0.025)   -0.847 (0.035)
L       -2801.696        -2791.999        -2779.950
AIC     5617.392         5595.999         5573.899

Table 2: VT(2)-ARMA(1,1) model with 3 different margins: Student t, Laplace, double Weibull. Parameter estimates, standard errors (alongside estimates) and information about the fit: L is the maximized value of the log-likelihood and AIC is the Akaike information criterion.
In the lower-left panel of Figure 6 we show the empirical v-transform formed from the data (x_t, T̂(x_t)) together with the fitted parametric v-transform V_{θ̂^(V)}. We recall from Section 1 that the empirical v-transform is the plot (u_t, v_t) where u_t = F_n^{(X)}(x_t) and v_t = F_n^{(T̂(X))}(T̂(x_t)). The empirical v-transform and the fitted parametric v-transform show a good degree of correspondence. The lower-right panel of Figure 6 shows the volatility proxy transformation T̂(x) as a function of x superimposed on the points (x_t, Φ^{-1}(v_t)). Using the curve we can compare the effects of, for example, log-returns of the same magnitude but opposite sign on the volatility proxy.

For comparison, we fit GARCH(1,1) models with Student t and GED innovation distributions using the rugarch package in R. The generalized error distribution (GED) contains normal and Laplace as special cases as well as a model which has similar tail behaviour to the Weibull; note, however, that by the theory of Mikosch and Stărică (2000) the tails of the marginal distribution of the GARCH decay according to a power law in both cases. The results in Table 3 show that the VT(2)-ARMA(1,1) models with Laplace and double-Weibull marginal distributions outperform both GARCH models in terms of AIC values.

Figure 7 shows the in-sample 95% conditional value-at-risk (VaR) estimate based on the VT(2)-ARMA(1,1) model, which has been calculated using (22). For comparison, a dashed line shows the corresponding estimate for the GARCH(1,1) model with GED innovations.

Finally, we carry out an out-of-sample comparison of conditional VaR estimates using the same
Figure 6: Plots for a VT(2)-ARMA(1,1) model combined with a double-Weibull marginal distribution fitted to the Bitcoin return data: QQ-plot of the data against the fitted double-Weibull model (upper left); estimated volatility proxy profile function g_T (upper right); estimated v-transform (lower left); implied relationship between the data and the volatility proxy variable (lower right).

two models. In this analysis, the models are estimated daily throughout the 2016–2019 period using a 1000-day moving data window and one-step-ahead VaR forecasts are calculated. The VT-ARMA model gives 47 exceptions of the 95% VaR and 11 exceptions of the 99% VaR, compared with expected numbers of 52 and 10 for a 1043-day sample, while the GARCH model leads to 57 and 12 exceptions; both models pass binomial tests for these exception counts. In a follow-up paper (Bladt and McNeil, 2020), we conduct more extensive out-of-sample backtests for models using v-transforms and copula processes and show that they rival and often outperform forecast models from the extended GARCH family.

Model                Parameters   AIC
VT-ARMA (Student)    7            5617.39
VT-ARMA (Laplace)    6            5596.00
VT-ARMA (dWeibull)   7            5573.90
GARCH (Student)      5            5629.02
GARCH (GED)          5            5611.53

Table 3: Comparison of three VT(2)-ARMA(1,1) models with different marginal distributions with two GARCH(1,1) models with different innovation distributions.
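The binomial tests on the exception counts reported above can be reproduced along the following lines. This is a sketch using scipy's exact binomial test, which is our choice rather than the paper's own code; under a correctly specified model the number of VaR exceptions over n days is Binomial(n, 1 − level):

```python
from scipy.stats import binomtest

n_days = 1043  # length of the out-of-sample period

# (model, VaR level, observed number of exceptions) from the backtest
results = [
    ("VT-ARMA", 0.95, 47), ("VT-ARMA", 0.99, 11),
    ("GARCH",   0.95, 57), ("GARCH",   0.99, 12),
]

pvals = {}
for model, level, exceptions in results:
    # two-sided exact test of H0: exception probability = 1 - level
    test = binomtest(exceptions, n_days, p=1.0 - level)
    pvals[(model, level)] = test.pvalue
```

With the counts above, none of the tests rejects at the 5% level, consistent with the statement that both models pass the binomial backtests.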
This paper has proposed a new approach to volatile financial time series in which v-transforms are used to describe the relationship between quantiles of the return distribution and quantiles of the distribution of a predictable volatility proxy variable. We have characterized v-transforms mathematically and shown that the stochastic inverse of a v-transform may be used to construct stationary models for return series where arbitrary marginal distributions may be coupled with dynamic copula models for the serial dependence in the volatility proxy.

The construction was illustrated using the serial dependence model implied by a Gaussian ARMA process. The resulting class of VT-ARMA processes is able to capture the important features of financial return series including near-zero serial correlation (white noise behaviour) and volatility clustering. Moreover, the models are relatively straightforward to estimate, building on the classical maximum-likelihood estimation of an ARMA model using the Kalman filter. This can be accomplished in the stepwise manner that is typical in copula modelling or through joint modelling of the marginal and copula process. The resulting models yield insights into the way that volatility responds to returns of different magnitude and sign and can give estimates of unconditional and conditional quantiles (VaR) for practical risk measurement purposes.

There are many possible uses for VT-ARMA copula processes. Because we have complete control over the marginal distribution they are very natural candidates for the innovation distribution in other time series models. For example, they could be applied to the innovations of an ARMA model to obtain ARMA models with VT-ARMA errors; this might be particularly appropriate for longer-interval returns, such as weekly or monthly returns, where some serial dependence is likely to be present in the raw return data.

Clearly, we could use other copula processes for the volatility PIT process (V_t).
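The generative construction recapped above (Algorithm 2) can be sketched end to end. The following is our own minimal illustration, not the paper's code: it uses the linear v-transform V_δ, whose stochastic inverse picks the left branch u = δ(1 − v) with constant probability Δ(v) = δ, an ARMA(1,1) for (Z_t) scaled to unit variance, and a Student t marginal (all parameter values are ours):

```python
import numpy as np
from scipy.stats import norm, t as student_t

rng = np.random.default_rng(3)

def simulate_vt_arma11(n, alpha, beta, delta, nu, burn=500):
    """Simulate a VT-ARMA(1,1) process with linear v-transform V_delta
    and Student t marginal, following the five steps of Algorithm 2."""
    # step 1: causal Gaussian ARMA(1,1) with innovation sd chosen
    # so that var(Z_t) = 1, i.e. sig^2 = (1-a^2)/(1+2ab+b^2)
    sig = np.sqrt((1 - alpha ** 2) / (1 + 2 * alpha * beta + beta ** 2))
    eps = sig * rng.standard_normal(n + burn)
    z = np.zeros(n + burn)
    for i in range(1, n + burn):
        z[i] = alpha * z[i - 1] + eps[i] + beta * eps[i - 1]
    z = z[burn:]
    v = norm.cdf(z)                      # step 2: volatility PIT process
    w = rng.uniform(size=n)              # step 3: auxiliary uniforms
    # step 4: stochastic inverse of V_delta (left branch w.p. delta)
    u = np.where(w <= delta, delta * (1 - v), delta + (1 - delta) * v)
    x = student_t.ppf(u, df=nu)          # step 5: impose the marginal
    return u, v, x

u, v, x = simulate_vt_arma11(2000, alpha=0.95, beta=-0.85,
                             delta=0.45, nu=3)
```

A useful consistency check is that applying V_δ to the simulated (U_t) recovers the volatility PIT values (V_t) exactly, as guaranteed by Proposition 4.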
Figure 7: Plot of estimated 95% value-at-risk (VaR) for the Bitcoin return data superimposed on the log-returns. The solid line shows the VaR estimated using the VT(2)-ARMA(1,1) model combined with a double-Weibull marginal distribution; the dashed line shows the VaR estimated using a GARCH(1,1) model with GED innovation distribution.

The VT-ARMA copula process has some limitations: the radial symmetry of the underlying Gaussian copula means that the serial dependence between large values of the volatility proxy must mirror the serial dependence between small values; moreover, this copula does not admit tail dependence in either tail, and it seems plausible that very large values of the volatility proxy might have a tendency to occur in succession.

To extend the class of models based on v-transforms we can look for models for the volatility PIT process (V_t) with higher-dimensional marginal distributions given by asymmetric copulas with upper tail dependence. First-order Markov copula models as developed in Chen and Fan (2006) can give asymmetry and tail dependence, but they cannot model the dependencies at longer lags that we find in empirical data. D-vine copula models can model higher-order Markov dependencies and Bladt and McNeil (2020) show that this is a promising alternative specification for the volatility PIT process.

Software

The analyses were carried out using the R package tscopula (McNeil and Bladt, 2020) available at https://github.com/ajmcneil/tscopula. The full reproducible code and the data are available at https://github.com/ajmcneil/vtarma.

Acknowledgements
The author is grateful for valuable input from a number of researchers including Hansjoerg Albrecher, Martin Bladt, Valérie Chavez-Demoulin, Alexandra Dias, Christian Genest, Michael Gordy, Yen Hsiao Lok, Johanna Nešlehová, Andrew Patton and Ruodu Wang. Particular thanks are due to Martin Bladt for providing the Bitcoin data and collaborating on the data analysis. The paper was completed while the author was a guest at the Forschungsinstitut für Mathematik (FIM) at ETH Zurich.
A Proofs
A.1 Proof of Proposition 1
We observe that for x ⩾ 0,

F_{T(X)}(x) = P(μ_T − T^{-1}(x) ⩽ X_t ⩽ μ_T + T^{-1}(x)) = F_X(μ_T + T^{-1}(x)) − F_X(μ_T − T^{-1}(x)).

We have {X_t ⩽ μ_T} ⟺ {U ⩽ F_X(μ_T)} and in this case

V = F_{T(X)}(T(X_t)) = F_{T(X)}(T(μ_T − X_t)) = F_X(μ_T + T^{-1}(T(μ_T − X_t))) − F_X(X_t) = F_X( μ_T + g_T( μ_T − F_X^{-1}(U) ) ) − U.

Similarly {X_t > μ_T} ⟺ {U > F_X(μ_T)} and in this case

V = F_{T(X)}(T(X_t)) = F_{T(X)}(T(X_t − μ_T)) = F_X(X_t) − F_X(μ_T − T^{-1}(T(X_t − μ_T))) = U − F_X( μ_T − g_T^{-1}( F_X^{-1}(U) − μ_T ) ).

A.2 Proof of Proposition 2

The cumulative distribution function F(x) of the double exponential distribution is equal to 0.5 e^x for x ⩽ 0 and 1 − 0.5 e^{−x} if x > 0. It is straightforward to verify that

F_X(x; γ) = δ e^{γx} for x ⩽ 0,  F_X(x; γ) = 1 − (1 − δ) e^{−x/γ} for x > 0,

and

F_X^{-1}(u; γ) = (1/γ) ln(u/δ) for u ⩽ δ,  F_X^{-1}(u; γ) = −γ ln( (1 − u)/(1 − δ) ) for u > δ.

When g_T(x) = k x^ξ we obtain for u ⩽ δ that

V_{δ,κ,ξ}(u) = F_X( (k/γ^ξ) ( ln(δ/u) )^ξ ; γ ) − u = 1 − u − (1 − δ) exp( −(k/γ^{ξ+1}) ( −ln(u/δ) )^ξ ).

For u > δ we make a similar calculation.
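The calculation in the proof of Proposition 2 can be checked numerically. The sketch below is our own illustration (with μ_T = 0, κ = k/γ^{ξ+1}, and arbitrary test parameters): it compares the closed-form left branch of V_{δ,κ,ξ} against the general construction V(u) = F_X(g_T(−F_X^{-1}(u))) − u for u ⩽ δ:

```python
import numpy as np

def F_X(x, delta, gamma):
    """Skewed double exponential cdf from the proof (F_X(0) = delta)."""
    x = np.asarray(x, dtype=float)
    return np.where(x <= 0, delta * np.exp(gamma * x),
                    1 - (1 - delta) * np.exp(-x / gamma))

def F_X_inv(u, delta, gamma):
    """Quantile function of the skewed double exponential."""
    u = np.asarray(u, dtype=float)
    return np.where(u <= delta, np.log(u / delta) / gamma,
                    -gamma * np.log((1 - u) / (1 - delta)))

def v_construction(u, delta, gamma, k, xi):
    """V(u) = F_X(g_T(-F_X^{-1}(u))) - u for u <= delta, g_T(x) = k x^xi."""
    gT = k * (-F_X_inv(u, delta, gamma)) ** xi
    return F_X(gT, delta, gamma) - u

def v_closed_form(u, delta, gamma, k, xi):
    """Closed-form left branch with kappa = k / gamma^(xi + 1)."""
    kappa = k / gamma ** (xi + 1)
    return 1 - u - (1 - delta) * np.exp(-kappa * (-np.log(u / delta)) ** xi)

u = np.linspace(1e-4, 0.45, 200)  # left branch only: u <= delta
v1 = v_construction(u, delta=0.45, gamma=1.3, k=0.9, xi=1.1)
v2 = v_closed_form(u, delta=0.45, gamma=1.3, k=0.9, xi=1.1)
```

The two evaluations agree on the whole grid, and both vanish at the fulcrum u = δ, as required of a v-transform.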
A.3 Proof of Theorem 1
It is easy to check that equation (10) fulfils the list of properties in Lemma 2. We concentrate on showing that a function that has these properties must be of the form (10). It helps to consider the picture of a v-transform in Figure 3. Consider the lines v = 1 − u and v = δ − u for u ∈ [0, δ]. The areas above the former and below the latter are shaded gray.

The left branch of the v-transform must start at (0, 1), end at (δ, 0) and lie strictly between these lines on (0, δ). Suppose, to the contrary, that v = V(u) ⩽ δ − u for u ∈ (0, δ). This would imply that the dual point u* given by u* = u + v satisfies u* ⩽ δ, which contradicts the requirement that u* must be on the opposite side of the fulcrum. Similarly, if v = V(u) ⩾ 1 − u for u ∈ (0, δ) then u* ⩾ 1, and this is also not possible; if u* = 1 then u = 0, which is a contradiction.

Thus the curve that links (0, 1) and (δ, 0) must take the form

V(u) = (δ − u) Ψ(u/δ) + (1 − u)(1 − Ψ(u/δ)) = (1 − u) − (1 − δ) Ψ(u/δ)

where Ψ(0) = 0, Ψ(1) = 1 and 0 < Ψ(x) < 1 for x ∈ (0, 1). Clearly Ψ must be continuous to satisfy the conditions of the v-transform. It must also be strictly increasing. If it were not, then the derivative would satisfy V′(u) ⩾ −1, which is not possible: if at any point u ∈ (0, δ) we have V′(u) = −1 then the opposite branch of the v-transform would have to jump vertically at the dual point u*, contradicting continuity; if V′(u) > −1 then V would have to be a decreasing function at u*, which is also a contradiction.

Thus Ψ fulfils the conditions of a continuous, strictly increasing distribution function on [0, 1] and we have established the necessary form for the left branch equation. To find the value of the right branch equation at u > δ we invoke the square property. Since V(u) = V(u*) = V(u − V(u)) we need to solve the equation x = V(u − x) for x ∈ [0, 1] using the formula for the left branch equation of V. Thus we solve x = 1 − u + x − (1 − δ)Ψ((u − x)/δ) for x, and this yields the right branch equation as asserted.

A.4 Proof of Proposition 3
Let g_T(x) be as given in (11) and let u(x) = F_X(μ_T − x). For x ∈ ℝ₊, u(x) is a continuous, strictly decreasing function of x starting at u(0) = δ and decreasing to 0. Since Ψ is a cumulative distribution function, it follows that

u*(x) = u(x) + V(u(x)) = 1 − (1 − δ) Ψ( u(x)/δ )

is a continuous, strictly increasing function starting at u*(0) = δ and increasing to 1. Hence g_T(x) = F_X^{-1}(u*(x)) − μ_T is continuous and strictly increasing on ℝ₊ with g_T(0) = 0, as required of the profile function of a volatility proxy transformation. It remains to check that if we insert (11) in (4) we recover V(u), which is straightforward.

A.5 Proof of Theorem 2
1. For any 0 ⩽ v ⩽ 1 the event {U ⩽ u, V ⩽ v} has zero probability for u < V^{-1}(v). For u ⩾ V^{-1}(v) we have

{U ⩽ u, V ⩽ v} = {V^{-1}(v) ⩽ U ⩽ min(u, V^{-1}(v) + v)}

and hence P(U ⩽ u, V ⩽ v) = min(u, V^{-1}(v) + v) − V^{-1}(v), and (12) follows.

2. We can write P(U ⩽ u, V ⩽ v) = C(u, v), where C is the copula given by (12). It follows from the basic properties of a copula that

P(U ⩽ u | V = v) = (d/dv) C(u, v) = 0 for u < V^{-1}(v);  −(d/dv) V^{-1}(v) for V^{-1}(v) ⩽ u < V^{-1}(v) + v;  1 for u ⩾ V^{-1}(v) + v.

This is the distribution function of a binomial distribution and it must be the case that Δ(v) = −(d/dv) V^{-1}(v). Equation (14) follows by differentiating the inverse.

3. Finally, E(Δ(V)) = δ is easily verified by making the substitution x = V^{-1}(v) in the integral E(Δ(V)) = −∫₀¹ dv / V′(V^{-1}(v)).

A.6 Proof of Proposition 4
It is obviously true that V(V^{-1}(v, W)) = v for any W. Hence V(U) = V(V^{-1}(V, W)) = V. The uniformity of U follows from the fact that

P( V^{-1}(V, W) = V^{-1}(v) | V = v ) = P( W ⩽ Δ(v) | V = v ) = P( W ⩽ Δ(v) ) = Δ(v).

Hence the pair of random variables (U, V) has the conditional distribution (13) and is distributed according to the copula C in (12).

A.7 Proof of Theorem 3
1. Since the event {V_i ⩽ v_i} is equal to the event {V^{-1}(v_i) ⩽ U_i ⩽ V^{-1}(v_i) + v_i}, we first compute the probability of a box [a_1, b_1] × ... × [a_d, b_d] where a_i = V^{-1}(v_i) ⩽ V^{-1}(v_i) + v_i = b_i. The standard formula for such probabilities implies that the copulas C_V and C_U are related by

C_V(v_1, ..., v_d) = Σ_{j_1=1}^{2} ... Σ_{j_d=1}^{2} (−1)^{j_1 + ... + j_d} C_U(u_{1j_1}, ..., u_{dj_d}),  u_{i1} = a_i, u_{i2} = b_i;

see, for example, McNeil et al. (2015), page 221. Thus the copula densities are related by

c_V(v_1, ..., v_d) = Σ_{j_1=1}^{2} ... Σ_{j_d=1}^{2} c_U(u_{1j_1}, ..., u_{dj_d}) ∏_{i=1}^{d} (d/dv_i)( (−1)^{j_i} u_{ij_i} )

and

(d/dv_i)( (−1)^{1} u_{i1} ) = (d/dv_i)( −V^{-1}(v_i) ) = Δ(v_i) if j_i = 1,  (d/dv_i)( (−1)^{2} u_{i2} ) = (d/dv_i)( v_i + V^{-1}(v_i) ) = 1 − Δ(v_i) if j_i = 2.

2. For the point (u_1, ..., u_d) ∈ [0, 1]^d we consider the set of events A_i(u_i) defined by

A_i(u_i) = {U_i ⩽ u_i} if u_i ⩽ δ,  A_i(u_i) = {U_i > u_i} if u_i > δ.

The probability P(A_1(u_1), ..., A_d(u_d)) is the probability of an orthant defined by the point (u_1, ..., u_d) and the copula density at this point is given by

c_U(u_1, ..., u_d) = (−1)^{Σ_{i=1}^{d} I{u_i > δ}} (∂^d / ∂u_1 ... ∂u_d) P( ∩_{i=1}^{d} A_i(u_i) ).

The event A_i(u_i) can be written

A_i(u_i) = {V_i ⩾ V(u_i), W_i ⩽ Δ(V_i)} if u_i ⩽ δ,  A_i(u_i) = {V_i > V(u_i), W_i > Δ(V_i)} if u_i > δ,

and hence we can use Theorem 2 to write

P( ∩_{i=1}^{d} A_i(u_i) ) = ∫_{V(u_1)}^{1} ... ∫_{V(u_d)}^{1} c_V(v_1, ..., v_d) ∏_{i=1}^{d} Δ(v_i)^{I{u_i ⩽ δ}} (1 − Δ(v_i))^{I{u_i > δ}} dv_1 ... dv_d.

The derivative is given by

(∂^d / ∂u_1 ... ∂u_d) P( ∩_{i=1}^{d} A_i(u_i) ) = (−1)^d c_V(V(u_1), ..., V(u_d)) ∏_{i=1}^{d} p(u_i)^{I{u_i ⩽ δ}} (1 − p(u_i))^{I{u_i > δ}} V′(u_i)

where p(u_i) = Δ(V(u_i)), and hence we obtain

c_U(u_1, ..., u_d) = c_V(V(u_1), ..., V(u_d)) ∏_{i=1}^{d} (−p(u_i))^{I{u_i ⩽ δ}} (1 − p(u_i))^{I{u_i > δ}} V′(u_i).

It remains to verify that each of the terms in the product is identically equal to 1. For u_i ⩽ δ we have −p(u_i) = −Δ(V(u_i)) = 1/V′(u_i). For u_i > δ we need an expression for the derivative of the right branch equation. Since V(u_i) = V(u_i − V(u_i)) we obtain

V′(u_i) = V′(u_i − V(u_i)) (1 − V′(u_i)) = V′(u_i*) (1 − V′(u_i))  ⟹  V′(u_i) = V′(u_i*) / (1 + V′(u_i*)),

implying that

1 − p(u_i) = 1 − Δ(V(u_i)) = 1 − Δ(V(u_i*)) = 1 + 1/V′(u_i*) = (1 + V′(u_i*)) / V′(u_i*) = 1 / V′(u_i).

A.8 Proof of Proposition 5
Let V_t = V(U_t) and Z_t = Φ^{-1}(V_t) as usual. The process (Z_t) is an ARMA process with acf ρ(k) and hence (Z_{t_1}, ..., Z_{t_k}) are jointly standard normally distributed with correlation matrix P(t_1, ..., t_k). This implies that the joint distribution function of (V_{t_1}, ..., V_{t_k}) is the Gaussian copula with density c^Ga_{P(t_1,...,t_k)} and hence by Part 2 of Theorem 3 the joint distribution function of (U_{t_1}, ..., U_{t_k}) is the copula with density c^Ga_{P(t_1,...,t_k)}(V(u_1), ..., V(u_k)).

A.9 Proof of Proposition 6
We split the integral in (18) into four parts. First observe that by making the substitutions v_1 = V(u_1) = 1 − u_1/δ and v_2 = V(u_2) = 1 − u_2/δ on [0, δ] × [0, δ] we get

∫₀^δ ∫₀^δ u_1 u_2 c^Ga_{ρ(k)}(V(u_1), V(u_2)) du_1 du_2 = δ⁴ ∫₀¹ ∫₀¹ (1 − v_1)(1 − v_2) c^Ga_{ρ(k)}(v_1, v_2) dv_1 dv_2 = δ⁴ E((1 − V_t)(1 − V_{t+k})) = δ⁴ (1 − E(V_t) − E(V_{t+k}) + E(V_t V_{t+k})) = δ⁴ E(V_t V_{t+k})

where (V_t, V_{t+k}) has joint distribution given by the Gaussian copula C^Ga_{ρ(k)}. Similarly, by making the substitutions v_1 = V(u_1) = 1 − u_1/δ and v_2 = V(u_2) = (u_2 − δ)/(1 − δ) on [0, δ] × [δ, 1] we get

∫₀^δ ∫_δ^1 u_1 u_2 c^Ga_{ρ(k)}(V(u_1), V(u_2)) du_2 du_1 = δ²(1 − δ) ∫₀¹ ∫₀¹ (1 − v_1)(δ + (1 − δ)v_2) c^Ga_{ρ(k)}(v_1, v_2) dv_1 dv_2 = δ³(1 − δ) E(1 − V_t) + δ²(1 − δ)² E((1 − V_t)V_{t+k}) = δ²(1 − δ)/2 − δ²(1 − δ)² E(V_t V_{t+k})

and the same value for the integral over [δ, 1] × [0, δ]. Finally, making the substitutions v_1 = V(u_1) = (u_1 − δ)/(1 − δ) and v_2 = V(u_2) = (u_2 − δ)/(1 − δ) on [δ, 1] × [δ, 1] we get

∫_δ^1 ∫_δ^1 u_1 u_2 c^Ga_{ρ(k)}(V(u_1), V(u_2)) du_1 du_2 = (1 − δ)² ∫₀¹ ∫₀¹ (δ + (1 − δ)v_1)(δ + (1 − δ)v_2) c^Ga_{ρ(k)}(v_1, v_2) dv_1 dv_2 = (1 − δ)² ( δ² + δ(1 − δ)E(V_t) + δ(1 − δ)E(V_{t+k}) + (1 − δ)² E(V_t V_{t+k}) ) = δ²(1 − δ)² + δ(1 − δ)³ + (1 − δ)⁴ E(V_t V_{t+k}).

Collecting all of these terms together yields

∫₀¹ ∫₀¹ u_1 u_2 c^Ga_{ρ(k)}(V(u_1), V(u_2)) du_1 du_2 = δ(1 − δ) + (2δ − 1)² E(V_t V_{t+k})

and since ρ_S(Z_t, Z_{t+k}) = 12 E(V_t V_{t+k}) − 3 it follows that

ρ(U_t, U_{t+k}) = 12 E(U_t U_{t+k}) − 3 = 12 ∫₀¹ ∫₀¹ u_1 u_2 c^Ga_{ρ(k)}(V(u_1), V(u_2)) du_1 du_2 − 3 = 12δ(1 − δ) + 12(2δ − 1)² E(V_t V_{t+k}) − 3 = 12δ(1 − δ) + (2δ − 1)² (ρ_S(Z_t, Z_{t+k}) + 3) − 3 = (2δ − 1)² ρ_S(Z_t, Z_{t+k}),

using the identity 12δ(1 − δ) + 3(2δ − 1)² = 3. The value of Spearman's rho ρ_S(Z_t, Z_{t+k}) for the bivariate Gaussian distribution is well known; see for example McNeil et al. (2015).

A.10 Proof of Proposition 7
The conditional density satisfies
\[
f_{U_t \mid U_{t-1}, \ldots, U_1}(u \mid u_{t-1}, \ldots, u_1) = \frac{c_{U_1, \ldots, U_t}(u_1, \ldots, u_{t-1}, u)}{c_{U_1, \ldots, U_{t-1}}(u_1, \ldots, u_{t-1})} = \frac{c^{\text{Ga}}_{P(1,\ldots,t)}\big(\mathcal{V}(u_1), \ldots, \mathcal{V}(u_{t-1}), \mathcal{V}(u)\big)}{c^{\text{Ga}}_{P(1,\ldots,t-1)}\big(\mathcal{V}(u_1), \ldots, \mathcal{V}(u_{t-1})\big)}.
\]
The Gaussian copula density may be written
\[
c^{\text{Ga}}_{P}(v_1, \ldots, v_d) = \frac{f_{\mathbf{Z}}\big(\Phi^{-1}(v_1), \ldots, \Phi^{-1}(v_d)\big)}{\prod_{i=1}^d \varphi\big(\Phi^{-1}(v_i)\big)},
\]
where $\mathbf{Z}$ is a multivariate Gaussian vector with standard normal margins and correlation matrix $P$. Hence it follows that we can write
\begin{align*}
f_{U_t \mid U_{t-1}, \ldots, U_1}(u \mid u_{t-1}, \ldots, u_1)
&= \frac{f_{Z_1, \ldots, Z_t}\big(\Phi^{-1}(\mathcal{V}(u_1)), \ldots, \Phi^{-1}(\mathcal{V}(u_{t-1})), \Phi^{-1}(\mathcal{V}(u))\big)}{f_{Z_1, \ldots, Z_{t-1}}\big(\Phi^{-1}(\mathcal{V}(u_1)), \ldots, \Phi^{-1}(\mathcal{V}(u_{t-1}))\big)\, \varphi\big(\Phi^{-1}(\mathcal{V}(u))\big)} \\
&= \frac{f_{Z_t \mid Z_{t-1}, \ldots, Z_1}\big(\Phi^{-1}(\mathcal{V}(u)) \mid \Phi^{-1}(\mathcal{V}(u_{t-1})), \ldots, \Phi^{-1}(\mathcal{V}(u_1))\big)}{\varphi\big(\Phi^{-1}(\mathcal{V}(u))\big)},
\end{align*}
where $f_{Z_t \mid Z_{t-1}, \ldots, Z_1}$ is the conditional density of the ARMA process, from which (20) follows easily.

References
Aas, K., C. Czado, A. Frigessi, and H. Bakken, 2009, Pair-copula constructions of multiple dependence, Insurance: Mathematics and Economics 44, 182–198.
Andersen, T.G., 1994, Stochastic autoregressive volatility: a framework for volatility modeling, Mathematical Finance 4, 75–102.
Andersen, T.G., and L. Benzoni, 2009, Stochastic volatility, in R.A. Meyers, ed., Complex Systems in Finance and Econometrics (Springer, New York).
Bedford, T., and R. M. Cooke, 2001, Probability density decomposition for conditionally independent random variables modeled by vines, Annals of Mathematics and Artificial Intelligence 32, 245–268.
Bedford, T., and R. M. Cooke, 2002, Vines – a new graphical model for dependent random variables, Annals of Statistics 30, 1031–1068.
Bladt, M., and A.J. McNeil, 2020, Time series copula models using d-vines and v-transforms: an alternative to GARCH modelling, arXiv:2006.11088.
Bollerslev, T., 1986, Generalized autoregressive conditional heteroskedasticity, Journal of Econometrics 31, 307–327.
Bollerslev, T., R. F. Engle, and D. B. Nelson, 1994, ARCH models, in R. F. Engle, and D. L. McFadden, eds., Handbook of Econometrics, volume 4, 2959–3038 (North-Holland, Amsterdam).
Campbell, J. Y., A. W. Lo, and A. C. MacKinlay, 1997, The Econometrics of Financial Markets (Princeton University Press, Princeton).
Chen, X., and Y. Fan, 2006, Estimation of copula-based semiparametric time series models,
Journal of Econometrics 130, 307–335.
Chen, X., W. B. Wu, and Y. Yi, 2009, Efficient estimation of copula-based semiparametric Markov models, Annals of Statistics 37, 4214–4253.
Cont, R., 2001, Empirical properties of asset returns: stylized facts and statistical issues, Quantitative Finance
1, 223–236.
Creal, D., S.J. Koopman, and A. Lucas, 2013, Generalized autoregressive score models with applications, Journal of Applied Econometrics 28, 777–795.
Ding, Z., C. W. Granger, and R. F. Engle, 1993, A long memory property of stock market returns and a new model, Journal of Empirical Finance 1, 83–106.
Domma, F., S. Giordano, and P. F. Perri, 2009, Statistical modeling of temporal dependence in financial data via a copula function, Communications in Statistics: Simulation and Computation 38, 703–728.
Engle, R. F., 1982, Autoregressive conditional heteroskedasticity with estimates of the variance of United Kingdom inflation, Econometrica 50, 987–1008.
Fan, Y., and A.J. Patton, 2014, Copulas in econometrics, Annual Review of Economics 6, 179–200.
Fernández, C., and M.F.J. Steel, 1998, On Bayesian modeling of fat tails and skewness, Journal of the American Statistical Association 93, 359–371.
Genest, C., K. Ghoudi, and L. Rivest, 1995, A semi-parametric estimation procedure of dependence parameters in multivariate families of distributions, Biometrika 82, 543–552.
Glosten, L. R., R. Jagannathan, and D. E. Runkle, 1993, On the relation between the expected value and the volatility of the nominal excess return on stocks, The Journal of Finance 48, 1779–1801.
Joe, H., 2015,
Dependence Modeling with Copulas (CRC Press, Boca Raton).
Joe, H., 1996, Families of m-variate distributions with given margins and m(m−1)/2 bivariate dependence parameters, in L. Rüschendorf, B. Schweizer, and M. D. Taylor, eds., Distributions with fixed marginals and related topics, volume 28 of Lecture Notes–Monograph Series, 120–141 (Institute of Mathematical Statistics, Hayward, CA).
Loaiza-Maya, R., M.S. Smith, and W. Maneesoonthorn, 2018, Time series copulas for heteroskedastic data, Journal of Applied Econometrics 33, 332–354.
McNeil, A. J., R. Frey, and P. Embrechts, 2015, Quantitative Risk Management: Concepts, Techniques and Tools, second edition (Princeton University Press, Princeton).
Mikosch, T., and C. Stărică, 2000, Limit theory for the sample autocorrelations and extremes of a GARCH(1,1) process, The Annals of Statistics
28, 1427–1451.
Patton, A.J., 2012, A review of copula models for economic time series, Journal of Multivariate Analysis 110, 4–18.
Rémillard, B., 2013, Statistical Methods for Financial Engineering (Chapman & Hall).
Shephard, N., 1996, Statistical aspects of ARCH and stochastic volatility, in D. R. Cox, D. V. Hinkley, and O. E. Barndorff-Nielsen, eds., Time Series Models in Econometrics, Finance and Other Fields, 1–55 (Chapman & Hall, London).
Smith, M., A. Min, C. Almeida, and C. Czado, 2010, Modeling longitudinal data using a pair-copula decomposition of serial dependence, Journal of the American Statistical Association 105, 1467–1479.
Taylor, S. J., 1994, Modeling stochastic volatility: a review and comparative study, Mathematical Finance 4, 183–204.
Terasaka, T., and Y. Hosoya, 2007, A modified Box-Cox transformation in the multivariate ARMA model, Journal of the Japan Statistical Society 37, 1–28.