On time scaling of semivariance in a jump-diffusion process
Rodrigue Oeuvray ∗ Pascal Junod † September 17, 2018
Abstract
The aim of this paper is to examine the time scaling of the semivariance when returns are modeled by various types of jump-diffusion processes, including stochastic volatility models with jumps in returns and in volatility. In particular, we derive an exact formula for the semivariance when the volatility is kept constant, explaining how it should be scaled when considering a lower frequency. We also provide and justify the use of a generalization of the Ball-Torous approximation of a jump-diffusion process, this new model appearing to deliver a more accurate estimation of the downside risk. We use Markov Chain Monte Carlo (MCMC) methods to fit our stochastic volatility model. For the tests, we apply our methodology to a highly skewed set of returns based on the Barclays US High Yield Index, where we compare different time scalings for the semivariance. Our work shows that the square root of the time horizon seems to be a poor approximation in the context of semivariance and that our methodology based on jump-diffusion processes gives much better results.
Keywords: time-scaling of risk, semivariance, jump diffusion, stochastic volatility, MCMC methods
Modern portfolio theory has shown that investing in asset classes promising higher returns has always been linked with a higher variability (also called volatility) of those returns, hence resulting in increased risks for the investor. It is therefore one of the main tasks of financial engineering to accurately estimate the variability of the return of a given asset (or portfolio of assets) and to take this figure into account in various tasks, including risk management, finding optimal portfolio strategies for a given risk-aversion level, or derivatives pricing. For instance, properly estimating volatilities is essential for applying the Black and Scholes (1973) option pricing method; another prominent example is the estimation of return distribution quantiles for computing value-at-risk (VaR) figures, a method which is typically recommended by international banking regulation authorities, such as the Basel Committee on Banking Supervision. This approach is also widely applied in practice in the internal risk management systems of many financial institutions worldwide.

In practice, sufficient statistical information about the past behavior of an asset's return is often not available, hence one is commonly forced to use the so-called square-root-of-time rule. Essentially, this rule transforms high-frequency risk estimates (for instance, gathered with a 1-day period over several years) into a lower frequency T (like a 1-year period) by multiplying the volatility by a factor of √T. As an illustration, the Basel regulations recommend to compute the VaR for the ten-day regulatory requirement by estimating a 1-day VaR and by multiplying this value by √10, where the VaR is the value that solves the equation

ε = ∫_{−∞}^{−VaR} f̂(r) dr

given the density f̂(r) of the bank's estimated return probability distribution and a confidence level ε, fixed for instance to 1%.

It is well known that scaling volatilities with the square root of time is only accurate under a certain number of assumptions that are typically not observed in practice: according to Daníelsson and Zigrand (2006), returns need to be homoscedastic and conditionally serially uncorrelated at all leads, an assumption slightly weaker than that of independently and identically distributed (iid) returns. Daníelsson and Zigrand show furthermore that for the square-root-of-time rule to be correct for all quantiles and horizons, the zero-mean returns must not only be iid, but also normally distributed.

In this paper, we are interested in studying the effects of applying the square-root-of-time rule on the semivariance of a continuous jump-diffusion process. The semivariance being a downside risk measure, we first quickly recall its history and properties in the following section.

Downside risk measures appeared in the context of portfolio theory in the 1950s, with the development by Markowitz (1952) and Roy (1952) of decision-making tools helping to manage risky investment portfolios. Markowitz (1952) showed how to exploit the averages, variances and covariances of the return distributions of assets contained in a portfolio in order to compute an efficient frontier on which every portfolio either maximizes the expected return for a given variance (i.e., risk level), or minimizes the variance for a given expected return.

∗ Pictet Asset Management SA, route des Acacias 60, CH-1211 Geneva 73, Switzerland ([email protected])
† University of Applied Sciences and Arts Western Switzerland / HEIG-VD, route de Cheseaux 1, CH-1401 Yverdon-les-Bains, Switzerland ([email protected])
In the scenario of Markowitz, a utility function, defining the investor's sensitivity to changing wealth and risk, is used to pick the proper portfolio on the efficient frontier. On his side, Roy (1952) was willing to derive a practical method allowing one to determine the best risk-return trade-off; as he was not convinced that it is feasible in practice to model the risk sensitivity of a human being with a utility function, he chose to assume that an investor would prefer the investment with the smallest probability of going below a disaster level, or a target return. Recognizing the wisdom of this claim, Markowitz (1959) figured out two very important points, namely that only the downside risk is relevant for an investor, and that return distributions might be skewed, i.e., not symmetrically distributed, in practice. In that spirit, Markowitz suggested to use the following variability measure, that he called a semivariance, as it only takes into account a subset of the return distribution:

(1.1)  ∫_{−∞}^{τ} (τ − r)² f(r) dr

where f(r) denotes the density of the return probability distribution, R denotes a random variable distributed according to f(r), and τ is a target return level. If τ is equal to µ_R = ∫ r f(r) dr, then (1.1) is called the below-mean semivariance of R, while if τ is arbitrary, (1.1) is called the below-target semivariance of R, where τ is defined to be the target return. In other words, only the deviations to the left of the return distribution average, or of a fixed return target, are accounted for in the computation of the variability. Similarly, the square root of a semivariance is called a semideviation, by analogy with the standard deviation.
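As a concrete illustration of (1.1) (our own sketch; the function names and the use of Python are ours, not the paper's), the below-target semivariance can be estimated from a sample of returns by averaging the squared shortfalls below the target τ over all observations:

```python
import numpy as np

def semivariance(returns, tau=0.0):
    """Sample analogue of (1.1): mean of (tau - r)^2 over the whole sample,
    where the shortfall (tau - r) is floored at 0 for r >= tau."""
    r = np.asarray(returns, dtype=float)
    shortfall = np.clip(tau - r, 0.0, None)
    return float(np.mean(shortfall ** 2))

def semideviation(returns, tau=0.0):
    """Square root of the semivariance, by analogy with the standard deviation."""
    return float(np.sqrt(semivariance(returns, tau)))

# For a symmetric distribution, the below-mean semivariance is half the
# variance; a large standard normal sample illustrates this.
rng = np.random.default_rng(0)
sample = rng.standard_normal(100_000)
print(semivariance(sample, tau=sample.mean()))  # close to 0.5
```

With τ equal to the sample mean this estimates the below-mean semivariance; with a fixed τ, the below-target semivariance.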
Note that for a symmetrical, i.e., non-skewed return distribution, the variance of a random variable R is equal to twice its below-mean semivariance.

The Sharpe (1966) ratio is a measure of the risk-adjusted return of an asset, a portfolio or an investment strategy, that quantifies the excess return per unit of deviation; it is defined as

(1.2)  E[R_A − R_B] / √(Var[R_A − R_B]),

where R_A and R_B are random variables modeling the returns of assets A and B, respectively. A prominent variant of the Sharpe ratio, called the Sortino ratio (see Sortino and Van Der Meer (1991)), relies on the semideviation instead of the standard deviation of the return distribution. It is well known and easily understood that the Sharpe and Sortino ratios tend to give very different results for highly skewed return distributions.

Finally, we would like to note that the concept of semivariance has been generalized, resulting in the development of lower partial moments by Bawa (1975) and Fishburn (1977). Essentially, the square is replaced by an arbitrary power a that can freely vary:

(1.3)  ∫_{−∞}^{τ} (τ − r)^a f(r) dr

Varying a might help in modeling the fact that an investor is more (through larger values of a) or less (through smaller values of a) sensitive to risk. In this paper, we have chosen to stick to a = 2 for simplicity reasons. In the following, we recall the concepts of jump-diffusion models.

Jump-diffusion models are continuous-time stochastic processes introduced in quantitative finance by Merton (1976), extending the celebrated work of Black and Scholes (1973) on option pricing. These models are a mixture of a standard diffusion process and a jump process. They are typically used to reproduce stylized facts observed in asset price dynamics, such as mean reversion and jumps.
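The lower partial moments (1.3) and the Sortino ratio admit equally direct sample versions; again this is our own illustrative sketch (names and data are made up, not taken from the paper):

```python
import numpy as np

def lower_partial_moment(returns, tau=0.0, a=2):
    """Sample analogue of (1.3); a = 2 recovers the below-target semivariance."""
    r = np.asarray(returns, dtype=float)
    return float(np.mean(np.clip(tau - r, 0.0, None) ** a))

def sharpe_ratio(excess_returns):
    """Sharpe (1966) ratio: mean excess return per unit of standard deviation."""
    r = np.asarray(excess_returns, dtype=float)
    return float(r.mean() / r.std())

def sortino_ratio(excess_returns, tau=0.0):
    """Sortino ratio: mean excess return per unit of below-target semideviation."""
    r = np.asarray(excess_returns, dtype=float)
    return float(r.mean() / np.sqrt(lower_partial_moment(r, tau, a=2)))

# A roughly symmetric series versus the same series with rare large losses.
rng = np.random.default_rng(1)
base = 0.003 + 0.01 * rng.standard_normal(50_000)
crash = base - 0.05 * rng.binomial(1, 0.02, 50_000)
print(sharpe_ratio(base), sortino_ratio(base))
print(sharpe_ratio(crash), sortino_ratio(crash))
```

On the crash-prone series, the Sortino ratio typically deteriorates faster than the Sharpe ratio, reflecting its exclusive focus on downside deviations; larger exponents a penalize large shortfalls even more.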
Indeed, modeling an asset price as a standard Brownian process implies that large jumps over a short period are very unlikely, as they sometimes occur in real life, unless unrealistically large volatility values are used. Hence, introducing the concept of jumps allows one to take into account those brutal price variations, which is especially useful when considering risk management, for instance. Various specifications have been proposed in the literature and we refer the reader to Cont and Tankov (2004) for an extensive review. In what follows, we first consider (in §3) the standard jump-diffusion model with time-invariant coefficients, constant volatility and Gaussian-distributed jumps. Later, in §
5, we will also consider more elaborate stochastic processes, involving random jumps in returns and in volatility.

A basic jump-diffusion stochastic process is a mixture of a standard Brownian process with constant drift µ and volatility σ and of a (statistically independent) compound Poisson process with parameter λ and whose jump size is distributed according to an independent normal law N(µ_Q, σ_Q²). More precisely, this model can be expressed as the following stochastic differential equation:

(1.4)  dX(t) = X(t) (µ dt + σ dW(t) + J(t) dP(t)),

where X(t) denotes the process that describes the price of a financial asset, with Pr[X(0) > 0] = 1, µ ∈ R is the process drift coefficient, σ > 0 is the volatility, W(t) is a standard Wiener process, P(t) is a Poisson process with constant intensity λ > 0, and J(t) is the process generating the jump size, which together with P(t) forms a compound Poisson process. The solution of the stochastic differential equation (1.4) is given by

(1.5)  X(t) = X(0) exp( (µ − σ²/2) t + σ W(t) + Σ_{k=1}^{P(t)} Q_k ),

where Q_k is implicitly defined according to J(T_k) = e^{Q_k} − 1, and T_k is the time at which the k-th jump of the Poisson process occurs. If P(t) = 0, the sum is zero by convention. We assume that the Q_k form an independent and identically normally distributed sequence with mean µ_Q and variance σ_Q².

The log-return of X(t) over a t-period is defined as Y_t = log X(t) − log X(0) and, from (1.5), its dynamic is given by

(1.6)  Y_t = (µ − σ²/2) t + σ W(t) + Σ_{k=1}^{P(t)} Q_k.

The distribution of Y_t is an infinite mixture of Gaussian distributions N( (µ − σ²/2) t + k µ_Q, σ² t + k σ_Q² ) and has a density function given by

(1.7)  f_{Y_t}(y) = Σ_{k=0}^{+∞} [ e^{−λt} (λt)^k / k! ] · (1/√(2π (σ² t + k σ_Q²))) · exp( − (y − ((µ − σ²/2) t + k µ_Q))² / (2 (σ² t + k σ_Q²)) ).

Our contributions in this paper can be summarized as follows: first of all, we derive an exact formula for the semivariance of a standard jump-diffusion process with constant volatility, explaining how it should be scaled when considering a lower frequency. Second, we propose and justify a generalization of the Ball-Torous approximation of a jump-diffusion process, in which the jump intensity λ should only be smaller than a (large) upper bound; we also show that this assumption is not a strong limitation, since for larger jump intensities the process is well approximated by a pure diffusion. Finally, we apply our methodology to a highly skewed set of returns based on the Barclays US High Yield Index and compare different time scalings for the semivariance.

We derive in this section an explicit formula for the semivariance of a standard jump-diffusion model with time-invariant coefficients, constant volatility and Gaussian-distributed jumps. To the best of our knowledge, this is the first time that an explicit formula is provided for computing the semivariance. In the following, let us denote respectively by φ(x) and Φ(x) the probability density function and the cumulative distribution function of a standard normal distribution N(0, 1) with mean 0 and variance 1, i.e.,

φ(x) = (1/√(2π)) e^{−x²/2}  and  Φ(x) = ∫_{−∞}^{x} φ(t) dt  for x ∈ R.

Theorem 2.1.
The semivariance of the density (1.7) with respect to a target return level D is given by

(2.1)  Σ_{k=0}^{+∞} [ e^{−λt} (λt)^k / k! ] · ( (D − µ_k)² Φ(D_k) + σ_k (D − µ_k) φ(D_k) + σ_k² Φ(D_k) ),

where µ_k = (µ − σ²/2) t + k µ_Q, σ_k² = σ² t + k σ_Q², D_k = (D − µ_k)/σ_k and k = 0, ..., +∞.

The proof is given in Appendix A. For a pure diffusion process without jumps, formula (2.1) simplifies as stated in Corollary 2.1 below.
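Before specializing to the pure-diffusion case, formula (2.1) can be checked numerically against a Monte Carlo simulation of the log-return (1.6). The sketch below is ours (hypothetical parameter values, truncated sum), not code from the paper:

```python
import math
import numpy as np

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def jd_semivariance(D, t, mu, sigma, lam, mu_q, sigma_q, kmax=200):
    """Formula (2.1), truncated at kmax terms."""
    w = math.exp(-lam * t)  # Poisson weight e^{-lt}(lt)^k/k!, updated iteratively
    total = 0.0
    for k in range(kmax + 1):
        if k > 0:
            w *= lam * t / k
        mu_k = (mu - sigma**2 / 2) * t + k * mu_q
        sig_k = math.sqrt(sigma**2 * t + k * sigma_q**2)
        d_k = (D - mu_k) / sig_k
        total += w * ((D - mu_k)**2 * norm_cdf(d_k)
                      + sig_k * (D - mu_k) * norm_pdf(d_k)
                      + sig_k**2 * norm_cdf(d_k))
    return total

# Monte Carlo on Y_t from (1.6): a normal diffusion term plus a compound
# Poisson sum of normal jump sizes (conditional on P(t)=p, the jump sum is
# N(p*mu_q, p*sigma_q^2)). Parameter values are arbitrary.
D, t, mu, sigma, lam, mu_q, sigma_q = 0.0, 1.0, 0.05, 0.1, 3.0, -0.02, 0.03
rng = np.random.default_rng(2)
n = 400_000
jumps = rng.poisson(lam * t, n)
y = ((mu - sigma**2 / 2) * t + sigma * math.sqrt(t) * rng.standard_normal(n)
     + mu_q * jumps + sigma_q * np.sqrt(jumps) * rng.standard_normal(n))
mc = float(np.mean(np.clip(D - y, 0.0, None) ** 2))
print(jd_semivariance(D, t, mu, sigma, lam, mu_q, sigma_q), mc)
```

The two printed values agree up to Monte Carlo noise, which provides an independent sanity check of the closed-form expression.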
Corollary 2.1. The semivariance of a pure diffusion process with drift µ and volatility σ is equal to

(2.2)  (D − µ̄)² Φ(D̃) + σ̄ (D − µ̄) φ(D̃) + σ̄² Φ(D̃),

where µ̄ = (µ − σ²/2) t, σ̄² = σ² t and D̃ = (D − µ̄)/σ̄.

The proof is a direct consequence of Proposition A.1 given in Appendix A and of the fact that (1.6) can be rewritten as

(2.3)  Y_t = (µ − σ²/2) t + σ W(t)

when we consider a pure diffusion process.

The task of fitting a jump-diffusion model to real-world data is not as easy as it appears, and this fact was recognized early, see for instance Beckers (1981) or Honore (1998). Essentially, the reason lies in the fact that the likelihood function of an infinite mixture of distributions can be unbounded, hence resulting in inconsistencies. However, by making some assumptions about the parameters of this model, it is possible to accurately estimate them. In the following, we present the approach of Ball and Torous (1983, 1985). Therein, the authors present a simplified version of a jump-diffusion process by assuming that, if the jump occurrence rate is small, then during a sufficiently short time period only a single jump can occur. Accordingly, for small values of λΔt, ΔP(t) can be approximated by a Bernoulli distribution of parameter λΔt, and the density of ΔY(t) can then be written as

(3.1)  f_{ΔY}(y) = (1 − λΔt) f_{ΔD}(y) + λΔt (f_{ΔD} ⋆ f_Q)(y),

where f_{ΔD} denotes the probability density function of the diffusion part (including the drift), f_Q the probability density function of the jump size, and ⋆ denotes the convolution operator. As mentioned above, f_{ΔD} follows a normal law with mean (µ − σ²/2)Δt and variance σ²Δt. If f_Q is distributed according to a normal law statistically independent of the diffusion part, then the convolution f_{ΔD} ⋆ f_Q of f_{ΔD} and f_Q is normal with mean (µ − σ²/2)Δt + µ_Q and variance σ²Δt + σ_Q².

For a sequence of observed log-returns Δy_1, . . . , Δy_T, the log-likelihood log L of the model parameters θ = (λ, µ, σ, µ_Q, σ_Q)^T is obtained in a straightforward manner from (3.1) as

(3.2)  log L(θ | Δy_1, ..., Δy_T) = Σ_{t=1}^{T} log f_{ΔY}(Δy_t | θ),

and the maximum likelihood estimator θ̂ is obtained by maximizing (3.2). Kiefer (1978) has shown that there may exist several local optima in such a mixture setting, a fact that we have also observed in the experimental setup that is the subject of the next section. While fitting the Ball-Torous model to real-world data (see §4) under the assumption λΔt ≪ 1 with Δt = 1/252 (i.e., Δt representing one day in a 252-day trading year), the best obtained estimates for λ were large, such that λΔt was often nearer to 1 than to 0, and this obviously questions the validity in practice of the assumption made by Ball and Torous, at least in our experimental setup. In the following, we propose a new methodology revolving around relaxing this assumption to a milder one, namely that λΔt < 1, and we justify its use.
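The Ball-Torous density (3.1) and log-likelihood (3.2) translate directly into code. The sketch below is our own (the parameter ordering, names and sample data are ours); maximizing log_likelihood over θ = (λ, µ, σ, µ_Q, σ_Q) yields the ML estimator:

```python
import numpy as np

def ball_torous_density(y, theta, dt=1/252):
    """Approximate density (3.1): a two-component normal mixture in which at
    most one jump, occurring with probability lam*dt, happens per dt period."""
    lam, mu, sigma, mu_q, sigma_q = theta
    m_d = (mu - sigma**2 / 2) * dt       # mean of the diffusion part
    v_d = sigma**2 * dt                  # variance of the diffusion part
    def npdf(x, m, v):
        return np.exp(-(x - m)**2 / (2 * v)) / np.sqrt(2 * np.pi * v)
    return ((1 - lam * dt) * npdf(y, m_d, v_d)
            + lam * dt * npdf(y, m_d + mu_q, v_d + sigma_q**2))

def log_likelihood(theta, returns, dt=1/252):
    """Log-likelihood (3.2) of observed log-returns under (3.1)."""
    dens = ball_torous_density(np.asarray(returns, dtype=float), theta, dt)
    return float(np.sum(np.log(dens)))

theta = (50.0, 0.05, 0.10, -0.01, 0.02)   # hypothetical parameters, lam*dt < 1
print(log_likelihood(theta, [0.001, -0.004, 0.002]))
```

As noted above, the likelihood surface of such mixtures can have several local optima, which is what motivates the global optimizer used later in the paper.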
The methodology we describe in this section can be interpreted as an extension of the work of Ball and Torous (1983). In this approach, the authors make the assumption that λΔt is small, or in other words, that the expected number of jumps per Δt period is very small. However, in practice, this assumption might not always be satisfied, as we observed on our data. We propose to relax this assumption and to replace it by the milder one λΔt < 1. For Δt = 1/252, this amounts to requiring λ < 252, hence putting only a weak restriction on λ. To summarize, with our methodology, we are able to fit any jump-diffusion process without having any strong restriction on λ, which is a major improvement in comparison to the work of Ball and Torous (1983).

Obviously, the reason why Ball and Torous decided to limit the jump rate to a small value was to capture the occurrence of brutal and rare events. As a matter of fact, it is clearly more desirable from a practical point of view to be able to model rare and large downside market movements than frequent and small ones. This assumption implies small λΔt values, and consequently, it means that the occurrence of more than a single jump per Δt period is sufficiently unlikely that it can be neglected. However, we have experimentally figured out that the semivariance computed on our relaxed model, i.e., allowing also frequent and small jumps, may result in significantly higher values than on the model assuming that λΔt is small (see Figure 4.6). In other words, we observed that the Ball and Torous model seems to underestimate the downside risk, compared to our relaxed model, at least on our data set. A direct consequence of not limiting the jump occurrence rate λΔt to a small value is that the probability of having more than one jump during a single day may become non-negligible when λ is large enough. By allowing more than a single jump per Δt period, one can also take into account the fact that several pieces of bad news might influence the market during a trading day, i.e., during a Δt period. Obviously, if Δt becomes sufficiently small, maybe as small as a single second, it would be more difficult to justify in practice several jumps during a time interval. However, when discretizing the process in periods as large as a single day, we are convinced that allowing more than a single jump better reflects reality. Consequently, in the following, we will assume that up to m jumps can occur during a Δt period, where m ∈ N is a finite value.
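The truncated jump-count distribution just introduced, and the tail bound derived below (Table 3.1), can be verified numerically; this sketch is our own:

```python
import math

def truncated_poisson_weights(lam_dt, m):
    """Truncated weights (3.3): Poisson(lam_dt) probabilities for k = 0, ..., m-1,
    with the remaining tail mass folded into p_m."""
    p = [math.exp(-lam_dt) * lam_dt**k / math.factorial(k) for k in range(m)]
    p.append(1.0 - sum(p))
    return p

def tail_bound(m):
    """Pr[more than m jumps] for a Poisson law with parameter 1, i.e. the
    supremum over lam*dt < 1 of the truncation error (3.5)."""
    return 1.0 - sum(math.exp(-1.0) / math.factorial(k) for k in range(m + 1))

for m in range(1, 6):
    print(m, tail_bound(m))
```

The printed values reproduce the upper bounds listed in Table 3.1, and the weights (3.3) always sum to one by construction.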
The probability distribution of the possible number k of jumps that may occur during a time interval Δt is then the following:

(3.3)  p_k = e^{−λΔt} (λΔt)^k / k!  for k = 0, ..., m − 1,  and  p_m = 1 − Σ_{k=0}^{m−1} p_k.

The resulting probability density, which we denote by f̃_{ΔY}(y) and which can be compared to f_{ΔY}(y) in (3.1), can be expressed as

f̃_{ΔY}(y) = p_0 f_{ΔD}(y) + Σ_{k=1}^{m} p_k ( f_{ΔD} ⋆ f_Q^{(k)} )(y),

where f_Q^{(k)} denotes the convolution of k density functions f_Q.

In what follows, we look at the approximation error made by replacing a Poisson law by a truncated one. For a random variable following a Poisson law of parameter λ, the probability of obtaining a value strictly larger than m is given by the following function f(λ):

(3.4)  f(λ) = 1 − Σ_{k=0}^{m} p_k  with  p_k = e^{−λ} λ^k / k!  and  k = 0, . . . , m.

It is easy to see that f(λ) is an increasing function of λ. Indeed, the derivative of f(λ) is given by

f′(λ) = − Σ_{k=0}^{m} dp_k/dλ.

Then, we have dp_k/dλ = −p_k + p_{k−1} for k = 0, . . . , m, since

dp_k/dλ = d/dλ [ e^{−λ} λ^k / k! ] = − e^{−λ} λ^k / k! + k e^{−λ} λ^{k−1} / k! = − p_k + p_{k−1},

where p_{−1} is set to 0, as dp_0/dλ = d/dλ e^{−λ} = − e^{−λ} = − p_0. Following this observation and telescoping the sum, the derivative f′(λ) can be rewritten as f′(λ) = − Σ_{k=0}^{m} dp_k/dλ = p_m > 0. Now, if we substitute λ by λΔt and assume λΔt < 1, then we can observe that the supremum in (3.4) is obtained for λΔt = 1. We conclude that, assuming λΔt < 1, the probability of having more than m jumps is upper-bounded by the following expression:

(3.5)  Σ_{k=m+1}^{+∞} e^{−1} / k! = 1 − Σ_{k=0}^{m} e^{−1} / k!

Table 3.1 gives a numerical upper bound for the probability of obtaining values strictly larger than m for different values of m when λΔt < 1.

m    Upper Bound
1    0.264
2    0.080
3    0.019
4    0.004
5    0.001

Table 3.1: Upper bounds for the probabilities of obtaining more than m jumps for a Poisson law with λΔt < 1.

Based on the previous results, we conclude that "truncating" the Poisson random variable, combined with our assumption that λΔt < 1, results in a negligible approximation error for m large enough. Assuming that t = nΔt and that no more than m jumps can happen in a Δt interval, we can derive a very accurate approximation for the semivariance given in (A.2) by simply considering the first (mn + 1) terms of (2.1) given in Theorem 2.1:

(3.6)  Σ_{k=0}^{mn} [ e^{−λt} (λt)^k / k! ] · ( (D − µ_k)² Φ(D_k) + σ_k (D − µ_k) φ(D_k) + σ_k² Φ(D_k) ).

For a Poisson process, the expected value and the variance are both equal to λt, i.e., to λ for t = 1 year. In other words, if λ < 252 and assuming that m = 5, it means that we consider the first mn + 1 = 5 × 252 + 1 terms of the sum, which covers values of k up to (5 × 252 − 252)/√252 ≈ 63 standard deviations above the mean of the Poisson distribution; once again, this implies that the approximation given in (3.6) is an almost exact formula.

So far, we have assumed that λΔt <
1, a choice that might appear arbitrary at first sight. The purpose of this section is to show that this assumption is actually not a strong limitation. If we assume that Δt = 1/252 (1 day) and that λΔt ≥ 1, or equivalently, that λ ≥ 252, then P(t) is sufficiently large and we can invoke the central limit theorem to approximate Q = Σ_{k=1}^{P(t)} Q_k, as the Q_k's are iid random variables with finite mean and variance. The mean and the variance of a compound Poisson distribution derive in a simple way from the laws of total expectation and of total variance. Formally, let us denote by E_X[X] and Var[X] the expected value and the variance of a random variable X, respectively. Furthermore, let E_{X|Y}[X|Y] denote the conditional expectation of the random variable X conditioned on Y. Note that E_{X|Y}[X|Y] is a random variable, and therefore, one can compute its expected value. The law of total expectation tells us that E_Y[ E_{X|Y}[X|Y] ] = E_X[X], while the law of total variance is formulated as Var[Y] = E_X[Var[Y|X]] + Var_X[ E_{Y|X}[Y|X] ].

Let us recall that in our jump-diffusion process, the amplitude Q_k of a single jump is assumed to follow a normal distribution with mean µ_Q and standard deviation σ_Q, and that the number of jumps P in a given interval is modeled by a Poisson law of parameter λΔt. Finally, let us recall that the jump sizes and P are independent random variables. We have

E[Q] = E_P[ E_{Q|P}[Q|P] ] = E_P[ P · E[Q_k] ] = E_P[P] · E[Q_k] = λt µ_Q,

as well as

Var[Q] = E[ Var_{Q|P}[Q] ] + Var[ E_{Q|P}[Q] ] = E[ P · Var[Q_k] ] + Var[ P · E[Q_k] ] = E[P] Var[Q_k] + (E[Q_k])² Var[P] = λt σ_Q² + µ_Q² λt = λt (σ_Q² + µ_Q²).

Based on the previous development, we can conclude that the jump-diffusion process converges in distribution to a normal distribution when P(t) becomes large:

(3.7)  Y_t ∼ N( (µ − σ²/2 + λ µ_Q) t, (σ² + λ (σ_Q² + µ_Q²)) t ).

In short, the annualized return distribution converges to the return distribution that we would obtain for a pure diffusion process, but with different drift and volatility parameters. Concretely, when λΔt is larger than 1, the interest of such a model is limited and the use of a pure diffusion process is preferable, at least in our context.

In order to practically illustrate our results, we have chosen to focus on the
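The moment computations above and the normal limit (3.7) are easy to confirm by simulation; the parameter values in this sketch are arbitrary (ours), chosen with λ ≥ 252 so that the jump count is large:

```python
import numpy as np

t, mu, sigma, lam, mu_q, sigma_q = 1.0, 0.08, 0.05, 300.0, -1e-4, 1e-3

rng = np.random.default_rng(3)
n = 200_000
p = rng.poisson(lam * t, n)                      # number of jumps P(t)
# Conditional on P(t)=p, the jump sum is N(p*mu_q, p*sigma_q^2).
q = mu_q * p + sigma_q * np.sqrt(p) * rng.standard_normal(n)
y = (mu - sigma**2 / 2) * t + sigma * np.sqrt(t) * rng.standard_normal(n) + q

mean_th = (mu - sigma**2 / 2 + lam * mu_q) * t          # mean in (3.7)
var_th = (sigma**2 + lam * (sigma_q**2 + mu_q**2)) * t  # variance in (3.7)
print(float(y.mean()), mean_th)
print(float(y.var()), var_th)
```

The sample mean and variance of the simulated log-returns match the moments of the limiting normal law in (3.7) up to Monte Carlo noise.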
Barclays US High Yield Bond Index between January 3rd, 2007 and July 31st, 2012. We explain in Appendix B why we can use a jump-diffusion process to model a bond benchmark index. The daily log-returns (see Figure 4.1) of this index form a good example of a highly skewed, long-tailed distribution with a negative sample skewness of −0.63 and a sample kurtosis of 24.08. A histogram approximating the probability density function of the log-returns, as well as a normal law whose parameters are the log-returns sample mean µ̂ and sample standard deviation σ̂ ≈ 0.42%, are depicted in Figure 4.2.

The semivariance of the log-returns can easily be computed with the help of (1.1). However, it is not at all clear how we should proceed to annualize this semivariance, except in one case: if we assume that the log-returns follow a pure diffusion process, with no drift, and if the threshold τ is set to zero, then the variance of the process increases linearly with time, and the annualized semivariance is just the daily semivariance scaled linearly with time (equivalently, the annualized semideviation is the daily semideviation multiplied by the square root of time). However, as soon as we introduce jumps in the stochastic process, there is no reason to apply this rule anymore. Indeed, we have derived in (2.1) an exact formula that enables us to accurately annualize a daily semivariance.

The purpose of this section consists in performing a rolling analysis and comparing the semivariances that we obtain from three distinct methodologies:

1. an approach relying on fitting our data to a jump-diffusion process, and computing an annualized semivariance according to our generalized Ball-Torous model and (3.6), which we have shown to be an almost exact approximation of (2.1);
2. an approach relying on fitting our data to a pure diffusion process (i.e., without jumps), and computing an annualized semivariance according to (2.2);
3. and the square-root-of-time rule.

The objective of this analysis is to experimentally determine which one of these three methodologies seems to provide the best results. In our computations, all the semivariances are based on 1 year of historical data, that is, 252 points, and we perform a rolling analysis over the whole period. We have also decided to set τ = 0 in all of our tests, since the use of the square-root-of-time rule is only justified for this particular value. The semivariance at time t is computed as follows: first, we compute the daily semivariance based on historical data from t −
251 days to t; then, we transform the daily semivariance into an annualized semivariance by evaluating (3.6) and (2.2), respectively, at the 1-year horizon, or by applying the square-root-of-time rule. Note that the formulas derived in §2 remain valid for an arbitrary target level τ.

The main challenge that we faced from an optimization point of view is the fact that the objective function (3.2) possesses several local optima. To overcome this issue, we have decided to use the optimizer described in Ardia and Mullen (2013). The main motivations for choosing it are its ease of use, its capacity to handle local optima and its ability to manage constraints on the parameters. Moreover, it was successfully tested in Ardia et al. (2011b) on diffusion problems. Finally, this kind of algorithm has proved to be flexible enough for us to customize it for the rolling analyses that we are dealing with in this paper. We call the variant we will describe later the differential evolution algorithm with memory.

Differential Evolution (DE), introduced by Storn and Price (1997), is a class of search algorithms that belongs to the family of evolutionary algorithms. These algorithms assume no special mathematical property of the target function, like the differentiability required by gradient descent or quasi-Newton methods. It means that DE can handle non-continuous or very ill-conditioned optimization problems by searching a very large number of candidate solutions; we note that this number of candidate solutions can be fitted to the computational power one has at disposal. The price to pay for this lack of assumptions is that optimization algorithms relying on DE are never guaranteed to converge towards an optimal
Figure 4.1: Barclays US High Yield Bond Index – Daily log-returns (in %) and index value from January 3rd, 2007 to July 31st, 2012.
Figure 4.2: Histogram of the daily log-returns with a fitted normal law N(µ̂, σ̂²). The returns show a negative sample skewness equal to −0.63 and a sample kurtosis equal to 24.08. The sample mean µ̂ and standard deviation σ̂ are in %.

solution. However, such meta-heuristics still seem to deliver good performance on continuous problems, see e.g. Price et al. (2005). Roughly speaking, DE-based optimization algorithms exploit bio-inspired operations such as crossover, mutation and selection of candidate populations in order to optimize an objective function over several successive generations. Let f : R^d → R denote the d-dimensional objective function that must be optimized. For the sake of simplicity, we assume that f has to be minimized; this is not restrictive, as f can easily be maximized by minimizing −f. The DE algorithm begins with a population of v initial vectors x_i ∈ R^d, with 1 ≤ i ≤ v, that can either be randomly chosen or provided by the user. For a certain number of rounds, fixed in advance, one then iterates the following operations: for each vector x_i ∈ R^d, one selects at random three other vectors x_a, x_b and x_c, with 1 ≤ a, b, c, i ≤ v all distinct, as well as a random index 1 ≤ ρ ≤ d. Then, the successor x′_i of x_i is computed as follows: given a crossover probability π ∈ [0, 1], a differential weight ω ∈ [0, 1] and x_i = (x_{i,1}, . . . , x_{i,d})^T, for each 1 ≤ j ≤ d, one draws a number r_j uniformly at random. If r_j < π, or if j = ρ, then one sets x′_{i,j} = x_{a,j} + ω · (x_{b,j} − x_{c,j}); otherwise, x′_{i,j} is kept untouched. Once this mutation is done, the resulting vector x′_i is evaluated with respect to the objective function, and replaces x_i in the candidate population if it is better; otherwise, x′_i is rejected. For more details, we refer the reader to Price et al. (2005) and Storn and Price (1997). For practical purposes, several implementations of DE-based optimization algorithms are currently available, as illustrated by the web-based list of DE-based software for general-purpose optimization maintained by Storn. In what follows, we rely on the package DEoptim, see Ardia et al. (2011a); Ardia and Mullen (2013); Mullen et al. (2011), which implements DE-based optimization in the R language and environment for statistical computing, see R Development Core Team (2008).

For the optimizer, we have used a maximum number of iterations of 250, default values for the crossover probability π and the differential weight ω = 0.
8, as well as an initial population of v = 200 vectors. We have also constrained the value of λ to the interval [0, 252], which enforces λΔt < 1. The main challenge that we have faced for our rolling analysis is the lack of stability over time of some parameters of the objective function. This is illustrated in the upper graph of Figure 4.3, which depicts the values computed by the DE-based optimizer for λ. We can observe that this lack of stability can be extreme for certain periods of time. For instance, in the first half of 2010, we can see that λ jumps from a high value near its upper bound 252 to a very low value the next date. This is due to the fact that the objective function possesses several optima and that for each of these solutions, λ can be very different.

In order to overcome this issue, we propose to add "memory" to the algorithm. Indeed, for each date, we compute a solution that maximizes the log-likelihood (3.2). Our idea consists in storing these solutions and feeding the initial population with the last 50 solutions of the optimization problem. Concretely, to determine the solution at a specific date, the initial population – whose size is v = 200 – is fed with the solutions of the optimization problems at times t − 1, t − 2, . . . , t − 50. There is no reason why the solution at time t should be the same as at time t − 1, unless the objective function remains the same; however, it is very likely that the solution at t − 1 is close to the one at t, and the optimizer will not switch from one point to another with the same log-likelihood value at the next optimization problem of date t + 1 if the objective function has not changed. The effects of this stabilization procedure are illustrated in the lower graph of Figure 4.3. The stabilization procedure is very important for the interpretation of our results. Without it, it would be far more difficult, and even impossible, to interpret the results for some periods of time.
Figure 4.3: Evolution of λ through time, without and with the stabilization procedure. The stabilization procedure relies on feeding the initial population with the 50 last solutions, if available.
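The DE iteration described above, together with the "memory" seeding of the initial population, can be sketched in a few lines; this is our own minimal Python illustration, whereas the paper relies on the R package DEoptim:

```python
import numpy as np

def de_minimize(f, bounds, v=50, rounds=200, pi=0.5, omega=0.8,
                seeds=None, rng=None):
    """Minimal differential evolution (rand/1/bin mutation and crossover).
    `seeds` injects previous solutions into the initial population,
    mimicking the 'memory' stabilization procedure; `bounds` implements box
    constraints such as lambda in [0, 252]."""
    rng = rng or np.random.default_rng()
    lo, hi = np.array(bounds, dtype=float).T
    d = len(lo)
    pop = lo + (hi - lo) * rng.random((v, d))
    if seeds is not None:                       # memory: reuse past solutions
        for s_i, s in enumerate(seeds[:v]):
            pop[s_i] = s
    cost = np.array([f(x) for x in pop])
    for _ in range(rounds):
        for i in range(v):
            a, b, c = rng.choice([j for j in range(v) if j != i], 3,
                                 replace=False)
            rho = rng.integers(d)
            mask = rng.random(d) < pi
            mask[rho] = True                    # force at least one mutated coordinate
            trial = np.where(mask, pop[a] + omega * (pop[b] - pop[c]), pop[i])
            trial = np.clip(trial, lo, hi)      # enforce the parameter constraints
            if (fc := f(trial)) < cost[i]:
                pop[i], cost[i] = trial, fc
    best = int(cost.argmin())
    return pop[best], float(cost[best])

# Toy usage: minimize a shifted quadratic in 3 dimensions.
x, fx = de_minimize(lambda z: float(np.sum((z - 1.5)**2)), [(-5.0, 5.0)] * 3,
                    rng=np.random.default_rng(4))
print(x, fx)
```

In a rolling analysis, the solutions of the previous dates would be passed as `seeds`, which is the whole content of the stabilization procedure.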
Figure 4.4: Comparison of the semideviation (in %) obtained via the square-root-of-time rule with the semideviation (in %) based on a jump-diffusion process. We can observe that the semideviation based on the square-root-of-time rule significantly underestimates the risk in periods of high volatility and overestimates the risk in periods of lower stress.
We start by comparing in Figure 4.4 the evolution of the annualized semideviation, i.e., the square root of the semivariance, computed via the square-root-of-time rule to the semideviation obtained by fitting a jump-diffusion process to the data under the constraint λ ≤ 252, on which we applied our formula given in (3.6). It is quite striking to see that the risk estimated on the jump-diffusion process is up to 4 times larger than the one computed via the square-root-of-time rule. At the beginning of 2009, the former is larger than 40% while the latter is just below 10%. The risk is not only underestimated in periods of crisis, but also seems to be overestimated when the market rallies or in periods of low or medium stress. Indeed, we can observe that the jump-diffusion semideviation is almost zero from the end of 2009 to the end of 2011, while the square-root-of-time semideviation remains at a level of around 2% during that period.
To better understand how the jump-diffusion semideviation works, it is very important to understand the relationship between the different process parameters. Figure 4.5 illustrates the evolution through time of some important parameters of the jump-diffusion process, namely λ, µ and µ_Q.
Figure 4.5: Time evolution of the jump-diffusion process parameters λ, µ and µ_Q. We can observe a direct relationship between λ and µ_Q, namely that when λ gets small, then µ_Q gets large (in absolute value).

This behavior is quite easy to interpret: in normal market conditions, λ is rather high, while µ_Q is rather low. The combination of these two parameters under normal market conditions captures a well-known effect in credit portfolios, a negative asymmetry in their return distribution. The parameter µ has an offsetting effect counterbalancing the compound Poisson process. For example, let us assume that λ = 252 and that µ_Q has an average size of 1 basis point. In that case, the cumulative annual impact of the accidents driven by the Poisson process is on average a negative return of 252 × µ_Q, while µ would be around 10% (note that from (1.6), we also have to take into account the role of σ to determine the annual expected return). In crisis situations, λ really captures the frequency of extreme events and its value typically drops from 252 to smaller values, sometimes close to 1. By contrast, µ is by construction less sensitive to these extreme accidents, even though we can see by looking at Figure 4.5 that it is impacted. Indeed, we can observe that µ dropped to negative values during the 2008 financial crisis.

It is also quite interesting to have a look at what happened during the rally phase that started in March 2009. We can observe that µ peaked at values larger than 60%. It is worth mentioning that even though the value of µ was very large at the end of 2009 due to a strong rally in the market, µ_Q stayed negative during that phase. It is only in 2010 that µ_Q took positive values. This situation is quite exceptional, as we are used to negative skewness in the returns of credit portfolios.
However, under some very particular circumstances, they can exhibit positive skewness and the values of µ_Q cannot be interpreted any more as losses, but as gains.

Note also that the jump-diffusion semideviation can be very close to zero in very special cases, in particular when µ_Q is negative but µ is much larger than λ · µ_Q in absolute value, or when both µ and µ_Q are positive. This just means that in this kind of situation, the probability of obtaining an annualized return below zero is very low, resulting in a semideviation close to zero.

The two worst crisis periods that the financial markets have experienced during the last five years are, by order of importance, the 2008 financial crisis and the US Debt Ceiling crisis in 2011. After weeks of negotiation, President Obama signed the Budget Control Act of 2011 into law on August 2, the date at which, according to the estimates of the Department of the Treasury, the borrowing authority of the US would be exhausted. Four days later, on August 5, the credit-rating agency Standard & Poor's downgraded the credit rating of US government bonds for the first time in the country's history. Markets around the world experienced their most volatile week since the 2008 financial crisis, with the Dow Jones Industrial Average plunging by 635 points (or 5. %).

λ and the Semideviation

It is worth studying the relationship between λ and the semideviation of the jump-diffusion process. In Figure 4.6, we give the evolution of the semideviation based on different values of λ. This analysis clearly shows that the assumption made about λ has a huge impact on the computation of the semideviation.
In particular, it has been a standard practice since the publication of the work of Ball and Torous (1983) to use a Bernoulli jump-diffusion process as an approximation of the true jump-diffusion process by making the assumption that λ∆t is small. However, our analysis in Figure 4.4 indicates that a model with a larger λ best fits the data. Obviously, the motivation of Ball and Torous was to capture large and infrequent events, as opposed to small and frequent jumps. However, imposing λ∆t to be small has the undesired consequence of underestimating the risk in periods of high volatility. Indeed, the graph in Figure 4.6 shows that the semideviation obtained with λ = 252 can be as much as 50% larger in a period of stress than the semideviation computed with the constraint that λ = 10. This observation was the main motivation for us to extend the work of Ball and Torous by allowing models where the frequency of the jumps is not imposed anymore, but driven by the data.

In §, we have seen that if λ is large, then the jump-diffusion return distribution should be close to the distribution of a pure diffusion process, but with different parameters. Rather than superposing the semideviation computed out of a jump-diffusion process with λ = 252 with that of a pure diffusion process, we have put in Figure 4.7 the jump-diffusion semideviation and its difference with the semideviation obtained from a pure diffusion process. The difference is small, except during the 2008 financial crisis. During that period, the difference can be larger by several percentage points. The central limit theorem tells us that the annualized return distribution should converge to a normal distribution whose parameters are given in (3.7). But the parameters of the pure diffusion process have been determined by maximization of the log-likelihood and may give results quite different from the ones in (3.7). Moreover, we can see that the two semideviations diverge in periods of crisis, i.e., when it is difficult to
Figure 4.6: Evolution of the semideviations (in %) for different values of λ. The curve in red corresponds to the constraint λ = 252, in green to λ = 100, in blue to λ = 50 and in violet to λ = 10. Note that the curves in red and in green are almost identical.
Figure 4.7: Comparison between the semideviation (in %) obtained from a jump-diffusion process with λ = 252 and a pure diffusion process with λ = 0. The blue and green lines represent the jump-diffusion and pure diffusion semideviations, respectively, while the red one is the difference between the two. Even though the magnitudes of the semideviations are similar, we can observe that the difference is substantial during the 2008 financial crisis.
Figure 4.8: Comparison of the log-likelihood for the jump-diffusion process (in blue) with λ = 252 and for the pure diffusion process (in red). In both cases, we can see that the log-likelihood is smaller in periods of high market stress, which means that the calibration of the model is also more challenging.

calibrate the models due to the appearance of extreme events that make the determination of the parameters of the models quite challenging. To illustrate this fact, we have depicted the evolution of the log-likelihood value for both processes in Figure 4.8. We have also compared the jump-diffusion and the Bernoulli jump-diffusion semideviations, which should be the same based on the results from § when λ is constrained to be small.

Time Evolution of σ

Finally, we compare in Figure 4.10 the time evolution of the volatility σ when fitting a jump process and a pure diffusion process. One can observe that the σ obtained for a pure diffusion process is larger in magnitude than the one obtained for a jump-diffusion process; this comes without any surprise, as in the second case, the volatility is also captured by the jumps. A second fact that we would like to outline is that σ appears to be quite unstable for the jump process. This can be interpreted as a possible misspecification of the model, as the jumps appear to influence the value of σ. In other words, assuming a constant σ might be a wrong hypothesis for our data, and this can be seen as a further motivation to study processes
Figure 4.9: Comparison between the semideviation (in %) obtained from a jump-diffusion process (in blue) and its difference with the semideviation obtained from a Ball and Torous jump-diffusion (in red). In both cases, we have set λ = 10. The difference is very small compared to the level of the semideviation.
Figure 4.10: Comparison between the time evolution of the parameter σ obtained when fitting a jump process (in blue) and a pure diffusion process (in green).

with stochastic volatility including jumps in returns and in volatility, as we will do in the next section.

We now explain how to extend the methodology we developed before to processes involving stochastic volatility with jumps in returns and volatility.
Importance of Jumps in Returns and Volatility

Modeling accurately the price of stocks is an important topic in financial engineering. A major breakthrough was the Black and Scholes (1973) model, which however suffers from at least two major drawbacks. The first one is that the stock price follows a lognormal distribution, and the second one is that its volatility is constant. Several studies (see Chernov et al. (1999), for instance) have emphasized that the unconditional distributions of asset returns show a greater degree of kurtosis than implied by the normality assumption, and that volatility clustering is present in the data, suggesting random changes in the returns' volatility. An extension of this model is to integrate large movements in the stock prices, making it possible to model financial crashes such as Black Monday (October 19th, 1987). These were introduced in the form of jump-diffusion models, see Cox and Ross (1976); Merton (1976). Another important extension of the Black-Scholes model was to use stochastic volatility rather than considering it as constant, see for instance Heston (1993); Hull and White (1987); Scott (1987). Bates (1996) and Scott (1997) combined these two approaches and introduced the stochastic volatility model with jumps in returns. Even though their new approach has helped better characterize the behavior of stock prices, several studies have shown that models with both diffusive stochastic volatility and jumps in returns were incapable of fully capturing the empirical features of equity index returns, see for instance Bakshi et al. (1997); Bates (2000); Pan (2002). Their main weakness is that they do not capture well the empirical fact that the conditional volatility of returns can rapidly increase. By adding jumps in the volatility, the volatility process is better specified. Duffie et al. (2000) were the first to introduce a model with jumps in both returns and volatility. Eraker et al.
(2003) have shown that this new model with jumps in volatility performs better than previous ones, and results in no major misspecification of the volatility process.
As mentioned above, Eraker et al. (2003) consider a jump-diffusion process with jumps in returns and in volatility. These jumps arrive simultaneously, with the jump sizes being correlated. According to the model, the logarithm of an asset's price Y_t = log(S_t) solves

(5.1)
dY_t = µ dt + √(V_{t−}) dW_t^{(1)} + ξ^y dN_t,
dV_t = κ(θ − V_{t−}) dt + σ_ν √(V_{t−}) (ρ dW_t^{(1)} + √(1 − ρ²) dW_t^{(2)}) + ξ^ν dN_t,

where V_{t−} = lim_{s↑t} V_s, W_t = (W_t^{(1)}, W_t^{(2)})^T is a standard Brownian motion in R², and (·)^T denotes the transpose of a matrix or a vector. The jump arrival N_t is a Poisson process with constant intensity λ, and this model assumes that the jump arrivals for the returns and the volatility are contemporaneous. The variables ξ^y and ξ^ν denote the jump sizes in returns and volatility, respectively. The jump size in volatility follows an exponential distribution ξ^ν ∼ exp(µ_ν), while the jump sizes in returns and volatility are correlated with ξ^y | ξ^ν ∼ N(µ_y + ρ_J ξ^ν, σ_y²).

Their methodology relies on Markov Chain Monte Carlo (MCMC) methods. The basis for their MCMC estimation is the time-discretization of (5.1):

(5.2)
Y_{t+∆t} − Y_t = µ ∆t + √(V_t ∆t) ε^y_{t+∆t} + ξ^y_{t+∆t} J_{t+∆t},
V_{t+∆t} − V_t = κ(θ − V_t) ∆t + σ_ν √(V_t ∆t) ε^ν_{t+∆t} + ξ^ν_{t+∆t} J_{t+∆t},

where J_{t+∆t} = 1 indicates a jump arrival. ε^y_{t+∆t} and ε^ν_{t+∆t} are standard normal random variables with correlation ρ, and ∆t is the time-discretization interval (i.e., one day). The jump sizes retain their distributional structure and the jump times are Bernoulli random variables with constant intensity λ∆t. The authors then apply Bayesian techniques to compute the model parameters Θ.
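A forward simulation of the discretized system (5.2) can be sketched as follows; this is our own illustrative Python sketch (the paper's implementation is in R), and all function and parameter names are ours.

```python
import math
import random

def simulate_svcj(n_steps, mu, kappa, theta, sigma_v, rho, lam,
                  mu_y, rho_j, sigma_y, mu_v, y0=0.0, v0=0.04,
                  dt=1.0 / 252.0, rng=None):
    """Euler path of (5.2): contemporaneous Bernoulli(lam*dt) jumps,
    exponential jump sizes in variance, conditionally normal jump sizes
    in returns, and Brownian shocks with correlation rho."""
    rng = rng or random.Random(1)
    y, v, path = y0, v0, [y0]
    for _ in range(n_steps):
        e_y = rng.gauss(0.0, 1.0)
        e_v = rho * e_y + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        jump = rng.random() < lam * dt                  # J_{t+dt} ~ Bernoulli(lam*dt)
        xi_v = rng.expovariate(1.0 / mu_v) if jump else 0.0   # xi^v ~ exp(mu_v)
        xi_y = rng.gauss(mu_y + rho_j * xi_v, sigma_y) if jump else 0.0
        sq = math.sqrt(max(v, 0.0) * dt)
        y += mu * dt + sq * e_y + xi_y
        v += kappa * (theta - v) * dt + sigma_v * sq * e_v + xi_v
        path.append(y)
    return path
```

The `max(v, 0.0)` guard is a crude positivity fix; the text below discusses the full-truncation scheme of Lord et al. (2006) used for the actual simulations.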
The posterior distribution summarizes the sample information regarding the parameters Θ as well as the latent volatility, jump times, and jump sizes:

(5.3) Pr(Θ, J, ξ^y, ξ^ν, V | Y) ∝ Pr(Y | Θ, J, ξ^y, ξ^ν, V) Pr(Θ, J, ξ^y, ξ^ν, V),

where J, ξ^y, ξ^ν, V are vectors containing the time series of the relevant variables. The posterior combines the likelihood Pr(Y | Θ, J, ξ^y, ξ^ν, V) and the prior Pr(Θ, J, ξ^y, ξ^ν, V). As the posterior distribution is not known in closed form, the MCMC-based algorithm generates samples by iteratively drawing from the following conditional posteriors:

Parameters: Pr(Θ_i | Θ_{−i}, J, ξ^y, ξ^ν, V, Y), i = 1, ..., k
Jump times: Pr(J_t = 1 | Θ, ξ^y, ξ^ν, V, Y), t = 1, ..., T
Jump sizes: Pr(ξ^y_t | Θ, J_t = 1, ξ^ν, V, Y), t = 1, ..., T
            Pr(ξ^ν_t | Θ, J_t = 1, V, Y), t = 1, ..., T
Volatility: Pr(V_t | V_{t+∆t}, V_{t−∆t}, Θ, J, ξ^y, ξ^ν, Y), t = 1, ..., T

where Θ_{−i} denotes the elements of the parameter vector except Θ_i. Drawing from these distributions is straightforward, with the exception of the volatility, as its distribution is not of standard form. To sample from it, the authors use a random-walk Metropolis algorithm. For ρ, they use an independence Metropolis algorithm with a proposal density centered at the sample correlation between the Brownian increments. For a review of these MCMC techniques, we recommend Johannes and Polson (2009), where the authors cover the theory behind MCMC algorithms in the context of continuous-time financial econometrics. The algorithm of Eraker et al. (2003) produces a set of draws {Θ^(g), J^(g), ξ^{y,(g)}, ξ^{ν,(g)}, V^(g)}_{g=1}^{G} which are samples from Pr(Θ, J, ξ^y, ξ^ν, V | Y).

Our approach relies on the methodology developed in Eraker et al. (2003).
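As an illustration, the conditional draw of a jump count is a discrete prior-times-likelihood computation. The sketch below is ours: `cond_lik(k)` stands for a hypothetical evaluation of Pr(Y, V | ..., J_t = k), and the code covers both the Bernoulli case (m = 1) and the truncated-Poisson generalization discussed below.

```python
import math
import random

def jump_count_prior(lam_dt, m):
    """Truncated Poisson prior: p_k = e^{-lam dt}(lam dt)^k / k! for k < m,
    with the remaining tail mass assigned to k = m."""
    p = [math.exp(-lam_dt) * lam_dt ** k / math.factorial(k) for k in range(m)]
    p.append(max(0.0, 1.0 - sum(p)))
    return p

def draw_jump_count(lam_dt, m, cond_lik, rng):
    """One conditional posterior draw of J_t in {0, ..., m}: the posterior
    weight of k is p_k * cond_lik(k), then sample from the normalized weights."""
    w = [p * cond_lik(k) for k, p in enumerate(jump_count_prior(lam_dt, m))]
    u = rng.random() * sum(w)
    acc = 0.0
    for k, wk in enumerate(w):
        acc += wk
        if u < acc:
            return k
    return m
```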
The only difference is that we have replaced the Bernoulli jump process by the jump process described in §. In the original approach, J_{t+∆t} can take only two values, so that its posterior is Bernoulli. To compute the Bernoulli probability, the authors use the conditional independence of the increments to volatility and returns to get

Pr(J_{t+∆t} = 1 | V_{t+∆t}, V_t, Y_{t+∆t}, ξ^y_{t+∆t}, ξ^ν_{t+∆t}, Θ) ∝ λ∆t × Pr(Y_{t+∆t}, V_{t+∆t} | V_t, J_{t+∆t} = 1, ξ^y_{t+∆t}, ξ^ν_{t+∆t}, Θ).

Our generalization is straightforward. Following the approach explained in §, J_{t+∆t} can take any value between 0 and m. Then

Pr(J_{t+∆t} = k | V_{t+∆t}, V_t, Y_{t+∆t}, ξ^y_{t+∆t}, ξ^ν_{t+∆t}, Θ) ∝ p_k × Pr(Y_{t+∆t}, V_{t+∆t} | V_t, J_{t+∆t} = k, ξ^y_{t+∆t}, ξ^ν_{t+∆t}, Θ),

with p_k = e^{−λ∆t} (λ∆t)^k / k! for k = 0, ..., m − 1 and p_m = 1 − Σ_{k=0}^{m−1} p_k.

We have followed the approach developed by Numatsi and Rengifo (2010) for the estimation of the model parameters. These authors describe an implementation in R of the methodology exposed by Eraker et al. (2003) and apply it to FTSE 100 daily returns. We assume that we have daily data at times t_i for i = 1, ..., T, with t_{i+1} − t_i = ∆t = 1 day. The bivariate density function under consideration is given by

(5.4) f(B_{t_i}) = (1 / (2π |Σ_{t_i}|^{1/2})) exp( −(1/2) (B_{t_i} − E[B_{t_i}])^T Σ_{t_i}^{−1} (B_{t_i} − E[B_{t_i}]) ),

where | · | denotes the determinant of a matrix. The likelihood function is simply given by ∏_{i=1}^{T} f(B_{t_i}), with

(5.5) B_{t_i} = (∆y_{t_i}, ∆ν_{t_i})^T,
(5.6) E[B_{t_i}] = (µ + ξ^y_{t_i} J_{t_i}, κ(θ − V_{t_{i−1}}) + ξ^ν_{t_i} J_{t_i})^T,
(5.7) Σ_{t_i} = Cov[∆y_{t_i}, ∆ν_{t_i}] = ( V_{t_{i−1}}, ρ σ_ν V_{t_{i−1}} ; ρ σ_ν V_{t_{i−1}}, σ_ν² V_{t_{i−1}} ),

where ∆y_{t_i} = Y_{t_i} − Y_{t_{i−1}} and ∆ν_{t_i} = V_{t_i} − V_{t_{i−1}}. The joint distribution is given by the product of the likelihood, the distributions of the state variables and the priors of the parameters, more specifically:

[ ∏_{i=1}^{T} f(B_{t_i}) ] × [ ∏_{i=1}^{T} f(ξ^y_{t_i}) f(ξ^ν_{t_i}) f(J_{t_i}) ] × [ f(µ) f(κ) f(θ) f(ρ) f(σ_ν) f(µ_y) f(ρ_J) f(σ_y) ] × [ f(µ_ν) f(λ) ].

The distributions of the state variables are respectively given by ξ^ν_{t_i} ∼ exp(µ_ν), ξ^y_{t_i} ∼ N(µ_y + ρ_J ξ^ν_{t_i}, σ_y²) and J_{t_i} ∼ B(λ∆t). In Numatsi and Rengifo (2010), the authors impose little information through the priors, which follow the same distributions as in Eraker et al. (2003): µ ∼ N(0, ·), κ ∼ N(0, ·), θ ∼ N(0, ·), ρ ∼ U(−·, ·), σ_ν ∼ IG(·, ·), µ_y ∼ N(0, ·), ρ_J ∼ N(0, ·), σ_y ∼ IG(5, ·), µ_ν ∼ G(20, ·) and λ ∼ β(2, ·). Following our approach, we only made the following change: we have relaxed the assumption about the Bernoulli process to consider the more general process proposed in §
3. It may also be tempting to change the assumptions about the distribution of λ, but the β distribution is so flexible that it can cope with a wide variety of assumptions about this parameter. For example, the β(1,
1) gives a uniform variable, which can be used if we want an uninformative distribution. (Here, B(·), G(·,·), IG(·,·), U(·,·) and β(·,·) denote the Bernoulli, gamma, inverse gamma, uniform and beta distributions, respectively.)

It is important to store the sampled parameters and the vector of state variables at each iteration. The mean of each parameter over the number of iterations gives us the parameter estimate. The convergence is checked using trace plots showing the history of the chain for each parameter. ACF plots are used to analyze the correlation structure of the draws. Finally, the quality of the fit may be assessed through the mean squared errors and the normal QQ plot. The standardized error is defined as follows:

(Y_{t+∆t} − Y_t − µ ∆t − ξ^y_{t+∆t} J_{t+∆t}) / √(V_t ∆t) = ε^y_{t+∆t}.

Once we have managed to estimate the parameters of the model on daily data, we still have to compute an annualized semideviation. The first step consists in simulating the returns over the horizon τ, where τ is typically 1 year if we are interested in computing an annualized figure. We simulate a Poisson process N_τ with jump intensity λ. The output of this simulation consists of the number N of jumps occurring between times 0 and τ, in addition to the times 0 ≤ j_1 ≤ ··· ≤ j_N ≤ τ at which these jumps occur. For each interval [j_{i−1}, j_i], we simulate the diffusion parts of the return and volatility processes according to an Euler discretization scheme for the Heston model, this procedure being explained later in this section. This gives us preliminary values Y^−_{j_i} and V^−_{j_i} for the return and variance processes at time j_i. Next, we simulate jump sizes for the jumps in the two processes. These jump sizes are given by ξ^y_{j_i} and ξ^ν_{j_i}, respectively.
The final values of the two processes at time j_i can be calculated as:

(5.8) Y_{j_i} = Y^−_{j_i} + ξ^y_{j_i},
(5.9) V_{j_i} = V^−_{j_i} + ξ^ν_{j_i}.

If j_N ≠ τ, then no jump occurs in the interval [j_N, τ], and we can apply the Euler discretization scheme for the Heston model to get the values of Y_τ and V_τ.

In the following, we explain how to implement the Euler discretization scheme for the Heston model. Discretizing the two equations (5.1) and ignoring the jump part, the differences between Y_{t+∆t} and Y_t and between V_{t+∆t} and V_t are simply given by

(5.10)
Y_{t+∆t} − Y_t = µ ∆t + √(V_t) ∆W_t^{(1)},
V_{t+∆t} − V_t = κ(θ − V_t) ∆t + σ_ν √(V_t) (ρ ∆W_t^{(1)} + √(1 − ρ²) ∆W_t^{(2)}).

As already mentioned, we typically use ∆t = 1 day to discretize both processes. To simulate the Brownian increments ∆W_t^{(1)} and ∆W_t^{(2)}, we use the fact that each increment is independent of the others. Each such increment is normally distributed with mean 0 and variance ∆t. However, using (5.10) for the simulations may generate negative volatilities. This is a well-known effect that has been addressed several times in the literature. Following Lord et al. (2006), we slightly modify (5.10) to prevent the simulation from generating negative values:

(5.11) V_{t+∆t} − V_t = κ(θ − V_t^+) ∆t + σ_ν √(V_t^+) (ρ ∆W_t^{(1)} + √(1 − ρ²) ∆W_t^{(2)}),

where V_t^+ is the maximum between 0 and V_t. If the time between two jumps is smaller than the discretization step ∆t, then it simply means that several jumps occur during that time interval. In that case, we just have to simulate the appropriate number of jumps, both for the return and for the volatility, during that period. In the end, the return over the period [0, τ] is just Y_τ.
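Putting the pieces together, the annualization step can be sketched as follows: simulate M one-year paths with the full-truncation update (5.11), drawing a Poisson number of jumps per day (equivalent in distribution to simulating explicit jump times), then estimate the density of the simulated returns with a Gaussian kernel and integrate the semivariance numerically. This is our own Python sketch with illustrative names; the paper does the last two steps in R with density() and integrate().

```python
import math
import random
import statistics

def poisson_draw(rate, rng):
    """Knuth-style Poisson sampler, adequate for small rate = lam * dt."""
    limit, k, p = math.exp(-rate), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def simulate_annual_returns(m_paths, mu, kappa, theta, sigma_v, rho, lam,
                            mu_y, rho_j, sigma_y, mu_v, v0=0.04, rng=None):
    dt, out = 1.0 / 252.0, []
    rng = rng or random.Random(5)
    for _ in range(m_paths):
        y, v = 0.0, v0
        for _ in range(252):
            vp = max(v, 0.0)                       # V_t^+ of (5.11)
            e_y = rng.gauss(0.0, 1.0)
            e_v = rho * e_y + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
            jy = jv = 0.0
            for _ in range(poisson_draw(lam * dt, rng)):  # possibly several jumps per day
                xv = rng.expovariate(1.0 / mu_v)
                jv += xv
                jy += rng.gauss(mu_y + rho_j * xv, sigma_y)
            y += mu * dt + math.sqrt(vp * dt) * e_y + jy
            v += kappa * (theta - vp) * dt + sigma_v * math.sqrt(vp * dt) * e_v + jv
        out.append(y)
    return out

def semivariance_kde(samples, target=0.0, grid=1000):
    """Semivariance below `target` from a Gaussian KDE (Silverman bandwidth)
    integrated with the trapezoidal rule, mimicking density()/integrate()."""
    h = 1.06 * statistics.stdev(samples) * len(samples) ** -0.2
    norm = len(samples) * h * math.sqrt(2.0 * math.pi)
    f = lambda x: sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples) / norm
    lo = min(samples) - 5.0 * h
    step = (target - lo) / grid
    ys = [(target - (lo + i * step)) ** 2 * f(lo + i * step) for i in range(grid + 1)]
    return step * (sum(ys) - 0.5 * (ys[0] + ys[-1]))
```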
By repeating this procedure M times, we get M realizations of the accumulated returns over the period [0, τ], which makes it possible to estimate their density. In our tests, we have used a Gaussian kernel to estimate it. This can easily be done in R by using the density() function. Then, one can obtain the semivariance by numerically integrating the expression given in (1.1). For this, we have used the QUADPACK library via the integrate() function in R. QUADPACK is a well-known numerical analysis library implemented in FORTRAN 77 and containing very efficient methods for the numerical integration of one-dimensional functions.

We ran the algorithm using 100'000 iterations with 10'000 burn-in iterations. Table 5.1 provides the parameter posterior means and their computed standard errors.

Parameter   Value (100k iter.)
µ           …
µ_y         −0.0222 (0.2055)
µ_ν         …
θ           …
κ           …
ρ           −0.0058 (0.2595)
ρ_J         …
λ           …
σ_y         …
σ_ν         …

Table 5.1: Posterior means of the parameters and their standard errors (in parentheses); µ, µ_y, µ_ν, σ_y, and σ_ν are in %.

The return mean µ is close to the daily return mean from the data (0.…). The long-run mean of V_t, given by E[V_t] = θ + (µ_ν λ)/κ, is equal to 0.…, corresponding to −23 bps and +18 bps. The parameters σ_y and σ_ν are 1.… and …, to be compared with an empirical intensity of …05. However, this empirical intensity was computed considering that differences in returns above 2.57 standard deviations from the mean could be considered as jumps. Some caution must be taken, since this empirical intensity is very sensitive to the number of standard deviations used as a threshold for defining the jumps. More details about the implementation can be found in Appendix C.

In a second step, we have computed an annualized semideviation based on that process. We have followed the methodology explained in §, running the computations on an Intel® Core™ i5-3470 CPU @ 3.20 GHz. Assuming that we would have to perform a rolling analysis based on 1 year of history over the same period, we would have to run this analysis more than 1000 times. This is not impossible to realize, but would typically require parallel computing techniques. The second reason for not having performed this analysis is that we have to check the convergence of the process for each period considered in the rolling analysis, a task that is difficult to fully automate. As a reminder, the convergence can be checked with trace plots showing the history of the chain for each parameter and with ACF plots to analyze the correlation structure of the draws, while the quality of the fit may be assessed through the mean squared errors and the normal QQ plot. These two reasons have prevented us from doing this rolling analysis when we
Figure 5.1: This graph shows the histogram of the returns based on 1'000 simulations.

consider the model with stochastic volatility and jumps in returns and in volatility. However, we have performed some analyses based on 1 year of data for the two worst periods: during the credit crunch crisis in 2008 and in 2011. Table 5.2 summarizes the results and compares the semideviations obtained from the different models that we have explored in this paper. We can observe that the semideviation SD4 obtained with a stochastic volatility model seems to be more in line with the semideviations SD2 (based on a jump-diffusion model) and SD3 (based on a pure diffusion model) than with SD1 (square-root-of-time rule). This is not very surprising, since SD4 may be interpreted as a refinement of SD2, explaining why the two analytics should not be so different.

Year        SD1     SD2     SD3     SD4
2008-2012   4.70    1.21    1.39    1.74
2008        8.71    31.59   31.72   32.16
2011        3.69    1.31    1.34    2.90

Table 5.2: Annualized semideviations (in %) for different periods. SD1 is based on the square-root-of-time rule, SD2 on a jump-diffusion model with constant volatility, SD3 on a pure diffusion model with constant volatility, and SD4 on a jump-diffusion model with stochastic volatility and jumps in returns and in volatility.
Conclusion and Future Avenues of Research
In this paper, we propose a generalization of the Ball-Torous approximation of a jump-diffusion process and we show how to compute the semideviation based on several jump-diffusion models. We have in particular considered a very exhaustive example based on the Barclays US High Yield Index, whose returns show negative skewness and fat tails. It is a common practice to impose a bound on the Poisson process intensity parameter to make sure that only the large accidents are captured by the jump process. However, our analysis clearly shows that the risk may be underestimated in such a situation. Without constraining the intensity parameter, the semideviation may be more than 50% larger than the one obtained by arbitrarily limiting this parameter in order to capture only large jumps.

We see at least two reasons why our approach should be preferred. The first one is that we do not impose any arbitrary constraint on the intensity parameter. With our method, this parameter is determined in order to obtain the best fit to the data. Jumps may occur very often in certain periods and can be very uncommon (as rare as a few per year) in others. The second reason is that imposing an arbitrary limit on the intensity parameter may result in an underestimation of the risk. Capturing the risk as accurately as possible is one of the most important missions in risk management. This is the reason why the "traditional" approach based on the work of Ball and Torous does not seem appropriate in our context. Moreover, one of our conclusions is that the use of the square-root-of-time rule may either underestimate the risk in periods of high volatility or overestimate it in periods of lower stress.

We also provide in this paper a generalization of the work of Eraker, Johannes and Polson, who consider a jump-diffusion model with stochastic volatility and jumps in returns and in volatility.
As in the case where the volatility is constant, we have extended their approach by replacing the Bernoulli jump process by a more general process able to model several accidents per day when the intensity parameter is high. The parameter estimation methods are based on MCMC methods. In particular, we have used a Gibbs sampler when the conditional distributions were known and variants of the Metropolis algorithm when this was not the case. We also provide a procedure to compute an annualized semideviation once the parameters of the model have been estimated.

It was not possible for us to go as far as we would have liked when considering the model with stochastic volatility and jumps in returns and in volatility. It would have been very useful to perform the same rolling analyses as the ones we carried out when the volatility was kept constant. However, such analyses would have been too time-consuming from a computational point of view, and it would have been complicated to automate the convergence checks that need to be done for each period of the rolling analysis. This was clearly a limitation when we tried to determine empirically the relationship between the daily semideviation and its annualized version.

We really think that the use of the semideviation (or semivariance) could benefit the finance industry by providing a useful and powerful risk measure. This risk metric was proposed a very long time ago, but the difficulty of computing it and its lack of nice scaling properties made it hard to implement. However, we have shown in this article that it is still possible to calculate it even when we consider very complex stochastic processes, and that some useful formulas for its time scaling can be derived under mild assumptions. Finally, we hope that this paper will help democratize its use in the asset management industry.
Disclaimer:
The views expressed in this article are the sole responsibility of the authors and do not necessarily reflect those of Pictet Asset Management SA. Any remaining errors or shortcomings are the authors' responsibility.
Proof of Theorem 2.1
As a first step, we note the two following identities:

(A.1) ∫_{−∞}^{a} t φ(t) dt = −φ(a)  and  ∫_{−∞}^{a} t² φ(t) dt = −a φ(a) + Φ(a).

The first one can be easily obtained by observing that dφ/dt = −t φ(t) and lim_{t→−∞} φ(t) = 0. Then,

∫_{−∞}^{a} t φ(t) dt = −∫_{−∞}^{a} (dφ/dt)(t) dt = −φ(a).

For the second identity, one can proceed as follows: we note that, for a standardized normal density φ, ∫ t φ(t) dt = −φ(t). Integrating by parts, we get

∫_{−∞}^{a} t² φ(t) dt = ∫_{−∞}^{a} t (t φ(t)) dt = −t φ(t) |_{−∞}^{a} + ∫_{−∞}^{a} φ(t) dt.

We can conclude by noting that lim_{t→−∞} t φ(t) = 0. These two identities are useful to prove the following result. Proposition A.1.
Let W \sim N(\mu, \sigma^2), with density function

f_W(w) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{w-\mu}{\sigma}\right)^2\right),

and let \tilde{D} = \frac{D-\mu}{\sigma}; then, the semivariance of f_W is given by

(A.2)  \int_{-\infty}^{D} f_W(w)\,(D-w)^2\,dw = (D-\mu)^2\,\Phi(\tilde{D}) + \sigma\,(D-\mu)\,\varphi(\tilde{D}) + \sigma^2\,\Phi(\tilde{D}).

Proof.
By first using the change of variable x = \frac{w-\mu}{\sigma} and the identities given in (A.1), we obtain the following straightforward development:

\int_{-\infty}^{D} f_W(w)\,(D-w)^2\,dw = \int_{-\infty}^{\tilde{D}} \varphi(x)\,\big(D - (\sigma x + \mu)\big)^2\,dx

= (D-\mu)^2 \int_{-\infty}^{\tilde{D}} \varphi(x)\,dx + 2\sigma(\mu-D) \int_{-\infty}^{\tilde{D}} x\,\varphi(x)\,dx + \sigma^2 \int_{-\infty}^{\tilde{D}} x^2\,\varphi(x)\,dx

= (D-\mu)^2\,\Phi(\tilde{D}) + 2\sigma(D-\mu)\,\varphi(\tilde{D}) + \sigma^2\big(-\tilde{D}\,\varphi(\tilde{D}) + \Phi(\tilde{D})\big)

= (D-\mu)^2\,\Phi(\tilde{D}) + \sigma(D-\mu)\,\varphi(\tilde{D}) + \sigma^2\,\Phi(\tilde{D}),

where the last step uses \sigma^2\tilde{D} = \sigma(D-\mu).

Taking into account that the log returns follow the density given in (1.7), we can apply Proposition A.1 to (1.7) to obtain the formula of Theorem 2.1.

B Use of a Jump-Diffusion Process to Model a Bond Benchmark Index
Using a jump-diffusion process is standard practice for modeling a stock price. We explain here why it is also possible to use this model for a bond benchmark index, since this may seem disturbing at first sight. Indeed, the characteristics of a stock and of a bond are quite different. In particular, a bond has a well-defined maturity, while a stock does not. Moreover, the clean price of a bond converges to 100 at its maturity; this is the well-known pull-to-par effect.

However, when considering a bond benchmark, most of the criticisms against using an equity model in this context become irrelevant. Indeed, bond benchmarks have characteristics that are quite different from those of a single bond. For example, they are typically rebalanced on a monthly basis and their maturities are kept (almost) constant, while the maturity of a bond decreases linearly with time. The bonds with the shortest maturities (for example, less than one year) are automatically removed from the index. A direct consequence of holding the maturity (almost) constant is that the pull-to-par effect is no longer relevant at the portfolio level.

Another distinction between a bond and a stock is that a bond pays coupons while a stock pays dividends. When a coupon is paid, the dirty price of the bond falls by an amount equivalent to the coupon. However, it is not necessary to model the coupon payments for bond benchmarks, since the rules they obey imply that the coupons are automatically reinvested. Concretely, this means that the payment of a coupon does not impact the benchmark index.

Moreover, in a diffusion model, the stock price increases exponentially if we ignore the random part of the model. This is a direct consequence of the fact that returns are not additive, but must be compounded.
This also applies to a bond benchmark index, whose performance is also compounded.

Finally, just as stock markets may experience crashes, bond markets may also suffer extremely severe and sudden losses, justifying the use of jumps in returns and in volatility.

All these remarks suggest that using a jump-diffusion process to model a bond benchmark index is appropriate, even though the dynamics driving bonds are different from those driving stocks. At the portfolio level, and when we consider a bond benchmark with a constant maturity, there is no fundamental reason not to approximate the dynamics of the index level as if it were a stock price.
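The compounding point above can be made concrete with a small sketch (plain Python, hypothetical return figures): the index level is the product of (1 + r_t) over the period, not one plus the sum of the returns.

```python
import math

# Hypothetical daily returns over one year: a constant 0.1% per day.
returns = [0.001] * 252

# An index level compounds returns multiplicatively ...
level = 1.0
for r in returns:
    level *= 1 + r

# ... which is exponential growth in the sum of log returns ...
assert abs(level - math.exp(sum(math.log(1 + r) for r in returns))) < 1e-9

# ... and exceeds what naive additive aggregation would suggest.
assert level > 1 + sum(returns)
```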
C Details about the Implementation
Our MCMC algorithm is implemented in the R language and relies on the code provided by Numatsi and Rengifo (2010). We have modified their implementation in order to cope with the model that we have introduced in this paper. The starting values for our algorithm were generated as follows. The volatility vector was created using a three-month rolling window. We considered as jumps absolute returns (the absolute value of the returns) lying more than 2.57 standard deviations above the mean (after accounting for outliers). For a standardized normal distribution, 2.57 corresponds to its 0.995 quantile. (The technical document describing the management of the Barclays bond indexes can be obtained at https://indices.barcap.com/index.dxml .)

We consider a compound Poisson process X for the jumps in volatility, where the number of jumps follows P(\lambda) and the jump sizes are exponential variables \xi_\nu of parameter \nu. It is easy to show that the distribution function F(x) of the process X is simply given by

F(x) = \Pr(X \le x) = \sum_{k=0}^{+\infty} p_k \Pr\left(\sum_{i=1}^{k} \xi_\nu \le x\right), \quad\text{where } p_k = \frac{e^{-\lambda}\lambda^k}{k!}.

If there is no jump, then X = 0, an event of probability p_0 = e^{-\lambda}. A direct consequence is that the density does not exist. In order to overcome this issue, we propose to estimate \lambda based on the proportion of days having at least one jump. The probability of having no accident on a particular day is e^{-\lambda} \approx 1 - \lambda when \lambda is small. From that formula, we can simply estimate \lambda by the ratio of the number of days with at least one jump to the total number of days within that period.

Once these parameters have been estimated, we still need to determine the number of jumps occurring when an accident is detected. It is well known that the sum of k iid exponential variables of parameter \nu follows a Gamma distribution with parameters k and \nu. So, based on the observed jump size, we can determine the number of jumps by finding the k that gives the highest density.

Having determined the state-space variables, we can now tackle the problem of estimating the model parameters. We have implemented either the Gibbs sampler or Metropolis–Hastings, depending on whether or not we knew the closed form of the parameters' conditional distributions. In order to be able to compare our results with those in Eraker et al. (2003) and in Numatsi and Rengifo (2010), we have used the same priors. Note that little information is imposed through the priors. They are as follows: \mu \sim N(0, \cdot), \kappa \sim N(0, \cdot), \kappa\theta \sim N(0, \cdot), \rho \sim U(-1, 1), \sigma_\nu \sim IG(2.5, 0.1), \mu_y \sim N(0, \cdot), \rho_J \sim N(0, \cdot), \sigma_y \sim IG(5, \cdot), \mu_\nu \sim G(20, \cdot), \lambda \sim \beta(2, \cdot).

References
Ardia, D., Boudt, K., Carl, P., Mullen, K. M., and Peterson, B. G. (2011a). Differential evolution with DEoptim – an application to non-convex portfolio optimization. The R Journal, 3(1):27–34.

Ardia, D., David, J., Arango, O., and Gómez, N. D. G. (2011b). Jump-diffusion calibration using differential evolution. Wilmott, 2011(55):76–79.

Ardia, D. and Mullen, K. M. (2013). DEoptim: Differential evolution optimization in R, version 2.2-2. http://CRAN.R-project.org/package=DEoptim.

Bakshi, G., Cao, C., and Chen, Z. (1997). Empirical performance of alternative option pricing models. Journal of Finance, 52:2003–2049.

Ball, C. A. and Torous, W. N. (1983). A simplified jump process for common stock returns. Journal of Financial and Quantitative Analysis, 18:53–65.

Ball, C. A. and Torous, W. N. (1985). On jumps in common stock prices and their impact on call option pricing. The Journal of Finance, 40(1):155–173.

Bates, D. (1996). Jumps and stochastic volatility: the exchange rate processes implicit in deutschemark options. Review of Financial Studies, 9:69–107.

Bates, D. (2000). Post-'87 crash fears in S&P 500 futures options. Journal of Econometrics, 94:181–238.

Bawa, V. S. (1975). Optimal rules for ordering uncertain prospects. Journal of Financial Economics, 2(1):95–121.

Beckers, S. (1981). A note on estimating the parameters of the jump-diffusion model of stock returns. Journal of Financial and Quantitative Analysis, 16.

Black, F. and Scholes, M. (1973). The pricing of options and corporate liabilities. The Journal of Political Economy, 81(3):637–654.

Chernov, M., Gallant, A. R., Ghysels, E., and Tauchen, G. (1999). A new class of stochastic volatility models with jumps: Theory and estimation. SSRN eLibrary.

Cont, R. and Tankov, P. (2004). Financial Modelling with Jump Processes. Chapman & Hall.

Cox, J. C. and Ross, S. A. (1976). The valuation of options for alternative stochastic processes. Journal of Financial Economics, 3:145–166.

Daníelsson, J. and Zigrand, J.-P. (2006). On time-scaling of risk and the square-root-of-time rule. Journal of Banking & Finance, 30(10):2701–2713.

Duffie, D., Pan, J., and Singleton, K. J. (2000). Transform analysis and asset pricing for affine jump-diffusions. Econometrica, 68(6):1343–1376.

Eraker, B., Johannes, M., and Polson, N. (2003). The impact of jumps in volatility and returns. The Journal of Finance, 58(3):1269–1300.

Fishburn, P. C. (1977). Mean-risk analysis with risk associated with below-target returns. American Economic Review, 67(2):116–126.

Heston, S. (1993). A closed-form solution for options with stochastic volatility with applications to bond and currency options. The Review of Financial Studies, 6(2):327–343.

Honore, P. (1998). Pitfalls in estimating jump-diffusion models. Working paper.

Hull, J. C. and White, A. (1987). The pricing of options on assets with stochastic volatilities. Journal of Finance, 42:281–300.

Johannes, M. and Polson, N. (2009). Handbook of Financial Econometrics, volume 2, chapter MCMC Methods for Continuous-Time Financial Econometrics. Elsevier Science.

Kiefer, N. (1978). Discrete parameter variation: Efficient estimation in diffusion process. Econometrica, 46(2):427–434.

Lord, R., Koekkoek, R., and van Dijk, D. (2006). A comparison of biased simulation schemes for stochastic volatility models. Technical report, Erasmus University Rotterdam, Rabobank International and Robeco Alternative Investments.

Markowitz, H. M. (1952). Portfolio selection. Journal of Finance, 7(1):77–91.

Markowitz, H. M. (1959). Portfolio Selection. John Wiley and Sons.

Merton, R. C. (1976). Option pricing when underlying stock returns are discontinuous. Journal of Financial Economics, 3:125–144.

Mullen, K. M., Ardia, D., Gil, D. L., Windover, D., and Cline, J. (2011). DEoptim: An R package for global optimization by differential evolution. Journal of Statistical Software, 40(6).

Numatsi, A. and Rengifo, E. (2010). Stochastic volatility model with jumps in returns and volatility: An R-package implementation. In Vinod, H., editor, Advances in Social Science Research Using R. Springer Science+Business Media.

Pan, J. (2002). The jump-risk premia implicit in options: Evidence from an integrated time-series study. Journal of Financial Economics, 63(1):3–50.

Price, K., Storn, R. M., and Lampinen, J. A. (2005). Differential Evolution – A Practical Approach to Global Optimization. Natural Computing Series. Springer-Verlag.

R Development Core Team (2008). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.

Roy, A. D. (1952). Safety first and the holding of assets. Econometrica, 20(3):431–449.

Scott, L. (1997). Pricing stock options in a jump-diffusion model with stochastic volatility and interest rates: Applications of Fourier inversion methods. Mathematical Finance, 7:413–426.

Scott, L. O. (1987). Option pricing when the variance changes randomly: Theory, estimation, and an application. Journal of Financial and Quantitative Analysis, 22:419–438.

Sharpe, W. F. (1966). Mutual fund performance. Journal of Business, 39(1):119–138. Part II.

Sortino, F. A. and Van Der Meer, R. (1991). Downside risk. Journal of Portfolio Management, 17(4):27–32.

Storn, R. and Price, K. (1997). Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization, 11(4):341–359.

Storn, R. M. Differential evolution for continuous function optimization.