Calibration and filtering for multi factor commodity models with seasonality: incorporating panel data from futures contracts
Gareth W. Peters, Mark Briers, Pavel V. Shevchenko, Arnaud Doucet
[1] UNSW Mathematics and Statistics Department, Sydney, 2052, Australia; e-mail: [email protected]
[2] CSIRO Mathematical and Information Sciences, Sydney, Locked Bag 17, North Ryde, NSW, 1670, Australia; e-mail: [email protected], fax: +61-2-93253200
[3] QinetiQ, UK.
[4] University of British Columbia, Department of Computer Science and Department of Statistics.
Working Paper, Version from October 31, 2018
Abstract
We examine a general multi-factor model for commodity spot prices and futures valuation. We extend the multi-factor long-short model in [1] and [2] in two important aspects: firstly, we allow both the long and short term dynamic factors to be mean reverting, incorporating stochastic volatility factors; secondly, we develop an additive structural seasonality model. A Milstein-discretized non-linear stochastic volatility state space representation of the model is then developed, which allows for futures and options contracts in the observation equation. We then develop numerical methodology based on an advanced Sequential Monte Carlo algorithm utilising Particle Markov chain Monte Carlo to perform calibration of the model jointly with the filtering of the latent processes for the long-short dynamics and volatility factors. In this regard we explore and develop a novel methodology based on an adaptive Rao-Blackwellised version of the Particle Markov chain Monte Carlo methodology. In doing this we deal accurately with the non-linearities in the state space model that are thereby introduced into the filtering framework. We perform analysis on synthetic and real data for oil commodities.
Key words:
Multi-Factor, Commodity, Spot Price, Stochastic Volatility, Milstein, Adaptive Markov chain Monte Carlo, Particle filter, Rao-Blackwellisation.
Commodity modelling has a long history in the economics and econometrics literature. Instrumental concepts were introduced in [3], which developed normal backwardation to model inter-temporal futures prices and thereby explain differences between spot and futures prices. Storage theory was developed in [4] to account for storage costs; it states that inter-temporal prices are the jointly determined price of storage, so that futures prices will be approximately the spot price plus positive storage costs. One problem with this explanation was that backwardation, which occurs when the observed futures price is less than the spot price, was difficult to explain. To address this problem of interpretation, [5] developed the concept of convenience yield, the implicit revenue associated with stock holding. This can be considered analogous to coupons linked with bonds or dividends paid on stocks. Put another way, the convenience yield can be interpreted as the flow of services accruing to the holder of the spot commodity but not to the owner of a futures contract.

The conclusions of storage theory provide the fundamental elements that models of commodities should aim to address when considering the economic relationship between spot and futures prices implied by storage cost. This relationship identifies at least three key factors influencing futures prices: the spot price, the convenience yield and the interest rate. Additionally, convenience yield and spot price are positively correlated and are both inverse functions of the stock level. One also needs to consider that there is an asymmetric relationship for the basis, that is, the variation between the spot price of a deliverable commodity and the relative price of the futures contract closest to maturity. In particular, when the market is in contango (futures > spot), the basis level is limited to the storage costs.
One final important economic relationship that commodity models should aim to address is the Samuelson effect [6], which states that movements in prices of prompt (soon to mature) contracts are large and erratic, whilst prices of long-term contracts are not. This implies that the variance of futures prices, and the correlation between the nearest futures price and subsequent contract prices, decrease as a function of time to maturity.

In this paper we begin with a review of multi-factor commodity futures models. In Section 3 we propose and develop a general multi-factor model for commodity spot prices and futures valuation. We extend the multi-factor long-short model in [1] and [2] in two important aspects: firstly, we allow both the long and short term dynamic factors to be mean reverting, incorporating stochastic volatility factors; secondly, we develop an additive structural seasonality model. In Section 4 a Milstein-discretized non-linear stochastic volatility state space representation of the model is developed, which allows for futures and options contracts in the observation equation. In Section 5 we develop a Bayesian filtering and calibration framework, with numerical methodology based on an advanced Sequential Monte Carlo algorithm utilising Particle Markov chain Monte Carlo to perform calibration of the model jointly with the filtering of the latent processes for the long-short dynamics and volatility factors. Section 6 explains this novel methodology in detail: an adaptive Rao-Blackwellised version of the Particle Markov chain Monte Carlo methodology, which deals accurately with the non-linearities introduced into the filtering framework by the state space model. In Section 7 we perform analysis on synthetic and real data for oil commodities.
Review of multi-factor dynamic commodity futures models
The typical approach to formulating a model for commodities is to begin by specifying a real-world latent stochastic differential equation (s.d.e.) model for the factors to be considered. Then, under assumptions on the market prices of risk and the behaviour of such markets, a change of measure is made to adjust the real-world process by the price of risk in order to obtain the risk-neutral measure. Under this risk-neutral s.d.e. model, futures prices are obtained as functions of the latent factors, as expected discounted future payoffs.

In [7] a reduced-form model for commodity prices was introduced. This was a one-factor model in which the commodity price follows a Geometric Brownian motion (GBM) and the convenience yield is treated as a dividend yield. This model was clearly not sufficient, as it failed in two important aspects: firstly, it does not account for the mean reversion of spot prices; secondly, it neglects the inventory dependence of the convenience yield. This basic model was extended in [8] to a two-factor model and used to study oil futures. In the two-factor model the spot price was modelled by Geometric Brownian Motion and the convenience yield by a mean reverting process.

As noted in [1], this two-factor model can be made more parsimonious via a change of variable. A new state variable is defined by the demeaned convenience yield minus the long-term convenience yield; at time t this represents the short-term deviation in log prices, or the deviation of spot price returns from the long-term value. Next, a second factor is defined as the long-term price return, or price appreciation, given by the long-term total return minus the long-term convenience yield.
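The long-short two-factor reformulation described above can be sketched in a few lines. The following is a minimal Euler-discretized simulation of a short-term mean-reverting deviation plus a long-term equilibrium level, in the spirit of [1]; all parameter values and function names are ours, chosen purely for illustration.

```python
import numpy as np

def simulate_long_short(chi0=0.0, xi0=3.5, kappa=1.5, mu_xi=0.03,
                        sigma_chi=0.3, sigma_xi=0.15, rho=0.3,
                        dt=1/252, n=252, seed=0):
    """Euler-discretized long-short two-factor sketch:
    ln S_t = chi_t + xi_t, with chi_t a mean-reverting (OU) short-term
    deviation and xi_t an arithmetic Brownian long-term equilibrium level.
    Hypothetical parameters, for illustration only."""
    rng = np.random.default_rng(seed)
    chi = np.empty(n + 1)
    xi = np.empty(n + 1)
    chi[0], xi[0] = chi0, xi0
    for t in range(n):
        z1 = rng.standard_normal()
        z2 = rho * z1 + np.sqrt(1 - rho**2) * rng.standard_normal()  # correlated shocks
        chi[t + 1] = chi[t] - kappa * chi[t] * dt + sigma_chi * np.sqrt(dt) * z1
        xi[t + 1] = xi[t] + mu_xi * dt + sigma_xi * np.sqrt(dt) * z2
    return np.exp(chi + xi)  # spot price path
```

The mean reversion in the short-term factor pulls transient shocks back toward zero, while the long-term factor accumulates permanent shocks, mirroring the interpretation given above.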
This reformulation, in addition to being parsimonious, is also easier to interpret from a physical perspective, since the returns are now defined in terms of the long-term price appreciation.

In the three-factor model ([9]; [10]; [11]) a third unobserved stochastic variable is introduced, with the short-term variation following an OU process, the equilibrium price following a GBM process and the instantaneous growth in equilibrium prices following a mean reverting process for the real process.

It should be pointed out that an additional assumption on these models is that there are no arbitrage opportunities. Strictly speaking this will not be true since, as pointed out in [12], these models do not guarantee that the convenience yield is strictly positive. This is required for these models to be arbitrage free, where contemporaneous spot prices are greater than discounted futures prices net of carrying costs. Additionally, as pointed out in the introduction, a key element of storage theory is that the variance of, and correlation between, spot and futures prices should be related to the level of the price or convenience yield. These models make the simplistic assumption of constant correlation and variance. Finally, in addition to these weaknesses, the spot price volatilities obtained from these models are strongly heteroskedastic; see [12] for a review of the literature demonstrating these issues.

More recent models include those developed in [12], which remove the potential for cash-and-carry arbitrage by forcing the convenience yield to be positive at all times. Additionally, these new models admit time-varying spot and convenience yield volatilities. This modifies the models discussed whilst still admitting an analytic expression for the futures contract price as a function of the unobserved stochastic factors.
In particular, the two-factor model they develop has a spot price following a GBM in which the convenience yield is treated as an exogenous dividend yield and the volatility is proportional to the square root of the instantaneous convenience yield level. The second factor, the convenience yield, follows a Cox-Ingersoll-Ross (CIR) model, forcing the convenience yield to be strictly non-negative, with volatility proportional to the square root of the instantaneous convenience yield level. Under the assumption that the interest rate is constant, an analytic solution is obtained for the futures price as a function of these unobserved stochastic factors (real process).

Another recent model is from [2], which developed a very general multi-factor model incorporating several extensions to the above models, including a Levy-driven process with jumps. However, again in this model the questions relating to joint parameter estimation and filtering were not addressed, nor was seasonality incorporated into the model. The stochastic model developed was given by (risk-neutral setting):

$$ dS_t = (r_t - \delta_t - \lambda\mu_J) S_t\,dt + \sigma_S S_t\,dW_1 + \sqrt{V_t}\, S_t\,dW_2 + J S_t\,dq, $$
$$ \Pr(dq = 1) = \lambda\,dt; \qquad \ln(1+J) \sim N\!\left(\ln(1+\mu_J) - \tfrac{1}{2}\sigma_J^2,\ \sigma_J^2\right), $$
$$ dr_t = (\theta_r - \kappa_r r_t)\,dt + \sigma_r \sqrt{r_t}\,dW_r; \qquad d\delta_t = (\theta_\delta - \kappa_\delta \delta_t)\,dt + \sigma_\delta\,dW_\delta, $$
$$ dV_t = (\theta_V - \kappa_V V_t)\,dt + \sigma_V \sqrt{V_t}\,dW_V + J_V\,dq; \qquad J_V \sim \exp(\theta), $$

where S_t is the spot price at time t, δ_t is the instantaneous convenience yield at time t, µ_J the long-term total return (price appreciation plus convenience yield), κ_r, κ_V and κ_δ the mean reversion coefficients, θ_r and θ_V the long-term interest rate and volatility, and σ_S, σ_r the volatilities of the spot price and interest rate dynamics.

We modify this model structure for this paper and derive the resulting futures price as a function of the latent commodity factors. To achieve this we remain in the exponentially affine family of models.
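To illustrate the variance dynamics with jumps in the model of [2] above, the following minimal Euler-type simulation covers just the dV_t equation (the spot, interest rate and convenience yield equations are omitted); parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def simulate_variance_with_jumps(v0=0.04, theta_v=0.06, kappa_v=2.0, sigma_v=0.3,
                                 lam=0.5, jump_mean=0.02, dt=1/252, n=252, seed=0):
    """Euler step for dV = (theta_v - kappa_v*V)dt + sigma_v*sqrt(V)dW + J_V dq,
    with Pr(dq = 1) = lam*dt and exponentially distributed jump sizes J_V.
    The path is floored at zero since the Euler step can go negative."""
    rng = np.random.default_rng(seed)
    v = np.empty(n + 1)
    v[0] = v0
    for t in range(n):
        # Bernoulli approximation of the Poisson jump indicator over one step
        jump = rng.exponential(jump_mean) if rng.uniform() < lam * dt else 0.0
        v[t + 1] = max(v[t] + (theta_v - kappa_v * v[t]) * dt
                       + sigma_v * np.sqrt(v[t] * dt) * rng.standard_normal()
                       + jump, 0.0)
    return v
```

Note the crude zero-flooring: this is exactly the kind of discretization weakness that motivates the Milstein scheme developed later in the paper for the variance factor.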
The derivation of the futures price for our model allows us to use it in the observation equations of the state space models considered in this paper; it is derived as an exponentially affine function of the state variables. The observation equations linking these unobserved factors to the futures price are obtained under the risk-neutral measure.

In this paper we extend the long-short stochastic model to allow both the long and short stochastic factors to be mean reverting processes, and we incorporate an additive structural model to account for seasonality, together with a third factor for stochastic volatility to allow for modelling of implied volatility smiles.

3.1 Latent stochastic factors and seasonality
Here we present the model considered in the remainder of this paper.
Real-world process:

$$ X(t) = \ln S_t = \chi_t + \xi_t + \theta_t + f(t), $$
$$ d\chi_t = -\beta\chi_t\,dt + \sigma_\chi\,dZ_\chi, \qquad d\xi_t = (\mu_\xi - \kappa_\xi \xi_t)\,dt + \sigma_\xi\,dZ_\xi, $$
$$ d\theta_t = \sqrt{V_t}\,dZ_\theta, \qquad dV_t = (\mu_V - \kappa_V V_t)\,dt + \sigma_V\sqrt{V_t}\,dZ_V. $$

Risk-neutral process:

$$ X^*(t) = \chi^*_t + \xi^*_t + \theta^*_t + f(t), $$
$$ d\chi^*_t = (-\beta^*\chi^*_t - \lambda_1)\,dt + \sigma_\chi\,dZ^*_\chi, \qquad d\xi^*_t = (\mu^*_\xi - \kappa^*_\xi \xi^*_t)\,dt + \sigma_\xi\,dZ^*_\xi, $$
$$ d\theta^*_t = -\lambda_5 V^*_t\,dt + \sqrt{V^*_t}\,dZ^*_\theta, \qquad dV^*_t = (\mu^*_V - \kappa^*_V V^*_t)\,dt + \sigma_V\sqrt{V^*_t}\,dZ^*_V, $$

with correlation structure given by

$$ E[dZ_\xi dZ_\theta] = E[dZ_\chi dZ_\theta] = E[dZ_\chi dZ_V] = E[dZ_\xi dZ_V] = 0; \quad E[dZ_\xi dZ_\chi] = \rho_{\xi\chi}\,dt; \quad E[dZ_V dZ_\theta] = \rho_{V\theta}\,dt. $$

In addition we define f(t) as the deterministic unknown seasonal component, and the risk premia in the risk-neutral model are given by

$$ \lambda_\chi(\chi_t,\xi_t,\theta_t,V_t) = \lambda^*_1 + \lambda^*_2\chi_t; \qquad \lambda_\xi(\chi_t,\xi_t,\theta_t,V_t) = \lambda^*_3 + \lambda^*_4\xi_t; $$
$$ \lambda_\theta(\chi_t,\xi_t,\theta_t,V_t) = \lambda^*_5 + \lambda^*_6\sqrt{V_t}; \qquad \lambda_V(\chi_t,\xi_t,\theta_t,V_t) = \lambda^*_7 + \lambda^*_8 V_t, $$

with

$$ \lambda_1 = \lambda^*_1\sigma_\chi; \quad \lambda_2 = \lambda^*_2\sigma_\chi; \quad \lambda_3 = \lambda^*_3\sigma_\xi; \quad \lambda_4 = \lambda^*_4\sigma_\xi; \quad \lambda_5 = \lambda^*_5; \quad \lambda_6 = \lambda^*_6 = 0; $$
$$ \lambda_7 = \lambda^*_7\sigma_V; \quad \lambda_8 = \lambda^*_8\sigma_V; \quad \beta^* = \beta + \lambda_2; \quad \mu^*_\xi = \mu_\xi - \lambda_3; \quad \kappa^*_\xi = \kappa_\xi + \lambda_4; \quad \mu^*_V = \mu_V - \lambda_7; \quad \kappa^*_V = \kappa_V + \lambda_8. $$

Note the real processes are obtained simply by setting the risk premia (λ) parameters to zero. We can also ensure this model remains arbitrage free, according to the discussion in [13], by considering constraints on the parameter µ_V such that 2µ_V ≥ σ_V². The seasonal component is specified as

$$ f(t) = \sum_{k=2}^{12} \omega_k \Omega_{kt}, \tag{3.1} $$

where Ω_{kt} = 1 if t is a date in the k-th calendar month and Ω_{kt} = 0 otherwise. Note that, without loss of generality, one can set f(0) = 0, as this constant is absorbed by ξ_t, and all variation is with respect to January.

In this section we obtain analytic expressions for the futures price under the proposed multi-factor long-short dynamic model. The futures price we consider here will be developed as a function of the underlying stochastic factors.
We denote the futures contract price at time t, for a contract maturing at time T, by F_{t,T}. Proposition 1:
The futures contract price F_{t,T} at time t, for a contract maturing at time T, has a closed form solution with respect to the latent state variables comprising the log spot price, χ_t, ξ_t, θ_t, when priced under the risk-neutral measure, given by

$$ F_{t,T} = \exp\left( f(T) + B_0(\tau) + B_1(\tau)\xi_t + B_2(\tau)\chi_t + B_3(\tau)\theta_t \right), $$

with coefficients given by

$$ B_0(\tau) = \frac{\sigma_\xi^2}{4\kappa^*_\xi}\left(1 - e^{-2\kappa^*_\xi\tau}\right) + \frac{\sigma_\chi^2}{4\beta^*}\left(1 - e^{-2\beta^*\tau}\right) + \frac{\mu^*_\xi}{\kappa^*_\xi}\left(1 - e^{-\kappa^*_\xi\tau}\right) - \frac{\lambda_1}{\beta^*}\left(1 - e^{-\beta^*\tau}\right) + \frac{\rho_{\chi\xi}\sigma_\chi\sigma_\xi}{\beta^* + \kappa^*_\xi}\left(1 - e^{-(\beta^* + \kappa^*_\xi)\tau}\right), $$
$$ B_1(\tau) = e^{-\kappa^*_\xi\tau}; \qquad B_2(\tau) = e^{-\beta^*\tau}; \qquad B_3(\tau) = 1. $$

The proof of this result is provided in detail in Appendix 1.
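Proposition 1 translates directly into code. The sketch below assumes the seasonal term f(T) is the monthly-dummy model of Eq. (3.1) and takes B_0(τ) as a precomputed input, since its closed form depends on all the risk-neutral parameters; the function and argument names are ours.

```python
import numpy as np

def seasonal_f(month, omega):
    """f(T) = sum_{k=2}^{12} omega_k 1{month(T)=k}; January is the baseline, f = 0."""
    # omega: length-11 array of offsets for months Feb..Dec (hypothetical values)
    return 0.0 if month == 1 else omega[month - 2]

def futures_price(xi, chi, theta, tau, month_T, omega, b0, kappa_xi_star, beta_star):
    """Exponentially affine futures price of Proposition 1:
    F = exp(f(T) + B0(tau) + B1(tau)*xi + B2(tau)*chi + B3(tau)*theta),
    with B1 = exp(-kappa_xi* tau), B2 = exp(-beta* tau) and B3 = 1.
    b0 stands in for the full B0(tau) expression, precomputed elsewhere."""
    b1 = np.exp(-kappa_xi_star * tau)
    b2 = np.exp(-beta_star * tau)
    return np.exp(seasonal_f(month_T, omega) + b0 + b1 * xi + b2 * chi + theta)
```

At τ = 0 the coefficients B_1 and B_2 collapse to 1, so (with f and B_0 equal to zero) the price reduces to exp(ξ_t + χ_t + θ_t), the spot price, as expected.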
Remark 1:
Note that the coefficients B_i(τ) are computed under the risk-neutral framework, while the latent factors entering the futures price are the real-world ones. Having derived the futures price for the stochastic model we propose, we now specify a state space model formulation and a Bayesian model. We will then develop a novel sampling methodology to estimate jointly the model parameters and the latent factors.
In specifying the state space models, we shall consider a panel of N different futures contracts with maturities T_1, …, T_N and futures prices at time t denoted by F_{t,T_1}, …, F_{t,T_N}. In addition, we note that the filtering framework is set up such that the latent process is the real process with real-world parameters, while the observation process uses the risk-neutral parameters.

In this section we convert our s.d.e. models for the latent factors into a discretized version to obtain a state space model formulation. To achieve this we consider strong Taylor schemes, where an approximation process Y converges in a strong sense with order γ ∈ (0, ∞] to a continuous process X if there exist a finite constant K and a positive constant δ_0 such that, for any discretization step δ ∈ (0, δ_0),

$$ E\left(|X_T - Y_N|\right) \le K\delta^{\gamma}. $$

We employ a strong stochastic Milstein scheme. For the long-short factors, the strong stochastic Taylor scheme of order 1, otherwise known as a Milstein scheme, corresponds to the Euler scheme.

For the volatility-related stochastic processes the Milstein scheme is employed, for two reasons: firstly, the two processes are correlated, so the same discretization scheme should be employed for each of them; secondly, the process for dV_t certainly does not have a Gaussian transition density, so an Euler-type scheme with additive Gaussian increments is too simplistic an approximation, even at daily intervals. In the Milstein scheme we truncate the mixed integrals at a p-th order, as shown in Appendix 2. Proposition 2:
The bivariate s.d.e.'s for χ_t and ξ_t, with a 2-dimensional Wiener process, discretized under a strong Euler scheme, lead to the following state equations:

$$ \chi_t = (1 - \beta\Delta t)\chi_{t-1} + \sigma_\chi\sqrt{\Delta t}\, n_{\chi,t}, \qquad \xi_t = \mu_\xi\Delta t + (1 - \kappa_\xi\Delta t)\xi_{t-1} + \sigma_\xi\sqrt{\Delta t}\, n_{\xi,t}, $$

where n_{χ,t}, n_{ξ,t} are standard normal random variables for the innovation noise. The bivariate s.d.e.'s for θ_t and V_t, with a 2-dimensional Wiener process, discretized under a strong Milstein scheme, lead to the following state equations:

$$ \theta_t \approx \theta_{t-1} + \sqrt{V_{t-1}\Delta t}\, n_{\theta,t} + \frac{1}{4}\sigma_V\rho_{V\theta}\left(\Delta t\, n_{\theta,t}^2 - \Delta t\right) + \frac{1}{2}\sigma_V\sqrt{1 - \rho_{V\theta}^2}\, J^p_{(2,1),\Delta t}, $$

$$ V_t \approx V_{t-1} + (\mu_V - \kappa_V V_{t-1})\Delta t + \sigma_V\rho_{V\theta}\sqrt{V_{t-1}\Delta t}\, n_{\theta,t} + \sigma_V\sqrt{V_{t-1}\left(1 - \rho_{V\theta}^2\right)\Delta t}\, n_{V,t} + \frac{1}{4}\sigma_V^2\rho_{V\theta}^2\left(\Delta t\, n_{\theta,t}^2 - \Delta t\right) + \frac{1}{2}\sigma_V^2\rho_{V\theta}\sqrt{1 - \rho_{V\theta}^2}\, J^p_{(1,2),\Delta t} + \frac{1}{2}\sigma_V^2\rho_{V\theta}\sqrt{1 - \rho_{V\theta}^2}\, J^p_{(2,1),\Delta t} + \frac{1}{4}\sigma_V^2\left(1 - \rho_{V\theta}^2\right)\left(\Delta t\, n_{V,t}^2 - \Delta t\right), $$

where n_{θ,t} and n_{V,t} are i.i.d. standard normal random variables and we utilise a p-th order truncation, with the terms J^p_{(j_1,j_2),Δt} defined by

$$ J^p_{(j_1,j_2),\Delta t} = \Delta t\left( \frac{1}{2}\zeta_{j_1}\zeta_{j_2} + \sqrt{\rho_p}\left(\mu_{j_1,p}\zeta_{j_2} - \mu_{j_2,p}\zeta_{j_1}\right) \right) + \frac{\Delta t}{2\pi}\sum_{r=1}^{p}\frac{1}{r}\left( \psi_{j_1,r}\left(\sqrt{2}\,\zeta_{j_2} + \nu_{j_2,r}\right) - \psi_{j_2,r}\left(\sqrt{2}\,\zeta_{j_1} + \nu_{j_1,r}\right) \right), $$

where ζ_j, µ_{j,p}, ν_{j,r} and ψ_{j,r} are all independent N(0,1) Gaussian random variables, and

$$ \rho_p = \frac{1}{12} - \frac{1}{2\pi^2}\sum_{r=1}^{p}\frac{1}{r^2}, \qquad \zeta_j = \frac{1}{\sqrt{\Delta t}}\Delta W_j. $$

In the specification of our models we take Δt = 1 day. The proof of these results is presented in Appendix 2.
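A simplified sketch of one propagation step of these state equations follows: Euler steps for the long and short factors, and the leading Milstein terms for the variance factor. For brevity the truncated mixed-integral terms J_p and the correlation ρ_Vθ are omitted here, so this is an illustration of the structure rather than the full scheme; all parameter values and names are ours.

```python
import numpy as np

def propagate_states(chi, xi, v, p, dt=1/252, rng=None):
    """One step of the discretized state equations: Euler for chi and xi,
    and a simplified Milstein step for V (J_p and cross-correlation terms
    dropped in this sketch). p is a dict of hypothetical parameter values."""
    if rng is None:
        rng = np.random.default_rng()
    n_chi, n_xi, n_v = rng.standard_normal(3)
    chi_new = (1 - p['beta'] * dt) * chi + p['sigma_chi'] * np.sqrt(dt) * n_chi
    xi_new = p['mu_xi'] * dt + (1 - p['kappa_xi'] * dt) * xi \
        + p['sigma_xi'] * np.sqrt(dt) * n_xi
    v_new = v + (p['mu_v'] - p['kappa_v'] * v) * dt \
        + p['sigma_v'] * np.sqrt(max(v, 0.0) * dt) * n_v \
        + 0.25 * p['sigma_v']**2 * (dt * n_v**2 - dt)  # Milstein correction term
    return chi_new, xi_new, max(v_new, 0.0)
```

Note that, as Remark 2 below emphasises, simulating forward like this is all the filtering methodology requires; the transition density itself is never evaluated.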
Remark 2:
Under this discretization scheme it will not be possible in general to find the transition density for the two factors θ_t and V_t given θ_{t−1} and V_{t−1}. We note that the version of the particle filter we develop only requires that we can generate from these state equations, which is trivial in the sense that it only requires generating Gaussian random variables and applying a transform. This is the key reason for our later use of the SIR particle filter approach in our estimation methodology.

The observation equations for the futures contracts are linear and are given in Equation (5.1):

$$ \ln F_{t,T_1} = f(T_1) + B_0(\tau_1) + B_1(\tau_1)\xi_t + B_2(\tau_1)\chi_t + B_3(\tau_1)\theta_t + \epsilon^{(1)}_t $$
$$ \vdots \tag{5.1} $$
$$ \ln F_{t,T_N} = f(T_N) + B_0(\tau_N) + B_1(\tau_N)\xi_t + B_2(\tau_N)\chi_t + B_3(\tau_N)\theta_t + \epsilon^{(N)}_t, $$

where ε^(1)_t, …, ε^(N)_t are zero mean normal random variables for the observation noise on the futures contract prices, with covariance structure at each time t denoted by Σ_ε, which is (N × N) dimensional. Clearly, in this model the state equations presented in Proposition 2 are non-linear and non-Gaussian, hence the Kalman filter is no longer optimal. It is also clear that a simple linearization of the state equations will not suffice, which precludes the use of the approximate Extended Kalman Filter formulation. Instead we develop a particle filter and Markov chain Monte Carlo solution based around a novel version of the Particle MCMC algorithm.

In this section we develop a Bayesian model for the parameters and latent factors that must be estimated over time. In introducing a Bayesian formulation for this problem, we model the real process parameters as random variables, and the risk premia used to convert to the risk-neutral pricing framework as unknown random variables. The Bayesian formulation proceeds by first defining the posterior distribution of interest:

$$ \pi\left(\Phi, \chi_{0:T}, \xi_{0:T}, \theta_{0:T}, V_{0:T} \mid F_{1:T,T_1}, \ldots, F_{1:T,T_N}\right) \tag{5.2} $$
$$ \propto \mu_\Phi(\chi_0, \xi_0, \theta_0, V_0) \prod_{t=1}^{T} f^{(\chi)}_\Phi(\chi_t \mid \chi_{t-1})\, f^{(\xi)}_\Phi(\xi_t \mid \xi_{t-1})\, f^{(\theta)}_\Phi(\theta_t \mid \theta_{t-1}, V_{t-1})\, f^{(V)}_\Phi(V_t \mid V_{t-1}) \times \prod_{t=1}^{T}\prod_{n=1}^{N} g_\Phi\left(F_{t,T_n} \mid \chi_t, \xi_t, \theta_t\right)\, p(\Phi), \tag{5.3} $$

where we define the quantities as follows:

• Φ = (Φ_R, Φ_RN, ω), where Φ_R and Φ_RN are the real and risk-neutral parameter vectors respectively, e.g. Φ_R = (β, µ_ξ, κ_ξ, µ_V, κ_V, σ_χ, σ_ξ, σ_V, Σ_ε) is the random vector of unknown real-world model parameters.

• A priori there is no reason to specify the priors for the real-world and risk-neutral parameters differently, and we therefore give them identical prior specifications. The real-world prior p(Φ_R) for the random parameter vector Φ_R is given by:

$$ \beta_\chi \sim U\left(A_{\beta_\chi}, B_{\beta_\chi}\right); \quad \sigma_\chi \sim U\left(A_{\sigma_\chi}, B_{\sigma_\chi}\right); \quad \mu_\xi \sim U\left(A_{\mu_\xi}, B_{\mu_\xi}\right); \quad \kappa_\xi \sim U\left(A_{\kappa_\xi}, B_{\kappa_\xi}\right); \quad \sigma_\xi \sim U\left(A_{\sigma_\xi}, B_{\sigma_\xi}\right); $$
$$ \mu_V \sim U\left(A_{\mu_V}, B_{\mu_V}\right); \quad \kappa_V \sim U\left(A_{\kappa_V}, B_{\kappa_V}\right); \quad \sigma_V \sim U\left(A_{\sigma_V}, B_{\sigma_V}\right); \quad \sigma_{F(T_i)} \sim IG\left(A_{\sigma_{F(T_i)}}, B_{\sigma_{F(T_i)}}\right). $$

With respect to the choices of hyper-prior parameters, we consider an objective uninformative prior specification; see Section 7. Furthermore, we assume that Σ_ε = diag(σ²_{ε,1}, …, σ²_{ε,N}), though it is trivial to consider alternative specifications.

• µ_Φ(χ_0, ξ_0, θ_0, V_0) is a multivariate Gaussian with known mean ν and covariance Σ_ν, which specifies the distribution of the initial state of the factors given the model parameters.

• f^(χ)_Φ(χ_t | χ_{t−1}) is given by the state equation to be Gaussian with mean (1 − βΔt)χ_{t−1} and standard deviation σ_χ√Δt.

• f^(ξ)_Φ(ξ_t | ξ_{t−1}) is given by the state equation to be Gaussian with mean µ_ξΔt + (1 − κ_ξΔt)ξ_{t−1} and standard deviation σ_ξ√Δt.
• f^(θ)_Φ(θ_t | θ_{t−1}, V_{t−1}) and f^(V)_Φ(V_t | V_{t−1}) are given by the state equations to be the distributions formed from the sums of Gaussian terms and higher-order Milstein correction terms in Proposition 2. The convolution integrals involved in finding these distributions do not admit a parametric form. We note that this does not affect our estimation approach, since we only require that we can sample these processes efficiently.

• g_Φ(F_{t,T_n} | χ_t, ξ_t, θ_t) is given by the observation equation to be Gaussian with mean f(T_n) + B_0(τ_n) + B_1(τ_n)ξ_t + B_2(τ_n)χ_t + B_3(τ_n)θ_t and variance σ²_{ε,n}.

In this section we detail the simulation methodology we develop for obtaining samples from the posterior distribution given in Equation (5.3). This simulation methodology is based on a recent sampler specifically developed for use in state space models in which one wishes to estimate jointly the model parameters and the latent states, see [14]. The methodology is known as Particle Markov chain Monte Carlo (PMCMC) and represents a state-of-the-art sampling framework for such problems. The approach embeds a particle filter estimate of an optimal proposal distribution into an MCMC algorithm, allowing one to obtain samples from the target posterior distribution for the parameters and latent states. This embedding is done in a non-trivial fashion; see [14] for details.

The Particle MCMC proposal distribution is split into two components. The first involves a proposal kernel constructed via an adaptive Metropolis scheme, which is used to sample the static parameters Φ. The second component involves the sampling of a trajectory for the latent factors, χ_{1:t}, ξ_{1:t}, θ_{1:t}, V_{1:t}.
This proposal kernel is constructed via a Rao-Blackwellised Sequential Monte Carlo algorithm, in which the Rao-Blackwellising variance reduction is achieved via optimal Kalman filtering for the path space associated with χ_{1:t}, ξ_{1:t}.

The introduction of an adaptive MCMC proposal kernel into the Particle MCMC setting allows the Markov chain proposal distribution to adaptively learn the regions in which the marginal posterior distribution for the static model parameters has most mass. As such, the probability of acceptance under such an online adaptive proposal will be significantly improved over time. For the latent factor path spaces we develop a minimum variance approach involving Rao-Blackwellisation via a Kalman filter, followed by a Sampling Importance Resampling (SIR) version of the particle filter. Note: it is important to work with an SIR filter for the path space parameters associated with θ_t, V_t. The reason is that, under the Milstein discretization scheme we develop, we can efficiently sample exactly from the prior but cannot evaluate the prior density point-wise. Hence, under this variance reduction sampling framework, we optimally sample the latent state path space trajectories for χ_{1:t} and ξ_{1:t} via a Kalman filter formulation, and we use an SIR version of the particle filter for the state path trajectories for θ_{1:t}, V_{1:t}.

Particle MCMC is a new methodology introduced recently in [14]. The methodology utilizes particle filtering to approximate the optimal proposal distribution in an MCMC algorithm in which one is interested in high-dimensional block updates. This naturally lends itself well to the setting of state space modelling. In the setting considered in this paper the space of static parameters in the model is high dimensional and, in addition, the path space latent factors give an additional 4t parameters.
Here t can range from hundreds to thousands of days, depending on the commodities studied and the sampling frequency of the observations. Hence, this problem is ideally suited to the PMCMC methodology.

The simplest solution to this problem would be to sample directly from the univariate full conditional distributions. However, even in the case in which a single-component Gibbs sampler is achievable via inversion sampling for all parameters, this sampling scheme will typically result in very slow mixing of the Markov chain around the support of the posterior. This is especially problematic in high dimensional target posterior distributions, since it requires excessively long Markov chain runs to achieve samples from the stationary regime. Typically, it can also lead to very high autocorrelations in the Markov chain states. It is well known that to avoid this slow mixing one must sample larger blocks of parameters. However, the design of an optimal proposal distribution for large blocks of parameters is very complicated. The Particle MCMC methodology allows one to approximate the optimal proposal distribution for a large number of parameters via a Sequential Monte Carlo approximated proposal distribution.

In the state space setting, the Particle MCMC algorithm used to sample from a generic target distribution π(θ, x_{1:t} | y_{1:t}), with static parameters θ = (Φ_R, Φ_RN) and latent state vector x_t = (χ_t, ξ_t, V_t, θ_t), proceeds by mimicking the marginal Metropolis-Hastings algorithm, in which the acceptance probability is given by

$$ \alpha\left(\theta, \theta'\right) = \min\left(1, \frac{\pi(\theta' \mid y_{1:t})\, q(\theta', \theta)}{\pi(\theta \mid y_{1:t})\, q(\theta, \theta')}\right). $$

Clearly, achieving this requires a very particular structure for the proposal kernel in the MCMC algorithm. In particular the proposal kernel must take the form

$$ q\left((\theta, x_{1:t}), (\theta', x'_{1:t})\right) = q(\theta, \theta')\, \pi\left(x'_{1:t} \mid y_{1:t}, \theta'\right). $$
Under this proposal, the acceptance probability for a standard Metropolis-Hastings algorithm takes the form

$$ \alpha\left((\theta, x_{1:t}), (\theta', x'_{1:t})\right) = \min\left(1, \frac{\pi(\theta', x'_{1:t} \mid y_{1:t})\, q\left((\theta', x'_{1:t}), (\theta, x_{1:t})\right)}{\pi(\theta, x_{1:t} \mid y_{1:t})\, q\left((\theta, x_{1:t}), (\theta', x'_{1:t})\right)}\right) = \min\left(1, \frac{\pi(\theta' \mid y_{1:t})\, q(\theta', \theta)}{\pi(\theta \mid y_{1:t})\, q(\theta, \theta')}\right). $$

The critical idea recognized in [14] in formulating the Particle MCMC algorithm is that the proposal distribution π(x'_{1:t} | y_{1:t}, θ') can be sampled from via a Sequential Monte Carlo algorithm. One iteration of the generic Particle MCMC algorithm proceeds as follows:

1. Sample θ' ∼ q(θ, ·).

2. Run an SMC algorithm with N particles to obtain

$$ \widehat{\pi}\left(x_{1:t} \mid y_{1:t}, \theta'\right) = \sum_{i=1}^{N} W^{(i)}_{1:t}\, \delta_{x^{(i)}_{1:t}}(x_{1:t}), \qquad \widehat{\pi}\left(y_{1:t} \mid \theta'\right) = \prod_{j=1}^{t} \frac{1}{N}\sum_{i=1}^{N} w_j\left(x^{(i)}_j\right). $$

Then sample a candidate path X'_{1:t} ∼ π̂(x_{1:t} | y_{1:t}, θ').

3. Utilise the estimate of π̂(y_{1:t} | θ) from the previous iteration of the Markov chain.

4. Accept the proposed new Markov chain state, comprised of (θ', X'_{1:t}), with acceptance probability given by

$$ \alpha\left((\theta, X_{1:t}), (\theta', X'_{1:t})\right) = \min\left(1, \frac{\widehat{\pi}(y_{1:t} \mid \theta')\, \pi(\theta')\, q(\theta', \theta)}{\widehat{\pi}(y_{1:t} \mid \theta)\, \pi(\theta)\, q(\theta, \theta')}\right). \tag{6.1} $$

The key advantage of this approach is that the difficult problem of designing high dimensional proposals has been replaced with the simpler problem of designing low dimensional mutation kernels in the Sequential Monte Carlo algorithm embedded in the MCMC algorithm. In [14], this sampling approach, in which Sequential Monte Carlo is used to approximate the marginal likelihood in the acceptance probability, has been shown to have several theoretical convergence properties; see Section 4 of [14] for details. There are several classes of adaptive MCMC algorithms, see [15].
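In log space, one iteration of the generic Particle MCMC scheme above can be sketched as follows. The particle filter, proposal and prior are user-supplied stand-in callables, and a symmetric random-walk proposal is assumed so the q-ratio in Eq. (6.1) cancels; none of these names come from the paper.

```python
import numpy as np

def pmmh_step(theta, loglik_curr, y, propose, particle_filter, log_prior, rng):
    """One particle marginal Metropolis-Hastings iteration: propose theta',
    estimate the marginal likelihood with an SMC run, and accept or reject
    using the estimated likelihoods as in Eq. (6.1)."""
    theta_prop = propose(theta, rng)                              # step 1
    loglik_prop, path_prop = particle_filter(y, theta_prop, rng)  # step 2: SMC run
    # steps 3-4: reuse the stored likelihood estimate for the current state
    log_alpha = (loglik_prop + log_prior(theta_prop)) \
        - (loglik_curr + log_prior(theta))
    if np.log(rng.uniform()) < log_alpha:
        return theta_prop, loglik_prop, path_prop, True
    return theta, loglik_curr, None, False
```

Storing and reusing the likelihood estimate of the current state (rather than re-running the filter) is what makes the acceptance ratio an exact replacement for the marginal Metropolis-Hastings ratio.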
The distinguishing feature of adaptive MCMC algorithms, compared to standard MCMC, is the generation of the Markov chain via a sequence of transition kernels. Adaptive algorithms utilise a combination of time- or state-inhomogeneous proposal kernels. Each proposal in the sequence is allowed to depend on the past history of the Markov chain generated, resulting in many possible variants.

Due to the inhomogeneity of the Markov kernel used in adaptive algorithms, it is particularly important to ensure the generated Markov chain is ergodic, with the appropriate stationary distribution. Several recent papers proposing theoretical conditions that must be satisfied to ensure ergodicity of adaptive algorithms include [16], [15], [17], [18] and [19].

In [20] an adaptive Metropolis algorithm with proposal covariance adapted to the history of the Markov chain was developed. The original proof of ergodicity of the Markov chain under such an adaption was overly restrictive: it required a bounded state space and a uniformly ergodic Markov chain. In [15] a proof of ergodicity of adaptive MCMC under simpler conditions, known as
Diminishing Adaptation and
Bounded Convergence, is presented. In general it is non-trivial to develop adaption schemes which can be verified to satisfy these two conditions. In this paper we use the adaptive MCMC algorithm to learn the proposal distribution for the static parameters Φ in our posterior. In particular we work with an Adaptive Metropolis algorithm utilizing a mixture proposal kernel known to satisfy these two ergodicity conditions for unbounded state spaces and general classes of target posterior distribution; see [15] for details.

In this section we present the specific details of the Adaptive Metropolis within Rao-Blackwellised Particle MCMC algorithm used to sample from the posterior on the path space of our latent factors and the state space model parameters. This involves specifying the details of the proposal distribution in the PMCMC algorithm,

$$ q\left((\Phi, \chi_{1:t}, \xi_{1:t}, \theta_{1:t}, V_{1:t}), (\Phi', \chi'_{1:t}, \xi'_{1:t}, \theta'_{1:t}, V'_{1:t})\right) = q(\Phi, \Phi')\, \pi\left(\chi'_{1:t}, \xi'_{1:t}, \theta'_{1:t}, V'_{1:t} \mid F_{1:t,T_{1:N}}, C_{1:t,T,M,K}, \Phi'\right). $$

The proposal q(Φ^{(j−1)}, Φ') for the static model parameters Φ is given by an adaptive Metropolis proposal comprised of a mixture of Gaussians, one component of which has a covariance structure that is adaptively learnt online as the algorithm progressively explores the posterior distribution. The mixture proposal distribution for the parameters Φ is given at iteration j of the Markov chain by

$$ q_j\left(\Phi^{(j-1)}, \cdot\right) = w_1\, N\!\left(\Phi;\ \Phi^{(j-1)},\ \frac{2.38^2}{d}\Sigma_j\right) + (1 - w_1)\, N\!\left(\Phi;\ \Phi^{(j-1)},\ \frac{0.1^2}{d} I_{d,d}\right). \tag{6.2} $$

Here, Σ_j is the current empirical estimate of the covariance between the parameters of Φ, estimated using samples from the Particle Markov chain up to iteration j. The theoretical motivation for the choices of scale factors 2.38 and 0.1 and dimension d is provided in [15], based on optimality conditions presented in [21] and [22].
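The adaptive mixture proposal of Eq. (6.2), together with a standard recursive update of the running mean and covariance used to learn Σ_j, can be sketched as follows. The mixture weight w1 = 0.95 is a common default from the adaptive MCMC literature and is our assumption, not a value stated in the text.

```python
import numpy as np

def adaptive_mixture_proposal(phi, cov_est, rng, w1=0.95):
    """Draw from the two-component mixture of Eq. (6.2): an adapted Gaussian
    scaled by 2.38^2/d, plus a fixed (0.1^2/d) * I safeguard component."""
    d = phi.size
    if rng.uniform() < w1:
        return rng.multivariate_normal(phi, (2.38**2 / d) * cov_est)
    return rng.multivariate_normal(phi, (0.1**2 / d) * np.eye(d))

def update_mean_cov(mu, cov, phi_new, j):
    """Recursively update the running mean and covariance of the chain
    with the (j+1)-th sample, as in the online recursion for Sigma_j."""
    diff = phi_new - mu
    mu_new = mu + diff / (j + 1)
    cov_new = cov + (np.outer(diff, diff) - cov) / (j + 1)
    return mu_new, cov_new
```

The recursion avoids storing the whole chain history: each iteration only touches the current moment estimates, so the adaptation cost is O(d^2) per step.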
We note that the update of the covariance matrix can be done recursively online via the following recursion:

μ_{j+1} = μ_j + (1/(j+1)) (Φ^{(j)} − μ_j),
Σ_{j+1} = Σ_j + (1/(j+1)) ( (Φ^{(j)} − μ_j)(Φ^{(j)} − μ_j)ᵀ − Σ_j ).   (6.3)

Having specified the adaptive PMCMC proposal distribution for the static model parameters, we now focus on the proposal mechanism for the latent path space trajectories for the factors in the commodity model in the next subsection.

6.3 Rao-Blackwellised Sequential Importance Sampling proposal

The proposal kernel for the latent factors that gives us a Marginal Metropolis-Hastings framework is π(χ′_{1:t}, ξ′_{1:t}, θ′_{1:t}, V′_{1:t} | F_{1:t,T}, Φ′). Sampling from this proposal cannot be done exactly, as it corresponds to sampling from the full conditional posterior distribution over the latent factor path genealogies. Instead we utilise a specifically designed Sequential Monte Carlo algorithm to construct our proposal distribution. The SMC algorithm we develop here has a particular form which allows us to achieve two goals. First, we can perform Rao-Blackwellisation via a Kalman Filter for the latent factors χ_t, ξ_t; this optimal marginalised particle filtering approach is presented in [23], and similar ideas are discussed in [24]. Second, we develop a Sequential Importance Sampling Resampling (SIR) filter for the latent factors θ_t, V_t. In doing this we can sample exactly from the prior for our latent factors (i.e. the Milstein discretization) and perform the importance weighting dependent only on the likelihood model. The algorithm for the Adaptive Metropolis within Rao-Blackwellised Particle MCMC is presented in Appendix 12, where the details of step 5 are also provided. The details of step 6
pertain to the Rao-Blackwellised SIR particle filter for constructing the proposal for the path spaces of the latent factors in the model. This requires some additional discussion and is presented below in Section 6.3. Remark 3:
Development of a Rao-Blackwellised SIR particle filter proposal for the latent path trajectories in the PMCMC algorithm, given the proposed static parameters Φ, has the advantage that it significantly reduces the variance of the importance sampling weights. This is achieved by exploiting the conditionally linear Gaussian state-space structure and a Kalman Filter, and will in turn increase the acceptance rate of our proposal mechanism. Remark 4:
Utilising the SIR filter to construct a proposal for the latent factors θ_t, V_t has an important advantage in the Milstein discretized state space model formulation we developed, in that it can sample exactly from the prior for our latent factors and perform the importance weighting dependent only on the likelihood model. This is clearly beneficial in our setting since it allows us to avoid having to perform point-wise evaluation of the transition distributions f_Φ(θ_t | θ_{t−1}) and f_Φ(V_t | V_{t−1}), which can only be expressed as convolution integrals.

The specific algorithm used to complete Step 7 of the Adaptive Metropolis within Rao-Blackwellised Particle MCMC algorithm is provided in Section 12. Here we present the details for the sampling of a candidate proposal, conditional on the proposed state Φ′, for the latent factor path spaces χ_{1:t}, ξ_{1:t}, θ_{1:t}, V_{1:t}, as well as the estimation of the marginal likelihood p̂(F_{1:t,T} | Φ′) required for the calculation of the acceptance probability. This involves the development of a Rao-Blackwellised Kalman Filter scheme in conjunction with an SIR particle filter stage.

The Rao-Blackwellisation via the Kalman Filter involves the following decomposition of the filtering distribution:

π(χ_{1:t}, ξ_{1:t}, θ_{1:t}, V_{1:t} | F_{1:t,T}, Φ) = π(χ_{1:t}, ξ_{1:t} | θ_{1:t}, V_{1:t}, F_{1:t,T}, Φ) × π(θ_{1:t}, V_{1:t} | F_{1:t,T}, Φ).

Next we note that the posterior distribution π(χ_{1:t}, ξ_{1:t} | θ_{1:t}, V_{1:t}, F_{1:t,T}, Φ) is exactly characterized by the Kalman Filtering estimate. We seek to exploit this fact in developing our sampling methodology.
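As an illustration of the propagate-and-reweight idea in Remark 4, a stripped-down SIR step might look as follows. This is a sketch only: the Euler-type transition and the Gaussian-shaped likelihood are stand-ins for the paper's Milstein transition and futures likelihood, and all names are ours.

```python
import numpy as np

def sir_step(theta, V, w, y, dt, mu_V, kappa_V, sigma_V, obs_lik, rng):
    """One SIR step for (theta_t, V_t) particle arrays: propagate by sampling
    the prior transition (here a toy Euler scheme), then reweight by the
    likelihood only -- no point-wise transition-density evaluation needed."""
    n = len(V)
    # sample from the prior transition (stand-in for the Milstein scheme)
    V_new = np.abs(V + (mu_V - kappa_V * V) * dt
                   + sigma_V * np.sqrt(V * dt) * rng.standard_normal(n))
    theta_new = theta + np.sqrt(V * dt) * rng.standard_normal(n)
    # importance weight depends on the likelihood alone
    w_new = w * obs_lik(y, theta_new, V_new)
    w_new /= w_new.sum()
    return theta_new, V_new, w_new
```

In the full algorithm this step would be followed by an adaptive resampling test (e.g. on the effective sample size) before moving to the next observation.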
First we utilize a standard SIR particle filter with adaptive resampling to obtain the particle estimate

π̂(θ_{1:t}, V_{1:t} | F_{1:t,T}, Φ) = Σ_{i=1}^N W_t^{(i)} δ_{(θ^{(i)}_{1:t}, V^{(i)}_{1:t})}(θ_{1:t}, V_{1:t}).

This particle estimate of the marginal then results in the joint posterior distribution estimate

π̂(χ_{1:t}, ξ_{1:t}, θ_{1:t}, V_{1:t} | F_{1:t,T}, Φ) = Σ_{i=1}^N W_t^{(i)} π(χ_{1:t}, ξ_{1:t} | θ^{(i)}_{1:t}, V^{(i)}_{1:t}, F_{1:t,T}, Φ).

Clearly we can obtain π(χ_{1:t}, ξ_{1:t} | θ^{(i)}_{1:t}, V^{(i)}_{1:t}, F_{1:t,T}, Φ), for each particle i, optimally via the Kalman Filter. In addition, it is easily shown that in this version of the SIR particle filter the unnormalised importance sampling weight for particle i is given by (see [23])

W̃_t^{(i)} ∝ W_{t−1}^{(i)} p(F_{t,T} | F_{1:t−1,T}, Φ, θ^{(i)}_{1:t}, V^{(i)}_{1:t}).

We now explain the details of the Rao-Blackwellisation within the context of the SIR particle filter and then present the algorithm to perform this methodology. Suppose we have obtained a particle estimate {θ^{(i)}_{1:t}, V^{(i)}_{1:t}, W^{(i)}_t}_{i=1:N} at time t.
We associate to each particle a Kalman Filter mean and covariance, μ^{(i)}_t = (μ^{(i)}_χ(t), μ^{(i)}_ξ(t)) and Σ^{(i)}_t = Σ^{(i)}_{χ,ξ}(t), which we obtain via a standard Kalman Filter recursion as follows:

μ^{(i)}_{t|t−1} = A_n μ^{(i)}_{t−1|t−1}
Σ^{(i)}_{t|t−1} = A_n Σ^{(i)}_{t−1|t−1} A_nᵀ + σ² b_n b_nᵀ
Γ^{(i)}_t = b_nᵀ Σ^{(i)}_{t|t−1} b_n
y^{(i)}_{t|t−1} = b_nᵀ μ^{(i)}_{t|t−1} + b_sᵀ(θ^{(i)}_t, V^{(i)}_t)
μ^{(i)}_{t|t} = μ^{(i)}_{t|t−1} + Σ^{(i)}_{t|t−1} b_n [Γ^{(i)}_t]^{−1} (y_t − y^{(i)}_{t|t−1})
Σ^{(i)}_{t|t} = Σ^{(i)}_{t|t−1} − Σ^{(i)}_{t|t−1} b_n [Γ^{(i)}_t]^{−1} b_nᵀ Σ^{(i)}_{t|t−1},

where Γ_t = Cov(F_{t,T} | F_{1:t−1,T}) and the predictive density is given by

p(F_{t,T} | F_{1:t−1,T}, Φ, θ_t, χ_t, ξ_t) = N(F_{t,T}; f(F_{t−1,T}, Φ, θ_t, χ_t, ξ_t), Γ^{(i)}_t).

Now that we have detailed each aspect of the model and of the proposed sampling algorithm that allow us to calibrate and estimate the Bayesian commodity model developed, we present the results and analysis. This will be done in two stages: first we present details of a comprehensive synthetic simulation study, followed by a real data analysis.
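The per-particle Kalman recursion above can be sketched directly; the following is a minimal implementation (names and the optional observation-noise term R are ours, not the authors' code):

```python
import numpy as np

def kalman_update(mu, P, y, A, Q, b, c, R=0.0):
    """One conditional Kalman recursion for a single particle.

    mu, P : previous filtered mean / covariance of the linear states
    A, Q  : linear transition matrix and state-noise covariance
    b     : observation loading vector (b_n above)
    c     : particle-dependent offset (the b_s(theta_t, V_t) term)
    R     : optional observation-noise variance (assumption of ours)
    """
    mu_pred = A @ mu                           # predict mean
    P_pred = A @ P @ A.T + Q                   # predict covariance
    gamma = float(b @ P_pred @ b) + R          # innovation variance Gamma_t
    y_pred = float(b @ mu_pred) + c            # one-step-ahead observation
    K = P_pred @ b / gamma                     # Kalman gain
    mu_filt = mu_pred + K * (y - y_pred)       # correct mean
    P_filt = P_pred - np.outer(K, b @ P_pred)  # correct covariance
    return mu_filt, P_filt, y_pred, gamma
```

The returned (y_pred, gamma) pair is exactly what the Rao-Blackwellised weight update needs, since the predictive likelihood of the new observation is Gaussian with that mean and variance.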
7 Results and Discussion
In this section we present studies of the performance of the Bayesian model formulation, the Milstein discretization framework and the Adaptive PMCMC sampling framework under Rao-Blackwellization. To achieve this we first consider a case study consisting of synthetic data analysis. After demonstrating the accuracy of the estimation and calibration methodology we developed, we perform a detailed analysis utilising actual data representing a panel of crude oil futures.
In this section we present a detailed analysis of the performance of the Adaptive PMCMC algorithm in the context of the Bayesian state space model non-linear filtering and parameter estimation framework that we have developed. This study aims to systematically assess the following model and AdPMCMC sampler aspects: the ability to estimate static model parameters (perform calibration) in high observation noise; the impact of parameter estimation under vague priors for static parameters corresponding to drift and volatility of each of the four discretized s.d.e. factor models; the impact of increasing the number of futures contracts with different maturities on parameter and latent factor estimation; and finally, the impact of the Milstein scheme truncation order p.

The simulation study for this section proceeded by first sampling a set of static model parameters from the priors, which we utilised to generate synthetic data; we then obtained posterior estimates of these parameters given the generated data. The 'true' static parameters utilised in this study were given by Φ_R = (β, μ_ξ, κ_ξ, μ_V, κ_V, σ_χ, σ_ξ, σ_V, Σ_F) = Φ_RN (the numeric values used appear as the open circles in Figure 1), with seasonality components w = 0. These parameter settings were then utilized to generate a set of latent process trajectories, which were in turn utilized to generate a set of futures curves observed in noise. This produced a set of synthetic latent paths, over T = 100 days, for χ_{1:T}, ξ_{1:T}, θ_{1:T}, V_{1:T}, which are presented in the upper panels of Figure 3; the lower panel of Figure 4 presents the resulting log spot price each day. We then utilised these simulated trajectories to generate a panel of futures curves according to the observation equations developed in Section 4.2. We utilised a panel of futures curves corresponding to 10 futures contracts, each separated by maturities of 30 days: the first contract rolling over in 29 days time, followed by the next contract maturing in 59 days, and so on up to the tenth contract, which at time t = 1 of the observations still had 299 days remaining until maturity. Using these latent processes, the noisy observations of the futures curve were generated over 100 days of observations using Σ_F = diag(σ²_F(T_1), . . . , σ²_F(T_N)).

The sampler was run for J = 200,000 Markov chain samples with N = 500 particles, and the number of Markov chain samples discarded as burn-in was 50,000. We set Δt = 1, and the p-th order truncation of the Milstein discretization used to calculate ρ_p involved setting p = 100.

In this section we consider uniform priors on the static state space model parameters, corresponding to hyper-parameter settings specified according to β ∼ U(0, ·), σ_χ ∼ U(0, ·), μ_ζ ∼ N(0, ·), κ_ζ ∼ U(0, ·), σ_ζ ∼ U(0, ·), with the observation covariance Σ_F, at time t, diagonal. We then consider the observation noise variance for the T_i-th futures contract on day t, denoted σ²_{ǫ_i}, utilised to generate the observations, as given by Σ_F = 4I, which corresponds to a coefficient of variation in the noise of around 20%. Note, in the Bayesian model we treat these observation noise variances as unknown quantities a priori that must be estimated jointly with the other model parameters from the posterior. We consider a panel of N different futures contracts with maturities T_1, . . . , T_N and futures prices at time t denoted by F_{t,T_1}, . . . , F_{t,T_N}. Note, the filtering framework is set up such that the latent process is the real process with real-world parameters and the observation process uses the risk-neutral parameters. In the following section we analyse the ability to jointly perform parameter estimation (calibration) and filtering for the latent states of the model as we vary the number of contracts observed on the futures curve on any given day, N ∈ {1, 5, 10}.

In the following simulation results we demonstrate the accuracy of the joint calibration (parameter estimation) and filtering (state estimation) using the posterior distribution derived in Equation 5.3 and the Rao-Blackwellised Adaptive Particle MCMC algorithm.
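For intuition, a single mean-reverting factor path of the kind used to build the synthetic panel can be simulated with a simple Euler discretization. This is a toy stand-in for the full four-factor Milstein simulation of the paper; the function name and parameters are ours.

```python
import numpy as np

def simulate_ou(x0, mu, kappa, sigma, dt, n_steps, rng=None):
    """Euler-discretized mean-reverting (OU-type) factor path:
    dX_t = (mu - kappa * X_t) dt + sigma dW_t."""
    rng = rng or np.random.default_rng()
    x = np.empty(n_steps + 1)
    x[0] = x0
    for t in range(n_steps):
        x[t + 1] = (x[t] + (mu - kappa * x[t]) * dt
                    + sigma * np.sqrt(dt) * rng.standard_normal())
    return x
```

With sigma = 0 the path collapses to the deterministic drift and converges geometrically to the mean-reversion level mu/kappa, which is a convenient sanity check for the discretization.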
The first results presented are for the parameter estimates as a function of the number of futures contracts considered, where we consider N ∈ {1, 5, 10}. In Figure 1 we present the MMSE and 95% posterior confidence intervals for each model parameter, real-world and risk-neutral. Each plot contains the parameter estimates as well as the true parameter values used to generate the data, represented by circles. The top panel contains the posterior estimates obtained for a panel of 100 days of data with N = 1 futures contract, the middle panel has posterior estimates for a panel of 100 days of data with N = 5 futures contracts, and the lower panel has these results for 100 days of observations with N = 10 contracts. Each contract considered is separated by 30 days, so we have a 30 day contract, a 60 day contract, and so on through to the longest maturity; the case of N = 10 contracts produces very accurate parameter estimates.

In Figure 2 we present the trace plots of the model parameters from the Rao-Blackwellised Adaptive PMCMC sampler for futures contract panel data containing N = 10 contracts separated by 30 day maturities over 100 trading days. We have separated the results per factor and present the real world and risk neutral parameters on the same trace plot. These results are for uninformative priors, and we see reasonable mixing for the Rao-Blackwellised AdPMCMC sampler updating in dimension 4T + 18 each iteration.

In Figure 3 we present the Kalman filtered and particle filtered estimates of the latent path space for each of the four factors, the short and long term dynamics and the two stochastic volatility terms, for the case of N = 10 contracts separated by 30 day maturities over 100 trading days in the panel of futures curves. The results demonstrate that the short term factor and volatility terms are very accurately estimated.
Figure 1: Filled circles are the estimated parameter MMSE and error bars represent 95% posterior confidence intervals. Open circles are true parameter values used to generate the data. Parameters are presented in order of real world (left) and risk neutral (right).

Figure 2: Trace plots of the simulated Markov chain state trajectories generated by the Rao-Blackwellised AdPMCMC sampler for the model parameters of the multi-factor s.d.e. model.

The long term factor estimates the mean reversion level well, and includes the true latent process within a 95% posterior confidence interval.
Figure 3: True latent factor model s.d.e. trajectories (solid line) versus Rao-Blackwellised AdPMCMC sampler filter estimates for the MMSE (dashed line) and 95% posterior confidence intervals (dotted lines) for each factor: short term and long term trends and volatility. The Milstein SIR proposal utilised p = 100 and 500 particles.

Finally, the most important result in this analysis is a comparison of the true log spot price versus the estimated MMSE log spot price and 95% posterior confidence interval; see Figure 4. As above, this example corresponds to the situation in which the futures contract panel data contains 10 contracts separated by 30 day maturities over 100 trading days of observations. The Milstein SIR proposal considered in the Rao-Blackwellized Adaptive PMCMC algorithm utilised a truncation of p = 100 and 500 particles. We also note that simulation studies were performed utilising futures contracts generated with seasonal components; the resulting estimation accuracy was equivalent to that presented. Additionally, we varied the signal to noise ratio by decreasing the observation error to σ_F = 1 and observed a corresponding increase in the accuracy of the estimation.

In this section we take a panel of 10 futures contracts with daily price data corresponding to contracts for crude light oil commodity futures over 150 trading days from 2/01/1998 to 6/08/1998. The maturities of the 10 contracts are 21 days for the first contract's initiation, and each contract thereafter has a maturity increasing by 20 days; the last contract therefore has a maturity from its initiation of 200 days. We estimated the posterior MMSE model parameters and posterior confidence intervals for the model developed in Section 5 using this panel of oil futures data. This was achieved via the Rao-Blackwellised AdPMCMC
Figure 4: True latent log spot price trajectory versus Rao-Blackwellised AdPMCMC sampler filter estimated log spot price MMSE and 95% posterior confidence interval.

algorithm for the four factor model developed, again with a joint calibration and filtering formulation and 100,000 iterations of the sampler. The average acceptance rate of the PMCMC sampler was 28% and 500 particles were utilised. The Milstein SIR proposal utilised p = 100, and the calibrated model parameters obtained are presented in Figure 5 along with 3 posterior standard deviations of these estimated parameters, using the model developed in Section 5.
Figure 5: Calibrated MMSE posterior model parameters and posterior confidence intervals. This example corresponds to the situation in which the crude oil futures contract panel data contains 10 contracts separated by 20 day maturities over 150 trading days. The Milstein SIR proposal utilised p = 100 and 500 particles.

In Figure 6 we present the latent path space trajectory estimates for the four factors over 150 days, fitted to the panel of oil futures data. The plots contain the posterior MMSE estimates for each factor and the 95% posterior C.I. estimates. In addition, we present the estimated log spot price for the crude oil futures contract data, based on 10 contracts with 150 days of data in the panel. Next we present the MMSEP and the associated posterior predictive 95% confidence intervals obtained
Figure 6: Panel 1: short term dynamic χ_{1:T}; Panel 2: long term dynamic ζ_{1:T}; Panel 3: volatility term θ_{1:T}; Panel 4: volatility V_{1:T}; Panel 5: estimated log spot price S_{1:T}. Solid line is the filtered MMSE and the gray band is the posterior 95% C.I.

from the predictive distribution of the log futures curves at some time point τ, defined by Equation 7.1:

p(F_{τ,T_1}, F_{τ,T_2}, . . . , F_{τ,T_N} | F_{1:T,T_1}, F_{1:T,T_2}, . . . , F_{1:T,T_N})
= ∫ ∏_{n=1}^N g_Φ(F_{τ,T_n} | χ_τ, ζ_τ, θ_τ) π(Φ, χ_{1:T}, ζ_{1:T}, θ_{1:T}, V_{1:T} | F_{1:T,T_1}, F_{1:T,T_2}, . . . , F_{1:T,T_N}) dΦ dχ_{T:τ} dζ_{T:τ} dθ_{T:τ} dV_{T:τ}.   (7.1)
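In practice a predictive density of the form (7.1) is approximated by averaging the conditional observation density over posterior draws of the parameters and latent states. A hedged one-dimensional sketch, with a Gaussian stand-in for g_Φ and function names of ours:

```python
import numpy as np

def predictive_mixture(samples, g_mean, g_var, f_grid):
    """Monte Carlo estimate of a posterior predictive density: average the
    conditional Gaussian observation densities over posterior draws.

    samples : iterable of posterior draws (each encodes Phi and latents)
    g_mean, g_var : callables mapping a draw to the conditional mean/variance
    f_grid  : grid of futures-price values at which to evaluate the density
    """
    dens = np.zeros_like(f_grid, dtype=float)
    for s in samples:
        m, v = g_mean(s), g_var(s)
        dens += np.exp(-0.5 * (f_grid - m) ** 2 / v) / np.sqrt(2 * np.pi * v)
    return dens / len(samples)
```

The within-sample and out-of-sample bands in Figure 7 correspond to quantiles of exactly this kind of mixture, one mixture per contract and forecast date.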
The results in Figure 7 represent, from top to bottom, in the left panels the within-sample and in the right panels the out-of-sample forecast log futures prices, after integrating out uncertainty in the latent process states and parameters via the posterior predictive distribution. The shaded gray area corresponds to the 95% posterior predictive confidence interval for the log of the futures prices for contracts 1, 5 and 10, observed over 150 days and predicted out of sample over an additional 5 days. The first contract, in the top row of results, is the contract closest to maturity; the 5th contract, in the middle row, matures at 100 days; and the 10th contract, in the bottom row, has a maturity of 200 days. The solid line represents the observed time series of daily log futures prices at close of market over 150 days within sample and 5 days out of sample. The dashed line represents the estimated posterior MMSE and predicted MMSE of the log futures price. Clearly the observed futures price curves are contained within the 95% posterior C.I. of the estimated model at all times, and the estimated MMSE and forecast MMSE out of sample indicate accurate calibration and estimation of the commodity model on the real crude oil futures data.
Figure 7: Left Panels: within sample prediction, Right Panels: out of sample forecasts. Top Row: contractswith 20 days maturity, Middle Row: contracts with 100 days maturity, Bottom Row: contracts with 200days maturity. This example corresponds to the situation in which the crude oil futures contract panel datacontains 10 contracts separated by 20 day maturities over 150 trading days. Solid line observed log futuresprice, dashed line predicted MMSE log futures price, gray shading 95% posterior C.I.
8 Conclusions

In this paper we have developed a novel general multi-factor model for commodity spot prices and futures valuation. In particular we extend the multi-factor long-short stochastic differential equation (s.d.e.) model developed in [1] and [2] in two important aspects: firstly we allow for both the long and short term dynamic factors to be mean reverting, and secondly we develop an additive structural seasonality model. We developed a non-trivial state space model representation for this model, and in doing so we also extend the literature in this area by allowing for futures and options contracts in the observation equation of the Milstein discretized non-linear stochastic volatility state space model.

Next, to perform estimation and calibration under this model, we derived a closed form solution for the futures prices and then developed a novel numerical methodology based on Sequential Monte Carlo to perform calibration of the model jointly with the filtering of the latent processes for the long-short and volatility dynamics. In this regard we explored and developed a novel methodology based on an adaptive Rao-Blackwellised version of the Particle Markov chain Monte Carlo methodology for the joint calibration and filtering. In doing this we deal accurately with the non-linearities introduced into the filtering framework. We performed analysis on real data for oil commodities.
References

[1] E. Schwartz and J. Smith, "Short-term variations and long-term dynamics in commodity prices," Management Science, pp. 893–911, 2000.
[2] X. Yan, "Valuation of commodity derivatives in a new multi-factor model," Review of Derivatives Research, vol. 5, no. 3, pp. 251–271, 2002.
[3] J. Keynes, "The policy of government storage of food-stuffs and raw materials," The Economic Journal, vol. 48, no. 191, pp. 449–460, 1938.
[4] H. Working, "The theory of price of storage," The American Economic Review, vol. 39, no. 6, pp. 1254–1262, 1949.
[5] N. Kaldor, "Speculation and economic stability," The Review of Economic Studies, vol. 7, no. 1, pp. 1–27, 1939.
[6] P. Samuelson, "Proof that properly anticipated prices fluctuate randomly," Industrial Management Review, vol. 6, no. 2, pp. 41–49, 1965.
[7] M. Brennan and E. Schwartz, "Evaluating natural resource investments," The Journal of Business, vol. 58, no. 2, pp. 135–157, 1985.
[8] R. Gibson and E. Schwartz, "Stochastic convenience yield and the pricing of oil contingent claims," The Journal of Finance, vol. 45, no. 3, pp. 959–976, 1990.
[9] E. Schwartz, "The stochastic behavior of commodity prices: Implications for valuation and hedging," The Journal of Finance, vol. 52, pp. 923–974, 1997.
[10] K. Miltersen and E. Schwartz, "Pricing of options on commodity futures with stochastic term structures of convenience yields and interest rates," Journal of Financial and Quantitative Analysis, pp. 33–59, 1998.
[11] J. Hilliard and J. Reis, "Valuation of commodity futures and options under stochastic convenience yields, interest rates, and jump diffusions in the spot," Journal of Financial and Quantitative Analysis, pp. 61–86, 1998.
[12] D. Ribeiro and S. Hodges, "A two-factor model for commodity prices and futures valuation," EFMA 2004 Basel Meetings Paper, 2004.
[13] P. Cheridito, D. Filipovic, and R. Kimmel, "Market price of risk specifications for affine models: Theory and evidence," Journal of Financial Economics, vol. 83, no. 1, pp. 123–170, 2007.
[14] C. Andrieu, A. Doucet, and R. Holenstein, "Particle Markov chain Monte Carlo methods," Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 72, no. 3, pp. 269–342, 2010.
[15] G. Roberts and J. Rosenthal, "Examples of adaptive MCMC," Journal of Computational and Graphical Statistics, vol. 18, no. 2, pp. 349–367, 2009.
[16] Y. Atchadé and J. Rosenthal, "On adaptive Markov chain Monte Carlo algorithms," Bernoulli, vol. 11, no. 5, pp. 815–828, 2005.
[17] H. Haario, E. Saksman, and J. Tamminen, "Componentwise adaptation for high dimensional MCMC," Computational Statistics, vol. 20, no. 2, pp. 265–273, 2005.
[18] C. Andrieu and É. Moulines, "On the ergodicity properties of some adaptive MCMC algorithms," The Annals of Applied Probability, vol. 16, no. 3, pp. 1462–1505, 2006.
[19] C. Andrieu and Y. Atchadé, "On the efficiency of adaptive MCMC algorithms," pp. 43–es, 2006.
[20] H. Haario, E. Saksman, and J. Tamminen, "An adaptive Metropolis algorithm," Bernoulli, vol. 7, no. 2, pp. 223–242, 2001.
[21] G. Roberts, A. Gelman, and W. Gilks, "Weak convergence and optimal scaling of random walk Metropolis algorithms," The Annals of Applied Probability, vol. 7, no. 1, pp. 110–120, 1997.
[22] G. Roberts and J. Rosenthal, "Optimal scaling for various Metropolis-Hastings algorithms," Statistical Science, vol. 16, no. 4, pp. 351–367, 2001.
[23] A. Doucet, N. de Freitas, K. Murphy, and S. Russell, "Rao-Blackwellised particle filtering for dynamic Bayesian networks," pp. 176–183, 2000.
[24] G. Casella and C. Robert, "Rao-Blackwellisation of sampling schemes," Biometrika, vol. 83, no. 1, p. 81, 1996.
9 Appendix 1: Proof of Futures Price
Proof:
Here we present the proof of Theorem 1 for the futures price from the real latent process s.d.e. model. To obtain the risk neutral process from the real process we need to deduct the risk premium from the long term trend to account for the market price of risk, where the risk premium is denoted by λ(x, t). Next, we denote the futures contract price at time t, with maturity of the contract at T, by F(t, T):

F(t, T) = E*[exp(X(T)) | X(t)] = E*[exp(f(T) + ξ*_T + χ*_T + θ*_T) | f(t), ξ*_t, χ*_t, θ*_t],

where E* denotes the expectation taken with respect to the risk neutral process. To solve for this expectation, we must first derive the expression for the transition density of the state variables, denoted by p(ξ*_T + χ*_T + θ*_T, V*_T, T | ξ*_t, χ*_t, θ*_t, V*_t, t) = p(Y_T | Y_t, t). This is obtained for the continuous s.d.e. latent process model by writing down either the Kolmogorov backward or forward equation. Here we work with the backward equation (dropping the * notation on state variables for convenience), with the condition p(Y_T, T | Y_t, t = T) = δ(Y_T − Y_t). Next, to obtain the futures price we multiply each term in the Kolmogorov backward equation by exp(X_t) and integrate with respect to X_t. Note, the notation F_{t,T} refers to F(Y_t, t = T, T). This allows us to express each term in the p.d.e. with respect to the futures price:

∂F_{t,T}/∂t + (1/2)σ_ξ² ∂²F_{t,T}/∂ξ_t² + (1/2)σ_χ² ∂²F_{t,T}/∂χ_t² + (1/2)σ_V² V_t ∂²F_{t,T}/∂V_t² + (1/2)V_t ∂²F_{t,T}/∂θ_t²
+ (μ*_ξ − κ*_ξ ξ_t) ∂F_{t,T}/∂ξ_t + (−β* χ_t − λ_1) ∂F_{t,T}/∂χ_t + (μ*_V − κ*_V V_t) ∂F_{t,T}/∂V_t + (λ_2 V_t) ∂F_{t,T}/∂θ_t
+ ρ_χξ σ_χ σ_ξ ∂²F_{t,T}/∂χ_t∂ξ_t + ρ_θV σ_V V_t ∂²F_{t,T}/∂θ_t∂V_t = 0,

with the terminal condition F(Y_T, T, T) = exp(X_T). Next, since the state variables enter the above p.d.e. in a linear (affine) fashion, the solution belongs to the class of exponentially affine models.
Hence, we can look for a solution of the form

F_{t,T} = exp( B_0(τ) + B_1(τ) ξ_t + B_2(τ) χ_t + B_3(τ) θ_t + B_4(τ) V_t ),   τ = T − t.

Substitution of this solution into the p.d.e., dividing both sides by F_{t,T} and grouping terms according to the latent factors, results in a system of five ODEs:

dB_1(τ)/dτ = −κ*_ξ B_1(τ);
dB_2(τ)/dτ = −β* B_2(τ);
dB_3(τ)/dτ = 0;
dB_4(τ)/dτ = λ_2 B_3(τ) − κ*_V B_4(τ) + (1/2)σ_V² [B_4(τ)]² + ρ_θV σ_V B_3(τ) B_4(τ) + (1/2)[B_3(τ)]²;
dB_0(τ)/dτ = −(1/2)σ_ξ² [B_1(τ)]² − (1/2)σ_χ² [B_2(τ)]² − μ*_ξ B_1(τ) + λ_1 B_2(τ) − μ*_V B_4(τ) − ρ_χξ σ_χ σ_ξ B_1(τ) B_2(τ).

Note, the terminal condition F(Y_T, T, T) = exp(X_T) needs to be satisfied; in addition, we assume the form of the risk premium λ(X_t, t) = λ_2 √V_t = −(1/2)√V_t (this allows us to remove the volatility V_t from the futures price, while still keeping it in the options price). The solution to this system of ODEs is obtained by first applying the constraint at time T, which automatically sets B_1(0) = B_2(0) = B_3(0) = 1 and B_0(0) = B_4(0) = 0. We can then obtain the solutions B_2(τ) = exp(−β*τ) and B_1(τ) = exp(−κ*_ξτ), and with the initial condition B_3(0) = 1 we get B_3(τ) = 1. Finally, substituting the solution for B_3(τ) into the expression for dB_4(τ)/dτ and setting λ_2 = −1/2 results in the ODE

dB_4(τ)/dτ = [ρ_θV σ_V − κ*_V] B_4(τ) + (1/2)σ_V² [B_4(τ)]²,

with the initial condition B_4(0) = 0. This is recognized as a Riccati equation: one can obtain a solution to a transformed linear 2nd order ODE and then in turn obtain a solution to the Riccati equation, which produces the solution B_4(τ) = 0. Finally, substitution of these solutions for B_1(τ), B_2(τ) and B_3(τ) into the expression for dB_0(τ)/dτ results in an ODE which can be integrated (w.r.t.
τ) explicitly to obtain a solution for B_0(τ) as follows:

B_0(τ) = −(1/2)σ_ξ² ∫ exp(−2κ*_ξτ) dτ − (1/2)σ_χ² ∫ exp(−2β*τ) dτ − μ*_ξ ∫ exp(−κ*_ξτ) dτ + λ_1 ∫ exp(−β*τ) dτ − ρ_χξ σ_χ σ_ξ ∫ exp(−(β* + κ*_ξ)τ) dτ
= (σ_ξ²/(4κ*_ξ)) exp(−2κ*_ξτ) + (σ_χ²/(4β*)) exp(−2β*τ) + (μ*_ξ/κ*_ξ) exp(−κ*_ξτ) − (λ_1/β*) exp(−β*τ) + (ρ_χξ σ_χ σ_ξ/(β* + κ*_ξ)) exp(−(β* + κ*_ξ)τ).
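As a numerical sanity check, one can verify that a candidate closed form for the constant loading (written here with explicit subscripts B_0(τ), taking B_1(τ) = e^{−κ*_ξ τ} and B_2(τ) = e^{−β* τ}; this is a sketch under those assumed affine forms, with symbol names ours) satisfies its ODE by finite differences:

```python
import math

def B1(tau, kappa):
    return math.exp(-kappa * tau)

def B2(tau, beta):
    return math.exp(-beta * tau)

def B0(tau, s_xi, s_chi, mu_xi, lam1, rho, kappa, beta):
    """Candidate closed form for the constant affine loading."""
    return (s_xi ** 2 / (4 * kappa) * math.exp(-2 * kappa * tau)
            + s_chi ** 2 / (4 * beta) * math.exp(-2 * beta * tau)
            + mu_xi / kappa * math.exp(-kappa * tau)
            - lam1 / beta * math.exp(-beta * tau)
            + rho * s_chi * s_xi / (beta + kappa) * math.exp(-(beta + kappa) * tau))

def rhs(tau, s_xi, s_chi, mu_xi, lam1, rho, kappa, beta):
    """Right-hand side of dB0/dtau under the assumed affine system."""
    b1, b2 = B1(tau, kappa), B2(tau, beta)
    return (-0.5 * s_xi ** 2 * b1 ** 2 - 0.5 * s_chi ** 2 * b2 ** 2
            - mu_xi * b1 + lam1 * b2 - rho * s_chi * s_xi * b1 * b2)
```

A central finite difference of B0 should agree with rhs to high accuracy for any admissible parameter values, which is a cheap way to catch sign or coefficient errors in such derivations.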
10 Appendix 2: Proof of Strong Stochastic Taylor Representation of s.d.e. Factors
The strong stochastic Taylor expansion known as the Milstein scheme in a multivariate setting is presented below. We present the i-th component of a general n-dimensional s.d.e. with m-dimensional Wiener process as

dX_t^i = a^i(t, X_t) dt + Σ_{j=1}^m b^{i,j}(t, X_t) dW_t^j.

Throughout the remainder of this section we adopt the operator notation of Kloeden and Platen (1999). In their book (p. 346) they demonstrate that the i-th component of the Milstein scheme under Itô integrals is given by

X_t^i = X_{t−1}^i + a^i(t−1, X_{t−1}) Δt + Σ_{j=1}^m b^{i,j}(t−1, X_{t−1}) ΔW_{t−1}^j + Σ_{j_1,j_2} L^{j_1} b^{i,j_2}(t−1, X_{t−1}) I_{(j_1,j_2),Δt},

where L^j = Σ_{i=1}^n b^{i,j} ∂/∂x^i and the Itô multiple integral I_{(j_1,j_2),Δt} is given by

I_{(j_1,j_2),Δt} = ∫_{t_n}^{t_{n+1}} ∫_{t_n}^{s_2} dW_{s_1}^{j_1} dW_{s_2}^{j_2}.

For j_1 = j_2 = j we have I_{(j,j),Δt} = (1/2){(ΔW_t^j)² − Δt}, while I_{(j_1,j_2),Δt} with j_1 ≠ j_2 is not easily expressed. As pointed out in Kloeden and Platen (1999), when j_1 = j_2 the Itô and Stratonovich integrals are equal,

I_{(j,j)} = J_{(j,j)} = ∫_{τ_n}^{τ_{n+1}} ∫_{τ_n}^{s_2} dW_{s_1}^j dW_{s_2}^j.

It is easier to obtain the approximation under the Stratonovich integrals, which under a p-th order truncation is given by

J^p_{(j_1,j_2),Δt} = Δt ( (1/2)ζ_{j_1}ζ_{j_2} + √ρ_p (μ_{j_1,p}ζ_{j_2} − μ_{j_2,p}ζ_{j_1}) ) + (Δt/2π) Σ_{r=1}^p (1/r) ( ψ_{j_1,r}(√2 ζ_{j_2} + ν_{j_2,r}) − ψ_{j_2,r}(√2 ζ_{j_1} + ν_{j_1,r}) ),

where ζ_j, μ_{j,p}, ν_{j,r} and ψ_{j,r} are all independent N(0, 1) Gaussian random variables, with

ρ_p = 1/12 − (1/(2π²)) Σ_{r=1}^p 1/r²,   ζ_j = (1/√Δt) ΔW^j.

We note that a bivariate approximation for the Heston model was studied in Stump (...)
and a recommendation for p is also provided in Kloeden and Platen (1999), who suggest p ≥ K/Δt for some positive constant K. Hence, the bivariate s.d.e.'s for θ_t and V_t, given by

dθ_t = √V_t dZ_θ,
dV_t = (μ_V − κ_V V_t) dt + σ_V √V_t dZ_V,

are first recast with respect to independent Wiener processes dW^1 and dW^2 as follows:

dθ_t = √V_t dW^1,
dV_t = (μ_V − κ_V V_t) dt + σ_V √V_t ( ρ_{Vθ} dW^1 + √(1 − ρ²_{Vθ}) dW^2 ).

The drift and diffusion coefficients are therefore

a^1(t, θ_t, V_t) = 0;   a^2(t, θ_t, V_t) = μ_V − κ_V V_t;
b^{1,1}(t, θ_t, V_t) = √V_t;   b^{1,2}(t, θ_t, V_t) = 0;
b^{2,1}(t, θ_t, V_t) = σ_V √V_t ρ_{Vθ};   b^{2,2}(t, θ_t, V_t) = σ_V √V_t √(1 − ρ²_{Vθ}),

and the required operator terms are

L^1 b^{1,1} = b^{1,1} ∂b^{1,1}/∂θ_t + b^{2,1} ∂b^{1,1}/∂V_t = (1/2)σ_V ρ_{Vθ};
L^2 b^{1,1} = b^{1,2} ∂b^{1,1}/∂θ_t + b^{2,2} ∂b^{1,1}/∂V_t = (1/2)σ_V √(1 − ρ²_{Vθ});
L^1 b^{1,2} = 0;   L^2 b^{1,2} = 0;
L^1 b^{2,1} = b^{1,1} ∂b^{2,1}/∂θ_t + b^{2,1} ∂b^{2,1}/∂V_t = (1/2)σ_V² ρ²_{Vθ};
L^2 b^{2,1} = b^{1,2} ∂b^{2,1}/∂θ_t + b^{2,2} ∂b^{2,1}/∂V_t = (1/2)σ_V² ρ_{Vθ} √(1 − ρ²_{Vθ});
L^1 b^{2,2} = b^{1,1} ∂b^{2,2}/∂θ_t + b^{2,1} ∂b^{2,2}/∂V_t = (1/2)σ_V² ρ_{Vθ} √(1 − ρ²_{Vθ});
L^2 b^{2,2} = b^{1,2} ∂b^{2,2}/∂θ_t + b^{2,2} ∂b^{2,2}/∂V_t = (1/2)σ_V² (1 − ρ²_{Vθ}).
This leads to the following bivariate strong Milstein discretization scheme,
$$\begin{aligned}
\theta_t &= \theta_{t-1} + \sqrt{V_{t-1}}\sqrt{\triangle t}\,n_{\theta,t} + \frac{1}{2}\sigma_V\rho_{V\theta}\, I_{(1,1),\triangle t} + \frac{1}{2}\sigma_V\sqrt{1 - \rho_{V\theta}^2}\, I_{(2,1),\triangle t}\\
&\approx \theta_{t-1} + \sqrt{V_{t-1}}\sqrt{\triangle t}\,n_{\theta,t} + \frac{1}{4}\sigma_V\rho_{V\theta}\left(\triangle t\, n_{\theta,t}^2 - \triangle t\right) + \frac{1}{2}\sigma_V\sqrt{1 - \rho_{V\theta}^2}\, J^p_{(2,1),\triangle t}
\end{aligned}$$
$$\begin{aligned}
V_t &= V_{t-1} + (\mu_V - \kappa_V V_{t-1})\triangle t + \sigma_V\rho_{V\theta}\sqrt{V_{t-1}}\sqrt{\triangle t}\,n_{\theta,t} + \sigma_V\sqrt{V_{t-1}\left(1 - \rho_{V\theta}^2\right)}\sqrt{\triangle t}\,n_{V,t}\\
&\quad + \frac{1}{2}\sigma_V^2\rho_{V\theta}^2\, I_{(1,1),\triangle t} + \frac{1}{2}\sigma_V^2\rho_{V\theta}\sqrt{1 - \rho_{V\theta}^2}\, I_{(1,2),\triangle t} + \frac{1}{2}\sigma_V^2\rho_{V\theta}\sqrt{1 - \rho_{V\theta}^2}\, I_{(2,1),\triangle t} + \frac{1}{2}\sigma_V^2\left(1 - \rho_{V\theta}^2\right) I_{(2,2),\triangle t}\\
&\approx V_{t-1} + (\mu_V - \kappa_V V_{t-1})\triangle t + \sigma_V\rho_{V\theta}\sqrt{V_{t-1}}\sqrt{\triangle t}\,n_{\theta,t} + \sigma_V\sqrt{V_{t-1}\left(1 - \rho_{V\theta}^2\right)}\sqrt{\triangle t}\,n_{V,t}\\
&\quad + \frac{1}{4}\sigma_V^2\rho_{V\theta}^2\left(\triangle t\, n_{\theta,t}^2 - \triangle t\right) + \frac{1}{2}\sigma_V^2\rho_{V\theta}\sqrt{1 - \rho_{V\theta}^2}\, J^p_{(1,2),\triangle t} + \frac{1}{2}\sigma_V^2\rho_{V\theta}\sqrt{1 - \rho_{V\theta}^2}\, J^p_{(2,1),\triangle t} + \frac{1}{4}\sigma_V^2\left(1 - \rho_{V\theta}^2\right)\left(\triangle t\, n_{V,t}^2 - \triangle t\right)
\end{aligned}$$
where $n_{\theta,t}$ and $n_{V,t}$ are i.i.d. standard normal random variables, and when simulating the scheme we approximate the mixed integrals $J_{(2,1)}$ (in the $\theta$ recursion) and $J_{(1,2)}$ (in the $V$ recursion) by their $p$-th order truncations.
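A direct transcription of the two approximate recursions, with the truncated Stratonovich integrals sampled inline from the series given earlier, might look as follows (Python/NumPy; the function name, the floor on $V$ inside the square root, and the example parameter values are our own additions, not part of the model specification):

```python
import numpy as np

def milstein_step(theta, V, dt, mu_V, kappa_V, sigma_V, rho, p, rng):
    """One step of the bivariate strong Milstein scheme above.
    rho denotes rho_{V,theta}; the truncated integrals J^p_{(1,2)},
    J^p_{(2,1)} are sampled inline from the Kloeden-Platen series."""
    n_theta, n_V = rng.standard_normal(2)
    dW1, dW2 = np.sqrt(dt) * n_theta, np.sqrt(dt) * n_V
    # p-th order truncation of the Stratonovich double integral
    z1, z2 = n_theta, n_V
    r = np.arange(1, p + 1)
    rho_p = 1.0 / 12.0 - np.sum(1.0 / r**2) / (2.0 * np.pi**2)
    mu1, mu2 = rng.standard_normal(2)
    psi1, psi2, nu1, nu2 = (rng.standard_normal(p) for _ in range(4))
    series = np.sum((psi1 * (np.sqrt(2.0) * z2 + nu2)
                     - psi2 * (np.sqrt(2.0) * z1 + nu1)) / r)
    J12 = (dt * (0.5 * z1 * z2 + np.sqrt(rho_p) * (mu1 * z2 - mu2 * z1))
           + dt / (2.0 * np.pi) * series)
    J21 = dt * z1 * z2 - J12
    c = np.sqrt(1.0 - rho**2)
    sqV = np.sqrt(max(V, 0.0))        # guard against small negative V
    theta_new = (theta + sqV * dW1
                 + 0.25 * sigma_V * rho * (dt * n_theta**2 - dt)
                 + 0.5 * sigma_V * c * J21)
    V_new = (V + (mu_V - kappa_V * V) * dt
             + sigma_V * rho * sqV * dW1 + sigma_V * sqV * c * dW2
             + 0.25 * sigma_V**2 * rho**2 * (dt * n_theta**2 - dt)
             + 0.5 * sigma_V**2 * rho * c * (J12 + J21)
             + 0.25 * sigma_V**2 * c**2 * (dt * n_V**2 - dt))
    return theta_new, V_new
```

Each term maps one-to-one onto the "$\approx$" lines of the scheme, which makes the implementation easy to audit against the derivation.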
11 Appendix 3: Simulating a Strong Stochastic Taylor Scheme
Simulation of the strong stochastic Taylor scheme for the Milstein approximation is given next. This will form an important aspect of the proposal mechanism in the SIR filter contained in the Particle MCMC algorithm. We wish to simulate new realizations for $\theta_t$ and $V_t$ given $\theta_{t-1}$ and $V_{t-1}$ and the model parameters $\Phi$ at time $t$.

Simulation of a bivariate Milstein scheme:

1. Initialisation:
i. Specify the time discretization $\triangle t$ and truncation order $p$.

2. Simulation: For $i = 1, \ldots, N$
i. simulate $n^{(i)}_{V,t-1} \sim N(0,1)$ and $n^{(i)}_{\theta,t-1} \sim N(0,1)$
ii. simulate $n^{(i)}_{V} \sim N(0,1)$ and $n^{(i)}_{\theta} \sim N(0,1)$ and square the results to obtain $\big(n^{(i)}_{V}\big)^2$ and $\big(n^{(i)}_{\theta}\big)^2$
iii. simulate $\zeta^{(i)}_{1}, \zeta^{(i)}_{2}, \mu^{(i)}_{1,p}, \mu^{(i)}_{2,p}$ each independently from $N(0,1)$
For $r = 1, \ldots, p$
iv. simulate $\psi^{(i)}_{1,r}, \psi^{(i)}_{2,r}, \nu^{(i)}_{1,r}, \nu^{(i)}_{2,r}$ each independently from $N(0,1)$

3. Evaluate: For $i = 1, \ldots, N$
i. evaluate $J^p_{(2,1)}$ (for $\theta$) and $J^p_{(1,2)}$ (for $V$) using equation 10 and the simulated terms from stage 2
ii. evaluate the Milstein approximations to obtain new samples $\theta^{(i)}_t$ and $V^{(i)}_t$ given $\theta^{(i)}_{t-1}$ and $V^{(i)}_{t-1}$

Table 1: Generation from the 2-d Milstein strong stochastic Taylor scheme, truncated at $p$-th order.
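The recipe of Table 1 vectorises naturally over the $N$ draws: all variates from stage 2 can be generated as arrays and stage 3 evaluated in a single pass. A sketch (Python/NumPy; function and parameter names are our own, and the inputs are illustrative):

```python
import numpy as np

def milstein_propagate(theta, V, dt, mu_V, kappa_V, sigma_V, rho, p, rng):
    """Vectorised Table 1: propagate N particle pairs (theta^(i), V^(i))
    one step under the p-th order truncated bivariate Milstein scheme."""
    N = theta.shape[0]
    n_th, n_v = rng.standard_normal(N), rng.standard_normal(N)        # stage 2(i)-(ii)
    z1, z2 = n_th, n_v
    r = np.arange(1, p + 1)
    rho_p = 1.0 / 12.0 - np.sum(1.0 / r**2) / (2.0 * np.pi**2)
    mu1, mu2 = rng.standard_normal(N), rng.standard_normal(N)         # stage 2(iii)
    psi1, psi2 = rng.standard_normal((N, p)), rng.standard_normal((N, p))  # stage 2(iv)
    nu1, nu2 = rng.standard_normal((N, p)), rng.standard_normal((N, p))
    series = ((psi1 * (np.sqrt(2.0) * z2[:, None] + nu2)
               - psi2 * (np.sqrt(2.0) * z1[:, None] + nu1)) / r).sum(axis=1)
    # stage 3(i): truncated double integrals for every particle at once
    J12 = (dt * (0.5 * z1 * z2 + np.sqrt(rho_p) * (mu1 * z2 - mu2 * z1))
           + dt / (2.0 * np.pi) * series)
    J21 = dt * z1 * z2 - J12
    c = np.sqrt(1.0 - rho**2)
    sqV = np.sqrt(np.maximum(V, 0.0))
    # stage 3(ii): Milstein updates
    theta_new = (theta + sqV * np.sqrt(dt) * n_th
                 + 0.25 * sigma_V * rho * dt * (n_th**2 - 1.0)
                 + 0.5 * sigma_V * c * J21)
    V_new = (V + (mu_V - kappa_V * V) * dt
             + sigma_V * sqV * np.sqrt(dt) * (rho * n_th + c * n_v)
             + 0.25 * sigma_V**2 * rho**2 * dt * (n_th**2 - 1.0)
             + 0.5 * sigma_V**2 * rho * c * (J12 + J21)
             + 0.25 * sigma_V**2 * c**2 * dt * (n_v**2 - 1.0))
    return theta_new, V_new
```

Vectorising in this way keeps the per-step cost of the particle proposal at a handful of array operations rather than an explicit loop over particles.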
12 Appendix 4: Adaptive Metropolis within Rao-Blackwellised Particle MCMC Algorithm
Step 2: Sample the static model parameters.
if $u \geq w$ (for a uniform draw $u \sim U[0,1]$ and mixture weight $w$) then /* perform an adaptive random walk move */
2(i). Estimate the empirical covariance $\Sigma_j$, at iteration $j$ of the Markov chain, for $\Phi$ using the samples $\{\Phi^{(i)}\}_{i=1:j}$.
2(ii). Sample the proposal $\Phi' \sim N\left(\Phi; \Phi^{(j-1)}, \frac{(2.38)^2}{d}\Sigma_j\right)$.
else /* perform a non-adaptive random walk move */
2(iii). Sample the proposal $\Phi' \sim N\left(\Phi; \Phi^{(j-1)}, \frac{(0.1)^2}{d} I_{d,d}\right)$.

Algorithm 1: Adaptive Metropolis within Rao-Blackwellised Particle MCMC
Input: Initial Markov chain state $\left(\Phi^{(0)}, \chi^{(0)}_{1:t}, \xi^{(0)}_{1:t}, \theta^{(0)}_{1:t}, V^{(0)}_{1:t}\right)$.
Output: $J$ Markov chain samples $\{\Phi^{(j)}, \chi^{(j)}_{1:t}, \xi^{(j)}_{1:t}, \theta^{(j)}_{1:t}, V^{(j)}_{1:t}\}_{j=1:J} \sim p\left(\Phi, \chi_{1:t}, \xi_{1:t}, \theta_{1:t}, V_{1:t} \,\middle|\, F_{t,T}\right)$.
begin
1. Set the initial state $\left(\Phi^{(0)}, \chi^{(0)}_{1:t}, \xi^{(0)}_{1:t}, \theta^{(0)}_{1:t}, V^{(0)}_{1:t}\right)$ deterministically or by sampling the priors.
repeat
2. Sample $\Phi$, given $\Phi^{(j-1)}$, from an adaptive multivariate Gaussian Metropolis proposal, to obtain the proposal $\Phi'$.
3. Sample $\chi_{1:t}, \xi_{1:t}, \theta_{1:t}, V_{1:t}$ via a combination of Rao-Blackwellising Kalman filter and SIR particle filtering, conditional on $\Phi'$, to obtain the proposal $\left(\Phi', \chi'_{1:t}, \xi'_{1:t}, \theta'_{1:t}, V'_{1:t}\right)$.
4. Accept the proposal $\left(\Phi', \chi'_{1:t}, \xi'_{1:t}, \theta'_{1:t}, V'_{1:t}\right)$ with Metropolis-Hastings acceptance probability given by
$$\alpha\left(\left(\Phi^{(j-1)}, \chi^{(j-1)}_{1:t}, \xi^{(j-1)}_{1:t}, \theta^{(j-1)}_{1:t}, V^{(j-1)}_{1:t}\right), \left(\Phi', \chi'_{1:t}, \xi'_{1:t}, \theta'_{1:t}, V'_{1:t}\right)\right) = \min\left(1, \frac{\widehat{p}(F_{t,T}\,|\,\Phi')\, p(\Phi')\, q\left(\Phi', \Phi^{(j-1)}\right)}{\widehat{p}\left(F_{t,T}\,\middle|\,\Phi^{(j-1)}\right) p\left(\Phi^{(j-1)}\right) q\left(\Phi^{(j-1)}, \Phi'\right)}\right)$$
and increment $j = j + 1$.
until $j = J$
end

Algorithm 2: Step 3: Particle Filter Proposal.
Conditional on the sampled static model parameters $\Phi'$, run an SIR particle filter with $N$ particles:
3(i). Initialise the SIR particle filter for $X = (\theta, V)$ with $N$ particles $\{(\theta^{(i)}_0, V^{(i)}_0)\}_{i=1:N}$, and for each particle initialise the Kalman filter mean and covariance for $(\chi_0, \xi_0)$, denoted by $\mu^{(i)}_t = \left(\mu^{(i)}_{\chi(0)}, \mu^{(i)}_{\xi(0)}\right)$ and $\Sigma^{(i)}_t = \Sigma^{(i)}_{(\chi,\xi)(0)}$.
repeat
3(ii)-(iii). Perform the Kalman filter recursions for each of the $N$ particles at iteration $t-1$, updating $\mu^{(i)}_{t-1}$ and $\Sigma^{(i)}_{t-1}$ to $\mu^{(i)}_t$ and $\Sigma^{(i)}_t$ according to the recursions in (Section 6.3).
3(iv). Perform the mutation of the $N$ particles from iteration $t-1$ to $t$ via the prior, $X^{(i)}_t \sim N\left(x_t; f_t(x_{t-1}, \Phi'), \sigma_\epsilon\right)$. Append the new state at time $t$ to the previous particle genealogy to obtain $X^{(i)}_{0:t}$ for $i = 1, \ldots, N$.
3(v). Weight the $N$ particles: the $i$-th weight is $\widetilde{W}^{(i)}_t \propto W^{(i)}_{t-1}\, p\left(y_t \,\middle|\, x^{(i)}_t, \Phi'\right)$.
until $t = T$
3(vi). Normalise the importance sampling weights $W^{(i)}_t = \widetilde{W}^{(i)}_t \Big/ \sum_{i=1}^N \widetilde{W}^{(i)}_t$.
3(vii). Resample the particles at time $t$ using stratified resampling based on the empirical distribution constructed from the importance weights, to obtain new particles with equal weight.
3(viii). Sample one proposed path trajectory from the particle-estimated proposal distribution given by $X'_{1:T} \sim \widehat{p}\left(x_{1:T} \,\middle|\, y_{1:T}, \Phi'\right) = \sum_{i=1}^N W^{(i)}_T\, \delta_{x^{(i)}_{1:T}}\left(x_{1:T}\right)$.
3(ix). Evaluate the marginal likelihood estimate $\widehat{p}\left(y_{1:T} \,\middle|\, \Phi'\right)$.
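The interaction between the adaptive proposal of Step 2 and the particle-filter likelihood estimate of Step 3 can be sketched on a deliberately simplified toy model rather than the full Rao-Blackwellised commodity model: below we use a scalar linear-Gaussian state space with a flat prior on the AR coefficient, a bootstrap SIR filter with stratified resampling, and the product-of-mean-weights estimate of the marginal likelihood. All function names, the toy model, and the parameter values are our own illustrative choices; the scalings $(2.38)^2/d$ and $(0.1)^2/d$ follow the adaptive-Metropolis literature.

```python
import numpy as np

def sir_loglik(y, phi, sigma_x, sigma_y, N, rng):
    """Bootstrap SIR particle filter for the toy model
    x_t = phi*x_{t-1} + sigma_x*n_t,  y_t = x_t + sigma_y*e_t,
    returning log p_hat(y_{1:T} | Phi) as used in the acceptance ratio."""
    x = rng.standard_normal(N)                          # 3(i) initialise particles
    loglik = 0.0
    for t in range(len(y)):
        x = phi * x + sigma_x * rng.standard_normal(N)  # 3(iv) mutate via the prior
        logw = -0.5 * ((y[t] - x) / sigma_y) ** 2 - np.log(sigma_y * np.sqrt(2 * np.pi))
        m = logw.max()
        w = np.exp(logw - m)
        loglik += m + np.log(w.mean())                  # 3(ix) log of mean incremental weight
        w /= w.sum()                                    # 3(vi) normalise
        u = (np.arange(N) + rng.random(N)) / N          # 3(vii) stratified resampling
        x = x[np.minimum(np.searchsorted(np.cumsum(w), u), N - 1)]
    return loglik

def adaptive_proposal(chain, j, d, rng, w=0.95):
    """Step 2: mixture of an adaptive and a non-adaptive Gaussian random walk."""
    cur = chain[j - 1]
    if rng.random() < w and j > 2 * d:                  # adaptive move once enough history exists
        cov = np.cov(np.asarray(chain[:j]).reshape(j, d), rowvar=False).reshape(d, d)
        return rng.multivariate_normal(cur, (2.38 ** 2 / d) * cov + 1e-9 * np.eye(d))
    return rng.multivariate_normal(cur, (0.1 ** 2 / d) * np.eye(d))

# minimal particle-marginal Metropolis-Hastings loop over Phi = (phi,),
# with a flat prior on (-1, 1) so the prior and symmetric q terms simplify
rng = np.random.default_rng(2)
x_true, y = 0.0, []
for _ in range(50):
    x_true = 0.8 * x_true + 0.5 * rng.standard_normal()
    y.append(x_true + 0.3 * rng.standard_normal())
y = np.asarray(y)
chain = [np.array([0.5])]
ll = sir_loglik(y, chain[0][0], 0.5, 0.3, 200, rng)
for j in range(1, 100):
    prop = adaptive_proposal(chain, j, 1, rng)
    ll_prop = sir_loglik(y, prop[0], 0.5, 0.3, 200, rng)
    if abs(prop[0]) < 1.0 and np.log(rng.random()) < ll_prop - ll:  # step 4 accept/reject
        chain.append(prop)
        ll = ll_prop
    else:
        chain.append(chain[-1])
```

Replacing the exact likelihood with the particle estimate inside the acceptance ratio is what makes this a particle-marginal scheme; in the full algorithm the same skeleton holds, with the Kalman recursions of Section 6.3 absorbed into the weight computation.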