Pricing commodity swing options

Abstract

In commodity and energy markets swing options allow the buyer to hedge against futures price fluctuations and to select its preferred delivery strategy within daily or periodic constraints, possibly fixed by observing quoted futures contracts. In this paper we focus on the natural gas market and we present a dynamical model for commodity futures prices able to calibrate liquid market quotes and to imply the volatility smile for futures contracts with different delivery periods. We implement the numerical problem by means of a least-square Monte Carlo simulation and we investigate alternative approaches based on reinforcement learning algorithms.


Pricing Commodity Swing Options ∗

Roberto Daluiso †, Emanuele Nastasi ‡, Andrea Pallavicini §, Giulio Sartorelli ¶

First version: October 15, 2019. This version: January 27, 2020


JEL classiﬁcation codes:

C63, G13.

AMS classiﬁcation codes:

Keywords:

Commodity, Swing option, Volatility smile, Local volatility, Least-square Monte Carlo, Reinforcement learning, Proximal policy optimization.

∗ We thank Edoardo Vittori for introducing us to reinforcement learning algorithms.
† Banca IMI Milan
‡ Exprivia
§ Imperial College London and Banca IMI Milan
¶ Banca IMI Milan

The opinions here expressed are solely those of the authors and do not represent in any way those of their employers.

In energy markets, a class of commonly traded contracts allows the buyer to select its preferred delivery strategy within daily or periodic constraints, while the purchase price can be fixed at inception or determined before the starting date of the delivery period by observing the prices of quoted futures contracts. These contracts are usually known as swing options since the buyer is allowed to swing between a lower and an upper boundary in the commodity flow.

From the modeling point of view, the daily selection of the delivery strategy along with constraints on the total consumption forces us to describe the swing option pricing problem as a specific type of stochastic control problem for the optimal consumption strategy. The first works in the literature date back to the nineties and they focus on specific payoffs, see for instance Thompson [1995]. The first contribution describing general swing option payoffs is Jaillet et al. [2004], where the authors provide an efficient valuation framework and propose a stochastic process appropriate for energy prices. Alternative numerical approximations can be found in Haarbrücker and Kuhn [2009], Zhang and Oosterlee [2013], Kirkby and Deng [2020]. Investigations on the price dynamics of the underlying commodity can be found in Benth et al. [2012], or in Eriksson et al. [2013] where Lévy models are introduced.

The theoretical aspects of the stochastic control problem are described in Barrera-Esteve et al. [2006], where the delivery strategy is analyzed also by using neural networks, and later in Carmona and Touzi [2008] and Bardou et al. [2009]. In these papers a specific consumption strategy, named bang-bang, is discussed. According to this strategy only the minimum or maximum consumption allowed by all the constraints is selected on each delivery day. In particular in Bardou et al. [2009] sufficient conditions for the existence of an optimal bang-bang strategy are derived.

Our contribution to the literature is twofold. First, we propose a simple diffusive model for commodity futures prices, which is able to describe the volatility smile quoted by the market for futures contracts with different delivery periods. Our proposal starts from the extension of the local-volatility linear model presented in Nastasi et al. [2018]. We stress the importance of modelling futures prices with heterogeneous delivery periods since swing option prices depend both on day-ahead prices through the consumption strategy and on longer period futures contracts (usually one-month contracts) to determine the purchase strike prices. We also show how spikes can be included in our framework. Second, we investigate reinforcement learning algorithms as an alternative to the least-square Monte Carlo simulation for solving the pricing problem.

The local-volatility linear model presented in Nastasi et al. [2018] allows us to describe futures prices in a parsimonious way while preserving a perfect fit to plain-vanilla options quoted in the commodity market. Moreover, mid-curve options and calendar spread options can be calibrated by means of a best-fit procedure. In the original paper some extensions are discussed to introduce multiple risk factors to drive the curve dynamics and to allow for stochastic volatilities. Here, we stick to the one-dimensional specification of the model and we investigate how to extend it to deal with futures contracts on different delivery periods and to incorporate spikes.

We start by considering futures contracts with the same delivery period (e.g. one month). We model their prices by introducing the price process $S_t$ of a rolling futures contract, which can be identified with futures contracts quoted in the market on their last trading date (or on their first notification date if it occurs before the last trading date). We can think of this process as a "fictitious" spot price. We model the spot price by means of the process

$$ s_t := \frac{S_t}{F_0(t)} \tag{1} $$

where $F_0(t)$ is the futures price term structure as seen today. We select a local-volatility model with a linear drift for the process $s_t$, namely we write

$$ ds_t = a(t)\,(1 - s_t)\,dt + \eta(t, s_t)\, s_t\, dW_t, \qquad s_0 = 1 \tag{2} $$

where $W_t$ is a standard Brownian motion under the risk-neutral measure, $a(t)$ is a positive function of time, and $\eta(t, s_t)$ is Lipschitz in the second argument, bounded and positive. With these assumptions the previous SDE has a unique positive solution for any time $t > 0$.

For the price $F_t(T)$ at time $t$ of the futures contract with maturity $T$ we obtain

$$ dF_t(T) = \eta_F(t, T, F_t(T))\, dW_t \tag{3} $$

where the local volatility of futures prices is defined as

$$ \eta_F(t, T, K) := \left( K - F_0(T) \left( 1 - e^{-\int_t^T a(u)\,du} \right) \right) \eta(t, k_F(t, T, K)) \tag{4} $$

$$ k_F(t, T, K) := 1 - \left( 1 - \frac{K}{F_0(T)} \right) e^{\int_t^T a(u)\,du} \tag{5} $$

We can explicitly solve the above dynamics and we obtain

$$ F_t(T) = F_0(T) \left( 1 - (1 - s_t)\, e^{-\int_t^T a(u)\,du} \right) \tag{6} $$

Then, we can calculate futures plain-vanilla options by means of an extended version of the Dupire equation. We define the normalized call price at time 0 as given by

$$ c(t, k) := \mathbb{E}_0\!\left[ (s_t - k)^+ \right], \qquad c(0, k) = (1 - k)^+ \tag{7} $$

Options on futures can be expressed in terms of normalized calls as

$$ C_0(t, T, K) = P_0(T_p; e)\, F_0(T)\, e^{-\int_t^T a(u)\,du}\, c(t, k_F(t, T, K)) \tag{8} $$

where $P_0(T_p; e)$ is the price of a zero-coupon bond with yield $e_t$ […] for $c(t, k)$ to ease the notation.
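The dynamics (2) and the futures formula (6) can be illustrated with a short simulation. The following is a minimal sketch, not the authors' implementation: it assumes a constant mean reversion `a`, a user-supplied local-volatility function `eta`, and a plain Euler discretisation with a small positivity floor. The function names are hypothetical.

```python
import math
import random


def simulate_normalized_spot(a, eta, t_end, n_steps, n_paths, seed=0):
    """Euler scheme for ds = a (1 - s) dt + eta(t, s) s dW with s_0 = 1 (Equation (2)).

    Returns the simulated values of s at time t_end, one per path.
    """
    rng = random.Random(seed)
    dt = t_end / n_steps
    sqrt_dt = math.sqrt(dt)
    finals = []
    for _ in range(n_paths):
        s, t = 1.0, 0.0
        for _ in range(n_steps):
            dw = rng.gauss(0.0, sqrt_dt)
            s += a * (1.0 - s) * dt + eta(t, s) * s * dw
            s = max(s, 1e-12)  # crude floor: the discretised path may overshoot zero
            t += dt
        finals.append(s)
    return finals


def futures_price(f0_T, s_t, a, t, T):
    """Equation (6) for a constant mean reversion a (assumption of this sketch)."""
    return f0_T * (1.0 - (1.0 - s_t) * math.exp(-a * (T - t)))
```

Since the drift is linear, the normalized spot has unit expectation, so averaging `futures_price` over simulated values of `s_t` should return a number close to the initial term structure value `f0_T`, up to Monte Carlo noise.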
By exploiting the linear form of the drift, we can derive the following parabolic PDE for normalized call prices:

$$ \partial_t c(t, k) = \left( -a(t) - a(t)\,(1 - k)\,\partial_k + \tfrac{1}{2}\, k^2\, \eta^2(t, k)\, \partial_k^2 \right) c(t, k) \tag{9} $$

with the boundary conditions

$$ c(t, 0) = 1, \qquad c(t, \infty) = 0, \qquad c(0, k) = (1 - k)^+ \tag{10} $$

To calibrate the model we need to determine the mean-reversion speed $a(t)$ and the local volatility $\eta(t, k)$. We choose a simple constant (time-independent) specification for the mean reversion, while we assume a non-parametric spline interpolation for the local volatility as in Nastasi et al. [2018]. We implement the following calibration procedure:

1. we guess a value for the mean reversion $a$,
2. we perfectly calibrate the local volatility $\eta(t, k)$ to plain-vanilla option (PVO) prices,
3. we evaluate all the mid-curve options (MCO) or calendar spread options (CSO) we wish to best fit,
4. we repeat the procedure from the second step with a different value of $a$ if MCO (or CSO) prices are not recovered with the required precision.

We test the calibration procedure on the TTF natural gas commodity market, since in the following sections we are interested in pricing swing options on this market. In particular we analyze futures contracts with delivery period of one month and options on such contracts. We compare the performance of the procedure against the fixed-point algorithm of Reghai et al. [2012]. We do not consider gradient-based optimization algorithms since they perform worse because of the cost of Jacobian evaluations (we have about one hundred parameters). The main improvements introduced in Nastasi et al. [2018] are using the accelerated fixed-point algorithm of Anderson [1965], and employing asymptotic expansions similar to Berestycki et al. [2002] to update the local volatility values.

We show in Figure 1 the performance of the calibration algorithm in determining the local-volatility function.
We consider the market and model-implied volatility surfaces for quoted strikes and maturities, and we plot the maximum absolute difference between these two surfaces as a function of the algorithm iterations (we solve the Dupire equation for call prices on each iteration).

Figure 1: Calibration of 55 PVO on NG TTF Futures quoted on 29 March 2018 on ICE market with and without Anderson scheme (AA). Dashed red curve refers to Reghai et al. [2012] algorithm. Solid blue line to our implementation. (Axes: calibration error [bp] against iterations. Curves: asympt., freez., AA+asympt., AA+freez.)

Figure 2: NG TTF One-Month MCO quoted on 29 March 2018 on ICE market. Volatilities quoted in the market (red dots) or implied by the model (blue lines). Mean reversion ranging from top to bottom from zero to 1[…]. (Axes: volatility against option expiry.)

Figure 3: NG TTF One-Month CSO quoted on 29 March 2018 on ICE market. Volatility drops quoted in the market (red dots) or implied by the model (blue lines). Mean reversion ranging from top to bottom from 1[…]. (Vertical axis: volatility drop [bp].)

We now continue the modelling section by extending the model presented in Nastasi et al. [2018] to deal with futures contracts with heterogeneous delivery periods.

Futures on commodities like natural gas, oil and electricity have as underlying a daily flow for the whole delivery period. Often day-ahead futures are quoted on the market as a close proxy of the spot prices. Moreover, futures on different delivery periods are usually quoted, ranging from one day to a whole year. On the other hand, PVO contracts are usually quoted only for the most liquid delivery period.

A common way to model these futures prices consists in introducing a dynamics for futures with the shortest delivery period, and in building longer periods by summing futures prices. Yet, it is difficult to find models which allow us to derive closed-form formulae for futures prices with longer periods. See for instance the approach of Benth et al. [2018].
Here, we rely on the linear form of the drift coefficient and on using a single risk factor to derive simple closed-form formulae for sums of futures prices.

We start by introducing the instantaneous futures price process $f_t(T)$ with delivery at time $T$, and we assume that we can model them by the local-volatility linear model presented in the previous section, so that we can write

$$ f_t(T) = f_0(T) \left( 1 - (1 - s_t)\, e^{-\int_t^T a(u)\,du} \right) \tag{11} $$

where the spot process is given by

$$ ds_t = a(t)\,(1 - s_t)\,dt + \eta(t, s_t)\, s_t\, dW_t, \qquad s_0 = 1 \tag{12} $$

Then, we define the price of a futures contract delivering over the period $[T + \delta_1, T + \delta_2]$ as given by

$$ F_t(T, \delta) = \int w(u - T, \delta)\, f_t(u)\, du, \qquad w(\tau, \delta) := \frac{\mathbf{1}_{\{\delta_1 \le \tau \le \delta_2\}}}{\delta_2 - \delta_1} \tag{13} $$

where $\delta := \delta_2 - \delta_1$, and we discard the dependency on $\delta_1$ to lighten the notation. For instance, the futures contracts with a delivery period of one month presented in the previous section are now denoted as $F_t(T, \delta)$ with $\delta$ equal to one month.

Now, we are left with the problem of deriving the dynamics of $F_t(T, \delta)$ for different delivery periods $\delta$. We can integrate the instantaneous futures over the delivery period

$$ F_t(T, \delta) = F_0(T, \delta) \left( 1 - (1 - s_t(\delta))\, e^{-\int_t^T A(u, \delta)\,du} \right) \tag{14} $$

where we define

$$ s_t(\delta) := 1 - (1 - s_t)\, G(t, \delta), \qquad A(t, \delta) := a(t) - \partial_t \log G(t, \delta) \tag{15} $$

$$ G(t, \delta) := \frac{1}{F_0(t, \delta)} \int w(u - t, \delta)\, f_0(u)\, e^{-\int_t^u a(v)\,dv}\, du \tag{16} $$

We notice that the relationship between $F_t(T, \delta)$ and $s_t(\delta)$ is the same holding between $f_t(T)$ and $s_t$ up to a change in parameters. In particular, we have $f_t(T) = F_t(T, 0)$ and $s_t = s_t(0)$. We can calculate also the dynamics followed by the normalized spot price $s_t(\delta)$ corresponding to the delivery period $\delta$:

$$ ds_t(\delta) = A(t, \delta)\,(1 - s_t(\delta))\,dt + \eta(t, \delta, s_t(\delta))\, s_t(\delta)\, dW_t, \qquad s_0(\delta) = 1 \tag{17} $$

where

$$ \eta(t, \delta, k) := \left( 1 - \frac{1 - G(t, \delta)}{k} \right) \eta\!\left( t,\, 1 - \frac{1 - k}{G(t, \delta)} \right) \tag{18} $$

The following bounds are holding

$$ s_t(\delta) > 1 - G(t, \delta), \qquad 0 < G(t, \delta) \le 1 \tag{19} $$

Thus, we can calibrate the model to options on futures with a given delivery period $\bar\delta$, and we can imply the smile for other periods.
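The change of delivery period can be checked numerically. The sketch below is an illustration under simplifying assumptions that are not in the paper: a flat initial curve $f_0$, a constant mean reversion $a$, and a delivery window starting at $t$ (that is $\delta_1 = 0$), so that $G(t, \delta)$ in Equation (16) reduces to the average of $e^{-a\tau}$ over the window. It implements the mapping (18) and the relation between two delivery periods stated right after this passage as Equation (20). All function names are hypothetical.

```python
import math


def g_factor(a, delta, n=2000):
    """G(t, delta) of Equation (16) under the assumptions of this sketch:
    flat f_0, constant a, window [t, t + delta]; midpoint quadrature."""
    if delta == 0.0:
        return 1.0
    h = delta / n
    return sum(math.exp(-a * (i + 0.5) * h) for i in range(n)) * h / delta


def local_vol_for_period(eta, g):
    """Equation (18): local volatility of s_t(delta) obtained from that of s_t."""
    def eta_delta(t, k):
        return (1.0 - (1.0 - g) / k) * eta(t, 1.0 - (1.0 - k) / g)
    return eta_delta


def local_vol_from_reference(eta_ref, g, g_ref):
    """Equation (20): local volatility for period delta obtained from the one
    calibrated on the reference period (bar delta)."""
    def eta_delta(t, k):
        k_ref = 1.0 - (1.0 - k) * g_ref / g
        return (1.0 - (1.0 - g / g_ref) / k) * eta_ref(t, k_ref)
    return eta_delta
```

With these helpers, going from the spot local volatility to a one-day period directly through (18), or through a one-month reference period and then (20), should give the same number for any admissible $k$.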
For instance, in the natural gas market the only liquid options have as underlying asset futures contracts with a delivery period of one month. By a direct calculation we obtain a simple relationship linking the local volatilities of futures on different delivery periods. Indeed, we get

$$ \eta(t, \delta, k) = \left( 1 - \frac{1}{k} \left( 1 - \frac{G(t, \delta)}{G(t, \bar\delta)} \right) \right) \eta\!\left( t,\, \bar\delta,\, 1 - (1 - k)\, \frac{G(t, \bar\delta)}{G(t, \delta)} \right) \tag{20} $$

for $k > 1 - G(t, \delta)$. The previous formula allows us to imply volatility smiles for any delivery period. Notice that smiles for different delivery periods can be different only if $a(t) > 0$, since if $a(t) = 0$ then $G(t, \delta) = 1$ for any $\delta$ and Equation (20) reduces to $\eta(t, \delta, k) = \eta(t, \bar\delta, k)$.

Figure 4: NG TTF PVO on JUL18 futures quoted on 29 March 2018 on ICE market. Market (red dots) and model (blue lines) implied volatilities. Mean reversion equal to 0.[…] (Two panels. Axes: volatility against delta.)

When looking at daily futures contracts, we could consider the impact of spikes in the day-ahead market. Here, we limit ourselves to describing a strategy to include spikes in the dynamics of the fictitious spot price by adapting the results of Hambly et al. [2009] to the local-volatility linear model. We leave to future work the analysis of how to calibrate spike parameters to observed spikes in the day-ahead market.

Figure 5: NG TTF PVO quoted on 29 March 2018 on ICE market. Market (red dots) one-month futures PVO at-the-money volatilities. Model (blue lines) day-ahead futures PVO at-the-money implied volatilities. Mean reversion ranging from top to bottom from 1.5 to zero with a step of 0.5. (Axes: volatility against option expiry.)

We can model spikes under the risk-neutral measure as a pure spike price process given by:

$$ dy_t = -\gamma(t)\, y_t\, dt + \phi\, dN_t, \qquad y_0 = 0 \tag{21} $$

where $\gamma$ is a positive function, the amplitude $\phi$ is an exponentially distributed random variable with mean $\zeta$, and $N_t$ is a Poisson process with intensity $\lambda(t)$ under the risk-neutral measure. We assume also that the fictitious spot and the spike process are independent. The process $y_t$ can be explicitly integrated leading to

$$ y_t = \sum_{i=1}^{N_t} \phi_i \exp\left\{ -\int_{\tau_i}^{t} \gamma(u)\, du \right\} \tag{22} $$

where $\tau_i$ is the $i$-th jump time and $\phi_i$ is the corresponding amplitude realization. We can also calculate forward values in closed form as given by

$$ \mathbb{E}_t[\, y_T \,] = y_t\, e^{-\int_t^T \gamma(u)\,du} + h(t, T) \tag{23} $$

where

$$ h(t, T) := \zeta \int_t^T \lambda(u)\, e^{-\int_u^T \gamma(v)\,dv}\, du \tag{24} $$

Adding spikes must leave unaltered the initial term structure of futures prices, so that we define the spike-altered spot price as

$$ \bar s_t := \frac{f_0(t)}{1 + h(0, t)}\, (s_t + y_t), \qquad \bar f_t(T) := \mathbb{E}_t[\, \bar s_T \,] \tag{25} $$

where we assume that $s_t$ and $y_t$ are independent.
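The closed-form forward value (23)-(24) can be checked against a direct simulation of the explicit solution (22). The sketch below assumes constant spike parameters $\gamma$, $\lambda$ and $\zeta$, which is a simplification of the time-dependent setting above. The function names and the numerical values in the usage note are illustrative only.

```python
import math
import random


def simulate_spike(gamma, lam, zeta, t_end, rng):
    """One sample of y at t_end from Equation (22) with constant parameters:
    Poisson jump times with intensity lam, exponential amplitudes with mean zeta,
    each jump decaying at rate gamma until t_end."""
    y, t = 0.0, 0.0
    while True:
        t += rng.expovariate(lam)          # waiting time to the next jump
        if t > t_end:
            return y
        phi = rng.expovariate(1.0 / zeta)  # jump amplitude with mean zeta
        y += phi * math.exp(-gamma * (t_end - t))


def h_const(gamma, lam, zeta, t, T):
    """h(t, T) of Equation (24) when gamma and lambda are constant."""
    return zeta * lam * (1.0 - math.exp(-gamma * (T - t))) / gamma
```

For instance, with a fast decay (`gamma = 50`), four spikes per year on average and mean amplitude 0.5, the sample mean of `simulate_spike` over many draws should be close to `h_const(50.0, 4.0, 0.5, 0.0, 1.0)`, since $y_0 = 0$ in Equation (23).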
We can proceed with the calculation of instantaneous futures prices.

$$ \bar f_t(T) = f_0(T) \left( 1 - \frac{1 - s_t}{1 + h(0, T)}\, e^{-\int_t^T a(u)\,du} - \frac{h(0, t) - y_t}{1 + h(0, T)}\, e^{-\int_t^T \gamma(u)\,du} \right) \tag{26} $$

Then, we integrate over the weights $w(\tau, \delta)$ to obtain the futures prices on longer delivery periods.

$$ \bar F_t(T, \delta) = F_0(T, \delta) \left( 1 - (1 - s^h_t(\delta))\, e^{-\int_t^T A^h(u, \delta)\,du} - \left( H^{\gamma,h}(t, \delta) - y_t(\delta) \right) e^{-\int_t^T \Gamma^h(u, \delta)\,du} \right) \tag{27} $$

where we define

$$ s^h_t(\delta) := 1 - (1 - s_t)\, G^{a,h}(t, \delta), \qquad y_t(\delta) := y_t\, G^{\gamma,h}(t, \delta) \tag{28} $$

$$ A^h(t, \delta) := a(t) - \partial_t \log G^{a,h}(t, \delta), \qquad \Gamma^h(t, \delta) := \gamma(t) - \partial_t \log G^{\gamma,h}(t, \delta) \tag{29} $$

$$ H^{\gamma,h}(t, \delta) := h(0, t)\, G^{\gamma,h}(t, \delta) \tag{30} $$

in terms of the deterministic functions

$$ G^{a,h}(T, \delta) := \frac{1}{F_0(T, \delta)} \int w^h(u, T, \delta)\, f_0(u)\, e^{-\int_T^u a(v)\,dv}\, du \tag{31} $$

$$ G^{\gamma,h}(T, \delta) := \frac{1}{F_0(T, \delta)} \int w^h(u, T, \delta)\, f_0(u)\, e^{-\int_T^u \gamma(v)\,dv}\, du \tag{32} $$

with modified weights

$$ w^h(t, T, \delta) := \frac{w(t - T, \delta)}{1 + h(0, t)} \tag{33} $$

The calibration of plain-vanilla options on the longer-delivery futures prices $F_t(T, \delta)$ can be performed by mapping their prices onto the price of plain-vanilla options on the normalized spot process by conditioning on the process $y_t$, since it is independent of the process $s_t$. The spike density $p_{y_t}$ can be calculated starting from the moment generating function of $y_t$. A closed-form solution in the case of time-homogeneous spike parameters and in the limit of high spike decay and small spike frequency can be found in Hambly et al. [2009].

We set up in this numerical section the stochastic control problem required to get swing option prices, and we solve it by means of a least-square Monte Carlo (LSMC) simulation.
As a specific example we consider swing options traded in the TTF natural gas market.

The LSMC algorithm is particularly effective when we have to deal only with few risk factors, since the method requires a linear regression whose dimension grows rapidly as the number of risk factors increases. In our case we adopt a parsimonious model with only one risk factor. However, for a better description of curve and smile dynamics we could look at model extensions including additional risk factors, as described in Nastasi et al. [2018]. For this reason we will also investigate in Section 4 solutions which could be applied in higher-dimensional settings.

A swing option contract guarantees a flexible daily supply of gas with a delivery period of one month. The underlying contracts are the day-ahead futures, namely $F_{T_i}(T_{i+1}, \mathrm{1d})$ for each fixing date $T_1, \ldots, T_{n_f}$ within the delivery period, where the second argument (written here as 1d) denotes the one-day delivery period. At each fixing date the owner of the option is allowed to buy a quantity $N_{T_i}$ of gas within a daily range $[N_m, N_M]$ at a strike price $K$. The total consumption of gas must be within a total range $[C_m, C_M]$. The option price can be written as

$$ W_0 := \max_{N \in \mathcal{N}} \sum_{i=1}^{n_f} \mathbb{E}_0\!\left[ N_{T_i} \left( F_{T_i}(T_{i+1}, \mathrm{1d}) - K \right) \right] P_0(T_{p,i}; e) \tag{34} $$

where $P_0(T_{p,i}; e)$ is the price of a zero-coupon bond with yield $e_t$, and the consumption plan $N := \{N_{T_1}, \ldots, N_{T_{n_f}}\}$ can be chosen from a set $\mathcal{N}$ of plans subject to the following constraints.

$$ N_m \le N_{T_i} \le N_M, \qquad C_m \le \sum_{i=1}^{n_f} N_{T_i} \le C_M \tag{35} $$

The strike price of swing options can be known at inception, or fixed at a forward date. In the latter case it is calculated as the daily average of a specific one-month futures contract over the observation dates $t_1, \ldots, t_{n_s}$. For example, the strike price of a swing option with delivery in July 2018 is fixed by averaging the daily observations of the JUL18 one-month contract, namely we set

$$ K_{t_{n_s}} := \frac{1}{n_s} \sum_{j=1}^{n_s} F_{t_j}(\mathrm{JUL18}, \mathrm{1m}) $$

where the second argument (written here as 1m) denotes the one-month delivery period, $t_1$ is the 1st of June 2018 and $t_{n_s}$ is the 28th of the same month (last trading date).

We show in Figure 6 the JUL18 swing option on NG TTF day-ahead futures term-sheet data. In this example the strike price is set by observing one-month futures contracts.

Figure 6: Term-sheet data for a swing option contract with delivery in July 2018 in TTF natural gas market. The strike is fixed by averaging the JUL18 futures contract observed in the month of June. (Timeline: today in March, strike period in June, fixing period in July. Payoff at each fixing: $N_{T_i} ( F_{T_i}(T_{i+1}, \mathrm{1d}) - K_{t_{n_s}} )$.)

We can write the stochastic control problem underlying the pricing of a swing option contract by introducing the consumption strategy $N_{T_i}$, which represents the quantity of gas delivered in $T_i$, and by defining the total consumption up to time $T_i$ as given by

$$ C_{T_i} := \sum_{j=1}^{i} N_{T_j} \tag{36} $$

At each time $T_i$ in the simulation we have to solve the following control problem

$$ W_{T_i} = \max_{N_{T_i}} \left\{ N_{T_i} \left( F_{T_i}(T_{i+1}, \mathrm{1d}) - K \right) + \mathbb{E}\!\left[ W_{T_{i+1}}(N_{T_i})\, \frac{P_0(T_{p,i+1}; e)}{P_0(T_{p,i}; e)} \,\middle|\, F_{T_i}, C_{T_{i-1}} \right] \right\} \tag{37} $$

where $F_{T_i}(T_{i+1}, \mathrm{1d})$ is the day-ahead futures price (one-day delivery period, written here as 1d). In case the option is forward starting we should add to the conditioning factors also the strike price.

We can solve the control problem by means of a LSMC simulation. Here, we describe the details of our implementation, which can be split into three steps: (i) we build consumption grids, (ii) we estimate the value functions on the grids by means of regressions with a backward procedure, (iii) we compute the swing option price with a standard forward Monte Carlo simulation.

For simplicity, the method is described in the following by considering zero interest rates and fixed strike prices.

We start by constructing the consumption grid by taking care that the points corresponding to extreme choices of the amount to be consumed are included. We define the global constraint functions at each fixing date $T_i$ as given by

$$ U_i := \min\left( C_M - N_m\,(n_f - i),\; i\,N_M \right) \tag{38} $$

and

$$ D_i := \max\left( C_m - N_M\,(n_f - i),\; i\,N_m \right) \tag{39} $$

Here $U_i$ accounts for the fact that at least $N_m$ must still be consumed on each of the remaining $n_f - i$ dates, and $D_i$ for the fact that at most $N_M$ can be consumed on each of them. At each date $T_i$ the total consumption must be within such values: $D_i \le C_{T_i} \le U_i$. We define $C^i$ as the vector representing the consumption grid at fixing date $T_i$. The consumption grid is built by following Algorithm 1.

On each date $T_i$ we start by adding the points allowed by a bang-bang strategy. We use the term bang-bang as in Jaillet et al. [2004] to indicate a strategy which on each date $T_i$ is consuming the minimum or the maximum amount of commodity according to all the constraints.
Thus, defining the starting consumption $C^0$ as a vector with a single component equal to zero, at time $T_1$ we have only two possible bang-bang states given by $N_m$ and $N_M$ (unless the global constraints are tighter). On the following date $T_2$ the grid has three bang-bang points, obtained by starting from the consumption levels of the previous date and consuming the minimum or maximum allowed quantities, and so on. Then, we refine the grid in between the bang-bang points to allow for intermediate choices (continuous consumption strategy). The resulting grids are depicted in Figure 7.

Algorithm 1: Algorithm to build the consumption grid. Operations from 9 to 11 are performed only when the strategy is continuous.

     1: procedure Grid({T_i}_{i=1}^{n_f}, N_m, N_M, C_m, C_M, Δ)
     2:     C^0 := [0]                                      ▷ Starting consumption
     3:     for i = 1 to n_f do
     4:         for x in C^{i-1} do                         ▷ Bang-bang points
     5:             append min(U_i, x + N_M) to C^i
     6:             append max(D_i, x + N_m) to C^i
     7:         end for
     8:         C^i = unique(C^i)                           ▷ Sort the grid and erase duplicates
     9:         for j in length(C^i) - 1 do
    10:             append unif(C^i_j, C^i_{j+1}, Δ) to C^i ▷ Thicken the grid
    11:         end for
    12:         C^i = unique(C^i)                           ▷ Sort the grid and erase duplicates
    13:     end for
    14: end procedure
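Algorithm 1 translates almost line by line into code. The following sketch is one possible reading, not the authors' implementation: the bounds $U_i$ and $D_i$ are taken as the cumulative limits compatible with the daily constraints on the remaining dates, `unif` is interpreted as adding equally spaced points with spacing `step` between neighbouring bang-bang points, and rounding to ten decimals is used to merge duplicates. The function name is hypothetical.

```python
import math


def consumption_grids(n_f, n_min, n_max, c_min, c_max, step=None):
    """Consumption grids C^0, ..., C^{n_f} in the spirit of Algorithm 1.

    step is the refinement spacing (Delta); None keeps the bang-bang points only.
    """
    grids = [[0.0]]                                        # C^0: nothing consumed yet
    for i in range(1, n_f + 1):
        upper = min(c_max - n_min * (n_f - i), i * n_max)  # U_i
        lower = max(c_min - n_max * (n_f - i), i * n_min)  # D_i
        points = set()
        for x in grids[-1]:                                # bang-bang points
            points.add(round(min(upper, x + n_max), 10))
            points.add(round(max(lower, x + n_min), 10))
        grid = sorted(points)
        if step is not None:                               # thicken the grid
            for left, right in zip(grid[:-1], grid[1:]):
                n_extra = int(math.floor((right - left) / step - 1e-9))
                for m in range(1, n_extra + 1):
                    points.add(round(left + m * step, 10))
            grid = sorted(points)
        grids.append(grid)
    return grids
```

For example, `consumption_grids(3, 0, 1, 1, 2)` builds the bang-bang grids for three fixing dates with daily range $[0, 1]$ and global range $[1, 2]$, while passing `step=0.5` adds the intermediate half-unit points.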

Now, we proceed by describing the simulation algorithm. We start by producing a Monte Carlo simulation on dates $T_i$ for the day-ahead futures prices according to Equations (14) and (17), and we build the consumption grids $C^i$ according to the previous algorithm. We call $F^{(k)}_{T_i}$ the $i$-th fixing of the $k$-th simulation. For each point $C^{i-1}_\ell$ of the grid at the previous time, we introduce the set $\mathcal{N}(C^{i-1}_\ell)$ of all the possible consumption levels, whose $j$-th element can be defined as

$$ N_j(C^{i-1}_\ell) := C^i_j - C^{i-1}_\ell \tag{40} $$

and the set $Q_{T_i}(C^{i-1}_\ell)$ of admissible consumption levels given global and local constraints relative to the $\ell$-th point of the grid

$$ Q_{T_i}(C^{i-1}_\ell) := \left\{ N \in \mathcal{N}(C^{i-1}_\ell) \cap [N_m, N_M] \;\middle|\; C^{i-1}_\ell + N \in [C_m, C_M] \right\} \tag{41} $$

Then, we can write the control problem on the grid as given by

$$ W_{T_i}\!\left( F^{(k)}_{T_i}, C^{i-1}_\ell \right) = \max_{N \in Q_{T_i}(C^{i-1}_\ell)} \left\{ N \left( F^{(k)}_{T_i} - K \right) + \mathbb{E}\!\left[ W_{T_{i+1}}\!\left( F_{T_{i+1}}, C^{i-1}_\ell + N \right) \,\middle|\, F_{T_i} = F^{(k)}_{T_i} \right] \right\} \tag{42} $$

with terminal condition

$$ W_{T_{n_f}}\!\left( F^{(k)}_{T_{n_f}}, C^{n_f - 1}_j \right) = \begin{cases} \min\left( U_{n_f} - C^{n_f - 1}_j,\; N_M \right) \left( F^{(k)}_{T_{n_f}} - K \right) & F^{(k)}_{T_{n_f}} > K \\[4pt] \max\left( D_{n_f} - C^{n_f - 1}_j,\; N_m \right) \left( F^{(k)}_{T_{n_f}} - K \right) & F^{(k)}_{T_{n_f}} \le K \end{cases} \tag{43} $$

that is, on the last date we consume as much as allowed when the payoff is positive and as little as allowed otherwise.

Figure 7: […] Left and right panels show the bang-bang and continuous cases respectively.

We can solve the problem backward in time by starting from the terminal condition in $T_{n_f}$, and proceeding to the previous steps by numerically evaluating the forward expectation in the right-hand side of Equation (42) by means of the Monte Carlo simulation.
We call $f_{T_i}(F; C^{i-1}_\ell + N)$ the estimate of such forward expectation

$$ \mathbb{E}\!\left[ W_{T_{i+1}}\!\left( F_{T_{i+1}}, C^{i-1}_\ell + N \right) \,\middle|\, F_{T_i} = F^{(k)}_{T_i} \right] \approx f_{T_i}\!\left( F; C^{i-1}_\ell + N \right) \tag{44} $$

and we suppose that it is quadratic with respect to day-ahead futures prices:

$$ f_{T_i}(F; C) := \alpha_i(C) + \beta_i(C)\, F + \gamma_i(C)\, F^2 \tag{45} $$

We note that $C^{i-1}_\ell + N \in C^i$ by construction, hence we can estimate the coefficients by regressing for each $j$ the realizations

$$ y^{(k)} := W_{T_{i+1}}\!\left( F^{(k)}_{T_{i+1}}, C^i_j \right) \tag{46} $$

against

$$ x^{(k)} := F^{(k)}_{T_i} \tag{47} $$

The control problem at each time $T_i$ before the terminal condition is then solved by replacing the estimate just performed (45) in place of the forward expectation. The procedure just described is repeated backward in time until the first fixing.

Once the backward procedure is completed we have calculated the coefficients $\alpha_i$, $\beta_i$ and $\gamma_i$ on each grid date $T_i$, which allows us to approximate the forward expectation given by Equation (45) on any scenario. Thus, in order to avoid biases, we proceed by sampling a second Monte Carlo simulation for day-ahead futures prices. On each scenario $k$ of the second simulation and on each date $T_i$ we calculate $F^{(k)}_{T_i}$. Starting from the first fixing, at each step, being at a certain point $C^{(k)}_{i-1}$ on the grid $C^{i-1}$, we choose the quantity to consume $\hat N^{(k)}_i$ by solving the optimization problem (42) with the coefficients calculated in the previous simulation. This step takes us to the point $C^{(k)}_i = C^{(k)}_{i-1} + \hat N^{(k)}_i$. Repeating the step described until reaching the last fixing we get the reward

$$ R^{(k)}_{T_{n_f}} := \sum_{i=1}^{n_f} \hat N^{(k)}_i \left( F^{(k)}_{T_i} - K \right) \tag{48} $$

Hence, the swing option price is given by averaging the rewards.

$$ W_{T_0}(F_{T_0}; 0) = \mathbb{E}_{T_0}\!\left[ R_{T_{n_f}} \right] \tag{49} $$

We are now ready to calculate the price of swing options with the local-volatility linear model by using the LSMC algorithm. It is our aim to highlight the impact of the mean-reversion parameter on swing option prices. Moreover, we wish to show that Theorem 2 in Bardou et al. [2009] holds, and that the LSMC algorithm is able to select the optimal strategies in agreement with the theorem.

We consider for our numerical analysis futures contracts on the TTF natural gas; quotations are expressed in €/MWh. We calibrate our model to PVO quoted on 29 March 2018 on ICE market. We consider swing option contracts with delivery ranging from May 2018 up to June 2019. We consider both fixed-strike options and floating-strike options with at-the-money strike. The strike price is calculated by considering the one-month futures contract […]

$$ N_m = […], \qquad N_M = […] \tag{50} $$

$$ C_m = […], \qquad C_M = 20\ \text{MWh} \tag{51} $$

Notice that we choose the daily constraints without loss of generality by following what is usually done in the literature, see Bardou et al. [2009]. Different choices can be obtained by simply scaling all the relevant quantities.
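The backward regression and the forward pass described above can be sketched in a few lines. This is a toy illustration under assumptions that are not the paper's setup: the day-ahead price is replaced by a normalized spot with constant mean reversion and constant volatility, interest rates are zero, the daily range is $[0, 1]$ and the global constraints are integers, so that only bang-bang choices on an integer grid are considered. The regression is run on $F - K$ rather than $F$, which spans the same quadratic functions. All names are hypothetical.

```python
import math
import random


def quad_fit(xs, ys):
    """Least-squares (alpha, beta, gamma) for y ~ alpha + beta x + gamma x^2,
    via the 3x3 normal equations solved by Gauss-Jordan elimination."""
    s = [sum(x ** p for x in xs) for p in range(5)]
    b = [sum(y * x ** p for x, y in zip(xs, ys)) for p in range(3)]
    m = [[s[i + j] for j in range(3)] + [b[i]] for i in range(3)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]


def simulate_paths(n_paths, n_f, a, vol, rng, dt=1.0 / 365.0):
    """Daily Euler paths of the toy normalized price, one value per fixing."""
    paths = []
    for _ in range(n_paths):
        s, path = 1.0, []
        for _ in range(n_f):
            s += a * (1.0 - s) * dt + vol * s * rng.gauss(0.0, math.sqrt(dt))
            path.append(s)
        paths.append(path)
    return paths


def bounds(i, n_f, c_min, c_max):
    """(D_i, U_i) of Equations (38)-(39) for N_m = 0 and N_M = 1."""
    return max(c_min - (n_f - i), 0), min(c_max, i)


def best_choice(i, c, f, strike, n_f, c_min, c_max, coef):
    """Maximise N (F - K) + continuation over admissible N in {0, 1}, as in (42)."""
    d, u = bounds(i, n_f, c_min, c_max)
    best = None
    for n in (0, 1):
        if d <= c + n <= u:
            cont = 0.0
            if i < n_f:                      # fitted continuation value, as in (45)
                al, be, ga = coef[i][c + n]
                x = f - strike
                cont = al + be * x + ga * x * x
            total = n * (f - strike) + cont
            if best is None or total > best[0]:
                best = (total, n)
    return best


def lsmc_swing(n_f, c_min, c_max, strike, a, vol, n_paths=2000, seed=0):
    rng = random.Random(seed)
    train = simulate_paths(n_paths, n_f, a, vol, rng)
    coef = [dict() for _ in range(n_f + 1)]
    for i in range(n_f - 1, 0, -1):          # backward regressions
        d, u = bounds(i, n_f, c_min, c_max)
        xs = [p[i - 1] - strike for p in train]
        for c in range(d, u + 1):            # integer grid C^i
            ys = [best_choice(i + 1, c, p[i], strike, n_f, c_min, c_max, coef)[0]
                  for p in train]
            coef[i][c] = quad_fit(xs, ys)
    test = simulate_paths(n_paths, n_f, a, vol, rng)   # second, independent sample
    total = 0.0
    for p in test:                           # forward pass with frozen coefficients
        c = 0
        for i in range(1, n_f + 1):
            _, n = best_choice(i, c, p[i - 1], strike, n_f, c_min, c_max, coef)
            total += n * (p[i - 1] - strike)
            c += n
    return total / n_paths
```

Two limiting cases give a quick sanity check: with `c_min = c_max = n_f` the holder must consume on every date and the at-the-money price is close to zero, while with `c_min = 0` and `c_max = n_f` the contract is a strip of call options and its price bounds from above the price of any constrained contract evaluated on the same paths.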

We start by analyzing the impact of the mean-reversion speed on swing option prices. In Figure 8 we show the swing option prices for different delivery periods; each period corresponds to the delivery of a futures contract quoted in the market. The trend of the price of the fixed-strike options with respect to the delivery month can be easily understood: as time increases, the option increases its time value, which translates into an increase in price. The increasing trend as a function of the mean-reversion speed is instead explained by the two graphs in Figure 4. This picture shows that the volatility of the day-ahead contract (i.e. one-day delivery period) implied by the model increases as the mean-reversion speed increases, which translates into an increase in the price of the swing option. On the contrary, if we look at the prices of the forward-start options for mean-reversion speed equal to 0, we note that the price trend reproduces the shape of the at-the-money market volatility in Figure 5. This is due to the fact that the volatility of the monthly futures observed during the strike period is equal to the volatility of the day-ahead contract. By increasing the mean reversion, instead, we have two opposite effects: the volatility of the strike decreases as shown in Figure 2, while the volatility of the day-ahead contract, as already said, increases. As a result the forward volatility relative to the fixing period increases, producing increasing prices as the mean-reversion speed increases.

In the following sections we will focus on specific numerical problems, so that we will consider only the case of a fixed-strike option delivering in May 2018. Moreover, we set the mean-reversion speed to 1.
Figure 8: Swing option prices by varying the delivery starting date. Mean reversion ranging from top to bottom from 1.5 to 0 with a step of 0.5. Left panel fixed-strike contracts. Right panel floating-strike contracts. (Axes: swing prices against delivery month, MAY18 to JAN19.)

We continue by investigating the strategies selected by the LSMC algorithm. We recall that Theorem 2 in Bardou et al. [2009] describes the structure of optimal strategies for swing options when the consumption levels have a specific form. In particular, it states that if the minimum global constraint and the difference between the global constraints can be expressed as an integer multiple of the difference between the daily constraints, then the optimal strategy on all dates is consuming the daily minimum or maximum (that is, the strategy is of bang-bang type). For instance, the swing option contract used for the example of Figure 8 does not satisfy the theorem, while the same contract with integer values for the global constraints is within the theorem since the difference between the local constraints is 1.

We start from one of the cases studied in the previous section (fixed-strike delivering in May 2018 with $a = 1$) […]

[Table] […] $C_m = […]$ […] $C_m = 12$ MWh so that the Theorem is satisfied. Prices are calculated either allowing all possible strategies or only the bang-bang ones. One-sigma statistical errors are displayed. (Table entries are not legible beyond the fragments "7.", "8." and "± .04".)

1. […]
2. Without limitations on the strategies we change the minimum global constraint to 12 MWh, so that now we satisfy the hypotheses of the Theorem. In the bottom-left entry of the table we report the price of this scenario. The price is now bigger since the global constraints are wider.
3. With the constraint of the previous scenario we limit the allowed strategies to be only of bang-bang type. We expect not to see a reduced price in this case since we forbid consumption choices which are not selected for the optimal strategy. Indeed, in the bottom-right entry of the table we can see that the price is unchanged.

We support our discussion by showing in Figure 9 the graph of daily consumption $N_{T_i}$ selected by the optimal strategy on a particular simulation path, when we assume that the minimum global constraint is either 12.[…]

Figure 9: […] $a = 1$. The blue lines refer to the case where only bang-bang strategies are allowed, while red lines to the case without restrictions. In the bang-bang case (right panel) the two lines coincide.

As we have seen, swing option pricing requires solving a stochastic control problem with a continuous set of actions. Standard techniques rely on regression-based simulations whose performance may degrade when the dimensionality of the problem increases. In particular, if we wish to extend our analysis to long-dated options, we should introduce more driving factors to deal with the curve dynamics and possibly with the volatility dynamics. Indeed, in Nastasi et al. [2018] we extend the local-volatility linear model in this direction.

In the literature different techniques are investigated starting from the results of Barrera-Esteve et al. [2006] on the form of the optimal consumption strategy. These authors prove that in the case of differentiable constraints the optimal strategy has the so-called bang-bang form, namely on each date the optimal strategy is delivering the minimum or the maximum allowed by the constraints. In Bardou et al. [2009] such result is extended also to sharp constraints when the contract specifics have very particular forms. For contracts with a bang-bang optimal strategy it is possible to simplify the stochastic control problem since we have only two choices at each date, leading to a simpler LSMC algorithm.

Here, we wish to price swing options with an alternative algorithm based on reinforcement learning (RL) techniques. RL has been introduced in finance to assist the trading activity. See for instance Kolm and Ritter [2019]. See also Becker et al. [2019] for applications to American options. In Barrera-Esteve et al. [2006] RL is already considered as a possible pricing tool for swing options. In our approach we use the recently developed proximal policy optimization (PPO) algorithm proposed in Schulman et al. [2017].

RL describes how an agent behaves in an environment so as to maximize some notion of cumulative reward. The actions of the agent as a function of his observations of the environment are termed the agent policy. In our case the agent can choose the amount of commodity to be delivered within the contract limits, so that the policy is the consumption strategy, while the rewards are the cash flows generated by holding the swing option.
Once the agent is trained and the optimal policy is selected, we can run a Monte Carlo simulation to calculate the swing option price.
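The pricing step just described can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the driftless lognormal dynamics for the day-ahead price, the unit discount factors, and the names `admissible_range`, `threshold_policy` and `price_with_policy` are all introduced here for the example. A trained agent would take the place of `threshold_policy`.

```python
import math
import random


def admissible_range(consumed, dates_after_today, n_min, n_max, c_min, c_max):
    """Daily consumption bounds compatible with daily and global constraints."""
    # The global minimum must stay reachable with maximal future consumption.
    low = max(n_min, c_min - consumed - dates_after_today * n_max)
    # The global maximum must not be breached even with minimal future consumption.
    high = min(n_max, c_max - consumed - dates_after_today * n_min)
    return low, high


def threshold_policy(date_index, price, consumed, strike, n_min, n_max):
    """Hypothetical stand-in for a trained agent: consume the maximum when in the money."""
    return n_max if price > strike else n_min


def price_with_policy(policy, f0, strike, sigma, n_dates,
                      n_min, n_max, c_min, c_max, n_paths=10000, seed=0):
    """Monte Carlo average of the cash flows n * (F - K) generated by a fixed policy."""
    rng = random.Random(seed)
    dt = 1.0 / 365.0
    total = 0.0
    for _ in range(n_paths):
        price, consumed, cash = f0, 0.0, 0.0
        for i in range(n_dates):
            # Assumed toy dynamics (driftless lognormal), not the model of the paper.
            shock = rng.gauss(0.0, 1.0)
            price *= math.exp(-0.5 * sigma ** 2 * dt + sigma * math.sqrt(dt) * shock)
            low, high = admissible_range(consumed, n_dates - i - 1,
                                         n_min, n_max, c_min, c_max)
            # Project the policy's choice onto the admissible interval.
            choice = policy(i, price, consumed, strike, n_min, n_max)
            n = min(max(choice, low), high)
            cash += n * (price - strike)  # discounting omitted (D = 1 assumed)
            consumed += n
        total += cash
    return total / n_paths
```

For instance, `price_with_policy(threshold_policy, 20.0, 20.0, 0.5, 5, 0.0, 1.0, 0.0, 5.0)` prices a five-day contract whose global constraints never bind, so that the result is a strip of daily call payoffs.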

We consider as before a discrete time-grid of fixing times $T_1, \ldots, T_{n_f}$. The algorithm we chose for the training of the agent belongs to the family of actor-critic algorithms. In particular, in our setting, this means that the agent uses a parametric function with parameters $\theta$ to calculate both the quantity $N^\theta_{T_i}$ to consume at time $T_i$, and its best estimate of the value function $V^\theta_{T_i}$; the latter represents the expected value of future rewards, which will match the option price $W_{T_i}$ for optimal $N^\theta$. The agent makes its decision by observing the environment given by the fixing time $T_i$, the day-ahead futures price $F_{T_i} := F_{T_i}(T_{i+1}, \ldots)$, and the total quantity of gas $C^\theta_{T_{i-1}}$ consumed up to time $T_{i-1}$. We represent in Figure 10 the relationships between the agent and the environment.

We adopt as learning strategy the PPO algorithm developed in Schulman et al. [2017]. This algorithm is well-suited for continuous control problems; we use the implementation of the algorithm found in OpenAI Baselines, https://github.com/openai/baselines. The PPO algorithm collects a small batch of experiences interacting with the environment to update its decision-making policy. The expected reward and the value function of a new policy are estimated by sampling from the environment. A brief overview of how this is done is provided here below.

In the PPO algorithm the policies are randomized, so they are defined as probability distributions on the set of possible actions. In our case they represent the probability of a specific gas consumption on each fixing date given the value of the day-ahead futures contract and the total level of gas consumption up to the previous fixing date. In case of forward-strike swing options we have to include also the strike level.

Figure 10: Agent description. The agent reads the state $\{T_i, F_{T_i}, C^\theta_{T_{i-1}}\}$ from the environment, outputs $\{N^\theta_{T_i}, V^\theta_{T_i}\}$ and gets the reward $N^\theta_{T_i}(F_{T_i} - K)$; the consumption is updated and cumulated, and the environment is simulated to the next state $\{T_{i+1}, F_{T_{i+1}}, C^\theta_{T_i}\}$, with reward $N^\theta_{T_{i+1}}(F_{T_{i+1}} - K)$.

If the space of controls is a continuum, the algorithm considers the actions to be random variables $\tilde N^\theta_{T_i}$ distributed according to independent Gaussian distributions $\phi$ centered on the value $N^\theta_{T_i}$, which is determined by a neural network:

$$\pi^\theta_{T_i}(n) := \mathbb{Q}\left\{ \tilde N^\theta_{T_i} = n \,\middle|\, s_{T_i}, C^\theta_{T_{i-1}} \right\} = \phi\left(n; N^\theta_{T_i}, \xi_i\right) \tag{52}$$

where the variances $\xi_i$ are added to the set $\theta$ of parameters which are subject to optimization. Policies are identified by the PPO algorithm with these densities.

If we wish to limit the allowed strategies only to the bang-bang ones, we can simply restrict the action space to a discrete set. Hence, the neural network will directly return the vector of probabilities of each single admissible action. At the end of the training phase, the candidate optimal agent will take as $N^\theta_{T_i}$ the action with maximum probability as determined by the network.

Starting from the swing option control problem, we can define the action-value function if a specific action is taken at time $T_i$ as

$$\tilde Q^\theta_{T_i}(n) := \mathbb{E}\left[ r_{T_i}(n) + \tilde Q^\theta_{T_{i+1}}\big(\tilde N^\theta_{T_{i+1}}\big)\, D(T_i, T_{i+1}) \,\middle|\, F_{T_i}, C_{T_{i-1}} \right] \tag{53}$$

where $r_{T_i}(n) := n\,(F_{T_i} - K)$ is the reward at time $T_i$, and the filtration is extended to incorporate also the uncertainty in the actions.
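The randomized policy of equation (52) and the discrete bang-bang alternative can be illustrated with a short sketch. The function names below are hypothetical, and the central value `mean` is passed in directly rather than produced by a neural network, which is outside the scope of this example.

```python
import math
import random


def gaussian_density(n, mean, variance):
    """Density phi(n; mean, variance) of the randomized action, as in equation (52)."""
    return math.exp(-0.5 * (n - mean) ** 2 / variance) / math.sqrt(2.0 * math.pi * variance)


def sample_action(mean, variance, rng):
    """Draw an exploratory consumption around the central value chosen by the agent."""
    return rng.gauss(mean, math.sqrt(variance))


def greedy_bang_bang(probabilities, actions):
    """Discrete case: pick the admissible action with maximum probability."""
    best = max(range(len(actions)), key=lambda k: probabilities[k])
    return actions[best]
```

During training the agent would call `sample_action` and record `gaussian_density` for the sampled value; after training, the continuous agent uses the central value itself, while the discrete agent uses `greedy_bang_bang`.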
If the action taken at time $T_i$ is integrated over all the possible choices, we can write the value function as

$$\tilde V^\theta_{T_i} := \mathbb{E}\left[ r_{T_i}\big(\tilde N^\theta_{T_i}\big) + \tilde V^\theta_{T_{i+1}}\, D(T_i, T_{i+1}) \,\middle|\, F_{T_i}, C_{T_{i-1}} \right] \tag{54}$$

The PPO algorithm acts on $\theta$ to increase the value of an objective function which is made up of two main components, $L^A$ and $L^V$. The first component $L^A$ measures the goodness of the policy, and is related to the so-called advantage

$$A^\theta_{T_i} := \tilde Q^\theta_{T_i}\big(\tilde N^\theta_{T_i}\big) - \tilde V^\theta_{T_i} \tag{55}$$

which has the property that the gradient of the expected reward equals

$$\nabla_\theta\, \mathbb{E}\left[ A^{\bar\theta}_{T_i}\, \frac{\pi^\theta_{T_i}\big(\tilde N^{\bar\theta}_{T_i}\big)}{\pi^{\bar\theta}_{T_i}\big(\tilde N^{\bar\theta}_{T_i}\big)} \right] \Bigg|_{\bar\theta = \theta} \tag{56}$$

In practice one substitutes the unknown $A^\theta$ with a pathwise quantity which gives (approximately) the same gradient, namely

$$\hat A^\theta_i := \sum_{l=0}^{n_f - i - 1} D(T_i, T_{i+l})\, \lambda^l \left[ r_{T_{i+l}}\big(\tilde N^\theta_{T_{i+l}}\big) + D(T_{i+l}, T_{i+l+1})\, V^\theta_{T_{i+l+1}} - V^\theta_{T_{i+l}} \right] \tag{57}$$

Note that if $\lambda = 1$ the sum telescopes and $\hat A^\theta_i$ reduces, up to the baseline $V^\theta_{T_i}$, to the sum of realized discounted rewards, while if $\lambda < 1$ we reduce the weight in $L^A$ of rewards which are far in the future. This is done to get lower variance. For details, see Schulman et al. [2016].

Instead, the second component $L^V$ of the objective function measures how well $V^\theta$ represents the value function $\tilde V^\theta$ of the policy $\pi^\theta$.

We illustrate in Figure 11 the PPO algorithm.
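The pathwise estimator (57) is usually evaluated with a backward recursion rather than with the explicit sum. The sketch below assumes deterministic discount factors, so that $D(T_i, T_{i+l})$ factorizes into one-period factors, and a value estimate that is supplied also for the date after the last reward (typically zero at maturity). The function name and the argument layout are illustrative and are not taken from the paper or from OpenAI Baselines.

```python
def gae_advantages(rewards, values, discounts, lam):
    """Generalized advantage estimates along one simulated episode.

    rewards[j]   : realized reward at the j-th fixing date
    values[j]    : value estimate at the j-th date; values[-1] is the terminal value
    discounts[j] : one-period discount factor between date j and date j + 1
    lam          : the lambda parameter weighting distant rewards
    """
    n = len(rewards)
    advantages = [0.0] * n
    running = 0.0
    for j in reversed(range(n)):
        # One-step temporal-difference residual: the bracket in equation (57).
        delta = rewards[j] + discounts[j] * values[j + 1] - values[j]
        # Backward recursion: A_j = delta_j + D_j * lambda * A_{j+1}.
        running = delta + discounts[j] * lam * running
        advantages[j] = running
    return advantages
```

With `lam = 1` and a zero terminal value, `advantages[0] + values[0]` equals the discounted sum of the realized rewards, which is the telescoping property mentioned in the text; with `lam = 0` each advantage is just the one-step residual.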
Each PPO batch is formed by episodes in which the state is simulated up to the swing option maturity. The agent interacting with the environment calculates on each fixing date $t$ the policy density and the advantage for selecting an action $\tilde N^\theta_t$ at such time.

Figure 11: An episode is a simulation of the state up to the swing option maturity. The agent interacting with the environment calculates on each fixing date $t$ the policy density and the advantage, $\{\pi^\theta_t, A^\theta_t\}$, for selecting an action $\tilde N^\theta_{T_i}$ at such time.

After the sampling process a new policy is proposed using a Stochastic Gradient Descent (SGD) with respect to the $\theta$ parameters:

$$\theta_{k+1} = \theta_k - \rho \cdot \mathbb{E}\left[ \nabla_\theta \sum_{i=1}^{n_f} \left( L^A_{T_i}(\theta) - \beta L^V_{T_i}(\theta) \right) \Bigg|_{\theta = \theta_k} \right] \tag{58}$$

$$L^A_{T_i}(\theta) := \min\left\{ \frac{\pi^\theta_{T_i}\big(\tilde N^{\theta_k}_{T_i}\big)}{\pi^{\theta_k}_{T_i}\big(\tilde N^{\theta_k}_{T_i}\big)}\, \hat A^{\theta_k}_i,\; \operatorname{clip}\left( 1 - \varepsilon,\, \frac{\pi^\theta_{T_i}\big(\tilde N^{\theta_k}_{T_i}\big)}{\pi^{\theta_k}_{T_i}\big(\tilde N^{\theta_k}_{T_i}\big)},\, 1 + \varepsilon \right) \hat A^{\theta_k}_i \right\} \tag{59}$$

$$L^V_{T_i}(\theta) := \left( V^\theta_{T_i} - \big( \hat A^{\theta_k}_i + V^{\theta_k}_{T_i} \big) \right)^2 \tag{60}$$

where the expected value is estimated over a batch of episodes, $\rho$ is a learning rate, $\operatorname{clip}(a, x, b)$ is the clip function (capped and floored linear function), while $\varepsilon$ and $\beta$ are hyper-parameters.

One of the key ideas of PPO is to ensure that a new policy update is "close" to the previous policy by clipping the advantages. The rationale behind this choice is to keep the new policy $\pi^{\theta_{k+1}}$ within a neighbourhood of the old one $\pi^{\theta_k}$ where one can trust both the first order approximation to the objective function given by the stochastic gradient, and the function $V^{\theta_k}$ used in the estimation of the advantage. Once the policy is updated, the experiences are thrown away and a newer batch is collected with the newly updated policy.

We focus in this example on a swing option contract with at-the-money fixed strike.
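As a side illustration of the objective in equations (58)-(60), the two per-date components reduce to a few arithmetic operations once the probability ratio and the advantage of a sampled action are known. The function names below are hypothetical, and the squared form of the value component is an assumption in line with common PPO implementations.

```python
def clip(a, x, b):
    """Capped and floored linear function: x restricted to the interval [a, b]."""
    return max(a, min(x, b))


def policy_component(ratio, advantage, eps):
    """L^A for one sample, with ratio = pi_theta(n) / pi_theta_k(n) as in equation (59)."""
    return min(ratio * advantage, clip(1.0 - eps, ratio, 1.0 + eps) * advantage)


def value_component(v_new, v_old, advantage):
    """L^V for one sample: squared distance of V_theta from the target A_hat + V_theta_k."""
    return (v_new - (advantage + v_old)) ** 2


def batch_objective(samples, eps, beta):
    """Average of L^A - beta * L^V over (ratio, advantage, v_new, v_old) tuples."""
    total = 0.0
    for ratio, advantage, v_new, v_old in samples:
        total += policy_component(ratio, advantage, eps)
        total -= beta * value_component(v_new, v_old, advantage)
    return total / len(samples)
```

With a positive advantage, raising the ratio above $1 + \varepsilon$ brings no further gain, since the clipped term becomes the minimum; with a negative advantage and the same ratio, the unclipped and more pessimistic term is retained.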
The mean reversion is equal to 1 in all experiments. Several PPO hyper-parameters will be kept fixed to the following values: $\lambda = \ldots$, $\varepsilon = \ldots$, $\rho = \ldots$. The parameters $\theta$ are updated once every 2048 training episodes.

The network inputs are $T_i$ expressed as a year fraction, the total consumption to-date remapped linearly at each time so that its domain is always $[-\ldots, \ldots]$, and $\log(F_{T_i}/F_{T_{\ldots}})$. In this way all inputs are well normalized, which helps the training of the network. The output layer is linear, i.e. no activation function is applied.

Both when the hypotheses which guarantee the existence of bang-bang optima are satisfied, and when they are not, we can allow for general $[N_m, N_M]$-valued actions; in the former case, the learning algorithm should find out by itself that the best strategy only involves bang-bang consumptions. The continuous-valued consumption is obtained by clipping the network's output to the interval $[0, 1]$ and then remapping the result linearly so that 0 and 1 correspond respectively to the minimum and maximum admissible consumptions given both daily and global constraints. When we want to force bang-bang strategies instead, only the minimum and maximum are considered as admissible actions, and a softmax layer remaps to probabilities the output units corresponding to these two actions.

We explored several possible architectures of the neural network to investigate whether the architecture impacts the final price and/or the number of iterations required for convergence. To this aim, we considered an option with a comparatively short delivery period of one week, and constraints $N_m = \ldots$, $N_M = \ldots$, $C_m = \ldots$, $C_M = \ldots$, with $\beta$ fixed to 0.…

Figure 12: … episodes. The shadows represent the 98% confidence intervals.

Table 2: PPO prices for two values of the minimum global constraint, the second being $C_m = 12$ MWh so that the Theorem is satisfied. Prices are calculated either allowing all possible strategies or only the bang-bang ones. One-sigma statistical errors are displayed.
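The two output conventions described above (a clipped linear output for continuous actions, a softmax over two units for bang-bang actions) can be sketched as follows. The names are hypothetical; `low` and `high` stand for the minimum and maximum admissible consumptions on the current date, assumed to be already computed from the daily and global constraints.

```python
import math


def continuous_consumption(network_output, low, high):
    """Clip the raw output to [0, 1], then map 0 to low and 1 to high linearly."""
    u = min(max(network_output, 0.0), 1.0)
    return low + u * (high - low)


def softmax(logits):
    """Probabilities of the admissible actions (here: the minimum and the maximum)."""
    shift = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(x - shift) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]
```

Because the output layer is linear, raw outputs below 0 or above 1 are possible; the clipping sends them to the minimum or maximum admissible consumption, so that a bang-bang strategy can be represented exactly by the continuous agent.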

In this section, we consider the contracts with maturity of one month delivering in May 2018 which were analysed in sections 3.3 and 3.3.2 in the context of LSMC pricing.

After training, the option can be priced using either the LSMC or the RL candidate optimal policy. We therefore run a Monte Carlo simulation with 1 million paths to get the price.

We performed a grid search on the hyperparameter $\beta$ and found that $\beta = 0.01$ was more effective than the default $\beta = 0.5$, corresponding to slower updates of the network which approximates the value function. Moreover, since the objective function is not convex, we run each optimization four times with different random starting guesses for $\theta$, and then choose the optimized network with the best in-sample performance on the last 1,000,000 training episodes. The out-of-sample results of such network are shown in Table 2, and they are compatible with the LSMC results in Table 1 within statistical uncertainty.

We also see that the unconstrained PPO agent successfully identifies a strategy of bang-bang type for the case $C_m = 12$ MWh in which we know that it is optimal to do so. This is exemplified by Figure 13, where we fix a decision time and plot the chosen action as a function of the other two coordinates of the network input (i.e. normalized log-spot and consumption).

Figure 13: … $\log(F_{T_i}/F_{T_{\ldots}})$; total consumption to-date remapped linearly so that its domain is $[-\ldots, \ldots]$; today's consumption remapped linearly so that its domain is $[0, 1]$.

In this paper we presented a new model to price swing option contracts. The model is able to calibrate liquid market quotes and to imply the volatility smile for futures contracts with different delivery periods. We also show how to extend the model to include spikes into its dynamics. The pricing algorithm is implemented both by using a least-square Monte Carlo approach and by means of recent reinforcement learning algorithms, such as the proximal policy optimization algorithm. Using the former, we investigate option prices and optimal strategies for different configurations of the model, and we test the impact of constraining the choice of the control problem only to bang-bang strategies. We explore techniques based on reinforcement learning because we wish to investigate calculation tools more suitable in high-dimensional settings. We find that this novel technique also gives accurate results. This paper focuses on situations where other techniques are available as a benchmark, to gather evidence on the robustness of the approach; we leave for future developments the exploration of settings where it could be the only possibility.

References

D. Anderson. Iterative procedures for nonlinear integral equations. Journal of the ACM, 4(12):547–560, 1965.

O. Bardou, S. Bouthemy, and G. Pagès. Optimal quantization for the pricing of swing options. Applied Mathematical Finance, 16(2):183–217, 2009.

C. Barrera-Esteve, F. Bergeret, C. Dossal, E. Gobet, A. Meziou, R. Munos, and D. Reboul-Salze. Numerical methods for the pricing of swing options: a stochastic control approach. Methodology and Computing in Applied Probability, 8(4):517–540, 2006.

S. Becker, P. Cheridito, A. Jentzen, and T. Welti. Solving high-dimensional optimal stopping problems using deep learning. Working paper, 2019. URL arXiv.org.

F. Benth, J. Lempa, and T. Nilssen. On the optimal exercise of swing options in electricity markets. Journal of Energy Markets, 4(4):3–28, 2012.

F. Benth, M. Piccirilli, and T. Vargiolu. Additive energy forward curves in a Heath-Jarrow-Morton framework. Working paper, 2018. URL https://arxiv.org/abs/1709.03310.

H. Berestycki, J. Busca, and I. Florent. Asymptotics and calibration of local volatility models. Quantitative Finance, 2(1):61–69, 2002.

R. Carmona and N. Touzi. Optimal multiple stopping and valuation of swing options. Mathematical Finance, 18(2):239–268, 2008.

M. Eriksson, J. Lempa, and T. Nilssen. Swing options in commodity markets: A multidimensional Lévy diffusion model. Mathematical Methods of Operational Research, 79(1):31–67, 2013.

G. Haarbrücker and D. Kuhn. Valuation of electricity swing options by multistage stochastic programming. Management Science, 45(4):889–899, 2009.

B. Hambly, S. Howison, and T. Kluge. Modeling spikes and pricing swing options in electricity markets. Quantitative Finance, 9(8):937–949, 2009.

P. Jaillet, E. I. Ronn, and S. Tompaidis. Valuation of commodity-based swing options. Management Science, 50(7), 2004.

Kirkby and Deng. … International Journal of Theoretical and Applied Finance, 22(8), 2020.

P. Kolm and G. Ritter. Dynamic replication and hedging: A reinforcement learning approach. The Journal of Financial Data Science, 1(1):159–171, 2019.

E. Nastasi, A. Pallavicini, and G. Sartorelli. Smile modelling in commodity markets. Working paper, 2018. URL arXiv.org.

A. Reghai, G. Boya, and G. Vong. Local volatility: smooth calibration and fast usage. Working paper, 2012. doi: 10.2139/ssrn.2008215. URL https://ssrn.com/abstract=2008215.

J. Schulman, P. Moritz, M. Levine, S. Jordan, and P. Abbeel. High-dimensional continuous control using generalized advantage estimation. Proceedings of ICLR 2016, 2016. URL arXiv.org.

J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov. Proximal policy optimization algorithms. Working paper, 2017. URL arXiv.org.

A. C. Thompson. Valuation of path-dependent contingent claims with multiple exercise decisions over time: the case of take or pay. Journal of Financial and Quantitative Analysis, 30:271–293, 1995.

B. Zhang and C. Oosterlee. An efficient pricing algorithm for swing options based on Fourier cosine expansions.