Equal Risk Pricing of Derivatives with Deep Hedging

Abstract

This article presents a deep reinforcement learning approach to price and hedge financial derivatives. This approach extends the work of Guo and Zhu (2017) who recently introduced the equal risk pricing framework, where the price of a contingent claim is determined by equating the optimally hedged residual risk exposure associated respectively with the long and short positions in the derivative. Modifications to the latter scheme are considered to circumvent theoretical pitfalls associated with the original approach. Derivative prices obtained through this modified approach are shown to be arbitrage-free. The current paper also presents a general and tractable implementation for the equal risk pricing framework inspired by the deep hedging algorithm of Buehler et al. (2019). An ϵ-completeness measure allowing for the quantification of the residual hedging risk associated with a derivative is also proposed. The latter measure generalizes the one presented in Bertsimas et al. (2001) based on the quadratic penalty. Monte Carlo simulations are performed under a large variety of market dynamics to demonstrate the practicability of our approach, to perform benchmarking with respect to traditional methods and to conduct sensitivity analyses.


Alexandre Carbonneau†a and Frédéric Godinb

a,b Concordia University, Department of Mathematics and Statistics, Montréal, Canada

June 9, 2020


Keywords:

Reinforcement learning, Deep learning, Option pricing, Hedging, Convex risk measures.

∗ A GitHub repository with some examples of code can be found at github.com/alexandrecarbonneau.
∗ Alexandre Carbonneau gratefully acknowledges financial support from FRQNT. Frédéric Godin gratefully acknowledges financial support from NSERC (RGPIN-2017-06837).
† Corresponding author.

Email addresses: [email protected] (Alexandre Carbonneau), [email protected] (Frédéric Godin).

1 Introduction

Under the complete market paradigm, for instance as in Black and Scholes (1973) and Merton (1973), all contingent claims can be perfectly replicated with some dynamic hedging strategy. In such circumstances, the unique arbitrage-free price of an option must be the initial value of the replicating portfolio. However, in reality, markets are incomplete and perfect replication is typically impossible for non-linear derivatives. Indeed, many sources of market incompleteness are observed in practice, such as discrete-time rebalancing, liquidity constraints, stochastic volatility and jumps. In an incomplete market, it is often impracticable for a hedger to select a trading strategy that entirely removes risk, as doing so would typically entail unreasonable costs. For instance, Eberlein and Jacod (1997) show that the super-replication price of a European call option under a large variety of underlying asset dynamics is the initial underlying asset price. Thus, in practice, a hedger must accept the presence of residual hedging risk that is intrinsic to the contingent claim being hedged. The determination of option prices and hedging policies therefore depends on subjective assumptions regarding the risk preferences of market participants.

An incomplete market derivatives pricing approach that is extensively studied in the literature consists in the selection of a suitable equivalent martingale measure (EMM). As shown in the seminal work of Harrison and Pliska (1981), if a market is incomplete and arbitrage-free, there exists an infinite set of EMMs, each of which can be used to price derivatives through risk-neutral valuation. Popular examples of EMMs in the literature include the Esscher transform of Gerber and Shiu (1994) and the minimal-entropy martingale measure of Frittelli (2000). Option pricing functions induced by these risk-neutral measures can then be used to calculate the Greek letters associated with the option, which leads to the specification of hedging policies, e.g. delta-hedging. However, in that case, hedging policies are not an input of the pricing procedure, but rather a by-product. Thus, hedging policies obtained from many popular EMMs are typically not optimal, and the corresponding option prices are not designed to be consistent with optimal hedging strategies. Another strand of literature derives martingale measures that are designed to be consistent with optimal hedging approaches, such as the minimal martingale measure of Föllmer and Schweizer (1991), the variance-optimal martingale measure of Schweizer (1995) and the Extended Girsanov Principle of Elliott and Madan (1998). However, an undesirable feature of these three methods is their reliance on quadratic objective functions, which penalize hedging gains. Moreover, the minimal and variance-optimal martingale measures are often signed measures under realistic models of the underlying asset price, which can be problematic from a theoretical standpoint. The Extended Girsanov Principle, on the contrary, produces a legitimate probability measure. This measure is consistent with a local hedging optimization, i.e. a procedure minimizing incremental discounted risk-adjusted hedging costs. Nevertheless, it would be desirable to identify a pricing procedure consistent with a non-quadratic global optimization of hedging errors, i.e. a joint optimization over hedging decisions for all time periods until the maturity of the derivative.

In that direction, another approach studied in the literature determines derivative prices directly from global optimal hedging strategies without having to specify an EMM. A first example of such schemes is utility indifference pricing, in which a trader with a specific utility function prices a contingent claim as the value such that the utility of his portfolio remains unchanged by the inclusion of the contingent claim.
For instance, Hodges and Neuberger (1989) study hedging and indifference pricing under the negative exponential utility function with transaction costs under the Black-Scholes model (BSM). Closely related is risk indifference pricing, in which a risk measure is used instead of a utility function to characterize the risk aversion of the trader. For example, Xu (2006) studies indifference pricing and hedging in an incomplete market using convex risk measures as defined in Föllmer and Schied (2002). One notable feature of utility and risk indifference pricing is that the resulting price depends on the position (long or short) of the hedger in the contingent claim. This highlights the need to identify hedging-based pricing schemes producing a unique price that is invariant to being long or short.

Recently, Guo and Zhu (2017) introduced the concept of equal risk pricing. In their framework, the option price is set as the value such that the global risk exposures of the long and short positions are equal under optimal hedging strategies. Unlike utility and risk indifference pricing, equal risk pricing provides a unique transactional price. The latter paper focuses mainly on theoretical features of the equal risk pricing framework and does not provide a general approach to compute the solution of the hedging problem embedded in the methodology. Thus, equal risk prices are only provided for a limited number of specific cases.

To enhance the tractability of the equal risk approach, the current paper considers the use of convex risk measures to quantify the global risk exposures of the long and short positions under optimal hedging strategies. Hedging under a convex risk measure has been extensively studied in the literature: Alexander et al. (2003) minimize the Conditional Value-at-Risk (CVaR, Rockafellar and Uryasev (2002)) in the context of static hedging with multiple assets, Xu (2006) studies indifference pricing and hedging under a convex risk measure in an incomplete market, and Godin (2016) develops a global hedging strategy using CVaR as the cost function in the presence of transaction costs. The use of convex risk measures within the equal risk pricing framework was first proposed by Marzban et al. (2020), who provide a dynamic programming algorithm to compute hedging strategies. Recently, Buehler et al. (2019) introduced an algorithm called deep hedging to hedge a portfolio of over-the-counter derivatives in the presence of market frictions under a convex risk measure using deep reinforcement learning (deep RL). The general framework of RL is for an agent to learn, over many iterations of an environment, how to select sequences of actions in order to optimize a cost function. In quantitative finance, RL has been applied successfully in algorithmic trading (Moody and Saffell (2001), Lu (2017) and Deng et al. (2016)) and in portfolio optimization (Jiang et al. (2017) and Almahdi and Yang (2017)). Hedging with RL has also received some attention. Kolm and Ritter (2019) demonstrate that SARSA (Rummery and Niranjan, 1994) can be used to learn the hedging strategy if the objective function is a mean-variance criterion under the BSM. Halperin (2017) shows that Q-learning (Watkins and Dayan, 1992) can be used to learn the option pricing and hedging strategy under the BSM. In the novel deep hedging algorithm of Buehler et al. (2019), an agent is trained to optimize the hedging strategy produced by a neural network through many simulations of a synthetic market. Their deep RL approach to the hedging problem helps to counter the well-known curse of dimensionality that arises when the state space gets too large. As argued by François et al. (2014), when applying traditional dynamic programming algorithms to compute hedging strategies, the curse of dimensionality can prevent the use of a large number of features to model the different components of the financial market.

The contribution of the current study is threefold. The first contribution consists in providing a universal and tractable methodology to implement the equal risk pricing framework under very general conditions. The approach, based on deep RL as in Buehler et al. (2019), can price and optimally hedge a very large number of contingent claims (e.g. vanilla options, exotic options, options with multiple underlying assets) with multiple liquid hedging instruments under a wide variety of market dynamics (e.g. regime-switching, stochastic volatility, jumps). Results presented in this paper, which rely on Buehler et al. (2019), demonstrate that our methodological approach to equal risk pricing can approximate the true equal risk price arbitrarily well.

The second contribution of the current study consists in performing several numerical experiments studying the behavior of equal risk prices in various contexts. Such experiments showcase the wide applicability of our proposed framework. The behavior of the equal risk pricing approach is analyzed, among others, through benchmarking against expected risk-neutral pricing and by conducting sensitivity analyses determining the impact on option prices of the confidence level associated with the risk measure and of the underlying asset model choice. Conducting such numerical experiments crucially relies on the deep RL scheme outlined in the current study. Using the latter framework allows presenting numerical examples for equal risk pricing that are more extensive, realistic and varied than in previous studies; such results would most likely have been previously inaccessible when relying on more traditional computation methods (e.g. finite difference dynamic programming).
Numerical results show, among others, that equal risk prices of out-of-the-money (OTM) options are significantly higher than risk-neutral prices across all dynamics considered. This finding is shown to be shared by different option categories, which include vanilla and exotic options. Thus, by using the usual risk-neutral valuation instead of the equal risk pricing framework, a risk averse participant trading OTM options might significantly underprice these contracts.

The last contribution is the introduction of an asymmetric ϵ-completeness measure based on hedging strategies embedded in the equal risk pricing approach. The purpose of the metric is to quantify the magnitude of unhedgeable risk associated with a position in a contingent claim. The ϵ-completeness measure can therefore be used to quantify the level of market incompleteness inherent to a given market model. Our contribution complements the work of Bertsimas et al. (2001); their proposed measure of market incompleteness is based on the mean-squared-error cost function, while ours has the advantage of allowing the risk aversion of the hedger to be characterized with any convex risk measure. Furthermore, the current paper's proposed measure is asymmetric in the sense that the risks of the long and short positions in the derivative are quantified by two different hedging strategies, unlike in Bertsimas et al. (2001) where the single variance-optimal hedging strategy is considered.

The paper is structured as follows. Section 2 introduces our adaptation of the equal risk pricing framework along with the proposed ϵ-completeness measure. Section 3 describes the deep RL numerical solution to equal risk pricing. Section 4 presents various numerical experiments including, among others, sensitivity and benchmarking analyses. Section 5 concludes. All proofs are provided in Appendix A.

This section details the theoretical option pricing setup considered in the current study.

Let $(\Omega, \mathcal{F}, \mathbb{P})$ be the probability space, where $\mathbb{P}$ is the physical measure. The financial market is in discrete time with a finite time horizon of $T$ years and known fixed trading dates $\mathcal{T} := \{t$ …

It can be shown (see, for instance, Lamberton and Lapeyre (2011)) that $\delta$ is self-financing if and only if $V^\delta_n = B_n(V_0 + G^\delta_n)$ for $n = 0, 1, \ldots, N$.

Definition 2.4 (Admissible trading strategies). Let $\Pi$ be the convex set of admissible trading strategies, which consists of all sufficiently well-behaved self-financing trading strategies.

In an incomplete market, perfect replication is impossible and the hedger must accept that some risks cannot be fully hedged. As such, an optimal hedging strategy (also referred to as a global hedging strategy) is defined as one that minimizes a criterion based on the closeness between the hedging portfolio value and the payoff of the contingent claim at maturity (the difference between these two quantities is referred to as the hedging error). Many different measures of distance can be used to represent the risk aversion of the hedger. In this paper, convex risk measures as defined in Föllmer and Schied (2002) are considered. As discussed in Marzban et al. (2020) and shown in the current section, the use of a convex risk measure to characterize the risk aversion of the hedger enhances the tractability of the equal risk pricing framework.
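The self-financing identity $V^\delta_n = B_n(V_0 + G^\delta_n)$ can be illustrated numerically. The following is a minimal NumPy sketch, assuming the standard definition of the discounted gains process, $G^\delta_n = \sum_{k=1}^{n} \delta_k \cdot (S_k/B_k - S_{k-1}/B_{k-1})$, and a risk-free account with $B_0 = 1$; the function name `self_financing_values` is hypothetical.

```python
import numpy as np

def self_financing_values(v0, deltas, prices, bank):
    """Portfolio values V_0, ..., V_N of a self-financing strategy, via V_n = B_n (V_0 + G_n).

    Assumed (standard) discounted gains:
        G_n = sum_{k=1}^{n} delta_k . (S_k / B_k - S_{k-1} / B_{k-1}),  G_0 = 0.

    v0     : initial portfolio value V_0
    deltas : array (N, D); row k - 1 holds delta_k, the shares held on (t_{k-1}, t_k]
    prices : array (N + 1, D) of risky asset prices S_0, ..., S_N
    bank   : array (N + 1,) of risk-free account values B_0 = 1, ..., B_N
    """
    discounted = prices / bank[:, None]                       # S_n / B_n
    step_gains = np.sum(deltas * np.diff(discounted, axis=0), axis=1)
    gains = np.concatenate(([0.0], np.cumsum(step_gains)))    # G_0, ..., G_N
    return bank * (v0 + gains)
```

With $r = 0$ (all $B_n = 1$) and a constant position of 0.5 shares, a price path 100, 105, 95 takes an initial value of 10 to 12.5 and then to 7.5; with $\delta = 0$ the portfolio simply accrues at the risk-free rate.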

Definition 2.5 (Convex risk measure). Let $\mathcal{X}$ be a set of random variables representing liabilities and $X_1, X_2 \in \mathcal{X}$. As defined in Föllmer and Schied (2002), $\rho : \mathcal{X} \to \mathbb{R}$ is a convex risk measure if it satisfies the following properties:

(i) Monotonicity: $X_1 \leq X_2 \Rightarrow \rho(X_1) \leq \rho(X_2)$. A larger liability is riskier.
(ii) Translation invariance: For $a \in \mathbb{R}$, $\rho(X + a) = \rho(X) + a$. This implies that the hedger is indifferent between an empty portfolio and a portfolio with a liability $X$ and a cash amount of $\rho(X)$: $\rho(X - \rho(X)) = \rho(X) - \rho(X) = 0$.
(iii) Convexity: For $0 \leq \lambda \leq 1$, $\rho(\lambda X_1 + (1 - \lambda) X_2) \leq \lambda \rho(X_1) + (1 - \lambda) \rho(X_2)$. Diversification does not increase risk.

$X := \{X_n\}_{n=0}^N$ with $X_n := [X^{(1)}_n, \ldots, X^{(D)}_n]$ is $\mathbb{F}$-predictable if, for $j = 1, \ldots, D$, $X^{(j)}_0 \in \mathcal{F}_0$ and $X^{(j)}_{n+1} \in \mathcal{F}_n$ for $n = 0, \ldots, N - 1$.
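CVaR, used for instance by Alexander et al. (2003) and Godin (2016), is a standard example of a convex risk measure in the sense of Definition 2.5. The sketch below is one possible empirical estimator, based on the Rockafellar-Uryasev minimization formula; it is our illustration rather than code from the paper, and the function name `cvar` is hypothetical.

```python
import numpy as np

def cvar(losses, alpha=0.95):
    """Empirical CVaR_alpha of a sample of liabilities (larger values = worse), 0 <= alpha < 1.

    Rockafellar-Uryasev representation:
        CVaR_alpha(X) = min_c { c + E[(X - c)^+] / (1 - alpha) }.
    For an empirical distribution the objective is piecewise linear and convex in c,
    so its minimum is attained at one of the sample points.
    """
    x = np.sort(np.asarray(losses, dtype=float))
    n = x.size
    tail_sums = np.cumsum(x[::-1])[::-1] - x         # sum of x[j] for j > k, at each k
    counts_above = n - 1 - np.arange(n)              # number of indices j > k
    objective = x + (tail_sums - counts_above * x) / (n * (1.0 - alpha))
    return objective.min()
```

For instance, `cvar([1, 2, 3, 4], 0.5)` returns 3.5, the average of the two largest liabilities. Because the estimator works scenario by scenario on a fixed sample, properties (i) to (iii) can be checked numerically, e.g. `cvar(x + a, alpha)` equals `cvar(x, alpha) + a` up to rounding.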

For the rest of the paper, let $\rho$ be the convex risk measure used to characterize the risk aversion of the hedger for both the long and short positions in the contingent claim. Also, assume without loss of generality (w.l.o.g.) that the position in the hedging portfolio is long for both the long and short positions in the derivative.

Definition 2.6 (Long and short sided risk). Define $\epsilon^{(L)}(V_0)$ and $\epsilon^{(S)}(V_0)$ respectively as the measured risk exposure of a long and short position in the derivative under the optimal hedge if the value of the initial hedging portfolio is $V_0$:
$$\epsilon^{(L)}(V_0) := \min_{\delta \in \Pi} \rho\left(-\Phi(S_N, Z_N) - B_N\left(V_0 + G^\delta_N\right)\right), \qquad (2.3)$$
$$\epsilon^{(S)}(V_0) := \min_{\delta \in \Pi} \rho\left(\Phi(S_N, Z_N) - B_N\left(V_0 + G^\delta_N\right)\right). \qquad (2.4)$$

Remark 2.2.

An assumption implicit to Definition 2.6 is that the minimum in (2.3) or (2.4) is indeed attained by some trading strategy, i.e. that the infimum is in fact a minimum. The identification of conditions which ensure that this assumption is satisfied is left out of scope. We emphasize that the optimal risk exposures of the long and short positions as defined in (2.3) and (2.4) are reached through two distinct hedging strategies. The following proposition is a direct consequence of the translation invariance of $\rho$.

Proposition 2.1. $\epsilon^{(L)}(V_0) = \epsilon^{(L)}(0) - B_N V_0$ and $\epsilon^{(S)}(V_0) = \epsilon^{(S)}(0) - B_N V_0$.

Definition 2.7 (Optimal hedging). Let $\delta^{(L)}$ and $\delta^{(S)}$ be respectively the optimal hedging strategies for the long and short positions in the derivative:
$$\delta^{(L)} := \arg\min_{\delta \in \Pi} \rho\left(-\Phi(S_N, Z_N) - B_N\left(V_0 + G^\delta_N\right)\right), \qquad (2.5)$$
$$\delta^{(S)} := \arg\min_{\delta \in \Pi} \rho\left(\Phi(S_N, Z_N) - B_N\left(V_0 + G^\delta_N\right)\right). \qquad (2.6)$$
The translation invariance property of $\rho$ implies that the optimal hedging strategies $\delta^{(L)}$ and $\delta^{(S)}$ do not depend on $V_0$, as stated in the following proposition.

Proposition 2.2 (Independence of the optimal hedging strategies from $V_0$).
$$\delta^{(L)} = \arg\min_{\delta \in \Pi} \rho\left(-\Phi(S_N, Z_N) - B_N G^\delta_N\right), \qquad (2.7)$$
$$\delta^{(S)} = \arg\min_{\delta \in \Pi} \rho\left(\Phi(S_N, Z_N) - B_N G^\delta_N\right). \qquad (2.8)$$

ϵ-completeness measure

The current section outlines the equal risk pricing criterion to determine the price of a derivative. It entails finding a price for which the risk exposures of the long position and short position hedgers are equal. One important concept in the valuation of contingent claims is the absence of arbitrage. In this paper, the notions of super-replication and sub-replication are used to define arbitrage-free pricing.

Definition 2.8 (Super-replication and sub-replication strategies). A super-replication strategy for the contingent claim $\Phi$ is defined as a pair $(v, \delta)$ such that $v \in \mathbb{R}$ and $\delta$ is an admissible hedging strategy for which $V^\delta_0 = v$ and $V^\delta_N = B_N(v + G^\delta_N) \geq \Phi(S_N, Z_N)$ $\mathbb{P}$-a.s. Super-replication is a conservative approach to hedging which can be used by a seller of $\Phi$ to remove all residual hedging risk. Let $\bar{v}$ be the greatest lower bound of the set of initial portfolio values for which a super-replication strategy exists:
$$\bar{v} := \inf\left\{v : \exists\, \delta \in \Pi \text{ such that } \mathbb{P}\left[B_N(v + G^\delta_N) \geq \Phi(S_N, Z_N)\right] = 1\right\}. \qquad (2.9)$$
$\bar{v}$ is called the super-replication price of $\Phi$ and it represents an upper bound of the set of arbitrage-free prices for $\Phi$. Similarly, a sub-replication strategy is a pair $(v, \delta)$ that completely removes the hedging risk exposure associated with a long position in $\Phi$, i.e. for which $B_N(v + G^\delta_N) \leq \Phi(S_N, Z_N)$ $\mathbb{P}$-a.s. The least upper bound of the set of portfolio values such that a sub-replication strategy exists is called the sub-replication price of $\Phi$ and is a lower bound of the set of arbitrage-free prices for $\Phi$:
$$\underline{v} := \sup\left\{v : \exists\, \delta \in \Pi \text{ such that } \mathbb{P}\left[B_N(v + G^\delta_N) \leq \Phi(S_N, Z_N)\right] = 1\right\}. \qquad (2.10)$$

Definition 2.9 (Arbitrage-free pricing). For the rest of the paper, the price of a contingent claim $\Phi$ is said to be arbitrage-free if it falls within the interval $[\underline{v}, \bar{v}]$ as defined in (2.9) and (2.10).

The price of a contingent claim under the equal risk pricing framework can now be defined.

Definition 2.10 (Equal risk price for European-type claims). The equal risk price $C^\star$ of the contingent claim $\Phi$ is defined as the initial portfolio value such that the optimally hedged measured risk exposures of both the long and short positions in the derivative are equal, i.e. $C^\star := C$ such that:
$$\epsilon^{(L)}(-C) = \epsilon^{(S)}(C). \qquad (2.11)$$

Remark 2.3.

Unlike Guo and Zhu (2017), in the current paper the optimal hedging strategy minimizes risk under the physical measure instead of under some risk-neutral measure. Two main reasons led to this modification of the original approach found in Guo and Zhu (2017). First, under incomplete markets, the choice of the risk-neutral measure is arbitrary, whereas the physical measure can be determined more objectively using econometric techniques. Having the price determined under the physical measure removes the subjectivity associated with the choice of the martingale measure. Second, under a risk-neutral measure $\mathbb{Q}$, the price is already characterized by the discounted expected payoff $\mathbb{E}^{\mathbb{Q}}\left[e^{-rT}\Phi(S_N, Z_N)\right]$, which makes the interpretation of the risk-neutral equal risk price questionable.

Before introducing results showing that equal risk option prices are arbitrage-free, a technical assumption on which the proofs rely is outlined.

Assumption 2.1.

As in Xu (2006) and Marzban et al. (2020), assume that the risk associated with hedging losses is bounded below across all admissible trading strategies, i.e. $\min_{\delta \in \Pi} \rho(-B_N G^\delta_N) > -\infty$.

The next theorem provides a characterization of equal risk prices. It also indicates that equal risk prices of contingent claims with a finite super-replication price are arbitrage-free. The representation of $C^\star$ in (2.12) is analogous to results found in Marzban et al. (2020), who consider a similar setup with convex risk measures. Although the arbitrage-free result is also stated in Marzban et al. (2020), a formal proof was not given.

Theorem 2.1 (Absence of arbitrage). Assume that there exists a finite super-replication price for $\Phi$. Then, the equal risk price $C^\star$ from Definition 2.10 exists, is unique, is arbitrage-free and can be expressed as
$$C^\star = \frac{\epsilon^{(S)}(0) - \epsilon^{(L)}(0)}{2 B_N}. \qquad (2.12)$$

Remark 2.4.

The risk measure considered in the work of Guo and Zhu (2017) lacks the translation invariance property, which explains why equal risk prices are provided there for only a very limited number of cases. The proof of Theorem 2.1 shows that the representation of $C^\star$ in (2.12) is a direct consequence of the translation invariance property of convex risk measures: by Proposition 2.1, (2.11) reads $\epsilon^{(L)}(0) + B_N C = \epsilon^{(S)}(0) - B_N C$, whose unique solution is (2.12).

We now propose measures to quantify the residual risk faced by hedgers of the contingent claim. Such measures are analogous to, but more general than, the one proposed in Bertsimas et al. (2001), who study the case of variance-optimal hedging.

Definition 2.11 (ϵ market completeness measure). Define $\epsilon^\star$ as the level of residual risk faced by the hedger of either the short or the long position in the contingent claim if its price is the equal risk price and optimal hedging strategies are used for both positions:
$$\epsilon^\star := \epsilon^{(L)}(-C^\star) = \epsilon^{(S)}(C^\star). \qquad (2.13)$$
$\epsilon^\star$ and $\epsilon^\star / C^\star$ are referred to as, respectively, the measured residual risk exposure per derivative contract and per dollar invested.

The following proposition states that $\epsilon^\star$ is the average of the measured risk exposures of both the long and short optimally hedged positions in the contingent claim $\Phi$, assuming that the initial value of the portfolio is zero.

Proposition 2.3.
$$\epsilon^\star = \frac{\epsilon^{(L)}(0) + \epsilon^{(S)}(0)}{2}. \qquad (2.14)$$

Remark 2.5.

Bertsimas et al. (2001) proposed instead the following measure of market incompleteness:
$$\varepsilon^\star = \min_{V_0, \delta} \mathbb{E}\left[\left(\Phi(S_N, Z_N) - B_N\left(V_0 + G^\delta_N\right)\right)^2\right],$$
where the expectation is taken with respect to the physical measure. Our measure $\epsilon^\star$ has the advantage of characterizing the risk aversion of the hedger with a convex risk measure, unlike Bertsimas et al. (2001), who are restricted to the use of a quadratic penalty. Using the latter penalty entails that hedging gains are penalized during the optimization of the hedging strategy, which is clearly undesirable. The ability to rely on convex measures for risk quantification in the current scheme allows for an asymmetric treatment of hedging gains and losses, which is more consistent with the actual objectives of hedging agents.

As argued by Bertsimas et al. (2001), market incompleteness is often described in the literature as a binary concept, whereas in practice it is much more natural to consider different degrees of incompleteness implying different levels of residual hedging risk. The measure $\epsilon^\star$ allows determining where any contingent claim is situated within the spectrum of incompleteness and whether it is easily hedgeable or not. As discussed in Bertsimas et al. (2001), a single metric such as $\epsilon^\star$ might not be sufficient for a complete depiction of the level of market incompleteness associated with a contingent claim. For instance, it does not depict the entire hedging error distribution, nor does it directly indicate which scenarios are the main drivers of residual hedging risk. Nevertheless, $\epsilon^\star$ is still a good indication of the efficiency of the optimal hedging procedure for a given derivative. Moreover, sensitivity analyses of $\epsilon^\star$ with respect to various model dynamics can be performed to assess the impact of the different sources of market incompleteness. Numerical experiments in Section 4 will attempt to provide some insight into the drivers of $\epsilon^\star$.
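Formulas (2.12) and (2.14) turn two optimally hedged risk estimates into the equal risk price and the residual risk measure. The following is a minimal sketch, assuming Monte Carlo samples of the payoff and of the terminal discounted gains $G_N$ of the long-side and short-side hedges are already available; the function names and the toy risk measure `worst_half_mean` (an empirical CVaR at level 0.5 for an even sample size) are hypothetical illustrations.

```python
import numpy as np

def equal_risk_quantities(payoff, gains_long, gains_short, b_n, risk):
    """Sample-based (C*, eps*, eps*/C*) via (2.12) and (2.14).

    payoff      : Monte Carlo samples of Phi(S_N, Z_N)
    gains_long  : samples of G_N for the hedge used on the long side
    gains_short : samples of G_N for the hedge used on the short side
    b_n         : risk-free account value B_N
    risk        : callable implementing a convex risk measure on a sample of liabilities
    """
    payoff = np.asarray(payoff, dtype=float)
    eps_long_0 = risk(-payoff - b_n * np.asarray(gains_long, dtype=float))    # (2.3), V_0 = 0
    eps_short_0 = risk(payoff - b_n * np.asarray(gains_short, dtype=float))   # (2.4), V_0 = 0
    c_star = (eps_short_0 - eps_long_0) / (2.0 * b_n)                         # (2.12)
    eps_star = 0.5 * (eps_long_0 + eps_short_0)                               # (2.14)
    return c_star, eps_star, eps_star / c_star   # last entry: risk per dollar (requires C* != 0)

def worst_half_mean(losses):
    """Hypothetical risk measure: mean of the worst half of an even-sized sample (empirical CVaR_0.5)."""
    x = np.sort(np.asarray(losses, dtype=float))
    return x[x.size // 2:].mean()

# Toy usage with made-up numbers: four scenarios and no hedging at all (G_N = 0).
c_star, eps_star, per_dollar = equal_risk_quantities(
    payoff=[0.0, 0.0, 5.0, 15.0], gains_long=np.zeros(4), gains_short=np.zeros(4),
    b_n=1.0, risk=worst_half_mean)
```

In this toy case the long side has measured risk 0 and the short side 10, so $C^\star = 5$ and $\epsilon^\star = 5$. The numbers only illustrate the arithmetic; in practice the gains would come from (approximately) optimal hedges.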
In the current section, a tractable solution is proposed to implement the equal risk pricing framework. The approach uses the recent deep hedging algorithm of Buehler et al. (2019) to train two distinct neural networks, which are used to approximate the optimal hedging strategies for the long and the short positions in the derivative, respectively.

For convenience, a notation for neural networks very similar to the one introduced by Buehler et al. (2019) is used (see Section 4 of their paper). The reader is referred to Goodfellow et al. (2016) for a general description of neural networks.

Definition 3.1 (Feedforward neural network). Let $X \in \mathbb{R}^{d_{in} \times 1}$ be a feature vector of dimension $d_{in} \in \mathbb{N}$ and $L, d_1, \ldots, d_{L-1}, d_{out} \in \mathbb{N}$ with $L \geq 2$. Define a feedforward neural network (FFNN) as the mapping $F_\theta : \mathbb{R}^{d_{in}} \to \mathbb{R}^{d_{out}}$ with trainable parameters $\theta$:
$$F_\theta(X) := A_L \circ F_{L-1} \circ \cdots \circ F_1(X), \qquad F_l := \sigma \circ A_l, \quad l = 1, \ldots, L-1, \qquad (3.1)$$
where $\circ$ denotes the function composition operator, and for any $l = 1, \ldots, L$, the function $A_l$ is defined through $A_l(Y) := W^{(l)} Y + b^{(l)}$ with
• $W^{(1)} \in \mathbb{R}^{d_1 \times d_{in}}$, $b^{(1)} \in \mathbb{R}^{d_1 \times 1}$ and $Y \in \mathbb{R}^{d_{in} \times 1}$ if $l = 1$,
• $W^{(l)} \in \mathbb{R}^{d_l \times d_{l-1}}$, $b^{(l)} \in \mathbb{R}^{d_l \times 1}$ and $Y \in \mathbb{R}^{d_{l-1} \times 1}$ if $l = 2, \ldots, L-1$ and $L \geq 3$,
• $W^{(L)} \in \mathbb{R}^{d_{out} \times d_{L-1}}$, $b^{(L)} \in \mathbb{R}^{d_{out} \times 1}$ and $Y \in \mathbb{R}^{d_{L-1} \times 1}$ if $l = L$.
The activation function $\sigma : \mathbb{R} \to \mathbb{R}$ is applied element-wise to the outputs of the pre-activation functions $A_l$. Moreover,
$$\theta := \{W^{(1)}, \ldots, W^{(L)}, b^{(1)}, \ldots, b^{(L)}\} \qquad (3.2)$$
is the set of trainable parameters of the FFNN.

The following definition of sets of FFNN will be used throughout the rest of the section to define, for instance, the two neural networks used for hedging the long and short positions in the derivative, the tractable solution to the equal risk pricing framework and the optimization procedure of neural networks.
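Before turning to that definition, the forward map (3.1) can be sketched in NumPy. The activation $\sigma = \tanh$ and the Gaussian initialisation below are illustrative assumptions (the activation used in the numerical section is not named here), and the helper names `init_theta`, `n_trainable` and `ffnn` are hypothetical. `n_trainable` counts the scalars in $\theta$ from (3.2), i.e. the dimension of the parameter space for a fixed architecture.

```python
import numpy as np

def init_theta(widths, rng):
    """Illustrative random theta = [(W^(1), b^(1)), ..., (W^(L), b^(L))] for widths [d_in, d_1, ..., d_out]."""
    return [(rng.normal(scale=1.0 / np.sqrt(d_prev), size=(d_next, d_prev)), np.zeros((d_next, 1)))
            for d_prev, d_next in zip(widths[:-1], widths[1:])]

def n_trainable(widths):
    """Number of scalars in theta (3.2): layer l has d_l * d_{l-1} weights and d_l biases."""
    return sum(d_next * d_prev + d_next for d_prev, d_next in zip(widths[:-1], widths[1:]))

def ffnn(theta, x, sigma=np.tanh):
    """Evaluate F_theta(X) = A_L o F_{L-1} o ... o F_1(X) for a column vector x of shape (d_in, 1)."""
    y = x
    for w, b in theta[:-1]:
        y = sigma(w @ y + b)       # F_l = sigma o A_l, with A_l(Y) = W^(l) Y + b^(l)
    w_out, b_out = theta[-1]
    return w_out @ y + b_out       # A_L: affine output layer, no activation

# Usage: L = 3 layers mapping R^4 -> R^2 (e.g. 4 features, 2 hedging instruments).
rng = np.random.default_rng(0)
widths = [4, 8, 8, 2]
theta = init_theta(widths, rng)
out = ffnn(theta, np.ones((4, 1)))   # shape (2, 1)
```

Here `n_trainable([4, 8, 8, 2])` gives 130 = 40 + 72 + 18. The output layer is left linear, matching (3.1), so the network can produce unbounded positions.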

Definition 3.2 (Sets of FFNN). Let $\mathcal{NN}^{\sigma}_{\infty, d_{in}, d_{out}}$ be the set of all FFNN mapping from $\mathbb{R}^{d_{in}}$ to $\mathbb{R}^{d_{out}}$ as in Definition 3.1 with a fixed activation function $\sigma$ and an arbitrary number of layers and neurons per layer. Since a unique activation function is considered in the numerical section, let $\mathcal{NN}_{\infty, d_{in}, d_{out}} := \mathcal{NN}^{\sigma}_{\infty, d_{in}, d_{out}}$. Moreover, for all $M \in \mathbb{N}$ and $R_M \in \mathbb{N}$ that depends on $M$, let $\Theta_{M, d_{in}, d_{out}} \subseteq \mathbb{R}^{R_M}$. Define $\mathcal{NN}_{M, d_{in}, d_{out}}$ as the set of neural networks $F_\theta$ as in (3.1) with $\theta \in \Theta_{M, d_{in}, d_{out}}$:
$$\mathcal{NN}_{M, d_{in}, d_{out}} := \{F_\theta : \theta \in \Theta_{M, d_{in}, d_{out}}\}. \qquad (3.3)$$
The sequence of sets $\{\mathcal{NN}_{M, d_{in}, d_{out}}\}_{M \in \mathbb{N}}$ is assumed to have the following properties:
• For any $M \in \mathbb{N}$: $\mathcal{NN}_{M, d_{in}, d_{out}} \subset \mathcal{NN}_{M+1, d_{in}, d_{out}}$, where $\subset$ denotes strict inclusion,
• $\bigcup_{M \in \mathbb{N}} \mathcal{NN}_{M, d_{in}, d_{out}} = \mathcal{NN}_{\infty, d_{in}, d_{out}}$.

This definition of sets of FFNN introduced by Buehler et al. (2019) is very convenient, as the sets $\{\mathcal{NN}_{M, d_{in}, d_{out}}\}_{M \in \mathbb{N}}$ can be used to describe two cases of interest in deep learning. Here are two possible definitions for $\mathcal{NN}_{M, d_{in}, d_{out}}$.

(A) Let $\{L^{(M)}\}_{M \in \mathbb{N}}, \{d^{(M)}_1\}_{M \in \mathbb{N}}, \{d^{(M)}_2\}_{M \in \mathbb{N}}, \ldots$ be non-decreasing integer sequences. Then, $\mathcal{NN}_{M, d_{in}, d_{out}}$ is defined as the set of all FFNN mapping from $\mathbb{R}^{d_{in}}$ to $\mathbb{R}^{d_{out}}$ with a fixed structure of $L^{(M)}$ layers and of $d^{(M)}_1, \ldots, d^{(M)}_{L^{(M)}-1}, d_{out}$ neurons per layer. This case is useful for the problem of fitting the trainable parameters $\theta$ with a fixed set of hyperparameters.
(B) Let $\mathcal{NN}_{M, d_{in}, d_{out}}$ be the set of all FFNN mapping from $\mathbb{R}^{d_{in}}$ to $\mathbb{R}^{d_{out}}$ with an arbitrary number of layers and of neurons per layer and at most $M$ non-zero trainable parameters. This case is useful to describe the complete optimization problem of neural networks, which includes the selection of hyperparameters, often called hyperparameter tuning.

Unless specified otherwise, one can assume w.l.o.g. either definition for $\{\mathcal{NN}_{M, d_{in}, d_{out}}\}_{M \in \mathbb{N}}$.

To formulate how two distinct neural networks can approximate arbitrarily well the optimal hedging of the long and short positions in a derivative, the following assumption is applied for the rest of the paper.

Assumption 3.1.

For each position (long and short) in the derivative, there exists a function $f : \mathbb{R}^{\tilde{D}} \to \mathbb{R}^D$ (distinct for the long and short positions) such that, at each rebalancing date, the optimal hedge is of the form $\delta^{(1:D)}_{n+1} = f(S_n, V_n, \mathcal{I}_n, T - t_n)$, where $\tilde{D} := D + 2 + \dim(\mathcal{I}_n)$, with $\mathcal{I}_n$ being some random vector encompassing the relevant information needed to compute the optimal hedging strategy, which depends on the market setup considered.

Note that Assumption 3.1 typically holds for low-dimensional processes $\{\mathcal{I}_n\}_{n=0}^N$ when some form of Markov dynamics common in the hedging literature is assumed. See, for example, François et al. (2014) for the case of regime-switching models.

In what follows, $L$ and $S$, used both as subscripts and superscripts, denote respectively the long and short position hedges.

Definition 3.3 (Hedging with two neural networks). Let $X_n := (S_n, V_n, \mathcal{I}_n, T - t_n) \in \mathbb{R}^{\tilde{D}}$ be the feature vector for each trading time $t_n \in \{t_0, \ldots, t_{N-1}\}$. For some $M_L \in \mathbb{N}$, let $F^L_\theta \in \mathcal{NN}_{M_L, \tilde{D}, D}$ be a FFNN. Given $X_n$ as an input, $F^L_\theta$ outputs a $D$-dimensional vector of the number of shares of each of the $D$ risky assets held in the hedging portfolio of the long position during the period $(t_n, t_{n+1}]$, i.e. $\delta^{(1:D)}_{n+1} = F^L_\theta(X_n)$. Similarly, for some $M_S \in \mathbb{N}$, $F^S_\theta \in \mathcal{NN}_{M_S, \tilde{D}, D}$ is a distinct FFNN which computes the position in the $D$ risky assets to hedge the short position in the option at each time step. These two FFNN are referred to as the long-NN and the short-NN.

Remark 3.1.

In the current paper's approach, the two neural networks are trained separately to minimize different cost functions. As such, $F^L_\theta$ and $F^S_\theta$ may have different structures (e.g. a different number of layers and of neurons per layer) and different values of trainable parameters.

The problem of evaluating the measured risk exposure of the long and short positions under optimal hedging can now be formulated as a classical deep learning optimization problem. Since the input and output of $F^L_\theta$ and $F^S_\theta$ are always respectively of dimensions $\tilde D$ and $D$, let $\Theta_{M_L} := \Theta_{M_L,\tilde D,D}$ and $\Theta_{M_S} := \Theta_{M_S,\tilde D,D}$ be the sets of trainable parameter values as in (3.3) for respectively the long-$\mathcal{NN}$ and the short-$\mathcal{NN}$.

Definition 3.4 (Long and short sided risk with two neural networks). For $M_L, M_S \in \mathbb{N}$, define $\epsilon^{(M_L)}(V)$ and $\epsilon^{(M_S)}(V)$ as the measured risk exposures of the long and short positions in the derivative if $F^L_\theta$ and $F^S_\theta$ are used to compute the hedging strategies and the initial hedging portfolio value is $V$:

$$\epsilon^{(M_L)}(V) := \min_{\theta \in \Theta_{M_L}} \rho\left(-\Phi(S_N, Z_N) - B_N\left(V + G_N^{\delta^{(L,\theta)}}\right)\right), \tag{3.4}$$

$$\epsilon^{(M_S)}(V) := \min_{\theta \in \Theta_{M_S}} \rho\left(\Phi(S_N, Z_N) - B_N\left(V + G_N^{\delta^{(S,\theta)}}\right)\right), \tag{3.5}$$

where $\delta^{(L,\theta)}$ and $\delta^{(S,\theta)}$ in (3.4) and (3.5) are to be understood respectively as the trading strategies obtained through $F^L_\theta$ and $F^S_\theta$.

Remark 3.2.

Following similar steps as in the proof of Proposition 2.1, it can be shown that $\epsilon^{(M_L)}(V) = \epsilon^{(M_L)}(0) - B_N V$ and $\epsilon^{(M_S)}(V) = \epsilon^{(M_S)}(0) - B_N V$.

Remark 3.3.

Suppose Assumption 3.1 is satisfied. Using the universal function approximation theorem of Hornik (1991), which essentially states that a FFNN can approximate multivariate functions arbitrarily well, Buehler et al. (2019) show that, for sufficiently well-behaved and integrable asset price dynamics and contingent claims (see Proposition 4.3 of their paper):

$$\lim_{M_S\to\infty}\epsilon^{(M_S)}(0) = \epsilon^{(S)}(0), \qquad \lim_{M_L\to\infty}\epsilon^{(M_L)}(0) = \epsilon^{(L)}(0). \tag{3.6}$$

Thus, for both the long and short positions, there exists a sufficiently large FFNN which approximates the optimal hedging strategy arbitrarily well.
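The mapping of Definition 3.3, $X_n \mapsto \delta^{(1:D)}_{n+1} = F_\theta(X_n)$, can be illustrated with a minimal NumPy sketch of a FFNN forward pass. It assumes the architecture reported for the numerical section (three hidden layers of 56 ReLU neurons) and Glorot uniform initialization; zero initial biases, a linear output layer and all function names (`glorot_uniform`, `init_ffnn`, `n_params`, `ffnn_hedge`) are illustrative assumptions, not the paper's implementation. `n_params` returns the total number of trainable parameters, which is the quantity bounded by $M$ in $\mathcal{NN}_{M,\tilde D,D}$.

```python
import numpy as np


def glorot_uniform(rng, fan_in, fan_out):
    """Glorot (Xavier) uniform draw: U(-a, a) with a = sqrt(6 / (fan_in + fan_out))."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))


def init_ffnn(rng, d_in, d_out, hidden=(56, 56, 56)):
    """List of (W, b) pairs for a FFNN R^d_in -> R^d_out (zero initial biases: an assumption)."""
    sizes = (d_in, *hidden, d_out)
    return [(glorot_uniform(rng, m, n), np.zeros(n)) for m, n in zip(sizes[:-1], sizes[1:])]


def n_params(params):
    """Total number of trainable parameters (weights and biases)."""
    return sum(W.size + b.size for W, b in params)


def ffnn_hedge(params, X):
    """Map features X of shape (batch, d_in) to hedge positions of shape (batch, d_out)."""
    h = np.asarray(X, dtype=float)
    for W, b in params[:-1]:
        h = np.maximum(0.0, h @ W + b)  # ReLU(x) = max(0, x) on hidden layers
    W, b = params[-1]
    return h @ W + b  # linear output: number of shares of each of the D risky assets
```

For instance, with $D = 1$ and a hypothetical two-regime model ($H = 2$), the feature vector $X_n = [S_n, V_n, \xi_n, T - t_n]$ used later has $\tilde D = 1 + 2 + 2 = 5$ entries, and `n_params(init_ffnn(rng, 5, 1))` gives 6,777 trainable parameters for this architecture.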

The equal risk pricing approach as well as the measure of market incompleteness $\epsilon$ can now be restated with the use of the long-$\mathcal{NN}$ and the short-$\mathcal{NN}$.

Definition 3.5 (Equal risk pricing and $\epsilon$-completeness measure with two neural networks). Define $C_0^{(\star,\mathcal{NN})}$ as the equal risk price if $F^L_\theta$ and $F^S_\theta$ are used to compute the hedging strategies, i.e. $C_0^{(\star,\mathcal{NN})} := C$ such that

$$\epsilon^{(M_L)}(-C) = \epsilon^{(M_S)}(C).$$

Furthermore, let $\epsilon^{(\star,\mathcal{NN})}$ be the measure of market incompleteness if the price of the derivative is $C_0^{(\star,\mathcal{NN})}$:

$$\epsilon^{(\star,\mathcal{NN})} := \epsilon^{(M_L)}\left(-C_0^{(\star,\mathcal{NN})}\right) = \epsilon^{(M_S)}\left(C_0^{(\star,\mathcal{NN})}\right). \tag{3.7}$$

Remark 3.4.

Following similar steps as in the proofs of Theorem 2.1 and Proposition 2.3, it can be shown that

$$C_0^{(\star,\mathcal{NN})} = \frac{\epsilon^{(M_S)}(0) - \epsilon^{(M_L)}(0)}{2B_N}, \qquad \epsilon^{(\star,\mathcal{NN})} = \frac{\epsilon^{(M_L)}(0) + \epsilon^{(M_S)}(0)}{2}. \tag{3.8}$$

An important consequence of Remark 3.3 is that the current paper's approach based on neural networks can approximate $C^\star$ and $\epsilon^\star$ arbitrarily well:

Proposition 3.1.

$$\lim_{M_S, M_L\to\infty} C_0^{(\star,\mathcal{NN})} = C^\star, \qquad \lim_{M_S, M_L\to\infty}\epsilon^{(\star,\mathcal{NN})} = \epsilon^\star. \tag{3.9}$$

(Buehler et al. (2019) consider a more general market with a filtration generated by a process $\{I_k\}$, where $I_k \in \mathbb{R}^d$ contains any new market information at time $t_k$. They use a distinct neural network at each trading date, which can be a function of $(I_0, \ldots, I_k, \delta_k)$ at time $t_k$. From Remarks 5 and 6 of Buehler et al. (2019), the convergence result (3.6) holds under Assumption 3.1 when a single FFNN is instead used for all time steps, for each of the long and short positions, as in Definition 3.3 of the current paper.)

The training procedure of the long-$\mathcal{NN}$ and the short-$\mathcal{NN}$ consists in searching for their optimal parameters $\theta^{(L)} \in \Theta_{M_L}$ and $\theta^{(S)} \in \Theta_{M_S}$, which minimize the measured risk exposures in (3.4) and (3.5). The approach used in this paper is based on the deep hedging algorithm of Buehler et al. (2019). The training procedure of the short-$\mathcal{NN}$ with (minibatch) stochastic gradient descent (SGD), a very popular algorithm in deep learning, is presented below; it is straightforward to adapt it to the long-$\mathcal{NN}$ with a simple modification to the cost function (3.10). Let $J(\theta)$ be the cost function to be minimized for the short derivative position hedge:

$$J(\theta) := \rho\left(\Phi(S_N, Z_N) - B_N G_N^{\delta^{(S,\theta)}}\right), \quad \theta \in \Theta_{M_S}. \tag{3.10}$$

Recall from (3.5) that the relation between $J(\theta)$ and the measured risk exposure of the short position is $\epsilon^{(M_S)}(0) = \min_{\theta\in\Theta_{M_S}} J(\theta)$. Denote by $\theta_0 \in \Theta_{M_S}$ the initial parameter values of $F^S_\theta$; in this paper, $\theta_0$ is always initialized with the Glorot uniform initialization of Glorot and Bengio (2010). The classical SGD algorithm consists in iteratively updating the trainable parameters as follows:

$$\theta_{j+1} = \theta_j - \eta_j \nabla_\theta J(\theta_j), \tag{3.11}$$

where $\nabla_\theta$ denotes the gradient operator with respect to $\theta$ and $\eta_j$ is a small positive deterministic value which is typically progressively reduced through iterations, i.e. as $j$ increases.

Recall that, in the current framework, a synthetic market is considered where paths of the hedging instruments can be simulated. Let $N_{\text{batch}} \in \mathbb{N}$ be the size of a simulated minibatch $B_j := \{\pi_{i,j}\}_{i=1}^{N_{\text{batch}}}$, with $\pi_{i,j}$ being the $i$th hedging error if the trainable parameters are $\theta = \theta_j$:

$$\pi_{i,j} := \Phi(S_{N,i}, Z_{N,i}) - B_N G_{N,i}^{\delta^{(S,\theta_j)}}. \tag{3.12}$$

Moreover, let $\hat\rho(B_j)$ be the empirical estimator of $\rho(\pi_{i,j})$. The gradient of the cost function $\nabla_\theta J(\theta_j)$ is estimated with $\nabla_\theta \hat\rho(B_j)$ evaluated at $\theta = \theta_j$.

In the numerical section, the convex risk measure is assumed to be the Conditional Value-at-Risk (CVaR) as defined in Rockafellar and Uryasev (2002). For an absolutely continuous integrable random variable $X$, the CVaR has the following representation:

$$\mathrm{CVaR}_\alpha(X) := \mathbb{E}\left[X \mid X \ge \mathrm{VaR}_\alpha(X)\right], \quad \alpha \in (0,1), \tag{3.13}$$

where $\mathrm{VaR}_\alpha(X) := \min_x\{x \mid \mathbb{P}(X \le x) \ge \alpha\}$ is the Value-at-Risk (VaR) at confidence level $\alpha$. (In Section 4, the only dynamics considered for the risky assets produce integrable and absolutely continuous hedging errors.) The CVaR has been extensively used in the risk management literature, as it accounts for tail risk by averaging all losses larger than the VaR. For a simulated minibatch of hedging errors $B_j$, let $\{\pi_{[i],j}\}_{i=1}^{N_{\text{batch}}}$ be the corresponding ordered sequence and $\tilde N := \lceil \alpha N_{\text{batch}} \rceil$, where $\lceil x \rceil$ is the ceiling function (i.e. the smallest integer greater than or equal to $x$). Following Hong et al. (2014) (see Section 2 of their paper), let $\widehat{\mathrm{VaR}}_\alpha(B_j)$ and $\widehat{\mathrm{CVaR}}_\alpha(B_j)$ be the estimators of the VaR and CVaR of the short hedging error at confidence level $\alpha$:

$$\widehat{\mathrm{VaR}}_\alpha(B_j) := \pi_{[\tilde N],j}, \qquad \widehat{\mathrm{CVaR}}_\alpha(B_j) := \widehat{\mathrm{VaR}}_\alpha(B_j) + \frac{1}{(1-\alpha)N_{\text{batch}}}\sum_{i=1}^{N_{\text{batch}}}\max\left(\pi_{i,j} - \widehat{\mathrm{VaR}}_\alpha(B_j), 0\right).$$

Note that $\widehat{\mathrm{CVaR}}_\alpha(B_j)$ depends on every trainable parameter in $\theta_j$, since the $\pi_{i,j}$ are functions of the trading strategy produced by the output of the short-$\mathcal{NN}$. Furthermore, since the gain process depends linearly on the trading strategy, $\widehat{\mathrm{CVaR}}_\alpha(B_j)$ is a piecewise-linear function of the output of the short-$\mathcal{NN}$. Hence, $\nabla_\theta \widehat{\mathrm{CVaR}}_\alpha(B_j)$ is known analytically, as the gradient of the output of a FFNN with respect to the trainable parameters is known analytically (see e.g. Goodfellow et al. (2016)).

Remark 3.5.
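To make the estimators discussed in this remark concrete, here is a minimal NumPy sketch of $\widehat{\mathrm{VaR}}_\alpha(B_j)$ and $\widehat{\mathrm{CVaR}}_\alpha(B_j)$, followed by the closed form (3.8), which turns the two trained exposures $\epsilon^{(M_L)}(0)$ and $\epsilon^{(M_S)}(0)$ into $C_0^{(\star,\mathcal{NN})}$ and $\epsilon^{(\star,\mathcal{NN})}$. The function names are hypothetical, and the sketch only evaluates the estimator; the gradients needed for SGD would come from a deep learning library's automatic differentiation (the paper reports using TensorFlow), not from this code.

```python
import math

import numpy as np


def var_cvar_hat(errors, alpha):
    """Minibatch VaR and CVaR estimators of hedging errors (losses) at confidence level alpha."""
    pi = np.asarray(errors, dtype=float)
    n_batch = pi.size
    n_tilde = math.ceil(alpha * n_batch)  # \tilde N = ceil(alpha * N_batch), always >= 1
    var_hat = np.sort(pi)[n_tilde - 1]  # pi_[\tilde N]: the \tilde N-th smallest error (1-based)
    cvar_hat = var_hat + np.maximum(pi - var_hat, 0.0).sum() / ((1.0 - alpha) * n_batch)
    return var_hat, cvar_hat


def equal_risk_price(eps_long_0, eps_short_0, B_N):
    """Equation (3.8): equal risk price and residual risk from the exposures at V = 0."""
    c_0 = (eps_short_0 - eps_long_0) / (2.0 * B_N)
    eps_star = (eps_long_0 + eps_short_0) / 2.0
    return c_0, eps_star
```

For example, `var_cvar_hat([1.0, 2.0, 3.0, 4.0], 0.5)` returns a VaR of 2 and a CVaR of 3.5, the average of the two largest losses, and `equal_risk_price(-5.0, 7.0, 1.0)` returns a price of 6 with residual risk 1, at which point $\epsilon^{(M_L)}(-C) = -5 + 6 = 1 = 7 - 6 = \epsilon^{(M_S)}(C)$.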

It can be shown that $\widehat{\mathrm{CVaR}}_\alpha(B_j)$ is biased in finite samples, but is a consistent and asymptotically normal estimator of the CVaR (see e.g. Theorem … of Trindade et al. (2007)). The specific impact of this bias on the optimization procedure presented in this paper is out of scope. Multiple considerations are typically used to determine the minibatch size, which is often treated as an additional hyperparameter (see e.g. Chapter … of Goodfellow et al. (2016) for additional details). Numerical results presented in Section 4 of the current paper are robust to the minibatch size, i.e. no significant difference was observed under different minibatch sizes.

Remark 3.6.

A very popular algorithm in deep learning to compute analytically the gradient of a cost function with respect to the parameters is backpropagation (Rumelhart et al., 1986), often called backprop. Backprop efficiently leverages the structure of neural networks and the chain rule of calculus to obtain such gradients. In practice, deep learning libraries such as TensorFlow are often used to implement backprop. Moreover, more sophisticated SGD algorithms such as Adam (Kingma and Ba, 2014), which dynamically adapt the learning rates $\eta_j$ in (3.11) over time, have been shown to improve the training of neural networks. For all of the numerical experiments in Section 4, TensorFlow and Adam were used.

Remark 3.7.

Proposition 3.1 shows that the current paper's approach can approximate $C^\star$ and $\epsilon^\star$ arbitrarily well. As discussed in Goodfellow et al. (2016), SGD is not guaranteed to reach a local minimum in a reasonable amount of time; empirically, however, the algorithm often finds a very low value of the cost function quickly enough to be useful. Hence, in practice, one cannot expect the current SGD approach to converge exactly to $C^\star$ and $\epsilon^\star$, but should instead expect to obtain close-to-optimal parameters for $F^L_\theta$ and $F^S_\theta$ which result in a good enough approximation.

Numerical results

This section illustrates the implementation of the equal risk pricing framework under different market setups. Our analysis starts off in Section 4.2 with a sensitivity analysis of equal risk prices and residual hedging risk with respect to the choice of convex risk measure. The impact of different empirical properties of asset returns on the equal risk pricing framework is assessed in Section 4.3, where a comparison with benchmarks consisting of risk-neutral expected prices under commonly used EMMs is also presented. Section 4.4 shows that the current paper's approach is very general and is able to price exotic derivatives and assess their associated residual hedging risk. The setup for these numerical experiments is detailed in Section 4.1.

A single risky asset (i.e. $D = 1$) with initial price $S_0 = 100$ is considered; it can be assumed to be a non-dividend-paying stock. The annualized continuous risk-free rate is $r = 0.02$. Daily rebalancing with 260 business days per year is applied, i.e. $t_i - t_{i-1} = 1/260$ for $i = 1, \ldots, N$. The contingent claim to be priced is a vanilla European put option on the risky asset with maturity $T = 60/260$. The strikes considered are $K = 90$ for out-of-the-money (OTM), $K = 100$ for at-the-money (ATM) and $K = 110$ for in-the-money (ITM) options. The convex risk measure used by the hedger is assumed to be the CVaR.

For $n = 1, \ldots, N$, the daily log-returns $y_n := \log(S_n/S_{n-1})$ are assumed, unless stated otherwise, to follow a Gaussian regime-switching (RS) model. RS models have the ability to reproduce broadly accepted stylized facts of asset returns such as heteroskedasticity, autocorrelation in absolute returns, the leverage effect and fat tails; see, for instance, Ang and Timmermann (2012). Under RS models, log-returns depend on an unobservable discrete-time process. Let $h = \{h_n\}_{n=0}^N$ be a finite-state Markov chain taking values in $\{1, \ldots, H\}$ for a positive integer $H$, where $h_n$ is the regime or state of the market during the period $[t_n, t_{n+1})$. Let $\{\gamma_{i,j}\}_{i,j=1}^{H}$ be the homogeneous transition probabilities of the Markov chain, where for $n = 0, \ldots, N-1$:

$$\mathbb{P}(h_{n+1} = j \mid \mathcal{F}_n, h_n, \ldots, h_0) = \gamma_{h_n,j}, \quad j = 1, \ldots, H. \tag{4.1}$$

Let $\Delta := 1/260$. The log-returns are given by

$$y_{n+1} = \mu_{h_n}\Delta + \sigma_{h_n}\sqrt{\Delta}\,\epsilon_{n+1}, \quad n = 0, \ldots, N-1, \tag{4.2}$$

where $\{\epsilon_n\}_{n=1}^N$ are independent standard normal random variables and $\{\mu_i, \sigma_i\}_{i=1}^H$ are the yearly model parameters, with $\mu_i \in \mathbb{R}$ and $\sigma_i > 0$. Following Godin et al. (2019), define $\xi := \{\xi_n\}_{n=0}^N$ as the predictive probability process, with $\xi_n := [\xi_{n,1}, \ldots, \xi_{n,H}]$ and $\xi_{n,j}$ the probability that the Markov chain is in the $j$th regime during $[t_n, t_{n+1})$ conditional on the investor's filtration, i.e.:

$$\xi_{n,j} := \mathbb{P}(h_n = j \mid \mathcal{F}_n), \quad j = 1, \ldots, H. \tag{4.3}$$

François et al. (2014) show that the optimal hedging portfolio composition at time $t_n$ is strictly a function of $\{S_n, V_n, \xi_n\}$. Thus, in Assumption 3.1, the feature vector considered for both the long-$\mathcal{NN}$ and the short-$\mathcal{NN}$ is $X_n = [S_n, V_n, \xi_n, T - t_n]$. François et al. (2014) also provide a recursion to compute the predictive probability process $\xi$. For $k = 1, \ldots, H$, define the function $\phi_k$ as the Gaussian pdf of a daily log-return in regime $k$, i.e. with mean $\mu_k\Delta$ and standard deviation $\sigma_k\sqrt{\Delta}$, consistently with (4.2):

$$\phi_k(x) := \frac{1}{\sigma_k\sqrt{2\pi\Delta}}\exp\left(-\frac{(x - \mu_k\Delta)^2}{2\sigma_k^2\Delta}\right).$$

The distribution of $h_0$ is assumed to be the stationary distribution of the Markov chain. Setting $\xi_0$ as this stationary distribution, the $\xi_{n,i}$ can be recursively computed for $n = 1, \ldots, N$ as follows:

$$\xi_{n,i} = \frac{\sum_{j=1}^H \gamma_{j,i}\,\phi_j(y_n)\,\xi_{n-1,j}}{\sum_{j=1}^H \phi_j(y_n)\,\xi_{n-1,j}}, \quad i = 1, \ldots, H.$$
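The RS dynamics (4.1)-(4.2) and the predictive-probability recursion above can be sketched in NumPy as follows. This is an illustrative sketch, not the paper's code: the function names are hypothetical, regimes are indexed from 0 rather than 1, the stationary distribution is obtained by solving $\pi^\top\Gamma = \pi^\top$ with $\sum_i \pi_i = 1$, and any parameter values passed in are placeholders (the estimated parameters are in Appendix C).

```python
import numpy as np

DELTA = 1.0 / 260.0  # daily time step (260 business days per year)


def stationary_distribution(Gamma):
    """Solve pi^T Gamma = pi^T with sum(pi) = 1 for a transition matrix Gamma (rows sum to 1)."""
    H = Gamma.shape[0]
    A = np.vstack([Gamma.T - np.eye(H), np.ones((1, H))])
    b = np.concatenate([np.zeros(H), [1.0]])
    pi = np.clip(np.linalg.lstsq(A, b, rcond=None)[0], 0.0, None)
    return pi / pi.sum()


def simulate_rs_log_returns(rng, Gamma, mu, sigma, N):
    """Simulate y_1, ..., y_N from (4.1)-(4.2), with h_0 drawn from the stationary distribution."""
    H = Gamma.shape[0]
    h = rng.choice(H, p=stationary_distribution(Gamma))
    y = np.empty(N)
    for n in range(N):
        # y_{n+1} depends on the regime h_n in force during [t_n, t_{n+1})
        y[n] = mu[h] * DELTA + sigma[h] * np.sqrt(DELTA) * rng.standard_normal()
        h = rng.choice(H, p=Gamma[h])  # draw h_{n+1} given h_n
    return y


def predictive_probabilities(y, Gamma, mu, sigma):
    """Return xi of shape (N + 1, H), with xi[n, i] = P(h_n = i | F_n)."""
    m = mu * DELTA  # daily conditional mean in each regime
    s = sigma * np.sqrt(DELTA)  # daily conditional standard deviation in each regime
    xi = np.empty((len(y) + 1, Gamma.shape[0]))
    xi[0] = stationary_distribution(Gamma)
    for n in range(1, len(y) + 1):
        phi = np.exp(-0.5 * ((y[n - 1] - m) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))
        w = phi * xi[n - 1]  # phi_j(y_n) * xi_{n-1, j}
        xi[n] = (w @ Gamma) / w.sum()  # sum_j gamma_{j,i} w_j / sum_j w_j
    return xi
```

Each row `xi[n]` is the regime information entering the feature vector $X_n = [S_n, V_n, \xi_n, T - t_n]$; since the rows of $\Gamma$ sum to one, every `xi[n]` sums to one by construction.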

In Section 4.3, different dynamics for the underlying will be considered. Each model is estimated by maximum likelihood on the same time series of daily log-returns of the S&P 500 price index for the period 1986-12-31 to 2010-04-01 (5863 observations). The resulting parameters are reported in Appendix C.

The training of the long-$\mathcal{NN}$ and the short-$\mathcal{NN}$ is done as described in Section 3.3, with 100 epochs (an epoch is defined as one complete iteration of SGD over the training set; with a training set of 400,000 paths and a minibatch size of 1,000, one epoch corresponds to 400 parameter updates), a minibatch size of 1,000 on a training set (in-sample) of 400,000 independent simulated paths and a learning rate of 0.… . Out-of-sample results are computed on 100,000 independent paths. The structure of every neural network is 3 layers with 56 neurons per layer, and the activation function is the rectified linear unit (ReLU), where $\mathrm{ReLU}: \mathbb{R} \to [0, \infty)$ is defined as $\mathrm{ReLU}(x) := \max(0, x)$.

In this section, we perform sensitivity analyses of equal risk prices and residual hedging risk with respect to the confidence level of the CVaR. Three different confidence levels are considered: $\mathrm{CVaR}_\alpha$ at levels 0.90, 0.95 and 0.99. Optimizing risk exposure using a higher level $\alpha$ corresponds to agents with a higher risk aversion, as the latter puts more relative weight on losses of larger magnitude. Thus, the choice of confidence levels is motivated by the objective of assessing the impact of the risk aversion of hedging agents on equal risk pricing. Table 1 presents the equal risk option prices and residual hedging risk exposures under the three confidence levels.

Our numerical results show that, under the equal risk pricing framework, an increase in the risk aversion of hedging agents leads to increased put option prices. Indeed, under the $\mathrm{CVaR}_{0.99}$ risk measure, option prices significantly increase across all moneynesses, with relative increases of respectively 91%, 42% and 14% for OTM, ATM and ITM contracts with respect to prices obtained with the $\mathrm{CVaR}_{0.90}$. By using the $\mathrm{CVaR}_{0.95}$ instead of the $\mathrm{CVaR}_{0.90}$, only OTM equal risk prices are significantly impacted, with an increase of 32%, while for ATM and ITM contracts the increase seems marginal.

Table 1: Sensitivity analysis of equal risk prices $C_0^{(\star,\mathcal{NN})}$ and residual hedging risk $\epsilon^{(\star,\mathcal{NN})}$ for OTM ($K = 90$), ATM ($K = 100$) and ITM ($K = 110$) put options of maturity $T = 60/260$ […].

                Equal risk price C0     Residual risk ε         ε / C0
Moneyness       OTM    ATM    ITM       OTM    ATM    ITM       OTM    ATM    ITM
CVaR_0.90       1.40   4.19   11.14     1.36   2.61   1.68      0.97   0.62   0.15
CVaR_0.95       32%    4%     2%        35%    11%    14%       2%     7%     12%
CVaR_0.99       91%    42%    14%       99%    79%    98%       4%     26%    74%

Notes: These results are computed based on 100,000 independent paths generated from the regime-switching model under $\mathbb{P}$ (see Section 4.1.1 for the model definition and Appendix C for model parameters). The training of neural networks is done as described in Section 4.1.2. Values for the $\mathrm{CVaR}_{0.95}$ and $\mathrm{CVaR}_{0.99}$ risk measures are expressed relative to the $\mathrm{CVaR}_{0.90}$ (% increase).

The positive association between put option prices and the confidence level of hedgers can be explained by the fact that a put option payoff is bounded below by zero. Therefore, the hedging error of the short position has a much heavier right tail than that of the long position. Thus, an increase in $\alpha$ often implies a larger increase in the risk exposure of the short position than in that of the long position. This results in a higher equal risk price to compensate for the heavier increase in the risk exposure of the short position.

As expected, the risk exposure per option contract ($\epsilon^{(\star,\mathcal{NN})}$) also increases with the level of risk aversion across all moneynesses. This is a direct consequence of Proposition 2.3 and of the monotonicity of $\mathrm{CVaR}_\alpha$ with respect to $\alpha$. Also, the risk exposure per dollar invested ($\epsilon^{(\star,\mathcal{NN})}/C_0^{(\star,\mathcal{NN})}$) for ITM and ATM contracts exhibits high sensitivity to the confidence level $\alpha$, while for OTM contracts the value of $\alpha$ seems much less important. This observation for OTM contracts is due to similar relative increases in prices and residual risk exposures when $\alpha$ is increased. From these results, we can conclude that, in practice, the choice of the confidence level (or more generally of the risk measure itself) needs to be carefully analyzed, as it can have a material impact on equal risk option prices.

In this section, we consider four different dynamics for the underlying: the BSM, a GARCH process, a regime-switching process and a jump-diffusion. This is motivated by the objective of assessing the impact of different empirical properties of asset returns on the equal risk pricing framework.
Indeed, Monte Carlo simulations from these models enable quantifying the impact of time-varying volatility, regime risk and jump risk on equal risk prices and residual hedging risk. Moreover, risk-neutral expected prices under common EMMs found in the literature are used as benchmarks for equal risk prices. The physical dynamics of each model are described below, and the associated risk-neutral dynamics are provided in Appendix B.

Under the discrete Black–Scholes model, log-returns are assumed to be i.i.d. normal random variables with daily mean and variance of respectively $(\mu - \sigma^2/2)\Delta$ and $\sigma^2\Delta$:

$$y_n = \left(\mu - \frac{\sigma^2}{2}\right)\Delta + \sigma\sqrt{\Delta}\,\epsilon_n, \quad n = 1, \ldots, N, \tag{4.4}$$

where $\mu \in \mathbb{R}$, $\sigma > 0$ and $\{\epsilon_n\}_{n=1}^N$ are independent standard normal random variables. The feature vector of the neural network is $X_n = [S_n, V_n, T - t_n]$.

The jump-diffusion model of Merton (1976) (MJD) generalizes the BSM by incorporating random jumps within paths. Let $\{\epsilon_n\}_{n=1}^N$ be independent standard normal random variables, $\{N_n\}_{n=0}^N$ be the values of a homogeneous Poisson process of intensity $\lambda$ at $t_0, \ldots, t_N$, and $\{\chi_j\}_{j=1}^\infty$ be i.i.d. normal random variables of mean $\gamma \in \mathbb{R}$ and variance $\vartheta^2$. $\{N_n\}_{n=1}^N$, $\{\epsilon_n\}_{n=1}^N$ and $\{\chi_j\}_{j=1}^\infty$ are assumed independent. For $n = 1, \ldots, N$:

$$y_n = \left(\alpha - \lambda\left(e^{\gamma + \vartheta^2/2} - 1\right) - \frac{\sigma^2}{2}\right)\Delta + \sigma\sqrt{\Delta}\,\epsilon_n + \sum_{j=N_{n-1}+1}^{N_n}\chi_j, \tag{4.5}$$

where $\{\alpha, \gamma, \vartheta, \lambda, \sigma\}$ are the model parameters, with $\{\alpha, \lambda, \sigma\}$ on a yearly scale, $\alpha \in \mathbb{R}$ and $\sigma > 0$. The feature vector of the neural network is $X_n = [S_n, V_n, T - t_n]$.

4.3.3 GARCH model

In contrast to the BSM or MJD model, GARCH models allow the volatility of asset returns to be time-varying. The GJR-GARCH(1,1) model of Glosten et al. (1993) assumes that the conditional variance of log-returns is stochastic, and captures important features of asset returns such as the leverage effect and volatility clustering. For $n = 1, \ldots, N$:

$$y_n = \mu + \sigma_n\epsilon_n, \qquad \sigma_{n+1}^2 = \omega + \alpha\sigma_n^2\left(|\epsilon_n| - \gamma\epsilon_n\right)^2 + \beta\sigma_n^2, \tag{4.6}$$

where the model parameters $\{\omega, \alpha, \beta\}$ are positive real values, $\mu \in \mathbb{R}$, $\gamma \in \mathbb{R}$ and $\{\epsilon_n\}_{n=1}^N$ is a sequence of independent standard normal random variables. Given the initial value $\sigma_1$, $\{\sigma_n\}_{n=1}^N$ is predictable with respect to $\mathbb{F}$. A common assumption, which is used in this paper, is to set $\sigma_1^2$ as the stationary variance: $\sigma_1^2 = \frac{\omega}{1 - \alpha(1 + \gamma^2) - \beta}$. The feature vector of the neural network is $X_n = [S_n, V_n, \sigma_{n+1}, T - t_n]$.

Table 2 presents the equal risk prices and residual hedging risk exposures for the four dynamics considered, based on the $\mathrm{CVaR}_{0.95}$ risk measure. Values observed for $C_0^{(\star,\mathcal{NN})}$ indicate that the sensitivity of equal risk prices with respect to the dynamics of the underlying depends highly on the moneyness. Indeed, OTM prices are significantly impacted by the choice of dynamics: choosing the GJR-GARCH or RS model instead of the BSM leads to price increases of respectively 68% and 217%. For ATM and ITM contracts, only the RS model seems to materially alter equal risk prices in comparison to BSM prices, with respective increases of 23% and 9%. Moreover, the numerical results confirm that the increase in residual hedging risk generated by time-varying volatility, regime risk and jump risk is far from marginal and is highly sensitive to the moneyness of the option.
Values obtained for the metric $\epsilon^{(\star,\mathcal{NN})}/C_0^{(\star,\mathcal{NN})}$ show that regime risk has the most impact, with increases of the risk exposure per dollar invested of 67%, 218% and 196% for OTM, ATM and ITM contracts, respectively. When compared to jump risk, time-varying volatility seems to have a higher impact on residual hedging risk for OTM and ATM options, while jump risk has a higher impact on ITM contracts. It is interesting to note that Augustyniak et al. (2017) evaluate the impact of the dynamics of the underlying on the risk exposure and on the price of contingent claims under a quadratic penalty. Their numerical results show that the risk exposure is highly sensitive to the dynamics, but the price is not. This contrasts with the numerical results of the current study, which show that under a non-quadratic penalty, prices can also vary significantly with the dynamics of the underlying.

Table 2: Equal risk prices $C_0^{(\star,\mathcal{NN})}$ and residual hedging risk $\epsilon^{(\star,\mathcal{NN})}$ for OTM ($K = 90$), ATM ($K = 100$) and ITM ($K = 110$) put options of maturity $T = 60/260$ under different dynamics.

                    Equal risk price C0     Residual risk ε         ε / C0
Moneyness           OTM    ATM    ITM       OTM    ATM    ITM       OTM    ATM    ITM
BSM                 0.58   3.53   10.39     0.35   0.74   0.59      0.60   0.21   0.06
MJD                 …      −2%    0%        50%    62%    41%       42%    65%    41%
GJR-GARCH           68%    −…     −1%       165%   115%   28%       57%    124%   30%
Regime-switching    217%   23%    9%        428%   291%   223%      67%    218%   196%

Notes: These results are computed based on 100,000 independent paths generated from each of the four different models for the underlying under $\mathbb{P}$ (see Section 4.1.1 and Section 4.3 for model definitions and Appendix C for model parameters). For each model, the training of neural networks is done as described in Section 4.1.2. The confidence level of the CVaR risk measure is $\alpha = 0.95$. Results for the MJD, GJR-GARCH and regime-switching models are expressed relative to the BSM (% increase).

Table 3 compares equal risk prices to risk-neutral prices for each dynamics. These results show that, except for a few cases, equal risk prices are significantly higher than risk-neutral prices across all dynamics and moneynesses. This is especially true for OTM contracts: the lowest and highest relative price increases are 10% (BSM) and 231% (regime-switching model). This significant increase in option prices can be attributed to the different treatment of market scenarios by each approach. Expected risk-neutral prices average over all scenarios, while equal risk prices with the CVaR risk measure coupled with a high confidence level $\alpha$ only consider extreme scenarios.

Table 3: Equal risk and risk-neutral prices for OTM ($K = 90$), ATM ($K = 100$) and ITM ($K = 110$) put options of maturity $T = 60/260$ […].

                    Risk-neutral price        Equal risk price (% increase)
Moneyness           OTM    ATM    ITM         OTM    ATM    ITM
BSM                 0.53   3.51   10.36       10%    1%     0%
MJD                 0.46   3.32   10.24       34%    4%     2%
GJR-GARCH           0.57   2.98   9.84        71%    14%    4%
Regime-switching    0.56   3.10   10.33       231%   40%    10%

Notes: Results for equal risk prices are computed based on 100,000 independent paths generated from each of the four different models for the underlying under $\mathbb{P}$ (see Section 4.1.1 and Section 4.3 for model definitions and Appendix C for model parameters). For each model, the training of neural networks is done as described in Section 4.1.2. The confidence level of the CVaR risk measure is $\alpha = 0.95$. Results for risk-neutral prices are computed under the associated risk-neutral dynamics described in Appendix B. Equal risk prices are expressed relative to risk-neutral prices (% increase).

The latter observation has important implications for financial participants in the option market. Indeed, by using the risk-neutral valuation approach instead of the equal risk pricing framework, a risk-averse participant acting as a provider of options, e.g. a market maker, might significantly underprice OTM put options. From the perspective of the equal risk pricing framework, risk-neutral prices imply more residual risk for the short position of OTM put contracts than for the long position. It is important to note that the risk-neutral dynamics considered in this paper assume that jump and regime risk are not priced in the market. Additional analyses comparing equal risk prices to risk-neutral prices under alternative EMMs embedding other forms of risk premia (see for instance Bates (1996) for the jump risk premium and Godin et al. (2019) for the regime risk premium) may prove worthwhile in further work.
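The MJD and GJR-GARCH log-return dynamics (4.5)-(4.6) behind Tables 2 and 3 can be simulated along the following lines. This NumPy sketch is not the paper's code: the function names are hypothetical, and any parameter values passed in are placeholders (the estimates are in Appendix C). For the MJD, it uses the fact that, given $N_n - N_{n-1} = k$ jumps during a day, the jump sum $\sum_j \chi_j$ is normal with mean $k\gamma$ and variance $k\vartheta^2$; the GJR-GARCH recursion is started at the stationary variance $\sigma_1^2$.

```python
import numpy as np

DELTA = 1.0 / 260.0  # daily time step; yearly MJD parameters are scaled by it in (4.5)


def simulate_mjd_log_returns(rng, N, alpha, sigma, lam, gamma, vartheta):
    """Daily log-returns y_1..y_N under (4.5); alpha, sigma and lam are on a yearly scale."""
    drift = (alpha - lam * (np.exp(gamma + vartheta**2 / 2.0) - 1.0) - sigma**2 / 2.0) * DELTA
    diffusion = sigma * np.sqrt(DELTA) * rng.standard_normal(N)
    k = rng.poisson(lam * DELTA, size=N)  # N_n - N_{n-1}: number of jumps during day n
    # Sum of k i.i.d. N(gamma, vartheta^2) jumps is N(k * gamma, k * vartheta^2).
    jumps = k * gamma + np.sqrt(k) * vartheta * rng.standard_normal(N)
    return drift + diffusion + jumps


def simulate_gjr_garch_log_returns(rng, N, mu, omega, alpha, beta, gamma):
    """Daily log-returns under GJR-GARCH(1,1) (4.6), started at the stationary variance.

    Returns (y, var) where y[n - 1] = y_n and var[n] = sigma_{n+1}^2, i.e. var[n] is the
    conditional variance known at t_n that enters the feature vector X_n.
    """
    persistence = alpha * (1.0 + gamma**2) + beta
    if persistence >= 1.0:
        raise ValueError("stationary variance requires alpha * (1 + gamma^2) + beta < 1")
    var = np.empty(N + 1)
    var[0] = omega / (1.0 - persistence)  # sigma_1^2
    y = np.empty(N)
    for n in range(N):
        eps = rng.standard_normal()
        y[n] = mu + np.sqrt(var[n]) * eps
        var[n + 1] = omega + alpha * var[n] * (abs(eps) - gamma * eps) ** 2 + beta * var[n]
    return y, var
```

The stationarity check mirrors the denominator of $\sigma_1^2$: since $\mathbb{E}[(|\epsilon_n| - \gamma\epsilon_n)^2] = 1 + \gamma^2$, the stationary variance only exists when $\alpha(1 + \gamma^2) + \beta < 1$.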

In this section, two exotic contingent claims are considered for the equal risk pricing framework, namely an Asian average price put and a lookback put with fixed strike. For $Z_n = \frac{1}{n+1}\sum_{i=0}^n S_i$, $n = 0, \ldots, N$, the Asian option's payoff is

$$\Phi(S_N, Z_N) = \max(0, K - Z_N).$$

For $Z_n = \min_{i=0,\ldots,n} S_i$, $n = 0, \ldots, N$, the lookback option's payoff is

$$\Phi(S_N, Z_N) = \max(0, K - Z_N).$$

The same assumptions as in Section 4.3 are imposed, and only the regime-switching model is considered. The maturity is still $T = 60/260$, and the feature vector of the neural networks is $X_n = [S_n, Z_n, V_n, \xi_n, T - t_n]$. Table 4 presents the prices and residual hedging risk for the three contingent claims (including the vanilla put option studied in previous sections), and Table 5 compares the equal risk prices to risk-neutral prices. From Table 4, we observe that the residual hedging risk exposure ($\epsilon^{(\star,\mathcal{NN})}$) is the highest for the lookback option, followed by the vanilla put and then the Asian option. Although this was expected, our approach has the benefit of quantifying how risk varies across different option categories. Table 5 shows that equal risk prices under the RS model are significantly higher than risk-neutral prices across all contingent claims considered. The difference is most important for OTM contracts, with respective increases of 231%, 465% and 220% for the put, Asian and lookback options. The finding that equal risk prices tend to be higher than risk-neutral prices is therefore shared by multiple option categories.

Table 4: Equal risk prices $C_0^{(\star,\mathcal{NN})}$ and residual hedging risk $\epsilon^{(\star,\mathcal{NN})}$ for OTM ($K = 90$), ATM ($K = 100$) and ITM ($K = 110$) vanilla put, Asian average price put and lookback put options of maturity $T = 60/260$ […].

                Equal risk price C0     Residual risk ε         ε / C0
Moneyness       OTM    ATM    ITM       OTM    ATM    ITM       OTM    ATM    ITM
Put             1.84   4.34   11.33     1.84   2.90   1.91      1.00   0.67   0.17
Asian, Lookback (% relative to Put): − − − − − − 55% 1% 6% − − 5% 86% …

Notes: These results are computed based on 100,000 independent paths […] $\alpha = 0.95$. Results are expressed relative to the put option (% increase).

Table 5: Equal risk and risk-neutral prices for OTM ($K = 90$), ATM ($K = 100$) and ITM ($K = 110$) vanilla put, Asian average price put and lookback put options of maturity $T = 60/260$ […].

                Risk-neutral price        Equal risk price (% increase)
Moneyness       OTM    ATM    ITM         OTM    ATM    ITM
Put             0.56   3.10   10.33       231%   40%    10%
Asian           0.11   1.77   9.91        465%   56%    6%
Lookback        0.94   5.61   15.57       220%   39%    19%

Notes: Results for equal risk prices are computed with 100,000 independent paths […] $\alpha = 0.95$. Results for risk-neutral prices are computed under the associated risk-neutral dynamics described in Appendix B. Equal risk prices are expressed relative to risk-neutral prices (% increase).
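The path-dependent states $Z_n$ of the Asian and lookback options can be computed for whole arrays of simulated paths at once. The NumPy sketch below (function names hypothetical, not the paper's code) returns $Z_0, \ldots, Z_N$ along the last axis of an array of prices $S_0, \ldots, S_N$; these states feed both the feature vector $X_n = [S_n, Z_n, V_n, \xi_n, T - t_n]$ and the common payoff $\max(0, K - Z_N)$.

```python
import numpy as np


def asian_state(S):
    """Z_n = (1 / (n + 1)) * sum_{i=0}^{n} S_i along the last axis (S holds N + 1 prices)."""
    S = np.asarray(S, dtype=float)
    return np.cumsum(S, axis=-1) / np.arange(1, S.shape[-1] + 1)


def lookback_state(S):
    """Z_n = min_{i=0..n} S_i along the last axis."""
    return np.minimum.accumulate(np.asarray(S, dtype=float), axis=-1)


def put_payoff_on_state(Z, K):
    """Payoff max(0, K - Z_N) of the Asian average price put and fixed-strike lookback put."""
    return np.maximum(0.0, K - np.asarray(Z, dtype=float)[..., -1])
```

For the short path $S = [100, 98, 103, 95]$ and $K = 100$, the Asian state ends at $Z_N = 99$ (payoff 1) while the lookback state ends at $Z_N = 95$ (payoff 5), which illustrates why the lookback put is the more expensive and riskier of the two contracts.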

This paper presents a deep reinforcement learning approach to price and hedge financial derivatives under the equal risk pricing framework. This framework, introduced by Guo and Zhu (2017), sets option prices such that the optimally hedged residual risk exposures of the long and short positions in the contingent claim are equal. Adaptations to the latter scheme are used, as proposed in Marzban et al. (2020), by considering convex risk measures under the physical measure to evaluate residual risk exposures. A rigorous proof that equal risk prices under these modifications are arbitrage-free in general market settings, which can include an arbitrary number of hedging instruments, is given in the current paper.

Moreover, a universal and tractable solution based on the deep hedging algorithm of Buehler et al. (2019) to implement the equal risk pricing framework under very general conditions is described. Results presented in this paper, which rely on Buehler et al. (2019), demonstrate that our methodological approach to equal risk pricing can approximate the true equal risk price arbitrarily well. This study also introduces asymmetric $\epsilon$-completeness measures to quantify the level of unhedgeable risk associated with a position in a contingent claim. These measures complement the work of Bertsimas et al. (2001), who proposed market incompleteness measures under the quadratic penalty, whereas ours have the advantage of characterizing the risk aversion of the hedger with any convex risk measure. Additionally, the measures introduced in this paper are asymmetric in the sense that the risks of the long and short positions in the derivative are quantified by two different hedging strategies, unlike in Bertsimas et al. (2001), where the single variance-optimal hedging strategy is considered.

Furthermore, Monte Carlo simulations were performed to study the equal risk pricing framework under a large variety of market dynamics.
The behavior of equal risk pricing is analyzed with respect to the choice of the underlying asset model and of the confidence level associated with the risk measure, and is benchmarked against expected risk-neutral pricing. Conducting these numerical experiments relied crucially on the deep RL algorithm presented in this study. Numerical results showed that, except for a few cases, equal risk prices are significantly higher than risk-neutral prices across all dynamics and moneynesses considered. This finding is most pronounced for OTM contracts and is shared by multiple option categories. Furthermore, for a fixed model of the underlying, sensitivity analyses show that the choice of confidence level under the CVaR risk measure has a material impact on equal risk prices. Numerical experiments also provided insight into the drivers of the $\epsilon$-completeness measures introduced in the current paper. The numerical study confirms that, for vanilla put options, the increase in residual hedging risk generated by time-varying volatility, regime risk and jump risk is far from marginal and is highly sensitive to the moneyness of the option.

Future research on equal risk pricing could prove worthwhile. First, a question which remains is whether the consistency of the equal risk pricing approach with risk-neutral valuation can be made explicit. Moreover, additional analyses comparing equal risk prices to risk-neutral prices under alternative EMMs embedding other forms of risk premia may also prove worthwhile. Furthermore, a numerical study of the equal risk pricing framework under convex risk measures other than the CVaR could be of interest; we note that Marzban et al. (2020) provide numerical results of equal risk pricing under the worst-case risk measure in the context of robust optimization. Lastly, the financial market could be extended to include market frictions such as transaction costs and trading constraints.
The latter inclusions would require examining if equal risk pricesare guaranteed to remain arbitrage-free in this context.31 eferences

Alexander, S., Coleman, T. F., and Li, Y. (2003). Derivative portfolio hedging based on CVaR. New Risk Measures in Investment and Regulation: Wiley.

Almahdi, S. and Yang, S. Y. (2017). An adaptive portfolio trading system: A risk-return portfolio optimization using recurrent reinforcement learning with expected maximum drawdown. Expert Systems with Applications, 87:267–279.

Ang, A. and Timmermann, A. (2012). Regime changes and financial markets. Annual Review of Financial Economics, 4(1):313–337.

Augustyniak, M., Godin, F., and Simard, C. (2017). Assessing the effectiveness of local and global quadratic hedging under GARCH models. Quantitative Finance, 17(9):1305–1318.

Bates, D. S. (1996). Jumps and stochastic volatility: Exchange rate processes implicit in deutsche mark options. The Review of Financial Studies, 9(1):69–107.

Bertsimas, D., Kogan, L., and Lo, A. W. (2001). Hedging derivative securities and incomplete markets: an ε-arbitrage approach. Operations Research, 49(3):372–397.

Black, F. and Scholes, M. (1973). The pricing of options and corporate liabilities. Journal of Political Economy, 81(3):637–654.

Bollen, N. P. (1998). Valuing options in regime-switching models. Journal of Derivatives, 6:38–50.

Buehler, H., Gonon, L., Teichmann, J., and Wood, B. (2019). Deep hedging. Quantitative Finance, 19(8):1271–1291.

Carr, P. and Madan, D. (1999). Option valuation using the fast Fourier transform. Journal of Computational Finance, 2(4):61–73.

Delbaen, F. and Schachermayer, W. (1994). A general version of the fundamental theorem of asset pricing. Mathematische Annalen, 300(1):463–520.

Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B (Methodological), 39(1):1–22.

Deng, Y. et al. (2016). Deep direct reinforcement learning for financial signal representation and trading. IEEE Transactions on Neural Networks and Learning Systems, 28(3):653–664.

Duan, J.-C. (1995). The GARCH option pricing model. Mathematical Finance, 5(1):13–32.

Eberlein, E. and Jacod, J. (1997). On the range of options prices. Finance and Stochastics, 1(2):131–140.

Elliott, R. J. and Madan, D. B. (1998). A discrete time equivalent martingale measure. Mathematical Finance, 8(2):127–152.

Föllmer, H. and Schied, A. (2002). Convex measures of risk and trading constraints. Finance and Stochastics, 6(4):429–447.

Föllmer, H. and Schweizer, M. (1991). Hedging of contingent claims under incomplete information. Applied Stochastic Analysis, 5:389–414.

François, P., Gauthier, G., and Godin, F. (2014). Optimal hedging when the underlying asset follows a regime-switching Markov process. European Journal of Operational Research, 237(1):312–322.

Frittelli, M. (2000). The minimal entropy martingale measure and the valuation problem in incomplete markets. Mathematical Finance, 10(1):39–52.

Gerber, H. U. and Shiu, E. S. W. (1994). Option pricing by Esscher transforms. Transactions of the Society of Actuaries, 46:99–191.

Glorot, X. and Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pages 249–256.

Glosten, L. R., Jagannathan, R., and Runkle, D. E. (1993). On the relation between the expected value and the volatility of the nominal excess return on stocks. The Journal of Finance, 48(5):1779–1801.

Godin, F. (2016). Minimizing CVaR in global dynamic hedging with transaction costs. Quantitative Finance, 16(3):461–475.

Godin, F., Lai, V. S., and Trottier, D.-A. (2019). Option pricing under regime-switching models: Novel approaches removing path-dependence. Insurance: Mathematics and Economics, 87:130–142.

Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning. MIT Press.

Guo, I. and Zhu, S.-P. (2017). Equal risk pricing under convex trading constraints. Journal of Economic Dynamics and Control, 76:136–151.

Halperin, I. (2017). QLBS: Q-learner in the Black-Scholes(-Merton) worlds. Available at SSRN 3087076.

Hardy, M. R. (2001). A regime-switching model of long-term stock returns. North American Actuarial Journal, 5(2):41–53.

Harrison, J. M. and Pliska, S. R. (1981). Martingales and stochastic integrals in the theory of continuous trading. Stochastic Processes and their Applications, 11(3):215–260.

Hodges, S. and Neuberger, A. (1989). Optimal replication of contingent claims under transactions costs. Review of Futures Markets, 8:222–239.

Hong, L. J., Hu, Z., and Liu, G. (2014). Monte Carlo methods for value-at-risk and conditional value-at-risk: a review. ACM Transactions on Modeling and Computer Simulation (TOMACS), 24(4):1–37.

Hornik, K. (1991). Approximation capabilities of multilayer feedforward networks. Neural Networks, 4(2):251–257.

Jiang, Z., Xu, D., and Liang, J. (2017). A deep reinforcement learning framework for the financial portfolio management problem. arXiv preprint arXiv:1706.10059.

Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

Kolm, P. N. and Ritter, G. (2019). Dynamic replication and hedging: A reinforcement learning approach. The Journal of Financial Data Science, 1(1):159–171.

Lamberton, D. and Lapeyre, B. (2011). Introduction to Stochastic Calculus Applied to Finance. Chapman and Hall/CRC.

Lu, D. W. (2017). Agent inspired trading using recurrent reinforcement learning and LSTM neural networks. arXiv preprint arXiv:1707.07338.

Marzban, S., Delage, E., and Li, J. Y. (2020). Equal risk pricing and hedging of financial derivatives with convex risk measures. arXiv preprint arXiv:2002.02876.

Merton, R. C. (1973). Theory of rational option pricing. The Bell Journal of Economics and Management Science, 4(1):141–183.

Merton, R. C. (1976). Option pricing when underlying stock returns are discontinuous. Journal of Financial Economics, 3:125–144.

Moody, J. and Saffell, M. (2001). Learning to trade via direct reinforcement. IEEE Transactions on Neural Networks, 12(4):875–889.

Rockafellar, R. T. and Uryasev, S. (2002). Conditional value-at-risk for general loss distributions. Journal of Banking & Finance, 26(7):1443–1471.

Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088):533–536.

Rummery, G. A. and Niranjan, M. (1994). On-line Q-learning using connectionist systems, volume 37. University of Cambridge, Department of Engineering, Cambridge, UK.

Schweizer, M. (1995). Variance-optimal hedging in discrete time. Mathematics of Operations Research, 20(1):1–32.

Trindade, A. A., Uryasev, S., Shapiro, A., and Zrazhevsky, G. (2007). Financial prediction with constrained tail risk. Journal of Banking & Finance, 31(11):3524–3538.

Watkins, C. J. and Dayan, P. (1992). Q-learning. Machine Learning, 8(3-4):279–292.

Xu, M. (2006). Risk measure pricing and hedging in incomplete markets. Annals of Finance, 2(1):51–71.

A Proofs

A.1 Proof of Proposition 2.1

Using the translation invariance property of $\rho$,
$$\epsilon^{(L)}(V) = \min_{\delta\in\Pi} \rho\big(-\Phi(S_N,Z_N) - B_N(V + G^{\delta}_N)\big) = \min_{\delta\in\Pi} \rho\big(-\Phi(S_N,Z_N) - B_N G^{\delta}_N\big) - B_N V = \epsilon^{(L)}(0) - B_N V.$$
Similar steps show that $\epsilon^{(S)}(V) = \epsilon^{(S)}(0) - B_N V$. □

A.2 Proof of Proposition 2.2
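Before the proof, Propositions 2.1 and 2.2 can be illustrated numerically. The Python sketch below is a toy under stated assumptions, not the paper's deep hedging implementation: a single-period market with one hedging asset, a lognormal $S_T$ under $\mathbb{P}$, a put payoff, $B_N = e^{rT}$, an empirical CVaR applied to losses (so that $\rho(X + c) = \rho(X) + c$), and $\Pi$ replaced by a grid of static positions. The helpers `empirical_cvar` and `hedged_risk` and all parameter values are hypothetical. Raising the initial capital $V$ from 0 to 5 should lower the minimal residual risk by exactly $B_N V$ (Proposition 2.1) and leave the minimizing position unchanged (Proposition 2.2).

```python
import math
import random


def empirical_cvar(losses, alpha):
    """Empirical CVaR_alpha of a loss sample: average of the worst (1 - alpha) share.

    Applied to losses, so CVaR(X + c) = CVaR(X) + c (translation invariance).
    """
    ordered = sorted(losses, reverse=True)
    k = max(1, round((1.0 - alpha) * len(ordered)))
    return sum(ordered[:k]) / k


def hedged_risk(V, sign, payoff, s_T, s0, b_N, alpha, grid):
    """Minimal residual risk over a grid of static hedges, and the minimising position.

    sign = +1: long position,  loss -Phi - B_N (V + G);
    sign = -1: short position, loss  Phi - B_N (V + G);
    where G = delta * (S_T / B_N - S_0) is the discounted gain of holding delta shares.
    """
    best_risk, best_delta = math.inf, None
    for delta in grid:
        losses = [-sign * p - b_N * (V + delta * (s / b_N - s0)) for p, s in zip(payoff, s_T)]
        risk = empirical_cvar(losses, alpha)
        if risk < best_risk:
            best_risk, best_delta = risk, delta
    return best_risk, best_delta


# Hypothetical one-period market: lognormal S_T under P, put payoff, B_N = e^{rT}.
random.seed(0)
s0, strike, r, mu, sigma, T = 100.0, 100.0, 0.02, 0.08, 0.2, 0.25
b_N = math.exp(r * T)
s_T = [s0 * math.exp((mu - 0.5 * sigma**2) * T + sigma * math.sqrt(T) * random.gauss(0.0, 1.0))
       for _ in range(2000)]
payoff = [max(strike - s, 0.0) for s in s_T]
grid = [i / 50 - 1.0 for i in range(101)]  # static positions from -1 to 1 in steps of 0.02
alpha = 0.95

for sign, name in ((+1, "long"), (-1, "short")):
    risk_0, delta_0 = hedged_risk(0.0, sign, payoff, s_T, s0, b_N, alpha, grid)
    risk_5, delta_5 = hedged_risk(5.0, sign, payoff, s_T, s0, b_N, alpha, grid)
    # Proposition 2.1: risk_5 + B_N * 5 equals risk_0; Proposition 2.2: delta_5 equals delta_0
    print(name, risk_0, risk_5 + b_N * 5.0, delta_0, delta_5)
```

The grid search stands in for the minimization over $\Pi$; since every candidate loss is shifted by the same constant $-B_N V$, the ranking of candidates, and hence the minimizer, cannot change.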

Using the translation invariance property of $\rho$,
$$\delta^{(L)} = \arg\min_{\delta\in\Pi} \rho\big(-\Phi(S_N,Z_N) - B_N(V + G^{\delta}_N)\big) = \arg\min_{\delta\in\Pi} \big\{\rho\big(-\Phi(S_N,Z_N) - B_N G^{\delta}_N\big) - B_N V\big\} = \arg\min_{\delta\in\Pi} \rho\big(-\Phi(S_N,Z_N) - B_N G^{\delta}_N\big).$$
Similar steps show that $\delta^{(S)}$ is independent of $V$. □

A.3 Lemma 1

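For any $\bar\delta, \tilde\delta \in \Pi$, $G^{\bar\delta+\tilde\delta}_n = G^{\bar\delta}_n + G^{\tilde\delta}_n$ for $n = 0, \ldots, N$.

A.4 Proof of Lemma 1

$G^{\bar\delta+\tilde\delta}_0 = G^{\bar\delta}_0 + G^{\tilde\delta}_0 = 0$ by definition and, for $n = 1, \ldots, N$,
$$G^{\bar\delta+\tilde\delta}_n = \sum_{k=1}^{n} \big(\bar\delta^{(1:D)}_k + \tilde\delta^{(1:D)}_k\big) \bullet \big(B_k^{-1} S_k - B_{k-1}^{-1} S_{k-1}\big) = G^{\bar\delta}_n + G^{\tilde\delta}_n.$$ □

A.5 Lemma 2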

For any $\delta \in \Pi$, $G^{\delta}_n = -G^{-\delta}_n$ for $n = 0, \ldots, N$.

A.6 Proof of Lemma 2
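Lemmas 1 and 2 both state that $\delta \mapsto G^{\delta}_n$ is linear. As an informal check, the Python sketch below computes $G^{\delta}_n = \sum_{k=1}^{n} \delta^{(1:D)}_k \bullet (B_k^{-1} S_k - B_{k-1}^{-1} S_{k-1})$ for a hypothetical two-asset price path and compares the gains of $\bar\delta + \tilde\delta$ and $-\bar\delta$ with those of the individual strategies. The path, the strategies and the helper names (`discounted_gains`, `add`, `negate`) are illustrative assumptions, and only the risky-asset holdings $\delta^{(1:D)}_k$ are stored.

```python
import math


def discounted_gains(holdings, prices, bank):
    """Discounted gains G_0, ..., G_N of a trading strategy in D risky assets.

    holdings[k - 1] is the vector delta_k^{(1:D)} held over period k (k = 1, ..., N),
    prices[k] is the vector S_k and bank[k] is the risk-free asset value B_k (k = 0, ..., N).
    """
    gains = [0.0]  # G_0 = 0 by definition
    for k in range(1, len(prices)):
        increment = sum(
            h * (s_k / bank[k] - s_prev / bank[k - 1])
            for h, s_k, s_prev in zip(holdings[k - 1], prices[k], prices[k - 1])
        )
        gains.append(gains[-1] + increment)
    return gains


def add(a, b):
    """Componentwise sum of two strategies (lists of holding vectors)."""
    return [[x + y for x, y in zip(u, v)] for u, v in zip(a, b)]


def negate(a):
    """The strategy -delta."""
    return [[-x for x in u] for u in a]


# Hypothetical two-asset price path over N = 3 periods with a flat risk-free rate.
prices = [[100.0, 50.0], [101.0, 49.5], [99.5, 50.8], [102.0, 51.0]]
bank = [math.exp(0.001 * k) for k in range(len(prices))]
delta_bar = [[0.5, -1.0], [0.3, 0.2], [-0.7, 1.1]]
delta_tilde = [[-0.2, 0.4], [1.0, -0.5], [0.6, 0.0]]

g_bar = discounted_gains(delta_bar, prices, bank)
g_tilde = discounted_gains(delta_tilde, prices, bank)
g_sum = discounted_gains(add(delta_bar, delta_tilde), prices, bank)
g_neg = discounted_gains(negate(delta_bar), prices, bank)

lemma_1 = all(math.isclose(s, b + t, abs_tol=1e-12) for s, b, t in zip(g_sum, g_bar, g_tilde))
lemma_2 = all(math.isclose(g, -m, abs_tol=1e-12) for g, m in zip(g_bar, g_neg))
print(lemma_1, lemma_2)  # True True
```

A tolerance is used for Lemma 1 because floating-point addition is not exactly distributive; Lemma 2 holds exactly in floating point since negation is exact.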

For $n = 0$, the result is trivial as $G^{\delta}_0 = 0$ for any $\delta \in \Pi$. For $n = 1, \ldots, N$,
$$G^{\delta}_n = \sum_{k=1}^{n} \delta^{(1:D)}_k \bullet \big(B_k^{-1} S_k - B_{k-1}^{-1} S_{k-1}\big) = -\sum_{k=1}^{n} \big(-\delta^{(1:D)}_k\big) \bullet \big(B_k^{-1} S_k - B_{k-1}^{-1} S_{k-1}\big) = -G^{-\delta}_n.$$ □

A.7 Proof of Theorem 2.1
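Theorem 2.1 and Proposition 2.3 can be worked out by hand in a one-period market with finitely many states. The Python sketch below assumes a zero interest rate ($B_N = 1$), a single hedging asset and the worst-case risk measure $\rho(X) = \max$ over states of $X$, which is a convex risk measure (Marzban et al. (2020) study equal risk pricing under it). Under these assumptions, $\epsilon^{(S)}(0)$ equals the super-replication price and $-\epsilon^{(L)}(0)$ equals the sub-replication price, so (A.1) places $C^\star$ at the midpoint of the no-arbitrage interval. The helper names `min_max_affine` and `equal_risk_quantities` are hypothetical.

```python
def min_max_affine(a, b):
    """Exactly minimise f(delta) = max_i (a_i - delta * b_i) over real delta.

    f is convex and piecewise affine; if the b_i take both signs (no arbitrage),
    f grows without bound in both directions and its minimum is attained where
    two affine pieces cross.
    """
    candidates = [
        (a[i] - a[j]) / (b[i] - b[j])
        for i in range(len(a))
        for j in range(i + 1, len(a))
        if b[i] != b[j]
    ]
    return min(max(ai - d * bi for ai, bi in zip(a, b)) for d in candidates)


def equal_risk_quantities(payoff, price_change):
    """(eps_L(0), eps_S(0), C*, eps*) under the worst-case risk measure, r = 0, B_N = 1.

    payoff[i] is Phi in state i and price_change[i] = S_1 - S_0 in state i.
    """
    eps_s0 = min_max_affine(payoff, price_change)                # short: loss  Phi - delta * dS
    eps_l0 = min_max_affine([-p for p in payoff], price_change)  # long:  loss -Phi - delta * dS
    c_star = (eps_s0 - eps_l0) / 2.0    # equation (A.1) with B_N = 1
    eps_star = (eps_l0 + eps_s0) / 2.0  # Proposition 2.3
    return eps_l0, eps_s0, c_star, eps_star


s0, strike = 100.0, 100.0
for states in ([120.0, 90.0], [120.0, 100.0, 90.0]):  # binomial, then trinomial
    put = [max(strike - s, 0.0) for s in states]
    d_s = [s - s0 for s in states]
    print(states, equal_risk_quantities(put, d_s))
```

With the binomial states, the sketch gives $\epsilon^{(S)}(0) = 20/3$ and $\epsilon^{(L)}(0) = -20/3$, so $C^\star = 20/3$ (the replication price) and $\epsilon^\star = 0$. Adding a middle state makes the market incomplete: $\epsilon^{(S)}(0) = 20/3$ and $\epsilon^{(L)}(0) = 0$, so $C^\star = 10/3$ lies inside the no-arbitrage interval $[0, 20/3]$ and $\epsilon^\star = 10/3$.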

Using the results of Proposition 2.1:
$$\epsilon^{(L)}(-C^\star) = \epsilon^{(S)}(C^\star) \iff \epsilon^{(L)}(0) + B_N C^\star = \epsilon^{(S)}(0) - B_N C^\star \iff C^\star = \frac{\epsilon^{(S)}(0) - \epsilon^{(L)}(0)}{2B_N}. \tag{A.1}$$
This shows that $C^\star$ exists, is unique and is given by (2.12). Next, we show that the equal risk price is arbitrage-free. Some parts of the proof are inspired by the work of Xu (2006). Let $(\bar v, \bar\delta)$ be a super-replication strategy of $\Phi$ (see Definition 2.8), where $\bar v$ is the super-replication price as in (2.9). Let $\tilde\delta := \arg\min_{\delta\in\Pi} \rho(-B_N G^{\delta}_N)$. Using the translation invariance and monotonicity properties of $\rho$ and Lemma 1:
$$\begin{aligned}
\epsilon^{(S)}(0) &= \min_{\delta\in\Pi} \rho\big(\Phi(S_N,Z_N) - B_N G^{\delta}_N\big) \\
&\le \rho\big(\Phi(S_N,Z_N) - B_N G^{\bar\delta+\tilde\delta}_N\big) \\
&= \rho\big(\Phi(S_N,Z_N) - B_N (G^{\bar\delta}_N + G^{\tilde\delta}_N)\big) \\
&= \rho\big(\Phi(S_N,Z_N) - B_N(\bar v + G^{\bar\delta}_N) - B_N G^{\tilde\delta}_N\big) + B_N \bar v \\
&\le \rho\big(-B_N G^{\tilde\delta}_N\big) + B_N \bar v,
\end{aligned} \tag{A.2}$$
where for (A.2) the monotonicity property is applied to $\Phi(S_N,Z_N) - B_N(\bar v + G^{\bar\delta}_N) \le 0$ $\mathbb{P}$-a.s. This implies
$$\zeta^{(S)} := \frac{\epsilon^{(S)}(0) - \rho(-B_N G^{\tilde\delta}_N)}{B_N} \le \bar v. \tag{A.3}$$
Similarly, let $(\underline v, \underline\delta)$ be a sub-replication strategy, where $\underline v$ is the sub-replication price. Using the translation invariance and monotonicity properties of $\rho$ as well as Lemma 1 and Lemma 2:
$$\begin{aligned}
\epsilon^{(L)}(0) &= \min_{\delta\in\Pi} \rho\big(-\Phi(S_N,Z_N) - B_N G^{\delta}_N\big) \\
&\le \rho\big(-\Phi(S_N,Z_N) - B_N G^{\tilde\delta-\underline\delta}_N\big) \\
&= \rho\big(B_N G^{\underline\delta}_N - \Phi(S_N,Z_N) - B_N G^{\tilde\delta}_N\big) \\
&= \rho\big(B_N(\underline v + G^{\underline\delta}_N) - \Phi(S_N,Z_N) - B_N G^{\tilde\delta}_N\big) - B_N \underline v \\
&\le \rho\big(-B_N G^{\tilde\delta}_N\big) - B_N \underline v,
\end{aligned} \tag{A.4}$$
where for (A.4) the monotonicity property is applied to $B_N(\underline v + G^{\underline\delta}_N) - \Phi(S_N,Z_N) \le 0$ $\mathbb{P}$-a.s. This implies
$$\underline v \le \zeta^{(L)} := \frac{\rho(-B_N G^{\tilde\delta}_N) - \epsilon^{(L)}(0)}{B_N}. \tag{A.5}$$
Using (A.1), $C^\star$ has the representation $C^\star = \tfrac{1}{2}\big(\zeta^{(L)} + \zeta^{(S)}\big)$.
The last step of the proof entails showing that $\zeta^{(L)} \le \zeta^{(S)}$, which implies that the derivative price $C^\star \in [\underline v, \bar v]$ and is arbitrage-free in the sense of Definition 2.9. Define the risk measure $\varrho$ as
$$\varrho(X) := \min_{\delta\in\Pi} \rho\big(X - B_N G^{\delta}_N\big). \tag{A.6}$$
Buehler et al. (2019) show that, since $\rho$ is a convex risk measure and $\Pi$ is a convex set, $\varrho$ is a convex risk measure (see Proposition 3.… in their paper). Moreover, $\epsilon^{(S)}(0) = \varrho(\Phi(S_N,Z_N))$ and $\epsilon^{(L)}(0) = \varrho(-\Phi(S_N,Z_N))$ by definition. With the translation invariance and convexity properties of $\varrho$, we obtain
$$\begin{aligned}
\min_{\delta\in\Pi} \rho\big(-B_N G^{\delta}_N\big) = \varrho(0) &= \varrho\Big(\tfrac{1}{2}\Phi(S_N,Z_N) - \tfrac{1}{2}\Phi(S_N,Z_N)\Big) \\
&\le \tfrac{1}{2}\varrho\big(\Phi(S_N,Z_N)\big) + \tfrac{1}{2}\varrho\big(-\Phi(S_N,Z_N)\big) = \tfrac{1}{2}\epsilon^{(S)}(0) + \tfrac{1}{2}\epsilon^{(L)}(0)
\end{aligned}$$
$$\implies \frac{\min_{\delta\in\Pi} \rho(-B_N G^{\delta}_N) - \epsilon^{(L)}(0)}{B_N} \le \frac{\epsilon^{(S)}(0) - \min_{\delta\in\Pi} \rho(-B_N G^{\delta}_N)}{B_N}.$$
Since $\rho(-B_N G^{\tilde\delta}_N) = \min_{\delta\in\Pi} \rho(-B_N G^{\delta}_N)$ by the definition of $\tilde\delta$, the last inequality reads $\zeta^{(L)} \le \zeta^{(S)}$. □

A.8 Proof of Proposition 2.3

Consider the equal risk price $C^\star$ defined in (2.11). By the definition of $\epsilon^\star$, Proposition 2.1 and Theorem 2.1:
$$\epsilon^\star = \epsilon^{(L)}(-C^\star) = \epsilon^{(L)}(0) + B_N C^\star = \epsilon^{(L)}(0) + B_N\left(\frac{\epsilon^{(S)}(0) - \epsilon^{(L)}(0)}{2B_N}\right) = \frac{\epsilon^{(L)}(0) + \epsilon^{(S)}(0)}{2}.$$ □

A.9 Proof of Proposition 3.1

This is a direct consequence of Proposition 4.… $C_0^{(\star,\mathcal{N}_N)}$ and $\epsilon^{(\star,\mathcal{N}_N)}$ as stated in (3.8). □

B Risk-neutral dynamics

Since the market is arbitrage-free under the models assumed for the underlying, the first fundamental theorem of asset pricing implies that there exists a probability measure $\mathbb{Q}$, equivalent to $\mathbb{P}$, such that $\{S_n e^{-r t_n}\}_{n=0}^{N}$ is an $(\mathbb{F}, \mathbb{Q})$-martingale (see, for instance, Delbaen and Schachermayer (1994)). For the rest of Appendix B, let $\{\epsilon^{\mathbb{Q}}_n\}_{n=1}^{N}$ be independent standard normal random variables under $\mathbb{Q}$ and denote by $P_{0,T}$ the price at time 0 of a contingent claim with payoff $\Phi(S_N, Z_N)$ at maturity $T$:
$$P_{0,T} := e^{-rT}\,\mathbb{E}^{\mathbb{Q}}\big[\Phi(S_N, Z_N) \,\big|\, \mathcal{F}_0\big]. \tag{B.1}$$
The risk-neutral dynamics of each model considered are given below.

B.1 Regime-switching
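The regime-switching pricing recipe of this subsection, simulating the $\mathbb{Q}$-dynamics (B.2) with the initial regime drawn from the stationary distribution as in (B.4), can be sketched with plain Monte Carlo in Python. Assumptions: two regimes with hypothetical daily transition probabilities and volatilities (not the estimates of Table 8), $\Delta = 1/252$, a European put payoff, and a stationary distribution obtained by power iteration; `stationary_distribution` and `rs_put_price` are hypothetical names.

```python
import math
import random


def stationary_distribution(P, n_iter=1000):
    """Stationary distribution of an ergodic Markov chain by power iteration (xi <- xi P)."""
    H = len(P)
    xi = [1.0 / H] * H
    for _ in range(n_iter):
        xi = [sum(xi[i] * P[i][j] for i in range(H)) for j in range(H)]
    return xi


def rs_put_price(s0, strike, r, sigmas, P, n_steps, n_paths, dt=1 / 252, seed=0):
    """Monte Carlo time-0 put price under the regime-switching Q-dynamics (B.2).

    The initial regime h_0 is drawn from the stationary distribution of P, which
    reproduces the mixture in (B.4); transition probabilities are the same under P and Q.
    """
    rng = random.Random(seed)
    xi = stationary_distribution(P)
    regimes = range(len(P))
    total = 0.0
    for _ in range(n_paths):
        h = rng.choices(regimes, weights=xi)[0]
        log_s = math.log(s0)
        for _ in range(n_steps):
            sig = sigmas[h]
            # y_{n+1} uses the volatility of the current regime h_n, then the chain moves
            log_s += (r - 0.5 * sig**2) * dt + sig * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            h = rng.choices(regimes, weights=P[h])[0]
        total += max(strike - math.exp(log_s), 0.0)
    return math.exp(-r * n_steps * dt) * total / n_paths


# Hypothetical two-regime daily chain (calm, turbulent); not the Table 8 estimates.
P = [[0.99, 0.01], [0.03, 0.97]]
print(stationary_distribution(P))  # approximately [0.75, 0.25]
print(rs_put_price(100.0, 100.0, 0.02, [0.1, 0.3], P, n_steps=63, n_paths=4000))
```

Conditional on a regime path, the terminal price is lognormal, so the resulting put price should lie between the Black-Scholes prices computed with the lowest and highest regime volatilities; this gives a cheap sanity check on the simulation.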

The change of measure considered is the so-called regime-switching mean-correcting transform, a popular choice under RS models (see, e.g., Hardy (2001) and Bollen (1998)). This change of measure preserves the regime-switching dynamics except for a shift of the drift in each regime. More precisely, when passing from $\mathbb{P}$ to $\mathbb{Q}$, the transition probabilities of the Markov chain and the volatilities are left unchanged, but the drifts $\mu_i\Delta$ are shifted to $(r - \sigma_i^2/2)\Delta$ for $i = 1, \ldots, H$. The resulting dynamics for the log-returns under $\mathbb{Q}$ are
$$y_{n+1} = \Big(r - \frac{\sigma_{h_n}^2}{2}\Big)\Delta + \sigma_{h_n}\sqrt{\Delta}\,\epsilon^{\mathbb{Q}}_{n+1}, \quad n = 0, \ldots, N-1. \tag{B.2}$$
Let $\mathbb{H} := \{\mathcal{H}_n\}_{n=0}^{N}$ be the filtration generated by the Markov chain $h$:
$$\mathcal{H}_n := \sigma(h_u \mid u = 0, \ldots, n), \quad n = 0, \ldots, N. \tag{B.3}$$
Following the work of Godin et al. (2019), option prices can be derived as follows. Let $\mathbb{G} := \{\mathcal{G}_n\}_{n=0}^{N}$ be the filtration which contains all latent factors as well as the information available to market participants, i.e. $\mathbb{G} = \mathbb{F} \vee \mathbb{H}$. The process $\{(S_n, h_n)\}$ is then Markov under $\mathbb{Q}$ with respect to $\mathbb{G}$. With the law of iterated expectations, the time-0 price of a derivative can be written as
$$P_{0,T} = e^{-rT}\mathbb{E}^{\mathbb{Q}}\big[\Phi(S_N,Z_N) \mid \mathcal{F}_0\big] = e^{-rT}\mathbb{E}^{\mathbb{Q}}\Big[\mathbb{E}^{\mathbb{Q}}\big[\Phi(S_N,Z_N) \mid \mathcal{G}_0\big] \,\Big|\, \mathcal{F}_0\Big] = e^{-rT}\sum_{j=1}^{H} \xi^{\mathbb{Q}}_{0,j}\,\mathbb{E}^{\mathbb{Q}}\big[\Phi(S_N,Z_N) \mid S_0, h_0 = j\big], \tag{B.4}$$
where $\xi^{\mathbb{Q}}_0$ is assumed to be the stationary distribution of the Markov chain under $\mathbb{P}$. $P_{0,T}$ can be computed through Monte Carlo simulations for all contingent claims (i.e. vanilla and exotic).

B.2 Discrete BSM
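For vanilla puts, the closed form referred to in this subsection is the Black-Scholes put formula. The Python sketch below implements it with `math.erf` and cross-checks it against a Monte Carlo simulation of the discrete $\mathbb{Q}$-dynamics (B.5); the parameters are arbitrary illustrations (not the Table 6 estimates) and the function names are hypothetical.

```python
import math
import random


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_put(s0, strike, r, sigma, T):
    """Black-Scholes price of a European put."""
    d1 = (math.log(s0 / strike) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return strike * math.exp(-r * T) * norm_cdf(-d2) - s0 * norm_cdf(-d1)


def mc_put_bsm(s0, strike, r, sigma, n_steps, n_paths, dt=1 / 252, seed=0):
    """Monte Carlo put price from the discrete Q-dynamics (B.5)."""
    rng = random.Random(seed)
    drift, vol = (r - 0.5 * sigma**2) * dt, sigma * math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        log_return = sum(drift + vol * rng.gauss(0.0, 1.0) for _ in range(n_steps))
        total += max(strike - s0 * math.exp(log_return), 0.0)
    return math.exp(-r * n_steps * dt) * total / n_paths


print(bs_put(100.0, 100.0, 0.02, 0.2, 0.25))            # about 3.733
print(mc_put_bsm(100.0, 100.0, 0.02, 0.2, 63, 10_000))  # close to the closed form
```

Because the sum of the discrete Gaussian log-returns in (B.5) is exactly normal, the Monte Carlo estimate has no discretization bias here; only sampling error separates it from the closed form.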

By a discrete-time version of the Girsanov theorem, there exists a market price of risk process $\{\tilde\lambda_n\}_{n=1}^{N}$ such that $\epsilon^{\mathbb{Q}}_n = \epsilon_n + \tilde\lambda_n$ for $n = 1, \ldots, N$. Setting $\tilde\lambda_n := \sqrt{\Delta}\,\frac{\mu - r}{\sigma}$ and substituting $\epsilon_n = \epsilon^{\mathbb{Q}}_n - \tilde\lambda_n$ into (4.4), it is straightforward to obtain the $\mathbb{Q}$-dynamics of the log-returns:
$$y_n = \Big(r - \frac{\sigma^2}{2}\Big)\Delta + \sigma\sqrt{\Delta}\,\epsilon^{\mathbb{Q}}_n, \quad n = 1, \ldots, N. \tag{B.5}$$
$P_{0,T}$ can be computed with the well-known closed-form solution for vanilla put options (i.e. the Black-Scholes formula) and through Monte Carlo simulations for exotic contingent claims.

B.3 Discrete MJD
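The $\mathbb{Q}$-dynamics of this subsection can be simulated directly. The Python sketch below assumes, consistently with the drift correction, that the number of jumps in a period is Poisson with mean $\lambda\Delta$ ($\lambda$ annual, $\Delta = 1/252$) and that each $\chi_j \sim N(\gamma, \delta^2)$. As a swapped-in reference price, it uses Merton's (1976) series of Poisson-weighted Black-Scholes prices rather than the FFT approach of Carr and Madan (1999). Parameter values and function names are hypothetical.

```python
import math
import random


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_put(s0, strike, r, sigma, T):
    d1 = (math.log(s0 / strike) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return strike * math.exp(-r * T) * norm_cdf(-d2) - s0 * norm_cdf(-d1)


def merton_put(s0, strike, r, sigma, lam, gamma, delta, T, n_terms=50):
    """Merton (1976) put price: Poisson-weighted Black-Scholes prices, no jump risk premium."""
    kappa = math.exp(gamma + 0.5 * delta**2) - 1.0  # E[e^chi] - 1
    lam_prime = lam * (1.0 + kappa)
    weight = math.exp(-lam_prime * T)                # Poisson(lam' T) probability of 0 jumps
    price = 0.0
    for k in range(n_terms):
        sigma_k = math.sqrt(sigma**2 + k * delta**2 / T)
        r_k = r - lam * kappa + k * (gamma + 0.5 * delta**2) / T
        price += weight * bs_put(s0, strike, r_k, sigma_k, T)
        weight *= lam_prime * T / (k + 1)
    return price


def poisson(rng, mean):
    """Knuth's multiplication method; adequate for the small means lam * dt used here."""
    threshold, k, prod = math.exp(-mean), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def mc_put_mjd(s0, strike, r, sigma, lam, gamma, delta, n_steps, n_paths, dt=1 / 252, seed=0):
    """Monte Carlo put price from the discrete MJD Q-dynamics."""
    rng = random.Random(seed)
    drift = (r - lam * (math.exp(gamma + 0.5 * delta**2) - 1.0) - 0.5 * sigma**2) * dt
    total = 0.0
    for _ in range(n_paths):
        log_s = math.log(s0)
        for _ in range(n_steps):
            jumps = sum(rng.gauss(gamma, delta) for _ in range(poisson(rng, lam * dt)))
            log_s += drift + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0) + jumps
        total += max(strike - math.exp(log_s), 0.0)
    return math.exp(-r * n_steps * dt) * total / n_paths


params = dict(r=0.02, sigma=0.15, lam=1.0, gamma=-0.05, delta=0.1)  # hypothetical values
print(merton_put(100.0, 100.0, T=0.25, **params))
print(mc_put_mjd(100.0, 100.0, n_steps=63, n_paths=8_000, **params))
```

With $\lambda = 0$ the series collapses to the Black-Scholes price, and the two printed numbers should agree up to Monte Carlo error.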

The change of measure used assumes no risk premia for jumps, as in Merton (1976), and simply shifts the drift in (4.5) from $\alpha$ to $r$. The $\mathbb{Q}$-dynamics are thus
$$y_n = \Big(r - \lambda\big(e^{\gamma + \delta^2/2} - 1\big) - \frac{\sigma^2}{2}\Big)\Delta + \sigma\sqrt{\Delta}\,\epsilon^{\mathbb{Q}}_n + \sum_{j=N_{n-1}+1}^{N_n} \chi_j,$$
where $\{\chi_j\}_{j=1}^{\infty}$ and $\{N_n\}_{n=1}^{N}$ have the same distribution as under $\mathbb{P}$. $P_{0,T}$ for vanilla put options can be computed quickly with the fast Fourier transform (see, e.g., Carr and Madan (1999)). Exotic contingent claims can be priced through Monte Carlo simulations.

B.4 GARCH
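The GJR-GARCH $\mathbb{Q}$-dynamics of this subsection can be simulated path by path. The Python sketch below assumes daily $\mu$ and $r\Delta$, the variance recursion $\sigma_{n+1}^2 = \omega + \alpha\sigma_n^2(|\epsilon_n| - \gamma\epsilon_n)^2 + \beta\sigma_n^2$ with $\epsilon_n = \epsilon^{\mathbb{Q}}_n - \phi_n$, and hypothetical parameter values (not the Table 7 estimates); the initial variance is set to the unconditional variance under $\mathbb{P}$, $\omega/(1 - \alpha(1 + \gamma^2) - \beta)$. It checks that the discounted terminal price averages to about $S_0$ and returns a Monte Carlo put price. The function names are hypothetical.

```python
import math
import random


def phi_n(mu, r_daily, sigma2):
    """Market price of risk making E^Q[e^{y_n} | F_{n-1}] = e^{r Delta}."""
    return (mu - r_daily + 0.5 * sigma2) / math.sqrt(sigma2)


def gjr_garch_q_terminal(s0, mu, r_daily, omega, alpha, gamma, beta, sigma2_0,
                         n_steps, n_paths, seed=0):
    """Simulate S_N under the Q-dynamics of the GJR-GARCH(1,1) model."""
    rng = random.Random(seed)
    terminal = []
    for _ in range(n_paths):
        log_s, sigma2 = math.log(s0), sigma2_0
        for _ in range(n_steps):
            sigma = math.sqrt(sigma2)
            eps_q = rng.gauss(0.0, 1.0)
            log_s += r_daily - 0.5 * sigma2 + sigma * eps_q  # y_n under Q
            eps_p = eps_q - phi_n(mu, r_daily, sigma2)      # epsilon_n = epsilon^Q_n - phi_n
            sigma2 = omega + alpha * sigma2 * (abs(eps_p) - gamma * eps_p) ** 2 + beta * sigma2
        terminal.append(math.exp(log_s))
    return terminal


# Hypothetical daily parameters (not the Table 7 estimates).
s0, strike, n_steps = 100.0, 100.0, 63
r_daily, mu = 0.02 / 252, 0.0003
omega, alpha, gamma, beta = 3e-6, 0.05, 0.5, 0.9
sigma2_0 = omega / (1.0 - alpha * (1.0 + gamma**2) - beta)  # unconditional variance under P

s_N = gjr_garch_q_terminal(s0, mu, r_daily, omega, alpha, gamma, beta, sigma2_0, n_steps, 8000)
discount = math.exp(-r_daily * n_steps)
print(discount * sum(s_N) / len(s_N))                               # close to s0 = 100
print(discount * sum(max(strike - s, 0.0) for s in s_N) / len(s_N))  # put price P_{0,T}
```

The variance recursion is driven by the $\mathbb{P}$-innovation $\epsilon^{\mathbb{Q}}_n - \phi_n$, which is how the measure change leaves the conditional variance formula intact while the path of variances under $\mathbb{Q}$ still differs from that under $\mathbb{P}$.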

The risk-neutral measure considered is commonly used in the GARCH option pricing literature: the one-period-ahead conditional log-return mean is shifted, but the one-period-ahead conditional variance is left untouched (see, e.g., Duan (1995)). For $n = 1, \ldots, N$, let $\phi_n \in \mathcal{F}_{n-1}$ and define $\epsilon^{\mathbb{Q}}_n := \epsilon_n + \phi_n$. Substituting $\epsilon_n = \epsilon^{\mathbb{Q}}_n - \phi_n$ into (4.6), we obtain $y_n = \mu - \sigma_n\phi_n + \sigma_n\epsilon^{\mathbb{Q}}_n$ for $n = 1, \ldots, N$. To ensure that $\{S_n e^{-r t_n}\}_{n=0}^{N}$ is an $(\mathbb{F}, \mathbb{Q})$-martingale, the one-period conditional expected return under $\mathbb{Q}$ must equal the daily risk-free rate, i.e.
$$\mathbb{E}^{\mathbb{Q}}\big[e^{y_n} \mid \mathcal{F}_{n-1}\big] = e^{\mu - \sigma_n\phi_n + \sigma_n^2/2} = e^{r\Delta} \iff \phi_n := \frac{\mu - r\Delta + \sigma_n^2/2}{\sigma_n}, \quad n = 1, \ldots, N.$$
Thus, the $\mathbb{Q}$-dynamics of the GJR-GARCH(1,1) model are
$$y_n = r\Delta - \frac{\sigma_n^2}{2} + \sigma_n\epsilon^{\mathbb{Q}}_n, \qquad \sigma_{n+1}^2 = \omega + \alpha\sigma_n^2\big(|\epsilon^{\mathbb{Q}}_n - \phi_n| - \gamma(\epsilon^{\mathbb{Q}}_n - \phi_n)\big)^2 + \beta\sigma_n^2.$$
$P_{0,T}$ can be computed through Monte Carlo simulations for all contingent claims.

C Maximum likelihood estimation results

This appendix presents the estimated parameters of the underlying asset models considered in the numerical experiments of Section 4.

Table 6:

Maximum likelihood parameter estimates of the Black-Scholes model.
µ      σ
…      …
µ and σ are on an annual basis.

Table 7:

Maximum likelihood parameter estimates of the GJR-GARCH(1,1) model.
µ      ω      α      γ      β
…      …      …      …      …

Table 8:

Maximum likelihood parameter estimates of the regime-switching model.
Parameter    Regime 1    Regime 2    Regime 3
µ            …           …           −…
σ            …           …           …
ν            …           …           …
γ            …           …           …
ν represents the probabilities associated with the stationary distribution of the Markov chain. γ is the transition matrix as in (4.1). µ and σ are on an annual basis.

Table 9:

Maximum likelihood parameter estimates of the Merton jump-diffusion model.
α      σ      λ      γ      ϑ
…      …      …      −…     …
α, σ and λ