[PDF] A model-free backward and forward nonlinear PDEs for implied volatility

Abstract

We derive a backward and forward nonlinear PDEs that govern the implied volatility of a contingent claim whenever the latter is well-defined. This would include at least any contingent claim written on a positive stock price whose payoff at a possibly random time is convex. We also discuss suitable initial and boundary conditions for those PDEs. Finally, we demonstrate how to solve them numerically by using an iterative finite-difference approach.

Full PDF

AA model-free backward and for-ward nonlinear PDEs for impliedvolatility

Peter Carr , Andrey Itkin and Sasha Stoikov Tandon School of Engineering, New York University, 12 Metro Tech Center, 26th ﬂoor, Brooklyn NY 11201, USA Cornell University, School of Operations Research and Information Engineering, 2W Loop Road, New York, NY 10044,USA

July 18, 2019 W e derive a backward and forward nonlinear PDEs that govern the implied volatil-ity of a contingent claim whenever the latter is well-deﬁned. This would includeat least any contingent claim written on a positive stock price whose payoﬀ at apossibly random time is convex. We also discuss suitable initial and boundary conditionsfor those PDEs. Finally, we demonstrate how to solve them numerically by using aniterative ﬁnite-diﬀerence approach. Introduction

As this is well-known, given an option contract written, e.g., on some stock, the implied volatility isderived from an option price and shows what the market implies about the underlying stock volatilityin the future. For instance, the implied volatility is one of six inputs used in a simple option pricing(Black-Scholes) model, but is the only one that is not directly observable in the market. The standardway to determine it by knowing the market price of the contract and the other ﬁve parameters, issolving for the implied volatility by equating the model and market prices of the option contract.There exist various reasons why traders prefer considering option positions in term of the impliedvolatility, rather than the option price itself, see e.g., [Natenberg, 1994]. However, in this paper our1 a r X i v : . [ q -f i n . C P ] J u l oal is not to discuss the importance of this concept. Instead we focus on the way how the impliedvolatility is computed, and namely - on performance of this process.As was mentioned, the traditional method of computing the Black-Scholes implied volatility isrelatively simple. Perhaps, the main problem with such an approach occurs if one needs to simultaneouslycompute the implied volatility for a wide set of the model (contract) parameters. Then, the iterativeroot solver can converge slowly because i) there is no a universal good initial guess to kick oﬀ theiterations, and ii) for some regions of the model parameters the solution either doesn’t exist at all, oris hard to be found as the sensitivity of the model price to changes in the implied volatility is very low.Therefore, a surprising idea would be to replace an algebraic equation Eq.(2) used to determine theimplied volatility with another object, for instance, a partial diﬀerential equation (PDE). Here thesurprise means that the PDE is a more complicated object than the algebraic equation, and solvingthe PDE (especially nonlinear) requires some special methods usually much more complicated than asimple root solver. However, in contrast to the algebraic equation that should be solved independentlyat every given point e.g., in the space of option strikes and maturities, the PDE approach allows ﬁndingthe solution simultaneously in many such the points by using a series of marching sweeps in time. Also,solving a linear PDEs doesn’t require iterations, and hence, there is no need for the good initial guess.In this paper we demonstrate how such a PDE could be derived by using just a general deﬁnitionof the implied volatility. Moreover, we further extend this idea by deriving several types of thesePDEs. For instance, we obtain quasilinear backward and forward parabolic PDEs, and also nonlinearhyperbolic PDEs of the ﬁrst order. As these PDEs are new and yet have not been investigated in theliterature, we need to discuss suitable initial and boundary conditions that provide the solution of thesePDE to exist and be unique. We also develop an iterative numerical method to solve the PDEs byusing a ﬁnite-diﬀerence approach. The method is of second order of approximation in both space andtime, is unconditionally stable and preserves positivity of the solution. Using this method we computethe PDE implied volaitlity and ﬁnd that our intuition behind the main idea of the paper is correct. Inother words, performance of the ﬁnite-diﬀerence solver exceeds that of the traditional approach byfactor of forty. However, this result is subject to some details which are highlighted in the last Section.Despite in this paper the PDE approach is considered as an alternative to the traditional methodof getting the implied volatility by using the root solver, the latter in turn was seriously improvedwithin the last few years. First, in [Jackel, 2015] modiﬁed Newton iterations were proposed to solveEq.(2). When doing so the entire input domain is decomposed into four areas. Rational approximationsare used to provide initial guesses and reduce the number of iterations. Also this method providesan accuracy close to the machine precision. However, it breaks possible vectorization (but we don’tconsider this topic in this paper anyway).In [Glau et al., 2018] the approach of [Jackel, 2015] is improved by using a bivariate Chebyshevinterpolation of the implied volaitlity. This algorithm allows vectorization and consists of two steps.At the oﬄine phase polynomial weights of a low-rank Chebyshev interpolation of the implied volatilitysurface are computed and stored for four diﬀerent input domains, using the algorithm of [Jackel, 2015].This step is only performed once. During the online phase the input data is split into four domains,and the Chebyshev interpolation is applied to each domain by choosing the precomputed nodes fromthe oﬄine step.From this prospective, this problem can also be solved by building an artiﬁcial neural network (ANN)and training it on a given data set which will include all parameters of the contract (the spot price,the time to maturity, the strike, option price, etc.) and output the Black-Scholes implied volatility.The method of [Jackel, 2015] can be used when training this ANN as the golden "pricer". As such, theapproach proposed in this paper becomes, apparently, less important since the ANN training couldbe done oﬄine, while performance of the "pricer" is less important, rather than the accuracy. This is2ompletely opposite to the proposed PDE approach. However, using the ANN still requires solvingvarious problems, such as no-arbitrage, the existence of derivatives, etc., see [Itkin, 2019]. Also, theadvantage of using the ANN approach vs traditional methods, is, perhaps, a property inherent to manyﬁnancial models, and not only to the model we consider in this paper.Aside of practical importance, the existence of the model-free quasilinear parabolic PDE (or theother hyperbolic PDE) governing the implied volatility is an interesting fact, which yet has not beendiscovered in the literature. Therefore, the novelty of this paper consists in the derived equationstogether with the proposed numerical method of their solution.The rest of the paper is organized as follows. In Section 1 we derive a backward quasilinear PDEfor the implied volatility. In Section 2 a similar approch is used to derive an analogous forward PDE.Section 3 discusses how one can set the appropriate initial and boundary conditions for these PDEs. InSection 4 we also derive some other PDEs for the implied volatility that have a diﬀerent type (nonlinearhyperbolic rather than quasilinear parabolic equations). Then we construct a numerical method tosolve these equations which is described in detail in Section 5 and two Appendices. The results ofour numerical experiments and comparison of two methods: i) the traditional method of ﬁnding theimplied volatility by using a root solver with the Black-Scholes equation, and ii) the proposed methodof solving the derived PDEs numerically, are provided in Section 6. In Section 7 we discussed theresults obtained and conclude.

1. Backward PDE

Consider a class of contingent claims which have a single payoﬀ and which are written on the spotprice path of a single underlying risky asset. The Black-Scholes implied volatility, [Hull, 1997], canbe deﬁned for any such contingent claim, as long as the claim’s value in the Black-Scholes model ismonotonic in the volatility of the underlying asset. Standard examples include European optionsand American options that are optimally held alive. If the claim’s value is decreasing in volatility,then the value of a short position in that claim is increasing in volatility. Hence, there is no loss ingenerality if implied volatility is only deﬁned for short or long positions in claims whose value increaseswith volatility. We refer to any such claim as an option. This section derives a new nonlinear partialdiﬀerential equation (PDE) that relates the Black-Scholes implied volatility to the option price, to theunderlying stock price, and to calendar time.The Black-Scholes implied volatility is a useful measure, as it is a market practice instead of quotingthe option premium in the relevant currency, the options are quoted in terms of the Black-Scholesimplied volatility. Over the years, option traders have developed an intuition in this quantity. However,it can be further generalized by using a similar concept, but replacing the Black-Scholes frameworkwith another one. For instance, in [Corcuera et al., 2009] this is done under a Lévy framework, and,therefore, based on distributions that match more closely historical returns. In this paper we don’tconsider these generalizations, and are concentrated only on the Black-Scholes implied volatility.Consider a ﬁxed time horizon t ∈ [0 , T ] where T > S is strictly positive over this horizon. Let BS ( S, K, σ, r, q, t ) denote the functionrelating the Black-Scholes model value BS ( S, K, σ, r, q, t ) ∈ R of a contingent claim to the underlyingstock price level S >

0, the strike

K >

0, the (constant) volatility σ >

0, calendar time t ∈ [0 , T ],the constant interest rate r and continuous dividend q . Let subscripts denote partial derivatives. Weassume that the function BS ( S, K, σ, r, q, t ) is governed by the Black-Scholes backward PDE. To makethe notation easier, further in the expression BS ( S, K, σ, r, q, t ) we will drop all variables not relevantto our particular context. Therefore, instead, in this Section we denote the function value as BS ( S, σ, t ).3ith this notation, the corresponding Black-Scholes PDE reads, [Hull, 1997] ∂BS ( t, S, σ ) ∂t + ( r − q ) S ∂BS ( t, S, σ ) ∂S + 12 σ S ∂ BS ( t, S, σ ) ∂S = rBS ( t, S, σ ) , (1)deﬁned on some open region R of space and time. For example, for a European Call or Put, the region R would be the Cartesian product R E ≡ [0 , ∞ ) × [0 , T ). As a second example, for a down-and-out Callwith a lower barrier L ∈ (0 , S ), the region R would be the Cartesian product R L ≡ ( L, ∞ ) × [0 , T ).We furthermore assume that the Vega of the claim is always strictly positive, i.e. BS σ ( S, σ, t ) > R . This clearly holds for a European Call or Put on R E , but it can also be shown that it holds for a down-and-out Call on R L . We also assume that one can directly observe the market price Z ( T, K ) of such aclaim, and that this market price lies in some arbitrage-free open interval

A ⊂ R . For a European Callstruck at K , A C ≡ (( Q T S − D T K ) + , SQ T ), where D T = e − rT is the discount factor, and Q T = e − qT is the price appreciation factor. For a European Put struck at K , A P ≡ (( D T K − Q T S ) + , D T K ). Fora down-and-out Call struck at K , A D ≡ (( D T K − Q T S ) + , Q T ( S − L )), [Hull, 1997].For any such claim, we can implicitly deﬁne a function Σ( S, t, K, T, r, q, Z ) that relates the impliedvolatility Σ of the contingent claim to (

S, t ) ∈ R and to the market price of the contingent claim Z ∈ A . Again, for easy of notation we drop K, T, r, q from the list of independent variables, and thenthe deﬁnition of Σ(

S, t, Z ) reads BS ( S, Σ( S, t, Z ) , t ) = Z. (2)Let D denote the domain for this function, which is deﬁned as the Cartesian product of R and A .Since the PDE Eq.(1) holds for any level of σ >

0, it holds in particular if we set σ = Σ( S, t, Z ): ( ∂BS ( t, S, σ ) ∂t + ( r − q ) S ∂BS ( t, S, σ ) ∂S + 12 σ S ∂ BS ( t, S, σ ) ∂S − rBS ( t, S, σ ) )(cid:12)(cid:12)(cid:12)(cid:12) σ =Σ( S,t,Z ) = 0 . (3)We now show that Eq.(2) and Eq.(3) can be used to generate a nonlinear PDE governing Σ( S, t, Z ) on D .First, diﬀerentiate Eq.(2) with respect to t : BS t ( S, Σ( S, t, Z ) , t ) + BS σ ( S, Σ( S, t, Z ) , t )Σ t ( S, t, Z ) = 0 . (4)Second, diﬀerentiate Eq.(2) with respect to S instead: BS S ( S, Σ( S, t, Z ) , t ) + BS σ ( S, Σ( S, t, Z ) , t )Σ S ( S, t, Z ) = 0 . (5)In what follows, we will also drop the 3 arguments of Σ for notational ease, as it shouldn’t give riseto any confusion. Now diﬀerentiate Eq.(5) with respect to S : BS SS ( S, Σ , t ) + 2 BS Sσ ( S, Σ , t )Σ S + BS σσ ( S, Σ , t )Σ S + BS σ ( S, Σ , t )Σ SS = 0 . (6)Substituting Eq.(4) and Eq.(6) into Eq.(3) implies: − BS σ ( S, Σ , t )Σ t −

12 Σ S h BS Sσ ( S, Σ , t )Σ S + BS σσ ( S, Σ , t )Σ S + BS σ ( S, Σ , t )Σ SS i (7)+ ( r − q ) SBS S ( S, Σ , t ) = rBS ( S, Σ , t ) . We now show that we can express Vanna BS Sσ ( S, Σ , t ) and Volga BS σσ ( S, Σ , t ) in terms of Vega BS σ ( S, Σ , t ). First, diﬀerentiate Eq.(2) with respect to Z : BS σ ( S, Σ , t )Σ Z = 1 . (8)4ext, diﬀerentiate Eq.(8) with respect to S : BS Sσ ( S, Σ , t )Σ Z + BS σσ ( S, Σ , t )Σ S Σ Z + BS σ ( S, Σ , t ) . Σ SZ = 0 . (9)Now diﬀerentiate Eq.(8) with respect to Z instead: BS σσ ( S, Σ , t )Σ Z + BS σ ( S, Σ , t )Σ ZZ = 0 . (10)Solving Eq.(10) for Volga BS σσ ( S, Σ , t ) yields BS σσ ( S, Σ , t ) = − BS σ ( S, Σ , t ) Σ ZZ Σ Z . (11)Substituting Eq.(11) into Eq.(9) and solving for Vanna BS Sσ ( S, Σ , t ) implies: BS Sσ ( S, Σ , t ) = BS σ ( S, Σ , t ) Σ S Σ Z Σ ZZ − BS σ ( S, Σ , t ) Σ SZ Σ Z . (12)Substituting Eq.(11) and Eq.(12) into Eq.(7) yields0 = BS σ ( S, Σ , t ) ( Σ t + 12 Σ S "(cid:18) Σ S Σ Z (cid:19) Σ ZZ − S Σ Z Σ SZ + Σ SS (13) − ( r − q ) SBS S ( S, Σ , t ) + rZ. Also from Eq.(8) we can express the option Vega via Σ Z , from Eq.(5) - the option Delta, and substitutethese expressions into Eq.(13) to obtain0 = BS σ ( S, Σ , t ) ( Σ t + 12 Σ S "(cid:18) Σ S Σ Z (cid:19) Σ ZZ − S Σ Z Σ SZ + Σ SS + ( r − q ) S Σ S + rZ Σ Z (cid:27) . (14)Therefore, if the option Vega is non-zero, the backward PDE for the implied volatility Σ ﬁnally takesthe form Σ t + 12 Σ S "(cid:18) Σ S Σ Z (cid:19) Σ ZZ − S Σ Z Σ SZ + Σ SS + ( r − q ) S Σ S + rZ Σ Z = 0 . (15)This is a quasilinear PDE of the parabolic type.When the Black-Scholes model Delta of a contingent claim is known in closed form, there is analternative quasilinear PDE that governs the function Σ(

S, Z, t ) on D . Solving Eq.(8) for Vega BS σ ( S, Σ , t ), substituting the result into Eq.(5), and then solving for Delta BS S ( S, Σ , t ) implies: BS S ( S, Σ , t ) = − Σ S Σ Z . (16)Hence Eq.(13) can also be written as:Σ t + 12 Σ S h BS S ( S, Σ , t )Σ ZZ + 2 BS S ( S, Σ , t )Σ SZ + Σ SS i + ( r − q ) S Σ S + rZ Σ Z = 0 . (17) A partial diﬀerential equation is said to be quasilinear if it is linear with respect to all the highest order derivatives ofthe unknown function, [Pinchover and Rubinstein, 2005].

5o illustrate, it is well known that for a European Call, Delta BS S ( S, Σ , t ) is known in closed form: BS S ( S, Σ , t ) = N ( d ( S, Σ , t )) , (18)where: d ( S, Σ , t ) ≡ ln SK + Σ T Σ √ T , (19)and N ( x ) is the normal CDF. Hence, for a European Call, the quasilinear PDE Eq.(17) can also berepresented as another quasilinear PDE:Σ t + 12 Σ S h N ( d ( S, Σ , t ))Σ ZZ + 2 N ( d ( S, Σ , t ))Σ SZ + Σ SS i + ( r − q ) S Σ S + rZ Σ Z = 0 , (20)whose domain is the Cartesian product of R E and A C , which is D = (0 , ∞ ) × (0 , T ) × (( Q T S − D T K ) + , Q T S ) . Similar quasilinear PDE’s can be derived for European Puts or down-and-out Calls.For an American put option with a positive early exercise premium, in the continuation region therelevant PDE is Eq.(15), since neither the value nor the Delta is known in closed form.

2. Forward equation

In practice it is often necessary to simultaneously compute the Black-Scholes implied volatility of manyoptions written on the same underlying, but having diﬀerent strikes and maturity. This is to constructso-called the implied volatility surface, given market quotes Z ( K, T ) and the market values of

S, r, q .However, it is well-known that solving this problem by using the backward PDE is time-consumingsince for every pair

K, T a separate backward PDE has to be solved. In contrast, the forward equationcan be used to eﬃciently do this in one sweep, see e.g., [Dupire, 1994, Gatheral, 2006].Having this in mind, in this Section we derive a forward PDE for the implied volatility Σ(

S, T, K, r, q, Z ).For easy of notation we will write it as Σ(

T, K, Z ), thus dropping parameters that don’t change. Thederivation can be done in a way similar to that in Section 1. Therefore, here we omit a detailed algebra,and provide just short explanations.Again, we can start with the deﬁnition of the implied volatility Σ(

K, T, Z ) in Eq.(2), which in newvariables read BS ( K, Σ( K, T, Z ) , T ) = Z, (21)where Σ( K, T, Z ) ∈ [0 , ∞ ), and the domain of deﬁnition for ( K, T, Z ) is R F,E ≡ [0 , ∞ ) × (0 , ∞ ) × A C .To proceed, we need a forward PDE for the European option price BS ( K, T, σ ) which is also knownas Dupire’s equation, [Dupire, 1994] ∂BS ( K, T, σ ) ∂T = 12 σ ( K ) K ∂ BS ( K, T, σ ) ∂K − ( r − q ) K ∂BS ( K, T, σ ) ∂K − qBS ( K, T, σ ) . (22)Since the PDE Eq.(22) holds for any level of σ >

0, it holds, in particular, if we set σ = Σ( K, T, Z ). Inwhat follows, we will drop the 3 arguments of Σ for notational ease.Diﬀerentiating Eq.(21) with respect to T , K and twice with respect to K yields0 = BS T ( K, Σ , T ) + BS σ ( K, Σ , T )Σ T , (23)0 = BS K ( K, Σ , T ) + BS σ ( K, Σ , T )Σ K , BS KK ( K, Σ , T ) + 2 BS Kσ ( K, Σ , T )Σ K + BS σσ ( K, Σ , T )Σ K + BS σ ( K, Σ , T )Σ KK . BS σ ( K, Σ , T )Σ T −

12 Σ K h BS Kσ ( K, Σ , T )Σ K + BS σσ ( K, Σ , T )Σ K + BS σ ( K, Σ , T )Σ KK i + ( r − q ) KBS σ ( K, Σ , T )Σ K − qBS ( K, Σ , T ) . (24)Again, derivatives BS Kσ ( K, Σ , T ) and BS σσ ( K, Σ , T ) can be expressed in terms of BS σ ( K, Σ , T ).Diﬀerentiating Eq.(21), ﬁrst with respect to Z , and then with respect either to K , or to Z , yields1 = BS σ ( K, Σ , T )Σ Z , (25)0 = BS Kσ ( K, Σ , T )Σ Z + BS σσ ( K, Σ , T )Σ K Σ Z + BS σ ( K, Σ , T )Σ KZ , BS σσ ( K, Σ , T )Σ Z + BS σ ( K, Σ , T )Σ ZZ . The last equation in this system can be solved for BS σσ ( K, Σ , T ), and then the second one - for BS Kσ ( K, Σ , T ). Substituting these solutions into Eq.(24) yields0 = BS σ ( K, Σ , T ) ( Σ T −

12 Σ K "(cid:18) Σ K Σ Z (cid:19) Σ ZZ − K Σ Z Σ KZ + Σ KK + ( r − q ) K Σ K ) − qZ. (26)Using the ﬁrst line in Eq.(25), this could be re-written as0 = BS σ ( K, Σ , T ) ( Σ T −

12 Σ K "(cid:18) Σ K Σ Z (cid:19) Σ ZZ − K Σ Z Σ KZ + Σ KK + ( r − q ) K Σ K − qZ Σ z (cid:27) . (27)If the option Vega is non-zero, we ﬁnally obtain the forward PDE for the implied volatility Σ( K, T, Z )0 = Σ T −

12 Σ K "(cid:18) Σ K Σ Z (cid:19) Σ ZZ − K Σ Z Σ KZ + Σ KK + ( r − q ) K Σ K − qZ Σ Z . (28)This PDE has to be solved subject to some initial and boundary conditions which are discussed inSection 3.Since we consider the European options, the Black-Scholes model strike delta of a contingent claimis known in closed form. Then, similar to the previous Section, the PDE in Eq.(28) could be slightlysimpliﬁed. Indeed, as follows from the second line of Eq.(23) and ﬁrst line of Eq.(25) BS K ( S, Σ , T ) = − Σ K Σ Z . (29)Hence Eq.(28) can also be re-written as:0 = Σ T −

12 Σ K h BS K ( K, Σ , T )Σ ZZ + 2 BS K ( K, Σ , T )Σ KZ + Σ KK i + ( r − q ) K Σ K − qZ Σ Z . (30)For instance, for a European Call, the strike delta BS K ( K, Σ , T ) can be found in closed form. Indeed,since the Black-Scholes European Call option price is a homogeneous function of order 1 in ( S, K ) it iseasy to ﬁnd that

KBS K ( K, σ, T ) = BS ( K, σ, T ) − SBS S ( K, σ, T ) = − Ke − rT N ( d ( K, σ, T )) , (31)where d = d − σ √ T . 7 . Initial and boundary conditions The non-linear PDE Eq.(28) describes evolution of the function Σ(

K, T, Z ) ∈ [0 , ∞ ) at the domain R F,E , and has to be solved subject to some initial and boundary conditions. However, this immediatelybrings some problems.Indeed, the initial condition for Eq.(28) has to be set at T = 0. From the Black-Scholes formulawe know that the option Vega BS σ ( K, Σ , T ) vanishes at T = 0. Therefore, the expression in curlybraces in Eq.(26) could have any value. Also, the equation BS σ ( K, Σ , T ) = 0 doesn’t have a solutionfor Σ. This is because at T = 0 function BS ( K, Σ ,

0) gets its intrinsic value (e.g. for the Call option - BS ( K, Σ ,

0) = ( S − K ) + ) which doesn’t depend on Σ. Accordingly, for the ITM options there is no Σwhich would make the intrinsic value zero, and for the OTM options any Σ could be chosen.To resolve this, we make a change of the dependent variable Σ( K, T, Z ) S ( K, T, Z ) = Σ(

K, T, Z ) √ T .This pursues two goals. First, at T = 0 a natural initial condition implies S ( K, , Z ) = 0. Second, ifwe know S ( K, T, Z ) given some values of (

K, T, Z ), the value of Σ(

K, T, Z ) can be easily restored.The forward PDE for the new dependent variable S ( K, T, Z ) can be derived from Eq.(28), whichyields S T = 12 T S K "(cid:18) S K S Z (cid:19) S ZZ − S K S Z S KZ + S KK − ( r − q ) K S K + qZ S Z + 12 T S . (32)Since Eq.(32) is a 2D parabolic PDE (quasilinear), the boundary conditions must be set ∀ T >

K, Z ] ∈ [0 , ∞ ] × (( Q T S − D T K ) + , Q T S ) which is depicted in Fig. 1. A special attention should be drawn to two of theseboundaries: Z = 0 and Z = Q T S − D t K - to decide whether the PDE needs any boundary conditionat them. Z = S Q T − K D T KZ Q T S Q T S/D T Ω Figure 1:

The domain

Ω : [

K, Z ] ∈ [0 , ∞ ] × A C . As mentioned in [Itkin and Carr, 2011], the correct boundary conditions are determined by the speedof the diﬀusion term as we approach the boundary in a direction normal to the boundary. To illustrate,consider a PDE C t = a ( x ) C xx + b ( x ) C x + c ( x ) C, (33)where C = C ( t, x ) is some function of the time t and the independent variable x ∈ [0 , ∞ ), a ( x ) , b ( x ) , c ( x ) ∈ C are some known functions of x . Then, as shown by [Oleinik and Radkevich, 1973], no boundarycondition is required at x = 0 if lim x → [ b ( x ) − a x ( x )] ≥

0. In other words, the convection term at x = 08s ﬂowing upwards and dominates as compared with the diﬀusion term. A well-known example of suchconsideration is the Feller condition as applied to the Heston model, see, e.g., [Lucic, 2008].To make it clear, no boundary condition means that instead of the boundary condition at x → a (0) , b (0) , c (0).In Section 5 we show how to solve Eq.(32) by using temporal discretization that gives rise to thefollowing equation S T = 12 T ˜ S K  ˜ S K ˜ S Z ! S ZZ − S K ˜ S Z S KZ + S KK  − ( r − q ) K S K + qZ S Z + 12 T S . (34)Here S = S ( T + ∆ T, K, Z ), ˜ S = ˜ S ( T, K, Z ), and ∆ T is the step of the temporal grid in T . TheEq.(34) is a linear PDE since all coeﬃcients are already known (they are either given, e.g., r, q, T , orknown from the previous time step). Thus, Eq.(34) is a 2D version of Eq.(33).Accordingly, in the K direction with allowance for Eq.(23),Eq.(31), the condition lim K → b ( K ) − a ( K ) ≥ K → (cid:2) b ( K ) − a ( K ) (cid:3) = lim K → ( − " ( r − q ) K + ˜ S KT (cid:16) ˜ S + K ˜ S K (cid:17) = lim K → ( − K " r − q + ˜ S T (cid:18) ˜ S − K √ T BS K ( K, Σ , T ) BS σ ( K, Σ , T ) (cid:19) = lim K → ( − K " r − q + ˜ S T (cid:18) ˜ S + N ( d ( K, σ, T ))Φ( d ( K, Σ , T )) (cid:19) = 0 , Φ( x ) = N ( x ) , because lim K → ˜ S ( K, T, Z ) = 0 , T >

0. Therefore, as per [Oleinik and Radkevich, 1973], no boundarycondition should be set at K = 0 as the PDE itself serves as the correct boundary condition.However, our domain Ω includes the only point with K = 0, while the left boundary is a line Z = Q T S − D T K . It can be easily checked that along this line S = 0 which is the required boundaryconditions.In the Z direction, again using Eq.(25), we obtain12 K ∂∂Z ˜ S ˜ S K ˜ S Z ! = K ˜ S ˜ S K ˜ S Z S ˜ S KZ ˜ S Z ˜ S K − ˜ SS ZZ ˜ S Z . ! (35)= K T ˜Σ ˜Σ K ˜Σ Z KZ ˜Σ Z ˜Σ K − ˜Σ ˜Σ ZZ ˜Σ Z ! = K T ˜Σ ˜Σ K ˜Σ Z BS K,σ ( K, ˜Σ , T ) BS K ( K, ˜Σ , T ) ! = K T ˜Σ ˜Σ K ˜Σ Z − ˜Σ BS σ,K ( K, ˜Σ , T ) D T N ( d ( K, ˜Σ , T )) ! = K T ˜Σ ˜Σ K ˜Σ Z − ˜Σ Φ( d ( K, ˜Σ , T )) d ( K, ˜Σ , T ) D T N ( d ( K, ˜Σ , T )) ! . From Eq.(25), Σ Z = 1 /BS σ ( K, ˜Σ , T ) >

0. Therefore,lim Z → (cid:2) b ( Z ) − a ( Z ) (cid:3) = lim Z → " qZ − K ˜Σ ˜Σ K ˜Σ Z − ˜Σ Φ( d ( K, ˜Σ , T )) d ( K, ˜Σ , T ) D T N ( d ( K, ˜Σ , T )) ! = − K lim Z → " ˜Σ ˜Σ K BS σ ( K, ˜Σ , T ) − ˜Σ Φ( d ( K, ˜Σ , T )) d ( K, ˜Σ , T ) D T N ( d ( K, ˜Σ , T )) ! Since at Z = 0 the domain Ω is a straight line KD T ≥ SQ T , it can be checked that the value of d ( K, ˜Σ , T ) along this line is always negative, while ˜Σ = 0 except the point K = SQ T /D T . Therefore,9hat remains to be checked is the behavior of the option Vega as Z →

0. The implied volatility˜Σ which solves the equation Z = 0 can be found numerically, and is not zero at KD T > SQ T .Therefore, Vega is positive (despite very small). Thus, lim Z → [ b ( Z ) − a ( Z )] <

0. Therefore, as per[Oleinik and Radkevich, 1973], we need a boundary condition at the boundary Z = 0. This conditioncan be found by solving the equation Z = BS ( K, Σ , T ) = 0 , (36)with respect to Σ. A typical example of the behavior of this solution is given in Fig. 2 which iscomputed using the following input data: r = 0 . , q = 0 .

01 and two cases with ( S = 1 . , T = 0 .

1) and( S = 100 , T = 1 . Figure 2:

Implied volatility S obtained by solving the equation Z = 0 for the European Call option. Boundary conditions at K → ∞ : To set this boundary condition we need to know an asymp-totic behavior of the implied volatility when K → ∞ . This was a subject of an intensive researchfor the last decade, and several useful results are available in the literature, see, e.g., [Lee, 2004,Benaim and Friz, 2009, Gulisashvili, 2010] and references therein. In particular, the following asymp-totic formula was obtained in [Gulisashvili, 2010] S ( K ) = √ "s log KF + log SQ T C ( K ) − s log SQ T C ( K ) + O (cid:18) log SQ T C ( K ) (cid:19) − / log log SQ T C ( K ) ! , (37)as K → ∞ . Here, C ( K ) is the Call option price corresponding to the given K, T , and F = SQ t /D T isthe forward price. One can observe, that in our case C ( K ) = Z . Therefore, as the boundary conditionat K → ∞ we can use S ( K ) = √ s log KF + log SQ T Z − s log SQ T Z  . (38)10ince Z ≤ SQ T , the logarithm log( SQ T /Z ) is always non-negative.However, this asymptotics cannot be used along the whole vertical line of the domain Ω at K → ∞ in Fig. 1. For instance, at Z = SQ T the reminder of the approximation in Eq.(37) is large. In this caseone can solve numerically the equation BS ( K, S , T ) = Z. (39) Boundary conditions at Z = SQ T : The condition Z = Q T S implies that N ( d ( K, T, S )) = 1 and N ( d ( K, T, S )) = 0. This means that S → ∞ . To resolve this when constructing a numerical solution,one can move this boundary to Q T ( S − ∆ S ), where ∆ S (cid:28) S . Then, using a Taylor series expansion of BS ( K, S , T ) around S → ∞ , we can ﬁnd the solution of the equation BS ( K, S , T ) = Q T ( S − ∆ S ) . (40)Taking into account ﬁrst 8 terms in this expansion, we arrive at the following equation0 = √ k h − (cid:16) S − (cid:17) log ( k ) + 24 (cid:16) S − S + 240 (cid:17) log ( k ) + log ( k ) (41) − (cid:16) S − S + 48 S − (cid:17) i + 24 √ πδe S S , where k = SQ T / ( KD T ) , δ = ∆ SQ T / ( KD T ), so δ (cid:28) k . This equation has three roots, and usuallyone is negative. Among the other two positive roots we need the smallest one, which approximatelycorresponds to the solution of Eq.(41) with δ = 0. Then this root aligns with the asymptotics of thesolution at large K considered in the previous paragraph. Setting δ = 0 in Eq.(41) gives rise to a cubicalgebraic equation which can be solved analytically. Alternatively, Eq.(40) can be solved numerically.

4. Another type of PDE

The Eq.(32) can be further transformed into another type of the PDE. Indeed, the expression in squarebrackets in Eq.(32) can be represented as A ( T, K, Z, S ) = (cid:18) S K S Z (cid:19) S ZZ − S K S Z S KZ + S KK = S Z  ∂ (cid:16) S K S Z (cid:17) ∂K − ∂ (cid:16) S K S Z (cid:17) ∂Z  . (42)From Eq.(29), and Eq.(31) we have S K S Z = − BS K ( S, S , T ) = e − rT N ( d ( K, S )) , (43)Substituting Eq.(43) into Eq.(42), and taking into account that S = S ( K, T, Z ), we obtain A ( T, K, Z ) = a ( T, K, S ) S Z h b ( T, K, S ) S Z + c ( T, K, S ) S K + d ( T, K, S ) i , (44) a ( T, K, S ) = 1 √ π exp (cid:18) − rT − d ( K, S ) (cid:19) ,c ( T, K, S ) = ∂d ( K, σ, T ) ∂σ (cid:12)(cid:12)(cid:12) σ √ T = S , b ( T, K, S ) = − e − rT N ( d ( K, S )) c ( T, K, S ) ,d ( T, K, S ) = ∂d ( K, σ, T ) ∂K (cid:12)(cid:12)(cid:12) σ √ T = S . S T = a ( T, K, S ) K T S S Z [ b ( T, K, S ) S Z + c ( T, K, S ) S K + d ( T, K, S )] (45) − ( r − q ) K S K + qZ S Z + 12 T S . In contrast to Eq.(32), this is a nonlinear hyperbolic

PDE. It can be used as an alternative to Eq.(32)which is a quasilinear parabolic

PDE. Accordingly, the PDE in Eq.(45) requires just the boundaryconditions at lines K ∈ [ SQ t /D T , ∞ ) and Z = ( SQ T − D T K ) + , see Fig. 1.The Eq.(45) can be further simpliﬁed by using Eq.(43) that yields S T = − Θ( T, K, σ ) (cid:12)(cid:12)(cid:12) σ √ T = S S Z + 12 T S , (46)where Θ( T, K, σ ) is the option Theta. This, of course, also follows from Eq.(23), Eq.(25), so Eq.(46)can be obtained directly from these equations. The Eq.(32) is a one-dimensional non-linear PDE with K being a dummy variable. In other words, for any ﬁxed K this is a one-dimensional non-linear PDEwith T, Z be the independent variables.

5. Numerical method

In this Section we describe a numerical method used to solve either the PDE in Eq.(32) or in Eq.(45).The method is constructed similar to how this is done in [Itkin, 2018].First, we re-write Eq.(32) and Eq.(45) in the operator form by introducing an operator L ( T, K, Z ) :[0 , ∞ ) [0 , ∞ ). With this notation Eq.(32) can be represented as S T = L ( T, K, Z ) S , (47)where L ( T, K, Z ) = 12 T S K "(cid:18) S K S Z (cid:19) ∂ Z − S K S Z ∂ K ∂ Z + ∂ K − ( r − q ) K∂ K + qZ∂ Z + 12 T , (48)Similarly, Eq.(45) can be re-written in the same form with L ( T, K, Z ) = a ( K, T, Z ) K T S S Z [ b ( K, T, Z ) ∂ Z + c ( K, T, Z ) ∂ K + d ( K, T, Z )] (49) − ( r − q ) K∂ K + qZ∂ Z + 12 T .

Suppose we now discretize the time to maturity T ∈ [0 , ∞ ) by creating a uniform grid in timewith step ∆ T , so our discrete time is deﬁned at the points [0 , ∆ T, T, . . . ). By using Taylor seriesexpansion it can be shown that the discrete solution of the Eq.(47) up to O ((∆ T )) could be representedin the form S ( T + ∆ T, K, Z ) = e ∆ T L ( T,K,Z ) S ( T, K, Z ) (50)where e ∆ T L ( T,K,Z ) is the operator exponent, [Hochschild, 1981].Again, using the Taylor series expansion, one can verify that the following scheme S ( T + ∆ T, K, Z ) = e ∆ T L ( T +∆ T,K,Z ) S ( T, K, Z ) (51)12lso approximates Eq.(47) up to in O ((∆ T )). Combining them together we obtain the scheme S ( T + ∆ T, K, Z ) = e ∆ T [ L ( T +∆ T,K,Z )+ L ( T,K,Z )] S ( T, K, Z ) , (52)which approximates Eq.(47) up to O ((∆ T ) ).Now we setup an iterative algorithm to solve this discrete equation. This algorithm is a ver-sion of the ﬁx-point Picard iteration algorithm, and relies on the Banach ﬁxed-point theorem,[Granas and Dugundji, 2003]:1. At the ﬁrst iteration we start by setting L ( T + ∆ T, K, Z ) = L ( T, K, Z ) as the initial guess. Thus,Eq.(52) transforms to Eq.(50), which could be represented in the form S (1) ( T + ∆ T, K, Z ) = e ∆ T L ( T,K,Z ) S ( T, K, Z ) , (53)where in S ( k ) the superscript ( k ) marks the value of S found at the k -th iteration of the numericalprocedure. As the exponent L ( T, K, Z ) in the right hands size of Eq.(53) is computed at theknown time T , it is a linear operator with coeﬃcients being the known functions of ( K, T, Z ). Inparticular, when T = 0 these functions are determined by the initial condition. If T >

K, Z ) (which could be done with complexity O ( N M + N M ), N is the number of thegrid points in K direction, and M is the number of the grid points in Z direction), or by using anysort of Padé approximation of the exponential operator. For instance, the Padé approximation(1 ,

1) leads to the well-known Crank-Nicholson scheme which could be solved with the linearcomplexity in N and M , and provides O ((∆ T ) ) approximation in time, see [Itkin, 2017b].2. To proceed we represent Eq.(52) in the form S ( k ) ( T + ∆ T, K, Z ) = e ∆ T [ L ( k − ( T +∆ T,K,Z )+ L ( T,K,Z ) ] S ( k − ( T, K, Z ) , k > . (54)Therefore, the next approximation S ( k ) ( T + ∆ T, K, Z ) to S ( T + ∆ T, K, Z ) can be found againby either computing the matrix exponential, or by using some Padé approximation.Having this machinery in hands we can proceed in the same manner until the entire procedureconverges, i.e. when the condition k S ( k ) ( T + ∆ T, K, Z ) − S ( k − ( T + ∆ T, K, Z ) k < ε is reached after k iterations with ε being the method tolerance, and k · k being some norm, e.g., L .Note, that Eq.(54) with the accuracy O ((∆ T ) ) can be re-written in the form S ( k ) T = 12 h L ( T, K, Z ) + L ( k − ( T + ∆ T, K, Z ) i S ( k − = ˜ L ( k − S ( k − , (55)where tilde in the new notation ˜ L ( k − means that we take a half of sum of the operators L at times T and T + ∆ T , and ( k − means that at time level T + ∆ T we take the value of the operator at the( k − .1. Coordinate transformation Since our PDEs are deﬁned in the domain Ω which has a trapezoidal shape, we cannot directly applya ﬁnite-diﬀerence (FD) method for solving Eq.(54), while ﬁnite elements or Radial Basis Functionsmethods can be used here with no problem. However, a simple trick makes application of the FDMpossible. The trick utilizes the fact that the right vertical boundary of Ω lies at K → ∞ . Thus,based on the asymptotic formula Eq.(38), in this limit S → ∞ as well, regardless of the value of Z ∈ [0 , SQ T /D T ]. Therefore, without any loss of accuracy, the domain Ω could be replaced with thedomain ˜Ω depicted in Fig. 3. Z = S Q T − K D T Z = D T ( K m a x − K ) K max KZ Q T S Q T S/D T ˜Ω Figure 3:

The domain ˜Ω : { K ∈ [0 , ∞ ) , Z ∈ [0 , SQ T ] . By construction, the domain ˜Ω is a parallelogram, and K ∈ [0 , K max ], where K max is some ﬁxedstrike. We introduce it to truncate an inﬁnite strip in K to some ﬁxed computational area.Now, the parallelogram could be transformed to rectangle by using, e.g., the following aﬃnetransformation (for introduction to this topic see, e.g., [Katsumi; and Sasaki, 1994]) Z SQ T u, K SQ T D T (1 − u ) + (cid:18) K max − SQ T D T (cid:19) v. (56)This transformation converts the domain ˜Ω into a unit square with the following map of the boundaries Z = SQ T − KD T v = 0 , (57) Z = D T ( K max − K ) v = 1 ,Z = SQ T u = 1 ,Z = 0 u = 0 . Accordingly, we need to re-write the operator L ( T, K, Z ) in new variables ( u, v ). For instance, for theparabolic operator Eq.(48) this yields L ( T, v, u ) = 12 T S K h S u ∂ v − S u S v ∂ u ∂ v + S v ∂ u i − [( r − q ) v + B ] ∂ v + qu∂ u + 12 T , (58) B = Q T S ( r (1 − u ) − q ) D T K max − SQ T ,K = c (1 − u ) + ava S u + c S v , a = 1 SQ T , c = 1 D T K max − SQ T , (59)14ince ˜Ω - the domain of deﬁnition of the operator L ( T, v, u ) in Eq.(58) - is a unit square, now the PDEcan be solved by using a FD method.

Since computation of matrix exponent is time-consuming, especially in case of a two-dimensionaloperator L , we proceed with using a splitting technique, see [Lanser and Verwer, 1999, Duﬀy, 2006,Itkin, 2017b] and references therein. For instance, in [Itkin, 2017a] an ADI (alternative directionimplicit) scheme proposed in [In’t Hout and Welfert, 2007] for the solution of a backward PDE wasextended twofold. First, a similar ADI scheme has been proposed for the forward equation whichequalizes the results obtained by solving backward and forward PDEs. Second, it replaces the ﬁrstexplicit step of the ADI, applied to the mixed derivative term of the PDE, with the fully implicitstep. This is especially important for the forward PDE as it increases the stability of the solution andguarantees its positivity.However, for the PDE in Eq.(55) using this ADI approach is not a trivial problem, mainly because ofvarious problems with a stable approximation of the mixed derivative term at the second sweep of theADI. Therefore, instead of the ADI method, here we use the Strang splitting, [Lanser and Verwer, 1999,Duﬀy, 2006, Itkin, 2017b]. The main idea of this approach is as follows.With allowance for Eq.(58), let us represent the Eq.(55) in the new coordinates in the form S ( k ) T = h ˜ L ( k − u + ˜ L ( k − v + ˜ L ( k − uv i S ( k − , (60) L u = 12 T S K S v ∂ u + qu∂ u + 14 T , L v = 12 T S K S u ∂ v − [( r − q ) v + B ] ∂ v + 14 T , L uv = − T S K S u S v ∂ u ∂ v . Thus, the operator L u includes all diﬀerentials with respect to u and also a half of the killing term inEq.(58). This is a linear convection-diﬀusion operator with state and time dependent coeﬃcients. Theoperator L v includes all diﬀerentials with respect to v and a half of the killing term in Eq.(58), andalso is a linear convection-diﬀusion operator with state and time dependent coeﬃcients. Finally, theoperator L uv includes only a mixed derivatives term in Eq.(58), and also is a linear diﬀusion operatorwith state and time dependent coeﬃcients.Equation Eq.(60) can be solved numerically by using the Strang splitting scheme S ( k ) ( T + ∆ T, u, v ) = e ∆ T ˜ L ( k − u e ∆ T ˜ L ( k − v e ∆ T ˜ L ( k − uv e ∆ T ˜ L ( k − v e ∆ T ˜ L ( k − u S ( k − ( T + ∆ T, u, v ) (61)+ O ((∆ T ) ) . This scheme can also be represented as a sequence of fractional steps, which read S ( k )1 = e ∆ T ˜ L ( k − u S ( k − , (62) S ( k )2 = e ∆ T ˜ L ( k − v S ( k )1 , S ( k )3 = e ∆ T ˜ L ( k − uv S ( k )2 , S ( k )4 = e ∆ T ˜ L ( k − v S ( k )3 , S ( k ) = e ∆ T ˜ L ( k − u S ( k − .

15e underline that the Strang splitting provides the second order approximation in ∆ T only if thePDE at every fractional step in Eq.(62) is solved also with the second order approximation in ∆ T . Toachieve that at every fractional step we use the Padé approximation (1 ,

1) to obtain a Crank-Nicholsonscheme in time. For instance, for the ﬁrst line in Eq.(62) it gives the following scheme (cid:18) −

14 ∆ T ˜ L ( k − u (cid:19) S ( k )1 = (cid:18) T ˜ L ( k − u (cid:19) S ( k − . (63) To apply this algorithm we construct an appropriate discrete nonuniform grid G ( x ) in the ( u, v )space, see [Itkin, 2017b]. In each spatial direction i ∈ [ u, v ] a nonuniform grid contains N i + 1 nodes( p , p , ..., p N i ) with the steps h = p − p , ..., h N i = p N i − p N i − . Accordingly, in the operator Eq.(58)we replace continuous derivatives with their ﬁnite-diﬀerence approximations, and denote these discreteoperators deﬁned on G ( x ) as (cid:79) . Thus, we replace ∂ x with (cid:79) x , etc. Once a non-uniform grid is constructed, the ﬁrst and second derivatives can be approximated on thisgrid. The corresponding expressions are obtained by using the Taylor series expansions. Here for thereference we provide just the ﬁnal results, see, e.g., [Itkin, 2017b, In’t Hout and Foulon, 2010]:Let f : R → R be any given function, let x i , i ∈ Z be any given increasing sequence of mesh points,and h i = x i − x i − , ∀ i . To approximate the ﬁrst derivatives of f ( x ) employ the following formulas. Backward scheme: (cid:79) Bx = α − f ( x i − ) + α − f ( x i − ) + α f ( x i ) = f ( x i ) + O (( h i + h i − ) ) , (64) Forward scheme: (cid:79) Fx = γ f ( x i ) + γ f ( x i +1 ) + γ f ( x i +2 ) = f ( x i ) + O (( h i +1 + h i +2 ) ) . Central scheme: (cid:79) Cx = β − f ( x i − ) + β f ( x i ) + β f ( x i +1 ) = f ( x i ) + O (( h i + h i +1 ) ) . Here the super index X means the ﬁrst derivatives with the X approximation, X ∈ [ B, F, C ] could bebackward (B), forward (F) and central (C), and the coeﬃcients α, β, γ read α − = h i h i − ( h i + h i − ) , α − = − h i − + h i h i h i − , α = h i − + 2 h i h i ( h i − + h i ) ,γ = − h i +1 + h i +1 h i +2 ( h i +2 + h i +1 ) , γ = h i +2 + h i +1 h i +2 h i +1 , γ = − h i +1 h i +2 ( h i +2 + h i +1 ) .β − = − h i +1 h i ( h i +1 + h i ) , β = h i +1 − h i h i +1 h i , β = h i h i +1 ( h i +1 + h i ) . As follows from Eq.(64), these expressions provide approximation of the ﬁrst derivatives with thesecond order O ( h ).To approximate the second derivatives of f ( x ) one can employ the following formulas. (cid:79) Cx = (cid:79) Fx (cid:79) Bx = (cid:79) Bx (cid:79) Fx , (cid:79) Bx = (cid:79) Bx (cid:79) Bx , (cid:79) Fx = (cid:79) Fx (cid:79) Fx . O ( h ).For instance, for (cid:79) Cx we have in the explicit form (cid:79) Cx = δ − f ( x i − ) + δ f ( x i ) + δ f ( x i +1 ) = f ( x i ) + O (( h i + h i +1 ) ) , where δ − = 2 h i ( h i +1 + h i ) , δ = − h i +1 h i , δ = 2 h i +1 ( h i +1 + h i ) . Let f : R → R be any given function of two variables ( x, y ). To approximate the mixed derivative ∂ x,y f ( x i , y j ) at any point ( x i , y j ) one can consequently apply either one-sided approximations in eachdirection, or apply the central diﬀerence approximation ﬁrst in x and then in y (or vice versa). Thelatter results in the FD approximation of the mixed derivative using a 9-points stencil while stillproviding the second order of approximation. However, as shown in [Itkin, 2017a], this discretizationcould be unstable. Therefore, instead a fully implicit scheme was proposed which, as applied to ourproblem, is discussed in the next Section.For the future we need some additional notation. We denote a matrix of (cid:79) Fx as A F ,x ; the matrixof (cid:79) Bx - as A B ,x ; the matrix of (cid:79) Cx - as A C ,x . For the ﬁrst order approximations we use thefollowing deﬁnitions. Deﬁne a one-sided forward discretization of (cid:79) , which we denote as A F ,x : A F ,x C ( x ) = [ C ( x + h, t ) − C ( x, t )] /h . Also deﬁne a one-sided backward discretization of (cid:79) , denoted as A B ,x : A B ,x C ( x ) = [ C ( x, t ) − C ( x − h, t )] /h . These discretizations provide ﬁrst order approximation in h , e.g., ∂ x C ( x ) = A F ,x C ( x ) + O ( h ). Also I x denotes a unit matrix. L ( T, v, u )Here we use the same approach as in [Itkin, 2018]. Denote A C ,x a matrix of the discrete operator (cid:79) Cx on the grid G ( x ). Observe, that A C ,x is the Metzler matrix, see [Berman and Plemmons, 1994,Itkin, 2017b]. Indeed, it has all negative elements on the main diagonal, and all nonnegative elementsoutside of the main diagonal. Also A C ,x is a tridiagonal matrix.By the properties of the Metzler matrix M its exponent is a nonnegative matrix. Therefore, operation e M S preserves positivity of vector S . Also all eigenvalues of the Metzler matrix have a negative realpart. Therefore, the spectral norm of the matrix follows k e M k < . Thus, if in Eq.(54) the operator M = ˜ L ( k − on the grid G ( x ) is represented by the Metzler matrix A M , the map e A M is contractual, and, hence, the entire scheme Eq.(54) is stable.By the deﬁnition of the operator ˜ L ( k − in Eq.(60), it consists of several terms. Second derivatives.

The ﬁrst term T S K S u ∂ v can be approximated on the grid as A ,v = 12 T S K S u A C ,v . As T >

0, obviously A ,v is the Metzler matrix. Same is true for the matrix A ,u : A ,u = 12 T S K S v A C ,u . Hence, e.g. for A F ,x , the superscript means: backward second order approximation, and subscript means: ﬁrst orderderivative. he mixed derivative. The mixed derivative term in Eq.(60) reads: − T S K S u S v ∂ u ∂ v . Observe,that from Eq.(56) we have S u = SQ T D T ( D T S Z − S K ) , S v = S K (cid:18) K max − SQ T D T (cid:19) , from Eq.(25) S Z = √ TBS σ ( K, Σ , T ) ≥ , and from Eq.(29), Eq.(31) S K = − S Z BS K ( S, Σ , T ) = S Z D T N ( d ( K, Σ , T )) ≥ . With the allowance for these expressions, it follows that S u = √ TBS σ ( K, Σ , T ) SQ T [1 − N ( d ( K, Σ , T ))] = sR ( d ) ≥ , (65) S v = √ TBS σ ( K, Σ , T ) N ( d ( K, Σ , T ))( D T K max − SQ T ) = (cid:18) K max K − s (cid:19) (cid:16) √ πe d / − R ( d ) (cid:17) ≥ ,s = SQ T KD T , d = d ( K, Σ , T ) , where R ( x ) ≡ − N ( x ) N ( x ) is the Mills ratio for the standard normal distribution. It can be eﬃcientlycomputed by the particularly simple continued fraction representation at x > R ( x ) = 1 x + 1 x + 2 x + ..., (66)or by using Taylor series expansion at 0 ≤ x ≤

1, see [Gasull and Utzet, 2013].Thus, S u S v ≥

0. Therefore, to have A ,uv to be the Metzler matrix, we need the matrix of S u S v tobe the negative of the Metzler matrix , again see [Itkin, 2017b]. However, the standard methods failto provide discretization of the second order that guarantees this feature. Therefore, we need anotherapproach, similar to what was proposed in [Itkin, 2017a]. We discuss this in Appendix A.As compared with [Itkin, 2017a], the method in this paper has two innovations. First, in [Itkin, 2017a]a ﬁrst order approximation in time step was constructed, while here we provide the second orderapproximation. Second, in [Itkin, 2017a] a mixed derivative term has the form ρ u,v f ( v ) g ( u ) ∂ u ∂ v , where f ( u ) is some function of u only, and g ( v ) is some function of v only. Here we deal with a more generalterm ρ u,v f ( u, v ) ∂ u ∂ v , and show that the approach of [Itkin, 2017a] can be applied in this case as well. Convection terms.

The next step is to construct a suitable approximation for the convection terms.The term qu∂ u admits the forward approximation (cid:79) Fu as q ≥ u ∈ [0 , − [( r − q ) v + B ] ∂ v is a more delicate issue and depends on the sign of the function F ( u, v ) = ( r − q ) v + B .Suppose r > q . A typical behavior of function F ( u, v ) at T = 0 .

01 is presented in Fig. 4 at the valuesof model parameters given in Table 1. Rigorously speaking, the matrix A ,uv should be the negative of the EM-matrix, where the latter stays for the EventuallyM-Matrix. For the introduction into the EM matrices and their properties see [Itkin, 2017b]. igure 4: The behavior of function F ( u, v ) at T = 0 . . As can be seen, F ( u, v ) ≥ u, v ) ∈ [0 , × [0 , − F ( u, v ) ∂ v . a backward approximation should be used to obtain the Metzler matrix, i.e. − F ( u, v ) A B ,v . It can be checked that in case r < q the forward approximation should be used as then F ( u, v ) ≤ A ˜ L u , A ˜ L v , are, by construction, also the Metzler matrices, as each ofthem is a sum of the Metzler matrices, [Berman and Plemmons, 1994].It is easy to prove that the matrix exp( A ˜ L ( k − ) is the second order approximation of the operator e M in the spatial steps h u , h v on the grid, i.e. O ( h u + h v + h u h v ). That follows from the fact that thederivatives are approximated by using Eq.(64), where every line is the second order approximation tothe corresponding partial derivative. The remaining point to prove is the convergence of the ﬁxed-point Picard iterations. Again, in doingso we follow [Itkin, 2018]. According to Eq.(54), change of variables in Eq.(56), and Eq.(55) we runthe iterations S ( k ) ( T + ∆ T, u, v ) = e ∆ T ˜ L ( k − S ( k − ( T, u, v ) , k > . (67)By construction, at every iteration k the operator ˜ L ( k ) is linear, continuous and bounded at thecomputational domain. Also, the operator E = exp[∆ T ˜ L ( k ) ] is Lipschitz (cid:13)(cid:13)(cid:13) E S ( T, u, v ) − E S ( T, u, v ) (cid:13)(cid:13)(cid:13) ≤ (cid:13)(cid:13)(cid:13) E (cid:13)(cid:13)(cid:13) k S ( T, u, v ) − S ( T, u, v ) k ≤ k S ( T, u, v ) − S ( T, u, v ) k , since we constructed discretization of E as e M with M being the Metzler matrix, and it was shownearlier that in the spectral norm k e M k ≤ S ( T + ∆ T, u, v ) the exact solution of19q.(47) at time T + ∆ T . Then, by the mean-value theorem for operators, we have (cid:13)(cid:13)(cid:13) S ( k ) ( T + ∆ T, u, v ) − ˆ S ( T + ∆ T, u, v ) (cid:13)(cid:13)(cid:13) = (cid:13)(cid:13)(cid:13) E S ( T, u, v ) − ˆ E l S ( T, u, v ) (cid:13)(cid:13)(cid:13) ≤ k D ( E )( ξ ( k )) k · (cid:13)(cid:13)(cid:13) S ( k − ( T + ∆ T, u, v ) − ˆ S ( T, u, v ) (cid:13)(cid:13)(cid:13) where ξ ( k ) is a convex combination of S ( k − ( T + ∆ T, u, v ) and S ( k ) ( T + ∆ T, u, v ), and D denotesthe Frèchet derivative of the operator E at the space of all bounded non-linear operators, see, e.g.,[Hutson et al., 2005, Li et al., 2010]. Thus, the iterations converge if k D ( E ) k <

1. We prove thatactually this is the case for the operators in Eq.(60) in Appendix B. In particular, we show there thatthe iterative scheme is "almost" unconditionally stable, where "almost" means for the values of themodel parameters reasonable from the practical point of view.All the results obtained in this Section can be further summarized in the following proposition

Proposition 5.1.

The scheme Eq.(54) with approximation Eq.(67) is a) (almost) unconditionallystable, b) preserves non-negativity of the solution, c) provides second order approximation of Eq.(54) inspace on the grid G , and d) converges.

6. Numerical results and comparison

In our numerical test we use the input parameters shown in Table 1. We build a non-uniform grid

S r q T max K max Call/Put N u N v R c Table 1:

Parameters of the test. G ( x ) in the ( u, v ) space, which contains N u nodes in the u direction and N v nodes in the v direction.Since the gradients of the implied volatility S ( T, u, v ) are high at the boundaries of the computationaldomain, i.e. close to u = 0 , v = 0 , u = 0 , v = 0. This is obtained by using the compression ratio R c = 50, see the description in[Itkin, 2017b]. Thus constructed grid is represented in Fig. 5. With the initial parameters in Table 1,one step in the v direction approximately corresponds to a change in strike by 3$, and one step in the u direction approximately corresponds to a 50 cents change in both the options price and the strike.We solve the forward equation starting at T = 0 and going forward in time to T max with the time step∆ T = 0 .

01. The boundary conditions are computed as this is described in Section 3 using numericalsolutions of Eq.(36), Eq.(39) and Eq.(40). Thus found boundary values corresponding to T = 0 .

01 arepresented in Fig. 6 (along the lines u = 0, u = 1, v = 0 and v = 1).The standard way to get the implied volatility by solving Eq.(2) numerically is used for comparison,and further is called "traditional". We run this traditional approach at every point of the grid G ( x ) by using Matlab function blsimpv , or a root solver at the boundaries where blsimpv providesunsatisfactory results. All tests were run on a PC with Intel i7-4790 CPU (8 Cores, 3.6GHz) as asequential loop, i.e. no parallel tools have been used. The total elapsed time was 113 secs for 20000points on the grid, or, on average, 6 msc per a single point.. The results are presented in Fig. 3.The grid G ( x ) doesn’t change in time. On the other hand, functions d ( S ) , d ( S ) in the Black-Scholesformula (see Eq.(19)) also almost don’t depend on T (since T is now embedded into S ). Therefore, the20 igure 5: Non-uniform ﬁnite-diﬀerence gridin ( u, v ) space. Figure 6:

Implied volatility S ( K, Z, T ) at T = 0 . computed by using the traditionalapproach. only dependence on T in the numerical implementation of Eq.(2) is the dependence of Q T , D T whichis weak. Thus, Fig. 6 changes with time very slow. Also, for the purpose of a better illustration ofthe behavior of S at intermediate values of ( u, v ), we zoom-in Fig. 6 by excluding the boundaries andsetting T = 0 .

5. Thus obtained plot is represented in Fig. 7.

Figure 7:

Implied volatility S ( K, Z, T ) at T = 0 . computed by using the traditionalapproach (zoomed-in). The ﬁrst step in time is the most challenging problem with our approach. Indeed, according to Section 3,the initial condition for the PDE in Eq.(55) is S (0 , u, v ) = 0 for all u and v . This, in turn, means that21ll gradients S u , S v tend to inﬁnity at T = 0. Also,˜ L NL (0 , u, v ) = 0 , (68)where L NL denotes the non-linear part of the operator L . Therefore, the iterative scheme in Eq.(55)becomes incorrect. At the moment we don’t have a good resolution of this problem. Therefore, instead,for T = ∆ T we proceed by computing S (∆ T, u, v ) in the traditional way, and denote thus obtainedimplied volatility as S tr (∆ T, u, v ).Again, the main problem here is that at T = ∆ T we don’t have a good initial guess for S as itcannot be taken from the previous time step. But, in principle, if we would have a good initial guess for S , we could run our iterative algorithm even at T = ∆ T . To validate this, we further use S tr (∆ T, u, v )as the initial guess for the iterative scheme. But, based on Eq.(68), for this time step we also need tomodify the scheme in Eq.(55) to be S ( k ) T = L ( k − ( T + ∆ T, K, Z ) S ( k − = ˜ L ( k − S ( k − . (69)Then we solve this equation iteratively, but it is easy to check that this scheme provides only the ﬁrstorder approximation in ∆ T .The results are as follows. We do 20 iterations of the method to obtain the relative error ε = k S tr (∆ T, u, v ) − S (∆ T, u, v ) kk S tr (∆ T, u, v ) k to be 1.5 bps. The elapsed time for doing so is 16 secs, or 0.8 secs per iteration. The diﬀerence between S tr (∆ T, u, v ) and our numerical solution S (∆ T, u, v ) is presented in Fig. 8. The graph is truncated upto u = v = 0 . Figure 8:

The diﬀerence ∆ S = S tr (∆ T, u, v ) − S (∆ T, u, v ) as a function of u, v at T = 0 . after 20 iterations. Having good (not all zeros) values of S (∆ T, u, v ) we can proceed based on the general schemedescribed in Section 5. We increase time to T = 2∆ T , iterate to get the numerical solution, then22ompute S tr (2∆ T, u, v ), compare these solutions, etc. In Fig. 9 these results are presented for T = 2∆ T, T, T, T . Figure 9:

The diﬀerence ∆ S = S tr (∆ T, u, v ) − S (∆ T, u, v ) as a function of u, v at T = 0 . (upper-left), T = 0 . (upper-right), T = 0 . (bottom-left) and T = 0 . (bottom-right). It turns out that at those time steps the iterations converge much faster as compared with the testfor the ﬁrst step reported in Section 6.1. This convergence is summarized in Table 2. This partiallycould be explained by the order of approximation in time, since at the ﬁrst step the scheme was of theorder one, while at the next steps it is of the order 2.As this can be seen from the presented results, the convergence of the method slightly improves withtime T , the elapsed time remains the same, while the relative error between the traditional methodand the PDE solver slightly increases with T . The later can obviously be explained by accumulation ofthe approximation errors at every time step with the increase of T . However, for instance, for T = 0 . u and v . In particular, in our numerical example the point u = v = 0 . K = 260$ which, in turn, corresponds to the moneyness S/M = 0 . ε t elapsed, secs1 20 1.5e-4 162 3 2.3e-6 2.73 3 1.9e-6 2.85 3 1.4e-6 2.810 3 8.3e-7 2.820 3 3.8e-7 2.8 Table 2:

Convergence of the Picard iterations for various steps in time T . The time step ∆ T = 0.01.

7. Discussion

In this paper we derive a backward and forward quasilinear parabolic PDEs that govern the impliedvolatility of a contingent claim whenever the latter is well-deﬁned. This would include at least anycontingent claim written on a positive stock price whose payoﬀ at a possibly random time is convex.Alternative to that we also derived a forward nonlinear hyperbolic PDE of the ﬁrst order which alsogoverns evolution of the implied volatility in (

K, T, Z ) space. We also discuss suitable initial andboundary conditions for those PDEs. Finally, we demonstrate how to solve them numerically by usingan iterative method. All these results are new.We compare performance of two methods. The ﬁrst one uses a traditional scheme of getting theimplied volatilities by using a root solver and solving Eq.(2). The second method relies on a numericalsolution of the derived PDEs. We show that based on our numerical experiments, if the impliedvolatility is needed at every point on the rather dense grid, the PDE solver signiﬁcantly outperformsthe root solver. As follows from Table 2, the PDE solver is almost 40 times faster.However, a few comments shoud be made with this regard. First, our method utilizes the traditionalmethod in two ways. We use it to compute the boundary conditions, and we also use it at the ﬁrsttime step T = ∆ T as this was discussed in Section 6.1 since we don’t have a good start value of S foriterations to kick oﬀ. Also this drops the order of approximation in time from O ((∆ T ) ) to O (∆ T )at this step. Nevertheless, all those issues are relatively minor, and for the longer time horizons ourmethod still outperforms the traditional one.The second obstacle is that, most likely, in practice we rarely need implied volatilities on so densegrid in both space and time. Usually, we need them only at those points in ( K, T ) which correspond tothe tradable market strikes and maturities, and they are not so dense. However, the FD method stillrequires a grid. Therefore, the number of points for the traditional method could be much smaller thatthat for the FD method. This could compensate a slower performance of the traditional method andmake both methods to be comparable in the elapsed time.Third, as was already mentioned, functions d ( S ) , d ( S ) in the Black-Scholes formula (see Eq.(19))weakly depend on T (since T is now embedded into S ), namely only via Q T and D T . Therefore,the solution of Eq.(2) also depends on T only via Q T , D T . Therefore, since we already computed S (∆ T, K, Z ) by using the traditional way, then for T = 2∆ T Eq.(2) could be solved by expanding theleft hands side into series on ∆ T (cid:28)

1. In more detail, we write BS ( T, K, S ( T, K, Z )) = Z, (70) BS ( T + ∆ T, K, S ( T + ∆ T, K, Z )) = Z, and deﬁne S ( T + ∆ T, K, Z ) = S ( T, K, Z ) + ∆ S ( T, K, Z ). Now expanding the second line of Eq.(70)24nto series on ∆ T (cid:28) , ∆ S (cid:28)

1, and using the ﬁrst line in Eq.(70), in the ﬁrst order we obtain∆ S ( T, K, Z ) = − Θ( T, K, S ( T, K, Z ))Vega(

T, K, S ( T, K, Z )) √ T ∆ T. (71)Since S ( T, K, Z ) is already known, the expression in Eq.(71) can be easily computed.Once all S ( T + ∆ T, K, Z ) are computed in such a way, we can proceed to T + 2∆ T . The problemwith this approach, however, is that its accuracy decreases with every time step. As an alternative,periodically at some time steps the accurate value of S ( T, K, Z ) can be recomputed by using thetraditional approach. However, same could be done for the PDE approach as well.It should also be mentioned that in this paper we present results for European plain vanilla optionsusing the forward equation, while for e.g., for American options one has to solve the backward PDE,which computationally is more expensive.

Acknowledgments

We thank Archil Gulisashvili and Daniel Duﬀy for useful discussions. Any errors are our own.

References [Benaim and Friz, 2009] Benaim, S. and Friz, P. (2009). Regular variations and smile asymptotics.

Mathematical Finance , 19:1–12.[Berman and Plemmons, 1994] Berman, A. and Plemmons, R. (1994).

Nonnegative matrices in math-ematical sciences . SIAM.[Bhatia and Holbrook, 2003] Bhatia, R. and Holbrook, J. (2003). Frechet derivatives of the powerfunction.

Indiana Univ. Math. Journal , 49:1155–1173.[Chiarella et al., 2008] Chiarella, C., Kang, B., Mayer, G., and Ziogas, A. (2008). The evaluation ofAmerican option prices under stochastic volatility and jump-diﬀusion dynamics using the method oflines. Technical Report Research paper 219, Quantitative Finance Research Centre, University ofTechnology, Sydney.[Corcuera et al., 2009] Corcuera, J., Guillaume, F., Leoni, P., and Schoutens, W. (2009). ImpliedLévy volatility.

Quantitative Finance , 9(4):383–393.[Duﬀy, 2006] Duﬀy, D. (2006).

Finite Diﬀerence Methods in Financial Engineering: A Partial Diﬀer-ential Equation Approach . The Wiley Finance Series.[Dupire, 1994] Dupire, B. (1994). Pricing with a smile.

Risk , 7:18–20.[Gasull and Utzet, 2013] Gasull, A. and Utzet, F. (2013). Approximating mills ratio. available at http://arxiv.org/pdf/1307.3433v1.pdf .[Gatheral, 2006] Gatheral, J. (2006).

The volatility surface . Wiley ﬁnance.[Glau et al., 2018] Glau, K., Herold, P., Madan, D., and Potz, C. (2018). The chebyshev method forthe implied volatility.

Journal of Computational Finance . Forthcoming.25Granas and Dugundji, 2003] Granas, A. and Dugundji, J. (2003).

Fixed point theory . Springer Verlag,New York.[Gulisashvili, 2010] Gulisashvili, A. (2010). Asymptotic formulas with error estimates for call pricingfunctions and the implied volatility at extreme strikes.

SIAM J. FINANCIAL MATH , 1:609–641.[Hochschild, 1981] Hochschild, G. (1981).

Basic theory of algebraic groups and Lie algebras . Graduatetexts in mathematics. Springer-Verlag.[Hull, 1997] Hull, J. C. (1997).

Options, Futures, and other Derivative Securities . Prentice-Hall, Inc.,Upper Saddle River, NJ, third edition.[Hutson et al., 2005] Hutson, V., Pym, J., and Cloud, M. (2005).

Applications of Functional Analysisand Operator Theory . Mathematics in Science and Engineering. Elsevier Science.[In’t Hout and Foulon, 2010] In’t Hout, K. J. and Foulon, S. (2010). ADI ﬁnite diﬀerence schemes foroption pricing in the Heston model with correlation.

International journal of numerical analysis andmodeling , 7(2):303–320.[In’t Hout and Welfert, 2007] In’t Hout, K. J. and Welfert, B. D. (2007). Stability of ADI schemes ap-plied to convection-diﬀusion equations with mixed derivative terms.

Applied Numerical Mathematics ,57:19–35.[Itkin, 2016] Itkin, A. (2016). Eﬃcient solution of backward jump-diﬀusion partial integro-diﬀerentialequations with splitting and matrix exponentials.

J. Comput. Financ. , 19(3):29–70.[Itkin, 2017a] Itkin, A. (2017a). Modeling stochastic skew of fx options using slv models with stochasticspot/vol correlation and correlated jumps.

Applied Mathematical Finance , 24(6):485–519.[Itkin, 2017b] Itkin, A. (2017b).

Pricing Derivatives Under Lévy Models. Modern Finite-Diﬀerence andPseudo-Diﬀerential Operators Approach. , volume 12 of

Pseudo-Diﬀerential Operators . Birkhauser.[Itkin, 2018] Itkin, A. (2018). Nonlinear PDEs risen when solving some optimization problems inﬁnance, and their solutions.

Journal of Computational Finance , 21(4):1–21.[Itkin, 2019] Itkin, A. (2019). Deep learning calibration of option pricing models: some pitfalls andsolutions. Arxiv: 1906.03507.[Itkin and Carr, 2011] Itkin, A. and Carr, P. (2011). Jumps without tears: A new splitting technologyfor barrier options.

International Journal of Numerical Analysis and Modeling , 8(4):667–704.[Jackel, 2015] Jackel, P. (2015). Let’s be rational.

Wilmott magazine , pages 40–43.[Katsumi; and Sasaki, 1994] Katsumi;, N. and Sasaki, S. (1994).

Aﬃne Diﬀerential Geometry . Cam-bridge University Press.[Lanser and Verwer, 1999] Lanser, D. and Verwer, J. (1999). Analysis of operator splitting for advection-diﬀusion-reaction problems from air pollution modelling.

Journal of Computational and AppliedMathematics , 111(1-2):201–216.[Lee, 2004] Lee, R. (2004). The moment formula for implied volatility at extreme strikes.

MathematicalFinance. , 14(3):469–480. 26Li et al., 2010] Li, B., Lu, B., Wang, Z., and McCammon, J. (2010). Solutions to a reduced Poisson-Nernst-Planck system and determination of reaction rates.

Physica A , 389:1329–1345.[Lucic, 2008] Lucic, V. (2008). Boundary conditions for computing densities in hybrid models via pdemethods. SSRN 1191962.[Natenberg, 1994] Natenberg, S. (1994).

Option Volatility & Pricing: Advanced Trading Strategies andTechniques . McGraw-Hill Education.[Oleinik and Radkevich, 1973] Oleinik, O. A. and Radkevich, E. V. (1973).

Second order equationswith non-negative characteristic form . Kluwer Academic Publishers.[Pinchover and Rubinstein, 2005] Pinchover, Y. and Rubinstein, J. (2005).

An Introduction to PartialDiﬀerential Equations . Cambridge University Press.[Schwartz and Karcher, 1969] Schwartz, J. and Karcher, H. (1969).

Nonlinear functional analysis .(Notes on mathematics and its applications). Gordon & Breach.[Starzak, 1989] Starzak, M. (1989).

Mathematical methods in chemistry and physcis . Springer, NewYork.[Toivanen, 2010] Toivanen, J. (2010). A componentwise splitting method for pricing American optionsunder the Bates model. In

Computational Methods in Applied Sciences , pages 213–227. Springer.

Appendices

A. FD approximation of the mixed derivative term in Eq.(60)

In this Appendix we show how to solve the third line in Eq.(62) with the accuracy O ((∆ T ) ) in timeand O ( h u + h v + h u h v ) in space folloiwng the method ﬁrst proposed in [Itkin, 2017a]. To recall, theequation under consideration reads S ( k ) = e ∆ T ˜ L ( k − uv S ( k ) , (A.1) L uv = − T S K S u S v (cid:79) u (cid:79) v . To achieve this accuracy in time we use the Padé approximation (1 ,

1) to obtain a Crank-Nicholsonscheme (wrtten with some loose of notation to make it better readable) (cid:18) −

14 ∆ T ˜ L ( k − uv (cid:19) S ( T + ∆ T ) = (cid:18) T ˜ L ( k − uv (cid:19) S ( T ) = ¯ S ( T ) . (A.2)This numerical scheme is, of course, implicit due to the fact that, despite we know ¯ S ( T ), to ﬁnd S ( T + ∆ T ) we need to solve Eq.(A.2). It is known that, after some discretization on the FD grid isapplied to Eq.(A.2), the matrix in the left hands side of Eq.(A.2) must be an M-matrix, to ensure thatthis scheme is stable and preserves the positivity of the solution. However, this is impossible by usingthe standard FD discretizations of the second order, in more detail see [Itkin, 2017b].27herefore, we rewrite Eq.(A.2) by dropping the dependence on T and using a trick proposed in[Itkin, 2017a] (cid:16) P − √ ∆ T ρ u,v U ( u, v ) (cid:79) u (cid:17)(cid:16) Q + √ ∆ T V ( u, v ) (cid:79) v (cid:17) S (A.3)= ¯ S + h ( P Q − − Q √ ∆ T ρ u,v U ( u, v ) (cid:79) u + P √ ∆ T V ( u, v ) (cid:79) v i S .U ( u, v ) = 1 √ T S K S u , V ( u, v ) = 1 √ T S K S v . Here ρ u,v = − / , P, Q, are some positive numbers which have to be chosen based on some conditions,e.g., to provide diagonal dominance of the matrices in the parentheses in the lhs of Eq.(A.3), see below.Also the intuition behind this representation is discussed in detail in [Itkin, 2017a].Note, that in [Itkin, 2017a] we considered a mixed derivative term of the form ρ u,v f ( v ) g ( u ) ∂ u ∂ v ,where f ( u ) is some function of u only, and g ( v ) is some function of v only. Here we are dealingwith a more general term ρ u,v f ( u, v ) ∂ u ∂ v , so the presented approach is a generalization of that in[Itkin, 2017a]. Also our method in this paper is of the second order in time while the method in[Itkin, 2017a] is of the ﬁrst order.The Eq.(A.3) can be solved using ﬁxed-point Picard iterations. One can start with setting S = ¯ S in the right hands side of Eq.(A.3), then solve sequentially two systems of equations (cid:16) Q + √ ∆ T V ( u, v ) (cid:79) v (cid:17) S ∗ = ¯ S + h P Q − − Q √ ∆ T ρ u,v U ( u, v ) (cid:79) u + P √ ∆ T V ( u, v ) (cid:79) v i S j , (cid:16) P − √ ∆ T ρ u,v U ( u, v ) (cid:79) u (cid:17) S j +1 = S ∗ . (A.4)Here S j is the value of S at the j -th iteration .To solve Eq.(A.4) we propose two FD schemes. The ﬁrst one (Scheme A) is introduced by thefollowing Propositions : Proposition A.1.

Let us approximate the left hands side of Eq.(A.4) using the following ﬁnite-diﬀerence scheme: (cid:16) QI v + √ ∆ T V ( u, v ) A B ,v (cid:17) S ∗ = α + ¯ S − S j , (A.5) (cid:16) P I u − √ ∆ T ρ u,v U ( u, v ) A B ,u (cid:17) S j +1 = S ∗ ,α + = ( P Q + 1) I − Q √ ∆ T ρ u,v U ( u, v ) A F ,u + P √ ∆ T V ( u, v ) A F ,v . Then this scheme is unconditionally stable in time step ∆ T , approximates Eq.(A.4) with O ( √ ∆ T max( h u , h v )) and preserves positivity of the matrix S ( u, v, T ) if Q = β √ ∆ T /h v , P = β √ ∆ T /h u , where h v , h u arethe grid space steps correspondingly in v and u directions, and the coeﬃcient β must be chosen to obeythe condition: β > max u,v [ V ( u, v ) + ρ u,v U ( u, v )] . The trick is motivated by the desire to build an ADI scheme which consists of two one-dimensional steps, becausefor the 1D equations we know how to make the rhs matrix to be an EM-matrix, see [Itkin, 2016, Itkin, 2017b] andreferences therein. We introduce this notation to make it easy to compare the description of the method in this paper with that in[Itkin, 2017a]. The reader shouldn’t miss these iterations (marked by j ) used to solve Eq.(A.1) with those (marked by k ) for thesolution of the nonlinear equation which are discussed in Section 5. For the sake of clearness we formulate this Proposition for the uniform grid, but it is transparent how to extend it forthe non-uniform grid. roof. The proof can be found in [Itkin, 2017a, Itkin, 2017b].The computational scheme in Eq.(A.5) should be understood in the following way. At the ﬁrst line ofEq.(A.5) we begin with computing the product S = α + ¯ S . This can be done in three steps. First, theproduct S = Q √ ∆ T ρ u,v U ( u, v ) A F ,u ¯ S is computed in a loop on v i , i = 1 , ..., N v . In other words, if ¯ S isa N u × N v matrix where the rows represent the u coordinate and the columns represent the v coordinate,each j -th column of S is a product of matrix Q √ ∆ T ρ u,v U ( u, v ) A F ,u and the j -th column of ¯ S . Thesecond step is to compute the product S = P √ ∆ T V ( u, v ) A F ,v ¯ S which can be done in a loop on u i , i = 1 , ..., N u . Finally, the right hand side of the ﬁrst line in Eq.(A.5) is ( P Q + 1) ¯ S − S + S − S j .Then in a loop on u i , i = 1 , ..., N u , N u systems of linear equations have to be solved, each givinga row vector of S ∗ . The advantage of the representation Eq.(A.5) is that the product α + ¯ S can beprecomputed.If √ ∆ T ≈ max( h u , h v ), then the whole scheme becomes of the second order in space. However, thiswould be a serious restriction inherent to the explicit schemes. Therefore, we don’t rely on it. Note,that in practice the time step is usually chosen such that √ ∆ T (cid:28)

1, and hence the whole scheme isexpected to be closer to the second, rather than to the ﬁrst order in h . Note, that this approach issimilar to [Toivanen, 2010, Chiarella et al., 2008] for negative correlations, where a seven point stencilbreaks a rigorous second order of approximation in space.As shown in [Itkin, 2017a], the rate of convergence of the Picard iterations is linear.The above results can be further improved by making the whole scheme to be of the second order ofapproximation in h u and h v . We call this FD scheme as Scheme B. Proposition A.2.

Let us approximate the lhs of Eq.(A.4) using the following ﬁnite-diﬀerence scheme: (cid:16) QI v + √ ∆ T V ( u, v ) A B ,v (cid:17) S ∗ = α +2 ¯ S − S j , (A.6) (cid:16) P I u − √ ∆ T ρ u,v U ( u, v ) A B ,u (cid:17) S j +1 = S ∗ ,α +2 = ( P Q + 1) I − Q √ ∆ T ρ u,v U ( u, v ) A F ,u + P √ ∆ T V ( u, v ) A F ,v . Then this scheme is unconditionally stable in time step ∆ T , approximates Eq.(A.4) with O (max( h u , h v )) and preserves positivity of the vector S ( u, x, T ) if Q = β √ ∆ T /h v , P = β √ ∆ T /h u , where h v , h u arethe grid space steps correspondingly in v and u directions, and the coeﬃcient β must be chosen to obeythe condition: β >

32 max u,v [ V ( u, v ) + ρ u,v U ( u, v )] . The scheme Eq.(A.6) has a linear complexity in each direction.Proof.

See [Itkin, 2017a].The coeﬃcient β should be chosen experimentally. In our experiments described in the followingsections we used β = max u,v [ U ( u, v ) − ρ u,v U ( u, v )] . (A.7) B. Norm of the Fr `echet derivative for operators in Eq.(60)

To compute the norm of the Frèchet derivative recall that by deﬁnition k D ( E ) k = sup ν =0 k D ( E )( ν ) kk ν k ν = ( ν , ..., ν m ) ∈ [ L ∞ ( −∞ , m , then D ( E )( ν ) = ∂ E ( S + kν ) ∂k (cid:12)(cid:12)(cid:12) k =0 . In other words, in this case the Frèchet derivative coincides with the Gâteaux derivative.To ﬁnd D ( E NL )( S ) we need some results from functional analysis, namely:1. The product and quotient rules for the Frèchet derivatives, [Schwartz and Karcher, 1969].2. The property that the Frèchet derivative of a linear operator is the operator itself, [Schwartz and Karcher, 1969].3. Let A → A r be the map that takes a positive deﬁnite matrix to its r -th power, and let D A r bethe Frèchet derivative of this map. Then it is known that k D A r k = k rA r − k if r ≤ r ≥ L as ˜ L NL , and the linear part - as ˜ L L . Accordingly, ˜ L = ˜ L NL + ˜ L L .Using the properties of the Frèchet derivatives, we obtain (cid:13)(cid:13)(cid:13) D (∆ T L NL u )( S ) (cid:13)(cid:13)(cid:13) ≤ ∆ T T (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:18) S v [ c (1 − u ) + av ] a S u + c S v (cid:19) (2 S + S ) (cid:79) u (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) (B.1) ≤ ∆ T T (cid:13)(cid:13)(cid:13)(cid:13) S v [ c (1 − u ) + av ] a S u + c S v (cid:13)(cid:13)(cid:13)(cid:13) k S (2 + S ) kk (cid:79) u k . It can be seen that the ﬁrst norm k · k in the right hands part of Eq.(B.1) is bounded, and moreover, k · k ≤

2. The second norm k · k is also bounded since we work on a truncated grid with K < K max .Therefore, the values of S are also bounded. For instance, in our numerical experiments S <