Hedging Non-Tradable Risks with Transaction Costs and Price Impact

Abstract

A risk-averse agent hedges her exposure to a non-tradable risk factor $U$ using a correlated traded asset $S$ and accounts for the impact of her trades on both factors. The effect of the agent's trades on $U$ is referred to as cross-impact. By solving the agent's stochastic control problem, we obtain a closed-form expression for the optimal strategy when the agent holds a linear position in $U$. When the exposure to the non-tradable risk factor $\psi(U_T)$ is non-linear, we provide an approximation to the optimal strategy in closed-form, and prove that the value function is correctly approximated by this strategy when cross-impact and risk-aversion are small. We further prove that when $\psi(U_T)$ is non-linear, the approximate optimal strategy can be written in terms of the optimal strategy for a linear exposure with the size of the position changing dynamically according to the exposure's "Delta" under a particular probability measure.


Álvaro Cartea (a), Ryan Donnelly (b), Sebastian Jaimungal (c)

(a) Mathematical Institute, University of Oxford, and Oxford-Man Institute of Quantitative Finance, Oxford, United Kingdom
(b) King's College London, United Kingdom
(c) University of Toronto, Toronto, Canada

Keywords: non-tradable risk, hedging, algorithmic trading, price impact

SJ would like to acknowledge the support of the Natural Sciences and Engineering Research Council of Canada (NSERC), [funding reference numbers RGPIN-2018-05705 and RGPAS-2018-522715]. The authors would like to thank participants at the Research in Options Conference, the SIAM Annual General Meeting, the Bachelier World Congress, the INFORMS Annual Meeting, the Western Conference on Mathematical Finance, and the SIAM Financial Mathematics and Engineering Conference for comments on this article.

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Email addresses: [email protected] (Álvaro Cartea), [email protected] (Ryan Donnelly), [email protected] (Sebastian Jaimungal)

1. Introduction

In this paper we show, for the first time, how a risk-averse agent manages her exposure to a non-tradable risk factor while taking into account trading price impact. The agent can trade in a correlated asset to hedge her exposure. Ideally, this position in the traded asset is achieved immediately; however, price impact restricts the speed at which the agent can trade. On the other hand, trading too slowly exposes the agent to the risk associated with the non-tradable risk factor.

Price impact can generally be classified by the timescale of its persistence into two types: temporary or permanent. The first occurs when the volume of the trade exceeds the available liquidity at the best quote in the limit order book (LOB). The second occurs due to updates of limit orders to reflect the arrival of new information conveyed by the liquidity taking order. Some studies of price impact effects include Potters and Bouchaud (2003), Cont et al. (2014), Donier et al. (2015), and Bacry et al. (2015). Our problem is related to two strands of literature: the optimal execution of large positions and the hedging of non-tradable risks. The execution of large positions with price impact has been studied extensively in the literature; see the early work of Almgren and Chriss (2001), and more recently Guéant (2015), Cartea et al. (2015), Bechler and Ludkovski (2015), and Guéant (2016).

We provide three examples where investors are exposed to a non-tradable risk factor. The first two are when the non-tradable factor is a financial instrument which the agent is restricted from trading for legal or regulatory reasons. The third is when the non-tradable factor is not a financial instrument.
(i) Employees who are given compensation in the form of options written on the stock of their firm may be bound to a covenant that precludes them from trading the options or the firm's stock for a period of time. (ii) A regulatory body imposes a short selling ban on stocks, as were the cases in 2008 and 2011. Investors holding derivatives written on assets with short-sell bans would have to seek out unrestricted and correlated assets to hedge their exposures. (iii) Weather derivatives may be hedged by taking positions in traded stocks of firms whose financial performance is correlated to weather. Henceforth, we interpret the non-tradable factor as an asset which the agent is precluded from trading, and the agent either holds shares of this asset or a European-style contingent claim on the non-tradable risk factor.

We solve in closed-form for the agent's value function (Proposition 1) and optimal trading [...]

2. Model

In this section we outline the dynamics of multiple assets that include the price impact effects of the agent's trading, as well as the dynamics of the agent's inventory and cash holdings. We denote by $S^\nu = (S^\nu_t)_{t\in[0,T]}$ the (controlled) midprice process of a traded asset, and by $U^\nu = (U^\nu_t)_{t\in[0,T]}$ the (controlled) value process of a non-tradable risk factor. We assume the agent is able to directly trade $S$ and she has additional exposure to $U$ such that her wealth increases by $\psi(U)$ at some future time $T$, where $\psi : \mathbb{R} \to \mathbb{R}$. Although she is unable to directly trade in $U$, trades that occur in $S$ have an effect on the value of the non-tradable risk factor. We also let $Q^\nu = (Q^\nu_t)_{t\in[0,T]}$ denote the (controlled) inventory process in the traded asset held by the agent, and let the control $\nu = (\nu_t)_{t\in[0,T]}$ denote the rate at which this asset is acquired (a positive/negative value indicates she is buying/selling the asset). The dynamics of the controlled inventory are

$$Q^\nu_t = q + \int_0^t \nu_u \, du. \tag{1}$$

The traded asset price and non-tradable risk factor satisfy the SDEs

$$dS^\nu_t = (\mu + b\,\nu_t)\,dt + \sigma\,dW_t, \tag{2}$$

$$dU^\nu_t = (\beta + c\,\nu_t)\,dt + \eta\,dZ_t. \tag{3}$$

Here, $(W_t)_{t\in[0,T]}$ and $(Z_t)_{t\in[0,T]}$ are standard Brownian motions with correlation $\rho \in (-1, 1)$. The agent's trading has a permanent impact on the midprice of the traded asset, which enters the drift of $S$ through the term $b\,\nu_t$, with $b \ge 0$. Trading in $S$ also has a cross-price impact on $U$. This is accounted for by the inclusion of the term $c\,\nu_t$, with $c$ constant, in the drift of $U$, and $\beta$ is a constant.

In addition to the permanent impact on midprices, we model a temporary price impact by introducing an execution price, which we denote by $\hat S^\nu = (\hat S^\nu_t)_{t\in[0,T]}$ and is given by

$$\hat S^\nu_t = S^\nu_t + k\,\nu_t. \tag{4}$$

The execution price is the value the agent pays to acquire shares of the traded asset. Trading at a faster rate induces an execution price which is farther away from the midprice, in addition to affecting the drift of the asset. The temporary price impact can be considered a result of limit order book microstructure. The permanent price impact can be thought of, among other effects, as the result of information leakage which induces other market participants to modify existing orders. For further discussions on temporary and permanent impact arising from LOB dynamics see Cartea et al. (2015).

As the agent executes trades in the asset $S$, she must withdraw or deposit appropriate funds from her cash holdings, which have value denoted by $X^\nu = (X^\nu_t)_{t\in[0,T]}$ and given by

$$X^\nu_t = x - \int_0^t \hat S^\nu_u \, \nu_u \, du. \tag{5}$$

Throughout, we work on the completed and filtered probability space $(\Omega, \mathbb{P}, \{\mathcal{F}_t\}_{t\in[0,T]})$, where $\mathcal{F}_t$ is the standard augmentation of the natural filtration generated by $(W_u, Z_u)_{u\in[0,t]}$.

The agent employs an exponential utility function with risk aversion parameter $\gamma$ and aims to maximize her expected utility of wealth at time $T$. At time $T$ the exposure to the non-tradable risk factor directly affects the agent's wealth. At this time her wealth consists of her cash holdings, the value included in her inventory holdings of the traded asset, and the exposure $\psi(U_T)$. If she acts according to a trading strategy $\nu \in \mathcal{A}$, where the set of admissible trading strategies $\mathcal{A}$ consists of $\mathcal{F}$-predictable processes such that $\mathbb{E}\bigl[\int_0^T \nu_u^2 \, du\bigr] < \infty$, her performance criterion is given by

$$H^\nu(t, x, q, S, U) = \mathbb{E}_{t,x,q,S,U}\Bigl[ -e^{-\gamma\,\bigl(X^\nu_T + Q^\nu_T (S^\nu_T - \alpha\,Q^\nu_T) + \psi(U^\nu_T)\bigr)} \Bigr], \tag{6}$$

where $\mathbb{E}_{t,x,q,S,U}[\,\cdot\,]$ represents expectation conditional on $X^\nu_t = x$, $Q^\nu_t = q$, $S^\nu_t = S$, and $U^\nu_t = U$. The term $\alpha\,(Q^\nu_T)^2$ represents a price penalty that the agent incurs from having to liquidate her inventory at time $T$ and incentivizes her to hold a small inventory position near maturity.

In general the liquidation of terminal inventory $Q^\nu_T$ may have a cross-impact effect on the non-traded risk factor $U^\nu_T$, and so $\psi$ should depend on both $U_T$ and $Q_T$. This would complicate the analysis, and in reality there are many situations in which this cross-impact would not be realized. For example, if the exposure $\psi(U_T)$ is cash settled and the liquidation of $Q^\nu_T$ shares is conducted immediately after the settlement, then any cross-impact effect of this trade would be irrelevant because the agent does not physically hold exposure to $U_T$ anymore.

Her value function is

$$H(t, x, q, S, U) = \sup_{\nu \in \mathcal{A}} H^\nu(t, x, q, S, U). \tag{7}$$

The control problem posed in (7) has the associated Hamilton-Jacobi-Bellman (HJB) equation:

$$\partial_t H + \sup_{\nu} \Bigl\{ \nu\,\partial_q H - (S + k\,\nu)\,\nu\,\partial_x H + (\mu + b\,\nu)\,\partial_S H + \tfrac{1}{2}\sigma^2\,\partial_{SS} H + (\beta + c\,\nu)\,\partial_U H + \tfrac{1}{2}\eta^2\,\partial_{UU} H + \rho\,\sigma\,\eta\,\partial_{SU} H \Bigr\} = 0, \qquad H(T, x, q, S, U) = -e^{-\gamma\,(x + q\,(S - \alpha\,q) + \psi(U))}. \tag{8}$$

In the next section we assume that the exposure $\psi(U)$ is linear in the value of the non-tradable risk factor. This allows us to solve for the value function and the optimal trading strategy in closed-form. In Section 4 we relax this assumption and provide solutions which are correct up to corrections that vanish in the limit of small risk-aversion and cross-impact.
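The dynamics (1)-(5) can be explored numerically with a simple Euler scheme. The following is a minimal sketch, assuming a constant trading rate `nu` rather than an optimal one; the function name `simulate_paths`, the initial values, and all default parameter values are hypothetical choices made for illustration and are not taken from the paper.

```python
import numpy as np

def simulate_paths(nu, T=1.0, n_steps=250, n_paths=10_000, q0=0.0, x0=0.0,
                   S0=100.0, U0=100.0, mu=0.0, beta=0.0, sigma=1.0, eta=1.0,
                   rho=0.5, b=1e-3, c=1e-4, k=1e-3, seed=0):
    """Euler scheme for equations (1)-(5) under a constant trading rate nu.

    All default values are illustrative assumptions.
    Returns terminal cash X, inventory Q, midprice S and risk factor U.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    X = np.full(n_paths, x0)
    Q = np.full(n_paths, q0)
    S = np.full(n_paths, S0)
    U = np.full(n_paths, U0)
    for _ in range(n_steps):
        # Correlated Brownian increments: corr(dW, dZ) = rho
        dW = rng.normal(0.0, np.sqrt(dt), n_paths)
        dB = rng.normal(0.0, np.sqrt(dt), n_paths)
        dZ = rho * dW + np.sqrt(1.0 - rho**2) * dB
        # Cash changes at the execution price S + k * nu, equations (4)-(5)
        X = X - (S + k * nu) * nu * dt
        # Inventory accumulates at rate nu, equation (1)
        Q = Q + nu * dt
        # Permanent impact b * nu and cross-impact c * nu enter the drifts, (2)-(3)
        S = S + (mu + b * nu) * dt + sigma * dW
        U = U + (beta + c * nu) * dt + eta * dZ
    return X, Q, S, U

X, Q, S, U = simulate_paths(nu=2.0)
```

Replacing the constant `nu` inside the loop with a feedback rule of the current inventory and risk factor turns the same loop into a simulator for the strategies discussed in the following sections.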

3. Linear Exposure

We consider the special case where the exposure to the non-tradable risk factor is linear: $\psi(U) = N\,U$. A direct interpretation of this exposure is that the agent holds $N$ shares of the non-tradable risk factor and is restricted from trading it for all $t \in [0, T)$, but at time $T$ this restriction is lifted and the shares are immediately liquidated. We also assume $2\alpha - b > 0$, which ensures that the solution is well defined for all $t \in [0, T]$. If this inequality is not obeyed, then it is possible for the terms in (11b) and (11c), which are shown below, to explode. This inequality is typically satisfied in practical examples, because the reverse inequality induces the agent to buy (or sell) very large quantities and destabilize prices, then liquidate this large position with a smaller penalty than the gain incurred by the original price movements.

The following proposition and theorem are particular cases of well-known results in linear-quadratic-exponential Gaussian control (for example, see Jacobson (1973) and Duncan (2013)). We include them for completeness and because the expression in equation (12) in Theorem 2 plays an important role in one of our subsequent results.

Footnote: In principle we could include a liquidation penalty as we do for the traded asset; however, as the agent has no control over the number of shares of $U$ that she holds during $[0, T]$, this penalty would factor out of the performance criterion as a constant and would have no effect on the optimal trading strategy.

Proposition 1 (Linear Exposure Value Function). With $\psi(U) = N\,U$, the solution to equation (8) together with its terminal condition is given by

$$H(t, x, q, S, U) = -\exp\bigl\{ -\gamma\,\bigl(x + q\,S + N\,U + h(t, q; N)\bigr) \bigr\}, \tag{9}$$

where

$$h(t, q; N) = h_0(t; N) + h_1(t; N)\,q + h_2(t)\,q^2. \tag{10}$$

The time-dependent functions $h_0$, $h_1$, $h_2$ are given by

$$h_0(t; N) = \bigl(\beta N - \tfrac{1}{2}\gamma\,\eta^2 N^2\bigr)(T - t) + \frac{1}{4k}\int_t^T \bigl(h_1(s; N) + c\,N\bigr)^2\,ds, \tag{11a}$$

$$h_1(t; N) = \frac{\zeta\,k}{\omega}\;\frac{\phi_-\bigl(1 - e^{-\frac{\omega}{k}(T-t)}\bigr) - \phi_+\bigl(1 - e^{\frac{\omega}{k}(T-t)}\bigr) + \frac{2\,\omega^2 c}{\zeta\,k}\,N}{\phi_-\,e^{-\frac{\omega}{k}(T-t)} + \phi_+\,e^{\frac{\omega}{k}(T-t)}} - c\,N, \tag{11b}$$

$$h_2(t) = \omega\;\frac{\phi_-\,e^{-\frac{\omega}{k}(T-t)} - \phi_+\,e^{\frac{\omega}{k}(T-t)}}{\phi_-\,e^{-\frac{\omega}{k}(T-t)} + \phi_+\,e^{\frac{\omega}{k}(T-t)}} - \tfrac{1}{2}\,b, \tag{11c}$$

and the constants are $\zeta = \mu - \gamma\,\rho\,\sigma\,\eta\,N$, $\omega = \sqrt{\tfrac{1}{2}\,k\,\gamma\,\sigma^2}$, and $\phi_\pm = \omega \pm \alpha \mp \tfrac{1}{2}\,b$.

Proof

For a proof see the Appendix.
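The coefficient functions of Proposition 1 are straightforward to evaluate. The sketch below assumes the normalization $\omega = \sqrt{k\gamma\sigma^2/2}$ and $\phi_\pm = \omega \pm \alpha \mp b/2$, under which $h_1$ and $h_2$ satisfy the terminal conditions $h_1(T; N) = 0$ and $h_2(T) = -\alpha$ and the ordinary differential equations obtained by matching powers of $q$ in (8). The function names and all default parameter values are hypothetical.

```python
import numpy as np

def linear_coefficients(t, N, T=1.0, mu=0.0, sigma=1.0, eta=1.0, rho=0.5,
                        b=1e-3, c=1e-4, k=1e-3, gamma=1.0, alpha=0.05):
    """Return (h1, h2) of Proposition 1 at time t.

    Assumes omega = sqrt(k*gamma*sigma^2/2) and phi_pm = omega +- alpha -+ b/2.
    Default parameter values are illustrative only.
    """
    tau = T - t
    zeta = mu - gamma * rho * sigma * eta * N
    omega = np.sqrt(0.5 * k * gamma * sigma**2)
    phi_p = omega + alpha - 0.5 * b
    phi_m = omega - alpha + 0.5 * b
    ep, em = np.exp(omega * tau / k), np.exp(-omega * tau / k)
    den = phi_m * em + phi_p * ep
    # Numerator of (11b), multiplied through by zeta*k/omega so zeta = 0 is allowed
    num = (zeta * k / omega) * (phi_m * (1 - em) - phi_p * (1 - ep)) + 2 * omega * c * N
    h1 = num / den - c * N
    h2 = omega * (phi_m * em - phi_p * ep) / den - 0.5 * b
    return h1, h2

def trading_speed(t, q, N, b=1e-3, c=1e-4, k=1e-3, **other):
    """Feedback speed (c*N + h1 + (2*h2 + b)*q) / (2k) implied by the first-order condition in (8)."""
    h1, h2 = linear_coefficients(t, N, b=b, c=c, k=k, **other)
    return (c * N + h1 + (2 * h2 + b) * q) / (2 * k)

# Example: speed at mid-horizon with zero inventory and one unit of exposure
speed = trading_speed(0.5, 0.0, 1.0)
```

For long horizons or very small `k` the exponentials can overflow in floating point; dividing numerator and denominator by the growing exponential is one way to avoid this.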

Theorem 2 (Optimal Trading Strategy: Linear Case).

The optimal trading speed

$$\nu^*_t = \frac{1}{2k}\Bigl( c\,N + h_1(t; N) + \bigl(2\,h_2(t) + b\bigr)\,Q^{\nu^*}_t \Bigr), \tag{12}$$

is admissible, and the solution provided in (9) is indeed the value function. Moreover, the optimal level of inventory is deterministic and is given by

$$Q^{\nu^*}_t = \left( \frac{\zeta\,k\,(\phi_- - \phi_+)}{4\,\omega^2} + \tfrac{1}{2}\,c\,N \right) \frac{e^{\frac{\omega}{k}t} - e^{-\frac{\omega}{k}t}}{\ell(T)} - \frac{\zeta\,k}{2\,\omega^2}\left( \frac{\ell(T - t)}{\ell(T)} - 1 \right) + Q_0\,\frac{\ell(T - t)}{\ell(T)}, \tag{13}$$

with $\ell(t) = \phi_+\,e^{\frac{\omega}{k}t} + \phi_-\,e^{-\frac{\omega}{k}t}$, and $\zeta$, $\omega$, and $\phi_\pm$ as in Proposition 1.

Proof

For a proof see the Appendix.

The optimal trading strategy in Theorem 2 shows how trading in asset $S$ is affected by the exposure to asset $U$. In the simplified case of no cross-impact ($c = 0$), the trading strategy is identical to the single asset case except with the drift modified to $\mu - \gamma\,\rho\,\sigma\,\eta\,N$. This modification represents the trade-off between a source of expected returns and a source of risk. Holding an inventory of $Q_t$ means that the agent's wealth is increasing at a rate of $\mu\,Q_t$, but at the same time there is a risk contribution of the form $\rho\,\sigma\,\eta\,N\,Q_t$ due to covariation between $S$ and $U$. This drift modification has an interesting consequence that is illustrated most clearly when $\mu = 0$ and $Q_0 = 0$. If the agent has no exposure to the non-tradable risk factor ($N = 0$), and if she does not speculate on the future value of the traded asset ($\mu = 0$), then she has no reason to acquire any shares and will optimally hold a zero position for the whole trading period. This becomes apparent in equation (13) when $\zeta = 0$. However, if she holds a linear position in $U$, then she takes a non-zero position in the traded asset due to her ability to partially hedge the risk in $U$. This qualitative difference in the trading strategy exemplifies the importance of considering the interaction between the traded and non-tradable risk factors.

Although it may not be apparent from the formulation of the problem or the explicit form of the equations which dictate the optimal strategy, there is a specific inventory level of the traded asset that the agent favors and attempts to hold if the trading period is long. Any deviation from this position is caused by the various forms of frictions and penalties that the agent has to pay. For example, the temporary price impact incurs larger costs to the agent if she trades too quickly, and the terminal inventory penalty means the agent favors smaller inventory levels as the end of the trading period approaches.

To formulate our notion of the agent's desired long horizon position, we introduce a quantity we refer to as relative time. As $t \in [0, T]$, any instant in the trading period can be expressed in the form $t = \kappa\,T$ for $\kappa \in [0, 1]$. We refer to $\kappa$ as the relative time.

Proposition 3 (Long Horizon or Frictionless Position).

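Fix a relative time $\kappa \in (0, 1)$ and let $t = \kappa\,T$. Then

$$\lim_{T \to \infty} Q^{\nu^*}_{\kappa T} = \frac{\mu - \gamma\,\rho\,\sigma\,\eta\,N}{\gamma\,\sigma^2} = \lim_{k \to 0} Q^{\nu^*}_{\kappa T}. \tag{14}$$

Proof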

For a proof see the Appendix.

We illustrate the optimal strategy numerically in Figure 1.

Figure 1: Agent's optimal inventory position over time. In the left panel the trading period ends at $T = 0.$[...]; in the right panel it ends at $T = 3$. Other model parameters are $\mu = 0$, $\beta = 0$, $\sigma = 1$, $\eta = 1$, $\rho = 0.$[...], $b = 10^{-[...]}$, $c = 10^{-[...]}$, $k = 10^{-[...]}$, $\gamma = 1$, $\alpha = 0.05$. Solid curves are used when the agent is exposed to one share of the non-tradable risk factor ($N = 1$). Dashed curves represent the Almgren-Chriss strategy when there is no exposure to the non-tradable risk factor ($N = 0$). The long horizon level of Proposition 3, which all solid curves approach in the right panel, is given by $Q = -$[...].

As long as $T$ is sufficiently large, the first equality in (14) tells us that the agent desires to hold this inventory position for as long as possible. The exception is towards the beginning and end of the trading period. This behavior is reflected by the fact that we are required to exclude $\kappa = 0$ and $\kappa = 1$ in the proposition. The agent favors this position because it maximizes the return versus risk over all possible inventory levels. For a fixed inventory level $q$ in the traded asset, the instantaneous expected return is $\mu\,q$. However, this position exposes the agent to an instantaneous level of risk of the form $\rho\,\sigma\,\eta\,N\,q + \tfrac{1}{2}\sigma^2 q^2$ (the agent is also exposed to an instantaneous risk of the form $\tfrac{1}{2}\eta^2 N^2$, but the agent has no control over this quantity). Taking a difference of the return and risk scaled by $\gamma$ and then maximizing with respect to $q$ gives the same expression as in equation (14). Thus, this is the optimal position in the traded asset which balances instantaneous risk and return. The second equality in (14) tells us that this is the optimal position that the agent would hold if frictionless trading were possible. The equivalence between these two limits is an indicator that the agent attempts to trade towards the frictionless optimal inventory level, but is prevented from doing so by the frictions involved with trading.
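The deterministic inventory path of Theorem 2 and its long horizon level in Proposition 3 can be compared numerically. The following sketch assumes the normalization $\omega = \sqrt{k\gamma\sigma^2/2}$, $\phi_\pm = \omega \pm \alpha \mp b/2$, and $\ell(t) = \phi_+ e^{\omega t/k} + \phi_- e^{-\omega t/k}$. The function names and the default values of `rho`, `b`, `c`, and `k` are hypothetical and are not the values used for Figure 1.

```python
import numpy as np

def optimal_inventory(t, T, N, Q0=0.0, mu=0.0, sigma=1.0, eta=1.0, rho=0.5,
                      b=1e-3, c=1e-4, k=1e-3, gamma=1.0, alpha=0.05):
    """Deterministic optimal inventory of equation (13); defaults are illustrative."""
    zeta = mu - gamma * rho * sigma * eta * N
    omega = np.sqrt(0.5 * k * gamma * sigma**2)
    phi_p = omega + alpha - 0.5 * b
    phi_m = omega - alpha + 0.5 * b
    r = omega / k

    def ell(s):
        return phi_p * np.exp(r * s) + phi_m * np.exp(-r * s)

    A = zeta * k * (phi_m - phi_p) / (4 * omega**2) + 0.5 * c * N
    B = zeta * k / (2 * omega**2)
    return (A * (np.exp(r * t) - np.exp(-r * t)) / ell(T)
            - B * (ell(T - t) / ell(T) - 1.0)
            + Q0 * ell(T - t) / ell(T))

def frictionless_position(N, mu=0.0, sigma=1.0, eta=1.0, rho=0.5, gamma=1.0):
    """Long horizon / frictionless level of equation (14)."""
    return (mu - gamma * rho * sigma * eta * N) / (gamma * sigma**2)

# Inventory path on a grid for T = 3 and one unit of exposure
grid = np.linspace(0.0, 3.0, 301)
path = optimal_inventory(grid, 3.0, 1.0)
target = frictionless_position(1.0)
```

Away from the two ends of the grid, `path` should sit close to `target`, which is the behavior described by the first equality in (14).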

4. Non-Linear Exposure

In this section the agent is exposed to the non-tradable risk factor in the form $\psi(U)$, which we may interpret as holding a European-style contingent claim written on the non-tradable risk factor. The performance criterion, value function, and associated HJB equation are the same as (6), (7), and (8), respectively. The non-linear payoff prevents us from disentangling the dependence between $U$ and the other variables, so we propose the ansatz

$$H^\psi(t, x, q, S, U; c, \gamma) = -\exp\bigl\{ -\gamma\,\bigl(x + q\,S + h^\psi(t, q, U; c, \gamma)\bigr) \bigr\}. \tag{15}$$

We show explicit dependence of the functions $H^\psi$ and $h^\psi$ on $c$ and $\gamma$ because these two parameters are used in an expansion approximation, which we discuss below. We also make the dependence on $\psi$ explicit for clarity. Substituting this ansatz into (8) yields the following equation for $h^\psi$:

$$\partial_t h^\psi + \mu\,q - \tfrac{1}{2}\gamma\,\sigma^2 q^2 + (\beta - \gamma\,\rho\,\sigma\,\eta\,q)\,\partial_U h^\psi + \tfrac{1}{2}\eta^2\,\partial_{UU} h^\psi - \tfrac{1}{2}\gamma\,\eta^2\,(\partial_U h^\psi)^2 + \sup_{\nu}\bigl\{ \nu\,\partial_q h^\psi + c\,\nu\,\partial_U h^\psi + b\,q\,\nu - k\,\nu^2 \bigr\} = 0, \qquad h^\psi(T, q, U; c, \gamma) = \psi(U) - \alpha\,q^2. \tag{16}$$

The supremum in the preceding equation, which provides us with the feedback form of the optimal strategy, is achieved at

$$\nu^*(t, q, U; c, \gamma) = \frac{1}{2k}\bigl( \partial_q h^\psi + c\,\partial_U h^\psi + b\,q \bigr). \tag{17}$$

Substituting this value of $\nu$ into equation (16) gives

$$\partial_t h^\psi + \mu\,q - \tfrac{1}{2}\gamma\,\sigma^2 q^2 + (\beta - \gamma\,\rho\,\sigma\,\eta\,q)\,\partial_U h^\psi + \tfrac{1}{2}\eta^2\,\partial_{UU} h^\psi - \tfrac{1}{2}\gamma\,\eta^2\,(\partial_U h^\psi)^2 + \frac{1}{4k}\bigl( \partial_q h^\psi + c\,\partial_U h^\psi + b\,q \bigr)^2 = 0. \tag{18}$$

It is easily checked that if $\psi(U) = N\,U$, then this equation along with its terminal condition is solved by $h^\psi(t, q, U; c, \gamma) = h(t, q; N) + N\,U$, which also gives $H^\psi = H$ (as in Proposition 1) as expected. For general forms of the payoff $\psi$ we are not able to find closed-form expressions which solve equation (16), but if we consider small values of the model parameters $c$ and $\gamma$ we can obtain solutions that are approximate in an asymptotic sense.

It is reasonable to suppose that the cross-price impact factor $c$ is smaller than both the temporary and permanent price impact factors.
Indeed, the effect that trading in one stock has on the price of another stock should be significantly less than the effect that it has on its own price. For this reason, the parameter $c$ is one choice for which we may perform an asymptotic expansion. We also perform the expansion with respect to the risk-aversion parameter $\gamma$. To this end, we perform the expansion in each quantity simultaneously by introducing an expansion parameter $\theta$ and making the substitutions $c \mapsto \theta\,c$ and $\gamma \mapsto \theta\,\gamma$.

Assumption 4.

We make the following technical assumptions to prove the validity of the expansion.

i) $\psi \in C^4(\mathbb{R})$ with all four derivatives bounded.

ii) $2\alpha - b > 0$.

iii) Given initial states $x$, $q$, $S$, and $U$, there exist positive constants $\theta^* < 1$, $\epsilon^*$, $C$, and $D$ that satisfy the following uniform boundedness condition: for every $\theta \in (0, \theta^*)$ and $\epsilon \in (0, \epsilon^*)$, if $\nu$ is an admissible control such that $H^\nu(0, x, q, S, U; \theta c, \theta\gamma) + \epsilon \ge H^\psi(0, x, q, S, U; \theta c, \theta\gamma)$, then

$$\mathbb{E}\left[ \int_0^T e^{D\,\bigl( |X^\nu_t| + |Q^\nu_t S^\nu_t| + |Q^\nu_t| + (Q^\nu_t)^2 + |U^\nu_t| \bigr)}\,dt \right] \le C. \tag{19}$$

Assumption 4 i) eliminates the consideration of vanilla European option payoffs as they are not twice continuously differentiable (even in a weak sense the second derivative is not bounded). However, this complication can be avoided by using a regularized version of the payoffs, e.g., by assuming that the option with maturity $T$ expires at time $T + \delta t$, for $\delta t$ arbitrarily small. This condition ensures that many of the terms in the expansion below have bounded derivatives with respect to $U$ and allows us to make certain growth estimates more easily.

Assumption 4 ii) is made for the same reason as the case of the linear payoff. It ensures that the terms in the expansion are well defined for all $t \in [0, T]$.

Finally, Assumption 4 iii) can be interpreted as a condition on boundedness/continuity with respect to the space of admissible controls. It states that a particular exponential moment is uniformly bounded over a set of controls sufficiently close to optimal. In proving the validity of our approximation, this inequality allows us to bound the magnitude of the error, and the key point is that one can choose the constant $C$ so that it does not depend on $\theta$ (though it may depend on $\theta^*$).
Recall that if $\psi$ is linear then the optimal control is deterministic, and we remark that such a bound can be found for all optimal controls locally uniformly with respect to $\theta$.

Before the theorem, we introduce a lemma which is useful in showing that many relevant quantities are differentiable and bounded. This lemma concerns the function

$$g(t, U) = \mathbb{E}\bigl[ \psi(\tilde U_T) \,\big|\, \tilde U_t = U \bigr], \tag{20}$$

where the process $\tilde U = (\tilde U_t)_{t\in[0,T]}$ satisfies the SDE

$$d\tilde U_t = \beta\,dt + \eta\,dZ_t. \tag{21}$$

This function plays an important role in our approximation to the value function and in our candidate approximately optimal trading strategy. We remark that $\partial_U g(t, U)$ measures the sensitivity of $g$ to changes in the underlier $U$ and therefore has an interpretation similar to that of the "delta" of an option. It is helpful in the discussion below to directly interpret this derivative as an option's "delta" even though they are not strictly equal because the expectation in (28) is taken under the physical measure rather than an equivalent risk-neutral measure. In addition, the process $\tilde U$ above is a fictitious process that equals the path of $U$ when there is no cross-impact from trading.

Lemma 5 (Future Option Delta).

Suppose $\psi$ satisfies Assumption 4 i) ($\psi \in C^4(\mathbb{R})$ with bounded derivatives up to fourth order). Then

$$\mathbb{E}\bigl[ \partial_U g(s, \tilde U_s) \,\big|\, \tilde U_t = U \bigr] = \partial_U g(t, U), \qquad \forall\, t \le s \le T. \tag{22}$$

In addition, if the function $f : \mathbb{R} \to \mathbb{R}$ is integrable, then

$$\mathbb{E}\left[ \int_t^T f(s)\,\partial_U g(s, \tilde U_s)\,ds \,\Big|\, \tilde U_t = U \right] = \partial_U g(t, U) \int_t^T f(s)\,ds. \tag{23}$$

Finally, the expressions in (22) and (23) have derivatives up to third order with respect to $U$ which are bounded and continuous.

Proof

Write $g(t, U)$ in terms of the transition density of the process $\tilde U$. Let

$$p(z; t, T, U) = \frac{1}{\sqrt{2\pi\,\eta^2 (T - t)}} \exp\left\{ -\frac{\bigl(z - U - \beta\,(T - t)\bigr)^2}{2\,\eta^2 (T - t)} \right\},$$

therefore

$$g(t, U) = \mathbb{E}\bigl[ \psi(\tilde U_T) \,\big|\, \tilde U_t = U \bigr] = \int_{-\infty}^{\infty} \psi(z)\,p(z; t, T, U)\,dz = \int_{-\infty}^{\infty} \psi(x + U)\,p(x; t, T, 0)\,dx.$$

The Leibniz integration rule may be used to differentiate the expression above because the derivative of $\psi$ is bounded, and we write

$$\partial_U g(t, U) = \int_{-\infty}^{\infty} \frac{d\psi}{dU}(x + U)\,p(x; t, T, 0)\,dx = \int_{-\infty}^{\infty} \frac{d\psi}{dU}(z)\,p(z; t, T, U)\,dz = \mathbb{E}\left[ \frac{d\psi}{dU}(\tilde U_T) \,\Big|\, \tilde U_t = U \right].$$

This final expression is a Doob martingale, which shows the first claim. The second claim follows from Fubini's Theorem. The third claim follows from applying the first and second claims to a modified payoff by replacing $\psi$ with $d\psi/dU$, $d^2\psi/dU^2$, or $d^3\psi/dU^3$. $\square$

The first claim of the lemma states that $\partial_U g(t, \tilde U_t)$ is a martingale, and therefore the expected value of an option's delta in the future is equal to its delta at the present. The second claim states that the expected average future value of the option's delta is equal to its present value when $f \equiv 1$. In addition to providing convenient bounds throughout much of the following, many of the appearances of $\partial_U g(t, \tilde U_t)$ within complicated expressions below simplify easily, which motivates Proposition 8.

Theorem 6 (Asymptotic Approximation of Value Function).

The function $h^\psi$ in equation (15) admits the following approximation:

i) Expansion:

$$h^\psi(t, q, U; \theta c, \theta\gamma) = \hat h(t, q, U; \theta c, \theta\gamma) + R(t, q, U; \theta),$$

$$\hat h(t, q, U; \theta c, \theta\gamma) = h_0(t, q, U) + \theta\,\bigl( c\,h_1(t, q, U) + \gamma\,h_2(t, q, U) \bigr) + \theta^2\,\bigl( c^2\,h_3(t, q, U) + c\,\gamma\,h_4(t, q, U) + \gamma^2\,h_5(t, q, U) \bigr), \tag{24}$$

such that

$$\lim_{\theta \downarrow 0} \frac{1}{\theta^2}\,R(t, q, U; \theta) = 0. \tag{25}$$

ii) Zero and First Order Terms: The functions $h_0$, $h_1$, and $h_2$ may be taken as

$$h_0(t, q, U) = f_0(t) + f_1(t)\,q + f_2(t)\,q^2 + g(t, U), \tag{26a}$$

$$h_1(t, q, U) = \lambda_0(t, U) + \lambda_1(t, U)\,q, \tag{26b}$$

$$h_2(t, q, U) = \Lambda_0(t, U) + \Lambda_1(t, U)\,q + \Lambda_2(t)\,q^2, \tag{26c}$$

where, by letting $m = 2\alpha - b$,

$$f_0(t) = \frac{1}{4k}\int_t^T f_1(s)^2\,ds, \tag{27a}$$

$$f_1(t) = \frac{\mu\,(T - t)\,\bigl(4k + m\,(T - t)\bigr)}{4k + 2m\,(T - t)}, \tag{27b}$$

$$f_2(t) = -\frac{k\,m}{2k + m\,(T - t)} - \tfrac{1}{2}\,b, \tag{27c}$$

$$g(t, U) = \mathbb{E}\bigl[ \psi(\tilde U_T) \,\big|\, \tilde U_t = U \bigr], \tag{28}$$

$$\lambda_0(t, U) = \mathbb{E}\left[ \int_t^T \frac{f_1(s)}{2k}\,\bigl( \lambda_1(s, \tilde U_s) + \partial_U g(s, \tilde U_s) \bigr)\,ds \,\Big|\, \tilde U_t = U \right], \tag{29a}$$

$$\lambda_1(t, U) = -\frac{m}{2k + m\,(T - t)}\;\mathbb{E}\left[ \int_t^T \partial_U g(s, \tilde U_s)\,ds \,\Big|\, \tilde U_t = U \right], \tag{29b}$$

$$\Lambda_0(t, U) = \frac{1}{2k}\;\mathbb{E}\left[ \int_t^T \Bigl( f_1(s)\,\Lambda_1(s, \tilde U_s) - k\,\eta^2\,\bigl(\partial_U g(s, \tilde U_s)\bigr)^2 \Bigr)\,ds \,\Big|\, \tilde U_t = U \right], \tag{30a}$$

$$\Lambda_1(t, U) = \frac{1}{k}\;\mathbb{E}\left[ \int_t^T \frac{2k + m\,(T - s)}{2k + m\,(T - t)}\,\Bigl( f_1(s)\,\Lambda_2(s) - k\,\rho\,\sigma\,\eta\,\partial_U g(s, \tilde U_s) \Bigr)\,ds \,\Big|\, \tilde U_t = U \right], \tag{30b}$$

$$\Lambda_2(t) = -\sigma^2\,(T - t)\;\frac{12k^2 + 6\,k\,m\,(T - t) + m^2 (T - t)^2}{6\,\bigl(2k + m\,(T - t)\bigr)^2}, \tag{30c}$$

where the process $\tilde U = (\tilde U_t)_{t\in[0,T]}$ satisfies the SDE

$$d\tilde U_t = \beta\,dt + \eta\,dZ_t. \tag{31}$$

iii) Second Order Terms: The functions $h_3$, $h_4$, and $h_5$ may be taken as

$$h_3(t, q, U) = A_0(t, U) + A_1(t, U)\,q + A_2(t, U)\,q^2, \tag{32a}$$

$$h_4(t, q, U) = B_0(t, U) + B_1(t, U)\,q + B_2(t, U)\,q^2, \tag{32b}$$

$$h_5(t, q, U) = C_0(t, U) + C_1(t, U)\,q + C_2(t, U)\,q^2, \tag{32c}$$

where each $A_{0,1,2}$, $B_{0,1,2}$, and $C_{0,1,2}$ is bounded and continuously differentiable with respect to $U$.

Proof

See Appendix A.

The decomposition of the value function warrants some discussion, but much of the intuition behind these expressions becomes clearer when we consider how they influence an approximately optimal trading speed. This is demonstrated in the next theorem. An immediate consequence of this theorem is that the inventory process becomes stochastic.
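The deterministic coefficients in Theorem 6 can be checked against the ordinary differential equations that arise when the expansion is substituted into (18) and powers of $\theta$ and $q$ are matched: $f_1' + \mu + f_1 (2 f_2 + b)/(2k) = 0$, $f_2' + (2 f_2 + b)^2/(4k) = 0$, and $\Lambda_2' - \tfrac{1}{2}\sigma^2 + (2 f_2 + b)\,\Lambda_2 / k = 0$, with $f_1(T) = 0$, $f_2(T) = -\alpha$, and $\Lambda_2(T) = 0$. The sketch below evaluates (27b), (27c), and (30c) and computes finite-difference residuals of these equations. The function names and default parameter values are hypothetical.

```python
def expansion_coefficients(t, T=1.0, mu=0.1, sigma=1.0, b=1e-3, k=1e-3, alpha=0.05):
    """Return (f1, f2, Lambda2) from equations (27b), (27c), (30c).

    Default parameter values are illustrative assumptions.
    """
    tau = T - t
    m = 2 * alpha - b
    f1 = mu * tau * (4 * k + m * tau) / (4 * k + 2 * m * tau)
    f2 = -k * m / (2 * k + m * tau) - 0.5 * b
    lam2 = -sigma**2 * tau * (12 * k**2 + 6 * k * m * tau + m**2 * tau**2) \
        / (6 * (2 * k + m * tau) ** 2)
    return f1, f2, lam2

def ode_residuals(t, d=1e-5, mu=0.1, sigma=1.0, b=1e-3, k=1e-3, **other):
    """Central-difference residuals of the three ODEs stated in the lead-in."""
    kw = dict(mu=mu, sigma=sigma, b=b, k=k, **other)
    f1, f2, lam2 = expansion_coefficients(t, **kw)
    up = expansion_coefficients(t + d, **kw)
    dn = expansion_coefficients(t - d, **kw)
    df1, df2, dlam2 = [(u - v) / (2 * d) for u, v in zip(up, dn)]
    y = 2 * f2 + b
    return (df1 + mu + f1 * y / (2 * k),
            df2 + y**2 / (4 * k),
            dlam2 - 0.5 * sigma**2 + y * lam2 / k)

residuals = ode_residuals(0.5)
```

All three residuals should be zero up to finite-difference error, which gives a quick consistency check on the closed-form expressions before they are used inside a trading rule.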

Theorem 7 (Asymptotic Approximation of Optimal Trading Speed).

Let $\hat\nu$ be a feedback control given by

$$\hat\nu(t, q, U; \theta c, \theta\gamma) = \nu_0(t, q) + \theta\,\bigl( c\,\nu_1(t, U) + \gamma\,\nu_2(t, q, U) \bigr), \tag{33}$$

with

$$\nu_0(t, q) = \frac{1}{2k}\bigl( f_1(t) + (2\,f_2(t) + b)\,q \bigr), \tag{34a}$$

$$\nu_1(t, U) = \frac{1}{2k}\bigl( \partial_U g(t, U) + \lambda_1(t, U) \bigr), \tag{34b}$$

$$\nu_2(t, q, U) = \frac{1}{2k}\bigl( \Lambda_1(t, U) + 2\,\Lambda_2(t)\,q \bigr). \tag{34c}$$

Then $\hat\nu_t = \hat\nu(t, Q^{\hat\nu}_t, U^{\hat\nu}_t; \theta c, \theta\gamma)$ is an admissible control. Defining $h^{\hat\nu}$ by the relation

$$H^{\hat\nu}(t, x, q, S, U; \theta c, \theta\gamma) = -e^{-\theta\gamma\,\bigl(x + q\,S + h^{\hat\nu}(t, q, U; \theta c, \theta\gamma)\bigr)},$$

$\hat\nu$ is asymptotically optimal to second order:

$$h^\psi(t, q, U; \theta c, \theta\gamma) = h^{\hat\nu}(t, q, U; \theta c, \theta\gamma) + o(\theta^2).$$

Proof

For the proof see the Appendix.

For the purposes of discussing the interpretation of the quantities in (34) we assume that $\partial_U g$ is positive for all $t$ and $U$. Nearly all of the discussion below holds similarly if $\partial_U g$ is negative, except with the agent's actions also being appropriately changed (i.e., selling instead of buying).

The zero order term, which we denote by $\nu_0$, has a clear interpretation. This term represents the optimal trading speed of a risk-neutral agent when there is no cross-price impact between the traded and the non-tradable risk factors. The feedback form of this term is the same as the term that appears in an optimal execution program for a single asset with no risk-aversion. Observe that the zero order term $h_0$ of the value function in (26a) is the sum of the value of such an optimal trading program as well as the expected future payoff $\psi$ under Bachelier dynamics. This is again due to the lack of risk-aversion and, in this limit, the absence of any interaction between $S$ and $U$.

The correction term $\nu_1$ in the optimal trading speed is due to cross-price impact and contains two components. The term $\partial_U g(t, U)$ arises directly due to the impact that the agent's trades have on the current value of the option. As we assume $g$ is an increasing function with respect to $U$, this term has the effect of making the agent increase the speed of trading. Buying more shares tends to increase the price process $U_t$, which increases the value of the option.

With increasing $g$, the second component $\lambda_1(t, U)$ is negative as seen from (29b), which results in slowing down the rate of buying shares. This term arises from the agent's desire to finish with inventory close to zero to avoid the terminal liquidation penalty. As she knows that any shares she buys now she will partially liquidate in the future, she wants to avoid accumulating a large position in $S$ which results in costly round-trip trades.
The value of $\lambda_1(t, U)$ is a measurement of the expected average future option delta weighted by how fast the agent expects to liquidate in the future. By lowering the trading speed by this amount, the agent is balancing the benefit of buying now and increasing the option value against knowing she has to sell in the future and lower the option value; both trades incur a cost due to temporary price impact.

Expression (34c) in the trading speed due to risk-aversion has two components. The term $2\,\Lambda_2(t)\,q$ acts to bring the agent's inventory closer to zero (note that $\Lambda_2$ is always negative). This term arises because the agent wants to avoid inventory risk, which exposes her to the risk in the traded asset price $S_t$.

The term $\Lambda_1(t, U)$ has indeterminate sign, so it could result in either more or less buying. It stems from two sources of risk as can be seen in the integrand of (30b). The first is related to a tradeoff between inventory risk and passive gain when holding non-zero inventory. If $\mu \ne 0$ then the agent has incentive to hold non-zero inventory and benefit from the trending price of $S$, but this also exposes the agent to risk when holding inventory due to unexpected price changes. If $f_1(t)$ quantifies the desired speed of trading to benefit from price drift, then the first term in the integrand of (30b) quantifies the correction associated with not accumulating a risky position. The second source of risk in $\Lambda_1$ is that associated with holding the option and a non-zero position in the traded asset. As $S$ and $U$ are correlated, the agent can reduce her risk exposure by tending to favor a position in the traded asset which cancels out the random changes in the option value.

While the approximation of the trading strategy given in Theorem 7 involves some complicated expressions, it makes it clear how each of the components of the dynamics affects the agent's trading speed.
In this section, we approximate the optimal control process by a simpler control with a closed-form expression that is easier to evaluate.

Using Lemma 5, the approximation to the optimal control can be computed in closed form; however, it involves the evaluation of several one-dimensional integrals of rational functions. Instead we employ the optimal strategy in the linear payoff case (which admits a closed-form expression, see Theorem 2) to provide an approximation for the non-linear case. We let $v^*$ be the feedback form of the optimal strategy when the agent has linear exposure of $X$ units of the non-tradable risk factor, which is given by

$$v^*(t,q,X;\theta c,\theta\gamma) = \frac{1}{2k}\Big(\theta c\,X + h_1(t;X,\theta) + \big(2\,h_2(t,\theta)+b\big)\,q\Big). \tag{35}$$

Our closed-form approximation for the optimal trading strategy is summarized by the following two results.

Proposition 8 (Closed-form Approximation of Optimal Trading Speed). The following approximation holds locally uniformly in $(t,q,U)$:

$$v^*\big(t,q,\partial_U g(t,U);\theta c,\theta\gamma\big) = \hat\nu(t,q,U;\theta c,\theta\gamma) + o(\theta). \tag{36}$$

Let $\nu'$ be a control given by

$$\nu'_t = v^*\big(t,Q^{\nu'}_t,\partial_U g(t,U^{\nu'}_t);\theta c,\theta\gamma\big). \tag{37}$$

Then $\nu'$ is admissible. Define $h^{\nu'}$ by the relation

$$H^{\nu'}(t,x,q,S,U;\theta c,\theta\gamma) = -e^{-\theta\gamma\left(x + qS + h^{\nu'}(t,q,U;\theta c,\theta\gamma)\right)}.$$

Then $\nu'$ is asymptotically approximately optimal to second order:

$$h^{\psi}(t,q,U;\theta c,\theta\gamma) = h^{\nu'}(t,q,U;\theta c,\theta\gamma) + o(\theta^2). \tag{38}$$

Proof

For a proof see the Appendix.

This proposition shows that the agent can approximate the optimal trading speed by trading at time $t$ as if she were holding $\partial_U g(t,U_t)$ units of the non-tradable risk factor, and, as before, the value of these units will not be paid until $T$. This approximation is sensible because an option's delta represents locally the equivalent number of shares of the underlier that the agent holds in terms of risk and reward exposure.

This closed-form approximation works for the trading speed, but no such approximation holds by making a similar substitution of $\partial_U g(t,U)$ for $N$ in the closed-form expressions of inventory and value function in the linear payoff case. The inventory position at time $t$ depends on the entire path of $U$ up to time $t$, and is given by

$$Q_t = Q_0 + \int_0^t \nu_s\,ds.$$

Thus, even if $\nu_t$ depends on the process $U$ only through its value at time $t$, the inventory does not have this property.

4.2. Simulation of Agent's Inventory Position

In this section we consider a specific form of the exposure $\psi$ and investigate the agent's optimal trading strategy. The exposure is in the form of $N$ European call options written on $U$ with strike $K = U_0$. The maturity of the option is $T + \delta t$ for a small value of $\delta t$. This ensures that the payoff function $\psi$ is twice continuously differentiable around $T$. Our approximations to the value function and optimal trading speed require us to compute the value of the option and its delta under Bachelier dynamics. Elementary computations show that $g$ in equation (28) and its derivative are given by

$$g(t,U) = N\,\eta\,\sqrt{T+\delta t-t}\,\big(z\,\Phi(z)+\phi(z)\big), \tag{39a}$$

$$\partial_U g(t,U) = N\,\Phi(z), \quad\text{and} \tag{39b}$$

$$z = \frac{U-K}{\eta}\,(T+\delta t-t)^{-1/2} + \frac{\beta}{\eta}\,(T+\delta t-t)^{1/2}, \tag{39c}$$

where $\Phi$ and $\phi$ are the standard normal cumulative distribution and density functions, respectively.

We begin with the case $\gamma = 0$ to observe the impact of the parameter $c$ on the agent's trading speed.
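As an illustration of (39a)-(39c), the following is a minimal Python sketch of the option value and delta under Bachelier dynamics. The function names and the argument `dt_pad` (standing for $\delta t$) are our own choices for illustration and do not come from the paper.

```python
import math


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z):
    """Standard normal density function."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def bachelier_call(t, U, K, N, eta, beta, T, dt_pad):
    """Value and delta of N calls on U, following (39a)-(39c).

    dt_pad plays the role of the small maturity extension delta-t,
    so the option expires at T + dt_pad (illustrative argument name).
    """
    tau = T + dt_pad - t                      # time to option maturity
    z = (U - K) / (eta * math.sqrt(tau)) + (beta / eta) * math.sqrt(tau)
    value = N * eta * math.sqrt(tau) * (z * norm_cdf(z) + norm_pdf(z))
    delta = N * norm_cdf(z)
    return value, delta
```

A simple consistency check is that a central finite difference of the value with respect to $U$ reproduces the delta in (39b).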
When $\gamma = 0$, we do not apply Proposition 8 to approximate the trading speed because many quantities in (12) are undefined at $\gamma = 0$. It is possible, however, to compute them in the limiting sense $\gamma \to 0$. Instead, an application of Theorem 7, along with Lemma 5 for computing $\lambda_1$, shows that for small $c$ the optimal trading speed may be approximated by

$$\hat\nu_t = \frac{1}{2k}\Big(f_1(t) + \big(2f_2(t)+b\big)\,Q_t\Big) + c\,\big(2k + m\,(T-t)\big)^{-1}\,\partial_U g(t,U_t). \tag{40}$$

Interestingly, if we force the agent to finish with zero inventory, by taking the limit $\alpha \to \infty$, then the effect of cross-price impact disappears (recall $m = 2\alpha - b$) and the agent behaves according to an optimal trading program with one asset. This is because the net effect of the agent's trading on the process $U$ depends only on the net change in inventory, and is always equal to $-c\,Q_0$ if the agent must have $Q_T = 0$. If the total effect on $U$ is the same regardless of the trading strategy, then there can be no additional benefit from basing the trades on $U$.

We simulate several paths of the price process $U_t$, taking into account the cross-impact of the agent's own trades, and plot the resulting inventory paths. These are shown in Figure 2. We see distinct behaviour depending on whether the option ends in-the-money or not. As the option maturity approaches, if the agent can be relatively certain that it will expire out-of-the-money then she begins to adopt a strategy which essentially mimics a risk-neutral optimal liquidation program as in Almgren and Chriss (2001).

Figure 2: Agent's optimal inventory position over time for 5 simulated paths of $U$. In the left panel the agent's inventory position is displayed. The middle panel shows the agent's trading speed. The right panel shows the price of the non-tradable risk factor. Colors are chosen based on the final value of $U_T$ (larger values are red, smaller values are blue). Other model parameters are µ = 0, β = 0, σ = 1, η = 1, ρ = 0 . b = 10 − , c = 10 − , k = 10 − , γ = 0, α = 0 . N = 100, δt = 10 − .

On the other hand, if the agent believes the option will end up in-the-money, then she chooses a target inventory level which is not zero.
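The behaviour described above can be reproduced qualitatively with a short simulation. The sketch below is an Euler scheme for the pair $(Q_t, U_t)$ under the rule (40) in the case $\gamma = 0$ and $\mu = 0$, so that $f_1 \equiv 0$ by (48c) and $(2f_2(t)+b)/(2k) = -m/(2k+m(T-t))$ by (48d). It is only an illustration under stated assumptions: the function names, the time discretisation, and all numerical parameter values used with it are hypothetical and are not the values behind Figure 2.

```python
import math
import random


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bachelier_delta(t, U, K, N, eta, beta, T, dt_pad):
    """Delta of N calls under Bachelier dynamics, as in (39b)-(39c)."""
    tau = T + dt_pad - t
    z = (U - K) / (eta * math.sqrt(tau)) + (beta / eta) * math.sqrt(tau)
    return N * norm_cdf(z)


def simulate_path(Q0, U0, K, N, T, k, b, c, alpha, eta, beta, dt_pad,
                  n_steps, seed=0):
    """Euler scheme for (Q, U) under rule (40) with gamma = 0 and mu = 0.

    Returns the terminal inventory and terminal value of U.
    """
    rng = random.Random(seed)
    m = 2.0 * alpha - b
    dt = T / n_steps
    Q, U = Q0, U0
    for i in range(n_steps):
        t = i * dt
        delta = bachelier_delta(t, U, K, N, eta, beta, T, dt_pad)
        # Rule (40) with f_1 = 0 and (2 f_2 + b)/(2k) = -m/(2k + m (T - t)).
        nu = (c * delta - m * Q) / (2.0 * k + m * (T - t))
        Q += nu * dt
        # Cross-impact: the trading speed enters the drift of U.
        U += (beta + c * nu) * dt + eta * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return Q, U
```

When the option is deep in-the-money throughout, the delta is constant at $N$ and the inventory equation becomes linear, with solution $Q_T = \frac{c}{m}N + \big(Q_0 - \frac{c}{m}N\big)\frac{2k}{2k+mT}$, so the terminal inventory is close to $\frac{c}{m}N$ when $k$ is small relative to $mT$.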
If $U_t$ is sufficiently larger than $K$ and $t$ is sufficiently close to $T$, then $\partial_U g(t,U_t)$ is approximately equal to $N$, the number of options held, until maturity. The non-zero target is seen by expanding and rearranging (40) to give

$$\hat\nu_t = \frac{f_1(t)}{2k} - \frac{m\,\big(Q_t - \tfrac{c}{m}\,\partial_U g(t,U_t)\big)}{2k + m\,(T-t)}. \tag{41}$$

The second term on the right-hand side in (41) has the effect of making the inventory $Q$ tend to $c\,\partial_U g(t,U_t)/m$ (recall that $dQ_t = \nu_t\,dt$). The magnitude of this effect is intensified as the strategy gets closer to $T$. For the choice of parameters in Figure 2, the value of $c/m$ is approximately 0.01 and we see that the in-the-money inventory paths approach this value multiplied by $N$ at time $T$.

At the beginning of the trading period the agent begins to purchase shares, which exerts a small pressure to increase the value of the option. Once the path of the non-tradable risk factor begins to develop, she updates the probability which she assigns to the option expiring in or out-of-the-money.

Figure 2 shows only a small number of paths, but the general distribution of some values is also of interest, in particular the distribution of terminal inventory. Figure 3 shows the distribution of terminal inventory along with a scatter plot of the terminal inventory versus the terminal value of the non-tradable risk factor.

Figure 3: Distribution of agent's terminal inventory and dependence of terminal inventory on terminal value of the non-tradable risk factor. Parameters are identical to those in Figure 2. Number of simulations is M = 10 ,

Here, we set $c = 0$ to consider only the effect of risk-aversion. We apply Proposition 8 directly to compute the approximate optimal trading speed in closed form. The paths of $U$ shown in the right panel of Figure 4 are the same as the unaffected paths from the previous example. That is, the realizations of the two Brownian motions are the same, but due to cross-price impact the actual paths are different. The magnitude of the difference is imperceptible in this example.

The general effect of risk-aversion in this example is to take a short position in the traded asset in a gradual manner, and then part way through the trading period to buy back that position and end with inventory close to zero. This is expected from a risk-averse agent when the payoff is a call option and the two assets have positive instantaneous correlation. The short position tends to decrease the variability in the overall holdings, which consist of the traded asset and the option.

If we compare the results in Figure 4 with those in Figure 2, we see that the effect of risk-aversion is opposite to the effect of cross-price impact. A positive cross-price impact parameter incentivizes the agent to acquire a long position in the traded asset, whereas risk-aversion will always give incentive to short. In addition, the amount the agent desires to short depends on her estimate of the probability that the option will end up in-the-money or not. If it is very likely that the option ends in-the-money, then she will acquire a larger short position. If $U$ moves in such a way that the agent expects with great confidence that the option will expire out-of-the-money, then she ceases the acquisition of the short position early and trades to target zero inventory at the end of the trading period. These two extreme opposite outcomes are seen by comparing the two price paths in Figure 4 which end at the highest and lowest points. The remaining paths have an intermediate behavior.

Figure 4: Agent's optimal inventory position over time for 5 simulated paths of $U$. In the left panel the agent's inventory position is displayed. The middle panel shows the agent's trading speed. The right panel shows the value of the non-tradable risk factor. Colors are chosen based on the final value of $U_T$ (larger values are red, smaller values are blue). Other model parameters are identical to those in Figure 2 except c = 0 and γ = 10 − .

Figure 5: Distribution of agent's terminal inventory and dependence of terminal inventory on the terminal value of the non-tradable risk factor. Parameters are identical to those in Figure 4. Number of simulations is M = 10000.

Also of notable interest is that the variance of the inventory position is greatest at the halfway point of the trading period.
It is of interest to consider the behavior of the strategy when the effects of cross-price impact and risk-aversion are both present, because these effects tend to oppose each other. Figure 6 shows the trading strategy and associated inventory path when both cross-price impact and risk-aversion are present. We see a combination of the counteracting effects that take place, namely the agent acquires a short position over most of the trading period to mitigate risk, but rather than liquidating this position she has incentive to acquire a long position before maturity if she is confident the option will expire in-the-money.

Figure 6: Agent's optimal inventory position over time for 5 simulated paths of $U$. In the left panel the agent's inventory position is displayed. The middle panel shows the agent's trading speed. The right panel shows the value of $U_T$. Colors are chosen based on the final value of $U_T$ (larger values are red, smaller values are blue). Other model parameters are identical to those in Figure 2 except c = 10 − and γ = 2 · − .

Figure 7: Distribution of agent's terminal inventory and dependence of terminal inventory on $U_T$. Parameters are identical to those in Figure 6. Number of simulations is M = 10000.

The counteracting effects of the two expansion parameters also lead to interesting behavior regarding the distribution of the agent's inventory through time. Many algorithms that trade off expected returns and risks or trading penalties have their lowest variance at the endpoints of the trading period (the variance will be zero at time 0 because the agent knows what her inventory holding is). Low variance at the end of the trading period is generally expected for various reasons, such as the fact that a trading target is acquired or nearly acquired, or because non-zero inventory positions are undesirable overnight. Figure 8 displays the sample mean and standard deviation of the agent's inventory as a function of time.

Figure 8: Sample mean and standard deviation of agent's inventory over the course of the trading period. Parameters are identical to those in Figure 6. Number of simulations is M = 10000.

5. Conclusions

We solved a problem of an agent who has exposure to a risk factor that cannot be directly traded. The agent can trade in an asset which is correlated to the risk factor to reduce risk exposure. In addition, the agent's trades have an effect on the immediate and future price of the traded asset as well as the future value of the non-tradable risk factor. When the exposure to the factor is linear we solve for the agent's value function and optimal trading strategy in closed form. This closed form consists of several terms that illustrate how the agent trades off the risks and rewards of the combination of the positions in the two assets.

When the exposure to the non-tradable risk factor has non-linear dependence we derive an approximation to the agent's value function which holds when the cross-price impact and risk-aversion parameters are small. In addition, an observation about this expansion approximation allows us to assert that the agent has a simple trading strategy (in closed form) which is also an approximation to the optimal strategy. Given the trading strategy which is optimal when the factor exposure is linear and interpreting the non-linear exposure as a European option written on the non-tradable risk factor, the agent should trade at time $t$ as if she were holding a number of units of the non-tradable risk factor that is equal to the option's delta at time $t$. The parameters of the expansion, cross-price impact and risk-aversion, affect the optimal trading strategy in qualitatively different ways, inducing either long or short positions depending on which effect is stronger.

6. Proofs

Appendix A: Proofs for Section 3 (Linear Exposure)

The form of the terminal conditions and the coefficients of the HJB equation suggest that we make the ansatz

$$H(t,x,q,S,U) = -e^{-\gamma\left(x + qS + NU + h(t,q)\right)}.$$

Substitute this expression into the HJB equation to obtain an equation satisfied by $h(t,q)$:

$$\partial_t h + \sup_{\nu}\Big\{\nu\,\partial_q h - k\,\nu^2 + \mu\,q + b\,q\,\nu - \tfrac12\sigma^2\gamma\,q^2 + (\beta + c\,\nu)\,N - \tfrac12\eta^2\gamma\,N^2 - \rho\,\sigma\,\eta\,\gamma\,N\,q\Big\} = 0, \tag{42}$$

subject to the terminal condition $h(T,q) = -\alpha\,q^2$. The supremum is attained at

$$\nu^* = \frac{1}{2k}\big(\partial_q h + b\,q + c\,N\big). \tag{43}$$

Substitute the optimal control into equation (42) to write the following non-linear PDE:

$$\partial_t h + \mu\,q - \tfrac12\sigma^2\gamma\,q^2 + \beta\,N - \tfrac12\eta^2\gamma\,N^2 - \rho\,\sigma\,\eta\,\gamma\,N\,q + \frac{1}{4k}\big(\partial_q h + b\,q + c\,N\big)^2 = 0. \tag{44}$$

Once again, based on the form of the coefficients and the terminal conditions for $h$, we suggest the following form:

$$h(t,q) = h_0(t) + h_1(t)\,q + h_2(t)\,q^2. \tag{45}$$

Substituting this form into equation (44) and grouping by like powers of $q$ gives the following system of equations:

$$h_0'(t) + \beta\,N - \tfrac12\eta^2\gamma\,N^2 + \frac{1}{4k}\big(h_1(t) + c\,N\big)^2 = 0, \tag{46a}$$

$$h_1'(t) + \mu - \rho\,\sigma\,\eta\,\gamma\,N + \frac{1}{2k}\big(h_1(t) + c\,N\big)\big(2h_2(t) + b\big) = 0, \tag{46b}$$

$$h_2'(t) - \tfrac12\sigma^2\gamma + \frac{1}{4k}\big(2h_2(t) + b\big)^2 = 0, \tag{46c}$$

subject to the terminal conditions $h_0(T) = 0$, $h_1(T) = 0$, and $h_2(T) = -\alpha$. Equation (46c) is uncoupled and of Riccati type, and may be solved explicitly. One may check that its solution is given by (11c); this solution can be substituted into equation (46b), and the solution of that equation can be checked to be given by (11b). $\square$

6.2. Proof of Theorem 2

Given the explicit form of the candidate solution in Proposition 1, insert $h$ in equation (45) into (43), so that

$$\nu^*(t,q) = \frac{1}{2k}\Big(c\,N + h_1(t) + \big(2h_2(t) + b\big)\,q\Big).$$

Assumption 4 ii) (recall: 2α − b >

0) implies that h and h are bounded, thus the ODE dQ ν ∗ t = ν ∗ t dt has a solution for all t ∈ [0 , T ]. It is straightforward but tedious to show thatthe solution is given by (13). The solution Q ν ∗ t is deterministic, therefore it is bounded, andso is ν ∗ t = ν ∗ ( t, Q ν ∗ t ), thus (cid:82) T ( ν u ) du < + ∞ . Hence, as the solution to the associated HJBequation is classical and the feedback-form strategy is admissible, the strategy is indeed theone we seek, and the solution in Proposition 1 is indeed the value function. (cid:3) Substitute t = κ T into equation (13) and perform some elementary algebra to obtain Q ν ∗ κT = (cid:18) ζ k ( φ − − φ + )4 ω + c N (cid:19)(cid:18) e ωk ( κ − T − e − ωk ( κ +1) T φ + + φ − e − ωk T (cid:19) − ζk ω (cid:18) φ + e − ωk κ T + φ − e − ωk (2 − κ ) T φ + + φ − e − ωk T − (cid:19) + Q (cid:18) φ + e − ωk κ T + φ − e − ωk (2 − κ ) T φ + + φ − e − ωk T (cid:19) . Recall that ω = (cid:113) k γ σ and φ ± = ω ± α ∓ b , and that we assume 2 α − b >

0. As we restrict κ ∈ (0 , T → ∞ the numerator of each fraction above with exponential terms go to zero,and the denominators go to ω + α − b >

0. The only remaining term giveslim T →∞ Q ν ∗ κT = ζ k ω = µ − γ ρ σ η N γ σ , as desired. Similarly, as k ↓ α − b/ >

0. There is again a single remaining term givinglim k → Q ν ∗ κT = lim k → ζ k ω = µ − γ ρ σ η N γ σ . (cid:3) Appendix B: Proofs for Section 4 (Non-Linear Exposure)

Each of the three main proofs in this section (for Theorems 6 and 7, and Proposition 8) is broken into multiple parts. The main component of each proof is an approximate verification argument. These proceed by applying Itô's Lemma to a candidate approximation of the value function where the underlying processes are controlled by a candidate approximation of the optimal control. The desired approximation results then amount to bounding the magnitude of the error with respect to optimality and showing that this error tends to zero at the appropriate rate.

The verification in Theorem 6 shows that our candidate approximation of the value function is accurate up to second order. The verifications in Theorem 7 and Proposition 8 show that the candidate approximation is accurate up to second order with respect to the performance criteria of both of our candidate controls. Combining these results means that these performance criteria are also accurate up to second order with respect to the value function.

The proof proceeds in two parts. First we substitute the formal expansion of (24) into equation (18) (with $c$ and $\gamma$ replaced by $\theta c$ and $\theta\gamma$) and group terms according to the zero, first, and second order in $\theta$. Second, we show that this formal second order expansion is valid in the sense that the limit in (25) holds by performing a verification argument.

Part I (formal solution): Substituting (24) into (18) and setting terms proportional to $\theta^0$ to vanish gives

$$\begin{cases}\partial_t h_0 + \mu\,q + \beta\,\partial_U h_0 + \tfrac12\eta^2\,\partial_{UU} h_0 + \dfrac{1}{4k}\big(\partial_q h_0 + b\,q\big)^2 = 0,\\[4pt] h_0(T,q,U) = -\alpha\,q^2 + \psi(U).\end{cases} \tag{47}$$

It is easily verified that equation (47) has solution given by

$$h_0(t,q,U) = f_0(t) + f_1(t)\,q + f_2(t)\,q^2 + g(t,U), \tag{48a}$$

$$f_0(t) = \frac{1}{4k}\int_t^T \big(f_1(s)\big)^2\,ds, \tag{48b}$$

$$f_1(t) = \frac{\mu\,(T-t)\,\big(4k + m\,(T-t)\big)}{4k + 2m\,(T-t)}, \tag{48c}$$

$$f_2(t) = -\frac{k\,m}{2k + m\,(T-t)} - \frac{b}{2}, \tag{48d}$$

$$g(t,U) = \mathbb{E}\big[\psi(\tilde U_T)\,\big|\,\tilde U_t = U\big], \tag{48e}$$

$$d\tilde U_t = \beta\,dt + \eta\,dZ_t. \tag{48f}$$

Similarly, grouping terms proportional to $\theta^1$ gives

$$\begin{cases} c\,\Big[\partial_t h_1 + \beta\,\partial_U h_1 + \tfrac12\eta^2\,\partial_{UU} h_1 + \dfrac{1}{2k}\big(\partial_q h_0 + b\,q\big)\big(\partial_q h_1 + \partial_U h_0\big)\Big]\\[4pt] \quad + \gamma\,\Big[\partial_t h_2 + \beta\,\partial_U h_2 + \tfrac12\eta^2\,\partial_{UU} h_2 + \dfrac{1}{2k}\big(\partial_q h_0 + b\,q\big)\,\partial_q h_2 - \tfrac12\sigma^2 q^2 - \rho\,\sigma\,\eta\,q\,\partial_U h_0 - \tfrac12\eta^2\big(\partial_U h_0\big)^2\Big] = 0,\\[4pt] c\,h_1(T,q,U) + \gamma\,h_2(T,q,U) = 0.\end{cases} \tag{49}$$

We seek solutions to equation (49) that do not depend on $c$ or $\gamma$; hence, we set each term in square brackets in equation (49) to zero independently.

Thus, set the first square bracket in (49) to zero, write $h_1(t,q,U)$ in the form $h_1(t,q,U) = \lambda_0(t,U) + \lambda_1(t,U)\,q$, and set the $q^0$ and $q^1$ terms to vanish independently, to obtain

$$\begin{cases}\partial_t \lambda_0 + \beta\,\partial_U \lambda_0 + \tfrac12\eta^2\,\partial_{UU}\lambda_0 + \dfrac{1}{2k}\,f_1\,\big(\lambda_1 + \partial_U g\big) = 0,\\[4pt] \lambda_0(T,U) = 0,\end{cases} \tag{50}$$

and

$$\begin{cases}\partial_t \lambda_1 + \beta\,\partial_U \lambda_1 + \tfrac12\eta^2\,\partial_{UU}\lambda_1 + \dfrac{1}{2k}\big(2f_2 + b\big)\,\lambda_1 + \dfrac{1}{2k}\big(2f_2 + b\big)\,\partial_U g = 0,\\[4pt] \lambda_1(T,U) = 0,\end{cases} \tag{51}$$

where $f_{0,1,2}(t)$ and $g(t,U)$ are given in equations (48b) to (48e).
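Substituting (48a) into (47) and matching powers of $q$ gives the ODEs $f_1' + \mu + \frac{1}{2k} f_1\,(2f_2 + b) = 0$ and $f_2' + \frac{1}{4k}(2f_2 + b)^2 = 0$, with $f_1(T) = 0$ and $f_2(T) = -\alpha$. As a sanity check, the closed forms (48c) and (48d) can be tested against these ODEs numerically. The sketch below does this with central finite differences; the function names are illustrative and any parameter values used with it are hypothetical.

```python
def f1(t, T, k, m, mu):
    """Closed form (48c)."""
    tau = T - t
    return mu * tau * (4.0 * k + m * tau) / (4.0 * k + 2.0 * m * tau)


def f2(t, T, k, m, b):
    """Closed form (48d)."""
    tau = T - t
    return -k * m / (2.0 * k + m * tau) - 0.5 * b


def ode_residuals(t, T, k, alpha, b, mu, eps=1e-6):
    """Residuals of the ODEs implied by (47) for f_1 and f_2 at time t.

    Time derivatives are approximated by central finite differences;
    both residuals should be close to zero if (48c)-(48d) are correct.
    """
    m = 2.0 * alpha - b
    d_f1 = (f1(t + eps, T, k, m, mu) - f1(t - eps, T, k, m, mu)) / (2.0 * eps)
    d_f2 = (f2(t + eps, T, k, m, b) - f2(t - eps, T, k, m, b)) / (2.0 * eps)
    slope = 2.0 * f2(t, T, k, m, b) + b
    res_f1 = d_f1 + mu + f1(t, T, k, m, mu) * slope / (2.0 * k)
    res_f2 = d_f2 + slope ** 2 / (4.0 * k)
    return res_f1, res_f2
```

The same approach applies to the other closed-form coefficients of the expansion, provided the corresponding ODE is written out explicitly.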
By the Feynman-Kac formula, equations (50) and (51) have solutions given by

$$\lambda_0(t,U) = \mathbb{E}\left[\int_t^T \frac{f_1(s)}{2k}\Big(\lambda_1(s,\tilde U_s) + \partial_U g(s,\tilde U_s)\Big)\,ds\,\middle|\,\tilde U_t = U\right], \tag{52}$$

$$\lambda_1(t,U) = -\frac{m}{2k + m\,(T-t)}\,\mathbb{E}\left[\int_t^T \partial_U g(s,\tilde U_s)\,ds\,\middle|\,\tilde U_t = U\right]. \tag{53}$$

Next, set the second square bracket of (49) to zero, write $h_2(t,q,U)$ in the form $h_2(t,q,U) = \Lambda_0(t,U) + \Lambda_1(t,U)\,q + \Lambda_2(t)\,q^2$, and set the $q^0$, $q^1$, and $q^2$ terms to zero independently, to write

$$\begin{cases}\partial_t \Lambda_0 + \beta\,\partial_U \Lambda_0 + \tfrac12\eta^2\,\partial_{UU}\Lambda_0 + \dfrac{1}{2k}\,f_1\,\Lambda_1 - \tfrac12\eta^2\big(\partial_U g\big)^2 = 0,\\[4pt] \Lambda_0(T,U) = 0,\end{cases} \tag{54}$$

$$\begin{cases}\partial_t \Lambda_1 + \beta\,\partial_U \Lambda_1 + \tfrac12\eta^2\,\partial_{UU}\Lambda_1 + \dfrac{1}{2k}\big(2f_2 + b\big)\,\Lambda_1 + \dfrac{1}{k}\,\Lambda_2\,f_1 - \rho\,\sigma\,\eta\,\partial_U g = 0,\\[4pt] \Lambda_1(T,U) = 0,\end{cases} \tag{55}$$

$$\begin{cases}\partial_t \Lambda_2 + \dfrac{1}{k}\big(2f_2 + b\big)\,\Lambda_2 - \tfrac12\sigma^2 = 0,\\[4pt] \Lambda_2(T) = 0.\end{cases} \tag{56}$$

The solution to ODE (56) is

$$\Lambda_2(t) = -\sigma^2\,(T-t)\,\frac{12k^2 + 6\,k\,m\,(T-t) + m^2\,(T-t)^2}{6\,\big(2k + m\,(T-t)\big)^2}. \tag{57}$$

By the Feynman-Kac formula, equations (54) and (55) have solutions

$$\Lambda_0(t,U) = \frac{1}{2k}\,\mathbb{E}\left[\int_t^T \Big(f_1(s)\,\Lambda_1(s,\tilde U_s) - k\,\eta^2\big(\partial_U g(s,\tilde U_s)\big)^2\Big)\,ds\,\middle|\,\tilde U_t = U\right], \tag{58}$$

$$\Lambda_1(t,U) = \frac{1}{k}\,\mathbb{E}\left[\int_t^T \frac{2k + m\,(T-s)}{2k + m\,(T-t)}\Big(f_1(s)\,\Lambda_2(s) - k\,\rho\,\sigma\,\eta\,\partial_U g(s,\tilde U_s)\Big)\,ds\,\middle|\,\tilde U_t = U\right]. \tag{59}$$

Finally, group the terms proportional to $\theta^2$ and obtain

$$\begin{cases} c^2\,\Big[\partial_t h_3 + \beta\,\partial_U h_3 + \tfrac12\eta^2\,\partial_{UU} h_3 + \dfrac{1}{4k}\big(\partial_U h_0 + \partial_q h_1\big)^2 + \dfrac{1}{2k}\big(\partial_q h_0 + b\,q\big)\big(\partial_U h_1 + \partial_q h_3\big)\Big]\\[4pt] \quad + c\,\gamma\,\Big[\partial_t h_4 + \beta\,\partial_U h_4 + \tfrac12\eta^2\,\partial_{UU} h_4 + \dfrac{1}{2k}\big(\partial_q h_1 + \partial_U h_0\big)\,\partial_q h_2 + \dfrac{1}{2k}\big(\partial_q h_0 + b\,q\big)\big(\partial_U h_2 + \partial_q h_4\big) - \eta^2\,\partial_U h_0\,\partial_U h_1 - \rho\,\sigma\,\eta\,q\,\partial_U h_1\Big]\\[4pt] \quad + \gamma^2\,\Big[\partial_t h_5 + \beta\,\partial_U h_5 + \tfrac12\eta^2\,\partial_{UU} h_5 + \dfrac{1}{4k}\big(\partial_q h_2\big)^2 + \dfrac{1}{2k}\big(\partial_q h_0 + b\,q\big)\,\partial_q h_5 - \eta^2\,\partial_U h_0\,\partial_U h_2 - \rho\,\sigma\,\eta\,q\,\partial_U h_2\Big] = 0,\\[4pt] c^2\,h_3(T,q,U) + c\,\gamma\,h_4(T,q,U) + \gamma^2\,h_5(T,q,U) = 0.\end{cases} \tag{60}$$

We seek solutions to (60) that do not depend on $c$ and $\gamma$, so we set each of the three terms in square brackets equal to zero independently. Make the substitutions

$$h_3(t,q,U) = A_0(t,U) + A_1(t,U)\,q + A_2(t,U)\,q^2, \tag{61a}$$

$$h_4(t,q,U) = B_0(t,U) + B_1(t,U)\,q + B_2(t,U)\,q^2, \tag{61b}$$

$$h_5(t,q,U) = C_0(t,U) + C_1(t,U)\,q + C_2(t,U)\,q^2, \tag{61c}$$

to arrive at a system of PDEs for $A_{0,1,2}$, $B_{0,1,2}$, and $C_{0,1,2}$. In Lemma 9 (which appears at the end of this proof) we show that these functions are bounded and continuously differentiable with respect to $U$ with bounded derivatives.

Part II (accuracy of approximation): With $\hat h$ given by (24), define

$$\hat H(t,x,q,S,U;\theta c,\theta\gamma) = -e^{-\theta\gamma\left(x + qS + \hat h(t,q,U;\theta c,\theta\gamma)\right)}. \tag{62}$$

Then the desired limit in (25) is equivalent to

$$H^{\psi}(t,x,q,S,U;\theta c,\theta\gamma) = \hat H(t,x,q,S,U;\theta c,\theta\gamma) + o(\theta^3), \tag{63}$$

where the additional power of $\theta$ follows from a Taylor expansion of the exponential function and noting the additional factor of $\theta$ that appears in the exponential of (62). For simplicity, we prove that the approximation in (63) holds for $t = 0$ with initial states given by $x$, $q$, $S$, and $U$. The case of $t \neq 0$ follows similarly.

Henceforth, consider the initial states $x$, $q$, $S$, and $U$ to be fixed, and take $\theta \in (0,\theta^*)$, $\epsilon \in (0,\epsilon^*)$, where $\theta^*$, $\epsilon^*$ are as in Assumption 4 iii). Further, let $\nu^{\theta,\epsilon}$ be an admissible control which is $\epsilon\,\theta^3$-optimal, specifically such that

$$H^{\nu^{\theta,\epsilon}}(0,x,q,S,U;\theta c,\theta\gamma) + \epsilon\,\theta^3 \ge H^{\psi}(0,x,q,S,U;\theta c,\theta\gamma).$$
(64)Applying Ito’s Lemma to the process G t = ˆ H ( t, X ν θ,(cid:15) t , Q ν θ,(cid:15) t , S ν θ,(cid:15) t , U ν θ,(cid:15) t ; θ c, θ γ ) yields G T − G = (cid:90) T ( ∂ t + L ν θ,(cid:15) ) ˆ H ( t, X ν θ,(cid:15) t , Q ν θ,(cid:15) t , S ν θ,(cid:15) t , U ν θ,(cid:15) t ; θ c, θ γ ) dt − θ γ σ (cid:90) T ˆ H ( t, X ν θ,(cid:15) t , Q ν θ,(cid:15) t , S ν θ,(cid:15) t , U ν θ,(cid:15) t ; θ c, θ γ ) Q ν θ,(cid:15) t dW t − θ γ η (cid:90) T ˆ H ( t, X ν θ,(cid:15) t , Q ν θ,(cid:15) t , S ν θ,(cid:15) t , U ν θ,(cid:15) t ; θ c, θ γ ) ∂ U ˆ h ( t, Q ν θ,(cid:15) t , U ν θ,(cid:15) t ; θ c, θ γ ) dZ t , (65)where the diﬀerential operator L ν is given by L ν = ν ∂ q − ( S + k ν ) ν ∂ x + ( µ + b ν ) ∂ S + 12 σ ∂ SS + ( β + θ c ν ) ∂ U + 12 η ∂ UU + ρ σ η ∂ SU . Inspection of ∂ U ˆ h ( t, q, U ; θ c, θ γ ) shows that it is a polynomial with respect to q of degree 2with coeﬃcients that are bounded with respect to ( t, U ) due to Lemma 5.Next, we apply the uniform bound in (19) from Assumption 4 iii) to show that both stochas-tic integrals have expectation zero for suﬃciently small θ . There is a suﬃciently large N independent of θ ∈ (0 , θ ∗ ) such that (cid:12)(cid:12)(cid:12) ˆ H ( t, x, q, S, U ; θ c, θ γ ) q (cid:12)(cid:12)(cid:12) ≤ N e θ γ N ( | x | + | q S | + | q | + q + | U | ) , (cid:12)(cid:12)(cid:12) ˆ H ( t, x, q, S, U ; θ c, θ γ )( ∂ U ˆ h ( t, q, U ; θ c, θ γ )) (cid:12)(cid:12)(cid:12) ≤ N e θ γ N ( | x | + | q S | + | q | + q + | U | ) . Therefore, by Assumption 4 iii), if θ <

Dγ N then the integrands in both stochastic integralsin (65) are square-integrable over [0 , T ] × Ω and therefore have zero expectation. If θ ∗ > Dγ N ,then henceforth we further restrict θ ∈ (0 , Dγ N ).Given the explicit form of ˆ H , we obtain the bound( ∂ t + L ν θ,(cid:15) ) ˆ H ( t, x, q, S, U ; θ c, θ γ ) ≤ sup ν ( ∂ t + L ν ) ˆ H ( t, x, q, S, U ; θ c, θ γ ) (66)= − θ γ ˆ H ( t, x, q, S, U ; θ c, θ γ ) (cid:88) i =3 θ i P i ( t, q, U ) . (67)29he supremum in (66) is attained at ν † = ∂ q ˆ h + θ c ∂ U ˆ h + b q k , which after direct substitution and some tedious but straightforward computations results in(67), where, by Lemma 5, each P i ( t, q, U ), i ∈ { , , , } , is a polynomial with respect to q ofdegree at most four with coeﬃcients that are bounded with respect to t and U (full expressionsappear in (89) in Appendix C). Taking expectations in (65), substituting the deﬁnition of G t ,and using the inequality (67), results in the inequalities E (cid:20)(cid:90) T − θ γ ˆ H ( t, X ν θ,(cid:15) t , Q ν θ,(cid:15) t , S ν θ,(cid:15) t , U ν θ,(cid:15) t ; θ c, θ γ ) (cid:88) i =3 θ i P i ( t, Q ν θ,(cid:15) t , U ν θ,(cid:15) t ) dt (cid:21) ≥ E [ ˆ H ( T, X ν θ,(cid:15) T , Q ν θ,(cid:15) T , S ν θ,(cid:15) T , U ν θ,(cid:15) T ; θ c, θ γ )] − ˆ H (0 , x, q, S, U ; θ c, θ γ )= H ν θ,(cid:15) (0 , x, q, S, U ; θ c, θ γ ) − ˆ H (0 , x, q, S, U ; θ c, θ γ ) . Rearrange and recall that ν θ,(cid:15) is (cid:15) θ -optimal so that we have1 θ (cid:16) H ψ (0 , x, q, S, U ; θ c, θ γ ) − ˆ H (0 , x, q, S, U ; θ c, θ γ ) (cid:17) ≤ (cid:15) + E (cid:20)(cid:90) T − θ γ ˆ H ( t, X ν θ,(cid:15) t , Q ν θ,(cid:15) t , S ν θ,(cid:15) t , U ν θ,(cid:15) t ; θ c, θ γ ) (cid:88) i =3 θ i − P i ( t, Q ν θ,(cid:15) t , U ν θ,(cid:15) t ) dt (cid:21) . (68)We again apply the uniform bound in (19) from Assumption 4 iii). By construction, ˆ h has atmost linear growth in U . Moreover, the zeroth order, linear, and quadratic dependence on q appear with bounded coeﬃcients. 
Next, as each P i is at most degree four in q , with boundedcoeﬃcients, there is a suﬃciently large N , independent of θ ∈ (0 , θ ∗ ), such that (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ˆ H ( t, x, q, S, U ; θ c, θ γ ) (cid:88) i =3 θ i − P i ( t, Q ν θ,(cid:15) t , U ν θ,(cid:15) t ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ N e θ γ N ( | x | + | q S | + | q | + q + | U | ) . If θ ∗ > Dγ N and N > N , then further restrict θ ∈ (0 , Dγ N ), and as (cid:15) ∈ (0 , (cid:15) ∗ ) and θ < θ ∗ , theuniform bound in (19) applies and hence θ (cid:12)(cid:12)(cid:12) H ψ (0 , x, q, S, U ; θ c, θ γ ) − ˆ H (0 , x, q, S, U ; θ c, θ γ ) (cid:12)(cid:12)(cid:12) ≤ (cid:15) + θ γ N C . (69)Finally, as (cid:15) ∈ (0 , (cid:15) ∗ ) is arbitrary, we havelim θ ↓ θ (cid:12)(cid:12)(cid:12) H ψ (0 , x, q, S, U ; θ c, θ γ ) − ˆ H (0 , x, q, S, U ; θ c, θ γ ) (cid:12)(cid:12)(cid:12) = 0 , (70)which is the desired limit. (cid:3) emma 9. The functions A , , , B , , , and C , , are bounded and continuously diﬀerentiablewith respect to U with bounded derivatives. Proof

Let L = β ∂ U + η ∂ UU . Upon substituting (61) into (60), the functions A , , , B , , ,and C , , satisfy the following systems of PDE’s:  ∂ t A + L A + k ( λ + ∂ U g ) + k f ( ∂ U λ + A ) = 0 ,∂ t A + L A + k (2 f + b ) ( ∂ U λ + A ) + k f ( ∂ U λ + 2 A ) = 0 ,∂ t A + L A + k (2 f + b ) ( ∂ U λ + 2 A ) = 0 ,A , , ( T, U ) = 0 , (71)  ∂ t B + L B + k f ( ∂ U λ + B ) + k Λ ( λ + ∂ U g ) − η ∂ U λ ∂ U g = 0 ,∂ t B + L B + k f ( ∂ U Λ + 2 B ) + k Λ ( λ + ∂ U g )+ k (2 f + b ) ( ∂ U Λ + B ) − η ∂ U λ ∂ U g − ρση∂ U λ = 0 ,∂ t B + L B + k (2 f + b ) ( ∂ U Λ + 2 B ) + k f ∂ U Λ − ρση∂ U λ = 0 ,B , , ( T, U ) = 0 , (72)  ∂ t C + L C + k (Λ ) + k f C − η ∂ U Λ ∂ U g = 0 ,∂ t C + L C + k Λ Λ + k f C + k (2 f + b ) C − η ∂ U Λ ∂ U g − ρση∂ U Λ = 0 ,∂ t C + L C + k (Λ ) + k (2 f + b ) C − ρση∂ U Λ = 0 ,C , , ( T, U ) = 0 . (73)Inspection shows that within each of the three systems, the coupling is only in one directionso the equations may be solved one by one. We also see that each individual equation takesthe form ∂ t w + L w + F + Gw = 0 and w ( T, U ) = 0 . (74)By the Feynman-Kac formula the solution for w is w ( t, U ) = E (cid:20)(cid:90) Tt e (cid:82) st G ( r, ˜ U r ) dr F ( s, ˜ U s ) (cid:12)(cid:12)(cid:12)(cid:12) ˜ U t = U (cid:21) , where d ˜ U t = β dt + η dZ t . The forcing term F in each equation is bounded and continuously diﬀerentiable with respect to U because the functions f , , , λ , , Λ , , , and ∂ U g are bounded and continuously diﬀerentiablewith respect to U . In addition, inspection shows that each discount term G is bounded anda function only of t . Therefore each A , , , B , , , and C , , is bounded, and continuouslydiﬀerentiable with respect to U by Lemma 5. (cid:3) .5. Proof of Theorem 7 Fix θ > θ ∈ (0 , θ ). 
Next, consider the inventory and non-tradable risk factorpath when the agent follows the conjectured approximate strategy, speciﬁcally such that dQ ˆ νt = ˆ ν (cid:0) t, Q ˆ νt , U ˆ νt (cid:1) dt , (75a) dU ˆ νt = (cid:0) β + c ˆ ν (cid:0) t, Q ˆ νt , U ˆ νt (cid:1)(cid:1) dt + η dZ t . (75b)By Lemma 5, the function ˆ ν may be written asˆ ν ( t, q, U ) = F ( t ) + F ( t ; θ ) q + F ( t ; θ ) ∂ U g ( t, U ) , (76)with ∂ U g ( t, U ) and ∂ UU g ( t, U ) bounded, therefore ˆ ν ( t, q, U ) is Lipschitz with linear growth inthe variables q and U . Thus, the SDEs (75) have a unique strong solution (see Karatzas andShreve (2012) Theorem 5.2.9). Moreover, choose the linear growth coeﬃcient uniformly withrespect to θ ∈ (0 , θ ), so that E (cid:104)(cid:0) Q ˆ νt (cid:1) + (cid:0) U ˆ νt (cid:1) (cid:105) ≤ C e Ct , ∀ t ∈ [0 , T ] , for some constant C . Therefore, by Fubini’s Theorem, we have E (cid:104)(cid:82) T ˆ ν u du (cid:105) < ∞ .To show that ˆ ν is asymptotically approximately optimal, we proceed with a veriﬁcation argu-ment while keeping track of the magnitude of the error with respect to optimization, analogousto the proof of Theorem 6. We also remark that as H ψ ( t, x, q, S, U ; θ c, θ γ ) = − e − θ γ ( x + q S + h ψ ( t,q,U ; θ c,θ γ )) ,H ˆ ν ( t, x, q, S, U ; θ c, θ γ ) = − e − θ γ ( x + q S + h ˆ ν ( t,q,U ; θ c,θ γ )) , our desired approximation result is equivalent to H ψ ( t, x, q, S, U ; θ c, θ γ ) = H ˆ ν ( t, x, q, S, U ; θ c, θ γ ) + o ( θ ) , (77)which follows from a Taylor expansion of the exponential function.We prove the accuracy result at t = 0 with given initial states x , q , S , and U , which wehenceforth consider to be ﬁxed. 
The general result for $t \neq 0$ follows similarly.

Given the control $\hat\nu$, and the resulting state processes $X^{\hat\nu}_t$, $Q^{\hat\nu}_t$, $S^{\hat\nu}_t$, and $U^{\hat\nu}_t$, define the process $(G_t)_{t\in[0,T]}$ where

$$G_t = \hat H\big(t, X^{\hat\nu}_t, Q^{\hat\nu}_t, S^{\hat\nu}_t, U^{\hat\nu}_t; \theta c, \theta\gamma\big), \qquad \hat H(t,x,q,S,U;\theta c,\theta\gamma) = -e^{-\theta\gamma\,(x + qS + \hat h(t,q,U;\theta c,\theta\gamma))}.$$

Here $\hat h$ is the approximation of $h^{\psi}$ given in Theorem 6, Equation (24). Applying Itô's Lemma to $G$ gives

$$\begin{aligned}
G_T - G_0 ={}& \int_0^T (\partial_t + \mathcal L^{\hat\nu})\,\hat H\big(t, X^{\hat\nu}_t, Q^{\hat\nu}_t, S^{\hat\nu}_t, U^{\hat\nu}_t; \theta c, \theta\gamma\big)\,dt \\
& - \theta\gamma\sigma \int_0^T \hat H\big(t, X^{\hat\nu}_t, Q^{\hat\nu}_t, S^{\hat\nu}_t, U^{\hat\nu}_t; \theta c, \theta\gamma\big)\, Q^{\hat\nu}_t\, dW_t \\
& - \theta\gamma\eta \int_0^T \hat H\big(t, X^{\hat\nu}_t, Q^{\hat\nu}_t, S^{\hat\nu}_t, U^{\hat\nu}_t; \theta c, \theta\gamma\big)\, \partial_U \hat h\big(t, Q^{\hat\nu}_t, U^{\hat\nu}_t; \theta c, \theta\gamma\big)\, dZ_t \\
={}& -\theta\gamma \int_0^T \hat H\big(t, X^{\hat\nu}_t, Q^{\hat\nu}_t, S^{\hat\nu}_t, U^{\hat\nu}_t; \theta c, \theta\gamma\big) \Big(\sum_{i=3}^{5} \theta^i M_i\big(t, Q^{\hat\nu}_t, U^{\hat\nu}_t\big)\Big)\, dt \\
& - \theta\gamma\sigma \int_0^T \hat H\big(t, X^{\hat\nu}_t, Q^{\hat\nu}_t, S^{\hat\nu}_t, U^{\hat\nu}_t; \theta c, \theta\gamma\big)\, Q^{\hat\nu}_t\, dW_t \\
& - \theta\gamma\eta \int_0^T \hat H\big(t, X^{\hat\nu}_t, Q^{\hat\nu}_t, S^{\hat\nu}_t, U^{\hat\nu}_t; \theta c, \theta\gamma\big)\, \partial_U \hat h\big(t, Q^{\hat\nu}_t, U^{\hat\nu}_t; \theta c, \theta\gamma\big)\, dZ_t,
\end{aligned} \tag{78}$$

where each $M_i(t,q,U)$, $i \in \{3,4,5\}$, is a polynomial in $q$ of degree at most four with coefficients that are uniformly bounded functions of $t$ and $U$ (see (90) in Appendix C for the explicit expressions).

We proceed to show that for $\theta \in (0,\bar\theta)$ both stochastic integrals have zero expectation and that Fubini's Theorem may be applied to the expectation of the Riemann integral. First, we construct appropriate bounds on the underlying processes.

The linear growth condition on $\hat\nu$ and the boundedness of $\partial_U g$ imply

$$\underline{\nu}(t,q) \le \hat\nu(t,q,U) \le \overline{\nu}(t,q), \qquad \text{where } \overline{\nu}(t,q) = C_1(1+|q|) \text{ and } \underline{\nu}(t,q) = -\overline{\nu}(t,q),$$

for some constant $C_1 > 0$. In addition, the processes $(Q^{\underline\nu}_t)_{t\in[0,T]}$ and $(Q^{\overline\nu}_t)_{t\in[0,T]}$ are deterministic and satisfy $Q^{\underline\nu}_t \le Q^{\hat\nu}_t \le Q^{\overline\nu}_t$. Similarly, there exist processes $(S^{\underline\nu}_t)_{t\in[0,T]}$, $(S^{\overline\nu}_t)_{t\in[0,T]}$, $(U^{\underline\nu}_t)_{t\in[0,T]}$, and $(U^{\overline\nu}_t)_{t\in[0,T]}$ such that

$$S^{\underline\nu}_t \le S^{\hat\nu}_t \le S^{\overline\nu}_t \quad \text{and} \quad U^{\underline\nu}_t \le U^{\hat\nu}_t \le U^{\overline\nu}_t,$$

almost surely (see Karatzas and Shreve (2012), Proposition 5.2.18). Therefore, there exist $C_2 > 0$ and $C_3 > 0$ such that

$$|S^{\hat\nu}_t| \le C_2\Big(1 + \max_{0\le t\le T}\{|W_t|\}\Big) \quad \text{and} \quad |U^{\hat\nu}_t| \le C_3\Big(1 + \max_{0\le t\le T}\{|Z_t|\}\Big).$$

Let $M_W = \max_{0\le t\le T}\{|W_t|\}$ and $M_Z = \max_{0\le t\le T}\{|Z_t|\}$. These bounds provide the following bound for $X^{\hat\nu}_t$:

$$\begin{aligned}
|X^{\hat\nu}_t| &\le |x| + \int_0^t \big|S^{\hat\nu}_s + k\,\hat\nu(s, Q^{\hat\nu}_s, U^{\hat\nu}_s)\big|\,\big|\hat\nu(s, Q^{\hat\nu}_s, U^{\hat\nu}_s)\big|\,ds \\
&\le |x| + \int_0^T |S^{\hat\nu}_s|\,\big|\hat\nu(s, Q^{\hat\nu}_s, U^{\hat\nu}_s)\big|\,ds + k\int_0^T \big|\hat\nu(s, Q^{\hat\nu}_s, U^{\hat\nu}_s)\big|^2\,ds \\
&\le |x| + T\,C_2\,C_4\,(1 + M_W) + k\,T\,C_4^2,
\end{aligned}$$

where $C_4$ is a constant.

The uniform bounds on $\partial_U g(t,U)$ and $\partial_{UU} g(t,U)$ imply that $\hat h$ has at most linear growth in $U$ and hence

$$|\hat h(t, Q^{\hat\nu}_t, U^{\hat\nu}_t)| \le C_5(1 + M_Z) \quad \text{and} \quad |\partial_U \hat h(t, Q^{\hat\nu}_t, U^{\hat\nu}_t)| \le C_6,$$

where $C_5$ and $C_6$ are constants.

Applying the above bounds together provides

$$e^{-\theta\gamma C_7(1 + M_W + M_Z)} \le \big|\hat H\big(t, X^{\hat\nu}_t, Q^{\hat\nu}_t, S^{\hat\nu}_t, U^{\hat\nu}_t; \theta c, \theta\gamma\big)\big| \le e^{\theta\gamma C_7(1 + M_W + M_Z)}. \tag{79}$$

We may choose the constants $C_i$ independent of $\theta \in (0,\bar\theta)$ and therefore

$$\hat H^2\big(t, X^{\hat\nu}_t, Q^{\hat\nu}_t, S^{\hat\nu}_t, U^{\hat\nu}_t; \theta c, \theta\gamma\big)\,\big(Q^{\hat\nu}_t\big)^2 \le C_8\, e^{2\theta\gamma C_7(1 + M_W + M_Z)},$$

$$\hat H^2\big(t, X^{\hat\nu}_t, Q^{\hat\nu}_t, S^{\hat\nu}_t, U^{\hat\nu}_t; \theta c, \theta\gamma\big)\,\big(\partial_U \hat h(t, Q^{\hat\nu}_t, U^{\hat\nu}_t; \theta c, \theta\gamma)\big)^2 \le C_6^2\, e^{2\theta\gamma C_7(1 + M_W + M_Z)},$$

where $C_8 = \max\{(Q^{\underline\nu}_T)^2, (Q^{\overline\nu}_T)^2\}$. As the right-hand sides of both inequalities are integrable over $[0,T]\times\Omega$, the stochastic integrals in (78) have zero expectation.

Next, as noted above, each $M_i$, $i \in \{3,4,5\}$, is a polynomial in $q$ of degree at most four with coefficients that are uniformly bounded functions of $t$ and $U$.
Hence,

$$\Big|\sum_{i=3}^{5} \theta^i M_i\big(t, Q^{\hat\nu}_t, U^{\hat\nu}_t\big)\Big| \le \theta^3\, C_9, \tag{80}$$

where $C_9$ is a constant which does not depend on $\theta \in (0,\bar\theta)$. This bound, along with (79), allows us to apply Fubini's Theorem to the Riemann integral in (78). Putting this together with the result that the stochastic integrals on the right-hand side of (78) have zero expectation, we have

$$\begin{aligned}
& E\big[H^{\hat\nu}\big(T, X^{\hat\nu}_T, Q^{\hat\nu}_T, S^{\hat\nu}_T, U^{\hat\nu}_T; \theta c, \theta\gamma\big)\big] - \hat H(0,x,q,S,U;\theta c,\theta\gamma) \\
&\qquad = -\theta\, E\Big[\gamma \int_0^T \hat H\big(t, X^{\hat\nu}_t, Q^{\hat\nu}_t, S^{\hat\nu}_t, U^{\hat\nu}_t; \theta c, \theta\gamma\big) \Big(\sum_{i=3}^{5} \theta^i M_i\big(t, Q^{\hat\nu}_t, U^{\hat\nu}_t\big)\Big)\, dt\Big].
\end{aligned} \tag{81}$$

Using the bound (79), we further have

$$\frac{1}{\theta^3}\,\Big|H^{\hat\nu}(0,x,q,S,U;\theta c,\theta\gamma) - \hat H(0,x,q,S,U;\theta c,\theta\gamma)\Big| \le \theta\,\gamma\, C_9\, T\, E\big[e^{\theta\gamma C_7(1 + M_W + M_Z)}\big]. \tag{82}$$

From Theorem 6, we have

$$\lim_{\theta\downarrow 0} \frac{1}{\theta^3}\,\Big|H^{\psi}(0,x,q,S,U;\theta c,\theta\gamma) - \hat H(0,x,q,S,U;\theta c,\theta\gamma)\Big| = 0.$$

Combining the above with (82) implies

$$\lim_{\theta\downarrow 0} \frac{1}{\theta^3}\,\Big|H^{\psi}(0,x,q,S,U;\theta c,\theta\gamma) - H^{\hat\nu}(0,x,q,S,U;\theta c,\theta\gamma)\Big| = 0, \tag{83}$$

as desired. $\square$

The proof of Proposition 8 proceeds in three parts: (i) we prove the local uniform approximation given by (36); (ii) we prove that the control in (37) is admissible; and (iii) finally, we prove that the control (37) is approximately optimal to second order in the sense of (38).

Part (i) (local uniform approximation): The feedback form of the optimal control when the agent holds $N$ units of the non-tradable risk factor is given in closed form by equation (12). Denote this function by $v^*(t,q,N;c,\gamma)$. The feedback form of the approximate optimal control when the agent has exposure of the form $\psi(U)$ is in equation (33). Due to Lemma 5, the dependence of $\nu_0$ and $\nu_1$ on $U$ in equation (33) appears only through $\partial_U g(t,U)$.
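Denote the first three terms on the right-hand side of (33), with $\partial_U g(t,U)$ replaced by $\Delta$, by $\hat v(t,q,\Delta;c,\gamma)$. Write $v^*$ and $\hat v$ as

$$v^*(t,q,\Delta;\theta c,\theta\gamma) = \frac{1}{2k}\big(v_0(t;\theta) + v_1(t;\theta)\,q + v_2(t;\theta)\,\Delta\big), \tag{84}$$

$$\hat v(t,q,\Delta;\theta c,\theta\gamma) = \frac{1}{2k}\big(\hat v_0(t;\theta) + \hat v_1(t;\theta)\,q + \hat v_2(t;\theta)\,\Delta\big). \tag{85}$$

We next show that

$$\lim_{\theta\downarrow 0} \frac{1}{\theta}\big(v_i(t;\theta) - \hat v_i(t;\theta)\big) = 0,$$

uniformly in $t$ for each $i = 0, 1, 2$. Thus,

$$\lim_{\theta\downarrow 0} \frac{1}{\theta}\big(v^*(t,q,\Delta;\theta c,\theta\gamma) - \hat v(t,q,\Delta;\theta c,\theta\gamma)\big) = 0,$$

locally uniformly in $(t,q,\Delta)$.

To prove this, we study the $\theta$ dependence of the ODEs satisfied by $v_i$ and $\hat v_i$. The convergence results follow from continuity and differentiability with respect to a parameter of solutions of said ODEs (see for example Chicone (2006), Theorem 1.3).

Inspection of (12), (33), and (34) shows that $v_1(t;\theta) = 2h_2(t;\theta) + b$ and $\hat v_1(t;\theta) = 2f_2(t) + b + 2\theta\gamma\Lambda_2(t)$. The functions $h_2$ and $f_2$ both satisfy ODEs of the form $x' = F(x;\theta)$ and $x(T) = -\alpha$, where

$$F(x;\theta) = \tfrac12\sigma^2\theta\gamma - \tfrac{1}{4k}(2x+b)^2,$$

and the ODE for $f_2$ corresponds to $\theta = 0$. When $\theta \downarrow 0$, $F(x;\theta) \to F(x;0)$ uniformly in $x$, therefore $h_2(t;\theta) \to f_2(t)$ uniformly in $t\in[0,T]$. This also implies $v_1(t;\theta) \to 2f_2(t) + b$ uniformly in $t\in[0,T]$. By L'Hôpital's rule we have

$$\lim_{\theta\downarrow 0} \frac{1}{\theta}\big(v_1(t;\theta) - \hat v_1(t;\theta)\big) = \lim_{\theta\downarrow 0} \big(\partial_\theta v_1(t;\theta) - \partial_\theta \hat v_1(t;\theta)\big) = 2\lim_{\theta\downarrow 0}\big(\partial_\theta h_2(t;\theta) - \gamma\Lambda_2(t)\big). \tag{86}$$

Next, from (11c), $h_2(t;\theta)$ has continuous mixed second order derivatives (with respect to $t$ and $\theta$) for $\theta > 0$. Thus we write

$$\partial_t(\partial_\theta h_2) = \partial_\theta(\partial_t h_2) = \tfrac12\sigma^2\gamma - \tfrac{1}{k}(2h_2+b)\,\partial_\theta h_2 \quad \text{and} \quad \partial_\theta h_2(T;\theta) = 0.$$

We also have

$$\partial_t \Lambda_2 = \tfrac12\sigma^2 - \tfrac{1}{k}(2f_2+b)\,\Lambda_2 \quad \text{and} \quad \Lambda_2(T) = 0,$$

and because $h_2 \to f_2$ uniformly in $t$ as $\theta\downarrow 0$, we have $\partial_\theta h_2 \to \gamma\Lambda_2$ uniformly in $t$. Thus,

$$\lim_{\theta\downarrow 0} \frac{1}{\theta}\big(v_1(t;\theta) - \hat v_1(t;\theta)\big) = 0,$$

uniformly in $t$.

Inspection of (12), (46), and (84) shows that $v_0$ and $v_2$ satisfy the ODEs

$$\partial_t v_0 = -\mu - \tfrac{1}{2k}(2h_2+b)\,v_0, \qquad v_0(T) = 0,$$

$$\partial_t v_2 = \theta\gamma\rho\sigma\eta - \tfrac{1}{2k}(2h_2+b)\,v_2, \qquad v_2(T) = \theta c.$$

We wish to make the dependence on $\theta$ explicit. To this end, inspection of (29b), (30b), and (33) shows that we may write

$$\hat v_0 + \hat v_2\,\Delta = f_1 + \theta c\,\Delta + \theta c\,\lambda_1 + \theta\gamma\,\Lambda_1 = f_1 + \theta c\,\Delta + \theta c\,\tilde\lambda_1\,\Delta + \theta\gamma\,\Lambda_{10} + \theta\gamma\,\tilde\Lambda_1\,\Delta,$$

where the introduced functions satisfy the ODEs

$$\partial_t f_1 = -\mu - \tfrac{1}{2k}(2f_2+b)\,f_1, \qquad f_1(T) = 0,$$

$$\partial_t \tilde\lambda_1 = -\tfrac{1}{2k}(2f_2+b)(1 + \tilde\lambda_1), \qquad \tilde\lambda_1(T) = 0,$$

$$\partial_t \Lambda_{10} = -\tfrac{1}{k}\Lambda_2\, f_1 - \tfrac{1}{2k}(2f_2+b)\,\Lambda_{10}, \qquad \Lambda_{10}(T) = 0,$$

$$\partial_t \tilde\Lambda_1 = \rho\sigma\eta - \tfrac{1}{2k}(2f_2+b)\,\tilde\Lambda_1, \qquad \tilde\Lambda_1(T) = 0.$$

Consequently, $\hat v_0$ and $\hat v_2$ satisfy

$$\partial_t \hat v_0 = -\mu - \tfrac{1}{2k}(2f_2+b)\,f_1 - \theta\gamma\Big(\tfrac{1}{k}\Lambda_2\, f_1 + \tfrac{1}{2k}(2f_2+b)\,\Lambda_{10}\Big), \qquad \hat v_0(T) = 0,$$

$$\partial_t \hat v_2 = \theta\Big(\gamma\rho\sigma\eta - \tfrac{1}{2k}(2f_2+b)\big(c + c\,\tilde\lambda_1 + \gamma\,\tilde\Lambda_1\big)\Big), \qquad \hat v_2(T) = \theta c.$$

Analogous to how we prove $\hat v_1 = v_1 + o(\theta)$ above, we may prove the same for $\hat v_0$ and $\hat v_2$. First, repeat the arguments to show that the right-hand sides of the associated ODEs converge to appropriate limits, hence $\lim_{\theta\downarrow 0} \big(v_i(t;\theta) - \hat v_i(t;\theta)\big) = 0$; next, repeat the arguments to show that $\lim_{\theta\downarrow 0} \big(\partial_\theta v_i(t;\theta) - \partial_\theta \hat v_i(t;\theta)\big) = 0$. All limits can be taken uniformly in $t\in[0,T]$.

Part (ii) (admissibility): In feedback form, the candidate trading strategy is

$$v^*(t,q,\partial_U g(t,U)) = \frac{1}{2k}\big(v_0(t;\theta) + v_1(t;\theta)\,q + v_2(t;\theta)\,\partial_U g(t,U)\big). \tag{87}$$

This is of the same form as the feedback strategy in (76) (the time-dependent coefficients are different, but for fixed $\theta$ they are bounded).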
Thus, the argument for admissibility is the same.

Part (iii) (optimality approximation): This part of the proof proceeds similarly to Theorem 7. Given the candidate strategy $\nu'_t = v^*\big(t, Q^{\nu'}_t, \partial_U g(t, U^{\nu'}_t); \theta c, \theta\gamma\big)$, define the stochastic process $(G_t)_{t\in[0,T]}$ by

$$G_t = \hat H\big(t, X^{\nu'}_t, Q^{\nu'}_t, S^{\nu'}_t, U^{\nu'}_t; \theta c, \theta\gamma\big), \qquad \text{where } \hat H(t,x,q,S,U;\theta c,\theta\gamma) = -e^{-\theta\gamma\,(x + qS + \hat h(t,q,U;\theta c,\theta\gamma))},$$

and $\hat h$ is the approximation of $h^{\psi}$ in Theorem 6. Apply Itô's Lemma to $G$ and write

$$\begin{aligned}
G_T - G_0 ={}& -\theta\gamma \int_0^T \hat H\big(t, X^{\nu'}_t, Q^{\nu'}_t, S^{\nu'}_t, U^{\nu'}_t; \theta c, \theta\gamma\big) \\
& \qquad \times \Big(\sum_{i=3}^{5} \theta^i M_i\big(t, Q^{\nu'}_t, U^{\nu'}_t\big) + V\big(t, Q^{\nu'}_t, U^{\nu'}_t; \theta\big)\Big)\, dt \\
& - \theta\gamma\sigma \int_0^T \hat H\big(t, X^{\nu'}_t, Q^{\nu'}_t, S^{\nu'}_t, U^{\nu'}_t; \theta c, \theta\gamma\big)\, Q^{\nu'}_t\, dW_t \\
& - \theta\gamma\eta \int_0^T \hat H\big(t, X^{\nu'}_t, Q^{\nu'}_t, S^{\nu'}_t, U^{\nu'}_t; \theta c, \theta\gamma\big)\, \partial_U \hat h\big(t, Q^{\nu'}_t, U^{\nu'}_t; \theta c, \theta\gamma\big)\, dZ_t,
\end{aligned} \tag{88}$$

where $M_{3,4,5}$ are given by (78). The quantity $V$ is shown by explicit computation to be

$$V(t,q,U;\theta) = r(t,q,U;\theta)\Big(\partial_q \hat h(t,q,U;\theta) + \theta c\,\partial_U \hat h(t,q,U;\theta) + b\,q - 2k\,\hat v(t,q,U;\theta)\Big) - k\,r^2(t,q,U;\theta),$$

where $r = v^* - \hat v$. More details on the computation of $V$ are given in Appendix C. By construction of $\hat v$ we have

$$\partial_q \hat h + \theta c\,\partial_U \hat h + b\,q - 2k\,\hat v = \theta^2\Big(c^2(\partial_q h_3 + \partial_U h_1) + c\gamma(\partial_q h_4 + \partial_U h_2) + \gamma^2\,\partial_q h_5 + \theta c\,\big(c^2\,\partial_U h_3 + c\gamma\,\partial_U h_4 + \gamma^2\,\partial_U h_5\big)\Big).$$

In particular, $V(t,q,U;\theta)$ is a polynomial with respect to $q$ of degree 3 with coefficients that are bounded functions of $t$ and $U$.
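The form of $V$ rests on an elementary identity: writing $A$ for $\partial_q \hat h + \theta c\,\partial_U \hat h + b\,q$, the control-dependent part of the generator is $\nu A - k\nu^2$, and replacing $\nu = \hat v$ by $\nu = \hat v + r$ changes it by $r(A - 2k\hat v) - k r^2$. The following minimal sketch checks this numerically on random inputs. The function names and the sampled values are hypothetical, and the quadratic form $\nu A - k\nu^2$ is an assumption read off from the structure of the generator rather than a quotation of the paper's equations.

```python
import random

def running_term(nu, A, k):
    """Control-dependent part of the generator: nu*A - k*nu**2."""
    return nu * A - k * nu**2

def v_term(r, A, v_hat, k):
    """Remainder created by perturbing the control v_hat by r."""
    return r * (A - 2.0 * k * v_hat) - k * r**2

random.seed(1)
max_gap = 0.0
for _ in range(1000):
    # Arbitrary test values; they carry no financial meaning.
    A, v_hat, r = (random.uniform(-5.0, 5.0) for _ in range(3))
    k = random.uniform(0.1, 2.0)
    change = running_term(v_hat + r, A, k) - running_term(v_hat, A, k)
    max_gap = max(max_gap, abs(change - v_term(r, A, v_hat, k)))
```

Because the identity is exact, `max_gap` stays at the level of floating-point round-off. The check says nothing about the size of $r$ or of $A - 2k\hat v$ in $\theta$, which is what the surrounding argument controls.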
Furthermore, due to arguments in the first part of this proof we have

$$\lim_{\theta\downarrow 0} \frac{1}{\theta^2}\, V(t,q,U;\theta) = 0,$$

where the convergence is locally uniform with respect to $q$ and uniform with respect to $t$ and $U$.

All of the estimates from the proof of Theorem 7 hold identically (except for possibly different constants $C_1, \ldots, C_9$). We write

$$\Big|\sum_{i=3}^{5} \theta^i M_i\big(t, Q^{\nu'}_t, U^{\nu'}_t\big) + V\big(t, Q^{\nu'}_t, U^{\nu'}_t; \theta\big)\Big| \le \theta^3\, C_9 + \overline{V}(\theta),$$

with $C_9$ as in (80) from Theorem 7 and where $\overline{V}$ satisfies

$$\lim_{\theta\downarrow 0} \frac{1}{\theta^2}\, \overline{V}(\theta) = 0$$

(recall that $Q^{\nu'}_t$ is bounded by a constant). We then have

$$\begin{aligned}
|E[G_T] - G_0| &= \bigg|\theta\, E\Big[\gamma \int_0^T \hat H\big(t, X^{\nu'}_t, Q^{\nu'}_t, S^{\nu'}_t, U^{\nu'}_t; \theta c, \theta\gamma\big) \\
&\qquad\qquad \times \Big(\sum_{i=3}^{5} \theta^i M_i\big(t, Q^{\nu'}_t, U^{\nu'}_t\big) + V\big(t, Q^{\nu'}_t, U^{\nu'}_t; \theta\big)\Big)\, dt\Big]\bigg| \\
&\le \gamma\,\theta^3 \Big(\theta\, C_9\, T\, E\big[e^{\theta\gamma C_7(1 + M_W + M_Z)}\big] + \frac{\overline{V}(\theta)}{\theta^2}\, T\, E\big[e^{\theta\gamma C_7(1 + M_W + M_Z)}\big]\Big).
\end{aligned}$$

Therefore,

$$\lim_{\theta\downarrow 0} \frac{1}{\theta^3}\,\Big|H^{\nu'}(0,x,q,S,U) - \hat H(0,x,q,S,U)\Big| = 0,$$

which, when combined with Theorem 6, proves the required result. $\square$

Appendix C - $P_{3,4,5,6}$, $M_{3,4,5}$, and $V$

$P_{3,4,5,6}$

The following expressions give the functions $P_{3,4,5,6}(t,q,U)$, which appear in the proof of Theorem 6. These expressions are found by explicitly computing the supremum in (66) and then grouping powers of $\theta$.

Recall that each $h_{0,1,2,3,4,5}$ is quadratic with respect to $q$. Then, by inspection we see that $P_3$ and $P_4$ are third degree polynomials with respect to $q$, $P_5$ and $P_6$ are fourth degree polynomials with respect to $q$, and the coefficients of these polynomials are uniformly bounded functions of $t$ and $U$. In addition, the coefficients are continuously differentiable with respect to $U$ with bounded derivatives by Lemma 5.
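The step "grouping powers of $\theta$" can be illustrated on the quadratic term that a supremum of the form $\sup_\nu\{\nu A - k\nu^2\} = A^2/(4k)$ produces. The sketch below assumes, for illustration only, that $A = a_0 + \theta a_1 + \theta^2 a_2 + \theta^3 a_3$; the names $a_0,\ldots,a_3$, the helper `theta_coefficients`, and the numerical values are hypothetical and are not taken from the paper. Squaring a polynomial in $\theta$ is a discrete convolution of its coefficient vector with itself, so the coefficients of $\theta^3,\ldots,\theta^6$ can be read off directly.

```python
import numpy as np

def theta_coefficients(a, k):
    """Coefficients (lowest power first) of (sum_j a[j]*theta**j)**2 / (4k)."""
    a = np.asarray(a, dtype=float)
    # Polynomial multiplication is a convolution of coefficient vectors.
    return np.convolve(a, a) / (4.0 * k)

# Hypothetical values for a_0, a_1, a_2, a_3 and the cost parameter k.
a0, a1, a2, a3 = 0.7, -1.2, 0.4, 2.0
k = 0.5
coef = theta_coefficients([a0, a1, a2, a3], k)

# Closed forms for the contributions at orders theta^3 through theta^6.
order3 = (a0 * a3 + a1 * a2) / (2.0 * k)
order4 = a2**2 / (4.0 * k) + a1 * a3 / (2.0 * k)
order5 = a2 * a3 / (2.0 * k)
order6 = a3**2 / (4.0 * k)
```

Only the quadratic term is covered here; the remaining pieces of each $P_i$ (the terms carrying $\gamma\eta^2$ and $\gamma\rho\sigma\eta$) come from other parts of the equation and are grouped by order in the same way.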
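$$\begin{aligned}
P_3 ={}& \frac{\big(\gamma\,\partial_q h_2 + c\,(\partial_q h_1 + \partial_U h_0)\big)\big(c^2\,\partial_q h_3 + c\gamma\,\partial_q h_4 + \gamma^2\,\partial_q h_5\big)}{2k} \\
& - \frac{\gamma\eta^2\Big(\big(c\,\partial_U h_1 + \gamma\,\partial_U h_2\big)^2 + 2\,\partial_U h_0\,\big(c^2\,\partial_U h_3 + c\gamma\,\partial_U h_4 + \gamma^2\,\partial_U h_5\big)\Big)}{2} \\
& + \frac{c\,\big(c^2\,\partial_U h_3 + c\gamma\,\partial_U h_4 + \gamma^2\,\partial_U h_5\big)\big(\partial_q h_0 + b\,q\big)}{2k} \\
& + \frac{c\,\big(\gamma\,\partial_q h_2 + c\,(\partial_q h_1 + \partial_U h_0)\big)\big(c\,\partial_U h_1 + \gamma\,\partial_U h_2\big)}{2k} \\
& - \gamma\rho\sigma\eta\, q\,\big(c^2\,\partial_U h_3 + c\gamma\,\partial_U h_4 + \gamma^2\,\partial_U h_5\big),
\end{aligned} \tag{89a}$$

$$\begin{aligned}
P_4 ={}& \frac{\big(c\,(c\,\partial_U h_1 + \gamma\,\partial_U h_2) + c^2\,\partial_q h_3 + c\gamma\,\partial_q h_4 + \gamma^2\,\partial_q h_5\big)^2}{4k} \\
& + \frac{c\,\big(c^2\,\partial_U h_3 + c\gamma\,\partial_U h_4 + \gamma^2\,\partial_U h_5\big)\big(c\,\partial_U h_0 + c\,\partial_q h_1 + \gamma\,\partial_q h_2\big)}{2k} \\
& - \gamma\eta^2\,\big(c\,\partial_U h_1 + \gamma\,\partial_U h_2\big)\big(c^2\,\partial_U h_3 + c\gamma\,\partial_U h_4 + \gamma^2\,\partial_U h_5\big),
\end{aligned} \tag{89b}$$

$$\begin{aligned}
P_5 ={}& \frac{c^2\,\big(c^2\,\partial_U h_3 + c\gamma\,\partial_U h_4 + \gamma^2\,\partial_U h_5\big)\big(c\,\partial_U h_1 + \gamma\,\partial_U h_2\big)}{2k} \\
& + \frac{c\,\big(c^2\,\partial_U h_3 + c\gamma\,\partial_U h_4 + \gamma^2\,\partial_U h_5\big)\big(c^2\,\partial_q h_3 + c\gamma\,\partial_q h_4 + \gamma^2\,\partial_q h_5\big)}{2k} \\
& - \frac{\gamma\eta^2\,\big(c^2\,\partial_U h_3 + c\gamma\,\partial_U h_4 + \gamma^2\,\partial_U h_5\big)^2}{2},
\end{aligned} \tag{89c}$$

$$P_6 = \frac{c^2\,\big(c^2\,\partial_U h_3 + c\gamma\,\partial_U h_4 + \gamma^2\,\partial_U h_5\big)^2}{4k}. \tag{89d}$$

$M_{3,4,5}$

The following expressions give the functions $M_{3,4,5}(t,q,U)$, which appear in the proofs of Theorem 7 and Proposition 8. These expressions are found by substituting the feedback control $\hat\nu$ from (33) into $(\partial_t + \mathcal L^{\hat\nu})\,\hat H(t,x,q,S,U;\theta c,\theta\gamma)$ and grouping powers of $\theta$.

Recall that each $h_{0,1,2,3,4,5}$ is quadratic with respect to $q$. Then, we see by inspection that $M_3$ and $M_4$ are third degree polynomials with respect to $q$, $M_5$ is a fourth degree polynomial with respect to $q$, and the coefficients of these polynomials are uniformly bounded functions of $t$ and $U$. In addition, the coefficients are continuously differentiable with respect to $U$ with bounded derivatives by Lemma 5.

$$\begin{aligned}
M_3 ={}& \frac{\big(\gamma\,\partial_q h_2 + c\,(\partial_q h_1 + \partial_U h_0)\big)\big(c^2\,\partial_q h_3 + c\gamma\,\partial_q h_4 + \gamma^2\,\partial_q h_5\big)}{2k} \\
& - \frac{\gamma\eta^2\Big(\big(c\,\partial_U h_1 + \gamma\,\partial_U h_2\big)^2 + 2\,\partial_U h_0\,\big(c^2\,\partial_U h_3 + c\gamma\,\partial_U h_4 + \gamma^2\,\partial_U h_5\big)\Big)}{2} \\
& + \frac{c\,\big(c^2\,\partial_U h_3 + c\gamma\,\partial_U h_4 + \gamma^2\,\partial_U h_5\big)\big(\partial_q h_0 + b\,q\big)}{2k} \\
& + \frac{c\,\big(\gamma\,\partial_q h_2 + c\,(\partial_q h_1 + \partial_U h_0)\big)\big(c\,\partial_U h_1 + \gamma\,\partial_U h_2\big)}{2k} \\
& - \gamma\rho\sigma\eta\, q\,\big(c^2\,\partial_U h_3 + c\gamma\,\partial_U h_4 + \gamma^2\,\partial_U h_5\big),
\end{aligned} \tag{90a}$$

$$M_4 = \bigg(\frac{c\,\big(\gamma\,\partial_q h_2 + c\,(\partial_U h_0 + \partial_q h_1)\big) - 2k\,\gamma\eta^2\,\big(c\,\partial_U h_1 + \gamma\,\partial_U h_2\big)}{2k}\bigg)\Big(c^2\,\partial_U h_3 + c\gamma\,\partial_U h_4 + \gamma^2\,\partial_U h_5\Big), \tag{90b}$$

$$M_5 = -\frac{\gamma\eta^2}{2}\,\big(c^2\,\partial_U h_3 + c\gamma\,\partial_U h_4 + \gamma^2\,\partial_U h_5\big)^2. \tag{90c}$$

$V$

Here we show in more detail the steps required to compute $V$, which appears in the proof of Proposition 8. We begin with

$$\begin{aligned}
(\partial_t + \mathcal L^{\nu})\,\hat H(t,x,q,S,U;\theta c,\theta\gamma) = -\theta\gamma\,\hat H\,\Big(& \partial_t \hat h + \mu\, q - \tfrac12\theta\gamma\sigma^2 q^2 + (\beta - \theta\gamma\rho\sigma\eta\, q)\,\partial_U \hat h + \tfrac12\eta^2\,\partial_{UU} \hat h \\
& - \tfrac12\theta\gamma\eta^2\,(\partial_U \hat h)^2 + \nu\,\partial_q \hat h + \theta c\,\nu\,\partial_U \hat h + b\,q\,\nu - k\,\nu^2\Big),
\end{aligned} \tag{91}$$

and recall that in feedback form the control $\nu'$ is given by $v^*(t,q,\partial_U g(t,U);\theta c,\theta\gamma)$. We write this feedback control as

$$v^*(t,q,\partial_U g(t,U);\theta c,\theta\gamma) = \hat v(t,q,\partial_U g(t,U);\theta c,\theta\gamma) + r(t,q,U;\theta), \tag{92}$$

where

$$r(t,q,U;\theta) = v^*(t,q,\partial_U g(t,U);\theta c,\theta\gamma) - \hat v(t,q,\partial_U g(t,U);\theta c,\theta\gamma).$$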

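We now substitute (92) in (91), then expand and group the terms which contain $r(t,q,U;\theta)$ separately from those which do not. The resulting expression is

$$\begin{aligned}
& (\partial_t + \mathcal L^{\nu'})\,\hat H(t,x,q,S,U;\theta c,\theta\gamma) \\
&\quad = -\theta\gamma\,\hat H\,\Big(\partial_t \hat h + \mu\, q - \tfrac12\theta\gamma\sigma^2 q^2 + (\beta - \theta\gamma\rho\sigma\eta\, q)\,\partial_U \hat h + \tfrac12\eta^2\,\partial_{UU} \hat h - \tfrac12\theta\gamma\eta^2\,(\partial_U \hat h)^2 \\
&\qquad\qquad\qquad + v^*\,\partial_q \hat h + \theta c\,v^*\,\partial_U \hat h + b\,q\,v^* - k\,(v^*)^2\Big) \\
&\quad = -\theta\gamma\,\hat H\,\Big(\partial_t \hat h + \mu\, q - \tfrac12\theta\gamma\sigma^2 q^2 + (\beta - \theta\gamma\rho\sigma\eta\, q)\,\partial_U \hat h + \tfrac12\eta^2\,\partial_{UU} \hat h - \tfrac12\theta\gamma\eta^2\,(\partial_U \hat h)^2 \\
&\qquad\qquad\qquad + \hat v\,\partial_q \hat h + \theta c\,\hat v\,\partial_U \hat h + b\,q\,\hat v - k\,\hat v^2 + r\,\big(\partial_q \hat h + \theta c\,\partial_U \hat h + b\,q - 2k\,\hat v\big) - k\,r^2\Big) \\
&\quad = (\partial_t + \mathcal L^{\hat\nu})\,\hat H(t,x,q,S,U;\theta c,\theta\gamma) - \theta\gamma\,\hat H\,\Big(r\,\big(\partial_q \hat h + \theta c\,\partial_U \hat h + b\,q - 2k\,\hat v\big) - k\,r^2\Big) \\
&\quad = -\theta\gamma\,\hat H\,\Big(\sum_{i=3}^{5} \theta^i M_i(t,q,U) + r\,\big(\partial_q \hat h + \theta c\,\partial_U \hat h + b\,q - 2k\,\hat v\big) - k\,r^2\Big).
\end{aligned}$$

The summation in the last line comes from the definitions of the $M_i$'s in the proof of Theorem 7, also outlined earlier in this appendix. The remaining terms in large parentheses are denoted by $V(t,q,U;\theta)$.

References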

Almgren, R. and N. Chriss (2001). Optimal execution of portfolio transactions. Journal of Risk 3, 5–40.

Bacry, E., A. Iuga, M. Lasnier, and C.-A. Lehalle (2015). Market impacts and the life cycle of investors orders. Market Microstructure and Liquidity 01 (02), 1550009.

Bechler, K. and M. Ludkovski (2015). Optimal execution with dynamic order flow imbalance. SIAM Journal on Financial Mathematics 6 (1), 1123–1151.

Cartea, Á. and S. Jaimungal (2017). Irreversible investments and ambiguity aversion. International Journal of Theoretical and Applied Finance 20 (07), 1750044.

Cartea, Á., S. Jaimungal, and J. Penalva (2015). Algorithmic and high-frequency trading. Cambridge University Press.

Chicone, C. (2006). Ordinary differential equations with applications, Volume 34. Springer Science & Business Media.

Cont, R., A. Kukanov, and S. Stoikov (2014). The price impact of order book events. Journal of Financial Econometrics 12 (1), 47–88.

Donier, J., J. Bonart, I. Mastromatteo, and J.-P. Bouchaud (2015). A fully consistent, minimal model for non-linear market impact. Quantitative Finance 15 (7), 1109–1121.

Duncan, T. E. (2013). Linear-exponential-quadratic gaussian control. IEEE Transactions on Automatic Control 58 (11), 2910–2911.

Grasselli, M. (2011). Getting real with real options: a utility–based approach for finite–time investment in incomplete markets. Journal of Business Finance & Accounting 38 (5-6), 740–764.

Grasselli, M. and V. Henderson (2009). Risk aversion and block exercise of executive stock options. Journal of Economic Dynamics and Control 33 (1), 109–127.

Guéant, O. (2015). Optimal execution and block trade pricing: A general framework. Applied Mathematical Finance 22 (4), 336–365.

Guéant, O. (2016). The financial mathematics of market liquidity: From optimal execution to market making, Volume 33. CRC Press.

Henderson, V. (2002). Valuation of claims on nontraded assets using utility maximization. Mathematical Finance 12 (4), 351–373.

Henderson, V. (2007). Valuing the option to invest in an incomplete market. Mathematics and Financial Economics 1 (2), 103–128.

Jacobson, D. (1973). Optimal stochastic linear systems with exponential performance criteria and their relation to deterministic differential games. IEEE Transactions on Automatic Control 18 (2), 124–131.

Karatzas, I. and S. Shreve (2012). Brownian motion and stochastic calculus, Volume 113. Springer Science & Business Media.

Leung, T. and M. Lorig (2016). Optimal static quadratic hedging. Quantitative Finance 16 (9), 1341–1355.

Leung, T. and R. Sircar (2009a). Accounting for risk aversion, vesting, job termination risk and multiple exercises in valuation of employee stock options. Mathematical Finance 19 (1), 99–128.

Leung, T. and R. Sircar (2009b). Exponential hedging with optimal stopping and application to employee stock option valuation. SIAM Journal on Control and Optimization 48 (3), 1422–1451.

Potters, M. and J.-P. Bouchaud (2003). More statistical properties of order books and price impact.