Two-sided Singular Control of an Inventory with Unknown Demand Trend
SALVATORE FEDERICO, GIORGIO FERRARI, AND NEOFYTOS RODOSTHENOUS
Abstract.
We study the problem of optimally managing an inventory with unknown demand trend. Our formulation leads to a stochastic control problem under partial observation, in which a Brownian motion with non-observable drift can be singularly controlled in both an upward and downward direction. We first derive the equivalent separated problem under full information, with state-space components given by the Brownian motion and the filtering estimate of its unknown drift, and we then completely solve the latter. Our approach uses the transition amongst three different but equivalent problem formulations, links between two-dimensional bounded-variation stochastic control problems and games of optimal stopping, and probabilistic methods in combination with refined viscosity theory arguments. We show substantial regularity of (a transformed version of) the value function, we construct an optimal control rule, and we show that the free boundaries delineating (transformed) action and inaction regions are bounded, globally Lipschitz continuous functions. To our knowledge, this is the first time that such a problem has been solved in the literature.
Keywords: bounded-variation stochastic control, partial observation, inventory management, Dynkin games, free boundaries.
MSC2010 subject classification: 93E20, 93E11, 91A55, 49J40, 90B05.

1. Introduction
In real-world situations, decision makers are usually faced with the uncertainty of noise or volatility in the dynamics of an underlying stochastic process. However, on many occasions they are also faced with uncertainty in their estimation of the drift of this monitored stochastic process. In other words, decision makers might not know the exact growth characteristics of the future value of the underlying process. They may find themselves observing the evolution of its value, but cannot perfectly distinguish whether the cause of its variations is due to the drift or the stochastic driver of the process. Through their observations, they can update their beliefs about the drift; however, due to the aforementioned inability to distinguish the cause of variations, the information acquired by observations is inevitably noisy. Such an uncertainty about the drift therefore adds a structural risk component to decision making, in addition to the noise from the stochastic driver of the underlying process. Such scenarios have already received attention in the mathematical economic/financial literature, such as [13] for investment timing, [8] for asset trading, [18] for optimal liquidation, [16] for contract theory, and [12] and [14] for dividend payments.

In this paper, we consider the optimal management of inventory when the demand is stochastic and partially observed. There exists an enormous literature on optimal inventory management (see, e.g., [40] for an overview and the significance of inventory control in operations and profitability of companies). The optimal singular/impulsive control literature of stochastic inventory systems has so far assumed that the dynamics of the inventory is fully known to decision makers, see e.g. [1], [6], [7], [22], [23], [24], [37], [38], [39], amongst many others.
Some of the most celebrated results are the optimality of (constant) threshold strategies determining (a) base-stock policies – maintaining inventory above a fixed shortage level – and (b) restrictions on the size of inventory, in order to manage storage-related costs. In this paper, we generalise the existing literature on the singular control of inventories by assuming that the demand rate, or the mean of the random demand for the product, is unknown to decision makers. This can be relevant to companies operating in newly established markets or producing a novel good, for which there is limited knowledge about the demand trend. In particular, we will show in this paper how the aforementioned optimal strategies are no longer triggered by constant thresholds, but by functions of the decision maker's learning process of the unknown demand rate. We further note that our analysis and results in this paper can also contribute to applications way beyond the inventory management literature; for instance, to cash balance management problems (see, e.g., [19]), when the drift of the cash process is unknown to managers.

Date: February 24, 2021.
The model and general results.
We consider decision makers who can observe in real time the evolution of the (random) inventory level $S_t = x + \mu t + \eta B_t$, which represents the production minus the stochastic demand for the product at time $t$ (see [22], [37] for the first such models, and e.g. [39] for a detailed description of Brownian inventory systems). The inventory has a deterministic "net demand" rate $\mu$, which is unknown to decision makers, and a stochastic part modelling the volatility associated to demand via a standard one-dimensional Brownian motion $B$ and a constant volatility parameter $\eta > 0$. The decision makers can control the inventory via a bounded-variation process $P_t = P^+_t - P^-_t$, where $P^\pm_t$ are increasing processes that provide the minimal decomposition of $P$ and define the total amount of increase/decrease of the inventory process up to time $t$. The controlled inventory level is therefore given by $X_t = S_t + P_t = x + \mu t + \eta B_t + P^+_t - P^-_t$ for all $t \geq 0$. Note that a positive value of $X$ naturally models the current excess inventory level, while the absolute value of a negative $X$ models the backlog in production.

Both levels of excess inventory and backorder bear (not necessarily symmetric) holding and shortage costs per unit of time, modelled via a suitable convex function $C(X)$ based on the level of $X$. On the one hand, if the holding costs and expenses/investments into more storage space $C(X)$ to accommodate an increasing inventory $X$ become too costly, the decision maker can unload part of the excess inventory in various ways (e.g. start promotions, send to outlets, donate, ship to another facility, or destroy) at a cost $K_-$ proportional to the inventory volume that is unloaded. On the other hand, when shortage costs, loss of dissatisfied customers and penalties for delayed shipments $C(X)$, due to undesirable levels of backlog $X$, become too costly, the decision maker can place an inventory replenishment order to raise the inventory level. This comes at a cost $K_+$ proportional to the inventory volume that is ordered.

Overall, the aforementioned holding and shortage costs $C(X)$ need to be controlled, but the proportional costs $K_\pm$ of controlling the inventory create a trade-off. The decision maker thus needs to find the right balance between letting the storage system evolve freely according to the realised demand and the timings of controlling it, so that the overall cost is minimised.
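The controlled dynamics above can be illustrated with a simple discretised sketch. The code below is illustrative only: the parameter values are hypothetical, and the constant thresholds `b_lo < b_hi` stand in for the reflecting boundaries that, in the full-information benchmark discussed later, are known to be constant.

```python
import random

def simulate_reflected_inventory(x0=0.0, mu=0.5, eta=1.0, b_lo=-1.0, b_hi=1.0,
                                 dt=1e-3, n_steps=10_000, seed=42):
    """Euler sketch of X_t = x + mu*t + eta*B_t + P_t^+ - P_t^-,
    where the controls reflect X at (here: constant) thresholds b_lo < b_hi."""
    rng = random.Random(seed)
    x, p_up, p_down = x0, 0.0, 0.0
    for _ in range(n_steps):
        # free evolution of the inventory over one time step
        x += mu * dt + eta * rng.gauss(0.0, dt ** 0.5)
        if x < b_lo:            # replenishment order dP^+ (cost K_+ per unit)
            p_up += b_lo - x
            x = b_lo
        elif x > b_hi:          # unload excess inventory dP^- (cost K_- per unit)
            p_down += x - b_hi
            x = b_hi
    return x, p_up, p_down
```

By construction, the simulated inventory never leaves the band $[b_{lo}, b_{hi}]$, and the cumulative amounts `p_up`, `p_down` are the discrete analogues of $P^+$ and $P^-$.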
The question we therefore study in the sequel is: "What is the optimal inventory management strategy that minimises the total expected (discounted) future holding, shortage and control costs, when the demand rate is unknown?". As in most of the aforementioned literature, we allow the rate of reduction $dP^-$ and increase $dP^+$ to be unbounded, and allow them to reduce or increase, respectively, the level of $X$ instantaneously. In mathematical terms, the aforementioned question is formulated as a bounded-variation stochastic control problem of a linearly controlled one-dimensional diffusion, with the novelty of a random (non-observable) drift $\mu$. To the best of our knowledge, this is the first time that the complete solution to a bounded-variation problem under partial observation is derived. Given that the drift of $X$ is unknown to the decision maker, the analysis of this question becomes considerably harder than in standard versions of the aforementioned problem with full information (see, e.g., [22]). In order to model this additional uncertainty, we assume that the random variable $\mu \in \{\mu_0, \mu_1\}$, for some $\mu_0, \mu_1 \in \mathbb{R}$ such that $\mu_0 < \mu_1$. The decision makers can only observe the overall evolution of $S$, whose natural filtration, modelling the information available to them up to time $t$, is denoted by $\mathcal{F}^S_t$, while they just have a prior belief $\pi := \mathbb{P}(\mu = \mu_1) \in (0,1)$ on the value of $\mu$ at time $t = 0$. Their belief on the drift is, however, continuously updated as new information is revealed, and their belief process takes the form $\Pi_t := \mathbb{P}(\mu = \mu_1 \,|\, \mathcal{F}^S_t)$, according to standard filtering techniques (for a survey, see e.g. [32]). Naturally, the decisions whether to act/control the system or not are not based solely on the position of the Brownian (inventory) system $X$, as in standard problems where the drift is known (see, e.g., [22]). These decisions are now adapted dynamically according to the current belief on the drift $\mu$ of the system; thus they depend strongly on the learning process $\Pi$ of the decision maker. However, under this filtering estimate of the drift, the dynamics of the problem becomes essentially two-dimensional and diffusive, which results in an associated variational formulation with partial differential equations (PDEs). Therefore, obtaining explicit solutions is not possible in general. Nevertheless, using our methodology that combines various different techniques (as we outline later), we manage to solve the problem and provide the complete characterisation of the optimal control strategy.

Given the convexity of $C$, when the (inventory) level $X$ is relatively high (resp., low), resulting in a large holding (resp., shortage) marginal cost $C'(X)$, the decision maker has an incentive to exert control $P^-$ (resp., $P^+$) to decrease (resp., increase) the level of $X$. The decision maker must find an optimal control strategy $P^{\star+}$ and $P^{\star-}$ that minimises the overall expected future holding and shortage costs, counterbalanced with the proportional costs $K_\pm$ per unit of control exerted. Indeed, we successfully prove in this paper that such an optimal strategy $P^{\star+}$ and $P^{\star-}$ exists and is explicitly characterised by two boundaries, each one associated with one of the control processes $P^{\star\pm}$.
These boundaries then split the space into three distinct but connected regions: (a) an action region that is divided into two parts, namely the areas above or below these boundaries, prescribing that when $X$ is either relatively large or small, the decision maker should intervene by decreasing or increasing $X$, respectively, and bring $X$ inside the area which is between the two boundaries; and (b) an intermediate waiting (inaction) region for relatively intermediate values of $X$, which is precisely the aforementioned area between the two boundaries.

To the best of our knowledge, the study and complete characterisation of these boundaries, which define the solution of a bounded-variation stochastic control problem under partial information on the dynamics of the underlying diffusion, has also never been addressed in the literature. We prove that the aforementioned boundaries triggered by $X$ are monotone functions of the belief process $\Pi$ and can be completely characterised in terms of monotone Lipschitz continuous curves solving a system of nonlinear integral equations. The dependence of the optimal boundaries on the belief variable $\Pi$ is in contrast to the full information cases, where the decision makers must intervene whenever $X$ breaches some constant thresholds, irrespective of its past evolution (see, e.g., [22]). In fact, we also prove that our boundaries are bounded by these (constant) thresholds of the full information cases. This further shows that our model extends and complements the existing literature on bounded-variation stochastic control problems in the case when there is uncertainty about the drift of the underlying process.

Our contributions, approach and an overview of the mathematical analysis.
Our contribution in this paper is twofold. From the point of view of its application, even though the literature on the optimal management of inventory is extremely rich (see, e.g., the papers cited before), there is no model where the demand is assumed to be partially observed and lump-sum as well as singularly continuous actions on the inventory are allowed. To the best of our knowledge, this makes our paper a pioneer in this class of problems, which is our first main contribution. From the mathematical theory perspective, the development of methods to tackle optimal control problems with absolutely continuous (regular) controls and partial observation has an extensive history, see e.g. [2], [27], [28], and [30]. However, the literature on the characterisation of the optimal policy in singular stochastic control problems with partial observation is limited, and actually deals only with monotone controls. We firstly refer to [33], which studies singular control problems with partial information via the study of their associated backward stochastic differential equations (BSDEs), leading to general
maximum principles; [12], which solves the optimal dividend problem under partial information on the drift of the revenue process of a firm that can default, creating also an absorption state; [14], which studies a dynamic model of a firm whose shareholders learn about its profitability, and face costs of external financing and costs of holding cash; and [4], which considers the debt-reduction problem of a government that has partial information on the underlying business conditions. Contrary to the aforementioned papers with monotone controllers, we allow the decision maker to both decrease and increase the underlying process by using controls of bounded variation. Thus, our paper expands the traditional bounded-variation control theory towards the direction of partial information, by providing a methodology for dealing with such problems, achieving the complete characterisation of the free boundaries that define the optimal control, and achieving also notable value function regularity properties. This is our second main contribution, on which we elaborate in the remainder of this section.

By relying on classical filtering theory (see [32]), we first determine an equivalent problem under full information, the so-called "separated problem". This is a genuine two-dimensional bounded-variation singular stochastic control problem, with state space described by the level of the inventory and the decision maker's belief on the demand rate. Given the two-dimensional nature of the problem, the traditional "guess and verify" approach is not effective. Indeed, this would require at first the construction of an explicit solution to a PDE with (gradient) boundary conditions, which in general cannot be obtained.

We instead use a more direct approach that allows for a thorough study of the regularity and structure of the problem's value function $V$, and eventually leads to the complete characterisation of the optimal control strategy.
To be more precise, we begin by connecting our two-dimensional bounded-variation stochastic control problem to a suitable zero-sum optimal stopping game (Dynkin game), such that $V_x = v$, where $v$ denotes the value of the game with underlying two-dimensional, uncontrolled, degenerate diffusion $(S, \Pi)$ taking values in $\mathbb{R} \times (0,1)$. By studying the game, we are able to characterise the optimal stopping strategy of each player via two free boundary functions $a_\pm(\pi)$, for $\pi \in (0,1)$. Next, the process $(X, \Pi)$ is transformed into $(X, \Phi)$ with decoupled dynamics, taking values in $\mathbb{R} \times (0,\infty)$. Under these new $(x, \varphi)$-coordinates, we show that the transformed control value function $V(x, \varphi)$, game value function $v(x, \varphi)$, and associated free boundary functions $b_\pm(\varphi)$ inherit all properties proved for $V(x, \pi)$, $v(x, \pi)$ and $a_\pm(\pi)$. Using these properties, and proving local semiconcavity of $V$, allows us to show via fine techniques from viscosity theory that $V \in C^1(\mathbb{R} \times (0,\infty))$. Because of the degeneracy of the process $(X, \Phi)$ (in which $X$ and $\Phi$ are driven by the same Brownian motion), in order to derive further regularity of the control problem's value function it is useful to derive the intrinsic parabolic formulation of the problem (see also [25] and [12]). This is achieved by passing to yet another transformation $(X, Y)$ of our state process, taking values in $\mathbb{R}^2$. In these new coordinates, we prove that the transformed control value function $\widehat{V}(x, y)$ is also continuously differentiable, and is furthermore such that $\widehat{V}_{xx}$ admits a continuous extension to the closure of the associated inaction region (where a linear parabolic PDE holds). This regularity is then employed in order to prove a verification theorem identifying an optimal control rule.
This keeps, for almost all times, the diffusion $(X, \Phi)$ within the closure of the inaction region $\{(x, \varphi) : b_+(\varphi) < x < b_-(\varphi)\}$, according to a Skorokhod reflection.

In order to obtain finer regularity and a characterisation of the free boundaries triggering the optimal control rule, we continue our analysis in the $(x, y)$-coordinates. Here, by introducing a new transformed Dynkin game with value $\widehat{v}(x, y)$, we are able to show that the $(x, \varphi)$-inaction region transforms into an open set of $\mathbb{R}^2$ which is delineated by two strictly increasing curves $x = c_\pm(y)$. By exploiting the structure of the transformation linking the $(x, \varphi)$-plane to the $(x, y)$-plane, we then obtain an easy proof of the fact that $c_\pm$ are Lipschitz-continuous functions, with Lipschitz constant $L = 1$. Such a result is of particular independent interest, given the importance of Lipschitz regularity in obstacle problems (see the introduction of [10] for a detailed account of this and its related literature). Moreover, we believe that the simple argument of our proof can also be applied to other singular control/optimal stopping problems with partial observation, thus providing an alternative – to the more technical approach developed in [10] – for obtaining the Lipschitz regularity of the optimal stopping boundaries. The Lipschitz property of $c_\pm$ is then employed to show, via probabilistic techniques à la [11], that the Dynkin game's value function is continuously differentiable in $\mathbb{R}^2$; that is, a global smooth-fit property holds.
The latter fact is finally useful in proving that $\widehat{v}_{xx} \in L^\infty_{\mathrm{loc}}(\mathbb{R}^2)$ and in obtaining a system of nonlinear integral equations solved by $c_\pm$.

Overall, notwithstanding the degeneracy of the associated PDE in the variational formulation of the original control problem, by using our probabilistic methodology in combination with viscosity theory arguments and switching between three equivalent formulations (under changes of variables): (a) we achieve a notable global regularity of the value function $V$, namely $V \in C^1(\mathbb{R} \times (0,\infty))$, and we deduce that its transformed version $\widehat{V}$ admits a continuous second derivative $\widehat{V}_{xx}$ in the closure of its inaction region; (b) we use these properties in order to construct an optimal control strategy in terms of the belief-dependent process $t \mapsto b_\pm(\Phi_t)$; (c) we obtain global Lipschitz continuity of the free boundaries $c_\pm$ arising in the transformed problem $\widehat{V}$, which are then characterised via nonlinear integral equations.

Note that, using our methodology as described above, we manage to obtain the minimal (necessary) regularity in order to construct an optimal control strategy and verify its optimality. As proving regularity properties of the control value function in multi-dimensional settings can be very challenging, having a methodology that takes a different route can be very helpful in studying similar problems with singular controls under partial observation. Moreover, it is worth observing that, backtracking all the involved changes of variables, the characterisation of $c_\pm$ effectively turns into a characterisation of the free boundaries $b_\pm$ and consequently of $a_\pm$ in the original $(x, \pi)$-coordinates.

Structure of the paper.
The rest of this paper is organised as follows. In Section 2, we present the model, formulate the control problem, and then derive the separated problem $V$. The first related optimal stopping game is derived in Section 3, while Section 4 introduces the first useful change of coordinates. Section 5 then studies the regularity of the (transformed) control problem's value function $V$, and Section 6 presents the verification theorem and the construction of an optimal control. Finally, in Section 7: we introduce the last change of variables; we obtain the Lipschitz continuity of the corresponding free boundaries $c_\pm$; we prove the smooth-fit property of the transformed Dynkin game's value function $\widehat{v}$; and we derive the integral equations for $c_\pm$.

2. Problem Formulation and the Separated Problem
On a complete probability space $(\Omega, \mathcal{F}, \mathbb{P})$, we define a one-dimensional Brownian motion $(B_t)_{t \geq 0}$, whose $\mathbb{P}$-augmented natural filtration is denoted by $(\mathcal{F}^B_t)_{t \geq 0}$. Moreover, we define a random variable $\mu$ which is independent of the Brownian motion $B$ and can take two possible real values, namely $\mu \in \{\mu_0, \mu_1\}$, where $\mu_0, \mu_1 \in \mathbb{R}$. Without loss of generality, we assume henceforth that $\mu_1 > \mu_0$ and that $\pi := \mathbb{P}(\mu = \mu_1) \in (0,1)$.

In the absence of any intervention, the underlying (stochastic inventory) process $S$, as observed by the decision maker, follows the dynamics
$$dS_t = \mu\, dt + \eta\, dB_t, \qquad S_0 = x \in \mathbb{R},$$
for some $\eta > 0$. Recall that the drift $\mu$ of the process $S$ is not observable by the decision maker, who can only monitor the evolution of the process $S$ itself. In light of this observation, the decision maker selects their control strategy $P$ based solely on their observation of the process $S$. Denoting the natural filtration of any process $Y$ by $\mathbb{F}^Y := (\mathcal{F}^Y_t)_{t \geq 0}$, we can therefore define the set of admissible controls
$$\mathcal{A} := \big\{P : \Omega \times \mathbb{R}_+ \to \mathbb{R} \ \text{such that}\ t \mapsto P_t \ \text{is right-continuous, (locally) of bounded variation, and}\ P \ \text{is}\ \mathbb{F}^S\text{-adapted}\big\}.$$
To be more precise, we consider the minimal decomposition of the bounded-variation control $P \in \mathcal{A}$ to be
$$P_t = P^+_t - P^-_t,$$
where $P^+$ and $P^-$ are then nondecreasing, right-continuous $\mathbb{F}^S$-adapted processes. From now on, we set $P^\pm_{0-} = 0$ a.s. for any $P \in \mathcal{A}$. Hence, the reference (controlled inventory) process is given by
$$(2.1)\qquad X^P_t := S_t + P_t, \qquad \text{where } P \in \mathcal{A}, \text{ and such that } X^P_{0-} = x.$$
Note that, when $P \equiv$
0, the inventory process is uncontrolled and takes the form $X^0 = S$.

Given the aforementioned setting, the decision maker's goal is to minimise the overall (discounted) cost of holding, shortage and controlling the inventory process. In mathematical terms, the bounded-variation control problem of the decision maker is given by
$$(2.2)\qquad \inf_{P \in \mathcal{A}} \mathbb{E}\bigg[\int_0^\infty e^{-\rho t}\Big(C(X^P_t)\,dt + K_+\, dP^+_t + K_-\, dP^-_t\Big)\bigg],$$
where $\mathbb{E}$ denotes the expectation under the probability measure $\mathbb{P}$, $\rho > 0$ is a discount rate, $K_+, K_- > 0$ are the proportional costs of controlling $X^P$, and $C : \mathbb{R} \to \mathbb{R}_+$ is a holding and shortage cost function which satisfies the following standing assumption.

Assumption 2.1.
There exist constants $p > 1$, $\alpha_0, \alpha_1, \alpha_2 > 0$ such that the following hold true:
(i) for every $x \in \mathbb{R}$, $0 \leq C(x) \leq \alpha_0 (1 + |x|^p)$;
(ii) for every $x, x' \in \mathbb{R}$, $|C(x) - C(x')| \leq \alpha_1 \big(1 + C(x) + C(x')\big)^{1 - \frac{1}{p}} |x - x'|$;
(iii) for every $x, x' \in \mathbb{R}$ and $\lambda \in (0,1)$,
$$0 \leq \lambda C(x) + (1 - \lambda) C(x') - C\big(\lambda x + (1 - \lambda) x'\big) \leq \alpha_2\, \lambda (1 - \lambda) \big(1 + C(x) + C(x')\big)^{\left(1 - \frac{2}{p}\right)^+} |x - x'|^2.$$
Notice that (iii) above implies that $C$ is convex and locally semiconcave. Hence, by [5, Corollary 3.3.8], we have that $C \in C^{1,\mathrm{Lip}}_{\mathrm{loc}}(\mathbb{R}; \mathbb{R}_+)$. A classical quadratic holding cost $C(x) = (x - \bar{x})^2$, for some target level $\bar{x} \in \mathbb{R}$, clearly satisfies Assumption 2.1.

Given the feature of a non-observable $\mu$, Problem (2.2) is not Markovian and cannot therefore be tackled via a dynamic programming approach. In the following, we will derive a new equivalent Markovian problem under full information, the so-called "separated problem". This will then be solved by exploiting its connection to a zero-sum game of optimal stopping and by a careful analysis of the regularity of its value function.

2.1. The separated problem.
In order to derive the equivalent problem under full information, we use standard arguments from filtering theory (see, e.g., [32, Section 4.2]) and we define the "belief" process
$$\Pi_t := \mathbb{P}\big(\mu = \mu_1 \,\big|\, \mathcal{F}^S_t\big), \qquad t \geq 0,$$
according to which decision makers update their beliefs on the (true) value of the drift $\mu$, based on the arrival of new information via the observation of the process $S$. Then, the dynamics of $X^P$ and $\Pi$ can be written as
$$(2.3)\qquad \begin{cases} dX^P_t = \big(\mu_1 \Pi_t + \mu_0 (1 - \Pi_t)\big)\,dt + \eta\, dW_t + dP_t, & X^P_{0-} = x \in \mathbb{R},\\[2pt] d\Pi_t = \gamma\, \Pi_t (1 - \Pi_t)\, dW_t, & \Pi_0 = \pi \in (0,1), \end{cases}$$
where the innovation process $W$, given by
$$dW_t = \frac{dS_t}{\eta} - \Big(\frac{\mu_0}{\eta} + \gamma\, \Pi_t\Big)\, dt, \qquad \text{for all } t \geq 0,$$
is an $\mathbb{F}^S$-Brownian motion on $(\Omega, \mathcal{F}, \mathbb{P})$ according to Lévy's characterisation theorem (see, e.g., [32, Theorem 4.1]), and
$$\gamma := \frac{\mu_1 - \mu_0}{\eta} > 0.$$
It can be verified that the pair $(X^P, \Pi)$ is an $\mathbb{F}^S$-adapted (time-homogeneous strong) Markov process on $(\Omega, \mathcal{F}, \mathbb{P})$, as the unique strong solution of the system of stochastic differential equations in (2.3) (see, e.g., [35, Chapter V]). In (2.3), the (unknown/non-observable) drift $\mu$ of $X$ in the original model is replaced with its filtering estimate $\mathbb{E}[\mu \,|\, \mathcal{F}^S_t]$. Moreover, the belief (learning) process $\Pi = (\Pi_t)_{t \geq 0}$ involved in the filtering is a bounded martingale on [0,
1], such that $\Pi_\infty \in \{0, 1\}$, due to the fact that all information eventually gets revealed at time $t = \infty$.

Then, for $(X^P, \Pi)$ as in (2.3), with $(x, \pi) \in \mathcal{O} := \mathbb{R} \times (0,1)$, the separated problem is given by
$$(2.4)\qquad V(x, \pi) := \inf_{P \in \mathcal{A}} \mathbb{E}\bigg[\int_0^\infty e^{-\rho t}\Big(C(X^P_t)\,dt + K_+\, dP^+_t + K_-\, dP^-_t\Big)\bigg],$$
where all the processes involved are now $\mathbb{F}^S$-adapted. Hence, Problem (2.4) is a two-dimensional Markovian singular stochastic control problem with controls of bounded variation. Moreover, by uniqueness of the strong solution to the belief equation, a control $P^\star$ is optimal for (2.2) if and only if it is optimal for (2.4), and the values in (2.2) and (2.4) coincide.

Note that, in light of the dynamics of $(X^P, \Pi)$ in (2.3), a high value of $\Pi$ close to 1 would imply that the decision maker has a strong belief in a high drift $\mu_1$, while a low $\Pi$ close to 0 would imply, on the contrary, a strong belief in a low drift $\mu_0$ scenario.

Remark 2.2 (Full information cases). In the formulation (2.2), the case of prior belief $\pi := \mathbb{P}(\mu = \mu_1) \in \{0, 1\}$ implies the certainty of the decision maker regarding whether $\mu = \mu_0$ or $\mu = \mu_1$. Hence, in this case, there is no uncertainty about the value of the drift $\mu$, which is not a random variable any more. Respectively, in the formulation (2.4), the case of prior belief $\Pi_0 = \pi \in \{0, 1\}$ yields that the belief process $\Pi$ will actually remain constant through time, due to its dynamics, which imply that $\Pi_t = \pi$ for all $t > 0$. Therefore, we equivalently have that such values of $\pi \in \{0, 1\}$ correspond to the full information cases.

In these cases, the optimal control problem becomes a standard one-dimensional bounded-variation stochastic control problem, for which an early study can be found in [22]. The optimal control strategy in such a case is triggered by two constant boundaries, within which the process $X^P$ is kept (via a Skorokhod reflection).
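The belief SDE in (2.3) can be simulated directly. The sketch below is an Euler–Maruyama discretisation with purely illustrative parameters (not taken from the paper); it shows numerically that $\Pi$ stays in $[0,1]$ and that its sample mean remains close to the prior $\pi$, consistently with $\Pi$ being a bounded martingale.

```python
import random

def simulate_belief(pi0=0.3, gamma=1.0, dt=1e-3, n_steps=1000, n_paths=1000, seed=1):
    """Euler-Maruyama sketch of the belief SDE d Pi_t = gamma * Pi_t * (1 - Pi_t) dW_t.
    Returns the Monte Carlo mean of the terminal belief."""
    rng = random.Random(seed)
    terminal = []
    for _ in range(n_paths):
        pi = pi0
        for _ in range(n_steps):
            pi += gamma * pi * (1 - pi) * rng.gauss(0.0, dt ** 0.5)
            pi = min(max(pi, 0.0), 1.0)  # clip: the Euler step may overshoot [0, 1]
        terminal.append(pi)
    return sum(terminal) / n_paths

mean_pi = simulate_belief()
```

Over longer horizons the simulated paths cluster near the boundaries 0 and 1, in line with $\Pi_\infty \in \{0,1\}$.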
Given the convexity of $C$ as in Assumption 2.1, and the linear structure of $P \mapsto X^P$ in (2.3), by following standard arguments based on Komlós' theorem (see, e.g., [20, Proposition 3.4]), the next result can be shown.

Proposition 2.3.
There exists an optimal control $P^\star$ for (2.4). Moreover, this is unique (up to indistinguishability) if $C$ is strictly convex.

3. The First Related Optimal Stopping Game
We now derive a zero-sum optimal stopping game (Dynkin game) related to $V$, and we provide preliminary properties of its value function and of the geometry of its state space. In this section, the uncontrolled process $X$ with $P_t \equiv 0$ for all $t \geq 0$, denoted by $(X_t, \Pi_t)_{t \geq 0} \equiv (S_t, \Pi_t)_{t \geq 0}$, is the two-dimensional strong Markov process solving
$$(3.1)\qquad \begin{cases} dX_t = \big(\mu_1 \Pi_t + \mu_0 (1 - \Pi_t)\big)\,dt + \eta\, dW_t, & X_0 = x \in \mathbb{R},\\[2pt] d\Pi_t = \gamma\, \Pi_t (1 - \Pi_t)\, dW_t, & \Pi_0 = \pi \in (0,1). \end{cases}$$

Proposition 3.1.
Consider the process $(X_t, \Pi_t)_{t \geq 0}$ defined in (3.1) and define
$$(3.2)\qquad v(x, \pi) := \inf_{\sigma} \sup_{\tau}\ \mathbb{E}_{(x,\pi)}\bigg[\int_0^{\tau \wedge \sigma} e^{-\rho t}\, C'(X_t)\, dt - K_+\, e^{-\rho \tau}\, \mathbb{1}_{\{\tau < \sigma\}} + K_-\, e^{-\rho \sigma}\, \mathbb{1}_{\{\tau > \sigma\}}\bigg],$$
where the optimisation is taken over the set of $\mathbb{F}^W$-stopping times and $\mathbb{E}_{(x,\pi)}$ denotes the expectation conditioned on $(X_0, \Pi_0) = (x, \pi) \in \mathcal{O}$. Consider also the control value function $V(x, \pi)$ defined in (2.4). Then, we have the following properties:
(i) $x \mapsto V(x, \pi)$ is differentiable and $v(x, \pi) = V_x(x, \pi)$.
(ii) $x \mapsto V(x, \pi)$ is convex and therefore $x \mapsto v(x, \pi)$ is nondecreasing.
(iii) $\pi \mapsto v(x, \pi)$ is nondecreasing.
(iv) $(x, \pi) \mapsto v(x, \pi)$ is continuous on $\mathbb{R} \times (0,1)$.

Proof. In this proof, whenever we need to stress the dependence of the state process on its starting point, we denote by $(X^{(x', \pi')}, \Pi^{\pi'})$ the unique strong solution to (3.1) starting at $(x', \pi') \in \mathcal{O}$ at time zero. We prove separately the four parts.

Proof of (i).
Thanks to Proposition 2.3, it suffices to apply [29, Theorem 3.2] upon setting $G \equiv 0$,
$$H(\omega, t, x) := e^{-\rho t}\, C\Big(x + \eta W_t(\omega) + \int_0^t \big(\mu_1 \Pi_s(\omega) + \mu_0 (1 - \Pi_s(\omega))\big)\, ds\Big), \qquad (\omega, t, x) \in \Omega \times \mathbb{R}_+ \times \mathbb{R},$$
$$\gamma_t := e^{-\rho t} K_+, \qquad \nu_t := e^{-\rho t} K_-, \qquad t \geq 0,$$
and noticing that the proof in [29] can be easily adapted to our infinite-time-horizon discounted setting.

Proof of (ii).
Denote by $(X^{P;(x,\pi)}, \Pi^\pi)$ the unique strong solution to (2.3) when $(X^P_{0-}, \Pi_0) = (x, \pi)$. The convexity of $V(x, \pi)$ with respect to $x$ can be easily shown by exploiting the convexity of $C(x)$ and the linear structure of $(x, P) \mapsto X^{P;(x,\pi)}$, for any $P \in \mathcal{A}$ and $(x, \pi) \in \mathcal{O}$. The nondecreasing property of $v(\cdot, \pi)$ then follows from the fact that $v = V_x$ from part (i).

Proof of (iii).
Notice that
$$(3.3)\qquad X_t = x + \eta W_t + \int_0^t \big(\mu_1 \Pi_s + \mu_0 (1 - \Pi_s)\big)\, ds, \qquad t \geq 0,$$
and that $\pi \mapsto \Pi^\pi$ is nondecreasing, due to standard comparison theorems for strong solutions to one-dimensional stochastic differential equations [26, Chapter 5.2]. Then, the claim follows from (3.2) and Assumption 2.1, according to which $x \mapsto C'(x)$ is nondecreasing.

Proof of (iv).
By [29, Theorem 3.1] and Proposition 2.3, we know that, for any $(x, \pi) \in \mathcal{O}$, (3.2) admits a saddle point. Take $(x_n, \pi_n) \to (x, \pi)$ as $n \uparrow \infty$, and let $(\tau^\star, \sigma^\star)$ and $(\tau^\star_n, \sigma^\star_n)$ realise the saddle points for $(x, \pi)$ and $(x_n, \pi_n)$, respectively. Then, we have
$$v(x, \pi) - v(x_n, \pi_n) \leq \mathbb{E}\bigg[\int_0^{\tau^\star \wedge \sigma^\star_n} e^{-\rho t}\Big(C'\big(X^{(x,\pi)}_t\big) - C'\big(X^{(x_n,\pi_n)}_t\big)\Big)\, dt\bigg] \leq \mathbb{E}\bigg[\int_0^{\infty} e^{-\rho t}\Big|C'\big(X^{(x,\pi)}_t\big) - C'\big(X^{(x_n,\pi_n)}_t\big)\Big|\, dt\bigg].$$
Without loss of generality, we can take $(x_n, \pi_n) \subset (x - \varepsilon, x + \varepsilon) \times (\pi - \varepsilon, \pi + \varepsilon)$, for a suitable $\varepsilon > 0$ and $n$ sufficiently large. Then, by Assumption 2.1.(ii) and standard estimates using Assumption 2.1.(i), (3.3) and the fact that $\Pi$ is bounded in $[0,1]$, we obtain
$$\limsup_{n \to \infty}\, \big(v(x, \pi) - v(x_n, \pi_n)\big) \leq 0.$$
Arguing symmetrically, now with the couple of stopping times $(\tau^\star_n, \sigma^\star)$, we also find
$$\limsup_{n \to \infty}\, \big(v(x_n, \pi_n) - v(x, \pi)\big) \leq 0.$$
Combining the last two inequalities, we obtain the desired continuity claim. $\square$
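The comparison argument used in the proof of (iii) above, namely that $\pi \mapsto \Pi^\pi$ is nondecreasing when all copies of the belief process are driven by the same Brownian path, can be illustrated numerically. The Euler sketch below uses illustrative parameters; the monotonicity of each Euler step in the starting point holds as long as the Gaussian increment is not extremely large, which is essentially certain at this step size.

```python
import random

def belief_paths(pis, gamma=1.0, dt=1e-3, n_steps=1000, seed=7):
    """Drive several belief processes with the SAME Brownian increments
    (as in the comparison-theorem argument) and return the terminal values."""
    rng = random.Random(seed)
    vals = list(pis)
    for _ in range(n_steps):
        dw = rng.gauss(0.0, dt ** 0.5)  # one shared increment per step
        vals = [min(max(p + gamma * p * (1 - p) * dw, 0.0), 1.0) for p in vals]
    return vals

out = belief_paths([0.2, 0.5, 0.8])
```

The terminal values preserve the initial ordering, mirroring the pathwise ordering $\Pi^{\pi} \leq \Pi^{\pi'}$ for $\pi \leq \pi'$.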
In the rest of this section, we focus on the study of the optimal stopping game $v$ presented in (3.2), due to its connection to our stochastic control problem (cf. Proposition 3.1). To that end, we define below the so-called continuation (waiting) region
$$(3.4)\qquad \mathcal{C} := \big\{(x, \pi) \in \mathcal{O} : -K_+ < v(x, \pi) < K_-\big\},$$
and the stopping region $\mathcal{S} := \mathcal{S}_{+1} \cup \mathcal{S}_{-1}$, whose components are given by
$$(3.5)\qquad \mathcal{S}_{+1} := \big\{(x, \pi) \in \mathcal{O} : v(x, \pi) \leq -K_+\big\}, \qquad \mathcal{S}_{-1} := \big\{(x, \pi) \in \mathcal{O} : v(x, \pi) \geq K_-\big\}.$$
In light of the continuity of $v$ in Proposition 3.1.(iv), we conclude that the continuation region $\mathcal{C}$ is an open set, while the two components of the stopping region $\mathcal{S}_{\pm 1}$ are both closed sets. We can therefore define the free boundaries
$$(3.6)\qquad a_+(\pi) := \sup\big\{x \in \mathbb{R} : v(x, \pi) \leq -K_+\big\} \quad \text{and} \quad a_-(\pi) := \inf\big\{x \in \mathbb{R} : v(x, \pi) \geq K_-\big\}.$$
Here, and throughout the rest of this paper, we use the convention $\sup \emptyset = -\infty$ and $\inf \emptyset = +\infty$. Then, by using the fact that $v$ is nondecreasing with respect to $x$ (see Proposition 3.1.(ii)), we can obtain the structure of the continuation and stopping regions, which take the form
$$(3.7)\qquad \mathcal{C} = \big\{(x, \pi) \in \mathcal{O} : a_+(\pi) < x < a_-(\pi)\big\},$$
$$(3.8)\qquad \mathcal{S}_{+1} = \big\{(x, \pi) \in \mathcal{O} : x \leq a_+(\pi)\big\} \quad \text{and} \quad \mathcal{S}_{-1} = \big\{(x, \pi) \in \mathcal{O} : x \geq a_-(\pi)\big\}.$$
Clearly, the continuity of $v$ further implies that the free boundaries $a_\pm$ are strictly separated, namely $a_+(\pi) < a_-(\pi)$ for all $\pi \in (0,1)$. We now prove some preliminary properties of the free boundaries $\pi \mapsto a_\pm(\pi)$.

Proposition 3.2.
The free boundaries $a_\pm$ defined in (3.6) satisfy the following properties:
(i) $a_\pm(\cdot)$ are nonincreasing on $(0,1)$.
(ii) $a_+(\cdot)$ is left-continuous and $a_-(\cdot)$ is right-continuous on $(0,1)$.
(iii) There exist constants $x^*_\pm\in\mathbb R$ such that
\[
(3.9)\qquad x^*_+\le a_+(\pi) < a_-(\pi)\le x^*_-,\qquad \forall\,\pi\in(0,1).
\]
Moreover, letting $(C')^{-1}$ be the generalised inverse of $C'$, we have $a_+(\pi)\le (C')^{-1}(-\rho K_+)$ and $a_-(\pi)\ge (C')^{-1}(\rho K_-)$ for all $\pi\in(0,1)$.

Proof. We prove the three parts separately.
Proof of (i). This is a consequence of the definitions of $a_\pm(\cdot)$ in (3.6) and the fact that $v(x,\cdot)$ is nondecreasing for any $x\in\mathbb R$; cf. Proposition 3.1.(iii).

Proof of (ii). This follows from part (i) above and the closedness of the sets $\mathcal S_1^\pm$.

Proof of (iii).
The fact that $a_+(\pi)\le (C')^{-1}(-\rho K_+)$ and $a_-(\pi)\ge (C')^{-1}(\rho K_-)$ follows by noticing that $\mathcal S_1^+\subseteq\{x\in\mathbb R:\ x\le (C')^{-1}(-\rho K_+)\}$ and $\mathcal S_1^-\subseteq\{x\in\mathbb R:\ x\ge (C')^{-1}(\rho K_-)\}$.

In order to show the other bounds, we proceed as follows. Since $\mu_1>\mu_0$ and $\Pi_t>0$, we have $\mathbb P_{(x,\pi)}$-a.s., for any $t\ge 0$, that
\[
X_t = x + \eta W_t + \int_0^t\big(\mu_1\Pi_s + \mu_0(1-\Pi_s)\big)\,\mathrm ds = x + \eta W_t + \mu_0 t + \int_0^t(\mu_1-\mu_0)\Pi_s\,\mathrm ds \ \ge\ x + \eta W_t + \mu_0 t =: \underline X_t.
\]
Similarly, using that $\Pi_t<1$, we get that $X_t\le x+\eta W_t+\mu_1 t =: \overline X_t$. Therefore, the latter two estimates yield that $\underline X_t\le X_t\le\overline X_t$ for all $t\ge 0$. Combining these inequalities with the fact that $C'(\cdot)$ is nondecreasing due to Assumption 2.1 and the definition (3.2) of the value function $v(x,\pi)$, we conclude that
\[
(3.10)\qquad \underline v(x)\le v(x,\pi)\le\overline v(x),\qquad\text{for all }(x,\pi)\in\mathcal O,
\]
where we have introduced the one-dimensional optimal stopping games
\[
\underline v(x) := \inf_{\sigma\in\mathcal T}\sup_{\tau\in\mathcal T}\mathbb E\Big[\int_0^{\tau\wedge\sigma} e^{-\rho t}C'(\underline X_t)\,\mathrm dt - K_+e^{-\rho\tau}\mathbf 1_{\{\tau<\sigma\}} + K_-e^{-\rho\sigma}\mathbf 1_{\{\tau>\sigma\}}\Big]
\]
and
\[
\overline v(x) := \inf_{\sigma\in\mathcal T}\sup_{\tau\in\mathcal T}\mathbb E\Big[\int_0^{\tau\wedge\sigma} e^{-\rho t}C'(\overline X_t)\,\mathrm dt - K_+e^{-\rho\tau}\mathbf 1_{\{\tau<\sigma\}} + K_-e^{-\rho\sigma}\mathbf 1_{\{\tau>\sigma\}}\Big].
\]
Because both $\underline v(\cdot)$ and $\overline v(\cdot)$ are nondecreasing on $\mathbb R$, standard techniques allow to show that there exist finite $x^*_-,x^*_+$ such that $\{x\in\mathbb R:\ x\ge x^*_-\}=\{x\in\mathbb R:\ \underline v(x)\ge K_-\}$ and $\{x\in\mathbb R:\ x\le x^*_+\}=\{x\in\mathbb R:\ \overline v(x)\le -K_+\}$. Hence, combining the latter two regions together with the inequalities in (3.10), we eventually get that
\[
(3.11)\qquad \{x\in\mathbb R:\ x\ge x^*_-\}\subseteq\{(x,\pi)\in\mathcal O:\ v(x,\pi)\ge K_-\}=\mathcal S_1^-
\]
and
\[
(3.12)\qquad \{x\in\mathbb R:\ x\le x^*_+\}\subseteq\{(x,\pi)\in\mathcal O:\ v(x,\pi)\le -K_+\}=\mathcal S_1^+.
\]
Hence, $\mathcal S_1^\pm\neq\emptyset$ and the claim follows from (3.11)–(3.12). $\square$

Recall that the higher the value of $\pi$, the stronger the decision maker's belief about $\mu$ being equal to $\mu_1$, which is the highest possible value (recall that $\mu_1>\mu_0$).
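The pathwise sandwich $\underline X_t\le X_t\le\overline X_t$ used in the proof above can be checked numerically. The following minimal sketch assumes the filtering dynamics $\mathrm d\Pi_t=\gamma\Pi_t(1-\Pi_t)\,\mathrm dW_t$ with $\gamma=(\mu_1-\mu_0)/\eta$ (cf. (2.3)/(3.1)); all numerical parameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Euler simulation of (Pi, X) on a shared Brownian path, then a pathwise check of
# X_lower <= X <= X_upper from the proof of Proposition 3.2.(iii).
rng = np.random.default_rng(0)
mu0, mu1, eta, x0, pi0 = -0.5, 1.0, 0.8, 0.0, 0.4   # illustrative values
gamma = (mu1 - mu0) / eta
T, n = 1.0, 2000
dt = T / n

dW = rng.normal(0.0, np.sqrt(dt), size=n)
W = np.concatenate([[0.0], np.cumsum(dW)])

pi = np.empty(n + 1); pi[0] = pi0
x = np.empty(n + 1); x[0] = x0
for k in range(n):
    # assumed filtering SDE for the belief, kept inside (0, 1)
    pi[k + 1] = np.clip(pi[k] + gamma * pi[k] * (1 - pi[k]) * dW[k], 1e-12, 1 - 1e-12)
    # inventory with belief-dependent drift mu1*Pi + mu0*(1 - Pi)
    x[k + 1] = x[k] + (mu1 * pi[k] + mu0 * (1 - pi[k])) * dt + eta * dW[k]

t = np.linspace(0.0, T, n + 1)
x_lower = x0 + eta * W + mu0 * t   # drift frozen at mu0
x_upper = x0 + eta * W + mu1 * t   # drift frozen at mu1
assert np.all(x_lower - 1e-9 <= x) and np.all(x <= x_upper + 1e-9)
```

Since $\Pi_s\in(0,1)$ makes each increment of $X$ at least the increment of $\underline X$ and at most that of $\overline X$, the inequality holds step by step along every simulated path.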
Taking this into account, we notice from the monotonicity (nonincreasing) of the free boundary functions $a_\pm(\pi)$ in Proposition 3.2.(i) the following. The more the decision maker's belief tends towards $\mu_1$ (higher inventory level on average), the more cautious they need to be; thus they tend to intervene (by unloading part of the excess inventory) more often, to make sure that the inventory level $X$ is kept below the optimal threshold $a_-(\pi)$ despite its strong tendency to go up, so that the overall (holding and control) costs are minimised. On the other hand, they are more willing to delay interventions (by placing replenishment orders) to increase the inventory level $X$, by optimally setting a lower "base-stock" level $a_+(\pi)$ as their belief grows towards $\mu_1$ (higher inventory level on average). This reflects the fact that the inventory level $X$ will not breach this lower boundary too often under their belief that $\mu=\mu_1$, and eventually achieves the minimisation of the overall (shortage and control) costs.

Figure 1. An illustrative drawing of the free boundaries $a_+$ and $a_-$ satisfying Proposition 3.2. In the picture, $\alpha := (C')^{-1}(\rho K_-)$ and $\beta := (C')^{-1}(-\rho K_+)$.

4. A Decoupling Change of Measure
In order to provide further results about the optimal control problem (2.4) and the associated Dynkin game (3.2), it is convenient to decouple the dynamics of the controlled inventory process $X^P$ and the belief process $\Pi$. This can be achieved via a transformation of the state space and a change of measure, as we explain in the following subsections.

4.1. Transformation of the process $\Pi$ to $\Phi$. We first recall from (2.3) (see also (3.1)) that, for any prior belief $\Pi_0=\pi\in(0,1)$, we have $\Pi_t\in(0,1)$ for all $t\in(0,\infty)$. Hence, we define the process
\[
\Phi_t := \frac{\Pi_t}{1-\Pi_t},\qquad t\ge 0,
\]
whose dynamics are given via Itô's formula by
\[
(4.1)\qquad \mathrm d\Phi_t = \gamma^2\Pi_t\Phi_t\,\mathrm dt + \gamma\Phi_t\,\mathrm dW_t = \gamma\Phi_t\big(\gamma\Pi_t\,\mathrm dt + \mathrm dW_t\big),\qquad \Phi_0=\varphi := \frac{\pi}{1-\pi}.
\]
Note that the process $\Phi$ is known as the "likelihood ratio process" in the literature on filtering theory (see, e.g., [25]).
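The Itô computation behind (4.1) can be verified symbolically. The sketch below assumes the filtering dynamics $\mathrm d\Pi_t=\gamma\Pi_t(1-\Pi_t)\,\mathrm dW_t$ (cf. (2.3)) and checks that the map $\Phi=\Pi/(1-\Pi)$ produces exactly the drift and diffusion coefficients stated in (4.1).

```python
import sympy as sp

# Symbolic Itô-formula check of (4.1): with dPi = gamma*Pi*(1-Pi) dW and
# Phi = Pi/(1-Pi), Itô's formula gives dPhi = f'(Pi) dPi + (1/2) f''(Pi) (dPi)^2,
# which should reduce to dPhi = gamma^2 * Pi * Phi dt + gamma * Phi dW.
p, g = sp.symbols('pi gamma', positive=True)
f = p / (1 - p)                  # Phi as a function of Pi
sigma = g * p * (1 - p)          # assumed diffusion coefficient of Pi

diffusion = sp.simplify(sp.diff(f, p) * sigma)                         # dW coefficient
drift = sp.simplify(sp.Rational(1, 2) * sp.diff(f, p, 2) * sigma**2)   # dt coefficient

assert sp.simplify(diffusion - g * f) == 0        # equals gamma * Phi
assert sp.simplify(drift - g**2 * p * f) == 0     # equals gamma^2 * Pi * Phi
```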
4.2. Change of measure from $\mathbb P$ to $\mathbb Q_T$, for some fixed $T>0$. We begin by defining the exponential martingale
\[
\zeta_T := \exp\Big\{-\gamma\int_0^T\Pi_s\,\mathrm dW_s - \frac12\int_0^T\gamma^2\Pi_s^2\,\mathrm ds\Big\},
\]
and the measure $\mathbb Q_T\sim\mathbb P$ on $(\Omega,\mathcal F_T)$ by $\frac{\mathrm d\mathbb Q_T}{\mathrm d\mathbb P} = \zeta_T$. Then, the process
\[
W^*_t := W_t + \gamma\int_0^t\Pi_s\,\mathrm ds,\qquad t\in[0,T],
\]
is a Brownian motion on $[0,T]$ under $\mathbb Q_T$, and the dynamics of $\Phi$ in (4.1) simplify to
\[
(4.2)\qquad \mathrm d\Phi_t = \gamma\Phi_t\,\mathrm dW^*_t,\qquad t\in(0,T],\qquad \Phi_0=\varphi;
\]
hence $\Phi$ is an exponential martingale under $\mathbb Q_T$. Consequently, applying the same change of measure to the process $X^P$ from (2.3), we get that
\[
(4.3)\qquad \mathrm dX^P_t = \mu_0\,\mathrm dt + \eta\,\mathrm dW^*_t + \mathrm dP^+_t - \mathrm dP^-_t,\qquad t\in[0,T],\qquad X^P_{0-}=x.
\]
In order to change the measure also in the cost criterion of our value function in (2.4), we further define the process
\[
Z_t := \frac{1+\Phi_t}{1+\varphi},\qquad t\in[0,T],
\]
which can be verified via Itô's formula to satisfy $Z_t = 1/\zeta_t$ for every $t\in[0,T]$. Hence, denoting by $\mathbb E^{\mathbb Q_T}$ the expectation under $\mathbb Q_T$, we have that
\[
(4.4)\qquad \mathbb E\Big[\int_0^T e^{-\rho t}\big(C(X^P_t)\,\mathrm dt + K_+\,\mathrm dP^+_t + K_-\,\mathrm dP^-_t\big)\Big] = \frac{1}{1+\varphi}\,\mathbb E^{\mathbb Q_T}\Big[(1+\Phi_T)\int_0^T e^{-\rho t}\big(C(X^P_t)\,\mathrm dt + K_+\,\mathrm dP^+_t + K_-\,\mathrm dP^-_t\big)\Big].
\]
Since the process $(1+\Phi_t)_{t\ge 0}$ defines a nonnegative martingale under $\mathbb Q_T$, by [15, Theorem 57] (and the example after the theorem) we can write
\[
\mathbb E^{\mathbb Q_T}\Big[(1+\Phi_T)\int_0^T e^{-\rho t}C(X^P_t)\,\mathrm dt\Big] = \mathbb E^{\mathbb Q_T}\Big[\int_0^T e^{-\rho t}(1+\Phi_t)C(X^P_t)\,\mathrm dt\Big],
\]
as well as
\[
\mathbb E^{\mathbb Q_T}\Big[(1+\Phi_T)\int_0^T e^{-\rho t}\,\mathrm dP^\pm_t\Big] = \mathbb E^{\mathbb Q_T}\Big[\int_0^T e^{-\rho t}(1+\Phi_t)\,\mathrm dP^\pm_t\Big].
\]
Hence, combining the above expressions for the expectations under $\mathbb Q_T$, we get that (4.4) can be expressed in the form
\[
(4.5)\qquad \mathbb E\Big[\int_0^T e^{-\rho t}\big(C(X^P_t)\,\mathrm dt + K_+\,\mathrm dP^+_t + K_-\,\mathrm dP^-_t\big)\Big] = \frac{1}{1+\varphi}\,\mathbb E^{\mathbb Q_T}\Big[\int_0^T e^{-\rho t}(1+\Phi_t)\big(C(X^P_t)\,\mathrm dt + K_+\,\mathrm dP^+_t + K_-\,\mathrm dP^-_t\big)\Big].
\]
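The identity $Z_t=1/\zeta_t$ amounts to the pathwise equality $\log\frac{1+\Phi_t}{1+\varphi}=\gamma\int_0^t\Pi_s\,\mathrm dW_s+\frac{\gamma^2}{2}\int_0^t\Pi_s^2\,\mathrm ds$. The single-path Euler sketch below checks this up to discretisation error; it again assumes the filtering dynamics $\mathrm d\Pi_t=\gamma\Pi_t(1-\Pi_t)\,\mathrm dW_t$, and the parameter values are illustrative.

```python
import numpy as np

# Single-path Euler check of Z_t = 1/zeta_t, i.e.
#   log((1 + Phi_t)/(1 + phi)) = gamma * int_0^t Pi dW + (gamma^2/2) * int_0^t Pi^2 ds,
# where Phi = Pi/(1 - Pi), so (1 + Phi_t)/(1 + phi) = (1 - pi0)/(1 - Pi_t).
rng = np.random.default_rng(1)
gamma, pi0 = 0.5, 0.3            # illustrative values
T, n = 1.0, 10_000
dt = T / n
dW = rng.normal(0.0, np.sqrt(dt), size=n)

pi = pi0
A = 0.0   # running Itô integral  int Pi dW  (left-endpoint rule)
B = 0.0   # running integral      int Pi^2 ds
for dw in dW:
    A += pi * dw
    B += pi * pi * dt
    pi = np.clip(pi + gamma * pi * (1 - pi) * dw, 1e-12, 1 - 1e-12)

log_Z = np.log((1 - pi0) / (1 - pi))            # log((1 + Phi_t)/(1 + phi))
log_inv_zeta = gamma * A + 0.5 * gamma**2 * B   # -log(zeta_t)
assert abs(log_Z - log_inv_zeta) < 5e-2         # agree up to Euler error
```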
4.3. Passing to the limit as $T\to\infty$ and to the new measure $\mathbb Q$. We first notice that the limit as $T\to\infty$ cannot be taken directly in the latter expression in (4.5), since the measure $\mathbb Q_T$ changes with $T$. Nevertheless, notice that the right-hand side of (4.5) depends only on the law of the processes involved. Given that we are only interested in the value function (2.4) and eventually in the optimal feedback control $P^\star$ (cf. Proposition 2.3) – which depend only on such laws – we can introduce a new auxiliary problem. To that end, we define a new filtered probability space $(\overline\Omega,\overline{\mathcal F},\overline{\mathbb F},\mathbb Q)$ supporting a Brownian motion $(\overline W_t)_{t\ge 0}$ and the strong solution to the controlled stochastic differential equation
\[
\begin{cases}
\mathrm d\overline X^{\overline P}_t = \mu_0\,\mathrm dt + \eta\,\mathrm d\overline W_t + \mathrm d\overline P^+_t - \mathrm d\overline P^-_t, & \overline X^{\overline P}_{0-}=x,\\
\mathrm d\overline\Phi_t = \gamma\overline\Phi_t\,\mathrm d\overline W_t, & \overline\Phi_0=\varphi := \frac{\pi}{1-\pi},
\end{cases}
\]
for $\overline P = \overline P^+ - \overline P^-\in\overline{\mathcal A}$, where
\[
\overline{\mathcal A} := \big\{\overline P:\overline\Omega\times\mathbb R_+\to\mathbb R \text{ such that } t\mapsto\overline P_t \text{ is right-continuous, (locally) of bounded variation, and } \overline P \text{ is } \overline{\mathbb F}\text{-adapted}\big\}.
\]
Then, denoting by $\overline{\mathbb E}$ the expectation on $(\overline\Omega,\overline{\mathcal F})$ under $\mathbb Q$, we have, for every $T>0$, that
\[
\mathbb E^{\mathbb Q_T}\Big[\int_0^T e^{-\rho t}(1+\Phi_t)\big(C(X^P_t)\,\mathrm dt + K_+\,\mathrm dP^+_t + K_-\,\mathrm dP^-_t\big)\Big] = \overline{\mathbb E}\Big[\int_0^T e^{-\rho t}(1+\overline\Phi_t)\big(C(\overline X^{\overline P}_t)\,\mathrm dt + K_+\,\mathrm d\overline P^+_t + K_-\,\mathrm d\overline P^-_t\big)\Big],
\]
due to the equivalence in law of the process $(X^P_t,\Phi_t,W^*_t,P_t)_{t\ge 0}$ under $\mathbb Q_T$ and the process $(\overline X^{\overline P}_t,\overline\Phi_t,\overline W_t,\overline P_t)_{t\ge 0}$ under $\mathbb Q$. Therefore, combining the above equality with (4.5), we eventually get
\[
(4.6)\qquad \mathbb E\Big[\int_0^T e^{-\rho t}\big(C(X^P_t)\,\mathrm dt + K_+\,\mathrm dP^+_t + K_-\,\mathrm dP^-_t\big)\Big] = \frac{1}{1+\varphi}\,\overline{\mathbb E}\Big[\int_0^T e^{-\rho t}(1+\overline\Phi_t)\big(C(\overline X^{\overline P}_t)\,\mathrm dt + K_+\,\mathrm d\overline P^+_t + K_-\,\mathrm d\overline P^-_t\big)\Big].
\]
Thanks to (4.6), we can now take limits as $T\to\infty$ and obtain, in view of the definitions (2.4) of the control value function and (4.1) of the starting value $\varphi$, that
\[
(4.7)\qquad V(x,\pi) = (1-\pi)\,V\Big(x,\frac{\pi}{1-\pi}\Big),\qquad\text{or equivalently}\qquad V(x,\varphi) = (1+\varphi)\,V\Big(x,\frac{\varphi}{1+\varphi}\Big),
\]
where we define
\[
V(x,\varphi) := \inf_{\overline P\in\overline{\mathcal A}}\overline{\mathbb E}\Big[\int_0^\infty e^{-\rho t}(1+\overline\Phi_t)\big(C(\overline X^{\overline P}_t)\,\mathrm dt + K_+\,\mathrm d\overline P^+_t + K_-\,\mathrm d\overline P^-_t\big)\Big].
\]
Therefore, in order to obtain the value function $V(x,\pi)$ from (2.4), we could instead first solve the above problem to get $V(x,\varphi)$ and then use the equality in (4.7). However, in order to simplify the notation, from now on in the study of $V(x,\varphi)$ we will simply write $(\Omega,\mathcal F,\mathbb F,\mathbb Q,\mathbb E^{\mathbb Q},W,X,\Phi,P,\mathcal A)$ instead of $(\overline\Omega,\overline{\mathcal F},\overline{\mathbb F},\mathbb Q,\overline{\mathbb E},\overline W,\overline X,\overline\Phi,\overline P,\overline{\mathcal A})$.

4.4. The optimal control problem with state-space process $(X^P,\Phi)$ under the new measure $\mathbb Q$. Summarising the results from Sections 4.1–4.3, we henceforth focus on the study of the following optimal control problem
\[
(4.8)\qquad V(x,\varphi) := \inf_{P\in\mathcal A}\mathbb E^{\mathbb Q}\Big[\int_0^\infty e^{-\rho t}(1+\Phi_t)\big(C(X^P_t)\,\mathrm dt + K_+\,\mathrm dP^+_t + K_-\,\mathrm dP^-_t\big)\Big] =: \inf_{P\in\mathcal A}J_{x,\varphi}(P),
\]
under the dynamics
\[
(4.9)\qquad
\begin{cases}
\mathrm dX^P_t = \mu_0\,\mathrm dt + \eta\,\mathrm dW_t + \mathrm dP^+_t - \mathrm dP^-_t, & X^P_{0-}=x\in\mathbb R,\\
\mathrm d\Phi_t = \gamma\Phi_t\,\mathrm dW_t, & \Phi_0=\varphi := \frac{\pi}{1-\pi}\in(0,\infty),
\end{cases}
\]
for a standard Brownian motion $W$. In light of the equality in (4.7), this will lead to the original value function $V(x,\pi)$ from (2.4). In the remainder of Section 4, we expand our study – beyond the values of the control problems – to the relationship between the free boundaries in the two formulations, since these boundaries will eventually define the optimal control strategy (see Section 6).

4.5. The optimal stopping game associated to (4.8)–(4.9) under the new measure $\mathbb Q$. The next result concerns properties of the value function defined in (4.8) and its connection to an associated optimal stopping game. Its proof is omitted for brevity, since it can be obtained by employing arguments similar to those used in the proofs of Propositions 2.3 and 3.1 above.
Proposition 4.1.
Consider the problem defined in (4.8)–(4.9).
(i) There exists an optimal control $P^\star$ solving (4.8). Moreover, $P^\star$ is unique (up to indistinguishability) if $C$ is strictly convex.
(ii) $x\mapsto V(x,\varphi)$ is convex and differentiable, such that $V_x(x,\varphi) = \overline v(x,\varphi)$ on $\mathbb R\times(0,\infty)$, for
\[
(4.10)\qquad \overline v(x,\varphi) := \inf_\sigma\sup_\tau\,\mathbb E^{\mathbb Q}\Big[\int_0^{\tau\wedge\sigma} e^{-\rho t}(1+\Phi_t)C'(X_t)\,\mathrm dt - K_+(1+\Phi_\tau)e^{-\rho\tau}\mathbf 1_{\{\tau<\sigma\}} + K_-(1+\Phi_\sigma)e^{-\rho\sigma}\mathbf 1_{\{\tau>\sigma\}}\Big].
\]
Here, the optimisation is taken over the set of $\mathbb F^W$-stopping times and the state-space process is given by
\[
(4.11)\qquad
\begin{cases}
\mathrm dX_t = \mu_0\,\mathrm dt + \eta\,\mathrm dW_t, & X_0=x\in\mathbb R,\\
\mathrm d\Phi_t = \gamma\Phi_t\,\mathrm dW_t, & \Phi_0=\varphi := \frac{\pi}{1-\pi}\in(0,\infty).
\end{cases}
\]
It further follows from the previous analysis, namely Sections 4.1–4.3, that the value function $v(x,\pi)$ of the optimal stopping game in (3.2) is connected to the value function $\overline v(x,\varphi)$ of the new game introduced above in (4.10) via the equality (see also (4.7) for the control value functions)
\[
(4.12)\qquad \overline v(x,\varphi) = (1+\varphi)\,v\Big(x,\frac{\varphi}{1+\varphi}\Big).
\]
In view of the above relationship, the value function $\overline v(\cdot,\cdot)$ inherits important properties already proved for $v(\cdot,\cdot)$ in Section 3. In particular, we have the following result directly from Proposition 3.1.(ii) and (iv).

Proposition 4.2.
The value function $\overline v$ defined in (4.10) satisfies the following properties:
(i) $(x,\varphi)\mapsto\overline v(x,\varphi)$ is continuous over $\mathbb R\times(0,\infty)$;
(ii) $x\mapsto\overline v(x,\varphi)$ is nondecreasing.

Following similar steps as in Section 3 to study the new game (4.10), we define below the so-called continuation (waiting) region
\[
(4.13)\qquad \mathcal C_2 := \big\{(x,\varphi)\in\mathbb R\times(0,\infty):\ -K_+(1+\varphi) < \overline v(x,\varphi) < K_-(1+\varphi)\big\},
\]
and the stopping region $\mathcal S_2 := \mathcal S_2^+\cup\mathcal S_2^-$, whose components are given by
\[
(4.14)\qquad \mathcal S_2^+ := \big\{(x,\varphi)\in\mathbb R\times(0,\infty):\ \overline v(x,\varphi)\le -K_+(1+\varphi)\big\},
\]
\[
(4.15)\qquad \mathcal S_2^- := \big\{(x,\varphi)\in\mathbb R\times(0,\infty):\ \overline v(x,\varphi)\ge K_-(1+\varphi)\big\}.
\]
Moreover, in light of the continuity of $\overline v$ in Proposition 4.2.(i), we conclude that the continuation region $\mathcal C_2$ is an open set, while the two components $\mathcal S_2^\pm$ of the stopping region are both closed sets. We can therefore define the free boundaries
\[
(4.16)\qquad b_+(\varphi) := \sup\big\{x\in\mathbb R:\ \overline v(x,\varphi)\le -K_+(1+\varphi)\big\},
\]
\[
(4.17)\qquad b_-(\varphi) := \inf\big\{x\in\mathbb R:\ \overline v(x,\varphi)\ge K_-(1+\varphi)\big\}.
\]
Then, using the fact that $\overline v$ is nondecreasing with respect to $x$ (see Proposition 4.2.(ii)), we obtain the structure of the continuation and stopping regions, which take the form
\[
(4.18)\qquad \mathcal C_2 = \big\{(x,\varphi)\in\mathbb R\times(0,\infty):\ b_+(\varphi) < x < b_-(\varphi)\big\},
\]
\[
(4.19)\qquad \mathcal S_2^+ = \big\{(x,\varphi)\in\mathbb R\times(0,\infty):\ x\le b_+(\varphi)\big\}\quad\text{and}\quad \mathcal S_2^- = \big\{(x,\varphi)\in\mathbb R\times(0,\infty):\ x\ge b_-(\varphi)\big\}.
\]
Clearly, the continuity of $\overline v$ implies that these free boundaries $b_\pm$ are strictly separated, namely $b_+(\varphi) < b_-(\varphi)$ for all $\varphi\in(0,\infty)$. Moreover, observe that the relationship in (4.12), together with the definitions (3.4) and (4.13) of $\mathcal C_1$ and $\mathcal C_2$, respectively, implies that the latter two regions are equal under the transformation from $(x,\pi)$- to $(x,\varphi)$-coordinates.
To be more precise, for any $(x,\pi)\in\mathbb R\times(0,1)$, define the transformation
\[
\mathcal T := (\mathcal T_1,\mathcal T_2):\mathbb R\times(0,1)\to\mathbb R\times(0,\infty),\qquad (\mathcal T_1(x,\pi),\mathcal T_2(x,\pi)) = \Big(x,\frac{\pi}{1-\pi}\Big),
\]
which is invertible with inverse given by
\[
\mathcal T^{-1}(x,\varphi) = \Big(x,\frac{\varphi}{1+\varphi}\Big),\qquad (x,\varphi)\in\mathbb R\times(0,\infty).
\]
Hence, $\mathcal T:\mathbb R\times(0,1)\to\mathbb R\times(0,\infty)$ is a global diffeomorphism, which implies, together with the expressions (3.4)–(3.5) and (4.13)–(4.15), that
\[
\mathcal C_2 = \mathcal T(\mathcal C_1)\quad\text{and}\quad \mathcal S_2^\pm = \mathcal T(\mathcal S_1^\pm).
\]
Taking this into account together with the expressions (3.7)–(3.8) of $\mathcal C_1$ and $\mathcal S_1^\pm$, we can further conclude from the expressions (4.18)–(4.19) of $\mathcal C_2$ and $\mathcal S_2^\pm$ that
\[
(4.20)\qquad b_\pm(\varphi) = a_\pm\Big(\frac{\varphi}{1+\varphi}\Big).
\]
Hence, in light of the previously proved results for $a_\pm$ in Proposition 3.2, we also obtain the following preliminary properties of the free boundaries $\varphi\mapsto b_\pm(\varphi)$.
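The change of coordinates just described can be sanity-checked numerically. The sketch below verifies the round trip $\mathcal T^{-1}\circ\mathcal T=\mathrm{id}$ and illustrates, with a hypothetical nonincreasing boundary $a(\pi)$ (an assumption made purely for illustration), why (4.20) transfers monotonicity from $a_\pm$ to $b_\pm$: the map $\varphi\mapsto\varphi/(1+\varphi)$ is increasing.

```python
import numpy as np

# Round-trip check of T(x, pi) = (x, pi/(1-pi)) and T^{-1}(x, phi) = (x, phi/(1+phi)),
# together with the monotonicity transfer behind Proposition 4.3.(i).
def T2(pi):
    return pi / (1.0 - pi)

def T2_inv(phi):
    return phi / (1.0 + phi)

pi = np.linspace(0.01, 0.99, 99)
phi = T2(pi)
assert np.allclose(T2_inv(phi), pi)   # T^{-1} o T = id on (0, 1)
assert np.all(np.diff(phi) > 0)       # pi -> phi is strictly increasing

# Composing any nonincreasing a(pi) with the increasing map phi -> phi/(1+phi)
# yields a nonincreasing b(phi) = a(phi/(1+phi)), as in (4.20).
a = lambda p: 1.0 - p                 # hypothetical nonincreasing boundary
b = a(T2_inv(phi))
assert np.all(np.diff(b) <= 0)
```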
Proposition 4.3. The free boundaries $b_\pm$ defined in (4.16)–(4.17) satisfy the following properties:
(i) $b_\pm(\cdot)$ are nonincreasing on $(0,\infty)$.
(ii) $b_+(\cdot)$ is left-continuous and $b_-(\cdot)$ is right-continuous on $(0,\infty)$.
(iii) $b_\pm(\cdot)$ are bounded by $x^*_\pm$ as in Proposition 3.2:
\[
x^*_+\le b_+(\varphi) < b_-(\varphi)\le x^*_-,\qquad \forall\,\varphi\in(0,\infty).
\]
Moreover, we have $b_+(\varphi)\le (C')^{-1}(-\rho K_+)$ and $b_-(\varphi)\ge (C')^{-1}(\rho K_-)$ for all $\varphi\in(0,\infty)$.

Notice that the explicit relationship (4.20) between the free boundaries $a_\pm$ and $b_\pm$ proved above is not only crucial for retrieving the original boundaries $a_\pm$ from $b_\pm$, but is also particularly useful in the proof of Proposition 4.3.(i) and (iii). In fact, proving the monotonicity and boundedness of $b_\pm$ by working directly on the Dynkin game (4.10) is not a straightforward task.

Up to this point, we have obtained the structure of the optimal stopping strategies and preliminary properties of the corresponding optimal stopping boundaries associated with these strategies, for both Dynkin games (3.2) and (4.10), connected to the optimal control problems (2.4) and (4.8), respectively. Moreover, we have obtained some regularity results for the value functions of the latter control problems (see Propositions 3.1, 4.1 and 4.2). In Sections 5 and 6 below, building on the aforementioned analysis, we show that the control value function $V$ has the sufficient regularity needed to construct an optimal control strategy. This will involve the boundaries $b_\pm$.

5. HJB Equation and Regularity of $V$

In this section, we introduce the Hamilton-Jacobi-Bellman (HJB) equation (variational inequality) associated to the control value function $V$ defined in (4.8) and the state-space process $(X^P,\Phi)$ given by (4.9). First, let
$D\subseteq\mathbb R^2$ be an open domain and define the space $C^{k,h}(D;\mathbb R)$ as the space of functions $f:D\to\mathbb R$ which are $k$-times continuously differentiable with respect to the first variable and $h$-times continuously differentiable with respect to the second variable. When $k=h$ we simply write $C^h$. We begin our study with the following ex ante regularity result for $V$. Its proof can be found in the Appendix.

Proposition 5.1.
The control value function $V$ defined in (4.8) is locally semiconcave; that is, for every $R>0$ there exists $L_R>0$ such that, for all $\lambda\in[0,1]$ and all $(x,\varphi),(x',\varphi')$ with $|(x,\varphi)|\le R$ and $|(x',\varphi')|\le R$, we have
\[
\lambda V(x,\varphi) + (1-\lambda)V(x',\varphi') - V\big(\lambda(x,\varphi) + (1-\lambda)(x',\varphi')\big) \le L_R\,\lambda(1-\lambda)\,|(x,\varphi)-(x',\varphi')|^2.
\]
In particular, by [5, Theorem 2.17], we conclude that $V$ is locally Lipschitz.

Given the locally Lipschitz continuity proved in the previous result, we now aim at employing the HJB equation to investigate further regularity of $V$. To that end, we define, for $f\in C^2(\mathbb R\times(0,\infty);\mathbb R)$, the second-order differential operator
\[
(5.1)\qquad \mathcal Lf(x,\varphi) := \mu_0 f_x(x,\varphi) + \frac12\big(\eta^2 f_{xx}(x,\varphi) + \gamma^2\varphi^2 f_{\varphi\varphi}(x,\varphi) + 2\gamma\eta\varphi f_{x\varphi}(x,\varphi)\big).
\]
By the dynamic programming principle, we expect that $V$ solves (in a suitable sense) the HJB equation (in the form of a variational inequality)
\[
(5.2)\qquad \max\Big\{(\rho-\mathcal L)u(x,\varphi) - (1+\varphi)C(x),\ -u_x(x,\varphi) - K_+(1+\varphi),\ u_x(x,\varphi) - K_-(1+\varphi)\Big\} = 0,
\]
for $(x,\varphi)\in\mathbb R\times(0,\infty)$. In particular, we first show that the value function $V$ of the control problem defined in (4.8) is a viscosity solution to (5.2). We present the formal definition of the latter notion below.

Definition 5.2.
A function $u\in C^0(\mathbb R\times(0,\infty);\mathbb R)$ is called a viscosity solution to (5.2) if it is both a viscosity subsolution and a viscosity supersolution, where:
(i) a function $u\in C^0(\mathbb R\times(0,\infty);\mathbb R)$ is called a viscosity subsolution to (5.2) if, for every $(x,\varphi)\in\mathbb R\times(0,\infty)$ and every $\beta\in C^2(\mathbb R\times(0,\infty);\mathbb R)$ such that $u-\beta$ attains a local maximum at $(x,\varphi)$, it holds
\[
\max\big\{(\rho-\mathcal L)\beta(x,\varphi) - (1+\varphi)C(x),\ -\beta_x(x,\varphi)-K_+(1+\varphi),\ \beta_x(x,\varphi)-K_-(1+\varphi)\big\}\le 0;
\]
(ii) a function $u\in C^0(\mathbb R\times(0,\infty);\mathbb R)$ is called a viscosity supersolution to (5.2) if, for every $(x,\varphi)\in\mathbb R\times(0,\infty)$ and every $\beta\in C^2(\mathbb R\times(0,\infty);\mathbb R)$ such that $u-\beta$ attains a local minimum at $(x,\varphi)$, it holds
\[
\max\big\{(\rho-\mathcal L)\beta(x,\varphi) - (1+\varphi)C(x),\ -\beta_x(x,\varphi)-K_+(1+\varphi),\ \beta_x(x,\varphi)-K_-(1+\varphi)\big\}\ge 0.
\]
Following the arguments developed in Theorem 5.1 in Section VIII.5 of [21], and using the a priori regularity obtained in Proposition 5.1, one can show the following classical result.
Proposition 5.3.
The value function $V$ defined in (4.8) is a locally Lipschitz continuous viscosity solution to (5.2).

Recall the definition (4.13) of the continuation region $\mathcal C_2$ of the game $\overline v(x,\varphi)$ in (4.10), and the relationship $V_x(x,\varphi) = \overline v(x,\varphi)$ on $\mathbb R\times(0,\infty)$ from Proposition 4.1.(ii), to observe that
\[
(5.3)\qquad \mathcal C_2 = \big\{(x,\varphi)\in\mathbb R\times(0,\infty):\ -K_+(1+\varphi) < V_x(x,\varphi) < K_-(1+\varphi)\big\}.
\]
This implies that $\mathcal C_2$ also identifies with the so-called "inaction region" of $V$, as suggested also by the HJB equation (5.2). Combining the latter fact with Proposition 5.3 clearly implies the following result.

Corollary 5.4.
The value function $V$ defined in (4.8) is a locally Lipschitz continuous viscosity solution to
\[
(\rho-\mathcal L)u(x,\varphi) - (1+\varphi)C(x) = 0,\qquad \text{for all }(x,\varphi)\in\mathcal C_2.
\]
The result in Corollary 5.4 will be used in the forthcoming analysis to upgrade the regularity of the value function in the closure of its inaction region, which is the main goal of Section 5. Before reaching this (final) step of our analysis in this section, we need to further prove that $V$ is actually globally continuously differentiable. We present this result in the following proposition, which is proved by using once again Proposition 5.3, together with the properties of $V$ proved in Proposition 5.1 and in Section 4.5.

Proposition 5.5.
The value function $V$ defined in (4.8) satisfies $V\in C^1(\mathbb R\times(0,\infty);\mathbb R)$.

Proof. In order to prove that $V\in C^1(\mathbb R\times(0,\infty);\mathbb R)$, we need to prove that both (classical) derivatives $V_x(x,\varphi)$ and $V_\varphi(x,\varphi)$ of $V(x,\varphi)$ in the directions $x$ and $\varphi$, respectively, are continuous on $\mathbb R\times(0,\infty)$. We therefore split the proof of the desired claim into the following two steps.

Step 1. Continuity of $V_x$. We already know from Proposition 4.1.(ii) that $V_x = \overline v$ exists, and from Proposition 4.2.(i) that $(x,\varphi)\mapsto\overline v(x,\varphi)$ is continuous over $\mathbb R\times(0,\infty)$. Hence, we conclude that $(x,\varphi)\mapsto V_x(x,\varphi)$ is continuous on $\mathbb R\times(0,\infty)$.

Step 2. Continuity of $V_\varphi$. Let us now show that the (classical) derivative $V_\varphi$ exists at each $(x_o,\varphi_o)\in\mathbb R\times(0,\infty)$. We assume, without loss of generality, that $V$ is actually concave in a neighborhood $I$ of $(x_o,\varphi_o)$. (This can be done by replacing the (locally) semiconcave $V(x,\varphi)$ by $W(x,\varphi) := V(x,\varphi) - C|(x-x_o,\varphi-\varphi_o)|^2$ for a suitable $C>0$.) Then, by concavity of $V$ in $I$, the right- and left-derivatives of $V$ in the $\varphi$-direction exist at $(x_o,\varphi_o)$. We denote these derivatives by $V_\varphi^+(x_o,\varphi_o)$ and $V_\varphi^-(x_o,\varphi_o)$, respectively, and due to concavity they satisfy the inequality $V_\varphi^-(x_o,\varphi_o)\ge V_\varphi^+(x_o,\varphi_o)$. Then, in order to show that $V_\varphi$ exists, it suffices to show that the strict inequality $V_\varphi^-(x_o,\varphi_o) > V_\varphi^+(x_o,\varphi_o)$ cannot hold. Aiming for a contradiction, we assume henceforth that $V_\varphi^-(x_o,\varphi_o) > V_\varphi^+(x_o,\varphi_o)$ does hold true.

It follows from [36, Theorem 23.4] and the fact that $V_x$ exists and is continuous (cf. Step 1 above) that there exist vectors
\[
\zeta := (V_x(x_o,\varphi_o),\zeta_\varphi),\quad \eta := (V_x(x_o,\varphi_o),\eta_\varphi)\in D^+V(x_o,\varphi_o)\quad\text{such that}\quad \zeta_\varphi < \eta_\varphi,
\]
where we denote by $D^+V(x_o,\varphi_o)$ the superdifferential of $V$ at $(x_o,\varphi_o)$. For any $(x,\varphi)\in I$, we then define
\[
g(x,\varphi) := V(x_o,\varphi_o) + V_x(x_o,\varphi_o)(x-x_o) + \big(\eta_\varphi(\varphi-\varphi_o)\big)\wedge\big(\zeta_\varphi(\varphi-\varphi_o)\big)
\]
and notice that $V(x_o,\varphi_o) = g(x_o,\varphi_o)$, while we also get by concavity that $V(x,\varphi)\le g(x,\varphi)$ for all $(x,\varphi)\in I$. Next, we consider the sequence of functions $(f^n)_{n\in\mathbb N}\subset C^2(\mathbb R\times(0,\infty);\mathbb R)$ defined by
\[
f^n(x,\varphi) := g(x,\varphi_o) + \frac12(\eta_\varphi+\zeta_\varphi)(\varphi-\varphi_o) - \frac n2(\varphi-\varphi_o)^2,\qquad \forall\,n\in\mathbb N.
\]
Such a sequence satisfies the following collection of properties:
\[
(5.4)\qquad
\begin{aligned}
&f^n(x_o,\varphi_o) = g(x_o,\varphi_o) = V(x_o,\varphi_o),\qquad \forall\,n\in\mathbb N,\\
&f^n\ge V \text{ in a neighborhood of }(x_o,\varphi_o),\qquad \forall\,n\in\mathbb N,\\
&f^n_x(x_o,\varphi_o) = V_x(x_o,\varphi_o),\quad f^n_{xx}(x_o,\varphi_o) = 0 = f^n_{x\varphi}(x_o,\varphi_o),\quad f^n_{\varphi\varphi}(x_o,\varphi_o) = -n,\qquad \forall\,n\in\mathbb N.
\end{aligned}
\]
Then, using the viscosity subsolution property (cf. Definition 5.2.(i)) of $V$ at $(x_o,\varphi_o)$ yields
\[
0\ge (\rho-\mathcal L)f^n(x_o,\varphi_o) - (1+\varphi_o)C(x_o) \xrightarrow{\,n\to\infty\,} +\infty,
\]
which gives the desired contradiction. Hence, by the arbitrariness of $(x_o,\varphi_o)$, we have that $V$ is differentiable in the direction $\varphi$. In view of the aforementioned differentiability in the direction $\varphi$ and the semiconcavity of $V$ (cf. Proposition 5.1), we conclude from [36, Theorem 25.5] that $V_\varphi$ is continuous on $\mathbb R\times(0,\infty)$. $\square$

We are now ready to show the final result of this section, namely to upgrade the regularity of the value function of the control problem to the minimal regularity required for constructing a candidate optimal control policy and verifying its optimality in Section 6. To this end, we define for any $(x,\varphi)\in\mathbb R\times(0,\infty)$ the transformation
\[
(5.5)\qquad \mathcal T := (\mathcal T_1,\mathcal T_2):\mathbb R\times(0,\infty)\to\mathbb R^2,\qquad (\mathcal T_1(x,\varphi),\mathcal T_2(x,\varphi)) = \Big(x,\,x-\frac{\eta}{\gamma}\log(\varphi)\Big),
\]
which is invertible with inverse given by
\[
\mathcal T^{-1}(x,y) = \big(x,\,e^{\frac{\gamma}{\eta}(x-y)}\big),\qquad (x,y)\in\mathbb R^2.
\]
Using the latter inverse transformation, we introduce the transformed version $\widehat V(x,y)$ of the value function $V(x,\varphi)$ defined in (4.8) by
\[
(5.6)\qquad \widehat V(x,y) := V\big(x,e^{\frac{\gamma}{\eta}(x-y)}\big),\qquad (x,y)\in\mathbb R^2.
\]
Moreover, direct calculations yield that
\[
(5.7)\qquad \widehat V_x(x,y) + \widehat V_y(x,y) = V_x\big(x,e^{\frac{\gamma}{\eta}(x-y)}\big),\qquad (x,y)\in\mathbb R^2.
\]
Given that $\mathcal T:\mathbb R\times(0,\infty)\to\mathbb R^2$ is a global diffeomorphism, we have from (5.3) and (5.7) that the open set
\[
(5.8)\qquad \widehat{\mathcal C} := \big\{(x,y)\in\mathbb R^2:\ -K_+\big(1+e^{\frac{\gamma}{\eta}(x-y)}\big) < \big(\widehat V_x+\widehat V_y\big)(x,y) < K_-\big(1+e^{\frac{\gamma}{\eta}(x-y)}\big)\big\} = \mathcal T(\mathcal C_2).
\]
Finally, define the second-order linear differential operator, on $f\in C^{2,1}(\mathbb R^2;\mathbb R)$, by
\[
(5.9)\qquad \mathcal L^{X,Y}f(x,y) := \frac12\eta^2 f_{xx}(x,y) + \mu_0 f_x(x,y) + \frac12(\mu_1+\mu_0)f_y(x,y).
\]
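The cancellation behind (5.7) – the $\varphi$-derivatives of $V$ entering $\widehat V_x$ and $\widehat V_y$ with opposite signs – can be confirmed symbolically. The sketch below uses one arbitrary concrete test function in place of $V$ (an assumption for illustration only; it is not the value function), so it is a consistency check of the chain rule, not a proof.

```python
import sympy as sp

# Chain-rule check of (5.7): for hat_V(x, y) = V(x, exp((gamma/eta)(x - y))),
# one has hat_V_x + hat_V_y = V_x(x, exp((gamma/eta)(x - y))), because
# d(phi)/dx = (gamma/eta)*phi and d(phi)/dy = -(gamma/eta)*phi cancel in the sum.
x, y, g, e = sp.symbols('x y gamma eta', positive=True)
a, b = sp.symbols('a b')
V = a**3 * sp.sin(b) + sp.exp(a) * b**2     # arbitrary concrete stand-in for V(a, b)
phi = sp.exp((g / e) * (x - y))

hatV = V.subs({a: x, b: phi})               # hat_V(x, y) as in (5.6)
lhs = sp.diff(hatV, x) + sp.diff(hatV, y)   # hat_V_x + hat_V_y
rhs = sp.diff(V, a).subs({a: x, b: phi})    # V_x evaluated at (x, phi)

assert sp.simplify(lhs - rhs) == 0
```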
Proposition 5.6. The transformed value function $\widehat V$ defined in (5.6) satisfies $\widehat V\in C^{2,1}(\overline{\widehat{\mathcal C}};\mathbb R)$, where $\overline{\widehat{\mathcal C}}$ denotes the closure of the open set $\widehat{\mathcal C}$ defined in (5.8). In addition, $\widehat V$ is a classical solution to
\[
(5.10)\qquad \big(\rho-\mathcal L^{X,Y}\big)u(x,y) = C(x)\big(1+e^{\frac{\gamma}{\eta}(x-y)}\big),\qquad \text{for all }(x,y)\in\widehat{\mathcal C}.
\]
Proof.
First of all, due to Corollary 5.4 and the expression of the transformed value function in (5.6), one can easily verify via (5.8) that $\widehat V$ is a viscosity solution to (5.10) on $\widehat{\mathcal C}$. Then, in light of Proposition 5.5 and the above smooth transformation, we also obtain that $\widehat V\in C^1(\mathbb R^2;\mathbb R)$. By a standard localisation argument, based on the fact that $\widehat V$ is a continuously differentiable viscosity solution to (5.10) on $\widehat{\mathcal C}$ and on results for Dirichlet boundary problems involving partial differential equations of parabolic type (see [31]), we have that actually $\widehat V\in C^{2,1}(\widehat{\mathcal C};\mathbb R)$ and solves (5.10) on $\widehat{\mathcal C}$ in a classical sense. Hence,
\[
(5.11)\qquad \frac12\eta^2\widehat V_{xx}(x,y) = -C(x)\big(1+e^{\frac{\gamma}{\eta}(x-y)}\big) + \rho\widehat V(x,y) - \mu_0\widehat V_x(x,y) - \frac12(\mu_1+\mu_0)\widehat V_y(x,y),
\]
for all $(x,y)\in\widehat{\mathcal C}$. However, since we know that $\widehat V\in C^1(\mathbb R^2;\mathbb R)$ and since the right-hand side of (5.11) only involves functions that are continuous on $\mathbb R^2$, we conclude that $\widehat V_{xx}$ admits a continuous extension to $\overline{\widehat{\mathcal C}}$. This completes the proof of the claim. $\square$

6. Verification Theorem and Optimal Control
Given the regularity of $\widehat V$ obtained in Proposition 5.6 and the relation (5.6) between $\widehat V$ and the value function $V$ defined in (4.8), we are now able to prove a verification theorem. Namely, we provide in this section the optimal control for $V$ in terms of the boundaries $b_\pm$ defined in (4.16)–(4.17). Before we commence the analysis, recall also the properties of the latter boundaries proved in Proposition 4.3.

6.1. Construction of the control $\widehat P$ for the state-space process $(X^{\widehat P},\Phi)$. For any given $(x,\varphi)\in\mathbb R\times(0,\infty)$, we define the admissible control strategy $\widehat P := \widehat P^+ - \widehat P^-$ such that the following couple of properties holds true:
\[
(6.1)\qquad b_+(\Phi_t)\le X^{\widehat P}_t\le b_-(\Phi_t),\ \ \mathbb Q\otimes\mathrm dt\text{-a.e.};\qquad \widehat P^+_t = \int_{[0,t]}\mathbf 1_{\{X^{\widehat P}_s\le b_+(\Phi_s)\}}\,\mathrm d\widehat P^+_s\ \ \text{and}\ \ \widehat P^-_t = \int_{[0,t]}\mathbf 1_{\{X^{\widehat P}_s\ge b_-(\Phi_s)\}}\,\mathrm d\widehat P^-_s,\quad t\ge 0.
\]
In practice, according to the aforementioned strategy, a lump-sum increase or decrease of the inventory process $X$ may be required at time $t\ge 0$ whenever the inventory level $X_{t-}$ happens to be either strictly below the boundary $b_+(\Phi_t)$ or strictly above the boundary $b_-(\Phi_t)$, respectively. At each such $t$, at most one of the controls $\widehat P^\pm_t$ jumps, with size either $(b_+(\Phi_t)-X^{\widehat P}_{t-})^+$ or $(X^{\widehat P}_{t-}-b_-(\Phi_t))^+$, in order to bring the inventory level $X_t$ immediately inside the interval $[b_+(\Phi_t),b_-(\Phi_t)]$. Mathematically, these are the actions caused at any time $t\ge 0$ by the jump parts $\Delta\widehat P^\pm_t := \widehat P^\pm_t - \widehat P^\pm_{t-}$ of the controls $\widehat P^\pm$. Then, the strategy prescribes taking action (increasing or decreasing the inventory) whenever the inventory process $X_t$ approaches, at any time $t\ge 0$, either the boundary $b_+(\Phi_t)$ from above or the boundary $b_-(\Phi_t)$ from below. The purpose of these actions is now to make sure (with minimal effort) that the inventory level $X_t$ is kept inside the interval $[b_+(\Phi_t),b_-(\Phi_t)]$. Mathematically, these actions are caused by the continuous parts of the respective controls $\widehat P^\pm$ and are the so-called Skorokhod reflection-type policies.

Given that the dynamics of $X^{\widehat P}$ and $\Phi$ are decoupled (cf. (4.9)), the solution triplet $(X^{\widehat P}_t,\Phi_t,\widehat P_t)_{t\ge 0}$ of the Skorokhod reflection problem at the boundaries $b_\pm$ can be constructed as in [20, Section 4.3]. It further follows from (6.1) above, together with the definitions (4.16)–(4.17) of the boundaries $b_\pm$, the region $\mathcal C_2$ from (4.18) and the fact that $\overline v = V_x$ from Proposition 4.1.(ii), that the nondecreasing processes $\widehat P^\pm$ are such that the state-space process $(X^{\widehat P},\Phi)$ and the induced (random) measures $\mathrm d\widehat P^\pm$ on $\mathbb R_+$ satisfy:
\[
(6.2)\qquad
\begin{aligned}
&(X^{\widehat P}_t,\Phi_t)\in\overline{\mathcal C}_2,\quad \mathbb Q\otimes\mathrm dt\text{-a.e., with } \mathcal C_2 \text{ as in (4.18)};\\
&\mathrm d\widehat P^+ \text{ has support on } \big\{t\ge 0:\ V_x(X^{\widehat P}_t,\Phi_t)\le -K_+(1+\Phi_t)\big\};\\
&\mathrm d\widehat P^- \text{ has support on } \big\{t\ge 0:\ V_x(X^{\widehat P}_t,\Phi_t)\ge K_-(1+\Phi_t)\big\}.
\end{aligned}
\]
6.2. Transformation of the controlled process $(X^{\widehat P},\Phi)$ to $(X^{\widehat P},Y^{\widehat P})$. We now use the transformation (5.5) from $(x,\varphi)$- to $(x,y)$-coordinates in order to define the controlled process
\[
(6.3)\qquad Y^{\widehat P}_t := X^{\widehat P}_t - \frac{\eta}{\gamma}\log(\Phi_t),\qquad t\ge 0,
\]
whose dynamics are given via the Itô–Meyer formula by
\[
\mathrm dY^{\widehat P}_t = \frac12(\mu_1+\mu_0)\,\mathrm dt + \mathrm d\widehat P^+_t - \mathrm d\widehat P^-_t,\qquad Y^{\widehat P}_{0-} = y := x - \frac{\eta}{\gamma}\log(\varphi).
\]
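The reflection-type policy and the deterministic drift of $Y$ can be illustrated with a discrete-time sketch. The boundaries $b_\pm$ below are hypothetical nonincreasing functions and all parameter values are illustrative assumptions; at each step the minimal control increment pushing $X$ back into $[b_+(\Phi),b_-(\Phi)]$ is applied, mimicking (6.1).

```python
import numpy as np

# Discrete-time sketch of the reflection policy (6.1), with Phi stepped exactly via
# Phi_t = phi * exp(gamma*W_t - gamma^2 t / 2) (solution of dPhi = gamma*Phi*dW).
rng = np.random.default_rng(2)
mu0, mu1, eta = -0.5, 1.0, 0.8            # illustrative values
gamma = (mu1 - mu0) / eta
b_plus = lambda p: -1.0 - 0.1 * p / (1 + p)   # hypothetical lower boundary
b_minus = lambda p: 1.0 - 0.1 * p / (1 + p)   # hypothetical upper boundary

T, n = 1.0, 5000
dt = T / n
x, phi = 0.0, 1.0
P_up = P_down = 0.0                           # cumulative dP+ and dP-
for _ in range(n):
    dw = rng.normal(0.0, np.sqrt(dt))
    phi *= np.exp(gamma * dw - 0.5 * gamma**2 * dt)
    x += mu0 * dt + eta * dw                  # uncontrolled move of X under Q
    up = max(b_plus(phi) - x, 0.0)            # minimal push up (dP+)
    down = max(x - b_minus(phi), 0.0)         # minimal push down (dP-)
    x += up - down
    P_up += up
    P_down += down
    assert b_plus(phi) - 1e-12 <= x <= b_minus(phi) + 1e-12

# Y = X - (eta/gamma) log(Phi): the Brownian terms cancel, so apart from the
# control increments Y moves deterministically with drift (mu0 + mu1)/2, cf. (6.3).
y = x - (eta / gamma) * np.log(phi)
y0 = 0.0 - (eta / gamma) * np.log(1.0)
assert abs(y - (y0 + 0.5 * (mu0 + mu1) * T + P_up - P_down)) < 1e-9
```

The final assertion checks, along the simulated path, exactly the $Y$-dynamics displayed above: all randomness in $Y$ enters only through the controls $\widehat P^\pm$.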
Recalling the transformed value function (5.6) and the relation in (5.7), we have
\[
(6.4)\qquad \widehat V(X^{\widehat P}_t,Y^{\widehat P}_t) = V\big(X^{\widehat P}_t,e^{\frac{\gamma}{\eta}(X^{\widehat P}_t-Y^{\widehat P}_t)}\big)\quad\text{and}\quad \big(\widehat V_x+\widehat V_y\big)(X^{\widehat P}_t,Y^{\widehat P}_t) = V_x\big(X^{\widehat P}_t,e^{\frac{\gamma}{\eta}(X^{\widehat P}_t-Y^{\widehat P}_t)}\big),
\]
under the dynamics
\[
(6.5)\qquad
\begin{cases}
\mathrm dX^{\widehat P}_t = \mu_0\,\mathrm dt + \eta\,\mathrm dW_t + \mathrm d\widehat P^+_t - \mathrm d\widehat P^-_t, & X^{\widehat P}_{0-}=x\in\mathbb R,\\
\mathrm dY^{\widehat P}_t = \frac12(\mu_1+\mu_0)\,\mathrm dt + \mathrm d\widehat P^+_t - \mathrm d\widehat P^-_t, & Y^{\widehat P}_{0-}=y := x-\frac{\eta}{\gamma}\log(\varphi)\in\mathbb R.
\end{cases}
\]
Hence, in light of (6.4)–(6.5), we can express the control $\widehat P$ defined in Section 6.1 in terms of the state-space process $(X^{\widehat P},Y^{\widehat P})$ via
\[
(6.6)\qquad
\begin{aligned}
&(X^{\widehat P}_t,Y^{\widehat P}_t)\in\overline{\widehat{\mathcal C}},\quad \mathbb Q\otimes\mathrm dt\text{-a.e., where } \widehat{\mathcal C} \text{ is defined in (5.8)};\\
&\mathrm d\widehat P^+ \text{ has support on } \big\{t\ge 0:\ \big(\widehat V_x+\widehat V_y\big)(X^{\widehat P}_t,Y^{\widehat P}_t)\le -K_+\big(1+e^{\frac{\gamma}{\eta}(X^{\widehat P}_t-Y^{\widehat P}_t)}\big)\big\};\\
&\mathrm d\widehat P^- \text{ has support on } \big\{t\ge 0:\ \big(\widehat V_x+\widehat V_y\big)(X^{\widehat P}_t,Y^{\widehat P}_t)\ge K_-\big(1+e^{\frac{\gamma}{\eta}(X^{\widehat P}_t-Y^{\widehat P}_t)}\big)\big\}.
\end{aligned}
\]
6.3. Optimality of the control $\widehat P$. In this section we prove the optimality of the control $\widehat P$ defined through (6.1), which is equivalently expressed by (6.2) in terms of the state-space process $(X^{\widehat P},\Phi)$ and by (6.6) in terms of the state-space process $(X^{\widehat P},Y^{\widehat P})$; see Sections 6.1–6.2.

Theorem 6.1 (Verification Theorem). The admissible control $\widehat P\in\mathcal A$ defined through (6.1) (see also (6.2) and (6.6)) is optimal for problem (4.8). Moreover, $\widehat P$ is the unique optimal control (up to indistinguishability) if $C$ is strictly convex.

Proof. Recall that $\widehat V\in C^{2,1}(\overline{\widehat{\mathcal C}};\mathbb R)$ by Proposition 5.6, where $\widehat V$ is the transformed value function in (5.6).
By the Tietze extension theorem, it can be extended to a function $\tilde V \in C^{2,1}(\mathbb{R}^2; \mathbb{R})$. Let now $(X^{\hat P}_{0-}, Y^{\hat P}_{0-}) = (x,y) \equiv (x, x - \eta\log(\varphi)/\gamma) \in \mathcal{C}_3$ be given and fixed, and define $\tau_n := \inf\{t \geq 0 : |(X^{\hat P}_t, Y^{\hat P}_t)| > n\} \wedge n$, for $n \in \mathbb{N}$, with the state-space process $(X^{\hat P}, Y^{\hat P})$ as defined in (6.5). Then, noticing that $(X^{\hat P}_t, Y^{\hat P}_t) \in \mathcal{C}_3$, $\mathbb{Q}$-a.s. for all $t \geq 0$, and that $\tilde V = \hat V$ on $\mathcal{C}_3$, we can apply Dynkin's formula to the process $e^{-\rho t}\tilde V(X^{\hat P}_t, Y^{\hat P}_t)$ on the (random) time interval $[0, \tau_n]$, obtaining

(6.7) $\hat V(x,y) = \mathbb{E}^{\mathbb{Q}}\Big[e^{-\rho\tau_n}\hat V(X^{\hat P}_{\tau_n}, Y^{\hat P}_{\tau_n})\Big] - \mathbb{E}^{\mathbb{Q}}\Big[\int_0^{\tau_n} e^{-\rho s}\big(\mathcal{L}^{X,Y} - \rho\big)\hat V(X^{\hat P}_s, Y^{\hat P}_s)\,\mathrm{d}s\Big] - \mathbb{E}^{\mathbb{Q}}\Big[\int_0^{\tau_n} e^{-\rho s}\big(\hat V_x + \hat V_y\big)(X^{\hat P}_s, Y^{\hat P}_s)\,\mathrm{d}\hat P^{c}_s\Big] - \mathbb{E}^{\mathbb{Q}}\Big[\sum_{0 \leq s \leq \tau_n} e^{-\rho s}\big(\hat V(X^{\hat P}_s, Y^{\hat P}_s) - \hat V(X^{\hat P}_{s-}, Y^{\hat P}_{s-})\big)\Big],$

where $\hat P^c$ denotes the continuous part of $\hat P$ and the final sum is non-zero only for the (at most countably many) times $s$ such that $\Delta\hat P_s := \hat P_s - \hat P_{s-} \neq 0$. Clearly, $\Delta\hat P_s = \Delta\hat P^+_s - \Delta\hat P^-_s$, where $\Delta\hat P^\pm_s := \hat P^\pm_s - \hat P^\pm_{s-}$, and notice that

(6.8) $\sum_{0 \leq s \leq \tau_n} e^{-\rho s}\big(\hat V(X^{\hat P}_s, Y^{\hat P}_s) - \hat V(X^{\hat P}_{s-}, Y^{\hat P}_{s-})\big) = \sum_{0 \leq s \leq \tau_n} e^{-\rho s}\int_0^{\Delta\hat P^+_s}\big(\hat V_x + \hat V_y\big)(X^{\hat P}_{s-} + u, Y^{\hat P}_{s-} + u)\,\mathrm{d}u - \sum_{0 \leq s \leq \tau_n} e^{-\rho s}\int_0^{\Delta\hat P^-_s}\big(\hat V_x + \hat V_y\big)(X^{\hat P}_{s-} - u, Y^{\hat P}_{s-} - u)\,\mathrm{d}u.$

Hence, plugging (6.8) into (6.7) and using (5.10), we obtain

(6.9) $\hat V(x,y) = \mathbb{E}^{\mathbb{Q}}\Big[e^{-\rho\tau_n}\hat V(X^{\hat P}_{\tau_n}, Y^{\hat P}_{\tau_n})\Big] + \mathbb{E}^{\mathbb{Q}}\Big[\int_0^{\tau_n} e^{-\rho s}\big(1 + e^{\frac{\gamma}{\eta}(X^{\hat P}_s - Y^{\hat P}_s)}\big)C(X^{\hat P}_s)\,\mathrm{d}s\Big] - \mathbb{E}^{\mathbb{Q}}\Big[\int_0^{\tau_n} e^{-\rho s}\big(\hat V_x + \hat V_y\big)(X^{\hat P}_s, Y^{\hat P}_s)\,\mathrm{d}\big(\hat P^{+,c}_s - \hat P^{-,c}_s\big)\Big] - \mathbb{E}^{\mathbb{Q}}\Big[\sum_{0 \leq s \leq \tau_n} e^{-\rho s}\int_0^{\Delta\hat P^+_s}\big(\hat V_x + \hat V_y\big)(X^{\hat P}_{s-} + u, Y^{\hat P}_{s-} + u)\,\mathrm{d}u - \sum_{0 \leq s \leq \tau_n} e^{-\rho s}\int_0^{\Delta\hat P^-_s}\big(\hat V_x + \hat V_y\big)(X^{\hat P}_{s-} - u, Y^{\hat P}_{s-} - u)\,\mathrm{d}u\Big].$

Using now the nonnegativity of $\hat V$ as well as the second and third properties of the control $\hat P$ in (6.6), we see that (6.9) becomes

$\hat V(x,y) \geq \mathbb{E}^{\mathbb{Q}}\Big[\int_0^{\tau_n} e^{-\rho s}\big(1 + e^{\frac{\gamma}{\eta}(X^{\hat P}_s - Y^{\hat P}_s)}\big)C(X^{\hat P}_s)\,\mathrm{d}s\Big] + \mathbb{E}^{\mathbb{Q}}\Big[\int_0^{\tau_n} e^{-\rho s}K_+\big(1 + e^{\frac{\gamma}{\eta}(X^{\hat P}_s - Y^{\hat P}_s)}\big)\,\mathrm{d}\hat P^+_s + \int_0^{\tau_n} e^{-\rho s}K_-\big(1 + e^{\frac{\gamma}{\eta}(X^{\hat P}_s - Y^{\hat P}_s)}\big)\,\mathrm{d}\hat P^-_s\Big].$

Then, we take limits as $n \uparrow \infty$ and invoke Fatou's lemma (given the nonnegativity of all the integrands above) to find that

(6.10) $\hat V(x,y) \geq \mathbb{E}^{\mathbb{Q}}\Big[\int_0^{\infty} e^{-\rho s}\big(1 + e^{\frac{\gamma}{\eta}(X^{\hat P}_s - Y^{\hat P}_s)}\big)C(X^{\hat P}_s)\,\mathrm{d}s\Big] + \mathbb{E}^{\mathbb{Q}}\Big[\int_0^{\infty} e^{-\rho s}K_+\big(1 + e^{\frac{\gamma}{\eta}(X^{\hat P}_s - Y^{\hat P}_s)}\big)\,\mathrm{d}\hat P^+_s + \int_0^{\infty} e^{-\rho s}K_-\big(1 + e^{\frac{\gamma}{\eta}(X^{\hat P}_s - Y^{\hat P}_s)}\big)\,\mathrm{d}\hat P^-_s\Big].$

Given now that $X^{\hat P} - Y^{\hat P} = \eta\log(\Phi)/\gamma$ by definition (6.3), and that (5.6) yields $\hat V(x,y) = \hat V(x, x - \eta\log(\varphi)/\gamma) = V(x,\varphi)$, we further conclude from (6.10) that, for any $(x,\varphi) \in \mathcal{C}_2$ (as we had assumed $(x,y) \equiv (x, x - \eta\log(\varphi)/\gamma) \in \mathcal{C}_3$),

(6.11) $V(x,\varphi) \geq \mathbb{E}^{\mathbb{Q}}\Big[\int_0^{\infty} e^{-\rho s}\big(1 + \Phi_s\big)C(X^{\hat P}_s)\,\mathrm{d}s + \int_0^{\infty} e^{-\rho s}\big(1 + \Phi_s\big)\big(K_+\,\mathrm{d}\hat P^+_s + K_-\,\mathrm{d}\hat P^-_s\big)\Big] = \mathcal{J}_{x,\varphi}(\hat P).$

Combining this inequality with the definition (4.8), i.e. $V(x,\varphi) \leq \mathcal{J}_{x,\varphi}(\hat P)$, we conclude that $\hat P$ is an optimal control for any $(x,\varphi) \in \mathcal{C}_2$.

Suppose now that $(x,\varphi)$ is such that $x < b_+(\varphi)$, so that $(x,\varphi) \in \mathcal{S}^+_2$. Then, according to (6.1) (see also (6.2)), and using (6.11), we have that

$\mathcal{J}_{x,\varphi}(\hat P) = K_+(1+\varphi)\big(b_+(\varphi) - x\big) + \mathcal{J}_{b_+(\varphi),\varphi}(\hat P) \leq V(b_+(\varphi), \varphi) - \int_x^{b_+(\varphi)} V_x(z,\varphi)\,\mathrm{d}z = V(x,\varphi).$

Proceeding similarly for $(x,\varphi)$ such that $x > b_-(\varphi)$, we conclude that $\hat P$ is indeed optimal for any $(x,\varphi) \in \mathbb{R} \times (0,\infty)$. $\square$

7. Refined Regularity of the Free Boundaries and their Characterization
In this section we obtain substantial regularity of the value $\bar v(x,\varphi)$ of the Dynkin game (4.10), as well as an analytical characterisation of its corresponding free boundaries $b_\pm$. Due to Theorem 6.1, this consequently leads to complete knowledge of the optimal control rule $\hat P$.

Figure 2. An illustrative drawing of the free boundaries $b_+$ and $b_-$ satisfying Proposition 4.3. In the picture, $\alpha := (C')^{-1}(\rho K_-)$ and $\beta := (C')^{-1}(-\rho K_+)$. Moreover, the vertical arrows identify the directions of exercise of the optimal control $\hat P$ defined through (6.1).

7.1. Parabolic formulation and Lipschitz continuity of the free boundaries.
In view of a further change of variables, in line with (6.3), we define

(7.1) $Y_t := X_t - \frac{\eta}{\gamma}\log(\Phi_t)$, $t \geq 0$,

with $X$ as in (4.11). Then, by Itô's formula, we have

(7.2) $\mathrm{d}X_t = \mu_0\,\mathrm{d}t + \eta\,\mathrm{d}W_t$, $X_0 = x \in \mathbb{R}$; $\mathrm{d}Y_t = \tfrac{1}{2}(\mu_0+\mu_1)\,\mathrm{d}t$, $Y_0 = y := x - \frac{\eta}{\gamma}\log(\varphi) \in \mathbb{R}$,

and (4.10) rewrites in terms of the new coordinates $(x,y) = (X_0, Y_0)$ as

(7.3) $\hat v(x,y) := \inf_{\sigma}\sup_{\tau}\,\mathbb{E}^{\mathbb{Q}}\Big[\int_0^{\tau\wedge\sigma} e^{-\rho t}\big(1 + e^{\frac{\gamma}{\eta}(X_t - Y_t)}\big)C'(X_t)\,\mathrm{d}t - K_+ e^{-\rho\tau}\big(1 + e^{\frac{\gamma}{\eta}(X_\tau - Y_\tau)}\big)\mathbf{1}_{\{\tau<\sigma\}} + K_- e^{-\rho\sigma}\big(1 + e^{\frac{\gamma}{\eta}(X_\sigma - Y_\sigma)}\big)\mathbf{1}_{\{\tau>\sigma\}}\Big] = \bar v\big(x, e^{\frac{\gamma}{\eta}(x-y)}\big)$, $(x,y) \in \mathbb{R}^2$.

In view of the relationship in (7.3), the value function $\hat v(\cdot,\cdot)$ inherits important properties which have already been proved for $\bar v(\cdot,\cdot)$. To be precise, we first conclude immediately from Proposition 4.2.(i) the following result.

Proposition 7.1.
The value function $(x,y) \mapsto \hat v(x,y)$ defined in (7.3) is continuous over $\mathbb{R}^2$.

Moreover, since $\bar v(x, e^{\gamma(x-y)/\eta}) = V_x(x, e^{\gamma(x-y)/\eta})$ by Proposition 4.1.(ii), it follows from (5.7) that $\hat v(x,y) = \hat V_x(x,y) + \hat V_y(x,y)$ for all $(x,y) \in \mathbb{R}^2$, and consequently the open set $\mathcal{C}_3$ defined in (5.8) takes the form

(7.4) $\mathcal{C}_3 = \big\{(x,y) \in \mathbb{R}^2 : -K_+\big(1 + e^{\frac{\gamma}{\eta}(x-y)}\big) < \hat v(x,y) < K_-\big(1 + e^{\frac{\gamma}{\eta}(x-y)}\big)\big\} = T(\mathcal{C}_2).$

Hence, by also defining the closed sets

(7.5) $\mathcal{S}^+_3 := \big\{(x,y) \in \mathbb{R}^2 : \hat v(x,y) \leq -K_+\big(1 + e^{\frac{\gamma}{\eta}(x-y)}\big)\big\},$
(7.6) $\mathcal{S}^-_3 := \big\{(x,y) \in \mathbb{R}^2 : \hat v(x,y) \geq K_-\big(1 + e^{\frac{\gamma}{\eta}(x-y)}\big)\big\},$

the global diffeomorphism $T$ from (5.5) implies that $\mathcal{S}^\pm_3 = T(\mathcal{S}^\pm_2)$ as well, where $\mathcal{C}_2$ and $\mathcal{S}^\pm_2$ are the continuation and stopping regions (4.13)–(4.15) for the Dynkin game $\bar v$ in (4.10). Combining these relationships with the structure of the latter regions in (4.18)–(4.19) yields that $\mathcal{C}_3$ and $\mathcal{S}^\pm_3$ are connected.

In order to obtain the explicit structure of the regions $\mathcal{C}_3$ and $\mathcal{S}^\pm_3$, we now define the generalised inverses of the nonincreasing $b_\pm$ (cf. Proposition 4.3) by

(7.7) $b^{-1}_+(x) := \sup\{\varphi \in (0,\infty) : b_+(\varphi) \geq x\}$ and $b^{-1}_-(x) := \inf\{\varphi \in (0,\infty) : b_-(\varphi) \leq x\}.$

Since the map $\varphi \mapsto T(x,\varphi)$ in (5.5) is decreasing for any given $x \in \mathbb{R}$ (cf. the functions $b_\pm$ are nonincreasing due to Proposition 4.3.(i)), we have

$(x,y) \in \mathcal{C}_3 \;\Leftrightarrow\; \big(x, e^{\frac{\gamma}{\eta}(x-y)}\big) \in \mathcal{C}_2 \;\Leftrightarrow\; b^{-1}_+(x) < e^{\frac{\gamma}{\eta}(x-y)} < b^{-1}_-(x) \;\Leftrightarrow\; x - \frac{\eta}{\gamma}\log\big(b^{-1}_-(x)\big) < y < x - \frac{\eta}{\gamma}\log\big(b^{-1}_+(x)\big),$

while similar relations hold true for the characterisation of $\mathcal{S}^\pm_3$.
Then, by defining

(7.8) $c^{-1}_\pm(x) := x - \frac{\eta}{\gamma}\log\big(b^{-1}_\pm(x)\big),$

we can obtain the structure of the continuation and stopping regions of $\hat v$, which take the form

(7.9) $\mathcal{C}_3 = \{(x,y) \in \mathbb{R}^2 : c^{-1}_-(x) < y < c^{-1}_+(x)\},$
(7.10) $\mathcal{S}^+_3 = \{(x,y) \in \mathbb{R}^2 : y \geq c^{-1}_+(x)\}$ and $\mathcal{S}^-_3 = \{(x,y) \in \mathbb{R}^2 : y \leq c^{-1}_-(x)\}.$

The proof of the next lemma can be found in the Appendix.

Lemma 7.2.
The functions $c^{-1}_\pm(\cdot)$ defined in (7.8) are strictly increasing, while $c^{-1}_+(\cdot)$ is left-continuous and $c^{-1}_-(\cdot)$ is right-continuous on $\mathbb{R}$.

In light of Lemma 7.2, we may define the functions

(7.11) $c_+(y) := \inf\{x \in \mathbb{R} : y \leq c^{-1}_+(x)\}$ and $c_-(y) := \sup\{x \in \mathbb{R} : y \geq c^{-1}_-(x)\}$, $y \in \mathbb{R}$.

In the following result, we prove that $y \mapsto c_\pm(y)$ coincide with the optimal free boundaries of the Dynkin game $\hat v$ in (7.3), and we provide some important properties, such as their global Lipschitz continuity.

Proposition 7.3.
The free boundaries $c_\pm$ defined in (7.11) satisfy the following properties:
(i) $c_\pm(\cdot)$ are strictly increasing on $\mathbb{R}$ and we have $x^*_+ \leq c_+(y) < c_-(y) \leq x^*_-$ for all $y \in \mathbb{R}$ (with $x^*_\pm$ as in Proposition 3.2). Moreover, $c_+(y) \leq (C')^{-1}(-\rho K_+)$ and $c_-(y) \geq (C')^{-1}(\rho K_-)$ for all $y \in \mathbb{R}$;
(ii) $c_\pm(\cdot)$ are Lipschitz continuous on $\mathbb{R}$ with Lipschitz constant $L = 1$, namely $0 \leq c_\pm(y) - c_\pm(y') \leq y - y'$, $\forall\, y \geq y'$;
(iii) the continuation and stopping regions for (7.3) take the form $\mathcal{C}_3 = \{(x,y) \in \mathbb{R}^2 : c_+(y) < x < c_-(y)\}$, $\mathcal{S}^+_3 = \{(x,y) \in \mathbb{R}^2 : x \leq c_+(y)\}$ and $\mathcal{S}^-_3 = \{(x,y) \in \mathbb{R}^2 : x \geq c_-(y)\}$.

Figure 3. An illustrative drawing of the free boundaries $c_+$ and $c_-$ satisfying Proposition 7.3. In the picture, $\alpha := (C')^{-1}(\rho K_-)$ and $\beta := (C')^{-1}(-\rho K_+)$, and the arrow indicates the direction of the noise.

Proof.
We prove separately the three parts.
Proof of (i).
The first part of the claim follows from Lemma 7.2, together with the definition (7.11) of $c_\pm$. The second and third parts of the claim are due to the fact that the first component of the transformation $T$ in (5.5) is the identity.

Proof of (ii).
Using the definitions (7.8) of $c^{-1}_\pm$ and the monotonicity of $b^{-1}_\pm$ (see the proof of Lemma 7.2), we get

(7.12) $c^{-1}_\pm(x) - c^{-1}_\pm(x') = \Big(x - \frac{\eta}{\gamma}\log\big(b^{-1}_\pm(x)\big)\Big) - \Big(x' - \frac{\eta}{\gamma}\log\big(b^{-1}_\pm(x')\big)\Big) \geq x - x'$, $\forall\, x \geq x'$.

Combining this with the definitions (7.11) and part (i), we obtain the desired claim.

Proof of (iii).
This is again due to the definitions (7.11) of $c_\pm$, their monotonicity from part (i), and the expressions of the sets in (7.9) and (7.10). $\square$

7.2. Global $C^1$-regularity of $\hat v$. For any $(x,y) \in \mathbb{R}^2$ given and fixed, we consider the strong solution to the dynamics in (7.2), denoted by

$X^{0,x}_t = x + \mu_0 t + \eta W_t$ and $Y^{0,y}_t = y + \tfrac{1}{2}(\mu_0+\mu_1)t$, $t \geq 0$,

and we define

(7.13) $\tau^\star(x,y) := \inf\{t \geq 0 : (X^{0,x}_t, Y^{0,y}_t) \in \mathcal{S}^+_3\}$ and $\sigma^\star(x,y) := \inf\{t \geq 0 : (X^{0,x}_t, Y^{0,y}_t) \in \mathcal{S}^-_3\}.$

Notice that, by [17] and [34], the pair $(\tau^\star, \sigma^\star)$ realises a saddle point for the Dynkin game with value $\hat v$ in (7.3). In the sequel, we aim at deriving the global $C^1$-regularity of $\hat v(\cdot,\cdot)$, following the arguments developed in [11]. In order to accomplish that, the next result about the regularity (in the probabilistic sense) of $(\tau^\star, \sigma^\star)$ is needed.

Lemma 7.4.
Suppose that $(x_n, y_n)_{n \in \mathbb{N}} \subset \mathcal{C}_3$ is such that $(x_n, y_n) \to (x_o, y_o)$, where $y_o \in \mathbb{R}$ and $x_o := c_+(y_o)$ (resp., $x_o := c_-(y_o)$). Then $\tau^\star(x_n, y_n) \to 0$ (resp., $\sigma^\star(x_n, y_n) \to 0$), $\mathbb{Q}$-a.s.

Proof. We prove the claim for $\tau^\star(x_n, y_n)$, since the proof for $\sigma^\star(x_n, y_n)$ can be performed analogously. Fix $\omega \in \Omega$ and assume (aiming for a contradiction) that

$\limsup_{n\to\infty} \tau^\star(x_n, y_n)(\omega) =: \delta > 0.$

This means that there exists a subsequence, still labelled by $(x_n, y_n)$, such that

(7.14) $X^{0,x_n}_t(\omega) > c_+(Y^{0,y_n}_t)$, i.e. $x_n + \mu_0 t + \eta W_t(\omega) > c_+\big(y_n + \tfrac{1}{2}(\mu_0+\mu_1)t\big)$, $\forall n \in \mathbb{N}$, $\forall t \in [0, \delta/2]$.

Hence, taking the limit as $n \to \infty$ and considering that $c_+$ is continuous (see Proposition 7.3.(ii)), we get

$\eta W_t(\omega) \geq c_+\big(y_o + \tfrac{1}{2}(\mu_0+\mu_1)t\big) - x_o - \mu_0 t$, $\forall t \in [0, \delta/2]$.

Using now the Lipschitz continuity of $c_+$ (see again Proposition 7.3.(ii)), we further obtain

(7.15) $\eta W_t(\omega) \geq c_+(y_o) - \tfrac{1}{2}(\mu_0+\mu_1)^- t - x_o - \mu_0 t = -\big(\tfrac{1}{2}(\mu_0+\mu_1)^- + \mu_0\big)t$, $\forall t \in [0, \delta/2]$.

However, by the law of the iterated logarithm we know that, for every $\varepsilon > 0$, there exists a sequence $(t_n)_{n\in\mathbb{N}}$ decreasing to 0 such that, a.s., for any $n \in \mathbb{N}$ one has

$W_{t_n} \leq -(1-\varepsilon)\sqrt{2\, t_n \log\big(\log\big(1/t_n\big)\big)}.$

Hence, because $\sqrt{2\,t\log(\log(1/t))}/t \to \infty$ as $t \downarrow 0$, we have that (7.15) can only happen for $\omega$ belonging to a $\mathbb{Q}$-null set, and the proof is complete. $\square$

Remark 7.5.
From the previous proof one can easily observe that, by replacing the strict inequality with the non-strict one in (7.14), we can actually prove that $\check\tau^\star(x_n, y_n) \to 0$ and $\check\sigma^\star(x_n, y_n) \to 0$, $\mathbb{Q}$-a.s., where

(7.16) $\check\tau^\star(x,y) := \inf\{t \geq 0 : (X^{0,x}_t, Y^{0,y}_t) \in \mathrm{Int}(\mathcal{S}^+_3)\}$, $\check\sigma^\star(x,y) := \inf\{t \geq 0 : (X^{0,x}_t, Y^{0,y}_t) \in \mathrm{Int}(\mathcal{S}^-_3)\}.$

We now show that the value function $\hat v(x,y)$ of the Dynkin game (7.3) is smooth across the topological boundary $\partial\mathcal{C}_3$ of the continuation region $\mathcal{C}_3$ from (7.4), in both the $x$- and $y$-directions. The details of the proof of the following result can be found in the Appendix.

Proposition 7.6 (Smooth fit). Let $y_o \in \mathbb{R}$ and set $x_o := c_\pm(y_o)$. Then the value function $\hat v$ defined in (7.3) satisfies

$\lim_{\substack{(x,y)\to(x_o,y_o)\\ (x,y)\in\mathcal{C}_3}} \hat v_x(x,y) = \mp\frac{\gamma}{\eta}K_\pm e^{\frac{\gamma}{\eta}(x_o-y_o)}$ and $\lim_{\substack{(x,y)\to(x_o,y_o)\\ (x,y)\in\mathcal{C}_3}} \hat v_y(x,y) = \pm\frac{\gamma}{\eta}K_\pm e^{\frac{\gamma}{\eta}(x_o-y_o)}.$

We are now ready to derive the global $C^1$-regularity of $\hat v$ as well as the local boundedness of its second derivative in $x$.

Proposition 7.7.
The value function $\hat v$ defined in (7.3) satisfies $\hat v \in C^1(\mathbb{R}^2; \mathbb{R})$ and $\hat v_{xx} \in L^\infty_{\mathrm{loc}}(\mathbb{R}^2; \mathbb{R})$.

Proof. By standard arguments based on the strong Markov property and Dirichlet boundary problems involving second-order partial differential equations of parabolic type, one can show that $\hat v$ in (7.3) is a classical $C^{2,1}$-solution to

(7.17) $(\rho - \mathcal{L}^{X,Y})u(x,y) - \big(1 + e^{\frac{\gamma}{\eta}(x-y)}\big)C'(x) = 0$, for all $(x,y) \in \mathcal{C}_3$,

where $\mathcal{L}^{X,Y}$ is the second-order differential operator defined in (5.9) and $\mathcal{C}_3$ is given by (7.4) (see also Proposition 7.3.(iii)). Also, $\hat v \in C^\infty$ in the interior of $\mathcal{S}^\pm_3$. Hence, by Proposition 7.6 we have that $\hat v \in C^1(\mathbb{R}^2; \mathbb{R})$.

Moreover, we have from (7.17) that

$\tfrac{1}{2}\eta^2\,\hat v_{xx}(x,y) = \rho\,\hat v(x,y) - \tfrac{1}{2}(\mu_0+\mu_1)\,\hat v_y(x,y) - \mu_0\,\hat v_x(x,y) - \big(1 + e^{\frac{\gamma}{\eta}(x-y)}\big)C'(x)$, $\forall (x,y) \in \mathcal{C}_3$.

Given that $\hat v \in C^1(\mathbb{R}^2; \mathbb{R})$, the right-hand side of the latter equation only involves functions that are continuous on $\mathbb{R}^2$; hence $\hat v_{xx}$ admits a continuous extension to the closure of $\mathcal{C}_3$, and it is therefore locally bounded therein. Therefore, for $y \in \mathbb{R}$, we have that $\hat v_x(\cdot,y)$ is Lipschitz continuous on $[c_+(y), c_-(y)]$, with a Lipschitz constant $K(y)$ which is locally bounded on $\mathbb{R}$. Combining this with the fact that $\hat v_x(\cdot,y)$ is infinitely many times continuously differentiable, and therefore locally bounded, in the stopping regions $\mathcal{S}^\pm_3$, we conclude that $\hat v_{xx} \in L^\infty_{\mathrm{loc}}(\mathbb{R}^2)$. $\square$

7.3. Integral equations for the free boundaries.
By Proposition 7.7, and by using standard arguments based on the strong Markov property (cf. [17] and [34]), we have that the value function $\hat v$ defined in (7.3) and the free boundaries $c_+$ and $c_-$ solve the free-boundary problem

(7.18)
$(\mathcal{L}^{X,Y} - \rho)\hat v(x,y) = -\big(1 + e^{\frac{\gamma}{\eta}(x-y)}\big)C'(x)$, for $c_+(y) < x < c_-(y)$, $y \in \mathbb{R}$;
$(\mathcal{L}^{X,Y} - \rho)\hat v(x,y) \leq -\big(1 + e^{\frac{\gamma}{\eta}(x-y)}\big)C'(x)$, for $x < c_+(y)$, $y \in \mathbb{R}$;
$(\mathcal{L}^{X,Y} - \rho)\hat v(x,y) \geq -\big(1 + e^{\frac{\gamma}{\eta}(x-y)}\big)C'(x)$, for $x > c_-(y)$, $y \in \mathbb{R}$;
$-K_+\big(1 + e^{\frac{\gamma}{\eta}(x-y)}\big) \leq \hat v(x,y) \leq K_-\big(1 + e^{\frac{\gamma}{\eta}(x-y)}\big)$, for $(x,y) \in \mathbb{R}^2$;
$\hat v(x,y) = -K_+\big(1 + e^{\frac{\gamma}{\eta}(x-y)}\big)$, for $x \leq c_+(y)$, $y \in \mathbb{R}$;
$\hat v(x,y) = K_-\big(1 + e^{\frac{\gamma}{\eta}(x-y)}\big)$, for $x \geq c_-(y)$, $y \in \mathbb{R}$;
$\hat v_x(x,y) = \mp\frac{\gamma}{\eta}K_\pm e^{\frac{\gamma}{\eta}(x-y)}$, for $x = c_\pm(y)$, $y \in \mathbb{R}$;
$\hat v_y(x,y) = \pm\frac{\gamma}{\eta}K_\pm e^{\frac{\gamma}{\eta}(x-y)}$, for $x = c_\pm(y)$, $y \in \mathbb{R}$.

Here $\mathcal{L}^{X,Y}$ is the second-order differential operator defined in (5.9) and $\hat v \in C^{2,1}$ inside $\mathcal{C}_3$ (cf. Proposition 7.3.(iii)). Hence, via the above results and a suitable application of (a weak version of) Itô's lemma, we first aim at obtaining an integral representation of $\hat v$. This will then lead to a system of coupled integral equations solved by the free boundaries $c_\pm$ defined in (7.11) (see also Proposition 7.3 for their properties).

Proposition 7.8.
Consider the free boundaries $c_\pm$ defined in (7.11). Then, for any $(x,y) \in \mathbb{R}^2$, the value function $\hat v$ of (7.3) can be written as

(7.19) $\hat v(x,y) = \mathbb{E}^{\mathbb{Q}}_{(x,y)}\Big[\int_0^\infty e^{-\rho s}\big(1 + e^{\frac{\gamma}{\eta}(X_s - Y_s)}\big)\Big(C'(X_s)\,\mathbf{1}_{\{c_+(Y_s) < X_s < c_-(Y_s)\}} - \rho K_+\,\mathbf{1}_{\{X_s \leq c_+(Y_s)\}} + \rho K_-\,\mathbf{1}_{\{X_s \geq c_-(Y_s)\}}\Big)\,\mathrm{d}s\Big].$

Proposition 7.9. The free boundaries $c_\pm$ defined in (7.11) solve the system of integral equations

(7.20) $\mp K_\pm\, q(c_\pm(y), y) = \mathbb{E}^{\mathbb{Q}}_{(c_\pm(y),y)}\Big[\int_0^\infty e^{-\rho s}\big(1 + e^{\frac{\gamma}{\eta}(X_s - Y_s)}\big)\Big(C'(X_s)\,\mathbf{1}_{\{c_+(Y_s) < X_s < c_-(Y_s)\}} - \rho K_+\,\mathbf{1}_{\{X_s \leq c_+(Y_s)\}} + \rho K_-\,\mathbf{1}_{\{X_s \geq c_-(Y_s)\}}\Big)\,\mathrm{d}s\Big]$, $y \in \mathbb{R}$,

where $q(x,y) := 1 + e^{\frac{\gamma}{\eta}(x-y)}$; equivalently, the expectation can be written out by integrating over $z \in \mathbb{R}$ against the Gaussian transition density of $X$.

Proof. Taking $x = c_\pm(y)$ in Proposition 7.8, and employing the value function's continuity (i.e. $\hat v(c_\pm(y), y) = \mp K_\pm\big(1 + e^{\frac{\gamma}{\eta}(c_\pm(y) - y)}\big)$ for any $y \in \mathbb{R}$), we find (7.20). $\square$

The complete characterisation of the boundaries $c_\pm$ provided by Proposition 7.9, together with (7.8), yields a complete description of the free boundaries $b_\pm$, at which the optimal control rule $\hat P$ constructed in (6.1)–(6.2) (see Section 6.1 for details) commands the process $(X^{\hat P}_t, \Phi_t)_{t \geq 0}$ to be reflected. Indeed, once $c_\pm$ are determined by solving (numerically) the system (7.20), we can use (7.8) to obtain $b^{-1}_\pm$, and consequently determine $b_\pm$ by inverting (7.7). However, since a numerical treatment of (7.20) is nontrivial and outside the scope of the present work, we do not address it in this paper.

Appendix A.

A.1. Proof of Proposition 5.1. It follows from (4.9) that $\Phi_t = \varphi M_t$, where $M_t := \exp\{\gamma W_t - \gamma^2 t/2\}$, for any $t \geq 0$ and $\varphi > 0$. For $(x,\varphi) \in \mathbb{R}\times(0,\infty)$ given and fixed, one clearly has $V(x,\varphi) \leq \mathcal{J}_{x,\varphi}(0)$. Hence, without loss of generality, we can restrict the attention to all those controls $P \in \mathcal{A}$ such that, for some constant $\kappa_o > 0$, we have

(A-1) $\mathbb{E}^{\mathbb{Q}}\Big[\int_0^\infty e^{-\rho t}\big(1+\varphi M_t\big)C(X^{x;P}_t)\,\mathrm{d}t\Big] \leq \mathcal{J}_{x,\varphi}(P) \leq \mathcal{J}_{x,\varphi}(0) = \mathbb{E}^{\mathbb{Q}}\Big[\int_0^\infty e^{-\rho t}\big(1+\varphi M_t\big)C(X^{x;0}_t)\,\mathrm{d}t\Big] = (1+\varphi)\,\mathbb{E}\Big[\int_0^\infty e^{-\rho t}C(X^{x;0}_t)\,\mathrm{d}t\Big] \leq \kappa_o(1+\varphi)(1+|x|^p).$

Here, the second equality follows from a change of measure as in Section 4; $X^{x;0}$ in the second expectation evolves as in (4.11), while in the third expectation it evolves as in (3.1), and the last step is due to Assumption 2.1.(i) and standard estimates. In the rest of this proof, we denote by $\mathcal{A}_o$ the class of admissible controls $P$ for which (A-1) holds true.

Then, let $(x,\varphi), (x',\varphi')$ with $|(x,\varphi)| \leq R$ and $|(x',\varphi')| \leq R$ be given and fixed, and take $\lambda \in [0,1]$. By the definition of $V$ (and restricting to the class $\mathcal{A}_o$) we get

$\lambda V(x,\varphi) + (1-\lambda)V(x',\varphi') - V\big(\lambda(x,\varphi) + (1-\lambda)(x',\varphi')\big) \leq \sup_{P\in\mathcal{A}_o}\mathbb{E}^{\mathbb{Q}}\Big[\int_0^\infty e^{-\rho t}\Big(\lambda(1+\varphi M_t)C(X^{x;P}_t) + (1-\lambda)(1+\varphi' M_t)C(X^{x';P}_t) - \big(1+(\lambda\varphi+(1-\lambda)\varphi')M_t\big)C(X^{\lambda x+(1-\lambda)x';P}_t)\Big)\mathrm{d}t + \int_0^\infty e^{-\rho t}K_+\Big(\lambda(1+\varphi M_t) + (1-\lambda)(1+\varphi' M_t) - \big(1+(\lambda\varphi+(1-\lambda)\varphi')M_t\big)\Big)\mathrm{d}P^+_t + \int_0^\infty e^{-\rho t}K_-\Big(\lambda(1+\varphi M_t) + (1-\lambda)(1+\varphi' M_t) - \big(1+(\lambda\varphi+(1-\lambda)\varphi')M_t\big)\Big)\mathrm{d}P^-_t\Big].$
By adding and subtracting $(1-\lambda)\varphi M_t\big(C(X^{x';P}_t) + C(\lambda X^{x;P}_t + (1-\lambda)X^{x';P}_t)\big)$ in the $\mathrm{d}t$-integral appearing in the last equation, using the semiconcavity property of $C$ in Assumption 2.1.(iii) together with the solution $X^{x;P}$ of (4.9), as well as the fact that $\sup(f+g) \leq \sup(f) + \sup(g)$, we obtain

$\lambda V(x,\varphi) + (1-\lambda)V(x',\varphi') - V\big(\lambda(x,\varphi) + (1-\lambda)(x',\varphi')\big) \leq \sup_{P\in\mathcal{A}_o}\mathbb{E}^{\mathbb{Q}}\Big[\int_0^\infty e^{-\rho t}\,\alpha\lambda(1-\lambda)\big(C(X^{x;P}_t) + C(X^{x';P}_t)\big)^{(1-p)^+}|x-x'|\,\mathrm{d}t\Big] + \sup_{P\in\mathcal{A}_o}\mathbb{E}^{\mathbb{Q}}\Big[\int_0^\infty e^{-\rho t}\varphi M_t\Big(\lambda C(X^{x;P}_t) + (1-\lambda)C(X^{x';P}_t) - C\big(\lambda X^{x;P}_t + (1-\lambda)X^{x';P}_t\big)\Big)\mathrm{d}t + \int_0^\infty e^{-\rho t}(1-\lambda)(\varphi-\varphi')M_t\Big(C\big(\lambda X^{x;P}_t + (1-\lambda)X^{x';P}_t\big) - C(X^{x';P}_t)\Big)\mathrm{d}t\Big].$

Using again the assumed semiconcavity of $C$ and Hölder's inequality, we further conclude that

$\lambda V(x,\varphi) + (1-\lambda)V(x',\varphi') - V\big(\lambda(x,\varphi) + (1-\lambda)(x',\varphi')\big) \leq \alpha\lambda(1-\lambda)|x-x'|\,\mathbb{E}^{\mathbb{Q}}\Big[\int_0^\infty e^{-\rho t}\,\mathrm{d}t\Big]^{p}\,\sup_{P\in\mathcal{A}_o}\mathbb{E}^{\mathbb{Q}}\Big[\int_0^\infty e^{-\rho t}\big(C(X^{x;P}_t) + C(X^{x';P}_t)\big)\mathrm{d}t\Big]^{(1-p)^+} + \alpha\lambda(1-\lambda)|x-x'|\,\sup_{P\in\mathcal{A}_o}\mathbb{E}^{\mathbb{Q}}\Big[\int_0^\infty e^{-\rho t}\varphi M_t\big(C(X^{x;P}_t) + C(X^{x';P}_t)\big)^{(1-p)^+}\mathrm{d}t\Big] + \alpha\lambda(1-\lambda)|\varphi-\varphi'||x-x'|\,\sup_{P\in\mathcal{A}_o}\mathbb{E}^{\mathbb{Q}}\Big[\int_0^\infty e^{-\rho t}M_t\big(C(X^{x;P}_t) + C(X^{x';P}_t)\big)^{(1-p)}\mathrm{d}t\Big].$

We now distinguish the cases $p \in (1,2]$ and $p > 2$.
If p ∈ (1 , E Q [ (cid:82) ∞ e − ρt M t d t ] = 1 /ρ and H¨older’s inequality, we further obtain that λV ( x, ϕ ) + (1 − λ ) V ( x (cid:48) , ϕ (cid:48) ) − V ( λ ( x, ϕ ) + (1 − λ )( x (cid:48) , ϕ (cid:48) )) ≤ α λ (1 − λ ) | x − x (cid:48) | (cid:16) ρ − p + ρ − (cid:17) + α λ (1 − λ ) | ϕ − ϕ (cid:48) || x − x (cid:48) | E Q (cid:20) (cid:90) ∞ e − ρt M t d t (cid:21) p × sup P ∈A o E Q (cid:20) (cid:90) ∞ e − ρt M t (cid:16) C ( X x ; Pt ) + C ( X x (cid:48) ; Pt ) (cid:17) d t (cid:21) − p . (A-2)Hence, employing the estimate (A-1) in (A-2), we find for some κ > λV ( x, ϕ ) + (1 − λ ) V ( x (cid:48) , ϕ (cid:48) ) − V ( λ ( x, ϕ ) + (1 − λ )( x (cid:48) , ϕ (cid:48) )) ≤ κλ (1 − λ ) (cid:16) | x − x (cid:48) | + | ϕ − ϕ (cid:48) || x − x (cid:48) | (1 + | x | + | x (cid:48) | ) p − (cid:17) , which gives the claimed semiconcavity. The case p > p ∈ (1 , (cid:3) A.2. Proof of Lemma 7.2. Due to (7.7) and Proposition 4.3, we have that b − ± are nonincreasing,with b − left-continuous and b − − right-continuous. Combining the aforementioned properties togetherwith the definition (7.8) yields the desired properties. (cid:3) A.3. Proof of Proposition 7.6. We focus on proving the continuity of (cid:98) v x across c + , since theother claims can be obtained similarly. To that end, we firstly simplify the notation by defining (cf.(7.4)–(7.5))(A-3) q ( x, y ) := 1 + e γη ( x − y ) and (cid:98) w ( x, y ) := (cid:98) v ( x, y ) + K + q ( x, y ) (cid:40) > , for all ( x, y ) ∈ R \ S +3 , = 0 , for all ( x, y ) ∈ S +3 , and notice that, for every ( x, y ) ∈ R we have (cid:98) w ( x, y )= sup τ ∈T inf σ ∈T E Q (cid:20) (cid:90) τ ∧ σ e − ρt q ( X ,xt , Y ,yt ) (cid:0) C (cid:48) ( X ,xt ) + ρK + (cid:1) d t + ( K + + K − ) e − ρσ q ( X ,xσ , Y ,yσ ) { σ<τ } (cid:21) . Then, the desired continuity of (cid:98) v x across c + is equivalent to(A-4) lim C (cid:51) ( x,y ) → ( x o ,y o ) (cid:98) w x ( x, y ) = 0 , for x o := c + ( y o ) and y o ∈ R . 
In the remainder of the proof, we therefore focus on proving (A-4).

Fix $(x,y) \in \mathcal{C}_3$ and let $\varepsilon > 0$ be such that $(x+\varepsilon, y) \in \mathcal{C}_3$. Denote by $\tau^\star \equiv \tau^\star(x,y)$ and $\check\tau^\star \equiv \check\tau^\star(x,y)$ the stopping times from (7.13) and (7.16), respectively. Then, define $\tau^\star_\varepsilon := \tau^\star(x+\varepsilon, y)$ according to (7.13) and $\check\tau^\star_\varepsilon := \check\tau^\star(x+\varepsilon, y)$ according to (7.16). In view of Proposition 7.3.(iii), these take the form

$\tau^\star_\varepsilon = \inf\{t \geq 0 : X^{0,x+\varepsilon}_t \leq c_+(Y^{0,y}_t)\}$, $\check\tau^\star_\varepsilon = \inf\{t \geq 0 : X^{0,x+\varepsilon}_t < c_+(Y^{0,y}_t)\}$, $\tau^\star = \inf\{t \geq 0 : X^{0,x}_t \leq c_+(Y^{0,y}_t)\}$ and $\check\tau^\star = \inf\{t \geq 0 : X^{0,x}_t < c_+(Y^{0,y}_t)\}$.

By the regularity of the Brownian motion, we have $\tau^\star_\varepsilon = \check\tau^\star_\varepsilon$ and $\tau^\star = \check\tau^\star$, and by the continuity of trajectories of the Brownian motion, we have

(A-5) $\lim_{\varepsilon\downarrow 0}\check\tau^\star_\varepsilon = \check\tau^\star$, which eventually yields that $\lim_{\varepsilon\downarrow 0}\tau^\star_\varepsilon = \tau^\star$.

Moreover, Proposition 7.3.(iii) further implies that $\sigma^\star \equiv \sigma^\star(x,y)$ from (7.13) takes the form $\sigma^\star = \inf\{t \geq 0 : X^{0,x}_t \geq c_-(Y^{0,y}_t)\}$. Then, we have

$\frac{\hat w(x+\varepsilon, y) - \hat w(x,y)}{\varepsilon} \leq \frac{1}{\varepsilon}\mathbb{E}^{\mathbb{Q}}\Big[\int_0^{\tau^\star_\varepsilon\wedge\sigma^\star} e^{-\rho t}\big(q(X^{0,x+\varepsilon}_t, Y^{0,y}_t) - q(X^{0,x}_t, Y^{0,y}_t)\big)\big(C'(X^{0,x+\varepsilon}_t) + \rho K_+\big)\mathrm{d}t\Big] + \frac{1}{\varepsilon}\mathbb{E}^{\mathbb{Q}}\Big[\int_0^{\tau^\star_\varepsilon\wedge\sigma^\star} e^{-\rho t}\,q(X^{0,x}_t, Y^{0,y}_t)\big(C'(X^{0,x+\varepsilon}_t) - C'(X^{0,x}_t)\big)\mathrm{d}t\Big] + \frac{1}{\varepsilon}\mathbb{E}^{\mathbb{Q}}\Big[e^{-\rho\sigma^\star}\mathbf{1}_{\{\tau^\star_\varepsilon>\sigma^\star\}}(K_+ + K_-)\big(q(X^{0,x+\varepsilon}_{\sigma^\star}, Y^{0,y}_{\sigma^\star}) - q(X^{0,x}_{\sigma^\star}, Y^{0,y}_{\sigma^\star})\big)\Big].$

Using the mean value theorem, the above inequality becomes

(A-6) $\frac{\hat w(x+\varepsilon, y) - \hat w(x,y)}{\varepsilon} \leq \mathbb{E}^{\mathbb{Q}}\Big[\int_0^{\tau^\star_\varepsilon\wedge\sigma^\star} e^{-\rho t}\,q_x(\Lambda^\varepsilon_t, Y^{0,y}_t)\big(C'(X^{0,x+\varepsilon}_t) + \rho K_+\big)\mathrm{d}t\Big] + \mathbb{E}^{\mathbb{Q}}\Big[\int_0^{\tau^\star_\varepsilon\wedge\sigma^\star} e^{-\rho t}\,q(X^{0,x}_t, Y^{0,y}_t)C''(\Xi^\varepsilon_t)\,\mathrm{d}t + e^{-\rho\sigma^\star}\mathbf{1}_{\{\tau^\star_\varepsilon>\sigma^\star\}}(K_+ + K_-)\,q_x(\Theta^\varepsilon_{\sigma^\star}, Y^{0,y}_{\sigma^\star})\Big],$

where $\Lambda^\varepsilon_t, \Xi^\varepsilon_t \in (X^{0,x}_t, X^{0,x+\varepsilon}_t)$ and $\Theta^\varepsilon_{\sigma^\star} \in (X^{0,x}_{\sigma^\star}, X^{0,x+\varepsilon}_{\sigma^\star})$. If now the dominated convergence theorem can be applied, by taking limits and using (A-5) in (A-6), we get

$\limsup_{\varepsilon\downarrow 0}\frac{\hat w(x+\varepsilon, y) - \hat w(x,y)}{\varepsilon} \leq \mathbb{E}^{\mathbb{Q}}\Big[\int_0^{\tau^\star\wedge\sigma^\star} e^{-\rho t}\,q_x(X^{0,x}_t, Y^{0,y}_t)\big(C'(X^{0,x}_t) + \rho K_+\big)\mathrm{d}t\Big] + \mathbb{E}^{\mathbb{Q}}\Big[\int_0^{\tau^\star\wedge\sigma^\star} e^{-\rho t}\,q(X^{0,x}_t, Y^{0,y}_t)C''(X^{0,x}_t)\,\mathrm{d}t + e^{-\rho\sigma^\star}\mathbf{1}_{\{\tau^\star\geq\sigma^\star\}}(K_+ + K_-)\,q_x(X^{0,x}_{\sigma^\star}, Y^{0,y}_{\sigma^\star})\Big].$

With similar estimates, we can also get

$\liminf_{\varepsilon\downarrow 0}\frac{\hat w(x+\varepsilon, y) - \hat w(x,y)}{\varepsilon} \geq \mathbb{E}^{\mathbb{Q}}\Big[\int_0^{\tau^\star\wedge\sigma^\star} e^{-\rho t}\,q_x(X^{0,x}_t, Y^{0,y}_t)\big(C'(X^{0,x}_t) + \rho K_+\big)\mathrm{d}t\Big] + \mathbb{E}^{\mathbb{Q}}\Big[\int_0^{\tau^\star\wedge\sigma^\star} e^{-\rho t}\,q(X^{0,x}_t, Y^{0,y}_t)C''(X^{0,x}_t)\,\mathrm{d}t + e^{-\rho\sigma^\star}\mathbf{1}_{\{\tau^\star\geq\sigma^\star\}}(K_+ + K_-)\,q_x(X^{0,x}_{\sigma^\star}, Y^{0,y}_{\sigma^\star})\Big].$

Hence,

$\hat w_x(x,y) = \mathbb{E}^{\mathbb{Q}}\Big[\int_0^{\tau^\star\wedge\sigma^\star} e^{-\rho t}\,q_x(X^{0,x}_t, Y^{0,y}_t)\big(C'(X^{0,x}_t) + \rho K_+\big)\mathrm{d}t\Big] + \mathbb{E}^{\mathbb{Q}}\Big[\int_0^{\tau^\star\wedge\sigma^\star} e^{-\rho t}\,q(X^{0,x}_t, Y^{0,y}_t)C''(X^{0,x}_t)\,\mathrm{d}t + e^{-\rho\sigma^\star}\mathbf{1}_{\{\tau^\star\geq\sigma^\star\}}(K_+ + K_-)\,q_x(X^{0,x}_{\sigma^\star}, Y^{0,y}_{\sigma^\star})\Big].$

Then, we obtain (A-4) by taking the limit as $(x,y) \to (x_o, y_o)$, using Lemma 7.4 and noticing that clearly $\liminf_{(x,y)\to(x_o,y_o)} \sigma^\star(x,y) > 0$. It thus remains to justify the application of the dominated convergence theorem; we show how to handle the term $\mathbb{E}^{\mathbb{Q}}\big[\int_0^{\tau^\star_\varepsilon\wedge\sigma^\star} e^{-\rho t}\,q_x(\Lambda^\varepsilon_t, Y^{0,y}_t)\big(C'(X^{0,x+\varepsilon}_t) + \rho K_+\big)\mathrm{d}t\big]$ in (A-6), as the others can be treated similarly.
Notice that, since $q_x(\cdot,y)$ is positive and increasing, $C'(\cdot)$ is nondecreasing, and $\Lambda^\varepsilon_t \leq X^{0,x+\varepsilon}_t \leq X^{0,x+1}_t$ (for any $\varepsilon < 1$, without loss of generality), we can write

$\int_0^{\tau^\star_\varepsilon\wedge\sigma^\star} e^{-\rho t}\,q_x(\Lambda^\varepsilon_t, Y^{0,y}_t)\big(C'(X^{0,x+\varepsilon}_t) + \rho K_+\big)\mathrm{d}t \leq \int_0^{\tau^\star_\varepsilon\wedge\sigma^\star} e^{-\rho t}\,q_x(X^{0,x+1}_t, Y^{0,y}_t)\big(C'(X^{0,x+1}_t) + \rho K_+\big)\mathrm{d}t \leq \frac{\gamma}{\eta}\int_0^\infty e^{-\rho t}\big(q(X^{0,x+1}_t, Y^{0,y}_t) + 1\big)\big(|C'(X^{0,x+1}_t)| + \rho K_+\big)\mathrm{d}t = \frac{\gamma}{\eta}\int_0^\infty e^{-\rho t}\,q(X^{0,x+1}_t, Y^{0,y}_t)\big(|C'(X^{0,x+1}_t)| + \rho K_+\big)\mathrm{d}t + \frac{\gamma}{\eta}\int_0^\infty e^{-\rho t}\big(|C'(X^{0,x+1}_t)| + \rho K_+\big)\mathrm{d}t.$

Now, $\mathbb{E}^{\mathbb{Q}}\big[\int_0^\infty e^{-\rho t}|C'(X^{0,x+1}_t)|\,\mathrm{d}t\big] < \infty$ due to Assumption 2.1 and standard estimates on the Brownian motion. On the other hand, by using the definition of $q(\cdot,\cdot)$ and (7.1), one has $q(X^{0,x+1}_t, Y^{0,y}_t) = 1 + \Phi^\varphi_t$, with $\varphi \equiv e^{\frac{\gamma}{\eta}(x+1-y)}$. Hence,

(A-7) $\mathbb{E}^{\mathbb{Q}}\Big[\int_0^\infty e^{-\rho t}\,q(X^{0,x+1}_t, Y^{0,y}_t)\big(|C'(X^{0,x+1}_t)| + \rho K_+\big)\mathrm{d}t\Big] = \mathbb{E}^{\mathbb{Q}}\Big[\int_0^\infty e^{-\rho t}\big(1+\Phi^\varphi_t\big)\big(|C'(X^{0,x+1}_t)| + \rho K_+\big)\mathrm{d}t\Big]\Big|_{\varphi = e^{\frac{\gamma}{\eta}(x+1-y)}} = \big(1 + e^{\frac{\gamma}{\eta}(x+1-y)}\big)\,\mathbb{E}\Big[\int_0^\infty e^{-\rho t}\big(|C'(X^{0,x+1}_t)| + \rho K_+\big)\mathrm{d}t\Big],$

where the last equality is due to a change of measure as in Section 4, and $X$ in the last expectation evolves as in (3.1). But then, standard estimates together with the growth requirements on $C$ in Assumption 2.1 ensure that the last expectation in (A-7) is finite, thus completing the proof. $\square$

A.4. Proof of Proposition 7.8.
In this proof, we recall the notation $q(x,y) := 1 + e^{\frac{\gamma}{\eta}(x-y)}$, which will be used in the following four steps.

Step 1. Let $R > 0$ and define the stopping time $\tau_R := \inf\{t \geq 0 : |X_t| \geq R \text{ or } |Y_t| \geq R\}$ under $\mathbb{Q}_{(x,y)}$. Since $\hat v \in C^1(\mathbb{R}^2;\mathbb{R})$ and $\hat v_{xx} \in L^\infty_{\mathrm{loc}}(\mathbb{R}^2;\mathbb{R})$ (cf. Proposition 7.7), we can apply a weak version of Itô's lemma (see, e.g., [3], pp. 183–186) up to the stopping time $\tau_R \wedge T$, for some $T > 0$, to obtain

(A-8) $\hat v(x,y) = \mathbb{E}^{\mathbb{Q}}_{(x,y)}\Big[e^{-\rho(\tau_R\wedge T)}\hat v(X_{\tau_R\wedge T}, Y_{\tau_R\wedge T}) - \int_0^{\tau_R\wedge T} e^{-\rho s}\big(\mathcal{L}^{X,Y} - \rho\big)\hat v(X_s, Y_s)\,\mathrm{d}s\Big].$

The right-hand side of (A-8) is well defined, since $Y$ is a deterministic process, the transition probability of $X$ is absolutely continuous with respect to the Lebesgue measure, and $(\mathcal{L}^{X,Y} - \rho)\hat v$ is defined up to a set of zero Lebesgue measure.

Since $\hat v$ solves the free-boundary problem (7.18), we have, for almost all $(x,y) \in \mathbb{R}^2$, that

$\big(\mathcal{L}^{X,Y} - \rho\big)\hat v(x,y) = -q(x,y)C'(x)\,\mathbf{1}_{\{c_+(y) < x < c_-(y)\}} + \rho K_+\,q(x,y)\,\mathbf{1}_{\{x \leq c_+(y)\}} - \rho K_-\,q(x,y)\,\mathbf{1}_{\{x \geq c_-(y)\}},$

so that (A-8) becomes

(A-9) $\hat v(x,y) = \mathbb{E}^{\mathbb{Q}}_{(x,y)}\Big[e^{-\rho(\tau_R\wedge T)}\hat v(X_{\tau_R\wedge T}, Y_{\tau_R\wedge T})\Big] + \mathbb{E}^{\mathbb{Q}}_{(x,y)}\Big[\int_0^{\tau_R\wedge T} e^{-\rho s}\,q(X_s, Y_s)\Big(C'(X_s)\,\mathbf{1}_{\{c_+(Y_s) < X_s < c_-(Y_s)\}} - \rho K_+\,\mathbf{1}_{\{X_s \leq c_+(Y_s)\}} + \rho K_-\,\mathbf{1}_{\{X_s \geq c_-(Y_s)\}}\Big)\mathrm{d}s\Big].$

Step 2. Using the relationship (7.3) between $\hat v$ and $\bar v$ and the definition (7.2) of $(X, Y)$, we obtain

(A-10) $\mathbb{E}^{\mathbb{Q}}_{(x,y)}\Big[e^{-\rho(\tau_R\wedge T)}\big|\hat v(X_{\tau_R\wedge T}, Y_{\tau_R\wedge T})\big|\Big] = \mathbb{E}^{\mathbb{Q}}_{(x,y)}\Big[e^{-\rho(\tau_R\wedge T)}\Big|\bar v\Big(X_{\tau_R\wedge T}, e^{\frac{\gamma}{\eta}(X_{\tau_R\wedge T} - Y_{\tau_R\wedge T})}\Big)\Big|\Big] \leq (K_+\vee K_-)\,\mathbb{E}^{\mathbb{Q}}_{(x,\exp\{\frac{\gamma}{\eta}(x-y)\})}\Big[e^{-\rho(\tau_R\wedge T)}\big(1 + \Phi_{\tau_R\wedge T}\big)\Big] = \big(1 + e^{\frac{\gamma}{\eta}(x-y)}\big)\,\mathbb{E}_{(x,\pi)}\Big[e^{-\rho(\tau_R\wedge T)}\Big],$

for $\pi := e^{\frac{\gamma}{\eta}(x-y)}/(1 + e^{\frac{\gamma}{\eta}(x-y)})$, where the last step can be justified by performing a change of measure in the same spirit of Section 4. Clearly, taking limits as $R \uparrow \infty$ and $T \uparrow \infty$ in (A-10) yields

(A-11) $\lim_{T\uparrow\infty}\lim_{R\uparrow\infty}\mathbb{E}^{\mathbb{Q}}_{(x,y)}\Big[e^{-\rho(\tau_R\wedge T)}\hat v(X_{\tau_R\wedge T}, Y_{\tau_R\wedge T})\Big] = 0.$

Step 3.
On one hand, notice that, using the strong solution $(X, Y)$ to (7.2), we get

$\mathbb{E}^{\mathbb{Q}}_{(x,y)}\Big[\int_0^{\tau_R\wedge T} e^{-\rho s}\,q(X_s, Y_s)\,\mathbf{1}_{\{X_s \leq c_+(Y_s)\}}\,\mathrm{d}s\Big] \leq \mathbb{E}^{\mathbb{Q}}_{(x,y)}\Big[\int_0^\infty e^{-\rho s}\,q(X_s, Y_s)\,\mathrm{d}s\Big] = \int_0^\infty e^{-\rho s}\Big(1 + \mathbb{E}^{\mathbb{Q}}_{(x,y)}\big[e^{\frac{\gamma}{\eta}(X_s - Y_s)}\big]\Big)\mathrm{d}s = \int_0^\infty e^{-\rho s}\Big(1 + e^{\frac{\gamma}{\eta}(x-y)}\,\mathbb{E}^{\mathbb{Q}}\big[e^{\gamma W_s - \frac{\gamma^2}{2}s}\big]\Big)\mathrm{d}s < \infty,$

since $W$ is a $\mathbb{Q}$-Brownian motion, so that the latter expectation is equal to 1. This clearly implies the finiteness of the corresponding expectation in (A-9). On the other hand, by a change of measure as that of Section 4 and Assumption 2.1, we also have

$\mathbb{E}^{\mathbb{Q}}_{(x,y)}\Big[\int_0^{\tau_R\wedge T} e^{-\rho s}\,q(X_s, Y_s)\,C'(X_s)\,\mathbf{1}_{\{c_+(Y_s) < X_s < c_-(Y_s)\}}\,\mathrm{d}s\Big] < \infty.$

Step 4. Finally, given the finiteness of all the expectations of integrals appearing in (A-9) due to Step 3, we can apply the monotone convergence theorem to interchange the limits as $R \uparrow \infty$ and $T \uparrow \infty$ with these expectations in (A-9). Therefore, using this fact together with Step 2, we obtain (7.19), which completes the proof. $\square$

References

[1] Bather, J.A. (1966). A continuous time inventory model. J. Appl. Probab. (2) 538–549.
[2] Beneš, V.E., Karatzas, I., Ocone, D., Wang, H. (2004). Control with partial observations and an explicit solution of Mortensen's equation. Appl. Math. Optim. (3) 217–239.
[3] Bensoussan, A., Lions, J.L. (1982). Applications of variational inequalities in stochastic control. North-Holland, Amsterdam.
[4] Callegaro, G., Ceci, C., Ferrari, G. (2020). Optimal reduction of public debt under partial observation of the economic growth. Finance Stoch. (4) 1083–1132.
[5] Cannarsa, P., Sinestrari, C. (2014). Semiconcave functions, Hamilton–Jacobi equations, and optimal control. Progress in Nonlinear Differential Equations and Their Applications, Volume 58. Birkhäuser.
[6] Dai, J.G., Yao, D. (2013). Brownian inventory models with convex holding cost, part 1: Average-optimal controls. Stoch. Syst.
(2), 442–499.
[7] Dai, J.G., Yao, D. (2013). Brownian inventory models with convex holding cost, part 2: Discount-optimal controls. Stoch. Syst. (2), 500–573.
[8] Daley, B., Green, B. (2012). Waiting for news in the market for lemons. Econometrica (4), 1433–1504.
[9] De Angelis, T., Ferrari, G. (2014). A stochastic partially reversible investment problem on a finite time-horizon: Free-boundary analysis. Stoch. Process. Appl.
[10] De Angelis, T., Stabile, G. (2019). On Lipschitz continuous optimal stopping boundaries. SIAM J. Control Optim.
[11] De Angelis, T., Peskir, G. (2020). Global $C^1$ regularity of the value function in optimal stopping problems. Ann. Appl. Probab.
[12] De Angelis, T. (2020). Optimal dividends with partial information and stopping of a degenerate reflecting diffusion. Finance Stoch.
[13] Décamps, J.-P., Mariotti, T., Villeneuve, S. (2005). Investment timing under incomplete information. Math. Oper. Res. (2), 472–500.
[14] Décamps, J.-P., Villeneuve, S. (2020). Dynamics of cash holdings, learning about profitability, and access to the market. TSE Working Paper n. 19-1046, version September 2020.
[15] Dellacherie, C., Meyer, P.-A. (1982). Probabilities and Potential B: Theory of Martingales. North-Holland Publishing Company.
[16] DeMarzo, P.M., Sannikov, Y. (2016). Learning, termination, and payout policy in dynamic incentive contracts. Rev. Econom. Stud. (1), 182–236.
[17] Ekström, E., Peskir, G. (2008). Optimal stopping games for Markov processes. SIAM J. Control Optim.
[18] Ekström, E., Vaicenavicius, J. (2016). Optimal liquidation of an asset under drift uncertainty. SIAM J. Financ. Math.
[19] Eppen, G.D., Fama, E.F. (1969). Cash balance and simple dynamic portfolio problems with proportional costs. Int. Econ. Rev. (2), 119–133.
[20] Federico, S., Pham, H. (2014). Characterization of the optimal boundaries in reversible investment problems. SIAM J. Control Optim. (4), 2180–2223.
[21] Fleming, W.H., Soner, H.M. (2005). Controlled Markov Processes and Viscosity Solutions. 2nd Edition.
Springer.
[22] Harrison, J.M., Taksar, M.I. (1983). Instantaneous control of Brownian motion. Math. Oper. Res.
[23] Harrison, J.M., Taylor, A.J. (1978). Optimal control of a Brownian storage system. Stoch. Process. Appl. (2), 179–194.
[24] He, S., Yao, D., Zhang, H. (2017). Optimal ordering policy for inventory systems with quantity-dependent setup costs. Math. Oper. Res. (4), 979–1006.
[25] Johnson, P., Peskir, G. (2017). Quickest detection problems for Bessel processes. Ann. Appl. Probab. (2), 1003–1056.
[26] Karatzas, I., Shreve, S.E. (1991). Brownian Motion and Stochastic Calculus. Second Edition (First Edition 1988). Springer-Verlag.
[27] Karatzas, I. (1997). Adaptive control of a diffusion to a goal and a parabolic Monge–Ampère-type equation. Asian J. Math. (2), 295–313.
[28] Karatzas, I., Zhao, X. (2001). Bayesian adaptive portfolio optimization. In: Option Pricing, Interest Rates and Risk Management, 632–669. Cambridge Univ. Press.
[29] Karatzas, I., Wang, H. (2005). Connections between bounded-variation control and Dynkin games. In: Optimal Control and Partial Differential Equations; Volume in Honor of Professor Alain Bensoussan's 60th Birthday (J.L. Menaldi, A. Sulem and E. Rofman, eds.), 353–362. IOS Press, Amsterdam.
[30] Lakner, P. (1995). Utility maximization with partial information. Stoch. Process. Appl. (2), 247–273.
[31] Lieberman, G.M. (2005). Second Order Parabolic Differential Equations. World Scientific.
[32] Liptser, R.S., Shiryaev, A.N. (2001). Statistics of Random Processes I. Second Edition (First Edition 1977). Springer-Verlag.
[33] Øksendal, B., Sulem, A. (2012). Singular stochastic control and optimal stopping with partial information of Itô–Lévy processes. SIAM J. Control Optim. (4), 2254–2287.
[34] Peskir, G. (2008). Optimal stopping games and Nash equilibrium. Theory Probab. Appl.
[35] Protter, P.E. (2004). Stochastic Integration and Differential Equations. Second Edition. Springer-Verlag.
[36] Rockafellar, R.T. (1970). Convex Analysis.
Princeton University Press.
[37] Taksar, M.I. (1985). Average optimal singular control and a related stopping problem. Math. Oper. Res.
[38] Xu, Z., Zhang, J., Zhang, R.Q. (2019). Instantaneous control of Brownian motion with a positive lead time. Math. Oper. Res. (3), 943–965.
[39] Yang, J., Yao, D.D., Ye, H.Q. (2020). On the optimality of reflection control. Oper. Res. (6), 1668–1677.
[40] Zipkin, P.H. (2000). Foundations of Inventory Management. McGraw-Hill.

S. Federico: Dipartimento di Economia, Università di Genova, Piazza F. Vivaldi 5, 16126, Genova, Italy
Email address: [email protected]

G. Ferrari: Center for Mathematical Economics (IMW), Bielefeld University, Universitätsstrasse 25, 33615, Bielefeld, Germany
Email address: [email protected]

N. Rodosthenous: School of Mathematical Sciences, Queen Mary University of London, London E1 4NS, UK
Email address: