Error estimates of penalty schemes for quasi-variational inequalities arising from impulse control problems

Abstract

This paper proposes penalty schemes for a class of weakly coupled systems of Hamilton-Jacobi-Bellman quasi-variational inequalities (HJBQVIs) arising from stochastic hybrid control problems of regime-switching models with both continuous and impulse controls. We show that the solutions of the penalized equations converge monotonically to those of the HJBQVIs. We further establish that the schemes are half-order accurate for HJBQVIs with Lipschitz coefficients, and first-order accurate for equations with more regular coefficients. Moreover, we construct the action regions and optimal impulse controls based on the error estimates and the penalized solutions. The penalty schemes and convergence results are then extended to HJBQVIs with possibly negative impulse costs. We also demonstrate the convergence of monotone discretizations of the penalized equations, and establish that policy iteration applied to the discrete equation is monotonically convergent with an arbitrary initial guess in an infinite dimensional setting. Numerical examples for infinite-horizon optimal switching problems are presented to illustrate the effectiveness of the penalty schemes over the conventional direct control scheme.


arXiv [math.OC]

CHRISTOPH REISINGER∗ AND YUFEI ZHANG∗

Key words.

Hybrid control, HJB quasi-variational inequality, monotone system, regime switching, penalty method, error estimate.

AMS subject classifications.

1. Introduction.

In this paper we study penalty schemes and their convergence for the following weakly coupled system of degenerate Hamilton-Jacobi-Bellman quasi-variational inequalities (HJBQVIs): for all $i \in \mathcal{I} := \{1, \ldots, M\}$,

$$\max\Big\{ \sup_{\alpha \in \mathcal{A}_i} L_i^\alpha\big(x, u(x), Du_i(x), D^2 u_i(x)\big),\ (u_i - \mathcal{M}_i u)(x) \Big\} = 0, \quad x \in \mathbb{R}^d, \tag{1.1}$$

where $u = (u_i)_{i \in \mathcal{I}}$ denotes the unknown solution, $(L_i^\alpha)_{i \in \mathcal{I}}$ is a family of second order differential operators, and $\mathcal{M}_i$ is an intervention operator of the following form:

$$(\mathcal{M}_i u)(x) = \min_{z \in Z_i(x)} \{ u_i(\Gamma_i(x, z)) + K_i(x, z) \}. \tag{1.2}$$

The above system extends the classical scalar HJBQVIs, and arises naturally from hybrid control problems of regime-switching models with both continuous and impulse controls (see e.g. [8, 41, 43, 39, 40]). For instance, let $\alpha$ be a càdlàg adapted stochastic control process, and let $\gamma = (\tau_1, \xi_1; \tau_2, \xi_2; \ldots)$ be an impulse control strategy consisting of a sequence of impulse times $0 = \tau_0 \le \tau_1 \le \tau_2 \le \ldots$, and adapted impulse controls $(\xi_1, \xi_2, \ldots)$. Between impulse times, we assume the state process $X$ follows a controlled regime-switching process defined as follows: $X_0 = x \in \mathbb{R}^d$, $I_0 = i \in \mathcal{I}$, and for all $k \in \mathbb{N} \cup \{0\}$,

$$dX_t = b(\alpha_t, I_t, X_t)\,dt + \sigma(\alpha_t, I_t, X_t)\,dW_t, \quad \tau_k < t < \tau_{k+1},$$

where $W$ is a standard Brownian motion, and $I$ is a continuous-time Markov chain with values in the finite set $\mathcal{I}$, which represents the uncertainty in the environment and randomly switches among $M = |\mathcal{I}|$ states, governed by a controlled Markov transition matrix $(d_{ij}^{\alpha_t}(X_t))_{i,j \in \mathcal{I}}$. At an impulse time $\tau_k$, the impulse control $\xi_k$ is applied and instantaneously changes the state into $X_{\tau_k} = \Gamma(I_{\tau_k^-}, X_{\tau_k^-}, \xi_k)$.

∗ Mathematical Institute, University of Oxford, Oxford OX2 6GG, UK ([email protected], [email protected]).
The aim is to minimize the expected cost over all admissible strategies $(\alpha, \gamma)$ by considering the following value function:

$$u_i(x) := \inf_{\alpha, \gamma} \mathbb{E}\bigg[ \int_0^\infty \ell(\alpha_t, I_t, X_t)\, e^{-c(I_t, X_t) t}\,dt + \sum_{k=1}^\infty K(I_{\tau_k^-}, X_{\tau_k^-}, \xi_k)\, e^{-c(I_{\tau_k^-}, X_{\tau_k^-}) \tau_k} \bigg] \tag{1.3}$$

for each $x \in \mathbb{R}^d$ and $i \in \mathcal{I}$, where $\ell$ and $K$ are the running cost and the impulse cost, respectively.

Such hybrid control problems appear in mathematical finance, such as in the following applications: portfolio optimization with transaction costs [28, 33, 3], control of exchange rates [28, 33, 14, 3], credit securitization [38], inventory control and dividend control [28, 4]. It is well-known that under suitable assumptions, the value functions $(u_i)_{i \in \mathcal{I}}$ in (1.3) can be characterized by the viscosity solution to (1.1) (see e.g. [41, 40]). Note that due to the random switching process $I$, each operator $L_i^\alpha$ involves all components of the solution $u$, which leads us to a weakly coupled system of HJBQVIs (see e.g. [43, 11]).

As the solution to (1.1) is in general not known analytically, several classes of numerical schemes have been proposed to solve such nonlinear equations. By writing the obstacle term $u_i - \mathcal{M}_i u$ as $\max_{z \in Z_i(x)} [u_i - u_i(\Gamma_i(x, z)) - K_i(x, z)]$, one can extend the "direct control" scheme of HJB equations to solve (1.1), which discretizes the operators in (1.1) and attempts to solve the resulting nonlinear discrete equations using policy iteration [12, 3]. However, due to the non-strict monotonicity of the term $u_i - \mathcal{M}_i u$, such a scheme in general requires a very accurate initial guess for the policy iteration to converge.
In fact, as we shall show in Remark 6.1, even for some simple intervention operators, policy iteration in the direct control scheme may not be well-defined for an arbitrary initial guess due to the possible singularity of the matrix iterates.

An alternative approach to solving (1.1), referred to as iterated optimal stopping, approximates the QVI by a sequence of HJB variational inequalities (see (4.1)–(4.2)), which can subsequently be solved by the direct control scheme [33, 38]. However, since this approach can be equivalently formulated as a fixed point algorithm for the QVI, one can show that this approach in general suffers from slow convergence (i.e. rate close to 1), especially for small impulse costs [37].

In this work, we shall extend the penalty schemes in [27, 1] for scalar equations (i.e. $M = 1$) to systems of HJBQVIs, and construct the solution of (1.1) from a sequence of penalized equations. The major advantage of the penalty approximation is that one can easily construct convergent monotone discretizations of the penalized equation with a fixed penalty parameter, and policy iteration applied to the discrete equation is monotonically convergent with any initial guess (see Section 6). Moreover, the Lagrange multipliers of the penalized equations enjoy better regularity than those of the unpenalized QVI (1.1). It is observed empirically that this improved regularity leads to mesh-independent behaviour of policy iteration for solving the penalized equations, i.e., the number of iterations for solving the discrete problem remains bounded as the mesh size tends to zero, in contrast to the direct scheme (see Figure 7.2 in Section 7, see also [35, 18]).

All these appealing features motivate us to design efficient penalty schemes for solving systems of HJBQVIs with general intervention operators. We further establish that as the penalty parameter $\rho$ tends to infinity, the solution of the penalized equation converges monotonically from above to the solution of the HJBQVI.
We shall also construct novel convergent approximations of the action regions and optimal impulse control strategies based on the penalized solutions.

Another major contribution of this work is the convergence rate of such penalty approximations for degenerate HJBQVIs, which is novel even in the scalar case (i.e. $M = 1$). Although the convergence of penalty schemes for QVIs has been proved in various works (e.g. [31, 27, 1]), to the best of our knowledge, there is no published work on the accuracy of the penalty approximation with a given penalty parameter (except for [37], where the penalty error for discrete QVIs has been analyzed). This is not only important for the choice of penalty parameters and the practical implementation of penalty schemes, but is also crucial for the construction of action regions (see Remark 4.3) and optimal impulse control strategies. In this work, we shall close the gap by giving a rigorous analysis of the penalty errors.

Let us briefly comment on the two main difficulties encountered in deriving the error estimates. In contrast to the results for finite-dimensional (discretized) QVIs in [37], the convergence rate of penalty approximations for HJBQVIs depends on the regularity of the solution. Since in this work we focus on degenerate HJBQVIs, including the fully degenerate case where $L_i^\alpha$ reduces to a first-order differential operator, the solution of (1.1) is typically not differentiable due to the lack of regularization from the Laplacian operator. Therefore, we need to obtain suitable regularity of the solution to weakly coupled systems based on viscosity solution theory [13].

Moreover, the non-diagonal dominance of the obstacle term $u_i - \mathcal{M}_i u$ poses a significant challenge for estimating the penalization errors. In fact, a crucial step in estimating the penalty error for HJB variational inequalities is to show that there exists a constant $C$, depending on the regularity of the obstacle, such that for any $\rho > 0$, if $u^\rho$ solves the penalized equation with the parameter $\rho$, then $u^\rho - C/\rho$ satisfies the constraint of the variational inequality (see e.g. [26, 42]). However, this is in general false for the QVIs since the term $u_i - \mathcal{M}_i u$ remains invariant under any vertical shift of the solutions.

We shall overcome the above difficulty by combining the ideas of [10, 26] and precise regularity estimates (i.e., Lipschitz continuity and semiconcavity) of solutions to HJB variational inequalities. In particular, we shall construct a family of auxiliary approximations for our error analysis via iterated optimal stopping. This reduces the problem to estimating the solution regularity and penalty errors for a sequence of obstacle problems. We shall derive a more precise estimate for the semiconcavity constant of the solution to HJB variational inequalities with respect to the obstacle term than those in prior works (see the discussion above Proposition 4.3). This is crucial for us to be able to conclude that the penalty approximation is half-order accurate for HJBQVIs with Lipschitz coefficients, and first-order accurate for equations with more regular coefficients (see Theorems 4.10 and 5.2). These convergence rates of penalty schemes for HJBQVIs are optimal in the sense that they are of the same order (up to logarithmic terms) as those for conventional HJB variational inequalities.

We further extend the penalty scheme and its error estimate to a class of HJBQVIs with possibly negative impulse costs. Note that signed costs are not only of mathematical interest, but are also important to model the situation where the controller can obtain a positive impulse benefit, for example, receive financial support for investing in renewable energy production (see [34, 32]). In this setting, we deduce error estimates for a different type of penalty schemes, which apply the penalty to each impulse control strategy, instead of the pointwise maximum over all impulse control strategies (Remark 5.1).
These convergence results rely on a novel construction of a strict subsolution to HJBQVIs with general switching costs, for which we impose less restrictive conditions on the switching costs than those given in the literature (see the discussion after (H.7) for details).

Finally, we would like to point out a control-theoretic interpretation of our penalty schemes. As observed in [29, 30], the viscosity solution of the penalized equation with parameter $\rho$ can be identified as the value function of a hybrid control problem where the controller is only allowed to perform impulse controls at a sequence of Poisson arrival times with intensity $\rho$, instead of any stopping times. Our error estimates give a convergence rate of these hybrid control problems with random intervention times in terms of the intensity $\rho$, which is of independent interest.

We organize this paper as follows. Section 2 states the main assumptions and recalls basic results for the system of HJBQVIs with positive impulse costs. In Section 3 we shall propose a penalty approximation to the HJBQVIs and establish its monotone convergence. Then by exploiting the regularization introduced in Section 4.1, we estimate the convergence rates of the penalty schemes in Section 4.3, and construct convergent approximations to action regions and optimal impulse controls in Section 4.4. We extend the convergence results to HJBQVIs with signed costs in Section 5, and discuss the monotone convergence of policy iteration in Section 6. Numerical examples for infinite-horizon optimal switching problems are presented in Section 7 to illustrate the effectiveness of the penalty schemes. Appendix A is devoted to the proofs of some technical results.

2. HJBQVIs with positive costs.

In this section, we introduce the system of HJBQVIs of our interest, state the main assumptions on its coefficients, and recall the appropriate notion of solutions. We start with some useful notation which is needed frequently throughout this work.

For a function $\phi : \mathbb{R}^d \to \mathbb{R}$, we define the following (semi-)norms:

$$|\phi|_0 = \sup_{x \in \mathbb{R}^d} |\phi(x)|, \quad [\phi]_1 = \sup_{x, y \in \mathbb{R}^d} \frac{|\phi(x) - \phi(y)|}{|x - y|}, \quad |\phi|_1 = |\phi|_0 + [\phi]_1.$$

As usual, we denote by $C^0(\mathbb{R}^d)$ (resp. $C^n(\mathbb{R}^d)$) the space of bounded continuous functions (resp. $n$-times differentiable functions) in $\mathbb{R}^d$, and by $C^1(\mathbb{R}^d)$ the subset of functions in $C^0(\mathbb{R}^d)$ with finite $|\cdot|_1$ norm. Finally, we shall denote by $\mathbb{S}^d$ the set of $d \times d$ symmetric matrices, and by $X \ge Y$ in $\mathbb{S}^d$ the fact that $X - Y$ is positive semi-definite.

We shall consider the following weakly coupled system: for each $i \in \mathcal{I} := \{1, \ldots, M\}$, and $x \in \mathbb{R}^d$,

$$F_i(x, u, Du_i, D^2 u_i) := \max\Big\{ \sup_{\alpha \in \mathcal{A}_i} L_i^\alpha\big(x, u(x), Du_i(x), D^2 u_i(x)\big),\ (u_i - \mathcal{M}_i u)(x) \Big\} = 0, \tag{2.1}$$

where $u = (u_i)_{i \in \mathcal{I}}$, $L_i^\alpha : \mathbb{R}^d \times \mathbb{R}^M \times \mathbb{R}^d \times \mathbb{S}^d \to \mathbb{R}$ is the following linear operator:

$$L_i^\alpha(x, s, p, X) = -\mathrm{tr}[a_i^\alpha(x) X] - b_i^\alpha(x) p + c_i^\alpha(x) s_i - \ell_i^\alpha(x) - \sum_{j \in \mathcal{I}_{-i}} d_{ij}^\alpha(x) s_j, \tag{2.2}$$

with $\mathcal{I}_{-i} := \{ j \in \mathcal{I} \mid j \ne i \}$, and $\mathcal{M}_i$ is the intervention operator (1.2), i.e.,

$$(\mathcal{M}_i u)(x) = \min_{z \in Z_i(x)} \{ u_i(\Gamma_i(x, z)) + K_i(x, z) \}. \tag{2.3}$$

Before introducing the assumptions on the coefficients, let us recall the concept of semiconcavity of a continuous function [13, 5], which is crucial for the subsequent convergence analysis.

Definition 2.1.

A continuous function $\phi$ is semiconcave around $x \in \mathbb{R}^d$ with constant $C \ge 0$, if it holds that

$$\phi(x + h) - 2\phi(x) + \phi(x - h) \le C |h|^2, \quad \text{for all sufficiently small } h \in \mathbb{R}^d. \tag{2.4}$$

We say a continuous function $\phi$ is semiconcave with constant $C \ge 0$ if (2.4) holds for all $x \in \mathbb{R}^d$. For any given semiconcave function $\phi$, we shall denote by $[\phi]_{2,+}$ its semiconcavity constant, i.e.,

$$[\phi]_{2,+} := \inf\{ C \ge 0 \mid \phi(x + h) - 2\phi(x) + \phi(x - h) \le C |h|^2,\ x, h \in \mathbb{R}^d \}.$$

A concave function is clearly semiconcave. Moreover, a $C^1$ function with locally Lipschitz gradient is semiconcave [5].

We now list the main assumptions on the coefficients.

H.1. For any $i, j \in \mathcal{I}$, $\mathcal{A}_i$ is a nonempty compact set, $a_i^\alpha = \sigma_i^\alpha (\sigma_i^\alpha)^T$ for some $\sigma_i^\alpha \in \mathbb{R}^{d \times d'}$, and $\sigma_i^\alpha, b_i^\alpha, \ell_i^\alpha, c_i^\alpha, d_{ij}^\alpha$ are continuous functions. Moreover, there exist constants $C$ and $\lambda$ such that it holds for any $j \ne i$, $\alpha \in \mathcal{A}_i$ that

$$|\sigma_i^\alpha|_1 + |b_i^\alpha|_1 + |\ell_i^\alpha|_1 + |c_i^\alpha|_1 + |d_{ij}^\alpha|_1 \le C, \tag{2.5}$$

$$d_{ij}^\alpha \ge 0, \quad c_i^\alpha - \sum_{j \in \mathcal{I}_{-i}} d_{ij}^\alpha \ge \lambda > 0. \tag{2.6}$$

H.2. For any $i \in \mathcal{I}$ and $x \in \mathbb{R}^d$, $Z_i(x)$ is a nonempty compact set in a metric space $(Z, d_Z)$, $\Gamma_i$ and $K_i$ are continuous functions, and the mapping $x \to Z_i(x)$ is continuous in the Hausdorff metric. Moreover, there exists a constant $\kappa$ such that for all $i \in \mathcal{I}$, $x \in \mathbb{R}^d$ and $z \in Z_i(x)$, we have $K_i(x, z) \ge \kappa > 0$.

The condition (2.5) in (H.1) is the standard regularity assumption for the coefficients in viscosity solution theory, while (2.6) in (H.1) implies the monotonicity of the HJB equations. The condition (H.2) on the intervention operator is the same as that in [38, 1], which ensures the well-posedness of (2.1) in the class of bounded continuous functions. As we shall show in Section 3, they are sufficient for the monotone convergence of the penalty approximation, even for non-convex/non-concave systems involving Isaacs' equations.

The following additional assumptions are necessary to derive the regularity of the value functions and quantify the error estimates of the penalty schemes.
H.3. The constant $\lambda$ in (H.1) satisfies $\lambda > \sup_{\alpha, i} \big( [\sigma_i^\alpha]_1^2 + [b_i^\alpha]_1 \big)$.

H.4. There exists a constant $C > 0$ such that for any $i, j \in \mathcal{I}$, $\alpha \in \mathcal{A}_i$, we have $\sigma_i^\alpha, b_i^\alpha, c_i^\alpha, d_{ij}^\alpha \in C^1(\mathbb{R}^d)$ satisfying the estimate $|D\sigma_i^\alpha|_1 + |Db_i^\alpha|_1 + |Dc_i^\alpha|_1 + |Dd_{ij}^\alpha|_1 \le C$, and $\ell_i^\alpha$ is semiconcave with constant $C$ in $\mathbb{R}^d$.

H.5. For any $i \in \mathcal{I}$, the operator $\mathcal{M}_i$ preserves Lipschitz functions, i.e., there exists a constant $C > 0$ such that for any $u \in C^1(\mathbb{R}^d)$, $\mathcal{M}_i u$ is Lipschitz continuous with a constant satisfying $[\mathcal{M}_i u]_1 \le [u]_1 + C$.

H.6. For any $i \in \mathcal{I}$, the operator $\mathcal{M}_i$ preserves semiconcave functions, i.e., there exists a constant $C > 0$ such that for any given bounded semiconcave function $u$, $\mathcal{M}_i u$ is semiconcave with a constant satisfying $[\mathcal{M}_i u]_{2,+} \le [u]_{2,+} + C$.

Let us briefly discuss the importance of the above assumptions. The condition (H.3) is the standard assumption for the Lipschitz continuity of the value functions (see [31]), while (H.4) will be used to establish the semiconcavity of the solutions in Section 4.1, which is the maximal regularity that one can expect for the solutions of degenerate HJB equations (see e.g. [5]).

Conditions (H.5) and (H.6) are structural assumptions for the intervention operator $\mathcal{M}_i$, which play an essential role in our error estimates. In general these conditions need to be verified in a problem dependent way, as demonstrated in the following special cases.

Example 2.1.

For the commonly studied intervention operator (see e.g. [23, 10, 14, 39, 3, 2]):

$$\mathcal{M}_i u(x) = \inf_{z \in \mathbb{R}^p} [u_i(x + \gamma(z)) + K(z)], \quad x \in \mathbb{R}^d, \tag{2.7}$$

with $|K(z)| \to \infty$ as $|z| \to \infty$, it is straightforward to show that (H.5) and (H.6) hold with $C = 0$ (note the growth of $K$ ensures the optimal impulse strategy is attained in a compact set). See Section 5 for examples with state-dependent impulse costs.

Example 2.2.

For the intervention operator $\mathcal{M} u(x) = \inf_{z \in Z(x)} [u(x - z) + K(z)]$, with $x \in \mathbb{R}^d_+ := (0, \infty)^d$ and $Z(x) = \{ z \in \mathbb{R}^d \mid 0 \le z_i \le x_i,\ i = 1, \ldots, d \}$, which is a concave analogue of the maximum utility operator for multi-dimensional optimal dividend/inventory problems (e.g. [4]), one can show that (H.5) (resp. (H.6)) holds if $K$ is Lipschitz continuous (resp. semiconcave).

We shall only discuss (H.6), since (H.5) can be shown by a similar approach. Let $x \in \mathbb{R}^d_+$ and $\hat{z} \in Z(x)$ such that $\mathcal{M} u(x) = u(x - \hat{z}) + K(\hat{z})$. Define the set $I_x = \{ 1 \le i \le d \mid \hat{z}_i = x_i \}$ and the constant $h_0 = \min\big( \min_{i \notin I_x} (x_i - \hat{z}_i),\ \min_{i = 1, \ldots, d} x_i \big) > 0$. Then for any given $h \in \mathbb{R}^d$ such that $|h| < h_0$, we can consider the vector $h^{I_x} = (h^{I_x}_i)_{i=1}^d$ defined by $h^{I_x}_i = h_i$ if $i \in I_x$ and $0$ otherwise, which satisfies the following properties:

$$0 \le \hat{z}_i + h^{I_x}_i \le x_i + h_i, \quad 0 \le \hat{z}_i - h^{I_x}_i \le x_i - h_i, \quad \forall i = 1, \ldots, d.$$

In other words, we have $z^+ := \hat{z} + h^{I_x} \in Z(x + h)$ and $z^- := \hat{z} - h^{I_x} \in Z(x - h)$. Therefore one can deduce from the semiconcavity of $u$ and $K$ that $\mathcal{M} u$ is semiconcave around $x$:

$$\begin{aligned}
\mathcal{M} u(x + h) - 2 \mathcal{M} u(x) + \mathcal{M} u(x - h)
&\le u(x + h - z^+) + K(z^+) - 2\big( u(x - \hat{z}) + K(\hat{z}) \big) + u(x - h - z^-) + K(z^-) \\
&\le [u]_{2,+} |h - h^{I_x}|^2 + [K]_{2,+} |h^{I_x}|^2 \le \big( [u]_{2,+} + [K]_{2,+} \big) |h|^2,
\end{aligned}$$

which subsequently leads to the desired estimate $[\mathcal{M} u]_{2,+} \le [u]_{2,+} + [K]_{2,+}$.

Example 2.3.

A general intervention operator (2.3) satisfies (H.5) under the following Lipschitz conditions on the data: there exist constants $C_1, C_2, C_3, C_4 \ge 0$ such that $C_2 + C_1 C_3 \le 1$ and

$$\begin{aligned}
& Z_i(y) \subseteq Z_i(x) + \bar{B}_{C_1 |x - y|}, \quad |\Gamma_i(x, z) - \Gamma_i(y, z')| \le C_2 |x - y| + C_3\, d_Z(z, z'), \\
& |K_i(x, z) - K_i(y, z')| \le C_4 \big( |x - y| + d_Z(z, z') \big), \quad \forall i \in \mathcal{I},\ x, y \in \mathbb{R}^d,\ z, z' \in Z,
\end{aligned}$$

where for each $r \ge 0$, $\bar{B}_r$ denotes a closed ball of center $0$ and radius $r$ in the metric space $Z$. In fact, let $i \in \mathcal{I}$, $u \in C^1(\mathbb{R}^d)$, $x, y \in \mathbb{R}^d$, $\hat{z}(y) \in Z_i(y)$ such that $(\mathcal{M}_i u)(y) = u_i(\Gamma_i(y, \hat{z}(y))) + K_i(y, \hat{z}(y))$. Then we can find $z(x) \in Z_i(x)$ such that $d_Z(z(x), \hat{z}(y)) \le C_1 |x - y|$, which leads to the following estimate:

$$\begin{aligned}
(\mathcal{M}_i u)(x) - (\mathcal{M}_i u)(y)
&\le \big[ u_i(\Gamma_i(x, z(x))) + K_i(x, z(x)) \big] - \big[ u_i(\Gamma_i(y, \hat{z}(y))) + K_i(y, \hat{z}(y)) \big] \\
&\le [u]_1 |\Gamma_i(x, z(x)) - \Gamma_i(y, \hat{z}(y))| + |K_i(x, z(x)) - K_i(y, \hat{z}(y))| \\
&\le [u]_1 \big( C_2 |x - y| + C_3\, d_Z(z(x), \hat{z}(y)) \big) + C_4 \big( |x - y| + d_Z(z(x), \hat{z}(y)) \big) \\
&\le \big( [u]_1 (C_2 + C_1 C_3) + C_4 (1 + C_1) \big) |x - y|.
\end{aligned}$$

Hence we can conclude from the assumption $C_2 + C_1 C_3 \le 1$ that $\mathcal{M}_i$ satisfies (H.5) with $C = C_4 (1 + C_1)$. A sufficient condition of (H.6) for the intervention operator (2.3) in general involves technical second-order conditions on the set-valued mapping $x \to Z_i(x)$, which will not be derived here for the sake of simplicity.

Note that we do not require any non-degeneracy condition on the diffusion coefficients, i.e., the coefficient $a_i^\alpha$ may vanish at certain points, hence our results apply to the fully degenerate case with $a^\alpha = 0$, where (2.1) reduces to QVIs of first order.

We now discuss the well-posedness of the HJBQVI (2.1). Due to the lack of regularization from a Laplacian operator, the solution of (2.1) is typically nonsmooth and we shall understand all equations in this work in the following viscosity sense.
Definition 2.2. A bounded, upper-semicontinuous (resp. lower-semicontinuous) function $u = (u_i)_{i \in \mathcal{I}}$ is a viscosity subsolution (resp. supersolution) to (2.1), if for each $i \in \mathcal{I}$ and function $\phi \in C^2(\mathbb{R}^d)$, at each local maximum (resp. minimum) point $x$ of $u_i - \phi$ we have $F_i(x, u(x), D\phi(x), D^2\phi(x)) \le 0$ (resp. $\ge 0$). A continuous function is a viscosity solution of (2.1) if it is both a subsolution and a supersolution.

Remark 2.1.

Definition 2.2 formulates the notion of viscosity solution with suitable test functions. It is well-known that one can equivalently define the viscosity solution to (2.1) in terms of the superjet and subjet of $u_i$ at $x \in \mathbb{R}^d$, denoted by $J^{2,+} u_i(x)$ and $J^{2,-} u_i(x)$ respectively, or their closures $\bar{J}^{2,+} u_i(x)$ and $\bar{J}^{2,-} u_i(x)$ (see e.g. [22, Proposition 2.3]).

The fact that the impulse cost is strictly positive (see (H.2)) implies $-C$ is a strict subsolution to (2.1) for a large enough constant $C > 0$. Therefore, one can establish a comparison principle of (2.1) by using similar arguments as in [23] (cf. the proof of Proposition 3.1; see also [38, Theorem 2.5.11] for a related result for solutions of polynomial growth). The comparison principle directly leads to the uniqueness of bounded viscosity solutions to (2.1), which can be explicitly constructed through penalty approximations (Theorem 3.2).
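The strict-subsolution claim can be checked by a direct computation. The following is a sketch that uses only the operator (2.2), the intervention operator (2.3), the monotonicity condition (2.6) and the positivity of the impulse cost in (H.2); the symbol $C_\ell$, denoting any uniform bound on $|\ell_i^\alpha|$, is introduced here purely for illustration. Evaluating $F_i$ at the constant function $u \equiv -C$ (all components equal to $-C$, with zero derivatives) gives:

```latex
\begin{align*}
L_i^\alpha(x, -C, 0, 0)
  &= -\Big( c_i^\alpha(x) - \sum_{j \in \mathcal{I}_{-i}} d_{ij}^\alpha(x) \Big) C - \ell_i^\alpha(x)
   \le -\lambda C + C_\ell, \\
(u_i - \mathcal{M}_i u)(x) \big|_{u \equiv -C}
  &= -C - \min_{z \in Z_i(x)} \{ -C + K_i(x, z) \}
   = -\min_{z \in Z_i(x)} K_i(x, z) \le -\kappa, \\
F_i(x, -C, 0, 0)
  &\le \max\{ -\lambda C + C_\ell,\ -\kappa \}
   = -\min\{ \lambda C - C_\ell,\ \kappa \} < 0
  \quad \text{whenever } C > C_\ell / \lambda .
\end{align*}
```

The obstacle term contributes the strictly negative bound $-\kappa$ regardless of $C$, because a vertical shift of $u$ cancels in $u_i - \mathcal{M}_i u$; this is precisely where the strict positivity of the impulse cost is used.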

Proposition 2.1. Suppose (H.1) and (H.2) hold. If $u$ (resp. $v$) is a bounded subsolution (resp. supersolution) of (2.1), then $u \le v$ in $\mathbb{R}^d$.

We end this section by collecting several important properties of the intervention operator $\mathcal{M}_i$.

Lemma 2.2.

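For any $i \in \mathcal{I}$, we have:

(1) $\mathcal{M}_i$ is concave, i.e., it holds for any locally bounded functions $u, v : \mathbb{R}^d \to \mathbb{R}$ and constant $\lambda \in [0, 1]$ that $\mathcal{M}_i[(1 - \lambda) u + \lambda v] \ge (1 - \lambda) \mathcal{M}_i u + \lambda \mathcal{M}_i v$.

(2) $\mathcal{M}_i$ is monotone, i.e., if $u \ge v$, then $\mathcal{M}_i u \ge \mathcal{M}_i v$.

(3) Suppose (H.2) holds, and let $(u^\rho)_{\rho \in \mathbb{N}}$ be a family of uniformly bounded functions on $\mathbb{R}^d$ with the following half-relaxed limits $u^*$ and $u_*$:

$$u^*(x) := \limsup_{\rho \to \infty,\, y \to x} u^\rho(y), \quad u_*(x) := \liminf_{\rho \to \infty,\, y \to x} u^\rho(y), \quad x \in \mathbb{R}^d. \tag{2.8}$$

Then it holds for any given $x \in \mathbb{R}^d$ and sequence $(x_\rho)_{\rho \in \mathbb{N}}$ with $\lim_{\rho \to \infty} x_\rho = x$ that

$$(\mathcal{M}_i u_*)(x) \le \liminf_{\rho \to \infty} (\mathcal{M}_i u^\rho)(x_\rho) \le \limsup_{\rho \to \infty} (\mathcal{M}_i u^\rho)(x_\rho) \le (\mathcal{M}_i u^*)(x). \tag{2.9}$$

Proof.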

Properties (1) and (2) follow directly from the structure of $\mathcal{M}_i$. Property (3) is an analogue of [1, Lemma 12] to the present concave intervention operator $\mathcal{M}_i$ and compact set $Z_i$, whose proof will be given in Appendix A for completeness.
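Properties (1) and (2) of the lemma can also be observed numerically on a finite-state analogue of the intervention operator. The sketch below is only an illustration under stated assumptions: the state space is replaced by `n` grid points, a jump from state `i` to state `j` plays the role of $\Gamma_i(x, z)$, and the cost matrix `K`, the function names `intervention` and `check_lemma`, and the random test data are all hypothetical choices made here, not part of the paper.

```python
import random

def intervention(u, K):
    """Discrete analogue of (M u)(x_i) = min_j { u(x_j) + K[i][j] }."""
    n = len(u)
    return [min(u[j] + K[i][j] for j in range(n)) for i in range(n)]

def check_lemma(n=8, trials=200, seed=0, tol=1e-12):
    """Test concavity and monotonicity of the discrete operator on random data."""
    rng = random.Random(seed)
    # Strictly positive impulse costs, mimicking K >= kappa > 0 in (H.2).
    K = [[0.1 + rng.random() for _ in range(n)] for _ in range(n)]
    for _ in range(trials):
        u = [rng.uniform(-1, 1) for _ in range(n)]
        v = [rng.uniform(-1, 1) for _ in range(n)]
        lam = rng.random()
        mix = [(1 - lam) * a + lam * b for a, b in zip(u, v)]
        Mu, Mv, Mmix = intervention(u, K), intervention(v, K), intervention(mix, K)
        # (1) concavity: M[(1 - lam) u + lam v] >= (1 - lam) M u + lam M v
        if any(m < (1 - lam) * a + lam * b - tol for m, a, b in zip(Mmix, Mu, Mv)):
            return False
        # (2) monotonicity: w >= u pointwise implies M w >= M u
        w = [a + rng.random() for a in u]
        if any(mw < mu - tol for mw, mu in zip(intervention(w, K), Mu)):
            return False
    return True
```

Calling `check_lemma()` runs both checks on random vectors; the small tolerance `tol` only absorbs floating-point rounding. Both properties hold because a pointwise minimum of affine functions of $u$ is concave and monotone in $u$.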

3. Penalty approximations for HJBQVIs.

In this section, we propose a penalty approximation for the system of HJBQVIs (2.1), which is an extension of the ideas used for scalar HJBQVIs in [7, 1]. We shall also establish the monotone convergence of the penalized solutions in terms of the penalty parameter.

For any given penalty parameter $\rho \ge 0$, we consider the following system of HJB equations: for all $i \in \mathcal{I}$ and $x \in \mathbb{R}^d$,

$$F_i^\rho(x, u^\rho, Du_i^\rho, D^2 u_i^\rho) := \sup_{\alpha \in \mathcal{A}_i} L_i^\alpha\big(x, u^\rho(x), Du_i^\rho(x), D^2 u_i^\rho(x)\big) + \rho\, (u_i^\rho - \mathcal{M}_i u^\rho)^+(x) = 0, \tag{3.1}$$

where the operators $L_i^\alpha$ and $\mathcal{M}_i$ are defined as in (2.2) and (2.3), respectively.

The definitions of viscosity solution, sub- and supersolution for (3.1) extend naturally from Definition 2.2. The following result asserts the comparison principle and the well-posedness of (3.1) for any given penalty parameter.

Proposition 3.1.

Suppose (H.1) and (H.2) hold, and let $\rho \ge 0$ be a given penalty parameter. If $u^\rho$ (resp. $v^\rho$) is a bounded subsolution (resp. supersolution) of (3.1), then $u^\rho \le v^\rho$ in $\mathbb{R}^d$. Consequently, (3.1) admits a unique viscosity solution, which is uniformly bounded in $\rho$.

Proof. We postpone the proof of the comparison principle to Appendix A, which adapts the strict subsolution technique in [23] to the penalized equation, and reduces the problem to a HJB equation without the penalty part.

Since $K(x, z) \ge \kappa > 0$, there exists a large enough constant $C$, independent of $\rho$, such that $-C$ and $C$ are the viscosity sub- and supersolution of (3.1) with any parameter $\rho$, respectively. Thus by using the comparison principle and Perron's method (see [22, Theorem 3.3]), one can deduce the well-posedness of (3.1) in the viscosity sense.

The next result demonstrates the monotone and locally uniform convergence of the solution $(u^\rho)_{\rho \ge 0}$ of (3.1) in terms of the penalty parameter $\rho$.

Theorem 3.2.

Suppose (H.1) and (H.2) hold. Then as $\rho \to \infty$, the solution of (3.1) converges monotonically from above to the bounded viscosity solution of (2.1), uniformly on compact sets.

Proof. It is clear that if $\rho_1 \le \rho_2$ and $u^{\rho_2}$ is a subsolution to (3.1) with the parameter $\rho_2$, then $u^{\rho_2}$ is a subsolution to (3.1) with the parameter $\rho_1$. Hence the comparison principle leads to the fact that $u^0 \ge u^{\rho_1} \ge u^{\rho_2}$. Now we shall adopt the equivalent definition of viscosity solution in terms of semi-jets and prove that the component-wise half-relaxed limit $u^*$ (resp. $u_*$) is a subsolution (resp. supersolution) to (2.1).

We start by showing $u^*$ is a subsolution. Let $x \in \mathbb{R}^d$, $i \in \mathcal{I}$ and $(p, X) \in J^{2,+} u_i^*(x)$, then by applying [13, Lemma 6.1], there exist sequences $(x_\rho, p_\rho, X_\rho)_{\rho \in \mathbb{N}}$ such that $(p_\rho, X_\rho) \in J^{2,+} u_i^\rho(x_\rho)$ for each $\rho$ and $(x_\rho, u_i^\rho(x_\rho), p_\rho, X_\rho) \to (x, u_i^*(x), p, X)$ as $\rho \to \infty$. Since $u^\rho$ is a subsolution to (3.1), we have

$$\sup_{\alpha \in \mathcal{A}_i} L_i^\alpha(x_\rho, u^\rho(x_\rho), p_\rho, X_\rho) + \rho\, (u_i^\rho - \mathcal{M}_i u^\rho)^+(x_\rho) \le 0, \quad \forall \rho \in \mathbb{N}. \tag{3.2}$$

Then it follows from the boundedness of coefficients that there exists a constant $C > 0$ such that

$$u_i^\rho(x_\rho) - \mathcal{M}_i u^\rho(x_\rho) \le (u_i^\rho - \mathcal{M}_i u^\rho)^+(x_\rho) \le C / \rho,$$

hence by letting $\rho \to \infty$ and using Lemma 2.2 (3), we deduce that

$$u_i^*(x) = \lim_{\rho \to \infty} u_i^\rho(x_\rho) \le \limsup_{\rho \to \infty} \big( \mathcal{M}_i u^\rho(x_\rho) + C / \rho \big) \le \mathcal{M}_i u^*(x).$$

On the other hand, (3.2) yields for any $\alpha \in \mathcal{A}_i$, $L_i^\alpha(x_\rho, u^\rho(x_\rho), p_\rho, X_\rho) \le 0$, which implies that

$$-\mathrm{tr}[a_i^\alpha(x) X] - b_i^\alpha(x) p + c_i^\alpha(x) u_i^*(x) - \ell_i^\alpha(x) \le \sum_{j \in \mathcal{I}_{-i}} \limsup_{\rho \to \infty} d_{ij}^\alpha(x_\rho) u_j^\rho(x_\rho) \le \sum_{j \in \mathcal{I}_{-i}} d_{ij}^\alpha(x) u_j^*(x), \tag{3.3}$$

where we have used the fact that $\lim_{\rho \to \infty} d_{ij}^\alpha(x_\rho) = d_{ij}^\alpha(x) \ge 0$. Then by taking the supremum over $\alpha$, we have $\sup_{\alpha \in \mathcal{A}_i} L_i^\alpha(x, u^*(x), p, X) \le 0$, which shows $u^*$ is a subsolution to (2.1).

Then we proceed to study $u_*$ by fixing $x \in \mathbb{R}^d$, $i \in \mathcal{I}$ and $(p, X) \in J^{2,-} (u_*)_i(x)$. Let $(x_\rho, p_\rho, X_\rho)_{\rho \in \mathbb{N}}$ be a sequence such that $(p_\rho, X_\rho) \in J^{2,-} u_i^\rho(x_\rho)$ for each $\rho$ and $(x_\rho, u_i^\rho(x_\rho), p_\rho, X_\rho) \to (x, (u_*)_i(x), p, X)$ as $\rho \to \infty$. Then the supersolution property of $u^\rho$ implies for each $\rho \in \mathbb{N}$,

$$\sup_{\alpha \in \mathcal{A}_i} L_i^\alpha(x_\rho, u^\rho(x_\rho), p_\rho, X_\rho) + \rho\, (u_i^\rho - \mathcal{M}_i u^\rho)^+(x_\rho) \ge 0. \tag{3.4}$$

Suppose that $\limsup_{\rho \to \infty} \rho\, (u_i^\rho - \mathcal{M}_i u^\rho)^+(x_\rho) > 0$, then, by possibly passing to a subsequence, we have $u_i^\rho(x_\rho) > (\mathcal{M}_i u^\rho)(x_\rho)$ for all $\rho$. Then, by using Lemma 2.2 (3), we obtain that $(u_*)_i(x) \ge \liminf_{\rho \to \infty} (\mathcal{M}_i u^\rho)(x_\rho) \ge (\mathcal{M}_i u_*)(x)$.

On the other hand, suppose that $\limsup_{\rho \to \infty} \rho\, (u_i^\rho - \mathcal{M}_i u^\rho)^+(x_\rho) = 0$, then for any $\delta > 0$ and all sufficiently large $\rho \in \mathbb{N}$, it follows from (3.4) that there exists $\alpha_{\rho,\delta} \in \mathcal{A}_i$ such that

$$-\mathrm{tr}[a_i^{\alpha_{\rho,\delta}}(x_\rho) X_\rho] - b_i^{\alpha_{\rho,\delta}}(x_\rho) p_\rho + c_i^{\alpha_{\rho,\delta}}(x_\rho) u_i^\rho(x_\rho) - \ell_i^{\alpha_{\rho,\delta}}(x_\rho) - \sum_{j \in \mathcal{I}_{-i}} d_{ij}^{\alpha_{\rho,\delta}}(x_\rho) u_j^\rho(x_\rho) \ge -\delta.$$

Since $\mathcal{A}_i$ is compact, we can assume $\alpha_{\rho,\delta} \to \alpha_\delta \in \mathcal{A}_i$ as $\rho \to \infty$. Then, by taking the limit inferior and using the fact that $\liminf_{\rho \to \infty} d_{ij}^{\alpha_{\rho,\delta}}(x_\rho) u_j^\rho(x_\rho) \ge d_{ij}^{\alpha_\delta}(x) (u_*)_j(x)$, we deduce that

$$-\mathrm{tr}[a_i^{\alpha_\delta}(x) X] - b_i^{\alpha_\delta}(x) p + c_i^{\alpha_\delta}(x) (u_*)_i(x) - \ell_i^{\alpha_\delta}(x) - \sum_{j \in \mathcal{I}_{-i}} d_{ij}^{\alpha_\delta}(x) (u_*)_j(x) \ge -\delta, \tag{3.5}$$

from which, by taking the supremum over $\alpha$ and sending $\delta \to 0$, we can deduce that $\sup_{\alpha \in \mathcal{A}_i} L_i^\alpha(x, u_*(x), p, X) \ge 0$, and conclude that $u_*$ is a supersolution to (2.1).

Finally, by using Proposition 2.1, we have $u := u^* = u_*$ is the unique continuous viscosity solution of (2.1) and consequently $(u^\rho)$ converges to $u$ locally uniformly.

Remark 3.1.

Theorem 3.2 provides us with a constructive proof for the existence of solutions of (2.1) based on penalty approximations. Moreover, since the convergence analysis relies only on the comparison principle of (2.1) and the local boundedness of $(u^\rho)_{\rho \ge 0}$, it is possible to extend the results to nonlocal non-convex/non-concave systems with coefficients of polynomial growth.
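The monotone convergence in Theorem 3.2 can be observed on a small finite-state analogue of the penalized equation (3.1). The following is a minimal sketch under assumptions made here for illustration only: the differential operator is replaced by a matrix `A = I - beta * P` for a reflecting random walk `P`, there is a single regime and no continuous control, a jump from state `i` to state `j` costs `K[i][j]`, and the penalized system is solved by policy iteration started from zero. The names `solve_linear`, `solve_penalized` and `demo` and all parameter values are hypothetical; this is not the discretization analyzed in the paper.

```python
def solve_linear(B, rhs):
    """Gaussian elimination without pivoting (adequate for the strictly
    diagonally dominant systems assembled below)."""
    n = len(rhs)
    A = [row[:] + [r] for row, r in zip(B, rhs)]    # augmented matrix
    for k in range(n):
        for i in range(k + 1, n):
            f = A[i][k] / A[k][k]
            if f != 0.0:
                for j in range(k, n + 1):
                    A[i][j] -= f * A[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = A[i][n] - sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / A[i][i]
    return x

def solve_penalized(A, ell, K, rho, tol=1e-10, max_iter=200):
    """Policy iteration for the discrete penalized equation
    (A u)_i - ell_i + rho * max(u_i - min_j (u_j + K[i][j]), 0) = 0."""
    n = len(ell)
    u = [0.0] * n                                   # arbitrary initial guess
    for _ in range(max_iter):
        B = [row[:] for row in A]
        rhs = ell[:]
        for i in range(n):
            j = min(range(n), key=lambda m: u[m] + K[i][m])  # best jump target
            if u[i] > u[j] + K[i][j]:               # penalty active at state i
                B[i][i] += rho
                B[i][j] -= rho
                rhs[i] += rho * K[i][j]
        u_new = solve_linear(B, rhs)                # solve for the frozen policy
        if max(abs(a - b) for a, b in zip(u_new, u)) < tol:
            return u_new
        u = u_new
    return u

def demo(n=15, beta=0.9, rhos=(1.0, 10.0, 100.0, 1000.0)):
    """Solve the toy problem for increasing penalty parameters."""
    x = [i / (n - 1) for i in range(n)]
    # A = I - beta * P: nonpositive off-diagonals, row sums equal to 1 - beta > 0.
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] += 1.0
        for m in (max(i - 1, 0), min(i + 1, n - 1)):
            A[i][m] -= 0.5 * beta
    ell = [4.0 * (xi - 0.5) ** 2 for xi in x]       # running cost in [0, 1]
    K = [[0.05 + 0.1 * abs(xi - xj) for xj in x] for xi in x]  # cost >= 0.05
    return [solve_penalized(A, ell, K, rho) for rho in rhos], K
```

Calling `demo()` returns one solution vector per penalty parameter together with the cost matrix. In this toy setting the vectors are expected to decrease componentwise as `rho` grows, and the constraint violation `max(u_i - min_j (u_j + K[i][j]), 0)` is expected to shrink like a constant over `rho`, mirroring the bound $(u_i^\rho - \mathcal{M}_i u^\rho)^+ \le C / \rho$ used in the proof above.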

4. Error estimates for penalty approximations.

In this section, we shall proceed to analyze the convergence rate of the penalty approximation for (2.1). As pointed out in Section 1, unlike the variational inequalities [26], the non-strict monotonicity of the term $u_i - \mathcal{M}_i u$ prevents us from obtaining an upper bound of $u^\rho - u$ by constructing a subsolution of (2.1) directly from the penalized equations, which significantly complicates the error analysis. We shall overcome this difficulty by regularizing the HJBQVIs, and recover the same convergence rates (up to a logarithmic term) as those for conventional obstacle problems.

In this section, we approximate (2.1) by a sequence of obstacle problems, through the iterated optimal stopping approximation (see e.g. [31, 38, 16] for its application to QVIs). We shall quantify the approximation errors of these obstacle problems depending on the regularity of the solution, which we also establish.

Let $u^0 = (u_i^0)_{i \in \mathcal{I}}$ be the viscosity solution of the following system of HJB equations:

$$\sup_{\alpha \in \mathcal{A}_i} L_i^\alpha\big(x, u^0(x), Du_i^0(x), D^2 u_i^0(x)\big) = 0, \quad x \in \mathbb{R}^d,\ i \in \mathcal{I}. \tag{4.1}$$

We then inductively define a sequence of functions $\{u^n\}_{n \in \mathbb{N}}$, where for each $n \in \mathbb{N}$, i.e., $n > 0$, given functions $u^{n-1}$, let $u^n = (u_i^n)_{i \in \mathcal{I}}$ be the viscosity solution to the following obstacle problem:

$$\max\Big\{ \sup_{\alpha \in \mathcal{A}_i} L_i^\alpha\big(x, u^n(x), Du_i^n(x), D^2 u_i^n(x)\big),\ (u_i^n - \mathcal{M}_i u^{n-1})(x) \Big\} = 0, \quad x \in \mathbb{R}^d,\ i \in \mathcal{I}. \tag{4.2}$$

Under the assumptions (H.1)–(H.3) and (H.5), one can establish the comparison principles for (4.1) and (4.2), and then demonstrate the existence of $u^n \in [C^1(\mathbb{R}^d)]^M$ for each $n \ge 0$, with $u^{n-1} \ge u^n$ for all $n \in \mathbb{N}$.

The following proposition estimates the approximation error $u^n - u$, which extends the results in [10, 16] to weakly coupled systems with (possibly) negative running cost $(\ell_i)_{i \in \mathcal{I}}$.

Proposition 4.1.

Suppose (H.1)–(H.3) and (H.5) hold, and let $u$ and $u^n$ be the viscosity solutions to (2.1) and (4.2), respectively. Then there exist constants $\mu \in (0, 1)$ and $C \ge 0$ such that
$$0 \le u^n - u \le C(1 - \mu)^n, \quad n \ge 0.$$
Consequently, the iterates $(u^n)_{n \ge 0}$ are bounded uniformly in $n$.

Proof. We adapt the arguments for [37, Theorem 3.4] to the current continuous setting, and present the main steps in Appendix A for the reader's convenience.

Now we turn to investigate the regularity of solutions to (4.2) based on different assumptions on the coefficients. We shall first focus on the following variational inequalities:
$$\max\Big\{ \sup_{\alpha \in \mathcal{A}_i} L^\alpha_i(x, u(x), Du_i(x), D^2 u_i(x)),\ (u_i - \Psi_i)(x) \Big\} = 0, \quad i \in \mathcal{I}, \tag{4.3}$$
with given obstacles $(\Psi_i)_{i \in \mathcal{I}}$, which serves as a general form of the iterative equations (4.2).

The following result shows the Lipschitz continuity of the solution to the obstacle problem (4.3), which can be proved by using the standard doubling of variables technique (see e.g. [11, 16]).

Proposition 4.2.

Suppose (H.1) and (H.3) hold, and $\Psi \in [C(\mathbb{R}^d)]^M$. Then the viscosity solution $u$ to (4.3) is Lipschitz continuous with constant
$$\sup_{i \in \mathcal{I}} [u_i]_1 \le \max\Big( C, \sup_{i \in \mathcal{I}} [\Psi_i]_1 \Big),$$
where $C$ is a constant independent of $[\Psi_i]_1$.

We then proceed to study higher regularity of the solutions, which enables us to deduce a higher convergence rate of the penalty approximation. The next proposition extends the results in [20, 19] to weakly coupled systems, and asserts that if the coefficients are sufficiently regular, then the solution to the obstacle problem (4.3) is semiconcave.

Note that instead of viewing the obstacle problem (4.3) as a convex HJB equation as is studied in [20, 19], we shall separately analyze the obstacle part and the HJB part of (4.3), which leads to a sharper estimate for the semiconcavity constant of $u$ in terms of $\Psi$. Moreover, instead of requiring $\Psi_i \in W^{2,\infty}(\mathbb{R}^d)$ as in [20, 19] (which essentially means $\Psi_i$ is differentiable with bounded and Lipschitz continuous derivative), we only assume the obstacles to be semiconcave, which is crucial for the subsequent analysis of penalty errors.

Proposition 4.3.

Suppose (H.1) and (H.4) hold. Assume further that the constant $\lambda$ in (H.1) is sufficiently large and the obstacle $\Psi \in [C(\mathbb{R}^d)]^M$ is semiconcave. Then the viscosity solution $u \in [C(\mathbb{R}^d)]^M$ to (4.3) is semiconcave with a constant satisfying the estimate
$$\sup_{i \in \mathcal{I}} [u_i]_{2,+} \le \max\Big\{ C \sup_{i \in \mathcal{I}} |u_i|_1,\ \sup_{i \in \mathcal{I}} [\Psi_i]_{2,+} \Big\},$$
for some constant $C$, independent of $[\Psi_j]_{2,+}$, $[\Psi_j]_1$ and $[u_j]_1$ for all $j \in \mathcal{I}$.

Proof. For $\delta, \varepsilon, \gamma > 0$, we define for all $x, y, z \in \mathbb{R}^d$ that
$$\phi(x, y, z) = \delta |x - y|^4 + \varepsilon |x + y - 2z|^2 + \gamma |x|^2, \qquad \Phi_i(x, y, z) = u_i(x) + u_i(y) - 2u_i(z) - \phi(x, y, z),$$
and let $m_{\delta,\varepsilon,\gamma} := \sup_{(x,y,z) \in \mathbb{R}^{3d},\, i \in \mathcal{I}} \Phi_i(x, y, z)$. By the finiteness of $\mathcal{I}$, the boundedness and continuity of $(u_i)_{i \in \mathcal{I}}$, and the penalization term $\phi$, there exist $i \in \mathcal{I}$, independent of $\delta, \varepsilon, \gamma$, and $(\bar x_{\delta,\varepsilon,\gamma}, \bar y_{\delta,\varepsilon,\gamma}, \bar z_{\delta,\varepsilon,\gamma}) \in \mathbb{R}^{3d}$ such that $m_{\delta,\varepsilon,\gamma} = \Phi_i(\bar x_{\delta,\varepsilon,\gamma}, \bar y_{\delta,\varepsilon,\gamma}, \bar z_{\delta,\varepsilon,\gamma})$. In the following we shall omit the dependence on $\delta, \varepsilon, \gamma$ for notational simplicity. Then we can deduce from [13, Theorem 3.2] that for any $\theta > 1$, there exist $X, Y, Z \in \mathbb{S}^d$ such that
$$(p_x, X) \in \bar J^{2,+} u_i(\bar x), \quad (p_y, Y) \in \bar J^{2,+} u_i(\bar y), \quad (-p_z/2, -Z/2) \in \bar J^{2,-} u_i(\bar z),$$
where $(p_x, p_y, p_z) = (D_x \phi(\bar x, \bar y, \bar z), D_y \phi(\bar x, \bar y, \bar z), D_z \phi(\bar x, \bar y, \bar z))$, and
$$\begin{pmatrix} X & 0 & 0 \\ 0 & Y & 0 \\ 0 & 0 & Z \end{pmatrix} \le \theta D^2 \phi(\bar x, \bar y, \bar z).$$
Hence, by the definition of viscosity solution, we obtain that
$$\begin{aligned}
\max\Big\{ \sup_{\alpha \in \mathcal{A}_i} L^\alpha_i(\bar x, u(\bar x), p_x, X),\ (u_i - \Psi_i)(\bar x) \Big\} &\le 0, \\
\max\Big\{ \sup_{\alpha \in \mathcal{A}_i} L^\alpha_i(\bar y, u(\bar y), p_y, Y),\ (u_i - \Psi_i)(\bar y) \Big\} &\le 0, \\
\max\Big\{ \sup_{\alpha \in \mathcal{A}_i} L^\alpha_i(\bar z, u(\bar z), -p_z/2, -Z/2),\ (u_i - \Psi_i)(\bar z) \Big\} &\ge 0.
\end{aligned} \tag{4.4}$$

Now we discuss two cases. Suppose the maximum in the third inequality of (4.4) is attained by its second argument. Then we obtain from (4.4) that
$$\begin{aligned}
u_i(\bar x) + u_i(\bar y) - 2u_i(\bar z) &\le \Psi_i(\bar x) + \Psi_i(\bar y) - 2\Psi_i(\bar z) \\
&= \Psi_i(\bar x) + \Psi_i(\bar y) - 2\Psi_i\Big(\frac{\bar x + \bar y}{2}\Big) + 2\Psi_i\Big(\frac{\bar x + \bar y}{2}\Big) - 2\Psi_i(\bar z) \\
&\le [\Psi_i]_{2,+} |\bar x - \bar y|^2/4 + [\Psi_i]_1 |\bar x + \bar y - 2\bar z|,
\end{aligned}$$
where we have used the Lipschitz continuity and semiconcavity of $\Psi_i$. Thus, the definition of $m_{\delta,\varepsilon,\gamma}$ and the fact that $\sup_{r > 0}(-\delta r^4 + C r^2) = C^2/(4\delta)$ give us that
$$m_{\delta,\varepsilon,\gamma} \le [\Psi_i]_{2,+} |\bar x - \bar y|^2/4 + [\Psi_i]_1 |\bar x + \bar y - 2\bar z| - \delta |\bar x - \bar y|^4 - \varepsilon |\bar x + \bar y - 2\bar z|^2 \le \frac{[\Psi_i]_{2,+}^2}{64\delta} + \frac{[\Psi_i]_1^2}{4\varepsilon}.$$
Thus, by letting $\gamma \to 0$, we have for all $x, y, z \in \mathbb{R}^d$ that
$$u_i(x) + u_i(y) - 2u_i(z) \le \frac{[\Psi_i]_{2,+}^2}{64\delta} + \frac{[\Psi_i]_1^2}{4\varepsilon} + \delta |x - y|^4 + \varepsilon |x + y - 2z|^2, \quad \forall \delta, \varepsilon > 0,$$
from which, by minimizing over $\delta, \varepsilon$ separately, and setting $x = z + h$, $y = z - h$, we obtain that
$$u_i(z + h) + u_i(z - h) - 2u_i(z) \le [\Psi_i]_{2,+} |2h|^2/4 = [\Psi_i]_{2,+} |h|^2. \tag{4.5}$$

On the other hand, suppose the maximum in the third inequality of (4.4) is attained by the first argument. Then for any $\eta > 0$, there exists $\alpha_\eta \in \mathcal{A}_i$ such that the following inequality holds:
$$L^{\alpha_\eta}_i(\bar x, u(\bar x), p_x, X) + L^{\alpha_\eta}_i(\bar y, u(\bar y), p_y, Y) - 2\big( L^{\alpha_\eta}_i(\bar z, u(\bar z), -p_z/2, -Z/2) + \eta \big) \le 0.$$
More precisely, we have
$$\begin{aligned}
&-\mathrm{tr}[a^{\alpha_\eta}_i(\bar x) X + a^{\alpha_\eta}_i(\bar y) Y + a^{\alpha_\eta}_i(\bar z) Z] - [b^{\alpha_\eta}_i(\bar x) p_x + b^{\alpha_\eta}_i(\bar y) p_y + b^{\alpha_\eta}_i(\bar z) p_z] \\
&\quad + c^{\alpha_\eta}_i(\bar x) u_i(\bar x) + c^{\alpha_\eta}_i(\bar y) u_i(\bar y) - 2c^{\alpha_\eta}_i(\bar z) u_i(\bar z) - [\ell^{\alpha_\eta}_i(\bar x) + \ell^{\alpha_\eta}_i(\bar y) - 2\ell^{\alpha_\eta}_i(\bar z)] - 2\eta \\
&\le \sum_{j \in \mathcal{I}_{-i}} \Big( d^{\alpha_\eta}_{ij}(\bar x) u_j(\bar x) + d^{\alpha_\eta}_{ij}(\bar y) u_j(\bar y) - 2 d^{\alpha_\eta}_{ij}(\bar z) u_j(\bar z) \Big).
\end{aligned}$$
Comparing with [19, Theorem 5 (ii)], it remains to estimate the terms in the last line of the above inequality. Note that for any given function $g \in C(\mathbb{R}^d)$ and $x, z \in \mathbb{R}^d$, we have
$$|g(x) - g(z)| \le \Big| g(x) - g\Big(\frac{x + y}{2}\Big) \Big| + \Big| g\Big(\frac{x + y}{2}\Big) - g(z) \Big| \le \tfrac{1}{2}[g]_1 |x - y| + \big( |g|_0 [g]_1 |x + y - 2z| \big)^{1/2} \le |g|_1 \big( |x - y| + |x + y - 2z|^{1/2} \big), \quad \forall y \in \mathbb{R}^d.$$
Therefore, we obtain for each $j \in \mathcal{I}_{-i}$ that
$$\begin{aligned}
&d^{\alpha_\eta}_{ij}(\bar x) u_j(\bar x) + d^{\alpha_\eta}_{ij}(\bar y) u_j(\bar y) - 2 d^{\alpha_\eta}_{ij}(\bar z) u_j(\bar z) \\
&= d^{\alpha_\eta}_{ij}(\bar z)\big( u_j(\bar x) + u_j(\bar y) - 2u_j(\bar z) \big) + \big( d^{\alpha_\eta}_{ij}(\bar x) + d^{\alpha_\eta}_{ij}(\bar y) - 2 d^{\alpha_\eta}_{ij}(\bar z) \big) u_j(\bar z) \\
&\quad + \big( d^{\alpha_\eta}_{ij}(\bar x) - d^{\alpha_\eta}_{ij}(\bar z) \big)\big( u_j(\bar x) - u_j(\bar z) \big) + \big( d^{\alpha_\eta}_{ij}(\bar y) - d^{\alpha_\eta}_{ij}(\bar z) \big)\big( u_j(\bar y) - u_j(\bar z) \big) \\
&\le d^{\alpha_\eta}_{ij}(\bar z)\big( u_i(\bar x) + u_i(\bar y) - 2u_i(\bar z) \big) + \big( [D d^{\alpha_\eta}_{ij}]_1 |\bar x - \bar y|^2/4 + [d^{\alpha_\eta}_{ij}]_1 |\bar x + \bar y - 2\bar z| \big) |u_j|_0 \\
&\quad + 4 |d^{\alpha_\eta}_{ij}|_1 |u_j|_1 \big( |\bar x - \bar y|^2 + |\bar x + \bar y - 2\bar z| \big).
\end{aligned}$$
Then if $\lambda$ is sufficiently large, we can proceed along the lines of the proof of [19, Theorem 5 (ii)], and deduce that there exists a constant $C \ge 0$, independent of $[u_i]_1$ for any $i \in \mathcal{I}$, such that it holds for all $z, h \in \mathbb{R}^d$ and $i \in \mathcal{I}$ that $u_i(z + h) + u_i(z - h) - 2u_i(z) \le C (\sup_{i \in \mathcal{I}} |u_i|_1) |h|^2$ (cf. equation (5.8) in [19]), which together with (4.5) completes our proof.

With Propositions 4.1 and 4.3 in hand, we are ready to present the following upper bounds of the Lipschitz and semiconcavity constants of the iterates $(u^n)_{n \in \mathbb{N}}$ defined as in (4.1) and (4.2).

Theorem 4.4.

Suppose (H.1)–(H.3) and (H.5) hold. Then for any $n \in \mathbb{N}$, the iterate $u^n$ is Lipschitz continuous with a constant satisfying $\sup_{i \in \mathcal{I}} [u^n_i]_1 \le Cn$, where $C$ is a constant independent of $n$. If we further assume (H.4) and (H.6) hold, and the constant $\lambda$ in (H.1) is sufficiently large, then the iterate $u^n$ is semiconcave with a constant satisfying $\sup_{i \in \mathcal{I}} [u^n_i]_{2,+} \le Cn$.

Proof. It is well understood that the solution $u^0$ to a weakly coupled system with convex Hamiltonians is Lipschitz continuous under (H.1)–(H.3) (see [11]), and is semiconcave if the coefficients enjoy higher regularity (see the second case in the proof of Proposition 4.3). We now use an inductive argument to estimate the regularity of the iterates $(u^n)_{n \in \mathbb{N}}$.

It has been shown in Proposition 4.1 that $(u^n)_{n \in \mathbb{N}}$ are bounded uniformly in $n$. Now suppose $u^{n-1}$ is Lipschitz continuous, and (H.5) holds. Then we can deduce from Proposition 4.2 that
$$\sup_{i \in \mathcal{I}} [u^n_i]_1 \le \max\Big( C', \sup_{i \in \mathcal{I}} [\mathcal{M}_i u^{n-1}]_1 \Big) \le \max\Big( C', \sup_{i \in \mathcal{I}} [u^{n-1}_i]_1 + C \Big), \tag{4.6}$$
where $C'$ is a constant independent of $n$, and $C$ is the constant in (H.5). An inductive argument enables us to conclude the desired estimate $[u^n_i]_1 = O(n)$ for all $i$.

Moreover, by further assuming (H.6) and the assumptions of Proposition 4.3, we can obtain the following estimate for all $i \in \mathcal{I}$:
$$[u^n_i]_{2,+} \le \max\Big( C' \sup_{i \in \mathcal{I}} |u^n_i|_1,\ \sup_{i \in \mathcal{I}} [\mathcal{M}_i u^{n-1}]_{2,+} \Big) \le \max\Big( C' \sup_{i \in \mathcal{I}} |u^n_i|_1,\ \sup_{i \in \mathcal{I}} [u^{n-1}_i]_{2,+} + C \Big), \tag{4.7}$$
where $C'$ is a constant independent of $n$, and $C$ is the constant in (H.6). Then, by using the previous Lipschitz estimates of $(u^n)_{n \in \mathbb{N}}$, we conclude from (4.7) that $[u^n_i]_{2,+} = O(n)$ for all $i$.

Remark 4.1.

Suppose (H.5) and (H.6) hold with $C = 0$ (e.g. the intervention operator $\mathcal{M}$ is of the form (2.7)), then the estimates (4.6) and (4.7) hold with $C = 0$. Thus one can show inductively that for any $n \ge 0$, the Lipschitz constant $[u^n]_1$ and the semiconcavity constant $[u^n]_{2,+}$ of the iterate $u^n$ are uniformly bounded in terms of $n$, which along with Proposition 4.1, imply that the solution to HJBQVI (2.1) is Lipschitz continuous and semiconcave. As we shall see in Remark 4.2, this observation enables us to improve the convergence rate of the penalty approximation by a log factor.

In this section, we shall propose a sequence of auxiliary problems to the penalized equation (3.1) with a fixed parameter $\rho > 0$, which is similar to the regularization of the QVI (2.1) discussed in Section 4.1. These auxiliary problems will serve as an important tool for quantifying convergence orders of the penalty approximations.

More precisely, for any given penalty parameter $\rho > 0$, we shall consider the following sequence of auxiliary problems: let $u^{\rho,0} = u^0$ be the solution to (4.1), and for each $n \ge 1$, given $u^{\rho,n-1}$, let $u^{\rho,n}$ be the solution to the following equations:
$$\sup_{\alpha \in \mathcal{A}_i} L^\alpha_i(x, u^{\rho,n}(x), Du^{\rho,n}_i(x), D^2 u^{\rho,n}_i(x)) + \rho (u^{\rho,n}_i - \mathcal{M}_i u^{\rho,n-1})^+(x) = 0, \quad i \in \mathcal{I}. \tag{4.8}$$
The above iterates $(u^{\rho,n})_{n \ge 0}$ can be equivalently expressed as $u^{\rho,n} = Q_\rho u^{\rho,n-1}$ for all $n \in \mathbb{N}$ with an operator $Q_\rho : [C(\mathbb{R}^d)]^M \to [C(\mathbb{R}^d)]^M$ defined as follows: for any given $u$, $Q_\rho u$ is defined as the unique solution to the following equations:
$$\sup_{\alpha \in \mathcal{A}_i} L^\alpha_i(x, Q_\rho u(x), D(Q_\rho u)_i(x), D^2 (Q_\rho u)_i(x)) + \rho ((Q_\rho u)_i - \mathcal{M}_i u)^+(x) = 0, \quad i \in \mathcal{I}. \tag{4.9}$$

We now present some important properties of the operator $Q_\rho$. Suppose (H.1) and (H.5) hold, then for any given $u \in [C(\mathbb{R}^d)]^M$, we see $\mathcal{M} u \in [C(\mathbb{R}^d)]^M$, from which one can establish the comparison principle of (4.9) and the well-posedness of (4.9) in the class of bounded continuous functions.

The following lemma strengthens the comparison principle by indicating that $Q_\rho$ is monotone and Lipschitz continuous with constant 1, which is essential for the error estimates in Section 4.3. The proof is included in Appendix A.

Lemma 4.5.

Suppose (H.1) and (H.5) hold. Then for any $u, v \in [C(\mathbb{R}^d)]^M$, we have that
$$\sup_{i \in \mathcal{I}} \big| \big( (Q_\rho u)_i - (Q_\rho v)_i \big)^+ \big|_0 \le \sup_{i \in \mathcal{I}} \big| (u_i - v_i)^+ \big|_0.$$
Consequently, if $u \le v$, then $Q_\rho u \le Q_\rho v$.

Finally, a straightforward modification of the doubling arguments for Lemma 4.5 enables us to show that under the assumptions (H.1), (H.3) and (H.5), the solution $Q_\rho u$ to (4.9) is in fact Lipschitz continuous provided that $u$ is Lipschitz continuous, which subsequently implies the iterates $(u^{\rho,n})_{n \ge 0}$ are well-defined functions in $[C(\mathbb{R}^d)]^M$. We omit the proof of these Lipschitz estimates, by pointing out that the analysis for the obstacle part is exactly the same as that for Lemma 4.5, and referring the reader to [11] for a discussion on the HJB part.

We then proceed to study the convergence of the iterates $(u^{\rho,n})_{n \ge 0}$. The next lemma shows the sequence $(u^{\rho,n})_{n \in \mathbb{N}}$ is monotone and uniformly bounded.

Lemma 4.6. Suppose (H.1)–(H.3) and (H.5) hold, and $\rho > 0$ is a fixed penalty parameter. Then the iterates $(u^{\rho,n})_{n \ge 0}$ are monotonically decreasing and uniformly bounded in terms of $n$.

Proof. Note that the comparison principle of (4.9) yields $u^{\rho,0} \ge u^{\rho,1}$, which together with the monotonicity of $Q_\rho$ leads to $u^{\rho,n-1} \ge u^{\rho,n}$ for all $n \ge 1$. Now we show by induction that $u^{\rho,n} \ge u^\rho$ for all $n \ge 0$, where $u^\rho$ is the solution to (3.1). The statement holds clearly for $n = 0$. Suppose for some $n \in \mathbb{N}$, we have $u^{\rho,n-1} \ge u^\rho$. Then Lemma 2.2 (2) implies $\mathcal{M}_i u^{\rho,n-1} \ge \mathcal{M}_i u^\rho$ for all $i \in \mathcal{I}$, and hence $\rho (u^\rho_i - \mathcal{M}_i u^\rho)^+ \ge \rho (u^\rho_i - \mathcal{M}_i u^{\rho,n-1})^+$, which implies $u^\rho$ is a subsolution of the equation for $u^{\rho,n}$. Consequently, we obtain from the comparison principle that $u^{\rho,n} \ge u^\rho$.

The next theorem presents the convergence of $(u^{\rho,n})_{n \ge 0}$ to the solution of (3.1).

Theorem 4.7.

Suppose (H.1)–(H.3) and (H.5) hold. Then for any given $\rho \ge 0$, the iterates $(u^{\rho,n})_{n \ge 0}$ converge monotonically from above to the solution $u^\rho$ of (3.1) as $n \to \infty$.

Proof. With the comparison principle of (3.1) (Proposition 3.1) in mind, it remains to show the component-wise relaxed half-limit $u^{*,\rho}$ (resp. $u^\rho_*$) of $(u^{\rho,n})_{n \ge 0}$ is a subsolution (resp. supersolution) to (3.1). For notational simplicity, we shall omit the dependence on $\rho$ in the subsequent analysis if no confusion can occur.

Let $x \in \mathbb{R}^d$, $i \in \mathcal{I}$ and $(p, X) \in J^{2,+} u^*_i(x)$. It follows from [13, Lemma 6.1] that there exist $(x_n, p_n, X_n)_{n \in \mathbb{N}}$ such that $(p_n, X_n) \in J^{2,+} u^{\rho,n}_i(x_n)$ for each $n$ and $(x_n, u^{\rho,n}_i(x_n), p_n, X_n) \to (x, u^*_i(x), p, X)$ as $n \to \infty$. Then we have for all $n \in \mathbb{N}$ that
$$\sup_{\alpha \in \mathcal{A}_i} L^\alpha_i(x_n, u^{\rho,n}(x_n), p_n, X_n) + \rho (u^{\rho,n}_i - \mathcal{M}_i u^{\rho,n-1})^+(x_n) \le 0. \tag{4.10}$$
Note that by Lemma 2.2 (3), we have
$$\liminf_{n \to \infty} (u^{\rho,n}_i - \mathcal{M}_i u^{\rho,n-1})^+(x_n) \ge \liminf_{n \to \infty} (u^{\rho,n}_i - \mathcal{M}_i u^{\rho,n-1})(x_n) \ge (u^*_i - \mathcal{M}_i u^*)(x),$$
which implies $\liminf_{n \to \infty} (u^{\rho,n}_i - \mathcal{M}_i u^{\rho,n-1})^+(x_n) \ge (u^*_i - \mathcal{M}_i u^*)^+(x)$. Thus, for any given $\alpha \in \mathcal{A}_i$, we can take the limit inferior in (4.10) and obtain from the inequality $\liminf_{n \to \infty} -d^\alpha_{ij}(x_n) u^{\rho,n}_j(x_n) \ge -d^\alpha_{ij}(x) u^*_j(x)$ (see (3.3)) that
$$L^\alpha_i(x, u^*(x), p, X) + \rho (u^*_i - \mathcal{M}_i u^*)^+(x) \le \liminf_{n \to \infty} L^\alpha_i(x_n, u^{\rho,n}(x_n), p_n, X_n) + \rho (u^*_i - \mathcal{M}_i u^*)^+(x) \le 0.$$
Then taking the supremum over $\alpha \in \mathcal{A}_i$ gives us the desired result.

We then turn to study $u_*$ by fixing $x \in \mathbb{R}^d$, $i \in \mathcal{I}$ and $(p, X) \in J^{2,-} (u_*)_i(x)$. Let $(x_n, p_n, X_n)_{n \in \mathbb{N}}$ be a sequence such that $(p_n, X_n) \in J^{2,-} u^{\rho,n}_i(x_n)$ for each $n$ and $(x_n, u^{\rho,n}_i(x_n), p_n, X_n) \to (x, (u_*)_i(x), p, X)$ as $n \to \infty$. Then for all $n \in \mathbb{N}$, the supersolution property of $u^{\rho,n}$ implies that
$$L^{\alpha_n}_i(x_n, u^{\rho,n}(x_n), p_n, X_n) = \sup_{\alpha \in \mathcal{A}_i} L^\alpha_i(x_n, u^{\rho,n}(x_n), p_n, X_n) \ge -\rho (u^{\rho,n}_i - \mathcal{M}_i u^{\rho,n-1})^+(x_n),$$
for some $\alpha_n \in \mathcal{A}_i$. Then by taking the limit superior as $n \to \infty$ on both sides of the above inequality and using similar arguments as for (3.5), we obtain that
$$\sup_{\alpha \in \mathcal{A}_i} L^\alpha_i(x, u_*(x), p, X) \ge \limsup_{n \to \infty} -\rho (u^{\rho,n}_i - \mathcal{M}_i u^{\rho,n-1})^+(x_n) \ge -\rho \limsup_{n \to \infty} (u^{\rho,n}_i - \mathcal{M}_i u^{\rho,n-1})^+(x_n).$$
Now it remains to show
$$m := \limsup_{n \to \infty} (u^{\rho,n}_i - \mathcal{M}_i u^{\rho,n-1})^+(x_n) \le ((u_*)_i - \mathcal{M}_i u_*)^+(x).$$
We shall assume without loss of generality that $m > 0$. Then by extracting a subsequence, we can further assume $u^{\rho,n}_i(x_n) > \mathcal{M}_i u^{\rho,n-1}(x_n)$ for all $n$, and $\lim_{n \to \infty} (u^{\rho,n}_i(x_n) - \mathcal{M}_i u^{\rho,n-1}(x_n)) = m$. These properties along with Lemma 2.2 (3) yield
$$m = \limsup_{n \to \infty} (u^{\rho,n}_i(x_n) - \mathcal{M}_i u^{\rho,n-1}(x_n)) \le (u_*)_i(x) - \mathcal{M}_i (u_*)(x) \le ((u_*)_i - \mathcal{M}_i u_*)^+(x),$$
which finishes the proof of the statement that $u_*$ is a supersolution of (3.1).

In this section, we shall exploit the regularization procedures discussed in Sections 4.1 and 4.2 to estimate the convergence rates of the penalty approximation, depending on the regularity of the coefficients.

Let us first recall the penalty errors for the classical obstacle problem, which have been analyzed in [26, 36] and play an important role in our error estimates. To avoid confusion with the solutions to (2.1) and (3.1), we shall denote by $v$ the solution to the obstacle problem (4.3), and by $v^\rho$ the solution to the following penalized equation with a given parameter $\rho \ge 0$:
$$\sup_{\alpha \in \mathcal{A}_i} L^\alpha_i(x, v^\rho(x), Dv^\rho_i(x), D^2 v^\rho_i(x)) + \rho (v^\rho_i - \Psi_i)^+(x) = 0, \quad i \in \mathcal{I}. \tag{4.11}$$

Proposition 4.8.

For any given penalty parameter $\rho > 0$, let $v$ and $v^\rho$ be the solution to the obstacle problem (4.3) and the penalized equation (4.11), respectively. Suppose (H.1) holds and $\Psi_i \in C(\mathbb{R}^d)$ for all $i \in \mathcal{I}$. Then there exists a constant $C$, independent of $(\Psi_i)_{i \in \mathcal{I}}$, such that
$$0 \le v^\rho_i(x) - v_i(x) \le C \Big( \sup_{i \in \mathcal{I}} |\Psi_i|_1 \Big) \rho^{-1/2}, \quad x \in \mathbb{R}^d,\ i \in \mathcal{I}. \tag{4.12}$$
If, in addition, $\Psi_i$ is semiconcave for all $i \in \mathcal{I}$, then we have
$$0 \le v^\rho_i(x) - v_i(x) \le C \sup_{i \in \mathcal{I}} \Big( |\Psi_i|_1 + [\Psi_i]_{2,+} \Big) / \rho, \quad x \in \mathbb{R}^d,\ i \in \mathcal{I}. \tag{4.13}$$

Proof. The statement extends the results for scalar HJB equations studied in [26], and can be established by using similar arguments. The main step is to observe that for any given constant $C_\rho$ satisfying $C_\rho \ge \rho (v^\rho_i - \Psi_i)^+$ for all $i \in \mathcal{I}$, $v^\rho - C_\rho/\rho$ is a subsolution to (4.3), which implies $v^\rho - v \le C_\rho/\rho$. Then if we suppose that $\Psi \in [C^2(\mathbb{R}^d)]^M$, one can deduce that there exists a constant $C > 0$, independent of $\rho$ and $(\Psi_i)_{i \in \mathcal{I}}$, such that the upper bound $C_\rho \le C \sup_i \big( |\Psi_i|_1 + |(D^2 \Psi_i)^+|_0 \big)$ holds for all $\rho \ge 0$, which enables us to conclude (4.13) for smooth obstacles. Finally, we can regularize a general nonsmooth obstacle with mollifiers, and balance the approximation errors to obtain the desired error estimates (4.12) and (4.13).

We then present the following elementary lemma, which extends [10, Lemma 6.1] to polynomials with higher degrees. The proof follows from a straightforward computation, which is included in Appendix A for completeness.

Lemma 4.9. For any given $\alpha > 0$, $\mu \in (0, 1)$ and $\gamma \in \mathbb{N}$, consider the function $\phi_\alpha : (0, \infty) \to \mathbb{R}$, $\phi_\alpha(x) = \alpha x^\gamma + \mu^x$. Then there exists a constant $C > 0$, depending only on $\gamma$ and $\mu$, such that $m_\alpha := \min_{n \in \mathbb{N}} \phi_\alpha(n) \le C \alpha (-\log \alpha)^\gamma$, as $\alpha \to 0$.

We are now ready to estimate the penalization error $u^\rho - u$.

Theorem 4.10.

Let $u$ and $u^\rho$ solve the QVI (2.1) and the penalized problem (3.1), respectively. If (H.1)–(H.3) and (H.5) hold, then for all large enough penalty parameters $\rho$, we have
$$0 \le u^\rho_i(x) - u_i(x) \le C (\log \rho)^2 \rho^{-1/2}, \quad x \in \mathbb{R}^d,\ i \in \mathcal{I}. \tag{4.14}$$
If we further assume (H.4) and (H.6) hold, and the constant $\lambda$ in (H.1) is sufficiently large, then
$$0 \le u^\rho_i(x) - u_i(x) \le C (\log \rho)^2 / \rho, \quad x \in \mathbb{R}^d,\ i \in \mathcal{I}, \tag{4.15}$$
for some constant $C$ independent of the parameter $\rho$.

Proof. For notational simplicity, in the subsequent analysis, we shall denote by $C$ a generic constant, which is independent of the iterate index $n$ and the penalty parameter $\rho$, and may take a different value at each occurrence.

The monotone convergence (see Theorem 3.2) of $(u^\rho)_{\rho \ge 0}$ implies that $u^\rho_i - u_i \ge 0$ for all $\rho \ge 0$, hence it remains to establish an upper bound of $u^\rho - u$. Note that we have
$$u^\rho_i - u_i = (u^\rho_i - u^{\rho,n}_i) + (u^{\rho,n}_i - u^n_i) + (u^n_i - u_i), \quad i \in \mathcal{I},\ n \ge 0,$$
where $u^{\rho,n}_i$ and $u^n_i$ solve (4.8) and (4.2), respectively. Since $u^{\rho,n} \ge u^\rho$ for any $n \in \mathbb{N}$ (see Theorem 4.7), we obtain from Proposition 4.1 that
$$u^\rho_i - u_i \le (u^{\rho,n}_i - u^n_i) + (u^n_i - u_i) \le |u^{\rho,n} - u^n|_0 + C \mu^n, \quad n \in \mathbb{N}, \tag{4.16}$$
for some constants $\mu \in (0, 1)$ and $C > 0$.

It remains to estimate $|u^{\rho,n} - u^n|_0$. Since the operator $Q_\rho$ is Lipschitz continuous with constant 1 (see Lemma 4.5), it holds for all $n \in \mathbb{N}$ that
$$|u^{\rho,n} - u^n|_0 \le |Q_\rho u^{\rho,n-1} - Q_\rho u^{n-1}|_0 + |Q_\rho u^{n-1} - u^n|_0 \le |u^{\rho,n-1} - u^{n-1}|_0 + |Q_\rho u^{n-1} - u^n|_0. \tag{4.17}$$
Then by letting the obstacle $\Psi_i = \mathcal{M}_i u^{n-1}$ for all $i \in \mathcal{I}$ in (4.3) and using Proposition 4.8, we can obtain an upper bound of the last term in (4.17) depending on the regularity of the iterates $(u^n)_{n \ge 0}$.

In particular, under the assumptions (H.1)–(H.3) and (H.5), we know from Theorem 4.4 that $u^n$ is Lipschitz continuous with constant $|u^n_i|_1 \le Cn$ for all $i \in \mathcal{I}$ and $n \in \mathbb{N}$. Then by using the estimates (4.12), (4.16) and (4.17), we get $u^\rho_i - u_i \le C(n^2 \rho^{-1/2} + \mu^n)$ for all $n \in \mathbb{N}$, from which we can conclude (4.14) by applying Lemma 4.9 with $\alpha = \rho^{-1/2}$ and $\gamma = 2$.

Similarly, by further assuming (H.4), (H.6), and that the constant $\lambda$ in (H.1) is sufficiently large, we obtain from Theorem 4.4 that $|u^n_i|_1 + [u^n_i]_{2,+} \le Cn$ for all $n$, which implies $u^\rho_i - u_i \le C(n^2/\rho + \mu^n)$ for all $n \in \mathbb{N}$, and subsequently leads to the desired estimate (4.15).

Remark 4.2. As pointed out in Remark 4.1, in the case where the intervention operator satisfies (H.5) and (H.6) with $C = 0$ (e.g. $\mathcal{M}_i$ is of the form (2.7)), we know the iterates $(u^n)_{n \in \mathbb{N}}$ are uniformly Lipschitz continuous and uniformly semiconcave with respect to $n$. Therefore, by following the above arguments, we can improve the estimates (4.14) and (4.15) to $O((\log \rho) \rho^{-1/2})$ and $O((\log \rho) \rho^{-1})$, respectively.

In this section, we propose convergent approximations to the action regions and optimal control strategies of the HJBQVI (2.1) based on the penalized equations. Since in general an optimal continuous control strategy may not exist due to the nonsmoothness of value functions, we shall focus on the approximation of optimal impulse controls.

Throughout this section, instead of specifying the precise convergence rates of the penalty schemes, which depend on the regularity of coefficients (see Theorem 4.10 and Remark 4.2), we shall assume there exists a function $\omega : (0, \infty) \to (0, \infty)$ such that $\omega(\rho) \to 0$ as $\rho \to \infty$ and
$$0 \le u^\rho_i - u_i \le \omega(\rho), \quad i \in \mathcal{I},\ \rho > 0. \tag{4.18}$$
For each $i \in \mathcal{I}$, we shall approximate the action region of the $i$-th component $\mathcal{S}_i = \{x \in \mathbb{R}^d \mid u_i(x) - \mathcal{M}_i u(x) = 0\}$ of (2.1) by the following sets:
$$\mathcal{S}^\rho_i = \{x \in \mathbb{R}^d \mid |u^\rho_i(x) - \mathcal{M}_i u^\rho(x)| \le \omega(\rho)\}, \quad \rho > 0. \tag{4.19}$$
The next result shows that $\mathcal{S}^\rho_i$ converges to $\mathcal{S}_i$ in the Hausdorff metric.

Proposition 4.11.

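Suppose (H.1), (H.2) and the error estimate (4.18) hold, and let $(\mathcal{S}^\rho_i)_{i \in \mathcal{I}}$ be the sets defined in (4.19) for each $\rho > 0$. Then $\mathcal{S}_i \subset \mathcal{S}^\rho_i$ for all $i \in \mathcal{I}$ and $\rho > 0$. Moreover, it holds for any given compact set $\mathcal{K} \subset \mathbb{R}^d$ that $\mathcal{S}^\rho_i \cap \mathcal{K}$ converges to $\mathcal{S}_i \cap \mathcal{K}$ in the Hausdorff metric as $\rho \to \infty$.

Proof. The fact that $\mathcal{S}_i \subset \mathcal{S}^\rho_i$ follows directly from the estimate (4.18) and the monotonicity of $\mathcal{M}_i$ (see Lemma 2.2 (2)). Hence it remains to show that for any given compact set $\mathcal{K} \subset \mathbb{R}^d$, we have $\lim_{\rho \to \infty} \sup_{y \in \mathcal{S}^\rho_i \cap \mathcal{K}} \inf_{x \in \mathcal{S}_i \cap \mathcal{K}} |y - x| = 0$. Suppose it does not hold. Then by passing to a subsequence, we know there exist $\varepsilon > 0$ and $y_n \in \mathcal{S}^{\rho_n}_i \cap \mathcal{K}$, $\rho_n \to \infty$, such that $y_n \to y^* \in \mathcal{K}$ and $\inf_{x \in \mathcal{S}_i \cap \mathcal{K}} |y^* - x| \ge \varepsilon$, i.e., $y^* \notin \mathcal{S}_i$. However, by using the continuity of the functions $u_i$ and $\mathcal{M}_i u$ (see Lemma 2.2 (3)), the definition of $\mathcal{S}^\rho_i$, and the estimate (4.18), we can obtain:
$$\begin{aligned}
(u_i - \mathcal{M}_i u)(y^*) &= (u_i - \mathcal{M}_i u)(y^*) - (u_i - \mathcal{M}_i u)(y_n) \\
&\quad + (u_i - \mathcal{M}_i u)(y_n) - (u^{\rho_n}_i - \mathcal{M}_i u^{\rho_n})(y_n) + (u^{\rho_n}_i - \mathcal{M}_i u^{\rho_n})(y_n) \\
&\ge (u_i - \mathcal{M}_i u)(y^*) - (u_i - \mathcal{M}_i u)(y_n) - \omega(\rho_n) - \omega(\rho_n) \to 0
\end{aligned}$$
as $n \to \infty$, which along with the fact $u_i \le \mathcal{M}_i u$ on $\mathbb{R}^d$ implies $y^* \in \mathcal{S}_i$, and hence leads to a contradiction.

Remark 4.3.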

It is essential to include the modulus of convergence $\omega$ in the definition of $\mathcal{S}^\rho_i$, since in general the naive approximation $\tilde{\mathcal{S}}^\rho_i = \{x \in \mathbb{R}^d \mid u^\rho_i(x) - \mathcal{M}_i u^\rho(x) = 0\}$ does not give a convergent approximation to the action region $\mathcal{S}_i$. For example, let the vector $v = (v_l)_{l \in \{1, 2\}}$ solve the following discrete QVI:
$$\max(v_1 - b,\ v_1 - (v_2 + c)) = 0, \quad \text{and} \quad \max(v_2 - 2b,\ v_2 - (v_1 + c)) = 0,$$
where $b > c > 0$. It is clear that the solution is given by $v_1 = b$ and $v_2 = b + c$, and the action region is the second index, i.e., $\mathcal{S} = \{2\}$. However, for each $\rho > 0$, one can directly verify that $v^\rho_1 = b$ and $v^\rho_2 = b + c + \frac{b - c}{1 + \rho}$ solve the penalized equation:
$$v^\rho_1 - b + \rho (v^\rho_1 - v^\rho_2 - c)^+ = 0, \quad \text{and} \quad v^\rho_2 - 2b + \rho (v^\rho_2 - v^\rho_1 - c)^+ = 0,$$
which implies that $\tilde{\mathcal{S}}^\rho = \emptyset$ for all $\rho$.

Now we proceed to study optimal impulse control strategies. For any given $u \in [C(\mathbb{R}^d)]^M$, we denote by $Z^u_i(x) := \arg\min_{z \in Z(x)} [u_i(\Gamma_i(x, z)) + K_i(x, z)]$ the set of optimal impulse control strategies for all $i \in \mathcal{I}$ and $x \in \mathcal{S}_i$. The following result constructs a convergent approximation of $Z^u_i$, based on the set of impulse controls $Z^{u^\rho}_i(x)$, $x \in \mathcal{S}^\rho_i$, obtained from the penalized solution $u^\rho$.

Theorem 4.12.

Suppose the assumptions of Proposition 4.11 hold. Then for any $i \in \mathcal{I}$, $x \in \mathcal{S}_i$, and sequence of impulse controls $(z_\rho)_{\rho > 0}$ satisfying $z_\rho \in Z^{u^\rho}_i(x)$ for all $\rho$, we have
$$\lim_{\rho \to \infty} \bar d_Z(z_\rho, Z^u_i(x)) := \lim_{\rho \to \infty} \inf\{ d_Z(z_\rho, z) \mid z \in Z^u_i(x) \} = 0,$$
where $(Z, d_Z)$ is the metric space in (H.2). Consequently, if $Z^u_i(x)$ is a singleton, then $Z^{u^\rho}_i(x)$ converges to $Z^u_i(x)$ in the Hausdorff metric as $\rho \to \infty$.

Proof. Suppose there exist $i \in \mathcal{I}$, $x \in \mathcal{S}_i$, and a sequence $(z_\rho)_{\rho > 0}$ satisfying $z_\rho \in Z^{u^\rho}_i(x)$ and $\bar d_Z(z_\rho, Z^u_i(x)) \ge \varepsilon > 0$. Now let us consider the compact set $Z_\varepsilon(x) = \{z \in Z(x) \mid \bar d_Z(z, Z^u_i(x)) \ge \varepsilon\}$, and pick $z_\varepsilon \in Z_\varepsilon(x)$ such that
$$u_i(\Gamma_i(x, z_\varepsilon)) + K_i(x, z_\varepsilon) = \min_{z \in Z_\varepsilon(x)} [u_i(\Gamma_i(x, z)) + K_i(x, z)].$$
Since $Z_\varepsilon(x) \cap Z^u_i(x) = \emptyset$, we can derive from the facts $z_\varepsilon \notin Z^u_i(x)$ and $z_\rho \in Z_\varepsilon(x)$ the following inequality:
$$\begin{aligned}
&[u_i(\Gamma_i(x, z_\rho)) + K_i(x, z_\rho)] - [u_i(\Gamma_i(x, \hat z)) + K_i(x, \hat z)] \\
&\ge [u_i(\Gamma_i(x, z_\varepsilon)) + K_i(x, z_\varepsilon)] - [u_i(\Gamma_i(x, \hat z)) + K_i(x, \hat z)] =: c > 0,
\end{aligned} \tag{4.20}$$
for some $\hat z \in Z^u_i(x)$. On the other hand, we obtain from the estimate (4.18) that
$$\begin{aligned}
&[u_i(\Gamma_i(x, z_\rho)) + K_i(x, z_\rho)] - [u_i(\Gamma_i(x, \hat z)) + K_i(x, \hat z)] \\
&= [u_i(\Gamma_i(x, z_\rho)) - u^\rho_i(\Gamma_i(x, z_\rho))] + [u^\rho_i(\Gamma_i(x, z_\rho)) + K_i(x, z_\rho)] - [u_i(\Gamma_i(x, \hat z)) + K_i(x, \hat z)] \\
&\le \mathcal{M}_i u^\rho(x) - \mathcal{M}_i u(x) \le |u^\rho_i - u_i|_0 \le \omega(\rho),
\end{aligned}$$
which contradicts (4.20) by passing $\rho \to \infty$, and finishes the proof.

Remark 4.4.

We refer the reader to [17, 34] and references therein, where the uniqueness of a pointwise optimal impulse strategy has been established for various practical impulse control problems by exploiting the regularity of the value functions and the structure of the intervention operator. In those settings, the convergence of the approximate controls follows from Theorem 4.12.
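The two-state example of Remark 4.3 can be checked with exact rational arithmetic. The sketch below rests on two assumptions: the constant in the second equation is taken to be $2b$ (the digits are not legible in this copy, and this is the choice consistent with the stated solution $v = (b, b + c)$), and the modulus is taken as $\omega(\rho) = (b - c)/(1 + \rho)$, which is the exact penalization error in this example. The helper names are hypothetical.

```python
from fractions import Fraction


def penalized_solution(b, c, rho):
    """Closed form of the penalized solution (under the assumed constants)."""
    return [b, b + c + (b - c) / (1 + rho)]


def penalized_residual(v, b, c, rho):
    """Left-hand sides of the two penalized equations; zero at a solution."""
    def pos(t):
        return max(t, 0)
    return [v[0] - b + rho * pos(v[0] - v[1] - c),
            v[1] - 2 * b + rho * pos(v[1] - v[0] - c)]


def action_set(v, c, threshold):
    """1-based indices i with |v_i - (v_j + c)| <= threshold, j != i."""
    gaps = [v[0] - (v[1] + c), v[1] - (v[0] + c)]
    return {i + 1 for i, g in enumerate(gaps) if abs(g) <= threshold}


if __name__ == "__main__":
    b, c = Fraction(2), Fraction(1)      # any b > c > 0 would do
    for rho in (Fraction(1), Fraction(10), Fraction(1000)):
        v_rho = penalized_solution(b, c, rho)
        omega = (b - c) / (1 + rho)      # exact error v_rho - v in this example
        print(rho,
              penalized_residual(v_rho, b, c, rho),
              action_set(v_rho, c, 0),        # naive set, threshold 0
              action_set(v_rho, c, omega))    # thresholded set as in (4.19)
```

Using `Fraction` avoids floating-point round-off in the comparison `abs(g) <= threshold`, which is an equality in the second component for this choice of $\omega$.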

5. Extension to some HJBQVIs with signed costs.

In this section, we extend the penalty schemes to a class of QVIs with possibly negative impulse costs which arise from optimal switching problems. In this setting, the controller has two mechanisms of affecting the regime switching process $I$ in (1.3), namely through their continuous control process $\alpha$ acting on its Markov transition matrix, as well as directly and immediately by exercising an impulse control to change the regime, the latter at the expense of a positive impulse cost or benefitting from a negative impulse cost. We shall propose an efficient, alternative penalty scheme by taking advantage of the finiteness of the set of switching controls, and extend the convergence analysis in Section 4 to estimate the penalization error.

More precisely, we consider the following system of HJBQVIs: for each $i \in \mathcal{I} = \{1, \ldots, M\}$ and $x \in \mathbb{R}^d$,
$$F_i(x, u, Du_i, D^2 u_i) := \max\Big\{ \sup_{\alpha \in \mathcal{A}_i} L^\alpha_i(x, u(x), Du_i(x), D^2 u_i(x)),\ (u_i - \mathcal{M}_i u)(x) \Big\} = 0, \tag{5.1}$$
where the linear operator $L^\alpha_i$ is defined as in (2.2), and the intervention operator $\mathcal{M}_i$ is given by
$$(\mathcal{M}_i u)(x) = \min_{j \in \mathcal{I}_{-i}} \{u_j(x) + k_{ij}(x)\}, \quad x \in \mathbb{R}^d. \tag{5.2}$$
By enlarging the state space $\mathbb{R}^d$ into the product space $\mathbb{R}^d \times \mathcal{I}$, we can treat (5.2) as a special case of (2.3) with $Z_i(x) = \mathcal{I}_{-i}$, $\Gamma_i(x, z) = (x, z)$ and $K_i(x, z) = k_{iz}(x)$ for all $(x, i) \in \mathbb{R}^d \times \mathcal{I}$. Consequently, if the switching costs $k_{ij}$ are strictly positive, i.e., $k_{ij} \ge \kappa > 0$, we can directly apply the penalty scheme (3.1) to solve (5.1), and deduce from Theorem 4.10 the rate of convergence in terms of the parameter $\rho$.

As we shall see shortly, the structure of the operator $\mathcal{M}_i$ and the finiteness of the set of impulse controls allow us to consider signed switching costs $(k_{ij})_j$ taking both positive and negative values.

Now we introduce an alternative penalty scheme for solving (5.1). For any given penalty parameter $\rho \ge 0$, we consider the following system of penalized equations: for all $i \in \mathcal{I}$, $x \in \mathbb{R}^d$,
$$F^\rho_i(x, u, Du_i, D^2 u_i) := \sup_{\alpha \in \mathcal{A}_i} L^\alpha_i(x, u^\rho(x), Du^\rho_i(x), D^2 u^\rho_i(x)) + \rho \sum_{j \in \mathcal{I}_{-i}} (u^\rho_i - u^\rho_j - k_{ij})^+(x) = 0. \tag{5.3}$$
Unlike (3.1), the above penalty scheme makes use of the finiteness of the set $\mathcal{I}_{-i}$, and performs penalization on each component of the system, which leads to easily implementable and efficient iterative schemes for the penalized equations without taking the pointwise maximum over all switching components (see [37]).

Remark 5.1.

The penalty scheme (5.3) can be extended to the general intervention operator (2.3), for which we introduce the following penalty term:
$$\rho \int_{Z_i(x)} \Big( u_i(x) - u_i(\Gamma_i(x, z)) - K_i(x, z) \Big)^+ \nu(dz), \quad \rho \ge 0,$$
where $\nu$ is a given finite measure supported on the set $\cup_{i,x} Z_i(x)$ (see [27]).

In the remaining part of this section, we shall discuss how to extend the convergence analysis in the previous sections to study penalty schemes for (5.1) with possibly negative switching costs. We shall focus on the scheme (5.3), but the same analysis extends naturally to the scheme (3.1). More precisely, we shall replace (H.2) by the following condition on the switching costs:

H.7. There exist constants $C \ge 0$ and $\kappa > 0$ such that for all $j \ne i, l \in \mathcal{I}$, we have
$$k_{ii} \equiv 0, \quad k_{ij}(x) + k_{jl}(x) - k_{il}(x) \ge \kappa > 0, \quad x \in \mathbb{R}^d, \tag{5.4}$$
and the following regularity estimates: $|k_{ij}|_1 \le C$, and $k_{ij}$ is semiconcave with constant $C$ around any point $x \in \mathbb{R}^d$ with $k_{ij}(x) < \kappa$.

The allowance of negative switching costs clearly complicates the assumptions on the switching costs, which is worth a detailed discussion. The triangular condition (5.4) is similar to the assumption used in [34, 32], which means that it is less expensive to switch directly from regime $i$ to $l$ than in two steps via an intermediate regime $j$. It also implies $k_{ij} + k_{ji} \ge \kappa > 0$ for all $j \ne i$, which prevents arbitrage opportunities in which one can gain a positive profit by instantaneously switching back and forth. This further leads to the "no loop condition" introduced by [21], which together with (H.1) enables us to conclude a comparison principle of (5.1) by using similar arguments as those for [32, Theorem 2.1], and consequently the uniqueness of viscosity solutions to (5.1) in the class of bounded continuous functions.

The Lipschitz continuity and semiconcavity assumptions in (H.7) are similar to those in [32], which ensure the existence of a strict subsolution to (5.1) (see Proposition 5.1). However, we remark that, instead of requiring the switching costs to be semiconcave on $\mathbb{R}^d$ as in [32], we only impose the semiconcavity condition around the points at which the costs are close to or less than zero, hence no additional regularity is required if we are in the classical context of strictly positive switching costs.

The following proposition explicitly constructs a strict subsolution to (5.1), which is crucial not only to the well-posedness of (5.1) and (5.3), but also to the error estimates of the penalty approximations (cf. Propositions 3.1 and 4.1).

Proposition 5.1.

Suppose (H.1) and (H.7) hold. Then there exists a constant

C > , such that for any ε ∈ (0 , κ ) , the function w ∈ [ C ( R d )] M deﬁned as (5.5) w i = − ˜ k i − C, ˜ k i = min (cid:26) min j ∈I − i ( k ji − ε ) , (cid:27) , i ∈ I , is a strict subsolution to (5.1) and (5.3) for any ρ ≥ , i.e., F i ( x, u, Du i , D u i ) ≤− min( ε, κ − ε ) and F ρi ( x, u, Du i , D u i ) ≤ − min( ε, κ − ε ) in the viscosity sense.Proof. For any given ε ∈ (0 , κ ), we ﬁrst verify w i − M i w ≤ − min( ε, κ − ε ).Note that w i − w j − k ij = − ˜ k i + ˜ k j − k ij = − ˜ k i + min (cid:26) min l ∈I − j ( k lj − ε ) , (cid:27) − k ij , ∀ j ∈ I − i . Now if ˜ k i = 0, we can pick l = i and deduce that w i − w j − k ij ≤ k ij − ε − k ij = − ε .Otherwise, if ˜ k i = k mi − ε for some m ∈ I − i , then we shall separate the discussionsinto two cases. If m = j , then (5.4) and the facts that k ii = 0, ˜ k j ≤ w i − w j − k ij ≤ − ( k ji − ε ) − k ij ≤ − ( κ − ε ). On the other hand, if m = j , by setting l = m , we obtain that w i − w j − k ij ≤ − ( k mi − ε ) + ( k mj − ε ) − k ij ≤ − κ , which completes the proof of the desired statement by taking the maximum over all j ∈ I − i . ote that for any given i ∈ I and x ∈ R d , ˜ k i ( x ) is deﬁned by taking the minimumover the indices j ∈ I − i such that k ji ( x ) ≤ ε < κ , hence by using (H.7) one canshow ˜ k i ( x ) is semiconcave with some constant C ≥ x . Therefore, we caninfer for each i ∈ I that ˜ k i is Lipschitz continuous and semiconcave in R d . Hencethere exists a sequence of smooth functions (˜ k ε ) ε> such that D ( − ˜ k ε ) is boundedand D ( − ˜ k ε ) is bounded below uniformly in terms of ε , and ˜ k ε uniformly convergesto ˜ k as ε →

0. Then by using the boundedness of coeﬃcients and the stability ofsubsolutions, we deduce that there exists a constant C ′ such that for all i ∈ I and x ∈ R d , we have sup α ∈A i L αi ( x, − ˜ k, D ( − ˜ k i ) , D ( − ˜ k i )) ≤ C ′ in the viscosity sense.Hence, for any constant C such that C ≥ ( C ′ + min( ε, κ − ε )) /λ , we can concludethat w ∈ [ C ( R d )] M is a strict subsolution to (5.1) and (5.3) for any ρ ≥ u ∈ [ C ( R d )] M and [ k ij ] ≤ C for all j ∈ I − i , then M i u ∈ C ( R d ) satisﬁes [ M i u ] ≤ sup j ∈I − i ([ u j ] + [ k ij ] ). If u i and k ij are semiconcave in R d for all i, j ∈ I − i , then M i u is semiconcave in R d with constant [ M i u ] , + ≤ sup j ∈I − i ([ u j ] , + + [ k ij ] , + ). Therefore, we can obtain as adirect consequence of Theorem 4.4 that the iterates ( u n ) n ∈ N are Lipschitz continuouswith constant O ( n ) if the switching costs are Lipschitz continuous, and they aresemiconcave with constant O ( n ) if the switching costs are semiconcave.Finally, by assuming the obstacles (Ψ i ) i ∈I in (4.3) are of the form Ψ i = min j ∈I − i Ψ ij for all i ∈ I , we can generalize Proposition 4.8 to study the following penalty approx-imation to the classical obstacle problem (4.3):sup α ∈A i L αi ( x, v ρ ( x ) , Dv ρi ( x ) , D v ρi ( x )) + ρ X j ∈I − i ( v ρi − Ψ ij ) + ( x ) = 0 , i ∈ I . and obtain exactly the same error estimates (4.12) and (4.13).Now we are ready to conclude the following analogue of Theorem 4.10, whichgives the convergence rate of (5.3) to (5.1) with respect to the penalty parameter. Theorem

Let $u$ and $u^\rho$ solve the QVI (5.1) and the penalized problem (5.3), respectively. If (H.1), (H.3) and (H.7) hold, then for all large enough penalty parameter $\rho$, we have

$$0 \le u^\rho_i(x) - u_i(x) \le C (\log \rho)\, \rho^{-1/2}, \quad x \in \mathbb{R}^d, \ i \in \mathcal{I}.$$

If we further assume (H.4) holds, the constant $\lambda$ in (H.1) is sufficiently large, and $(k_{ij})_{i,j \in \mathcal{I}}$ are semiconcave in $\mathbb{R}^d$, then we have

$$0 \le u^\rho_i(x) - u_i(x) \le C (\log \rho)/\rho, \quad x \in \mathbb{R}^d, \ i \in \mathcal{I},$$

for some constant $C$, independent of the parameter $\rho$ and the number of switching components $M$.

6. Discretization and policy iteration for penalized equations.

In this section, we briefly discuss how to construct convergent discretizations for the penalized equations, and propose a globally convergent iterative method to solve the discretized equation based on policy iteration.

Let us start with the discretization of the penalized equation (3.1) with a fixed penalty parameter $\rho > 0$. We denote by $\{x_l\}_l = h\mathbb{Z}^d$ a uniform spatial grid on $\mathbb{R}^d$ with mesh size $h$, by $u^\rho_{i,l}$ the discrete approximation to $u^\rho_i$ at the point $x_l$, and by $Z_{i,l}$ the set of impulse controls at the point $x_l$.

It is standard to show that, by using monotone discretizations (e.g. the semi-Lagrangian scheme in [15]) for the differential operators and multilinear interpolations for the intervention operator (see [1, 36]), one can derive the following approximation to (3.1): for all $i \in \mathcal{I}$,

$$\sup_{\alpha \in \mathcal{A}_i} \bigg[ \sum_{m \in \mathbb{Z}^d} \theta^\alpha_{i,l,m} (u^\rho_{i,l} - u^\rho_{i,m}) + c^\alpha_{i,l} u^\rho_{i,l} - \sum_{j \in \mathcal{I}_{-i}} d^\alpha_{ij,l} u^\rho_{j,l} - \ell^\alpha_{i,l} + \rho \Big( u^\rho_{i,l} - \inf_{z \in Z_{i,l}} \Big[ \sum_{m \in \mathbb{Z}^d} \gamma^z_{i,l,m} u^\rho_{i,m} + K^z_{i,l} \Big] \Big)^+ \bigg] = 0, \quad l \in \mathbb{Z}^d, \tag{6.1}$$

with some coefficients $\theta^\alpha_{i,l,m} \ge 0$, $0 \le \gamma^z_{i,l,m} \le 1$ and $\sum_{m \in \mathbb{Z}^d} \gamma^z_{i,l,m} = 1$ for all $l, m \in \mathbb{Z}^d$, $\alpha \in \mathcal{A}_i$ and $z \in Z_{i,l}$. Under (H.1) and (H.2), it is straightforward to show that the above scheme is monotone and consistent with the penalized equation (3.1) as $h$ tends to zero, which enables us to conclude from [11, Proposition 3.3] that the numerical solution of (6.1) converges to the solution of (3.1) as $h \to 0$. Moreover, one can deduce by similar arguments as those in [1] that the numerical solution converges to the solution of the QVI (2.1) when $1/\rho$ and $h$ tend to zero simultaneously.

Now we proceed to demonstrate the global convergence of policy iteration for solving (6.1). We first enlarge the control space and reformulate (6.1) into an HJB equation in a countably infinite space. Note that by introducing the set $\mathcal{B} = \{0, 1\}$ and using the fact $\sum_{m \in \mathbb{Z}^d} \gamma^z_{i,l,m} = 1$, one can rearrange the terms of (6.1) and obtain that: for all $(i, l) \in \mathcal{I} \times \mathbb{Z}^d$,

$$\sup_{(\alpha, \beta, z) \in \mathcal{A}_i \times \mathcal{B} \times Z_{i,l}} \bigg[ \Big( \sum_{m \ne l} (\theta^\alpha_{i,l,m} + \beta\rho\gamma^z_{i,l,m}) + c^\alpha_{i,l} \Big) u^\rho_{i,l} - \sum_{m \ne l} (\theta^\alpha_{i,l,m} + \beta\rho\gamma^z_{i,l,m}) u^\rho_{i,m} - \sum_{j \in \mathcal{I}_{-i}} d^\alpha_{ij,l} u^\rho_{j,l} - \ell^\alpha_{i,l} - \beta\rho K^z_{i,l} \bigg] = 0,$$

which can be equivalently expressed in the following compact form:

$$\sup_{\omega \in \mathcal{A}} \big( \tilde{A}(\omega) u^\rho - \tilde{b}(\omega) \big) = 0, \tag{6.2}$$

where $u^\rho = (u^\rho_{i,l})_{(i,l) \in \mathcal{I} \times \mathbb{Z}^d}$, $\mathcal{A} = (\mathcal{A}_i \times \mathcal{B} \times Z_{i,l})^{\mathcal{I} \times \mathbb{Z}^d}$, and for any given $\omega = (\alpha_{i,l}, \beta_{i,l}, z_{i,l})_{(i,l) \in \mathcal{I} \times \mathbb{Z}^d} \in \mathcal{A}$, $\tilde{A}(\omega) = (\tilde{a}_{(i,l),(i',l')}(\omega))_{(i,l),(i',l') \in \mathcal{I} \times \mathbb{Z}^d}$ is the following "infinite" matrix (see [9]):

$$\tilde{a}_{(i,l),(i',l')}(\omega) = \begin{cases} \sum_{m \ne l} \big( \theta^{\alpha_{i,l}}_{i,l,m} + \beta_{i,l}\, \rho\, \gamma^{z_{i,l}}_{i,l,m} \big) + c^{\alpha_{i,l}}_{i,l}, & i' = i,\ l' = l, \\ -\big( \theta^{\alpha_{i,l}}_{i,l,l'} + \beta_{i,l}\, \rho\, \gamma^{z_{i,l}}_{i,l,l'} \big), & i' = i,\ l' \ne l, \\ -d^{\alpha_{i,l}}_{ii',l}, & i' \ne i,\ l' = l. \end{cases} \tag{6.3}$$

Now we can apply the classical policy iteration to solve (6.2), or equivalently (6.1): let $\omega^{(0)}$ be a given initial control value; for all $k \ge 0$, define $(u^{\rho,(k)}, \omega^{(k+1)})$ as follows:

$$\tilde{A}(\omega^{(k)}) u^{\rho,(k)} - \tilde{b}(\omega^{(k)}) = 0, \quad \omega^{(k+1)} \in \arg\max_{\omega \in \mathcal{A}} \big( \tilde{A}(\omega) u^{\rho,(k)} - \tilde{b}(\omega) \big), \tag{6.4}$$

where the maximization is performed component-wise. Such a maximization is well-defined under (H.1) and (H.2), due to the fact that the control set $\mathcal{A}_i \times \mathcal{B} \times Z_{i,l}$ is compact and the coefficients $\tilde{A}$ and $\tilde{b}$ are continuous in $\omega$.

The next theorem establishes the monotone convergence of $(u^{\rho,(k)})_{k \ge 0}$ for any initial guess $\omega^{(0)}$, which extends the result in [3] to weakly coupled systems in an infinite dimensional setting.

Theorem 6.1.

Suppose (H.1) and (H.2) hold. Then for any initial control value $\omega^{(0)}$, the iterates $(u^{\rho,(k)})_{k \ge 0}$ are well-defined, and converge pointwise to the unique solution of (6.2), or equivalently (6.1), as $k \to \infty$. Moreover, we have $u^{\rho,(k)} \ge u^{\rho,(k+1)}$ for all $k \ge 0$.

Proof. The statement is an analogue of Proposition B.1 in [9], where the monotone convergence of policy iteration has been proved for concave HJB equations. Note that (2.6) implies that for each $\omega \in \mathcal{A}$ and $(i, l) \in \mathcal{I} \times \mathbb{Z}^d$,

$$\tilde{a}_{(i,l),(i,l)}(\omega) \ge \sum_{(i',l') \ne (i,l)} |\tilde{a}_{(i,l),(i',l')}(\omega)| + \lambda,$$

which gives the monotonicity of $\tilde{A}$, i.e., for any given $\omega \in \mathcal{A}$, if $\tilde{A}(\omega) u \ge 0$ and $u$ is bounded, then $u \ge 0$. Moreover, the boundedness of coefficients leads to the uniform boundedness of the iterates $(u^{(k)})_{k \ge 0}$ and the fact that $\sup_{\omega \in \mathcal{A}} \big( \mathrm{Card}\{ (i', l') \mid \tilde{a}_{(i,l),(i',l')}(\omega) \ne 0 \} \big) < \infty$ for each $(i, l) \in \mathcal{I} \times \mathbb{Z}^d$. Therefore, even though the control set in (6.2) varies for each component $(i, l)$, it is straightforward to adapt the arguments for [9, Proposition B.1] and establish the desired convergence result.

Remark 6.1.

Theorem 6.1 establishes one of the major advantages of penalty schemes over the direct control scheme studied in [3, 12], which applies policy iteration to solve a direct discretization of the QVI (2.1). Such a scheme is in general not well-defined, due to the possible singularity of the matrix iterates caused by the non-strict monotonicity of $u_i - \mathcal{M}_i u$ in $u$. In fact, consider the simple QVI $\max(u - g, u - \mathcal{M}u) = 0$ with $\mathcal{M}u := u + c$ and $c > 0$, whose solution is given by $u = g$ due to the fact that $u - \mathcal{M}u = -c < 0$. If we initialize policy iteration with the impulse control, then we need to solve $u - (u + c) = 0$, which clearly admits no solution. More complicated examples can be constructed to show that the direct control scheme can fail at any intermediate iterate (see [3]).
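As a finite-dimensional illustration of the policy iteration (6.4) and of the monotonicity $u^{\rho,(k)} \ge u^{\rho,(k+1)}$ stated in Theorem 6.1, the following Python sketch may be helpful. It is not the authors' implementation: the helper name `policy_iteration`, the system size, the number of controls per row and the random coefficients are all assumptions made for the example. Each matrix row is strictly diagonally dominant with nonpositive off-diagonal entries, which mimics the inequality used in the proof above.

```python
import numpy as np

def policy_iteration(A, b, omega0, max_iter=50):
    """Policy iteration for max over row-wise controls of (A(w) u - b(w)) = 0.

    A has shape (K, n, n) and b has shape (K, n): control k supplies row i
    of the matrix as A[k, i, :] and entry i of the right-hand side as b[k, i].
    """
    n = A.shape[1]
    rows = np.arange(n)
    omega = np.asarray(omega0).copy()
    iterates = []
    for _ in range(max_iter):
        # Policy evaluation: solve A(omega) u = b(omega).
        u = np.linalg.solve(A[omega, rows, :], b[omega, rows])
        iterates.append(u)
        # Policy improvement: maximise each component of A(w) u - b(w).
        residuals = A @ u - b                      # shape (K, n)
        new_omega = residuals.argmax(axis=0)
        if np.array_equal(new_omega, omega):
            break
        omega = new_omega
    return np.array(iterates), omega

# Hypothetical test problem (random data, not taken from the paper).
rng = np.random.default_rng(0)
K, n, lam = 3, 6, 0.5
A = -rng.random((K, n, n))                         # nonpositive off-diagonals
idx = np.arange(n)
A[:, idx, idx] = 0.0
A[:, idx, idx] = -A.sum(axis=2) + lam              # diagonal = sum |off-diag| + lam
b = rng.standard_normal((K, n))

iterates, omega = policy_iteration(A, b, omega0=np.zeros(n, dtype=int))
u = iterates[-1]
print("policy evaluations:", len(iterates))
print("largest residual:", np.abs((A @ u - b).max(axis=0)).max())
```

In this sketch the iteration stops as soon as the improved policy coincides with the current one, which is a simplification that only makes sense for finite control sets.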

7. Numerical experiments.

In this section, we illustrate the theoretical findings and demonstrate the efficiency improvement of the penalty schemes over the direct control scheme through numerical experiments. We present an infinite-horizon optimal switching problem and examine the performance of penalty schemes with respect to the spatial mesh size and the penalty parameter.

We first introduce the following two-regime infinite-horizon optimal switching problem (see e.g. [34, 37]). Let $(\Omega, \mathcal{F}_t, \mathbb{P})$ be a filtered probability space and $\gamma = (\gamma_t)_{t \ge 0}$ be a control process such that $\gamma_t = \sum_{k \ge 0} i_k 1_{[\tau_k, \tau_{k+1})}(t)$, where $(\tau_k)_{k \ge 0}$ is a non-decreasing sequence of stopping times representing the decision on "when to switch", and for each $k \ge 0$, $i_k$ is an $\mathcal{F}_{\tau_k}$-measurable random variable valued in the discrete space $\mathcal{I} = \{1, 2\}$, representing the decision on "where to switch". That is, the decision maker chooses regime $i_k$ at the time $\tau_k$ for all $k \ge 0$. Given a control strategy $\gamma$, we consider the following controlled state equation:

$$dX^\gamma_t = \big( r + \nu(\gamma_t)(\mu - r) \big) X^\gamma_t \, dt + \sigma \nu(\gamma_t) X^\gamma_t \, dW_t, \quad t > 0, \quad X^\gamma_0 = x,$$

where $r, \mu, \sigma, x > 0$, $(W_t)_{t > 0}$ is a one-dimensional Brownian motion defined on $(\Omega, \mathcal{F}_t, \mathbb{P})$, and $\nu(i) = i - 1$ for $i \in \mathcal{I}$. Then the objective function associated with the control strategy $\gamma$ is given by:

$$J(x, \gamma) = \mathbb{E}\bigg[ \int_0^\infty e^{-rt} \ell(X^\gamma_t) \, dt - \sum_{k \ge 0} e^{-r\tau_{k+1}} c_{i_k, i_{k+1}} \bigg],$$

where $\ell$ represents the running reward function and $c_{i,j}$ represents the switching cost from regime $i$ to $j$, for all $i, j \in \mathcal{I}$. For each $i \in \mathcal{I}$, let $A_i$ be the set of all control strategies starting with regime $i$, i.e., $i_0 = i$ and $\tau_0 = 0$. Then the decision maker has the following value functions:

$$u_i(x) = \sup_{\gamma \in A_i} J(x, \gamma), \quad i \in \mathcal{I} = \{1, 2\}.$$

Suppose that the switching costs satisfy $c_{i,j} \equiv c > 0$ for $i \ne j$. Then we can deduce from the dynamic programming principle (see [34]) that the value functions $(u_1, u_2)$ satisfy the following system of quasi-variational inequalities: for all $i \in \mathcal{I}$, $j \ne i$, $x \in (0, \infty)$,

$$\min\Big[ -\tfrac{1}{2}\sigma^2 \nu(i)^2 x^2 D^2 u_i(x) - \big( r + \nu(i)(\mu - r) \big) x D u_i(x) + r u_i(x) - \ell(x),\ (u_i - u_j + c)(x) \Big] = 0. \tag{7.1}$$

Moreover, even though (7.1) involves a pointwise minimization instead of a pointwise maximization as in (5.1), for any given penalty parameter $\rho > 0$, one can easily extend the scheme (5.3) and derive the corresponding penalized equation for (7.1): for all $i \in \mathcal{I}$, $j \ne i$, $x \in (0, \infty)$,

$$-\tfrac{1}{2}\sigma^2 \nu(i)^2 x^2 D^2 u^\rho_i(x) - \big( r + \nu(i)(\mu - r) \big) x D u^\rho_i(x) + r u^\rho_i(x) - \ell(x) - \rho (u^\rho_j - c - u^\rho_i)^+(x) = 0. \tag{7.2}$$

For our numerical experiments, we set the parameters as $c = 1/…$, $\sigma = 0.…$, $\mu = 0.…$, $r = 0.02$ and choose a nonsmooth running reward function: $\ell(x) = 0.… - |x - …|$ for $x \in [0.…, ….5]$ and $\ell(x) = 0$ otherwise.

Now let $\rho > 0$, $n \in \mathbb{N}$, and $\{x_l\} = \{lh\}_{l \in \mathbb{N} \cup \{0\}}$ be a uniform grid of $(0, \infty)$ with the mesh size $h = 2^{-n}$. We derive a monotone discretization of the penalized equation (7.2) by employing the standard (two-point) forward difference for the first derivatives and the (three-point) central difference for all second derivatives; see Section 6 and [1] for the convergence of the discretization as $n, \rho \to \infty$. We also localize the equation on the computational domain $(0, 2)$ with the homogeneous Dirichlet boundary condition $u = 0$ at $x = 2$, which leads to the following discrete equation for (7.2): find $u^\rho_N = (u^\rho_{1,N}, u^\rho_{2,N}) \in \mathbb{R}^N$ satisfying

$$A u^\rho_N - \vec{\ell} - \rho (b - M u^\rho_N)^+ := \begin{pmatrix} B_1 + r I_{N/2} & 0 \\ 0 & B_2 + r I_{N/2} \end{pmatrix} u^\rho_N - \begin{pmatrix} \ell \\ \ell \end{pmatrix} - \rho \max(b - M u^\rho_N, 0) = 0, \tag{7.3}$$

where $N = 4/h = 2^{n+2}$ is the total number of unknowns, $B_1, B_2 \in \mathbb{R}^{N/2 \times N/2}$ are matrices resulting from the discretization of the differential operators, $\ell \in \mathbb{R}^{N/2}$ is a vector such that $\ell_k = \ell(x_{k-1})$ for all $k = 1, \ldots, N/2$,

$$M = \begin{pmatrix} I_{N/2} & -I_{N/2} \\ -I_{N/2} & I_{N/2} \end{pmatrix}$$

is a matrix representation of the switching operator, and $b \in \mathbb{R}^N$ is a constant vector with value $-c$. Similarly, we can discretize (7.1) for the direct control scheme: find $u_N = (u_{1,N}, u_{2,N}) \in \mathbb{R}^N$ satisfying

$$\min(A u_N - \vec{\ell},\ M u_N - b) = 0. \tag{7.4}$$

In the following, we discuss the implementation details for solving (7.3) and (7.4) with policy iteration. The direct control scheme, which serves as a benchmark for our penalized schemes, applies policy iteration to the discrete equation (7.4) directly (see [12, 3]). More precisely, let $\omega^{(0)} \in \{0, 1\}^N$ be a given initial control value. Then, for all $k \ge 0$, we find $(u^{(k)}, \omega^{(k+1)}) \in \mathbb{R}^N \times \{0, 1\}^N$ such that

$$A^{(k)} u^{(k)} - b^{(k)} = 0, \quad \omega^{(k+1)} \in \arg\min_{\omega \in \{0,1\}^N} \Big[ (1 - \omega)(A u^{(k)}_N - \vec{\ell}) + \omega (M u^{(k)}_N - b) \Big], \tag{7.5}$$

where the $i$-th row of the matrix $A^{(k)}$ and the $i$-th component of the vector $b^{(k)}$ are determined by:

$$A^{(k)}_i = (1 - \omega^{(k)}_i) A_i + \omega^{(k)}_i M_i, \quad b^{(k)}_i = (1 - \omega^{(k)}_i) \vec{\ell}_i + \omega^{(k)}_i b_i, \quad i = 1, \ldots, N.$$

The iteration is terminated once a desired tolerance is achieved, i.e.,

$$\frac{\| u^{(k)}_N - u^{(k-1)}_N \|}{\max(\| u^{(k)}_N \|, \text{scale})} < \text{tol}, \tag{7.6}$$

where $\| \cdot \|$ denotes the sup-norm, and the scale parameter is chosen to guarantee that no unrealistic level of accuracy is imposed if the solution is close to zero. On the other hand, the penalized scheme views (7.3) with a given penalty parameter $\rho$ as a discrete HJB equation, and applies policy iteration (6.4) to solve it, which is terminated by the same criterion (7.6), with $(u^{(k)}_N)_{k \ge 0}$ replaced by $(u^{\rho,(k)}_N)_{k \ge 0}$. We take tol $= 10^{-…}$ and scale $= 1$ for all the experiments, and perform computations using Matlab R2018a on a 2.70GHz Intel Xeon E5-2680 processor.

We reiterate that, in contrast to the global convergence of policy iteration (6.4) applied to the penalized equation (7.3), policy iteration (7.5) applied to (7.4) is in general not well-defined for an arbitrary initial guess $\omega^{(0)}$, as already observed in Remark 6.1 and [3]. In fact, if we initialize (7.5) with $\omega^{(0)} = \{1\}^N$, then we need to solve $M u^{(0)}_N - b = 0$, which has no solution due to the structure of the matrix $M$ and the fact that $b = -c < 0$. Therefore, we initialize policy iteration for (7.3) and (7.4) with the continuation value, i.e., $u^{(0)}_N = u^{\rho,(0)}_N$ satisfying $A u^{(0)}_N = \vec{\ell}$, which admits a solution since $A$ is a monotone matrix.

We start by examining the convergence of the penalized schemes with respect to the penalty parameter and the mesh size. Figure 7.1 presents, for a fixed mesh size $h = 2^{-14}$ (the total number of unknowns is $N = 65536$), the difference between the numerical solutions obtained by the direct control scheme and the penalty scheme with different penalty parameters. It clearly indicates that, as the penalty parameter $\rho \to \infty$, the penalized solutions converge monotonically from below to the solution of the direct control scheme. Since the value function is sufficiently smooth (Figure 7.1, bottom), we can also observe first order convergence of the penalization error (in the sup-norm) with respect to the penalty parameter $\rho$.

[Figure 7.1: two panels plotted against $x$. Top: $u_{…,N} - u^\rho_{…,N}$ (scale $\times 10^{-4}$) for three penalty parameters $\rho = 10^{…}$. Bottom: $u_{…,N}$ obtained by the direct control scheme.]

Fig. 7.1: Numerical solutions of the value function $u_{…}$ obtained by the direct control scheme and the penalty schemes with different penalty parameters ($N = 65536$). Shown are: the difference $u_{…,N} - u^\rho_{…,N}$ of numerical solutions (top), and the numerical solution $u_{…,N}$ of the direct control scheme (bottom).

Table 7.1 summarizes, for different mesh sizes, the numerical solutions of the direct control scheme and the penalty scheme with a fixed parameter $\rho = 10^{…}$. It is interesting to observe that, for a fixed mesh size, the spatial discretization errors of both the direct control scheme and the penalty scheme are of the same magnitude and converge to zero with first order as the mesh size tends to 0. Moreover, the penalty parameter $\rho = 10^{…}$ already leads to a negligible penalization error (compared to the discretization error), which seems to be stable with respect to different mesh sizes.

Table 7.1: Results for the direct control scheme and the penalty scheme ($\rho = 10^{…}$) with different mesh sizes.

    N                                           16384         32768            65536
    Direct control scheme
      u_{…,N}(x = 1)                            6.9339733     6.9330192        6.9325423
      |u_{…,N} - u_{…,N/2}|(x = 1)                            9.… × 10^{-…}    .… × 10^{-…}
    Penalty scheme (ρ = 10^{…})
      u^ρ_{…,N}(x = 1)                          6.9339645     6.9330100        6.9325330
      |u^ρ_{…,N} - u^ρ_{…,N/2}|(x = 1)                        9.… × 10^{-…}    .… × 10^{-…}
      ‖u_N - u^ρ_N‖                             .… × 10^{-…}  .… × 10^{-…}     .… × 10^{-…}

We proceed to analyze the computational efficiency of the direct control scheme and the penalty scheme. Figure 7.2 compares, for different mesh sizes and penalty parameters, the number of required policy iterations and the computational time of both schemes. One can observe clearly from Figure 7.2, left, that the number of required iterations for the direct control scheme (the blue line) exhibits a linear growth in the size of the discrete system. Moreover, our experiments show that policy iteration applied to (7.4) with fine meshes, i.e., $N \in \{…, …\}$, is not able to meet the desired accuracy within $10^{…}$ iterations, which suggests that the direct control scheme may diverge for sufficiently fine meshes.
On the other hand, for penalty schemes with fixed penalty parameters (the green and black lines in Figure 7.2, left), the number of required iterations eventually stabilizes to a finite value for all fine meshes, which is significantly less than the number of iterations for the direct control scheme.

One can further compare the overall runtime of the direct control scheme and the penalty scheme for solving discrete systems with different sizes $N$ (Figure 7.2, right). Note that for both methods, the computational time per iteration grows at a rate $O(N)$ due to the linear system solver. Hence, the total runtime of the direct control scheme increases at a rate $O(N^2)$ due to the linear growth of the required iterations (the blue line), while the penalized scheme (with a fixed penalty parameter) achieves a linear complexity in the computational time (the green and black lines), benefiting from a mesh-independence property of policy iteration for penalized equations. This suggests that the penalty schemes are significantly more efficient than the direct control scheme for solving large-scale discrete QVIs, as pointed out in [3].

[Figure 7.2: two log-log panels. Left: number of iterations against the number of unknowns (512 to 262144). Right: runtime (in seconds) against the number of unknowns (512 to 65536). Legend: direct control; penalty with $\rho = 10^{…}$; penalty with $\rho = 10^{…}$; penalty with $\rho = N/16$; penalty with $\rho = N/16$ and continuation.]

Fig. 7.2: Comparison of the number of iterations and the runtime for the direct control scheme and the penalty method with different mesh sizes and penalty parameters (plotted in a log-log scale).

In practice, instead of solving the penalized equation (7.2) with a fixed penalty parameter $\rho$, we construct a convergent approximation to the solution of the QVI (7.1) based on the penalized solutions, by letting $1/\rho$ and $h$ tend to zero simultaneously (see also [3, 1]). The first order convergence of both the penalization error and the discretization error (see Figure 7.1 and Table 7.1) suggests taking $\rho = CN$, where the constant $C = 1/16$ was found to achieve the optimal balance between the penalization error and the discretization error. Moreover, as suggested in [24], we can combine the penalty method with a continuation procedure in $\rho$ to further improve the algorithm's efficiency. In particular, given a discrete penalized equation (7.3) of size $N$ with a large penalty parameter $\rho = N/16$, we first solve the equation with $\rho = 100$ by using the initialization $u^{(0)} = A^{-1} \vec{\ell}$, and then use the solution as the initialization for the algorithm with the desired parameter $\rho$.

Figure 7.2 depicts the performance of the penalty scheme with the parameter $\rho = N/16$ (the red line) and the penalty scheme with the parameter $\rho = N/16$ and a continuation procedure (the purple line). The increasing penalty parameter results in an increasing number of iterations, but the growth rate is much lower than that of the direct control scheme. A linear regression of the data without the continuation procedure shows that the number of iterations is of the magnitude $O(N^{0.…})$. Moreover, the continuation strategy effectively enhances the efficiency of the algorithm, and the number of iterations has only a mild dependence on the size of the system.

We finally remark that one can choose $\Delta t = O(h)$ and $1/\rho = O(h)$ to construct a convergent penalty approximation to solutions of parabolic HJBQVIs. It has been observed in practice (see Table 6.6 in [3]) that the number of iterations for the penalty scheme remains stable with respect to the mesh refinement, due to the fact that refining the mesh size in general produces a more accurate initial guess for policy iteration, while the direct control scheme requires an increasing number of policy iterations per timestep as the mesh size tends to zero, which leads to significantly more policy iterations for high levels of refinement.
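The penalized policy iteration for (7.3) and the continuation procedure in $\rho$ described in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions and not the Matlab code used for the experiments: the parameter values, the reward function, the coarse grid, the tolerance, the dense linear algebra and the helper names `regime_block` and `solve_penalized` are all hypothetical choices made for the example.

```python
import numpy as np

def regime_block(x, h, sigma, mu, r, nu):
    """Monotone matrix B + r I for one regime: central second difference,
    forward first difference, and u = 0 at the right end of the domain."""
    diff = 0.5 * (sigma * nu * x) ** 2 / h**2
    drift = (r + nu * (mu - r)) * x / h
    B = np.diag(2.0 * diff + drift + r)
    B -= np.diag(diff[1:], k=-1)                   # coefficient of u_{k-1}
    B -= np.diag((diff + drift)[:-1], k=1)         # coefficient of u_{k+1}
    return B

def solve_penalized(A, ell, M, b, rho, u0, tol=1e-12, scale=1.0, max_iter=1000):
    """Policy iteration for A u - ell - rho * max(b - M u, 0) = 0."""
    u = u0
    for k in range(1, max_iter + 1):
        beta = (b - M @ u > 0).astype(float)       # 1 where the penalty is active
        u_new = np.linalg.solve(A + rho * beta[:, None] * M, ell + rho * beta * b)
        change = np.max(np.abs(u_new - u)) / max(np.max(np.abs(u_new)), scale)
        u = u_new
        if change < tol:                           # stopping rule of the form (7.6)
            return u, k
    raise RuntimeError("policy iteration did not converge")

# Hypothetical parameters and reward (not the values used in the paper).
sigma, mu, r, c = 0.3, 0.04, 0.02, 0.1
n = 5
h = 2.0 ** (-n)
x = h * np.arange(int(round(2 / h)))               # nodes 0, h, ..., 2 - h
half = x.size                                       # N / 2 unknowns per regime
reward = np.maximum(0.5 - np.abs(x - 1.0), 0.0)

A = np.zeros((2 * half, 2 * half))
A[:half, :half] = regime_block(x, h, sigma, mu, r, nu=0.0)
A[half:, half:] = regime_block(x, h, sigma, mu, r, nu=1.0)
ell = np.concatenate([reward, reward])
I = np.eye(half)
M = np.block([[I, -I], [-I, I]])                    # singular: rank N / 2
b = -c * np.ones(2 * half)

u_cont = np.linalg.solve(A, ell)                    # continuation value A u = ell
rho = 1.0e3
u_plain, its_plain = solve_penalized(A, ell, M, b, rho, u_cont)
# Continuation in rho: solve with rho = 100 first, then warm-start.
u_coarse, _ = solve_penalized(A, ell, M, b, 100.0, u_cont)
u_warm, its_warm = solve_penalized(A, ell, M, b, rho, u_coarse)
print("iterations without / with continuation:", its_plain, its_warm)
```

The matrix `M` built here has rank $N/2$, which is the structural reason why the system $M u - b = 0$ mentioned above cannot be solved for $b = -c < 0$, whereas every matrix `A + rho * beta[:, None] * M` in the penalized iteration is strictly diagonally dominant.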

8. Conclusions.

This paper develops a penalty approximation to systems of HJB quasi-variational inequalities (HJBQVIs) stemming from hybrid control problems involving impulse controls. We established the monotone convergence of the penalty schemes and estimated the convergence orders, which subsequently led to convergent approximations of action regions and optimal impulse controls. We further proved the monotone convergence of policy iteration for the penalized equations in an infinite dimensional setting. Numerical examples for infinite-horizon optimal switching problems are presented to illustrate the theoretical findings and to demonstrate the efficiency improvement of the penalty schemes over the classical direct control scheme.

To the best of our knowledge, this is the first paper which derives rigorous error estimates for penalty approximations of HJBQVIs, and proposes convergent approximations to action regions and optimal impulse controls. The penalty schemes and convergence results can be easily extended to nonlocal elliptic HJBQVIs arising from impulse control problems of jump-diffusion processes with regime switching. Natural next steps would be to extend the penalty approach to parabolic HJBQVIs as in [38], and to monotone systems with bilateral obstacles arising from switching games [20].

Appendix A. Proofs of Lemma 2.2 (3), Propositions 3.1 and 4.1, and Lemmas 4.5 and 4.9.

Proof of Lemma 2.2 (3).

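Let $x_\rho, x \in \mathbb{R}^d$ for all $\rho \in \mathbb{N}$ with $\lim_{\rho \to \infty} x_\rho = x$. We first establish that $\limsup_{\rho \to \infty} (\mathcal{M}_i u^\rho)(x_\rho) \le (\mathcal{M}_i u^*)(x)$. For any $\varepsilon > 0$, there exists $z_\varepsilon \in Z(x)$, such that $u^*_i(\Gamma_i(x, z_\varepsilon)) + K_i(x, z_\varepsilon) - \varepsilon \le (\mathcal{M}_i u^*)(x)$. Since $Z(x_\rho)$ converges to $Z(x)$ in the Hausdorff metric, we can find $z_{\rho,\varepsilon} \in Z(x_\rho)$, such that $\lim_{\rho \to \infty} z_{\rho,\varepsilon} = z_\varepsilon$. Then we conclude the desired result from the continuity of $\Gamma_i$, $K_i$ and the following inequality: for all $\varepsilon > 0$,

$$\limsup_{\rho \to \infty} (\mathcal{M}_i u^\rho)(x_\rho) \le \limsup_{\rho \to \infty} \big[ u^\rho_i(\Gamma_i(x_\rho, z_{\rho,\varepsilon})) + K_i(x_\rho, z_{\rho,\varepsilon}) \big] \le u^*_i(\Gamma_i(x, z_\varepsilon)) + K_i(x, z_\varepsilon) \le (\mathcal{M}_i u^*)(x) + \varepsilon.$$

We then show $(\mathcal{M}_i u_*)(x) \le \liminf_{\rho \to \infty} (\mathcal{M}_i u^\rho)(x_\rho)$. For any $\varepsilon > 0$ and $\rho \in \mathbb{N}$, there exists $z_{\rho,\varepsilon} \in Z(x_\rho)$ such that $u^\rho_i(\Gamma_i(x_\rho, z_{\rho,\varepsilon})) + K_i(x_\rho, z_{\rho,\varepsilon}) - \varepsilon \le (\mathcal{M}_i u^\rho)(x_\rho)$. The fact that $Z(x_\rho)$ converges to the compact set $Z(x)$ implies that, by passing to a subsequence, one can assume $(z_{\rho,\varepsilon})_{\rho \in \mathbb{N}}$ converges to some $z_\varepsilon \in Z(x)$. Then we have

$$\liminf_{\rho \to \infty} (\mathcal{M}_i u^\rho)(x_\rho) \ge \liminf_{\rho \to \infty} \big[ u^\rho_i(\Gamma_i(x_\rho, z_{\rho,\varepsilon})) + K_i(x_\rho, z_{\rho,\varepsilon}) - \varepsilon \big] \ge (u_*)_i(\Gamma_i(x, z_\varepsilon)) + K_i(x, z_\varepsilon) - \varepsilon \ge (\mathcal{M}_i u_*)(x) - \varepsilon,$$

which completes the proof by letting $\varepsilon \to 0$.

Proof of Proposition 3.1.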

Let $u$ and $v$ be a bounded subsolution and supersolution of (3.1) with a fixed penalty parameter $\rho \ge 0$, respectively. We observe that for a sufficiently large constant $C > 0$, $w = -C$ is a subsolution to $F^\rho_i(x, w, Dw_i, D^2w_i) \le -\kappa < 0$, from which, by using the fact that $F^\rho$ is convex in $u$, $Du$ and $D^2u$, we deduce that $u_m := (1 - \frac{1}{m}) u + \frac{1}{m} w$ is a subsolution to $F^\rho_i(x, u_m, D(u_m)_i, D^2(u_m)_i) \le -\kappa/m$ for all $m \in \mathbb{N}$. Note that it suffices to show $u_m - v \le 0$ for all $m \in \mathbb{N}$, since one can deduce the desired comparison principle $u - v \le 0$ by letting $m \to \infty$.

Now suppose that there exists $m \in \mathbb{N}$ such that $M = \sup_{x \in \mathbb{R}^d, i \in \mathcal{I}} ((u_m)_i - v_i)(x) > 0$, and consider for each $\varepsilon > 0$

$$M_\varepsilon = \sup_{x, y \in \mathbb{R}^d, i \in \mathcal{I}} \Big( (u_m)_i(x) - v_i(y) - \frac{1}{2\varepsilon} |x - y|^2 \Big).$$

Then, by assuming without loss of generality that there exists an $i \in \mathcal{I}$, independent of $\varepsilon$, such that the maximum is attained at the index $i$ and the point $(x_\varepsilon, y_\varepsilon)$ (otherwise one can modify the test function with an additional penalty term), one can deduce from the standard arguments (see [13]) that $\lim_{\varepsilon \to 0} M_\varepsilon = M$ and $\lim_{\varepsilon \to 0} x_\varepsilon = \lim_{\varepsilon \to 0} y_\varepsilon = x$ for some $x$. Thus by applying the maximum principle ([13, Theorem 3.2]), we have for any given $\theta > 1$ that there exist $X, Y \in \mathbb{S}^d$ such that $(p_x, X) \in \bar{J}^{2,+} u_m(x_\varepsilon)$ and $(-p_y, -Y) \in \bar{J}^{2,-} v(y_\varepsilon)$, where

$$(p_x, p_y) = \frac{1}{\varepsilon} (x_\varepsilon - y_\varepsilon,\ y_\varepsilon - x_\varepsilon), \quad \text{and} \quad \begin{pmatrix} X & 0 \\ 0 & Y \end{pmatrix} \le \frac{\theta}{\varepsilon} \begin{pmatrix} I & -I \\ -I & I \end{pmatrix},$$

from which, by using the sub- and supersolution properties, we have

$$\sup_{\alpha \in \mathcal{A}_i} L^\alpha_i(x_\varepsilon, u_m(x_\varepsilon), p_x, X) - \sup_{\alpha \in \mathcal{A}_i} L^\alpha_i(y_\varepsilon, v(y_\varepsilon), -p_y, -Y) + \rho \big( (u_m)_i - \mathcal{M}_i u_m \big)^+(x_\varepsilon) - \rho (v_i - \mathcal{M}_i v)^+(y_\varepsilon) + \kappa/m \le 0. \tag{A.2}$$

Now we distinguish two cases. Suppose that for all small enough $\varepsilon$, we have $\rho ((u_m)_i - \mathcal{M}_i u_m)^+(x_\varepsilon) - \rho (v_i - \mathcal{M}_i v)^+(y_\varepsilon) \le -\kappa/m$, which implies $(v_i - \mathcal{M}_i v)(y_\varepsilon) \ge 0$ and

$$((u_m)_i - \mathcal{M}_i u_m)(x_\varepsilon) - (v_i - \mathcal{M}_i v)(y_\varepsilon) \le -\kappa/(\rho m).$$

Then by rearranging the terms in the above inequality and using the definition of $M_\varepsilon$, we have

$$M = \lim_{\varepsilon \to 0} M_\varepsilon = \lim_{\varepsilon \to 0} \big[ (u_m)_i(x_\varepsilon) - v_i(y_\varepsilon) - |x_\varepsilon - y_\varepsilon|^2/(2\varepsilon) \big] \le \limsup_{\varepsilon \to 0} (\mathcal{M}_i u_m)(x_\varepsilon) - \liminf_{\varepsilon \to 0} (\mathcal{M}_i v)(y_\varepsilon) - \liminf_{\varepsilon \to 0} |x_\varepsilon - y_\varepsilon|^2/(2\varepsilon) - \kappa/(\rho m) \le (\mathcal{M}_i u_m)(x) - (\mathcal{M}_i v)(x) - \kappa/(\rho m) \le M - \kappa/(\rho m),$$

where we have used Lemma 2.2 (3) and the fact that $u_m$ and $v$ are upper- and lower-semicontinuous, respectively. This clearly contradicts the fact that $\kappa/(\rho m) > 0$.

Otherwise, along a sequence of $\varepsilon \to 0$, (A.2) gives

$$\sup_{\alpha \in \mathcal{A}_i} L^\alpha_i(y_\varepsilon, v(y_\varepsilon), -p_y, -Y) - \sup_{\alpha \in \mathcal{A}_i} L^\alpha_i(x_\varepsilon, u_m(x_\varepsilon), p_x, X) \ge 0.$$

This is the classical case (see [22]). In particular, by using the estimate

$$\sup_{\alpha \in \mathcal{A}_i} L^\alpha_i(x_\varepsilon, u_m(x_\varepsilon), p_x, X) - \sup_{\alpha \in \mathcal{A}_i} L^\alpha_i(x_\varepsilon, v(y_\varepsilon), p_x, X) \ge \lambda \big( (u_m)_i(x_\varepsilon) - v_i(y_\varepsilon) \big) = \lambda \Big( M_\varepsilon + \frac{|x_\varepsilon - y_\varepsilon|^2}{2\varepsilon} \Big)$$

and letting $\varepsilon \to 0$, we can deduce that $M \le 0$, which is a contradiction.

Proof of Proposition 4.1.

We start with several important properties of the solution operator $Q : [C(\mathbb{R}^d)]^M \to [C(\mathbb{R}^d)]^M$ to (4.2). That is, for any given $u$, $Qu$ solves the system of variational inequalities of the form (4.2), where the obstacle $\mathcal{M}_i u^{n-1}$ is replaced by $\mathcal{M}_i u$. Then the comparison principle of (4.2) and Lemma 2.2 (2) imply that $Q$ is monotone: $Qu \ge Qv$ if $u \ge v$. Moreover, one can show $Q$ is concave. In fact, for any given $u, v \in [C(\mathbb{R}^d)]^M$ and $\lambda \in [0, 1]$, the concavity of $\mathcal{M}_i$ in $u$ gives for all $i \in \mathcal{I}$,

$$(1 - \lambda)(Qu)_i + \lambda (Qv)_i - \mathcal{M}_i[(1 - \lambda) u + \lambda v] \le (1 - \lambda)\big( (Qu)_i - \mathcal{M}_i u \big) + \lambda \big( (Qv)_i - \mathcal{M}_i v \big). \tag{A.3}$$

Moreover, since the HJB equation (4.1) is convex in $u$, $Du$ and $D^2u$, by applying [6, Lemma A.3] (note that the weakly coupled term $\sum_{j \in \mathcal{I}_{-i}} d^\alpha_{ij} u_j$ is linear in $u_j$, $j \in \mathcal{I}$), we see that $(1 - \lambda) Qu + \lambda Qv$ is a subsolution to (4.2) with the obstacle $\mathcal{M}_i[(1 - \lambda) u + \lambda v]$, and consequently conclude the concavity of the operator $Q$ from the comparison principle of (4.2).

Now let $C$ be a sufficiently large constant such that $w = (w_i)_{i \in \mathcal{I}}$ with $w_i = -C$ for all $i \in \mathcal{I}$ is a strict subsolution to (2.1), that is, $F_i(x, w, Dw_i, D^2w_i) \le -\kappa$ for all $i \in \mathcal{I}$. We proceed to establish a contractive property of the iterates $(u^n)_{n \in \mathbb{N}}$, where $u^n$ is a viscosity solution to (4.2) for each $n$. By using the monotonicity and concavity of the operator $Q$, we can show that if $u^{n-1} - u^n \le \lambda (u^{n-1} - w)$ for some $\lambda \in [0, 1]$ and $n \in \mathbb{N}$, then it holds for any constants $C \ge |(u^n)^+|_0 + |w|_0$ and $0 < \mu \le \min(1, \kappa/C)$ that $u^n - u^{n+1} \le \lambda (1 - \mu)(u^n - w)$ (cf. [37, Lemma 3.3]). Since $w \le u^n \le u^0$ for all $n$ and $w$ is bounded, there exists a constant $\mu \in (0, 1)$ such that $0 \le u^{n-1} - u^n \le (1 - \mu)^{n-1} (u^0 - w)$ for all $n \ge 0$. Consequently we can show that $(u^n)_{n \ge 0}$ converges uniformly to some continuous function $u$, which is the unique viscosity solution to (2.1). Then the contractive property enables us to conclude the desired error estimate.

Proof of Lemma 4.5.

For $\delta, \gamma > 0$, we define for all $x, y \in \mathbb{R}^d$ that

$$\Phi_i(x, y) = (Q^\rho u)_i(x) - (Q^\rho v)_i(y) - \phi(x, y), \quad \phi(x, y) = \delta |x - y|^2 + \gamma |x|^2,$$

and let $\Phi_i(\bar{x}, \bar{y}) = m_{\delta,\gamma} := \sup_{i,x,y} \Phi_i(x, y)$ for some $\bar{x}, \bar{y} \in \mathbb{R}^d$ and $i \in \mathcal{I}$, where we omit the dependence on $\delta, \gamma$ for notational simplicity. Since $\mathcal{I}$ is a finite set, we assume without loss of generality that the index $i$ is independent of $\delta, \gamma$. Then for any $\theta > 1$, we deduce from the maximum principle [13, Theorem 3.2] that there exist $X, Y \in \mathbb{S}^d$ such that

$$\sup_{\alpha \in \mathcal{A}_i} L^\alpha_i(\bar{x}, (Q^\rho u)(\bar{x}), p_x, X) - \sup_{\alpha \in \mathcal{A}_i} L^\alpha_i(\bar{y}, (Q^\rho v)(\bar{y}), -p_y, -Y) + \rho \big( (Q^\rho u)_i - \mathcal{M}_i u \big)^+(\bar{x}) - \rho \big( (Q^\rho v)_i - \mathcal{M}_i v \big)^+(\bar{y}) \le 0,$$

where $(p_x, p_y) = (D_x \phi(\bar{x}, \bar{y}), D_y \phi(\bar{x}, \bar{y}))$, and $\begin{pmatrix} X & 0 \\ 0 & Y \end{pmatrix} \le \theta D^2 \phi(\bar{x}, \bar{y})$.

We now discuss two cases. Suppose $((Q^\rho u)_i - \mathcal{M}_i u)^+(\bar{x}) - ((Q^\rho v)_i - \mathcal{M}_i v)^+(\bar{y}) < 0$. Then we have $((Q^\rho u)_i - \mathcal{M}_i u)(\bar{x}) \le ((Q^\rho v)_i - \mathcal{M}_i v)(\bar{y})$, and consequently

$$(Q^\rho u)_i(\bar{x}) - (Q^\rho v)_i(\bar{y}) \le (\mathcal{M}_i u)(\bar{x}) - (\mathcal{M}_i v)(\bar{x}) + (\mathcal{M}_i v)(\bar{x}) - (\mathcal{M}_i v)(\bar{y}) \le |(u_i - v_i)^+|_0 + [\mathcal{M}_i v]_1 |\bar{x} - \bar{y}|,$$

where we used the definition (2.3) of $\mathcal{M}_i$. This implies that

$$m_{\delta,\gamma} \le |(u_i - v_i)^+|_0 + [\mathcal{M}_i v]_1 |\bar{x} - \bar{y}| - \delta |\bar{x} - \bar{y}|^2 \le |(u_i - v_i)^+|_0 + [\mathcal{M}_i v]_1^2 / (4\delta).$$

Then, by passing $\gamma \to 0$, we deduce for any $x, y \in \mathbb{R}^d$ and $\delta > 0$ that

$$(Q^\rho u)_i(x) - (Q^\rho v)_i(y) \le |(u_i - v_i)^+|_0 + [\mathcal{M}_i v]_1^2 / (4\delta) + \delta |x - y|^2,$$

which, along with the assumption $[\mathcal{M}_i v]_1 \le [v]_1 + C$, leads to the desired conclusion by minimizing over $\delta > 0$ and considering $x = y$.

On the other hand, if $((Q^\rho u)_i - u_j - k_{ij})^+(\bar{x}) - ((Q^\rho v)_i - v_j - k_{ij})^+(\bar{y}) \ge 0$, then the classical results for weakly coupled systems give $(Q^\rho u)_i \le (Q^\rho v)_i$ (see e.g. [22]).

Proof of Lemma 4.9.

Note that for any given $\alpha > 0$, $\mu \in (0, 1)$ and $\gamma \in \mathbb{N}$, we have $(\phi_\alpha)'(x) = \alpha \gamma x^{\gamma - 1} + \mu^x \log \mu$, which is increasing on $(0, \infty)$. Suppose that $\alpha$ is sufficiently small such that $\alpha \gamma < -\log \mu$. Then we can show $(\phi_\alpha)'(n_\alpha) \ge 0$, with the natural number $n_\alpha$ defined as:

$$n_\alpha := \Big\lceil \frac{\log(-\alpha\gamma / \log \mu)}{\log \mu} \Big\rceil \le \frac{\log(-\alpha\gamma / \log \mu)}{\log \mu} + 1.$$

Consequently, $\phi_\alpha$ is increasing on $(n_\alpha, \infty)$, which leads to the estimate that for all small enough $\alpha$,

$$m_\alpha \le \phi_\alpha(n_\alpha) \le \alpha \Big( \frac{\log(-\alpha\gamma / \log \mu)}{\log \mu} + 1 \Big)^\gamma - \frac{\alpha\gamma}{\log \mu} \le C \alpha (-\log \alpha)^\gamma,$$

where the constant $C$ depends only on $\gamma$ and $\mu$.

REFERENCES

[1] P. Azimzadeh, E. Bayraktar, and G. Labahn,

Convergence of implicit schemes for Hamilton-Jacobi-Bellman quasi-variational inequalities, SIAM J. Control Optim., 56 (2018), pp. 3994–4016.
[2] P. Azimzadeh, A zero-sum stochastic differential game with impulses, precommitment, and unrestricted cost functions, Appl. Math. Optim., 79 (2019), pp. 483–514.
[3] P. Azimzadeh and P. A. Forsyth, Weakly chained matrices, policy iteration, and impulse control, SIAM J. Numer. Anal., 54 (2016), pp. 1341–1364.
[4] L. Bai and J. Paulsen, Optimal dividend policies with transaction costs for a class of diffusion processes, SIAM J. Control Optim., 48 (2010), pp. 4987–5008.
[5] M. Bardi and I. Capuzzo-Dolcetta, Optimal Control and Viscosity Solutions of Hamilton-Jacobi-Bellman Equations, Systems Control Found. Appl., Birkhäuser Boston, Boston, 1997.
[6] G. Barles and E. R. Jakobsen, On the convergence rate of approximation schemes for Hamilton-Jacobi-Bellman equations, M2AN Math. Model. Numer. Anal., 36 (2002), pp. 33–54.
[7] A. Bensoussan and J.-L. Lions, Contrôle Impulsionnel et Inéquations Quasi-Variationnelles, Dunod, Paris, 1982.
[8] A. Bensoussan and J.L. Menaldi, Hybrid control and dynamic programming, Dynam. Contin. Discrete Impuls. Systems, 3 (1997), pp. 395–442.
[9] O. Bokanowski, B. Bruder, S. Maroso, and H. Zidani, Numerical approximation for a superreplication problem under gamma constraints, SIAM J. Numer. Anal., 47 (2009), pp. 2289–2320.
[10] F. Bonnans, S. Maroso, and H. Zidani, Error estimates for a stochastic impulse control problem, Appl. Math. Optim., 55 (2007), pp. 327–357.
[11] A. Briani, F. Camilli, and H. Zidani, Approximation schemes for monotone systems of non-linear second order partial differential equations: convergence result and error estimate, Differential Equations Appl., 4 (2012), pp. 297–317.
[12] J. P. Chancelier, M. Messaoud, and A. Sulem, A policy iteration algorithm for fixed point problems with nonexpansive operators, Math. Methods Oper. Res., 65 (2007), pp. 239–259.
[13] M. G. Crandall, H. Ishii, and P.-L. Lions, User's guide to viscosity solutions of second order partial differential equations, Bull. Amer. Math. Soc. (N.S.), 27 (1992), pp. 1–67.
[14] M. H. A. Davis, X. Guo, and G. Wu, Impulse controls of multidimensional jump diffusions, SIAM J. Control Optim., 48 (2010), pp. 5276–5293.
[15] K. Debrabant and E. R. Jakobsen, Semi-Lagrangian schemes for linear and fully nonlinear diffusion equations, Math. Comp., 82 (2012), pp. 1433–1462.
[16] R. Ferretti, A. Sassi, and H. Zidani, Error estimates for numerical approximation of Hamilton-Jacobi equations related to hybrid control systems, Appl. Math. Optim., 16 (2018).
[17] X. Guo and G. L. Wu, Smooth fit principle for impulse control of multidimensional diffusion processes, SIAM J. Control Optim., 48 (2009), pp. 594–617.
[18] M. Hintermüller, Mesh-independence and fast local convergence of a primal-dual active set method for mixed control-state constrained elliptic problems, ANZIAM J., 49 (2007), pp. 1–38.
[19] H. Ishii, On the equivalence of two notions of weak solutions, viscosity solutions and distribution solutions, Funkcial. Ekvac., 38 (1995), pp. 101–120.
[20] H. Ishii and P. L. Lions, Viscosity solutions of fully nonlinear second-order elliptic partial differential equations, J. Differential Equations, 83 (1990), pp. 26–78.
[21] H. Ishii and S. Koike, Viscosity solutions of a system of nonlinear second-order elliptic PDEs arising in switching games, Funkcial. Ekvac., 34 (1991), pp. 143–155.
[22] H. Ishii and S. Koike, Viscosity solutions for monotone systems of second-order elliptic PDEs, Comm. Partial Differential Equations, 16 (1991), pp. 1095–1128.
[23] K. Ishii, Viscosity solutions of nonlinear second order elliptic PDEs associated with impulse control problems, Funkcial. Ekvac., 36 (1993), pp. 123–141.
[24] K. Ito and K. Kunisch, Semi-smooth Newton methods for variational inequalities of the first kind, M2AN Math. Model. Numer. Anal., 37 (2003), pp. 41–62.
[25] E. R. Jakobsen, On the rate of convergence of approximation schemes for Bellman equations associated with optimal stopping time problems, Math. Models Methods Appl. Sci., 13 (2003), pp. 613–644.
[26] E. R. Jakobsen, On error bounds for monotone approximation schemes for multi-dimensional Isaacs equations, Asymptot. Anal., 49 (2006), pp. 249–273.
[27] I. Kharroubi, J. Ma, H. Pham, and J. Zhang, Backward SDEs with constrained jumps and quasi-variational inequalities, Ann. Probab., 38 (2010), pp. 794–840.
[28] R. Korn, Some applications of impulse control in mathematical finance, Math. Methods Oper. Res., 50 (1999), pp. 493–518.
[29] G. Liang, Stochastic control representations for penalized backward stochastic differential equations, SIAM J. Control Optim., 53 (2015), pp. 1440–1463.
[30] G. Liang and W. Wei, Optimal switching at Poisson random intervention times, Discrete Contin. Dyn. Syst. Ser. B, 21 (2016), pp. 1483–1505.
[31] P.-L. Lions and J.-L. Menaldi, Optimal control of stochastic integrals and Hamilton-Jacobi-Bellman equations (part I), SIAM J. Control Optim., 20 (1982), pp. 58–81.
[32] N. Lundström, K. Nyström, and M. Olofsson, Systems of variational inequalities in the context of optimal switching problems and operators of Kolmogorov type, Ann. Mat. Pura Appl., 4 (2014), pp. 1213–1247.
[33] B. Øksendal and A. Sulem, Applied Stochastic Control of Jump Diffusions, Universitext, Springer, Berlin, 2005.
[34] H. Pham, Continuous-time Stochastic Control and Optimization with Financial Applications, Stoch. Model. Appl. Probab. 61, Springer Verlag, Berlin, 2009.
[35] C. Reisinger and J. H. Witte, On the use of policy iteration as an easy way of pricing American options, SIAM J. Financ. Math., 3 (2012), pp. 459–478.
[36] C. Reisinger and Y. Zhang, A Penalty Scheme and Policy Iteration for Nonlocal HJB Variational Inequalities with Monotone Drivers, preprint, arXiv:1805.06255 [math.NA], 2018.
[37] C. Reisinger and Y. Zhang, A penalty scheme for monotone systems with interconnected obstacles: convergence and error estimates, SIAM J. Numer. Anal., 57 (2019), pp. 1625–1648.
[38] R. C. Seydel, Impulse Control for Jump-Diffusions: Viscosity Solutions of Quasi-Variational Inequalities and Applications in Bank Risk Management, PhD Thesis, Leipzig University, 2009.
[39] L. R. Sotomayor and A. Cadenillas, Stochastic impulse control with regime switching for the optimal dividend policy when there are business cycles, taxes and fixed costs, Stochastics, 85 (2013), pp. 707–722.
[40] S. Tan, Z. Jin, and G. Yin, Optimal dividend payment strategies with debt constraint in a hybrid regime-switching jump-diffusion model, Nonlinear Anal. Hybrid Syst., 27 (2018), pp. 141–156.
[41] J. Wei, H. Yang, and R. Wang, Classical and impulse control for the optimization of dividend and proportional reinsurance policies with regime switching, J. Optim. Theory Appl., 147 (2010), pp. 358–377.
[42] J. H. Witte and C. Reisinger, Penalty methods for the solution of discrete HJB equations: Continuous control and obstacle problems, SIAM J. Numer. Anal., 50 (2012), pp. 595–625.
[43] G. Yin, C. Zhu,