Foundations of Multistage Stochastic Programming
Paul Dommel ∗ Alois Pichler ∗† February 23, 2021
Abstract
Multistage stochastic optimization problems are oftentimes formulated informally in a pathwise way. These formulations are correct in a discrete setting and suitable when addressing computational challenges, for example. But the pathwise problem statement does not allow an analysis with mathematical rigor and is therefore not appropriate.

This paper addresses the foundations. We provide a novel formulation of multistage stochastic optimization problems by involving adequate stochastic processes as controls. The fundamental contribution is a proof that there exist measurable versions of the intermediate value functions. Our proof builds on the Kolmogorov continuity theorem.

A verification theorem is given in addition, and it is demonstrated that all traditional problem specifications can be stated in the novel setting with mathematical rigor. Further, we provide dynamic equations for the general problem, which are developed for various problem classes. The problem classes covered here include Markov decision processes, reinforcement learning and stochastic dual dynamic programming.
Keywords: Multistage stochastic optimization · stochastic processes · measurability
Classification:
∗ University of Technology, Chemnitz, Faculty of Mathematics, 09126 Chemnitz, Germany. DFG, German Research Foundation – Project-ID 416228727 – SFB 1410
† orcid.org/0000-0001-8876-2429. Contact: [email protected]

1 Introduction

Stochastic optimization problems are frequently considered in finance, energy management and operations research, where it is essential and of primary interest to develop efficient algorithms and to provide access to fast decisions. Many of these algorithms build on finite models in discrete space. Multistage stochastic problems are built on stochastic processes in discrete time or on decision trees, cf. Maggioni and Pflug [18], Philpott et al. [22] or Girardeau et al. [11] among many others.

This paper aims at presenting a rigorous mathematical framework for stochastic optimization problems, particularly multistage stochastic optimization problems, by systematically exploiting measurability in stochastic processes, in conditional expectations, and by involving the proper conditional infimum. We develop value processes and show their relation to the genuine stochastic optimization problem. Our central result finally resolves measurability of the intermediate value functions; it builds on the Kolmogorov continuity theorem.

Multistage stochastic optimization involves optimization based on partial realizations, which are partially observed trajectories. It is a major difficulty of multistage stochastic optimization that individual realizations or trajectories have probability zero. But the problems are stated naturally in this pathwise way. It is hence essential to avoid the difficulties that arise with these pathwise, or ω-by-ω, considerations and to address measurability carefully.

Early and important attempts to capture measurability are already present in Rockafellar [24] and in Rockafellar and Wets [27].
The conditional expectation, the conditional probability and the conditional infimum constitute main and major difficulties in multistage stochastic optimization. The infimum in the optimization formulation and the conditional expectations need to be interchanged at subsequent stages to exploit computational advantages, cf. Carpentier et al. [5] or Pflug and Pichler [20]. Indeed, a recourse decision is based on a partial realization of a stochastic outcome, but has to be considered already at the very beginning of decision making. Considering every outcome separately, ω-by-ω, is only possible for finite states, so that a tree describes the evolution of the stochastic process and the evolution of the decision process as well. In a multistage environment, however, the computational burden grows exponentially with the branching structure, and this approach thus is clearly not advisable. The catch phrase curse of dimensionality can be associated with this phenomenon in multistage stochastic optimization.

This paper addresses the general problem of measurability for discrete and continuous probability measures. The central result is a proof that there exists a measurable version of the intermediate value process. We present dynamic equations even for the general, non-Markovian setting. The general verification theorems presented are characterizations as martingales.

We develop the theory in full generality and elaborate on problem settings which are of particular importance in applications and increasingly popular in stochastic optimization. They include dynamic programming (the references Bertsekas [3] and Feinberg [7] include considerations on measurability as well), stochastic dual dynamic programming, the Bellman principle and reinforcement learning, which has grown to outstanding importance in machine learning and data science.
For a recent tutorial, including also computational aspects, we refer to Shapiro [33].

Investigations on foundations have been started in Pichler and Shapiro [23] with a focus on the distributionally robust aspect of multistage stochastic optimization. This paper enhances, complements and continues these investigations on foundations, but now addresses the genuine problem statement itself.

An important special case of multistage stochastic optimization, as it is presented in this paper, is dynamic optimization. Dedicated algorithms have been developed for this special case; papers such as Lan and Zhou [16] address convergence of dynamic stochastic approximation, and Carpentier et al. [4] collect recent theoretical results for the special case of dynamic optimization.

Applications of multistage stochastic optimization are widespread over many economic and managerial disciplines. We pick Löhndorf et al. [17] to represent and demonstrate the importance of multistage stochastic optimization in energy, for example, Shapiro et al. [34] to exemplify computational limitations, and Ruszczyński [29] to point to extensions involving risk.

Outline.
We address the general multistage problem formulation in Section 3, after introducing the informal description and the mathematical setting. An essential component to manage the evolution of the underlying stochastic process and the decisions is the value process, introduced in Section 4. Particular situations such as dynamic problems, additive cost functions, Markovian processes and SDDP (stochastic dual dynamic programming) appear frequently in applications. Important simplifications, dedicated complexity and convergence issues are essential to solve these problems. We address these particular problem formulations in Section 5.
Stochastic optimization builds on random variables, while multistage stochastic optimization builds on stochastic processes on adequate probability spaces. In what follows we address the informal, pathwise setting and then prepare the mathematical stage to discuss the optimization problem with mathematical rigor.
The multistage optimization problem, stated informally as a work instruction, is

$$
\inf_{u_0}\; \mathbb{E}_{X_1} \cdots\; \mathbb{E}_{X_t}\,
\underbrace{\inf_{u_t}\;
\underbrace{\mathbb{E}_{X_{t+1}}\; \inf_{u_{t+1}} \cdots\; \mathbb{E}_{X_T}\; \inf_{u_T}\;
v(X_1,\dots,X_T,\, u_0,\dots,u_T)}_{v_t(x_{0:t},\,u_{0:t})}}_{V_t(x_{0:t},\,u_{0:t-1})}.
\tag{2.1}
$$

Here, v is the random objective of the optimization problem, (X_1, ..., X_T) are the consecutive random observations, and u_0, ..., u_T the decisions made after each partial realization X_t at each stage t. The functions v_t and V_t are the intermediate value functions, which are given intuitively in (2.1) in an ω-by-ω, or pathwise, context.

The problem statement (2.1) exhibits the following difficulties:

(i) The expectation at stage t is a conditional expectation, conditional on the preceding observations X_1, ..., X_t. This trajectory has probability 0, and the conditional expectation must not be considered in a pathwise specification as (2.1) does.

(ii) The infimum with respect to u_t at stage t depends on preceding observations. As above, this is a conditional infimum and not measurable in general.

(iii) The intermediate value functions v_t and V_t aggregate the entire future. As functions defined on observed partial realizations, they are not necessarily measurable.

Nonetheless, the work instruction (2.1) provides a straightforward illustration of the optimization problem, indicating the progression of successive optimization and random realizations. While (i) and (ii) are fixed with standard means, interchanging the infimum with expectations requires clarification. The issue (iii) emerges specifically in multistage optimization.
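On a finite probability space, by contrast, the work instruction (2.1) is unambiguous: every conditional expectation is a finite weighted sum and every infimum runs over finitely many candidates, so the nested expression can be executed literally, ω-by-ω. A minimal sketch (the binary outcomes, the decision grid and the quadratic objective are our own toy choices, not taken from the paper):

```python
# Toy data: T = 2 stages, binary outcomes X_1, X_2, decisions u_0, u_1, u_2.
OUTCOMES = [(-1.0, 0.5), (1.0, 0.5)]   # (value, probability) of each X_t
DECISIONS = [0.0, 1.0]                 # finitely many candidate decisions
T = 2

def v(xs, us):
    """Illustrative objective v(x_1,...,x_T, u_0,...,u_T): a tracking cost."""
    return us[0] ** 2 + sum((u - x) ** 2 for u, x in zip(us[1:], xs))

def nested(xs, us):
    """Evaluate E_{X_{t+1}} inf_{u_{t+1}} ... E_{X_T} inf_{u_T} v recursively."""
    t = len(us) - 1                    # decisions u_0, ..., u_t already fixed
    if t == T:                         # all decisions taken: evaluate v
        return v(xs, us)
    return sum(p * min(nested(xs + [x], us + [u]) for u in DECISIONS)
               for x, p in OUTCOMES)

# The outermost infimum over the here-and-now decision u_0 in (2.1):
best = min(nested([], [u0]) for u0 in DECISIONS)
print(best)   # → 1.0
```

On trees this is exactly the ω-by-ω evaluation which the difficulties (i)–(iii) rule out for continuous distributions.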
We resolve this problem with the help of Kolmogorov's continuity theorem.

In what follows we provide a rigorous mathematical problem statement of (2.1) first and then discuss derived variants. Let (Ω, ℱ, P) be a probability space. We may refer to Kallenberg [13, Lemma 1.13] or Shiryaev [36, Theorem II.4.3] for the following Doob–Dynkin lemma.

Lemma 2.1 (Doob–Dynkin). Suppose the random variable U with values in ℝ^t is measurable with respect to the σ-algebra σ(X) generated by the random variable X with values in ℝ^d. Then there is a (Borel-)measurable function φ: ℝ^d → ℝ^t so that U = φ ∘ X.
The essential infimum of a set of random variables is defined in Dunford and Schwartz [6]. We want to highlight Föllmer and Schied [10, Appendix A.5] for the most compelling proof regarding existence.
Definition 2.2 (Essential infimum). Let 𝒰 be a family of ℝ-valued random variables. The random variable Y is the essential infimum of 𝒰 if

(i) Y ≤ U a.e. for all U ∈ 𝒰, and

(ii) Z ≤ Y a.e., whenever Z ≤ U a.e. for all U ∈ 𝒰.

We shall write ess inf_{U ∈ 𝒰} U := Y for the essential infimum of 𝒰.

Remark 2.3. The essential infimum exists and is unique, cf. Föllmer and Schied [10, Appendix A.5] or Karatzas and Shreve [14, Appendix A]. If 𝒰 is closed under pairwise minimization, i.e., min(U, V) ∈ 𝒰 for U, V ∈ 𝒰, then there is a nonincreasing sequence U_n ∈ 𝒰 such that U_n → ess inf_{U ∈ 𝒰} U a.s., as n → ∞. A set 𝒰 closed under pairwise minimization is said to be directed downwards in Föllmer and Schied [10].

For X measurable, the random variable ess inf_{U ∈ 𝒰}(U | σ(X)) is measurable with respect to σ(X), the sigma algebra generated by X. By the Doob–Dynkin lemma there is a measurable φ(·) so that ess inf_{U ∈ 𝒰}(U | σ(X)) = φ(X). We shall denote this function by ess inf_{U ∈ 𝒰}(U | X) := φ.

Remark 2.4. We shall also address the conditional essential infimum for a singleton 𝒰 = {U}. In this case, the random variable ess inf_{U ∈ 𝒰}(U | X) is the σ(X)-measurable envelope of U, for which we shall write ess inf(U | X).

Remark 2.5. The term essential infimum is occasionally also used for the largest number c ∈ ℝ smaller than the random variable X, c ≤ X a.s. This is ess inf_{U ∈ 𝒰}(U | {∅, Ω}) in the notation introduced, where {∅, Ω} is the trivial sigma algebra.

2.3 Functional optimization

The prevailing perspective in the practice of multistage stochastic optimization is not a measure theoretic perspective but rather a functional view: we shall develop and address this perspective as the informal, ω-by-ω or pathwise description. Throughout, we will give the stochastic process perspective first and then complement the informal perspective as well.
While the former provides expressions with mathematical rigor, the latter, intuitive problem statement is perhaps easier to understand, well-established and more practical for concrete numerical implementations. This is essential for both the governing stochastic process and the decision process.

Definition 2.6 (Decomposable functions). Let σ(𝒰) := σ(u: u ∈ 𝒰) be the sigma algebra generated by the functions u: ℝ^t → ℝ^d contained in 𝒰. We shall say that the class of functions 𝒰 is decomposable if u_A ∈ 𝒰, where

$$u_A(x) := \begin{cases} u_1(x) & \text{if } x \in A, \\ u_2(x) & \text{else,} \end{cases}$$

whenever A ∈ σ(𝒰) and u_1, u_2 ∈ 𝒰.

Traditional formulations of the interchangeability principle require that the infimum is measurable (cf. Shapiro [32], the normal integrands in Rockafellar and Wets [27, Theorem 14.60], or Rockafellar and Wets [26]). By involving the essential infimum, the following proposition establishes the interchangeability principle without requesting measurability explicitly.

Proposition 2.7 (Interchangeability principle). Let 𝒰 be a class of measurable functions, let v: ℝ^t × ℝ^d → ℝ be a (measurable) function bounded from below and X: Ω → ℝ^t a random variable. It holds that

$$\mathbb{E}\, \operatorname*{ess\,inf}_{u \in \mathcal{U}} v\bigl(X, u(X)\bigr) \;\le\; \inf_{u \in \mathcal{U}} \mathbb{E}\, v\bigl(X, u(X)\bigr). \tag{2.2}$$

Equality holds in (2.2) if 𝒰 is decomposable and X is measurable with respect to σ(𝒰).

Proof. For every x and every u_0 ∈ 𝒰 we have that inf_{u ∈ 𝒰} v(x, u(x)) ≤ v(x, u_0(x)) and thus ess inf_{u ∈ 𝒰} v(X, u(X)) ≤ v(X, u_0(X)) a.e. Taking expectations first and then the infimum over u_0 reveals (2.2).

For the remaining assertion recall from Remark 2.3 (or Karatzas and Shreve [14, Appendix A]) that there is a sequence u_j so that min_{j=1,...,n} v(X, u_j(X)) → ess inf_{u ∈ 𝒰} v(X, u(X)) almost surely, as n → ∞.
Define

$$A_i := \Bigl\{ v\bigl(X, u_i(X)\bigr) = \min_{j=1,\dots,n} v\bigl(X, u_j(X)\bigr) \Bigr\}, \qquad \tilde A_i := A_i \setminus \bigcup_{j<i} A_j,$$

and set $\tilde u_n := \sum_{i=1}^{n} u_i \cdot \mathbb{1}_{\tilde A_i}$. As 𝒰 is decomposable we have that ũ_n ∈ 𝒰 and v(x, ũ_n(x)) = min_{i=1,...,n} v(x, u_i(x)). Employing Beppo Levi's monotone convergence theorem we conclude that E v(X, ũ_n(X)) → E ess inf_{u ∈ 𝒰} v(X, u(X)) as n → ∞, and hence the assertion. □

Proposition 2.8.
Suppose that u ↦ v(x, u) is monotone for every x, i.e., v(x, u_1) ≤ v(x, u_2) whenever u_1 ≤ u_2 in every component, and that min(u_1, u_2) ∈ 𝒰 for u_1, u_2 ∈ 𝒰. Then interchangeability (2.2) holds with equality.

Proof. By monotonicity of v we have with u := min_{i=1,...,n} u_i ∈ 𝒰 that

$$\min_{j=1,\dots,n} v\bigl(X, u_j(X)\bigr) = v\Bigl(X, \min_{j=1,\dots,n} u_j(X)\Bigr) = v\bigl(X, u(X)\bigr).$$

The assertion follows along the proof of Proposition 2.7. □
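On a discrete probability space the interchangeability principle (2.2) can be verified by enumeration. The following sketch (a toy model of our own; the decision grid and all names are illustrative) contrasts the class of constant policies, for which σ(𝒰) is trivial so that X is not σ(𝒰)-measurable and (2.2) can be strict, with the class of all functions of X, for which the equality conditions of Proposition 2.7 hold:

```python
import itertools

# Toy model: X uniform on {0, 1}; objective v(x, u) = (u - x)^2.
xs = [0.0, 1.0]
p = 0.5
grid = [i / 10 for i in range(11)]      # candidate decision values in [0, 1]

def expected(policy):
    """E v(X, u(X)) for a policy given as a function x -> u(x)."""
    return sum(p * (policy(x) - x) ** 2 for x in xs)

# Constant policies u(x) = c: sigma(U) is trivial, X is not sigma(U)-measurable,
# so the inequality (2.2) may be strict.
inf_E_const = min(expected(lambda x, c=c: c) for c in grid)

# All functions on {0, 1}, i.e. all pairs (u(0), u(1)): equality holds here.
inf_E_full = min(expected(lambda x, a=a, b=b: a if x == 0.0 else b)
                 for a, b in itertools.product(grid, repeat=2))

# The pathwise (essential) infimum, computed x by x, then the expectation:
E_essinf = sum(p * min((u - x) ** 2 for u in grid) for x in xs)

print(E_essinf, inf_E_full, inf_E_const)   # → 0.0 0.0 0.25
```

The strict gap 0 < 0.25 for the rigid class is exactly the phenomenon the proposition quantifies: the pathwise infimum ignores that a constant decision cannot react to X.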
The general multistage optimization problem involves a stochastic process instead of a single random variable. Let X = (X_1, ..., X_T) be a stochastic process with stages t = 1, ..., T and, without loss of generality, with marginals X_t ∈ ℝ. For convenience, the stochastic process X is occasionally also augmented with a deterministic starting value X_0 = x_0 a.s., so that X = (X_0, X_1, ..., X_T).

Definition 3.1 (Nonanticipativity). The stochastic process U = (U_0, ..., U_T) is adapted to X if U_t is measurable with respect to σ(X_0, ..., X_t) for every t = 0, ..., T. We shall write U ◁ X if U is adapted to X.

In stochastic optimization, the synonymous term nonanticipative is more common than adapted.

Definition 3.2 (The natural filtration). The stochastic process X = (X_0, ..., X_T) is adapted to the natural filtration if X_t(ω) = X_t(ω_0, ..., ω_t) (that is, X_t(ω) = X̃_t(ω_0, ..., ω_t) for some random variable X̃_t, which we identify with X_t).

Multistage stochastic optimization considers classes 𝒰 of stochastic control processes. To not run into difficulties regarding a governing measure we assume that there is a control Ū so that U ◁ Ū (nonanticipative) for all U ∈ 𝒰. A particular situation arises for the class 𝒰 of stochastic processes adapted to X, 𝒰 ⊂ {U: U ◁ X}. In this case one may choose Ū = X as governing process.

We consider the following, general multistage stochastic optimization problem.

Definition 3.3 (Multistage optimization problem). Let

$$v: \mathbb{R}^{T+1} \times \mathbb{R}^{T+1} \to \mathbb{R}, \qquad (x, u) \mapsto v(x, u), \tag{3.1}$$

be a measurable function. For a class 𝒰 of feasible controls, the general multistage stochastic optimization problem is

$$\inf_{U \in \mathcal{U},\; U \,\triangleleft\, X} \mathbb{E}\, v(X, U), \tag{3.2}$$

where the infimum is among all feasible control policies U ∈ 𝒰 adapted to X. The function v is the (stochastic) objective function, and the set 𝒰 is the set of admissible controls, decisions or policies.
Note that the decision space is ℝ^{T+1} in (3.1); that is, at each stage t ∈ {0, ..., T} a decision in ℝ is made. This setting is chosen for convenience of presentation.

In what follows we shall assume that the infimum in (3.2) is finite. A somewhat stronger assumption, although not necessary, is that v is uniformly bounded from below (i.e., v ≥ C > −∞), so that the expectation in (3.2) is well-defined for every U ∈ 𝒰.

For u_t(x_1, ..., x_t) measurable it is evident that u_t(X_1, ..., X_t) is measurable with respect to σ(X_1, ..., X_t). For this reason,

$$U := u(X_1, \dots, X_T) \tag{3.3}$$

is a nonanticipative process with respect to X, provided that

$$u(x_1, \dots, x_T) = \begin{pmatrix} u_0 \\ u_1(x_1) \\ \vdots \\ u_T(x_1, \dots, x_T) \end{pmatrix}. \tag{3.4}$$

The Doob–Dynkin lemma (Lemma 2.1) ensures that every process U ∈ 𝒰 adapted to X has the particular form (3.3) with (3.4).

Lemma 3.4 (Doob–Dynkin lemma, extended). Let X = (X_0, ..., X_T) be a stochastic process in discrete time with marginal states X_t ∈ ℝ^d and U ◁ X. There are measurable functions u_t so that U_t = u_t(X_0, ..., X_t) for t = 0, ..., T a.s., and U = φ_U ∘ X, where φ_U = u is given by (3.4).

Functional optimization perspective.
The optimization problem (3.2) employs a fixed stochastic process X. In view of the Doob–Dynkin lemma, the problem (3.2) thus can be stated as an optimization problem among stochastic processes, or equivalently also as an optimization problem among functions, each of the specific form (3.4). The multistage stochastic optimization problem thus can be classified as a functional optimization problem, because solving it means finding unknown functions as in (3.4). The equivalence between measurable functions and processes is given by U ↦ φ_U, where φ_U is the function from the extended Doob–Dynkin lemma (Lemma 3.4), while the inverse is the map u ↦ U = u(X) given in (3.3).

Further, this equivalence allows extending the notion of decomposable to stochastic processes.

Definition 3.5 (Decomposable processes). The class 𝒰 of stochastic processes is decomposable if each function in {φ_U: U ∈ 𝒰} is decomposable in the sense of Definition 2.6.

3.2 Special cases of the general problem setting

The conventional stochastic optimization problem and the stochastic optimization problem with recourse are special cases of the multistage stochastic optimization problem.
Example 3.6 (T = 0). Consider the set of policies with 𝒰 ⊂ {U: U_t ◁ X_0 for all t ≥ 0} (or T = 0), so that every decision u_t is deterministic, i.e., nonrandom. The corresponding optimization problem

$$\inf_{u \in \mathbb{R}^{T+1}} \mathbb{E}\, v(X, u) \tag{3.5}$$

is a conventional stochastic optimization problem, as it is sufficient to treat X as a random vector in (3.5). Here it is not essential that X is a stochastic process; the time component is missing.

Example 3.7 (T = 1). Consider the feasible policies
𝒰 ⊂ {u: u_0 ◁ X_0 and u_t ◁ X_1 for all t ≥ 1} (or T = 1), together with the corresponding optimization problem

$$\inf_{(u_0,\, u_1(X_1)) \in \mathcal{U}} \mathbb{E}\, v\bigl(X_1, u_0, u_1(X_1)\bigr).$$

Here, the decision u_0 is deterministic, i.e., does not depend on the random components of X; u_1(·) is called the (random) recourse decision in the literature (cf. Shapiro et al. [35]).

It is an important conceptual element in stochastic optimization to consider the problem sequentially in time, so that any new observation X_t triggers a subsequent new decision u_t(X_0, ..., X_t), which itself is based on the past. Shapiro [31] depicts the consecutive transitions via the chain in Figure 1. The transitions in Figure 1 can be started with X_0 equally well.

u_0 ⇝ X_1 ⇝ u_1 ⇝ ··· ⇝ X_t ⇝ u_t ⇝ X_{t+1} ⇝ ··· ⇝ X_T ⇝ u_T

Figure 1: The progression of random observations and decisions

In what follows we develop a similar decomposition of the optimization problem (3.2) and present our main result in Theorem 4.2 below. For notational convenience we introduce the abbreviation x_{t:t′} := (x_t, x_{t+1}, ..., x_{t′}) (0 ≤ t, t′ ≤ T) for subvectors. We also write X_{0:t} := (X_0, ..., X_t) for the initial and U_{t:T} := (U_t, ..., U_T) for the final (trailing) substrings. Recall that U is a nonanticipative process if there is a control u so that U = u(X_0, ..., X_T), as well as functions u_{t:T} with U_{t:T} = u_{t:T}(X). By 𝒰_{t:T} = {u_{t:T}: U ∈ 𝒰} we denote the set of functions comprising the final decisions of all control processes.

4.1 Existence of the intermediate value functions

A common way to solve the initial problem (3.2) is to decompose it into a sequence of subproblems. We specify these subproblems by introducing the value process in the following considerations. Let u_{0:t} ∈ ℝ^{t+1} and a function ũ_{t+1:T} ∈ 𝒰_{t+1:T} be given.
As a consequence of the Doob–Dynkin lemma (Lemma 2.1) there is a measurable mapping $v^{\tilde u_{t+1:T}}_{t,\,u_{0:t}}: \mathbb{R}^{t+1} \to \mathbb{R}$ such that

$$v^{\tilde u_{t+1:T}}_{t,\,u_{0:t}}(X_{0:t}) = \mathbb{E}\bigl( v(X,\, u_{0:t},\, \tilde u_{t+1:T}(X)) \,\big|\, X_{0:t} \bigr). \tag{4.1}$$

These conditional expectations constitute the building block for the intermediate value functions.

Definition 4.1.
The (intermediate) value functions are

$$v_t(x_{0:t},\, u_{0:t}) := \operatorname*{ess\,inf}_{\tilde u_{t+1:T} \in \mathcal{U}_{t+1:T}} v^{\tilde u_{t+1:T}}_{t,\,u_{0:t}}(x_{0:t}) \quad\text{and} \tag{4.2}$$

$$V_t(x_{0:t},\, u_{0:t-1}) := \operatorname*{ess\,inf}_{\tilde u_t \in \mathcal{U}_t} v_t\bigl(x_{0:t},\, u_{0:t-1},\, \tilde u_t(x_{0:t})\bigr), \tag{4.3}$$

where t = 0, ..., T.

These value functions are functions on ℝ^{t+1} × ℝ^{t+1} (ℝ^{t+1} × ℝ^t, resp.) and the essential infima are with respect to these spaces. The functions are generally not unique, as there are multiple functions satisfying the Doob–Dynkin lemma. The value functions V_t and v_t are defined pointwise (and well-defined at each point), but they are not necessarily measurable. Hence, additional conditions on v need to be imposed to ensure measurability.

The following statement is the main result. It establishes existence of a measurable version of the intermediate value functions. The proof builds on Kolmogorov's continuity theorem, also known as the Kolmogorov–Chentsov theorem.
Theorem 4.2 (Existence of a measurable version of the value function). Assume that v is locally Hölder continuous with exponent α > 0 in u, i.e.,

$$|v(x, u_1) - v(x, u_2)| \le C\, \|u_1 - u_2\|^{\alpha} \quad\text{for } x \in \mathbb{R}^{T+1} \text{ and } \|u_1 - u_2\| \le \delta, \tag{4.4}$$

where δ > 0 is sufficiently small. Then there exists a version of the intermediate value function v_t which is measurable with respect to ℬ(ℝ^{t+1}) ⊗ ℬ(ℝ^{t+1}) and locally Hölder continuous with exponent α̃ ∈ (0, α/(t+2)).

To prove the main theorem we recall the following condition on joint measurability from Gowrisankaran [12, Theorem 2]; we state the result in full mathematical beauty, although we do not need this most general variant.
Theorem 4.3.
Let (X, τ) be a measurable space and Y a Suslin space. Let ℬ be the σ-algebra of all measurable subsets for a locally finite measure λ on the Borel σ-algebra of Y. Then a function f: X × Y → A with values in a separable metrizable space A such that

(i) x ↦ f(x, y) is τ-measurable for every y ∈ Y, and

(ii) y ↦ f(x, y) is continuous on Y for each x ∈ X,

is τ ⊗ ℬ-measurable on X × Y.

Remark 4.4. Functions satisfying the conditions (i) and (ii) of Theorem 4.3 are also known as
Carathéodory functions.

Proof of Theorem 4.2.
We shall employ Theorem 4.3. Consider the function

$$v^{\tilde u_{t+1:T}}_t(x_{0:t},\, u_{0:t}) := v^{\tilde u_{t+1:T}}_{t,\,u_{0:t}}(x_{0:t})$$

(cf. (4.1)), where ũ_{t+1:T} ∈ 𝒰_{t+1:T} is fixed. Measurability in x_{0:t} follows from the definition of the function v_t in (4.2) and general measurability of the essential infimum, and thus the condition (i) of Theorem 4.3.

It remains to verify continuity, i.e., (ii). In order to employ Theorem 4.3 we need to show continuity of u_{0:t} ↦ v^{ũ_{t+1:T}}_t(x_{0:t}, u_{0:t}). To this end consider the stochastic process (Z_u)_{u ∈ ℝ^{t+1}}, indexed by u ∈ ℝ^{t+1} and defined by

$$Z_u := \mathbb{E}\bigl( v(X,\, u,\, \tilde u_{t+1:T}(X)) \,\big|\, X_{0:t} \bigr).$$

Further, let u′ ∈ U_δ(u) for δ > 0 sufficiently small, and set q := (t+2)/α and β := 1. Then

$$\mathbb{E}\, \bigl| Z_u - Z_{u'} \bigr|^{q} = \mathbb{E}\, \Bigl| \mathbb{E}\bigl( v(X, u, \tilde u_{t+1:T}(X)) - v(X, u', \tilde u_{t+1:T}(X)) \,\big|\, X_{0:t} \bigr) \Bigr|^{q} \le \mathbb{E}\bigl( C \|u - u'\|^{\alpha} \bigr)^{q} \le \tilde C\, \|u - u'\|^{t+2} = \tilde C\, \|u - u'\|^{(t+1)+\beta}$$

for some C̃ < ∞. Hence, by the Kolmogorov continuity theorem (cf. Klenke [15, p. 453]), there is a process Z̃ such that Z̃_u = Z̃(·, u) = E(v(X, u, ũ_{t+1:T}(X)) | X_{0:t}) and Z̃(ω, ·) is Hölder continuous with exponent β/q = α/(t+2) for almost every ω ∈ Ω. It follows that the corresponding functions v^{ũ_{t+1:T}}_t are continuous with respect to u. This proves (ii) and hence the assertion of the theorem. □

Remark 4.5. It is evident that measurable versions of (4.2) and (4.3) exist for uniformly Lipschitz continuous objective functions v.

In what follows, we define the value processes by substituting x_{0:t} and u_{0:t} with their stochastic counterparts X_{0:t} and U_{0:t}.

Definition 4.6 (Value process). Assume v satisfies the Hölder condition imposed in Theorem 4.2 and U ∈ 𝒰 is a nonanticipative stochastic process (U ◁ X).
The general value processes are

$$\boldsymbol{v}^U_t := v_t(X_{0:t},\, U_{0:t}) \quad\text{and} \tag{4.5}$$

$$\boldsymbol{V}^U_t := V_t(X_{0:t},\, U_{0:t-1}), \tag{4.6}$$

where v_t and V_t are the intermediate value functions, cf. Definition 4.1.

Figure 2: Domain and range of the objective and the general value process
Remark 4.7. The functions v_t and V_t (cf. (4.2) and (4.3)) are defined on V_t: ℝ^{t+1} × ℝ^t → ℝ and v_t: ℝ^{t+1} × ℝ^{t+1} → ℝ. We employ bold letters to indicate random variables, i.e., functions on Ω, given by

$$\boldsymbol{v}^U_t(\omega) = v_t\bigl(X_{0:t}(\omega),\, U_{0:t}(\omega)\bigr) \quad\text{and}\quad \boldsymbol{V}^U_t(\omega) = V_t\bigl(X_{0:t}(\omega),\, U_{0:t-1}(\omega)\bigr).$$

Figure 2 depicts the domain and the range of these functions and random variables.
Remark 4.8 (ω-by-ω description). The functions V_t and v_t describing the value processes (4.5) and (4.6) can be given explicitly and directly, albeit only intuitively, as

$$V_t(x_{0:t},\, u_{0:t-1}) = \inf_{u_{t:T}(\cdot)} \mathbb{E}\bigl( v(X_{0:T},\, u_{0:t-1},\, u_{t:T}(X_{0:T})) \,\big|\, X_{0:t} = x_{0:t},\, U_{0:t-1} = u_{0:t-1} \bigr) \tag{4.7}$$

and

$$v_t(x_{0:t},\, u_{0:t}) = \inf_{u_{t+1:T}(\cdot)} \mathbb{E}\bigl( v(X_{0:T},\, u_{0:t},\, u_{t+1:T}(X_{0:T})) \,\big|\, X_{0:t} = x_{0:t},\, U_{0:t} = u_{0:t} \bigr), \tag{4.8}$$

where the infima are among functions

$$u_{t:T}(x_0, \dots, x_T) = \begin{pmatrix} u_t(x_0, \dots, x_t) \\ \vdots \\ u_T(x_0, \dots, x_T) \end{pmatrix}$$

with u(X) ∈ 𝒰.

Note, however, that the expressions (4.7) and (4.8) are not necessarily well defined, as they may depend explicitly on the choice of the control process U_{0:t}. They further face a delicate measurability problem, as the pointwise infimum is not measurable, in general. Hence (4.7) and (4.8) cannot be used as definitions. Our definitions (4.2) and (4.3), together with (4.5) and (4.6), resolve this problem by addressing u_{0:t} as a parameter and passing over to the essential infimum, which has a measurable version by the main theorem, Theorem 4.2.

4.3 Relation to the multistage problem

In what follows we derive the equations interconnecting the value functions introduced in the preceding section. To this end observe first that

$$V_0 = \inf_{U \in \mathcal{U}} \mathbb{E}\, v(X, U) \tag{4.9}$$

by definition (4.3), so that V_0 is the optimal value of the initial problem. Further, we have with (4.2) that

$$v_T = v, \tag{4.10}$$

which is the starting point of the optimization problem at the final stage.

The following statements interconnect the value functions at intermediate stages.

Theorem 4.9.
Let U ∈ 𝒰 be a feasible policy. It holds that

$$V_t(X_{0:t},\, U_{0:t-1}) = \operatorname*{ess\,inf}_{\tilde u_t \in \mathcal{U}_t} v_t\bigl(X_{0:t},\, U_{0:t-1},\, \tilde u_t(X_{0:t})\bigr) \quad\text{and}$$

$$v_t(X_{0:t},\, U_{0:t}) \ge \mathbb{E}\bigl( V_{t+1}(X_{0:t+1},\, U_{0:t}) \,\big|\, X_{0:t} \bigr). \tag{4.11}$$

Equality holds in (4.11) if 𝒰 is decomposable.

Proof. The first equation follows directly from the definitions of V_t and v_t. The second follows from

$$\mathbb{E}\bigl( V_{t+1}(X_{0:t+1},\, U_{0:t}) \,\big|\, X_{0:t} \bigr) = \mathbb{E}\Bigl( \operatorname*{ess\,inf}_{\tilde u_{t+1:T} \in \mathcal{U}_{t+1:T}} \mathbb{E}\bigl( v(X,\, U_{0:t},\, \tilde u_{t+1:T}(X)) \,\big|\, X_{0:t+1} \bigr) \Bigm|\, X_{0:t} \Bigr)$$
$$\le \operatorname*{ess\,inf}_{\tilde u_{t+1:T} \in \mathcal{U}_{t+1:T}} \mathbb{E}\Bigl( \mathbb{E}\bigl( v(X,\, U_{0:t},\, \tilde u_{t+1:T}(X)) \,\big|\, X_{0:t+1} \bigr) \Bigm|\, X_{0:t} \Bigr) = \operatorname*{ess\,inf}_{\tilde u_{t+1:T} \in \mathcal{U}_{t+1:T}} \mathbb{E}\bigl( v(X,\, U_{0:t},\, \tilde u_{t+1:T}(X)) \,\big|\, X_{0:t} \bigr)$$

by Proposition 2.7 and the tower property of the conditional expectation. Equality holds, by Proposition 2.7 again, for decomposable controls, and hence the assertion. □

Remark 4.10 (ω-by-ω description). As above, and employing the functions (4.7) and (4.8), the equations can be stated directly and explicitly by

$$V_t(x_{0:t},\, u_{0:t-1}) = \inf_{u_t} v_t(x_{0:t},\, u_{0:t-1},\, u_t) \quad\text{and}$$

$$v_t(x_{0:t},\, u_{0:t}) \ge \mathbb{E}_{X_{t+1}}\bigl( V_{t+1}(X_{0:t+1},\, u_{0:t}) \,\big|\, X_{0:t} = x_{0:t} \bigr) = \mathbb{E}_{X_{t+1}}\bigl( V_{t+1}(x_{0:t}, X_{t+1},\, u_{0:t}) \,\big|\, X_{0:t} = x_{0:t} \bigr).$$

Equality holds if 𝒰 is decomposable.

These equations get to the point directly and explain the computational task at each stage (t) and at each node (x_{0:t}, u_{0:t-1}). Note again that stating the equations this way is not justified from a mathematical perspective; the equations suffer from measurability issues, in general. They are justified in the finite dimensional case if P(X_{0:t} = x_{0:t} and U_{0:t-1} = u_{0:t-1}) > 0.

Corollary 4.11 (Dynamic relations). Let U ∈ 𝒰 be a feasible control process.
It holds that

$$\boldsymbol{V}^U_t \ge \operatorname*{ess\,inf}_{U'_{0:t} \in\, \mathcal{U}_{0:t},\; U'_{0:t-1} = U_{0:t-1}} \mathbb{E}\bigl( \boldsymbol{V}^{U'}_{t+1} \,\big|\, X_{0:t} \bigr) \quad\text{and}\quad \boldsymbol{v}^U_t \ge \mathbb{E}\Bigl( \operatorname*{ess\,inf}_{U'_{0:t+1} \in\, \mathcal{U}_{0:t+1},\; U'_{0:t} = U_{0:t}} \boldsymbol{v}^{U'}_{t+1} \Bigm|\, X_{0:t} \Bigr).$$

Equality holds if 𝒰 is decomposable.

Proof. The assertion is immediate by combining the defining equations (4.5) and (4.6) with the assertions of Theorem 4.9. □
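On a finite tree the dynamic relations turn into a backward recursion: start from v_T = v (cf. (4.10)) and alternate a conditional expectation, which yields v_t, with a stage-wise infimum, which yields V_t; the root value V_0 is then the optimal value by (4.9). A minimal sketch (the outcomes, decision grid and objective are our own toy illustration, not from the paper):

```python
OUTCOMES = ((-1.0, 0.5), (1.0, 0.5))   # (value, probability) of each X_t
DECISIONS = (0.0, 0.5, 1.0)            # feasible stage-wise decisions
T = 2                                  # stages t = 0, 1, 2

def v(xs, us):
    """Terminal objective v = v_T; a toy tracking cost."""
    return us[0] ** 2 + sum((u - x) ** 2 for u, x in zip(us[1:], xs))

def v_t(xs, us):
    """v_t(x_{0:t}, u_{0:t}): conditional expectation of V_{t+1}; v_T = v."""
    t = len(us) - 1                    # decisions u_0, ..., u_t fixed so far
    if t == T:
        return v(xs, us)
    return sum(p * V_t(xs + (x,), us) for x, p in OUTCOMES)

def V_t(xs, us_prev):
    """V_t(x_{0:t}, u_{0:t-1}): infimum over the stage-t decision of v_t."""
    return min(v_t(xs, us_prev + (u,)) for u in DECISIONS)

print(V_t((), ()))   # → 1.0, the optimal value V_0 of the toy problem
```

The recursion visits each node of the tree once, which is exactly why the branching structure, and not the formulas themselves, drives the curse of dimensionality mentioned in the introduction.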
Remark 4.12. It holds that
$$V_t(x_{:t}, u_{:t-1}) \ge \inf_{u_t} \mathbb{E}_{X_{t+1}}\big(V_{t+1}(x_{:t}, X_{t+1}, u_{:t}) \,\big|\, X_{:t} = x_{:t},\, U_{:t-1} = u_{:t-1}\big)$$
and
$$v_t(x_{:t}, u_{:t}) \ge \mathbb{E}_{X_{t+1}}\Big(\inf_{u_{t+1}} v_{t+1}(x_{:t}, X_{t+1}, u_{:t+1}) \,\Big|\, X_{:t} = x_{:t},\, U_{:t} = u_{:t}\Big).$$
Equality holds if $\mathcal{U}$ is decomposable.

Verification theorems provide optimality conditions. Given these characterizations, the purpose of verification theorems is to allow verifying, or checking, whether a given policy is optimal. An interesting early reference is Rockafellar and Wets [25], who study martingales associated with optimality conditions. Fleming and Soner [8] give verification theorems for dynamic (in particular Markovian) problems in continuous time. We address this particular situation in more detail below.

The value process $\boldsymbol{v}^U_t$ is a stochastic process depending on an underlying policy $U$. A special situation occurs if the underlying policy $U$ is optimal, i.e., $U$ solves the initial problem (3.2). In what follows we examine this situation. We further provide a useful characterization of the optimizers of (3.2), relating the different concepts regarding optimization and probability theory.

Theorem 4.13 (Verification theorem). Let $u \in \mathcal{U}$ be any policy. Then the stochastic processes
$$\boldsymbol{v}^U_t = v_t\big(X_{:t}, u_{:t}(X_{:t})\big), \quad t = 0, \dots, T, \qquad \text{and} \qquad \boldsymbol{V}^U_t = V_t\big(X_{:t}, u_{:t-1}(X_{:t-1})\big), \quad t = 0, \dots, T,$$
are submartingales. They are martingales if $\mathcal{U}$ is decomposable and $u$ solves the initial problem (3.2). Conversely, if $\mathcal{U}$ is decomposable and $\boldsymbol{V}^U_t$, $\boldsymbol{v}^U_t$ are martingales, then $U$ is an optimizer of (3.2).

Proof. The first assertion is immediate from Corollary 4.11. For the second, assume that $\boldsymbol{V}^{U^*}_t$ and $\boldsymbol{v}^{U^*}_t$ are martingales for an underlying policy $U^*$.
By employing (4.9), (4.10) and Theorem 4.9 it follows that
$$\inf_{U \in \mathcal{U}} \mathbb{E}\, v(X, U) = \boldsymbol{V}_0 = \boldsymbol{V}^{U^*}_0 = \mathbb{E}\big(\boldsymbol{V}^{U^*}_1 \,\big|\, X_0\big) = \boldsymbol{v}^{U^*}_0 = \mathbb{E}\big(\boldsymbol{v}^{U^*}_T \,\big|\, X_0\big) = \mathbb{E}\big(v(X, U^*)\big),$$
and thus the assertion. □

Theorem 4.13 allows identifying a policy $U = u(X)$ as an optimal policy by checking whether the value processes constitute martingales. Note that the verification theorem does not give a hint on where and how to improve the policy. Instead, it can be used ex post to check an existing, given policy for optimality.

The verification theorem presented above notably works for every multistage stochastic optimization problem. We did not impose conditions on the function $v$ other than Hölder continuity, and we did not restrict the analysis to Markovian processes. From this mathematical perspective the statement is rather general.

Most common in optimal control, finance and reinforcement learning are value functions which accumulate costs occurring at consecutive stages. We derive their intermediate value functions explicitly by exploiting the specific structure of the objective function. To this end we first transform the equations for the general additive case and derive the equations for MDPs (Markov decision processes) subsequently. The Markovian property, from a probabilistic perspective, is essential for the MDP equations. As well, we derive the equations for stochastic dual dynamic programming (SDDP) from the general equations.

5.1 Lag $\ell$ stochastic processes and additive objective functions

The particular value function which we consider here,
$$v(x_{:T}, u_{:T}) \coloneqq \sum_{t=1}^{T} \gamma^{t-1}\, c_t\big(x_{t-\ell:t}, u_{t-\ell:t-1}\big), \tag{5.1}$$
adds consecutive costs at lag $\ell \ge 1$ (indices below the initial stage are truncated, $u_{-1} = u_0$, e.g., and the corresponding cost function is adjusted accordingly). The value function (5.1) is of fundamental importance in finance and in reinforcement learning, where $c_t$ is the cost associated with time $t$ and $\gamma$ is a discount factor.
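The additive structure can be illustrated by evaluating the objective along a single trajectory. The Python sketch below uses our own conventions (0-based stages, truncation of indices below the initial stage, discount $\gamma^t$ in 0-based counting), and a made-up toy cost; it is an illustration of the lag-$\ell$ windowing, not the paper's implementation.

```python
# Evaluate the discounted additive objective along one trajectory.
# Conventions here (0-based stages, truncation of indices below stage 0,
# discount gamma**t) are our own reading, chosen for the sketch.
def additive_objective(x, u, costs, gamma, lag):
    total = 0.0
    for t in range(len(x)):
        xs = tuple(x[max(0, t - lag): t + 1])   # observations x_{t-lag}, ..., x_t
        us = tuple(u[max(0, t - lag): t])       # decisions up to u_{t-1}; u_t is
        total += gamma**t * costs[t](xs, us)    # deliberately NOT an argument of c_t
    return total

# toy cost at lag 1: c_t = current observation plus the decisions it sees
costs = [lambda xs, us: xs[-1] + sum(us)] * 3
val = additive_objective([1.0, 2.0, 3.0], [0.5, 0.5, 0.5], costs, gamma=0.9, lag=1)
# val = 1.0 + 0.9 * 2.5 + 0.81 * 3.5
```

The slice `u[max(0, t - lag): t]` ends before $u_t$, reflecting that the cost component at stage $t$ sees the observation $x_t$ but not the subsequent decision.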
Note the very particular choice of arguments of the function $c_t$: the last input element is the observation $x_t$, but the subsequent decision $u_t$ is not taken into account. Figure 3 depicts the support of the cost component $c_t$ at stage $t$ (compare with Figure 1).

[Figure 3: Arguments of the cost component $c_t$ — the component $c_t$ covers the window $X_{t-\ell},\, u_{t-\ell},\, \dots,\, X_{t-1},\, u_{t-1},\, X_t$.]

The parameter $\gamma \in (-1, 1)$ in (5.1) is most typically interpreted as a discount factor. To derive the dynamic equations we assume that the functions $c_t$ are Hölder continuous, and we assume that the stochastic process associated with the value function (2.1) has lag $\ell$ as well; that is,
$$\sigma(X_0, \dots, X_t) = \sigma(X_{t-\ell}, \dots, X_t) \quad \text{for all } t = \ell, \dots, T.$$
Define the functions $\tilde V_t$ by
$$\tilde V_t(x_{:t}, u_{:t-1}) \cdot \gamma^t \coloneqq V_t(x_{:t}, u_{:t-1}) - \sum_{i=1}^{t} \gamma^{i-1}\, c_i\big(x_{i-\ell:i}, u_{i-\ell:i-1}\big),$$
so that $\tilde V_0 = V_0$.
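To see what the definition accomplishes, consider a single fixed trajectory with no remaining decisions: subtracting the accumulated costs and rescaling by $\gamma^t$ turns the total discounted cost into the discounted cost-to-go. A minimal sketch with made-up numbers (0-based stages, our own convention for the discount exponents):

```python
# Single fixed trajectory (no remaining decisions): here V_t is the total
# discounted cost along the path, and the definition above reduces the
# function V~_t to the discounted cost-to-go from stage t on.
gamma = 0.9
c = [2.0, 1.0, 3.0, 0.5]                # realized costs along the path (made up)
T = len(c)

total = sum(gamma**i * c[i] for i in range(T))                  # V_t on this path
accumulated = [sum(gamma**i * c[i] for i in range(t)) for t in range(T + 1)]

# gamma^t * V~_t := V_t - costs accumulated before stage t
V_tilde = [(total - accumulated[t]) / gamma**t for t in range(T + 1)]

assert V_tilde[0] == total              # V~_0 = V_0
for t in range(T + 1):
    cost_to_go = sum(gamma**(i - t) * c[i] for i in range(t, T))
    assert abs(V_tilde[t] - cost_to_go) < 1e-12                 # re-discounted tail
```

The rescaling by $\gamma^t$ is what makes the reduced value functions comparable across stages, which is what the stationary equations below exploit.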
For additive cost functions, the schematic decomposition (2.1) now is
$$\inf_{u_1} c_1 + \mathbb{E}_{X_2}\inf_{u_2} c_2 + \cdots + \mathbb{E}_{X_t} \underbrace{\inf_{u_t}\Big(c_t + \underbrace{\mathbb{E}_{X_{t+1}}\inf_{u_{t+1}}\big(c_{t+1} + \cdots + \mathbb{E}_{X_T}\inf_{u_T} c_T\big)}_{\tilde v_t(x_{:t},\, u_{:t})}\Big)}_{\tilde V_t(x_{:t},\, u_{:t-1})}.$$
From (4.7) we conclude that
$$\tilde V_t(x_{:t}, u_{:t-1}) = \inf_{u_{t:T}(\cdot)} \mathbb{E}\Bigg( \sum_{i=t+1}^{T} \gamma^{i-1-t}\, c_i\big(X_{i-\ell:i}, u_{i-\ell:i-1}\big) \,\Bigg|\, X_{:t} = x_{:t},\, U_{:t-1} = u_{:t-1} \Bigg), \tag{5.2}$$
where the decisions $u_j$ for $j \ge t$ inside the expectation are evaluated along the trajectory, $u_j = u_j(X_{:j})$.
The function inside the expectation is independent of $x_{:t-\ell}$, and the stochastic process $X$ has lag $\ell$. Further, the decision process $U$ is adapted to $X$ (cf. (3.2)) and thus has lag $\ell$ as well. With that it follows that (5.2) actually is
$$\tilde V_t(x_{t-\ell+1:t}, u_{t-\ell+1:t-1}) = \inf_{u_{t:T}(\cdot)} \mathbb{E}\Bigg( \sum_{i=t+1}^{T} \gamma^{i-1-t}\, c_i\big(X_{i-\ell:i}, u_{i-\ell:i-1}\big) \,\Bigg|\, X_{t-\ell+1:t} = x_{t-\ell+1:t},\, U_{t-\ell+1:t-1} = u_{t-\ell+1:t-1} \Bigg).$$
Employing Remark 4.12 we deduce the recursion
$$\tilde V_t(x_{t-\ell+1:t}, u_{t-\ell+1:t-1}) \ge \inf_{u_t} \mathbb{E}_{X_{t+1}}\Big( c_{t+1}(x_{t+1-\ell:t}, X_{t+1}, u_{t+1-\ell:t}) + \gamma\, \tilde V_{t+1}(x_{t-\ell+2:t}, X_{t+1}, u_{t-\ell+2:t}) \,\Big|\, X_{t-\ell+1:t} = x_{t-\ell+1:t},\, U_{t-\ell+1:t-1} = u_{t-\ell+1:t-1} \Big), \tag{5.3}$$
where equality indicates optimality. This backwards recursion leads to the following discussion on MDPs.

5.2 MDP

A Markov decision process (MDP) is a discrete-time stochastic control process. To this end we consider the cost functions (5.1) with lag $\ell =$
1, i.e.,
$$v(x_{:T}, u_{:T}) \coloneqq \sum_{t=1}^{T} \gamma^{t-1}\, c_t(x_{t-1}, x_t; u_{t-1}) \tag{5.4}$$
and a process $X$ with the same lag $\ell =$
1, i.e., a Markovian process. With that, the recursion (5.3) collapses further to
$$\tilde V_t(x_t) = \inf_{u_t} \mathbb{E}\big( c_{t+1}(x_t, X_{t+1}, u_t) + \gamma\, \tilde V_{t+1}(X_{t+1}) \,\big|\, X_t = x_t \big). \tag{5.5}$$
This recursion is well known for MDPs, and (5.5) is also known as backward induction involving the Bellman principle (cf. Bellman [1, 2]), which is of fundamental importance in dynamic programming.

Remark 5.1. The MDP literature rather considers trajectories which are themselves driven by the control $u$ (the control is called action in the MDP literature). To recognize this dependency in addition we can restate the recursion as
$$\tilde V_t(x_t) = \inf_{u_t} \mathbb{E}_{u_t}\big( c_{t+1}(x_t, X_{t+1}, u_t) + \gamma\, \tilde V_{t+1}(X_{t+1}) \,\big|\, X_t = x_t \big),$$
where $\mathbb{E}_u$ is the expectation with respect to the kernel $P_u(\cdot \mid x_t)$, which explicitly depends on the decision $u$.

The cost function (5.4) is also considered on an infinite horizon, i.e.,
$$v(x, u) \coloneqq \sum_{t=1}^{\infty} \gamma^{t-1}\, c_t(x_{t-1}, x_t; u_{t-1}); \tag{5.6}$$
problems in reinforcement learning are of this particular form (5.6). The value function (5.1) is bounded in the chosen setting if the cost functions are uniformly bounded, $|c_t| \le K < \infty$, and the learning rate satisfies $\gamma \in (-1, 1)$ (although most typical is $\gamma \in (0, 1)$).

A particularly interesting situation arises for cost functions which do not depend on the stage $t$, i.e., $c_t = c$, and transitions satisfying $(X_t, X_{t+1}) \sim (X, X')$. Then the value function $\tilde V_t$ does not depend on $t$ either, and the equation
$$\tilde V(x) = \inf_{u} \mathbb{E}\big( c(x, X', u) + \gamma\, \tilde V(X') \,\big|\, X = x \big) \tag{5.7}$$
holds. This is a fixed point equation, and Banach's fixed point theorem can be applied to prove existence and uniqueness of the value function $\tilde V$ in appropriate spaces. As well, equation (5.7) specifies an iterative scheme to improve the value function $\tilde V$ in consecutive steps. As an example we state the following, where we refer to Fleten et al.
[9] for a proof in a similar situation.

Theorem 5.2. Suppose that $c$ is continuous, $X \in K$ a.s. for some compact set $K \subset \mathbb{R}^n$, and $|\gamma| < 1$. Then the value function $\tilde V$ is continuous, $\tilde V \in C(K)$.

5.4 SDDP

The problem setting of stochastic dual dynamic programming (SDDP) considers a stagewise independent stochastic process $X_t$ (i.e., $X_t$ is independent of all preceding $X_{t'}$, $t' < t$), which is a further simplification of all situations described above. With $X_t \sim X$, the dynamic equation reduces further to
$$\tilde V_t(x_t) = \inf_{u_t} \mathbb{E}\big( c_{t+1}(x_t, X_{t+1}, u_t) + \gamma\, \tilde V_{t+1}(X_{t+1}) \big). \tag{5.8}$$
This is the simplest situation from a statistical perspective, and it is not surprising that large and extensive problem settings are accessible for numerical computations. The important SDDP algorithm for solving the problem (5.8) efficiently originated in Pereira and Pinto [19]. We refer to Shapiro [30] for an extended analysis of the algorithm, to Römisch and Guigues [28], and to Girardeau et al. [11] and Philpott and Guan [21] for convergence proofs of the algorithm.

Multistage stochastic optimization has many applications in varying areas, from finance to data science, to mention just two. The problems are popular and typically stated conditioned on partial realizations. This pathwise, or $\omega$-by-$\omega$, perspective lacks mathematical rigor. It is surprising that mathematical foundations regarding measurability are incomplete and still missing.

This paper clarifies that multistage optimization problems, even if given in an informal, pathwise or $\omega$-by-$\omega$ way, can be cast with mathematical rigor. We start by outlining the general problem and employ the Kolmogorov continuity theorem to verify that value functions are well defined, even if conditioned on sets of measure zero.

Verification theorems can be employed to confirm that candidate policies are optimal.
We further characterize optimal policies by involving martingales.

Markov decision processes, the Bellman principle for reinforcement learning and stochastic dual dynamic programming are probably the most well-known and common settings in the practice of dynamic programming. We derive these problem settings as special cases and, in this way, provide rigorous mathematical foundations.

References

[1] R. E. Bellman.
Dynamic Programming. Princeton University Press, Princeton, NJ, 1957.

[2] R. E. Bellman. Adaptive Control Processes. Princeton Legacy Library 2045. Princeton University Press, 1961. ISBN 978-1-4008-7466-8. doi:10.1002/nav.3800080314.

[3] D. Bertsekas. Dynamic Programming and Optimal Control. Athena Scientific. ISBN 1886529434.

[4] P. Carpentier, J.-P. Chancelier, G. Cohen, M. De Lara, and P. Girardeau. Dynamic consistency for stochastic optimal control problems. Annals of Operations Research, 200(1):247–263, 2012. doi:10.1007/s10479-011-1027-8.

[5] P. Carpentier, J.-P. Chancelier, G. Cohen, and M. De Lara. Stochastic Multi-Stage Optimization. Springer International Publishing, 2015. doi:10.1007/978-3-319-18138-7.

[6] N. Dunford and J. T. Schwartz. Linear Operators. Part I: General Theory. Wiley-Interscience, New York, 1957. URL http://books.google.com/books?id=DuJQAAAAMAAJ.

[7] E. A. Feinberg. On measurability and representation of strategic measures in Markov decision processes. In Institute of Mathematical Statistics Lecture Notes – Monograph Series, pages 29–43. Institute of Mathematical Statistics, 1996. doi:10.1214/lnms/1215453563.

[8] W. H. Fleming and H. M. Soner. Controlled Markov Processes and Viscosity Solutions. Springer, second edition, 2006. doi:10.1007/0-387-31071-1.

[9] S.-E. Fleten, E. Haugom, A. Pichler, and C. J. Ullrich. Structural estimation of switching costs for peaking power plants. European Journal of Operational Research, 285(1):23–33, 2020. doi:10.1016/j.ejor.2019.03.031.

[10] H. Föllmer and A. Schied. Stochastic Finance: An Introduction in Discrete Time. de Gruyter Studies in Mathematics 27. De Gruyter, Berlin, Boston, 2004. ISBN 978-3-11-046345-3. doi:10.1515/9783110218053. URL http://books.google.com/books?id=cL-bZSOrqWoC.

[11] P. Girardeau, V. Leclère, and A. B. Philpott. On the convergence of decomposition methods for multistage stochastic convex programs. Mathematics of Operations Research, 40(1):1–16, 2014. doi:10.1287/moor.2014.0664.

[12] K. Gowrisankaran. Measurability of functions in product spaces. Proceedings of the American Mathematical Society, 31:485–488, 1972. doi:10.1090/S0002-9939-1972-0291403-X.

[13] O. Kallenberg. Foundations of Modern Probability. Springer, New York, 2002. doi:10.1007/b98838.

[14] I. Karatzas and S. E. Shreve. Methods of Mathematical Finance. Stochastic Modelling and Applied Probability. Springer, 1998. doi:10.1007/b98840.

[15] A. Klenke. Probability Theory. Springer, London. doi:10.1007/978-1-4471-5361-0.

[16] G. Lan and Z. Zhou. Dynamic stochastic approximation for multi-stage stochastic optimization. Mathematical Programming, 2020. doi:10.1007/s10107-020-01489-y. URL https://arXiv.org/abs/1707.03324.

[17] N. Löhndorf, D. Wozabal, and S. Minner. Optimizing trading decisions for hydro storage systems using approximate dual dynamic programming. Operations Research, 61(4):810–823, 2013. doi:10.1287/opre.2013.1182.

[18] F. Maggioni and G. Ch. Pflug. Guaranteed bounds for general non-discrete multistage risk-averse stochastic optimization programs. SIAM Journal on Optimization, 29(1):454–483, 2019. doi:10.1137/17M1140601.

[19] M. V. F. Pereira and L. M. V. G. Pinto. Multi-stage stochastic optimization applied to energy planning. Mathematical Programming, 52(1–3):359–375, 1991. doi:10.1007/BF01582895.

[20] G. Ch. Pflug and A. Pichler. Multistage Stochastic Optimization. Springer Series in Operations Research and Financial Engineering. Springer, 2014. ISBN 978-3-319-08842-6. doi:10.1007/978-3-319-08843-3. URL https://books.google.com/books?id=q_VWBQAAQBAJ.

[21] A. B. Philpott and Z. Guan. On the convergence of stochastic dual dynamic programming and related methods. Operations Research Letters, 36(4):450–455, 2008. ISSN 0167-6377. doi:10.1016/j.orl.2008.01.013.

[22] A. B. Philpott, V. L. de Matos, and E. Finardi. On solving multistage stochastic programs with coherent risk measures. Operations Research, 61(4):957–970, 2013. doi:10.1287/opre.2013.1175.

[23] A. Pichler and A. Shapiro. Mathematical foundations of distributionally robust multistage optimization, 2021. URL https://arXiv.org/abs/2101.02498.

[24] R. T. Rockafellar. Integral functionals, normal integrands and measurable selections. In Nonlinear Operators and the Calculus of Variations, pages 157–207. Springer, 1976. doi:10.1007/BFb0079944.

[25] R. T. Rockafellar and R. J.-B. Wets. Nonanticipativity and $L^1$-martingales in stochastic optimization problems. Mathematical Programming Study, 6:170–187, 1976.

[26] R. T. Rockafellar and R. J.-B. Wets. On the interchange of subdifferentiation and conditional expectations for convex functionals. Stochastics, 7(3):173–182, 1982. doi:10.1080/17442508208833217.

[27] R. T. Rockafellar and R. J.-B. Wets. Variational Analysis. Springer Verlag, 1997. doi:10.1007/978-3-642-02431-3. URL https://books.google.com/books?id=w-NdOE5fD8AC.

[28] W. Römisch and V. Guigues. Sampling-based decomposition methods for multistage stochastic programs based on extended polyhedral risk measures. SIAM Journal on Optimization, 22(2):286–312, 2012. doi:10.1137/100811696.

[29] A. Ruszczyński. Risk-averse dynamic programming for Markov decision processes. Mathematical Programming, Ser. B, 125:235–261, 2010. doi:10.1007/s10107-010-0393-3.

[30] A. Shapiro. Analysis of stochastic dual dynamic programming method. European Journal of Operational Research, 209(1):63–72, 2010. doi:10.1016/j.ejor.2010.08.007.

[31] A. Shapiro. Time consistency of dynamic risk measures. Operations Research Letters, 40(6):436–439, 2012. doi:10.1016/j.orl.2012.08.007.

[32] A. Shapiro. Interchangeability principle and dynamic equations in risk averse stochastic programming. Operations Research Letters, 45(4):377–381, 2017. doi:10.1016/j.orl.2017.05.008.

[33] A. Shapiro. Tutorial on risk neutral, distributionally robust and risk averse multistage stochastic programming. European Journal of Operational Research, 2020. doi:10.1016/j.ejor.2020.03.065.

[34] A. Shapiro, W. Tekaya, J. P. da Costa, and M. Pereira Soares. Risk neutral and risk averse stochastic dual dynamic programming method. European Journal of Operational Research, 224(2):375–391, 2013. doi:10.1016/j.ejor.2012.08.022.

[35] A. Shapiro, D. Dentcheva, and A. Ruszczyński. Lectures on Stochastic Programming. MOS-SIAM Series on Optimization. SIAM, second edition, 2014. doi:10.1137/1.9780898718751.

[36] A. N. Shiryaev.