Foundations of Multistage Stochastic Programming
Paul Dommel ∗ Alois Pichler ∗† February 23, 2021
Abstract
Multistage stochastic optimization problems are oftentimes formulated informally in a pathwise way. These formulations are correct in a discrete setting and suitable when addressing computational challenges, for example. But the pathwise problem statement does not allow an analysis with mathematical rigor and is therefore not appropriate.

This paper addresses the foundations. We provide a novel formulation of multistage stochastic optimization problems by involving adequate stochastic processes as controls. The fundamental contribution is a proof that there exist measurable versions of the intermediate value functions. Our proof builds on the Kolmogorov continuity theorem.

A verification theorem is given in addition, and it is demonstrated that all traditional problem specifications can be stated in the novel setting with mathematical rigor. Further, we provide dynamic equations for the general problem, which are developed for various problem classes. The problem classes covered here include Markov decision processes, reinforcement learning and stochastic dual dynamic programming.
Keywords: Multistage stochastic optimization · stochastic processes · measurability
Classification:
∗ University of Technology, Chemnitz, Faculty of Mathematics, 09126 Chemnitz, Germany. DFG, German Research Foundation – Project-ID 416228727 – SFB 1410
† orcid.org/0000-0001-8876-2429. Contact: [email protected]

1 Introduction

Stochastic optimization problems are frequently considered in finance, energy management and operations research, where it is essential and of primary interest to develop efficient algorithms and to provide access to fast decisions. Many of these algorithms build on finite models in discrete space. Multistage stochastic problems are built on stochastic processes in discrete time or on decision trees, cf. Maggioni and Pflug [18], Philpott et al. [22] or Girardeau et al. [11] among many others.

This paper aims at presenting a rigorous mathematical framework for stochastic optimization problems, particularly multistage stochastic optimization problems, by systematically exploiting measurability in stochastic processes, in conditional expectations, and by involving the proper conditional infimum. We develop value processes and show their relation to the genuine stochastic optimization problem. Our central result finally resolves measurability of the intermediate value functions; it builds on the Kolmogorov continuity theorem.

Multistage stochastic optimization involves optimization based on partial realizations, which are partially observed trajectories. It is a major difficulty of multistage stochastic optimization that individual realizations or trajectories have probability zero. But the problems are stated naturally in this pathwise way. It is hence essential to avoid the difficulties that arise with these pathwise, or ω-by-ω, considerations and to address measurability carefully.

Early and important attempts to capture measurability are already present in Rockafellar [24] and in Rockafellar and Wets [27].
The conditional expectation, the conditional probability and the conditional infimum constitute main and major difficulties in multistage stochastic optimization. The infimum in the optimization formulation and the conditional expectations need to be interchanged at subsequent stages to exploit computational advantages, cf. Carpentier et al. [5] or Pflug and Pichler [20]. Indeed, a recourse decision is based on a partial realization of a stochastic outcome, but has to be considered already at the very beginning of decision making. Considering every outcome separately, ω-by-ω, is only possible for finite states, so that a tree describes the evolution of the stochastic process and the evolution of the decision process as well. In a multistage environment, however, the computational burden grows exponentially with the branching structure, and this approach thus is clearly not advisable. The catch phrase curse of dimensionality can be associated with this phenomenon in multistage stochastic optimization.

This paper addresses the general problem of measurability for discrete and continuous probability measures. The central result is a proof that there exists a measurable version of the intermediate value process. We present dynamic equations even for the general, non-Markovian setting. The general verification theorems presented are characterizations as martingales.

We develop the theory in full generality and elaborate on problem settings which are of particular importance in applications and increasingly popular in stochastic optimization. They include dynamic programming (the references Bertsekas [3] and Feinberg [7] include considerations on measurability as well), stochastic dual dynamic programming, the Bellman principle and reinforcement learning, which has grown to outstanding importance in machine learning and data science.
For a recent tutorial, including also computational aspects, we refer to Shapiro [33].

Investigations on foundations have been started in Pichler and Shapiro [23] with a focus on the distributionally robust aspect of multistage stochastic optimization. This paper enhances, complements and continues these investigations on foundations, but now addresses the genuine problem statement itself.

An important special case of multistage stochastic optimization, as it is presented in this paper, is dynamic optimization. Dedicated algorithms have been developed for this special case; papers such as Lan and Zhou [16] address convergence of dynamic stochastic approximation, and Carpentier et al. [4] collect recent theoretical results for the special case of dynamic optimization.

Applications of multistage stochastic optimization are widespread over many economic and managerial disciplines. We pick Löhndorf et al. [17] to represent and demonstrate the importance of multistage stochastic optimization in energy, for example, Shapiro et al. [34] to exemplify computational limitations, and Ruszczyński [29] to point to extensions involving risk.

Outline.
We address the general multistage problem formulation in Section 3, after introducing the informal description and the mathematical setting. An essential component to manage the evolution of the underlying stochastic process and the decisions is the value process, introduced in Section 4. Particular situations such as dynamic problems, additive cost functions, Markovian processes and SDDP (stochastic dual dynamic programming) appear frequently in applications. Important simplifications, dedicated complexity and convergence issues are essential to solve these problems. We address these particular problem formulations in Section 5.
Stochastic optimization builds on random variables, while multistage stochastic optimization builds on stochastic processes on adequate probability spaces. In what follows we address the informal, pathwise setting and then prepare the mathematical stage to discuss the optimization problem with mathematical rigor.
The multistage optimization problem, stated informally as a work instruction, is

$$
\inf_{u_0}\; \mathbb{E}_{X_1} \cdots\; \mathbb{E}_{X_t}\,
\underbrace{\inf_{u_t}\;
\underbrace{\mathbb{E}_{X_{t+1}}\; \inf_{u_{t+1}} \cdots\; \mathbb{E}_{X_T}\; \inf_{u_T}\;
v(X_1,\dots,X_T,\, u_0,\dots,u_T)}_{v_t(x_{0:t},\,u_{0:t})}}_{V_t(x_{0:t},\,u_{0:t-1})}.
\tag{2.1}
$$

Here, v is the random objective of the optimization problem, (X_1, ..., X_T) are the consecutive random observations, and u_0, ..., u_T the decisions made after each partial realization X_t at each stage t. The functions v_t and V_t are the intermediate value functions, which are given intuitively in (2.1) in an ω-by-ω, or pathwise, context.

The problem statement (2.1) exhibits the following difficulties:

(i) The expectation at stage t is a conditional expectation, conditional on the preceding observations X_1, ..., X_t. This trajectory has probability 0, and the conditional expectation must not be considered in a pathwise specification as (2.1) does.

(ii) The infimum with respect to u_t at stage t depends on preceding observations. As above, this is a conditional infimum and not measurable in general.

(iii) The intermediate value functions v_t and V_t aggregate the entire future. As functions defined on observed partial realizations, they are not necessarily measurable.

Nonetheless, the work instruction (2.1) provides a straightforward illustration of the optimization problem, indicating the progression of successive optimization and random realizations. While (i) and (ii) are fixed with standard means, interchanging the infimum with expectations requires clarification. The issue (iii) emerges specifically in multistage optimization.
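On a finite probability space, by contrast, the work instruction (2.1) is unambiguous: every conditional expectation is a finite weighted sum and every infimum runs over finitely many candidates, so the nested expression can be executed literally, ω-by-ω. A minimal sketch (the binary outcomes, the decision grid and the quadratic objective are our own toy choices, not taken from the paper):

```python
# Toy data: T = 2 stages, binary outcomes X_1, X_2, decisions u_0, u_1, u_2.
OUTCOMES = [(-1.0, 0.5), (1.0, 0.5)]   # (value, probability) of each X_t
DECISIONS = [0.0, 1.0]                 # finitely many candidate decisions
T = 2

def v(xs, us):
    """Illustrative objective v(x_1,...,x_T, u_0,...,u_T): a tracking cost."""
    return us[0] ** 2 + sum((u - x) ** 2 for u, x in zip(us[1:], xs))

def nested(xs, us):
    """Evaluate E_{X_{t+1}} inf_{u_{t+1}} ... E_{X_T} inf_{u_T} v recursively."""
    t = len(us) - 1                    # decisions u_0, ..., u_t already fixed
    if t == T:                         # all decisions taken: evaluate v
        return v(xs, us)
    return sum(p * min(nested(xs + [x], us + [u]) for u in DECISIONS)
               for x, p in OUTCOMES)

# The outermost infimum over the here-and-now decision u_0 in (2.1):
best = min(nested([], [u0]) for u0 in DECISIONS)
print(best)   # → 1.0
```

On trees this is exactly the ω-by-ω evaluation which the difficulties (i)–(iii) rule out for continuous distributions.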
We resolve this problem with the help of Kolmogorov's continuity theorem.

In what follows we provide a rigorous mathematical problem statement of (2.1) first and then discuss derived variants. Let (Ω, ℱ, P) be a probability space. We may refer to Kallenberg [13, Lemma 1.13] or Shiryaev [36, Theorem II.4.3] for the following Doob–Dynkin lemma.

Lemma 2.1 (Doob–Dynkin). Suppose the random variable U with values in ℝ^t is measurable with respect to the σ-algebra σ(X) generated by the random variable X with values in ℝ^d. Then there is a (Borel-)measurable function φ: ℝ^d → ℝ^t so that U = φ ∘ X.
The essential infimum of a set of random variables is defined in Dunford and Schwartz [6]. We want to highlight Föllmer and Schied [10, Appendix A.5] for the most compelling proof regarding existence.
Definition 2.2 (Essential infimum). Let 𝒰 be a family of ℝ-valued random variables. The random variable Y is the essential infimum of 𝒰 if

(i) Y ≤ U a.e. for all U ∈ 𝒰, and

(ii) Z ≤ Y a.e., whenever Z ≤ U a.e. for all U ∈ 𝒰.

We shall write ess inf_{U ∈ 𝒰} U := Y for the essential infimum of 𝒰.

Remark 2.3. The essential infimum exists and is unique, cf. Föllmer and Schied [10, Appendix A.5] or Karatzas and Shreve [14, Appendix A]. If 𝒰 is closed under pairwise minimization, i.e., min(U, V) ∈ 𝒰 for U, V ∈ 𝒰, then there is a nonincreasing sequence U_n ∈ 𝒰 such that U_n → ess inf_{U ∈ 𝒰} U a.s., as n → ∞. A set 𝒰 closed under pairwise minimization is said to be directed downwards in Föllmer and Schied [10].

For X measurable, the random variable ess inf_{U ∈ 𝒰}(U | σ(X)) is measurable with respect to σ(X), the sigma algebra generated by X. By the Doob–Dynkin lemma there is a measurable φ(·) so that ess inf_{U ∈ 𝒰}(U | σ(X)) = φ(X). We shall denote this function by ess inf_{U ∈ 𝒰}(U | X) := φ.

Remark 2.4. We shall also address the conditional essential infimum for a singleton 𝒰 = {U}. In this case, the random variable ess inf_{U ∈ 𝒰}(U | X) is the σ(X)-measurable envelope of U, for which we shall write ess inf(U | X).

Remark 2.5. The term essential infimum is occasionally also used for the largest number c ∈ ℝ smaller than the random variable X, c ≤ X a.s. This is ess inf_{U ∈ 𝒰}(U | {∅, Ω}) in the notation introduced, where {∅, Ω} is the trivial sigma algebra.

2.3 Functional optimization

The prevailing perspective in the practice of multistage stochastic optimization is not a measure theoretic perspective but rather a functional view: we shall develop and address this perspective as the informal, ω-by-ω or pathwise description. Throughout, we will give the stochastic process perspective first and then complement the informal perspective as well.
While the former provides expressions with mathematical rigor, the latter, intuitive problem statement is perhaps easier to understand, well-established and more practical for concrete numerical implementations. This is essential for both the governing stochastic process and the decision process.

Definition 2.6 (Decomposable functions). Let σ(𝒰) := σ(u: u ∈ 𝒰) be the sigma algebra generated by the functions u: ℝ^t → ℝ^d contained in 𝒰. We shall say that the class of functions 𝒰 is decomposable if u_A ∈ 𝒰, where

$$u_A(x) := \begin{cases} u_1(x) & \text{if } x \in A, \\ u_2(x) & \text{else,} \end{cases}$$

whenever A ∈ σ(𝒰) and u_1, u_2 ∈ 𝒰.

Traditional formulations of the interchangeability principle require that the infimum is measurable (cf. Shapiro [32], the normal integrands in Rockafellar and Wets [27, Theorem 14.60], or Rockafellar and Wets [26]). By involving the essential infimum, the following proposition establishes the interchangeability principle without requesting measurability explicitly.

Proposition 2.7 (Interchangeability principle). Let 𝒰 be a class of measurable functions, let v: ℝ^t × ℝ^d → ℝ be a (measurable) function bounded from below and X: Ω → ℝ^t a random variable. It holds that

$$\mathbb{E}\, \operatorname*{ess\,inf}_{u \in \mathcal{U}} v\bigl(X, u(X)\bigr) \;\le\; \inf_{u \in \mathcal{U}} \mathbb{E}\, v\bigl(X, u(X)\bigr). \tag{2.2}$$

Equality holds in (2.2) if 𝒰 is decomposable and X is measurable with respect to σ(𝒰).

Proof. For every x and every u_0 ∈ 𝒰 we have that inf_{u ∈ 𝒰} v(x, u(x)) ≤ v(x, u_0(x)) and thus ess inf_{u ∈ 𝒰} v(X, u(X)) ≤ v(X, u_0(X)) a.e. Taking expectations first and then the infimum over u_0 reveals (2.2).

For the remaining assertion recall from Remark 2.3 (or Karatzas and Shreve [14, Appendix A]) that there is a sequence u_j so that min_{j=1,...,n} v(X, u_j(X)) → ess inf_{u ∈ 𝒰} v(X, u(X)) almost surely, as n → ∞.
Define

$$A_i := \Bigl\{ v\bigl(X, u_i(X)\bigr) = \min_{j=1,\dots,n} v\bigl(X, u_j(X)\bigr) \Bigr\}, \qquad \tilde A_i := A_i \setminus \bigcup_{j<i} A_j,$$

and set $\tilde u_n := \sum_{i=1}^{n} u_i \cdot \mathbb{1}_{\tilde A_i}$. As 𝒰 is decomposable we have that ũ_n ∈ 𝒰 and v(x, ũ_n(x)) = min_{i=1,...,n} v(x, u_i(x)). Employing Beppo Levi's monotone convergence theorem we conclude that E v(X, ũ_n(X)) → E ess inf_{u ∈ 𝒰} v(X, u(X)) as n → ∞, and hence the assertion. □

Proposition 2.8.
Suppose that u ↦ v(x, u) is monotone for every x, i.e., v(x, u_1) ≤ v(x, u_2) whenever u_1 ≤ u_2 in every component, and that min(u_1, u_2) ∈ 𝒰 for u_1, u_2 ∈ 𝒰. Then interchangeability (2.2) holds with equality.

Proof. By monotonicity of v we have with u := min_{i=1,...,n} u_i ∈ 𝒰 that

$$\min_{j=1,\dots,n} v\bigl(X, u_j(X)\bigr) = v\Bigl(X, \min_{j=1,\dots,n} u_j(X)\Bigr) = v\bigl(X, u(X)\bigr).$$

The assertion follows along the proof of Proposition 2.7. □
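On a discrete probability space the interchangeability principle (2.2) can be verified by enumeration. The following sketch (a toy model of our own; the decision grid and all names are illustrative) contrasts the class of constant policies, for which σ(𝒰) is trivial so that X is not σ(𝒰)-measurable and (2.2) can be strict, with the class of all functions of X, for which the equality conditions of Proposition 2.7 hold:

```python
import itertools

# Toy model: X uniform on {0, 1}; objective v(x, u) = (u - x)^2.
xs = [0.0, 1.0]
p = 0.5
grid = [i / 10 for i in range(11)]      # candidate decision values in [0, 1]

def expected(policy):
    """E v(X, u(X)) for a policy given as a function x -> u(x)."""
    return sum(p * (policy(x) - x) ** 2 for x in xs)

# Constant policies u(x) = c: sigma(U) is trivial, X is not sigma(U)-measurable,
# so the inequality (2.2) may be strict.
inf_E_const = min(expected(lambda x, c=c: c) for c in grid)

# All functions on {0, 1}, i.e. all pairs (u(0), u(1)): equality holds here.
inf_E_full = min(expected(lambda x, a=a, b=b: a if x == 0.0 else b)
                 for a, b in itertools.product(grid, repeat=2))

# The pathwise (essential) infimum, computed x by x, then the expectation:
E_essinf = sum(p * min((u - x) ** 2 for u in grid) for x in xs)

print(E_essinf, inf_E_full, inf_E_const)   # → 0.0 0.0 0.25
```

The strict gap 0 < 0.25 for the rigid class is exactly the phenomenon the proposition quantifies: the pathwise infimum ignores that a constant decision cannot react to X.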
The general multistage optimization problem involves a stochastic process instead of a single random variable. Let X = (X_1, ..., X_T) be a stochastic process with stages t = 1, ..., T and, without loss of generality, with marginals X_t ∈ ℝ. For convenience, the stochastic process X is occasionally also augmented with a deterministic starting value X_0 = x_0 a.s., so that X = (X_0, X_1, ..., X_T).

Definition 3.1 (Nonanticipativity). The stochastic process U = (U_0, ..., U_T) is adapted to X if U_t is measurable with respect to σ(X_0, ..., X_t) for every t = 0, ..., T. We shall write U ◁ X if U is adapted to X.

In stochastic optimization, the synonymous term nonanticipative is more common than adapted.

Definition 3.2 (The natural filtration). The stochastic process X = (X_0, ..., X_T) is adapted to the natural filtration if X_t(ω) = X_t(ω_0, ..., ω_t) (that is, X_t(ω) = X̃_t(ω_0, ..., ω_t) for some random variable X̃_t, which we identify with X_t).

Multistage stochastic optimization considers classes 𝒰 of stochastic control processes. To not run into difficulties regarding a governing measure we assume that there is a control Ū so that U ◁ Ū (nonanticipative) for all U ∈ 𝒰. A particular situation arises for the class 𝒰 of stochastic processes adapted to X, 𝒰 ⊂ {U: U ◁ X}. In this case one may choose Ū = X as governing process.

We consider the following, general multistage stochastic optimization problem.

Definition 3.3 (Multistage optimization problem). Let

$$v: \mathbb{R}^{T+1} \times \mathbb{R}^{T+1} \to \mathbb{R}, \qquad (x, u) \mapsto v(x, u), \tag{3.1}$$

be a measurable function. For a class 𝒰 of feasible controls, the general multistage stochastic optimization problem is

$$\inf_{U \in \mathcal{U},\; U \,\triangleleft\, X} \mathbb{E}\, v(X, U), \tag{3.2}$$

where the infimum is among all feasible control policies U ∈ 𝒰 adapted to X. The function v is the (stochastic) objective function, and the set 𝒰 is the set of admissible controls, decisions or policies.
Note that the decision space is ℝ^{T+1} in (3.1); that is, at each stage t ∈ {0, ..., T} a decision in ℝ is made. This setting is chosen for convenience of presentation.

In what follows we shall assume that the infimum in (3.2) is finite. A somewhat stronger assumption, although not necessary, is that v is uniformly bounded from below (i.e., v ≥ C > −∞), so that the expectation in (3.2) is well-defined for every U ∈ 𝒰.

For u_t(x_1, ..., x_t) measurable it is evident that u_t(X_1, ..., X_t) is measurable with respect to σ(X_1, ..., X_t). For this reason,

$$U := u(X_1, \dots, X_T) \tag{3.3}$$

is a nonanticipative process with respect to X, provided that

$$u(x_1, \dots, x_T) = \begin{pmatrix} u_0 \\ u_1(x_1) \\ \vdots \\ u_T(x_1, \dots, x_T) \end{pmatrix}. \tag{3.4}$$

The Doob–Dynkin lemma (Lemma 2.1) ensures that every process U ∈ 𝒰 adapted to X has the particular form (3.3) with (3.4).

Lemma 3.4 (Doob–Dynkin lemma, extended). Let X = (X_0, ..., X_T) be a stochastic process in discrete time with marginal states X_t ∈ ℝ^d and U ◁ X. There are measurable functions u_t so that U_t = u_t(X_0, ..., X_t) for t = 0, ..., T a.s., and U = φ_U ∘ X, where φ_U = u is given by (3.4).

Functional optimization perspective.
The optimization problem (3.2) employs a fixed stochastic process X. In view of the Doob–Dynkin lemma, the problem (3.2) thus can be stated as an optimization problem among stochastic processes, or equivalently also as an optimization problem among functions, each of the specific form (3.4). The multistage stochastic optimization problem thus can be classified as a functional optimization problem, because solving it means finding unknown functions as in (3.4). The equivalence between measurable functions and processes is given by U ↦ φ_U, where φ_U is the function from the extended Doob–Dynkin lemma (Lemma 3.4), while the inverse is the map u ↦ U = u(X) given in (3.3).

Further, this equivalence allows extending the notion of decomposable to stochastic processes.

Definition 3.5 (Decomposable processes). The class 𝒰 of stochastic processes is decomposable if each function in {φ_U: U ∈ 𝒰} is decomposable in the sense of Definition 2.6.

3.2 Special cases of the general problem setting

The conventional stochastic optimization problem and the stochastic optimization problem with recourse are special cases of the multistage stochastic optimization problem.
Example 3.6 (T = 0). Consider the set of policies with 𝒰 ⊂ {U: U_t ◁ X_0 for all t ≥ 0} (or T = 0), so that every decision u_t is deterministic, i.e., nonrandom. The corresponding optimization problem

$$\inf_{u \in \mathbb{R}^{T+1}} \mathbb{E}\, v(X, u) \tag{3.5}$$

is a conventional stochastic optimization problem, as it is sufficient to treat X as a random vector in (3.5). Here it is not essential that X is a stochastic process; the time component is missing.

Example 3.7 (T = 1). Consider the feasible policies
𝒰 ⊂ {u: u_0 ◁ X_0 and u_t ◁ X_1 for all t ≥ 1} (or T = 1), together with the corresponding optimization problem

$$\inf_{(u_0,\, u_1(X_1)) \in \mathcal{U}} \mathbb{E}\, v\bigl(X_1, u_0, u_1(X_1)\bigr).$$

Here, the decision u_0 is deterministic, i.e., does not depend on the random components of X; u_1(·) is called the (random) recourse decision in the literature (cf. Shapiro et al. [35]).

It is an important conceptual element in stochastic optimization to consider the problem sequentially in time, so that any new observation X_t triggers a subsequent new decision u_t(X_0, ..., X_t), which itself is based on the past. Shapiro [31] depicts the consecutive transitions via the chain in Figure 1. The transitions in Figure 1 can be started with X_0 equally well.

u_0 ⇝ X_1 ⇝ u_1 ⇝ ··· ⇝ X_t ⇝ u_t ⇝ X_{t+1} ⇝ ··· ⇝ X_T ⇝ u_T

Figure 1: The progression of random observations and decisions

In what follows we develop a similar decomposition of the optimization problem (3.2) and present our main result in Theorem 4.2 below. For notational convenience we introduce the abbreviation x_{t:t′} := (x_t, x_{t+1}, ..., x_{t′}) (0 ≤ t, t′ ≤ T) for subvectors. We also write X_{0:t} := (X_0, ..., X_t) for the initial and U_{t:T} := (U_t, ..., U_T) for the final (trailing) substrings. Recall that U is a nonanticipative process if there is a control u so that U = u(X_0, ..., X_T), as well as functions u_{t:T} with U_{t:T} = u_{t:T}(X). By 𝒰_{t:T} = {u_{t:T}: U ∈ 𝒰} we denote the set of functions comprising the final decisions of all control processes.

4.1 Existence of the intermediate value functions

A common way to solve the initial problem (3.2) is to decompose it into a sequence of subproblems. We specify these subproblems by introducing the value process in the following considerations. Let u_{0:t} ∈ ℝ^{t+1} and a function ũ_{t+1:T} ∈ 𝒰_{t+1:T} be given.
As a consequence of the Doob–Dynkin lemma (Lemma 2.1) there is a measurable mapping $v^{\tilde u_{t+1:T}}_{t,\,u_{0:t}}: \mathbb{R}^{t+1} \to \mathbb{R}$ such that

$$v^{\tilde u_{t+1:T}}_{t,\,u_{0:t}}(X_{0:t}) = \mathbb{E}\bigl( v(X,\, u_{0:t},\, \tilde u_{t+1:T}(X)) \,\big|\, X_{0:t} \bigr). \tag{4.1}$$

These conditional expectations constitute the building block for the intermediate value functions.

Definition 4.1.
The (intermediate) value functions are

$$v_t(x_{0:t},\, u_{0:t}) := \operatorname*{ess\,inf}_{\tilde u_{t+1:T} \in \mathcal{U}_{t+1:T}} v^{\tilde u_{t+1:T}}_{t,\,u_{0:t}}(x_{0:t}) \quad\text{and} \tag{4.2}$$

$$V_t(x_{0:t},\, u_{0:t-1}) := \operatorname*{ess\,inf}_{\tilde u_t \in \mathcal{U}_t} v_t\bigl(x_{0:t},\, u_{0:t-1},\, \tilde u_t(x_{0:t})\bigr), \tag{4.3}$$

where t = 0, ..., T.

These value functions are functions on ℝ^{t+1} × ℝ^{t+1} (ℝ^{t+1} × ℝ^t, resp.) and the essential infima are with respect to these spaces. The functions are generally not unique, as there are multiple functions satisfying the Doob–Dynkin lemma. The value functions V_t and v_t are defined pointwise (and well-defined at each point), but they are not necessarily measurable. Hence, additional conditions on v need to be imposed to ensure measurability.

The following statement is the main result. It establishes existence of a measurable version of the intermediate value functions. The proof builds on Kolmogorov's continuity theorem, also known as the Kolmogorov–Chentsov theorem.
Theorem 4.2 (Existence of a measurable version of the value function). Assume that v is locally Hölder continuous with exponent α > 0 in u, i.e.,

$$|v(x, u_1) - v(x, u_2)| \le C\, \|u_1 - u_2\|^{\alpha} \quad\text{for } x \in \mathbb{R}^{T+1} \text{ and } \|u_1 - u_2\| \le \delta, \tag{4.4}$$

where δ > 0 is sufficiently small. Then there exists a version of the intermediate value function v_t which is measurable with respect to ℬ(ℝ^{t+1}) ⊗ ℬ(ℝ^{t+1}) and locally Hölder continuous with exponent α̃ ∈ (0, α/(t+2)).

To prove the main theorem we recall the following condition on joint measurability from Gowrisankaran [12, Theorem 2]; we state the result in full mathematical beauty, although we do not need this most general variant.
Theorem 4.3.
Let (X, τ) be a measurable space and Y a Suslin space. Let ℬ be the σ-algebra of all measurable subsets for a locally finite measure λ on the Borel σ-algebra of Y. Then a function f: X × Y → A with values in a separable metrizable space A such that

(i) x ↦ f(x, y) is τ-measurable for every y ∈ Y, and

(ii) y ↦ f(x, y) is continuous on Y for each x ∈ X,

is τ ⊗ ℬ-measurable on X × Y.

Remark 4.4. Functions satisfying the conditions (i) and (ii) of Theorem 4.3 are also known as
Carathéodory functions.

Proof of Theorem 4.2.
We shall employ Theorem 4.3. Consider the function

$$v^{\tilde u_{t+1:T}}_t(x_{0:t},\, u_{0:t}) := v^{\tilde u_{t+1:T}}_{t,\,u_{0:t}}(x_{0:t})$$

(cf. (4.1)), where ũ_{t+1:T} ∈ 𝒰_{t+1:T} is fixed. Measurability in x_{0:t} follows from the definition of the function v_t in (4.2) and general measurability of the essential infimum, and thus the condition (i) of Theorem 4.3.

It remains to verify continuity, i.e., (ii). In order to employ Theorem 4.3 we need to show continuity of u_{0:t} ↦ v^{ũ_{t+1:T}}_t(x_{0:t}, u_{0:t}). To this end consider the stochastic process (Z_u)_{u ∈ ℝ^{t+1}}, indexed by u ∈ ℝ^{t+1} and defined by

$$Z_u := \mathbb{E}\bigl( v(X,\, u,\, \tilde u_{t+1:T}(X)) \,\big|\, X_{0:t} \bigr).$$

Further, let u′ ∈ U_δ(u) for δ > 0 sufficiently small, and set q := (t+2)/α and β := 1. Then

$$\mathbb{E}\, \bigl| Z_u - Z_{u'} \bigr|^{q} = \mathbb{E}\, \Bigl| \mathbb{E}\bigl( v(X, u, \tilde u_{t+1:T}(X)) - v(X, u', \tilde u_{t+1:T}(X)) \,\big|\, X_{0:t} \bigr) \Bigr|^{q} \le \mathbb{E}\bigl( C \|u - u'\|^{\alpha} \bigr)^{q} \le \tilde C\, \|u - u'\|^{t+2} = \tilde C\, \|u - u'\|^{(t+1)+\beta}$$

for some C̃ < ∞. Hence, by the Kolmogorov continuity theorem (cf. Klenke [15, p. 453]), there is a process Z̃ such that Z̃_u = Z̃(·, u) = E(v(X, u, ũ_{t+1:T}(X)) | X_{0:t}) and Z̃(ω, ·) is Hölder continuous with exponent β/q = α/(t+2) for almost every ω ∈ Ω. It follows that the corresponding functions v^{ũ_{t+1:T}}_t are continuous with respect to u. This proves (ii) and hence the assertion of the theorem. □

Remark 4.5. It is evident that measurable versions of (4.2) and (4.3) exist for uniformly Lipschitz continuous objective functions v.

In what follows, we define the value processes by substituting x_{0:t} and u_{0:t} with their stochastic counterparts X_{0:t} and U_{0:t}.

Definition 4.6 (Value process). Assume v satisfies the Hölder condition imposed in Theorem 4.2 and U ∈ 𝒰 is a nonanticipative stochastic process (U ◁ X).
The general value processes are

$$\boldsymbol{v}^U_t := v_t(X_{0:t},\, U_{0:t}) \quad\text{and} \tag{4.5}$$

$$\boldsymbol{V}^U_t := V_t(X_{0:t},\, U_{0:t-1}), \tag{4.6}$$

where v_t and V_t are the intermediate value functions, cf. Definition 4.1.

Figure 2: Domain and range of the objective and the general value process
Remark 4.7. The functions v_t and V_t (cf. (4.2) and (4.3)) are defined on V_t: ℝ^{t+1} × ℝ^t → ℝ and v_t: ℝ^{t+1} × ℝ^{t+1} → ℝ. We employ bold letters to indicate random variables, i.e., functions on Ω, given by

$$\boldsymbol{v}^U_t(\omega) = v_t\bigl(X_{0:t}(\omega),\, U_{0:t}(\omega)\bigr) \quad\text{and}\quad \boldsymbol{V}^U_t(\omega) = V_t\bigl(X_{0:t}(\omega),\, U_{0:t-1}(\omega)\bigr).$$

Figure 2 depicts the domain and the range of these functions and random variables.
Remark 4.8 (ω-by-ω description). The functions V_t and v_t describing the value processes (4.5) and (4.6) can be given explicitly and directly, albeit only intuitively, as

$$V_t(x_{0:t},\, u_{0:t-1}) = \inf_{u_{t:T}(\cdot)} \mathbb{E}\bigl( v(X_{0:T},\, u_{0:t-1},\, u_{t:T}(X_{0:T})) \,\big|\, X_{0:t} = x_{0:t},\, U_{0:t-1} = u_{0:t-1} \bigr) \tag{4.7}$$

and

$$v_t(x_{0:t},\, u_{0:t}) = \inf_{u_{t+1:T}(\cdot)} \mathbb{E}\bigl( v(X_{0:T},\, u_{0:t},\, u_{t+1:T}(X_{0:T})) \,\big|\, X_{0:t} = x_{0:t},\, U_{0:t} = u_{0:t} \bigr), \tag{4.8}$$

where the infima are among functions

$$u_{t:T}(x_0, \dots, x_T) = \begin{pmatrix} u_t(x_0, \dots, x_t) \\ \vdots \\ u_T(x_0, \dots, x_T) \end{pmatrix}$$

with u(X) ∈ 𝒰.

Note, however, that the expressions (4.7) and (4.8) are not necessarily well defined, as they may depend explicitly on the choice of the control process U_{0:t}. They further face a delicate measurability problem, as the pointwise infimum is not measurable, in general. Hence (4.7) and (4.8) cannot be used as definitions. Our definitions (4.2) and (4.3), together with (4.5) and (4.6), resolve this problem by addressing u_{0:t} as a parameter and passing over to the essential infimum, which has a measurable version by the main theorem, Theorem 4.2.

4.3 Relation to the multistage problem

In what follows we derive the equations interconnecting the value functions introduced in the preceding section. To this end observe first that

$$V_0 = \inf_{U \in \mathcal{U}} \mathbb{E}\, v(X, U) \tag{4.9}$$

by definition (4.3), so that V_0 is the optimal value of the initial problem. Further, we have with (4.2) that

$$v_T = v, \tag{4.10}$$

which is the starting point of the optimization problem at the final stage.

The following statements interconnect the value functions at intermediate stages.

Theorem 4.9.
Let U ∈ 𝒰 be a feasible policy. It holds that

$$V_t(X_{0:t},\, U_{0:t-1}) = \operatorname*{ess\,inf}_{\tilde u_t \in \mathcal{U}_t} v_t\bigl(X_{0:t},\, U_{0:t-1},\, \tilde u_t(X_{0:t})\bigr) \quad\text{and}$$

$$v_t(X_{0:t},\, U_{0:t}) \ge \mathbb{E}\bigl( V_{t+1}(X_{0:t+1},\, U_{0:t}) \,\big|\, X_{0:t} \bigr). \tag{4.11}$$

Equality holds in (4.11) if 𝒰 is decomposable.

Proof. The first equation follows directly from the definitions of V_t and v_t. The second follows from

$$\mathbb{E}\bigl( V_{t+1}(X_{0:t+1},\, U_{0:t}) \,\big|\, X_{0:t} \bigr) = \mathbb{E}\Bigl( \operatorname*{ess\,inf}_{\tilde u_{t+1:T} \in \mathcal{U}_{t+1:T}} \mathbb{E}\bigl( v(X,\, U_{0:t},\, \tilde u_{t+1:T}(X)) \,\big|\, X_{0:t+1} \bigr) \Bigm|\, X_{0:t} \Bigr)$$
$$\le \operatorname*{ess\,inf}_{\tilde u_{t+1:T} \in \mathcal{U}_{t+1:T}} \mathbb{E}\Bigl( \mathbb{E}\bigl( v(X,\, U_{0:t},\, \tilde u_{t+1:T}(X)) \,\big|\, X_{0:t+1} \bigr) \Bigm|\, X_{0:t} \Bigr) = \operatorname*{ess\,inf}_{\tilde u_{t+1:T} \in \mathcal{U}_{t+1:T}} \mathbb{E}\bigl( v(X,\, U_{0:t},\, \tilde u_{t+1:T}(X)) \,\big|\, X_{0:t} \bigr)$$

by Proposition 2.7 and the tower property of the conditional expectation. Equality holds, by Proposition 2.7 again, for decomposable controls, and hence the assertion. □

Remark 4.10 (ω-by-ω description). As above, and employing the functions (4.7) and (4.8), the equations can be stated directly and explicitly by

$$V_t(x_{0:t},\, u_{0:t-1}) = \inf_{u_t} v_t(x_{0:t},\, u_{0:t-1},\, u_t) \quad\text{and}$$

$$v_t(x_{0:t},\, u_{0:t}) \ge \mathbb{E}_{X_{t+1}}\bigl( V_{t+1}(X_{0:t+1},\, u_{0:t}) \,\big|\, X_{0:t} = x_{0:t} \bigr) = \mathbb{E}_{X_{t+1}}\bigl( V_{t+1}(x_{0:t}, X_{t+1},\, u_{0:t}) \,\big|\, X_{0:t} = x_{0:t} \bigr).$$

Equality holds if 𝒰 is decomposable.

These equations get to the point directly and explain the computational task at each stage (t) and at each node (x_{0:t}, u_{0:t-1}). Note again that stating the equations this way is not justified from a mathematical perspective; the equations suffer from measurability issues, in general. They are justified in the finite dimensional case if P(X_{0:t} = x_{0:t} and U_{0:t-1} = u_{0:t-1}) > 0.

Corollary 4.11 (Dynamic relations). Let U ∈ 𝒰 be a feasible control process.
It holds that

$$\boldsymbol{V}^U_t \ge \operatorname*{ess\,inf}_{U'_{0:t} \in\, \mathcal{U}_{0:t},\; U'_{0:t-1} = U_{0:t-1}} \mathbb{E}\bigl( \boldsymbol{V}^{U'}_{t+1} \,\big|\, X_{0:t} \bigr) \quad\text{and}\quad \boldsymbol{v}^U_t \ge \mathbb{E}\Bigl( \operatorname*{ess\,inf}_{U'_{0:t+1} \in\, \mathcal{U}_{0:t+1},\; U'_{0:t} = U_{0:t}} \boldsymbol{v}^{U'}_{t+1} \Bigm|\, X_{0:t} \Bigr).$$

Equality holds if 𝒰 is decomposable.

Proof. The assertion is immediate by combining the defining equations (4.5) and (4.6) with the assertions of Theorem 4.9. □
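On a finite tree the dynamic relations turn into a backward recursion: start from v_T = v (cf. (4.10)) and alternate a conditional expectation, which yields v_t, with a stage-wise infimum, which yields V_t; the root value V_0 is then the optimal value by (4.9). A minimal sketch (the outcomes, decision grid and objective are our own toy illustration, not from the paper):

```python
OUTCOMES = ((-1.0, 0.5), (1.0, 0.5))   # (value, probability) of each X_t
DECISIONS = (0.0, 0.5, 1.0)            # feasible stage-wise decisions
T = 2                                  # stages t = 0, 1, 2

def v(xs, us):
    """Terminal objective v = v_T; a toy tracking cost."""
    return us[0] ** 2 + sum((u - x) ** 2 for u, x in zip(us[1:], xs))

def v_t(xs, us):
    """v_t(x_{0:t}, u_{0:t}): conditional expectation of V_{t+1}; v_T = v."""
    t = len(us) - 1                    # decisions u_0, ..., u_t fixed so far
    if t == T:
        return v(xs, us)
    return sum(p * V_t(xs + (x,), us) for x, p in OUTCOMES)

def V_t(xs, us_prev):
    """V_t(x_{0:t}, u_{0:t-1}): infimum over the stage-t decision of v_t."""
    return min(v_t(xs, us_prev + (u,)) for u in DECISIONS)

print(V_t((), ()))   # → 1.0, the optimal value V_0 of the toy problem
```

The recursion visits each node of the tree once, which is exactly why the branching structure, and not the formulas themselves, drives the curse of dimensionality mentioned in the introduction.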
Remark 4.12. It holds that
$$V_t(x_{:t}, u_{:t-1}) \ge \inf_{u_t} \mathbb{E}_{X_{t+1}}\big(V_{t+1}(x_{:t}, X_{t+1}, u_{:t}) \,\big|\, X_{:t} = x_{:t},\, U_{:t-1} = u_{:t-1}\big)$$
and
$$v_t(x_{:t}, u_{:t}) \ge \mathbb{E}_{X_{t+1}}\Big(\inf_{u_{t+1}} v_{t+1}(x_{:t}, X_{t+1}, u_{:t+1}) \,\Big|\, X_{:t} = x_{:t},\, U_{:t} = u_{:t}\Big).$$
Equality holds if $\mathcal{U}$ is decomposable.

Verification theorems provide optimality conditions. Given these characterizations, the purpose of verification theorems is to allow verifying, or checking, whether a given policy is optimal. An interesting early reference is Rockafellar and Wets [25], who study martingales associated with optimality conditions. Fleming and Soner [8] give verification theorems for dynamic (in particular Markovian) problems in continuous time. We address this particular situation in more detail below.

The value process $\boldsymbol{v}^U_t$ is a stochastic process depending on an underlying policy $U$. A special situation occurs if the underlying policy $U$ is optimal, i.e., $U$ solves the initial problem (3.2). In what follows we examine this situation. We further provide a useful characterization of the optimizers of (3.2), relating the different concepts regarding optimization and probability theory.

Theorem 4.13 (Verification theorem). Let $u \in \mathcal{U}$ be any policy. Then the stochastic processes
$$\boldsymbol{v}^U_t = v_t\big(X_{:t}, u_{:t}(X_{:t})\big), \quad t = 0, \dots, T, \qquad \text{and} \qquad \boldsymbol{V}^U_t = V_t\big(X_{:t}, u_{:t-1}(X_{:t-1})\big), \quad t = 0, \dots, T,$$
are submartingales. They are martingales if $\mathcal{U}$ is decomposable and $u$ solves the initial problem (3.2). Conversely, if $\mathcal{U}$ is decomposable and $\boldsymbol{V}^U_t$, $\boldsymbol{v}^U_t$ are martingales, then $U$ is an optimizer of (3.2).

Proof. The first assertion is immediate from Corollary 4.11. For the second, assume that $\boldsymbol{V}^{U^*}_t$ and $\boldsymbol{v}^{U^*}_t$ are martingales for an underlying policy $U^*$.
By employing (4.9), (4.10) and Theorem 4.9 it follows that
$$\inf_{U \in \mathcal{U}} \mathbb{E}\, v(X, U) = \boldsymbol{V}_0 = \boldsymbol{V}^{U^*}_0 = \mathbb{E}\big(\boldsymbol{V}^{U^*}_1 \,\big|\, X_0\big) = \boldsymbol{v}^{U^*}_0 = \mathbb{E}\big(\boldsymbol{v}^{U^*}_T \,\big|\, X_0\big) = \mathbb{E}\big(v(X, U^*)\big),$$
and thus the assertion. □

Theorem 4.13 allows identifying a policy $U = u(X)$ as an optimal policy by checking whether the value processes constitute martingales. Note that the verification theorem does not give a hint on where and how to improve the policy. Instead, it can be used ex post to check an existing, given policy for optimality.

The verification theorem presented above notably works for every multistage stochastic optimization problem. We did not impose conditions on the function $v$ other than Hölder continuity, and we did not restrict the analysis to Markovian processes. From this mathematical perspective the statement is rather general.

Most common in optimal control, finance and reinforcement learning are value functions which accumulate costs occurring at consecutive stages. We derive their intermediate value functions explicitly by exploiting the specific structure of the objective function. To this end we first transform the equations for the general additive case and derive the equations for MDPs (Markov decision processes) subsequently. The Markovian property, from a probabilistic perspective, is essential for the MDP equations. As well, we derive the equations for stochastic dual dynamic programming (SDDP) from the general equations.

5.1 Lag $\ell$ stochastic processes and additive objective functions

The particular value function which we consider here,
$$v(x_{:T}, u_{:T}) \coloneqq \sum_{t=1}^{T} \gamma^{t-1}\, c_t\big(x_{t-\ell:t}, u_{t-\ell:t-1}\big), \tag{5.1}$$
adds consecutive costs at lag $\ell \ge 1$ (indices below the initial stage are truncated, $u_{-1} = u_0$, e.g., and the corresponding cost function is adjusted accordingly). The value function (5.1) is of fundamental importance in finance and in reinforcement learning, where $c_t$ is the cost associated with time $t$ and $\gamma$ is a discount factor.
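The additive structure can be illustrated by evaluating the objective along a single trajectory. The Python sketch below uses our own conventions (0-based stages, truncation of indices below the initial stage, discount $\gamma^t$ in 0-based counting), and a made-up toy cost; it is an illustration of the lag-$\ell$ windowing, not the paper's implementation.

```python
# Evaluate the discounted additive objective along one trajectory.
# Conventions here (0-based stages, truncation of indices below stage 0,
# discount gamma**t) are our own reading, chosen for the sketch.
def additive_objective(x, u, costs, gamma, lag):
    total = 0.0
    for t in range(len(x)):
        xs = tuple(x[max(0, t - lag): t + 1])   # observations x_{t-lag}, ..., x_t
        us = tuple(u[max(0, t - lag): t])       # decisions up to u_{t-1}; u_t is
        total += gamma**t * costs[t](xs, us)    # deliberately NOT an argument of c_t
    return total

# toy cost at lag 1: c_t = current observation plus the decisions it sees
costs = [lambda xs, us: xs[-1] + sum(us)] * 3
val = additive_objective([1.0, 2.0, 3.0], [0.5, 0.5, 0.5], costs, gamma=0.9, lag=1)
# val = 1.0 + 0.9 * 2.5 + 0.81 * 3.5
```

The slice `u[max(0, t - lag): t]` ends before $u_t$, reflecting that the cost component at stage $t$ sees the observation $x_t$ but not the subsequent decision.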
Note the very particular choice of arguments of the function $c_t$: the last input element is the observation $x_t$, but the subsequent decision $u_t$ is not taken into account. Figure 3 depicts the support of the cost component $c_t$ at stage $t$ (compare with Figure 1).

[Figure 3: Arguments of the cost component $c_t$ — the component $c_t$ covers the window $X_{t-\ell},\, u_{t-\ell},\, \dots,\, X_{t-1},\, u_{t-1},\, X_t$.]

The parameter $\gamma \in (-1, 1)$ in (5.1) is most typically interpreted as a discount factor. To derive the dynamic equations we assume that the functions $c_t$ are Hölder continuous, and we assume that the stochastic process associated with the value function (2.1) has lag $\ell$ as well; that is,
$$\sigma(X_0, \dots, X_t) = \sigma(X_{t-\ell}, \dots, X_t) \quad \text{for all } t = \ell, \dots, T.$$
Define the functions $\tilde V_t$ by
$$\tilde V_t(x_{:t}, u_{:t-1}) \cdot \gamma^t \coloneqq V_t(x_{:t}, u_{:t-1}) - \sum_{i=1}^{t} \gamma^{i-1}\, c_i\big(x_{i-\ell:i}, u_{i-\ell:i-1}\big),$$
so that $\tilde V_0 = V_0$.
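To see what the definition accomplishes, consider a single fixed trajectory with no remaining decisions: subtracting the accumulated costs and rescaling by $\gamma^t$ turns the total discounted cost into the discounted cost-to-go. A minimal sketch with made-up numbers (0-based stages, our own convention for the discount exponents):

```python
# Single fixed trajectory (no remaining decisions): here V_t is the total
# discounted cost along the path, and the definition above reduces the
# function V~_t to the discounted cost-to-go from stage t on.
gamma = 0.9
c = [2.0, 1.0, 3.0, 0.5]                # realized costs along the path (made up)
T = len(c)

total = sum(gamma**i * c[i] for i in range(T))                  # V_t on this path
accumulated = [sum(gamma**i * c[i] for i in range(t)) for t in range(T + 1)]

# gamma^t * V~_t := V_t - costs accumulated before stage t
V_tilde = [(total - accumulated[t]) / gamma**t for t in range(T + 1)]

assert V_tilde[0] == total              # V~_0 = V_0
for t in range(T + 1):
    cost_to_go = sum(gamma**(i - t) * c[i] for i in range(t, T))
    assert abs(V_tilde[t] - cost_to_go) < 1e-12                 # re-discounted tail
```

The rescaling by $\gamma^t$ is what makes the reduced value functions comparable across stages, which is what the stationary equations below exploit.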
For additive cost functions, the schematic decomposition (2.1) now is
$$\inf_{u_1} c_1 + \mathbb{E}_{X_2}\inf_{u_2} c_2 + \cdots + \mathbb{E}_{X_t} \underbrace{\inf_{u_t}\Big(c_t + \underbrace{\mathbb{E}_{X_{t+1}}\inf_{u_{t+1}}\big(c_{t+1} + \cdots + \mathbb{E}_{X_T}\inf_{u_T} c_T\big)}_{\tilde v_t(x_{:t},\, u_{:t})}\Big)}_{\tilde V_t(x_{:t},\, u_{:t-1})}.$$
From (4.7) we conclude that
$$\tilde V_t(x_{:t}, u_{:t-1}) = \inf_{u_{t:T}(\cdot)} \mathbb{E}\Bigg( \sum_{i=t+1}^{T} \gamma^{i-1-t}\, c_i\big(X_{i-\ell:i}, u_{i-\ell:i-1}\big) \,\Bigg|\, X_{:t} = x_{:t},\, U_{:t-1} = u_{:t-1} \Bigg), \tag{5.2}$$
where the decisions $u_j$ for $j \ge t$ inside the expectation are evaluated along the trajectory, $u_j = u_j(X_{:j})$.
The function inside the expectation is independent of $x_{:t-\ell}$, and the stochastic process $X$ has lag $\ell$. Further, the decision process $U$ is adapted to $X$ (cf. (3.2)) and thus has lag $\ell$ as well. With that it follows that (5.2) actually is
$$\tilde V_t(x_{t-\ell+1:t}, u_{t-\ell+1:t-1}) = \inf_{u_{t:T}(\cdot)} \mathbb{E}\Bigg( \sum_{i=t+1}^{T} \gamma^{i-1-t}\, c_i\big(X_{i-\ell:i}, u_{i-\ell:i-1}\big) \,\Bigg|\, X_{t-\ell+1:t} = x_{t-\ell+1:t},\, U_{t-\ell+1:t-1} = u_{t-\ell+1:t-1} \Bigg).$$
Employing Remark 4.12 we deduce the recursion
$$\tilde V_t(x_{t-\ell+1:t}, u_{t-\ell+1:t-1}) \ge \inf_{u_t} \mathbb{E}_{X_{t+1}}\Big( c_{t+1}(x_{t+1-\ell:t}, X_{t+1}, u_{t+1-\ell:t}) + \gamma\, \tilde V_{t+1}(x_{t-\ell+2:t}, X_{t+1}, u_{t-\ell+2:t}) \,\Big|\, X_{t-\ell+1:t} = x_{t-\ell+1:t},\, U_{t-\ell+1:t-1} = u_{t-\ell+1:t-1} \Big), \tag{5.3}$$
where equality indicates optimality. This backwards recursion leads to the following discussion on MDPs.

5.2 MDP

A Markov decision process (MDP) is a discrete-time stochastic control process. To this end we consider the cost functions (5.1) with lag $\ell =$
1, i.e.,
$$v(x_{:T}, u_{:T}) \coloneqq \sum_{t=1}^{T} \gamma^{t-1}\, c_t(x_{t-1}, x_t; u_{t-1}) \tag{5.4}$$
and a process $X$ with the same lag $\ell =$
1, i.e., a Markovian process. With that, the recursion (5.3) collapses further to
$$\tilde V_t(x_t) = \inf_{u_t} \mathbb{E}\big( c_{t+1}(x_t, X_{t+1}, u_t) + \gamma\, \tilde V_{t+1}(X_{t+1}) \,\big|\, X_t = x_t \big). \tag{5.5}$$
This recursion is well known for MDPs, and (5.5) is also known as backward induction involving the Bellman principle (cf. Bellman [1, 2]), which is of fundamental importance in dynamic programming.

Remark 5.1. The MDP literature rather considers trajectories which are themselves driven by the control $u$ (the control is called action in the MDP literature). To recognize this dependency in addition we can restate the recursion as
$$\tilde V_t(x_t) = \inf_{u_t} \mathbb{E}_{u_t}\big( c_{t+1}(x_t, X_{t+1}, u_t) + \gamma\, \tilde V_{t+1}(X_{t+1}) \,\big|\, X_t = x_t \big),$$
where $\mathbb{E}_u$ is the expectation with respect to the kernel $P_u(\cdot \mid x_t)$, which explicitly depends on the decision $u$.

The cost function (5.4) is also considered on an infinite horizon, i.e.,
$$v(x, u) \coloneqq \sum_{t=1}^{\infty} \gamma^{t-1}\, c_t(x_{t-1}, x_t; u_{t-1}); \tag{5.6}$$
problems in reinforcement learning are of this particular form (5.6). The value function (5.1) is bounded in the chosen setting if the cost functions are uniformly bounded, $|c_t| \le K < \infty$, and the learning rate satisfies $\gamma \in (-1, 1)$ (although most typical is $\gamma \in (0, 1)$).

A particularly interesting situation arises for cost functions which do not depend on the stage $t$, i.e., $c_t = c$, and transitions satisfying $(X_t, X_{t+1}) \sim (X, X')$. Then the value function $\tilde V_t$ does not depend on $t$ either, and the equation
$$\tilde V(x) = \inf_{u} \mathbb{E}\big( c(x, X', u) + \gamma\, \tilde V(X') \,\big|\, X = x \big) \tag{5.7}$$
holds. This is a fixed point equation, and Banach's fixed point theorem can be applied to prove existence and uniqueness of the value function $\tilde V$ in appropriate spaces. As well, equation (5.7) specifies an iterative scheme to improve the value function $\tilde V$ in consecutive steps. As an example we state the following, where we refer to Fleten et al.
[9] for a proof in a similar situation.

Theorem 5.2. Suppose that $c$ is continuous, $X \in K$ a.s. for some compact set $K \subset \mathbb{R}^n$, and $|\gamma| < 1$. Then the value function $\tilde V$ is continuous, $\tilde V \in C(K)$.

5.4 SDDP

The problem setting of stochastic dual dynamic programming (SDDP) considers a stagewise independent stochastic process $X_t$ (i.e., $X_t$ is independent of all preceding $X_{t'}$, $t' < t$), which is a further simplification of all situations described above. With $X_t \sim X$, the dynamic equation reduces further to
$$\tilde V_t(x_t) = \inf_{u_t} \mathbb{E}\big( c_{t+1}(x_t, X_{t+1}, u_t) + \gamma\, \tilde V_{t+1}(X_{t+1}) \big). \tag{5.8}$$
This is the simplest situation from a statistical perspective, and it is not surprising that large and extensive problem settings are accessible for numerical computations. The important SDDP algorithm for solving the problem (5.8) efficiently originated in Pereira and Pinto [19]. We refer to Shapiro [30] for an extended analysis of the algorithm, to Römisch and Guigues [28], and to Girardeau et al. [11] and Philpott and Guan [21] for convergence proofs of the algorithm.

Multistage stochastic optimization has many applications in varying areas, from finance to data science, to mention just two. The problems are popular and typically stated conditioned on partial realizations. This pathwise, or $\omega$-by-$\omega$, perspective lacks mathematical rigor. It is surprising that mathematical foundations regarding measurability are incomplete and still missing.

This paper clarifies that multistage optimization problems, even if given in an informal, pathwise or $\omega$-by-$\omega$ way, can be cast with mathematical rigor. We start by outlining the general problem and employ the Kolmogorov continuity theorem to verify that value functions are well defined, even if conditioned on sets of measure zero.

Verification theorems can be employed to confirm that candidate policies are optimal.
We further characterize optimal policies by involving martingales.

Markov decision processes, the Bellman principle for reinforcement learning and stochastic dual dynamic programming are probably the most well-known and common settings in the practice of dynamic programming. We derive these problem settings as special cases and, in this way, provide rigorous mathematical foundations.

References

[1] R. E. Bellman.
Dynamic Programming. Princeton University Press, Princeton, NJ, 1957.

[2] R. E. Bellman. Adaptive Control Processes. Princeton Legacy Library 2045. Princeton University Press, 1961. ISBN 978-1-4008-7466-8. doi:10.1002/nav.3800080314.

[3] D. Bertsekas. Dynamic Programming and Optimal Control. Athena Scientific. ISBN 1886529434.

[4] P. Carpentier, J.-P. Chancelier, G. Cohen, M. De Lara, and P. Girardeau. Dynamic consistency for stochastic optimal control problems. Annals of Operations Research, 200(1):247–263, 2012. doi:10.1007/s10479-011-1027-8.

[5] P. Carpentier, J.-P. Chancelier, G. Cohen, and M. De Lara. Stochastic Multi-Stage Optimization. Springer International Publishing, 2015. doi:10.1007/978-3-319-18138-7.

[6] N. Dunford and J. T. Schwartz. Linear Operators. Part I: General Theory. Wiley-Interscience, New York, 1957. URL http://books.google.com/books?id=DuJQAAAAMAAJ.

[7] E. A. Feinberg. On measurability and representation of strategic measures in Markov decision processes. In Institute of Mathematical Statistics Lecture Notes – Monograph Series, pages 29–43. Institute of Mathematical Statistics, 1996. doi:10.1214/lnms/1215453563.

[8] W. H. Fleming and H. M. Soner. Controlled Markov Processes and Viscosity Solutions. Springer, second edition, 2006. doi:10.1007/0-387-31071-1.

[9] S.-E. Fleten, E. Haugom, A. Pichler, and C. J. Ullrich. Structural estimation of switching costs for peaking power plants. European Journal of Operational Research, 285(1):23–33, 2020. doi:10.1016/j.ejor.2019.03.031.

[10] H. Föllmer and A. Schied. Stochastic Finance: An Introduction in Discrete Time. de Gruyter Studies in Mathematics 27. De Gruyter, Berlin, Boston, 2004. ISBN 978-3-11-046345-3. doi:10.1515/9783110218053. URL http://books.google.com/books?id=cL-bZSOrqWoC.

[11] P. Girardeau, V. Leclère, and A. B. Philpott. On the convergence of decomposition methods for multistage stochastic convex programs. Mathematics of Operations Research, 40(1):1–16, 2014. doi:10.1287/moor.2014.0664.

[12] K. Gowrisankaran. Measurability of functions in product spaces. Proceedings of the American Mathematical Society, 31:485–488, 1972. doi:10.1090/S0002-9939-1972-0291403-X.

[13] O. Kallenberg. Foundations of Modern Probability. Springer, New York, 2002. doi:10.1007/b98838.

[14] I. Karatzas and S. E. Shreve. Methods of Mathematical Finance. Stochastic Modelling and Applied Probability. Springer, 1998. doi:10.1007/b98840.

[15] A. Klenke. Probability Theory. Springer, London. doi:10.1007/978-1-4471-5361-0.

[16] G. Lan and Z. Zhou. Dynamic stochastic approximation for multi-stage stochastic optimization. Mathematical Programming, 2020. doi:10.1007/s10107-020-01489-y. URL https://arXiv.org/abs/1707.03324.

[17] N. Löhndorf, D. Wozabal, and S. Minner. Optimizing trading decisions for hydro storage systems using approximate dual dynamic programming. Operations Research, 61(4):810–823, 2013. doi:10.1287/opre.2013.1182.

[18] F. Maggioni and G. Ch. Pflug. Guaranteed bounds for general non-discrete multistage risk-averse stochastic optimization programs. SIAM Journal on Optimization, 29(1):454–483, 2019. doi:10.1137/17M1140601.

[19] M. V. F. Pereira and L. M. V. G. Pinto. Multi-stage stochastic optimization applied to energy planning. Mathematical Programming, 52(1–3):359–375, 1991. doi:10.1007/BF01582895.

[20] G. Ch. Pflug and A. Pichler. Multistage Stochastic Optimization. Springer Series in Operations Research and Financial Engineering. Springer, 2014. ISBN 978-3-319-08842-6. doi:10.1007/978-3-319-08843-3. URL https://books.google.com/books?id=q_VWBQAAQBAJ.

[21] A. B. Philpott and Z. Guan. On the convergence of stochastic dual dynamic programming and related methods. Operations Research Letters, 36(4):450–455, 2008. ISSN 0167-6377. doi:10.1016/j.orl.2008.01.013.

[22] A. B. Philpott, V. L. de Matos, and E. Finardi. On solving multistage stochastic programs with coherent risk measures. Operations Research, 61(4):957–970, 2013. doi:10.1287/opre.2013.1175.

[23] A. Pichler and A. Shapiro. Mathematical foundations of distributionally robust multistage optimization, 2021. URL https://arXiv.org/abs/2101.02498.

[24] R. T. Rockafellar. Integral functionals, normal integrands and measurable selections. In Nonlinear Operators and the Calculus of Variations, pages 157–207. Springer, 1976. doi:10.1007/BFb0079944.

[25] R. T. Rockafellar and R. J.-B. Wets. Nonanticipativity and $L^1$-martingales in stochastic optimization problems. Mathematical Programming Study, 6:170–187, 1976.

[26] R. T. Rockafellar and R. J.-B. Wets. On the interchange of subdifferentiation and conditional expectations for convex functionals. Stochastics, 7(3):173–182, 1982. doi:10.1080/17442508208833217.

[27] R. T. Rockafellar and R. J.-B. Wets. Variational Analysis. Springer Verlag, 1997. doi:10.1007/978-3-642-02431-3. URL https://books.google.com/books?id=w-NdOE5fD8AC.

[28] W. Römisch and V. Guigues. Sampling-based decomposition methods for multistage stochastic programs based on extended polyhedral risk measures. SIAM Journal on Optimization, 22(2):286–312, 2012. doi:10.1137/100811696.

[29] A. Ruszczyński. Risk-averse dynamic programming for Markov decision processes. Mathematical Programming, Ser. B, 125:235–261, 2010. doi:10.1007/s10107-010-0393-3.

[30] A. Shapiro. Analysis of stochastic dual dynamic programming method. European Journal of Operational Research, 209(1):63–72, 2010. doi:10.1016/j.ejor.2010.08.007.

[31] A. Shapiro. Time consistency of dynamic risk measures. Operations Research Letters, 40(6):436–439, 2012. doi:10.1016/j.orl.2012.08.007.

[32] A. Shapiro. Interchangeability principle and dynamic equations in risk averse stochastic programming. Operations Research Letters, 45(4):377–381, 2017. doi:10.1016/j.orl.2017.05.008.

[33] A. Shapiro. Tutorial on risk neutral, distributionally robust and risk averse multistage stochastic programming. European Journal of Operational Research, 2020. doi:10.1016/j.ejor.2020.03.065.

[34] A. Shapiro, W. Tekaya, J. P. da Costa, and M. Pereira Soares. Risk neutral and risk averse stochastic dual dynamic programming method. European Journal of Operational Research, 224(2):375–391, 2013. doi:10.1016/j.ejor.2012.08.022.

[35] A. Shapiro, D. Dentcheva, and A. Ruszczyński. Lectures on Stochastic Programming. MOS-SIAM Series on Optimization. SIAM, second edition, 2014. doi:10.1137/1.9780898718751.

[36] A. N. Shiryaev.