Two-stage Linear Decision Rules for Multi-stage Stochastic Programming
Merve Bodur∗ and James R. Luedtke†

∗ Department of Mechanical and Industrial Engineering, University of Toronto, [email protected]
† Department of Industrial and Systems Engineering, University of Wisconsin-Madison, [email protected]
March 20, 2018
Abstract
Multi-stage stochastic linear programs (MSLPs) are notoriously hard to solve in general. Linear decision rules (LDRs) yield an approximation of an MSLP by restricting the decisions at each stage to be an affine function of the observed uncertain parameters. Finding an optimal LDR is a static optimization problem that provides an upper bound on the optimal value of the MSLP, and, under certain assumptions, can be formulated as an explicit linear program. Similarly, as proposed by Kuhn, Wiesemann, and Georghiou (“Primal and dual linear decision rules in stochastic and robust optimization,” Math. Program., 130(1):177–209, 2011), a lower bound on the optimal value of the MSLP can be obtained by restricting the policies in its dual to follow an LDR. In this paper, we propose two-stage LDRs. The idea is to require only the state variables in an MSLP to follow an LDR, which is sufficient to obtain an approximation of an MSLP that is a two-stage stochastic linear program (2SLP). We similarly propose to apply an LDR only to a subset of the variables in the dual of the MSLP, which yields a 2SLP approximation of the dual that provides a lower bound on the optimal value of the MSLP. Although solving the corresponding 2SLP approximations exactly is intractable in general, we investigate how approximate solution approaches that have been developed for solving 2SLPs can be applied to solve these approximation problems, and derive statistical upper and lower bounds on the optimal value of the MSLP. In addition to potentially yielding better policies and bounds, this approach requires many fewer assumptions than are required to obtain an explicit reformulation when using the standard static LDR approach. A computational study on two example problems demonstrates that using a two-stage LDR can yield significantly better primal policies and modestly better dual policies than using policies based on a static LDR.

Keywords: Multi-stage stochastic programming, linear decision rules, two-stage approximation
1 Introduction

We present a new approach for approximately solving multi-stage stochastic linear programs (MSLPs). MSLPs model dynamic decision-making processes in which a decision is made, a stochastic outcome is observed, another decision is made, and so on, for T stages. At each stage, the decision vectors are constrained by linear constraints that depend on the history of observed stochastic outcomes. A solution of an MSLP is a policy, which defines the decisions to be made at each stage as a function of the observed outcomes up to that stage. The objective in an MSLP is to choose a policy that minimizes the expected cost over all stages. Although MSLPs can be used to model a wide variety of problems (e.g., [61]), they are notoriously hard to solve in general [17, 57].

There are a variety of methods available for MSLPs in the case that the stochastic process is represented by a scenario tree [12, 28, 32, 55]. Such algorithms include nested Benders decomposition [8, 10, 22], progressive hedging [49], and aggregation and partitioning [2, 9], and they enable the solution of MSLPs with possibly very large scenario trees. Unfortunately, as discussed in [57], the size of a scenario tree needed to obtain even a modestly accurate approximation grows exponentially in the number of stages. For example, a 10-stage problem in which the uncertainty in each stage is represented by just 50 realizations would yield a scenario tree having 50^9, or nearly 2 · 10^15, scenarios, making any approach that requires even a single pass through the scenario tree impossible.

Under some conditions, including stage-wise independence of the random variables, stochastic dual dynamic programming (SDDP) [42] can overcome the difficulty of exploding scenario tree size by constructing a single value function approximation for each stage. The SDDP algorithm converges almost surely on a finite scenario tree [14, 53] (see also [24, 26, 44] for related results). In some cases, such as additive dependence [33] and Markov dependence [43], the assumption of stage-wise independence can be satisfied via an appropriate reformulation, e.g., see Example 10 in [54] (see also [25] for other types of dependence). However, such reformulations are not applicable for stage-wise dependence of random recourse matrices (i.e., in the coefficients of the constraints) or of objective coefficients. Similar approaches that exploit stage-wise independence and value function approximations include multi-stage stochastic decomposition [51] and approximate dynamic programming [47].

An alternative approach to handling the complexity of MSLP is to restrict the functional form of the policy. One such approach is the use of linear decision rules (LDRs). The idea of an LDR is to require that all decisions made in each stage be a linear (or affine) function of the observed random outcomes up to that stage. The problem then reduces to a static problem of finding the best LDR, whose expected cost then yields an upper bound on the optimal value of the MSLP. In this paper, we refer to this use of an LDR as a static LDR. While LDRs have a long history (see, e.g., [21]), they have recently gained renewed interest in the mathematical optimization literature after their application to adjustable robust optimization in [4]. The adaptation of this approach to MSLP was presented in [57], and Kuhn et al. [35] analyzed the application of a static LDR in the dual of the MSLP, which yields a lower bound on the optimal value of the MSLP.
Moreover, under certain assumptions, the static approximations obtained after restricting the primal and dual policies to be an LDR are both tractable linear programs, as shown in [57] and [35], respectively. The assumptions include stage-wise independence (or a slight generalization), compact and polyhedral support, and that uncertainty is limited to the right-hand side of the constraints. While in some cases static LDR policies provide high-quality approximations to MSLP, they have the potential to be significantly suboptimal. Better policies (primal and dual) can be obtained by considering more flexible (nonlinear) rules such as (static) piecewise linear decision rules [13] and polynomial decision rules [3].

We propose a new use of an LDR, which we refer to as two-stage linear decision rules. The key idea is to partition the decision variables into state and recourse decision variables, with the property that if the state variables are fixed, then the problem decouples into a separate problem for each stage, involving only recourse decision variables. If one applies an LDR only to the state variables, then the problem reduces to a two-stage stochastic linear program (2SLP), in contrast to the static problem obtained when using a static LDR. The advantage of two-stage LDRs is that they free the recourse variables from the LDR requirement, thus allowing for a potentially improved policy. Indeed, there exist feasible 2SLPs that are infeasible if one enforces an LDR on the recourse variables [21]. This idea of reducing an MSLP to a 2SLP is similar to that proposed by Ahmed [1], except that in [1] the state variables are completely decided in the first stage and fixed, whereas we allow them to vary according to an LDR. We also consider applying a two-stage LDR in the dual of an MSLP, exploiting the observation that imposing an LDR restriction only on the dual variables associated with the state equations is sufficient to obtain a 2SLP that approximates the multi-stage dual problem. We investigate how approximate solutions to the associated primal and dual approximation problems can be used to obtain feasible policies with associated statistical estimates on the optimality gap. Our analysis suggests that this can be done under mild assumptions, for example that the primal problem exhibits relatively complete recourse (i.e., for any current state there exists a feasible next decision and state) and has a bounded feasible region with probability 1. We illustrate the two-stage LDR approach on two example problems: an inventory planning problem similar to that studied in [4, 35], and a capacity expansion problem proposed in [15]. We find that, for these problems, using two-stage LDRs yields significantly better primal policies (upper bounds), and modestly improves on the lower bounds, when compared to using static LDRs. For the capacity expansion problem, we also compare the two-stage LDR policies and bounds to those obtained using the SDDP algorithm, when run for a similar amount of computational time. We find that the SDDP algorithm yields similar lower bounds and better policies for this problem, as expected since the SDDP algorithm is known to converge to an optimal solution. Thus, the two-stage LDR approximation is expected to be useful primarily for problems where the SDDP algorithm does not apply.

A significant challenge to using two-stage LDRs is that the resulting 2SLP is in general intractable to solve exactly.
Indeed, 2SLP is #P-hard [17, 27] due to the difficulty of evaluating the expectation of the recourse function. However, as argued in [57], under mild conditions Monte Carlo sampling-based methods can provide solutions of modest accuracy to a 2SLP (such a statement cannot be made for MSLP). Thus, an important benefit of the two-stage LDR approach is that it enables the application of the long history of research into solving 2SLPs to the multi-stage setting. While using a sampling-based method may lead to a suboptimal solution of the 2SLP approximations, our hope is that this suboptimality may be more than offset by the improvement gained by eliminating the LDR requirement on the recourse decisions that is imposed when using a static LDR. In addition, when using a sampling-based approach, the assumptions that are required for obtaining a tractable reformulation when applying a static LDR are no longer needed. In particular, the random variables need not have polyhedral (or even bounded) support, the constraint matrices may be random and dependent across time stages, and the LDR may be based on nonlinear functions of the random variables.

The two-stage LDR approach we propose can also be applied to certain multi-stage stochastic integer programs, in which some of the decision variables are required to be integer valued. In particular, for the primal problem, the approach applies directly provided integrality restrictions are imposed only on the recourse variables. When the state variables have integrality restrictions as well, the form of the decision rule applied to the state variables must be modified, but the two-stage approach still applies. We refer to [7] for one possible such decision rule, based on piecewise-linear binary functions. We remark that combining our approach with that of [7] would have potential benefit in terms of both tractability and policy quality, as removing the piecewise-linear binary decision rule requirement from the recourse variables both eliminates the need to design such a rule and gives those decisions more flexibility.

The rest of this paper is organized as follows. Section 2 defines the MSLP, reviews the static LDR approach, and presents the proposed two-stage LDR approach, including discussion of how to solve the approximate problem and obtain statistical upper bounds on the original MSLP. Section 3 conducts a similar analysis for the dual of an MSLP, yielding an approach for finding statistical lower bounds on an MSLP. We present illustrative applications in Sections 4 and 5, and make concluding remarks in Section 6.

2 Two-stage LDRs for the primal problem

We formulate an MSLP with T ≥ 2 stages. For integers a ≤ b, we write [a, b] := {a, a+1, ..., b} and [b] := {1, ..., b}. The MSLP is

  min_{x,s}  E[ Σ_{t∈[T]} ( c_t(ξ^t)^⊤ x_t(ξ^t) + h_t(ξ^t)^⊤ s_t(ξ^t) ) ]                                 (1a)
  s.t.  A_t(ξ^t) s_t(ξ^t) + B_t(ξ^t) s_{t−1}(ξ^{t−1}) + C_t(ξ^t) x_t(ξ^t) = b_t(ξ^t),  t ∈ [T], P-a.s.,  (1b)
        (x_t(ξ^t), s_t(ξ^t)) ∈ X_t(ξ^t),  t ∈ [T], P-a.s.,                                               (1c)

where for t ∈ [T],

  X_t(ξ^t) := { x_t ∈ R^{p_t}, s_t ∈ R^{q_t} : D_t(ξ^t) s_t + E_t(ξ^t) x_t ≥ d_t(ξ^t) }.

Here, {ξ_t}_{t=1}^T is a stochastic process with probability distribution P and support Ξ, where ξ_1 = 1 for all ξ ∈ Ξ (i.e., the data in stage 1 is deterministic), ξ_r is a random vector taking values in R^{ℓ_r} for r ∈ [2, T], and ξ^t := (ξ_1, ..., ξ_t) for t ∈ [T]. Letting ℓ_1 = 1, we denote ℓ^t := Σ_{r=1}^t ℓ_r for t ∈ [T]. The s and x variables are referred to as state and recourse variables, respectively.
Similarly, (1b) and (1c) are referred to as state equations and recourse constraints, respectively. The objective is to minimize the expected total cost. The functions b_t : R^{ℓ^t} → R^{m_t}, d_t : R^{ℓ^t} → R^{n_t}, A_t : R^{ℓ^t} → R^{m_t×q_t}, B_t : R^{ℓ^t} → R^{m_t×q_{t−1}}, C_t : R^{ℓ^t} → R^{m_t×p_t}, D_t : R^{ℓ^t} → R^{n_t×q_t}, and E_t : R^{ℓ^t} → R^{n_t×p_t} define the random coefficients as functions of ξ^t. Frequently in the literature, these are assumed to be affine functions of ξ^t, but we will not need this assumption in this work. In (1b) for t = 1, we adopt the convention that s_0(ξ^0) = 0. The constraints are required to be satisfied almost surely with respect to the distribution of the stochastic process, denoted by “P-a.s.” We note that any MSLP can be brought into the form of (1) by introducing additional variables and constraints. Throughout the paper, we assume that (1) is feasible and has an optimal solution, and we denote its optimal objective value by z_MSLP.

2.1 Static LDRs

A tractable approximation of MSLP can be obtained by restricting the decision policies to a certain form, i.e., by restricting the decisions to be a special function of the uncertain parameters. A linear decision rule is a policy in which the decisions at each stage t are restricted to be a linear function of the observed random variables ξ^t up to that stage. We refer to policies in which all the decisions are required to follow an LDR as static LDR policies. Specifically, a static LDR policy has the form

  s_t(ξ^t) = β_t Φ_t(ξ^t),   (2a)
  x_t(ξ^t) = Θ_t Φ_t(ξ^t),   (2b)

where Θ_t ∈ R^{p_t×K_t} and β_t ∈ R^{q_t×K_t} are free parameters of the LDR, and Φ_t(ξ^t) = (Φ_{t1}(ξ^t), ..., Φ_{tK_t}(ξ^t)) : R^{ℓ^t} → R^{K_t} for all t ∈ [T] is a vector of given LDR basis functions. We refer to the Θ and β variables as the LDR variables. We assume K_1 = 1 and Φ_{t1}(ξ^t) ≡ 1 for all t ∈ [T]. Often, the basis functions are the uncertain parameters themselves, i.e., K_t = ℓ^t and Φ_{tk}(ξ^t) = (ξ^t)_k, where (ξ^t)_k denotes the k-th component of the vector ξ^t. In this case, we refer to the basis functions as the standard basis functions. Note that with the convention ξ_1 ≡ 1, the decisions under the standard basis functions are actually affine functions of the random variables (ξ_2, ..., ξ_t). Finally, for notational convenience we adopt the convention that Φ_0 ≡ 0, so that any term involving Φ_0 disappears.
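To make the notation concrete, the following minimal Python sketch evaluates a policy of the form (2a) with the standard basis functions along one sampled history. All dimensions, the uniform outcome model, and the random β coefficients are illustrative choices of ours, not data from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 4                      # number of stages
ell = [1, 2, 2, 2]         # dim of xi_t per stage (xi_1 = 1 is deterministic)
q = [3, 3, 3, 3]           # dim of the state s_t per stage

# Standard basis functions: Phi_t(xi^t) = xi^t = (xi_1, ..., xi_t) stacked.
def phi(history):
    """Stack the observed outcomes xi_1, ..., xi_t into one vector."""
    return np.concatenate(history)

# LDR coefficients beta_t in R^{q_t x K_t}, with K_t = ell^t (cumulative dim).
K = np.cumsum(ell)
beta = [rng.normal(size=(q[t], K[t])) for t in range(T)]

# Simulate one sample path and apply the LDR s_t = beta_t Phi_t(xi^t).
history = [np.ones(1)]                 # xi_1 = 1 by convention
for t in range(T):
    if t > 0:
        history.append(rng.uniform(0.5, 1.5, size=ell[t]))  # toy xi_t draw
    s_t = beta[t] @ phi(history)       # state decision at stage t
    print(f"stage {t + 1}: s_t = {np.round(s_t, 3)}")
```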
Substituting the LDRs of the form (2) into the MSLP (1) yields the following approximation of MSLP, which we call P-LDR:

  min_{Θ,β}  E[ Σ_{t∈[T]} ( c_t(ξ^t)^⊤ Θ_t Φ_t(ξ^t) + h_t(ξ^t)^⊤ β_t Φ_t(ξ^t) ) ]                          (3)
  s.t.  A_t(ξ^t) β_t Φ_t(ξ^t) + C_t(ξ^t) Θ_t Φ_t(ξ^t) + B_t(ξ^t) β_{t−1} Φ_{t−1}(ξ^{t−1}) = b_t(ξ^t),  t ∈ [T], P-a.s.,
        D_t(ξ^t) β_t Φ_t(ξ^t) + E_t(ξ^t) Θ_t Φ_t(ξ^t) ≥ d_t(ξ^t),  t ∈ [T], P-a.s.,
        Θ_t ∈ R^{p_t×K_t}, β_t ∈ R^{q_t×K_t},  t ∈ [T].

We let z_LDR denote the optimal value of P-LDR, where here and elsewhere we adopt the convention that if a minimization (maximization) problem is infeasible, the associated optimal value is defined to be +∞ (−∞). Note that all the decision variables are deterministic, i.e., they have to be determined before observing any random outcomes, and hence this problem is a static problem. P-LDR is a semi-infinite program having infinitely many constraints. It is observed in [57] (see also [13, 35]) that P-LDR can be reformulated as a linear program (LP) using robust optimization techniques under the following assumptions:

A1. The standard basis functions are used.
A2. For all t ∈ [T], the constraint matrices A_t, B_t, C_t, D_t, E_t are independent of the random vector ξ^T, and b_t(ξ^t) and d_t(ξ^t) are affine functions of ξ^t.
A3. The support Ξ is a nonempty compact polyhedron.

The LP reformulation has constraints of the form (β, Θ, w) ∈ D, where w are auxiliary variables and D is an explicitly given polyhedron. The size of this LP scales well (typically grows only quadratically) with the number of stages T. Moreover, the LP does not require any discretization of P (e.g., by Monte Carlo sampling), and instead only uses a polyhedral description of Ξ and the second-order moment matrix of the random variables. These results have been generalized in [23] to the case of conic support, where A3 is replaced by an assumption that Ξ is described by a finite set of conic inequalities, in which case P-LDR (3) is reformulated as a conic program.

As P-LDR (3) is a restriction of MSLP, it provides an upper bound on z_MSLP. However, the benefit of tractability comes at the expense of a loss of optimality: the obtained upper bound can be substantially far from the optimal value of MSLP. Indeed, for 2SLPs, the optimal recourse decisions are very rarely linear in the random variables, although there always exists an optimal piecewise linear decision rule [21].

2.2 Two-stage LDRs

We propose two-stage LDRs, which yield upper bounds on z_MSLP that cannot be worse than the ones obtained by P-LDR (3). The key idea is to apply an LDR only to the state variables, to obtain a two-stage approximation of MSLP rather than a static approximation. Substituting the LDR of the form (2a) into the MSLP (1) yields

  min_{x,β}  Σ_{t∈[T]} E[ c_t(ξ^t)^⊤ x_t(ξ^t) ] + Σ_{t∈[T]} E[ h_t(ξ^t)^⊤ β_t Φ_t(ξ^t) ]                   (4)
  s.t.  A_t(ξ^t) β_t Φ_t(ξ^t) + C_t(ξ^t) x_t(ξ^t) + B_t(ξ^t) β_{t−1} Φ_{t−1}(ξ^{t−1}) = b_t(ξ^t),  t ∈ [T], P-a.s.,
        D_t(ξ^t) β_t Φ_t(ξ^t) + E_t(ξ^t) x_t(ξ^t) ≥ d_t(ξ^t),  t ∈ [T], P-a.s.,
        x_t(ξ^t) ∈ R^{p_t},  t ∈ [T], P-a.s.,
        β_t ∈ R^{q_t×K_t},  t ∈ [T].

We denote this problem as P-LDR-2S and its optimal value as z_2S.
This problem is a 2SLP, which can be equivalently written as follows, where we drop the dependence of the first-stage variables on ξ_1 ≡ 1 (so that s_1(ξ^1) = β_1 Φ_1(ξ^1) = β_1):

  z_2S = min_{x_1,β}  c_1^⊤ x_1 + Σ_{t∈[T]} E[ h_t(ξ^t)^⊤ β_t Φ_t(ξ^t) ] + E[ Q(β, ξ^T) ]
         s.t.  A_1 β_1 + C_1 x_1 = b_1,
               D_1 β_1 + E_1 x_1 ≥ d_1,
               x_1 ∈ R^{p_1},
               β_t ∈ R^{q_t×K_t},  t ∈ [T],

where Q(β, ξ^T) := Σ_{t∈[2,T]} Q_t(β, ξ^t) and, for t ∈ [2, T],

  Q_t(β, ξ^t) := min_{x_t}  c_t(ξ^t)^⊤ x_t                                                                  (5a)
                 s.t.  C_t(ξ^t) x_t = b_t(ξ^t) − A_t(ξ^t) β_t Φ_t(ξ^t) − B_t(ξ^t) β_{t−1} Φ_{t−1}(ξ^{t−1}), (5b)
                       E_t(ξ^t) x_t ≥ d_t(ξ^t) − D_t(ξ^t) β_t Φ_t(ξ^t),                                     (5c)
                       x_t ∈ R^{p_t}.                                                                       (5d)
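Once β̂ and a history ξ^t are fixed, each Q_t in (5) is a deterministic LP. A minimal sketch using scipy (the stage data below is an arbitrary toy instance of ours, and the right-hand sides are assumed to already include the β-dependent terms from (5b)–(5c)):

```python
import numpy as np
from scipy.optimize import linprog

# Toy stage-t data for the recourse LP (5); dimensions are illustrative.
c_t = np.array([1.0, 2.0])                 # cost on x_t
C_t = np.array([[1.0, 1.0]])               # equality matrix in (5b)
E_t = np.array([[1.0, 0.0], [0.0, 1.0]])   # inequality matrix in (5c)

def Q_t(rhs_eq, rhs_ineq):
    """Solve (5): min c_t'x s.t. C_t x = rhs_eq, E_t x >= rhs_ineq, x free.
    Returns +inf if the stage problem is infeasible."""
    res = linprog(c=c_t,
                  A_eq=C_t, b_eq=rhs_eq,
                  A_ub=-E_t, b_ub=-rhs_ineq,   # E_t x >= d  <=>  -E_t x <= -d
                  bounds=[(None, None)] * len(c_t),
                  method="highs")
    return res.fun if res.status == 0 else np.inf

# e.g. rhs_eq = b_t - A_t @ beta_t @ phi_t - B_t @ beta_prev @ phi_prev
print(Q_t(rhs_eq=np.array([3.0]), rhs_ineq=np.array([0.5, 0.5])))  # 3.5
```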
The following proposition, immediate from the definitions of the associated problems, summarizes the relationship between the optimal values of MSLP, P-LDR (3), and P-LDR-2S (4).

Proposition 2.1. The following inequalities hold: z_MSLP ≤ z_2S ≤ z_LDR.

The difference between z_LDR and z_2S can be arbitrarily large. In particular, an example is given in [21] of a 2SLP having relatively complete recourse for which P-LDR (3) is infeasible (z_LDR = +∞), while the 2SLP (and hence P-LDR-2S (4)) is feasible. Unfortunately, the techniques used to derive a tractable reformulation of P-LDR (3) do not yield an efficiently computable reformulation of P-LDR-2S (4), even under assumptions A1–A3. In the next section, we review approaches for obtaining an approximate solution, say β̂, of P-LDR-2S (4). Then, in Section 2.4 we discuss techniques for obtaining a feasible policy (and hence estimating an upper bound on z_MSLP) using such a solution.
2.3 Solving the two-stage approximation

There is a huge literature on (approximately) solving 2SLPs. In this section, we present a brief overview of relevant approaches, with a focus on identifying the required assumptions. We refer the reader to [55] for more details.

A common approach for approximately solving a 2SLP is sample average approximation, in which P is approximated by a discrete probability measure P̂ that assigns positive weight only to a finite (relatively small) number of realizations of ξ^T, which are called scenarios. In this way, the intractable expectation term is replaced with a sum. Scenarios may be constructed by a variety of techniques, such as Monte Carlo, quasi-Monte Carlo, and Latin hypercube sampling (e.g., [30, 34, 38, 41, 56]). For the purpose of this paper, we consider only the conceptually simplest case in which scenarios are generated via independent Monte Carlo sampling.

Let ξ^T_j, j = 1, ..., N, be an independent and identically distributed (i.i.d.) random sample of the random vector ξ^T, and define the sample average approximation (SAA) problem:

  ẑ^N_2S := min_{x_1,β}  c_1^⊤ x_1 + Σ_{t∈[T]} E[ h_t(ξ^t)^⊤ β_t Φ_t(ξ^t) ] + (1/N) Σ_{j∈[N]} Q(β, ξ^T_j)   (6a)
            s.t.  A_1 β_1 + C_1 x_1 = b_1,          (6b)
                  D_1 β_1 + E_1 x_1 ≥ d_1,          (6c)
                  x_1 ∈ R^{p_1},                    (6d)
                  β_t ∈ R^{q_t×K_t},  t ∈ [T].      (6e)

Once the sample is fixed, the SAA problem can be solved by any approach for solving the above, now deterministic, problem. In particular, the L-shaped decomposition algorithm [59] or a regularized variant [36, 50] can be applied, with the further advantage that the subproblem obtained with fixed β̂ decomposes by both scenario and stage due to the relationship Q(β, ξ^T) = Σ_{t∈[2,T]} Q_t(β, ξ^t). The coefficients on β_t in the objective function (6a) can also be estimated by sampling in case the terms E[h_tj(ξ^t) Φ_tk(ξ^t)] cannot be computed efficiently.

If (i) there exists β̄ such that E[Q(β, ξ^T)] < ∞ for all β in a neighborhood of β̄, and (ii) the set of optimal solutions to P-LDR-2S (4) is nonempty and bounded, then, because Q(·, ξ^T) is a convex function for all ξ^T ∈ Ξ, Theorem 5.4 of [55] applies and implies that

  ẑ^N_2S → z_2S with probability 1 as N → ∞,   (7)

and also that the set of optimal solutions to (6) converges to the set of optimal solutions of P-LDR-2S (4).

Stronger results on the convergence of ẑ^N_2S to z_2S require additional assumptions. For example, a central limit theorem result (e.g., Theorem 5.7 in [55]) can be obtained under the assumptions that E[Q(β̄, ξ^T)^2] < ∞ for some β̄, and that there exists a measurable function f : Ξ → R_+ such that E[f(ξ^T)^2] is finite and |Q(β, ξ^T) − Q(β′, ξ^T)| ≤ f(ξ^T) ‖β − β′‖ for all β, β′ and almost every ξ^T ∈ Ξ. Bounds on the sample size required for (6) to yield an ε-optimal solution to P-LDR-2S (4) with probability at least 1 − α are derived in [52, 55, 56, 57]. These bounds scale linearly with the dimension of the first-stage variables, β and x_1 in this case. The dependence on the confidence α is ln(1/α), so that high confidence can be achieved, but the dependence on ε is O(1/ε^2), which is why sampling is limited to obtaining “medium accuracy” solutions [57]. These stronger results all require, at least, that Q(β, ξ^T) is finite for every first-stage solution β and almost every ξ^T ∈ Ξ.
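As an illustration of the convergence in (7), the sketch below computes SAA estimates ẑ^N_2S for a toy problem with a one-dimensional β and a closed-form recourse value (a newsvendor-style Q of our own construction, solved by crude grid search); as N grows, the estimates stabilize around the true optimal value.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy recourse value: Q(beta, xi) = 5 * max(xi - beta, 0)
# (a shortfall below the random demand xi is penalized at rate 5).
def Q(beta, xi):
    return 5.0 * np.maximum(xi - beta, 0.0)

def saa_value(N):
    """Solve min_beta beta + (1/N) sum_j Q(beta, xi_j) over a grid."""
    xi = rng.exponential(scale=1.0, size=N)     # sampled scenarios
    grid = np.linspace(0.0, 10.0, 2001)         # crude search over beta
    vals = grid + np.array([Q(b, xi).mean() for b in grid])
    return vals.min()

for N in [10, 100, 1000, 10000]:
    print(N, round(saa_value(N), 3))
# True optimum: beta* = ln 5 (the 0.8-quantile of xi), z = ln 5 + 1 ~ 2.609.
```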
In order to facilitate the solution of P-LDR-2S (4) via a sampling procedure, we also consider adding constraints β ∈ B ⊆ R^{τ_P}, where τ_P := Σ_{t∈[T]} q_t K_t, to P-LDR-2S (4) (and to the SAA (6)). For example, some of the convergence results require the first-stage feasible region to be bounded, in which case we may define B by limiting the absolute value of each component of β to be less than a large constant. If the constant is not chosen large enough, then this may degrade the quality of the solution obtained, but this can be detected after solving the SAA problem by checking whether any of the bound constraints are tight. More significantly, most of the SAA convergence results require the following relatively complete recourse assumption on the set B:
Assumption 2.2. For all β ∈ B, Q(β, ξ^T) < +∞, P-a.s.

If P-LDR-2S (4) already has relatively complete recourse, then we can take B = R^{τ_P}, and hence impose no additional constraints. Otherwise, adding the constraints β ∈ B has the potential to make the approximation more conservative. Deriving a set B that satisfies Assumption 2.2 is a difficult task in general. However, relatively complete recourse can often be achieved by appropriate modeling, e.g., by introducing “artificial” variables that allow violation of a constraint, where the violation amount is then penalized in the objective function. Derivation of a set B satisfying Assumption 2.2 may then be possible using ad hoc techniques. We provide an example of how this can be done in an inventory planning problem in Section 4.2. Another possibility, if assumptions A1–A3 hold, is to use the robust optimization techniques of [35, 57] to derive a tractable set D such that (β, Θ) is feasible to P-LDR (3) if and only if there are values of auxiliary variables w such that (β, Θ, w) ∈ D. Then B = proj_β(D) would satisfy Assumption 2.2. This construction of B is more conservative than necessary, because it restricts β to values for which there is also a Θ that makes the static LDR policy defined in (2) feasible P-a.s. However, the resulting policy could still potentially be better (and surely would not be worse) than the static LDR policy obtained from P-LDR (3), since enforcing β ∈ proj_β(D) does not require the recourse decisions to follow an LDR policy (it only requires the existence of a feasible LDR policy).

P-LDR-2S (4), with the additional constraints β ∈ B, can also be approximately solved by stochastic approximation [48] or one of its robust extensions, e.g., [40, 46], when Assumption 2.2 holds and B is bounded.

Finally, we remark that if we cannot derive a set B satisfying Assumption 2.2, results about sampling-based approximation of chance-constrained programs derived in [11] can be used to show that an optimal solution of the SAA problem (6) yields a policy that is feasible for a large fraction of the random outcomes. This has previously been used in [6, 7, 60] when using sampling to approximately solve static approximations derived from finitely adaptable and piecewise-linear decision rules. Although the two-stage LDR policy itself is not necessarily feasible P-a.s. in this case, in the next section we discuss how an approximate solution β̂ could still be used to guide a feasible policy.
2.4 Obtaining feasible policies and statistical upper bounds on z_MSLP

Let (x̂_1, β̂) be an approximate first-stage solution to P-LDR-2S (4). We discuss how such a solution can be used to obtain a feasible policy for the MSLP (1), which in turn can be used to estimate an upper bound on z_MSLP. We consider two possibilities for obtaining such a policy, depending on whether or not a set B satisfying Assumption 2.2 is used.

We first consider the case that β̂ ∈ B for a set B satisfying Assumption 2.2. In this case, (x̂_1, β̂) defines a feasible solution to P-LDR-2S (4) and a feasible two-stage LDR policy for MSLP. In particular, at stage t, if the current history is ξ^t, the state variable decisions are given by using β̂ in the LDR (2a), and the recourse decisions are obtained by solving (5), again substituting β̂ for β. As this solution defines a feasible policy, its expected cost provides an upper bound on z_2S and z_MSLP. The expected cost of the policy defined by (x̂_1, β̂) can be estimated by generating an independent sample of ξ^T, say {ξ^T_j}_{j=1}^{N′}, and computing

  c_1^⊤ x̂_1 + Σ_{t∈[T]} E[ h_t(ξ^t)^⊤ β̂_t Φ_t(ξ^t) ] + (1/N′) Σ_{j∈[N′]} Q(β̂, ξ^T_j).

Because β̂ is fixed in this evaluation step, it would generally be computationally feasible to use N′ ≫ N. The values Q(β̂, ξ^T_j) for j ∈ [N′] can also be used to construct a confidence interval on the objective value of (x̂_1, β̂), and hence a statistical upper bound on z_MSLP.

We next consider the case when we do not know that β̂ ∈ B for a set B satisfying Assumption 2.2, so that we do not know a priori that the two-stage LDR defined by β̂ yields a feasible policy. To construct a policy in this case, we make the following relatively complete recourse assumption on the original problem MSLP.
Assumption 2.3. For all ξ^T ∈ Ξ and each t ∈ [2, T], if the random vectors {(s_r(ξ^r), x_r(ξ^r))}_{r∈[t−1]} satisfy the constraints of MSLP for r ∈ [t−1], then there exists (s_t, x_t) that satisfies the constraints of MSLP in stage t:

  A_t(ξ^t) s_t + C_t(ξ^t) x_t = b_t(ξ^t) − B_t(ξ^t) s_{t−1}(ξ^{t−1}),   (x_t, s_t) ∈ X_t(ξ^t).

In other words, this assumption states that in any stage t, for any value of the previous state variables s_{t−1}(ξ^{t−1}) that could be obtained from past realizations of the random outcomes and past feasible decisions, there always exists a feasible set of decisions in the current stage (see, e.g., [29]).

Under Assumption 2.3, we can implement a policy that is guided by β̂, which we refer to as a state-target tracking (STT) policy. Specifically, at stage t = 1, we implement the solution x^STT_1 = x̂_1 and s^STT_1 = β̂_1. Then, for each stage t ∈ [2, T], we first observe ξ_t (thus, we have ξ^t), and then solve the problem (deterministic for this fixed ξ^t)

  min_{x_t,s_t}  c_t(ξ^t)^⊤ x_t + h_t(ξ^t)^⊤ s_t + ρ ‖ s_t − β̂_t Φ_t(ξ^t) ‖                                (8a)
  s.t.  A_t(ξ^t) s_t + C_t(ξ^t) x_t = b_t(ξ^t) − B_t(ξ^t) s^STT_{t−1}(ξ^{t−1}),   (x_t, s_t) ∈ X_t(ξ^t),   (8b)

where ρ ≥ 0 is a penalty parameter and ‖·‖ is any norm, and we let the optimal solution be x^STT_t(ξ^t), s^STT_t(ξ^t). For any ξ^T ∈ Ξ, all problems in this sequence are feasible when Assumption 2.3 holds, and hence this yields a feasible policy for MSLP. Observe that when ρ = 0, the policy reduces to a pure myopic policy that only considers the cost of decisions in each stage, without considering the impact of s_t on future costs. Using larger values of ρ > 0 guides the policy toward following the LDR defined by β̂ on the state variables. The cost of the STT policy under a realization ξ of the stochastic process is

  Σ_{t∈[T]} ( c_t(ξ^t)^⊤ x^STT_t(ξ^t) + h_t(ξ^t)^⊤ s^STT_t(ξ^t) ).

The expected cost of the STT policy is an upper bound on the optimal value of MSLP, and a confidence interval on this expected cost can be obtained by simulation with independent replications. We do not know an a priori upper bound on the optimality gap between the expected cost of the STT policy and the optimal value z_MSLP. However, the dual two-stage LDR discussed in Section 3 may be used to estimate a lower bound on z_MSLP, which can be used to provide an a posteriori statistical bound on the optimality gap of the STT policy. The value of the parameter ρ can be selected by estimating the expected cost of the policy under different values of ρ and choosing the most promising value, or by using optimization-via-simulation techniques [20, 31]. For example, in our numerical experiments, we used a fixed, relatively small sample (N′ = 100) and applied a variant of a golden section algorithm to find a value of ρ that approximately minimizes the estimated cost given by this sample. See Section 5.2 for more details. Once the value of ρ is chosen, the expected cost of the resulting policy is evaluated using a larger sample. Note that under the STT policy, even the decisions s^STT_t(ξ^t) need not have the form of an LDR. Thus, simulating this policy yields an estimate of an upper bound on z_MSLP, but not necessarily on z_2S.
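A sketch of the STT rollout on a toy single-state instance (all data illustrative and our own; the target LDR here depends only on the current outcome for simplicity). We take ‖·‖ to be the 1-norm, one valid choice in (8a), linearized with an auxiliary variable so that each stage problem remains an LP:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)

T, rho = 5, 2.0
c, h = 1.0, 0.1                  # production and holding cost rates
xbar, smin, smax = 6.0, 0.0, 10.0
beta_hat = np.array([3.0, 0.8])  # toy LDR: target s_t = 3.0 + 0.8 * xi_t

def stt_stage(s_prev, xi_t, target):
    """Solve (8) with a 1-norm penalty: vars (x_t, s_t, u), u >= |s_t - target|."""
    cost = [c, h, rho]
    A_eq = [[-1.0, 1.0, 0.0]]            # s_t = s_prev + x_t - xi_t
    b_eq = [s_prev - xi_t]
    A_ub = [[0.0, 1.0, -1.0],            #  s_t - u <= target
            [0.0, -1.0, -1.0]]           # -s_t - u <= -target
    b_ub = [target, -target]
    bounds = [(0.0, xbar), (smin, smax), (0.0, None)]
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    assert res.status == 0               # feasible by relatively complete recourse
    x_t, s_t, _ = res.x
    return x_t, s_t, c * x_t + h * s_t   # realized stage cost (penalty excluded)

s, total = beta_hat[0], 0.0              # stage-1 state s_1 = beta_hat_1
for t in range(2, T + 1):                # (stage-1 cost omitted for brevity)
    xi = rng.uniform(1.0, 5.0)
    x, s, stage_cost = stt_stage(s, xi, target=beta_hat @ np.array([1.0, xi]))
    total += stage_cost
print("simulated policy cost:", round(total, 2))
```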
3 Two-stage LDRs for the dual problem

In this section, we apply a two-stage LDR to the dual of MSLP, with the goal of obtaining lower bounds on the optimal value of MSLP. The dual of MSLP, which we refer to as D-MSLP, is the problem (see [18]):

  max_{λ,γ}  E[ Σ_{t∈[T]} ( b_t(ξ^t)^⊤ λ_t(ξ^t) + d_t(ξ^t)^⊤ γ_t(ξ^t) ) ]                                  (9a)
  s.t.  E[ B_{t+1}(ξ^{t+1})^⊤ λ_{t+1}(ξ^{t+1}) | ξ^t ] + A_t(ξ^t)^⊤ λ_t(ξ^t) + D_t(ξ^t)^⊤ γ_t(ξ^t) = h_t(ξ^t),  t ∈ [T], P-a.s.,   (9b)
        C_t(ξ^t)^⊤ λ_t(ξ^t) + E_t(ξ^t)^⊤ γ_t(ξ^t) = c_t(ξ^t),  t ∈ [T], P-a.s.,                            (9c)
        γ_t(ξ^t) ≥ 0,  t ∈ [T], P-a.s.,                                                                    (9d)
        λ_t(ξ^t) ∈ R^{m_t}, γ_t(ξ^t) ∈ R^{n_t},  t ∈ [T], P-a.s.,                                          (9e)

where B_{T+1}(ξ^{T+1}) ≡ 0. For t ∈ [T], the dual decisions λ_t(·) (corresponding to constraints (1b) in MSLP) and γ_t(·) (corresponding to constraints (1c) in MSLP) are functions of the data ξ^t observed up to stage t. Weak duality holds for MSLP and D-MSLP, i.e., the optimal objective value of D-MSLP provides a lower bound on z_MSLP. Moreover, under some conditions, strong duality holds (i.e., the optimal value of D-MSLP equals z_MSLP) [18], although we only require weak duality.
3.1 Static and two-stage LDRs in the dual

In [35] it was proposed to use a static LDR to obtain a tractable approximation of D-MSLP, and thus an efficiently computable lower bound on z_MSLP. Specifically, the idea is to require all the dual decisions to follow an LDR, i.e.,

  λ_t(ξ^t) = Λ_t Φ_t(ξ^t),   (10a)
  γ_t(ξ^t) = Γ_t Φ_t(ξ^t),   (10b)

where Λ_t ∈ R^{m_t×K_t} and Γ_t ∈ R^{n_t×K_t}, for all t ∈ [T], are the parameters of the decision rule. Imposing (10) yields the following static approximation of D-MSLP, which we call D-LDR:

  max_{Λ,Γ}  E[ Σ_{t∈[T]} ( b_t(ξ^t)^⊤ Λ_t Φ_t(ξ^t) + d_t(ξ^t)^⊤ Γ_t Φ_t(ξ^t) ) ]                         (11)
  s.t.  E[ B_{t+1}(ξ^{t+1})^⊤ Λ_{t+1} Φ_{t+1}(ξ^{t+1}) | ξ^t ] + A_t(ξ^t)^⊤ Λ_t Φ_t(ξ^t) + D_t(ξ^t)^⊤ Γ_t Φ_t(ξ^t) = h_t(ξ^t),  t ∈ [T], P-a.s.,
        C_t(ξ^t)^⊤ Λ_t Φ_t(ξ^t) + E_t(ξ^t)^⊤ Γ_t Φ_t(ξ^t) = c_t(ξ^t),  t ∈ [T], P-a.s.,
        Γ_t Φ_t(ξ^t) ≥ 0,  t ∈ [T], P-a.s.,
        Λ_t ∈ R^{m_t×K_t}, Γ_t ∈ R^{n_t×K_t},  t ∈ [T].

We refer to the optimal value of D-LDR as v_LDR. The semi-infinite program D-LDR (11) can be reformulated as an efficiently solvable LP if assumptions A1–A3 stated in Section 2 hold, the problem MSLP is strictly feasible, and the following additional assumption holds [35]:

A4. The conditional expectation E(ξ^T | ξ^t) is almost surely linear in ξ^t for all t ∈ [T] (e.g., this occurs when {ξ_t}_{t∈[T]} are mutually independent, known as stage-wise independence).

If assumption A3 is replaced by an assumption that Ξ is described by conic inequalities, D-LDR (11) can be reformulated as a conic program [23].

Examining the structure of D-MSLP (9), we observe that if the values λ_t(ξ^t) are fixed, then (9) decomposes by stage. We thus propose to apply an LDR only to the λ variables, leaving the decision variables γ as recourse variables. Imposing the LDR (10a) collapses D-MSLP into the following 2SLP, which we refer to as D-LDR-2S:

  v_2S := max_{γ_1,Λ}  d_1^⊤ γ_1 + Σ_{t∈[T]} E[ b_t(ξ^t)^⊤ Λ_t Φ_t(ξ^t) ] + E[ G(Λ, ξ^T) ]                (12)
          s.t.  E[ B_2(ξ^2)^⊤ Λ_2 Φ_2(ξ^2) ] + A_1^⊤ Λ_1 + D_1^⊤ γ_1 = h_1,
                C_1^⊤ Λ_1 + E_1^⊤ γ_1 = c_1,
                γ_1 ∈ R^{n_1}_+,  Λ_t ∈ R^{m_t×K_t},  t ∈ [T],

where we have dropped the dependence of the first-stage variables on ξ_1 ≡ 1, and G(Λ, ξ^T) is the second-stage value function,

  G(Λ, ξ^T) := Σ_{t∈[2,T]} G_t(Λ, ξ^t),   (13)

where for each t ∈ [2, T],

  G_t(Λ, ξ^t) := max_{γ_t}  d_t(ξ^t)^⊤ γ_t                                                                (14a)
                 s.t.  D_t(ξ^t)^⊤ γ_t = h_t(ξ^t) − A_t(ξ^t)^⊤ Λ_t Φ_t(ξ^t) − E[ B_{t+1}(ξ^{t+1})^⊤ Λ_{t+1} Φ_{t+1}(ξ^{t+1}) | ξ^t ],   (14b)
                       E_t(ξ^t)^⊤ γ_t = c_t(ξ^t) − C_t(ξ^t)^⊤ Λ_t Φ_t(ξ^t),                               (14c)
                       γ_t ∈ R^{n_t}_+.                                                                   (14d)

The following proposition, immediate from the definitions of the associated problems, summarizes the relationship between the optimal values of MSLP, D-LDR (11), and D-LDR-2S (12).
Proposition 3.1. The following inequalities hold: z_MSLP ≥ v_2S ≥ v_LDR.

As D-LDR-2S (12) is a 2SLP, the discussion in Section 2.3 of methods to obtain an approximate solution to P-LDR-2S (4) applies also to D-LDR-2S (12). In particular, one possibility is to obtain an i.i.d. sample {ξ^T_j}_{j=1}^N of ξ^T and solve the SAA problem:

  v̂^N_2S := max_{γ_1,Λ}  d_1^⊤ γ_1 + Σ_{t∈[T]} E[ b_t(ξ^t)^⊤ Λ_t Φ_t(ξ^t) ] + (1/N) Σ_{j∈[N]} G(Λ, ξ^T_j)   (15a)
             s.t.  E[ B_2(ξ^2)^⊤ Λ_2 Φ_2(ξ^2) ] + A_1^⊤ Λ_1 + D_1^⊤ γ_1 = h_1,   (15b)
                   C_1^⊤ Λ_1 + E_1^⊤ γ_1 = c_1,                                  (15c)
                   γ_1 ∈ R^{n_1}_+,                                              (15d)
                   Λ_t ∈ R^{m_t×K_t},  t ∈ [T].                                  (15e)

As in the primal, the SAA problem can be solved by decomposition algorithms, since the second-stage problem decomposes by both scenario and stage due to the relationship (13). The expected-value coefficients in the objective and in constraints (15b) may likewise be estimated by sampling in case they cannot be computed efficiently.
3.2 Obtaining statistical lower bounds on z_MSLP

Next, we discuss how to use an approximate solution of D-LDR-2S (12) to estimate a lower bound on z_MSLP. As in the primal case, in order to ensure that we obtain a two-stage LDR policy that is feasible for all possible realizations, we consider the possibility of adding a set of constraints Λ ∈ L ⊆ R^{τ_D} to D-LDR-2S (12) and its SAA counterpart (15), where τ_D := Σ_{t∈[T]} K_t m_t. The following assumption on L ensures that the problem D-LDR-2S (12) has relatively complete recourse when the constraints Λ ∈ L are enforced.
Assumption 3.2. For all Λ ∈ L, G(Λ, ξ^T) > −∞, P-a.s.

The following assumption provides a sufficient condition under which the set L = R^{τ_D} satisfies Assumption 3.2 (i.e., no additional constraints are necessary).
Assumption 3.3. The set X_t(ξ^t) is bounded for all t ∈ [T] and P-almost all ξ^T ∈ Ξ.

A special case of this assumption occurs when the x and s variables have explicit upper and lower bounds. An important feature of this assumption is that the sets X_t(ξ^t) are not required to be uniformly bounded. For example, bounds on the decision variables of the form 0 ≤ x_t(ξ^t) ≤ M(ξ^t) (and similarly for the s variables) are sufficient for satisfying this assumption, even if M(ξ^t) is not bounded over ξ^T ∈ Ξ.
Proposition 3.4. If Assumption 3.3 is satisfied, then L = R^{τ_D} satisfies Assumption 3.2.

Proof. We show that for any given Λ ∈ R^{τ_D} and ξ^T ∈ R^{ℓ^T}, (14) is feasible for any t ∈ [T]. Let R^b(ξ^t) and R^c(ξ^t) denote the right-hand sides of the constraints (14b) and (14c), respectively. Then the dual of (14),

  min  R^b(ξ^t)^⊤ s_t(ξ^t) + R^c(ξ^t)^⊤ x_t(ξ^t)
  s.t.  (x_t(ξ^t), s_t(ξ^t)) ∈ X_t(ξ^t),
        x_t(ξ^t) ∈ R^{p_t}, s_t(ξ^t) ∈ R^{q_t},

is bounded due to Assumption 3.3. It is also feasible, as MSLP is assumed to be feasible. This implies that (14) cannot be infeasible. □

Now, suppose we have an approximate solution (γ̂_1, Λ̂) to D-LDR-2S (12), where Λ̂ ∈ L for some set L that satisfies Assumption 3.2. In this case, (γ̂_1, Λ̂) defines a feasible solution to D-LDR-2S (12), and hence its objective value provides a lower bound on v_2S, and hence a lower bound on z_MSLP. The objective value of (γ̂_1, Λ̂) can be estimated by generating an independent sample of ξ^T, say {ξ^T_j}_{j=1}^{N′} (where possibly N′ ≫ N), and computing

  d_1^⊤ γ̂_1 + Σ_{t∈[T]} E[ b_t(ξ^t)^⊤ Λ̂_t Φ_t(ξ^t) ] + (1/N′) Σ_{j∈[N′]} G(Λ̂, ξ^T_j).

The values G(Λ̂, ξ^T_j) for j ∈ [N′] can also be used to construct a confidence interval on the objective value of (γ̂_1, Λ̂), and hence a statistical lower bound on z_MSLP.

We close this section by discussing an approach for estimating the gap between a primal two-stage LDR policy defined by (x̂_1, β̂) and a dual two-stage LDR policy defined by (γ̂_1, Λ̂). Following [39], the motivation is that if the same sample (common random numbers) is used in estimating the upper and lower bounds, then the variance of the gap estimator can be reduced if the upper- and lower-bound sample estimates are positively correlated. Specifically, given a sample {ξ^T_j}_{j=1}^{N′}, the gap observations are calculated as

  Gap_j = [ c_1^⊤ x̂_1 + Σ_{t∈[T]} E[ h_t(ξ^t)^⊤ β̂_t Φ_t(ξ^t) ] + Q(β̂, ξ^T_j) ]
        − [ d_1^⊤ γ̂_1 + Σ_{t∈[T]} E[ b_t(ξ^t)^⊤ Λ̂_t Φ_t(ξ^t) ] + G(Λ̂, ξ^T_j) ],

for j ∈ [N′]. These values can then be used to construct a confidence interval on the gap.
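A sketch of this gap estimate (the numbers are synthetic; ub[j] and lb[j] stand for the two bracketed terms above, evaluated on the same scenario ξ^T_j):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Per-scenario upper/lower bound evaluations computed on the SAME sample
# (common random numbers), so ub and lb are positively correlated.
common = rng.normal(size=5000)
ub = 105.0 + 4.0 * common + rng.normal(scale=1.0, size=5000)
lb = 100.0 + 4.0 * common + rng.normal(scale=1.0, size=5000)

gap = ub - lb                        # Gap_j observations
n = len(gap)
half = stats.t.ppf(0.975, n - 1) * gap.std(ddof=1) / np.sqrt(n)
print(f"95% CI on gap: {gap.mean():.2f} +/- {half:.2f}")
# With independent samples, the half-width would be driven by the full
# variances of ub and lb; sharing 'common' cancels most of that variance.
```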
4 Illustrative example: Inventory planning

We first present a numerical example on an inventory planning problem to investigate the performance of two-stage LDR policies and bounds, in comparison to static LDR policies and bounds.
4.1 Problem description

We consider a variation of the inventory planning problem used for numerical illustration in [4, 35]. The system consists of I factories and a single product type, and the goal is to meet demands over the planning horizon at minimum expected cost. The model is stated as follows:

  min  E[ Σ_{t∈[T]} Σ_{i∈[I]} c_it x_it(ξ^t) ]                          (16)
  s.t.  s_{t−1}(ξ^{t−1}) − s_t(ξ^t) + Σ_{i∈[I]} x_it(ξ^t) = ξ_t,  t ∈ [T], P-a.s.,   (17)
        s ≤ s_t(ξ^t) ≤ s̄,  t ∈ [T], P-a.s.,                            (18)
        0 ≤ x_it(ξ^t) ≤ x̄_i,  t ∈ [T], i ∈ [I], P-a.s.                 (19)

Here, ξ_t is a scalar random variable representing the demand for the product in each t ∈ [T]. The recourse decision variable x_it(ξ^t) determines the amount of the product to produce in factory i at stage t, while the state variable s_t(ξ^t) represents the inventory level at the end of stage t. Constraints (17) are the inventory balance equations, (18) limit the inventory level to be between the lower bound s and upper bound s̄, and (19) limit the production of each factory in each stage to be at most x̄_i.

The model in [4, 35] also has a constraint on the total amount that can be produced by any single factory over all the stages in the planning horizon. Modeling this constraint in our standard model format requires introducing an additional state variable for each factory i, representing the cumulative amount of production from that factory. Imposing an LDR on that state variable would in turn imply that the variables x_it(ξ^t) also follow an LDR, and hence for that model the static and two-stage LDR policies are identical. This illustrates an example where there is no benefit to using a two-stage LDR over a static LDR. In the version we consider, the x_it(ξ^t) are still flexible when the state variables s_t(ξ^t) follow an LDR, and hence there is potential for a two-stage LDR to yield better solutions.

Following the data in [4, 35], we consider an instance with I = 3, s = 500, s̄ = 2000, and x̄_i = 567 for i ∈ [I]. The random demand ξ_t in stage t ∈ [T] is uniformly distributed on the interval Ξ_t = [(1 − θ)ξ*ζ_t, (1 + θ)ξ*ζ_t], where θ = 0.2 is the uncertainty level, ξ* = 1000 is the nominal demand, and ζ_t = 1 + (1/2) sin(π(t − 1)/12) is the seasonality factor. Finally, the cost coefficients are defined as c_it = α_i ζ_t, where α_1 = 1, α_2 = 1.5, and α_3 = 2.
4.2 LDR policies and bounds

For both the static and two-stage LDR policies, we use the standard basis functions, ξ^t, in stage t. For the static LDR, we implemented the deterministic reformulations proposed in [35, 57] to obtain upper bounds (with a primal LDR policy) and lower bounds (with a dual LDR policy).

For the primal two-stage LDR policy, we first observe that this problem as stated does not satisfy relatively complete recourse (Assumption 2.3), although we remark that this assumption is satisfied in a slightly modified version of the problem in which variables are introduced to allow some amount of demand to go unserved, with a large penalty. Rather than making this modification, however, we demonstrate how for this problem a set of constraints satisfying Assumption 2.2 can be derived. Using the standard basis functions, the state variables s_t(ξ^t) take the form s_t(ξ^t) = β_t ξ^t, where β_t ∈ R^{1×t}. Thus, the constraints (18) take the form

  s ≤ β_t ξ^t ≤ s̄,  t ∈ [T],  for all ξ^t with ξ_r ∈ [(1 − θ)ξ*ζ_r, (1 + θ)ξ*ζ_r], r ∈ [2, t].

These constraints can be reformulated as deterministic linear constraints in an extended variable space using standard robust optimization techniques [5]. To ensure that β_t, t ∈ [T], are selected such that constraints (17) can be satisfied for some x_it(ξ^t), i ∈ [I], satisfying (19), it is sufficient to enforce

  ξ_t − Σ_{i∈[I]} x̄_i ≤ β_{t−1} ξ^{t−1} − β_t ξ^t ≤ ξ_t,  t ∈ [T],  for all ξ^t with ξ_r ∈ [(1 − θ)ξ*ζ_r, (1 + θ)ξ*ζ_r], r ∈ [2, t],

where the lower bound is based on the maximum total production and the upper bound is based on the minimum total production in each period. Again, these constraints can be reformulated as deterministic linear constraints using robust optimization techniques.
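For instance, because s_t(ξ^t) = β_{t1} + Σ_{r=2}^t β_{tr} ξ_r is affine in ξ^t and the support is a box, the worst case is attained componentwise; a sketch of the upper-bound half (the auxiliary variables w_{tr} are our notation for the extended variable space mentioned above):

  max_{ξ^t} β_t ξ^t = β_{t1} + Σ_{r=2}^t max{ β_{tr} l_r, β_{tr} u_r },   where l_r := (1 − θ)ξ*ζ_r, u_r := (1 + θ)ξ*ζ_r,

so that β_t ξ^t ≤ s̄ for all ξ^t is equivalent to the linear system

  β_{t1} + Σ_{r=2}^t w_{tr} ≤ s̄,   w_{tr} ≥ β_{tr} l_r,   w_{tr} ≥ β_{tr} u_r,   r ∈ [2, t],

and the lower bound s ≤ β_t ξ^t is handled symmetrically, with min in place of max.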
For both the primal and dual two-stage LDR policies, we use 250 scenarios to construct an SAA, and solve the resulting problem by explicitly solving the deterministic equivalent formulation. Given the resulting LDR coefficients, we then use a large independent sample of scenarios to evaluate the quality of the primal policy and dual bound.

Table 1 provides the results comparing the bounds obtained for this instance, for varying values of T = 2, ..., 10. The columns under “Static LDR” provide the lower bound (LB), upper bound (UB), and optimality gap (Gap (%)), respectively, where the optimality gap for an instance is calculated as (UB − LB)/UB. For the two-stage LDR policy, 95% confidence intervals for the lower bound (LB CI) and upper bound (UB CI) are provided, along with an estimate of the optimality gap, which is computed by using the lower end of the lower-bound confidence interval and the upper end of the upper-bound confidence interval.

[Table 1: bounds for the inventory planning instance with T = 2, ..., 10; only the T = 2 entries survive in the source: static LDR LB 1972.4, UB 2026.0, Gap 2.65%; two-stage LDR LB CI mean 1974.4.]

5 Illustrative example: Capacity expansion

We next consider a capacity expansion problem. On this problem, we again compare the two-stage LDR policies and bounds to those obtained from static LDR policies, and also compare to the policy and lower bound obtained using the SDDP algorithm.
5.1 Problem description

We consider a variant of the stochastic capacity expansion problem given in [15]. We wish to determine an investment schedule over T stages for the installation of new capacities of I different power generation technologies, together with operational decisions to meet demand for power over time. The demand is modeled by a load duration curve, which is approximated by partitioning each stage into J segments (of possibly different lengths). The demand corresponding to segment j ∈ [J] in stage t ∈ [T] is denoted by d_tj. The amount of new capacity of technology i ∈ [I] added in stage t ∈ [T] is represented by u+_ti, and is assumed to be available for use immediately, i.e., at the beginning of stage t. The unit cost of u+_ti is denoted by c^{u+}_ti. The state variable s_ti represents the installed capacity of technology i ∈ [I] at the beginning of stage t ∈ [T], which incurs a holding cost of c^s_ti per unit. We assume that it is possible to discard (i.e., remove) some capacity of i ∈ [I] in t ∈ [T], denoted by u−_ti, at a (possibly zero) unit cost of c^{u−}_ti. The operating level of i ∈ [I] at t ∈ [T] for meeting the demand in segment j ∈ [J] is represented by the decision variable x_tij, while the amount of unsatisfied demand is represented by z_tj; their unit costs are c^x_tij and c^z_tj, respectively. The stochastic capacity expansion problem is then formulated as an MSLP as follows:

  min  E[ Σ_{t∈[T]} ( Σ_{i∈[I]} ( c^{u+}_ti u+_ti(ξ^t) + c^{u−}_ti u−_ti(ξ^t) + c^s_ti s_ti(ξ^t) + Σ_{j∈[J]} c^x_tij x_tij(ξ^t) ) + Σ_{j∈[J]} c^z_tj z_tj(ξ^t) ) ]   (20a)
  s.t.  s_ti(ξ^t) − s_{t−1,i}(ξ^{t−1}) − u+_ti(ξ^t) + u−_ti(ξ^t) = 0,  t ∈ [T], i ∈ [I], P-a.s.,   (20b)
        s_ti(ξ^t) − x_tij(ξ^t) ≥ 0,  t ∈ [T], i ∈ [I], j ∈ [J], P-a.s.,                            (20c)
        Σ_{i∈[I]} x_tij(ξ^t) + z_tj(ξ^t) ≥ d_tj(ξ^t),  t ∈ [T], j ∈ [J], P-a.s.,                   (20d)
        0 ≤ z_tj(ξ^t) ≤ d_tj(ξ^t),  t ∈ [T], j ∈ [J], P-a.s.,                                      (20e)
        0 ≤ u+_ti(ξ^t) ≤ M^{u+}_ti,  t ∈ [T], i ∈ [I], P-a.s.,                                     (20f)
        0 ≤ u−_ti(ξ^t) ≤ M^{u−}_ti,  t ∈ [T], i ∈ [I], P-a.s.,                                     (20g)
        0 ≤ s_ti(ξ^t) ≤ M^s_ti,  t ∈ [T], i ∈ [I], P-a.s.,                                         (20h)
        x_tij(ξ^t) ≥ 0,  t ∈ [T], i ∈ [I], j ∈ [J], P-a.s.                                         (20i)

The objective function (20a) minimizes the expected total cost. The constraints (20b) are the only state equations, which keep track of the available capacity of each technology. Constraints (20c) limit the operating levels to the available capacity level, while (20d) ensure that either demand is met or the unmet demand is recorded in the z_tj variables. Constraints (20e)–(20h) represent the bounds on the shortfall, installation, removal, and capacity-level variables, respectively. Note that (20h) also constitute upper bounds on the x variables due to (20c), and thus this formulation satisfies Assumption 3.3. In addition, we assume that M^{u−}_ti ≥ M^s_{t−1,i}, which ensures that this formulation satisfies relatively complete recourse (Assumption 2.3), since at stage t, given any feasible value of s_{t−1,i}(ξ^{t−1}), it is feasible to set u−_ti(ξ^t) = s_{t−1,i}(ξ^{t−1}) for i ∈ [I], z_tj(ξ^t) = d_tj(ξ^t) for j ∈ [J], and all remaining variables to zero.
To the extent possible, we use data from [15], which focuses on a German system, although we extend their 3-stage example to T = 5, 10, 15, 20. In [15], there are I = 3 technologies (coal-fired power plant, combined cycle gas turbine, and open cycle gas turbine). Each stage is divided into L = 8 periods, and W = 5 wind regimes are considered for each period. We model this as J = LW = 40 segments at each stage, corresponding to each period/wind-regime pair. For t ≥ 2, the demand corresponding to segment j = (ℓ, w) ∈ [L] × [W] is modeled as

  d_tj(ξ^t) = max{ d_{1,ℓ} Π_{r=2}^t ξ^g_r − η_w K^w_t Π_{r=2}^t ξ^w_r , 0 },

where d_{1,ℓ} is the base demand value of period ℓ, ξ^g_t is a random variable reflecting the demand growth in stage t, η_w is the parameter denoting the wind efficiency, K^w_t is the wind power generation target, and ξ^w_t is a random variable representing the growth in wind power generation in stage t. The values of d_{1,ℓ} and η_w are from Tables 2 and 3 of [15], and are reproduced in Appendix A. We use K^w_2 = 36.64 and K^w_t = 45.75 for all t ≥ 3. Both growth factors have lognormal distributions whose standard-deviation parameters grow linearly in t; in particular, ξ^w_t has µ = 0.15 and σ growing linearly in t from 0.25.
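A small simulation sketch of this demand process (the base demands, wind efficiencies, and lognormal parameters below are illustrative stand-ins of ours, not the values used in the experiments):

```python
import numpy as np

rng = np.random.default_rng(4)

T, L, W = 10, 8, 5
d1 = np.linspace(40.0, 80.0, L)          # illustrative base demands d_{1,l} (GW)
eta = np.linspace(0.2, 1.0, W)           # illustrative wind efficiencies eta_w
Kw = np.array([0.0, 36.64] + [45.75] * (T - 2))   # wind targets K^w_t, t >= 2

# Lognormal growth factors; mu/sigma here are placeholder parameter values.
xi_g = rng.lognormal(mean=0.05, sigma=0.10, size=T)
xi_w = rng.lognormal(mean=0.15, sigma=0.25, size=T)

vg = vw = 1.0
for t in range(1, T):                    # stages 2..T (0-indexed 1..T-1)
    vg, vw = vg * xi_g[t], vw * xi_w[t]
    # d_tj = max{ d_{1,l} * prod xi_g - eta_w * K^w_t * prod xi_w, 0 }
    d = np.maximum(d1[:, None] * vg - eta[None, :] * Kw[t] * vw, 0.0)
    print(f"stage {t + 1}: mean segment demand = {d.mean():.1f} GW")
```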
For the first stage, we use the deterministic demand values d_{1,j=(ℓ,w)} = d_{1,ℓ} E[ξ^g_2] − η_w K^w_2 E[ξ^w_2]. The units of all demands (and all primal decision variables) are gigawatts.

We assume there are no holding costs and no costs for removing capacity, i.e., we use c^{u−}_ti = c^s_ti = 0 for all i ∈ [I], t ∈ [T]. We use discounting to determine the other costs, setting c^{u+}_ti = 5ι_i/δ^t, c^x_{ti,j=(ℓ,w)} proportional to c_i τ_ℓ τ_w/δ^t, and c^z_{t,j=(ℓ,w)} = τ_ℓ τ_w/δ^t, where δ > 1 is a per-stage discount factor and the values of the annualized costs ι_i, the operating costs c_i, and the weights τ_ℓ and τ_w are from [15] (see Appendix A). All costs are in millions of Euros. Finally, we assume the maximum installation per stage is a constant, M^{u+}_ti = C, and derive redundant upper bounds on s and u−, i.e., M^s_ti = Σ_{r=1}^t M^{u+}_ri and M^{u−}_ti = M^s_{t−1,i}. In our experiments we consider two sets of instances, defined using C = 50 and C = 100.

5.2 Comparison of two-stage and static LDRs

We compare the primal and dual bounds obtained using two-stage and static LDRs. For the LDR basis functions, for each t ∈ [T], we let K_t = 3 and

  Φ_t1(ξ^t) = 1,   Φ_t2(ξ^t) = Π_{r=2}^t ξ^g_r,   Φ_t3(ξ^t) = Π_{r=2}^t ξ^w_r.

Because assumptions A3 and A4 do not hold for this problem (the random variables do not have bounded support and E(ξ^T | ξ^t) is not linear in ξ^t), the reformulation approach from [13, 35, 57] used for the static LDR cannot be applied to solve P-LDR (3) and D-LDR (11). We therefore use a sampling strategy to approximately solve these problems.
Specifically, the sample approximations of P-LDR (3) and D-LDR (11) are identical to P-LDR (3) and D-LDR (11), respectively, except that the infinite set of constraints (P-a.s.) is replaced by the finite set corresponding to the sample. Approximate solutions to P-LDR-2S (4) and D-LDR-2S (12) are obtained by solving the SAA problems (6) and (15), respectively. We solve all sample approximations using the same sample, of size N = 150T.

The models P-LDR-2S (4) and D-LDR-2S (12) corresponding to model (20) and its dual are given in Appendix B. Although the MSLP given in (20) has relatively complete recourse (Assumption 2.3), its two-stage primal approximation P-LDR-2S (4) does not. Thus, for obtaining a primal policy using P-LDR-2S (4), we implement the STT policy proposed in Section 2.4. The parameter ρ is determined by conducting a golden section search using a fixed evaluation sample of 100 scenarios. Specifically, starting with a lower bound of 0 and an upper bound of 1000, a golden section search is performed in which a simulation of the STT policy on these 100 scenarios is used to guide the search. In case the current upper estimate of ρ in the search process yields the minimum estimated cost, the search is restarted with the new lower estimate set to the current upper estimate of ρ, and the new upper estimate set to four times the current upper estimate. The search is terminated when either the upper and lower estimates of ρ differ by less than 1.0, or the difference in the estimated objective values between the upper and lower estimates falls below a small fraction of the sum of the two objective estimates. The resulting value of ρ is then used in a simulation with N′ = 5000T replications to estimate the quality of the resulting policy. The time to select ρ in this process was vastly dominated by the time to simulate the policy, but is included in all numerical results that follow.

Since Assumption 3.3 is satisfied, any solution to the SAA problem (15) provides a feasible solution to D-LDR-2S (12), and hence evaluating this solution using N′ independent replications yields a statistical lower bound on z_MSLP. In our experiments we use N′ = 5000T scenarios for estimating the value of this policy. Unfortunately, the sample approximations of P-LDR (3) and D-LDR (11) do not yield policies (primal or dual) that are feasible under all scenarios. Thus, when evaluating these policies with the independent replications, we report two measures: the average objective value over the scenarios that are feasible, and the fraction of scenarios that are infeasible. By averaging only over feasible scenarios, these estimates are optimistically biased, i.e., they underestimate the bound for the primal problem and overestimate the bound for the dual problem. As a result, these estimates do not necessarily provide valid (statistical) upper and lower bounds on z_MSLP, but we use them to provide a “best case” estimate when comparing to the estimates obtained from the two-stage LDR policies.

All of our numerical results are obtained using IBM ILOG CPLEX 12.6 as the LP solver. We perform all experiments using a single thread on a Mac OS X 10.12 machine with 4 GHz Intel Core i7 CPUs and 16 GB RAM.

The SAA problems (6) and (15) are solved with a sample size of N = 150T. The primal SAA problem (6) is solved with Benders decomposition, using a single aggregate cut per time stage, and is run until no violated cuts are found. The dual SAA problem (15) is solved with the bundle-level method [19, 37], which is terminated when the relative gap between the lower and upper bounds falls below a small tolerance. The details of the Benders decomposition and the level method are provided in Appendices B.1 and B.2, respectively.
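A sketch of an aggregate-cut Benders (L-shaped) loop on a generic toy 2SLP (the instance, closed-form subproblem duals, and tolerance are illustrative choices of ours; in the paper's setting the subproblems would further separate by stage, with one aggregate cut per stage):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(5)

# Toy 2SLP: min c'x + (1/N) sum_j Q(x, xi_j),  0 <= x <= 10,
# with Q(x, xi) = min{ q'y : y >= xi - x, y >= 0 }  (shortfall penalty).
c = np.array([1.0, 1.5])
q = np.array([4.0, 4.0])
N = 200
xi = rng.uniform(0.0, 8.0, size=(N, 2))    # sampled scenarios

def sub_duals(x):
    """Optimal dual of Q(x, xi_j): pi_j = q where a shortfall occurs, else 0."""
    return np.where(xi > x, q, 0.0)        # (N, 2) array of dual vectors

cuts_g, cuts_a = [], []                    # aggregate cut: theta >= a - g'x
while True:
    # Master over (x1, x2, theta); theta >= 0 is a valid initial bound here.
    A_ub = [[-g[0], -g[1], -1.0] for g in cuts_g]
    b_ub = [-a for a in cuts_a]
    res = linprog(np.append(c, 1.0), A_ub=A_ub or None, b_ub=b_ub or None,
                  bounds=[(0, 10), (0, 10), (0, None)], method="highs")
    x_hat, theta_hat = res.x[:2], res.x[2]
    pi = sub_duals(x_hat)
    value = (pi * (xi - x_hat)).sum(axis=1).mean()   # exact E_N[Q(x_hat, .)]
    if value <= theta_hat + 1e-8:          # no violated aggregate cut: done
        break
    cuts_g.append(pi.mean(axis=0))         # average the scenario cuts
    cuts_a.append((pi * xi).sum(axis=1).mean())
print("x* =", np.round(x_hat, 3), " obj =", round(c @ x_hat + value, 3))
```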
10. The costsare normalized such that for each instance, the estimated lower bound obtained by thetwo-stage LDR policy has value 100.0. In these tables, the CIs are presented with theirmean and half-width ( ± ). In Table 2, the upper end of the CI the two-stage LDR policy isan upper bound on the expected cost of using that policy, and hence is a statistical upperbound on z MSLP . We also report under column ‘Inf. (%)’ the percentage of the scenarios(out of 5000 T evaluated scenarios) for which the static LDR policy is infeasible. To givean idea of the relative improvement in the expected policy cost obtained by using thetwo-stage LDR, the column ‘%U∆’, presents the percentage increase in the upper boundon the cost obtained with the static policy over the upper bound on the cost obtainedwith the two-stage LDR policy. We observe that the expected cost of the static LDRpolicy is between 36% and 102% higher than that of the static LDR policy, with the mostsignificant differences occurring with larger time stages. We also observe that the staticLDR policy is frequently infeasible. Finally, although not presented in the table, we findthat the estimated expected cost of the STT policies was consistently similar (within 2.3%)to the objective value of the SAA problem (6), indicating that the STT policy is effectively“tracking” the obtained two-stage LDR policy.27S LDR Static LDR C T
Mean ± Mean ± Inf. (%) %U∆50 5 100.8 0.3 138.0 0.2 3.0 36.710 114.6 0.6 232.9 0.4 3.8 102.7100 5 101.3 0.3 138.4 0.2 2.9 36.510 109.2 0.4 195.8 0.3 4.0 78.8Table 2: Confidence intervals for expected costs of the primal policies.Considering the CIs of the lower bounds obtained from using two-stage and static LDRpolicies presented in Table 3, we again find that the static LDR policy is often infeasible.Column ‘%L∆’ presents the percentage difference between the lower end of the CI obtainedfrom the static and two-stage LDR policies, and indicates that the (95% confidence) lowerbounds obtained by the static LDR range from being similar to 2.9% lower than thoseobtained by the two-stage LDR policy.2S LDR Static LDR
Considering the CIs of the lower bounds obtained from using the two-stage and static LDR policies, presented in Table 3, we again find that the static LDR policy is often infeasible. Column '%L∆' presents the percentage difference between the lower ends of the CIs obtained from the static and two-stage LDR policies, and indicates that the (95% confidence) lower bounds obtained by the static LDR range from being similar to being 2.9% lower than those obtained by the two-stage LDR policy.

             2S LDR         Static LDR
  C    T     Mean    ±      Mean    ±     Inf. (%)   %L∆
  50   5     100.0   0.3    98.7    0.3   2.4        -1.3
  50   10    100.0   0.5    97.1    0.4   3.5        -2.9
  100  5     100.0   0.3    100.0   0.3   2.1         0.0
  100  10    100.0   0.4    98.1    0.4   3.2        -1.8

Table 3: Confidence intervals for expected costs of the dual policies.
We next compare the two-stage LDR approximation with the results obtained using SDDP. In order to apply SDDP, we need a formulation having a finite number of scenarios per stage and stage-wise independent random variables. We obtain a model with stage-wise independence by introducing new state variables v^g_t and v^w_t to represent ∏_{r=2}^{t} ξ^g_r and ∏_{r=2}^{t} ξ^w_r, respectively, which is implemented by adding the state equations

  v^g_t = ξ^g_t v^g_{t−1},   v^w_t = ξ^w_t v^w_{t−1},   t ∈ [2, T],   (21)

and v^g_1 = v^w_1 = 1. With these state variables, the demand in stage t ≥ 2 is max{ d_{1,ℓ} v^g_t − η_w K^w_t v^w_t, 0 }. In particular, the right-hand side of constraints (20d) is replaced with the expression d_{1,ℓ} v^g_t − η_w K^w_t v^w_t, and the redundant upper bounds z_{tj}(ξ^t) ≤ d_{tj}(ξ^t) in (20e) are removed. Thus, the only random variables appearing in the stage-t constraints are ξ^g_t and ξ^w_t, which are stage-wise independent.

We use SAA to construct scenario trees with a finite number of outcomes per stage. In an SAA problem, we approximate the joint distribution of ξ^g_t and ξ^w_t with 200 scenarios, obtained by independent Monte Carlo sampling; note that the SAA approximation then has 200^{T−1} total sample paths. The number of scenarios per stage was determined based on initial experiments solving multiple replications of the SAA problem, and was found to provide a good trade-off between the difficulty of solving each individual SAA problem by SDDP and the variability of the SAA estimates. The optimal value of an SAA problem is random because it is defined by a random sample, and the expected value of this optimal value is a lower bound on the true optimal value [39]. Thus, by solving multiple SAA problems with independent samples, a confidence interval on the expected value of the SAA problem, and hence a lower bound on the true optimal value, can be obtained. We therefore generate 25 independently generated SAA problems, and for each one we obtain a lower bound by solving it with SDDP for a limited time. These replication values are then used to construct a confidence interval on the lower bound on z_MSLP.

We use the SDDP implementation sddp.jl [16] to solve each SAA problem. This algorithm is implemented in Julia. In benchmarks reported in [16], it was found that the computation times for sddp.jl were about 30% higher than those for the C++ code DOASA [45], on a test instance for which DOASA was designed. The code sddp.jl does not directly support having random constraint coefficients, as in (21). However, the algorithm does support solving a problem with an underlying state evolving according to a Markov chain, and with parameters in the constraints dependent on the state of the Markov chain. Thus, we model the stochastic process as a Markov chain having 200 states, corresponding to the 200 scenarios of joint realizations of (ξ^g_t, ξ^w_t) in each stage t ∈ [2, T]; the transition probability from each state in stage t to each state in stage t + 1 is 1/200. We run sddp.jl serially, although we note that both sddp.jl and the two-stage LDR approximation have significant potential for speedup via parallelization.
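As a small illustration of this reformulation, the following sketch simulates the multiplicative state recursion (21) by drawing one of the 200 joint outcomes in each stage; since every transition probability is 1/200, sampling a path of this Markov chain coincides with stage-wise independent sampling. The outcome lists xi_g and xi_w below are illustrative placeholders, not the distributions used in our experiments.

import random

def sample_state_path(xi_g, xi_w, T, rng):
    """Simulate v^g_t, v^w_t for t = 1..T using v_t = xi_t * v_{t-1}, v_1 = 1."""
    n = len(xi_g)                          # 200 joint outcomes per stage
    v_g, v_w = [1.0], [1.0]
    for _t in range(2, T + 1):
        k = rng.randrange(n)               # uniform transition, probability 1/n
        v_g.append(xi_g[k] * v_g[-1])
        v_w.append(xi_w[k] * v_w[-1])
    return v_g, v_w

rng = random.Random(0)
xi_g = [1.0 + 0.02 * rng.gauss(0, 1) for _ in range(200)]   # illustrative only
xi_w = [max(0.1, 1.0 + 0.05 * rng.gauss(0, 1)) for _ in range(200)]
v_g, v_w = sample_state_path(xi_g, xi_w, T=5, rng=rng)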
The time limit for each SDDP replication is set as follows. We let t_LDR be the total time required to solve the SAA problems (6) and (15) and to evaluate the value of the obtained dual policy with an independent sample of size N′ = 5000T. We run the SDDP algorithm on each of the 25 SAA replications with two time limits: TL := 1.3 · t_LDR/25 and 10 · TL. The first time limit is used to approximately match the total time (over all replications) allotted to the SDDP algorithm with the time used by the two-stage LDR approach (the factor 1.3 accounts for the fact that sddp.jl is implemented in Julia, whereas the two-stage LDR approach is implemented in C++). The second time limit is used to demonstrate the potential of SDDP to obtain improved lower bounds and policies when given more time. Estimating the expected cost of the SDDP and STT policies requires a separate simulation of these policies, which has very similar computational effort for the two policies, and thus this time is excluded from t_LDR.

The lower bound results are reported in Table 4, in which again the objective values are scaled such that the estimated lower bound obtained by the two-stage LDR algorithm is 100.0. In the table, t_LDR is rounded to the nearest second; in aggregate, 40% of this time is spent solving (6), 48% is spent solving (15), and 12% is spent evaluating the dual bound with the independent sample. The table also presents the mean and half-width (±) of the lower bound obtained using the two-stage LDR and using the SDDP algorithm given time limits TL and 10 · TL. The columns '%L∆' present the percentage difference between the lower end of the 95% CI on the lower bound obtained by the SDDP algorithm and that obtained by the two-stage LDR algorithm; a negative number indicates the SDDP lower bound was smaller (worse), and a positive number indicates an improvement over the two-stage LDR.

                   D-LDR-2S       SDDP TL                SDDP 10X TL
  C    T    t_LDR  Mean    ±      Mean    ±     %L∆      Mean    ±     %L∆
  50   5    188    100.0   0.3    100.6   0.7     0.2    100.8   0.7     0.4
  50   10   866    100.0   0.5    101.5   1.3     0.6    103.2   1.3     2.4
  50   15   1728   100.0   1.3     94.3   2.6    -7.1    103.9   2.9     2.3
  50   20   2897   100.0   1.5     80.8   3.4   -21.4     94.2   3.8    -8.2
  100  5    171    100.0   0.3    101.5   0.7     1.1    101.7   0.7     1.3
  100  10   1094   100.0   0.4    100.8   1.1     0.1    101.6   1.2     0.9
  100  15   2235   100.0   0.9    102.4   2.3     1.0    108.6   2.5     7.1
  100  20   3827   100.0   1.8     88.3   3.4   -13.5    101.4   3.8    -0.6

Table 4: Comparison of lower bounds obtained from two-stage LDR and SDDP algorithm.

We find that when given a time limit similar to the time used by the two-stage LDR approximation, the SDDP algorithm obtains slightly better lower bounds on the instances with fewer time stages, but somewhat worse lower bounds on the instances with more time stages. On the other hand, when given more time, the SDDP algorithm is able to achieve noticeably better lower bounds on the instances with fewer time stages, and closes much of the gap on the instances with more stages.
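To make the '%L∆' entries concrete: for C = 50 and T = 20 under time limit TL, Table 4 gives lower CI endpoints 80.8 − 3.4 = 77.4 for SDDP and 100.0 − 1.5 = 98.5 for the two-stage LDR, so %L∆ = 100 · (77.4 − 98.5)/98.5 ≈ −21.4, matching the table entry.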
We next compare estimates of the expected cost of the policies obtained with the two-stage LDR and SDDP methods. For the two-stage LDR policy, the policy and the estimate of the associated upper bound are determined as described in Section 5.2. For the SDDP algorithm, a policy can be obtained by first solving a (single) SAA approximation problem, and then using the resulting value-function approximation to drive a policy that is then evaluated via forward simulation replications using independently generated values of the random variables (i.e., independent from those used in the SAA approximation). Unfortunately, the ability to run a forward simulation using samples different from those used to solve the SDDP problem is not supported in sddp.jl. To obtain an estimate of the value of the policy that can be obtained using SDDP, for each of the 25 SAA replications solved by SDDP, we simulated the resulting policy using the sample distribution used in the SAA problem to estimate the expected cost of that policy. We then constructed a 95% confidence interval of the resulting upper bounds, and these are the values reported in Table 5. The column 't_EVAL' in this table presents the time, in seconds, to estimate the expected cost of the STT policy. The remaining columns present the confidence intervals of the estimated upper bounds in a format similar to Table 4. As we see from the columns '%U∆', the estimated expected cost of the SDDP policies is in many cases significantly lower than the estimated expected cost of the two-stage LDR policy, suggesting that SDDP obtains significantly better primal policies for this problem.

                    P-LDR-2S       SDDP TL                SDDP 10X TL
  C    T    t_EVAL  Mean    ±      Mean    ±     %U∆      Mean    ±     %U∆
  50   5    53      100.8   0.3    100.7   0.7     0.3    100.8   0.7     0.4
  50   10   239     114.6   0.6    104.3   1.4    -8.2    104.1   1.3    -8.4
  50   15   491     121.0   1.3    109.5   3.0    -8.0    108.4   3.0    -8.8
  50   20   932     102.9   1.4    100.6   4.1     0.4    100.4   4.0     0.1
  100  5    53      101.3   0.3    101.7   0.7     0.8    101.7   0.7     0.7
  100  10   238     109.2   0.4    102.0   1.2    -5.9    102.0   1.2    -5.9
  100  15   495     133.2   1.2    113.4   2.6   -13.6    112.4   2.6   -14.4
  100  20   900     116.6   1.6    109.6   4.0    -3.9    109.0   3.9    -4.5

Table 5: Comparison of approximate upper bounds obtained from two-stage LDR and SDDP algorithm.

In summary, for this problem, SDDP provides similar or slightly worse lower bounds, and significantly better primal policies, in a comparable amount of time as the two-stage LDR approximation, and its lower bounds can be improved by running SDDP for more time. Thus, for this problem, SDDP is clearly favored over the two-stage approximation, and LDR approximations (both static and two-stage) may be most useful for problems in which the assumptions required to apply SDDP do not hold. For example, in a hydropower planning case study presented in [58], the time series of water inflows, X_t, was modeled as X_t = e^{Y_t}, where Y_t follows a first-order autoregressive (AR(1)) time series, making the model nonlinear in X_t, and hence not directly solvable by SDDP.

We propose two-stage LDRs, a new approximate solution method for MSLPs. This approach has two advantages over static LDRs. First, due to the flexibility in the recourse decisions, our method potentially yields better (at least not worse) bounds and policies than standard static LDR policies. Second, as our approach is based on sampling and 2SLP, it requires only very mild assumptions and can take advantage of the existing literature on methods for approximately solving 2SLP problems. We illustrate the new approach on two example problems, an inventory planning problem and a capacity expansion problem, which indicate that two-stage LDR policies have the potential to yield significantly better policies than static LDR policies.

In future research it will be interesting to test the use of two-stage LDR policies on more problems, and to investigate whether there are problem classes for which two-stage LDR policies are provably optimal or near-optimal. In the primal problem, a two-stage LDR can be directly applied to multi-stage stochastic mixed-integer programs, provided the integrality restrictions are imposed only on the recourse variables. Since the availability of algorithms for multi-stage stochastic mixed-integer programs is very limited, it will be interesting to explore this extension further, in particular possibly using ideas from [7] to obtain a decision rule in the case that the state variables also have integrality constraints.
Acknowledgements. This work is supported in part by the National Science Foundation under grant CMMI-1634597, and by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Applied Mathematics program under contract number DE-AC02-06CH11357.
References

[1] S. Ahmed. Multistage stochastic optimization. New Directions Short Course on Mathematical Optimization, 2016.
[2] I. Bakir, N. Boland, B. Dandurand, and A. Erera. Scenario set partition dual bounds for multistage stochastic programming: A hierarchy of bounds and a partition sampling approach, 2016.
[3] D. Bampou and D. Kuhn. Scenario-free stochastic programming with polynomial decision rules. In 50th IEEE Conference on Decision and Control and European Control Conference, pages 7806–7812. IEEE, 2011.
[4] A. Ben-Tal, A. Goryashko, E. Guslitzer, and A. Nemirovski. Adjustable robust solutions of uncertain linear programs. Math. Program., 99(2):351–376, 2004.
[5] A. Ben-Tal and A. Nemirovski. Robust optimization – methodology and applications. Math. Program., 92:453–480, 2002.
[6] D. Bertsimas and C. Caramanis. Adaptability via sampling. In 46th IEEE Conference on Decision and Control, pages 4717–4722. IEEE, 2007.
[7] D. Bertsimas and A. Georghiou. Design of near optimal decision rules in multistage adaptive mixed-integer optimization. Oper. Res., 63(3):610–627, 2015.
[8] J. Birge. Decomposition and partitioning methods for multistage stochastic linear programs. Oper. Res., 33(5):989–1007, 1985.
[9] J. R. Birge. Aggregation bounds in stochastic linear programming. Math. Program., 31(1):25–41, 1985.
[10] J. R. Birge, C. J. Donohue, D. F. Holmes, and O. G. Svintsitski. A parallel implementation of the nested decomposition algorithm for multistage stochastic linear programs. Math. Program., 75(2):327–352, 1996.
[11] G. Calafiore and M. Campi. The scenario approach to robust control design. IEEE Trans. Automat. Control, 51:742–753, 2006.
[12] M. S. Casey and S. Sen. The scenario generation algorithm for multistage stochastic linear programming. Math. Oper. Res., 30(3):615–631, 2005.
[13] X. Chen, M. Sim, P. Sun, and J. Zhang. A linear decision-based approximation approach to stochastic programming. Oper. Res., 56(2):344–357, 2008.
[14] Z.-L. Chen and W. Powell. Convergent cutting-plane and partial-sampling algorithm for multistage stochastic linear programs with recourse. J. Optim. Theory Appl., 102(3):497–524, 1999.
[15] G. de Maere d'Aertrycke, A. Shapiro, and Y. Smeers. Risk exposure and Lagrange multipliers of nonanticipativity constraints in multistage stochastic problems. Math. Meth. Oper. Res., 77(3):393–405, 2013.
[16] O. Dowson and L. Kapelevich. SDDP.jl: a Julia package for stochastic dual dynamic programming. Optimization Online, 2017.
[17] M. Dyer and L. Stougie. Computational complexity of stochastic programming problems. Math. Program., 106(3):423–432, 2006.
[18] M. Eisner and P. Olsen. Duality for stochastic programming interpreted as L.P. in L_p-space. SIAM J. Appl. Math., 28(4):779–792, 1975.
[19] C. I. Fábián and Z. Szőke. Solving two-stage stochastic programming problems with level decomposition. Comput. Manag. Sci., 4:313–353, 2007.
[20] M. C. Fu. Feature article: Optimization for simulation: theory vs. practice. INFORMS J. Comput., 14(3):192–215, 2002.
[21] S. Garstka and R. Wets. On decision rules in stochastic programming. Math. Program., 7(1):117–143, 1974.
[22] H. I. Gassmann. MSLiP: a computer code for the multistage stochastic linear programming problem. Math. Program., 47(1):407–423, 1990.
[23] A. Georghiou, W. Wiesemann, and D. Kuhn. Generalized decision rule approximations for stochastic programming via liftings. Math. Program., 152(1-2):301–338, 2015.
[24] P. Girardeau, V. Leclère, and A. Philpott. On the convergence of decomposition methods for multistage stochastic convex programs. Math. Oper. Res., 40:130–145, 2015.
[25] V. Guigues. SDDP for some interstage dependent risk-averse problems and application to hydro-thermal planning. Comput. Optim. Appl., 57:167–203, 2014.
[26] V. Guigues. Convergence analysis of sampling-based decomposition methods for risk-averse multistage stochastic convex programs. SIAM J. Optim., 26:2468–2494, 2016.
[27] G. A. Hanasusanto, D. Kuhn, and W. Wiesemann. A comment on "Computational complexity of stochastic programming problems". Math. Program., 159(1):557–569, 2016.
[28] H. Heitsch and W. Römisch. Scenario tree modeling for multistage stochastic programs. Math. Program., 118(2):371–406, 2009.
[29] J. Higle and S. Sen. Multistage stochastic convex programs: duality and its implications. Ann. Oper. Res., 142(1):129–146, 2006.
[30] T. Homem-de-Mello. On rates of convergence for stochastic optimization problems under non-independent and identically distributed sampling. SIAM J. Optim., 19:524–551, 2008.
[31] L. J. Hong and B. L. Nelson. A brief introduction to optimization via simulation. In Winter Simulation Conference, WSC '09, pages 75–85. Winter Simulation Conference, 2009.
[32] K. Høyland and S. Wallace. Generating scenario trees for multistage decision problems. Management Sci., 47(2):295–307, 2001.
[33] G. Infanger and D. Morton. Cut sharing for multistage stochastic linear programs with interstage dependency. Math. Program., 75(2):241–256, 1996.
[34] M. Koivu. Variance reduction in sample approximations of stochastic programs. Math. Program., 103:463–485, 2005.
[35] D. Kuhn, W. Wiesemann, and A. Georghiou. Primal and dual linear decision rules in stochastic and robust optimization. Math. Program., 130(1):177–209, 2011.
[36] C. Lemaréchal, A. Nemirovskii, and Y. Nesterov. New variants of bundle methods. Math. Program., 69:111–147, 1995.
[37] C. Lemaréchal, A. Nemirovskii, and Y. Nesterov. New variants of bundle methods. Math. Program., 69:111–147, 1995.
[38] J. Linderoth, A. Shapiro, and S. Wright. The empirical behavior of sampling methods for stochastic programming. Ann. Oper. Res., 142:215–241, 2006.
[39] W.-K. Mak, D. Morton, and R. Wood. Monte Carlo bounding techniques for determining solution quality in stochastic programs. Oper. Res. Lett., 24:47–56, 1999.
[40] A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro. Robust stochastic approximation approach to stochastic programming. SIAM J. Optim., 19(4):1574–1609, 2009.
[41] T. Pennanen. Epi-convergent discretizations of multistage stochastic programs via integration quadratures. Math. Program., 116:461–479, 2009.
[42] M. Pereira and L. Pinto. Multi-stage stochastic optimization applied to energy planning. Math. Program., 52(1-3):359–375, 1991.
[43] A. Philpott and V. de Matos. Dynamic sampling algorithms for multi-stage stochastic programs with risk aversion. European J. Oper. Res., 218(2):470–483, 2012.
[44] A. Philpott and Z. Guan. On the convergence of stochastic dual dynamic programming and related methods. Oper. Res. Lett., 36(4):450–455, 2008.
[45] A. Philpott and G. Pritchard. EMI-DOASA. Technical report, Electric Power Optimization Centre, 2013.
[46] B. Polyak and A. Juditsky. Acceleration of stochastic approximation by averaging. SIAM J. Control Optim., 30:838–855, 1992.
[47] W. Powell. Approximate Dynamic Programming: Solving the Curses of Dimensionality, volume 703. John Wiley & Sons, 2007.
[48] H. Robbins and S. Monro. A stochastic approximation method. Ann. Math. Stat., 22:400–407, 1951.
[49] R. Rockafellar and R. J.-B. Wets. Scenarios and policy aggregation in optimization under uncertainty. Math. Oper. Res., 16(1):119–147, 1991.
[50] A. Ruszczyński. A regularized decomposition method for minimizing a sum of polyhedral functions. Math. Program., 35(3):309–333, 1986.
[51] S. Sen and Z. Zhou. Multistage stochastic decomposition: a bridge between stochastic programming and approximate dynamic programming. SIAM J. Optim., 24(1):127–153, 2014.
[52] A. Shapiro. Stochastic programming approach to optimization under uncertainty. Math. Program., 112:183–220, 2008.
[53] A. Shapiro. Analysis of stochastic dual dynamic programming method. European J. Oper. Res., 209(1):63–72, 2011.
[54] A. Shapiro. Topics in stochastic programming. CORE Lecture Series, Université Catholique de Louvain, 2011.
[55] A. Shapiro, D. Dentcheva, and A. Ruszczyński. Lectures on Stochastic Programming: Modeling and Theory, volume 16. SIAM, 2014.
[56] A. Shapiro and T. Homem-de-Mello. On the rate of convergence of optimal solutions of Monte Carlo approximations of stochastic programs. SIAM J. Optim., 11(1):70–86, 2000.
[57] A. Shapiro and A. Nemirovski. On complexity of stochastic programming problems. In Continuous Optimization, pages 111–146. Springer, 2005.
[58] A. Shapiro, W. Tekaya, J. Paulo da Costa, and M. Pereira Soares. Risk neutral and risk averse stochastic dual dynamic programming method. European J. Oper. Res., 224(2):375–391, 2013.
[59] R. M. Van Slyke and R. J.-B. Wets. L-shaped linear programs with applications to optimal control and stochastic programming. SIAM J. Appl. Math., 17(4):638–663, 1969.
[60] P. Vayanos, D. Kuhn, and B. Rustem. A constraint sampling approach for multi-stage robust optimization. Automatica, 48(3):459–471, 2012.
[61] S. W. Wallace and W. T. Ziemba. Applications of Stochastic Programming, volume 5. SIAM, 2005.

A Data for the capacity expansion problem

                  i = 1    i = 2    i = 3
  ι_i (k€/MW)     245.8    113.9    57.8
  c_i (€/MWh)     41.9     58.9     90.8

Table 6: Fixed annual cost and operation cost (Table 1 in [15])

                 ℓ = 1   ℓ = 2   ℓ = 3   ℓ = 4   ℓ = 5   ℓ = 6   ℓ = 7   ℓ = 8
  d_{1,ℓ} (GW)   77.1    71.4    65.7    60.1    54.4    48.8    43.1    37.4
  τ_ℓ (h)        68      677     1585    1781    1367    1688    1289    305

Table 7: Initial demand (Table 2 in [15])

               w = 1   w = 2   w = 3   w = 4   w = 5
  η_w (%)      92.9    81.8    54.9    21.2    0.0
  τ_w (%)      19.8    21.78   18.2    26.7    13.5

Table 8: Wind regimes (Table 3 in [15])

B Additional models for the capacity expansion example
B.1 Primal model and Benders decomposition
P-LDR-2S of the capacity expansion model is obtained by substituting s_{ti}(ξ^t) = Σ_{k∈[K_t]} Φ_{tk}(ξ^t) β_{tki} in (20). Dropping the ξ^t dependence of the variables to simplify the notation, we obtain

  min  Σ_{i∈[I]} ( c^{u+}_{1i} u^+_{1i} + c^{u−}_{1i} u^−_{1i} + Σ_{j∈[J]} c^x_{1ij} x_{1ij} ) + Σ_{j∈[J]} c^z_{1j} z_{1j}
       + Σ_{t∈[T]} Σ_{i∈[I]} c^s_{ti} Σ_{k∈[K_t]} E[Φ_{tk}(ξ^t)] β_{tki} + Σ_{t∈[2,T]} E[Q_t(β, ξ^t)]   (22a)
  s.t. β_{1i} − u^+_{1i} + u^−_{1i} = 0,   i ∈ [I],   (22b)
       β_{1i} − x_{1ij} ≥ 0,   i ∈ [I], j ∈ [J],   (22c)
       Σ_{i∈[I]} x_{1ij} + z_{1j} ≥ d_{1j},   j ∈ [J],   (22d)
       0 ≤ z_{1j} ≤ d_{1j},   j ∈ [J],   (22e)
       0 ≤ u^+_{1i} ≤ M^{u+}_{1i},  0 ≤ u^−_{1i} ≤ M^{u−}_{1i},  0 ≤ β_{1i} ≤ M^s_{1i},   i ∈ [I],   (22f)
       x_{1ij} ≥ 0,   i ∈ [I], j ∈ [J],   (22g)

where, for t ∈ [2, T], Q_t(β, ξ^t) is defined as the optimal objective value of the following problem:

  min  Σ_{i∈[I]} ( c^{u+}_{ti} u^+_i + c^{u−}_{ti} u^−_i + Σ_{j∈[J]} c^x_{tij} x_{ij} ) + Σ_{j∈[J]} c^z_{tj} z_j   (23a)
  s.t. u^+_i − u^−_i = Σ_{k∈[K_t]} Φ_{tk}(ξ^t) β_{tki} − Σ_{k∈[K_{t−1}]} Φ_{t−1,k}(ξ^{t−1}) β_{t−1,k,i},   i ∈ [I],   (23b)
       x_{ij} ≤ Σ_{k∈[K_t]} Φ_{tk}(ξ^t) β_{tki},   i ∈ [I], j ∈ [J],   (23c)
       Σ_{i∈[I]} x_{ij} + z_j ≥ d_{tj}(ξ^t),   j ∈ [J],   (23d)
       0 ≤ z_j ≤ d_{tj}(ξ^t),   j ∈ [J],   (23e)
       0 ≤ u^+_i ≤ M^{u+}_{ti},   i ∈ [I],   (23f)
       0 ≤ u^−_i ≤ M^{u−}_{ti},   i ∈ [I],   (23g)
       x_{ij} ≥ 0,   i ∈ [I], j ∈ [J],   (23h)
       0 ≤ M^s_{ti} − Σ_{k∈[K_t]} Φ_{tk}(ξ^t) β_{tki},   i ∈ [I],   (23i)
       0 ≤ Σ_{k∈[K_t]} Φ_{tk}(ξ^t) β_{tki},   i ∈ [I].   (23j)

We note that P-LDR-2S of the capacity expansion model does not have relatively complete recourse, since the recourse constraints (23i) and (23j) might be violated under some scenarios.

Let ξ^T_n, n ∈ [N], be an independent and identically distributed (i.i.d.) random sample of the random vector ξ^T. We solve the obtained primal SAA problem with Benders decomposition, using a single aggregate cut per time stage. That is, we have a master problem of the following form:

  min  Σ_{i∈[I]} ( c^{u+}_{1i} u^+_{1i} + c^{u−}_{1i} u^−_{1i} + Σ_{j∈[J]} c^x_{1ij} x_{1ij} ) + Σ_{j∈[J]} c^z_{1j} z_{1j}
       + Σ_{t∈[T]} Σ_{i∈[I]} c^s_{ti} Σ_{k∈[K_t]} E[Φ_{tk}(ξ^t)] β_{tki} + Σ_{t∈[2,T]} η_t   (24a)
  s.t. (22b)–(22g),   (24b)
       (η_t, β_{t11}, …, β_{tK_tI}) ∈ O_t,   t ∈ [2, T],   (24c)
       (β_{t11}, …, β_{tK_tI}) ∈ F_t,   t ∈ [2, T],   (24d)
       0 ≤ Σ_{k∈[K_t]} Φ_{tk}(ξ^t_n) β_{tki} ≤ M^s_{ti},   t ∈ [2, T], i ∈ [I], n ∈ [N],   (24e)
       η_t ≥ 0,   t ∈ [2, T].   (24f)

The variable η_t represents (1/N) Σ_{n∈[N]} Q_t(β, ξ^t_n), that is, the sampled expected second-stage cost at period t ∈ [2, T]. Note that since all the original decision variables are defined to be nonnegative, and all the cost parameters are assumed to be nonnegative, Q_t(β, ξ^t) ≥ 0 for any β, and thus the constraints (24f) are valid. Constraints (24c) and (24d) correspond to the sets of Benders optimality and feasibility cuts, respectively. As the β variables belong to the master problem, we add constraints (23i) and (23j) for each scenario in the sample to the master problem as (24e), which can be seen as an additional set of feasibility cuts.

The subproblem decomposes not only by scenario but also by stage: for t ∈ [2, T] and n ∈ [N], we have the corresponding subproblem (23a)–(23h), denoted by SP(t, n).

At every iteration of the Benders decomposition algorithm, we solve the master problem, obtain a candidate β solution, which is fixed in the subproblems, and solve all the subproblems. For t ∈ [2, T], if there is at least one index n ∈ [N] for which SP(t, n) is infeasible, then we generate a Benders feasibility cut and add it to the master problem. Otherwise, we generate a Benders optimality cut, but add it to the master problem only if it is violated at the current master problem solution. We repeat this procedure until all the subproblems are feasible and no violated optimality cuts are found.
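To make this loop concrete, the following is a self-contained toy sketch, in Python with SciPy, of Benders decomposition with a single aggregate optimality cut per iteration; it is not our C++ implementation, and the newsvendor-style toy has complete recourse, so, unlike the capacity expansion model, no feasibility cuts arise. All data are illustrative assumptions.

from scipy.optimize import linprog

c, q = 1.0, 3.0                       # first-stage and recourse unit costs
xis = [2.0, 5.0, 8.0, 11.0]           # equiprobable demand scenarios
U = 20.0                              # bound on the first-stage decision x

def subproblem(x, xi):
    """Q(x, xi) = min { q*y : y >= xi - x, y >= 0 } in closed form,
    together with a subgradient of Q(., xi) at x."""
    return q * max(xi - x, 0.0), (-q if xi > x else 0.0)

cuts = []                              # cut k: eta >= a_k + g_k * x
for _ in range(50):
    # Master: min c*x + eta  s.t.  g_k*x - eta <= -a_k,  0 <= x <= U,  eta >= 0
    A_ub = [[g, -1.0] for g, a in cuts] or None
    b_ub = [-a for g, a in cuts] or None
    x, eta = linprog([c, 1.0], A_ub=A_ub, b_ub=b_ub,
                     bounds=[(0, U), (0, None)], method="highs").x
    vals = [subproblem(x, xi) for xi in xis]
    Q_bar = sum(v for v, g in vals) / len(xis)     # aggregate expected cost
    g_bar = sum(g for v, g in vals) / len(xis)     # aggregate subgradient
    if Q_bar <= eta + 1e-9:                        # no violated cut: optimal
        break
    cuts.append((g_bar, Q_bar - g_bar * x))        # eta >= Q_bar + g_bar*(x' - x)
print(f"x* = {x:.3f}, total cost = {c * x + Q_bar:.3f}")

In the capacity expansion model the same loop runs with one aggregate cut per stage t, and additionally adds a feasibility cut whenever some SP(t, n) is infeasible.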
B.2 Dual model and level method

Let λ, γ, θ^+, θ^−, Γ^{u+}, Γ^{u−}, Γ^s be the dual variables associated with the constraints (20b)–(20h) in (20), respectively. Then, the dual of (20) is:

  max  E[ Σ_{t∈[T]} ( Σ_{j∈[J]} d_{tj}(ξ^t) (θ^+_{tj}(ξ^t) − θ^−_{tj}(ξ^t))
       − Σ_{i∈[I]} ( M^{u+}_{ti} Γ^{u+}_{ti}(ξ^t) + M^{u−}_{ti} Γ^{u−}_{ti}(ξ^t) + M^s_{ti} Γ^s_{ti}(ξ^t) ) ) ]   (25a)
  s.t. λ_{ti}(ξ^t) − E[λ_{t+1,i}(ξ^{t+1}) | ξ^t] + Σ_{j∈[J]} γ_{tij}(ξ^t) − Γ^s_{ti}(ξ^t) ≤ c^s_{ti},   t ∈ [T], P-a.s., i ∈ [I],   (25b)
       −Γ^{u+}_{ti}(ξ^t) − λ_{ti}(ξ^t) ≤ c^{u+}_{ti},   t ∈ [T], P-a.s., i ∈ [I],   (25c)
       −Γ^{u−}_{ti}(ξ^t) + λ_{ti}(ξ^t) ≤ c^{u−}_{ti},   t ∈ [T], P-a.s., i ∈ [I],   (25d)
       θ^+_{tj}(ξ^t) − θ^−_{tj}(ξ^t) ≤ c^z_{tj},   t ∈ [T], P-a.s., j ∈ [J],   (25e)
       θ^+_{tj}(ξ^t) − γ_{tij}(ξ^t) ≤ c^x_{tij},   t ∈ [T], P-a.s., i ∈ [I], j ∈ [J],   (25f)
       γ_{tij}(ξ^t) ≥ 0,   t ∈ [T], P-a.s., i ∈ [I], j ∈ [J],   (25g)
       θ^+_{tj}(ξ^t), θ^−_{tj}(ξ^t) ≥ 0,   t ∈ [T], P-a.s., j ∈ [J],   (25h)
       Γ^{u+}_{ti}(ξ^t), Γ^{u−}_{ti}(ξ^t), Γ^s_{ti}(ξ^t) ≥ 0,   t ∈ [T], P-a.s., i ∈ [I].   (25i)

Observing that the θ^− variables are redundant, we remove them to simplify the model. D-LDR-2S of the capacity expansion model is obtained by substituting λ_{ti}(ξ^t) = Σ_{k∈[K_t]} Φ_{tk}(ξ^t) Λ_{tki} in (25). Dropping the ξ^t dependence of the variables to simplify the notation, we obtain

  max  Σ_{j∈[J]} d_{1j} θ^+_{1j} − Σ_{i∈[I]} ( M^{u+}_{1i} Γ^{u+}_{1i} + M^{u−}_{1i} Γ^{u−}_{1i} + M^s_{1i} Γ^s_{1i} ) + Σ_{t∈[2,T]} E[G_t(Λ, ξ^t)]   (26a)
  s.t. Λ_{1i} − Σ_{k∈[K_2]} E[Φ_{2k}(ξ^2)] Λ_{2ki} + Σ_{j∈[J]} γ_{1ij} − Γ^s_{1i} ≤ c^s_{1i},   i ∈ [I],   (26b)
       −Γ^{u+}_{1i} − Λ_{1i} ≤ c^{u+}_{1i},   i ∈ [I],   (26c)
       −Γ^{u−}_{1i} + Λ_{1i} ≤ c^{u−}_{1i},   i ∈ [I],   (26d)
       θ^+_{1j} ≤ c^z_{1j},   j ∈ [J],   (26e)
       θ^+_{1j} − γ_{1ij} ≤ c^x_{1ij},   i ∈ [I], j ∈ [J],   (26f)
       γ_{1ij} ≥ 0,   i ∈ [I], j ∈ [J],   (26g)
       θ^+_{1j} ≥ 0,   j ∈ [J],   (26h)
       Γ^{u+}_{1i}, Γ^{u−}_{1i}, Γ^s_{1i} ≥ 0,   i ∈ [I],   (26i)

where, for t ∈ [2, T], G_t(Λ, ξ^t) is defined as the optimal objective value of the following problem:

  max  Σ_{j∈[J]} d_{tj}(ξ^t) θ^+_j − Σ_{i∈[I]} ( M^{u+}_{ti} Γ^{u+}_i + M^{u−}_{ti} Γ^{u−}_i + M^s_{ti} Γ^s_i )   (27a)
  s.t. Σ_{j∈[J]} γ_{ij} − Γ^s_i ≤ c^s_{ti} − Σ_{k∈[K_t]} Φ_{tk}(ξ^t) Λ_{tki} + Σ_{k∈[K_{t+1}]} E[Φ_{t+1,k}(ξ^{t+1}) | ξ^t] Λ_{t+1,k,i},   i ∈ [I],   (27b)
       −Γ^{u+}_i ≤ c^{u+}_{ti} + Σ_{k∈[K_t]} Φ_{tk}(ξ^t) Λ_{tki},   i ∈ [I],   (27c)
       −Γ^{u−}_i ≤ c^{u−}_{ti} − Σ_{k∈[K_t]} Φ_{tk}(ξ^t) Λ_{tki},   i ∈ [I],   (27d)
       θ^+_j ≤ c^z_{tj},   j ∈ [J],   (27e)
       θ^+_j − γ_{ij} ≤ c^x_{tij},   i ∈ [I], j ∈ [J],   (27f)
       γ_{ij} ≥ 0,   i ∈ [I], j ∈ [J],   (27g)
       θ^+_j ≥ 0,   j ∈ [J],   (27h)
       Γ^{u+}_i, Γ^{u−}_i, Γ^s_i ≥ 0,   i ∈ [I].   (27i)

Let ξ^T_n, n ∈ [N], be an i.i.d. random sample of the random vector ξ^T. We solve the obtained dual SAA problem with the bundle-level method, because the Benders decomposition method converged slowly for this problem.
We use cuts aggregated over scenarios, and thus introduce a variable ζ_t to represent the expected second-stage cost value, i.e., (1/N) Σ_{n∈[N]} G_t(Λ, ξ^t_n), for t ∈ [2, T].

We observe that the subproblem (27) can be further decomposed into two problems: one including only the u^+- and u^−-related variables, and the other including the remaining set of variables:

  (DSP_upart):  max  −Σ_{i∈[I]} ( M^{u+}_{ti} Γ^{u+}_i + M^{u−}_{ti} Γ^{u−}_i )   s.t. (27c), (27d), (27i)
  (DSP_rest):   max  Σ_{j∈[J]} d_{tj}(ξ^t) θ^+_j − Σ_{i∈[I]} M^s_{ti} Γ^s_i   s.t. (27b), (27e)–(27h)

We exploit this decomposition to disaggregate the optimality cuts in the master problem. Thus, we introduce additional variables ζ^{upart}_t and ζ^{rest}_t for t ∈ [2, T] and obtain the following master problem:

  (MP):  max  Σ_{j∈[J]} d_{1j} θ^+_{1j} − Σ_{i∈[I]} ( M^{u+}_{1i} Γ^{u+}_{1i} + M^{u−}_{1i} Γ^{u−}_{1i} + M^s_{1i} Γ^s_{1i} ) + Σ_{t∈[2,T]} ζ_t   (28a)
  s.t. (26b)–(26i),   (28b)
       ζ_t = ζ^{upart}_t + ζ^{rest}_t,   t ∈ [2, T],   (28c)
       (ζ^{upart}_t, Λ_{t11}, …, Λ_{tK_tI}) ∈ U_t,   t ∈ [2, T],   (28d)
       (ζ^{rest}_t, Λ_{t11}, …, Λ_{tK_tI}) ∈ R_t,   t ∈ [2, T],   (28e)
       ζ^{upart}_t ≤ 0,   t ∈ [2, T],   (28f)
       ζ^{rest}_t ≤ (1/N) Σ_{n∈[N]} Σ_{j∈[J]} d_{tj}(ξ^t_n) c^z_{tj},   t ∈ [2, T],   (28g)

where U_t and R_t represent the optimality cuts derived from problems (DSP_upart) and (DSP_rest), respectively. Moreover, (28f) and (28g) are upper bounds on the new auxiliary variables, derived from the subproblems: the objective of (DSP_upart) is nonpositive since Γ^{u+}, Γ^{u−} ≥ 0, and in (DSP_rest) each θ^+_j is bounded above by c^z_{tj}.

The level method also uses a quadratic program for regularization, which projects the previous iterate onto the level set of the current approximation of the objective function. We use the following problem for this projection:

  (QP):  min  ‖Λ − Λ̂‖²
         s.t. (28b)–(28g),
              Σ_{j∈[J]} d_{1j} θ^+_{1j} − Σ_{i∈[I]} ( M^{u+}_{1i} Γ^{u+}_{1i} + M^{u−}_{1i} Γ^{u−}_{1i} + M^s_{1i} Γ^s_{1i} ) + Σ_{t∈[2,T]} ζ_t ≥ L,

where Λ̂ and L denote the current Λ solution (i.e., the previous iterate) and the level target, respectively. The optimal values of the Λ variables determine the next iterate.

The details of the level method are provided in Algorithm 1, where LB and UB denote the lower and upper bound, respectively.
Algorithm 1: Level Algorithm
1. Initialize Λ̂ = 0, LB = −∞, UB = ∞.
2. Solve all the subproblems, i.e., (DSP_upart) and (DSP_rest) for all t ∈ [2, T] and n ∈ [N].
   Generate Benders optimality cuts, and add them to both (MP) and (QP).
   Compute the objective value of the current iterate, and set it as LB.
3. do
      Solve (MP); if the (MP) optimal objective value is lower than UB, update UB.
      Set L = λ · UB + (1 − λ) · LB for a fixed level parameter λ ∈ (0, 1).
      Solve (QP) with the updated level constraint, to obtain the next iterate Λ̂.
      Solve all the subproblems at the current iterate.
      Generate Benders optimality cuts, and add violated cuts to both (MP) and (QP).
      Compute the objective value of the current iterate; if it is larger than LB, update LB.
   until |UB − LB| / UB < 10^{−5}
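For concreteness, the following is a self-contained toy sketch, in Python with SciPy, of the MP/QP/oracle structure of Algorithm 1: the level method of [36, 37] applied to maximizing a generic concave piecewise-linear function f(x) = min_i (a_i·x + b_i) over a box. It is not the implementation used in our experiments; the data and the level parameter lam = 0.5 are illustrative assumptions.

import numpy as np
from scipy.optimize import linprog, minimize

A = np.array([[1.0, 2.0], [-1.0, 1.0], [0.5, -1.5], [-2.0, -0.5]])
B = np.array([3.0, 5.0, 4.0, 6.0])

def oracle(x):
    """Objective value and one supergradient (of the active piece) at x."""
    vals = A @ x + B
    i = int(np.argmin(vals))
    return vals[i], A[i]

box = [(-10.0, 10.0), (-10.0, 10.0)]
lam = 0.5                              # level parameter (illustrative)
x_hat, cuts, LB = np.zeros(2), [], -np.inf

for _ in range(100):
    f, g = oracle(x_hat)               # evaluate the current iterate
    LB = max(LB, f)                    # best objective value found so far
    cuts.append((x_hat.copy(), f, g))
    # (MP): max theta  s.t.  theta <= f_k + g_k @ (x - x_k),  x in box
    A_ub = [np.append(-gk, 1.0) for xk, fk, gk in cuts]
    b_ub = [fk - gk @ xk for xk, fk, gk in cuts]
    UB = -linprog([0.0, 0.0, -1.0], A_ub=A_ub, b_ub=b_ub,
                  bounds=box + [(None, None)], method="highs").fun
    if UB - LB < 1e-6 * abs(UB):       # relative-gap stopping test
        break
    L = lam * UB + (1 - lam) * LB      # level target between the bounds
    # (QP): project x_hat onto the level set of the current cut model
    cons = [{"type": "ineq",
             "fun": lambda x, xk=xk, fk=fk, gk=gk: fk + gk @ (x - xk) - L}
            for xk, fk, gk in cuts]
    x_hat = minimize(lambda x: ((x - x_hat) ** 2).sum(), x_hat,
                     bounds=box, constraints=cons, method="SLSQP").x
print(f"best value {LB:.4f} at x = {x_hat}")

In the dual SAA problem, the oracle step corresponds to solving all subproblems (DSP_upart) and (DSP_rest), and the cut model is disaggregated via the ζ^{upart}_t and ζ^{rest}_t variables as in (28).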