Single cut and multicut SDDP with cut selection for multistage stochastic linear programs: convergence proof and numerical experiments
Vincent Guigues (corresponding author), School of Applied Mathematics, FGV, Praia de Botafogo, Rio de Janeiro, Brazil, [email protected]
Michelle Bandarra, School of Applied Mathematics, FGV, Praia de Botafogo, Rio de Janeiro, Brazil, [email protected]
Abstract.
We introduce a variant of Multicut Decomposition Algorithms (MuDA), called CuSMuDA (Cut Selection for Multicut Decomposition Algorithms), for solving multistage stochastic linear programs that incorporates a class of cut selection strategies to choose the most relevant cuts of the approximate recourse functions. This class contains the Level 1 [28] and Limited Memory Level 1 [16] cut selection strategies, initially introduced for Stochastic Dual Dynamic Programming (SDDP) and Dual Dynamic Programming (DDP), respectively. We prove the almost sure convergence of the method in a finite number of iterations and obtain as a by-product the almost sure convergence in a finite number of iterations of SDDP combined with our class of cut selection strategies. We compare the performance of MuDA, SDDP, and their variants with cut selection (using Level 1 and Limited Memory Level 1) on several instances of a portfolio problem and of an inventory problem. In these experiments, SDDP generally satisfies the stopping criterion sooner than MuDA, and cut selection reduces the computational burden, with Limited Memory Level 1 being more efficient (sometimes much more) than Level 1.
Keywords:
Stochastic Programming; Stochastic Dual Dynamic Programming; Multicut Decomposition Algorithm; Portfolio selection; Inventory management.
AMS subject classifications:
1. Introduction
Multistage stochastic optimization problems are common in many areas of engineering and in finance. However, solving these problems is challenging and in general requires decomposition techniques. Two popular decomposition methods are Approximate Dynamic Programming (ADP) [30] and sampling-based variants of the Nested Decomposition (ND) algorithm [6, 5] and of the Multicut Nested Decomposition (MuND) algorithm [9, 11]. The introduction of sampling within ND was proposed by [25] and the corresponding method is usually called Stochastic Dual Dynamic Programming (SDDP). Several enhancements and extensions of SDDP have been proposed, such as CUPPS [10], ReSA [20], AND [7], DOASA [29], risk-averse variants in [17, 18, 27, 14, 34, 22], cut formulas for nonlinear problems in [15], regularizations in [2] for linear problems, and SDDP-REG in [19] for nonlinear problems. Convergence of these methods was proved in [29] for linear problems, in [13] for nonlinear risk-neutral problems, and in [15] for risk-averse nonlinear problems. All these algorithms compute lower approximations of the cost-to-go functions expressed as a supremum of affine functions called optimality cuts. Typically, at each iteration, a fixed number of cuts is added for each cost-to-go function. Therefore, techniques to reduce the number of cuts in each subproblem (referred to as cut selection or cut pruning) may be helpful to speed up the convergence of these methods. In stochastic optimization, the problem of cut selection for lower approximations of the cost-to-go functions associated with each node of the scenario tree was discussed for the first time in [31], where only the active cuts are selected. Pruning strategies of basis (quadratic) functions have been proposed in [12] and [24] for max-plus based approximation methods which, similarly to SDDP, approximate the cost-to-go functions of a nonlinear optimal control problem by a supremum of basis functions.
More precisely, in [12] a fixed number of cuts is eliminated and cut selection is done solving a combinatorial optimization problem. For SDDP, it is suggested in [34] to eliminate redundant cuts at some iterations (a cut is redundant if it is never active in describing the lower approximate cost-to-go function). This procedure is called the test of usefulness in [26]. It requires solving at each stage as many linear programs as there are cuts. In [26] and [28], only the cuts that have the largest value for at least one of the trial points computed are considered relevant. This strategy is called the Territory algorithm in [26] and Level 1 cut selection in [28]. It was presented for the first time in 2007 at the ROADEF congress by David Game and Guillaume Le Roy (GDF-Suez), see [26].

Sampling can also be incorporated into the MuND algorithm (see for instance [36]). This algorithm builds many more cuts than SDDP per iteration; each iteration therefore takes more time, but fewer iterations are in general needed to satisfy a given stopping criterion. Cut selection strategies could therefore also be useful in this setting. However, to the best of our knowledge, the combination of multicut decomposition methods with cut selection strategies, referred to as CuSMuDA (Cut Selection for Multicut Decomposition Algorithms) in the sequel, has not been proposed so far. In this context, the objectives and contributions of this paper are the following:

(A) We propose cut selection strategies that are more efficient than the aforementioned ones. More precisely, instead of selecting all the cuts that are the highest at the trial points, we introduce a set of selectors that select some subset of these cuts. The selectors have to satisfy an assumption (Assumption (H3), see Section 3.2) to ensure the convergence of SDDP and MuDA combined with these cut selection strategies.
We obtain a family of cut selection strategies, a given strategy corresponding to a choice of selectors along the iterations. In this family, the most economic (in terms of memory) cut selection strategy satisfying (H3) is the Limited Memory Level 1 (LML 1) strategy, which selects at each trial point only one cut, namely the oldest cut. This strategy was introduced in [16] in the context of Dual Dynamic Programming but can be straightforwardly applied to SDDP and MuDA. The least economic strategy, i.e., the one that keeps the largest number of cuts, is Level 1. "Between" these two strategies, using the flexibility offered by the selectors (as long as Assumption (H3) is satisfied by these selectors), we obtain a (large) family of cut selection strategies.

(B) We introduce and describe CuSMuDA, a combination of MuDA with our family of cut selection strategies.

(C) We prove the almost sure convergence of CuSMuDA in a finite number of iterations. This proof extends the theory in [16] in two aspects: (i) the stochastic case is considered, i.e., SDDP and multicut SDDP, whereas the deterministic case, i.e., DDP, was considered in [16]; and (ii) more general cut selection strategies are dealt with. Item (ii) requires an additional technical discussion, see Lemma 1.

(D) We present the results of numerical experiments comparing the performance of six solution methods on several instances of a portfolio problem and of an inventory problem. These six solution methods are SDDP, SDDP with Level 1 cut selection, SDDP with LML 1 cut selection, MuDA, CuSMuDA with Level 1 cut selection, and CuSMuDA with LML 1 cut selection. To the best of our knowledge, these are the first numerical experiments on SDDP with LML 1 cut selection and the first experiments on multicut SDDP combined with cut selection. The main conclusions of these experiments are the following:

– In general, for a given instance, CuSMuDA with LML 1 cut selection (resp. SDDP combined with LML 1 cut selection) is more efficient (i.e., allows us to satisfy the stopping criterion more quickly) than CuSMuDA with Level 1 cut selection (resp. SDDP combined with Level 1 cut selection), itself much more efficient than MuDA (resp. SDDP). Typically, variants with cut selection require more iterations, but iterations are quicker, with very few cuts selected for the first stages. However, on some instances, CuSMuDA with Level 1 cut selection still selected a large proportion of cuts for all stages and was less efficient than both MuDA and CuSMuDA with LML 1 cut selection.

– MuDA (resp. CuSMuDA) in general requires much more computational effort than SDDP (resp. SDDP with cut selection). However, on some instances, CuSMuDA with LML 1 cut selection and SDDP combined with LML 1 cut selection have shown similar performances. We also expect CuSMuDA to be more efficient than SDDP with cut selection when MuDA is already more efficient than SDDP. Even if this is not often the case, results of numerical experiments on several instances of multistage stochastic linear programs where MuDA is quicker than SDDP are reported in [8].

The outline of the study is as follows. The class of problems considered and our assumptions are discussed in Section 2. In Subsection 3.1 we recall sampling-based MuND, while in Subsection 3.2 CuSMuDA is described. In Section 4, we prove Theorem 1, which states that CuSMuDA converges almost surely in a finite number of iterations to an optimal policy. As a by-product, we obtain the almost sure convergence of SDDP combined with our class of cut selection strategies (in particular Level 1 and LML 1) in a finite number of iterations. Finally, numerical experiments are presented in Section 5.

Throughout the paper, the usual scalar product in $\mathbb{R}^n$ is denoted by $\langle x, y \rangle = x^\top y$ for $x, y \in \mathbb{R}^n$.

2. Problem formulation and assumptions
We are interested in solution methods for linear stochastic dynamic programming equations: the first stage problem is
$$(1)\qquad Q_1(x_0) = \begin{cases} \inf_{x_1 \in \mathbb{R}^n} \langle c_1, x_1 \rangle + \mathcal{Q}_2(x_1) \\ A_1 x_1 + B_1 x_0 = b_1,\; x_1 \ge 0, \end{cases}$$
for $x_0$ given, and for $t = 2, \ldots, T$, $\mathcal{Q}_t(x_{t-1}) = \mathbb{E}_{\xi_t}[Q_t(x_{t-1}, \xi_t)]$ with
$$(2)\qquad Q_t(x_{t-1}, \xi_t) = \begin{cases} \inf_{x_t \in \mathbb{R}^n} \langle c_t, x_t \rangle + \mathcal{Q}_{t+1}(x_t) \\ A_t x_t + B_t x_{t-1} = b_t,\; x_t \ge 0, \end{cases}$$
with the convention that $\mathcal{Q}_{T+1}$ is null, and where for $t = 2, \ldots, T$ the random vector $\xi_t$ corresponds to the concatenation of the elements in the random matrices $A_t, B_t$, which have a known finite number of rows, and in the random vectors $b_t, c_t$ (it is assumed that $\xi_1$ is not random). For convenience, we will denote
$$X_t(x_{t-1}, \xi_t) := \{ x_t \in \mathbb{R}^n : A_t x_t + B_t x_{t-1} = b_t,\; x_t \ge 0 \}.$$
We make the following assumptions:

(H1) The random vectors $\xi_2, \ldots, \xi_T$ are independent and have discrete distributions with finite support.

(H2) The set $X_1(x_0, \xi_1)$ is nonempty and bounded, and for every $x_1 \in X_1(x_0, \xi_1)$, for every $t = 2, \ldots, T$, for every realization $\tilde\xi_2, \ldots, \tilde\xi_t$ of $\xi_2, \ldots, \xi_t$, for every $x_\tau \in X_\tau(x_{\tau-1}, \tilde\xi_\tau)$, $\tau = 2, \ldots, t-1$, the set $X_t(x_{t-1}, \tilde\xi_t)$ is nonempty and bounded.

We will denote by $\Theta_t = \{\xi_{t1}, \ldots, \xi_{tM_t}\}$ the support of $\xi_t$ for stage $t$, with $p_{ti} = \mathbb{P}(\xi_t = \xi_{ti}) > 0$, $i = 1, \ldots, M_t$, and with vector $\xi_{tj}$ being the concatenation of the elements in $A_{tj}, B_{tj}, b_{tj}, c_{tj}$.

3. Algorithms
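Each stage subproblem in (1)-(2) is itself a small linear program parameterized by the previous decision. As a concrete illustration, the following is a minimal sketch of solving one stage problem $Q_t(x_{t-1}, \xi_t)$ with `scipy.optimize.linprog`; the data below is an illustrative toy instance, not from the paper:

```python
import numpy as np
from scipy.optimize import linprog

def stage_value(c, A, B, b, x_prev):
    """Solve Q_t(x_prev, xi) = min <c, x> s.t. A x + B x_prev = b, x >= 0."""
    res = linprog(c, A_eq=A, b_eq=b - B @ x_prev, bounds=(0, None), method="highs")
    assert res.success
    return res.fun, res.x

# Toy two-variable instance (hypothetical data).
c = np.array([1.0, 2.0])
A = np.array([[1.0, 1.0]])
B = np.array([[1.0]])
b = np.array([3.0])
value, x = stage_value(c, A, B, b, x_prev=np.array([1.0]))
print(value)  # minimizes x1 + 2*x2 subject to x1 + x2 = 2, so x = (2, 0) and value 2.0
```

The coupling with the previous stage enters only through the right-hand side $b - B x_{t-1}$, which is what makes the cut-based approximations of the next sections possible.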
3.1. Multicut stochastic decomposition. The multicut stochastic decomposition method approximates the function $Q_t(\cdot, \xi_{tj})$ at iteration $k$, for $t = 2, \ldots, T$, $j = 1, \ldots, M_t$, by a piecewise affine lower bounding function $Q_t^k(\cdot, \xi_{tj})$ which is a maximum of $k$ affine functions $C_{tj}^i$ called cuts:
$$Q_t^k(x_{t-1}, \xi_{tj}) = \max_{1 \le i \le k} C_{tj}^i(x_{t-1}) \quad \text{with} \quad C_{tj}^i(x_{t-1}) = \theta_{tj}^i + \langle \beta_{tj}^i, x_{t-1} \rangle,$$
where the coefficients $\theta_{tj}^i, \beta_{tj}^i$ are computed as explained below. These approximations provide the lower bounding functions
$$(3)\qquad \mathcal{Q}_t^k(x_{t-1}) = \sum_{j=1}^{M_t} p_{tj}\, Q_t^k(x_{t-1}, \xi_{tj})$$
for $\mathcal{Q}_t$. Since $\mathcal{Q}_{T+1}$ is the null function, we also define $\mathcal{Q}_{T+1}^k \equiv 0$. The steps of MuDA are described below.
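Before listing the steps, note that evaluating the lower bounding functions above from stored cut coefficients only involves maxima of affine functions and a probability-weighted sum as in (3). A minimal numpy sketch (array layout and names are hypothetical):

```python
import numpy as np

def eval_scenario_bound(thetas, betas, x):
    """Q_t^k(x, xi_tj): maximum over the stored cuts theta_i + <beta_i, x>."""
    return np.max(thetas + betas @ x)

def eval_expected_bound(p, cuts_per_scenario, x):
    """Aggregation (3): probability-weighted sum over the M_t scenarios."""
    return sum(p_j * eval_scenario_bound(th, be, x)
               for p_j, (th, be) in zip(p, cuts_per_scenario))

# Two cuts for each of two equiprobable scenarios (illustrative numbers).
cuts = [(np.array([0.0, 1.0]), np.array([[1.0], [0.0]])),   # scenario 1
        (np.array([2.0, 0.0]), np.array([[0.0], [1.0]]))]   # scenario 2
p = [0.5, 0.5]
x = np.array([3.0])
print(eval_expected_bound(p, cuts, x))  # 0.5*max(3, 1) + 0.5*max(2, 3) = 3.0
```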
Step 1: Initialization. For $t = 2, \ldots, T$, $j = 1, \ldots, M_t$, take for $Q_t^0(\cdot, \xi_{tj})$ a known lower bounding affine function for $Q_t(\cdot, \xi_{tj})$. Set the iteration count $k$ to 1 and $\mathcal{Q}_{T+1}^0 \equiv 0$.

Step 2: Forward pass. We generate a sample $\tilde\xi^k = (\tilde\xi_1^k, \tilde\xi_2^k, \ldots, \tilde\xi_T^k)$ from the distribution of $(\xi_1, \xi_2, \ldots, \xi_T)$, with the convention that $\tilde\xi_1^k = \xi_1$ (here and in what follows, the tilde symbol will be used to represent realizations of random variables: for a random variable $\xi$, $\tilde\xi$ is a realization of $\xi$). Using the approximations $Q_t^{k-1}(\cdot, \xi_{tj})$ of $Q_t(\cdot, \xi_{tj})$ (computed at previous iterations), we solve the problem
$$(4)\qquad \begin{cases} \inf_{x_t \in \mathbb{R}^n} \langle x_t, \tilde c_t^k \rangle + \mathcal{Q}_{t+1}^{k-1}(x_t) \\ x_t \in X_t(x_{t-1}^k, \tilde\xi_t^k) \end{cases}$$
for $t = 1, \ldots, T$, where $x_0^k = x_0$ and $\mathcal{Q}_{t+1}^{k-1}$ is given by (3) with $k$ replaced by $k-1$. Let $x_t^k$ be an optimal solution of the problem.

Step 3: Backward pass.
The backward pass builds cuts for $Q_t(\cdot, \xi_{tj})$ at the points $x_{t-1}^k$ computed in the forward pass. For $k \ge 1$, $t = 1, \ldots, T$, we introduce the function $\underline{Q}_t^k : \mathbb{R}^n \times \Theta_t \to \mathbb{R}$ given by
$$(5)\qquad \underline{Q}_t^k(x_{t-1}, \xi_t) = \begin{cases} \inf_{x_t \in \mathbb{R}^n} \langle c_t, x_t \rangle + \mathcal{Q}_{t+1}^k(x_t) \\ x_t \in X_t(x_{t-1}, \xi_t), \end{cases}$$
with the convention that $\Theta_1 = \{\xi_1\}$, and we set $\mathcal{Q}_{T+1}^k \equiv 0$. For $j = 1, \ldots, M_T$, we solve the problem
$$(6)\qquad Q_T(x_{T-1}^k, \xi_{Tj}) = \begin{cases} \inf_{x_T \in \mathbb{R}^n} \langle c_{Tj}, x_T \rangle \\ A_{Tj} x_T + B_{Tj} x_{T-1}^k = b_{Tj},\; x_T \ge 0, \end{cases}$$
with dual
$$\begin{cases} \sup_{\lambda}\; \langle \lambda, b_{Tj} - B_{Tj} x_{T-1}^k \rangle \\ A_{Tj}^\top \lambda \le c_{Tj}. \end{cases}$$
Let $\lambda_{Tj}^k$ be an optimal solution of the dual problem above. We get $Q_T(x_{T-1}, \xi_{Tj}) \ge \langle \lambda_{Tj}^k, b_{Tj} - B_{Tj} x_{T-1} \rangle$ and compute $\theta_{Tj}^k = \langle b_{Tj}, \lambda_{Tj}^k \rangle$ and $\beta_{Tj}^k = -B_{Tj}^\top \lambda_{Tj}^k$. Then for $t = T-1$ down to $t = 2$, knowing that $\mathcal{Q}_{t+1}^k \le \mathcal{Q}_{t+1}$, we solve the problem below for $j = 1, \ldots, M_t$:
$$(7)\qquad \underline{Q}_t^k(x_{t-1}^k, \xi_{tj}) = \begin{cases} \inf_{x_t}\; \langle c_{tj}, x_t \rangle + \mathcal{Q}_{t+1}^k(x_t) \\ x_t \in X_t(x_{t-1}^k, \xi_{tj}) \end{cases} = \begin{cases} \inf_{x_t, f}\; \langle c_{tj}, x_t \rangle + \sum_{\ell=1}^{M_{t+1}} p_{t+1\,\ell}\, f_\ell \\ A_{tj} x_t + B_{tj} x_{t-1}^k = b_{tj},\; x_t \ge 0, \\ f_\ell \ge \theta_{t+1\,\ell}^i + \langle \beta_{t+1\,\ell}^i, x_t \rangle,\; i = 1, \ldots, k,\; \ell = 1, \ldots, M_{t+1}. \end{cases}$$
Observe that due to (H2) the above problem is feasible and has a finite optimal value. Therefore $\underline{Q}_t^k(x_{t-1}^k, \xi_{tj})$ can be expressed as the optimal value of the corresponding dual problem:
$$(8)\qquad \underline{Q}_t^k(x_{t-1}^k, \xi_{tj}) = \begin{cases} \sup_{\lambda, \mu}\; \langle \lambda, b_{tj} - B_{tj} x_{t-1}^k \rangle + \sum_{i=1}^{k} \sum_{\ell=1}^{M_{t+1}} \mu_\ell^i \theta_{t+1\,\ell}^i \\ A_{tj}^\top \lambda + \sum_{i=1}^{k} \sum_{\ell=1}^{M_{t+1}} \mu_\ell^i \beta_{t+1\,\ell}^i \le c_{tj}, \\ p_{t+1\,\ell} = \sum_{i=1}^{k} \mu_\ell^i,\; \ell = 1, \ldots, M_{t+1}, \\ \mu_\ell^i \ge 0,\; i = 1, \ldots, k,\; \ell = 1, \ldots, M_{t+1}. \end{cases}$$
Let $(\lambda_{tj}^k, \mu_{tj}^k)$ be an optimal solution of dual problem (8). Using the fact that $\mathcal{Q}_{t+1}^k \le \mathcal{Q}_{t+1}$, we get
$$Q_t(x_{t-1}, \xi_{tj}) \ge \underline{Q}_t^k(x_{t-1}, \xi_{tj}) \ge \langle \lambda_{tj}^k, b_{tj} - B_{tj} x_{t-1} \rangle + \langle \mu_{tj}^k, \theta_{t+1}^k \rangle$$
and we compute $\theta_{tj}^k = \langle \lambda_{tj}^k, b_{tj} \rangle + \langle \mu_{tj}^k, \theta_{t+1}^k \rangle$ and $\beta_{tj}^k = -B_{tj}^\top \lambda_{tj}^k$.
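For the last stage, the cut coefficients come directly from the dual of the stage LP. A hedged sketch using `scipy.optimize.linprog`, whose HiGHS backend exposes the equality-constraint duals as `res.eqlin.marginals`; the data below is an illustrative toy instance, not from the paper:

```python
import numpy as np
from scipy.optimize import linprog

def last_stage_cut(c, A, B, b, x_trial):
    """Solve (6) at x_trial and return (value, theta, beta) with
    Q_T(x, xi) >= theta + <beta, x> for all x, with equality at x_trial."""
    res = linprog(c, A_eq=A, b_eq=b - B @ x_trial, bounds=(0, None), method="highs")
    lam = res.eqlin.marginals          # dual multipliers of the equality constraints
    theta = lam @ b                    # theta = <b, lambda>
    beta = -B.T @ lam                  # beta = -B^T lambda
    return res.fun, theta, beta

# Toy instance (hypothetical data): min 2*x s.t. x = b - B*x_prev, x >= 0.
c = np.array([2.0]); A = np.array([[1.0]]); B = np.array([[1.0]]); b = np.array([5.0])
val, theta, beta = last_stage_cut(c, A, B, b, x_trial=np.array([2.0]))
print(val, theta + beta @ np.array([2.0]))  # the cut is tight at the trial point
```

The tightness at the trial point is exactly the LP duality argument used in Example 3.1 below: the dual objective at the optimal multipliers equals the primal optimal value.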
In these expressions, the vector $\theta_{t+1}^k$ has components $\theta_{t+1\,\ell}^i$, $\ell = 1, \ldots, M_{t+1}$, $i = 1, \ldots, k$, arranged in the same order as the components $\mu_{tj}^k(\ell, i)$ of $\mu_{tj}^k$.

Step 4: Do $k \leftarrow k+1$ and go to Step 2.

3.2. Multicut stochastic decomposition with cut selection.
We now describe a variant of MuDA that stores all cut coefficients $\theta_{tj}^i, \beta_{tj}^i$ and trial points $x_{t-1}^i$, but that uses a reduced set of cuts $C_{tj}^i$ to approximate the functions $Q_t(\cdot, \xi_{tj})$ when solving problem (4) in the forward pass and (7) in the backward pass. Let $S_{tj}^k$ be the set of indices of the cuts selected at the end of iteration $k$ to approximate $Q_t(\cdot, \xi_{tj})$. At the end of the backward pass of iteration $k$, the variant of MuDA with cut selection computes approximations $\mathcal{Q}_t^k$ of $\mathcal{Q}_t$ given by (3), now with $Q_t^k(\cdot, \xi_{tj})$ given by
$$(9)\qquad Q_t^k(x_{t-1}, \xi_{tj}) = \max_{\ell \in S_{tj}^k} C_{tj}^\ell(x_{t-1}),$$
where the set $S_{tj}^k$ is a subset of the set of indices of the cuts that have the largest value for at least one of the trial points computed so far. More precisely, the sets $S_{tj}^k$ are initialized taking $S_{tj}^0$ to contain only the index of the initial lower bounding cut from Step 1. For $t \in \{2, \ldots, T\}$ and $k \ge 1$, the sets $S_{tj}^k$ are computed as follows. For $i = 1, \ldots, k$, $t = 2, \ldots, T$, $j = 1, \ldots, M_t$, let $I_{tj}^{ik}$ be the set of cuts for $Q_t(\cdot, \xi_{tj})$ computed at iteration $k$ or before that have the largest value at $x_{t-1}^i$:
$$(10)\qquad I_{tj}^{ik} = \arg\max_{\ell = 1, \ldots, k} C_{tj}^\ell(x_{t-1}^i),$$
where the cut indices in $I_{tj}^{ik}$ are sorted in ascending order. With a slight abuse of notation, we will denote the $\ell$-th smallest element in $I_{tj}^{ik}$ by $I_{tj}^{ik}(\ell)$. For instance, if $I_{tj}^{ik} = \{2, 30, 50\}$ then $I_{tj}^{ik}(1) = 2$, $I_{tj}^{ik}(2) = 30$, and $I_{tj}^{ik}(3) = 50$. A cut selection strategy is given by a set of selectors $\mathcal{S}_{tj}(m)$, $t = 2, \ldots, T$, $j = 1, \ldots, M_t$, $m = 1, 2, \ldots$, where $\mathcal{S}_{tj}(m)$ is a subset of the set of the first $m$ integers $\{1, 2, \ldots, m\}$, giving the indices of the cuts to select in $I_{tj}^{ik}$ through the relation
$$S_{tj}^k = \bigcup_{i=1}^{k} \big\{ I_{tj}^{ik}(\ell) : \ell \in \mathcal{S}_{tj}(|I_{tj}^{ik}|) \big\},$$
where $|I_{tj}^{ik}|$ is the cardinality of the set $I_{tj}^{ik}$. We require the selectors to satisfy the following assumption:

(H3) For $t = 2, \ldots, T$, $j = 1, \ldots, M_t$, for every $m \ge 1$, $\mathcal{S}_{tj}(m) \subseteq \mathcal{S}_{tj}(m+1)$.

The Level 1 and Limited Memory Level 1 cut selection strategies, described in Examples 3.1 and 3.2, respectively correspond to the least and most economic selectors satisfying (H3):

Example 3.1 (Level 1 and Territory Algorithm). The strategy $\mathcal{S}_{tj}(m) = \{1, 2, \ldots, m\}$ selects all cuts that have the highest value for at least one trial point. In the context of SDDP, this strategy was called Level 1 in [28] and the Territory Algorithm in [26]. For this strategy, we have $S_{Tj}^k = \{1, \ldots, k\}$ for all $j$ and $k \ge 1$, meaning that no cut selection is needed for the last stage $T$. This comes from the fact that for all $k \ge 1$ and $1 \le k_0 \le k$, cut $C_{Tj}^{k_0}$ is selected because it is one of the cuts with the highest value at $x_{T-1}^{k_0}$.
Indeed, for any $1 \le k_1 \le k$ with $k_1 \ne k_0$, since $\lambda_{Tj}^{k_1}$ is feasible for the dual of problem (6) with $x_{T-1}^k$ replaced by $x_{T-1}^{k_0}$, we get
$$C_{Tj}^{k_0}(x_{T-1}^{k_0}) = Q_T(x_{T-1}^{k_0}, \xi_{Tj}) \ge \langle \lambda_{Tj}^{k_1}, b_{Tj} - B_{Tj} x_{T-1}^{k_0} \rangle = C_{Tj}^{k_1}(x_{T-1}^{k_0}).$$

Example 3.2 (Limited Memory Level 1). The strategy that eliminates the largest number of cuts, called Limited Memory Level 1 (LML 1 for short), consists in taking a singleton for every set $\mathcal{S}_{tj}(m)$. For (H3) to be satisfied, this implies $\mathcal{S}_{tj}(m) = \{1\}$. This choice corresponds to the Limited Memory Level 1 cut selection introduced in [16] in the context of DDP. For that particular choice, at a given point, among the cuts that have the highest value only the oldest one (i.e., the cut that was computed first among the cuts that have the highest value at that point) is selected.

Remark 3.1.
Observe that our selectors select cuts for the functions $Q_t(\cdot, \xi_{tj})$, whereas Level 1 (resp. LML 1) was introduced to select cuts for the functions $\mathcal{Q}_t$ for SDDP (resp. DDP). Therefore we could have used the terminologies Multicut Level 1 and Multicut Limited Memory Level 1 instead of Level 1 and Limited Memory Level 1. We did not do so to simplify; to know to which functions the cut selection strategies apply, it suffices to add the name of the decomposition method, for instance MuDA with Level 1 cut selection (in which case cut selection applies to the functions $Q_t(\cdot, \xi_{tj})$, as explained above) or SDDP with Limited Memory Level 1 cut selection (in which case cut selection applies to the functions $\mathcal{Q}_t$).

Level 1:

    $I_{tj}^k = \{k\}$, $m_{tj}^k = C_{tj}^k(x_{t-1}^k)$.
    For $\ell = 1, \ldots, k-1$
        If $C_{tj}^k(x_{t-1}^\ell) > m_{tj}^\ell$
            $I_{tj}^\ell = \{k\}$, $m_{tj}^\ell = C_{tj}^k(x_{t-1}^\ell)$
        Else if $C_{tj}^k(x_{t-1}^\ell) = m_{tj}^\ell$
            $I_{tj}^\ell = I_{tj}^\ell \cup \{k\}$
        End If
        If $C_{tj}^\ell(x_{t-1}^k) > m_{tj}^k$
            $I_{tj}^k = \{\ell\}$, $m_{tj}^k = C_{tj}^\ell(x_{t-1}^k)$
        Else if $C_{tj}^\ell(x_{t-1}^k) = m_{tj}^k$
            $I_{tj}^k = I_{tj}^k \cup \{\ell\}$
        End If
    End For
    For $\ell = 1, \ldots, k$, Selected[$\ell$] = False End For
    For $\ell = 1, \ldots, k$
        For $m = 1, \ldots, |I_{tj}^\ell|$
            Selected[$I_{tj}^\ell[m]$] = True
        End For
    End For
    $S_{tj}^k = \emptyset$
    For $\ell = 1, \ldots, k$
        If Selected[$\ell$] = True, $S_{tj}^k = S_{tj}^k \cup \{\ell\}$ End If
    End For

Limited Memory Level 1:

    $I_{tj}^k = \{1\}$, $m_{tj}^k = C_{tj}^1(x_{t-1}^k)$.
    For $\ell = 1, \ldots, k-1$
        If $C_{tj}^k(x_{t-1}^\ell) > m_{tj}^\ell$
            $I_{tj}^\ell = \{k\}$, $m_{tj}^\ell = C_{tj}^k(x_{t-1}^\ell)$
        End If
        If $C_{tj}^{\ell+1}(x_{t-1}^k) > m_{tj}^k$
            $I_{tj}^k = \{\ell+1\}$, $m_{tj}^k = C_{tj}^{\ell+1}(x_{t-1}^k)$
        End If
    End For
    For $\ell = 1, \ldots, k$, Selected[$\ell$] = False End For
    For $\ell = 1, \ldots, k$
        For $m = 1, \ldots, |I_{tj}^\ell|$
            Selected[$I_{tj}^\ell[m]$] = True
        End For
    End For
    $S_{tj}^k = \emptyset$
    For $\ell = 1, \ldots, k$
        If Selected[$\ell$] = True, $S_{tj}^k = S_{tj}^k \cup \{\ell\}$ End If
    End For
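The Limited Memory Level 1 pseudo-code above can be transcribed compactly: for each stored trial point, keep only the oldest cut attaining the maximum there. A minimal Python sketch (the function name and data layout are hypothetical, and for simplicity all trial points are reprocessed rather than updated incrementally as in the pseudo-code):

```python
def lml1_select(cuts, trial_points):
    """cuts: list of (theta, beta) pairs in the order they were computed;
    trial_points: list of one-dimensional trial values x. Returns the sorted
    indices kept by Limited Memory Level 1: for each trial point, only the
    oldest cut with the highest value there is retained."""
    selected = set()
    for x in trial_points:
        values = [theta + beta * x for theta, beta in cuts]
        # list.index returns the FIRST (i.e., oldest) index attaining the max
        selected.add(values.index(max(values)))
    return sorted(selected)

# Three cuts in dimension one (illustrative numbers): cuts 0 and 1 coincide.
cuts = [(0.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
print(lml1_select(cuts, trial_points=[2.0, 0.0]))  # [0, 2]: duplicate cut 1 is pruned
```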
Figure 1. Pseudo-codes for the computation of the set $S_{tj}^k$ for fixed $t \in \{2, \ldots, T\}$, $k \ge 1$, $j = 1, \ldots, M_t$, and two cut selection strategies.

The computation of $S_{tj}^k$, i.e., of the cut indices to select at iteration $k$, is performed in the backward pass (immediately after computing cut $C_{tj}^k$) using the pseudo-code given in Figure 1 for the Level 1 and LML 1 cut selection strategies, respectively. In this pseudo-code, we use the notation $I_{tj}^i$ in place of $I_{tj}^{ik}$. We also store in the variable $m_{tj}^i$ the current value of the highest cut for $Q_t(\cdot, \xi_{tj})$ at $x_{t-1}^i$. At the end of the first iteration, we initialize $m_{tj}^1 = C_{tj}^1(x_{t-1}^1)$. After cut $C_{tj}^k$ is computed at iteration $k \ge 2$, these variables are updated using the relations
$$\begin{cases} m_{tj}^i \leftarrow \max(m_{tj}^i, C_{tj}^k(x_{t-1}^i)),\; i = 1, \ldots, k-1, \\ m_{tj}^k \leftarrow \max(C_{tj}^\ell(x_{t-1}^k),\; \ell = 1, \ldots, k). \end{cases}$$
We also use an array of Booleans called Selected, filled using the information given by the variables $I_{tj}^i$, whose $\ell$-th entry is True if cut $\ell$ is selected for $Q_t(\cdot, \xi_{tj})$ and False otherwise. This allows us to avoid copies of cut indices that may appear both in $I_{tj}^{i_1 k}$ and $I_{tj}^{i_2 k}$ with $i_1 \ne i_2$.

4. Convergence analysis
In this section, we prove that CuSMuDA converges in a finite number of iterations. We will make the following assumption:

(H4) The samples in the forward passes are independent: $(\tilde\xi_2^k, \ldots, \tilde\xi_T^k)$ is a realization of $\xi^k = (\xi_2^k, \ldots, \xi_T^k) \sim (\xi_2, \ldots, \xi_T)$ and $\xi^1, \xi^2, \ldots$ are independent.

The convergence proof is based on the following lemma:

Lemma 1. Assume that all subproblems in the forward and backward passes of CuSMuDA are solved using an algorithm that necessarily outputs an extreme point of the feasible set (for instance the simplex algorithm). Let Assumptions (H1), (H2), (H3), and (H4) hold. Then almost surely there exists $k_0 \ge 1$ such that for every $k \ge k_0$, $t = 2, \ldots, T$, $j = 1, \ldots, M_t$, we have
$$(11)\qquad Q_t^k(\cdot, \xi_{tj}) = Q_t^{k_0}(\cdot, \xi_{tj}) \quad \text{and} \quad \mathcal{Q}_t^k = \mathcal{Q}_t^{k_0}.$$

Proof.
Let $\Omega_1$ be the event, on the sample space $\Omega$ of sequences of forward scenarios, such that every scenario is sampled an infinite number of times. By Assumption (H4), this event $\Omega_1$ has probability one. Consider a realization $\omega \in \Omega_1$ of CuSMuDA corresponding to realizations $(\tilde\xi^k)_k$ of the sampled scenarios $(\xi^k)_k$ in the forward pass. To simplify, we will drop $\omega$ from the notation: for instance, we will simply write $x_t^k, Q_t^k$ for the realizations $x_t^k(\omega)$ and $Q_t^k(\cdot)(\omega)$ of $x_t^k, Q_t^k$ given the realization $\omega \in \Omega_1$ of CuSMuDA.

We show by induction on $t$ that the number of different cuts computed by the algorithm is finite and that after some iteration $k_t$ the same cuts are selected for the functions $Q_t(\cdot, \xi_{tj})$. Our induction hypothesis $\mathcal{H}(t)$ for $t \in \{2, \ldots, T\}$ is that the sets $\{(\theta_{tj}^k, \beta_{tj}^k) : k \in \mathbb{N}\}$, $j = 1, \ldots, M_t$, are finite and there exists some finite $k_t$ such that for every $k > k_t$ we have
$$(12)\qquad \{(\theta_{tj}^\ell, \beta_{tj}^\ell) : \ell \in S_{tj}^k\} = \{(\theta_{tj}^\ell, \beta_{tj}^\ell) : \ell \in S_{tj}^{k_t}\}$$
and $x_{t-1}^k$ is a copy of a trial point $x_{t-1}^\ell$ with $\ell \le k_t$, for every $j = 1, \ldots, M_t$. We will denote by $\overline{I}_{tj}^{ik}$ the set $\{I_{tj}^{ik}(\ell) : \ell \in \mathcal{S}_{tj}(|I_{tj}^{ik}|)\}$. We first show, in items a. and b. below, that $\mathcal{H}(T)$ holds.

a. Observe that $\lambda_{Tj}^k$ defined in the backward pass of CuSMuDA is an extreme point of the polyhedron $\{\lambda : A_{Tj}^\top \lambda \le c_{Tj}\}$. This polyhedron is a finite intersection of closed half-spaces in finite dimension (since $A_{Tj}$ has a finite number of rows and columns) and therefore has a finite number of extreme points. It follows that $\lambda_{Tj}^k$ can only take a finite number of values, and the same holds for $(\theta_{Tj}^k, \beta_{Tj}^k) = (\langle \lambda_{Tj}^k, b_{Tj} \rangle, -B_{Tj}^\top \lambda_{Tj}^k)$; hence there exists $\bar k_T$ such that for every $k > \bar k_T$ and every $j$, each cut $C_{Tj}^k$ is a copy of a cut $C_{Tj}^{k'}$ with $1 \le k' \le \bar k_T$ (no new cut is computed for the functions $Q_T(\cdot, \xi_{Tj})$ for $k > \bar k_T$).

Now recall that $x_{T-1}^k$ computed in the forward pass is a solution of (4) with $t = T-1$, rewritten as a linear program by adding variables $f_1, f_2, \ldots, f_{M_T}$, replacing $\mathcal{Q}_T^{k-1}(x_{T-1})$ in the objective by $\sum_{\ell=1}^{M_T} p_{T\ell} f_\ell$, and adding the linear constraints $f_\ell \ge \theta_{T\ell}^i + \langle \beta_{T\ell}^i, x_{T-1} \rangle$, $i = 1, \ldots, k-1$, $\ell = 1, \ldots, M_T$. On top of that, for iterations $k > \bar k_T$, since the functions $Q_T^k(\cdot, \xi_{Tj})$ are made of a collection of cuts taken from the finite and fixed set of cuts $C_{Tj}^\ell$, $\ell \le \bar k_T$, the set of possible functions $(Q_T^k(\cdot, \xi_{Tj}))_{k \ge 1}$, and therefore of possible functions $(\mathcal{Q}_T^k)_{k \ge 1}$, is finite. It follows that there is a finite set of possible polyhedra for the feasible set of (4) (with $t = T-1$) rewritten as a linear program as we have just explained. Since these polyhedra have a finite number of extreme points (recall that there is a finite number of different linear constraints), there is only a finite number of possible trial points $(x_{T-1}^k)_{k \ge 1}$. Therefore we can assume without loss of generality that $\bar k_T$ is such that for iterations $k > \bar k_T$, every trial point $x_{T-1}^k$ is also a copy of a trial point $x_{T-1}^{k'}$ with $k' \le \bar k_T$.

b. We show that for every $i \ge 1$ and $j = 1, \ldots, M_T$, there exist $1 \le i' \le \bar k_T$ and $k_{Tj}(i') \ge \bar k_T$ such that for every $k \ge \max(i, k_{Tj}(i'))$ we have
$$(13)\qquad \{(\theta_{Tj}^\ell, \beta_{Tj}^\ell) : \ell \in \overline{I}_{Tj}^{ik}\} = \{(\theta_{Tj}^\ell, \beta_{Tj}^\ell) : \ell \in \overline{I}_{Tj}^{i'\,k_{Tj}(i')}\},$$
which will show $\mathcal{H}(T)$ with $k_T = \max_{1 \le j \le M_T,\, 1 \le i' \le \bar k_T} k_{Tj}(i')$. Let us show that (13) indeed holds. Take $i \ge 1$ and $j \in \{1, \ldots, M_T\}$. If $1 \le i \le \bar k_T$, define $i' = i$. Otherwise, due to a., we can find $1 \le i' \le \bar k_T$ such that $x_{T-1}^i = x_{T-1}^{i'}$, which implies $I_{Tj}^{ik} = I_{Tj}^{i'k}$ for every $k \ge i$. Now consider the sequence of sets $(I_{Tj}^{i'k})_{k \ge \bar k_T}$. Due to the definition of $\bar k_T$, the sequence $(|I_{Tj}^{i'k}|)_{k > \bar k_T}$ is nondecreasing, and therefore two cases can happen:

(A) There exists $k_{Tj}(i')$ such that for $k \ge k_{Tj}(i')$ we have $I_{Tj}^{i'k} = I_{Tj}^{i'\,k_{Tj}(i')}$ (the cuts computed after iteration $k_{Tj}(i')$ are not active at $x_{T-1}^{i'}$). In this case $\overline{I}_{Tj}^{i'k} = \overline{I}_{Tj}^{i'\,k_{Tj}(i')}$ for $k \ge k_{Tj}(i')$, $\overline{I}_{Tj}^{ik} = \overline{I}_{Tj}^{i'k} = \overline{I}_{Tj}^{i'\,k_{Tj}(i')}$ for $k \ge \max(k_{Tj}(i'), i)$, and (13) holds.

(B) The sequence $(|I_{Tj}^{i'k}|)_{k \ge \bar k_T}$ is unbounded. Due to Assumption (H3), the sequence $(|\mathcal{S}_{Tj}(m)|)_m$ is nondecreasing. If there exists $k_{Tj} > \bar k_T$ such that $\mathcal{S}_{Tj}(k) = \mathcal{S}_{Tj}(k_{Tj})$ for $k \ge k_{Tj}$, then, taking $k_{Tj}(i')$ to be the smallest $k$ such that $|I_{Tj}^{i'k}| \ge k_{Tj}$, for every $k \ge k_{Tj}(i')$ we have $\overline{I}_{Tj}^{i'k} = \overline{I}_{Tj}^{i'\,k_{Tj}(i')}$, and for $k \ge \max(k_{Tj}(i'), i)$ we deduce $\overline{I}_{Tj}^{ik} = \overline{I}_{Tj}^{i'\,k_{Tj}(i')}$ and (13) holds. Otherwise the sequence $(|\mathcal{S}_{Tj}(m)|)_m$ is unbounded and an infinite number of cut indices are selected from the sets $(I_{Tj}^{i'k})_{k \ge i'}$ to make up the sets $(\overline{I}_{Tj}^{i'k})_{k \ge i'}$. However, since there is a finite number of different cuts, there is only a finite number of iterations where a new cut can be selected from $I_{Tj}^{i'k}$, and therefore there exists $k_{Tj}(i')$ such that (13) holds for $k \ge \max(k_{Tj}(i'), i)$.

c. Now assume that $\mathcal{H}(t+1)$ holds for some $t \in \{2, \ldots, T-1\}$. We want to show $\mathcal{H}(t)$. Consider the set $D_{tj}^k$ of points of the form
$$(14)\qquad \Big( \langle \lambda, b_{tj} \rangle + \sum_{\ell=1}^{M_{t+1}} \sum_{i \in S_{t+1\,\ell}^k} \mu_\ell^i \theta_{t+1\,\ell}^i,\; -B_{tj}^\top \lambda \Big)$$
where $(\lambda, \mu)$ is an extreme point of the set $P_{tj}^k$ of points $(\lambda, \mu)$ satisfying
$$(15)\qquad \mu \ge 0,\quad p_{t+1\,\ell} = \sum_{i \in S_{t+1\,\ell}^k} \mu_\ell^i,\; \ell = 1, \ldots, M_{t+1},\quad A_{tj}^\top \lambda + \sum_{\ell=1}^{M_{t+1}} \sum_{i \in S_{t+1\,\ell}^k} \mu_\ell^i \beta_{t+1\,\ell}^i \le c_{tj}.$$
We claim that for every $k > k_{t+1}$, every point from $D_{tj}^k$ can be written as a point from $D_{tj}^{k_{t+1}}$, i.e., a point of the form (14) with $k$ replaced by $k_{t+1}$ and $(\lambda, \mu)$ an extreme point of the set $P_{tj}^{k_{t+1}}$. Indeed, take a point from $D_{tj}^k$, i.e., a point of the form (14) with $(\lambda, \mu)$ an extreme point of $P_{tj}^k$ and $k > k_{t+1}$. It can be written as a point from $D_{tj}^{k_{t+1}}$, i.e., a point of the form (14) with $k$ replaced by $k_{t+1}$ and $(\lambda, \hat\mu)$ in place of $(\lambda, \mu)$, where $(\lambda, \hat\mu)$ is an extreme point of $P_{tj}^{k_{t+1}}$ obtained by replacing the basic columns $\beta_{t+1\,\ell}^i$ with $i \in S_{t+1\,\ell}^k$, $i \notin S_{t+1\,\ell}^{k_{t+1}}$, associated with $\mu$, by columns $\beta_{t+1\,\ell}^{i'}$ with $i' \in S_{t+1\,\ell}^{k_{t+1}}$ such that $(\theta_{t+1\,\ell}^i, \beta_{t+1\,\ell}^i) = (\theta_{t+1\,\ell}^{i'}, \beta_{t+1\,\ell}^{i'})$ (this is possible due to $\mathcal{H}(t+1)$).

Since $P_{tj}^{k_{t+1}}$ has a finite number of extreme points, the set $D_{tj}^k$ has finite cardinality, and recalling that for CuSMuDA $(\theta_{tj}^k, \beta_{tj}^k) \in D_{tj}^k$, the cut coefficients $(\theta_{tj}^k, \beta_{tj}^k)$ can only take a finite number of values. This shows the first part of $\mathcal{H}(t)$. Therefore, there exists $\bar k_t$ such that for every $k > \bar k_t$ and every $j$, each cut $C_{tj}^k$ is a copy of a cut $C_{tj}^{k'}$ with $1 \le k' \le \bar k_t$ (no new cut is computed for the functions $Q_t(\cdot, \xi_{tj})$ for $k > \bar k_t$). As for the induction step $t = T$, this clearly implies that after some iteration no new trial points are computed, and therefore we can assume without loss of generality that $\bar k_t$ is such that for $k > \bar k_t$, trial point $x_{t-1}^k$ is a copy of some $x_{t-1}^\ell$ with $1 \le \ell \le \bar k_t$.

Finally, we can show (12) proceeding as in b., replacing $T$ by $t$. This achieves the proof of $\mathcal{H}(t)$. Gathering our observations, we have shown that (11) holds with $k_0 = \max_{t=2,\ldots,T} k_t$. $\square$

Remark 4.1.
For the Level 1 and LML 1 cut selection strategies, corresponding to selectors $\mathcal{S}_{tj}$ satisfying respectively $\mathcal{S}_{tj}(m) = \{1, \ldots, m\}$ and $\mathcal{S}_{tj}(m) = \{1\}$, the integers $k_0, \bar k_t$ defined in Lemma 1 and its proof satisfy $k_0 \le \max_{t=2,\ldots,T} \bar k_t$. For other selectors $\mathcal{S}_{tj}$ this relation is not necessarily satisfied (see b.-(B) of the proof of the lemma).

Theorem 1.
Let Assumptions (H1), (H2), (H3), and (H4) hold. Assume that all subproblems in the forward and backward passes of CuSMuDA are solved using an algorithm that necessarily outputs an extreme point of the feasible set (for instance the simplex algorithm). Then Algorithm CuSMuDA converges with probability one in a finite number of iterations to a policy which is an optimal solution of (1)-(2).

Proof. Let $\Omega_1$ be defined as in the proof of Lemma 1 and let $\Omega_2$ be the event on which the integer $k_0$ defined in Lemma 1 is finite. Note that $\Omega_1 \cap \Omega_2$ has probability 1. Consider a realization of CuSMuDA in $\Omega_1 \cap \Omega_2$ corresponding to realizations $(\tilde{\xi}_T^k)_k$ of $(\xi_T^k)_k$ in the forward pass, and let $(x_1^*, x_2^*(\cdot), \ldots, x_T^*(\cdot))$ be the policy obtained from iteration $k_0$ on, which uses the recourse functions $\mathcal{Q}_{t+1}^{k_0}$ instead of $\mathcal{Q}_{t+1}$. Recall that the policy $(x_1^*, x_2^*(\cdot), \ldots, x_T^*(\cdot))$ is optimal if for every realization $\tilde{\xi}_T := (\xi_1, \tilde{\xi}_2, \ldots, \tilde{\xi}_T)$ of $(\xi_1, \xi_2, \ldots, \xi_T)$, the decision $x_t^*(\xi_1, \tilde{\xi}_2, \ldots, \tilde{\xi}_t)$ solves

(16)  $Q_t(x_{t-1}^*(\tilde{\xi}_{t-1}), \tilde{\xi}_t) = \inf_{x_t} \{\langle \tilde{c}_t, x_t \rangle + \mathcal{Q}_{t+1}(x_t) : x_t \in X_t(x_{t-1}^*(\tilde{\xi}_{t-1}), \tilde{\xi}_t)\}$

for every $t = 1, \ldots, T$, with the convention that $x_0^* = x_0$. We prove, for $t = 1, \ldots, T$:

$\mathcal{H}(t)$: for every $k \ge k_0$ and every sample $\tilde{\xi}_t = (\xi_1, \tilde{\xi}_2, \ldots, \tilde{\xi}_t)$ of $(\xi_1, \xi_2, \ldots, \xi_t)$, we have $Q_t^k(x_{t-1}^*(\tilde{\xi}_{t-1}), \tilde{\xi}_t) = Q_t(x_{t-1}^*(\tilde{\xi}_{t-1}), \tilde{\xi}_t)$.

We show $\mathcal{H}(1), \ldots, \mathcal{H}(T)$ by backward induction. $\mathcal{H}(T)$ holds since $Q_T^k = Q_T$ for every $k$. Now assume that $\mathcal{H}(t+1)$ holds for some $t \in \{1, \ldots, T-1\}$; we want to show $\mathcal{H}(t)$. Take an arbitrary $k \ge k_0$ and a sample $\tilde{\xi}_{t-1} = (\xi_1, \tilde{\xi}_2, \ldots, \tilde{\xi}_{t-1})$ of $(\xi_1, \ldots, \xi_{t-1})$. We have, for every $j = 1, \ldots, M_t$, that

(17)  $Q_t^k(x_{t-1}^*(\tilde{\xi}_{t-1}), \xi_{tj}) = \inf\{\langle c_{tj}, x_t \rangle + \mathcal{Q}_{t+1}^k(x_t) : x_t \in X_t(x_{t-1}^*(\tilde{\xi}_{t-1}), \xi_{tj})\} = \langle c_{tj}, x_t^*(\tilde{\xi}_{t-1}, \xi_{tj}) \rangle + \mathcal{Q}_{t+1}^k(x_t^*(\tilde{\xi}_{t-1}, \xi_{tj}))$.

Now we check that for $j = 1, \ldots, M_t$ we have

(18)  $\mathcal{Q}_{t+1}^k(x_t^*(\tilde{\xi}_{t-1}, \xi_{tj})) = \mathcal{Q}_{t+1}(x_t^*(\tilde{\xi}_{t-1}, \xi_{tj}))$.

Indeed, if this relation did not hold then, since $\mathcal{Q}_{t+1}^k \le \mathcal{Q}_{t+1}$, we would have

$\mathcal{Q}_{t+1}^{k_0}(x_t^*(\tilde{\xi}_{t-1}, \xi_{tj})) = \mathcal{Q}_{t+1}^{k}(x_t^*(\tilde{\xi}_{t-1}, \xi_{tj})) < \mathcal{Q}_{t+1}(x_t^*(\tilde{\xi}_{t-1}, \xi_{tj}))$.

From the definitions of $\mathcal{Q}_{t+1}^{k_0}$ and $\mathcal{Q}_{t+1}$, there exists $m \in \{1, \ldots, M_{t+1}\}$ such that

$Q_{t+1}^{k_0}(x_t^*(\tilde{\xi}_{t-1}, \xi_{tj}), \xi_{t+1\,m}) < Q_{t+1}(x_t^*(\tilde{\xi}_{t-1}, \xi_{tj}), \xi_{t+1\,m})$.

Since the realization of CuSMuDA is in $\Omega_1$, there exists an infinite set of iterations at which the sampled scenario for stages $1, \ldots, t$ is $(\tilde{\xi}_{t-1}, \xi_{tj})$. Let $\ell$ be one of these iterations strictly greater than $k_0$. Using $\mathcal{H}(t+1)$, we have that $Q_{t+1}^{\ell}(x_t^*(\tilde{\xi}_{t-1}, \xi_{tj}), \xi_{t+1\,m}) = Q_{t+1}(x_t^*(\tilde{\xi}_{t-1}, \xi_{tj}), \xi_{t+1\,m})$, which yields

$Q_{t+1}^{k_0}(x_t^*(\tilde{\xi}_{t-1}, \xi_{tj}), \xi_{t+1\,m}) < Q_{t+1}^{\ell}(x_t^*(\tilde{\xi}_{t-1}, \xi_{tj}), \xi_{t+1\,m})$,

and at iteration $\ell > k_0$ we would construct a cut for $Q_{t+1}(\cdot, \xi_{t+1\,m})$ at $x_t^*(\tilde{\xi}_{t-1}, \xi_{tj}) = x_t^{\ell}$ with value $Q_{t+1}^{\ell}(x_t^*(\tilde{\xi}_{t-1}, \xi_{tj}), \xi_{t+1\,m})$ strictly larger than the value at this point of all cuts computed up to iteration $k_0$. Due to Lemma 1, this is not possible. Therefore (18) holds, which, plugged into (17), gives

$Q_t^k(x_{t-1}^*(\tilde{\xi}_{t-1}), \xi_{tj}) = \langle c_{tj}, x_t^*(\tilde{\xi}_{t-1}, \xi_{tj}) \rangle + \mathcal{Q}_{t+1}(x_t^*(\tilde{\xi}_{t-1}, \xi_{tj})) \ge Q_t(x_{t-1}^*(\tilde{\xi}_{t-1}), \xi_{tj})$

(recall that $x_t^*(\tilde{\xi}_{t-1}, \xi_{tj}) \in X_t(x_{t-1}^*(\tilde{\xi}_{t-1}), \xi_{tj})$). Since $Q_t^k \le Q_t$, we have shown $Q_t^k(x_{t-1}^*(\tilde{\xi}_{t-1}), \xi_{tj}) = Q_t(x_{t-1}^*(\tilde{\xi}_{t-1}), \xi_{tj})$ for every $j = 1, \ldots, M_t$, which is $\mathcal{H}(t)$.

Therefore we have proved that for $t = 1, \ldots, T$ and every realization $(\tilde{\xi}_{t-1}, \xi_{tj})$ of $\xi_t$, the decision $x_t^*(\tilde{\xi}_{t-1}, \xi_{tj})$ satisfies $\langle c_{tj}, x_t^*(\tilde{\xi}_{t-1}, \xi_{tj}) \rangle + \mathcal{Q}_{t+1}(x_t^*(\tilde{\xi}_{t-1}, \xi_{tj})) = Q_t(x_{t-1}^*(\tilde{\xi}_{t-1}), \xi_{tj})$, meaning that $x_t^*(\tilde{\xi}_{t-1}, \xi_{tj})$ is an optimal solution of (16) written with $\tilde{\xi}_t = (\tilde{\xi}_{t-1}, \xi_{tj})$; this completes the proof. □

Remark 4.2.
The convergence proof above also shows the almost sure convergence in a finite number of iterations of MuDA combined with any cut selection strategy that always selects more cuts than a cut selection strategy satisfying Assumption (H3). It shows in particular the convergence of the Level H cut selection from [28], which keeps the H cuts having the largest values at each trial point. Remark 4.3.
The class of cut selection strategies described in Section 3.2 can be straightforwardly combined with SDDP. The convergence proof of the corresponding variant of SDDP applied to the DP equations (1)-(2) can be obtained by easily adapting the proofs of Lemma 1 and Theorem 1.

5. Application to portfolio selection and inventory management
5.1. Portfolio selection.

5.1.1. Model.
We consider a portfolio selection problem with direct transaction costs over a discretized horizon of $T$ stages. The direct buying and selling transaction costs are proportional to the amount of the transaction ([3], [4]). Let $x_t(i)$ be the dollar value of asset $i = 1, \ldots, n+1$ at the end of stage $t = 1, \ldots, T$, where asset $n+1$ is cash; $\xi_t(i)$ is the return of asset $i$ at $t$; $y_t(i)$ is the amount of asset $i$ sold at the end of $t$; $z_t(i)$ is the amount of asset $i$ bought at the end of $t$; and $\eta_t(i) > 0$ and $\nu_t(i) > 0$ are the respective proportional selling and buying transaction cost rates at $t$. Each component $x_0(i)$, $i = 1, \ldots, n+1$, of the initial portfolio $x_0$ is known. The budget available at the beginning of the investment period is $\sum_{i=1}^{n+1} \xi_1(i) x_0(i)$, and $u(i)$ represents the maximal proportion of money that can be invested in asset $i$.

For $t = 1, \ldots, T$, given a portfolio $x_{t-1} = (x_{t-1}(1), \ldots, x_{t-1}(n), x_{t-1}(n+1))$ and $\xi_t$, we define the set $X_t(x_{t-1}, \xi_t)$ as the set of $(x_t, y_t, z_t) \in \mathbb{R}^{n+1} \times \mathbb{R}^n \times \mathbb{R}^n$ satisfying

(19)  $x_t(n+1) = \xi_t(n+1) x_{t-1}(n+1) + \sum_{i=1}^{n} \big((1 - \eta_t(i)) y_t(i) - (1 + \nu_t(i)) z_t(i)\big)$,

and, for $i = 1, \ldots, n$,

(20a)  $x_t(i) = \xi_t(i) x_{t-1}(i) - y_t(i) + z_t(i)$,
(20b)  $x_t(i) \le u(i) \sum_{j=1}^{n+1} \xi_t(j) x_{t-1}(j)$,
(20c)  $x_t(i) \ge 0, \; y_t(i) \ge 0, \; z_t(i) \ge 0$.

Constraints (19) are the cash flow balance constraints and define how much cash is available at each stage. Constraints (20a) define the amount of security $i$ held at each stage $t$ and take into account the proportional transaction costs. Constraints (20b) prevent the position in security $i$ at time $t$ from exceeding a proportion $u(i)$ of the portfolio value.
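As a concrete illustration, the following sketch builds one stage of this feasible set and maximizes a linear objective (expected end-of-stage wealth) over it with `scipy.optimize.linprog`. All numerical values below (returns, cost rates, expected returns) are toy data assumed for the example, not taken from the paper's instances:

```python
import numpy as np
from scipy.optimize import linprog

# Toy data (assumed): n = 2 risky assets plus cash (asset n+1).
n = 2
x_prev = np.array([1.0, 1.0, 1.0])   # x_{t-1}: dollar values (asset 1, asset 2, cash)
xi = np.array([1.1, 0.9, 1.0])       # xi_t: gross returns at stage t
eta = np.array([0.01, 0.01])         # proportional selling costs eta_t(i)
nu = np.array([0.01, 0.01])          # proportional buying costs nu_t(i)
u = np.array([1.0, 1.0])             # maximal proportions u(i)
mu = np.array([1.05, 1.00, 1.001])   # assumed expected next-stage returns

budget = xi @ x_prev                 # portfolio value after returns, cf. (20b)

# Decision vector: (x_t(1), x_t(2), x_t(3)=cash, y_t(1), y_t(2), z_t(1), z_t(2)).
# Row 0 encodes the cash balance (19); rows 1..n encode the asset balances (20a).
A_eq = np.zeros((1 + n, 3 + 2 * n))
b_eq = np.zeros(1 + n)
A_eq[0, 2] = 1.0
A_eq[0, 3:5] = -(1.0 - eta)          # -(1 - eta_t(i)) y_t(i)
A_eq[0, 5:7] = (1.0 + nu)            # +(1 + nu_t(i)) z_t(i)
b_eq[0] = xi[2] * x_prev[2]
for i in range(n):                   # x_t(i) + y_t(i) - z_t(i) = xi_t(i) x_{t-1}(i)
    A_eq[1 + i, i] = 1.0
    A_eq[1 + i, 3 + i] = 1.0
    A_eq[1 + i, 5 + i] = -1.0
    b_eq[1 + i] = xi[i] * x_prev[i]

# (20b): x_t(i) <= u(i) * budget; (20c) holds via linprog's default bounds >= 0.
A_ub = np.zeros((n, 3 + 2 * n))
A_ub[0, 0] = 1.0
A_ub[1, 1] = 1.0
b_ub = u * budget

# Maximize <mu, x_t>, i.e., minimize -<mu, x_t>.
c = np.concatenate([-mu, np.zeros(2 * n)])
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, method="highs")
print(-res.fun)  # expected terminal wealth of the optimal rebalancing
```

In this toy instance the optimum sells asset 2 and converts all cash into asset 1, whose expected return net of the round-trip transaction costs dominates both alternatives.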
Constraints (20c) prevent short-selling and enforce the non-negativity of the amounts bought and sold.

With this notation, the dynamic programming equations of a risk-neutral portfolio model can be written as follows: for $t = T$, setting $\mathcal{Q}_{T+1}(x_T) = \mathbb{E}\big[\sum_{i=1}^{n+1} \xi_{T+1}(i) x_T(i)\big]$, we solve the problem

(21)  $Q_T(x_{T-1}, \xi_T) = \sup\{\mathcal{Q}_{T+1}(x_T) : (x_T, y_T, z_T) \in X_T(x_{T-1}, \xi_T)\}$,

while at stage $t = T-1, \ldots, 1$ we solve

(22)  $Q_t(x_{t-1}, \xi_t) = \sup\{\mathcal{Q}_{t+1}(x_t) : (x_t, y_t, z_t) \in X_t(x_{t-1}, \xi_t)\}$,

where for $t = 2, \ldots, T$, $\mathcal{Q}_t(x_{t-1}) = \mathbb{E}[Q_t(x_{t-1}, \xi_t)]$. With this model, we maximize the expected return of the portfolio taking into account the transaction costs, non-negativity constraints, and bounds imposed on the different securities.

5.1.2. CuSMuDA for portfolio selection.
We assume that the return process $(\xi_t)$ satisfies Assumption (H1). In this setting, we can solve the portfolio problem under consideration using MuDA and CuSMuDA. For the sake of completeness, we show how to apply MuDA to this problem, including the stopping criterion (CuSMuDA follows, incorporating one of the pseudo-codes from Figure 1). In this implementation, $N$ independent scenarios $\tilde{\xi}^k$, $k = (i-1)N+1, \ldots, iN$, of $(\xi_1, \xi_2, \ldots, \xi_T)$ are sampled in the forward pass of iteration $i$ to obtain $N$ sets of trial points (note that the convergence proof of Theorem 1 still applies for this variant of CuSMuDA). At the end of iteration $i$, we end up with approximate functions $Q_t^i(x_{t-1}, \xi_{tj}) = \max_{1 \le \ell \le iN} \langle \beta_{tj}^{\ell}, x_{t-1} \rangle$ of $Q_t(\cdot, \xi_{tj})$ (observe that for this problem the cuts have no intercept). This portfolio problem was solved using SDDP and SREDA (Stochastic REgularized Decomposition Algorithm) in [19]. It is possible (at the expense of computational time) to incorporate stagewise dependent returns within the decomposition algorithms under consideration, for instance by including in the state vector the relevant history of the returns, as in [21, 14].

Forward pass of iteration $i$. We generate $N$ scenarios $\tilde{\xi}^k = (\tilde{\xi}_1^k, \tilde{\xi}_2^k, \ldots, \tilde{\xi}_T^k)$, $k = (i-1)N+1, \ldots, iN$, of $(\xi_1, \ldots, \xi_T)$ and solve, for $k = (i-1)N+1, \ldots, iN$ and $t = 1, \ldots, T-1$, the problem

$\inf_{x_t, f} \Big\{ \sum_{\ell=1}^{M_{t+1}} p_{t+1\,\ell}\, f_\ell : x_t \in X_t(x_{t-1}^k, \tilde{\xi}_t^k), \; f_\ell \ge \langle \beta_{t+1\,\ell}^{m}, x_t \rangle, \; m = 1, \ldots, (i-1)N, \; \ell = 1, \ldots, M_{t+1} \Big\}$,

starting from $(x_0^k, \tilde{\xi}_1^k) = (x_0, \xi_1)$. Let $x_t^k$ be an optimal solution.

We then sample $S$ independent scenarios of returns and simulate the policy obtained at the end of the forward pass on these scenarios, with corresponding decisions $(\bar{x}_1^k, \ldots, \bar{x}_T^k)$ on scenario $k = 1, \ldots, S$. We compute the empirical mean $\mathrm{Cost}_i$ and standard deviation $\sigma_i$ of the cost on these sampled scenarios:

(23)  $\mathrm{Cost}_i = -\frac{1}{S} \sum_{k=1}^{S} \langle \mathbb{E}[\xi_{T+1}], \bar{x}_T^k \rangle, \qquad \sigma_i = \sqrt{\frac{1}{S} \sum_{k=1}^{S} \big({-\langle \mathbb{E}[\xi_{T+1}], \bar{x}_T^k \rangle} - \mathrm{Cost}_i\big)^2}$.

This allows us to compute the upper end $z^i_{\sup}$ of a one-sided confidence interval on the mean cost of the policy obtained at iteration $i$, given by

(24)  $z^i_{\sup} = \mathrm{Cost}_i + \frac{\sigma_i}{\sqrt{S}} \Phi^{-1}(1-\alpha)$,

where $\Phi^{-1}(1-\alpha)$ is the $(1-\alpha)$-quantile of the standard Gaussian distribution.

Backward pass of iteration $i$. For $k = (i-1)N+1, \ldots, iN$, we solve for $t = T$ and $j = 1, \ldots, M_T$

(25)  $\inf\{-\langle x_T, \mathbb{E}[\xi_{T+1}] \rangle : x_T \in X_T(x_{T-1}^k, \xi_{Tj})\}$,

and for $t = T-1, \ldots, 2$ and $j = 1, \ldots, M_t$

(26)  $\inf_{x_t, f} \Big\{ \sum_{\ell=1}^{M_{t+1}} p_{t+1\,\ell}\, f_\ell : x_t \in X_t(x_{t-1}^k, \xi_{tj}), \; f_\ell \ge \langle \beta_{t+1\,\ell}^{m}, x_t \rangle, \; m = 1, \ldots, iN, \; \ell = 1, \ldots, M_{t+1} \Big\}$.

For the stage $t$ problem above with realization $\xi_{tj}$ of $\xi_t$ (problem (25) for $t = T$ and (26) for $t < T$), let $\lambda_{tj}^k$ be the optimal Lagrange multipliers associated to the equality constraints and let $\mu_{tj}^k \ge 0$ be the optimal Lagrange multipliers associated to the constraints $x_t(i) \le u(i) \sum_{\ell=1}^{n+1} \xi_{tj}(\ell) x_{t-1}(\ell)$.
We compute $\beta_{tj}^k = \big(\lambda_{tj}^k - \langle u, \mu_{tj}^k \rangle e\big) \circ \xi_{tj}$, where $e$ is a vector of ones in $\mathbb{R}^{n+1}$ and where, for vectors $x, y$, the vector $x \circ y$ has components $(x \circ y)(i) = x(i) y(i)$.

Stopping criterion (see [32]). At the end of the backward pass of iteration $i$, we solve

$\inf_{x_1, f} \Big\{ \sum_{\ell=1}^{M_2} p_{2\ell}\, f_\ell : x_1 \in X_1(x_0, \xi_1), \; f_\ell \ge \langle \beta_{2\ell}^{m}, x_1 \rangle, \; m = 1, \ldots, iN, \; \ell = 1, \ldots, M_2 \Big\}$,

whose optimal value provides a lower bound $z^i_{\inf}$ on the optimal value $\mathcal{Q}_1(x_0)$ of the problem. (We use minimization instead of maximization subproblems; in this context, the optimal mean income is the opposite of the optimal value of the first stage problem.) Given a tolerance $\varepsilon > 0$, the algorithm stops either when $z^i_{\inf} = 0$ and $z^i_{\sup} \le \varepsilon$ or when

(27)  $\big| z^i_{\sup} - z^i_{\inf} \big| \le \varepsilon \max(1, |z^i_{\sup}|)$.

In the expression above, we use $\varepsilon \max(1, |z^i_{\sup}|)$ instead of $\varepsilon |z^i_{\sup}|$ in the right-hand side to account for the case $z^i_{\sup} = 0$.

5.1.3. Numerical results.
Problem data. We compare six methods to solve the portfolio problem presented in Section 5.1.1: MuDA with sampling as just described (denoted by MuDA for short), SDDP, CuSMuDA and SDDP with Level 1 cut selection (denoted by CuSMuDA CS 1 and SDDP CS 1, respectively), and CuSMuDA and SDDP with Limited Memory Level 1 cut selection (denoted by CuSMuDA CS 2 and SDDP CS 2). The implementation was done in Matlab and run on a laptop with an Intel(R) Core(TM) i7-4510U CPU @ 2.00GHz. All subproblems in the forward and backward passes were solved numerically using the Mosek Optimization Toolbox [1].

We fix $u(i) = 1$, $i = 1, \ldots, n$, while $x_0$ has components uniformly distributed in $[0, \,\cdot\,]$, with confidence level $\alpha$ in (24) and $\varepsilon = 0.1$ in (27), and we test two values of $N$, namely $N = 1$ and $N = 200$. Below, we generate various instances of the portfolio problem as follows. For fixed $T$ (the number of stages, days in our experiments) and $n$ (the number of risky assets), the distributions of $\xi_t(1{:}n)$ have $M_t = M$ realizations with $p_{ti} = \mathbb{P}(\xi_t = \xi_{ti}) = 1/M$, and $\xi_1(1{:}n), \xi_{t1}(1{:}n), \ldots, \xi_{tM}(1{:}n)$ are chosen randomly among historical data of daily returns of $n$ of the assets of the S&P 500 index for the period 18/5/2009-28/5/2015. These $n$ assets correspond to the first $n$ stocks listed in our matrix of stock prices downloaded from Wharton Research Data Services (WRDS: https://wrds-web.wharton.upenn.edu/wrds/ ). The daily return $\xi_t(n+1)$ of the risk-free asset is 0.1% for all $t$. Transaction costs are assumed to be known, with $\nu_t(i) = \eta_t(i)$ obtained by sampling from the distribution of the random variable $0.08 + 0.06 \cos(\pi U_T / T)$, where $U_T$ is a random variable with a discrete distribution over the set of integers $\{1, 2, \ldots, T\}$.

Results. The computational time and number of iterations required for solving 22 instances of the portfolio problem for several values of the parameters $(M, T, n, N, S)$ are given in Tables 1 and 2. On all instances except Instance 14 (with $M = 2$, $T = 6$, $n = 10$), MuDA is much slower than SDDP, i.e., it needs much more time to satisfy the stopping criterion. On most instances, both variants with cut selection are quicker than their counterpart without cut selection (on 17 instances out of 22 for SDDP and 12 out of 22 for MuDA), and LML 1 is more efficient than Level 1 (on 18 out of 22 instances for SDDP and 19 out of 22 instances for MuDA). More precisely, both for SDDP and MuDA, we observe three different patterns:

(P1) Instances where the variants with cut selection require comparable computational bulk and are quicker than their counterpart without cut selection. There are 9 of these instances for SDDP (Instances 1, 4, 7, 8, 9, 13, 15, 16, and 17) and 8 of these instances for MuDA (Instances 1, 2, 4, 6, 13, 14, 15, and 16).

(P2) Instances where one variant with cut selection (in general LML 1) is much quicker than the other one, both of them being quicker than their counterpart without cut selection. There are 8 of these instances for SDDP (Instances 2, 5, 6, 10, 11, 12, 14, and 18) and 4 of these instances for MuDA (Instances 8, 9, 10, and 11).

(P3) Instances where at least one variant with cut selection (in general Level 1) is much slower than its counterpart without cut selection. There are 5 of these instances for SDDP (Instances 3, 19, 20, 21, and 22) and 10 of these instances for MuDA (Instances 3, 5, 7, 12, 17, 18, 19, 20, 21, and 22).

To understand the impact of the cut selection strategies on the computational time, it is useful to analyze the number of iterations and the proportion of cuts selected along the iterations of the algorithm by the variants with cut selection in cases (P1), (P2), and (P3) described above. As examples of instances of type (P1), consider Instance 1 for SDDP and MuDA and Instance 2 for MuDA. (We considered in particular instances where $M$ is less than the number $2n+1$ of constraints of the subproblems, to give MuDA a chance; according to [8], cases where MuDA may be competitive with single cut SDDP satisfy this requirement.)
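The two cut selection rules compared in these tables can be sketched as follows. This is a minimal illustration of the selection idea only, not the paper's Figure 1 pseudo-code; the relative tolerance for ties is an assumption here (it mirrors the test $\varepsilon \max(1, |V_1|)$ discussed at the end of this section):

```python
import numpy as np

def select_cuts(slopes, intercepts, trial_points, limited_memory=False, eps=1e-6):
    """Sketch of Level 1 / Limited Memory Level 1 cut selection.

    Cut m has value <slopes[m], x> + intercepts[m] at a point x.
    A cut is kept if it is (one of) the highest cut(s) at some trial point.
    """
    # values[p, m] = value of cut m at trial point p
    values = trial_points @ slopes.T + intercepts
    selected = set()
    for v in values:
        vmax = v.max()
        # cuts tying with the maximum, up to a relative tolerance
        ties = np.flatnonzero(v >= vmax - eps * max(1.0, abs(vmax)))
        if limited_memory:
            selected.add(int(ties[0]))       # LML 1: keep a single highest cut (the oldest here)
        else:
            selected.update(map(int, ties))  # Level 1: keep all highest cuts
    return sorted(selected)
```

When many cuts coincide at some trial points, the limited-memory variant discards the duplicates that Level 1 retains, which is consistent with the behavior reported below for Instance 2.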
Instance 1: M = 3, T = 48, n = 50, N = 1, S = 200
              SDDP   SDDP CS 1   SDDP CS 2     MuDA   CuSMuDA CS 1   CuSMuDA CS 2
  CPU time    8750        2857        3201    22857           6800           6964
  Iteration   1152        1168        1286     1205           1176           1203

Instance 2: M = 50, T = 4, n = 500, N = 1, S = 50
              SDDP   SDDP CS 1   SDDP CS 2     MuDA   CuSMuDA CS 1   CuSMuDA CS 2
  CPU time  1726.1       727.0       517.1   3779.0           3405           3348
  Iteration    200         200         200       41             42             50

Instance 3: M = 5, T = 12, n = 5, N = 200, S = 200
              SDDP   SDDP CS 1   SDDP CS 2     MuDA   CuSMuDA CS 1   CuSMuDA CS 2
  CPU time   506.4      1378.2       193.2    799.8         3053.5          168.9
  Iteration      9          11           7        6              8              7

Instance 4: M = 2, T = 6, n = 200, N = 1, S = 50
              SDDP   SDDP CS 1   SDDP CS 2     MuDA   CuSMuDA CS 1   CuSMuDA CS 2
  CPU time    25.5        22.2        21.0     43.3           34.4           34.4
  Iteration     83          97          94       82             93             92

Instance 5: M = 5, T = 12, n = 5, N = 1, S = 200
              SDDP   SDDP CS 1   SDDP CS 2     MuDA   CuSMuDA CS 1   CuSMuDA CS 2
  CPU time    22.0        21.9        16.9     42.2          104.5           24.0
  Iteration    200         200         200      200            200            200

Instance 6: M = 5, T = 12, n = 10, N = 1, S = 200
              SDDP   SDDP CS 1   SDDP CS 2     MuDA   CuSMuDA CS 1   CuSMuDA CS 2
  CPU time    34.1        25.7        19.5     76.6           43.8           32.7
  Iteration    200         200         200      200            200            200

Instance 7: M = 10, T = 6, n = 10, N = 1, S = 200
              SDDP   SDDP CS 1   SDDP CS 2     MuDA   CuSMuDA CS 1   CuSMuDA CS 2
  CPU time    53.7        17.3        14.8     90.3          143.0           26.2
  Iteration    200         200         200      200            200            200

Instance 8: M = 20, T = 6, n = 5, N = 1, S = 200
              SDDP   SDDP CS 1   SDDP CS 2     MuDA   CuSMuDA CS 1   CuSMuDA CS 2
  CPU time    33.7        26.5        23.9    214.5          192.3           50.3
  Iteration    200         200         200      200            200            200

Instance 9: M = 50, T = 12, n = 10, N = 1, S = 200
              SDDP   SDDP CS 1   SDDP CS 2     MuDA   CuSMuDA CS 1   CuSMuDA CS 2
  CPU time   340.7       181.8       158.8   3649.4         2541.8          835.3
  Iteration    200         200         200      179            200            200

Instance 10: M = 10, T = 6, n = 10, N = 200, S = 200
              SDDP   SDDP CS 1   SDDP CS 2     MuDA   CuSMuDA CS 1   CuSMuDA CS 2
  CPU time   256.3       208.3        84.8   2321.5         1185.4          191.8
  Iteration      5           5           5        6              5              5

Instance 11: M = 20, T = 6, n = 5, N = 200, S = 200
              SDDP   SDDP CS 1   SDDP CS 2     MuDA   CuSMuDA CS 1   CuSMuDA CS 2
  CPU time   250.9       189.1        72.0   2721.3         1838.3          232.6
  Iteration      4           4           3        4              4              4

Table 1. Computational time (in seconds) and number of iterations for solving instances of the portfolio problem of Section 5.1.1 with SDDP, SDDP CS 1, SDDP CS 2, MuDA, CuSMuDA CS 1, and CuSMuDA CS 2.

Instance 12: M = 2, T = 6, n = 450, N = 1, S = 50
              SDDP   SDDP CS 1   SDDP CS 2     MuDA   CuSMuDA CS 1   CuSMuDA CS 2
  CPU time   175.3       117.7       153.7    235.6          351.7          183.7
  Iteration    158         154         173      128            181            168

Instance 13: M = 2, T = 12, n = 50, N = 1, S = 50
              SDDP   SDDP CS 1   SDDP CS 2     MuDA   CuSMuDA CS 1   CuSMuDA CS 2
  CPU time    46.3        21.5        21.1    103.2           31.7           36.8
  Iteration    122         137         132      125            127            137

Instance 14: M = 2, T = 6, n = 10, N = 1, S = 10
              SDDP   SDDP CS 1   SDDP CS 2     MuDA   CuSMuDA CS 1   CuSMuDA CS 2
  CPU time    0.52        0.35        0.45     0.38           0.37           0.34
  Iteration     13          13          13       11             13             13

Instance 15: M = 3, T = 36, n = 30, N = 1, S = 200
              SDDP   SDDP CS 1   SDDP CS 2     MuDA   CuSMuDA CS 1   CuSMuDA CS 2
  CPU time    1029         341         394     1710            704            634
  Iteration    454         467         489      436            463            462

Instance 16: M = 3, T = 48, n = 10, N = 1, S = 50
              SDDP   SDDP CS 1   SDDP CS 2     MuDA   CuSMuDA CS 1   CuSMuDA CS 2
  CPU time    56.8        44.2        39.3     75.4           46.1           39.3
  Iteration    100         139         124      100             99             98

Instance 17: M = 20, T = 5, n = 4, N = 200, S = 200
              SDDP   SDDP CS 1   SDDP CS 2     MuDA   CuSMuDA CS 1   CuSMuDA CS 2
  CPU time   103.6        88.2        81.2    538.2            807           78.6
  Iteration      3           4           4        2              2              2

Instance 18: M = 20, T = 5, n = 5, N = 200, S = 200
              SDDP   SDDP CS 1   SDDP CS 2     MuDA   CuSMuDA CS 1   CuSMuDA CS 2
  CPU time    30.9        26.7        17.8   1021.8         1764.6          142.8
  Iteration      1           1           1        3              3              3

Instance 19: M = 20, T = 5, n = 6, N = 200, S = 200
              SDDP   SDDP CS 1   SDDP CS 2     MuDA   CuSMuDA CS 1   CuSMuDA CS 2
  CPU time   132.2       273.2       127.8   2752.2         3853.2            315
  Iteration      3           5           5        5              5              5

Instance 20: M = 10, T = 8, n = 4, N = 200, S = 200
              SDDP   SDDP CS 1   SDDP CS 2     MuDA   CuSMuDA CS 1   CuSMuDA CS 2
  CPU time   134.7       177.9        67.4    476.4          976.2           93.6
  Iteration      4           4           4        3              3              3

Instance 21: M = 10, T = 8, n = 5, N = 200, S = 200
              SDDP   SDDP CS 1   SDDP CS 2     MuDA   CuSMuDA CS 1   CuSMuDA CS 2
  CPU time   264.7       297.0        90.9   2603.4         4438.8            258
  Iteration      5           5           4        6              5              7

Instance 22: M = 10, T = 8, n = 6, N = 200, S = 200
              SDDP   SDDP CS 1   SDDP CS 2     MuDA   CuSMuDA CS 1   CuSMuDA CS 2
  CPU time   256.7       312.7       162.1   3808.2           6633            303
  Iteration      5           5           6        7              9              6

Table 2. Computational time (in seconds) and number of iterations for solving instances of the portfolio problem of Section 5.1.1 with SDDP, SDDP CS 1, SDDP CS 2, MuDA, CuSMuDA CS 1, and CuSMuDA CS 2.

For Instance 1, the mean proportion (over the iterations of the algorithm) of cuts selected by all variants with cut selection, i.e.,
SDDP CS 1, SDDP CS 2, CuSMuDA CS 1, and CuSMuDA CS 2, tends to increase with the stage and is very low for nearly all stages (at most 0.1 until stage 30 and at most 0.2 until stage 40, knowing that there are $T = 48$ stages for that instance); see the bottom plot of Figure 2, which represents the evolution of the mean proportion of cuts selected as a function of the stage. This makes sense: at the last stage, the cuts for the functions $Q_T(\cdot, \xi_{Tj})$ are exact (and, as we recalled, all of them are selected with CuSMuDA CS 1), but not necessarily for stages $t < T$, because for these stages approximate recourse functions are used and the approximation errors propagate backward. Therefore it is expected that at the early stages old cuts (computed at the first iterations using crude approximations of the recourse functions) will probably not be selected. Additionally, another reason why fewer cuts are selected in earlier stages may be that there are fewer distinct sampled points in the state space. Moreover, the proportion of cuts selected is very similar for all variants.

Since the degradation in the upper and lower bounds for the variants with cut selection is very limited (see the evolution of the upper and lower bounds along iterations for all methods on the top right plot of Figure 2), the number of iterations required to solve Instance 1 by these variants and by their counterparts without cut selection is similar.

Figure 2. Top left: total CPU time (in seconds) as a function of the number of iterations for SDDP, SDDP CS 1, SDDP CS 2, MuDA, CuSMuDA CS 1, and CuSMuDA CS 2 to solve Instance 1. Top right: evolution of the upper bounds $z^i_{\sup}$ and lower bounds $z^i_{\inf}$ along the iterations of SDDP, SDDP CS 1, SDDP CS 2, MuDA, CuSMuDA CS 1, and CuSMuDA CS 2 to solve Instance 1 (no legend is given on this plot since all curves are almost identical). Bottom: mean proportion of cuts (over the iterations of the algorithm) selected for stages $t = 2, \ldots, T = 48$ by SDDP CS 1, SDDP CS 2, CuSMuDA CS 1, and CuSMuDA CS 2 to solve Instance 1.

We see in particular that the bounds for all algorithms are very close to each other at the last iteration and that, as expected, the lower bounds increase and the upper bounds tend to decrease along iterations. If we do not know the optimal value $\mathcal{Q}_1(x_0)$, these observations (which we checked on all instances) are a good indication that all algorithms were correctly implemented.

Therefore, the variants with cut selection are much quicker, and the quickest variant with cut selection is the one requiring the least number of iterations; see the top left plot of Figure 2, which plots the total CPU time as a function of the number of iterations.

For Instance 2 solved by MuDA and its variants with cut selection, we refer to Figure 3. We see on this figure that these variants again select a very small proportion of cuts for stages 2 and 3 (here, the number of stages is $T = 4$) and that this proportion increases with the stage. The degradation in the lower bound for the variants with cut selection is larger than with Instance 1 but is still very limited. Therefore, the variants with cut selection are much quicker. CuSMuDA CS 2 requires fewer cuts than
CuSMuDA CS 1, but it needs more iterations, and the CPU time of the two variants is similar.

To explain patterns (P2) and (P3), we consider Instance 2 for SDDP, of type (P2), and Instance 3 for SDDP and MuDA, of type (P3), and refer to Figures 3 and 4, which are the analogues of Figure 2 for Instances 2 and 3.

Figure 3. Top left: total CPU time (in seconds) as a function of the number of iterations for SDDP, SDDP CS 1, SDDP CS 2, MuDA, CuSMuDA CS 1, and CuSMuDA CS 2 to solve Instance 2. Top right: evolution of the lower bounds $z^i_{\inf}$ along the iterations of SDDP, SDDP CS 1, SDDP CS 2, MuDA, CuSMuDA CS 1, and CuSMuDA CS 2 to solve Instance 2. Bottom: mean proportion of cuts (over the iterations of the algorithm) selected for stages $t = 2, 3$ ($T = 4$) by SDDP CS 1, SDDP CS 2, CuSMuDA CS 1, and CuSMuDA CS 2 to solve Instance 2.

Figure 4. Top left: total CPU time (in seconds) as a function of the number of iterations for SDDP, SDDP CS 1, SDDP CS 2, MuDA, and CuSMuDA CS 2 to solve Instance 3 (we did not represent the curve for CuSMuDA CS 1 since the total CPU time with CuSMuDA CS 1 is much larger). Top right: evolution of the upper bounds $z^i_{\sup}$ and lower bounds $z^i_{\inf}$ along the iterations of SDDP, SDDP CS 1, SDDP CS 2, MuDA, CuSMuDA CS 1, and CuSMuDA CS 2 to solve Instance 3. Bottom: mean proportion of cuts (over the iterations of the algorithm) selected for stages $t = 2, \ldots, T = 12$ by SDDP CS 1, SDDP CS 2, CuSMuDA CS 1, and CuSMuDA CS 2 to solve Instance 3.

On Figure 3, we see that on Instance 2,
SDDP CS 1 and SDDP CS 2 select a small proportion of cuts for stages 2 and 3 (recall that this is an instance with $T = 4$ stages) and require, as SDDP, 200 iterations (the evolution of the lower bound along iterations is similar for all three methods). Moreover, SDDP CS 2 selects significantly fewer cuts than SDDP CS 1 for stages 2 and 3 and is therefore much quicker than SDDP CS 1. This means that a large number of cuts have the same value at some trial points: for each such trial point, SDDP CS 1 selects all these cuts whereas SDDP CS 2 only selects the oldest of them.

Instance 3 is an example of a problem where the LML 1 cut selection strategy selects a very small number of cuts for all stages whereas Level 1 selects many more cuts, both for MuDA and SDDP (see Figure 4). On top of that, SDDP CS 1 (resp. CuSMuDA CS 1) needs more iterations than SDDP and SDDP CS 2 (resp. MuDA and CuSMuDA CS 2). Therefore the CPU time needed to solve Instance 3 with SDDP CS 1 (resp. CuSMuDA CS 1) is much larger than the CPU time needed to solve this instance with SDDP and SDDP CS 2 (resp. MuDA and CuSMuDA CS 2). This is an example of an instance where the time spent selecting the cuts is not compensated by the (small) reduction in the CPU time for solving the problems in the backward and forward passes.

Summarizing our observations:
• Pattern (P3) occurs when a variant with cut selection selects a large proportion of cuts and requires too many iterations;
• Patterns (P1) and (P2) occur when the variants with cut selection select a small number of cuts and do not need many more iterations than the variant without cut selection. In this situation, (P1) occurs either when (i) both variants with cut selection select a similar number of cuts or when (ii) one variant selects fewer cuts than the other but needs slightly more iterations.

One last comment is now in order. We have already observed that on all experiments, the Level 1 and Territory Algorithm cut selection strategies, i.e., SDDP CS 1 and CuSMuDA CS 1, correctly select all cuts at the final stage. However, a crude implementation of the Level 1 pseudo-code given in Figure 1 resulted in the elimination of cuts at the final stage. This comes from the fact that approximate solutions of the optimization subproblems are computed: two optimization problems could compute the same cuts, but the solver may return two different (very close) approximate solutions. Similarly, a cut may in theory be the highest at some point (for instance, cut $C_{Tj}^k$ is, in theory, the highest at $x_{T-1}^k$), but numerically the value of another cut at this point may be deemed slightly higher because of numerical errors. The remedy is to introduce a small error term $\varepsilon$ ($\varepsilon = 10^{-}$ in our experiments) such that the values $V_1$ and $V_2$ of two cuts $C_1$ and $C_2$ at a trial point are considered equal if $|V_1 - V_2| \le \varepsilon \max(1, |V_1|)$, while $C_2$ is above $C_1$ at this trial point if $V_2 \ge V_1 + \varepsilon \max(1, |V_1|)$. Therefore the pseudo-codes of Level 1 and LML 1 given in Figure 1 need to be modified; the corresponding pseudo-codes taking into account the approximation errors are given in Figure 6 in the Appendix. Adding cuts that improve the approximation by more than some threshold $\varepsilon$ was also used in [23].

5.2. Inventory management.
5.2.1. Model.
We consider the $T$-stage inventory management problem given in (Shapiro et al. 2009). For each stage $t = 1, \ldots, T$, on the basis of the inventory level $x_{t-1}$ at the beginning of period $t$, we have to decide on the quantity $y_t - x_{t-1}$ of a product to buy so that the inventory level becomes $y_t$. Given the demand $\xi_t$ for that product at stage $t$, the inventory level is $x_t = y_t - \xi_t$ at the beginning of stage $t+1$. The inventory level can become negative, in which case a backorder cost is paid. If one is interested in minimizing the average cost over the optimization period, we need to solve the following dynamic programming equations: for $t = 1, \ldots, T$, defining $\mathcal{Q}_t(x_{t-1}) = \mathbb{E}_{\xi_t}[Q_t(x_{t-1}, \xi_t)]$, the stage $t$ problem is

$Q_t(x_{t-1}, \xi_t) = \inf\big\{ c_t (y_t - x_{t-1}) + b_t (\xi_t - y_t)_+ + h_t (y_t - \xi_t)_+ + \mathcal{Q}_{t+1}(x_t) : x_t = y_t - \xi_t, \; y_t \ge x_{t-1} \big\}$,

where $c_t$ is the unit buying cost, $h_t$ the holding cost, and $b_t$ the backorder cost. The optimal mean cost is $\mathcal{Q}_1(x_0)$, where $x_0$ is the initial stock.

In what follows, we present the results of numerical simulations obtained solving this problem with SDDP, SDDP CS 1, SDDP CS 2, MuDA, CuSMuDA CS 1, and CuSMuDA CS 2 (we use the same notation as before to denote the solution methods).

5.2.2. Numerical results.
We consider six values for the number of stages $T$ ($T \in \{5, 10, 15, 20, 25, 30\}$) and, for fixed $T$, the following values of the problem and algorithm parameters:
• $c_t = 1.$…$\cos(\pi t$…$)$, $b_t = 2.$…, $h_t = 0.$…, and $M_t = M$ for all $t = 1, \ldots, T$;
• $x_0 = 10$, $p_{ti} = 1/M$ for all $t, i$;
• $\xi_1 = \bar{\xi}_1$ and $(\xi_{t1}, \xi_{t2}, \ldots, \xi_{tM})$ corresponds to a sample from the distribution of $\bar{\xi}_t (1 + 0.$…$\,\varepsilon_t)$ for i.i.d. $\varepsilon_t \sim \mathcal{N}(0, 1)$, $t = 2, \ldots, T$, where $\bar{\xi}_t = 5 + 0.$…$\,t$;
• for the stopping criterion, $\alpha = 0.025$ in (24), $N = S = 200$, and $\varepsilon = 0.05$ in (27).
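As a small, self-contained illustration of the stage problem above (all cost and demand values below are toy data assumed for the example, not the instance parameters just listed), the stage cost can be evaluated and a best order-up-to level chosen over a grid, here myopically, i.e., ignoring the future cost $\mathcal{Q}_{t+1}$:

```python
def stage_cost(x_prev, y, xi, c, b, h):
    """Immediate cost of ordering up to level y from inventory x_prev when
    demand xi is realized: buying + backorder + holding costs."""
    assert y >= x_prev, "only a non-negative quantity can be bought"
    return c * (y - x_prev) + b * max(xi - y, 0.0) + h * max(y - xi, 0.0)

def best_order_up_to(x_prev, demands, probs, c, b, h, grid):
    """Myopic one-stage rule (for illustration only): pick the grid level
    y >= x_prev with the least expected stage cost."""
    feasible = [y for y in grid if y >= x_prev]
    expected = lambda y: sum(p * stage_cost(x_prev, y, xi, c, b, h)
                             for p, xi in zip(probs, demands))
    return min(feasible, key=expected)

# Toy data (assumed): two equiprobable demand scenarios.
y_star = best_order_up_to(x_prev=2.0, demands=[4.0, 8.0], probs=[0.5, 0.5],
                          c=1.0, b=2.8, h=0.2, grid=[2.0, 4.0, 6.0, 8.0])
```

With a backorder cost well above the holding cost, the rule orders up to the highest demand scenario, in line with the usual newsvendor intuition; in SDDP/MuDA the term $\mathcal{Q}_{t+1}$ would be replaced by its polyhedral cut approximation rather than ignored.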
Checking the implementations. As for the portfolio problem, for each value of $T \in \{5, 10, 15, 20, 25, 30\}$, we checked that the upper and lower bounds computed by all six methods are very close at the final iteration.

Computational time and proportion of cuts selected.
Table 3 shows the CPU time needed to solve our six instances of the inventory management problem.

              SDDP   SDDP CS 1   SDDP CS 2     MuDA   CuSMuDA CS 1   CuSMuDA CS 2
  T = 5       0.57        0.48        0.18     1.57           2.90           0.22
  T = 10       5.9        1.22        1.42    26.33          43.07           1.68
  T = 15      21.9        20.4        4.56   106.52         178.08           8.07
  T = 20      28.6        31.2        9.94   158.10         245.24          19.43
  T = 25      36.5        62.8        9.61   189.27         405.58          22.64
  T = 30      77.7        63.6       18.30   363.86         575.53          56.62

Table 3. Computational time (in minutes) for solving instances of the inventory problem of Section 5.2.1 for $M = 20$ with SDDP, SDDP CS 1, SDDP CS 2, MuDA, CuSMuDA CS 1, and CuSMuDA CS 2.

Figure 5. Mean proportion of cuts (over the iterations of the algorithm) selected for stages $t = 2, \ldots, T$ by CuSMuDA CS 1 and CuSMuDA CS 2 for the inventory problem with $M = 20$ (panels: $T = 15$, $M = 20$ and $T = 20$, $M = 20$).

For SDDP, all variants with cut selection yield an important reduction in CPU time, and SDDP CS 2, which uses LML 1, is by far the quickest on 5 instances. For MuDA, Level 1 is not efficient (the CPU time with
CuSMuDA CS 1 is much larger than CPU time with
MuDA), whereas LML 1 allows us to drastically reduce the CPU time. As for the instances of the portfolio problem of type (P3), the reason why CuSMuDA CS 1 is much slower than both CuSMuDA CS 2 and MuDA is that it requires a similar number of iterations and selects many more cuts than CuSMuDA CS 2, which selects very few cuts for all stages, as can be seen on Figure 5, which represents the mean proportion of cuts selected by CuSMuDA CS 1 and CuSMuDA CS 2 as a function of the stage for two instances.

6. Conclusion
We proposed CuSMuDA, a combination of a class of cut selection strategies with Multicut Decomposition Algorithms, to solve multistage stochastic linear programs. We proved the almost sure convergence of the method in a finite number of iterations and obtained as a by-product the almost sure convergence in a finite number of iterations of SDDP combined with this class of cut selection strategies.

Numerical experiments on many instances of a portfolio problem and of an inventory problem have shown that combining LML 1 cut selection with SDDP (resp. MuDA) in general allows us to considerably reduce the CPU time of SDDP (resp. MuDA). There are, however, situations where a variant with cut selection is slower than its counterpart without cut selection.

It would be interesting to test CuSMuDA and the Limited Memory variant of Level 1 (both for MuDA and SDDP) on other types of stochastic programs and to extend the analysis to nonlinear stochastic programs.

Acknowledgments. The second author's research was partially supported by an FGV grant, CNPq grants 307287/2013-0 and 401371/2014-0, and FAPERJ grant E-26/201.599/2014. The authors wish to thank Vincent Leclère for helpful discussions.
References

[1] E. D. Andersen and K. D. Andersen. The MOSEK optimization toolbox for MATLAB manual. Version 7.0, 2013.
[2] T. Asamov and W. Powell. Regularized decomposition of high-dimensional multistage stochastic programs with Markov uncertainty. Available at: https://arxiv.org/abs/1505.02227, 2015.
[3] A. Ben-Tal, T. Margalit, and A. Nemirovski. Robust modeling of multi-stage portfolio problems. In: H. Frenk, K. Roos, T. Terlaky, S. Zhang, Eds., High Performance Optimization, Kluwer Academic Publishers, pages 303–328, 2000.
[4] M. Best and J. Hlouskova. An algorithm for portfolio optimization with variable transaction costs, part 1: Theory. Journal of Optimization Theory and Applications, 135:563–581, 2007.
[5] J. Birge and F. Louveaux. Introduction to Stochastic Programming. Springer-Verlag, New York, 1997.
[6] J. R. Birge. Decomposition and partitioning methods for multistage stochastic linear programs. Operations Research, 33:989–1007, 1985.
[7] J. R. Birge and C. J. Donohue. The Abridged Nested Decomposition Method for Multistage Stochastic Linear Programs with Relatively Complete Recourse. Algorithmic Operations Research, 1:20–30, 2001.
[8] J. R. Birge, C. J. Donohue, D. F. Holmes, and O. G. Svintsitski. A parallel implementation of the nested decomposition algorithm for multistage stochastic linear programs. Mathematical Programming, 75:327–352, 1996.
[9] J. R. Birge and F. V. Louveaux. A multicut algorithm for two-stage stochastic linear programs. European Journal of Operational Research, 34:384–392, 1988.
[10] Z. L. Chen and W. B. Powell. Convergent Cutting-Plane and Partial-Sampling Algorithm for Multistage Stochastic Linear Programs with Recourse. Journal of Optimization Theory and Applications, 102:497–524, 1999.
[11] H. Gassmann. MSLiP: A computer code for the multistage stochastic linear programming problem. Mathematical Programming, 47:407–423, 1990.
[12] S. Gaubert, W. McEneaney, and Z. Qu. Curse of dimensionality reduction in max-plus based approximation methods: Theoretical estimates and improved pruning algorithms. Pages 1054–1061, 2011.
[13] P. Girardeau, V. Leclère, and A. B. Philpott. On the convergence of decomposition methods for multistage stochastic convex programs. Mathematics of Operations Research, 40:130–145, 2015.
[14] V. Guigues. SDDP for some interstage dependent risk-averse problems and application to hydro-thermal planning. Computational Optimization and Applications, 57:167–203, 2014.
[15] V. Guigues. Convergence analysis of sampling-based decomposition methods for risk-averse multistage stochastic convex programs. SIAM Journal on Optimization, 26:2468–2494, 2016.
[16] V. Guigues. Dual dynamic programming with cut selection: Convergence proof and numerical experiments. European Journal of Operational Research, 258:47–57, 2017.
[17] V. Guigues and W. Römisch. Sampling-based decomposition methods for multistage stochastic programs based on extended polyhedral risk measures. SIAM Journal on Optimization, 22:286–312, 2012.
[18] V. Guigues and W. Römisch. SDDP for multistage stochastic linear programs based on spectral risk measures. Operations Research Letters, 40:313–318, 2012.
[19] V. Guigues, W. Tekaya, and M. Lejeune. Regularized decomposition methods for deterministic and stochastic convex optimization and application to portfolio selection with direct transaction and market impact costs. Optimization Online, 2017.
[20] M. Hindsberger and A. B. Philpott. ReSa: A method for solving multi-stage stochastic linear programs. SPIX Stochastic Programming Symposium, 2001.
[21] G. Infanger and D. Morton. Cut sharing for multistage stochastic linear programs with interstage dependency. Mathematical Programming, 75:241–256, 1996.
[22] V. Kozmik and D. P. Morton. Evaluating policies in risk-averse multi-stage stochastic programming. Mathematical Programming, 152:275–300, 2015.
[23] N. Löhndorf, D. Wozabal, and S. Minner. Optimizing trading decisions for hydro storage systems using approximate dual dynamic programming. Operations Research, 61:810–823, 2013.
[24] W. M. McEneaney, A. Deshpande, and S. Gaubert. Curse of complexity attenuation in the curse of dimensionality free method for HJB PDEs. American Control Conference, pages 4684–4690, 2008.
[25] M. V. F. Pereira and L. M. V. G. Pinto. Multi-stage stochastic optimization applied to energy planning. Mathematical Programming, 52:359–375, 1991.
[26] L. Pfeiffer, R. Apparigliato, and S. Auchapt. Two methods of pruning Benders' cuts and their application to the management of a gas portfolio. Research Report RR-8133, hal-00753578, 2012.
[27] A. Philpott and V. de Matos. Dynamic sampling algorithms for multi-stage stochastic programs with risk aversion. European Journal of Operational Research, 218:470–483, 2012.
[28] A. Philpott, V. de Matos, and E. Finardi. Improving the performance of stochastic dual dynamic programming. Journal of Computational and Applied Mathematics, 290:196–208, 2015.
[29] A. B. Philpott and Z. Guan. On the convergence of stochastic dual dynamic programming and related methods. Operations Research Letters, 36:450–455, 2008.
[30] W. B. Powell. Approximate Dynamic Programming. John Wiley and Sons, 2nd edition, 2011.
[31] A. Ruszczyński. Parallel decomposition of multistage stochastic programming problems. Mathematical Programming, 58:201–228, 1993.
[32] A. Shapiro. Analysis of stochastic dual dynamic programming method. European Journal of Operational Research, 209:63–72, 2011.
[33] A. Shapiro, D. Dentcheva, and A. Ruszczyński. Lectures on Stochastic Programming: Modeling and Theory. SIAM, Philadelphia, 2009.
[34] A. Shapiro, W. Tekaya, J. P. da Costa, and M. P. Soares. Risk neutral and risk averse stochastic dual dynamic programming method. European Journal of Operational Research, 224:375–391, 2013.
[35] A. Shapiro, W. Tekaya, J. P. da Costa, and M. P. Soares. Worst-case-expectation approach to optimization under uncertainty. Operations Research, 61:1435–1449, 2013.
[36] W. Zhang, H. Rahimian, and G. Bayraksan. Decomposition algorithms for risk-averse multistage stochastic programs with application to water allocation under uncertainty. INFORMS Journal on Computing, 28:385–404, 2016.
Appendix
Level 1

I_{tj}^k = {k}, m_{tj}^k = C_{tj}^k(x_{t-1}^k).
For ℓ = 1, ..., k-1
    If C_{tj}^k(x_{t-1}^ℓ) > m_{tj}^ℓ + ε max(1, |m_{tj}^ℓ|)
        I_{tj}^ℓ = {k}, m_{tj}^ℓ = C_{tj}^k(x_{t-1}^ℓ)
    Else if |C_{tj}^k(x_{t-1}^ℓ) − m_{tj}^ℓ| ≤ ε max(1, |m_{tj}^ℓ|)
        I_{tj}^ℓ = I_{tj}^ℓ ∪ {k}
    End If
    If C_{tj}^ℓ(x_{t-1}^k) > m_{tj}^k + ε max(1, |m_{tj}^k|)
        I_{tj}^k = {ℓ}, m_{tj}^k = C_{tj}^ℓ(x_{t-1}^k)
    Else if |C_{tj}^ℓ(x_{t-1}^k) − m_{tj}^k| ≤ ε max(1, |m_{tj}^k|)
        I_{tj}^k = I_{tj}^k ∪ {ℓ}
    End If
End For

Limited Memory Level 1

I_{tj}^k = {1}, m_{tj}^k = C_{tj}^1(x_{t-1}^k).
For ℓ = 1, ..., k-1
    If C_{tj}^k(x_{t-1}^ℓ) > m_{tj}^ℓ + ε max(1, |m_{tj}^ℓ|)
        I_{tj}^ℓ = {k}, m_{tj}^ℓ = C_{tj}^k(x_{t-1}^ℓ)
    End If
    If C_{tj}^{ℓ+1}(x_{t-1}^k) > m_{tj}^k + ε max(1, |m_{tj}^k|)
        I_{tj}^k = {ℓ+1}, m_{tj}^k = C_{tj}^{ℓ+1}(x_{t-1}^k)
    End If
End For
Figure 6. Pseudo-codes of the Level 1 and Limited Memory Level 1 cut selection strategies.
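The idea underlying both strategies is that a cut is kept only if it attains the maximum of the current cut collection at at least one trial point visited so far. The following minimal sketch illustrates this selection rule for affine cuts C^k(x) = α_k + ⟨β_k, x⟩; the helper name `level1_selected_cuts` is ours, and for brevity the sketch uses an exact argmax and omits the relative tolerance ε and the incremental index-set bookkeeping of the pseudo-code above.

```python
import numpy as np

def level1_selected_cuts(alphas, betas, trial_points):
    """Indices of cuts that are highest at >= 1 trial point (illustrative).

    alphas: intercepts of the cuts, shape (n_cuts,)
    betas:  slopes of the cuts, shape (n_cuts, dim)
    trial_points: trial points visited so far, shape (n_points, dim)
    """
    alphas = np.asarray(alphas, dtype=float)
    betas = np.asarray(betas, dtype=float)
    X = np.asarray(trial_points, dtype=float)
    # Value of every cut at every trial point: (n_points, n_cuts).
    values = alphas[None, :] + X @ betas.T
    # For each trial point, the cut achieving the maximum "owns" that point.
    winners = values.argmax(axis=1)
    # A cut is selected iff its territory (set of owned points) is nonempty.
    return sorted(set(winners.tolist()))

# Two 1-D cuts C^0(x) = x and C^1(x) = 2 - x, trial points 0 and 3:
# C^1 is highest at x = 0, C^0 is highest at x = 3, so both are kept.
print(level1_selected_cuts([0.0, 2.0], [[1.0], [-1.0]], [[0.0], [3.0]]))  # -> [0, 1]
```

A dominated cut (one that is never the highest at any trial point) is dropped, which is what reduces the size of the stage subproblems as iterations accumulate.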