Single cut and multicut SDDP with cut selection for multistage stochastic linear programs: convergence proof and numerical experiments
Vincent Guigues (corresponding author), School of Applied Mathematics, FGV, Praia de Botafogo, Rio de Janeiro, Brazil, [email protected]
Michelle Bandarra, School of Applied Mathematics, FGV, Praia de Botafogo, Rio de Janeiro, Brazil, [email protected]
Abstract.
We introduce a variant of Multicut Decomposition Algorithms (MuDA), called CuSMuDA (Cut Selection for Multicut Decomposition Algorithms), for solving multistage stochastic linear programs that incorporates a class of cut selection strategies to choose the most relevant cuts of the approximate recourse functions. This class contains the Level 1 [28] and Limited Memory Level 1 [16] cut selection strategies, initially introduced for Stochastic Dual Dynamic Programming (SDDP) and Dual Dynamic Programming (DDP), respectively. We prove the almost sure convergence of the method in a finite number of iterations and obtain as a by-product the almost sure convergence in a finite number of iterations of SDDP combined with our class of cut selection strategies. We compare the performance of MuDA, SDDP, and their variants with cut selection (using Level 1 and Limited Memory Level 1) on several instances of a portfolio problem and of an inventory problem. In these experiments, SDDP generally satisfies the stopping criterion sooner than MuDA, and cut selection reduces the computational burden, with Limited Memory Level 1 being more efficient (sometimes much more) than Level 1.
Keywords:
Stochastic Programming; Stochastic Dual Dynamic Programming; Multicut Decomposition Algorithm; Portfolio selection; Inventory management.
AMS subject classifications:
1. Introduction
Multistage stochastic optimization problems are common in many areas of engineering and in finance. However, solving these problems is challenging and in general requires decomposition techniques. Two popular decomposition methods are Approximate Dynamic Programming (ADP) [30] and sampling-based variants of the Nested Decomposition (ND) algorithm [6, 5] and of the Multicut Nested Decomposition (MuND) algorithm [9, 11]. The introduction of sampling within ND was proposed by [25] and the corresponding method is usually called Stochastic Dual Dynamic Programming (SDDP). Several enhancements and extensions of SDDP have been proposed, such as CUPPS [10], ReSA [20], AND [7], DOASA [29], risk-averse variants in [17, 18, 27, 14, 34, 22], cut formulas for nonlinear problems in [15], regularizations in [2] for linear problems, and SDDP-REG in [19] for nonlinear problems. Convergence of these methods was proved in [29] for linear problems, in [13] for nonlinear risk-neutral problems, and in [15] for risk-averse nonlinear problems. All these algorithms compute lower approximations of the cost-to-go functions expressed as a supremum of affine functions called optimality cuts. Typically, at each iteration, a fixed number of cuts is added for each cost-to-go function. Therefore, techniques to reduce the number of cuts in each subproblem (referred to as cut selection or cut pruning) may be helpful to speed up the convergence of these methods. In stochastic optimization, the problem of cut selection for lower approximations of the cost-to-go functions associated with each node of the scenario tree was discussed for the first time in [31], where only the active cuts are selected. Pruning strategies of basis (quadratic) functions have been proposed in [12] and [24] for max-plus based approximation methods which, similarly to SDDP, approximate the cost-to-go functions of a nonlinear optimal control problem by a supremum of basis functions.
More precisely, in [12] a fixed number of cuts is eliminated and cut selection is done solving a combinatorial optimization problem. For SDDP, it is suggested in [34] to eliminate redundant cuts at some iterations (a cut is redundant if it is never active in describing the lower approximate cost-to-go function). This procedure is called the test of usefulness in [26]. It requires solving at each stage as many linear programs as there are cuts. In [26] and [28], only the cuts that have the largest value for at least one of the trial points computed are considered relevant. This strategy is called the Territory algorithm in [26] and Level 1 cut selection in [28]. It was presented for the first time in 2007 at the ROADEF congress by David Game and Guillaume Le Roy (GDF-Suez), see [26].

Sampling can also be incorporated into the MuND algorithm (see for instance [36]). This algorithm builds many more cuts than SDDP per iteration; each iteration therefore takes more time, but fewer iterations are in general needed to satisfy a given stopping criterion. Cut selection strategies could therefore also be useful in this setting. However, to the best of our knowledge, the combination of multicut decomposition methods with cut selection strategies, referred to as CuSMuDA (Cut Selection for Multicut Decomposition Algorithms) in the sequel, has not been proposed so far. In this context, the objectives and contributions of this paper are the following:

(A) We propose cut selection strategies that are more efficient than the aforementioned ones. More precisely, instead of selecting all the cuts that are the highest at the trial points, we introduce a set of selectors that select some subset of these cuts. The selectors have to satisfy an assumption (Assumption (H3), see Section 3.2) to ensure the convergence of SDDP and MuDA combined with these cut selection strategies.
We obtain a family of cut selection strategies, a given strategy corresponding to a choice of selectors along the iterations. In this family, the most economic (in terms of memory) cut selection strategy satisfying (H3) is the Limited Memory Level 1 (LML 1) strategy, which selects at each trial point only one cut, namely the oldest cut. This strategy was introduced in [16] in the context of Dual Dynamic Programming but can be straightforwardly applied to SDDP and MuDA. The least economic strategy, i.e., the one that keeps the largest number of cuts, is Level 1. "Between" these two strategies, using the flexibility offered by the selectors (as long as Assumption (H3) is satisfied by these selectors), we obtain a (large) family of cut selection strategies.

(B) We introduce and describe CuSMuDA, a combination of MuDA with our family of cut selection strategies.

(C) We prove the almost sure convergence of CuSMuDA in a finite number of iterations. This proof extends the theory in [16] in two aspects: (i) the stochastic case is considered, i.e., SDDP and multicut SDDP, whereas the deterministic case, i.e., DDP, was considered in [16]; and (ii) more general cut selection strategies are dealt with. Item (ii) requires an additional technical discussion, see Lemma 1.

(D) We present the results of numerical experiments comparing the performance of six solution methods on several instances of a portfolio problem and of an inventory problem. These six solution methods are SDDP, SDDP with Level 1 cut selection, SDDP with LML 1 cut selection, MuDA, CuSMuDA with Level 1 cut selection, and CuSMuDA with LML 1 cut selection. To the best of our knowledge, these are the first numerical experiments on SDDP with LML 1 cut selection and the first experiments on multicut SDDP combined with cut selection. The main conclusions of these experiments are the following:

– In general, for a given instance, CuSMuDA with LML 1 cut selection (resp. SDDP combined with LML 1 cut selection) is more efficient (i.e., allows us to satisfy the stopping criterion more quickly) than CuSMuDA with Level 1 cut selection (resp. SDDP combined with Level 1 cut selection), itself much more efficient than MuDA (resp. SDDP). Typically, variants with cut selection require more iterations, but iterations are quicker, with very few cuts selected for the first stages. However, on some instances, CuSMuDA with Level 1 cut selection still selected a large proportion of cuts for all stages and was less efficient than both MuDA and CuSMuDA with LML 1 cut selection.

– MuDA (resp. CuSMuDA) in general requires much more computational effort than SDDP (resp. SDDP with cut selection). However, on some instances, CuSMuDA with LML 1 cut selection and SDDP combined with LML 1 cut selection have shown similar performances. We also expect CuSMuDA to be more efficient than SDDP with cut selection when MuDA is already more efficient than SDDP. Even if this is not often the case, results of numerical experiments on several instances of multistage stochastic linear programs where MuDA is quicker than SDDP are reported in [8].

The outline of the study is as follows. The class of problems considered and our assumptions are discussed in Section 2. In Subsection 3.1 we recall sampling-based MuND, while in Subsection 3.2 CuSMuDA is described. In Section 4, we prove Theorem 1, which states that CuSMuDA converges almost surely in a finite number of iterations to an optimal policy. As a by-product, we obtain the almost sure convergence of SDDP combined with our class of cut selection strategies (in particular Level 1 and LML 1) in a finite number of iterations. Finally, numerical experiments are presented in Section 5.

Throughout the paper, the usual scalar product in $\mathbb{R}^n$ is denoted by $\langle x, y \rangle = x^\top y$ for $x, y \in \mathbb{R}^n$.

2. Problem formulation and assumptions
We are interested in solution methods for linear stochastic dynamic programming equations: the first stage problem is
$$(1)\qquad Q_1(x_0) = \begin{cases} \inf_{x_1 \in \mathbb{R}^n} \langle c_1, x_1 \rangle + \mathcal{Q}_2(x_1) \\ A_1 x_1 + B_1 x_0 = b_1,\; x_1 \ge 0, \end{cases}$$
for $x_0$ given, and for $t = 2, \ldots, T$, $\mathcal{Q}_t(x_{t-1}) = \mathbb{E}_{\xi_t}[Q_t(x_{t-1}, \xi_t)]$ with
$$(2)\qquad Q_t(x_{t-1}, \xi_t) = \begin{cases} \inf_{x_t \in \mathbb{R}^n} \langle c_t, x_t \rangle + \mathcal{Q}_{t+1}(x_t) \\ A_t x_t + B_t x_{t-1} = b_t,\; x_t \ge 0, \end{cases}$$
with the convention that $\mathcal{Q}_{T+1}$ is null, and where for $t = 2, \ldots, T$ the random vector $\xi_t$ corresponds to the concatenation of the elements in the random matrices $A_t, B_t$, which have a known finite number of rows, and in the random vectors $b_t, c_t$ (it is assumed that $\xi_1$ is not random). For convenience, we will denote
$$X_t(x_{t-1}, \xi_t) := \{ x_t \in \mathbb{R}^n : A_t x_t + B_t x_{t-1} = b_t,\; x_t \ge 0 \}.$$
We make the following assumptions:

(H1) The random vectors $\xi_2, \ldots, \xi_T$ are independent and have discrete distributions with finite support.

(H2) The set $X_1(x_0, \xi_1)$ is nonempty and bounded, and for every $x_1 \in X_1(x_0, \xi_1)$, for every $t = 2, \ldots, T$, for every realization $\tilde\xi_2, \ldots, \tilde\xi_t$ of $\xi_2, \ldots, \xi_t$, for every $x_\tau \in X_\tau(x_{\tau-1}, \tilde\xi_\tau)$, $\tau = 2, \ldots, t-1$, the set $X_t(x_{t-1}, \tilde\xi_t)$ is nonempty and bounded.

We will denote by $\Theta_t = \{\xi_{t1}, \ldots, \xi_{tM_t}\}$ the support of $\xi_t$ for stage $t$, with $p_{ti} = \mathbb{P}(\xi_t = \xi_{ti}) > 0$, $i = 1, \ldots, M_t$, and with vector $\xi_{tj}$ being the concatenation of the elements in $A_{tj}, B_{tj}, b_{tj}, c_{tj}$.

3. Algorithms
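Each stage subproblem in (1)-(2) is itself a small linear program parameterized by the previous decision. As a concrete illustration, the following is a minimal sketch of solving one stage problem $Q_t(x_{t-1}, \xi_t)$ with `scipy.optimize.linprog`; the data below is an illustrative toy instance, not from the paper:

```python
import numpy as np
from scipy.optimize import linprog

def stage_value(c, A, B, b, x_prev):
    """Solve Q_t(x_prev, xi) = min <c, x> s.t. A x + B x_prev = b, x >= 0."""
    res = linprog(c, A_eq=A, b_eq=b - B @ x_prev, bounds=(0, None), method="highs")
    assert res.success
    return res.fun, res.x

# Toy two-variable instance (hypothetical data).
c = np.array([1.0, 2.0])
A = np.array([[1.0, 1.0]])
B = np.array([[1.0]])
b = np.array([3.0])
value, x = stage_value(c, A, B, b, x_prev=np.array([1.0]))
print(value)  # minimizes x1 + 2*x2 subject to x1 + x2 = 2, so x = (2, 0) and value 2.0
```

The coupling with the previous stage enters only through the right-hand side $b - B x_{t-1}$, which is what makes the cut-based approximations of the next sections possible.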
3.1. Multicut stochastic decomposition. The multicut stochastic decomposition method approximates the function $Q_t(\cdot, \xi_{tj})$ at iteration $k$, for $t = 2, \ldots, T$, $j = 1, \ldots, M_t$, by a piecewise affine lower bounding function $Q_t^k(\cdot, \xi_{tj})$ which is a maximum of $k$ affine functions $C_{tj}^i$ called cuts:
$$Q_t^k(x_{t-1}, \xi_{tj}) = \max_{1 \le i \le k} C_{tj}^i(x_{t-1}) \quad \text{with} \quad C_{tj}^i(x_{t-1}) = \theta_{tj}^i + \langle \beta_{tj}^i, x_{t-1} \rangle,$$
where the coefficients $\theta_{tj}^i, \beta_{tj}^i$ are computed as explained below. These approximations provide the lower bounding functions
$$(3)\qquad \mathcal{Q}_t^k(x_{t-1}) = \sum_{j=1}^{M_t} p_{tj}\, Q_t^k(x_{t-1}, \xi_{tj})$$
for $\mathcal{Q}_t$. Since $\mathcal{Q}_{T+1}$ is the null function, we also define $\mathcal{Q}_{T+1}^k \equiv 0$. The steps of MuDA are described below.
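Before listing the steps, note that evaluating the lower bounding functions above from stored cut coefficients only involves maxima of affine functions and a probability-weighted sum as in (3). A minimal numpy sketch (array layout and names are hypothetical):

```python
import numpy as np

def eval_scenario_bound(thetas, betas, x):
    """Q_t^k(x, xi_tj): maximum over the stored cuts theta_i + <beta_i, x>."""
    return np.max(thetas + betas @ x)

def eval_expected_bound(p, cuts_per_scenario, x):
    """Aggregation (3): probability-weighted sum over the M_t scenarios."""
    return sum(p_j * eval_scenario_bound(th, be, x)
               for p_j, (th, be) in zip(p, cuts_per_scenario))

# Two cuts for each of two equiprobable scenarios (illustrative numbers).
cuts = [(np.array([0.0, 1.0]), np.array([[1.0], [0.0]])),   # scenario 1
        (np.array([2.0, 0.0]), np.array([[0.0], [1.0]]))]   # scenario 2
p = [0.5, 0.5]
x = np.array([3.0])
print(eval_expected_bound(p, cuts, x))  # 0.5*max(3, 1) + 0.5*max(2, 3) = 3.0
```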
Step 1: Initialization. For $t = 2, \ldots, T$, $j = 1, \ldots, M_t$, take for $Q_t^0(\cdot, \xi_{tj})$ a known lower bounding affine function for $Q_t(\cdot, \xi_{tj})$. Set the iteration count $k$ to 1 and $\mathcal{Q}_{T+1}^0 \equiv 0$.

Step 2: Forward pass. We generate a sample $\tilde\xi^k = (\tilde\xi_1^k, \tilde\xi_2^k, \ldots, \tilde\xi_T^k)$ from the distribution of $(\xi_1, \xi_2, \ldots, \xi_T)$, with the convention that $\tilde\xi_1^k = \xi_1$ (here and in what follows, the tilde symbol will be used to represent realizations of random variables: for a random variable $\xi$, $\tilde\xi$ is a realization of $\xi$). Using the approximations $Q_t^{k-1}(\cdot, \xi_{tj})$ of $Q_t(\cdot, \xi_{tj})$ (computed at previous iterations), we solve the problem
$$(4)\qquad \begin{cases} \inf_{x_t \in \mathbb{R}^n} \langle x_t, \tilde c_t^k \rangle + \mathcal{Q}_{t+1}^{k-1}(x_t) \\ x_t \in X_t(x_{t-1}^k, \tilde\xi_t^k) \end{cases}$$
for $t = 1, \ldots, T$, where $x_0^k = x_0$ and $\mathcal{Q}_{t+1}^{k-1}$ is given by (3) with $k$ replaced by $k-1$. Let $x_t^k$ be an optimal solution of the problem.

Step 3: Backward pass.
The backward pass builds cuts for $Q_t(\cdot, \xi_{tj})$ at the points $x_{t-1}^k$ computed in the forward pass. For $k \ge 1$, $t = 1, \ldots, T$, we introduce the function $\underline{Q}_t^k : \mathbb{R}^n \times \Theta_t \to \mathbb{R}$ given by
$$(5)\qquad \underline{Q}_t^k(x_{t-1}, \xi_t) = \begin{cases} \inf_{x_t \in \mathbb{R}^n} \langle c_t, x_t \rangle + \mathcal{Q}_{t+1}^k(x_t) \\ x_t \in X_t(x_{t-1}, \xi_t), \end{cases}$$
with the convention that $\Theta_1 = \{\xi_1\}$, and we set $\mathcal{Q}_{T+1}^k \equiv 0$. For $j = 1, \ldots, M_T$, we solve the problem
$$(6)\qquad Q_T(x_{T-1}^k, \xi_{Tj}) = \begin{cases} \inf_{x_T \in \mathbb{R}^n} \langle c_{Tj}, x_T \rangle \\ A_{Tj} x_T + B_{Tj} x_{T-1}^k = b_{Tj},\; x_T \ge 0, \end{cases}$$
with dual
$$\begin{cases} \sup_{\lambda}\; \langle \lambda, b_{Tj} - B_{Tj} x_{T-1}^k \rangle \\ A_{Tj}^\top \lambda \le c_{Tj}. \end{cases}$$
Let $\lambda_{Tj}^k$ be an optimal solution of the dual problem above. We get $Q_T(x_{T-1}, \xi_{Tj}) \ge \langle \lambda_{Tj}^k, b_{Tj} - B_{Tj} x_{T-1} \rangle$ and compute $\theta_{Tj}^k = \langle b_{Tj}, \lambda_{Tj}^k \rangle$ and $\beta_{Tj}^k = -B_{Tj}^\top \lambda_{Tj}^k$. Then for $t = T-1$ down to $t = 2$, knowing that $\mathcal{Q}_{t+1}^k \le \mathcal{Q}_{t+1}$, we solve the problem below for $j = 1, \ldots, M_t$:
$$(7)\qquad \underline{Q}_t^k(x_{t-1}^k, \xi_{tj}) = \begin{cases} \inf_{x_t}\; \langle c_{tj}, x_t \rangle + \mathcal{Q}_{t+1}^k(x_t) \\ x_t \in X_t(x_{t-1}^k, \xi_{tj}) \end{cases} = \begin{cases} \inf_{x_t, f}\; \langle c_{tj}, x_t \rangle + \sum_{\ell=1}^{M_{t+1}} p_{t+1\,\ell}\, f_\ell \\ A_{tj} x_t + B_{tj} x_{t-1}^k = b_{tj},\; x_t \ge 0, \\ f_\ell \ge \theta_{t+1\,\ell}^i + \langle \beta_{t+1\,\ell}^i, x_t \rangle,\; i = 1, \ldots, k,\; \ell = 1, \ldots, M_{t+1}. \end{cases}$$
Observe that due to (H2) the above problem is feasible and has a finite optimal value. Therefore $\underline{Q}_t^k(x_{t-1}^k, \xi_{tj})$ can be expressed as the optimal value of the corresponding dual problem:
$$(8)\qquad \underline{Q}_t^k(x_{t-1}^k, \xi_{tj}) = \begin{cases} \sup_{\lambda, \mu}\; \langle \lambda, b_{tj} - B_{tj} x_{t-1}^k \rangle + \sum_{i=1}^{k} \sum_{\ell=1}^{M_{t+1}} \mu_\ell^i \theta_{t+1\,\ell}^i \\ A_{tj}^\top \lambda + \sum_{i=1}^{k} \sum_{\ell=1}^{M_{t+1}} \mu_\ell^i \beta_{t+1\,\ell}^i \le c_{tj}, \\ p_{t+1\,\ell} = \sum_{i=1}^{k} \mu_\ell^i,\; \ell = 1, \ldots, M_{t+1}, \\ \mu_\ell^i \ge 0,\; i = 1, \ldots, k,\; \ell = 1, \ldots, M_{t+1}. \end{cases}$$
Let $(\lambda_{tj}^k, \mu_{tj}^k)$ be an optimal solution of dual problem (8). Using the fact that $\mathcal{Q}_{t+1}^k \le \mathcal{Q}_{t+1}$, we get
$$Q_t(x_{t-1}, \xi_{tj}) \ge \underline{Q}_t^k(x_{t-1}, \xi_{tj}) \ge \langle \lambda_{tj}^k, b_{tj} - B_{tj} x_{t-1} \rangle + \langle \mu_{tj}^k, \theta_{t+1}^k \rangle$$
and we compute $\theta_{tj}^k = \langle \lambda_{tj}^k, b_{tj} \rangle + \langle \mu_{tj}^k, \theta_{t+1}^k \rangle$ and $\beta_{tj}^k = -B_{tj}^\top \lambda_{tj}^k$.
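For the last stage, the cut coefficients come directly from the dual of the stage LP. A hedged sketch using `scipy.optimize.linprog`, whose HiGHS backend exposes the equality-constraint duals as `res.eqlin.marginals`; the data below is an illustrative toy instance, not from the paper:

```python
import numpy as np
from scipy.optimize import linprog

def last_stage_cut(c, A, B, b, x_trial):
    """Solve (6) at x_trial and return (value, theta, beta) with
    Q_T(x, xi) >= theta + <beta, x> for all x, with equality at x_trial."""
    res = linprog(c, A_eq=A, b_eq=b - B @ x_trial, bounds=(0, None), method="highs")
    lam = res.eqlin.marginals          # dual multipliers of the equality constraints
    theta = lam @ b                    # theta = <b, lambda>
    beta = -B.T @ lam                  # beta = -B^T lambda
    return res.fun, theta, beta

# Toy instance (hypothetical data): min 2*x s.t. x = b - B*x_prev, x >= 0.
c = np.array([2.0]); A = np.array([[1.0]]); B = np.array([[1.0]]); b = np.array([5.0])
val, theta, beta = last_stage_cut(c, A, B, b, x_trial=np.array([2.0]))
print(val, theta + beta @ np.array([2.0]))  # the cut is tight at the trial point
```

The tightness at the trial point is exactly the LP duality argument used in Example 3.1 below: the dual objective at the optimal multipliers equals the primal optimal value.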
In these expressions, the vector $\theta_{t+1}^k$ has components $\theta_{t+1\,\ell}^i$, $\ell = 1, \ldots, M_{t+1}$, $i = 1, \ldots, k$, arranged in the same order as the components $\mu_{tj}^k(\ell, i)$ of $\mu_{tj}^k$.

Step 4: Do $k \leftarrow k+1$ and go to Step 2.

3.2. Multicut stochastic decomposition with cut selection.
We now describe a variant of MuDA that stores all cut coefficients $\theta_{tj}^i, \beta_{tj}^i$ and trial points $x_{t-1}^i$, but that uses a reduced set of cuts $C_{tj}^i$ to approximate the functions $Q_t(\cdot, \xi_{tj})$ when solving problem (4) in the forward pass and (7) in the backward pass. Let $S_{tj}^k$ be the set of indices of the cuts selected at the end of iteration $k$ to approximate $Q_t(\cdot, \xi_{tj})$. At the end of the backward pass of iteration $k$, the variant of MuDA with cut selection computes approximations $\mathcal{Q}_t^k$ of $\mathcal{Q}_t$ given by (3), now with $Q_t^k(\cdot, \xi_{tj})$ given by
$$(9)\qquad Q_t^k(x_{t-1}, \xi_{tj}) = \max_{\ell \in S_{tj}^k} C_{tj}^\ell(x_{t-1}),$$
where the set $S_{tj}^k$ is a subset of the set of indices of the cuts that have the largest value for at least one of the trial points computed so far. More precisely, the sets $S_{tj}^k$ are initialized taking $S_{tj}^0$ to contain only the index of the initial lower bounding cut from Step 1. For $t \in \{2, \ldots, T\}$ and $k \ge 1$, the sets $S_{tj}^k$ are computed as follows. For $i = 1, \ldots, k$, $t = 2, \ldots, T$, $j = 1, \ldots, M_t$, let $I_{tj}^{ik}$ be the set of cuts for $Q_t(\cdot, \xi_{tj})$ computed at iteration $k$ or before that have the largest value at $x_{t-1}^i$:
$$(10)\qquad I_{tj}^{ik} = \arg\max_{\ell = 1, \ldots, k} C_{tj}^\ell(x_{t-1}^i),$$
where the cut indices in $I_{tj}^{ik}$ are sorted in ascending order. With a slight abuse of notation, we will denote the $\ell$-th smallest element in $I_{tj}^{ik}$ by $I_{tj}^{ik}(\ell)$. For instance, if $I_{tj}^{ik} = \{2, 30, 50\}$ then $I_{tj}^{ik}(1) = 2$, $I_{tj}^{ik}(2) = 30$, and $I_{tj}^{ik}(3) = 50$. A cut selection strategy is given by a set of selectors $\mathcal{S}_{tj}(m)$, $t = 2, \ldots, T$, $j = 1, \ldots, M_t$, $m = 1, 2, \ldots$, where $\mathcal{S}_{tj}(m)$ is a subset of the set of the first $m$ integers $\{1, 2, \ldots, m\}$, giving the indices of the cuts to select in $I_{tj}^{ik}$ through the relation
$$S_{tj}^k = \bigcup_{i=1}^{k} \big\{ I_{tj}^{ik}(\ell) : \ell \in \mathcal{S}_{tj}(|I_{tj}^{ik}|) \big\},$$
where $|I_{tj}^{ik}|$ is the cardinality of the set $I_{tj}^{ik}$. We require the selectors to satisfy the following assumption:

(H3) For $t = 2, \ldots, T$, $j = 1, \ldots, M_t$, for every $m \ge 1$, $\mathcal{S}_{tj}(m) \subseteq \mathcal{S}_{tj}(m+1)$.

The Level 1 and Limited Memory Level 1 cut selection strategies, described in Examples 3.1 and 3.2, respectively correspond to the least and most economic selectors satisfying (H3):

Example 3.1 (Level 1 and Territory Algorithm). The strategy $\mathcal{S}_{tj}(m) = \{1, 2, \ldots, m\}$ selects all cuts that have the highest value for at least one trial point. In the context of SDDP, this strategy was called Level 1 in [28] and the Territory Algorithm in [26]. For this strategy, we have $S_{Tj}^k = \{1, \ldots, k\}$ for all $j$ and $k \ge 1$, meaning that no cut selection is needed for the last stage $T$. This comes from the fact that for all $k \ge 1$ and $1 \le k_0 \le k$, cut $C_{Tj}^{k_0}$ is selected because it is one of the cuts with the highest value at $x_{T-1}^{k_0}$.
Indeed, for any $1 \le k_1 \le k$ with $k_1 \ne k_0$, since $\lambda_{Tj}^{k_1}$ is feasible for the dual of problem (6) with $x_{T-1}^k$ replaced by $x_{T-1}^{k_0}$, we get
$$C_{Tj}^{k_0}(x_{T-1}^{k_0}) = Q_T(x_{T-1}^{k_0}, \xi_{Tj}) \ge \langle \lambda_{Tj}^{k_1}, b_{Tj} - B_{Tj} x_{T-1}^{k_0} \rangle = C_{Tj}^{k_1}(x_{T-1}^{k_0}).$$

Example 3.2 (Limited Memory Level 1). The strategy that eliminates the largest number of cuts, called Limited Memory Level 1 (LML 1 for short), consists in taking a singleton for every set $\mathcal{S}_{tj}(m)$. For (H3) to be satisfied, this implies $\mathcal{S}_{tj}(m) = \{1\}$. This choice corresponds to the Limited Memory Level 1 cut selection introduced in [16] in the context of DDP. For that particular choice, at a given point, among the cuts that have the highest value only the oldest one (i.e., the cut that was computed first among the cuts that have the highest value at that point) is selected.

Remark 3.1.
Observe that our selectors select cuts for the functions $Q_t(\cdot, \xi_{tj})$, whereas Level 1 (resp. LML 1) was introduced to select cuts for the functions $\mathcal{Q}_t$ for SDDP (resp. DDP). Therefore we could have used the terminologies Multicut Level 1 and Multicut Limited Memory Level 1 instead of Level 1 and Limited Memory Level 1. We did not do so to simplify; to know to which functions the cut selection strategies apply, it suffices to add the name of the decomposition method, for instance MuDA with Level 1 cut selection (in which case cut selection applies to the functions $Q_t(\cdot, \xi_{tj})$, as explained above) or SDDP with Limited Memory Level 1 cut selection (in which case cut selection applies to the functions $\mathcal{Q}_t$).

Level 1:

    $I_{tj}^k = \{k\}$, $m_{tj}^k = C_{tj}^k(x_{t-1}^k)$.
    For $\ell = 1, \ldots, k-1$
        If $C_{tj}^k(x_{t-1}^\ell) > m_{tj}^\ell$
            $I_{tj}^\ell = \{k\}$, $m_{tj}^\ell = C_{tj}^k(x_{t-1}^\ell)$
        Else if $C_{tj}^k(x_{t-1}^\ell) = m_{tj}^\ell$
            $I_{tj}^\ell = I_{tj}^\ell \cup \{k\}$
        End If
        If $C_{tj}^\ell(x_{t-1}^k) > m_{tj}^k$
            $I_{tj}^k = \{\ell\}$, $m_{tj}^k = C_{tj}^\ell(x_{t-1}^k)$
        Else if $C_{tj}^\ell(x_{t-1}^k) = m_{tj}^k$
            $I_{tj}^k = I_{tj}^k \cup \{\ell\}$
        End If
    End For
    For $\ell = 1, \ldots, k$, Selected[$\ell$] = False End For
    For $\ell = 1, \ldots, k$
        For $m = 1, \ldots, |I_{tj}^\ell|$
            Selected[$I_{tj}^\ell[m]$] = True
        End For
    End For
    $S_{tj}^k = \emptyset$
    For $\ell = 1, \ldots, k$
        If Selected[$\ell$] = True, $S_{tj}^k = S_{tj}^k \cup \{\ell\}$ End If
    End For

Limited Memory Level 1:

    $I_{tj}^k = \{1\}$, $m_{tj}^k = C_{tj}^1(x_{t-1}^k)$.
    For $\ell = 1, \ldots, k-1$
        If $C_{tj}^k(x_{t-1}^\ell) > m_{tj}^\ell$
            $I_{tj}^\ell = \{k\}$, $m_{tj}^\ell = C_{tj}^k(x_{t-1}^\ell)$
        End If
        If $C_{tj}^{\ell+1}(x_{t-1}^k) > m_{tj}^k$
            $I_{tj}^k = \{\ell+1\}$, $m_{tj}^k = C_{tj}^{\ell+1}(x_{t-1}^k)$
        End If
    End For
    For $\ell = 1, \ldots, k$, Selected[$\ell$] = False End For
    For $\ell = 1, \ldots, k$
        For $m = 1, \ldots, |I_{tj}^\ell|$
            Selected[$I_{tj}^\ell[m]$] = True
        End For
    End For
    $S_{tj}^k = \emptyset$
    For $\ell = 1, \ldots, k$
        If Selected[$\ell$] = True, $S_{tj}^k = S_{tj}^k \cup \{\ell\}$ End If
    End For
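The Limited Memory Level 1 pseudo-code above can be transcribed compactly: for each stored trial point, keep only the oldest cut attaining the maximum there. A minimal Python sketch (the function name and data layout are hypothetical, and for simplicity all trial points are reprocessed rather than updated incrementally as in the pseudo-code):

```python
def lml1_select(cuts, trial_points):
    """cuts: list of (theta, beta) pairs in the order they were computed;
    trial_points: list of one-dimensional trial values x. Returns the sorted
    indices kept by Limited Memory Level 1: for each trial point, only the
    oldest cut with the highest value there is retained."""
    selected = set()
    for x in trial_points:
        values = [theta + beta * x for theta, beta in cuts]
        # list.index returns the FIRST (i.e., oldest) index attaining the max
        selected.add(values.index(max(values)))
    return sorted(selected)

# Three cuts in dimension one (illustrative numbers): cuts 0 and 1 coincide.
cuts = [(0.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
print(lml1_select(cuts, trial_points=[2.0, 0.0]))  # [0, 2]: duplicate cut 1 is pruned
```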
Figure 1. Pseudo-codes for the computation of the set $S_{tj}^k$ for fixed $t \in \{2, \ldots, T\}$, $k \ge 1$, $j = 1, \ldots, M_t$, and two cut selection strategies.

The computation of $S_{tj}^k$, i.e., of the cut indices to select at iteration $k$, is performed in the backward pass (immediately after computing cut $C_{tj}^k$) using the pseudo-code given in Figure 1 for the Level 1 and LML 1 cut selection strategies, respectively. In this pseudo-code, we use the notation $I_{tj}^i$ in place of $I_{tj}^{ik}$. We also store in the variable $m_{tj}^i$ the current value of the highest cut for $Q_t(\cdot, \xi_{tj})$ at $x_{t-1}^i$. At the end of the first iteration, we initialize $m_{tj}^1 = C_{tj}^1(x_{t-1}^1)$. After cut $C_{tj}^k$ is computed at iteration $k \ge 2$, these variables are updated using the relations
$$\begin{cases} m_{tj}^i \leftarrow \max(m_{tj}^i, C_{tj}^k(x_{t-1}^i)),\; i = 1, \ldots, k-1, \\ m_{tj}^k \leftarrow \max(C_{tj}^\ell(x_{t-1}^k),\; \ell = 1, \ldots, k). \end{cases}$$
We also use an array of Booleans called Selected, filled using the information given by the variables $I_{tj}^i$, whose $\ell$-th entry is True if cut $\ell$ is selected for $Q_t(\cdot, \xi_{tj})$ and False otherwise. This allows us to avoid copies of cut indices that may appear both in $I_{tj}^{i_1 k}$ and $I_{tj}^{i_2 k}$ with $i_1 \ne i_2$.

4. Convergence analysis
In this section, we prove that CuSMuDA converges in a finite number of iterations. We will make the following assumption:

(H4) The samples in the forward passes are independent: $(\tilde\xi_2^k, \ldots, \tilde\xi_T^k)$ is a realization of $\xi^k = (\xi_2^k, \ldots, \xi_T^k) \sim (\xi_2, \ldots, \xi_T)$ and $\xi^1, \xi^2, \ldots$ are independent.

The convergence proof is based on the following lemma:

Lemma 1. Assume that all subproblems in the forward and backward passes of CuSMuDA are solved using an algorithm that necessarily outputs an extreme point of the feasible set (for instance the simplex algorithm). Let Assumptions (H1), (H2), (H3), and (H4) hold. Then almost surely there exists $k_0 \ge 1$ such that for every $k \ge k_0$, $t = 2, \ldots, T$, $j = 1, \ldots, M_t$, we have
$$(11)\qquad Q_t^k(\cdot, \xi_{tj}) = Q_t^{k_0}(\cdot, \xi_{tj}) \quad \text{and} \quad \mathcal{Q}_t^k = \mathcal{Q}_t^{k_0}.$$

Proof.
Let $\Omega_1$ be the event, on the sample space $\Omega$ of sequences of forward scenarios, such that every scenario is sampled an infinite number of times. By Assumption (H4), this event $\Omega_1$ has probability one. Consider a realization $\omega \in \Omega_1$ of CuSMuDA corresponding to realizations $(\tilde\xi^k)_k$ of the sampled scenarios $(\xi^k)_k$ in the forward pass. To simplify, we will drop $\omega$ from the notation: for instance, we will simply write $x_t^k, Q_t^k$ for the realizations $x_t^k(\omega)$ and $Q_t^k(\cdot)(\omega)$ of $x_t^k, Q_t^k$ given the realization $\omega \in \Omega_1$ of CuSMuDA.

We show by induction on $t$ that the number of different cuts computed by the algorithm is finite and that after some iteration $k_t$ the same cuts are selected for the functions $Q_t(\cdot, \xi_{tj})$. Our induction hypothesis $\mathcal{H}(t)$ for $t \in \{2, \ldots, T\}$ is that the sets $\{(\theta_{tj}^k, \beta_{tj}^k) : k \in \mathbb{N}\}$, $j = 1, \ldots, M_t$, are finite and there exists some finite $k_t$ such that for every $k > k_t$ we have
$$(12)\qquad \{(\theta_{tj}^\ell, \beta_{tj}^\ell) : \ell \in S_{tj}^k\} = \{(\theta_{tj}^\ell, \beta_{tj}^\ell) : \ell \in S_{tj}^{k_t}\}$$
and $x_{t-1}^k$ is a copy of a trial point $x_{t-1}^\ell$ with $\ell \le k_t$, for every $j = 1, \ldots, M_t$. We will denote by $\overline{I}_{tj}^{ik}$ the set $\{I_{tj}^{ik}(\ell) : \ell \in \mathcal{S}_{tj}(|I_{tj}^{ik}|)\}$. We first show, in items a. and b. below, that $\mathcal{H}(T)$ holds.

a. Observe that $\lambda_{Tj}^k$ defined in the backward pass of CuSMuDA is an extreme point of the polyhedron $\{\lambda : A_{Tj}^\top \lambda \le c_{Tj}\}$. This polyhedron is a finite intersection of closed half-spaces in finite dimension (since $A_{Tj}$ has a finite number of rows and columns) and therefore has a finite number of extreme points. It follows that $\lambda_{Tj}^k$ can only take a finite number of values, and the same holds for $(\theta_{Tj}^k, \beta_{Tj}^k) = (\langle \lambda_{Tj}^k, b_{Tj} \rangle, -B_{Tj}^\top \lambda_{Tj}^k)$; hence there exists $\bar k_T$ such that for every $k > \bar k_T$ and every $j$, each cut $C_{Tj}^k$ is a copy of a cut $C_{Tj}^{k'}$ with $1 \le k' \le \bar k_T$ (no new cut is computed for the functions $Q_T(\cdot, \xi_{Tj})$ for $k > \bar k_T$).

Now recall that $x_{T-1}^k$ computed in the forward pass is a solution of (4) with $t = T-1$, rewritten as a linear program by adding variables $f_1, f_2, \ldots, f_{M_T}$, replacing $\mathcal{Q}_T^{k-1}(x_{T-1})$ in the objective by $\sum_{\ell=1}^{M_T} p_{T\ell} f_\ell$, and adding the linear constraints $f_\ell \ge \theta_{T\ell}^i + \langle \beta_{T\ell}^i, x_{T-1} \rangle$, $i = 1, \ldots, k-1$, $\ell = 1, \ldots, M_T$. On top of that, for iterations $k > \bar k_T$, since the functions $Q_T^k(\cdot, \xi_{Tj})$ are made of a collection of cuts taken from the finite and fixed set of cuts $C_{Tj}^\ell$, $\ell \le \bar k_T$, the set of possible functions $(Q_T^k(\cdot, \xi_{Tj}))_{k \ge 1}$, and therefore of possible functions $(\mathcal{Q}_T^k)_{k \ge 1}$, is finite. It follows that there is a finite set of possible polyhedra for the feasible set of (4) (with $t = T-1$) rewritten as a linear program as we have just explained. Since these polyhedra have a finite number of extreme points (recall that there is a finite number of different linear constraints), there is only a finite number of possible trial points $(x_{T-1}^k)_{k \ge 1}$. Therefore we can assume without loss of generality that $\bar k_T$ is such that for iterations $k > \bar k_T$, every trial point $x_{T-1}^k$ is also a copy of a trial point $x_{T-1}^{k'}$ with $k' \le \bar k_T$.

b. We show that for every $i \ge 1$ and $j = 1, \ldots, M_T$, there exist $1 \le i' \le \bar k_T$ and $k_{Tj}(i') \ge \bar k_T$ such that for every $k \ge \max(i, k_{Tj}(i'))$ we have
$$(13)\qquad \{(\theta_{Tj}^\ell, \beta_{Tj}^\ell) : \ell \in \overline{I}_{Tj}^{ik}\} = \{(\theta_{Tj}^\ell, \beta_{Tj}^\ell) : \ell \in \overline{I}_{Tj}^{i'\,k_{Tj}(i')}\},$$
which will show $\mathcal{H}(T)$ with $k_T = \max_{1 \le j \le M_T,\, 1 \le i' \le \bar k_T} k_{Tj}(i')$. Let us show that (13) indeed holds. Take $i \ge 1$ and $j \in \{1, \ldots, M_T\}$. If $1 \le i \le \bar k_T$, define $i' = i$. Otherwise, due to a., we can find $1 \le i' \le \bar k_T$ such that $x_{T-1}^i = x_{T-1}^{i'}$, which implies $I_{Tj}^{ik} = I_{Tj}^{i'k}$ for every $k \ge i$. Now consider the sequence of sets $(I_{Tj}^{i'k})_{k \ge \bar k_T}$. Due to the definition of $\bar k_T$, the sequence $(|I_{Tj}^{i'k}|)_{k > \bar k_T}$ is nondecreasing, and therefore two cases can happen:

(A) There exists $k_{Tj}(i')$ such that for $k \ge k_{Tj}(i')$ we have $I_{Tj}^{i'k} = I_{Tj}^{i'\,k_{Tj}(i')}$ (the cuts computed after iteration $k_{Tj}(i')$ are not active at $x_{T-1}^{i'}$). In this case $\overline{I}_{Tj}^{i'k} = \overline{I}_{Tj}^{i'\,k_{Tj}(i')}$ for $k \ge k_{Tj}(i')$, $\overline{I}_{Tj}^{ik} = \overline{I}_{Tj}^{i'k} = \overline{I}_{Tj}^{i'\,k_{Tj}(i')}$ for $k \ge \max(k_{Tj}(i'), i)$, and (13) holds.

(B) The sequence $(|I_{Tj}^{i'k}|)_{k \ge \bar k_T}$ is unbounded. Due to Assumption (H3), the sequence $(|\mathcal{S}_{Tj}(m)|)_m$ is nondecreasing. If there exists $k_{Tj} > \bar k_T$ such that $\mathcal{S}_{Tj}(k) = \mathcal{S}_{Tj}(k_{Tj})$ for $k \ge k_{Tj}$, then, taking $k_{Tj}(i')$ to be the smallest $k$ such that $|I_{Tj}^{i'k}| \ge k_{Tj}$, for every $k \ge k_{Tj}(i')$ we have $\overline{I}_{Tj}^{i'k} = \overline{I}_{Tj}^{i'\,k_{Tj}(i')}$, and for $k \ge \max(k_{Tj}(i'), i)$ we deduce $\overline{I}_{Tj}^{ik} = \overline{I}_{Tj}^{i'\,k_{Tj}(i')}$ and (13) holds. Otherwise the sequence $(|\mathcal{S}_{Tj}(m)|)_m$ is unbounded and an infinite number of cut indices are selected from the sets $(I_{Tj}^{i'k})_{k \ge i'}$ to make up the sets $(\overline{I}_{Tj}^{i'k})_{k \ge i'}$. However, since there is a finite number of different cuts, there is only a finite number of iterations where a new cut can be selected from $I_{Tj}^{i'k}$, and therefore there exists $k_{Tj}(i')$ such that (13) holds for $k \ge \max(k_{Tj}(i'), i)$.

c. Now assume that $\mathcal{H}(t+1)$ holds for some $t \in \{2, \ldots, T-1\}$. We want to show $\mathcal{H}(t)$. Consider the set $D_{tj}^k$ of points of the form
$$(14)\qquad \Big( \langle \lambda, b_{tj} \rangle + \sum_{\ell=1}^{M_{t+1}} \sum_{i \in S_{t+1\,\ell}^k} \mu_\ell^i \theta_{t+1\,\ell}^i,\; -B_{tj}^\top \lambda \Big)$$
where $(\lambda, \mu)$ is an extreme point of the set $P_{tj}^k$ of points $(\lambda, \mu)$ satisfying
$$(15)\qquad \mu \ge 0,\quad p_{t+1\,\ell} = \sum_{i \in S_{t+1\,\ell}^k} \mu_\ell^i,\; \ell = 1, \ldots, M_{t+1},\quad A_{tj}^\top \lambda + \sum_{\ell=1}^{M_{t+1}} \sum_{i \in S_{t+1\,\ell}^k} \mu_\ell^i \beta_{t+1\,\ell}^i \le c_{tj}.$$
We claim that for every $k > k_{t+1}$, every point from $D_{tj}^k$ can be written as a point from $D_{tj}^{k_{t+1}}$, i.e., a point of the form (14) with $k$ replaced by $k_{t+1}$ and $(\lambda, \mu)$ an extreme point of the set $P_{tj}^{k_{t+1}}$. Indeed, take a point from $D_{tj}^k$, i.e., a point of the form (14) with $(\lambda, \mu)$ an extreme point of $P_{tj}^k$ and $k > k_{t+1}$. It can be written as a point from $D_{tj}^{k_{t+1}}$, i.e., a point of the form (14) with $k$ replaced by $k_{t+1}$ and $(\lambda, \hat\mu)$ in place of $(\lambda, \mu)$, where $(\lambda, \hat\mu)$ is an extreme point of $P_{tj}^{k_{t+1}}$ obtained by replacing the basic columns $\beta_{t+1\,\ell}^i$ with $i \in S_{t+1\,\ell}^k$, $i \notin S_{t+1\,\ell}^{k_{t+1}}$, associated with $\mu$, by columns $\beta_{t+1\,\ell}^{i'}$ with $i' \in S_{t+1\,\ell}^{k_{t+1}}$ such that $(\theta_{t+1\,\ell}^i, \beta_{t+1\,\ell}^i) = (\theta_{t+1\,\ell}^{i'}, \beta_{t+1\,\ell}^{i'})$ (this is possible due to $\mathcal{H}(t+1)$).

Since $P_{tj}^{k_{t+1}}$ has a finite number of extreme points, the set $D_{tj}^k$ has finite cardinality, and recalling that for CuSMuDA $(\theta_{tj}^k, \beta_{tj}^k) \in D_{tj}^k$, the cut coefficients $(\theta_{tj}^k, \beta_{tj}^k)$ can only take a finite number of values. This shows the first part of $\mathcal{H}(t)$. Therefore, there exists $\bar k_t$ such that for every $k > \bar k_t$ and every $j$, each cut $C_{tj}^k$ is a copy of a cut $C_{tj}^{k'}$ with $1 \le k' \le \bar k_t$ (no new cut is computed for the functions $Q_t(\cdot, \xi_{tj})$ for $k > \bar k_t$). As for the induction step $t = T$, this clearly implies that after some iteration no new trial points are computed, and therefore we can assume without loss of generality that $\bar k_t$ is such that for $k > \bar k_t$, trial point $x_{t-1}^k$ is a copy of some $x_{t-1}^\ell$ with $1 \le \ell \le \bar k_t$.

Finally, we can show (12) proceeding as in b., replacing $T$ by $t$. This achieves the proof of $\mathcal{H}(t)$. Gathering our observations, we have shown that (11) holds with $k_0 = \max_{t=2,\ldots,T} k_t$. $\square$

Remark 4.1.
For the Level 1 and LML 1 cut selection strategies, corresponding to selectors $\mathcal{S}_{tj}$ satisfying respectively $\mathcal{S}_{tj}(m) = \{1, \ldots, m\}$ and $\mathcal{S}_{tj}(m) = \{1\}$, the integers $k_0, \bar k_t$ defined in Lemma 1 and its proof satisfy $k_0 \le \max_{t=2,\ldots,T} \bar k_t$. For other selectors $\mathcal{S}_{tj}$ this relation is not necessarily satisfied (see b.-(B) of the proof of the lemma).

Theorem 1.
Let Assumptions (H1), (H2), (H3), and (H4) hold. Assume that all subproblems in the forward and backward passes of CuSMuDA are solved using an algorithm that necessarily outputs an extreme point of the feasible set (for instance the simplex algorithm). Then Algorithm CuSMuDA converges with probability one in a finite number of iterations to a policy which is an optimal solution of (1)-(2).

Proof. Let $\Omega_1$ be defined as in the proof of Lemma 1 and let $\Omega_2$ be the event on which the integer $k_0$ defined in Lemma 1 is finite. Note that $\Omega_1 \cap \Omega_2$ has probability 1. Consider a realization of CuSMuDA in $\Omega_1 \cap \Omega_2$ corresponding to realizations $(\tilde{\xi}_T^k)_k$ of $(\xi_T^k)_k$ in the forward pass, and let $(x_1^*, x_2^*(\cdot), \ldots, x_T^*(\cdot))$ be the policy obtained from iteration $k_0$ on, which uses the recourse functions $\mathcal{Q}_{t+1}^{k_0}$ instead of $\mathcal{Q}_{t+1}$. Recall that the policy $(x_1^*, x_2^*(\cdot), \ldots, x_T^*(\cdot))$ is optimal if for every realization $\tilde{\xi}_T := (\xi_1, \tilde{\xi}_2, \ldots, \tilde{\xi}_T)$ of $(\xi_1, \xi_2, \ldots, \xi_T)$, the decision $x_t^*(\xi_1, \tilde{\xi}_2, \ldots, \tilde{\xi}_t)$ solves

(16)  $Q_t(x_{t-1}^*(\tilde{\xi}_{t-1}), \tilde{\xi}_t) = \inf_{x_t} \{\langle \tilde{c}_t, x_t \rangle + \mathcal{Q}_{t+1}(x_t) : x_t \in X_t(x_{t-1}^*(\tilde{\xi}_{t-1}), \tilde{\xi}_t)\}$

for every $t = 1, \ldots, T$, with the convention that $x_0^* = x_0$. We prove, for $t = 1, \ldots, T$:

$\mathcal{H}(t)$: for every $k \ge k_0$ and every sample $\tilde{\xi}_t = (\xi_1, \tilde{\xi}_2, \ldots, \tilde{\xi}_t)$ of $(\xi_1, \xi_2, \ldots, \xi_t)$, we have $Q_t^k(x_{t-1}^*(\tilde{\xi}_{t-1}), \tilde{\xi}_t) = Q_t(x_{t-1}^*(\tilde{\xi}_{t-1}), \tilde{\xi}_t)$.

We show $\mathcal{H}(1), \ldots, \mathcal{H}(T)$ by backward induction. $\mathcal{H}(T)$ holds since $Q_T^k = Q_T$ for every $k$. Now assume that $\mathcal{H}(t+1)$ holds for some $t \in \{1, \ldots, T-1\}$; we want to show $\mathcal{H}(t)$. Take an arbitrary $k \ge k_0$ and a sample $\tilde{\xi}_{t-1} = (\xi_1, \tilde{\xi}_2, \ldots, \tilde{\xi}_{t-1})$ of $(\xi_1, \ldots, \xi_{t-1})$. We have, for every $j = 1, \ldots, M_t$, that

(17)  $Q_t^k(x_{t-1}^*(\tilde{\xi}_{t-1}), \xi_{tj}) = \inf\{\langle c_{tj}, x_t \rangle + \mathcal{Q}_{t+1}^k(x_t) : x_t \in X_t(x_{t-1}^*(\tilde{\xi}_{t-1}), \xi_{tj})\} = \langle c_{tj}, x_t^*(\tilde{\xi}_{t-1}, \xi_{tj}) \rangle + \mathcal{Q}_{t+1}^k(x_t^*(\tilde{\xi}_{t-1}, \xi_{tj}))$.

Now we check that for $j = 1, \ldots, M_t$ we have

(18)  $\mathcal{Q}_{t+1}^k(x_t^*(\tilde{\xi}_{t-1}, \xi_{tj})) = \mathcal{Q}_{t+1}(x_t^*(\tilde{\xi}_{t-1}, \xi_{tj}))$.

Indeed, if this relation did not hold then, since $\mathcal{Q}_{t+1}^k \le \mathcal{Q}_{t+1}$, we would have

$\mathcal{Q}_{t+1}^{k_0}(x_t^*(\tilde{\xi}_{t-1}, \xi_{tj})) = \mathcal{Q}_{t+1}^{k}(x_t^*(\tilde{\xi}_{t-1}, \xi_{tj})) < \mathcal{Q}_{t+1}(x_t^*(\tilde{\xi}_{t-1}, \xi_{tj}))$.

From the definitions of $\mathcal{Q}_{t+1}^{k_0}$ and $\mathcal{Q}_{t+1}$, there exists $m \in \{1, \ldots, M_{t+1}\}$ such that

$Q_{t+1}^{k_0}(x_t^*(\tilde{\xi}_{t-1}, \xi_{tj}), \xi_{t+1\,m}) < Q_{t+1}(x_t^*(\tilde{\xi}_{t-1}, \xi_{tj}), \xi_{t+1\,m})$.

Since the realization of CuSMuDA is in $\Omega_1$, there exists an infinite set of iterations at which the sampled scenario for stages $1, \ldots, t$ is $(\tilde{\xi}_{t-1}, \xi_{tj})$. Let $\ell$ be one of these iterations strictly greater than $k_0$. Using $\mathcal{H}(t+1)$, we have that $Q_{t+1}^{\ell}(x_t^*(\tilde{\xi}_{t-1}, \xi_{tj}), \xi_{t+1\,m}) = Q_{t+1}(x_t^*(\tilde{\xi}_{t-1}, \xi_{tj}), \xi_{t+1\,m})$, which yields

$Q_{t+1}^{k_0}(x_t^*(\tilde{\xi}_{t-1}, \xi_{tj}), \xi_{t+1\,m}) < Q_{t+1}^{\ell}(x_t^*(\tilde{\xi}_{t-1}, \xi_{tj}), \xi_{t+1\,m})$,

and at iteration $\ell > k_0$ we would construct a cut for $Q_{t+1}(\cdot, \xi_{t+1\,m})$ at $x_t^*(\tilde{\xi}_{t-1}, \xi_{tj}) = x_t^{\ell}$ with value $Q_{t+1}^{\ell}(x_t^*(\tilde{\xi}_{t-1}, \xi_{tj}), \xi_{t+1\,m})$ strictly larger than the value at this point of all cuts computed up to iteration $k_0$. Due to Lemma 1, this is not possible. Therefore (18) holds, which, plugged into (17), gives

$Q_t^k(x_{t-1}^*(\tilde{\xi}_{t-1}), \xi_{tj}) = \langle c_{tj}, x_t^*(\tilde{\xi}_{t-1}, \xi_{tj}) \rangle + \mathcal{Q}_{t+1}(x_t^*(\tilde{\xi}_{t-1}, \xi_{tj})) \ge Q_t(x_{t-1}^*(\tilde{\xi}_{t-1}), \xi_{tj})$

(recall that $x_t^*(\tilde{\xi}_{t-1}, \xi_{tj}) \in X_t(x_{t-1}^*(\tilde{\xi}_{t-1}), \xi_{tj})$). Since $Q_t^k \le Q_t$, we have shown $Q_t^k(x_{t-1}^*(\tilde{\xi}_{t-1}), \xi_{tj}) = Q_t(x_{t-1}^*(\tilde{\xi}_{t-1}), \xi_{tj})$ for every $j = 1, \ldots, M_t$, which is $\mathcal{H}(t)$.

Therefore we have proved that for $t = 1, \ldots, T$ and every realization $(\tilde{\xi}_{t-1}, \xi_{tj})$ of $\xi_t$, the decision $x_t^*(\tilde{\xi}_{t-1}, \xi_{tj})$ satisfies $\langle c_{tj}, x_t^*(\tilde{\xi}_{t-1}, \xi_{tj}) \rangle + \mathcal{Q}_{t+1}(x_t^*(\tilde{\xi}_{t-1}, \xi_{tj})) = Q_t(x_{t-1}^*(\tilde{\xi}_{t-1}), \xi_{tj})$, meaning that $x_t^*(\tilde{\xi}_{t-1}, \xi_{tj})$ is an optimal solution of (16) written with $\tilde{\xi}_t = (\tilde{\xi}_{t-1}, \xi_{tj})$; this completes the proof. □

Remark 4.2.
The convergence proof above also shows the almost sure convergence in a finite number of iterations of MuDA combined with any cut selection strategy that always selects more cuts than a cut selection strategy satisfying Assumption (H3). It shows in particular the convergence of the Level H cut selection from [28], which keeps the H cuts having the largest values at each trial point. Remark 4.3.
The class of cut selection strategies described in Section 3.2 can be straightforwardly combined with SDDP. The convergence proof of the corresponding variant of SDDP applied to the DP equations (1)-(2) can be obtained by easily adapting the proofs of Lemma 1 and Theorem 1.

5. Application to portfolio selection and inventory management
5.1. Portfolio selection.

5.1.1. Model.
We consider a portfolio selection problem with direct transaction costs over a discretized horizon of $T$ stages. The direct buying and selling transaction costs are proportional to the amount of the transaction ([3], [4]). Let $x_t(i)$ be the dollar value of asset $i = 1, \ldots, n+1$ at the end of stage $t = 1, \ldots, T$, where asset $n+1$ is cash; $\xi_t(i)$ is the return of asset $i$ at $t$; $y_t(i)$ is the amount of asset $i$ sold at the end of $t$; $z_t(i)$ is the amount of asset $i$ bought at the end of $t$; and $\eta_t(i) > 0$ and $\nu_t(i) > 0$ are the respective proportional selling and buying transaction cost rates at $t$. Each component $x_0(i)$, $i = 1, \ldots, n+1$, of the initial portfolio $x_0$ is known. The budget available at the beginning of the investment period is $\sum_{i=1}^{n+1} \xi_1(i) x_0(i)$, and $u(i)$ represents the maximal proportion of money that can be invested in asset $i$.

For $t = 1, \ldots, T$, given a portfolio $x_{t-1} = (x_{t-1}(1), \ldots, x_{t-1}(n), x_{t-1}(n+1))$ and $\xi_t$, we define the set $X_t(x_{t-1}, \xi_t)$ as the set of $(x_t, y_t, z_t) \in \mathbb{R}^{n+1} \times \mathbb{R}^n \times \mathbb{R}^n$ satisfying

(19)  $x_t(n+1) = \xi_t(n+1) x_{t-1}(n+1) + \sum_{i=1}^{n} \big((1 - \eta_t(i)) y_t(i) - (1 + \nu_t(i)) z_t(i)\big)$,

and, for $i = 1, \ldots, n$,

(20a)  $x_t(i) = \xi_t(i) x_{t-1}(i) - y_t(i) + z_t(i)$,
(20b)  $x_t(i) \le u(i) \sum_{j=1}^{n+1} \xi_t(j) x_{t-1}(j)$,
(20c)  $x_t(i) \ge 0, \; y_t(i) \ge 0, \; z_t(i) \ge 0$.

Constraints (19) are the cash flow balance constraints and define how much cash is available at each stage. Constraints (20a) define the amount of security $i$ held at each stage $t$ and take into account the proportional transaction costs. Constraints (20b) prevent the position in security $i$ at time $t$ from exceeding a proportion $u(i)$ of the portfolio value.
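As a concrete illustration, the following sketch builds one stage of this feasible set and maximizes a linear objective (expected end-of-stage wealth) over it with `scipy.optimize.linprog`. All numerical values below (returns, cost rates, expected returns) are toy data assumed for the example, not taken from the paper's instances:

```python
import numpy as np
from scipy.optimize import linprog

# Toy data (assumed): n = 2 risky assets plus cash (asset n+1).
n = 2
x_prev = np.array([1.0, 1.0, 1.0])   # x_{t-1}: dollar values (asset 1, asset 2, cash)
xi = np.array([1.1, 0.9, 1.0])       # xi_t: gross returns at stage t
eta = np.array([0.01, 0.01])         # proportional selling costs eta_t(i)
nu = np.array([0.01, 0.01])          # proportional buying costs nu_t(i)
u = np.array([1.0, 1.0])             # maximal proportions u(i)
mu = np.array([1.05, 1.00, 1.001])   # assumed expected next-stage returns

budget = xi @ x_prev                 # portfolio value after returns, cf. (20b)

# Decision vector: (x_t(1), x_t(2), x_t(3)=cash, y_t(1), y_t(2), z_t(1), z_t(2)).
# Row 0 encodes the cash balance (19); rows 1..n encode the asset balances (20a).
A_eq = np.zeros((1 + n, 3 + 2 * n))
b_eq = np.zeros(1 + n)
A_eq[0, 2] = 1.0
A_eq[0, 3:5] = -(1.0 - eta)          # -(1 - eta_t(i)) y_t(i)
A_eq[0, 5:7] = (1.0 + nu)            # +(1 + nu_t(i)) z_t(i)
b_eq[0] = xi[2] * x_prev[2]
for i in range(n):                   # x_t(i) + y_t(i) - z_t(i) = xi_t(i) x_{t-1}(i)
    A_eq[1 + i, i] = 1.0
    A_eq[1 + i, 3 + i] = 1.0
    A_eq[1 + i, 5 + i] = -1.0
    b_eq[1 + i] = xi[i] * x_prev[i]

# (20b): x_t(i) <= u(i) * budget; (20c) holds via linprog's default bounds >= 0.
A_ub = np.zeros((n, 3 + 2 * n))
A_ub[0, 0] = 1.0
A_ub[1, 1] = 1.0
b_ub = u * budget

# Maximize <mu, x_t>, i.e., minimize -<mu, x_t>.
c = np.concatenate([-mu, np.zeros(2 * n)])
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, method="highs")
print(-res.fun)  # expected terminal wealth of the optimal rebalancing
```

In this toy instance the optimum sells asset 2 and converts all cash into asset 1, whose expected return net of the round-trip transaction costs dominates both alternatives.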
Constraints (20c) prevent short-selling and enforce the non-negativity of the amounts bought and sold.

With this notation, the dynamic programming equations of a risk-neutral portfolio model can be written as follows: for $t = T$, setting $\mathcal{Q}_{T+1}(x_T) = \mathbb{E}\big[\sum_{i=1}^{n+1} \xi_{T+1}(i) x_T(i)\big]$, we solve the problem

(21)  $Q_T(x_{T-1}, \xi_T) = \sup\{\mathcal{Q}_{T+1}(x_T) : (x_T, y_T, z_T) \in X_T(x_{T-1}, \xi_T)\}$,

while at stage $t = T-1, \ldots, 1$ we solve

(22)  $Q_t(x_{t-1}, \xi_t) = \sup\{\mathcal{Q}_{t+1}(x_t) : (x_t, y_t, z_t) \in X_t(x_{t-1}, \xi_t)\}$,

where for $t = 2, \ldots, T$, $\mathcal{Q}_t(x_{t-1}) = \mathbb{E}[Q_t(x_{t-1}, \xi_t)]$. With this model, we maximize the expected return of the portfolio taking into account the transaction costs, non-negativity constraints, and bounds imposed on the different securities.

5.1.2. CuSMuDA for portfolio selection.
We assume that the return process $(\xi_t)$ satisfies Assumption (H1). In this setting, we can solve the portfolio problem under consideration using MuDA and CuSMuDA. For the sake of completeness, we show how to apply MuDA to this problem, including the stopping criterion (CuSMuDA follows, incorporating one of the pseudo-codes from Figure 1). In this implementation, $N$ independent scenarios $\tilde{\xi}^k$, $k = (i-1)N+1, \ldots, iN$, of $(\xi_1, \xi_2, \ldots, \xi_T)$ are sampled in the forward pass of iteration $i$ to obtain $N$ sets of trial points (note that the convergence proof of Theorem 1 still applies for this variant of CuSMuDA). At the end of iteration $i$, we end up with approximate functions $Q_t^i(x_{t-1}, \xi_{tj}) = \max_{1 \le \ell \le iN} \langle \beta_{tj}^{\ell}, x_{t-1} \rangle$ of $Q_t(\cdot, \xi_{tj})$ (observe that for this problem the cuts have no intercept). This portfolio problem was solved using SDDP and SREDA (Stochastic REgularized Decomposition Algorithm) in [19]. It is possible (at the expense of computational time) to incorporate stagewise dependent returns within the decomposition algorithms under consideration, for instance by including in the state vector the relevant history of the returns, as in [21, 14].

Forward pass of iteration $i$. We generate $N$ scenarios $\tilde{\xi}^k = (\tilde{\xi}_1^k, \tilde{\xi}_2^k, \ldots, \tilde{\xi}_T^k)$, $k = (i-1)N+1, \ldots, iN$, of $(\xi_1, \ldots, \xi_T)$ and solve, for $k = (i-1)N+1, \ldots, iN$ and $t = 1, \ldots, T-1$, the problem

$\inf_{x_t, f} \Big\{ \sum_{\ell=1}^{M_{t+1}} p_{t+1\,\ell}\, f_\ell : x_t \in X_t(x_{t-1}^k, \tilde{\xi}_t^k), \; f_\ell \ge \langle \beta_{t+1\,\ell}^{m}, x_t \rangle, \; m = 1, \ldots, (i-1)N, \; \ell = 1, \ldots, M_{t+1} \Big\}$,

starting from $(x_0^k, \tilde{\xi}_1^k) = (x_0, \xi_1)$. Let $x_t^k$ be an optimal solution.

We then sample $S$ independent scenarios of returns and simulate the policy obtained at the end of the forward pass on these scenarios, with corresponding decisions $(\bar{x}_1^k, \ldots, \bar{x}_T^k)$ on scenario $k = 1, \ldots, S$. We compute the empirical mean $\mathrm{Cost}_i$ and standard deviation $\sigma_i$ of the cost on these sampled scenarios:

(23)  $\mathrm{Cost}_i = -\frac{1}{S} \sum_{k=1}^{S} \langle \mathbb{E}[\xi_{T+1}], \bar{x}_T^k \rangle, \qquad \sigma_i = \sqrt{\frac{1}{S} \sum_{k=1}^{S} \big({-\langle \mathbb{E}[\xi_{T+1}], \bar{x}_T^k \rangle} - \mathrm{Cost}_i\big)^2}$.

This allows us to compute the upper end $z^i_{\sup}$ of a one-sided confidence interval on the mean cost of the policy obtained at iteration $i$, given by

(24)  $z^i_{\sup} = \mathrm{Cost}_i + \frac{\sigma_i}{\sqrt{S}} \Phi^{-1}(1-\alpha)$,

where $\Phi^{-1}(1-\alpha)$ is the $(1-\alpha)$-quantile of the standard Gaussian distribution.

Backward pass of iteration $i$. For $k = (i-1)N+1, \ldots, iN$, we solve for $t = T$ and $j = 1, \ldots, M_T$

(25)  $\inf\{-\langle x_T, \mathbb{E}[\xi_{T+1}] \rangle : x_T \in X_T(x_{T-1}^k, \xi_{Tj})\}$,

and for $t = T-1, \ldots, 2$ and $j = 1, \ldots, M_t$

(26)  $\inf_{x_t, f} \Big\{ \sum_{\ell=1}^{M_{t+1}} p_{t+1\,\ell}\, f_\ell : x_t \in X_t(x_{t-1}^k, \xi_{tj}), \; f_\ell \ge \langle \beta_{t+1\,\ell}^{m}, x_t \rangle, \; m = 1, \ldots, iN, \; \ell = 1, \ldots, M_{t+1} \Big\}$.

For the stage $t$ problem above with realization $\xi_{tj}$ of $\xi_t$ (problem (25) for $t = T$ and (26) for $t < T$), let $\lambda_{tj}^k$ be the optimal Lagrange multipliers associated to the equality constraints and let $\mu_{tj}^k \ge 0$ be the optimal Lagrange multipliers associated to the constraints $x_t(i) \le u(i) \sum_{\ell=1}^{n+1} \xi_{tj}(\ell) x_{t-1}(\ell)$.
We compute $\beta_{tj}^k = \big(\lambda_{tj}^k - \langle u, \mu_{tj}^k \rangle e\big) \circ \xi_{tj}$, where $e$ is a vector of ones in $\mathbb{R}^{n+1}$ and where, for vectors $x, y$, the vector $x \circ y$ has components $(x \circ y)(i) = x(i) y(i)$.

Stopping criterion (see [32]). At the end of the backward pass of iteration $i$, we solve

$\inf_{x_1, f} \Big\{ \sum_{\ell=1}^{M_2} p_{2\ell}\, f_\ell : x_1 \in X_1(x_0, \xi_1), \; f_\ell \ge \langle \beta_{2\ell}^{m}, x_1 \rangle, \; m = 1, \ldots, iN, \; \ell = 1, \ldots, M_2 \Big\}$,

whose optimal value provides a lower bound $z^i_{\inf}$ on the optimal value $\mathcal{Q}_1(x_0)$ of the problem. (We use minimization instead of maximization subproblems; in this context, the optimal mean income is the opposite of the optimal value of the first stage problem.) Given a tolerance $\varepsilon > 0$, the algorithm stops either when $z^i_{\inf} = 0$ and $z^i_{\sup} \le \varepsilon$ or when

(27)  $\big| z^i_{\sup} - z^i_{\inf} \big| \le \varepsilon \max(1, |z^i_{\sup}|)$.

In the expression above, we use $\varepsilon \max(1, |z^i_{\sup}|)$ instead of $\varepsilon |z^i_{\sup}|$ in the right-hand side to account for the case $z^i_{\sup} = 0$.

5.1.3. Numerical results.
Problem data. We compare six methods to solve the portfolio problem presented in Section 5.1.1: MuDA with sampling as just described (denoted by MuDA for short), SDDP, CuSMuDA and SDDP with Level 1 cut selection (denoted by CuSMuDA CS 1 and SDDP CS 1, respectively), and CuSMuDA and SDDP with Limited Memory Level 1 cut selection (denoted by CuSMuDA CS 2 and SDDP CS 2). The implementation was done in Matlab and run on a laptop with an Intel(R) Core(TM) i7-4510U CPU @ 2.00GHz. All subproblems in the forward and backward passes were solved numerically using the Mosek Optimization Toolbox [1].

We fix $u(i) = 1$, $i = 1, \ldots, n$, while $x_0$ has components uniformly distributed in $[0, \,\cdot\,]$, with confidence level $\alpha$ in (24) and $\varepsilon = 0.1$ in (27), and we test two values of $N$, namely $N = 1$ and $N = 200$. Below, we generate various instances of the portfolio problem as follows. For fixed $T$ (the number of stages, days in our experiments) and $n$ (the number of risky assets), the distributions of $\xi_t(1{:}n)$ have $M_t = M$ realizations with $p_{ti} = \mathbb{P}(\xi_t = \xi_{ti}) = 1/M$, and $\xi_1(1{:}n), \xi_{t1}(1{:}n), \ldots, \xi_{tM}(1{:}n)$ are chosen randomly among historical data of daily returns of $n$ of the assets of the S&P 500 index for the period 18/5/2009-28/5/2015. These $n$ assets correspond to the first $n$ stocks listed in our matrix of stock prices downloaded from Wharton Research Data Services (WRDS: https://wrds-web.wharton.upenn.edu/wrds/ ). The daily return $\xi_t(n+1)$ of the risk-free asset is 0.1% for all $t$. Transaction costs are assumed to be known, with $\nu_t(i) = \eta_t(i)$ obtained by sampling from the distribution of the random variable $0.08 + 0.06 \cos(\pi U_T / T)$, where $U_T$ is a random variable with a discrete distribution over the set of integers $\{1, 2, \ldots, T\}$.

Results. The computational time and number of iterations required for solving 22 instances of the portfolio problem for several values of the parameters $(M, T, n, N, S)$ are given in Tables 1 and 2. On all instances except Instance 14 (with $M = 2$, $T = 6$, $n = 10$), MuDA is much slower than SDDP, i.e., it needs much more time to satisfy the stopping criterion. On most instances, both variants with cut selection are quicker than their counterpart without cut selection (on 17 instances out of 22 for SDDP and 12 out of 22 for MuDA), and LML 1 is more efficient than Level 1 (on 18 out of 22 instances for SDDP and 19 out of 22 instances for MuDA). More precisely, both for SDDP and MuDA, we observe three different patterns:

(P1) Instances where the variants with cut selection require comparable computational bulk and are quicker than their counterpart without cut selection. There are 9 of these instances for SDDP (Instances 1, 4, 7, 8, 9, 13, 15, 16, and 17) and 8 of these instances for MuDA (Instances 1, 2, 4, 6, 13, 14, 15, and 16).

(P2) Instances where one variant with cut selection (in general LML 1) is much quicker than the other one, both of them being quicker than their counterpart without cut selection. There are 8 of these instances for SDDP (Instances 2, 5, 6, 10, 11, 12, 14, and 18) and 4 of these instances for MuDA (Instances 8, 9, 10, and 11).

(P3) Instances where at least one variant with cut selection (in general Level 1) is much slower than its counterpart without cut selection. There are 5 of these instances for SDDP (Instances 3, 19, 20, 21, and 22) and 10 of these instances for MuDA (Instances 3, 5, 7, 12, 17, 18, 19, 20, 21, and 22).

To understand the impact of the cut selection strategies on the computational time, it is useful to analyze the number of iterations and the proportion of cuts selected along the iterations of the algorithm by the variants with cut selection in cases (P1), (P2), and (P3) described above. As examples of instances of type (P1), consider Instance 1 for SDDP and MuDA and Instance 2 for MuDA. (We considered in particular instances where $M$ is less than the number $2n+1$ of constraints of the subproblems, to give MuDA a chance; according to [8], cases where MuDA may be competitive with single cut SDDP satisfy this requirement.)
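The two cut selection rules compared in these tables can be sketched as follows. This is a minimal illustration of the selection idea only, not the paper's Figure 1 pseudo-code; the relative tolerance for ties is an assumption here (it mirrors the test $\varepsilon \max(1, |V_1|)$ discussed at the end of this section):

```python
import numpy as np

def select_cuts(slopes, intercepts, trial_points, limited_memory=False, eps=1e-6):
    """Sketch of Level 1 / Limited Memory Level 1 cut selection.

    Cut m has value <slopes[m], x> + intercepts[m] at a point x.
    A cut is kept if it is (one of) the highest cut(s) at some trial point.
    """
    # values[p, m] = value of cut m at trial point p
    values = trial_points @ slopes.T + intercepts
    selected = set()
    for v in values:
        vmax = v.max()
        # cuts tying with the maximum, up to a relative tolerance
        ties = np.flatnonzero(v >= vmax - eps * max(1.0, abs(vmax)))
        if limited_memory:
            selected.add(int(ties[0]))       # LML 1: keep a single highest cut (the oldest here)
        else:
            selected.update(map(int, ties))  # Level 1: keep all highest cuts
    return sorted(selected)
```

When many cuts coincide at some trial points, the limited-memory variant discards the duplicates that Level 1 retains, which is consistent with the behavior reported below for Instance 2.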
Instance 1: M = 3, T = 48, n = 50, N = 1, S = 200
              SDDP   SDDP CS 1   SDDP CS 2     MuDA   CuSMuDA CS 1   CuSMuDA CS 2
  CPU time    8750        2857        3201    22857           6800           6964
  Iteration   1152        1168        1286     1205           1176           1203

Instance 2: M = 50, T = 4, n = 500, N = 1, S = 50
              SDDP   SDDP CS 1   SDDP CS 2     MuDA   CuSMuDA CS 1   CuSMuDA CS 2
  CPU time  1726.1       727.0       517.1   3779.0           3405           3348
  Iteration    200         200         200       41             42             50

Instance 3: M = 5, T = 12, n = 5, N = 200, S = 200
              SDDP   SDDP CS 1   SDDP CS 2     MuDA   CuSMuDA CS 1   CuSMuDA CS 2
  CPU time   506.4      1378.2       193.2    799.8         3053.5          168.9
  Iteration      9          11           7        6              8              7

Instance 4: M = 2, T = 6, n = 200, N = 1, S = 50
              SDDP   SDDP CS 1   SDDP CS 2     MuDA   CuSMuDA CS 1   CuSMuDA CS 2
  CPU time    25.5        22.2        21.0     43.3           34.4           34.4
  Iteration     83          97          94       82             93             92

Instance 5: M = 5, T = 12, n = 5, N = 1, S = 200
              SDDP   SDDP CS 1   SDDP CS 2     MuDA   CuSMuDA CS 1   CuSMuDA CS 2
  CPU time    22.0        21.9        16.9     42.2          104.5           24.0
  Iteration    200         200         200      200            200            200

Instance 6: M = 5, T = 12, n = 10, N = 1, S = 200
              SDDP   SDDP CS 1   SDDP CS 2     MuDA   CuSMuDA CS 1   CuSMuDA CS 2
  CPU time    34.1        25.7        19.5     76.6           43.8           32.7
  Iteration    200         200         200      200            200            200

Instance 7: M = 10, T = 6, n = 10, N = 1, S = 200
              SDDP   SDDP CS 1   SDDP CS 2     MuDA   CuSMuDA CS 1   CuSMuDA CS 2
  CPU time    53.7        17.3        14.8     90.3          143.0           26.2
  Iteration    200         200         200      200            200            200

Instance 8: M = 20, T = 6, n = 5, N = 1, S = 200
              SDDP   SDDP CS 1   SDDP CS 2     MuDA   CuSMuDA CS 1   CuSMuDA CS 2
  CPU time    33.7        26.5        23.9    214.5          192.3           50.3
  Iteration    200         200         200      200            200            200

Instance 9: M = 50, T = 12, n = 10, N = 1, S = 200
              SDDP   SDDP CS 1   SDDP CS 2     MuDA   CuSMuDA CS 1   CuSMuDA CS 2
  CPU time   340.7       181.8       158.8   3649.4         2541.8          835.3
  Iteration    200         200         200      179            200            200

Instance 10: M = 10, T = 6, n = 10, N = 200, S = 200
              SDDP   SDDP CS 1   SDDP CS 2     MuDA   CuSMuDA CS 1   CuSMuDA CS 2
  CPU time   256.3       208.3        84.8   2321.5         1185.4          191.8
  Iteration      5           5           5        6              5              5

Instance 11: M = 20, T = 6, n = 5, N = 200, S = 200
              SDDP   SDDP CS 1   SDDP CS 2     MuDA   CuSMuDA CS 1   CuSMuDA CS 2
  CPU time   250.9       189.1        72.0   2721.3         1838.3          232.6
  Iteration      4           4           3        4              4              4

Table 1. Computational time (in seconds) and number of iterations for solving instances of the portfolio problem of Section 5.1.1 with SDDP, SDDP CS 1, SDDP CS 2, MuDA, CuSMuDA CS 1, and CuSMuDA CS 2.

Instance 12: M = 2, T = 6, n = 450, N = 1, S = 50
              SDDP   SDDP CS 1   SDDP CS 2     MuDA   CuSMuDA CS 1   CuSMuDA CS 2
  CPU time   175.3       117.7       153.7    235.6          351.7          183.7
  Iteration    158         154         173      128            181            168

Instance 13: M = 2, T = 12, n = 50, N = 1, S = 50
              SDDP   SDDP CS 1   SDDP CS 2     MuDA   CuSMuDA CS 1   CuSMuDA CS 2
  CPU time    46.3        21.5        21.1    103.2           31.7           36.8
  Iteration    122         137         132      125            127            137

Instance 14: M = 2, T = 6, n = 10, N = 1, S = 10
              SDDP   SDDP CS 1   SDDP CS 2     MuDA   CuSMuDA CS 1   CuSMuDA CS 2
  CPU time    0.52        0.35        0.45     0.38           0.37           0.34
  Iteration     13          13          13       11             13             13

Instance 15: M = 3, T = 36, n = 30, N = 1, S = 200
              SDDP   SDDP CS 1   SDDP CS 2     MuDA   CuSMuDA CS 1   CuSMuDA CS 2
  CPU time    1029         341         394     1710            704            634
  Iteration    454         467         489      436            463            462

Instance 16: M = 3, T = 48, n = 10, N = 1, S = 50
              SDDP   SDDP CS 1   SDDP CS 2     MuDA   CuSMuDA CS 1   CuSMuDA CS 2
  CPU time    56.8        44.2        39.3     75.4           46.1           39.3
  Iteration    100         139         124      100             99             98

Instance 17: M = 20, T = 5, n = 4, N = 200, S = 200
              SDDP   SDDP CS 1   SDDP CS 2     MuDA   CuSMuDA CS 1   CuSMuDA CS 2
  CPU time   103.6        88.2        81.2    538.2            807           78.6
  Iteration      3           4           4        2              2              2

Instance 18: M = 20, T = 5, n = 5, N = 200, S = 200
              SDDP   SDDP CS 1   SDDP CS 2     MuDA   CuSMuDA CS 1   CuSMuDA CS 2
  CPU time    30.9        26.7        17.8   1021.8         1764.6          142.8
  Iteration      1           1           1        3              3              3

Instance 19: M = 20, T = 5, n = 6, N = 200, S = 200
              SDDP   SDDP CS 1   SDDP CS 2     MuDA   CuSMuDA CS 1   CuSMuDA CS 2
  CPU time   132.2       273.2       127.8   2752.2         3853.2            315
  Iteration      3           5           5        5              5              5

Instance 20: M = 10, T = 8, n = 4, N = 200, S = 200
              SDDP   SDDP CS 1   SDDP CS 2     MuDA   CuSMuDA CS 1   CuSMuDA CS 2
  CPU time   134.7       177.9        67.4    476.4          976.2           93.6
  Iteration      4           4           4        3              3              3

Instance 21: M = 10, T = 8, n = 5, N = 200, S = 200
              SDDP   SDDP CS 1   SDDP CS 2     MuDA   CuSMuDA CS 1   CuSMuDA CS 2
  CPU time   264.7       297.0        90.9   2603.4         4438.8            258
  Iteration      5           5           4        6              5              7

Instance 22: M = 10, T = 8, n = 6, N = 200, S = 200
              SDDP   SDDP CS 1   SDDP CS 2     MuDA   CuSMuDA CS 1   CuSMuDA CS 2
  CPU time   256.7       312.7       162.1   3808.2           6633            303
  Iteration      5           5           6        7              9              6

Table 2. Computational time (in seconds) and number of iterations for solving instances of the portfolio problem of Section 5.1.1 with SDDP, SDDP CS 1, SDDP CS 2, MuDA, CuSMuDA CS 1, and CuSMuDA CS 2.

For Instance 1, the mean proportion (over the iterations of the algorithm) of cuts selected by all variants with cut selection, i.e.,
SDDP CS 1, SDDP CS 2, CuSMuDA CS 1, and CuSMuDA CS 2, tends to increase with the stage and is very low for nearly all stages (at most 0.1 until stage 30 and at most 0.2 until stage 40, knowing that there are $T = 48$ stages for that instance); see the bottom plot of Figure 2, which represents the evolution of the mean proportion of cuts selected as a function of the stage. This makes sense: at the last stage, the cuts for the functions $Q_T(\cdot, \xi_{Tj})$ are exact (and, as we recalled, all of them are selected with CuSMuDA CS 1), but not necessarily for stages $t < T$, because for these stages approximate recourse functions are used and the approximation errors propagate backward. Therefore it is expected that at the early stages old cuts (computed at the first iterations using crude approximations of the recourse functions) will probably not be selected. Additionally, another reason why fewer cuts are selected in earlier stages may be that there are fewer distinct sampled points in the state space. Moreover, the proportion of cuts selected is very similar for all variants.

Since the degradation in the upper and lower bounds for the variants with cut selection is very limited (see the evolution of the upper and lower bounds along iterations for all methods on the top right plot of Figure 2), the number of iterations required to solve Instance 1 by these variants and by their counterparts without cut selection is similar.

Figure 2. Top left: total CPU time (in seconds) as a function of the number of iterations for SDDP, SDDP CS 1, SDDP CS 2, MuDA, CuSMuDA CS 1, and CuSMuDA CS 2 to solve Instance 1. Top right: evolution of the upper bounds $z^i_{\sup}$ and lower bounds $z^i_{\inf}$ along the iterations of SDDP, SDDP CS 1, SDDP CS 2, MuDA, CuSMuDA CS 1, and CuSMuDA CS 2 to solve Instance 1 (no legend is given on this plot since all curves are almost identical). Bottom: mean proportion of cuts (over the iterations of the algorithm) selected for stages $t = 2, \ldots, T = 48$ by SDDP CS 1, SDDP CS 2, CuSMuDA CS 1, and CuSMuDA CS 2 to solve Instance 1.

We see in particular that the bounds for all algorithms are very close to each other at the last iteration and that, as expected, the lower bounds increase and the upper bounds tend to decrease along iterations. If we do not know the optimal value $\mathcal{Q}_1(x_0)$, these observations (which we checked on all instances) are a good indication that all algorithms were correctly implemented.

Therefore, the variants with cut selection are much quicker, and the quickest variant with cut selection is the one requiring the least number of iterations; see the top left plot of Figure 2, which plots the total CPU time as a function of the number of iterations.

For Instance 2 solved by MuDA and its variants with cut selection, we refer to Figure 3. We see on this figure that these variants again select a very small proportion of cuts for stages 2 and 3 (here, the number of stages is $T = 4$) and that this proportion increases with the stage. The degradation in the lower bound for the variants with cut selection is larger than with Instance 1 but is still very limited. Therefore, the variants with cut selection are much quicker. CuSMuDA CS 2 requires fewer cuts than
CuSMuDA CS 1, but it needs more iterations, and the CPU time of the two variants is similar.

To explain patterns (P2) and (P3), we consider Instance 2 for SDDP, of type (P2), and Instance 3 for SDDP and MuDA, of type (P3), and refer to Figures 3 and 4, which are the analogues of Figure 2 for Instances 2 and 3.

Figure 3. Top left: total CPU time (in seconds) as a function of the number of iterations for SDDP, SDDP CS 1, SDDP CS 2, MuDA, CuSMuDA CS 1, and CuSMuDA CS 2 to solve Instance 2. Top right: evolution of the lower bounds $z^i_{\inf}$ along the iterations of SDDP, SDDP CS 1, SDDP CS 2, MuDA, CuSMuDA CS 1, and CuSMuDA CS 2 to solve Instance 2. Bottom: mean proportion of cuts (over the iterations of the algorithm) selected for stages $t = 2, 3$ ($T = 4$) by SDDP CS 1, SDDP CS 2, CuSMuDA CS 1, and CuSMuDA CS 2 to solve Instance 2.

Figure 4. Top left: total CPU time (in seconds) as a function of the number of iterations for SDDP, SDDP CS 1, SDDP CS 2, MuDA, and CuSMuDA CS 2 to solve Instance 3 (we did not represent the curve for CuSMuDA CS 1 since the total CPU time with CuSMuDA CS 1 is much larger). Top right: evolution of the upper bounds $z^i_{\sup}$ and lower bounds $z^i_{\inf}$ along the iterations of SDDP, SDDP CS 1, SDDP CS 2, MuDA, CuSMuDA CS 1, and CuSMuDA CS 2 to solve Instance 3. Bottom: mean proportion of cuts (over the iterations of the algorithm) selected for stages $t = 2, \ldots, T = 12$ by SDDP CS 1, SDDP CS 2, CuSMuDA CS 1, and CuSMuDA CS 2 to solve Instance 3.

On Figure 3, we see that on Instance 2,
SDDP CS 1 and SDDP CS 2 select a small proportion of cuts for stages 2 and 3 (recall that this is an instance with $T = 4$ stages) and require, as SDDP, 200 iterations (the evolution of the lower bound along iterations is similar for all three methods). Moreover, SDDP CS 2 selects significantly fewer cuts than SDDP CS 1 for stages 2 and 3 and is therefore much quicker than SDDP CS 1. This means that a large number of cuts have the same value at some trial points: for each such trial point, SDDP CS 1 selects all these cuts whereas SDDP CS 2 only selects the oldest of them.

Instance 3 is an example of a problem where the LML 1 cut selection strategy selects a very small number of cuts for all stages whereas Level 1 selects many more cuts, both for MuDA and SDDP (see Figure 4). On top of that, SDDP CS 1 (resp. CuSMuDA CS 1) needs more iterations than SDDP and SDDP CS 2 (resp. MuDA and CuSMuDA CS 2). Therefore the CPU time needed to solve Instance 3 with SDDP CS 1 (resp. CuSMuDA CS 1) is much larger than the CPU time needed to solve this instance with SDDP and SDDP CS 2 (resp. MuDA and CuSMuDA CS 2). This is an example of an instance where the time spent selecting the cuts is not compensated by the (small) reduction in the CPU time for solving the problems in the backward and forward passes.

Summarizing our observations:
• Pattern (P3) occurs when a variant with cut selection selects a large proportion of cuts and requires too many iterations;
• Patterns (P1) and (P2) occur when the variants with cut selection select a small number of cuts and do not need many more iterations than the variant without cut selection. In this situation, (P1) occurs either when (i) both variants with cut selection select a similar number of cuts or when (ii) one variant selects fewer cuts than the other but needs slightly more iterations.

One last comment is now in order. We have already observed that on all experiments, the Level 1 and Territory Algorithm cut selection strategies, i.e., SDDP CS 1 and CuSMuDA CS 1, correctly select all cuts at the final stage. However, a crude implementation of the Level 1 pseudo-code given in Figure 1 resulted in the elimination of cuts at the final stage. This comes from the fact that approximate solutions of the optimization subproblems are computed: two optimization problems could compute the same cuts, but the solver may return two different (very close) approximate solutions. Similarly, a cut may in theory be the highest at some point (for instance, cut $C_{Tj}^k$ is, in theory, the highest at $x_{T-1}^k$), but numerically the value of another cut at this point may be deemed slightly higher because of numerical errors. The remedy is to introduce a small error term $\varepsilon$ ($\varepsilon = 10^{-}$ in our experiments) such that the values $V_1$ and $V_2$ of two cuts $C_1$ and $C_2$ at a trial point are considered equal if $|V_1 - V_2| \le \varepsilon \max(1, |V_1|)$, while $C_2$ is above $C_1$ at this trial point if $V_2 \ge V_1 + \varepsilon \max(1, |V_1|)$. Therefore the pseudo-codes of Level 1 and LML 1 given in Figure 1 need to be modified; the corresponding pseudo-codes taking into account the approximation errors are given in Figure 6 in the Appendix. Adding cuts that improve the approximation by more than some threshold $\varepsilon$ was also used in [23].

5.2. Inventory management.
5.2.1. Model.
We consider the $T$-stage inventory management problem given in (Shapiro et al. 2009). For each stage $t = 1, \ldots, T$, on the basis of the inventory level $x_{t-1}$ at the beginning of period $t$, we have to decide on the quantity $y_t - x_{t-1}$ of a product to buy so that the inventory level becomes $y_t$. Given the demand $\xi_t$ for that product at stage $t$, the inventory level is $x_t = y_t - \xi_t$ at the beginning of stage $t+1$. The inventory level can become negative, in which case a backorder cost is paid. If one is interested in minimizing the average cost over the optimization period, we need to solve the following dynamic programming equations: for $t = 1, \ldots, T$, defining $\mathcal{Q}_t(x_{t-1}) = \mathbb{E}_{\xi_t}[Q_t(x_{t-1}, \xi_t)]$, the stage $t$ problem is

$Q_t(x_{t-1}, \xi_t) = \inf\big\{ c_t (y_t - x_{t-1}) + b_t (\xi_t - y_t)_+ + h_t (y_t - \xi_t)_+ + \mathcal{Q}_{t+1}(x_t) : x_t = y_t - \xi_t, \; y_t \ge x_{t-1} \big\}$,

where $c_t$ is the unit buying cost, $h_t$ the holding cost, and $b_t$ the backorder cost. The optimal mean cost is $\mathcal{Q}_1(x_0)$, where $x_0$ is the initial stock.

In what follows, we present the results of numerical simulations obtained solving this problem with SDDP, SDDP CS 1, SDDP CS 2, MuDA, CuSMuDA CS 1, and CuSMuDA CS 2 (we use the same notation as before to denote the solution methods).

5.2.2. Numerical results.
We consider six values for the number of stages $T$ ($T \in \{5, 10, 15, 20, 25, 30\}$) and, for fixed $T$, the following values of the problem and algorithm parameters:
• $c_t = 1.$…$\cos(\pi t$…$)$, $b_t = 2.$…, $h_t = 0.$…, and $M_t = M$ for all $t = 1, \ldots, T$;
• $x_0 = 10$, $p_{ti} = 1/M$ for all $t, i$;
• $\xi_1 = \bar{\xi}_1$ and $(\xi_{t1}, \xi_{t2}, \ldots, \xi_{tM})$ corresponds to a sample from the distribution of $\bar{\xi}_t (1 + 0.$…$\,\varepsilon_t)$ for i.i.d. $\varepsilon_t \sim \mathcal{N}(0, 1)$, $t = 2, \ldots, T$, where $\bar{\xi}_t = 5 + 0.$…$\,t$;
• for the stopping criterion, $\alpha = 0.025$ in (24), $N = S = 200$, and $\varepsilon = 0.05$ in (27).
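As a small, self-contained illustration of the stage problem above (all cost and demand values below are toy data assumed for the example, not the instance parameters just listed), the stage cost can be evaluated and a best order-up-to level chosen over a grid, here myopically, i.e., ignoring the future cost $\mathcal{Q}_{t+1}$:

```python
def stage_cost(x_prev, y, xi, c, b, h):
    """Immediate cost of ordering up to level y from inventory x_prev when
    demand xi is realized: buying + backorder + holding costs."""
    assert y >= x_prev, "only a non-negative quantity can be bought"
    return c * (y - x_prev) + b * max(xi - y, 0.0) + h * max(y - xi, 0.0)

def best_order_up_to(x_prev, demands, probs, c, b, h, grid):
    """Myopic one-stage rule (for illustration only): pick the grid level
    y >= x_prev with the least expected stage cost."""
    feasible = [y for y in grid if y >= x_prev]
    expected = lambda y: sum(p * stage_cost(x_prev, y, xi, c, b, h)
                             for p, xi in zip(probs, demands))
    return min(feasible, key=expected)

# Toy data (assumed): two equiprobable demand scenarios.
y_star = best_order_up_to(x_prev=2.0, demands=[4.0, 8.0], probs=[0.5, 0.5],
                          c=1.0, b=2.8, h=0.2, grid=[2.0, 4.0, 6.0, 8.0])
```

With a backorder cost well above the holding cost, the rule orders up to the highest demand scenario, in line with the usual newsvendor intuition; in SDDP/MuDA the term $\mathcal{Q}_{t+1}$ would be replaced by its polyhedral cut approximation rather than ignored.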
Checking the implementations. As for the portfolio problem, for each value of $T \in \{5, 10, 15, 20, 25, 30\}$, we checked that the upper and lower bounds computed by all six methods are very close at the final iteration.

Computational time and proportion of cuts selected.
Table 3 shows the CPU time needed to solve our six instances of the inventory management problem.

              SDDP   SDDP CS 1   SDDP CS 2     MuDA   CuSMuDA CS 1   CuSMuDA CS 2
  T = 5       0.57        0.48        0.18     1.57           2.90           0.22
  T = 10       5.9        1.22        1.42    26.33          43.07           1.68
  T = 15      21.9        20.4        4.56   106.52         178.08           8.07
  T = 20      28.6        31.2        9.94   158.10         245.24          19.43
  T = 25      36.5        62.8        9.61   189.27         405.58          22.64
  T = 30      77.7        63.6       18.30   363.86         575.53          56.62

Table 3. Computational time (in minutes) for solving instances of the inventory problem of Section 5.2.1 for $M = 20$ with SDDP, SDDP CS 1, SDDP CS 2, MuDA, CuSMuDA CS 1, and CuSMuDA CS 2.

Figure 5. Mean proportion of cuts (over the iterations of the algorithm) selected for stages $t = 2, \ldots, T$ by CuSMuDA CS 1 and CuSMuDA CS 2 for the inventory problem with $M = 20$ (panels: $T = 15$, $M = 20$ and $T = 20$, $M = 20$).

For SDDP, all variants with cut selection yield an important reduction in CPU time, and SDDP CS 2, which uses LML 1, is by far the quickest on 5 instances. For MuDA, Level 1 is not efficient (the CPU time with
CuSMuDA CS 1 is much larger than CPU time with
MuDA), whereas LML 1 allows us to drastically reduce the CPU time. As for the instances of the portfolio problem of type (P3), the reason why CuSMuDA CS 1 is much slower than both CuSMuDA CS 2 and MuDA is that it requires a similar number of iterations and selects many more cuts than CuSMuDA CS 2, which selects very few cuts for all stages, as can be seen on Figure 5, which represents the mean proportion of cuts selected by CuSMuDA CS 1 and CuSMuDA CS 2 as a function of the stage for two instances.

6. Conclusion
We proposed CuSMuDA, a combination of a class of cut selection strategies with Multicut Decomposition Algorithms, to solve multistage stochastic linear programs. We proved the almost sure convergence of the method in a finite number of iterations and obtained as a by-product the almost sure convergence in a finite number of iterations of SDDP combined with this class of cut selection strategies.

Numerical experiments on many instances of a portfolio problem and of an inventory problem have shown that combining LML 1 cut selection with SDDP (resp. MuDA) in general allows us to considerably reduce the CPU time of SDDP (resp. MuDA). There are, however, situations where a variant with cut selection is slower than its counterpart without cut selection.

It would be interesting to test CuSMuDA and the Limited Memory variant of Level 1 (both for MuDA and SDDP) on other types of stochastic programs and to extend the analysis to nonlinear stochastic programs.

Acknowledgments. The second author's research was partially supported by an FGV grant, CNPq grants 307287/2013-0 and 401371/2014-0, and FAPERJ grant E-26/201.599/2014. The authors wish to thank Vincent Leclère for helpful discussions.
References

[1] E. D. Andersen and K. D. Andersen. The MOSEK optimization toolbox for MATLAB manual. Version 7.0, 2013.
[2] T. Asamov and W. Powell. Regularized decomposition of high-dimensional multistage stochastic programs with Markov uncertainty. Available at: https://arxiv.org/abs/1505.02227, 2015.
[3] A. Ben-Tal, T. Margalit, and A. Nemirovski. Robust modeling of multi-stage portfolio problems. In: H. Frenk, K. Roos, T. Terlaky, S. Zhang, Eds., High Performance Optimization, Kluwer Academic Publishers, pages 303–328, 2000.
[4] M. Best and J. Hlouskova. An algorithm for portfolio optimization with variable transaction costs, part 1: Theory. Journal of Optimization Theory and Applications, 135:563–581, 2007.
[5] J. Birge and F. Louveaux. Introduction to Stochastic Programming. Springer-Verlag, New York, 1997.
[6] J. R. Birge. Decomposition and partitioning methods for multistage stochastic linear programs. Operations Research, 33:989–1007, 1985.
[7] J. R. Birge and C. J. Donohue. The Abridged Nested Decomposition Method for Multistage Stochastic Linear Programs with Relatively Complete Recourse. Algorithmic Operations Research, 1:20–30, 2001.
[8] J. R. Birge, C. J. Donohue, D. F. Holmes, and O. G. Svintsitski. A parallel implementation of the nested decomposition algorithm for multistage stochastic linear programs. Mathematical Programming, 75:327–352, 1996.
[9] J. R. Birge and F. V. Louveaux. A multicut algorithm for two-stage stochastic linear programs. European Journal of Operational Research, 34:384–392, 1988.
[10] Z. L. Chen and W. B. Powell. Convergent Cutting-Plane and Partial-Sampling Algorithm for Multistage Stochastic Linear Programs with Recourse. Journal of Optimization Theory and Applications, 102:497–524, 1999.
[11] H. Gassmann. MSLiP: A computer code for the multistage stochastic linear programming problem. Mathematical Programming, 47:407–423, 1990.
[12] S. Gaubert, W. McEneaney, and Z. Qu. Curse of dimensionality reduction in max-plus based approximation methods: Theoretical estimates and improved pruning algorithms. Pages 1054–1061, 2011.
[13] P. Girardeau, V. Leclère, and A. B. Philpott. On the convergence of decomposition methods for multistage stochastic convex programs. Mathematics of Operations Research, 40:130–145, 2015.
[14] V. Guigues. SDDP for some interstage dependent risk-averse problems and application to hydro-thermal planning. Computational Optimization and Applications, 57:167–203, 2014.
[15] V. Guigues. Convergence analysis of sampling-based decomposition methods for risk-averse multistage stochastic convex programs. SIAM Journal on Optimization, 26:2468–2494, 2016.
[16] V. Guigues. Dual dynamic programming with cut selection: Convergence proof and numerical experiments. European Journal of Operational Research, 258:47–57, 2017.
[17] V. Guigues and W. Römisch. Sampling-based decomposition methods for multistage stochastic programs based on extended polyhedral risk measures. SIAM Journal on Optimization, 22:286–312, 2012.
[18] V. Guigues and W. Römisch. SDDP for multistage stochastic linear programs based on spectral risk measures. Operations Research Letters, 40:313–318, 2012.
[19] V. Guigues, W. Tekaya, and M. Lejeune. Regularized decomposition methods for deterministic and stochastic convex optimization and application to portfolio selection with direct transaction and market impact costs. Optimization Online, 2017.
[20] M. Hindsberger and A. B. Philpott. ReSa: A method for solving multi-stage stochastic linear programs. SPIX Stochastic Programming Symposium, 2001.
[21] G. Infanger and D. Morton. Cut sharing for multistage stochastic linear programs with interstage dependency. Mathematical Programming, 75:241–256, 1996.
[22] V. Kozmik and D. P. Morton. Evaluating policies in risk-averse multi-stage stochastic programming. Mathematical Programming, 152:275–300, 2015.
[23] N. Löhndorf, D. Wozabal, and S. Minner. Optimizing trading decisions for hydro storage systems using approximate dual dynamic programming. Operations Research, 61:810–823, 2013.
[24] W. M. McEneaney, A. Deshpande, and S. Gaubert. Curse of complexity attenuation in the curse of dimensionality free method for HJB PDEs. American Control Conference, pages 4684–4690, 2008.
[25] M. V. F. Pereira and L. M. V. G. Pinto. Multi-stage stochastic optimization applied to energy planning. Mathematical Programming, 52:359–375, 1991.
[26] L. Pfeiffer, R. Apparigliato, and S. Auchapt. Two methods of pruning Benders' cuts and their application to the management of a gas portfolio. Research Report RR-8133, hal-00753578, 2012.
[27] A. Philpott and V. de Matos. Dynamic sampling algorithms for multi-stage stochastic programs with risk aversion. European Journal of Operational Research, 218:470–483, 2012.
[28] A. Philpott, V. de Matos, and E. Finardi. Improving the performance of stochastic dual dynamic programming. Journal of Computational and Applied Mathematics, 290:196–208, 2015.
[29] A. B. Philpott and Z. Guan. On the convergence of stochastic dual dynamic programming and related methods. Operations Research Letters, 36:450–455, 2008.
[30] W. B. Powell. Approximate Dynamic Programming. John Wiley and Sons, 2nd edition, 2011.
[31] A. Ruszczyński. Parallel decomposition of multistage stochastic programming problems. Mathematical Programming, 58:201–228, 1993.
[32] A. Shapiro. Analysis of stochastic dual dynamic programming method. European Journal of Operational Research, 209:63–72, 2011.
[33] A. Shapiro, D. Dentcheva, and A. Ruszczyński. Lectures on Stochastic Programming: Modeling and Theory. SIAM, Philadelphia, 2009.
[34] A. Shapiro, W. Tekaya, J. P. da Costa, and M. P. Soares. Risk neutral and risk averse stochastic dual dynamic programming method. European Journal of Operational Research, 224:375–391, 2013.
[35] A. Shapiro, W. Tekaya, J. P. da Costa, and M. P. Soares. Worst-case-expectation approach to optimization under uncertainty. Operations Research, 61:1435–1449, 2013.
[36] W. Zhang, H. Rahimian, and G. Bayraksan. Decomposition algorithms for risk-averse multistage stochastic programs with application to water allocation under uncertainty. INFORMS Journal on Computing, 28:385–404, 2016.
Appendix
Level 1

I_{tj}^k = {k}, m_{tj}^k = C_{tj}^k(x_{t-1}^k).
For ℓ = 1, ..., k-1
    If C_{tj}^k(x_{t-1}^ℓ) > m_{tj}^ℓ + ε max(1, |m_{tj}^ℓ|)
        I_{tj}^ℓ = {k}, m_{tj}^ℓ = C_{tj}^k(x_{t-1}^ℓ)
    Else if |C_{tj}^k(x_{t-1}^ℓ) − m_{tj}^ℓ| ≤ ε max(1, |m_{tj}^ℓ|)
        I_{tj}^ℓ = I_{tj}^ℓ ∪ {k}
    End If
    If C_{tj}^ℓ(x_{t-1}^k) > m_{tj}^k + ε max(1, |m_{tj}^k|)
        I_{tj}^k = {ℓ}, m_{tj}^k = C_{tj}^ℓ(x_{t-1}^k)
    Else if |C_{tj}^ℓ(x_{t-1}^k) − m_{tj}^k| ≤ ε max(1, |m_{tj}^k|)
        I_{tj}^k = I_{tj}^k ∪ {ℓ}
    End If
End For

Limited Memory Level 1

I_{tj}^k = {1}, m_{tj}^k = C_{tj}^1(x_{t-1}^k).
For ℓ = 1, ..., k-1
    If C_{tj}^k(x_{t-1}^ℓ) > m_{tj}^ℓ + ε max(1, |m_{tj}^ℓ|)
        I_{tj}^ℓ = {k}, m_{tj}^ℓ = C_{tj}^k(x_{t-1}^ℓ)
    End If
    If C_{tj}^{ℓ+1}(x_{t-1}^k) > m_{tj}^k + ε max(1, |m_{tj}^k|)
        I_{tj}^k = {ℓ+1}, m_{tj}^k = C_{tj}^{ℓ+1}(x_{t-1}^k)
    End If
End For
Figure 6. Pseudo-codes of the Level 1 and Limited Memory Level 1 cut selection strategies.
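The idea underlying both strategies is that a cut is kept only if it attains the maximum of the current cut collection at at least one trial point visited so far. The following minimal sketch illustrates this selection rule for affine cuts C^k(x) = α_k + ⟨β_k, x⟩; the helper name `level1_selected_cuts` is ours, and for brevity the sketch uses an exact argmax and omits the relative tolerance ε and the incremental index-set bookkeeping of the pseudo-code above.

```python
import numpy as np

def level1_selected_cuts(alphas, betas, trial_points):
    """Indices of cuts that are highest at >= 1 trial point (illustrative).

    alphas: intercepts of the cuts, shape (n_cuts,)
    betas:  slopes of the cuts, shape (n_cuts, dim)
    trial_points: trial points visited so far, shape (n_points, dim)
    """
    alphas = np.asarray(alphas, dtype=float)
    betas = np.asarray(betas, dtype=float)
    X = np.asarray(trial_points, dtype=float)
    # Value of every cut at every trial point: (n_points, n_cuts).
    values = alphas[None, :] + X @ betas.T
    # For each trial point, the cut achieving the maximum "owns" that point.
    winners = values.argmax(axis=1)
    # A cut is selected iff its territory (set of owned points) is nonempty.
    return sorted(set(winners.tolist()))

# Two 1-D cuts C^0(x) = x and C^1(x) = 2 - x, trial points 0 and 3:
# C^1 is highest at x = 0, C^0 is highest at x = 3, so both are kept.
print(level1_selected_cuts([0.0, 2.0], [[1.0], [-1.0]], [[0.0], [3.0]]))  # -> [0, 1]
```

A dominated cut (one that is never the highest at any trial point) is dropped, which is what reduces the size of the stage subproblems as iterations accumulate.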