Distributionally Robust Newsvendor with Moment Constraints
Derek Singh a,∗, Shuzhong Zhang a
a Department of Industrial and Systems Engineering, University of Minnesota, Minneapolis, MN 55455
ARTICLE INFO
Keywords: robust newsvendor, moments duality, distributionally robust optimization, Wasserstein distance, Lagrangian duality
ABSTRACT
This paper expands the work on the distributionally robust newsvendor to incorporate moment constraints. The use of Wasserstein distance as the ambiguity measure is preserved. The infinite dimensional primal problem is formulated; problem of moments duality is invoked to derive the simpler finite dimensional dual problem. An important research question is: how does distributional ambiguity affect the optimal order quantity and the corresponding profits/costs? To investigate this, some theory is developed and a case study in auto sales is performed. We conclude with some comments on directions for further research.
1. Introduction and Overview
The newsvendor model is a classical problem in inventory control that has been extensively studied under a variety of settings. It is a decision problem in which the decision maker wants to determine the optimal order quantity to balance overage and underage costs, so as to minimize losses (or alternatively maximize profits) during the next sales period, between orders. Overage costs result from ordering too much product, in excess of realized demand, which leads to waste. Underage costs result from ordering too little, short of realized demand, which leads to lost sales. The classical setting is a single decision period, for a non-perishable product, where the demand distribution is known. Scarf [13] solved the original distributionally robust formulation, which assumes knowledge of both first and second moments.

In this paper, we extend that work to allow for more precise information regarding the underlying distribution, as measured in terms of Wasserstein distance. This allows us to modulate the degree of ambiguity and investigate its effect on the optimal order quantity and the corresponding profits/losses. This is done via the framework of Wasserstein discrepancy between distributions and the problem of moments duality results. First we formulate the distributionally robust newsvendor decision problem to account for ambiguity in distribution. Next, we formulate and solve the simpler finite dimensional dual problem. After that, we use industry data to investigate the practical effects of ambiguity.

An outline of this paper is as follows. Section 1 gives an overview of the preliminary material as well as a literature review. Section 2 develops the main theoretical results to characterize the distributionally robust newsvendor model with moment constraints. Section 3 conducts a case study using a five year historical data set of monthly sales of Tesla automobiles. Section 4 discusses conclusions and suggestions for further research.
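As a point of reference for the robust formulations developed later, the classical critical-fractile solution and Scarf's moment-based ordering rule [7, 13] can be sketched as follows. This is an illustrative sketch; the function names are ours, not the paper's.

```python
import math

def critical_fractile_quantity(demand_samples, c_u, c_o):
    """Classical newsvendor with a known (here: empirical) demand
    distribution: order the c_u / (c_u + c_o) quantile of demand."""
    xs = sorted(demand_samples)
    ratio = c_u / (c_u + c_o)  # the critical fractile
    k = min(len(xs) - 1, max(0, math.ceil(ratio * len(xs)) - 1))
    return xs[k]

def scarf_quantity(mu, sigma, c_u, c_o):
    """Scarf's min-max order quantity: optimal against the worst case
    distribution with mean mu and standard deviation sigma [13]."""
    r = c_u / c_o
    return mu + 0.5 * sigma * (math.sqrt(r) - 1.0 / math.sqrt(r))
```

Note that for symmetric costs (c_u = c_o) Scarf's rule simply orders the mean, while asymmetric costs shift the order quantity by a multiple of the standard deviation.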
All detailed proofs are deferred to the Appendix.
∗ Corresponding author
Email addresses: [email protected] (D. Singh); [email protected] (S.Zhang)
As mentioned previously, the original work in this problem domain was initiated by Scarf [13]. Since then, further research has been done on the topic of distributional robustness. One paper of note, by Gallego and Moon [7], uses a different choice of parameters and presents an alternate proof for Scarf's ordering rule. According to the authors, this choice of parameters and method of proof are simpler and more interpretable. The development of distributionally robust optimization with moment-based ambiguity sets led to extensions that considered multiple items, risk aversion, and additional distributional properties such as asymmetry and multimodality [9, 12].

More recent, and alternative, work in distributional robustness focuses on the idea of distributional ambiguity as measured through the statistical distance between distributions. A few of the seminal papers in this area that use the Wasserstein distance framework are by Gao and Kleywegt [8], Esfahani and Kuhn [6], and Blanchet and Murthy [3]. Applications of this framework include topics in machine learning, portfolio management, inventory control, and others. A relevant example is the work by Lee et al. [11], which studies the data-driven distributionally robust risk-averse newsvendor model under Wasserstein ambiguity. The authors make the case for dispensing with the first two moment constraints (difficulty in estimation, challenges in obtaining this information in a practical setting) and analyze the resulting solutions in settings with/without risk aversion and for Wasserstein distances of order p = 1 and p > 1.

The empirical measure P_n is defined as P_n := (1/n) Σ_{i=1}^n δ_{x_i}, where x_i denotes the i-th realization of demand and δ_{x_i} is a Dirac measure. The ambiguity set for probability measures is U_δ(P_n) = {P : W_d(P, P_n) ≤ δ}, where W_d is the Wasserstein discrepancy with associated distance function d(·, ·):

W_d(P, P′) = inf_π { 𝔼_π[d(X, Y)] : X ∼ P, Y ∼ P′ }.
Here d(X, Y) is the distance between random variables X and Y that follow distributions P and P′ respectively, and the inf is taken over all joint distributions π with marginals P and P′. To simplify the analysis, this work uses the squared Euclidean distance d(x, y) = ‖x − y‖² = Σ_{i=1}^n (x_i − y_i)² [15].

In Section 2 we formulate the primal decision problem for the distributionally robust newsvendor with moment constraints. Problem of moments duality results will be used to formulate the simpler yet equivalent dual problem. In this context, to specify the first moment constraint M_1(X) = 𝔼_P[X] = μ and the second moment constraint M_2(X) = 𝔼_P[X²] = μ² + σ², we appeal to the strong duality of linear semi-infinite programs. The dual problem appears to be more tractable than the primal problem since it only involves the (finite dimensional) empirical measure P_n as opposed to a continuum of probability measures. This allows us to solve a data-driven nested optimization problem defined by the chosen data set. A brief restatement of the duality result can be found in [14]. See Appendix B of [2] and Proposition 2 of [1] or the original work [10] for further details.
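For intuition about the ambiguity measure: with the squared Euclidean ground cost above, the discrepancy between two equal-size empirical measures on the real line reduces to matching order statistics, since the monotone (sorted) coupling is optimal for convex costs in one dimension. A small illustration (not code from the paper):

```python
def empirical_wasserstein_sq(xs, ys):
    """W_d between two n-point, equal-weight empirical measures on the
    real line with ground cost d(x, y) = (x - y)**2: the optimal
    coupling matches the sorted samples."""
    assert len(xs) == len(ys), "expects equal-size samples"
    n = len(xs)
    return sum((a - b) ** 2 for a, b in zip(sorted(xs), sorted(ys))) / n
```

In particular the discrepancy is zero exactly when the two samples coincide as multisets, and it grows with the squared gaps between matched order statistics.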
2. Theory: DRNV
This section formulates the primal and dual problems for the distributionally robust newsvendor (DRNV). A polynomial time algorithm, using the directional descent (DD) method, is developed to compute the solution to the dual.
This work will use the cost minimization form of the newsvendor decision problem. In this subsection, we show the equivalence between the cost minimization and profit maximization forms. The cost (objective) function is

Π_1(Q) = c_u 𝔼(X − Q)_+ + c_o 𝔼(Q − X)_+,

where c_u and c_o denote the underage and overage costs, X denotes random demand, and Q denotes the order quantity (decision variable). The profit (objective) function is

Π_2(Q) = (p − s) 𝔼 min(Q, X) − (c − s) Q,

where p > c > s > 0 for sale price p, unit cost c, and salvage value s. Let us use the relations

𝔼 min(Q, X) = Q − 𝔼(Q − X)_+ = μ − 𝔼(X − Q)_+,
𝔼 max(Q, X) = Q + 𝔼(X − Q)_+ = μ + 𝔼(Q − X)_+,

to substitute for Q and min(Q, X) in the profit function to get

Π_2(Q) = (p − s)(Q − 𝔼(Q − X)_+) − (c − s)Q
= (s − p) 𝔼(Q − X)_+ + (p − c) Q
= (s − p) 𝔼(Q − X)_+ + (p − c)(μ + 𝔼(Q − X)_+ − 𝔼(X − Q)_+)
= (p − c)(μ − 𝔼(X − Q)_+) − (c − s) 𝔼(Q − X)_+.

Observe that maximizing Π_2(Q) is equivalent to minimizing Π_1(Q) for c_u = p − c > 0 and c_o = c − s > 0. Let us characterize the constraints via the set

C := { X ∈ ℝ_+ : M_1(X) = μ, M_2(X) = μ² + σ², P ∈ U_δ(P_n) },

where ℝ_+ denotes the restriction to non-negative realized demand. The primal problem can then be written as

inf_{Q ≥ 0} sup_{X ∈ C} c_o 𝔼(Q − X)_+ + c_u 𝔼(X − Q)_+   (P)

where Q is the order quantity, X is demand, and δ is the radius of the Wasserstein ball of distributions centered at P_n. Following the approach in [14], let λ := (λ_1, λ_2, λ_3), define Γ := {λ : λ_1 ≥ 0, λ_2, λ_3 ∈ ℝ}, and apply moments duality to write the dual problem as

inf_{Q ≥ 0} inf_{λ ∈ Γ} F(λ, Q) := λ_1 δ + λ_2 μ + λ_3 (σ² + μ²) + (1/n) Σ_{i=1}^n Ψ_i(λ, Q)   (D)

where Ψ_i(λ, Q) := −λ_1 x_i² + sup_{x ≥ 0} g_i(x, λ, Q) for

g_i(x, λ, Q) := c_u (x − Q)_+ + c_o (Q − x)_+ − a x² + 2 b_i x,

for a := λ_1 + λ_3 and b_i := λ_1 x_i − λ_2/2. The constraint a > 0 ensures F(λ, Q) has finite value. Note that (D) is jointly convex in (λ, Q).
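The algebra above implies the identity Π_2(Q) = (p − c)μ − Π_1(Q) with c_u = p − c and c_o = c − s, i.e. the two forms differ by a constant independent of Q. A quick numerical check of this identity on a demand sample (illustrative code, not the paper's):

```python
def profit_and_cost(demand, Q, p, c, s):
    """Return (Pi2(Q), Pi1(Q)) under the empirical distribution of
    `demand`, where c_u = p - c (underage) and c_o = c - s (overage)."""
    n = len(demand)
    e_min = sum(min(Q, x) for x in demand) / n           # E min(Q, X)
    e_under = sum(max(x - Q, 0.0) for x in demand) / n   # E (X - Q)_+
    e_over = sum(max(Q - x, 0.0) for x in demand) / n    # E (Q - X)_+
    pi2 = (p - s) * e_min - (c - s) * Q                  # profit form
    pi1 = (p - c) * e_under + (c - s) * e_over           # cost form
    return pi2, pi1
```

Since Π_2 = (p − c)μ − Π_1, maximizing the profit form over Q is the same decision problem as minimizing the cost form.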
Our approach to solve (D) has two steps: (i) solve the inner problem in closed form and develop a polynomial time algorithm to compute its solution; (ii) use a bisection method to compute the solution to the outer problem. Overall, the algorithm finishes in polynomial time.
Proposition 2.1.
Let a > 0 and Q ≥ 0; then

sup_{x ≥ 0} g_i(x, λ, Q) = { g_{i0}, if (λ_1, λ_2) ∈ R̃_{i0}; g_{i1}, if (λ_1, λ_2) ∈ R̃_{i1}; g_{i2}, if (λ_1, λ_2) ∈ R̃_{i2} },

where x_{i1} = (2 b_i − c_o)/(2a), x_{i2} = (2 b_i + c_u)/(2a), c̃ = c_u + c_o, the g_{ij} functions are given by

g_{i0} = c_o Q,
g_{i1} = c_o (Q − x_{i1}) − a x_{i1}² + 2 b_i x_{i1},
g_{i2} = c_u (x_{i2} − Q) − a x_{i2}² + 2 b_i x_{i2},

the mapping of regions to optimal solutions x* is given by Table 1, and the R̃_{ij} regions are defined for four cases, according to the relative positions of the λ_2 intercepts

β_1 := c_u, β_2 := −c_o, β_3 := (c_u − c_o)/2 − 2aQ, β_4 := c_u − 2√(a c̃ Q).

Case 1. β_3 ≥ β_2 ∩ β_4 ∈ [β_2, β_1]. R̃_{i0} := R_{i1} ∪ R_{i2}, R̃_{i1} := ∅, R̃_{i2} := R_{i3} ∪ R_{i4}, where the regions R_{ij} ⊆ {λ_1 ≥ 0, λ_2} are given by

R_{i1} := { x_{i2} ≤ 0 },
R_{i2} := { x_{i2} ≥ 0, λ_2 − 2 x_i λ_1 ≥ β_4 },
R_{i3} := { x_{i1} ≤ 0, λ_2 − 2 x_i λ_1 ≤ β_4 },
R_{i4} := { x_{i1} ≥ 0 }.
Table 1
Mapping of Regions to Optimal Solutions.
Region R̃_{ij} | Optimal Solution x*
R̃_{i0} | 0
R̃_{i1} | x_{i1}
R̃_{i2} | x_{i2}

Case 2. β_3 < β_2 ∩ β_4 ∈ [β_2, β_1]. R̃_{i0} := R_{i1} ∪ R_{i2}, R̃_{i1} := R_{i5}, R̃_{i2} := R_{i3} ∪ R_{i4}, where the R_{ij} regions are given by

R_{i1} := { x_{i2} ≤ 0 },
R_{i2} := { x_{i2} ≥ 0, λ_2 − 2 x_i λ_1 ≥ β_4 },
R_{i3} := { x_{i1} ≤ 0, λ_2 − 2 x_i λ_1 ≤ β_4 },
R_{i4} := { x_{i1} ≥ 0, λ_2 − 2 x_i λ_1 ≥ β_3 },
R_{i5} := { λ_2 − 2 x_i λ_1 ≤ β_3 }.

Case 3. β_3 < β_2 ∩ β_4 ∉ [β_2, β_1]. R̃_{i0} := R_{i1} ∪ R_{i2}, R̃_{i1} := R_{i4}, R̃_{i2} := R_{i3}, where the R_{ij} regions are given by

R_{i1} := { x_{i2} ≤ 0 },
R_{i2} := { x_{i2} ≥ 0, x_{i1} ≤ 0 },
R_{i3} := { x_{i1} ≥ 0, λ_2 − 2 x_i λ_1 ≥ β_3 },
R_{i4} := { λ_2 − 2 x_i λ_1 ≤ β_3 }.

Case 4. β_3 ≥ β_2 ∩ β_4 ∉ [β_2, β_1]. R̃_{i0} := R_{i1}, R̃_{i1} := ∅, R̃_{i2} := R_{i2} ∪ R_{i3}, where the R_{ij} regions are given by

R_{i1} := { x_{i2} ≤ 0 },
R_{i2} := { x_{i1} ≤ 0, x_{i2} ≥ 0 },
R_{i3} := { x_{i1} ≥ 0 }.

Proof.
The proof first solves the unconstrained problem to deduce that the set of possible values of the optimal solution x* for the constrained problem is S := {0, x_{i1}, x_{i2}}. From there, some analysis is done, for each of the four cases, to partition the (λ_1 ≥ 0, λ_2) half-plane into regions and to construct the mapping between regions and optimal solutions. Substituting the appropriate x* per region into g_i gives the result. See the Appendix for the detailed proof.
Theorem 2.1.
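The structure of the inner problem can be checked numerically: with the subscripts as reconstructed here, the maximum of g_i over x ≥ 0 is attained on the candidate set S = {0, x_{i1}, x_{i2}}, i.e. at the origin or at a stationary point of one of the two quadratic branches. An illustrative sketch (not the paper's code):

```python
def g(x, Q, c_u, c_o, a, b):
    """g_i(x) = c_u (x - Q)_+ + c_o (Q - x)_+ - a x^2 + 2 b x, with a > 0."""
    return (c_u * max(x - Q, 0.0) + c_o * max(Q - x, 0.0)
            - a * x * x + 2.0 * b * x)

def sup_g_over_candidates(Q, c_u, c_o, a, b):
    """Evaluate g only on S = {0, x1, x2}, the branch stationary points
    clipped to the feasible set x >= 0."""
    x1 = (2.0 * b - c_o) / (2.0 * a)   # stationary point of the x < Q branch
    x2 = (2.0 * b + c_u) / (2.0 * a)   # stationary point of the x > Q branch
    cands = [0.0] + [x for x in (x1, x2) if x > 0.0]
    return max(g(x, Q, c_u, c_o, a, b) for x in cands)
```

A brute-force grid over x ≥ 0 never beats the three-candidate evaluation, which is what makes the closed-form case analysis of Proposition 2.1 possible.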
The DD method evaluates f(ξ, Q) := min_{λ_1 ≥ 0, λ_2} F(λ_1, λ_2, ξ, Q), for ξ := λ_1 + λ_3 > 0, in polynomial time.
Proof. Note the DD method can evaluate f(ξ, Q) in polynomially many operations: the O(n) lines L_{ij} partition the {λ_1 ≥ 0, λ_2} half-plane into O(n²) line segments and regions, and, being a descent method, it only needs to traverse each line segment and/or region once. This once-only traversal property holds due to the joint convexity of F(λ_1, λ_2, ξ, Q).
DD method:
Directional Descent method to compute f(ξ, Q) for (D1)
Input: {ξ, Q, {x_i}, N, n, δ, μ, σ}
Output: {y_{ξ,Q} = f(ξ, Q)}
Sort {x_i} decreasing;
Define intercepts {β_j} as in Proposition 2.1;
Construct lines {λ_2 = L_{ij}(λ_1), λ_1 ≥ 0}, where L_{ij}(λ_1) := 2 λ_1 x_i + β_j, for all i, j;
Compute {V_τ}, the set of vertices (λ_1, λ_2) where either two or more lines L_{ij} intersect or λ_2 ∈ {L_{ij}(λ_1 = 0)};
Set k = 0 and the initial search point to λ_c(k) = V_{τ*}, the vertex with the smallest value for F;
while k < N do
    Search adjacent regions Γ for descent directions λ°_c(k) + t d_γ, where we move towards the min value λ*_γ for F_Γ;
    /* Here F_Γ is defined such that the {Ψ_i} have the same functional form across the entire (λ_1, λ_2) plane as in region Γ, where Γ is defined by the supporting lines L_{ij}; λ°_c(k) is an interior point of region Γ within ε of λ_c(k). Observe that the number of regions Γ is polynomial in n. */
    if F(λ*_γ) < F(λ_c(k)) then
        if (λ°_c(k) + t d_γ) ∩ {L_{ij}} = ∅ then
            λ_c(k + 1) := λ*_γ;
        else
            {λ_m} := (λ°_c(k) + t d_γ) ∩ {L_{ij}};
            λ_c(k + 1) := argmin_{{λ_m}} ‖λ_c(k) − λ_m‖;
        k = k + 1; continue;
    Search along adjacent rays R (the line segments ±L⃗_{ij} emanating from point λ_c(k)) for descent directions λ_c(k) + t d_r, where we move towards a critical point λ*_r with zero directional derivative for F, so D_{d_r} F(λ*_r) = 0;
    if {d_r : D_{d_r} F(λ*_r) = 0} ≠ ∅ then
        λ_c(k + 1) := argmin_{{λ*_r}} ‖λ_c(k) − λ*_r‖;
        k = k + 1;
    else
        /* There are no descent directions via regions or rays, so we are at the min value. */
        return y_{ξ,Q} = F(λ_c(k));
Remark 1.
The equation λ_2 = L_{ij}(λ_1) is derived from algebraic manipulation and substitution for (λ_1, λ_2) in Q = Q̃.
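With b_i = λ_1 x_i − λ_2/2 and a = λ_1 + λ_3 = ξ held fixed, the substitution in Remark 1 can be spelled out as follows (subscripts reconstructed from the surrounding definitions):

```latex
\tilde{Q} = \frac{x_{i1} + x_{i2}}{2} = \frac{4 b_i + c_u - c_o}{4a},
\qquad\text{so}\qquad
Q = \tilde{Q} \iff 2 a Q = 2 b_i + \tfrac{1}{2}(c_u - c_o)
  = 2 \lambda_1 x_i - \lambda_2 + \tfrac{1}{2}(c_u - c_o),
```

hence

```latex
\lambda_2 \;=\; 2 \lambda_1 x_i
  \;+\; \underbrace{\tfrac{1}{2}(c_u - c_o) - 2 a Q}_{\beta_3}
  \;=\; L_{i3}(\lambda_1),
```

which is the cut line with intercept β_3 used by the DD method.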
The solution to (D) can be computed in polynomial time.
Proof.
The dual problem (D) is jointly convex in (λ, Q); hence f(ξ, Q) is jointly convex. For fixed (ξ, Q), f(ξ, Q) can be evaluated, using the DD method, in polynomially many operations to find the (global) minimum of a piecewise convex quadratic function in (λ_1, λ_2). Thus, one can apply a line search method to evaluate h(Q) := min_{ξ > 0} f(ξ, Q) with Q fixed. The constraint ξ > 0 ensures the piecewise quadratics have finite local minima. Finally, one can apply a bisection method to the subgradient of h to evaluate min_{Q ≥ 0} h(Q) to compute the solution to (D). Note that max(c_u, c_o) is an upper bound on the magnitude of the subgradient of h; this establishes Lipschitz continuity of h and ensures the bisection method will complete in polynomial time.
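The outer step can be sketched as bisection on the sign of a (sub)gradient estimate of the convex function h. The sketch below uses a central finite difference in place of an exact subgradient; the actual h would come from the inner DD solves described above.

```python
def bisect_min(h, lo, hi, tol=1e-7, eps=1e-6):
    """Minimize a convex 1-D function h on [lo, hi] by bisecting on the
    sign of a central finite-difference estimate of its (sub)gradient."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(mid + eps) - h(mid - eps) > 0.0:  # positive slope: go left
            hi = mid
        else:                                  # non-positive slope: go right
            lo = mid
    return 0.5 * (lo + hi)
```

Convexity is what makes the sign test valid at every midpoint, and the Lipschitz bound on the subgradient noted above is what bounds the number of iterations.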
3. Case Study
Let us now investigate a practical application of the theory and algorithms developed in this work, using 4 years of monthly U.S. Tesla auto sales data from [5]. Sales are recorded in units of thousands; the lost sales penalty is c_u ≈ 20, whereas the overage penalty is c_o ≈ 10 (in units of thousands of dollars). We compute the (worst case) expected costs (in millions) and the optimal order quantity (in thousands) as a function of ambiguity. Figure 2 shows the trajectories approach the classical (Scarf) limits of approximately 121mm USD in cost and 12,930 in order quantity. Thus, for any alternate demand distribution of Wasserstein distance at most δ that satisfies the moment constraints, one can look up its worst case cost and order quantity. To go further, one could map the level of ambiguity δ to a statistical confidence level α ∈ (0, 1) (that the Wasserstein ball contains the true distribution) using measure concentration results [4].

For simplicity of implementation, a grid search in (λ_1, λ_2), as opposed to the DD method, is used to evaluate f(ξ, Q). A line search is used to evaluate h(Q) := min_{ξ > 0} f(ξ, Q) with Q fixed, and the bisection method is used to evaluate min_{Q ≥ 0} h(Q). The algorithms are coded in Matlab and make use of standard functions such as bisection and fminbnd. No special Matlab toolboxes are needed (although parallel computing via parfor loops for the grid search in (λ_1, λ_2) requires use of that toolbox).
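The nested scheme used in the case study, a grid search inside a line search inside an outer search over Q, has the following shape. The toy objective F below is a stand-in for the dual objective, and ternary_min is a simple fminbnd analogue; neither is the paper's Matlab code.

```python
def ternary_min(f, lo, hi, iters=60):
    """Simple 1-D line search for a unimodal function (fminbnd analogue)."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) <= f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

def solve_nested(F, grid, xi_hi, q_hi):
    """Outer search over Q of h(Q) = min_{xi > 0} min_{(l1, l2) in grid} F."""
    def f(xi, Q):                        # grid search in (lambda_1, lambda_2)
        return min(F(l1, l2, xi, Q) for l1 in grid for l2 in grid)
    def h(Q):                            # line search over xi > 0
        xi = ternary_min(lambda t: f(t, Q), 1e-6, xi_hi)
        return f(xi, Q)
    Q = ternary_min(h, 0.0, q_hi)        # outer search over Q >= 0
    return Q, h(Q)
```

The grid search trades the DD method's polynomial guarantee for implementation simplicity, which is exactly the trade-off described above; only the outermost loop needs convexity in Q.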
4. Conclusions and Further Work
This work has analyzed the distributionally robust newsvendor model (with moment constraints) using Wasserstein distance as the ambiguity measure. The robust newsvendor decision problem and requisite preliminary material were covered in Section 1. The simpler dual problem was formulated and solved in Section 2; furthermore, a polynomial time algorithm, using the DD method, was developed to compute the solution. In Section 3, we presented a case study on Tesla automobile sales. Finally, we conclude with some commentary on directions for further research.

One direction for future research would be to analyze DRNV in a multiproduct and/or multiperiod setting, using the tools of semidefinite programming (SDP) and multi-stage optimization. Another direction would be to explore risk-averse formulations of DRNV. Finally, a third direction would be to investigate alternate forms for the objective function.
Figure 1: DD Method: Plots for ξ = 1. (a) L_{ij} Lines; (b) Surface Plot.
Figure 2:
Costs and Optimal Order Quantity
(Horizontal axis: Delta; series: Cost (mm) and Q (thousands).)
Data and Code Availability Statement
The raw and/or processed data and Matlab code required to reproduce the findings from this research can be obtained from the corresponding author, [D.S.], upon reasonable request.
Conflict of Interest Statement
The authors declare they have no conflict of interest.
Funding Statement
The authors received no specific funding for this work.
References
[1] Blanchet, J., Chen, L., Zhou, X.Y., 2018. Distributionally robust mean-variance portfolio selection with Wasserstein distances. arXiv preprint arXiv:1802.04885.
[2] Blanchet, J., Kang, Y., Murthy, K., 2019. Robust Wasserstein profile inference and applications to machine learning. Journal of Applied Probability 56, 830–857.
[3] Blanchet, J., Murthy, K., 2019. Quantifying distributional model risk via optimal transport. Mathematics of Operations Research 44, 565–600.
[4] Carlsson, J.G., Behroozi, M., Mihic, K., 2018. Wasserstein distance and the distributionally robust TSP. Operations Research 66, 1603–1624.
[5] CarSalesBase, 2020. Tesla car sales USA. https://carsalesbase.com/us-tesla/.
[6] Esfahani, P.M., Kuhn, D., 2018. Data-driven distributionally robust optimization using the Wasserstein metric: Performance guarantees and tractable reformulations. Mathematical Programming 171, 115–166.
[7] Gallego, G., Moon, I., 1993. The distribution free newsboy problem: review and extensions. Journal of the Operational Research Society 44, 825–834.
[8] Gao, R., Kleywegt, A., 2016. Distributionally robust stochastic optimization with Wasserstein distance. arXiv preprint arXiv:1604.02199.
[9] Hanasusanto, G.A., Kuhn, D., Wallace, S.W., Zymler, S., 2015. Distributionally robust multi-item newsvendor problems with multimodal demand distributions. Mathematical Programming 152, 1–32.
[10] Isii, K., 1962. On sharpness of Tchebycheff-type inequalities. Annals of the Institute of Statistical Mathematics 14, 185–197.
[11] Lee, S., Kim, H., Moon, I., 2020. A data-driven distributionally robust newsvendor model with a Wasserstein ambiguity set. Journal of the Operational Research Society.
[12] Natarajan, K., Sim, M., Uichanco, J., 2018. Asymmetry and ambiguity in newsvendor models. Management Science 64, 3146–3167.
[13] Scarf, H., 1958. A min-max solution of an inventory problem. Studies in the Mathematical Theory of Inventory and Production.
[14] Singh, D., Zhang, S., 2020. Tight bounds for a class of data-driven distributionally robust risk measures. arXiv preprint arXiv:2010.05398.
[15] Zhao, C., Guan, Y., 2018. Data-driven risk-averse stochastic optimization with Wasserstein metric. Operations Research Letters 46, 262–267.
A. Proof of Proposition 2.1
Proposition 2.1 (restated). Let a > 0 and Q ≥ 0; then

sup_{x ≥ 0} g_i(x, λ, Q) = { g_{i0}, if (λ_1, λ_2) ∈ R̃_{i0}; g_{i1}, if (λ_1, λ_2) ∈ R̃_{i1}; g_{i2}, if (λ_1, λ_2) ∈ R̃_{i2} },

where x_{i1} = (2 b_i − c_o)/(2a), x_{i2} = (2 b_i + c_u)/(2a), c̃ = c_u + c_o, and the functions g_{ij}, the λ_2 intercepts β_1, β_2, β_3, β_4, and the four cases of regions R̃_{ij} are exactly as stated in Proposition 2.1 of Section 2; the mapping of regions to optimal solutions x* is given by Table 2, which repeats Table 1.

Table 2
Mapping of Regions to Optimal Solutions
Region R̃_{ij} | Optimal Solution x*
R̃_{i0} | 0
R̃_{i1} | x_{i1}
R̃_{i2} | x_{i2}

Proof.
It turns out to be useful to solve the unconstrained problem sup_{x ∈ ℝ} g_i first. To start, use the max relation

(Q − x)_+ = (Q − x) + (x − Q)_+   (1)

to express the function g_i(x, λ, Q) as

g_i = (c_u + c_o)(x − Q)_+ + c_o (Q − x) − a x² + 2 b_i x.   (2)

The first order optimality conditions (for left and right derivatives) for the unconstrained sup_{x ∈ ℝ} g_i say that

(c_u + c_o) 𝟙_{(0,∞)}(x − Q) − c_o − 2 a x + 2 b_i ≥ 0,   (3)
(c_u + c_o) 𝟙_{[0,∞)}(x − Q) − c_o − 2 a x + 2 b_i ≤ 0,   (4)

which leads to three cases for the critical point x*.
Case 1. x* > Q ⟹ x* = x_{i2}, for Q < x_{i2}.
Case 2. x* < Q ⟹ x* = x_{i1}, for Q > x_{i1}.
Case 3. x* = Q. This case violates the first order optimality conditions and hence does not occur.
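Before treating the cases for Q below, a short verification, with the subscripts as reconstructed here, of the crossover value Q̃ between the two interior candidates: using c̃ = c_u + c_o = 2a(x_{i2} − x_{i1}),

```latex
\hat{g}_i - \check{g}_i
= \tilde{c}\,(x_{i2} - Q) + 2 a x_{i1} x_{i2} - a x_{i2}^2 - a x_{i1}^2
= 2a (x_{i2} - x_{i1})(x_{i2} - Q) - a (x_{i2} - x_{i1})^2
= a (x_{i2} - x_{i1}) \left( x_{i1} + x_{i2} - 2Q \right),
```

so that ĝ_i ≥ ǧ_i exactly when Q ≤ Q̃ = (x_{i1} + x_{i2})/2, consistent with equating the function values of ĝ_i and ǧ_i.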
Consequently, there are three cases for Q.
Case 1. Q ≤ x_{i1}. Then x* = x_{i2} ⟹ g_i = ĝ_i = c̃ (x_{i2} − Q) + 2 a x_{i1} x_{i2} − a x_{i2}² + c_o Q.
Case 2. Q ≥ x_{i2}. Then x* = x_{i1} ⟹ g_i = ǧ_i = a x_{i1}² + c_o Q.
Case 3. Q ∈ (x_{i1}, x_{i2}). Then x* ∈ {x_{i1}, x_{i2}} ⟹ g_i = max(ĝ_i, ǧ_i) ⟹ g_i = ĝ_i if x_{i1} < Q ≤ Q̃, and g_i = ǧ_i if Q̃ < Q < x_{i2}.

The value (x_{i1} + x_{i2})/2 for Q̃ is derived by equating the function values of ĝ_i and ǧ_i. Therefore, for the unconstrained problem, the (λ_1 ≥ 0, λ_2) half-plane is partitioned into two regions, above and below the cut line Q = Q̃ = (x_{i1} + x_{i2})/2. Now let us consider the constrained problem sup_{x ≥ 0} g_i. We can reason that the optimal solution x* satisfies x* ∈ S := {0, x_{i1}, x_{i2}}. It remains to determine on which regions of the half-plane x* takes each of the values in S. As it turns out, there are four cases, depending on the relative positions of the intercepts {β_1, β_2, β_3, β_4}.

Case 1. β_3 ≥ β_2 ∩ β_4 ∈ [β_2, β_1]. Refer to Figure 3 and Table 3 for the analysis. Let us work with the following form for the equations of the lines L_{ij}, with intercepts β_j, that cut the half-plane into regions:

λ_2 = 2 λ_1 x_i + β_j.   (5)

Table 3
Intercepts
Intercept β_j | Interpretation
β_1 | x_{i2} = 0
β_2 | x_{i1} = 0
β_3 | g_{i1} = g_{i2}
β_4 | g_{i0} = g_{i2}

Figure 3: Case 1: Cut Lines for the (λ_1 ≥ 0, λ_2) Half-Plane
Figure 4: Case 2: Cut Lines for the (λ_1 ≥ 0, λ_2) Half-Plane

Also define the sets S_1 := {x_{i1}, x_{i2}} and S_2 := {0, x_{i2}}. The only choice for x* ∈ S for R_{i1} is 0. Since β_1 > β_4, the choice x_{i2} for x* is not available above β_1. Furthermore, the choice for x* in R_{i4} is x_{i2}. Some algebra gives that the choice for x* ∈ S_2 is 0 for R_{i2} and x_{i2} for R_{i3}. Substituting the appropriate x* per region into g_i gives the result.

Case 2. β_3 < β_2 ∩ β_4 ∈ [β_2, β_1]. Refer to Figure 4 and Table 3 for the analysis. The approach is similar to Case 1. The only choice for x* for R_{i1} is 0. As before, the choice for x* ∈ S_2 is 0 for R_{i2} and x_{i2} for R_{i3}. Some algebra gives that the choice for x* ∈ S_1 is x_{i2} for R_{i4} and x_{i1} for R_{i5}. As before, substitution for x* per region into g_i gives the result.

Case 3. β_3 < β_2 ∩ β_4 ∉ [β_2, β_1]. Refer to Figure 5 and Table 3 for the analysis. The only choice for x* for R_{i1} is 0. Since β_4 < β_2, the choice for

Figure 5: Case 3: Cut Lines for the (λ_1 ≥ 0, λ_2) Half-Plane
Figure 6: