Distributionally robust second-order stochastic dominance constrained optimization with Wasserstein distance
Yu Mei, Jia Liu, Zhiping Chen∗
School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an, 710049, P. R. China
Center for Optimization Technique and Quantitative Finance, Xi'an International Academy for Mathematics and Mathematical Technology, Xi'an, 710049, P. R. China
∗ Corresponding author. Email: [email protected]
Abstract
We consider a distributionally robust second-order stochastic dominance constrained optimization problem, where the true distribution of the uncertain parameters is ambiguous. The ambiguity set contains all probability distributions close to the empirical distribution under the Wasserstein distance. We adopt the sample approximation technique to develop a linear programming formulation that provides a lower bound. We propose a novel split-and-dual decomposition framework which provides an upper bound. We prove that both the lower and upper bound approximations are asymptotically tight when there are enough samples or pieces. We present a quantitative error estimate for the upper bound under a specific constraint qualification condition. To efficiently solve the non-convex upper bound problem, we use a sequential convex approximation algorithm. Numerical evidence on a portfolio selection problem validates the efficiency and asymptotic tightness of the two proposed approximation methods.
Keywords: stochastic dominance; distributionally robust optimization; Wasserstein distance; sequential convex approximation
MSC:
1 Introduction

Stochastic dominance (SD), which originated in economics, is a popular tool for comparing random outcomes. In their pioneering work [7], Dentcheva and Ruszczyński studied stochastic optimization problems with univariate SD constraints, for which they developed necessary and sufficient optimality conditions and a duality theory. The most commonly adopted univariate SD concepts in stochastic optimization are first-order SD (FSD) and second-order SD (SSD). Researchers have investigated stochastic optimization with FSD constraints from different aspects, such as stability and sensitivity analysis [5], integer and linear programming formulations [26], and linear programming relaxations [31]. Stochastic optimization with SSD constraints has been intensively studied as well. On the theoretical side, stability and sensitivity analysis was presented in [6]. On the computational side, different linear programming formulations were derived in [7, 26], and cutting-plane ideas were adopted in [12, 33, 34]. Stochastic programs with SD constraints induced by mixed-integer linear recourse were studied in [15] for FSD and in [14] for SSD. Stochastic optimization problems with multivariate extensions of SD constraints were considered in [18, 19, 30]: Haskell et al. [18] defined multivariate SD using multivariate utility functions, while Noyan and Rudolf [30] used scalarization functions to transform a multivariate random vector into a univariate random variable. To solve multivariate SD constrained stochastic optimization problems, Haskell et al. developed primal-dual algorithms [18], while Noyan and Rudolf adopted a cut generation method [30]. There is also a rich literature considering SD under dynamic settings, such as time stochastic dominance [11]. Applications of optimization with SD constraints in finance were investigated in [4, 8, 20].

In practice, the true distribution of the uncertain parameters is rarely known exactly, and distributionally robust optimization hedges against this ambiguity by optimizing over a set of plausible distributions. Common constructions of the ambiguity set include moment conditions, the φ-divergence [22], and the Wasserstein distance [1, 21, 24, 28, 35]. Mohajerin Esfahani and Kuhn [29] estimated the a priori probability that the true distribution belongs to the Wasserstein ambiguity set and established finite-sample and asymptotic guarantees for distributionally robust solutions. Using duality theory, researchers have reformulated distributionally robust optimization under the Wasserstein ambiguity set as convex programs [13, 29, 38]. Such reformulations were then applied to chance-constrained distributionally robust optimization problems [1, 35].

Incorporating the ideas of SD and distributional robustness, Dentcheva and Ruszczyński first introduced distributionally robust SD in [9], where they established optimality conditions for stochastic optimization with distributionally robust SSD constraints. Since then, a stream of research has paid attention to stochastic optimization with distributionally robust SD constraints. Considering distributional uncertainty, Dupačová and Kopa [10] exploited the contamination technique to derive a robust decision that is FSD efficient. Guo, Xu, and Zhang [16] proposed a discrete approximation scheme for moment-based ambiguity sets and approximately solved the resulting stochastic optimization problem with distributionally robust SSD constraints. Also under a moment-based ambiguity set, Liesiö, Xu, and Kuosmanen [23] developed models that identify a portfolio that robustly SSD-dominates a given benchmark.
In addition to distributionally robust FSD and SSD, some studies have also focused on robust k-th order SD with k > 2. The stability of distributionally robust optimization problems with k-th order distributionally robust SD constraints induced by full random recourse was established in [2, 37]. Besides, multivariate extensions of distributionally robust SD, together with optimality conditions and duality theory for the resulting stochastic optimization problems, were discussed in [3, 17].

As mentioned above, SD constrained optimization under distributional ambiguity is an important issue in many practical applications such as financial decision making. However, to the best of our knowledge, distributionally robust SSD constrained optimization with the Wasserstein distance has not been studied in the existing literature. The main difficulties of solving such problems lie in three aspects.

• The two levels of semi-infinite constraints, with respect to the SSD constraints and the distributional robustness, are the main challenge in solving distributionally robust SSD constrained optimization problems.

• Distributionally robust SSD constraints are non-smooth, which makes gradient-based methods fail to work here.

• Compared with moment-based ambiguity sets, the ambiguity set defined by the Wasserstein distance contains an extra optimization problem computing the optimal transportation between the true and reference distributions. Such an inner-level optimization problem leads to a min-max structure and to the non-convexity of the distributionally robust SSD constraints.

Therefore, it is quite challenging to study approximation schemes and algorithms for the distributionally robust SSD constrained optimization problem under the Wasserstein distance. Thanks to the recent rapid development of the strong duality theory for distributionally robust optimization under the Wasserstein distance [13, 29], we have a chance to present, in this paper, efficient approximation methods for this problem.

In detail, we first utilize the duality theory in [13] to derive a dual reformulation of the distributionally robust SSD constraints. Then we adopt a sampling technique to approximate the infinitely many constraints by finitely many constraints, and develop a linear programming formulation that provides a lower bound approximation for the problem, which is asymptotically tight as the sample size goes to infinity. To overcome the 'curse of dimensionality' of the linear programming approximation, we propose a novel split-and-dual decomposition framework: we separate the support set of the parameter in the distributionally robust SSD constraints into finitely many sub-intervals, and for each sub-interval we exchange the order of the supremum and the expectation to get an upper bound approximation. We prove that the optimal value of the upper bound approximation converges to that of the original problem as the number of sub-intervals goes to infinity, and we also quantitatively estimate the approximation error. As the derived upper bound approximation problem is non-convex, we apply a sequential convex approximation method to solve it.

This paper improves upon the results of quite a few papers. Specifically, we extend distributionally robust optimization with the Wasserstein distance [13, 35] to a more complicated case with infinitely many constraints induced by SSD. Compared with the robust SD constrained optimization problems in [16, 23, 37], we study the ambiguity set defined by the Wasserstein distance rather than moment-based ambiguity sets.
The main contributions of this paper include:

• For the first time, we study distributionally robust SSD constrained optimization with the Wasserstein distance.

• We adopt the sample approximation technique to develop a linear programming formulation that provides a lower bound, and prove that the lower bound approximation is asymptotically tight when there are enough sample points.

• We propose a novel split-and-dual decomposition framework, which provides an upper bound approximation of the problem. In the existing literature, upper bounds for SD constrained problems are seldom studied. We prove the asymptotic tightness of the split-and-dual decomposition method and quantitatively estimate the approximation error as the number of sub-intervals goes to infinity.

The rest of this paper is organized as follows. In Section 2, we recall distributionally robust SSD and specify the ambiguity set as a Wasserstein ball. In Sections 3 and 4, we elaborate on distributionally robust SSD constrained optimization with the Wasserstein distance in detail: we develop a linear programming formulation to obtain a lower approximation, and solve a sequence of second-order cone programming problems for an upper approximation. Numerical evidence validating the efficiency and asymptotic tightness of the proposed approximation methods is presented in Section 5. Section 6 concludes the paper.
2 Distributionally robust second-order stochastic dominance

We first introduce some notation. Let U be the set of all non-decreasing and concave functions u : R → R. We use (·)_+ = max{·, 0} to denote the positive part function. Let (Ω, F) be a measurable space with F being the Borel σ-algebra on Ω, and let M be the set of all probability measures on (Ω, F).

Before introducing distributionally robust second-order stochastic dominance, we recall the definition of classic second-order stochastic dominance. Consider integrable random variables X and Y on a probability space (Ω, F, P), where P ∈ M is the true distribution. We say that X stochastically dominates Y in the second order, denoted by X ⪰_P^{(2)} Y, if E_P[u(X)] ≥ E_P[u(Y)] for all u ∈ U. The relation X ⪰_P^{(2)} Y is equivalent to

    E_P[(η − X)_+ − (η − Y)_+] ≤ 0, ∀ η ∈ R. (1)

Let Y be the set of all realizations of the random variable Y. It has been shown in [25, Proposition 1] that (1) is equivalent to

    E_P[(η − X)_+ − (η − Y)_+] ≤ 0, ∀ η ∈ Y. (2)

In practical applications, it is very difficult to know the full information about the true probability measure P. To address the lack of perfect information about P, Dentcheva and Ruszczyński introduced distributionally robust second-order stochastic dominance in [9] by considering an ambiguity set of probability measures instead of P.

Definition 1. X dominates Y robustly in the second order over a set of probability measures Q ⊂ M, denoted by X ⪰_Q^{(2)} Y, if E_P[u(X)] ≥ E_P[u(Y)], ∀ u ∈ U, ∀ P ∈ Q.

It is known from (1) that X ⪰_Q^{(2)} Y is equivalent to

    E_P[(η − X)_+ − (η − Y)_+] ≤ 0, ∀ η ∈ R, ∀ P ∈ Q. (3)

In the rest of this paper, we investigate the following distributionally robust second-order stochastic dominance constrained optimization problem

    (P_SSD)  min_{z ∈ Z} f(z)
             s.t. z^T ξ ⪰_Q^{(2)} z_0^T ξ,

where ξ denotes the random vector, Z is a bounded polyhedral set, and z_0 ∈ Z is a given benchmark. From (3), problem (P_SSD) can be rewritten as

    min_{z ∈ Z} f(z)
    s.t. E_P[(η − z^T ξ)_+ − (η − z_0^T ξ)_+] ≤ 0, ∀ η ∈ R, ∀ P ∈ Q. (4)

We can observe that problem (4) has two levels of semi-infinite constraints, indexed by η ∈ R and P ∈ Q, induced by the SSD constraints and the distributionally robust ambiguity set, respectively. Moreover, the constraint functions in problem (4) are non-smooth since (·)_+ is involved. Therefore, problem (4), as well as problem (P_SSD), is hard to solve. To reduce the difficulties in solving problem (P_SSD), we first assume that the support set Ξ is bounded and has a polyhedral structure. The boundedness of Ξ helps us reduce the index set R to a compact set [25]. The polyhedral structure of Ξ, also assumed in [29, Corollary 5.1], allows us to apply the duality theory of second-order cone programming when deriving the upper bound approximation later in this paper.

Assumption 1. Ξ is bounded and polyhedral, i.e., Ξ = {ξ | Cξ ≤ d}, where C ∈ R^{l×n}, d ∈ R^l.

Keeping in mind the equivalence of (1) and (2), problem (4) can be formulated as

    min_{z ∈ Z} f(z)
    s.t. E_P[(η − z^T ξ)_+ − (η − z_0^T ξ)_+] ≤ 0, ∀ η ∈ R, ∀ P ∈ Q, (5)

where R := z_0^T Ξ. By Assumption 1, R is a compact set, which is much easier to handle than the whole real line R. We denote the smallest and largest numbers in R by R_min and R_max, respectively, that is, R = [R_min, R_max].
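To make condition (2) concrete, the following minimal Python sketch checks empirical SSD between two equally weighted sample vectors; the function name and the synthetic data are illustrative only and are not part of the paper's implementation.

```python
import numpy as np

def ssd_dominates(x, y):
    """Check empirically whether X second-order stochastically dominates Y,
    i.e., E[(eta - X)_+] <= E[(eta - Y)_+] for every eta among the
    realizations of Y (condition (2)); samples are equally weighted."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    for eta in np.unique(y):
        lhs = np.mean(np.maximum(eta - x, 0.0))  # E[(eta - X)_+]
        rhs = np.mean(np.maximum(eta - y, 0.0))  # E[(eta - Y)_+]
        if lhs > rhs + 1e-12:
            return False
    return True

rng = np.random.default_rng(0)
y = rng.normal(0.0, 1.0, 1000)
x = y + 0.5                      # a uniform upward shift preserves dominance
print(ssd_dominates(x, y))       # True
print(ssd_dominates(y, x))       # False
```

Restricting the check to the realizations of Y, rather than all of R, is exactly what the equivalence of (1) and (2) permits.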
2.1 Wasserstein ambiguity set

In this subsection, we introduce the data-driven Wasserstein ambiguity set Q and recall a fundamental duality result for distributionally robust optimization problems under the Wasserstein ambiguity set [13, 29, 38]. To begin with, let M(Ξ) be the space of all probability measures Q supported on Ξ with E_Q[‖ξ‖] < ∞. We now recall the definition of the 1-Wasserstein distance.

Definition 2. The 1-Wasserstein distance d : M(Ξ) × M(Ξ) → R_+ is defined via

    d(Q_1, Q_2) := inf { ∫_{Ξ²} ‖ξ_1 − ξ_2‖ Π(dξ_1, dξ_2) : Π is a joint distribution of ξ_1 and ξ_2 with marginals Q_1 and Q_2, respectively }.

Given some observations {ξ̂_i}_{i=1}^N of ξ, we define the data-driven Wasserstein ambiguity set Q as the set of all distributions close to the empirical distribution P̂_N = (1/N) Σ_{i=1}^N δ_{ξ̂_i} under the 1-Wasserstein metric, that is,

    Q := { P ∈ M(Ξ) : d(P, P̂_N) ≤ ǫ }, (6)

where ǫ is a prespecified robust radius. In [29], Mohajerin Esfahani and Kuhn showed that, for any prescribed β ∈ (0, 1) and a suitably chosen radius ǫ(β), the true distribution P belongs to Q with confidence level 1 − β. It is then reasonable to consider the worst-case expectation under the ambiguity set Q.

Under some mild conditions, a nice duality result for distributionally robust optimization problems under the Wasserstein ambiguity set has been established in [29, Theorem 4.2], [13, Corollary 2] and [38, Proposition 2]. We adopt the version in [13] in the rest of this paper.

Lemma 1 [13]. If Ξ is bounded and Ψ(ξ) is upper semi-continuous, then the optimal values of

    sup_{P ∈ M(Ξ)} { ∫_Ξ Ψ(ξ) P(dξ) : d(P, P̂_N) ≤ ǫ }

and

    min_{λ ≥ 0} { λǫ + (1/N) Σ_{i=1}^N sup_{ξ ∈ Ξ} [Ψ(ξ) − λ‖ξ − ξ̂_i‖] }

are equal. Moreover, the optimal solution of the latter problem can always be attained.
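For intuition about Lemma 1, note that the dual form is a one-dimensional convex minimization over λ once the inner suprema are computable. The sketch below, assuming a one-dimensional Ξ discretized by a grid and a test function Ψ of our own choosing (both assumptions made for illustration), evaluates the dual bound numerically.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Finite grid standing in for the bounded support set Xi (the paper
# works with a polyhedral Xi; a grid suffices for this illustration).
grid = np.linspace(0.0, 1.0, 201)
psi = lambda xi: np.maximum(0.6 - xi, 0.0)   # an upper semi-continuous test function
xi_hat = np.array([0.2, 0.5, 0.9])           # observations defining P_hat_N
eps = 0.05                                   # Wasserstein radius

def dual_objective(lam):
    # lam*eps + (1/N) * sum_i sup_xi [ psi(xi) - lam*|xi - xi_hat_i| ]
    inner = [np.max(psi(grid) - lam * np.abs(grid - x)) for x in xi_hat]
    return lam * eps + np.mean(inner)

# The dual objective is convex and piecewise linear in lam, so a bounded
# scalar minimization is enough for this sketch.
res = minimize_scalar(dual_objective, bounds=(0.0, 100.0), method="bounded")
print("worst-case expectation (dual bound):", res.fun)
print("empirical expectation:", np.mean(psi(xi_hat)))
```

The dual bound is always at least the empirical expectation, and it shrinks toward it as the radius eps decreases.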
Later on, we will derive a lower bound approximation for problem (P_SSD) in Section 3 and an upper bound approximation in Section 4. The relationship between the formulations appearing in the intermediate steps of the two approximation schemes is illustrated in Figure 1.

    (P_SSD) ⇔ (4) ⇔ (5) ⇔ (7) ⇒[lower bound] (8) ⇔ (14) ⇔ (15)-(18) ⇔ (P_SSD−L)
    (5) ⇔ (20) ⇔ (22)-(23) ⇒[upper bound] (24) ⇔ (25)-(26) ⇔ (P_SSD−U)

Figure 1: The relationship of formulations in intermediate steps of the two approximation schemes.

The key reformulation or approximation steps in the two approximation schemes can be summarized as follows:

1) Reformulations (P_SSD) ⇔ (4) and (4) ⇔ (5) are due to the definition of robust SSD and Assumption 1.

2) Reformulation (5) ⇔ (7) is due to the duality theory of Wasserstein robust optimization from Lemma 1; approximation (7) ⇒ (8) comes from the finite discrete approximation; reformulation (8) ⇔ (14) is a simple rewriting; reformulations (14) ⇔ (15)-(18) and (15)-(18) ⇔ (P_SSD−L) are obtained by adding auxiliary variables.

3) We propose a split-and-dual decomposition framework for the upper bound approximation. In detail, we exchange the order of the two suprema equivalently in (5) ⇔ (20); we split the interval R into sub-intervals in the reformulation (20) ⇔ (22)-(23); we exchange the order of the expectation and the supremum to derive the upper bound approximation (22)-(23) ⇒ (24); reformulation (24) ⇔ (25)-(26) is due to the duality theory of Wasserstein robust optimization from Lemma 1; reformulation (25)-(26) ⇔ (P_SSD−U) is due to the strong duality of second-order cone programming.

3 Lower bound approximation

We start with the lower bound approximation. Note that problem (5) can be written as

    min_{z ∈ Z} f(z)
    s.t. sup_{η ∈ R} sup_{P ∈ Q} E_P[(η − z^T ξ)_+ − (η − z_0^T ξ)_+] ≤ 0,

whose optimal value, by Lemma 1, is equal to that of

    min_{z ∈ Z} f(z)
    s.t. sup_{η ∈ R} min_{λ ≥ 0} { λǫ − (1/N) Σ_{i=1}^N inf_{ξ ∈ Ξ} [ λ‖ξ − ξ̂_i‖ − (η − z^T ξ)_+ + (η − z_0^T ξ)_+ ] } ≤ 0. (7)

Two difficulties arise in solving problem (7): 1) taking the infimum over ξ ∈ Ξ, where the function λ‖ξ − ξ̂_i‖ − (η − z^T ξ)_+ + (η − z_0^T ξ)_+ is not convex w.r.t. ξ; 2) taking the supremum over η ∈ R, where the function min_{λ ≥ 0} { λǫ − (1/N) Σ_{i=1}^N inf_{ξ ∈ Ξ} [ λ‖ξ − ξ̂_i‖ − (η − z^T ξ)_+ + (η − z_0^T ξ)_+ ] } is not convex w.r.t. η, either.

In order to tackle these difficulties, a natural approach is to approximate the sets Ξ and R by finite subsets. The non-convex min-max problem then reduces to a tractable approximation problem with limited enumeration. Let Ξ_{N_1} = {ξ̄_j}_{j=1}^{N_1} be a set of finitely many samples in Ξ, and Γ_{N_2} = {η_k}_{k=1}^{N_2} be a set of finitely many samples in R = [R_min, R_max], where N_1 and N_2 denote the sample sizes. We then have the following approximation of problem (7):

    min_{z ∈ Z} f(z) (8)
    s.t. max_{1 ≤ k ≤ N_2} min_{λ ≥ 0} { λǫ − (1/N) Σ_{i=1}^N min_{1 ≤ j ≤ N_1} [ λ‖ξ̄_j − ξ̂_i‖ − (η_k − z^T ξ̄_j)_+ + (η_k − z_0^T ξ̄_j)_+ ] } ≤ 0.

In subsection 3.1, we prove that problem (8) is a lower bound approximation of problem (7). Besides, when the sample sizes N_1 and N_2 go to infinity, problem (8) converges to problem (7) in the sense of the feasible solution set, the optimal solution set and the optimal value. In subsection 3.2, we show how problem (8) can be reformulated as a linear programming problem, and how the computational efficiency for large sample sizes can be further improved by a cutting-plane method.

3.1 The lower bound approximation: (7) ⇒ (8)

First, we have the following proposition.
Proposition 1.
Problem (8) is a lower bound approximation of problem (7).

Proof. Observe that for any η ∈ R and λ ≥ 0, we have

    λǫ − (1/N) Σ_{i=1}^N inf_{ξ ∈ Ξ} [ λ‖ξ − ξ̂_i‖ − (η − z^T ξ)_+ + (η − z_0^T ξ)_+ ]
    ≥ λǫ − (1/N) Σ_{i=1}^N min_{1 ≤ j ≤ N_1} [ λ‖ξ̄_j − ξ̂_i‖ − (η − z^T ξ̄_j)_+ + (η − z_0^T ξ̄_j)_+ ].

Taking the minimum over λ ≥ 0 on both sides yields

    min_{λ ≥ 0} { λǫ − (1/N) Σ_{i=1}^N inf_{ξ ∈ Ξ} [ λ‖ξ − ξ̂_i‖ − (η − z^T ξ)_+ + (η − z_0^T ξ)_+ ] }
    ≥ min_{λ ≥ 0} { λǫ − (1/N) Σ_{i=1}^N min_{1 ≤ j ≤ N_1} [ λ‖ξ̄_j − ξ̂_i‖ − (η − z^T ξ̄_j)_+ + (η − z_0^T ξ̄_j)_+ ] }.

This implies that

    sup_{η ∈ R} min_{λ ≥ 0} { λǫ − (1/N) Σ_{i=1}^N inf_{ξ ∈ Ξ} [ λ‖ξ − ξ̂_i‖ − (η − z^T ξ)_+ + (η − z_0^T ξ)_+ ] }
    ≥ sup_{η ∈ R} min_{λ ≥ 0} { λǫ − (1/N) Σ_{i=1}^N min_{1 ≤ j ≤ N_1} [ λ‖ξ̄_j − ξ̂_i‖ − (η − z^T ξ̄_j)_+ + (η − z_0^T ξ̄_j)_+ ] }
    ≥ max_{1 ≤ k ≤ N_2} min_{λ ≥ 0} { λǫ − (1/N) Σ_{i=1}^N min_{1 ≤ j ≤ N_1} [ λ‖ξ̄_j − ξ̂_i‖ − (η_k − z^T ξ̄_j)_+ + (η_k − z_0^T ξ̄_j)_+ ] }.

Therefore, the feasible solution set of problem (8) provides an outer approximation to that of problem (7). Since problems (7) and (8) are minimization problems, problem (8) is a lower bound approximation of problem (7).

To establish the theoretical result that problem (8) converges to problem (7), we first need to show that

    max_{1 ≤ k ≤ N_2} min_{λ ≥ 0} { λǫ − (1/N) Σ_{i=1}^N min_{1 ≤ j ≤ N_1} [ λ‖ξ̄_j − ξ̂_i‖ − (η_k − z^T ξ̄_j)_+ + (η_k − z_0^T ξ̄_j)_+ ] }

converges to

    sup_{η ∈ R} min_{λ ≥ 0} { λǫ − (1/N) Σ_{i=1}^N inf_{ξ ∈ Ξ} [ λ‖ξ − ξ̂_i‖ − (η − z^T ξ)_+ + (η − z_0^T ξ)_+ ] }

as N_1 and N_2 go to infinity. To this end, we need the following assumption, which ensures that the subsets Ξ_{N_1} and Γ_{N_2} cover the sets Ξ and R with small neighborhoods, and that these subsets are monotonically increasing with respect to N_1 and N_2 in the sense of set inclusion.
Assumption 2. The sample sets Ξ_{N_1} and Γ_{N_2} satisfy:

a) there exist positive radii ∆_1 = o(N_1^{−1}) and ∆_2 = o(N_2^{−1}) such that for each ξ ∈ Ξ there exists at least one ξ̃ ∈ Ξ_{N_1} in ξ's ∆_1-neighborhood, and for each η ∈ R there exists at least one η̃ ∈ Γ_{N_2} in η's ∆_2-neighborhood;

b) if N_1 ≤ N'_1, then Ξ_{N_1} ⊂ Ξ_{N'_1};

c) if N_2 ≤ N'_2, then Γ_{N_2} ⊂ Γ_{N'_2}.

Additionally, to guarantee the convergence, the robust radius ǫ should not be too small.

Assumption 3. ǫ > max_{ξ_1, ξ_2 ∈ Ξ} ‖ξ_1 − ξ_2‖.

Under Assumption 1, Ξ is bounded, so max_{ξ_1, ξ_2 ∈ Ξ} ‖ξ_1 − ξ_2‖ is finite, and Assumption 3 can be satisfied by some finite positive number ǫ.
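Before moving on, we note one concrete construction, a sketch under the assumption of a box-shaped Ξ (the paper only requires a polyhedral Ξ), that delivers sample sets with the nestedness and covering properties required by Assumption 2: dyadic grids.

```python
import numpy as np
from itertools import product

def nested_grid(lo, hi, level):
    """Uniform grid on the box [lo, hi] with 2**level + 1 points per axis.
    Dyadic refinement makes the grids nested (Assumption 2 b), c)), and the
    covering radius halves at each level (Assumption 2 a))."""
    pts_per_axis = 2 ** level + 1
    axes = [np.linspace(l, h, pts_per_axis) for l, h in zip(lo, hi)]
    return np.array(list(product(*axes)))

g_coarse = nested_grid([0.0, 0.0], [1.0, 1.0], level=2)
g_fine = nested_grid([0.0, 0.0], [1.0, 1.0], level=3)
# Every point of the coarse grid reappears in the fine grid (set inclusion).
coarse_in_fine = all(np.isclose(g_fine, p).all(axis=1).any() for p in g_coarse)
print(len(g_coarse), len(g_fine), coarse_in_fine)   # 25 81 True
```

We then have the following proposition.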
Proposition 2. Under Assumptions 1, 2 and 3, we have

    max_{η ∈ R} min_{λ ≥ 0} { λǫ − (1/N) Σ_{i=1}^N inf_{ξ ∈ Ξ} [ λ‖ξ − ξ̂_i‖ − (η − z^T ξ)_+ + (η − z_0^T ξ)_+ ] }
    = lim_{N_1, N_2 → ∞} max_{1 ≤ k ≤ N_2} min_{λ ≥ 0} { λǫ − (1/N) Σ_{i=1}^N min_{1 ≤ j ≤ N_1} [ λ‖ξ̄_j − ξ̂_i‖ − (η_k − z^T ξ̄_j)_+ + (η_k − z_0^T ξ̄_j)_+ ] }. (9)
Proof. Let η* be optimal for the problem on the left-hand side of (9). Let k̃ ∈ {1, ..., N_2} be an index such that η_{k̃} ∈ Γ_{N_2} lies in η*'s ∆_2-neighborhood. Denote

    λ̃ ∈ argmin_{λ ≥ 0} { λǫ − (1/N) Σ_{i=1}^N min_{1 ≤ j ≤ N_1} [ λ‖ξ̄_j − ξ̂_i‖ − (η_{k̃} − z^T ξ̄_j)_+ + (η_{k̃} − z_0^T ξ̄_j)_+ ] }. (10)

Then we have

    0 ≤ ∆ := max_{η ∈ R} min_{λ ≥ 0} { λǫ − (1/N) Σ_{i=1}^N inf_{ξ ∈ Ξ} [ λ‖ξ − ξ̂_i‖ − (η − z^T ξ)_+ + (η − z_0^T ξ)_+ ] }
             − max_{1 ≤ k ≤ N_2} min_{λ ≥ 0} { λǫ − (1/N) Σ_{i=1}^N min_{1 ≤ j ≤ N_1} [ λ‖ξ̄_j − ξ̂_i‖ − (η_k − z^T ξ̄_j)_+ + (η_k − z_0^T ξ̄_j)_+ ] }
      ≤ min_{λ ≥ 0} { λǫ − (1/N) Σ_{i=1}^N inf_{ξ ∈ Ξ} [ λ‖ξ − ξ̂_i‖ − (η* − z^T ξ)_+ + (η* − z_0^T ξ)_+ ] }
             − min_{λ ≥ 0} { λǫ − (1/N) Σ_{i=1}^N min_{1 ≤ j ≤ N_1} [ λ‖ξ̄_j − ξ̂_i‖ − (η_{k̃} − z^T ξ̄_j)_+ + (η_{k̃} − z_0^T ξ̄_j)_+ ] }
      ≤ { λ̃ǫ − (1/N) Σ_{i=1}^N inf_{ξ ∈ Ξ} [ λ̃‖ξ − ξ̂_i‖ − (η* − z^T ξ)_+ + (η* − z_0^T ξ)_+ ] }
             − { λ̃ǫ − (1/N) Σ_{i=1}^N min_{1 ≤ j ≤ N_1} [ λ̃‖ξ̄_j − ξ̂_i‖ − (η_{k̃} − z^T ξ̄_j)_+ + (η_{k̃} − z_0^T ξ̄_j)_+ ] }
      = (1/N) Σ_{i=1}^N min_{1 ≤ j ≤ N_1} [ λ̃‖ξ̄_j − ξ̂_i‖ − (η_{k̃} − z^T ξ̄_j)_+ + (η_{k̃} − z_0^T ξ̄_j)_+ ]
             − (1/N) Σ_{i=1}^N inf_{ξ ∈ Ξ} [ λ̃‖ξ − ξ̂_i‖ − (η* − z^T ξ)_+ + (η* − z_0^T ξ)_+ ]. (11)

For each i, denote ξ* ∈ argmin_{ξ ∈ Ξ} [ λ̃‖ξ − ξ̂_i‖ − (η* − z^T ξ)_+ + (η* − z_0^T ξ)_+ ]; the infimum is attained because Ξ is compact (Assumption 1) and λ̃‖ξ − ξ̂_i‖ − (η* − z^T ξ)_+ + (η* − z_0^T ξ)_+ is continuous with respect to ξ. Let j̃ ∈ {1, ..., N_1} be an index such that ξ̄_{j̃} ∈ Ξ_{N_1} lies in ξ*'s ∆_1-neighborhood. Then, by (11),

    ∆ ≤ (1/N) Σ_{i=1}^N { λ̃‖ξ̄_{j̃} − ξ̂_i‖ − (η_{k̃} − z^T ξ̄_{j̃})_+ + (η_{k̃} − z_0^T ξ̄_{j̃})_+ − λ̃‖ξ* − ξ̂_i‖ + (η* − z^T ξ*)_+ − (η* − z_0^T ξ*)_+ }
      ≤ (λ̃ + ‖z‖ + ‖z_0‖) ‖ξ̄_{j̃} − ξ*‖ + 2|η_{k̃} − η*|
      ≤ (λ̃ + ‖z‖ + ‖z_0‖) ∆_1 + 2∆_2. (12)

Letting N_1, N_2 → ∞, it follows from Assumption 2 a) that ∆_1, ∆_2 → 0. Furthermore, λ̃ is bounded (see Lemma 2), z ∈ Z is also bounded, and these bounds are independent of N_1 and N_2. Hence lim_{N_1, N_2 → ∞} ∆ = 0, and we obtain (9).
Remark 1. From the proof, one can see that the conclusion of Proposition 2 still holds if the condition in Assumption 2 a) is relaxed to ∆_1 = O(N_1^{−1}) or ∆_2 = O(N_2^{−1}).

In fact, λ̃ defined in (10) depends on N_1 and N_2. It is obvious from (10) that λ̃ depends on N_1; in addition, the choice of η_{k̃} in (10) depends on N_2. We therefore write λ̃ explicitly as a function of N_1 and N_2:

    λ̃(N_1, N_2) ∈ argmin_{λ ≥ 0} { λǫ − (1/N) Σ_{i=1}^N min_{1 ≤ j ≤ N_1} [ λ‖ξ̄_j − ξ̂_i‖ − (η_{k̃} − z^T ξ̄_j)_+ + (η_{k̃} − z_0^T ξ̄_j)_+ ] }.
Lemma 2. Under Assumptions 1, 2 b) and 3, for any positive integers N_1 and N_2, we have λ̃(1, 1) = λ̃(1, N_2) ≥ λ̃(N_1, N_2).

Proof. Firstly, we show that λ̃(N'_1, N_2) ≥ λ̃(N_1, N_2) if N'_1 ≤ N_1. Assume, on the contrary, that λ̃(N'_1, N_2) < λ̃(N_1, N_2). From Assumption 2 b) and the optimality of λ̃(N_1, N_2) under Ξ_{N_1} = {ξ̄_j}_{j=1}^{N_1} and Γ_{N_2} = {η_k}_{k=1}^{N_2}, we immediately obtain

    λ̃(N_1, N_2) ǫ − (1/N) Σ_{i=1}^N min_{1 ≤ j ≤ N_1} [ λ̃(N_1, N_2)‖ξ̄_j − ξ̂_i‖ − (η_{k̃} − z^T ξ̄_j)_+ + (η_{k̃} − z_0^T ξ̄_j)_+ ]
    ≤ λ̃(N'_1, N_2) ǫ − (1/N) Σ_{i=1}^N min_{1 ≤ j ≤ N_1} [ λ̃(N'_1, N_2)‖ξ̄_j − ξ̂_i‖ − (η_{k̃} − z^T ξ̄_j)_+ + (η_{k̃} − z_0^T ξ̄_j)_+ ].

Then we have

    (λ̃(N_1, N_2) − λ̃(N'_1, N_2)) ǫ
    ≤ (1/N) Σ_{i=1}^N { min_{1 ≤ j ≤ N_1} [ λ̃(N_1, N_2)‖ξ̄_j − ξ̂_i‖ − (η_{k̃} − z^T ξ̄_j)_+ + (η_{k̃} − z_0^T ξ̄_j)_+ ]
        − min_{1 ≤ j ≤ N_1} [ λ̃(N'_1, N_2)‖ξ̄_j − ξ̂_i‖ − (η_{k̃} − z^T ξ̄_j)_+ + (η_{k̃} − z_0^T ξ̄_j)_+ ] } (13)
    ≤ (1/N) Σ_{i=1}^N max_{1 ≤ j ≤ N_1} [ (λ̃(N_1, N_2) − λ̃(N'_1, N_2)) ‖ξ̄_j − ξ̂_i‖ ]
    < (λ̃(N_1, N_2) − λ̃(N'_1, N_2)) ǫ,

where the last inequality is implied by Assumption 3. Then (13) yields a contradiction, so λ̃(N'_1, N_2) < λ̃(N_1, N_2) cannot hold. Hence λ̃(1, N_2) ≥ λ̃(N_1, N_2) for any positive integers N_1 and N_2.

Now we investigate

    λ̃(1, N_2) ∈ argmin_{λ ≥ 0} { λǫ − (1/N) Σ_{i=1}^N [ λ‖ξ̄_1 − ξ̂_i‖ − (η_{k̃} − z^T ξ̄_1)_+ + (η_{k̃} − z_0^T ξ̄_1)_+ ] }.

It is easy to observe that the optimal solution λ̃(1, N_2) does not change with the choice of η_{k̃}. Hence λ̃(1, N_2) = λ̃(1, 1) for any positive integer N_2. Therefore, λ̃(N_1, N_2) is bounded by λ̃(1, 1) for any positive integers N_1 and N_2.

Now it is time to establish the theoretical result that problem (8) converges to problem (7) as N_1 and N_2 tend to infinity. We denote the feasible solution sets of problems (7) and (8) by F and F_{N_1,N_2}, the optimal solution sets by S and S_{N_1,N_2}, and the optimal values by v and v_{N_1,N_2}, respectively.
Theorem 1. Given Assumptions 2 and 3, we have F = lim_{N_1,N_2→∞} F_{N_1,N_2}, lim sup_{N_1,N_2→∞} S_{N_1,N_2} ⊂ S, and v = lim_{N_1,N_2→∞} v_{N_1,N_2}.

Proof. Firstly, we claim that F_{N_1+1,N_2+1} ⊂ F_{N_1,N_2}. To see this, we use the same idea as in the proof of Proposition 1. From Assumption 2 b), we know that for any η ∈ R and λ ≥ 0,

    λǫ − (1/N) Σ_{i=1}^N min_{1 ≤ j ≤ N_1+1} [ λ‖ξ̄_j − ξ̂_i‖ − (η − z^T ξ̄_j)_+ + (η − z_0^T ξ̄_j)_+ ]
    ≥ λǫ − (1/N) Σ_{i=1}^N min_{1 ≤ j ≤ N_1} [ λ‖ξ̄_j − ξ̂_i‖ − (η − z^T ξ̄_j)_+ + (η − z_0^T ξ̄_j)_+ ].

Taking the minimum over λ ≥ 0 on both sides gives

    min_{λ ≥ 0} { λǫ − (1/N) Σ_{i=1}^N min_{1 ≤ j ≤ N_1+1} [ λ‖ξ̄_j − ξ̂_i‖ − (η − z^T ξ̄_j)_+ + (η − z_0^T ξ̄_j)_+ ] }
    ≥ min_{λ ≥ 0} { λǫ − (1/N) Σ_{i=1}^N min_{1 ≤ j ≤ N_1} [ λ‖ξ̄_j − ξ̂_i‖ − (η − z^T ξ̄_j)_+ + (η − z_0^T ξ̄_j)_+ ] }.

Then, taking the maximum over k, we have

    max_{1 ≤ k ≤ N_2+1} min_{λ ≥ 0} { λǫ − (1/N) Σ_{i=1}^N min_{1 ≤ j ≤ N_1+1} [ λ‖ξ̄_j − ξ̂_i‖ − (η_k − z^T ξ̄_j)_+ + (η_k − z_0^T ξ̄_j)_+ ] }
    ≥ max_{1 ≤ k ≤ N_2+1} min_{λ ≥ 0} { λǫ − (1/N) Σ_{i=1}^N min_{1 ≤ j ≤ N_1} [ λ‖ξ̄_j − ξ̂_i‖ − (η_k − z^T ξ̄_j)_+ + (η_k − z_0^T ξ̄_j)_+ ] }
    ≥ max_{1 ≤ k ≤ N_2} min_{λ ≥ 0} { λǫ − (1/N) Σ_{i=1}^N min_{1 ≤ j ≤ N_1} [ λ‖ξ̄_j − ξ̂_i‖ − (η_k − z^T ξ̄_j)_+ + (η_k − z_0^T ξ̄_j)_+ ] },

where the last inequality is due to Assumption 2 c). Therefore, F_{N_1+1,N_2+1} ⊂ F_{N_1,N_2}.

Next, we show F = lim_{N_1,N_2→∞} F_{N_1,N_2}. Since F_{N_1,N_2} ⊃ F_{N_1+1,N_2+1}, we have by [32, Exercise 4.3] that lim_{N_1,N_2→∞} F_{N_1,N_2} = ∩_{N_1=1,N_2=1}^∞ cl F_{N_1,N_2}. To conclude F = lim_{N_1,N_2→∞} F_{N_1,N_2}, it therefore suffices to prove that F = ∩_{N_1=1,N_2=1}^∞ cl F_{N_1,N_2}. On one hand, F ⊂ F_{N_1,N_2} for all N_1, N_2, which obviously leads to F ⊂ ∩_{N_1=1,N_2=1}^∞ cl F_{N_1,N_2}. On the other hand, for any z ∈ ∩_{N_1=1,N_2=1}^∞ cl F_{N_1,N_2}, we must have z ∈ cl F_{N_1,N_2} for all N_1, N_2. Since F_{N_1,N_2} is closed, z ∈ F_{N_1,N_2} for all N_1, N_2, which means that z satisfies the constraint in problem (8). Letting N_1 → ∞ and N_2 → ∞, Proposition 2 shows that z satisfies the constraint in problem (7); that is, z ∈ F.

Finally, we verify that v = lim_{N_1,N_2→∞} v_{N_1,N_2} and lim sup_{N_1,N_2→∞} S_{N_1,N_2} ⊂ S. Let f̄(z) = f(z) + δ_F(z) and f̄_{N_1,N_2}(z) = f(z) + δ_{F_{N_1,N_2}}(z), where δ_A(z) = 0 if z ∈ A and δ_A(z) = +∞ otherwise. Since F = lim_{N_1,N_2→∞} F_{N_1,N_2}, [32, Proposition 7.4(f)] shows that δ_{F_{N_1,N_2}} epi-converges to δ_F as N_1, N_2 → ∞. As f is continuous and finite, we obtain from [32, Exercise 7.8] that f̄_{N_1,N_2} = f + δ_{F_{N_1,N_2}} epi-converges to f̄ = f + δ_F as N_1, N_2 → ∞. As F and F_{N_1,N_2} are closed and f is continuous, f̄_{N_1,N_2} and f̄ are lower semi-continuous. Moreover, since f̄_{N_1,N_2} and f̄ are proper, it follows from [32, Theorem 7.33] that v = lim_{N_1,N_2→∞} v_{N_1,N_2} and lim sup_{N_1,N_2→∞} S_{N_1,N_2} ⊂ S.

Theorem 1 states that the lower bound approximation problem (8) is asymptotically tight. Specifically, the optimal value and the feasible solution set of problem (8) converge to those of problem (7), and the outer limit of the optimal solution set of problem (8) is included in the optimal solution set of problem (7).

For the case where the support set Ξ is finite, the lower bound approximation is tight.
Proposition 3. When Ξ_{N_1} = Ξ and Γ_{N_2} = {z_0^T ξ | ξ ∈ Ξ}, the optimal values of problem (7) and problem (8) are equal.

Proof. The conclusion follows from [7, Proposition 3.2].

3.2 Tractable reformulation of problem (8)
In what follows, we derive a tractable equivalent formulation of problem (8). Notice that problem (8) can be rewritten as

    min_{z,λ} f(z)
    s.t. λ_k ǫ − (1/N) Σ_{i=1}^N min_{1 ≤ j ≤ N_1} [ λ_k‖ξ̄_j − ξ̂_i‖ − (η_k − z^T ξ̄_j)_+ + (η_k − z_0^T ξ̄_j)_+ ] ≤ 0, k = 1, ..., N_2, (14)
         z ∈ Z, λ ∈ R^{N_2}_+.

By introducing auxiliary variables β_{ik}, i = 1, ..., N, k = 1, ..., N_2 (written for simplicity as a matrix β ∈ R^{N×N_2}), problem (14) can be reformulated as

    min_{z,λ,β} f(z) (15)
    s.t. λ_k ǫ − (1/N) Σ_{i=1}^N β_{ik} ≤ 0, k = 1, ..., N_2, (16)
         β_{ik} ≤ λ_k‖ξ̄_j − ξ̂_i‖ − (η_k − z^T ξ̄_j)_+ + (η_k − z_0^T ξ̄_j)_+, i = 1, ..., N, j = 1, ..., N_1, k = 1, ..., N_2, (17)
         z ∈ Z, λ ∈ R^{N_2}_+, β ∈ R^{N×N_2}. (18)

In fact, the feasible solution sets of problem (14) and problem (15)-(18) are equivalent in the following sense. On one hand, given any feasible solution (z, λ) of problem (14), let

    β_{ik} = min_{1 ≤ j ≤ N_1} [ λ_k‖ξ̄_j − ξ̂_i‖ − (η_k − z^T ξ̄_j)_+ + (η_k − z_0^T ξ̄_j)_+ ], i = 1, ..., N, k = 1, ..., N_2.

Then (z, λ, β) is feasible for problem (15)-(18). On the other hand, given any feasible solution (z, λ, β) of problem (15)-(18), we can verify that for any k = 1, ..., N_2,

    λ_k ǫ ≤ (1/N) Σ_{i=1}^N β_{ik} ≤ (1/N) Σ_{i=1}^N min_{1 ≤ j ≤ N_1} [ λ_k‖ξ̄_j − ξ̂_i‖ − (η_k − z^T ξ̄_j)_+ + (η_k − z_0^T ξ̄_j)_+ ],

where the inequalities follow from (16) and (17). Therefore, (z, λ) is also feasible for problem (14).

Further introducing auxiliary variables s_{jk}, j = 1, ..., N_1, k = 1, ..., N_2, to handle (η_k − z^T ξ̄_j)_+ (refer to (3.10)-(3.12) in [7]), we obtain the following reformulation of problem (15)-(18):

    min_{z,λ,β,s} f(z)
    s.t. λ_k ǫ − (1/N) Σ_{i=1}^N β_{ik} ≤ 0, k = 1, ..., N_2,
         β_{ik} + s_{jk} ≤ λ_k‖ξ̄_j − ξ̂_i‖ + (η_k − z_0^T ξ̄_j)_+, i = 1, ..., N, j = 1, ..., N_1, k = 1, ..., N_2,    (P_SSD−L)
         s_{jk} ≥ η_k − z^T ξ̄_j, j = 1, ..., N_1, k = 1, ..., N_2,
         z ∈ Z, λ ∈ R^{N_2}_+, β ∈ R^{N×N_2}, s ∈ R^{N_1×N_2}_+.

Problem (P_SSD−L) is equivalent to problem (8), and is thus a lower bound approximation of problem (7) and of problem (P_SSD). As a linear programming problem, (P_SSD−L) can be solved directly by many off-the-shelf optimization solvers. However, if N_1 and N_2 are large, solving problem (P_SSD−L) may be time-consuming: the dimension of s is N_1 × N_2, and the number of constraints in problem (P_SSD−L) is N_2 + N × N_1 × N_2 + N_1 × N_2; these grow rapidly with the sample sizes N_1 and N_2.

In order to solve problem (15)-(18) numerically for large N_1, N_2, we propose a cutting-plane method, see Algorithm 1. At each iteration of the cutting-plane method, we solve problem (19), a relaxation of problem (P_SSD−L); hence the approximate problem (19) provides a lower bound approximation for problem (15)-(18) at each iteration. After solving problem (19), we check whether all the constraints in (17) are satisfied. If all the constraints in (17) hold, then the optimal solution found for problem (19) is also optimal for problem (15)-(18). Otherwise, if constraint (17) is violated for some index (j_l, k_l), we add the violated constraint to the approximate problem (19) at the next iteration.
Theorem 2. Algorithm 1 stops at the optimal value and an optimal solution of problem (15)-(18) within finitely many steps.

Proof. We claim that J_1^l ⊊ J_1^{l+1} or J_2^l ⊊ J_2^{l+1}. In fact, since (z^l, λ^l, β^l, s^l) is an optimal solution of problem (19), we immediately have

    β^l_{ik} + s^l_{jk} − λ^l_k‖ξ̄_j − ξ̂_i‖ − (η_k − z_0^T ξ̄_j)_+ ≤ 0, i = 1, ..., N, j ∈ J_1^l, k ∈ J_2^l,
    s^l_{jk} ≥ η_k − (z^l)^T ξ̄_j, s^l_{jk} ≥ 0, j ∈ J_1^l, k ∈ J_2^l.

This implies that

    β^l_{ik} − λ^l_k‖ξ̄_j − ξ̂_i‖ + (η_k − (z^l)^T ξ̄_j)_+ − (η_k − z_0^T ξ̄_j)_+ ≤ β^l_{ik} − λ^l_k‖ξ̄_j − ξ̂_i‖ + s^l_{jk} − (η_k − z_0^T ξ̄_j)_+ ≤ 0, i = 1, ..., N, j ∈ J_1^l, k ∈ J_2^l.

On the other hand, (i_l, j_l, k_l) is chosen such that, for (i, j, k) = (i_l, j_l, k_l),

    β^l_{ik} − λ^l_k‖ξ̄_j − ξ̂_i‖ + (η_k − (z^l)^T ξ̄_j)_+ − (η_k − z_0^T ξ̄_j)_+ > 0.

Therefore j_l ∉ J_1^l or k_l ∉ J_2^l, while j_l ∈ J_1^{l+1} and k_l ∈ J_2^{l+1}. This tells us that J_1^l ⊊ J_1^{l+1} or J_2^l ⊊ J_2^{l+1}. As the number of constraints that can be added is finite, Algorithm 1 must stop at the optimal value and an optimal solution of problem (15)-(18) within finitely many steps.

To conclude this section: we have developed an asymptotically tight lower bound approximation problem (8) for the distributionally robust second-order stochastic dominance constrained optimization problem (P_SSD). Problem (8) can be reformulated as problem (15)-(18), which can be solved either via the linear programming formulation (P_SSD−L) or by Algorithm 1.
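To illustrate how (P_SSD−L) can be assembled with an off-the-shelf modeling layer, the following Python/cvxpy sketch instantiates it under illustrative assumptions: Z is the standard simplex, f(z) = −(1/N) Σ_i ξ̂_i^T z as in the portfolio model of Section 5, and all sample sets are small synthetic draws. The paper's own experiments use MATLAB/CVX with Gurobi instead; this is not that implementation.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(1)
n, N, N1, N2 = 4, 15, 20, 10            # assets, observations, Xi- and eta-samples
eps = 0.01                              # Wasserstein radius
xi_hat = rng.uniform(0.9, 1.1, (N, n))  # observations xi_hat_i
xi_bar = rng.uniform(0.9, 1.1, (N1, n)) # samples xi_bar_j approximating Xi
z0 = np.ones(n) / n                     # benchmark portfolio z_0
bench = xi_bar @ z0
eta = np.linspace(bench.min(), bench.max(), N2)  # eta_k grid covering R

dist = np.linalg.norm(xi_bar[:, None, :] - xi_hat[None, :, :], axis=2)  # ||xi_bar_j - xi_hat_i||
bench_plus = np.maximum(eta[None, :] - bench[:, None], 0.0)             # (eta_k - z0^T xi_bar_j)_+

z = cp.Variable(n, nonneg=True)         # portfolio weights, Z = simplex (assumption)
lam = cp.Variable(N2, nonneg=True)
beta = cp.Variable((N, N2))
s = cp.Variable((N1, N2), nonneg=True)

cons = [cp.sum(z) == 1,
        lam * eps - cp.sum(beta, axis=0) / N <= 0]        # constraint (16)
for j in range(N1):
    cons.append(s[j, :] >= eta - xi_bar[j] @ z)           # s_jk >= eta_k - z^T xi_bar_j
    for k in range(N2):
        cons.append(beta[:, k] + s[j, k]                  # lifted constraint (17)
                    <= lam[k] * dist[j] + bench_plus[j, k])

prob = cp.Problem(cp.Minimize(-cp.sum(xi_hat @ z) / N), cons)
prob.solve()
print("lower-bound optimal value:", prob.value)
print("portfolio:", np.round(z.value, 3))
```

Since every constraint is linear, any LP solver available to cvxpy handles this directly; the cutting-plane loop of Algorithm 1 would simply grow the (j, k) index sets instead of enumerating them all.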
Algorithm 1 Cutting-plane method

Start from l = 1 and J_1^1 = J_2^1 = ∅.
while l ≥ 1 do
    Solve the approximate problem:

        min_{z,λ,β,s} f(z)
        s.t. λ_k ǫ − (1/N) Σ_{i=1}^N β_{ik} ≤ 0, k = 1, ..., N_2,
             β_{ik} + s_{jk} ≤ λ_k‖ξ̄_j − ξ̂_i‖ + (η_k − z_0^T ξ̄_j)_+, i = 1, ..., N, j ∈ J_1^l, k ∈ J_2^l, (19)
             s_{jk} ≥ η_k − z^T ξ̄_j, j ∈ J_1^l, k ∈ J_2^l,
             z ∈ Z, λ ∈ R^{N_2}_+, β ∈ R^{N×N_2}, s ∈ R^{N_1×N_2}_+.

    Let (z^l, λ^l, β^l, s^l) denote the optimal solution of problem (19).
    Calculate δ_l := max_{i ∈ {1,...,N}, j ∈ {1,...,N_1}, k ∈ {1,...,N_2}} { β^l_{ik} − λ^l_k‖ξ̄_j − ξ̂_i‖ + (η_k − (z^l)^T ξ̄_j)_+ − (η_k − z_0^T ξ̄_j)_+ }.
    if δ_l ≤ 0 then
        Stop.
    else
        Determine (i_l, j_l, k_l) ∈ argmax_{i ∈ {1,...,N}, j ∈ {1,...,N_1}, k ∈ {1,...,N_2}} { β^l_{ik} − λ^l_k‖ξ̄_j − ξ̂_i‖ + (η_k − (z^l)^T ξ̄_j)_+ − (η_k − z_0^T ξ̄_j)_+ }.
        Let J_1^{l+1} = J_1^l ∪ {j_l}, J_2^{l+1} = J_2^l ∪ {k_l} and l ← l + 1.
    end if
end while

4 Upper bound approximation

We derive an upper bound approximation for problem (5) in this section.

4.1 Derivation of the upper bound approximation

Notice that problem (5) can be rewritten as

    min_{z ∈ Z} f(z)
    s.t. sup_{P ∈ Q} sup_{η ∈ R} E_P[(η − z^T ξ)_+ − (η − z_0^T ξ)_+] ≤ 0. (20)

If we exchange the order of the operators sup_{η ∈ R} and E_P in problem (20), we obtain an upper bound approximation for problem (20). However, such an upper bound might be loose since, for each P ∈ Q, the gap

    E_P { sup_{η ∈ R} [(η − z^T ξ)_+ − (η − z_0^T ξ)_+] } − sup_{η ∈ R} E_P[(η − z^T ξ)_+ − (η − z_0^T ξ)_+] (21)

might be large. This is because we determine one η for all possible ξ's in the latter supremum in (21), while we determine an η for each realization of ξ in the former supremum in (21). Therefore, the larger the range R of η, the larger the gap in (21) can be. As an extreme case, when R reduces to a singleton, the gap in (21) becomes 0. This observation motivates us to divide R into small sub-intervals and to exchange the order of the expectation operator and the supremum over each sub-interval, which provides an upper bound approximation of the sub-problem taking the supremum over that sub-interval. Combining the sub-problems over all sub-intervals, we obtain a tighter upper bound approximation of problem (20). We call this bounding method the split-and-dual technique.

In detail, we divide R = [R_min, R_max] into K intervals with disjoint interiors, [η_k, η̄_k], k = 1, ..., K, where the boundary points of the intervals are specified by η_k = R_min + (k − 1)(R_max − R_min)/K, η̄_k = R_min + k(R_max − R_min)/K, k = 1, ..., K.
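To see numerically why splitting R shrinks the exchange gap (21), the following sketch compares E[sup_η ...] with sup_η E[...] on each sub-interval for a synthetic empirical distribution; the data and the η-grid evaluation are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(1.0, 0.3, 500)    # realizations of z^T xi
Y = rng.normal(1.0, 0.2, 500)    # realizations of z_0^T xi (benchmark)
Rmin, Rmax = Y.min(), Y.max()

def h(eta, X, Y):
    # (eta - X)_+ - (eta - Y)_+ for a column of eta values
    return np.maximum(eta - X, 0.0) - np.maximum(eta - Y, 0.0)

for K in (1, 4, 16, 64):
    edges = np.linspace(Rmin, Rmax, K + 1)
    worst_gap = 0.0
    for k in range(K):
        etas = np.linspace(edges[k], edges[k + 1], 50)[:, None]
        vals = h(etas, X[None, :], Y[None, :])        # shape (50, 500)
        e_sup = np.mean(vals.max(axis=0))             # E[ sup_eta h ]  (upper bound side)
        sup_e = vals.mean(axis=1).max()               # sup_eta E[ h ]  (exact side)
        worst_gap = max(worst_gap, e_sup - sup_e)
    print(f"K = {K:3d}   worst sub-interval gap = {worst_gap:.5f}")
```

The worst sub-interval gap decays with K, in line with the O(1/K) bound established in (36) below.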
Notice that problem (20) can also be reformulated as

    min_{z ∈ Z} f(z)
    s.t. max_{1 ≤ k ≤ K} sup_{η ∈ [η_k, η̄_k]} sup_{P ∈ Q} E_P[(η − z^T ξ)_+ − (η − z_0^T ξ)_+] ≤ 0,

or, equivalently,

    min_{z ∈ Z} f(z) (22)
    s.t. sup_{P ∈ Q} sup_{η ∈ [η_k, η̄_k]} E_P[(η − z^T ξ)_+ − (η − z_0^T ξ)_+] ≤ 0, k = 1, ..., K. (23)

Exchanging the order of the operators sup_{η ∈ [η_k, η̄_k]} and E_P in (23), we obtain the approximation problem

    min_{z ∈ Z} f(z)
    s.t. sup_{P ∈ Q} E_P { sup_{η ∈ [η_k, η̄_k]} [(η − z^T ξ)_+ − (η − z_0^T ξ)_+] } ≤ 0, k = 1, ..., K. (24)

The feasible solution set of problem (22)-(23) contains that of (24); thus problem (24) provides an upper bound approximation for problem (22)-(23).

Applying Lemma 1 to each supremum over P, for k = 1, ..., K, we obtain an equivalent formulation of problem (24):

    min_{z ∈ Z, λ ∈ R^K_+} f(z) (25)
    s.t. λ_k ǫ + (1/N) Σ_{i=1}^N sup_{ξ ∈ Ξ} { sup_{η ∈ [η_k, η̄_k]} [(η − z^T ξ)_+ − (η − z_0^T ξ)_+] − λ_k‖ξ − ξ̂_i‖ } ≤ 0, k = 1, ..., K. (26)

To simplify the notation, we write (26) as

    λ_k ǫ + (1/N) Σ_{i=1}^N V^{ik}_S ≤ 0, k = 1, ..., K, (27)

where

    V^{ik}_S := sup_{(ξ,η) ∈ Ξ × [η_k, η̄_k]} (η − z^T ξ)_+ − (η − z_0^T ξ)_+ − λ_k‖ξ − ξ̂_i‖, i = 1, ..., N, k = 1, ..., K.

In what follows, we derive a reformulation of V^{ik}_S. According to Assumption 1, V^{ik}_S equals the optimal value of

    sup_{ξ,η,s,m} (η − z^T ξ)_+ − s − λ_k m
    s.t. s ≥ η − z_0^T ξ, Cξ ≤ d,
         η ≥ η_k, η ≤ η̄_k, ‖ξ − ξ̂_i‖ ≤ m,
         ξ ∈ R^n, η ∈ R, s ≥ 0, m ∈ R. (28)

Problem (28) is a non-convex optimization problem whose objective function is piecewise linear with two pieces. Examining the two pieces of the objective function separately, we can split problem (28) into two convex optimization problems:

    (P^{ik}_{SSD−1})  V^{ik}_{S1} = sup_{ξ,η,s,m} η − z^T ξ − s − λ_k m
                      s.t. s ≥ η − z_0^T ξ, η − z^T ξ ≥ 0, Cξ ≤ d, s ≥ 0, η ≥ η_k, η ≤ η̄_k, ‖ξ − ξ̂_i‖ ≤ m;

    (P^{ik}_{SSD−2})  V^{ik}_{S2} = sup_{ξ,η,s,m} −s − λ_k m
                      s.t. s ≥ η − z_0^T ξ, η − z^T ξ ≤ 0, Cξ ≤ d, s ≥ 0, η ≥ η_k, η ≤ η̄_k, ‖ξ − ξ̂_i‖ ≤ m.

And we have

    V^{ik}_S = max{ V^{ik}_{S1}, V^{ik}_{S2} }. (29)

First, we derive the dual problem of problem (P^{ik}_{SSD−1}). Using conic duality theory, we introduce dual variables (µ_1, µ_2, ν, µ_3, µ_4, µ_5, δ, κ) with µ_1 ≥ 0, µ_2 ≥ 0, ν ≥ 0, µ_3 ≥ 0, µ_4 ≥ 0, µ_5 ≥ 0, κ ≥ ‖δ‖ for the seven constraints, respectively, and obtain the standard form of the dual problem of (P^{ik}_{SSD−1}):

    inf_{µ,ν,κ,δ} d^T ν − ξ̂_i^T δ − µ_4 η_k + µ_5 η̄_k
    s.t. −z + µ_1 z_0 − µ_2 z − C^T ν + δ = 0, (30)
         1 − µ_1 + µ_2 + µ_4 − µ_5 = 0, −1 + µ_1 + µ_3 = 0, (31)
         −λ_k + κ = 0, (32)
         κ ≥ ‖δ‖,
         µ ∈ R^5_+, ν ∈ R^l_+, κ ∈ R, δ ∈ R^n.

Eliminating δ, µ_3, µ_5 and κ using (30), (31) and (32), we can reformulate the dual problem as

    (D^{ik}_{SSD−1})  Ṽ^{ik}_{S1} = inf_{µ,ν} d^T ν − ξ̂_i^T (z − µ_1 z_0 + µ_2 z + C^T ν) − µ_4 η_k + (1 − µ_1 + µ_2 + µ_4) η̄_k
                      s.t. 1 − µ_1 + µ_2 + µ_4 ≥ 0,
                           ‖z − µ_1 z_0 + µ_2 z + C^T ν‖ ≤ λ_k,
                           1 ≥ µ_1 ≥ 0, µ_2 ≥ 0, µ_4 ≥ 0, ν ∈ R^l_+.

Similarly, the dual problem of problem (P^{ik}_{SSD−2}) is

    inf_{µ,ν,κ,δ} d^T ν − ξ̂_i^T δ − µ_4 η_k + µ_5 η̄_k
    s.t. µ_1 z_0 + µ_2 z − C^T ν + δ = 0,
         −µ_1 − µ_2 + µ_4 − µ_5 = 0, −1 + µ_1 + µ_3 = 0,
         −λ_k + κ = 0, κ ≥ ‖δ‖,
         µ ∈ R^5_+, ν ∈ R^l_+, κ ∈ R, δ ∈ R^n,

or, equivalently,

    (D^{ik}_{SSD−2})  Ṽ^{ik}_{S2} = inf_{µ̃,ν̃} d^T ν̃ − ξ̂_i^T (−µ̃_1 z_0 − µ̃_2 z + C^T ν̃) − µ̃_4 η_k + (−µ̃_1 − µ̃_2 + µ̃_4) η̄_k
                      s.t. −µ̃_1 − µ̃_2 + µ̃_4 ≥ 0,
                           ‖−µ̃_1 z_0 − µ̃_2 z + C^T ν̃‖ ≤ λ_k,
                           1 ≥ µ̃_1 ≥ 0, µ̃_2 ≥ 0, µ̃_4 ≥ 0, ν̃ ∈ R^l_+.

We observe that the infima in (D^{ik}_{SSD−1}) and (D^{ik}_{SSD−2}) are attained, as they are minimization problems over closed feasible sets, provided the optimal values are finite. By equation (29) and weak duality, we have

    V^{ik}_S ≤ max{ Ṽ^{ik}_{S1}, Ṽ^{ik}_{S2} }, i = 1, ..., N, k = 1, ..., K.
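As a numerical sanity check of the split (29), the sketch below solves (P^{ik}_{SSD−1}) and (P^{ik}_{SSD−2}) with cvxpy on a synthetic instance with a box support and the Euclidean norm (both assumptions of this illustration only) and compares max{V^{ik}_{S1}, V^{ik}_{S2}} with a brute-force grid maximization of the objective of (28).

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(3)
n = 2
z, z0 = rng.uniform(0.2, 0.8, n), np.ones(n) / n   # decision and benchmark
xi_hat_i = rng.uniform(0.0, 1.0, n)                # one observation
lam_k, eta_lo, eta_hi = 0.7, 0.3, 0.9              # lambda_k and [eta_k, eta_bar_k]
C = np.vstack([np.eye(n), -np.eye(n)])             # box support 0 <= xi <= 1
d = np.concatenate([np.ones(n), np.zeros(n)])

def piece(sign):
    """Solve (P_SSD-1) for sign=+1 (eta - z'xi >= 0) or (P_SSD-2) for sign=-1."""
    xi, eta = cp.Variable(n), cp.Variable()
    s, m = cp.Variable(nonneg=True), cp.Variable()
    obj = (eta - z @ xi if sign > 0 else 0) - s - lam_k * m
    cons = [s >= eta - z0 @ xi, sign * (eta - z @ xi) >= 0,
            C @ xi <= d, eta >= eta_lo, eta <= eta_hi,
            cp.norm(xi - xi_hat_i, 2) <= m]
    prob = cp.Problem(cp.Maximize(obj), cons)
    prob.solve()
    return prob.value

v_split = max(piece(+1), piece(-1))

# Brute-force grid evaluation of the objective of (28) over Xi x [eta_k, eta_bar_k].
g = np.linspace(0, 1, 60)
XI = np.array(np.meshgrid(g, g)).reshape(2, -1).T
best = -np.inf
for eta in np.linspace(eta_lo, eta_hi, 60):
    val = (np.maximum(eta - XI @ z, 0) - np.maximum(eta - XI @ z0, 0)
           - lam_k * np.linalg.norm(XI - xi_hat_i, axis=1))
    best = max(best, val.max())
print(v_split, best)   # the two values agree up to grid error
```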
Assumption 4. For any i = 1, ..., N, k = 1, ..., K, problems (P^{ik}_{SSD−1}) and (P^{ik}_{SSD−2}) are strictly feasible.

Under Assumption 4, the strong duality of second-order cone programming implies that the duality gap between V^{ik}_{S1} (resp. V^{ik}_{S2}) and Ṽ^{ik}_{S1} (resp. Ṽ^{ik}_{S2}) is zero, and hence

    V^{ik}_S = max{ Ṽ^{ik}_{S1}, Ṽ^{ik}_{S2} }, i = 1, ..., N, k = 1, ..., K.

Introducing auxiliary variables V_{ik}, i = 1, ..., N, k = 1, ..., K, we claim that the constraints

    λ_k ǫ + (1/N) Σ_{i=1}^N V_{ik} ≤ 0, k = 1, ..., K, (33)
    V_{ik} ≥ Ṽ^{ik}_{S1}, i = 1, ..., N, k = 1, ..., K, (34)
    V_{ik} ≥ Ṽ^{ik}_{S2}, i = 1, ..., N, k = 1, ..., K, (35)

are equivalent to the constraints in (27). To prove the claim, first, if there exist V_{ik}, i = 1, ..., N, k = 1, ..., K, such that constraints (33)-(35) hold, then

    λ_k ǫ + (1/N) Σ_{i=1}^N V^{ik}_S ≤ λ_k ǫ + (1/N) Σ_{i=1}^N max{ Ṽ^{ik}_{S1}, Ṽ^{ik}_{S2} } ≤ λ_k ǫ + (1/N) Σ_{i=1}^N V_{ik} ≤ 0, k = 1, ..., K,

and thus the constraints in (27) hold. On the other hand, if V^{ik}_S, i = 1, ..., N, k = 1, ..., K, satisfy the constraints in (27), let V_{ik} = max{ Ṽ^{ik}_{S1}, Ṽ^{ik}_{S2} }, i = 1, ..., N, k = 1, ..., K. Then, by the strong duality between V^{ik}_{S1} (resp. V^{ik}_{S2}) and Ṽ^{ik}_{S1} (resp. Ṽ^{ik}_{S2}), constraints (33)-(35) hold.

Substituting the formulations (D^{ik}_{SSD−1}) and (D^{ik}_{SSD−2}) of Ṽ^{ik}_{S1}, Ṽ^{ik}_{S2} into constraints (34)-(35) provides an upper bound approximation for problem (P_SSD).
Theorem 3. Given Assumption 4, the optimal value of the following optimization problem

    min f(z)
    s.t. λ_k ǫ + (1/N) Σ_{i=1}^N V_{ik} ≤ 0, k = 1, ..., K,
         µ^{ik}_1 ≤ 1, µ̃^{ik}_1 ≤ 1, 1 − µ^{ik}_1 + µ^{ik}_2 + µ^{ik}_4 ≥ 0, −µ̃^{ik}_1 − µ̃^{ik}_2 + µ̃^{ik}_4 ≥ 0,
             i = 1, ..., N, k = 1, ..., K,
         V_{ik} ≥ d^T ν^{ik} − ξ̂_i^T (z − µ^{ik}_1 z_0 + µ^{ik}_2 z + C^T ν^{ik}) − µ^{ik}_4 η_k + (1 − µ^{ik}_1 + µ^{ik}_2 + µ^{ik}_4) η̄_k,
             i = 1, ..., N, k = 1, ..., K,                                            (P_SSD−U)
         ‖z − µ^{ik}_1 z_0 + µ^{ik}_2 z + C^T ν^{ik}‖ ≤ λ_k, ‖−µ̃^{ik}_1 z_0 − µ̃^{ik}_2 z + C^T ν̃^{ik}‖ ≤ λ_k,
             i = 1, ..., N, k = 1, ..., K,
         V_{ik} ≥ d^T ν̃^{ik} − ξ̂_i^T (−µ̃^{ik}_1 z_0 − µ̃^{ik}_2 z + C^T ν̃^{ik}) − µ̃^{ik}_4 η_k + (−µ̃^{ik}_1 − µ̃^{ik}_2 + µ̃^{ik}_4) η̄_k,
             i = 1, ..., N, k = 1, ..., K,
         z ∈ Z, λ ∈ R^K_+, µ^{ik} ∈ R^3_+, ν^{ik} ∈ R^l_+, µ̃^{ik} ∈ R^3_+, ν̃^{ik} ∈ R^l_+, V_{ik} ∈ R,
             i = 1, ..., N, k = 1, ..., K,

is an upper bound on that of problem (P_SSD).

Proof.
From what we have demonstrated, problem (P_SSD−U) is a reformulation of problem (24). Moreover, we have shown that problem (24) is an upper bound approximation of problem (22)-(23), and the latter is a reformulation of problem (P_SSD), as illustrated in Figure 1. Therefore, problem (P_SSD−U) provides an upper bound approximation for problem (P_SSD).

4.2 Convergence of the upper bound approximation: (22)-(23) ⇒ (24)

In what follows, we show that, as the interval number K goes to infinity, the optimal value of problem (P_SSD−U) converges to that of problem (22)-(23). Since problem (P_SSD−U) is a reformulation of problem (24), as illustrated in Figure 1, it suffices to prove that the optimal value of problem (24) converges to that of problem (22)-(23). To this end, we first prove the asymptotic convergence of

    g(z, K) := max_{1 ≤ k ≤ K} sup_{P ∈ Q} E_P { sup_{η ∈ [η_k, η̄_k]} [(η − z^T ξ)_+ − (η − z_0^T ξ)_+] }

to

    g_0(z) := sup_{P ∈ Q} sup_{η ∈ R} E_P[(η − z^T ξ)_+ − (η − z_0^T ξ)_+] = max_{1 ≤ k ≤ K} sup_{P ∈ Q} sup_{η ∈ [η_k, η̄_k]} E_P[(η − z^T ξ)_+ − (η − z_0^T ξ)_+].

Since ∪_{k=1,...,K} [η_k, η̄_k] = R, the function g_0 does not depend on the splitting of R.

We first notice that g(·, K) and g_0(·) are Lipschitz continuous with respect to z.
Proposition 4. Under Assumption 1, g(·, K) and g_0(·) are Lipschitz continuous with the concrete Lipschitz constant C_0 = max_{ξ ∈ Ξ} ‖ξ‖ < ∞.

Proof. We first prove the conclusion for g(·, K). By the Lipschitz continuity of the positive part function (·)_+, we have

    |g(z_1, K) − g(z_2, K)|
    = | max_{1 ≤ k ≤ K} sup_{P ∈ Q} E_P { sup_{η ∈ [η_k, η̄_k]} [(η − z_1^T ξ)_+ − (η − z_0^T ξ)_+] } − max_{1 ≤ k ≤ K} sup_{P ∈ Q} E_P { sup_{η ∈ [η_k, η̄_k]} [(η − z_2^T ξ)_+ − (η − z_0^T ξ)_+] } |
    ≤ max_{1 ≤ k ≤ K} sup_{P ∈ Q} | E_P { sup_{η ∈ [η_k, η̄_k]} [(η − z_1^T ξ)_+ − (η − z_0^T ξ)_+] − sup_{η ∈ [η_k, η̄_k]} [(η − z_2^T ξ)_+ − (η − z_0^T ξ)_+] } |
    ≤ max_{1 ≤ k ≤ K} sup_{P ∈ Q} E_P { sup_{η ∈ [η_k, η̄_k]} | [(η − z_1^T ξ)_+ − (η − z_0^T ξ)_+] − [(η − z_2^T ξ)_+ − (η − z_0^T ξ)_+] | }
    ≤ max_{1 ≤ k ≤ K} sup_{P ∈ Q} E_P { sup_{η ∈ [η_k, η̄_k]} | z_1^T ξ − z_2^T ξ | }
    ≤ max_{ξ ∈ Ξ} ‖ξ‖ · ‖z_1 − z_2‖.

The proof for g_0(·) is quite similar and thus omitted.

Next, we prove that g(z, K) converges to g_0(z) as K goes to infinity.
Proposition 5. We have lim_{K→∞} g(z, K) = g_0(z), and the convergence is uniform with respect to z ∈ Z.

Proof. Denote

    η*_k(ω) ∈ argsup_{η ∈ [η_k, η̄_k]} [(η − z^T ξ(ω))_+ − (η − z_0^T ξ(ω))_+], ω ∈ Ω, k = 1, ..., K,

and

    η**_k ∈ argsup_{η ∈ [η_k, η̄_k]} E_P[(η − z^T ξ)_+ − (η − z_0^T ξ)_+], k = 1, ..., K.

Notice that η*_k is a random variable, while η**_k is a real number. Since η*_k and η**_k lie in the same interval [η_k, η̄_k], for any ω ∈ Ω we have |η*_k(ω) − η**_k| ≤ η̄_k − η_k = (R_max − R_min)/K. Then we obtain

    g(z, K) − g_0(z)
    = max_{1 ≤ k ≤ K} sup_{P ∈ Q} E_P { sup_{η ∈ [η_k, η̄_k]} [(η − z^T ξ)_+ − (η − z_0^T ξ)_+] } − max_{1 ≤ k ≤ K} sup_{P ∈ Q} sup_{η ∈ [η_k, η̄_k]} E_P[(η − z^T ξ)_+ − (η − z_0^T ξ)_+]
    ≤ max_{1 ≤ k ≤ K} sup_{P ∈ Q} { E_P { sup_{η ∈ [η_k, η̄_k]} [(η − z^T ξ)_+ − (η − z_0^T ξ)_+] } − sup_{η ∈ [η_k, η̄_k]} E_P[(η − z^T ξ)_+ − (η − z_0^T ξ)_+] }
    = max_{1 ≤ k ≤ K} sup_{P ∈ Q} { E_P[(η*_k − z^T ξ)_+ − (η*_k − z_0^T ξ)_+] − E_P[(η**_k − z^T ξ)_+ − (η**_k − z_0^T ξ)_+] }
    ≤ max_{1 ≤ k ≤ K} sup_{P ∈ Q} E_P | [(η*_k − z^T ξ)_+ − (η*_k − z_0^T ξ)_+] − [(η**_k − z^T ξ)_+ − (η**_k − z_0^T ξ)_+] |
    ≤ 2 (R_max − R_min)/K, (36)

where the last inequality is due to the Lipschitz continuity of the positive part function (·)_+ together with |η*_k − η**_k| ≤ (R_max − R_min)/K. Since g(z, K) ≥ g_0(z) always holds, the conclusion immediately follows.

We denote the feasible solution sets of problem (22)-(23) and problem (24) by F and F_K, the optimal solution sets by S and S_K, and the optimal values by v and v_K, respectively. It is clear that F_K ⊂ F for all K. To derive the convergence of v_K to v, as well as a quantitative estimate of the approximation error, we need a constraint qualification, e.g., the Mangasarian-Fromovitz constraint qualification (MFCQ) [27]. Although the function g(·, K) is non-smooth, it is continuous and convex, and thus subdifferentiable everywhere. It is therefore reasonable to extend MFCQ to the subdifferentiable case.
Definition 3 (ND-MFCQ). Let F(t) = { x ∈ R^n | g_j(x, t) ≤ 0, j ∈ J } with subdifferentiable functions g_j, where t is a parameter of the optimization problem. If there exist a vector θ, a real number σ < 0, and real numbers α_1 > 0 and α_2 > 0 such that

    ⟨v, θ⟩ ≤ σ < 0, ∀ v ∈ ∂g_j(x, t), ∀ x : ‖x − x̄‖ ≤ α_1, ∀ t : ‖t − t̄‖ ≤ α_2, ∀ j ∈ J(x̄, t̄),

where J(x̄, t̄) = { j ∈ J | g_j(x̄, t̄) = 0 }, then we say that the non-differentiable MFCQ (ND-MFCQ) holds at (x̄, t̄), x̄ ∈ F(t̄), with θ, σ, α_1 and α_2.

We note that MFCQ conditions for the non-smooth case have been discussed in the literature, for instance [36, Page 14]. However, the non-differentiable MFCQ of Definition 3 is stricter than that in [36]. Firstly, Definition 3 defines ND-MFCQ in a parameterized setting, while [36] defines it without parameters. Most importantly, Definition 3 requires ND-MFCQ to hold in a neighborhood of (x̄, t̄) with a single vector θ, while [36] only requires it to hold at the point x̄.

The ND-MFCQ of Definition 3 is an extension of the classic MFCQ proposed in [27], and it is equivalent to the classic MFCQ if the constraint functions are differentiable.
Proposition 6. When g_j, j ∈ J, are differentiable, ND-MFCQ is equivalent to the classic MFCQ.

In this paper, we only have one constraint, so J = {1}. Our decision variable z corresponds to x in Definition 3, and our parameter 1/K corresponds to t in Definition 3.

To arrive at the convergence result, we also require the following assumption.

Assumption 5. a) The objective function f(z) is continuous and differentiable, and its gradients are bounded by C_f = max_{z ∈ Z} ‖∇f(z)‖ < ∞;
b) S_1, the optimal solution set of problem (24) with K = 1, is nonempty.

Observing the constraints in (23) and (24), we know that
F ⊃ F_K ⊃ F_1 for any K. Therefore, if S_1 is nonempty, then S and S_K are nonempty for any K.
Theorem 4. Suppose Assumptions 1 and 5 hold. For some z* ∈ S, assume that ND-MFCQ holds at (z*, 0) with θ, σ, α_1 and α_2 as defined in Definition 3. Then, for

    K ≥ max { 1/α_2, 2(R_max − R_min)‖θ‖/(α_1 |σ|), −(2(R_max − R_min)/g_0(z*)) (C_0‖θ‖/|σ| + 1) },

we have

    |v_K − v| ≤ 2 C_f (R_max − R_min) ‖θ‖ / (|σ| K).

Thus, lim_{K→∞} v_K = v.
Proof. For all K, let z_K = z* − [2(R_max − R_min)/(σK)] θ. Since z* ∈ S, we obviously have g_0(z*) ≤ 0. We first prove that z_K ∈ F_K, examining two cases according to whether the constraint g_0(z) ≤ 0 is active at z*.

• Case 1: g_0(z*) = 0. Then, by inequality (36) in Proposition 5, we immediately have

    g(z_K, K) = g(z_K, K) − g(z*, K) + g(z*, K) − g_0(z*) + g_0(z*)
              = g(z_K, K) − g(z*, K) + g(z*, K) − g_0(z*)
              ≤ g(z_K, K) − g(z*, K) + 2(R_max − R_min)/K.

By the extended mean-value theorem [32, Theorem 10.48], for some τ ∈ (0, 1) and the corresponding point z^τ_K = (1 − τ) z_K + τ z*, there exists a vector v ∈ ∂g(z^τ_K, K) satisfying

    g(z_K, K) ≤ ⟨v, z_K − z*⟩ + 2(R_max − R_min)/K.

For any K ≥ 2(R_max − R_min)‖θ‖/(α_1 |σ|), the point z^τ_K lies in the α_1-neighborhood of z*, which can be seen from

    ‖z^τ_K − z*‖ = ‖(1 − τ)(z_K − z*)‖ ≤ ‖z_K − z*‖ = 2(R_max − R_min)‖θ‖/(|σ| K) ≤ α_1.

Therefore, by the ND-MFCQ assumption, for all K ≥ max{ 2(R_max − R_min)‖θ‖/(α_1 |σ|), 1/α_2 } we have

    g(z_K, K) ≤ −(2(R_max − R_min)/(σK)) ⟨v, θ⟩ + 2(R_max − R_min)/K
              ≤ −(2(R_max − R_min)/(σK)) σ + 2(R_max − R_min)/K = 0.

Then z_K ∈ F_K.

• Case 2: g_0(z*) < 0. Let δ := −g_0(z*) > 0. If K ≥ (2(R_max − R_min)/δ) (C_0‖θ‖/|σ| + 1), then we obtain from Propositions 4 and 5 that

    |g(z_K, K) − g_0(z*)| ≤ |g(z_K, K) − g(z*, K)| + |g(z*, K) − g_0(z*)|
    ≤ C_0 ‖z_K − z*‖ + 2(R_max − R_min)/K
    = (2(R_max − R_min)/K)(C_0‖θ‖/|σ| + 1) ≤ δ.

This indicates that g(z_K, K) ≤ g_0(z*) + δ = 0. Hence z_K ∈ F_K in this case as well.

Next, we estimate the approximation error |v_K − v| of the optimal values. By Assumption 5 b), we can choose z*_K ∈ S_K. By the mean-value theorem, for some ς ∈ (0, 1) and the corresponding point z^ς_K = (1 − ς) z_K + ς z*, we have

    v_K − v = f(z*_K) − f(z*) ≤ f(z_K) − f(z*) = ⟨∇f(z^ς_K), z_K − z*⟩ ≤ C_f ‖z_K − z*‖ = 2 C_f (R_max − R_min) ‖θ‖ / (|σ| K).

Since F_K ⊂ F, we also have v_K ≥ v. Therefore, it holds that

    |v_K − v| ≤ 2 C_f (R_max − R_min) ‖θ‖ / (|σ| K).

Theorem 4 quantitatively estimates the approximation error between the optimal value of problem (24) and that of problem (22)-(23).

4.3 Sequential convex approximation for (P_SSD−U)

The bilinear terms µ^{ik}_2 z and µ̃^{ik}_2 z in problem (P_SSD−U) make it difficult to solve directly. We therefore apply a sequential convex approximation method to solve problem (P_SSD−U), see Algorithm 2. The idea is to separate the coupled variables: at each iteration, we fix z and optimize with respect to µ, µ̃; then we fix µ, µ̃ and optimize with respect to z. The sequential convex approximation method generates a sequence of decisions whose objective values converge to an upper bound of the optimal value of problem (P_SSD−U).
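Algorithm 2 is an instance of block-coordinate (alternating convex) minimization for a bilinearly coupled problem. The toy sketch below, which is not (P_SSD−U) itself but a small synthetic bilinear program, shows the mechanism: each block subproblem is convex, and the objective sequence is non-increasing, as in Proposition 7.

```python
import numpy as np
import cvxpy as cp

# Toy bilinear program:  min  c'x + mu * (a'x) - 2*mu
#   s.t.  x >= 0, sum(x) = 1, 0 <= mu <= 1.
# The term mu*(a'x) couples the blocks; fixing either block yields a
# linear program, mirroring the two alternating steps of Algorithm 2.
rng = np.random.default_rng(4)
n = 5
c, a = rng.normal(size=n), rng.uniform(0.5, 1.5, n)

x_val = np.ones(n) / n                      # starting point, cf. z_1 in Algorithm 2
obj_prev = np.inf
for it in range(50):
    # Step 1: fix x, optimize over mu (linear in mu).
    mu = cp.Variable()
    p1 = cp.Problem(cp.Minimize(c @ x_val + mu * (a @ x_val) - 2 * mu),
                    [mu >= 0, mu <= 1])
    p1.solve()
    mu_val = float(mu.value)
    # Step 2: fix mu, optimize over x (linear in x).
    x = cp.Variable(n, nonneg=True)
    p2 = cp.Problem(cp.Minimize(c @ x + mu_val * (a @ x) - 2 * mu_val),
                    [cp.sum(x) == 1])
    p2.solve()
    x_val = x.value
    if obj_prev - p2.value < 1e-10:          # objective stalled: block-wise optimum
        break
    obj_prev = p2.value
print(f"stopped after {it + 1} sweeps, objective {p2.value:.6f}")
```

As in the proof of Proposition 7 below, each sweep re-optimizes one block while keeping the other feasible, so the objective can never increase.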
Algorithm 2 Sequential convex approximation
Start from z_1 ∈ Z, k = 1.
while k ≥ 1 do
    Solve problem (P_SSD−U) with the additional constraint z = z_k. Denote the optimal µ, µ̃ by µ^k, µ̃^k, respectively.
    Solve problem (P_SSD−U) with the additional constraints µ = µ^k, µ̃ = µ̃^k. Denote the optimal z by z_{k+1}.
    if z_{k+1} = z_k then
        Break.
    else
        k ← k + 1.
    end if
end while

Proposition 7. Suppose that the optimal value of problem (P_SSD) is finite, and let z_1 be a given starting point. Algorithm 2 generates a sequence of decisions whose objective values converge to an upper bound of the optimal value of problem (P_SSD−U).

Proof. Denote the feasible solution set of problem (P_SSD−U) by F_U, and write all the decision variables excluding z, µ, µ̃ as y. We can thus write problem (P_SSD−U) in the compact form min{ f(z) | (z, µ, µ̃, y) ∈ F_U }.

Firstly, observe that each problem solved in Algorithm 2 has additional constraints compared with problem (P_SSD−U). Therefore, f(z_k), k = 1, 2, ..., are upper bounds on the optimal value of problem (P_SSD−U).

Next, the sequence {f(z_k)} has a finite lower bound, namely the optimal value of problem (P_SSD). Thus, in order to show the convergence of {f(z_k)}, it suffices to prove that {f(z_k)} is non-increasing. From Algorithm 2, there exists y' such that (µ^k, µ̃^k, y') ∈ argmin_{µ,µ̃,y} { f(z_k) | (z_k, µ, µ̃, y) ∈ F_U }. It follows immediately that (z_k, µ^k, µ̃^k, y') ∈ F_U. Also, there exists y'' such that (z_{k+1}, y'') ∈ argmin_{z,y} { f(z) | (z, µ^k, µ̃^k, y) ∈ F_U }. Since (z_k, µ^k, µ̃^k, y') ∈ F_U, we have f(z_{k+1}) ≤ f(z_k).

It is worth pointing out that every element of the sequence of optimal values generated by Algorithm 2 is an upper bound on the optimal value of problem (P_SSD−U), and that each problem solved in Algorithm 2 is a second-order cone program and is thus computationally tractable.

To conclude this section: we divide R into sub-intervals and exchange the order of the expectation operator and the supremum over each sub-interval to derive an upper bound approximation (P_SSD−U) for the distributionally robust second-order stochastic dominance constrained optimization problem (P_SSD). We prove the convergence of the optimal value of the upper bound approximation problem and quantitatively estimate the approximation error. To cope with the bilinear terms in problem (P_SSD−U), we apply the sequential convex approximation method, Algorithm 2, to obtain an upper bound of the optimal value of problem (P_SSD−U).

5 Numerical experiments

In this section, we present numerical experiments that illustrate the validity and practicality of our lower and upper bound approximation methods for the distributionally robust stochastic dominance constrained model (P_SSD).

5.1 An illustrative example

We begin with a simple numerical example to examine the validity of the proposed lower and upper bound approximations. Consider the following problem:

    min (1/2)‖z‖²
    s.t. E_P[(η − z^T ξ)_+] ≤ E_P[(η − z_0^T ξ)_+], ∀ η ∈ R, ∀ P ∈ Q, (37)
         z ∈ R²_+, ‖z‖ ≤ ·,

where z_0 = (1, ·)^T and Q = { P ∈ M(Ξ) : d(P, P̂_N) ≤ ǫ } is defined as in (6). Here P̂_N = (1/N) Σ_{i=1}^N δ_{ξ̂_i} is the empirical distribution. The support set is taken to be Ξ = { (ξ_1, ξ_2) | ξ_1 ∈ [0, ·], ξ_2 ∈ [0, ·] }.
We set ǫ = 10^{−·}, N = 10, and the observed sample set {ξ̂_i}_{i=1}^{10} consists of (0, ·)^T, (250, ·)^T, (0, ·)^T, (100, ·)^T, (200, ·)^T, (100, ·)^T, (200, ·)^T, (0, ·)^T, (0, ·)^T, (200, ·)^T.

Table 1: The optimal values of the lower and upper bound approximations to problem (37).

    Lower bound approximation (by solving (P_SSD−L)): N_1 = 100, N_2 = 100, optimal value 0.292.
    Upper bound approximation (by Algorithm 2):
        K = 10, optimal value 0.410 (Gap 40.4%);
        K = 11, optimal value 0.304 (Gap 4.1%);
        K = 12, optimal value 0.303 (Gap 3.8%).

We get the lower bound approximation by solving the linear programming formulation (P_SSD−L) and obtain the upper bound approximation by Algorithm 2. The optimal values are shown in Table 1. We also calculate the relative gap between the optimal values of the lower and upper bound approximations (i.e., Gap = |upper − lower| / |lower|). From Table 1, we can see that the relative gap decreases quickly to 0 as the sample sizes N_1, N_2 and the interval number K increase, which verifies the validity of the proposed approximation methods.

5.2 Application to portfolio selection

Next, we consider a financial application of model (P_SSD): the portfolio selection problem with distributionally robust second-order stochastic dominance constraints,

    min_{z ∈ Z} E_{P̂_N}[−z^T ξ] (38)
    s.t. E_P[(η − z^T ξ)_+] ≤ E_P[(η − z_0^T ξ)_+], ∀ η ∈ R, ∀ P ∈ Q, (39)

where Z = { z ∈ R^n | z ≥ 0, Σ_{i=1}^n z_i = 1 }. Problem (38)-(39) is inspired by [16, Example 4.2]. The difference between problem (38)-(39) and that in [16, Example 4.2] lies in the construction of the ambiguity set: in [16, Example 4.2] the ambiguity set Q is determined by first- and second-order moment information, while in problem (38)-(39), Q is defined through the Wasserstein distance.

The numerical experiments are carried out by calling the Gurobi solver through the CVX package in MATLAB R2016a on a Dell G7 laptop with the Windows 10 operating system, an Intel Core i7 8750H CPU at 2.21 GHz and 16 GB RAM. We select eight risky assets to constitute the stock pool: U.S. three-month treasury bills, U.S. long-term government bonds, the S&P 500, the Wilshire 5000, the NASDAQ, the Lehman Brothers corporate bond index, the EAFE foreign stock index, and gold. We use the same historical annual return rate data as in [7, Table 8.1] (a total of 22 years). We choose the equally weighted portfolio as the benchmark portfolio z_0. The support set is defined as Ξ = { x = (x_1, ..., x_n) | a_i ≤ x_i ≤ b_i, i = 1, ..., n }, where a_i and b_i denote the smallest and largest historical annual return rates of the i-th asset, respectively.

In Algorithm 2, which derives the upper bound approximation, the choice of the starting point z_1 is an essential issue. We already know that 1) the benchmark portfolio z_0 is certainly feasible for problem (P_SSD), and 2) problem (P_SSD−U) is an upper bound approximation of problem (P_SSD) by Theorem 3. Therefore, z_0 is probably feasible for problem (P_SSD−U), and we thus choose z_0 as the starting point z_1.

The lower bound can be obtained by solving one linear programming problem or a sequence of small-scale linear programming problems using Algorithm 1. When deriving the lower bound approximation, we approximate the sets Ξ and R = z_0^T Ξ by randomly selected sample sets Ξ_{N_1} and Γ_{N_2}, respectively. To obtain a better lower bound, we repeat the sampling-optimizing process multiple times and choose the largest optimal value as the final lower bound. For the upper bound approximation, we solve a sequence of second-order cone programming problems; when Algorithm 2 stops, we take the optimal value of the last step as the upper bound.
In what follows, we present numerical results that illustrate the following aspects: the convergence of the lower and upper bound approximations with respect to the sample size, the impact of the interval number on the optimal value of the upper bound approximation, the effect of the robust second-order stochastic dominance constraints defined through the Wasserstein distance, and the influence of the robust radius.

Firstly, we examine the effectiveness of the proposed lower and upper bound approximations. We also demonstrate the convergence of the lower bound approximation with respect to the sample sizes N_1, N_2 and the decreasing trend of the upper bound approximation as the interval number K increases. We fix the robust radius ǫ = 10^{−…}.

For the lower bound approximation, we start from the case with N_1 = 40 and N_2 = 40, that is, both approximate sets Ξ_{N_1} and Ξ_{N_2} contain 40 samples. To allow a fair comparison later in Section 5.2.2 with the portfolio optimization problem with non-robust second-order stochastic dominance constraints, we let Ξ_{N_1} include all the historical annual return rates {ξ̂_i}_{i=1}^{22} from [7, Table 8.1] plus 18 further samples randomly generated from Ξ; similarly, Ξ_{N_2} consists of all the samples in {z_0^T ξ̂_i}_{i=1}^{22} plus 18 further samples randomly generated from z_0^T Ξ. After solving problem (P_SSD−L), we obtain an optimal value and an optimal solution. We then repeat the sampling-optimizing test 10 times and adopt the largest optimal value as the lower bound for this case. Next, we generate another 20 samples from Ξ and add them to Ξ_{N_1}, and likewise generate 20 samples from z_0^T Ξ and add them to Ξ_{N_2}, which corresponds to the case with N_1 = 60 and N_2 = 60. We repeat the above testing procedure, and the process stops when N_1 and N_2 become so large that problem (P_SSD−L) can no longer be solved in MATLAB within an acceptable time. For the upper bound approximation, we consider the cases with the interval number K = 1, 2, 4, 8, 12.

Figure 2: The optimal values of the lower bound approximation with respect to N_1, N_2 and those of the upper bound approximation with respect to K.

From Figure 2, we observe that the lower bound approximation increases monotonically with the sample sizes N_1 and N_2, while the upper bound approximation decreases as the interval number K increases; the gap between the lower and upper bound approximations approaches 0. For more detail, Table 2 reports the optimal values and the optimal solutions obtained from the lower and upper bound approximations.

Table 2: The optimal values and the optimal solutions of the lower bound approximation with respect to N_1, N_2, and those of the upper bound approximation with respect to K.

lower bound approximation (by solving (P_SSD−L))   upper bound approximation (by Algorithm 2)
N_1  N_2  Optimal value (%)   K   Optimal value (%)   Gap
     Optimal solution             Optimal solution
40   40   -11.0082            1   -10.6534            3.22%
(0.000, 0.000, 0.068, 0.188, 0.000, 0.391, 0.231, 0.122)   (0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125)
60   60   -10.9872            2   -10.6543            3.04%
(0.000, 0.038, 0.000, 0.269, 0.000, 0.354, 0.213, 0.126)   (0.125, 0.124, 0.124, 0.127, 0.125, 0.126, 0.125, 0.125)
80   80   -10.9463            4   -10.6546            2.66%
(0.000, 0.006, 0.094, 0.138, 0.036, 0.389, 0.215, 0.123)   (0.124, 0.124, 0.123, 0.127, 0.124, 0.127, 0.125, 0.125)
100  100  -10.7838            8   -10.6551            1.19%
(0.000, 0.018, 0.168, 0.000, 0.131, 0.384, 0.172, 0.126)   (0.124, 0.124, 0.123, 0.128, 0.124, 0.127, 0.125, 0.125)
120  120  -10.7838            12  -10.7389            0.42%
(0.000, 0.018, 0.168, 0.000, 0.131, 0.384, 0.172, 0.126)   (0.075, 0.067, 0.005, 0.274, 0.087, 0.238, 0.125, 0.129)

From Table 2, we can see that the optimal value obtained from the lower bound approximation increases monotonically with the sample sizes N_1, N_2. This verifies the asymptotic convergence result established in Theorem 1. We can also see from Table 2 that the optimal value obtained from the upper bound approximation decreases as the number of sub-intervals increases. This verifies the asymptotic convergence result in Theorem 4 and supports our split-and-dual technique for deriving the upper bound. From Table 2, we can also observe the changing trend of the optimal portfolios of the lower and upper bound approximations.
Especially for the upper bound approximation, the optimal portfolio under K = 1 is the equally weighted portfolio, while the optimal portfolio under K = 12 is quite different from the equally weighted portfolio and approaches the optimal portfolios obtained from the lower bound approximation.

We observe from Table 2 that the lower and upper bounds we finally obtain are not equal. This is consistent with the theory because the upper bound approximation is not tight. In fact, in passing from problem (22)-(23) to problem (24), we exchange the order of the operators sup_{η ∈ [η_k, η̄_k]} and E_P, and this transformation is not an equivalent reformulation: since

    sup_{η ∈ [η_k, η̄_k]} E_P[f(η, ξ)] ≤ E_P[ sup_{η ∈ [η_k, η̄_k]} f(η, ξ) ],

the exchanged constraint is more conservative, and a gap is induced here. To evaluate the difference between the lower and upper bound approximations, we calculate the relative gap between the upper bound with K = 12 and the lower bound with N_1 = 120, N_2 = 120, which is only |(-10.7389 - (-10.7838))/(-10.7838)| ≈ 0.42%. Hence, whether we adopt the lower bound approximation ((P_SSD−L)) or the upper bound approximation (Algorithm 2), the optimal value obtained is close to the true optimal value.

To examine the price of introducing distributional robustness, we compare the numerical results of the robust stochastic dominance constrained portfolio optimization problem with those of the classic stochastic dominance constrained portfolio optimization problem. Specifically, the latter model reads

    min { E_{P̂_N}[−z^T ξ] | z ∈ Z, z^T ξ ⪰_(2) z_0^T ξ under P̂_N },

which is equivalent to

    min_{z ∈ Z} E_{P̂_N}[−z^T ξ]                                                    (40)
    s.t.  E_{P̂_N}[(η − z^T ξ)_+] ≤ E_{P̂_N}[(η − z_0^T ξ)_+],  ∀η ∈ ℝ.

Here the expectations are taken under the empirical distribution P̂_N.

Table 3 reports the comparative results. In Table 3, for the distributionally robust stochastic dominance constrained problem, denoted by 'RSD', we present the optimal expected return rate (the absolute value of the optimal value) and the optimal portfolio of the lower bound approximation obtained by solving (P_SSD−L) under ǫ = 10^{−…}, N_1 = 120, N_2 = 120, together with those of the upper bound approximation obtained by Algorithm 2 under K = 12; for the classic stochastic dominance constrained problem, denoted by 'SD', we present the optimal expected return rate and the optimal portfolio. Table 3 also exhibits the benchmark portfolio and its expected return rate.

Table 3: The optimal expected return rates and the optimal portfolios for the lower and upper bound approximations to problem (38)-(39), and those of problem (40).

Portfolio optimization problem                       Expected return rate (%)
RSD (38)-(39)   lower bound approximation            10.7838
                upper bound approximation            10.7389
SD (40)                                              11.0082
Benchmark                                            10.6534

From Table 3, we can see that both the lower and upper bound approximations to problem (38)-(39) with distributionally robust stochastic dominance constraints yield a smaller optimal expected return rate than problem (40) with classic stochastic dominance constraints.
Since the true optimal value of problem (38)-(39) lies between these two bounds, the optimal expected return rate of problem (38)-(39) must be smaller than that of problem (40). As expected, accounting for distributional ambiguity in the stochastic dominance constraints induces a more conservative solution. It can also be seen from Table 3 that the expected return rates of the lower and upper bound approximations are larger than that of the benchmark portfolio, which means that model (38)-(39) yields a portfolio better than the benchmark portfolio in the sense of the expected return rate. These numerical results demonstrate that introducing distributional robustness brings in conservativeness without loss of stochastic dominance over the benchmark.
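The dominance claim can also be verified directly on scenario data. The following small check is again our own sketch, not the paper's code: it tests the empirical SSD inequalities of problem (40) at the benchmark outcomes, which is sufficient when the benchmark has finitely many realizations [7].

```python
# Sketch: check whether portfolio z second-order dominates benchmark z0
# under the empirical distribution of the scenario matrix xi.
import numpy as np

def ssd_dominates(xi: np.ndarray, z: np.ndarray, z0: np.ndarray,
                  tol: float = 1e-9) -> bool:
    port, bench = xi @ z, xi @ z0
    # E[(eta - z^T xi)_+] <= E[(eta - z0^T xi)_+] at each benchmark outcome eta.
    return all(np.mean(np.maximum(eta - port, 0.0))
               <= np.mean(np.maximum(eta - bench, 0.0)) + tol
               for eta in bench)
```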
Finally, we briefly examine the impact of the robust radius on the lower and upper bound approximations to the portfolio optimization problem (38)-(39) with distributionally robust second-order stochastic dominance constraints. For the lower bound approximation, we solve the linear program (P_SSD−L) with N_1 = 120 and N_2 = 120; for the upper bound approximation, we run Algorithm 2 with K = 12. The optimal expected return rates of the lower and upper bound approximations under different robust radii are shown in Table 4.

Table 4: Optimal values of the lower and upper bound approximations, and their relative gaps, with respect to different robust radii.

Robust radius ǫ   Optimal values (%)                          Gap
                  lower bound approx.   upper bound approx.
10^{−…}           -10.8775              -10.8268              0.466%
10^{−…}           -10.7838              -10.7389              0.416%
10^{−…}           -10.7836              -10.6536              1.206%
10^{−…}           -10.7823              -10.6535              1.194%
0.1               -10.7689              -10.6534              1.072%
0.5               -10.6885              -10.6534              0.328%
1                 -10.6534              -10.6534              0%

We can clearly see from Table 4 that the optimal values of both the lower and upper bound approximations to problem (38)-(39) increase monotonically with the robust radius, which implies that the optimal value of problem (38)-(39) also increases as the robust radius increases. This is theoretically natural: a problem with a larger robust radius has a smaller feasible set and thus a larger optimal value. Table 4 also tells us that choosing a proper robust radius is a crucial issue in distributionally robust stochastic dominance constrained problems; a minimal sketch of such a radius sweep is given below.
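The sweep behind a table like Table 4 can be organized as follows. This is a hypothetical skeleton: `lower_bound` and `upper_bound` stand in for solving (P_SSD−L) and running Algorithm 2 at a given radius, and the radii in the usage comment are illustrative rather than the paper's exact values.

```python
# Hypothetical skeleton of a robust-radius sweep producing Table-4 style rows.
from typing import Callable, Sequence

def radius_sweep(radii: Sequence[float],
                 lower_bound: Callable[[float], float],
                 upper_bound: Callable[[float], float]) -> None:
    for eps in radii:
        lb, ub = lower_bound(eps), upper_bound(eps)
        gap = abs((ub - lb) / lb)   # relative gap |(upper - lower)/lower|
        print(f"eps = {eps:g}: lower = {lb:.4f}, upper = {ub:.4f}, gap = {gap:.3%}")

# Example (user-supplied solvers):
# radius_sweep([1e-4, 1e-3, 1e-2, 0.1, 0.5, 1.0], my_lower_solver, my_upper_solver)
```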
We notice that for robust radius ǫ ≥ 0.1, the optimal portfolio obtained from the upper bound approximation coincides with the benchmark portfolio (the benchmark portfolio is always feasible for problem (38)-(39) and provides a trivial upper bound, -10.6534, on the optimal value of problem (38)-(39)), which means that the upper bound approximation then provides no additional useful information for portfolio selection. Fortunately, when the robust radius satisfies ǫ ≤ 10^{−…}, both the lower and upper bound approximations yield optimal portfolios different from the benchmark portfolio and are thus useful for portfolio selection.

We consider a distributionally robust SSD constrained optimization problem, where the true distribution of the uncertain parameters is ambiguous; the ambiguity set contains those probability distributions close to the empirical distribution under the Wasserstein distance.

We propose two approximation methods to obtain bounds on the optimal value of the original problem. We adopt a sampling technique to develop a linear programming formulation that yields a lower bound approximation; it can be solved easily either in its linear programming form or by a cutting-plane method, and we prove that it is asymptotically tight. We also develop an upper bound approximation, built on a novel split-and-dual decomposition framework for reformulating the robust SSD constraints, and quantitatively estimate the approximation error between the optimal value of the upper bound approximation and that of the original problem; the upper bound approximation problem can be solved through a sequence of second-order cone programming problems. We carry out numerical experiments on a portfolio optimization problem to illustrate our lower and upper bound approximation methods.

One future research topic is to modify the design of the cutting planes so as to solve the lower bound approximation problem more efficiently. Besides, finding efficient approximation and solution methods for distributionally robust multivariate SSD constrained optimization is also a promising topic.
Funding
This research was supported by the National Natural Science Foundation of China under grant numbers 11991023, 11991020, 11735011, and 11901449.
References

[1] Zhi Chen, Daniel Kuhn, and Wolfram Wiesemann. Data-driven chance constrained programs over Wasserstein balls. Available at arXiv, 2018.
[2] Zhiping Chen and Jie Jiang. Stability analysis of optimization problems with k-th order stochastic and distributionally robust dominance constraints induced by full random recourse. SIAM Journal on Optimization, 28(2):1396–1419, 2018.
[3] Zhiping Chen, Yu Mei, and Jia Liu. Multivariate robust second-order stochastic dominance and resulting risk-averse optimization. Optimization, 68(9):1719–1747, 2019.
[4] Giorgio Consigli, Vittorio Moriggia, and Sebastiano Vitali. Long-term individual financial planning under stochastic dominance constraints. Annals of Operations Research, 292:973–1000, 2020.
[5] Darinka Dentcheva, René Henrion, and Andrzej Ruszczyński. Stability and sensitivity of optimization problems with first order stochastic dominance constraints. SIAM Journal on Optimization, 18(1):322–337, 2007.
[6] Darinka Dentcheva and Werner Römisch. Stability and sensitivity of stochastic dominance constrained optimization models. SIAM Journal on Optimization, 23(3):1672–1688, 2013.
[7] Darinka Dentcheva and Andrzej Ruszczyński. Optimization with stochastic dominance constraints. SIAM Journal on Optimization, 14(2):548–566, 2003.
[8] Darinka Dentcheva and Andrzej Ruszczyński. Portfolio optimization with stochastic dominance constraints. Journal of Banking & Finance, 30:433–451, 2006.
[9] Darinka Dentcheva and Andrzej Ruszczyński. Robust stochastic dominance and its application to risk-averse optimization. Mathematical Programming, 123(1):85–100, 2010.
[10] Jitka Dupačová and Miloš Kopa. Robustness of optimal portfolios under risk and stochastic dominance constraints. European Journal of Operational Research, 234:434–441, 2014.
[11] Laureano F. Escudero, María Araceli Garín, María Merino, and Gloria Pérez. On time stochastic dominance induced by mixed integer-linear recourse in multistage stochastic programs. European Journal of Operational Research, 249:164–176, 2016.
[12] Csaba I. Fábián, Gautam Mitra, and Diana Roman. Processing second-order stochastic dominance models using cutting-plane representations. Mathematical Programming, 130(1):33–57, 2011.
[13] Rui Gao and Anton J. Kleywegt. Distributionally robust stochastic optimization with Wasserstein distance. Available at arXiv, 2016.
[14] Ralf Gollmer, Uwe Gotzes, and Rüdiger Schultz. A note on second-order stochastic dominance constraints induced by mixed-integer linear recourse. Mathematical Programming, 126(1):179–190, 2011.
[15] Ralf Gollmer, Frederike Neise, and Rüdiger Schultz. Stochastic programs with first-order dominance constraints induced by mixed-integer linear recourse. SIAM Journal on Optimization, 19(2):552–571, 2008.
[16] Shaoyan Guo, Huifu Xu, and Liwei Zhang. Probability approximation schemes for stochastic programs with distributionally robust second-order dominance constraints. Optimization Methods and Software, 32(4):770–789, 2017.
[17] William B. Haskell, Lunce Fu, and Maged Dessouky. Ambiguity in risk preferences in robust stochastic optimization. European Journal of Operational Research, 254:214–225, 2016.
[18] William B. Haskell, J. George Shanthikumar, and Z. Max Shen. Primal-dual algorithms for optimization with stochastic dominance. SIAM Journal on Optimization, 27(1):34–66, 2017.
[19] Tito Homem-de-Mello and Sanjay Mehrotra. A cutting-surface method for uncertain linear programs with polyhedral stochastic dominance constraints. SIAM Journal on Optimization, 20(3):1250–1273, 2009.
[20] Jian Hu and Gevorg Stepanyan. Optimization with reference-based robust preference constraints. SIAM Journal on Optimization, 27(4):2230–2257, 2017.
[21] Ran Ji and Miguel Lejeune. Data-driven distributionally robust chance-constrained optimization with Wasserstein metric. Available at SSRN 3201356, 2020.
[22] Ruiwei Jiang and Yongpei Guan. Data-driven chance constrained stochastic program. Mathematical Programming, 158:291–327, 2016.
[23] Juuso Liesiö, Peng Xu, and Timo Kuosmanen. Portfolio diversification based on stochastic dominance under incomplete probability information. European Journal of Operational Research, 286:755–768, 2020.
[24] Jia Liu, Abdel Lisser, and Zhiping Chen. Distributionally robust chance constrained geometric optimization. Available at Optimization Online, 2019.
[25] Yongchao Liu, Hailin Sun, and Huifu Xu. An approximation scheme for stochastic programs with second order dominance constraints. Numerical Algebra, Control & Optimization, 6(4):473–490, 2016.
[26] James Luedtke. New formulations for optimization under stochastic dominance constraints. SIAM Journal on Optimization, 19(3):1433–1450, 2008.
[27] Olvi L. Mangasarian and Stan Fromovitz. The Fritz John necessary optimality conditions in the presence of equality and inequality constraints. Journal of Mathematical Analysis and Applications, 17:37–47, 1967.
[28] Yu Mei, Zhiping Chen, Bingbing Ji, Zhujia Xu, and Jia Liu. Data-driven stochastic programming with distributionally robust constraints under Wasserstein distance: asymptotic properties. Journal of the Operations Research Society of China, 2020.
[29] Peyman Mohajerin Esfahani and Daniel Kuhn. Data-driven distributionally robust optimization using the Wasserstein metric: performance guarantees and tractable reformulations. Mathematical Programming, 171:115–166, 2018.
[30] Nilay Noyan and Gábor Rudolf. Optimization with stochastic preferences based on a general class of scalarization functions. Operations Research, 66(2):463–486, 2018.
[31] Nilay Noyan, Gábor Rudolf, and Andrzej Ruszczyński. Relaxations of linear programming problems with first order stochastic dominance constraints. Operations Research Letters, 34(6):653–659, 2006.
[32] R. Tyrrell Rockafellar and Roger J-B Wets. Variational Analysis. Springer, Berlin, 2009.
[33] Gábor Rudolf and Andrzej Ruszczyński. Optimization problems with second order stochastic dominance constraints: duality, compact formulations, and cut generation methods. SIAM Journal on Optimization, 19(3):1326–1343, 2008.
[34] Hailin Sun, Huifu Xu, Rudabeh Meskarian, and Yong Wang. Exact penalization, level function method, and modified cutting-plane method for stochastic programs with second order stochastic dominance constraints. SIAM Journal on Optimization, 23(1):602–631, 2013.
[35] Weijun Xie. On distributionally robust chance constrained programs with Wasserstein distance. Mathematical Programming, 2019.
[36] Jane J. Ye and Daoli Zhu. Optimality conditions for bilevel programming problems. Optimization, 33(1):9–27, 1995.
[37] Sainan Zhang, Shaoyan Guo, Liwei Zhang, and Hongwei Zhang. On distributionally robust optimization problems with k-th order stochastic dominance constraints induced by full random quadratic recourse. Journal of Mathematical Analysis and Applications, 493(2):124564, 2021.
[38] Chaoyue Zhao and Yongpei Guan. Data-driven risk-averse stochastic optimization with Wasserstein metric. Operations Research Letters, 46(2):262–267, 2018.
[39] S. Zymler, D. Kuhn, and B. Rustem. Distributionally robust joint chance constraints with second-order moment information. Mathematical Programming, 137(1–2):167–198, 2013.