Codifferentials and Quasidifferentials of the Expectation of Nonsmooth Random Integrands and Two-Stage Stochastic Programming
arXiv preprint, [math.OC]
M.V. Dolgopolik
February 15, 2021
Abstract
This work is devoted to an analysis of exact penalty functions and optimality conditions for nonsmooth two-stage stochastic programming problems. To this end, we first study the co-/quasi-differentiability of the expectation of nonsmooth random integrands and obtain explicit formulae for its co- and quasidifferential under some natural assumptions on the integrand. Then we analyse exact penalty functions for a variational reformulation of two-stage stochastic programming problems and obtain sufficient conditions for the global exactness of these functions with two different penalty terms. At the end of the paper, we combine our results on the co-/quasi-differentiability of the expectation of nonsmooth random integrands and exact penalty functions to derive optimality conditions for nonsmooth two-stage stochastic programming problems in terms of codifferentials.
Two-stage stochastic programming is one of the basic problems of stochastic optimization [3,40] that has multiple applications in various fields, including transportation planning [2, 30], disaster management [25], optimal design of energy systems [49], resources management [27], etc. Although two-stage stochastic programming problems can be viewed as stochastic versions of bilevel optimization problems [8,9], their stochastic nature requires a largely different approach to their solution. Optimality conditions for two-stage stochastic programming problems were obtained in [26,36,40,45,46], while numerical methods for solving various classes of two-stage stochastic programming problems were studied e.g. in [23, 28, 32, 39] (see also the references therein).

The need for computing convex or nonconvex subdifferentials of the expectation of nonsmooth random integrands arises in many areas of stochastic optimization, including two-stage stochastic programming, as well as stochastic linear complementarity problems [6], stochastic variational inequalities [7], etc. The subdifferential in the sense of convex analysis of the expectation of a convex integrand was computed in [37], while its approximations were discussed in [31]. Various approximations of the Clarke subdifferential of the expectation of nonsmooth random integrands were studied in [5, 47], while an outer estimate of its Mordukhovich basic subdifferential was obtained in [46]. Finally, a quasidifferential of the expectation of quasidifferentiable random integrands was computed in [29].

The main goal of this paper is to apply constructive nonsmooth analysis [12,14,15] to a theoretical analysis of nonsmooth two-stage stochastic programming problems.
Firstly, we analyse the codifferentiability and quasidifferentiability of the expectation of nonsmooth random integrands and present explicit formulae for its codifferential and quasidifferential in a more general case and under different assumptions than in [29] (see Remark 2 for more details).

In the second part of the paper we study exact penalty functions for two-stage stochastic programming problems, reformulated as equivalent variational problems with pointwise constraints. With the use of the general theory of exact penalty functions [11, 19, 22, 34, 38, 48], we obtain sufficient conditions for the global exactness of penalty functions for two-stage stochastic programming with two different types of penalty terms. The use of penalty terms of the first type leads to much less restrictive assumptions on constraints of the second stage problem, while the second type of penalty terms is more convenient for applications. In particular, it allows one to reformulate two-stage stochastic programming problems, whose second stage problem has a DC (Difference-of-Convex) objective function and DC constraints, as equivalent unconstrained DC optimization problems and apply the well-developed apparatus of DC optimization to find their solutions (cf. analogous results for bilevel programming problems in [33, 42]). Let us also note that exact penalty functions for single-stage stochastic programming were analysed in [24].

Finally, at the end of the paper we combine our results on quasidifferentials of the expectation of nonsmooth random integrands and exact penalty functions for two-stage stochastic programming problems to obtain necessary optimality conditions for these problems in terms of codifferentials.

The paper is organised as follows. Some auxiliary definitions and facts from constructive nonsmooth analysis that are necessary for understanding the paper are collected in Section 2.
Codifferentiability and quasidifferentiability of the expectation of nonsmooth random integrands are studied in Section 3, while Section 4 is devoted to nonsmooth two-stage stochastic programming problems. Exact penalty functions for such problems are analysed in Subsection 4.1, while optimality conditions for these problems in terms of codifferentials are derived in Subsection 4.2.
Let us introduce the notation and briefly recall several definitions from nonsmooth analysis that will be used throughout the article. For more details in the finite dimensional case see [12,14,15]. The infinite dimensional case was studied in [16–18, 20].

Let $X$ be a real Banach space. Denote by $X^*$ its topological dual, and by $\langle \cdot, \cdot \rangle$ the duality pairing between $X$ and $X^*$. The weak$^*$ topology on $X^*$ is denoted by $w^*$ or $\sigma(X^*, X)$ depending on the context. Denote also by $\tau_{\mathbb{R}}$ the canonical topology of the real line $\mathbb{R}$. Let finally $U \subset X$ be an open set.

Definition 1. A function $f \colon U \to \mathbb{R}$ is called codifferentiable at a point $x \in U$, if there exists a pair of convex subsets $\underline{d} f(x), \overline{d} f(x) \subset \mathbb{R} \times X^*$ that are compact in the topological product $(\mathbb{R} \times X^*, \tau_{\mathbb{R}} \times w^*)$, satisfy the equality
\[
  \max_{(a, x^*) \in \underline{d} f(x)} a = \min_{(b, y^*) \in \overline{d} f(x)} b = 0, \tag{1}
\]
and for any $\Delta x \in X$ satisfy the following condition:
\[
  \lim_{\alpha \to +0} \frac{1}{\alpha} \Big| f(x + \alpha \Delta x) - f(x)
  - \max_{(a, x^*) \in \underline{d} f(x)} \big( a + \langle x^*, \alpha \Delta x \rangle \big)
  - \min_{(b, y^*) \in \overline{d} f(x)} \big( b + \langle y^*, \alpha \Delta x \rangle \big) \Big| = 0.
\]
The pair $Df(x) = [\underline{d} f(x), \overline{d} f(x)]$ is called a codifferential of $f$ at $x$, the set $\underline{d} f(x)$ is referred to as a hypodifferential of $f$ at $x$, while the set $\overline{d} f(x)$ is called a hyperdifferential of $f$ at $x$.

Remark 1. (i) In the case when $X = \mathbb{R}^d$, a codifferential $Df(x)$ is a pair of convex compact subsets of $\mathbb{R} \times \mathbb{R}^d = \mathbb{R}^{d+1}$ satisfying the equalities from the previous definition. In addition, if $X$ is a Hilbert space, then it is natural to suppose that a codifferential $Df(x)$ is a pair of convex weakly compact subsets of the space $\mathbb{R} \times X$.

(ii) Note that a codifferential is not uniquely defined. In particular, one can easily verify that for any compact convex subset $C$ of the space $(\mathbb{R} \times X^*, \tau_{\mathbb{R}} \times w^*)$ the pair $[\underline{d} f(x) + C, \overline{d} f(x) - C]$ is a codifferential of $f$ at $x$ as well.

Definition 2.
A function $f \colon U \to \mathbb{R}$ is called continuously codifferentiable at a point $x \in U$, if $f$ is codifferentiable at every point in a neighbourhood of $x$ and there exists a codifferential mapping $Df(\cdot) = [\underline{d} f(\cdot), \overline{d} f(\cdot)]$, defined in a neighbourhood of $x$ and such that the multifunctions $\underline{d} f(\cdot)$ and $\overline{d} f(\cdot)$ are continuous in the Hausdorff metric at $x$.

The class of functions that are continuously codifferentiable at a given point (or on a given set) is closed under addition, multiplication, composition with continuously differentiable functions, as well as pointwise maximum and minimum of finite families of functions. Moreover, any convex function is continuously codifferentiable in a neighbourhood of any given point from the interior of its effective domain, and any DC function (i.e. a function that can be represented as the difference of convex functions) is continuously codifferentiable in a neighbourhood of any given point. Numerous examples of continuously codifferentiable functions, as well as the main rules of codifferential calculus, can be found in [12, 14, 15, 18, 20].

Definition 3.
A function $f \colon U \to \mathbb{R}$ is called quasidifferentiable at a point $x \in U$, if $f$ is directionally differentiable at $x$ and its directional derivative $f'(x, \cdot)$ at this point can be represented as the difference of sublinear functions or, equivalently, if there exists a pair $\underline{\partial} f(x), \overline{\partial} f(x) \subset X^*$ of convex weak$^*$ compact sets such that
\[
  f'(x, h) = \max_{x^* \in \underline{\partial} f(x)} \langle x^*, h \rangle
  + \min_{y^* \in \overline{\partial} f(x)} \langle y^*, h \rangle \quad \forall h \in X.
\]
The pair $\mathcal{D} f(x) = [\underline{\partial} f(x), \overline{\partial} f(x)]$ is called a quasidifferential of $f$ at $x$, the set $\underline{\partial} f(x)$ is called a subdifferential of $f$ at $x$, while the set $\overline{\partial} f(x)$ is referred to as a superdifferential of $f$ at $x$.

Just like a codifferential, a quasidifferential is not uniquely defined. Here we only mention that a function $f$ is codifferentiable at a point $x$ iff $f$ is quasidifferentiable at $x$, and one can easily compute a quasidifferential of $f$ at $x$ from its codifferential at this point and vice versa. Namely, if $Df(x)$ is a codifferential of $f$ at $x$, then the pair $\mathcal{D} f(x) = [\underline{\partial} f(x), \overline{\partial} f(x)]$ with
\[
  \underline{\partial} f(x) = \Big\{ x^* \in X^* \Bigm| (0, x^*) \in \underline{d} f(x) \Big\}, \quad
  \overline{\partial} f(x) = \Big\{ y^* \in X^* \Bigm| (0, y^*) \in \overline{d} f(x) \Big\} \tag{2}
\]
is a quasidifferential of $f$ at $x$. Conversely, if $\mathcal{D} f(x)$ is a quasidifferential of $f$ at $x$, then the pair $[\{0\} \times \underline{\partial} f(x), \{0\} \times \overline{\partial} f(x)]$ is a codifferential of $f$ at $x$ (see, e.g. [14, 20]). Below we consider only quasidifferentials of the form (2), that is, we suppose that if a codifferentiable function $f$ and its codifferential $Df(x)$ are given, then $\mathcal{D} f(x)$ is a quasidifferential of $f$ of the form (2).

Let us finally recall one auxiliary definition from set-valued analysis that will be used later (see, e.g. [1, Sect. 8.2] for more details). Let $X$ and $Y$ be metric spaces and $(\Omega, \mathcal{A}, \mu)$ be a measure space. A set-valued mapping $F \colon X \times \Omega \rightrightarrows Y$, $F = F(x, \omega)$, is called a Carathéodory map, if for every $x \in X$ the multifunction $F(x, \cdot)$ is measurable and for a.e.
$\omega \in \Omega$ the multifunction $F(\cdot, \omega)$ is continuous.

Let $(\Omega, \mathcal{A}, P)$ be a probability space, and suppose that a nonsmooth function $f \colon \mathbb{R}^d \times \mathbb{R}^m \times \Omega \to \mathbb{R}$, $f = f(x, y, \omega)$, is given. In this section we study the codifferentiability of the nonsmooth integral functional
\[
  \mathcal{I}(x, y) = \mathbb{E}\big[ f(x, y(\cdot), \cdot) \big] := \int_\Omega f(x, y(\omega), \omega) \, dP(\omega),
\]
where $x \in \mathbb{R}^d$ is a parameter and $y \in L^p(\Omega, \mathcal{A}, P; \mathbb{R}^m)$ with $1 < p \le +\infty$ is an $m$-dimensional random vector. Although the case $p = 1$ can be included into the general theory under some additional assumptions, we exclude it for the sake of simplicity, since the proofs of the main results below are much more cumbersome in the case $p = 1$ than in the case $1 < p \le +\infty$.

Denote by $p' \in [1, +\infty)$ the conjugate exponent of $p$, i.e. $1/p + 1/p' = 1$, and let $|\cdot|$ be the Euclidean norm in $\mathbb{R}^n$. Let us impose some assumptions on the integrand $f$ that, as we will show below, ensure that the functional $\mathcal{I}$ is correctly defined and codifferentiable.

Namely, we will suppose that for a.e. $\omega \in \Omega$ and for all $(x, y) \in \mathbb{R}^d \times \mathbb{R}^m$ the function $f$ is codifferentiable jointly in $x$ and $y$, that is, there exists a pair of compact convex sets $\underline{d}_{x,y} f(x, y, \omega), \overline{d}_{x,y} f(x, y, \omega) \subset \mathbb{R} \times \mathbb{R}^d \times \mathbb{R}^m$ such that
\[
  \Phi_f(x, y, \omega; 0, 0) = \Psi_f(x, y, \omega; 0, 0) = 0,
\]
and for all $(\Delta x, \Delta y) \in \mathbb{R}^d \times \mathbb{R}^m$ one has
\[
  \lim_{\alpha \to +0} \frac{1}{\alpha} \Big| f(x + \alpha \Delta x, y + \alpha \Delta y, \omega) - f(x, y, \omega)
  - \Phi_f(x, y, \omega; \alpha \Delta x, \alpha \Delta y) - \Psi_f(x, y, \omega; \alpha \Delta x, \alpha \Delta y) \Big| = 0,
\]
where
\[
\begin{aligned}
  \Phi_f(x, y, \omega; \Delta x, \Delta y) &= \max_{(a, v_x, v_y) \in \underline{d}_{x,y} f(x, y, \omega)} \big( a + \langle v_x, \Delta x \rangle + \langle v_y, \Delta y \rangle \big), \\
  \Psi_f(x, y, \omega; \Delta x, \Delta y) &= \min_{(b, w_x, w_y) \in \overline{d}_{x,y} f(x, y, \omega)} \big( b + \langle w_x, \Delta x \rangle + \langle w_y, \Delta y \rangle \big).
\end{aligned} \tag{3}
\]
The pair $D_{x,y} f(x, y, \omega) = [\underline{d}_{x,y} f(x, y, \omega), \overline{d}_{x,y} f(x, y, \omega)]$ is called a codifferential of $f$ in $(x, y)$.

Assumption 1.
The function $f$ satisfies the following conditions:

1. for any $x \in \mathbb{R}^d$ the map $(y, \omega) \mapsto f(x, y, \omega)$ is a Carathéodory function;

2. the function $f$ satisfies the following growth condition of order $p$: for any $N > 0$ there exist $C_N > 0$ and an a.e. nonnegative function $\beta_N \in L^1(\Omega, \mathcal{A}, P)$ such that $|f(x, y, \omega)| \le \beta_N(\omega) + C_N |y|^p$ for all $x \in \mathbb{R}^d$ with $|x| \le N$, all $y \in \mathbb{R}^m$, and a.e. $\omega \in \Omega$ in the case $1 < p < +\infty$, and $|f(x, y, \omega)| \le \beta_N(\omega)$ for a.e. $\omega \in \Omega$ and all $(x, y) \in \mathbb{R}^d \times \mathbb{R}^m$ with $\max\{|x|, |y|\} \le N$ in the case $p = +\infty$;

3. the multifunctions $(y, \omega) \mapsto \underline{d}_{x,y} f(x, y, \omega)$ and $(y, \omega) \mapsto \overline{d}_{x,y} f(x, y, \omega)$ are Carathéodory maps for any $x \in \mathbb{R}^d$;

4. the codifferential mapping $D_{x,y} f(\cdot)$ satisfies the following growth condition of order $p$: for any $N > 0$ there exist $C_N > 0$ and a.e. nonnegative functions $\beta_N \in L^1(\Omega, \mathcal{A}, P)$ and $\gamma_N \in L^{p'}(\Omega, \mathcal{A}, P)$ such that
\[
  \max\{|a|, |v_x|\} \le \beta_N(\omega) + C_N |y|^p, \quad |v_y| \le \gamma_N(\omega) + C_N |y|^{p-1}
\]
for all $(a, v_x, v_y) \in \underline{d}_{x,y} f(x, y, \omega) \cup \overline{d}_{x,y} f(x, y, \omega)$, all $x \in \mathbb{R}^d$ with $|x| \le N$, all $y \in \mathbb{R}^m$, and a.e. $\omega \in \Omega$ in the case $1 < p < +\infty$, and
\[
  \max\{|a|, |v_x|, |v_y|\} \le \beta_N(\omega)
\]
for all $(a, v_x, v_y) \in \underline{d}_{x,y} f(x, y, \omega) \cup \overline{d}_{x,y} f(x, y, \omega)$, a.e. $\omega \in \Omega$, and all $(x, y) \in \mathbb{R}^d \times \mathbb{R}^m$ with $\max\{|x|, |y|\} \le N$ in the case $p = +\infty$.

Note that the Carathéodory and the growth conditions on the function $f$ ensure that the value $\mathcal{I}(x, y)$ is correctly defined and finite for all $x \in \mathbb{R}^d$ and $y \in L^p(\Omega, \mathcal{A}, P; \mathbb{R}^m)$. Let $X = \mathbb{R}^d \times L^p(\Omega, \mathcal{A}, P; \mathbb{R}^m)$.

Theorem 1.
Let $1 < p \le +\infty$ and Assumption 1 be valid. Then the functional $\mathcal{I}$ is codifferentiable on $\mathbb{R}^d \times L^p(\Omega, \mathcal{A}, P; \mathbb{R}^m)$, and for any $(x, y)$ from this space the pair $D\mathcal{I}(x, y) = [\underline{d} \mathcal{I}(x, y), \overline{d} \mathcal{I}(x, y)]$, defined as
\[
\begin{aligned}
  \underline{d} \mathcal{I}(x, y) = \Big\{ (A, x^*) \in \mathbb{R} \times X^* \Bigm| {}
  & A = \mathbb{E}[a], \quad \langle x^*, (h_x, h_y) \rangle = \big\langle \mathbb{E}[v_x], h_x \big\rangle + \int_\Omega \langle v_y(\omega), h_y(\omega) \rangle \, dP(\omega) \quad \forall (h_x, h_y) \in X, \\
  & (a(\cdot), v_x(\cdot), v_y(\cdot)) \text{ is a measurable selection of the map } \underline{d}_{x,y} f(x, y(\cdot), \cdot) \Big\}
\end{aligned} \tag{4}
\]
and
\[
\begin{aligned}
  \overline{d} \mathcal{I}(x, y) = \Big\{ (B, y^*) \in \mathbb{R} \times X^* \Bigm| {}
  & B = \mathbb{E}[b], \quad \langle y^*, (h_x, h_y) \rangle = \big\langle \mathbb{E}[w_x], h_x \big\rangle + \int_\Omega \langle w_y(\omega), h_y(\omega) \rangle \, dP(\omega) \quad \forall (h_x, h_y) \in X, \\
  & (b(\cdot), w_x(\cdot), w_y(\cdot)) \text{ is a measurable selection of the map } \overline{d}_{x,y} f(x, y(\cdot), \cdot) \Big\},
\end{aligned}
\]
is a codifferential of $\mathcal{I}$ at $(x, y)$.

The proof of Theorem 1 is similar to the proof of the codifferentiability of the mapping $\mathcal{I}(u) = \int_\Omega f(x, u(x), \nabla u(x)) \, dx$ from the author's papers [17, 21] (here $\Omega \subseteq \mathbb{R}^n$ is an open set and $u$ belongs to the Sobolev space). On the other hand, Theorem 1 cannot be directly deduced from the main results of [17, 21]. That is why below we present a detailed proof of Theorem 1. It seems possible to prove a more general result on the codifferentiability of integral functionals defined on Banach spaces that subsumes Theorem 1 and the main results of [17,21] as particular cases. The development of such a general theorem on the codifferentiability of nonsmooth integral functionals is an interesting open problem for future research.

For the sake of convenience, we divide the proof of Theorem 1 into two lemmas.

Lemma 1.
Let $1 < p \le +\infty$ and Assumption 1 be valid. Then for any $(x, y) \in X$ the sets $\underline{d} \mathcal{I}(x, y)$ and $\overline{d} \mathcal{I}(x, y)$ from Theorem 1 are nonempty, convex, compact in the topological product $(\mathbb{R} \times X^*, \tau_{\mathbb{R}} \times w^*)$, and satisfy the following equalities:
\[
  \max_{(A, x^*) \in \underline{d} \mathcal{I}(x, y)} A = \min_{(B, y^*) \in \overline{d} \mathcal{I}(x, y)} B = 0. \tag{5}
\]

Proof.
Fix any $(x, y) \in X$. We prove the statement of the lemma only for the hypodifferential $\underline{d} \mathcal{I}(x, y)$, since the proof for the hyperdifferential $\overline{d} \mathcal{I}(x, y)$ is exactly the same.

By Assumption 1 the multifunction $(y, \omega) \mapsto \underline{d}_{x,y} f(x, y, \omega)$ is a Carathéodory map. Therefore by [1, Thrm. 8.2.8] the multifunction $\underline{d}_{x,y} f(x, y(\cdot), \cdot)$ is measurable, which by [1, Thrm. 8.1.3] implies that there exists a measurable selection $(a(\cdot), v_x(\cdot), v_y(\cdot))$ of this mapping. Furthermore, by the growth condition on the codifferential $D_{x,y} f(\cdot)$ from Assumption 1 all measurable selections of the set-valued mapping $\underline{d}_{x,y} f(x, y(\cdot), \cdot)$ belong to the space
\[
  Y := L^1(\Omega, \mathcal{A}, P) \times L^1(\Omega, \mathcal{A}, P; \mathbb{R}^d) \times L^{p'}(\Omega, \mathcal{A}, P; \mathbb{R}^m). \tag{6}
\]
Consequently, the linear functional $x^*$, defined as
\[
  \langle x^*, (h_x, h_y) \rangle = \big\langle \mathbb{E}[v_x], h_x \big\rangle + \int_\Omega \langle v_y(\omega), h_y(\omega) \rangle \, dP(\omega) \quad \forall (h_x, h_y) \in X,
\]
belongs to $X^*$, and one can conclude that the hypodifferential $\underline{d} \mathcal{I}(x, y)$ is correctly defined and nonempty.

Denote by $\mathcal{E}(x, y)$ the set of all measurable selections $z(\cdot) = (a(\cdot), v_x(\cdot), v_y(\cdot))$ of the set-valued mapping $\underline{d}_{x,y} f(x, y(\cdot), \cdot)$. As was noted above, $\mathcal{E}(x, y)$ is a subset of the space $Y$ defined in (6). For any $z = (a, v_x, v_y) \in Y$ denote by $T(z)$ the pair $(A, x^*)$ defined as in (4). Then $\underline{d} \mathcal{I}(x, y) = T(\mathcal{E}(x, y))$.

By definition, for a.e. $\omega \in \Omega$ the hypodifferential $\underline{d}_{x,y} f(x, y(\omega), \omega)$ is a convex set. Therefore the set of measurable selections $\mathcal{E}(x, y)$ of the multifunction $\underline{d}_{x,y} f(x, y(\cdot), \cdot)$ is convex. Hence, taking into account the fact that the operator $T$ is linear, one obtains that the hypodifferential $\underline{d} \mathcal{I}(x, y)$ is a convex set as the image of a convex set under a linear map.

Recall that by the definition of hypodifferential one has $a \le 0$ for all $(a, v_x, v_y) \in \underline{d}_{x,y} f(x, y(\omega), \omega)$, $\omega \in \Omega$. Therefore $A \le 0$ for all $(A, x^*) \in \underline{d} \mathcal{I}(x, y)$. On the other hand, observe that thanks to equality (1) for a.e.
$\omega \in \Omega$ one has
\[
  0 \in \Big\{ a \in \mathbb{R} \Bigm| \exists (v_x, v_y) \in \mathbb{R}^{d+m} \colon (a, v_x, v_y) \in \underline{d}_{x,y} f(x, y(\omega), \omega) \Big\}.
\]
Hence by the Filippov theorem (see, e.g. [1, Thrm. 8.2.10]) there exists a measurable selection $(a_0(\cdot), v_{x0}(\cdot), v_{y0}(\cdot))$ of the multifunction $\underline{d}_{x,y} f(x, y(\cdot), \cdot)$ such that $a_0(\omega) = 0$ almost surely. Consequently, for $(A_0, x_0^*) = T(a_0, v_{x0}, v_{y0})$ one has $A_0 = 0$, which implies that equality (5) holds true.

Thus, it remains to prove the compactness of the set $\underline{d} \mathcal{I}(x, y)$ in the corresponding product topology. To this end, let us verify that the set $\mathcal{E}(x, y)$ is a weakly compact subset of the space $Y$ defined in (6), and the operator $T$ continuously maps the space $Y$ endowed with the weak topology to the topological product $(\mathbb{R}, \tau_{\mathbb{R}}) \times (X^*, w^*)$. Then one can conclude that the hypodifferential $\underline{d} \mathcal{I}(x, y)$ is compact in the corresponding product topology as a continuous image of a compact set.

We start with the proof of the continuity of the operator $T$. Let $V$ be an open subset of the product space $(\mathbb{R}, \tau_{\mathbb{R}}) \times (X^*, w^*)$. Let us show that its preimage $U = T^{-1}(V)$ under the map $T$ is weakly open in $Y$. Indeed, fix any $(a, v_x, v_y) \in U$. Then $(A, x^*) = T(a, v_x, v_y) \in V$, which due to the openness of the set $V$ in the corresponding topology implies that there exist $\varepsilon > 0$, $n \in \mathbb{N}$, and pairs $(h_i, \xi_i) \in X$, $i \in I = \{1, \ldots, n\}$, such that
\[
  V_\varepsilon(A, x^*) = \Big\{ (B, y^*) \in \mathbb{R} \times X^* \Bigm| |B - A| < \varepsilon, \ \max_{i \in I} \big| \langle y^* - x^*, (h_i, \xi_i) \rangle \big| < \varepsilon \Big\} \subseteq V.
\]
Introduce the set
\[
\begin{aligned}
  U_\varepsilon(a, v_x, v_y) = \bigg\{ (b, w_x, w_y) \in Y \biggm| {}
  & \big| \mathbb{E}(b - a) \big| < \varepsilon, \quad
  \max_{i \in I} \Big| \int_\Omega \langle w_x(\omega) - v_x(\omega), h_i \rangle \, dP(\omega) \Big| < \frac{\varepsilon}{2}, \\
  & \max_{i \in I} \Big| \int_\Omega \langle w_y(\omega) - v_y(\omega), \xi_i(\omega) \rangle \, dP(\omega) \Big| < \frac{\varepsilon}{2} \bigg\}.
\end{aligned}
\]
This set is a neighbourhood of the point $(a, v_x, v_y)$ in the weak topology on $Y$. Moreover, by definition $T(U_\varepsilon(a, v_x, v_y)) \subseteq V_\varepsilon(A, x^*)$, which implies that $U_\varepsilon(a, v_x, v_y) \subseteq U$. Thus, for any point $(a, v_x, v_y) \in U$ there exists a neighbourhood of this point in the weak topology contained in $U$. In other words, the set $U$ is weakly open, and one can conclude that the operator $T$ is continuous with respect to the chosen topologies.

Let us finally prove the weak compactness of the set $\mathcal{E}(x, y)$ in the space $Y$ defined in (6). By the Eberlein-Šmulian theorem it suffices to prove that $\mathcal{E}(x, y)$ is weakly sequentially compact. To this end, choose any sequence $z_n(\cdot) = (a_n(\cdot), v_{xn}(\cdot), v_{yn}(\cdot)) \in \mathcal{E}(x, y)$, $n \in \mathbb{N}$. Let us consider two cases.

Case $p = +\infty$. By the growth condition on the codifferential $D_{x,y} f(\cdot)$ (see Assumption 1) there exists an a.e. nonnegative function $\beta \in L^1(\Omega, \mathcal{A}, P)$ such that for a.e. $\omega \in \Omega$ one has
\[
  \max\big\{ |a_n(\omega)|, |v_{xn}(\omega)|, |v_{yn}(\omega)| \big\} \le \beta(\omega) \quad \forall n \in \mathbb{N}.
\]
Hence by the weak compactness criterion in $L^1$ (see, e.g. [4, Thrm. 4.7.20]) the closures of the sets $\{a_n\}_{n \in \mathbb{N}}$, $\{v_{xn}\}_{n \in \mathbb{N}}$, and $\{v_{yn}\}_{n \in \mathbb{N}}$ are weakly compact in the corresponding $L^1$ spaces. Therefore by the Eberlein-Šmulian theorem there exists a subsequence $z_{n_k} = (a_{n_k}, v_{x n_k}, v_{y n_k})$ weakly converging to some $z_*$ in $Y$. By Mazur's lemma there exists a sequence of convex combinations $\{\widehat{z}_k\}$ of elements of the sequence $z_{n_k}$ strongly converging to $z_*$.
Therefore, as is well known, there exists a subsequence $\{\widehat{z}_{k_l}\}$ converging to $z_*$ almost surely.

Note that due to the convexity of $\mathcal{E}(x, y)$ one has $\{\widehat{z}_k\} \subset \mathcal{E}(x, y)$, that is, $\widehat{z}_k(\omega) \in \underline{d}_{x,y} f(x, y(\omega), \omega)$ for a.e. $\omega \in \Omega$ and all $k \in \mathbb{N}$. Hence, taking into account the fact that by definition the hypodifferential $\underline{d}_{x,y} f(x, y(\omega), \omega)$, $\omega \in \Omega$, is a closed set, one obtains that $z_*(\omega) \in \underline{d}_{x,y} f(x, y(\omega), \omega)$ for a.e. $\omega \in \Omega$. Thus, $z_* \in \mathcal{E}(x, y)$, and the set $\mathcal{E}(x, y)$ is weakly sequentially compact, which completes the proof.

Case $p < +\infty$. By the growth condition on the codifferential $D_{x,y} f(\cdot)$ (see Assumption 1) there exist $C > 0$ and a.e. nonnegative functions $\beta \in L^1(\Omega, \mathcal{A}, P)$ and $\gamma \in L^{p'}(\Omega, \mathcal{A}, P)$ such that for a.e. $\omega \in \Omega$ and all $n \in \mathbb{N}$ one has
\[
  \max\big\{ |a_n(\omega)|, |v_{xn}(\omega)| \big\} \le \beta(\omega) + C |y(\omega)|^p, \quad
  |v_{yn}(\omega)| \le \gamma(\omega) + C |y(\omega)|^{p-1}.
\]
Observe that the right-hand side of the first inequality belongs to $L^1(\Omega, \mathcal{A}, P)$, while the right-hand side of the second one belongs to $L^{p'}(\Omega, \mathcal{A}, P)$. Thus, the sequence $\{v_{yn}\}$ is norm-bounded in $L^{p'}(\Omega, \mathcal{A}, P; \mathbb{R}^m)$, which due to the reflexivity of this space (note that $1 < p' < +\infty$, since $1 < p < +\infty$) implies that there exists a weakly convergent subsequence $\{v_{y n_k}\}$. In turn, the existence of weakly convergent subsequences of the sequences $\{a_n\}$ and $\{v_{xn}\}$ follows from the weak compactness criterion in $L^1$ (see [4, Thrm. 4.7.20]).

Thus, there exists a subsequence $\{z_{n_k}\}$ weakly converging to some $z_* \in Y$. Now, applying Mazur's lemma and arguing precisely in the same way as in the case $p = +\infty$ one can prove the weak compactness of the set $\mathcal{E}(x, y)$.

Denote by $\|\cdot\|_p$ the standard norm on $L^p(\Omega, \mathcal{A}, P)$.

Lemma 2.
Let $1 < p \le +\infty$, Assumption 1 be valid, and the sets $\underline{d} \mathcal{I}(x, y)$ and $\overline{d} \mathcal{I}(x, y)$ be defined as in Theorem 1. Then for any $(x, y) \in X$ and $(\Delta x, \Delta y) \in X$ one has
\[
  \lim_{\alpha \to +0} \frac{1}{\alpha} \Big| \mathcal{I}(x + \alpha \Delta x, y + \alpha \Delta y) - \mathcal{I}(x, y)
  - \max_{(A, x^*) \in \underline{d} \mathcal{I}(x, y)} \big( A + \langle x^*, \alpha (\Delta x, \Delta y) \rangle \big)
  - \min_{(B, y^*) \in \overline{d} \mathcal{I}(x, y)} \big( B + \langle y^*, \alpha (\Delta x, \Delta y) \rangle \big) \Big| = 0.
\]

Proof.
Fix any $(x, y) \in X$ and $(\Delta x, \Delta y) \in X$, and choose an arbitrary sequence $\{\alpha_n\} \subset (0, +\infty)$ converging to zero. For a.e. $\omega \in \Omega$ and $n \in \mathbb{N}$ denote
\[
\begin{aligned}
  f_n(\omega) = \frac{1}{\alpha_n} \Big( & f(x + \alpha_n \Delta x, y(\omega) + \alpha_n \Delta y(\omega), \omega) - f(x, y(\omega), \omega) \\
  & - \Phi_f\big( x, y(\omega), \omega; \alpha_n \Delta x, \alpha_n \Delta y(\omega) \big)
  - \Psi_f\big( x, y(\omega), \omega; \alpha_n \Delta x, \alpha_n \Delta y(\omega) \big) \Big),
\end{aligned} \tag{7}
\]
where the functions $\Phi_f$ and $\Psi_f$ are defined in (3). By the definition of codifferentiability the sequence $f_n$ converges to zero almost surely. Our aim is to prove that each term in the definition of $f_n$ belongs to $L^1(\Omega, \mathcal{A}, P)$ and there exists an a.e. nonnegative function $\eta \in L^1(\Omega, \mathcal{A}, P)$ such that $|f_n| \le \eta$ almost surely. Then by Lebesgue's dominated convergence theorem $\mathbb{E}[|f_n|] \to 0$ as $n \to \infty$. Hence, integrating each term in the definition of $f_n$ separately, one obtains that
\[
\begin{aligned}
  \lim_{n \to \infty} \frac{1}{\alpha_n} \Big| & \mathcal{I}(x + \alpha_n \Delta x, y + \alpha_n \Delta y) - \mathcal{I}(x, y)
  - \int_\Omega \Phi_f\big( x, y(\omega), \omega; \alpha_n \Delta x, \alpha_n \Delta y(\omega) \big) \, dP(\omega) \\
  & - \int_\Omega \Psi_f\big( x, y(\omega), \omega; \alpha_n \Delta x, \alpha_n \Delta y(\omega) \big) \, dP(\omega) \Big| = 0.
\end{aligned}
\]
Let us check that
\[
  \int_\Omega \Phi_f\big( x, y(\omega), \omega; \alpha_n \Delta x, \alpha_n \Delta y(\omega) \big) \, dP(\omega)
  = \max_{(A, x^*) \in \underline{d} \mathcal{I}(x, y)} \big( A + \langle x^*, \alpha_n (\Delta x, \Delta y) \rangle \big) \tag{8}
\]
(a similar equality for the min terms involving the hyperdifferentials can be verified in the same way). Then one obtains the desired result.

Indeed, by definition (see (3)) for any measurable selection $(a(\cdot), v_x(\cdot), v_y(\cdot))$ of the set-valued mapping $\underline{d}_{x,y} f(x, y(\cdot), \cdot)$ one has
\[
  \Phi_f\big( x, y(\omega), \omega; \alpha_n \Delta x, \alpha_n \Delta y(\omega) \big)
  \ge a(\omega) + \langle v_x(\omega), \alpha_n \Delta x \rangle + \langle v_y(\omega), \alpha_n \Delta y(\omega) \rangle,
\]
which implies that
\[
  \int_\Omega \Phi_f\big( x, y(\omega), \omega; \alpha_n \Delta x, \alpha_n \Delta y(\omega) \big) \, dP(\omega)
  \ge \max_{(A, x^*) \in \underline{d} \mathcal{I}(x, y)} \big( A + \langle x^*, \alpha_n (\Delta x, \Delta y) \rangle \big)
\]
(see (4)). On the other hand, for a.e.
$\omega \in \Omega$ one has
\[
  \Phi_f\big( x, y(\omega), \omega; \alpha_n \Delta x, \alpha_n \Delta y(\omega) \big)
  \in \Big\{ a + \langle v_x, \alpha_n \Delta x \rangle + \langle v_y, \alpha_n \Delta y(\omega) \rangle \Bigm| (a, v_x, v_y) \in \underline{d}_{x,y} f(x, y(\omega), \omega) \Big\}.
\]
Consequently, by the Filippov theorem (see, e.g. [1, Thrm. 8.2.10]) there exists a measurable selection $(a_0(\cdot), v_{x0}(\cdot), v_{y0}(\cdot))$ of the multifunction $\underline{d}_{x,y} f(x, y(\cdot), \cdot)$ such that
\[
  \Phi_f\big( x, y(\omega), \omega; \alpha_n \Delta x, \alpha_n \Delta y(\omega) \big)
  = a_0(\omega) + \langle v_{x0}(\omega), \alpha_n \Delta x \rangle + \langle v_{y0}(\omega), \alpha_n \Delta y(\omega) \rangle
\]
for a.e. $\omega \in \Omega$. Hence for the corresponding pair $(A_0, x_0^*) = T(a_0, v_{x0}, v_{y0})$ (see the proof of Lemma 1), which by definition belongs to $\underline{d} \mathcal{I}(x, y)$, one has
\[
  \int_\Omega \Phi_f\big( x, y(\omega), \omega; \alpha_n \Delta x, \alpha_n \Delta y(\omega) \big) \, dP(\omega)
  = A_0 + \langle x_0^*, \alpha_n (\Delta x, \Delta y) \rangle,
\]
and therefore equality (8) holds true.

Thus, it remains to show that Lebesgue's dominated convergence theorem is applicable to the sequence $\{f_n\}$. Indeed, the first two terms in the definition of $f_n$ (see (7)) belong to $L^1(\Omega, \mathcal{A}, P)$ by virtue of the first two parts of Assumption 1. Let us check that these terms are dominated by a Lebesgue integrable function independent of $n$.

By the mean value theorem for codifferentiable functions [20, Prp. 2] for any $n \in \mathbb{N}$ and for a.e. $\omega \in \Omega$ there exist $\alpha_n(\omega) \in (0, \alpha_n)$ and
\[
\begin{aligned}
  (0, v_{xn}(\omega), v_{yn}(\omega)) &\in \underline{d}_{x,y} f(x + \alpha_n(\omega) \Delta x, y(\omega) + \alpha_n(\omega) \Delta y(\omega), \omega), \\
  (0, w_{xn}(\omega), w_{yn}(\omega)) &\in \overline{d}_{x,y} f(x + \alpha_n(\omega) \Delta x, y(\omega) + \alpha_n(\omega) \Delta y(\omega), \omega)
\end{aligned}
\]
such that
\[
  \frac{1}{\alpha_n} \Big( f(x + \alpha_n \Delta x, y(\omega) + \alpha_n \Delta y(\omega), \omega) - f(x, y(\omega), \omega) \Big)
  = \langle v_{xn}(\omega) + w_{xn}(\omega), \Delta x \rangle + \langle v_{yn}(\omega) + w_{yn}(\omega), \Delta y(\omega) \rangle. \tag{9}
\]
Put $\alpha_* = \max_{n \in \mathbb{N}} \alpha_n$.
By the growth condition on the codifferential $D_{x,y} f$ (see Assumption 1) there exist $C_N > 0$ and a.e. nonnegative functions $\beta_N \in L^1(\Omega, \mathcal{A}, P)$ and $\gamma_N \in L^{p'}(\Omega, \mathcal{A}, P)$ (here $N = |x| + \alpha_* |\Delta x|$) such that
\[
\begin{aligned}
  \max\big\{ |v_{xn}(\omega)|, |w_{xn}(\omega)| \big\}
  &\le \beta_N(\omega) + C_N \big| y(\omega) + \alpha_n(\omega) \Delta y(\omega) \big|^p
  \le \beta_N(\omega) + C_N 2^p \Big( |y(\omega)|^p + \alpha_*^p |\Delta y(\omega)|^p \Big), \\
  \max\big\{ |v_{yn}(\omega)|, |w_{yn}(\omega)| \big\}
  &\le \gamma_N(\omega) + C_N \big| y(\omega) + \alpha_n(\omega) \Delta y(\omega) \big|^{p-1}
\end{aligned}
\]
for a.e. $\omega \in \Omega$ and all $n \in \mathbb{N}$ in the case $1 < p < +\infty$, and there exists $\beta_N \in L^1(\Omega, \mathcal{A}, P)$ (here $N = \max\{ |x| + \alpha_* |\Delta x|, \|y\|_\infty + \alpha_* \|\Delta y\|_\infty \}$) such that
\[
  \max\big\{ |v_{xn}(\omega)|, |w_{xn}(\omega)|, |v_{yn}(\omega)|, |w_{yn}(\omega)| \big\} \le \beta_N(\omega)
\]
for a.e. $\omega \in \Omega$ and all $n \in \mathbb{N}$ in the case $p = +\infty$. Hence with the use of (9) one obtains that in the case $p = +\infty$ the inequality
\[
  \frac{1}{\alpha_n} \Big| f(x + \alpha_n \Delta x, y(\omega) + \alpha_n \Delta y(\omega), \omega) - f(x, y(\omega), \omega) \Big|
  \le 2 \beta_N(\omega) |\Delta x| + 2 \beta_N(\omega) \|\Delta y\|_\infty
\]
holds true for a.e. $\omega \in \Omega$ and all $n \in \mathbb{N}$, which implies that the first two terms in the definition of $f_n$ (see (7)) are dominated by a Lebesgue integrable function independent of $n$. In the case $p < +\infty$ one has
\[
\begin{aligned}
  \frac{1}{\alpha_n} \Big| f(x + \alpha_n \Delta x, y(\omega) + \alpha_n \Delta y(\omega), \omega) - f(x, y(\omega), \omega) \Big|
  \le {} & 2 \Big( \beta_N(\omega) + C_N 2^p \big( |y(\omega)|^p + \alpha_*^p |\Delta y(\omega)|^p \big) \Big) |\Delta x| \\
  & + 2 \Big( \gamma_N(\omega) + C_N 2^{p-1} \big( |y(\omega)|^{p-1} + \alpha_*^{p-1} |\Delta y(\omega)|^{p-1} \big) \Big) |\Delta y(\omega)|.
\end{aligned}
\]
The right-hand side of this inequality does not depend on $n$ and is Lebesgue integrable, as one can easily verify with the use of Hölder's inequality and the equality $p'(p - 1) = p$. Thus, in the case $p < +\infty$ the first two terms in the definition of $f_n$ are dominated by a Lebesgue integrable function independent of $n$ as well.

Let us finally check that the third term in the definition of $f_n$, denoted by
\[
  \theta_n(\omega) := \frac{1}{\alpha_n} \max_{(a, v_x, v_y) \in \underline{d}_{x,y} f(x, y(\omega), \omega)} \big( a + \langle v_x, \alpha_n \Delta x \rangle + \langle v_y, \alpha_n \Delta y(\omega) \rangle \big)
\]
(see (7)), is measurable and dominated by a Lebesgue integrable function independent of $n$. The fact that the last term (the min term) in the definition of $f_n$ is measurable and dominated by a Lebesgue integrable function independent of $n$ is proved in exactly the same way.

As was shown in the proof of Lemma 1, the set-valued mapping $\underline{d}_{x,y} f(x, y(\cdot), \cdot)$ is measurable. Consequently, the function $\theta_n$ is measurable by [1, Thrm. 8.2.11]. For any $\omega \in \Omega$ introduce the function
\[
  g_\omega(t) = \max_{(a, v_x, v_y) \in \underline{d}_{x,y} f(x, y(\omega), \omega)} \big( a + \langle v_x, t \Delta x \rangle + \langle v_y, t \Delta y(\omega) \rangle \big).
\]
Observe that by the definition of codifferential $g_\omega(0) = 0$ (see Def. 1) and for any $t, \Delta t \in \mathbb{R}$ and $\alpha > 0$ one has
\[
  \frac{1}{\alpha} \Big| g_\omega(t + \alpha \Delta t) - g_\omega(t) - \max_{(a_g, v_g) \in \underline{d} g_\omega(t)} \big( a_g + v_g (\alpha \Delta t) \big) \Big| = 0,
\]
where
\[
\begin{aligned}
  \underline{d} g_\omega(t) = \Big\{ (a_g, v_g) \in \mathbb{R} \times \mathbb{R} \Bigm| {}
  & a_g = a + \langle v_x, t \Delta x \rangle + \langle v_y, t \Delta y(\omega) \rangle - g_\omega(t), \\
  & v_g = \langle v_x, \Delta x \rangle + \langle v_y, \Delta y(\omega) \rangle, \quad (a, v_x, v_y) \in \underline{d}_{x,y} f(x, y(\omega), \omega) \Big\}.
\end{aligned}
\]
The set $\underline{d} g_\omega(t)$ is obviously convex and compact. Moreover, note that the equality $\max\{ a_g \mid (a_g, v_g) \in \underline{d} g_\omega(t) \} = g_\omega(t) - g_\omega(t) = 0$ holds true. Thus, the function $g_\omega$ is codifferentiable at every point $t \in \mathbb{R}$, and the pair $[\underline{d} g_\omega(t), \{0\}]$ is a codifferential of $g_\omega$ at the point $t$.

Applying the mean value theorem for codifferentiable functions [20, Prp. 2] one obtains that for any $n \in \mathbb{N}$ and for a.e.
$\omega \in \Omega$ there exist $\alpha_n(\omega) \in (0, \alpha_n)$ and $(0, v_{gn}(\omega)) \in \underline{d} g_\omega(\alpha_n(\omega))$ such that
\[
  \theta_n(\omega) = \frac{1}{\alpha_n} \big( g_\omega(\alpha_n) - g_\omega(0) \big) = v_{gn}(\omega)
\]
or, equivalently, there exists $(a_n(\omega), v_{xn}(\omega), v_{yn}(\omega)) \in \underline{d}_{x,y} f(x, y(\omega), \omega)$ such that
\[
  \theta_n(\omega) = \langle v_{xn}(\omega), \Delta x \rangle + \langle v_{yn}(\omega), \Delta y(\omega) \rangle \quad \forall n \in \mathbb{N}.
\]
Hence by the growth condition on the codifferential $D_{x,y} f$ (see Assumption 1) there exist $C_N > 0$ and a.e. nonnegative functions $\beta_N \in L^1(\Omega, \mathcal{A}, P)$ and $\gamma_N \in L^{p'}(\Omega, \mathcal{A}, P)$ (here $N = |x|$) satisfying the inequality
\[
  |\theta_n(\omega)| \le \Big( \beta_N(\omega) + C_N |y(\omega)|^p \Big) |\Delta x| + \Big( \gamma_N(\omega) + C_N |y(\omega)|^{p-1} \Big) |\Delta y(\omega)|
\]
for a.e. $\omega \in \Omega$ in the case $p < +\infty$, and the inequality
\[
  |\theta_n(\omega)| \le \beta_N(\omega) |\Delta x| + \beta_N(\omega) \|\Delta y\|_\infty
\]
for a.e. $\omega \in \Omega$ in the case $p = +\infty$. The right-hand sides of these inequalities are Lebesgue integrable and do not depend on $n$. Thus, the sequence $\{\theta_n\}$ is dominated by a Lebesgue integrable function, which completes the proof.

With the use of Theorem 1 one can easily obtain sufficient conditions for the quasidifferentiability of the functional $\mathcal{I}$. Recall that $X = \mathbb{R}^d \times L^p(\Omega, \mathcal{A}, P; \mathbb{R}^m)$.

Corollary 1.
Let $1 < p \le +\infty$ and Assumption 1 be valid. Then the functional $\mathcal{I}$ is quasidifferentiable on $\mathbb{R}^d \times L^p(\Omega, \mathcal{A}, P; \mathbb{R}^m)$, and for any $(x, y)$ from this space the pair $\mathcal{D} \mathcal{I}(x, y) = [\underline{\partial} \mathcal{I}(x, y), \overline{\partial} \mathcal{I}(x, y)]$, defined as
\[
\begin{aligned}
  \underline{\partial} \mathcal{I}(x, y) = \Big\{ x^* \in X^* \Bigm| {}
  & \langle x^*, (h_x, h_y) \rangle = \big\langle \mathbb{E}[v_x], h_x \big\rangle + \int_\Omega \langle v_y(\omega), h_y(\omega) \rangle \, dP(\omega) \quad \forall (h_x, h_y) \in X, \\
  & (0, v_x(\cdot), v_y(\cdot)) \text{ is a measurable selection of } \underline{d}_{x,y} f(x, y(\cdot), \cdot) \Big\}
\end{aligned}
\]
and
\[
\begin{aligned}
  \overline{\partial} \mathcal{I}(x, y) = \Big\{ y^* \in X^* \Bigm| {}
  & \langle y^*, (h_x, h_y) \rangle = \big\langle \mathbb{E}[w_x], h_x \big\rangle + \int_\Omega \langle w_y(\omega), h_y(\omega) \rangle \, dP(\omega) \quad \forall (h_x, h_y) \in X, \\
  & (0, w_x(\cdot), w_y(\cdot)) \text{ is a measurable selection of } \overline{d}_{x,y} f(x, y(\cdot), \cdot) \Big\},
\end{aligned}
\]
is a quasidifferential of $\mathcal{I}$ at $(x, y)$. Moreover, the following equality holds true:
\[
  \mathcal{I}'(x, y; h_x, h_y) = \int_\Omega \big[ f(\cdot, \cdot, \omega) \big]'(x, y(\omega); h_x, h_y(\omega)) \, dP(\omega) \quad \forall (h_x, h_y) \in X. \tag{10}
\]

Proof.
Applying Theorem 1 and the fact that any codifferentiable function $g$ with codifferential $Dg(x) = [\underline{d}g(x), \overline{d}g(x)]$ is quasidifferentiable and the pair
\[
\underline{\partial} g(x) = \big\{ x^* \in X^* \bigm| (0, x^*) \in \underline{d} g(x) \big\}, \quad \overline{\partial} g(x) = \big\{ y^* \in X^* \bigm| (0, y^*) \in \overline{d} g(x) \big\}
\]
is a quasidifferential of $g$ at $x$ (see, e.g. [14,20]), one obtains the required results on the quasidifferentiability of the functional $\mathcal{I}$.

To prove equality (10), recall that the set-valued maps $\underline{d}_{x,y} f(x, y(\cdot), \cdot)$ and $\overline{d}_{x,y} f(x, y(\cdot), \cdot)$ are measurable, as was shown in the proof of Lemma 1. Hence with the use of [1, Thrm. 8.2.4] one obtains that the set-valued mappings $\underline{\partial}_{x,y} f(x, y(\cdot), \cdot)$ and $\overline{\partial}_{x,y} f(x, y(\cdot), \cdot)$, defined according to equalities (2), are measurable as well. Consequently, applying the definition of quasidifferentiability and arguing in the same way as in the proof of Lemma 2 (or utilising the interchangeability principle; see, e.g. [35, Thrm. 14.60]) one gets that
\[
\mathcal{I}'(x, y; h_x, h_y) = \int_\Omega \Big( \max_{(v_x, v_y) \in \underline{\partial}_{x,y} f(x, y(\omega), \omega)} \big( \langle v_x, h_x \rangle + \langle v_y, h_y(\omega) \rangle \big) + \min_{(w_x, w_y) \in \overline{\partial}_{x,y} f(x, y(\omega), \omega)} \big( \langle w_x, h_x \rangle + \langle w_y, h_y(\omega) \rangle \big) \Big) \, dP(\omega)
\]
for all $(h_x, h_y) \in X$, which by the definition of quasidifferential of the function $f$ implies that equality (10) holds true.

Remark. In the particular case when the function $f$ does not depend on $y$, i.e. $f = f(x, \omega)$, the previous corollary contains sufficient conditions for the quasidifferentiability of the function $F(x) = E[f(x, \cdot)]$. Quasidifferentiability of this function was studied in the recent paper [29] under different assumptions on the function $f$.
Namely, instead of imposing any growth conditions, in [29] it was assumed that all integrals are correctly defined and the function $f$ is locally Lipschitz continuous in $x$ uniformly in $\omega$.

Let us finally show that under the assumptions of Theorem 1 the functional $\mathcal{I}(x, y)$ is not only codifferentiable, but also Lipschitz continuous on bounded sets.

Corollary 2.
Let $1 < p \le +\infty$ and Assumption 1 be valid. Then $\mathcal{I}$ is Lipschitz continuous on any bounded subset of the space $X = \mathbb{R}^d \times L^p(\Omega, \mathfrak{A}, P; \mathbb{R}^m)$.

Proof. With the use of the growth condition on the codifferential mapping $D_{x,y} f(\cdot)$ from Assumption 1 one can readily verify that both multifunctions $\underline{d}\mathcal{I}(\cdot)$ and $\overline{d}\mathcal{I}(\cdot)$ are bounded on bounded subsets of the space $X$. Therefore by [20, Corollary 2] the functional $\mathcal{I}$ is Lipschitz continuous on any bounded subset of this space.

Let, as above, $(\Omega, \mathfrak{A}, P)$ be a probability space. In this section we study a general two-stage stochastic programming problem of the form
\[
\min_{x \in A} E\big[ F(x, \omega) \big], \tag{11}
\]
where $F(x, \omega)$ is the optimal value of the second stage problem
\[
\min_{y \in G(x, \omega)} f(x, y, \omega). \tag{12}
\]
Here $A \subset \mathbb{R}^d$ is a closed set, $f \colon \mathbb{R}^d \times \mathbb{R}^m \times \Omega \to \mathbb{R}$ is a Carathéodory function, and $G \colon \mathbb{R}^d \times \Omega \rightrightarrows \mathbb{R}^m$ is a multifunction. We assume that $G$ is measurable and for every $\omega \in \Omega$ the multifunction $G(\cdot, \omega)$ is closed.

Choose any $1 \le p \le +\infty$, and denote $X = \mathbb{R}^d \times L^p(\Omega, \mathfrak{A}, P; \mathbb{R}^m)$. By the interchangeability principle for two-stage stochastic programming (see, e.g. [40, Thrm. 2.20]), problem (11), (12) is equivalent to the following variational problem with pointwise constraints:
\[
\min_{(x, y) \in X} E\big[ f(x, y(\cdot), \cdot) \big] \quad \text{subject to} \quad x \in A, \quad y(\omega) \in G(x, \omega) \text{ for a.e. } \omega \in \Omega, \tag{$\mathcal{P}$}
\]
in the sense that the optimal values of these problems coincide, and if this common optimal value is finite, then for any globally optimal solution $(x^*, y^*(\cdot))$ of the problem $(\mathcal{P})$ the point $x^*$ is a globally optimal solution of problem (11) and for a.e. $\omega \in \Omega$ the point $y^*(\omega)$ is a globally optimal solution of the second stage problem (12). Conversely, if $x^*$ is a globally optimal solution of problem (11) and for a.e.
$\omega \in \Omega$ the point $y^*(\omega)$ is a globally optimal solution of problem (12) with $x = x^*$ such that $y^* \in L^p(\Omega, \mathfrak{A}, P; \mathbb{R}^m)$, then $(x^*, y^*)$ is a globally optimal solution of the problem $(\mathcal{P})$.

Since problem (11), (12) and the problem $(\mathcal{P})$ are equivalent, below we consider only the problem $(\mathcal{P})$. Our aim is to present several results on exact penalty functions for the problem $(\mathcal{P})$, which not only allow one to obtain optimality conditions for the original two-stage stochastic programming problem, but can also be used for the design and analysis of exact penalty methods for solving problem (11), (12).

Fix any $p \in [1, +\infty]$, and denote by
\[
\mathcal{I}(x, y) = \int_\Omega f(x, y(\omega), \omega) \, dP(\omega)
\]
the objective function of the problem $(\mathcal{P})$. Below we suppose that the functional $\mathcal{I}$ is correctly defined on the space $X := \mathbb{R}^d \times L^p(\Omega, \mathfrak{A}, P; \mathbb{R}^m)$ and does not take the value $-\infty$. In particular, it is sufficient to suppose that for any $x \in \mathbb{R}^d$ there exist $C > 0$ and $\beta \in L^1(\Omega, \mathfrak{A}, P)$ such that $|f(x, y, \omega)| \le \beta(\omega) + C|y|^p$ for a.e. $\omega \in \Omega$ and all $y \in \mathbb{R}^m$ in the case $p < +\infty$, and for any $x \in \mathbb{R}^d$ and $N > 0$ there exists $\beta_N \in L^1(\Omega, \mathfrak{A}, P)$ such that $|f(x, y, \omega)| \le \beta_N(\omega)$ for a.e. $\omega \in \Omega$ and all $y \in \mathbb{R}^m$ with $|y| \le N$ in the case $p = +\infty$.

Introduce the set
\[
M = \Big\{ (x, y) \in X \Bigm| y(\omega) \in G(x, \omega) \text{ for a.e. } \omega \in \Omega \Big\}.
\]
Then the problem $(\mathcal{P})$ can be rewritten as follows:
\[
\min_{(x, y) \in X} \mathcal{I}(x, y) \quad \text{subject to} \quad (x, y) \in M \cap \big( A \times L^p(\Omega, \mathfrak{A}, P; \mathbb{R}^m) \big).
\]
Let $\varphi \colon X \to [0, +\infty]$ be any function such that $\varphi(x, y) = 0$ iff $(x, y) \in M$, and let $\Phi_c(x, y) = \mathcal{I}(x, y) + c\varphi(x, y)$. The function $\Phi_c$ is called a penalty function for the problem $(\mathcal{P})$ with penalty parameter $c \ge 0$, and the function $\varphi$ is called a penalty term for the constraint $(x, y) \in M$.
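To make the definition above concrete, here is a minimal Python sketch of a penalty function on a finite probability space, where $y$ is simply a list with one entry per scenario. Everything specific in it is an assumption made for illustration and is not taken from the text: the integrand `f`, the interval-valued constraint map $G(x, \omega) = [x, x + 2]$, the choice of the penalty term as the expected distance to $G(x, \omega)$, and all function names.

```python
def expectation(values, probs):
    """Expectation of a random variable on a finite probability space."""
    return sum(p * v for p, v in zip(probs, values))


def f(x, y, omega):
    # Hypothetical integrand, chosen only for illustration.
    return x ** 2 + (y - omega) ** 2


def dist_to_G(x, y, omega):
    # Hypothetical constraint set G(x, omega) = [x, x + 2] (independent of omega).
    # The distance from y to an interval is the larger of the two bound violations.
    lower, upper = x, x + 2.0
    return max(0.0, lower - y, y - upper)


def objective(x, y, scenarios, probs):
    """I(x, y) = E[f(x, y(.), .)]."""
    return expectation([f(x, y_w, w) for y_w, w in zip(y, scenarios)], probs)


def penalty_term(x, y, scenarios, probs):
    """phi(x, y) = E[dist(y(.), G(x, .))]; it vanishes iff y(omega) lies in
    G(x, omega) for every scenario of positive probability."""
    return expectation([dist_to_G(x, y_w, w) for y_w, w in zip(y, scenarios)], probs)


def penalty_function(x, y, scenarios, probs, c):
    """Phi_c(x, y) = I(x, y) + c * phi(x, y)."""
    return objective(x, y, scenarios, probs) + c * penalty_term(x, y, scenarios, probs)


scenarios, probs = [1.0, 2.0], [0.5, 0.5]
feasible = [1.5, 2.0]    # both values lie in [1, 3] for x = 1
infeasible = [0.5, 2.0]  # the first value violates the lower bound by 0.5

print(penalty_function(1.0, feasible, scenarios, probs, c=10.0))    # equals I(x, y)
print(penalty_function(1.0, infeasible, scenarios, probs, c=10.0))  # I(x, y) plus c times the violation
```

On feasible points the penalty function coincides with the objective for every $c$, while on infeasible points it grows linearly in $c$; this is the mechanism that the exactness results below rely on.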
Our aim is to obtain sufficient conditions for the exactness of the penalty function $\Phi_c$.

Recall that the penalty function $\Phi_c$ is called globally exact, if there exists $c^* \ge 0$ such that for any $c \ge c^*$ the set of globally optimal solutions of the penalized problem
\[
\min_{(x, y) \in X} \Phi_c(x, y) \quad \text{subject to} \quad x \in A \tag{13}
\]
coincides with the set of globally optimal solutions of the problem $(\mathcal{P})$. The greatest lower bound of all such $c^*$ is called the least exact penalty parameter of the penalty function $\Phi_c$. One can verify that the penalty function $\Phi_c$ is globally exact iff there exists $c^* \ge 0$ such that for any $c \ge c^*$ the optimal values of the problem $(\mathcal{P})$ and problem (13) coincide, and the greatest lower bound of all such $c^*$ coincides with the least exact penalty parameter. See [11, 19, 22, 34, 38, 48] for more details on exact penalty functions.

Let us obtain sufficient conditions for the global exactness of the penalty function $\Phi_c$ with the penalty term $\varphi$ defined in several different ways. To this end, we will utilise general sufficient conditions for the exactness of penalty functions in metric and normed spaces from [19,22], and the following auxiliary lemma, which is a slight generalization of [19, Prp. 3.13].

Lemma 3.
Let $Y$ be a normed space, $\mathcal{F} \subset Y$ be a nonempty set, and a function $F \colon Y \to \mathbb{R} \cup \{+\infty\}$ be such that for any bounded set $C \subset Y$ there exists a continuous from the right function $\omega_C \colon [0, +\infty) \to [0, +\infty)$ for which
\[
\big| F(y_1) - F(y_2) \big| \le \omega_C\big( \|y_1 - y_2\| \big) \quad \forall y_1, y_2 \in C. \tag{14}
\]
Then for any $R > 0$ there exists a bounded set $C \subset Y$ such that
\[
F(y) \ge \inf_{z \in \mathcal{F}} F(z) - \omega_C\big( \operatorname{dist}(y, \mathcal{F}) \big) \quad \forall y \in B(0, R) = \{ z \in Y \mid \|z\| \le R \}. \tag{15}
\]

Proof. Denote $F^* = \inf_{z \in \mathcal{F}} F(z)$, and fix any $R > 0$ and $z \in \mathcal{F}$. By our assumption there exists a continuous from the right function $\omega_C$ such that inequality (14) holds true for $C = B(0, R + \|z\|)$.

Choose any $y \in B(0, R)$. If $y \in \mathcal{F}$, then inequality (15) trivially holds true. Suppose now that $y \in B(0, R) \setminus \mathcal{F}$. Clearly, there exists a sequence $\{y_n\} \subset \mathcal{F}$ such that $\|y - y_n\| \to \operatorname{dist}(y, \mathcal{F})$ as $n \to \infty$, and the inequalities $\|y - y_n\| \le \|y - z\| \le R + \|z\|$ and $\|y - y_n\| \ge \|y - y_{n+1}\|$ are satisfied for all $n \in \mathbb{N}$. By definition $\{y_n\} \subset C$, $y \in C$, and $F(y_n) \ge F^*$ for all $n \in \mathbb{N}$. Therefore, by applying inequality (14) one obtains that
\[
F^* - F(y) = F^* - F(y_n) + F(y_n) - F(y) \le F(y_n) - F(y) \le \omega_C\big( \|y - y_n\| \big)
\]
for any $n \in \mathbb{N}$. Hence passing to the limit as $n \to \infty$ with the use of the fact that the function $\omega_C$ is continuous from the right and the sequence $\{\|y - y_n\|\}$ is non-increasing one gets that inequality (15) holds true.

Remark. Note that if $F$ is Lipschitz continuous on bounded sets, then inequality (14) holds true with $\omega_C(t) = L_C t$, where $L_C$ is a Lipschitz constant of $F$ on $C$. In this case the statement of the lemma can be reformulated as follows: for any $R > 0$ there exists $L > 0$ such that $F(y) \ge F^* - L \operatorname{dist}(y, \mathcal{F})$ for all $y \in B(0, R)$. Thus, Lemma 3 provides a lower estimate of the decay of the function $F$ relative to a given set $\mathcal{F}$.

We start our analysis of the exactness of the penalty function $\Phi_c$ with the simplest case when the penalty term $\varphi$ is defined via the distance function to the multifunction $G$. Denote by $\mathcal{I}^*$ the optimal value of the problem $(\mathcal{P})$.

Theorem 2.
Let there exist a globally optimal solution of the problem $(\mathcal{P})$, the set-valued mapping $G$ have closed images, and
\[
\varphi(x, y) = \Big( E\big[ \operatorname{dist}(y(\cdot), G(x, \cdot))^p \big] \Big)^{1/p} \quad \forall (x, y) \in X
\]
in the case $p < +\infty$, and $\varphi(x, y) = \operatorname{ess\,sup}_{\omega \in \Omega} \operatorname{dist}(y(\omega), G(x, \omega))$ for all $(x, y) \in X$ in the case $p = +\infty$. Suppose also that the functional $\mathcal{I}$ is Lipschitz continuous on bounded sets, and there exists $c \ge 0$ such that the set $\{ (x, y) \in X \mid x \in A, \ \Phi_c(x, y) < \mathcal{I}^* \}$ is bounded. Then the penalty function $\Phi_c$ is globally exact.

Proof. Observe that the function $\varphi$ is correctly defined for all $(x, y) \in X$, since the multifunction $G$ is measurable. Moreover, $\varphi$ is nonnegative, and $\varphi(x, y) = 0$ iff $(x, y) \in M$. Denote by $\mathcal{F}$ the feasible set of the problem $(\mathcal{P})$. Let us show that
\[
\varphi(x, y) \ge \operatorname{dist}\big( (x, y), \mathcal{F} \big) \quad \forall x \in A, \ y \in L^p(\Omega, \mathfrak{A}, P; \mathbb{R}^m). \tag{16}
\]
Indeed, fix any $(x, y) \in X$ such that $x \in A$. If $\varphi(x, y) = +\infty$, then inequality (16) obviously holds true. Suppose now that $\varphi(x, y) < +\infty$. Then, in particular, $G(x, \omega) \ne \emptyset$ for a.e. $\omega \in \Omega$.

By our assumptions the multifunction $G$ is measurable and has closed images. Therefore by [1, Crlr. 8.2.13] there exists a measurable selection $z$ of the set-valued mapping $G(x, \cdot)$ such that
\[
|y(\omega) - z(\omega)| = \operatorname{dist}\big( y(\omega), G(x, \omega) \big) \quad \text{for a.e. } \omega \in \Omega.
\]
Let us check that $z \in L^p(\Omega, \mathfrak{A}, P; \mathbb{R}^m)$. Then $(x, z) \in \mathcal{F}$ and
\[
\varphi(x, y) = \|y - z\|_p = \big\| (x, y) - (x, z) \big\| \ge \operatorname{dist}\big( (x, y), \mathcal{F} \big),
\]
that is, inequality (16) holds true.

To verify that $z$ belongs to the space $L^p$, observe that
\[
|z(\omega)| \le |y(\omega)| + |z(\omega) - y(\omega)| = |y(\omega)| + \operatorname{dist}\big( y(\omega), G(x, \omega) \big)
\]
for a.e. $\omega \in \Omega$. The right-hand side of this inequality is $p$-integrable due to the fact that $y \in L^p(\Omega, \mathfrak{A}, P; \mathbb{R}^m)$ and $\varphi(x, y) < +\infty$. Therefore the function $z$ belongs to the space $L^p(\Omega, \mathfrak{A}, P; \mathbb{R}^m)$.

Thus, inequality (16) holds true.
Since the functional $\mathcal{I}$ is Lipschitz continuous on bounded sets, by Lemma 3 for any $R > 0$ there exists $L > 0$ such that
\[
\mathcal{I}(x, y) \ge \mathcal{I}^* - L \operatorname{dist}\big( (x, y), \mathcal{F} \big) \quad \forall (x, y) \in B(0, R).
\]
Hence by [19, Prp. 3.16 and Remark 15, part (ii)] the penalty function $\Phi_c$ is globally exact.

Remark. Note that by Corollary 2 the functional $\mathcal{I}$ is Lipschitz continuous on bounded sets in the case $p > 1$, provided the integrand $f$ satisfies Assumption 1. In turn, as one can readily verify, the set $\{ (x, y) \in X \mid x \in A, \ \Phi_c(x, y) < \mathcal{I}^* \}$ is bounded for some $c \ge 0$, if $1 \le p < +\infty$ and one of the following conditions is satisfied:

1. the set $A$ is bounded, and the multifunction $G$ is bounded on $A \times \Omega$;
2. the set $A$ is bounded, and there exist $C > 0$ and $\beta \in L^1(\Omega, \mathfrak{A}, P)$ such that $f(x, y, \omega) \ge C|y|^p + \beta(\omega)$ for all $(x, y) \in A \times \mathbb{R}^m$ and a.e. $\omega \in \Omega$;
3. the multifunction $G$ is bounded on $A \times \Omega$, and there exist $\beta \in L^1(\Omega, \mathfrak{A}, P)$ and a function $\rho \colon [0, +\infty) \to [0, +\infty)$ such that $\rho(t) \to +\infty$ as $t \to +\infty$, and $f(x, y, \omega) \ge \rho(|x|) + \beta(\omega)$ for all $(x, y) \in \mathbb{R}^{d+m}$ and a.e. $\omega \in \Omega$;
4. there exist $C > 0$, $\beta \in L^1(\Omega, \mathfrak{A}, P)$, and a function $\rho \colon [0, +\infty) \to [0, +\infty)$ such that $\rho(t) \to +\infty$ as $t \to +\infty$, and $f(x, y, \omega) \ge \rho(|x|) + C|y|^p + \beta(\omega)$ for all $(x, y) \in \mathbb{R}^{d+m}$ and a.e. $\omega \in \Omega$;
5. $(\Omega, \mathfrak{A}, P)$ is a finite probability space, and $\min_{\omega \in \Omega} f(x, y, \omega) \to +\infty$ as $|x| + |y| \to +\infty$.

In the case $p = +\infty$ the set $\{ (x, y) \in X \mid x \in A, \ \Phi_c(x, y) < \mathcal{I}^* \}$ is bounded, provided the first, the third or the last of the assumptions above is satisfied.

In most particular cases the feasible set $G(x, \omega)$ of the second stage problem (12) is not defined explicitly, but rather via some constraints. As a result, one usually does not know an explicit expression for the penalty term $\varphi$ from Theorem 2, which makes this theorem inapplicable to real-world problems, at least in a direct way. In some cases Theorem 2 can still be applied indirectly to reduce an analysis of the exactness of a penalty function for the problem $(\mathcal{P})$ to an analysis of constraints of the second stage problem. Let us explain this statement with the use of a simple example.

Example 1.
Suppose that the set-valued map $G$ is defined in the following way:
\[
G(x, \omega) = \Big\{ y \in \mathbb{R}^m \Bigm| 0 \in Q(x, y, \omega) \Big\},
\]
where $Q \colon \mathbb{R}^d \times \mathbb{R}^m \times \Omega \rightrightarrows \mathbb{R}^s$ is a multifunction with closed images. In other words, the second stage problem (12) has the form:
\[
\min_y f(x, y, \omega) \quad \text{subject to} \quad 0 \in Q(x, y, \omega).
\]
In this case it is natural to define
\[
\varphi(x, y) = \Big( E\big[ \operatorname{dist}(0, Q(x, y(\cdot), \cdot))^p \big] \Big)^{1/p}, \quad 1 \le p < +\infty.
\]
Then $\varphi(x, y) = 0$ iff $(x, y) \in M$. Suppose that there exists $K > 0$ such that
\[
K \operatorname{dist}(0, Q(x, y, \omega)) \ge \operatorname{dist}(y, G(x, \omega)) \quad \forall x \in A, \ y \in \mathbb{R}^m, \ \omega \in \Omega,
\]
that is, the function $g(y) = \operatorname{dist}(0, Q(x, y, \omega))$ admits a global error bound uniform for all $x \in A$ and $\omega \in \Omega$. Then
\[
\Phi_{Kc}(x, y) = \mathcal{I}(x, y) + Kc\varphi(x, y) \ge \mathcal{I}(x, y) + c\psi(x, y)
\]
for all $x \in A$ and $y \in L^p(\Omega, \mathfrak{A}, P; \mathbb{R}^m)$, where
\[
\psi(x, y) = \Big( E\big[ \operatorname{dist}(y(\cdot), G(x, \cdot))^p \big] \Big)^{1/p}.
\]
Therefore, as one can readily verify (cf. [19, Prp. 2.2]), under the assumptions of Theorem 2 the penalty function $\Phi_c$ is globally exact and its least exact penalty parameter is at most $K$ times greater than the least exact penalty parameter of the penalty function from Theorem 2.

Let us also point out two simple cases when Theorem 2 can be applied directly, that is, the cases when one can write a simple explicit expression for the penalty term $\varphi$ from this theorem. Note that Theorem 2 can be applied directly whenever the distance from a given point $y$ to the set $G(x, \omega)$ is easy to compute, e.g. when the set $G(x, \omega)$ is defined by linear or, more generally, convex quadratic constraints.

Example 2. Let $I := \{1, \ldots, m\}$. Suppose that the set $G(x, \omega)$ is defined by bound (box) constraints, that is,
\[
G(x, \omega) = \Big\{ y = (y_1, \ldots, y_m)^T \in \mathbb{R}^m \Bigm| a_i(x, \omega) \le y_i \le b_i(x, \omega), \ i \in I \Big\}
\]
for some given functions $a_i$ and $b_i$. Let the space $\mathbb{R}^m$ be equipped with the $\ell_\infty$ norm.
Then the penalty term $\varphi$ from Theorem 2 has the form
\[
\varphi(x, y) = \Big( \int_\Omega \max_{i \in I} \big\{ 0, \ y_i(\omega) - b_i(x, \omega), \ a_i(x, \omega) - y_i(\omega) \big\}^p \, dP(\omega) \Big)^{1/p}
\]
in the case $1 \le p < +\infty$.

Example 3. Let $G(x, \omega) = B(z(x, \omega), R(x, \omega))$ be the closed ball with centre $z(x, \omega)$ and radius $R(x, \omega)$. Then the penalty term $\varphi$ from Theorem 2 has the form
\[
\varphi(x, y) = \Big( \int_\Omega \max\big\{ 0, \ |y(\omega) - z(x, \omega)| - R(x, \omega) \big\}^p \, dP(\omega) \Big)^{1/p}
\]
in the case $1 \le p < +\infty$.

Observe that the penalty terms from Theorem 2 and the examples above depend on the parameter $p$ that defines the space in which one solves the problem $(\mathcal{P})$. This parameter must be chosen to satisfy the assumptions of Theorem 2.

Under some additional assumptions on constraints of the second stage problem one can prove the global exactness of the penalty function $\Phi_c$ with a penalty term $\varphi$ that does not depend on $p$. For the sake of simplicity, we will prove this result only in the case when the feasible set $G(x, \omega)$ of the second stage problem is defined by inequality constraints, i.e. it has the form
\[
G(x, \omega) = \Big\{ y \in \mathbb{R}^m \Bigm| g_i(x, y, \omega) \le 0, \ i \in I = \{1, \ldots, \ell\} \Big\}
\]
for some functions $g_i \colon \mathbb{R}^d \times \mathbb{R}^m \times \Omega \to \mathbb{R}$. Below we suppose that for each $x \in \mathbb{R}^d$ the map $(y, \omega) \mapsto g_i(x, y, \omega)$, $i \in I$, is a Carathéodory function, so that the penalty term
\[
\varphi(x, y) = \int_\Omega \max_{i \in I} \big\{ 0, \ g_i(x, y(\omega), \omega) \big\} \, dP(\omega) \tag{17}
\]
is correctly defined. Note that $\varphi(x, y) = 0$ iff $(x, y) \in M$. We will assume that for any $x \in \mathbb{R}^d$ and a.e. $\omega \in \Omega$ the function $y \mapsto g_i(x, y, \omega)$, $i \in I$, is quasidifferentiable and denote by $\mathcal{D}_y g_i(x, y, \omega) = [\underline{\partial}_y g_i(x, y, \omega), \overline{\partial}_y g_i(x, y, \omega)]$ its quasidifferential. Denote also $I(x, y, \omega) = \{ i \in I \mid g_i(x, y, \omega) = \max_{k \in I} g_k(x, y, \omega) \}$.

Let $(Y, d)$ be a metric space, $K \subset Y$ be a given set, and $g \colon Y \to \mathbb{R} \cup \{+\infty\}$ be a given function. Recall that for any $y \in K \cap \operatorname{dom} g$ the quantity
\[
g^{\downarrow}_K(y) = \liminf_{z \to y, \ z \in K} \frac{g(z) - g(y)}{d(z, y)}
\]
is called the rate of steepest descent of $g$ at $y$. If $y$ is not a limit point of the set $K$, then by definition $g^{\downarrow}_K(y) = +\infty$.
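As a rough numerical illustration of this definition on the real line (a sketch under simplifying assumptions, not a method from the paper), the lim inf can be approximated, for one fixed small radius `r`, by the smallest difference quotient over the admissible neighbours `y - r` and `y + r`. The function name, the membership test `in_K`, and the default radius are hypothetical choices made for this example; a single radius can misjudge functions that oscillate near `y`.

```python
def steepest_descent_rate_1d(g, y, in_K=lambda z: True, r=1e-6):
    """Crude estimate of liminf_{z -> y, z in K} (g(z) - g(y)) / |z - y| on the real line.

    Only the two neighbours y - r and y + r are probed. If neither belongs to K,
    the point is treated as isolated in K at this scale and +inf is returned,
    mirroring the convention for points that are not limit points of K.
    """
    neighbours = [z for z in (y - r, y + r) if in_K(z)]
    if not neighbours:
        return float("inf")
    return min((g(z) - g(y)) / abs(z - y) for z in neighbours)


# g(y) = |y| on K = R: the estimate is positive at the minimizer y = 0
# and negative at y = 1, where moving to the left decreases g.
print(steepest_descent_rate_1d(abs, 0.0))
print(steepest_descent_rate_1d(abs, 1.0))

# The same function restricted to K = [1, +inf): at y = 1 only the right
# neighbour is admissible, so the estimate becomes positive.
print(steepest_descent_rate_1d(abs, 1.0, in_K=lambda z: z >= 1.0))
```

The restricted call shows how the set $K$ enters the definition: the same point can have a negative rate on the whole line and a nonnegative one on a smaller set.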
Recall also that a point $y \in K \cap \operatorname{dom} g$ is called an inf-stationary point of $g$ on the set $K$, if $g^{\downarrow}_K(y) \ge 0$. It should be noted that in various particular cases this inequality is reduced to standard stationarity conditions. For example, if $Y$ is a normed space, $g$ is Fréchet differentiable at a point $y \in K$, and the set $K$ is convex, then $g^{\downarrow}_K(y) \ge 0$ iff $g'(y)[z - y] \ge 0$ for all $z \in K$, where $g'(y)$ is the Fréchet derivative of $g$ at $y$. See [10,11,43,44] for more details on the rate of steepest descent and the definition of inf-stationarity.

Theorem 3.
Let $1 \le p < +\infty$ and the following assumptions be valid:

1. there exists a globally optimal solution of the problem $(\mathcal{P})$;
2. the functional $\mathcal{I}$ is Lipschitz continuous on bounded sets;
3. the set $S_c(\gamma) = \{ (x, y) \in X \mid x \in A, \ \Phi_c(x, y) < \gamma \}$ is bounded for some $c \ge 0$ and $\gamma > \mathcal{I}^*$, where $\Phi_c$ is the penalty function with the penalty term (17);
4. for any $x \in A$ there exists an a.e. nonnegative function $L(\cdot) \in L^1(\Omega, \mathfrak{A}, P)$ such that $|g_i(x, y_1, \omega) - g_i(x, y_2, \omega)| \le L(\omega) \|y_1 - y_2\|$ for all $y_1, y_2 \in \mathbb{R}^m$, all $i \in I$ and a.e. $\omega \in \Omega$;
5. for all $i \in I$, $x \in A$, and $y \in L^p(\Omega, \mathfrak{A}, P; \mathbb{R}^m)$ the set-valued mappings $\underline{\partial}_y g_i(x, y(\cdot), \cdot)$ and $\overline{\partial}_y g_i(x, y(\cdot), \cdot)$ are measurable;
6. there exists $a > 0$ such that for any $(x, y) \in A \times \mathbb{R}^m$ and a.e. $\omega \in \Omega$ such that $y \notin G(x, \omega)$, and for all $i \in I(x, y, \omega)$ one can find $w_i(x, y, \omega) \in \overline{\partial}_y g_i(x, y, \omega)$ satisfying the following condition:
\[
\operatorname{dist}\Big( 0, \operatorname{co}\Big\{ \underline{\partial}_y g_i(x, y, \omega) + w_i(x, y, \omega) \Bigm| i \in I(x, y, \omega) \Big\} \Big) \ge a. \tag{18}
\]

Then the penalty function $\Phi_c$ with the penalty term (17) is globally exact and there exists $c^* \ge 0$ such that for any $c \ge c^*$ the following statements hold true:

1. $(x^*, y^*) \in S_c(\gamma)$ is a locally optimal solution of the penalized problem (13) iff $(x^*, y^*)$ is a locally optimal solution of the problem $(\mathcal{P})$;
2. $(x^*, y^*) \in S_c(\gamma)$ is an inf-stationary point of the penalty function $\Phi_c$ on the set $A \times L^p(\Omega, \mathfrak{A}, P; \mathbb{R}^m)$ iff $(x^*, y^*)$ is an inf-stationary point of the functional $\mathcal{I}$ on the feasible set $\mathcal{F}$ of the problem $(\mathcal{P})$.

Proof. Let us show that under the assumptions of the theorem $\varphi^{\downarrow}(x, \cdot)(y) \le -a$ for any $(x, y) \in X \setminus \mathcal{F}$ such that $x \in A$ and $\varphi(x, y) < +\infty$ (here $\varphi^{\downarrow}(x, \cdot)(y)$ is the rate of steepest descent of the function $y \mapsto \varphi(x, y)$ at the point $y$). Then applying [22, Thrm.
2] one obtains the required result.

To prove the required estimate for $\varphi^{\downarrow}(x, \cdot)(y)$, we first construct a descent direction for the function $\varphi$ using condition (18), and then obtain an upper estimate for the rate of steepest descent via the directional derivative of $\varphi$ along the constructed descent direction.

Fix any $(x, y) \in X \setminus \mathcal{F}$ such that $x \in A$ and $\varphi(x, y) < +\infty$. Recall that by the definition of quasidifferential one has
\[
Q_i(h, \omega) = \big( g_i(x, \cdot, \omega) \big)'(y(\omega), h) = \max_{v \in \underline{\partial}_y g_i(x, y(\omega), \omega)} \langle v, h \rangle + \min_{w \in \overline{\partial}_y g_i(x, y(\omega), \omega)} \langle w, h \rangle \tag{19}
\]
(see Def. 3). Applying Assumption 5 and [1, Thrm. 8.2.11] one obtains that the function $Q_i$ is measurable in $\omega$ for any $h \in \mathbb{R}^m$. Moreover, since in the finite dimensional case the quasidifferential is a pair of compact convex sets, the function $Q_i$ is continuous in $h$ for a.e. $\omega \in \Omega$, i.e. $Q_i$ is a Carathéodory function.

Let us now prove that the multifunction $I(\cdot) := I(x, y(\cdot), \cdot)$, $I \colon \Omega \rightrightarrows \{1, \ldots, \ell\}$, is measurable. Indeed, by definition for any nonempty subset $K \subseteq \{1, \ldots, \ell\}$ one has
\[
I^{-1}(K) = \Big\{ \omega \in \Omega \Bigm| I(x, y(\omega), \omega) \cap K \ne \emptyset \Big\} = \Big\{ \omega \in \Omega \Bigm| \max_{k \in K} g_k(x, y(\omega), \omega) \ge \max_{i \in I} g_i(x, y(\omega), \omega) \Big\}.
\]
This set is measurable, since the functions $g_i(x, y(\cdot), \cdot)$ are measurable due to the fact that the maps $(y, \omega) \mapsto g_i(x, y, \omega)$ are Carathéodory functions by our assumption. Thus, for any subset $K \subseteq \{1, \ldots, \ell\}$ the set $I^{-1}(K)$ is measurable, that is, the set-valued map $I(\cdot)$ is measurable by definition (see, e.g. [1, Def. 8.1.1]).

Introduce the set
\[
E = \Big\{ \omega \in \Omega \Bigm| \max_{i \in I} g_i(x, y(\omega), \omega) > 0 \Big\}.
\]
Note that the set $E$ is measurable, thanks to our assumption that the maps $(y, \omega) \mapsto g_i(x, y, \omega)$ are Carathéodory functions.
Moreover, P ( E ) > x, y ) is not a feasible point of the problem ( P ).Since the multifunction I ( · ) is measurable and Q i are Carath´eodory func-tions, the set-valued mapping H ( ω ) := n h ∈ R m (cid:12)(cid:12)(cid:12) | h | = 1 , max i ∈ I ( ω ) Q i ( h, ω ) = min | z | =1 max i ∈ I ( ω ) Q i ( z, ω ) o , ω ∈ E is measurable by [1, Thrm. 8.2.11]. Furthermore, this multifunction obviouslyhas closed images. Therefore by [1, Thrm. 8.1.3] there exists a measurablefunction h ∗ : E → R m such that h ∗ ( ω ) ∈ H ( ω ) for all ω ∈ E . For any ω ∈ Ω \ E define h ∗ ( ω ) = 0. Then h ∗ : Ω → R m is a measurable function and, moreover, k h ∗ k p = P ( E ) > ω ∈ E there exists b h ( ω ) ∈ R m with | b h ( ω ) | = 1 such that h v, b h ( ω ) i ≤ − a ∀ v ∈ co n ∂ y g i ( x, y ( ω ) , ω ) + w i ( x, y ( ω ) , ω ) (cid:12)(cid:12)(cid:12) i ∈ I ( ω ) o . Hence with the use of (19) one obtains that Q i ( b h ( ω ) , ω ) ≤ − a for all ω ∈ E and i ∈ I ( ω ), which by the definition of h ∗ implies thatmax i ∈ I ( ω ) Q i ( h ∗ ( ω ) , ω ) ( ≤ − a, if ω ∈ E, = 0 , if ω / ∈ E. (20)Thus, the function h ∗ is the desired descent direction, along which we willevaluate the directional derivative of the penalty term ϕ .Indeed, denote ψ ( ω, α ) = max i ∈ I { , g i ( x, y ( ω ) + αh ∗ ( ω ) , ω ) } for all α ≥ ω ∈ Ω. Applying relations (20) and standard calculus rules for directional20erivatives (see, e.g. [14]) one gets thatlim α → +0 ψ ( ω, α ) − ψ ( ω, α = ( max i ∈ I ( ω ) Q i ( h ∗ ( ω ) , ω ) ≤ − a, if ω ∈ E, , if ω / ∈ E. Applying Assumption 4 and the well-known fact that the maximum of a fi-nite family of Lipschitz continuous is Lipschitz continuous (see, e.g. [13, Ap-pendix III]) one obtains that there exists an a.e. nonnegative function L ( · ) ∈ L (Ω , A , P ) such that (cid:12)(cid:12)(cid:12)(cid:12) ψ ( ω, α ) − ψ ( ω, α (cid:12)(cid:12)(cid:12)(cid:12) ≤ L ( ω ) | h ∗ ( ω ) | ≤ L ( ω ) ∀ α > , a.e. ω ∈ Ω . Note also that ψ ( · , ∈ L (Ω , A , P ), since ϕ ( x, y ) < + ∞ . 
Hence by the inequal-ity above ψ ( · , α ) ∈ L (Ω , A , P ) for all α >
0. Consequently, applying Lebesgue’sdominated convergence theorem and the fact that ϕ ( x, y + αh ∗ ) = E [ ψ ( · , α )] oneobtains that (cid:2) ϕ ( x, · ) (cid:3) ′ ( y ; h ∗ ) = lim α → +0 ϕ ( x, y + αh ∗ ) − ϕ ( x, y ) α = Z E max i ∈ I ( ω ) Q i ( h ∗ ( ω ) , ω ) dP ( ω ) ≤ − aP ( E ) . Therefore ϕ ↓ ( x, · )( y ) = lim inf z → y ϕ ( x, z ) − ϕ ( x, y ) k z − y k p ≤ lim inf α → +0 ϕ ( x, y + αh ∗ ) − ϕ ( x, y ) α k h ∗ k p = (cid:2) ϕ ( x, · ) (cid:3) ′ ( y ; h ∗ ) k h ∗ k p ≤ − aP ( E ) P ( E ) = − a, and the proof is complete. Remark . (i) Note that by [35, Crlr. 14.14] the multifunctions ∂ y g i ( x, y ( · ) , · )and ∂ y g i ( x, y ( · ) , · ) are measurable for any measurable function y ( · ), provided forany ω ∈ Ω the mapping ∂ y g i ( x, · , ω ) is outer semicontinuous and the graphicalmapping ω Graph ∂ y g i ( x, · , ω ) is measurable.(ii) In the case when the functions g i are continuously differentiable in y , as-sumption (18) is satisfied iff there exists a > x, y ) ∈ R d + m and a.e. ω ∈ Ω such that y / ∈ G ( x, ω ) one hasdist (cid:16) , co n ∇ y g i ( x, y, ω ) (cid:12)(cid:12)(cid:12) i ∈ I ( x, y, ω ) o(cid:17) ≥ a. This condition can be viewed as a uniform Mangasarian-Fromovitz constraintqualification. In turn, in the case when the functions g i are convex in y , as-sumption (18) is satisfied iff there exists a > x, y ) ∈ R d + m and a.e. ω ∈ Ω such that y / ∈ G ( x, ω ) one hasdist (cid:16) , co n ∂ y g i ( x, y, ω ) (cid:12)(cid:12)(cid:12) i ∈ I ( x, y, ω ) o(cid:17) ≥ a. where ∂ y g i ( x, y, ω ) is the subdifferential of the function g i ( x, · , ω ) in the senseof convex analysis. 21 emark . Suppose that for a.e. 
ω ∈ Ω the functions ( x, y ) f ( x, y, ω ) and( x, y ) g i ( x, y, ω ), i ∈ I , are DC (Difference-of-Convex), that is, there existsconvex in ( x, y ) functions f ( x, y, ω ) , f ( x, y, ω ) , g i ( x, y, ω ), and g i ( x, y, ω ) suchthat f ( x, y, ω ) = f ( x, y, ω ) − f ( x, y, ω ) , g i ( x, y, ω ) = g i ( x, y, ω ) − g i ( x, y, ω )for all ( x, y ) ∈ R d + m , i ∈ I , and a.e. ω ∈ Ω. Then the penalty function fromTheorem 3 is DC as well. Namely, one has Φ c ( x, y ) = Φ c ( x, y ) − Φ c ( x, y ), whereΦ c ( x, y ) = Z Ω (cid:16) f ( x, y ( ω ) , ω )+ c max i ∈ I n , g i ( x, y ( ω ) , ω ) + X k = i g k ( x, y ( ω ) , ω ) o(cid:17) dP ( ω ) , and Φ c ( x, y ) = Z Ω (cid:16) f ( x, y ( ω ) , ω ) + c X i ∈ I g i ( x, y ( ω ) , ω ) (cid:17) dP ( ω )are convex functionals. Therefore with the use of Theorem 3 and well-knownglobal optimality conditions for DC optimization problems one can easily ob-tain global optimality conditions for the problem ( P ) and the original two-stagestochastic programming problem (cf. [41]). Moreover, under the assumptions ofTheorem 3 one can apply well-developed methods of DC optimization to findlocal or global minima of the DC penalty function Φ c ( x, y ), which coincide withlocal/global minima of the problem ( P ). Thus, Theorem 3 opens a way for ap-plications of DC programming algorithms to two-stage stochastic programmingproblems (cf. [33, 42]). Let us finally derive optimality conditions for the problem ( P ) in terms of cod-ifferentials. 
We will derive these conditions by applying standard optimality conditions for quasidifferentiable functions to an exact penalty function for the problem $(\mathcal{P})$.

For the sake of brevity, we will consider only the case when the set $A$ is convex and obtain optimality conditions under the assumptions of Theorem 3. It should be noted that one can obtain such conditions under less restrictive assumptions on the functional $\mathcal{I}$ and the penalty function $\Phi_c$, if one considers the so-called local exactness of the penalty function instead of the global one (see [11, 19]). Moreover, one can significantly relax the assumptions on the constraints of the second-stage problem by considering the case $p = +\infty$ and utilising the highly nonsmooth penalty term
\[
\varphi(x, y) = \operatorname*{ess\,sup}_{\omega \in \Omega} \Big\{ \max_{i \in I} \big\{ 0, \ g_i(x, y(\omega), \omega) \big\} \Big\}.
\]
However, the price one has to pay for less restrictive assumptions on constraints is the reduced regularity of Lagrange multipliers (see the theorem below). Namely, in this case one must assume that the Lagrange multipliers are just finitely additive measures.

For any convex subset $K$ of a Banach space $Y$ and any $y \in K$ denote by $N_K(y) = \{ y^* \in Y^* \mid \langle y^*, z - y \rangle \le 0 \ \forall z \in K \}$ the normal cone to the set $K$ at the point $y$.

Theorem 4.
Let < p < + ∞ , the set A be convex, the feasible set of thesecond-stage problem (12) have the form G ( x, ω ) = n y ∈ R m (cid:12)(cid:12)(cid:12) g i ( x, y, ω ) ≤ , i ∈ I = { , . . . , ℓ } o for some functions g i : R d × R m × Ω → R , the function f satisfy Assumption 1,and the functions g i , i ∈ I , satisfy the same assumption. Suppose also thatassumptions 1, 3–6 of Theorem 3 are valid, and ( x ∗ , y ∗ ) is a locally optimalsolution of the problem ( P ) such that ( x ∗ , y ∗ ) ∈ S c ( γ ) for some c ≥ c ∗ , where c ∗ is from Theorem 3.Then for any measurable selection (0 , w x ( · ) , w y ( · )) of the set-valued map-ping d x,y f ( x ∗ , y ∗ ( · ) , · ) and any measurable selections (0 , w xi ( · ) , w yi ( · )) of themultifunctions d x,y g i ( x ∗ , y ∗ ( · ) , · ) , i ∈ I , there exist ζ ∈ L (Ω , A , P ; R d ) andnonnegative multipliers λ i ∈ L ∞ (Ω , A , P ) , i ∈ I , such that E [ ζ ] ∈ − N A ( x ∗ ) , P i ∈ I k λ i k ∞ ≤ c ∗ , λ i ( ω ) g i ( x ∗ , y ∗ ( ω ) , ω ) = 0 for a.e. ω ∈ Ω and all i ∈ I , and (0 , ζ ( ω ) , ∈ d x,y f ( x ∗ , y ∗ ( · ) , · ) + (0 , w x ( · ) , w y ( · ))+ s X i =1 λ i ( ω ) (cid:16) d x,y g i ( x ∗ , y ∗ ( · ) , · ) + (0 , w xi ( · ) , w yi ( · )) (cid:17) for a.e. ω ∈ Ω .Proof. Under the assumptions of the theorem the functional I is Lipschitz con-tinuous on bounded sets by Corollary 2. Let ϕ ( x, y ) = Z Ω max i ∈ I (cid:8) , g i ( x, y, ω ) (cid:9) dP ( ω ) ∀ ( x, y ) ∈ X. Then by Theorem 3 the pair ( x ∗ , y ∗ ) is a point of local minimum of the penaltyfunction Φ c on the set A × L p (Ω , A , P ) for any c ≥ c ∗ , where c ∗ is from Theo-rem 3. Thus, in particular, ( x ∗ , y ∗ ) is a point of local minimum of the problemmin ( x,y ) J ( x, y ) = Z Ω f ( x, y ( ω ) , ω ) dP ( ω ) s.t. ( x, y ) ∈ A × L p (Ω , A , P ; R m ) , where f = f + c ∗ max i ∈ I { , g i } . The function f is codifferentiable in ( x, y ),and applying codifferential calculus (see, e.g. 
[14]) one can compute its codifferential and verify that $f_0$ satisfies Assumption 1. Therefore, by Corollary 1 the functional $J$ is directionally differentiable. Applying the well-known necessary conditions for a minimum of a directionally differentiable function on a convex set (see, e.g., [14, Lemma V.1.2]) and Corollary 1, one obtains that
\[
  J'(x^*, y^*; h_x, h_y) = \int_\Omega \big[ f_0(\cdot, \cdot, \omega) \big]'(x^*, y^*(\omega); h_x, h_y(\omega)) \, dP(\omega) \ge 0
\]
for all $(h_x, h_y) \in (A - x^*) \times L^p(\Omega, \mathfrak{A}, P; \mathbb{R}^m)$. Hence, with the use of the standard calculus rules for directional derivatives (see [14, Sect. I.3]), one gets that for all such $(h_x, h_y)$ the following inequality holds true:
\[
  J'(x^*, y^*; h_x, h_y) = \int_\Omega \Big( \big[ f(\cdot, \cdot, \omega) \big]'(x^*, y^*(\omega); h_x, h_y(\omega))
  + c^* \max_{i \in \widehat{I}(\omega)} \big[ g_i(\cdot, \cdot, \omega) \big]'(x^*, y^*(\omega); h_x, h_y(\omega)) \Big) \, dP(\omega) \ge 0,
\]
where $g_0(x, y, \omega) \equiv 0$ and
\[
  \widehat{I}(\omega) = \Big\{ i \in I \cup \{0\} \Bigm| g_i(x^*, y^*(\omega), \omega) = \max_{j \in I} \big\{ 0, g_j(x^*, y^*(\omega), \omega) \big\} \Big\}.
\]
Fix any measurable selection $(0, w_x(\cdot), w_y(\cdot))$ of the set-valued mapping $\overline{d}_{x,y} f(x^*, y^*(\cdot), \cdot)$ and any measurable selections $(0, w_{xi}(\cdot), w_{yi}(\cdot))$ of the set-valued mappings $\overline{d}_{x,y} g_i(x^*, y^*(\cdot), \cdot)$, $i \in I$, and denote $(w_{x0}(\cdot), w_{y0}(\cdot)) \equiv 0$. Then by the definition of quasidifferential (Def. 3) and equality (2) one has
\[
  \int_\Omega \Big( \max_{(v_x, v_y) \in \underline{\partial} f(x^*, y^*(\omega), \omega)} \big( \langle v_x + w_x(\omega), h_x \rangle + \langle v_y + w_y(\omega), h_y(\omega) \rangle \big)
  + c^* \max_{i \in \widehat{I}(\omega)} \max \big( \langle v_{xi} + w_{xi}(\omega), h_x \rangle + \langle v_{yi} + w_{yi}(\omega), h_y(\omega) \rangle \big) \Big) \, dP(\omega) \ge 0
\]
for all $(h_x, h_y) \in (A - x^*) \times L^p(\Omega, \mathfrak{A}, P; \mathbb{R}^m)$, where the last maximum is taken over all $(v_{xi}, v_{yi}) \in \underline{\partial} g_i(x^*, y^*(\omega), \omega)$. Consequently, one has
\[
  \int_\Omega \max_{(v_x, v_y) \in Q(\omega)} \big( \langle v_x, h_x \rangle + \langle v_y, h_y(\omega) \rangle \big) \, dP(\omega) \ge 0 \tag{21}
\]
for all $(h_x, h_y) \in (A - x^*) \times L^p(\Omega, \mathfrak{A}, P; \mathbb{R}^m)$, where
\[
  Q(\omega) = \underline{\partial} f(x^*, y^*(\omega), \omega) + (w_x(\omega), w_y(\omega))
  + c^* \operatorname{co} \Big\{ \underline{\partial} g_i(x^*, y^*(\omega), \omega) + (w_{xi}(\omega), w_{yi}(\omega)) \Bigm| i \in \widehat{I}(\omega) \Big\}
\]
for any $\omega \in \Omega$.

Let us show that the multifunction $Q(\cdot)$ is measurable. Indeed, as was pointed out in the proof of Corollary 1, Assumption 1 guarantees that the set-valued mappings $\underline{\partial} f(x^*, y^*(\cdot), \cdot)$ and $\underline{\partial} g_i(x^*, y^*(\cdot), \cdot)$, $i \in I \cup \{0\}$, are measurable. Hence with the use of [35, Prp. 14.11, part (c)] one gets that the set-valued maps $\underline{\partial} f(x^*, y^*(\cdot), \cdot) + (w_x(\cdot), w_y(\cdot))$ and $\underline{\partial} g_i(x^*, y^*(\cdot), \cdot) + (w_{xi}(\cdot), w_{yi}(\cdot))$, $i \in I \cup \{0\}$, are measurable as well.

Arguing in the same way as in the proof of Theorem 3, one can easily check that the multifunction $\widehat{I}(\cdot)$ is measurable, which implies that the set-valued maps
\[
  Q_i(\omega) :=
  \begin{cases}
    \underline{\partial} g_i(x^*, y^*(\omega), \omega) + (w_{xi}(\omega), w_{yi}(\omega)), & \text{if } i \in \widehat{I}(\omega), \\
    \emptyset, & \text{if } i \notin \widehat{I}(\omega)
  \end{cases}
\]
are measurable for all $i \in I \cup \{0\}$. Therefore by [35, Prp. 14.11, part (b)] and [1, Thrm. 8.2.2] the set-valued map
\[
  \operatorname{co} \Big( \bigcup_{i \in I \cup \{0\}} Q_i(\cdot) \Big)
  = \operatorname{co} \Big\{ \underline{\partial} g_i(x^*, y^*(\cdot), \cdot) + (w_{xi}(\cdot), w_{yi}(\cdot)) \Bigm| i \in \widehat{I}(\cdot) \Big\}
\]
is measurable. Hence applying [35, Prp. 14.11, part (c)] one finally gets that the multifunction $Q(\cdot)$ is measurable.

Now, arguing in the same way as in the proof of Lemma 2 (or utilising the interchangeability principle; see, e.g., [35, Thrm. 14.60]), one gets that inequality (21) is satisfied iff
\[
  \max_{(v_x(\cdot), v_y(\cdot))} \int_\Omega \big( \langle v_x(\omega), h_x \rangle + \langle v_y(\omega), h_y(\omega) \rangle \big) \, dP(\omega) \ge 0 \tag{22}
\]
for all $(h_x, h_y) \in (A - x^*) \times L^p(\Omega, \mathfrak{A}, P; \mathbb{R}^m)$, where the maximum is taken over all measurable selections of the multifunction $Q(\cdot)$ (note that at least one such selection exists by [1, Thrm. 8.1.3]). From the definition of $Q(\cdot)$ and the growth condition on the codifferentials of the functions $f$ and $g_i$ from Assumption 1 it follows that the set of all measurable selections of $Q(\cdot)$ is a bounded subset of the space $L^1(\Omega, \mathfrak{A}, P; \mathbb{R}^d) \times L^{p'}(\Omega, \mathfrak{A}, P; \mathbb{R}^m)$. Therefore inequality (22) can be rewritten as follows:
\[
  \max_{(v_1, v_2) \in \mathcal{Q}(x^*, y^*)} \Big( \langle v_1, h_x \rangle + \int_\Omega \langle v_2(\omega), h_y(\omega) \rangle \, dP(\omega) \Big) \ge 0 \tag{23}
\]
for all $(h_x, h_y) \in (A - x^*) \times L^p(\Omega, \mathfrak{A}, P; \mathbb{R}^m)$, where
\[
  \mathcal{Q}(x^*, y^*) := \Big\{ (v_1, v_2) \in \mathbb{R}^d \times L^{p'}(\Omega, \mathfrak{A}, P; \mathbb{R}^m) \Bigm| v_1 = \mathbb{E}[v_x], \; v_2 = v_y, \;
  (v_x(\cdot), v_y(\cdot)) \text{ is a measurable selection of the map } Q(\cdot) \Big\}.
\]
The set $\mathcal{Q}(x^*, y^*)$ is bounded due to the boundedness of the set of all measurable selections of $Q(\cdot)$. Furthermore, the set $\mathcal{Q}(x^*, y^*)$ is convex and closed, since by definition $Q(\cdot)$ has closed and convex images. Therefore, $\mathcal{Q}(x^*, y^*)$ is a weakly compact convex subset of $\mathbb{R}^d \times L^{p'}(\Omega, \mathfrak{A}, P; \mathbb{R}^m)$. Hence, taking into account inequality (23) and applying the separation theorem, one can easily check that
\[
  \mathcal{Q}(x^*, y^*) \cap \Big( \big( -N_A(x^*) \big) \times \{0\} \Big) \ne \emptyset.
\]
Consequently, by the definitions of $\mathcal{Q}(x^*, y^*)$ and $Q(\cdot)$ there exists a function $\zeta \in L^1(\Omega, \mathfrak{A}, P; \mathbb{R}^d)$ such that $\mathbb{E}[\zeta] \in -N_A(x^*)$ and
\[
  (\zeta(\omega), 0) \in \underline{\partial} f(x^*, y^*(\omega), \omega) + (w_x(\omega), w_y(\omega))
  + c^* \operatorname{co} \Big\{ \underline{\partial} g_i(x^*, y^*(\omega), \omega) + (w_{xi}(\omega), w_{yi}(\omega)) \Bigm| i \in \widehat{I}(\omega) \Big\} \tag{24}
\]
for a.e. $\omega \in \Omega$.

Let $E_J = \{ \omega \in \Omega \mid \widehat{I}(\omega) = J \}$ for any nonempty subset $J \subseteq I \cup \{0\}$. The sets $E_J$ form a partition of $\Omega$. Moreover, these sets are measurable, since the multifunction $\widehat{I}(\cdot)$ is measurable.

Observe that from (24) it follows that
\[
  (\zeta(\omega), 0) \in \underline{\partial} f(x^*, y^*(\omega), \omega) + (w_x(\omega), w_y(\omega))
  + c^* \operatorname{co} \Big\{ \underline{\partial} g_i(x^*, y^*(\omega), \omega) + (w_{xi}(\omega), w_{yi}(\omega)) \Bigm| i \in J \Big\}
\]
for a.e. $\omega \in E_J$ and any nonempty $J \subseteq I \cup \{0\}$. With the use of the Filippov theorem (see, e.g., [1, Thrm. 8.2.10]) one can readily check that the previous inclusion implies that for any nonempty $J \subseteq I \cup \{0\}$ there exist nonnegative measurable functions $\alpha_i^J(\cdot)$, $i \in J$, such that $\sum_{i \in J} \alpha_i^J(\omega) = 1$ and
\[
  (\zeta(\omega), 0) \in \underline{\partial} f(x^*, y^*(\omega), \omega) + (w_x(\omega), w_y(\omega))
  + c^* \sum_{i \in J} \alpha_i^J(\omega) \Big( \underline{\partial} g_i(x^*, y^*(\omega), \omega) + (w_{xi}(\omega), w_{yi}(\omega)) \Big)
\]
for a.e. $\omega \in E_J$. For any $i \in I$ define
\[
  \lambda_i(\omega) =
  \begin{cases}
    c^* \alpha_i^J(\omega), & \text{if } \omega \in E_J, \; i \in J \text{ (or, equivalently, } i \in \widehat{I}(\omega)), \\
    0, & \text{otherwise.}
  \end{cases}
\]
Observe that by definition $\lambda_i$, $i \in I$, are nonnegative measurable functions such that $\sum_{i \in I} \| \lambda_i \|_\infty \le c^*$, and $\lambda_i(\omega) g_i(x^*, y^*(\omega), \omega) = 0$ for a.e. $\omega \in \Omega$, since $\lambda_i(\omega) = 0$ whenever $i \notin \widehat{I}(\omega)$, i.e. $g_i(x^*, y^*(\omega), \omega) < 0$. Furthermore, bearing in mind the fact that $w_{x0}(\cdot) \equiv w_{y0}(\cdot) \equiv 0$ and $\underline{\partial} g_0(x^*, y^*(\omega), \omega) \equiv \{0\}$, one gets that
\[
  (\zeta(\omega), 0) \in \underline{\partial} f(x^*, y^*(\omega), \omega) + (w_x(\omega), w_y(\omega))
  + \sum_{i \in I} \lambda_i(\omega) \Big( \underline{\partial} g_i(x^*, y^*(\omega), \omega) + (w_{xi}(\omega), w_{yi}(\omega)) \Big)
\]
for a.e. $\omega \in \Omega$. Hence applying equality (2) we arrive at the required result.
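The interchange of maximum and integral used in the proof above (the passage from inequality (21) to inequality (22)) can be sanity-checked on a toy model. The Python sketch below is only an illustration under strong simplifying assumptions: the sample space is finite, and each set $Q(\omega)$ is replaced by a finite list of pairs $(v_x, v_y)$, so that every selection is trivially measurable and can be enumerated. All data and the names `probs`, `Q`, `h_x`, `h_y`, `max_over_selections` and `expectation_of_pointwise_max` are hypothetical and do not come from the paper.

```python
from itertools import product


def inner(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def integrand(v, h_x, h_y_k):
    """Value <v_x, h_x> + <v_y, h_y(omega_k)> for one pair v = (v_x, v_y)."""
    v_x, v_y = v
    return inner(v_x, h_x) + inner(v_y, h_y_k)


def max_over_selections(Q, probs, h_x, h_y):
    """Brute force: maximise the expectation over all selections of Q."""
    best = float("-inf")
    for selection in product(*Q):  # one element of Q[k] for each scenario k
        value = sum(
            p * integrand(v, h_x, h_y[k])
            for k, (p, v) in enumerate(zip(probs, selection))
        )
        best = max(best, value)
    return best


def expectation_of_pointwise_max(Q, probs, h_x, h_y):
    """Expectation of the scenario-wise maximum of the integrand."""
    return sum(
        p * max(integrand(v, h_x, h_y[k]) for v in Q[k])
        for k, p in enumerate(probs)
    )


# Hypothetical toy data: three scenarios, d = 2, m = 1.
probs = [0.5, 0.25, 0.25]
Q = [
    [((1, 0), (2,)), ((0, 1), (-1,)), ((-1, -1), (0,))],
    [((2, 1), (0,)), ((0, 0), (3,))],
    [((1, 1), (1,)), ((-2, 0), (4,))],
]
h_x = (1, -2)            # first-stage direction
h_y = [(1,), (-1,), (2,)]  # second-stage direction, one value per scenario

if __name__ == "__main__":
    lhs = max_over_selections(Q, probs, h_x, h_y)
    rhs = expectation_of_pointwise_max(Q, probs, h_x, h_y)
    print(lhs, rhs)  # the two values coincide on this toy model
```

The enumeration visits every combination of scenario-wise choices, so its cost grows exponentially with the number of scenarios, whereas the pointwise form needs one maximisation per scenario. In the general setting of the proof the equality rests on the measurability of $Q(\cdot)$ rather than on enumeration.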
Remark. It should be noted that with the use of the codifferential calculus one can compute a codifferential of the function $f_0$ from the proof of the previous theorem, apply necessary conditions for a minimum of a codifferentiable function on a convex set [17, Thrm. 2.8] to the functional $J$, and then directly rewrite these conditions in terms of the problem $(\mathcal{P})$ with the use of Theorem 1 and an explicit expression for a codifferential of $f_0$. However, one can check that this approach leads to more cumbersome optimality conditions than the ones from the theorem above. It is possible to verify that these conditions are equivalent, but in the author's opinion the proof of this equivalence is more difficult than the proof of the previous theorem. That is why we chose to present a simpler, but somewhat indirect, derivation of optimality conditions for the problem $(\mathcal{P})$.

Remark. Note that in the case when the functions $f$ and $g_i$, $i \in I$, are differentiable jointly in $x$ and $y$, the optimality conditions from Theorem 4 take the following well-known form (cf. [26, 36, 40, 45, 46]). There exist nonnegative multipliers $\lambda_i \in L^\infty(\Omega, \mathfrak{A}, P)$, $i \in I$, such that $\lambda_i(\omega) g_i(x^*, y^*(\omega), \omega) = 0$ for a.e. $\omega \in \Omega$ and all $i \in I$, and
\[
  \Big\langle \mathbb{E} \Big[ \nabla_x f(x^*, y^*(\cdot), \cdot) + \sum_{i \in I} \lambda_i(\cdot) \nabla_x g_i(x^*, y^*(\cdot), \cdot) \Big], x - x^* \Big\rangle \ge 0 \quad \forall x \in A,
\]
\[
  \nabla_y f(x^*, y^*(\omega), \omega) + \sum_{i \in I} \lambda_i(\omega) \nabla_y g_i(x^*, y^*(\omega), \omega) = 0 \quad \text{for a.e. } \omega \in \Omega.
\]

This work was devoted to an analysis of nonsmooth two-stage stochastic programming problems with the use of tools of constructive nonsmooth analysis [14].
In the first part of the paper, we analysed the co-/quasi-differentiability of the expectation of nonsmooth random integrands and obtained explicit formulae for its co-/quasi-differentials under some natural measurability and growth conditions on the integrand and its codifferential.

In the second part of the paper, we obtained two types of sufficient conditions for the global exactness of a penalty function for two-stage stochastic programming problems, reformulated as equivalent variational problems with pointwise constraints. The first type of sufficient conditions is formulated for the penalty term defined via the $L^p$ norm of the distance to the feasible set of the second-stage problem, while the second type is formulated for the penalty term that is independent of $p$ and is defined via the constraints of the second-stage problem. Although the second type of sufficient conditions is much more restrictive than the first one, it is more convenient for applications and for the derivation of optimality conditions. Furthermore, as is pointed out in Remark 6, these conditions open a way for the derivation of global optimality conditions and the application of DC optimization methods to two-stage stochastic programming problems whose second-stage problem has a DC objective function and DC constraints.

Finally, in the last part of the paper we combined our results on codifferentiability of the expectation of nonsmooth random integrands and exact penalty functions to derive optimality conditions for nonsmooth two-stage stochastic programming problems in terms of codifferentials, involving essentially bounded Lagrange multipliers.

References

[1] J.-P. Aubin and H. Frankowska.
Set-Valued Analysis. Birkhäuser, Boston, 1990.
[2] G. Barbarosoğlu and Y. Arda. A two-stage stochastic programming framework for transportation planning in disaster response. J. Oper. Res. Soc., 55:43–53, 2004.
[3] J. R. Birge and F. Louveaux. Introduction to Stochastic Programming. Springer, New York, 2011.
[4] V. I. Bogachev. Measure Theory. Volume I. Springer-Verlag, Berlin, Heidelberg, 2007.
[5] J. V. Burke. The subdifferential of measurable composite max integrands and smoothing approximation. Math. Program., 181:229–264, 2020.
[6] X. Chen and M. Fukushima. Expected residual minimization method for stochastic linear complementarity problems. Math. Oper. Res., 30:916–638, 2005.
[7] X. Chen, R. J.-B. Wets, and Y. Zhang. Stochastic variational inequalities: residual minimization smoothing sample average approximations. SIAM J. Optim., 22:649–673, 2012.
[8] S. Dempe, V. Kalashnikov, G. A. Pérez-Valdés, and N. Kalashnykova, editors. Bilevel Programming Problems. Theory, Algorithms and Applications to Energy Networks. Springer, Berlin, Heidelberg, 2015.
[9] S. Dempe and A. Zemkoho, editors. Bilevel Optimization. Advances and Next Challenges. Springer, Cham, 2020.
[10] V. F. Demyanov. Conditions for an extremum in metric spaces. J. Glob. Optim., 17:55–63, 2000.
[11] V. F. Demyanov. Nonsmooth optimization. In G. Di Pillo and F. Schoen, editors, Nonlinear optimization. Lecture notes in mathematics, vol. 1989, pages 55–163. Springer-Verlag, Berlin, 2010.
[12] V. F. Demyanov and L. C. W. Dixon, editors. Quasidifferential Calculus. Springer, Berlin, Heidelberg, 1986.
[13] V. F. Dem'yanov and V. N. Malozemov. Introduction to Minimax. Dover Publications, New York, 2014.
[14] V. F. Demyanov and A. M. Rubinov. Constructive Nonsmooth Analysis. Peter Lang, Frankfurt am Main, 1995.
[15] V. F. Demyanov and A. M. Rubinov, editors. Quasidifferentiability and Related Topics. Kluwer Academic Publishers, Dordrecht, 2000.
[16] M. V. Dolgopolik. Codifferential calculus in normed spaces. J. Math. Sci., 173:441–462, 2011.
[17] M. V. Dolgopolik. Nonsmooth problems of calculus of variations via codifferentiation. ESAIM: Control Optim. Calc. Var., 20:1153–1180, 2014.
[18] M. V. Dolgopolik. Abstract convex approximations of nonsmooth functions. Optim., 64:1439–1469, 2015.
[19] M. V. Dolgopolik. A unifying theory of exactness of linear penalty functions. Optim., 65:1167–1202, 2016.
[20] M. V. Dolgopolik. A convergence analysis of the method of codifferential descent. Comput. Optim. Appl., 71:879–913, 2018.
[21] M. V. Dolgopolik. Constrained nonsmooth problems of the calculus of variations and nonsmooth Noether equations. arXiv: 2004.14061, pages 1–44, 2020.
[22] M. V. Dolgopolik and A. Fominyh. Exact penalty functions for optimal control problems I: main theorem and free-endpoint problems. Optim. Control Appl. Meth., 40:1018–1044, 2019.
[23] C. I. Fábián and Z. Szőke. Solving two-stage stochastic programming problems with level decomposition. Comput. Manag. Sci., 4:313–353, 2007.
[24] S. D. Flåm and J. Zowe. Exact penalty functions in single-stage stochastic programming. Optim., 21:723–734, 1990.
[25] E. Grass and K. Fischer. Two-stage stochastic programming in disaster management: a literature survey. Surv. Oper. Res. Manag. Sci., 21:85–100, 2016.
[26] J. B. Hiriart-Urruty. Conditions nécessaires d'optimalité pour un programme stochastique avec recours. SIAM J. Control Optim., 16:317–329, 1978.
[27] G. H. Huang and D. P. Loucks. An inexact two-stage stochastic programming model for water resources management under uncertainty. Civ. Eng. Environ. Syst., 17:95–118, 2000.
[28] H. Leövey and W. Römisch. Quasi-Monte Carlo methods for linear two-stage stochastic programming problems. Math. Program., 151:315–345, 2015.
[29] S. Lin, M. Huang, Z. Xia, and D. Li. Quasidifferentiabilities of the expectation functions of random quasidifferentiable functions. Optim., pages 1–16, 2020. DOI: 10.1080/02331934.2020.1818235.
[30] C. Liu, Y. Fan, and F. Ordóñez. A two-stage stochastic programming model for transportation network protection. Comput. Oper. Res., 36:1582–1590, 2009.
[31] A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro. Robust stochastic approximation approach to stochastic programming. SIAM J. Optim., 19:1574–1609, 2009.
[32] W. Oliveira, C. Sagastizábal, and S. Scheimberg. Inexact bundle methods for two-stage stochastic programming. SIAM J. Optim., 21:517–544, 2011.
[33] A. V. Orlov. On a solving bilevel D.C.-convex optimization problems. In Y. Kochetov, I. Bykadorov, and T. Gruzdeva, editors, Mathematical Optimization Theory and Operations Research. MOTOR 2020, pages 179–191. Springer, Cham, 2020.
[34] G. Di Pillo and L. Grippo. Exact penalty functions in constrained optimization. SIAM J. Control Optim., 27:1333–1360, 1989.
[35] R. T. Rockafellar and R. J.-B. Wets. Variational Analysis. Springer-Verlag, Berlin, Heidelberg, 1998.
[36] R. T. Rockafellar and R. J.-B. Wets. Stochastic convex programming: Kuhn-Tucker conditions. J. Math. Econ., 2:349–370, 1975.
[37] R. T. Rockafellar and R. J.-B. Wets. On the interchange of subdifferentiation and conditional expectation for convex functions. Stochastics, 7:173–182, 1982.
[38] A. Rubinov and X. Yang. Lagrange-Type Functions in Constrained Non-Convex Optimization. Kluwer Academic Publishers, Boston, 2003.
[39] A. Shapiro and T. H. de Mello. A simulation-based approach to two-stage stochastic programming with recourse. Math. Program., 81:301–325, 1998.
[40] A. Shapiro, D. Dentcheva, and A. Ruszczyński. Lectures on Stochastic Programming: Modeling and Theory. SIAM, Philadelphia, 2014.
[41] A. S. Strekalovsky. Global optimality conditions and exact penalization. Optim. Lett., 13:597–615, 2019.
[42] A. S. Strekalovsky and A. V. Orlov. Global search for bilevel optimization with quadratic data. In S. Dempe and A. Zemkoho, editors, Bilevel Optimization, pages 313–334. Springer, Cham, 2020.
[43] A. Uderzo. On the variational behaviour of functions with positive steepest descent rate. Positivity, 19:725–745, 2015.
[44] A. Uderzo. A strong metric subregularity analysis of nonsmooth mappings via steepest displacement rate. J. Optim. Theory Appl., 171:573–599, 2016.
[45] S. Vogel. Necessary optimality conditions for two-stage stochastic programming problems. Optim., 16:607–616, 1985.
[46] H. Xu and J. J. Ye. Necessary optimality conditions for two-stage stochastic programs with equilibrium constraints. SIAM J. Optim., 20:1685–1715, 2010.
[47] H. Xu and D. Zhang. Smooth sample average approximation of stationary points in nonsmooth stochastic optimization and applications. Math. Program., 119:371–401, 2009.
[48] A. J. Zaslavski. Optimization on Metric and Normed Spaces. Springer, New York, 2010.
[49] Z. Zhou, J. Zhang, P. Liu, Z. Li, M. C. Georgiadis, and E. N. Pistikopoulos. A two-stage stochastic programming model for the optimal design of distributed energy systems.