Preference Robust Optimization for Quasi-concave Choice Functions
Jian Wu, William B. Haskell, Wenjie Huang, and Huifu Xu
September 2020
Abstract
In behavioural economics, a decision maker's preferences are expressed by choice functions. Preference robust optimization (PRO) is concerned with problems where the decision maker's preferences are ambiguous, and the optimal decision is based on a robust choice function with respect to a preference ambiguity set. In this paper, we propose a PRO model to support choice functions that are: (i) monotonic (prefer more to less), (ii) quasi-concave (prefer diversification), and (iii) multi-attribute (have multiple objectives/criteria). As our main result, we show that the robust choice function can be constructed efficiently by solving a sequence of linear programming problems. Then, the robust choice function can be optimized efficiently by solving a sequence of convex optimization problems. We test the behavior and scalability of our method numerically on a portfolio optimization problem and a capital allocation problem.
Stochastic optimization and robust optimization are the two primary methods for decision making under uncertainty. Much of the focus in this literature has been on exogenous uncertainties beyond the control of the decision maker, such as market demand, portfolio returns, and climate change. However, there is also significant endogenous uncertainty which arises from ambiguity about the decision maker's preferences and choice function (e.g. a utility/risk function).

In practice, it is difficult to elicit the decision maker's true choice function. There may be inadequate information for a single decision maker to identify a unique choice function which best characterizes his preferences. For instance, there may only be a few observations of the decision maker's behavior. In addition, it is hard to specify a choice function in group decision making where there must be consensus among several stakeholders. This difficulty is further exacerbated in the multi-attribute setting. When multi-attribute prospects are in play, it is not obvious how to characterize the marginal dependencies of the attributes, i.e., it is not always clear how much an increase in the level of one attribute should depend on the levels of all the others.

In preference robust optimization (PRO), this preference ambiguity is explicitly quantified through available partial information. Then, in the spirit of robust optimization, a robust choice function is computed over this ambiguity set. PRO is precisely the optimization of this robust choice function. In this paper, we propose a PRO model for decision makers with quasi-concave multi-attribute choice functions. This is a substantial generalization over earlier work in PRO and allows us to better match the preferences of real decision makers. Despite this generalization, our numerical method is still tractable as it is based on solving a sequence of linear/convex optimization problems.
1.1 Related work

This work is ultimately concerned with decision maker preferences, their representation as choice functions, and then optimization over a set of feasible prospects.
Preferences and representation as choice functions
We need to characterize decision maker preferences by representing them as a choice function in order to do any optimization. When the preferences of a decision maker satisfy certain axioms including completeness, transitivity, continuity, and independence, Von Neumann and Morgenstern's expected utility theory (Von Neumann and Morgenstern (1945)) guarantees that these preferences may be characterized by an expected utility measure. Artzner et al. (1999) axiomatize the class of coherent risk measures (i.e., monotonicity, scale invariance, convexity, and translation invariance). They then give a representation result for this class of risk measures based on Fenchel duality, which characterizes them as a worst-case expectation over a set of probability distributions. Brown et al. (2012) give a representation result for the larger class of aspirational preferences, which allows for both diversification favoring and concentration favoring preferences on different regions of the space of prospects.
Quasi-concave choice functions
Quasi-concavity expresses the decision maker's desire for diversification in a more general way than the stronger property of concavity does. For instance, it is known that the cash-additivity assumption fails if there is any form of uncertainty about interest rates (see El Karoui and Ravanelli (2009)). Cerreia-Vioglio et al. (2011) argue that under the assumption of cash-subadditivity, the diversification principle only implies a quasi-convex choice function.

Quasi-concave choice functions have already appeared widely in the literature. One of the most prevalent examples of a quasi-concave choice function is the certainty equivalent (see e.g. Ben-Tal and Teboulle (2007)). The indices of acceptability proposed by Cherny and Madan (2009) are also quasi-concave choice functions. Brown and Sim (2009) propose the class of satisficing measures, which are based on the idea of meeting a target (i.e., a constraint rather than an objective). Here, the authors argue for the importance of quasi-concavity and focus on "quasi-concave satisficing measures". This work is continued in Brown et al. (2012) where the authors develop the class of aspirational measures. This class of measures is allowed to be quasi-concave on a diversification favoring set, and quasi-convex on a concentration favoring set. The representation result in Brown et al. (2012) shows that aspirational measures can be expressed in terms of a family of risk measures and targets.
Multi-attribute preferences
Multi-attribute problems are ubiquitous in practical applications but the existing PRO models mainly emphasize the single attribute setting. For instance, in healthcare it is typical to use several metrics rather than just one to measure the quality of life (Torrance et al., 1982; Feeny et al., 2002; Smith and Keeney, 2005). Similar problems can be found in network management (Azaron et al., 2008; Chen et al., 2010), scheduling (Liefooghe et al., 2007; Zakariazadeh et al., 2014), design (Tseng and Lu, 1990; Dino and Üçoluk, 2017), and portfolio optimization (Fliege and Werner, 2014). Indeed, over the past few decades, there has been significant research on multi-attribute expected utility (Von Stengel, 1988; Fishburn and LaValle, 1992; Miyamoto and Wakker, 1996; Tsetlin and Winkler, 2006, 2007, 2009) and multi-attribute risk management (Jouini et al., 2004; Burgert and Rüschendorf, 2006; Hamel and Heyde, 2010; Galichon and Henry, 2012).
Preference elicitation
Preference elicitation is a mechanism for learning decision maker preferences. The majority of the preference elicitation literature adopts ordinal judgements. Pairwise comparison, where the decision maker is asked to choose one of two prospects, is the most common. In the analytic hierarchy process (e.g., Saaty (1990)), relative weights are determined for different decision attributes by pairwise comparison. Alternatively, the choice list method elicits preferences by posing a sequence of related binary questions (e.g., Binswanger (1980)) that compare a safe and a risky prospect.

Alternatively, some literature adopts absolute measurement in preference elicitation. This is a numerical measure of the utility of a risky or risk-free position. The canonical way to do this is by certainty equivalence (e.g., Becker et al. (1964)). For instance, one asks for the "buying price" of a specific prospect. However, as argued by Karni and Safra (1987), elicitation by certainty equivalence only works in the framework of expected utility.
Preference robust optimization
In expected utility theory, there are two ingredients: (i) the utility function represents the decision maker's risk attitudes and tastes; and (ii) the probability distribution represents the decision maker's beliefs about the underlying uncertainty on the states of the world. In the absence of complete information, taste and belief are often subjective, in which case they may interact. The classical expected utility model of Von Neumann and Morgenstern assumes that there is no ambiguity/uncertainty in either of these ingredients. Gilboa and Schmeidler (1989) consider a situation where the decision maker's beliefs are uncertain. Consequently, they propose a distributionally robust expected utility model where preferences are characterized through the most conservative belief (the worst case probability); see Maccheroni et al. (2006) and Gilboa and Marinacci (2016) for more recent developments in this regard.

Another important stream of research focuses on ambiguity in the decision maker's tastes. Such ambiguity may arise from a lack of accurate description of human behaviour (Thurstone (1927)), cognitive difficulty, or incomplete information (Karmarkar (1978) and Weber (1987)). Parametric and non-parametric approaches have subsequently been proposed to assess the true utility function; these include discrete choice models (Train (2009)), and standard and paired gambling approaches for preference comparisons and certainty equivalence (Farquhar (1984)). We refer readers to Hu et al. (2018) for an overview of this approach.

In our present setting, the decision maker does not have complete information about his preferences or their representation as a choice function. Armbruster and Delage (2015) study a PRO model for expected utility maximization problems.
Specifically, they model the decision maker's ambiguity by incorporating various properties of utility functions such as monotonicity, concavity, and S-shapedness, along with preference elicitation information obtained from pairwise comparisons. In addition, they derive tractable reformulations for the resulting PRO problem by exploiting the affine support functions of the class of convex/concave functions. Delage and Li (2017) extend this work to risk management in finance where the investor's choice of a monetary risk measure is ambiguous. As in Armbruster and Delage (2015), they construct an ambiguity set of risk measures via important properties such as convexity, coherence, law invariance, and elicited preferences. They then develop tractable reformulations using the acceptance set representation for convex risk measures. In a related paper, Delage et al. (2017) propose a robust model for shortfall risk measures to tackle the case where investors are ambiguous about their utility loss functions. They construct an ambiguity set for loss functions via pairwise comparison for utility risk measures that can accommodate features such as coherence, convexity, and boundedness, and they derive a tractable reformulation as a linear program (LP). Wang and Xu (2020) consider ambiguity in spectral risk measures (SRM). They introduce a robust SRM model based on the worst-case risk spectrum from an uncertainty ball centred at a nominal risk spectrum. A step-like approximation for risk spectra is developed, and an error bound for the approximation is derived and then used in optimization.

Hu and Mehrotra (2015) approach PRO differently. First, they propose a moment-type ambiguity set for a decision maker's utility preferences via the certainty equivalent, pairwise comparisons, upper and lower bounds of the trajectories of the utility functions, and bounds on derivatives at specified grid points.
Second, they consider a probabilistic representation of the class of increasing convex utility functions. Third, by constructing a piecewise linear approximation of the trajectories of the utility bounds, they derive a tractable LP reformulation of the resulting PRO problem. Hu and Mehrotra's approach is closely related to stochastic dominance, a subject which has been intensely studied (see e.g. the monographs Müller and Stoyan (2002); Shaked and Shanthikumar (2007) for a comprehensive treatment of the topic and Dentcheva and Ruszczyński (2003, 2004) for a study of optimization problems with stochastic dominance constraints and their duality theory). In particular, Dentcheva and Ruszczyński (2009); Hu et al. (2011); Haskell et al. (2013) all develop multivariate stochastic dominance constrained optimization models.

There is also work on other robust multi-attribute choice models, for instance see Lam et al. (2013); Ehrgott et al. (2014). Bertsimas and O'Hair (2013) consider robustness in multi-attribute linear utility functions. They use an integer programming formulation to address human inconsistency, robust optimization and conditional value-at-risk to address loss aversion, and adaptive conjoint analysis and linear optimization to learn preferences. Noyan and Rudolf (2018) consider multi-attribute PRO for a general class of scalarization functions (specifically, the class of min-biaffine functions), where the vector of weights for the attributes lies in a convex ambiguity set. Vayanos et al. (2020) ask how to do active preference elicitation within the robust/polyhedral approach to uncertainty modelling, under both the max-min utility and min-max regret decision criteria. For offline elicitation (where all queries are made at once) and online elicitation (where queries are selected sequentially in an adaptive fashion), their problem can be formulated as a two-stage (resp. multi-stage) robust optimization problem with decision-dependent information discovery.
In this paper, we emphasize choice functions that are monotonic, quasi-concave, and multi-attribute. In particular, we do not enforce convexity/concavity or translation invariance. The main contributions of our present paper are as follows:

1. Multi-attribute quasi-concave PRO model.
We put forward a robust choice model for preference ambiguity where the underlying choice function is monotonic, quasi-concave, and multi-attribute. By replacing concavity with quasi-concavity and dropping translation invariance, we extend the existing PRO models to cover all diversification favoring decision maker behaviors, which makes it easier to incorporate the preference elicitation information of real decision makers. Moreover, our new framework covers a number of well-known preference models, such as expected utility and aspirational preferences. Our model's support for multiple attributes also makes it applicable to a broad class of multi-criteria problems.

2. Value problem decomposition.
We first show that computation of the robust choice function can be decomposed into two stages: the first stage "value problem" and the second stage "interpolation problem". The value problem is a disjunctive programming problem, which is non-convex and difficult to solve in general. However, we show how to solve the value problem efficiently via a sequence of LPs, and we refer to this method as our "sorting algorithm". The optimal solution of the value problem can then be taken as input to the simpler and less expensive interpolation problem. As supported by our numerical results, the sorting algorithm is scalable in terms of the size of the preference elicitation information, the number of scenarios, and the number of attributes.

3. Acceptance set characterization.
We show how to construct the acceptance sets for our robust choice function. This follows by applying LP duality to the above interpolation problem. As a corollary, we find a representation of our robust choice function as an aspirational measure (see Brown et al. (2012)). Once we have the acceptance sets in hand, the PRO problem can be solved exactly using binary search and solving a sequence of convex optimization problems. In particular, by leveraging the special structure of the acceptance sets, we only have to solve a small number of convex optimization problems.

4. Law invariance.
We show how to incorporate law invariance into our model. Law invariance requires exponentially many constraints in our first stage value problem. However, we can reduce these constraints to a manageable polynomial number of constraints through duality results for the assignment problem. After this reduction, the rest of our development goes through similarly with only slight modification.

This paper is organized as follows. Section 2 reviews preliminaries on choice functions. Section 3 then presents the details of our main PRO problem. In Section 4, we show how to solve the value problem and construct the robust choice function by solving a sequence of LPs. Section 5 then shows how to explicitly construct the acceptance sets of the robust choice function. Section 6 completes the picture and explains how to do PRO using acceptance sets and the binary search algorithm. We follow in Section 7 by explaining how to incorporate the property of law invariance into our framework. Section 8 presents numerical experiments for a portfolio optimization problem and a capital allocation problem, and the paper concludes in Section 9. All proofs are organized together in the Appendix.
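The binary search in contribution 3 rests on a standard fact worth previewing: for a quasi-concave objective, the level set {z : ψ(z) ≥ m} is convex, so one can bisect on the level m and test attainability at each step. The sketch below is a hypothetical one-dimensional illustration of this bisection template, not the paper's algorithm; the function psi and the grid-based feasibility check are assumptions for illustration.

```python
import numpy as np

def maximize_quasiconcave(psi, zs, lo, hi, tol=1e-6):
    """Bisection on the level m.  For quasi-concave psi, the level set
    {z : psi(z) >= m} is convex, so testing whether level m is attainable
    is a convex feasibility problem; here we approximate that test by
    brute force over a grid of candidate decisions zs."""
    while hi - lo > tol:
        m = 0.5 * (lo + hi)
        if any(psi(z) >= m for z in zs):  # is level m attainable?
            lo = m                        # yes: search for a higher level
        else:
            hi = m                        # no: search lower
    return lo

# psi(z) = min(z, 2 - z) is quasi-concave on [0, 2] with maximum value 1 at z = 1.
zs = np.linspace(0.0, 2.0, 2001)
m_star = maximize_quasiconcave(lambda z: min(z, 2.0 - z), zs, lo=0.0, hi=2.0)
```

In the paper's setting, the grid check is replaced by a convex optimization problem over an acceptance set, which is what makes the approach of Section 6 exact rather than approximate.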
We begin with a probability space (Ω, F, P) where Ω is a sample space, F is a σ-algebra on Ω, and P is a probability measure on (Ω, F). We let L = L^∞(Ω, F, P; R^N) for N ≥ 1 be the space of essentially bounded measurable functions X : Ω → R^N, equipped with the essential supremum norm: ‖X‖_L := inf{a ∈ R : P{‖X(ω)‖_∞ > a} = 0}, where ‖·‖_∞ is the ∞-norm on R^N. Each X ∈ L represents a prospect of payoffs associated with a decision (e.g. gains from a portfolio or the vector of revenues from a set of retailers) and the components of X represent features of the prospect. We are especially concerned with the case N ≥ 2 with X = (X_n)_{n=1}^N, where X_n represents attribute n = 1, . . . , N.

The preferences of the decision maker for prospects in L are described by a partial order "⪰", where X ⪰ Y means X is preferable to Y.

Definition 2.1.
Let ⪰ be a partial order on L.
(i) ⪰ is complete if for any X, Y ∈ L, either X ⪰ Y or Y ⪰ X holds.
(ii) ⪰ is transitive if X ⪰ Y and Y ⪰ Z imply X ⪰ Z.

Definition 2.2 (Preference relation). ⪰ is a preference relation (or a weak order) if ⪰ is complete and transitive.

To do optimization over prospects, we need to represent the decision maker's preferences via a choice function. Choice functions map from the space of prospects L to the extended real line R̄ := R ∪ {−∞, ∞}.

Definition 2.3 (Choice function). Let ⪰ be a preference relation. The function φ : L → R̄ is a choice function corresponding to ⪰ if and only if φ(X) ≥ φ(Y) for all X, Y ∈ L with X ⪰ Y.

The choice function φ is also known as a representation of the preference relation ⪰ (see e.g. Puppe (1991) and the references therein). Existence and uniqueness of a choice function depend on the prospect space L, its associated preference relation ⪰, as well as the properties of φ. For example, there is a continuous choice function φ if and only if ⪰ is a weak order and continuous (i.e., for each fixed X ∈ L, the sets {Y ∈ L : X ⪰ Y} and {Y ∈ L : Y ⪰ X} are closed under the topology of weak convergence, see Debreu (1964)). Another example is the well established expected utility model of Von Neumann-Morgenstern.

We emphasize the following properties of choice functions in this work:

Definition 2.4 (Properties of choice functions). Let φ : L → R̄ be a choice function.
• [Mon] (Monotone) For all X, Y ∈ L, X ≤ Y implies φ(X) ≤ φ(Y).
• [Co] (Concave) For all X, Y ∈ L and λ ∈ [0, 1], φ(λX + (1 − λ)Y) ≥ λφ(X) + (1 − λ)φ(Y).
• [QCo] (Quasi-concave) For all X, Y ∈ L and λ ∈ [0, 1], φ(λX + (1 − λ)Y) ≥ min{φ(X), φ(Y)}.
• [Usc] (Upper semi-continuous) For all X ∈ L, lim sup_{Y→X} φ(Y) = φ(X).

We let R_QCo denote the set of all choice functions satisfying properties [Mon], [QCo], and [Usc]. We emphasize choice functions within R_QCo throughout this paper, which covers all choice functions describing diversification favoring behavior. For the purpose of comparison, we sometimes work within R_Co ⊂ R_QCo, the set of all choice functions satisfying properties [Mon], [Co], and [Usc]. Much of the existing work on PRO is done in R_Co. Two major examples of single-attribute quasi-concave choice functions follow.

Example 2.1. (i) (Certainty equivalent, see Ben-Tal and Teboulle (2007)) Let u : R → R be an increasing and concave utility function. Then φ_CE defined by φ_CE(X) := u^{−1}(E[u(X)]) is the certainty equivalent of u, and φ_CE ∈ R_QCo.
(ii) (Aspirational preferences, see Brown et al. (2012)) Let {μ_m}_{m∈R} be a collection of convex risk measures (monotonic, translation invariant, and normalized so that μ_m(0) = 0) and let {τ(m)}_{m∈R} be a collection of targets. Also suppose that τ(m) is non-decreasing in m, and that μ_m(X − τ(m)) is non-decreasing in m for all X ∈ L. Then, φ_A : L → R defined by φ_A(X) := sup{m ∈ R : μ_m(X − τ(m)) ≤ 0} is an aspirational measure, and φ_A ∈ R_QCo.

For the multi-attribute case, it is common to choose a vector of weights w ∈ R^N_+ and to then assess the weighted sum ⟨w, X⟩ = Σ_{n=1}^N w_n X_n with a single-attribute choice function (e.g. φ_CE(⟨w, X⟩) and φ_A(⟨w, X⟩)). Our approach does not require any weights to be specified, however.

In this paper, we are interested in characterizing the upper level sets of φ ∈ R_QCo (often called the "acceptance sets" of φ). The acceptance sets of our robust choice function will play a critical role in this paper, both for theoretical and computational purposes.

Definition 2.5 (Acceptance sets).
The acceptance set of φ ∈ R_QCo at level m ∈ R̄ is A^φ_m := {X ∈ L | φ(X) ≥ m}.

φ ∈ R_QCo has an acceptance set representation as follows.
Proposition 2.1 (Properties of acceptance sets). Let φ ∈ R_QCo.
(i) For all X ∈ L, φ(X) = sup{m ∈ R : X ∈ A^φ_m}.
(ii) A^φ_m is monotone (i.e., X ∈ A^φ_m and Y ≥ X implies Y ∈ A^φ_m) for all m ∈ R.
(iii) A^φ_m is convex for all m ∈ R.
(iv) {A^φ_m}_{m∈R} is non-increasing in m (i.e., A^φ_{m_1} ⊇ A^φ_{m_2} for all m_1 ≤ m_2).

Alternatively, given a family of acceptance sets {A_m}_{m∈R}, we can construct a corresponding choice function φ(·; {A_m}_{m∈R}) defined via: φ(X; {A_m}_{m∈R}) := sup{m ∈ R : X ∈ A_m}, ∀X ∈ L. In the next proposition, we give conditions on the family {A_m}_{m∈R} so that φ(·; {A_m}_{m∈R}) belongs to R_QCo. This result is an extension of a similar result for quasi-convex risk measures by Drapeau and Kupper (2013).
Proposition 2.2.
Let {A_m}_{m∈R} be a family of acceptance sets such that:
(a) A_m is monotone for all m ∈ R.
(b) A_m is convex for all m ∈ R.
(c) {A_m}_{m∈R} is non-increasing in m.
Then, φ(·; {A_m}_{m∈R}) ∈ R_QCo.

We describe the setup for our robust choice function and corresponding PRO problem in this section. Let Z be a compact convex subset of a Euclidean space that represents feasible decisions. Let G : Z → L be a stochastic function that maps decisions in Z to prospects in L such that: G(z, ω) := [G(z)](ω) is concave for all ω ∈ Ω. Concavity of z → G(z, ω) for all ω ∈ Ω is interpreted in the vector-valued sense when N ≥ 2. We want to maximize G(z) over Z in some sense, but we are concerned with the case where the decision maker's choice function φ cannot be specified. So, something more is needed to even formulate an optimization problem. This brings us to the crux of this paper.

To handle the preference ambiguity with the robust approach, we let R ⊂ R_QCo denote a general preference ambiguity set which we believe contains the decision maker's true choice function. There are many different ways to get R, and we describe a principled approach shortly. The next definition formalizes the idea of a robust choice function corresponding to R.

Definition 3.1 (Robust choice function). Let R ⊂ R_QCo be a preference ambiguity set. The robust choice function ψ_R : L → R corresponding to R is defined by: ψ_R(X) := inf_{φ∈R} φ(X), ∀X ∈ L.
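For intuition, Definition 3.1 can be made concrete with a toy ambiguity set (our illustrative construction, not an example from the paper): a finite family of exponential-utility certainty equivalents CE_γ(X) = −(1/γ) log E[exp(−γX)], each of which lies in R_QCo by Example 2.1(i), with the robust choice function given by their pointwise minimum. The payoffs, probabilities, and risk-aversion grid below are assumptions for illustration.

```python
import numpy as np

def certainty_equivalent(x, p, gamma):
    """Exponential-utility certainty equivalent of a discrete prospect:
    CE_gamma(X) = -(1/gamma) * log E[exp(-gamma * X)], with realizations x
    and scenario probabilities p."""
    return -np.log(np.dot(p, np.exp(-gamma * x))) / gamma

def robust_choice(x, p, gammas):
    """Robust choice function of Definition 3.1 for the finite ambiguity
    set {CE_gamma : gamma in gammas}: the pointwise infimum of the family."""
    return min(certainty_equivalent(x, p, g) for g in gammas)

# Illustrative prospect: payoffs 1 or 3 with equal probability, and an
# assumed grid of risk-aversion parameters.
x = np.array([1.0, 3.0])
p = np.array([0.5, 0.5])
gammas = [0.1, 1.0, 5.0]
val = robust_choice(x, p, gammas)  # worst case over the family
```

In this toy family the infimum is attained by the most risk-averse member. For a general ambiguity set R the infimum ranges over infinitely many choice functions and cannot be evaluated by enumeration, which is precisely the computational challenge addressed in the rest of the paper.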
In both Armbruster and Delage (2015) and Delage and Li (2017), the "robust utility function" and "robust risk measure" are defined analogously.

We now discuss the construction of a specific preference ambiguity set within R_QCo. This construction starts with the idea of preference elicitation by pairwise comparison, in line with much of the PRO literature (see Armbruster and Delage (2015); Hu and Mehrotra (2015); Delage and Li (2017)). The sequence of pairwise comparisons enables us to incorporate information about the specific preferences of the decision maker:
• [Eli] (Preference elicitation) For a sequence of pairs of prospects E = {(W_k, Y_k)}_{k=1}^K exposed to the decision maker, the decision maker prefers W_k to Y_k for all k = 1, . . . , K. We call E the elicited comparison data set (ECDS).

Next we introduce two technical requirements so that our robust choice functions are well-defined. We choose a reference prospect where we normalize the value of the choice function to be zero, and we suppose the choice function is Lipschitz continuous so that preferences do not vary too rapidly.
• [Nor] (Normalization) φ(W) = 0 for some fixed normalizing prospect W ∈ L.
• [Lip] (Lipschitz continuity) For L > 0, |φ(X) − φ(Y)| ≤ L‖X − Y‖_L for all X, Y ∈ L.

We then define the support set Θ := {W} ∪ {(W_k, Y_k)}_{k=1}^K, which consists of the normalizing prospect W and the ECDS E, where J = 2K + 1 is the total number of prospects in the support set. In the following assumption, we specify that all prospects in Θ are essentially bounded and so effectively have bounded support on R^N.

Assumption 3.1. Θ ⊂ L (i.e., ‖θ‖_L < ∞ for all θ ∈ Θ).

Now let R(E) := {φ ∈ R_QCo : [
Eli], [Nor], [Lip]} be the preference ambiguity set corresponding to our ECDS and the additional structural properties [Nor] and [Lip]. The next proposition is an immediate consequence of the fact that monotonicity and quasi-concavity are preserved for ψ_{R(E)} (by the infimum). Preference elicitation, normalization, and Lipschitz continuity are also all preserved by the infimum.

Proposition 3.2. ψ_{R(E)} ∈ R(E) ⊂ R_QCo.

Now we can formulate a preference robust optimization problem. We want to maximize the stochastic function G with respect to the robust choice function ψ_{R(E)}. The corresponding PRO problem is:

(PRO) max_{z∈Z} ψ_{R(E)}(G(z)) ≡ max_{z∈Z} inf_{φ∈R(E)} φ(G(z)).

Problem (PRO) has a convex feasible region and a quasi-concave objective function. Quasi-concavity of the objective z → ψ_{R(E)}(G(z)) is not our main computational challenge (there are ways to deal with this, e.g. the bisection method and the level function method). In our case, it is challenging to even evaluate ψ_{R(E)}(G(z)) for a fixed z ∈ Z. Much of the remainder of this paper consists of showing how to evaluate the robust choice function ψ_{R(E)}, and then showing how to solve Problem (PRO).

3.2 A benchmark prospect

We take a brief interlude to discuss how to frame our preferences with respect to a "benchmark" prospect Y ∈ L. This convention originally appeared in the literature on stochastic dominance constrained optimization (see Dentcheva and Ruszczyński (2003, 2004)), and again in Armbruster and Delage (2015). The robust choice function with benchmark Y is defined as follows.

Definition 3.2 (Robust choice function with benchmark). Let R ⊂ R_QCo be a preference ambiguity set and Y ∈ L be a benchmark prospect. The robust choice function ψ_R(·; Y) : L → R with benchmark Y is defined via: ψ_R(X; Y) := inf_{φ∈R} {φ(X) − φ(Y)}, ∀X ∈ L.
In this definition, the robust choice function is evaluated over the worst-case shortfall φ(X) − φ(Y) with respect to the benchmark. The next proposition shows how to fit the benchmark into our earlier framework, by defining new preference ambiguity sets

R̂(E) := {φ ∈ R_QCo : [
Eli], [Lip], φ(Y) ∈ R}, where φ(Y) is allowed to take any finite real value, and

R(E; Y) := {φ ∈ R_QCo : [
Eli], [Lip], φ(Y) = 0}, where φ is normalized at Y to satisfy φ(Y) = 0.

Proposition 3.3. ψ_{R̂(E)}(X; Y) = ψ_{R(E;Y)}(X) for all X ∈ L.

In summary, whenever we have a benchmark Y, we can just take Y to be the normalizing prospect (that we earlier denoted as W) and set φ(Y) = 0. All other considerations of our upcoming development are the same.

In order to solve Problem (PRO), we need to be able to evaluate ψ_{R(E)}(G(z)) for given z ∈ Z. Evaluation of ψ_{R(E)}(G(z)) can be decomposed into a first stage "value problem" (which does not depend on G(z), but which does depend on the preference ambiguity set R(E)) and a second stage "interpolation problem" (which does depend on G(z), and also depends on the optimal solution of the value problem).

The "value problem" is

P := inf_{φ∈R(E)} Σ_{θ∈Θ} φ(θ),

and it searches over choice functions φ ∈ R(E) to minimize the sum of the values of φ on Θ. Then, given any set of values v = (v_θ)_{θ∈Θ} for the choice function on the prospects in Θ, the "interpolation problem" is

P(X; v) := inf_{φ∈R_QCo} {φ(X) : [Lip], φ(θ) ≥ v_θ, ∀θ ∈ Θ},

and it finds the smallest possible value for φ(X) among functions φ ∈ R_QCo that dominate the values v on Θ. In the next result and throughout, we let val(·) denote the optimal value of an optimization problem.

Theorem 3.4. Problem P has a unique optimal solution φ* ∈ R_QCo. Furthermore, for any X ∈ L, ψ_{R(E)}(X) = val(P(X; v*)), where v*_θ = φ*(θ) = ψ_{R(E)}(θ) for all θ ∈ Θ.

For the rest of the paper, we adopt the following notation for the values of ψ_{R(E)} on Θ to emphasize that they are also the optimal solution of the value problem P:
• v* = (v*_θ)_{θ∈Θ} where v*_θ := ψ_{R(E)}(θ) for all θ ∈ Θ.

The point of the above theorem is that we need to solve Problem P first to obtain v*.
This result is helpful because it means that the harder value problem only has to be solved once for a given preference ambiguity set R(E). Then, we can solve the easier interpolation problem as needed. The reasoning behind our two-stage decomposition can also be applied to the PRO models in Armbruster and Delage (2015); Delage and Li (2017).

In this section we explain how to solve the value problem P. We begin with the following convenient assumption to get a finite-dimensional reformulation of Problem P.

Assumption 4.1.
Ω = {ω_1, ω_2, . . . , ω_T} (i.e., the underlying sample space is finite) and P(ω) > 0 for all ω ∈ Ω (i.e., all scenarios have positive probability).

Under Assumption 4.1, we may identify a prospect X ∈ L with the vector of its realizations X⃗ = (X(ω))_{ω∈Ω}. This convention first appeared in Delage and Li (2017) in their work on PRO for convex risk measures. We sometimes refer to X⃗ as a "long vector" because it is essentially a list of the realizations of X for all scenarios ω ∈ Ω. In this way, there is a one-to-one correspondence between prospects in L and long vectors in R^{TN}. In this view, we will equate X⃗ ≡ X and ψ_{R(E)}(X⃗) ≡ ψ_{R(E)}(X) for all X ∈ L. In addition, by the second part of Assumption 4.1 we know that all the prospects in L (including Θ) are finite-valued. The second part of Assumption 4.1 is without loss of generality, since we can just discard any scenarios with probability zero.

We now explain how to formulate the value problem P as a finite-dimensional optimization problem under Assumption 4.1. Define Ê := {(θ, θ′) ∈ Θ × Θ : θ ≠ θ′} to be the set of all edges in the set Θ (to be used to enforce quasi-concavity). The decision variables in our upcoming reformulation of Problem P are:
• v = (v_θ)_{θ∈Θ}, where v_θ ∈ R corresponds to the value φ(θ) for all θ ∈ Θ;
• v_W = φ(W⃗) = 0 corresponding to the value of the choice function at the normalizing prospect W⃗;
• s = (s_θ)_{θ∈Θ}, where s_θ ∈ R^{TN} corresponds to an upper subgradient Z⃗ → max{v_θ + ⟨s_θ, Z⃗ − θ⟩, v_θ} = v_θ + max{⟨s_θ, Z⃗ − θ⟩, 0} of the function φ at θ for all θ ∈ Θ.

The value problem P is then equivalent to the disjunctive programming problem:

P ≡ min_{v,s} Σ_{θ∈Θ} v_θ   (1a)
s.t.
$v_\theta + \max\{\langle s_\theta, \theta' - \theta\rangle, 0\} \ge v_{\theta'}$, $\forall (\theta, \theta') \in \widehat{\mathcal{E}}$, (1b)
$s_\theta \ge 0$, $\|s_\theta\| \le L$, $\forall \theta \in \Theta$, (1c)
$v_\theta \ge v_{\theta'}$, $\forall (\theta, \theta') \in \mathcal{E}$, (1d)
$v_W = 0$. (1e)
We refer to the disjunctive support functions in constraints (1b) as "hockey-stick" type functions in recognition of their shape. In constraint (1c), we are using the fact that a quasi-concave function is $L$-Lipschitz if and only if there exists an upper subgradient whose dual norm is bounded by $L$ (see Lemma B.1). We continue to let $v^* = (v^*_\theta)_{\theta \in \Theta}$ denote the optimal solution of Problem (1), which we know coincides with the values of our robust choice function $\psi_{\mathcal{R}(\mathcal{E})}$ on $\Theta$.
The interpolation problem then takes (any) values $v = (v_\theta)_{\theta \in \Theta}$ as input, and computes the value $\phi(\vec{X})$ at a specific $\vec{X} \in \mathcal{L}$ by minimizing over all hockey-stick support functions that dominate the values $v = (v_\theta)_{\theta \in \Theta}$ on $\Theta$. We introduce the following decision variables for the interpolation problem:
• $v_X \in \mathbb{R}$, corresponding to the value $\phi(\vec{X})$;
• $a \in \mathbb{R}^{TN}$, corresponding to an upper subgradient $\vec{Z} \mapsto v_X + \max\{\langle a, \vec{Z} - \vec{X}\rangle, 0\}$ of $\phi$ at $\vec{X}$.
The interpolation problem $P(\vec{X}; v)$ is then equivalent to the disjunctive programming problem:
$P(\vec{X}; v) \equiv \min_{v_X, a} v_X$ (2a)
s.t. $v_X + \max\{\langle a, \theta - \vec{X}\rangle, 0\} \ge v_\theta$, $\forall \theta \in \Theta$, (2b)
$a \ge 0$, $\|a\| \le L$. (2c)
We note that both Problems (1) and (2) are non-convex due to the disjunctive constraints. The next theorem verifies the correctness of reformulations (1) and (2), and it connects them in terms of the robust choice function $\psi_{\mathcal{R}(\mathcal{E})}$. Theorem 4.2.
Suppose Assumption 4.1 holds. Then, Problem (1) has a unique optimal solution $v^* = (v^*_\theta)_{\theta \in \Theta}$ where $v^*_\theta = \psi_{\mathcal{R}(\mathcal{E})}(\theta)$ for all $\theta \in \Theta$. Furthermore, for any $\vec{X} \in \mathcal{L}$, $\psi_{\mathcal{R}(\mathcal{E})}(\vec{X}) = \mathrm{val}(P(\vec{X}; v^*))$.

To solve these disjunctive programming problems, one can turn to a mixed-integer linear programming (MILP) reformulation (e.g. Big-M or convex hull). This introduces one binary variable per disjunctive constraint: on the order of $J^2$ binary variables in the MILP reformulation of Problem $P$ (one per edge in $\widehat{\mathcal{E}}$), and $J$ binary variables in the MILP reformulation of Problem $P(\vec{X}; v)$. This leads to computational intractability when the number of elicited pairwise comparisons is large.

As an alternative to an MILP reformulation, we can solve Problem $P$ more efficiently via a sequence of LPs. The robust choice function $\psi_{\mathcal{R}(\mathcal{E})}$ naturally induces an ordering over the prospects in $\Theta$, ranked in descending order according to the value of $\psi_{\mathcal{R}(\mathcal{E})}$. The key idea behind our algorithm is that we can do this sorting recursively, one prospect at a time. We start with the definition of a decomposition that preserves this ordering (recall that we denote the values of the robust choice function on $\Theta$ by $v^*_\theta = \psi_{\mathcal{R}(\mathcal{E})}(\theta)$ for all $\theta \in \Theta$).
Definition 4.1 (A decomposition of $\Theta$). Let each prospect $\theta \in \Theta$ be paired with its $v^*_\theta$-value to form the pair $(\theta, v^*_\theta)$. We call an ordered list $\mathcal{D} := \{(\theta, v^*_\theta)\}_{\theta \in \Theta'}$ of such pairs a decomposition of $\Theta$ if: (i) $\Theta'$ is a permutation of $\Theta$; and (ii) $v^*_\theta \ge v^*_{\theta'}$ for all $\theta, \theta' \in \Theta$ such that $\theta$ precedes $\theta'$ in $\Theta'$.

A decomposition is a permutation of $\{(\theta, v^*_\theta)\}_{\theta \in \Theta}$ that is arranged in descending order according to the value of $\psi_{\mathcal{R}(\mathcal{E})}$. Let $\mathcal{D}_t$ denote the first $t$ elements of $\mathcal{D}$, for $t = 1, 2, \dots, J$, where necessarily $\mathcal{D}_J = \mathcal{D}$. We treat $\{\mathcal{D}_t\}_{t=1}^J$ as ordered lists (rather than sets) because we are concerned with the order of their elements. For ordered lists $a$ and $b$, we let $\{a, b\}$ denote the ordered list obtained by concatenating $a$ and $b$. With a slight abuse of notation, we say that $\theta \in \mathcal{D}_t$ if $(\theta, v^*_\theta) \in \mathcal{D}_t$. We also define $v_t := \min\{v^*_\theta \mid \theta \in \mathcal{D}_t\}$ to be the smallest value of $\psi_{\mathcal{R}(\mathcal{E})}$ among the prospects in $\mathcal{D}_t$.

Since the decomposition is an ordered list, it defines an induced preference relation on $\Theta$, which coincides with the one induced by $\psi_{\mathcal{R}(\mathcal{E})}$. By the definition of decomposition, it is immediate that the preference relation recovers the contours (indifference curves) of $\psi_{\mathcal{R}(\mathcal{E})}$. That is, prospects with the same $\psi_{\mathcal{R}(\mathcal{E})}$-value are seated together in $\mathcal{D}$.

Definition 4.2 (Preference relation induced by the decomposition). We say $\theta$ precedes (resp., succeeds) $\theta'$ if $\theta$ precedes $\theta'$ in $\mathcal{D}$ (resp., $\theta$ succeeds $\theta'$ in $\mathcal{D}$). For all $t = 1, 2, \dots, J - 1$, we say the prospect $\theta$ is the succeeding prospect of $\mathcal{D}_t$ if $\theta$ is the last element of $\mathcal{D}_{t+1}$, i.e., $\theta \notin \mathcal{D}_t$ but $\theta \in \mathcal{D}_{t+1}$.

We will construct a decomposition $\mathcal{D}$ in the following way. We always start with $\mathcal{D}_1 = \{(\vec{W}, 0)\}$, since the normalizing prospect is canonically the most preferred, with value $v^*_W = 0$. Then, given $\mathcal{D}_t$ for $t = 1, 2, \dots, J - 1$, we want to identify the succeeding prospect of $\mathcal{D}_t$ to form $\mathcal{D}_{t+1}$. Our procedure is based on the idea that, for some $\theta \notin \mathcal{D}_t$, we only need the information in $\mathcal{D}_t$ to compute $\psi_{\mathcal{R}(\mathcal{E})}(\theta)$. In other words, to determine $\psi_{\mathcal{R}(\mathcal{E})}(\theta)$ for the succeeding prospects of $\mathcal{D}_t$, we only need $\mathcal{D}_t$ (and not the values of any other prospects).

Figure 1: The sorting procedure. $\theta$ (in green) is chosen, and appended to $\mathcal{D}_t$ (in blue) to form $\mathcal{D}_{t+1}$.

Given $\mathcal{D}_t$, we want to predict the value of $\psi_{\mathcal{R}(\mathcal{E})}(\theta)$ for $\theta \notin \mathcal{D}_t$ based only on the information in $\mathcal{D}_t$. For this $\theta$, we define the following LP:
$P(\theta; \mathcal{D}_t) := \min_{v_\theta, s_\theta} v_\theta$ (3a)
s.t. $v_\theta + \langle s_\theta, \theta' - \theta\rangle \ge v^*_{\theta'}$, $\forall \theta' \in \mathcal{D}_t$, (3b)
$s_\theta \ge 0$, $\|s_\theta\| \le L$, (3c)
$v_\theta \ge v^*_{\theta'}$, $\forall (\theta, \theta') \in \mathcal{E}$, $\theta' \in \mathcal{D}_t$. (3d)
Problem $P(\theta; \mathcal{D}_t)$ is related to the interpolation Problem (2), except that it only has linear support function constraints for prospects in $\mathcal{D}_t$ and it incorporates preference elicitation information. By definition of a decomposition, we must have $\psi_{\mathcal{R}(\mathcal{E})}(\theta) \le v_t$ for all $\theta \notin \mathcal{D}_t$. There are thus two cases to consider for the outcome of Problem $P(\theta; \mathcal{D}_t)$:
• Case one: $\psi_{\mathcal{R}(\mathcal{E})}(\theta) < v_t$ (see Figure 2a). In this case, Problem $P(\theta; \mathcal{D}_t)$ computes the lowest attainable value of $\psi_{\mathcal{R}(\mathcal{E})}(\theta)$ given $\mathcal{D}_t$.
• Case two: $\psi_{\mathcal{R}(\mathcal{E})}(\theta) = v_t$ (see Figure 2b). In this case, the value of $\psi_{\mathcal{R}(\mathcal{E})}(\theta)$ should just be $v_t$: if $\mathrm{val}(P(\theta; \mathcal{D}_t)) \ge v_t$, then $\psi_{\mathcal{R}(\mathcal{E})}(\theta) < v_t$ cannot hold, so $\psi_{\mathcal{R}(\mathcal{E})}(\theta) \ge v_t$; since $\theta \notin \mathcal{D}_t$, we must then have equality $v_t = \psi_{\mathcal{R}(\mathcal{E})}(\theta)$.
Based on these two cases, we define the function $\pi(\theta; \mathcal{D}_t) := \min\{v_t, \mathrm{val}(P(\theta; \mathcal{D}_t))\}$ to be the predictor of the value $\psi_{\mathcal{R}(\mathcal{E})}(\theta)$ for $\theta \notin \mathcal{D}_t$. We will identify the succeeding prospect of $\mathcal{D}_t$ from the ones that have the largest predicted values.
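The predictor LP above is small and can be set up directly with an off-the-shelf solver. The following is a minimal single-attribute sketch, assuming SciPy is available; it fixes the Lipschitz norm to the sup-norm (so $\|s\| \le L$ becomes the box constraint $0 \le s \le L$, while the paper leaves the norm general), and folds the elicitation constraints (3d) into a lower bound `elicited_lb` on $v_\theta$. The function name and argument layout are illustrative, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import linprog

def predictor(theta, D, L, elicited_lb=None):
    """Sketch of the predictor pi(theta; D_t) built from Problem (3).

    D is the sorted prefix: a list of (theta_prime, v_star) pairs.
    Decision vector x = (v, s_1, ..., s_d); we minimize v subject to
    (3b):  v + <s, theta' - theta> >= v*_{theta'}  for theta' in D_t,
    with 0 <= s <= L standing in for (3c) under the sup-norm."""
    theta = np.asarray(theta, dtype=float)
    d = theta.size
    c = np.zeros(1 + d)
    c[0] = 1.0  # objective: minimize v
    # rewrite (3b) as  -v - <s, theta' - theta> <= -v*_{theta'}
    A_ub = np.array([np.concatenate(([-1.0], theta - np.asarray(tp, dtype=float)))
                     for tp, _ in D])
    b_ub = np.array([-vp for _, vp in D])
    bounds = [(elicited_lb, None)] + [(0.0, L)] * d  # v free (or bounded), s in [0, L]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    v_t = min(vp for _, vp in D)  # smallest value in D_t
    return min(v_t, res.fun)      # pi(theta; D_t) := min{v_t, val(P(theta; D_t))}
```

For example, with $\mathcal{D}_t = \{(\vec{W}, 0)\}$, $\vec{W} = (1.0)$, and $L = 1$, the prospect $\theta = (0.5)$ gets predicted value $-0.5$, the lowest value compatible with the Lipschitz bound.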
As we will show in the upcoming theorem, the prospect with the largest predicted value has the largest $\psi_{\mathcal{R}(\mathcal{E})}$-value among prospects not in $\mathcal{D}_t$. Hence this prospect, along with its predicted value, can be appended to the ordered list $\mathcal{D}_t$ to form $\mathcal{D}_{t+1}$. After that, we are ready to find the succeeding prospect of $\mathcal{D}_{t+1}$ and continue the procedure inductively.

Figure 2: Two cases corresponding to the succeeding prospect $\theta$ of $\mathcal{D}_t$: (a) $\psi_{\mathcal{R}(\mathcal{E})}(\theta) < \psi_{\mathcal{R}(\mathcal{E})}(\vartheta)$; (b) $\psi_{\mathcal{R}(\mathcal{E})}(\theta) = \psi_{\mathcal{R}(\mathcal{E})}(\vartheta)$.

We now present our sorting algorithm for the value problem. Each iteration $t = 1, 2, \dots, J - 1$ proceeds as follows: given $\mathcal{D}_t$, we compute the predictor $\pi(\theta; \mathcal{D}_t)$ for every $\theta \notin \mathcal{D}_t$ (which is done by solving an LP). Then, one of the prospects not in $\mathcal{D}_t$ that maximizes $\pi(\theta; \mathcal{D}_t)$, as well as its predicted value, is chosen to append to $\mathcal{D}_t$. The algorithm terminates with $\mathcal{D} = \mathcal{D}_J$. Algorithm 1:
Sorting algorithm for the value problem
Result:
A decomposition $\mathcal{D}$ of $\Theta$.
Initialization: $\Theta$, $t = 1$, and $\mathcal{D}_t = \{(\vec{W}, 0)\}$;
while $t < J$ do
  Choose $\theta^* \in \arg\max_{\theta \notin \mathcal{D}_t} \pi(\theta; \mathcal{D}_t)$, and set $u_{\theta^*} := \pi(\theta^*; \mathcal{D}_t)$;
  Set $\mathcal{D}_{t+1} := \{\mathcal{D}_t, (\theta^*, u_{\theta^*})\}$;
  Set $t := t + 1$;
end
return $\mathcal{D} := \mathcal{D}_J$.
The next theorem verifies that Algorithm 1 returns the optimal solution of Problem $P$.

Theorem 4.3. Algorithm 1 finds a decomposition $\mathcal{D}$ of $\Theta$ and computes $v^* = (v^*_\theta)_{\theta \in \Theta}$, after solving $O(J^2)$ linear programs.

Proof sketch. Step one: (Construction of candidate solution) We confirm that the construction of a candidate solution to Problem $P$ by Algorithm 1 is well-defined. That is, the predicted value $\pi(\theta; \mathcal{D}_t)$ is always finite for all $\theta$, and thus we can find a maximizer $\theta^*$. The algorithm selects one prospect at a time and eventually exhausts all prospects in $\Theta$ (see Lemma B.5, Lemma B.6, and Lemma B.9).
Step two: (Lower bound) We verify that this candidate solution gives a lower bound on the optimal solution of Problem $P$ (see Lemma B.7).
Step three: (Feasibility) We verify that this candidate solution is also feasible for Problem $P$ (see Lemma B.8). Thus, it must be the unique optimal solution, completing the proof.

The MILP reformulations of Problem $P$ may require exponentially many LPs to be solved in the worst case, since these reformulations introduce one binary variable per disjunctive constraint. By comparison, our sorting algorithm has better theoretical complexity and requires at most $O(J^2)$ LPs to be solved: at each of the $J - 1$ iterations, one LP is solved for each prospect not yet in $\mathcal{D}_t$. In our experiments, we will see that, in line with Theorem 4.3, the computational complexity of our sorting algorithm grows much more slowly in the size of the ECDS $\mathcal{E}$ compared to the convex hull reformulation (see Table 1 and Table 2). Remark 4.4.
The correctness of our algorithm relies heavily on an intrinsic feature of the hockey-stick support functions: any hockey-stick support function of $\psi_{\mathcal{R}(\mathcal{E})}$ at $\theta$ will, by definition, always dominate any $\theta'$ with a smaller $\psi_{\mathcal{R}(\mathcal{E})}$-value. Thus, we do not need to know the exact values of less preferred prospects and can effectively just ignore them. This is not possible in the concave case, which requires affine support functions: there, we cannot ignore the values of less preferred prospects when minimizing over the class of support functions.

In this section we characterize the acceptance sets of $\psi_{\mathcal{R}(\mathcal{E})}$. These acceptance sets will play a major role in our algorithm for Problem (PRO) in the next section. For this discussion, we suppose that a decomposition $\mathcal{D}$ of $\Theta$ has already been computed via Algorithm 1. We also artificially define $\theta_{J+1} := -\infty$ to be a constant prospect with value $\psi_{\mathcal{R}(\mathcal{E})}(\theta_{J+1}) := -\infty$, and we set $\mathcal{D}_{J+1} := \{\mathcal{D}_J, (\theta_{J+1}, -\infty)\}$ (this convention is to make sure our acceptance sets are well-defined for all levels). Since the normalizing prospect $\vec{W}$ is the most preferred and $\psi_{\mathcal{R}(\mathcal{E})}(\vec{W}) = 0$, we will always have $\psi_{\mathcal{R}(\mathcal{E})}(\vec{X}) \le 0$ for all $X \in \mathcal{L}$. We then denote the acceptance sets of $\psi_{\mathcal{R}(\mathcal{E})}$ as:
$\mathcal{A}_v := \{\vec{X} \in \mathcal{L} : \psi_{\mathcal{R}(\mathcal{E})}(\vec{X}) \ge v\}$, $\forall v \le 0$.
We will compute the explicit form of the acceptance sets by using the dual of the interpolation problem $P(\vec{X}; \mathcal{D}_t)$ for $t = 1, 2, \dots, J$ and $\vec{X} \in \mathcal{L}$.
Recall that $v_t$ is the smallest $\psi_{\mathcal{R}(\mathcal{E})}$-value of the prospects in $\mathcal{D}_t$. We adopt the following convention for selecting the level sets of $\psi_{\mathcal{R}(\mathcal{E})}$ (as there will be duplicates in $v^*$ for prospects on the same contours):

Definition 5.1 (Level selection). Given level $v \le 0$, define $\kappa(v) := \{t = 1, 2, \dots, J + 1 \mid v_{t+1} < v \le v_t\}$. The selection $t = \kappa(v)$ is unique under this convention.
The next proposition shows that we can check if $\psi_{\mathcal{R}(\mathcal{E})}(\vec{X}) \ge v$ (equivalently, if $\vec{X} \in \mathcal{A}_v$) by solving Problem $P(\vec{X}; \mathcal{D}_t)$ for $t = \kappa(v)$.

Proposition 5.1. For level $v \le 0$ and $t = \kappa(v)$, $\psi_{\mathcal{R}(\mathcal{E})}(\vec{X}) \ge v$ if and only if $\mathrm{val}(P(\vec{X}; \mathcal{D}_t)) \ge v$.

Now, the interpolation problem $P(\vec{X}; \mathcal{D}_t)$ is an LP, and its dual is:
$D(\vec{X}; \mathcal{D}_t) := \max_{p \in \mathbb{R}^t, q \in \mathbb{R}} \sum_{\theta \in \mathcal{D}_t} v^*_\theta \cdot p_\theta - Lq$ (4a)
s.t. $\sum_{\theta \in \mathcal{D}_t} \theta \cdot p_\theta - \vec{X} \le q$, (4b)
$\sum_{\theta \in \mathcal{D}_t} p_\theta = 1$, $p \ge 0$, $q \ge 0$. (4c)
We interpret constraint (4b) in the component-wise sense (i.e., $\sum_{\theta \in \mathcal{D}_t} \theta(\omega) \cdot p_\theta - \vec{X}(\omega) \le q$ for all $\omega \in \Omega$). We have $\mathrm{val}(P(\vec{X}; \mathcal{D}_t)) = \mathrm{val}(D(\vec{X}; \mathcal{D}_t))$ for all $\vec{X} \in \mathcal{L}$ by strong duality, since the primal optimal value is always finite.
We can use Problem $D(\vec{X}; \mathcal{D}_t)$ to get an explicit characterization of the acceptance sets of $\psi_{\mathcal{R}(\mathcal{E})}$. We first define the following translations of the prospects in $\Theta$:
$\tilde{\theta} := \theta - v^*_\theta / L$, $\forall \theta \in \Theta$,
where we interpret subtraction of the scalar $v^*_\theta / L$ from the vector $\theta$ in the component-wise sense. In other words, $\tilde{\theta}$ is a translation of $\theta$ by a function of $v^*_\theta$. The next theorem gives the explicit form of the acceptance sets of $\psi_{\mathcal{R}(\mathcal{E})}$. Theorem 5.2.
For all $v \le 0$,
$$\mathcal{A}_v = \Big\{\vec{X} \in \mathcal{L} \;\Big|\; \vec{X} \ge \sum_{\theta \in \mathcal{D}_{\kappa(v)}} \tilde{\theta} \cdot p_\theta + v/L \ \text{ for some } \sum_{\theta \in \mathcal{D}_{\kappa(v)}} p_\theta = 1,\ p \ge 0\Big\}.$$
In view of $\tilde{\theta}$ as a translation of $\theta$, we can interpret $\mathcal{A}_v$ as the smallest monotone polyhedron that contains the convex hull of all translations $\tilde{\theta}$ for which $v^*_\theta \ge v$.
This construction of acceptance sets is related to Delage and Li (2017) (which covers PRO for convex risk measures that are translation invariant). The difference in our setting is due to the changing structure of $\mathcal{A}_v$ for different levels $v \le 0$. In line with this observation, the next theorem gives a representation for $\psi_{\mathcal{R}(\mathcal{E})}$ in terms of a family of convex risk measures. This result also connects to Theorem 1 in Brown et al. (2012), which characterizes aspirational measures in a similar light. For this representation, we define the constants:
$$c_t := -\inf\Big\{m \in \mathbb{R} \;\Big|\; m \ge \sum_{\theta \in \mathcal{D}_t} \tilde{\theta} \cdot p_\theta,\ \sum_{\theta \in \mathcal{D}_t} p_\theta = 1,\ p \ge 0\Big\}, \quad t = 1, 2, \dots, J,$$
and the coherent risk measures:
$$\mu_t(\vec{X}) := \inf\big\{m \in \mathbb{R} \;\big|\; \vec{X} + m \in \mathcal{A}_{\mu_t}\big\}, \quad t = 1, 2, \dots, J,$$
with acceptance sets
$$\mathcal{A}_{\mu_t} := \Big\{\vec{X} \in \mathcal{L} \;\Big|\; \vec{X} \ge \sum_{\theta \in \mathcal{D}_t} \tilde{\theta} \cdot p_\theta + c_t,\ \sum_{\theta \in \mathcal{D}_t} p_\theta = 1,\ p \ge 0\Big\}, \quad t = 1, 2, \dots, J.$$
We also define a "target function" $\tau(v) := v/L - c_{\kappa(v)}$ for all $v \le 0$.

Theorem 5.3. For all $\vec{X} \in \mathcal{L}$, we have $\psi_{\mathcal{R}(\mathcal{E})}(\vec{X}) = \sup\big\{v \le 0 \;\big|\; \mu_{\kappa(v)}(\vec{X} - \tau(v)) \le 0\big\}$.

Consider the sets
$$\mathcal{B}_{\mu_t} := \Big\{\vec{X} \in \mathcal{L} \;\Big|\; \vec{X} \ge \sum_{\theta \in \mathcal{D}_t} \tilde{\theta} \cdot p_\theta,\ \sum_{\theta \in \mathcal{D}_t} p_\theta = 1,\ p \ge 0\Big\}, \quad t = 1, 2, \dots, J.$$
Following Theorem 5.3, we note that $c_t$ is non-decreasing in $t$ since $\{\mathcal{B}_{\mu_t}\}_{t=1}^J$ is non-decreasing (i.e., $\mathcal{B}_{\mu_t} \subset \mathcal{B}_{\mu_{t'}}$ for $t \le t'$). It follows that the target function $\tau(v) = v/L - c_{\kappa(v)}$ is also non-decreasing in $v$, which aligns with Brown et al. (2012).

Under Assumption 4.1, we may define the stochastic function $\vec{G}(z) = (G(z, \omega))_{\omega \in \Omega}$ for all $z \in \mathcal{Z}$, which is the long vector for our original stochastic function $G : \mathcal{Z} \to \mathcal{L}$. With the acceptance sets of $\psi_{\mathcal{R}(\mathcal{E})}$ in hand, we can solve Problem (PRO) by doing binary search over the levels of the acceptance sets. Given level $v \le 0$, we want to find some $z \in \mathcal{Z}$ such that $\vec{G}(z) \in \mathcal{A}_v$. If we can find such a $z$, then we can next search at a higher level; otherwise, we next search at a lower level.
The proposition below is a consequence of Theorem 5.2; it shows that the search for $z \in \mathcal{Z}$ with $\vec{G}(z) \in \mathcal{A}_v$ can be posed as a convex feasibility problem in $(z, p)$:
$$\mathcal{F}_v(\mathcal{D}_t) := \Big\{(z, p) \;\Big|\; \vec{G}(z) \ge \sum_{\theta \in \mathcal{D}_t} \tilde{\theta} \cdot p_\theta + v/L,\ \sum_{\theta \in \mathcal{D}_t} p_\theta = 1,\ z \in \mathcal{Z},\ p \ge 0\Big\}.$$
We interpret the first inequality of $\mathcal{F}_v(\mathcal{D}_t)$ in the component-wise sense. Proposition 6.1.
Choose level $v \le 0$ and $t = \kappa(v)$. Then, there exists $z \in \mathcal{Z}$ such that $\vec{G}(z) \in \mathcal{A}_v$ if and only if $\mathcal{F}_v(\mathcal{D}_t)$ has a solution.

The following problem builds on this idea; it finds the largest value $v \le 0$ for which $\mathcal{F}_v(\mathcal{D}_t)$ has a solution:
$G(\mathcal{D}_t) := \max_{z, p, v} v$ (5a)
s.t. $\vec{G}(z) \ge \sum_{\theta \in \mathcal{D}_t} \tilde{\theta} \cdot p_\theta + v/L$, (5b)
$\sum_{\theta \in \mathcal{D}_t} p_\theta = 1$, (5c)
$z \in \mathcal{Z}$, $p \ge 0$. (5d)
We show in the following proposition that we can bound the optimal value of Problem (PRO) using Problem $G(\mathcal{D}_t)$ for $t = \kappa(v)$. Proposition 6.2.
Choose level $v \le 0$ and $t = \kappa(v)$. Then, $\max_{z \in \mathcal{Z}} \psi_{\mathcal{R}(\mathcal{E})}(\vec{G}(z)) \ge v$ if and only if $\mathrm{val}(G(\mathcal{D}_t)) \ge v$.
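Proposition 6.2 turns Problem (PRO) into a sequence of monotone yes/no questions: "is level $v$ attainable?" The search structure itself can be sketched independently of the convex solver. In the sketch below, `is_attainable(v)` is an abstract callback standing in for solving the convex program $G(\mathcal{D}_{\kappa(v)})$ and comparing its value to $v$; the function name and the assumption that the smallest level is attainable are illustrative.

```python
def pro_binary_search(levels, is_attainable):
    """Binary search over the distinct sorted levels v*_[0] = 0 > ... > v*_[H].

    is_attainable(v) should return True iff max_z psi(G(z)) >= v, which by
    Proposition 6.2 is decided by solving the convex program G(D_t) for
    t = kappa(v); here it is an abstract callback.  Returns the index of
    the largest attainable level using O(log H) oracle calls, assuming
    the smallest level levels[-1] is attainable."""
    lo, hi = 0, len(levels) - 1
    if is_attainable(levels[lo]):
        return lo  # even the top level 0 is attainable
    # invariant: levels[lo] is not attainable, levels[hi] is attainable
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_attainable(levels[mid]):
            hi = mid
        else:
            lo = mid
    return hi
```

For instance, with levels `[0.0, -1.0, -2.0, -3.0]` and an oracle that accepts exactly the levels at or below `-1.5`, the search returns index `2` (level `-2.0`) after a logarithmic number of oracle calls, which is the bound behind Theorem 6.4 below.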
We can also rephrase the statement of Proposition 6.2 in a more specific way to better suit our algorithm.
Corollary 6.3.
For any $t = 0, 1, \dots, J$, $\max_{z \in \mathcal{Z}} \psi_{\mathcal{R}(\mathcal{E})}(\vec{G}(z)) > v_{t+1}$ if and only if $\mathrm{val}(G(\mathcal{D}_t)) > v_{t+1}$.

We now present the details of our binary search algorithm for Problem (PRO). Let $\{v^*_{[0]}, \dots, v^*_{[H]}\}$, where $v^*_{[0]} = 0$ and $H \le J$, denote the unique values of $v^*$ sorted in descending order. Algorithm 2:
Binary search for Problem (PRO)
Result:
Returns an optimal solution of Problem (PRO).
Initialization: $\overline{h} = H$, $\underline{h} = 0$;
while $\overline{h} \neq \underline{h} + 1$ do
  Set $h := \lceil(\overline{h} + \underline{h})/2\rceil$ and $t := \kappa(v^*_{[h]})$;
  Compute $\mathrm{val}(G(\mathcal{D}_t))$ with optimal solution $z^*$;
  if $\mathrm{val}(G(\mathcal{D}_t)) > v_{t+1}$ then set $\overline{h} := h$; else set $\underline{h} := h$;
end
Set $h := \lceil(\overline{h} + \underline{h})/2\rceil$ and $t := \kappa(v^*_{[h]})$;
Compute $\mathrm{val}(G(\mathcal{D}_t))$ with optimal solution $z^*$;
return $z^*$ and $\psi_{\mathcal{R}(\mathcal{E})}(G(z^*)) = \min\{\mathrm{val}(G(\mathcal{D}_t)), v_t\}$. Theorem 6.4.
Algorithm 2 returns an optimal solution $z^*$ of Problem (PRO), after solving $O(\log H)$ instances of Problem $G(\mathcal{D}_t)$.

The computational complexity $O(\log H)$ in Theorem 6.4 follows from the standard complexity of binary search itself (since we have $H$ alternatives to check). In line with this theorem, our numerical experiments show that the running time of Algorithm 2 is not really sensitive to $H$. It is more sensitive to the problem dimension, yet still scalable (see Table 4).
As an alternative to Problem (PRO), suppose we want to maximize $f(z)$ subject to a dominance constraint on $G(z)$ with respect to a benchmark prospect $Y$:
$$\max_{z \in \mathcal{Z}} \Big\{f(z) : \inf_{\phi \in \widehat{\mathcal{R}}(\mathcal{E})} \{\phi(G(z)) - \phi(Y)\} \ge 0\Big\}. \quad (6)$$
We may then take $Y$ to be the normalizing prospect to obtain the corresponding robust choice function $\psi_{\mathcal{R}(\mathcal{E}; Y)}$. Since $\psi_{\mathcal{R}(\mathcal{E}; Y)}(\vec{Y}) = 0$, Theorem 5.2 applies to show that Problem (6) is equivalent to:
$$\max_{z \in \mathcal{Z}} \big\{f(z) : (z, p) \in \mathcal{F}_0(\mathcal{D}_{\kappa(0)})\big\}.$$
The above display is just a convex optimization problem when $f$ is concave, since $\mathcal{F}_0(\mathcal{D}_{\kappa(0)})$ is a convex set.

We explain in this section how to incorporate law invariance into our framework. The property of law invariance says that the decision maker only cares about the distribution of prospects on $\mathbb{R}^N$ (i.e., the push-forward distribution $P \circ X^{-1}$ on $\mathbb{R}^N$ for any $X \in \mathcal{L}$), and not on the underlying construction of prospects on $\Omega$.
Let $X =_D Y$ denote equality in distribution between prospects $X, Y \in \mathcal{L}$. We are now concerned with choice functions that satisfy the following additional property:
• [Law] (Law invariance) $\phi(X) = \phi(Y)$ for all $X, Y \in \mathcal{L}$ such that $X =_D Y$.
Some choice functions $\phi \in \mathcal{R}_{QCo}$ are intrinsically law invariant (e.g. the certainty equivalent $\phi_{CE}$), but this property is not at all automatically satisfied. We introduce the following assumption to be able to encode law invariance into our optimization problems.

Assumption 7.1 (Uniform distribution). The probability measure $P$ is uniform.
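Under Assumption 7.1, the law of a long vector is simply its empirical distribution, so any functional that depends on a prospect only through its sorted realizations is law invariant. A minimal single-attribute illustration (the function `phi` here is a toy CVaR-style score, not the paper's choice function):

```python
import itertools
import numpy as np

def phi(long_vec):
    """Toy law-invariant choice function under uniform P: the average of
    the two worst realizations (a CVaR-style score).  It depends on the
    long vector only through its sorted entries, so every permutation of
    the scenarios yields the same value."""
    r = np.sort(np.asarray(long_vec, dtype=float))
    return float(r[:2].mean())

# phi(sigma(X)) agrees for every permutation sigma in Sigma
X = np.array([3.0, -1.0, 2.0, 0.5])  # T = 4 scenarios, N = 1 attribute
vals = {phi(X[list(p)]) for p in itertools.permutations(range(len(X)))}
assert len(vals) == 1
```

This is exactly the structure exploited in the next paragraphs: encoding [Law] amounts to requiring equality of the choice function across all $T!$ scenario permutations.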
When $P$ is uniform, we can encode law invariance in terms of all the permutations of the sample space $\Omega$. It can be shown that this assumption is without loss of generality if $P$ takes rational values on $\Omega$ (see Delage and Li (2017)). Specifically, when $P$ is uniform we can just take the underlying sample space to be sufficiently large to recover any desired rational probability.
Let $\sigma$ denote a permutation of scenarios of $\Omega = \{\omega_1, \omega_2, \dots, \omega_T\}$, i.e., we permute the scenarios as $\sigma(\Omega) = \{\omega_{\sigma(1)}, \omega_{\sigma(2)}, \dots, \omega_{\sigma(T)}\}$, and let $\Sigma$ denote the set of all permutations on $\Omega$. We recall our original definition of the long vector, $\vec{X} = (X(\omega_t))_{t=1}^T$, where the realizations are listed in the same order as $\Omega$. Corresponding to the permutation $\sigma \in \Sigma$, we define the permuted long vector $\sigma(\vec{X}) = (X(\omega_{\sigma(t)}))_{t=1}^T$, where the realizations are listed in the same order as $\sigma(\Omega)$. A law invariant choice function $\phi \in \mathcal{R}_{QCo}$ must then satisfy $\phi(\vec{X}) = \phi(\sigma(\vec{X}))$ for all $\sigma \in \Sigma$, since $\vec{X}$ and $\sigma(\vec{X})$ have the same law under Assumption 7.1 (the distributions on $\mathbb{R}^N$ are the same since $P$ is uniform).
Let $F_X$ denote the distribution of $X \in \mathcal{L}$. Let $W, Y \in \mathcal{L}$, and suppose $F_W \succeq F_Y$. When [Law] is in effect, $F_W \succeq F_Y$ is equivalent to $\phi(\sigma(W)) \ge \phi(\sigma'(Y))$ for all $\sigma, \sigma' \in \Sigma$. The ECDS can then be expressed in terms of distributions as follows:
• [Eli] (Preference elicitation) For a sequence of pairs of distributions $\mathcal{E} = \{(F_{W_k}, F_{Y_k})\}_{k=1}^K$, the decision maker prefers $F_{W_k}$ to $F_{Y_k}$ for all $k = 1, \dots, K$.
We continue to use $\Theta = \{\vec{W}\} \cup \{\vec{W}_k, \vec{Y}_k\}_{k=1}^K$ to denote the set of prospects used to construct the (law invariant) robust choice function.
Let $\mathcal{R}_L(\mathcal{E}) := \{\phi \in \mathcal{R}_{QCo} : [\mathrm{Eli}], [\mathrm{Nor}], [\mathrm{Lip}], [\mathrm{Law}]\}$ denote our preference ambiguity set for law invariant choice functions. The corresponding robust law invariant choice function $\psi_{\mathcal{R}_L(\mathcal{E})} : \mathcal{L} \to \mathbb{R}$ is then defined as:
$$\psi_{\mathcal{R}_L(\mathcal{E})}(X) := \inf_{\phi \in \mathcal{R}_L(\mathcal{E})} \phi(X), \quad \forall X \in \mathcal{L}.$$
Law invariance is preserved by the infimum, as are the other properties of $\mathcal{R}_L(\mathcal{E})$, so we have the following proposition.

Proposition 7.2. $\psi_{\mathcal{R}_L(\mathcal{E})} \in \mathcal{R}_L(\mathcal{E}) \subset \mathcal{R}_{QCo}$.

7.1 Two-stage decomposition

In order to optimize $\psi_{\mathcal{R}_L(\mathcal{E})}$, we will develop a two-stage decomposition scheme as we did for the base case $\psi_{\mathcal{R}(\mathcal{E})}$, leveraging our earlier results for $\psi_{\mathcal{R}(\mathcal{E})}$. The law invariant ECDS $\mathcal{E} = \{(F_{W_k}, F_{Y_k})\}_{k=1}^K$ corresponds to the "augmented" ECDS
$$\mathcal{E}_L := \{(\sigma(W_k), \sigma'(Y_k)),\ \forall \sigma, \sigma' \in \Sigma\}_{k=1}^K$$
in our basic model without [Law]. We define the corresponding "augmented" support set to be $\Theta_L := \cup_{\sigma \in \Sigma} \{\sigma(\theta) \mid \theta \in \Theta\}$, by including all permutations of prospects in $\Theta$, and we let $J_L := (2K + 1)|\Sigma|$ denote the number of prospects in the augmented support set $\Theta_L$.
We will first characterize $\psi_{\mathcal{R}(\mathcal{E}_L)}$, our basic robust choice function corresponding to the augmented ECDS $\mathcal{E}_L$; this characterization follows from our existing results. Then, we will verify that $\psi_{\mathcal{R}(\mathcal{E}_L)} = \psi_{\mathcal{R}_L(\mathcal{E})}$ is actually what we want. Exactly as for the basic robust choice function $\psi_{\mathcal{R}(\mathcal{E})}$, we define the value problem for $\psi_{\mathcal{R}(\mathcal{E}_L)}$ as follows. Define $\widehat{\mathcal{E}}_L := \{(\theta, \theta') \in \Theta_L \times \Theta_L : \theta \neq \theta'\}$ to be the set of all edges in $\Theta_L$.
The decision variables of the law invariant value problem are:
• $v = (v_\theta)_{\theta \in \Theta_L}$, where $v_\theta \in \mathbb{R}$ corresponds to the value $\phi(\theta)$ for all $\theta \in \Theta_L$;
• $v_{\sigma(W)} = \phi(\sigma(\vec{W})) = 0$, corresponding to the value at the permutation of the normalizing prospect $\sigma(\vec{W})$ for all $\sigma \in \Sigma$;
• $s = (s_\theta)_{\theta \in \Theta_L}$, where $s_\theta \in \mathbb{R}^{TN}$ corresponds to an upper subgradient $\vec{Z} \mapsto \max\{v_\theta + \langle s_\theta, \vec{Z} - \theta\rangle, v_\theta\} = v_\theta + \max\{\langle s_\theta, \vec{Z} - \theta\rangle, 0\}$ of the function $\phi$ at $\theta$ for all $\theta \in \Theta_L$.
The law invariant value problem for $\psi_{\mathcal{R}(\mathcal{E}_L)}$ is then:
$P_L := \min_{v, s} \sum_{\theta \in \Theta_L} v_\theta$ (7a)
s.t. $v_\theta + \max\{\langle s_\theta, \theta' - \theta\rangle, 0\} \ge v_{\theta'}$, $\forall (\theta, \theta') \in \widehat{\mathcal{E}}_L$, (7b)
$s_\theta \ge 0$, $\|s_\theta\| \le L$, $\forall \theta \in \Theta_L$, (7c)
$v_\theta \ge v_{\theta'}$, $\forall (\theta, \theta') \in \mathcal{E}_L$, (7d)
$v_{\sigma(W)} = 0$, $\forall \sigma \in \Sigma$. (7e)
Problem $P_L$ is similar to Problem $P$, except that it has an exponential number of constraints in Eqs. (7b)-(7e) due to the indexing over the set of permutations $\Sigma$. We let $v^* = (v^*_\theta)_{\theta \in \Theta_L}$ be the optimal solution of Problem (7) (which is unique by the same argument we used for our basic value problem without [Law]). For values $v = (v_\theta)_{\theta \in \Theta_L}$, the interpolation problem for $\psi_{\mathcal{R}(\mathcal{E}_L)}$ is:
$P_L(\vec{X}; v) := \min_{v_X, a} v_X$ (8a)
s.t. $v_X + \max\{\langle a, \theta - \vec{X}\rangle, 0\} \ge v_\theta$, $\forall \theta \in \Theta_L$, (8b)
$a \ge 0$, $\|a\| \le L$. (8c)
It turns out that, as we will show, the value problem and interpolation problem for the law-invariant choice function $\psi_{\mathcal{R}_L(\mathcal{E})}$ are exactly $P_L$ and $P_L(\vec{X}; v^*)$.
To make this argument, we first confirm that an optimal solution of Problem $P_L$ exists, is unique, and satisfies the special property that $v^*_{\sigma(\theta)} = v^*_{\sigma'(\theta)}$ for all $\theta \in \Theta$ and $\sigma, \sigma' \in \Sigma$. Proposition 7.3.
Suppose Assumption 4.1 holds. Then, Problem (7) has a unique optimal solution $v^* = (v^*_\theta)_{\theta \in \Theta_L}$ where $v^*_\theta = \psi_{\mathcal{R}_L(\mathcal{E})}(\theta)$ for all $\theta \in \Theta_L$. Furthermore, $v^*_{\sigma(\theta)} = v^*_{\sigma'(\theta)}$ for all $\sigma, \sigma' \in \Sigma$ and $\theta \in \Theta$.

By the previous result, Problem $P_L(\vec{X}; v^*)$ reduces to:
$\min_{v_X, a} v_X$ (9a)
s.t. $v_X + \max\{\langle a, \sigma(\theta) - \vec{X}\rangle, 0\} \ge v^*_\theta$, $\forall \theta \in \Theta$, $\sigma \in \Sigma$, (9b)
$a \ge 0$, $\|a\| \le L$. (9c)
Next we verify that the choice function $\psi_{\mathcal{R}(\mathcal{E}_L)}$ is law-invariant. Proposition 7.4.
Let $v^*$ be the optimal solution of Problem $P_L$. Then, $\psi_{\mathcal{R}(\mathcal{E}_L)}$ satisfies $\psi_{\mathcal{R}(\mathcal{E}_L)}(\vec{X}) = \mathrm{val}(P_L(\vec{X}; v^*))$ for all $\vec{X} \in \mathcal{L}$, and $\psi_{\mathcal{R}(\mathcal{E}_L)}$ is law-invariant.

Finally, we can verify the equivalence $\psi_{\mathcal{R}_L(\mathcal{E})} = \psi_{\mathcal{R}(\mathcal{E}_L)}$ and the two-stage decomposition of $\psi_{\mathcal{R}_L(\mathcal{E})}$ in the next theorem. In particular, the value problem for $\psi_{\mathcal{R}_L(\mathcal{E})}$ is exactly $P_L$ and the interpolation problem for $\psi_{\mathcal{R}_L(\mathcal{E})}$ is exactly $P_L(\vec{X}; v^*)$. Theorem 7.5.
Let $v^*$ be the optimal solution of Problem $P_L$. Then, $\psi_{\mathcal{R}_L(\mathcal{E})}(\vec{X}) = \mathrm{val}(P_L(\vec{X}; v^*))$ for all $\vec{X} \in \mathcal{L}$.

For any $\theta \in \Theta$, the values $v^*_{\sigma(\theta)}$ must be equal for all $\sigma \in \Sigma$ at the optimal solution of Problem $P_L$. So, in our algorithm we only need to compute $v^*_\theta$ for each $\theta \in \Theta$, and then we can assign this same value to all permutations $\sigma(\theta)$ for all $\sigma \in \Sigma$. We use this observation to extend our definition of decomposition to the law-invariant case. We slightly abuse notation here and use $v^*_\theta = \psi_{\mathcal{R}_L(\mathcal{E})}(\theta)$ for all $\theta \in \Theta$ to denote the values of the law invariant robust choice function on $\Theta$.
Definition 7.1 (Decomposition of $\Theta$ in the law-invariant case). We call an ordered list $\mathcal{D}_L := \{(\theta, v^*_\theta)\}_{\theta \in \Theta'}$ a decomposition of $\Theta$ in the law-invariant case if: (i) $\Theta'$ is a permutation of $\Theta$; and (ii) $v^*_\theta \ge v^*_{\theta'}$ for all $\theta, \theta' \in \Theta$ such that $\theta$ precedes $\theta'$ in $\Theta'$.

The goal of our law invariant sorting algorithm is to construct a decomposition $\mathcal{D}_L$ of $\Theta$. For each $t = 1, \dots, J$, we let $\mathcal{D}_{L,t}$ denote the first $t$ pairs of $\mathcal{D}_L$, and we terminate with $\mathcal{D}_{L,J} = \mathcal{D}_L$. In line with the base case, we define $v_{L,t} := \min\{v^*_\theta \mid \theta \in \mathcal{D}_{L,t}\}$ to be the smallest $\psi_{\mathcal{R}_L(\mathcal{E})}$-value among prospects in $\mathcal{D}_{L,t}$.
We define the augmented decomposition $\Sigma(\mathcal{D}_{L,t}) := \{(\sigma(\theta), v^*_{\sigma(\theta)})_{\sigma \in \Sigma}\}_{\theta \in \mathcal{D}_{L,t}}$ to be the ordered list containing all permutations of prospects in $\mathcal{D}_{L,t}$. Since $v^*_{\sigma(\theta)} = v^*_\theta$ for all $\theta \in \Theta$ and $\sigma \in \Sigma$, we seat all permutations of the same $\theta$ next to each other in $\Sigma(\mathcal{D}_{L,t})$. Using this fact, the linear interpolation problem with $\Sigma(\mathcal{D}_{L,t})$ is
$P(\theta; \Sigma(\mathcal{D}_{L,t})) = \min_{v_\theta, s_\theta} v_\theta$ (10a)
s.t. $v_\theta + \langle s_\theta, \sigma(\theta') - \theta\rangle \ge v^*_{\theta'}$, $\forall \theta' \in \mathcal{D}_{L,t}$, $\sigma \in \Sigma$, (10b)
$s_\theta \ge 0$, $\|s_\theta\| \le L$, (10c)
$v_\theta \ge v^*_{\theta'}$, $\forall (\theta, \theta') \in \mathcal{E}$, $\theta' \in \mathcal{D}_{L,t}$. (10d)
In Problem (10), there is a constraint for every permutation $\sigma \in \Sigma$, and the cardinality of $\Sigma$ grows exponentially in the size of the ECDS. Problem (10) is related to the support function formulation for concave risk measures (because the support functions for this class of risk measures are affine). Delage and Li (2017) provide a reduction of Problem (10) to an LP with only a polynomial number of constraints, based on strong duality results for the binary assignment problem. We can extend this idea to the quasi-concave and multi-attribute setting.
For this reduction, it is convenient to define $\vec{X}_n = (X_n(\omega_t))_{t=1}^T$ to be the long vector corresponding specifically to attribute $n = 1, \dots, N$ of $X$. Then, $\sigma(\vec{X}_n) = (X_n(\omega_{\sigma(t)}))_{t=1}^T$ is a permutation of $\vec{X}_n$. We introduce the following additional notation to obtain a reduced LP:
• $\theta_n$ is the long vector corresponding to attribute $n = 1, \dots, N$ of $\theta$;
• $s_n \in \mathbb{R}^T$ is the subgradient of $\phi$ at $\theta$ corresponding to attribute $n = 1, \dots, N$;
• $y_\theta \in \mathbb{R}^T$ and $w_\theta \in \mathbb{R}^T$ are auxiliary decision variables for every $\theta \in \mathcal{D}_{L,t}$.
We also denote the all-ones vector by $\vec{1}$. Our reduced LP is then:
$P_L(\theta; \mathcal{D}_{L,t}) := \min_{s, v_\theta, \{y_{\theta'}, w_{\theta'}\}_{\theta' \in \mathcal{D}_{L,t}}} v_\theta$ (11a)
s.t. $\vec{1}^\top y_{\theta'} + \vec{1}^\top w_{\theta'} - \langle s, \theta\rangle + v_\theta - v^*_{\theta'} \ge 0$, $\forall \theta' \in \mathcal{D}_{L,t}$, (11b)
$\sum_{n=1}^N \theta'_n s_n^\top - y_{\theta'} \vec{1}^\top - \vec{1} w_{\theta'}^\top \ge 0$, $\forall \theta' \in \mathcal{D}_{L,t}$, (11c)
$s \ge 0$, $\|s\| \le L$, (11d)
$v_\theta \ge v^*_{\theta'}$, $\forall (\theta, \theta') \in \mathcal{E}$, $\theta' \in \mathcal{D}_{L,t}$. (11e)
In particular, Problem $P_L(\theta; \mathcal{D}_{L,t})$ has a polynomial number of constraints and allows for a much more efficient implementation compared to Problem (10).
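To see why the permutation constraints (10b) collapse, note that the binding constraint for each $\theta'$ involves $\min_\sigma \langle s, \sigma(\theta')\rangle$. For a single attribute, the rearrangement inequality gives this minimum in closed form: pair the largest entries of $s$ with the smallest entries of $\theta'$. The assignment-LP duality behind Problem (11) certifies the same value with polynomially many constraints (and handles the multi-attribute case, where a single $\sigma$ must be shared across attributes, as a genuine assignment problem). The following is a small brute-force check of the single-attribute case; the helper name is illustrative.

```python
import itertools
import numpy as np

def min_perm_inner(s, x):
    """min over permutations sigma of <s, sigma(x)> for one attribute:
    by the rearrangement inequality, sort s ascending and x descending
    so the largest s-entries meet the smallest x-entries."""
    return float(np.sort(s) @ np.sort(x)[::-1])

s = np.array([0.2, 1.0, 0.5])
x = np.array([4.0, -1.0, 2.0])
brute = min(float(s @ x[list(p)]) for p in itertools.permutations(range(3)))
assert abs(brute - min_perm_inner(s, x)) < 1e-9
```

This is why the exponential family of constraints indexed by $\Sigma$ can be certified by the $O(T)$ dual variables $y_{\theta'}, w_{\theta'}$ in (11b)-(11c) rather than enumerated.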
Proposition 7.6.
For all $t = 1, 2, \dots, J$ and $\theta \in \Theta$, Problem $P(\theta; \Sigma(\mathcal{D}_{L,t}))$ is equivalent to Problem $P_L(\theta; \mathcal{D}_{L,t})$.

We now present our complete sorting algorithm for Problem $P_L$. Each iteration $t = 1, 2, \dots, J - 1$ proceeds as follows: given $\mathcal{D}_{L,t}$, the law invariant predictor
$$\pi_L(\theta; \mathcal{D}_{L,t}) := \min\big\{v_{L,t}, \mathrm{val}(P_L(\theta; \mathcal{D}_{L,t}))\big\}$$
is computed for every $\theta \notin \mathcal{D}_{L,t}$. Then, one of the prospects $\theta \notin \mathcal{D}_{L,t}$ that maximizes $\pi_L(\theta; \mathcal{D}_{L,t})$, as well as its predicted value, is chosen to append to $\mathcal{D}_{L,t}$, and we proceed to the next iteration. The algorithm terminates with a decomposition $\mathcal{D}_{L,J} = \mathcal{D}_L$. Algorithm 3:
Sorting algorithm for the law invariant value problem
Result:
A decomposition $\mathcal{D}_L$.
Initialization: $\Theta$, $t = 1$, and $\mathcal{D}_{L,t} = \{(\vec{W}, 0)\}$;
while $t < J$ do
  Choose $\theta^* \in \arg\max_{\theta \notin \mathcal{D}_{L,t}} \pi_L(\theta; \mathcal{D}_{L,t})$, and set $u_{\theta^*} := \pi_L(\theta^*; \mathcal{D}_{L,t})$;
  Set $\mathcal{D}_{L,t+1} := \{\mathcal{D}_{L,t}, (\theta^*, u_{\theta^*})\}$;
  Set $t := t + 1$;
end
return $\mathcal{D}_L := \mathcal{D}_{L,J}$.
The next theorem verifies that Algorithm 3 returns the optimal solution of Problem $P_L$. Theorem 7.7.
Algorithm 3 finds a decomposition $\mathcal{D}_L$ of $\Theta$ and computes $\psi_{\mathcal{R}_L(\mathcal{E})}(\Theta)$, after solving $O(J^2)$ linear programs.

We note the same order of complexity in Theorem 7.7 that appeared in Theorem 4.3. Similar to the base case, we will see in our experiments that, in line with Theorem 7.7, the computational complexity of our framework mostly depends on the size of the ECDS $\mathcal{E}$. It is relatively less sensitive to the dimension of the prospect space $\mathcal{L}$ (see Table 3).

In this subsection, we characterize the acceptance sets of the law invariant robust choice function $\psi_{\mathcal{R}_L(\mathcal{E})}$. Suppose we already have a decomposition $\mathcal{D}_L$ via Algorithm 3. In line with Section 5, we also define the constant prospect $\theta_{J+1} := -\infty$ with value $\psi_{\mathcal{R}_L(\mathcal{E})}(\theta_{J+1}) := -\infty$, and we set $\mathcal{D}_{L,J+1} := \{\mathcal{D}_{L,J}, (\theta_{J+1}, -\infty)\}$.
The following proposition is the counterpart of Proposition 5.1; it shows how to check membership of the acceptance sets of $\psi_{\mathcal{R}_L(\mathcal{E})}$, which we denote by:
$$\mathcal{A}_{L,v} := \{\vec{X} \in \mathcal{L} : \psi_{\mathcal{R}_L(\mathcal{E})}(\vec{X}) \ge v\}, \quad \forall v \le 0.$$
Proposition 7.8.
For level $v \le 0$ and $t = \kappa(v)$, $\psi_{\mathcal{R}_L(\mathcal{E})}(\vec{X}) \ge v$ if and only if $\mathrm{val}(P_L(\vec{X}; \mathcal{D}_{L,t})) \ge v$.

This proposition is immediate because Problem $P_L(\vec{X}; \mathcal{D}_{L,t})$ is a reduction of Problem $P(\vec{X}; \Sigma(\mathcal{D}_{L,t}))$, which is the linear interpolation problem for the basic model with augmented support set $\Theta_L$. Given $\mathcal{D}_{L,t}$, Problem $P_L(\vec{X}; \mathcal{D}_{L,t})$ is an LP and we can express its dual as follows. First define variables $p \in \mathbb{R}^t$, $q \in \mathbb{R}$, and $\{\rho_\theta\}_{\theta \in \mathcal{D}_{L,t}}$ where $\rho_\theta \in \mathbb{R}^{T \times T}$. The dual to $P_L(\vec{X}; \mathcal{D}_{L,t})$ is then:
$D_L(\vec{X}; \mathcal{D}_{L,t}) := \max_{p, q, \{\rho_\theta\}_{\theta \in \mathcal{D}_{L,t}}} \sum_{\theta \in \mathcal{D}_{L,t}} v^*_\theta \cdot p_\theta - Lq$ (12a)
s.t. $\sum_{\theta \in \mathcal{D}_{L,t}} \rho_\theta^\top \theta_n - \vec{X}_n \le q$, $\forall n = 1, 2, \dots, N$, (12b)
$\sum_{\theta \in \mathcal{D}_{L,t}} p_\theta = 1$, $p \ge 0$, $q \ge 0$, (12c)
$\vec{1}^\top \rho_\theta = p_\theta \vec{1}^\top$, $\rho_\theta \vec{1} = p_\theta \vec{1}$, $\rho_\theta \ge 0$, $\forall \theta \in \mathcal{D}_{L,t}$. (12d)
We interpret constraint (12b) in the component-wise sense. We have $\mathrm{val}(P_L(\vec{X}; \mathcal{D}_{L,t})) = \mathrm{val}(D_L(\vec{X}; \mathcal{D}_{L,t}))$ for all $\vec{X} \in \mathcal{L}$ by strong duality, since the primal optimal value is always finite. Problem $D_L(\vec{X}; \mathcal{D}_{L,t})$ leads to an explicit characterization of the acceptance sets $\{\mathcal{A}_{L,v}\}_{v \le 0}$ of $\psi_{\mathcal{R}_L(\mathcal{E})}$ in the following theorem. Theorem 7.9.
For all $v \leq 0$,
$$\mathcal{A}_{L,v} = \left\{ \vec{X} \in \mathcal{L} \;\middle|\; \exists\, (p, q, \{\rho_\theta\}) \ \text{s.t.}
\begin{array}{l}
\sum_{\theta \in \mathcal{D}_{L,\kappa(v)}} v^*_\theta \, p_\theta - L q \geq v, \\
\sum_{\theta \in \mathcal{D}_{L,\kappa(v)}} \rho_\theta^\top \theta_n - \vec{X}_n \leq q, \quad \forall n = 1, 2, \ldots, N, \\
\sum_{\theta \in \mathcal{D}_{L,\kappa(v)}} p_\theta = 1, \quad p \geq 0, \quad q \geq 0, \\
\vec{1}^\top \rho_\theta = p_\theta \vec{1}^\top, \quad \rho_\theta \vec{1} = p_\theta \vec{1}, \quad \rho_\theta \geq 0, \quad \forall \theta \in \mathcal{D}_{L,\kappa(v)}
\end{array} \right\}.$$
This theorem follows from Proposition 7.8 and the fact that $\mathrm{val}(P_L(\vec{X}; \mathcal{D}_{L,t})) = \mathrm{val}(D_L(\vec{X}; \mathcal{D}_{L,t}))$ by strong duality.
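To make the membership test of Theorem 7.9 concrete, the sketch below assembles the dual LP (12) for a toy single-attribute instance ($N = 1$) and evaluates $\mathrm{val}(D_L(\vec{X}; \mathcal{D}_{L,t}))$ with `scipy.optimize.linprog`. The support points, their values $v^*_\theta$, and the Lipschitz constant are made-up illustrative inputs (not from the paper's experiments), and the constraint layout follows our reading of (12b)-(12d).

```python
import numpy as np
from scipy.optimize import linprog

def dual_value(X, thetas, v_star, L):
    """Optimal value of the dual LP D_L(X; D_{L,t}) for N = 1 attribute.

    Variables: p in R^t, q in R, and one matrix rho_theta in R^{TxT} per
    support point, with row and column sums tied to p_theta as in (12d).
    """
    t, T = len(thetas), len(X)
    n_var = t + 1 + t * T * T            # [p, q, rho_1, ..., rho_t] (row-major)
    c = np.zeros(n_var)
    c[:t] = -np.asarray(v_star)          # linprog minimizes, so negate (12a)
    c[t] = L

    A_eq, b_eq = [], []
    row = np.zeros(n_var); row[:t] = 1.0  # sum_theta p_theta = 1
    A_eq.append(row); b_eq.append(1.0)
    for k in range(t):
        base = t + 1 + k * T * T
        for i in range(T):                # row sum i of rho_k equals p_k
            row = np.zeros(n_var)
            row[base + i * T: base + (i + 1) * T] = 1.0
            row[k] = -1.0
            A_eq.append(row); b_eq.append(0.0)
        for j in range(T):                # column sum j of rho_k equals p_k
            row = np.zeros(n_var)
            row[base + j: base + T * T: T] = 1.0
            row[k] = -1.0
            A_eq.append(row); b_eq.append(0.0)

    # Constraint (12b), componentwise: sum_k (rho_k^T theta_k)_i - q <= X_i.
    A_ub, b_ub = [], []
    for i in range(T):
        row = np.zeros(n_var)
        for k in range(t):
            base = t + 1 + k * T * T
            for j in range(T):            # (rho_k^T theta_k)_i = sum_j rho_k[j,i] theta_k[j]
                row[base + j * T + i] = thetas[k][j]
        row[t] = -1.0
        A_ub.append(row); b_ub.append(X[i])

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * n_var, method="highs")
    assert res.status == 0
    return -res.fun

# Toy instance: T = 3 scenarios, t = 2 support points with values v*_theta <= 0.
thetas = [np.array([1.0, 2.0, 3.0]), np.array([0.0, 1.0, 2.0])]
v_star = [-0.5, -1.5]
X_hi, X_lo = np.array([2.0, 2.0, 3.0]), np.array([0.0, 1.0, 1.0])
v_hi = dual_value(X_hi, thetas, v_star, 1.0)
v_lo = dual_value(X_lo, thetas, v_star, 1.0)
# Membership test of Theorem 7.9: X lies in A_{L,v} iff dual_value(X, ...) >= v.
```

Checking $\vec{X} \in \mathcal{A}_{L,v}$ then amounts to comparing the returned value against the level $v$; the monotonicity of the robust value (a componentwise larger prospect never receives a smaller value) is visible in the two test prospects above.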
The law invariant PRO problem is:
$$\text{(PRO-Law)} \qquad \max_{z \in \mathcal{Z}} \ \psi_{\mathcal{R}_L(\mathcal{E})}(G(z)) \equiv \max_{z \in \mathcal{Z}} \ \inf_{\phi \in \mathcal{R}_L(\mathcal{E})} \phi(G(z)).$$
Our strategy for Problem (PRO-Law) is analogous to the basic case: it depends on the explicit representation of the acceptance sets of $\psi_{\mathcal{R}_L(\mathcal{E})}$ and on a binary search algorithm. We want to find the largest value $v$ such that there exists $z \in \mathcal{Z}$ with $\vec{G}(z) \in \mathcal{A}_{L,v}$. Fix $t = 1, \ldots, J$, introduce decision variables $v \in \mathbb{R}$, $z \in \mathcal{Z}$, $p \in \mathbb{R}^t$, $q \in \mathbb{R}$, and $\{\rho_\theta\}_{\theta \in \mathcal{D}_{L,t}}$ where $\rho_\theta \in \mathbb{R}^{T \times T}$, and consider the following problem:
$$G_L(\mathcal{D}_{L,t}) := \max_{z,\, p,\, q,\, \rho,\, v} \ v \qquad (13a)$$
$$\text{s.t.} \quad \sum_{\theta \in \mathcal{D}_{L,t}} v^*_\theta \, p_\theta - L q \geq v, \qquad (13b)$$
$$\sum_{\theta \in \mathcal{D}_{L,t}} \rho_\theta^\top \theta_n - \vec{G}_n(z) \leq q, \quad \forall n = 1, 2, \ldots, N, \qquad (13c)$$
$$\sum_{\theta \in \mathcal{D}_{L,t}} p_\theta = 1, \quad p \geq 0, \quad q \geq 0, \qquad (13d)$$
$$\vec{1}^\top \rho_\theta = p_\theta \vec{1}^\top, \quad \rho_\theta \vec{1} = p_\theta \vec{1}, \quad \rho_\theta \geq 0, \quad \forall \theta \in \mathcal{D}_{L,t}. \qquad (13e)$$
We show in the following proposition that we can bound the optimal value of Problem (PRO-Law) using Problem $G_L(\mathcal{D}_{L,t})$ for some $t = \kappa(v)$.

Proposition 7.10.
Choose $v \leq 0$ and $t = \kappa(v)$; then $\max_{z \in \mathcal{Z}} \psi_{\mathcal{R}_L(\mathcal{E})}(\vec{G}(z)) \geq v$ if and only if $\mathrm{val}(G_L(\mathcal{D}_{L,t})) \geq v$.

We can state Proposition 7.10 in a more specific way to better suit our law invariant binary search algorithm.
Corollary 7.11.
For any $t = 1, 2, \ldots, J$, $\max_{z \in \mathcal{Z}} \psi_{\mathcal{R}_L(\mathcal{E})}(\vec{G}(z)) > v_{L,t+1}$ if and only if $\mathrm{val}(G_L(\mathcal{D}_{L,t})) > v_{L,t+1}$.
We now present the details of our binary search algorithm for Problem (PRO-Law). It mirrors Algorithm 2, except that each instance of Problem $G(\mathcal{D}_t)$ is replaced with $G_L(\mathcal{D}_{L,t})$. Let $\{v^*_{[0]}, \ldots, v^*_{[H]}\}$, where $v^*_{[0]} = 0$ and $H \leq J$, denote the unique values of $v^*$ sorted in descending order.

Algorithm 4:
Binary search for Problem (PRO-Law)
Result:
Returns an optimal solution of Problem (PRO-Law) in the law-invariant case.
Initialization: $\bar{h} = H$, $\underline{h} = 0$;
while $\bar{h} \neq \underline{h} + 1$ do
    Set $h := \lceil (\bar{h} + \underline{h})/2 \rceil$, set $t := \kappa(v^*_{[h-1]})$;
    Compute $v_t = \mathrm{val}(G_L(\mathcal{D}_{L,t}))$ with optimal solution $z^*$;
    if $v_t > v_{L,t+1}$ then set $\underline{h} := h$; else set $\bar{h} := h$;
end
Set $h := \lceil (\bar{h} + \underline{h})/2 \rceil$, set $t := \kappa(v^*_{[h-1]})$;
Compute $v_t = \mathrm{val}(G_L(\mathcal{D}_{L,t}))$ with optimal solution $z^*$;
return $z^*$ and $\psi_{\mathcal{R}_L(\mathcal{E})}(G(z^*)) = \min\{v_t, v_{L,t}\}$.
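The bisection logic of Algorithm 4 can be sketched in code as follows. Here the oracle `achievable(h)` stands in for solving Problem $G_L(\mathcal{D}_{L,t})$ and applying the test of Corollary 7.11; in this sketch it is simulated by a hypothetical optimal value `v_opt`, so only the $O(\log H)$ search structure is illustrated, not the LP machinery.

```python
def smallest_achievable(levels, achievable):
    """Smallest index h with achievable(h) True.

    `levels` are the unique values v*[0] > v*[1] > ... > v*[H] sorted in
    descending order with levels[0] = 0. `achievable` is assumed monotone
    (False ... False True ... True) with achievable(H) True, so bisection
    needs only O(log H) oracle calls.
    """
    lo, hi = 0, len(levels) - 1
    if achievable(lo):                 # even the top level is achievable
        return lo
    while hi != lo + 1:
        h = (lo + hi + 1) // 2         # ceil((lo + hi)/2), as in Algorithm 4
        if achievable(h):
            hi = h
        else:
            lo = h
    return hi

# Hypothetical instance: the (unknown) optimum lies between two grid levels.
levels = [0.0, -0.5, -1.0, -1.5, -2.0, -2.5]
v_opt = -1.2
h_star = smallest_achievable(levels, lambda h: v_opt >= levels[h])
# levels[h_star] is the highest grid level at or below the optimum.
```

In the actual algorithm, each oracle call additionally returns the candidate solution $z^*$ from the last solved instance of $G_L(\mathcal{D}_{L,t})$, which is what the final two lines of Algorithm 4 recover.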
Algorithm 4 returns an optimal solution $z^*$ of Problem (PRO-Law) after solving $O(\log H)$ instances of Problem $G_L(\mathcal{D}_{L,t})$.

We note that Theorem 7.12 gives the same order of complexity as Theorem 6.4. As in the base case, the binary search algorithm for Problem (PRO-Law) is highly scalable in both the size of the problem instance and the level count $H$ (which grows with the size of the ECDS $\mathcal{E}$).

In this section, we report numerical experiments for two applications: a single-attribute portfolio optimization problem and a multi-attribute capital allocation problem. The LPs that appear in the sorting algorithm, and the MILP reformulation of the value problem, are all solved by Gurobi with default settings. All experiments are run using Python 3.7 and Gurobi 9.0.2 on a 64-bit Windows 10 platform with an Intel Core i7 processor and 16 GB of RAM.
The perceived choice function
In the experiments, we do not know how the decision maker perceives his risk preference, so we assume that the decision maker uses a "perceived choice function". To generate a synthetic ECDS for our experiments, the decision maker is first asked about structural properties such as monotonicity, quasi-concavity, and law invariance. Second, the preference ambiguity set is refined by pairwise comparisons. We take the certainty equivalent as the perceived choice function throughout our experiments. Note that the simulated decision maker always generates the ECDS according to this perceived choice function, but neither the decision maker nor our algorithms get to see this choice function or identify it precisely.

The perceived choice function is constructed as follows. Let $u : \mathbb{R} \to \mathbb{R}$ be a continuous, monotone non-decreasing, and concave utility function such that $u(0) = 0$. The certainty equivalent for this $u$ is then defined by:
$$\phi_{CE}(X) = u^{-1}\left(\mathbb{E}[u(\langle w, X \rangle)]\right), \quad \forall X \in \mathcal{L}, \qquad (14)$$
where $u^{-1}$ is the inverse of $u$, $w \in \mathbb{R}^N_+$ is a vector of non-negative weights, and $\langle w, X \rangle = \sum_{n=1}^N w_n X_n$ is a weighted sum. When $N = 1$, we just use $\phi_{CE}(X) = u^{-1}(\mathbb{E}[u(X)])$ without any weights. The choice function $\phi_{CE}$ is monotone, quasi-concave, and law invariant. For our specific implementation, we take the piecewise utility function
$$u(x) = \begin{cases} 1 - \exp(-\gamma x) & \text{if } x \geq 0, \\ \gamma x & \text{if } x < 0, \end{cases}$$
for a fixed parameter value $\gamma > 0$.
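For concreteness, a minimal implementation of the certainty equivalent (14) with the piecewise exponential/linear utility is sketched below. The parameter value `GAMMA = 0.1`, the weights, and the sample data are illustrative assumptions only (the paper fixes its own value of $\gamma$).

```python
import numpy as np

GAMMA = 0.1  # illustrative risk-aversion parameter (assumed, not from the paper)

def u(x):
    """Piecewise utility: concave, non-decreasing, with u(0) = 0."""
    x = np.asarray(x, dtype=float)
    return np.where(x >= 0, 1.0 - np.exp(-GAMMA * x), GAMMA * x)

def u_inv(y):
    """Inverse of u (defined for y < 1)."""
    y = np.asarray(y, dtype=float)
    return np.where(y >= 0, -np.log(1.0 - y) / GAMMA, y / GAMMA)

def phi_ce(X, w, probs):
    """Certainty equivalent phi_CE(X) = u^{-1}(E[u(<w, X>)]).

    X has shape (N, T): N attributes over T scenarios with probabilities probs."""
    s = w @ X                                  # weighted sum <w, X> per scenario
    return float(u_inv(np.dot(probs, u(s))))

# Two attributes over three scenarios.
X = np.array([[1.0, -2.0, 3.0],
              [0.5,  1.0, -1.0]])
w = np.array([0.6, 0.4])
probs = np.array([0.3, 0.4, 0.3])
ce = phi_ce(X, w, probs)
```

Two sanity checks follow directly from the definition: the certainty equivalent of a constant prospect is that constant, and by Jensen's inequality (concavity of $u$) the certainty equivalent never exceeds the expected weighted value.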
Suppose (hypothetically) that the perceived choice function could be precisely identified; then the following perceived choice optimization provides the "true" optimal solution for the decision maker, and it serves as a benchmark for comparison. If we knew the perceived choice function $\phi_{CE}$ exactly, we would want to solve:
$$\max_{z \in \mathcal{Z}} \ \mathbb{E}[u(\langle w, G(z) \rangle)],$$
since $u^{-1}$ is monotone non-decreasing. The above display is a convex optimization problem in our setting, since the objective is a mixture of concave functions, and it can be solved efficiently by standard algorithms.

The goal here is to construct a portfolio from assets $m = 1, 2, \ldots, M$ that is robust with respect to ambiguity about the decision maker's preferences. Let $\mathcal{Z} = \{z \in \mathbb{R}^M : \sum_{m=1}^M z_m = 1, \ z \geq 0\}$ be the set of all feasible portfolios, where $z_m$ is the proportion of wealth allocated to asset $m$. The random return rates are $R = (R_1, R_2, \ldots, R_M)$, where $R_m$ is the return rate of asset $m$. The overall portfolio return is $\sum_{m=1}^M R_m z_m$, and our preference robust portfolio optimization problem is:
$$\max_{z \in \mathcal{Z}} \ \psi_{\mathcal{R}(\mathcal{E})}\left(\sum_{m=1}^M R_m z_m\right).$$
Our data consists of the quarterly return rates of exchange traded funds (ETFs) and the US central bank (FED) for 212 assets, from January 2006 to December 2016 (constructed from the daily return rate data). We work with $T = 40$ scenarios. First, we construct the ECDS by randomly sampling pairs of assets from the entire pool; the decision maker's preference within each pair is determined by the perceived choice function in Eq. (14). We then randomly choose a batch of $M = 20$ feasible assets from which to construct portfolios.

Our first set of experiments examines the scalability of the sorting algorithm in terms of the size of the ECDS. We compare the computation time required to solve the "value problem" $P$ by our sorting algorithm and by the Gurobi solver applied to the MILP reformulation.
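Assuming the perceived choice function were known, the benchmark $\max_{z \in \mathcal{Z}} \mathbb{E}[u(\langle w, G(z) \rangle)]$ specializes, in the single-attribute portfolio setting just described, to maximizing expected utility of $Rz$ over the simplex. The sketch below does this with `scipy.optimize.minimize` (SLSQP); the synthetic return matrix, its dimensions, and the utility parameter are illustrative assumptions, not the paper's ETF data.

```python
import numpy as np
from scipy.optimize import minimize

GAMMA = 0.1  # illustrative utility parameter (assumed)

def u(x):
    """Piecewise concave utility with u(0) = 0."""
    return np.where(x >= 0, 1.0 - np.exp(-GAMMA * x), GAMMA * x)

rng = np.random.default_rng(0)
T, M = 40, 5                              # scenarios and assets (toy sizes)
R = rng.normal(0.02, 0.1, size=(T, M))    # synthetic per-scenario return rates
probs = np.full(T, 1.0 / T)               # equally weighted scenarios

def neg_expected_utility(z):
    # Portfolio return per scenario is R @ z; minimize the negative utility.
    return -float(probs @ u(R @ z))

cons = ({"type": "eq", "fun": lambda z: np.sum(z) - 1.0},)
res = minimize(neg_expected_utility, x0=np.full(M, 1.0 / M),
               bounds=[(0.0, 1.0)] * M, constraints=cons, method="SLSQP")
z_star = res.x  # the benchmark ("true" perceived-optimal) portfolio
```

Because $u^{-1}$ is increasing, maximizing expected utility and maximizing the certainty equivalent give the same optimizer, which is why the benchmark drops the outer $u^{-1}$.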
Both the sorting algorithm and Gurobi solve the problem exactly and always attain the same optimal value. The comparisons are conducted repeatedly for five groups, where the ECDS is randomly chosen from the pool of assets for each group. Table 1 shows that when the number of ECDS pairs exceeds sixty, the sorting algorithm becomes more efficient than Gurobi.

                         ECDS Pairs
Method   Group      10      20      30      40       50       60       70
Sorting  1       1.16s   6.60s  26.30s  67.00s  142.00s  242.00s  380.00s
         2       1.09s   6.80s  25.50s  69.00s  144.00s  240.00s  374.00s
         3       1.10s   6.83s  25.70s  69.00s  142.00s  236.00s  380.00s
         4       1.09s   6.76s  26.20s  67.00s  144.00s  238.00s  383.00s
         5       1.08s   6.74s  25.60s  67.00s  145.00s  237.00s  376.00s
MILP     1       0.77s   4.94s  15.95s  28.13s   78.75s  281.40s  529.63s
         2       0.43s   5.17s  16.05s  28.79s   81.56s  254.67s  493.08s
         3       0.68s   5.16s  15.78s  28.03s   83.01s  246.25s  484.30s
         4       0.72s   5.24s  15.90s  28.38s   79.97s  259.67s  499.56s
         5       0.64s   5.19s  15.13s  31.89s   80.82s  262.24s  493.64s

Table 1: Scalability of the sorting algorithm for portfolio optimization
We investigate the effectiveness of our preference robust formulation in this section. Namely, we consider the quality of the preference robust optimal solution and of the "true" optimal solution with respect to the perceived choice function. We first study how the robust choice function value varies with increasing ECDS size. We randomly sample fifty assets $X$, calculate the robust choice function value $\psi_{\mathcal{R}(\mathcal{E})}(X)$ (respectively $\psi_{\mathcal{R}_L(\mathcal{E})}(X)$) of each asset $X$ (all normalized to the same scale), and then calculate the average. We run this experiment under different ECDS sizes.

Figure 3 shows how the robust choice function gradually converges as information on the perceived choice function is revealed. In particular, the robust choice function (without the [Law] hypothesis) needs more elicited pairs to reach the same level as its law-invariant counterpart. This experiment indicates that if [Law] is enforced (by using $\psi_{\mathcal{R}_L(\mathcal{E})}$), then only a few comparisons (possibly fewer than ten ECDS pairs) are needed to make robust decisions of high quality. It also indicates that the accuracy of the robust choice function can be quickly improved by using more structural properties (e.g. law invariance).
Figure 3: Normalized robust choice function value with increasing ECDS size

We also illustrate the relationship between the robust optimal portfolio weights and the number of ECDS pairs. In this experiment, we choose the batch size $M = 5$. The optimal portfolio always consists of a mixture of only Asset 1, Asset 4, and Asset 5. In Figure 4, the result for the basic case without [Law] is shown in the left plot and the result with [Law] in the right plot. The weight on Asset 5 is at the top (purple area), the weight on Asset 4 is in the middle (red area), and the weight on Asset 1 is at the bottom (blue area). The true optimal solution under the perceived choice function is to put all weight on Asset 5. We see that when there are more than five ECDS pairs, the law invariant model already reaches a stable solution that matches the "true" optimal solution under the perceived choice function. The model without [Law] reaches a stable solution once there are more than twenty ECDS pairs, with a small optimality gap.

Figure 4: Robust optimal solution for portfolio optimization in the base case (left) and the law-invariant case (right)

In the next experiment, we compare the perceived choice function value of the optimal solutions of four different models: our robust choice function, our robust law-invariant choice function, expected return, and the perceived choice function itself. In each experimental run, we randomly select twenty assets from the pool as the batch. Then, for each model, we calculate the perceived choice function values of the solutions proposed by PRO with and without [Law], expected return maximization, and perceived choice function maximization. We conduct 500 experimental runs and then calculate the average of the optimal choice function values. These values serve as a measure of the performance of PRO.
In some sense, they represent how good the decision maker feels about the solutions proposed by an advisor using PRO.

In Figure 5, we see that the performance of PRO improves as the size of the ECDS increases. We also note that the robust law-invariant choice function always outperforms our basic robust choice function without [Law]. We conjecture that this is because the perceived choice function (the certainty equivalent) is itself law-invariant. Finally, we see that the robust law-invariant choice function is fairly close to the ground truth once the size of the ECDS exceeds ten. In this case, enforcing [Law] greatly improves the fidelity of our robust approach with respect to the ground truth.
Our next application is scenario-dependent capital allocation. Consider a financial institution consisting of sub-units $n = 1, 2, \ldots, N$. The overall financial institution is represented by the random vector
$$X = \left(X_1(\omega), X_2(\omega), \ldots, X_N(\omega)\right)_{\omega \in \Omega},$$
Figure 5: Performance of the portfolios proposed by different methods

where we interpret each $X_n$ as the random revenue of sub-unit $n = 1, \ldots, N$. Define
$$\mathcal{Z} = \left\{ Z \in \mathcal{L} : \sum_{n=1}^N Z_n(\omega) \leq B, \ Z(\omega) \geq 0, \ \forall \omega \in \Omega \right\}$$
to be the set of admissible scenario-dependent financial recourse decisions subject to a budget constraint $B > 0$. Our problem is to solve:
$$\max_{Z \in \mathcal{Z}} \ \psi_{\mathcal{R}(\mathcal{E})}(X + Z).$$
This problem seeks to maximize the robust choice function subject to the available budget. There is a systematic risk factor $\varphi \sim \mathrm{Normal}(0, \cdot)$ and idiosyncratic risk factors $\xi_n \sim \mathrm{Normal}(n \times \cdot, \ n \times \cdot)$ for $n = 1, 2, \ldots, N$. We then set the return to be $X_n = 10(\varphi + \xi_n)$, which is similar to the experimental setup in Esfahani and Kuhn (2018). To construct the ECDS for our experiments, we randomly generate samples of financial returns according to this setup. Again, the decision maker's preference within each pair is determined by the perceived choice function in Eq. (14). We randomly select the weight vector $w$, and we set the budget to a fixed value $B > 0$.

To test the scalability of our sorting algorithm, we model an institution with $N = 20$ sub-units on $T = 20$ scenarios. The computation times required by our sorting algorithm and by the MILP via Gurobi are presented in Table 2. Here, the difference in computation time between our sorting algorithm and the MILP is more pronounced than in the portfolio optimization problem. As for that problem, we again compare five groups, where the ECDS is randomly generated for each group.

We next consider the scalability of our law invariant sorting algorithm. Table 3 suggests that the law invariant sorting algorithm is highly scalable in terms of the number of scenarios and the number of attributes. Note that the most rapid growth in run time is with the number of ECDS pairs. This result is not surprising since, according to Theorem 7.7, our theoretical complexity bound grows quadratically in the number of ECDS pairs.

                       ECDS Pairs
Method   Group     10      20       30        40         50
Sorting  1      15.6s     97s     364s      841s      1631s
         2      13.2s    107s     350s      826s      1539s
         3      12.5s     99s     326s      808s      1483s
         4      12.6s    107s     342s      823s      1651s
         5      13.0s     97s     337s      781s      1545s
MILP     1      7.16s  56.13s  185.78s  3728.40s  22109.66s
         2      4.64s  59.92s  254.80s  5312.21s  31019.20s
         3      6.47s  90.27s  450.12s  6560.72s  38534.88s
         4      5.30s  71.62s  387.61s  5666.33s  36112.33s
         5      5.18s  38.90s  278.50s  5449.45s  32334.57s

Table 2: Scalability of the sorting algorithm for capital allocation

Now we study the scalability of the binary search algorithm for the capital allocation problem (which takes the optimal solution of the value problem as input). Table 4 suggests that the algorithm is highly scalable in terms of the size of the ECDS, the number of attributes, and the number of scenarios.
Now we want to know how well our robust choice function approximates the perceived choice function for different sizes of the ECDS, and how the PRO solutions compare with the ground truth solution. We first test the robust choice function value with increasing ECDS size. We randomly sample 500 financial positions $X$, calculate the robust choice function value $\psi_{\mathcal{R}(\mathcal{E})}(X)$ (respectively $\psi_{\mathcal{R}_L(\mathcal{E})}(X)$) of each financial position $X$, and then calculate the average. We run this experiment under ECDS of different sizes. Figure 6 shows the average of the robust choice function values when there are $T = 4$ scenarios and $N = 4$ sub-units. This figure shows how the robust choice function gradually "converges" as more pairwise comparison data is given. As we saw earlier, this experiment indicates that if [Law] is enforced, then only a few comparisons (possibly fewer than ten ECDS pairs) are needed to make robust decisions of high quality.

The final experiment tests the performance of the capital allocation plans proposed by different models: the robust choice function, the robust law-invariant choice function, expected return, and the perceived choice function. In each experimental run, we randomly generate a financial position. Then, for each model, we calculate the perceived choice function values of the allocation plans proposed by PRO with and without [Law], expected return maximization, and perceived choice maximization. For the expected return, we use the uniform weight $w = (\frac{1}{N}, \ldots, \frac{1}{N})$ as our guess because the ground truth is unknown. We conduct 500 experimental runs and calculate the average; the results are shown in Figure 7.

Similar to the portfolio optimization problem, law-invariant PRO better matches the perceived choice function (which is also law-invariant). Second, the performance of both PRO and PRO with [Law] improves as we increase the size of the ECDS. However, the convergence rate of law-invariant PRO is slower here than it was in the portfolio problem.
One possible explanation is that, in the multi-attribute case, enforcing [Law] only facilitates learning across different scenarios, not across different attributes. Hence, one can argue that more elicited pairs are needed to achieve the same performance for multi-attribute PRO than in the single-attribute case.

                             ECDS Pairs
Settings          Group       1      2      5     10    20     50      80
20 Scen, 20 Attr  1      52.9ms  437ms  7.19s  52.4s  506s  6715s  28284s
                  2      47.5ms  420ms  7.12s  52.8s  507s  6822s  30111s
                  3      52.0ms  445ms  7.28s  52.5s  508s  6720s  27765s
                  4      50.2ms  440ms  7.20s  53.0s  511s  6669s  28202s
                  5      51.8ms  444ms  7.22s  52.4s  512s  6733s  28116s
                         Number of Scenarios
Settings          Group      5     10     15     20    50    100    200
10 Pairs, 20 Attr 1      6.64s  17.0s  33.4s  54.3s  274s  1141s  4532s
                  2      6.75s  17.1s  33.2s  55.1s  274s  1222s  4605s
                  3      6.58s  16.8s  33.3s  52.8s  280s  1088s  4520s
                  4      6.32s  17.9s  33.6s  54.2s  269s  1050s  4330s
                  5      6.67s  16.9s  33.8s  55.0s  271s  1150s  4567s
                         Number of Attributes
Settings          Group    5    10     15     20    50   100   200
10 Pairs, 20 Scen 1      21s   31s  40.9s  51.5s  124s  246s  480s
                  2      20s   31s  40.6s  51.2s  122s  244s  484s
                  3      19s   31s  40.7s  51.6s  123s  248s  482s
                  4      20s   30s  41.2s  52.2s  126s  250s  480s
                  5      21s   32s  41.0s  51.8s  124s  246s  481s
Table 3: Scalability of the sorting algorithm for capital allocation (law-invariant case)
Figure 6: Robust choice function value with increasing ECDS size
                             ECDS Pairs
Settings          Group      1      2      5     10     20     50     80
20 Scen, 20 Attr  1      576ms  867ms  1.21s  1.46s  1.84s  2.38s  4.56s
                  2      532ms  902ms  1.17s  1.51s  1.82s  2.32s  4.62s
                  3      555ms  855ms  1.15s  1.45s  1.81s  2.40s  4.88s
                  4      546ms  832ms  1.22s  1.47s  1.85s  2.33s  4.67s
                  5      612ms  841ms  1.23s  1.46s  1.90s  2.35s  4.51s

                         Number of Scenarios
Settings          Group      5     10     15     20     50   100   200
10 Pairs, 20 Attr 1      153ms  414ms  1.02s  1.46s  8.80s   40s  181s
                  2      115ms  414ms  924ms  1.36s  9.79s   35s  177s
                  3      130ms  376ms  919ms  1.32s  10.0s   38s  190s
                  4      119ms  427ms  917ms  1.33s  7.88s   42s  176s
                  5      131ms  425ms  922ms  1.42s  9.47s   40s  183s

                         Number of Attributes
Settings          Group      5     10     15     20     50    100    200
10 Pairs, 20 Scen 1      402ms  745ms  1.42s  1.60s  4.56s  9.42s  18.5s
                  2      398ms  740ms  1.38s  1.55s  4.60s  9.44s  18.7s
                  3      396ms  740ms  1.37s  1.56s  4.61s  9.45s  19.0s
                  4      408ms  750ms  1.45s  1.61s  4.60s  9.39s  18.8s
                  5      405ms  744ms  1.44s  1.61s  4.56s  9.40s  18.9s

Table 4: Scalability of the binary search algorithm for capital allocation

This paper has put forward a new framework for PRO. The distinguishing features of our framework are (i) its emphasis on quasi-concavity and (ii) its support for multi-attribute prospects. Compared to concavity, quasi-concavity allows for a broader expression of preferences in utility and risk. Quasi-concavity is also equivalent to diversification-favoring behavior, and it is easier to check for a new decision maker than the stronger property of concavity. Many decision problems are fundamentally multi-attribute, and our framework avoids the need to assign artificial weights to each attribute in order to optimize.

We show that we can solve our class of PRO problems efficiently. As our main methodological contribution, we demonstrate that this class of robust choice functions can be decomposed into a value problem and an interpolation problem. Then, we prove that the value problem can be solved by a sequence of LPs that are polynomial in the size of the ECDS. Once the value problem is solved, we can optimize the robust choice function by solving a sequence of convex optimization problems (which depend on the acceptance sets, which we are able to compute explicitly).

Our numerical experiments show that our method has practical computational complexity in terms of the number of scenarios, the number of attributes, and the size of the ECDS, even in the law invariant case. The greatest computational expenditure is on the value problem; once it is solved, finding the optimal PRO solution is quick. The robust choice function is also independent of the problem at hand: it depends only on the preference elicitation information of the decision maker. Once the effort is made to compute it, the same information can be re-used later in related applications.
References
B. Armbruster and E. Delage. Decision making under uncertainty when preference information is incomplete. Management Science, 61(1):111–128, 2015.

Figure 7: Performance of the allocation plans proposed by different methods

P. Artzner, F. Delbaen, J.-M. Eber, and D. Heath. Coherent measures of risk. Mathematical Finance, 9(3):203–228, 1999.

A. Azaron, K. Brown, S. Tarim, and M. Modarres. A multi-objective stochastic programming approach for supply chain design considering risk. International Journal of Production Economics, 116(1):129–138, 2008.

G. M. Becker, M. H. DeGroot, and J. Marschak. Measuring utility by a single-response sequential method. Behavioral Science, 9(3):226–232, 1964.

A. Ben-Tal and M. Teboulle. An old-new concept of convex risk measures: The optimized certainty equivalent. Mathematical Finance, 17(3):449–476, 2007.

D. Bertsimas and A. O'Hair. Learning preferences under noise and loss aversion: An optimization approach.
Operations Research, 61(5):1190–1199, 2013.

H. P. Binswanger. Attitudes toward risk: Experimental measurement in rural India. American Journal of Agricultural Economics, 62(3):395–407, 1980.

D. B. Brown and M. Sim. Satisficing measures for analysis of risky positions. Management Science, 55(1):71–84, 2009.

D. B. Brown, E. D. Giorgi, and M. Sim. Aspirational preferences and their representation by risk measures. Management Science, 58(11):2095–2113, 2012.

C. Burgert and L. Rüschendorf. Consistent risk measures for portfolio vectors. Insurance: Mathematics and Economics, 38(2):289–297, 2006.

S. Cerreia-Vioglio, F. Maccheroni, M. Marinacci, and L. Montrucchio. Risk measures: rationality and diversification. Mathematical Finance: An International Journal of Mathematics, Statistics and Financial Economics, 21(4):743–774, 2011.

A. Chen, J. Kim, S. Lee, and Y. Kim. Stochastic multi-objective models for network design problem. Expert Systems with Applications, 37(2):1608–1619, 2010.

A. Cherny and D. Madan. New measures for performance evaluation. The Review of Financial Studies, 22(7):2571–2606, 2009.

G. Debreu. Continuity properties of Paretian utility.
International Economic Review, 30(5):285–293, 1964.

E. Delage and J. Y.-M. Li. Minimizing risk exposure when the choice of a risk measure is ambiguous. Management Science, 64(1), 2017.

E. Delage, S. Guo, and H. Xu. Shortfall Risk Models When Information of Loss Function Is Incomplete. GERAD HEC Montréal, 2017.

D. Dentcheva and A. Ruszczyński. Optimization with stochastic dominance constraints. SIAM Journal on Optimization, 14(2):548–566, 2003.

D. Dentcheva and A. Ruszczyński. Optimality and duality theory for stochastic optimization problems with nonlinear dominance constraints.
Mathematical Programming, 99:329–350, 2004.

D. Dentcheva and A. Ruszczyński. Optimization with multivariate stochastic dominance constraints. Mathematical Programming, 117:111–127, 2009.

I. G. Dino and G. Üçoluk. Multiobjective design optimization of building space layout, energy, and daylighting performance. Journal of Computing in Civil Engineering, 31(5):04017025, 2017.

S. Drapeau and M. Kupper. Risk preferences and their robust representation. Mathematics of Operations Research, 38(1):28–62, 2013.

M. Ehrgott, J. Ide, and A. Schöbel. Minmax robustness for multi-objective optimization problems. European Journal of Operational Research, 239(1):17–31, 2014.

N. El Karoui and C. Ravanelli. Cash subadditive risk measures and interest rate ambiguity. Mathematical Finance: An International Journal of Mathematics, Statistics and Financial Economics, 19(4):561–590, 2009.

P. M. Esfahani and D. Kuhn. Data-driven distributionally robust optimization using the Wasserstein metric: Performance guarantees and tractable reformulations. Mathematical Programming, 171(1-2):115–166, 2018.

P. H. Farquhar. State of the art – utility assessment methods.
Management Science, 30(11):1283–1300, 1984.

D. Feeny, W. Furlong, G. W. Torrance, C. H. Goldsmith, Z. Zhu, S. DePauw, M. Denton, and M. Boyle. Multiattribute and single-attribute utility functions for the health utilities index mark 3 system. Medical Care, 40(2):113–128, 2002.

P. C. Fishburn and I. H. LaValle. Multiattribute expected utility without the Archimedean axiom. Journal of Mathematical Psychology, 36(4):573–591, 1992.

J. Fliege and R. Werner. Robust multiobjective optimization & applications in portfolio optimization. European Journal of Operational Research, 234(2):422–433, 2014.

A. Galichon and M. Henry. Dual theory of choice with multivariate risks. Journal of Economic Theory, 147(4):1501–1516, 2012.

I. Gilboa and M. Marinacci. Ambiguity and the Bayesian paradigm. In Readings in Formal Epistemology, pages 385–439. Springer, 2016.

I. Gilboa and D. Schmeidler. Maxmin expected utility with a non-unique prior. Journal of Mathematical Economics, 18(2):141–153, 1989.

A. H. Hamel and F. Heyde. Duality for set-valued measures of risk.
SIAM Journal on Financial Mathematics, 1(1):66–95, 2010.

W. Haskell, Z. Shen, and J. Shanthikumar. Optimization with a class of multivariate integral stochastic order constraints. Annals of Operations Research, 51(1):273–303, 2013.

W. B. Haskell, W. Huang, and H. Xu. Preference elicitation and robust optimization with multi-attribute quasi-concave choice functions. arXiv preprint arXiv:1805.06632, 2018.

J. Hu and S. Mehrotra. Robust decision making over a set of random targets or risk-averse utilities with an application to portfolio optimization. IIE Transactions, 47(4):358–372, 2015.

J. Hu, T. Homem-de-Mello, and S. Mehrotra. Risk-adjusted budget allocation models with application in homeland security. IIE Transactions, 43(12):819–839, 2011.

J. Hu, M. Bansal, and S. Mehrotra. Robust decision making using a general utility set. European Journal of Operational Research, 269(2):699–714, 2018.

E. Jouini, M. Meddeb, and N. Touzi. Vector-valued coherent risk measures. Finance and Stochastics, 8(4):531–552, 2004.

U. S. Karmarkar. Subjectively weighted utility: A descriptive extension of the expected utility model. Organizational Behavior and Human Performance, 21(1):61–72, 1978.

E. Karni and Z. Safra. "Preference reversal" and the observability of preferences by experimental methods. Econometrica: Journal of the Econometric Society, pages 675–685, 1987.

S.-W. Lam, T. S. Ng, M. Sim, and J.-H. Song. Multiple objectives satisficing under uncertainty.
Operations Research, 61(1):214–227, 2013.

A. Liefooghe, M. Basseur, L. Jourdan, and E.-G. Talbi. Combinatorial optimization of stochastic multi-objective problems: an application to the flow-shop scheduling problem. In Evolutionary Multi-Criterion Optimization, pages 457–471. Springer, 2007.

F. Maccheroni, M. Marinacci, and A. Rustichini. Ambiguity aversion, robustness, and the variational representation of preferences. Econometrica, 74(6):1447–1498, 2006.

J. M. Miyamoto and P. Wakker. Multiattribute utility theory without expected utility foundations. Operations Research, 44(2):313–326, 1996.

A. Müller and D. Stoyan. Comparison Methods for Stochastic Models and Risks. John Wiley and Sons, Inc., 2002.

N. Noyan and G. Rudolf. Optimization with stochastic preferences based on a general class of scalarization functions. Operations Research, 66(2):463–486, 2018.

F. Plastria. Lower subdifferentiable functions and their minimization by cutting planes. Journal of Optimization Theory and Applications, 46(1):37–53, 1985.

C. Puppe. Distorted Probabilities and Choice under Risk. Springer-Verlag, Berlin, 1991.

T. L. Saaty. How to make a decision: the analytic hierarchy process.
European Journal of Operational Research, 48(1):9–26, 1990.

M. Shaked and J. G. Shanthikumar. Stochastic Orders. Springer, 2007.

J. E. Smith and R. L. Keeney. Your money or your life: A prescriptive model for health, safety, and consumption decisions. Management Science, 51(9):1309–1325, 2005.

L. L. Thurstone. A law of comparative judgment. Psychological Review, 34(4):273, 1927.

G. W. Torrance, M. H. Boyle, and S. P. Horwood. Application of multi-attribute utility theory to measure social preferences for health states. Operations Research, 30(6):1043–1069, 1982.

K. E. Train. Discrete Choice Methods with Simulation. Cambridge University Press, 2009.

C. Tseng and T. Lu. Minimax multiobjective optimization in structural design. International Journal for Numerical Methods in Engineering, 30(6):1213–1228, 1990.

I. Tsetlin and R. L. Winkler. On equivalent target-oriented formulations for multiattribute utility. Decision Analysis, 3(2):94–99, 2006.

I. Tsetlin and R. L. Winkler. Decision making with multiattribute performance targets: The impact of changes in performance and target distributions. Operations Research, 55(2):226–233, 2007.

I. Tsetlin and R. L. Winkler. Multiattribute utility satisfying a preference for combining good with bad. Management Science, 55(12):1942–1952, 2009.

P. Vayanos, D. McElfresh, Y. Ye, J. Dickerson, and E. Rice. Active preference elicitation via adjustable robust optimization. arXiv preprint arXiv:2003.01899, 2020.

J. Von Neumann and O. Morgenstern. Theory of games and economic behavior. Bull. Amer. Math. Soc., 51(7):498–504, 1945.

B. Von Stengel. Decomposition of multiattribute expected-utility functions. Annals of Operations Research, 16(1):161–183, 1988.

W. Wang and H. Xu. Robust spectral risk optimization when information on risk spectrum is incomplete. Available at Optimization Online, 2020.

M. Weber. Decision making with incomplete information. European Journal of Operational Research, 28(1):44–57, 1987.

A. Zakariazadeh, S. Jadid, and P. Siano. Multi-objective scheduling of electric vehicles in smart distribution system. Energy Conversion and Management, 79:43–53, 2014.
A Proofs for Section 3 (the PRO problem)
A.1 Proof of Proposition 3.3 (formulation of robust choice function withbenchmark)
We are concerned with the relative difference $\phi(X) - \phi(Y)$, so we do not have to force $\phi(Y)$ to take a specific value. To understand how a benchmark changes the setup, let us momentarily withdraw the normalization requirement from $\mathcal{R}(\mathcal{E})$ and define:
$$\hat{\mathcal{R}}(\mathcal{E}) := \{\phi \in \mathcal{R}_{QCo} : [\text{Eli}], [\text{Lip}], \ \phi(Y) \in \mathbb{R}\}.$$
We require $\phi(Y)$ to be finite-valued (i.e., $\phi(Y) \in \mathbb{R}$) to avoid the situation where $\psi_{\hat{\mathcal{R}}(\mathcal{E})}(X; Y) = -\infty$ for some $X \in \mathcal{L}$. We can define the sets
$$\hat{\mathcal{R}}_m(\mathcal{E}) := \{\phi \in \hat{\mathcal{R}}(\mathcal{E}) : \phi(Y) = m\}, \quad \forall m \in \mathbb{R},$$
to form the partition $\hat{\mathcal{R}}(\mathcal{E}) = \cup_{m \in \mathbb{R}} \hat{\mathcal{R}}_m(\mathcal{E})$. Consequently,
$$\psi_{\hat{\mathcal{R}}(\mathcal{E})}(X; Y) = \inf_{\phi \in \hat{\mathcal{R}}(\mathcal{E})} \{\phi(X) - \phi(Y)\} = \inf_{\phi \in \cup_{m \in \mathbb{R}} \hat{\mathcal{R}}_m(\mathcal{E})} \{\phi(X) - \phi(Y)\} = \inf_{m \in \mathbb{R}} \inf_{\phi \in \hat{\mathcal{R}}_m(\mathcal{E})} \{\phi(X) - \phi(Y)\} = \inf_{\phi \in \hat{\mathcal{R}}_0(\mathcal{E})} \{\phi(X) - \phi(Y)\} = \inf_{\phi \in \hat{\mathcal{R}}_0(\mathcal{E})} \phi(X),$$
where the second-to-last equality uses the fact that $\phi \in \hat{\mathcal{R}}_0(\mathcal{E})$ if and only if $\phi + m \in \hat{\mathcal{R}}_m(\mathcal{E})$ for all $m \in \mathbb{R}$, and the last equality uses the fact that $\phi(Y) = 0$ for all $\phi \in \hat{\mathcal{R}}_0(\mathcal{E})$. Using the new preference ambiguity set $\mathcal{R}(\mathcal{E}; Y) := \{\phi \in \mathcal{R}_{QCo} : [\text{Eli}], [\text{Lip}], \ \phi(Y) = 0\}$, normalized at the benchmark, we then have $\psi_{\hat{\mathcal{R}}(\mathcal{E})}(X; Y) = \psi_{\mathcal{R}(\mathcal{E}; Y)}(X)$ for all $X \in \mathcal{L}$.

A.2 Proof of Theorem 3.4 (two-stage decomposition)
We make use of the following technical lemma in our upcoming argument; it shows that the preference ambiguity set $\mathcal{R}(\mathcal{E})$ is closed under minimization.

Lemma A.1. Let $\phi_1, \ldots, \phi_I \in \mathcal{R}(\mathcal{E})$ and define $\psi(\cdot) := \min_{i = 1, \ldots, I} \phi_i(\cdot)$; then $\psi \in \mathcal{R}(\mathcal{E})$.

Proof. The function $\psi$ is monotonic, quasi-concave, and $L$-Lipschitz since these properties are all preserved under minimization. Similarly, for each $k = 1, \ldots, K$, since $\phi_i(W_k) \ge \phi_i(Y_k)$ for all $i = 1, \ldots, I$, we must have $\psi(W_k) \ge \psi(Y_k)$. Finally, since $\phi_i(W) = 0$ for all $i = 1, \ldots, I$, we must have $\psi(W) = 0$.

For each $\theta \in \Theta$, let us consider the problem:
\[ P(\theta) := \inf_{\phi \in \mathcal{R}(\mathcal{E})} \phi(\theta), \]
which just minimizes $\phi(\theta)$ over all $\phi \in \mathcal{R}(\mathcal{E})$ (in contrast to the value problem $P$, which minimizes the sum of the values $\phi(\theta)$ over all $\theta \in \Theta$). For each $\theta \in \Theta$, let $\phi^*_\theta \in \mathcal{R}(\mathcal{E})$ be an optimal solution of Problem $P(\theta)$. By Lemma A.1, the set $\mathcal{R}(\mathcal{E})$ is closed under minimization, so we must have $\phi^* := \min_{\theta \in \Theta} \phi^*_\theta \in \mathcal{R}(\mathcal{E})$ as well. This implies that $\psi_{\mathcal{R}(\mathcal{E})}(\theta) \le \phi^*(\theta)$ for all $\theta \in \Theta$. But $\phi^*(\theta) = \phi^*_\theta(\theta) \le \phi(\theta)$ for all $\phi \in \mathcal{R}(\mathcal{E})$ and $\theta \in \Theta$. Subsequently,
\[ \sum_{\theta \in \Theta} \phi^*(\theta) = \sum_{\theta \in \Theta} \phi^*_\theta(\theta) \le \sum_{\theta \in \Theta} \phi(\theta), \]
for any feasible $\phi \in \mathcal{R}(\mathcal{E})$. It follows that $\phi^*(\theta) = \psi_{\mathcal{R}(\mathcal{E})}(\theta)$ must hold for all $\theta \in \Theta$, and we have the lower bound $\sum_{\theta \in \Theta} \phi^*_\theta(\theta)$ on the optimal value of Problem $P$. Since $\phi(\theta) \ge \phi^*_\theta(\theta)$ for all $\theta \in \Theta$, this lower bound is only attained for $\phi \in \mathcal{R}_{QCo}$ such that $\phi(\theta) = \phi^*_\theta(\theta)$ for all $\theta \in \Theta$. Thus, the optimal solution of Problem $P$ has the unique values $\phi^*_\theta(\theta) = \psi_{\mathcal{R}(\mathcal{E})}(\theta)$ for all $\theta \in \Theta$.

We now show that evaluation of $\psi_{\mathcal{R}(\mathcal{E})}(X)$ for some $X \in \mathcal{L}$ can be thought of as minimizing $\phi(X)$ over the support functions $\phi \in \mathcal{R}_{QCo}$ of $\psi_{\mathcal{R}(\mathcal{E})}$ at $X \in \mathcal{L}$.

Lemma A.2. For all $X \in \mathcal{L}$, we have
\[ \psi_{\mathcal{R}(\mathcal{E})}(X) = \inf_{\phi \in \mathcal{R}_{QCo}} \{ \phi(X) : [\mathrm{Lip}],\ \phi(\theta) \ge \psi_{\mathcal{R}(\mathcal{E})}(\theta),\ \forall \theta \in \Theta \}. \]

Proof.
Define
\[ \tilde{\psi}(X) := \inf_{\phi \in \mathcal{R}_{QCo}} \{ \phi(X) : [\mathrm{Lip}],\ \phi(\theta) \ge \psi_{\mathcal{R}(\mathcal{E})}(\theta),\ \forall \theta \in \Theta \}. \]
It is immediate that $\tilde{\psi}(X) \le \psi_{\mathcal{R}(\mathcal{E})}(X)$, since $\tilde{\psi}$ is only required to majorize $\psi_{\mathcal{R}(\mathcal{E})}$ on $\Theta$ (and not on all of $\mathcal{L}$). For the other direction, note that the set $\{ \phi \in \mathcal{R}_{QCo} : [\mathrm{Lip}],\ \phi(\theta) \ge \psi_{\mathcal{R}(\mathcal{E})}(\theta),\ \forall \theta \in \Theta \}$ is closed under minimization, so we have
\[ \tilde{\psi}(X) = \inf_{\phi \in \mathcal{R}_{QCo}} \{ \phi(X) : [\mathrm{Lip}],\ \phi(\theta) = \psi_{\mathcal{R}(\mathcal{E})}(\theta),\ \forall \theta \in \Theta \}. \]
Then, we see that $\tilde{\psi}$ satisfies [Nor], [Lip], and [Eli] and thus $\tilde{\psi} \in \mathcal{R}(\mathcal{E})$. It follows that $\psi_{\mathcal{R}(\mathcal{E})}(X) \le \tilde{\psi}(X)$ must also hold, completing the argument.

B Proofs for Section 4 (the value problem)
B.1 Proof of Theorem 4.2 (disjunctive programming reformulation)
We confirm that the value Problem $P$ and the interpolation problem $P(\vec{X}; v^*)$ can both be expressed as disjunctive programming problems for a finite sample space $\Omega$. We begin with a technical lemma that characterizes Lipschitz continuous quasi-concave functions in terms of their upper subgradients.

Lemma B.1. Let $\|\cdot\|$ be a norm on $\mathbb{R}^d$ and let $\|\cdot\|_*$ be its dual norm. A quasi-concave function $f : \mathbb{R}^d \to \mathbb{R}$ is $L$-Lipschitz with respect to $\|\cdot\|$ if and only if for every $x \in \mathbb{R}^d$, there exists an upper subgradient $a \in \partial^+ f(x)$ such that $\|a\|_* \le L$.

Proof. See Plastria (1985) for the proof of the case of the Euclidean norm $\|\cdot\| = \|\cdot\|_2$. This proof can be generalized to any norm using the generalized Cauchy–Schwarz inequality $\langle x, y \rangle \le \|x\|_* \|y\|$ for all $x, y \in \mathbb{R}^d$.

Suppose that for every $x \in \mathbb{R}^d$, there exists an upper subgradient $a \in \partial^+ f(x)$ such that $\|a\|_* \le L$. Let $y \in \mathbb{R}^d$ and assume without loss of generality that $f(y) > f(x)$. Then
\[ 0 < f(y) - f(x) \le \langle a, y - x \rangle \le \|a\|_* \|y - x\| \le L \|y - x\|. \]
For the other direction, suppose that $f$ is $L$-Lipschitz and quasi-concave. For any $x \in \mathbb{R}^d$, let $S$ be the upper level set of $f$ at $x$, which is convex by quasi-concavity of $f$. Then, there exists a separating hyperplane for $x$ and $S$. Specifically, there exists $a_0$ with $\|a_0\| = 1$ such that $\langle y - x, a_0 \rangle \ge 0$ for all $y \in S$. Let $a = L a_0 / \|a_0\|_*$; we will prove that $a$ is an upper subgradient of $f$. Let $e_0 \in \arg\max_{\|e\| = 1} \langle e, a \rangle$; then we have $\langle e_0, a \rangle = \|a\|_*$ by definition of the dual norm. Now fix any $y \in S$, and let $y'$ lie on the separating hyperplane (i.e., $\langle a_0, y' - x \rangle = 0$) such that $y - y' = \|y - y'\| e_0$. Then, we have
\[ \langle a, y - x \rangle = \langle a, y - y' \rangle = \langle a, e_0 \rangle \|y - y'\| = \|a\|_* \|y - y'\| = L \|y - y'\|. \]
On the other hand, we have $f(y') \le f(x)$ since $y'$ lies on the separating hyperplane. By Lipschitz continuity of $f$, it follows that
\[ f(y) - f(x) \le f(y) - f(y') \le L \|y - y'\|. \]
Combining the above two inequalities, we see that $f(y) - f(x) \le \langle a, y - x \rangle$ for all $y \in S$. Since $\|a\|_* = L$ by construction, the desired conclusion follows.

We now present another technical lemma that characterizes quasi-concave functions via the class of "hockey stick" support functions. This lemma provides the theoretical foundation for the disjunctive programming reformulation of $P$ and $P(\vec{X}; v^*)$.

Lemma B.2.
Let $f : \mathbb{R}^d \to \mathbb{R}$. The following assertions hold.

(i) Suppose that $f$ is upper subdifferentiable on its domain. Then, $f$ is quasi-concave, upper semi-continuous, and
\[ f(x) = \inf_{j \in \mathcal{J}} h_j(x), \quad \forall x \in \mathrm{dom}\, f, \quad (15) \]
where $\mathcal{J}$ is a possibly infinite index set and $h_j(x) = \max\{ \langle a_j, x \rangle + b_j,\, c_j \}$ for $j \in \mathcal{J}$. Moreover, suppose $f$ is in addition $L$-Lipschitz continuous with respect to $\|\cdot\|_\infty$. Then $\|a_j\| \le L$ for all $j \in \mathcal{J}$.

(ii) If $f$ has a representation as Eq. (15), then it is quasi-concave. Moreover, if $a_j \ge 0$ for all $j \in \mathcal{J}$, then $f$ is non-decreasing. Conversely, if $f$ is non-decreasing and quasi-concave, then there exists a set of hockey-stick type functions $\{h_j\}$ with $a_j \ge 0$ such that representation Eq. (15) holds.

(iii) For any finite set $\Theta \subset \mathbb{R}^d$ and values $\{v_\theta\}_{\theta \in \Theta}$, the function $\hat{f} : \mathbb{R}^d \to \mathbb{R}$ defined by
\[ \hat{f}(x) := \inf_{v, a}\ v \quad \text{s.t.} \quad v + \max\{ \langle a, \theta - x \rangle,\, 0 \} \ge v_\theta, \quad \forall \theta \in \Theta, \quad (16) \]
is quasi-concave.

Proof. Part (i). By Theorem 2.1 of Plastria (1985), upper subdifferentiability of $f$ implies quasi-concavity and upper semi-continuity. Moreover, for any $x \in \mathrm{dom}\, f$ and any $s_x \in \partial^+ f(x)$, $f$ is supported by $h_x(y) = \max\{ f(x) + \langle s_x, y - x \rangle,\, f(x) \}$. By taking the infimum of all such support functions, we have $f(y) \le \inf_{x \in \mathrm{dom} f} h_x(y)$ for all $y \in \mathrm{dom}\, f$. On the other hand, for any $y \in \mathrm{dom}\, f$ we have $f(y) = h_y(y) \ge \inf_x h_x(y)$. These two inequalities together establish (15). The remaining part follows from Lemma B.1.

Part (ii). For any $t \in \mathbb{R}$, we have
\[ \{ x \in \mathbb{R}^d : f(x) \ge t \} = \Big\{ x \in \mathbb{R}^d : \inf_{j \in \mathcal{J}} h_j(x) \ge t \Big\} = \cap_{j \in \mathcal{J}} \{ x \in \mathbb{R}^d : h_j(x) \ge t \}. \]
Each set $\{ x \in \mathbb{R}^d : h_j(x) \ge t \}$ is convex by quasi-concavity of $h_j$ for all $j \in \mathcal{J}$, and the intersection of convex sets is convex. Consequently, the upper level sets of $f$ are convex.

Now we establish monotonicity. When $a_j \ge 0$ for all $j \in \mathcal{J}$, we can use Part (i) to show that $f(x) \ge f(y)$ for all $x \ge y$. Conversely, since $f$ is also upper subdifferentiable, there exists $s \in \mathbb{R}^d$ such that $f(y) \le f(x) + \langle s, y - x \rangle$ for all $y$ with $f(y) \ge f(x)$. We assert that $s \ge 0$. For a contradiction, suppose $s$ has at least one component $i^*$ such that $s_{i^*} < 0$. Let $e_{i^*}$ be the unit vector with the $i^*$-th component equal to 1 and all the rest equal to zero. Then, for any $\delta > 0$, we have $f(x + \delta e_{i^*}) \le f(x) + \delta s_{i^*} < f(x)$, which contradicts monotonicity of $f$. By this reasoning, for any fixed $x \in \mathbb{R}^d$, all upper subgradients $s_x$ of $f$ at $x$ must be non-negative. It follows that $h_x(y) = \max\{ \langle a_x, y - x \rangle + f(x),\, f(x) \}$, where $a_x = s_x \ge 0$, is a hockey-stick type support function, and $\{h_x\}_{x \in \mathbb{R}^d}$ is a collection of hockey-stick type support functions.

Part (iii). Haskell et al. (2018) prove in Theorem 4.3 that the function $\tilde{f} : \mathbb{R}^d \to \mathbb{R}$ defined by
\[ \tilde{f}(x) := \inf_{a, b, c}\ \max\{ \langle a, x \rangle + b,\, c \} \quad \text{s.t.} \quad \max\{ \langle a, \theta \rangle + b,\, c \} \ge v_\theta, \quad \forall \theta \in \Theta, \quad (17) \]
is quasi-concave. We will prove that $\hat{f} = \tilde{f}$ by showing that Problems (16) and (17) are equivalent. For any fixed $x$, suppose $(a, b, c)$ is feasible for Problem (17) and let $v := \max\{ \langle a, x \rangle + b,\, c \}$. We will show that $(v, a)$ is feasible for Problem (16) and achieves the same objective value. First, $v + \max\{ \langle a, \theta - x \rangle,\, 0 \} \ge \max\{ \langle a, \theta \rangle + b,\, c \}$ since
\[ \max\{ \langle a, x \rangle + b,\, c \} + \max\{ \langle a, \theta - x \rangle,\, 0 \} \ge \max\{ \langle a, \theta \rangle + b,\, c \} \]
is trivially true. Hence, $v + \max\{ \langle a, \theta - x \rangle,\, 0 \} \ge v_\theta$ for all $\theta \in \Theta$. It follows that $(v, a)$ is feasible for Problem (16), and that the objective values of Problems (16) and (17) are equal by construction. Now suppose $(v, a)$ is feasible for Problem (16), and let $c := v$ and $b := c - \langle a, x \rangle$. Then $\max\{ \langle a, \theta \rangle + b,\, c \} = \max\{ \langle a, \theta - x \rangle + v,\, v \} \ge v_\theta$ for all $\theta \in \Theta$. It follows that $(a, b, c)$ is feasible for Problem (17), and again the objective values of the two problems are the same by definition.

The strategy of the following proof, which verifies the correctness of the disjunctive programming formulation of the value problem, is similar to Armbruster and Delage (2015).
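The hockey-stick representation in Part (i) of Lemma B.2 is easy to check numerically. Below is a minimal pure-Python sketch under our own toy setup: the monotone quasi-concave function $f(x) = \min(x_1, x_2)$, its upper subgradient (an indicator of the minimizing coordinate), and the sample grid are all illustrative choices, not data from the paper.

```python
# Check Lemma B.2(i) on a toy function: a monotone quasi-concave f equals
# the pointwise infimum of "hockey stick" support functions
#     h_x(y) = max{ f(x) + <s_x, y - x>, f(x) },
# where s_x is an upper subgradient of f at x.
import itertools

def f(x):
    # f(x) = min(x_1, x_2): monotone and quasi-concave (in fact concave)
    return min(x)

def upper_subgradient(x):
    # indicator of the coordinate attaining the minimum
    i = min(range(len(x)), key=lambda j: x[j])
    return [1.0 if j == i else 0.0 for j in range(len(x))]

def hockey_stick(x, y):
    # support function of f anchored at x, evaluated at y
    s = upper_subgradient(x)
    linear = f(x) + sum(si * (yi - xi) for si, yi, xi in zip(s, y, x))
    return max(linear, f(x))

# the envelope inf_x h_x(y) recovers f(y) exactly on the sample grid
grid = [(float(a), float(b)) for a, b in itertools.product(range(5), repeat=2)]
for y in grid:
    envelope = min(hockey_stick(x, y) for x in grid)
    assert abs(envelope - f(y)) < 1e-12
print("hockey-stick envelope matches f on all grid points")
```

Each $h_x$ majorizes $f$ while touching it at $x$, so the infimum over anchors reproduces $f$ at every sampled point; this is exactly the mechanism behind representation (15).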
Lemma B.3.
Suppose Assumption 4.1 holds; then Problem $P$ is equivalent to Problem (1).

Proof. Given a value vector $v = (v_\theta)_{\theta \in \Theta}$, we define the set
\[ \mathcal{R}(v) := \{ \phi : \mathcal{L} \to \mathbb{R} : \phi(\theta) = v_\theta,\ \forall \theta \in \Theta \} \]
of all choice functions that match these values on $\Theta$. We then have the following equivalences:
\[ P \equiv \inf_{\phi \in \mathcal{R}(\mathcal{E})} \sum_{\theta \in \Theta} \phi(\theta) \equiv \min_{v \in \mathbb{R}^{|\Theta|}}\ \inf_{\phi \in \mathcal{R}(\mathcal{E}) \cap \mathcal{R}(v)} \sum_{\theta \in \Theta} \phi(\theta) \equiv \min_{v \in \mathbb{R}^{|\Theta|}} \Big\{ \sum_{\theta \in \Theta} v_\theta : \exists \phi \in \mathcal{R}(\mathcal{E}) \cap \mathcal{R}(v) \Big\}. \]
Next we define the additional constraint sets
\[ \mathcal{R}_{No} := \{ \phi : \mathcal{L} \to \mathbb{R} : [\mathrm{Nor}] \}, \quad \mathcal{R}_{Li} := \{ \phi : \mathcal{L} \to \mathbb{R} : [\mathrm{Lip}] \}, \quad \mathcal{R}_{El} := \{ \phi : \mathcal{L} \to \mathbb{R} : [\mathrm{Eli}] \}. \]
Then, using the fact $\mathcal{R}(\mathcal{E}) = \mathcal{R}_{No} \cap \mathcal{R}_{Li} \cap \mathcal{R}_{El} \cap \mathcal{R}_{QCo}$, we see that
\begin{align*}
P = \min_{v \in \mathbb{R}^{|\Theta|}}\ & \sum_{\theta \in \Theta} v_\theta \\
\text{s.t. } & \phi \in \mathcal{R}_{No} \cap \mathcal{R}(v), \quad \phi \in \mathcal{R}_{QCo} \cap \mathcal{R}_{Li} \cap \mathcal{R}(v), \quad \phi \in \mathcal{R}_{El} \cap \mathcal{R}(v).
\end{align*}
First, we note that these constraints only restrict the values of $\phi$ on $\Theta$ via the choice of value vector $v$. The constraint $\phi \in \mathcal{R}_{No} \cap \mathcal{R}(v)$ is equivalent to Eq. (1e) because the set $\mathcal{R}_{No}$ only restricts the value on the normalizing prospect to be zero. The constraint $\phi \in \mathcal{R}_{QCo} \cap \mathcal{R}_{Li} \cap \mathcal{R}(v)$ is equivalent to Eq. (1b) (which requires the support function $\vec{Z} \mapsto v_\theta + \max\{ \langle s_\theta, \vec{Z} - \theta \rangle,\, 0 \}$ to majorize $\phi$ on $\Theta$) and Eq. (1c) (which requires the norm of the upper subgradient $\|s_\theta\|$ to be bounded by $L$) by Lemma B.2. Finally, the constraint $\phi \in \mathcal{R}_{El} \cap \mathcal{R}(v)$ is equivalent to Eq. (1d) because $\mathcal{R}_{El}$ only restricts the choice function values on $\Theta$.

Next we verify the correctness of the disjunctive programming formulation of the interpolation problem.

Lemma B.4.
Suppose Assumption 4.1 holds, and let $v^*$ be the optimal solution of Problem $P$. Then, $\psi_{\mathcal{R}(\mathcal{E})}(\vec{X}) = \mathrm{val}(P(\vec{X}; v^*))$ for all $\vec{X} \in \mathcal{L}$.

Proof. First, by Theorem 3.4 we have
\[ \psi_{\mathcal{R}(\mathcal{E})}(\vec{X}) = \inf_{\phi \in \mathcal{R}_{QCo}} \{ \phi(\vec{X}) : [\mathrm{Lip}],\ \phi(\theta) \ge \psi_{\mathcal{R}(\mathcal{E})}(\theta),\ \forall \theta \in \Theta \}. \]
Since $v^*$ is the optimal solution of Problem $P$, for any $\vec{X} \in \mathcal{L}$ we have the equality
\[ \psi_{\mathcal{R}(\mathcal{E})}(\vec{X}) = \min_v \{ v \in \mathbb{R} : \phi(\vec{X}) = v,\ \phi \in \mathcal{R}_{QCo},\ [\mathrm{Lip}],\ \phi(\theta) \ge v^*_\theta,\ \forall \theta \in \Theta \}. \]
The value $\psi_{\mathcal{R}(\mathcal{E})}(\vec{X})$ is then equal to the optimal value of:
\begin{align*}
\min_{a, v}\ & v & (18a) \\
\text{s.t. } & v + \max\{ \langle a, \theta - \vec{X} \rangle,\, 0 \} \ge v^*_\theta, \quad \forall \theta \in \Theta, & (18b) \\
& a \ge 0, \quad \|a\| \le L. & (18c)
\end{align*}
In particular, Eq. (18b) ensures that the support function $\vec{Z} \mapsto v + \max\{ \langle a, \vec{Z} - \vec{X} \rangle,\, 0 \}$ majorizes $\psi_{\mathcal{R}(\mathcal{E})}$ on $\Theta$ via the value vector $v^*$. Finally, Eq. (18c) enforces property [Lip].

B.2 Proof of Theorem 4.3 (correctness of sorting algorithm)
We repeat the value problem $P$ here for ease of reference:
\begin{align*}
\min_{v, s}\ & \sum_{\theta \in \Theta} v_\theta & (19a) \\
\text{s.t. } & v_\theta + \max\{ \langle s_\theta, \theta' - \theta \rangle,\, 0 \} \ge v_{\theta'}, \quad \forall (\theta, \theta') \in \widehat{\mathcal{E}}, & (19b) \\
& s_\theta \ge 0, \quad \|s_\theta\| \le L, \quad \forall \theta \in \Theta, & (19c) \\
& v_\theta \ge v_{\theta'}, \quad \forall (\theta, \theta') \in \mathcal{E}, & (19d) \\
& v_W = 0. & (19e)
\end{align*}
Our ultimate goal is to solve Problem (19). We prove that the values $v = (u_\theta)_{\theta \in \Theta}$ corresponding to $\mathcal{D}$ returned by Algorithm 1 are the optimal solution of Problem (19) in the following three steps:
1. First, we verify that the candidate solution constructed in Algorithm 1 is well-defined.
2. Second, we verify that this candidate solution gives a lower bound on the optimal solution of Problem (19).
3. Finally, we verify that this candidate solution is also feasible for Problem (19). Thus, it must be the (unique) optimal solution of Problem (19).

We will now justify the procedure of Algorithm 1, taking Problem (19) as the starting point. We first present a modified version of Algorithm 1 that recursively constructs a candidate solution to Problem (19) by selecting one prospect at a time. This modification of Algorithm 1 is just a convenient artifact of this proof. We will later verify that it is the same as Algorithm 1.

We let $\mathcal{D}_t$ for $t = 1, 2, \ldots, J - 1$ denote the partial decomposition, initialized with $\mathcal{D}_1 = \{ (W, 0) \}$. Given $\mathcal{D}_t$ (which is composed of tuples $(\theta', u_{\theta'})$) and $\theta \notin \mathcal{D}_t$, consider the disjunctive program:
\begin{align*}
P_D(\theta; \mathcal{D}_t) := \min_{v_\theta, s_\theta}\ & v_\theta & (20a) \\
\text{s.t. } & v_\theta + \max\{ \langle s_\theta, \theta' - \theta \rangle,\, 0 \} \ge u_{\theta'}, \quad \forall \theta' \in \mathcal{D}_t, & (20b) \\
& s_\theta \ge 0, \quad \|s_\theta\| \le L, & (20c) \\
& v_\theta \ge u_{\theta'}, \quad \forall (\theta, \theta') \in \mathcal{E},\ \theta' \in \mathcal{D}_t. & (20d)
\end{align*}
There are some similarities between Problem $P_D(\theta; \mathcal{D}_t)$ and Problem (19). However, Problem $P_D(\theta; \mathcal{D}_t)$ is different since it only tries to minimize $v_\theta$ and it only enforces a subset of the constraints of Problem (19).
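To make the greedy recursion concrete, here is a minimal pure-Python sketch for a one-dimensional toy instance (all data are illustrative, not from the paper). With scalar prospects, no elicited comparisons, and the subgradient box $0 \le s_\theta \le L$, taking $s_\theta = L$ is always optimal, so $P_D(\theta; \mathcal{D}_t)$ has the closed-form value used below:

```python
# One-dimensional sketch of the greedy selection driven by P_D(theta; D_t).
# Closed form (scalar case, no elicited pairs, 0 <= s <= L):
#     val(P_D(theta; D_t)) = max_{(theta', u') in D_t} u' - L * max(theta' - theta, 0)
L = 1.0            # Lipschitz constant
W = 0.0            # normalizing prospect with value 0
prospects = [1.0, 2.0, -1.0]

def inner_value(theta, D):
    """Closed-form optimal value of P_D(theta; D_t) in the scalar case."""
    return max(u - L * max(tp - theta, 0.0) for tp, u in D)

D = [(W, 0.0)]     # partial decomposition, initialized at (W, 0)
remaining = list(prospects)
while remaining:
    # select the unsorted prospect with the largest predicted value
    theta = max(remaining, key=lambda t: inner_value(t, D))
    D.append((theta, inner_value(theta, D)))
    remaining.remove(theta)

print(D)  # [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (-1.0, -1.0)]
```

The values appear in non-increasing order, consistent with Lemma B.6: prospects above the normalizing prospect receive robust value 0, while the prospect below it is pushed down only as far as the Lipschitz bound allows.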
There are also some similarities between Problem $P_D(\theta; \mathcal{D}_t)$ and the Problem $P(\theta; \mathcal{D}_t)$ that is called by $\pi(\theta; \mathcal{D}_t)$. For now, it is more convenient for us to use $P_D(\theta; \mathcal{D}_t)$, and then establish equivalence with $\pi(\theta; \mathcal{D}_t)$ later. A modified version of Algorithm 1 follows that is based on solving a sequence of instances of Problem $P_D(\theta; \mathcal{D}_t)$.

Algorithm 5: (Modified) sorting algorithm for the value problem
Result: A decomposition $\mathcal{D}$.
Initialization: $\Theta$, $t = 1$, and $\mathcal{D}_t = \{ (W, 0) \}$;
while $t < J$ do
    Choose $\theta^* \in \arg\max_{\theta' \notin \mathcal{D}_t} \mathrm{val}(P_D(\theta'; \mathcal{D}_t))$, and set $u_{\theta^*} := \mathrm{val}(P_D(\theta^*; \mathcal{D}_t))$;
    Set $\mathcal{D}_{t+1} := \{ \mathcal{D}_t, (\theta^*, u_{\theta^*}) \}$;
    Set $t := t + 1$;
end
return $\mathcal{D} := \mathcal{D}_J$.

Algorithm 5 is similar to Algorithm 1, except that it uses the optimal value of the disjunctive programming problem $P_D(\theta'; \mathcal{D}_t)$ in place of the predictor $\pi(\theta'; \mathcal{D}_t)$ (which is a function of the optimal value of an LP). We pause to verify that the recursive construction of Algorithm 5 is well-defined.

Lemma B.5.
For all $t = 1, 2, \ldots, J - 1$, $\mathrm{val}(P_D(\theta; \mathcal{D}_t))$ is finite.

Proof. Since $W$ has value 0, and every prospect is finite, the optimal value $\mathrm{val}(P_D(\theta; \mathcal{D}_1))$ is finite because of the Lipschitz constraint. By induction, every sorted prospect in $\mathcal{D}_t$ has finite $u_\theta$-value, and so the optimal value $\mathrm{val}(P_D(\theta; \mathcal{D}_t))$ is also finite.

Additionally, we verify that Algorithm 5 will output a monotone decreasing sequence of $\{u_\theta\}_{\theta \in \Theta}$ values in $\mathcal{D}$.

Lemma B.6. If $\theta$ precedes $\theta'$ in $\mathcal{D}$, then $u_\theta \ge u_{\theta'}$.

Proof. Suppose $\theta_t$ succeeds $\mathcal{D}_t$ and $\theta_{t+1}$ succeeds $\mathcal{D}_{t+1}$ (this means $\theta_{t+1}$ must succeed $\theta_t$ in $\mathcal{D}$). Let $(v^t_{\theta_{t+1}}, s^t_{\theta_{t+1}})$ denote the optimal solution of Problem $P_D(\theta_{t+1}; \mathcal{D}_t)$. Since $\theta_t \in \arg\max_{\theta \notin \mathcal{D}_t} \mathrm{val}(P_D(\theta; \mathcal{D}_t))$, we must have $u_{\theta_t} = \mathrm{val}(P_D(\theta_t; \mathcal{D}_t)) \ge \mathrm{val}(P_D(\theta_{t+1}; \mathcal{D}_t)) = v^t_{\theta_{t+1}}$. Then, we see that $(v_{\theta_{t+1}}, s_{\theta_{t+1}}) := (u_{\theta_t}, s^t_{\theta_{t+1}})$ is feasible for Problem $P_D(\theta_{t+1}; \mathcal{D}_{t+1})$ since $(v^t_{\theta_{t+1}}, s^t_{\theta_{t+1}})$ is feasible and $u_{\theta_t} \ge v^t_{\theta_{t+1}}$. Thus, $\mathrm{val}(P_D(\theta_{t+1}; \mathcal{D}_{t+1})) \le u_{\theta_t}$ and we have $u_{\theta_t} \ge u_{\theta_{t+1}}$. The conclusion follows by induction.

Next we will verify that the construction of Algorithm 5 is a lower bound on the optimal value of Problem (19). We recall:
\begin{align*}
\min_{v, s}\ & v_\theta & (21a) \\
\text{s.t. } & v_\theta + \max\{ \langle s_\theta, \theta' - \theta \rangle,\, 0 \} \ge v_{\theta'}, \quad \forall \theta \neq \theta' \in \Theta, & (21b) \\
& s_\theta \ge 0, \quad \|s_\theta\| \le L, \quad \forall \theta \in \Theta, & (21c) \\
& v_\theta \ge v_{\theta'}, \quad \forall (\theta, \theta') \in \mathcal{E}, & (21d) \\
& v_W = 0, & (21e)
\end{align*}
which is equivalent to Problem $P(\theta)$.

Lemma B.7. For all $t = 1, 2, \ldots, J - 1$, $\mathrm{val}(P_D(\theta; \mathcal{D}_t)) \le \psi_{\mathcal{R}(\mathcal{E})}(\theta)$ for all $\theta \notin \mathcal{D}_t$.

Proof. We prove this statement by induction starting with $\mathcal{D}_1 = \{ (W, 0) \}$. We see $\mathrm{val}(P_D(\theta; \mathcal{D}_1))$ is a lower bound for $\psi_{\mathcal{R}(\mathcal{E})}(\theta)$ for all $\theta \notin \mathcal{D}_1$ (this is automatic because $P_D(\theta; \mathcal{D}_1)$ has fewer constraints than Problem (21), which computes $\psi_{\mathcal{R}(\mathcal{E})}(\theta)$ exactly). Proceeding inductively, if every estimate in $\mathcal{D}_t$ is a lower bound on the corresponding $\psi_{\mathcal{R}(\mathcal{E})}(\theta')$ for all $\theta' \in \mathcal{D}_t$, then the optimal value of Problem $P_D(\theta; \mathcal{D}_t)$ is a lower bound for all $\theta \notin \mathcal{D}_t$ (this follows because $P_D(\theta; \mathcal{D}_t)$ has fewer constraints than Problem (21), and by induction all of the values $v_{\theta'}$ for $\theta' \in \mathcal{D}_t$ are themselves lower bounds).

Next, we will verify that the construction of Algorithm 5 is also feasible for Problem (19).

Lemma B.8. The value vector $v = (u_\theta)_{\theta \in \Theta}$ corresponding to $\mathcal{D}$ is feasible for Problem (19).

Proof. Fix $\theta \in \Theta$, and suppose $\theta$ is the succeeding prospect of $\mathcal{D}_t$ for some $t = 1, 2, \ldots, J - 1$. By Lemma B.6 (on monotonicity), $u_\theta \ge u_{\theta'}$ for all $\theta' \notin \mathcal{D}_t$. As a result, Eqs. (19b) and (19c) are satisfied for $(\theta, \theta')$ with $\theta' \notin \mathcal{D}_t$, and hence for all pairs $(\theta, \theta')$ with $\theta' \in \Theta$ (because this constraint is already satisfied for all $(\theta, \theta')$ with $\theta' \in \mathcal{D}_t$ in Problem $P_D(\theta; \mathcal{D}_t)$). Eq. (19d) is also satisfied for all $(\theta, \theta')$ with $\theta' \notin \mathcal{D}_t$, and hence for all $(\theta, \theta')$ with $\theta' \in \Theta$ (again, because this constraint is already satisfied for all $(\theta, \theta')$ with $\theta' \in \mathcal{D}_t$ in Problem $P_D(\theta; \mathcal{D}_t)$). This reasoning applies to all $\theta \in \Theta$, and so the constructed candidate solution is feasible.

Finally, we will show that $\mathrm{val}(P_D(\theta; \mathcal{D}_t))$ is given by the following LP:
\begin{align*}
P(\theta; \mathcal{D}_t) := \min_{v_\theta, s_\theta}\ & v_\theta & (22a) \\
\text{s.t. } & v_\theta + \langle s_\theta, \theta' - \theta \rangle \ge u_{\theta'}, \quad \forall \theta' \in \mathcal{D}_t, & (22b) \\
& s_\theta \ge 0, \quad \|s_\theta\| \le L, & (22c) \\
& v_\theta \ge u_{\theta'}, \quad \forall (\theta, \theta') \in \mathcal{E},\ \theta' \in \mathcal{D}_t, & (22d)
\end{align*}
which we defined earlier for Algorithm 1. In particular, we establish that $\mathrm{val}(P_D(\theta; \mathcal{D}_t)) = \pi(\theta; \mathcal{D}_t) = \min\{ \mathrm{val}(P(\theta; \mathcal{D}_t)),\, v_t \}$.

Lemma B.9. For all $t = 1, 2, \ldots, J - 1$, $\mathrm{val}(P_D(\theta; \mathcal{D}_t)) = \pi(\theta; \mathcal{D}_t)$ for all $\theta \notin \mathcal{D}_t$.

Proof. First, we note that every feasible solution of $P(\theta; \mathcal{D}_t)$ is also feasible for $P_D(\theta; \mathcal{D}_t)$, so we must have $\mathrm{val}(P_D(\theta; \mathcal{D}_t)) \le \mathrm{val}(P(\theta; \mathcal{D}_t))$. Now let $v_\theta = \mathrm{val}(P_D(\theta; \mathcal{D}_t))$ and consider the two cases: (i) $v_\theta < v_t$ and (ii) $v_\theta = v_t$ (due to monotonicity, the case $v_\theta > v_t$ will not occur).

In the first case, we have $v_\theta < u_{\theta'}$ for every $\theta' \in \mathcal{D}_t$. Thus, the constraint $v_\theta + \max\{ \langle s_\theta, \theta' - \theta \rangle,\, 0 \} \ge u_{\theta'}$ in Problem $P_D(\theta; \mathcal{D}_t)$ is equivalent to $v_\theta + \langle s_\theta, \theta' - \theta \rangle \ge u_{\theta'}$ (or it would otherwise be infeasible). Hence, Problem $P_D(\theta; \mathcal{D}_t)$ is equivalent to Problem $P(\theta; \mathcal{D}_t)$ and we have $v_\theta = \mathrm{val}(P(\theta; \mathcal{D}_t))$.

In the second case, we have $v_t = v_\theta = \mathrm{val}(P_D(\theta; \mathcal{D}_t)) \le \mathrm{val}(P(\theta; \mathcal{D}_t))$, and it follows that $\pi(\theta; \mathcal{D}_t) = v_t = v_\theta$. To conclude, in both cases we have $v_\theta = \pi(\theta; \mathcal{D}_t)$.

By Lemma B.9, we can now claim that Algorithm 1 and Algorithm 5 are equivalent. To conclude the proof of Theorem 4.3, since the output of Algorithm 1 is both a lower bound and a feasible solution, it follows that Algorithm 1 returns the unique optimal solution of Problem (19). Moreover, Algorithm 1 solves $O(J^2)$ LPs in total since it solves $O(J)$ LPs in every iteration.

C Proofs for Section 5 (acceptance set representation)
C.1 Proof of Proposition 5.1 (interpolation for a fixed prospect)
Given $\vec{X} \in \mathcal{L}$, suppose we want to calculate $v_X = \psi_{\mathcal{R}(\mathcal{E})}(\vec{X})$. We can do this using Algorithm 1 to construct a decomposition $\mathcal{D}_{J+2}$ of the set $\Theta \cup \{ \theta_{J+1} \} \cup \{ \vec{X} \}$. We want to show $v_X \ge v \iff \mathrm{val}(P(\vec{X}; \mathcal{D}_t)) \ge v$. We consider two cases: (i) $v = v_t$ and (ii) $v_{t+1} < v < v_t$ for some $t = 1, \ldots, J$.

(i) For the first case, suppose that $\mathrm{val}(P(\vec{X}; \mathcal{D}_t)) \ge v_t$ (or equivalently that $\pi(\vec{X}; \mathcal{D}_t) = v_t$). By the properties of Algorithm 1, $\vec{X}$ precedes any prospect that is not in $\mathcal{D}_t$. Hence $\vec{X}$ is the succeeding prospect of $\mathcal{D}_{t'}$ for some $t' \le t$. On the one hand, if $t' < t$, then it follows that $v_X \ge v_{t'+1} \ge v_t$ by Lemma B.6. On the other hand, if $t' = t$, we have $v_X = \pi(\vec{X}; \mathcal{D}_t) = v_t$.

Conversely, if $v_X \ge v_t$, then $\vec{X}$ must precede any prospect that is not in $\mathcal{D}_t$. Again suppose $\vec{X}$ is the succeeding prospect of $\mathcal{D}_{t'}$ for some $t' \le t$. We then have that $\mathrm{val}(P_D(\vec{X}; \mathcal{D}_t)) \ge \mathrm{val}(P_D(\vec{X}; \mathcal{D}_{t'})) = v_X$, where the inequality follows because Problem $P_D(\vec{X}; \mathcal{D}_{t'})$ has fewer constraints than Problem $P_D(\vec{X}; \mathcal{D}_t)$. It then follows that $\mathrm{val}(P(\vec{X}; \mathcal{D}_t)) \ge \mathrm{val}(P_D(\vec{X}; \mathcal{D}_t)) \ge v_X \ge v_t$, where we use the fact that $\mathrm{val}(P_D(\theta; \mathcal{D}_t)) \le \mathrm{val}(P(\theta; \mathcal{D}_t))$ (see the proof of Lemma B.9).

(ii) For the second case, we can artificially find a prospect $\theta_v \in \mathcal{L}$ such that $\psi_{\mathcal{R}(\mathcal{E})}(\theta_v) = v$ (by continuity of the choice function, we can always find such a prospect). Then, we can insert the prospect $\theta_v$ into $\mathcal{D}_{J+1}$ to create the same setting as the previous case (i.e., $v_X \ge v \iff \mathrm{val}(P(\vec{X}; \mathcal{D}_t)) \ge v$).

C.2 Proof of Theorem 5.2 (acceptance set representation)
For $v_{t+1} < v \le v_t$ for some $t = 1, \ldots, J$, by Proposition 5.1 we know that $\vec{X} \in \mathcal{A}_v$ if and only if $\mathrm{val}(P(\vec{X}; \mathcal{D}_t)) \ge v$. By strong duality, this is equivalent to requiring $\mathrm{val}(D(\vec{X}; \mathcal{D}_t)) = \mathrm{val}(P(\vec{X}; \mathcal{D}_t)) \ge v$. However, since Problem $D(\vec{X}; \mathcal{D}_t)$ is a maximization problem, this latter inequality reduces to the feasibility problem:
\[ \Big\{ (p, q) : \sum_{\theta \in \mathcal{D}_t} v^*_\theta \cdot p_\theta - L q \ge v,\ \ \sum_{\theta \in \mathcal{D}_t} \theta \cdot p_\theta - \vec{X} \le q,\ \ \sum_{\theta \in \mathcal{D}_t} p_\theta = 1,\ p \ge 0,\ q \ge 0 \Big\}. \]
Since $\big( \sum_{\theta \in \mathcal{D}_t} v^*_\theta \cdot p_\theta - v \big) / L \ge q$ and $\sum_{\theta \in \mathcal{D}_t} \theta \cdot p_\theta - \vec{X} \le q$, we can eliminate $q$ from the above feasibility problem to obtain:
\[ \Big\{ p : \sum_{\theta \in \mathcal{D}_t} \theta \cdot p_\theta - \vec{X} \le \big( \sum_{\theta \in \mathcal{D}_t} v^*_\theta \cdot p_\theta - v \big) / L,\ \ \sum_{\theta \in \mathcal{D}_t} p_\theta = 1,\ p \ge 0 \Big\}. \]
We then have
\[ \sum_{\theta \in \mathcal{D}_t} \theta \cdot p_\theta - \big( \sum_{\theta \in \mathcal{D}_t} v^*_\theta \cdot p_\theta - v \big) / L = \sum_{\theta \in \mathcal{D}_t} (\theta - v^*_\theta / L)\, p_\theta + v / L, \]
and so the desired result follows from the definition of $\tilde{\theta}$.

C.3 Proof of Theorem 5.3 (aspirational preferences)
By Proposition 2.1, we have
\[ \psi_{\mathcal{R}(\mathcal{E})}(\vec{X}) = \sup \{ v \le 0 : \vec{X} \in \mathcal{A}_v \}, \quad \forall \vec{X} \in \mathcal{L}, \]
where $\mathcal{A}_v$ is the acceptance set of $\psi_{\mathcal{R}(\mathcal{E})}$ at level $v \le 0$. By Theorem 5.2, we can write $\mathcal{A}_v$ explicitly as:
\[ \mathcal{A}_v = \Big\{ \vec{X}\ \Big|\ \vec{X} \ge \sum_{j=0}^{\kappa(v)} \tilde{\theta}_j \cdot p_j + v / L,\ \ \sum_{j=0}^{\kappa(v)} p_j = 1,\ p \ge 0 \Big\}. \]
It then follows that:
\begin{align*}
\psi_{\mathcal{R}(\mathcal{E})}(\vec{X}) &= \sup \Big\{ v \le 0\ \Big|\ \vec{X} \ge \sum_{j=0}^{\kappa(v)} \tilde{\theta}_j \cdot p_j + v / L,\ \ \sum_{j=0}^{\kappa(v)} p_j = 1,\ p \ge 0 \Big\} \\
&= \sup \big\{ v \le 0\ \big|\ \vec{X} - v / L + c_{\kappa(v)} \in \mathcal{A}_{\mu_{\kappa(v)}} \big\} \\
&= \sup \big\{ v \le 0\ \big|\ \mu_{\kappa(v)}(\vec{X} - v / L + c_{\kappa(v)}) \le 0 \big\} \\
&= \sup \big\{ v \le 0\ \big|\ \mu_{\kappa(v)}(\vec{X} - \tau(v)) \le 0 \big\}.
\end{align*}
In the above display: the second equality follows from the definition of the coherent risk measure $\mu_t$ where $t = \kappa(v)$; the third equality follows from the fact that $\mu(\vec{Z}) \le 0$ for $\vec{Z} \in \mathcal{L}$ if and only if $\vec{Z} \in \mathcal{A}_\mu$; and the last equality follows from the definition of the target function $\tau(v)$.

D Proofs for Section 6 (binary search algorithm for PRO)
D.1 Proof of Proposition 6.2 (lower bound for optimal value)
By Proposition 6.1, the inequality $\psi_{\mathcal{R}(\mathcal{E})}(\vec{G}(z)) \ge v$ is equivalent to feasibility of the system $F_v(\mathcal{D}_t)$ for $t = \kappa(v)$. We then have the following chain of equivalences:
\begin{align*}
\big\{ \exists z \in \mathcal{Z},\ \psi_{\mathcal{R}(\mathcal{E})}(\vec{G}(z)) \ge v \big\} &\iff \big\{ F_v(\mathcal{D}_t) \text{ feasible} \big\}, \\
\Big\{ \max_{z \in \mathcal{Z}} \psi_{\mathcal{R}(\mathcal{E})}(\vec{G}(z)) \ge v \Big\} &\iff \big\{ \max\{ v' \mid F_{v'}(\mathcal{D}_t) \text{ feasible} \} \ge v \big\}, \\
\Big\{ \max_{z \in \mathcal{Z}} \psi_{\mathcal{R}(\mathcal{E})}(\vec{G}(z)) \ge v \Big\} &\iff \big\{ \mathrm{val}(G(\mathcal{D}_t)) \ge v \big\}.
\end{align*}

D.2 Proof of Corollary 6.3 (lower bound for optimal value)

By Proposition 6.2, for $v_{t+1} < v \le v_t$, we have the equivalence $\big\{ \max_{z \in \mathcal{Z}} \psi_{\mathcal{R}(\mathcal{E})}(\vec{G}(z)) \ge v \big\} \iff \big\{ \mathrm{val}(G(\mathcal{D}_t)) \ge v \big\}$. We then have the following chain of equivalences:
\begin{align*}
\Big\{ \max_{z \in \mathcal{Z}} \psi_{\mathcal{R}(\mathcal{E})}(\vec{G}(z)) > v_{t+1} \Big\} &\iff \Big\{ \exists\, v_t \ge v > v_{t+1},\ \max_{z \in \mathcal{Z}} \psi_{\mathcal{R}(\mathcal{E})}(\vec{G}(z)) \ge v \Big\} \\
&\iff \big\{ \exists\, v_t \ge v > v_{t+1},\ \mathrm{val}(G(\mathcal{D}_t)) \ge v \big\} \\
&\iff \big\{ \mathrm{val}(G(\mathcal{D}_t)) > v_{t+1} \big\}.
\end{align*}

D.3 Proof of Theorem 6.4 (correctness of binary search algorithm)
Corollary 6.3 provides a necessary and sufficient condition for the optimal value of Problem (PRO) to satisfy $\max_{z \in \mathcal{Z}} \psi_{\mathcal{R}(\mathcal{E})}(\vec{G}(z)) > v_t$. Hence, the binary search procedure in Algorithm 2, which terminates after $O(\log H)$ iterations, finds $t = \zeta$ such that $v_{\zeta+1} < \max_{z \in \mathcal{Z}} \psi_{\mathcal{R}(\mathcal{E})}(\vec{G}(z)) \le v_\zeta$. Let $(\tilde{z}^*, \tilde{p}^*, \tilde{v}^*)$ be the optimal solution of Problem $G(\mathcal{D}_\zeta)$. We want to show that $\tilde{z}^*$ is an optimal solution of Problem (PRO). We consider two cases: (i) $\tilde{v}^* \le v_\zeta$ and (ii) $\tilde{v}^* > v_\zeta$.

In the first case where $\tilde{v}^* \le v_\zeta$, we claim the inequality $\psi_{\mathcal{R}(\mathcal{E})}(\vec{G}(z^*)) \le \tilde{v}^*$. Note that $\tilde{v}^*$ is the largest value $v$ such that $F_v(\mathcal{D}_\zeta)$ is feasible. By definition, there is no $z$ such that $\vec{G}(z) \in \mathcal{A}_v$ for any level $v > \tilde{v}^*$. It then follows that $\psi_{\mathcal{R}(\mathcal{E})}(\vec{G}(z)) \le \tilde{v}^*$ for all $z \in \mathcal{Z}$.

In the second case where $\tilde{v}^* > v_\zeta$, we claim that $\vec{G}(\tilde{z}^*) \in \mathcal{A}_{v_\zeta}$. It will then follow that $\tilde{z}^*$ is optimal for Problem (PRO) under the condition $\max_{z \in \mathcal{Z}} \psi_{\mathcal{R}(\mathcal{E})}(\vec{G}(z)) \le v_\zeta$. The claim is true because the solution $(\tilde{z}^*, \tilde{p}^*, \tilde{v}^*)$ is optimal for Problem $G(\mathcal{D}_\zeta)$, and thus it is automatically feasible for Problem $G(\mathcal{D}_\zeta)$ as well. Hence, it follows from the inequality $\tilde{v}^* > v_\zeta$ that $(\tilde{z}^*, \tilde{p}^*, v_\zeta)$ is also feasible, and so $\vec{G}(\tilde{z}^*) \in \mathcal{A}_{v_\zeta}$.

By combining these two cases, we conclude with the observation that the optimal value of Problem (PRO) satisfies $\psi_{\mathcal{R}(\mathcal{E})}(\vec{G}(z^*)) = \min\{ \tilde{v}^*, v_\zeta \}$.

E Proofs for Section 7 (the law invariant case)
Recall that the underlying sample space is $\Omega = \{ \omega_1, \omega_2, \ldots, \omega_T \}$. For a permutation $\sigma \in \Sigma$, we permute the sample space $\Omega$ as $\sigma(\Omega) = \{ \omega_{\sigma(1)}, \omega_{\sigma(2)}, \ldots, \omega_{\sigma(T)} \}$. Similarly, we permute the long vector $\vec{X}$ as $\sigma(\vec{X}) = (X(\omega_{\sigma(t)}))_{t=1}^T$, where the realizations are listed in the same order as $\sigma(\Omega)$. A law invariant choice function $\phi \in \mathcal{R}_{QCo}$ must satisfy $\phi(\vec{X}) = \phi(\sigma(\vec{X}))$ for all $\vec{X} \in \mathcal{L}$ and $\sigma \in \Sigma$ when the underlying probability measure on $\Omega$ is uniform.
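The [Law] requirement on a finite uniform sample space can be illustrated directly: a law invariant $\phi$ must return one value across all $T!$ reorderings of the realization vector. The following is a minimal sketch with our own toy choice function (the average of the two worst realizations, an illustrative stand-in for a risk-averse $\phi$):

```python
# Law invariance on a finite uniform sample space: phi(X) = phi(sigma(X))
# for every permutation sigma, because only the distribution of X matters.
from itertools import permutations

def phi(X):
    # toy law invariant choice function: average of the two worst realizations
    worst_two = sorted(X)[:2]
    return sum(worst_two) / 2.0

X = [3.0, -1.0, 2.0, 0.5]
values = {phi(list(p)) for p in permutations(X)}
assert len(values) == 1  # one value across all 4! = 24 reorderings
print(values)            # {-0.25}
```

Any function of the sorted realization vector is law invariant in this sense, which is why the augmented ECDS $\mathcal{E}_L$ enforces [Eli] on every permutation of the elicited prospects.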
E.1 Proof of Proposition 7.3 (characterization of optimal solution of law invariant value problem)
The uniqueness of the optimal solution of Problem $P_L$ follows the same argument as Theorem 4.2 for the base case. To obtain a decomposition $\mathcal{D} = \mathcal{D}_{J_L}$ of the augmented support set $\Theta_L$, we can still (conceptually) apply Algorithm 1. We claim that there is a decomposition $\mathcal{D}_{J_L}$ such that:
1. $v^*_{\sigma(\theta)} = v^*_{\sigma'(\theta)}$ for all $\sigma, \sigma' \in \Sigma$ and $\theta \in \Theta$.
2. All permutations of each $\theta \in \Theta$ are seated next to each other (i.e., the first $|\Sigma|$ elements of $\mathcal{D}_{J_L}$ are $\{ (\sigma(W), v^*_{\sigma(W)}) \}_{\sigma \in \Sigma}$ for the normalizing prospect, the next $|\Sigma|$ elements are $\{ (\sigma(\theta), v^*_{\sigma(\theta)}) \}_{\sigma \in \Sigma}$ for some $\theta \in \Theta$, etc.).

We verify this claim by induction. For the base case of $W$, we require every permutation $\sigma(W)$ for all $\sigma \in \Sigma$ to have value 0, and hence necessarily $\mathcal{D}_{|\Sigma|} = \{ (\sigma(W), 0) \}_{\sigma \in \Sigma}$. Now suppose the intermediate list constructed by Algorithm 1 is $\mathcal{D}_{|\Sigma| t}$ for some $t = 1, 2, \ldots, J - 1$. The intermediate set $\mathcal{D}_{|\Sigma| t}$ satisfies:
1. $\sigma(\theta') \in \mathcal{D}_{|\Sigma| t}$ when $\theta' \in \mathcal{D}_{|\Sigma| t}$.
2. $v^*_{\sigma(\theta')} = v^*_{\theta'}$ for all $\theta' \in \mathcal{D}_{|\Sigma| t}$ and $\sigma \in \Sigma$.

It can then be shown that every permutation of $\theta$ must have the same predicted value if $\theta$ is the succeeding prospect of $\mathcal{D}_{|\Sigma| t}$. For a permutation $\sigma \in \Sigma$, Problem $P(\sigma(\theta); \mathcal{D}_{|\Sigma| t})$ is
\begin{align*}
\min_{s_\sigma, v_\sigma}\ & v_\sigma & (23a) \\
\text{s.t. } & v_\sigma + \langle s_\sigma, \sigma'(\theta') - \sigma(\theta) \rangle \ge v^*_{\theta'}, \quad \forall \theta' \in \mathcal{D}_{|\Sigma| t},\ \forall \sigma' \in \Sigma, & (23b) \\
& s_\sigma \ge 0, \quad \|s_\sigma\| \le L, & (23c) \\
& v_\sigma \ge v^*_{\theta'}, \quad \forall (\sigma(\theta), \theta') \in \mathcal{E},\ \theta' \in \mathcal{D}_{|\Sigma| t}. & (23d)
\end{align*}
We will show that, given any feasible solution $(s_\sigma, v_\sigma)$ to Problem (23), we can construct a solution $(s_{\sigma''}, v_{\sigma''})$ that is feasible for Problem $P(\sigma''(\theta); \mathcal{D}_{|\Sigma| t})$ and achieves the same objective value. In particular, we will construct $s_{\sigma''} := \sigma''(\sigma^{-1}(s_\sigma))$ and $v_{\sigma''} := v_\sigma$.

Indeed, the objective values are trivially equal. To verify constraint (23b), we compute:
\begin{align*}
v_{\sigma''} + \langle s_{\sigma''}, \sigma'(\theta') - \sigma''(\theta) \rangle &= v_{\sigma''} + \langle \sigma''(\sigma^{-1}(s_\sigma)), \sigma'(\theta') - \sigma''(\theta) \rangle \\
&= v_{\sigma''} + \langle \sigma^{-1}(s_\sigma), \sigma''^{-1}(\sigma'(\theta')) - \theta \rangle \\
&= v_{\sigma''} + \langle s_\sigma, \sigma(\sigma''^{-1}(\sigma'(\theta'))) - \sigma(\theta) \rangle \ge v^*_{\theta'},
\end{align*}
where the final inequality holds because $\sigma \circ \sigma''^{-1} \circ \sigma' \in \Sigma$, so the corresponding constraint in (23b) for the original problem applies. The verification for constraints (23c) and (23d) is immediate. We conclude that every permutation of $\theta$ will achieve the same optimal value $\mathrm{val}(P(\theta; \mathcal{D}_{|\Sigma| t}))$, and thus the same predicted value $\pi(\theta; \mathcal{D}_{|\Sigma| t})$. Hence all permutations of $\theta$, as well as their predicted value, can be appended to $\mathcal{D}_{|\Sigma| t}$ to form $\mathcal{D}_{|\Sigma|(t+1)}$. The desired result follows by induction.

E.2 Proof of Proposition 7.4 (characterization of optimal solution of law invariant interpolation problem)
First, given a value vector $v = (v_\theta)_{\theta \in \Theta}$, we recall that Problem $P_L(\vec{X}; v)$ reduces to:
\begin{align*}
\min_{v_X, a}\ & v_X & (24a) \\
\text{s.t. } & v_X + \max\{ \langle a, \sigma(\theta) - \vec{X} \rangle,\, 0 \} \ge v_\theta, \quad \forall \theta \in \Theta,\ \sigma \in \Sigma, & (24b) \\
& a \ge 0, \quad \|a\| \le L. & (24c)
\end{align*}
We want to show that $\mathrm{val}(P_L(\vec{X}; v)) = \mathrm{val}(P_L(\sigma'(\vec{X}); v))$ for all $\sigma' \in \Sigma$. Given any feasible solution $(v_X, a)$ of Problem $P_L(\vec{X}; v)$, we can construct a feasible solution $(v_{\sigma'(X)}, a_{\sigma'})$ for $P_L(\sigma'(\vec{X}); v)$ that achieves the same objective value. In particular, we will take $v_{\sigma'(X)} = v_X$ and $a_{\sigma'} = \sigma'(a)$.

Indeed, the objective values are trivially equal. To verify constraint (24b), we compute:
\begin{align*}
v_{\sigma'(X)} + \max\{ \langle a_{\sigma'}, \sigma(\theta) - \sigma'(\vec{X}) \rangle,\, 0 \} &= v_X + \max\{ \langle \sigma'(a), \sigma(\theta) - \sigma'(\vec{X}) \rangle,\, 0 \} \\
&= v_X + \max\{ \langle a, \sigma'^{-1}(\sigma(\theta)) - \vec{X} \rangle,\, 0 \} \ge v_\theta.
\end{align*}
The verification of constraint (24c) is immediate, and the desired result follows.
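The permutation identity behind this verification, $\langle \sigma'(a), \sigma(\theta) - \sigma'(\vec{X}) \rangle = \langle a, \sigma'^{-1}(\sigma(\theta)) - \vec{X} \rangle$, can be checked numerically; the vectors and permutations below are arbitrary illustrative choices:

```python
# Numeric check that permutations preserve inner products:
# <sigma'(a), sigma(theta) - sigma'(X)> = <a, sigma'^{-1}(sigma(theta)) - X>.

def apply_perm(sigma, x):
    # sigma(x)[i] = x[sigma[i]]
    return [x[i] for i in sigma]

def inverse(sigma):
    out = [0] * len(sigma)
    for i, j in enumerate(sigma):
        out[j] = i
    return out

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

a = [0.3, 0.1, 0.6]
X = [1.0, -2.0, 0.5]
theta = [0.0, 4.0, 1.0]
sigma, sigma_p = [2, 0, 1], [1, 2, 0]

lhs = dot(apply_perm(sigma_p, a),
          [u - w for u, w in zip(apply_perm(sigma, theta), apply_perm(sigma_p, X))])
rhs = dot(a,
          [u - w for u, w in zip(apply_perm(inverse(sigma_p), apply_perm(sigma, theta)), X)])
assert abs(lhs - rhs) < 1e-12
print("permutation identity verified")
```

The same mechanism (relabeling both arguments of an inner product leaves its value unchanged) drives the constraint verification in the proofs of Propositions 7.3 and 7.4.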
E.3 Proof of Theorem 7.5 (two-stage decomposition of law invariant robust choice function)
The function $\psi_{\mathcal{R}(\mathcal{E}_L)}$ is law invariant and belongs to $\mathcal{R}(\mathcal{E}_L)$; thus it also belongs to $\mathcal{R}_L(\mathcal{E})$. So, we must have $\psi_{\mathcal{R}_L(\mathcal{E})}(\vec{X}) \le \psi_{\mathcal{R}(\mathcal{E}_L)}(\vec{X})$ for all $\vec{X} \in \mathcal{L}$. Conversely, by [Law] any $\phi \in \mathcal{R}_L(\mathcal{E})$ satisfies [Eli] on the augmented ECDS $\mathcal{E}_L$. Thus, we have the inclusion $\mathcal{R}_L(\mathcal{E}) \subset \mathcal{R}(\mathcal{E}_L)$, and it follows that $\psi_{\mathcal{R}_L(\mathcal{E})}(\vec{X}) \ge \psi_{\mathcal{R}(\mathcal{E}_L)}(\vec{X})$. So, we have the equality $\psi_{\mathcal{R}_L(\mathcal{E})}(\vec{X}) = \psi_{\mathcal{R}(\mathcal{E}_L)}(\vec{X})$ for all $\vec{X} \in \mathcal{L}$. The desired decomposition result then follows from Theorem 3.4 applied to the augmented ECDS.

E.4 Proof of Proposition 7.6 (reduction of law invariant interpolation problem)
There are exponentially many constraints in (10b) due to the index $\sigma \in \Sigma$. Constraint (10b) is equivalent to:
\begin{equation}
\min_{\sigma \in \Sigma}\, \langle s,\, \sigma(\theta') \rangle - \langle s,\, \theta \rangle + v_\theta - v^*_{\theta'} \ge 0, \qquad \forall \theta' \in \mathcal{D}_{L,t}. \tag{25}
\end{equation}
We will reduce the optimization problem $\min_{\sigma \in \Sigma} \langle s, \sigma(\theta') \rangle$ in Eq. (25). Recall that $s_n$ is the subgradient of $\phi$ at $\theta$ corresponding to attribute $n = 1, 2, \ldots, N$. The optimal value of $\min_{\sigma \in \Sigma} \langle s, \sigma(\theta') \rangle$ in Eq. (25) is equal to the optimal value of:
\begin{align}
\min_{Q \in \mathbb{R}^{T \times T}}\ & \sum_{n=1}^N s_n^\top Q\, \theta'_n \tag{26a}\\
\text{s.t.}\ & Q^\top \vec{1} = \vec{1}, \tag{26b}\\
& Q\, \vec{1} = \vec{1}, \tag{26c}\\
& Q_{m,l} \in \{0, 1\}, \qquad l, m = 1, \ldots, T, \tag{26d}
\end{align}
which is a linear assignment problem. Here $Q$ is the permutation matrix corresponding to $\sigma$, so that $Q\, \theta'_n = \sigma(\theta'_n)$ for all $n = 1, \ldots, N$ (the permutation must be the same for all attributes, hence we only have a single permutation matrix $Q$). Problem (26) can be solved exactly by relaxing the binary constraints $Q_{m,l} \in \{0, 1\}$ to $0 \le Q_{m,l} \le 1$ for $l, m = 1, \ldots, T$. Strong duality holds for the relaxed problem, and the optimal value of the relaxed problem is equal to:
\begin{align}
\max_{w,\, y \in \mathbb{R}^T}\ & \vec{1}^\top w + \vec{1}^\top y \tag{27a}\\
\text{s.t.}\ & \sum_{n=1}^N \theta'_n s_n^\top - w\, \vec{1}^\top - \vec{1}\, y^\top \ge 0. \tag{27b}
\end{align}
It follows that constraint (25) is satisfied if and only if there exist $w$ and $y$ such that:
\begin{align}
& \vec{1}^\top w + \vec{1}^\top y - \langle s,\, \theta \rangle + v_\theta - v^*_{\theta'} \ge 0, \tag{28a}\\
& \sum_{n=1}^N \theta'_n s_n^\top - w\, \vec{1}^\top - \vec{1}\, y^\top \ge 0. \tag{28b}
\end{align}
We substitute the above display into constraint (10b) to obtain the desired reduction.

E.5 Proof of Theorem 7.7 (correctness of sorting algorithm for law invariant value problem)
We repeat the law invariant value function problem here for ease of reference:
\begin{align}
P_L := \min_{v,\, s}\ & \sum_{\theta \in \Theta} \sum_{\sigma \in \Sigma} v_{\sigma(\theta)} \tag{29a}\\
\text{s.t.}\ & v_{\sigma(\theta)} + \max\big\{ \langle s_{\sigma(\theta)},\, \sigma'(\theta') - \sigma(\theta) \rangle,\, 0 \big\} \ge v_{\sigma'(\theta')}, && \forall (\theta, \theta') \in \widehat{\mathcal{E}},\ \sigma, \sigma' \in \Sigma, \tag{29b}\\
& s_{\sigma(\theta)} \ge 0,\ \|s_{\sigma(\theta)}\| \le L, && \forall \theta \in \Theta,\ \sigma \in \Sigma, \tag{29c}\\
& v_{\sigma(\theta)} \ge v_{\sigma'(\theta')}, && \forall (\theta, \theta') \in \mathcal{E},\ \forall \sigma, \sigma' \in \Sigma, \tag{29d}\\
& v_{\sigma(W_0)} = 0, && \forall \sigma \in \Sigma. \tag{29e}
\end{align}
Our ultimate goal is to solve Problem (29). Proposition 7.3 already establishes the relationship between the decomposition $\mathcal{D}_{J_L}$ (of the augmented ECDS) and the decomposition $\mathcal{D}_{L,J}$ (of the original ECDS while enforcing [Law]). By law invariance, for all $\theta, \theta' \in \Theta$ and $\sigma, \sigma' \in \Sigma$, we know that $\sigma(\theta)$ precedes $\sigma'(\theta')$ if and only if $\theta$ precedes $\theta'$. We will prove that Algorithm 3 correctly returns $\mathcal{D}_{L,J} = \mathcal{D}_L$. In addition, the value vector $v = (u_\theta)_{\theta \in \Theta}$ corresponding to $\mathcal{D}_L$ is optimal for Problem (29). Our proof follows three steps:

1. First, we verify that the candidate solution constructed in Algorithm 3 is well defined.
2. Second, we verify that this candidate solution gives a lower bound on the optimal solution of Problem (29).
3. Finally, we verify that this candidate solution is also feasible for Problem (29). Thus, it must be the (unique) optimal solution of Problem (29).

Our proof is based on a modified version of Algorithm 3 that recursively constructs a candidate solution to Problem (29) by selecting one prospect in $\Theta$ at a time. We will show that this modified algorithm has the properties that we seek, and then we will confirm that it is actually equivalent to Algorithm 3. We let $\mathcal{D}_{L,t}$ (which is composed of tuples $(\theta, u_\theta)$) for $t = 0, 1, \ldots, J-1$ denote the intermediate sets, starting from $\mathcal{D}_{L,0} = \{(W_0, 0)\}$.
The procedure will eventually exhaust all prospects in $\Theta$ and terminate with a completely sorted $\mathcal{D}_{L,J}$. Given $\mathcal{D}_{L,t}$, or equivalently $\Sigma(\mathcal{D}_{L,t})$ (where we recall that $\Sigma(\mathcal{D}_{L,t}) := \{(\sigma(\theta), v^*_{\sigma(\theta)})_{\sigma \in \Sigma}\}_{\theta \in \mathcal{D}_{L,t}}$), consider the disjunctive program:
\begin{align}
P_D(\theta; \Sigma(\mathcal{D}_{L,t})) := \min_{v_\theta,\, s_\theta}\ & v_\theta \tag{30a}\\
\text{s.t.}\ & v_\theta + \max\big\{ \langle s_\theta,\, \sigma'(\theta') - \theta \rangle,\, 0 \big\} \ge u_{\theta'}, && \forall \theta' \in \mathcal{D}_{L,t},\ \sigma' \in \Sigma, \tag{30b}\\
& s_\theta \ge 0,\ \|s_\theta\| \le L, \tag{30c}\\
& v_\theta \ge u_{\theta'}, && \forall (\theta, \theta') \in \mathcal{E},\ \theta' \in \mathcal{D}_{L,t}. \tag{30d}
\end{align}
For the purpose of our proof, it is convenient to work with $P_D(\theta; \Sigma(\mathcal{D}_{L,t}))$. We establish the equivalence between $P_D(\theta; \Sigma(\mathcal{D}_{L,t}))$ and $\pi_L(\theta; \mathcal{D}_{L,t})$ later. A modified version of Algorithm 3 follows, based on solving a sequence of instances of Problem $P_D(\theta; \Sigma(\mathcal{D}_{L,t}))$.
Algorithm 6: (Modified) sorting algorithm for the law invariant value problem
Result:
A decomposition $\mathcal{D}_L$.
Initialization: $\Theta$, $t = 1$, and $\mathcal{D}_{L,t} = \{(W_0, 0)\}$;
while $t < J$ do
Choose $\theta^* \in \arg\max_{\theta \notin \mathcal{D}_{L,t}} \mathrm{val}(P_D(\theta; \Sigma(\mathcal{D}_{L,t})))$, and set $u_{\theta^*} := \mathrm{val}(P_D(\theta^*; \Sigma(\mathcal{D}_{L,t})))$;
Set $\mathcal{D}_{L,t+1} := \{\mathcal{D}_{L,t}, (\theta^*, u_{\theta^*})\}$;
Set $t := t + 1$;
end
return $\mathcal{D}_L := \mathcal{D}_{L,J}$.

Algorithm 6 is similar to Algorithm 3, except that it uses the optimal value of the disjunctive programming problem $P_D(\theta; \Sigma(\mathcal{D}_{L,t}))$ in place of the predictor $\pi_L(\theta; \mathcal{D}_{L,t})$ (which is based on solving an LP).

In iteration $t = 1, \ldots, J-1$, we are given $\mathcal{D}_{L,t}$ (or equivalently $\Sigma(\mathcal{D}_{L,t})$). We then choose any
\[
\theta \in \arg\max_{\theta' \notin \mathcal{D}_{L,t}} \mathrm{val}(P_D(\theta'; \Sigma(\mathcal{D}_{L,t}))),
\]
set $u_\theta := \mathrm{val}(P_D(\theta; \Sigma(\mathcal{D}_{L,t})))$, and then append $(\theta, u_\theta)$ to $\mathcal{D}_{L,t}$ to form $\mathcal{D}_{L,t+1}$.

We pause to verify that this recursive construction of Algorithm 6 is well defined. The following two lemmas share the same proofs as Lemma B.5 and Lemma B.6, because the optimization problems $P_D(\theta; \Sigma(\mathcal{D}_{L,t}))$ and $P_D(\theta; \mathcal{D}_{L,t})$ differ only in the number of prospects.
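For intuition, the greedy structure of Algorithm 6 can be sketched on a toy one-dimensional instance. Here the disjunctive program $P_D(\theta; \Sigma(\mathcal{D}_{L,t}))$ is replaced by a closed-form scalar stand-in (the smallest value at $\theta$ consistent with the lower bounds induced by the pairs collected so far, for a monotone $L$-Lipschitz choice function); the names `predict` and `greedy_sort` are illustrative, not from the paper.

```python
L = 2.0  # Lipschitz bound on the (scalar) subgradient

def predict(theta, D):
    # Smallest value at theta consistent with the lower bounds
    # induced by the pairs (t, u) already in D, for a monotone
    # L-Lipschitz function: a scalar stand-in for P_D(theta; .).
    # For t <= theta the bound is u; for t > theta it is u - L*(t - theta).
    return max(u - L * max(t - theta, 0.0) for t, u in D)

def greedy_sort(prospects, anchor):
    # Mirrors Algorithm 6: repeatedly append the prospect with the
    # largest predicted value, together with that value.
    D = [anchor]
    remaining = list(prospects)
    while remaining:
        theta_star = max(remaining, key=lambda th: predict(th, D))
        D.append((theta_star, predict(theta_star, D)))
        remaining.remove(theta_star)
    return D

D = greedy_sort([0.3, 1.2, 0.7, 0.1], anchor=(1.5, 0.0))
values = [u for _, u in D]
# Analogue of Lemma E.2: values are non-increasing in the greedy order.
assert all(a >= b for a, b in zip(values, values[1:]))
```

On this instance the greedy loop visits the prospects in descending order $1.5, 1.2, 0.7, 0.3, 0.1$, which is exactly the monotonicity property that Lemma E.2 establishes for the real algorithm.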
Lemma E.1.
For all $t = 1, 2, \ldots, J-1$, $\mathrm{val}(P_D(\theta; \Sigma(\mathcal{D}_{L,t})))$ is finite.
Lemma E.2.
Let $\mathcal{D}_L$ be the final output of Algorithm 6. Then, $u_\theta \ge u_{\theta'}$ if $\theta$ precedes $\theta'$ in $\mathcal{D}_L$.

Next, we verify that the construction of Algorithm 6 is a lower bound on the optimal value of Problem (29). For any $\theta \in \Theta$, we recall that Problem $P_L(\theta)$ is explicitly:
\begin{align}
\min_{v,\, s}\ & v_\theta \tag{31a}\\
\text{s.t.}\ & v_{\sigma(\theta)} + \max\big\{ \langle s_{\sigma(\theta)},\, \sigma'(\theta') - \sigma(\theta) \rangle,\, 0 \big\} \ge v_{\sigma'(\theta')}, && \forall \sigma, \sigma' \in \Sigma,\ \theta \ne \theta' \in \Theta, \tag{31b}\\
& s_{\sigma(\theta)} \ge 0,\ \|s_{\sigma(\theta)}\| \le L, && \forall \sigma \in \Sigma,\ \theta \in \Theta, \tag{31c}\\
& v_{\sigma(\theta)} \ge v_{\sigma'(\theta')}, && \forall \sigma, \sigma' \in \Sigma,\ \forall (\theta, \theta') \in \mathcal{E}, \tag{31d}\\
& v_{\sigma(W_0)} = 0, && \forall \sigma \in \Sigma. \tag{31e}
\end{align}

Lemma E.3.
For all $t = 1, 2, \ldots, J-1$, $\mathrm{val}(P_D(\theta; \Sigma(\mathcal{D}_{L,t}))) \le \psi_{\mathcal{R}_L(\mathcal{E})}(\theta)$ for all $\theta \notin \mathcal{D}_{L,t}$.

Proof.
We prove this statement by induction, starting with $\mathcal{D}_{L,0} = \{(W_0, 0)\}$. We see that $\mathrm{val}(P_D(\theta; \Sigma(\mathcal{D}_{L,0})))$ is a lower bound for $\psi_{\mathcal{R}_L(\mathcal{E})}(\theta)$ for all $\theta \notin \mathcal{D}_{L,0}$ (this is automatic because $P_D(\theta; \Sigma(\mathcal{D}_{L,0}))$ has fewer constraints than Problem (31), which computes $\psi_{\mathcal{R}_L(\mathcal{E})}(\theta)$ exactly). Since $\psi_{\mathcal{R}_L(\mathcal{E})}(\theta) = \psi_{\mathcal{R}_L(\mathcal{E})}(\sigma(\theta))$ for all $\sigma \in \Sigma$, $\mathrm{val}(P_D(\theta; \Sigma(\mathcal{D}_{L,0})))$ is also a lower bound for $\psi_{\mathcal{R}_L(\mathcal{E})}(\sigma(\theta))$.

Proceeding inductively, if every estimate in $\mathcal{D}_{L,t}$ is a lower bound on the corresponding value $\psi_{\mathcal{R}_L(\mathcal{E})}(\theta')$ for all $\theta' \in \mathcal{D}_{L,t}$, then the optimal value of Problem $P_D(\theta; \Sigma(\mathcal{D}_{L,t}))$ is a lower bound on $\psi_{\mathcal{R}_L(\mathcal{E})}(\theta)$ for all $\theta \notin \mathcal{D}_{L,t}$ (this follows because $P_D(\theta; \Sigma(\mathcal{D}_{L,t}))$ has fewer constraints than Problem (31), and by induction all of the values $v_{\theta'}$ for $\theta' \in \mathcal{D}_{L,t}$ are themselves lower bounds).

Next, we verify that the construction of Algorithm 6 is also feasible for Problem (29).
Lemma E.4.
Let $v = (v_{\sigma(\theta)})_{\theta \in \Theta,\, \sigma \in \Sigma}$ where $v_{\sigma(\theta)} = u_\theta$ for all $\theta \in \Theta$; then $v$ is feasible for Problem (29).

Proof. Suppose $\theta \in \Theta$ is the succeeding prospect of $\mathcal{D}_{L,t}$ for some $t$. By Lemma E.2 (on monotonicity), $u_\theta \ge u_{\theta'}$ for all $\theta' \notin \mathcal{D}_{L,t}$. As a result, Eq. (29b) is satisfied for all pairs $(\theta, \theta')$ with $\theta' \notin \mathcal{D}_{L,t}$ and $\sigma, \sigma' \in \Sigma$, and hence for all $(\theta, \theta')$ with $\theta' \in \Theta$ and $\sigma, \sigma' \in \Sigma$ (because this constraint is already satisfied for all $(\theta, \theta')$ with $\theta' \in \mathcal{D}_{L,t}$ and $\sigma, \sigma' \in \Sigma$ in $P_D(\theta; \Sigma(\mathcal{D}_{L,t}))$). Constraint (29d) is also satisfied for all $(\theta, \theta')$ with $\theta' \notin \mathcal{D}_{L,t}$ and $\sigma, \sigma' \in \Sigma$, and hence for all $(\theta, \theta')$ with $\theta' \in \Theta$ and $\sigma, \sigma' \in \Sigma$ (again, because this constraint is already satisfied for all $(\theta, \theta')$ with $\theta' \in \mathcal{D}_{L,t}$ and $\sigma, \sigma' \in \Sigma$ in $P_D(\theta; \Sigma(\mathcal{D}_{L,t}))$). This reasoning applies to every $\theta \in \Theta$, and so the constructed candidate solution is feasible.

Now, to complete our proof of the correctness of the law invariant sorting algorithm, we show that $\mathrm{val}(P_D(\theta; \Sigma(\mathcal{D}_{L,t})))$ is related to the following LP:
\begin{align}
P(\theta; \Sigma(\mathcal{D}_{L,t})) := \min_{v_\theta,\, s_\theta}\ & v_\theta \tag{32a}\\
\text{s.t.}\ & v_\theta + \langle s_\theta,\, \sigma'(\theta') - \theta \rangle \ge u_{\theta'}, && \forall \theta' \in \mathcal{D}_{L,t},\ \sigma' \in \Sigma, \tag{32b}\\
& s_\theta \ge 0,\ \|s_\theta\| \le L, \tag{32c}\\
& v_\theta \ge u_{\theta'}, && \forall (\theta, \theta') \in \mathcal{E},\ \theta' \in \mathcal{D}_{L,t}. \tag{32d}
\end{align}
In particular, we establish
\[
\mathrm{val}(P_D(\theta; \Sigma(\mathcal{D}_{L,t}))) = \pi(\theta; \Sigma(\mathcal{D}_{L,t})) := \min\big\{ \mathrm{val}(P(\theta; \Sigma(\mathcal{D}_{L,t}))),\, v_{L,t} \big\}.
\]
The following result is a direct consequence of Lemma B.9, because $u_{\sigma(\theta)} = u_{\sigma'(\theta)}$ for all $\theta$ and $\sigma, \sigma' \in \Sigma$.

Lemma E.5.
For all $t = 1, \ldots, J-1$, $\mathrm{val}(P_D(\theta; \Sigma(\mathcal{D}_{L,t}))) = \pi(\theta; \Sigma(\mathcal{D}_{L,t}))$ for all $\theta \notin \mathcal{D}_{L,t}$.

By Proposition 7.6, we have $P(\theta; \Sigma(\mathcal{D}_{L,t})) = P_L(\theta; \mathcal{D}_{L,t})$ and hence $\mathrm{val}(P_D(\theta; \Sigma(\mathcal{D}_{L,t}))) = \pi_L(\theta; \mathcal{D}_{L,t})$. We can now claim that Algorithm 3 and Algorithm 6 are equivalent by Lemma E.5. To conclude the proof of Theorem 7.7: since the output of Algorithm 3 is both a lower bound and a feasible solution, it follows that Algorithm 3 returns the unique optimal solution of Problem (29).
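The reduction invoked here (Proposition 7.6) rests on the LP relaxation of the linear assignment problem (26) being exact, which is the Birkhoff-von Neumann theorem: the vertices of the doubly stochastic polytope are exactly the permutation matrices. A small numerical check of this exactness (illustrative only; it uses SciPy's Hungarian-method solver and a generic LP solver, not the paper's implementation):

```python
import numpy as np
from itertools import permutations
from scipy.optimize import linear_sum_assignment, linprog

rng = np.random.default_rng(1)
T, N = 5, 3                    # T outcomes, N attributes
s = rng.random((N, T))         # subgradients s_1, ..., s_N
theta_p = rng.random((N, T))   # prospect theta' (one row per attribute)

# Cost matrix of the assignment problem (26):
# C[m, l] = sum_n s_n[m] * theta'_n[l], so that
# sum_n s_n^T Q theta'_n = sum_m C[m, sigma(m)] for a permutation sigma.
C = s.T @ theta_p              # (T x T)

# 1) Combinatorial optimum via the Hungarian method.
rows, cols = linear_sum_assignment(C)
opt_assign = C[rows, cols].sum()

# 2) Brute force over all T! permutations (min over sigma in Sigma).
opt_brute = min(sum(C[m, p[m]] for m in range(T))
                for p in permutations(range(T)))

# 3) LP relaxation: min <C, Q> s.t. Q 1 = 1, Q^T 1 = 1, 0 <= Q <= 1.
A_eq = np.zeros((2 * T, T * T))
for m in range(T):
    A_eq[m, m * T:(m + 1) * T] = 1.0   # row sums of Q
for l in range(T):
    A_eq[T + l, l::T] = 1.0            # column sums of Q
res = linprog(C.ravel(), A_eq=A_eq, b_eq=np.ones(2 * T), bounds=(0, 1))
opt_lp = res.fun

# Birkhoff-von Neumann: the relaxation is exact.
assert np.isclose(opt_assign, opt_brute)
assert np.isclose(opt_assign, opt_lp)
```

The dual of the relaxed LP is exactly Problem (27), which is how the exponential family of constraints in (25) collapses to the polynomial-size system (28).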
E.6 Alternative Proof of Theorem 7.7 (correctness of sorting algorithm for law invariant value problem)
We can also prove Theorem 7.7 by applying Algorithm 1 to obtain a decomposition $\mathcal{D}_{J_L}$ of $\Theta_L$. Observe that Algorithm 3 differs from Algorithm 1 only in the following respects: (i) the predictor is changed from $\pi(\theta; \Sigma(\mathcal{D}_{L,t}))$ to $\pi_L(\theta; \mathcal{D}_{L,t})$; (ii) $u_\theta$ is calculated for a "representative" and then applied to all permutations $\sigma(\theta)$ of $\theta$. We show that Algorithm 1 applied to Problem $P_L$ correctly preserves these features.

First, by Proposition 7.6, we have equivalence between $P(\theta; \Sigma(\mathcal{D}_{L,t}))$ and $P_L(\theta; \mathcal{D}_{L,t})$. Hence, the predictor $\pi_L(\theta; \mathcal{D}_{L,t}) = \min\{v_{L,t}, \mathrm{val}(P_L(\theta; \mathcal{D}_{L,t}))\}$ returns the same value as $\pi(\theta; \Sigma(\mathcal{D}_{L,t})) = \min\{v_{L,t}, \mathrm{val}(P(\theta; \Sigma(\mathcal{D}_{L,t})))\}$.

Second, we note that the optimal solution $v^*$ of Problem (29) is unique by Theorem 7.5. Furthermore, the optimal values satisfy $v^*_{\sigma(\theta)} = v^*_{\sigma'(\theta)}$ for all $\theta \in \Theta$ and $\sigma, \sigma' \in \Sigma$ by Proposition 7.3. It follows that we only need to calculate the value $u_{\sigma(\theta)}$ for any "representative" $\sigma(\theta)$ in order to obtain all the values $\{u_{\sigma'(\theta)}\}_{\sigma' \in \Sigma}$. That is, to obtain a decomposition $\mathcal{D}_{J_L}$, we can effectively obtain a decomposition $\mathcal{D}_{L,J}$, because $u_{\sigma(\theta)} = u_{\sigma'(\theta)}$ for all $\theta \in \Theta$ and $\sigma, \sigma' \in \Sigma$.

To conclude, Algorithm 3 is actually equivalent to Algorithm 1 on an augmented ECDS. Hence, the correctness of Algorithm 3 follows from the correctness of Algorithm 1.
E.7 Proof of Theorem 7.12 (correctness of binary search algorithm)
This proof is similar to the proof of Theorem 6.4 for our base case. Corollary 7.11 provides a necessary and sufficient condition for the inequality $\max_{z \in \mathcal{Z}} \psi_{\mathcal{R}_L(\mathcal{E})}(\vec{G}(z)) > v_{L,t}$ to hold. Hence the binary search procedure in Algorithm 4, which terminates after $O(\log H)$ iterations, finds $t = \zeta$ such that $v_{L,\zeta+1} < \max_{z \in \mathcal{Z}} \psi_{\mathcal{R}_L(\mathcal{E})}(\vec{G}(z)) \le v_{L,\zeta}$. Now let $(\tilde{z}^*, \tilde{p}^*, \tilde{q}^*, \tilde{\rho}^*, \tilde{v}^*)$ be the optimal solution of Problem $G_L(\mathcal{D}_{L,\zeta})$. We want to show that $\tilde{z}^*$ is optimal for Problem (PRO). We consider two cases: (i) $\tilde{v}^* \le v_{L,\zeta}$ and (ii) $\tilde{v}^* > v_{L,\zeta}$.

In the first case, where $\tilde{v}^* \le v_{L,\zeta}$, we claim that the optimal value of Problem (PRO) satisfies the inequality $\psi_{\mathcal{R}_L(\mathcal{E})}(\vec{G}(z^*)) \le \tilde{v}^*$, and hence $\tilde{z}^*$ is optimal for Problem (PRO). Note that if $\tilde{v}^* \le v_{L,\zeta}$, then $\tilde{v}^*$ is the largest value $v$ such that the inclusion $\vec{G}(z) \in \mathcal{A}_{L,v}$ has a solution. Hence, if $v > \tilde{v}^*$, then there is no $z \in \mathcal{Z}$ such that $\vec{G}(z) \in \mathcal{A}_{L,v}$. It follows that $\psi_{\mathcal{R}_L(\mathcal{E})}(\vec{G}(z)) \le \tilde{v}^*$ for all $z \in \mathcal{Z}$.

In the second case, where $\tilde{v}^* > v_{L,\zeta}$, we claim that $\vec{G}(\tilde{z}^*) \in \mathcal{A}_{L,v_{L,\zeta}}$, and hence that $\tilde{z}^*$ is optimal for Problem (PRO) under the condition that $\max_{z \in \mathcal{Z}} \psi_{\mathcal{R}_L(\mathcal{E})}(\vec{G}(z)) \le v_{L,\zeta}$. The vector $(\tilde{z}^*, \tilde{p}^*, \tilde{q}^*, \tilde{\rho}^*, \tilde{v}^*)$ is an optimal solution of Problem $G_L(\mathcal{D}_{L,\zeta})$, and so it is necessarily a feasible solution. It then follows from the inequality $\tilde{v}^* > v_{L,\zeta}$ that $(\tilde{z}^*, \tilde{p}^*, \tilde{q}^*, \tilde{\rho}^*, v_{L,\zeta})$ is also feasible for Problem $G_L(\mathcal{D}_{L,\zeta})$. So, by the same reasoning as in the first case, we have $\vec{G}(\tilde{z}^*) \in \mathcal{A}_{L,v_{L,\zeta}}$.

By combining these two cases, we conclude that the optimal value of Problem (PRO) satisfies $\psi_{\mathcal{R}_L(\mathcal{E})}(\vec{G}(z^*)) = \min\{\tilde{v}^*, v_{L,\zeta}\}$.
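The search for $\zeta$ can be sketched generically: given a strictly decreasing grid of levels $v_{L,1} > \cdots > v_{L,H}$ and a monotone oracle reporting whether the optimal value exceeds a level (the role played by the feasibility test of Corollary 7.11), bisection locates $\zeta$ with $v_{\zeta+1} < \max_z \psi(\vec{G}(z)) \le v_\zeta$ in $O(\log H)$ oracle calls. All names below are illustrative, and the "hidden optimum" stands in for the unknown value $\max_{z \in \mathcal{Z}} \psi_{\mathcal{R}_L(\mathcal{E})}(\vec{G}(z))$.

```python
import math

def find_zeta(levels, exceeds):
    # Largest index t with exceeds(t) False, assuming exceeds is
    # monotone (False ... False True ... True) over the decreasing grid.
    lo, hi = 0, len(levels) - 1
    if exceeds(lo):
        return None  # the optimum exceeds even the largest level
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if exceeds(mid):
            hi = mid - 1
        else:
            lo = mid
    return lo

# Toy instance: v_1 > v_2 > ... > v_H and a hidden optimum of 5.0.
levels = [10.0, 8.0, 6.0, 4.0, 2.0]
opt = 5.0
calls = 0

def exceeds(t):
    # Stand-in for the Corollary 7.11 feasibility test at level v_t.
    global calls
    calls += 1
    return opt > levels[t]

zeta = find_zeta(levels, exceeds)
assert zeta == 2 and levels[zeta + 1] < opt <= levels[zeta]
assert calls <= math.ceil(math.log2(len(levels))) + 1
```

The bracket returned by the bisection is exactly the pair $(v_{\zeta+1}, v_\zeta]$ used in the case analysis above.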