Fairness through Optimization
Violet (Xinying) Chen and J. N. Hooker
Carnegie Mellon University
[email protected], [email protected]
January 2021
Abstract
We propose optimization as a general paradigm for formalizing fairness in AI-based decision models. We argue that optimization models allow formulation of a wide range of fairness criteria as social welfare functions, while enabling AI to take advantage of highly advanced solution technology. We show how optimization models can assist fairness-oriented decision making in the context of neural networks, support vector machines, and rule-based systems by maximizing a social welfare function subject to appropriate constraints. In particular, we state tractable optimization models for a variety of functions that measure fairness or a combination of fairness and efficiency. These include several inequality metrics, Rawlsian criteria, the McLoone and Hoover indices, alpha fairness, the Nash and Kalai-Smorodinsky bargaining solutions, combinations of Rawlsian and utilitarian criteria, and statistical bias measures. All of these models can be efficiently solved by linear programming, mixed integer/linear programming, or (in two cases) specialized convex programming methods.
Artificial intelligence is increasingly used not only to solve problems, but to recommend action decisions that range from awarding mortgage loans to granting parole. The prospect of making decisions immediately raises the question of ethics and fairness. If ethical norms are to be incorporated into artificial decision making, these norms must somehow be automated or formalized. The leading approaches to this challenge include

• value alignment, which strives to train or modify AI systems to reflect human ethical values automatically (Allen et al. [2005], Russell [2019], Gabriel [2020]);

• logical formulations of ethical and fairness principles that attempt to represent them precisely enough to govern a rule-based AI system (Bringsjord et al. [2006], Lindner et al. [2020], Hooker and Kim [2018]); and

• statistical fairness metrics that aim to ensure that benefits are allocated equitably (Dwork et al. [2012], Mehrabi et al. [2019], Chouldechova and Roth [2020]).

Each of these approaches can be useful in a suitable context. We wish to propose, however, a promising tool for formalizing ethics and fairness that has received less attention:

• optimization, which allows one to achieve equity or fairness by maximizing a social welfare function.

Welfare economics has long used social welfare functions to measure the desirability of a given distribution of benefits and costs. These functions can be designed to represent any of a wide range of conceptions of fairness and equity. They allow one to harness powerful optimization solvers, developed over a period of 80 years or more, to achieve equity by maximizing social welfare, a task that can be carried out automatically by computer. Optimization methods are of course already employed in AI to train neural networks, calibrate support vector machines, solve clustering problems, and the like.

∗ Relevant disciplines: fairness in AI, optimization methods, optimization modeling, ethics, welfare economics.
Our proposal is to harness the power of optimization to identify ethical and fair decisions.

Achieving fairness by maximizing a social welfare function (SWF) offers at least two advantages in addition to the employment of optimization technology. One is that it allows considerable flexibility in representing constraints on the problem. Decisions are normally made in the context of resource constraints or other limitations on the possible options. These can be represented as constraints in the optimization problem, as nearly all state-of-the-art optimization methods are designed for constrained optimization. Also, a complex SWF can often be simplified by adding constraints to the optimization problem, resulting in a problem that is easier to solve.

Another advantage of maximizing an SWF is that it naturally combines equity and efficiency. AI-based decision making frequently involves an efficiency or utilitarian criterion, such as predictive accuracy or economic benefit, alongside an equity criterion. A properly designed SWF can combine utilitarian and fairness criteria in principled ways. For example, the proportional fairness SWF, already widely used in engineering, balances throughput (utility) and fairness in a fashion that can be given theoretical justification. One can of course maximize a pure efficiency objective subject to a constraint on inequality or some other measure of inequity, but this provides no principled way of balancing efficiency against equity. Formulating an SWF encourages one to think explicitly about how the two should be balanced, and allows one to govern the equity/efficiency trade-off with one or more parameters. In any event, maximizing an SWF sacrifices no generality, because one can always implement a constraint on inequity by penalizing constraint violations in the SWF.

As a running example, we consider a bank's decision as to which applicants for mortgage loans should receive loans, based on individual credit profiles.
We wish to identify decisions that maximize social welfare, subject to a constraint on the amount of funds available. Social welfare is a function of the utilities enjoyed by each of the stakeholders involved, including the applicants, the bank, and perhaps other parties such as stockholders and the community at large. The utilities are, in turn, a function of the funds allocated to each applicant. They can be measured as wealth, negative cost, or some broader type of benefit that is affected by the loan decisions. A neural network or support vector machine for the mortgage loan problem would normally be trained to maximize the accuracy of predicting loan defaults or their probability. This implies a concern with efficiency, since greater predictive accuracy results in a more efficient use of capital and perhaps greater welfare overall. The social welfare function can be designed to take into account the distribution of utilities as well as total net utility, thus balancing equity and efficiency. AI technology would then be designed to maximize social welfare rather than simply predictive accuracy. For example, machine learning (e.g., neural networks) could be used to predict the probability of default, and optimization then used to make loan decisions. Social welfare maximization can also be incorporated into a support vector machine or into purely rule-based AI.

Our contribution in this paper, aside from pointing out the potential of optimization as a general paradigm for achieving fairness, is to show how to formulate a number of fairness-related SWFs as tractable optimization problems. Much of the art of optimization is formulating the problem in a way that makes it suitable for existing solvers. We draw upon known modeling techniques for some of the SWFs and introduce new techniques for others. To our knowledge, most of these formulations do not appear in the AI literature or elsewhere.

The paper is organized as follows.
In Section 1, we begin by introducing the optimization problem of maximizing a social welfare function in a general resource allocation context. We demonstrate concrete specifications of this social welfare optimization model on our running example of a bank's decision to grant mortgage loans, and state our assumptions on the linearity of constraints. In Section 2, we discuss related work on optimization methods for fair and ethical decision making that appears in the optimization, machine learning, and AI literature, and distinguish this paper's contribution from previous research. We then study four types of fairness-seeking schemes and present tractable optimization formulations for a variety of fairness measures. The first type we examine uses an inequality measure as the SWF and evaluates fairness based on the degree of equality in the utility distribution (Section 3). The second scheme defines fairness as giving a certain amount of priority to less advantaged stakeholders in the social welfare optimization problem (Section 4). The third type of method handles the important challenge of balancing fairness and efficiency by optimizing SWFs that combine the two objectives (Section 5). Lastly, in Section 6, we discuss fairness in support vector machines, both to eliminate disparity in treatment received by different groups and to maximize social welfare in general.

The general problem of maximizing social welfare can be stated

    max_x { W(U(x)) | x ∈ S_x }    (1)

where x = (x_1, ..., x_n) is a vector of resources distributed across stakeholders 1, ..., n, and U = (U_1, ..., U_n) is a vector of utility functions corresponding to the stakeholders. Also, S_x is the set of feasible values of x, and W is a social welfare function.
The problem maximizes social welfare over all feasible resource allocations. It is convenient to model the utility functions U using constraints, because this results in problems better suited for optimization solvers. We therefore write (1) as

    max_{x,u} { W'(u) | (x, u) ∈ S_xu }    (2)

where u is a vector of utilities, and S_xu is defined so that (x, u) ∈ S_xu implies x ∈ S_x and u ≤ U(x). The function W' is a possibly simplified version of W that yields an equivalent optimization problem due to the constraints defining S_xu. We illustrate this maneuver throughout the paper.

In the mortgage problem described earlier, we can let I_A be the set of loan applicants and I_B the set of other stakeholders, such as the bank. Then x_i for i ∈ I_A is the loan amount allocated to applicant i. The feasible set S_xu could be defined in part by the budget constraint Σ_i x_i ≤ H, where H is the amount of available funds. We would also have constraints x_i ≤ h_i, where h_i = 0 for i ∈ I_B, and h_i is the requested loan amount for i ∈ I_A.

In a very simple version of the mortgage model, we could define U_i(x) = a_i x_i for i ∈ I_A and U_i(x) = Σ_{j ∈ I_A} r_j p_j x_j for i ∈ I_B. Here a_i is a constant that roughly indicates the utility value of loan dollars to applicant i, and p_j is the probability that applicant j will repay the loan. Also, r_j is the total rate of return earned by the stakeholders in I_B over the lifetime of loan j. The social welfare function W is nonlinear in general but can be converted to a linear W' in many cases if appropriate constraints are included. To take a simple example, a maximin criterion W(u) = min_i {u_i} can be linearized by letting W'(u) = w and adding the constraints w ≤ u_i for all i.
Then the social welfare maximization problem is

    max_{x,u,w} { w |  u_i ≤ a_i x_i, i ∈ I_A
                       u_i ≤ Σ_{j ∈ I_A} r_j p_j x_j, i ∈ I_B
                       w ≤ u_i, 0 ≤ x_i ≤ h_i, all i
                       Σ_{i ∈ I_A} x_i ≤ H }

We note that the constraints and objective function are linear and therefore define an easily solved linear programming problem.

The probabilities p_i of repayment for the set I_A of current mortgage applicants can be estimated by machine learning, and the optimization model can be applied to these applicants to arrive at fair loan decisions. An alternative approach is to solve the optimization problem for a pre-defined set I_A of hypothetical applicants, each associated with certain financial characteristics. The probability of repayment for each hypothetical applicant could again be estimated by machine learning, using the pre-defined characteristics as input to the neural network. Then each new applicant could be awarded or denied a loan based on the solution of the optimization problem and the hypothetical applicant i to which the new applicant is most similar.

If the optimal solution partially funds an applicant i (i.e., 0 < x_i < h_i), the bank could make a judgment call as to whether to grant the loan, or else solve a variant of the optimization problem that requires x_i ∈ {0, h_i}. The latter is accomplished by introducing 0–1 variables δ_i and solving the problem

    max_{x,u,w,δ} { w |  u_i ≤ a_i x_i, i ∈ I_A
                         u_i ≤ Σ_{j ∈ I_A} r_j p_j x_j, i ∈ I_B
                         x_i = h_i δ_i, δ_i ∈ {0, 1}, all i
                         w ≤ u_i, all i;  Σ_{i ∈ I_A} x_i ≤ H }    (3)

The 0–1 variables make the problem harder to solve, but it is a mixed integer/linear programming problem, for which solution technology is highly advanced. It is likely to be solved without difficulty, since it is not posed for all individuals in the training set.
Rather, it is solved for the applicants currently under consideration, or for hypothetical applicants as described above.

We will assume for the present discussion that the feasible set S_xu is defined by a linear system A x + B u ≤ b, aside from any integrality conditions on the variables. This simplifies the optimization problem while providing a great deal of modeling flexibility. For example, the assumption is satisfied when two conditions are met: (a) the constraints on feasible resource allocations x are linear, which is normally the case (as when there are one or more budget constraints), and (b) the utilities u_i are linear or concave functions of the resources (the latter indicating the typical situation of decreasing returns to scale) and can therefore be approximated by concave piecewise linear functions. These conditions are met by the mortgage problem and a wide variety of other decision problems.

We do not assume that the social welfare function W is linear, and it is in fact nonlinear in most interesting cases. Yet we can convert a nonlinear W to a linear W' in almost all of the optimization models described here. In fact, all of the models are of one of the following types:

• Linear programming (LP). This is an optimization problem with continuous variables, a linear objective function, and linear inequality and/or equality constraints. It is extremely well solved using the simplex method or an interior point method. Computation time is not an issue except for truly huge instances.

• Mixed integer/linear programming (MILP). This is an LP problem except that some variables are discrete (in our case, 0–1). It is a combinatorial problem but is often tractable for hundreds or even thousands of discrete variables using state-of-the-art software.

• Convex nonlinear programming with linear constraints. This is an LP problem except for a convex nonlinear (in one case, quadratic) objective function.
Only two models have this form, and they can be efficiently solved by quadratic programming, the reduced gradient method, or another specialized method.
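As a concrete illustration of the first model type, the maximin mortgage LP stated earlier can be handed to an off-the-shelf LP solver. The sketch below uses `scipy.optimize.linprog` on a small, entirely hypothetical instance: three applicants with assumed utility slopes a, repayment probabilities p, requested amounts h, a rate of return r, and a budget H (all made-up numbers, not from the paper).

```python
# A minimal sketch of the maximin mortgage LP with illustrative data.
import numpy as np
from scipy.optimize import linprog

a = np.array([1.0, 0.8, 0.6])        # utility per loan dollar, applicant i
p = np.array([0.90, 0.95, 0.80])     # assumed repayment probabilities
h = np.array([100.0, 150.0, 200.0])  # requested loan amounts
r, H = 0.05, 300.0                   # rate of return, available funds
n = 3

# Decision vector: [x_1..x_3, u_1..u_3, u_B, w]; linprog minimizes,
# so we maximize w by minimizing -w.
c = np.zeros(8); c[-1] = -1.0

A_ub, b_ub = [], []
for i in range(n):                   # u_i <= a_i x_i
    row = np.zeros(8); row[3 + i] = 1.0; row[i] = -a[i]
    A_ub.append(row); b_ub.append(0.0)
row = np.zeros(8); row[6] = 1.0; row[:3] = -r * p   # u_B <= sum_j r p_j x_j
A_ub.append(row); b_ub.append(0.0)
for k in range(4):                   # w <= u_i for every stakeholder
    row = np.zeros(8); row[7] = 1.0; row[3 + k] = -1.0
    A_ub.append(row); b_ub.append(0.0)
row = np.zeros(8); row[:3] = 1.0                    # budget: sum_i x_i <= H
A_ub.append(row); b_ub.append(H)

bounds = [(0, h[i]) for i in range(n)] + [(0, None)] * 5
res = linprog(c, A_ub=np.array(A_ub), b_ub=b_ub, bounds=bounds)
x, u, w = res.x[:3], res.x[3:7], res.x[7]
```

The optimal w is the largest utility level that every stakeholder, including the bank, can simultaneously be guaranteed under the budget.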
We briefly survey two research streams on optimization and fairness, one from the optimization literature and one from the AI literature. The former stream focuses on formulating optimization models to represent practical problems where fairness is an important concern, such as resource allocation, capacity planning, routing, scheduling, and so forth. While some of these applications appear in AI research, we are concerned in this paper with understanding how optimization can be viewed more broadly as a general paradigm for formulating fairness by maximizing social welfare. A comprehensive survey of fairness in the optimization literature is provided by Karsu and Morton (2015).

Bandwidth allocation in telecommunication networks is a popular application studied in early works on fair resource allocation (Luss [1999], Ogryczak and Śliwiński [2002], Ogryczak et al. [2008]). For problems in this domain, a standard strategy is to define an objective function that is consistent with a Rawlsian criterion, and then to solve the corresponding model to obtain equitable allocations that optimize the worst performance among the activities or services that compete for bandwidth. Iancu and Trichakis (2014) studied fairness in the context of portfolio optimization, and used an optimization model to determine a portfolio design that attains desirable trade-offs between optimal trading performance and equitable cost-sharing among accounts. Project assignment is another application where fairness is often relevant, as the stakeholders involved may have different preferences over projects. For instance, Chiarandini et al. (2019) worked with a real-life decision to assign projects to university students. They formulated the allocation problem as a MILP model and compared the empirical performance of SWFs that capture different fairness-efficiency balancing principles as objective functions. Fair optimization has also received attention in humanitarian operations.
Eisenhandler and Tzur (2019) studied an important logistical challenge in food bank operations: food pickup and distribution. They designed a routing and resource allocation model that seeks both a fair allocation of food to different agency locations and efficient delivery of as much food as possible. Mostajabdaveh et al. (2019) considered the disaster preparation task of selecting shelter locations and assigning neighborhoods to shelters. To make fair and efficient decisions while accounting for uncertainty, they used a stochastic programming model to optimize an objective function related to the Gini coefficient.

Recent AI research has developed efficient algorithms that take fairness into account. This effort differs from our proposal in that it develops algorithms to solve specific problems that have a fairness component, rather than formulating optimization models that can be submitted to state-of-the-art software. Algorithmic design tasks are often associated with fair matching decisions, such as kidney exchange (McElfresh and Dickerson [2018]), paper-reviewer assignment in peer review (Stelmakh et al. [2019]), or online decision procedures for a complex situation such as ridesharing (Nanda et al. [2020]).

Fair machine learning has been a rapidly growing field in recent years. Fair ML methods in the literature can be categorized as pre-, in-, or post-processing approaches, which respectively attain fairness by modifying standard ML methods before, during, or after the training phase. A main difference between our proposal and the majority of the literature is how fairness is defined and measured. ML fairness requires the elimination of bias and discrimination and is measured in terms of the predictions from ML models, while we adopt a utility- and welfare-based view of fairness. However, some recent research has discussed the usefulness of a social-welfare-maximization perspective in fair ML (Heidari et al. [2018], Hu and Chen [2020]).
Perhaps the optimization models and techniques we present here can benefit future work in this area.

Since optimization is a core technique in ML regardless of whether fairness is taken into account, we focus here on research that uses optimization in the fairness-seeking component. Pre-processing methods aim to prepare training data so as to prevent bias and disparity, and optimization models can be used to find the best data modifications. For example, Zemel et al. (2013) and Calmon et al. (2017) proposed optimization models to learn fair representations of the original training data. Their models used objective functions that capture the trade-off among preserving prediction accuracy, limiting data distortion, and eliminating potential discrimination associated with protected attributes.

Post-processing seeks fairness by adjusting the predictions generated by the trained model. As in the pre-processing case, optimization models can determine the optimal tuning rules. Examples of this strategy can be found in Hardt et al. (2016) and Alabdulmohsin (2020).

Fairness through optimization fits naturally into in-processing methods. For standard ML algorithms that essentially solve optimization problems, such as support vector machines, logistic regression, and the like, it is convenient to obtain fair alternatives by adding fairness constraints or including fairness components in the objective function. A wide variety of fairness definitions have been studied in different ML frameworks. A series of papers have formulated constraints to represent well-known parity-based fairness notions, including demographic parity, equality of opportunity, and predictive rate parity (Zafar et al. [2017, 2019], Olfat and Aswani [2018], Donini et al. [2018], Heidari et al. [2019]). A different direction explored in the literature is to define an objective function that encodes both fairness and the conventional training accuracy goals (Berk et al. [2017], Goel et al. [2018], Heidari et al. [2018]).
These papers have demonstrated the empirical potential of their optimization models for fair classification and regression. Another related work is that of Aghaei et al. (2019), who developed a mixed-integer-optimization-based framework for learning fair decision trees, in which a discrimination measure serves as a penalty regularizer to eliminate disparity in the loss minimization model for training optimal decision trees.
One possible measure of fairness is the degree of equality in the distribution of utilities, for which several statistical metrics have been proposed (Cowell [2000], Jenkins and Van Kerm [2011]). Equality is not the same concept as fairness, but it is related and can be a useful criterion in some cases (Frankfurt [2015], Parfit [1997], Scanlon [2003]). We present optimization models for the relative range, relative mean deviation, coefficient of variation, and Gini coefficient. We begin with a brief review of linear-fractional programming, which is useful for converting these and some other models to easily solved LP problems.
Charnes and Cooper (1962) provide a mechanism for converting optimization problems whose objective function is a ratio of affine functions to linear programming (LP) problems, which are easy to solve. It applies to problems of the form

    max_u { (cᵀu + c₀) / (dᵀu + d₀) | A u ≤ b }    (4)

where the denominator dᵀu + d₀ is positive in the feasible set {u | A u ≤ b}. We introduce a scalar variable t and use the change of variable u = u'/t to write (4) as the LP problem

    max_{u',t} { cᵀu' + c₀ t | A u' ≤ b t, dᵀu' + d₀ t = 1, t ≥ 0 }    (5)

Then if (û', t̂) is an optimal solution of (5), u = û'/t̂ solves (4). This technique can be extended to nonlinear and MILP models, as we will see below.

As a simple example, linear-fractional programming can be used when the measure of inequality is the relative range of utilities. The SWF is W(u) = −(u_max − u_min)/ū, where u_max = max_i {u_i}, u_min = min_i {u_i}, and ū = (1/n) Σ_i u_i. We assume, with little loss of generality, that A x + B u ≤ b implies ū > 0. The problem of maximizing W(u) subject to the linear constraints A x + B u ≤ b can then be written as the LP problem

    min_{x',u',t,u'_min,u'_max} { u'_max − u'_min |  u'_min ≤ u'_i ≤ u'_max, all i
                                                     A x' + B u' ≤ b t, ū' = 1, t ≥ 0 }

where u'_min and u'_max are regarded as variables along with x', u', and t. If (x̂', û', û'_min, û'_max, t̂) solves this problem, then u = û'/t̂ is a distribution that minimizes the relative range.

Another dispersion metric is the relative mean deviation, for which the SWF is W(u) = −(1/ū) Σ_i |u_i − ū|. This, too, can be optimized by linear-fractional programming:

    min_{x',u',v,t} { Σ_i v_i |  −v_i ≤ u'_i − ū' ≤ v_i, all i
                                 A x' + B u' ≤ b t, ū' = 1, t ≥ 0 }

where v_1, ..., v_n are new variables.

The coefficient of variation is the standard deviation divided by the mean. The SWF is

    W(u) = −(1/ū) [ (1/n) Σ_i (u_i − ū)² ]^{1/2}

and the corresponding problem is

    min_{x',u',t} { [ (1/n) Σ_i (u'_i − ū')² ]^{1/2} |  A x' + B u' ≤ b t, ū' = 1, t ≥ 0 }

This is not an LP problem, but we can obtain the same optimal solution by solving it without the exponent 1/2. This yields a convex quadratic programming problem with linear constraints, for which there are efficient algorithms in state-of-the-art optimization packages.
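To make the relative-mean-deviation model concrete, here is a minimal sketch with `scipy.optimize.linprog` on a tiny hypothetical feasible set (two stakeholders, with u₁ ≤ 1 and 2 ≤ u₂ ≤ 3, chosen so that perfect equality is unattainable). The instance and its bounds are assumptions for illustration, not taken from the paper.

```python
# Relative mean deviation via linear-fractional programming.
import numpy as np
from scipy.optimize import linprog

# Variables after the Charnes-Cooper substitution u = u'/t:
# [u1', u2', v1, v2, t].  Since the normalization fixes the mean of u'
# at 1, the deviation constraints read  -v_i <= u_i' - 1 <= v_i.
obj = np.array([0.0, 0.0, 1.0, 1.0, 0.0])      # minimize v1 + v2
A_ub = np.array([
    [ 1.0,  0.0, -1.0,  0.0,  0.0],   # u1' - 1 <= v1
    [-1.0,  0.0, -1.0,  0.0,  0.0],   # 1 - u1' <= v1
    [ 0.0,  1.0,  0.0, -1.0,  0.0],   # u2' - 1 <= v2
    [ 0.0, -1.0,  0.0, -1.0,  0.0],   # 1 - u2' <= v2
    [ 1.0,  0.0,  0.0,  0.0, -1.0],   # u1 <= 1  ->  u1' <= t
    [ 0.0, -1.0,  0.0,  0.0,  2.0],   # u2 >= 2  ->  2t - u2' <= 0
    [ 0.0,  1.0,  0.0,  0.0, -3.0],   # u2 <= 3  ->  u2' <= 3t
])
b_ub = np.array([1.0, -1.0, 1.0, -1.0, 0.0, 0.0, 0.0])
A_eq = np.array([[0.5, 0.5, 0.0, 0.0, 0.0]])   # mean of u' equals 1
res = linprog(obj, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
              bounds=[(0, None)] * 5)
u = res.x[:2] / res.x[4]                       # recover u = u'/t
```

For this instance the smallest attainable relative mean deviation is 2/3, achieved at u = (1, 2): the two utilities are pushed as close together as the feasible set allows.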
The Gini coefficient is by far the best known measure of inequality, as it is routinely used to measure income and wealth inequality. It is proportional to the area between the Lorenz curve and a diagonal line representing perfect equality, and it therefore vanishes under perfect equality. The SWF is W(u) = −(1/(2n²ū)) Σ_{i,j} |u_i − u_j|. Again applying linear-fractional programming, the problem of minimizing the Gini coefficient subject to linear constraints is equivalent to the LP problem

    min_{x',u',v,t} { (1/(2n²)) Σ_{i,j} v_ij |  −v_ij ≤ u'_i − u'_j ≤ v_ij, all i, j
                                                A x' + B u' ≤ b t, ū' = 1, t ≥ 0 }

where v_ij is a new variable for each pair i, j.

Rather than focus solely on inequality, fairness measures can enhance equality while giving preference to those who are less advantaged. Far and away the most famous such measure is the difference principle of John Rawls (1999), a maximin criterion that is based on careful philosophical argument and debated in a vast literature (Freeman [2003], Richardson and Weithman [1999]). The difference principle can be plausibly extended to a lexicographic maximum principle. There are also the Hoover and McLoone indices, which are statistical measures that emphasize the lot of the less advantaged.
The Rawlsian difference principle states that inequality should exist only to the extent that it is necessary to improve the lot of the worst-off. It is defended with a social contract argument that, in its simplest form, maintains that the structure of society must be negotiated in an "original position" in which people do not yet know their station in society. One can rationally assent to the possibility of ending up on the bottom only if one would have been even worse off in any other social structure, whence an imperative to maximize the lot of the worst-off. The principle is intended to apply only to the design of social institutions, and only to the distribution of "primary goods," which are goods that any rational person would want. Yet it can be adopted as a general criterion for distributing utility, namely a maximin criterion that maximizes the simple SWF W(u) = min_i {u_i}. This is readily formulated as the LP problem

    max_{x,u,w} { w |  w ≤ u_i, all i;  A x + B u ≤ b }

The maximin criterion can be plausibly extended to lexicographic maximization (leximax) by first maximizing the smallest utility, then holding this utility fixed while maximizing the smallest among those that remain, and so forth. This is known as pre-emptive goal programming in the optimization literature and is achieved by solving a sequence of optimization problems

    max_{x,u,w} { w |  w ≤ u_i, i ∈ I_k;  u_{i_ℓ} ≥ û_{i_ℓ}, ℓ = 1, ..., k−1;  A x + B u ≤ b }    (6)

for k = 1, ..., n, where (x̂, û) is an optimal solution of problem k, and

    û_{i_k} = min_{i ∈ I_k} {û_i},  with  I_k = {1, ..., n} \ {i_1, ..., i_{k−1}}

Ogryczak and Śliwiński (2006) showed how to obtain a leximax solution with a single optimization model, but it is impractical for most purposes due to the very large coefficients required in the objective function.
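The leximax sequence can be sketched as a loop of LPs. The instance below is hypothetical (u_i ≤ x_i, with x₁ capped at 1 and a budget of 6, all made-up numbers); after each maximin stage, an auxiliary LP per stakeholder checks which utilities cannot rise above the current level w, and those are fixed there before the next stage.

```python
# Leximax by a sequence of LPs (pre-emptive goal programming sketch).
import numpy as np
from scipy.optimize import linprog

# Hypothetical instance: u_i <= x_i, caps x_1 <= 1, x_2 <= 5, x_3 <= 5,
# budget sum_i x_i <= 6.  Decision vector: [x1, x2, x3, u1, u2, u3, w].
n, caps, budget = 3, [1.0, 5.0, 5.0], 6.0

def maximize(obj_index, free, floors):
    """Maximize component obj_index subject to u_i <= x_i, the budget,
    u_i >= floors[i] for all i, and w <= u_i for each free index i."""
    c = np.zeros(7); c[obj_index] = -1.0           # linprog minimizes
    rows, rhs = [], []
    for i in range(n):                             # u_i <= x_i
        r = np.zeros(7); r[3 + i] = 1.0; r[i] = -1.0
        rows.append(r); rhs.append(0.0)
    r = np.zeros(7); r[:3] = 1.0                   # budget constraint
    rows.append(r); rhs.append(budget)
    for i in free:                                 # w <= u_i, i free
        r = np.zeros(7); r[6] = 1.0; r[3 + i] = -1.0
        rows.append(r); rhs.append(0.0)
    bounds = ([(0, caps[i]) for i in range(n)]
              + [(floors[i], None) for i in range(n)] + [(None, None)])
    res = linprog(c, A_ub=np.array(rows), b_ub=rhs, bounds=bounds)
    return -res.fun

free, floors = set(range(n)), [0.0] * n
while free:
    w = maximize(6, free, floors)                  # maximin over free utilities
    held = [max(floors[i], w) if i in free else floors[i] for i in range(n)]
    for i in list(free):                           # fix utilities stuck at w
        if maximize(3 + i, free - {i}, held) <= w + 1e-7:
            free.remove(i)
            floors[i] = w
```

For this instance the procedure yields leximax utility levels (1, 2.5, 2.5): stakeholder 1 is capped at 1, and the remaining budget is split evenly between the other two.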
The Hoover index is related to the Gini coefficient, as it is proportional to the maximum vertical distance between the Lorenz curve and a diagonal line representing perfect equality. It is also proportional to the relative mean deviation. It can be interpreted as the fraction of total utility that would have to be transferred from the richer half of the population to the poorer half to achieve perfect equality. The SWF is W(u) = −(1/(2nū)) Σ_i |u_i − ū|. The Hoover index can therefore be minimized by solving the same LP problem as for the relative mean deviation.

The McLoone index compares the total utility of individuals at or below the median utility to the utility they would enjoy if all were brought up to the median. The index is 1 if nobody's utility is strictly below the median, and it approaches 0 if nearly everyone below the median has utility much smaller than the median (on the assumption that all utilities are positive). The McLoone index benefits the disadvantaged by rewarding equality in the lower half of the distribution, but it is unconcerned by the existence of very rich individuals in the upper half. The SWF is

    W(u) = (1 / (|I(u)| u_med)) Σ_{i ∈ I(u)} u_i

where u_med is the median of the utilities in u and I(u) is the set of indices of utilities at or below the median, so that I(u) = {i | u_i ≤ u_med}.

We can formulate the maximization problem as an MILP problem, but with a fractional objective function, by using standard "big-M" modeling techniques from integer programming. The model uses 0–1 variables δ_i, where δ_i = 1 when i ∈ I(u). The constant M is a large number chosen so that u_i < M for all i.
The model is

    max_{x,u,m,y,z,δ} { Σ_i y_i / Σ_i z_i |  m − M δ_i ≤ u_i ≤ m + M(1 − δ_i), all i
                                             y_i ≤ u_i, y_i ≤ M δ_i, δ_i ∈ {0, 1}, all i
                                             z_i ≥ 0, z_i ≥ m − M(1 − δ_i), all i
                                             A x + B u ≤ b, Σ_i δ_i ≤ n/2 }

where the new variable m represents the median, variable y_i is u_i if δ_i = 1 and 0 otherwise, and variable z_i is m if δ_i = 1 and 0 otherwise in the optimal solution. The fractional objective function can be removed, resulting in an MILP problem, by using the same change of variable as in linear-fractional programming:

    max_{x',u',m',y',z',t,δ} { Σ_i y'_i |  u'_i ≥ m' − M δ_i, all i
                                           u'_i ≤ m' + M(1 − δ_i), all i
                                           y'_i ≤ u'_i, y'_i ≤ M δ_i, δ_i ∈ {0, 1}, all i
                                           z'_i ≥ 0, z'_i ≥ m' − M(1 − δ_i), all i
                                           A x' + B u' ≤ b t, Σ_i z'_i = 1, t ≥ 0
                                           Σ_i δ_i ≤ n/2 }

In many practical applications involving fairness, efficiency is desired as well. A standard efficiency measure is the utilitarian
SWF W(u) = Σ_{i=1}^n u_i, which is indifferent to inequalities among the individual utilities. One obvious strategy for combining equity and efficiency is to define an SWF that is a convex combination of total utility and a fairness criterion, such as one of those described in previous sections. Although this strategy is convenient, it poses the difficult challenge of selecting and interpreting the weight parameters of the convex combination. A popular alternative is alpha fairness, of which proportional fairness (the Nash bargaining solution) is a special case. The Kalai-Smorodinsky bargaining solution is another option. There are also schemes that combine Rawlsian and utilitarian criteria based on justice principles proposed in Williams and Cookson (2000).

Alpha fairness regulates the relative importance of equity and efficiency with a parameter α in the SWF

    W_α(u) = (1/(1−α)) Σ_i u_i^{1−α}   for α ≥ 0, α ≠ 1
    W_α(u) = Σ_i log(u_i)              for α = 1

The SWF is purely utilitarian when α = 0 and becomes purely maximin as α → ∞. If one person's utility u_i is less than another's utility u_j, then u_j must be reduced by (u_j/u_i)^α units to compensate for a unit increase in u_i while maintaining constant social welfare. Thus larger values of α imply a greater sacrifice from the person j who is better off, and can therefore be interpreted as giving more emphasis to fairness. Lan et al. (2010) give an axiomatic justification of alpha fairness in the context of network resource allocation.

The problem of maximizing W_α(u) can be solved directly in the form

    max_{x,u} { W_α(u) | A x + B u ≤ b }

without reformulation. The objective function is nonlinear, but since it is concave for all α ≥
0, any local optimum is a global optimum. The problem can therefore be solved to optimality by such efficient algorithms as the reduced gradient method, which is a straightforward generalization of the simplex method for LP, particularly since W_α(u) has a simple closed-form gradient. Maximizing alpha fairness can thus be regarded as tractable for reasonably large instances.

A well-known special case of alpha fairness is proportional fairness, which corresponds to setting α = 1. Maximizing proportional fairness is equivalent to solving the Nash bargaining problem (Nash [1950]). Nash gave an axiomatic argument for the model, and it has also been justified as the result of certain rational bargaining procedures (Harsanyi [1977], Rubinstein [1982], Binmore et al. [1986]). Proportional fairness is widely used in engineering to maximize throughput while maintaining some degree of fairness, for example in telecommunication networks and traffic signal timing. The
Kalai-Smorodinsky bargaining solution, proposed as an alternative to the Nash bargaining solution, minimizes each person's relative concession (Kalai and Smorodinsky [1975]). That is, it provides the largest possible utility relative to the maximum one could obtain if the other players were disregarded, subject to the condition that all persons obtain the same fraction β of their maximum. It has been defended by Thomson (1994) and is consistent with the "contractarian" ethical philosophy of Gauthier (1983). The SWF is
\[
W(\mathbf{u}) =
\begin{cases}
\sum_i u_i, & \text{if } \mathbf{u} = \beta\,\mathbf{u}^{\max} \text{ for some } \beta \text{ with } 0 \le \beta \le 1 \\
0, & \text{otherwise}
\end{cases}
\]
where u_i^max = max_{x,u} { u_i | Ax + Bu ≤ b } for each i. The optimization problem is a straightforward LP:
\[
\max_{\beta,\mathbf{x},\mathbf{u}} \bigl\{ \beta \bigm| \mathbf{u} = \beta\,\mathbf{u}^{\max},\;\ A\mathbf{x} + B\mathbf{u} \le \mathbf{b},\;\ \beta \le 1 \bigr\}
\]
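The two-stage computation behind the Kalai-Smorodinsky solution (first find each person's ideal utility by an LP, then maximize the common fraction β) can be sketched with `scipy.optimize.linprog`. The 2-person feasible set below is hypothetical, with no x variables, and is chosen only to show the mechanics:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical 2-person feasible set  B u <= b, u >= 0 (no x variables).
B = np.array([[1.0, 2.0],
              [3.0, 1.0]])
b = np.array([4.0, 6.0])
n = 2

# Stage 1: each person's ideal utility u_i^max, disregarding everyone else.
u_max = np.empty(n)
for i in range(n):
    c = np.zeros(n); c[i] = -1.0          # linprog minimizes, so maximize u_i
    u_max[i] = -linprog(c, A_ub=B, b_ub=b).fun

# Stage 2: maximize the common fraction beta with u = beta * u_max.
# Substituting u = beta * u_max leaves an LP in the single variable beta.
res = linprog([-1.0], A_ub=(B @ u_max).reshape(-1, 1), b_ub=b,
              bounds=[(0.0, 1.0)])
beta = res.x[0]
u = beta * u_max
print("u_max =", u_max, " beta =", round(beta, 4), " u =", u)
```

Here u^max = (2, 2) and β = 2/3, giving u = (4/3, 4/3), the largest point on the ray through u^max that remains feasible.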
Figure 1: Contours for the equity-based Williams-Cookson SWF.
We now show how to formulate optimization models that directly combine Rawlsian and utilitarian criteria. These formulations are useful when either criterion in isolation would be too extreme for policy making. Williams and Cookson (2000) suggest two ideas for combining maximin and utilitarian objectives in the case of two persons, and these can be generalized to n persons and formulated as MILP models. They correspond to opposite approaches to combining equity and efficiency: an equity-based approach that begins by maximizing equity but switches to a utilitarian criterion to avoid extreme solutions, and a utility-based approach that does the opposite. We also show how to generalize the equity-based approach so as to combine leximax and utilitarian objectives in a sequence of MILP models (Chen and Hooker [2020b]).

The 2-person equity-based model of Williams and Cookson (2000) pursues fairness until the efficiency cost becomes too high, whereupon it switches to a utilitarian objective. It uses a maximin criterion when the two utilities are sufficiently close to each other, specifically |u_1 − u_2| < ∆, and otherwise it uses a utilitarian criterion. This is illustrated in Fig. 1, where the feasible set is the area under the curve. The maximin solution (open circle) requires a substantial sacrifice from person 2. As a result, the utilitarian solution (black dot) earns slightly more social welfare and is the preferred choice. The SWF can be written
\[
W(u_1,u_2) =
\begin{cases}
u_1 + u_2, & \text{if } |u_1 - u_2| \ge \Delta \\
2\min\{u_1,u_2\} + \Delta, & \text{otherwise}
\end{cases}
\]
The maximin criterion is modified from the standard formula min{u_1, u_2} to ensure continuity of the SWF as one shifts between the utilitarian and the maximin objective.

Hooker and Williams (2012) generalize W to n persons. The utility u_i of person i belongs to the fair region if u_i − u_min ≤ ∆ and otherwise to the utilitarian region, where u_min = min_i {u_i}.
A person whose utility is in the fair region is considered sufficiently disadvantaged to deserve priority. The generalized SWF W(u) counts all utilities in the fair region as equal to u_min, so that they are treated in solidarity with the worst-off, and all other utilities as themselves. As in the 2-person case, copies of ∆ are added to the maximin criterion to ensure continuity of W:
\[
W(\mathbf{u}) = (n-1)\Delta + \sum_{i=1}^n \max\{u_i - \Delta,\; u_{\min}\} \tag{7}
\]
The parameter ∆ regulates the equity/efficiency trade-off, with ∆ = 0 corresponding to a purely utilitarian objective and ∆ = ∞ to a purely maximin objective.

In addition, Hooker and Williams (2012) extended W(u) to represent the social welfare associated with the utility distribution to groups of recipients. Suppose there are n groups of possibly different sizes, and let s_i and u_i respectively denote the number of individuals in group i and the utility of each individual in the group. The function W_g(u) considers a group i to be in the fair region when its per capita utility u_i is within ∆ of u_min, and it prioritizes only the groups in the fair region:
\[
W_g(\mathbf{u}) = \Bigl(\sum_{i=1}^n s_i - 1\Bigr)\Delta + \sum_{i=1}^n s_i \max\{u_i - \Delta,\; u_{\min}\} \tag{8}
\]
Hooker and Williams (2012) provided tractable MILP models to maximize W(u) and W_g(u) subject to auxiliary constraints u_i − u_j ≤ M required for MILP representability. The model for maximizing W is
\[
\max_{\mathbf{x},\mathbf{u},\boldsymbol{\delta},\mathbf{v},w,z}\; z
\quad \text{subject to} \quad
\begin{array}{l}
z \le (n-1)\Delta + \sum_i v_i \\
u_i - \Delta \le v_i \le u_i - \Delta\delta_i, \ \text{all } i \\
w \le v_i \le w + (M-\Delta)\delta_i, \ \text{all } i \\
u_i - u_j \le M, \ \text{all } i,j \\
A\mathbf{x} + B\mathbf{u} \le \mathbf{b} \\
u_i \ge 0,\;\ \delta_i \in \{0,1\}, \ \text{all } i
\end{array} \tag{9}
\]
and the model for W_g is similar. The practicality of these models was verified with experiments on a healthcare resource allocation instance of realistic size.
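Model (9) can be prototyped with `scipy.optimize.milp` (SciPy 1.9+). The sketch below is a minimal illustration under stated assumptions: the feasible set is simply B u ≤ b with no x variables, M is fixed at 100, and the auxiliary constraints u_i − u_j ≤ M are omitted because the toy feasible region is already bounded. The data are hypothetical:

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

def maximize_equity_swf(B, b, delta, M=100.0):
    """Sketch of the Hooker-Williams MILP (9) for the equity-based SWF
    W(u) = (n-1)*delta + sum_i max(u_i - delta, u_min),
    over a feasible set B u <= b, u >= 0 (no x variables).
    Variable layout: [u (n), v (n), w, z, d (n)] with d binary."""
    n = B.shape[1]
    nv = 3 * n + 2
    iu = np.arange(n); iv = n + np.arange(n); iw = 2 * n; iz = 2 * n + 1
    id_ = 2 * n + 2 + np.arange(n)

    rows, rhs = [], []
    def add(coefs, ub):                     # one linear row: coefs . y <= ub
        row = np.zeros(nv)
        for j, a in coefs:
            row[j] += a
        rows.append(row); rhs.append(ub)

    # z <= (n-1)*delta + sum_i v_i
    add([(iz, 1.0)] + [(j, -1.0) for j in iv], (n - 1) * delta)
    for i in range(n):
        add([(iu[i], 1.0), (iv[i], -1.0)], delta)                 # u_i - delta <= v_i
        add([(iv[i], 1.0), (iu[i], -1.0), (id_[i], delta)], 0.0)  # v_i <= u_i - delta*d_i
        add([(iw, 1.0), (iv[i], -1.0)], 0.0)                      # w <= v_i
        add([(iv[i], 1.0), (iw, -1.0), (id_[i], -(M - delta))], 0.0)  # v_i <= w + (M-delta)*d_i
    for Brow, bk in zip(B, b):              # feasible set: B u <= b
        add([(iu[i], Brow[i]) for i in range(n)], bk)

    c = np.zeros(nv); c[iz] = -1.0          # maximize z
    integrality = np.zeros(nv); integrality[id_] = 1
    lb = np.zeros(nv); ub = np.full(nv, np.inf)
    lb[iv] = -np.inf; lb[iz] = -np.inf; ub[id_] = 1
    res = milp(c, constraints=LinearConstraint(np.vstack(rows), -np.inf, rhs),
               integrality=integrality, bounds=Bounds(lb, ub))
    return res.x[iu], -res.fun

# Toy instance: 2 persons, person 1 twice as costly to serve.
B = np.array([[2.0, 1.0]]); b = np.array([8.0])
for delta in (2.0, 5.0):
    u, W = maximize_equity_swf(B, b, delta)
    print(f"delta={delta}: u={np.round(u, 3)}, W={round(W, 3)}")
```

With ∆ = 2 the model switches to the utilitarian point (0, 8), while with ∆ = 5 it protects the worst-off and returns the maximin point (8/3, 8/3), illustrating how ∆ regulates the trade-off.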
Alternatively, when efficiency is the initial objective, fairness is not considered until inequality in the utility distribution becomes intolerable. For the 2-person case, Williams and Cookson (2000) define the SWF to be utilitarian when |u_1 − u_2| < ∆, which corresponds to the case where enforcing fairness is unnecessary. The SWF is maximin (again with a modification for continuity) otherwise. In Fig. 2, the utilitarian solution (open dot) is unfair to person 1, and the welfare-maximizing solution is more egalitarian (black dot). The SWF is
\[
\bar{W}(u_1,u_2) =
\begin{cases}
2\min\{u_1,u_2\} + \Delta, & \text{if } |u_1 - u_2| \ge \Delta \\
u_1 + u_2, & \text{otherwise}
\end{cases}
\]
We generalize this view to define SWFs that capture the combined objective for n persons or groups, with techniques similar to those used by Hooker and Williams. The main difference is that we now say a utility u_i belongs to the fair region if u_i − u_min ≥ ∆; otherwise it is in the utilitarian region. In the SWFs $\bar{W}$ and $\bar{W}_g$, we still count the fair-region utilities as equivalent to u_min and the utilitarian-region utilities as their exact values, and add the needed multiples of ∆ to obtain continuous SWFs.
\[
\bar{W}(\mathbf{u}) = (n-1)\Delta + \sum_{i=1}^n \min\{u_i - \Delta,\; u_{\min}\}
\]
\[
\bar{W}_g(\mathbf{u}) = \Bigl(\sum_{i=1}^n s_i - 1\Bigr)\Delta + \sum_{i=1}^n s_i \min\{u_i - \Delta,\; u_{\min}\}
\]
Figure 2: Contours for the utility-based Williams-Cookson SWF.

Because $\bar{W}(\mathbf{u})$ and $\bar{W}_g(\mathbf{u})$ are continuous and concave functions, the corresponding maximization problems have simple LP formulations that are convenient to solve. We first derive the LP formulation for maximizing $\bar{W}(\mathbf{u})$.
The maximization problem is
\[
\max_{\mathbf{x},\mathbf{u},\mathbf{v}}\; (n-1)\Delta + \sum_{i=1}^n v_i
\quad \text{subject to} \quad
\begin{array}{l}
v_i = \min\{u_i - \Delta,\; u_{\min}\}, \ \text{all } i \\
A\mathbf{x} + B\mathbf{u} \le \mathbf{b}
\end{array}
\]
It is equivalent to the following LP because the constraints on v_i hold if and only if v_i ≤ u_i − ∆ and v_i ≤ w, where w = u_min. We require w ≤ u_i for all i to ensure that w is set to u_min in the optimal solution.
\[
\max_{\mathbf{x},\mathbf{u},\mathbf{v},w,z}\; z
\quad \text{subject to} \quad
\begin{array}{l}
z \le (n-1)\Delta + \sum_{i=1}^n v_i \\
v_i \le w \le u_i, \ \text{all } i \\
v_i \le u_i - \Delta, \ \text{all } i \\
w \ge 0,\;\ v_i \ge 0, \ \text{all } i \\
A\mathbf{x} + B\mathbf{u} \le \mathbf{b}
\end{array}
\]
The formulation for maximizing $\bar{W}_g$ is similar:
\[
\max_{\mathbf{x},\mathbf{u},\mathbf{v},w,z}\; z
\quad \text{subject to} \quad
\begin{array}{l}
z \le \bigl(\sum_{i=1}^n s_i - 1\bigr)\Delta + \sum_{i=1}^n s_i v_i \\
v_i \le w \le u_i, \ \text{all } i \\
v_i \le u_i - \Delta, \ \text{all } i \\
w \ge 0,\;\ v_i \ge 0, \ \text{all } i \\
A\mathbf{x} + B\mathbf{u} \le \mathbf{b}
\end{array}
\]

A leximax criterion offers broader sensitivity to equity than a maximin criterion, which is concerned only with the worst-off. Chen and Hooker (2020a,b) offer a series of SWFs that can be maximized sequentially to combine leximax and utilitarian criteria in a principled way:
\[
W_k(\mathbf{u}) = \sum_{i=1}^{k-1} (n-i+1)\, u_{\langle i \rangle}
+ (n-k+1)\min\bigl\{ u_{\langle 1 \rangle} + \Delta,\; u_{\langle k \rangle} \bigr\}
+ \sum_{i=k}^{n} \bigl( u_{\langle i \rangle} - u_{\langle 1 \rangle} - \Delta \bigr)^+,
\quad k = 2, \dots, n
\]
where γ⁺ = max{0, γ}, and where u_⟨1⟩, …, u_⟨n⟩ are u_1, …, u_n in nondecreasing order. The initial function W_1 is given by (7). The parameter ∆ again regulates the efficiency/equity trade-off by giving preference to individuals whose utility is within ∆ of the lowest, with greater weight given to the more disadvantaged.

The MILP model for maximizing W_1 is (9). Using notation similar to that for the goal programming model (6), the MILP formulation for maximizing W_k, k ≥ 2, is
\[
\max_{\mathbf{x},\mathbf{u},\boldsymbol{\delta},\boldsymbol{\epsilon},\mathbf{v},w,\tau,z}\; z
\quad \text{subject to} \quad
\begin{array}{l}
z \le (n-k+1)\tau + \sum_{i \in I_k} v_i \\
0 \le v_i \le M\delta_i, \ i \in I_k \\
v_i \le u_i - \hat{u}_1 - \Delta + M(1-\delta_i), \ i \in I_k \\
\tau \le \hat{u}_1 + \Delta, \;\ \tau \le w, \;\ w \ge \hat{u}_1 \\
w \le u_i \le w + M(1-\epsilon_i), \ i \in I_k \\
u_i - \hat{u}_1 \le M, \ i \in I_k \\
A\mathbf{x} + B\mathbf{u} \le \mathbf{b} \\
\sum_{i \in I_k} \epsilon_i = 1; \;\ \delta_i, \epsilon_i \in \{0,1\}, \ i \in I_k
\end{array}
\]
The model is demonstrated in Chen and Hooker (2020b) and found to solve rapidly on healthcare resource and earthquake shelter location problems.
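Returning to the LP for maximizing the utility-based SWF $\bar{W}(\mathbf{u})$ derived above, here is a minimal numerical sketch with `scipy.optimize.linprog`. The 2-person feasible set (2u_1 + u_2 ≤ 8, no x variables) and the choice ∆ = 1 are hypothetical:

```python
import numpy as np
from scipy.optimize import linprog

# LP sketch for maximizing the utility-based SWF
#   W(u) = (n-1)*Delta + sum_i min(u_i - Delta, u_min)
# over a hypothetical 2-person feasible set 2*u1 + u2 <= 8, u >= 0.
# Variable order: [u1, u2, v1, v2, w, z]; w plays the role of u_min.
Delta, n = 1.0, 2
c = [0, 0, 0, 0, 0, -1.0]                  # linprog minimizes, so maximize z
A_ub = [
    [ 0,  0, -1, -1,  0,  1],              # z <= (n-1)*Delta + v1 + v2
    [ 0,  0,  1,  0, -1,  0],              # v1 <= w
    [ 0,  0,  0,  1, -1,  0],              # v2 <= w
    [-1,  0,  0,  0,  1,  0],              # w <= u1
    [ 0, -1,  0,  0,  1,  0],              # w <= u2
    [-1,  0,  1,  0,  0,  0],              # v1 <= u1 - Delta
    [ 0, -1,  0,  1,  0,  0],              # v2 <= u2 - Delta
    [ 2,  1,  0,  0,  0,  0],              # feasible set: 2*u1 + u2 <= 8
]
b_ub = [(n - 1) * Delta, 0, 0, 0, 0, -Delta, -Delta, 8.0]
res = linprog(c, A_ub=A_ub, b_ub=b_ub)     # default bounds give u, v, w, z >= 0
u, W = res.x[:2], -res.fun
print("u =", np.round(u, 4), " W =", round(W, 4))
```

With ∆ = 1 the optimum is u = (7/3, 10/3) with welfare 14/3, a compromise between the utilitarian point (0, 8) and the maximin point (8/3, 8/3).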
Optimization-based fairness is readily implemented in support vector machines (SVMs), because the problem of finding a separating hyperplane is already a constrained optimization problem. In principle, one need only replace the usual objective of maximizing the margin with that of maximizing a social welfare function. However, fairness metrics can be incorporated into the SVM problem only if 0–1 variables are introduced to indicate how individuals are classified. Since the optimization problem is solved over all observations in the training set, this can result in a very large number of discrete variables and associated constraints. If one relies on an off-the-shelf MILP solver, this could limit the size of the training set to a few hundred observations.

Scaling up therefore remains a research issue, but possible strategies suggest themselves. An obvious one is to solve the problem using a subset of representative observations, perhaps selected with clustering techniques. Another is to draw on specialized techniques in the MILP literature for solving problems with many 0–1 variables.
Branch-and-price methods, for example, routinely solve practical problems with millions of 0–1 variables by using column generation techniques to introduce variables only as they are needed to improve the solution. A related technique, "shrinking," is already used in sequential minimal optimization algorithms for SVMs.

We begin by showing how the classical SVM problem can be reformulated as an LP problem by normalizing in a slightly different way. This clears the way for fairness problems containing 0–1 variables to be formulated as MILP problems. We then show how statistical fairness metrics can be incorporated into the SVM problem. This is followed by a more general MILP model that allows the use of any SWF linearized in the previous sections of this paper.
Adopting notation from the SVM literature, let each observation i in the training set consist of a real-valued vector x_i of features and an indicator y_i of whether the individual should receive a true classification (y_i = 1 for true, y_i = −1 otherwise). A separating hyperplane {x | θᵀx + b = 0} is one such that y_i(θᵀx_i + b) ≥ 0 for all i. Note that x_i and y_i are problem data, not variables. The SVM problem is to find a separating hyperplane that maximizes the margin σ, which is the minimum distance from the hyperplane to a point x_i over all i. The standard SVM model interprets distance in terms of the L_2-norm:
\[
\max_{\boldsymbol{\theta},b,\sigma} \bigl\{ \sigma \bigm| \sigma \le y_i(\boldsymbol{\theta}^{\!\top} \mathbf{x}_i + b)/\|\boldsymbol{\theta}\|_2, \ \text{all } i \bigr\} \tag{10}
\]
This is transformed to a problem with a convex nonlinear objective function and linear constraints:
\[
\min_{\boldsymbol{\theta},b} \bigl\{ \|\boldsymbol{\theta}\|_2 \bigm| y_i(\boldsymbol{\theta}^{\!\top} \mathbf{x}_i + b) \ge 1, \ \text{all } i \bigr\}
\]
The "soft margin" version of the problem adds to the objective function the sum of the errors ξ_i that result when observation i falls on the wrong side of the hyperplane:
\[
\min_{\boldsymbol{\theta},b,\boldsymbol{\xi}} \Bigl\{ \|\boldsymbol{\theta}\|_2 + C\sum_i \xi_i \Bigm| y_i(\boldsymbol{\theta}^{\!\top} \mathbf{x}_i + b) \ge 1 - \xi_i, \;\ \xi_i \ge 0, \ \text{all } i \Bigr\}
\]
This problem is solved by solving its Wolfe dual, which is much smaller because it no longer contains a constraint for each observation. The Wolfe dual is an equivalent problem because the primal satisfies Slater's condition, and as a result there is no duality gap.

To obtain an LP model, the SVM problem can be normalized with the L_1-norm instead of the L_2-norm, in which case problem (10) becomes
\[
\max_{\boldsymbol{\theta},b,\sigma} \bigl\{ \sigma \bigm| \sigma \le y_i(\boldsymbol{\theta}^{\!\top} \mathbf{x}_i + b)/\|\boldsymbol{\theta}\|_1, \ \text{all } i \bigr\}
\]
Early literature studied this L_1-norm based SVM and discussed its potential advantages over the conventional L_2-norm based formulation (e.g., Bradley and Mangasarian [1998], Zhu et al. [2003]).
Using a similar transformation as before, we have the soft margin problem
\[
\min_{\theta, b, \xi} \Bigl\{ \|\theta\|_1 + C \sum_i \xi_i \Bigm| y_i(\theta^\top x_i + b) \ge 1 - \xi_i, \ \text{all } i \Bigr\}
\]
Since $\|\theta\|_1 = \sum_j |\theta_j|$, we can now linearize the model to obtain an LP problem:
\[
\min_{\theta, b, \xi, t} \Bigl\{ \sum_j t_j + C \sum_i \xi_i \Bigm| \begin{array}{l} -t_j \le \theta_j \le t_j, \ t_j \ge 0, \ \text{all } j \\ y_i(\theta^\top x_i + b) \ge 1 - \xi_i, \ \text{all } i \end{array} \Bigr\} \tag{11}
\]
Social welfare and bias measures can be formulated only if we introduce 0–1 variables $\delta_i$ to indicate how an individual $i$ is classified. The resulting problem can no longer be solved using the classical strategy of solving the Wolfe dual, because there is a duality gap when integer variables are present. We therefore solve the original (primal) model directly by formulating it as an MILP problem.

We wish to set $\delta_i = 1$ when individual $i$ falls on the true side of the hyperplane; that is, when $\theta^\top x_i + b \ge 0$. Since the 0–1 variables $\delta_i$ make the problem harder, it is important to formulate this condition in a way that allows efficient solution. We therefore write a sharp MILP formulation of a disjunctive model for each $\delta_i$ (a sharp formulation is one whose continuous relaxation is the convex hull of the feasible set). The disjunctive model is
\[
\bigl( \theta^\top x_i + b \le 0, \ \delta_i = 0 \bigr) \ \vee \ \bigl( \theta^\top x_i + b \ge 0, \ \delta_i = 1 \bigr) \tag{12}
\]
This model has an MILP representation if and only if the polyhedra described by the two disjuncts have the same recession cone. To ensure this, we impose in each disjunct the constraint $-M \le \theta^\top x_i + b \le M$, which is valid for sufficiently large $M$. The disjunction (12) now has a sharp MILP representation that simplifies to
\[
-M(1 - \delta_i) \le \theta^\top x_i + b \le M \delta_i, \qquad \delta_i \in \{0, 1\}
\]
This constraint can be added to the SVM model for each $i$ to define $\delta_i$ and enable the objective function to reflect bias or fairness.
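As an illustration of how the LP (11) can be assembled and solved in practice, here is a minimal sketch using `scipy.optimize.linprog`. The toy data, the value of $C$, and the use of SciPy's solver are our assumptions for illustration, not part of the text:

```python
import numpy as np
from scipy.optimize import linprog

def l1_svm(X, y, C=1.0):
    """Solve the L1-norm soft-margin SVM as the LP (11):
        min  sum_j t_j + C sum_i xi_i
        s.t. -t_j <= theta_j <= t_j, t_j >= 0, all j
             y_i (theta^T x_i + b) >= 1 - xi_i, xi_i >= 0, all i
    Variable order: theta (d), b (1), t (d), xi (n)."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    n, d = X.shape
    c = np.concatenate([np.zeros(d + 1), np.ones(d), C * np.ones(n)])
    rows, rhs = [], []
    for j in range(d):                 # theta_j - t_j <= 0 and -theta_j - t_j <= 0
        for sign in (1.0, -1.0):
            r = np.zeros(2 * d + 1 + n)
            r[j], r[d + 1 + j] = sign, -1.0
            rows.append(r)
            rhs.append(0.0)
    for i in range(n):                 # -y_i (theta^T x_i + b) - xi_i <= -1
        r = np.zeros(2 * d + 1 + n)
        r[:d], r[d], r[2 * d + 1 + i] = -y[i] * X[i], -y[i], -1.0
        rows.append(r)
        rhs.append(-1.0)
    bounds = [(None, None)] * (d + 1) + [(0, None)] * (d + n)  # t, xi >= 0
    res = linprog(c, A_ub=np.array(rows), b_ub=np.array(rhs), bounds=bounds)
    return res.x[:d], res.x[d]

# Separable toy data; with a large C the optimal slacks are zero.
X = [[1, 1], [0, 1], [3, 2], [2, 3]]
y = [-1, -1, 1, 1]
theta, b = l1_svm(X, y, C=10.0)
```

On this data the LP recovers a hyperplane with $\|\theta\|_1 = 4/3$ and every point at margin at least 1, as the hard-margin analysis predicts.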
We first indicate how statistical bias measures can be incorporated into the SVM problem. These metrics typically compare statistics for a protected group with those outside the protected group. For each observation $i$, let $z_i = 1$ when individual $i$ belongs to the protected group. The simplest bias metric is demographic parity, which is based on the difference between the probability of true classification for protected and unprotected individuals. This yields the bias metric $|\Delta(\delta)|$, where
\[
\Delta(\delta) = \frac{\sum_i z_i \delta_i}{\sum_i z_i} - \frac{\sum_i (1 - z_i) \delta_i}{\sum_i (1 - z_i)}
\]
This metric can be incorporated into the SVM problem by minimizing a convex combination of the margin-plus-error and bias, $\lambda \bigl( \sum_j t_j + C \sum_i \xi_i \bigr) + (1 - \lambda) |\Delta(\delta)|$, where $0 \le \lambda \le 1$. Since $\Delta(\delta)$ is a linear function of $\delta$, the problem is easily linearized by extending the LP model (11) to obtain the following MILP model:
\[
\min_{\theta, b, \xi, t, \delta, w} \ \lambda \Bigl( \sum_j t_j + C \sum_i \xi_i \Bigr) + (1 - \lambda) w
\]
subject to
\[
\begin{array}{ll}
-t_j \le \theta_j \le t_j, \ t_j \ge 0, & \text{all } j \\
y_i(\theta^\top x_i + b) \ge 1 - \xi_i, & \text{all } i \\
-w \le \Delta(\delta) \le w & \\
-M(1 - \delta_i) \le \theta^\top x_i + b \le M \delta_i, & \text{all } i \\
\delta_i \in \{0, 1\}, & \text{all } i
\end{array}
\]
A similar MILP model can be used for equalized odds, since the corresponding function $\Delta(\delta)$ is likewise a linear function of $\delta$. Predictive rate parity requires a SWF that contains ratios of affine functions, but it can be formulated as an MILP problem using the same change of variable as in linear-fractional programming.

We now formulate a more general model that maximizes social welfare. For illustrative purposes, we maximize a convex combination of (negated) margin-plus-error and a social welfare function $W(u)$, so as to allow the accuracy of prediction to contribute to welfare. The function $W(u)$ could reflect fairness or a combination of fairness and utility, as discussed in previous sections. This general model requires that we define the utility $u_i$ enjoyed by each individual $i$, which will depend on whether the individual receives the true classification.
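The building blocks of this MILP are easy to check in isolation. The sketch below (with hypothetical data) computes the demographic-parity gap $\Delta(\delta)$ and verifies that the big-M constraints force $\delta_i$ to agree with the side of the hyperplane on which $x_i$ falls:

```python
def demographic_parity_gap(delta, z):
    """Delta(delta): rate of true classification in the protected group
    (z_i = 1) minus the rate outside it (z_i = 0)."""
    prot = [d for d, zi in zip(delta, z) if zi == 1]
    unprot = [d for d, zi in zip(delta, z) if zi == 0]
    return sum(prot) / len(prot) - sum(unprot) / len(unprot)

def bigM_feasible(delta_i, score_i, M):
    """Check -M (1 - delta_i) <= score_i <= M delta_i,
    where score_i = theta^T x_i + b."""
    return -M * (1 - delta_i) <= score_i <= M * delta_i

# Hypothetical classification of 6 individuals, first three protected.
delta = [1, 0, 1, 1, 1, 0]
z = [1, 1, 1, 0, 0, 0]
gap = demographic_parity_gap(delta, z)  # 2/3 - 2/3 = 0.0
```

As the check confirms, a point with a positive score admits only $\delta_i = 1$, a point with a negative score admits only $\delta_i = 0$, matching the disjunction (12).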
If we let $u_i = c + c_i \delta_i$, then we can set $c_i > 0$ when $y_i = 1$ to indicate that the individual is better off being classified true, while we set $c_i < 0$ when $y_i = -1$. The resulting model is
\[
\max_{\theta, b, \xi, t, \delta, u} \ -\lambda \Bigl( \sum_j t_j + C \sum_i \xi_i \Bigr) + (1 - \lambda) W(u)
\]
subject to
\[
\begin{array}{ll}
-t_j \le \theta_j \le t_j, \ t_j \ge 0, & \text{all } j \\
y_i(\theta^\top x_i + b) \ge 1 - \xi_i, & \text{all } i \\
u_i = c + c_i \delta_i, & \text{all } i \\
-M(1 - \delta_i) \le \theta^\top x_i + b \le M \delta_i, & \text{all } i \\
\delta_i \in \{0, 1\}, & \text{all } i
\end{array}
\]
Since the term $\sum_j t_j + C \sum_i \xi_i$ in the objective function is linear, the objective function can be linearized whenever $W(u)$ can be linearized. In particular, there is an MILP model whenever previous sections describe an LP or MILP model for the social welfare function $W$.

Conclusion
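To illustrate the utility definition, the following sketch evaluates $u_i = c + c_i \delta_i$ for a hypothetical outcome and applies two welfare functions that $W(u)$ might instantiate, the utilitarian sum and the Rawlsian maximin. The constants $c$ and $c_i$ are illustrative choices, not values from the text:

```python
def utilities(delta, y, c=1.0, gain=2.0):
    """u_i = c + c_i delta_i, with c_i = +gain when y_i = 1 (the individual
    benefits from a true classification) and c_i = -gain when y_i = -1."""
    return [c + (gain if yi == 1 else -gain) * di for di, yi in zip(delta, y)]

def utilitarian(u):   # total utility: sum_i u_i
    return sum(u)

def rawlsian(u):      # maximin criterion: utility of the worst-off individual
    return min(u)

# Hypothetical outcome: individuals 0, 2, and 3 classified true.
y = [1, -1, 1, -1]
delta = [1, 0, 1, 1]
u = utilities(delta, y)   # [3.0, 1.0, 3.0, -1.0]
```

Swapping `utilitarian` for `rawlsian` (or any SWF linearized in previous sections) changes only the objective term $W(u)$; the constraints of the MILP are unaffected.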
We have shown how optimization can provide a general paradigm for incorporating fairness into AI and machine learning applications. In particular, we have illustrated how it can be used in rule-based systems, in conjunction with neural networks, or as part of the optimization problem in support vector machines. By expanding the fairness problem to one of maximizing a social welfare function, one can combine fairness with prediction accuracy and other efficiency goals in a principled way. Optimization models also provide the flexibility of adding constraints on resources and other problem elements while harnessing the power of highly advanced optimization solvers that have been developed over several decades.

Specifically, we have exhibited practical optimization models for 16 useful social welfare functions. Most of these models do not, to our knowledge, appear in previous literature. They can be efficiently solved by state-of-the-art software: six by linear programming solvers, eight by mixed integer/linear programming solvers, and two by convex nonlinear or quadratic programming solvers that accommodate linear constraints.
References
S. Aghaei, M. J. Azizi, and P. Vayanos. Learning optimal and fair decision trees for non-discriminative decision-making. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 1418–1426, 2019.
I. Alabdulmohsin. Fair classification via unconstrained optimization. arXiv preprint arXiv:2005.14621, 2020.
C. Allen, I. Smit, and W. Wallach. Artificial morality: Top-down, bottom-up, and hybrid approaches. Ethics and Information Technology, 7:149–155, 2005.
R. Berk, H. Heidari, S. Jabbari, M. Joseph, M. Kearns, J. Morgenstern, S. Neel, and A. Roth. A convex framework for fair regression. arXiv preprint arXiv:1706.02409, 2017.
K. Binmore, A. Rubinstein, and A. Wolinsky. The Nash bargaining solution in economic modeling. RAND Journal of Economics, 17:176–188, 1986.
P. S. Bradley and O. L. Mangasarian. Feature selection via concave minimization and support vector machines. In ICML, volume 98, pages 82–90, 1998.
S. Bringsjord, K. Arkoudas, and P. Bello. Toward a general logicist methodology for engineering ethically correct robots. IEEE Intelligent Systems, 21:38–44, 2006.
F. Calmon, D. Wei, B. Vinzamuri, K. N. Ramamurthy, and K. R. Varshney. Optimized pre-processing for discrimination prevention. In Advances in Neural Information Processing Systems, pages 3992–4001, 2017.
A. Charnes and W. W. Cooper. Programming with linear fractional functionals. Naval Research Logistics Quarterly, 9:181–186, 1962.
V. Chen and J. N. Hooker. A just approach balancing Rawlsian leximax fairness and utilitarianism. In AAAI/ACM Conference on AI, Ethics, and Society (AIES), pages 221–227, 2020a.
V. Chen and J. N. Hooker. Balancing fairness and efficiency in an optimization model. arXiv preprint arXiv:2006.05963, 2020b.
M. Chiarandini, R. Fagerberg, and S. Gualandi. Handling preferences in student-project allocation. Annals of Operations Research, 275(1):39–78, 2019.
A. Chouldechova and A. Roth. A snapshot of the frontiers of fairness in machine learning. Communications of the ACM, 63(5):82–89, 2020.
F. A. Cowell. Measurement of inequality. In A. B. Atkinson and F. Bourguignon, editors, Handbook of Income Distribution, volume 1, pages 89–166. Elsevier, 2000.
M. Donini, L. Oneto, S. Ben-David, J. S. Shawe-Taylor, and M. Pontil. Empirical risk minimization under fairness constraints. In Advances in Neural Information Processing Systems, pages 2791–2801, 2018.
C. Dwork, M. Hardt, T. Pitassi, O. Reingold, and R. S. Zemel. Fairness through awareness. In Symposium on Innovations in Theoretical Computer Science (ITCS), pages 214–226, 2012.
O. Eisenhandler and M. Tzur. The humanitarian pickup and distribution problem. Operations Research, 67:10–32, 2019.
H. G. Frankfurt. On Inequality. Princeton University Press, 2015.
S. Freeman, editor. The Cambridge Companion to Rawls. Cambridge University Press, 2003.
I. Gabriel. Artificial intelligence, values, and alignment. Minds and Machines, 30:411–437, 2020.
D. Gauthier. Morals by Agreement. Oxford University Press, 1983.
N. Goel, M. Yaghini, and B. Faltings. Non-discriminatory machine learning through convex fairness criteria. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018.
M. Hardt, E. Price, and N. Srebro. Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems, pages 3315–3323, 2016.
J. C. Harsanyi. Rational Behavior and Bargaining Equilibrium in Games and Social Situations. Cambridge University Press, 1977.
H. Heidari, C. Ferrari, K. Gummadi, and A. Krause. Fairness behind a veil of ignorance: A welfare analysis for automated decision making. In Advances in Neural Information Processing Systems, pages 1265–1276, 2018.
H. Heidari, M. Loi, K. P. Gummadi, and A. Krause. A moral framework for understanding fair ML through economic models of equality of opportunity. In Proceedings of the Conference on Fairness, Accountability, and Transparency, pages 181–190, 2019.
J. N. Hooker and T. W. Kim. Toward non-intuition-based machine and artificial intelligence ethics: A deontological approach based on modal logic. In AAAI/ACM Conference on AI, Ethics, and Society (AIES), pages 130–136, 2018.
J. N. Hooker and H. P. Williams. Combining equity and utilitarianism in a mathematical programming model. Management Science, 58:1682–1693, 2012.
L. Hu and Y. Chen. Fair classification and social welfare. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, pages 535–545, 2020.
D. A. Iancu and N. Trichakis. Fairness and efficiency in multiportfolio optimization. Operations Research, 62(6):1285–1301, 2014.
S. P. Jenkins and P. Van Kerm. The measurement of economic inequality. In B. Nolan, W. Salverda, and T. M. Smeeding, editors, The Oxford Handbook of Economic Inequality. Oxford University Press, 2011.
E. Kalai and M. Smorodinsky. Other solutions to Nash's bargaining problem. Econometrica, 43:513–518, 1975.
O. Karsu and A. Morton. Inequity-averse optimization in operational research. European Journal of Operational Research, 245:343–359, 2015.
T. Lan, D. Kao, M. Chiang, and A. Sabharwal. An axiomatic theory of fairness in network resource allocation. In Conference on Information Communications (INFOCOM 2010), pages 1343–1351. IEEE, 2010.
F. Lindner, R. Mattmüller, and B. Nebel. Evaluation of the moral permissibility of action plans. Artificial Intelligence, 287, 2020.
H. Luss. On equitable resource allocation problems: A lexicographic minimax approach. Operations Research, 47(3):361–378, 1999.
C. McElfresh and J. Dickerson. Balancing lexicographic fairness and a utilitarian objective with application to kidney exchange. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI 2018), pages 1161–1168, 2018.
N. Mehrabi, F. Morstatter, N. Saxena, K. Lerman, and A. Galstyan. A survey on bias and fairness in machine learning. arXiv preprint arXiv:1908.09635, 2019.
M. Mostajabdaveh, W. J. Gutjahr, and S. Salman. Inequity-averse shelter location for disaster preparedness. IISE Transactions, 51(8):809–829, 2019.
V. Nanda, P. Xu, K. A. Sankararaman, J. Dickerson, and A. Srinivasan. Balancing the tradeoff between profit and fairness in rideshare platforms during high-demand hours. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 2210–2217, 2020.
J. Nash. The bargaining problem. Econometrica, 18:155–162, 1950.
W. Ogryczak and T. Śliwiński. On equitable approaches to resource allocation problems: The conditional minimax solutions. Journal of Telecommunications and Information Technology, pages 40–48, 2002.
W. Ogryczak and T. Śliwiński. On direct methods for lexicographic min-max optimization. In M. Gavrilova, O. Gervasi, V. Kumar, C. J. K. Tan, D. Taniar, A. Laganà, Y. Mun, and H. Choo, editors, Proceedings of International Conference on Computational Science and Its Applications (ICCSA 2006), volume 3982 of LNCS, pages 802–811, 2006.
W. Ogryczak, A. Wierzbicki, and M. Milewski. A multi-criteria approach to fair and efficient bandwidth allocation. Omega, 36(3):451–463, 2008.
M. Olfat and A. Aswani. Spectral algorithms for computing fair support vector machines. In International Conference on Artificial Intelligence and Statistics, pages 1933–1942, 2018.
D. Parfit. Equality and priority. Ratio, pages 201–221, 1997.
J. Rawls. A Theory of Justice (revised edition). Harvard University Press (original edition 1971), 1999.
H. S. Richardson and P. J. Weithman, editors. The Philosophy of Rawls (5 volumes). Garland, 1999.
A. Rubinstein. Perfect equilibrium in a bargaining model. Econometrica, 50:97–109, 1982.
S. Russell. Human Compatible: AI and the Problem of Control. Bristol, UK: Allen Lane, 2019.
T. M. Scanlon. The diversity of objections to inequality. In T. M. Scanlon, editor, The Difficulty of Tolerance: Essays in Political Philosophy, pages 202–218. Cambridge University Press, 2003.
I. Stelmakh, N. B. Shah, and A. Singh. PeerReview4All: Fair and accurate reviewer assignment in peer review. Proceedings of Machine Learning Research, 98:1–29, 2019.
W. Thompson. Cooperative models of bargaining. In R. J. Aumann and S. Hart, editors, Handbook of Game Theory, volume 2, pages 1237–1284. North-Holland, 1994.
A. Williams and R. Cookson. Equity in health. In A. Culyer and J. Newhouse, editors, Handbook of Health Economics. Elsevier, 2000.
M. B. Zafar, I. Valera, M. Gomez Rodriguez, and K. P. Gummadi. Fairness beyond disparate treatment & disparate impact: Learning classification without disparate mistreatment. In Proceedings of the 26th International Conference on World Wide Web, pages 1171–1180, 2017.
M. B. Zafar, I. Valera, M. Gomez-Rodriguez, and K. P. Gummadi. Fairness constraints: A flexible approach for fair classification. Journal of Machine Learning Research, 20(75):1–42, 2019.
R. Zemel, Y. Wu, K. Swersky, T. Pitassi, and C. Dwork. Learning fair representations. In International Conference on Machine Learning, pages 325–333, 2013.
J. Zhu, S. Rosset, R. Tibshirani, and T. Hastie. 1-norm support vector machines. In Advances in Neural Information Processing Systems, 2003.