Fairness in Risk Assessment Instruments: Post-Processing to Achieve Counterfactual Equalized Odds
Alan Mishler*   Edward H. Kennedy†

September 8, 2020
Abstract
Algorithmic fairness is a topic of increasing concern both within research communities and among the general public. Conventional fairness criteria place restrictions on the joint distribution of a sensitive feature A, an outcome Y, and a predictor S. For example, the criterion of equalized odds requires S ⊥⊥ A | Y, or equivalently, when all three variables are binary, that the false positive and false negative rates of the predictor be the same for two levels of A [Hardt et al., 2016]. However, fairness criteria based around observable Y are misleading when applied to Risk Assessment Instruments (RAIs), such as predictors designed to estimate the risk of recidivism or child neglect. It has been argued instead that RAIs ought to be trained and evaluated with respect to potential outcomes Y^0 [Coston et al., 2020]. Here, Y^0 represents the outcome that would be observed under no intervention, for example, whether recidivism would occur if a defendant were to be released pretrial. In this paper, we develop a method to post-process an existing binary predictor to satisfy approximate counterfactual equalized odds, which requires S to be nearly conditionally independent of A given Y^0, within a tolerance specified by the user. Our predictor converges to an optimal fair predictor at √n rates under appropriate assumptions. We propose doubly robust estimators of the risk and fairness properties of a fixed post-processed predictor, and we show that they are √n-consistent and asymptotically normal under appropriate assumptions.

* Department of Statistics & Data Science, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213. Email: [email protected]
† Assistant Professor, Department of Statistics & Data Science, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213. Email: [email protected]

1 Introduction
Machine learning is increasingly involved in high stakes decisions in domains such as healthcare, criminal justice, and consumer finance. In these settings, ML models often take the form of Risk Assessment Instruments (RAIs): given covariates such as demographic information and an individual’s medical/criminal/financial history, the model predicts the likelihood of an adverse outcome, such as a dangerous medical event, recidivism, or default on a loan. Rather than rendering an automatic decision, the model produces a “risk score,” which a decision maker may take into account when deciding whether to initiate a medical treatment, release a defendant on bail, or issue a personal loan.

The proliferation of machine learning has raised concerns that learned models may be discriminatory with respect to protected features like race, sex, age, and socioeconomic status. For example, there has been vigorous debate about whether a widely used recidivism prediction tool called COMPAS is biased against black defendants [Angwin et al., 2016; Angwin and Larson, 2016; Dieterich et al., 2016; Larson and Angwin, 2016; Lowenkamp et al., 2016]. Concerns have also been raised about risk assessments used to identify high risk medical patients [Obermeyer et al., 2019] and about common credit scoring algorithms such as FICO [Rice and Swesnik, 2012], among many others. Collectively, these types of algorithms arguably impact the lives of the vast majority of Americans.

These concerns have led to an explosion of methods in recent years for developing fair models and auditing the fairness of existing models. These efforts are complicated by the fact that there is no consensus on how to quantify (un)fairness. Researchers have proposed a wide range of fairness criteria, many of which turn out to be mutually unsatisfiable under real-world conditions [Chouldechova, 2017; Kleinberg et al., 2017].
The most widely discussed fairness criteria impose constraints on the joint distribution of a sensitive feature A, an outcome Y, and a predictor S. These criteria are inappropriate for RAIs, however. RAIs are not concerned with the observable outcomes Y in the training data (“Did patients of this type historically experience serious complications?”), which are themselves a product of historical treatment decisions. Rather, they are concerned with the potential outcomes associated with available treatment decisions (“Would patients of this type experience complications if not treated?”). Because treatments are not assigned at random—doctors naturally treat the patients they think are at high risk—these are distinct questions.

Coston et al. [2020] showed how RAIs that are optimized to predict observable rather than potential outcomes systematically underestimate risk for units that have historically been receptive to treatment, leading to suboptimal treatment decisions. They further proposed counterfactual analogues of standard fairness criteria, including counterfactual equalized odds. (See [Imai and Jiang, 2020] for a set of sufficient conditions under which these unsatisfiability results disappear.) Conventional equalized odds requires that risk predictions be independent of the sensitive feature conditional on observed outcomes [Hardt et al., 2016]. In the setting that we consider, in which the sensitive feature, predictor, and outcome are all binary, this is equivalent to insisting that the false positive and false negative rates be equal for both groups. The counterfactual version of equalized odds simply substitutes a particular potential outcome for the observed outcome [Coston et al., 2020]. We argue that this fairness criterion is intuitively appealing in the RAI setting, and that it is preferable to counterfactual versions of other candidate fairness criteria.

Hardt et al. [2016] showed how an existing binary predictor can be post-processed to yield randomized classifiers that satisfy equalized odds.
We extend their method to the counterfactual setting. Like their approach, our method yields a randomized classifier that requires access at runtime only to the sensitive feature and the output of the previously trained predictor. Our randomized classifier is likewise parameterized as the solution to a simple linear program. We further provide estimators for the loss and fairness properties of our derived predictor. We show that our estimators are √n-consistent and asymptotically normal under appropriate assumptions.

In Section 2, we propose a relaxation of counterfactual equalized odds, approximate counterfactual equalized odds, that allows RAI designers to control the tradeoff between fairness and predictive performance. We also discuss related work and explain our choice to focus on equalized odds vs. other available fairness criteria. In Section 3, we motivate the use of counterfactuals through an example that illustrates how observable equalized odds fails to reduce disparities between groups and can cause harm to an already disadvantaged group. In Section 4, we define and provide identifying expressions for our main estimand, a loss-optimal derived predictor that satisfies approximate counterfactual equalized odds. In Section 5, we provide an estimator of this predictor and show that it converges to the optimum at √n rates in a particular sense. Once a specific derived predictor has been chosen, it is of interest to estimate the loss and fairness properties of that predictor. We provide efficient doubly robust estimators for these properties and show that they yield asymptotically valid confidence intervals and hypothesis tests. In Section 6, we illustrate our theoretical results via simulations. We conclude with discussion in Section 7.

2 Background and Related Work
A table listing all notational choices can be found in Appendix D.

Let A, D, Y denote a sensitive feature, decision, and outcome, respectively. We consider the setting in which all three are binary, though most of the definitions below extend readily to continuous settings. Denote by Y^0, Y^1 the potential outcomes Y^{D=0}, Y^{D=1}; Y^d_i is the outcome that would be observed for unit i if, possibly contrary to fact, the decision were set to D = d [Neyman, 1923; Holland, 1986; Rubin, 2005]. We refer to the two levels of the sensitive feature A as the two “groups,” and we use “treatment” and “intervention” synonymously with “decision.” Let S be any random variable that maps to {0, 1}.

Under typical assumptions, the potential outcome Y^D associated with the actual decision that is made is rendered observable, while the other outcome remains counterfactual. However, we follow convention in using the term “counterfactual” as a synonym for “potential,” covering both observable and unobservable cases.

In most RAI settings, one of the decision options is a natural baseline corresponding to “no intervention” (D = 0). Examples include the risk of recidivism if a defendant is released pretrial, or the risk of neglect or abuse if a child welfare call is not screened in for further investigation. Many or most RAIs do not generate a separate risk score for the outcome associated with intervention. (In the case of child welfare, for example, call screeners have to make a binary decision about whether to screen a call in or out. It’s not clear that a prediction that a child is at high risk with or without intervention would be more useful with respect to that decision than a prediction that a child is at high risk with no intervention.)
We therefore restrict attention to this baseline potential outcome Y^0, though extensions to contrasts between potential outcomes are also possible.

Denote the observational and counterfactual false positive rates of S for group a by FPR(S, a) = P(S = 1 | Y = 0, A = a) and cFPR(S, a) = P(S = 1 | Y^0 = 0, A = a). For example, cFPR(S, 0) could represent the chance of being falsely labeled high-risk if released pretrial, among those black defendants who would not actually go on to recidivate, while cFPR(S, 1) could represent the corresponding chance for white defendants who would not recidivate. Let FNR, cFNR, TPR, cTPR, TNR, cTNR denote the corresponding false negative, true positive, and true negative rates, respectively.
Definition 2.1.
A predictor S satisfies observational equalized odds (oEO) with respect to A and Y if S ⊥⊥ A | Y. It satisfies counterfactual equalized odds (cEO) if S ⊥⊥ A | Y^0. When A, the outcome, and S are all binary, equalized odds is equivalent to requiring that the corresponding false positive and false negative rates be equal for the two levels of A. Our derived predictor will be designed to satisfy a relaxation of this criterion, defined below.

Definition 2.2.
The counterfactual error rate differences for a predictor S are the differences Δ+ and Δ− in the cFPR and cFNR for the two groups A = 0, A = 1, defined as follows:

Δ+(S) = cFPR(S, 0) − cFPR(S, 1)
Δ−(S) = cFNR(S, 0) − cFNR(S, 1)

Definition 2.3.
When A, Y^0, and S are all binary, S satisfies approximate counterfactual equalized odds with unfairness tolerances ε+, ε− ∈ [0, 1] if

|Δ+(S)| ≤ ε+ and |Δ−(S)| ≤ ε−.

In general, a fairness-constrained predictor would not outperform an optimal unconstrained predictor, and in some cases, satisfying cEO exactly might degrade performance to the point that the RAI is no longer useful. This relaxation of cEO allows RAI designers to negotiate this tradeoff.
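To make Definitions 2.2 and 2.3 concrete, the tolerance checks can be written in a few lines. This is a sketch of our own (the function names and example rates are illustrative, not from the paper), assuming the counterfactual error rates have already been computed:

```python
# Illustrative sketch: checking approximate counterfactual equalized odds
# given counterfactual error rates cFPR(S, a) and cFNR(S, a) for a = 0, 1.

def error_rate_differences(cfpr, cfnr):
    """Delta_plus = cFPR(S,0) - cFPR(S,1); Delta_minus = cFNR(S,0) - cFNR(S,1)."""
    return cfpr[0] - cfpr[1], cfnr[0] - cfnr[1]

def satisfies_approx_ceo(cfpr, cfnr, eps_plus, eps_minus):
    """True iff |Delta_plus| <= eps_plus and |Delta_minus| <= eps_minus."""
    dp, dm = error_rate_differences(cfpr, cfnr)
    return abs(dp) <= eps_plus and abs(dm) <= eps_minus

# cFPRs differ by 0.05 and cFNRs by 0.02, within tolerances (0.1, 0.05):
print(satisfies_approx_ceo([0.15, 0.10], [0.22, 0.20], 0.1, 0.05))  # True
```

Setting both tolerances to 0 recovers exact cEO as a special case of the same check.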
There are three broad approaches to developing fair models: (1) preprocessing the input data to remove bias [Kamiran and Calders, 2012; Calmon et al., 2017], (2) constraining the learning process [Zafar et al., 2017; Donini et al., 2018; Narasimhan, 2018], and (3) post-processing a model to satisfy fairness constraints [Hardt et al., 2016; Kim et al., 2019]. Our approach can be viewed as belonging to class (3). Each approach has advantages and disadvantages, and in some cases the distinctions between the approaches may not be clear.

Many widely used RAIs are proprietary tools developed by for-profit companies, so they are not amenable to internal tinkering. Developing new, fair(er) RAIs would be costly and perhaps infeasible from a policy perspective. The advantage of post-processing in this setting is that it can be applied to models that are already in use. Our method requires access at runtime only to the sensitive feature and the output of the existing predictor, so in principle, it could easily be incorporated into existing risk assessment pipelines. We refer to the predictor that our method returns equivalently as a “post-processed” or “derived” predictor [Hardt et al., 2016].

2.3 Why equalized odds?
Equalized odds is one of several popular fairness criteria that impose constraints on the joint distribution of (A, Y, S). Equalized odds is known more generally as separation, a term which covers settings in which these variables are not necessarily binary. The other two popular criteria in this class are independence (S ⊥⊥ A) and sufficiency (Y ⊥⊥ A | S); sufficiency is equivalent to calibration or predictive parity when all three variables are binary. Variants of all three criteria may be defined for example by conditioning on additional variables. The counterfactual versions of these criteria defined in [Coston et al., 2020] simply replace Y with Y^0.

Except in highly constrained, unrealistic conditions, these three criteria are pairwise unsatisfiable, regardless of whether they are defined with respect to Y or Y^0 [Kleinberg et al., 2017; Chouldechova, 2017; Barocas et al., 2018]. We must therefore choose which criterion we wish to target.

When evaluating a predictive system, it seems natural to focus on its real-world impact rather than its outputs per se. One desirable property of a decision process might be D ⊥⊥ A | Y^0. In the context of recidivism prediction, for example, this would mean no correlation between sentence length (D) and race (A) among defendants who would recidivate if released (Y^0 = 1) or among those who wouldn’t recidivate if released (Y^0 = 0). By way of shorthand, we will say that if D ⊥̸⊥ A | Y^0, then the system exhibits discriminatory disparate impact, meaning an unjustifiable difference in the distribution of benefits or burdens across groups.

In the context of RAIs, decision makers typically have wide latitude in how they interpret and act on the risk scores, so constraining the RAI does not enforce fairness with respect to their decisions. However, if decision makers, after the introduction of the RAI, make their decisions only on the basis of the RAI scores and other variables U which are independent of the RAI and A given Y^0, then equalized odds will imply D ⊥⊥ A | Y^0.
That is, let D = f(S, U) represent the function f describing the decision process after the RAI S is introduced. If cEO is satisfied and U ⊥⊥ (S, A) | Y^0, then it follows that D ⊥⊥ A | Y^0. Even if U ⊥̸⊥ (S, A) | Y^0, it is easy to see that if the conditional independence statement nearly holds, or if f depends primarily on S rather than U, then discriminatory disparate impact can be small.

By contrast, for a predictor satisfying either independence or sufficiency, there is no mapping from predictions to decisions that satisfies this property. Chouldechova [2017] in particular showed how predictors which satisfy sufficiency (predictive parity) are likely to yield decisions such that D ⊥̸⊥ A | Y; these arguments are unchanged when we substitute Y^0 for Y. Though there is no consensus about how to quantify fairness, this is at least one consideration in favor of equalized odds over predictive parity or independence.

2.4 Other causal fairness criteria

There is another set of fairness criteria that is motivated by causal considerations. These criteria consider counterfactuals of the sensitive feature or a proxy, and they characterize a decision or prediction as fair if the sensitive feature or proxy does not “cause” the decision or prediction, either directly or along a prohibited pathway [Kilbertus et al., 2017; Kusner et al., 2017; Nabi and Shpitser, 2018; Zhang and Bareinboim, 2018; Nabi et al., 2019; Wang et al., 2019]. There is some controversy over whether it is meaningful to discuss a counterfactual of a feature like race or gender [VanderWeele and Robinson, 2014; Glymour and Glymour, 2014; Hu and Kohler-Hausmann, 2020]. Furthermore, it is not clear that these metrics are appropriate in the context of risk assessment, where the emphasis is on assessing risk regardless of the causes of that risk.
Finally, satisfying these metrics typically precludes use of most of the features that go into risk assessment, like prior history, which is not tenable in practice [Coston et al., 2020]. For all these reasons, we focus on (counterfactual) equalized odds; however, our approach can be adapted to other fairness criteria, and we do not advocate for the use of equalized odds in all settings.
We now motivate the use of counterfactual rather than observational equalized odds. The following example illustrates how predictors which satisfy oEO (observational equalized odds) rather than cEO (counterfactual equalized odds) will in general not eliminate discriminatory disparate impact, and how they can unintentionally reduce rates of appropriate intervention.

Consider a school district that assigns tutors to students who are believed to be at risk of academic failure. The school district wishes to develop a RAI S to better identify students who need tutors while ensuring that this resource is allocated fairly across two levels of the sensitive feature A. Let D ∈ {0, 1} represent the decision to assign (1) or not assign (0) a tutor, and let Y ∈ {0, 1} represent academic success (0) or failure (1).

Let W represent the set of covariates available as input to the RAI, with A possibly in W. The quantity of interest is S = P̂(Y^0 = 1 | W), i.e., the estimated probability of failure if no tutor is assigned. A cEO predictor satisfies P̂(S | Y^0, A) = P̂(S | Y^0), while an oEO predictor satisfies P̂(S | Y, A) = P̂(S | Y). Divergence in these predictors is driven by the extent to which Y ≠ Y^0 in the training data. In order to parameterize this divergence, we introduce the following definitions.

Definition 3.1.
The need rate for group a is P(Y^0 = 1 | A = a), the probability that a student from group a would fail without a tutor.

Definition 3.2. The opportunity rate for group a is P(D = 1 | Y^0 = 1, A = a), the probability that a student in group a who needs a tutor receives one.

Definition 3.3. The intervention strength for group a is P(Y^1 = 0 | Y^0 = 1, A = a), the probability that a student in group a who would fail without a tutor would succeed with a tutor.

We simulate a simple data generating process in which we allow the intervention strength to vary, while constraining it to be equal for the two groups. We fix all other parts of the distribution. In particular, we set P(A = 1) = 0.7 and fix the need rates, with the higher need rate for the minority group (A = 0). We set the probabilities that a tutor is assigned when it is not needed, P(D = 1 | Y^0 = 0, A = a), with P(D = 1 | Y^0 = 0, A = 1) = 0.2 and a higher value for group 0. This represents a scenario in which the minority group has greater need, perhaps due to socioeconomic factors or prior educational opportunities, and also is likelier to receive resources. Finally, we set P(Y^1 = 0 | Y^0 = 0) = 1, meaning that tutoring never increases the risk of failure.

We consider a hypothetical oEO predictor S with fixed false positive rate P(S = 1 | Y = 0, A) = P(S = 1 | Y = 0) and fixed false negative rate P(S = 0 | Y = 1, A) = P(S = 0 | Y = 1) = 0.2. We assume S ⊥⊥ Y^0 | A, Y, as would be the case for example when S is a high quality predictor of Y. Figure 1 shows the cTPRs for this predictor as a function of intervention strength, relative to the baseline opportunity rates for the two groups. When the intervention has no effect (strength 0), the cTPRs are equal because Y ≡ Y^0, so the cTPR and TPR are identical. (Of course, a strength of 0 means the tutoring is worthless.) For all strength values > 0, the cTPR of the minority group is lower than for the majority group. The difference in error rates increases as intervention strength increases. A cEO predictor avoids this problem by design: the cTPRs for the two groups are constrained to be equal.

The actual effect of the RAI on opportunity rates depends on how human decision makers interpret and respond to the RAI, but this example makes it clear that oEO predictors in general will not prevent discriminatory disparate impact, whereas, as discussed in Section 2.3, counterfactual EO predictors have at least the potential to mitigate or avoid it.

This example also illustrates how observational EO predictors can reduce rates of appropriate intervention. For example, suppose that decision makers, after the introduction of the RAI, set D = S, i.e. they assign tutors precisely to students whom the RAI labels as high risk. Then, for any intervention strength > 0.5, the opportunity rate for the minority group decreases below baseline: the RAI harms the minority group.

As described in Section 2.2, there are many ways to arrive at a predictor which satisfies a given fairness criterion.
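The qualitative behavior in this example can be reproduced with a short Monte Carlo sketch. All parameter values below are illustrative choices of ours, not necessarily the settings behind Figure 1: an oEO predictor whose error rates depend only on the observed Y loses cTPR fastest in the group with the higher opportunity rate, since more of that group's needy students have their outcomes flipped by tutoring.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_ctpr(strength, n=200_000, p_a1=0.7, need=(0.6, 0.4),
                  opp=(0.8, 0.6), unneeded=(0.4, 0.2), tpr=0.8, fpr=0.2):
    """Empirical cTPR(S, a) = P(S = 1 | Y0 = 1, A = a) for an oEO predictor.
    All parameter values are illustrative, not the paper's exact settings."""
    a = (rng.random(n) < p_a1).astype(int)
    y0 = (rng.random(n) < np.where(a == 1, need[1], need[0])).astype(int)
    # Decision: the opportunity rate applies to those in need (Y0 = 1);
    # a lower, group-specific rate applies otherwise.
    p_d = np.where(y0 == 1,
                   np.where(a == 1, opp[1], opp[0]),
                   np.where(a == 1, unneeded[1], unneeded[0]))
    d = (rng.random(n) < p_d).astype(int)
    # Tutoring flips Y0 = 1 to Y1 = 0 with probability `strength`; never harms.
    y1 = np.where(y0 == 1, (rng.random(n) > strength).astype(int), 0)
    y = d * y1 + (1 - d) * y0                     # consistency: observed Y
    # oEO predictor: error rates depend only on observed Y, not on A.
    s = np.where(y == 1, rng.random(n) < tpr, rng.random(n) < fpr).astype(int)
    return [s[(y0 == 1) & (a == g)].mean() for g in (0, 1)]

for strength in (0.0, 0.5, 1.0):
    c0, c1 = simulate_ctpr(strength)
    print(f"strength={strength:.1f}: cTPR minority={c0:.3f}, majority={c1:.3f}")
```

At strength 0 the two cTPRs coincide (Y ≡ Y^0), and the gap widens monotonically with strength, as in Figure 1.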
Figure 1: Counterfactual true positive rates for a predictor satisfying observable equalized odds, as a function of the intervention strength P(Y^1 = 0 | Y^0 = 1). Dashed lines indicate opportunity rates P(D = 1 | Y^0 = 1) prior to the development of the RAI.

We now define our estimand, which accomplishes this by taking in a previously trained predictor and post-processing it to satisfy cEO.

We expand our notation in order to fully describe our problem setting. Consider a random vector Z = (A, X, D, S, Y) ∼ P, where in addition to the binary sensitive feature A, decision D, and outcome Y, we have covariates X ∈ R^p and a previously trained predictor S. Initially, we will consider binary predictors S ∈ {0, 1}. We require only that S is observable; we do not require access to its inputs or internal structure. S in practice could represent a RAI that is already in use, such as a recidivism prediction tool. The covariates X may or may not overlap with the inputs to S. Their role in the analysis is to render counterfactual quantities identifiable.

Our target is a derived predictor that satisfies approximate cEO. As in the case of observable equalized odds considered by [Hardt et al., 2016], we achieve this by randomly flipping S with probabilities that depend only on S and A. Consider a vector θ = (θ_{0,0}, θ_{0,1}, θ_{1,0}, θ_{1,1})^T ∈ [0, 1]^4. We define an associated derived predictor S_θ:

S_θ ∼ Bern(θ_{A,S}), where θ_{A,S} = Σ_{a,s ∈ {0,1}} 1{A = a, S = s} θ_{a,s}.

In other words, the θ_{a,0} parameters represent conditional probabilities that S flips, while the θ_{a,1} parameters represent conditional probabilities that S doesn’t flip. Notice that for θ̃ = (0, 1, 0, 1), S_θ̃ = S: the derived predictor is equal to the input predictor.

Our target is a loss-optimal fair predictor S_{θ*}, where the fairness criterion is approximate cEO. The loss function we consider is MSE, which in the binary setting is equivalent to prediction error.
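A minimal implementation of S_θ is just a parameterized coin flip. The sketch below is our own illustration (not the authors' code), representing θ as a 2×2 array indexed by (A, S):

```python
import numpy as np

rng = np.random.default_rng(1)

def derived_predictor(s, a, theta):
    """Draw S_theta ~ Bern(theta[a, s]). theta[a][0] is the probability of
    flipping an input prediction of 0 to 1; theta[a][1] is the probability
    of keeping an input prediction of 1."""
    p = np.asarray(theta, dtype=float)[a, s]
    return (rng.random(len(s)) < p).astype(int)

s = np.array([0, 1, 1, 0])
a = np.array([0, 0, 1, 1])
# theta-tilde = (0, 1, 0, 1), i.e. [[0, 1], [0, 1]] as a 2x2 array,
# reproduces the input predictor exactly:
print(derived_predictor(s, a, [[0.0, 1.0], [0.0, 1.0]]))  # [0 1 1 0]
```

At runtime this requires only A and the output of S, matching the post-processing setting described above.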
For fixed θ, denote the loss by L(S_θ) = E[(Y^0 − S_θ)^2]. The estimand is

θ* ∈ argmin_θ L(S_θ)
subject to θ ∈ [0, 1]^4
|Δ+(S_θ)| ≤ ε+
|Δ−(S_θ)| ≤ ε−

where the unfairness tolerances ε+, ε− ∈ [0, 1] are chosen by the user. Setting both these constraint parameters to 0 requires cEO to be satisfied exactly, while setting them to 1 allows S_{θ*} to be arbitrarily unfair.

For any fixed θ ∈ [0, 1]^4—say, for an estimate θ̂ of the target parameter θ*—it is of interest to estimate the loss and fairness properties of the associated derived predictor.

Since our estimands involve counterfactual quantities, distributional assumptions are required in order to equate them to observable quantities. In this subsection we show that counterfactual error rates and loss can be identified under standard assumptions. All the quantities to be identified can be written in terms of the loss and the counterfactual error rates of S_θ. For ease of notation, we first define two nuisance parameters that appear in the estimand and associated estimators, namely the outcome regression and the propensity score:

μ(A, X, S) = E[Y | A, X, S, D = 0]
π(A, X, S) = P(D = 1 | A, X, S)

(In the previous section, we used S to refer to an arbitrary RAI. From here forward, we use the notation S_θ to indicate the dependence of the derived predictor on the input predictor S and the parameter θ. We refer to MSE as “loss” instead of the conventional “risk” in order to avoid confusion between risk assessment and the error rate of a predictor.)

We make the following standard “no unmeasured confounding”-type causal inference assumptions:

Assumption 1 (Consistency). Y = DY^1 + (1 − D)Y^0
Assumption 2 (Positivity). ∃ δ ∈ (0, 1) s.t. P(π(A, X, S) ≤ 1 − δ) = 1
Assumption 3 (Ignorability). Y^0 ⊥⊥ D | A, X, S
Satisfying ignorability assumptions typically requires collecting a rich enough set of deconfounding covariates. In the present case, even if X is low dimensional, the ignorability assumption is plausible if the input predictor S substantially drives decision making, or if it happens to be an accurate (if not fair) predictor of Y^0.

Before giving the identifying expressions for the loss L(S_θ) and the fairness constraints Δ+, Δ−, we give identifying expressions for the error rates of the input predictor S, which themselves appear in the expressions for Δ+, Δ−.

Proposition 1.
Under assumptions 1-3, the counterfactual error rates of the input predictor S are identified as follows:

cFPR(S, a) = E[S(1 − μ)1{A = a}] / E[(1 − μ)1{A = a}]
cFNR(S, a) = E[(1 − S)μ1{A = a}] / E[μ1{A = a}]

All proofs are given in the appendix. We now define several quantities that appear in the identifying expressions for the linear program:

β_{a,s} = E[1{A = a, S = s}(1 − 2μ)], for a, s ∈ {0, 1}   (1)
β = (β_{0,0}, β_{0,1}, β_{1,0}, β_{1,1})   (2)
β+ = (1 − cFPR(S, 0), cFPR(S, 0), cFPR(S, 1) − 1, −cFPR(S, 1))   (3)
β− = (−cFNR(S, 0), cFNR(S, 0) − 1, cFNR(S, 1), 1 − cFNR(S, 1))   (4)

Proposition 2. Under assumptions 1-3, the loss and error rates of the derived predictor S_θ are identified as:

L(S_θ) = θ^T β + E[μ]
Δ+(S_θ) = θ^T β+
Δ−(S_θ) = θ^T β−

Since the term E[μ] in the loss is fixed, we can drop it without changing the minimizer of the loss. We can therefore rewrite the estimand as

θ* ∈ argmin_θ θ^T β
subject to θ ∈ [0, 1]^4
|θ^T β+| ≤ ε+
|θ^T β−| ≤ ε−   (5)

In other words, the optimal fair derived predictor is the solution to a linear program (LP). We refer to this as the “true LP” since it defines the estimand. We now define an estimator θ̂ as the solution to an “estimated LP.”

There are two broad estimation tasks. The first is to construct a derived predictor S_θ̂ that approximates S_{θ*}. The second is to estimate properties of S_θ̂, conditional on θ̂. We use sample splitting to generate two datasets: D_train is for the first task, and D_test is for the second.

Within each dataset, further splitting is used to separate the estimation of nuisance and target parameters, avoiding the need for potentially restrictive empirical process assumptions. The nuisance parameters are μ and π. The target parameters can in each case be written as functions of the nuisance parameters.
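Since the estimand is a four-variable linear program, it can be handed to an off-the-shelf solver. The sketch below is illustrative (the coefficient values are invented, not estimated from data) and uses scipy.optimize.linprog, expanding each absolute-value constraint in (5) into a pair of one-sided linear constraints:

```python
import numpy as np
from scipy.optimize import linprog

def solve_fair_lp(beta, beta_plus, beta_minus, eps_plus, eps_minus):
    """Minimize beta^T theta over theta in [0,1]^4 subject to
    |theta^T beta_plus| <= eps_plus and |theta^T beta_minus| <= eps_minus."""
    # |b^T theta| <= eps becomes b^T theta <= eps and -b^T theta <= eps.
    A_ub = np.vstack([beta_plus, -beta_plus, beta_minus, -beta_minus])
    b_ub = [eps_plus, eps_plus, eps_minus, eps_minus]
    res = linprog(c=beta, A_ub=A_ub, b_ub=b_ub, bounds=[(0, 1)] * 4)
    return res.x

# Toy coefficients (illustrative, not derived from data); beta_plus and
# beta_minus follow the shapes of expressions (3) and (4).
beta = np.array([0.10, -0.20, 0.15, -0.25])
beta_plus = np.array([0.90, 0.10, -0.80, -0.20])
beta_minus = np.array([-0.20, -0.80, 0.30, 0.70])
theta = solve_fair_lp(beta, beta_plus, beta_minus, eps_plus=0.05, eps_minus=0.05)
```

The same routine solves the estimated LP once β̂, β̂+, β̂− are plugged in; θ = 0 is always feasible, so the program is never infeasible.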
In our analysis, each dataset is split once: one fold is used to estimate nuisance parameters, and the other fold is used to estimate the target parameter(s). We refer to these folds as D_train^nuis, D_train^target, D_test^nuis, D_test^target. To regain full sample size efficiency, one can swap the folds, repeat the procedure, and average the results, an approach that is popularly called cross-fitting [Bickel and Ritov, 1988; Robins et al., 2008; Zheng and van der Laan, 2010; Chernozhukov et al., 2018]. A k-fold version of cross-fitting with k > 2 is also possible; the extension of our results to the k-fold setting is straightforward.

The sample splitting procedure is illustrated in Figure 2. For convenience, we assume that each of the four samples is of size n, though our results require only that each sample is O(n). Note that our results obtain even without sample splitting if the nuisance parameters belong to Donsker classes, but sample splitting allows us to avoid this assumption.

Figure 2: Sample splitting diagram. D_train is used to compute θ̂, the estimate of the optimal predictor parameter θ*. Once θ̂ has been computed, D_test is used to estimate the loss and fairness properties of the derived predictor S_θ̂. Within each dataset, sample splitting is used to separate estimation of the nuisance parameters μ, π from estimation of the target parameters.

Remark 1. (Notation) To avoid excessive notation, we use hats (as in π̂) for quantities estimated on both D_train and D_test. Likewise, for both D = D_train and D = D_test, we use P_n to denote the empirical measure over D^target, so that for any function f̂(Z), P_n(f̂(Z)) = ∫ f̂(Z) dP_n(Z) = n^{-1} Σ_{i=1}^n f̂(Z_i) denotes the average of f̂(Z) taken over D^target.
The norm of a fixed function f is the L_2 norm with respect to P, i.e., ‖f‖^2 = ∫ f(z)^2 dP(z). Quantities that refer to D_train occur only in the context of estimating θ̂, while quantities that refer to D_test occur only in the context of estimating properties of S_θ̂ (or of S_θ for any fixed parameter θ). The intended usage should therefore be clear from context, but we make it explicit in cases where it might otherwise be unclear.

Both μ and π can be estimated with arbitrary nonparametric learners. We will require only that they are estimated consistently at certain rates. Our theoretical results utilize the following additional assumptions:

Assumption 4 (Bounded propensity estimator). ∃ δ ∈ (0, 1) s.t. π̂ ≤ 1 − δ

Assumption 5 (Nuisance estimator rates). ‖μ̂ − μ‖ = o_P(1), ‖π̂ − π‖ = o_P(1), and ‖μ̂ − μ‖ ‖π̂ − π‖ = o_P(1/√n)

Assumption 4 can be trivially satisfied by truncating π̂ at 1 − δ. Under Assumption 2, this will not prevent π̂ from being consistent for π. Assumption 5 states that μ̂ and π̂ are consistent for μ and π, and that the product of their errors is smaller than 1/√n. Assumption 5 can be satisfied under relatively weak and nonparametric smoothness or sparsity assumptions [Györfi et al., 2002; Raskutti et al., 2011]. For example, let d = p + 2 be the dimension of (A, X, S). If μ and π are in Hölder classes with smoothness index s > d/2, then there exist estimators μ̂ and π̂ such that ‖μ̂ − μ‖ = o_P(n^{−1/4}) and ‖π̂ − π‖ = o_P(n^{−1/4}), in which case Assumption 5 would be satisfied. The two parameters naturally do not have to be estimated at the same rate, as long as they are both estimated consistently and the product of the rates is o_P(1/√n).

An estimator for θ* is derived by computing estimates β̂, β̂+, β̂− of the true LP coefficients and then solving the resulting estimated LP:

θ̂ = argmin_θ θ^T β̂
subject to θ ∈ [0, 1]^4
|θ^T β̂+| ≤ ε+
|θ^T β̂−| ≤ ε−

There are many possible estimators of the LP coefficients. We propose doubly robust estimators, which allow for fast rates of convergence under relatively mild assumptions [van der Vaart, 2002; Tsiatis, 2006].
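The doubly robust construction can be sketched directly from the influence function for E[Y^0] (defined formally just below). The helper names here are our own (a hypothetical sketch, not the authors' implementation); the function takes arrays (a, s, d, y) together with nuisance predictions μ̂, π̂ fitted on a separate fold, and returns the three coefficient vectors of the LP:

```python
import numpy as np

def phi_hat(d, y, mu_hat, pi_hat):
    """Uncentered efficient influence function values for E[Y^0]."""
    return (1 - d) / (1 - pi_hat) * (y - mu_hat) + mu_hat

def lp_coefficients(a, s, d, y, mu_hat, pi_hat):
    """Doubly robust estimates of beta, beta_plus, beta_minus."""
    phi = phi_hat(d, y, mu_hat, pi_hat)
    beta = np.array([np.mean((a == g) * (s == v) * (1 - 2 * phi))
                     for g in (0, 1) for v in (0, 1)])
    cfpr = [np.mean((a == g) * s * (1 - phi)) / np.mean((a == g) * (1 - phi))
            for g in (0, 1)]
    cfnr = [np.mean((a == g) * (1 - s) * phi) / np.mean((a == g) * phi)
            for g in (0, 1)]
    beta_plus = np.array([1 - cfpr[0], cfpr[0], cfpr[1] - 1, -cfpr[1]])
    beta_minus = np.array([-cfnr[0], cfnr[0] - 1, cfnr[1], 1 - cfnr[1]])
    return beta, beta_plus, beta_minus

# Toy check: d = 0 everywhere and a perfect outcome model mu_hat = y make
# phi reduce to y; since s = y here, the counterfactual error rates are 0.
a = np.array([0, 0, 1, 1]); s = np.array([0, 1, 0, 1])
d = np.zeros(4); y = np.array([0, 1, 0, 1])
beta, bp, bm = lp_coefficients(a, s, d, y, mu_hat=y.astype(float),
                               pi_hat=np.zeros(4))
```

Run on D_test^target with nuisances from D_test^nuis, this yields the inputs for the estimated LP above.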
These rates propagate to the loss and fairness properties of S_θ̂. For ease of notation, let

φ = (1 − D)/(1 − π) · (Y − μ) + μ
φ̂ = (1 − D)/(1 − π̂) · (Y − μ̂) + μ̂

denote the uncentered efficient influence function for E(Y^0) and its estimate, respectively [Bickel et al., 1993; Hahn, 1998; van der Laan and Robins, 2003; Kennedy, 2016]. The estimators for individual coefficients are:

β̂_{a,s} = P_n[1{A = a, S = s}(1 − 2φ̂)]   (6)
cFPR̂(S, a) = P_n[1{A = a} S (1 − φ̂)] / P_n[1{A = a}(1 − φ̂)]   (7)
cFNR̂(S, a) = P_n[1{A = a}(1 − S) φ̂] / P_n[1{A = a} φ̂]   (8)

The vectors β̂, β̂+, β̂− are formed by plugging β̂_{a,s}, cFPR̂(S, a), cFNR̂(S, a) into expressions (1), (3), and (4). We now give theoretical results for the derived predictor S_θ̂. We show that S_θ̂ approaches optimal behavior at fast rates. We define two quantities of interest, the loss gap and the excess unfairness, and give accompanying theorems.

Definition 5.1.
The loss gap is L(S_θ̂) − L(S_θ*), i.e., the difference in loss between the derived predictor and the optimal derived predictor.

We use the term loss gap rather than excess loss to acknowledge that the loss of S_θ̂ can be less than the loss of S_θ*, if θ̂ falls outside the true constraint set. Of course, this can only occur if S_θ̂ violates the true fairness constraints, which can happen because the constraints are estimated.

Theorem 1 (Loss gap). Under Assumptions 1–5:

    L(S_θ̂) − L(S_θ*) = O_P(1/√n).

Definition 5.2.
The excess unfairness of S_θ in the cFPR is

    UF+(S_θ) := max{ |cFPR(S_θ, 1) − cFPR(S_θ, 0)| − ε+, 0 },

and the excess unfairness of S_θ in the cFNR is

    UF−(S_θ) := max{ |cFNR(S_θ, 1) − cFNR(S_θ, 0)| − ε−, 0 }.

Intuitively, there ought to be roughly a 50% chance that UF+(S_θ̂) = 0, because the random constraint set should fluctuate around the true constraint set; hence this definition is nontrivial.

Theorem 2 (Excess unfairness). Under Assumptions 1–5:

    max{ UF+(S_θ̂), UF−(S_θ̂) } = O_P(1/√n).

(We ignore optimization error, since it is a function of the number of optimization iterations and can be made arbitrarily small [Boyd and Vandenberghe, 2004].)

Remark 2 (θ̂ vs. the behavior of S_θ̂). Without assumptions about how the loss and fairness of S_θ̂ depend on θ̂, there is no guarantee about the rate at which θ̂ approaches θ*. This is not a concern, since the object of interest is not θ* per se but a predictor that behaves like S_θ*.

We now turn to estimating properties of a fixed derived predictor. These results are valid for any derived predictor S_θ; in practice, the most obvious use is to estimate properties of S_θ̂ once θ̂ has been computed. Given a fixed θ, we are interested in estimating the loss L(S_θ) and the error rate differences ∆+(S_θ), ∆−(S_θ). We define one additional quantity of interest related to S_θ.

Definition 5.3.
The change in loss for a derived predictor S_θ relative to an input predictor S is Γ(S_θ) = L(S_θ) − L(S).

We refer to a change in loss rather than an increase in loss because it is possible for S_θ to have smaller loss than S. This is not the typical expectation: in fair prediction problems, the set of fair classifiers is necessarily smaller than the set of all classifiers, fair and unfair, so there is a fairness-accuracy tradeoff in which satisfying fairness comes at the cost of a reduction in predictive performance. In the RAI setting, however, since predictors are typically trained to predict observable outcomes, their performance may be arbitrarily bad with respect to the potential outcome Y⁰. It is therefore not implausible that a derived fair predictor could have higher accuracy than the input predictor it is derived from.

The estimators used here are essentially identical to the estimators of the LP coefficients used in the previous section. Here, however, our aim is to demonstrate properties of these estimators, rather than properties of our derived predictor S_θ̂. In particular, we are interested in deriving confidence intervals, in addition to guaranteeing rates of convergence. The estimators are

    L̂(S_θ) = θ^T β̂ + P_n(φ̂)    (loss)
    Γ̂(S_θ) = (θ − θ̃)^T β̂    (loss change)
    ĉFPR(S_θ, a) = θ_{a,0}(1 − ĉFPR(S, a)) + θ_{a,1} ĉFPR(S, a)    (cFPR)
    ĉFNR(S_θ, a) = (1 − θ_{a,0}) ĉFNR(S, a) + (1 − θ_{a,1})(1 − ĉFNR(S, a))    (cFNR)
    ∆̂+(S_θ) = θ^T β̂+    (error rate difference in cFPR)
    ∆̂−(S_θ) = θ^T β̂−    (error rate difference in cFNR)

where recall that θ̃ = (0, 1, 0, 1)^T, so that S_θ̃ = S. Note that the loss estimator adds back in the portion of the loss that does not depend on θ and that we consequently removed from the LP in (5).

We now give theoretical results for these estimators.
The estimators are built around influence functions for the parameters and consequently are efficient when the assumptions above are satisfied: no other estimator has uniformly smaller asymptotic variance around P [van der Vaart, 2002].

Theorem 3 (Loss and loss change). Fix θ ∈ [0, 1]^4. Under Assumptions 1–5:

    √n (L̂(S_θ) − L(S_θ)) ⇝ N(0, var(f_θ))
    √n (Γ̂ − Γ) ⇝ N(0, var(f_θ − f_θ̃))

where f_θ = (1 − φ) θ_{A,S} + φ (1 − θ_{A,S}) with θ_{A,S} = Σ_{a,s ∈ {0,1}} θ_{a,s} 1{A = a, S = s}, and the estimators L̂, Γ̂ attain the nonparametric efficiency bound.
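As a concrete sketch of the doubly robust building blocks in (6)–(8): given arrays of observations and nuisance estimates, the pseudo-outcomes φ̂ and the estimated counterfactual error rates are simple sample averages. The function names and toy data below are ours; this is a sketch under the paper's notation, not the authors' code:

```python
import numpy as np

def pseudo_outcomes(D, Y, pi_hat, mu_hat):
    # Uncentered influence-function values: phi = (1-D)/(1-pi)*(Y - mu) + mu.
    # Doubly robust: consistent if either pi_hat or mu_hat is consistent.
    return (1 - D) / (1 - pi_hat) * (Y - mu_hat) + mu_hat

def cfpr_hat(S, A, phi, a):
    # Estimated counterfactual false positive rate for group A = a, eq. (7).
    ind = (A == a)
    return np.mean(ind * S * (1 - phi)) / np.mean(ind * (1 - phi))

def cfnr_hat(S, A, phi, a):
    # Estimated counterfactual false negative rate for group A = a, eq. (8).
    ind = (A == a)
    return np.mean(ind * (1 - S) * phi) / np.mean(ind * phi)

# Toy data: when every unit has D = 0 and pi_hat = 0, phi reduces exactly to Y.
D = np.zeros(6)
Y = np.array([0, 0, 1, 0, 1, 1], dtype=float)
A = np.array([0, 0, 1, 1, 0, 1])
S = np.array([1, 0, 1, 0, 1, 1])
phi = pseudo_outcomes(D, Y, np.zeros(6), np.full(6, 0.5))
```

In that degenerate case (7) and (8) collapse to the ordinary empirical error rates; in general, φ̂ re-weights the untreated units to stand in for the unobserved Y⁰.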
Corollary 3.1.
Given a consistent estimator σ̂²(f_θ) of var(f_θ), an asymptotically valid 95% confidence interval for L(S_θ) is given by L̂(S_θ) ± 1.96 σ̂(f_θ)/√n. An asymptotically valid test of the hypothesis L(S_θ) = C, for any C, consists of evaluating whether C lies in the confidence interval. An analogous result holds for Γ.

Theorem 4 (Fairness estimators). Fix θ ∈ [0, 1]^4. Under Assumptions 1–5:

    √n (ĉFPR(S_θ, a) − cFPR(S_θ, a)) ⇝ N(0, var(g_a))
    √n (ĉFNR(S_θ, a) − cFNR(S_θ, a)) ⇝ N(0, var(h_a))
    √n (∆̂+ − ∆+) ⇝ N(0, var(g₁ − g₀))
    √n (∆̂− − ∆−) ⇝ N(0, var(h₁ − h₀))

where

    g_a = (θ_{a,1} − θ_{a,0}) · 1{A = a}(1 − φ)(S − cFPR(S, a)) / E[1{A = a}(1 − φ)]
    h_a = (θ_{a,1} − θ_{a,0}) · 1{A = a} φ ((1 − S) − cFNR(S, a)) / E[1{A = a} φ]

and the estimators ĉFPR, ĉFNR, ∆̂+, ∆̂− attain the nonparametric efficiency bound.

Asymptotically valid confidence intervals and hypothesis tests can be obtained in the manner described in Corollary 3.1. In order to construct confidence intervals for the loss and fairness quantities, we require estimators of the asymptotic variances. Our estimators are the sample variances of ĝ_a, ĥ_a, ĝ₁ − ĝ₀, and ĥ₁ − ĥ₀, where these quantities are defined by:

    ĝ_a = (θ_{a,1} − θ_{a,0}) · 1{A = a}(1 − φ̂)(S − ĉFPR(S, a)) / P_n[1{A = a}(1 − φ̂)]
    ĥ_a = (θ_{a,1} − θ_{a,0}) · 1{A = a} φ̂ ((1 − S) − ĉFNR(S, a)) / P_n[1{A = a} φ̂]

Here f_θ, g_a, and h_a are built from the efficient influence functions (EIFs) for the loss and error rates.

In the subsequent section, we illustrate these theorems using simulated data. We use one set of simulations to illustrate Theorems 1 and 2, which capture properties of S_θ̂.
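As a concrete illustration of the interval construction in Corollary 3.1: the interval is a standard Wald construction built from the estimated influence-function values. A minimal sketch (the function name and the illustrative influence values are ours):

```python
import numpy as np

def wald_ci(f_vals, z=1.96):
    # 95% Wald CI: P_n(f) +/- z * sd(f)/sqrt(n), where f_vals holds the
    # estimated (uncentered) influence-function values, one per observation.
    n = len(f_vals)
    est = float(np.mean(f_vals))
    se = float(np.std(f_vals, ddof=1)) / np.sqrt(n)
    return est, est - z * se, est + z * se

# Illustrative (made-up) influence values:
f = np.arange(1.0, 101.0)
est, lo, hi = wald_ci(f)
```

A test of H0: L(S_θ) = C at level 0.05 then rejects exactly when C falls outside (lo, hi).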
We use another set of simulations to illustrate Theorems 3 and 4, which capture properties of our loss and fairness estimators. Finally, we simulate the fairness-accuracy tradeoff for different values of the unfairness tolerances ε+, ε−.

Data generating process
First, we define a pre-RAI data generating process. Using this data, we train a predictor S to predict observable outcomes Y, mirroring how RAIs are typically constructed in practice. The predictor is a random forest trained on (A, X). We then define a post-RAI data generating process. The only difference relative to the pre-RAI process is that the predictor S now affects the decisions D. This emulates the way RAIs are intended to work in practice; for example, a criminal defendant labeled high-risk (S = 1) might be less likely to be released pretrial (D = 0) than they would have been prior to the introduction of the RAI. The data generating process is designed to meet Assumptions 1–3, with π = P(D = 1 | A, X, S) upper bounded at 0.975 (Figure 3):

    P(A = 1) = p_A
    X | A ∼ N(A · ν, I)
    P_pre(D = 1 | A, X) = min{ 0.975, expit((A, X)^T β_pre) }
    P_post(D = 1 | A, X, S) = min{ 0.975, expit((A, X, S)^T β_post) }
    P(Y⁰ = 1 | A, X) = expit((A, X)^T β₀)
    P(Y¹ = 1 | A, X) = expit((A, X)^T β₁)
    Y = (1 − D) Y⁰ + D Y¹

Figure 3: Data generating process for simulations, where p_A, ν, β_pre, β_post, β₀, and β₁ are fixed constants. I denotes the 4 × 4 identity matrix. P_pre and P_post refer to the decision process before and after development of the RAI S, which itself serves as input to our estimand S_θ*. No other parts of the data generating process change in response to S.

Doubly robust vs. plugin estimators
We contrast the performance of our approach using doubly robust estimators with an approach that uses plugin estimators everywhere instead, both in the linear program used to compute θ̂ and in the estimators of the loss and fairness properties for the fixed derived predictor S_θ̂. The plugin estimators simply substitute µ̂ for φ̂; they are otherwise identical to the doubly robust estimators. None of Theorems 1–4 will typically hold with plugin estimators: unlike with the doubly robust estimators, Assumptions 4–5 are not sufficient to guarantee √n-rates for any of the quantities of interest. In general, one would only expect these results to hold for plugin estimators if ‖µ̂ − µ‖ = o_P(n^{−1/2}). Importantly, in a nonparametric model, this condition generally cannot be satisfied, unlike Assumption 5 [Györfi et al., 2002].

Simulating µ̂ and π̂

To simulate estimators that are consistent for µ and π at different rates, we add random noise ε of different magnitudes to µ and π, as described in Figure 4. The noise is added on the logit scale to ensure that µ̂ and π̂ remain in [0, 1]; π̂ is again truncated at 0.975.

Sample sizes and Monte Carlo iterations
    ε ∼ N(0, n^{−1/2})
    logit(π̂) = min{ logit(0.975), logit(π) + ε }
    logit(µ̂) = logit(µ) + ε

Figure 4: Procedure for simulating nuisance parameter estimators µ̂ and π̂ satisfying the √n product-rate condition of Assumption 5.

Each estimation procedure was run 500 times for each of n ∈ {100, 200, 500, 1000, 5000, 20000}. The "true" loss and fairness values were computed on a separate validation set of size 500,000, using the plugin estimators with the true µ. These approximately true values showed negligible variation over many repetitions.

Simulations 1
Figure 5 shows the true loss L(S_θ̂) and the excess unfairness values UF+(S_θ̂) and UF−(S_θ̂) for a predictor S_θ̂ with ε− = 0.20 and a fixed value of ε+. Following Theorems 1 and 2, the loss and the excess unfairness values converge, respectively, to the loss L(S_θ*) of the optimal derived predictor and to 0. Only the doubly robust estimators yield guaranteed √n convergence.

Simulations 2
To illustrate that Theorems 3 and 4 apply to arbitrary derived predictors, the second set of simulations involves a fixed derived predictor S_θ, with θ set to a fixed vector chosen at random. (With θ chosen at random, a discrepancy between θ̂ and θ* is virtually guaranteed.) Figure 6 illustrates the estimators L̂(S_θ), Γ̂(S_θ), ∆̂+(S_θ), ∆̂−(S_θ) of the loss, loss change, and error rate differences for S_θ. Following Theorems 3 and 4, the estimators converge to their target values. Only the doubly robust estimators yield guaranteed √n convergence.

Table 1 contains coverage results of 95% confidence intervals for the error rates, error rate differences, loss, and loss change for the same arbitrary S_θ. The CIs were constructed using sample variances. To ensure that they did not exceed the bounds of the possible parameter values (i.e., [0, 1] for the loss and error rates, [−1, 1] for the error rate differences and loss change), the CIs were constructed using the Delta method, via the transformations ψ̂ ↦ logit(ψ̂) (for ψ̂ ∈ {ĉFPR, ĉFNR, L̂}) or ψ̂ ↦ logit((ψ̂ + 1)/2) (for ψ̂ ∈ {∆̂+, ∆̂−, Γ̂}). Nominal coverage is achieved for various quantities at various sample sizes, but since the coverage guarantees are asymptotic, it is not surprising that coverage is not achieved everywhere. Interestingly, the median coverage rate in the table is 0.95. A separate set of CIs was computed without using the Delta method; those results did not differ substantially and are therefore omitted here.

Simulations 3
Finally, the third set of simulations illuminates the fairness-accuracy tradeoff. Figure 7 shows the loss change Γ(S_θ*) = L(S_θ*) − L(S) for each point in a grid of unfairness tolerances ε+, ε−. Here, S is the Bayes-optimal predictor of Y⁰ in our data generating scenario, meaning S(A, X) = E[Y⁰ | A, X]. Since any derived predictor necessarily has greater loss than the Bayes-optimal predictor, we refer to the loss change here as the performance cost.

In the setup described above, the Bayes-optimal predictor has a loss of only 0.07 and absolute error rate differences of only 0.05 (∆+) and 0.04 (∆−), which leaves little room to illustrate the potential cost of fairness. For these simulations, therefore, we set P(Y⁰ = 1 | A, X) = expit((A, X)^T β₀′) for a different coefficient vector β₀′, yielding a Bayes-optimal predictor with a loss of 0.24 and absolute error rate differences of 0.23 (∆+) and 0.40 (∆−), which are plausible values for a real predictor.

As expected, when ε+ ≥ ∆+(S) or ε− ≥ ∆−(S), the performance cost is 0: the input predictor already falls within the fairness constraints. As the tolerances tighten towards 0, the performance declines, though never substantially. For ε+ = ε− = 0, when the derived predictor is constrained to exactly satisfy cEO, the loss increases by 0.10, to 0.34. The different values of ∆+(S) and ∆−(S) are reflected in the differing costs of satisfying fairness along the two axes: the costs of controlling ∆+(S_θ) are lower than the costs of controlling ∆−(S_θ).

[Woodworth et al., 2017] showed that post-processing can result in predictors with poor performance, but it is unclear how likely this is to be a problem in practice. While the fairness-accuracy tradeoff naturally depends on the data generating process, our example illustrates that fairness can in some cases be achieved without substantial performance costs.

                               n = 100   200    500    1000   5000   20000
    Loss          L̂(S_θ)          0.98   0.92   0.87   0.84   0.84   0.85
    Loss change   Γ̂(S_θ)          1.00   0.99   0.93   0.94   0.71   0.78
    Error rates   ĉFPR(S_θ, 0)    0.99   0.98   0.98   0.96   0.94   0.95
                  ĉFPR(S_θ, 1)    0.90   0.89   0.93   0.95   0.96   0.57
                  ĉFNR(S_θ, 0)    0.99   0.99   0.98   0.99   0.92   0.93
                  ĉFNR(S_θ, 1)    0.99   0.99   0.99   1.00   0.98   0.71
    Error rate    ∆̂+(S_θ)         0.98   0.98   0.97   0.99   0.97   0.92
    differences   ∆̂−(S_θ)         0.99   1.00   0.99   0.99   0.94   0.94

Table 1: 95% CI coverage at sample sizes ranging from 100 to 20,000 for the loss, loss change, error rates, and error rate differences, for an arbitrary derived predictor S_θ with fixed parameter θ.
Figure 5: (Illustration of Theorems 1 and 2.) Loss and excess unfairness for the derived predictor S_θ̂ for samples of size 100 to 20,000, using doubly robust (DR) vs. plugin (PI) estimators for the parameters of the linear program that defines θ̂. Each vertical line shows a mean over the simulation repetitions with an error bar; the horizontal lines mark the target loss L(S_θ*) (first column) and the target excess unfairness value of 0 (other columns). The values in the bottom two rows are the values in the top two rows transformed by ψ(S_θ̂) ↦ √n (ψ(S_θ̂) − ψ(S_θ*)), where ψ is L, UF+, or UF−, as appropriate. The top two rows show that the loss and excess unfairness converge to their target values for both the DR and PI estimators. The bottom two rows illustrate that √n-convergence is only guaranteed for θ̂_DR: the scaled values for θ̂_DR do not grow in n, while the scaled values for θ̂_PI begin to diverge.
Figure 6: (Illustration of Theorems 3 and 4.) Doubly robust (DR) vs. plugin (PI) estimates of the loss, loss change, and error rate differences for an arbitrary derived predictor S_θ with fixed θ. Each vertical line shows a mean over the simulation repetitions with an error bar. The values in the bottom two rows are the values in the top two rows transformed by ψ̂(S_θ) ↦ √n (ψ̂(S_θ) − ψ(S_θ)), where ψ̂ is L̂, Γ̂, ∆̂+, or ∆̂−, as appropriate. The top two rows show that the DR and PI estimators converge. The bottom two rows illustrate that √n-convergence is only guaranteed for the DR estimators: the scaled values for the DR estimators do not appear to grow in n, while the scaled values for the PI estimators L̂_PI(S_θ) and Γ̂_PI(S_θ) begin to diverge.
Figure 7: Loss change Γ(S_θ*) = L(S_θ*) − L(S) for the Bayes-optimal input predictor S(A, X) = E[Y⁰ | A, X] and θ* corresponding to different unfairness tolerances ε+, ε−. The black area represents fairness constraints that are looser than the error rate differences of the input predictor (∆+(S) = 0.23, ∆−(S) = 0.40). The maximum loss change (0.
10) occurs when the error rate differences are both constrained to be 0, meaning the derived predictor S_θ satisfies cEO exactly.

In this paper we considered fairness in risk assessment instruments (RAIs), which are naturally concerned with potential outcomes rather than strictly observable outcomes. We introduced the fairness criterion of approximate counterfactual equalized odds (approximate cEO), which allows users to negotiate the tradeoff between fairness and performance. We argued that this fairness criterion is likelier than other candidate criteria to reduce discriminatory disparate impact, which we defined as D ⊥̸⊥ A | Y⁰.

Our method extends the work of [Hardt et al., 2016] to the potential outcome setting: it post-processes an arbitrary binary classifier, yielding a randomized classifier that satisfies approximate cEO. The post-processed classifier is the solution to a four-dimensional linear program with a compact feasible set, so it is easy to compute. We showed that our classifier converges in a certain sense to the optimal fair classifier at √n rates, and that our estimators of its risk and fairness properties are √n-consistent. We illustrated our results via simulations, and also illustrated the possibility of achieving fairness at a relatively small cost in predictive performance.

Unlike in [Hardt et al., 2016], we did not assume that the joint distribution of the sensitive feature, input predictor, and outcome was known. Our rate results can be readily translated to the setting of [Hardt et al., 2016], in which the outcome of interest is the observable Y and the fairness criterion is (approximate) observational equalized odds.

As with any counterfactual quantity, our estimand requires for identification a set of covariates that are sufficient to deconfound the treatment and the potential outcome of interest. This raises the possibility that those same covariates could be used to train a fair predictor from scratch rather than to post-process an existing predictor.
While this approach merits investigation, our approach has an advantage in that these covariates are required only to construct the post-processed predictor; at runtime, our predictor requires access only to the sensitive feature and the prediction from the input predictor. This makes it more feasible that our method could be implemented on top of existing RAIs. A predictor trained from scratch would be constrained by the set of covariates available in deployment, whereas our method would allow researchers to devise a set of suitable deconfounding covariates and then collect an appropriate dataset on a one-time basis.

In closing, we note that from our perspective, notions of fairness in predictive systems ought to be subordinate to notions of fairness grounded in the actual decisions or events that those systems inform, and the impact that those decisions have on people's lives. It is perfectly possible for a RAI that is fair according to some formal criterion to result in greater unfairness in decision making, or vice versa. Though little is currently known about how decision makers respond to RAIs, there is some evidence that judges do not have much faith in recidivism predictions and that RAIs can have little impact on the decisions that are made [Jonnson, 2018; Stevenson, 2018]. As RAIs and the general public's understanding of how they function co-evolve, it is likely that the ways in which decision makers respond to them will evolve as well.

Nevertheless, it seems plausible that some fairness criteria for RAIs are likelier than others to lead to increased (un)fairness with respect to decisions and outcomes. While this is ultimately an empirical question, we believe that this kind of consideration ought to ground discussions of fairness in RAIs and predictive systems generally. As long as there are domains involving high stakes decisions that we do not wish to fully automate, RAIs will remain relevant, and so will the task of ensuring that they lead to a society that is more fair, not less.

Extensions to other settings

In ongoing work, we generalize this method within a single framework that can handle several variations: (1) input predictors that take values in [0, 1] rather than {0, 1} [Hardt et al., 2016], (2) approximate counterfactual sufficiency (Y⁰ ⊥⊥ A | S) rather than equalized odds, and (3) either Y⁰ or the observable Y. Though we argued in Section 2.3 that equalized odds has an appealing property that sufficiency lacks, we do not wish to claim definitively that one criterion is preferable to the other, and we are interested in methods that can accommodate either one. Variation (3) is designed to accommodate diagnostic settings rather than risk assessment settings, in which decisions do not change the estimated outcomes. We find preliminarily that we can construct predictors that reflect any combination of these variants as solutions to convex programs, and that many of the properties of the estimators defined above translate to this generalized setting.

References
Julia Angwin and Jeff Larson. Bias in criminal risk scores is mathematically inevitable, researchers say. ProPublica, December 2016. URL https://www.propublica.org/article/bias-in-criminal-risk-scores-is-mathematically-inevitable-researchers-say.

Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner. Machine bias. ProPublica, May 2016. URL https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing.

Solon Barocas, Moritz Hardt, and Arvind Narayanan. Fairness and Machine Learning. 2018. URL https://fairmlbook.org.

Peter J. Bickel and Ya'acov Ritov. Estimating integrated squared density derivatives: Sharp best order of convergence estimates. Sankhyā: The Indian Journal of Statistics, Series A, 50(3):381–393, 1988. URL https://www.jstor.org/stable/25050710.

Peter J. Bickel, Chris A.J. Klaassen, Ya'acov Ritov, and Jon A. Wellner. Efficient and Adaptive Estimation for Semiparametric Models. Johns Hopkins Series in the Mathematical Sciences. Johns Hopkins University Press, Baltimore, 1993. ISBN 0801845416.

Stephen P. Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004. ISBN 978-0-521-83378-3.

Flavio Calmon, Dennis Wei, Bhanukiran Vinzamuri, Karthikeyan Natesan Ramamurthy, and Kush R. Varshney. Optimized pre-processing for discrimination prevention. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30. Curran Associates, Inc., 2017. URL http://papers.nips.cc/paper/6988-optimized-pre-processing-for-discrimination-prevention.pdf.

Victor Chernozhukov, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, and James Robins. Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal, 21(1):C1–C68, February 2018.

Alexandra Chouldechova. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big Data, 5(2):153–163, 2017.

Amanda Coston, Alan Mishler, Edward H. Kennedy, and Alexandra Chouldechova. Counterfactual risk assessments, evaluation, and fairness. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, pages 582–593. ACM, January 2020.

William Dieterich, Christina Mendoza, and Tim Brennan. COMPAS risk scales: Demonstrating accuracy equity and predictive parity. 2016.

Michele Donini, Luca Oneto, Shai Ben-David, John S. Shawe-Taylor, and Massimiliano Pontil. Empirical risk minimization under fairness constraints. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems 31, pages 2791–2801. Curran Associates, Inc., 2018. URL http://papers.nips.cc/paper/7544-empirical-risk-minimization-under-fairness-constraints.pdf.

Anthony W. Flores, Kristin Bechtel, and Christopher T. Lowenkamp. False positives, false negatives, and false analyses: A rejoinder to "Machine bias: There's software used across the country to predict future criminals. And it's biased against blacks." Federal Probation, 80(2):38–46, 2016.

Clark Glymour and Madelyn R. Glymour. Commentary: Race and sex are causes. Epidemiology, 25(4):488–490, 2014.

László Györfi, Michael Kohler, Adam Krzyżak, and Harro Walk. A Distribution-Free Theory of Nonparametric Regression. Springer, 2002. ISBN 978-0-387-22442-8.

Jinyong Hahn. On the role of the propensity score in efficient semiparametric estimation of average treatment effects. Econometrica, 66(2):315–331, 1998. URL https://www.jstor.org/stable/2998560.

Moritz Hardt, Eric Price, and Nati Srebro. Equality of opportunity in supervised learning. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems 29, pages 3315–3323. Curran Associates, Inc., 2016. URL http://papers.nips.cc/paper/6374-equality-of-opportunity-in-supervised-learning.pdf.

Paul W. Holland. Statistics and causal inference. Journal of the American Statistical Association, 81(396):945–960, 1986. URL https://www.jstor.org/stable/2289069.

Lily Hu and Issa Kohler-Hausmann. What's sex got to do with fair machine learning? arXiv e-prints, 2020.

Kosuke Imai and Zhichao Jiang. Principal fairness for human and algorithmic decision-making. arXiv e-prints, 2020.

Melissa Jonnson. The influence of risk assessment evidence on judicial sentencing decisions, 2018. URL http://summit.sfu.ca/item/18704.

Faisal Kamiran and Toon Calders. Data preprocessing techniques for classification without discrimination. Knowledge and Information Systems, 33(1):1–33, 2012.

Edward H. Kennedy. Semiparametric theory and empirical processes in causal inference. In Hua He, Pan Wu, and Ding-Geng (Din) Chen, editors, Statistical Causal Inferences and Their Applications in Public Health Research, pages 141–167. Springer, 2016.

Edward H. Kennedy, Sivaraman Balakrishnan, and Max G'Sell. Sharp instruments for classifying compliers and generalizing causal effects. Annals of Statistics, 48(4):2008–2030, 2020.

Niki Kilbertus, Mateo Rojas Carulla, Giambattista Parascandolo, Moritz Hardt, Dominik Janzing, and Bernhard Schölkopf. Avoiding discrimination through causal reasoning. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 656–666. Curran Associates, Inc., 2017. URL http://papers.nips.cc/paper/6668-avoiding-discrimination-through-causal-reasoning.pdf.

Michael P. Kim, Amirata Ghorbani, and James Zou. Multiaccuracy: Black-box post-processing for fairness in classification. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, pages 247–254. ACM, January 2019.

Jon Kleinberg, Sendhil Mullainathan, and Manish Raghavan. Inherent trade-offs in the fair determination of risk scores. In Christos H. Papadimitriou, editor, 8th Innovations in Theoretical Computer Science Conference (ITCS 2017), volume 67 of Leibniz International Proceedings in Informatics (LIPIcs), pages 43:1–43:23, Dagstuhl, Germany, 2017. Schloss Dagstuhl–Leibniz-Zentrum für Informatik.

Matt J. Kusner, Joshua Loftus, Chris Russell, and Ricardo Silva. Counterfactual fairness. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 4066–4076. Curran Associates, Inc., 2017. URL http://papers.nips.cc/paper/6995-counterfactual-fairness.pdf.

Jeff Larson and Julia Angwin. Technical response to Northpointe. ProPublica, July 2016. URL https://www.propublica.org/article/technical-response-to-northpointe.

Razieh Nabi and Ilya Shpitser. Fair inference on outcomes. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, pages 1931–1940, 2018. URL https://aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16683.

Razieh Nabi, Daniel Malinsky, and Ilya Shpitser. Learning optimal fair policies. In Proceedings of the 36th International Conference on Machine Learning, volume 97, pages 4674–4682, 2019. URL http://proceedings.mlr.press/v97/nabi19a/nabi19a.pdf.

Harikrishna Narasimhan. Learning with complex loss functions and constraints. In Amos Storkey and Fernando Perez-Cruz, editors, Proceedings of Machine Learning Research, volume 84, pages 1646–1654, Lanzarote, Spain, April 2018. PMLR. URL http://proceedings.mlr.press/v84/narasimhan18a.html.

Jerzy Neyman. Justification of applications of the calculus of probabilities to the solutions of certain questions in agricultural experimentation. Excerpts, English translation (reprinted). Statistical Science, 5:463–472, 1923.

Ziad Obermeyer, Brian Powers, Christine Vogeli, and Sendhil Mullainathan. Dissecting racial bias in an algorithm used to manage the health of populations. Science, 366(6464):447–453, October 2019. URL https://doi.org/10.1126/science.aax2342.

Garvesh Raskutti, Martin J. Wainwright, and Bin Yu. Minimax rates of estimation for high-dimensional linear regression over ℓq-balls. IEEE Transactions on Information Theory, 57(10):6976–6994, 2011.

Lisa Rice and Deidre Swesnik. Discriminatory effects of credit scoring on communities of color. Suffolk University Law Review, 46:935, 2012.

James Robins, Lingling Li, Eric Tchetgen, and Aad van der Vaart. Higher order influence functions and minimax estimation of nonlinear functionals. In Institute of Mathematical Statistics Collections, pages 335–421. Institute of Mathematical Statistics, Beachwood, Ohio, USA, 2008.

Donald B. Rubin. Causal inference using potential outcomes: Design, modeling, decisions. Journal of the American Statistical Association, 100(469):322–331, March 2005.

Alexander Shapiro. Asymptotic analysis of stochastic programs. Annals of Operations Research, 30(1):169–186, December 1991.

Megan Stevenson. Assessing risk assessment in action. Minnesota Law Review, 103(1):83, 2018.

Anastasios A. Tsiatis. Semiparametric Theory and Missing Data. Springer, 2006. ISBN 978-0-387-32448-7.

Mark J. van der Laan and James M. Robins. Unified Methods for Censored Longitudinal Data and Causality. Springer Series in Statistics. Springer, New York, NY, 2003. ISBN 978-0-387-21700-0.

Aad van der Vaart. Semiparametric statistics. In Pierre Bernard, editor, Lectures on Probability Theory and Statistics, number 1781 in Lecture Notes in Mathematics. Springer, Berlin, 2002. ISBN 978-3-540-47944-4.

Tyler J. VanderWeele and Whitney R. Robinson. On the causal interpretation of race in regressions adjusting for confounding and mediating variables. Epidemiology, 25(4):473–484, 2014.

Yixin Wang, Dhanya Sridhar, and David M. Blei. Equal opportunity and affirmative action via counterfactual predictions. arXiv e-prints, 2019.

Blake Woodworth, Suriya Gunasekar, Mesrob I. Ohannessian, and Nathan Srebro. Learning non-discriminatory predictors. In Satyen Kale and Ohad Shamir, editors, Proceedings of Machine Learning Research, volume 65, pages 1920–1953, Amsterdam, Netherlands, July 2017. PMLR. URL http://proceedings.mlr.press/v65/woodworth17a.html.

Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez Rodriguez, and Krishna P. Gummadi. Fairness beyond disparate treatment & disparate impact: Learning classification without disparate mistreatment. In Proceedings of the 26th International Conference on World Wide Web, pages 1171–1180, Perth, Australia, April 2017. International World Wide Web Conferences Steering Committee.

Junzhe Zhang and Elias Bareinboim. Fairness in decision-making – the causal explanation formula. In AAAI Conference on Artificial Intelligence, pages 2037–2045, 2018. URL https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16949/15911.

Wenjing Zheng and Mark van der Laan. Asymptotic theory for cross-validated targeted maximum likelihood estimation. U.C. Berkeley Division of Biostatistics Working Paper Series, Working Paper 273, 2010. URL https://biostats.bepress.com/ucbbiostat/paper273/.

Appendix A: Proofs of propositions
For convenience, we restate the five assumptions from the main text that underlie our results:

Assumption 1 (Consistency). $Y = DY^1 + (1 - D)Y^0$

Assumption 2 (Positivity). $\exists\, \delta \in (0, 1)$ such that $P(\pi(A, X, S) \leq 1 - \delta) = 1$

Assumption 3 (Ignorability). $Y^0 \perp\!\!\!\perp D \mid A, X, S$

Assumption 4 (Bounded propensity estimator). $\exists\, \delta \in (0, 1]$ such that $\widehat{\pi} \leq 1 - \delta$

Assumption 5 (Nuisance estimator rates). $\|\widehat{\mu} - \mu\| = o_P(1)$, $\|\widehat{\pi} - \pi\| = o_P(1)$, and $\|\widehat{\mu} - \mu\| \|\widehat{\pi} - \pi\| = o_P(1/\sqrt{n})$

Proof of Proposition 1 (Identification of error rates for the input predictor $S$)
\[
\begin{aligned}
\mathrm{cFPR}(S, a) &= P(S = 1 \mid Y^0 = 0, A = a) \\
&= \frac{P(S = 1, Y^0 = 0, A = a)}{P(Y^0 = 0, A = a)} \\
&= \frac{E[S(1 - Y^0)\mathbb{1}\{A = a\}]}{E[(1 - Y^0)\mathbb{1}\{A = a\}]} \\
&= \frac{E[S(1 - E[Y^0 \mid A, X, S, D = 0])\mathbb{1}\{A = a\}]}{E[(1 - E[Y^0 \mid A, X, S, D = 0])\mathbb{1}\{A = a\}]} \\
&= \frac{E[S(1 - \mu)\mathbb{1}\{A = a\}]}{E[(1 - \mu)\mathbb{1}\{A = a\}]}
\end{aligned}
\]
\[
\begin{aligned}
\mathrm{cFNR}(S, a) &= P(S = 0 \mid Y^0 = 1, A = a) \\
&= \frac{P(S = 0, Y^0 = 1, A = a)}{P(Y^0 = 1, A = a)} \\
&= \frac{E[(1 - S)Y^0\mathbb{1}\{A = a\}]}{E[Y^0\mathbb{1}\{A = a\}]} \\
&= \frac{E[(1 - S)E[Y^0 \mid A, X, S, D = 0]\mathbb{1}\{A = a\}]}{E[E[Y^0 \mid A, X, S, D = 0]\mathbb{1}\{A = a\}]} \\
&= \frac{E[(1 - S)\mu\mathbb{1}\{A = a\}]}{E[\mu\mathbb{1}\{A = a\}]}
\end{aligned}
\]
The fourth equality in both derivations uses Assumptions 2 (positivity) and 3 (ignorability), and the fifth equality uses Assumption 1 (consistency), since $E[Y^0 \mid A, X, S, D = 0] = E[Y \mid A, X, S, D = 0] = \mu$.

Proof of Proposition 2 (Identification of the loss and fairness constraints)
Proof. Beginning with the loss, we have:
\[
\begin{aligned}
L(S_\theta) &:= E[(S_\theta - Y^0)^2] \\
&= E[(1 - S_\theta)Y^0 + S_\theta(1 - Y^0)] \\
&= E\{(1 - E[S_\theta \mid A, S])E[Y^0 \mid A, X, S, D = 0]\} + E\{E[S_\theta \mid A, S](1 - E[Y^0 \mid A, X, S, D = 0])\} \\
&= E\{(1 - E[S_\theta \mid A, S])\mu\} + E\{E[S_\theta \mid A, S](1 - \mu)\} \\
&= E[\mu] + E\{E[S_\theta \mid A, S](1 - 2\mu)\} \\
&= E[\mu] + E\Big[\sum_{a, s \in \{0, 1\}} \theta_{a,s}\mathbb{1}\{A = a, S = s\}(1 - 2\mu)\Big] \\
&= E[\mu] + \theta^T\beta
\end{aligned}
\]
where $\beta$ is defined in (1). The second equality follows from the fact that $S_\theta, Y^0 \in \{0, 1\}$; the third uses ignorability and the fact that $S_\theta$ depends only on $(A, S)$; the fourth uses consistency; and the last two equalities follow from the definition of $\theta_{a,s}$ and of $\beta$.

We turn now to the fairness constraints. The error rates of the derived predictor $S_\theta$ depend on the error rates of the input predictor $S$ as follows. Beginning with $\mathrm{cFPR}(S_\theta, a)$, we have:
\[
\begin{aligned}
P(S_\theta = 1 \mid Y^0 = 0, A = a) &= \sum_{s \in \{0, 1\}} P(S_\theta = 1 \mid Y^0 = 0, A = a, S = s)\, P(S = s \mid Y^0 = 0, A = a) \quad (9) \\
&= \sum_{s \in \{0, 1\}} P(S_\theta = 1 \mid A = a, S = s)\, P(S = s \mid Y^0 = 0, A = a) \quad (10) \\
&= \theta_{a,0}(1 - \mathrm{cFPR}(S, a)) + \theta_{a,1}\,\mathrm{cFPR}(S, a) \quad (11)
\end{aligned}
\]
where the first equality simply involves conditioning on $S$, and the second equality uses that $S_\theta \perp\!\!\!\perp Y^0 \mid A, S$. In other words, the false positive rate of $S_\theta$ depends only on $\theta$ and the false positive rate of the input predictor $S$. For the cFNR, by similar reasoning, we have:
\[
\begin{aligned}
P(S_\theta = 0 \mid Y^0 = 1, A = a) &= \sum_{s \in \{0, 1\}} P(S_\theta = 0 \mid Y^0 = 1, A = a, S = s)\, P(S = s \mid Y^0 = 1, A = a) \quad (12) \\
&= \sum_{s \in \{0, 1\}} P(S_\theta = 0 \mid A = a, S = s)\, P(S = s \mid Y^0 = 1, A = a) \quad (13) \\
&= (1 - \theta_{a,0})\,\mathrm{cFNR}(S, a) + (1 - \theta_{a,1})(1 - \mathrm{cFNR}(S, a)) \quad (14) \\
&= -\theta_{a,0}\,\mathrm{cFNR}(S, a) + \theta_{a,1}(\mathrm{cFNR}(S, a) - 1) + 1 \quad (15)
\end{aligned}
\]
The identification statements in the proposition follow by simply substituting in the expressions for $\mathrm{cFPR}(S, a)$ and $\mathrm{cFNR}(S, a)$ from Proposition 1 and rearranging.
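Equations (11) and (15) express the derived predictor's error rates as affine functions of the input predictor's rates, which makes them easy to compute directly. Below is a minimal self-contained sketch; the function and variable names are ours, not the paper's:

```python
# Error rates of the derived predictor S_theta as affine functions of the
# input predictor's rates (Proposition 2). theta maps (a, s) to
# P(S_theta = 1 | A = a, S = s); cfpr_s, cfnr_s stand for cFPR(S, a), cFNR(S, a).

def derived_cfpr(theta, cfpr_s, a):
    # Eq. (11): theta_{a,0} (1 - cFPR(S, a)) + theta_{a,1} cFPR(S, a)
    return theta[(a, 0)] * (1 - cfpr_s) + theta[(a, 1)] * cfpr_s

def derived_cfnr(theta, cfnr_s, a):
    # Eq. (14): (1 - theta_{a,0}) cFNR(S, a) + (1 - theta_{a,1}) (1 - cFNR(S, a))
    return (1 - theta[(a, 0)]) * cfnr_s + (1 - theta[(a, 1)]) * (1 - cfnr_s)

# theta_tilde = (0, 1, 0, 1) leaves S unchanged, so both rates are recovered.
theta_tilde = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 0.0, (1, 1): 1.0}
print(derived_cfpr(theta_tilde, 0.3, 0))  # 0.3
print(derived_cfnr(theta_tilde, 0.2, 0))  # 0.2
```

Setting $\theta = (1, 0, 1, 0)$ instead flips $S$ deterministically, exchanging each rate $r$ for $1 - r$, exactly as the affine forms predict.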
We introduce several lemmas used in the proofs of the theorems. Here and throughout, let $P(\widehat{f}) = E[\widehat{f} \mid D^{nuis}]$ denote the expected value of any random variable $\widehat{f}$ conditional on the data used to estimate the nuisance parameters (either $D^{nuis}_{train}$ or $D^{nuis}_{test}$, depending on context). Note that this usage is consistent with the usage of $P$ throughout the paper.

Lemma 1. Let $W$ be a function of (at most) $(A, X, S)$ such that $\|W\|_\infty < \infty$. Then $P(W(\widehat{\phi} - \phi)) = o_P(1/\sqrt{n})$.

Proof.
\[
\begin{aligned}
P(W(\widehat{\phi} - \phi)) &= P\left(W\left(\frac{1 - D}{1 - \widehat{\pi}}(Y - \widehat{\mu}) + \widehat{\mu} - \frac{1 - D}{1 - \pi}(Y - \mu) - \mu\right)\right) \\
&= P\left(W\left(\frac{1 - D}{1 - \widehat{\pi}}(\mu - \widehat{\mu}) + \widehat{\mu} - \frac{1 - D}{1 - \pi}(\mu - \mu) - \mu\right)\right) \\
&= P\left(W\left(\frac{1 - \pi}{1 - \widehat{\pi}}(\mu - \widehat{\mu}) + \widehat{\mu} - \mu\right)\right) \\
&= P\left(W\,\frac{(\mu - \widehat{\mu})(\widehat{\pi} - \pi)}{1 - \widehat{\pi}}\right) \\
&\leq \frac{1}{\delta}\, P\left(|W(\mu - \widehat{\mu})(\widehat{\pi} - \pi)|\right) \\
&\leq \frac{1}{\delta}\, \|W\|_\infty \|\mu - \widehat{\mu}\| \|\widehat{\pi} - \pi\| \\
&= o_P(1/\sqrt{n})
\end{aligned}
\]
where the second line uses iterated expectation, conditioning on $(A, X, S, D)$; the third line uses iterated expectation again, conditioning on $(A, X, S)$; the fifth line uses Assumption 4 (bounded propensity estimator), since $1 - \widehat{\pi} \geq \delta$; the sixth line uses the Cauchy–Schwarz inequality; and the last line uses Assumption 5 (nuisance estimator rates).

The next lemma gives sufficient conditions under which the optimal value of an estimated convex problem converges at $\sqrt{n}$ rates to the optimal value of the target convex program. It is a simplification of Theorem 3.5 in Shapiro [1991]; we omit some details from that theorem that are not relevant to our problem context.

Lemma 2 (Shapiro, 1991). Let $\Theta$ be a compact subset of $\mathbb{R}^k$. Let $C(\Theta)$ denote the set of continuous real-valued functions on $\Theta$, with $C(\Theta)^{r+1}$ the $(r+1)$-dimensional Cartesian product. Let $\psi(\theta) = (\psi_0, \ldots, \psi_r) \in C(\Theta)^{r+1}$ be a vector of convex functions. Consider the quantity $\alpha^*$ defined as the solution to the following convex optimization program:
\[
\alpha^* = \min_{\theta \in \Theta} \psi_0(\theta) \quad \text{subject to } \psi_j(\theta) \leq 0, \; j = 1, \ldots, r
\]
Assume that Slater's condition holds, so that there is some $\theta_0 \in \Theta$ for which the inequalities are satisfied and non-affine inequalities are strictly satisfied, i.e. $\psi_j(\theta_0) < 0$ whenever $\psi_j$ is non-affine. Now consider a sequence of approximating programs, for $n = 1, 2, \ldots$:
\[
\widehat{\alpha}_n = \min_{\theta \in \Theta} \widehat{\psi}_{0n}(\theta) \quad \text{subject to } \widehat{\psi}_{jn}(\theta) \leq 0, \; j = 1, \ldots, r
\]
where $\widehat{\psi}_n(\theta) = (\widehat{\psi}_{0n}, \ldots, \widehat{\psi}_{rn}) \in C(\Theta)^{r+1}$. Assume that $\sqrt{n}(\widehat{\psi}_n - \psi)$ converges in distribution to a random element $W \in C(\Theta)^{r+1}$. Then:
\[
\sqrt{n}(\widehat{\alpha}_n - \alpha^*) \rightsquigarrow L
\]
for a particular random variable $L$. It follows that $\widehat{\alpha}_n - \alpha^* = O_P(1/\sqrt{n})$.

Lemma 3.
Let $\xi, W$ be constant vectors and $\widehat{\xi}_n, \widehat{W}_n$ be random vectors, with $\xi - \widehat{\xi}_n = O_P(1/\sqrt{n})$. If, for all $M > 0$,
\[
P(\|W - \widehat{W}_n\| > M) \leq P(\|\xi - \widehat{\xi}_n\| > CM)
\]
for some constant $C > 0$, then $W - \widehat{W}_n = O_P(1/\sqrt{n})$.

Proof. For any $\epsilon > 0$, there exists some $M_\epsilon > 0$ such that $P(\sqrt{n}\|\xi - \widehat{\xi}_n\| > M_\epsilon) < \epsilon$ for all $n$ large enough. Set $M = M_\epsilon / C$. Then
\[
P(\sqrt{n}\|W - \widehat{W}_n\| > M) \leq P(\sqrt{n}\|\xi - \widehat{\xi}_n\| > CM) = P(\sqrt{n}\|\xi - \widehat{\xi}_n\| > M_\epsilon) < \epsilon
\]
for all $n$ large enough.
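Before turning to the theorem proofs, the product-of-errors bias isolated in the proof of Lemma 1 can be checked numerically. The sketch below evaluates the conditional bias $E[(\mu - \widehat{\mu})(\widehat{\pi} - \pi)/(1 - \widehat{\pi})]$ exactly on a toy two-stratum covariate distribution; the stratum probabilities and nuisance values are invented for illustration only:

```python
import numpy as np

# Second-order bias of the uncentered influence function phi (Lemma 1):
# conditional on the nuisance estimates, the bias of P_n(phi_hat) for E[Y^0]
# equals E[(mu - mu_hat)(pi_hat - pi) / (1 - pi_hat)], a product of errors.
p_x = np.array([0.5, 0.5])   # two covariate strata (illustrative)
pi = np.array([0.4, 0.6])    # true propensity P(D = 1 | x)
mu = np.array([0.3, 0.7])    # true outcome regression E[Y | x, D = 0]

def dr_bias(eps_mu, eps_pi):
    """Exact bias when mu_hat = mu + eps_mu and pi_hat = pi + eps_pi."""
    mu_hat, pi_hat = mu + eps_mu, pi + eps_pi
    return float(np.sum(p_x * (mu - mu_hat) * (pi_hat - pi) / (1 - pi_hat)))

print(dr_bias(0.1, 0.0))  # 0.0: a correct propensity removes the bias
print(dr_bias(0.0, 0.1))  # 0.0: a correct outcome regression removes the bias
```

The bias is exactly zero when either nuisance estimate is correct, and scales like the product of the two errors otherwise, which is what delivers the $o_P(1/\sqrt{n})$ rate under Assumption 5.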
10 Appendix C: Proofs of Theorems
In each proof that follows, quantities with hats are derived from the same dataset $D$, with the nuisance parameters estimated on $D^{nuis}$ and the target parameters estimated on $D^{target}$ as usual. In the proofs of Theorems 1 and 2, $D$ is $D_{train}$; for Theorems 3 and 4, $D$ is $D_{test}$.

Proof of Theorem 1 ($\sqrt{n}$-convergence of the loss gap to 0)

The proof of Theorem 1 relies on Lemma 2 and Theorems 3 and 4. We expand the loss by introducing the term $\widehat{\beta}^T\widehat{\theta}$, which is the quantity that is minimized in the course of computing $\widehat{\theta}$. We proceed by splitting the loss gap into two terms and showing that each of those terms is $O_P(1/\sqrt{n})$.

Proof.
First, note that, following the logic employed in the proofs of Theorems 3 and 4, we have
\[
\widehat{\beta} - \beta = O_P(1/\sqrt{n}), \quad \widehat{\Delta}^+(S_\theta) - \Delta^+(S_\theta) = O_P(1/\sqrt{n}), \quad \widehat{\Delta}^-(S_\theta) - \Delta^-(S_\theta) = O_P(1/\sqrt{n})
\]
The loss gap can be expanded as follows:
\[
L(S_{\widehat{\theta}}) - L(S_{\theta^*}) = \beta^T\widehat{\theta} - \beta^T\theta^* = \underbrace{\left(\beta^T\widehat{\theta} - \widehat{\beta}^T\widehat{\theta}\right)}_{(1)} + \underbrace{\left(\widehat{\beta}^T\widehat{\theta} - \beta^T\theta^*\right)}_{(2)}
\]
For term (1), we have
\[
\widehat{\theta}^T(\beta - \widehat{\beta}) \leq \|\widehat{\theta}\|\|\beta - \widehat{\beta}\| \leq 2\|\beta - \widehat{\beta}\| = O_P(1/\sqrt{n})
\]
where the first inequality uses Cauchy–Schwarz and the second follows from the fact that $\widehat{\theta} \in [0, 1]^4$, so that $\|\widehat{\theta}\| \leq 2$. For the second term in the loss gap, we rely on Lemma 2. Note that we can write
\[
L(S_{\theta^*}) = \min_{\theta \in \Theta} \psi_0(\theta) \quad \text{subject to } \psi_j(\theta) \leq 0, \; j = 1, \ldots, r
\]
\[
\widehat{L}(S_{\widehat{\theta}}) = \min_{\theta \in \Theta} \widehat{\psi}_0(\theta) \quad \text{subject to } \widehat{\psi}_j(\theta) \leq 0, \; j = 1, \ldots, r
\]
with $\Theta = [0, 1]^4$ (here $r = 4$), and
\[
\psi(\theta) = \left(L(S_\theta),\; \Delta^+(S_\theta) - \epsilon^+,\; -\Delta^+(S_\theta) - \epsilon^+,\; \Delta^-(S_\theta) - \epsilon^-,\; -\Delta^-(S_\theta) - \epsilon^-\right)
\]
\[
\widehat{\psi}(\theta) = \left(\widehat{L}(S_\theta),\; \widehat{\Delta}^+(S_\theta) - \epsilon^+,\; -\widehat{\Delta}^+(S_\theta) - \epsilon^+,\; \widehat{\Delta}^-(S_\theta) - \epsilon^-,\; -\widehat{\Delta}^-(S_\theta) - \epsilon^-\right)
\]
Since these are linear programs, Slater's condition is trivially satisfied. Again, following the logic in Theorems 3 and 4, we have
\[
\widehat{\psi}(\theta) - \psi(\theta) = (P_n - P)\psi(\theta) + \gamma(\theta), \quad \text{with } \gamma(\theta) = o_P(1/\sqrt{n})
\]
Since $\psi_j(\theta)$ is parametric and Lipschitz in $\theta$, with compact domain $\Theta$, it follows that the set $\{\psi_j(\theta) : \theta \in \Theta\}$ is a Donsker class. Therefore $\sqrt{n}(\widehat{\psi} - \psi) \rightsquigarrow W$, a mean-zero Gaussian process indexed by $\theta \in \Theta$, with covariance function given by $\mathrm{cov}(W(\theta_1), W(\theta_2)) = \mathrm{cov}(\psi(\theta_1), \psi(\theta_2))$. That is, the estimated linear program, considered as a vector of functions of $\theta$, converges at $\sqrt{n}$ rates to the true linear program. Per Lemma 2, it follows that $\widehat{L}(S_{\widehat{\theta}}) - L(S_{\theta^*}) = O_P(1/\sqrt{n})$.

The sum of the two terms in the loss gap is therefore also $O_P(1/\sqrt{n})$.

Proof of Theorem 2 ($\sqrt{n}$-convergence of the excess unfairness to 0)

The proof relies on Lemma 3 and the $\sqrt{n}$-convergence of the constraint quantities $\widehat{\beta}^+, \widehat{\beta}^-$. When $\widehat{\beta}^+, \widehat{\beta}^-$ are close to $\beta^+, \beta^-$, the excess unfairness must be small for any $\theta \in \Theta = [0, 1]^4$, including of course $\widehat{\theta}$. The $\sqrt{n}$-consistency of $\widehat{\beta}^+, \widehat{\beta}^-$ carries over into the excess unfairness.

Proof.
Note that, following the logic employed in the proof of Theorem 4, we have
\[
\widehat{\beta}^+ - \beta^+ = O_P(1/\sqrt{n}), \qquad \widehat{\beta}^- - \beta^- = O_P(1/\sqrt{n})
\]
We have
\[
\begin{aligned}
& P\left(\mathrm{UF}^+(S_{\widehat{\theta}}) > \delta \text{ or } \mathrm{UF}^-(S_{\widehat{\theta}}) > \delta\right) \\
&\leq P\left(|\Delta^+(S_\theta)| - |\widehat{\Delta}^+(S_\theta)| > \delta \text{ or } |\Delta^-(S_\theta)| - |\widehat{\Delta}^-(S_\theta)| > \delta \text{ for some } \theta \in [0, 1]^4\right) \\
&\leq P\left(|\widehat{\Delta}^+(S_\theta) - \Delta^+(S_\theta)| > \delta \text{ or } |\widehat{\Delta}^-(S_\theta) - \Delta^-(S_\theta)| > \delta \text{ for some } \theta \in [0, 1]^4\right) \\
&= P\left(|\theta^T(\widehat{\beta}^+ - \beta^+)| > \delta \text{ or } |\theta^T(\widehat{\beta}^- - \beta^-)| > \delta \text{ for some } \theta \in [0, 1]^4\right) \\
&\leq P\left(\|\theta\| \cdot \|\widehat{\beta}^+ - \beta^+\| > \delta \text{ or } \|\theta\| \cdot \|\widehat{\beta}^- - \beta^-\| > \delta \text{ for some } \theta \in [0, 1]^4\right) \\
&\leq P\left(2\|\widehat{\beta}^+ - \beta^+\| > \delta \text{ or } 2\|\widehat{\beta}^- - \beta^-\| > \delta\right) \\
&\leq P\left(2\|\widehat{\beta}^+ - \beta^+\| > \delta\right) + P\left(2\|\widehat{\beta}^- - \beta^-\| > \delta\right) \\
&= P\left(\|\widehat{\beta}^+ - \beta^+\| > \delta/2\right) + P\left(\|\widehat{\beta}^- - \beta^-\| > \delta/2\right)
\end{aligned}
\]
where the second line follows because $\widehat{\theta}$ satisfies the estimated constraints $|\widehat{\Delta}^\pm(S_{\widehat{\theta}})| \leq \epsilon^\pm$, while $\mathrm{UF}^\pm(S_{\widehat{\theta}}) > \delta$ requires $|\Delta^\pm(S_{\widehat{\theta}})| > \epsilon^\pm + \delta$; the fifth line uses Cauchy–Schwarz; and the sixth line uses that $\theta \in [0, 1]^4 \implies \|\theta\| \leq 2$. Applying Lemma 3 together with the $\sqrt{n}$-consistency of $\widehat{\beta}^+, \widehat{\beta}^-$, it follows that
\[
\max\left\{\mathrm{UF}^+(S_{\widehat{\theta}}),\, \mathrm{UF}^-(S_{\widehat{\theta}})\right\} = O_P(1/\sqrt{n})
\]
as claimed.

Proof of Theorem 3 (Asymptotic normality of the estimated loss of $S_\theta$)

Proof.
Fix $\theta \in [0, 1]^4$. It is straightforward to show that the identifying expressions in Proposition 2 hold if $\mu$ is replaced by $\phi$. We then have:
\[
\begin{aligned}
\widehat{L}(S_\theta) - L(S_\theta) &= \left(\theta^T\widehat{\beta} + P_n(\widehat{\phi})\right) - \left(\theta^T\beta + P(\phi)\right) \quad (16) \\
&= P_n(\widehat{f}_\theta) - P(f_\theta) \quad (17) \\
&= \underbrace{(P_n - P)f_\theta}_{(1)} + \underbrace{(P_n - P)(\widehat{f}_\theta - f_\theta)}_{(2)} + \underbrace{P(\widehat{f}_\theta - f_\theta)}_{(3)} \quad (18)
\end{aligned}
\]
Term (2) is $o_P(1/\sqrt{n})$ by Lemma 2 in Kennedy et al. [2020], and term (3) is $o_P(1/\sqrt{n})$ by Lemma 1. We can therefore rewrite (18) as
\[
\widehat{L}(S_\theta) - L(S_\theta) = (P_n - P)f_\theta + o_P(1/\sqrt{n}) \implies \sqrt{n}\left(\widehat{L}(S_\theta) - L(S_\theta)\right) \rightsquigarrow N\left(0, \mathrm{var}(f_\theta)\right)
\]
where the convergence follows from the central limit theorem and Slutsky's theorem. By equivalent reasoning,
\[
\widehat{\Gamma}(\theta, S) - \Gamma(\theta, S) = (P_n - P)(f_\theta - f_{\widetilde{\theta}}) + o_P(1/\sqrt{n}) \implies \sqrt{n}\left(\widehat{\Gamma}(\theta, S) - \Gamma(\theta, S)\right) \rightsquigarrow N\left(0, \mathrm{var}(f_\theta - f_{\widetilde{\theta}})\right)
\]
per the second statement of the theorem.

Proof of Theorem 4 (Asymptotic normality of the estimated error rates of $S_\theta$)

Proof.
Fix $\theta \in [0, 1]^4$. Once again, it is simple to show that the identifying expressions from Proposition 2 hold if $\mu$ is replaced with $\phi$. Per Proposition 1 and the definition of $\widehat{\mathrm{cFPR}}$ given in (7), we have the following for the cFPR of the input predictor $S$:
\[
\begin{aligned}
\widehat{\mathrm{cFPR}}(S, a) - \mathrm{cFPR}(S, a) &= \frac{P_n[S\widehat{\gamma}_a]}{P_n[\widehat{\gamma}_a]} - \frac{E[S\gamma_a]}{E[\gamma_a]} \quad (19) \\
&= \frac{P_n[S\widehat{\gamma}_a]P[\gamma_a] - P[S\gamma_a]P_n[\widehat{\gamma}_a]}{P_n[\widehat{\gamma}_a]P[\gamma_a]} \quad (20) \\
&= \frac{P[\gamma_a]\left(P_n[S\widehat{\gamma}_a] - P[S\gamma_a]\right) - P[S\gamma_a]\left(P_n[\widehat{\gamma}_a] - P[\gamma_a]\right)}{P_n[\widehat{\gamma}_a]P[\gamma_a]} \quad (21) \\
&= P_n[\widehat{\gamma}_a]^{-1}\Big\{\underbrace{\left(P_n[S\widehat{\gamma}_a] - P[S\gamma_a]\right)}_{(1)} - \mathrm{cFPR}(S, a)\underbrace{\left(P_n[\widehat{\gamma}_a] - P[\gamma_a]\right)}_{(2)}\Big\} \quad (22)
\end{aligned}
\]
The two terms can be expanded as follows:
\[
\begin{aligned}
(1) &= (P_n - P)S\gamma_a + (P_n - P)(S\widehat{\gamma}_a - S\gamma_a) + P(S\widehat{\gamma}_a - S\gamma_a) \quad (23) \\
(2) &= (P_n - P)\gamma_a + (P_n - P)(\widehat{\gamma}_a - \gamma_a) + P(\widehat{\gamma}_a - \gamma_a) \quad (24)
\end{aligned}
\]
The second term in both these expressions is $o_P(1/\sqrt{n})$ per Lemma 2 in Kennedy et al. [2020]. The third terms are $o_P(1/\sqrt{n})$ per Lemma 1. Hence we can rewrite (22) as
\[
P_n[\widehat{\gamma}_a]^{-1}(P_n - P)\left\{S\gamma_a - \mathrm{cFPR}(S, a)\gamma_a\right\} + o_P(1/\sqrt{n}) \quad (25)
\]
Per the identifying expression for $\mathrm{cFPR}(S_\theta, a)$ in the proof of Proposition 2 and the definition of $\widehat{\mathrm{cFPR}}$, we have
\[
\widehat{\mathrm{cFPR}}(S_\theta, a) - \mathrm{cFPR}(S_\theta, a) = (\theta_{a,1} - \theta_{a,0})\left(\widehat{\mathrm{cFPR}}(S, a) - \mathrm{cFPR}(S, a)\right)
\]
Combining this with (25), we have
\[
\sqrt{n}\left(\widehat{\mathrm{cFPR}}(S_\theta, a) - \mathrm{cFPR}(S_\theta, a)\right) = \sqrt{n}(P_n - P)g_a + o_P(1) \rightsquigarrow N(0, \mathrm{var}(g_a))
\]
where the convergence follows from the fact that $\widehat{\gamma}_a$ is consistent for $\gamma_a$ (per Assumption 5), combined with the central limit theorem and Slutsky's theorem. This establishes the first statement of the theorem.

The second statement follows by identical reasoning, substituting $h_a$ for $g_a$ for the cFNR. The error rate differences can then be expressed as:
\[
\begin{aligned}
\sqrt{n}\left(\widehat{\Delta}^+(\theta, S) - \Delta^+(\theta, S)\right) &= \sqrt{n}(P_n - P)(g_0 - g_1) + o_P(1) \rightsquigarrow N(0, \mathrm{var}(g_0 - g_1)) \\
\sqrt{n}\left(\widehat{\Delta}^-(\theta, S) - \Delta^-(\theta, S)\right) &= \sqrt{n}(P_n - P)(h_0 - h_1) + o_P(1) \rightsquigarrow N(0, \mathrm{var}(h_0 - h_1))
\end{aligned}
\]
where the convergence once again follows from the central limit theorem and Slutsky's theorem.
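The ratio expansion in the proof above also suggests a direct implementation. The sketch below is our own illustrative code, not the paper's: it computes $\widehat{\mathrm{cFPR}}(S, a) = P_n[S\widehat{\gamma}_a]/P_n[\widehat{\gamma}_a]$, taking $\widehat{\gamma}_a = (1 - \widehat{\phi})\mathbb{1}\{A = a\}$ consistent with the identification in Proposition 1, together with a delta-method standard error based on the influence values; the toy data are invented:

```python
import numpy as np

def cfpr_dr(A, S, D, Y, pi_hat, mu_hat, a):
    """Doubly robust ratio estimate of cFPR(S, a) with a delta-method SE."""
    # phi: uncentered influence function for E[Y^0]
    phi = (1 - D) / (1 - pi_hat) * (Y - mu_hat) + mu_hat
    gamma = (1 - phi) * (A == a)                 # gamma_a = (1 - phi) 1{A = a}
    est = np.mean(S * gamma) / np.mean(gamma)    # P_n[S gamma_a] / P_n[gamma_a]
    # delta-method influence values for the ratio
    infl = (S * gamma - est * gamma) / np.mean(gamma)
    se = infl.std(ddof=1) / np.sqrt(len(infl))
    return est, se

# Toy check: with D = 0 everywhere and pi_hat = mu_hat = 0, phi reduces to Y,
# so the estimate is just the empirical P(S = 1 | Y = 0, A = a).
A = np.zeros(4); D = np.zeros(4)
S = np.array([1.0, 0.0, 1.0, 0.0]); Y = np.array([0.0, 0.0, 1.0, 1.0])
est, se = cfpr_dr(A, S, D, Y, pi_hat=np.zeros(4), mu_hat=np.zeros(4), a=0)
print(est)  # 0.5
```

In this degenerate case the doubly robust estimate coincides with the empirical conditional frequency; with nontrivial treatment $D$ and estimated nuisances it retains the second-order bias property of Lemma 1.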
11 Appendix D: Notation

Input data
- $Z = (X, A, D, S, Y) \sim P$: data, consisting of covariates $X \in \mathbb{R}^p$, sensitive feature $A$, decision (treatment, intervention) $D$, input predictor $S$, and outcome $Y$
- $D_{train}$: sample used to construct $S_{\widehat{\theta}}$
- $D_{test}$: sample used to estimate properties of $S_{\widehat{\theta}}$, given $\widehat{\theta}$

Derived predictor
- $S_\theta = \sum_{a, s \in \{0, 1\}} \mathbb{1}\{A = a, S = s\} B_{a,s}$: predictor derived from $S$
- $B_{a,s} \sim \mathrm{Bern}(\theta_{a,s})$: random variable that flips $S$
- $\theta_{a,s} = P(S_\theta = 1 \mid A = a, S = s)$: conditional probability that defines $S_\theta$
- $\theta_{A,S} = \sum_{a, s \in \{0, 1\}} \theta_{a,s}\mathbb{1}\{A = a, S = s\}$: RV that takes value $\theta_{a,s}$ with prob. $P(A = a, S = s)$
- $\theta = (\theta_{0,0}, \theta_{0,1}, \theta_{1,0}, \theta_{1,1})^T$: optimization parameter
- $\widetilde{\theta} = (0, 1, 0, 1)$: the value such that $S_{\widetilde{\theta}} = S$

Nuisance parameters
- $\pi = \pi(A, X, S) = P(D = 1 \mid A, X, S)$: propensity score for the decision
- $\mu = \mu(A, X, S) = E[Y \mid A, X, S, D = 0]$: outcome regression
- $\phi = \frac{1 - D}{1 - \pi}(Y - \mu) + \mu$: uncentered influence function for $E[Y^0]$

Loss parameters
- $\beta_{a,s} = E[\mathbb{1}\{A = a, S = s\}(1 - 2\mu)]$: a coefficient in the loss
- $\beta = (\beta_{0,0}, \beta_{0,1}, \beta_{1,0}, \beta_{1,1})^T$: vector of loss coefficients
- $f_\theta = (1 - \theta_{A,S})\phi + \theta_{A,S}(1 - \phi)$: uncentered IF for the loss of $S_\theta$
- $L(S_\theta) = E[(S_\theta - Y^0)^2] = E[\mu] + \theta^T\beta = E[f_\theta]$: loss of $S_\theta$, in several equivalent forms
- $\Gamma(S_\theta)$: change in loss $L(S_\theta) - L(S)$

Fairness parameters
- $\mathrm{cFPR}(S_\theta, a) = P(S_\theta = 1 \mid Y^0 = 0, A = a)$: counterfactual FPR for $S_\theta$ for group $a$
- $\mathrm{cFNR}(S_\theta, a) = P(S_\theta = 0 \mid Y^0 = 1, A = a)$: counterfactual FNR for $S_\theta$ for group $a$
- $\beta^+ = (1 - \mathrm{cFPR}(S, 0),\; \mathrm{cFPR}(S, 0),\; \mathrm{cFPR}(S, 1) - 1,\; -\mathrm{cFPR}(S, 1))^T$: coefficients defining the cFPR fairness constraint
- $\beta^- = (-\mathrm{cFNR}(S, 0),\; \mathrm{cFNR}(S, 0) - 1,\; \mathrm{cFNR}(S, 1),\; 1 - \mathrm{cFNR}(S, 1))^T$: coefficients defining the cFNR fairness constraint
- $\Delta^+(S_\theta) = \theta^T\beta^+ = \mathrm{cFPR}(S_\theta, 0) - \mathrm{cFPR}(S_\theta, 1)$: unfairness of the predictor $S_\theta$ in the cFPR
- $\Delta^-(S_\theta) = \theta^T\beta^- = \mathrm{cFNR}(S_\theta, 0) - \mathrm{cFNR}(S_\theta, 1)$: unfairness of the predictor $S_\theta$ in the cFNR
- $\epsilon^+, \epsilon^-$: unfairness tolerances in the cFPR and cFNR
- $\mathrm{UF}^+(S_\theta) = \max(|\Delta^+(S_\theta)| - \epsilon^+, 0)$: excess unfairness in the cFPR
- $\mathrm{UF}^-(S_\theta) = \max(|\Delta^-(S_\theta)| - \epsilon^-, 0)$: excess unfairness in the cFNR

Optimal fair derived predictor
- $\theta^* = \arg\min_\theta L(S_\theta)$ subject to $|\Delta^+(S_\theta)| \leq \epsilon^+$ and $|\Delta^-(S_\theta)| \leq \epsilon^-$: parameter defining the optimal fair derived predictor $S_{\theta^*}$

Table 2: Notation.
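The last entry defines $\theta^*$ as the solution of a linear program over $\Theta = [0, 1]^4$. A minimal sketch of how such a program can be solved, using scipy.optimize.linprog, is given below; the coefficient values are invented for illustration, and this shows the optimization step only, not the paper's implementation:

```python
import numpy as np
from scipy.optimize import linprog

def optimal_fair_theta(beta, beta_plus, beta_minus, eps_plus, eps_minus):
    """Solve: min_theta theta^T beta  subject to
    |theta^T beta_plus| <= eps_plus, |theta^T beta_minus| <= eps_minus,
    theta in [0, 1]^4. Each absolute-value constraint becomes two linear ones."""
    A_ub = np.vstack([beta_plus, -beta_plus, beta_minus, -beta_minus])
    b_ub = np.array([eps_plus, eps_plus, eps_minus, eps_minus])
    res = linprog(beta, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0.0, 1.0)] * 4, method="highs")
    return res.x

# Invented coefficients: with loose tolerances the constraints are slack and
# the program sets theta_i = 1 exactly where the loss coefficient is negative.
beta = np.array([-1.0, -1.0, 1.0, 1.0])
beta_plus = np.array([1.0, 0.0, -1.0, 0.0])
beta_minus = np.array([0.0, 1.0, 0.0, -1.0])
theta_hat = optimal_fair_theta(beta, beta_plus, beta_minus, 10.0, 10.0)
```

Tightening eps_plus to 0.5 makes the cFPR constraint bind, and the optimal objective degrades from -2 to -1.5 as the program trades loss for fairness.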