Group Fairness: Independence Revisited
Tim Räz
Institute of Philosophy, University of Bern, Switzerland
Institute of Biomedical Ethics and History of Medicine, University of Zürich
[email protected]
ABSTRACT
This paper critically examines arguments against independence, a measure of group fairness also known as statistical parity and as demographic parity. In recent discussions of fairness in computer science, some have maintained that independence is not a suitable measure of group fairness. This position is at least partially based on two influential papers (Dwork et al., 2012; Hardt et al., 2016) that provide arguments against independence. We revisit these arguments, and we find that the case against independence is rather weak. We also give arguments in favor of independence, showing that it plays a distinctive role in considerations of fairness. Finally, we discuss how to balance different fairness considerations.
CCS CONCEPTS
• Social and professional topics → User characteristics; • Computing methodologies → Machine learning; • Applied computing → Arts and humanities.

KEYWORDS
fairness, independence, statistical parity, demographic parity, sufficiency, separation, affirmative action, accuracy
ACM Reference Format:
Tim Räz. 2021. Group Fairness: Independence Revisited. In
ACM Conference on Fairness, Accountability, and Transparency (FAccT ’21), March 1–10, 2021, Virtual Event, Canada.
ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/3442188.3445876
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

FAccT ’21, March 1–10, 2021, Virtual Event, Canada
© 2021 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-8309-7/21/03...$15.00
https://doi.org/10.1145/3442188.3445876

Measures of group fairness have become an important topic in computer science after the publication of the ProPublica article “Machine Bias” [1]. ProPublica found that the risk assessment tool COMPAS is biased against black people in having unbalanced false positive and false negative rates. This is intuitively unfair. The ensuing debate mostly focused on the contrast between the measure implicitly used by ProPublica, now known as separation, and other measures, in particular a measure known as sufficiency. However, a third measure of group fairness, independence, also known as statistical parity or demographic parity, has been viewed more critically. Some computer scientists seem to think that independence is not a suitable measure of group fairness [3, 12]; others maintain that while independence is adequate in some contexts, it leads to undesirable consequences in others [4, 17]. The critical stance of computer scientists with respect to independence appears to be at least partially based on two influential papers [7, 8] that provide arguments against independence. Here we revisit and critically examine these arguments, and we find that the case against independence, as opposed to other measures of group fairness, is rather weak.

We first introduce measures of group fairness and their most important properties (section 2). In particular, we introduce the concept of conservative fairness measures, which allows us to clarify the relation between fairness and accuracy. We then examine the arguments against independence (section 3). We find, first, that arguments against independence proposed in [7] equally apply to other measures of group fairness such as sufficiency and separation, and should therefore not be taken to apply to independence specifically. Second, we argue that arguments against independence proposed in [8] are flawed in making unwarranted assumptions about conservative fairness measures such as sufficiency and separation. We prove that sufficiency and separation are not incrementally conservative, which means that these measures are not necessarily preserved if we increase the accuracy of a predictor. We then state arguments in favor of independence (section 4), finding that independence captures aspects of fairness not covered by sufficiency and separation.
Finally, we discuss how to balance different fairness considerations (section 5).

This section introduces and discusses the most important measures of group fairness, formulates these measures for the case of binary variables, and discusses other relevant fairness measures, setting the stage for the discussion in later sections.
Here we state the most important group fairness measures, following the discussion in [2]. These measures are formulated using random variables 𝑌, 𝑅, 𝐴; all measures we consider correspond to statistical properties of these variables. The variables have the following interpretation: 𝑌 is the “true label”, i.e., the characteristic that we want to predict; 𝑅 is the prediction, which can be the output of an algorithm; 𝐴 is the characteristic indicating group membership, i.e., the property with respect to which we investigate fairness. In the context of supervised learning, we have access to 𝑌 through labeled data. We will mostly focus on binary variables. Also, we will assume that a prediction 𝑅 leads to a corresponding decision.

Let us illustrate this setup using the example of college admissions. Students of different genders apply for college; a prediction about their suitability is made based on the application documents. In this case, the value of 𝑌 corresponds to the actual suitability of a student applying for college. 𝑌 is known if the data in question is historical, and the value of 𝑌 can be determined based on whether a student actually obtained a degree or not (or a different operationalization of ‘suitable applicant’). 𝑅 corresponds to the prediction whether or not a student should be admitted to college based on the application documents. 𝐴 corresponds to the gender of applicants, which we assume to be binary for simplicity’s sake.

To formulate fairness measures, we will use the following notation: Two random variables 𝑋, 𝑌 are independent if 𝑃(𝑋, 𝑌) = 𝑃(𝑋) · 𝑃(𝑌); we will write this as 𝑋 ⊥ 𝑌. Two random variables 𝑋, 𝑌 are conditionally independent given 𝑍 if 𝑃(𝑋 | 𝑌, 𝑍) = 𝑃(𝑋 | 𝑍); we will write this as 𝑋 ⊥ 𝑌 | 𝑍.

Definition 1. The measure of independence is satisfied if 𝑅 ⊥ 𝐴.

Independence, also known as statistical parity and demographic parity, means that the prediction 𝑅 does not depend on 𝐴. If independence is satisfied, a prediction is statistically balanced between different groups, in that members of the different groups get predictions at the same rate. In the case of college admissions, this means that an equal proportion of men and women applying for college are predicted to be suitable applicants.

Definition 2. The measure of sufficiency is satisfied if 𝑌 ⊥ 𝐴 | 𝑅.

Sufficiency means that, given the prediction, the true label is independent of the group. The idea is that the prediction 𝑅 contains all the information about the true label, so the sensitive characteristic is not needed; in other words, the prediction 𝑅 is sufficient for 𝑌. In the case of college admissions, this means that an equal proportion of men and women predicted to be suitable applicants are actually suitable applicants.

Definition 3. The measure of separation is satisfied if 𝑅 ⊥ 𝐴 | 𝑌.

Separation means that, given the true label, the prediction is independent of the group. The idea is that the prediction 𝑅 can only vary with respect to different groups 𝐴 insofar as this is justified by the true label 𝑌; see [2]. In the case of college admissions, this means that an equal proportion of suitable men and women applying for admission are predicted to be suitable applicants.

In this section, we discuss some important properties of the fairness measures introduced above. The discussion follows [2]; see the appendix for proofs. First, the accuracy of a predictor 𝑅 is the degree to which it agrees with the true label 𝑌; a perfect predictor is a predictor that completely agrees with the true label, i.e., 𝑌 = 𝑅. (Sufficiency is closely related to calibration. Calibration means that the predicted score reflects the true score. Calibration and sufficiency are equivalent up to reparametrization, cf. [2, p. 52].) Next, we state an important property that is shared by sufficiency and separation, but not by independence.

Proposition 4.
If we have a perfect predictor, then sufficiency and separation hold.

Independence, 𝑅 ⊥ 𝐴, is not, in general, compatible with a perfect predictor: independence and perfect predictors are only compatible if the true label is evenly distributed between groups, i.e., if we have 𝑌 ⊥ 𝐴, which is not the case in general. Proposition 4 motivates a definition that will be important in the following. It is a distinction between different kinds of group fairness measures:

Definition 5. A fairness measure is conservative if the measure is necessarily satisfied in the case of a perfect predictor. Otherwise, a fairness measure is non-conservative.

Fairness measures are called conservative because, in the case of a perfect predictor, they do not force us to change anything to obtain fairness, i.e., they conserve the status quo. Proposition 4 shows that both sufficiency and separation are conservative fairness measures; meanwhile, independence is not. Note that the reverse implication of Proposition 4 is false. The following proposition provides a characterization of when sufficiency and separation hold in some cases of non-perfect predictors:
Proposition 6.
If the joint distribution of (𝐴, 𝑌, 𝑅) is positive for all values, then sufficiency and separation hold at the same time iff 𝐴 is independent of the joint distribution of 𝑌 and 𝑅, i.e., if 𝐴 ⊥ (𝑌, 𝑅).

This proposition is important because it tells us when sufficiency and separation hold under reasonable circumstances such as a non-vanishing joint distribution. The notion of a conservative fairness measure is very strong and of limited practical relevance, because predictors are hardly ever perfect in practice. To overcome this limitation, we can define a broader notion of conservativeness and investigate if relevant fairness measures are conservative on this notion:

Definition 7. A fairness measure is incrementally conservative if the degree to which the measure is satisfied does not decrease if we increase the accuracy of the predictor.

Are the fairness measures we considered above conservative according to this broader notion? Unfortunately, this is not the case. In the appendix, the following proposition is proved:
Proposition 8.
Sufficiency and separation are not incrementally conservative fairness measures.

This means that if these two measures are satisfied by a certain (non-perfect) predictor, and we increase the accuracy of that predictor, it can happen that the improved predictor no longer satisfies the two measures. Thus, the property of conservativeness does not imply incremental conservativeness; it is not necessarily the case that a conservative fairness measure is preserved if we increase the accuracy of a predictor. (The notion of a conservative fairness measure used here is related to the concept of conservative justice [14, Sec. 2.1] insofar as the latter notion concerns the preservation of (factual) practices; however, the notion of conservativeness proposed here does not concern the preservation of norms as required by conservative justice.)
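The phenomenon behind Proposition 8 can be illustrated with a minimal numerical sketch. The counts below are my own illustrative choices, not the paper's appendix proof: a predictor that satisfies separation for two groups loses it after a strict improvement in accuracy.

```python
from fractions import Fraction as F  # exact arithmetic, no float noise

def rates(tp, fp, fn, tn):
    """Return (accuracy, FPR, FNR) for one group's counts."""
    n = tp + fp + fn + tn
    return F(tp + tn, n), F(fp, fp + tn), F(fn, tp + fn)

# Group p: tp=2, fp=1, fn=2, tn=1  ->  FPR = 1/2, FNR = 1/2
# Group q: tp=1, fp=2, fn=1, tn=2  ->  FPR = 1/2, FNR = 1/2
acc_p, fpr_p, fnr_p = rates(2, 1, 2, 1)
acc_q, fpr_q, fnr_q = rates(1, 2, 1, 2)
assert fpr_p == fpr_q and fnr_p == fnr_q   # separation holds

# Improve the predictor: one false negative in group p becomes a
# true positive (tp 2 -> 3, fn 2 -> 1). Accuracy strictly increases.
acc_p2, fpr_p2, fnr_p2 = rates(3, 1, 1, 1)
assert acc_p2 > acc_p
# But now FNR_p = 1/4 while FNR_q = 1/2: separation is lost.
assert fnr_p2 != fnr_q
```

The improvement turns one error into a correct prediction, yet the group-wise false negative rates come apart; nothing about higher accuracy protects the equality of error rates across groups.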
In this section, we discuss group fairness measures in the case of binary prediction 𝑅, ground truth 𝑌, and characteristic 𝐴, using so-called confusion matrices. Confusion matrices make it easier to formulate and reason about these measures in concrete applications. The discussion in this section draws on more thorough expositions of confusion matrices and their characteristics in [3, 13].

Assume we have collected statistical information for binary 𝑌 and 𝑅, for example, historical records of college success 𝑌, as well as the binary prediction for admission 𝑅. The prediction 𝑅 can be positive or negative, and for either outcome, it can match the true label 𝑌 (𝑎 = true positives, 𝑑 = true negatives) or not (𝑏 = false positives, 𝑐 = false negatives):

              truth (Y)
              +        –       total
pred. (R) +   𝑎        𝑏       𝑎 + 𝑏
          –   𝑐        𝑑       𝑐 + 𝑑
total         𝑎 + 𝑐    𝑏 + 𝑑   𝑁

On this basis, we can define some important statistics of confusion matrices:

• Accuracy: (𝑎 + 𝑑)/𝑁
• Positive Predictive Value (PPV): 𝑎/(𝑎 + 𝑏)
• Negative Predictive Value (NPV): 𝑑/(𝑐 + 𝑑)
• False Positive Rate (FPR): 𝑏/(𝑏 + 𝑑)
• False Negative Rate (FNR): 𝑐/(𝑎 + 𝑐)

From here on, we assume that we have observed sufficiently many cases such that the observations (relative frequencies) in our tables approximately match the “true probabilities”. Now, in order to formulate fairness measures for confusion matrices, we need one matrix for each of two groups 𝑝, 𝑞 (values of the random variable 𝐴):

Table 1: Group A = 𝑝
              truth (Y)
              +    –
pred. (R) +   𝑎    𝑏
          –   𝑐    𝑑

Table 2: Group A = 𝑞
              truth (Y)
              +    –
pred. (R) +   𝑎′   𝑏′
          –   𝑐′   𝑑′
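The statistics above can be bundled into a small helper function. This is a hypothetical sketch, not code from the paper; exact fractions are used to avoid floating-point noise.

```python
from fractions import Fraction as F

def confusion_stats(a, b, c, d):
    """Statistics of a 2x2 confusion matrix with entries
    a = true positives, b = false positives,
    c = false negatives, d = true negatives."""
    n = a + b + c + d
    return {
        "accuracy": F(a + d, n),
        "PPV": F(a, a + b),
        "NPV": F(d, c + d),
        "FPR": F(b, b + d),
        "FNR": F(c, a + c),
    }

stats = confusion_stats(a=40, b=10, c=20, d=30)
# accuracy = 7/10, PPV = 4/5, NPV = 3/5, FPR = 1/4, FNR = 1/3
```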
Based on this, the three group fairness measures defined in the previous section can be formulated in terms of statistics of these two confusion matrices:
Proposition 9.
For binary variables 𝑌, 𝑅, 𝐴, independence is equivalent to (𝑎 + 𝑏)/𝑁 = (𝑎′ + 𝑏′)/𝑁′.

Proposition 10.
For binary variables 𝑌, 𝑅, 𝐴, sufficiency holds iff both groups have the same positive predictive value (PPV), i.e., 𝑎/(𝑎 + 𝑏) = 𝑎′/(𝑎′ + 𝑏′), and the same negative predictive value (NPV), i.e., 𝑑/(𝑐 + 𝑑) = 𝑑′/(𝑐′ + 𝑑′).

Proposition 11.
For binary variables 𝑌, 𝑅, 𝐴, separation holds iff both groups have the same false positive rate (FPR), i.e., 𝑏/(𝑏 + 𝑑) = 𝑏′/(𝑏′ + 𝑑′), and the same false negative rate (FNR), i.e., 𝑐/(𝑎 + 𝑐) = 𝑐′/(𝑎′ + 𝑐′).

In this section, we discuss kinds of fairness that are not group fairness measures, but that will play an important role in our discussion below. The group fairness measures introduced above are observational measures, i.e., they can be measured based on data that are typically available: In many cases, we have access to labeled data (𝑌), a predictive model (𝑅), and labels or a different kind of access to the sensitive characteristic of individuals (𝐴). However, there are other kinds of fairness considerations that are not measurable in terms of these quantities.

Kamishima et al. [10] propose to distinguish three different kinds of fairness. The first kind, prejudice, subsumes the notions of group fairness discussed above. The second kind, underestimation, is due to the fact that a model may be unfair due to the finiteness of training data. The definition of the third kind, negative legacy, is particularly important:

Definition 12.
Negative legacy is unfairness due to unfair sampling or labeling in the training data.

Kamishima et al. provide the following example of negative legacy: “[I]f a bank has been unfairly rejecting the loans of the people who should have been approved, the labels in the training data would become unfair. This problem is serious because it is hard to detect and correct” (ibid., p. 646). Kamishima et al. note that the problem can be overcome to a certain extent if an independent set of fairly labeled training data is available.

A further notion of fairness that is relevant is individual fairness, which can be defined as follows:
Definition 13. (Informal)
Individual fairness is the requirement that a fair predictor should treat similar individuals similarly, i.e., their predictions should be similar.

Individual fairness is in tension with group fairness measures under certain conditions because group fairness defines fairness in terms of (average) properties of group members, which usually does not do justice to some individual properties of group members. In particular, if the comparison of individuals uses fine-grained information such as a score or utility, it is possible to violate individual fairness while complying with some measure of group fairness. We will see examples of this below. Finally, note that individual fairness seems to be the fairness measure that is most closely related to the philosophical concept of justice, cf. [14, Sec. 1.1].
(The notion of individual fairness is due to [7]. Formally, we can make individual fairness precise by replacing the informal notion of “similarity” with two metrics, which capture how similar or close individuals and their predictions are. To do this, we need a metric 𝑑 between individuals 𝑥, 𝑦 ∈ 𝐼, and a metric 𝐷 between distributions of predictions 𝑀𝑥, 𝑀𝑦 of individuals, where 𝑀 is a map from individuals 𝐼 to distributions of predictions. To enforce individual fairness, we now require that the distance between individuals should limit the distance between the distributions of predictions, i.e., we should have 𝐷(𝑀𝑥, 𝑀𝑦) ≤ 𝑑(𝑥, 𝑦) for 𝑥, 𝑦 ∈ 𝐼. This is a so-called Lipschitz condition.)

In this section, we examine arguments against independence from the computer science literature. When following the debate, one can get the impression that independence is somehow flawed or unsuitable as a measure for group fairness. The goal of this section is to revisit and critically examine important arguments against independence.
The most important paper credited with showing that independence is not a suitable fairness concept is [7]. We will reexamine the arguments in this paper and argue, first, that it is not clear whether Dwork et al. wish to reject independence, and second, that the arguments made by Dwork et al. should not be construed as arguments against independence, but more broadly as arguments against group fairness in general.

First, let us examine whether Dwork et al. wish to simply reject independence. Note that Dwork et al. call independence “statistical parity”. In the introduction, Dwork et al. write: “we demonstrate [the inadequacy of statistical parity] as a notion of fairness through several examples in which statistical parity is maintained, but from the point of view of an individual, the outcome is blatantly unfair.” (p. 2) This sounds like an outright rejection. However, later in the paper, Dwork et al. write that “statistical parity is insufficient as a general notion of fairness.” (p. 7) This suggests that Dwork et al. merely want to argue that independence (statistical parity) is not a logically sufficient condition for fairness, which is a much weaker claim than the claim that it should be rejected tout court. What is more, the paper investigates to what extent individual fairness implies, or helps satisfy, independence, which would be unnecessary if independence should be outright rejected. Thus, Dwork et al. merely caution against independence as the sole arbiter of fairness.

Now let us examine the arguments against independence in [7, Sec. 3.1] more closely. The arguments take the form of three examples, in which the adoption of independence has undesirable consequences, in that independence holds, but individuals are treated unfairly, that is, individual fairness is violated. The first example, ‘Reduced Utility’, shows that independence does not ensure that the most suitable candidates from different groups are selected.
In the example, an organization hires people from two groups 𝑝, 𝑞. It is possible for the organization to comply with independence while, out of ignorance, choosing the least qualified members of group 𝑝 and the best qualified members of group 𝑞. This reduces the utility of the organization, and it also violates individual fairness, because similar members of the two groups are treated differently. To make it concrete, assume we have two individuals 𝑥 ∈ 𝑝 and 𝑦 ∈ 𝑞, both similarly qualified, and while 𝑦 is hired, 𝑥 is not hired; thus two individuals who are similar are not treated similarly. The second example, ‘Self-fulfilling Prophecy’, has the same structure as the first example, but the unqualified members of 𝑝 are now maliciously chosen for the purpose of justifying future discrimination against members of 𝑝. The third example, ‘Subset Targeting’, is based on the fact that independence does not ensure a fair choice within groups, in that it does not require that the most deserving members of a group get to see a relevant job ad. This implies, once more, a violation of individual fairness.

Are these examples by Dwork et al. sufficient to reject independence as a criterion of group fairness? We grant that independence can decrease utility, if utility depends on the degree of accuracy, because independence does not depend on 𝑌, and thus also not on accuracy, i.e., how well 𝑌 and 𝑅 match. The three examples discussed by Dwork et al. are all based on the fact that independence only requires that an equal proportion of two groups get classified in a certain way, but does not further specify how individuals within these groups have to be distributed with respect to 𝑌. As a consequence, independence does not, in general, guarantee individual fairness. We thus grant that these examples are valid in substance.
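The ‘Reduced Utility’ scenario can be sketched in a few lines. This is a toy example with scores of my own choosing, not data from Dwork et al.: a selection rule satisfies independence exactly while treating two equally qualified individuals from different groups differently.

```python
# Each applicant: (group, qualification score).
applicants = [
    ("p", 0.9), ("p", 0.8), ("p", 0.3), ("p", 0.2),
    ("q", 0.9), ("q", 0.8), ("q", 0.3), ("q", 0.2),
]

# Admit 50% of each group -- independence holds by construction --
# but, out of ignorance, take the *least* qualified half of group p.
admitted = {("p", 0.3), ("p", 0.2), ("q", 0.9), ("q", 0.8)}

def rate(group):
    """Fraction of a group's applicants that are admitted."""
    sel = sum(1 for g, s in admitted if g == group)
    tot = sum(1 for g, s in applicants if g == group)
    return sel / tot

assert rate("p") == rate("q") == 0.5   # independence satisfied

# Individual fairness violated: ("p", 0.9) and ("q", 0.9) are
# maximally similar, yet one is admitted and the other is not.
assert ("q", 0.9) in admitted and ("p", 0.9) not in admitted
```

The check on group rates is all that independence demands; the within-group selection is left entirely unconstrained, which is exactly the gap the three examples exploit.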
However, this is not sufficient to reject independence as opposed to other measures of group fairness such as separation and sufficiency, because similar arguments can be directed against these other measures. Our argumentative strategy is to make a tu quoque argument: If one accepts these arguments against independence, then one also has to accept similar arguments against sufficiency and separation.

We will now give a first example of gerrymandering [11] with separation, in which an employer manipulates statistics so as to realize an unequal treatment of groups, while maintaining separation.

Example 14.
Gerrymandering With Separation:
A malicious employer makes hiring decisions. There are two groups, 𝑝 and 𝑞, for which separation has to be enforced. The employer has a preference for people from group 𝑞. Assume that the employer has made provisional hiring decisions that satisfy separation, and a confusion matrix according to these hiring decisions has been compiled, cf. Tables 1 and 2. The confusion matrices satisfy separation, which implies that we have 𝑎/(𝑎 + 𝑐) = 𝑎′/(𝑎′ + 𝑐′). Assume further that the employer has a reservoir of 𝑧 qualified candidates from group 𝑞 that do not appear in the statistics of the confusion matrix. The employer can now hire candidates from that reservoir, as long as an appropriate proportion of qualified candidates is rejected, i.e., the employer creates a division 𝑧 = 𝑧₊ + 𝑧₋ into qualified people hired, 𝑧₊, and qualified people rejected, 𝑧₋, such that (𝑎′ + 𝑧₊)/(𝑎′ + 𝑐′ + 𝑧) = 𝑎′/(𝑎′ + 𝑐′). The new confusion matrices are unchanged except for the entries 𝑎′ → 𝑎′ + 𝑧₊ and 𝑐′ → 𝑐′ + 𝑧₋, which means that the matrices still satisfy separation, as can easily be verified. However, this hiring practice seems to be intuitively unfair towards group 𝑝 (and towards the qualified people from the reservoir 𝑧 who are rejected); it can also violate individual fairness because equally suitable candidates from group 𝑝 are not even considered.

It could be objected that this example is not analogous to the examples by Dwork et al. in that here, the employer has to keep part of the statistics “off the book”, and add it later. However, the examples by Dwork et al. also make the assumption that there is additional information not captured by the variables relevant to independence.
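The bookkeeping in Example 14 can be checked mechanically. The counts below are my own illustrative choices: adding 𝑧₊ hired and 𝑧₋ rejected qualified candidates to group 𝑞 in the ratio 𝑎′ : 𝑐′ leaves both error rates, and hence separation, intact.

```python
from fractions import Fraction as F

def error_rates(a, b, c, d):
    """(FPR, FNR) of a 2x2 confusion matrix."""
    return F(b, b + d), F(c, a + c)

# Provisional decisions satisfying separation:
p = dict(a=3, b=1, c=1, d=3)            # FPR = 1/4, FNR = 1/4
q = dict(a=3, b=2, c=1, d=6)            # FPR = 1/4, FNR = 1/4
assert error_rates(**p) == error_rates(**q)

# Reservoir of z = 4 qualified candidates from group q, split so that
# z_plus / z = a' / (a' + c') = 3/4: hire 3, reject 1.
z_plus, z_minus = 3, 1
q2 = dict(a=q["a"] + z_plus, b=q["b"], c=q["c"] + z_minus, d=q["d"])

# Separation still holds, although group q gained 3 extra hires.
assert error_rates(**p) == error_rates(**q2)
```

Only the first column of group 𝑞's matrix changes, and it changes proportionally, so both rate equalities survive the manipulation.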
It is important to note that this (malicious) hiring practice would not be possible in the case of independence, because in the example, the employer drives up the number of employees from group 𝑞 without raising the numbers in the other group, and independence enforces exact balance between groups.

Let us give a second example, with a different structure. This is an example of gerrymandering with sufficiency and separation.

Example 15.
Gerrymandering With Sufficiency and Separation:
Assume that an employer has made provisional hiring decisions and compiled two confusion matrices. Assume that the confusion matrices for 𝑝 and 𝑞 both do not have any zero entries and that the two confusion matrices have the same entries if they are normalized by 𝑁 and 𝑁′ respectively; this means that the corresponding joint distribution of (𝑌, 𝑅, 𝐴) is positive everywhere, and the joint distribution of (𝑌, 𝑅) is independent of group membership 𝐴. In this case, sufficiency and separation are both satisfied according to Proposition 6. Now, the employer wants to hurt group 𝑝. Under certain circumstances, this can be done as follows. The employer chooses a member 𝑥 of 𝑝 that is a false negative, i.e., 𝑥 should be hired, but is not predicted to be hired. If there is a member 𝑥* in group 𝑝 that is a true positive, i.e., should be hired and is predicted to be hired, and is more suitable for the job than 𝑥, the malicious employer can simply switch the predictions for 𝑥 and 𝑥*, such that 𝑥 becomes a true positive and 𝑥* becomes a false negative. The two confusion matrices are unchanged and thus still satisfy sufficiency and separation. However, individual fairness is violated, because a less qualified candidate has been chosen over a more qualified candidate, which hurts the group.

Note that this example also works without assuming (malicious) intent. The situation described arises naturally if an employer has less information about the relative suitability of people from group 𝑝; the employer would then hire a less than optimal selection of candidates from group 𝑝 and thus also not maximize utility. This is very similar to the first example by Dwork et al. We can conclude that examples of gerrymandering that are similar to those by Dwork et al. can be constructed for notions of group fairness such as separation and sufficiency.
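The mechanism of Example 15 is easy to verify with toy data of my own choosing: swapping the predictions of a true positive and a false negative within the same group leaves that group's confusion matrix, and hence every statistic derived from it, unchanged.

```python
from collections import Counter

# Individuals of group p: (true label Y, prediction R, suitability).
# x is a false negative (Y=1, R=0); x_star is a more suitable
# true positive (Y=1, R=1).
group_p = [
    (1, 1, 0.9),   # x_star: predicted to be hired, highly suitable
    (1, 0, 0.6),   # x: not predicted to be hired, suitable
    (0, 0, 0.2),
    (0, 1, 0.4),
]

def matrix(people):
    """Confusion matrix as counts of (Y, R) pairs."""
    return Counter((y, r) for y, r, _ in people)

before = matrix(group_p)

# The malicious swap: x becomes a true positive, x_star a false negative.
swapped = [(1, 0, 0.9), (1, 1, 0.6), (0, 0, 0.2), (0, 1, 0.4)]

assert matrix(swapped) == before   # matrix (and all derived
                                   # fairness statistics) unchanged
```

Because every observational group fairness measure is a function of the two matrices, no such measure can register the difference between hiring the more suitable and the less suitable candidate.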
Note that other examples of gerrymandering for sufficiency and separation can be constructed along the lines of the examples given here.

It could be asked why Dwork et al. did not appreciate that their arguments apply to other notions of group fairness as well. Here is a plausible explanation: Except for independence, notions of group fairness, such as sufficiency and separation, were only discussed more widely after the publication of the seminal ProPublica article [1], which appeared four years after the publication of [7]. Thus, Dwork et al. might have raised their objections against notions of group fairness in general, and not just against independence, if they had been aware of other notions. It should also be noted that the primary focus of Dwork et al. is on the notion of individual fairness, not on group fairness. The paper does examine the relation between individual fairness and independence (statistical parity), but this is not the center of attention. All in all, the idea that independence is somehow more problematic than other group fairness criteria, which is held in parts of the computer science literature and can at least partially be traced back to Dwork et al., may be a historical accident.

Other arguments in the computer science literature are targeted more specifically at independence and do not apply to other notions of group fairness such as sufficiency and separation. ([2] mention that arguments against independence may apply to other statistical fairness measures as well, but this is not elaborated.) In [8], the authors propose separation as a fairness measure. To differentiate separation from independence, the authors claim that independence is flawed for two reasons that do not apply to separation. The first reason is the kind of argument made in [7]. The second reason why independence is flawed is given in the following quote – note that the authors call independence “demographic parity”, and the predictor is denoted 𝑌̂:
... demographic parity often cripples the utility that we might hope to achieve. Just imagine the common scenario in which the target variable 𝑌 – whether an individual actually defaults or not – is correlated with 𝐴. Demographic parity would not allow the ideal predictor 𝑌̂ = 𝑌, which can hardly be considered discriminatory as it represents the actual outcome. As a result, the loss in utility of introducing demographic parity can be substantial. [8, p. 2]

Later, the authors note that separation does not have this problem: “Unlike demographic parity, our notion always allows for the perfectly accurate solution [...]” (ibid.) We will now reconstruct the argument in this passage based on the distinctions made in section 2. There are two different readings of the argument. The first reading focuses on the relation between accuracy and utility, while the second reading focuses on the relation between accuracy and fairness. On the first reading, the argument can be reconstructed as follows:

P1 A perfect predictor maximizes utility.
P2 Independence is a non-conservative fairness criterion (is not generally compatible with a perfect predictor), while separation is a conservative fairness criterion (is compatible with a perfect predictor).
C1 Therefore, independence is not generally compatible with maximal utility, while separation is.
C2 Therefore, separation should be preferred over independence.

There are two main problems with this argument. The first problem is premiss P1: It is not the case that accuracy and utility align necessarily; see [5]. For one, accuracy only captures the state of the world as it is at a certain point in time. Thus, if we maximize accuracy, we maximize utility only with regard to short-term goals. To take the example of risk assessment, maximizing utility means minimizing current risk. This does not take into account the value of changing risk assessment so as to minimize, say, future risk, which can be tied to, say, racial justice.
It is explicitly noted in [5] that utility according to present risk scores is “immediate utility”. Furthermore, note that if we have a predictor 𝑅 that is not perfect, and false positives and false negatives have different utilities, we may have to choose a predictor 𝑅′ that is even less accurate than 𝑅 to maximize utility.

The second problem is the step from conclusion C1 to conclusion C2. As is often pointed out in the computer science literature, we virtually never have a perfect predictor. So we are almost never in a situation where it actually matters that a fairness measure is conservative, i.e., that the measure is compatible with the perfect predictor. However, if we are almost never in this situation, conservativeness is a theoretical concern, but practically irrelevant. A situation that is practically irrelevant should not guide our choice of fairness measure. So, there is no practical reason to prefer conservative fairness measures over non-conservative ones.

It could be thought that the above argument also goes through for broader notions of conservativeness, i.e., that it holds for increments of accuracy: if we increase accuracy, and this automatically increases the degree to which a fairness measure holds, then we do not need a perfect predictor for accuracy to be of practical relevance; the two align in increments. In fact, Hardt et al. appear to have an argument along these lines in mind. Immediately after the passage quoted above, they write:

[O]ur criterion is easier to achieve the more accurate the predictor 𝑌̂ is, aligning fairness with the central goal in supervised learning of building more accurate predictors. [8, p. 2]

This claim, however, is false in view of Proposition 8, which establishes that it is possible to start with a predictor 𝑅 that satisfies separation, increase the accuracy of 𝑅, and obtain a new predictor 𝑅′ that no longer satisfies separation.
Proposition 8 shows that both separation and sufficiency are not incrementally conservative, and that, therefore, an incremental version of the above argument does not support separation or sufficiency as opposed to independence.

Let us now turn to the second reading of the argument in the quote from Hardt et al., which focuses on the relation between accuracy and fairness:

P1* A perfect predictor is (maximally) fair, because it aligns with the actual outcome.
P2* Independence is a non-conservative fairness criterion (is not generally compatible with a perfect predictor), while separation is a conservative fairness criterion (is compatible with a perfect predictor).
C1* Therefore, independence is not generally compatible with a (maximally) fair predictor, while separation is.
C2* Therefore, separation should be preferred over independence.

There are, again, two problems with this argument. The first problem, the step from the first to the second conclusion, was already discussed above – we can reasonably doubt the practical relevance of perfect predictors, because they are virtually never realized, and an incremental version of the argument is demonstrably false. The second, more fundamental problem is premiss P1*. This premiss is unsupported, and, arguably, wrong in general. Premiss P1* is problematic both from a philosophical and from a computer science perspective.

From a computer science perspective, there are important aspects of algorithmic fairness that are not captured by group fairness measures, and this is well known. Take, for example, the kinds of fairness discussed in [10]; see section 2.4 above. Negative legacy is unfairness due to unfair sampling or labeling. Consider the case of unfair labeling. Unfair labeling means that the distribution 𝑃(𝑌, 𝐴) is unfair, i.e., the distribution of actual outcomes 𝑌 we measure at a certain point in time favors one of the groups in 𝐴 over another in a way we consider to be unfair.
What premiss P1* says is that a perfect predictor 𝑅 is fair because it aligns with the actual outcome, i.e., because we have 𝑌 = 𝑅. (Note that from a conceptual or philosophical point of view, it could be worthwhile to explore the case of perfect predictors. The argument made here takes the more practical position of computer science that perfect predictors are negligible as a point of departure.) However, 𝑌 = 𝑅 only provides a good justification of the fairness of 𝑅 with respect to 𝐴, i.e., of 𝑃(𝑅, 𝐴), if the distribution 𝑃(𝑌, 𝐴) itself is fair, which need not be the case if labeling is unfair; this is what Kamishima et al. point out. The distribution 𝑃(𝑌, 𝐴) can arise through unfair practices, historical biases, and so on.

Importantly, Kamishima et al. also point out that this sort of unfairness is hard to detect or measure if we do not have access to a sample with fair labeling, such that we can obtain a fair estimate of 𝑃(𝑌, 𝐴). But of course, just because it can be hard, or even impossible, to quantify negative legacy does not mean that this quantity is of no ethical import. Fairness is completely independent of our ability to measure it.

Let us illustrate these points with some examples. Why should we think that an accurate predictor is fair? One of the reasons may be that an accurate predictor aligns with the ground truth 𝑌. And trying to align predictions with the truth should not be considered to be discriminatory – this is the point made by Hardt et al. in the above quote. To address this point, recall what truth means in the present context: It means that 𝑌 captures what we observe in the world at a certain point. For example, we observe that people from group 𝑝 in fact get arrested more frequently than people from group 𝑞, we observe that group 𝑝 in fact has more loan applications rejected than group 𝑞, and so on. This is what the joint distribution of 𝑌 and 𝐴 captures. In other words, the distribution 𝑃(𝑌, 𝐴) is a picture of the status quo.
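The point that a perfect predictor simply reproduces the status quo can be illustrated with a small numerical sketch. The group sizes and base rates below are hypothetical, chosen only for illustration: the perfect predictor 𝑅 = 𝑌 makes no errors, so separation and sufficiency hold trivially, yet its acceptance rates differ across groups whenever the label distribution does, so independence is violated.

```python
# Hypothetical joint counts for (A, Y); the disparity in base rates
# is an assumed illustration, not data from the paper.
counts = {("p", 1): 60, ("p", 0): 40,   # group p: 60% positive labels
          ("q", 1): 30, ("q", 0): 70}   # group q: 30% positive labels

def acceptance_rate(group):
    """P(R = 1 | A = group) for the perfect predictor R = Y."""
    pos, neg = counts[(group, 1)], counts[(group, 0)]
    return pos / (pos + neg)

# Separation and sufficiency hold trivially (R = Y has no errors),
# but independence fails: the predictor inherits the label disparity.
gap = abs(acceptance_rate("p") - acceptance_rate("q"))
print(acceptance_rate("p"), acceptance_rate("q"))  # 0.6 0.3
print(f"independence gap: {gap:.2f}")              # independence gap: 0.30
```

Whether the resulting gap of 0.30 signals unfairness depends, as argued above, on whether the distribution 𝑃(𝑌, 𝐴) itself is fair.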
However, the world as it is at a certain point, or the status quo, is not a moral category. It is just a description of what we find in the world. It does not answer the question whether the world as we find it is fair, or morally justified. Finding the world to be a certain way, and inferring from this that the world ought to be this way, is committing a fallacy according to some philosophers, based on a confusion between facts and values; see, e.g., the discussion of the Is–Ought gap in [15, Sec. 2.1.].

At this point, it could be objected that in some cases, the distribution of labels does have moral import. Take, for example, the often-mentioned case of violent offenders. If the distribution 𝑃(𝑌, 𝐴) captures the historical record of reoffending of violent criminals in the past, then it makes sense to align our predictor 𝑅 with 𝑌. It seems that we cannot just ignore the historical record in favor of a group fairness measure such as independence. The price we pay by releasing (potentially) violent criminals from one group, or by locking up (potentially) innocent members of the other group because these groups have different frequencies with respect to 𝑌, seems very high, and the choice of such a predictor seems morally wrong. A form of this argument is made in the following passage of [3, p. 14]: "[Independence] has been criticized because it can lead to highly undesirable decisions for individuals (Dwork et al. 2012). One might incarcerate Muslims who pose no public safety risk so that the same proportions of Muslims and Christians are released on parole."

The response to this objection is that it is perfectly possible that ignoring the status quo has undesirable moral consequences, as in the case of violent offenders. However, this does not invalidate the point that the status quo in itself does not have moral status. It just means that the status quo can impact considerations of fairness in some cases, and that we may have to weigh the moral consequences of sticking to or deviating from the status quo against other considerations of fairness. We will turn to a discussion of how this could be achieved in the next section.

So far, we have examined arguments against independence, and we have found that the case against independence is not as clear cut as some of the computer science literature suggests. In this section, we turn to the case for independence. Why is independence a good or useful fairness measure? We compare independence to other notions of group fairness to highlight its usefulness, but also its limitations. Our goal is not to recapitulate the philosophical literature that supports independence. Rather, our goal is to establish some connections between philosophical concerns and the more formal discussion in computer science.

Independence is defined as 𝑅 ⊥ 𝐴, that is, probabilistic independence of group membership and prediction. Note that in practice, it makes sense to not require strict independence, but an approximate version of independence. One justification of independence is that it controls, and potentially compensates, for historical injustice. One manifestation of historical injustice is what Kamishima et al. call negative legacy [10], viz. a distribution 𝑃(𝑌, 𝐴) that we consider to be unjust. The distribution can be unjust because it does not adequately represent the true properties of the groups involved – this would correspond to unfair sampling, in which case we may not know the true distribution – or because the distribution does represent the true properties of the groups involved, but these properties themselves did not come about in a fair way – this would correspond to unfair labeling.
Formally, negative legacy can manifest as a correlation between group membership 𝐴 and ground truth 𝑌, i.e., 𝑌 ⊥̸ 𝐴: if the groups 𝐴 should have equal access to the outcome encoded by 𝑌, there should be no correlation between group membership and outcome, i.e., we should have 𝑌 ⊥ 𝐴. Note that, as in the case of independence, we can formulate an approximate version of this requirement. Now, if we build a predictor 𝑅 with a focus on accuracy, as is usually the case, we get 𝑅 ≈ 𝑌, i.e., the predictor is approximately accurate. However, this also implies that the predictor 𝑅 does not satisfy (an approximate version of) independence. Thus, independence helps us detect this form of historical injustice, and it suggests that we modify 𝑅, such that, approximately, we obtain 𝑅 ⊥ 𝐴. This modification of 𝑅 may also influence negative legacy in the long run by moving the distribution of 𝑌 closer to the desired 𝑌 ⊥ 𝐴 over time, such that accuracy and independence align naturally. This is one argument in favor of independence. (Note that above, we have excluded the first case through the assumption that the confusion matrices are at least approximately representative of the true probabilities. We have not excluded the second case.)

To better understand the usefulness of independence as a fairness measure, let us compare it to other kinds of measures. Take, first, sufficiency and separation. The main difference between independence on the one hand, and sufficiency and separation on the other, is that independence is formulated without 𝑌. This means that while sufficiency and separation track the difference between a prediction 𝑅 and the truth given by 𝑌 – they are measures of error or deviation from the truth – independence does not track deviation from the truth. Prima facie, this may seem like a deficiency of independence. However, as was just explained, independence helps us detect unfairness in the distribution of 𝑌 exactly because it does not focus on deviations from 𝑌.
It helps us to see what may be wrong with the distribution of 𝑌 itself. This is an advantage of independence in contrast to separation and sufficiency.

Now let us compare independence to affirmative action, viz., the requirement that predictions 𝑅 have to satisfy certain thresholds or quotas. In the case of college admissions, the requirement could be that a certain percentage of admitted candidates have to be members of a racial minority; see [9] for a discussion of affirmative action in the context of college admissions. A justification for affirmative action is to compensate for historical injustice. In this respect, the justification of affirmative action is similar to the justification of independence given above.

However, there are also important differences between independence and affirmative action. One difference is that independence only requires predictions to be independent of group membership. Affirmative action, on the other hand, can be more stringent in requiring that predictions satisfy certain proportions. For example, if only 10% of college applicants belong to a minority, independence would require that the admission rate for these 10% is the same as the general admission rate, while affirmative action may require that the admission rate among the 10% is larger to allow for a given balance of admitted candidates, irrespective of application rates. This means that independence, formulated for a given set of applicants, will not correct for certain kinds of biases such as underrepresentation of groups among applicants, while affirmative action may correct for this kind of bias.

More generally, it should be stressed that while independence may highlight and help to compensate for certain kinds of historical injustice, implementing it will not correct for many other forms of injustice.
In particular, independence prescribes an intervention only on the prediction 𝑅, which can be interpreted as a compensation for a certain distribution of 𝑌, and does not prescribe an intervention on the causes of this distribution, or an intervention on the effects of this distribution.

We have now seen arguments both in favor of and against independence, and we have found that there is some validity to arguments on both sides. How should we proceed from here? How should these arguments be weighted? We will not be able to answer these questions here, but we can provide some rough guidelines in view of the above discussion.

First, we should always explicitly state the moral value of either choosing or rejecting a group fairness measure such as independence, as opposed to arguing solely on the basis of factual and descriptive properties of fairness measures. We have seen why this is important in the case of independence. We have argued that accuracy in and of itself does not have moral value. We do not deny that accuracy can be morally beneficial in certain situations or contexts; however, it is these moral benefits we care about, and they should be stated. For example, if neglecting accuracy has substantial social costs in some cases, this is what we care about, and not accuracy per se. Only once the values supporting arguments for or against independence have been made explicit can we weigh them.

Second, gerrymandering is a problem shared by all measures of group fairness. It is possible to violate individual fairness while complying with sufficiency or separation, just as it is possible while complying with independence. Now, there is already a lot of work in computer science dealing with this problem, beginning with [7], who examine under which conditions independence and individual fairness can be combined.
One of the problems of combining measures of group fairness and individual fairness will be, once more, to make the moral value of either choice explicit and assign appropriate weights to these choices.

Third, it is a mistake to think that we can either require or reject measures of group fairness independently of the case to which they are applied. Rather, the importance of different group fairness measures is context dependent. We have seen examples of this above: The cost of requiring independence in the case of classifying violent offenders is different from the cost of requiring independence in the case of college admissions. In the first case, the cost of making mistakes seems high; in the second case, the cost of making mistakes seems lower both for individuals and for society; see [4].

Fourth, the preceding two points suggest that none of the group fairness measures we discussed here are logically necessary or logically sufficient for fairness: They cannot be logically sufficient because they violate individual fairness at least in some cases, and they cannot be logically necessary because they appear to be in conflict with our intuitions about fairness in other cases. This also suggests interpreting these measures of group fairness not as absolute criteria for fairness. Rather, they can be indicative of fairness or unfairness depending on the case at hand.
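The contrast between independence and a quota-style affirmative action policy, discussed above for college admissions, can be made concrete with a small sketch. All numbers (pool size, overall admission rate, and the 30% quota) are hypothetical illustrations:

```python
# Hypothetical applicant pool: 10% minority, overall admission rate 20%.
applicants = {"minority": 100, "majority": 900}
overall_rate = 0.20

# Independence: every group is admitted at the same rate, so the admitted
# class mirrors the applicant pool (10% minority).
indep = {g: round(n * overall_rate) for g, n in applicants.items()}
print(indep)  # {'minority': 20, 'majority': 180}

# A quota-style affirmative action policy (assumed here: 30% of a fixed
# class of 200 must be minority) admits minority applicants at a higher
# rate than the overall rate, which independence alone never requires.
class_size = sum(indep.values())                   # 200
quota = {"minority": round(0.30 * class_size),
         "majority": round(0.70 * class_size)}
print(quota)                                       # {'minority': 60, 'majority': 140}
print(quota["minority"] / applicants["minority"])  # 0.6, vs. 0.2 overall
```

The sketch shows why independence, formulated over a given applicant pool, cannot correct for underrepresentation among applicants, while a quota can.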
In this paper, we have examined the discussion of independence in the computer science literature, and we have found that some arguments against independence are not convincing, in that they either equally apply to other measures of group fairness, or unduly emphasize descriptive properties of fairness measures, viz. conservativeness, as opposed to normative ones. We have also made a positive case for independence, arguing that it can highlight a distinct kind of unfairness not captured by sufficiency or separation. The main upshot of the present paper is that independence is an important measure of group fairness that has to be taken into account in discussions of algorithmic fairness.
A PROOFS
Here we give proofs of the propositions in the main text. All propositions and proofs can be found in the literature [2, 6, 16] and are collected here for convenience's sake, except for the proof of proposition 8, which is new. We first state some useful properties of conditional independence (see the above references for proofs):
Proposition 16.
Properties of conditional independence:
(1) If 𝑋 ⊥ 𝑌 | 𝑍, then 𝑌 ⊥ 𝑋 | 𝑍;
(2) if 𝑋 ⊥ 𝑌 | 𝑍 and 𝑈 = ℎ(𝑋), then (i) 𝑈 ⊥ 𝑌 | 𝑍 and (ii) 𝑋 ⊥ 𝑌 | (𝑍, 𝑈);
(3) if 𝑌 = ℎ(𝑍), then 𝑋 ⊥ 𝑌 | 𝑍;
(4) 𝑋 ⊥ 𝑌 | 𝑍 and 𝑋 ⊥ 𝑊 | (𝑌, 𝑍) iff. 𝑋 ⊥ (𝑊, 𝑌) | 𝑍;
(5) if 𝑋 ⊥ 𝑌 | 𝑍, 𝑋 ⊥ 𝑍 | 𝑌, and (𝑋, 𝑌, 𝑍) is positively distributed everywhere, then 𝑋 ⊥ (𝑌, 𝑍).
Note that properties 1, 2, 3 also hold without conditioning on 𝑍.

Proof of proposition 4:
A perfect predictor means that 𝑌 = 𝑅. Sufficiency means 𝐴 ⊥ 𝑌 | 𝑅 and separation means 𝐴 ⊥ 𝑅 | 𝑌. For a perfect predictor, these reduce to 𝐴 ⊥ 𝑌 | 𝑌. By property 3 of conditional independence, this is true for a perfect predictor because 𝑌 = 𝑓(𝑌). □

Proof of proposition 6:
The direction (1) ⇒ (2) is property 5 of conditional independence. The direction (2) ⇒ (1) can be seen as follows: view (𝑌, 𝑅) as a two-dimensional random variable, note that 𝑌 and 𝑅 are functions of this random variable (projection), then the result follows from property 2 of conditional independence (without conditioning on 𝑍). □

Proof of Proposition 8 (Incremental Conservativeness):
We show that sufficiency and separation are not, in general, preserved if the accuracy of a predictor is increased, by giving an example where accuracy increases but separation and sufficiency are lost. First, consider the two following confusion matrices (recall that 𝑌 stands for the true label, while 𝑅 stands for the prediction):

        Y+    Y–    total
R+      10     2     12
R–       3    11     14
total   13    13     26

Table 3: Group A=p

        Y+    Y–    total
R+      20     4     24
R–       6    22     28
total   26    26     52

Table 4: Group A=q
These matrices satisfy sufficiency and separation; the easiest way to see this is to check that the table for 𝑞 is a multiple of the table for 𝑝, so the relative frequencies are the same, which implies that sufficiency and separation are satisfied by proposition 6. It can also be checked by hand, by using the relation between the statistics of confusion matrices on the one hand and fairness on the other, explained in section 2.3. Now we increase the accuracy of the predictor 𝑅, by taking, in each group, an element of the false negatives and shifting it to the true positives. This yields a new predictor 𝑅′ with the following confusion matrices:

        Y+    Y–    total
R'+     11     2     13
R'–      2    11     13
total   13    13     26

Table 5: Group A=p

        Y+    Y–    total
R'+     21     4     25
R'–      5    22     27
total   26    26     52

Table 6: Group A=q
Note that the predictor is more accurate in both groups. Now we check whether these tables satisfy sufficiency and separation. For sufficiency, we would need that the positive predictive values (PPV) agree, but we have:

𝑎/(𝑎 + 𝑏) = 11/13 ≠ 21/25 = 𝑎′/(𝑎′ + 𝑏′)   (1)

For separation, we would need that the false negative rates (FNR) agree, but we have:

𝑐/(𝑎 + 𝑐) = 2/13 ≠ 5/26 = 𝑐′/(𝑎′ + 𝑐′)   (2)

Thus, we have increased accuracy and lost both separation and sufficiency. This shows that separation and sufficiency are not incrementally conservative fairness measures. □

Note that if we had increased accuracy in proportion to group size, i.e., if we had shifted two elements instead of one from false negatives to true positives in group 𝑞, we would have preserved sufficiency and separation. The reason for this is that this increment would have preserved the proportions of the confusion matrices between the two groups. However, this is a very special kind of increment. The case we have discussed above, with increments not proportional to the size of the groups, is easier to realize and presumably more common.

Proof of proposition 10:
Sufficiency for groups 𝐴 = 𝑝, 𝑞 means, in the case 𝑅 = + and 𝑌 = +:

𝑃(𝑌 = + | 𝐴 = 𝑝, 𝑅 = +) = 𝑃(𝑌 = + | 𝐴 = 𝑞, 𝑅 = +) ⇔ 𝑎/(𝑎 + 𝑏) = 𝑎′/(𝑎′ + 𝑏′),

where the choice of 𝑌 = − yields an equivalent condition; the same reasoning holds for 𝑅 = −. □

Proof of proposition 11:
Similar to the proof of proposition 10. □
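The counterexample in the proof of proposition 8 can also be checked numerically. The sketch below recomputes accuracy, PPV, and FNR from the confusion matrices of Tables 3–6, with the false negative and true negative counts following from the stated column totals; exact fractions avoid floating-point rounding:

```python
from fractions import Fraction as F

def stats(tp, fp, fn, tn):
    """Accuracy, positive predictive value, and false negative rate."""
    total = tp + fp + fn + tn
    return (F(tp + tn, total),   # accuracy
            F(tp, tp + fp),      # PPV (tracked by sufficiency)
            F(fn, tp + fn))      # FNR (tracked by separation)

# Predictor R (Tables 3 and 4): group q is exactly twice group p.
p_before, q_before = stats(10, 2, 3, 11), stats(20, 4, 6, 22)
# Predictor R': one false negative shifted to the true positives
# in each group (Tables 5 and 6).
p_after, q_after = stats(11, 2, 2, 11), stats(21, 4, 5, 22)

# Before: sufficiency and separation hold (equal PPV and FNR).
assert p_before[1:] == q_before[1:]
# Accuracy increases in both groups ...
assert p_after[0] > p_before[0] and q_after[0] > q_before[0]
# ... but both criteria are lost: 11/13 != 21/25 and 2/13 != 5/26.
assert p_after[1] != q_after[1] and p_after[2] != q_after[2]
```

Shifting two elements in group 𝑞 instead of one (i.e., `stats(22, 4, 4, 22)`) restores the proportionality and makes all assertions of equality hold again, matching the remark after the proof.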
ACKNOWLEDGMENTS
I thank Michele Loi, Corinna Herweck, and members of the philosophy of science research colloquium in the Fall of 2020 at the University of Bern for helpful comments on an earlier draft of the paper. This work is supported by the National Research Programme "Digital Transformation" (NRP 77) of the Swiss National Science Foundation (SNSF) under Grant No. 187473.
REFERENCES
[1] Angwin, J., J. Larson, S. Mattu, and L. Kirchner. 2016. Machine Bias: There's software used across the country to predict future criminals. And it's biased against blacks. ProPublica.
[2] Barocas, S., M. Hardt, and A. Narayanan. 2019. Fairness and Machine Learning. fairmlbook.org.
[3] Berk, R., H. Heidari, S. Jabbari, M. Kearns, and A. Roth. 2018. Fairness in Criminal Justice Risk Assessments: The State of the Art. Sociological Methods & Research.
[4] Chouldechova, A. 2017. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. ArXiv:1703.00056v1.
[5] Corbett-Davies, S., E. Pierson, A. Feller, S. Goel, and A. Huq. 2017. Algorithmic Decision Making and the Cost of Fairness. KDD '17: 797–806.
[6] Dawid, A. P. 1979. Conditional Independence in Statistical Theory. Journal of the Royal Statistical Society. Series B (Methodological).
[7] Dwork, C., M. Hardt, T. Pitassi, O. Reingold, and R. Zemel. 2012. Fairness through Awareness. ITCS '12.
[8] Hardt, M., E. Price, and N. Srebro. 2016. Equality of Opportunity in Supervised Learning. Advances in Neural Information Processing Systems 29.
[9] Designing Affirmative Action Policies under Uncertainty. Master's thesis, University of Helsinki.
[10] Kamishima, T., S. Akaho, and J. Sakuma. 2011. Fairness-aware Learning through Regularization Approach. 2011 IEEE 11th International Conference on Data Mining Workshops.
[11] Kearns, M., S. Neel, A. Roth, and Z. S. Wu. 2018. Preventing Fairness Gerrymandering: Auditing and Learning for Subgroup Fairness. PMLR 80: 2564–2572.
[12] Kleinberg, J. M., S. Mullainathan, and M. Raghavan. 2016. Inherent Trade-Offs in the Fair Determination of Risk Scores. CoRR abs/1609.05807.
[13] Loi, M., A. Herlitz, and H. Heidari. 2019. A Philosophical Theory of Fairness for Prediction-Based Decisions. http://dx.doi.org/10.2139/ssrn.3450300.
[14] Miller, D. 2017. Justice. The Stanford Encyclopedia of Philosophy.
[15] Väyrynen, P. 2019. Thick Ethical Concepts. The Stanford Encyclopedia of Philosophy.
[16] Wasserman, L. 2004. All of Statistics. Springer Texts in Statistics. New York: Springer.
[17] Zemel, R., Y. Wu, K. Swersky, T. Pitassi, and C. Dwork. 2013. Learning fair representations. ICML '13, PMLR 28.