The Use and Misuse of Counterfactuals in Ethical Machine Learning
Atoosa Kasirzadeh
University of Toronto; Australian National University
[email protected]
Andrew Smart
ABSTRACT
The use of counterfactuals for considerations of algorithmic fairness and explainability is gaining prominence within the machine learning community and industry. This paper argues for more caution with the use of counterfactuals when the facts to be considered are social categories such as race or gender. We review a broad body of papers from philosophy and the social sciences on social ontology and the semantics of counterfactuals, and we conclude that the counterfactual approach in machine learning fairness and social explainability can require an incoherent theory of what social categories are. Our findings suggest that, most often, social categories may not admit counterfactual manipulation, and hence may not appropriately satisfy the demands for evaluating the truth or falsity of counterfactuals. This is important because the widespread use of counterfactuals in machine learning can lead to misleading results when applied in high-stakes domains. Accordingly, we argue that even though counterfactuals play an essential part in some causal inferences, their use for questions of algorithmic fairness and social explanations can create more problems than they resolve. Our positive result is a set of tenets about using counterfactuals for fairness and explanations in machine learning.
CCS CONCEPTS
• Computing methodologies → Philosophical/theoretical foundations of artificial intelligence; Machine learning; • Social and professional topics → Socio-technical systems; Race and ethnicity.

KEYWORDS
Ethics of AI, Ethical AI, Counterfactuals, Machine learning, Fairness, Algorithmic Fairness, Explanation, Explainable AI, Philosophy, Social ontology, Social category, Social kind, Philosophy of AI
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

FAccT ’21, March 3–10, 2021, Virtual Event, Canada
ACM Reference Format:
Atoosa Kasirzadeh and Andrew Smart. 2021. The Use and Misuse of Counterfactuals in Ethical Machine Learning. In
ACM Conference on Fairness, Accountability, and Transparency (FAccT ’21), March 3–10, 2021, Virtual Event, Canada.
ACM, New York, NY, USA, 9 pages.

1 INTRODUCTION
The use of counterfactuals has become increasingly popular in the machine learning community for many reasons, such as making sense of algorithmic fairness or explainability in automated decision-making for consequential social contexts [4, 9, 14, 18, 29, 35, 42, 47, 51]. As a result, machine learning algorithms coupled with counterfactuals could be used for making high-stakes decisions with ethical and legal impacts in domains such as insurance, predictive policing, and hiring. Despite this widespread attention and use, there is a surprising lack of engagement with the long-standing philosophical and social-scientific literature on the ontological and semantic conditions required for an appropriate application of counterfactuals.

What is a counterfactual? Let X and Y represent events or facts in the chain of occurrences “X and Y”, where X precedes Y in time. A counterfactual analysis can help determine whether X is a cause of Y by supposing the non-occurrence of X and examining the effect of this supposition on Y. This corresponds to evaluating whether the counterfactual ‘If X had not occurred, Y would not have occurred’ is true. In machine learning practice, there are several technical ways to generate and evaluate counterfactuals, such as feature-based explanations, prototype explanations, example-based explanations, or causal explanations [19, 32, 35, 38, 46, 50, 51]. These approaches are most often rooted, implicitly or explicitly, in one of two prominent conceptual approaches for evaluating counterfactuals: the close-enough-possible-worlds approach inspired by Lewis [36] and Stalnaker [49], and the causal modeling approach developed by Spirtes et al. [48] and Pearl [44], among others.
To evaluate a counterfactual, the close-enough-possible-worlds approach compares the actual world in which X and Y occur with those worlds similar enough to the actual world in which X does not occur (e.g., comparing a data instance to a similar data instance or to a prototype, when generating example-based or prototype explanations respectively, requires comparison with respect to some notion of sufficient similarity). If in those worlds Y does not occur, the counterfactual is considered true and X is deemed the cause of Y; otherwise, the counterfactual is deemed false. [Footnote: Strictly speaking, [36, 49] develop the closest-possible-worlds approach to make sense of counterfactuals. With a bit of weakening, the (set of) closest possible world(s) can be interpreted as the (set of) close-enough possible world(s), where, due to practical considerations, possible worlds that are close enough to the actual world (rather than the closest possible worlds) are selected. For a recent alternative that evaluates conditionals relative to a causal model, see [1].] The close-enough-possible-worlds account has mainly been used in discussions of counterfactual explanations in machine learning, and the causal modeling approach has been widely applied to examining fairness counterfactually. Although these two semantic accounts are very different, the following abstract recipe for evaluating counterfactuals is common to both. First, determine the facts to be kept fixed under counterfactual variation. Second, vary the antecedent. Third, determine the influence of the variation on the consequent.

In this paper, we explore the ontological and epistemological-semantic conditions required for using either of the two conceptual approaches for an appropriate application of counterfactuals to ethical machine learning, in particular to algorithmic fairness and social explanations. We argue that in some cases, the lack of a proper grounding of the elements of a counterfactual in the social world can lead to their misuse in machine learning applications. We review a broad body of papers from philosophy and the social sciences on the ontology of social categories and conclude that the counterfactual approach in machine learning fairness and social explainability might require an incoherent theory of what some social categories, such as race, are.
Our findings suggest that, despite its appeal for convenient analysis of fairness and social explanations, social categories most often may not admit an apt counterfactual intervention, and hence may not appropriately satisfy the assumptions required for evaluating the truth or falsity of counterfactuals. Accordingly, we argue that even though counterfactuals play an essential part in some causal inferences, their use in discussions of algorithmic fairness and social explanations can create more problems than they resolve.

Related work and novelty.
Before we go further, we would like to contrast our paper in more detail with related work to highlight its novelty. There are four closely related works on this topic which explicitly or implicitly critique counterfactual theories of social causation in decision-making contexts. Kohler-Hausmann [34] argues that the counterfactual causal model is wrong for detecting discrimination in both law and social science. Building on this idea, Hu and Kohler-Hausmann [30] argue that perhaps we need to use a formal model other than causal models (such as constitutive diagrams) for detecting discrimination. Hanna et al. [23] use critical race theory to argue that the multi-dimensionality of race should be taken into account whenever this phenomenon becomes relevant to the machine learning community, and challenge practitioners to ask explicitly who is doing the categorizing and for what purpose. Barocas et al. [4] discuss the mapping of explanatory features to actions in the world when using feature-highlighting explanations. We share the perspective of these authors. However, the novelty of our contribution is threefold. (1) We provide a conceptual analysis of the vagueness of the notion of ‘similarity’, rooted in the close-enough-possible-worlds approach. This approach is the conceptual basis of the feature-based, prototype, and example-based analytic methods used by the machine learning community for examining counterfactuals. The notion of similarity is used in almost all conceptions of counterfactual explanations or fairness as referenced. To the best of our knowledge, the philosophical-conceptual basis [36, 37, 49] and the assumptions required to assess the ‘similarity’ of counterfactual worlds or scenarios are not properly examined in the machine learning literature, yet ‘similarity’ is used, implicitly or explicitly, for making sense of counterfactual explanations or fairness.
(2) We go beyond the mere criticism of causal modeling as applied to the social domain, and consider counterfactuals more generally by examining both the close-enough-possible-worlds account and causal modeling. We think that a critique of manipulating social categories alone is not sufficient because in disciplines such as medicine and public health, the use of protected attributes such as race or gender is considered an ethically acceptable component of research (e.g., prostate cancer screening [5, 21]). (3) We provide positive results in the form of a set of detailed tenets, summarized in Table 1, showing that any attempt at a counterfactually fair or explainable algorithm (in a social context) involves making several choices and value judgments. To that end, the implicit presumptions, choices, and value judgments must be made as explicit and obvious as possible by using Table 1. No related work does (1)–(3).

The rest of the paper is structured as follows. In Section 2, we examine the two prominent approaches to modeling and evaluating counterfactuals, the close-enough-possible-worlds and the causal modeling approaches, in more detail. In Section 3, we discuss the use of counterfactuals for analyzing fairness and social explanations in machine learning practice, before raising the ontological and epistemological-semantic problems of this use in Section 4. In Section 5, we suggest a set of tenets about the use of counterfactuals in machine learning. Section 6 concludes the paper.

Consider the following counterfactuals: (1) If Suzy had not thrown the rock, the window would not have shattered. (2) If Nora had not been Latina, she would not have been denied admission. Are these counterfactuals true or false? Does ‘Suzy’s throwing the rock’ cause ‘the shattering of the window’?
Does ‘Nora’s being Latina’ cause ‘the denial of admission’? There are two prominent approaches to evaluating counterfactuals: the close-enough-possible-worlds approach, mainly used in discussions of social counterfactual explanations [51], and the causal modeling approach, which is at the center of discussions about counterfactual fairness [35]. [Footnote: We are not promoting this use. We merely report that in medicine, economics, public health, and other related disciplines, the use of protected classes such as race or gender is sometimes the basis for the development or allocation of some resources.] We present these two semantic approaches independently, though we must mention that, theoretically speaking, the relationship between the two is not straightforward [7]. For lack of space, we cannot go into the differential details in this paper, but we translate this lack of a straightforward connection between the two semantic approaches into our set of principles for using counterfactuals in machine learning research.

According to the closest-possible-worlds view [36, 49], a counterfactual can be treated syntactically and semantically via a variant of a modal logic for counterfactuals. The evaluation of the counterfactual X □→ Y (if X had occurred, Y would have occurred) requires the specification of a set of possible worlds in which X occurs. If in these possible worlds Y also occurs, the counterfactual X □→ Y is true. These possible worlds must be ordered in terms of comparative similarity or closeness to the actual world (in which X occurs and Y occurs). For instance, if in all the worlds which are close enough to the actual world except that Suzy does not throw the rock, the window does not shatter, then Suzy’s throw is the cause of the shattering of the window.
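The Suzy example, together with the three-step recipe above, can be mechanized in a toy sketch. Everything below is illustrative rather than from the paper: worlds are dictionaries of facts, and the two "distance" functions are hypothetical similarity orderings (the second also penalizes "miracles", in the spirit of Lewis's guidance against law violations). The point of the sketch is that the two orderings return opposite verdicts on the same counterfactual.

```python
# Toy sketch (not the paper's method): a close-enough-possible-worlds
# evaluation of "if Suzy had not thrown the rock, the window would not
# have shattered". Worlds are dicts of facts; distances are hypothetical.

def counterfactual_holds(actual, worlds, antecedent, consequent, distance):
    """True iff the consequent holds in all closest worlds satisfying the antecedent."""
    candidates = [w for w in worlds if antecedent(w)]
    if not candidates:
        return None  # vacuous: no candidate world satisfies the antecedent
    d_min = min(distance(actual, w) for w in candidates)
    closest = [w for w in candidates if distance(actual, w) == d_min]
    return all(consequent(w) for w in closest)

actual = {"suzy_throws": True, "window_shatters": True}
worlds = [
    {"suzy_throws": False, "window_shatters": False},
    {"suzy_throws": False, "window_shatters": True},  # shattered some other way
]

antecedent = lambda w: not w["suzy_throws"]      # "Suzy had not thrown"
consequent = lambda w: not w["window_shatters"]  # "...the window would not have shattered"

# Ordering 1: count differing facts.
hamming = lambda a, w: sum(a[k] != w[k] for k in a)
# Ordering 2: additionally penalize a "miracle" (shattering with no throw).
lawful = lambda a, w: hamming(a, w) + 5 * (w["window_shatters"] and not w["suzy_throws"])

print(counterfactual_holds(actual, worlds, antecedent, consequent, hamming))  # False
print(counterfactual_holds(actual, worlds, antecedent, consequent, lawful))   # True
```

Under the plain fact-counting distance the closest antecedent world is the one where the window shatters anyway, so the counterfactual comes out false; under the miracle-penalizing distance it comes out true. This is exactly the sensitivity to the similarity ordering discussed in Section 4.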
If, in all the close-enough possible worlds to the actual world in which Nora is not Latina, she is not denied admission, then Nora’s being Latina is the cause of her rejection. The close-enough-possible-worlds approach to the evaluation of counterfactuals requires an ordering of the possible worlds in terms of similarity to the actual world. In Section 4, we discuss that the notion of similarity is inherently vague and that the similarity ordering can be done in many different ways. As a result, depending on the choices of similarity criteria and ordering, we can obtain contradictory judgments about the truth or falsity of counterfactuals. Hence, the vagueness and the multiplicity of orderings pertain to the problems of using counterfactuals in machine learning.

A causal modeling approach uses a causal model as a representational tool for exploring the space of alternative causal hypotheses. Following Pearl [44], from a causal modeling perspective, the world is described in terms of random variables and their values. The random variables are either exogenous or endogenous, and they may take continuous or categorical values. The exogenous variables (U) are determined by factors outside of the causal model, and serve as fixed background assumptions for the causal reasoning. The endogenous variables (V) may have a causal influence on each other. This influence is modeled by a set of structural equations F: functions capturing the potential causal effects of functional dependencies on the endogenous variables. A set of exogenous and endogenous variables, their values, and a set of structural equations form a causal model M = (U, V, F). M can be graphically visualized by a directed acyclic graph. This graph facilitates cognitive efforts in thinking about potential causal sources, effects, and causal relations.
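A structural model M = (U, V, F) of this kind can be sketched in a few lines. The sketch below is illustrative only (the variables and equations are hypothetical, not from the paper): it shows how, with the exogenous values U held fixed, replacing one structural equation with a constant (an intervention do(X = x)) generates a counterfactual world.

```python
# Minimal illustrative sketch of a causal model M = (U, V, F): exogenous
# values U, endogenous variables V, structural equations F. An intervention
# do(X = x) replaces X's equation by a constant while U stays fixed.

def solve(U, F, do=None):
    """Evaluate the structural equations (assumed listed in topological
    order), overriding any intervened-upon variables."""
    V = {}
    for name, eq in F.items():
        V[name] = do[name] if do and name in do else eq(U, V)
    return V

# Hypothetical two-variable model with graph X -> Y.
U = {"u_x": 1, "u_y": 0}
F = {
    "X": lambda U, V: U["u_x"],
    "Y": lambda U, V: V["X"] + U["u_y"],
}

print(solve(U, F))               # actual world
print(solve(U, F, do={"X": 0}))  # counterfactual world under do(X = 0), same U
```

Note how the abstract recipe reappears: U is the set of facts kept fixed, the intervention varies the antecedent, and re-solving the equations determines the influence on the consequent.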
In such a graph, a node represents a random variable and an edge between a pair of nodes represents a direct causal relation between the corresponding random variables; for instance, X being a direct cause (parent) of Y is represented by X → Y. Nodes with no incoming edge are said to be exogenous. [Footnote: Kilbertus et al. [33] use causal models to analyze fairness. We focus our discussion on Kusner et al. [35], but our criticism also applies to their work.]

Finding causal relations via a causal model requires establishing well-defined connections between some aspects of the sample data and a causal model [44, 48]. The main connections are often captured by two causal assumptions, the causal Markov condition and faithfulness. The causal Markov condition ensures that a variable is independent of its non-descendants given its parents. The causal faithfulness condition requires that all inter-dependencies in the observational data are non-accidental and structural, i.e., the result of the structure of the causal graph. Counterfactual thinking via a causal modeling approach in a specific machine learning domain requires an in-depth interpretation of the mapping of the random variables onto the elements of the domain and the satisfaction of the causal assumptions. If counterfactual thinking occurs at the level of the social world, we require an apt interpretation of the mapping of the random variables onto social categories, the relationships between them, and the meaning of the causal assumptions applied to the relevant categories. So far, we have discussed the two most prominent semantic approaches to the evaluation of counterfactuals. In the next section, we give two examples of the use of counterfactuals in machine learning: in understanding fairness (via causal modeling) [35] and in understanding social explanations (via the closest possible worlds) [51].
Discussions about the treatment of fairness in machine learning systems have primarily taken place at the group or the individual level. To achieve group fairness, a (statistical) measure must compare a predictor’s behavior across different protected demographic groups, seeking approximate parity of some desirable statistical measure across the groups [8, 25]. On the other hand, a measure of individual fairness must compare a predictor’s behavior across similar individuals [11, 31]. To date, the most popular proposal for making sense of individual fairness has been the use of causal modeling to interpret individual fairness in a counterfactual way [35]. Kusner et al. [35] define a fair predictor to be one that gives the same prediction had the individual been different, for example, had the individual been of another race or gender. This demands an implicit assumption that other features and properties (except for the tweaked category in the causal model) remain the same for that individual. More precisely, Kusner et al. (2017) give the following definition: counterfactual fairness “captures the intuition that a decision is fair towards an individual if it is the same in (a) the actual world and (b) a counterfactual world where the individual belonged to a different demographic group.”

Consider a prediction-based problem characterized in terms of A (a set of protected attributes), X (a set of non-protected attributes), and Y (the prediction output).

Figure 1: A causal model for a fair predictor adapted from [35]. [The graph contains nodes for Race, Sex, Knowledge, GPA, LSAT, and Grades.]

To put this problem into a causal modeling schema requires fixing U, the set of exogenous variables. Following [35], the definition of counterfactual fairness for the predictor Ŷ stipulates the satisfaction of the following condition for X = x and A = a, for all y, and any value a′ attainable by A:

P(Ŷ_{A←a}(U) = y | X = x, A = a) = P(Ŷ_{A←a′}(U) = y | X = x, A = a)

To make matters more concrete, we focus on an example of a machine learning system, as discussed by [35], employing a predictor Ŷ to decide who should be admitted to law school based on its prediction of a potential student’s first-year grades (Figure 1). The algorithm makes the prediction according to knowledge about the following attributes of individuals: gender, race, GPA, and the law school entrance exam (LSAT). According to [35], the set of sensitive attributes is A = {sex, race}, and the non-sensitive ones are X = {GPA, law school entrance exam}. Moreover, a set of causal links is posited between the attributes and the prediction of a potential student’s first-year grades. To make this classifier fair, the following question should be answered: what would the predictor have predicted if the individual had had a different race (a different sensitive attribute)? This use of counterfactuals requires assuming a single change (race) or a limited set of changes (such as sex and race) to an individual, and then evaluating the probabilistic condition above given the supposition that everything else remains the same for that individual. Although the proposal might sound simple, in the next section we discuss the problems with it, such as requiring commitment to a peculiar conception of race, as well as controversial views about the integrity of what an individual (or the perception of an individual) is, for the purpose of satisfying the convenient requirements of counterfactual modeling and evaluation.
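The counterfactual-fairness condition can be illustrated with a fully deterministic toy version of the law school example. The structural equations below are hypothetical stand-ins (the actual definition in [35] averages over the posterior P(U | X = x, A = a) rather than fixing a single exogenous value), but the check itself follows the definition: hold U fixed, intervene on the protected attribute, and compare predictions.

```python
# Illustrative, deterministic sketch of the counterfactual-fairness check.
# The equations are hypothetical stand-ins for the law school example; the
# real definition of [35] averages over the posterior P(U | X = x, A = a).

def predict(race, u_know):
    # Hypothetical structural equations: GPA depends on latent "knowledge"
    # and (problematically) on race; LSAT depends on knowledge only.
    gpa = u_know + (0.5 if race == "race_a" else 0.0)
    lsat = u_know
    return 0.5 * gpa + 0.5 * lsat  # predicted first-year grade

def counterfactually_fair(u_know, a, a_prime):
    # Hold the exogenous background U fixed, intervene on the protected
    # attribute A, and compare predictions across the two worlds.
    return predict(a, u_know) == predict(a_prime, u_know)

# Because GPA carries a race-dependent term, the prediction changes under
# the intervention, so this predictor violates counterfactual fairness.
print(counterfactually_fair(u_know=3.0, a="race_a", a_prime="race_b"))  # False
```

A predictor built only on the latent knowledge term (here, LSAT alone) would pass this check; the point of the sketch is that the check presupposes the intervention on the race variable is well defined, which is precisely what Section 4 contests.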
Alternatively, to understand what counterfactual fairness is, we first need to make choices about which counterfactual worlds to consider and the basis on which the closeness of counterfactual worlds (including the knowledge of what a counterfactually different version of the target individual would be) to the actual world is specified.

Counterfactual explanations are claimed to be among the most popular types of explanations for opaque algorithmic decisions [51]. For instance, let us assume Nora has applied for a mortgage and her application is denied by an algorithmic system. A counterfactual explanation for this denial can be: If Nora’s annual income had been $60,000, she would have received the loan. As a matter of fact, Nora is denied the loan and her annual income is $40,000. Or consider the following counterfactual explanation: If Nora had not been Latina, she would not have been denied the loan. As a matter of fact, Nora is denied the loan and she is Latina. This is an instance that requires making a putatively plausible assumption about a different version of Nora (with only a different race, everything else equal to the original version of Nora), and then trusting the validity of this explanation.

In the next section, we offer two main arguments challenging a counterfactual approach to algorithmic fairness and social explanations when the things that require counterfactual supposition are social categories, such as gender, sexual orientation, or race, in terms of which (the uniqueness of) a person is characterized.

In this section, we specify two sets of problems, ontological and epistemological-semantic, that one faces upon attempting to construct a fair or explainable classifier which incorporates the counterfactual supposition of social categories such as race and gender, as sketched in Section 3. The problems arise in the attempts to answer the following questions: what are the objects of manipulation?
Which counterfactual worlds are similar enough to the actual world, and whose causal model’s perspective should we care about?
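As a concrete point of contrast for these questions, the first, income-based counterfactual explanation for Nora's loan discussed above is easy to mechanize, precisely because income (unlike race) plausibly admits manipulation. The sketch below is hypothetical throughout: the threshold rule stands in for an opaque model, and the $5,000 search grid is an arbitrary choice.

```python
# Sketch of a feature-based counterfactual explanation: search, on a
# hypothetical income grid, for the smallest raise that flips a denial
# into an approval. The threshold rule stands in for an opaque model.

def approve(income):
    return income >= 60_000  # hypothetical black-box decision rule

def counterfactual_income(actual_income, step=5_000, limit=200_000):
    """Smallest income on the grid at which the decision flips, else None."""
    income = actual_income
    while income <= limit:
        if approve(income):
            return income
        income += step
    return None

# Reads as: "If Nora's annual income had been $60,000, she would have
# received the loan" (her actual income being $40,000).
print(counterfactual_income(40_000))  # 60000
```

No analogous search is available for the second explanation ("If Nora had not been Latina..."): there is no agreed-upon grid of "nearby" racial categories to step through, which is the ontological problem taken up next.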
There has been a long-standing debate among several disciplines, such as philosophy, sociology, law, and epidemiology, about the causal effects of social categories such as race and gender [12, 16, 34, 41]. To counterfactually suppose a social attribute of an individual requires first specifying what the social categories are and what it means to suppose a different version of an individual with the counterfactually manipulated social property. This counterfactual question amounts to asking, “what if person X had not been ‘race Y’ or ‘gender Z’?”

There are several competing contemporary schools of thought about what social categories are, and our review here is merely representative of some and by no means exhaustive. In the rest of this section, we take ‘race’ as a prototypical instance of the social categories of interest to counterfactual manipulation. With some modifications, similar arguments can be made about other social categories, such as gender.

Roughly, we can distinguish between three major positions about what race is [39, 40]. Geo-biological essentialism about race largely signifies dividing humans into a sufficiently small, discrete number of categories, usually for the purposes of colonial conquest, enslavement, or domination of one group over another [10]. The categorization has been based on some kind of biological foundation (e.g., modern genes) essential to humans, and inherited from one generation to another. This conception of race identifies some geo-biological features (such as skin color, hair texture, and eye form) that are common only to the members of a racial group, usually from a specific geographical region.
The geo-biological conception of race has been questioned extensively, and has been critically challenged by scientific and philosophical arguments ranging from denying that the concept of race has any biological foundations to denying the very existence of races. Some have also argued that this biologically essentialist view about race cannot be separated from the political project of racial oppression, domination, and disenfranchisement [10, 23]. In addition to the geo-biological ancestry conception of race, there are two other major views about the ontology of race.

On the one hand, racial skeptics argue for the falsity of naturalism about race and conclude that no type of race exists [2, 3, 20, 52–54]. They claim that the natural candidates for the bases of race, such as geography, phenotypes, and genealogy, fail according to scientific findings. The normative implication of this ontological view is to entirely disregard the existence of race. On the other hand, racial constructivists dismiss the conception of biological race, but argue that the concept of race must be preserved for the purposes of social movements and affirmative action to abolish social and structural injustice. How so? One of the most influential proponents of racial constructivism, Haslanger [15, 26, 27], suggests a group-based understanding of race, marked by ancestry and appearance and by hierarchical relations of power, for the purposes of fighting against social injustice. This conception regards ‘race’ as a justifiable entity for the purpose of resisting and combating racism.
Beyond that, racial identification by the dominant group constrains the autonomy of individuals by requiring them to be what a specific racial group signifies from the point of view of whoever has defined it. Social constructivism hence maintains that a social category – be it racial, gender, or class – was brought into existence or shaped by historical events, social forces, political power, or colonial conquest, all of which could have been very different [6, 12, 20]. Being a social constructivist about race and gender means that one does not subscribe to the view that race and gender are natural or biological categories with permanent or immutable properties. In other words, for such a constructivist, the term ‘race’ cannot refer to an essentially biological attribute such as skin tone, a genetically produced trait, or a signifier that people simply have and thereby obviously belong to a designated racial group [34].

Kusner et al. [35] claim that it is counterproductive to assume that social categories such as race cannot be causes, because we can design experiments on such categories by intervening on a particular aspect of the attribute ‘race’, such as ‘race perception’. We disagree. We think this claim only serves to justify the convenient assumptions required for causal modeling (i.e., that this conception of race is amenable to counterfactual manipulation). As we have shown above, there is no universally agreed-upon perception of race. To be able to talk about the causal effect of social categories, we first need to specify what these categories are. For instance, we might be justified in first having a robust social ontology informed by critical theory [28]. Only after this exploration are we able to discuss what our perception of race is.
As we have seen, there is a plurality of responses to this question, and our response depends on the perspective we adopt on this matter.

Recall that an algorithm that subscribes to counterfactual fairness requires evaluating the actual non-occurrence of X against the supposition that X did occur. For example, we should replace the actual person (or our perception thereof) who has a protected attribute, such as being Latina, with a counterfactual version of the same person who has a different protected attribute, such as being white, to test whether the algorithm makes the same prediction about the actual person (or our perception thereof) and the counterfactual person. What view about race (or the perception of race) does it require to suppose the racial category non-Latina for the counterfactual version of person i, knowing that Latina is the real feature of person i? Counterfactual fairness (or counterfactual social explanation) requires us to force a random variable to take a certain value. Are the counterfactual suppositions required for designing a fair algorithm compatible with the views about race specified above?

Racial skepticism is ruled out as an alternative among the commitments held by the proponents of counterfactual fairness or counterfactual social explanations, due to its denial of the very existence of such categories. Social constructivism makes sense of race for the purposes of fighting against social injustice. Hence, the constructivist ontology of race has, in addition, a purpose-relative reality that the algorithm must reflect in its reasoning, and it is arguably not subject to counterfactual variation separate from the scope of the fight against social injustice. Perhaps the only viable theory of race that remains for counterfactual fairness requires commitment to a reductionist view about social categories such as race or gender as biological attributes.
Several scholars have argued that this commitment is deeply problematic (see, for instance, [34]). We share this perspective for several decision contexts. This purely reductionist understanding of social categories as essential and physical attributes, in addition to being scientifically outdated, fails the task of robust objectivity, and might indirectly widen and exaggerate the problematic associations between the sensitive attributes that are the result of social and structural injustice in the first place.

Is there an objective view from nowhere from which to assess the validity of counterfactuals? In this section, we raise some epistemological-semantic problems for comparing and selecting the set of counterfactual possible worlds that are close enough to the actual world.

First, we focus on the problem of inherent vagueness associated with similarity between possible worlds. Counterfactual scenarios in counterfactual worlds stand in contrast to actual scenarios in actual worlds. Evaluating a counterfactual requires a comparison between an actual world and a set of counterfactual worlds sufficiently similar to the actual world. The counterfactual X □→ Y is true just in case it takes less of a departure from the actual world to make X true along with Y than to make X true without Y. But which counterfactual worlds? The number of counterfactual worlds is myriad (perhaps even uncountably infinite). Lewis and Stalnaker [36, 49] emphasize that the counterfactual worlds of interest to the actual world are the ones that are most similar to the actual world. In some cases of comparing natural features between worlds, it is possible to arrive at a consensus on the ordering of similar worlds. However, in many cases the vagueness of this notion is problematic and counterintuitive for the evaluation of counterfactuals [17].
Lewis [37] provides some guidance for ordering possible worlds: (1) avoid big, widespread violations of the laws of nature of the actual world; (2) maximize the spatiotemporal perfect match of particular matters of fact; (3) avoid small, localized violations of the laws of nature of the actual world; and (4) secure approximate similarity of particular matters of fact. But how do we translate these considerations to the social domain? Further research is required to understand how to avoid big, widespread violations of commitments to our ontological views about social categories in the possible-worlds framework.

The ordering of similar worlds faces severe problems because, for some ordinary counterfactuals, irrelevant possible worlds end up determining the counterfactuals’ truth values. Also, depending on what kind of possible worlds we choose, we might end up assigning a different truth value to a counterfactual statement. To make matters more concrete, consider the following counterfactual [13]: (3) If Nixon had pressed the button, then there would have been a nuclear holocaust. A similarity-based approach requires the following truth evaluation: (3) is true if and only if, in the worlds most similar to the actual world in which Nixon pressed the button, there was a nuclear holocaust. But the worlds in which there is a nuclear holocaust are drastically different from the actual world: the entire future history of humanity would be different in such a world. This example points to the difficulties we face in making judgments about the ordering of possible worlds.

The causal modeling approach to interpreting counterfactuals builds on Lewis’s ordering of similar worlds. However, it appeals to the cognitive architecture of the human mind in order to resolve the arbitrariness of assumptions about the ordering of the counterfactual worlds.
Pearl [45] argues that to make sense of the notion of "similarity" we should rely on the fact that we experience the same world and share the same mental model of its causal structure. However, relying on a largely speculative psychological theory of how the human mind handles the infinity of possible counterfactual worlds does not resolve the normative and ethical implications of choosing which possible worlds are the most similar to the actual world.

Indeed, different epistemic viewpoints might suggest different orderings of possible worlds. After all, humans differ extensively in the standpoints from which they observe the world, and these standpoints influence the formation of causal mental models [24]. From an abstract point of view, a causal model is specified by a set of nodes, edges, and assumptions. What these nodes and edges represent, and how they are interpreted, reflects a particular standpoint on the organization of the world from the viewpoint of the causal model. The crucial point to remember is that no causal model captures absolutely objective relations in the world. Depending on the assumptions found convenient for a causal model, X can be counterfactually dependent on Y in one model but not in another [22]. These convenient assumptions specifying the causal model might enforce false perceptions about the social world (at the risk of being seriously wrong). This suggests that there is always a view from somewhere, as opposed to a more objective and universal "view from nowhere" [43], from which we can assess whether a counterfactual is assertible.

So far, we have argued that the use of counterfactuals in fair and explainable machine learning is not straightforward, and that various trade-offs and value judgments are essential to the use of counterfactuals for ethical machine learning.
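The model-relativity of counterfactual dependence noted above [22] can be made concrete with a toy sketch (our construction, not an example from the literature): two hand-written sets of structural equations over the same variables disagree about whether the outcome counterfactually depends on X.

```python
# A toy sketch (ours, not from [22]) of the point that counterfactual
# dependence is model-relative: two hand-written sets of structural
# equations over the same variables disagree about whether the outcome
# depends on x. All equations are illustrative modelling assumptions.

def outcome_model_a(x):
    # Model A: x sets the score, so the outcome depends on x.
    score = 80 if x else 60
    return score >= 70

def outcome_model_b(x):
    # Model B: the score is fixed by an unmodelled background cause,
    # so x is causally inert by assumption.
    score = 60
    return score >= 70

# Counterfactual dependence: does flipping x flip the outcome?
print(outcome_model_a(True) != outcome_model_a(False))  # True: dependent
print(outcome_model_b(True) != outcome_model_b(False))  # False: not dependent
```

Nothing in the data adjudicates between the two models here; the verdict is fixed by the modeller's assumptions.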
Therefore, aptly using counterfactuals requires bringing forth all implicit and unspecified assumptions about the ontology of the categories on which we run counterfactual analysis, as well as the epistemic and interpretational issues pertaining to the evaluation of counterfactuals. Examination of all these assumptions produces awareness of some unexpected potential harms that can result from the laudable goals of fair and explainable machine learning.

In this section, we offer strategies for specifying and reflecting on the hidden ontological and epistemological-semantic assumptions through an interdisciplinary conversation. We summarize the results of our study (Table 1) by suggesting a detailed set of tenets to check and reflect upon before applying counterfactuals to fair and explainable machine learning. Following this set of tenets would enable modellers and algorithmic designers to state unspecified and implicit assumptions about social ontology as explicitly as possible. It also suggests a path for researchers to seek a variety of justifications for seeing the social world through a counterfactual lens, and to become aware of some potential harms and disadvantages of making sense of fairness and explanations counterfactually. Our results are a necessary step to perform before designing and applying putatively counterfactually fair or explainable algorithms in social contexts.

Table 1 has three columns. The first column provides a category of different kinds of presumptions and choices (ontological and epistemological-semantic) that must be examined before designing and applying counterfactually fair and explainable algorithms. The second column provides the set of questions to ask and answer in order to articulate the implicit assumptions in column one as explicitly as possible.
| Assumption | Question | Example |
|---|---|---|
| Ontological perspective | What are the social categories? | What is race (or gender)? |
| Ontological choice | What ontological perspective do we choose to adopt, and why? | Among different views, what do we take race to be? Social constructivism? A geo-biological ancestry conception of race? Why? |
| Ontological knowledge | How do we know about the social categories? | Who do we consult about the conception of race? |
| Semantic choice | Close-enough-possible-worlds or causal modeling? What is the justification? | Why do we choose either of these semantic approaches to counterfactually suppose that Nora is not Latina? How is our choice justified? |
| Evaluation reliability | What happens to the truth value of the counterfactuals of interest if we change the semantic approach? How robust is the truth value of the counterfactual when moving from a close-enough-possible-worlds approach to causal modeling? | Does the truth value of "If Nora had not been Latina, she would not have been denied admission." differ when we change the semantic approach? |
| Similarity choice | How do we choose what similarity means in this context? | What do we sacrifice by supposing a particular cluster of similar worlds (rather than other possible clusters of similar worlds) in which an individual is the same except for their race? |
| Comparison criteria | What are our chosen criteria for comparing the similar worlds of interest to the actual world? Are these criteria socially warranted? | What characterization for comparing similar worlds justifies keeping (almost) everything about a person fixed except for their race? What does this socially mean? |
| Idealization | What do we miss by translating social categories into random variables? | What is left out by translating an individual's race to a random variable? |
| Context | How do these categories operate in the world? | How does race function in the world? Does this conflict with the assumptions necessary for counterfactual manipulation of race? |
| Ethical and social harm | Does our ontological preference generate harms in relation to social justice (combating structural injustices)? | Does our ontological preference for what race is generate harms in relation to combating racial injustice? |

Table 1: Any use of a counterfactually fair or explainable algorithm (in a social context) involves making several ontological, semantic, and ethical choices and judgments. These implicit presumptions, choices, and judgments must be made as explicit and obvious as possible.
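Rows such as "Semantic choice" and "Context" in Table 1 turn on whether race can be treated as an intervenable variable at all. A deliberately naive sketch, in the style of the counterfactual fairness test of Kusner et al. [35], shows what that treatment assumes; the structural equations, feature names, and threshold below are all our illustrative assumptions.

```python
# A deliberately naive sketch in the style of the counterfactual
# fairness test of Kusner et al. [35]: hold the background noise
# fixed, intervene on the protected attribute, and compare
# predictions. The structural equations, feature names, and threshold
# are our illustrative assumptions. Note that the test presupposes the
# very claim questioned in the text: that "race" is a variable one can
# flip while holding everything else fixed.

def generate_features(is_latina, noise):
    # Assumed structural equation: race influences a zip-code feature.
    zip_score = (0.4 if is_latina else 0.8) + noise
    return {"zip_score": zip_score}

def admit(features):
    # Toy admissions predictor over the generated features.
    return features["zip_score"] >= 0.7

def counterfactually_fair(noise):
    factual = admit(generate_features(True, noise))
    counterfactual = admit(generate_features(False, noise))
    return factual == counterfactual

print(counterfactually_fair(noise=0.1))  # False: decision flips with race
print(counterfactually_fair(noise=0.5))  # True: same decision either way
```

Every line of this sketch embeds an ontological and semantic choice from Table 1 (what the variable "is_latina" represents, which equations count as the causal structure, what is held fixed), and none of those choices is supplied by the formalism itself.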
The third column gives an exemplar of the questions to answer in the context of a particular social problem.

Recall the counterfactual "If Nora had not been Latina, she would not have been denied admission."

Here are the ontological assumptions that should become explicit. First, an explicit statement of the ontological perspective the algorithmic system is adopting. To tweak "being Latina", the designers of the system need to specify what race (e.g., Latina) is from their perspective. Second, the designers could discover whether a counterfactual approach inadvertently commits them to a problematic social ontology. They could provide morally and politically appropriate justifications for why, among other options, they choose and adopt this ontological perspective on race. Is it because this conception of race is compatible with some simplistic assumptions about social ontology that are required to use a causal modeling approach? What are the genuine reasons for this choice, in relation to respecting intellectual humility about what we know about race from other disciplines? Third, the assumptions about ontological knowledge should become explicit. For instance, who do we consult about a theory of race, and why?

The epistemological-semantic presumptions and choices that must be made explicit are as follows. First, what is the semantic choice? Will we choose a close-enough-possible-worlds or a causal modeling approach? What is the justification for this choice? Does our choice make a difference to the truth evaluation of the counterfactual in this particular context of employment, such as Nora not being Latina? Second, how do we account for the evaluation of reliability?
How robust is the truth value of the counterfactual when moving from a close-enough-possible-worlds approach to causal modeling? For instance, does the truth value of "If Nora had not been Latina, she would not have been denied admission." differ when we choose either of the semantic approaches? Third, how do we decide on the meaning of similarity in the particular context of employment? What do we sacrifice by supposing a particular cluster of similar worlds (rather than other possible clusters of similar worlds) in which an individual is the same except for their race? Or, if we are specifying that everything that is not causally dependent on the tweaked category should remain constant, how do we know what is not causally dependent? Fourth, what are our chosen criteria for comparing the possible similar worlds of interest to the actual world? Are these criteria socially warranted? For instance, what characterization for comparing similar worlds justifies keeping almost everything about a person fixed except for their race? What does this socially mean? Fifth, there are questions about the translation of social categories such as race into random variables that can be appropriately treated by an algorithm, if the semantic choice is causal modeling. What is left out if we translate the conception of race into random variables? Does that matter? Why or why not? Sixth, there are questions about the choice of context. How do social categories (such as race) operate in the world? Does this conflict with the assumptions required for counterfactual manipulation of race? Finally, there are questions about the ethical harms that can result from the use of counterfactual analysis. Does our ontological preference generate harms in relation to some desired social justice agenda?
For example, does our ontological preference for what race is generate harms in relation to some affirmative action plans for combating racial injustice?

In sum, Table 1 shows that any use of a counterfactually fair or explainable algorithm (in a social context) involves making several choices and presumptions. By following these tenets, computer scientists can discuss the validity and the implications of these choices in accordance with other disciplines such as philosophy, the social sciences, and anthropology. To that end, the implicit presumptions and choices will be made as explicit and obvious as possible, and an interdisciplinary conversation can settle whether counterfactuals should be used in the generation of explanations and fairness in machine learning practice.

CONCLUSION

Counterfactuals are increasingly applied in machine learning, for example in designing fair and explainable algorithms. This paper provides a detailed set of principles, grounded in philosophical and social scientific insights, for articulating the implicit and unspecified contextual presumptions and choices made in counterfactual applications. Regardless of which evaluation approach to counterfactuals one takes, this set of principles can help researchers conduct interdisciplinary conversations and become aware of the potential harms and ethical impacts of their counterfactual thinking as it pertains to the social world. We think this set of principles is an example of how to establish a successful interdisciplinary conversation between machine learning researchers and social scientists, philosophers, and ethicists.
ACKNOWLEDGMENTS

We would like to thank Alex Beutel, Yoni Halpern, Manasi Joshi, Christina Greer, Robert Williamson, Mario Günther, and members of the Humanizing Intelligence Grand Challenge at the Australian National University for extremely helpful comments and feedback. We would also like to thank participants in the Workshop on Philosophy and Medical AI at the University of Tübingen, the NeurIPS Workshop on Algorithmic Fairness through the Lens of Causality and Interpretability, and the Bias and Fairness in AI Workshop in Ghent, Belgium, for critical discussion.
REFERENCES

[1] Holger Andreas and Mario Günther. 2020. A Ramsey Test analysis of causation for causal models. The British Journal for the Philosophy of Science (2020).
[2] Anthony Appiah. 1985. The uncompleted argument: Du Bois and the illusion of race. Critical Inquiry 12, 1 (1985), 21–37.
[3] Anthony Appiah. 1996. Race, culture, identity: Misunderstood connections. In Color Conscious: The Political Morality of Race, Anthony Appiah and Amy Gutmann.
[4] Solon Barocas, Andrew D. Selbst, and Manish Raghavan. 2020. The hidden assumptions behind counterfactual explanations and principal reasons. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. 80–89.
[5] Yoav Ben-Shlomo, Simon Evans, Fowzia Ibrahim, Biral Patel, Ken Anson, Frank Chinegwundoh, Cathy Corbishley, Danny Dorling, Bethan Thomas, David Gillatt, et al. 2008. The risk of prostate cancer amongst black men in the United Kingdom: the PROCESS cohort study. European Urology (2008).
[6] Ruha Benjamin. 2019. Race After Technology: Abolitionist Tools for the New Jim Code. John Wiley & Sons.
[7] R. Briggs. 2012. Interventionist counterfactuals. Philosophical Studies (2012).
[8] Alexandra Chouldechova. 2017. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big Data 5, 2 (2017), 153–163.
[9] Amanda Coston, Alan Mishler, Edward H. Kennedy, and Alexandra Chouldechova. 2020. Counterfactual risk assessments, evaluation, and fairness. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. 582–593.
[10] Lory Dance. 2010. Struggles of the disenfranchised: Commonalities among Native Americans, Black Americans, and Palestinians. Al-Hewar Magazine (2010).
[11] Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. 2012. Fairness through awareness. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference. 214–226.
[12] Dave Elder-Vass. 2012. Towards a realist social constructionism. Sociologia, Problemas e Práticas 70 (2012), 9–24.
[13] Kit Fine. 1975. Critical notice. Mind 84, 335 (1975), 451–458.
[14] Bhavya Ghai, Q. Vera Liao, Yunfeng Zhang, and Klaus Mueller. 2020. Measuring social biases of crowd workers using counterfactual queries. arXiv preprint arXiv:2004.02028 (2020).
[15] Joshua Glasgow, Sally Haslanger, Chike Jeffers, and Quayshawn Spencer. 2019. What Is Race?: Four Philosophical Views. Oxford University Press.
[16] Clark Glymour and Madelyn R. Glymour. 2014. Commentary: race and sex are causes. Epidemiology 25, 4 (2014), 488–490.
[17] Nelson Goodman. 1972. Seven strictures on similarity. In Problems and Projects. Bobbs-Merrill, New York.
[18] Rory Mc Grath, Luca Costabello, Chan Le Van, Paul Sweeney, Farbod Kamiab, Zhao Shen, and Freddy Lecue. 2018. Interpretable credit application predictions with counterfactual explanations. arXiv preprint arXiv:1811.05245 (2018).
[19] Riccardo Guidotti, Anna Monreale, Salvatore Ruggieri, Dino Pedreschi, Franco Turini, and Fosca Giannotti. 2018. Local rule-based explanations of black box decision systems. arXiv preprint arXiv:1805.10820 (2018).
[20] Ian Hacking. 1999. The Social Construction of What? Harvard University Press.
[21] Susan Halabi, Sandipan Dutta, Catherine M. Tangen, Mark Rosenthal, Daniel P. Petrylak, Ian M. Thompson Jr., Kim N. Chi, John C. Araujo, Christopher Logothetis, David I. Quinn, et al. 2019. Overall survival of black and white men with metastatic castration-resistant prostate cancer treated with docetaxel. Journal of Clinical Oncology 37, 5 (2019), 403.
[22] Joseph Y. Halpern and Christopher Hitchcock. 2011. Actual causation and the art of modeling. arXiv preprint arXiv:1106.2652 (2011).
[23] Alex Hanna, Emily Denton, Andrew Smart, and Jamila Smith-Loud. 2020. Towards a critical race methodology in algorithmic fairness. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. 501–512.
[24] Sandra G. Harding. 2004. The Feminist Standpoint Theory Reader: Intellectual and Political Controversies. Psychology Press.
[25] Moritz Hardt, Eric Price, and Nati Srebro. 2016. Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems. 3315–3323.
[26] Sally Haslanger. 2000. Gender and race: (What) are they? (What) do we want them to be? Noûs 34, 1 (2000), 31–55.
[27] Sally Haslanger. 2010. Language, politics, and "the folk": looking for "the meaning" of 'race'. The Monist 93, 2 (2010), 169–187.
[28] Sally Haslanger. 2016. What is a (social) structural explanation? Philosophical Studies (2016).
[29] arXiv preprint arXiv:1806.09809 (2018).
[30] Lily Hu and Issa Kohler-Hausmann. 2020. What's sex got to do with machine learning? In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency.
[31] Matthew Joseph, Michael Kearns, Jamie H. Morgenstern, and Aaron Roth. 2016. Fairness in learning: Classic and contextual bandits. In Advances in Neural Information Processing Systems. 325–333.
[32] Shalmali Joshi, Oluwasanmi Koyejo, Warut Vijitbenjaronk, Been Kim, and Joydeep Ghosh. 2019. Towards realistic individual recourse and actionable explanations in black-box decision making systems. arXiv preprint arXiv:1907.09615 (2019).
[33] Niki Kilbertus, Mateo Rojas Carulla, Giambattista Parascandolo, Moritz Hardt, Dominik Janzing, and Bernhard Schölkopf. 2017. Avoiding discrimination through causal reasoning. In Advances in Neural Information Processing Systems. 656–666.
[34] Issa Kohler-Hausmann. 2018. Eddie Murphy and the dangers of counterfactual causal thinking about detecting racial discrimination. Northwestern University Law Review 113 (2018), 1163.
[35] Matt J. Kusner, Joshua Loftus, Chris Russell, and Ricardo Silva. 2017. Counterfactual fairness. In Advances in Neural Information Processing Systems. 4066–4076.
[36] David Lewis. 1973. Counterfactuals. Blackwell, Oxford.
[37] David Lewis. 1986. Philosophical Papers II. Oxford University Press, Oxford.
[38] Scott M. Lundberg and Su-In Lee. 2017. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems. 4765–4774.
[39] Ron Mallon. 2004. Passing, traveling and reality: Social constructionism and the metaphysics of race. Noûs 38, 4 (2004), 644–673.
[40] Ron Mallon. 2007. A field guide to social construction. Philosophy Compass 2, 1 (2007), 93–108.
[41] Alexandre Marcellesi. 2013. Is race a cause? Philosophy of Science 80, 5 (2013), 650–659.
[42] Ramaravind K. Mothilal, Amit Sharma, and Chenhao Tan. 2020. Explaining machine learning classifiers through diverse counterfactual explanations. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. 607–617.
[43] Thomas Nagel. 1989. The View from Nowhere. Oxford University Press.
[44] Judea Pearl. 2009. Causality (second ed.). Cambridge University Press.
[45] Judea Pearl and Dana Mackenzie. 2018. The Book of Why: The New Science of Cause and Effect. Basic Books.
[46] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. "Why should I trust you?" Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1135–1144.
[47] Kacper Sokol and Peter A. Flach. 2018. Glass-Box: Explaining AI decisions with counterfactual statements through conversation with a voice-enabled virtual assistant. In IJCAI. 5868–5870.
[48] Peter Spirtes, Clark N. Glymour, Richard Scheines, and David Heckerman. 2000. Causation, Prediction, and Search. MIT Press.
[49] Robert C. Stalnaker. 1968. A theory of conditionals. In Studies in Logical Theory, Nicholas Rescher (Ed.). Blackwell, Oxford, 98–112.
[50] Arnaud Van Looveren and Janis Klaise. 2019. Interpretable counterfactual explanations guided by prototypes. arXiv preprint arXiv:1907.02584 (2019).
[51] Sandra Wachter, Brent Mittelstadt, and Chris Russell. 2017. Counterfactual explanations without opening the black box: Automated decisions and the GDPR. Harvard Journal of Law & Technology 31 (2017), 841.
[52] Jennifer K. Wagner, Joon-Ho Yu, Jayne O. Ifekwunigwe, Tanya M. Harrell, Michael J. Bamshad, and Charmaine D. Royal. 2017. Anthropologists' views on race, ancestry, and genetics. American Journal of Physical Anthropology (2017).
[53] Naomi Zack. 1993. Race and Mixed Race. Temple University Press.
[54] Naomi Zack. 2014.