Fair Prediction with Endogenous Behavior
Christopher Jung, Sampath Kannan, Changhwa Lee, Mallesh M. Pai, Aaron Roth, Rakesh Vohra
Department of Computer and Information Sciences, University of Pennsylvania; Department of Economics and Department of Electrical & Systems Engineering, University of Pennsylvania; Department of Economics, Rice University

February 19, 2020
Abstract
There is increasing regulatory interest in whether machine learning algorithms deployed in consequential domains (e.g. in criminal justice) treat different demographic groups “fairly.” However, there are several proposed notions of fairness, typically mutually incompatible. Using criminal justice as an example, we study a model in which society chooses an incarceration rule. Agents of different demographic groups differ in their outside options (e.g. opportunity for legal employment) and decide whether to commit crimes. We show that equalizing type I and type II errors across groups is consistent with the goal of minimizing the overall crime rate; other popular notions of fairness are not.
Algorithms to automate consequential decisions such as hiring [Miller, 2015], lending [Byrnes, 2016], policing [Rudin, 2013], and criminal sentencing [Barry-Jester et al., 2015] are frequently suspected of being unfair or discriminatory. The suspicions are not hypothetical. The 2016 ProPublica study [Angwin et al., 2016] of the COMPAS Recidivism Algorithm (used to inform criminal sentencing decisions by attempting to predict recidivism) found that the algorithm was significantly more likely to incorrectly label black defendants as recidivism risks compared to white defendants, despite similar overall rates of prediction accuracy between populations. Since then, discoveries of “algorithmic bias” have proliferated, including a recent study of racial bias by algorithms that prioritize patients for healthcare [Obermeyer et al., 2019]. Thus spurred, policymakers, regulators, and computer scientists have proposed that algorithms be designed to satisfy notions of fairness (see for instance O’Neil [2016], Barocas et al. [2018], Roth and Kearns [2019] for overviews).

This raises a question: what measure(s) of fairness should designers be held to, and how do these constraints interact with the original objectives the algorithm was designed to target? The COMPAS case illustrates that the answer is not clear. ProPublica and Northpointe (the company that designed COMPAS) advocated for different measures of fairness. ProPublica argued that the algorithm’s predictions did not maintain parity in false positive and false negative rates between white and black defendants, while Northpointe countered that their algorithm satisfied predictive parity. Subsequent research identified hard trade-offs in the choice of fairness metrics: under some mild conditions, the two requirements above cannot simultaneously be satisfied (Kleinberg et al. [2016], Chouldechova [2017]). This inspired a literature proposing (or criticizing) notions of fairness based on ethical/normative grounds.
The literature evaluates algorithms on the basis of these measures, and/or proposes novel algorithms that better trade off the goals of the original designer (decision accuracy, algorithmic efficiency) with these fairness desiderata. (Northpointe’s algorithm had differing Type-I and Type-II error rates across the two groups. Roughly, the accuracy of COMPAS scores was the same for both groups at all risk levels.) In general, the different proposed fairness measures are fundamentally at odds with one another. For example, in addition to the impossibility results due to Kleinberg et al. [2016], Chouldechova [2017], enforcing parity of false positive or false negative rates for e.g. parole decisions typically requires making parole decisions using different thresholds on the posterior probability that an individual will commit a crime for different groups. This has itself been identified by Corbett-Davies and Goel [2018] as a potential source of “unfairness”.

This line of research is subject to two criticisms. The first, raised for example by Corbett-Davies and Goel [2018], is that these notions of fairness are disconnected from, and lead to unpalatable tradeoffs with, other economic and social quantities and consequences one might care about. Second, the literature almost exclusively assumes that the agent types relevant to the decision at hand are exogenously determined, i.e. unaffected by the decision rule that is selected. For instance, in the criminal justice application described, individual choices of whether to commit a crime or not, and therefore the overall crime rates, are fixed and not affected by policy decisions made at a societal level (e.g. what legal standards are used to convict, policing decisions, etc.). In settings like this where agent decisions are exogenously fixed, Corbett-Davies and Goel [2018] and Liu et al. [2019b] observe that optimizing natural notions of welfare and accuracy (incarcerating the guilty, acquitting the innocent) is achieved by decision rules that select a uniform threshold on well-calibrated “risk scores” (for example, the posterior probability of criminal activity), which tend not to satisfy statistical notions of fairness that have been proposed in the literature.
Does this mean that setting uniform thresholds on equally calibrated risk scores is better aligned with natural societal objectives than asking for parity in terms of false positive and negative rates across populations?

In this paper, we consider a setting in which agent decisions are endogenously determined, and show that in this model the answer is no: in fact, parity of false positive and negative rates (sometimes known in this literature as equalized odds, Hardt et al. [2016]) is aligned with the natural objective of minimizing crime rates. Parity of positive predictive value and posterior threshold uniformity are not. Although the model need not be tied to any particular application, we develop it using the language of criminal justice. We treat agents as rational actors whose decisions about whether or not to commit crimes are endogenously determined as a function of the incentives given by the decision procedure society uses to punish crime. The possibility for unfairness arises because agents are ex-ante heterogeneous: their demographic group is correlated with their underlying incentives. For example, each individual has a private outside option value for not committing a crime, and the distribution of outside options differs across groups. Our key result is that policies optimized to minimize crime rates are compatible with a popular measure of demographic fairness (equalizing false positive and negative rates across demographics) and are generally incompatible with equalizing positive predictive value and with uniform posterior thresholds. Thus, which of these notions of fairness is compatible with natural objectives hinges crucially on whether one believes that criminal behavior is responsive to policy decisions or not.

Our results have direct implications for regulatory testing for unfairness. Often, in settings of interest, a regulator does not directly observe the decision rule used by an adjudicator.
However, the regulator may wish to test whether the adjudicator is using a “fair” rule, i.e. whether the adjudicator’s choices are biased towards or against some demographic group. Following a tradition starting with Becker [2010], one standard used is called an outcome test, i.e. comparing, ex-post, the classification assigned by the adjudicator to observed outcomes. For instance, in a criminal justice setting, one may compare the judge’s decision to the (somehow obtained) actual innocence or guilt of the defendants; or, in a lending setting, compare the lender’s decision on whom to extend loans to with the actual repayment outcomes of loan applicants.

In this context, a given prescription on what constitutes a “fair” or non-discriminatory rule maps into a corresponding outcome test. In particular, a test that is popularly used by researchers and regulators corresponds to the common-posterior-threshold rule described above. (See e.g. Dwork et al. [2012], Hardt et al. [2016], Corbett-Davies and Goel [2018], Corbett-Davies et al. [2017], Feller et al. [2016], Friedler et al. [2016], Kearns et al. [2018], Hébert-Johnson et al. [2018], Liu et al. [2019b] for a small sample of an enormous literature.) As already mentioned, this is not the best test in our model. When used, this test attempts to evaluate whether the adjudicator is using a common posterior threshold across groups by evaluating whether the marginal agents across groups have the same posterior probability of guilt. However, implementing this test is difficult: identifying (and being sure that one has correctly identified) the marginal agent in each group is hard (this is roughly the infra-marginality problem, see e.g. [Simoiu et al., 2017]). For instance, there may be information observed by the decision maker but not by the regulator/econometrician (an oft-cited example is that police observe a suspect’s demeanor and use this as a factor, but this cannot be quantified).
By contrast, if our maintained assumptions are valid, then an adjudicator wishing to minimize crime should use a rule that equalizes false positive and false negative rates across demographic groups. This is easy to estimate and test: there is no need to identify a marginal agent.
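To illustrate how simple such an outcome test is, the following is a minimal sketch (with entirely synthetic data; the group names, base rates, and error rates are our assumptions, not estimates from any real setting): the regulator only needs each group's empirical false positive and false negative rates, with no attempt to locate a marginal agent.

```python
import random

def error_rates(records):
    """Empirical FPR and FNR from (actually_guilty, labeled_guilty) pairs."""
    fp = sum(1 for g, q in records if not g and q)
    innocent = sum(1 for g, q in records if not g)
    fn = sum(1 for g, q in records if g and not q)
    guilty = sum(1 for g, q in records if g)
    return fp / innocent, fn / guilty

def simulate(n, crime_rate, fpr, fnr):
    """Synthetic outcome data for one group under known error rates."""
    out = []
    for _ in range(n):
        guilty = random.random() < crime_rate
        labeled = (random.random() > fnr) if guilty else (random.random() < fpr)
        out.append((guilty, labeled))
    return out

random.seed(0)
# Two groups with different base rates but identical error rates:
# the parity test passes even though equilibrium crime rates differ.
group_a = simulate(100_000, crime_rate=0.10, fpr=0.05, fnr=0.20)
group_b = simulate(100_000, crime_rate=0.30, fpr=0.05, fnr=0.20)
fpr_a, fnr_a = error_rates(group_a)
fpr_b, fnr_b = error_rates(group_b)
print(abs(fpr_a - fpr_b) < 0.01, abs(fnr_a - fnr_b) < 0.02)
```

Note that the test needs only outcome labels and classifications, quantities a regulator can typically audit ex-post.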
We first derive our results in an extremely simple baseline model to highlight the underlying intuition. We then show that our conclusions are robust to a number of elaborations and generalizations of the model.
The Baseline Model
Our baseline model (in Section 2) has a mass of agents who each belong to one of two demographic groups. Each agent has a single choice on the extensive margin, for instance a binary choice of whether or not to commit a crime, or whether or not to acquire human capital, etc. To fix ideas, in this paper we frame the matter as a decision about whether to commit a crime.

An adjudicator has to classify each agent as guilty or innocent. This classification is based on a noisy signal that the adjudicator receives of each agent’s choice; the distribution of this signal depends only on the agent’s choice, and not on her group. Further, the adjudicator observes the group membership of each agent. The adjudicator commits ex-ante to a classification rule, i.e. how it will classify agents as a function of the signal received, and potentially the agent’s group membership.

Agents are expected payoff maximizers who enjoy a monetary benefit from crime but also bear a cost of being declared guilty of the crime. In choosing whether to commit a crime, they compare their expected net benefit from the crime to an outside option. The costs and benefits are privately known to the agent, but not to the adjudicator (who only sees group membership). The only distinction between groups is that the distributions of costs and benefits may be different for different groups. For example, individuals from different groups might have different legal employment opportunities, different costs of incarceration (e.g. differences in stigma), etc. The model is flexible enough to allow (potentially different) fractions of the population in each group who are rigidly law-abiding (i.e. do not commit a crime regardless of circumstance) or hardened criminals (i.e. will commit a crime regardless of circumstance), and a variety of responses to incentives in between these two extremes. We don’t model the source of this heterogeneity: it is exogenous, and the distribution is known to the adjudicator.
Given these preferences, the adjudicator’s decision rule determines their choices, which in turn determine the overall crime rate in each population.

The adjudicator’s objective is to minimize the overall crime rate, i.e. the total mass of agents that choose to commit a crime. While we model the adjudicator as knowing that the underlying groups are heterogeneous (i.e. knowing the above distributions that describe each group), the adjudicator is not biased for or against any group, nor is there any underlying preference for fairness. Our main result (see Section 3) is that the classifier that minimizes the crime rate is fair according to a metric that has attracted attention in the literature: setting different thresholds on the posterior probability of crime for each group so as to guarantee equality of false positive and negative rates. This corresponds to setting the same threshold on signals across groups.

To dig a little deeper into this result, the equilibrium crime rate in each population can be viewed as the adjudicator’s prior belief in equilibrium that an agent has committed a crime, given knowledge only of her group membership. Given the noisy signal, the adjudicator has a posterior belief that the agent has committed a crime. In a static environment, the optimal classification rule for an adjudicator who wishes to optimize classification accuracy will be a group-independent threshold on her posterior belief that an agent has committed a crime. (For a discussion of this in the context of evaluating the fairness of lending standards, see Ferguson and Peters [1995].) In our setting, by contrast, the optimal rule sets the same threshold on the signal for each group. It can be viewed as a commitment to avoid conditioning on group membership, even when group membership information is ex-post informative for classification. The intuition is that if the adjudicator uses the same threshold on the posterior belief that an agent has committed a crime for each group, the decision rule is making use of information contained within each group’s prior.
Although this information is statistically informative of the decision made by the agent, it is not within the agent’s control. Using this information therefore only distorts the (dis-)incentives to commit crime. On the other hand, if the adjudicator uses the same threshold on agents’ signals for each group (and hence different thresholds on posterior beliefs), decisions are made only as a function of information under the control of the agents, and hence are more effective at discouraging crime. The equalizing of false-positive and false-negative rates across the groups follows from this.
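The intuition above can be illustrated numerically. The following sketch uses an illustrative signal model that is our assumption, not the paper's: innocent signals are N(0,1) and criminal signals are N(1,1) (so MLRP holds). It shows that the signal threshold maximizing the disincentive to commit crime does not depend on any group's primitives, while the posterior threshold it implies does vary with the group's equilibrium crime rate.

```python
import math

def norm_cdf(x, mu):
    """CDF of N(mu, 1) via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / math.sqrt(2)))

def disincentive(T):
    """Delta(T) = Pr(q=1 | crime) - Pr(q=1 | innocent) for signal threshold T."""
    return (1 - norm_cdf(T, 1.0)) - (1 - norm_cdf(T, 0.0))

# The disincentive is maximized where the densities cross (T = 0.5 here),
# regardless of either group's distribution of outside options.
T_star = max((t / 1000 for t in range(-2000, 3001)), key=disincentive)

def posterior(s, crime_rate):
    """Pr(crime | s, g) under the illustrative Gaussian signal model."""
    f_c = math.exp(-(s - 1.0) ** 2 / 2)
    f_i = math.exp(-s ** 2 / 2)
    return f_c * crime_rate / (f_c * crime_rate + f_i * (1 - crime_rate))

# The common signal threshold corresponds to *different* implied posterior
# thresholds when equilibrium crime rates differ across groups.
pi_low = posterior(T_star, crime_rate=0.1)
pi_high = posterior(T_star, crime_rate=0.3)
print(round(T_star, 2), round(pi_low, 2), round(pi_high, 2))
```

At the density-crossing threshold the likelihood ratio is one, so the implied posterior threshold simply equals each group's crime rate; groups with different crime rates therefore face different posterior thresholds under the common signal threshold.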
Extensions
The main insights of our baseline model continue to hold even when many of its core assumptions are relaxed. We summarize them here.

First, the baseline model assumes that a signal is observed by the adjudicator for every agent (or equivalently, at equal rates across populations). There is significant empirical evidence, however, that this is often not the case: for example, arrest rates (and hence prosecution rates) are substantially higher in minority populations for certain drug offenses, despite evidence that the underlying prevalence is more uniform across groups [Mitchell and Caudy, 2015]. Section 4 introduces an elaboration of the baseline model based on Persico [2002]. In this variant, the adjudicator must rely on an intermediary (police) to inspect agents and generate a signal. The adjudicator observes a signal from an individual only if the police inspect them, and individuals are punished only if they are inspected and their signal crosses the adjudicator’s threshold for establishing guilt. The police have their own objectives: to maximize the number of successful inspections (e.g. inspections that result in an arrest). Formally, we study the following game: the adjudicator commits to a decision rule on guilt/innocence based on signals. Then police and agents play a simultaneous move game: the police choose an inspection intensity for each group to maximize their objectives, given the adjudicator’s rule and crime rates in each group, subject to an overall capacity constraint. Agents of different groups commit crime based on both the adjudicator’s rule and the police’s inspection rate for their group. We show that similar results continue to hold in this model, i.e. the optimal rule for the adjudicator will continue to equalize the disincentive to commit crime across the groups. This will result in equalizing conditional false-positive and false-negative rates across the groups, i.e. the rates conditional on being inspected.

In Section 5, we consider a setting in which the signal distribution depends not just on the action chosen by the agent (crime or no crime) but may also depend on their group. For example, the underlying signal generating process may be less noisy for agents from certain groups, and noisier for others. Of course, in general, the structure of the optimal solution is closely tied to the relationship between the signal distributions, and our results cannot carry over without further assumptions. Nevertheless, the insights provided by our previous analysis allow us to study the tradeoff between the adjudicator’s objective (minimizing overall crime) and various fairness notions. In particular, Theorem 5.1 gives conditions under which our baseline insights continue to hold, i.e. conditions under which the adjudicator’s optimal rule will continue to equalize the disincentive to commit crime across groups. Conversely, Theorem 5.3 shows conditions under which rules that equalize false-positive or false-negative rates across groups outperform rules that equalize incentives. (Consider, for example, the case in which there is a perfectly informative signal for one group but not for another. Then the optimal solution will have zero error for the former group, whereas error will be inevitable for the latter, for whom we only have a noisy signal.)
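The police-inspection game summarized above can be sketched numerically. The sketch below is ours, not the paper's: it assumes hypothetical exponential survivor functions H_g(x) = exp(-lam_g x), made-up group sizes, disincentive, and capacity, and solves for the interior police equilibrium, in which success probabilities (here, crime rates) are equalized across groups subject to the capacity constraint.

```python
import math

# Hypothetical primitives (all numbers illustrative).
N = {1: 1000, 2: 1000}     # group sizes
lam = {1: 3.0, 2: 1.0}     # group-specific survivor-function parameters
Delta, S = 0.38, 600.0     # adjudicator's disincentive; search capacity

def crime_rate(g, theta):
    """H_g(theta * Delta) = exp(-lam_g * theta * Delta)."""
    return math.exp(-lam[g] * theta * Delta)

def gap(theta1):
    """Crime-rate difference along the capacity-binding allocations."""
    theta2 = (S - N[1] * theta1) / N[2]
    return crime_rate(1, theta1) - crime_rate(2, theta2)

# Interior equilibrium: police equalize crime rates across groups; otherwise
# they would shift all capacity to the higher-crime group. Solve by bisection
# (gap is decreasing in theta1).
lo, hi = 0.0, S / N[1]
for _ in range(80):
    mid = (lo + hi) / 2
    if gap(mid) > 0:
        lo = mid    # group 1 still has more crime: inspect it more
    else:
        hi = mid
theta1 = (lo + hi) / 2
theta2 = (S - N[1] * theta1) / N[2]
print(round(crime_rate(1, theta1), 3), round(crime_rate(2, theta2), 3))
```

With these exponential survivor functions the equilibrium can also be read off in closed form (lam_1 * theta_1 = lam_2 * theta_2 along the capacity constraint); the bisection is only meant to show the equalization logic for general H_g.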
The economic literature starting from Arrow [1972] has considered models of discrimination where agent decisions (e.g. to gain education) are endogenous and their incentives determined by a principal’s choice (e.g. an employer’s hiring rule). Coate and Loury [1993], Foster and Vohra [1992] study models in which individuals have a choice about how much effort to exert, and identical populations can have different outcomes (e.g. in hiring markets) because of asymmetric self-confirming equilibria. There has also been extensive interest in the design of affirmative action policies, for example in higher education. Loury [1977] makes the case that affirmative action may be necessary to correct historical inequity by constructing a dynamic model in which heterogeneity between two groups may persist if the principal uses a non-discriminatory rule going forward. The subsequent literature is too large to comprehensively cite; see Fang and Moro [2011] for a survey.

There has also been substantial interest in evaluating outcome data for evidence of discrimination, using (or developing) an underlying theoretical prediction of how such discrimination would manifest: see e.g. Knowles et al. [2001], Persico [2002] or Anwar and Fang [2006] in the context of policing/traffic stops. A large literature has studied lending data for evidence of discrimination against women and minorities: see e.g. Ferguson and Peters [1995] or Ladd [1998] for overviews both of the debate on what measures of (un-)fairness to use and of the existing research.

More recently, in the computer science literature, several papers consider effort-based models that are similar in spirit to Coate and Loury [1993], Foster and Vohra [1992]. Hu and Chen [2018] propose a two-stage model of a labor market with a “temporary” (i.e. internship) and “permanent” stage, and study the equilibrium effects of imposing a fairness constraint (“statistical parity”, which corresponds to hiring from two populations at equal rates) on the temporary stage. Liu et al. [2019c] consider a model of the labor market with higher dimensional signals, and study equilibrium effects of “subsidy” interventions which can lessen the cost of exerting effort. Kannan et al. [2019] study the effects of admissions policies on a two-stage model of education and employment, in which a downstream employer makes rational decisions, but student types are exogenously determined. Two recent papers, Liu et al. [2019a], Mouzannar et al. [2019], study non-game-theoretic models by which classification interventions in an earlier stage can have effects on individual type distributions at later stages, and show that for many commonly studied fairness constraints (including several that we consider in this paper), their effects can be either positive or negative in the long term, depending on the functional form of the relationship between classification decisions and changes in the agent type distribution.
Baseline Model
Each agent belongs to a group g ∈ G. A group corresponds to some observable characteristic of the agent, for instance race or gender. There are N_g agents in group g. For simplicity assume just two groups, {1, 2}, though the results extend straightforwardly to any finite number. Each agent makes a single binary decision to either commit a crime (c) or remain innocent (i).

Then, for each agent, the adjudicator observes a random signal s ∈ R which is informative of the agent’s guilt/innocence. The distribution of the signal depends only on whether the agent has committed a crime (and is therefore conditionally independent of their group). Criminals’ signals are drawn according to the distribution F_c (with pdf f_c) and innocents’ signals are drawn from F_i (with pdf f_i). It is without loss of generality (reordering signals if necessary) to assume that the signal distributions satisfy the Monotone Likelihood Ratio Property (MLRP), i.e. higher signals imply a higher likelihood of guilt.

The adjudicator commits to a decision rule β, which labels an agent in group g with signal s as guilty with probability β_g(s) ∈ [0, 1]. We write q = 1 to indicate that the agent is labeled guilty and q = 0 otherwise.

Now we describe agents’ incentives to commit a crime in the first place. The agent receives a reward ρ when he commits a crime, but pays a penalty of κ if he is labeled as guilty. An agent who does not commit a crime receives his outside option value ω. All three quantities are privately known only to the agent and are drawn independently from a distribution that may potentially differ across the groups. An agent in group g commits a crime if his net utility from committing a crime is higher than not:

    ρ − κ Pr(q = 1 | c, g) ≥ ω − κ Pr(q = 1 | i, g),   (1)

which can be written as

    Pr(q = 1 | c, g) − Pr(q = 1 | i, g) ≤ (ρ − ω)/κ,   (2)

where (ρ − ω)/κ is the marginal benefit of committing a crime normalized by the penalty.
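Condition (2) is easy to illustrate by Monte Carlo. The sketch below is ours: the particular distributions of (ρ, ω, κ) are hypothetical, chosen only to show that the fraction of agents for whom (ρ − ω)/κ exceeds the disincentive (the group's crime rate) is non-increasing in that disincentive.

```python
import random

def crime_rate(delta, n=200_000):
    """Fraction of agents with (rho - omega)/kappa >= delta, i.e. the
    agents for whom condition (2) says crime is worthwhile."""
    commits = 0
    for _ in range(n):
        rho = random.uniform(0, 2)     # reward from crime (hypothetical)
        omega = random.uniform(0, 2)   # outside option (hypothetical)
        kappa = random.uniform(1, 3)   # penalty if labeled guilty (hypothetical)
        if (rho - omega) / kappa >= delta:
            commits += 1
    return commits / n

random.seed(1)
# A larger disincentive deters more agents: the crime rate H_g(Delta_g)
# is non-increasing in Delta_g, as the survivor-function form implies.
rates = [crime_rate(d) for d in (0.0, 0.2, 0.4)]
print([round(r, 3) for r in rates])
```

With a zero disincentive, crime here reduces to the event ρ ≥ ω (probability one half under these symmetric draws); raising the disincentive shrinks the committing set.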
Define

    ∆_g = Pr(q = 1 | c, g) − Pr(q = 1 | i, g)   (3)

as the disincentive for committing a crime: it is the group-specific additional probability of being found guilty having committed a crime, relative to not. Then, the crime rate of group g can be expressed in terms of H_g, the survivor function (i.e. 1 − CDF) associated with the quantity on the right hand side of (2), given the joint distribution of ρ, κ and ω:

    CR_g = Pr(∆_g ≤ (ρ − ω)/κ) = H_g(∆_g).   (4)

The adjudicator’s objective is to minimize the overall crime rate, i.e. to solve

    min_{β ∈ B} Σ_{g ∈ G} N_g CR_g,   (OPT)

where, of course, β determines ∆_g by (3) and therefore CR_g as per (4). Here, B is the set of all feasible policies for the adjudicator, i.e. B = { (β_g)_{g ∈ G} : β_g : R → [0, 1] }.

We are interested in how decision rules β respecting various notions of fairness perform relative to the optimal policy, and to each other. There are five main notions of fairness that we discuss throughout this paper, each of which corresponds to equalizing some statistical quantity across groups. Three of them have been considered both in the literature and in the popular press: equalizing false positive rates, false negative rates, and positive predictive value. These three notions of fairness are of particular interest to us because it has been shown that attaining all three measures simultaneously is impossible (Kleinberg et al. [2016], Chouldechova [2017]).

Given a policy β_g for group g, the true positive rate (TPR), false positive rate (FPR), false negative rate (FNR), and positive predictive value (PPV) are defined as

    TPR_g = Pr(q = 1 | c, g) = ∫_R f_c(s) β_g(s) ds,   (TPR)
    FPR_g = Pr(q = 1 | i, g) = ∫_R f_i(s) β_g(s) ds,   (FPR)
    FNR_g = Pr(q = 0 | c, g) = ∫_R f_c(s)(1 − β_g(s)) ds = 1 − TPR_g,   (FNR)
    PPV_g = Pr(c | q = 1, g) = CR_g TPR_g / (CR_g TPR_g + (1 − CR_g) FPR_g).   (PPV)

Note that in light of these definitions, we can rewrite (3) as:

    ∆_g = TPR_g − FPR_g.   (∆)

Additionally, we propose two new notions of fairness: equalizing disincentives (denoted ∆) and equalizing crime rates (denoted CR). We say the policy β achieves fairness notion ξ ∈ {FPR, FNR, PPV, ∆, CR} if the resulting respective quantity is the same across the groups when the adjudicator chooses policy β. We write B_ξ for the set of all policies that achieve fairness notion ξ.

Given this framework we are interested in two questions. First, which of these fairness notions is compatible with the adjudicator’s problem (OPT)? Second, under what conditions is a particular fairness notion better than another in terms of the objective of minimizing the overall crime rate, i.e. when do we have that for fairness notions ξ, ξ′:

    min_{β ∈ B_ξ} Σ_{g ∈ G} N_g CR_g ≤ min_{β ∈ B_{ξ′}} Σ_{g ∈ G} N_g CR_g?

One final piece of notation will be useful. We denote by β⋆ the solution to the adjudicator’s problem (OPT), i.e. the policy that minimizes crime overall. We sometimes refer to this as the optimal policy. Further, for fairness notion ξ, we denote by β⋆_ξ the solution to the adjudicator’s problem among all rules that satisfy fairness notion ξ, i.e. the rule that solves

    min_{β ∈ B_ξ} Σ_{g ∈ G} N_g CR_g.

The main result we build around is that the solution to the adjudicator’s problem (OPT) is naturally “fair” in terms of three of the five measures above. It provides an interesting counterpoint to the impossibility results of Kleinberg et al.
[2016] and Chouldechova [2017]. Those results state that it is impossible to simultaneously equalize false negative rates, false positive rates, and positive predictive value across groups. This raises the question of which of the fairness measures should be preferred over the others. By endogenizing the base rate of criminal activity, we find that equalizing false positive rates and equalizing false negative rates are preferred to equalizing positive predictive value, in the sense that the former two are compatible with the optimal policy while the latter is not. Formally,
Theorem 3.1.
The adjudicator’s optimal policy β⋆ (i.e. the policy which solves (OPT)) equalizes the disincentive to commit crime (∆) across groups. As a result, it also equalizes the false negative rates (FNR) and false positive rates (FPR).

Proof. First note that because β_g can be set independently for each group g, minimizing the total crime rate is achieved by individually minimizing the crime rate within each group. Recall that the crime rate within a group is H_g(∆_g). This in turn is minimized by maximizing the disincentive of crime ∆_g, since H_g, being a survivor function, is non-increasing. Recall that

    ∆_g = ∫_R (f_c(s) − f_i(s)) β_g(s) ds.

Therefore the optimal β_g is independent of the (group-dependent) distribution over private values defining H_g, and is therefore the same for all groups:

    β_g(s) = 1 if f_c(s) ≥ f_i(s), and β_g(s) = 0 if f_c(s) < f_i(s).

Since the disincentive to commit crime is a function only of β_g, this results in the same disincentive to commit crime at the optimal solution.

Finally, note that both FNR_g and FPR_g for each group are functions only of f_i(s) and f_c(s) (which are identical across groups) and the chosen policies β_g(s), which we have shown will be identical across groups in the optimal solution. Hence, the adjudicator’s optimal policy will equalize false positive rates and false negative rates across groups.

Rather than thinking of an arbitrary function β_g(s), it is more natural to think of the adjudicator as selecting a threshold T_g for each group g so that any member of group g whose signal s exceeds T_g is labeled guilty. That is,

    β_g(s) = 1 if s ≥ T_g, and β_g(s) = 0 if s < T_g.

Remark 3.2.
Since f_c and f_i satisfy the MLRP property (i.e. f_c(s)/f_i(s) is non-decreasing in s), β⋆ is a threshold policy by observation. Note that if strict MLRP holds, then the optimal thresholds are unique. Under a threshold policy, group g’s true positive rate reduces to TPR_g = 1 − F_c(T_g) and the false positive rate simplifies to FPR_g = 1 − F_i(T_g).

Posterior Thresholds
It is interesting to contrast the policy of setting equal thresholds on the signal, which we show to be optimal here, with setting equal thresholds on the ‘posterior’ or another calibrated risk score. The latter is advocated in Corbett-Davies et al. [2017], for example. In that paper, the authors consider a setting where crime choices are exogenous/fixed, and study the choice of policy that minimizes weighted misclassification rates (i.e. acquittal of the guilty and incarceration of the innocent). They show that an optimal policy involves a common ‘threshold’ on the posterior across groups: first, the adjudicator estimates the prior probability that an individual has committed a crime by considering the base rate of crime for the individual’s group, and then uses the observed signal to update her prior to a posterior belief that the individual in question has committed a crime. Second, the individual is deemed guilty if the posterior probability of guilt exceeds some threshold.

Of course, in our setting, the choice of crime is endogenous, and the planner’s objective is minimizing the crime rate rather than minimizing mislabeling costs. Nevertheless, it is interesting to inquire into the implications of equalizing the thresholds on the posterior in our setting. In our setting, the posterior after observing the signal s and the group g is

    Pr(c | s, g) = f_c(s) CR_g / (f_c(s) CR_g + f_i(s) (1 − CR_g)),

which increases in s when the signal structure satisfies the monotone likelihood ratio property. Thresholding the posterior corresponds to choosing a value π_g ∈ [0, 1] and classifying as guilty whenever Pr(c | s, g) ≥ π_g.

Let T⋆ be the threshold on the signal under the planner’s optimal policy. The optimal policy classifies the defendant as guilty if the signal s exceeds the threshold T⋆. With the monotone likelihood ratio property, this implies a threshold rule on the posterior, which is to classify the defendant as guilty whenever the posterior exceeds

    π_g = Pr(c | T⋆, g).

By observation, the posterior thresholds are equalized across groups if and only if CR_g = CR_{g′}, i.e. if and only if crime rates are equalized under the optimal policy T⋆. This in turn will only occur if H_g(∆) = H_{g′}(∆), which of course will not obtain in general because H_g need not be the same as H_{g′}.

Heterogeneous Signal Observation
The baseline model assumes that when an agent commits a crime, the adjudicator observes the signalgenerated and adjudicates. What if the signals generated by members of each group are observed at differentrates? This can happen if the adjudicator relies on an intermediary to record the signals. For example, inthe crime application, the groups may be policed at different rates. Their (dis)incentives to commit crimewill then differ as a result. Critically, we suppose that the police’s incentives differ from the adjudicator’s.The adjudicator’s choice of rule therefore influences the police’s choice on how to divide their manpoweracross different groups. Both the adjudicator’s rule and the police’s choice influence the incentives of agentsto commit crime, and ultimately determine the overall crime levels in society.To model this, we build upon upon the framework of Persico [2002]. There are a continuum of policeofficers who choose inspection intensities for each group { θ g } g ∈G given a search capacity S . The choice ofinspection intensity determines the rate at which signals are observed from the two groups: upon inspection,a police officer observes a signal about whether a crime was committed or not. In Persico [2002], the signal isassumed to be perfect. We depart from this assumption in that in our setting, the observed signal is noisy ,as in our baseline model. The adjudicator, as before, wishes to minimize the overall crime rate. As in Persico[2002], the police have different incentives. Specifically, each police officer tries to maximize the number of‘successful’ inspections, i.e. where the signal recorded exceeds the threshold set by the adjudicator. As inPersico [2002], we motivate this incentive as driven by the career concerns of individual police officers (whoare e.g. 
promoted if they have many successful arrests).

The timing of the game is therefore: (1) the adjudicator chooses the function β; (2) the police take this as fixed and choose inspection intensities θ_g ∈ [0, 1] for each group, subject to the constraint N_1 θ_1 + N_2 θ_2 ≤ S (recall that N_g is the number of agents in group g); and (3) given the adjudicator's choice β and inspection probability θ_g, an agent of group g with crime reward ρ, cost of being found guilty κ, and outside option ω commits a crime if (analogous to (1), but now also taking into account the probability of inspection)

ρ − θ_g κ Pr(q = 1 | c, g) ≥ ω − θ_g κ Pr(q = 1 | i, g),

i.e., whenever θ_g ∆_g ≤ (ρ − ω)/κ, where ∆_g is as defined in (3). By analogy with (4), an agent of group g commits a crime with probability H_g(θ_g ∆_g).

As a benchmark, consider a setting where the adjudicator can choose both β and θ. Here the objective is to minimize the overall crime rate just as in (OPT), and the additional constraint simply reflects that the choice of inspection rule must be feasible, i.e. the total level of inspection cannot exceed the total search capacity S. The adjudicator's problem in this benchmark can be written as:

min_{β,θ} Σ_g N_g H_g(θ_g ∆_g)   s.t.   Σ_g N_g θ_g ≤ S.   (5)

We refer to the solution of (5) as the first-best solution.

Now return to our setting, where the adjudicator chooses β but not θ. The police take β as given and choose θ to maximize the number of successful inspections. Note that an inspection in group g is successful with probability H_g(θ_g ∆_g). As in Persico [2002], we assume an interior equilibrium, i.e., θ_g > 0 for all g. Intuitively, in this case the optimal strategy for the police equalizes the crime rate between the two groups: otherwise, police trying to maximize the probability of a successful inspection would exclusively search the group with the higher crime rate. Recognizing this, the adjudicator solves the problem below. To make this model non-trivial, we assume that search capacity is limited, i.e.,
S < N_1 + N_2. (A corner solution entails the police completely ignoring a group; in this case the setting is trivial because conditional false positive rates are undefined for the ignored group.)

min_{β,θ} Σ_g N_g H_g(θ_g ∆_g)
s.t.   Σ_g N_g θ_g = S,
       H_1(θ_1 ∆_1) = H_2(θ_2 ∆_2).   (6)

The solution to problem (6) is the second-best solution.

We note that because the groups are inspected at different rates, the TPR and FPR as we have defined them should correctly be called the conditional true and false positive rates respectively, i.e., the rates conditional on being inspected. Theorem 3.1 now carries over mutatis mutandis, i.e. in both the first and second best solution, the thresholds will be set so as to equalize the conditional false and true positive rates
CFPR and CTPR. Proofs for this and subsequent theorems are in the appendix.
Theorem 4.1. The optimal solutions to both the first best (5) and second best (6) equalize the CFPR and CTPR across groups.
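To make the equal-crime-rate logic concrete, the police equilibrium can be sketched numerically. Everything below (the survival functions H_1, H_2 and all parameter values) is a hypothetical illustration, not a primitive of the model:

```python
import math

# Hypothetical primitives: H_g maps the effective disincentive theta_g * Delta_g
# to group g's crime rate; group 2 is 'riskier' (higher crime at every level).
H1 = lambda x: math.exp(-x)
H2 = lambda x: math.exp(-0.5 * x)

N1 = N2 = 1.0   # group sizes
S = 1.0         # total search capacity, S < N1 + N2
d1 = d2 = 0.8   # disincentives induced by the adjudicator (equal here)

def crime_gap(theta1):
    """H_1(theta_1 d_1) - H_2(theta_2 d_2) along the capacity constraint."""
    theta2 = (S - N1 * theta1) / N2
    return H1(theta1 * d1) - H2(theta2 * d2)

# Career-concerned police equalize crime rates across groups in an interior
# equilibrium.  crime_gap is positive at theta1 = 0 and decreasing, so bisect:
lo, hi = 0.0, S / N1
for _ in range(60):
    mid = (lo + hi) / 2
    if crime_gap(mid) > 0:
        lo = mid
    else:
        hi = mid
theta1 = (lo + hi) / 2
theta2 = (S - N1 * theta1) / N2
print(theta1, theta2)  # theta1 = 1/3, theta2 = 2/3 with these numbers
```

The riskier group is searched more intensively in equilibrium; at the solution the two groups' crime rates coincide, which is exactly the constraint in problem (6).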
While the optimal β under the first and second best outcomes coincides, the optimal inspection intensities need not. In particular, the first- and second-best inspection intensities coincide when H is convex; when H is concave, however, the inspection intensities under the second-best outcome maximize the overall crime rate out of all feasible search intensities given the optimal signal thresholds.

Theorem 4.2. Suppose that the H_g belong to the same location family, i.e. H_g(s) = H(s − µ_g) for some µ_g for each g ∈ G, and that H is convex (concave). Then the inspection intensities in the second best solution minimize (maximize) the crime rate among all inspection intensities that equalize the conditional false positive rates and conditional true positive rates, CFPR and
CTPR. (We observe that these conditional rates are implicitly what has been studied in the fairness in machine learning literature, because these are the rates that can be computed from the data.)

5 Heterogeneous Signal Structures

We now examine the extent to which the conclusion of Theorem 3.1 holds if we allow the signal structure F_g = (F^c_g, F^i_g) to differ across groups g ∈ G. The signal structure F_g and the strategy β_g(s) matter to the extent that they discourage crime. Recall that the relevant sufficient statistic of a strategy β_g(s) is what we called the disincentive to commit crime:

∆_g = TPR_g − FPR_g = ∫_R (f^c_g(s) − f^i_g(s)) β_g(s) ds.

The set of achievable disincentives given a signal structure is [∆̲_g, ∆̄_g], where

∆̲_g = ∫_{S⁻_g} (f^c_g(s) − f^i_g(s)) ds,   where S⁻_g ≡ {s : f^c_g(s) − f^i_g(s) < 0},

∆̄_g = ∫_{S⁺_g} (f^c_g(s) − f^i_g(s)) ds,   where S⁺_g ≡ {s : f^c_g(s) − f^i_g(s) > 0}.

The relevant sufficient statistics for the signal structure (F^c_g, F^i_g) of group g are its minimal and maximal disincentives, ∆̲_g and ∆̄_g, which determine the range of disincentives a classification rule is able to provide.

5.1 General Analysis

In this section, we give some insight into what happens when the signal structure varies across groups, without making further assumptions on how it varies. First, as should be clear from the intuition above, what really matters for our results in the baseline model is not that the signal distributions are identical across populations, but rather that the maximal disincentive ∆̄_g is the same across groups: if we have this, then the basic insight of Theorem 3.1 (maximizing disincentives across groups) holds as before. This is summarized in Theorem 5.1. Note that since the signal structures are different, the implication of Theorem 3.1, namely that FPR and FNR will also be equalized across groups, will not hold in general.
Further, if the maximal disincentives differ across groups, then in general the result does not hold, as we show in Example 5.2. Theorem 5.3 then provides conditions under which various “natural” fair policies are ranked under the adjudicator's objective of minimizing overall crime.

Let us start by analyzing the optimal policy that minimizes average crime. As in Section 3, average crime is minimized by maximizing the disincentive for crime in each group, which is attained by setting β_g(s) such that ∆_g = ∆̄_g for every g. Unlike in Section 3, the optimal policy does not guarantee any of the fairness notions described in Section 2.1 (equalizing disincentives, equalizing false positive rates, equalizing false negative rates, or equalizing positive predictive value) when the signal structures differ across the groups.

An immediate observation is that when the signal structures have the same maximal disincentive, the optimal effective policy equalizes disincentives.

Theorem 5.1.
Suppose that ∆̄_1 = ∆̄_2. The adjudicator's optimal policy (i.e. the solution to (OPT)) equalizes disincentives (∆) across groups.

Theorem 5.1 follows from the core insight of Theorem 3.1 that the optimal rule maximizes the disincentive to commit crime for each group. When the signal structures across the groups are identical, as in Theorem 3.1, they have the same maximal disincentives, and therefore the optimal policy equalizes disincentives; equalizing disincentives then coincides with equalizing false positive rates and equalizing false negative rates. However, when the distributions of signals differ, equalizing disincentives need not be the same as equalizing false positive or false negative rates. Indeed, equalizing disincentives may yield a strictly lower crime rate than equalizing false positive rates or equalizing false negative rates even when ∆̄_1 = ∆̄_2. To see this, consider the following example.

Example 5.2.
Suppose that the signal for group g, s_g, is generated according to

s_g = η_g + 1_c,

where η_g is a random variable with pdf f_g(η) that is strictly log-concave with full support on R, and 1_c is an indicator that equals 1 if and only if the agent has committed a crime. In words, committing a crime produces a signal that exceeds the signal from not committing a crime by 1 on average, while the underlying distribution f_g may differ across the groups. Note that f^i_g(s) = f_g(s) and f^c_g(s) = f_g(s − 1), and the strict log-concavity guarantees that the strict monotone likelihood ratio property between f^i_g and f^c_g is satisfied, that is, f^c_g(s)/f^i_g(s) strictly increases in s, and that f_g is unimodal. Finally, suppose that f_1(η) is asymmetric around its mode, and that f_2(η) = f_1(−η) is a horizontal reflection of f_1.

For each threshold T_g, the corresponding disincentive satisfies ∆_g(T_g) = F_g(T_g) − F_g(T_g − 1). The threshold T*_g maximizes ∆_g(T_g) if and only if it equalizes the pdf at T*_g and T*_g − 1: f_g(T*_g) = f_g(T*_g − 1). Graphically, T*_g and T*_g − 1 are obtained as a pair of intersection points between the pdf f_g and a horizontal line, where the distance between the intersection points has to be 1, as in Figure 1. The maximal disincentive ∆̄_g = ∆_g(T*_g) is the white area under f_g between T*_g − 1 and T*_g. Since f_2 is merely a horizontal reflection of f_1, so are the maximal disincentives, and therefore ∆̄_1 = ∆̄_2. By Theorem 5.1, the optimal policy T*_g equalizes disincentives.

The false positive rate and false negative rate for each group are colored in blue and red in Figure 1. Clearly, FPR_1 ≠ FPR_2 and FNR_1 ≠ FNR_2. Consequently, equalizing false positive rates and equalizing false negative rates yield strictly higher crime rates than equalizing disincentives. This also implies PPV_1 ≠ PPV_2, so that equalizing PPVs also yields a strictly higher crime rate in general.
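Example 5.2 can be checked numerically. The sketch below uses a Gumbel density as one hypothetical choice of a strictly log-concave, asymmetric f_1 (the argument goes through for any such density): bisection finds the disincentive-maximizing threshold, and the reflected group attains the same maximal disincentive with different error rates:

```python
import math

f = lambda x: math.exp(-x - math.exp(-x))  # Gumbel(0,1) pdf: log-concave, asymmetric
F = lambda x: math.exp(-math.exp(-x))      # Gumbel(0,1) cdf

# Group 1: the optimal threshold solves f(T) = f(T - 1); the mode 0 lies
# between T - 1 and T, so bisect on (0, 1).
lo, hi = 0.0, 1.0
for _ in range(60):
    mid = (lo + hi) / 2
    if f(mid) - f(mid - 1) > 0:
        lo = mid
    else:
        hi = mid
T1 = (lo + hi) / 2

delta1 = F(T1) - F(T1 - 1)  # maximal disincentive for group 1
FPR1 = 1 - F(T1)            # innocent signal eta exceeds T1
FNR1 = F(T1 - 1)            # guilty signal eta + 1 falls below T1

# Group 2 uses the horizontal reflection f_2(eta) = f_1(-eta); its optimal
# threshold is the mirror image T2 = -(T1 - 1), and its cdf is 1 - F(-x).
T2 = -(T1 - 1)
F2 = lambda x: 1 - F(-x)
delta2 = F2(T2) - F2(T2 - 1)
FPR2 = 1 - F2(T2)
FNR2 = F2(T2 - 1)

print(delta1, delta2)  # equal maximal disincentives
print(FPR1, FPR2)      # unequal false positive rates
```

Under the disincentive-equalizing optimum, group 2's false positive rate equals group 1's false negative rate, so neither FPR nor FNR is equalized, exactly as the example asserts.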
Figure 1: The densities f_1 and f_2 with the optimal thresholds T*_1 and T*_2; the FNR (red) and FPR (blue) regions are shaded for each group.

When the signal structures across the groups have different maximal disincentives, we identify conditions under which both equalizing false positive rates and equalizing false negative rates yield a strictly lower crime rate than equalizing disincentives. Without loss of generality, let us assume that ∆̄_1 > ∆̄_2.

Theorem 5.3.
Suppose that ∆̄_1 > ∆̄_2. Then the following are equivalent:

1. The optimal policy subject to equalizing false positive rates (β*_FPR) attains a (weakly) lower crime rate than equalizing disincentives (β*_∆).

2. The optimal policy subject to equalizing false negative rates (β*_FNR) attains a (weakly) lower crime rate than equalizing disincentives (β*_∆).

3. (F^c_1)^{−1} ∘ F^c_2(T*_2) > (≥) (F^i_1)^{−1} ∘ F^i_2(T*_2), where T*_g is the threshold under the optimal policy for group g.

In general, the optimal policy does not guarantee any of the fairness notions. Theorem 5.3 provides a necessary and sufficient condition under which equalizing false positive rates and equalizing false negative rates attain lower crime rates than equalizing disincentives. However, condition 3 is hard to interpret. Further, it is unclear which of equalizing false positive rates and equalizing false negative rates would be better overall for the adjudicator. Without additional structure on the signal distributions, it is hard to proceed further. To explore these issues, we restrict attention to signal distributions that are members of location-scale families of distributions.
Definition 5.4.
We say that the signal structure is from a location-scale family if each group's signal is a location-scale transformation of the same underlying random variable η, which has an absolutely continuous and log-concave density f with full support on the real line. Specifically, the signal s_g for group g is generated according to

s_g = µ_g + σ_g η + m_g 1_c,

where µ_g is a location shifter, σ_g is a scale shifter, m_g is the marginal effect of crime on the signal, and 1_c is an indicator that equals 1 if and only if the agent has committed a crime. Equivalently, the conditional pdfs of the signal s for group g, conditional on being innocent and on having committed a crime respectively, are

f^i_g(s) = (1/σ_g) f((s − µ_g)/σ_g)   and   f^c_g(s) = (1/σ_g) f((s − µ_g − m_g)/σ_g).   (7)

Note that the underlying distribution f is identical across the groups, in contrast to Example 5.2, where the underlying distribution f_g differed across the groups. Combined with the functional form (7), log-concavity of f is equivalent to the signal structure satisfying the monotone likelihood ratio property for each group g, which implies that the optimal β is a threshold strategy. The log-concavity of f also implies that f is unimodal, which guarantees the uniqueness of the threshold that attains the optimal policy. There are many natural location-scale families of distributions satisfying log-concavity, including normal distributions, logistic distributions, and extreme value distributions.

A property that makes location-scale families particularly tractable is that the disincentive engendered by a threshold depends only on m_g/σ_g, i.e. the ratio between the marginal effect of crime m_g and the scale shift σ_g. For this class of distributions, we can say that it is always preferable to equalize either false positive rates or false negative rates rather than to equalize disincentives.

Theorem 5.5.
Suppose the distributions across groups are from the location-scale family of Definition 5.4. Then:

1. If m_1/σ_1 = m_2/σ_2, then the optimal policy equalizes disincentives, false positive rates, and false negative rates.

2. Suppose m_1/σ_1 ≠ m_2/σ_2, and assume without loss of generality that m_1/σ_1 is larger. Then ∆̄_1 > ∆̄_2. Further, the optimal policy subject to equalizing false positive rates (β*_FPR) and that subject to equalizing false negative rates (β*_FNR) attain strictly lower crime rates than equalizing disincentives (β*_∆). Further, all three attain strictly higher crime rates than the optimal policy (β*).

The formal proof is in the appendix. For some intuition, note that the maximal disincentive ∆̄_g is determined by, and increasing in, m_g/σ_g. Intuitively, the larger the normalized marginal effect of crime on the signal, the better the adjudicator is able to distinguish the criminal from the non-criminal on the basis of the signal, and therefore the larger the disincentive to commit crime the adjudicator can provide. If this ratio is equal across groups, then Theorem 5.1 applies and part (1) follows as a corollary. Now suppose instead, without loss of generality, that m_2/σ_2 < m_1/σ_1. Then ∆̄_1 > ∆̄_2. It can also be verified that condition (3) in Theorem 5.3 holds, so that equalizing false positive rates and equalizing false negative rates always yield a lower crime rate than equalizing disincentives. Furthermore, we can also verify that neither equalizing false positive rates nor equalizing false negative rates ever attains the crime rate under the optimal policy.

A natural question to ask is whether one of equalizing false positive rates or equalizing false negative rates yields a lower crime rate than the other. It is a hard question to answer in general.
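The ratio property underlying Theorem 5.5 is easy to verify numerically. In the sketch below (a logistic f and hypothetical parameter values), two groups with different µ_g, σ_g, m_g but the same ratio m_g/σ_g attain the same maximal disincentive, while a larger ratio yields a strictly larger one:

```python
import math

F = lambda x: 1 / (1 + math.exp(-x))  # logistic cdf; its density is log-concave

def max_disincentive(mu, sigma, m, grid=20000, span=50.0):
    """Grid-search max over thresholds T of
    Delta(T) = F((T - mu)/sigma) - F((T - mu - m)/sigma)."""
    best = 0.0
    for k in range(grid + 1):
        T = mu - span / 2 + span * k / grid
        best = max(best, F((T - mu) / sigma) - F((T - mu - m) / sigma))
    return best

d1 = max_disincentive(mu=0.0, sigma=1.0, m=1.0)  # ratio m/sigma = 1
d2 = max_disincentive(mu=3.0, sigma=2.0, m=2.0)  # different mu, sigma, m; ratio = 1
d3 = max_disincentive(mu=0.0, sigma=1.0, m=2.0)  # ratio = 2: sharper signal
print(d1, d2, d3)
```

Substituting u = (T − µ_g)/σ_g shows ∆(T) = F(u) − F(u − m_g/σ_g), so only the normalized marginal effect m_g/σ_g matters; a larger ratio lets the adjudicator provide a larger maximal disincentive.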
When the underlying distribution f is symmetric around 0, however (taking 0 is without loss of generality, since the µ_g's can always be shifted if f is symmetric around some other point), the set of feasible disincentive pairs (∆_1, ∆_2) is identical under equalizing false positive rates and under equalizing false negative rates, and therefore the two notions of fairness yield the same crime rate.

Theorem 5.6.
Suppose the signal structure is from the location-scale family of Definition 5.4, and f is symmetric around 0. Then the optimal policy subject to equalizing false positive rates and that subject to equalizing false negative rates yield the same crime rate.

6 Equalizing Crime Rates

In our model, crime rates are endogenously determined by agents' decisions in response to the policy implemented. This is unlike most of the fairness literature, which assumes that the underlying rates are fixed. It motivates us to study a fairness measure that previous papers could not have asked for: equalizing crime rates.
Figure 2: The case ∆̄_1 + ε < ∆̄_2; the curves H_1(∆) and H_2(∆) are plotted against ∆, with the policies equalizing disincentives and equalizing crime rates marked.

To understand the implications of equalizing crime rates, let us assume without loss of generality that group 2 is ‘riskier’ than group 1. Specifically, assume that H_2 first-order stochastically dominates H_1, that is, H_2(∆) ≥ H_1(∆) for all ∆.

In this section, we focus on the comparison between equalizing crime rates and equalizing disincentives, while allowing arbitrary signal structures F_g = (F^c_g, F^i_g). Equalizing disincentives is an appropriate fairness measure to compare against because it is attained by the optimal policy when ∆̄_1 = ∆̄_2.

The first question to ask is whether equalizing crime rates can ever attain a lower crime rate than equalizing disincentives. We find that equalizing crime rates attains a lower crime rate than equalizing disincentives if and only if ∆̄_2 is sufficiently larger than ∆̄_1.

Theorem 6.1.
Suppose that H_2 first-order stochastically dominates H_1. Then there is an ε > 0 such that equalizing crime rates attains a lower crime rate than equalizing disincentives if and only if ∆̄_2 ≥ ∆̄_1 + ε.

The theorem is best demonstrated using diagrams, although the formal proof is provided in the appendix. For Theorem 6.1 we consider four cases: (i) ∆̄_1 + ε ≤ ∆̄_2, (ii) ∆̄_1 < ∆̄_2 < ∆̄_1 + ε, (iii) ∆̄_1 = ∆̄_2, and (iv) ∆̄_1 > ∆̄_2. For initial illustration purposes, we focus on Figure 2, which corresponds to the first case, ∆̄_1 + ε < ∆̄_2. The red and blue curves represent H_1(·) and H_2(·), respectively. The ‘thicker’ segment of each curve is the set of crime rates H_g(∆_g) that can be achieved by varying the disincentive ∆_g ∈ [∆̲_g, ∆̄_g]. For each group, we denote the optimal policy by a triangle. The optimal policy subject to equalizing disincentives, denoted by ‘X’, is obtained as the intersections of the black line with the outside option distribution functions. The optimal policy subject to equalizing crime rates, denoted by ‘o’, is obtained as the intersections of the orange line with the outside option distribution functions. In Figure 2, note that for group 1 the crime rate stays the same under equalizing disincentives and under equalizing crime rates, while for group 2 the crime rate falls once one moves from the optimal policy that equalizes disincentives to the one that equalizes crime rates. Therefore, the optimal policy subject to equalizing crime rates is preferred to that under equalizing disincentives.

Figure 3 shows the other cases. Note that as we decrease ∆̄_2 (or equivalently increase the crime rate of group 2), there comes a point, determined by N_1 and N_2, at which equalizing crime rates stops being preferred to equalizing disincentives.
More specifically, in Figure 3a, imagine moving the rightmost blue triangle to the left and hence raising the orange line; as this happens, both ‘o’ marks, which denote the optimal policy that equalizes crime rates, must move up, while the optimal policy that equalizes disincentives stays the same. Therefore, depending on the ratio of the group sizes N_1 and N_2, there exists some ε such that equalizing crime rates attains a lower crime rate than equalizing disincentives if and only if ∆̄_2 ≥ ∆̄_1 + ε. It is also easy to see from Figures 3b and 3c that equalizing disincentives achieves a lower crime rate than equalizing crime rates in the corresponding cases. Together, these arguments imply that equalizing crime rates attains a lower crime rate than equalizing disincentives if and only if ∆̄_2 is sufficiently larger than ∆̄_1.
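A small numerical sketch illustrates this threshold behavior. The survivor functions, group sizes, and disincentive bounds below are hypothetical: with a small gap between ∆̄_2 and ∆̄_1, equalizing disincentives yields less total crime, while with a large gap, equalizing crime rates does:

```python
import math

H1 = lambda d: 1 / (1 + math.exp(d))        # group 1 crime rate at disincentive d
H2 = lambda d: 1 / (1 + math.exp(d - 1.0))  # group 2 riskier: H2(d) >= H1(d)
H1_inv = lambda c: math.log(1 / c - 1)      # inverse of H1
H2_inv = lambda c: math.log(1 / c - 1) + 1.0

N1, N2 = 9.0, 1.0  # group sizes (the crossover point depends on this ratio)
dbar1 = 1.0        # maximal disincentive of group 1

def total_crime(d1, d2):
    return N1 * H1(d1) + N2 * H2(d2)

def equalize_disincentives(dbar2):
    d = min(dbar1, dbar2)  # common disincentive, capped by the smaller bound
    return total_crime(d, d)

def equalize_crime_rates(dbar2):
    # Lowest common crime rate feasible for both groups, then back out the
    # disincentives that attain it.
    c = max(H1(dbar1), H2(dbar2))
    return total_crime(H1_inv(c), H2_inv(c))

for dbar2 in (1.2, 3.0):
    print(dbar2, equalize_disincentives(dbar2), equalize_crime_rates(dbar2))
```

With dbar2 = 1.2 the gap is below the ε of the theorem and equalizing disincentives wins; with dbar2 = 3.0 the gap is large enough that equalizing crime rates yields strictly less total crime.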
Figure 3: Each panel plots H_1(∆) and H_2(∆) against the disincentive ∆. (a) ∆̄_1 < ∆̄_2 < ∆̄_1 + ε; (b) ∆̄_1 = ∆̄_2; (c) ∆̄_1 > ∆̄_2.
Figure 4: The case H_1(∆̄_1) = H_2(∆̄_2).

Having verified that equalizing crime rates can attain a lower crime rate than equalizing disincentives, the next question is whether equalizing crime rates can ever attain a lower crime rate than every other fairness notion. The answer is positive, which we establish by finding a condition under which the optimal policy itself equalizes crime rates.

Theorem 6.2.
Suppose that H_2 first-order stochastically dominates H_1. When H_1(∆̄_1) = H_2(∆̄_2), the optimal policy equalizes crime rates, but not necessarily false positive rates, false negative rates, or disincentives in general.

Figure 4 depicts the case H_1(∆̄_1) = H_2(∆̄_2). By construction, the optimal policy equalizes crime rates. As can be seen from Figure 4, equalizing disincentives attains a strictly higher crime rate than equalizing crime rates. Furthermore, it can be shown that the other notions of fairness (equalizing false positive rates, false negative rates, and positive predictive value) are not satisfied in general.

7 Conclusion

This paper gives a general model in which classification rules that equalize false positive and false negative rates can be compatible with natural objectives, in spite of failing to capitalize on statistically relevant information. We derived the model using the language of criminal justice, but one could just as easily apply the base model to settings in which the principal makes some other binary decision based on partial information, such as a lending or employment decision. The underlying reason is that conditioning on demographic information, while statistically useful, leads to decision rules that incentivize different groups differently, because demographic information is not under individual control. Hence, in settings in which the underlying objective depends on the decisions of rational agents, the decision rule should explicitly commit not to condition on information that relates to an individual's demographic group, and instead use only information that is affected by the choices of the individual. Abstracting away, the necessary conditions under which our conclusions hold are that:

1. The underlying base rates are rationally responsive to the decision rule deployed by the principal,

2. Signals are observed by the adjudicator at the same rates across populations, and

3.
The signals that the adjudicator must use to make her decision are conditionally independent of an individual's group, conditioned on the individual's decision.

Here, conditions (2) and (3) are unlikely to hold precisely in most situations, but we give settings under which they can be relaxed. More generally, if we are in a setting in which we believe that individual decisions are rationally made in response to the deployed classifier, and yet the deployed classifier does not equalize false positive and false negative rates, then this is an indication either that the deployed classifier is sub-optimal (for the purpose of minimizing base rates), or that one of conditions (2) and (3) fails to hold. Since in fairness-relevant settings the failure of conditions (2) and (3) is itself undesirable, this can serve as a diagnostic to highlight discriminatory conditions earlier in the pipeline than the adjudicator's decision rule. In particular, if conditions (2) or (3) fail to hold, then imposing technical fairness constraints on a deployed classifier may be premature; attention should instead be focused on structural differences in the observations that are being fed into the deployed classifier.

References
Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner. Machine bias. ProPublica, 2016.

Shamena Anwar and Hanming Fang. An alternative test of racial prejudice in motor vehicle searches: Theory and evidence. American Economic Review, 96(1):127–151, 2006.

Kenneth J Arrow. Some mathematical models of race discrimination in the labor market. Racial Discrimination in Economic Life, pages 187–204, 1972.

Solon Barocas, Moritz Hardt, and Arvind Narayanan. Fairness and Machine Learning. fairmlbook.org, 2018.

Anna Maria Barry-Jester, Ben Casselman, and Dana Goldstein. The new science of sentencing. The Marshall Project, August 8, 2015. Retrieved 4/28/2016.

Gary S Becker. The Economics of Discrimination. University of Chicago Press, 2010.

Nanette Byrnes. Artificial intolerance. MIT Technology Review, March 28, 2016. Retrieved 4/28/2016.

Alexandra Chouldechova. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. arXiv preprint arXiv:1703.00056, 2017.

Stephen Coate and Glenn C Loury. Will affirmative-action policies eliminate negative stereotypes? The American Economic Review, pages 1220–1240, 1993.

Sam Corbett-Davies and Sharad Goel. The measure and mismeasure of fairness: A critical review of fair machine learning. arXiv preprint arXiv:1808.00023, 2018.

Sam Corbett-Davies, Emma Pierson, Avi Feller, Sharad Goel, and Aziz Huq. Algorithmic decision making and the cost of fairness. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 797–806. ACM, 2017.

Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. Fairness through awareness. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, pages 214–226. ACM, 2012.

Hanming Fang and Andrea Moro. Theories of statistical discrimination and affirmative action: A survey. In Handbook of Social Economics, volume 1, pages 133–200. Elsevier, 2011.

Avi Feller, Emma Pierson, Sam Corbett-Davies, and Sharad Goel. A computer program used for bail and sentencing decisions was labeled biased against blacks. It's actually not that clear. The Washington Post, 2016.

Michael F Ferguson and Stephen R Peters. What constitutes evidence of discrimination in lending? The Journal of Finance, 50(2):739–748, 1995.

Dean P Foster and Rakesh V Vohra. An economic argument for affirmative action. Rationality and Society, 4(2):176–188, 1992.

Sorelle A Friedler, Carlos Scheidegger, and Suresh Venkatasubramanian. On the (im)possibility of fairness. arXiv preprint arXiv:1609.07236, 2016.

Moritz Hardt, Eric Price, and Nati Srebro. Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems, pages 3315–3323, 2016.

Úrsula Hébert-Johnson, Michael Kim, Omer Reingold, and Guy Rothblum. Multicalibration: Calibration for the (computationally-identifiable) masses. In International Conference on Machine Learning, pages 1944–1953, 2018.

Lily Hu and Yiling Chen. A short-term intervention for long-term fairness in the labor market. In Proceedings of the 2018 World Wide Web Conference, pages 1389–1398, 2018.

Sampath Kannan, Aaron Roth, and Juba Ziani. Downstream effects of affirmative action. In Proceedings of the Conference on Fairness, Accountability, and Transparency, pages 240–248, 2019.

Michael Kearns, Seth Neel, Aaron Roth, and Zhiwei Steven Wu. Preventing fairness gerrymandering: Auditing and learning for subgroup fairness. In International Conference on Machine Learning, pages 2569–2577, 2018.

Jon Kleinberg, Sendhil Mullainathan, and Manish Raghavan. Inherent trade-offs in the fair determination of risk scores. arXiv preprint arXiv:1609.05807, 2016.

John Knowles, Nicola Persico, and Petra Todd. Racial bias in motor vehicle searches: Theory and evidence. Journal of Political Economy, 109(1):203–229, 2001.

Helen F Ladd. Evidence on discrimination in mortgage lending. Journal of Economic Perspectives, 12(2):41–62, 1998.

Lydia T Liu, Sarah Dean, Esther Rolf, Max Simchowitz, and Moritz Hardt. Delayed impact of fair machine learning. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, pages 6196–6200. AAAI Press, 2019a.

Lydia T Liu, Max Simchowitz, and Moritz Hardt. The implicit fairness criterion of unconstrained learning. In International Conference on Machine Learning, pages 4051–4060, 2019b.

Lydia T Liu, Ashia Wilson, Nika Haghtalab, Adam Tauman Kalai, Christian Borgs, and Jennifer Chayes. The disparate equilibria of algorithmic decision making when individuals invest rationally. arXiv preprint arXiv:1910.04123, 2019c.

Glenn Loury. A dynamic theory of racial income differences. In Women, Minorities, and Employment Discrimination, volume 153, pages 86–153. Heath, Lexington, MA, 1977.

Clair C Miller. Can an algorithm hire better than a human? The New York Times, June 25, 2015. Retrieved 4/28/2016.

Ojmarrh Mitchell and Michael S Caudy. Examining racial disparities in drug arrests. Justice Quarterly, 32(2):288–313, 2015.

Hussein Mouzannar, Mesrob I Ohannessian, and Nathan Srebro. From fair decision making to social equality. In Proceedings of the Conference on Fairness, Accountability, and Transparency, pages 359–368, 2019.

Ziad Obermeyer, Brian Powers, Christine Vogeli, and Sendhil Mullainathan. Dissecting racial bias in an algorithm used to manage the health of populations. Science, 366(6464):447–453, 2019.

Cathy O'Neil. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. Crown, 2016.

Nicola Persico. Racial profiling, fairness, and effectiveness of policing. The American Economic Review, 92(5):1472–1497, 2002.

Aaron Roth and Michael Kearns. The Ethical Algorithm. Oxford University Press, 2019.

Cynthia Rudin. Predictive policing: Using machine learning to detect patterns of crime. Wired Magazine, August 2013. Retrieved 4/28/2016.

Camelia Simoiu, Sam Corbett-Davies, Sharad Goel, et al. The problem of infra-marginality in outcome tests for discrimination. The Annals of Applied Statistics, 11(3):1193–1216, 2017.
A Omitted Results and Proofs
Theorem 4.1. The optimal solutions to both the first best (5) and second best (6) equalize the CFPR and CTPR across groups.

Proof of Theorem 4.1.
We show that the optimal disincentives {∆*_g}_g in both cases must be {∆̄_g}_g, which entails equalizing CFPR and CTPR.

For the first best outcome, it is easy to see that given any inspection intensities {θ_g}_{g∈G}, minimizing the overall crime rate corresponds to maximizing {∆_g}_g. It follows that at the optimal set of search intensities and signal thresholds, the first best solution must have ∆*_g = ∆̄_g for each g.

For the second best outcome, given any feasible solution ({θ_g}_g, {∆_g}_g) to (6), we show that if there exists g such that ∆_g is not maximized (i.e. ∆_g < ∆̄_g), then we can always find a new feasible solution ({θ′_g}_g, {∆′_g}_g) to (6) that sets ∆′_g = ∆̄_g, while keeping the other disincentives the same (∆′_{g′} = ∆_{g′}), with strictly lower overall crime rate. This shows that the optimal solution to (6) must set ∆*_g = ∆̄_g for each g.

Without loss of generality, assume that ∆_1 < ∆̄_1 is not maximized. Let ∆′_1 = ∆̄_1 = (1 + ε)∆_1 for some ε > 0, and ∆′_2 = ∆_2. Now, consider setting a new inspection intensity for group 1, θ′_1 ∈ (θ_1/(1 + ε), θ_1), which guarantees that the crime rate in group 1 is strictly lower than before:

θ′_1 ∆′_1 > θ_1 ∆_1 ⟹ H_1(θ′_1 ∆′_1) < H_1(θ_1 ∆_1).

Because Σ_g N_g θ_g = S, decreasing θ_1 to θ′_1 will require increasing θ_2 to some θ′_2 > θ_2, and the crime rate for group 2 then necessarily decreases:

θ′_2 ∆′_2 > θ_2 ∆_2 ⟹ H_2(θ′_2 ∆′_2) < H_2(θ_2 ∆_2).

By the continuity of H_g, there exists θ′_1 ∈ (θ_1/(1 + ε), θ_1) such that H_1(θ′_1 ∆′_1) = H_2(θ′_2 ∆′_2), so the new solution is feasible for (6). Since the crime rates in both groups have decreased, we have found a better feasible solution to (6).

Note that this proof generalizes to more than two groups: we can aggregate the collection of groups whose ∆_g are unchanged into one 'super' group, apply the same argument as above, and use induction over the number of groups.

Theorem 4.2.
Suppose that the H_g belong to the same location family, i.e. H_g(s) = H(s − µ_g) for some µ_g for each g ∈ G, and that H is convex (concave). Then the inspection intensities in the second best solution minimize (maximize) the crime rate among all thresholds that equalize the conditional false positive rates and conditional true positive rates, CFPR and CTPR.

Proof of Theorem 4.2. We first provide a high level sketch of the proof. Using the fact that both H_g's are from the same location family (i.e. mean shifted), we show that the equilibrium inspection intensities (i.e. the second best solution) set the derivative of the objective to 0. Then, using the convexity (concavity) of H, we show that the second derivative of the overall crime rate is positive (negative), so that the equilibrium inspection intensities achieve a local minimum (maximum).

First, we show that if the H_g belong to the same location family, then the derivative of the objective evaluated at the equilibrium inspection intensities is 0. Denote the equilibrium inspection intensities by {θ*_g}_g and the equilibrium disincentives by {∆*_g}_g. Recall from Theorem 4.1 that (∆*_1, ∆*_2) in both the first and second best solution correspond to (∆̄_1, ∆̄_2).

The equilibrium inspection intensities equalize the crime rates. Hence, writing h for the density of H and h_g for that of H_g, we have

H_1(θ*_1 ∆*_1) = H_2(θ*_2 ∆*_2)
⟹ H(θ*_1 ∆*_1 − µ_1) = H(θ*_2 ∆*_2 − µ_2)
⟹ θ*_1 ∆*_1 − µ_1 = θ*_2 ∆*_2 − µ_2
⟹ h(θ*_1 ∆*_1 − µ_1) = h(θ*_2 ∆*_2 − µ_2)
⟹ h_1(θ*_1 ∆*_1) = h_2(θ*_2 ∆*_2).   (A.1)

Replacing θ_2 = (S − N_1 θ_1)/N_2 and taking the derivative of the overall crime rate with respect to θ_1 yields

N_1 ∆_1 h_1(θ_1 ∆_1) + N_2 h_2( ((S − N_1 θ_1)/N_2) ∆_2 ) ( −(N_1/N_2) ∆_2 ) = N_1 ∆_1 h_1(θ_1 ∆_1) − N_1 ∆_2 h_2(θ_2 ∆_2).

Note that by equation (A.1) and ∆*_1 = ∆*_2, the derivative of the overall crime rate evaluates to 0 under {θ*_g}_g and {∆*_g}_g.

Now, in order to determine whether {θ*_g}_g achieves a local minimum or maximum, we calculate the second derivative of the overall crime rate with respect to θ_1:

N_1 ∆_1² h′_1(θ_1 ∆_1) − N_1 ∆_2 h′_2( ((S − N_1 θ_1)/N_2) ∆_2 ) ( −(N_1/N_2) ∆_2 ) = N_1 ∆_1² h′_1(θ_1 ∆_1) + (N_1²/N_2) ∆_2² h′_2( ((S − N_1 θ_1)/N_2) ∆_2 ).

By the convexity (concavity) of H, h′_1 and h′_2 are positive (negative). Therefore, {θ*_g}_g achieves a local minimum (maximum) at the equilibrium inspection intensities.

Theorem 5.1.
Suppose that $\bar\Delta_1 = \bar\Delta_2$. Then the adjudicator's optimal policy (i.e. the solution to (OPT)) equalizes disincentives ($\Delta$) across groups.

Proof of Theorem 5.1. This follows directly from the fact that $(\bar\Delta_1, \bar\Delta_2)$ is the adjudicator's optimal policy: when $\bar\Delta_1 = \bar\Delta_2$, the optimal policy already gives both groups the same disincentive.

Theorem 5.3.
Suppose that $\bar\Delta_2 > \bar\Delta_1$. Then the following are equivalent:

1. The optimal policy subject to equalizing false positive rates ($\beta^\star_{FPR}$) attains a (weakly) lower crime rate than equalizing disincentives ($\beta^\star_\Delta$).

2. The optimal policy subject to equalizing false negative rates ($\beta^\star_{FNR}$) attains a (weakly) lower crime rate than equalizing disincentives ($\beta^\star_\Delta$).

3. $(F^c_2)^{-1} \circ F^c_1(T^*_1) > (\ge)\; (F^i_2)^{-1} \circ F^i_1(T^*_1)$, where $T^*_g$ is the threshold under the optimal policy for group $g$.

Proof of Theorem 5.3. Let us begin by proving the following lemma.
Lemma A.1.
Suppose that $\bar\Delta_2 > \bar\Delta_1$. Let $\Delta^{FPR}_g$, $\Delta^{FNR}_g$ and $\Delta^{\Delta}_g$ be the disincentives under the optimal policy subject to each fairness notion $FPR$, $FNR$, and $\Delta$ respectively. Suppose further that the optimal policies subject to equalizing false positive rates, false negative rates and disincentives are threshold policies.

1. Equalizing false positive rates attains a (weakly) lower crime rate than equalizing disincentives for all $(h_g)_g$ if and only if $\Delta^{FPR}_g > (\ge)\,\Delta^{\Delta}_g$ for all $g$.
2. Equalizing false negative rates attains a (weakly) lower crime rate than equalizing disincentives for all $(h_g)_g$ if and only if $\Delta^{FNR}_g > (\ge)\,\Delta^{\Delta}_g$ for all $g$.

Proof of Lemma A.1.
By definition,
\[
\Delta^\xi_g = \int_{\mathbb{R}} f^c_g(s)\,\beta^\xi_g(s)\,ds - \int_{\mathbb{R}} f^i_g(s)\,\beta^\xi_g(s)\,ds
\]
is the disincentive under the optimal policy $\beta^\xi \in \arg\min_{\beta \in B_\xi} \sum_{g \in \mathcal{G}} N_g\,CR_g$ subject to each fairness notion $\xi \in \{FPR, FNR, \Delta\}$.

It is straightforward that $\Delta^{FPR}_g > (\ge)\,\Delta^\Delta_g$ for all $g$ implies that equalizing false positive rates attains a (weakly) lower crime rate than equalizing disincentives: $H_g$ is a non-increasing function, and therefore, for all $g$,
\[
\Delta^{FPR}_g > (\ge)\,\Delta^\Delta_g \iff N_g H_g(\Delta^{FPR}_g) < (\le)\; N_g H_g(\Delta^\Delta_g).
\]
Conversely, if equalizing false positive rates attains a (weakly) lower crime rate than equalizing disincentives for all $(h_g)_g$, then $\Delta^{FPR}_g > (\ge)\,\Delta^\Delta_g$ for all $g$. To show this, suppose not. Without loss of generality, suppose that $\Delta^{FPR}_1 < (\le)\,\Delta^\Delta_1$. If $\Delta^{FPR}_2 \le \Delta^\Delta_2$, then any pair of survivor functions $(H_1, H_2)$ that are strictly decreasing in their arguments implies $H_1(\Delta^{FPR}_1) > H_1(\Delta^\Delta_1)$ and $H_2(\Delta^{FPR}_2) \ge H_2(\Delta^\Delta_2)$, so that equalizing disincentives attains a strictly lower crime rate than equalizing false positive rates. If $\Delta^{FPR}_2 > \Delta^\Delta_2$, then an $H_1$ for which the difference between its values at $\Delta^{FPR}_1$ and at $\Delta^\Delta_1$ is large enough, together with an $H_2$ for which the difference between its values at $\Delta^\Delta_2$ and at $\Delta^{FPR}_2$ is small enough, results in a lower crime rate for equalizing disincentives than for equalizing false positive rates. More specifically, let $H_1$ and $H_2$ be such that $N_1\big(H_1(\Delta^{FPR}_1) - H_1(\Delta^\Delta_1)\big) > \epsilon$ and $N_2\big(H_2(\Delta^\Delta_2) - H_2(\Delta^{FPR}_2)\big) < \epsilon$ for some $\epsilon > 0$. Then,
\[
\Big(N_1 H_1(\Delta^{FPR}_1) + N_2 H_2(\Delta^{FPR}_2)\Big) - \Big(N_1 H_1(\Delta^\Delta_1) + N_2 H_2(\Delta^\Delta_2)\Big) > 0.
\]
In either case, equalizing false positive rates does not attain a (weakly) lower crime rate, a contradiction; hence it cannot be that $\Delta^{FPR}_g < (\le)\,\Delta^\Delta_g$ for some $g$. The same argument applies to equalizing false negative rates.

Because we are assuming signal-threshold strategies by the adjudicator, we write $FPR_g(T_g)$, $FNR_g(T_g)$, and $TPR_g(T_g)$ to denote the false positive, false negative, and true positive rates when the signal threshold $T_g$ is used.

We first show the equivalence between conditions (1) and (3) of the theorem. Before doing so, we characterize $\{\Delta^{FPR}_g\}_g$ and $\{\Delta^\Delta_g\}_g$. Since $\bar\Delta_2 > \bar\Delta_1$, it must be that $\Delta^\Delta_2 = \Delta^\Delta_1 = \bar\Delta_1$. As for condition (1), by Lemma A.1 we have $\Delta^{FPR}_g \ge \Delta^\Delta_g$ for all $g$. Thus, for (1), $\Delta^{FPR}_1 = \Delta^\Delta_1 = \bar\Delta_1$, which implies $T^{FPR}_1 = T^\Delta_1 = T^*_1$. For $FPR_2 = FPR_1$, we have
\[
FPR_2(T^{FPR}_2) = FPR_1(T^{FPR}_1) = FPR_1(T^\Delta_1) = 1 - F^i_1(T^*_1),
\]
or equivalently,
\[
F^i_2(T^{FPR}_2) = F^i_1(T^{FPR}_1) = F^i_1(T^\Delta_1) = F^i_1(T^*_1).
\]
Now, we show the equivalence:
\[
\begin{aligned}
\Delta^{FPR}_2 \ge \Delta^\Delta_2 &\iff TPR_2(T^{FPR}_2) - FPR_2(T^{FPR}_2) \ge TPR_1(T^\Delta_1) - FPR_1(T^\Delta_1) && (\Delta^\Delta_2 = \Delta^\Delta_1)\\
&\iff TPR_2(T^{FPR}_2) \ge TPR_1(T^\Delta_1) && (FPR_2(T^{FPR}_2) = FPR_1(T^\Delta_1) = FPR_1(T^*_1))\\
&\iff F^c_1(T^\Delta_1) \ge F^c_2(T^{FPR}_2)\\
&\iff F^c_1(T^*_1) \ge F^c_2(T^{FPR}_2)\\
&\iff F^c_1(T^*_1) \ge F^c_2\big((F^i_2)^{-1} \circ F^i_1(T^*_1)\big) && (T^{FPR}_2 = (F^i_2)^{-1} \circ F^i_1(T^{FPR}_1) = (F^i_2)^{-1} \circ F^i_1(T^*_1))\\
&\iff (F^c_2)^{-1} \circ F^c_1(T^*_1) \ge (F^i_2)^{-1} \circ F^i_1(T^*_1).
\end{aligned}
\]
A similar logic applies to equalizing false negative rates. Once again, by Lemma A.1, $\Delta^{FNR}_g \ge \Delta^\Delta_g$ for all $g$, which implies $\Delta^{FNR}_1 = \Delta^\Delta_1 = \bar\Delta_1$ and hence $T^{FNR}_1 = T^\Delta_1 = T^*_1$. For $FNR_2 = FNR_1$, we have
\[
TPR_2(T^{FNR}_2) = TPR_1(T^{FNR}_1) = TPR_1(T^\Delta_1) = TPR_1(T^*_1),
\]
or equivalently,
\[
F^c_2(T^{FNR}_2) = F^c_1(T^{FNR}_1) = F^c_1(T^\Delta_1) = F^c_1(T^*_1).
\]
The equivalence follows, as
\[
\begin{aligned}
\Delta^{FNR}_2 \ge \Delta^\Delta_2 &\iff TPR_2(T^{FNR}_2) - FPR_2(T^{FNR}_2) \ge TPR_1(T^\Delta_1) - FPR_1(T^\Delta_1) && (\Delta^\Delta_2 = \Delta^\Delta_1)\\
&\iff -FPR_2(T^{FNR}_2) \ge -FPR_1(T^\Delta_1) && (TPR_2(T^{FNR}_2) = TPR_1(T^\Delta_1) = TPR_1(T^*_1))\\
&\iff F^i_2(T^{FNR}_2) \ge F^i_1(T^\Delta_1)\\
&\iff F^i_2(T^{FNR}_2) \ge F^i_1(T^*_1)\\
&\iff F^i_2\big((F^c_2)^{-1} \circ F^c_1(T^*_1)\big) \ge F^i_1(T^*_1) && (T^{FNR}_2 = (F^c_2)^{-1} \circ F^c_1(T^{FNR}_1) = (F^c_2)^{-1} \circ F^c_1(T^*_1))\\
&\iff (F^c_2)^{-1} \circ F^c_1(T^*_1) \ge (F^i_2)^{-1} \circ F^i_1(T^*_1).
\end{aligned}
\]

Theorem 5.5.
Suppose the distributions across groups are from the location-scale family as defined in Definition 5.4. Then:

1. If $m_1/\sigma_1 = m_2/\sigma_2$, then the optimal policy equalizes disincentives, false positive rates and false negative rates.

2. Suppose $m_1/\sigma_1 \ne m_2/\sigma_2$, and assume $m_2/\sigma_2$ is larger without loss of generality. Then $\bar\Delta_2 > \bar\Delta_1$. Further, the optimal policy subject to equalizing false positive rates ($\beta^\star_{FPR}$) and that subject to equalizing false negative rates ($\beta^\star_{FNR}$) attain strictly lower crime rates than equalizing disincentives ($\beta^\star_\Delta$). Further, all three attain strictly higher crime rates than the optimal policy ($\beta^*$).

Proof of Theorem 5.5. Part (1)
Let $\bar T$ be such that $f(\bar T - r) = f(\bar T)$, where $r = m_1/\sigma_1 = m_2/\sigma_2$. Define $T_g = \mu_g + \sigma_g \bar T$. Then,
\[
-f^i_g(T_g - m_g) + f^i_g(T_g)
= -\frac{1}{\sigma_g} f\!\left(\frac{\mu_g + \sigma_g \bar T - \mu_g - m_g}{\sigma_g}\right) + \frac{1}{\sigma_g} f\!\left(\frac{\mu_g + \sigma_g \bar T - \mu_g}{\sigma_g}\right)
= -\frac{1}{\sigma_g} f\!\left(\bar T - \frac{m_g}{\sigma_g}\right) + \frac{1}{\sigma_g} f(\bar T)
= -\frac{1}{\sigma_g}\big(f(\bar T - r) - f(\bar T)\big) = 0.
\]
Therefore, $T^*_g = T_g = \mu_g + \sigma_g \bar T$. Note that $F^i_g(T^*_g) = F(\bar T)$ and $F^c_g(T^*_g) = F(\bar T - r)$ for both $g$. That is, the false negative and false positive rates are the same across the groups, and hence so are the disincentives.

Part (2). Suppose $m_1/\sigma_1 < m_2/\sigma_2$. Then $\bar\Delta_2 > \bar\Delta_1$, since the maximal disincentive $\bar\Delta_g = \max_T\big[F^i_g(T) - F^c_g(T)\big] = \max_x\big[F(x) - F(x - m_g/\sigma_g)\big]$ is increasing in $m_g/\sigma_g$. Moreover,
\[
(F^c_2)^{-1} \circ F^c_1(T) = \left(\frac{T - \mu_1 - m_1}{\sigma_1}\right)\sigma_2 + \mu_2 + m_2
\quad\text{and}\quad
(F^i_2)^{-1} \circ F^i_1(T) = \left(\frac{T - \mu_1}{\sigma_1}\right)\sigma_2 + \mu_2,
\]
so that
\[
(F^c_2)^{-1} \circ F^c_1(T) \ge (F^i_2)^{-1} \circ F^i_1(T)
\iff \left(\frac{T - \mu_1 - m_1}{\sigma_1}\right)\sigma_2 + \mu_2 + m_2 \ge \left(\frac{T - \mu_1}{\sigma_1}\right)\sigma_2 + \mu_2
\iff \frac{m_2}{\sigma_2} \ge \frac{m_1}{\sigma_1}.
\]
Therefore, by Theorem 5.3, equalizing false positive rates and equalizing false negative rates attain strictly lower crime rates than equalizing disincentives.
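As a quick numerical sanity check of the closed forms in Part (2) (an illustrative sketch, not part of the proof): Gaussian signals are a location-scale family, so both compositions are affine in $T$, and the inequality between them tracks the ordering of $m_g/\sigma_g$. All parameter values below are arbitrary illustrative choices.

```python
from statistics import NormalDist

# Assumed Gaussian instantiation of Definition 5.4:
# innocent signals F^i_g = N(mu_g, sigma_g), criminal signals F^c_g = N(mu_g + m_g, sigma_g).
mu1, s1, m1 = 0.0, 1.0, 1.0   # m1/s1 = 1.0
mu2, s2, m2 = 0.5, 2.0, 3.0   # m2/s2 = 1.5 > m1/s1
Fi1, Fc1 = NormalDist(mu1, s1), NormalDist(mu1 + m1, s1)
Fi2, Fc2 = NormalDist(mu2, s2), NormalDist(mu2 + m2, s2)

T = 0.7                                # an arbitrary threshold
lhs = Fc2.inv_cdf(Fc1.cdf(T))          # (F^c_2)^{-1} o F^c_1 (T)
rhs = Fi2.inv_cdf(Fi1.cdf(T))          # (F^i_2)^{-1} o F^i_1 (T)

# Closed forms from the proof: both compositions are affine maps of T.
lhs_closed = (T - mu1 - m1) / s1 * s2 + mu2 + m2
rhs_closed = (T - mu1) / s1 * s2 + mu2
print(lhs, lhs_closed, rhs, rhs_closed)   # lhs > rhs here, since m2/s2 > m1/s1
```

Swapping the parameters so that $m_2/\sigma_2 < m_1/\sigma_1$ reverses the inequality, as the last equivalence in the proof predicts.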
Theorem 5.6.
Suppose the signal structure is from the location-scale family as in Definition 5.4, and $f$ is symmetric around 0. Then, the optimal policy subject to equalizing false positive rates and that subject to equalizing false negative rates yield the same crime rate.

Proof of Theorem 5.6. Let $(T_1, T_2)$ be thresholds that equalize false positive rates, that is,
\[
1 - F\!\left(\frac{T_1 - \mu_1}{\sigma_1}\right) = 1 - F\!\left(\frac{T_2 - \mu_2}{\sigma_2}\right).
\]
The disincentive for group $g$ is
\[
F\!\left(\frac{T_g - \mu_g}{\sigma_g}\right) - F\!\left(\frac{T_g - \mu_g - m_g}{\sigma_g}\right). \tag{A.2}
\]
Let $T'_g = 2\mu_g + m_g - T_g$ for each $g$. Then,
\[
\frac{T'_g - \mu_g}{\sigma_g} = -\frac{T_g - \mu_g - m_g}{\sigma_g}
\quad\text{and}\quad
\frac{T'_g - \mu_g - m_g}{\sigma_g} = -\frac{T_g - \mu_g}{\sigma_g}.
\]
Note that
\[
F\!\left(\frac{T'_1 - \mu_1 - m_1}{\sigma_1}\right) = F\!\left(-\frac{T_1 - \mu_1}{\sigma_1}\right) = 1 - F\!\left(\frac{T_1 - \mu_1}{\sigma_1}\right) = 1 - F\!\left(\frac{T_2 - \mu_2}{\sigma_2}\right) = F\!\left(-\frac{T_2 - \mu_2}{\sigma_2}\right) = F\!\left(\frac{T'_2 - \mu_2 - m_2}{\sigma_2}\right),
\]
where the second and fourth equalities follow from the symmetry around 0, and the third equality from $(T_1, T_2)$ equalizing false positive rates. Therefore, $(T'_1, T'_2)$ equalize false negative rates.

Furthermore, the disincentive under $T'_g$ is
\[
F\!\left(\frac{T'_g - \mu_g}{\sigma_g}\right) - F\!\left(\frac{T'_g - \mu_g - m_g}{\sigma_g}\right)
= F\!\left(-\frac{T_g - \mu_g - m_g}{\sigma_g}\right) - F\!\left(-\frac{T_g - \mu_g}{\sigma_g}\right)
= \left(1 - F\!\left(\frac{T_g - \mu_g - m_g}{\sigma_g}\right)\right) - \left(1 - F\!\left(\frac{T_g - \mu_g}{\sigma_g}\right)\right)
= F\!\left(\frac{T_g - \mu_g}{\sigma_g}\right) - F\!\left(\frac{T_g - \mu_g - m_g}{\sigma_g}\right),
\]
which is the disincentive under $T_g$. Therefore, any pair of disincentives that is feasible under equalizing false positive rates is also feasible under equalizing false negative rates.

A symmetric argument applies in the other direction. Therefore, the sets of feasible pairs of disincentives are identical, and hence the lowest crime rates that can be attained by equalizing false positive rates and by equalizing false negative rates are identical.

Theorem 6.1.
Suppose that $H_1$ first-order stochastically dominates $H_2$. Then, there is an $\epsilon > 0$ such that equalizing crime rates attains a lower crime rate than equalizing disincentives if and only if $\bar\Delta_1 \ge \bar\Delta_2 + \epsilon$.

Proof. We fix $H_1$, $H_2$, and $\bar\Delta_2$, and consider varying $\bar\Delta_1$. There are four cases: (i) $\bar\Delta_2 + \epsilon \le \bar\Delta_1$, (ii) $\bar\Delta_2 < \bar\Delta_1 < \bar\Delta_2 + \epsilon$, (iii) $\bar\Delta_1 = \bar\Delta_2$, and (iv) $\bar\Delta_1 < \bar\Delta_2$, where $\epsilon > 0$ is such that
\[
N_1\big(H_1(\bar\Delta_2 + \epsilon) - H_1(\bar\Delta_2)\big) + N_2\big(H_1(\bar\Delta_2 + \epsilon) - H_2(\bar\Delta_2)\big) = 0.
\]
Such an $\epsilon$ exists by the continuity of the $H_g$: when $\epsilon = 0$, the left-hand side equals $N_2\big(H_1(\bar\Delta_2) - H_2(\bar\Delta_2)\big)$, which is positive, and once $\epsilon'$ is big enough that $H_1(\bar\Delta_2 + \epsilon') = H_2(\bar\Delta_2)$, the value is negative. Therefore, by the intermediate value theorem, such an $\epsilon$ exists. Furthermore, note that for any $\epsilon'$ with $H_1(\bar\Delta_2 + \epsilon') < H_2(\bar\Delta_2)$, it must be that
\[
N_1\big(H_1(\bar\Delta_2 + \epsilon') - H_1(\bar\Delta_2)\big) + N_2\big(H_1(\bar\Delta_2 + \epsilon') - H_2(\bar\Delta_2)\big) < 0.
\]
Therefore, we have $H_1(\bar\Delta_2 + \epsilon) \ge H_2(\bar\Delta_2)$.

We write $\Delta^{CR}_g$ and $\Delta^\Delta_g$ to denote the optimal disincentives that minimize the overall crime rate while equalizing crime rates and disincentives, respectively.

Case (i): $\bar\Delta_2 + \epsilon \le \bar\Delta_1$. First, because $\bar\Delta_2 < \bar\Delta_1$, $\Delta^\Delta_1 = \Delta^\Delta_2 = \bar\Delta_2$. As for $\Delta^{CR}_g$, it depends on whether $H_1(\bar\Delta_1) \le H_2(\bar\Delta_2)$.

Consider first $H_1(\bar\Delta_1) \le H_2(\bar\Delta_2)$. Then we must have $\Delta^{CR}_2 = \bar\Delta_2$, and $\Delta^{CR}_1$ should be such that $H_1(\Delta^{CR}_1) = H_2(\bar\Delta_2)$. Because $H_1$ stochastically dominates $H_2$, we have $\Delta^\Delta_1 = \Delta^\Delta_2 < \Delta^{CR}_1$. By the monotonicity of $H_1$, it must be that $H_1(\Delta^\Delta_1) > H_1(\Delta^{CR}_1)$. Therefore,
\[
N_1 H_1(\Delta^\Delta_1) + N_2 H_2(\Delta^\Delta_2) > N_1 H_1(\Delta^{CR}_1) + N_2 H_2(\Delta^{CR}_2).
\]
Now, consider $H_1(\bar\Delta_1) > H_2(\bar\Delta_2)$. Then we must have $\Delta^{CR}_1 = \bar\Delta_1$, and $\Delta^{CR}_2$ should be such that $H_2(\Delta^{CR}_2) = H_1(\bar\Delta_1)$. Compare how each group's crime rate changes as we go from equalizing crime rates to equalizing disincentives: group 1 goes from $H_1(\bar\Delta_1)$ to $H_1(\bar\Delta_2)$, and group 2 goes from $H_2(\Delta^{CR}_2) = H_1(\bar\Delta_1)$ to $H_2(\bar\Delta_2)$. Therefore, the crime rate under equalizing crime rates exceeds that under equalizing disincentives by at most 0:
\[
N_1\big(H_1(\bar\Delta_1) - H_1(\bar\Delta_2)\big) + N_2\big(H_1(\bar\Delta_1) - H_2(\bar\Delta_2)\big)
\le N_1\big(H_1(\bar\Delta_2 + \epsilon) - H_1(\bar\Delta_2)\big) + N_2\big(H_1(\bar\Delta_2 + \epsilon) - H_2(\bar\Delta_2)\big) = 0,
\]
since $\bar\Delta_1 \ge \bar\Delta_2 + \epsilon$ implies $H_1(\bar\Delta_1) \le H_1(\bar\Delta_2 + \epsilon)$. Therefore, equalizing crime rates is better than equalizing disincentives.
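The $\epsilon$ construction and the case (i) comparison can be checked numerically; the sketch below uses assumed exponential survivor functions and arbitrary illustrative parameters, and is not part of the proof.

```python
import math

# Assumed exponential survivor functions H_g(x) = exp(-lam_g * x);
# lam1 < lam2 gives H_1(x) >= H_2(x) for x >= 0, i.e., H_1 FOSD H_2.
N1, N2 = 1.0, 1.0
lam1, lam2 = 0.5, 1.0
H1 = lambda x: math.exp(-lam1 * x)
H2 = lambda x: math.exp(-lam2 * x)
D2 = 1.0                                   # fixed Delta-bar_2

def phi(e):  # the expression whose root defines epsilon
    return N1 * (H1(D2 + e) - H1(D2)) + N2 * (H1(D2 + e) - H2(D2))

lo, hi = 0.0, 50.0                         # phi(0) > 0 and phi(50) < 0
for _ in range(100):                       # bisection for the root
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if phi(mid) > 0 else (lo, mid)
eps = (lo + hi) / 2
print(eps)                                 # ~0.44 for these parameters

# Total crime: equalizing crime rates (subcase H1(D1) > H2(D2)) vs. disincentives.
crime_CR = lambda D1: (N1 + N2) * H1(D1)   # both groups at the level H_1(Delta-bar_1)
crime_D  = N1 * H1(D2) + N2 * H2(D2)       # both groups at Delta-bar_2
```

Evaluating `crime_CR` just above and just below `D2 + eps` shows the comparison flipping exactly at the threshold, matching the statement of the theorem.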
Case (ii): $\bar\Delta_2 < \bar\Delta_1 < \bar\Delta_2 + \epsilon$. In this case, we know that $H_1(\bar\Delta_1) > H_1(\bar\Delta_2 + \epsilon) \ge H_2(\bar\Delta_2)$. For the optimal disincentive-equalizing policy, we have $\Delta^\Delta_1 = \Delta^\Delta_2 = \bar\Delta_2$. As for the crime-rate-equalizing policy, we have $\Delta^{CR}_1 = \bar\Delta_1$, and $\Delta^{CR}_2$ is chosen such that $H_2(\Delta^{CR}_2) = H_1(\bar\Delta_1)$. Now, compare how each group's crime rate changes as we go from equalizing crime rates to equalizing disincentives: group 1 goes from $H_1(\bar\Delta_1)$ to $H_1(\bar\Delta_2)$, and group 2 goes from $H_2(\Delta^{CR}_2) = H_1(\bar\Delta_1)$ to $H_2(\bar\Delta_2)$. Therefore, the crime rate under equalizing crime rates strictly exceeds that under equalizing disincentives, as
\[
N_1\big(H_1(\bar\Delta_1) - H_1(\bar\Delta_2)\big) + N_2\big(H_1(\bar\Delta_1) - H_2(\bar\Delta_2)\big)
> N_1\big(H_1(\bar\Delta_2 + \epsilon) - H_1(\bar\Delta_2)\big) + N_2\big(H_1(\bar\Delta_2 + \epsilon) - H_2(\bar\Delta_2)\big) = 0,
\]
since $\bar\Delta_1 < \bar\Delta_2 + \epsilon$ implies $H_1(\bar\Delta_1) > H_1(\bar\Delta_2 + \epsilon)$. Therefore, equalizing disincentives is better than equalizing crime rates in this case.

Case (iii): $\bar\Delta_1 = \bar\Delta_2$. First, $\Delta^\Delta_1 = \Delta^\Delta_2 = \bar\Delta_1 = \bar\Delta_2$. As for equalizing crime rates, $\Delta^{CR}_1 = \bar\Delta_1$, and $\Delta^{CR}_2$ is chosen such that $H_2(\Delta^{CR}_2) = H_1(\bar\Delta_1) > H_2(\bar\Delta_2)$. Therefore, we have
\[
N_1 H_1(\Delta^\Delta_1) + N_2 H_2(\Delta^\Delta_2) < N_1 H_1(\Delta^{CR}_1) + N_2 H_2(\Delta^{CR}_2),
\]
meaning equalizing disincentives is better than equalizing crime rates.

Case (iv): $\bar\Delta_1 < \bar\Delta_2$. First, $\Delta^\Delta_1 = \Delta^\Delta_2 = \bar\Delta_1$. As for equalizing crime rates, $\Delta^{CR}_1 = \bar\Delta_1$, and $\Delta^{CR}_2$ is chosen such that $H_2(\Delta^{CR}_2) = H_1(\bar\Delta_1) > H_2(\bar\Delta_1)$. Therefore, we have
\[
N_1 H_1(\Delta^\Delta_1) + N_2 H_2(\Delta^\Delta_2) < N_1 H_1(\Delta^{CR}_1) + N_2 H_2(\Delta^{CR}_2),
\]
meaning equalizing disincentives is better than equalizing crime rates.

Theorem 6.2.
Suppose that $H_1$ first-order stochastically dominates $H_2$. When $H_1(\bar\Delta_1) = H_2(\bar\Delta_2)$, the optimal policy equalizes crime rates, but it does not in general equalize false positive rates, false negative rates, or disincentives.

Proof. $(\bar\Delta_1, \bar\Delta_2)$ is the optimal policy, and it equalizes the crime rates: $H_1(\bar\Delta_1) = H_2(\bar\Delta_2)$. However, it is not guaranteed that the false positive/negative rates or the disincentives are equalized. For instance, if $\bar\Delta_1 \ne \bar\Delta_2$, then the disincentives are not equalized. And since $\Delta_g = (1 - FNR_g) - FPR_g$, the false positive/negative rates will not be equalized in general.
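Finally, the reflection construction in the proof of Theorem 5.6 can also be checked numerically. The sketch below uses assumed Gaussian signals (a symmetric location-scale family, as the theorem requires) with arbitrary illustrative parameters: starting from thresholds that equalize false positive rates, the reflected thresholds $T'_g = 2\mu_g + m_g - T_g$ equalize false negative rates while leaving each group's disincentive (A.2) unchanged.

```python
from statistics import NormalDist

# Assumed Gaussian signals: F^i_g = N(mu_g, sigma_g), F^c_g = N(mu_g + m_g, sigma_g);
# the standard normal density f is symmetric around 0.
mu = [0.0, 1.0]; s = [1.0, 1.5]; m = [2.0, 2.5]
Fi = [NormalDist(mu[g], s[g]) for g in range(2)]          # innocent signal CDFs
Fc = [NormalDist(mu[g] + m[g], s[g]) for g in range(2)]   # criminal signal CDFs

T = [0.8, None]
T[1] = Fi[1].inv_cdf(Fi[0].cdf(T[0]))            # choose T_2 so that FPRs are equal
Tp = [2*mu[g] + m[g] - T[g] for g in range(2)]   # reflected thresholds T'_g

fnr = [Fc[g].cdf(Tp[g]) for g in range(2)]       # FNR_g(T'_g): should coincide
disc_T  = [Fi[g].cdf(T[g])  - Fc[g].cdf(T[g])  for g in range(2)]   # (A.2) at T_g
disc_Tp = [Fi[g].cdf(Tp[g]) - Fc[g].cdf(Tp[g]) for g in range(2)]   # (A.2) at T'_g
print(fnr, disc_T, disc_Tp)
```

The matching disincentives confirm that the feasible sets of disincentive pairs under the two fairness constraints coincide here, which is the mechanism behind Theorem 5.6.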