Post-Hoc Methods for Debiasing Neural Networks
Yash [email protected] Colin [email protected] Naveen Sundar GovindarajuluRAIR Lab [email protected]
Abstract
As deep learning models become tasked with more and more decisions that impact human lives, such as hiring, criminal recidivism, and loan repayment, bias is becoming a growing concern. This has led to dozens of definitions of fairness and numerous algorithmic techniques to improve the fairness of neural networks. Most debiasing algorithms require retraining a neural network from scratch; however, this is not feasible in many applications, especially when the model takes days to train or when the full training dataset is no longer available.

In this work, we present a study on post-hoc methods for debiasing neural networks. First we study the nature of the problem, showing that the difficulty of post-hoc debiasing is highly dependent on the initial conditions of the original model. Then we define three new fine-tuning techniques: random perturbation, layer-wise optimization, and adversarial fine-tuning. All three techniques work for any group fairness constraint. We give a comparison among three popular post-processing debiasing algorithms and our three proposed methods, across three datasets and three popular bias measures. Our algorithms outperform the existing post-processing techniques on average, and each of our algorithms performs best in certain settings. Our code is available at https://github.com/realityengines/post_hoc_debiasing.

1 Introduction

The last decade has seen a huge increase in applications of machine learning in a wide variety of domains such as credit scoring, fraud detection, hiring decisions, criminal recidivism, loan repayment, and so on [31, 6, 34, 2]. The outcomes of these algorithms are impacting the lives of people more than ever. There are clear advantages in the automation of classification tasks, as machines can quickly process thousands of datapoints with many features. However, algorithms are susceptible to bias towards individuals or groups of people from a variety of sources [38, 35, 36]. For example, Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) is a computer software which determines the risk of a defendant committing a future crime. United States judges consult the software to decide whether or not a defendant should be granted bail or pretrial release. It was found that this software is biased against African-Americans [15]. The need to address these issues is higher than ever [3, 37].

Motivated by these findings, the last few years have seen a huge growth in the area of fairness in machine learning. Dozens of formal definitions of fairness have been proposed [33], and many algorithmic techniques have been developed for debiasing according to these definitions [44]. While substantial progress has been made, the majority of techniques have been developed as pre-processing or in-processing methods. In other words, most techniques are designed to run before or during the training of the machine learning model, either as an add-on or as a newly proposed algorithm. Only a handful of debiasing methods run after the training has been completed, either as fine-tuning methods or as post-processing methods [4]. However, as datasets become larger and training becomes more computationally intensive, especially in the case of neural networks, there is a growing need for debiasing algorithms which do not require retraining a model from scratch. Additionally, some applications may require debiasing an existing model without full access to the training dataset, due to regulatory requirements or privacy concerns.
For example, this may be true any time the entities which build the model are different from the entities which deploy the model. Furthermore, most previously proposed post-processing methods have been designed for just one fairness measure. Due to the diversity in fairness definitions, and since reducing more than one fairness measure at once may be difficult [10], there is a growing need for algorithms which can handle any group fairness constraint.

In this work, we present a formal study of post-hoc methods for debiasing neural networks. A post-hoc method is defined as an algorithm which has access to a trained model and a validation dataset, and either fine-tunes the model or performs post-processing on the model predictions. We start by showing that the difficulty of post-hoc debiasing is highly dependent on the initial conditions of the original model. In particular, given a neural network trained to optimize accuracy, the variance in the amount of bias of the trained model is much higher than the variance in the accuracy, with respect to the random seed used for initializing the weights of the original model. Therefore, even the initial random seed can substantially change the amount of bias present in a neural network.

Next, we present three new optimization-based techniques for post-hoc debiasing of neural networks, each of which works for any group fairness measure. Each technique takes as input an objective function, which can be chosen to trade off a fairness measure with model accuracy. We define a simple algorithm, random perturbation, which iteratively adds multiplicative noise to the weights of the neural network and then thresholds the output probabilities to minimize the objective function. Our second technique is a layer-wise optimization algorithm. In this approach, we iteratively choose a layer of the neural network and use gradient-boosted regression trees to optimize the weights of the chosen layer with respect to the objective function. Our last technique is an adversarial fine-tuning algorithm. Adversarial training is a powerful debiasing method because training a critic (discriminator) to predict bias effectively makes the objective function differentiable, enabling the use of first-order optimization techniques such as gradient descent. This has recently been proposed as an in-processing method for debiasing [45]. We show that using an adversarial model to fine-tune the trained neural network is a viable post-hoc technique.

We compare the three above techniques with three post-processing algorithms from prior work: reject option classification [23], equalized odds post-processing [21], and calibrated equalized odds post-processing [39]. We run experiments with three popular fairness datasets and three popular fairness definitions. We show that certain algorithms are useful in certain scenarios. For example, the random perturbation algorithm is a strong post-hoc debiasing baseline. The adversarial fine-tuning method is more powerful for debiasing larger models, but it is more computationally intensive and may require hyperparameter tuning. The layer-wise fine-tuning algorithm may work well on models in which the bias is concentrated in one layer.

Fairness research (and machine learning research as a whole) has seen a huge increase in popularity, and recent papers have highlighted the need for fair and reproducible results [42, 4]. To facilitate best practices, we run our experiments on the AIF360 toolkit [4] and open source all of our code.

Our contributions.
We summarize our main contributions below.

• We study the nature of post-hoc techniques for debiasing neural networks, showing that the problem is sensitive to the initial conditions of the original model.

• We present three measure-agnostic fine-tuning algorithms for post-hoc debiasing: random perturbation, layer-wise optimization, and adversarial fine-tuning. Our algorithms outperform all existing post-processing techniques on average.

• We conduct a study of post-hoc techniques for debiasing neural networks, testing six different algorithms across three datasets and with three different fairness measures.
2 Related Work

Debiasing overview.
There is a surging body of research on bias and fairness in machine learning. There are dozens of types of bias that can arise [29], and dozens of formal definitions of fairness have been proposed [33]. Popular definitions include statistical parity/demographic parity [14, 25], equal opportunity (a subset of equalized odds) [20], and average absolute odds [4]. Many bias mitigation techniques have been proposed, which generally fall into three categories: pre-processing, in-processing, and post-processing. Post-processing debiasing techniques are performed on a pretrained model and do not require access to the full training set. Therefore, these techniques are useful in a variety of settings in which retraining is costly or impossible due to computational costs or data limitations.
Post-processing methods.
Most prior work on post-processing techniques uses label-flipping methods, such as randomly flipping labels until the true/false negative rates are equal, or flipping labels in a critical region of predicted probabilities near 0.5 [20, 39, 23]. Currently, most of these techniques have only been established for specific fairness measures. For a full overview, see [4, 44]. See Section 6 for brief descriptions of three post-processing debiasing techniques.
Hyperparameter optimization for fairness.
There is a variety of work on in-processing debiasing algorithms which are similar in spirit to our optimization methods. We mention a few of them here; however, none of these explicitly presents a post-hoc debiasing algorithm. Recently, a meta-algorithm was developed for in-processing debiasing by reducing many fairness measures to convex problems [8]. Another work treats debiasing as an empirical risk minimization problem [12]. Yet another work adds the fairness constraints as regularizers in the machine learning models [5]. Other prior work has used hyperparameter optimization to select hyperparameters for training models to exhibit less bias [9], but this approach repeatedly retrains the full model with different hyperparameters. Bias reduction has also been framed as a pre-processing convex optimization problem [7].

There is also prior work using adversarial learning to debias algorithms [45]. To the best of our knowledge, no prior work has designed a post-hoc algorithm using adversarial learning.
3 Broader Impact

Deep learning algorithms are more prevalent than ever before. The technology is becoming more and more integrated into society, and is used in high-stakes applications such as criminal recidivism, loan repayment, and hiring decisions [31, 6, 34, 2]. It is also becoming increasingly evident that many of these algorithms are biased from various sources [38, 35, 36]. Using technology which makes prejudiced decisions for life-changing events will only deepen the divides that exist in society, and the need to address these issues is higher than ever [3, 37].

Our work seeks to decrease the negative effects that biased deep learning algorithms have on society. Post-hoc methods, which work for any group fairness measure, will be applicable to large existing deep learning models, since the networks need not be retrained from scratch. Furthermore, we present simple techniques (random perturbation) as well as more complex and powerful techniques (adversarial fine-tuning). Since we study the nature of post-hoc debiasing and present a study comparing prior work to our algorithms, our work may facilitate future work on post-hoc debiasing techniques.
Impact on bias in judicial applications.
We briefly discuss how post-hoc methods for debiasing could help in judicial settings. Some machine learning algorithms which are prejudiced have been used in judicial applications in the past [41, 19]. Studies and investigations have found that many of these algorithms have some form of bias [26]. Moreover, different entities using the same model might prefer to use different fairness measures, and some of these measures might be incompatible [11]. Generally, the entities that build the applications and the entities that use them are not the same. Therefore, due to legal and licensing issues, the entity using the application may not have access to the training dataset. This precludes the use of pre-processing and in-processing methods for debiasing. The entity using the model usually has its own dataset available (e.g., a local court tracking its recidivism rates). This makes post-hoc processing the only viable method for debiasing.
4 Preliminaries

In this section, we give notation and definitions used throughout the paper. Given a dataset split into three parts, $\mathcal{D}_{\text{train}}$, $\mathcal{D}_{\text{valid}}$, $\mathcal{D}_{\text{test}}$, let $(x_i, Y_i)$ denote one datapoint, where $x_i \in \mathbb{R}^d$ contains $d$ features including one binary protected attribute $A$ (e.g., identifying as female or not identifying as female), and $Y_i \in \{0, 1\}$ is the label. Denote the value of the protected feature for $x_i$ as $a_i$. We denote a trained neural network by a function $f_\theta : \mathbb{R}^d \to [0, 1]$, where $\theta$ denotes the trained weights. We often denote $f_\theta(x_i) = \hat{Y}_i$, the output predicted probability for datapoint $x_i$. Finally, we refer to the set of labels in a dataset $\mathcal{D}$ as $Y$.

Fairness measures.
We now give an overview of the group fairness measures used in this work. Given a dataset $\mathcal{D}$ with labels $Y$, protected attribute $A$, and a set of predictions $\hat{Y} = \{f_\theta(x) \mid x \in \mathcal{D}\}$ from some neural network $f_\theta$, we define the true positive and false positive rates of the group with $A = a$ as

$$\mathrm{TPR}_{A=a}(\mathcal{D}, \hat{Y}) = \frac{\left|\{i \mid \hat{Y}_i = Y_i = 1,\; a_i = a\}\right|}{\left|\{i \mid Y_i = 1,\; a_i = a\}\right|} = P_{(x_i, Y_i) \in \mathcal{D}}(\hat{Y}_i = 1 \mid a_i = a,\, Y_i = 1),$$

$$\mathrm{FPR}_{A=a}(\mathcal{D}, \hat{Y}) = \frac{\left|\{i \mid \hat{Y}_i = 1,\; Y_i = 0,\; a_i = a\}\right|}{\left|\{i \mid Y_i = 0,\; a_i = a\}\right|} = P_{(x_i, Y_i) \in \mathcal{D}}(\hat{Y}_i = 1 \mid a_i = a,\, Y_i = 0).$$

Statistical Parity Difference (SPD), or demographic parity difference [14, 25], measures the difference in the probability of a positive outcome between the protected and unprotected groups. Formally,

$$\mathrm{SPD}(\mathcal{D}, \hat{Y}, A) = P_{(x_i, Y_i) \in \mathcal{D}}(\hat{Y}_i = 1 \mid a_i = 0) - P_{(x_i, Y_i) \in \mathcal{D}}(\hat{Y}_i = 1 \mid a_i = 1).$$

Equal Opportunity Difference (EOD) [20] measures the difference in TPR between the protected and unprotected groups. Equal opportunity is identical to equalized odds in the case where the protected feature and labels are binary. Formally, we have
$$\mathrm{EOD}(\mathcal{D}, \hat{Y}, A) = \mathrm{TPR}_{A=0}(\mathcal{D}, \hat{Y}) - \mathrm{TPR}_{A=1}(\mathcal{D}, \hat{Y}).$$

Average Odds Difference (AOD) [4] is defined as the average of the differences in the false positive rates and true positive rates between the unprivileged and privileged groups. Formally,
$$\mathrm{AOD}(\mathcal{D}, \hat{Y}, A) = \frac{\left(\mathrm{FPR}_{A=0}(\mathcal{D}, \hat{Y}) - \mathrm{FPR}_{A=1}(\mathcal{D}, \hat{Y})\right) + \left(\mathrm{TPR}_{A=0}(\mathcal{D}, \hat{Y}) - \mathrm{TPR}_{A=1}(\mathcal{D}, \hat{Y})\right)}{2}.$$
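To make these definitions concrete, the following is a minimal NumPy sketch of the three measures (illustrative code, not taken from our released implementation; `y_true`, `y_pred`, and `a` are assumed to be 0/1 arrays):

```python
import numpy as np

def group_rates(y_true, y_pred, a, group):
    """TPR and FPR restricted to datapoints with protected attribute a == group."""
    mask = a == group
    tpr = y_pred[mask & (y_true == 1)].mean()  # P(Y_hat = 1 | a = group, Y = 1)
    fpr = y_pred[mask & (y_true == 0)].mean()  # P(Y_hat = 1 | a = group, Y = 0)
    return tpr, fpr

def spd(y_true, y_pred, a):
    """Statistical parity difference (y_true is unused; kept for a uniform API)."""
    return y_pred[a == 0].mean() - y_pred[a == 1].mean()

def eod(y_true, y_pred, a):
    """Equal opportunity difference: the TPR gap between the two groups."""
    return group_rates(y_true, y_pred, a, 0)[0] - group_rates(y_true, y_pred, a, 1)[0]

def aod(y_true, y_pred, a):
    """Average odds difference: mean of the FPR gap and the TPR gap."""
    tpr0, fpr0 = group_rates(y_true, y_pred, a, 0)
    tpr1, fpr1 = group_rates(y_true, y_pred, a, 1)
    return ((fpr0 - fpr1) + (tpr0 - tpr1)) / 2
```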
Optimization techniques.

Zeroth-order (non-differentiable) optimization is used when the objective function is not differentiable (as is the case for most definitions of group fairness). This is also called black-box optimization. Given an input space $W$ and an objective function $\mu$, zeroth-order optimization seeks to compute $w^* = \arg\min_{w \in W} \mu(w)$. Leading methods for zeroth-order optimization when function queries are expensive (such as optimizing a deep network) include gradient-boosted regression trees (GBRT) [17, 28] and Bayesian optimization (BO) [40, 16, 43]; however, BO struggles with high-dimensional data.

In contrast, first-order optimization is used when it is possible to take the derivative of the objective function. For example, gradient descent is a first-order optimization technique.

5 Post-Hoc Debiasing Methods

In this section, we describe three new fine-tuning techniques for debiasing neural networks. First we give more notation and formally define the different types of debiasing algorithms.

Given a neural network $f_\theta$, we sometimes drop the subscript $\theta$ when it is clear from context. We denote the last layer of $f$ by $f^{(\ell)}$, and we assume that $f = f^{(\ell)} \circ f'$, where $f'$ is all but the last layer of the neural network. Our layer-wise optimization algorithm assumes that $f$ is feed-forward, that is, $f = f^{(\ell)} \circ \cdots \circ f^{(1)}$ for functions $f^{(1)}, f^{(2)}, \ldots, f^{(\ell)}$. The performance of the model is given by a performance measure $\rho$. For a set of points $\mathcal{D}$, given the set of true labels $Y$ and the set of predicted labels $\hat{Y} = \{f(x_i) \mid (x_i, Y_i) \in \mathcal{D}\}$, the performance is $\rho(Y, \hat{Y}) \in [0, 1]$. Common performance measures include accuracy, precision, recall, and AUC ROC (area under the ROC curve). We also define a bias measure $\mu$, given as $\mu(\mathcal{D}, \hat{Y}, A) \in [0, 1]$, such as one defined in Section 4.

The goal of any debiasing algorithm is to minimize the bias $\mu$ without sacrificing performance $\rho$ too much. Many prior works have observed that fairness comes at the price of accuracy for many datasets, even when using large models such as deep networks [4, 44, 10], which means it is often not possible to achieve zero bias without significantly lowering accuracy. Therefore, a common technique is to minimize an objective function such as the following:

$$\phi_{\mu,\rho}(\mathcal{D}, \hat{Y}, A) = \lambda \cdot |\mu(\mathcal{D}, \hat{Y}, A)| + (1 - \lambda)\left(1 - \rho(Y, \hat{Y})\right). \qquad (1)$$

In the expression, $\lambda$ is a parameter in $[0, 1]$ which can be tuned based on the desired bias or based on the level of bias in the original model.

An in-processing debiasing algorithm takes as input the training and validation datasets and outputs a model $f$ which seeks to minimize $\phi_{\mu,\rho}$. A fine-tuning algorithm takes in the validation dataset and a trained model $f$ with weights $\theta$ (typically $f$ was trained to optimize the performance $\rho$), and outputs fine-tuned weights $\theta'$ such that $f_{\theta'}$ minimizes the objective $\phi_{\mu,\rho}$. A post-processing debiasing algorithm takes as input the validation dataset as well as a set of predictions $\hat{Y}$ on the validation dataset (typically coming from a model $f$ which was optimized for $\rho$), and outputs a post-processing function $h : \{0, 1\} \to \{0, 1\}$ which performs post-processing on the predictions so that the final predictions optimize $\phi_{\mu,\rho}$. Note that fine-tuning and post-processing debiasing algorithms are useful in different settings. Post-processing algorithms are useful when there is no access to the original model.
Fine-tuning algorithms are useful when there is access to the original model, or when the prediction is over a continuous feature. Now we present three fine-tuning techniques.
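Throughout the sketches below, we use the following concrete form of Equation (1), with accuracy as the performance measure $\rho$ (illustrative code; the default `lam=0.5` is an arbitrary choice, not the value used in our experiments):

```python
def objective(y_true, y_pred, a, bias_fn, lam=0.5):
    """Equation (1) with accuracy as rho:
    lam * |mu| + (1 - lam) * (1 - accuracy).
    `bias_fn` is one of spd/eod/aod from the earlier sketch."""
    accuracy = (y_true == y_pred).mean()
    return lam * abs(bias_fn(y_true, y_pred, a)) + (1 - lam) * (1 - accuracy)
```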
Random perturbation.
Our first algorithm is a simple iterative random procedure, random perturbation. In every iteration, each weight in the neural network is multiplied by a Gaussian random variable with mean 1 and standard deviation 0.1. Since the model $f$ outputs probabilities, we find the threshold $\tau$ such that $\hat{Y}_\tau = \{\mathbb{I}\{\hat{Y} > \tau\}\}_{\hat{Y} \in \hat{Y}}$ minimizes $\phi_{\mu,\rho}(Y, \hat{Y}_\tau, A)$. We run $T$ iterations and output the perturbed weights which minimize $\phi_{\mu,\rho}$ on the validation set. See Algorithm 1. We show in the next section that despite its simplicity, this method performs well on many datasets and fairness measures, and we therefore recommend this algorithm as a baseline in future post-hoc debiasing applications. A natural follow-up question is whether we can do even better by using an optimization algorithm instead of random search. This is the motivation for our next approach.
Algorithm 1: Random Perturbation

Input: trained model $f$ with weights $\theta$, validation dataset $\mathcal{D}_{\text{valid}}$, objective $\phi_{\mu,\rho}$, parameter $T$
  Set $\theta^* = \emptyset$, $\mathrm{val}^* = \infty$, and $\tau^* = 0$
  for $i = 1$ to $T$ do
    Sample $q_j \sim \mathcal{N}(1, 0.1)$ for all $j \in \{1, 2, \ldots, |\theta|\}$
    $\theta'_j = \theta_j \cdot q_j$
    Select the threshold $\tau \in [0, 1]$ which minimizes the objective $\phi_{\mu,\rho}$ on the validation set
    Set $\mathrm{val} = \phi_{\mu,\rho}(\mathcal{D}_{\text{valid}}, \{\mathbb{I}\{f_{\theta'}(x) > \tau\} \mid (x, Y) \in \mathcal{D}_{\text{valid}}\}, A)$
    If $\mathrm{val} < \mathrm{val}^*$, set $\mathrm{val}^* = \mathrm{val}$, $\theta^* = \theta'$, and $\tau^* = \tau$
  end for
Output: $\theta^*$, $\tau^*$
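A minimal PyTorch sketch of Algorithm 1 follows (illustrative, not our released code; `X_val` is a tensor, `y_val` and `a_val` are NumPy arrays, and `objective_fn` could be, e.g., `functools.partial(objective, bias_fn=spd)` from the earlier sketch):

```python
import copy
import numpy as np
import torch

def random_perturbation(model, X_val, y_val, a_val, objective_fn, T=100):
    """Algorithm 1 sketch: multiply every weight by q ~ N(1, 0.1), search a
    threshold tau, and keep the best (weights, tau) found over T iterations."""
    best_model, best_val, best_tau = model, float("inf"), 0.5
    for _ in range(T):
        candidate = copy.deepcopy(model)
        candidate.eval()  # fix BatchNorm/dropout behavior during evaluation
        with torch.no_grad():
            for p in candidate.parameters():
                p.mul_(1.0 + 0.1 * torch.randn_like(p))  # q_j ~ N(1, 0.1)
            probs = candidate(X_val).squeeze().numpy()
        for tau in np.linspace(0.01, 0.99, 99):  # threshold search
            y_hat = (probs > tau).astype(int)
            val = objective_fn(y_val, y_hat, a_val)
            if val < best_val:
                best_model, best_val, best_tau = candidate, val, tau
    return best_model, best_tau
```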
Layer-wise optimization.

Our next method fine-tunes the model by debiasing individual layers using zeroth-order optimization. Intuitively, an optimization procedure will be much more effective than random perturbations, but it is computationally expensive and does not scale as well, so we can only run the optimization on individual layers. Given a model, assume the model can be decomposed into several functions $f = f^{(\ell)} \circ \cdots \circ f^{(1)}$. For example, a feed-forward neural network with $\ell$ layers can be decomposed in this way. We denote the trained weights of each component by $\theta_1, \ldots, \theta_\ell$, respectively. Now assume that we have access to a zeroth-order optimizer $\mathcal{A}$, which takes as input a model $f = f^{(\ell)} \circ \cdots \circ f^{(1)}$, weights $\theta = \theta_1, \ldots, \theta_\ell$, a dataset $\mathcal{D}_{\text{valid}}$, and an index $i$. The optimizer returns weights $\theta'_i$, optimized with respect to $\phi_{\mu,\rho}$. In Algorithm 2, we set the optimizer to be gradient-boosted regression trees (GBRT) [17, 28], a leading technique for black-box optimization which converts shallow regression trees into strong learners. GBRT iteratively constructs a posterior predictive model, making prediction and uncertainty estimates for each potential set of weights $\theta'$. To trade off exploration and exploitation, the next set of weights to try is chosen using lower confidence bounds (LCB), a popular acquisition function (e.g., [22]). Formally, $\phi_{\mathrm{LCB}}(\theta') = \hat{\theta}' - \beta \hat{\sigma}$, in which we assume our model's posterior predictive density follows a normal distribution with mean $\hat{\theta}'$ and standard deviation $\hat{\sigma}$; $\beta$ is a tradeoff parameter that can be tuned. See Algorithm 2. Note that this algorithm can easily be generalized to optimize multiple layers at once, but this comes at the price of runtime. For example, running GBRT on the entire neural network would be strictly more powerful than the random perturbation algorithm, but is prohibitively expensive.
Algorithm 2: Layer-wise Optimization

Input: trained model $f = f^{(\ell)} \circ \cdots \circ f^{(1)}$ with weights $\theta_1, \ldots, \theta_\ell$, objective $\phi_{\mu,\rho}$, optimizer $\mathcal{A}$
  Set $\theta^* = \emptyset$, $\mathrm{val}^* = \infty$, and $\tau^* = 0$
  for $i = 1$ to $\ell$ do
    Run optimizer $\mathcal{A}$ to optimize weights $\theta_i$ to $\theta'_i$ with respect to $\phi_{\mu,\rho}$
    Select the threshold $\tau \in [0, 1]$ which minimizes the objective $\phi_{\mu,\rho}$
    Set $\mathrm{val} = \phi_{\mu,\rho}(\mathcal{D}_{\text{valid}}, \{\mathbb{I}\{f_{\theta'}(x) > \tau\} \mid (x, Y) \in \mathcal{D}_{\text{valid}}\}, A)$, where $\theta' = \{\theta_1, \ldots, \theta'_i, \ldots, \theta_\ell\}$
    If $\mathrm{val} < \mathrm{val}^*$, set $\mathrm{val}^* = \mathrm{val}$ and $\theta^* = \theta'$
  end for
Output: $\theta^*$, $\tau^*$
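One inner step of Algorithm 2 could be sketched with scikit-optimize's GBRT minimizer, which supports an LCB acquisition function (a sketch under assumptions: scikit-optimize is our illustrative choice of GBRT library, the search box around each trained weight and the fixed threshold are simplifications, and the model is assumed to be in eval mode):

```python
import torch
from skopt import gbrt_minimize
from skopt.space import Real

def optimize_layer(model, layer, X_val, y_val, a_val, objective_fn,
                   n_calls=100, tau=0.5):
    """Sketch of one layer's optimization in Algorithm 2: GBRT with an LCB
    acquisition over the flattened weights of `layer`. Practical only for
    layers with few weights."""
    shape = layer.weight.shape
    w0 = layer.weight.detach().flatten().tolist()
    space = [Real(w - 1.0, w + 1.0) for w in w0]  # search box per weight (assumed)

    def score(w):
        with torch.no_grad():
            layer.weight.copy_(torch.tensor(w, dtype=torch.float32).reshape(shape))
            y_hat = (model(X_val).squeeze() > tau).numpy().astype(int)
        return float(objective_fn(y_val, y_hat, a_val))

    result = gbrt_minimize(score, space, n_calls=n_calls, acq_func="LCB")
    with torch.no_grad():  # write the best weights found back into the layer
        layer.weight.copy_(torch.tensor(result.x, dtype=torch.float32).reshape(shape))
    return model
```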
Adversarial fine-tuning.

The previous two methods rely on zeroth-order optimization techniques because most group fairness measures, such as statistical parity difference and equalized odds, are non-differentiable. Our last technique casts the problem of debiasing as first-order optimization by using adversarial learning. The idea behind the adversarial method is that we train a critic model to predict the amount of bias in a minibatch. We sample the datapoints in a minibatch randomly and with replacement. This statistical bootstrapping approach to creating a minibatch means that if the critic can predict the bias in a minibatch accurately, then it can predict the bias of the model with respect to the validation set reasonably well. Therefore, the critic effectively acts as a differentiable proxy for bias, which makes it possible to debias the original model using gradient descent.

The adversarial algorithm works by alternately iterating between training the critic model $g$ using the predictions from $f$, and fine-tuning the predictive model $f$ with respect to $\phi_{\mu,\rho}$ using the bias proxy $\hat{\mu}$ from $g$. Note that the first layer in $g$ concatenates the minibatch, and $g$ returns a single number that estimates the bias of the minibatch as its final output. See Algorithm 3, where BCELoss denotes the standard binary cross-entropy loss.
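A condensed PyTorch sketch of this alternating loop is shown below, followed by the full pseudocode in Algorithm 3 (a sketch under assumptions: the function and variable names, the Adam optimizer, and the squared-error critic loss are illustrative; `critic` maps a whole batch of hidden activations to one scalar, which implies a fixed minibatch size):

```python
import torch
import torch.nn as nn

def adversarial_finetune(f_prime, f_last, critic, sample_batch, bias_fn,
                         lam=0.5, T=50, m=10, m_prime=10, lr=1e-3):
    """Algorithm 3 sketch. f = f_last o f_prime; `sample_batch()` draws a
    bootstrap minibatch (X, y, a) from the validation set; `bias_fn` is one
    of spd/eod/aod from the earlier sketch."""
    opt_f = torch.optim.Adam(list(f_prime.parameters()) + list(f_last.parameters()), lr=lr)
    opt_g = torch.optim.Adam(critic.parameters(), lr=lr)
    bce = nn.BCELoss()  # PyTorch order is (input, target)
    for _ in range(T):
        for _ in range(m):  # critic step: regress the true minibatch bias
            X, y, a = sample_batch()
            with torch.no_grad():
                y_hat = (f_last(f_prime(X)).squeeze() > 0.5).float()
                mu_hat = float(bias_fn(y.numpy(), y_hat.numpy(), a.numpy()))
            loss_g = (mu_hat - critic(f_prime(X).detach()).squeeze()) ** 2
            opt_g.zero_grad(); loss_g.backward(); opt_g.step()
        for _ in range(m_prime):  # model step: critic output as a bias proxy
            X, y, a = sample_batch()
            probs = f_last(f_prime(X)).squeeze()
            loss_f = (1 - lam) * critic(f_prime(X)).squeeze() + lam * bce(probs, y.float())
            opt_f.zero_grad(); loss_f.backward(); opt_f.step()
    return f_prime, f_last
```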
Algorithm 3: Adversarial Fine-Tuning

Input: trained model $f = f^{(\ell)} \circ f'$ with weights $\theta$, validation dataset $\mathcal{D}_{\text{valid}}$, objective $\phi_{\mu,\rho}$, parameters $\lambda, m, m', T$
  Set $g$ as the critic model with weights $\theta'$
  for $i = 1$ to $T$ do
    for $j = 1$ to $m$ do
      Sample a minibatch $(X_k, Y_k)$ with replacement from $\mathcal{D}_{\text{valid}}$
      Evaluate the bias in the minibatch: $\hat{\mu} \leftarrow \mu((X_k, Y_k), f(X_k))$
      Update the critic $g$ by taking a step along its stochastic gradient $\nabla_{\theta'} \left(\hat{\mu} - (g \circ f')(X_k)\right)^2$
    end for
    for $j = 1$ to $m'$ do
      Sample a minibatch $(X_k, Y_k)$ with replacement from $\mathcal{D}_{\text{valid}}$
      Update the original model by taking a step along its stochastic gradient $\nabla_\theta \left[(1 - \lambda) \cdot (g \circ f')(X_k) + \lambda \cdot \mathrm{BCELoss}(Y_k, f(X_k))\right]$
    end for
    Select the threshold $\tau \in [0, 1]$ that minimizes the objective $\phi_{\mu,\rho}$
  end for
Output: debiased model $f$, threshold $\tau$
6 Experiments

In this section, we experimentally evaluate the techniques laid out in Section 5 against baselines, on three datasets and with multiple fairness measures. To promote reproducibility, we release our code at https://github.com/realityengines/post_hoc_debiasing, and we use popular datasets from the AIF360 toolkit [4]. Each dataset contains one or more binary protected feature(s) and a binary label. We briefly describe them below.

The COMPAS dataset is a commonly used dataset in fairness research, consisting of over 10,000 defendants with 402 features [15]. The goal is to predict the recidivism likelihood for an individual [1]. We run separate experiments using race and sex as protected attributes. The Adult Census Income (ACI) dataset is a binary classification dataset from the 1994 USA Census bureau database, in which the goal is to predict whether a person earns above $50,000 [13]. There are over 40,000 datapoints with 15 features. We use sex as the protected attribute. The Bank Marketing (BM) dataset is from the phone marketing campaign of a Portuguese bank. There are over 48,000 datapoints consisting of 17 categorical and quantitative features. The goal is to predict whether a customer will subscribe to a product [30]. The protected feature is whether or not the customer is older than 25.

The need for neural networks.
First, we run a quick experiment to demonstrate the need for neural networks on the above datasets. Deep learning has become a very popular approach in the field of machine learning [27]; however, for tabular datasets with fewer than 20 features, it is worth checking whether logistic regression or random forest techniques perform as well as neural networks [32]. We construct a neural network with 10 fully-connected layers, BatchNorm for regularization, and a dropout rate of 0.2, and we compare this to logistic regression and a random forest model on the ACI dataset. We see that the neural network achieves accuracy and area under the receiver operating characteristic curve (AUC ROC) scores which are 2% higher than the other models. See Appendix A for the full results. Therefore, for the rest of this section, we focus on neural networks.

[Table 1: Bias and accuracy of a neural network (mean ± standard deviation over 10 random seeds). Columns: AOD, EOD, SPD, accuracy; rows: ACI (sex) and the remaining datasets. Only the first entry (AOD of −0.084 for ACI (sex)) survived extraction; the other numeric values were lost.]
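For concreteness, the base network described above can be sketched as follows (our illustration; the activation functions are not specified in the text, so ReLU hidden activations and a sigmoid output are assumptions):

```python
import torch.nn as nn

def make_base_network(d, width=32, depth=10, p_drop=0.2):
    """Sketch of the base model: fully-connected layers with BatchNorm
    and dropout, ending in a predicted probability."""
    layers, in_dim = [], d
    for _ in range(depth):
        layers += [nn.Linear(in_dim, width), nn.BatchNorm1d(width),
                   nn.ReLU(), nn.Dropout(p_drop)]
        in_dim = width
    layers += [nn.Linear(in_dim, 1), nn.Sigmoid()]  # output probability
    return nn.Sequential(*layers)
```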
Bias sensitivity to initial model conditions.

Next, we run experiments to compute the amount of variance in the bias scores of the initial models. Neural networks have a huge number of local minima. Hyperparameters such as the optimizer and learning rate, and even the initial random seed, cause the model to converge to different local minima [27]. Techniques such as the Adam optimizer and early stopping with patience have been designed to allow neural networks to consistently reach local minima with high accuracies [24, 18]. However, there is no such guarantee on the amount of bias. In particular, the local minima found by neural networks may have large differences in the amount of bias, and therefore there may be very high variance in the amount of bias exhibited by neural networks purely because of the random seed. Every local optimum has a different set of weights. If the weights of the model at a specific local optimum rely heavily on the protected feature, removing the bias from such a model by updating the weights is harder than removing the bias from a model whose weights do not rely on the protected feature as heavily. Table 1 shows the mean and the standard deviation of three fairness measures, as well as accuracy, for training a neural network with 10 different initial random seeds, across three datasets. We see that the standard deviation of the bias score is an order of magnitude higher than the standard deviation of the accuracy. In Appendix A, we plot the contribution of each individual weight to the bias score for a neural network, and show that these contributions are sensitive to the initial random seed.
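The seed-sensitivity experiment can be sketched as follows (illustrative; `train_fn` and `eval_fn` are assumed helpers that train the architecture above and return accuracy and a bias score, e.g., SPD on the test set):

```python
import numpy as np
import torch

def seed_spread(train_fn, eval_fn, seeds=range(10)):
    """Retrain the same architecture under different seeds and compare the
    spread of accuracy against the spread of bias."""
    accs, biases = [], []
    for seed in seeds:
        torch.manual_seed(seed)
        np.random.seed(seed)
        acc, bias = eval_fn(train_fn(seed))
        accs.append(acc)
        biases.append(bias)
    return np.std(accs), np.std(biases)  # compare the two spreads
```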
Now we present our main experimental study by comparing our three post-hoc debiasing methods to three baseline methods, on three datasets and with three fairness measures. Note that we do not compare to any in-processing debiasing algorithms, because these algorithms require the entire training set, yet all post-hoc methods only use the validation set. We briefly describe the baseline post-processing algorithms that we tested.

The reject option classification post-processing algorithm [23] defines a critical region of points in the protected group whose predicted probability is near 0.5, and flips these labels. This algorithm is designed to minimize statistical parity difference. The equalized odds post-processing algorithm [20] defines a convex hull based on the bias rates of the different groups, and then flips the labels of datapoints that fall inside the convex hull. This algorithm is designed to minimize equal opportunity difference. The calibrated equalized odds post-processing algorithm [39] defines a base rate of bias for each group, and then adds randomness based on the group into the classifier until the bias rates converge. This algorithm is also designed to minimize equal opportunity difference. For all algorithms, we use the implementations in the AIF360 repository [4].

Our initial model consists of a feed-forward neural network with 10 fully-connected layers of size 32, with a BatchNorm layer between each fully-connected layer, and a dropout fraction of 0.2. The model is trained with the Adam optimizer and an early-stopping patience of 100 epochs. The loss function is the binary cross-entropy loss. We use the validation data as the input for the post-hoc debiasing methods. The three post-hoc methods are set to optimize Equation 1 with a fixed λ. We run each post-hoc method on 10 neural networks initialized with different random seeds. In Figure 1 we plot the objective function (plots 1-3) and accuracy + bias (plots 4-6) for all post-hoc debiasing algorithms, on all datasets and all fairness measures. Note that we ran separate experiments on COMPAS with race as the protected feature and with gender as the protected feature. Note also that since the three post-processing baselines are only set up to minimize a specific fairness measure, there is only a fair comparison on their respective measures.

Next, we vary the hyperparameters of the initial neural network. We run experiments on three additional neural networks: (1) dropout probability at 0.5 instead of 0.2, (2) width of each layer set to 64 instead of 32, (3) number of layers set to 20 instead of 10. We run these experiments on the ACI and BM datasets with SPD, for all post-hoc algorithms except layer-wise optimization. We run 10 trials of each post-hoc algorithm on each neural network. See Figure 1 (plots 7-8).
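As an illustration of how the baselines are invoked through AIF360 (a hedged sketch: `valid_true` and `valid_pred` are assumed to be `BinaryLabelDataset` objects holding ground-truth labels and model predictions on the same validation rows, and the attribute name "sex" and group encodings are illustrative):

```python
from aif360.algorithms.postprocessing import EqOddsPostprocessing

privileged = [{"sex": 1}]
unprivileged = [{"sex": 0}]

# Fit the equalized odds post-processor on validation data, then flip labels.
eq_odds = EqOddsPostprocessing(unprivileged_groups=unprivileged,
                               privileged_groups=privileged)
eq_odds = eq_odds.fit(valid_true, valid_pred)
debiased_pred = eq_odds.predict(valid_pred)  # label-flipped predictions
```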
Discussion.
We see that the three fine-tuning methods significantly outperform the baseline methods, sometimes even on the fairness metric for which the baseline was designed. We note two caveats. First, the three fine-tuning methods had access to the objective function in Equation 1, while the post-processing methods are only designed to minimize their respective fairness measures. However, as seen in Figure 1, sometimes the fine-tuning methods simultaneously achieve higher accuracy and lower bias compared to the post-processing methods, making the fine-tuning methods Pareto-optimal. Second, fine-tuning methods are more powerful than post-processing methods, since post-processing methods do not modify the weights of the original model, although this power comes at the price of computation time (see Table 2). Post-processing methods are more appropriate when the model weights are unavailable or when computation time is constrained, and fine-tuning methods are more appropriate when higher performance is desired. We see that random perturbation is a strong fine-tuning technique, performing the best in many settings. Layer-wise optimization performs well in some settings, but is sometimes susceptible to the initial conditions of the original model, which makes intuitive sense given the discussion earlier in this section on bias sensitivity to initial model conditions. The adversarial fine-tuning algorithm performs especially well when the dropout probability is higher and when the initial neural network is larger. This is likely due to the fact that adversarial fine-tuning is the most powerful technique (it trains a neural network as a subroutine).
7 Conclusion

In this work, we present a study on post-hoc methods for debiasing neural networks. We define three new measure-agnostic fine-tuning algorithms for debiasing neural networks: random perturbation, adversarial fine-tuning, and layer-wise optimization. First we show that the amount of bias is sensitive to the initial conditions of the original neural network. Then we give an extensive study of post-hoc debiasing by comparing our three new algorithms with three baseline post-processing algorithms, on three popular fairness datasets and with three popular fairness measures. We show that each fine-tuning algorithm performs well for different datasets and different fairness metrics.
[Figure 1 appears here. Plots 1-3: objective vs. dataset (ACI (sex), BM (age), COMPAS (sex), COMPAS (race)) for Statistical Parity Difference, Average Odds Difference, and Equal Opportunity Difference. Plots 4-6: accuracy vs. bias Pareto plots for BM (age) with SPD, ACI (sex) with AOD, and COMPAS (race) with EOD. Plots 7-8: objective (λ|SPD| + (1 − λ)(1 − accuracy)) vs. architecture (base, more dropout, more width, more layers) on ACI and BM. Legend: Default, ROC, EqOdds, CalibEqOdds, Random, Adversarial, LayerwiseOpt.]
Figure 1: Results for post-hoc experiments. Plots 1-3: performance of six post-hoc debiasing algorithms across three datasets and three bias measures, plotted with respect to the objective function in Equation 1, which trades off bias with accuracy (a lower score is better). Plots 4-6: plots of the bias and accuracy of the above experiments. The objective function is shown as black contour lines. Plots 7-8: additional experiments which change the hyperparameters of the original neural network.

Table 2: Runtime of each post-hoc algorithm on each dataset, in seconds.

               ACI (sex)   BM (age)   COMPAS (sex)   COMPAS (race)
ROC               29.836     20.637          9.979          10.532
EqOdds             0.015      0.012          0.011           0.011
CalibEqOdds        0.144      0.064          0.049           0.054
Random           156.848    113.529         61.937          63.540
Adversarial       32.889     36.128         36.156          34.432
LayerwiseOpt     186.480    146.760         79.800          79.800
Acknowledgements
We thank Murali Narayanaswamy and anonymous reviewers for their help with this project.
References

[1] Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner. Machine bias. ProPublica, May 23, 2016.

[2] Solon Barocas, Moritz Hardt, and Arvind Narayanan. Fairness in machine learning. NIPS Tutorial, 2017.

[3] Jason Bellamy. Message from President Dunn on racism and systemic inequality in America, 2020.

[4] Rachel K. E. Bellamy, Kuntal Dey, Michael Hind, Samuel C. Hoffman, Stephanie Houde, Kalapriya Kannan, Pranay Lohia, Jacquelyn Martino, Sameep Mehta, Aleksandra Mojsilovic, et al. AI Fairness 360: An extensible toolkit for detecting, understanding, and mitigating unwanted algorithmic bias. arXiv preprint arXiv:1810.01943, 2018.

[5] Richard Berk, Hoda Heidari, Shahin Jabbari, Matthew Joseph, Michael Kearns, Jamie Morgenstern, Seth Neel, and Aaron Roth. A convex framework for fair regression. arXiv preprint arXiv:1706.02409, 2017.

[6] Miranda Bogen and Aaron Rieke. Help wanted: An examination of hiring algorithms, equity, and bias, 2018.

[7] Flavio Calmon, Dennis Wei, Bhanukiran Vinzamuri, Karthikeyan Natesan Ramamurthy, and Kush R. Varshney. Optimized pre-processing for discrimination prevention. In Advances in Neural Information Processing Systems, pages 3992–4001, 2017.

[8] L. Elisa Celis, Lingxiao Huang, Vijay Keswani, and Nisheeth K. Vishnoi. Classification with fairness constraints: A meta-algorithm with provable guarantees. In Proceedings of the Conference on Fairness, Accountability, and Transparency, pages 319–328, 2019.

[9] Joymallya Chakraborty, Tianpei Xia, Fahmid M. Fahid, and Tim Menzies. Software engineering for fairness: A case study with hyperparameter optimization. CoRR, abs/1905.05786, 2019.

[10] Alexandra Chouldechova. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big Data, 5(2):153–163, 2017.

[11] Alexandra Chouldechova and Aaron Roth. A snapshot of the frontiers of fairness in machine learning. Communications of the ACM, 63(5):82–89, 2020.

[12] Michele Donini, Luca Oneto, Shai Ben-David, John S. Shawe-Taylor, and Massimiliano Pontil. Empirical risk minimization under fairness constraints. In Advances in Neural Information Processing Systems, pages 2791–2801, 2018.

[13] Dheeru Dua and Casey Graff. UCI machine learning repository, 2017.

[14] Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. Fairness through awareness. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, 2012.

[15] Anthony W. Flores, Kristin Bechtel, and Christopher T. Lowenkamp. False positives, false negatives, and false analyses: A rejoinder to "Machine bias: There's software used across the country to predict future criminals. And it's biased against blacks." Fed. Probation, 2016.

[16] Peter I. Frazier. A tutorial on Bayesian optimization. arXiv preprint arXiv:1807.02811, 2018.

[17] Jerome H. Friedman. Greedy function approximation: A gradient boosting machine. Annals of Statistics, pages 1189–1232, 2001.

[18] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.

[19] Jamie Grace. Machine learning technologies and their inherent human rights issues in criminal justice contexts. Available at SSRN 3487454, 2019.

[20] Moritz Hardt, Eric Price, and Nati Srebro. Equality of opportunity in supervised learning. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems 29, pages 3315–3323. Curran Associates, Inc., 2016.

[21] Moritz Hardt, Eric Price, Nati Srebro, et al. Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems, pages 3315–3323, 2016.

[22] Matthew Joseph, Michael Kearns, Jamie H. Morgenstern, and Aaron Roth. Fairness in learning: Classic and contextual bandits. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS), pages 325–333, 2016.

[23] Faisal Kamiran, Asim Karim, and Xiangliang Zhang. Decision theory for discrimination-aware classification. In 2012 IEEE 12th International Conference on Data Mining, pages 924–929. IEEE, 2012.

[24] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

[25] Matt J. Kusner, Joshua Loftus, Chris Russell, and Ricardo Silva. Counterfactual fairness. In Advances in Neural Information Processing Systems, pages 4066–4076, 2017.

[26] Jeff Larson, Surya Mattu, Lauren Kirchner, and Julia Angwin. How we analyzed the COMPAS recidivism algorithm. ProPublica, 9, 2016.

[27] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015.

[28] Llew Mason, Jonathan Baxter, Peter L. Bartlett, and Marcus R. Frean. Boosting algorithms as gradient descent. In Advances in Neural Information Processing Systems, pages 512–518, 2000.

[29] Ninareh Mehrabi, Fred Morstatter, Nripsuta Saxena, Kristina Lerman, and Aram Galstyan. A survey on bias and fairness in machine learning. arXiv preprint arXiv:1908.09635, 2019.

[30] Sérgio Moro, Paulo Cortez, and Paulo Rita. A data-driven approach to predict the success of bank telemarketing. Decision Support Systems, 62:22–31, 2014.

[31] Amitabha Mukerjee, Rita Biswas, Kalyanmoy Deb, and Amrit P. Mathur. Multi-objective evolutionary algorithms for the risk-return trade-off in bank loan management. International Transactions in Operational Research, 2002.

[32] Kevin P. Murphy. Machine Learning: A Probabilistic Perspective. MIT Press, 2012.

[33] Arvind Narayanan. Translation tutorial: 21 fairness definitions and their politics. In Proc. Conf. Fairness Accountability Transp., New York, USA, 2018.

[34] Eric W. T. Ngai, Yong Hu, Yiu Hing Wong, Yijun Chen, and Xin Sun. The application of data mining techniques in financial fraud detection: A classification framework and an academic review of literature. Decision Support Systems, 50(3):559–569, 2011.

[35] Executive Office of the President. Big data: Seizing opportunities, preserving values, 2014.

[36] Executive Office of the President, Cecilia Munoz, Megan Smith, and DJ Patil. Big Data: A Report on Algorithmic Systems, Opportunity, and Civil Rights. Executive Office of the President, 2016.

[37] Patrick Oliver. Protesting the death of George Floyd, 2020.

[38] Cathy O'Neil. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. Broadway Books, 2016.

[39] Geoff Pleiss, Manish Raghavan, Felix Wu, Jon Kleinberg, and Kilian Q. Weinberger. On fairness and calibration. In Advances in Neural Information Processing Systems, pages 5680–5689, 2017.

[40] Carl Edward Rasmussen. Gaussian processes in machine learning. In Summer School on Machine Learning, pages 63–71. Springer, 2003.

[41] Michael L. Rich. Machine learning, automated suspicion algorithms, and the Fourth Amendment. University of Pennsylvania Law Review, pages 871–929, 2016.

[42] Sebastian Schelter, Yuxuan He, Jatin Khilnani, and Julia Stoyanovich. FairPrep: Promoting data to a first-class citizen in studies on fairness-enhancing interventions. arXiv preprint arXiv:1911.12587, 2019.

[43] Jasper Snoek, Hugo Larochelle, and Ryan P. Adams. Practical Bayesian optimization of machine learning algorithms. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS), 2012.

[44] Sahil Verma and Julia Rubin. Fairness definitions explained. In IEEE/ACM International Workshop on Software Fairness (FairWare), pages 1–7. IEEE, 2018.

[45] Brian Hu Zhang, Blake Lemoine, and Margaret Mitchell. Mitigating unwanted biases with adversarial learning. In Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, pages 335–340, 2018.

[Table 3: Comparison between models (mean ± standard deviation): logistic regression, neural network, and random forest, with accuracy and ROC AUC per dataset. Only the first value (ACI accuracy of 0.852 for logistic regression) survived extraction; the remaining values were lost.]

A Additional Experiments and Details
In this section, we give additional details from the experiments in Section 6, as well as additional experiments.
The need for neural networks.
We start by comparing the performance of neural networks to logistic regression and gradient-boosted regression trees (GBRT) on the datasets we used, to demonstrate the need for neural networks. This experiment is described at the start of Section 6; for convenience, we restate the details here. We construct a neural network with 10 fully-connected layers of size 32, BatchNorm for regularization, and a dropout rate of 0.2, and we compare this to logistic regression and GBRT on the ACI, BM, and COMPAS datasets. See Table 3. We see that the neural network achieves better accuracy and ROC AUC on all datasets except COMPAS, where it is within one standard deviation of the best performance.
Bias sensitivity to initial model conditions.
Next, we study the sensitivity of bias to initial model conditions. Recall that in Table 1, we computed the mean and standard deviation of three fairness measures, as well as accuracy, for training a neural network with different initial random seeds. We saw that the standard deviation of the bias is an order of magnitude higher than the standard deviation of the accuracy. Now we run more experiments to show that the contributions of the weights to the bias score are sensitive to the initial random seed.

For this experiment, we train 10 neural networks with the same architecture as described in Section 6. We want to identify which parameters of the network contribute most to the bias. To identify these parameters, we create 1000 random delta vectors with mean 1 and standard deviation 0.1 for each of the neural networks. We then take the Hadamard product of each random delta vector with the parameters of the corresponding network, and evaluate the statistical parity difference (SPD) on the test set for the networks with the new perturbed parameters. To identify which parameters contribute most to the bias, we train a linear model for each of the 10 neural networks to predict the bias from the random delta vectors, and then we analyze the coefficients of the corresponding linear models. The linear models successfully predict the bias from the random delta vectors (the reported R² value was lost in extraction). Figure 2 (left) shows that only a small fraction of the parameters contribute to the majority of the bias.

Now we want to identify how similar the coefficients of the linear models are across all 10 neural networks. To do so, we stack the normalized coefficients of the linear models and decompose the resulting matrix; Figure 2 (right) plots its singular values.

[Figure 2. Left: sorted coefficient values vs. coefficient index. Right: singular values vs. singular value index.]
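The per-network delta-vector step can be sketched as follows (illustrative; `spd_of` is an assumed helper that loads the flattened parameter vector into a copy of the network and returns the SPD on the test set):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def weight_bias_attribution(spd_of, theta, n_deltas=1000):
    """Perturb the flattened parameter vector `theta` with multiplicative
    deltas ~ N(1, 0.1), record the resulting SPD, and fit a linear model
    whose coefficients indicate which weights drive the bias."""
    deltas = np.random.normal(1.0, 0.1, size=(n_deltas, theta.size))
    spds = np.array([spd_of(theta * d) for d in deltas])  # Hadamard product
    reg = LinearRegression().fit(deltas, spds)
    return reg.coef_  # sorting |coef_| gives Figure 2 (left)
```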