FairXGBoost: Fairness-aware Classification in XGBoost
Srinivasan Ravichandran, Drona Khurana, Bharath Venkatesh, Narayanan Unny Edakunni
Srinivasan Ravichandran, [email protected], AI Labs, American Express, Bangalore, Karnataka
Drona Khurana, [email protected], AI Labs, American Express, Bangalore, Karnataka
Bharath Venkatesh, [email protected], AI Labs, American Express, Bangalore, Karnataka
Narayanan Unny Edakunni, [email protected], AI Labs, American Express, Bangalore, Karnataka
ABSTRACT
Highly regulated domains such as finance have long favoured the use of machine learning algorithms that are scalable, transparent, robust and yield better performance. One of the most prominent examples of such an algorithm is XGBoost [7]. Meanwhile, there is also a growing interest in building fair and unbiased models in these regulated domains, and numerous bias-mitigation algorithms have been proposed to this end. However, most of these bias-mitigation methods are restricted to specific model families such as logistic regression or support vector machine models, thus leaving modelers with a difficult choice between the fairness of the bias-mitigation algorithms and the scalability, transparency and performance of algorithms such as XGBoost. We aim to leverage the best of both worlds by proposing a fair variant of XGBoost that enjoys all the advantages of XGBoost, while also matching the levels of fairness of state-of-the-art bias-mitigation algorithms. Furthermore, the proposed solution requires very little in terms of changes to the original XGBoost library, thus making it easy to adopt. We provide an empirical analysis of our proposed method on standard benchmark datasets used in the fairness community.
CCS CONCEPTS
• Computing methodologies → Boosting.

KEYWORDS
fairness, XGBoost, finance
ACM Reference Format:
Srinivasan Ravichandran, Drona Khurana, Bharath Venkatesh, and Narayanan Unny Edakunni. 2020. FairXGBoost: Fairness-aware Classification in XGBoost. In
KDD Workshop on Machine Learning in Finance ’20, August 24, 2020.
ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/1122445.1122456
KDD Workshop on Machine Learning in Finance '20, August 24, 2020. © 2020 Association for Computing Machinery. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in KDD Workshop on Machine Learning in Finance '20, August 24, 2020, https://doi.org/10.1145/1122445.1122456.

1 INTRODUCTION
Machine learning models are increasingly replacing traditional modeling systems because of better predictive performance and scalability. This has resulted in an explosion in the number of machine learning models used in decision-making systems for a broad spectrum of activities such as credit lending, candidate recruitment etc. This high rate of adoption also means that machine learning models have a significant impact on people and society at large. Consequently, it is essential to ensure that these models are well-regulated.

Building machine learning models in highly regulated domains such as finance and healthcare often comes with its own share of additional challenges. For instance, in the finance industry, it is essential to be transparent in the decision-making process. These challenges can also arise in the form of legal requirements such as the Equal Credit Opportunity Act (ECOA) [3], which makes it unlawful for any creditor to discriminate against any applicant on the basis of race, sex, color etc.

Regulatory bodies around the world, such as the European Union through its General Data Protection Regulation (GDPR) [1], have been keen on developing ways to make machine learning models fair, accountable, transparent and explainable. A host of bias-mitigation approaches [5][11][12][16] have been proposed in recent times to ensure that models are not discriminatory against a specific population. While most of the initial work involved ensuring fairness in the training data used to train these models, these methods often suffered a loss in model performance. More sophisticated bias-mitigation strategies [19][20] have been proposed recently and have been demonstrated to be effective in ensuring fairness while not compromising too much on performance. However, most of these strategies are designed for specific classes of models: either neural networks, which often violate the regulatory requirement of transparency, or models such as logistic regression and support vector machines, which are typically inferior in performance and scalability. Thus, on the one hand, we have simple, interpretable and high-performing machine learning algorithms such as XGBoost that are widely preferred in the finance industry; on the other, we have sophisticated bias-mitigation strategies designed to work with a completely different family of machine learning algorithms, such as neural networks, that may not be suitable for use in regulated domains.

Our goal is to bridge this gap by formulating equivalent bias-mitigation strategies for more practical algorithms. Specifically, in this paper, we introduce a bias-mitigation scheme for XGBoost [7]. XGBoost has the desirable advantages of being flexible, explainable and scalable while also providing state-of-the-art performance on most supervised learning tasks. However, XGBoost differs significantly from other convex-margin based classifiers such as Support
Vector Machines (SVM), logistic regression and neural networks in the way in which the model parameters are updated. This is the primary reason why existing bias-mitigation methods are not compatible with XGBoost. Our main contributions in this paper are:
• the formulation of a regularization-based in-process bias-mitigation technique that squarely fits into XGBoost's greedy tree-building algorithm;
• an empirical comparison with other state-of-the-art bias-mitigation strategies on common benchmarks typically used in the bias-mitigation literature.

The paper is structured as follows. In Section 2, we provide a brief overview of the existing state-of-the-art bias-mitigation methods. In Section 3, we define our bias-mitigation framework, derive the gradient and hessian values used for model building, and show how it fits into XGBoost's existing framework with very little modification. In Section 4, we present our experimental results on common benchmark datasets from the fairness literature and compare our framework's performance and fairness against state-of-the-art bias-mitigation strategies.

2 PRELIMINARIES AND RELATED WORK
As noted in the introduction, bias mitigation is a growing area of research, thanks to the increasing interest in the regulation of machine learning models. Typically, these regulations prohibit discrimination in decision-making against a certain population characterized by sensitive attributes such as race, gender, age or marital status. While one might be tempted to assume that excluding these sensitive attributes from the model building process (a common practice in the industry) would result in fair models, [4] demonstrated that this is not always the case. Oftentimes, other proxy variables that are correlated with the sensitive attributes carry sufficient information to induce bias in the model. Additionally, the datasets used to train these models, and the processes that generate these datasets, might themselves be inherently biased. Hence, a more sophisticated bias-mitigation algorithm is needed to overcome this problem.
In this paper, we consider a supervised learning task and adopt the following notation. The model is trained using $n$ training samples denoted by $(x_i, s_i, y_i)_{i=1}^{n}$. For the $i$-th training sample, $x_i$ is the feature vector, $s_i$ is the binary indicator for the sensitive attribute (such as sex or race), $y_i$ is the binary target label, and $\hat{y}_i$ is the raw score produced by the model. Note that in the case of binary classification, $\hat{y}_i$ needs to be transformed using the sigmoid function before being interpreted as the output probability of the positive class. In order to quantify the level of bias exhibited by the model, we need a fairness metric. Multiple fairness metrics have been proposed and there is no one-size-fits-all metric; in this paper, we focus on the following metric.

Definition 2.1. The disparate impact of a model is the ratio of the positive prediction rates of the minority and the majority groups:
$$DI = \frac{P(\hat{Y} = 1 \mid S = 0)}{P(\hat{Y} = 1 \mid S = 1)} \quad (1)$$
where $S = 0$ denotes the minority group and $S = 1$ the majority group.

Disparate impact (DI) is a well-established measure of fairness and is often associated with the 80% rule that is frequently cited in legislation. We consider this metric due to its prevalence in regulatory law. Throughout this paper, we use DI as the metric for fairness and accuracy as the metric for measuring model performance.
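As a concrete illustration of Definition 2.1, the following minimal Python sketch computes DI from hard model predictions. The array names and the convention that $s = 0$ marks the minority group are our assumptions for this example.

```python
import numpy as np

def disparate_impact(y_pred: np.ndarray, s: np.ndarray) -> float:
    """Disparate impact of binary predictions y_pred given the binary
    sensitive attribute s (s == 0: minority, s == 1: majority)."""
    p_minority = y_pred[s == 0].mean()  # positive prediction rate, minority
    p_majority = y_pred[s == 1].mean()  # positive prediction rate, majority
    return p_minority / p_majority

# Example: DI = (2/3) / (3/4) ~= 0.889, below the 80%-rule threshold of 0.8
y_pred = np.array([1, 0, 1, 1, 0, 1, 1])
s = np.array([0, 0, 0, 1, 1, 1, 1])
print(disparate_impact(y_pred, s))
```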
Prior work on bias mitigation can be broadly classified into three categories: pre-processing, in-processing and post-processing methods. Pre-processing methods [5][14] typically project the data into a feature space with fair representations. In-processing methods [11][16][20] change the training procedure in order to make the model predictions fair. Post-processing methods [12][15] typically transform the model outputs to ensure fairness. Of the three categories, in-processing methods offer the most robustness and flexibility. Existing in-processing methods can be further classified into four categories: optimization in a space constrained by a fairness metric [19]; a regularized objective function on an unconstrained space, where the regularizer is typically a function of the model output and the sensitive feature [16]; an adversarial learning set-up, where an adversary attempts to identify the correlation between a sensitive attribute and the predictor model's output while the predictor's goal is to maximize performance and simultaneously fool the adversary [20]; and, finally, meta-algorithms [6].

Both pre-processing and post-processing methods have been widely applied to black-box models. However, they are often inflexible and degrade model performance. In-processing methods, on the other hand, provide robust bias mitigation with relatively lower performance degradation. [19] proposed an in-process method where the search space for the model is constrained by a fairness metric, namely the covariance between the sensitive features and the signed distance of an instance from the model's decision boundary. However, their method is applicable only to the family of convex-margin classifiers and not to algorithms such as XGBoost. Kamishima et al. [16] proposed the prejudice remover, a regularization-based bias-mitigation strategy whose idea is to add a regularizer that captures the mutual information between $Y$ and $S$. Once again, their method cannot be extended to algorithms such as XGBoost. [20] proposed an adversarial setup as described earlier. However, their method suffers from poor convergence characteristics, and it is often difficult to tune the adversarial system.

Literature on fairness in ensemble models, especially in boosted tree models, is rather limited. To the best of our knowledge, only [9] and [11] consider fairness in a boosting setup. [9] were the first to perform a case study of fairness for AdaBoost. Their approach involved pre-processing and post-processing methods, such as random reshuffling, which can incur additional performance degradation. [11] proposed fair adversarial gradient tree boosting, where the predictor from [20] is replaced by a gradient tree boosting model. However, the convergence issues of the adversarial setup remain, often rendering the method impractical.
3 FAIRXGBOOST
Gradient Boosted Decision Trees (GBDT), introduced by [10], is a boosting framework consisting of a collection of weak learners, which are shallow decision trees. [7] proposed eXtreme Gradient Boosting (XGBoost) as a scalable end-to-end tree boosting algorithm. XGBoost has enjoyed widespread adoption by data scientists in the machine learning community, and has gained particular interest in finance owing to the fact that it is flexible, scalable and explainable.

The GBDT setup for a dataset $D = \{(x_i, y_i)\}$ involves $K$ additive functions put together to make a prediction. Formally, a GBDT model consists of $K$ trees, each represented as $f_t(x)$ and built at the $t$-th boosting round. The prediction function is then defined as
$$\hat{y}_i = \sum_{t=1}^{K} f_t(x_i)$$
The trees are built in a greedy manner by optimizing the following objective function:
$$\mathcal{L}_t = \sum_{i=1}^{n} l\big(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\big) + \Omega(f_t) \quad (2)$$
where $l$ is an appropriate loss function that depends on the task at hand and $\Omega$ is a regularizer on the tree structure. For classification tasks, a common choice is the cross-entropy loss between $y_i$ and $\hat{y}_i$, while the squared error is used for regression tasks.

The key contribution of [7] is the reformulation of this optimization problem. The objective function above can be approximated using a second-order Taylor expansion as
$$\mathcal{L}_t \simeq \sum_{i=1}^{n} \Big[ l\big(y_i, \hat{y}_i^{(t-1)}\big) + g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^2(x_i) \Big] + \Omega(f_t) \quad (3)$$
where $g_i = \nabla_{\hat{y}_i^{(t-1)}} l\big(y_i, \hat{y}_i^{(t-1)}\big)$ and $h_i = \nabla^2_{\hat{y}_i^{(t-1)}} l\big(y_i, \hat{y}_i^{(t-1)}\big)$. This objective is then transformed from the space of functions $f_t$ to the space of leaf weights $w_j$ of the tree, which yields
$$\mathcal{L}_t = \sum_{j=1}^{T} \Big[ w_j \Big(\sum_{i \in I_j} g_i\Big) + \tfrac{1}{2} w_j^2 \Big(\sum_{i \in I_j} h_i\Big) \Big] + \Omega(f_t) \quad (4)$$
where $I_j$ is the set of indices of the samples that fall in leaf $j$. The best split is computed as the split value that optimizes this objective function.
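To make the greedy step explicit, we recall the standard closed-form result from [7] (assuming the usual structural regularizer $\Omega(f_t) = \gamma T + \tfrac{\lambda}{2} \sum_j w_j^2$): minimizing Eq. (4) in each leaf weight gives
$$w_j^* = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}, \qquad \mathcal{L}_t^* = -\frac{1}{2}\sum_{j=1}^{T} \frac{\big(\sum_{i \in I_j} g_i\big)^2}{\sum_{i \in I_j} h_i + \lambda} + \gamma T$$
and a split is chosen to maximize the reduction in $\mathcal{L}_t^*$. Note that both quantities depend on the training data only through the per-sample $g_i$ and $h_i$, which is precisely what allows the fairness regularizer introduced below to be injected by redefining these two values.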
Our proposed approach uses a fairness regularizer that aims to remove the correlation between the sensitive attribute and the model score, thereby ensuring model fairness. The extent to which the regularizer affects the model is controlled by a hyperparameter.

Using the notation introduced earlier, we have a set of training samples $D = \{(x_i, s_i, y_i)\}$. For convenience, let us assume, without loss of generality, that $s_i = 1$ for samples belonging to the majority group and $s_i = 0$ for samples belonging to the minority group. Similarly, $y_i = 1$ indicates the favourable outcome (such as the approval of a credit application). Let $\hat{y}_i$ be the raw leaf score produced by the model for the $i$-th instance and $\sigma(z) = \frac{1}{1 + e^{-z}}$ be the classic sigmoid function. We propose the following regularizer:
$$R_t = \sum_{i=1}^{n} -s_i \log\big(\sigma(\hat{y}_i^{(t)})\big) - (1 - s_i) \log\big(1 - \sigma(\hat{y}_i^{(t)})\big) \quad (5)$$
We emphasise that one must choose the encoding of the majority and minority populations as follows. If $t \in \{0, 1\}$ represents the favourable outcome in a classification task (for example, $t = 1$), the minority members must be encoded with $s = 1 - t$ and the majority members with $s = t$, consistent with the convention above. Intuitively, this encoding enables the regularizer to push the minority group towards more favourable outcomes, leading to a decrease in the bias of the model.

The regularized objective function for a supervised classification task is then the sum of the classical cross-entropy loss between the model predictions and the ground-truth labels and the negative cross-entropy between the model predictions and the sensitive feature:
$$\bar{\mathcal{L}}_t = \sum_{i=1}^{n} \Big[ -y_i \log\big(\sigma(\hat{y}_i^{(t)})\big) - (1 - y_i) \log\big(1 - \sigma(\hat{y}_i^{(t)})\big) \Big] + \Omega(f_t) + \mu \sum_{i=1}^{n} \Big[ s_i \log\big(\sigma(\hat{y}_i^{(t)})\big) + (1 - s_i) \log\big(1 - \sigma(\hat{y}_i^{(t)})\big) \Big]$$
that is, $\bar{\mathcal{L}}_t = \sum_i l(y_i, \hat{y}_i^{(t)}) + \Omega(f_t) - \mu R_t$. The hyperparameter $\mu$ determines the strength of the regularizer: the higher the strength, the higher the fairness of the model. This gives us fine-grained control over the desired level of fairness.
It should be noted that $\mu$ must be chosen such that $0 \le \mu \le 1$, in order to avoid unboundedness in the direction of optimization (for $\mu > 1$, the combined objective can be driven to $-\infty$, and the hessian derived below becomes negative).

We retrace the steps of [7] and reformulate this objective from the space of functions $f_t$ to the space of leaf weights $w_j$ by computing the gradient $\bar{g}_i$ and hessian $\bar{h}_i$ of the new objective function as follows. We drop the superscript $(t-1)$ for convenience of notation.
$$\bar{g}_i = \nabla_{\hat{y}_i} \Big( \sum_{i=1}^{n} -y_i \log(\sigma(\hat{y}_i)) - (1 - y_i) \log(1 - \sigma(\hat{y}_i)) \Big) + \mu \nabla_{\hat{y}_i} \Big( \sum_{i=1}^{n} s_i \log(\sigma(\hat{y}_i)) + (1 - s_i) \log(1 - \sigma(\hat{y}_i)) \Big)$$
$$\bar{g}_i = \sigma(\hat{y}_i) - y_i + \mu\big(s_i - \sigma(\hat{y}_i)\big)$$
Similarly, taking second derivatives of the same two terms, we obtain
$$\bar{h}_i = (1 - \mu)\,\sigma(\hat{y}_i)\big(1 - \sigma(\hat{y}_i)\big)$$

The rest of the tree-building process remains the same as in XGBoost, except that we use the new $\bar{g}_i$ and $\bar{h}_i$ instead. It is worth noting that, comparing $\bar{g}_i$ and $\bar{h}_i$ with $g_i$ and $h_i$ from the original XGBoost formulation, we get the following relationships for the gradient and hessian:
$$\bar{g}_i = g_i + \mu\big(s_i - \sigma(\hat{y}_i)\big) \quad (6)$$
$$\bar{h}_i = h_i (1 - \mu) \quad (7)$$
This simple relationship between the original $g_i, h_i$ and our proposed $\bar{g}_i, \bar{h}_i$ is what makes our approach appealing, since it can be directly implemented using the custom-objective feature of XGBoost.
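Because Eqs. (6) and (7) only shift and rescale the standard logistic gradient and hessian, the method can be prototyped without touching the XGBoost codebase. Below is a minimal sketch using the `obj` callback of `xgboost.train`; the variable names (`X`, `y`, `s`) and hyperparameter values are illustrative assumptions, and the sketch relies on the fact that custom objectives receive the raw margin scores.

```python
import numpy as np
import xgboost as xgb

def make_fair_objective(s: np.ndarray, mu: float):
    """Build a FairXGBoost-style custom objective.

    Implements g_bar = g + mu * (s - sigma(y_hat))  (Eq. 6)
    and        h_bar = h * (1 - mu)                 (Eq. 7)
    where g, h are the usual logistic gradient/hessian and 0 <= mu <= 1.
    s must be aligned row-for-row with the training matrix, with s = 0
    for the minority group and s = 1 for the majority group.
    """
    def fair_objective(y_hat, dtrain):
        y = dtrain.get_label()
        sigma = 1.0 / (1.0 + np.exp(-y_hat))        # raw margins -> probabilities
        grad = (sigma - y) + mu * (s - sigma)       # Eq. (6)
        hess = (1.0 - mu) * sigma * (1.0 - sigma)   # Eq. (7)
        return grad, hess
    return fair_objective

# Illustrative usage (X, y, s are assumed NumPy arrays):
# dtrain = xgb.DMatrix(X, label=y)
# booster = xgb.train({"max_depth": 4, "eta": 0.1}, dtrain,
#                     num_boost_round=100,
#                     obj=make_fair_objective(s, mu=0.5))
```

Setting `mu=0` recovers the standard logistic objective, so the vanilla model is a special case of the same training loop.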
In the next section, we describe the experimental setup that we used and compare our approach to the current state-of-the-art methods. Additionally, we provide insights into how the fairness of the model changes as we increase $\mu$.

4 EXPERIMENTS
Throughout this section, we take the following approach. The best hyperparameter settings for the XGBoost model, such as max-depth, num-rounds and learning-rate, are identified as the ones that maximize model accuracy without the fairness regularizer in place ($\mu = 0$). With these hyperparameters fixed, $\mu$ is then varied and the corresponding accuracy and fairness metrics are measured and reported. The fairness metric we report is the disparate impact (DI) defined in Section 2. The datasets we use are the standard benchmark datasets from the bias-mitigation literature, described below.

The first dataset is the UCI Adult Income dataset [8], where the goal is to train a model that predicts whether an individual makes more than $50K in income, given features such as age, capital gains and capital losses. The dataset also contains the sensitive attribute sex, which takes on the two values {Male, Female}. The dataset comprises more males than females, thus making Male the majority population. Fairness here implies that the model predictions do not discriminate against
Female.

The second dataset we consider is the COMPAS recidivism dataset [13]. The model is trained to predict whether an individual is likely to re-offend in the future. The dataset contains 13 features about 7,000 individuals, and it was one of the hallmark datasets used in the first major debate on the fairness of machine learning models. The sensitive attribute considered here is race. Once again, fairness here means that no particular race is discriminated against.

In addition to the above, we also consider the two other datasets analyzed in [11], namely the Bank and the Default datasets. The Default dataset [18] comprises 23 features about 30,000 Taiwanese credit card users, with class labels stating whether an individual will default on payments. The sensitive attribute considered for this dataset is sex.

The Bank dataset [17] consists of 16 features of about 45,000 clients of a Portuguese banking institution. The goal of the task is to predict whether a client has subscribed to a term deposit. The sensitive attribute is age, encoded in a binary format indicating whether a customer is between 33 and 60 years old or not.
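As an illustration of this binary encoding of age, a minimal sketch, under the assumption that the raw Bank data is loaded into a pandas DataFrame with an `age` column:

```python
import pandas as pd

def binarize_age(df: pd.DataFrame) -> pd.Series:
    """Binary sensitive attribute for the Bank dataset:
    1 if the client is between 33 and 60 years old, else 0."""
    return df["age"].between(33, 60).astype(int)
```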
Table 1: Benchmark dataset statistics
Name      Number of rows   Sensitive attribute
Adult     ∼48,000          sex
COMPAS    ∼7,000           race
Default   ∼30,000          sex
Bank      ∼45,000          age

Table 2: Drop in accuracy to achieve DI ≥ 80%

Dataset   Ours    [11]    [16]    [20]
Adult     –       –       3.0%    2.3%
COMPAS    –       4.6%    3.7%    1.0%
Default   0.0%    0.7%    1.0%    0.0%
Bank      –       0.6%    0.7%    0.6%

We compare our work to three in-process bias-mitigation methods: the prejudice remover [16], fair adversarial gradient tree boosting [11] and adversarial debiasing [20]. For the prejudice remover, we use the implementation provided by [16] to train the model. For [11], we re-use the numbers they report as the benchmark, since their hyperparameters are unknown and we therefore cannot reproduce their results. Since different model families provide different accuracies to begin with, we measure the drop in accuracy from the vanilla model required to obtain a DI of at least 80%, rather than the absolute value of the accuracy itself. The 80% DI target is somewhat arbitrary, but it is a useful rule of thumb that comes from the 80-20 rule [2]. The comparison of the drop in accuracy incurred by our approach against [11], [16] and [20] is shown in Table 2.

Our method outperforms [11], [20] and [16] on all but the Adult dataset, where our drop in accuracy is more pronounced. When compared against [16], our method and [11] incur a smaller dip in accuracy, showing their effectiveness. An interesting observation is that for the Default dataset, the vanilla XGBoost model that we trained was already satisfying the criterion of DI ≥
80% before any bias mitigation was applied. The same was observed for the vanilla neural network model of [20]. We believe that the higher loss in accuracy on the Adult dataset can be explained by the low DI of the vanilla model: the best hyperparameters for the vanilla model need not be optimal for all values of $\mu$. This is supported by the fact that the adversarial methods [11] and [20] incur a lower loss in accuracy on the Adult dataset because they use a more complex multi-layer perceptron adversary. A more nuanced method for tuning hyperparameters is hence required for non-adversarial methods, and we defer this to future work.

We also plot the variation of DI and accuracy with respect to the weight $\mu$ of the fairness regularizer, to visualise the effect of increasing $\mu$ on the two metrics. For each $\mu$, we pick the classifier that achieves the highest accuracy and report it along with the corresponding disparate impact. This is in contrast to some prior analyses, where the "best" models are chosen as those with the best DI. We believe that studying the fairness of the best-accuracy model is more practical for modelers.

Figure 1: Disparate impact of the maximum-accuracy classifier for different values of $\mu$ for the Adult Income dataset and the COMPAS dataset.

Figure 2: Disparate impact of the maximum-accuracy classifier for different values of $\mu$ for the Bank dataset and the Default dataset.

In Fig. 1, we show the plots for the Adult and COMPAS datasets. The Adult dataset requires a weight $\mu$ within a specific range to achieve DI ≥ 80%; for the COMPAS dataset, the corresponding range of $\mu$ extends up to roughly 0.5. Similarly, the plots for the Bank and Default datasets are shown in Fig. 2.
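The sweep itself is straightforward to script. The following sketch, with an illustrative train/test split and parameter values, and reusing the hypothetical helpers `make_fair_objective` and `disparate_impact` defined earlier, varies $\mu$ with the tuned hyperparameters held fixed and records the accuracy and DI of each resulting model (`accuracy_score` is from scikit-learn):

```python
import numpy as np
import xgboost as xgb
from sklearn.metrics import accuracy_score

# X_train, y_train, s_train, X_test, y_test, s_test are assumed to be
# pre-split NumPy arrays; `params` holds the hyperparameters tuned at mu = 0.
params = {"max_depth": 4, "eta": 0.1}
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test)

results = []
for mu in np.linspace(0.0, 1.0, 11):  # sweep the regularizer weight
    booster = xgb.train(params, dtrain, num_boost_round=100,
                        obj=make_fair_objective(s_train, mu))
    # with a custom objective, predict() returns raw margins: apply the sigmoid
    probs = 1.0 / (1.0 + np.exp(-booster.predict(dtest)))
    y_hat = (probs > 0.5).astype(int)
    results.append((mu,
                    accuracy_score(y_test, y_hat),
                    disparate_impact(y_hat, s_test)))
```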
5 CONCLUSION AND FUTURE WORK
In this paper, we have described an extension of XGBoost that can be used to build fair machine learning models. Our choice of fairness regularizer makes the method easy to incorporate into XGBoost with minimal changes, while also providing fine-grained control over the level of fairness to be imposed. Furthermore, we have compared our method with current state-of-the-art bias-mitigation strategies on common benchmark datasets. While we have only considered the cross-entropy loss between $\hat{y}_i$ and $s_i$, our framework is applicable to other continuous and differentiable regularizer functions as well. Hence, our proposal helps bridge the gap between fairness researchers and practitioners in the finance community.

As future work, it would be interesting to tackle the other challenge typically faced in well-regulated domains: privacy. Methods such as differential privacy have been proposed for the secure sharing of sensitive features with modelers. Adopting such a methodology for XGBoost would be a valuable addition. Another interesting direction is the monitoring of the regularized objective in order to gain insights into fairness-accuracy tradeoffs; XGBoost's inherent support for tracking an evaluation metric could be reused for this task. It would also be useful to pursue the handling of polyvalent sensitive attributes (such as race, which can take on many values such as Asian, White, Hispanic, African-American etc.).

REFERENCES
[1] European Union. 2016. General Data Protection Regulation (GDPR). Regulation (EU) 2016/679.
[2] The 80-20 rule (Pareto principle).
[3] Equal Credit Opportunity Act (ECOA). 15 U.S.C. § 1691.
[4] Toon Calders and Sicco Verwer. 2010. Three naive Bayes approaches for discrimination-free classification. Data Mining and Knowledge Discovery 21, 2 (2010), 277–292.
[5] Flavio P. Calmon, Dennis Wei, Bhanukiran Vinzamuri, Karthikeyan Natesan Ramamurthy, and Kush R. Varshney. 2017. Optimized pre-processing for discrimination prevention. In Advances in Neural Information Processing Systems. 3992–4001.
[6] L. Elisa Celis, Lingxiao Huang, Vijay Keswani, and Nisheeth K. Vishnoi. 2019. Classification with fairness constraints: A meta-algorithm with provable guarantees. In Proceedings of the Conference on Fairness, Accountability, and Transparency. 319–328.
[7] Tianqi Chen and Carlos Guestrin. 2016. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 785–794.
[8] Dheeru Dua and Casey Graff. 2017. UCI Machine Learning Repository. http://archive.ics.uci.edu/ml
[9] Benjamin Fish, Jeremy Kun, and Ádám D. Lelkes. [n.d.]. Fair boosting: a case study. Citeseer.
[10] Jerome H. Friedman. 2001. Greedy function approximation: a gradient boosting machine. Annals of Statistics (2001), 1189–1232.
[11] Vincent Grari, Boris Ruf, Sylvain Lamprier, and Marcin Detyniecki. 2019. Fair adversarial gradient tree boosting. In 2019 IEEE International Conference on Data Mining (ICDM). 1060–1065.
[12] Moritz Hardt, Eric Price, and Nati Srebro. 2016. Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems. 3315–3323.
[13] Jeff Larson, Surya Mattu, Lauren Kirchner, and Julia Angwin. 2016. Machine Bias. ProPublica (2016).
[14] Faisal Kamiran and Toon Calders. 2012. Data preprocessing techniques for classification without discrimination. Knowledge and Information Systems 33, 1 (2012), 1–33.
[15] Faisal Kamiran, Asim Karim, and Xiangliang Zhang. 2012. Decision theory for discrimination-aware classification. In 2012 IEEE 12th International Conference on Data Mining. IEEE, 924–929.
[16] Toshihiro Kamishima, Shotaro Akaho, Hideki Asoh, and Jun Sakuma. 2012. Fairness-aware classifier with prejudice remover regularizer. In Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2012, Bristol, UK, September 24-28, 2012, Proceedings, Part II (Lecture Notes in Computer Science), Peter A. Flach, Tijl De Bie, and Nello Cristianini (Eds.), Vol. 7524. Springer, 35–50. https://doi.org/10.1007/978-3-642-33486-3_3
[17] Sérgio Moro, Paulo Cortez, and Paulo Rita. 2014. A data-driven approach to predict the success of bank telemarketing. Decision Support Systems 62 (2014), 22–31.
[18] I-Cheng Yeh and Che-hui Lien. 2009. The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients. Expert Systems with Applications 36, 2 (2009), 2473–2480.
[19] Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez-Rodriguez, and Krishna P. Gummadi. 2019. Fairness constraints: A flexible approach for fair classification. Journal of Machine Learning Research 20, 75 (2019), 1–42. http://jmlr.org/papers/v20/18-262.html
[20] Brian Hu Zhang, Blake Lemoine, and Margaret Mitchell. 2018. Mitigating unwanted biases with adversarial learning. In Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society. 335–340.