Gradient Reversal Against Discrimination
Edward Raff
Jared Sylvester

Abstract
No methods currently exist for making arbitrary neural networks fair. In this work we introduce GRAD, a new and simplified method for producing fair neural networks that can be used for auto-encoding fair representations or directly with predictive networks. It is easy to implement and add to existing architectures, has only one (insensitive) hyper-parameter, and provides improved individual and group fairness. We use the flexibility of GRAD to demonstrate multi-attribute protection.
1. Introduction
Artificial Neural Network methods are quickly becoming ubiquitous in society, spurred by advances in image, signal, and natural language processing. This pervasiveness leads to a new need for considering the fairness of such networks from many perspectives, including: how they are used, who can access them and their training data, and potential biases in the model itself. There are many reasons for desiring fair classification algorithms. These include legal mandates to be non-discriminatory, ensuring a moral or ethical goal, or for use as evidence in legal proceedings (Romei & Ruggieri, 2014). Despite the long-standing need and interest in this problem, there are few methods available today for training fair networks.

When we say that a network is fair, we mean fair with respect to a protected attribute a_p, such as age or gender. Our desire is that a model's predicted label ŷ given a feature vector x is invariant to changes in a_p. An initial reaction may be to simply remove a_p from the feature vector x. While intuitive, this "fairness through unawareness" does not remove the correlations with a_p that exist in the data, and so the result will still produce a biased model (Pedreshi et al., 2008). For this reason we need to devise approaches that explicitly remove the presence of a_p from the model's predictions.

Booz Allen Hamilton; University of Maryland, Baltimore County. Correspondence to: Edward Raff
Proceedings of the Workshop on Fairness, Accountability and Transparency in Machine Learning, Stockholm, Sweden, 2018. Copyright 2018 by the author(s).
We do so in this work by introducing a new method to train fair neural networks. Our approach, termed Gradient Reversal Against Discrimination (GRAD), makes use of a network which simultaneously attempts to predict the target class y and the protected attribute a_p. The key is that the gradients resulting from predictions of a_p are reversed before being used for weight updates. The result is a network which is capable of learning to predict the target class but effectively inhibited from being able to predict the protected attribute. GRAD displays competitive accuracy and improved fairness when compared to prior approaches. GRAD's advantage comes from increased simplicity compared to prior approaches, making it easier to apply and applicable to a wider class of networks. Prior works in this space are limited to one protected attribute (but see Zafar et al., 2017) and require the introduction of multiple hyper-parameters. These parameters must be cross-validated, making the approaches challenging to use. Further, our approach can be used to augment any current model architecture, where others have been limited to auto-encoding style architectures.
2. Gradient Reversal Against Discrimination
We now present our new approach to developing neural networks that are fair with respect to some protected attribute. We call it Gradient Reversal Against Discrimination (GRAD); it is inspired by recent work in transfer learning. Notably, Ganin et al. (2016) introduced the idea of domain adaptation by attempting to jointly predict a target label and a domain label (i.e., which domain did this data instance come from?). By treating the protected attribute as the new domain, we can use this same approach to instead prevent the network from being biased by the protected attribute a_p.

After several feature extraction layers the network forks. One branch learns to predict the target y, while the other attempts to predict the protected attribute a_p. We term the portion of the network before the splitting point the "trunk," and those portions after the "target branch" and the "attribute branch." The final loss of the network is the sum of the losses of both branches, giving

ℓ(y, a_p) = ℓ_t(y) + λ · ℓ_p(a_p).

Here, λ determines the relative importance of fairness compared to accuracy. In practice, we find that performance is insensitive to the particular choice of λ, and any value of λ ≥ 50 performed equivalently. In our experiments we use λ = 100 without any kind of hyper-parameter optimization.

Figure 1. Diagram of GRAD architecture: the raw input x feeds the feature-extraction trunk, which splits into the target branch (loss ℓ_t(y)) and the attribute branch (loss λ · ℓ_p(a_p)). The red connection indicates normal forward propagation, but back-propagation will reverse the signs, passing −∂λℓ_p(a_p)/∂θ back from the attribute branch.

The values of both ℓ_t(y) and ℓ_p(a_p) are calculated and used to determine gradients for weight updates as usual, with one important exception. When the gradients have been back-propagated from the attribute branch, they are reversed (i.e., multiplied by −1) before being applied to the trunk. This moves the trunk's parameters away from optima in predictions of a_p, crippling the ability to correctly output the protected attribute. Since the target branch also depends on the trunk parameters, it inherits this inability to accurately output the value of the protected attribute. No such reversal is applied to the gradients derived from y, so the network's internal state representations are suitable for predicting y but nescient of a_p.

It is instructive to consider why it may be insufficient to set up a loss function which directly punishes the network for correctly predicting a_p. If this were the case, the network could achieve low loss by forming internal representations which are very good at predicting the protected attribute, and then "throw the game" by simply reversing the correct prediction in the penultimate layer. (That is, a potential, reliable strategy for getting the wrong answer is to become very good at getting the right answer, and then lying about what one thinks the answer should be.) If this strategy is adopted, then the representations necessary for correctly recovering a_p from x would be available to the target branch when making its prediction of y, which is the situation we aim to prevent.

Architecture Variants
As mentioned above, many of the other neural approaches to fair classification take an autoencoder or representation-learning approach. This approach has its advantages. For instance, it allows the person constructing the fair model to be agnostic about the ultimate task it will be applied to. Others, like ALFR, consider a target value directly, and so cannot be re-used for other tasks, but may perform better in practice on the specific problem they were constructed for. Our GRAD approach, thanks to its comparative simplicity, can be used in both formulations. This makes it the only neural network-based approach to fairness that offers both task flexibility and specificity.
GRAD-Auto will designate our approach when using an auto-encoder as the target branch's loss. That is, if x is the input feature vector, x̃ will be the feature vector derived from x such that the protected attribute a_p ∉ x̃. We then use ℓ_t^Auto(·) = ||h_target − x̃|| as the loss function for the target branch, where h_target is the activation vector from the last layer of the target branch. This approach is in the same style as LFR and VFA, where a hidden representation invariant to a_p is learned, and then Logistic Regression is used on the outputs from the trunk sub-network to perform classification.

GRAD-Pred will designate our task-specific approach, where we use the labels y_i directly. Here we simply use the standard logistic loss ℓ_t^Pred(·) = log(1 + exp(−y · h_target)). In this case the target branch of the network produces a single activation, and the target branch output itself is used as the classifier directly.

Since we are dealing with binary protected attributes, both GRAD-Auto and GRAD-Pred have the attribute branch of the network use ℓ_p(a_p) = log(1 + exp(−a_p · h_attribute)).

In the spirit of minimizing the effort needed by the practitioner, we do not perform any hyper-parameter search for the network architecture either. Implemented in Chainer (Tokui et al., 2015), we use two fully-connected layers for every branch of the network (trunk, target, & attribute), where all hidden layers have 40 neurons. Each layer uses batch-normalization followed by the ReLU activation function. Training is done using the Adam optimizer for gradient descent. We emphasize that the heart of GRAD is the inclusion of the attribute branch with reversed gradient; this technique is flexible enough to be used regardless of the particular choices of layer types, sizes, etc. We train each model for 50 epochs, and use a validation set to select the model from the best epoch.
We define best as the model having the lowest Discrimination (see §3.1) on the validation set, breaking ties by selecting the model with the highest accuracy. When multiple attributes are protected, we use the lowest average Discrimination.
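To make the reversal concrete, the following is a minimal sketch outside of any framework, using illustrative squared-error losses rather than the logistic losses above: a linear trunk h = Wx feeds two scalar heads; each head receives its own ordinary gradient, but the trunk receives the attribute branch's gradient multiplied by −1, so it descends ℓ_t − λ·ℓ_p. All names here are illustrative, not the paper's code.

```python
import numpy as np

def grad_step(W, u, v, x, y, a, lam=100.0):
    """One GRAD-style backward pass (illustrative sketch).

    Trunk:          h    = W @ x
    Target head:    yhat = u @ h, with loss l_t = 0.5 * (yhat - y)**2
    Attribute head: ahat = v @ h, with loss l_p = 0.5 * (ahat - a)**2

    Both heads get their normal gradients; the trunk gets the attribute
    branch's gradient REVERSED, i.e. d(l_t - lam * l_p)/dW.
    """
    h = W @ x
    err_t = u @ h - y
    err_p = v @ h - a
    g_u = err_t * h                # target head: ordinary gradient
    g_v = lam * err_p * h          # attribute head: ordinary gradient
    # Trunk: target gradient flows normally, attribute gradient is reversed.
    g_W = np.outer(err_t * u, x) - lam * np.outer(err_p * v, x)
    return g_W, g_u, g_v
```

Updating W by −η·g_W pushes the trunk toward representations that help predict y while actively hurting the prediction of a_p; u and v still descend their own losses, so the attribute head keeps probing for a_p as hard as it can.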
3. Methodology
There is currently ongoing debate about what it means for a machine learning model to be fair. We choose to use the same evaluation procedure laid out by Zemel et al. (2013). This makes our results comparable with a larger body of work, as their approach and metrics have been widely used throughout the literature (e.g., Landeiro & Culotta, 2016; Bechavod & Ligett, 2017; Dwork et al., 2017). We use the same evaluation procedure and metrics: Discrimination, Consistency, Delta, and Accuracy.
Given a dataset {x_1, …, x_n} ∈ D, we define the ground-truth label for the i-th datum as y_i and the model's prediction as ŷ_i. Each is with respect to the binary target label y ∈ {0, 1}. While we define both y_i and ŷ_i, we emphasize that only the predicted label ŷ_i is used in the fairness metrics. This is because fairness is related not directly to accuracy but to equality of treatment.

Discrimination is a macro-level measure of "group" fairness, computed by taking the difference between the average predicted scores for each attribute value, assuming a_p is a binary attribute:

Discrimination = | Σ_{x_i ∈ T_{a_p}} ŷ_i / |T_{a_p}| − Σ_{x_i ∈ T_{¬a_p}} ŷ_i / |T_{¬a_p}| |    (1)

The second metric is Consistency, which is a micro-level measure of "individual" fairness. For each x_i ∈ D, we compare its prediction ŷ_i with the average of its k nearest neighbors, and take the average of this score across D:

Consistency = 1 − (1/N) Σ_{i=1}^{N} | ŷ_i − (1/k) Σ_{j ∈ k-NN(x_i)} ŷ_j |    (2)

Because Consistency and Discrimination are independent of the actual accuracy of the method used, we also consider Delta = Accuracy − Discrimination. This gives a combined measure of an algorithm's accuracy that penalizes it for biased predictions.

We use these metrics in the same manner and on the same datasets as laid out in Zemel et al. (2013) so that we can compare our results with prior work. This includes using the same training, validation, and testing splits. When training our GRAD approaches, we perform 50 epochs of training and select the model to use from the validation performance. Specifically, we choose the epoch that had the lowest Discrimination, breaking ties by selecting the highest accuracy.
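Equations (1) and (2) translate directly into code. The sketch below is a minimal NumPy rendering (not the original evaluation code); it assumes Euclidean k-NN over the raw feature vectors and that a point is excluded from its own neighbor set.

```python
import numpy as np

def discrimination(y_hat, a_p):
    """Eq. (1): absolute gap in mean prediction between the two groups."""
    a_p = np.asarray(a_p, dtype=bool)
    y_hat = np.asarray(y_hat, dtype=float)
    return abs(y_hat[a_p].mean() - y_hat[~a_p].mean())

def consistency(y_hat, X, k=3):
    """Eq. (2): 1 minus the mean gap between each prediction and the
    average prediction of its k nearest neighbors (Euclidean distance)."""
    X = np.asarray(X, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    gaps = []
    for i in range(len(X)):
        order = [j for j in np.argsort(dists[i]) if j != i]  # drop self
        gaps.append(abs(y_hat[i] - y_hat[order[:k]].mean()))
    return 1.0 - float(np.mean(gaps))

def delta(accuracy, disc):
    """Delta = Accuracy - Discrimination."""
    return accuracy - disc
```

Note that only the predictions ŷ_i appear in `discrimination` and `consistency`, matching the point above that the fairness metrics never consult the ground-truth labels.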
As a baseline for comparison against GRAD-Pred and GRAD-Auto, we consider the same architecture but with the attribute branch removed. This produces a standard neural network, denoted NN. For comparison with other fairness-seeking neural network algorithms, we present prior results for the Learning Fair Representations (LFR) (Zemel et al., 2013), Variational Fair Autoencoder (VFAE) (Louizos et al., 2016), and Adversarial Learned Fair Representations (ALFR) (Edwards & Storkey, 2016) approaches. For all models on all datasets, we report the metrics as presented in their original publications, as we were unable to replicate VFAE and ALFR's results.
4. Results
The results are given in Table 1. For values unreported in their original work, we show a dash ("—") in the table. Our GRAD approaches are shown in the top rows; the bottom three rows include the other approaches as explained in subsection 3.2.

When we compare the standard neural network (NN) with its GRAD counterpart, we can see that the GRAD approach always increases the Delta and Consistency scores, and reduces the Discrimination. This shows its applicability across network types (classifying and auto-encoding). We can even see the GRAD approach improve accuracy on the Adult dataset by 5 percentage points. While we would not expect this behavior (i.e., a negative cost of fairness) in the general case, it is nonetheless interesting, and it may indicate that the protected attribute allows overfitting.

Comparing the GRAD algorithms to the other fair neural networks LFR, VFAE, and ALFR, we see that GRAD is usually best or second best in each metric. On both the German and Adult datasets, it achieves the best Discrimination and Consistency scores compared to any of the algorithms tested. On the German dataset VFAE obtains a higher Delta score by having a high accuracy, though VFAE has 4% Discrimination compared to GRAD-Pred's 0.06%. On the Health dataset, GRAD-Auto and GRAD-Pred have near-identical results. This is overall significantly better than the LFR approach, which has an 11 percentage point difference in Accuracy and Delta scores compared to the GRAD approaches. The VFAE algorithm is similarly within a fractional distance, though Consistency is not reported for VFAE.

GRAD consistently produces the highest Consistency. On the Adult dataset, where VFAE and ALFR get better accuracy, it may have come at the cost of lower Consistency. This could not be confirmed since we could not replicate their results.
Table 1. For each dataset (German, Adult, and Health) we show Accuracy (Acc), Delta, Discrimination (Discr), and Consistency (Cons). Best results shown in bold, second best in italics. Rows compare NN-Auto, NN-Pred, GRAD-Auto, GRAD-Pred, LFR, VFAE, and ALFR; unreported values are shown as "—".

In almost all prior works that we are aware of, it is assumed that there is only one attribute that needs to be protected. However, this is a myopic view of the world. All of the protected attributes that have been tested individually in this work, like age, race, and gender, may co-occur and interact with each other. We show this in Table 2 using the Diabetes dataset used in Edwards & Storkey (2016), which has both Race and Gender as features in the corpus. In this case GRAD-Pred and GRAD-Auto are protecting both the Race and Gender attributes. GRAD-Pred-R shows the results for protecting only Race, and GRAD-Pred-G for only protecting Gender. GRAD-Auto follows the same convention.

Since Discrimination is computed with respect to specific attributes, in Table 2 we show the discrimination scores with respect to both of the protected attributes. Since we have two protected attributes a_p⁽¹⁾ and a_p⁽²⁾, we compute Delta = Accuracy − (Discrimination(a_p⁽¹⁾) + Discrimination(a_p⁽²⁾)) / 2. In doing so, we can see that when two protected variables are present, the GRAD approach is able to reduce Discrimination and increase Delta for both the autoencoder and the standard predictive network. GRAD-Pred also continues to increase the Consistency with respect to the naive neural network.

Table 2. Accuracy, Delta, Discrimination (with respect to Race and Gender), and Consistency for our new method on the Diabetes dataset. Last four rows show GRAD models when only Race (R) or Gender (G) are protected.

                             Discrimination
Algorithm        Acc      Delta     Race      Gender    Cons
NN-Auto          0.5735   0.5392    0.0412    0.0275    0.6411
GRAD-Auto        0.5765   0.5723    —         —         —
GRAD-Pred        —        —         —         —         —
GRAD-Auto-R      0.5851   0.5749    0.0003    0.0201    0.6404
GRAD-Auto-G      0.5640   0.5143    0.0981    0.0013    0.6093
GRAD-Pred-R      0.5844   0.5478    0.0020    0.0713    0.7538
GRAD-Pred-G      0.5941   0.5526    0.0785    0.0045    0.6849

Comparing GRAD-Pred with GRAD-Pred-R and GRAD-Pred-G is also critical to show that protecting both attributes simultaneously provides a significant benefit. On the Diabetes data, we see the model increase its discrimination with respect to Gender when only Race is protected. Similarly, when we protect Gender, discrimination with respect to Race increases. Explicitly protecting both is the only safe way to reduce discrimination on both.

The model shifting to leverage other protected features is not surprising. When we penalize a feature which provides information, the model must attempt to recover discriminative information in other (potentially non-linear) forms from the other features. Thus the importance and utility of GRAD to protect both simultaneously is established.

Figure 2. Performance of GRAD-Pred (Accuracy, Discrimination, and Consistency) as a function of λ on the x-axis (log scale) for the Adult Income and Heritage Health datasets. A dashed vertical black line shows the value λ = 100 used in all experiments.

We have discussed so far that a benefit of the GRAD approach is its simplicity in application due to having only one hyper-parameter, λ. We now show that performance is largely robust to the value used. In Figure 2 we plot the Accuracy, Discrimination, and Consistency as a function of λ over several orders of magnitude (log scale), which shows GRAD's consistent behavior. On the Adult dataset, results stabilize once λ grows past a modest threshold. The Health dataset's curves look flat through the entire plot because the variation is far below the plot's scale, making it indiscernible. Only the Adult and Health plots are shown due to space limitations. The Diabetes plot is similar, and the German dataset has more variability due to its small size (n = 1000).
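The generalized Delta used for the multi-attribute results is just accuracy minus the mean of the per-attribute Discriminations; as a trivial sketch (the function name is illustrative):

```python
def multi_attribute_delta(accuracy, discriminations):
    """Delta with several protected attributes: Accuracy minus the
    mean of the per-attribute Discrimination scores."""
    return accuracy - sum(discriminations) / len(discriminations)
```

With a single protected attribute this reduces to the ordinary Delta = Accuracy − Discrimination.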
5. Conclusions
We have introduced GRAD, a flexible approach for building fair neural networks that can be used to augment any general neural network, and does not mandate the auto-encoding approach of prior work or the use of cumbersome additional hyper-parameters. GRAD is competitive with prior work, can protect multiple attributes, and often delivers superior fairness through low discrimination.
Acknowledgments
We would like to thank Steven Mills and Paul Terwilliger for their support of this work.
References
Bechavod, Yahav and Ligett, Katrina. Learning Fair Classifiers: A Regularization-Inspired Approach. In FAT/ML Workshop, 2017. URL http://arxiv.org/abs/1707.00044.

Dwork, Cynthia, Immorlica, Nicole, Kalai, Adam Tauman, and Leiserson, Max. Decoupled classifiers for fair and efficient machine learning. In FAT/ML Workshop, 2017. URL https://arxiv.org/pdf/1707.06613.pdf.

Edwards, Harrison and Storkey, Amos. Censoring Representations with an Adversary. In International Conference on Learning Representations (ICLR), 2016. URL http://arxiv.org/abs/1511.05897.

Ganin, Yaroslav, Ustinova, Evgeniya, Ajakan, Hana, Germain, Pascal, Larochelle, Hugo, Laviolette, François, Marchand, Mario, and Lempitsky, Victor. Domain-adversarial Training of Neural Networks. J. Mach. Learn. Res., 17(1):2030–2096, 2016. ISSN 1532-4435. URL http://dl.acm.org/citation.cfm?id=2946645.2946704.

Landeiro, Virgile and Culotta, Aron. Robust Text Classification in the Presence of Confounding Bias. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, AAAI'16, pp. 186–193. AAAI Press, 2016. URL http://dl.acm.org/citation.cfm?id=3015812.3015840.

Louizos, Christos, Swersky, Kevin, Li, Yujia, Welling, Max, and Zemel, Richard. The Variational Fair Autoencoder. In International Conference on Learning Representations (ICLR), 2016. URL http://arxiv.org/abs/1511.00830.

Pedreshi, Dino, Ruggieri, Salvatore, and Turini, Franco. Discrimination-aware Data Mining. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '08, pp. 560–568, New York, NY, USA, 2008. ACM. ISBN 978-1-60558-193-4. doi: 10.1145/1401890.1401959.

Romei, Andrea and Ruggieri, Salvatore. A multidisciplinary survey on discrimination analysis. The Knowledge Engineering Review, 29(05):582–638, 2014. ISSN 0269-8889. doi: 10.1017/S0269888913000039.

Tokui, Seiya, Oono, Kenta, Hido, Shohei, and Clayton, Justin. Chainer: a Next-Generation Open Source Framework for Deep Learning. In Proceedings of Workshop on Machine Learning Systems (LearningSys) in The Twenty-ninth Annual Conference on Neural Information Processing Systems (NIPS), 2015. URL http://learningsys.org/papers/LearningSys_2015_paper_33.pdf.

Zafar, Muhammad Bilal, Valera, Isabel, Rodriguez, Manuel Gomez, and Gummadi, Krishna P. Fairness constraints: Mechanisms for fair classification. In Artificial Intelligence and Statistics, pp. 962–970, 2017.

Zemel, Rich, Wu, Yu, Swersky, Kevin, Pitassi, Toni, and Dwork, Cynthia. Learning Fair Representations. In Dasgupta, Sanjoy and McAllester, David (eds.), Proceedings of the 30th International Conference on Machine Learning, volume 28 of Proceedings of Machine Learning Research, pp. 325–333, Atlanta, Georgia, USA, 2013. PMLR. URL http://proceedings.mlr.press/v28/zemel13.html.