The FairCeptron: A Framework for Measuring Human Perceptions of Algorithmic Fairness
Georg Ahnert, Ivan Smirnov, Florian Lemmerich, Claudia Wagner, Markus Strohmaier
RWTH Aachen University; GESIS – Leibniz Institute for the Social Sciences
[email protected], {ivan.smirnov, florian.lemmerich, markus.strohmaier}@cssh.rwth-aachen.de, [email protected]

Abstract
Measures of algorithmic fairness often do not account for human perceptions of fairness, which can vary substantially between different sociodemographic groups and stakeholders. The FairCeptron framework is an approach for studying perceptions of fairness in algorithmic decision making, such as in ranking or classification. It supports (i) studying human perceptions of fairness and (ii) comparing these human perceptions with measures of algorithmic fairness. The framework includes fairness scenario generation, fairness perception elicitation, and fairness perception analysis. We demonstrate the FairCeptron framework by applying it to a hypothetical university admission context where we collect human perceptions of fairness in the presence of minorities. An implementation of the FairCeptron framework is openly available, and it can easily be adapted to study perceptions of algorithmic fairness in other application contexts. We hope our work paves the way towards elevating the role of studies of human fairness perceptions in the process of designing algorithmic decision making systems.

Motivation
Considering fairness in algorithmic decision making poses an important challenge (Chouldechova and Roth 2020). Different definitions of algorithmic fairness have been proposed, including individual measures (Dwork et al. 2012) as well as group-based measures for both classification (Friedler et al. 2019) and ranking decisions (Yang and Stoyanovich 2017). In general, algorithms trade off accuracy and fairness (Kearns and Roth 2019), and group-based fairness measures cannot be simultaneously equalized over all groups (Chouldechova 2017). Thus, normative decisions must be made.

One way of approaching these decisions is through an analysis of what is perceived as fair, involving the target population of a deciding algorithm in its creation. This could increase the acceptance of algorithmic decision making (Awad et al. 2018). Involvement also benefits procedural fairness, often the most important contributor to overall fairness perception (Ambrose, Wo, and Griffith 2015). Previous research investigated perceptions of algorithmic fairness (Saxena et al. 2019; Srivastava, Heidari, and Krause 2019). We present the FairCeptron framework for studying fairness perceptions. It allows studying classification and ranking decisions that do not necessarily optimize for a single fairness measure. With the FairCeptron, obligatory trade-offs between accuracy and multiple fairness measures can be investigated, and the nature of the relationships between fairness perceptions and fairness measures can be determined. An implementation is available as open source (https://github.com/cssh-rwth/fairceptron) and built for easy deployment and adaptation to different study contexts.

The FairCeptron Framework
The FairCeptron framework consists of three components: (i) generation of fairness scenarios according to a prespecified algorithm, (ii) presentation of scenarios to survey participants and collection of their subjective fairness ratings, and (iii) analysis of responses that takes into account characteristics of the scenarios (e.g., group sizes) and characteristics of the users (e.g., sociodemographics or attitudes). The FairCeptron framework can be implemented in various ways; in this paper we present one particular implementation.
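Component (i), which the following sections describe in more detail, generates ranking scenarios as permutations in which personas within a group stay ordered by qualification. A rough sketch of this enumeration is shown below; the persona representation and function name are illustrative assumptions, not the framework's actual code:

```python
import itertools

def group_order_respecting_rankings(personas):
    """Enumerate all rankings of the given personas in which, within each
    group, more qualified personas always precede less qualified ones.
    Personas are modeled here as dicts with a group label and a numeric
    qualification score (an illustrative assumption)."""
    groups = {p["group"] for p in personas}
    for perm in itertools.permutations(personas):
        # Keep a permutation only if every group's qualifications appear
        # in descending order along the ranking.
        if all(
            [p["qualification"] for p in perm if p["group"] == g]
            == sorted((p["qualification"] for p in perm if p["group"] == g),
                      reverse=True)
            for g in groups
        ):
            yield perm
```

For four personas split evenly over two groups, this yields 4!/(2!·2!) = 6 rankings, i.e. all interleavings of the two within-group orders. In the framework, the resulting scenarios would additionally be clustered along fairness measures before being shown to participants.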
Fairness scenario generation
Algorithmic ranking and classification scenarios are generated that consist of personas from two or more groups, which can optionally have a second, numeric attribute associated with them. We provide simple code examples for scenario generation in Python. The scenarios are generated as all possible selections from / permutations of n personas, in which personas within a group are selected / ranked by qualification. The scenarios are clustered along multiple measures of algorithmic fairness, ensuring that each participant later receives a variety of scenarios while maximizing the total number of scenarios that are tested.

Figure 1: (A) A FairCeptron ranking scenario. Participants are shown an algorithmic ranking scenario and rate its perceived fairness on a visual analogue scale. In addition to ranking, classification scenarios are also supported. (B) Perceptions of fairness across different ranking scenarios. All scenarios are binned by ordering utility (Zehlike et al. 2017) and gender representation (adapted from Yang and Stoyanovich 2017). Participants were mainly influenced by ordering utility. Higher ratings for over-representation of women vs. men can be seen in scenarios with low ordering utility.

Fairness perception elicitation
Participants take part through a responsive, universal web application, as shown in Fig. 1 (A). For each new participant, the application selects one random fairness scenario from each pre-defined cluster of scenarios and then shuffles the selected scenarios. For every scenario, a description and an illustration are shown. The participants rate each scenario on an initially blank visual analogue scale (VAS) from very unfair to very fair. A dynamic indicator is added to the VAS to improve accuracy with minimal additional bias (Matejka et al. 2016). The time to answer and the uncertainty in answering, measured as the sum of differences of non-final ratings, are stored alongside the final answer. Sociodemographics and attitudes can also be elicited.

Fairness perception analysis
The obtained data can be exported from MongoDB in CSV or JSON format. We provide evaluation examples written with common Python frameworks for the analyses listed above. Heatmaps that compare fairness ratings on scenarios grouped by two distinct measures can easily be generated, as shown in Fig. 1 (B).
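Two pieces of the elicitation and analysis steps can be sketched as follows. The reading of the uncertainty measure (sum of absolute differences between consecutive non-final slider positions), the function names, and the column names are our own illustrative assumptions, not the framework's actual implementation:

```python
import pandas as pd

def rating_uncertainty(slider_positions):
    """Uncertainty of one answer: sum of absolute differences between
    consecutive non-final slider positions (one plausible reading of the
    measure described in the text; the actual definition may differ)."""
    non_final = slider_positions[:-1]
    return sum(abs(b - a) for a, b in zip(non_final, non_final[1:]))

def rating_heatmap(df, measure_x, measure_y, rating_col="rating", bins=5):
    """Bin scenarios along two fairness measures and average the fairness
    ratings per cell, in the spirit of Fig. 1 (B). The returned pivot
    table can be passed to e.g. seaborn.heatmap."""
    df = df.copy()
    df["x_bin"] = pd.cut(df[measure_x], bins=bins)
    df["y_bin"] = pd.cut(df[measure_y], bins=bins)
    return df.pivot_table(index="y_bin", columns="x_bin",
                          values=rating_col, aggfunc="mean",
                          observed=False)
```

For example, `rating_uncertainty([10, 60, 40, 45])` sums the movements before the final position 45, giving |60−10| + |40−60| = 70.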
Demonstration
For demonstration purposes, we applied the FairCeptron framework using a voluntary response sample of 136 people. The hypothetical scenarios concern a university admission process. All scenarios displayed 10 female / male student applicants with associated qualification scores. Each participant was asked to rate 10 classification and 10 ranking scenarios. In addition, participants answered questions about their demographics and their attitudes towards deciding machines (adapted from Awad et al. 2018), and took a big-five personality short test (Rammstedt and John 2007). Fig. 1 (B) illustrates the fairness perceptions aggregated from the ranking scenarios of the FairCeptron study. In general, participants rated scenarios according to their ordering utility. The highlighted exemplary bin is rated unfair on average; it contains scenarios that partially violate qualification order and in which men are over-represented. Ratings differ by participant gender and political orientation, in particular in the acceptance of over-representing female personas. These findings serve only for illustration and are obtained from a non-representative population. The demo at ICWSM will include a walk-through of scenario generation, perception elicitation, and analysis.
FairCeptron studies can easily be deployed with little effort, building upon the existing implementation. The framework allows investigating whether fairness perceptions depend on domains (e.g., education, medicine, finance), sociodemographics (e.g., gender, occupation), or the stakes involved (high- vs. low-stakes decisions). The results obtained from FairCeptron studies could empirically inform the selection and evaluation of fairness measures in real-world settings. We hope our framework represents a stepping stone towards a future in which the people subjected to algorithmic decision making contribute to its design process, and in which algorithmic notions of fairness are subjected to empirical studies of human perceptions of fairness before implementation and roll-out.

In summary, we present a framework for studying perceptions of fairness in algorithmic decision making, such as in ranking or classification, that includes fairness scenario generation, fairness perception elicitation, and fairness perception analysis steps. Our implementation of the framework is available on GitHub as open source.

References
Ambrose, M. L.; Wo, D. X.; and Griffith, M. D. 2015. Overall Justice: Past, Present, and Future. In Cropanzano, R.; and Ambrose, M. L., eds., The Oxford Handbook of Justice in the Workplace, 109–135. New York: Oxford University Press.

Awad, E.; Dsouza, S.; Kim, R.; Schulz, J.; Henrich, J.; Shariff, A.; Bonnefon, J.-F.; and Rahwan, I. 2018. The Moral Machine Experiment. Nature 563(7729): 59–64.

Chouldechova, A. 2017. Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction Instruments. Big Data 5(2): 153–163.

Chouldechova, A.; and Roth, A. 2020. A Snapshot of the Frontiers of Fairness in Machine Learning. Communications of the ACM 63(5): 82–89.

Dwork, C.; Hardt, M.; Pitassi, T.; Reingold, O.; and Zemel, R. 2012. Fairness through Awareness. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, 214–226. New York: ACM. doi:10.1145/2090236.2090255.

Engstrom, H. R.; Alic, A.; and Laurin, K. 2020. Justification and Rationalization Causes. In Lind, E. A., ed., Social Psychology and Justice, 44–66. New York: Routledge.

Friedler, S. A.; Scheidegger, C.; Venkatasubramanian, S.; Choudhary, S.; Hamilton, E. P.; and Roth, D. 2019. A Comparative Study of Fairness-Enhancing Interventions in Machine Learning. In Proceedings of the Conference on Fairness, Accountability, and Transparency, 329–338. New York: ACM. doi:10.1145/3287560.3287589.

Harrison, G.; Hanson, J.; Jacinto, C.; Ramirez, J.; and Ur, B. 2020. An Empirical Study on the Perceived Fairness of Realistic, Imperfect Machine Learning Models. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, 392–402. New York: ACM. doi:10.1145/3351095.3372831.

Kearns, M.; and Roth, A. 2019. The Ethical Algorithm: The Science of Socially Aware Algorithm Design. New York: Oxford University Press.

Matejka, J.; Glueck, M.; Grossman, T.; and Fitzmaurice, G. W. 2016. The Effect of Visual Appearance on the Performance of Continuous Sliders and Visual Analogue Scales. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, 5421–5432. New York: ACM. doi:10.1145/2858036.2858063.

Rammstedt, B.; and John, O. P. 2007. Measuring Personality in One Minute or Less: A 10-Item Short Version of the Big Five Inventory in English and German. Journal of Research in Personality 41(1): 203–212.

Saxena, N. A.; Huang, K.; DeFilippis, E.; Radanovic, G.; Parkes, D. C.; and Liu, Y. 2019. How Do Fairness Definitions Fare? Examining Public Attitudes Towards Algorithmic Definitions of Fairness. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, 99–106. New York: ACM. doi:10.1145/3306618.3314248.

Srivastava, M.; Heidari, H.; and Krause, A. 2019. Mathematical Notions vs. Human Perception of Fairness: A Descriptive Approach to Fairness for Machine Learning. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2459–2468. New York: ACM. doi:10.1145/3292500.3330664.

Truxillo, D. M.; Bauer, T. N.; Campion, M. A.; and Paronto, M. E. 2006. A Field Study of the Role of Big Five Personality in Applicant Perceptions of Selection Fairness, Self, and the Hiring Organization. International Journal of Selection and Assessment.

Yang, K.; and Stoyanovich, J. 2017. Measuring Fairness in Ranked Outputs. In Proceedings of the 29th International Conference on Scientific and Statistical Database Management. New York: ACM. doi:10.1145/3085504.3085526.

Zehlike, M.; Bonchi, F.; Castillo, C.; Hajian, S.; Megahed, M.; and Baeza-Yates, R. 2017. FA*IR: A Fair Top-k Ranking Algorithm. In Proceedings of the 2017 ACM Conference on Information and Knowledge Management. New York: ACM.