The FairCeptron: A Framework for Measuring Human Perceptions of Algorithmic Fairness
Georg Ahnert, Ivan Smirnov, Florian Lemmerich, Claudia Wagner, Markus Strohmaier
RWTH Aachen University; GESIS – Leibniz Institute for the Social Sciences
[email protected], {ivan.smirnov, florian.lemmerich, markus.strohmaier}@cssh.rwth-aachen.de, [email protected]

Abstract
Measures of algorithmic fairness often do not account for human perceptions of fairness, which can vary substantially between different sociodemographic groups and stakeholders. The FairCeptron framework is an approach for studying perceptions of fairness in algorithmic decision making, such as in ranking or classification. It supports (i) studying human perceptions of fairness and (ii) comparing these human perceptions with measures of algorithmic fairness. The framework includes fairness scenario generation, fairness perception elicitation, and fairness perception analysis. We demonstrate the FairCeptron framework by applying it to a hypothetical university admission context where we collect human perceptions of fairness in the presence of minorities. An implementation of the FairCeptron framework is openly available, and it can easily be adapted to study perceptions of algorithmic fairness in other application contexts. We hope our work paves the way towards elevating the role of studies of human fairness perceptions in the process of designing algorithmic decision making systems.

Motivation
Considering fairness in algorithmic decision making poses an important challenge (Chouldechova and Roth 2020). Different definitions of algorithmic fairness have been proposed, including individual measures (Dwork et al. 2012) as well as group-based measures for both classification (Friedler et al. 2019) and ranking decisions (Yang and Stoyanovich 2017). In general, algorithms trade off accuracy and fairness (Kearns and Roth 2019), and group-based fairness measures cannot be simultaneously equalized over all groups (Chouldechova 2017). Thus, normative decisions must be made.

One way of approaching these decisions is through an analysis of what is perceived as fair, involving the target population of a deciding algorithm in its creation. This could increase the acceptance of algorithmic decision making (Awad et al. 2018). Involvement also benefits procedural fairness, often the most important contributor to overall fairness perception (Ambrose, Wo, and Griffith 2015). Previous research investigated perceptions of algorithmic fairness (Saxena et al. 2019; Srivastava, Heidari, and Krause 2019). We present the FairCeptron framework for studying fairness perceptions. It allows studying classification and ranking decisions that do not necessarily optimize for a single fairness measure. With the FairCeptron, obligatory trade-offs between accuracy and multiple fairness measures can be investigated, and the nature of the relationships between fairness perceptions and fairness measures can be determined. An implementation is available as open source (https://github.com/cssh-rwth/fairceptron) and built for easy deployment and adaptation to different study contexts.

The FairCeptron Framework
The FairCeptron framework consists of three components: (i) generation of fairness scenarios according to a prespecified algorithm, (ii) presentation of scenarios to survey participants and collection of their subjective fairness ratings, and (iii) analysis of responses that takes into account characteristics of the scenarios (e.g., group sizes) and characteristics of the users (e.g., sociodemographics or attitudes). The FairCeptron framework can be implemented in various ways; in this paper we present one particular implementation.
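Component (i), which the following sections describe in more detail, generates ranking scenarios as permutations in which personas within a group stay ordered by qualification. A rough sketch of this enumeration is shown below; the persona representation and function name are illustrative assumptions, not the framework's actual code:

```python
import itertools

def group_order_respecting_rankings(personas):
    """Enumerate all rankings of the given personas in which, within each
    group, more qualified personas always precede less qualified ones.
    Personas are modeled here as dicts with a group label and a numeric
    qualification score (an illustrative assumption)."""
    groups = {p["group"] for p in personas}
    for perm in itertools.permutations(personas):
        # Keep a permutation only if every group's qualifications appear
        # in descending order along the ranking.
        if all(
            [p["qualification"] for p in perm if p["group"] == g]
            == sorted((p["qualification"] for p in perm if p["group"] == g),
                      reverse=True)
            for g in groups
        ):
            yield perm
```

For four personas split evenly over two groups, this yields 4!/(2!·2!) = 6 rankings, i.e. all interleavings of the two within-group orders. In the framework, the resulting scenarios would additionally be clustered along fairness measures before being shown to participants.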
Fairness scenario generation
Algorithmic ranking and classification scenarios are generated that consist of personas from two or more groups, which can optionally have a second, numeric attribute associated with them. We provide simple code examples for scenario generation in Python. The scenarios are generated as all possible selections from / permutations of n personas, in which personas within a group are selected / ranked by qualification. The scenarios are clustered along multiple measures of algorithmic fairness, ensuring that each participant later receives a variety of scenarios while maximizing the total number of scenarios that are tested.

Figure 1: (A) A FairCeptron ranking scenario. Participants are shown an algorithmic ranking scenario and rate its perceived fairness on a visual analogue scale. In addition to ranking, classification scenarios are also supported. (B) Perceptions of fairness across different ranking scenarios. All scenarios are binned by ordering utility (Zehlike et al. 2017) and gender representation (adapted from Yang and Stoyanovich 2017). Participants were mainly influenced by ordering utility. Higher ratings for over-representation of women vs. men can be seen in scenarios with low ordering utility.

Fairness perception elicitation
Participants take part through a responsive, universal web application, as shown in Fig. 1 (A). For each new participant, the application selects one random fairness scenario from each pre-defined cluster of scenarios and then shuffles the selected scenarios. For every scenario, a description and an illustration are shown. The participants rate each scenario on an initially blank visual analogue scale (VAS) from very unfair to very fair. A dynamic indicator is added to the VAS to improve accuracy with minimal additional bias (Matejka et al. 2016). The time to answer and the uncertainty in answering, measured as the sum of differences of non-final ratings, are stored alongside the final answer. Sociodemographics and attitudes can also be elicited.

Fairness perception analysis
The obtained data can be exported from MongoDB in CSV or JSON format. We provide evaluation examples written with common Python frameworks for the analyses listed above. Heatmaps that compare fairness ratings on scenarios grouped by two distinct measures can easily be generated, as shown in Fig. 1 (B).
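Two pieces of the elicitation and analysis steps can be sketched as follows. The reading of the uncertainty measure (sum of absolute differences between consecutive non-final slider positions), the function names, and the column names are our own illustrative assumptions, not the framework's actual implementation:

```python
import pandas as pd

def rating_uncertainty(slider_positions):
    """Uncertainty of one answer: sum of absolute differences between
    consecutive non-final slider positions (one plausible reading of the
    measure described in the text; the actual definition may differ)."""
    non_final = slider_positions[:-1]
    return sum(abs(b - a) for a, b in zip(non_final, non_final[1:]))

def rating_heatmap(df, measure_x, measure_y, rating_col="rating", bins=5):
    """Bin scenarios along two fairness measures and average the fairness
    ratings per cell, in the spirit of Fig. 1 (B). The returned pivot
    table can be passed to e.g. seaborn.heatmap."""
    df = df.copy()
    df["x_bin"] = pd.cut(df[measure_x], bins=bins)
    df["y_bin"] = pd.cut(df[measure_y], bins=bins)
    return df.pivot_table(index="y_bin", columns="x_bin",
                          values=rating_col, aggfunc="mean",
                          observed=False)
```

For example, `rating_uncertainty([10, 60, 40, 45])` sums the movements before the final position 45, giving |60−10| + |40−60| = 70.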
Demonstration
For demonstration purposes, we applied the FairCeptron framework using a voluntary response sample of 136 people. The hypothetical scenarios concern a university admission process. All scenarios displayed 10 female / male student applicants with associated qualification scores. Each participant was asked to rate 10 classification and 10 ranking scenarios. In addition, participants answered questions about their demographics and their attitudes towards deciding machines (adapted from Awad et al. 2018), and took a big-five personality short test (Rammstedt and John 2007). Fig. 1 (B) illustrates the fairness perceptions aggregated from the ranking scenarios of the FairCeptron study. In general, participants rated scenarios according to their ordering utility. The highlighted exemplary bin is rated unfair on average; it contains scenarios that partially violate qualification order and in which men are over-represented. Ratings differ by participant gender and political orientation, in particular in the acceptance of over-representing female personas. These findings serve only for illustration and are obtained from a non-representative population. The demo at ICWSM will include a walk-through of scenario generation, perception elicitation, and analysis.
FairCeptron studies can easily be deployed with little effort, building upon the existing implementation. The framework allows investigating whether fairness perceptions depend on domains (e.g., education, medicine, finance), sociodemographics (e.g., gender, occupation), or the stakes involved (high- vs. low-stakes decisions). The results obtained from FairCeptron studies could empirically inform the selection and evaluation of fairness measures in real-world settings. We hope our framework represents a stepping stone towards a future in which the people subjected to algorithmic decision making contribute to its design process, and in which algorithmic notions of fairness are subjected to empirical studies of human perceptions of fairness before implementation and roll-out.

In summary, we present a framework for studying perceptions of fairness in algorithmic decision making, such as in ranking or classification, that includes fairness scenario generation, fairness perception elicitation, and fairness perception analysis steps. Our implementation of the framework is available on GitHub as open source.

References
Ambrose, M. L.; Wo, D. X.; and Griffith, M. D. 2015. Overall Justice: Past, Present, and Future. In Cropanzano, R.; and Ambrose, M. L., eds., The Oxford Handbook of Justice in the Workplace, 109–135. New York: Oxford University Press.

Awad, E.; Dsouza, S.; Kim, R.; Schulz, J.; Henrich, J.; Shariff, A.; Bonnefon, J.-F.; and Rahwan, I. 2018. The Moral Machine Experiment. Nature 563(7729): 59–64.

Chouldechova, A. 2017. Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction Instruments. Big Data 5(2): 153–163.

Chouldechova, A.; and Roth, A. 2020. A Snapshot of the Frontiers of Fairness in Machine Learning. Communications of the ACM 63(5): 82–89.

Dwork, C.; Hardt, M.; Pitassi, T.; Reingold, O.; and Zemel, R. 2012. Fairness through Awareness. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, 214–226. New York: ACM. doi:10.1145/2090236.2090255.

Engstrom, H. R.; Alic, A.; and Laurin, K. 2020. Justification and Rationalization Causes. In Lind, E. A., ed., Social Psychology and Justice, 44–66. New York: Routledge.

Friedler, S. A.; Scheidegger, C.; Venkatasubramanian, S.; Choudhary, S.; Hamilton, E. P.; and Roth, D. 2019. A Comparative Study of Fairness-Enhancing Interventions in Machine Learning. In Proceedings of the Conference on Fairness, Accountability, and Transparency, 329–338. New York: ACM. doi:10.1145/3287560.3287589.

Harrison, G.; Hanson, J.; Jacinto, C.; Ramirez, J.; and Ur, B. 2020. An Empirical Study on the Perceived Fairness of Realistic, Imperfect Machine Learning Models. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, 392–402. New York: ACM. doi:10.1145/3351095.3372831.

Kearns, M.; and Roth, A. 2019. The Ethical Algorithm: The Science of Socially Aware Algorithm Design. New York: Oxford University Press.

Matejka, J.; Glueck, M.; Grossman, T.; and Fitzmaurice, G. W. 2016. The Effect of Visual Appearance on the Performance of Continuous Sliders and Visual Analogue Scales. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, 5421–5432. New York: ACM. doi:10.1145/2858036.2858063.

Rammstedt, B.; and John, O. P. 2007. Measuring Personality in One Minute or Less: A 10-Item Short Version of the Big Five Inventory in English and German. Journal of Research in Personality 41(1): 203–212.

Saxena, N. A.; Huang, K.; DeFilippis, E.; Radanovic, G.; Parkes, D. C.; and Liu, Y. 2019. How Do Fairness Definitions Fare? Examining Public Attitudes Towards Algorithmic Definitions of Fairness. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, 99–106. New York: ACM. doi:10.1145/3306618.3314248.

Srivastava, M.; Heidari, H.; and Krause, A. 2019. Mathematical Notions vs. Human Perception of Fairness: A Descriptive Approach to Fairness for Machine Learning. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2459–2468. New York: ACM. doi:10.1145/3292500.3330664.

Truxillo, D. M.; Bauer, T. N.; Campion, M. A.; and Paronto, M. E. 2006. A Field Study of the Role of Big Five Personality in Applicant Perceptions of Selection Fairness, Self, and the Hiring Organization. International Journal of Selection and Assessment.

Yang, K.; and Stoyanovich, J. 2017. Measuring Fairness in Ranked Outputs. In Proceedings of the 29th International Conference on Scientific and Statistical Database Management. New York: ACM. doi:10.1145/3085504.3085526.

Zehlike, M.; Bonchi, F.; Castillo, C.; Hajian, S.; Megahed, M.; and Baeza-Yates, R. 2017. FA*IR: A Fair Top-k Ranking Algorithm. In Proceedings of the 2017 ACM Conference on Information and Knowledge Management. New York: ACM.