Human Perceptions on Moral Responsibility of AI: A Case Study in AI-Assisted Bail Decision-Making
Gabriel Lima, [email protected], School of Computing, KAIST; Data Science Group, IBS; Republic of Korea
Nina Grgić-Hlača, [email protected], Max Planck Institute for Software Systems; Max Planck Institute for Research on Collective Goods; Germany
Meeyoung Cha, [email protected], Data Science Group, IBS; School of Computing, KAIST; Republic of Korea
ABSTRACT
How to attribute responsibility for the actions of autonomous artificial intelligence (AI) systems has been widely debated across the humanities and social science disciplines. This work presents two experiments (N = 200 each) that measure people's perceptions of eight different notions of moral responsibility concerning AI and human agents in the context of bail decision-making. Using vignettes adapted from real life, our experiments show that AI agents are held causally responsible and blamed similarly to human agents for an identical task. However, there was a meaningful difference in how people perceived these agents' moral responsibility: human agents were ascribed a higher degree of present-looking and forward-looking notions of responsibility than AI agents. We also found that people expect both AI and human decision-makers and advisors to justify their decisions regardless of their nature. We discuss policy and HCI implications of these findings, such as the need for explainable AI in high-stakes scenarios.

CCS CONCEPTS
• Human-centered computing → Empirical studies in HCI; • Applied computing → Psychology; Law.

KEYWORDS
AI, Moral Responsibility, Responsibility, Moral Judgment, Blame, Liability, COMPAS, Bail Decision-Making
ACM Reference Format:
Gabriel Lima, Nina Grgić-Hlača, and Meeyoung Cha. 2021. Human Perceptions on Moral Responsibility of AI: A Case Study in AI-Assisted Bail Decision-Making. In CHI Conference on Human Factors in Computing Systems (CHI '21), May 8–13, 2021, Yokohama, Japan. ACM, New York, NY, USA, 17 pages. https://doi.org/10.1145/3411764.3445260
INTRODUCTION

Who should be held responsible for the harm caused by artificial intelligence (AI)? This question has been debated for over a decade, since Matthias' landmark essay on the responsibility gap of autonomous machines [68]. This gap is posed by highly autonomous and self-learning AI systems. Scholars in multiple disciplines, including ethics, philosophy, computer science, and law, have suggested possible solutions to this moral and legal dilemma. Optimistic views proclaim that the gap can be bridged by proactive attitudes of AI designers, who should readily take responsibility for any harm [20, 72]. Some even propose holding AI systems responsible per se [91], viewing human-AI collaborations as extended agencies [45, 48]. In contrast, pessimistic views question whether this gap can be bridged at all, since there might not exist appropriate subjects of retributive blame [26], nor does it make sense to hold inanimate and non-conscious entities responsible for their actions [16, 89, 96].

Most research on the responsibility gap has been normative, prescribing ethical principles and proposing solutions. However, there is a growing need for practical and proactive guidelines; as Mittelstadt puts it, "principles alone cannot guarantee ethical AI" [69]. Some even argue that normative approaches are inappropriate, as they can hurt AI's adoption in the long run [12]. In contrast, relatively little attention has been paid to understanding the views of the public, who are likely the most affected stakeholders when AI systems are deployed [78].

We conducted two survey studies (N = 200 each) that collect public perceptions of the moral responsibility of AI and human agents in high-stakes scenarios. We adopted a pluralistic view of responsibility and considered eight distinct notions compiled from the philosophy and psychology literature. Vignettes adapted from real-life AI-assisted bail decisions were used to observe how people attributed specific meanings of responsibility to i) AI advisors vs. human advisors and ii) AI decision-makers vs. human decision-makers. Our study employed a within-subjects design in which all participants were exposed to a diverse set of vignettes addressing distinct possible outcomes of bail decisions.

Our findings suggest that the eight notions of responsibility considered can be re-grouped into two clusters: one encompasses present-looking and forward-looking notions (e.g., responsibility-as-task, as-power, as-authority, as-obligation), and the other includes backward-looking notions (e.g., blame, praise, liability) and causal determinations. We discuss how theories of moral responsibility can explain these clusters.

In comparing AI agents against human agents, we found a striking difference in the way people attribute responsibility. A substantially higher degree of the present- and forward-looking notions was attributed to human agents than to AI agents. This means that AI agents were assigned the responsibility to complete and oversee the same task to a lesser extent than human agents. No difference, however, was observed for the backward-looking responsibility notions. This finding suggests that blame, liability, and causal responsibility were ascribed equally to AI and human agents, despite electronic agents not being appropriate subjects of liability and blame [16, 26, 89].
In addition to these findings, we found that people expect both human and AI agents to justify their decisions.

The findings of this study have several implications for the development and regulation of AI. Building on the proposition of morality as a human-made social construct that aims to fulfill specific goals [91, 93], we highlight the importance of users and designers taking responsibility for their systems while being held responsible for any norm-violating outcomes. We also discuss the possibility of holding AI systems responsible per se [61] alongside other human agents, as a possible approach congruent with public opinion.
BACKGROUND

Theories of moral responsibility date back to Aristotle, who argued that an entity should satisfy both freedom and epistemic conditions to be appropriately ascribed moral responsibility: agents must act freely, without coercion, and understand their actions. Although recent scholarly work does not directly challenge these Aristotelian conditions, it argues that moral responsibility cannot be explained as a single concept but instead requires a relatively pluralistic definition of what it means to hold someone morally responsible [87, 102].

Scanlon [85] has proposed that moral responsibility is a bipartite concept: there is an account of being responsible, which renders an agent worthy of moral appraisal, and it is also possible to hold one responsible for specific actions and consequences. Expanding this bipartite concept, Shoemaker [87] has proposed three different concepts of moral responsibility: attributability, answerability, and accountability. Various other definitions have been proposed [102], including structured notions of what responsibility is [104] and how its meanings are connected [14, 34].

Attributing responsibility to an entity can be both descriptive (e.g., causal responsibility) and normative (e.g., blameworthiness). For the former, one might ask whether an agent is responsible for an action or state-of-affairs, while the latter concerns whether one should attribute responsibility to an agent. Responsibility can also be divided into backward-looking notions, which evaluate a past action and possibly lead to reactive attitudes [106], and forward-looking notions, which prescribe obligations.

Responsibility can take many forms. It not only addresses the moral dimension of society but also tackles legal concepts and other descriptive notions. One can be held legally responsible (i.e., liable) regardless of one's moral responsibility, as in the case of strict or vicarious liability. Stating that an agent is causally responsible for a state-of-affairs does not necessarily prescribe a moral evaluation of the action.

Holding an agent "responsible" fulfills a wide range of social and legal functions. Legal scholars state that punishment (which can be seen as a form of holding an agent responsible, e.g., under criminal liability) aims to reform wrongdoers, deter re-offenses and similar actions, and resolve retributive sentiments [4, 99]. Previous work has addressed how and why people assign responsibility to various agents. The general public might choose to hold a wrongdoer responsible to restore moral coherence [22] or to reaffirm communal moral values [109]. Psychological research indicates that people base much of their responsibility attribution on retributive sentiments rather than deterrence [18], while overestimating utilitarian goals in their ascription of punishment (i.e., responsibility) [17]. Intentionality also determines how much responsibility is assigned to an entity [70]; people look for an intentional agent to hold responsible and infer other entities' intentionality upon failure to find one [40].

AI systems and robots are being widely adopted across society. Algorithms are used to choose which candidate is most fit for a job position [111], decide which defendants are granted bail [33], guide health-related decisions [73], and assess credit risk [49]. AI systems are often embedded into robots or machines, such as autonomous vehicles [13] and robot soldiers [3].
A natural question arises: if an AI system or a robot causes harm, who should be held responsible for its actions and their consequences?

In answering this question, some scholars have defended the existence of a (techno-)responsibility gap [68] for autonomous and self-learning systems. The autonomy of AI and robots challenges the control condition of responsibility attribution. Simultaneously, their self-learning capabilities and opacity do not allow users, designers, and manufacturers to foresee consequences. Similarly to the "problem of many hands" in the assignment of responsibility to collective agents [102], AI and robots suffer from the "problem of many things": current systems are composed of various interacting entities and technologies, making the search for a responsible entity harder [24]. Scholars have extensively discussed the assignment of responsibility for autonomous machines' actions and have expanded this gap to more specific notions of responsibility [5, 8, 54] and its functions [26, 62]. Some scholars also raise doubts about the existence of techno-responsibility gaps, arguing that moral institutions are dynamic and flexible and can deal with these new technological artifacts [53, 95].

Although a clear separation is fuzzy, one may find two schools of thought on the responsibility gap issue. One side argues that designers and manufacturers should take responsibility for any harm caused by their "tools" [16, 31]. Supervisors and users of these systems should also take responsibility for their deployment, particularly in consequential environments like the military, as argued by Champagne and Tonkens [20]. The exercise of agency by these systems can be viewed as a human-robot collaboration, in which humans supervise and manage the agency of AI and robots [72]. Humans should focus on their relationship to the patients of their responsibility to answer for the actions of autonomous systems [24].
Likewise, other authors argue that society should hold humans responsible because doing so for a machine would be meaningless, as a machine understands neither the consequences of its actions nor the reactive attitudes directed towards it [89, 96], possibly undermining the definition of responsibility [47].

On the opposite side, some scholars propose that autonomous systems could be held responsible per se [61]. From a legal perspective, non-human entities (e.g., corporations) can be held responsible for any damage that they may cause [103]. These scholars often view human-AI collaborations as extended agencies in which all entities should be held jointly responsible [45, 48]. AI and robots are part of the socio-technological ensemble, in which responsibility can be distributed across multiple entities to varying degrees [32]. These proposals arguably contribute to legal coherence [98], although they could also lead to various repercussions in moral and legal institutions [8]. Empirical findings indicate that people attribute responsibility to these systems [7, 62], although to a lesser extent than to human agents. According to some scholars, holding AI and robots responsible per se could fulfill specific social goals [23] and promote critical social functions [11, 91].

The regulation of AI and robots poses new challenges to policymaking, as in the previously introduced techno-responsibility gap, which society must discuss at large [24]. The "algorithmic social contract" requires inputs from various stakeholders, whose opinions should be weighed for the holistic crafting of regulations [78]. It is crucial to understand how people perceive these systems before their wide deployment [80]. Our responsibility practices depend on folk psychology [15], i.e., on how people perceive the agents involved in social practices [91]. Literature exists on the public perception of moral and legal issues concerning AI [6, 7, 62]. However, little data-driven research has collected public opinion on how responsibility should be attributed for AI and robots' actions.
A growing body of HCI research has been devoted to understanding how people perceive algorithmic decisions and their consequences in society. For instance, Lee et al. studied people's perceptions of trust, fairness, and justice in the context of algorithmic decision-making [56, 57] and proposed how to embed these views into a policymaking framework [58]. Other scholars explored people's perceptions of procedural [41] and distributive [84, 90] aspects of algorithmic fairness and studied how they relate to individual differences [42, 76, 108]. Nonetheless, little attention has been paid to the public attribution of (moral) responsibility to stakeholders (e.g., [43, 56, 81]), particularly the prospect of ascribing responsibility to the AI system per se. The current study contributes by addressing the public perception of algorithmic decision-making through the lens of moral responsibility.

Existing studies addressing how users might attribute blame to automated agents have mostly focused on robots. For instance, Malle et al. observed that people's moral judgments of human and robotic agents differed in that respondents blamed robots to a more considerable extent had they not taken a utilitarian action [67]. Furlough et al. found that respondents attributed similar levels of blame to robotic agents and humans when robots were described as autonomous and, at the same time, as the leading cause of harm [37]. However, these studies and many others [52, 59, 105] tackle a singular notion of responsibility related to blameworthiness [102]. The present research explores multiple notions of moral responsibility of both human and AI agents involved in decision-making.
AI-based algorithms are now used to assist humans in various scenarios, including high-stakes tasks such as medical diagnostics [35] and bail decisions [2]. These algorithms do not make decisions themselves, but rather "advise" humans in their decision-making processes. One such algorithm is the COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) tool, used by the judicial system in the US to assist bail decisions and sentencing [2]. Several studies have analyzed the fairness and bias aspects of this risk assessment algorithm, e.g., [9, 33, 43].

This study makes use of publicly available COMPAS data released by ProPublica [2] and considers the machine judgments as either an AI advisor (Study 1) or an AI decision-maker (Study 2). As stimulus material, we use real-world data obtained from a previous analysis of the tool [2], which focused on its application in bail decision-making. This dataset contains information about 7,214 defendants subjected to COMPAS screening in Broward County, Florida, between 2013 and 2014.

We use 100 randomly selected cases from this dataset, the corresponding bail suggestions, and information about whether the defendant re-offended within two years of sentencing. The sampled data was balanced with respect to these variables. Each defendant's COMPAS score ranges from 1 to 10, with ten indicating the highest risk of re-offense or nonappearance in court. In this study, scores 1 to 5 were labeled "grant bail" and 6 to 10 were labeled "deny bail."
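To make the case-selection step concrete, the following is a minimal sketch in Python. The file name and the columns `decile_score` and `two_year_recid` follow ProPublica's published data release, but they are assumptions here rather than details taken from the paper.

```python
# A minimal sketch of the sampling and labeling step described above,
# assuming ProPublica's "compas-scores-two-years.csv" with its published
# columns `decile_score` (1-10) and `two_year_recid` (0/1).
import pandas as pd

df = pd.read_csv("compas-scores-two-years.csv")

# Map COMPAS decile scores to the bail suggestion used in the vignettes:
# scores 1-5 -> "grant bail", scores 6-10 -> "deny bail".
df["suggestion"] = df["decile_score"].apply(
    lambda s: "grant bail" if s <= 5 else "deny bail"
)

# Draw a balanced sample of 100 cases: 25 from each of the four
# (suggestion x recidivism) cells.
sample = (
    df.groupby(["suggestion", "two_year_recid"], group_keys=False)
      .apply(lambda g: g.sample(n=25, random_state=0))
)
print(sample.groupby(["suggestion", "two_year_recid"]).size())
```

Sampling 25 cases from each of the four (suggestion × recidivism) cells yields the balanced set of 100 defendants described above.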
Ascribing responsibility is a complex moral and legal practice that encompasses various functions, entities, and social practices [71, 91]. Responsibility has multiple distinct meanings depending on its purpose and requirements. The current study revisits eight notions of responsibility compiled from psychology and philosophy. All of these notions originate from Van de Poel's work [101, 102], except for responsibility-as-authority and as-power, which come from Davis's discussion of professional responsibility [27]. We complement these notions with a wide range of literature, from philosophical theories of moral responsibility (e.g., [86, 87]) to approaches in the context of AI systems (e.g., [24, 96]). Although the set is not exhaustive (e.g., we have not addressed virtue-based notions of responsibility, as they cannot be easily adapted to AI systems), we highlight how our work differs from previous HCI approaches.

• Responsibility-as-obligation:
E.g., "The (agent) should ensure that the rights of the defendant are protected."
One could be held responsible-as-obligation through consequentialist, deontological, and virtue-based routes [102]. While an entity could be attributed this meaning of responsibility based on pre-determined consequentialist distribution principles, the latter two routes presuppose the agent's initiative or promise to see to it that a specific state-of-affairs is brought about. This notion differs from responsibility-as-task in that it does not imply that one should be the agent to bring about a specific state-of-affairs, but rather indicates that one should fulfill one's supervisory duties in the process.

• Responsibility-as-task: "It is the (agent)'s task to protect the defendant's rights."
This descriptive notion of responsibility ascribes a specific task to an entity. These assignments do not necessarily define a moral obligation per se [101] and are often accompanied by the understanding that an entity has to do something by itself [27]. In our experimental design, we highlight the agent's acting role in completing its task.

• Responsibility-as-authority: "The (agent) has the authority to prevent further offenses."
To be responsible-as-authority implies that one is in charge of a specific action or state-of-affairs. This notion has also been posed as "responsibility-as-office" by Davis [27] in the context of engineers' professional responsibility. An important aspect of responsibility-as-authority is the possibility of delegating other complementing notions, such as responsibility-as-task, to other agents. We address this meaning of responsibility by explicitly indicating that the agent has the authority in bail decisions.

• Responsibility-as-power: "The (agent) has the skills needed to protect the rights of the defendant."
If an entity has the skills needed to bring about an action or state-of-affairs, one might ascribe it responsibility-as-power [27]. In other words, having the ability, in terms of competency, knowledge, or expertise, might lead to the assignment of this notion of responsibility.

• Responsibility-as-answerability: "The (agent) should justify their advice."
This notion relates to how one's reasons for acting in a specific manner can come under moral scrutiny. Shoemaker proposed this notion of moral responsibility as a form of judgment of one's actions grounded in moral evaluations [87]. Davis proposed a similar meaning of responsibility under a different name, responsibility-as-accountability [27], as the responsibility for explaining specific consequences. Coeckelbergh later applied this concept through a relational approach to actions and decisions made using AI [24].

• Responsibility-as-cause: "The (agent)'s decision led to the prevention of the re-offense."
This meaning of responsibility has been further discussed depending on the nature of an action's consequences [27]; e.g., being causally responsible for a positive state-of-affairs could lead to the ascription of "good-causation." Causality is also an important pre-condition for other normative notions of responsibility, such as blame, as the blurring of a causal connection raises questions about the foreseeability and control of a specific action [66, 102].

• Responsibility-as-blame/praise: "The (agent) should be blamed for the violation of the rights of the defendant." / "The (agent) should be praised for the protection of the rights of the defendant."

Blaming an entity for the consequences of its actions has been characterized as adopting certain reactive attitudes towards it [106]. Scholars have also argued that to blame someone is to respond to "the impairment of a relationship" [21, 86], especially towards its constitutive standards [87]. Scholars have debated the possibility of ascribing blame to an automated agent and agree that doing so would not be morally appropriate [26, 96]. Regardless of this consensus, previous studies have found that people attribute a similar degree of blame to robotic and human agents under specific conditions (e.g., [37, 67]).

As the opposite concept to blame, one may consider "praise" as a positive behavioral reinforcement [51] through which one conveys one's values and expectations of the agent [28]. Hence, we consider both blame and praise as responsibility notions in this research.

• Responsibility-as-liability: "The (agent) should compensate those harmed by the re-offense."
An entity that is ascribed this responsibility should remedy any harm caused by its actions [102]. Rather than dwelling on the discussion of the mental states of AI and robots and their arguable incompatibility with criminal law and its assumption of mens rea [39, 55, 60], we address this notion from a civil law perspective. Scholars propose "making victims whole" as the primary goal of tort law [77], and we address responsibility-as-liability similarly. We also note that the idea of holding automated agents liable became prominent after the European Parliament considered adopting a specific legal status for "sophisticated autonomous robots" [29]. Nevertheless, it is important to note that current AI systems cannot compensate those harmed, as they do not possess any assets to be confiscated [16].

• Study 1: AI as Advisor
To study how the perceived responsibility for bail decisions differs when judges are advised by the COMPAS tool or by another human judge, we considered the following scenario:
Imagine that you read the following story in your local newspaper: A court in Broward County, Florida, is starting to use an artificial intelligence (AI) program to help them decide if a defendant can be released on bail before trial. Early career judges are taking turns receiving advice from this AI program and another human judge, hired to serve as an advisor.
We employed a factorial survey design [107] and showed participants eight vignettes that described a defendant from the ProPublica dataset, information about who the advisor was (i.e., an AI program or a human judge), which advice they gave, what the judge's final decision was, and whether the defendant committed a new crime within the next two years (i.e., re-offended). All vignettes stated that the judge's final decision followed the advice given, since the ProPublica dataset does not provide this information.

Figure 1: Survey instrument. In Study 1, where AI advisors or human advisors assist human judges, survey respondents were asked to assign responsibility notions to AI and human advisors. In Study 2, where AI systems are decision-makers alongside human judges, survey respondents were asked to assign responsibility notions to AI and human decision-makers. Both studies employed a factorial within-subjects design that presented eight different vignettes to each respondent. Survey instruments are shown in Figure 4 of the Appendix.

After reading the stimulus material, respondents were asked to indicate to what extent they agreed with a set of statements regarding the advisor, presented in random order between participants, on a 7-point Likert scale (-3 = Strongly Disagree, 3 = Strongly Agree). These statements aimed to capture different notions of responsibility (see Table 2 in the Appendix for the complete list). Figure 1 illustrates the survey methodology. Participants were also asked two attention-check questions in between vignettes.

Each participant was exposed to a random subset of four cases with human advice and another four with AI advice. We ensured that the set shown to each participant was balanced in terms of the advice (i.e., grant bail vs. deny bail) and recidivism. As a result, each respondent was shown one vignette of every possible combination of scenarios, encompassing eight (advice × recidivism × AI vs. human) variations. All vignettes were presented in random order to eliminate any order effect [44, 79]; a sketch of this assignment appears below.

Bail decisions aim to strike a balance between protecting future victims, e.g., preventing further offenses, and avoiding unnecessary burdens on the defendant, e.g., by ensuring that their rights are protected [43]. The latter aspect of bail decisions is related to the assumption, under criminal law, that one is innocent until proven otherwise beyond a reasonable doubt [30]. To capture both of these functions of bail decisions, we phrased the statements addressing all notions of responsibility in two different forms: a human agent or an AI program could be held responsible for i) (not) protecting the rights of the defendant and ii) (not) preventing re-offense. Participants were randomly assigned to one of these treatment groups, and all statements followed the same phrasing style. Questions related to responsibility-as-liability were shown in scenarios where i) the defendant re-offended and the phrasing style addressed the prevention of re-offenses, or ii) the defendant was denied bail and did not re-offend within two years while the statements focused on protecting their rights. The phrases tackling praise and blame were presented depending on the advice/decision and recidivism.

Towards the end of the survey, we asked demographic questions (presented in Table 1). We also gathered responses to a modified questionnaire of NARS (Negative Attitudes towards Robots Scale) [92], whose subscale addressed "artificial intelligence programs" rather than "robots" to accommodate the COMPAS tool.
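As referenced above, the within-subjects balancing can be sketched as follows; this is an illustrative reconstruction, not the authors' code, and names such as `cases_by_cell` are hypothetical.

```python
# A minimal sketch of the within-subjects assignment described above: each
# participant sees one vignette per (advisor x advice x recidivism) cell,
# in shuffled order.
import itertools
import random

ADVISORS = ["AI program", "human judge"]
ADVICE = ["grant bail", "deny bail"]
RECIDIVISM = [True, False]

def draw_vignettes(cases_by_cell, rng):
    """Draw one random case for each of the 8 condition cells and shuffle
    the presentation order to eliminate order effects."""
    vignettes = []
    for advisor, advice, recid in itertools.product(ADVISORS, ADVICE, RECIDIVISM):
        case = rng.choice(cases_by_cell[(advice, recid)])
        vignettes.append({"advisor": advisor, "advice": advice,
                          "re_offended": recid, "case": case})
    rng.shuffle(vignettes)
    return vignettes

# Toy example: two candidate defendant IDs per (advice, recidivism) cell.
cases_by_cell = {(a, r): [f"case-{a}-{r}-{i}" for i in range(2)]
                 for a in ADVICE for r in RECIDIVISM}
print(draw_vignettes(cases_by_cell, random.Random(42)))
```

Drawing one case per cell guarantees that every participant rates four AI and four human vignettes, balanced on advice and recidivism, as in the design above.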
• Study 2: AI as Decision-Maker

Unlike Study 1, where a human decision-maker is advised by either a human or an AI advisor, Study 2 explores a setting that has yet to be implemented in the real world: the case where an AI algorithm makes bail decisions by itself. The survey instrument and experimental design are identical to Study 1, except that in the introductory text we told participants, "The court is taking turns employing human judges and this AI program when making bailing decisions," and updated the phrasing of the questions to match this setting. In each vignette, participants were asked to what extent they agreed with the eight notions of responsibility regarding the decision-maker, i.e., the AI program or the human judge, using the same 7-point Likert scale from Study 1. Both studies were approved by the Institutional Review Board (IRB) at the first author's institution.

• Pilot Study for Validation: Cognitive Interviews
We validated our survey instruments through a series of cognitive interviews, a standard survey methodology approach for improving the quality of questionnaires [83]. During the interviews, respondents accessed our web-based survey questionnaire and were interviewed by the authors while completing the survey. We utilized a verbal probing approach [110], in which we tested the respondents' interpretation of the survey questions, asked them to paraphrase the questions, and asked whether they found the questions easy or difficult to understand and answer.

We interviewed six demographically diverse respondents. Three were recruited through the online crowdsourcing platform Prolific [74], while the other three were our colleagues, who had prior experience designing and conducting human-subject studies. After each interview, we iteratively refined our survey instrument based on the respondent's feedback. We stopped gathering new responses once the feedback stopped leading to new insights. This process led to two significant changes in our survey instrument design:
Firstly, we adapted the vignette presentation, which was initially based on previous work [33]. Our respondents unanimously stated that they found information about defendants easier to read, understand, and use when presented in a tabular format (shown in Figure 4 in the Appendix). Secondly, we rephrased some of the statements about the notions of responsibility addressed in this work so that survey respondents' understanding of these concepts matches the definitions introduced above.

We conducted a power analysis to calculate the minimum sample size. A Wilcoxon-Mann-Whitney two-tailed test, with a power of 0.8 to detect an effect size of 0.5 at a significance level of 0.05, requires 67 respondents per treatment group. Hence, we recruited 400 respondents through the Prolific crowdsourcing platform [74] to compensate for attention-check failures. We targeted US residents who had previously completed at least 100 tasks on Prolific with an approval rate of 95% or above. Each participant was randomly assigned to one of the two studies.

The respondents' demographics are shown in Table 1. Prior studies of online crowdsourcing platforms have found that respondent samples tend to be younger, more educated, and include more women than the general US population [50]. Compared to the 2016 US Census [100], our respondents are indeed younger and more highly educated. However, both of our studies' samples have a smaller ratio of women than the US population. Asian ethnicity is slightly over-represented in our samples. Compared to Pew Research data on the US population's political leaning [75], our samples are substantially more liberal.

Table 1: Respondents' demographics compared to the 2016 U.S. Census [100] and Pew data (marked with †) [75].

Demographic Attribute          Study 1   Study 2   Census
Total respondents              203       197       -
Passed attention checks        200       194       -
Women                          41.5%     40.7%     51.0%
0-18 years old                 -         -         21.7%
18-24 years old                37.5%     30.4%     10.8%
25-34 years old                30.5%     34.0%     13.7%
35-44 years old                17.0%     18.6%     12.6%
45-54 years old                8.0%      7.2%      13.4%
55-64 years old                6.0%      6.7%      12.9%
65+ years old                  1.0%      2.6%      14.9%
Prefer not to respond          -         0.5%      -
African American               4.5%      7.2%      13%
Asian                          17.0%     21.6%     6%
Caucasian                      58.5%     54.6%     61%
Hispanic                       10.0%     8.2%      18%
Other/Prefer not to respond    10.0%     8.4%      4%
Bachelor's degree or above     48.0%     52.0%     30%
Liberal                        59.0%     58.2%     33%†
Conservative                   14.5%     18.0%     29%†
Moderate                       23.5%     21.6%     34%†
Other/Prefer not to respond    3.0%      2.2%      4%†
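For reference, the sample-size requirement from the power analysis above can be approximated as follows. The sketch assumes, as tools such as G*Power do, that the Wilcoxon-Mann-Whitney requirement can be derived from the t-test sample size via the test's asymptotic relative efficiency (3/π for normally distributed data); this derivation is our assumption, not the paper's stated procedure.

```python
# A minimal sketch reproducing the minimum sample size reported above.
import math
from statsmodels.stats.power import TTestIndPower

# Sample size for a two-sided independent-samples t-test.
n_t = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.8,
                                  alternative="two-sided")

# Adjust by the Wilcoxon-Mann-Whitney asymptotic relative efficiency (3/pi).
n_mww = math.ceil(n_t / (3 / math.pi))
print(round(n_t, 2), n_mww)  # ~63.77 -> 67 respondents per treatment group
```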
The respondents were remunerated US$1.66 for completing the online surveys. The cognitive interviews lasted less than 30 minutes, while the online surveys took 10.36 minutes on average.

RESULTS

Figure 2 shows how people attributed each notion of responsibility to AI and human agents in Study 1 (the advisor role) and Study 2 (the decision-maker role).

First, responsibility-as-answerability (i.e., the middle bar) was the notion ascribed most strongly to both human and AI advisors and decision-makers, followed by responsibility-as-obligation, as-task, as-authority, and as-power (i.e., the first four bars). On the other hand, liability and blame were the least attributed responsibility notions in bail decisions. Responsibility-as-cause and praise were the most neutral notions, and their mean attribution is close to zero (i.e., the baseline) across all treatments (see Figure 5 in the Appendix).

Second, Figure 2 shows two distinct sets of responsibility notions; these clusters can be observed in the pairwise Spearman's correlation chart, where a high correlation value indicates that two responsibility notions are perceived similarly. One group includes responsibility-as-task, as-authority, as-power, and as-obligation, all of which have positive mean values. The other group includes responsibility-as-cause, praise, blame, and liability. Responsibility-as-answerability belongs to neither of these groups.

Third, we can quantify variations across vignette conditions. Each vignette shown to participants varied in the advice given, bail decision, and recidivism, allowing us to compare across these factors. Our data show that vignettes in which bail was granted (as opposed to denied) led to a higher assignment of all responsibility notions, particularly causal responsibility and blame (see Figure 5 in the Appendix). A similar effect was found depending on defendant recidivism. For instance, the first four responsibility notions were ascribed to a more considerable degree if the defendant did not re-offend. In contrast, responsibility-as-cause, blame, and liability were attributed to a lesser extent if the defendant re-offended within two years. These trends corroborate the responsibility clusters discussed above.

Finally, our study participants were also assigned to one of two different phrasing styles addressing the objectives of bail decisions. Except for responsibility-as-answerability, addressing the violation or protection of a defendant's rights led to a marginally higher assignment of responsibility than the phrasing style focusing on preventing re-offenses.
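The cluster structure described above can be checked with pairwise Spearman correlations. Below is a minimal, self-contained sketch; the synthetic ratings are placeholders for the actual survey responses, and the column names are ours.

```python
# A minimal sketch of the clustering check: pairwise Spearman correlations
# between the eight responsibility notions, each rated on a -3..3 scale.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
notions = ["obligation", "task", "authority", "power",
           "answerability", "cause", "blame_praise", "liability"]

# Synthetic stand-in for per-vignette Likert ratings; the real analysis
# would use the survey responses instead.
responses = pd.DataFrame(rng.integers(-3, 4, size=(1600, 8)), columns=notions)

corr = responses.corr(method="spearman")
print(corr.round(2))  # high within-cluster correlations reveal the two groups
```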
Our primary goal was to examine how people attribute responsibility to human and AI agents in high-stakes scenarios. To quantify the difference, we used a multivariate linear mixed model that included a random-effects term to control for each participant. This allowed us to account for repeated measures, i.e., to explicitly model that each participant responded to questions about eight distinct defendants.
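A minimal sketch of such a model, using statsmodels' mixed-effects API with a random intercept per participant, is shown below; the data frame and its columns are synthetic stand-ins, not the study's data or exact model specification.

```python
# A minimal sketch of a linear mixed model with a per-participant random
# intercept, accounting for the repeated measures described above.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_participants, n_vignettes = 200, 8

# Synthetic long-format data: one row per (participant, vignette) rating.
data = pd.DataFrame({
    "participant_id": np.repeat(np.arange(n_participants), n_vignettes),
    "agent": np.tile(["AI", "human"], n_participants * n_vignettes // 2),
    "pre_attitude": np.repeat(rng.normal(size=n_participants), n_vignettes),
    "rating": rng.integers(-3, 4, n_participants * n_vignettes),
})

# Fixed effects for agent type and pre-attitude; random intercept groups
# the eight ratings contributed by each respondent.
model = smf.mixedlm("rating ~ agent + pre_attitude", data,
                    groups=data["participant_id"])
print(model.fit().summary())
```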
Figure 2: The overall attribution of responsibility to AI or human agents in bail decisions (left) and the correlation matrix across the different responsibility notions (right); (a) Study 1: aggregate results for AI or human advisors; (b) Study 2: aggregate results for AI or human decision-makers. The y-axis indicates the degree to which participants attributed each notion of responsibility, on a 7-point Likert scale (-3 = Strongly Disagree, 3 = Strongly Agree).

We use the standard .05 level of significance. In all models, we use our adapted scale of pre-attitudes towards AI systems as a control variable. Figure 3 shows the results; the annotated numbers indicate the differences and significance levels between the two agents. We report the full regression coefficients in Table 3 in the Appendix.

Both Study 1 and Study 2 show consistent differences in responsibility attribution between agents, regardless of whether they informed a human judge (Study 1) or decided by themselves (Study 2). We note subtle differences in how people attribute responsibility to AI and humans. The first four responsibility concepts are correlated; the notions addressing tasks, supervisory roles, and the skills needed to assume them show a meaningful difference between agent types. The respondents attributed more of these notions of responsibility to humans than to AIs.

Responsibility-as-answerability exhibits a marginal difference with respect to the agent type that assisted human judges in bail decisions; however, the same trend was not observed in Study 2. Nevertheless, our results suggest that humans and AI are judged similarly responsible with respect to causality, blame, and liability for bail decisions. Moreover, human decision-makers are praised to a considerably larger degree than AI decision-makers, although the same effect was not observed for human and AI advisors.
Figure 3: Differences in responsibility attribution to AI programs and humans for bail decisions; (a) Study 1: AI and human advisors; (b) Study 2: AI and human decision-makers. *p < .05, **p < .01, ***p < .001.

So far, we have observed two clusters of responsibility concepts by their correlation. The first cluster is composed of responsibility-as-task, authority, power, and obligation, all of which were attributed to a greater degree to humans than to AI systems (Δ > 0.206, p < .001). The first three are descriptive and focus on one's tasks (i.e., task, authority) and the necessary skills for their completion (i.e., power). Furthermore, responsibility-as-obligation is related to responsibility-as-task in prescribing a specific goal to the agent; it differs from the latter, however, in setting a supervisory role towards the task rather than specifying that one should be the one to complete it.

The second cluster includes causal responsibility, blame, praise, and liability, all of which were attributed to a similar degree to humans and AI. This finding is in line with previous work on blame assignment, highlighting the significance of causality in people's ascription of blame and punishment. Human subject studies suggest that blame attribution is a two-step process: it is initiated by a causal connection between an agent's action and its consequences and is followed by an evaluation of the agent's mental states, i.e., intentions [25]. Malle et al. [66] have also proposed a theory of blame that depends on the causal connection between an agent and a norm-violating event. Our data similarly reveal such a relationship, even when controlling for the advice given, bail decision, or re-offense.

Concerning the phrasing styles, our experimental design addressed responsibility-as-liability as the duty to compensate those harmed by a wrongful action. However, previous work on the connection between liability (i.e., punishment) and causality focuses on the retributive aspect of punishment [25], often drawing a connection between punishment and blame. Therefore, we do not posit that people's ascription of liability depends solely on causality determinations. We hypothesize that the low assignment of liability is due to the current study's bail decision-making context. For instance, those wrongfully convicted do not receive any compensation for years spent in prison in at least 21 US states [88]. Hence, people might not believe that compensation is needed or deserved, or they may attribute this notion of responsibility to other entities, such as the court or the government, leading to a lower ascription of liability to the advisor or decision-maker.

Our findings indicate that participants who were presented with responsibility statements addressing the violation or protection of a defendant's rights (e.g., "It is the AI program's task to protect the rights of the defendant") assigned higher responsibility levels across all notions. We posit that this effect results from the control that judges (humans and AIs) have over the consequences of their advice and decisions. Although a judge's decision can directly affect a defendant's rights depending on the appropriateness of one's jailing, preventing re-offenses is a complex task that encompasses diverse factors, such as policing and the defendant's decision to re-offend.

Participants perceived human judges and advisors as more responsible for their tasks than their AI counterparts (see the leftmost bars in Figure 3).
Humans are responsible for the tasks they are assigned, e.g., preventing re-offenses, because they are in charge (i.e., authority) and have the skills necessary for completing them (i.e., power). These agents should either oversee (i.e., obligation) these tasks or take the lead in them (i.e., task). On the other hand, AI systems are ascribed lower levels of all these responsibility notions.

The meanings of responsibility addressing the attribution of tasks and their requirements are descriptive in the sense that they should be addressed in the present tense [27], e.g., one is responsible for a task, or is in charge of it. Although descriptive and present-looking, these notions lead to the prescription of forward-looking responsibilities, such as an obligation. For instance, being responsible for a specific task because one has the authority and necessary skills prescribes that one should see to it that the task is completed, i.e., an obligation is prescribed, through consequentialist, deontological, or virtue-based routes [102].

Participants attributed lower levels of authority and power to AI. This indicates that these systems are not thought to possess the abilities necessary to make and advise such high-stakes decisions. Therefore, it is not deemed the AI program's responsibility to complete the assigned task or to see that it is fulfilled.

One of the prominent findings of this work is the need for interpretable AI systems. Although our participants assigned a marginally lower level of responsibility-as-answerability to AI advisors vis-à-vis their human counterparts (Δ = 0.167, p < .05), they believe AI systems should justify their decisions to the same extent as human judges, particularly if they are to make the final bail decision (p > .05). Moreover, our results suggest that an AI without a human-in-the-loop, i.e., the AI judges in Study 2, could be held to the same level of scrutiny as human decision-makers for their decisions. This finding may imply that deploying black-box AI in high-stakes scenarios, such as bail decision-making, will not be perceived well by the public. There exists empirical evidence that people might be averse to machines making moral decisions [10]. Previous work has not controlled for a system's interpretability, and therefore such trends might either i) be caused by the lack of explanations or ii) be aggravated if people become aware that AI systems cannot justify their moral decisions.

Judges should base their decisions on facts and be able to explain why they made them. According to our results, AI systems should likewise be capable of justifying their advice and decision-making process. This finding demonstrates the significance of these systems' interpretability. Scholars have discussed the risks posed by the opacity of existing AI algorithms, arguing that understanding how these systems come to their conclusions is necessary for both safe deployment and wide adoption [36].
Explainable AI (XAI) [46] is a field of computer science that has received much attention in the community [38], and our results suggest that people agree with its importance. Previous work has found that one's normative and epistemological values influence how explanations are comprehended [64]. Explanations involve both an explainer and an explainee, meaning that conflicts might arise concerning how they are evaluated [69]. Therefore, we also posit that future work should delve deeper into what types of explanations the general public expects from AI systems. We highlight that those in charge of developing interpretable systems should not try to "nudge" recipients so that they can be manipulated [63], e.g., for agency laundering [82].

The four rightmost bars in Figure 3 suggest that AI and human agents are ascribed similar levels of backward-looking notions of responsibility, namely blame, liability, praise, and causal responsibility.
A model that can explain our blameworthiness results is the Path Model of Blame, which proposes that blame is attributed through nested and sequential judgments of various aspects of the action and its agent [66]. After identifying a norm-violating event, the model states that one judges whether the agent is causally connected to the harmful outcome. If this causal evaluation is not successful, the model assigns little or no blame to the agent. Otherwise, the blamer evaluates the agent's intentionality. If the action is deemed intentional, the blamer evaluates the reasons behind it and ascribes blame accordingly. For unintentional actions, however, one evaluates whether the agent should have prevented the norm-violating event (i.e., had an obligation to prevent it) and could have done so (i.e., had the skills necessary), blaming the agent depending on the evaluation of these notions.

Our results from both studies show that AI and human agents are blamed to a similar degree. These findings agree with the Path Model of Blame, which proposes causality as the initial step for blame mitigation. The model proposes that one can mitigate blame by i) challenging one's causal connection to the wrongful action or ii) defending that the action does not meet moral eligibility standards. We posit that the first excuse can explain why people blame human and AI advisors and decision-makers similarly: as their causal connection to the consequence is deemed alike, they are attributed similar levels of blame. Challenging one's causal effect on an outcome has also been discussed as a possible excuse to avoid blame by other scholars [101].
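To make the nested structure explicit, the following is a schematic rendering of these sequential judgments as a decision procedure; it is our illustrative reading of the Path Model of Blame [66], not a formalization by its authors.

```python
# A schematic (not the original authors') rendering of the Path Model of
# Blame as a decision procedure over a detected norm-violating event.
def path_model_of_blame(causal, intentional, reasons_justified,
                        had_obligation, had_capacity):
    """Return a coarse blame level following the nested judgments above."""
    if not causal:
        return "little or no blame"  # agent not causally connected
    if intentional:
        # Blame for intentional acts scales with the agent's reasons.
        return "low blame" if reasons_justified else "high blame"
    # Unintentional: blame depends on whether the agent should have
    # prevented the event (obligation) and could have (capacity).
    if had_obligation and had_capacity:
        return "moderate blame"
    return "reduced blame"

print(path_model_of_blame(causal=True, intentional=False,
                          reasons_justified=False,
                          had_obligation=True, had_capacity=True))
```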
The extent to which praise was assigned to human and AI agents varied depending on whether the agent was an advisor or a decision-maker. Even though Study 1 shows no difference between the two (p > .05), human decision-makers were praised more highly than AIs in Study 2 (Δ = 0.461, p < .001). Previous work has proposed praise as a positive reinforcement [51] and as a method through which one might convey information about one's values and expectations to the praisee [28].

Regarding the difference between advisors and decision-makers, we posit that the differences between human agents are caused by the level of control the decision-maker has over the outcomes. Although an advisor influences the final decision, the judge is the one who acts on it and, hence, deserves praise. Moreover, taking praise as positive reinforcement, praising the decision-maker rather than an advisor might have a bigger influence on future outcomes.

However, our results also indicate that AI decision-makers are not praised to the same level as human judges. Taking praise as a method of conveying social expectations and values, we highlight that people might perceive existing praising practices as inappropriate for AI. Similarly to the arguments against holding AI responsible per se, which focus on the fact that these systems do not have the mental states required for existing responsibility practices [89, 96], praising an AI might lose its meaning if done as if it were directed towards a human.
The same argument could also be applied to the practice of blame [26]. If the general public believes praising an AI system does not make sense, people might perceive blameworthiness similarly, contradicting our results. However, studies have shown a public impulse to blame, driven by the desire to express social values and expectations [18]. Psychological evidence further suggests that humans are innate retributivists [17]. Likewise, HCI research has found that people attribute blame to robotic agents upon harm, particularly if the agents are described as autonomous and as the main cause of harm [37, 52, 67]. Hence, there is no contradiction in people attributing blame to AI systems for harms even if they do not praise them for the opposite consequences.
Our finding that AI and human agents should be held liable to a similar level goes against previous work, which found that people attribute punishment to AI systems to a lesser degree than to their human counterparts [62]. Punishment fulfills many societal goals, such as making victims whole, satisfying retributive feelings, and reforming offenders. In the current study, we address one of these functions and phrase liability as the responsibility to compensate those harmed (i.e., to make victims whole). Therefore, our results do not directly contradict earlier findings that addressed punishment in its wider definition.

The results from our initial exploratory analysis in Section 4.1 show that the trends found between causality and blame attributions across different phrasing styles do not directly transfer to liability judgments. Hence, we do not posit that similar causality judgments can explain the similar attribution of liability to AI and humans, as in the case of blame. We instead hypothesize that it results from two different factors, one for each phrasing style.

Regarding the statements addressing the prevention of re-offenses, we posit that the lower attribution of liability to both agents is caused by a variation of the "problem of many hands" [102]. Preventing defendants from re-offending does not rely solely on a judge's decision but encompasses many other factors, as discussed above. Therefore, liability is distributed across various entities, such as the government and the court itself. Regarding the statements focusing on protecting defendants' rights, we hypothesize that people do not expect defendants to be compensated if their rights are violated. As examined above, in many US states those who have been unjustly incarcerated are not compensated [88]. The respondents did not believe those harmed should, or even could, be made whole for the violation of their rights; hence, both AI and human agents are attributed low and similar levels of liability.
Our findings indicate that people believe humans are, and should be, responsible for their assigned tasks, regardless of whether they are advisors or decision-makers. Our respondents perceive humans as having the skills necessary to complete these tasks, being in charge of them, and being able to ensure that they are completed. The responsibility notions that were attributed to human agents to a greater extent than to AIs are present- and forward-looking in the sense that they are descriptive, i.e., state a fact, and prescribe obligations. It is important to note that users of AI systems are also responsible in a backward-looking fashion, such that they should also be held responsible for the outcomes of their advice and decisions. Therefore, our findings agree with scholars who propose that users (and designers) should take responsibility for their automated systems' actions and consequences [20, 72].

Nonetheless, our study shows that AIs could also be held responsible for their actions. Taking morality as a human-made construct [93], it may be inevitable to hold AI systems responsible alongside their users and designers so that this formulation is kept intact. Viewing responsibility concepts as social constructs that aim to achieve specific social goals, attributing backward-looking notions of responsibility to AI systems might emphasize these goals [91]. Our study indicates that these practices might not need to focus on compensating those harmed by these systems, given the low attribution of liability to all agents; this does not imply that those harmed should not be compensated, but rather that respondents do not attribute this responsibility to AI systems per se. Some scholars propose that other stakeholders should take this responsibility [19], mainly because automated agents are not capable of doing so [16]. We instead hypothesize that people might desire to hold these entities responsible for retributive motives, such as satisfying their need for revenge [71] and bridging the retribution gap [26], as a result of human nature [25].

It is important to note that AI systems might not be appropriate subjects of (retributive) blame [26, 89]; scholars argue that blaming automated agents would be wrong and unsuccessful. Future research can address which functions of responsibility attribution would satisfy this public attribution of backward-looking responsibilities to AI systems. Future studies can also address scenarios in which blame could be attributed to a higher degree, e.g., those with life-or-death consequences, such as self-driving vehicles and AI medical advisors.

A common concern raised by scholarly work is that blaming or punishing an AI system might lead to social disruptions. From a legal perspective, attributing responsibility to these systems might obfuscate designers' and users' roles, creating human liability shields [16], i.e., stakeholders might use these automated systems as a form of protection from deserved punishment. Another possible issue is "agency laundering," in which a system's designer distances itself from morally suspect actions, regardless of intentionality, by blaming the algorithm, machine, or system [82]. This form of blame-shifting has been observed, for example, when Facebook blamed its algorithm for autonomously creating anti-Semitic categories in its advertisement platform [1, 97]. We highlight that any responsibility practice towards AI systems should not blur the responsibility prescribed to and deserved by their designers and users. Our findings suggest that autonomous algorithms should not be held responsible by themselves, but rather alongside other stakeholders, so that these concerns are not realized.
CONCLUSION

This paper discussed the responsibility gap posed by the deployment of autonomous AI systems [68] and conducted a survey study to understand how differently people attribute responsibility to AI and humans. As a case study, we adapted vignettes from real-life algorithm-assisted bail decisions and employed a within-subjects experimental design to obtain public perceptions of various notions of moral responsibility. We conducted two studies; the former illustrated a realistic scenario in which AI advises human judges, and the latter described a fictional circumstance in which the AI is the decision-maker itself.

The current study focused on AI systems currently used to advise bail decisions, which is an important yet specific application of these algorithms. Therefore, our results might not generalize to all possible environments. For instance, some of our results partly conflict with previous work addressing self-driving vehicles [7] and medical systems [62]. Studies such as ours should be expanded to diverse AI applications, where AI is used both in-the-loop (as in Study 1) and autonomously (as in Study 2). People have different opinions regarding how (and where) these systems should be deployed in relation to how autonomous they should be [65], which should affect how they ascribe responsibility for the systems' actions.

Study 1 was designed so that the judge's decision always followed the advice given, to reduce the complexity of the vignette design. However, future studies on similar topics should also consider scenarios in which AI systems and their human supervisors disagree. For instance, if a human judge chooses to disagree with advice, some of the advisor's responsibilities might shift towards the decision-maker regardless of the advisor's nature. In our case study, human-AI collaborations are such that there exists an AI-in-the-loop; future work should address other collaboration variations, such as human-in-the-loop, i.e., humans assisting machines.

The current research considered eight notions of responsibility from related work. We recognize that other meanings of responsibility could be further considered, such as virtue-based notions, where calling an entity responsible prescribes an evaluation of its traits and dispositions [87, 94]. These notions have been widely agreed to be incompatible with AI systems due to their lack of metaphysical attributes [20, 89, 96]. Nevertheless, our research has found key clusters of responsibility notions concerning AI and human agents, opening further research directions.

Our exploratory analysis identified two clusters of responsibility notions. One cluster encompasses meanings related to the attribution of tasks and obligations (i.e., responsibility-as-task, as-obligation), their necessary skills (i.e., responsibility-as-power), and the ascription of authority (i.e., responsibility-as-authority).
The other cluster includes meanings related to causal determinations (i.e., responsibility-as-cause) and backward-looking responsibility notions (i.e., blame, praise, and liability).

As our results demonstrate, people may hold AI to a level of moral scrutiny similar to that applied to humans for their actions and harms. Our respondents indicated that they expect decision-makers and advisors to justify their bail decisions regardless of their nature. These findings highlight the importance of interpretable and explainable algorithms, particularly in high-stakes scenarios such as our case study. Finally, this study also showed that people judge AI and humans differently with respect to certain notions of responsibility, particularly those with present- and forward-looking meanings, such as responsibility-as-task and as-obligation. However, we also found that people attribute similar levels of causal responsibility, blame, and liability to AI and human advisors and decision-makers for bail decisions.
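For readers who wish to probe comparable rating data for such clusters, the grouping can be approximated with standard hierarchical clustering over the correlations among notion ratings. The following Python sketch is illustrative only: the DataFrame, its column names, and the randomly generated placeholder ratings are our assumptions, not the analysis pipeline used in this work.

    import numpy as np
    import pandas as pd
    from scipy.cluster.hierarchy import fcluster, linkage
    from scipy.spatial.distance import squareform

    # Hypothetical Likert-style ratings: one row per response, one column per notion.
    notions = ["task", "authority", "power", "obligation",
               "answerability", "cause", "blame", "liability"]
    ratings = pd.DataFrame(np.random.randint(1, 8, size=(200, len(notions))),
                           columns=notions)  # placeholder data on a 7-point scale

    # Use 1 - Pearson correlation as the distance between notions.
    distance = 1 - ratings.corr()

    # Average-linkage hierarchical clustering, cut into two clusters.
    condensed = squareform(distance.values, checks=False)
    labels = fcluster(linkage(condensed, method="average"), t=2, criterion="maxclust")
    print(dict(zip(notions, labels)))

With real ratings, the two groups recovered by such a cut could be compared against the present-/forward-looking versus backward-looking split described above.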
REFERENCES
[3] … The International Review of Information Ethics.
[4] Peter M Asaro. 2011. A body to kick, but still no soul to damn: Legal perspectives on robotics. Robot Ethics: The Ethical and Social Implications of Robotics (2011), 169.
[5] Peter M Asaro. 2016. The liability problem for autonomous artificial agents. In 2016 AAAI Spring Symposium Series.
[6] Edmond Awad, Sohan Dsouza, Richard Kim, Jonathan Schulz, Joseph Henrich, Azim Shariff, Jean-François Bonnefon, and Iyad Rahwan. 2018. The moral machine experiment. Nature 563, 7729 (2018), 59–64.
[7] Edmond Awad, Sydney Levine, Max Kleiman-Weiner, Sohan Dsouza, Joshua B Tenenbaum, Azim Shariff, Jean-François Bonnefon, and Iyad Rahwan. 2020. Drivers are blamed more than their automated cars when both make mistakes. Nature Human Behaviour 4, 2 (2020), 134–143.
[8] Susanne Beck. 2016. The problem of ascribing legal responsibility in the case of robotics. AI & Society 31, 4 (2016), 473–481.
[9] Richard Berk, Hoda Heidari, Shahin Jabbari, Michael Kearns, and Aaron Roth. 2018. Fairness in criminal justice risk assessments: The state of the art. Sociological Methods & Research (2018), 0049124118782533.
[10] Yochanan E Bigman and Kurt Gray. 2018. People are averse to machines making moral decisions. Cognition 181 (2018), 21–34.
[11] Gunnar Björnsson and Karl Persson. 2012. The explanatory component of moral responsibility. Noûs 46, 2 (2012), 326–354.
[12] Jean-François Bonnefon, Azim Shariff, and Iyad Rahwan. 2020. The Moral Psychology of AI and the Ethical Opt-Out Problem. Oxford University Press, Oxford, UK.
[13] Jean-François Bonnefon, Azim Shariff, and Iyad Rahwan. 2016. The social dilemma of autonomous vehicles. Science 352, 6293 (2016), 1573–1576.
[14] Mark Bovens. 1998. The Quest for Responsibility: Accountability and Citizenship in Complex Organisations. Cambridge University Press.
[15] Bartosz Brożek and Marek Jakubiec. 2017. On the legal responsibility of autonomous machines. Artificial Intelligence and Law 25, 3 (2017), 293–304.
[16] Joanna J Bryson, Mihailis E Diamantis, and Thomas D Grant. 2017. Of, for, and by the people: the legal lacuna of synthetic persons. Artificial Intelligence and Law 25, 3 (2017), 273–291.
[17] Kevin M Carlsmith. 2008. On justifying punishment: The discrepancy between words and actions. Social Justice Research 21, 2 (2008), 119–137.
[18] Kevin M Carlsmith, John M Darley, and Paul H Robinson. 2002. Why do we punish? Deterrence and just deserts as motives for punishment. Journal of Personality and Social Psychology 83, 2 (2002), 284.
[19] Paulius Čerka, Jurgita Grigienė, and Gintarė Sirbikytė. 2015. Liability for damages caused by artificial intelligence. Computer Law & Security Review 31, 3 (2015), 376–389.
[20] Marc Champagne and Ryan Tonkens. 2015. Bridging the responsibility gap in automated warfare. Philosophy & Technology 28, 1 (2015), 125–137.
[21] Eugene Chislenko. 2019. Scanlon's Theories of Blame. The Journal of Value Inquiry (2019), 1–16.
[22] Cory J Clark, Eric Evan Chen, and Peter H Ditto. 2015. Moral coherence processes: Constructing culpability and consequences. Current Opinion in Psychology 6 (2015), 123–128.
[23] Mark Coeckelbergh. 2009. Virtual moral agency, virtual moral responsibility: on the moral significance of the appearance, perception, and performance of artificial agents. AI & Society 24, 2 (2009), 181–189.
[24] Mark Coeckelbergh. 2019. Artificial intelligence, responsibility attribution, and a relational justification of explainability. Science and Engineering Ethics (2019), 1–18.
[25] Fiery Cushman. 2008. Crime and punishment: Distinguishing the roles of causal and intentional analyses in moral judgment. Cognition 108, 2 (2008), 353–380.
[26] John Danaher. 2016. Robots, law and the retribution gap. Ethics and Information Technology 18, 4 (2016), 299–309.
[27] Michael Davis. 2012. "Ain't no one here but us social forces": Constructing the professional responsibility of engineers. Science and Engineering Ethics 18, 1 (2012), 13–34.
[28] Catherine R Delin and Roy F Baumeister. 1994. Praise: More than just social reinforcement. Journal for the Theory of Social Behaviour 24, 3 (1994), 219–241.
[29] Mady Delvaux. 2017. Report with recommendations to the Commission on Civil Law Rules on Robotics (2015/2103 (INL)). European Parliament Committee on Legal Affairs (2017).
[30] Mandeep K Dhami, Samantha Lundrigan, and Katrin Mueller-Johnson. 2015. Instructions on reasonable doubt: Defining the standard of proof and the juror's task. Psychology, Public Policy, and Law 21, 2 (2015), 169.
[31] Virginia Dignum. 2017. Responsible artificial intelligence: designing AI for human values. (2017).
[32] Gordana Dodig-Crnkovic and Daniel Persson. 2008. Sharing moral responsibility with robots: A pragmatic approach. Frontiers in Artificial Intelligence and Applications 173 (2008), 165.
[33] Julia Dressel and Hany Farid. 2018. The accuracy, fairness, and limits of predicting recidivism. Science Advances 4, 1 (2018).
[34] Robin Antony Duff. 2007. Answering for Crime: Responsibility and Liability in the Criminal Law. Bloomsbury Publishing.
[35] Andre Esteva, Brett Kuprel, Roberto A Novoa, Justin Ko, Susan M Swetter, Helen M Blau, and Sebastian Thrun. 2017. Dermatologist-level classification of skin cancer with deep neural networks. Nature 542, 7639 (2017), 115–118.
[36] … Minds and Machines.
[37] … Human Factors (2019), 0018720819880641.
[38] Leilani H Gilpin, David Bau, Ben Z Yuan, Ayesha Bajwa, Michael Specter, and Lalana Kagal. 2018. Explaining explanations: An overview of interpretability of machine learning. In 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA). IEEE, 80–89.
[39] Sabine Gless, Emily Silverman, and Thomas Weigend. 2016. If robots cause harm, who is to blame? Self-driving cars and criminal liability. New Criminal Law Review 19, 3 (2016), 412–436.
[40] Kurt Gray, Chelsea Schein, and Adrian F Ward. 2014. The myth of harmless wrongs in moral cognition: Automatic dyadic completion from sin to suffering. Journal of Experimental Psychology: General (2014).
[41] Nina Grgić-Hlača, Elissa M Redmiles, Krishna P Gummadi, and Adrian Weller. 2018. Human perceptions of fairness in algorithmic decision making: A case study of criminal risk prediction. In Proceedings of the 2018 World Wide Web Conference. 903–912.
[42] Nina Grgić-Hlača, Adrian Weller, and Elissa M Redmiles. 2020. Dimensions of Diversity in Human Perceptions of Algorithmic Fairness. arXiv preprint arXiv:2005.00808 (2020).
[43] Nina Grgić-Hlača, Christoph Engel, and Krishna P. Gummadi. 2019. Human Decision Making with Machine Assistance: An Experiment on Bailing and Jailing. Proc. ACM Hum.-Comput. Interact. 3, CSCW, Article 178 (Nov. 2019), 25 pages. https://doi.org/10.1145/3359280
[44] Robert M Groves, Floyd J Fowler Jr, Mick P Couper, James M Lepkowski, Eleanor Singer, and Roger Tourangeau. 2011. Survey Methodology. Vol. 561. John Wiley & Sons.
[45] David J Gunkel. 2017. Mind the gap: responsible robotics and the problem of responsibility. Ethics and Information Technology (2017), 1–14.
[46] David Gunning. 2017. Explainable artificial intelligence (XAI). Defense Advanced Research Projects Agency (DARPA), nd Web (2017).
[48] F Allan Hanson. 2009. Beyond the skin bag: on the moral responsibility of extended agencies. Ethics and Information Technology 11, 1 (2009), 91–99.
[49] Zan Huang, Hsinchun Chen, Chia-Jung Hsu, Wun-Hwa Chen, and Soushan Wu. 2004. Credit rating analysis with support vector machines and neural networks: a market comparative study. Decision Support Systems 37, 4 (2004), 543–558.
[50] Panagiotis G Ipeirotis. 2010. Demographics of Mechanical Turk. (2010).
[51] Alan E Kazdin. 1978. History of Behavior Modification: Experimental Foundations of Contemporary Research. University Park Press.
[52] Taemie Kim and Pamela Hinds. 2006. Who should I blame? Effects of autonomy and transparency on attributions in human-robot interaction. In ROMAN 2006, The 15th IEEE International Symposium on Robot and Human Interactive Communication. IEEE, 80–85.
[53] Sebastian Köhler, Neil Roughley, and Hanno Sauer. 2017. Technology, responsibility gaps and the robustness of our everyday conceptual scheme. Moral Agency and the Politics of Responsibility (2017).
[54] Bert-Jaap Koops, Mireille Hildebrandt, and David-Olivier Jaquet-Chiffelle. 2010. Bridging the accountability gap: Rights for new entities in the information society. Minn. JL Sci. & Tech. 11 (2010), 497.
[55] Francesca Lagioia and Giovanni Sartor. 2019. AI Systems Under Criminal Law: a Legal Analysis and a Regulatory Perspective. Philosophy & Technology (2019), 1–33.
[56] Min Kyung Lee. 2018. Understanding perception of algorithmic decisions: Fairness, trust, and emotion in response to algorithmic management. Big Data & Society 5, 1 (2018), 2053951718756684.
[57] Min Kyung Lee, Anuraag Jain, Hea Jin Cha, Shashank Ojha, and Daniel Kusbit. 2019. Procedural justice in algorithmic fairness: Leveraging transparency and outcome control for fair algorithmic mediation. Proceedings of the ACM on Human-Computer Interaction 3, CSCW (2019), 1–26.
[58] Min Kyung Lee, Daniel Kusbit, Anson Kahng, Ji Tae Kim, Xinran Yuan, Allissa Chan, Daniel See, Ritesh Noothigattu, Siheon Lee, Alexandros Psomas, et al. 2019. WeBuildAI: Participatory framework for algorithmic governance. Proceedings of the ACM on Human-Computer Interaction 3, CSCW (2019), 1–35.
[59] Jamy Li, Xuan Zhao, Mu-Jung Cho, Wendy Ju, and Bertram F Malle. 2016. From Trolley to Autonomous Vehicle: Perceptions of Responsibility and Moral Norms in Traffic Accidents with Self-Driving Cars. Technical Report. SAE Technical Paper.
[60] Dafni Lima. 2017. Could AI Agents Be Held Criminally Liable: Artificial Intelligence and the Challenges for Criminal Law. SCL Rev. 69 (2017), 677.
[61] Gabriel Lima and Meeyoung Cha. 2020. Responsible AI and Its Stakeholders. arXiv preprint arXiv:2004.11434 (2020).
[62] Gabriel Lima, Chihyung Jeon, Meeyoung Cha, and Kyungsin Park. 2020. Will Punishing Robots Become Imperative in the Future? In Extended Abstracts of the 2020 CHI Conference on Human Factors in Computing Systems. 1–8. https://doi.org/10.1145/3334480.3383006
[63] Zachary C Lipton. 2018. The mythos of model interpretability. Queue 16, 3 (2018), 31–57.
[64] Tania Lombrozo. 2009. Explanation and categorization: How "why?" informs "what?". Cognition 110, 2 (2009), 248–253.
[65] Brian Lubars and Chenhao Tan. 2019. Ask not what AI can do, but what AI should do: Towards a framework of task delegability. In Advances in Neural Information Processing Systems. 57–67.
[66] Bertram F Malle, Steve Guglielmo, and Andrew E Monroe. 2014. A theory of blame. Psychological Inquiry 25, 2 (2014), 147–186.
[67] Bertram F. Malle, Matthias Scheutz, Thomas Arnold, John Voiklis, and Corey Cusimano. 2015. Sacrifice One For the Good of Many? People Apply Different Moral Norms to Human and Robot Agents. In Proceedings of the Tenth Annual ACM/IEEE International Conference on Human-Robot Interaction (HRI '15). https://doi.org/10.1145/2696454.2696458
[68] Andreas Matthias. 2004. The responsibility gap: Ascribing responsibility for the actions of learning automata. Ethics and Information Technology 6, 3 (2004), 175–183.
[69] Brent Mittelstadt. 2019. Principles alone cannot guarantee ethical AI. Nature Machine Intelligence (2019), 1–7.
[70] Andrew E Monroe and Bertram F Malle. 2017. Two paths to blame: Intentionality directs moral information processing along two distinct tracks. Journal of Experimental Psychology: General (2017).
[71] Christina Mulligan. 2017. Revenge Against Robots. SCL Rev. 69 (2017), 579.
[72] Sven Nyholm. 2018. Attributing agency to automated systems: Reflections on human–robot collaborations and responsibility-loci. Science and Engineering Ethics 24, 4 (2018), 1201–1219.
[73] Ziad Obermeyer, Brian Powers, Christine Vogeli, and Sendhil Mullainathan. 2019. Dissecting racial bias in an algorithm used to manage the health of populations. Science 366, 6464 (2019), 447–453.
[74] … Journal of Behavioral and Experimental Finance.
[77] … Handbook of the Law of Torts. Vol. 4. West Publishing.
[78] Iyad Rahwan. 2018. Society-in-the-loop: programming the algorithmic social contract. Ethics and Information Technology 20, 1 (2018), 5–14.
[79] Elissa M Redmiles, Yasemin Acar, Sascha Fahl, and Michelle L Mazurek. 2017. A Summary of Survey Methodology Best Practices for Security and Privacy Researchers. Technical Report.
[80] Neil M Richards and William D Smart. 2016. How should the law think about robots? In Robot Law. Edward Elgar Publishing.
[81] Lionel P Robert, Casey Pierce, Liz Marquis, Sangmi Kim, and Rasha Alahmad. 2020. Designing fair AI for managing employees in organizations: a review, critique, and design agenda. Human–Computer Interaction (2020), 1–31.
[82] Alan Rubel, Clinton Castro, and Adam Pham. 2019. Agency Laundering and Information Technologies. Ethical Theory and Moral Practice 22, 4 (2019), 1017–1041.
[83] Katherine Ryan, Nora Gannon-Slater, and Michael J Culbertson. 2012. Improving survey methods with cognitive interviews in small- and medium-scale evaluations. American Journal of Evaluation 33, 3 (2012), 414–430.
[84] Nripsuta Saxena, Karen Huang, Evan DeFilippis, Goran Radanovic, David Parkes, and Yang Liu. 2019. How Do Fairness Definitions Fare? Examining Public Attitudes Towards Algorithmic Definitions of Fairness. AIES (2019).
[85] Thomas Scanlon. 2000. What We Owe to Each Other. Belknap Press.
[86] Thomas M Scanlon. 2008. Moral Dimensions: Meaning, Permissibility, and Blame. Harvard University Press, Cambridge, MA.
[87] David Shoemaker. 2011. Attributability, answerability, and accountability: Toward a wider theory of moral responsibility. Ethics 121, 3 (2011), 602–632.
[88] … How the wrongfully convicted are compensated for years lost.
[89] Robert Sparrow. 2007. Killer robots. Journal of Applied Philosophy 24, 1 (2007), 62–77.
[90] Megha Srivastava, Hoda Heidari, and Andreas Krause. 2019. Mathematical Notions vs. Human Perception of Fairness: A Descriptive Approach to Fairness for Machine Learning. arXiv preprint arXiv:1902.04783 (2019).
[91] Bernd Carsten Stahl. 2006. Responsible computers? A case for ascribing quasi-responsibility to computers independent of personhood or agency. Ethics and Information Technology 8, 4 (2006), 205–213.
[92] Dag Sverre Syrdal, Kerstin Dautenhahn, Kheng Lee Koay, and Michael L Walters. 2009. The negative attitudes towards robots scale and reactions to robot behaviour in a live human-robot interaction study. Adaptive and Emergent Behaviour and Complex Systems (2009).
[93] Andreas Theodorou. [n.d.]. Why Artificial Intelligence is a Matter of Design. Reflections in Philosophy, Theology, and the Social Sciences ([n.d.]), 105.
[94] Daniel W Tigard. 2020. Responsible AI and moral responsibility: a common appreciation. AI and Ethics (2020), 1–5.
[95] Daniel W Tigard. 2020. There Is No Techno-Responsibility Gap. Philosophy & Technology (2020), 1–19.
[96] Steve Torrance. 2008. Ethics and consciousness in artificial agents. AI & Society 22, 4 (2008), 495–521.
[97] Andreas Tsamados, Nikita Aggarwal, Josh Cowls, Jessica Morley, Huw Roberts, Mariarosaria Taddeo, and Luciano Floridi. 2020. The Ethics of Algorithms: Key Problems and Solutions. Available at SSRN 3662302 (2020).
[98] Jacob Turner. 2018. Robot Rules: Regulating Artificial Intelligence. Springer.
[99] Mathias Twardawski, Karen TY Tang, and Benjamin E Hilbig. 2020. Is It All About Retribution? The Flexibility of Punishment Goals. Social Justice Research (2020), 1–24.
[100] U.S. Census Bureau. 2016. American Community Survey 5-Year Estimates.
[101] Ibo Van de Poel. 2011. The relation between forward-looking and backward-looking responsibility. In Moral Responsibility. Springer, 37–52.
[102] Ibo Van de Poel. 2015. Moral responsibility. In Moral Responsibility and the Problem of Many Hands. Routledge, 24–61.
[103] Robert van den Hoven van Genderen. 2018. Do we need new legal personhood in the age of robots and AI? In Robotics, AI and the Future of Law. Springer, 15–55.
[104] Nicole A Vincent. 2011. A structured taxonomy of responsibility concepts. In Moral Responsibility. Springer, 15–35.
[105] Laura Wächter and Felix Lindner. 2018. An explorative comparison of blame attributions to companion robots across various moral dilemmas. In Proceedings of the 6th International Conference on Human-Agent Interaction. 269–276.
[106] R Jay Wallace. 1994. Responsibility and the Moral Sentiments. Harvard University Press.
[107] Lisa Wallander. 2009. 25 years of factorial surveys in sociology: A review. Social Science Research 38, 3 (2009), 505–520.
[108] Ruotong Wang, F Maxwell Harper, and Haiyi Zhu. 2020. Factors Influencing Perceived Fairness in Algorithmic Decision-Making: Algorithm Outcomes, Development Procedures, and Individual Differences. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. 1–14.
[109] Michael Wenzel and Ines Thielmann. 2006. Why we punish in the name of justice: Just desert versus value restoration and the role of social identity. Social Justice Research 19, 4 (2006), 450–470.
[110] Gordon B Willis. 2004. Cognitive Interviewing: A Tool for Improving Questionnaire Design. Sage Publications.
[111] Chen Zhu, Hengshu Zhu, Hui Xiong, Chao Ma, Fang Xie, Pengliang Ding, and Pan Li. 2018. Person-job fit: Adapting the right talent for the right job with joint representation learning. ACM Transactions on Management Information Systems (TMIS) 9, 3 (2018), 1–17.
ACKNOWLEDGMENTS
This work was supported by the Institute for Basic Science (IBS-R029-C2).
A APPENDIX
Notion                            Phrasing          Statement
Responsibility-as-task            Further Offense   It is the (agent)'s task to prevent further offenses.
                                  Rights            It is the (agent)'s task to protect the rights of the defendant.
Responsibility-as-authority       Further Offense   The (agent) has the authority to prevent further offenses.
                                  Rights            The (agent) has the authority to protect the rights of the defendant.
Responsibility-as-power           Further Offense   The (agent) has the skills needed to prevent further offenses.
                                  Rights            The (agent) has the skills needed to protect the rights of the defendant.
Responsibility-as-obligation      Further Offense   The (agent) should ensure that no further offense is committed.
                                  Rights            The (agent) should ensure that the rights of the defendant are protected.
Responsibility-as-answerability   Further Offense   The (agent) should justify their advice/decision.
                                  Rights            The (agent) should justify their advice/decision.
Responsibility-as-cause           Further Offense   The (agent)'s decision led to the occurrence/prevention of the reoffense.
                                  Rights            The (agent)'s decision led to the violation/protection of the rights of the defendant.
Responsibility-as-blame/praise    Further Offense   The (agent) should be blamed/praised for the failure to prevent/prevention of the reoffense.
                                  Rights            The (agent) should be blamed/praised for the violation/protection of the rights of the defendant.
Responsibility-as-liability       Further Offense   The (agent) should compensate those harmed by the reoffense.
                                  Rights            The (agent) should compensate the defendant for violating their rights.
Table 2: Statements addressing all responsibility notions presented to participants in Study 1 and Study 2. (Agent) is either "AI program," "human advisor," or "human judge," depending on the agent and the study. The statements addressing responsibility-as-liability were shown if i) the defendant re-offended and the phrasing style addressed the prevention of re-offenses, or ii) the defendant was denied bail and did not re-offend within two years while the statements focused on the protection of their rights. The phrases tackling praise and blame were presented depending on the advice/decision and recidivism. The Phrasing column indicates how statements were phrased depending on which function of the bail decision they stressed: preventing further offenses (Further Offense) or protecting the defendant's rights (Rights).

Figure 4: Example screenshots of the survey instrument used for Study 1: (a) the study introduction presenting the scenario where AI systems are being used for bail decisions; (b) a vignette introducing a defendant, whether they re-offended, and the stakeholders' decisions and advice; (c) the attribution of the eight notions of moral responsibility to the advisor in Study 1 (or the decision-maker in Study 2). The study is available at https://thegcamilo.github.io/responsibility-compas/.
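The statements in Table 2 are templates instantiated with the evaluated agent. A minimal sketch of how such vignette statements could be generated programmatically is shown below; the template strings mirror rows of Table 2, while the function and dictionary names (TEMPLATES, make_statement) are our own hypothetical choices, not part of the original survey instrument.

    # Illustrative instantiation of Table 2 statement templates.
    TEMPLATES = {
        ("task", "further_offense"):
            "It is the {agent}'s task to prevent further offenses.",
        ("task", "rights"):
            "It is the {agent}'s task to protect the rights of the defendant.",
        ("answerability", "further_offense"):
            "The {agent} should justify their advice/decision.",
    }

    def make_statement(notion, phrasing, agent):
        """Fill one statement template with the agent under evaluation."""
        return TEMPLATES[(notion, phrasing)].format(agent=agent)

    for agent in ("AI program", "human advisor", "human judge"):
        print(make_statement("task", "further_offense", agent))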
Figure 5: Attribution of responsibility for bail decisions depending on how the statements were phrased, recidivism, and advice/decision. (a) Study 1: AI and human advisors. (b) Study 2: AI and human decision-makers.

                            (1) Decision-Maker   (2) Advisor
answer
    agent_human                 -0.0232            0.168*
    advice_jail                 -0.0722           -0.0800
    defendant_reoffended         0.0696            0.130
    phrasing_rights              0.0308           -0.102
    control                      0.0103           -0.119
    intercept                    1.254***               ***
authority
    agent_human                  0.885***               ***
    advice_jail                 -0.122*           -0.193**
    defendant_reoffended        -0.00129          -0.233***
    phrasing_rights              1.052***               ***
    control                     -0.150            -0.0790
    intercept                   -0.0410            0.0360
blame
    agent_human                  0.0258            0.107
    advice_jail                 -1.128***         -0.878***
    defendant_reoffended        -0.572***         -0.456**
    phrasing_rights              0.500*                 *
    control                     -0.0312
    intercept                    0.0484            0.226
cause
    agent_human                  0.0863            0.115
    advice_jail                 -1.179***         -1.018***
    defendant_reoffended        -0.460***         -0.550***
    phrasing_rights              1.020***               ***
    control                      0.0278           -0.0686
    intercept                    0.136             0.468**
liability
    agent_human                 -0.0541            0.122
    advice_jail                 -0.722***         -0.541**
    defendant_reoffended        -0.553***         -0.466**
    phrasing_rights              0.530*                 *
obligation
    agent_human                  0.206***               ***
    advice_jail                 -0.144**          -0.124*
    defendant_reoffended        -0.0438           -0.216***
    phrasing_rights              0.987***               ***
    control                     -0.0614           -0.0786
    intercept                    0.642***               ***
power
    agent_human                  0.799***               ***
    advice_jail                 -0.101            -0.126
    defendant_reoffended        -0.271***         -0.366***
    phrasing_rights              0.939***               ***
    control                     -0.0759           -0.118
    intercept                   -0.119             0.263
praise
    agent_human                  0.461***               ***
    advice_jail                 -0.941***
    phrasing_rights              1.435***               ***
    control                      0.0277           -0.120
    intercept                   -0.784***         -0.148
task
    agent_human                  0.282***               ***
    advice_jail                 -0.117*           -0.111
    defendant_reoffended        -0.0683           -0.164**
    phrasing_rights              0.686***                **
    control                     -0.0417            0.0128
    intercept                    0.667***               ***

Table 3: Coefficients from the multivariate mixed effects model presented in Section 4.2. *p < .05, **p < .01, ***p < .001
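The per-notion regressions in Table 3 can be approximated with a mixed-effects model that adds a random intercept per participant to the fixed effects listed in the table. The sketch below uses Python's statsmodels as one possible tool; the file name, column names, and the choice of a linear (rather than ordinal or multivariate) specification are our assumptions rather than the authors' exact pipeline.

    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical long-format responses: one row per participant-vignette pair,
    # with columns matching the predictors reported in Table 3.
    df = pd.read_csv("responses.csv")  # assumed file

    # One model per responsibility notion, here responsibility-as-authority,
    # with a random intercept for each participant.
    model = smf.mixedlm(
        "authority ~ agent_human + advice_jail + defendant_reoffended"
        " + phrasing_rights + control",
        data=df,
        groups=df["participant_id"],
    )
    print(model.fit().summary())

Fitting one such model per notion and comparing the agent_human coefficients would reproduce the kind of AI-versus-human contrast summarized in Table 3.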