Toward a Rational and Ethical Sociotechnical System of Autonomous Vehicles: A Novel Application of Multi-Criteria Decision Analysis
Veljko Dubljević, George F. List, Jovan Milojevich, Nirav Ajmeri, William Bauer, Munindar P. Singh, Eleni Bardaka, Thomas Birkland, Charles Edwards, Roger Mayer, Ioan Muntean, Thomas Powers, Hesham Rakha, Vance Ricks, and M. Shoaib Samandar

North Carolina State University; Oklahoma State University; University of Bristol; University of North Carolina at Chapel Hill; University of North Carolina at Asheville; University of Delaware; Virginia Tech; Guilford College
1. Introduction: MCDA and the problems of AVs
The expansion of artificial intelligence (AI) and autonomous systems has shown the potential to generate enormous social good while also raising serious ethical and safety concerns [ ]. AI technology is increasingly adopted in transportation. Kamalanathsharma et al. [ ] conducted a survey of various in-vehicle technologies and found that approximately 64% of the respondents used a smartphone application to assist with their travel. The top-used applications were navigation and real-time traffic information systems; among those who used smartphones during their commutes, the top-used applications were navigation and entertainment.

There is a pressing need to address relevant social concerns to allow for the development of systems of intelligent agents that are informed and cognizant of ethical standards. Doing so will facilitate the responsible integration of these systems in society. To this end, we have applied Multi-Criteria Decision Analysis (MCDA) to develop a formal Multi-Attribute Impact Assessment (MAIA) questionnaire for examining the social and ethical issues associated with the uptake of AI. We have focused on the domain of autonomous vehicles (AVs) because of their imminent expansion [ ]. However, AVs could serve as a stand-in for any domain where intelligent, autonomous agents interact with humans, either on an individual level (e.g., pedestrians, passengers) or a societal level [ ].

MCDA has been proposed as a method to study the assessment of harms and risks [ ]. The basic tenet is that MCDA, along with qualitative techniques, can provide defensible insights about the way people see the multi-faceted impacts of technological change. Many improvements have already been made since the papers describing this method were initially published.
For instance, studies have 1) expanded the criteria [ ], 2) included the relative importance (or weights) of different harms [ ], 3) made comparisons of harm/benefit ratios [ ], and 4) reported the need to include the perceptions of all relevant stakeholders [ ].

One of the strengths of MCDA is that it can systematize a process that encompasses large areas of knowledge in a transparent manner, allowing for replication and improvement of the methodology [ ]. MCDA breaks down complex evaluations into a series of smaller, more easily assessed issues, thus enhancing the reliability and validity of the results.

By utilizing both qualitative and quantitative analyses, we have expanded the utility of MCDA, giving it the potential to drastically improve the ethical evaluation of transformative change, illustrated here in the context of AV technology. The MAIA questionnaire provides an evidence base for impacts, including harm-over-benefit ratios. Notably, it addresses the drawbacks identified in the literature critical of the MCDA methodology, such as lack of attention to situational factors [ ], value judgments [ ], and additional stakeholders [ ].

There is a substantial need to apply the kind of approach embodied by MAIA to AVs, for the following reasons. As AVs are implemented in various types of transportation systems, the degree of direct interaction with humans (e.g., pedestrians) and human-operated vehicles (connected non-autonomous vehicles (CVs) and traditional vehicles) grows in complexity, both intrinsically and due to the combinatorial complexity introduced by large numbers of vehicles. Therefore, controlling the behavior of AVs becomes inherently more complex, and the potential for harm to humans increases. In simpler versions of the transport system (e.g., robotic single-lane freeways), it is possible to consider the devices to be automatic, precisely carrying out the instructions of the owners according to relatively simple programming.
However, in complex urban environments, where interactions are far richer, humans frequently take actions outside the rule set to resolve conflicts. Therefore, successfully implementing AVs requires accommodating unpredictable situations that may occur as a result of human behavior and decision-making.

AVs will transform the lives of many people [ ]. Although they have the potential to save many lives, they also raise important new safety and ethical dilemmas. Due to the inherent human factors involved, the successful implementation of AVs is not only an engineering issue but a social, political, and ethical issue as well. The perspectives of multiple disciplines are required to craft detailed assessments of the relative impacts of implementing AVs in different types of controlled and uncontrolled transportation environments. Understanding the societal and ethical implications of AVs (and any AI system) inherently involves many distinct issues: the nature and capabilities of these technologies (computer science, engineering), how humans can and should use them (ethics), how humans will behave in response to the presence of AVs in the traffic stream (social sciences), and the technology's impact on socio-economic structures (political science, economics). Thus, producing new and relevant knowledge in this area requires expertise originating in multiple disciplines [ ].

Expert opinions on emerging technologies are periodically obtained to provide valuable insights [ ]. However, a comprehensive methodology for comparing heterogeneous harms and benefits relative to different stakeholders has been lacking. Previously, expert assessments of AV technology have been based on fictional future scenarios so that relevant policies could be discussed [ ], as opposed to identifying how policies adopted in the present could shape the future, or how each policy option compares to the current status quo in terms of relevant criteria (see Table I).
To fill this gap, we elicited expert opinions about the impacts of AVs through a Delphi exercise and consensus workshop, resulting in operational evidence regarding the moral, social, and economic benefits and harms of AVs. The identification of relevant facts and values—the task for which disciplinary experts are essential—helps guide complex evaluations, reduces confounds and biases, and clarifies uncertainties [ ].

Our assumption, at least for the foreseeable future, is that it is unrealistic to expect AVs to completely replace traditional non-autonomous motor vehicles. We expect that AVs will operate in a heterogeneous environment alongside traditional vehicles, as well as cyclists and pedestrians. Traditional vehicle technology is assumed to be robust and wanted not merely for economic benefits but also for psychological reasons, such as the "joy of driving" [ ]. Thus, we agree with Samandar and colleagues that "a mixed traffic fleet is likely to be the predominant scenario for the foreseeable future" [ ].

Studies focused on other countries have generated assessments that are interesting but not necessarily applicable to the U.S. context. For instance, the German Federal Ministry of Transport and Digital Infrastructure appointed a national ethics committee for automated and connected driving to develop and issue a code of ethics. This code states that "protection of individuals takes precedence over all utilitarian considerations" and "automated driving is justifiable only to the extent to which conceivable attacks, in particular manipulation of the IT system or innate system weaknesses, do not result in such harm as to lastingly shatter people's confidence in road transport" [ ]. Such guidance is interesting, yet there is no mention of how it is to be implemented, raising concerns about its feasibility.
Moreover, it fails to address important issues such as how AV technology could be programmed to resist malicious actors, such as terrorists [ ], or how social justice issues can be safeguarded during the introduction of AVs into the socioeconomic system [ ]. The European Union [ ] and Australia [ ] have also developed expert-assessed scenarios intended to guide policy makers in regulating AV technology. Groups of experts are very important, but they are better used in assessing the importance of harms and benefits, as we have done.
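As background for the methodology that follows, the weighted-sum aggregation at the heart of MCDA can be sketched in a few lines. All names and numbers below are illustrative assumptions, not values from our surveys.

```python
# Minimal weighted-sum MCDA sketch (illustrative data only).
# Each alternative receives a score on each criterion; criterion weights
# express relative importance and are normalized to sum to 1, so the
# overall score of an alternative is the weighted average of its scores.

def mcda_score(scores, weights):
    """Weighted-sum MCDA score for one alternative."""
    total_w = sum(weights)
    return sum(s * w / total_w for s, w in zip(scores, weights))

# Hypothetical example: two alternatives rated on three criteria.
weights = [3, 1, 2]                     # relative importance of criteria
alt_a = mcda_score([2, 3, 1], weights)  # (6 + 3 + 2) / 6
alt_b = mcda_score([1, 1, 3], weights)  # (3 + 1 + 6) / 6
# alt_a > alt_b: alternative A is preferred under these weights.
```

Because the decomposition makes every criterion and weight explicit, the judgment behind a ranking can be inspected, debated, and revised, which is the transparency property emphasized above.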
2. Methodology: The Multi-Attribute Impact Assessment
We developed a novel application of the MCDA method, which we call the Multi-Attribute Impact Assessment (MAIA) questionnaire, to assess the impacts of AV technology. We identified 21 impacts for which we sought expert opinions about their importance (see Supplementary Table ST1). We followed an iterative process that began with the first author of this paper preparing an initial list of harms and benefits based on the AV ethics literature and relevant agency reports [34, 35]. The list was discussed at length by a sample of six experts (the first six authors of this paper), then revised based on the feedback. It was subsequently piloted in a Delphi survey with the full panel of 19 experts, again revised based on feedback, and discussed at length during the consensus workshop (see below). The final list of impacts is categorized into 13 harms and eight benefits, as shown in Table I.
Table I: The harms and benefits assessed
Q1: Harms of vehicle-related mortality (e.g., driver or passenger deaths on the road)
Q2: Harms of vehicle-specific damage (e.g., costs of damage to property)
Q3: Harms of vehicle-related damage (e.g., damage to the natural environment)
Q4: Harms of vehicle system encroachment on human living (e.g., reduction of urban walkability)
Q5: Harms of vehicle-related occupational injuries (e.g., sedentary lifestyle of drivers)
Q6: Harms of vehicle-related lack of status (e.g., elderly losing driver's licenses due to visual impairments)
Q7: Harms of vehicle-related loss of time or productivity (e.g., time spent in traffic jams)
Q8: Harms of vehicle-related loss of social engagement (e.g., time spent isolated from others)
Q9: Harms of vehicle-related injury to others (e.g., hit-and-run incidents)
Q10: Harms of vehicle-related economic costs (e.g., maintenance costs)
Q11: Harms of vehicle-related changes to community (e.g., marginalization of specific communities)
Q12: Harms of vehicle-related crime opportunities (e.g., sexual assault by ride-hailing service drivers or passengers)
Q13: Harms of vehicle-related economic changes (e.g., loss of jobs by drivers)
Q14: Benefits of promoting societal value (e.g., increase in economic activity)
Q15: Benefits of minimizing negative societal impacts (e.g., decrease in pedestrian injury and death)
Q16: Protecting the interests of users (e.g., drivers)
Q17: Advancing the preservation of the environment (e.g., reducing traffic jams)
Q18: Maximizing the progress of science and technology (e.g., increasing data quality)
Q19: Engaging relevant communities (e.g., pedestrians, business communities)
Q20: Ensuring oversight and accountability (e.g., preventing or limiting irresponsible uses)
Q21: Recognizing appropriate governmental and policy roles (e.g., bringing public attention to transportation issues)

Concurrently, we explored four operational scenarios or regulatory environments that might be implemented during the deployment of AVs. They are described in Table II.
Table II: Operational scenarios and regulatory environments explored

Scenario 1: Status quo (no AVs)
Scenario 2: Unregulated (laissez-faire) AV deployment; any entity may purchase and operate AVs anywhere
Scenario 3: Regulated AV deployment; both individuals and commercial operators may own AVs
Scenario 4: Regulated AV deployment; only commercial operators may own AVs (fleet ownership only)

Note: In scenarios 2-4, we assume that traditional non-autonomous vehicles continue to operate in addition to AVs.
These categories were based on our sense of how AV technology is likely to be introduced. Beyond the status quo (scenario 1), the first AV condition (scenario 2) assumes that no regulatory control will be exercised and that commercial entities will "push" development and deployment. Implicitly, anyone (any entity) would be able to purchase and operate such vehicles anywhere. The second and third AV conditions (scenarios 3 and 4) assume that regulatory control will be applied and that either individuals (scenario 3) or only commercial operators (scenario 4) can own the AVs. Scenarios 3 and 4 assume SAE Level 4 automation, meaning that the vehicles can operate on a portion of the highway network [ ]. In scenario 3, companies, such as car rental and ride-sharing companies, can also own AVs, but there is no prohibition against individuals owning them. In scenario 4, only commercial operators can own AVs; no personal ownership is allowed. The categories are silent insofar as the level of market penetration is concerned, but implicitly, the vehicle population has enough AVs that their operational impact is visible. We elected not to include SAE Level 5, full autonomy, as one of the scenarios because it seems far off in the future compared with the status quo.
3. Results
A consensus emerged that certain forms of AV implementation would be less harmful than others. Namely, the regulated private or fleet-owned policies (scenarios 3 and 4) would be better than either the current transportation system (scenario 1) or haphazard, unfettered AV implementation (scenario 2). The stacked histograms in Fig. S1, which show the harms of different AV technology implementations measured on a 4-point scale, summarize this finding. Similarly, the regulated, fleet-owned scenario (scenario 4) would produce the greatest benefits (see Fig. S2). A follow-up survey that used a 10-point scale produced similar results (see Fig. S3).

The harm and benefit assessments were open ended: respondents were allowed to scale their total assessments on any basis. For example, whereas one respondent could have used "1" as the maximum for each criterion, another could have used "100". With 21 criteria, this means the first respondent would have a total ranging up to 21; the second, up to 2100. We wanted to see whether the respondents were similar in their relative assessments of the importance of the 21 criteria. Hence, we scaled each respondent's assessments to a total of 100. Fig. S4 shows the "weight profiles" that emerged. For each respondent, the profile shows the percentage distribution of importance among the 21 criteria. If a respondent had made all assessments equal, the profile would be a straight line. A quicker rise for a given criterion implies that it has more importance; a slower rise, less importance. The main conclusion we draw from this figure is that, except for a couple of respondents, all had a similar sense of the relative importance of the impacts. Respondent 5 (medium blue, and the highest) gave the greatest aggregate importance to the harms (highest total percent by impact 13). Respondent 14 (dark orange, and the lowest) gave the greatest importance to the benefits (lowest total percent by impact 13).
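The rescaling described above can be sketched as follows. This is a minimal illustration of the assumed procedure, with hypothetical ratings over three criteria rather than the 21 used in the study.

```python
# Normalize each respondent's importance ratings so they sum to 100,
# making profiles comparable regardless of the scale each respondent
# chose, and build the cumulative "weight profile" plotted in Fig. S4.

def weight_profile(ratings):
    """Return percentages summing to 100 and their running (cumulative) profile."""
    total = sum(ratings)
    pct = [100.0 * r / total for r in ratings]
    cumulative = []
    running = 0.0
    for p in pct:
        running += p
        cumulative.append(running)
    return pct, cumulative

# Two hypothetical respondents using different rating scales but holding
# the same relative view of three criteria:
p1, c1 = weight_profile([1, 2, 1])      # respondent rating on a small scale
p2, c2 = weight_profile([50, 100, 50])  # respondent rating on a 100 scale
# Both yield pct = [25.0, 50.0, 25.0]; every cumulative profile ends at 100.
```

A steeper step in the cumulative profile at a given criterion corresponds to the "quicker rise" interpretation above: that criterion carries a larger share of the respondent's total importance.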
Two "harm" assessment surveys were administered. The harms were impacts 1-13. One survey used a 4-point scale (0-3) for each impact, where 0 was "no harm" and 3 was "extreme harm" (Cronbach's alpha, α = 0.863). The other used a 10-point scale, where 1 was "no harm" and 10 was "extreme harm" (Cronbach's alpha, α = 0.916). The assessments were done sequentially, with the 4-point scale assessed first. Since the findings from the 4-point survey were shared with the participants before the 10-point survey was administered, there could be effects from the shared feedback, as suggested by the results of the Cronbach's alpha reliability tests.

We analyzed the harm responses in several ways. The first was in terms of the relative importance of the harms; another was by scenario. The descriptions of the harms are presented in Table I (questions 1-13), and the four scenarios are in Table II.

The mean values and standard deviations based on the 4-point scale (0-3) are shown in Fig. S5. A higher score means greater harm. The harm with the greatest reduction due to AVs is 6, lack of status (e.g., elderly losing driver's licenses due to visual impairments). This makes sense; AVs provide a significant boost in mobility for these people. The one where the impacts are mixed or minimal is 11, harms related to changes to community (e.g., marginalization of specific communities). The respondents saw no clear trend in this impact. The impact with the greatest variation in assessment was 3, vehicle-related damage (e.g., to the natural environment); we suppose this is because of differences in perceptions about the technology and how it will be used. Harm 13 stands out as having characteristics different from the others. It pertains to harms of vehicle-related economic changes (e.g., loss of jobs by drivers). Hence, it is not surprising that its impacts are different.
The aggregate assessment of differences among scenarios will be addressed later, but it seems clear that scenario 4, which involves a regulated, commercially owned fleet, has the greatest reduction in harms.

Fig. S6 shows the same information but on a 10-point scale. The 1-10 results were remapped to 0-9 so that the low end of both assessments was 0. Strikingly different are the assessments for criterion 1 (more spread) and criterion 3 (a higher sense of harm for the status quo). Otherwise, the pattern is similar. Moreover, as before, scenario 1 has the greatest harms, followed by scenarios 2, 3, and 4, roughly in that order.

For a broader brush, we computed the sums by respondent for all the harms (the sum of the responses to questions 1-13). On the 0-3 scale, a maximum of 39 (13 × 3) was possible, and a minimum of 0. We then computed the average of these values and the standard deviation. Fig. S7 shows the results for both the four-point scale (0-3) and the ten-point scale (0-9). The trends in the averages among the four scenarios are the same in both cases. The greatest harms are associated with the status quo (scenario 1); the least with the regulated, fleet-owned scenario (scenario 4). These findings are consistent with visual inspection of Figs. S4 and S5. One noticeable difference is that the spread between the scenarios is larger in the 10-point case than in the 4-point case. The trends in the standard deviations are also similar, except that, in the case of the ten-point scale, the standard deviation for the laissez-faire option, unfettered AVs (scenario 2), is higher than for the other three scenarios, whereas in the four-point assessment it is similar to the others. This could be an effect of the shared feedback on the 4-point survey.
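The per-respondent aggregation just described can be sketched as follows. The response data are hypothetical; the block only illustrates the summing and the remapping of the 1-10 scale to 0-9 so that both surveys share a zero point.

```python
# Per-respondent harm totals (sum of questions 1-13), with an optional
# remap of 1-10 ratings to 0-9, followed by the mean and standard
# deviation across respondents, as summarized in Fig. S7.
import statistics

def harm_totals(responses, remap=False):
    """responses: one list of 13 harm ratings per respondent."""
    totals = []
    for r in responses:
        vals = [v - 1 for v in r] if remap else list(r)  # 1-10 -> 0-9
        totals.append(sum(vals))
    return statistics.mean(totals), statistics.stdev(totals)

# Three hypothetical respondents on each scale:
four_pt = [[1] * 13, [2] * 13, [3] * 13]        # 0-3 scale; totals 13, 26, 39
mean4, sd4 = harm_totals(four_pt)               # mean 26, sd 13
ten_pt = [[2] * 13, [5] * 13, [8] * 13]         # 1-10 scale
mean10, sd10 = harm_totals(ten_pt, remap=True)  # totals 13, 52, 91; mean 52
```

Comparing the two (mean, standard deviation) pairs scenario by scenario is what reveals the wider spread of the 10-point assessments noted above.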
Two "benefit" surveys were administered. As with the harms, one used a 4-point scale (0-3), where 0 was "no benefit" and 3 was "drastic benefit" (Cronbach's alpha, α = 0.901); the other used a 10-point scale, where 1 was "no benefit" and 10 was "drastic benefit" (Cronbach's alpha, α = 0.941). For purposes of the presentation here, the 10-point scale is rescaled to 0-9 so that "0" is common between the two surveys. For the status quo scenario, the benefits were not assessed, as the status quo was assumed to be the baseline.

Fig. S8 shows the benefit assessments based on both the 4-point and 10-point scales. Benefit 1 maps to impact (question) 14 and benefit 8 to impact (question) 21 in Table I. We find that the greatest benefits are associated with benefits 4 and 7, i.e., advancing the preservation of the environment (e.g., reducing traffic jams) and ensuring oversight and accountability (e.g., preventing or limiting irresponsible uses), respectively. These benefits are greater for the regulated scenarios (3 and 4) than for the unregulated one (2). Moreover, in a broader sense, the benefits for scenario 4 (regulated, fleet owned) are the greatest, followed by scenario 3 (regulated, privately owned) and then scenario 2 (laissez faire, or unfettered). The one benefit where scenario 2 produces comparable or higher benefits is benefit 1, promoting societal value (e.g., increase in economic activity). Intuitively, respondents perceived that deregulated development would produce the most innovation and capital investment.
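The Cronbach's alpha values reported for the harm and benefit scales follow the standard internal-consistency formula, which can be computed directly; the small data set below is hypothetical and serves only to show the calculation.

```python
# Cronbach's alpha: alpha = (k / (k - 1)) * (1 - sum(item variances) /
# variance of respondent totals), where k is the number of items.
import statistics

def cronbach_alpha(items):
    """items: one list per item (column), each holding all respondents' scores."""
    k = len(items)
    n = len(items[0])
    item_vars = [statistics.variance(col) for col in items]
    totals = [sum(col[i] for col in items) for i in range(n)]
    total_var = statistics.variance(totals)
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Three hypothetical items scored by four respondents; the items move
# together, so alpha is high (consistent with the values reported above).
items = [[1, 2, 3, 3], [1, 2, 2, 3], [0, 2, 3, 3]]
alpha = cronbach_alpha(items)
```

Values above roughly 0.8, like those observed for all four surveys, are conventionally read as good internal consistency of the scale.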
The summative question is: what does the survey suggest is the "best" scenario, weighing the harms and benefits? There are many ways to answer that question [ ]. One possible approach is to take the harm and benefit value assessments, by respondent, combine them with the corresponding weights (by respondent), and then sum the results for the harms and the benefits. Admittedly, this is "problematic" in that the weights for the harms and benefits were assessed together and, here, have been normalized to sum to one. But that may not be "bad" or "wrong." It can be argued that forcing them to sum to 1 implicitly provides the respondent's sense of the relative value of the eight benefits versus the 13 harms. Further surveying will reveal valuable information about this issue.

In this instance, the total of the weighted harms has been plotted against the total of the weighted benefits for the four scenarios, based on the weighted value assessments of the respondents. Fig. 1 plots the sum of these weighted value assessments for the harms against those for the benefits. The message seems clear. The regulated, fleet-owned scenario (4) is perceived to have the greatest benefits and the least harms among all four options. It is slightly better than the regulated, personally owned scenario (3) and clearly better than the laissez-faire, or unfettered, scenario (2), especially insofar as the harms are concerned. (Of course, the status quo scenario has no benefits, and its harms are perceived to be the largest, significantly so in the case of the 10-point assessment.)

Figure 1: Harm/benefit tradeoffs for both the 4-point and 10-point assessments
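The trade-off computation behind such a plot can be sketched as follows. The weights and ratings are hypothetical; the point is that the 13 harm weights and 8 benefit weights share one normalization, so the two totals are directly comparable per respondent.

```python
# For one respondent and one scenario: multiply each harm and benefit
# rating by that respondent's weight, normalized over ALL 21 criteria,
# and sum separately, giving one (weighted harms, weighted benefits)
# point on the trade-off plot.

def weighted_total(ratings, weights, total_w):
    """Weighted sum of ratings using weights normalized by total_w."""
    return sum(r * w / total_w for r, w in zip(ratings, weights))

# Hypothetical respondent: uniform weights within each group.
harm_w = [2] * 13
benefit_w = [3] * 8
total_w = sum(harm_w) + sum(benefit_w)  # one normalization for all 21

harm_ratings = [1] * 13                 # this scenario's harm assessments
benefit_ratings = [2] * 8               # this scenario's benefit assessments

harms = weighted_total(harm_ratings, harm_w, total_w)        # 26/50 = 0.52
benefits = weighted_total(benefit_ratings, benefit_w, total_w)  # 48/50 = 0.96
```

Repeating this for every respondent and scenario, then averaging, yields the four points per scale shown in Fig. 1; a "better" scenario sits toward lower weighted harms and higher weighted benefits.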
4. Discussion and concluding remarks
Through the introduction of AVs, society is exposing itself to an array of risks. Despite the excitement surrounding this technology, there are many unanswered questions about whether it will be both beneficial and safe. Even though there are expectations of overall benefits to society from the deployment of AVs, some groups could experience (much) higher costs relative to benefits than others [ ]. These negatively affected groups will include those whose livelihood depends on traditional motor vehicles. In particular, a significant number of drivers (and, by extension, family members depending on their economic activity) will be affected by the introduction of AVs. There are approximately 1.7 million truck drivers in the United States [ ], with about 800,000 involved in truck transportation. A potentially exacerbating factor is that there is currently a shortage of truck drivers [ ]. This could drive motivation to get AVs on the road en masse promptly, thus pushing current professional drivers out sooner. Although AVs will create new jobs in the trucking industry and other industries [ ], it is an open question whether these new jobs would outnumber those lost due to AVs; indeed, that seems unlikely. However, many driving jobs are perceived to be unsatisfying and potentially unhealthy (e.g., due to the high incidence of sleep apnea and obesity), making their eradication an overall positive outcome if other employment opportunities are available [ ].

Positive or negative utilities [ ] related to the changes wrought to society needed to be assessed in a free and open discussion by a multi-disciplinary panel of experts.
Our results point to the need to better address how the public views the trade-offs between (1) safety; (2) physical ecology (environmental issues); (3) social ecology; (4) economic issues; and (5) the specific impacts on the groups that will be most affected by AV implementation (e.g., professional drivers).

We contend that it is essential to increase the public's confidence that the values of a pluralistic society are accounted for in the development of AV policies. This can be accomplished by 1) bringing society into the identification of norms surrounding AVs [41, 42] and 2) accounting for multiple elements of moral decision-making [ ]. Regarding point 1, although expert groups like the one we assembled do not nearly represent society as a whole, if such a group is large enough and selected carefully, it does represent an important slice of society that policymakers should pay attention to. Regarding point 2, such an expert group brings diverse, refined perspectives to moral decision-making that can only increase the reliability of the assessment by ensuring that the most important considerations and values are brought to the surface.

Several states in the U.S. have started the process of legislating AVs, most notably designating the manufacturer of a vehicle operated by an automated driving system as the vehicle's sole driver, and limiting this special legal framework to motor vehicle manufacturers that deploy their vehicles as part of fleets within specific geographic areas [ ]. Our work provides valuable data that should inform policy makers of the concerns and potential benefits of AV technology under specific implementation strategies, and this could improve the quality of the democratic policymaking process. We recommend that state legislatures and the federal government strongly consider incorporating our results regarding technology development scenarios, as well as the MAIA questionnaire, into their deliberations about the impact of AVs.

References
1. N. Ajmeri, "Engineering Multiagent Systems for Ethics and Privacy-Aware Social Computing," thesis, NC State University, Raleigh (2019).
2. W. A. Bauer, Virtuous vs. utilitarian artificial moral agents. AI & Society, (2), 1–9 (2019).
3. L. M. Clements, K. M. Kockelman, Economic effects of automated vehicles. Transportation Research Record: Journal of the Transportation Research Board, No. 2606, 106–114 (2017).
8. (9566), 1047–1053 (2007).
9. D. J. Nutt, L. A. King, L. D. Phillips, Drug harms in the UK: A multicriteria decision analysis. Lancet, 1558–1565 (2010).
10. D. Nutt, L. D. Phillips, D. Balfour, H. V. Curran, M. Dockrell, J. Foulds, D. Sweanor, Estimating the harms of nicotine-containing products using the MCDA approach. European Addiction Research, 218–225 (2014).
11. J. Van Amsterdam, A. Opperhuizen, M. Koeter, W. Van den Brink, Ranking the harm of alcohol, tobacco and illicit drugs for the individual and the population. European Addiction Research, 202–207 (2010).
12. J. Van Amsterdam, D. Nutt, L. Phillips, W. Van den Brink, European rating of drug harms. Journal of Psychopharmacology, (6), 655–660 (2015).
13. V. Dubljević, Toward an improved multi-criteria drug harm assessment process and evidence-based drug policies. Frontiers in Pharmacology, (898), 1–8 (2018).
14. J. P. Caulkins, P. Reuter, C. Coulson, Basing drug scheduling decisions on scientific ranking of harmfulness: False promise from false premises. Addiction, 1886–1890 (2011).
15. H. Kalant, Drug classification: Science, politics, both or neither? Addiction, 1146–1149 (2010).
16. C. Forlini, E. Racine, J. Vollmann, J. Schildmann, How research on stakeholder perspectives can inform policy on cognitive enhancement. American Journal of Bioethics, (7), 41–43 (2013).
17. V. Dubljević, Response to peer commentaries on "Prohibition or coffee-shops: Regulation of amphetamine and methylphenidate for enhancement use by healthy adults". American Journal of Bioethics, (1), W1–W8 (2014).
18. V. Dubljević, Neuroethics, Justice and Autonomy: Public Reason in the Cognitive Enhancement Debate. Heidelberg: Springer (2019).
19. M. Ford, Rise of the Robots: Technology and the Threat of a Jobless Future. New York: Basic Books (2015).
20. M. R. Frank, D. Autor, J. E. Bessen, E. Brynjolfsson, M. Cebrian, D. J. Deming, I. Rahwan, Toward understanding the impact of artificial intelligence on labor. PNAS, (14), 6531–6539 (2019).
21. K. Crawford, R. Calo, There is a blind spot in AI research. Nature, 538, 311–313 (2016).
40. How to Be Human in a Digital Economy. MIT Press, Cambridge, MA (2019).
41. I. Rahwan, Society-in-the-loop: Programming the algorithmic social contract. Ethics and Information Technology, 20, 5–14 (2017).
42. F. S. De Sio, Killing by autonomous vehicles and the legal doctrine of necessity. Ethical Theory and Moral Practice, (2), 411–429 (2017).
43. V. Dubljević, S. Sattler, E. Racine, Deciphering moral intuition: How agents, deeds, and consequences influence moral judgment. PLOS ONE, (10), 1–28 (2018).
44. B. Walker Smith, Georgia and Virginia legislation for automated driving and delivery robots. The Center for Internet and Society (2017). Retrieved from http://cyberlaw.stanford.edu/publications/georgia-and-virginia-legislation-automated-driving-and-delivery-robots
45. L. D. Phillips, Decision conferencing. In W. Edwards, R. H. Miles Jr, D. von Winterfeldt (Eds.), Advances in Decision Analysis: From Foundations to Applications (pp. 375–399). Cambridge University Press, New York, NY (2007).
Appendix: Supplementary Information
Supplementary methods section
We engaged 19 leading researchers from diverse backgrounds (in terms of discipline, gender, and ethnicity) to participate in a consensus-building workshop on the NC State campus on 21 Feb 2020. We selected this many because 19 is near the upper limit espoused by Phillips [45] for effectiveness in expert-based decision analysis studies. The experts discussed the criteria and the scenarios at length during the workshop. Five surveys were administered in conjunction with the workshop: 1) weights among the criteria, 2) a 4-point assessment of harms, 3) a 4-point assessment of benefits, 4) a 10-point assessment of harms, and 5) a 10-point assessment of benefits. Subsequent to the workshop, an additional survey of weights, limited to 100% total for all criteria, was conducted.

The Delphi method was used to generate the responses. The participants were briefed on the results of the nth survey before the (n+1)st survey was administered. Their responses were converted to a 100-point scale, the participants were briefed about the result, and the survey was repeated in a second and a third wave. The repetition of rankings (using a 4-point scale and a 10-point scale) helps reduce potential biases in the impact assessments.

Acknowledgements
This study was funded by North Carolina State University through Research and Innovation Seed Funding and the Kenan Institute for Engineering, Technology & Science.

The authors thank Abby Scheper, Abigail Presley, Leila Ouchchy, Joshua Myers, and Elizabeth Eskander for research assistance. Additional thanks to Missy Cummings, Stephanie Sudano, Joseph Hummer, and Michael Struett for their valuable input during the workshop. Special thanks to the members of the Neuro-Computational Ethics research group for their feedback on an earlier version of the paper.

Table ST1: Composition of the Expert Pool
Field of expertise (number of experts):

Political science: 4
Civil/transportation engineering: 7
Philosophy/Ethics: 5
Computer Science/AI: 3
Organizational Behavior: 1