Ethical Considerations for AI Researchers
Kyle Dent
Palo Alto Research Center
[email protected]
Abstract
Use of artificial intelligence is growing and expanding into applications that impact people's lives. People trust their technology without really understanding it or its limitations. There is the potential for harm, and we are already seeing examples of that in the world. AI researchers have an obligation to consider the impact of the intelligent applications they work on. While the ethics of AI is not clear-cut, there are guidelines we can consider to minimize the harm we might introduce.
Introduction
A quick scan of recent papers covering the area of AI and ethics reveals researchers' admirable impulse to think about teaching intelligent agents human values (Abel, MacGlashan, and Littman 2016; Burton, Goldsmith, and Mattei 2016; Riedl and Harrison 2016). There is, however, another important and more immediate aspect of AI and ethics we ought to take into consideration. AI is being widely deployed for new applications; it's becoming pervasive; and it's having an effect on people's lives. AI researchers should reflect on their own personal responsibility with regard to the work they do. Many of us are motivated by the idea that we can contribute useful new technology that has a positive impact on the world. Positive outcomes have largely been the case with advanced technologies that improve cancer diagnosis and provide safety features in cars, for example. With vast amounts of computing power and a number of improved techniques, intelligent software is being adopted in more and more contexts that affect people's lives. How people use it is starting to matter, and the impact of our decisions matters.

Not surprisingly, as the use of AI expands, negative consequences of its failures and design flaws are more visible. Much of the AI that has recently been deployed derives its intelligence from learning algorithms that are based on statistical analysis of data. The acquisition, applicability, and analysis of that data determine its output. Statistics shine when making predictions about distributions over populations. That predictive power fades when applied to individuals. There will be faulty predictions. The popular press is rife with misuses of statistical analysis and AI (Crawford 2016; O'Neil 2016). Given the growing use, the built-in uncertainties, and the public's tendency to blindly trust technology, we have a responsibility to consider the likely and unlikely outcomes of the choices we make when we are designing and developing tools or predictive systems to support decision making that affects people and communities of people.

Purposely malicious choices are obviously ethically unacceptable. In (Yampolskiy 2015), the author outlines various pathways that lead to dangerous artificial intelligence. Within the taxonomy, there are pathways that introduce danger into artificial intelligence 'on purpose.' The other pathways inadvertently lead to hazards in the system. You can decide for yourself if you are comfortable developing smart weapons, for example, and most of us would, at a minimum, pause to consider the implications of that decision. But the inadvertent pathways leading to dangerous AI can be difficult to foresee and may come about from subtle interactions. Our less obvious responsibility lies in giving careful consideration to our choices and being clear to ourselves and our stakeholders about the assumptions, trade-offs, and choices we make.

Several other papers consider another ethical aspect, the fairness of automatic systems (O'Neil 2016; Hardt, Price, and Srebro 2016; NSTCCT 2016), and some even conclude that fairness is inherently impossible to achieve for most problems (Kleinberg, Mullainathan, and Raghavan 2016). One of the points I'll make is that discussions about fairness and societal impact can be cut off once an intelligent agent is introduced into the process. There is a popular feeling that machines don't make value judgments and are inherently unbiased. However, the assumptions we make when designing our systems are often based on subjective value judgments; for example, choosing data sets, selecting weighting schemes, and balancing precision and recall. We have to be transparent about what we do and be clear about the choices we have made. The ultimate purpose matters, and the decisions you come to must be communicated.
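To make the precision/recall point concrete, consider the following minimal sketch. It is my own illustration rather than anything drawn from the systems cited here, and it assumes Python with NumPy and scikit-learn; the labels and scores are synthetic. Moving a single decision threshold changes who bears the cost of the model's errors, which is exactly the kind of value judgment worth surfacing to stakeholders.

    # Illustrative only: synthetic labels and scores, hypothetical thresholds.
    import numpy as np
    from sklearn.metrics import precision_score, recall_score

    rng = np.random.default_rng(0)
    y_true = rng.integers(0, 2, size=1000)                           # made-up ground truth
    y_score = np.clip(0.6 * y_true + 0.6 * rng.random(1000), 0, 1)   # made-up model scores

    for threshold in (0.3, 0.5, 0.7):
        y_pred = (y_score >= threshold).astype(int)
        print(f"threshold={threshold}: "
              f"precision={precision_score(y_true, y_pred):.2f}, "
              f"recall={recall_score(y_true, y_pred):.2f}")

Raising the threshold protects against false alarms but lets more true cases slip through; deciding which side of that trade matters more is a question about people, not about the algorithm.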
Blind Trust in Technology
Although there are pockets of skepticism towards intelligent systems, by and large people are content to offload decisions to technology. In May 2016, there was a widely publicized crash involving a Tesla Motors car being driven in computer-assisted mode. It appears the driver had undue faith in the capabilities of the car (Habib 2017). The following week, another driver following a GPS unit steered her car into Ontario's Georgian Bay (McQuigge 2016). These extreme examples reveal a trend in the general population to trust the smart devices in our lives.

Ideally, government agencies and jurisdictions would apply the principles of open government and transparency when contracting with suppliers for decision-making tools. In practice that hasn't been the case. Last year, two researchers filed 42 open records requests in 23 different states asking for information about software with predictive algorithms used by governments as decision support tools (Brauneis and Goodman 2017). Their goal was to understand the policies built into the algorithms in order to evaluate their usefulness and fairness. Only one of the jurisdictions was able to provide information about the algorithms the software used and how it was developed. Some of those who did not respond cited agreements with vendors preventing them from revealing information, but many did not seem concerned about transparency in their process or the need to understand the technology. Assuming the best intentions of the decision makers, they are also demonstrating great faith in the technology and vendors they contract with.

There is also evidence that users of these systems, judges and hiring managers for example, weight AI guidance too heavily. Without tools, when people are making decisions, there is public awareness that decisions are made within some context. We understand that individuals can be influenced, even subconsciously, by their biases and prejudices. Technologically assisted decisions tend to shut down the conversation about fairness despite their having a large effect on people's lives. Those affected may not have the opportunity to contest the decisions. If important decisions are made through our models, we must use care in developing them and clearly communicate the assumptions we make.
Ethical Obligations
Physicians and attorneys have well-established codes of ethics. Doctors famously commit to not doing any harm. Implied in that concept is the idea that there is potential to do harm. It is clear from many examples, some of which I mention in this paper, that there is the potential for harm in our work. Given people's lack of understanding of the limits of technology and the trust they place in it, AI researchers have a personal, ethical obligation to reflect on the decisions we make.

Ethical thinking helps us to make choices and, just as importantly, provides a framework to reason about those choices. The framework we use (explicitly or not) is defined by a set of principles that guide and support our decisions. One of the difficult things about defining ethical standards is deciding the values to base them on. Ethics issues will undoubtedly be discussed and argued within the community and the world generally in the coming years. Each of us can start by considering our own roles and being consciously aware of the effects our work can have.

The stakeholders who decide to deploy intelligent decision making, government agencies for example, generally aren't qualified to assess the assumptions, models, and algorithms behind it. This asymmetrical relationship puts the burden on those with the information to be clear, honest, and forthcoming with it. Those at a disadvantage depend on us to inform them about the technology's fitness for their purpose, its reliability, and its accuracy. We usually focus on the technical aspects of our work, like selecting highly predictive models and minimizing error functions, but when applying algorithmic decision-making that will affect human beings, we have a responsibility to think about more.
Recommendations for Consideration
Ethics is not science. But it is possible to ground our thinking in well-defined guidelines to assist in making ethical decisions for AI development. A formal framework may even emerge within the researcher community with time. In the short term, the following is a list of thoughts and questions to ask ourselves when designing predictive or decision-making systems.
1. Relevance of data and models
It is important to think carefully about the data used to train our technology. Are the data and models appropriate to the real-life problem they are solving? It is tempting to believe causal forces are at play when we find correlation on a single dataset. Does the data capture the true variable of interest? Is it consistent across observations and over time? We often introduce a proxy variable because the variable we need isn't available or isn't easy to quantify. Can your findings be calibrated against the real-world situation? Even better, could you measure the actual outcome you're trying to achieve?

In 2008, researchers at Google had the idea that an increase in search queries related to the flu and flu symptoms could be indicative of a spreading virus. They created the Google Flu Trends (GFT) web service to track Google users' search queries related to the flu. If they detected increased transmission before the numbers from the U.S. Centers for Disease Control and Prevention (CDC) came out, earlier interventions could reduce the impact of the virus. The initial article reported 97% accuracy using the CDC data as the gold standard (Ginsberg et al. 2009). However, a follow-up report showed that in subsequent flu seasons GFT predicted more than double what the CDC data showed (Lazer et al. 2014). Given the first year's high accuracy, it would have been easy for the researchers to believe they had discovered a strong, predictive signal. But online behavior isn't necessarily a reflection of the real world. There are several factors that might make the GFT data wrong. One of them is that the underlying algorithms of Google Search itself (the GFT researchers don't control those) can change from one year to the next. Also, users' search behavior could have changed. Mainly, however, people's search patterns are probably not a good single indicator of a spreading virus. There are many other factors and various reasons people might search for information.

Training data rarely aligns with real-life goals. In (Lipton 2016), Lipton presents challenges to providing and even defining interpretability of machine learning outputs. He identifies several possible points of divergence between training data and real-life situations. For example, off-line training data is not always representative of the true environment, and real-world objectives can be difficult to encode as simple value functions. Often we work with data that was collected for other purposes and almost never under ideal, controlled circumstances. What was the original purpose in collecting the data, and how did that determine its content? In July of 2015, another group at Google had to apologize for its Photos application identifying a black couple as gorillas (Guynn 2015). Their training dataset was not representative of the population it was meant to predict. Also, there are limits to the amount of generalization we can expect from any learning method trained on a particular dataset.

Is it possible your dataset contains biases? When making decisions related to hiring, judicial proceedings, and job performance, for example, many personal characteristics are legally excluded. Also, humans are good at discarding variables they recognize as irrelevant to the decision to be made; computers are blind to those considerations. Are there other characteristics that are closely correlated with legally and ethically protected ones? If you don't consider those, you can inadvertently treat people unfairly based on protected or irrelevant characteristics.
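As one small, concrete screen, you can at least look for candidate features that track a protected attribute closely. The sketch below is my own illustration, assuming pandas; the table, column names, and the 0.4 cutoff are made-up stand-ins, and a correlation check is only a supplement to, never a replacement for, careful review.

    # A toy, made-up table; in practice you would load your real training data.
    import pandas as pd

    df = pd.DataFrame({
        "protected_group":      [0, 0, 1, 1, 0, 1],   # assumed 0/1 protected attribute
        "zip_code_income_rank": [8, 7, 2, 1, 9, 3],   # candidate feature
        "years_experience":     [5, 2, 4, 3, 4, 4],   # candidate feature
    })

    protected = df["protected_group"]
    for col in ["zip_code_income_rank", "years_experience"]:
        r = df[col].corr(protected)                   # Pearson correlation with the protected attribute
        flag = "possible proxy" if abs(r) > 0.4 else "no strong linear link"
        print(f"{col}: r = {r:+.2f} ({flag})")

A feature that passes this screen can still encode bias in combination with others, so the result is a prompt for further investigation, not a clean bill of health.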
There is often a trade-off between accuracy and the intelligibility of a model (Caruana et al. 2015). More predictive but harder-to-understand models can make it difficult to know which personal characteristics determine the decision, and those characteristics are therefore not available for validation against human judgment. In (Caruana et al. 2015), the authors describe a system that learned a rule that patients with a history of asthma have a lower risk of dying from pneumonia. Based on the data used to train the system, their model was absolutely correct. However, in reality asthma sufferers (without treatment) have a higher risk of dying from pneumonia. Because of the increased risk, when patients with a history of asthma go to the hospital, the general practice is to place them in an intensive care unit. The extra attention they receive decreases their risk of dying from pneumonia even below that of the general population. It is our natural inclination to develop models with the highest accuracy. However, the need for visibility into decisions that affect people's lives may increase the importance of explainability at the expense of some predictive performance. In all cases, our stakeholders must understand the decisions we make and the trade-offs implied by them.
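The following sketch, my own illustration assuming scikit-learn and using one of its bundled datasets purely as a stand-in for real decision data, shows the kind of comparison worth putting in front of stakeholders: how much accuracy, if any, is actually gained by moving from a model whose coefficients can be read to one that is much harder to explain.

    # Illustrative comparison only; the dataset is a stand-in, not a real decision-making scenario.
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    simple = LogisticRegression(max_iter=5000).fit(X_tr, y_tr)   # coefficients can be inspected
    opaque = GradientBoostingClassifier().fit(X_tr, y_tr)        # harder to explain to a stakeholder

    print("interpretable model accuracy:", round(simple.score(X_te, y_te), 3))
    print("opaque model accuracy:       ", round(opaque.score(X_te, y_te), 3))

If the gap turns out to be small, the interpretable model may be the more defensible choice when individual people's outcomes are at stake.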
2. Safeguards for Failures and Misuse
Even experienced researchers with the best intentions are inclined to favor the positive outcomes of their work. We highlight positive results, but we should also think through failure modes and possible unintended consequences. What about misuse? There isn't a lot you can do about a person determined to use the technology in ways it wasn't intended, but are there ways a good-faith user might go wrong? Can you add protections for that?

The 2016 Tesla accident mentioned before was catastrophic. The driver used computer-assisted mode in conditions it was expressly not designed for, resulting in his death. The accident was investigated by two government agencies. The first finding, from the National Highway Traffic Safety Administration, was that the driver-assist software had no safety defects and declared that, in general, the vehicles performed as designed (Habib 2017), implying that responsibility for use of the system falls on the operator. A later investigation from the National Transportation Safety Board found otherwise (NTSB 2016). They declared that the automatic controls played a major role in the crash. The fact that the driver was able to use computer assistance in a situation it was not intended for was problematic. The combination of human error and insufficient safeguards resulted in an accident that should not have happened.
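One simple class of safeguard is a hard check on the operating envelope: before the system acts, confirm that the request falls inside the conditions it was actually validated for, and refuse (and log) otherwise. The sketch below is my own illustration with made-up condition names and limits; it is not a description of Tesla's or any vendor's actual design.

    # Hypothetical operating envelope; the values are illustrative assumptions.
    VALIDATED = {"road_type": "divided_highway", "max_speed_kph": 110}

    def assist_allowed(road_type: str, speed_kph: float) -> bool:
        """Permit the assisted mode only inside the validated envelope."""
        return road_type == VALIDATED["road_type"] and speed_kph <= VALIDATED["max_speed_kph"]

    if not assist_allowed(road_type="rural_crossroad", speed_kph=120):
        print("Refusing to engage: outside validated conditions. Returning control and logging the event.")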
3. Accuracy
How accurate is your algorithm, and how accurate does it need to be? Do your stakeholders understand the number of people who will be subject to a missed prediction given your measure of accuracy? A model that misses only 1% shows phenomenally good performance, but if hundreds or thousands of people are still adversely affected, that might not be acceptable. Are there human inputs that can compensate for the system's misses, and can you design for that? What about post-deployment accuracy? Accuracy on training data doesn't always reflect real usage. Do you have a way to measure runtime accuracy? The world is dynamic and changes with time. Is there a way to continue to assess the accuracy after release? How often does it have to be reviewed?
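A back-of-the-envelope calculation is often enough to make this point to stakeholders; the figures below are entirely hypothetical.

    # "99% accurate" sounds excellent until it is restated as people per year.
    error_rate = 0.01                 # assumed miss rate
    people_scored_per_year = 250_000  # assumed volume of decisions

    wrong_decisions = error_rate * people_scored_per_year
    print(f"Roughly {wrong_decisions:,.0f} people per year receive an erroneous decision.")

The same framing helps when deciding how often post-deployment accuracy must be re-measured: if the world drifts and the error rate doubles, the number of people affected doubles with it.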
4. Size and severity of impact
Think about the numbers of people affected. Of course, you want to avoid harming anyone, but knowing the size or the severity of negative consequences can justify the cost of extra scrutiny. You might also be able to design methods that mitigate them. Given an understanding of the impact, you can make better decisions about whether the extra effort is warranted.
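A rough, explicitly subjective way to reason about this is to weight each kind of error by how badly it hurts. The counts and severity weights in the sketch below are made up, and in practice the weights should be chosen with the affected stakeholders rather than by the developer alone.

    # All figures are hypothetical; severity weights are value judgments, not measurements.
    false_positives_per_year = 1_200   # e.g., people wrongly flagged or denied
    false_negatives_per_year = 300     # e.g., genuine risks the system misses
    severity_false_positive = 10       # relative harm of one false positive
    severity_false_negative = 50       # relative harm of one false negative

    expected_harm = (false_positives_per_year * severity_false_positive
                     + false_negatives_per_year * severity_false_negative)
    print("Relative expected harm per year:", expected_harm)

Even a crude estimate like this makes it easier to argue for, or against, the cost of additional review.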
Conclusion
Individual researchers, especially in commercial operations, don't always have the chance to communicate clearly and transparently with clients. At least being transparent with your immediate stakeholders can set the right expectations for them when they represent your work down the line. You are necessarily making decisions about the models and software you develop. If you don't surface those decisions to discuss their effect, they may never be brought to light.

A short paper cannot cover such a large and multi-faceted issue. The main idea is for each of us to think individually about our own responsibilities and the impact our work can have on real lives. It's useful to spend time thinking about our assumptions and the trade-offs we make in the context of the people who will be affected. Communicating those to everyone concerned is also critical. Modern versions of the Hippocratic Oath are still used by many medical schools. The spirit of the oath is applicable to most research affecting human beings. One phrase is especially general and worth keeping in mind: "I will remember that I remain a member of society, with special obligations to all my fellow human beings..." (Tyson 2001)
References

[Abel, MacGlashan, and Littman 2016] Abel, D.; MacGlashan, J.; and Littman, M. 2016. Reinforcement learning as a framework for ethical decision making. AI, Ethics, and Society Workshop at the Thirtieth AAAI Conference on Artificial Intelligence. Association for the Advancement of Artificial Intelligence.

[Brauneis and Goodman 2017] Brauneis, R., and Goodman, E. P. 2017. Algorithmic transparency for the smart city. Yale Journal of Law & Technology; GWU Law School Public Law Research Paper; GWU Legal Studies Research Paper.

[Burton, Goldsmith, and Mattei 2016] Burton, E.; Goldsmith, J.; and Mattei, N. 2016. Using "The Machine Stops" for teaching ethics in artificial intelligence and computer science. AI, Ethics, and Society Workshop at the Thirtieth AAAI Conference on Artificial Intelligence. Association for the Advancement of Artificial Intelligence.

[Caruana et al. 2015] Caruana, R.; Lou, Y.; Gehrke, J.; Koch, P.; Sturm, M.; and Elhadad, N. 2015. Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '15, 1721–1730. New York, NY, USA: ACM.

[Crawford 2016] Crawford, K. 2016. Artificial intelligence's white guy problem. New York Times, June 25, 2016.

[Ginsberg et al. 2009] Ginsberg, J.; Mohebbi, M.; Patel, R.; Brammer, L.; Smolinski, M.; and Brilliant, L. 2009. Detecting influenza epidemics using search engine query data. Nature.

[Guynn 2015] Guynn, J. 2015. Google Photos labeled black people 'gorillas'. USA Today. Online; posted July 1, 2015.

[Habib 2017] Habib, K. 2017. PE 16-007: Automatic vehicle control systems. Technical report, National Highway Traffic Safety Administration.

[Hardt, Price, and Srebro 2016] Hardt, M.; Price, E.; and Srebro, N. 2016. Equality of opportunity in supervised learning. CoRR abs/1610.02413.

[Kleinberg, Mullainathan, and Raghavan 2016] Kleinberg, J. M.; Mullainathan, S.; and Raghavan, M. 2016. Inherent trade-offs in the fair determination of risk scores. CoRR abs/1609.05807.

[Lazer et al. 2014] Lazer, D.; Kennedy, R.; King, G.; and Vespignani, A. 2014. The parable of Google Flu: Traps in big data analysis. Science.

[Lipton 2016] Lipton, Z. C. 2016. The mythos of model interpretability. CoRR abs/1606.03490.

[McQuigge 2016] McQuigge, M. 2016. Woman follows GPS; ends up in Ontario lake. Toronto Sun, May 13, 2016.

[NSTCCT 2016] NSTCCT. 2016. Preparing for the future of artificial intelligence. Technical report, National Science and Technology Council Committee on Technology.

[NTSB 2016] NTSB. 2016. Highway accident report: Collision between a car operating with automated vehicle control systems and a tractor-semitrailer truck. Technical report, National Transportation Safety Board.

[O'Neil 2016] O'Neil, C. 2016. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. New York, NY, USA: Crown Publishing Group.

[Riedl and Harrison 2016] Riedl, M., and Harrison, B. 2016. Using stories to teach human values to artificial agents. AI, Ethics, and Society Workshop at the Thirtieth AAAI Conference on Artificial Intelligence. Association for the Advancement of Artificial Intelligence.

[Tyson 2001] Tyson, P. 2001. The Hippocratic oath today. PBS – NOVA. Online; posted March 3, 2001.

[Yampolskiy 2015] Yampolskiy, R. V. 2015. Taxonomy of pathways to dangerous AI.