Avoiding Negative Side Effects due to Incomplete Knowledge of AI Systems
Sandhya Saisubramanian and Shlomo Zilberstein
College of Information and Computer Sciences, University of Massachusetts Amherst, Massachusetts, USA
Ece Kamar
Microsoft Research, Redmond, Washington, USA
Abstract
Autonomous agents acting in the real world often operate based on models that ignore certain aspects of the environment. The incompleteness of any given model (handcrafted or machine acquired) is inevitable due to practical limitations of any modeling technique for complex real-world settings. Due to the limited fidelity of its model, an agent's actions may have unexpected, undesirable consequences during execution. Learning to recognize and avoid such negative side effects of the agent's actions is critical to improving the safety and reliability of autonomous systems. This emerging research topic is attracting increased attention due to the increased deployment of AI systems and their broad societal impacts. This article provides a comprehensive overview of different forms of negative side effects and the recent research efforts to address them. We identify key characteristics of negative side effects, highlight the challenges in avoiding negative side effects, and discuss recently developed approaches, contrasting their benefits and limitations. We conclude with a discussion of open questions and suggestions for future research directions.
A world populated with intelligent and autonomous systems that simplify our lives is gradually becoming a reality. These systems are autonomous in the sense that they can devise a sequence of actions to achieve some given objectives or goals, without human intervention. Such systems are deeply integrated into our daily lives through various applications such as mobile health monitoring (Sim 2019), intelligent tutoring (Folsom-Kovarik, Sukthankar, and Schatz 2013), self-driving cars (Zilberstein 2015), and clinical decision making (Bennett and Hauser 2013). This broad deployment brings along new challenges and increased responsibility for designers of AI systems, particularly ensuring that these systems operate as expected when deployed in the real world. Despite recent advances in artificial intelligence and machine learning, there are no ways to assure that systems will always "do the right thing" when operating in the open world (Lakkaraju et al. 2017).

For example, consider an autonomous vehicle (AV) that was carefully designed and tested for safety aspects such as yielding to pedestrians and conforming to traffic rules. When deployed, the AV may not slow down when driving through puddles and splash water on nearby pedestrians, which is undesirable. Another documented example of undesirable behavior in autonomous vehicles is the vehicle swerving left and right multiple times to localize itself for active lane-keeping. During this process, the vehicle rarely prompted the driver to take control. This behavior, especially on curvy and hilly roads, can startle the driver or cause panic (Insurance Institute for Highway Safety 2018).

Such undesirable behaviors have been reported in many other contexts in which AI systems are deployed. For example, robot vacuum cleaners are becoming increasingly popular and they have a simple task: to remove dirt from the floor. Undesirable behaviors may occur even when performing relatively simple tasks. A robot vacuum cleaner in Florida ran over animal feces in the house and continued its cleaning cycle, smearing the mess around the house (Solon 2016). In an extreme case in South Korea, a robot vacuum cleaner locked into the hair of a woman who was sleeping on the floor, mistaking her hair for dust (McCurry 2015).

A key factor affecting an agent's performance is its knowledge of the environment in which it is situated. In these examples, the agent was performing its task, perhaps optimally with respect to the information provided to it, but there were serious negative side effects to the agent's actions. In the AV example, driving fast through puddles is optimal when optimizing travel time. The side effects are due to the limited scope of the agent's model, not accounting for the undesirability of splashing water on pedestrians. In practice, it is not feasible to anticipate all possible negative side effects and accurately encode them in the model at design time. Due to the practical limitations of data collection and model specification, agents operating in the open world often rely on incomplete knowledge of their target environment. This may lead to unexpected, undesirable consequences whose severity ranges from mild and tolerable events to safety-critical failures. Addressing the potential undesirable behaviors of autonomous systems is critical to support long-term autonomy and ensure that a deployed AI system is reliable.

There have been numerous recent studies focused on the broad challenge of building safe and reliable AI systems
(Amodei et al. 2016; Russell, Dewey, and Tegmark 2015; Saria and Subbaswamy 2019; Thomas et al. 2019). Here, we examine the particular problem of identifying and mitigating the impacts of undesirable side effects of an agent's actions when operating in the open world. We do not consider system failures or negative side effects that result from intentional adversarial attacks on the system (Biggio and Roli 2018; Cao et al. 2019).

[Figure 1: Negative side effects of an agent's behavior. The agent acts based on its model of the environment; its actions produce intended effects as well as (negative) side effects, and its observations inform the choice of the next action.]
Negative side effects are undesired effects of an agent's actions that occur in addition to the agent's intended effects when operating in the open world (Figure 1). Negative side effects occur because the agent's model and objective function focus on some aspects of the environment, but its operation could impact additional aspects of the environment. Overcoming negative side effects is an emerging area that is attracting increased attention within the AI community (Amodei et al. 2016; Hadfield-Menell et al. 2017; Hibbard 2012; Krakovna et al. 2019; Russell 2017; Turner, Hadfield-Menell, and Tadepalli 2020; Saisubramanian, Kamar, and Zilberstein 2020; Shah et al. 2019; Zhang, Durfee, and Singh 2018). The problem of negative side effects in AI is related to the value alignment problem, which examines the unsafe behavior of an agent when its objective is unintentionally misspecified and is not aligned with human values (Hadfield-Menell et al. 2016; Russell 2017). While misaligned values may lead to negative side effects, the inverse is not necessarily true. That is, negative side effects can occur even in settings where the agent optimizes legitimate objectives that align with the user's goals, due to incomplete knowledge and distributional shift. For example, while driving in Boston, AVs that are programmed to not run into obstacles were stopped by the local breed of unflappable seagulls standing on the street (Coren 2018). Not running into obstacles is well aligned with the users' intentions and objectives, but there are side effects because the agent lacks the knowledge that it can edge forward to startle the birds and then continue driving. In fact, this knowledge was later added to the system to resolve this problem.

Design decisions that may be innocuous during initial testing may have a significant impact when a system is widely deployed. Certainly, some negative side effects could be anticipated or detected during system development, and appropriate mechanisms to mitigate their impacts could be implemented prior to deployment. We focus in this article on negative side effects that are discovered when the system is deployed, due to a variety of factors such as unanticipated domain characteristics, cultural differences between the target users and the development team, or unanticipated consequences of a system or software upgrade. For example, the issue of a Roomba locking into the hair of a person lying on the floor emerged only after the system was deployed in Asia.

The severity of negative side effects may range from mild to safety-critical failures. Often, the discussions around the risk of encountering negative side effects have highlighted catastrophic events. While these discussions are critical and essential, AI systems in general are carefully designed and tested for such failures before deployment. With the increasing growth in the capabilities and deployment of AI systems, it is equally important to address the negative side effects that are not catastrophic, but have significant impacts. Such side effects occur more frequently but are often overlooked, particularly when the only remedy available is to remove the product and develop a new version that can avoid the undesired behavior.
Hence, providing end users the tools to identify and mitigate the impacts of negative side effects is critical in shaping how users view, interact with, collaborate with, and trust AI systems.

The rest of this article identifies key characteristics of negative side effects, highlights the challenges in overcoming negative side effects, and discusses the recent research progress in this area. To promote a better understanding of the prevalence of negative side effects and to provide common test cases for the research community, we have created a public repository that allows AI researchers to report new cases. We conclude the article with a discussion of open questions to encourage future research in this area.
Taxonomy of Negative Side Effects
We introduce a taxonomy of negative side effects, outlined in Table 1. Understanding the characteristics of negative side effects helps design better solution approaches to detect and mitigate their impacts in deployed systems.

Table 1: Taxonomy of negative side effects.
Property      | Property Values
Severity      | Ranges from mild to safety-critical
Reversibility | Reversible or irreversible
Avoidability  | Avoidable or unavoidable
Frequency     | Common or rare
Stochasticity | Deterministic or probabilistic
Observability | Full, partial, or unobserved
Exclusivity   | Prevents task completion or not
Severity:
The severity of negative side effects ranges from mild side effects that can be largely ignored to safety-critical failures that require suspension of the deployment of the system. Safety-critical side effects are typically addressed by redesigning the model and hence require extensive evaluation before redeployment. An example of a safety-critical side effect is an autonomous vehicle failing to detect a construction worker's hand gestures (Crane, Logue, and Pilz 2017). We conjecture that many negative side effects lie in the middle, with significant impacts that require attention, but not sufficiently critical to suspend the service. An autonomous vehicle that does not slow down when going through shallow puddles can cause significant impacts, but those are unlikely to be considered sufficiently critical to roll back its deployment, particularly if mechanisms are provided to mitigate the negative impacts. Addressing such side effects without suspension of service requires agent adaptation and online planning.
Reversibility:
Side effects are reversible if the impact can be reversed or negated, either by the agent causing it or via external intervention. For example, breaking a vase is an irreversible side effect, regardless of the agent's skills (Amodei et al. 2016). Side effects such as leaving marks on a wall can be fixed by repainting it, but the agent may require external assistance to achieve that. Prior works have proposed solutions mainly to address irreversible negative side effects. Some of these techniques can handle reversible negative side effects with minimal or no changes.
Avoidability:
In some problems, it may be impossible to avoid the negative side effects during the course of the agent's operation to complete its assigned task, thereby introducing a trade-off between performing its assigned task and avoiding the side effects. For example, the side effects of driving through puddles are unavoidable if all roads leading to the destination have puddles. Addressing unavoidable negative side effects requires a principled approach to balance the trade-off between avoiding side effects and optimizing the achievement of the assigned task.
Frequency:
The frequency of occurrence of negative side effects depends on the environmental conditions and the action plan. Certain negative side effects may occur rarely, considering all use cases, but may occur frequently for a small subset of cases. A robot pushing a box over a rug may dirty it as a negative side effect. This is an example of a frequently occurring negative side effect when the domain of operation is largely covered with a rug. Much of the existing literature focuses on frequently occurring side effects since they are easy to identify and important to address. The frequency of occurrence, however, could impact the approach to identify negative side effects and the corresponding mitigation approach.
Stochasticity:
The occurrence of negative side effects may be deterministic or probabilistic. Deterministic side effects always occur when some action preconditions arise in the open world. Side effects are probabilistic when their occurrence is not certain even when the right preconditions arise. For example, there may be a small probability that a robot accidentally slides and scratches the wall while pushing a box, but that undesired effect may happen only 20% of the times the robot slips. Solution approaches designed to handle deterministic side effects can often be extended to stochastic settings, and vice versa, with little or no modifications.
Observability:
The agent's observability of the conditions that may trigger a negative side effect, or of the actual negative effects on the environment, is generally determined by the agent's state representation and sensory input. Negative side effects may be fully observable, partially observable, or even unobserved by the agent. Observing a side effect is different from identifying or recognizing the impact as a side effect. For example, the agent may observe the scratch it made on the wall but may not recognize that it is undesirable, and as a result may not try to avoid it. Observability is a critical factor when learning to avoid negative side effects. When an external authority provides feedback to the agent, it may be sufficient for the agent to observe the conditions that trigger the negative side effect. However, when an agent needs to identify negative side effects on its own, it needs more complex general knowledge about the open world.
Exclusivity:
Negative side effects may prevent the agent from completing its assigned task. This category is relatively easy to identify. Often, however, the side effects negatively impact the environment without preventing the agent from completing its assigned task. Such side effects are typically difficult to identify at design time. Much of the current research on avoiding negative side effects focuses on side effects that do not prevent the agent from completing its primary task.
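As a concrete illustration, the taxonomy above can serve as a small annotation schema for observed side effects, for example when reporting a case to a repository. The following sketch in Python uses illustrative names of our own choosing, not part of any surveyed framework:

```python
from dataclasses import dataclass
from enum import Enum

class Severity(Enum):
    MILD = "mild"
    SIGNIFICANT = "significant"
    SAFETY_CRITICAL = "safety-critical"

class Observability(Enum):
    FULL = "full"
    PARTIAL = "partial"
    UNOBSERVED = "unobserved"

@dataclass
class SideEffectReport:
    """One reported negative side effect, annotated along the dimensions of Table 1."""
    description: str
    severity: Severity
    reversible: bool          # can the impact be undone, by the agent or externally?
    avoidable: bool           # can the agent complete its task while avoiding it?
    frequent: bool            # common vs. rare under typical operating conditions
    deterministic: bool       # deterministic vs. probabilistic occurrence
    observability: Observability
    prevents_task: bool       # exclusivity: does it block the assigned task?

# Example: the water-splashing autonomous vehicle from the introduction.
splash = SideEffectReport(
    description="AV splashes water on pedestrians when driving through puddles",
    severity=Severity.SIGNIFICANT,
    reversible=False,
    avoidable=True,
    frequent=True,
    deterministic=False,
    observability=Observability.PARTIAL,
    prevents_task=False,
)
```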
Challenges in Avoiding Negative Side Effects
The challenges in avoiding negative side effects broadly stem from the difficulty in obtaining knowledge about side effects a priori, gathering user preferences to understand their tolerance for side effects, and balancing the potential trade-off between completing the task and avoiding the side effects.
Model imprecision
Agents designed to operate in the open world are either trained in a simulator or operate based on models created by a designer or generated automatically using data. Regardless of how much effort goes into the system design and how much data is available for training and testing, it is generally infeasible to obtain a perfect description of the environment. Practical challenges in model specification, such as the qualification and ramification problems, and computational complexity considerations often cause the agent to reason based on models that do not represent all the relevant details in the open world (Dietterich 2017). Simulators also suffer from this drawback, as they are also built by designers, resulting in mismatches between a simulator and the actual environment (Ramakrishnan et al. 2019). As a result of reasoning with incomplete information, agents may not consistently behave as intended, leading to unexpected and costly errors, or may completely fail in complex settings.

There are three key reasons why the agent may not have prior knowledge about the negative side effects of its actions. First, identifying negative side effects a priori is inherently challenging. As a result, this information is often lacking in the agent's model. Second, many AI systems are deployed in a variety of settings, which may be different from the environment used in training and testing of the agent. This distributional shift may cause negative side effects and is difficult to assess during the design process. Third, negative side effects in many settings arise due to user preference violations. It is generally difficult to precisely learn or encode human preferences and account for individual or cultural differences.

Techniques such as updating the model after deployment to minimize side effects and building more realistic simulators (Dosovitskiy et al. 2017) are some of the promising directions for handling side effects due to model imprecision.
Feedback collection
An agent that is unaware of the side effects of its actions can gather this information through various feedback mechanisms from users or through autonomous exploration and model revisions. Though learning from feedback produces good results in many problems (Lakkaraju et al. 2017; Ramakrishnan et al. 2019; Saisubramanian, Kamar, and Zilberstein 2020; Zhang, Durfee, and Singh 2018; Zhang, Durfee, and Singh 2020; Basich et al. 2020), there are three main challenges in employing this approach in real-world systems. First, the learning process may not be sample efficient, or may require feedback in a certain format, such as correcting the agent's policy by providing alternate actions for execution, to be sample efficient. Feedback collection in general is an expensive process, particularly when the feedback format requires constant human oversight or imposes significant cognitive overload on the user. Second, the feedback may be biased or delayed or both, which in turn affects the agent's learning process. Finally, it is generally assumed that the agent uses human-interpretable representations for querying and feedback collection, but there may be mismatches between the models of the agent and the human. There are some recent efforts towards addressing the problem of sample efficiency in learning (Buckman et al. 2018; Wang et al. 2016) and investigating the impact of bias in feedback on agent learning (Ramakrishnan et al. 2018; Saisubramanian, Kamar, and Zilberstein 2020). Identifying and evaluating human-interpretable state-action representations for querying humans is largely an open problem.
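To make the feedback-driven setting concrete, the sketch below fits a simple predictor of negative side effect penalties from binary human judgments on state-action features. It is only an illustrative stand-in, assuming a generic feature encoding and a logistic model; it is not the learning method of any particular surveyed approach.

```python
import numpy as np

def learn_nse_penalty(features, labels, penalty_scale=5.0, lr=0.1, epochs=500):
    """Fit a logistic model mapping state-action features to a predicted
    negative-side-effect penalty, from binary human feedback.

    features: (n, d) array of state-action features for queried executions
    labels:   (n,) array, 1 if the human flagged a negative side effect, else 0
    Returns a function mapping a feature vector to a penalty in [0, penalty_scale].
    """
    X = np.asarray(features, dtype=float)
    y = np.asarray(labels, dtype=float)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted probability of an NSE
        grad_w = X.T @ (p - y) / len(y)          # gradient of the log-loss
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b

    def penalty(x):
        prob = 1.0 / (1.0 + np.exp(-(np.asarray(x, float) @ w + b)))
        return penalty_scale * prob              # scale the probability into a reward penalty
    return penalty

# Toy usage: feature = [on_rug, pushing_box]; feedback flags pushing a box over the rug.
X = [[1, 1], [0, 1], [1, 0], [0, 0], [1, 1], [0, 1]]
y = [1, 0, 0, 0, 1, 0]
penalty_fn = learn_nse_penalty(X, y)
print(round(penalty_fn([1, 1]), 2), round(penalty_fn([0, 1]), 2))
```

A learned penalty of this kind can then be folded into the agent's reward model, in the spirit of the model-update approaches discussed in the next section.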
Managing side-effect tradeoffs
When negative side effects are unavoidable and interfere with the performance of the agent's assigned task, there is a trade-off between completing the task efficiently and avoiding the negative side effects. In an extreme case, it may be impossible for the agent to achieve its goal without creating negative side effects. How far should an agent deviate from its optimal plan in order to minimize the impacts of negative side effects? Balancing this trade-off requires user feedback since it depends on their tolerance for negative side effects. This can be challenging when the agent's objective and the side effects are measured in different units.
Approaches to Mitigate Negative Side Effects
This section reviews the emerging approaches to mitigating the impacts of negative side effects. Table 2 summarizes the characteristics of side effects handled by each of the methods we mention.

Table 2: Summary of the characteristics of the surveyed approaches to mitigating negative side effects. "-" indicates that the approach is indifferent to the values of that property. Although some existing works do not explicitly refer to the severity of the side effects they can effectively handle, in general these approaches target side effects that are undesirable and significant, but not safety-critical.

Approach                      | Severity            | Reversibility | Avoidability | Frequency | Stochasticity | Observability | Exclusivity
Hadfield-Menell et al. (2017) | -                   | irreversible  | -            | frequent  | deterministic | -             | -
Zhang et al. (2018)           | -                   | irreversible  | avoidable    | -         | deterministic | observable    | non-interfering
Krakovna et al. (2019)        | -                   | -             | avoidable    | -         | -             | observable    | non-interfering
Shah et al. (2019)            | -                   | irreversible  | avoidable    | frequent  | deterministic | observable    | non-interfering
Zhang et al. (2020)           | -                   | irreversible  | -            | -         | deterministic | observable    | -
Turner et al. (2020)          | -                   | irreversible  | avoidable    | frequent  | deterministic | -             | non-interfering
Saisubramanian et al. (2020)  | not safety-critical | irreversible  | -            | frequent  | deterministic | -             | non-interfering
Model and policy update
The occurrence of negative side effects in a system depends on the agent's trajectory, which is determined by its policy, derived using its reasoning model. Hence, a natural approach to mitigate negative side effects is to update the model such that the agent's policy avoids negative side effects as much as possible. When the side effects are severe, causing safety-critical failures, the model update may include significant changes such as a redesign of the reward function. Hadfield-Menell et al. (2017) address such a setting where the negative side effects occur due to unintentional misspecification of rewards by the designer. The agent is assumed to be aware of a possible reward misspecification. The agent learns the true reward function by treating the designed reward function as an indicator of the intended reward, and infers the reward using inverse reinforcement learning techniques. As acknowledged by the authors, this approach may not be scalable to large, complex settings.

Redesigning the reward function may degrade the agent's performance with respect to its assigned task and hence requires exhaustive evaluation before redeploying the agent. This could be very expensive and would likely require suspension of operation until the newly derived policies could be deemed safe for autonomous operation. In problem domains where the side effects are undesirable but not safety-critical, the impact can be minimized by augmenting the agent's model with a penalty function corresponding to negative side effects. This exploits the reliability of the existing model with respect to the agent's assigned task, while allowing a deployed agent to adjust its behavior to minimize the side effects.

In related work (Saisubramanian, Kamar, and Zilberstein 2020), we describe a multi-objective formulation of this problem with a lexicographic ordering of objectives that prioritizes optimizing the agent's assigned task (primary objective) over minimizing negative side effects (secondary objective). A slack value on the primary objective determines the maximum allowed deviation from the optimal expected reward of the primary objective so as to minimize side effects. This work considers a setting in which the agent has no prior knowledge about the side effects of its actions, which may be unobservable by the agent. Information about the negative side effects is gathered using feedback, which is then encoded by a reward function. The agent may not be able to observe the NSE except for the penalty, which is proportional to the severity of the NSE provided by the feedback mechanism. The model is updated with this learned reward function, and an updated policy is computed that avoids negative side effects as much as possible, within the allowed slack. This formulation can hence handle both avoidable and unavoidable negative side effects. However, this approach is not suitable for safety-critical consequences since it prioritizes optimizing the achievement of the assigned task.

Both of these approaches address the side effects associated with the execution of an action, independent of its outcome.
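The sketch below illustrates how such a slack-based lexicographic scheme might look for a small tabular MDP. The inputs P, R_task, R_nse, gamma, and slack, as well as the per-state application of the slack, are simplifying assumptions for illustration; this is not the algorithm of the cited work.

```python
import numpy as np

def lexicographic_policy(P, R_task, R_nse, gamma=0.95, slack=0.1, iters=500):
    """Two-level lexicographic planning sketch.
    P:      (S, A, S) transition probabilities
    R_task: (S, A) reward for the assigned task (primary objective)
    R_nse:  (S, A) nonnegative penalty for negative side effects (secondary objective)
    slack:  maximum allowed loss in task Q-value at each state
    Returns one action per state that minimizes expected NSE penalty
    among actions that stay within `slack` of the optimal task Q-values.
    """
    S, A, _ = P.shape

    # 1) Optimal value of the primary (task) objective via value iteration.
    V = np.zeros(S)
    for _ in range(iters):
        Q = R_task + gamma * P @ V            # (S, A)
        V = Q.max(axis=1)

    # 2) Actions allowed at each state: near-optimal for the task.
    Q_task = R_task + gamma * P @ V
    allowed = Q_task >= (Q_task.max(axis=1, keepdims=True) - slack)

    # 3) Minimize the expected discounted NSE penalty over the allowed actions.
    C = np.zeros(S)                            # NSE cost-to-go
    for _ in range(iters):
        Q_nse = np.where(allowed, R_nse + gamma * P @ C, np.inf)
        C = Q_nse.min(axis=1)

    Q_nse = np.where(allowed, R_nse + gamma * P @ C, np.inf)
    return Q_nse.argmin(axis=1)
```

Increasing the slack makes the agent more willing to sacrifice task reward in order to reduce the expected side effect penalty, which is precisely the trade-off discussed earlier.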
Constrained optimization
Negative side effects occur when some features of the environment are altered that were not expected or intended to be altered. One way to address this is by constraining the features that can be altered by the agent. Zhang, Durfee, and Singh (2018) consider a factored state representation and characterize negative side effects as changes in the features of the environment that may negatively surprise a human observer. This approach assumes that the agent's model includes the uncertainty over the desirability of altering a feature, and considers deterministic side effects that are irreversible, but avoidable. A policy is first computed to reach the goal assuming all the uncertain features are "locked" for alteration. If a policy exists, then the agent executes it. If no policy exists, the agent queries the human to determine which features can be altered and recomputes a policy. A regret minimization approach is used to select the top-k features for querying. Recently, the authors extended this approach to identify if the negative side effects are unavoidable by casting it as a set-cover problem (Zhang, Durfee, and Singh 2020). If the side effects are unavoidable, the agent ceases operation. Therefore, these approaches are not suitable for settings where the agent is expected to alleviate (unavoidable) negative side effects to the extent possible, while completing its assigned task.
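The sketch below captures the general "lock uncertain features, plan, then query and replan" pattern described above. The planner and query interfaces are assumptions for illustration, and the simple batch selection is only a stand-in for the minimax-regret criterion of the actual approach.

```python
def plan_with_locked_features(plan, uncertain_features, ask_human, k=3):
    """Generic sketch of constrained planning with feature-alteration queries.

    plan(locked): returns a policy that never alters features in `locked`,
                  or None if no such policy reaches the goal (assumed interface).
    ask_human(batch): returns the subset of `batch` the user permits altering.
    """
    locked = set(uncertain_features)          # conservatively lock all uncertain features
    policy = plan(locked)
    if policy is not None:
        return policy                         # goal reachable without altering them

    queried = set()
    while locked - queried:
        batch = sorted(locked - queried)[:k]  # stand-in for a regret-based selection
        queried |= set(batch)
        permitted = ask_human(batch)
        locked -= set(permitted)
        policy = plan(locked)                 # replan with the relaxed constraints
        if policy is not None:
            return policy
    return None                               # side effects unavoidable; defer to the user
```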
Minimizing deviations from a baseline
Another class of solution methods defines a penalty function for negative side effects as a measure of deviation from a baseline state, based on the features altered. The deviation measure reflects the degree of disruption to the environment caused by the agent's actions. The agent is expected to minimize the disruption while pursuing its goal, thereby mitigating negative side effects. A multi-objective formulation with scalarization has been considered. The agent's sensitivity to negative side effects can be adjusted by appropriately tuning the weights used for scalarization.

Different candidates for baseline states have been proposed, such as the start state and inaction in a state, along with reachability-based metrics to measure the deviation (Krakovna et al. 2019; Shah et al. 2019). The resulting performance is sensitive to the metric used to calculate deviations, particularly the choice of baseline state. The relative reachability approach (Krakovna et al. 2019) is not straightforward to apply in settings more complex than grid-worlds, as acknowledged by the authors. Attainable utility (Turner, Hadfield-Menell, and Tadepalli 2020) measures the impact of side effects as the shifts in the agent's ability to optimize for auxiliary objectives, generalizing the relative reachability measure. These approaches assume that the agent's state representation is sufficient to calculate the deviations and are therefore not directly applicable to settings with mismatches between the agent's state representation and the environment.
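To make the baseline idea concrete, the sketch below scalarizes a task reward with a penalty proportional to a naive deviation measure: a count of features altered relative to a baseline state. The feature-count deviation is purely illustrative; reachability-based measures such as relative reachability and attainable utility are more refined versions of this idea.

```python
def deviation(state, baseline):
    """Number of tracked environment features the agent has altered relative to
    the baseline (e.g., the start state or the result of an 'inaction' rollout)."""
    return sum(1 for f in baseline if state.get(f) != baseline[f])

def shaped_reward(task_reward, state, baseline, beta=2.0):
    """Scalarized objective: task reward minus a weighted deviation penalty.
    beta tunes the agent's sensitivity to side effects."""
    return task_reward - beta * deviation(state, baseline)

# Toy usage: reaching the goal while dirtying the rug is penalized.
baseline = {"rug_clean": True, "vase_intact": True}
after    = {"rug_clean": False, "vase_intact": True, "box_at_goal": True}
print(shaped_reward(task_reward=10.0, state=after, baseline=baseline))  # 10 - 2*1 = 8.0
```

As noted above, the behavior induced by such penalties is highly sensitive to the choice of baseline state and deviation measure.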
A Repository of Negative Side Effects
Since the problem of negative side effects is an emerging topic, current research relies on proof-of-concept toy domains for performance evaluation. Moving forward, understanding the occurrence of negative side effects in deployed AI systems is necessary for a realistic formulation of the problem and to design effective solution approaches to address it. To that end, we have created a repository of negative side effects identified in deployed AI systems (http://groups.cs.umass.edu/nse/). This publicly available repository is shown in Figure 2.

[Figure 2: A public repository of negative side effects.]

It contains real-world instances from scientific reports or news articles, identified by us. For each instance, details such as the problem setting in which the negative side effects were observed, a description of the side effects, and the location and date of the incident are provided. We believe this repository will promote a deeper understanding of the problem, provide insights about which assumptions are valid, and facilitate moving beyond simple grid-world type domains as common test cases to evaluate techniques.

We invite readers to contribute to this repository by reporting cases of negative side effects of deployed AI systems, based on user experiences, published papers, or media reports, using an online form we provide (https://forms.gle/5MLZ7XMc9FzbDaoW7). Each submission will be reviewed by our team before adding it to the repository.

Open Questions and Future Work
We discuss below some key open questions and research directions that can further the understanding of negative side effects and strategies to mitigate their impacts.
Negative side effects in multi-agent settings:
Existing works have studied the negative side effects of a single agent's actions on the environment. In collaborative multi-agent systems, the agents work together to optimize performance and may have complementary skills. For example, the negative side effects produced by one agent may be reversible by another agent.
How can we leverage collaborative multi-agent settings to effectively mitigate negative side effects?
One solution approach is to devise a joint policy to mitigate the negative side effects, in addition to optimizing the utility of the assigned task. The existing rich body of work on cooperative multi-agent systems examines how the intended effects of each agent's actions may affect the other agents when devising a joint policy that maximizes the performance (Pynadath and Tambe 2002; Goldman and Zilberstein 2003; Zhang and Lesser 2007; Ramakrishnan et al. 2019). Extending such frameworks to handle the side effects problem requires knowledge about the negative side effects of each agent's actions and how they affect the behavior and rewards of other agents in the environment. External feedback may indicate the occurrence of negative side effects as a result of a joint action of the agents. Effectively mitigating the side effects requires mechanism design for precise identification of the agent whose actions produce these undesirable effects, based on the feedback provided for joint actions.
Addressing side effects in partially observable settings:
In partially observable settings, an agent operates based on a belief distribution over the states. The problem is further complicated when the agent has no prior knowledge of the side effects, which may be partially observable or unobserved.
How can an agent effectively learn to avoid negative side effects in partially observable settings?
Due to partial observability, the agent maps the external feedback indicating the occurrence of negative side effects to a belief distribution and not to an exact state. As a result, a belief distribution may be associated with multiple conflicting feedback signals. Depending on how the feedback signals are aggregated, different types of agent behavior emerge, with varying sensitivity to negative side effects.
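One simple aggregation rule, sketched below under the assumption of a discrete belief state, is to spread each penalty signal over the states in the current belief in proportion to their probabilities; alternative rules, such as assigning the penalty only to the most likely state, would yield different sensitivities to side effects.

```python
from collections import defaultdict

def attribute_feedback(belief, penalty, credit):
    """Distribute one NSE penalty signal over a discrete belief.
    belief:  dict mapping state -> probability (sums to 1)
    penalty: scalar penalty reported by the external feedback
    credit:  running dict of accumulated penalty mass per state (updated in place)
    """
    for state, prob in belief.items():
        credit[state] += prob * penalty
    return credit

# Toy usage: the same feedback under two different beliefs yields different attributions.
credit = defaultdict(float)
attribute_feedback({"near_wall": 0.7, "clear": 0.3}, penalty=1.0, credit=credit)
attribute_feedback({"near_wall": 0.2, "clear": 0.8}, penalty=1.0, credit=credit)
print(dict(credit))  # {'near_wall': 0.9, 'clear': 1.1}
```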
Expanding agent state representation:
An agent's state representation may only include features necessary to perform its assigned task. This limitation in state representation potentially affects the process of learning to avoid negative side effects. How to design models with sufficiently expressive state representations?
Building more realistic simulators for training agents (Dosovitskiy et al. 2017) and updating the agent's model, including expanding its state representation based on raw sensor data, are promising directions to overcome this problem.
Human-agent collaboration:
Human-agent collaboration is especially useful when the negative side effects affect the agent's ability to complete its task, when limited state representation affects the agent's ability to learn to avoid side effects, or when the side effects are severe. Active user involvement beyond providing feedback, such as taking over control from the agent (Ramakrishnan et al. 2019; Zilberstein 2015) or modifying the environment (Randløv 2000) to facilitate agent learning and operation, may help avoid negative side effects.
When to ask the user for help, without excessively relying on them?
There are some recent efforts in identifying when to transfer control to a human, when the agent is incapable of completing its task (Zilberstein 2015), or to overcome the blind spots of the system (Ramakrishnan et al. 2019).
Combination of side effects:
Many of the recent AI systems, such as autonomous vehicles, are comprised of multiple entities that function together to achieve a goal. Each of these entities may contribute to different forms of negative side effects. It is likely that multiple forms of negative side effects, with varying impacts and severity, co-exist and require different solution techniques to mitigate the overall impact.
How to ensure that approaches designed to eliminate one form of side effect do not introduce new risks?
This problem is related to avoiding negative side effects in collaborative multi-agent settings, since each component can be treated as an agent collaborating with other agents. Reasoning about multiple forms of risks together is a cornerstone in achieving safe AI systems. One approach is to evaluate the effects of an impact regularizer on other modules in the system that interact with the module of interest. This requires broad background knowledge about the architecture and functionality of each component, which may not be available in systems with black-box components.
Skill discovery to mitigate negative side effects:
Skill discovery (Eysenbach et al. 2018; Konidaris and Barto 2009) in reinforcement learning allows an agent to discover useful new skills autonomously. High-level skills or options are temporally extended courses of action that generalize the primitive actions of an agent. These closed-loop policies speed up planning and learning in complex environments and are generally used in hierarchical methods for reasoning. Exploring the feasibility of skill discovery for avoiding negative side effects is an interesting direction that could accelerate agent behavior adaptation, especially to avoid side effects during agent exploration. For example, if the agent learns to push a box without scratching the walls or dirtying the rug, this option is useful in a variety of related settings and enables faster behavior adaptation.

Related problems:
This article has discussed the undesirable side effects in the context of safety and control in autonomous systems. AI systems may suffer from other factors that affect their reliability, such as biases and privacy concerns. Amplifying underlying biases in a system or increased vulnerability to attacks may occur when the system optimizes incorrect or incompletely specified objectives, which can be treated as serious side effects that require an entire model redesign. There are growing efforts in the machine learning community to address many forms of biases and to improve the security for safeguarding against adversarial attacks (Kurakin, Goodfellow, and Bengio 2016; Barocas et al. 2017; Gleave et al. 2019; Peng et al. 2019).
Conclusion
This article examines the concept of negative side effects of AI systems and offers a comprehensive overview of recent research efforts to address the challenges presented by side effects. In doing so, we aim to advance the general understanding of this nascent but rapidly evolving area. We present a taxonomy of negative side effects, discuss the key challenges in avoiding side effects, and summarize the current literature on this topic. We also present potential future research directions that are aimed at deepening the understanding of the problem. While some of these issues can be addressed using problem-specific or ad hoc solutions, developing general techniques to identify and mitigate negative side effects will facilitate the design and deployment of more robust and trustworthy AI systems.
Acknowledgments
This work was supported in part by the Semiconductor Research Corporation (SRC).
References

[Amodei et al. 2016] Amodei, D.; Olah, C.; Steinhardt, J.; Christiano, P.; Schulman, J.; and Mané, D. 2016. Concrete problems in AI safety. CoRR abs/1606.06565.
[Barocas et al. 2017] Barocas, S.; Crawford, K.; Shapiro, A.; and Wallach, H. 2017. The problem with bias: Allocative versus representational harms in machine learning.
[Basich et al. 2020] Basich, C.; Svegliato, J.; Wray, K. H.; Witwicki, S. J.; Biswas, J.; and Zilberstein, S. 2020. Learning to optimize autonomy in competence-aware systems. In Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS).
[Bennett and Hauser 2013] Bennett, C. C., and Hauser, K. 2013. Artificial intelligence framework for simulating clinical decision-making: A Markov decision process approach. Artificial Intelligence in Medicine.
[Biggio and Roli 2018] Biggio, B., and Roli, F. 2018. Wild patterns: Ten years after the rise of adversarial machine learning. Pattern Recognition.
[Buckman et al. 2018] Buckman, J.; Hafner, D.; Tucker, G.; Brevdo, E.; and Lee, H. 2018. Sample-efficient reinforcement learning with stochastic ensemble value expansion. In Advances in Neural Information Processing Systems, 8224–8234.
[Cao et al. 2019] Cao, Y.; Xiao, C.; Cyr, B.; Zhou, Y.; Park, W.; Rampazzi, S.; Chen, Q. A.; Fu, K.; and Mao, Z. M. 2019. Adversarial sensor attack on LiDAR-based perception in autonomous driving. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, 2267–2281.
[Coren 2018] Coren, M. J. 2018. All the things that still baffle self-driving cars, starting with seagulls. Quartz, September 23.
[Crane, Logue, and Pilz 2017] Crane, D. A.; Logue, K. D.; and Pilz, B. C. 2017. A survey of legal issues arising from the deployment of autonomous and connected vehicles. Michigan Telecommunications and Technology Law Review.
[Dietterich 2017] Dietterich, T. G. 2017. Steps toward robust artificial intelligence. AI Magazine.
[Dosovitskiy et al. 2017] Dosovitskiy, A.; Ros, G.; Codevilla, F.; Lopez, A.; and Koltun, V. 2017. CARLA: An open urban driving simulator. In Conference on Robot Learning, 1–16.
[Eysenbach et al. 2018] Eysenbach, B.; Gupta, A.; Ibarz, J.; and Levine, S. 2018. Diversity is all you need: Learning skills without a reward function. In International Conference on Learning Representations.
[Folsom-Kovarik, Sukthankar, and Schatz 2013] Folsom-Kovarik, J. T.; Sukthankar, G.; and Schatz, S. 2013. Tractable POMDP representations for intelligent tutoring systems. ACM Transactions on Intelligent Systems and Technology (TIST).
[Gleave et al. 2019] Gleave, A.; Dennis, M.; Wild, C.; Kant, N.; Levine, S.; and Russell, S. 2019. Adversarial policies: Attacking deep reinforcement learning. In International Conference on Learning Representations.
[Goldman and Zilberstein 2003] Goldman, C. V., and Zilberstein, S. 2003. Optimizing information exchange in cooperative multi-agent systems. In Proceedings of the 2nd International Conference on Autonomous Agents and Multi Agent Systems (AAMAS).
[Hadfield-Menell et al. 2016] Hadfield-Menell, D.; Russell, S. J.; Abbeel, P.; and Dragan, A. 2016. Cooperative inverse reinforcement learning. In Advances in Neural Information Processing Systems, 3909–3917.
[Hadfield-Menell et al. 2017] Hadfield-Menell, D.; Milli, S.; Abbeel, P.; Russell, S. J.; and Dragan, A. 2017. Inverse reward design. In Advances in Neural Information Processing Systems.
[Hibbard 2012] Hibbard, B. 2012. Avoiding unintended AI behaviors. In International Conference on Artificial General Intelligence, 107–116. Springer.
[Insurance Institute for Highway Safety 2018] Insurance Institute for Highway Safety. 2018. Reality check: Research, deadly crashes show need for caution on road to full autonomy. Status Report Newsletter.
[Konidaris and Barto 2009] Konidaris, G., and Barto, A. 2009. Skill discovery in continuous reinforcement learning domains using skill chaining. In Advances in Neural Information Processing Systems, 1015–1023.
[Krakovna et al. 2019] Krakovna, V.; Orseau, L.; Martic, M.; and Legg, S. 2019. Penalizing side effects using stepwise relative reachability. In AI Safety Workshop, IJCAI.
[Kurakin, Goodfellow, and Bengio 2016] Kurakin, A.; Goodfellow, I.; and Bengio, S. 2016. Adversarial examples in the physical world. CoRR abs/1607.02533.
[Lakkaraju et al. 2017] Lakkaraju, H.; Kamar, E.; Caruana, R.; and Horvitz, E. 2017. Identifying unknown unknowns in the open world: Representations and policies for guided exploration. In Proceedings of the 31st AAAI Conference on Artificial Intelligence.
[McCurry 2015] McCurry, J. 2015. South Korean woman's hair 'eaten' by robot vacuum cleaner as she slept. The Guardian, February 8.
[Peng et al. 2019] Peng, A.; Nushi, B.; Kıcıman, E.; Inkpen, K.; Suri, S.; and Kamar, E. 2019. What you see is what you get? The impact of representation criteria on human bias in hiring. In Proceedings of the AAAI Conference on Human Computation and Crowdsourcing, volume 7, 125–134.
[Pynadath and Tambe 2002] Pynadath, D. V., and Tambe, M. 2002. Multiagent teamwork: Analyzing the optimality and complexity of key theories and models. In Proceedings of the 1st International Conference on Autonomous Agents and Multi Agent Systems (AAMAS).
[Ramakrishnan et al. 2018] Ramakrishnan, R.; Kamar, E.; Dey, D.; Shah, J.; and Horvitz, E. 2018. Discovering blind spots in reinforcement learning. In Proceedings of the 17th International Conference on Autonomous Agents and Multi-Agent Systems.
[Ramakrishnan et al. 2019] Ramakrishnan, R.; Kamar, E.; Nushi, B.; Dey, D.; Shah, J.; and Horvitz, E. 2019. Overcoming blind spots in the real world: Leveraging complementary abilities for joint execution. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence.
[Randløv 2000] Randløv, J. 2000. Shaping in reinforcement learning by changing the physics of the problem. In Proceedings of the 17th International Conference on Machine Learning, 767–774.
[Russell, Dewey, and Tegmark 2015] Russell, S.; Dewey, D.; and Tegmark, M. 2015. Research priorities for robust and beneficial artificial intelligence. AI Magazine.
[Russell 2017] Russell, S. 2017. Provably beneficial artificial intelligence. In Exponential Life, The Next Step.
[Saisubramanian, Kamar, and Zilberstein 2020] Saisubramanian, S.; Kamar, E.; and Zilberstein, S. 2020. A multi-objective approach to mitigate negative side effects. In Proceedings of the 29th International Joint Conference on Artificial Intelligence.
[Saisubramanian 2020] Saisubramanian, S. 2020. Negative Side Effects Form. https://forms.gle/5MLZ7XMc9FzbDaoW7. Accessed: 2020-08-12.
[Saria and Subbaswamy 2019] Saria, S., and Subbaswamy, A. 2019. Tutorial: Safe and reliable machine learning. CoRR abs/1904.07204.
[Shah et al. 2019] Shah, R.; Krasheninnikov, D.; Alexander, J.; Abbeel, P.; and Dragan, A. 2019. Preferences implicit in the state of the world. In Proceedings of the 7th International Conference on Learning Representations (ICLR).
[Sim 2019] Sim, I. 2019. Mobile devices and health. New England Journal of Medicine.
[Solon 2016] Solon, O. 2016. Roomba creator responds to reports of 'poopocalypse'. The Guardian.
[Thomas et al. 2019] Thomas, P. S.; Castro da Silva, B.; Barto, A. G.; Giguere, S.; Brun, Y.; and Brunskill, E. 2019. Preventing undesirable behavior of intelligent machines. Science.
[Turner, Hadfield-Menell, and Tadepalli 2020] Turner, A. M.; Hadfield-Menell, D.; and Tadepalli, P. 2020. Conservative agency via attainable utility preservation. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society.
[Wang et al. 2016] Wang, Z.; Bapst, V.; Heess, N.; Mnih, V.; Munos, R.; Kavukcuoglu, K.; and de Freitas, N. 2016. Sample efficient actor-critic with experience replay. CoRR abs/1611.01224.
[Zhang and Lesser 2007] Zhang, X., and Lesser, V. 2007. Meta-level coordination for solving negotiation chains in semi-cooperative multi-agent systems. In Proceedings of the 6th International Conference on Autonomous Agents and Multi Agent Systems (AAMAS).
[Zhang, Durfee, and Singh 2018] Zhang, S.; Durfee, E. H.; and Singh, S. P. 2018. Minimax-regret querying on side effects for safe optimality in factored Markov decision processes. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, 4867–4873.
[Zhang, Durfee, and Singh 2020] Zhang, S.; Durfee, E. H.; and Singh, S. 2020. Querying to find a safe policy under uncertain safety constraints in Markov decision processes. In Proceedings of the 34th AAAI Conference on Artificial Intelligence.
[Zilberstein 2015] Zilberstein, S. 2015. Building strong semi-autonomous systems. In Proceedings of the 29th AAAI Conference on Artificial Intelligence.