Legible Normativity for AI Alignment: The Value of Silly Rules
Dylan Hadfield-Menell,
McKane Andrus,
Gillian K. Hadfield
Department of Electrical Engineering and Computer Science, University of California, Berkeley; Center for Human-Compatible AI; Faculty of Law and Rotman School of Management, University of Toronto; Vector Institute for Artificial Intelligence
[email protected], [email protected], g.hadfi[email protected]
Abstract
It has become commonplace to assert that autonomous agents will have to be built to follow human rules of behavior: social norms and laws. But human laws and norms are complex and culturally varied systems; in many cases agents will have to learn the rules. This requires autonomous agents to have models of how human rule systems work so that they can make reliable predictions about rules. In this paper we contribute to the building of such models by analyzing an overlooked distinction between important rules and what we call silly rules: rules with no discernible direct impact on welfare. We show that silly rules render a normative system both more robust and more adaptable in response to shocks to perceived stability. They make normativity more legible for humans, and can increase legibility for AI systems as well. For AI systems to integrate into human normative systems, we suggest, it may be important for them to have models that include representations of silly rules.
Introduction
As attention to the challenge of aligning artificial intelligence with human welfare has grown, it has become commonplace to assert that autonomous agents will have to be built to follow human norms and laws (Etzioni and Etzioni 2016; Etzioni 2017; IEEE 2018). But this is no easy task. Human groups are thick with rules and norms about behavior, many of which are largely invisible, taken for granted as simply "the way things are done" by participants (Schutz 1964). They are constituted in complex ways through second-order normative beliefs: beliefs about what others believe we should or should not do in some situation (Bicchieri 2006; 2017). Human laws and norms are frequently ambiguous and complicated; they vary widely across jurisdictions, cultures, and groups; they change and adapt. The cases in which they are reducible to formal rule statements are the exception. Even deciding whether a vehicle has violated a numerical speed limit is far from straightforward: was visibility poor? Were there children nearby? Adding to the complexity, rules and norms are enforced both by formal institutions like courts and regulators, through costly and error-prone procedures, and by the informal behavior of agents, through third-party criticism and exclusion or internalization and self-criticism. This means that what actually counts as a rule can easily diverge from announced or formal rules, and that rule-based environments are complex dynamic systems. As a result, we cannot rely on formal rules simply being imposed on agents a priori; instead, agents will in many cases have to learn the rules and how they work in practice. Normativity, the human practice of classifying some actions as sanctionable and others as not and then punishing people who engage in sanctionable conduct, will have to be legible (Dragan and Srinivasa 2013) to AI systems.

In this paper, we introduce a distinction between types of rules that can aid in building predictive models to make human normative systems legible to an AI system. We distinguish between important rules and silly rules. An important rule is one the observance of which by one agent generates direct payoffs for some other agent(s). When an agent complies with rules prohibiting speeding, for example, other agents enjoy a material payoff as a direct consequence, such as a reduced probability of accident. A silly rule, in contrast, is one the observance of which by one agent does not generate any direct material payoff for any other agent. When an agent violates a dress code, for example, such as by failing to wear a head covering in public, no one is materially affected as a direct consequence of the violation. Observers might well be offended, and they might punish the violator, but the violation itself is inconsequential.

We ground our claim that the distinction between silly and important rules will prove important to building models for aligned AI using Monte Carlo simulations. We show that silly rules promote robustness and adaptation in groups. Silly rules perform a legibility function for humans, making it easier for them to read the state of the equilibrium in their group when equilibrium is threatened. Incorporating this insight about silly rules into AI design should allow human normative systems to be more legible to AI.

Our paper is presented as follows. We first illustrate the concept of silly rules with an example drawn from a concrete environment.
We then develop a model of groups, based on (Hadfield and Weingast 2012), in which a group of agents announces a set of rules and relies exclusively on voluntary third-party punishment by group members to police violations. We first show formally that, if silly rules are costless, groups with more silly rules achieve higher payoffs. We then consider the case in which following and punishing silly rules is costly and present the results of our simulations. Our results demonstrate that groups with lots of (sufficiently cheap) silly rules are more robust: they are able to maintain more of their population and are less likely to collapse than groups with fewer silly rules in response to an unfounded shock to beliefs about the proportion of punishers. Groups with lots of silly rules are also more adaptable: they collapse more quickly when there is a true drop in the proportion of punishers below the threshold that makes group membership valuable.

Our contributions are threefold. First, we present a formal model that can account for the presence of silly rules in a normative system and show the conditions under which silly rules are likely to exist. This is a contribution to normative theory in human groups. Second, this work provides an example of the importance of building predictive models of human normative systems qua systems, not merely predicting the presence of particular norms, which is the dominant approach taken in the growing literature on AI ethics and alignment. Third, we demonstrate that standard AI methods can be valuable tools in analyzing human normative systems.

What are Silly Rules? A Thought Experiment from Ethnography
One of the challenges of building models of human normativity is that, as researchers, we are all participants in our taken-for-granted normative environments, and this can make it hard to study norms scientifically (Haidt and Joseph 2008). To attempt to overcome this, we motivate our work with an example drawn from an ethnography of a group that engages in practices far removed from the worlds in which AI researchers live. Moreover, we will use the shocking label "silly rules" in order to illuminate an overlooked distinction in the context of the existing literature on normativity. Most of the social science of norms focuses on functional accounts of particular norms, such as norms of reciprocity, fair sharing of rewards, or non-interference with property. These accounts argue that particular norms evolve because they support human cooperation and thus improve fitness (Boyd and Richerson 2009; Tomasello and Vaish 2013) or solve coordination games (Sugden 1986; McAdams and Nadler 2005; Myerson 2004; McAdams 2015), for example. Our work highlights the systemic functionality of rules that, individually, have no direct functionality. All human societies, we will show, are likely to develop silly rules, for good functional reasons.

Suppose that an AI system were tasked with learning how to make arrows by observing the Awá people of Brazil. The Awá are hunter-gatherers now living in relatively small numbers on reservations established by the Brazilian government. One of the things the AI will observe, like the ethnographers who have studied these people, is that the men of the Awá spend four or more hours a day making and repairing arrows (González-Ruibal, Hernando, and Politis 2011). They are produced in large quantities and need frequent repair. They are between 1.4 and 1.7 meters in length, customized to the height of their owner. Bamboo collected to make the points is sometimes shared, but the arrows themselves are not; they are buried with their owner. The men use only dark, not brightly colored, feathers. All parts of the arrow (shaft, point, and feathers) are smoked over a grill during preparation, and the arrows themselves are kept warm in smoke at all times unless they are bundled and put in storage in the rafters of a hut. Will the AI system reproduce all of these arrow-making behaviors? We can imagine that AI designed with principles of inverse reinforcement learning (Ng, Russell, and others 2000) might discern which behaviors actually contribute to the functionality of the arrows, which is presumably what the human designer intended (Hadfield-Menell et al. 2017). According to the human ethnographers who observed the Awá, many of the arrow-making practices are not functional. Even if smoking the wood used in the shaft of the arrow during manufacture contributes to a harder, straighter arrow, smoking the feathers seems unnecessary, as does ensuring the arrows are kept warm at all times. Moreover, the men make and carry many more arrows than they will use. In one season, a total of 402 arrows were carried on 9 different foraging trips; 9 were used. Most game on these trips was shot with a shotgun (González-Ruibal, Hernando, and Politis 2011).

An AI system that ignored the non-functional arrow-making behaviors, however, would be violating the norms of the Awá people. The arrow-making practices described above are not just practices; they are rules. They reflect normative expectations (Bicchieri 2006).
How do we know? The lack of functionality is one clue: the Awá presumably have also discovered that a cold arrow works and that they spend a lot of time making arrows that go unused and are damaged by being bundled and carried around. But the better evidence comes from how they respond to the only man who makes his arrows differently. This man is mocked: his arrows are exceedingly long (2.3 m) and he uses brightly colored feathers. He is "the only man who does not socialize with the rest of the village." His strange arrows "are another sign of his loss of 'Awá-ness'" (González-Ruibal, Hernando, and Politis 2011). The Awá's rules are normative, moral principles: bright colored feathers are used only by women to prepare headbands and bracelets used by men in religious rituals and are associated with the world of spirits and ancestors; the making of fire and cooking are associated with masculinity and divinity. An AI system that violated these rules in the pursuit of arrow-making would not be aligned with the moral code of the Awá.

We call the non-functional rules "silly rules." We emphasize that silly rules are not "silly" to the groups that follow them. They can have considerable meaning, as they do to the Awá. Our results will show why silly rules can be very important to the overall welfare of a group and hence the subject of intense concern by group members.
Model
Our model is based on a framework developed in (Hadfield and Weingast 2012). We characterize a set of agents as a group defined by a fixed and common-knowledge set of rules. A rule is a binary classification of alternative actions that can be taken in carrying out some behavior. Actions are either "rule violations" or "not rule violations."

Members of this group engage in an infinite sequence of interactions, each of which is governed by a rule drawn randomly from the ruleset. Each interaction is composed of a randomly selected pair of agents and a third actor, whom we will call a scofflaw, who will choose either to comply with the governing rule or not. (For tractability reasons, we do not model the scofflaws as group members.) One of the agents is randomly designated as the victim of the rule violation; the other is a bystander. If a rule is an important rule, the victim incurs a benefit if the rule is enforced and incurs a cost if not. If a rule is a silly rule, the victim incurs no benefit from the scofflaw's compliance with the rule and no cost from violation.

Group members are of two types in the bystander role: punishers, who always punish a rule violation, and non-punishers, who never punish. We assume that group members signal whether they are punishers by paying a signaling cost in each interaction before a potential violation occurs. (Boyd, Gintis, and Bowles (2010) show that signaling punisher status supports an evolutionarily stable equilibrium in which non-punishers cannot free-ride on punisher types.) The scofflaw complies with the selected rule if the bystander is a punisher and violates it otherwise. There is no punishment in equilibrium, but the model can be seen as assuming that victims always punish but punishment is only effective when bystanders punish as well.

Prior to each interaction, group members have an option to quit the group and take a risk-free payoff. We formalize this setup as follows. Each interaction is a game g, and we define the sequence for a group as a tuple ⟨G, T_θ, Π, U, γ, c⟩, where G is a distribution over games and T_θ is a distribution over punishment types t in the group, where t = 1 if an agent is a punisher and t = 0 if not. The proportion of punishers is given by θ ∈ [0, 1]. For the tractability of our agent models, we treat T_θ as a static distribution, and assume agents do likewise, even though it is subject to change as individuals leave the group. We will abuse notation somewhat and use T and G to refer to the support of the corresponding distributions where the meaning is obvious. Π is each agent's prior distribution over the parameters of T_θ, and U : G × T_θ → ℝ is a mapping from types and games to immediate payoffs for the agents. γ is each agent's discount parameter for future rewards. c expresses a participation cost; this can be understood as the expected cost of an agent in the bystander role to signal that she is a punisher to the other agent in an interaction.

Every agent begins in period 1 with perfect knowledge of how actions are classified, all payoffs, and the distribution of games. The agents do not know the distribution of types in the group, but they do hold a prior, which we will specify shortly. The agents update their beliefs about the distribution of types using Bayes' rule. The super game is defined as follows. For each period j:

1. Each agent chooses whether to participate or not. If an agent opts out, she collects the risk-free payoff of 0.
2. All agents that opt in are matched with another agent at random.
A game g_j ∼ G is drawn for each of the agent pairings.
3. Punishers incur a cost c to signal that they will punish violations. All players observe these signals.
4. In each pairing one agent is randomly assigned the role of victim, V, and the other the role of bystander, B. All players observe the result of this random assignment.
5. All players learn whether the game is a silly or important game.
6. If B is a punisher, the scofflaw complies with the rule. Otherwise, the scofflaw violates the rule.
7. Victims and bystanders collect payoffs given by U_V(g_j, t_B, t_V) and U_B(g_j, t_V, t_B).

Agents that play in the bystander role in any game incur no benefit; they incur the cost c if they are a punisher and 0 if not. Agents that play in the victim role receive a payoff of 0 in games governed by a silly rule. In games governed by an important rule they receive a positive reward R if B is a punisher and a negative reward −R if B is not. We formalize the set of important games as follows:

G' = { g ∈ G | U(g, ·) ≠ 0 }
U_V(g, t_O, t_V | g ∈ G') = (2 t_O − 1) R − t_V c
U_O(g, t_O, t_V | g ∈ G') = −t_O c

We will use E_U = E_{g, t_O}[ U_V(g, t_O, t_V) | g ∈ G' ] to denote the expected utility of an important game. We let d denote the density of the process generating games: the probability of a silly game,

d = 1 − P(g ∈ G'), with g ∼ G.

Note that a super game has high density (d close to 1) when silly rules are a large fraction of the ruleset. Critically, we ensure that the density of silly games does not alter the (expected) rate at which important games are presented to the agents. Rather than take the place of important interactions, in our model silly interactions increase the total number of interactions happening in the same time frame. To be concrete, we assume the expected discounted reward obtained from important games is independent of d. This condition can be attained through a suitable modification of γ as a function of d; Proposition 1 below makes this precise. Before stating it, we sketch the payoff structure of a single interaction in code.
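The following is a minimal sketch of one period of the super game in Python. It is our own illustration of the setup described above, not the authors' implementation; the names (Agent, play_period) and the default parameter values are hypothetical, and we charge the signaling cost to any punisher in the pairing, consistent with step 3.

```python
import random
from dataclasses import dataclass

@dataclass
class Agent:
    is_punisher: bool  # the agent's fixed type t: True = punisher, False = non-punisher

def play_period(victim, bystander, d, R=1.0, c=0.02, rng=random):
    """One interaction of the super game.

    Draws a silly game with probability d, charges the signaling cost c to
    punishers, and pays the victim +R or -R in an important game depending
    on whether the bystander is a punisher (the scofflaw complies only when
    the bystander signals punisher status). Returns (victim, bystander) payoffs.
    """
    important = rng.random() > d
    victim_payoff = -c if victim.is_punisher else 0.0
    bystander_payoff = -c if bystander.is_punisher else 0.0
    if important:
        victim_payoff += R if bystander.is_punisher else -R
    return victim_payoff, bystander_payoff
```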
Proposition 1. Setting γ_d = 1 − (1 − d)(1 − γ) ensures that the expected sum of discounted rewards from important games is independent of d.

Proof. See appendix. (Throughout, I[ψ] is the indicator function for the condition ψ.)

It can be easily shown that this constrained model describes an optimal stopping problem (Gittins, Glazebrook, and Weber 2011). Each agent in our model must choose between participating, in which case they get an unknown reward and learn about the enforcement equilibrium in the community, or opting out, in which case they stop participating and get a constant reward of 0. A classic result from the literature on optimal stopping tells us that, in the optimal policy, if the agent opts out once it will opt out for the rest of time. This is because the agent's information state does not change when it chooses to opt out, so if it was optimal to stop at time t − 1, it will also be optimal to stop at time t. Thus, we refer to the decision not to participate at any point as a decision to retire.

Figure 1: The y-axis represents the proportion of the 1000 groups with at least 2 individuals left; the x-axis represents time in terms of the number of expected important interactions per agent. The size of the bubbles signifies the average size of the 1000 groups at the given point in time. 40 linearly spaced values of silly rule density were used, and the curves for each setting are colored accordingly. For cost 0.02, we see that high-density (blue) groups collapse rapidly, while the lowest-density groups (orange) sustain their size for the duration of the experiment. As the cost comes down, higher-density groups start to survive.

These problems broadly fall under the class of partially observed Markov decision processes (POMDP) (Sondik 1971). In a POMDP the optimal policy only depends on the agent's belief state: the agent's posterior distribution over the hidden state of the system. In this case, this is a distribution over the enforcement likelihood in the community. We give our agent a beta prior over this parameter, so that the belief space for our agent is a two-dimensional lattice equivalent to Z². Initially, the belief state is (α_0, β_0). The probability that the bystander is a punisher in the first game is p = α_0 / (α_0 + β_0). Once the games begin, agents update their prior beliefs using Bayes' rule, adding the counts of punishers and non-punishers observed to the prior values. In the following, we will use α_i (β_i) to represent the number of punishers (non-punishers) observed prior to round i.
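As a minimal sketch of this belief state and the retirement decision (our own illustration, in the same hypothetical Python as above, not the authors' code), the class below maintains the Beta posterior and applies a myopic stand-in for the optimal stopping rule: retire once the estimated per-important-game payoff drops below the risk-free payoff of 0.

```python
class BeliefState:
    """Beta posterior over the proportion of punishers, theta."""

    def __init__(self, alpha0, beta0):
        self.alpha = alpha0  # prior pseudo-counts plus observed punishers
        self.beta = beta0    # prior pseudo-counts plus observed non-punishers

    def update(self, other_is_punisher):
        # Bayes' rule for a Bernoulli observation under a Beta prior
        if other_is_punisher:
            self.alpha += 1
        else:
            self.beta += 1

    def p_punisher(self):
        return self.alpha / (self.alpha + self.beta)

    def should_retire(self, R=1.0):
        """Myopic stand-in for the optimal stopping rule.

        With symmetric rewards +R / -R in important games, the expected
        per-important-game payoff is R * (2p - 1), so participation looks
        unattractive once p falls below 1/2. (The optimal policy would also
        weigh the value of gathering further information.)
        """
        p = self.p_punisher()
        return R * (2.0 * p - 1.0) < 0.0
```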
Theoretical Analysis: The Value of Dense Normative Structure

Consider first the case in which the signaling cost, c, is zero. In this case, punishers only face a risky choice when they are assigned to the victim role in an important game. In all other periods, the per-period expected payoff of playing the risky arm is a constant 0.

Intuitively, the benefit of a higher density of unimportant games is that the agent is in a more information-rich environment. In general, this benefit trades off with the cost of signaling. However, when the signaling cost is 0, a higher density is strictly better. One way to show this is to consider the value of perfect information (VPI): the additional utility an agent can get in expectation when it has full information compared with the expected utility with partial information (Russell and Norvig 2010). We can show that, in the limit as density goes to 1, VPI goes to 0; high density of unimportant games essentially removes the agents' uncertainty over the proportion of punishers. The sketch below illustrates the quantities involved.
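As a rough numerical illustration of VPI's ingredients (our own sketch, not part of the paper's analysis), the snippet below compares the value of deciding with full information about θ against committing based on the prior mean. It assumes c = 0 and the density-adjusted discounting of Proposition 1, under which the value of participating forever is V(θ) = (2θ − 1)R/(1 − γ); the gap between the two quantities upper-bounds how much any learning policy could add and is exactly the VPI of a one-shot stay-or-retire decision.

```python
import random

def full_vs_prior_information_value(alpha, beta, R=1.0, gamma=0.9,
                                    n_samples=100_000, seed=0):
    """Estimate E[max(V(theta), 0)] and compare it to max(E[V(theta)], 0)."""
    rng = random.Random(seed)

    def V(theta):
        # value of participating forever when the signaling cost is 0
        return (2.0 * theta - 1.0) * R / (1.0 - gamma)

    full_info = 0.0
    for _ in range(n_samples):
        theta = rng.betavariate(alpha, beta)
        full_info += max(V(theta), 0.0)
    full_info /= n_samples

    prior_mean = alpha / (alpha + beta)
    prior_commit = max(V(prior_mean), 0.0)  # V is linear, so this is max(E[V(theta)], 0)
    return full_info, prior_commit

# e.g. a weak prior such as (3, 2) leaves a sizeable gap between the two
# values; a confident prior such as (30, 20) leaves almost none.
```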
Proposition 2. If the participation cost, c, is 0, then, for any belief state (α_i, β_i) and discount rate γ, the corresponding VPI goes to zero as density goes to 1. That is,

lim_{d→1} VPI((α_i, β_i); d, γ) = 0.   (1)

Proof. (Sketch; see Appendix for details.) We consider a policy such that the agent participates in order to observe τ(d) interactions. After τ(d) observations, it uses its best estimate of the probability of enforcement to decide if it should leave. It does not reconsider retiring or rejoining afterwards. We show that this stopping-time function can be chosen so that the expected number of important games goes to 0, so it does not lose utility in expectation, while the total number of interactions (including silly games) goes to ∞, so it makes the retirement decision with perfect information in the limit.

Figure 2: Using the same graphical representation as Figure 1, here we see a comparison of the robustness of groups at 3 different density values. As the number of interactions increases, the survival rate and average population size of groups with higher silly rule density surpass those of lower-density groups.

It is straightforward to show that VPI is strictly positive as long as P(V(θ) > 0) > ε > 0 and P(V(θ) < 0) > ε > 0 for some ε > 0. Combined with our proposition, this means that environments with more silly rules will be of higher value to agents; as the density of silly rules goes to 1, we can neglect the utility lost due to partial information about the proportion of punishers. Where participation costs can be neglected, an agent will prefer an environment with lots of silly rules.

Monte Carlo Experiments
To test the benefit of silly rules in groups composed of our previously defined agents, we constructed a series of simulation-based experiments in which we manipulated the density of silly rules, cost of signaling, distribution of punishers, and prior beliefs about the punisher distribution. Each simulation was carried out in a group of 100 agents, each assigned the type of punisher or non-punisher. The simulations were broken down into discrete periods, or group interactions, in which each individual was matched with another and engaged in an interaction, silly or important. We set the reward for the victim in an important game to +1 in the case in which the bystander is a punisher, and −1 if the bystander is not. Note that given the symmetry in gains and losses in important games, continued participation in the group is valuable if the likelihood that a bystander in an important game is a punisher is greater than 0.5. Given the density-adjusted discount factor for each simulation, the expected reward of 10 interactions in a d = 0.9 environment would be equivalent to that of 1 interaction in a d = 0 environment. This allows us to normalize the periods into timesteps, where 1 timestep is equal to 1/(1 − d) group interactions.

Figure 3: Using the same graphical representation as Figures 1 and 2, here we see a comparison of the adaptability of groups at 3 different density values. Given the unstable conditions of the scenario, higher adaptability corresponds to a faster collapse rate. Groups with high density are seen to collapse much faster than those with low density.

We are interested in the size of groups over time, as agents make decisions about whether to remain in the group or not given their interaction experience, as a function of the density of silly rules.

To establish a base case simulation, we first consider the case in which there is low uncertainty about the proportion of punishers in the group. We can think of this as the case in which a stable group has engaged in interactions over a long period of time and all agents have many observations of the proportion of punishers in the group. We set θ, the punisher proportion, to 0.6 and the alpha-beta prior of the agents to
30 : 20, which implies high confidence in the agents' estimate of θ. Note that with this ground truth, group membership is valuable, generating an expected payoff higher than the alternative of 0. We then run the simulation for 1000 groups on the full factorial of logistically scaled signaling costs, c, and linearly scaled densities, d. Doing so confirms our first hypothesis (see Figure 1):

Hypothesis 1: When uncertainty about the proportion of punishers in a group is low, the likelihood that a group loses members and the likelihood that the group collapses increase with the density of silly rules.
In this case, as the cumulative cost of signaling that one is a punisher across silly interactions grows, it begins to outweigh the possible rewards from important interactions. We see that as cost goes up, groups with a higher density of silly interactions shrink and collapse more frequently than those with low density. This confirms the intuition that silly rules are costly if they serve no information function.
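Continuing the hypothetical sketches above (and again as our own simplification rather than the authors' code), the following loop shows the shape of the group simulation used in these experiments. The group size of 100, θ = 0.6, the 30 : 20 prior, and the 0.02 cost follow the base case described in the text; the default density, the number of timesteps, and the myopic retirement rule are our assumptions.

```python
import random

def simulate_group(n_agents=100, theta=0.6, d=0.5, c=0.02,
                   alpha0=30, beta0=20, n_timesteps=50, rng=None):
    """Simulate one group; return its size at each timestep.

    One timestep corresponds to 1 / (1 - d) group interactions, so the
    expected number of important interactions per agent is held fixed
    across densities.
    """
    rng = rng or random.Random(0)
    # each member is (type, belief state); who remains is the group's state
    members = [(Agent(is_punisher=(rng.random() < theta)),
                BeliefState(alpha0, beta0)) for _ in range(n_agents)]
    sizes = []
    interactions_per_step = max(1, round(1.0 / (1.0 - d)))
    for _ in range(n_timesteps):
        for _ in range(interactions_per_step):
            # agents whose beliefs imply negative value retire and never return
            members = [(a, bel) for a, bel in members if not bel.should_retire()]
            if len(members) < 2:
                break
            rng.shuffle(members)
            for (a1, bel1), (a2, bel2) in zip(members[0::2], members[1::2]):
                victim, bystander = (a1, a2) if rng.random() < 0.5 else (a2, a1)
                # payoffs accrue to the agents but do not feed back into the
                # group-size dynamics here; only the observed signals do
                play_period(victim, bystander, d=d, c=c, rng=rng)
                bel1.update(a2.is_punisher)
                bel2.update(a1.is_punisher)
        sizes.append(len(members))
    return sizes
```

Sweeping d and c over a grid and averaging the returned sizes across many simulated groups gives the kind of survival curves summarized in Figures 1-3; the robustness and adaptability scenarios below correspond to re-running the same loop with a weakened prior and, for adaptability, a θ below 0.5.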
Group Robustness
Having established a baseline, we investigate the benefits of silly rules for a group by considering different scenarios that stress test the robustness and adaptability of a group. The first case we consider is one in which the individuals' beliefs in a stable group are shocked, lowering their confidence in the proportion of punishers. Concretely, this involves replacing the agents' 30 : 20 beta prior with a much weaker prior. Our hypothesis for the belief-shock scenario is as follows:
Hypothesis 2: For sufficiently low signaling cost, a higher density of silly rules increases a group's resilience to shocks in individuals' beliefs about the distribution of punishers.
As shown in Figure 2, in settings with low signaling cost, high density allows for quick stabilization and strong individual retention: a large share of the highest-density groups persist for the duration of the experiment with much of their population intact. Compare this to the lower-density groups, where even the groups that survive lose most of their members before stabilizing.
To test adaptability, we imagine an alternative scenario in which the shock to individuals' beliefs is accompanied by a change in the ground truth about the proportion of punishers. Concretely, we apply the same shock to the beta prior and, in addition, set θ below 0.5. With fewer punishers than non-punishers, participation in the group generates a negative expected payoff, and agents would do better to leave the group. Put differently, the group's ruleset is no longer generating value for group members. In a negative-value group such as this, we define the adaptability of the group to be the rapidity of collapse. Our hypothesis for this scenario is as follows:

Hypothesis 3: For sufficiently low signaling cost, a higher density of silly rules allows for faster adaptation to negative shocks in the distribution of punishers in a group.
Looking at Figure 3, we find support for this hypothesis in the experiments. After only a few timesteps, the high-density groups have mostly collapsed, whereas the lower-density groups take considerably longer to peter out.
Discussion
Our results show that silly rules help groups adapt to uncertainty about the stability of social order by enriching the information environment. They help participants in these groups track their beliefs about the likelihood that violations of important rules will be punished, and thus the likelihood that important rules will be violated. These beliefs are critical to the incentive to invest resources in interaction.

We focus on the punisher type of bystanders because third-party punishment is the distinctive feature of human groups (Riedl et al. 2012; Tomasello and Vaish 2013; Buckholtz and Marois 2012); it extends the range of actions that can be deterred from those deterred by the reactions of the victim alone to those that can be deterred by group punishment (Boyd and Richerson 1992).

What are the lessons for AI alignment research? The goal of AI alignment is to build AI systems that act in ways consistent with human values. For groups of humans, this means (at least) values reflected in rules of behavior. Discerning values from rules is complex: some rules reflect important values, such as honoring a promise or avoiding harm. Others do not reflect values that are important per se. For an AI system to make good inferences and predictions from observing normative behavior, then, it will need to distinguish between important rules and silly rules.

Failing to make this distinction could lead to at least two key inferential errors. One error would be to treat important and silly rules as equally likely to vary over time and place. But important rules, because they promote functionality in human interactions, are likely to vary only when there is some causal reason. Silly rules, on the other hand, can vary for any reason, or none. Modeling the distinction between silly and important rules is essential to accurately learning rule systems. An AI system that lacks this distinction will over-estimate the likelihood of encountering certain types of normative behavior (with respect to dress codes, for example) while under-estimating the likelihood of others, such as speeding rules.

A second error that could result from a failure to distinguish between important and silly rules is that an AI system is likely to treat all rules that it sees enforced as equally important to human values. This may produce a good solution in ordinary circumstances, but it will produce a poor solution in circumstances in which it would be very costly to comply with all the rules. If an AI system treats all rules as equally important to humans, it will presumably economize equally across the rules. But the better solution is to prioritize important rules and compromise on silly rules.

The distinction between silly and important rules also raises a question for work on human-robot interaction: how important is it for an AI system to help enforce silly rules? Our model brings out a legibility function in silly rules: they make it easier for agents in a group that depends on third-party enforcement to discern the stability of the rules in light of uncertainty generated by changes in population or the environment.
If artificial agents are interacting in these environments and they do not participate in enforcing silly rules, what impact does that have on the beliefs of human agents? Does the introduction of large numbers of artificial agents who ignore silly rules into a human group (such as self-driving cars into the group of humans driving on highways) have the same impact on the robustness and adaptability of the group as a decrease in the density of silly rules, by reducing the amount of information gained from the opportunity to observe bystander behavior in interactions? Further still, when a robot learns and enforces silly rules, do these seemingly arbitrary norms become reified, fundamentally changing their meaning and reducing their signaling potential? We leave these questions, and others, for future research.

References
Bicchieri, C. 2006. The Grammar of Society: The Nature and Dynamics of Social Norms. New York: Cambridge University Press.
Bicchieri, C. 2017. Norms in the Wild: How to Diagnose, Measure, and Change Social Norms. Oxford: Oxford University Press.
Boyd, R., and Richerson, P. J. 1992. Punishment allows the evolution of cooperation (or anything else) in sizable groups. Ethology and Sociobiology.
Boyd, R., and Richerson, P. J. 2009. Culture and the evolution of human cooperation. Philosophical Transactions of the Royal Society B: Biological Sciences.
Boyd, R.; Gintis, H.; and Bowles, S. 2010. Coordinated punishment of defectors sustains cooperation and can proliferate when rare. Science.
Buckholtz, J. W., and Marois, R. 2012. The roots of modern justice: cognitive and neural foundations of social norms and their enforcement. Nature Neuroscience.
Dragan, A., and Srinivasa, S. 2013. Legibility and predictability of robot motion. In Human Robot Interaction 2013 Proceedings, 301-308.
Etzioni, A., and Etzioni, O. 2016. AI assisted ethics. Ethics and Information Technology.
Etzioni, O. 2017. How to regulate artificial intelligence. New York Times.
Gittins, J.; Glazebrook, K.; and Weber, R. 2011. Multi-armed Bandit Allocation Indices. John Wiley & Sons.
González-Ruibal, A.; Hernando, A.; and Politis, G. 2011. Ontology of the self and material culture: Arrow-making among the Awá hunter-gatherers (Brazil). Journal of Anthropological Archaeology.
Hadfield, G. K., and Weingast, B. R. 2012. What is law? A coordination model of the characteristics of legal order. Journal of Legal Analysis.
Hadfield-Menell, D.; Milli, S.; Abbeel, P.; Russell, S.; and Dragan, A. 2017. Inverse reward design. In Advances in Neural Information Processing Systems, 6765-6774.
Haidt, J., and Joseph, C. 2008. The moral mind: How five sets of innate intuitions guide the development of many culture-specific virtues, and perhaps even modules. Oxford: Oxford University Press.
IEEE. 2018. Ethically Aligned Design Ver II: A Vision for Prioritizing Human Well-being with Autonomous and Intelligent Systems. IEEE.
McAdams, R. H., and Nadler, J. 2005. Testing the focal point theory of legal compliance: The effect of third-party expression in an experimental hawk/dove game. Journal of Empirical Legal Studies.
McAdams, R. H. 2015. The Expressive Power of Law: Theories and Limits. Cambridge, MA: Harvard University Press.
Myerson, R. B. 2004. Justice, institutions and multiple equilibria. Chicago Journal of International Law.
Ng, A. Y., and Russell, S. J. 2000. Algorithms for inverse reinforcement learning. In ICML, 663-670.
Riedl, K.; Jensen, K.; Call, J.; and Tomasello, M. 2012. No third-party punishment in chimpanzees. Proceedings of the National Academy of Sciences of the United States of America.
Russell, S., and Norvig, P. 2010. Artificial Intelligence: A Modern Approach. Pearson.
Schutz, A. 1964. Collected Papers II. The Hague: Martinus Nijhoff.
Sondik, E. J. 1971. The optimal control of partially observable Markov processes. Technical report, Stanford University.
Sugden, R. 1986. The Economics of Rights, Cooperation, and Welfare. London: Palgrave Macmillan.
Tomasello, M., and Vaish, A. 2013. Origins of human cooperation and morality. Annual Review of Psychology.

Appendix: Proofs
Proposition 1.
Setting γ_d = 1 − (1 − d)(1 − γ) ensures that the expected sum of discounted rewards from important games is independent of d: for all d ∈ [0, 1],

E_{g_j, t_O} [ Σ_{j=0}^{∞} γ^j U_V(g_j, t_O, t_V) | g_j ∈ G' ] = E_{g_j, t_O} [ Σ_{j=0}^{∞} I[g_j ∈ G'] γ_d^j U_V(g_j, t_O, t_V) | d ].
We first show that it is sufficient to ensure that the expected value of γ_d^j is the same given that j is a round with an important game:

E_{g_j, t_O} [ Σ_{j=0}^{∞} I[g_j ∈ G'] γ_d^j U(g_j, t_O, t_V) | d ]
  = Σ_{j=0}^{∞} E_{g_j, t_O} [ I[g_j ∈ G'] γ_d^j U(g_j, t_O, t_V) | d ]
  = Σ_{j=0}^{∞} γ_d^j E_{g_j, t_O} [ U(g_j, t_O, t_V) | d, g_j ∈ G' ] E_{g_j} [ I[g_j ∈ G'] | d ]
  = (1 − d) E_U Σ_{j=0}^{∞} γ_d^j,

where the first line holds by the linearity of expectation, the fact that g_j is an iid draw from a stationary distribution, and the constraint on the agent's beliefs that t_O is also an iid draw from a stationary distribution. Substituting the form of the infinite geometric series, we see that

E_U / (1 − γ) = (1 − d) E_U / (1 − γ_d)   (2)

is sufficient to achieve our goal. Substituting the form for γ_d in the theorem statement and reducing shows that this condition is satisfied.
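As a quick numerical sanity check (our own illustration, not part of the original proof), one can estimate the discounted sum of important-game rewards by simulation and confirm that it is roughly constant across densities once γ_d = 1 − (1 − d)(1 − γ) is used:

```python
import random

def discounted_important_reward(d, gamma, reward=1.0, n_rounds=2000,
                                n_trials=2000, seed=0):
    """Monte Carlo estimate of E[ sum_j I[g_j important] * gamma_d^j * reward ]."""
    rng = random.Random(seed)
    gamma_d = 1.0 - (1.0 - d) * (1.0 - gamma)
    total = 0.0
    for _ in range(n_trials):
        discount = 1.0
        for _ in range(n_rounds):
            if rng.random() > d:  # important game with probability 1 - d
                total += discount * reward
            discount *= gamma_d
    return total / n_trials

if __name__ == "__main__":
    for d in (0.0, 0.5, 0.9):
        print(d, round(discounted_important_reward(d, gamma=0.9), 2))
    # all three estimates should be close to reward / (1 - gamma) = 10
```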
If the participation cost, c, is 0, then, for any belief state (α_i, β_i) and discount rate γ, the corresponding VPI goes to zero as density goes to 1. That is,

lim_{d→1} VPI((α_i, β_i); d, γ) = 0.   (3)
Let V(θ) be the expected value of participating forever, given θ. The optimal full-information policy retires whenever V(θ) < 0 and has value V⁺(θ) = max{V(θ), 0}. VPI is the difference between the expected value of V⁺ and the value of the optimal partial-information policy V((α_i, β_i); d, γ):

VPI((α_i, β_i); d, γ) = E[V⁺(θ) | (α_i, β_i)] − V((α_i, β_i); d, γ).   (4)

We proceed by lower bounding V. V is the value of the optimal policy, so it is weakly lower bounded by the value of any arbitrary policy. We consider a policy that participates for

τ(d) = 1 / √(1 − d)   (5)

rounds and then retires if E[V(θ)] < 0. This choice of τ ensures that

lim_{d→1} τ(d) = ∞;   (6)

lim_{d→1} Σ_{t<τ(d)} P(g_t ∈ G') = lim_{d→1} (1 − d)/√(1 − d) = 0.   (7)

(6) ensures that, as density goes to 1, the agent's estimate of the participation value at the time it decides, E[V(θ)], converges to V(θ) by consistency. (7) ensures that the expected number of important games (and thus opportunities to lose utility against a full-information agent) goes to 0. This is sufficient to show that

lim_{d→1} V((α_i, β_i); d, γ) = E[V⁺(θ) | (α_i, β_i)].
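To illustrate the construction in this proof (again our own sketch, with hypothetical parameter choices), the snippet below shows that with τ(d) = 1/√(1 − d) observation rounds, the expected number of important games before the retirement decision vanishes while the number of observed bystander types grows without bound, so the estimate of θ used at the decision becomes arbitrarily accurate.

```python
import math

def stopping_time(d):
    """tau(d) = 1 / sqrt(1 - d): the observation budget used in the proof sketch."""
    return 1.0 / math.sqrt(1.0 - d)

for d in (0.9, 0.99, 0.9999):
    tau = stopping_time(d)
    expected_important = (1.0 - d) * tau          # = sqrt(1 - d) -> 0
    estimate_se = math.sqrt(0.25 / tau)           # worst-case std of the theta estimate
    print(f"d={d}: observations ~ {tau:.0f}, "
          f"expected important games before deciding ~ {expected_important:.3f}, "
          f"theta estimate std <= {estimate_se:.3f}")
```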