Don't Believe Everything You Hear; Preserving Relevant Information by Discarding Social Information
Christoph Salge and Daniel Polani
University of Hertfordshire, Hatfield, UK
c.salge | [email protected]

Abstract
Integrating information gained by observing others via Social Bayesian Learning can be beneficial for an agent's performance, but can also enable population-wide information cascades that perpetuate false beliefs through the agent population. We show how agents can influence the observation network by changing their probability of observing others, and demonstrate the existence of a population-wide equilibrium, where the advantages and disadvantages of the Social Bayesian update are balanced. We also use the formalism of relevant information to illustrate how negative information cascades are characterized by processing increasing amounts of non-relevant information.
Introduction
Information processing is an important aspect of life. Organisms equipped with sensors obtain and utilize information to increase their inclusive fitness, thus justifying the existence of (often costly) sensors in the first place (Polani, 2009). However, not all information is equally relevant for an organism – a notion formalised by Polani et al. (2001, 2006), which we will introduce in more detail later. The basic idea of relevant information is to quantify how much information is at least needed to obtain a certain performance level. Once this is established, the next question to ask is how to best obtain this specific information.

Previously, we argued (Salge and Polani, 2011) that agents with common goals and embodiments are likely to have similar relevant information. Once they obtain this relevant information, they also have to act upon it to reap its benefits, thus encoding it in their actions. As the state-space of actions is usually much smaller than the state-space of the overall environment, this is likely to lead to a higher "concentration" of relevant information in another agent's actions than in the environment itself. This digested information, encoded in actions, concentrates pre-processed, decision-relevant information and provides incentives for agents to observe each other and modify their own actions accordingly. However, similar behaviour in a population of agents can lead to a phenomenon called herding (Banerjee, 1992) or information cascade (Bikhchandani et al., 1992). This usually requires an agent population where agents:

• select one of several choices;
• have some private information related to their decision;
• act sequentially and can observe the choices of others, but not the private internal information of others.

This can then lead to situations such as the example by Easley and Kleinberg (2010), where an agent wants to choose between restaurant A and B. His own research suggests that restaurant A is better, but once he gets there, no one is eating in restaurant A, while restaurant B is filled with customers. Based on this information it is reasonable to infer that several other agents have private information that caused them to choose B instead of A. By inferring this additional information it becomes rational to choose B instead of A, even if his own private information suggests otherwise. The problem is that others might draw similar conclusions, creating a chain reaction of inferred private information that rests on no or very little actual private information. This illustrates two common properties of information cascades: they can be based on very little initial information, and they can be wrong.

This is somewhat in contrast to the argument presented in "The Wisdom of Crowds", where Surowiecki (2005) argues that agents that aggregate their information can produce very accurate results. But, as Easley and Kleinberg (2010) point out, this only applies if they are guessing independently. Furthermore, recent studies (Kao and Couzin, 2014) examining several models of group behaviour suggest that small groups make correct decisions, while larger groups are more likely to converge on an incorrect decision. Note also that information cascades are present in other types of multi-agent scenarios, such as swarm coordination (Wang et al., 2012), which are potentially subject to similar problems.

Overview
In this paper we examine the interaction between the positive and negative effects of observing others through the perspective of the relevant information framework. In particular, we show how rational adaptations can lead to a situation where incorrect information cascades become common, and how they are characterized by a reduction in the density of relevant information. Furthermore, we demonstrate that in this environment it is reasonable for agents to randomly discard part of their sensor intake.

After introducing information theory and relevant information in more detail, we present the single agent model to create a baseline for agent performance and demonstrate how an agent's actions encode information. The multi-agent scenario is then used to motivate the introduction of the Social Bayesian Update, as it demonstrates the increase in performance when information from other agents is used in decision making. The next scenario deals with changing world states and shows that an agent's performance can be increased by explicitly modelling the noise in the world, which motivates internal models that cannot express certainty. This specific form of bounded rationality is interesting in the context of information cascades, as Acemoglu et al. (2011) previously showed that a lack of internal certainty makes populations more likely to synchronize. Finally, we will look at models that combine noise and the Social Bayesian Update, both of which have previously been motivated by increased agent performance. In this environment, negative information cascades are common, but we show that agents can randomly discard sensor inputs to increase their performance. This is motivated by results from Gale and Kariv (2003), which demonstrated that sparsity in the observation graph makes convergence (both negative and positive) less likely. By moderating their own sensor intake, agents can change between single-agent behaviour, positive "wisdom of the crowds", and negative information cascades.
Information Theory
Relevant information is based on the formalism of Information Theory (Shannon, 1948). If X is a random variable that can assume the states x, where each state x is a member of the alphabet \mathcal{X}, then P(X) is the probability distribution of X, and P(X = x) is the probability that X assumes the value x, sometimes shortened to p(x). Entropy, or the self-information of a variable, is then defined as

H(X) = -\sum_{x \in \mathcal{X}} p(x) \log p(x).   (1)

This is often described as the uncertainty about the outcome of X, the average expected surprise, or the average information gained if one were to observe the state of X without having prior knowledge about X. Consider two jointly distributed random variables, X and Y; then we can calculate the conditional entropy of X given a particular outcome Y = y as

H(X \mid Y = y) = -\sum_{x \in \mathcal{X}} p(x|y) \log p(x|y).   (2)

This can be averaged over all states of Y, resulting in the conditional entropy of X given Y,

H(X \mid Y) = -\sum_{y \in \mathcal{Y}} p(y) \sum_{x \in \mathcal{X}} p(x|y) \log p(x|y).   (3)

This is the entropy of X that remains, on average, if Y is known. So H(X) and H(X|Y) are the entropy of X before and after we learn the state of Y. Thus, their difference is the amount of information we can learn, on average, about X by knowing Y. Subtracting one from the other, we get a value called mutual information:

I(X;Y) = H(X) - H(X \mid Y).   (4)

The mutual information is symmetric and measures the amount of information one random variable contains about another (and vice versa, by symmetry). Also, note that we use the binary logarithm for all \log(\cdot) operations, so all information measurements are in bits.
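As a concrete illustration (not part of the original paper), a minimal Python sketch computing these quantities from a joint distribution table might look as follows; the function names and the toy example are our own:

```python
import numpy as np

def entropy(p):
    """H(X) in bits; terms with p(x) = 0 contribute nothing."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(p_xy):
    """I(X;Y) = H(X) - H(X|Y), computed from a joint table p_xy[x, y]."""
    p_x = p_xy.sum(axis=1)
    p_y = p_xy.sum(axis=0)
    # H(X|Y) = sum_y p(y) * H(X | Y=y), as in Eq. (3)
    h_x_given_y = sum(p_y[j] * entropy(p_xy[:, j] / p_y[j])
                      for j in range(p_xy.shape[1]) if p_y[j] > 0)
    return entropy(p_x) - h_x_given_y

# Toy example: X uniform over 4 states, Y a noisy copy of X (80% correct).
p_xy = np.full((4, 4), 0.05 / 3)  # p(x, y) for the 12 "wrong" pairs
np.fill_diagonal(p_xy, 0.20)      # p(x, y) for the 4 "correct" pairs
print(mutual_information(p_xy))   # roughly 0.96 bits
```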
Relevant Information

Relevant information is the amount of information an agent needs to obtain to either act optimally, or at a specific performance level. Assume that there is an agent that interacts with the environment by choosing an action in reaction to some form of sensor input. The environment R is in the state r, and the agent chooses an action a from a set of actions \mathcal{A}. For simplicity, we assume for now that the agent can perceive the whole environment, so the sensor state is equal to the state of the environment. Furthermore, assume that the actions of the agent are connected to some utility function U(a, r) (for example, survival probability, or fitness) which determines different pay-offs, depending on the agent's action A = a and the state of the environment R = r. We also assume that the states of the world R are distributed according to the probability distribution P(R).

A strategy is defined as a conditional probability distribution P(A|R), which defines for every state r the probability of choosing different actions a. We can define a set \pi_u as the set of all strategies that have an average pay-off level, or performance, of at least u as

\pi_u = \left\{ P(A|R) \;\middle|\; \sum_a \sum_r U(a, r)\, p(a|r)\, p(r) \geq u \right\}.   (5)

As a strategy P(A|R) also implies a distribution P(A) = \sum_r P(A|R = r) P(R = r), we can compute the mutual information I(A;R) for each strategy. The relevant information for a specific performance level is then defined as

RI(u) := \min_{p(a|r) \in \pi_u} I(A;R),   (6)

which is the minimal mutual information over all strategies that achieve at least the average pay-off of u. As the mutual information I(A;R) measures the amount of information the agent has to process to determine a, this can be interpreted as the minimal amount of information an agent needs to obtain to perform at least as well as u. Due to the symmetry of mutual information, this can also be interpreted as the minimal amount of information an agent's actions have to contain.

Figure 1: The probability of observing an agent going to a specific location, if the treasure is located in position 1 and there are 10 locations.
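The quantities entering Eqs. (5) and (6) are straightforward to compute for a given strategy. A hypothetical sketch (reusing mutual_information from the block above; the uniform-mixture strategy is only an illustrative choice) evaluates the performance and the information cost I(A;R) of one strategy in a treasure-hunt-like setting:

```python
import numpy as np

def performance(strategy, p_r, utility):
    """Average pay-off sum_{a,r} U(a,r) p(a|r) p(r), as in Eq. (5).
    strategy[r, a] = p(a|r); utility[a, r] = U(a, r)."""
    return np.einsum('r,ra,ar->', p_r, strategy, utility)

def strategy_information(strategy, p_r):
    """I(A;R) of the joint distribution implied by p(a|r) and p(r)."""
    joint = p_r[:, None] * strategy      # joint[r, a] = p(r, a)
    return mutual_information(joint)     # symmetric, so I(A;R) = I(R;A)

n = 10
p_r = np.full(n, 1 / n)    # world state uniform over n locations
utility = np.eye(n)        # pay-off 1 iff the action matches the state

u = 0.18                   # pick the right location with probability u
strategy = np.full((n, n), (1 - u) / (n - 1))
np.fill_diagonal(strategy, u)
print(performance(strategy, p_r, utility))   # 0.18
print(strategy_information(strategy, p_r))   # about 0.04 bits
```

This mixture family (hit the correct state with probability u, otherwise act uniformly) will reappear as the minimizer behind Eq. (9) below.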
Experiments

Single Agent Model
There are ten locations; exactly one of them contains treasure. The treasure location is modelled by the state of the variable T. The agent's task is to determine the location of the treasure in the least number of turns. Each turn the agent decides to visit one of the locations, and is then informed whether that location contains the treasure or not.

The agent's decision making is modelled with an internal Bayesian model \hat{T}, where P(\hat{T} = t) is the agent-assumed probability that the treasure is in location t. Every turn, the agent chooses to visit the location where it believes the treasure most likely to be. In case of a tie between different locations, it chooses one of them at random. Initially, the agent believes all locations to be equally likely. Once it observes the state of a given location, it updates its internal model with that knowledge. So, if location t is found empty, then it sets P(\hat{T} = t) = 0, and all other probabilities are uniformly scaled, so they still sum to one. If the agent finds the treasure, it is retired from the simulation. For this simple case the Bayesian model is not strictly necessary, but it will allow us to smoothly integrate later modifications. Here it just prevents the agent from revisiting any empty locations, which is arguably the best possible performance for an agent without any additional information.

For a world with ten locations it takes on average ≈ 5.5 turns to find the treasure. Recording the agent's actions in the variable A, we can then use that data to compute how much information about T, the treasure location, is encoded in A. The mutual information I(A;T) in this case computes to ≈ 0.04 bits. The single agent has a performance ratio of ≈ 0.18. This measurement is also identical to the fraction of agent actions that are looking at the right location. This allows us later to evaluate the performance of an agent population, as we do not have to measure the search time, but just measure how many of the agents' actions are going to the treasure location.
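A hypothetical sketch of this single-agent loop, following the rules stated above (the function and variable names are our own):

```python
import numpy as np

rng = np.random.default_rng(0)

def choose_action(belief):
    """Visit the location believed most likely; ties are broken at random."""
    candidates = np.flatnonzero(belief == belief.max())
    return rng.choice(candidates)

def observe_empty(belief, location):
    """Location found empty: set its probability to zero, rescale the rest."""
    belief = belief.copy()
    belief[location] = 0.0
    return belief / belief.sum()

n = 10
treasure = rng.integers(n)      # fixed, randomly chosen treasure location
belief = np.full(n, 1 / n)      # uniform initial internal model
turns = 0
while True:
    turns += 1
    action = choose_action(belief)
    if action == treasure:
        break                   # the agent is retired from the simulation
    belief = observe_empty(belief, action)
print("treasure found after", turns, "turns")  # at most n turns
```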
Multiple Agent Scenario

The last section indicated that the agent's actions contain relevant information about the treasure location. Therefore, we will now modify the model, so that agents can integrate data from observing other agents into their internal belief model.

In the multi-agent model social agents will be able to observe the actions taken by other agents, but they will not see the result of this exploration, i.e., know whether the visited location is empty. When an agent observes another agent's action A = a, it will integrate the obtained information into its own internal model P(\hat{T}) by performing a Naive Bayesian Update, based on the statistics for P(A|T) gathered from the non-social simulation (Fig. 1). So, its new internal model after observing a is

P(\hat{T} \mid A = a) = \frac{P(A = a \mid T)}{P(A = a)} P(\hat{T}).   (7)

If an agent finds the treasure, it will be replaced by a new agent, which is simulated by re-initializing the agent's internal model with the uniform distribution.

So, for the multi-agent simulation, all agents start with a uniform internal distribution. Each turn the agents decide their actions, based on their internal model, in the same sequential order. When agents observe other agents' actions, they update their internal model immediately. When agents observe a location, they either update their model if that location is empty, or are replaced by a new agent (have their model reset) if the location contains the treasure.

Note that the Naive Bayesian Update (NBU) works with the assumption that the different sources of information are independently distributed, which is not true in general. NBU still provides good approximations if the dependencies are normally distributed, but in information cascades this is also not the case, as the spread of information through a population is usually self-reinforcing. We still use the NBU, as a more exact Bayesian Update would be nearly impossible to produce: it would require the agent to remember all previous interactions, and requires statistics on how all other sources of information interact. NBU, on the other hand, can be applied the moment some information becomes available, and the internal belief can be represented as a single probability distribution.
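In code, the update of Eq. (7) is a one-line reweighting of the internal model. A sketch, assuming numpy arrays and that the likelihood table p_a_given_t has been estimated from the non-social run (Fig. 1):

```python
def social_update(belief, observed_action, p_a_given_t):
    """Naive Bayesian Update, Eq. (7): P(T|A=a) ∝ P(A=a|T) P(T).
    p_a_given_t[a, t] is the empirical P(A=a | T=t); normalizing the
    product is equivalent to dividing by P(A=a)."""
    posterior = p_a_given_t[observed_action, :] * belief
    return posterior / posterior.sum()
```

Because the update factorizes over observations, it can be applied the moment an action is observed, which is precisely the practicality argument made above.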
Single Social Agent

In the first experiment we examined 10 agents in a world with 10 locations. All data discussed from here on is the average value over 1,000 simulations, each running for 1,000 turns. Only one of the agents has the ability to observe the others. The location of the treasure is fixed, and determined at random at the beginning of the simulation. Unsurprisingly, the remaining non-social agents perform exactly as in the single agent simulation. Their distribution of actions matches the one recorded in Fig. 1.

The social agent in the simulation performs better, reaching a performance ratio clearly above the non-social baseline of 0.18. This agent benefits from the information the other agents gather. As discussed in the "Digested Information" argument, the other agents act as information preprocessors for the social agent. Also, note that the distribution of actions of the social agent is even more concentrated on the actual treasure location; hence the mutual information between its actions and the treasure location, I(A;T), is higher than the same mutual information for the non-social agents.
All Social Agents

Given the increase in performance for a single agent, we now assume that the whole population of agents adopts the social update approach, and we examine a simulation where all agents integrate the information gained from other agents' actions. This turns out to be extremely beneficial. The performance of the overall population, which is also the performance of every separate agent, is close to the optimum of 1. Once the treasure has been located by one agent, all subsequent actions lead to the treasure, and the mutual information between actions and treasure location is nearly maximal, I(A;T) ≈ log(10).

Basically, the relevant information that the treasure is in location t propagates through the agents. It is displayed in an agent's actions, then used to update another agent's internal model. That agent then uses the information to determine which action to take, which is going to be A = t. The agent will then find the treasure and reset its internal model. But it will perceive others before it has to act again, biasing its internal model again towards taking action A = t. This will continue unless environmental information conflicts with this information, meaning the agent does not find the treasure at the location in which it was looking. In that case, the observed location's probability to contain treasure is set to zero, and the agent will look at other locations. This will initially get the agents to explore all locations until they find the treasure, after which they will all copy each other, finding the treasure every turn from that point onwards. Note that the treasure does not move when it is found; however, the agent who found the treasure resets its internal model (simulating its replacement with a new agent).

As we see, the important information is preserved by continuously flowing through the agent population. Even when agents retire and are replaced, the information is not lost. This looks like a very desirable feature for an agent population, and therefore the Social Bayesian Update seems like a reasonable adaptation.
Changing World State

In this section, we will demonstrate how lack of certainty can affect this simulation. We will use the single agent model to motivate the inclusion of noise into our internal Bayesian belief model.

In the next simulation the location of the treasure will change during the simulation to different random locations. This happens every turn with a probability of P(change) = 0.01. On average this should change the location every 100 turns. The behaviour of the agents is left unchanged.

First, let us again take a look at the simulation for a single agent. The performance ratio of the agent drops from 0.18 for the static world state simulation to a markedly lower value for the simulation where the world state changes. A closer analysis shows that the agent's original behaviour has problems dealing with the new scenario. Consider that the agent visits a location x, and finds it empty. Then the probability for T = x will be set to zero in \hat{T}. If the location now changes to T = x after the agent visited x, then the agent will first explore all other locations, finding all of them empty. This, in itself, is not problematic. But once the agent has looked at each location once, all probabilities are assumed to be zero, given that the agent still assumes there is one, non-moving treasure location. This is inconsistent with the basic properties of probabilities and is a result of the incorrect assumption about the immovability of the treasure location. In this specific implementation the agent then resorts to random search. This behaviour has, as we have seen, a lower performance rate, and therefore lowers the agent's overall performance.
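The world dynamics of this scenario amount to one extra step per turn; a minimal sketch (parameter names are ours):

```python
def maybe_move_treasure(treasure, n, rng, p_change=0.01):
    """With probability p_change, relocate the treasure uniformly at random."""
    if rng.random() < p_change:
        return int(rng.integers(n))
    return treasure
```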
Modelling Uncertainty

To address this problem we can change the internal model to correctly reflect probabilities from the agent's perspective. The treasure changes its location with a probability of P(change) = 0.01 and relocates to one of the locations at random. This can be modelled by assuming that the world is in one of two states. Either, with probability P(change), it is in a state where the location has just changed, so T should be uniformly distributed, with every t ∈ T having the probability P(T = t) = 1/n. The other state, with a probability of 1 − P(change), is the one where the treasure location remains unchanged, so the agent should continue to assume the distribution represented by its internal model \hat{T}. These two cases can be combined in a weighted sum to determine a new internal distribution \hat{T}'. The probability for every state t in this new distribution can be computed as

P(\hat{T}' = t) = P(change) \cdot \frac{1}{n} + (1 − P(change)) \cdot P(\hat{T} = t).   (8)

To model the uncertainty, this formula is applied to the agent's internal model each turn after it has completed its action. Note that this leaves the ordering of probabilities from the most likely to the least likely event intact, unless the probability of change is 1. Therefore, the single agent behaviour with modelled uncertainty performs just as well as the agent without it for a non-changing treasure location. But applying the above uncertainty model to a single agent in a world where the treasure location does change increases its performance compared to the agent without the uncertainty model.

The performance increases because, by modelling uncertainty, the agent retains in its internal model some information about the order in which it explored the previous locations. The location that was visited first and found empty subsequently had uncertainty applied to it nine times once the agent cleared the last, tenth location. It therefore has the largest probability to contain the treasure, and will be the first location to be visited again. This actually reflects the fact that this location is most likely to contain the treasure, since it is unclear when the treasure changed location.

This also shows why modelling the uncertainty works better than simply resetting the probabilities after all locations were visited and found empty. Resetting would also prevent the agent from having to use random search, but it would not preserve the information about the ordering of the previous search, which can be used to the agent's advantage.
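Eq. (8) likewise translates into a single mixing step, applied to the internal model after every action. A sketch under the same assumptions as before:

```python
def apply_uncertainty(belief, p_change=0.01):
    """Eq. (8): mix the belief with the uniform distribution to account
    for the possibility that the treasure has just moved."""
    n = belief.size
    return p_change * (1.0 / n) + (1.0 - p_change) * belief
```

Since this is a convex combination with the uniform distribution, it shrinks all probabilities towards 1/n without ever reordering them, which is exactly why the exploration order is preserved.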
Uncertainty and Social Bayesian Update

In this section, we examine a population where all agents model the change uncertainty and also perform the Social Bayesian Update. As a result, the agents' performance drops to 0.1, which is equivalent to chance. Closer analysis shows that the whole agent population is always exploring the same location, and the 0.1 average performance is simply the result of the treasure randomly moving to this location from time to time. The agent population here is subject to an information cascade that synchronizes the whole population. Compared to the all-social agent population with internal certainty, the agents cannot reliably rule out a given location, so after the initial agent breaks the symmetry, the repeated exposure to other agents' social signals will always override their own internal, uncertain beliefs. So, while the Social Bayesian Update is beneficial for agents in some cases, it turns out that it can be harmful, specifically when combined with a more accurate model of uncertainty. This is similar to how bounded rationality, i.e., the inability to internally represent certainty, facilitates convergence in social Bayesian network learning (Acemoglu et al., 2011). The difference here is that the lack of internal certainty is not caused by a limitation of the agent, such as the cost of internal representation, but motivated by an increase in performance resulting from a more exact modelling of the noise present in the environment.
Partial Observability
One way to reduce the probability of convergence is to reduce network connectivity (Gale and Kariv, 2003). Currently, the agents live in a neighbourhood of a fully connected graph, being able to observe all other agents. The next simulation has changing treasure locations and an all-social, internally uncertain agent population. Unlike in the previous models, only a fraction of the other agents' actions can be observed. Every time an agent takes an action, each other agent has a probability of p_o to observe this action and update its internal model. Whether an agent can observe a specific action is determined for each observing agent separately. This creates several simulations interpolating between two previously studied cases. If p_o = 0, then the model is identical to the non-social agent simulation, and if p_o = 1, then it is identical to one in which all agents can observe each other, which leads to a feedback loop and very bad performance ratios.
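In a simulation, this observation gating is a simple Bernoulli filter per (actor, observer) pair; a sketch with our own names:

```python
import numpy as np

def observers(n_agents, actor, p_o, rng):
    """Indices of agents that see this actor's action; each candidate
    observer draws independently with probability p_o."""
    sees = rng.random(n_agents) < p_o
    sees[actor] = False          # an agent does not observe itself
    return np.flatnonzero(sees)

# Each returned index would then run social_update on its own belief.
```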
Changing Observation Probability for all Agents

Varying the parameter p_o for all agents results in performance ratios as depicted in Fig. 2. As expected, the extremal points are characteristically similar in performance to the non-social and all-social models. In the case where no agents observe each other, the agents find the treasure on average 0.18 times per turn. The performance ratio increases as the chance to observe other agents increases, up to ≈ 30% observation probability, where the population reaches its peak performance ratio. Increasing the observation probability further, however, lowers the performance down to approximately 0.1 at an observation probability of 50% and above.
Figure 2: Average performance of an agent population, and the mutual information between its actions and the treasure location, depending on the probability to observe the actions of other agents.

The second graph (dotted red) in Fig. 2 is the mutual information between the agents' actions A and the treasure location T. We see that I(A;T) has the same value as for a non-social agent when the observation probability is zero; it then rises to its peak at an observation probability of ≈ 30%. The mutual information then decreases for larger observation chances, down to zero mutual information for values above 60%.
Changing Observation Probability for one Agent

If the observation probability is understood as the result of an agent's effort invested in observing others, then it could be treated as a behavioural parameter that the agent, or at least the process that governs the adaptation of agents, could control. This could be realized by deliberately degrading the agent's sensors to save resources, in the case of an adaptation process on the agent population, or by simply discarding some of the sensor input at random, if this is realized as an agent strategy. In this context, it makes sense to ask whether an individual agent could perform better than the rest of the population by unilaterally changing its probability to observe others.

Given that the actions of the remaining population provide a high degree of mutual information, it might be useful to obtain more of this information than others do. On the other hand, there were indications that taking in too much information from other agents might override the information from the non-agent environment, and thereby degrade the agent's performance. So deliberately lowering the social information intake might also improve the agent's performance compared to the rest of the population.

In the next simulation we look at one agent that can change its observation probability independently from the rest of the population. The observation probability for an agent determines how well it can see others, not how well it can be seen. That means that whenever this agent could observe another agent's action, its own observation probability is used to determine whether it can actually sense what action the other agent took. All other agents in the simulation have a fixed observation probability of 30%, since this was the value that led to the best performance for the overall population, and also encoded the most information.

Figure 3: Performance of a single agent, and the mutual information between this agent's actions and the treasure location, depending on the probability to observe the other agents in the population. All other agents observe each other with a probability of 30%.

In Fig. 3 we see the resulting performance ratio and mutual information I(A;T) for varying p_o for the one agent that can change its observation probability. Overall, the graph looks very similar to the previous graph in Fig. 2, where all agents changed their observation probability. The performance for the one agent is still optimal at an observation probability of ≈ 30%. Scaling down the observation probability to zero obviously leads to the same performance as for the non-social agent. Increasing the observation probability further also lowers the performance to approximately 0.1.

This is particularly interesting because, for this specific simulation, it creates something akin to a game-theoretic equilibrium at the 30% point. All other factors being equal, even if all agents could change their own observation probability at will, none of them could change it away from 30% without also decreasing their performance.
Relevant Information Analysis

So far, we have computed the mutual information between the agents' actions and the environment as a measure of how much information their collective actions provide about the state of the environment to an observer. We will now compare this mutual information to the actual relevant information for different performance levels. This will demonstrate that higher observation probabilities are characterized by the processing of information that is not necessary, indicating the perpetuation of false beliefs in the agent population.
RI(u) for the Treasure Hunter Model
The relevant information for the treasure hunter model is determined by the distribution of the treasure, encoded in T, and a specific agent's action distribution, encoded in A. Both random variables are defined over the same alphabet, which corresponds to all possible locations in the world.

As relevant information is a property of the environment, and not of a specific agent, it considers all possible strategies p(a|t), regardless of how any specific agent would acquire the information needed to actually implement such a strategy. To determine the value of RI(u) we have to answer the question: which joint distribution of A and T with a performance level of at least u has the lowest mutual information?

For our specific example of a world with ten locations we can compute the relevant information function as

RI(u) = \log(10) + \left( u \log(u) + (1 − u) \log\left(\frac{1 − u}{9}\right) \right).   (9)

Note that this function computes the minimal mutual information for being at a specific performance level u, not for having a strategy with at least the performance level u. However, looking at the actual function, which can be seen in Fig. 4, it becomes clear that the function is, for values of u above 0.1, strictly increasing. Therefore, the minimal mutual information for a specific performance level above 0.1 is also the actual relevant information needed to perform at least that well. The previous distinction is necessary because, in this case, it is necessary to process information to hold a performance level lower than 0.1. A performance of 0.1 can be achieved with a random strategy, and therefore has no relevant information. Eq. (9) reflects this, as it is zero for u = 0.1. For values of u lower than 0.1 the function in Eq. (9) computes values higher than zero, which would be the information necessary to actually perform at this level: one would have to actively avoid the treasure. But by the previous definition relevant information should return the information needed to attain at least a specific level, and since random performs better, and has no relevant information, all performance levels below u = 0.1 have zero relevant information.

The data points plotted in Fig. 4 are taken from the two previous simulations: those where all agents changed their observation probability, and those where only one agent changed its observation probability while all other agents had an observation probability of 30%. Each point is the combination of the mutual information I(A;T) and the achieved performance ratio for a specific percentage of observation probability. Different observation probabilities result in different strategies, i.e., different conditional probabilities P(A|T).
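Eq. (9) can be checked numerically against the mixture strategies from the earlier sketch (reusing strategy_information and p_r from above; the generalization from 10 to n locations is our own):

```python
import numpy as np

def relevant_information(u, n=10):
    """Eq. (9), generalized to n locations: minimal I(A;T) in bits needed
    to hit the treasure with probability u (zero for u <= 1/n)."""
    if u <= 1.0 / n:
        return 0.0
    return (np.log2(n) + u * np.log2(u)
            + (1 - u) * np.log2((1 - u) / (n - 1)))

# The minimizer picks the treasure location with probability u and is
# uniform over the rest; its mutual information matches Eq. (9).
for u in (0.18, 0.5, 0.9):
    strategy = np.full((10, 10), (1 - u) / 9)
    np.fill_diagonal(strategy, u)
    print(u, relevant_information(u), strategy_information(strategy, p_r))
```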
Figure 4: Relevant Information trade-off curve (black line) and points indicating the mutual information and performance for different observation probabilities.

The data points gathered here are, as expected, all above or on the RI trade-off curve. The pattern of values is very similar for both simulations. For an observation probability of 0.0 the data point is located at a performance of 0.18, and actually on the trade-off curve. As the observation probability increases, so does the performance. The strategies remain on the trade-off curve at the lower percentages of observation probability, and since the trade-off curve is strictly increasing, the encoded relevant information grows as well.

As the observation probability increases further, we see that the resulting data points leave the trade-off curve, which means the resulting strategies encode more mutual information about the environment than is necessary. The strategies resulting from further increases in observation probability are located in the upper loop, where they gravitate towards a point of no mutual information and a performance of 0.1. This indicates that they, too, encode more information about the environment than necessary.

Comparing the mutual information in the actual strategies to the actual relevant information illustrates how observing more and more agents leads to processed information which might not necessarily be relevant. The strategies with low observation probability are located on the relevant information trade-off curve, meaning they are efficient in the sense that they do not process non-relevant information. Those strategies which are subject to the information cascade, on the other hand, do display a lot of information about the environment in their actions which is non-relevant. At the same time, as seen here, their performance diminishes as well. Fortunately for the agent population, the point where agents display the most relevant information about the environment is also close to the point where the agents perform best, so it would be possible for an agent population which could adjust its observation probability to stabilize at a point which benefits all agents the most.

Conclusion
Our results indicate that a noisy internal representation is an important factor for the convergence of information cascades, specifically those where the agents perpetuate information that leads to wrong internal beliefs, since the agent cannot, with certainty, reject certain social information. In general, the problem arises in scenarios where signals gained from other agents overpower the agent's private observations and are not as independently generated as the Naive Bayesian Update models them. On the other hand, the information from other agents is also helpful, and can improve an agent's performance in our model. The interesting observation here is that both effects can be influenced by how many other agents an agent randomly observes. Too little, and the agent loses the social information; too much, and the agent population will converge, but possibly on the wrong belief.

Our relevant information analysis also shows how the quality of the information suffers when more and more agents observe each other. For very low observation probabilities, all the information processed is relevant, and agents only display relevant information in their actions. When the agents observe more, their performance gets better still, but we see that they start to pass on information that is incorrect and perpetuate it, sometimes leading to false convergences. As the "good" relevant information is still improving, this unnecessary information seems acceptable, but if we increase the observation chance even further, then we see that the performance suffers and the information provided by the agents is mostly wrong.

In our model, however, there exists a point where agents both perform optimally and provide the most information, so a population of agents could adapt to a strategy where they discard a certain percentage of their observations, and perform well. In this case, the agents would basically determine the observation network of the model themselves. The exact parameter of how many of one's observations one should discard is, of course, model dependent. For example, if the number of agents increases, then it likely takes more observations for total convergence, but a lower observation probability could be sufficient to provide enough social information to overpower the agent's internal beliefs. This is interesting if this is seen as a model for fads and fashions. If an agent, adapted to a population with a specific degree of connectivity, adapts to discard a certain percentage of social information, and is then transplanted to another population with different parameters, it might become much more susceptible to false self-perpetuating beliefs. The same is true for a population of agents that manages to change its environment in a way that radically changes how much the agents can observe each other.
Acknowledgements
This research was supported by the European Commission as part of the CORBYS (Cognitive Control Framework for Robotic Systems) project under contract FP7 ICT-270219. The views expressed in this paper are those of the authors, and not necessarily those of the consortium.
References
Acemoglu, D., Dahleh, M., Lobel, I., and Ozdaglar, A. (2011). Bayesian learning in social networks. The Review of Economic Studies, 78(4):1201–1236.

Banerjee, A. (1992). A simple model of herd behavior. The Quarterly Journal of Economics, 107(3):797–817.

Bikhchandani, S., Hirshleifer, D., and Welch, I. (1992). A theory of fads, fashion, custom, and cultural change as informational cascades. Journal of Political Economy, pages 992–1026.

Easley, D. and Kleinberg, J. (2010). Networks, Crowds, and Markets: Reasoning About a Highly Connected World. Cambridge University Press.

Gale, D. and Kariv, S. (2003). Bayesian learning in social networks. Games and Economic Behavior, 45(2):329–346.

Kao, A. B. and Couzin, I. D. (2014). Decision accuracy in complex environments is often maximized by small group sizes. Proceedings of the Royal Society B: Biological Sciences, 281(1784):20133305.

Polani, D. (2009). Information: Currency of life? HFSP Journal, 3(5):307–316.

Polani, D., Martinetz, T., and Kim, J. T. (2001). An information-theoretic approach for the quantification of relevance. In ECAL '01: Proceedings of the 6th European Conference on Advances in Artificial Life, pages 704–713, London, UK. Springer-Verlag.

Polani, D., Nehaniv, C. L., Martinetz, T., and Kim, J. T. (2006). Relevant information in optimized persistence vs. progeny strategies. In Artificial Life X: Proceedings of the Tenth International Conference on the Simulation and Synthesis of Living Systems, pages 337–343. The MIT Press (Bradford Books).

Salge, C. and Polani, D. (2011). Digested information as an information theoretic motivation for social interaction. Journal of Artificial Societies and Social Simulation, 14(1):5.

Shannon, C. E. (1948). A mathematical theory of communication. Bell Systems Technical Journal, 27:379–423.

Surowiecki, J. (2005). The Wisdom of Crowds. Anchor.

Wang, X. R., Miller, J. M., Lizier, J. T., Prokopenko, M., and Rossi, L. F. (2012). Quantifying and tracing information cascades in swarms.