Learning in a Small/Big World*

Benson Tsz Kin Leung†

November 16, 2020
Abstract
Savage (1972) lays down the foundation of Bayesian decision theory but asserts that it is not applicable in big worlds where the environment is complex. Using the theory of finite automata to model belief formation, this paper studies the characteristics of optimal learning behavior in small and big worlds, where the complexity of the environment is low and high, respectively, relative to the cognitive ability of the decision maker. Confirming Savage's claim, optimal learning behavior is close to Bayesian in small worlds but significantly different in big worlds. In addition, in big worlds, optimal learning behavior could exhibit a wide range of well-documented non-Bayesian learning behavior, including the use of heuristics, correlation neglect, persistent over-confidence, inattentive learning, and other behaviors of model simplification or misspecification. These results establish a clear and testable relationship among the prominence of non-Bayesian learning behavior, complexity, and cognitive ability.
Keywords:
Learning, Bounded Memory, Bayesian, Complexity, Cognitive Ability
JEL codes:
D83, D91

* I am grateful to Jacques Crémer, Matthew Elliott, Renato Gomes, Philippe Jehiel, Hamid Sabourian, Mikhael Safronov, Larry Samuelson, Tak-Yuen Wong and the audiences at various seminars and conferences for their insightful discussions and comments.
† University of Cambridge. Email: [email protected], [email protected].

1 Introduction
Many experimental and empirical studies have documented different behaviors of belief formation that systematically depart from the Bayesian model, e.g., the use of heuristics (Kahneman et al. (1982)), correlation neglect (Enke and Zimmermann (2019)), persistent over-confidence (Hoffman and Burks (2017)), inattentive learning (Graeber (2019)), etc. Informally, these departures from the Bayesian model are often attributed to the complexity of employing the Bayes rule. However, to the best of my knowledge, no study formally analyzes how the complexity of an inference problem affects individuals' learning behaviors. Are "anomalies" less prominent in less complicated problems? How do learning behaviors change with the complexity of the inference problems? This paper aims to answer these questions and explain different "abnormal" learning behaviors in light of complexity.

Every day we form beliefs over many issues to guide our decision making, from predicting the weather and deciding whether to go out with an umbrella, looking for chocolate in our favorite supermarket, to estimating and preparing for the impact of Brexit. Some problems are trivial and some are complicated. Intuitively, given our limited cognitive ability, the complexity of the inference problem should affect the way that we form beliefs. After several trips to the same supermarket, we would be fairly sure about where to look for chocolate, but even after collecting numerous data points about the stock market, we rely on simple heuristics and often make mistakes in our investment decisions. We are also more likely to disagree on complicated problems, e.g., the impact of Brexit or global warming, but agree on simpler problems, e.g., whether it is raining. Moreover, different individuals perceive the complexity of a problem differently and are likely to form beliefs in different ways. A leading macroeconomist would estimate economic growth differently compared to an ordinary citizen, and they are likely to disagree with each other even after observing the same information.

To study the relationship between learning and complexity, I analyze a simple model to compare the optimal learning behavior of an individual in small and big worlds. The terms "small worlds" and "big worlds" are inspired by the seminal work of Savage (1972). In his book, Savage develops the foundation of Bayesian decision theory and asserts that it is only applicable in small worlds but not in big worlds. The term "big worlds" refers to complicated inference problems where it is difficult for individuals to form a prior belief on states and signal structures, or even to construct the state space. Given this distinction between small and big worlds, different studies have developed decision theories to model rational learning and decision making in big worlds (Savage (1972), Gilboa and Schmeidler (1995, ...)).

[Footnote: There are plenty of examples. See, for example, the seminal work of Kahneman et al. (1982), Kahneman (2011) and Section 3 of the review article Rabin (1998).]
[Footnote: See, for example, Section 3.7.1 of Forbes et al. (2015) for the prominence of, and doubts about, the use of technical charting heuristics in the financial market.]
In contrast, this paper defines small and big worlds based on the complexity of the inference problem relative to the cognitive ability of individuals, and thus sheds light on the heterogeneity of learning behavior among individuals.

More specifically, I consider a decision maker (DM) who tries to learn the true state of the world from a finite state space, where the number of possible states N measures the complexity of the inference problem. In each period t = 1, ..., ∞, the DM guesses what the true state of the world is and gets a higher utility if he makes a correct guess than otherwise. In each period after making a guess, he receives a signal and updates his belief. To model limited cognitive ability, I assume that the DM's belief is confined to an M-sized automaton that captures bounded memory, as in the seminal work of Hellman and Cover (1970). The DM's "belief" is confined to one of M memory states, and a belief updating mechanism specifies a (potentially stochastic) transition rule that determines how he updates his belief from one memory state to another given the signal he receives, and a (potentially stochastic) decision rule that determines his guess given his memory state. In contrast to the Bayesian model, the DM has a coarser idea of the likelihood of different states of the world, and the coarseness decreases in M. That is, M measures the cognitive ability of the DM. I define small worlds as cases where N/M is close to 0 or in general very small. In contrast, big worlds refer to cases where N/M is bounded away from 0. Thus, whether a problem is a small or big world depends on the relative complexity of the world with respect to the individual's cognitive ability.

I compare the characteristics of the optimal updating mechanisms that maximize the asymptotic utility of the DM in small and big worlds. The results are summarized in Table 1.

[Footnote: Intuitively, if individuals have super cognitive ability, any complicated problem would look similar to a small problem.]
[Footnote: See Compte and Postlewaite (2012), Wilson (2014), Monte and Said (2014), Basu and Chatterjee (2015), Chauvin (2019) and Chatterjee and Sabourian (2020) in the economics literature for models of belief updating and the aversion to complexity with finite automata. See also Oprea (2020) and Banovetz and Ryan (2020) for experimental evidence.]
[Footnote: The Bayesian analogue of this bounded memory setting would be that the set of memory states equals the probability simplex and the transition rule is given by the Bayesian formula.]
[Footnote: Note that if the individual tracks his belief not with a finite automaton but with a real number statistic, the cardinality of the belief statistics is much larger than N, and the model collapses to a Bayesian model.]

                                      Small Worlds:              Big Worlds:
                                      low complexity relative    high complexity relative
                                      to cognitive ability       to cognitive ability
 Is learning close to Bayesian?       Yes                        No
 Could ignorance in learning          No                         Yes
 be "optimal"?
 Could disagreement be persistent?    No                         Yes

Table 1: Differences in learning behaviors in small/big worlds
First, I analyze how learning differs from the Bayesian benchmark in small and big worlds. This sheds light on whether and under what circumstances the Bayesian model serves as a good approximation of learning behavior. I show that asymptotic learning behavior is very close to Bayesian in small worlds, i.e., the DM with bounded memory almost always makes the same guess as a Bayesian individual; while in big worlds, asymptotic learning is significantly different from Bayesian, and the DM is bound to make mistakes. Moreover, he makes more mistakes when the world is bigger, i.e., when N/M increases.

From the first result mentioned in the previous paragraph, we know that learning is different from the Bayesian model in big worlds. But is it simply a noisy version of the Bayesian model or does it resemble some of the well-documented biases? To answer this question, the second result of this paper shows that in big worlds, it could be optimal for the DM to never guess some states, as he focuses on a subset of states given his scarce cognitive resources. In contrast, such "ignorant" behavior is never optimal in small worlds, as the DM has plenty of cognitive resources to learn with a small state space. This ignorance in learning encompasses different well-documented learning biases, including the use of heuristics, correlation neglect, persistent over-confidence, inattentive learning, and other behaviors of model simplification and misspecification.

To see this, consider the phenomenon of persistent over-confidence (Hoffman and Burks (2017), Heidhues et al. (2018)). Suppose that the state of the world comprises the DM's ability and the ability of his teammate, where both could be high or low. In the current setting, persistent over-confidence occurs when the DM never guesses the states where his ability is low, and thus behaves as if he always believes he has high ability and only updates his belief about his teammate's ability, even after observing a long sequence of bad team performance. Similarly, other systematic learning biases could also be modeled as the individual ignoring some states of the world. Such ignorant behavior could be optimal in big worlds when N/M is bounded away from 0, but in small worlds, the DM would eventually realize that he has a low ability.

Last, I analyze whether disagreements are persistent in small and big worlds. As learning behaviors are close to the Bayesian model in small worlds, intuitively disagreement should not persist there, while in big worlds individuals could be bound to disagree asymptotically.

[Footnote: I provide two examples, with the availability heuristic in Section 3.1 and inattentive learning in Section 7.]

This paper is organized as follows. In the next section, I briefly discuss how this paper relates to the literature. Section 3 presents the model. I analyze the optimal learning behavior in small and big worlds in Sections 4 and 5, respectively. Section 6 shows an extension of the model. Finally, in Section 7, I conclude by presenting a discussion of the results. The proofs and omitted results are presented in the Appendix.

2 Literature Review

In this section, I discuss the existing literature and the contribution of this paper. This paper contributes to a growing theoretical literature that explains behavioral anomalies as optimal/efficient strategies in light of limited cognitive ability.
Sims (2003) and Matějka and McKay (2015) study the implications of rational inattention and show that it explains sticky prices in the market and micro-founds the multinomial logit choice model. Steiner and Stewart (2016) show that an optimal response to noise in perceiving the details of lotteries leads to the phenomenon of probability weighting in prospect theory (Kahneman and Tversky (1979)), and Jehiel and Steiner (2020) and Leung (2020) show that a capacity constraint on the number of signals with which individuals can update their belief drives confirmation bias and other biases in belief formation.

The closest research to this paper is Hellman and Cover (1970) and Wilson (2014), who study hypothesis testing and belief formation, respectively, using finite automata in a setting where N = 2. Hellman and Cover (1970) characterize the optimal design of the automaton, while Wilson (2014) shows that the optimal automaton exhibits confirmation bias. This paper differs from the two papers in that, as they focus on the case of N = 2, they are unable to discuss how complexity (changes in N) affects learning behavior.

[Footnote: It is also interesting to note that under the setting of rational inattention, as shown in Matějka and McKay (2015), the individual never ignores any state if its prior probability is strictly positive. In other words, ignorant behavior is never optimal and disagreement never happens with certainty under the setting of rational inattention.]

Moreover, the existing literature proposes different explanations for different behavioral phenomena, while this paper explains a wide range of biases under the same framework: the efficient allocation of cognitive resources in the face of complexity. In particular, the optimal learning strategy in the face of complexity could resemble a large set of ignorant learning behaviors, such as the use of heuristics (Tversky and Kahneman (1973)), correlation neglect (Enke and Zimmermann (2019)), inattentive learning (Graeber (2019)), persistent over-confidence (Hoffman and Burks (2017)) and other model simplifications and misspecifications. The results in this paper also micro-found the assumptions in models of bounded rationality, including misguided learning (Heidhues, Kőszegi and Strack (2018)) and analogy-based equilibrium (Jehiel (2005)), and support the theory of efficient heuristics in the psychology literature (Gigerenzer and Goldstein (1996) and Gigerenzer and Gaissmaier (2011)). The results generate testable predictions of the prominence of these behavioral phenomena: the prominence of these behavioral anomalies should increase in the complexity of the inference problem and decrease in the cognitive ability of individuals, which is supported in Enke and Zimmermann (2019) and Graeber (2019).

It is also worth contrasting this paper with the literature on rational inattention, which offers a different model to capture limited ability. First, the theory of finite automata has been used in the existing economics literature to model the aversion to complexity (see Chatterjee and Sabourian (2020) for a review), which also receives support in experimental studies (Oprea (2020) and Banovetz and Ryan (2020)). As this paper aims to discuss the effect of complexity, it is thus natural to model limited cognitive ability using a finite automaton, as it measures individuals' ability to handle complexity. In contrast, rational inattention captures the inability to acquire information.
Second, as shown in Matějka and McKay (2015), in the setting of rational inattention, individuals never ignore any states of the world; thus, the model cannot account for any ignorant learning behaviors. Moreover, as I show in Appendix E, in the setting of this paper, it could be optimal for individuals to ignore some states even in a symmetric environment when it is sufficiently complex, while in the setting of rational inattention, the distribution of actions is necessarily symmetric.

Last, this paper's results on asymptotic disagreement contribute to the large literature that explains the phenomenon. In the existing literature, asymptotic disagreement is driven by differences in signal distributions across states or differences in learning mechanisms (Morris (1994), Mailath and Samuelson (2020), Gilboa, Samuelson and Schmeidler (2020)), the lack of identification or uncertainty in signal distributions (Acemoglu, Chernozhukov and Yildiz (2016)), confirmation bias (Rabin and Schrag (1999)), or model misspecification (Berk (1966), Freedman (1963, 1965)). Differently, this paper looks into the connection between limited ability and disagreement, and shows when asymptotic disagreement could arise and when it will not occur, depending on the relative complexity of the inference problem. Moreover, I show in Proposition 4 that disagreement could arise solely because of differences in cognitive abilities, even when two individuals share the same prior without model misspecification, perceive signals in the same way, adopt an (almost) optimal updating mechanism, and observe a large amount of public information. To the best of my knowledge, this novel mechanism by which disagreement can be driven by differences in cognitive ability has not been studied in the literature.

3 The Model

I consider a world with N possible true states of the world, i.e., ω ∈ Ω = {1, 2, ..., N}, and a decision maker (DM) who wants to learn the true state. In each period t = 1, ..., ∞, the DM tries to guess what the true state is. Formally, in each period t, the DM takes an action a_t ∈ A = Ω and, for ease of exposition, I assume he gets utility u(a_t, ω) = u^ω > 0 if a_t = ω and 0 otherwise. The results hold for more general utility functions, as long as a correct guess yields strictly higher utility than an incorrect guess, i.e., u(a, ω) > u(a′, ω) for all a′ ≠ a = ω. Moreover, I define ū = max_{ω∈Ω} u^ω and u̲ = min_{ω∈Ω} u^ω. The (potentially subjective) prior belief of the DM is denoted as (p^ω)_{ω=1}^N, where Σ_{ω=1}^N p^ω = 1 and p^ω > 0 for all ω ∈ Ω.

In each period after taking an action, the DM receives a signal s_t ∈ S that is independently drawn across different periods from a continuous distribution with p.d.f. f_ω in state ω. I assume that no signal perfectly rules out any state of the world: there exists ς > 0 such that

    f_ω(s) / f_ω′(s) > ς    for all s ∈ S and all ω, ω′ ∈ Ω.    (1)

[Footnote: Roughly speaking, the results in small worlds hold as the DM's optimal behavior is so close to Bayesian that he almost always takes the correct action. In big worlds, denote u^ω_min = min_{a′ ≠ a = ω} [u(a, ω) − u(a′, ω)] for all ω; as the infimum utility loss must be weakly greater than that of a revised game in the baseline model where u^ω = u^ω_min for all ω, Proposition 2 holds with more general utility functions.]
[Footnote: The order, i.e., whether the DM receives a signal before or after taking an action in each period, does not affect the result.]
[Footnote: The crucial assumption is that the action chosen by the DM in each period depends only on his memory state, but does not (directly) depend on the signals that he received.]
[Footnote: For ease of exposition, I assume that signals follow a continuous distribution, but the results hold with more general probability measures.]

Moreover, there exist no ω and ω′ ≠ ω such that f_ω(s) = f_ω′(s) for (almost) all s ∈ S. This ensures that states are identifiable. Given this assumption of identifiability, in a standard Bayesian setting, the DM learns almost perfectly the true state as t becomes very large. In contrast, I focus on the bounded memory setting that I now describe.

The DM is subject to a memory constraint such that he can only update his belief using an M memory state automaton. That is, in each period, his belief is represented by a memory state m_t ∈ {1, 2, ..., M}. Upon receiving a signal s_t in period t, the DM updates his belief from memory state m_t to m_{t+1} ∈ {1, 2, ..., M}. An updating mechanism specifies a (potentially stochastic) transition function between the M memory states given a signal s ∈ S, which is denoted as T : M × S → △M, and a (potentially stochastic) decision rule d : M → △A. That is, given that the DM is in memory state m and receives signal s, he takes action d(m) and transits to memory state T(m, s).

The timeline of a given period t is summarized in Figure 1. Note that the DM does not observe his utility after taking an action; thus, u^ω is best interpreted as an intrinsic utility of being correct. Otherwise, the problem becomes trivial, as the DM would learn the true state perfectly after observing a positive utility. This paper analyzes the asymptotic learning of the DM, i.e., the DM aims to choose an updating mechanism that maximizes his expected long-run per-period utility:

    E[ lim_{T→∞} (1/T) Σ_{t=1}^T u(a_t, ω) ].

An example of an updating mechanism.
Consider a simple example where there are two possible states, i.e., N = 2, and four memory states, i.e., M = 4. The DM has a uniform prior belief p¹ = p² = 0.5.
For simplicity, assume that all signals are informative, i.e., f₁(s) ≠ f₂(s) for all s. Figure 2 shows a simple updating mechanism. The DM moves one memory state higher when he receives a signal supporting state 1, where f₁(s) > f₂(s), and moves one memory state lower otherwise. Moreover, his decision rule is such that he chooses action 1 if and only if he is in memory state 3 or 4. With bounded memory, instead of tracking his belief in the segment [0, 1], the DM perceives that state 1 (resp. state 2) is "very likely" if he is in memory state 4 (resp. memory state 1), and "likely" if he is in memory state 3 (resp. memory state 2). When M increases, the DM has more memory states to track beliefs, and he would have a finer idea of the likelihood of states 1 and 2. Moreover, by changing the transition function, the DM essentially changes the way he learns. For example, if the DM moves one memory state higher only when he receives a signal that strongly supports state 1 (e.g., f₁(s) > 2 f₂(s)) but moves one memory state lower otherwise, he is more likely to be in memory states 1 and 2 and thus is "biased" towards action 2. □

[Footnote: See Blackwell and Dubins (1962).]
[Footnote: Note that the updating mechanism (T, d) is restricted to be stationary across all t = 1, ..., ∞ to capture the idea of bounded memory. As discussed in Hellman and Cover (1970), a non-stationary updating mechanism implicitly assumes the ability to memorize time, and thus implicitly assumes a larger memory capacity.]
[Footnote: Switching between multiple M memory state automatons requires more than M memory states, as illustrated in Appendix A. Appendix A also shows an example to illustrate that the current setting allows switching between smaller automatons, e.g., three automatons with (M+2)/3 memory states.]

Starts at memory state m_t → takes action a_t ∼ d(m_t) → receives signal s_t → transits to memory state m_{t+1} ∼ T(m_t, s_t).

Figure 1: Timeline at period t given an updating mechanism (T, d).

[Figure 2 here: memory states 1, 2, 3, 4 arranged in a line, with transitions one state up on signals in S₁ and one state down on signals in S₂.]

Figure 2: A simple updating mechanism with N = 2 and M = 4. S₁ denotes the set of signals where f₁(s) > f₂(s) and S₂ denotes the other signals. The DM believes that state 1 is more likely as he moves towards the higher memory states. The DM's actions follow the following rules: d(1) = d(2) = 2 and d(3) = d(4) = 1.

Given state ω ∈ Ω, the sequence m_t, together with some specified initial memory state, forms a Markov chain over the memory states. The transition function T can be represented by a transition matrix between the memory states given a signal realization s ∈ S. Denote q_{mm′}(s) = Pr{T(m, s) = m′} as the transition probability from memory state m to memory state m′ given a signal s; the transition matrix given a signal s is then

    Q(s) = [Pr{T(m, s) = m′}] = [q_{mm′}(s)]    (2)

for m, m′ = 1, 2, ..., M, where Σ_{m′=1}^M q_{mm′}(s) = 1 and q_{mm′}(s) ≥ 0 for all m, m′, s. Taking the expectation over s, the transition probability matrix under state ω is

    Q_ω = ∫ Q(s) f_ω(s) ds.    (3)

Denote µ^ω_m as the stationary probability that the DM is in memory state m when the true state of the world is ω; the stationary probability distribution over the memory states, µ^ω = (µ^ω_1, µ^ω_2, ..., µ^ω_M)^T, is the solution to the following system of equations:

    (µ^ω)^T = (µ^ω)^T Q_ω.    (4)

Given the stationary probability distribution over the memory states, the conditional probability of state ω given that the DM is in memory state m equals p^ω µ^ω_m / Σ_{ω′∈Ω} p^{ω′} µ^{ω′}_m.
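To make the objects in equations (2)-(4) concrete, the following sketch computes the stationary distributions and the conditional state probabilities for the Figure 2 automaton. It is a minimal illustration, not part of the paper's analysis: the signal is collapsed to the binary event s ∈ S₁ versus s ∈ S₂, and the probabilities F_ω(S₁) assumed in the code are chosen only for illustration.

```python
import numpy as np

# Minimal sketch of equations (2)-(4) for the Figure 2 automaton (N = 2, M = 4).
# The signal is collapsed to "s in S_1" vs "s in S_2", so the expected transition
# matrix Q_w depends only on F_w(S_1); the values below are assumed.
F_S1 = {1: 0.7, 2: 0.3}      # F_w(S_1): probability of an "up" signal in state w
prior = {1: 0.5, 2: 0.5}
M = 4

def transition_matrix(p_up):
    """Q_w as in equation (3): move one memory state up on S_1, one state down
    on S_2, staying put at the two boundary memory states."""
    Q = np.zeros((M, M))
    for m in range(M):
        Q[m, min(m + 1, M - 1)] += p_up
        Q[m, max(m - 1, 0)] += 1.0 - p_up
    return Q

def stationary(Q):
    """Solve mu = mu Q (equation (4)) via the left eigenvector of eigenvalue 1."""
    vals, vecs = np.linalg.eig(Q.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return v / v.sum()

mu = {w: stationary(transition_matrix(F_S1[w])) for w in (1, 2)}

# Conditional probability of state 1 in memory state m:
# p^1 mu^1_m / (p^1 mu^1_m + p^2 mu^2_m)
for m in range(M):
    denom = sum(prior[w] * mu[w][m] for w in (1, 2))
    print(f"memory state {m + 1}: Pr(state 1) = {prior[1] * mu[1][m] / denom:.3f}")
```

Under these assumed numbers, the conditional probability of state 1 rises monotonically from memory state 1 to memory state 4, matching the interpretation of the higher memory states as stronger confidence in state 1.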
Thus it would be optimal for the DM to choose d*(m, T) = argmax_{ω′} u^{ω′} p^{ω′} µ^{ω′}_m. With a slight abuse of notation, I often use action ω to denote action a = ω. Given this optimal decision rule, I often refer to the transition function T as the updating mechanism, implicitly assuming d(m) ∈ d*(m, T). Furthermore, without loss of generality, unless stated otherwise, I restrict attention to deterministic decision rules throughout the paper.

Denote M^ω as the set of memory states in which the DM takes action ω. The asymptotic utility, or the long-run per-period utility, of an updating mechanism T is equal to

    U(T) = Σ_{ω=1}^N [ u^ω p^ω ( Σ_{m∈M^ω} µ^ω_m ) ]    (5)

and the asymptotic utility loss of an updating mechanism T is equal to

    L(T) = Σ_{ω=1}^N [ u^ω p^ω ( 1 − Σ_{m∈M^ω} µ^ω_m ) ].    (6)

The DM maximizes the asymptotic utility or, equivalently, minimizes the asymptotic utility loss associated with the updating mechanism. In the paper, I mostly refer to the optimal design of the updating mechanism as the minimization of L. In general, by arguments similar to those in Hellman and Cover (1970), an optimal T may not exist. Therefore, the rest of the paper focuses on ε-optimal updating mechanisms, which are defined as follows. Denote L*_M = inf_T L(T), i.e., the infimum asymptotic utility loss given a memory capacity M. An updating mechanism T is ε-optimal if and only if L(T) ≤ L*_M + ε.

In this model, N represents how complicated the world is, and M represents the cognitive resources/ability of the DM. This gives a natural definition of small and big worlds.

[Footnote: This is true because signals are continuously distributed. Roughly speaking, consider N = M = 2, and suppose that the DM could improve his asymptotic utility by switching from m = 1 to m = 2 with signals that strongly support state 1, i.e., with a large likelihood ratio F₁(S)/F₂(S). For any set of signal realizations S with strictly positive measures F₁(S), F₂(S), the DM can always find another set of signal realizations S′ with strictly positive measures F₁(S′) and F₂(S′) and a higher likelihood ratio F₁(S′)/F₂(S′), and thus improve his asymptotic utility.]
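Continuing in the same spirit, the sketch below evaluates the asymptotic utility (5) and loss (6) for the Figure 2 automaton with the decision rule d(1) = d(2) = 2 and d(3) = d(4) = 1. For this chain the stationary distribution has a closed form by detailed balance, µ^ω_{m+1}/µ^ω_m = F_ω(S₁)/F_ω(S₂), which the code uses directly; the signal probabilities are again illustrative assumptions.

```python
# Self-contained sketch of U(T) and L(T) (equations (5)-(6)) for the Figure 2
# automaton. Detailed balance for this birth-death chain gives mu^w_m
# proportional to r^(m-1) with r = F_w(S_1) / F_w(S_2); the numbers are assumed.
F_S1 = {1: 0.7, 2: 0.3}
prior = {1: 0.5, 2: 0.5}
u = {1: 1.0, 2: 1.0}
M = 4

def stationary(p_up):
    r = p_up / (1.0 - p_up)
    weights = [r ** m for m in range(M)]
    total = sum(weights)
    return [x / total for x in weights]

mu = {w: stationary(F_S1[w]) for w in (1, 2)}
M_w = {1: [2, 3], 2: [0, 1]}   # zero-indexed: action 1 in states 3-4, action 2 in states 1-2

U = sum(u[w] * prior[w] * sum(mu[w][m] for m in M_w[w]) for w in (1, 2))
L = sum(u[w] * prior[w] * (1.0 - sum(mu[w][m] for m in M_w[w])) for w in (1, 2))
print(f"U(T) = {U:.3f}, L(T) = {L:.3f}")   # U + L = sum of u^w p^w = 1 here
```

By construction, U(T) + L(T) = Σ_ω u^ω p^ω, so minimizing the loss is the same as maximizing the utility, as stated above.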
N N = O ( M h ) where h < N = O ( M h ) where h ≥ N is much smallerthan the cognitive ability of the DM M , and is a big world problem otherwise. The formal definition is as follows:
Definition 1.
The inference problem is a small world when N/M is close to 0 and is a big world when N/M is bounded away from 0.

As N is finite in this baseline model, learning behaviors in small worlds refer to the characteristics of the ε-optimal updating mechanisms at the limit where M → ∞, and big worlds refer to the cases where M is bounded above.

In Section 6, I analyze an extension of this baseline model where N → ∞. The extension not only poses a technical challenge but also has valuable economic implications. In particular, it shows that both complexity and cognitive ability affect learning behavior: the behavioral implications depend on N/M instead of the absolute value of M. I present the setup and results in detail in Section 6, but it is helpful to briefly discuss the extension here in order to fix ideas. More specifically, I analyze a sequence of inference problems {(u^ω_N, p^ω_N, f^ω_N)_{ω=1}^N}_{N=1}^∞ and characterize the ε-optimal updating mechanisms at the limit where N, M → ∞. In the extension, whether a sequence of inference problems is a small or big world at the limit depends on the divergence speeds of N and M. When N goes to infinity slower than M, at the limit N/M → 0 and the sequence of inference problems is a small world; when N goes to infinity weakly faster than M, at the limit N/M is bounded away from 0 and the sequence of inference problems is a big world. The results in the extension are qualitatively the same as the results in the baseline model. The definitions of small and big worlds are summarized in Table 2.

In the subsequent sections, I compare the learning behaviors, that is, the characteristics of the ε-optimal updating mechanisms (for small ε), in small and big worlds by answering the following questions.

[Footnote: It is important to define small/big worlds based on the relative size of the state space with respect to the cognitive ability of the DM, instead of the absolute size of the state space. In particular, if the DM can track his belief using the real number space, i.e., M is uncountably infinite, then the model collapses to a standard Bayesian model and any inference problem with a countable state space is a small world problem.]
[Footnote: Note that it is difficult to derive comparative statics of learning behavior on N, as when N changes, prior beliefs, and thus the importance of identifying different states, also change.]

Is learning close to Bayesian? The first question I ask is whether the learning behavior of the DM is close to that of a Bayesian individual in small and big worlds. The answer to this question sheds light on how well the Bayesian model approximates individuals' learning and decision making in different problems, and how robust the theoretical results in the literature are to the setting of bounded memory. As we know, in the current setup, a Bayesian individual will (almost) perfectly learn the true state of the world asymptotically, no matter how big N is. Thus, this question is equivalent to whether L*_M is close to 0 in small and big worlds.

Definition 2.
The asymptotic learning behavior of the DM is close to Bayesian in small worlds if and only if lim_{M→∞} L*_M = 0, and is close to Bayesian in big worlds if and only if L*_M = 0.

Is ignorance (close to) optimal?
The behavioral economics and psychology literature has documented different types of ignorant learning behaviors, including the use of heuristics (Kahneman, Slovic and Tversky (1982)), correlation neglect (Enke and Zimmermann (2019)), persistent over-confidence (Hoffman and Burks (2017), Heidhues, Kőszegi and Strack (2018)), ignorance of the informational content of others' strategic behaviors (Eyster and Rabin (2005), Jehiel (2005, 2018)), ignorance of relevant variables (Graeber (2019)), and other behaviors of model simplification or misspecification. These ignorant behaviors are sometimes thought to be naive or inattentive, but are sometimes viewed as efficient given our limited ability (Gigerenzer and Goldstein (1996), Gigerenzer and Brighton (2009), Gigerenzer and Gaissmaier (2011)).

As applied to this setting, these ignorant learning behaviors are equivalent to updating mechanisms where the DM never chooses some action. As I assume a one-to-one mapping from actions to states, the DM behaves as if he ignores some states of the world. To see this, consider the classic example of the availability heuristic. In the experiment of Tversky and Kahneman (1973), the majority of participants reported that there were more words in the English language that start with the letter K than words for which K is the third letter, while the correct answer is the reverse. The proposed explanation is that individuals use the availability heuristic: they pay attention only to the easiness of recalling the two types of words and ignore the fact that it is easier to recall words starting with K than words with K as the third letter. Define ω₁ as the event where there are more words that start with K and ω₁′ as the event where there are more words with K as the third letter, and define ω₂ as the event where the position of the letter K affects the readiness of recall and ω₂′ as the event where the position of the letter K does not affect recall probability. Define the state space as {ω₁, ω₁′} × {ω₂, ω₂′}. In Tversky and Kahneman (1973), subjects observe the easiness of recall as signals and they ignore the states (ω₁, ω₂) and (ω₁′, ω₂): they infer whether ω₁ or ω₁′ is true while implicitly fixing ω₂′. The formal definition of ignorance is as follows.

Definition 3. An updating mechanism ignores state ω if the DM never picks action ω under all states of the world as t → ∞, i.e., either M^ω = ∅ or Σ_{m∈M^ω} µ^{ω′}_m = 0 for all ω′.

It is important to note that I define ignorance based on the DM's actions instead of his "beliefs". This is because, different from the Bayesian setting, the DM with bounded memory does not track his belief for different states of the world but merely transits between different memory states in M. These memory states do not have to be associated with confidence levels of any of the N states. In particular, if the DM never chooses a particular action ω, he has no incentive to track his confidence level of state ω. Never choosing an action ω is thus effectively equivalent to ignoring the possibility of state ω.

By definition, any updating mechanism is ignorant if M < N: if the DM lacks the cognitive resources to consider all possible states, he has to be ignorant. In the rest of the paper, I will focus on the more interesting scenario where M ≥ N and analyze whether an ignorant updating mechanism could be ε-optimal for small ε in small and big worlds.

[Footnote: Although one could interpret that the DM transits between a countable subset of beliefs in the N-dimensional probability simplex and always pays attention to his confidence levels of all the N states, I believe that this interpretation goes against the concept of bounded memory as it requires unnecessary cognitive resources of the DM.]
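As a small illustration of Definition 3 (the numbers are assumed, not derived from the paper), the check below takes the stationary distributions µ^{ω′}_m and the sets M^ω and flags a state as ignored when either condition of the definition holds.

```python
# Sketch of the ignorance test in Definition 3: state w is ignored if M_w is
# empty or if the memory states in M_w carry zero stationary probability under
# every state w'. The stationary distributions below are assumed, with memory
# state 3 (the only state where action 3 is taken) never visited.
def ignores(w, M_w, mu, states):
    cells = M_w.get(w, [])
    if not cells:                       # M_w empty: action w is never assigned
        return True
    return all(sum(mu[wp][m] for m in cells) == 0.0 for wp in states)

mu = {1: [0.8, 0.2, 0.0],               # mu[w'][m], zero-indexed memory states
      2: [0.3, 0.7, 0.0],
      3: [0.5, 0.5, 0.0]}
M_w = {1: [0], 2: [1], 3: [2]}
print([ignores(w, M_w, mu, (1, 2, 3)) for w in (1, 2, 3)])   # [False, False, True]
```

The same test applied to the Figure 2 automaton returns False for both states, since both actions are taken with positive stationary probability there.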
Does disagreement persist among different individuals? The third question relates to learning among heterogeneous individuals, in particular whether they will agree asymptotically. In recent years, the polarization of ideologies has received a lot of attention in academia and society. Although the Internet has made much information accessible, societies have become more polarized in both political views and beliefs on global warming (McCright and Dunlap (2011), Flaxman, Goel and Rao (2016)). Understanding the mechanism behind persistent disagreement would help to alleviate disagreement on important issues.

As discussed after Definition 3 of ignorant behaviors, it is tricky and arbitrary to define the distance between beliefs among different individuals with bounded memory. Thus, I define agreement and disagreement based on actions. Two individuals A and B agree with each other asymptotically if they choose the same action under every state of the world as t becomes very large.

Definition 4.
Two individuals A and B are bound to agree with each other asymptotically if and only if their actions a^A_t and a^B_t satisfy lim_{t→∞} Pr{a^A_t = a^B_t | ω} = 1 for all ω. They are bound to disagree with each other asymptotically if and only if lim_{t→∞} Pr{a^A_t ≠ a^B_t | ω} = 1 for all ω. And if lim_{t→∞} Pr{a^A_t = a^B_t | ω} ∈ (0, 1), they agree (and disagree) only probabilistically.

I analyze whether two individuals with different prior beliefs (p^ω_A)_{ω=1}^N and (p^ω_B)_{ω=1}^N who observe the same long sequence of public information will eventually agree with each other. I also analyze whether two individuals who start with the same prior but receive many private signals and have different abilities of information acquisition (f^ω_A)_{ω=1}^N and (f^ω_B)_{ω=1}^N, or different levels of cognitive ability M_A and M_B, will eventually disagree with each other in small and big worlds. The latter investigates a link between disagreement and differences in cognitive abilities, which contrasts with the existing literature that explains asymptotic disagreement on the basis of differences in prior beliefs or uncertainties in information structures among different individuals.

4 Learning in Small Worlds

I first analyze the optimal learning mechanisms in small worlds where N/M is small. The following proposition shows that asymptotic learning behavior is close to Bayesian.

Proposition 1.
In small worlds where N/M is very close to 0, asymptotic learning behavior is close to Bayesian, i.e., lim_{M→∞} L*_M = 0.

Proposition 1 shows that in small worlds, (almost) optimal asymptotic learning mechanisms lead to behavior that is very close to that of a Bayesian individual, i.e., the DM (almost) always takes the action that matches the true state. It thus suggests that when the complexity of the inference problem is small, or when the DM's cognitive ability is high, his learning behavior can be well approximated by the Bayesian model.

The proof of the proposition, along with the other proofs in the paper, is given in Appendix C. In the following, I roughly describe the proof by showing that a simple updating mechanism, illustrated in Figure 3, achieves (almost) perfect asymptotic learning as M goes to ∞. The proposed simple mechanism tracks only the DM's favorable action and the corresponding confidence level over time. At any period t, the DM believes one of the N actions or no action is favorable, while his confidence level in his favorable action, if he has one, is an integer between 1 and ⌊(M−1)/N⌋. The memory states can thus be represented by m_t ∈ {0} ∪ {1, ..., N} × {1, ..., ⌊(M−1)/N⌋}, where memory state 0 stands for no favorable action. The decision rule is such that he takes the favorable action if he has one, and randomly takes one of the N actions with equal probability if he does not have a favorable action. The transition rule is described as follows.

[Figure 3 here: a memory state 0 (no favorable action) plus N chains of confidence levels 1, ..., ⌊(M−1)/N⌋, one chain for each action 1, ..., N.]

Figure 3: A simple updating mechanism that achieves perfect learning in small worlds.

First, the DM starts with no favorable action, i.e., the red memory state named "0" in Figure 3. If he receives a confirmatory signal for a state ω, he changes his favorable action to action ω with a confidence level of 1; if he receives a signal that is not confirmatory for any state, he stays in the same memory state 0 in which he has no favorable action.

Second, suppose at some period t the DM's favorable action is action ω with confidence level K. If he receives a confirmatory signal for state ω, he revises his confidence level upwards to K + 1 if K is not already at the maximum ⌊(M−1)/N⌋, and stays in the same memory state if K is at the maximum. Third, if he receives a confirmatory signal for state ω′ ≠ ω (which is against his current favorable action), he revises his confidence level downwards to K − 1 with probability δ if K ≥
2, transits to the red memory state 0 with no favorable action with probability δ if K = 1, and stays in the same memory state with probability 1 − δ. Lastly, if he receives a signal that is not "confirmatory" for any state, he stays in his current memory state with his favorable action and confidence level unchanged. This simple updating mechanism can thus be interpreted as a learning algorithm that tracks the confidence level of only one state/action at a time, with underreaction to belief-challenging signals (captured by δ < 1).

The construction involves choosing the sets of confirmatory signals such that, under each state ω, it is more likely for the DM to receive a confirmatory signal for state ω than a confirmatory signal for any other state. It also involves choosing a sufficiently large δ such that it is more likely for the DM to adjust his confidence level upwards than to adjust it downwards.

[Footnote: The starting memory state has no impact on the stationary distribution over the memory states and does not affect the asymptotic payoff.]
Crucially, with this simple mechanism, when M/N → ∞ the upper bound of the confidence level goes to infinity, which means that the DM can record infinitely many confirmatory signals for any state. Because, asymptotically, the DM receives infinitely more confirmatory signals for the true state than confirmatory signals for any other state, as t → ∞ the DM almost surely records infinitely many confirmatory signals for the true state and thus almost surely learns the true state perfectly.

Now, as the DM learns perfectly for all states of the world, intuitively he has no incentive to ignore any of the N states. In particular, the utility loss of any ignorant updating mechanism is bounded below by min_ω u^ω p^ω, while the utility loss of the simple mechanism illustrated in Figure 3 converges to 0 as M becomes very large. Thus, all ignorant updating mechanisms must be outperformed by the simple mechanism when M is sufficiently large, which implies that ignorance in learning is not optimal in small worlds.

Corollary 1.
Given a finite N, there exists some M̄ such that if M > M̄ (e.g., in small worlds where M → ∞), there exists some ε > 0 such that no ignorant updating mechanism is ε-optimal.

I now turn to the question of whether disagreement could persist asymptotically. Consider two individuals A and B who have different prior beliefs (p^ω_A)_{ω=1}^N and (p^ω_B)_{ω=1}^N, or different abilities of information acquisition captured by (f^ω_A)_{ω=1}^N and (f^ω_B)_{ω=1}^N. As Proposition 1 holds for all (p^ω)_{ω=1}^N and (f^ω)_{ω=1}^N, different individuals are bound to choose the same (optimal) action asymptotically if they adopt an ε-optimal updating mechanism with very small ε.

Corollary 2.
In small worlds where N/M is very close to 0, different individuals with different prior beliefs and/or information acquisition abilities are bound to agree asymptotically. That is, for all (u^ω_A, p^ω_A, f^ω_A)_{ω=1}^N and (u^ω_B, p^ω_B, f^ω_B)_{ω=1}^N, as M → ∞, if the two individuals adopt ε-optimal mechanisms with ε → 0,

    lim_{M→∞} lim_{ε→0} lim_{t→∞} Pr{a^A_t = a^B_t | ω} = 1 for all ω.

Thus, Corollary 2 shows that different individuals with different prior beliefs and/or information acquisition abilities who adopt (almost) optimal updating mechanisms are bound to agree with each other if they receive a large amount of (public or private) information.

[Footnote: For example, individual A could receive noisier signals than individual B, i.e., f^ω_A = γ + (1 − γ) f^ω_B for some γ ∈ (0, 1), or individual A could have different learning advantages, identifying some states better but other states worse than individual B, i.e., sup_s f^ω_A(s)/f^{ω′}_A(s) > sup_s f^ω_B(s)/f^{ω′}_B(s) but sup_s f^{ω″}_A(s)/f^{ω‴}_A(s) < sup_s f^{ω″}_B(s)/f^{ω‴}_B(s) for some ω, ω′, ω″, ω‴.]
[Footnote: Note that I do not assume that individuals have the "correct" prior beliefs. As long as their prior beliefs satisfy the full support assumption such that p^ω > 0 for all ω, Proposition 1 and Corollary 1 hold.]
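The following Monte Carlo sketch simulates the Figure 3 mechanism under an assumed signal structure (the confirmatory-signal probabilities and δ are illustrative choices, not from the paper). With M large relative to N, the share of periods in which the guess matches the true state should be close to 1, in line with Proposition 1; shrinking M, and hence the cap ⌊(M−1)/N⌋, degrades it, previewing the big-world results below.

```python
import random

# Monte Carlo sketch of the Figure 3 mechanism (assumed signal structure):
# in true state w*, each period yields a confirmatory signal for w* with
# probability 0.4, a confirmatory signal for one of the other states with total
# probability 0.2, and a non-confirmatory signal otherwise.
random.seed(0)
N, M, delta = 3, 31, 0.9
cap = (M - 1) // N                       # maximum confidence level floor((M-1)/N)
true_state = 1

def draw_signal():
    r = random.random()
    if r < 0.4:
        return true_state                # confirmatory for the true state
    if r < 0.6:                          # confirmatory for some other state
        return random.choice([w for w in range(1, N + 1) if w != true_state])
    return 0                             # not confirmatory for any state

fav, K = 0, 0                            # fav = 0 means "no favorable action"
correct, T = 0, 200_000
for _ in range(T):
    action = fav if fav else random.randint(1, N)
    correct += (action == true_state)
    s = draw_signal()
    if fav == 0:
        if s:                            # adopt the signalled state, confidence 1
            fav, K = s, 1
    elif s == fav:
        K = min(K + 1, cap)              # confirmatory: move up, capped at cap
    elif s and random.random() < delta:  # challenging: move down w.p. delta
        K -= 1
        if K == 0:
            fav = 0
print(f"share of correct guesses: {correct / T:.3f}")
```

With the numbers above, the drift is upward at the true favorable action (0.4 versus δ · 0.2 per period) and downward at a wrong one (0.1 versus δ · 0.5), which is exactly the property the construction requires, so the printed share should be close to 1.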
It is important and interesting to note that the assumption that the DM is able to design and adopt an optimal updating mechanism is not as unrealistic as one may think. In particular, there is a large set of updating mechanisms that achieve perfect learning and guarantee asymptotic agreement as M becomes large. I illustrate this "robustness" result in Appendix D. Roughly speaking, I consider the aforementioned simple updating mechanism illustrated in Figure 3 but assume that the DM mistakenly transits to a neighboring memory state with some probability γ in each period regardless of the signal realization s: in each period, if the DM has a confidence level K where ⌊(M−1)/N⌋ > K >
1, he mistakenly adjusts his confidence level by one unit upwards or downwards with equal probability γ; if K = ⌊(M−1)/N⌋, he mistakenly adjusts his confidence level downwards by one unit with probability γ; if K = 1, he adjusts his confidence level upwards by one unit or transits to no favorable action (the red memory state) with equal probability γ; if he has no favorable action, he changes his favorable action to action ω with confidence level 1 with equal probability γ/N for each ω; and with probability 1 − γ, the DM makes no mistakes and follows the simple updating mechanism illustrated in Figure 3. Such local mistakes could be induced by mistakes in the perception of signals or imperfect tracking (local fluctuation) of memory states. Appendix D shows that the results of (almost) perfect learning, i.e., Proposition 1, and asymptotic agreement, i.e., Corollary 2, hold for all γ ∈ [0, 1).

5 Learning in Big Worlds

Now, I proceed to the analysis of big worlds and show that the three implications in small worlds, i.e., that asymptotic learning is close to Bayesian, that ignorance is never optimal, and that disagreement does not persist, do not hold. Before I present the results, it is useful to discuss a simple example, where N = M = 2, to illustrate how bounded memory affects asymptotic learning and to introduce variables that capture the important features of an updating mechanism.

An example of N = M = 2. Suppose that M¹ = {1} and M² = {2}, i.e., the DM takes action 1 in memory state 1 and action 2 in memory state 2. An important feature of the updating mechanism is the state likelihood ratios, which measure how likely the DM is to be in memory state m under state ω versus under state ω′.

Definition 5.
The state ω-ω′ likelihood ratio at memory state m is defined as µ^ω_m / µ^{ω′}_m.

The higher the state ω-ω′ likelihood ratio at memory state m, the more confident the DM is, given that he is in memory state m, that the true state is ω instead of ω′. In this simple example, a good updating mechanism induces a high µ¹₁/µ²₁ and a low µ¹₂/µ²₂: the DM is confident that state 1 (resp. state 2) is true when he takes action 1 (resp. action 2). In particular, perfect learning requires that the state 1-2 likelihood ratio goes to infinity at memory state 1 and to 0 at memory state 2. To be (almost) sure about a state ω, the DM has to have the ability to record/memorize almost perfect information that supports state ω, either by recording one (almost) perfect confirmatory signal or infinitely many imperfect confirmatory signals that support state ω. However, in the current setting, the former is constrained by the informativeness of the signal structures, i.e., Equation (1), and the latter is constrained by the bounded memory.

To illustrate this constraint, define the state ω-ω′ spread as follows.

Definition 6.
Denote the state ω-ω′ spread as Υ^{ωω′}, which is given by the following equation:

    Υ^{ωω′} = max_{m ∈ M^ω} (µ^ω_m / µ^{ω′}_m) / min_{m ∈ M^{ω′}} (µ^ω_m / µ^{ω′}_m).

In this simple example, Υ¹² = (µ¹₁/µ²₁) / (µ¹₂/µ²₂), and it has to go to infinity if the DM perfectly learns the true state. However, I now show that Υ¹² is bounded above, such that the DM can never be sure both when he takes action 1 and when he takes action 2. In an irreducible automaton, i.e., µ^ω_m > 0 for m = 1,
2, the probability mass moving from m = 1 to m = 2 must be equal to the probability mass moving in the opposite direction under both states. Suppose that the DM updates his belief from m = 1 to m = 2 given signals in some set S₂ and updates in the opposite direction given signals in some set S₁; then, in the stationary distribution, we have µ¹₁ F₁(S₂) = µ¹₂ F₁(S₁) and µ²₁ F₂(S₂) = µ²₂ F₂(S₁) under states 1 and 2, respectively. Thus,

    Υ¹² = (µ¹₁/µ²₁) / (µ¹₂/µ²₂) = [F₁(S₁)/F₂(S₁)] · [F₂(S₂)/F₁(S₂)] ≤ l¹² l²¹    (7)

where l^{ωω′} = sup_{S′ : F_{ω′}(S′) > 0} F_ω(S′)/F_{ω′}(S′).

This bound on the state 1-2 spread creates a trade-off between the quality of inference in the two states. Consider an example where l¹² l²¹ = 100. If the DM wants to design an automaton such that he chooses action 1 with probability 99% in state 1, then 99 µ²₂/µ²₁ ≤ 100, i.e., µ²₁ ≥ 99/199; that is, the DM has to make a mistake in state 2 (choose action 1) almost half of the time. In contrast, if he decreases his quality of decision making in state 1 such that he chooses action 1 in state 1 with probability 90%, then µ²₁ only needs to satisfy µ²₁ ≥ 9/109, and he could make a mistake in state 2 with probability as low as 9/109 ≈ 8.3%. How the DM resolves this trade-off depends on u¹p¹ versus u²p²: when u¹p¹ is bigger than u²p², the DM is willing to sacrifice inference in state 2 to improve inference in state 1. This trade-off is illustrated in Figure 4. As I will show later, this trade-off implies that ignorant learning behavior could be optimal. □

[Figure 4 here: the feasibility set of (µ¹₂, µ²₁) implied by the bound (7), shown as a gray area, together with the iso-loss line u¹p¹µ¹₂ + u²p²µ²₁ = L* touching it at (µ¹₂*, µ²₁*).]

Figure 4: This example illustrates the trade-off between inference of the two states, where l¹² l²¹ = 100. The feasibility set of µ¹₂ and µ²₁ is shown as the gray area (which may not include the boundary) and the infimum of utility loss is given by the point where the objective function touches the feasibility set.
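The balance argument behind the bound (7) can be checked numerically. In the sketch below, the switching probabilities F_ω(S₂) and F_ω(S₁) are assumed; the achieved spread equals the product of the likelihood ratios attained by the chosen signal sets, which by definition cannot exceed l¹² l²¹.

```python
# Numerical check of the bound (7) for the N = M = 2 automaton. The DM moves
# from memory state 1 to 2 on the signal set S_2 and back on S_1; F[w] gives
# the assumed probabilities (F_w(S_2), F_w(S_1)) under state w.
F = {1: (0.1, 0.5), 2: (0.5, 0.1)}       # (F_w(S_2), F_w(S_1)): assumed

mu1, mu2 = {}, {}
for w, (f_up, f_down) in F.items():
    mu1[w] = f_down / (f_up + f_down)    # balance: mu_1 F_w(S_2) = mu_2 F_w(S_1)
    mu2[w] = 1.0 - mu1[w]

spread = (mu1[1] / mu1[2]) / (mu2[1] / mu2[2])
achieved = (F[1][1] / F[2][1]) * (F[2][0] / F[1][0])   # ratio product on S_1, S_2
print(f"spread = {spread:.1f}, achieved ratio product = {achieved:.1f}")
```

The two printed numbers coincide (25.0 with these assumed sets) because the spread equals the product of the attained likelihood ratios; replacing them by the suprema l¹² and l²¹ can only make the right-hand side of (7) larger.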
The following proposition shows that in big worlds asymptotic learning is imperfect and significantly different from Bayesian. Combined with the results in the extension (Propositions 5 and 6 in Section 6), it shows that what matters is the ratio M/N rather than the absolute value of M. Importantly, it shows that both complexity and cognitive ability affect learning. Fixing the cognitive resources of individuals, learning behaviors in a complicated problem, i.e., when N is large, are more likely to differ from Bayesian learning compared to those in a simple problem; while fixing the complexity of the inference problem, the learning behavior of a DM with lower cognitive ability is more likely to differ from Bayesian learning.

Proposition 2. In big worlds where N/M is bounded away from 0, asymptotic learning differs significantly from Bayesian learning, i.e., L*_M > 0. Moreover, fixing N, L*_M decreases in M, i.e., asymptotic learning becomes closer to Bayesian learning as M increases.

Comparing Proposition 2 with Proposition 1 gives us the first difference in learning behavior between small and big worlds. In small worlds, asymptotic learning can be well approximated by Bayesian updating; in big worlds, asymptotic learning behavior significantly differs from Bayesian updating. Moreover, the second part of Proposition 2 implies that, fixing N, learning behavior gets closer to the Bayesian benchmark when M increases, or equivalently when the relative complexity N/M decreases. This "continuity" result implies that learning is close to Bayesian not only at the limit of N/M →
0, but also in general when N/M is small.

To understand the intuition of Proposition 2, it is useful to revisit the simple updating mechanism proposed for small worlds (see Figure 3), though the intuition holds for any updating mechanism. Different from the case of small worlds, in big worlds the ranges of confidence levels are bounded above for all states of the world, as M/N < ∞, as illustrated in Figure 5. As M/N is bounded above, no action can be allocated infinitely many memory states. Therefore, the DM is not able to record infinitely many (imperfect) signals supporting each state against the others, and it only takes a finite number of belief-challenging signals to convince the DM to switch between actions. Thus the DM will never be almost sure about his action. As illustrated in the simple example where N = M = 2, when the DM is never almost sure about choosing action ω, there is a positive probability that he chooses action ω in some state ω′ ≠ ω, which leads to a utility loss. Thus L*_M > 0.
Proposition 3. In big worlds where N/M is bounded away from 0, there exists some ξ > 0 such that when p^ω < ξ and ε → 0, all ε-optimal updating mechanisms ignore state ω, i.e., lim_{ε→0} M^ω = ∅ or lim_{ε→0} Σ_{m∈M^ω} µ^{ω′}_m = 0 for all ω′ ∈ Ω. On the other hand, suppose that for some (u^{ω′}, p^{ω′})_{ω′=1}^N, when ε → 0, all ε-optimal updating mechanisms ignore state ω. Then for (ũ^{ω′}, p̃^{ω′})_{ω′=1}^N such that

    ũ^{ω′} p̃^{ω′} / ũ^{ω″} p̃^{ω″} = u^{ω′} p^{ω′} / u^{ω″} p^{ω″} for all ω′, ω″ ≠ ω;
    ũ^ω p̃^ω / ũ^{ω′} p̃^{ω′} < u^ω p^ω / u^{ω′} p^{ω′} for all ω′ ≠ ω,

when ε → 0, all ε-optimal updating mechanisms ignore state ω.

[Footnote: Note that it is difficult to directly compute the comparative statics with respect to M/N because when N changes, prior beliefs and the signal structures, i.e., the nature of the inference problem, also change. The extension in Section 6, however, clearly shows that the ratio M/N matters for the behavioral implication.]
[Footnote: Roughly speaking, for any updating mechanism, M^ω must contain only a finite number of memory states for all ω. It thus takes only a finite number of steps to transit from M^{ω′} to M^ω, and it happens with probability bounded away from 0. Put differently, the updating mechanism is "noisy" and thus the DM cannot be almost sure about choosing action ω.]

[Figure 5 here: the simple updating mechanism of Figure 3, with bounded chains of confidence levels of possibly different lengths across actions.]

Figure 5: The simple updating mechanism, revisited, in a big world and with different ranges of confidence levels across different actions. In particular, the confidence levels of all actions 1, 2, ..., N are bounded above.

In a big world, as previously mentioned, the DM cannot allocate infinite cognitive resources to every state of the world. As the DM is bound to make mistakes, he thus faces a meaningful trade-off in the allocation of cognitive resources, as he trades off between the probabilities of mistakes in different states of the world. Intuitively, when the prior probability of a state is low, the DM would rather allocate cognitive resources to infer about other states of the world and ignore this a priori unlikely state. This result contrasts with Matějka and McKay (2015), who study decision making in a setting of rational inattention, as ignoring an action is never optimal in their setting.

[Footnote: In small worlds, the DM can allocate infinitely many memory states to all actions. Adding or taking away a finite number of memory states for each action thus does not affect the stationary distribution and utility. Thus there is no meaningful trade-off in small worlds.]
[Footnote: Note that the trade-off happens not only in the allocation of memory states to M^ω for different ω, but also in choosing the asymptotic probabilities of taking different actions. More specifically, a lower probability of choosing action ω, Σ_{m∈M^ω} µ^{ω′}_m, implies a higher probability of choosing other actions.]

Whether it is optimal to ignore a state ω also depends on the informativeness of the signal structures. In particular, as shown in Equation (C.8) in the proof of Proposition 3, if u^ω p^ω < u^{ω′} p^{ω′} for some ω′ and if l^{ωω′} l^{ω′ω} is small, i.e., if state ω is a priori less favorable than state ω′ and it is difficult to distinguish the two states, then the DM must ignore state ω. That is, the DM ignores states that are difficult to identify.
He rather saves his cognitive resources to learn other states of the world, as the "return" to allocating cognitive resources to identify state ω is small.

I also show in Appendix E that ignorant learning behavior can be optimal even in a symmetric environment where states and actions are ex-ante identical, i.e., p^ω, u^ω and l^{ωω′} are the same across all ω and ω′ ≠ ω. To see the intuition, consider an example where N = M. To consider all states, the DM allocates one memory state to each action. It is thus easy for the DM to alternate between different actions and unavoidably make mistakes. Put differently, the updating mechanism is "noisy". This is especially true when N is large, as one memory state constitutes only a small part of the automaton. If, in contrast, the DM ignores half of the actions, he allocates two memory states to each of the actions that he considers and switches between actions less frequently. This improves his decision making among the smaller set of states that he considers. When N is large, the improvement outweighs the loss he incurs among the states that he ignores, because the loss is small to begin with, i.e., the asymptotic utility of a "noisy" updating mechanism that considers all states is small.

Comparing Proposition 3 with Corollary 2 shows that ignorance in learning could be optimal only in big worlds but not in small worlds. As mentioned in the literature review (Section 2), this explains different behavioral anomalies, including the use of heuristics (Tversky and Kahneman (1973)), correlation neglect (Enke and Zimmermann (2019)), persistent over-confidence (Hoffman and Burks (2017) and Heidhues, Kőszegi and Strack (2018)), inattentive learning (Graeber (2019)) and other behaviors of model simplification and misspecification, as optimal strategies in the face of complexity. It also provides a micro-foundation for the equilibrium concepts proposed in Eyster and Rabin (2005) and Jehiel (2005). Importantly, it provides a clear and testable prediction of the circumstances under which these behavioral anomalies are more prominent: in big worlds, where the complexity of inference problems is high relative to the cognitive ability of individuals.

Now consider two individuals with different utility functions and prior beliefs (u^ω_A, p^ω_A)_{ω=1}^N and (u^ω_B, p^ω_B)_{ω=1}^N who receive a long sequence of public signals, where (f^ω_A)_{ω=1}^N = (f^ω_B)_{ω=1}^N for all ω; or two individuals with the same utility functions and prior beliefs who receive private signals generated by different signal structures; or two individuals with different utility functions, prior beliefs, signal structures, etc. The following corollary shows that they might disagree with each other with certainty as t → ∞ when they adopt an (almost) optimal updating mechanism.
Corollary 3. In big worlds where N/M is bounded away from 0, there exist some (u^ω_A, p^ω_A, f^ω_A)_{ω=1}^N and (u^ω_B, p^ω_B, f^ω_B)_{ω=1}^N such that the two individuals are bound to disagree when they adopt ε-optimal updating mechanisms with ε → 0:

    lim_{ε→0} lim_{t→∞} Pr(a^A_t ≠ a^B_t | ω) = 1 for all ω.

The result is directly implied by Proposition 3. As different individuals with different prior beliefs and utilities would adopt different updating mechanisms, they could focus their learning on different subsets of states of the world. In particular, consider an example with N = 4: if individual A ignores states 1 and 2 and individual B ignores states 3 and 4, they will never choose the same action and thus disagree with certainty. By comparing Corollary 3 with Corollary 2, we can conclude that asymptotic disagreement only happens in big worlds but not in small worlds.
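As a trivial but concrete check of the N = 4 example (the ignored sets are assumptions for illustration), disjoint asymptotic action supports immediately deliver the disagreement in Definition 4:

```python
# If A ignores states 1 and 2 while B ignores states 3 and 4, their asymptotic
# action supports are disjoint, so Pr(a_A = a_B | w) = 0 in every state w and
# the two individuals are bound to disagree in the sense of Definition 4.
support_A = {3, 4}        # actions A still takes asymptotically (assumed)
support_B = {1, 2}        # actions B still takes asymptotically (assumed)
print(support_A.isdisjoint(support_B))   # True -> bound to disagree
```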
5.1 Disagreement Driven by Differences in Cognitive Ability

Corollary 3 shows that individuals could be bound to disagree asymptotically if they have different utilities, prior beliefs, and/or information structures. This subsection shows that disagreement could also be driven by differences in cognitive ability. I illustrate the result in the following simple example. Consider a setting with N = 3 and two individuals, A and B, who share the same prior beliefs and the same objective signal structure:

    p¹ = 1/3 + 2ν,  p² = 1/3 − ν,  p³ = 1/3 − ν    (8)

    sup_s f₁(s)/f_n(s) = sup_s f_n(s)/f₁(s) = √τ for n = 2, 3;
    sup_s f₂(s)/f₃(s) = sup_s f₃(s)/f₂(s) = √Υ, with Υ > τ,    (9)

with 1 + τ ≥ (1/3 + 2ν)/(1/3 − ν). Moreover, to simplify things, assume that u¹ = u² = u³ = 1. The only difference between the two individuals is their level of cognitive ability, where M_A = 1 and M_B = 2. I present the following result.

[Footnote: Note that the result of disagreement continues to hold in a classic agree-to-disagree framework where individuals observe each other's actions. More specifically, one could re-define the signal structures in the current setting to incorporate the information conveyed by the actions taken by the two individuals.]
[Footnote: It ensures that if M ≥
2, the DM never chooses action 1 with probability 1 and he can achieve a lower utility loss compared to the benchmark of no information.
Proposition 4. There exist ν, τ, Υ such that individuals A and B who adopt ǫ-optimal mechanisms are bound to disagree for small ǫ, i.e., ∃ ν, τ, Υ, ˜ǫ > 0 such that lim_{t→∞} Pr{a^A_t ≠ a^B_t | ω} = 1 for all ω and ǫ < ˜ǫ.
The intuition of the proposition is as follows: individual A always chooses action 1, as he does not have sufficient cognitive resources to learn. On the other hand, when ν is small enough, and when Υ is much larger than τ, it is more beneficial for individual B to focus on learning states 2 and 3, as the signals that support these two states are more informative. As a result, he never chooses action 1 and thus never agrees with individual A.
Proposition 4 shows that even when individuals have the same prior beliefs and the same knowledge about the signal structures, they can be bound to disagree with each other after receiving a large amount of public information. This result proposes a different channel of asymptotic disagreement, in contrast with the existing explanations that assume different individuals have different prior beliefs or perceive signals in different ways.
(One may argue that after seeing individual B choosing action 2 or 3, individual A should change his action. However, this is not possible, as he has only one unit of memory capacity, M = 1, and thus has to effectively commit to one action. In particular, one can generalize this framework so that the two individuals also see each other's actions as signals, and Proposition 4 would still hold. Note that although this example imposes strong assumptions, in particular on the size of the bounded memory of individual A, it generates a strong form of disagreement in which the two individuals disagree asymptotically with certainty. Similar intuition implies that even when the assumption is relaxed, the difference in M would lead to asymptotic disagreement at least probabilistically.)
Learning as N, M → ∞
In the baseline model where N is fixed and finite, it is difficult to analyze the effects of a change in complexity N, as the prior beliefs, and thus the importance of identifying different states, also change with N. Consequently, to show that the behavioral implications depend on both N and M, and more specifically on the ratio N/M instead of just the absolute value of M, I analyze an extension where N, M → ∞ in this section. In particular, the results in the baseline model hold in both small worlds, where N = O(M^h) with h < 1, and big worlds, where N = O(M^h) with h ≥ 1. Consider a sequence of inference problems {(u^ω_N, p^ω_N, f^ω_N)^N_{ω=1}}^∞_{N=1}. The subscript N emphasizes the fact that the utilities, prior beliefs, and information structures could change along the sequence. I first examine whether learning is close to Bayesian at the limit, i.e., whether lim_{N,M→∞} L*_NM = 0. Second, I evaluate whether a sequence of updating mechanisms {T_N}^∞_{N=1} that ignores some states can be (almost) optimal as N, M → ∞. Third, I investigate whether two individuals who adopt a sequence of ǫ-optimal mechanisms could disagree with each other asymptotically as N, M → ∞.
Before I present the results, it is necessary to state some additional assumptions about the nature of the sequence of inference problems. First, I assume that u^ω_N ∈ [u, ū] for some u > 0, for all ω and all N. Thus, along the sequence of inference problems, no state is infinitely more important than another state. The prior beliefs of the DM satisfy the following full support assumption: Assumption 1.
For any sequence of subsets of states Ω_N ⊆ Ω with cardinality |Ω_N| and a well-defined limit lim_{N→∞} |Ω_N|/N,
lim_{N→∞} Σ_{ω∈Ω_N} p^ω_N > 0 if lim_{N→∞} |Ω_N|/N > 0. (10)
That is, Equation (10) ensures that any sequence of subsets of states that has a non-negligible measure in fractions at the limit, i.e., lim_{N→∞} |Ω_N|/N > 0, also has a non-negligible probability mass at the limit, i.e., lim_{N→∞} Σ_{ω∈Ω_N} p^ω_N > 0. A simple example would be p^ω_N = 1/N for all ω and N. (For instance, Assumption 1 rules out prior beliefs (p^ω)^N_{ω=1} such that p^ω = 1/N² for ω ≤ N/2 and p^ω = 2/N − 1/N² for ω > N/2, i.e., the first half of all possible states has negligible probability mass at the limit.)
Similar to the baseline model, I assume that no signals rule out any state of the world: there exists ς > 0 such that
inf_{s∈S} f^ω_N(s)/f^{ω′}_N(s) > ς for all ω, ω′ ∈ Ω and for all N. (11)
Finally, I make the following assumption about identifiability, which plays a similar role to the assumption that f^ω ≠ f^{ω′} in the baseline model. It requires that signal structures within negligible Cauchy–Schwarz distance of each other have negligible probability mass at the limit where N → ∞:
Assumption 2. For all ǫ > 0, there exist some ξ > 0 and some sequence of subsets of states N_ξ ⊆ Ω such that lim_{N→∞} (Σ_{ω∈N_ξ} p^ω_N) > 1 − ǫ and
lim_{N→∞} inf_{ω,ω′∈N_ξ; ω′≠ω} −log [ ∫ f^ω_N(s) f^{ω′}_N(s) ds / ( √(∫ (f^ω_N(s))² ds) √(∫ (f^{ω′}_N(s))² ds) ) ] > ξ.
Appendix B provides an example of signal structures that satisfy Assumption 2 and Equation 11. First, I present the results in small worlds. The following proposition shows that learning is close to Bayesian.
Proposition 5.
In small worlds where
N, M → ∞ and N = O(M^h) with h < 1, asymptotic learning behavior is close to Bayesian, i.e., lim_{N,M→∞} L*_NM = 0.
The intuition is as follows. In small worlds, M/N → ∞: the DM could allocate infinitely many memory states to each state, and his confidence level for each action ranges from 1 to ∞. As the DM could be almost sure about taking each action, he makes no mistakes asymptotically and almost always matches his action with the true state.
Next, as the DM can learn perfectly under every state of the world, he has no incentive to ignore some states of the world to focus on a strict subset of states, i.e., ignorance in learning is not optimal.
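The logic of Proposition 5 can be checked numerically. The sketch below is my own illustration, not part of the paper's formal analysis: it builds a two-action version of the "star" automaton constructed in Appendix C (binary signals with assumed accuracy q and reaction probability δ) and computes the stationary probability of the wrong action, which vanishes as the branch length λ, and hence M = 2λ + 1, grows.

import numpy as np

def star_error(lam, q=0.6, delta=1.0):
    """Stationary probability of the wrong action in a 2-action 'star' automaton
    with branch length lam (M = 2*lam + 1 memory states). Assumed signal model:
    the signal supports the true state (state 1) with probability q."""
    M = 2 * lam + 1            # index 0 = center, 1..lam = branch 1, lam+1..2lam = branch 2
    P = np.zeros((M, M))
    p_sig = {1: q, 2: 1 - q}
    def node(branch, depth):   # depth runs from 1 to lam
        return depth if branch == 1 else lam + depth
    for s, ps in p_sig.items():
        P[0, node(s, 1)] += ps                    # center: jump into the supported branch
        for b in (1, 2):
            for d in range(1, lam + 1):
                i = node(b, d)
                if s == b:                         # confirming signal: go deeper (cap at lam)
                    P[i, node(b, min(d + 1, lam))] += ps
                else:                              # challenging signal: retreat w.p. delta
                    back = 0 if d == 1 else node(b, d - 1)
                    P[i, back] += ps * delta
                    P[i, i] += ps * (1 - delta)
    mu = np.ones(M) / M
    for _ in range(200_000 // M):                  # power iteration to stationarity
        mu = mu @ P
    return mu[lam + 1:].sum() + mu[0] / 2          # wrong-branch mass (+ randomizing center)

for lam in (1, 2, 4, 8, 16):
    print(lam, round(star_error(lam), 5))          # error shrinks geometrically in lam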
Corollary 4. In small worlds where N, M → ∞ and N = O(M^h) with h < 1, a sequence of updating mechanisms {T_N}^∞_{N=1} is ǫ-optimal at the limit only if it ignores at most an ǫ/u measure of states at the limit. In particular, when ǫ →
0, Corollary 4 shows that the optimal updating mechanism must ignore at most a negligible measure of states. To complete the analysis in small worlds, the following corollary shows that different individuals are bound to agree in small worlds if they agree on the probability 0 events. Corollary 5.
In small worlds where
N, M → ∞ and N = O(M^h) with h < 1, different individuals with different prior beliefs and/or information acquisition abilities are almost bound to agree asymptotically in small worlds if they agree on probability 0 events. That is, for all {(u^ω_NA, p^ω_NA, f^ω_NA)^N_{ω=1}}^∞_{N=1} and {(u^ω_NB, p^ω_NB, f^ω_NB)^N_{ω=1}}^∞_{N=1} such that lim_{N→∞} p^ω_NA > 0 if and only if lim_{N→∞} p^ω_NB > 0 for all ω and N, if the two individuals adopt a sequence of ǫ-optimal mechanisms with ǫ → 0,
lim_{N,M→∞} lim_{ǫ→0} Σ_ω [ 1{ lim_{t→∞} Pr_N{a^A_t = a^B_t | ω} = 1 } p^ω_NI ] = 1 for I = A, B.
That is, they take the same action in almost all states of the world, measured in both lim_{N→∞} (p^ω_NA)^N_{ω=1} and lim_{N→∞} (p^ω_NB)^N_{ω=1}. Now I present the results in big worlds. First, the following proposition shows that asymptotic learning is different from Bayesian learning.
Proposition 6.
In big worlds where
N, M → ∞ and N = O(M^h) with h ≥ 1, asymptotic learning differs significantly from Bayesian learning, i.e., lim_{N,M→∞} L*_NM > 0.
The intuition is again similar to Proposition 2 in the baseline model. In this extension, similar to the baseline model, M/N < ∞ implies that the DM can only allocate a finite number of memory states to almost all states of the world, so he cannot be nearly certain about taking the corresponding actions and must make mistakes with strictly positive probability.
(Note that in the baseline model, individuals necessarily agree on the probability 0 events, as p^ω > 0 for all ω. In contrast, Assumption 1 does not guarantee this: there may exist some ω such that lim_{N,M→∞} p^ω_NA > 0 but lim_{N,M→∞} p^ω_NB = 0.)
Proposition 7.
In big worlds where
N, M → ∞ and N = O(M^h) with h ≥ 1, all updating mechanisms must ignore almost all (measured in fraction) states at the limit.
Note that the statements in Propositions 3 and 7 are different. Proposition 3 indicates that there exist some prior beliefs such that all (almost) optimal updating mechanisms will be ignorant. In contrast, Proposition 7 is "stronger": for all prior beliefs, all learning mechanisms, including the (almost) optimal updating mechanisms, have to be ignorant. Roughly speaking, the intuition of Proposition 7 is as follows: when M → ∞, the stationary probability of each memory state becomes infinitesimally small. As the DM cannot allocate infinitely many memory states to almost all M^ω, i.e., the DM takes almost every action ω in a finite subset of memory states, he must choose almost all actions with probability 0.
To complete the analysis of the extension, the following corollary shows that disagreement could be persistent in big worlds, as in the baseline model. Corollary 6.
In big worlds where
N, M → ∞ and N = O(M^h) with h ≥ 1, there exist some {(u^ω_NA, p^ω_NA, f^ω_NA)^N_{ω=1}}^∞_{N=1} and {(u^ω_NB, p^ω_NB, f^ω_NB)^N_{ω=1}}^∞_{N=1} such that for all ǫ, there exist some sequences of ǫ-optimal updating mechanisms {(T_NA, d_NA)}^∞_{N=1} and {(T_NB, d_NB)}^∞_{N=1} such that individuals A and B who adopt {(T_NA, d_NA)}^∞_{N=1} and {(T_NB, d_NB)}^∞_{N=1} are bound to disagree:
lim_{N,M→∞} lim_{t→∞} Pr_N(a^A_t ≠ a^B_t | ω) = 1 for all ω.
Note that Corollary 6 is weaker than Corollary 3 in the sense that there exist some, but not all, ǫ-optimal updating mechanisms that lead to disagreement. The intuition is as follows: as shown in Proposition 7, the DM must ignore almost all actions. Thus, he obtains positive utility only under a negligible measure of states, which implies that a large set of updating mechanisms could be ǫ-optimal. In particular, when the prior likelihoods of all states are infinitesimally small at the limit, e.g., p^ω = 1/N, the supremum utility equals 0 and all updating mechanisms are ǫ-optimal. This leaves ample room for the individuals to adopt different optimal learning mechanisms and thus leads to disagreement. By comparing Corollaries 6 and 5, we can conclude that the same result in the baseline model holds qualitatively in this extension: asymptotic disagreement does not arise in small worlds but could arise in big worlds.
Ignorant learning and heuristics
This paper explains a wide range of behavioral anomalies under the same framework, i.e., the efficient allocation of cognitive resources in light of complexity. It includes the different inattentive/neglecting/heuristic learning behaviors documented in Tversky and Kahneman (1973), Hoffman and Burks (2017), Enke and Zimmermann (2019) and Graeber (2019), and the results provide a micro-foundation for the concepts of cursed equilibrium (Eyster and Rabin (2005)), analogy-based equilibrium (Jehiel (2005)), selection neglect (Jehiel (2018)) and misguided learning (Heidhues, K˝oszegi and Strack (2018)). In the following, I discuss in detail the connection of this paper with Enke and Zimmermann (2019) and Graeber (2019).
Importantly, the comparison of small and big worlds illustrates a link between the (relative) complexity of the inference problem and the aforementioned non-Bayesian learning behaviors, which is supported by the experimental results in Enke and Zimmermann (2019) and Graeber (2019). Enke and Zimmermann (2019) show in Section 2.4.3 that inattentive learning negatively correlates with the cognitive ability of subjects and in Section 3.1 that "an extreme reduction in the environment's complexity eliminates the bias", while Graeber (2019) shows that a reduction in the complexity of the problem by removing a decipher stage of signals reduces inattentive learning behavior.
Interestingly, Enke and Zimmermann (2019) and Graeber (2019) also show that simply reminding subjects about the neglected variables reduces inattentive learning and improves inference. This "reminder effect" can be reconciled in the current setup via an effect of a change in the state space. Consider the behavior of inattentive inference in Graeber (2019). The author shows that when subjects are asked to guess the realization of a variable A, they often ignore the effect of another variable B on the signal distribution. Applying this to the setting in this paper, consider that before being reminded about the ignored variable, the state space is supp(A) × supp(B) × {B affects the signal distribution, B does not affect the signal distribution}, in which subjects might ignore the states that say "B affects the signal distribution". After being reminded about the effect of B, the set of states of the world is effectively reduced to supp(A) × supp(B) × {B affects the signal distribution}, the complexity decreases, and subjects adopt another learning mechanism that might not involve ignorant learning.
Future research directions
The mechanism mentioned in the previous paragraph brings forth an open question that is not answered in this paper. In reality, individuals face different (sets of) inference problems and are likely endowed with different learning mechanisms for different sets of states of the world. As in the example mentioned in the previous paragraph, some state spaces could be nested in other state spaces, and upon receiving some information that triggers the subjects to revise the state space, they could transit from one learning mechanism to another. This is also related to the question of how individuals construct the state space given an inference problem. Arguably, there are infinitely many variables that might affect the signal distributions, and their realizations could be incorporated in the set of possible states.
Roughly speaking, the result of ignorance seems to suggest that individuals may only include the most "important" or "a priori probable" states, while the "reminder effect" suggests that the construction of the state space also depends on the information received by the individual. Moving forward, I believe that the question of how individuals construct their perceived state space and the corresponding prior belief deserves more in-depth and careful analysis, as it is fundamental to individuals' learning behavior.
Building on the results of this paper, another research question that is worth pursuing is how complexity and cognitive ability affect learning in the presence of experimentation incentives. In settings with experimentation, different from this paper, the DM's actions affect his future learning via a change in the future signal structures. There is an experimentation incentive: the DM takes an action not only to maximize his utility in the current period, but also to influence his future utilities, as his action may improve or worsen the precision of future signals. When the environment is more complex, or when the DM has a lower level of cognitive ability, learning is less efficient in general, and this may give the DM more incentive to choose instantaneous gratification instead of experimentation that yields long-term benefits. One could also argue that as learning is less efficient, instantaneous gratification is less satisfactory, and the DM might be more patient to experiment. The overall effect is thus far from obvious. Understanding better how complexity and cognitive ability affect experimentation could help to improve insufficient experimentation, to increase the incentive to experiment, to aid the development of scientific reasoning skills in educational settings (Zimmerman (2000)), etc.
References
Acemoglu, Daron, Victor Chernozhukov, and Muhamet Yildiz (2016) “Fragility ofasymptotic agreement under Bayesian learning,”
Theoretical Economics, Vol. 11, No. 1, pp. 187–225. Banovetz, James and Ryan Oprea (2020) "Complexity and Procedure Choice,"
Working Paper .Basu, Pathikrit and Kalyan Chatterjee (2015) “On interim rationality, belief forma-tion and learning in decision problems with bounded memory,”
Working Paper .Berk, Robert H (1966) “Limiting behavior of posterior distributions when the modelis incorrect,”
The Annals of Mathematical Statistics , Vol. 37, No. 1, pp. 51–58.Blackwell, David and Lester Dubins (1962) “Merging of opinions with increasinginformation,”
The Annals of Mathematical Statistics , Vol. 33, No. 3, pp. 882–886.Chatterjee, Kalyan and Hamid Sabourian (2020) “Game theory and strategic com-plexity,”
Complex Social and Behavioral Systems: Game Theory and Agent-BasedModels , pp. 639–658.Chauvin, Kyle P (2019) “Euclidean properties of Bayesian updating,”
Working pa-per .Compte, Olivier and Andrew Postlewaite (2012) “Belief formation,”
Unpublishedmanuscript, University of Pennsylvania .Enke, Benjamin and Florian Zimmermann (2019) “Correlation neglect in belief for-mation,”
The Review of Economic Studies , Vol. 86, No. 1, pp. 313–332.Eyster, Erik and Matthew Rabin (2005) “Cursed equilibrium,”
Econometrica , Vol.73, No. 5, pp. 1623–1672.Flaxman, Seth, Sharad Goel, and Justin M Rao (2016) “Filter bubbles, echo cham-bers, and online news consumption,”
Public opinion quarterly , Vol. 80, No. S1,pp. 298–320.Forbes, William, Robert Hudson, Len Skerratt, and Mona Soufian (2015) “Whichheuristics can aid financial-decision-making?”
International review of Financialanalysis , Vol. 42, pp. 199–210.Freedman, David A (1963) “On the asymptotic behavior of Bayes’ estimates inthe discrete case I,”
The Annals of Mathematical Statistics, Vol. 34, No. 4, pp. 1386–1403. (1965) "On the asymptotic behavior of Bayes estimates in the discrete case II,"
The Annals of Mathematical Statistics , Vol. 36, No. 2, pp. 454–456.Gigerenzer, Gerd and Henry Brighton (2009) “Homo heuristicus: Why biased mindsmake better inferences,”
Topics in cognitive science , Vol. 1, No. 1, pp. 107–143.Gigerenzer, Gerd and Wolfgang Gaissmaier (2011) “Heuristic decision making,”
An-nual review of psychology , Vol. 62, pp. 451–482.Gigerenzer, Gerd and Daniel G Goldstein (1996) “Reasoning the fast and frugalway: models of bounded rationality.,”
Psychological review , Vol. 103, No. 4, p.650.Gilboa, Itzhak, Larry Samuelson, and David Schmeidler (2020) “Learning (to Dis-agree?) in Large Worlds,”
Working Paper .Gilboa, Itzhak and David Schmeidler (1995) “Case-based decision theory,”
TheQuarterly Journal of Economics , Vol. 110, No. 3, pp. 605–639.(1996) “Case-based optimization,”
Games and Economic Behavior , Vol. 15,No. 1, pp. 1–26.(1997) “Act similarity in case-based decision theory,”
Economic Theory ,Vol. 9, No. 1, pp. 47–61.Graeber, Thomas (2019) “Inattentive inference,”
Working Paper .Heidhues, Paul, Botond K˝oszegi, and Philipp Strack (2018) “Unrealistic expecta-tions and misguided learning,”
Econometrica , Vol. 86, No. 4, pp. 1159–1214.Hellman, Martin E and Thomas M Cover (1970) “Learning with finite memory,”
The Annals of Mathematical Statistics , Vol. 41, No. 3, pp. 765–782.Hellman, Martin Edward (1969)
Learning with finite memory
Ph.D. dissertation, Stanford University. Hoffman, Mitchell and Stephen V Burks (2017) "Worker overconfidence: Field evidence and implications for employee turnover and returns from training," Technical report, National Bureau of Economic Research. Jehiel, Philippe (2005) "Analogy-based expectation equilibrium,"
Journal of Eco-nomic theory , Vol. 123, No. 2, pp. 81–104.(2018) “Investment strategy and selection bias: An equilibrium perspectiveon overoptimism,”
American Economic Review, Vol. 108, No. 6, pp. 1582–97. Jehiel, Philippe and Jakub Steiner (2020) "Selective sampling with information-storage constraints,"
The Economic Journal , Vol. 130, No. 630, p. 1753–1781,DOI: https://doi.org/10.1093/ej/uez068.Kahneman, Daniel (2011)
Thinking, fast and slow : Macmillan.Kahneman, Daniel, Paul Slovic, and Amos Tversky (1982)
Judgment under uncer-tainty: Heuristics and biases : Cambridge University Press.Kahneman, Daniel and Amos Tversky (1979) “Prospect Theory: An Analysis ofDecision under Risk,”
Econometrica , Vol. 47, No. 2, pp. 263–292.Leung, Benson Tsz Kin (2020) “Limited cognitive ability and selective informationprocessing,”
Games and Economic Behavior , Vol. 120, pp. 345–369.Mailath, George J and Larry Samuelson (2020) “Learning under Diverse WorldViews: Model-Based Inference,”
American Economic Review , Vol. 110, No. 5, pp.1464–1501.Matˇejka, Filip and Alisdair McKay (2015) “Rational inattention to discrete choices:A new foundation for the multinomial logit model,”
American Economic Review ,Vol. 105, No. 1, pp. 272–98.McCright, Aaron M and Riley E Dunlap (2011) “The politicization of climate changeand polarization in the American public’s views of global warming, 2001–2010,”
The Sociological Quarterly , Vol. 52, No. 2, pp. 155–194.Monte, Daniel and Maher Said (2014) “The value of (bounded) memory in a chang-ing world,”
Economic Theory , Vol. 56, No. 1, pp. 59–82.Morris, Stephen (1994) “Trade with heterogeneous prior beliefs and asymmetricinformation,”
Econometrica , pp. 1327–1347.Oprea, Ryan (2020) “What Makes a Rule Complex?”
American Economic Review ,p. forthcoming.Rabin, Matthew (1998) “Psychology and economics,”
Journal of Economic Litera-ture , Vol. 36, No. 1, pp. 11–46.Rabin, Matthew and Joel L Schrag (1999) “First impressions matter: A model ofconfirmatory bias,”
The Quarterly Journal of Economics , Vol. 114, No. 1, pp.37–82.Savage, Leonard J (1972)
The foundations of statistics : Courier Corporation.Sims, Christopher A (2003) “Implications of rational inattention,”
Journal of Monetary Economics, Vol. 50, No. 3, pp. 665–690. Steiner, Jakub and Colin Stewart (2016) "Perceiving prospects properly,"
AmericanEconomic Review , Vol. 106, No. 7, pp. 1601–31.Tversky, Amos and Daniel Kahneman (1973) “Availability: A heuristic for judgingfrequency and probability,”
Cognitive psychology , Vol. 5, No. 2, pp. 207–232.Wilson, Andrea (2014) “Bounded memory and biases in information processing,”
Econometrica , Vol. 82, No. 6, pp. 2257–2294.Zimmerman, Corinne (2000) “The development of scientific reasoning skills,”
Developmental Review, Vol. 20, No. 1, pp. 99–149.
A Switching between different automatons
This section illustrates how the setup in this paper encompasses learning mechanisms that involve switching between automatons with size smaller than M.
I first argue that assuming that the DM could switch between multiple learning mechanisms with M memory states implicitly implies that he has a larger memory capacity than M. Suppose that a DM starts with (T, d) at memory state m₀ and switches to (T′, d′) and (T″, d″) once he transits to memory states m₁ and m₂, respectively. As (T, d) ≠ (T′, d′) ≠ (T″, d″), when the DM receives a signal s at memory state m and decides which memory state to transit to, or when he decides which action to take at memory state m, he has to remember whether he has once transited to memory state m₁ or m₂. In other words, the DM has to track not only his current memory state, but also has to memorize the (incomplete) history of his previous memory states. Switching between multiple automatons thus implicitly implies a larger memory capacity.
Now, I illustrate through an example that an M-memory-state automaton could be designed to involve switching between automatons with smaller sizes. The example is illustrated in Figure 6. In this example, the DM starts at memory state 3 with a learning mechanism (T, d) that involves 5 memory states, 1 to 5. Once he transits to memory state 1, he switches to another learning mechanism (T′, d′) that involves 5 memory states: 1, 10, 11, 12,
13. On the other hand, once he transits to memory state 5, he switches to the learning mechanism (T″, d″) that involves 5 memory states: 5, 6, 7, 8,
9. Thus, the proposed 13-memory-state automaton can be interpreted as a mechanism that involves switching between three 5-memory-state automatons.
Figure 6: An example of a 13-memory-state automaton that involves switching between three 5-memory-state automatons. The DM starts at memory state 3 with (T, d), and switches to (T′, d′) and (T″, d″) once he transits to memory states 1 and 5, respectively.
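A minimal sketch of this construction follows; it is my own encoding, since the paper gives only the picture in Figure 6. The ±1 labels (signal read as confirming or challenging) and the ladder structure of each block are assumptions for illustration. The three 5-state blocks share their border states 1 and 5, so "switching" is just an ordinary transition of the big automaton.

def line_block(states):
    """A 5-state ladder over the given ordered states: step right on +1, left on -1."""
    T = {}
    for i, m in enumerate(states):
        T[(m, +1)] = states[min(i + 1, len(states) - 1)]
        T[(m, -1)] = states[max(i - 1, 0)]
    return T

T = {}
T.update(line_block([1, 2, 3, 4, 5]))       # (T, d): starting block, initial state 3
T.update(line_block([13, 12, 11, 10, 1]))   # (T', d'): entered via state 1 (overwrites state 1's rules)
T.update(line_block([5, 6, 7, 8, 9]))       # (T'', d''): entered via state 5 (overwrites state 5's rules)

m = 3
for label in (-1, -1, -1):   # three challenging labels: 3 -> 2 -> 1, then into the (T', d') block
    m = T[(m, label)]
print(m)                     # 10: the DM is now inside the (T', d') block

The overwriting of the shared states' entries is exactly the "switch": once the DM reaches state 1 or 5, all further transitions follow the new block's rule, and no extra memory of the past is needed within the single 13-state table.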
B Example of a sequence of signal structures thatsatisfies Equation 11 and Assumption 2
In this section, I provide an example of a sequence of signal structures that satisfies Equation 11 and Assumption 2. Consider the following (set of) signal structures {(f^ω_N)^N_{ω=1}}^∞_{N=1} with S = [0, 1]:
f¹_N(s) = 2/3 for s ∈ [i/2, (i+1)/2) where i = 0; 4/3 for s ∈ [i/2, (i+1)/2) where i = 1;
f^ω_N(s) = 2/3 for s ∈ [i/2^ω, (i+1)/2^ω) where i = 0, 2, ..., 2^ω − 2; 4/3 for s ∈ [i/2^ω, (i+1)/2^ω) where i = 1, 3, ..., 2^ω − 1.
The signal structures are illustrated in Figure 7. First, they satisfy Equation 11, as f^ω_N(s)/f^{ω′}_N(s) ≥ 1/2 for all ω, ω′ ∈ Ω, N and s. Second, given any ω and ω′ ≠ ω and for any N, the Cauchy–Schwarz distance is equal to:
−log [ ∫ f^ω_N(s) f^{ω′}_N(s) ds / ( √(∫ (f^ω_N(s))² ds) √(∫ (f^{ω′}_N(s))² ds) ) ] = −log [ 1 / ( √(10/9) √(10/9) ) ] = log(10/9) > ξ
for any ξ < log(10/9), where ∫ f^ω_N(s) f^{ω′}_N(s) ds = 1 (on each interval where the coarser density is constant, the finer density averages to 1) and ∫ (f^ω_N(s))² ds = (1/2)(2/3)² + (1/2)(4/3)² = 10/9. Thus Assumption 2 holds for any ξ < log(10/9).
Figure 7: Signal structures that satisfy Equation 11 and Assumption 2 as N → ∞. The signal structures comprise low and high densities alternately on 2^ω equal-sized intervals.
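The constant log(10/9) can be verified numerically. The sketch below is my own check, with the densities as reconstructed above (values 2/3 and 4/3 on 2^ω alternating intervals); the midpoint grid is an implementation choice.

import numpy as np

def f(w, s):
    i = np.floor(s * 2**w).astype(int)          # which of the 2^w intervals s falls into
    return np.where(i % 2 == 0, 2/3, 4/3)       # low density on even intervals, high on odd

G = 2_000_000
s = (np.arange(G) + 0.5) / G                    # midpoint grid on [0, 1]
integral = lambda vals: vals.mean()             # Riemann sum over [0, 1]

for w, w2 in [(1, 2), (2, 5), (3, 4)]:
    cross = integral(f(w, s) * f(w2, s))        # equals 1 for w != w2
    n1 = np.sqrt(integral(f(w, s) ** 2))        # equals sqrt(10/9)
    n2 = np.sqrt(integral(f(w2, s) ** 2))
    d = -np.log(cross / (n1 * n2))
    print(w, w2, round(d, 6), "vs log(10/9) =", round(np.log(10/9), 6))

C Proofs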
C.1 Proof of Proposition 1
Proof.
Before I prove the proposition, I present and prove the following lemma. With slight abuse of notation, I use F to denote the probability mass on any lottery of signals, i.e., for any lottery of signals S′ = Σ_s g(s) × s where g ∈ △(S ∪ {∅}), F^ω(S′) ≡ ∫ f^ω(s′) g(s′) ds′. Lemma C.1.
There exist δ > 0 and a set of lotteries of signals {S_ω}^N_{ω=1} such that
δ F^ω(S_{ω′}) / Σ_{ω″≠ω′} F^ω(S_{ω″}) > 1 for all ω, ω′ ∈ Ω;
δ F^ω(S_ω) / Σ_{ω″≠ω} F^ω(S_{ω″}) > δ F^ω(S_{ω′}) / Σ_{ω″≠ω′} F^ω(S_{ω″}) for all ω ∈ Ω and ω′ ≠ ω. (C.1)
Moreover, if (δ, {S_ω}^N_{ω=1}) satisfies Equation (C.1), then (δ, {S′_ω}^N_{ω=1}), where S′_ω = β × {S_ω} + (1 − β) × {∅}, also satisfies Equation (C.1) for all β ∈ (0, 1).
Proof. First, note that δ does not affect the second inequality. Thus, for all ω and ω′ such that F^ω(S_{ω′}) > 0, there always exists a sufficiently large δ that satisfies the first inequality.
I proceed to show that there exists a set of lotteries of signals {S_ω}^N_{ω=1} with F^ω(S_{ω′}) > 0 for all ω, ω′ ∈ Ω that satisfies the second inequality. Note that
δ F^ω(S_ω) / Σ_{ω″≠ω} F^ω(S_{ω″}) > δ F^ω(S_{ω′}) / Σ_{ω″≠ω′} F^ω(S_{ω″}) ⟺ F^ω(S_ω) > F^ω(S_{ω′}).
Denote A ≡ min{ 1, min_ω √(∫ (f^ω(s))² ds) } and consider
S_ω = Σ_{s∈S} [ A f^ω(s) / √(∫ (f^ω(s))² ds) ] × s + [ 1 − A / √(∫ (f^ω(s))² ds) ] × ∅, (C.2)
that is, S_ω assigns probability A f^ω(s)/√(∫ (f^ω(s))² ds) to each signal s and the remaining probability to the empty set. (Note that the choice of A is only to ensure that the probabilities in the lottery of signals S_ω sum to 1. In particular, it does not affect the subsequent proof or the stationary probability distribution among memory states, as will be shown in the second part of Lemma C.1.) We have
F^ω(S_ω) = A ∫ (f^ω(s))² ds / √(∫ (f^ω(s))² ds) = A √(∫ (f^ω(s))² ds) > A ∫ f^ω(s) f^{ω′}(s) ds / √(∫ (f^{ω′}(s))² ds) = F^ω(S_{ω′}),
where the inequality is given by the Cauchy–Schwarz inequality.
Now I prove the second part of the lemma. Note that
δ F^ω(S′_{ω′}) / Σ_{ω″≠ω′} F^ω(S′_{ω″}) = β δ F^ω(S_{ω′}) / Σ_{ω″≠ω′} β F^ω(S_{ω″}) = δ F^ω(S_{ω′}) / Σ_{ω″≠ω′} F^ω(S_{ω″}) for all ω, ω′.
Thus, (δ, {S′_ω}^N_{ω=1}) satisfies Equation (C.1).
Now, I construct a sequence of updating mechanisms with asymptotic utility losses that converge to 0 as M → ∞. Consider the "star" updating mechanism illustrated in Figure 8. In the central memory state of the star, the DM randomly chooses one of the N actions, while there are N equal-length branches that correspond to each of the N actions. For ease of exposition, denote λ = ⌊(M − 1)/N⌋, relabel the memory states in the star as 0, 11, 12, ..., 1λ, 21, 22, ..., 2λ, ..., Nλ, and denote the unused memory states as Nλ + 1, Nλ + 2, ..., M − 1.
Formally, the decision rule is as follows:
d(0) = (1/N) × {1} + (1/N) × {2} + ··· + (1/N) × {N};
d(ik) = i for all i = 1, ..., N and k = 1, ..., λ;
d(m) = 1 for all m > Nλ.
(The action chosen in the unused memory states can be assigned to any of the N actions.) The transition function between the memory states is defined below for some δ > 0 and some {S_i}^N_{i=1} that satisfies the two inequalities in Lemma C.1. Denote by S_i(s) the probability assigned to the realization s in the lottery S_i, and choose {S_i}^N_{i=1} such that Σ^N_{i=1} S_i(s) ≤ 1 for all s ∈ S; such {S_i}^N_{i=1} exists, as shown in the second part of Lemma C.1. Suppose the DM receives some signal s; he follows the transition rule
T(0, s) = Σ^N_{i=1} S_i(s) × {i1} + (1 − Σ^N_{i=1} S_i(s)) × {0};
T(i1, s) = S_i(s) × {i2} + Σ_{j≠i} S_j(s) δ × {0} + [Σ_{j≠i} S_j(s)(1 − δ) + 1 − Σ^N_{j=1} S_j(s)] × {i1};
T(iλ, s) = Σ_{j≠i} S_j(s) δ × {i(λ−1)} + [S_i(s) + Σ_{j≠i} S_j(s)(1 − δ) + 1 − Σ^N_{j=1} S_j(s)] × {iλ};
while for k = 2, 3, ..., λ − 1,
T(ik, s) = S_i(s) × {i(k+1)} + Σ_{j≠i} S_j(s) δ × {i(k−1)} + [Σ_{j≠i} S_j(s)(1 − δ) + 1 − Σ^N_{j=1} S_j(s)] × {ik}.
Finally, for m > Nλ, T(m, s) = m for all s. By restricting the initial memory state to one of 0, 11, ..., 1λ, 21, ..., Nλ, the DM will never transit to memory states m > Nλ.
Figure 8: This figure illustrates a "star" updating mechanism with 4 actions. In the central (red) memory state, the DM randomly chooses one of the 4 actions. There are 4 equal-length branches that correspond to each of the 4 actions.
Before I prove the proposition, it would be useful to discuss the interpretation of the updating mechanism. It could be interpreted as a two-step transition rule that involves first labeling the signal as supporting one of the states in Ω or as supporting no state, and then transiting based on the label of the signal. In particular, upon receiving a signal s, the DM labels it as supporting state ω with probability S_ω(s) for all ω ∈ Ω, and labels it as supporting no state with the remaining probability. Next, under the transition rule based on the labeling, the DM tracks only the favorable action and his confidence level in this action, ranging from 1 to λ, as mentioned in the main text. Say at some period t the favorable action of the DM is ω: he revises his confidence level one unit upwards if he receives a signal labeled as supporting state ω (a belief-confirming signal), and he revises his confidence level one unit downwards with probability δ < 1 if he receives a signal labeled as supporting another state (a belief-challenging signal). In all other cases he does not revise his confidence level. It therefore can be interpreted as a learning algorithm with a particular (stochastic) definition of belief-confirming and belief-challenging signals, and an underreaction to belief-challenging signals.
Now I compute the stationary probability distribution µ^ω. Fix the state ω. In the stationary probability distribution, we have at the two extreme memory states of branch ω′, i.e., memory states ω′λ and ω′(λ−1):
µ^ω_{ω′(λ−1)} F^ω(S_{ω′}) = µ^ω_{ω′λ} δ Σ_{ω″≠ω′} F^ω(S_{ω″}),
µ^ω_{ω′(λ−1)} = µ^ω_{ω′λ} [ δ F^ω(S_{ω′}) / Σ_{ω″≠ω′} F^ω(S_{ω″}) ]^{−1}
for all ω′. That is, in the stationary distribution, the probability mass that enters memory state ω′λ equals the probability mass that leaves it. It also implies that at memory state ω′(λ−1),
µ^ω_{ω′λ} δ Σ_{ω″≠ω′} F^ω(S_{ω″}) + µ^ω_{ω′(λ−2)} F^ω(S_{ω′}) = µ^ω_{ω′(λ−1)} [ δ Σ_{ω″≠ω′} F^ω(S_{ω″}) + F^ω(S_{ω′}) ],
µ^ω_{ω′(λ−2)} F^ω(S_{ω′}) = µ^ω_{ω′(λ−1)} δ Σ_{ω″≠ω′} F^ω(S_{ω″}),
µ^ω_{ω′(λ−2)} = µ^ω_{ω′(λ−1)} [ δ F^ω(S_{ω′}) / Σ_{ω″≠ω′} F^ω(S_{ω″}) ]^{−1}.
Repeating the same procedure implies that for all k = 1, ..., λ,
µ^ω_{ω′k} = µ^ω_{ω′λ} [ δ F^ω(S_{ω′}) / Σ_{ω″≠ω′} F^ω(S_{ω″}) ]^{−(λ−k)} (C.3)
and
µ^ω_0 = µ^ω_{ω′λ} [ δ F^ω(S_{ω′}) / Σ_{ω″≠ω′} F^ω(S_{ω″}) ]^{−λ}. (C.4)
As Σ^N_{ω′=1} Σ^λ_{k=1} µ^ω_{ω′k} + µ^ω_0 = 1, we have
µ^ω_{ωλ} Σ^λ_{k=1} [ δ F^ω(S_ω) / Σ_{ω″≠ω} F^ω(S_{ω″}) ]^{−(λ−k)} + µ^ω_{ωλ} [ δ F^ω(S_ω) / Σ_{ω″≠ω} F^ω(S_{ω″}) ]^{−λ} + µ^ω_{ωλ} [ δ F^ω(S_ω) / Σ_{ω″≠ω} F^ω(S_{ω″}) ]^{−λ} Σ_{ω′≠ω} Σ^λ_{k=1} [ δ F^ω(S_{ω′}) / Σ_{ω″≠ω′} F^ω(S_{ω″}) ]^{(λ−k)} = 1. (C.5)
The two inequalities in Lemma C.1 imply that, fixing N, as λ → ∞,
µ^ω_{ωλ} [ δ F^ω(S_ω) / Σ_{ω″≠ω} F^ω(S_{ω″}) ]^{−λ} + µ^ω_{ωλ} [ δ F^ω(S_ω) / Σ_{ω″≠ω} F^ω(S_{ω″}) ]^{−λ} Σ_{ω′≠ω} Σ^λ_{k=1} [ δ F^ω(S_{ω′}) / Σ_{ω″≠ω′} F^ω(S_{ω″}) ]^{(λ−k)} → 0
and hence
Σ^λ_{k=1} µ^ω_{ωk} = µ^ω_{ωλ} Σ^λ_{k=1} [ δ F^ω(S_ω) / Σ_{ω″≠ω} F^ω(S_{ω″}) ]^{−(λ−k)} → 1
for all ω, and the asymptotic utility loss of the proposed non-ignorant updating mechanism is 0, which proves lim_{λ→∞} L*_M = 0.
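The Cauchy–Schwarz step in Lemma C.1 is easy to verify numerically. The sketch below is my own test harness on a toy discrete signal space (the paper's S_ω are lotteries over a continuum): with each lottery S_ω weighting signal s proportionally to f^ω(s), the diagonal term F^ω(S_ω) dominates every off-diagonal term F^ω(S_{ω′}).

import numpy as np

rng = np.random.default_rng(0)
K, N = 12, 4                           # number of signals and states (assumed)
F = rng.random((N, K)) + 0.1           # unnormalized f_w(s), full support
F /= F.sum(axis=1, keepdims=True)      # each row is a pmf

norm = np.sqrt((F ** 2).sum(axis=1))   # ||f_w||_2
S = F / norm[:, None]                  # lottery weights proportional to f_w(s) (up to A)

G = F @ S.T                            # G[w, w'] = F_w(S_w') up to the common constant A
ok = all(G[w, w] > G[w, wp] for w in range(N) for wp in range(N) if wp != w)
print("F_w(S_w) is maximal in each row:", ok)   # True, by the Cauchy-Schwarz inequality

C.2 Proof of Corollary 1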
Proof.
Note that an ignorant updating mechanism induces a utility loss weakly greater than min_ω u^ω p^ω, which is invariant in M. On the other hand, as shown in Proposition 1, L*_M converges to 0 as M → ∞. This implies that there exists some big enough M̄ such that for M > M̄, L*_M < min_ω u^ω p^ω. Consider ǫ < min_ω u^ω p^ω − L*_M̄; if an updating mechanism T ignores some state ω′, we have for M > M̄
L(T) ≥ u^{ω′} p^{ω′} ≥ min_ω u^ω p^ω > L*_M̄ + ǫ,
which proves the result. C.3 Proof of Corollary 2
Proof.
By Proposition 1, we have for all ω
lim_{M→∞} lim_{ǫ→0} lim_{t→∞} Pr(a^I_t = ω | ω) = 1 for I = A, B,
which proves the result. C.4 Proof of Proposition 2
Proof.
When N and M are finite, consider two states ω, ω′. With an argument similar to Hellman and Cover (1970), the stationary likelihood ratios across memory states satisfy
min_{m∈M^{ω′}} µ^ω_m/µ^{ω′}_m ≥ (l_{ωω′} l_{ω′ω})^{−(M−1)} max_{m∈M^ω} µ^ω_m/µ^{ω′}_m,
where l_{ωω′} = sup_s f^ω(s)/f^{ω′}(s). Suppose that the DM chooses action ω in state ω with probability 1 − ε and chooses action ω in state ω′ with probability ε′, i.e.,
Σ_{m∈M^ω} µ^ω_m = 1 − ε and Σ_{m∈M^ω} µ^{ω′}_m = ε′.
This implies that
max_{m∈M^ω} µ^ω_m/µ^{ω′}_m ≥ (1 − ε)/ε′ and min_{m∈M^{ω′}} µ^ω_m/µ^{ω′}_m ≥ (l_{ωω′} l_{ω′ω})^{−(M−1)} (1 − ε)/ε′.
Moreover, as Σ_{m∈M^{ω′}} µ^ω_m + Σ_{m∈M^ω} µ^ω_m ≤ 1, we have Σ_{m∈M^{ω′}} µ^ω_m ≤ ε and
max_{m∈M^{ω′}} µ^{ω′}_m ≤ ε / min_{m∈M^{ω′}} (µ^ω_m/µ^{ω′}_m) ≤ (l_{ωω′} l_{ω′ω})^{M−1} ε ε′ / (1 − ε).
As (l_{ωω′} l_{ω′ω})^{M−1} is bounded above, for sufficiently small ε and ε′ we must have
Σ_{m∈M^{ω′}} µ^{ω′}_m < M max_{m∈M^{ω′}} µ^{ω′}_m < M (l_{ωω′} l_{ω′ω})^{M−1} ε ε′ / (1 − ε) < 1.
Thus, if the DM chooses ε and ε′ close to 0, we must have Σ_{m∈M^{ω′}} µ^{ω′}_m close to 0, and the utility loss is larger than u^{ω′} p^{ω′}. Therefore, L*_M > 0.
For the second statement, consider any updating mechanism (T, d) with M memory states, and suppose that the DM's memory capacity increases to M′ > M. He can design an updating mechanism (T′, d′) with T′(m, s) = T(m, s) and d′(m) = d(m) for all m ≤ M. By choosing an initial memory state m ≤ M, the DM will never transit to memory states m > M. The stationary distribution and the utility loss do not change. Therefore, the DM can always secure a weakly lower L*_M when M increases, i.e., L*_M weakly decreases in M.
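The likelihood-ratio spread bound used above can be spot-checked numerically. The sketch below is my own test harness on randomly drawn automatons and signal structures (all distributions drawn from Dirichlet priors, an assumption for illustration): across the memory states of any M-state automaton, the stationary ratios µ^ω_m/µ^{ω′}_m spread by at most a factor (l_{ωω′} l_{ω′ω})^{M−1}, as in Lemma 2 of Hellman and Cover (1970).

import numpy as np

rng = np.random.default_rng(3)
M, K = 4, 3                                   # memory states, signals (assumed)
f1 = rng.dirichlet(np.ones(K))                # signal pmf under state w
f2 = rng.dirichlet(np.ones(K))                # signal pmf under state w'
l12, l21 = (f1 / f2).max(), (f2 / f1).max()   # the likelihood-ratio bounds

def stationary(P):
    vals, vecs = np.linalg.eig(P.T)
    v = np.real(vecs[:, np.argmax(np.real(vals))])   # Perron eigenvector
    return np.abs(v) / np.abs(v).sum()

worst = 0.0
for _ in range(2000):
    T = rng.dirichlet(np.ones(M), size=(M, K))       # T[m, s] = row over next states
    P1 = np.einsum("s,msn->mn", f1, T)               # transition matrix under state w
    P2 = np.einsum("s,msn->mn", f2, T)               # transition matrix under state w'
    r = stationary(P1) / stationary(P2)
    worst = max(worst, r.max() / r.min())
print(worst, "<=", (l12 * l21) ** (M - 1))           # the bound holds in every draw

C.5 Proof of Proposition 3
Proof.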
I first prove the first statement. Following Hellman and Cover (1970), we have for all ω′:
min_{m∈M} µ^{ω′}_m/µ^ω_m ≥ (l_{ωω′} l_{ω′ω})^{−(M−1)} max_{m∈M} µ^{ω′}_m/µ^ω_m ≥ ς^{M−1} max_{m∈M} µ^{ω′}_m/µ^ω_m ≥ ς^{M−1},
and therefore
u^{ω′} p^{ω′} µ^{ω′}_m / (u^ω p^ω µ^ω_m) ≥ ς^{M−1} u^{ω′} p^{ω′} / (u^ω p^ω) for all m. (C.8)
Given that Σ_{ω′} p^{ω′} = 1, max_{ω′≠ω} u^{ω′} p^{ω′} / (u^ω p^ω) ≥ (min_{ω″} u^{ω″} / max_{ω″} u^{ω″}) (1 − p^ω) / ((N − 1) p^ω). Thus, when p^ω is sufficiently small, there exists some ω′ ≠ ω with
u^{ω′} p^{ω′} µ^{ω′}_m / (u^ω p^ω µ^ω_m) ≥ ς^{M−1} (min_{ω″} u^{ω″} / max_{ω″} u^{ω″}) (1 − p^ω) / ((N − 1) p^ω) > 1 + A for all m ∈ M,
for some A > 0. Suppose that M^ω ≠ ∅; then if the DM chooses action ω′ instead of action ω at memory state m ∈ M^ω, his asymptotic utility loss decreases by u^{ω′} p^{ω′} µ^{ω′}_m − u^ω p^ω µ^ω_m > 0. Thus, when ǫ → 0, we have either M^ω = ∅ or max_{ω′} u^{ω′} p^{ω′} µ^{ω′}_m − u^ω p^ω µ^ω_m → 0 for all m ∈ M^ω. As max_{ω′} u^{ω′} p^{ω′} µ^{ω′}_m − u^ω p^ω µ^ω_m > A u^ω p^ω µ^ω_m, max_{ω′} u^{ω′} p^{ω′} µ^{ω′}_m − u^ω p^ω µ^ω_m → 0 implies u^ω p^ω µ^ω_m → 0 and max_{ω′} u^{ω′} p^{ω′} µ^{ω′}_m → 0, i.e., u^{ω′} p^{ω′} µ^{ω′}_m → 0 for all ω′. As M^ω is finite and u^{ω′} p^{ω′} > 0 for all ω′, Σ_{m∈M^ω} µ^{ω′}_m → 0 for all ω′.
I now prove the second statement. Note that L*_M is given by the following minimization problem. Denote β^ω = 1 − Σ_{m∈M^ω} µ^ω_m; then
L*_M = min Σ^N_{ω=1} u^ω p^ω β^ω subject to (β^ω)^N_{ω=1} ∈ cl(M),
where M is the feasibility set and cl(M) is the closure of M. As an example, when N = M = 2, M is characterized by Equation 7. We could also represent the problem as a maximization problem:
max Σ^N_{ω=1} u^ω p^ω α^ω subject to (α^ω)^N_{ω=1} ∈ cl(˜M),
where (α^ω)^N_{ω=1} ∈ cl(˜M) if and only if (1 − α^ω)^N_{ω=1} ∈ cl(M).
Suppose that for some (u^{ω′}, p^{ω′})^N_{ω′=1}, ǫ-optimal updating mechanisms must ignore state ω as ǫ → 0. Denote by (α^{ω′}*)^N_{ω′=1} the solution of the maximization problem. This implies that α^{ω}* = 0 and
Σ_{ω′≠ω} u^{ω′} p^{ω′} α^{ω′}* > Σ^N_{ω′=1} u^{ω′} p^{ω′} α^{ω′} for all (α^{ω′})^N_{ω′=1} ∈ cl(˜M) with α^ω > 0.
Rearranging the inequality gives us
Σ_{ω′≠ω} [u^{ω′} p^{ω′} / (u^ω p^ω)] (α^{ω′}* − α^{ω′}) > α^ω.
Now, without loss of generality, assume that ω ≠ 1, and consider any (ũ^{ω′}, p̃^{ω′})^N_{ω′=1} such that ũ^{ω′} p̃^{ω′} / (ũ¹ p̃¹) = u^{ω′} p^{ω′} / (u¹ p¹) for all ω′ ≠ ω and [ũ¹ p̃¹ / (ũ^ω p̃^ω)] × [u^ω p^ω / (u¹ p¹)] > 1, i.e., state ω is relatively less important under (ũ, p̃). Then
Σ_{ω′≠ω} [ũ^{ω′} p̃^{ω′} / (ũ^ω p̃^ω)] (α^{ω′}* − α^{ω′}) = [ũ¹ p̃¹ / (ũ^ω p̃^ω)] × [u^ω p^ω / (u¹ p¹)] × Σ_{ω′≠ω} [u^{ω′} p^{ω′} / (u^ω p^ω)] (α^{ω′}* − α^{ω′}) > Σ_{ω′≠ω} [u^{ω′} p^{ω′} / (u^ω p^ω)] (α^{ω′}* − α^{ω′}) > α^ω.
It implies that Σ_{ω′≠ω} ũ^{ω′} p̃^{ω′} α^{ω′}* > Σ^N_{ω′=1} ũ^{ω′} p̃^{ω′} α^{ω′} for all (α^{ω′})^N_{ω′=1} ∈ cl(˜M) with α^ω > 0. Thus ǫ′-optimal updating mechanisms must ignore state ω as ǫ′ → 0. C.6 Proof of Corollary 3
Proof.
By Proposition 3, if p^ω_A is small enough for all ω ∈ N_A ⊂ N and p^ω_B is small enough for all ω ∈ N_B = N \ N_A, individual A never picks action ω for all ω ∈ N_A and individual B never picks action ω for all ω ∈ N \ N_A. Therefore, they must disagree with each other. C.7 Proof of Proposition 4
Proof.
First, as M = 1 for individual A, his action is constant over all periods for all signal realizations. The optimal automaton is thus M² = M³ = ∅ and a^A_t = 1 for all t. Now I characterize the (almost) optimal updating mechanism of individual B. With some abuse of notation, denote by L*(nn′) the optimal utility loss when the DM chooses action n in memory state 1 and action n′ in memory state 2. Building on results in Hellman and Cover (1970), we have
L*(11) = 2/3 − 2ν,
L*(22) = L*(33) = 2/3 + ν,
L*(12) = L*(13) = 1/3 − ν + 2√((1 + τ)(1/3 + 2ν)(1/3 − ν)) − (1/3 − ν)τ,
L*(23) = 1/3 + 2ν + 2(1/3 − ν)√Υ − (1/3 − ν)Υ,
where L*(22) = L*(33) > L*(11) ≥ L*(12) = L*(13). I first prove that L*(12) > L*(23) if and only if ν is small enough. First, L*(12) > L*(23) if and only if
△L(12−23) = 3ν + 2(1/3 − ν)√Υ − (1/3 − ν)Υ − 2√((1 + τ)(1/3 + 2ν)(1/3 − ν)) + (1/3 − ν)τ < 0.
When ν = 0,
△L(12−23) = (2/3)(√Υ − √(1 + τ)) − (Υ − τ)/3.
As (2/3)√x − x/3 decreases in x for x > 1, and Υ > τ, △L(12−23) < 0 when ν = 0, i.e., L*(12) > L*(23) when ν = 0, which by continuity proves the result. C.8 Proof of Proposition 5
Proof.
I show that for all ǫ >
0, there exists a sequence of updating mechanisms whose utility losses converge to a value smaller than ǫ as N, M → ∞. First, by Assumption 2, for all ǫ/ū >
0, there exist a ξ > 0 and a sequence of subsets of states N_ξ such that
lim_{N→∞} (Σ_{ω∈N_ξ} p^ω_N) > 1 − ǫ/ū
and
lim_{N→∞} inf_{ω,ω′∈N_ξ; ω′≠ω} −log [ ∫ f^ω_N(s) f^{ω′}_N(s) ds / ( √(∫ (f^ω_N(s))² ds) √(∫ (f^{ω′}_N(s))² ds) ) ] > ξ.
Before proving the proposition, I first prove the following lemma for (S_{ωN})^N_{ω=1} defined in Equation (C.2). Lemma C.2.
For ǫ/ū > 0, there exist a ˜ξ > 0, a sequence of δ_N > 0, and a sequence of subsets of states N_ξ with lim_{N→∞} (Σ_{ω∈N_ξ} p^ω_N) > 1 − ǫ/ū such that
δ_N F^ω_N(S_{ω′N}) / Σ_{ω″≠ω′} F^ω_N(S_{ω″N}) > 1 for all ω, ω′ ∈ Ω and N;
lim_{N→∞} inf_{ω,ω′∈N_ξ; ω′≠ω} [ δ_N F^ω_N(S_{ωN}) / Σ_{ω″≠ω} F^ω_N(S_{ω″N}) ] / [ δ_N F^ω_N(S_{ω′N}) / Σ_{ω″≠ω′} F^ω_N(S_{ω″N}) ] > 1 + ˜ξ. (C.9)
Proof. As F^ω_N(S_{ω′N}) > 0 for all ω, ω′, there always exists a δ_N such that the first inequality of Equation (C.9) holds. To prove the second inequality, note that
lim_{N→∞} inf_{ω,ω′∈N_ξ; ω′≠ω} [ δ_N F^ω_N(S_{ωN}) / Σ_{ω″≠ω} F^ω_N(S_{ω″N}) ] / [ δ_N F^ω_N(S_{ω′N}) / Σ_{ω″≠ω′} F^ω_N(S_{ω″N}) ]
≥ lim_{N→∞} inf_{ω,ω′∈N_ξ; ω′≠ω} F^ω_N(S_{ωN}) / F^ω_N(S_{ω′N})
= lim_{N→∞} inf_{ω,ω′∈N_ξ; ω′≠ω} √(∫ (f^ω_N(s))² ds) √(∫ (f^{ω′}_N(s))² ds) / ∫ f^ω_N(s) f^{ω′}_N(s) ds
> exp(ξ) = 1 + ˜ξ, (C.10)
where the first inequality of Equation (C.10) is implied by the fact that F^ω_N(S_{ωN}) ≥ F^ω_N(S_{ω′N}) and thus Σ_{ω″≠ω′} F^ω_N(S_{ω″N}) ≥ Σ_{ω″≠ω} F^ω_N(S_{ω″N}).
Now consider the star updating mechanism with branches only for actions in N_ξ, i.e., there exists no m ∈ M such that d(m) = ω′ for ω′ ∉ N_ξ. As in the proof of Proposition 1, I compute the stationary distribution under state ω ∈ N_ξ:
lim_{N,M→∞} { µ^ωN_{ωλ} Σ^λ_{k=1} [ δ_N F^ω_N(S_{ωN}) / Σ_{ω″≠ω} F^ω_N(S_{ω″N}) ]^{−(λ−k)} + µ^ωN_{ωλ} [ δ_N F^ω_N(S_{ωN}) / Σ_{ω″≠ω} F^ω_N(S_{ω″N}) ]^{−λ} + µ^ωN_{ωλ} [ δ_N F^ω_N(S_{ωN}) / Σ_{ω″≠ω} F^ω_N(S_{ω″N}) ]^{−λ} Σ_{ω′∈N_ξ\{ω}} Σ^λ_{k=1} [ δ_N F^ω_N(S_{ω′N}) / Σ_{ω″≠ω′} F^ω_N(S_{ω″N}) ]^{(λ−k)} } = 1, (C.11)
where µ^ωN_m is the stationary probability of memory state m under state ω when the state space has size N. As lim_{N,M→∞} M/N = ∞, lim_{N,M→∞} λ = ∞. Given the first inequality of Lemma C.2, µ^ωN_{ωλ} [ δ_N F^ω_N(S_{ωN}) / Σ_{ω″≠ω} F^ω_N(S_{ω″N}) ]^{−λ} → 0 as λ → ∞. On the other hand,
lim_{N,λ→∞} [ δ_N F^ω_N(S_{ωN}) / Σ_{ω″≠ω} F^ω_N(S_{ω″N}) ]^{−λ} Σ_{ω′∈N_ξ\{ω}} Σ^λ_{k=1} [ δ_N F^ω_N(S_{ω′N}) / Σ_{ω″≠ω′} F^ω_N(S_{ω″N}) ]^{(λ−k)}
≤ lim_{N,λ→∞} Σ_{ω′∈N_ξ\{ω}} ( [ δ_N F^ω_N(S_{ω′N}) / Σ_{ω″≠ω′} F^ω_N(S_{ω″N}) ] / [ δ_N F^ω_N(S_{ωN}) / Σ_{ω″≠ω} F^ω_N(S_{ω″N}) ] )^{λ−1}
≤ lim_{N,λ→∞} Σ_{ω′∈N_ξ\{ω}} (1 + ˜ξ)^{−(λ−1)} ≤ lim_{N,λ→∞} N (1 + ˜ξ)^{−(λ−1)},
where the second inequality is implied by the second inequality of Lemma C.2. As (1 + ˜ξ)^{−λ} converges to 0 exponentially and N converges to infinity linearly, lim N(1 + ˜ξ)^{−λ} = 0. To see this formally, note that N = O(M^h) with h < 1 and λ = M/N imply that N = O(λ^{h/(1−h)}), where h/(1−h) ∈ (0, ∞). By repeated application of L'Hôpital's rule,
lim_{λ→∞} λ^{h/(1−h)} / (1 + ˜ξ)^λ = lim_{λ→∞} [h/(1−h)] λ^{h/(1−h)−1} / [ (log(1 + ˜ξ)) (1 + ˜ξ)^λ ] = ··· = 0.
Thus, the last two terms on the left-hand side of Equation (C.11) vanish, which implies
lim_{N,λ→∞} Σ^λ_{k=1} µ^ωN_{ωk} = lim_{N,λ→∞} µ^ωN_{ωλ} Σ^λ_{k=1} [ δ_N F^ω_N(S_{ωN}) / Σ_{ω″≠ω} F^ω_N(S_{ω″N}) ]^{−(λ−k)} = 1 (C.12)
for all ω ∈ N_ξ. As lim_{N→∞} Σ_{ω∈N_ξ} p^ω_N > 1 − ǫ/ū, the asymptotic utility loss of the proposed updating mechanism is bounded above by ū × ǫ/ū = ǫ. C.9 Proof of Corollary 4
Proof.
Suppose to the contrary that the sequence of updating mechanisms T_N ignores strictly more than an ǫ/u measure of states at the limit. Denoting the set of states that are ignored by Ñ, the utility loss is
lim_{N,M→∞} L_N(T_N) ≥ lim_{N→∞} Σ_{ω∈Ñ} u^ω_N p^ω_N ≥ u lim_{N→∞} Σ_{ω∈Ñ} p^ω_N > u (ǫ/u) = ǫ = lim_{N,M→∞} L*_NM + ǫ,
which proves the result. C.10 Proof of Corollary 5
Proof.
By Proposition 5, we have for individual I:
lim_{N,M→∞} lim_{ǫ→0} Σ^N_{ω=1} [ 1{ lim_{t→∞} Pr_N(a^I_t = ω | ω) = 1 } p^ω_NI ] = 1,
which is equivalent to
lim_{N,M→∞} lim_{ǫ→0} Σ^N_{ω=1} [ 1{ lim_{t→∞} Pr_N(a^I_t = ω | ω) < 1 } p^ω_NI ] = 0,
i.e., individual I would only take sub-optimal actions in probability 0 events measured by (p^ω_NI)^N_{ω=1}. As lim_{N→∞} p^ω_NA > 0 if and only if lim_{N→∞} p^ω_NB > 0 for all ω, combined with Assumption 1, this implies that individuals A and B agree on the probability 0 events. That is, for any sequence of subsets of states N̂ where lim_{N→∞} Σ_{ω∈N̂} p^ω_NA = 0, we have lim_{N→∞} Σ_{ω∈N̂} p^ω_NB = 0, which implies the result. C.11 Proof of Proposition 6
Proof.
First, note that for lim
N,M→∞ L*_NM = 0, we must have
lim_{N,M→∞} Σ_{m∈M^ω} µ^ωN_m = 1
for almost all ω, i.e., there must exist a sequence of subsets of states N̂ where lim_{N→∞} Σ_{ω∈N̂} p^ω_N = 1 and lim_{N,M→∞} Σ_{m∈M^ω} µ^ωN_m = 1 for all ω ∈ N̂. Moreover, Assumption 1 implies that there must exist a sequence of subsets of states N̂ where lim_{N→∞} |N̂|/N = 1 and lim_{N,M→∞} Σ_{m∈M^ω} µ^ωN_m = 1 for all ω ∈ N̂. That is, the DM chooses the optimal action in almost all states, measured both in prior probability and in fraction. It implies that for all ω in N̂, there must exist a set of memory states M̂^ω ⊆ M^ω such that
lim_{N,M→∞} Σ_{m∈M̂^ω} µ^ωN_m = 1;
lim_{N,M→∞} Σ_{m∈M̂^ω} µ^{ω′}N_m = 0 for all ω′ ∈ N̂ \ {ω};
lim_{N,M→∞} max_{m∈M̂^ω} µ^ωN_m/µ^{ω′}N_m = ∞ for all ω′ ∈ N̂ \ {ω}. (C.13)
In the following I prove that for Equation (C.13) to hold, M/N has to go to ∞. First consider an irreducible automaton. Fix a ω′ ∈ N̂ \ {ω} and, without loss of generality, rearrange the memory states such that µ^ωN_m/µ^{ω′}N_m is weakly decreasing in m. According to Lemma 2 of Hellman and Cover (1970), we have for all m < M:
µ^ωN_{m+1}/µ^{ω′}N_{m+1} ≥ (l_{ωω′N} l_{ω′ωN})^{−1} µ^ωN_m/µ^{ω′}N_m ≥ ς µ^ωN_m/µ^{ω′}N_m. (C.14)
As there must exist some m with µ^ωN_m/µ^{ω′}N_m ≤ 1, if max_m µ^ωN_m/µ^{ω′}N_m = µ^ωN_1/µ^{ω′}N_1 > K, Equation (C.14) implies that µ^ωN_{m′}/µ^{ω′}N_{m′} ≥ ς^{m′−1} µ^ωN_1/µ^{ω′}N_1 > ς^{m′−1} K, and ς^{m′−1} K ≥ 1 for all m′ − 1 ≤ log K / log(ς^{−1}). In other words, there must exist at least log K / log(ς^{−1}) + 1 memory states with a state ω versus ω′ likelihood ratio µ^ωN_m/µ^{ω′}N_m ≥ 1. Repeating the same analysis for other ω″ ∈ N̂ \ {ω} implies that if max_m µ^ωN_m/µ^{ω″}N_m > K for all ω″ ∈ N̂ \ {ω}, there must exist at least log K / log(ς^{−1}) + 1 memory states with a likelihood ratio min_{ω′∈N̂\{ω}} µ^ωN_m/µ^{ω′}N_m ≥ 1. Thus, if max_m µ^ωN_m/µ^{ω′}N_m > K for all ω ∈ N̂ and all ω′ ∈ N̂ \ {ω}, M/|N̂| must be weakly greater than log K / log(ς^{−1}) + 1. This implies that M/|N̂| goes to ∞ as K goes to ∞. It contradicts the fact that lim_{N,M→∞} M/|N̂| = lim_{N,M→∞} M/N < ∞ in big worlds.
Now I analyze the case of reducible automatons. Denote the recurrent communicating classes by R_1, ..., R_r, and the set of transient memory states by R_0. The analysis above applies in the cases where there is only one recurrent communicating class or where the initial memory state is in one of the recurrent communicating classes.
Now consider the case where r > 1, i.e., there is more than one recurrent communicating class, and the initial memory state, denoted by i_0, is in R_0. I first compute the probability of absorption by R_j under state ω, denoted by P^ωN(R_j). Consider a new transition rule T′ where all transitions from m ∈ R_0 to another m′ ∈ R_0 are the same as before; however, T′ differs from T in that all transitions from m ∈ R_0 to m′ ∉ R_0 are changed to transitions from m to i_0. Given such a transition rule T′, obviously only memory states in R_0 are reachable. Denote by µ̃^ωN_m the stationary distribution of this new transition rule T′.
As is known from the theory of Markov chains (see Appendix 2 of Hellman (1969)), P^ωN(R_j) is given by:
P^ωN(R_j) = Σ_{m∈R_0} µ̃^ωN_m Σ_{m′∈R_j} q^ω_{mm′}.
Also denote by µ^{jωN}_m the stationary distribution within the recurrent communicating class R_j; we have for m ∈ R_j:
µ^ωN_m/µ^{ω′}N_m = [P^ωN(R_j)/P^{ω′}N(R_j)] × µ^{jωN}_m/µ^{jω′N}_m = { Σ_{m∈R_0} µ̃^ωN_m Σ_{m′∈R_j} q^ω_{mm′} / Σ_{m∈R_0} µ̃^{ω′}N_m Σ_{m′∈R_j} q^{ω′}_{mm′} } × µ^{jωN}_m/µ^{jω′N}_m ≤ ς^{−1} max_{m∈R_0} (µ̃^ωN_m/µ̃^{ω′}N_m) × µ^{jωN}_m/µ^{jω′N}_m.
Thus, if max_m µ^ωN_m/µ^{ω′}N_m > K, we must have
ς^{−1} max_{m∈R_0} (µ̃^ωN_m/µ̃^{ω′}N_m) × max_{j∈{1,...,r}} max_{m∈R_j} (µ^{jωN}_m/µ^{jω′N}_m) > K.
Thus, when K → ∞, we must have either max_{m∈R_0} µ̃^ωN_m/µ̃^{ω′}N_m → ∞ or max_{j∈{1,...,r}} max_{m∈R_j} µ^{jωN}_m/µ^{jω′N}_m → ∞. Then the result follows from arguments similar to the case of irreducible automatons.
C.12 Proof of Proposition 7
Proof.
Suppose to the contrary that there exists a sequence of subsets of states Ñ where lim_{N→∞} |Ñ|/N > 0 such that for all ω ∈ Ñ, the DM takes action ω with some strictly positive probability in some state ω′ (where ω′ could be different for different ω). Formally, this implies that for all ω ∈ Ñ,
lim_{N,M→∞} Σ_{m∈M^ω} µ^{ω′}N_m ≥ ξ > 0 for some ω′.
As M^ω must be finite for almost all ω measured in fraction, it implies that there exists some Ñ′ where lim_{N→∞} |Ñ′|/N > 0 such that for all ω ∈ Ñ′ there exists some ω′ ∈ Ω with lim_{N,M→∞} µ^{ω′}N_m ≥ ξ′ > 0 for some m ∈ M^ω. Now denote by Ñ″ the set of such ω′, that is, the set of states of the world under which the DM picks some action ω ∈ Ñ′ with strictly positive probability. We must have lim_{N,M→∞} |Ñ″|/N > 0; otherwise, there must exist some ω′ ∈ Ñ″ such that there are infinitely many memory states with strictly positive probability, which contradicts the fact that lim_{N,M→∞} Σ^M_{m=1} µ^{ω′}N_m = 1. With similar arguments, for almost all ω ∈ Ñ′, there must exist some m ∈ M^ω such that lim_{N,M→∞} µ^{ω′}N_m ≥ ξ′ for some ω′ ∈ Ñ″ but lim_{N,M→∞} µ^{ω″}N_m = 0 for almost all states ω″ ∈ Ñ″ \ {ω′}. This implies that there exists some sequence of subsets of states ˜Ñ where lim_{N→∞} |˜Ñ|/N > 0 such that
lim_{N,M→∞} max_m µ^{ω′}N_m/µ^{ω″}N_m = ∞
for all ω′ ∈ ˜Ñ and all ω″ ∈ ˜Ñ \ {ω′}. However, this is shown to be impossible in the proof of Proposition 6. The result thus follows. C.13 Proof of Corollary 6
Proof.
Consider an example where lim_{N→∞} p^ω_N = 0 for all ω. As shown in Proposition 7, all sequences of updating mechanisms must ignore almost all actions when N, M go to infinity. Thus, lim_{N,M→∞} L*_NM = lim_{N→∞} Σ_ω u^ω_N p^ω_N, and all updating mechanisms are ǫ-optimal for any ǫ ≥ 0. Thus, if individual A adopts an updating mechanism with d(m) = 1 for all m and all N, and individual B adopts an updating mechanism with d(m) = 2 for all m and all N ≥ 2, then they must disagree with each other under all ω.
D Robustness of the results in small worlds to updating mistakes
Below, I show that the behavioral implications in small worlds, i.e., that learning is close to Bayesian and that disagreement does not persist, hold even when individuals make "updating mistakes". In other words, the results are robust to individuals' limited ability to design and follow an "optimal" updating mechanism.
Consider two individuals A and B. Individual A adopts the star updating mechanism described in the proof of Proposition 1, while individual B "attempts" to adopt the same updating mechanism but makes local mistakes: he randomly transits to neighboring memory states with some probability γ ∈ (0, 1). The transition rule of B, denoted by T′(m, s), is as follows:
T′(0, s) = (1 − γ) × T(0, s) + Σ^N_{j=1} (γ/N) × {j1};
T′(i1, s) = (1 − γ) × T(i1, s) + (γ/2) × {i2} + (γ/2) × {0};
T′(iλ, s) = (1 − γ) × T(iλ, s) + γ × {i(λ−1)};
while for k = 2, 3, ..., λ − 1,
T′(ik, s) = (1 − γ) × T(ik, s) + (γ/2) × {i(k−1)} + (γ/2) × {i(k+1)},
where T(m, s) is defined in the proof of Proposition 1. Such updating mistakes could be induced by memory imperfections, i.e., the DM's memory state is subject to local fluctuations, or by imperfect perception of signals, e.g., the DM may mistakenly perceive any signal as a signal that supports state ω. Proposition D.1.
Consider individual A, who adopts the star updating mechanism, and individual B, who makes local mistakes with some probability γ ∈ (0, 1), as characterized above by T′(m, s). Fix a finite N. For all γ ∈ (0, 1), the utility loss of individual B converges to 0 as M → ∞, i.e., lim_{M→∞} L(T′) = 0. Individuals A and B are bound to agree in small worlds, i.e., for all (u^ω_A, p^ω_A, f^ω_A)^N_{ω=1}, (u^ω_B, p^ω_B, f^ω_B)^N_{ω=1}, fixing N and letting M → ∞,
lim_{M→∞} lim_{ǫ→0} lim_{t→∞} Pr(a^A_t = a^B_t | ω) = 1 for all ω.
The proof follows closely the proof of Proposition 1. For individual B, fixing the state ω, in the stationary probability distribution we have at the two extreme memory states of branch ω′:
µ^ω_{ω′λ} [ (1 − γ) δ Σ_{ω″≠ω′} F^ω(S_{ω″}) + γ ] = µ^ω_{ω′(λ−1)} [ (1 − γ) F^ω(S_{ω′}) + γ/2 ],
µ^ω_{ω′(λ−1)} = µ^ω_{ω′λ} [ ((1 − γ) F^ω(S_{ω′}) + γ/2) / ((1 − γ) δ Σ_{ω″≠ω′} F^ω(S_{ω″}) + γ) ]^{−1}
for all ω′. Similarly, at memory state ω′(λ−1),
µ^ω_{ω′(λ−2)} [ (1 − γ) F^ω(S_{ω′}) + γ/2 ] = µ^ω_{ω′(λ−1)} [ (1 − γ) δ Σ_{ω″≠ω′} F^ω(S_{ω″}) + γ/2 ],
µ^ω_{ω′(λ−2)} = µ^ω_{ω′(λ−1)} [ ((1 − γ) F^ω(S_{ω′}) + γ/2) / ((1 − γ) δ Σ_{ω″≠ω′} F^ω(S_{ω″}) + γ/2) ]^{−1}.
Repeating the same procedure implies that for all k = 1, ..., λ − 1,
µ^ω_{ω′k} = µ^ω_{ω′λ} [ ((1 − γ) F^ω(S_{ω′}) + γ/2) / ((1 − γ) δ Σ_{ω″≠ω′} F^ω(S_{ω″}) + γ) ]^{−1} [ ((1 − γ) F^ω(S_{ω′}) + γ/2) / ((1 − γ) δ Σ_{ω″≠ω′} F^ω(S_{ω″}) + γ/2) ]^{−(λ−k−1)} (D.1)
and
µ^ω_0 = µ^ω_{ω′1} [ ((1 − γ) F^ω(S_{ω′}) + γ/N) / ((1 − γ) δ Σ_{ω″≠ω′} F^ω(S_{ω″}) + γ/2) ]^{−1}. (D.2)
Note that for all γ ∈ (0, 1) and suitable δ,
((1 − γ) F^ω(S_ω) + γ/2) / ((1 − γ) δ Σ_{ω″≠ω} F^ω(S_{ω″}) + γ/2) > 1
and
((1 − γ) F^ω(S_ω) + γ/2) / ((1 − γ) δ Σ_{ω″≠ω} F^ω(S_{ω″}) + γ/2) > ((1 − γ) F^ω(S_{ω′}) + γ/2) / ((1 − γ) δ Σ_{ω″≠ω′} F^ω(S_{ω″}) + γ/2) for all ω′ ≠ ω,
which is the analogue of Lemma C.1. Then following the same steps as in the proof of Proposition 1 proves the result.
Next, I prove the analogous result in the extension where N, M → ∞ and N = O(M^h) with h < 1. Assume that individual A adopts a star updating mechanism with a sequence of δ_N that satisfies
((1 − γ) F^ω_N(S_{ω′N}) + γ/2) / ((1 − γ) δ_N Σ_{ω″≠ω′} F^ω_N(S_{ω″N}) + γ/2) > 1 for all ω, ω′ ∈ Ω and N.
Fixing γ ∈ (0, 1), such δ_N always exists, as F^ω_N(S_{ω′N}) > 0 for all ω, ω′ ∈ Ω and N. It also implies that
lim_{N→∞} inf_{ω,ω′∈N_ξ; ω′≠ω} [ ((1 − γ) F^ω_N(S_{ωN}) + γ/2) / ((1 − γ) δ_N Σ_{ω″≠ω} F^ω_N(S_{ω″N}) + γ/2) ] / [ ((1 − γ) F^ω_N(S_{ω′N}) + γ/2) / ((1 − γ) δ_N Σ_{ω″≠ω′} F^ω_N(S_{ω″N}) + γ/2) ]
≥ lim_{N→∞} inf_{ω,ω′∈N_ξ; ω′≠ω} ( F^ω_N(S_{ωN}) + γ/(2(1 − γ)) ) / ( F^ω_N(S_{ω′N}) + γ/(2(1 − γ)) ),
where the inequality is implied by the fact that Σ_{ω″≠ω} F^ω_N(S_{ω″N}) ≤ Σ_{ω″≠ω′} F^ω_N(S_{ω″N}). From Lemma C.2, we know that for all ǫ/ū > 0 there exist a ξ > 0 and a sequence N_ξ with lim_{N→∞} (Σ_{ω∈N_ξ} p^ω_N) > 1 − ǫ/ū such that
lim_{N→∞} inf_{ω,ω′∈N_ξ; ω′≠ω} F^ω_N(S_{ωN}) / F^ω_N(S_{ω′N}) > 1 + ˜ξ.
Thus, fixing γ < 1, there also exists some ˜˜ξ > 0 such that
lim_{N→∞} inf_{ω,ω′∈N_ξ; ω′≠ω} ( F^ω_N(S_{ωN}) + γ/(2(1 − γ)) ) / ( F^ω_N(S_{ω′N}) + γ/(2(1 − γ)) ) > 1 + ˜˜ξ.
Then following the same steps as in the proof of Proposition 5 gives the following result.
Proposition D.2.
Consider individual A, who adopts the star updating mechanism, and individual B, who makes local mistakes with some probability γ ∈ (0, 1), as characterized by T′(m, s). Suppose that N, M → ∞ and N = O(M^h) where h < 1. Then lim_{N,M→∞} L_N(T′) = 0. Moreover, the two individuals are almost bound to agree asymptotically in small worlds if they agree on the probability 0 events: for all {(u^ω_NA, p^ω_NA, f^ω_NA)^N_{ω=1}}^∞_{N=1} and {(u^ω_NB, p^ω_NB, f^ω_NB)^N_{ω=1}}^∞_{N=1} such that lim_{N→∞} p^ω_NA > 0 if and only if lim_{N→∞} p^ω_NB > 0 for all ω,
lim_{N,M→∞} lim_{ǫ→0} Σ_ω [ 1{ lim_{t→∞} Pr_N(a^A_t = a^B_t | ω) = 1 } p^ω_NI ] = 1 for I = A, B.
The result illustrates the robustness of agreement in small worlds. In particular, even if the individual makes local mistakes with probability close to 1, he will (almost) learn the true state of the world perfectly asymptotically. By combining Corollaries 2 and 5 with Propositions D.1 and D.2, we therefore expect disagreement to vanish over time in small worlds among different individuals with different prior beliefs, abilities of information acquisition, or abilities to avoid updating mistakes.
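The robustness to local mistakes can be simulated directly. The sketch below is my own illustration with assumed parameters (binary signals with accuracy q, reaction probability δ, mistake probability γ): after each intended star update, the memory state also jitters one step with probability γ, and the long-run error still vanishes as the branch length grows.

import random

def noisy_star_error(lam, gamma, q=0.65, delta=1.0, T=400_000, seed=7):
    """Long-run error frequency of a 2-action star automaton whose memory state
    drifts to a neighboring confidence level with probability gamma."""
    rng = random.Random(seed)
    fav, conf = 0, 0                        # favorite branch (0 = true state), confidence 0..lam
    wrong = 0.0
    for _ in range(T):
        s = 0 if rng.random() < q else 1    # signal supports the true state w.p. q
        if conf == 0:
            fav, conf = s, 1                # leave the center toward the supported branch
        elif s == fav:
            conf = min(conf + 1, lam)       # belief-confirming signal
        elif rng.random() < delta:
            conf -= 1                       # belief-challenging signal: underreact
        if rng.random() < gamma:            # local mistake: one-step jitter
            conf = max(0, min(lam, conf + rng.choice((-1, 1))))
        if conf == 0:
            wrong += 0.5                    # center randomizes over the two actions
        elif fav != 0:
            wrong += 1.0
    return wrong / T

for lam in (2, 8, 32):
    print(lam, round(noisy_star_error(lam, gamma=0.4), 4))   # error still vanishes in lam

The reason mirrors the proof: the jitter is unbiased, so the signal-driven drift (outward on the true branch, inward on wrong branches) still dominates, and the stationary mass on wrong branches decays geometrically in λ for any γ < 1.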
E An example of ignorance with uniform prior belief and symmetric signal structures
In this section, I present an example showing that not only the prior belief and the signal distribution, but also the complexity of the world plays a role in the optimality of ignorant learning behavior.
I consider a case where N = M ≥ 4 and the prior belief is uniform: p^ω = 1/N for all ω ∈ {1, ..., N}. Moreover, u^ω = 1 for all ω. For simplicity, consider "symmetric" discrete signal structures where S = {s_1, ..., s_N} and s_ω is a signal that supports state ω. More specifically, F^ω(s_ω) = I F^ω(s_{ω′}) for all ω and ω′ ≠ ω, where I > 1, i.e., it is I times more likely to receive a signal that supports the true state than a signal that supports any one of the other states.
In such a symmetric environment, there seems to be no reason to ignore any of the states. However, I will present an example showing that it is beneficial to ignore some states when N is large. First, consider a simple "symmetric" updating mechanism that ignores no states, illustrated in Figure 9 with an example of N = 4. As the DM ignores no state, he allocates one memory state to each action. Without loss of generality, assume he takes action ω in memory state ω. Upon receiving a signal s_ω, i.e., a signal that supports state ω, he transits to memory state ω with some probability δ <
Formally, the transition function is as follows: $T(m, s_{\omega}) = \delta \times \{\omega\} + (1-\delta) \times \{m\}$ for all $m$ and $\omega$. Suppose state 1 is true. In the stationary distribution, we have
$$\delta \mu^{\omega} \sum_{\omega' \neq \omega} F(s_{\omega'}) = \delta F(s_{\omega}) \sum_{\omega' \neq \omega} \mu^{\omega'} \quad \text{for all } \omega. \qquad (E.1)$$
It is easy to see that the solution of the system of equations satisfies $\mu^{1} F(s_{\omega}) = \mu^{\omega} F(s_{1})$ for all $\omega \neq 1$, which says that all probability mass going out from memory state 1 to memory state $\omega$ must be equal to that going from memory state $\omega$ to memory state 1. Thus $\mu^{1} = I\, \mu^{\omega}$ for all $\omega \neq 1$, and $\mu^{1} = \frac{I}{N + I - 1}$. By repeating the same procedure for the other states of the world, the asymptotic utility of this symmetric non-ignorant updating mechanism equals
$$\sum_{\omega=1}^{N} \frac{I}{N + I - 1} \times \frac{1}{N} = \frac{I}{N + I - 1}.$$
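As a quick numerical check on this derivation, the following Python sketch (an illustration under the stated transition rule, not code from the paper; the values $N = 6$, $I = 3$, $\delta = 0.5$ are arbitrary) builds the transition matrix and confirms that $\mu^1 = I/(N+I-1)$, regardless of $\delta$:

import numpy as np

def stationary_nonignorant(N=6, I=3.0, delta=0.5):
    # Signal distribution given true state 1: s_1 is I times as likely
    # as each other signal, so F(s_1) = I/(N+I-1) and F(s_w) = 1/(N+I-1).
    F = np.full(N, 1.0)
    F[0] = I
    F /= F.sum()
    # Transition matrix over the N memory states: upon signal s_w, move
    # to memory state w with probability delta, otherwise stay put.
    P = np.zeros((N, N))
    for m in range(N):
        for w in range(N):
            if w == m:
                P[m, m] += F[w]
            else:
                P[m, w] += delta * F[w]
                P[m, m] += (1.0 - delta) * F[w]
    # Stationary distribution = left eigenvector for eigenvalue 1.
    evals, evecs = np.linalg.eig(P.T)
    mu = np.real(evecs[:, np.argmax(np.real(evals))])
    return mu / mu.sum()

mu = stationary_nonignorant()
print(mu[0], 3.0 / (6 + 3.0 - 1.0))  # both 0.375: mu^1 = I/(N+I-1)

Since, by symmetry, each true state yields the same probability of a correct action, the printed value is also the asymptotic utility $I/(N+I-1)$.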
[Figure 9: An example of an updating mechanism that considers all states, with $N = M = 4$. The number in a node denotes the action that the DM takes when he is in that memory state. Upon receiving a signal that supports state $\omega$, the DM transits to node $\omega$ with probability $\delta < 1$.]

[Figure 10: An example of an updating mechanism that ignores half of the states, with $N = M = 4$. The DM takes action 1 in memory states 1 and 2, and takes action 2 in memory states 3 and 4. Upon receiving a signal that supports a non-ignored state, the DM transits between the corresponding memory states with probability $\delta < 1$.]

When $N$ is large, it is very easy for the DM to receive one signal that supports a wrong state and make a mistake. Put differently, when $N$ is large, the updating mechanism is “noisier”.
Now consider an ignorant mechanism that follows a similar idea to the non-ignorant updating mechanism illustrated in Figure 9, but ignores half of the states of the world. For simplicity, assume that $N$ is even. The ignorant mechanism is illustrated in Figure 10, with an example of $N = 4$. By ignoring half of the states, the DM allocates two memory states to each action that he does not ignore. Without loss of generality, assume that the DM takes action $\omega$ in memory states $2\omega - 1$ and $2\omega$ for $\omega \leq N/2$, and ignores all actions $\omega' > N/2$. In the “more confident” memory state $2\omega$, upon receiving a signal supporting state $\omega' \neq \omega$ where $\omega' \leq N/2$, the DM transits to state $2\omega - 1$ with probability $\delta < 1$. In the “less confident” memory state $2\omega - 1$, upon receiving a signal that supports state $\omega$, he transits to the “more confident” memory state $2\omega$; upon receiving a signal that supports state $\omega' \neq \omega$ where $\omega' \leq N/2$, the DM transits to state $2\omega' - 1$ with probability $\delta < 1$. Formally, the transition function is as follows:
$$T(2\omega, s_{\omega}) = \{2\omega\} \quad \text{for all } \omega \leq N/2,$$
$$T(2\omega, s_{\omega'}) = \delta \times \{2\omega - 1\} + (1-\delta) \times \{2\omega\} \quad \text{for all } \omega \leq N/2,\ \omega' \neq \omega \text{ and } \omega' \leq N/2,$$
$$T(2\omega - 1, s_{\omega}) = \{2\omega\} \quad \text{for all } \omega = 1, \cdots, N/2,$$
$$T(2\omega - 1, s_{\omega'}) = \delta \times \{2\omega' - 1\} + (1-\delta) \times \{2\omega - 1\} \quad \text{for all } \omega \leq N/2,\ \omega' \neq \omega \text{ and } \omega' \leq N/2.$$
Suppose state 1 is true. In the stationary distribution, we have
$$\delta \mu^{2\omega} \sum_{\omega' \neq \omega,\, \omega' \leq N/2} F(s_{\omega'}) = \mu^{2\omega - 1} F(s_{\omega}) \quad \text{for all } \omega \leq N/2, \qquad (E.2)$$
$$\delta \mu^{2\omega - 1} F(s_{\omega'}) = \delta \mu^{2\omega' - 1} F(s_{\omega}) \quad \text{for all } \omega, \omega' \leq N/2,\ \omega \neq \omega'. \qquad (E.3)$$
Thus, when $\delta$ is close to 0, $\mu^{2\omega - 1}$ is close to 0 for all $\omega \leq N/2$. Moreover, we have
$$\mu^{2} = \frac{F(s_{1})}{\delta \sum_{\omega' \neq 1,\, \omega' \leq N/2} F(s_{\omega'})} \times \frac{\delta F(s_{1})}{\delta F(s_{\omega})} \times \frac{\delta \sum_{\omega' \neq \omega,\, \omega' \leq N/2} F(s_{\omega'})}{F(s_{\omega})} \times \mu^{2\omega} = \frac{I}{\frac{N}{2} - 1} \times I \times \left(\frac{N}{2} + I - 2\right) \times \mu^{2\omega} = \frac{I^{2}(N + 2I - 4)}{N - 2}\, \mu^{2\omega} \qquad (E.4)$$
for all $\omega \neq 1$ and $\omega \leq N/2$. As $\delta$ is close to 0, we have
$$\mu^{2} + \sum_{\omega \neq 1,\, \omega \leq N/2} \mu^{2\omega} = 1 \;\Longrightarrow\; \mu^{2} + \left(\frac{N}{2} - 1\right) \frac{N - 2}{I^{2}(N + 2I - 4)}\, \mu^{2} = 1 \;\Longrightarrow\; \mu^{2} = \frac{2 I^{2}(N + 2I - 4)}{2 I^{2}(N + 2I - 4) + (N - 2)^{2}}.$$
By repeating the same computation for all $\omega \leq N/2$, the asymptotic utility equals
$$\sum_{\omega = 1}^{N/2} \frac{2 I^{2}(N + 2I - 4)}{2 I^{2}(N + 2I - 4) + (N - 2)^{2}} \times \frac{1}{N} = \frac{I^{2}(N + 2I - 4)}{2 I^{2}(N + 2I - 4) + (N - 2)^{2}}. \qquad (E.5)$$
The asymptotic utility of the ignorant updating mechanism, illustrated in Figure 10, is larger than that of the non-ignorant updating mechanism, illustrated in Figure 9, if and only if:
$$\frac{I^{2}(N + 2I - 4)}{2 I^{2}(N + 2I - 4) + (N - 2)^{2}} > \frac{I}{N + I - 1}$$
$$\iff I(N + 2I - 4)(N + I - 1) > 2 I^{2}(N + 2I - 4) + (N - 2)^{2}$$
$$\iff I(N + 2I - 4)(N - I - 1) > (N - 2)^{2}$$
$$\iff N^{2}(I - 1) + N(I^{2} - 5I + 4) > 2 I^{3} - 2 I^{2} - 4I + 4$$
$$\iff (I - 1)\, N(N + I - 4) > 2 (I - 1)(I^{2} - 2)$$
$$\iff N(N + I - 4) > 2(I^{2} - 2),$$
which holds whenever $N$ is sufficiently large, as $N + I - 4 > 0$ for $N \geq 4$ and $I > 1$. For instance, with $I = 2$ the condition reads $N(N - 2) > 4$, which already holds at $N = 4$.
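To make the comparison concrete, the sketch below (same caveats and conventions as the previous one, with 0-based indexing of memory states, and assuming that signals supporting ignored states leave the memory state unchanged) computes the stationary distribution of the ignorant mechanism for a small $\delta$ and compares the two asymptotic utilities:

import numpy as np

def utility_ignorant(N=6, I=3.0, delta=1e-3):
    half = N // 2                      # only the first N/2 actions are kept
    F = np.full(N, 1.0)
    F[0] = I
    F /= F.sum()                       # signal distribution given true state 1
    # 0-based convention: action w uses the pair lo = 2w ("less confident")
    # and hi = 2w + 1 ("more confident").
    P = np.zeros((N, N))
    for w in range(half):
        lo, hi = 2 * w, 2 * w + 1
        for v in range(N):
            if v == w:
                P[hi, hi] += F[v]                  # own signal: stay confident
                P[lo, hi] += F[v]                  # own signal: move up
            elif v < half:
                P[hi, lo] += delta * F[v]          # demoted with probability delta
                P[hi, hi] += (1.0 - delta) * F[v]
                P[lo, 2 * v] += delta * F[v]       # jump to the other action's
                P[lo, lo] += (1.0 - delta) * F[v]  # "less confident" state
            else:
                P[hi, hi] += F[v]                  # signals for ignored states
                P[lo, lo] += F[v]                  # are assumed to be disregarded
    evals, evecs = np.linalg.eig(P.T)
    mu = np.real(evecs[:, np.argmax(np.real(evals))])
    mu /= mu.sum()
    # A kept true state is matched in its own pair of memory states; the
    # N/2 ignored true states yield 0, hence utility = (mu[0] + mu[1]) / 2.
    return (mu[0] + mu[1]) / 2.0

N, I = 6, 3.0
print(utility_ignorant(N, I))   # ~0.45 = I^2(N+2I-4) / (2I^2(N+2I-4) + (N-2)^2)
print(I / (N + I - 1.0))        # 0.375: the non-ignorant benchmark

With $N = 6$ and $I = 3$, the ignorant mechanism attains roughly $0.45$ against the non-ignorant benchmark of $0.375$, in line with the condition $N(N + I - 4) > 2(I^2 - 2)$, here $30 > 14$.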