Asymptotic Behavior of Bayesian Learners with Misspecified Models

Abstract

We consider an agent who represents uncertainty about the environment via a possibly misspecified model. Each period, the agent takes an action, observes a consequence, and uses Bayes' rule to update her belief about the environment. This framework has become increasingly popular in economics to study behavior driven by incorrect or biased beliefs. Current literature has characterized asymptotic behavior under fairly specific assumptions. By first showing that the key element to predict the agent's behavior is the frequency of her past actions, we are able to characterize asymptotic behavior in general settings in terms of the solutions of a generalization of a differential equation that describes the evolution of the frequency of actions. We then present a series of implications that can be readily applied to economic applications, thus providing off-the-shelf tools that can be used to characterize behavior under misspecified learning.


Ignacio Esponda (UC Santa Barbara), Demian Pouzo (UC Berkeley), Yuichi Yamamoto (Hitotsubashi Univ.)

October 24, 2019

∗ We thank Drew Fudenberg, Ryota Iijima, Yuhta Ishii, Dale Stahl, Philipp Strack, and several seminar participants for helpful comments. Esponda: Department of Economics, UC Santa Barbara, 2127 North Hall, Santa Barbara, CA 93106, [email protected]; Pouzo: Department of Economics, UC Berkeley, 530-1 Evans Hall

Introduction

Over the last few decades, evidence of systematic mistakes and biases in beliefs has been collected in a large range of economic environments. Moreover, the evidence indicates that many of these mistakes persist with experience. One approach to incorporating these findings in our theories is to simply postulate that economic agents have fixed, wrong beliefs about aspects of their environment, and never learn about these aspects. A different approach that has gained popularity over the last few years is to postulate that agents do learn about their environment, but they do so in the context of a misspecified model that misses some important aspects of reality. The idea is that the world is complex and it is natural for economic agents to represent uncertainty about the world with parsimonious models that are likely to be misspecified. The researcher who follows this approach is forced to specify the agent's misspecification, and the direction of biases is often not ex-ante obvious without further analysis.

Examples of misspecified learning in economics date back to the 1970s and include the following: A firm estimates a demand model but wrongly excludes competitors' prices (Arrow and Green (1973), Kirman (1975)); a teacher assesses how praise and criticism affect student performance, but does not understand regression to the mean (Tversky and Kahneman (1973), Esponda and Pouzo (2016)); a person faces an increasing marginal income tax rate but behaves as if facing a constant marginal tax (Sobel (1984), Liebman and Zeckhauser (2004), Esponda and Pouzo (2016)); when learning the value of assets, policies, or investment projects, traders, voters, and investors fail to account for sample selection (Esponda (2008), Esponda and Pouzo (2017, 2019a), Jehiel (2018)); a seller estimates a constant-elasticity demand function, but elasticity is not constant (Nyarko (1991), Fudenberg, Romanyuk and Strack (2017)); a person inverts causal relationships and incorrectly believes that diet affects a chemical in the blood which in turn affects health (Spiegler (2016)); overconfidence biases an agent's learning of a fundamental (Heidhues, Kőszegi and Strack (2018a)).

In all of these examples, the agent processes information through the lens of a simple model that misses some aspect of reality. The main question in the literature is what happens to the agent's behavior as time goes by and she uses feedback to update her belief about the model's primitives. The direction of the bias is often not obvious because the agent's behavior affects the feedback she observes, this feedback is in turn processed via the agent's misspecified model, and this processing leads to updated beliefs and subsequent changes in behavior, which in turn lead to changes in beliefs, and so on.

Footnote: For discussions of the evidence, see, for example, Camerer and Johnson (1997) and Section 3.D in Rabin (1998).

The true distribution over consequences given an action $x \in X$ is given by $Q(\cdot \mid x) \in \Delta(Y)$, where $Y$ is the set of consequences. The agent, however, does not know $Q$. She has a parametric model of it, given by $(Q_\theta(\cdot \mid x))_{x \in X}$, where parameter values, such as $\theta$, belong to a parameter space $\Theta$. The agent is Bayesian, so she has a prior over $\Theta$ and updates her prior in each period after observing the realized consequence. The agent's model is misspecified if the support of her prior does not include the true distribution $Q$, and it is correctly specified otherwise.

Footnote: The correctly-specified version of this environment was originally studied by Easley and Kiefer (1988) and Aghion, Bolton, Harris and Jullien (1991).

Our key point of departure from previous literature is that we begin by focusing on the evolution of the frequency of actions rather than on actions alone or on the agent's belief. The frequency of actions at time $t+1$ can be written recursively as a function of the frequency at time $t$ plus some innovation term that depends on the agent's action at time $t+1$. The action at time $t+1$, however, depends on the agent's belief at time $t$, and one challenge is to be able to write this belief as a function of frequencies of actions so as to make this recursion depend exclusively on frequencies, not beliefs.

Extending results by Berk (1966) and Esponda and Pouzo (2016), we show that eventually the posterior at time $t$ roughly concentrates on the set of parameter values that minimize Kullback-Leibler divergence given the frequency of actions up to time $t$. This result allows us to write the evolution of frequencies of actions recursively as a function of the past frequency alone, excluding the belief. We then apply techniques from stochastic approximation developed by Benaïm, Hofbauer and Sorin (2005) to show that the continuous-time approximation of the frequency of actions can be essentially characterized as a solution to a generalization of a differential equation. Finally, we present a series of implications that can be readily applied to economic applications, thus providing off-the-shelf tools that can be valuable to people working in this area. In fact, for the special case of one-dimensional models (a case that includes most of the applications in the literature), our results imply that one can essentially characterize convergence and stability by looking at a simple two-dimensional figure.

Our results pertain to the agent's long-run behavior, and there are at least three reasons why the focus on long-run behavior is important. First, there are many instances where it is not surprising that people initially make incorrect decisions and the more interesting question is what types of biases persist with experience. Second, systematic patterns tend to arise as time goes by, while initial behavior tends to be more dependent on random draws. Finally, there is a long tradition in statistics and economics focusing on asymptotic or equilibrium behavior, and so we can use existing tools as well as compare our results to existing results in these literatures.

Our work benefits from previous work on misspecified learning. Our environment is the single-agent version of the environment studied by Esponda and Pouzo (2016). They introduce the notion of a Berk-Nash equilibrium and show that, under some conditions, if behavior converges then it must converge to a Berk-Nash equilibrium. But they do not study convergence in general.

The few papers that study convergence in misspecified settings establish their results ingeniously, though for somewhat specialized setups. Nyarko (1991) presents an example where the agent's action does not converge. Fudenberg, Romanyuk and Strack (2017) consider a more general model where the agent has a finite number of actions but still updates between two possible models (i.e., $\Theta$ has two elements).
They provide a full characterization of asymptotic actions and beliefs, including cases where the action converges and cases where it does not. Their model is in continuous time and they exploit the fact that the belief over $\Theta$ follows a one-dimensional stochastic differential equation. Heidhues, Kőszegi and Strack (2018a) study a [...] As mentioned above, we are able to make significantly more progress by focusing on the frequency of actions, as opposed to the action itself or the belief.

Footnote: The type of differential equation is called a differential inclusion in the literature. It differs from a differential equation in that there may be multiple derivatives at certain points and therefore multiple trajectories that solve the equation. Multiplicity arises in our environment because there are certain beliefs at which the agent is indifferent between different actions, and we need to keep track of what would happen to beliefs and subsequent actions if the agent were to follow any one of these actions.

Footnote: There are many examples of boundedly-rational equilibrium concepts that abstract away from the question of dynamics and convergence, including Jehiel (1995, 2005), Osborne and Rubinstein (1998), Eyster and Rabin (2005), Esponda (2008), Jehiel and Koessler (2008), and Spiegler (2016, 2017).

Footnote: Esponda and Pouzo (2016) tackle the issue of convergence in Theorem 3, where they use an idea from Fudenberg and Kreps (1993) to show that, if agents are allowed to make possibly large but vanishing mistakes, then behavior can converge to any equilibrium. Here, as in the rest of the literature on misspecified learning, we consider the case where agents don't make these types of mistakes.

Tools from stochastic approximation have been previously applied in economics, including the literature on learning in games (e.g., Fudenberg and Kreps (1993), Benaïm and Hirsch (1999), and Hofbauer and Sandholm (2002)) and learning in macroeconomics (e.g., Sargent (1993)).
Our approach is inspired by Fudenberg and Kreps's (1993) model of stochastic fictitious play. In that environment, the frequency of past actions exactly represents the agents' beliefs about other agents' strategies. In our environment, we characterize beliefs as a function of the frequency of actions.

Footnote: For another example using normality assumptions and stochastic approximation, see the online appendix of Esponda and Pouzo (2016).

Footnote: Incidentally, we show that in some of the examples of Nyarko (1991) and Fudenberg, Romanyuk and Strack (2017) where the action diverges, the action frequency converges. This is the first result of its kind and it provides a new interpretation of a mixed-action steady state. We also present examples where not even the action frequency converges.

Misspecified learning has also been studied in other environments. Rabin and Vayanos (2010) study a case where shocks are i.i.d. but agents believe them to be autoregressive. Esponda and Pouzo (2019b) extend Berk-Nash equilibrium to Markov decision problems, where a state variable, other than a belief, affects continuation values. He (2018) considers agents suffering from the gambler's fallacy who mislearn from endogenously censored data. Molavi (2018) studies a general-equilibrium framework that nests a class of macroeconomic models where agents learn with misspecified models. Bohren and Hauser (2018) and Frick, Iijima and Ishii (2019a) characterize asymptotic behavior in social learning environments with model misspecification. Finally, Frick, Iijima and Ishii (2019b) focus on convergence and robustness of the stability of equilibrium in both single-agent and social learning environments. We do not study robustness but study asymptotic properties of the learning process in a more general way. In particular, we develop general tools to study whether behavior converges or not, and what happens if it does not converge. We believe our results can be extended to these other environments.

Footnote: See also Eyster and Rabin (2010), Bohren (2016), and Gagnon-Bartsch and Rabin (2017).

Objective environment. There is a single agent facing the following infinitely repeated problem. Each period $t = 1, 2, \ldots$, the agent must choose an action from a finite set $X$. She then receives a consequence according to the consequence function $Q : X \to \Delta Y$, where $Y$ is the set of consequences and $\Delta Y$ is the set of all (Borel) probability measures over it. Finally, the payoff function $\pi : X \times Y \to \mathbb{R}$ determines the agent's current payoff. In particular, if $x_t \in X$ is the agent's choice at time $t$, then $y_t \in Y$ is drawn according to the probability measure $Q(\cdot \mid x_t) \in \Delta Y$, and the agent's payoff at time $t$ is $\pi(x_t, y_t)$.

Assumption 1. (i) $Y$ is a compact subset of Euclidean space; (ii) There exists a Borel probability measure $\nu \in \Delta Y$ such that, for all $x \in X$, $Q(\cdot \mid x) \ll \nu$, i.e., $Q(\cdot \mid x)$ is absolutely continuous with respect to $\nu$ (an implication is the existence of densities $q(\cdot \mid x) \in L^1(Y, \mathbb{R}, \nu)$ such that $\int_A q(y \mid x)\, \nu(dy) = Q(A \mid x)$ for any $A \subseteq Y$ Borel); (iii) For all $x \in X$, $\pi(x, \cdot) \in L^1(Y, \mathbb{R}, Q(\cdot \mid x))$.

Footnote: As usual, $L^p(Y, \mathbb{R}, \nu)$ denotes the space of all functions $f : Y \to \mathbb{R}$ such that $\int |f(y)|^p\, \nu(dy) < \infty$.

Assumption 1 collects some standard technical conditions. It includes both the case where the consequence is a continuous variable ($\nu$ is the Lebesgue measure and $q(\cdot \mid x)$ is the density function) and the case where it is discrete ($\nu$ is the counting measure and $q(\cdot \mid x)$ is the probability mass function).

In the special case in which the agent knows the primitives and wishes to maximize discounted expected utility, she chooses an action in each period from the set of actions that maximizes

$$\int_Y \pi(x, y)\, Q(dy \mid x) = \int_Y \pi(x, y)\, q(y \mid x)\, \nu(dy).$$

We will study the case where the agent does not know the consequence function $Q$.

Subjective family of models. The agent is endowed with a parametric family of consequence functions, $\mathcal{Q}_\Theta = \{Q_\theta : \theta \in \Theta\}$, where each $Q_\theta : X \to \Delta Y$ is indexed by a model $\theta \in \Theta$. We refer to $\mathcal{Q}_\Theta$ as the family of models and say that it is correctly specified if $Q \in \mathcal{Q}_\Theta$ and misspecified otherwise.

Assumption 2. (i) For all $\theta \in \Theta$ and $x \in X$, $Q_\theta(\cdot \mid x) \ll \nu$, where $\nu$ is defined in A1 (an implication is the existence of densities $q_\theta(\cdot \mid x) \in L^1(Y, \mathbb{R}, \nu)$ such that $\int_A q_\theta(y \mid x)\, \nu(dy) = Q_\theta(A \mid x)$ for any $A \subseteq Y$ Borel); (ii) $\Theta$ is a compact subset of a Euclidean space and, for all $x \in X$, $\theta \mapsto q_\theta(\cdot \mid x)$ is continuous $Q(\cdot \mid x)$-a.s.; (iii) For all $x \in X$, there exists $g_x \in L^2(Y, \mathbb{R}, Q(\cdot \mid x))$ such that, for all $\theta \in \Theta$, $|\ln(q(\cdot \mid x) / q_\theta(\cdot \mid x))| \le g_x(\cdot)$ a.s.-$Q(\cdot \mid x)$.

Assumption 2(i) guarantees the existence of a density function, and 2(ii) is a standard parametric assumption on the subjective model. Assumption 2(iii) will be used to establish a uniform law of large numbers. This condition also implies that, for all $\theta$ and $x$, the support of $Q_\theta(\cdot \mid x)$ contains the support of $Q(\cdot \mid x)$; in particular, every observation can be generated by the agent's model.

Bayesian learning. The agent is Bayesian and starts with a prior $\mu_0$ over the space of models $\Theta$. She observes past actions and consequences and uses this information to update her belief about $\Theta$ in every period. The timing is as follows: At each time $t$, the agent holds some belief $\mu_t$. Given $\mu_t$, she chooses an action $x_t$. Then the consequence $y_t$ is drawn according to $Q(\cdot \mid x_t)$. The agent observes $y_t$, receives an immediate payoff of $\pi(x_t, y_t)$, and updates her belief to $\mu_{t+1} = B(x_t, y_t, \mu_t)$, where $B$ is the Bayesian operator. The next assumption guarantees that the prior has full support.

Assumption 3. $\mu_0(A) > 0$ for any $A$ open and non-empty.

Policy and probability distribution over histories. A policy $f$ is a function $f : \Delta\Theta \to X$ specifying the action $f(\mu) \in X$ that the agent takes at any moment in time in which her belief is $\mu$. A history is a sequence $h = (x_1, y_1, \ldots, x_t, y_t, \ldots) \in H \equiv (X \times Y)^\infty$. Together with the primitives of the problem, a policy $f$ induces a probability distribution over the set of histories, which we will denote by $P^f$.

Footnote: The Bayesian operator $B : X \times Y \times \Delta\Theta \to \Delta\Theta$ satisfies, for all $A \subseteq \Theta$ Borel, for any $x \in X$, and a.s.-$Q(\cdot \mid x)$, $B(x, y, \mu)(A) = \int_A q_\theta(y \mid x)\, \mu(d\theta) \big/ \int_\Theta q_\theta(y \mid x)\, \mu(d\theta)$.

Footnote: We do not allow the agent to mix to simplify the exposition and to highlight the fact that a mixed distribution over actions may describe limiting behavior despite the fact that the agent never actually mixes. In the more general case where $f$ maps into $\Delta X$, our main result (Theorem 2) holds exactly as stated but some of the statements in Section 5 need to be modified accordingly.

Policy correspondence. It will be convenient to characterize behavior for a family of policies, and not just for a single policy function. For this purpose, we define a policy correspondence to be a mapping $F : \Delta\Theta \rightrightarrows X$, where $F(\mu) \subseteq X$ denotes the set of actions that the agent might choose any time her belief is $\mu \in \Delta\Theta$. We sometimes abuse notation and, for a set of probability measures $A \subseteq \Delta\Theta$, we let $F(A)$ represent the set of actions $x$ such that $x \in F(\mu)$ for some $\mu \in A$. Let $\mathrm{Sel}(F)$ denote the set of all policies $f$ that constitute a selection from the correspondence $F$, i.e., with the property that $f(\mu) \in F(\mu)$ for all $\mu$.

Assumption 4. The policy correspondence $F$ is upper hemi-continuous (uhc).
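As a rough illustration of the Bayesian operator $B$ and of a policy $f$ described above, the following sketch restricts attention to a finite parameter grid and a binary consequence. The function names, the two-model example, and the myopic tie-breaking rule are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def bayes_update(mu, likelihoods):
    """One application of the Bayesian operator B on a finite parameter grid.

    mu          : prior weights over the grid, shape (n_theta,)
    likelihoods : q_theta(y | x) at the observed (x, y), shape (n_theta,)
    """
    unnormalized = mu * likelihoods
    total = unnormalized.sum()
    if total <= 0.0:
        raise ValueError("observation has zero likelihood under every model")
    return unnormalized / total

def myopic_policy(mu, expected_payoff):
    """Pick an action maximizing subjective expected payoff (beta = 0 case).

    expected_payoff : shape (n_theta, n_actions), the expected payoff of
                      each action under each model.
    Ties go to the lowest action index, which is one selection from F.
    """
    return int(np.argmax(mu @ expected_payoff))

# Hypothetical example: two models, two actions, consequence y in {0, 1},
# payoff equal to y, so expected payoff under theta is P_theta(y = 1 | x).
p_y1 = np.array([[0.2, 0.7],   # rows: theta, columns: action x
                 [0.6, 0.3]])
mu0 = np.array([0.5, 0.5])
x, y = 1, 1                    # observed action and consequence
lik = p_y1[:, x] if y == 1 else 1.0 - p_y1[:, x]
mu1 = bayes_update(mu0, lik)
action = myopic_policy(mu1, expected_payoff=p_y1)
```

With a continuous parameter space, the same update would be carried out on a discretization of $\Theta$ or with a numerical integration routine; the grid version is used here only to keep the sketch short.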

An important special case is one where the agent maximizes discounted expected utility with discount factor $\beta \in [0, 1)$. This problem can be cast recursively as

$$W(\mu) = \max_{x \in X} \int_Y \left\{ \pi(x, y) + \beta W(\mu') \right\} \bar{Q}_\mu(dy \mid x) \tag{1}$$

where $W : \Delta\Theta \to \mathbb{R}$ is the (unique) solution to the Bellman equation (1), $\mu' = B(x, y, \mu)$ is the Bayesian posterior, and $\bar{Q}_\mu \equiv \int_\Theta Q_\theta\, \mu(d\theta)$. In this case, it is well known that the correspondence mapping beliefs to optimal actions is uhc.

Action frequency. Our main objective is to study regularities in asymptotic behavior. Previous work has focused on characterizing the limit of the sequence of actions, whenever it exists. But there are cases where actions do not converge (e.g., Nyarko (1991)), and in those cases previous work has little else to say about asymptotic behavior. We make progress by studying the action frequency. We do so for two reasons. First, from a practical perspective, even if actions do not converge, it is possible for the frequency of actions to converge. Thus, studying frequencies can help uncover additional regularities in behavior, with important implications regarding, for example, limiting average payoffs. Second, as we will show, asymptotic beliefs depend crucially on the action frequency. Because actions in turn depend on beliefs, future actions depend crucially on the frequency of past actions.

For every $t$, we define the action frequency at time $t$ to be a function $\sigma_t : H \to \Delta X$ such that, for all $h \in H$ and $x \in X$,

$$\sigma_t(h)(x) = \frac{1}{t} \sum_{\tau=1}^{t} \mathbf{1}_x(x_\tau(h))$$

is the fraction of times that action $x$ occurs in history $h$ by time period $t$.

Asymptotic characterization of beliefs

In this section, we take as given the sequence of action frequencies, $(\sigma_t)_t$, and we characterize the agent's asymptotic beliefs. In subsequent sections, we will use the characterization of beliefs to characterize the sequence $(\sigma_t)_t$, which is ultimately an endogenous object. The key object in our characterization is the notion of Kullback-Leibler divergence.

Definition 1. The Kullback-Leibler divergence (KLD) is a function $K : \Theta \times \Delta X \to \mathbb{R}$ such that, for any $\theta \in \Theta$ and $\sigma \in \Delta X$,

$$K(\theta, \sigma) = \sum_{x \in X} E_{Q(\cdot \mid x)}\left[ \ln \frac{q(Y \mid x)}{q_\theta(Y \mid x)} \right] \sigma(x) = \sum_{x \in X} \int_Y \ln \frac{q(y \mid x)}{q_\theta(y \mid x)}\, q(y \mid x)\, \nu(dy)\, \sigma(x).$$

The set of closest models given $\sigma$ is the set $\Theta(\sigma) \equiv \arg\min_{\theta \in \Theta} K(\theta, \sigma)$ and the minimized KLD given $\sigma$ is $K^*(\sigma) \equiv \min_{\theta \in \Theta} K(\theta, \sigma)$.

Footnote: Formally, what we call KLD is the Kullback-Leibler divergence between the distributions $q \cdot \sigma$ and $q_\theta \cdot \sigma$ defined over the space $X \times Y$.

Lemma 1. (i) $(\theta, \sigma) \mapsto K(\theta, \sigma) - K^*(\sigma)$ is continuous; (ii) $\Theta(\cdot)$ is uhc, nonempty-, and compact-valued.

Proof. See Appendix A.1.

If the actions were drawn from an i.i.d. distribution $\sigma \in \Delta X$, we could directly apply Berk's (1966) result to conclude that the posterior eventually concentrates on the set of closest models given $\sigma$ (i.e., for all open sets $U \supseteq \Theta(\sigma)$, $\lim_{t \to \infty} \mu_t(U) = 1$ $P^f$-a.s.). Esponda and Pouzo (2016) showed that Berk's conclusion extends even if actions are not i.i.d., provided that the distribution over actions at time $t$ converges to a distribution $\sigma$. This type of result is useful to characterize behavior under the assumption that it stabilizes, but it is insufficient to determine whether or not behavior stabilizes.

Footnote: See also Bunke and Milhaud (1998). Relatedly, White (1982) shows that the Kullback-Leibler divergence characterizes the limiting behavior of the maximum quasi-likelihood estimator.

In the current section, we provide a characterization of beliefs that does not rely on the assumption that behavior stabilizes. Roughly speaking, we will show that the distance between the agent's belief at time $t$, $\mu_t$, and the set of probability measures with support in $\Theta(\sigma_t)$ goes to zero as time goes to infinity, irrespective of whether or not $(\sigma_t)_t$ converges. We will establish this result in several steps, which we now discuss informally and then address formally in the proofs.

First, we note that for any Borel set $A \subseteq \Theta$, the posterior belief over $A$ can be written as

$$\mu_{t+1}(A) = \frac{\int_A \prod_{\tau=1}^{t} q_\theta(y_\tau \mid x_\tau)\, \mu_0(d\theta)}{\int_\Theta \prod_{\tau=1}^{t} q_\theta(y_\tau \mid x_\tau)\, \mu_0(d\theta)} = \frac{\int_A e^{-t L_t(\theta)}\, \mu_0(d\theta)}{\int_\Theta e^{-t L_t(\theta)}\, \mu_0(d\theta)}, \tag{2}$$

where $L_t(\theta) \equiv t^{-1} \sum_{\tau=1}^{t} \ln \frac{q(y_\tau \mid x_\tau)}{q_\theta(y_\tau \mid x_\tau)}$ is the sample average of the log-likelihood ratios, and where we have omitted the history for simplicity. Naturally, we might expect the sample average to converge to its expectation for each $\theta$. The next result strengthens this intuition and establishes that the difference between $L_t(\cdot)$ and $K(\cdot, \sigma_t)$ converges uniformly to zero as $t \to \infty$.

Lemma 2. Under Assumptions 1-2, for any policy $f$, $\lim_{t \to \infty} \sup_{\theta \in \Theta} |L_t(\theta) - K(\theta, \sigma_t)| = 0$ $P^f$-a.s.

Proof. See Appendix A.2.

The next step is to replace $L_t(\cdot)$ in (2) with $K(\cdot, \sigma_t)$. By Lemma 2, for sufficiently large $t$, we obtain

$$\mu_{t+1}(A) \approx \frac{\int_A e^{-t K(\theta, \sigma_t)}\, \mu_0(d\theta)}{\int_\Theta e^{-t K(\theta, \sigma_t)}\, \mu_0(d\theta)}. \tag{3}$$

As $t \to \infty$, the posterior concentrates on models where $K(\theta, \sigma_t)$ is close to its minimized value, $K^*(\sigma_t)$. This statement is seen most easily for the case where $\Theta$ has only two elements, $\theta_1$ and $\theta_2$. In this case, (3) becomes

$$\mu_{t+1}(\theta_1) \approx 1 \Big/ \left( 1 + \frac{\mu_0(\theta_2)\, e^{-t K(\theta_2, \sigma_t)}}{\mu_0(\theta_1)\, e^{-t K(\theta_1, \sigma_t)}} \right). \tag{4}$$

Suppose, for example, that $(\sigma_t)_t$ converges to $\sigma$ and that KLD is minimized at $\theta_1$ given $\sigma$. Then there exists $\varepsilon > 0$ such that, for all sufficiently large $t$, $K(\theta_2, \sigma_t) - K(\theta_1, \sigma_t) > \varepsilon$. It follows from (4) that $\mu_{t+1}(\theta_1)$ converges to 1, so the posterior concentrates on the model that minimizes KLD given $\sigma$. When $(\sigma_t)_t$ does not converge, however, we have to account for the possibility that $K(\theta_2, \sigma_t) - K(\theta_1, \sigma_t) > 0$ for all $t$ but $K(\theta_2, \sigma_t) - K(\theta_1, \sigma_t) \to 0$ as $t \to \infty$. In this case, we cannot say that the posterior eventually puts probability 1 on $\theta_1$, even though $\theta_1$ always minimizes KLD. This is why the next result says that the posterior concentrates on models where $K(\theta, \sigma_t)$ is close to its minimized value, $K^*(\sigma_t)$, as opposed to saying that the posterior asymptotically concentrates on the minimizers of KLD given $\sigma_t$. We now state the result formally and provide a proof.
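Before turning to the formal statement, a small numerical sketch may help fix ideas about Definition 1 and approximation (4). It assumes finite action and consequence sets, models that assign positive probability to every consequence (in the spirit of Assumption 2(iii)), and a hypothetical two-model example; none of the numbers come from the paper.

```python
import numpy as np

def kld(q_true, q_model, sigma):
    """K(theta, sigma) for finite X and Y.

    q_true, q_model : shape (n_actions, n_outcomes), rows sum to 1;
                      q_model is assumed strictly positive
    sigma           : action frequency, shape (n_actions,)
    """
    # Outcomes with q_true == 0 contribute zero by the usual convention.
    ratio = np.where(q_true > 0, q_true / q_model, 1.0)
    per_action = (q_true * np.log(ratio)).sum(axis=1)
    return float(sigma @ per_action)

def posterior_theta1(t, prior1, k1, k2):
    """Right-hand side of (4): approximate posterior weight on theta_1."""
    odds = (1.0 - prior1) / prior1 * np.exp(-t * (k2 - k1))
    return 1.0 / (1.0 + odds)

# Hypothetical primitives: the truth depends on the action, but each model
# assumes the same outcome distribution for both actions.
q_true = np.array([[0.5, 0.5], [0.8, 0.2]])
theta1 = np.array([[0.5, 0.5], [0.5, 0.5]])
theta2 = np.array([[0.8, 0.2], [0.8, 0.2]])

only_action_0 = np.array([1.0, 0.0])
only_action_1 = np.array([0.0, 1.0])
k1 = kld(q_true, theta1, only_action_0)   # theta1 fits action 0 exactly
k2 = kld(q_true, theta2, only_action_0)   # theta2 does not
weight_late = posterior_theta1(200, 0.5, k1, k2)
```

In this example the closest model switches from `theta1` to `theta2` as the frequency moves from `only_action_0` to `only_action_1`, which is the sense in which beliefs depend on the action frequency.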

Theorem 1. Under Assumptions 1-3, for any policy $f$,

$$\lim_{t \to \infty} \int_\Theta \left( K(\theta, \sigma_t) - K^*(\sigma_t) \right) \mu_{t+1}(d\theta) = 0 \quad P^f\text{-a.s.} \tag{5}$$

Proof. Fix a history $h$ such that the condition of uniform convergence in Lemma 2 holds, and note that the set of histories with this property has probability one (henceforth, we omit the history from the notation). In particular, for all $\eta > 0$, there exists $t_\eta$ such that, for all $t \ge t_\eta$,

$$|L_t(\theta) - K(\theta, \sigma_t)| < \eta \tag{6}$$

for all $\theta \in \Theta$.

Let $\bar{K}(\theta, \sigma) \equiv K(\theta, \sigma) - K^*(\sigma)$. Fix any $\varepsilon > 0$. Using (2) and the facts that $0 \le K^*(\sigma)$ (the proof is standard) and $K^*(\sigma) < \infty$ (follows from Assumption 2(iii)) for all $\sigma$, we obtain

$$\begin{aligned}
\int \bar{K}(\theta, \sigma_t)\, \mu_{t+1}(d\theta)
&= \frac{\int_\Theta \bar{K}(\theta, \sigma_t)\, e^{-t L_t(\theta)}\, \mu_0(d\theta)}{\int_\Theta e^{-t L_t(\theta)}\, \mu_0(d\theta)} \\
&= \frac{\int_\Theta \bar{K}(\theta, \sigma_t)\, e^{-t (L_t(\theta) - K^*(\sigma_t))}\, \mu_0(d\theta)}{\int_\Theta e^{-t (L_t(\theta) - K^*(\sigma_t))}\, \mu_0(d\theta)} \\
&\le \varepsilon + \frac{\int_{\{\theta : \bar{K}(\theta, \sigma_t) \ge \varepsilon\}} \bar{K}(\theta, \sigma_t)\, e^{-t (L_t(\theta) - K^*(\sigma_t))}\, \mu_0(d\theta)}{\int_{\{\theta : \bar{K}(\theta, \sigma_t) \le \varepsilon/2\}} e^{-t (L_t(\theta) - K^*(\sigma_t))}\, \mu_0(d\theta)}
=: \varepsilon + \frac{A^\varepsilon_t}{B^\varepsilon_t}.
\end{aligned}$$

The proof concludes by showing that $\lim_{t \to \infty} A^\varepsilon_t / B^\varepsilon_t = 0$.

Footnote: Formally, what we are saying is that it is not generally true that $\lim_{t \to \infty} \int_\Theta \inf_{\theta' \in \Theta(\sigma_t)} \|\theta - \theta'\|\, \mu_{t+1}(d\theta) = 0$ unless $(\sigma_t)_t$ converges.

By (6), there exists $t_\eta$ such that, for all $t \ge t_\eta$,

$$\begin{aligned}
\frac{A^\varepsilon_t}{B^\varepsilon_t}
&\le \frac{\int_{\{\theta : \bar{K}(\theta, \sigma_t) \ge \varepsilon\}} \bar{K}(\theta, \sigma_t)\, e^{-t (\bar{K}(\theta, \sigma_t) - \eta)}\, \mu_0(d\theta)}{\int_{\{\theta : \bar{K}(\theta, \sigma_t) \le \varepsilon/2\}} e^{-t (\bar{K}(\theta, \sigma_t) + \eta)}\, \mu_0(d\theta)} \\
&= e^{2 t \eta}\, \frac{\int_{\{\theta : \bar{K}(\theta, \sigma_t) \ge \varepsilon\}} \bar{K}(\theta, \sigma_t)\, e^{-t \bar{K}(\theta, \sigma_t)}\, \mu_0(d\theta)}{\int_{\{\theta : \bar{K}(\theta, \sigma_t) \le \varepsilon/2\}} e^{-t \bar{K}(\theta, \sigma_t)}\, \mu_0(d\theta)}.
\end{aligned}$$

Observe that the function $x \mapsto x \exp\{-t x\}$ is decreasing for all $x > 1/t$. Thus, for any $t \ge \max\{t_\eta, 1/\varepsilon\}$ it follows that $\bar{K}(\theta, \sigma_t)\, e^{-t \bar{K}(\theta, \sigma_t)} \le \varepsilon e^{-t \varepsilon}$ over $\{\theta : \bar{K}(\theta, \sigma_t) \ge \varepsilon\}$. Moreover, $e^{-t \bar{K}(\theta, \sigma_t)} \ge e^{-t \varepsilon/2}$ over $\{\theta : \bar{K}(\theta, \sigma_t) \le \varepsilon/2\}$. Thus, taking $\varepsilon \le 1$ without loss of generality, for all $t \ge \max\{t_\eta, 1/\varepsilon\}$,

$$\frac{A^\varepsilon_t}{B^\varepsilon_t} \le \frac{e^{2 t \eta}\, e^{-t \varepsilon/2}}{\mu_0(\{\theta : \bar{K}(\theta, \sigma_t) \le \varepsilon/2\})}. \tag{7}$$

In Appendix A.3, we show that continuity of $\bar{K}$ and compactness of $\Delta X$ imply that

$$\kappa_\varepsilon \equiv \inf_{\sigma \in \Delta X} \mu_0(\{\theta : \bar{K}(\theta, \sigma) \le \varepsilon/2\}) > 0 \tag{8}$$

for all $\varepsilon > 0$. Thus, setting $\eta = \varepsilon/8 > 0$, (7) implies that, for all $t \ge \max\{t_\eta, 1/\varepsilon\}$,

$$\frac{A^\varepsilon_t}{B^\varepsilon_t} \le \frac{e^{-t \varepsilon/4}}{\kappa_\varepsilon},$$

which goes to zero as $t \to \infty$.

In Section 4, we use Theorem 1 to approximate the agent's belief, $\mu_t$, with the set of probability measures with support in $\{\theta \in \Theta : K(\theta, \sigma_t) - K^*(\sigma_t) \le \delta_t\}$, where $\delta_t \to 0$. Therefore, we will be able to study the asymptotic behavior of $(\sigma_t)_t$ via a stochastic difference equation that only depends on $\sigma_t$ and a vanishing approximation error, and not on $\mu_t$.

In this section, we propose a method to study the asymptotic behavior of the frequencies of actions. Among other benefits, one can use the method to determine if behavior converges or not. The key departure from previous approaches in the literature is to focus on the evolution of frequencies of actions. Using the characterization of beliefs in Theorem 1, we write this evolution as a stochastic difference equation expressed exclusively in terms of the frequencies of actions. We then use tools from stochastic approximation developed by Benaïm, Hofbauer and Sorin (2005) (henceforth, BHS2005) to characterize the solutions of this difference equation in terms of the solution to a generalization of a differential equation.

We first provide a heuristic description of our approach. The sequence of frequencies of actions, $(\sigma_t)_t$, can be written recursively as follows:

$$\sigma_{t+1} = \sigma_t + \frac{1}{t+1} \left( \mathbf{1}(x_{t+1}) - \sigma_t \right), \tag{9}$$

where $\mathbf{1}(x_{t+1}) = (\mathbf{1}_x(x_{t+1}))_{x \in X}$ and $\mathbf{1}_x(x_{t+1})$ is the indicator function that takes the value 1 if $x_{t+1} = x$ and 0 otherwise.

By adding and subtracting the conditional expectation of $\mathbf{1}(x_{t+1})$ (i.e., the probability that each action is played at time $t+1$ given the belief $\mu_{t+1}$), we obtain

$$\sigma_{t+1} = \sigma_t + \frac{1}{t+1} \left( E[\mathbf{1}(x_{t+1}) \mid \mu_{t+1}] - \sigma_t \right) + \frac{1}{t+1} \underbrace{\left( \mathbf{1}(x_{t+1}) - E[\mathbf{1}(x_{t+1}) \mid \mu_{t+1}] \right)}_{=0}. \tag{10}$$

The last term in equation (10) is exactly equal to zero because the agent chooses pure actions. The reason it is hard to characterize $(\sigma_t)_t$ using (10) is that its evolution depends on the agent's belief. If we could somehow write the belief $\mu_{t+1}$ as a function of $\sigma_t$, then we would have a recursion where $\sigma_{t+1}$ depends only on $\sigma_t$. This is where Theorem 1 from Section 3 is useful. This theorem will allow us to approximate $\mu_{t+1}$ with a set of probability measures that depends on $\sigma_t$.

Footnote: More generally, if the agent were allowed to mix, this last term is a martingale difference sequence and essentially adds a noise term to the equation that can be controlled asymptotically in a manner that is standard in the theory of stochastic approximation. Theorem 2 continues to hold as stated, where now $\Delta F(\mu)$ is a set of compound lotteries, i.e., it is the set of all distributions over actions $\hat{\sigma}$ that are induced by some compound lottery $z$ chosen from $\Delta F(\mu)$, that is, $\hat{\sigma}(x) = \int_{\sigma \in F(\mu)} z(\sigma)\, \sigma(x)\, d\sigma$ for each $x$.

The ultimate objective is not really to approximate $\mu_{t+1}$ but rather the conditional expectation $E[\mathbf{1}(x_{t+1}) \mid \mu_{t+1}]$ in equation (10). The conditional expectation, however, is typically discontinuous in the belief (this is particularly so for a belief under which the agent is indifferent between two actions). Thus, replacing $\mu_{t+1}$ with a good approximation does not necessarily yield a good approximation for the conditional expectation. We tackle this discontinuity issue by replacing the function $\mu \mapsto E[\mathbf{1}(x_{t+1}) \mid \mu]$ with a correspondence that contains this function and is well behaved.

To see how this approach works, note that $E[\mathbf{1}(x_{t+1}) \mid \mu] \in \Delta F(\mu)$ for all $\mu$. Therefore, we can write equation (10) as a difference inclusion:

$$\sigma_{t+1} = \sigma_t + \frac{1}{t+1} \left( r_{t+1} - \sigma_t \right), \tag{11}$$

where $r_{t+1} \in \Delta F(\mu_{t+1})$. It is called a difference inclusion because $r_{t+1}$ can take multiple values. Importantly, we use Theorem 1 to approximate $\mu_{t+1}$ with the set of probability measures $\mu$ satisfying $\int_\Theta (K(\theta, \sigma_t) - K^*(\sigma_t))\, \mu(d\theta) \le \delta_t$, where $\delta_t \to 0$. When $\delta_t = 0$, this set is exactly $\Delta\Theta(\sigma_t)$, the set of probability measures with support in $\Theta(\sigma_t)$.
More generally, the difference equation (11) can be written entirely in terms of $(\sigma_t)_t$ and approximation errors.

A key insight from the theory of stochastic approximation is that, in order to characterize a discrete-time process such as $(\sigma_t)_t$, it is convenient to work with its continuous-time interpolation. Because of the multiplicity inherent in equation (11), we apply the specific methods developed by BHS2005, who extend Benaïm's (1996) ordinary-differential-equation method to the case of differential inclusions.

Footnote: See Borkar (2009) for a textbook treatment of the ordinary-differential-equation method in stochastic approximation.

Set $\tau_0 = 0$ and $\tau_t = \sum_{i=1}^{t} 1/i$ for $t \ge 1$. The continuous-time interpolation of $(\sigma_t)_t$ is the function $w : \mathbb{R}_+ \to \Delta X$ defined as

$$w(\tau_t + s) = \sigma_t + s\, \frac{\sigma_{t+1} - \sigma_t}{\tau_{t+1} - \tau_t}, \quad s \in \left[ 0, \tfrac{1}{t+1} \right). \tag{12}$$

Figure 1 illustrates this simple interpolation for a specific value of $x \in X$. A convenient property of the interpolation is that it preserves the accumulation points of the discrete process.

Equations (11) and (12) can be combined to show that the derivative of $w$ with respect to (a re-indexing of) time, which we denote by $\dot{w}$, is approximately given by $r_{t+1} - \sigma_t$. As argued earlier, $r_{t+1}$ belongs to a set that depends on $\sigma_t$ and an approximation error, and this set is equal to $\Delta F(\Delta\Theta(\sigma_t))$ when the error is zero. Thus, the derivative approximately takes values in $\Delta F(\Delta\Theta(\sigma_t)) - \sigma_t$. The next step is to replace $\sigma_t$ in this last expression by its interpolation $w(t)$. This replacement adds yet another vanishing approximation error, and we therefore obtain, ignoring the approximation error, that $\dot{w}(t) \in \Delta F(\Delta\Theta(w(t))) - w(t)$. Thus, we can show that the continuous-time interpolation of $(\sigma_t)_t$ is well approximated by solutions to the following differential inclusion:

$$\dot{\boldsymbol{\sigma}}(t) \in \Delta F(\Delta\Theta(\boldsymbol{\sigma}(t))) - \boldsymbol{\sigma}(t). \tag{13}$$
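The interpolation in (12) is piecewise linear between the nodes $(\tau_t, \sigma_t)$, so it can be sketched in a few lines. The function names and the short action sequence below are hypothetical, and the sketch only covers query times between $\tau_1$ and the last node.

```python
import numpy as np

def action_frequencies(actions, n_actions):
    """Rows are sigma_1, ..., sigma_n: running fractions of each action."""
    one_hot = np.eye(n_actions)[np.asarray(actions)]
    counts = np.cumsum(one_hot, axis=0)
    return counts / np.arange(1, len(actions) + 1)[:, None]

def harmonic_times(n):
    """tau_t = sum_{i=1}^t 1/i for t = 1, ..., n."""
    return np.cumsum(1.0 / np.arange(1, n + 1))

def interpolate(sigmas, tau):
    """Piecewise-linear w with w(tau_t) = sigma_t, for tau in [tau_1, tau_n].

    sigmas : shape (n, n_actions), rows sigma_1, ..., sigma_n
    """
    times = harmonic_times(len(sigmas))
    return np.array([np.interp(tau, times, sigmas[:, k])
                     for k in range(sigmas.shape[1])])

# Hypothetical history of actions over four periods.
sig = action_frequencies([0, 1, 1, 0], n_actions=2)
w_at_node = interpolate(sig, 1.5)    # tau_2 = 1 + 1/2, so this is sigma_2
w_between = interpolate(sig, 1.25)   # halfway between tau_1 and tau_2
```

Because $\tau_{t+1} - \tau_t = 1/(t+1)$ shrinks over time, later periods occupy shorter stretches of the re-indexed time axis, which is what lets the slope of $w$ line up with $r_{t+1} - \sigma_t$.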

12 13 14

Figure 1: Example of a continuous-time interpolation.

To state the result formally, we first define what we mean by a solution to the differential inclusion. A solution to the differential inclusion (13) with initial point $\sigma \in \Delta X$ is a mapping $\boldsymbol{\sigma} : \mathbb{R} \to \Delta X$ that is absolutely continuous over compact intervals with the properties that $\boldsymbol{\sigma}(0) = \sigma$ and that (13) is satisfied for almost every $t$. Let $S^T_\sigma$ denote the set of solutions to (13) over $[0, T]$ with initial point $\sigma$. The assumption that $F$ is uhc implies that, for every initial point, there exists a (possibly nonunique) solution to (13); see, e.g., Aubin and Cellina (2012).

We now state the main characterization result.

Theorem 2. Suppose that Assumptions 1-3 hold and let $F$ be an uhc policy correspondence. For any policy $f \in \mathrm{Sel}(F)$, the following holds $P^f$-a.s.: For all $T > 0$,

$$\lim_{t \to \infty}\ \inf_{\boldsymbol{\sigma} \in S^T_{w(t)}}\ \sup_{0 \le s \le T} \| w(t+s) - \boldsymbol{\sigma}(s) \| = 0. \qquad (14)$$

Proof. See Appendix A.4.

Theorem 2 says that, for any $T > 0$, the curve $w(t + \cdot) : [0, T] \to \Delta X$ defined by the continuous-time interpolation of $(\sigma_t)_t$ approximates some solution to the differential inclusion (13) with initial condition $w(t)$ over the interval $[0, T]$ with arbitrary accuracy for sufficiently large $t$. As we will show, this result is convenient because it allows us to characterize asymptotic properties of $(\sigma_t)_t$ by solving the differential inclusion in (13).

BHS2005 refer to a function $w$ satisfying (14) as an asymptotic pseudotrajectory of the differential inclusion. They show that the limit set of a (bounded) asymptotic pseudotrajectory is internally chain transitive. Thus, one corollary of Theorem 2 is that the frequency of actions converges almost surely to an internally chain transitive set of the differential inclusion. Because the notion of internally chain transitive is fairly complex, in the next section we provide a series of results that help characterize behavior in most common economic applications.
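The recursion (11) and the interpolation (12) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `frequencies` and `interpolate` are hypothetical, actions are coded as integers, $r_{t+1}$ is taken to be the indicator of the realized action, and the interpolation is evaluated only on $[\tau_1, \tau_T]$.

```python
def frequencies(actions, n_actions):
    """Return [sigma_1, ..., sigma_T] using sigma_{t+1} = sigma_t + (r_{t+1} - sigma_t)/(t+1).

    Assumption: r_{t+1} is the indicator vector of the action taken in period t+1.
    """
    sigma = [0.0] * n_actions
    path = []
    for t, x in enumerate(actions):  # after this step, t + 1 actions have been observed
        step = 1.0 / (t + 1)
        sigma = [s + step * ((1.0 if k == x else 0.0) - s) for k, s in enumerate(sigma)]
        path.append(sigma)
    return path


def interpolate(path, tau):
    """Evaluate the piecewise-linear interpolation w(tau), where path[t-1] = sigma_t."""
    taus, total = [], 0.0
    for i in range(1, len(path) + 1):
        total += 1.0 / i
        taus.append(total)  # taus[t-1] = tau_t = 1 + 1/2 + ... + 1/t
    if not taus[0] <= tau <= taus[-1]:
        raise ValueError("tau must lie in [tau_1, tau_T]")
    for t in range(1, len(path)):  # interval [tau_t, tau_{t+1})
        lo, hi = taus[t - 1], taus[t]
        if tau < hi:
            weight = (tau - lo) / (hi - lo)
            return [a + weight * (b - a) for a, b in zip(path[t - 1], path[t])]
    return list(path[-1])


path = frequencies([0, 1, 1, 0], n_actions=2)
print(path[-1])                 # frequency after four periods
print(interpolate(path, 1.25))  # halfway between sigma_1 and sigma_2
```

Because $\tau_{t+1} - \tau_t = 1/(t+1)$, later periods occupy shorter and shorter stretches of the re-indexed time axis, which is what makes the interpolated path comparable to a solution of (13).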

We now present a series of implications of Theorem 2 that can be readily applied to economic applications. Throughout this section we assume that the agent chooses a policy $f$ that is a selection from $F$ and that Assumptions 1-4 are satisfied. All probabilistic statements are with respect to the corresponding probability measure $P^f$.

We begin by defining the notion of equilibrium as a stationary point of the differential inclusion.

Definition 2. $\sigma \in \Delta X$ is an equilibrium given a policy correspondence $F$ if $\sigma \in \Delta F(\Delta\Theta(\sigma))$.

If $\sigma$ is an equilibrium, then there is a solution of the differential inclusion that starts at $\sigma$ and forever remains at $\sigma$. The next result shows that, if the action frequency converges, then it must converge to an equilibrium. In Section 6, we relate this result to previous results in the literature and show that the notion of equilibrium that arises naturally from our approach is more general than notions previously considered.

Proposition 1. The following property holds almost surely: If $\sigma_t$ converges to some point $\sigma^*$, then $\sigma^*$ must be an equilibrium.

Proof. See Appendix A.5.

Footnote: For a definition of an internally chain transitive set, see BHS2005, Section 3.3, Definition VI.

.2 Attracting sets and repelling equilibrium

Proposition 1 shows that, if the action frequency $\sigma_t$ converges, its limit must be an equilibrium of the differential inclusion. However, this is not a complete characterization of the long-run behavior of the action frequency, for two reasons. First, the proposition applies only to the case in which the action frequency converges. It does not tell us what happens when the action frequency does not converge, nor when convergence occurs. Second, even when the action frequency converges, if there are multiple equilibria, the proposition does not tell us which one will arise as a long-run outcome. In this section, we introduce two concepts, attracting sets and repelling equilibria, which are useful for making sharper predictions about the asymptotic behavior of the action frequency.

Let $d(\sigma, A)$ denote the distance from a point $\sigma$ to a set $A$, that is, let $d(\sigma, A) = \inf_{\tilde\sigma \in A} \|\sigma - \tilde\sigma\|$. The following definition is standard in the stochastic approximation literature (e.g., BHS2005).

Definition 3.

A set $A \subseteq \Delta X$ is attracting if there is a set $U$ such that $A \subset \operatorname{int} U$ and such that for any $\varepsilon > 0$, there is $T$ such that $d(\boldsymbol{\sigma}(t), A) < \varepsilon$ for any initial value $\boldsymbol{\sigma}(0) \in U$, for any solution $\boldsymbol{\sigma} \in S^\infty_{\boldsymbol{\sigma}(0)}$ to the differential inclusion, and for any $t > T$.

In this definition, we require uniform convergence, in that as long as the initial value is chosen from $U$, $\boldsymbol{\sigma}(t)$ is in the $\varepsilon$-neighborhood of $A$ for all periods $t > T$. Intuitively, this implies that once $\boldsymbol{\sigma}(t)$ enters the $\varepsilon$-neighborhood of $A$, it will never leave this neighborhood. The largest set $U$ which satisfies the property in this definition is the basin of attraction of $A$, and we will denote it by $U_A$. A set $A$ is globally attracting if it is attracting and its basin of attraction is the whole space $\Delta X$. An equilibrium $\sigma^*$ is attracting if the set $A = \{\sigma^*\}$ is attracting.

The following proposition shows that an attracting set appears as a long-run outcome in some sense. Let $E$ denote the set of all equilibria.

Proposition 2.

The following results hold:

(i) If $A$ is globally attracting, then the action frequency $\sigma_t$ approaches this set $A$ almost surely: $\lim_{t \to \infty} d(\sigma_t, A) = 0$. In particular, if $A$ is a globally attracting equilibrium, $\sigma_t$ converges to that equilibrium almost surely.

(ii) Suppose that there are finitely many attracting sets $(A_1, \dots, A_N)$ such that $\Delta X$ is the union of the basins $(U_{A_1}, \dots, U_{A_N})$ of these attractors and of the equilibrium set $E$. Then almost surely, $\sigma_t$ approaches the equilibrium set $E$ or one of these attractors: $\lim_{t \to \infty} d(\sigma_t, E) = 0$ or $\lim_{t \to \infty} d(\sigma_t, A_n) = 0$ for some $n$.

Proof. See Appendix A.6.

Recall that in the definition of attracting sets, we require uniform convergence. This property is crucial in order to obtain Proposition 2. To see this, let $w(t)$ denote the current action frequency. Theorem 2 implies that the motion of the action frequency in the future is approximated by a solution $\boldsymbol{\sigma} \in S^\infty_{w(t)}$ to the differential inclusion for some (long but) finite time $T$; but it does not guarantee that the action frequency is approximated by $\boldsymbol{\sigma}$ forever. So even if all solutions $\boldsymbol{\sigma} \in S^\infty_{w(t)}$ starting from the current value $w(t)$ converge to some equilibrium $\sigma^*$, the action frequency may not converge there. Formally, Theorem 2 implies that, for any $T$ and $\varepsilon > 0$, if $t$ is large enough, then $\|w(t+T) - \boldsymbol{\sigma}(T)\| < \varepsilon$ for some $\boldsymbol{\sigma} \in S^\infty_{w(t)}$, so the action frequency $w(t+T)$ at time $t+T$ is close to the equilibrium $\sigma^*$. However, after time $t+T$, the action frequency can be quite different from $\boldsymbol{\sigma}$, and it may move away from the equilibrium $\sigma^*$.

This suggests that in order to guarantee convergence to $\sigma^*$, we need a stronger assumption, and uniform convergence is precisely the property we want. To see how it works, note that Theorem 2 can be applied iteratively, so that the action frequency from time $t+T$ to $t+2T$ is approximated by a solution $\boldsymbol{\sigma}' \in S^\infty_{w(t+T)}$ starting from $w(t+T)$. As mentioned earlier, this value $w(t+T)$ is close to the equilibrium $\sigma^*$. So if the equilibrium $\sigma^*$ is attracting, then $w(t+T)$ is in the basin of $\sigma^*$, and the solution $\boldsymbol{\sigma}'$ starting from this point stays around the equilibrium $\sigma^*$. This in turn implies that the action frequency $w(t+2T)$ at time $t+2T$ is also close to $\sigma^*$. A similar argument shows that $w(t+nT)$ at time $t+nT$ is close to $\sigma^*$ for every $n = 1, 2, \dots$. The proof of Proposition 2 generalizes this idea and shows convergence to attracting sets.

Now we apply the result to an example where the action does not converge, but the action frequency does.

Example 1.

The consequence space is $Y = \{0, 1\}$. There are two actions, $x_1$ and $x_2$, and the probability that the consequence equals one depends on the action: $Q(1 \mid x_1) = 3/4$ and $Q(1 \mid x_2) = 1/4$. The agent, however, has a misspecified model and believes the consequence does not depend on the action: $Q_\theta(1 \mid x_k) = \theta \in \Theta = [0, 1]$ for $k \in \{1, 2\}$, i.e., according to model $\theta$, the probability that the consequence is $y = 1$ is $\theta$ irrespective of the action. The KLD function in this case is

$$K(\sigma, \theta) = \sigma(x_1)\left(\frac{3}{4}\ln\frac{3/4}{\theta} + \frac{1}{4}\ln\frac{1/4}{1-\theta}\right) + \sigma(x_2)\left(\frac{1}{4}\ln\frac{1/4}{\theta} + \frac{3}{4}\ln\frac{3/4}{1-\theta}\right),$$

and there is a unique minimizer $\theta(\sigma) = \frac{3}{4}\sigma(x_1) + \frac{1}{4}\sigma(x_2)$. Naturally, the model that best fits the true model is a convex combination of the probability that $y = 1$ under actions $x_1$ and $x_2$. The payoff function satisfies $\pi(x_1, y) = 1 - y$ and $\pi(x_2, y) = y$; in particular, the agent prefers action $x_1$ if $y = 0$ and $x_2$ if $y = 1$. Letting $E_\mu[y] \equiv \int \theta\,\mu(d\theta)$ denote the agent's perceived expected value of $y$, the optimal correspondence satisfies $F(\mu) = \{x_1\}$ if $E_\mu[y] < 0.5$, $= \{x_2\}$ if $E_\mu[y] > 0.5$, and $= \{x_1, x_2\}$ if $E_\mu[y] = 0.5$.

Footnote: This is a so-called "shadowing" problem in the literature on stochastic approximation. See Section 8 of Benaim (1999) for more details.

Footnote: Formally, the whole path of $w$ is approximated by a chain of trajectories $(\boldsymbol{\sigma}_1, \boldsymbol{\sigma}_2, \dots)$ where $\|\boldsymbol{\sigma}_n(T) - \boldsymbol{\sigma}_{n+1}(0)\| < \varepsilon$, and uniform convergence ensures that this chain of trajectories converges to $\sigma^*$.

Figure 2: Globally attracting equilibrium.

Actions in this example are negatively reinforcing in the sense that doing more of one action makes the agent want to do less of that action. This feature can be seen in Figure 2, where we have plotted $\sigma \mapsto \Delta F(\Delta\Theta(\sigma))$. For example, if the agent takes pure action $x_1$, i.e., $\sigma(x_1) = 1$, then the closest model is $\theta(\sigma) = 3/4$, so the agent's belief is degenerate at $\theta = 3/4$, i.e., $\mu = \delta_{3/4}$, and she prefers to take action $x_2$, not $x_1$. Similarly, if the agent takes only action $x_2$, she prefers to take action $x_1$.

The feature of negatively reinforcing actions is present in several examples in the literature, and previous work has shown that the action does not converge in those examples (e.g., Nyarko (1991), Esponda and Pouzo (2016), Fudenberg, Romanyuk and Strack (2017)). We can use our differential inclusion to go beyond this result and show that the action frequency does converge. In the example, $\sigma^*(x_1) = \sigma^*(x_2) = 1/2$ is the unique equilibrium point: Given $\sigma^*$, the closest model is $\theta(\sigma^*) = 1/2$, and, given the belief $\delta_{1/2}$, the agent is indifferent between each of the actions in the support of $\sigma^*$. Moreover, as Figure 2 shows, for any initial condition, the solutions to the differential inclusion converge to $\sigma^*$, and so $\{\sigma^*\}$ is a globally attracting set. Proposition 2(i) implies that the action frequency almost surely converges to $\sigma^*$.

Footnote: Negative reinforcement is also present in some of the examples in Spiegler (2016) as well as in the voting environments of Esponda and Pouzo (2017, 2019a) and Esponda and Vespa (2018) and in the investment environment of Jehiel (2018).

In the next example, we show that the attracting set need not be an equilibrium.
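Example 1 can be checked numerically with a short Python sketch. Two things are illustrated: a grid search over $\theta$ recovers the closed-form minimizer $\theta(\sigma) = \frac{3}{4}\sigma(x_1) + \frac{1}{4}\sigma(x_2)$, and a simplified recursion drives the frequency of $x_1$ toward $1/2$. The simplifications are assumptions made here, not the paper's procedure: the belief is taken to be degenerate at $\theta(\sigma_t)$ in every period (instead of a Bayesian posterior), the first action is $x_1$, and ties are broken toward $x_1$. All function names are hypothetical.

```python
import math

Q1 = (0.75, 0.25)  # true probability that y = 1 under x1 and x2


def kld(s1, theta):
    """K(sigma, theta) for sigma = (s1, 1 - s1) and the Bernoulli(theta) model."""
    total = 0.0
    for weight, q in zip((s1, 1.0 - s1), Q1):
        total += weight * (q * math.log(q / theta) + (1 - q) * math.log((1 - q) / (1 - theta)))
    return total


def closest_model(s1):
    """Closed-form minimizer theta(sigma) from Example 1."""
    return 0.75 * s1 + 0.25 * (1.0 - s1)


def grid_minimizer(s1, n=1000):
    """Brute-force minimizer of K(sigma, .) on the grid {1/n, ..., (n-1)/n}."""
    return min((i / n for i in range(1, n)), key=lambda theta: kld(s1, theta))


def simulate(periods):
    """Frequency of x1 after `periods` actions under the simplified belief rule."""
    n1, t = 1, 1  # assumption: the first action is x1
    while t < periods:
        theta = closest_model(n1 / t)
        if theta <= 0.5:  # x1 is optimal when E[y] < 0.5; ties go to x1 (assumption)
            n1 += 1
        t += 1
    return n1 / t


print(grid_minimizer(0.4), closest_model(0.4))
print(simulate(1001))
```

In this simplified recursion the action itself keeps alternating between $x_1$ and $x_2$, while the frequency settles near $1/2$, which matches the distinction the example draws between convergence of actions and convergence of action frequencies.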

Example 2.

The consequence space is $Y = \mathbb{R}^3$. There are three actions, $x_1$, $x_2$, and $x_3$. Given an action $x_k$, the consequence $y$ follows the normal distribution $N(e_k, I)$, where $e_k \in \mathbb{R}^3$ is the unit vector whose $k$th component is one, and $I$ is the identity matrix. However, the agent does not recognize that the action influences the consequence. Formally, the model space is the probability simplex $\Theta = \{\theta = (\theta_1, \theta_2, \theta_3) \mid \sum_{k=1}^{3} \theta_k = 1,\ \theta_k \ge 0\ \forall k\}$, and for each model $\theta = (\theta_1, \theta_2, \theta_3)$, the agent believes that $y$ follows the normal distribution $N(\theta, I)$. Assume that for each degenerate belief $\delta_\theta$, the policy $F(\delta_\theta)$ is given as in Figure 3, where the triangle represents the model space $\Theta$. In particular, Figure 3 shows which action the policy $F$ selects when the current belief puts probability one on a vertex of $\Theta$.

Figure 3: Policy $F(\delta_\theta)$ for each model $\theta$.

Figure 4: Differential Inclusion.

In this example, given a mixed action $\sigma \in \Delta X$, the consequence follows the normal distribution $N(\sigma, I)$, so the Kullback-Leibler divergence $K(\theta, \sigma)$ has a unique minimizer $\theta = \sigma$. Accordingly, a solution to the differential inclusion is described as in Figure 4, where the triangle represents the whole action space $\Delta X$ and each arrow points to the corresponding vertex in the large triangle.

This example has a unique equilibrium, $\sigma^* = (1/3, 1/3, 1/3)$. This equilibrium is not attracting. Indeed, starting from any nearby point $\sigma \ne \sigma^*$, a solution to the differential inclusion moves away from the equilibrium, as described in Figure 5.

On the other hand, the cycle described by the arrows in Figure 6 is attracting. The basin of attraction is the whole space $\Delta X$ except the equilibrium point $\sigma^* = (1/3, 1/3, 1/3)$. That is, given any initial value $\sigma \ne \sigma^*$, any solution to the differential inclusion will eventually follow this cycle. (The proof is straightforward and hence omitted.)

Figure 5: Instability of the equilibrium.

Figure 6: Limit cycle.

Proposition 2(ii) implies that in Example 2, the action frequency $\sigma_t$ must converge to the (non-attracting) equilibrium $\sigma^* = (1/3, 1/3, 1/3)$ or follow the limit cycle described in Figure 6. But which one is more likely to occur? It turns out that the equilibrium $\sigma^*$ in the example above is unstable, in that the action frequency never converges there. So the action frequency follows the limit cycle almost surely.

To see why the equilibrium $\sigma^*$ is unstable, suppose that the current action frequency is exactly this equilibrium, i.e., $\sigma_t = \sigma^*$. Suppose also that the agent chooses some action today, say $x$. This changes the action frequency in the next period, and we have $\sigma_{t+1} = \frac{1}{t+1}\delta_x + \frac{t}{t+1}\sigma^*$. Note that this new action frequency is slightly different from the equilibrium $\sigma^*$. Then, starting from this action frequency, a solution to the differential inclusion moves away from the equilibrium (see Figure 5), which implies instability of $\sigma^*$. More formally, this equilibrium $\sigma^*$ is repelling in the following sense:

Definition 4. An equilibrium $\sigma^*$ is repelling if there is a natural number $T$ and an open neighborhood $U$ of $\sigma^*$ such that for any $\sigma \in U$, for any $x \in F(\Delta\Theta(\sigma^*))$, and for any $\underline{\beta} \in (0, 1)$, there is $\beta \in (\underline{\beta}, 1)$ such that for any $\boldsymbol{\sigma} \in S^\infty_{\beta\sigma + (1-\beta)\delta_x}$, we have $\boldsymbol{\sigma}(t) \notin U$ for some $t \in [0, T]$.

In Example 2, starting from any nearby point $\sigma \ne \sigma^*$ of the equilibrium $\sigma^*$, the solution to the differential inclusion moves away from the equilibrium $\sigma^*$. In such a case, the condition stated in the definition is satisfied, so this equilibrium $\sigma^*$ is repelling.

But our definition of repelling equilibrium is a bit more general. Roughly, an equilibrium $\sigma^*$ is repelling if starting from almost all nearby points $\sigma \ne \sigma^*$ of the equilibrium $\sigma^*$, all the solutions to the differential inclusion eventually leave its neighborhood. This is illustrated in the following example:

Example 3.

We add one more action $x'$ to Example 2. This new action $x'$ is redundant, and is identical to the action $x_3$. Formally, the signal distribution given the action $x'$ is $N(e_3, I)$, and the policy $F(\mu)$ contains $x'$ for all $\mu$ such that $F(\mu)$ contains $x_3$ in Example 2. The agent still believes that the action does not influence the signal distribution.

This example has a continuum of equilibria; any mixed action $\sigma$ with $\sigma(x_1) = \sigma(x_2) = \sigma(x_3) + \sigma(x') = 1/3$ is an equilibrium. Pick one equilibrium $\sigma^*$, and pick an open neighborhood $U$. This neighborhood $U$ contains equilibrium points and non-equilibrium points. The set of equilibrium points is a continuum, but has measure zero; so almost all the points in $U$ are non-equilibrium points. Starting from these non-equilibrium points, all the solutions to the differential inclusion leave the neighborhood $U$, just as described in Figure 5. However, starting from the equilibrium points, a solution to the differential inclusion can stay there forever. So $U$ contains some points from which the solution to the differential inclusion does not leave $U$. Still, this equilibrium $\sigma^*$ is repelling. Indeed, given any point $\sigma \in U$ and given any action $x$, if we choose $\beta$ sufficiently close to one, the perturbed point $\beta\sigma + (1-\beta)\delta_x$ is not an equilibrium; so starting from this perturbed point, the solution to the differential inclusion eventually leaves $U$.

The following proposition asserts that repelling equilibria do not arise as long-run outcomes.
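The perturbation argument in Examples 2 and 3 can be illustrated with exact arithmetic. The sketch below assumes the equilibrium set of Example 3 is $\{\sigma : \sigma(x_1) = \sigma(x_2) = \sigma(x_3) + \sigma(x') = 1/3\}$ and that the redundant action duplicates $x_3$; the names `is_equilibrium`, `perturb`, `one_step`, and the label `"x3_copy"` are hypothetical. It only checks that one more observed action moves an equilibrium frequency off the equilibrium set; it does not solve the differential inclusion.

```python
from fractions import Fraction

ACTIONS = ("x1", "x2", "x3", "x3_copy")  # "x3_copy" stands for the redundant action x'


def is_equilibrium(sigma):
    """Membership test for the (assumed) equilibrium set of Example 3."""
    third = Fraction(1, 3)
    return (sigma["x1"] == third and sigma["x2"] == third
            and sigma["x3"] + sigma["x3_copy"] == third)


def perturb(sigma, action, beta):
    """Return beta * sigma + (1 - beta) * delta_action."""
    return {a: beta * sigma[a] + (1 - beta) * (1 if a == action else 0) for a in ACTIONS}


def one_step(sigma, action, t):
    """Frequency after one more action: (1/(t+1)) delta_x + (t/(t+1)) sigma."""
    return perturb(sigma, action, Fraction(t, t + 1))


sigma_star = {"x1": Fraction(1, 3), "x2": Fraction(1, 3),
              "x3": Fraction(1, 6), "x3_copy": Fraction(1, 6)}
for action in ACTIONS:
    print(action, is_equilibrium(one_step(sigma_star, action, t=100)))
```

The one-step update corresponds to $\beta = t/(t+1)$, so $\beta$ approaches one as $t$ grows, which is the regime that Definition 4 is concerned with.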

Proposition 3. If $\sigma^*$ is a repelling equilibrium, then the action frequency $\sigma_t$ converges to $\sigma^*$ with probability zero.

Proof. See Appendix A.7.

.3 Convergence to attracting sets for some prior

Proposition 2 provides a useful set of conditions under which the action frequency converges to an attracting set, such as the set of equilibria. Moreover, Proposition 3 shows that the frequency cannot converge to repelling equilibria. These propositions, however, do not imply that the action frequency converges to any one specific attracting set or equilibrium (unless it is globally attracting). We will show that if an attracting set $A$ satisfies some additional property, then the action frequency converges to it (i.e., $\lim_{t \to \infty} d(\sigma_t, A) = 0$) with positive probability for some initial prior. Throughout this section, let $B_\varepsilon(A)$ denote the $\varepsilon$-neighborhood of $A$, i.e., it is the set of all $\sigma$ such that $d(\sigma, A) < \varepsilon$.

We first introduce the idea of a "perturbed differential inclusion." Given an initial value $\boldsymbol{\sigma}(0)$, let $\mathbf{S}^{\infty,\varepsilon}_{\boldsymbol{\sigma}(0)}$ denote the set of all solutions to the following differential inclusion:

$$\dot{\boldsymbol{\sigma}}(t) \in \bigcup_{\tilde\sigma \in B_\varepsilon(\boldsymbol{\sigma}(t))} \Delta F(\Delta\Theta(\tilde\sigma)) - \boldsymbol{\sigma}(t). \qquad (15)$$

Recall that in the original differential inclusion, the agent chooses an action from $F(\Delta\Theta(\boldsymbol{\sigma}(t)))$. In (15), this choice set is expanded, so that the agent chooses an action from $F(\Delta\Theta(\tilde\sigma))$, where $\tilde\sigma$ is a perturbation of the current action frequency $\boldsymbol{\sigma}(t)$.

Definition 5.

A set $A$ is robustly attracting if it is attracting and there are $\zeta > 0$ and $\varepsilon > 0$ such that, for any initial value $\boldsymbol{\sigma}(0) \in B_\zeta(A)$, any solution $\boldsymbol{\sigma} \in \mathbf{S}^{\infty,\varepsilon}_{\boldsymbol{\sigma}(0)}$ to the perturbed differential inclusion never leaves the basin $U_A$; i.e., $\boldsymbol{\sigma}(t) \in U_A$ for all $t \ge 0$.

Every attracting set is robustly attracting when the model space $\Theta$ is the one-dimensional interval $[0, 1]$. The same result holds when there are only two actions, i.e., $|X| = 2$. (The proof is straightforward and hence omitted.) However, in general, attracting sets need not be robustly attracting. Such an example can be found in Appendix A.8.

A sufficient condition for a set $A$ to be robustly attracting is that the (non-perturbed) differential inclusion has a contraction property in a neighborhood of $A$. Formally, let $V(\sigma) = d(\sigma, A)$, and suppose that there is an open neighborhood $U$ of $A$ such that $(\tilde\sigma - \sigma) \cdot \nabla V(\sigma) < 0$ for all $\sigma \in U \setminus A$ and $\tilde\sigma \in \Delta F(\Delta\Theta(\sigma))$. Then this $A$ is robustly attracting. Note that this contraction property is satisfied by any strict equilibrium; a pure action $\delta_x$ is a strict equilibrium if there is an open neighborhood $U$ of $\delta_x$ such that $F(\Delta\Theta(\tilde\sigma)) = \{x\}$ for all $\tilde\sigma \in U$. So any strict equilibrium is robustly attracting.

Footnote: Theorem 7.3 of Benaim (1999) gives a sufficient condition for convergence to an attracting set for some prior. This result, however, relies on a technical assumption ((24) in his paper) that does not hold in our environment. Hence we cannot use his theorem and need to develop a new tool.

We will show that the action frequency converges to a robustly attracting set with positive probability, at least for some initial prior.

Proposition 4.

For each robustly attracting set $A$, there is an initial prior $\mu^*$ with full support such that $\lim_{t \to \infty} d(\sigma_t, A) = 0$ with positive probability.

Proof. See Appendix A.9.

$\Theta = [0, 1]$

In this subsection, we will focus on a special case in which the model space is the one-dimensional interval $\Theta = [0, 1]$. This special case includes most of the current applications in the literature and it allows us to provide a more powerful characterization of the action frequency and the belief. Specifically, we will first explain that our differential inclusion reduces to a one-dimensional problem; this reduction considerably simplifies our analysis, because in general the action frequency $\sigma$ is multi-dimensional and solving the differential inclusion can be a difficult task. Then we will show that the belief converges to an equilibrium belief almost surely, and we will provide a simple characterization of attracting/repelling equilibria.

Throughout this subsection, we will impose the following identifiability assumption:

Assumption 5.

The following two conditions hold:

(i) For each $\sigma$, there is a unique minimizer of $K(\sigma, \theta)$, which we denote by $\theta(\sigma) \in [0, 1]$; that is, $\Theta(\sigma) = \{\theta(\sigma)\}$.

(ii) For each $\sigma$ with $\theta(\sigma) \in (0, 1)$, we have $\left.\frac{\partial^2 K(\sigma, \theta)}{\partial \theta^2}\right|_{\theta = \theta(\sigma)} > 0$.

Footnote: More generally, $A$ is robustly attracting if there is an open neighborhood $U$ of $A$ and a function $V : U \to \mathbb{R}_+$ such that (i) $V(\sigma) = 0$ for all $\sigma \in A$, (ii) $(\tilde\sigma - \sigma) \cdot \nabla V(\sigma) < 0$ for all $\sigma \in U \setminus A$ and $\tilde\sigma \in \Delta F(\Delta\Theta(\sigma))$, and (iii) $\nabla V$ is Lipschitz-continuous. Note that condition (ii) here is a bit more demanding than Lyapunov stability, which requires $V(\boldsymbol{\sigma}(t)) < V(\boldsymbol{\sigma}(0))$ for all $\boldsymbol{\sigma}(0)$ and $\boldsymbol{\sigma} \in S^\infty_{\boldsymbol{\sigma}(0)}$.

Footnote: The proof is as follows. It is obvious that $A$ is attracting, so we will show that the condition stated in the definition of robustly attracting sets is satisfied. Pick $\zeta > 0$ such that $B_\zeta(A)$ is in the set $U$ defined above. Let $C$ be $B_\zeta(A)$ with a smaller neighborhood of $A$ removed [...]. Since $C$ is compact, $(\hat\sigma - \sigma) \cdot \nabla V(\sigma) < 0$ [...] and Lipschitz continuity of $\nabla V$ ensures that there is $\varepsilon > 0$ such that $(\hat\sigma - \tilde\sigma) \cdot \nabla V(\tilde\sigma) < 0$ for all $\sigma \in C$, $\tilde\sigma \in B_\varepsilon(\sigma)$, and $\hat\sigma \in F(\Delta\Theta(\sigma))$. This implies that any solution to the $\varepsilon$-perturbed differential inclusion (15) also has a contraction property in the interior of the set $C$; i.e., if the current action frequency $\tilde\sigma$ is an interior point of $C$ and $d(\tilde\sigma, C) \ge \varepsilon$, then at the next instant, the action frequency becomes closer to the set $A$. This immediately implies that $A$ is robustly attracting.

Part (i) is what we call the identifiability condition, which asserts that for each mixed action $\sigma$, there is a unique model which best fits the true world. EP2016 provide a more detailed discussion about this identifiability condition. Note that the best model $\theta(\sigma)$ is continuous in $\sigma$, because $\Theta(\sigma)$ is upper hemi-continuous in $\sigma$.

Part (ii) requires that whenever $\theta(\sigma)$ is an interior solution (so that the first-order condition is satisfied at the minimum), it satisfies the second-order condition. This technical assumption is crucial for the strict monotonicity result (Proposition 5(iii)), which is needed to prove instability of repelling models (Proposition 8). But all other results remain true even if this assumption (ii) is dropped.

The following proposition shows that the closest model $\theta(\sigma)$ is monotone with respect to the action $\sigma$. In the proof, we first show that the KL divergence $K(\sigma, \theta)$ has the increasing differences property. Then the result follows from the monotone selection theorem of Topkis (1998) and Edlin and Shannon (1998).

Proposition 5.

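Suppose that Assumption 5 holds. Pick any $\sigma$ and $\tilde\sigma$, and for each $\beta \in [0, 1]$, let $\sigma_\beta = \beta\sigma + (1-\beta)\tilde\sigma$. Then the following results are true:

(i) If $\theta(\sigma) = \theta(\tilde\sigma)$, then $\theta(\sigma_\beta) = \theta(\sigma)$ for all $\beta \in [0, 1]$.

(ii) If $\theta(\tilde\sigma) < \theta(\sigma)$, then $\theta(\sigma_\beta)$ is weakly increasing with respect to $\beta$.

(iii) If $\theta(\tilde\sigma) < \theta(\sigma)$, then $\theta(\sigma_{\beta_1}) < \theta(\sigma_{\beta_2})$ for any $\beta_1$ and $\beta_2$ such that $\beta_1 < \beta_2$ and $\theta(\sigma_{\beta_1}) \in (0, 1)$.

Proof.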

See Appendix A.10.

The monotonicity result above ensures that the motion of the closest model $\theta(\sigma)$ is characterized by a simple, one-dimensional problem. Note that when $\Theta = [0, 1]$, the best model $\theta(\sigma)$ can move in only three directions; it can go up, down, or stay the same. In particular, since the motion of the action frequency is approximated by

$$\dot{\boldsymbol{\sigma}} = \sigma - \boldsymbol{\sigma}(t) \quad \text{for some } \sigma \in \Delta F(\delta_{\theta(\boldsymbol{\sigma}(t))}),$$

the monotonicity result in Proposition 5 implies that, at each time $t$,

• $\theta(\boldsymbol{\sigma}(t))$ moves up if $\theta(\sigma) > \theta(\boldsymbol{\sigma}(t))$ for all $\sigma \in \Delta F(\delta_{\theta(\boldsymbol{\sigma}(t))})$.
• $\theta(\boldsymbol{\sigma}(t))$ moves down if $\theta(\sigma) < \theta(\boldsymbol{\sigma}(t))$ for all $\sigma \in \Delta F(\delta_{\theta(\boldsymbol{\sigma}(t))})$.

To better understand the motion of $\theta(\boldsymbol{\sigma}(t))$, consider the following example:

Example 4.

The consequence space is $Y = \mathbb{R}$, and the agent has two actions, $x_0$ and $x_1$. Given an action $x_k$, the consequence $y$ follows the normal distribution $N(k, 1)$. The agent does not recognize that the action influences the consequence, and she believes that given a model $\theta \in [0, 1]$, $y$ follows the normal distribution $N(\theta, 1)$ regardless of the chosen action. Consider an upper hemi-continuous policy $F$ which satisfies

$$F(\delta_\theta) = \begin{cases} \{x_0\} & \text{if } \theta \in [0, \tfrac{1}{3}) \cup (\tfrac{2}{3}, 1], \\ \{x_1\} & \text{if } \theta \in (\tfrac{1}{3}, \tfrac{2}{3}), \\ \{x_0, x_1\} & \text{if } \theta \in \{\tfrac{1}{3}, \tfrac{2}{3}\}. \end{cases}$$

Given a mixed action $\sigma$, the consequence follows the normal distribution $N(\sigma(x_1), 1)$, so the closest model is $\theta(\sigma) = \sigma(x_1)$. Hence the motion of $\theta(\boldsymbol{\sigma}(t))$ can be described by the arrows in Figure 7: $\theta(\boldsymbol{\sigma}(t))$ will move up in the middle region (i.e., $\theta(\boldsymbol{\sigma}(t)) \in (\tfrac{1}{3}, \tfrac{2}{3})$), because the agent chooses the action $x_1$ and the corresponding model is $\theta(\delta_{x_1}) = 1$. For the other region, $\theta(\boldsymbol{\sigma}(t))$ will move down because the agent chooses the action $x_0$ and the corresponding model is $\theta(\delta_{x_0}) = 0$.

Figure 7: Motion of $\theta(\boldsymbol{\sigma}(t))$.

Using the fact that the closest model $\theta(\sigma)$ follows the simple rule above, we will now show that it converges almost surely. The intuition is as follows. If the model space is $\Theta = [0, 1]$ and $\theta(\sigma)$ follows a recursive rule as in Figure 7, it cannot be cyclic. This implies that $\theta(\sigma)$ cannot oscillate forever and it must converge. A model $\theta^*$ is an equilibrium model if there is an equilibrium $\sigma^*$ such that $\theta(\sigma^*) = \theta^*$. In Example 4, there are three equilibrium models, $0$, $1/3$, and $2/3$. Let $\Theta^* \subseteq \Theta$ denote the set of all equilibrium models. Then we have the following result:

Proposition 6. Suppose that Assumption 5 holds, and that $\Theta^*$ is finite. Then almost surely, $\lim_{t \to \infty} \theta(\sigma_t)$ exists and $\lim_{t \to \infty} \theta(\sigma_t) \in \Theta^*$.

Proof. See Appendix A.11.

This proposition, together with Theorem 1, implies that the posterior belief $\mu_t$ converges almost surely, and the limit belief is a degenerate belief on some equilibrium model. When there are multiple equilibrium models, Proposition 6 does not tell us which one will arise as a long-run outcome. To address this concern, we define attracting models as follows.

Definition 6.

A model $\theta^* \in [0, 1]$ is attracting if there is $\varepsilon > 0$ such that

• $\theta(\delta_x) \ge \theta^*$ for any $\theta \in (\theta^* - \varepsilon, \theta^*)$ and for any $x \in F(\delta_\theta)$.
• $\theta(\delta_x) \le \theta^*$ for any $\theta \in (\theta^*, \theta^* + \varepsilon)$ and for any $x \in F(\delta_\theta)$.

Intuitively, a model $\theta^*$ is attracting if it is locally absorbing, in that $\theta(\boldsymbol{\sigma}(t))$ moves toward $\theta^*$ in its neighborhood. Indeed, the first bullet point in the definition asserts that if $\theta(\boldsymbol{\sigma}(t))$ is slightly lower than $\theta^*$ in the current period $t$, then it will go up, and hence be closer to $\theta^*$ at the next instant. Similarly, the second bullet point in the definition ensures that if $\theta(\boldsymbol{\sigma}(t))$ is slightly higher than $\theta^*$ in the current period $t$, then it will go down. In Example 4, the equilibrium models $0$ and $2/3$ are attracting, while $1/3$ is not.

Given an attracting model $\theta^*$, let $A = \{\sigma \in \Delta F(\delta_{\theta^*}) \mid \theta(\sigma) = \theta^*\}$ be the set of equilibria $\sigma$ in which the agent has a degenerate belief on $\theta^*$. The following proposition shows that this set $A$ is robustly attracting, which means that these equilibria should arise as a long-run outcome at least for some initial prior. The proposition also shows that the converse is true, i.e., if a set $A = \{\sigma \in \Delta F(\delta_{\theta^*}) \mid \theta(\sigma) = \theta^*\}$ is robustly attracting, then $\theta^*$ is an attracting model.

Proposition 7.

Under Assumption 5, for each $\theta^*$, the following properties are equivalent:

(a) $\theta^*$ is attracting.
(b) The set $A = \{\sigma \in \Delta F(\delta_{\theta^*}) \mid \theta(\sigma) = \theta^*\}$ is attracting.
(c) The set $A$ is robustly attracting.

Proof. See Appendix A.12.

Footnote: Upper hemi-continuity of $F$ ensures that this set $A$ is non-empty, which in turn implies that any attracting model is an equilibrium model.

Footnote: For the special case in which $F(\delta_{\theta^*})$ contains only one component, this set $A$ is a singleton. Similarly, even when $F(\delta_{\theta^*})$ contains only two components, the set $A$ is a singleton for generic parameters. On the other hand, when $F(\delta_{\theta^*})$ contains three or more actions, the set $A$ is typically a continuum.

In the same spirit, we define repelling models as follows:

Deﬁnition 7.

A model $\theta^* \in (0, 1)$ is repelling if $\theta^* \ne \theta(\delta_x)$ for each pure action $x \in F(\delta_{\theta^*})$ and there is $\varepsilon > 0$ such that

• $\theta(\delta_x) \le \theta^* - \varepsilon$ for any $\theta \in (\theta^* - \varepsilon, \theta^*)$ and for any $x \in F(\delta_\theta)$.
• $\theta(\delta_x) \ge \theta^* + \varepsilon$ for any $\theta \in (\theta^*, \theta^* + \varepsilon)$ and for any $x \in F(\delta_\theta)$.

In words, a model $\theta^*$ is repelling if $\theta(\boldsymbol{\sigma}(t))$ moves away from $\theta^*$ in its neighborhood. Indeed, the first bullet point implies that if $\theta(\boldsymbol{\sigma}(t))$ is slightly below $\theta^*$, it will move down further at the next instant. The second bullet point implies that if $\theta(\boldsymbol{\sigma}(t))$ is slightly above $\theta^*$, it will go up at the next instant. In Example 4, the equilibrium model $\theta = 1/3$ is repelling. In the definition above, we consider only interior models $\theta \in (0, 1)$. This is so because whenever an extreme point $\theta = 0, 1$ is an equilibrium model (i.e., there is an equilibrium $\sigma$ such that $\theta(\sigma) = \theta$), there is a pure-strategy equilibrium $\delta_x$ supporting it.

The following proposition shows that if $\theta^*$ is repelling, then any equilibrium in which the agent has a degenerate belief on this model $\theta^*$ is repelling; hence these equilibria do not arise as long-run outcomes.

Proposition 8.

Under Assumption 5, $\theta^* \in (0, 1)$ is repelling if and only if it is not supported by a pure equilibrium, there is at least one mixed equilibrium $\sigma^*$ with $\theta(\sigma^*) = \theta^*$, and all mixed equilibria $\sigma^*$ with $\theta(\sigma^*) = \theta^*$ are repelling.

Proof. See Appendix A.12.
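Definitions 6 and 7 can be checked mechanically in Example 4. The Python sketch below encodes the policy of that example and tests the two bullet conditions of each definition on a finite sample of models near $\theta^*$ for one fixed $\varepsilon$. This is a heuristic check, not a proof: the definitions only require that some $\varepsilon > 0$ works, and the sample is finite. The function names, the default $\varepsilon = 1/100$, and the sample size are assumptions made for this illustration.

```python
from fractions import Fraction

THIRD, TWO_THIRDS = Fraction(1, 3), Fraction(2, 3)


def policy(theta):
    """F(delta_theta) in Example 4, as a set of indices k (k stands for action x_k)."""
    if theta == THIRD or theta == TWO_THIRDS:
        return {0, 1}
    if THIRD < theta < TWO_THIRDS:
        return {1}
    return {0}


def closest_model_pure(k):
    """theta(delta_{x_k}) = k, because y has mean k under action x_k."""
    return Fraction(k)


def nearby(theta_star, eps, side, n=50):
    """n points strictly between theta* and theta* + side * eps, kept if in [0, 1]."""
    points = [theta_star + side * eps * Fraction(i, n + 1) for i in range(1, n + 1)]
    return [p for p in points if 0 <= p <= 1]


def is_attracting(theta_star, eps=Fraction(1, 100)):
    below = all(closest_model_pure(k) >= theta_star
                for th in nearby(theta_star, eps, -1) for k in policy(th))
    above = all(closest_model_pure(k) <= theta_star
                for th in nearby(theta_star, eps, +1) for k in policy(th))
    return below and above


def is_repelling(theta_star, eps=Fraction(1, 100)):
    if not 0 < theta_star < 1:
        return False  # Definition 7 covers interior models only
    if any(closest_model_pure(k) == theta_star for k in policy(theta_star)):
        return False
    below = all(closest_model_pure(k) <= theta_star - eps
                for th in nearby(theta_star, eps, -1) for k in policy(th))
    above = all(closest_model_pure(k) >= theta_star + eps
                for th in nearby(theta_star, eps, +1) for k in policy(th))
    return below and above


for model in (Fraction(0), THIRD, TWO_THIRDS):
    print(model, is_attracting(model), is_repelling(model))
```

Exact rational arithmetic is used so that the boundary models $1/3$ and $2/3$ are compared without rounding error.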

We conclude by applying our results to a large class of economically relevant environments where beliefs are positively reinforcing in the sense that higher beliefs lead to higher actions, which in turn lead to higher beliefs. Two examples in this class are Esponda (2008)'s economies with adverse selection, which include applications to bilateral trade, insurance markets, auctions, and performance pay, and Heidhues, Kőszegi and Strack (2018a)'s environment of an agent whose overconfidence biases her learning about a fundamental, which includes applications to delegation, control in organizations, and public policy choices. In contrast to this work, we are able to derive results without additional restrictive assumptions of a technical nature.

Footnote: Esponda (2008) does not generally tackle the question of convergence; this question is tackled by Heidhues, Kőszegi and Strack (2018a), who establish convergence under the assumption that there is a unique equilibrium and that the distribution of noise is log concave.

Without loss of generality, we assume the actions are ordered according to $x_1 < \dots < x_{|X|}$. We then make the following economically meaningful assumptions:

• $\Theta = [0, 1]$.
• The identifiability conditions (i)-(ii) in Assumption 5 hold; in particular, let $\theta(\sigma)$ denote the closest model given $\sigma$.
• Higher actions lead to higher beliefs: $x \mapsto \theta(\delta_x)$ is an increasing function.
• Higher beliefs lead to higher actions: Formally, the mapping $\theta \mapsto F(\delta_\theta)$ is uhc and satisfies $\max F(\delta_\theta) \le \min F(\delta_{\theta'})$ for all $\theta' > \theta$.

The first assumption requires a one-dimensional space of models, but it is less restrictive than one might imagine. In applications, $\theta$ usually represents the mean of a continuous random variable. But, more generally, the assumption allows for non-parametric models whenever the random variable takes a finite number of values. For example, in Esponda (2008)'s case of a buyer who does not know the distribution over a finite number of product values, $v_1, \dots, v_K$, a model can be represented by a vector $\phi = (\phi_1, \dots, \phi_K)$, where $\phi_j$ denotes the probability that the value is $v_j$. If the buyer cares about the expected value of the object, we can work with a one-dimensional space by defining the transformation $\theta = \sum_{j=1}^{K} \phi_j v_j$.

The second assumption is an identification condition that says that, no matter the data, there is a unique model that best fits the data. This is a natural assumption in the class of environments we study. Whenever it fails in applications, it is natural to refine beliefs in a manner that the condition holds. In the buyer example, if the buyer decides to offer a price of zero, then no trade happens. Then the buyer does not get any feedback about the value of the product and her beliefs are unrestricted. In this case, it is common to assume that the buyer's belief at a price of zero is the limit, as price goes to zero, of her belief at a positive price, where trade does happen with positive probability.

A model $\theta^*$ is an equilibrium model if and only if $\theta^* \in \theta(\Delta F(\delta_{\theta^*}))$. As we show in the proof of the next proposition, the mapping $\theta \mapsto \theta(\Delta F(\delta_\theta))$ has the staircase property. An example is depicted in the right panel of Figure 8. In the example, there are three equilibrium models, two of which are attracting and one of which is repelling. Each attracting model is associated with a corresponding pure-action equilibrium, while the repelling model is associated with a mixed-action equilibrium (a combination of two actions). Our previous results imply that the agent's action converges to a pure-action equilibrium. This is a general feature of this class of environments, as long as pure-action equilibria are strict, a property that is generically true.

Proposition 9.

If all pure-action equilibria are strict, then the action sequence ( x t ) t almostsurely converges to a pure-action equilibrium. roof. See Appendix A.14.
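To make the positively reinforcing dynamics concrete, the following is a minimal sketch rather than the construction used in the proof. It assumes a hypothetical three-action environment in which the closest model $\theta(\sigma)$ is the $\sigma$-weighted average of the values $\theta(\delta_x)$, and in which the optimal action is determined by two belief cutoffs. The numbers in `MEANS` and `CUTOFFS`, and all function names, are illustrative assumptions. The loop iterates the frequency update $\sigma_{t+1} = \sigma_t + (\delta_{x} - \sigma_t)/(t+1)$.

```python
from bisect import bisect_right

# Hypothetical primitives (assumptions for illustration only).
MEANS = [0.1, 0.5, 0.9]   # theta(delta_x) for x_1 < x_2 < x_3: increasing in the action
CUTOFFS = [0.3, 0.7]      # belief cutoffs: higher beliefs lead to higher actions


def closest_model(sigma):
    """theta(sigma), assumed here to be the sigma-weighted average of MEANS."""
    return sum(s * m for s, m in zip(sigma, MEANS))


def optimal_action(theta):
    """Index of the action chosen at the degenerate belief delta_theta."""
    return bisect_right(CUTOFFS, theta)


def simulate(sigma0, periods=5000):
    """Iterate sigma_{t+1} = sigma_t + (delta_x - sigma_t) / (t + 1)."""
    sigma = list(sigma0)
    for t in range(1, periods + 1):
        x = optimal_action(closest_model(sigma))
        sigma = [s + ((1.0 if i == x else 0.0) - s) / (t + 1)
                 for i, s in enumerate(sigma)]
    return sigma


if __name__ == "__main__":
    # Starting from a mixed frequency, the frequency concentrates on one action.
    print(simulate([0.6, 0.0, 0.4]))
```

In this toy specification each pure action is a strict equilibrium, and the simulated frequency concentrates on a single action, in line with the statement of Proposition 9. The sketch is deterministic and ignores the stochastic belief updating that the paper handles.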

In this section, we relate the notion of equilibrium that arises naturally from the differential inclusion approach to EP2016's definition of Berk-Nash equilibrium. To facilitate the comparison, we assume that the agent chooses optimal actions, and denote the correspondence of optimal actions by $F^\beta$, where $\beta \in [0,1)$ denotes the agent's discount factor ($\beta = 0$ is the myopic case). Recall that $\sigma$ is an equilibrium if $\sigma \in \Delta F^\beta(\Delta\Theta(\sigma)) = \Delta \bigcup_{\mu \in \Delta\Theta(\sigma)} F^\beta(\mu)$ (see Definition 2). Equivalently, $\sigma$ is an equilibrium if and only if for every action $x$ in the support of $\sigma$ there exists a belief $\mu_x \in \Delta\Theta(\sigma)$ such that $x \in F^\beta(\mu_x)$. In contrast, EP2016 define a Berk-Nash equilibrium to be a probability distribution over actions satisfying $\sigma \in \bigcup_{\mu \in \Delta\Theta(\sigma)} \Delta F^0(\mu)$. Note that $\sigma$ is a Berk-Nash equilibrium if and only if there exists a belief $\mu \in \Delta\Theta(\sigma)$ such that, for every $x$ in the support of $\sigma$, $x \in F^0(\mu)$.

There are two differences between the definition of equilibrium in this paper and a Berk-Nash equilibrium: a Berk-Nash equilibrium (1) restricts actions to be supported by the same belief; and (2) requires actions to be myopically optimal. These two properties are common in most other standard equilibrium concepts, such as Nash equilibrium. Following Fudenberg and Levine (1993), the first property is known as the unitary-belief property, and it restricts the set of mixed actions that can constitute an equilibrium. The second property is convenient because myopic optimality is easier to characterize than general optimality. We will show that both of these properties are natural provided the following condition holds.

Footnote: This is the definition of Berk-Nash equilibrium for the single agent case; EP2016 also consider the case of multiple agents.

Footnote: Fudenberg and Levine (1993) showed that non-unitary equilibria make sense in a game where there are multiple players and, for each player, there is an underlying population of agents in the role of that player, and different agents may have different experiences (hence, beliefs) about other players.

Definition 8. The family of models is weakly identified given $\sigma \in \Delta X$ if $\theta, \theta' \in \Theta(\sigma)$ implies that $Q_\theta(\cdot \mid x) = Q_{\theta'}(\cdot \mid x)$ for all $x$ such that $\sigma(x) > 0$.

Weak identification is immediately satisfied if the agent's family of models is correctly specified, but it is also satisfied in many of the applications of misspecified learning in the literature; see EP2016 for further discussion.
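The weak identification condition can be checked mechanically when everything is finite. The sketch below is only an illustration under assumptions that are not in the paper: a finite set of named models, finitely many actions and consequences, strictly positive model probabilities, and a numerical tolerance `tol` for ties. The names `kl_divergence`, `closest_models`, and `weakly_identified` are hypothetical helpers.

```python
import math


def kl_divergence(true_q, model_q, sigma):
    """K(theta, sigma): sigma-weighted Kullback-Leibler divergence.

    true_q[x][y] and model_q[x][y] are probabilities of consequence y given
    action x; model probabilities are assumed to be strictly positive.
    """
    total = 0.0
    for x, weight in enumerate(sigma):
        if weight == 0.0:
            continue
        for p, q in zip(true_q[x], model_q[x]):
            if p > 0.0:
                total += weight * p * math.log(p / q)
    return total


def closest_models(true_q, models, sigma, tol=1e-12):
    """Theta(sigma): names of the models that minimize K(., sigma)."""
    divergences = {name: kl_divergence(true_q, q, sigma) for name, q in models.items()}
    best = min(divergences.values())
    return [name for name, k in divergences.items() if k <= best + tol]


def weakly_identified(true_q, models, sigma, tol=1e-12):
    """True if all closest models agree on Q_theta(.|x) whenever sigma(x) > 0."""
    minimizers = closest_models(true_q, models, sigma, tol)
    reference = models[minimizers[0]]
    for x, weight in enumerate(sigma):
        if weight > 0.0:
            for name in minimizers[1:]:
                if any(abs(a - b) > tol for a, b in zip(models[name][x], reference[x])):
                    return False
    return True
```

For instance, two models that differ only at an action played with zero probability are both closest models but still satisfy the condition, whereas two mirror-image models that fit the data equally well on the support of $\sigma$ violate it.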

Proposition 10. Suppose that the family of models is weakly identified given $\sigma$. Then

$$\Delta \bigcup_{\mu \in \Delta\Theta(\sigma)} F^\beta(\mu) \;\subseteq\; \Delta \bigcup_{\mu \in \Delta\Theta(\sigma)} F^0(\mu) \;=\; \bigcup_{\mu \in \Delta\Theta(\sigma)} \Delta F^0(\mu).$$

Proof. See Appendix A.15.

Proposition 10 implies that, if the agent is myopic, then equilibrium and Berk-Nash equilibrium are equivalent concepts under weak identification. Moreover, if the agent is not myopic, then the set of equilibria is contained in the set of Berk-Nash equilibria.

Footnote: Of course, if the agent is not myopic, then there might be Berk-Nash equilibria that are not equilibria. This is similar to the idea in the bandit literature that more patient agents might be willing to experiment with actions that myopic agents would not.

We conclude by relating Proposition 1 in Section 5 to one of the main results in EP2016: they show that, if the sequence of distributions over actions converges, then it converges to a Berk-Nash equilibrium. In our environment there is no motive for mixing, so convergence of the sequence of distributions over actions implies that the actions converge. Propositions 1 and 10 strengthen EP2016's conclusion by showing that, under weak identification, even though actions may not converge, if the action frequency converges, then it converges to a Berk-Nash equilibrium. Of course, the main contribution of this paper is to go beyond the characterization of equilibrium and to provide tools to tackle the question of convergence and stability.

References

Aghion, P., P. Bolton, C. Harris, and B. Jullien, “Optimal learning by experimentation,” The Review of Economic Studies, 1991, (4), 621–654.
Al-Najjar, N., “Decision Makers as Statisticians: Diversity, Ambiguity and Learning,” Econometrica, 2009, (5), 1371–1401.
Al-Najjar, N. and M. Pai, “Coarse decision making and overfitting,” Journal of Economic Theory, forthcoming, 2013.

Aragones, E., I. Gilboa, A. Postlewaite, and D. Schmeidler, “Fact-Free Learning,” American Economic Review, 2005, (5), 1355–1368.
Arrow, K. and J. Green, “Notes on Expectations Equilibria in Bayesian Settings,” Institute for Mathematical Studies in the Social Sciences Working Paper No. 33, 1973.
Aubin, J-P and Arrigo Cellina, Differential inclusions: set-valued maps and viability theory, Vol. 264, Springer Science & Business Media, 2012.
Benaim, M. and M.W. Hirsch, “Mixed equilibria and dynamical systems arising from fictitious play in perturbed games,” Games and Economic Behavior, 1999, (1-2), 36–72.
Benaim, Michel, “A dynamical system approach to stochastic approximations,” SIAM Journal on Control and Optimization, 1996, (2), 437–472.
Benaim, Michel, “Dynamics of stochastic approximation algorithms,” in “Seminaire de Probabilites XXXIII,” Vol. 1709 of Lecture Notes in Mathematics, Springer Berlin Heidelberg, 1999, pp. 1–68.
Benaïm, Michel, Josef Hofbauer, and Sylvain Sorin, “Stochastic approximations and differential inclusions,” SIAM Journal on Control and Optimization, 2005, (1), 328–348.
Berk, R.H., “Limiting behavior of posterior distributions when the model is incorrect,” The Annals of Mathematical Statistics, 1966, (1), 51–58.
Bohren, J Aislinn, “Informational herding with model misspecification,” Journal of Economic Theory, 2016, 222–247.
Bohren, J Aislinn and Daniel N Hauser, “Social Learning with Model Misspecification: A Framework and a Robustness Result,” 2018.
Borkar, Vivek S, Stochastic approximation: a dynamical systems viewpoint, Vol. 48, Springer, 2009.

Bunke, O. and X. Milhaud, “Asymptotic behavior of Bayes estimates under possibly incorrect models,” The Annals of Statistics, 1998, (2), 617–644.
Camerer, Colin F and Eric J Johnson, “The process-performance paradox in expert judgment: How can experts know so much and predict so badly,” Research on judgment and decision making: Currents, connections, and controversies, 1997.
Easley, D. and N.M. Kiefer, “Controlling a stochastic process with unknown parameters,” Econometrica, 1988, pp. 1045–1064.
Edlin, Aaron S and Chris Shannon, “Strict monotonicity in comparative statics,” Journal of Economic Theory, 1998, (1), 201–219.
Esponda, I., “Behavioral equilibrium in economies with adverse selection,” The American Economic Review, 2008, (4), 1269–1291.
Esponda, I. and D. Pouzo, “Berk–Nash Equilibrium: A Framework for Modeling Agents With Misspecified Models,” Econometrica, 2016, (3), 1093–1130.
Esponda, I. and D. Pouzo, “Conditional Retrospective Voting in Large Elections,” American Economic Journal: Microeconomics, 2017, (2), 54–75.
Esponda, I. and D. Pouzo, “Retrospective voting and party polarization,” International Economic Review, 2019a, (1), 157–186.
Esponda, I. and D. Pouzo, “Equilibrium in Misspecified Markov Decision Processes,” working paper, 2019b.
Esponda, I. and E. I. Vespa, “Endogenous sample selection: A laboratory study,” Quantitative Economics, 2018, (1), 183–216.
Eyster, E. and M. Rabin, “Cursed equilibrium,” Econometrica, 2005, (5), 1623–1672.
Eyster, Erik and Matthew Rabin, “Naive herding in rich-information settings,” American Economic Journal: Microeconomics, 2010, (4), 221–43.
Frick, Mira, Ryota Iijima, and Yuhta Ishii, “Misinterpreting Others and the Fragility of Social Learning,” 2019a.
Frick, Mira, Ryota Iijima, and Yuhta Ishii, “Stability and Robustness in Misspecified Learning Models,” 2019b.

Fudenberg, D. and D. Kreps, “Learning Mixed Equilibria,” Games and Economic Behavior, 1993, 320–367.
Fudenberg, D. and D.K. Levine, “Self-confirming equilibrium,” Econometrica, 1993, pp. 523–545.
Fudenberg, Drew, Gleb Romanyuk, and Philipp Strack, “Active learning with a misspecified prior,” Theoretical Economics, 2017, (3), 1155–1189.
Gagnon-Bartsch, Tristan and Matthew Rabin, “Naive social learning, mislearning, and unlearning,” work, 2017.
He, Kevin, “Mislearning from Censored Data: The Gambler’s Fallacy in Optimal-Stopping Problems,” arXiv preprint arXiv:1803.08170, 2018.
Heidhues, Paul, Botond Kőszegi, and Philipp Strack, “Unrealistic expectations and misguided learning,” Econometrica, August 2018a, (4), 1159–1214.
Heidhues, Paul, Botond Kőszegi, and Philipp Strack, “Convergence in Misspecified Learning Models with Endogenous Actions,” Available at SSRN 3312968, December 2018b.
Hofbauer, J. and W.H. Sandholm, “On the global convergence of stochastic fictitious play,” Econometrica, 2002, (6), 2265–2294.
Jehiel, P., “Limited horizon forecast in repeated alternate games,” Journal of Economic Theory, 1995, (2), 497–519.
Jehiel, P., “Analogy-based expectation equilibrium,” Journal of Economic Theory, 2005, (2), 81–104.
Jehiel, P. and F. Koessler, “Revisiting games of incomplete information with analogy-based expectations,” Games and Economic Behavior, 2008, (2), 533–557.
Jehiel, Philippe, “Investment strategy and selection bias: An equilibrium perspective on overoptimism,” American Economic Review, 2018, (6), 1582–97.
Kirman, A. P., “Learning by firms about demand conditions,” in R. H. Day and T. Groves, eds., Adaptive economic models, Academic Press 1975, pp. 137–156.
Liebman, Jeffrey B and Richard J Zeckhauser, “Schmeduling,” 2004.
Molavi, Pooya, “Macroeconomics with Learning and Misspecification: A General Theory and Applications,” 2018.

Nyarko, Y., “Learning in mis-specified models and the possibility of cycles,” Journal of Economic Theory, 1991, (2), 416–427.
Olea, José Luis Montiel, Pietro Ortoleva, Mallesh M Pai, and Andrea Prat, “Competing Models,” arXiv preprint arXiv:1907.03809, 2019.
Osborne, M.J. and A. Rubinstein, “Games with procedurally rational players,” American Economic Review, 1998, 834–849.
Rabin, M. and D. Vayanos, “The gambler’s and hot-hand fallacies: Theory and applications,” The Review of Economic Studies, 2010, (2), 730–778.
Rabin, Matthew, “Psychology and economics,” Journal of Economic Literature, 1998, (1), 11–46.
Sargent, T. J., Bounded rationality in macroeconomics, Oxford University Press, 1993.
Schwartzstein, Joshua, “Selective attention and learning,” Journal of the European Economic Association, 2014, (6), 1423–1452.
Sobel, J., “Non-linear prices and price-taking behavior,” Journal of Economic Behavior & Organization, 1984, (3), 387–396.
Spiegler, Ran, “Bayesian networks and boundedly rational expectations,” The Quarterly Journal of Economics, 2016, (3), 1243–1290.
Spiegler, Ran, “Data Monkeys: A Procedural Model of Extrapolation from Partial Statistics,” The Review of Economic Studies, 2017, (4), 1818–1841.
Topkis, Donald M, Supermodularity and complementarity, Princeton University Press, 1998.
Tversky, A. and D. Kahneman, “Availability: A heuristic for judging frequency and probability,” Cognitive Psychology, 1973, 207–232.
White, Halbert, “Maximum likelihood estimation of misspecified models,” Econometrica: Journal of the Econometric Society, 1982, pp. 1–25.

Appendix

In this appendix, we present the proofs omitted from the text. In some places, we use the fact that $\theta \mapsto \log \frac{q(Y \mid x)}{q_\theta(Y \mid x)}$ is finite and continuous $Q(\cdot \mid x)$-a.s. for all $x \in X$. This fact follows from Assumptions 1-2.

A.1 Proof of Lemma 1

Continuity of $K$: For any $(\theta, \sigma) \in \Theta \times \Delta X$, take a sequence $(\theta_n, \sigma_n)_n$ in $\Theta \times \Delta X$ that converges to this point. By the triangle inequality and the fact that $K$ is finite under Assumption 2(iii), it follows that

$$|K(\theta_n, \sigma_n) - K(\theta, \sigma)| \le |K(\theta_n, \sigma) - K(\theta, \sigma)| + |K(\theta_n, \sigma_n) - K(\theta_n, \sigma)|.$$

It suffices to show that both terms on the RHS vanish as $n \to \infty$. Regarding the first term, observe that for any $\sigma \in \Delta X$, $\theta \mapsto \log \frac{q(Y \mid X)}{q_\theta(Y \mid X)}$ is finite and continuous $Q \cdot \sigma$-a.s. Under Assumption 2(iii), by the DCT this implies that $\theta \mapsto K(\theta, \sigma)$ is continuous for any $\sigma \in \Delta X$. Thus $\lim_{n \to \infty} |K(\theta_n, \sigma) - K(\theta, \sigma)| = 0$. Regarding the second term, observe that under Assumption 2(iii)

$$|K(\theta_n, \sigma_n) - K(\theta_n, \sigma)| \le \sum_{x \in X} \int g_x(y)\, Q(dy \mid x)\, |\sigma_n(x) - \sigma(x)|,$$

and the RHS vanishes because $\int g_x(y)\, Q(dy \mid x) < \infty$ for all $x \in X$.

Finally, continuity of $K$, compactness of $\Theta$ (by Assumption 2(ii)), and the Theorem of the Maximum imply that $\sigma \mapsto \Theta(\sigma)$ is compact-valued and uhc, and that $\sigma \mapsto K^*(\sigma)$ is continuous.

A.2 Proof of Lemma 2

Let $(\theta, z) \mapsto g(\theta, z) \equiv \log \frac{q(y \mid x)}{q_\theta(y \mid x)}$, where $z = (y, x) \in Y \times X$. For any $\theta \in \Theta$ and any $\varepsilon > 0$, let $O(\theta, \varepsilon) \equiv \{\theta' : \|\theta' - \theta\| < \varepsilon\}$.

STEP 1. Pointwise convergence. Fix any $\varepsilon > 0$ and $\theta \in \Theta$. For any $\tau \ge 0$ and $h$, let

$$\zeta_\tau(h) \equiv \sup_{\theta' \in O(\theta, \varepsilon)} g(\theta', z_\tau(h)) - E_{Q(\cdot \mid x_\tau(h))}\Big[\sup_{\theta' \in O(\theta, \varepsilon)} g(\theta', Y, x_\tau(h))\Big].$$

The process $(\zeta_t)_t$ is a martingale difference sequence under $P^f$ and the filtration generated by $\{h^t \equiv (x_0(h), y_0(h), x_1(h), y_1(h), \dots, x_t(h)) : t \ge 0\}$, because $E_{P^f(\cdot \mid h^t)}[\zeta_t(h)] = 0$ for all $t$. Define $h \mapsto \zeta^t(h) \equiv \sum_{\tau=0}^{t} (1+\tau)^{-1} \zeta_\tau(h)$ for any $t \ge 0$. Since $(\zeta_t)_t$ is a martingale difference sequence, $(\zeta^t)_t$ is a martingale.

By the Martingale Convergence Theorem, there exist a set $\mathcal{H} \subseteq H$ (potentially depending on $\theta \in \Theta$) and $\zeta \in L^2(H, \mathbb{R}, P^f)$ such that $P^f(\mathcal{H}) = 1$ and, for all $h \in \mathcal{H}$, $\zeta^t(h) \to \zeta(h)$, provided $\sup_t E_{P^f}[(\zeta^t)^2] < \infty$. This condition is satisfied because

$$\begin{aligned}
E_{P^f}\big[(\zeta^t)^2\big] &= E_{P^f}\Big[\sum_{\tau=0}^{t} (1+\tau)^{-2} (\zeta_\tau)^2\Big] + 2\, E_{P^f}\Big[\sum_{\tau > \tau'} (1+\tau)^{-1} (1+\tau')^{-1} \zeta_\tau \zeta_{\tau'}\Big] \\
&= \sum_{\tau=0}^{t} (1+\tau)^{-2} E_{P^f}\big[(\zeta_\tau)^2\big] \\
&\le \sum_{\tau=0}^{t} (1+\tau)^{-2} E_{P^f}\Big[\int \Big(\sup_{\theta' \in O(\theta, \varepsilon)} g(\theta', y, X_\tau)\Big)^2 Q(dy \mid X_\tau)\Big] \\
&\le C \max_{x \in X} \int \sup_{\theta' \in O(\theta, \varepsilon)} \big(g(\theta', y, x)\big)^2\, Q(dy \mid x),
\end{aligned}$$

where the second line follows from the fact that, for any $\tau > \tau'$, $E_{P^f}[\zeta_\tau \zeta_{\tau'}] = E_{P^f}\big[E_{P^f(\cdot \mid h^\tau)}[\zeta_\tau]\, \zeta_{\tau'}\big] = 0$, and where the last line follows from the fact that $C \equiv \lim_{t \to \infty} \sum_{\tau=0}^{t} (1+\tau)^{-2} < \infty$. By Assumption 2(iii), for any $(x, y) \in X \times Y$, $\sup_{\theta' \in O(\theta, \varepsilon)} (g(\theta', y, x))^2 \le (g_x(y))^2$ with $\int (g_x(y))^2\, Q(dy \mid x) < \infty$. Thus, $\sup_t E_{P^f}[(\zeta^t)^2] < \infty$. By invoking Kronecker's Lemma, it follows that $\lim_{t \to \infty} (1+t)^{-1} \sum_{\tau=0}^{t} \zeta_\tau = 0$ $P^f$-a.s. Therefore, we have established that, for all $\theta \in \Theta$,

$$\lim_{t \to \infty} (1+t)^{-1} \sum_{\tau=0}^{t} \Big(\sup_{\theta' \in O(\theta, \varepsilon)} g(\theta', z_\tau) - E_{Q(\cdot \mid x_\tau)}\Big[\sup_{\theta' \in O(\theta, \varepsilon)} g(\theta', Y, x_\tau)\Big]\Big) = 0 \quad P^f\text{-a.s.}$$

STEP 2. Uniform convergence. Observe that, for any $\varepsilon > 0$ and $\theta \in \Theta$, there exists $\delta(\theta, \varepsilon)$ such that

$$E_{Q(\cdot \mid x)}\Big[\sup_{\theta' \in O(\theta, \delta(\theta, \varepsilon))} \big|g(\theta', Y, x) - g(\theta, Y, x)\big|\Big] < 0.25\,\varepsilon \qquad (16)$$

for all $x \in X$. To see this claim, note that, since $\theta \mapsto g(\theta, Y, x)$ is continuous $Q(\cdot \mid x)$-a.s. for all $x \in X$, $\lim_{\delta \to 0} \sup_{\theta' \in O(\theta, \delta)} |g(\theta', Y, x) - g(\theta, Y, x)| = 0$ $Q(\cdot \mid x)$-a.s. for all $x \in X$. Also, by Assumption 2(iii), $\sup_{\theta' \in O(\theta, \delta)} |g(\theta', y, x) - g(\theta, y, x)| \le 2 g_x(y)$ and $\int g_x(y)\, Q(dy \mid x) < \infty$. Thus, by the DCT, $\lim_{\delta \to 0} E_{Q(\cdot \mid x)}\big[\sup_{\theta' \in O(\theta, \delta)} |g(\theta', Y, x) - g(\theta, Y, x)|\big] = 0$ for all $x \in X$.

Observe that $(O(\theta, \delta(\theta, \varepsilon)))_{\theta \in \Theta}$ is an open cover of $\Theta$. By compactness of $\Theta$, there exists a finite sub-cover $(O(\theta_j, \delta(\theta_j, \varepsilon)))_{j = 1, \dots, J(\varepsilon)}$. Write $O_j \equiv O(\theta_j, \delta(\theta_j, \varepsilon))$. Thus, for all $\varepsilon > 0$,

$$\begin{aligned}
\sup_{\theta \in \Theta} \Big|(1+t)^{-1} \sum_{\tau=0}^{t} \big(g(\theta, z_\tau) - E_{Q(\cdot \mid x_\tau)}[g(\theta, Y, x_\tau)]\big)\Big|
&\le \max_j \sup_{\theta \in O_j} \Big|(1+t)^{-1} \sum_{\tau=0}^{t} \big(g(\theta, z_\tau) - E_{Q(\cdot \mid x_\tau)}[g(\theta, Y, x_\tau)]\big)\Big| \\
&\le \max_j (1+t)^{-1} \sum_{\tau=0}^{t} \Big(\sup_{\theta \in O_j} \big|g(\theta, z_\tau) - E_{Q(\cdot \mid x_\tau)}[g(\theta, Y, x_\tau)]\big|\Big) \\
&\le \max_j (1+t)^{-1} \sum_{\tau=0}^{t} \Big(\Big|\sup_{\theta \in O_j} g(\theta, z_\tau) - E_{Q(\cdot \mid x_\tau)}\Big[\inf_{\theta \in O_j} g(\theta, Y, x_\tau)\Big]\Big|\Big) \\
&\le \max_j (1+t)^{-1} \sum_{\tau=0}^{t} \Big(\Big|\sup_{\theta \in O_j} g(\theta, z_\tau) - E_{Q(\cdot \mid x_\tau)}\Big[\sup_{\theta \in O_j} g(\theta, Y, x_\tau)\Big]\Big|\Big) \\
&\quad + \max_j (1+t)^{-1} \sum_{\tau=0}^{t} \Big(E_{Q(\cdot \mid x_\tau)}\Big[\sup_{\theta \in O_j} g(\theta, Y, x_\tau) - \inf_{\theta \in O_j} g(\theta, Y, x_\tau)\Big]\Big) \\
&= I + II.
\end{aligned}$$

By Step 1 and the fact that we are adding over a finite number of $\theta_j$'s, the limit as $t \to \infty$ of the term $I$ is equal to zero $P^f$-a.s. For the second term, note that (16) implies that

$$II \le 2 \max_j \max_{x \in X} \int \sup_{\theta \in O_j} \big|g(\theta, y, x) - g(\theta_j, y, x)\big|\, Q(dy \mid x) \le 0.5\,\varepsilon.$$

Since $0 \le II \le 0.5\,\varepsilon$ holds for all $\varepsilon > 0$, it follows that $II = 0$. Therefore, using the definition of $g$, we have established that

$$\lim_{t \to \infty} \sup_{\theta \in \Theta} \Big|(1+t)^{-1} \sum_{\tau=0}^{t} \Big(\log \frac{q(y_\tau \mid x_\tau)}{q_\theta(y_\tau \mid x_\tau)} - E_{Q(\cdot \mid x_\tau)}\Big[\log \frac{q(Y \mid x_\tau)}{q_\theta(Y \mid x_\tau)}\Big]\Big)\Big| = 0 \quad P^f\text{-a.s.}$$

The statement in the lemma then follows by noting that

$$K(\theta, \sigma_t) = \sum_{x \in X} E_{Q(\cdot \mid x)}\Big[\log \frac{q(Y \mid x)}{q_\theta(Y \mid x)}\Big] \sigma_t(x) = (1+t)^{-1} \sum_{\tau=0}^{t} E_{Q(\cdot \mid x_\tau)}\Big[\log \frac{q(Y \mid x_\tau)}{q_\theta(Y \mid x_\tau)}\Big].$$

A.3 Proof of equation (8) in Theorem 1

For simplicity, write $k > 0$ for the fixed fraction of $\varepsilon$ used as the threshold below. Continuity of $(\theta, \sigma) \mapsto \bar{K}(\theta, \sigma) \equiv K(\theta, \sigma) - K^*(\sigma)$ (see Lemma 1(i)) and compactness of $\Theta \times \Delta X$ imply that $\bar{K}$ is uniformly continuous. For any $\sigma$, take some $\theta_\sigma \in \Theta(\sigma)$ (this is possible because $\Theta(\sigma)$ is nonempty; see Lemma 1(ii)). By uniform continuity of $\bar{K}$, there exists $\delta_k > 0$ such that $\|\theta_\sigma - \theta'\| < \delta_k$ and $\|\sigma - \sigma'\| < \delta_k$ imply $\bar{K}(\theta', \sigma') < \bar{K}(\theta_\sigma, \sigma) + k = k$, where the last equality follows because $\bar{K}(\theta_\sigma, \sigma) = 0$. Hence, for all $\sigma$, $\{\theta' : \|\theta_\sigma - \theta'\| < \delta_k\} \subseteq \{\theta : \bar{K}(\theta, \sigma') \le k\}$ for all $\sigma' \in B(\sigma, \delta_k) \equiv \{\sigma' : \|\sigma - \sigma'\| < \delta_k\}$. Thus, for all $\sigma$,

$$\mu_0(\{\theta : \bar{K}(\theta, \sigma') \le k\}) \ge \mu_0(\{\theta' : \|\theta_\sigma - \theta'\| < \delta_k\})$$

for all $\sigma' \in B(\sigma, \delta_k)$. The balls $\{B(\sigma, \delta_k)\}_\sigma$ form an open cover of $\Delta X$. Since $\Delta X$ is compact, there exists a finite subcover $\{B(\sigma_i, \delta_k)\}_{i=1}^{n}$. Let $r \equiv \min_{i \in \{1, \dots, n\}} \mu_0(\{\theta' : \|\theta_{\sigma_i} - \theta'\| < \delta_k\})$, which is strictly positive by Assumption 3. Take any $\sigma'$; there exists $i$ such that $\sigma' \in B(\sigma_i, \delta_k)$, and by the previous argument

$$\mu_0(\{\theta : \bar{K}(\theta, \sigma') \le k\}) \ge \mu_0(\{\theta' : \|\theta_{\sigma_i} - \theta'\| < \delta_k\}) \ge r > 0.$$

A.4 Proof of Theorem 2

The proof of Theorem 2 consists of three parts. Part 1 defines an enlargement of the set of actions that allows us to adopt the methods developed by BHS2005. Parts 2 and 3 correspond to the arguments in the proofs of Proposition 1.3 and Theorem 4.2 in BHS2005, respectively, and we provide them here for completeness. Throughout the proof we fix a history from the set of histories with probability 1 defined by the statement of Theorem 1; we omit the history from the notation.
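As a concrete reference point for the objects used in the three parts, the following sketch builds the interpolation times $\tau_0 = 0$, $\tau_k = \sum_{i=1}^{k} 1/i$ and a continuous-time interpolation of a given sequence of action frequencies. Equation (12) is not reproduced in this appendix, so the piecewise-linear rule with $w(\tau_k) = \sigma_k$ is an assumption, chosen to be consistent with the inclusion $w(t) \in \sigma_{m(t)} + (t - \tau_{m(t)})\, \Xi(\delta_{m(t)}, \sigma_{m(t)})$ used in Part 2. The function names are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def interpolation_times(n):
    """Return [tau_0, ..., tau_n] with tau_0 = 0 and tau_k = sum_{i=1}^k 1/i."""
    return [0.0] + list(accumulate(1.0 / i for i in range(1, n + 1)))


def interpolate(frequencies, t):
    """Piecewise-linear w(t) with w(tau_k) = frequencies[k] (assumed form).

    `frequencies` is a list of at least two probability vectors
    sigma_0, sigma_1, ..., each given as a list of floats.
    """
    taus = interpolation_times(len(frequencies) - 1)
    if not 0.0 <= t <= taus[-1]:
        raise ValueError("t lies outside the interpolated time range")
    # m(t) = sup{k : t >= tau_k}, capped so that segment k -> k + 1 exists.
    k = min(bisect_right(taus, t) - 1, len(frequencies) - 2)
    share = (t - taus[k]) / (taus[k + 1] - taus[k])
    return [a + share * (b - a) for a, b in zip(frequencies[k], frequencies[k + 1])]
```

Because the step from $\sigma_k$ to $\sigma_{k+1}$ has size of order $1/(k+1)$ and the interval $[\tau_k, \tau_{k+1}]$ has length $1/(k+1)$, the slope of this interpolation stays bounded, which is the property Part 3 relies on.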

Part 1. Enlargement of the set $\Delta F(\mu)$. Let $S = \{a - b \mid a, b \in \Delta X\}$ and let $\Xi : \mathbb{R}_+ \times \Delta X \rightrightarrows S$ be defined such that, for all $(\delta, \sigma) \in \mathbb{R}_+ \times \Delta X$,

$$\Xi(\delta, \sigma) = \Big\{ y \in S : \exists\, \sigma' \in \Delta X,\ \mu' \in \Delta\Theta \ \text{s.t.}\ y \in \Delta F(\mu') - \sigma',\ \mu' \in M(\delta, \sigma'),\ \|\sigma' - \sigma\| \le \delta \Big\},$$

where $M : \mathbb{R}_+ \times \Delta X \rightrightarrows \Delta\Theta$ is defined such that, for all $(\delta, \sigma') \in \mathbb{R}_+ \times \Delta X$,

$$M(\delta, \sigma') \equiv \Big\{ \mu' \in \Delta\Theta : \int_\Theta \bar{K}(\theta, \sigma')\, \mu'(d\theta) \le \delta \Big\},$$

where $\bar{K}(\theta, \sigma') \equiv K(\theta, \sigma') - K^*(\sigma')$. Note that $M(0, \sigma) = \Delta\Theta(\sigma)$ and so $\Xi(0, \sigma) = \bigcup_{\mu \in \Delta\Theta(\sigma)} \Delta F(\mu) - \sigma$.

Claim 1: $(\delta, \sigma) \mapsto \Xi(\delta, \sigma)$ is uhc.

Proof. Because $S$ is compact, it suffices to show that $\Xi$ has the closed graph property. For this purpose, we will first show that $(\delta, \sigma') \mapsto M(\delta, \sigma')$ is uhc. To establish this claim, note that $\Delta\Theta$ is compact because of the assumption that $\Theta$ is compact. Hence, we will show that $M$ has the closed graph property. Take $(\mu'_n)_n$ converging to $\mu'$ (in the weak topology), $(\delta_n)_n$ converging to $\delta$, and $(\sigma'_n)_n$ converging to $\sigma'$. Suppose that $\mu'_n \in M(\delta_n, \sigma'_n)$ for all $n$. We will show that $\mu' \in M(\delta, \sigma')$. Since $(\mu'_n)_n$ converges (weakly) to $\mu'$ and $\bar{K}(\theta, \cdot)$ is continuous (see Lemma 1), it follows that

$$\begin{aligned}
\lim_n \Big(\int_\Theta \bar{K}(\theta, \sigma'_n)\, \mu'_n(d\theta) - \int_\Theta \bar{K}(\theta, \sigma')\, \mu'(d\theta)\Big)
&= \lim_n \Big(\int_\Theta \bar{K}(\theta, \sigma'_n)\, \mu'_n(d\theta) - \int_\Theta \bar{K}(\theta, \sigma')\, \mu'_n(d\theta)\Big) \\
&\quad + \lim_n \Big(\int_\Theta \bar{K}(\theta, \sigma')\, \mu'_n(d\theta) - \int_\Theta \bar{K}(\theta, \sigma')\, \mu'(d\theta)\Big) = 0.
\end{aligned}$$

Also, since $\mu'_n \in M(\delta_n, \sigma'_n)$, we have $\int_\Theta \bar{K}(\theta, \sigma'_n)\, \mu'_n(d\theta) \le \delta_n$. Taking limits on both sides of this last expression, we obtain $\int_\Theta \bar{K}(\theta, \sigma')\, \mu'(d\theta) \le \delta$, implying that $\mu' \in M(\delta, \sigma')$.

Next, to show that $\Xi$ has the closed graph property, take $(y_n)_n$ converging to $y$, $(\delta_n)_n$ converging to $\delta$, and $(\sigma_n)_n$ converging to $\sigma$. Suppose that $y_n \in \Xi(\delta_n, \sigma_n)$ for all $n$. We will show that $y \in \Xi(\delta, \sigma)$. Since $y_n \in \Xi(\delta_n, \sigma_n)$ for all $n$, there exists a sequence $(\mu'_n, \sigma'_n)_n$ such that $y_n \in \Delta F(\mu'_n) - \sigma'_n$, $\|\sigma'_n - \sigma_n\| \le \delta_n$, and $\mu'_n \in M(\delta_n, \sigma'_n)$. Because the sequence $(\mu'_n, \sigma'_n)_n$ lives in a compact set, $\Delta\Theta \times \Delta X$, there exists a subsequence $(\mu'_{n(k)}, \sigma'_{n(k)})_k$ that converges to some $(\mu', \sigma')$. By uhc of $M$ and of $\mu \mapsto \Delta F(\mu)$ (due to the assumption that $F$ is uhc), it follows that $y \in \Delta F(\mu') - \sigma'$, $\|\sigma' - \sigma\| \le \delta$, and $\mu' \in M(\delta, \sigma')$. Thus, $y \in \Xi(\delta, \sigma)$.

Claim 2: There exists a sequence $(\delta_t)_t$ with $\lim_{t \to \infty} \delta_t = 0$ such that, for all $t$, $\sigma_{t+1} - \sigma_t \in \frac{1}{t+1}\, \Xi(\delta_t, \sigma_t)$.

Proof. By equation (11) in the text, $\sigma_{t+1} - \sigma_t \in \frac{1}{t+1}\big(\Delta F(\mu_{t+1}) - \sigma_t\big)$ for all $t$. By Theorem 1, there exists a sequence $(\delta_t)_t$ with $\lim_{t \to \infty} \delta_t = 0$ such that, for all $t$, $\int_\Theta \bar{K}(\theta, \sigma_t)\, \mu_{t+1}(d\theta) \le \delta_t$. Thus, $\Delta F(\mu_{t+1}) - \sigma_t \subseteq \Xi(\delta_t, \sigma_t)$ for all $t$, and the claim follows.

Part 2. The interpolation of $(\sigma_t)_t$ is what BHS2005 call a perturbed solution of the differential inclusion. Define $m(t) \equiv \sup\{k \ge 0 : t \ge \tau_k\}$, where $\tau_0 = 0$ and $\tau_k = \sum_{i=1}^{k} 1/i$. Let $w$ be the continuous-time interpolation of $(\sigma_t)_t$, as defined in equation (12) in the text. By Claim 2, for any $t$, $w(t) \in \sigma_{m(t)} + (t - \tau_{m(t)})\, \Xi(\delta_{m(t)}, \sigma_{m(t)})$; hence, $\dot{w}(t) \in \Xi(\delta_{m(t)}, \sigma_{m(t)})$ for almost every $t$. Let $\gamma(t) \equiv \delta_{m(t)} + \|w(t) - \sigma_{m(t)}\|$. Then $\dot{w}(t) \in \Xi(\gamma(t), w(t))$ for almost every $t$. In addition, note that $\lim_{t \to \infty} \gamma(t) = 0$ because $(\delta_t)_t$ goes to zero, $m(t)$ goes to infinity, and $w$ is the interpolation of $(\sigma_t)_t$.

Part 3. A perturbed solution is an asymptotic pseudotrajectory (i.e., it satisfies equation (14) in the text).

Let $v(t) \equiv \dot{w}(t) \in \Xi(\gamma(t), w(t))$ for almost every $t$. Then

$$w(t+s) - w(t) = \int_0^s v(t+\tau)\, d\tau. \qquad (17)$$

Since $S$ is a bounded set, $v$ is uniformly bounded; therefore, $w$ is uniformly continuous. Hence, the family of functions $\{s \mapsto S^t(w)(s) : t \in \mathbb{R}\}$, where $S^t(w)(s) = w(s+t)$ for each $(t, s)$, is equicontinuous and, therefore, relatively compact with respect to $L^\infty(\mathbb{R}, \Delta X, \text{Leb})$, where Leb is the Lebesgue measure; all $L^p$ spaces in the proof are with respect to Lebesgue measure, so we drop it from subsequent notation. Therefore, there exist a subsequence $(t_n)_n$ and a $w^* \in L^\infty(\mathbb{R}, \Delta X)$ such that $w^* = \lim_{t_n \to \infty} S^{t_n}(w)$.

Set $t = t_n$ in (17) and define $v_n(s) = v(t_n + s)$. Then

$$w^*(s) - w^*(0) = \lim_{n \to \infty} \int_0^s v_n(\tau)\, d\tau.$$

Since $v_n \in L^\infty(\mathbb{R}, S)$ for all $n$, we have $v_n \in L^2([0,T], S)$. By the Banach-Alaoglu Theorem, there exist a subsequence, which we still denote by $(t_n)_n$, and a $v^* \in L^2([0,T], S)$ such that $(v_n)_n$ converges in the weak topology to $v^*$; therefore,

$$\lim_{n \to \infty} \int_0^s v_n(\tau)\, d\tau = \int_0^s v^*(\tau)\, d\tau \qquad (18)$$

pointwise in $s \in [0,T]$. Indeed, convergence is uniform because the family $\{s \mapsto \int_0^s v_n(\tau)\, d\tau : n \in \mathbb{N}\}$ is equicontinuous and $[0,T]$ is compact. In addition, since $v^* \in L^2([0,T], S)$, $w^*$ is absolutely continuous on $[0,T]$.

The proof concludes by showing the claim that $v^*(\tau) \in \Delta F(\Delta\Theta(w^*(\tau))) - w^*(\tau)$ Lebesgue-a.s. in $\tau \in [0,T]$. We will prove it by showing that $v^*(\tau) \in Co(\Xi(0, w^*(\tau)))$ Lebesgue-a.s. in $\tau \in [0,T]$, where $Co$ denotes the convex hull. The desired claim then follows: $\Delta F(\Delta\Theta(\sigma)) - \sigma$ is a convex set that contains $\Xi(0, \sigma)$ and, by definition, $Co(\Xi(0, \sigma))$ is the smallest convex set that contains $\Xi(0, \sigma)$, so $Co(\Xi(0, \sigma)) \subseteq \Delta F(\Delta\Theta(\sigma)) - \sigma$.

We will prove $v^*(\tau) \in Co(\Xi(0, w^*(\tau)))$ Lebesgue-a.s. in several steps. First, we show that weak convergence of $(v_n)_n$ to $v^*$ implies almost sure convergence of a weighted average of $(v_n)_n$ to $v^*$. Formally, by Mazur's Lemma, for each $n \in \mathbb{N}$, there exist $N(n) \in \mathbb{N}$ and a non-negative vector $(\alpha_n, \dots, \alpha_{N(n)})$ such that $\sum_{i=n}^{N(n)} \alpha_i = 1$ and $\lim_{n \to \infty} \|\bar{v}_n - v^*\|_{L^2([0,T], S)} = 0$, where $\bar{v}_n \equiv \sum_{k=n}^{N(n)} \alpha_k v_k$. Therefore, passing to a further subsequence if necessary, $\lim_{n \to \infty} \bar{v}_n = v^*$ a.s.-Lebesgue.

Fix $\tau \in [0,T]$ such that the previous claim holds. Define $\gamma_n(\tau) \equiv \gamma(t_n + \tau)$ and $w_n(\tau) \equiv w(t_n + \tau)$. By uhc of $\Xi$ at $(0, \sigma)$ for all $\sigma$ (see Claim 1 in Part 1) and the facts that $\gamma_n(\tau) \to 0$ and $w_n(\tau) \to w^*(\tau)$, it follows that, for any $\varepsilon > 0$, there exists $N_\varepsilon$ such that, for all $n \ge N_\varepsilon$, $\Xi(\gamma_n(\tau), w_n(\tau)) \subseteq \Xi^\varepsilon(0, w^*(\tau))$, where $\Xi^\varepsilon(0, w^*(\tau)) \equiv \{y' \in S : \|y' - y\| \le \varepsilon \text{ for some } y \in \Xi(0, w^*(\tau))\}$. Recall that $v_n(\tau) \in \Xi(\gamma_n(\tau), w_n(\tau))$ for all $n$; therefore, $\bar{v}_n(\tau) \in Co(\Xi^\varepsilon(0, w^*(\tau)))$ for all $n \ge N_\varepsilon$. Since $Co(\Xi^\varepsilon(0, w^*(\tau)))$ is closed and $\lim_{n \to \infty} \bar{v}_n(\tau) = v^*(\tau)$, it follows that $v^*(\tau) \in Co(\Xi^\varepsilon(0, w^*(\tau)))$. Since this is true for all $\varepsilon > 0$, it follows that $v^*(\tau) \in Co(\Xi(0, w^*(\tau)))$.

A.5 Proof of Proposition 1

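Let $\sigma^*$ be an arbitrary non-equilibrium point. Then there is a pure action $x$ such that $\sigma^*(x) > 0$ and $x \notin F(\Delta\Theta(\sigma^*))$. Choose such $x$. By upper hemi-continuity of $F$ (Assumption 4) and $\Theta(\cdot)$ (Lemma 1), it follows that there exists $\varepsilon > 0$ such that $x \notin F(\Delta\Theta(\sigma))$ for all $\sigma \in B_\varepsilon(\sigma^*)$ and such that $\inf_{\sigma \in B_\varepsilon(\sigma^*)} \sigma(x) > 0$. Pick such $\varepsilon > 0$. Then there is some $T > 0$ such that, for any initial value in the $\varepsilon$-neighborhood, $\boldsymbol{\sigma}(0) \in B_\varepsilon(\sigma^*)$, any solution $\boldsymbol{\sigma} \in S^\infty_{\boldsymbol{\sigma}(0)}$ to the differential inclusion leaves this neighborhood within time $T$, i.e., we have

$$\|\boldsymbol{\sigma}(\tau) - \sigma^*\| \ge \varepsilon \qquad (19)$$

for some $\tau < T$. Such $T$ exists because the share of the action $x$ decreases whenever $\boldsymbol{\sigma}(\tau)$ is in the set $B_\varepsilon(\sigma^*)$.

Now, pick a sample path $h$ such that the property stated in Theorem 2 holds, and let $\mathbf{w}$ denote the interpolation of the action frequency $\sigma_t$ given this path. We will show that $\sigma_t$ cannot stay in the $\varepsilon$-neighborhood of $\sigma^*$ forever. This completes the proof, because it implies that, almost surely, $\sigma_t$ cannot converge to any non-equilibrium point $\sigma^*$.

Pick $\tilde{T}$ such that, for any time $t > \tilde{T}$,

$$\inf_{\boldsymbol{\sigma} \in S_{\mathbf{w}(t)}} \|\mathbf{w}(t+s) - \boldsymbol{\sigma}(s)\| < \frac{\varepsilon}{2} \quad \forall s \in [0,T]. \qquad (20)$$

Suppose there exists $t > \tilde{T}$ such that $\mathbf{w}(t) \in B_\varepsilon(\sigma^*)$ (if no such $t$ exists, then the proof is finished because it follows that $\sigma_t$ is outside an $\varepsilon/2$-neighborhood of $\sigma^*$ for all $t > \tilde{T}$). Then from (19) and (20), there is $s \in [0,T]$ such that $\|\mathbf{w}(t+s) - \sigma^*\| \ge \varepsilon/2$. So $\sigma_t$ cannot stay in the $(\varepsilon/2)$-neighborhood forever.

A.6 Proof of Proposition 2

Part (i) directly follows from part (ii). Proof of part (ii): Pick a history from the set of histories with probability one defined by the statement of Theorem 2, and let $\mathbf{w}$ denote the interpolation of the action frequency $\sigma_t$ given this path. If there is $t^*$ such that $\mathbf{w}(t) \in E$ for all $t > t^*$, the result follows. So we will focus on the case in which, for any $t^*$, there is $t > t^*$ such that $\mathbf{w}(t) \notin E$.

Pick attracting sets $(A_1, \dots, A_N)$ as stated. Pick an arbitrarily small $\varepsilon > 0$. Without loss of generality, we assume that for each attracting set $A_n$, the $\varepsilon$-neighborhood of $A_n$ is in the basin of attraction $U_{A_n}$.

Pick $T$ large enough that for any attracting set $A_n$, for any initial value $\boldsymbol{\sigma}(0) \in U_{A_n}$, for any $\boldsymbol{\sigma} \in S^{2T}_{\boldsymbol{\sigma}(0)}$, and for any $s \in [T, 2T]$,

$$d(\boldsymbol{\sigma}(s), A_n) < \frac{\varepsilon}{2}. \qquad (21)$$

Also, pick $\tilde{T}$ large enough that for any $t > \tilde{T}$ and for any $s \in [0, 2T]$,

$$\inf_{\boldsymbol{\sigma} \in S_{\mathbf{w}(t)}} \|\mathbf{w}(t+s) - \boldsymbol{\sigma}(s)\| < \frac{\varepsilon}{2}. \qquad (22)$$

Recall that for any $t^*$, there is $t > t^*$ such that $\mathbf{w}(t) \notin E$. This implies that there are $t > \tilde{T}$ and an attracting set $A_n$ such that $\mathbf{w}(t) \in U_{A_n}$. Pick such $t$ and $A_n$. From (21) and (22), we have $d(\mathbf{w}(t+s), A_n) < \varepsilon$ for all $s \in [T, 2T]$. This implies that $\mathbf{w}(t+s) \in U_{A_n}$ for all $s \in [T, 2T]$, so applying the same argument iteratively, we have $d(\mathbf{w}(t+s), A_n) < \varepsilon$ for all $s \ge T$, which means that $\mathbf{w}$ will stay in the $\varepsilon$-neighborhood of the attracting set $A_n$ forever. Since $\varepsilon$ can be arbitrarily small, $d(\mathbf{w}(t), A_n)$ converges to zero as $t \to \infty$. (Note that choosing a smaller $\varepsilon$ does not influence $A_n$.)

A.7 Proof of Proposition 3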

0. Without loss ofgenerality, we assume that for each attracting set A n , the ε -neighborhood of A n is in the basinof attraction U A n .Pick T large enough that for any attracting set A n , for any initial value σσσ ( ) ∈ U A n , for any σσσ T σσσ ( ) , and for any s ∈ [ T , T ] , d ( σσσ ( s ) , A n ) < ε . (21)Also, pick ˜ T large enough that for any t > ˜ T and for any s ∈ [ , T ] inf σσσ ∈ S ( t ) (cid:107) ( t + s ) − σσσ ( s ) (cid:107) < ε . (22)Recall that for any t ∗ , there is t > t ∗ such that ( t ) / ∈ E . This implies that there is t > ˜ T and an attracting set A n such that ( t ) ∈ U A n . Pick such t and A n . From (21) and (22), wehave d ( ( t + s ) , A n ) < ε for all s ∈ [ T , T ] . This implies that ( t + s ) ∈ U A n for all s ∈ [ T , T ] ,so applying the same argument iteratively, we have d ( ( t + s ) , A n ) < ε for all s ≥ T , whichmeans that will stay in the ε -neighborhood of the attracting set A n forever. Since ε can bearbitrarily small, d ( ( t ) − A n ) converges to zero as t → ∞ . (Note that choosing smaller ε doesnot inﬂuence A n .) A.7 Proof of Proposition 3

Let $\sigma^*$ be a repelling equilibrium, and pick $U$ and $T$ as in the definition of repelling equilibrium. Pick a history from the set of histories with probability one defined by the statements of Theorems 1 and 2. Let $\mathbf{w}$ denote the interpolation of the action frequency $\sigma_t$ given this path. It suffices to show that $\mathbf{w}(t)$ does not converge to $\sigma^*$ given this history.

Pick a sufficiently small $\varepsilon > 0$, so that the $2\varepsilon$-neighborhood of $\sigma^*$ is a subset of $U$. Without loss of generality, we can assume that there is $\eta > 0$ such that $F(\mu) \subseteq F(\Delta\Theta(\sigma^*))$ for any $\mu$ with $\int (K(\theta, \sigma) - K^*(\sigma))\, \mu(d\theta) < \eta$ for some $\sigma \in B_\varepsilon(\sigma^*)$. (If necessary, take $\varepsilon$ small.) Pick such $\eta > 0$. Pick $T^*$ and $\tau^*$ such that $T^* = \sum_{i=1}^{\tau^*} 1/i$,

$$\int (K(\theta, \sigma_\tau) - K^*(\sigma_\tau))\, \mu_{\tau+1}(d\theta) < \eta \qquad (23)$$

for all $\tau \ge \tau^*$, and

$$\inf_{\boldsymbol{\sigma} \in S_{\mathbf{w}(t)}} \sup_{0 \le s \le T} \|\mathbf{w}(t+s) - \boldsymbol{\sigma}(s)\| < \varepsilon \qquad (24)$$

for all $t \ge T^*$.

Suppose that $\mathbf{w}(t)$ is in the $\varepsilon$-neighborhood of $\sigma^*$ for some $t > T^*$ such that $t = \sum_{i=1}^{\tau} 1/i$ for some $\tau$. We will show that there is $t' > 0$ such that $\mathbf{w}(t + t')$ is not in the $\varepsilon$-neighborhood of $\sigma^*$. This completes the proof, because it implies that $\mathbf{w}$ cannot stay around $\sigma^*$ forever.

Let $\sigma = \mathbf{w}(t)$ satisfy the condition in the definition of repelling equilibrium. From (23) and the definition of $\eta$, the agent chooses some action $x \in F(\Delta\Theta(\sigma^*))$ in the current period. This means that $\mathbf{w}(\tilde{t})$ moves toward $\delta_x$ during the time $\tilde{t} \in [t, t + \frac{1}{\tau+1}]$. Then from the condition in the definition of repelling equilibrium, there is $\tilde{t} \in [t, t + \frac{1}{\tau+1}]$ such that for any $\boldsymbol{\sigma} \in S^\infty_{\mathbf{w}(\tilde{t})}$, we have $\boldsymbol{\sigma}(t) \notin U$ for some $t \in [0,T]$. Then as in the previous case, we can show that there is $t' \le T$ such that $\|\mathbf{w}(\tilde{t} + t') - \sigma^*\| > \varepsilon$, as desired.

A.8 Attracting Sets Need Not Be Robustly Attracting

The agent has three actions, x , x , and x . Given an action x k , a consequence y is randomlydrawn from Y = R according to the normal distribution N ( e k , I ) , so the action inﬂuences themean of the consequence y . However, the agent does not recognize that the action inﬂuencesthe consequence. Her model space is the probability simplex Θ = (cid:52) X , and for each model θ ,she believes that the consequence follows the normal distribution N ( θ , I ) . So given a mixture σ ∈ (cid:52) X , the closest model is θ = σ , i.e., Θ ( σ ) = { σ } for each σ .For each degenerate belief δ θ , the optimal policy is given as follows. Consider the modelspace Θ , and choose the points A = ( , , ) , B = ( , , ) , C = ( , , ) , and σ ∗ = ( , , ) as in Figure 9. For each model θ in the interior of the triangle AB σ ∗ , F ( δ θ ) = { x } , i.e.,the optimal policy is x if the belief puts probability one on some model θ in this triangle.Similarly, the optimal action is x for the triangle BC σ ∗ , and x for the triangle CA σ ∗ . For thepoint σ ∗ and the models outside the triangle ABC , all actions are optimal, that is, F ( δ θ ) = X θ . For all models on the boundary of the triangles, the optimal policy ischosen in such a way that F ( δ θ ) is upper hemi-continuous with respect to θ . For example, onthe line A σ ∗ , F ( δ θ ) = { x , x } . e Ae e B Cx x x σ ∗ Figure 9: Policy F ( δ θ ) for each model θ x A = a x x B Cb c a b Figure 10: Path starting from a .In this example, the model θ = σ ∗ is an attracting equilibrium, and its basin of attractionis the interior of the triangle ABC . For example, suppose that the action frequency so far is thepoint a = A , and the action x is chosen today. Then the new action frequency is an interiorpoint of the triangle AB σ ∗ , and the agent chooses x until the action frequency hits the point b = ( , , ) on the line B σ ∗ . 
After that, the agent chooses the action x until the actionfrequency hits the point c = ( , , ) on the line C σ ∗ ; then the agent chooses the action x until the action frequency hits the point a = ( , , ) . From there on, the solution to thedifferential inclusion takes the path a b c a b c · · · and converges to σ ∗ , where a n = ( a n , a n , a n ) = (cid:32) − − c n − , c n − , (cid:33) b n = ( b n , b n , b n ) = (cid:18) , − − a n , a n (cid:19) c n = ( c n , c n , c n ) = (cid:18) b n , , − − b n (cid:19) . See Figure 10. Similarly, starting from any interior points of the triangle

ABC , any solution σσσ to the differential inclusion will eventually converge to σ ∗ .Now we will modify this example in such a way that the equilibrium σ ∗ is still attractingbut not robustly attracting. Take the points d , d , · · · as in Figure 11, that is, d is the inter-section point of the line AB and the line passing through σ ∗ and C , and for each n ≥ d n is45he intersection point of the line a n b n and the line passing through σ ∗ and C . Then take thesequence ( z , z , · · · ) such that z = d , z = ( d , d + d , − d − d + d ) , and z k = z k − + d foreach k ≥

2. Intuitively, z z · · · is a “jagged bridge” which connects d and d , whose step sizeshrinks as it goes. See Figure 12. x A = a x x B Cd d d Figure 11: Policy F ( δ θ ) for each model θ z = d d z z z z z z Figure 12: Jagged path. It does not reach d .Assume that for each model θ on this bridge z z z · · · , the optimal policy is F ( δ θ ) = { x , x } . Then starting from any point on this bridge z z · · · , a solution σσσ to the differentialinclusion can move along this bridge and reach the point d . However, starting from the point d , σσσ cannot move to d ; this is so because for every large n , z n is slightly different from d , which means that the bridge z z · · · do not reach the point d exactly. Accordingly, theasymptotic motion of σσσ is the same as before, i.e., as long as the starting point is in the interiorof the triangle ABC , σσσ converges to σ ∗ .The same is true even if we add more bridges. Suppose that for each n , there is a jaggedpath from d n toward d n + . Even with this change, σ ∗ is still attracting, for example, startingfrom the point b , σσσ must follow the path b c a b c · · · and eventually converge to σ ∗ .However, adding these bridges signiﬁcantly changes the solution ˜ σσσ to the perturbed dif-ferential inclusion. Indeed, starting from the point d n , ˜ σσσ can move to d n − through the jaggedpath, because this path is ε -close to the point d n for any small ε . For the same reason, ˜ σσσ canmove to d n − , d n − , · · · , and can eventually reach the point d , which is outside of the basinof σ ∗ . This implies that σ ∗ is not robustly attracting with these bridges.46 .9 Proof of Proposition 4 We will ﬁrst present a few preliminary results. We have seen in Lemma 2 that given any initialprior µ and given any policy f , there is T such that with positive probability, the consequencefrequency is close to the mean (more formally, the sample average of the likelihood L t is closeto the mean) for all periods after T . 
The following claim shows that this $T$ can be chosen independently of $\mu_0$ and $f$. The proof can be found at the end.

Claim 1. For any $\eta > 0$, there are $T$ and $q > 0$ such that for any initial prior $\mu_0$ with full support and for any $f$,

$$P^f\big(\forall t \ge T \ \forall \theta \quad |L_t(\theta) - K(\theta,\sigma_t)| < \eta \big) > q. \tag{25}$$

The next claim just summarizes what we have seen in the proof of Theorem 1: it shows that if the past consequence frequency is close to the mean as stated in the above claim, and if the initial prior $\mu_0$ satisfies some technical condition, then the posterior belief $\mu_{t+1}$ concentrates on the states which approximately minimize $K(\theta,\sigma_t)$ for large $t$.

Claim 2.

For any $\eta > 0$ and for any $\kappa > 0$, there is $T$ such that for any initial prior $\mu_0$ and for any $t > T$ such that $|L_t(\theta) - K(\theta,\sigma_t)| < \eta$ and $\mu_0(\{\theta : K(\theta,\sigma_t) - K^*(\sigma_t) \le \eta\}) \ge \kappa$,

$$\int (K(\theta,\sigma_t) - K^*(\sigma_t))\,\mu_{t+1}(d\theta) < \eta.$$

Proof. This follows directly from the proof of Theorem 1.

The next claim shows that if the posterior belief $\mu_{t+1}$ is concentrated as stated in the above claim, then the motion of the action frequency $\sigma_t$ is described by the perturbed differential inclusion. The difference from Theorem 2 is that here the motion of the action frequency exactly matches a solution to the perturbed differential inclusion. In contrast, in Theorem 2 we take the limit as $t \to \infty$, so a solution to the differential inclusion is only an approximation of the action frequency.

Claim 3.

Let F be an uhc policy correspondence. Then for any ε > , there is η > suchthat given a sample path h, for any t > ε such that ´ ( K ( θ , σ t ) − K ∗ ( σ t )) µ t ( d θ ) < η , there is σσσ ∈ SSS ∞ , ε ( T ) ( T + s ) = σσσ ( s ) for all s ∈ [ , t + ] , where T = ∑ t τ = τ .Proof. Pick ε > η > σ and for any µ such that ´ ( K ( θ , σ ) − K ∗ ( σ )) µ ( d θ ) < η , there is ˜ σ ∈ B ε ( σ ) such that F ( µ ) ⊆ F ( (cid:52) Θ ( ˜ σ )) . 47ote that for each σ , there is ε σ < ε and η σ such that F ( µ ) ⊆ F ( (cid:52) Θ ( σ )) for all µ suchthat ´ ( K ( θ , ˜ σ ) − K ∗ ( ˜ σ )) µ ( d θ ) < η σ for some ˜ σ ∈ B ε σ ( σ ) . Since (cid:52) X is compact, thereis a ﬁnite subcover { B ε σ ( σ ) , · · · , B ε σ M ( σ M ) } . Let η = min m η σ m >

0. This η satisﬁes theproperty we want. Indeed, for any σ and µ such that ´ ( K ( θ , σ ) − K ∗ ( σ )) µ ( d θ ) < η , if weset ˜ σ = σ m such that σ ∈ B ε σ m ( σ m ) , we have F ( µ ) ⊆ F ( (cid:52) Θ ( ˜ σ )) .The next claim shows that to prove convergence to an attracting set A , it sufﬁces to showthat σ t visits the basin of A inﬁnitely often with positive probability. Claim 4.

Suppose that given an initial prior µ and a policy f , σ t visits the basin of A inﬁnitelyoften with positive probability, i.e., P f ( ∀ T ∃ t > T σ t ∈ U A ) > . Then P f ( lim t → ∞ d ( σ t , A ) = ) > .Proof. Let H be the set of all h which satisﬁes the property stated in Theorem 1. Notethat P f ( H ) =

1. Also, let ˜ H be the set of all h such that σ t visits the basin of A inﬁnitelyoften, i.e., it is the set of all h such that for any T , there is t > T such that σ t ∈ U A . Let H ∗ = H ∩ ˜ H . By the assumption, we have P f ( H ∗ ) = P f ( ˜ H ) > h ∈ H ∗ . To prove the claim, it sufﬁces to show that lim t → ∞ d ( σ t , A ) = ε >

0. Without loss of generality, we assume that B ε ( A ) isin the basin of attraction U A .Pick T large enough that (21) holds for any initial value σσσ ( ) ∈ U A , for any σσσ ∈ S T σσσ ( ) ,and for any s ∈ [ T , T ] . Also, pick ˜ T large enough that (22) holds for any t > ˜ T and for any s ∈ [ , T ] .Since σ t visits U A inﬁnitely often, there is t > ˜ T such that ( t ) ∈ U A . Pick such t . Thenas in the proof of Proposition 2(ii), we can show that will stay in the ε -neighborhood of theset A forever. Since ε can be arbitrarily small, lim t → ∞ d ( ( t ) , A ) = A be a robustly attracting set, and let ζ > B ζ ( A ) ⊂ U A . Let ζ and ε be as in the deﬁnition of robustly attracting set. Then pick η as in Claim 3, pick an arbitrary κ >

0, and pick T as stated in Claim 2.Pick t ∗ large enough that t ∗ < ε and t ∗ t ∗ + T σ + Tt ∗ + T ˜ σ ∈ B ζ ( A ) (26)for all σ ∈ B ζ ( A ) and ˜ σ ∈ (cid:52) X . Now, consider the following hypothetical situation:(a) The initial prior is µ such that µ ( { θ : K ( θ , σ ) − K ∗ ( σ ) ≤ η } ) > κ for all σ . Thecurrent period is t ∗ +

1. 48b) The action frequency in the past is close to A , in that σ t ∗ ∈ B ζ ( A ) .(c) The past observation is close to the mean, in that | L t ∗ ( θ ) − K ( θ , σ t ∗ ) | < η for all θ . Let h t ∗ be a history which satisﬁes all the properties above. (Given a policy f , the probabilityof such a history h t ∗ may be zero, but this does not affect the following argument.) Let H bethe set of histories such that the history during the ﬁrst t ∗ periods is exactly h t ∗ and | L t ∗ + , t ( θ ) − K ( θ , σ t ∗ + , t ) | < η

16 (27)for all t ≥ t ∗ + T , where L t ∗ + , t is the sample average of the likelihood from period t ∗ + t , and σ t ∗ + , t is the action frequency from period t ∗ + t . From Claim 1, weknow P f ( H | h t ∗ ) > q .Pick a path h ∈ H . We claim that given this path, σ t never leaves the basin of A afterperiod t ∗ . The proof can be found at the end. Claim 5.

For each path h ∈ H , σ t ∈ U A for all t > t ∗ . Let µ ∗ be the posterior belief induced by the initial prior µ and the history h t ∗ above.Now, consider a new game in which the agent’s initial prior is µ ∗ . Since the agent’s actionis determined by the belief, her play in this new game is exactly the same as her play in thecontinuation game induced by the initial prior µ and the history h t ∗ . So Claim 5 implies thatin this new game, with positive probability, the action frequency σ t will stay in the basin U A in all periods t > ˜ T , where ˜ T is a sufﬁciently large number. (This is so because the actionfrequency σ t ∗ during the ﬁrst t ∗ periods has almost no impact on the action frequency σ t forlarge t .) Then Claim 4 implies that in this new game, the action frequency σ t converges to A with positive probability. A.9.1 Proof of Claim 1

Let P x denote the probability distribution of the histories h = ( x t , y t ) ∞ t = when the agent chooses x every period, Claim 6.

For any η > , there is T such that for any action x,P x ( ∀ t ≥ T ∀ θ | L t ( θ ) − K ( θ , x ) | < η ) > Claim 1 ensures that this condition can be satisﬁed by some consequence sequence. roof. Pick any η >

0. From Lemma 2, lim T → ∞ P x ( ∀ t ≥ T ∀ θ | L t ( θ ) − K ( θ , x ) | < η ) = L t ( θ , x ) = t σ t ( x ) ∑ t τ = { x τ = x } log q ( y t | x t ) q θ ( y t | x t ) be the sampleaverage of the likelihood ratio, where the sample is taken from the periods in which the agentchooses x . Note that we have L t ( θ ) = ∑ x ∈ X σ t ( x ) L t ( θ , x ) .Pick η > T as in the above claim. Let H be the set of histories h such that | L t ( θ , x ) − K ( θ , x ) | < η for all x and t such that t σ t ( x ) > T . Then there is q > P f ( H ) > q for any initial prior µ and any policy f .Pick an arbitrary h ∈ H , and let ξ > (cid:12)(cid:12)(cid:12) log q ( y | x ) q θ ( y | x ) − log q ( ˜ y | x ) q θ ( ˜ y | x ) (cid:12)(cid:12)(cid:12) < ξ for all x , θ , y , and ˜ y . Then we have | L t ( θ , x ) − K ( θ , x ) | < (cid:40) η if t σ t ( x ) > T ξ otherwisefor all x , θ , and t . This implies that σ t ( x ) | L t ( θ , x ) − K ( θ , x ) | < max { η , T ξ t } . So for any t > T ∗ ≡ T ξε , we have σ t ( x ) | L t ( θ , x ) − K ( θ , x ) | < η . Hence for any t > T ∗ , | L t ( θ ) − K ( θ , σ t ) | ≤ ∑ x ∈ X σ t ( x ) | L t ( θ , x ) − K ( θ , x ) | < η . Since K and T ∗ are chosen independently of h ∈ H , this implies the result we want. A.9.2 Proof of Claim 5

Pick h as stated. From (b) and (26), we have σ t ∈ B ζ ( A ) ⊆ U A for all t ∈ { t ∗ + , · · · , t ∗ + T } ,regardless of the agent’s play during these periods.So what remains is to show that σ t ∈ U A for all t > t ∗ + T . From (27), (a), (c), and Claim2, we have ´ ( K ( θ , σ t ) − K ∗ ( σ t )) µ t + ( d θ ) < η for all t ≥ t ∗ + T . Then Claim 3 implies thatthe motion of the action frequency after period t ∗ + T is described by some solution to the ε -perturbed differential inclusion. Since σ t + T ∗ ∈ B ζ ( A ) and σ ∗ is robustly attracting, we have σ t ∈ B ζ ( A ) ⊆ U A for all t ≥ t ∗ + T . A.10 Proof of Proposition 5

We will start with a useful lemma, which shows that Assumption 5 essentially requires single-peakedness of the Kullback-Leibler divergence $K(\theta,\delta_x)$. Let $\underline{\theta} = \min_{\sigma \in \Delta X} \theta(\sigma)$, and let $\overline{\theta} = \max_{\sigma \in \Delta X} \theta(\sigma)$. Then we have the following lemma. The proof can be found at the end.

Lemma 1. If Assumption 5 holds, then for each action frequency $\sigma$, the Kullback-Leibler divergence $K(\theta,\sigma)$ is single-peaked with respect to $\theta$ in $[\underline{\theta}, \overline{\theta}]$, that is, we have $K(\theta,\sigma) > K(\tilde{\theta},\sigma)$ for each $\theta \in [\underline{\theta}, \theta(\sigma))$ and $\tilde{\theta} \in (\theta, \theta(\sigma)]$, and $K(\theta,\sigma) < K(\tilde{\theta},\sigma)$ for each $\theta \in [\theta(\sigma), \overline{\theta})$ and $\tilde{\theta} \in (\theta, \overline{\theta}]$.

Proof of Lemma 1:

We first prove the following claim:

Claim 7. Under Assumption 5, for each $\sigma$ and $\tilde{\sigma}$ such that $\theta(\sigma) > \theta(\tilde{\sigma})$, $K(\theta,\sigma)$ is strictly decreasing with respect to $\theta$ in $[\theta(\tilde{\sigma}), \theta(\sigma)]$, and $K(\theta,\tilde{\sigma})$ is strictly increasing with respect to $\theta$ in $[\theta(\tilde{\sigma}), \theta(\sigma)]$.

Proof. Pick $\sigma$ and $\tilde{\sigma}$ as stated. For each $\beta \in [0,1]$, let $\sigma_\beta = \beta\sigma + (1-\beta)\tilde{\sigma}$.

We will prove only that $K(\theta,\sigma)$ is strictly decreasing with respect to $\theta$ on $[\theta(\tilde{\sigma}), \theta(\sigma)]$. Suppose not, so that there are $\theta', \theta'' \in [\theta(\tilde{\sigma}), \theta(\sigma)]$ such that $\theta' < \theta''$ and $K(\theta',\sigma) \le K(\theta'',\sigma)$. We consider the following two cases.

Case 1: $K(\theta',\tilde{\sigma}) \le K(\theta'',\tilde{\sigma})$. In this case, $K(\theta',\sigma_\beta) \le K(\theta'',\sigma_\beta)$ for all $\beta$, so $\theta''$ cannot be the unique minimizer of $K(\theta,\sigma_\beta)$, i.e., $\theta(\sigma_\beta) \ne \theta''$ for all $\beta$. But this is a contradiction, because $\theta(\sigma_\beta)$ is continuous in $\beta$ and $\theta(\sigma_0) \le \theta'' \le \theta(\sigma_1)$.

Case 2: $K(\theta',\tilde{\sigma}) > K(\theta'',\tilde{\sigma})$. Let $\beta'$ be such that $\theta(\sigma_{\beta'}) = \theta'$. Then we have $K(\theta',\sigma_{\beta'}) < K(\theta'',\sigma_{\beta'})$, which is equivalent to

$$\beta' \big(K(\theta',\sigma) - K(\theta'',\sigma)\big) < (1-\beta')\big(K(\theta'',\tilde{\sigma}) - K(\theta',\tilde{\sigma})\big).$$

Then for all $\beta \ge \beta'$,

$$\beta \big(K(\theta',\sigma) - K(\theta'',\sigma)\big) < (1-\beta)\big(K(\theta'',\tilde{\sigma}) - K(\theta',\tilde{\sigma})\big),$$

which implies $K(\theta',\sigma_\beta) < K(\theta'',\sigma_\beta)$. So $\theta(\sigma_\beta) \ne \theta''$ for all $\beta \ge \beta'$. But this is a contradiction, because $\theta(\sigma_\beta)$ is continuous in $\beta$ and $\theta(\sigma_{\beta'}) < \theta'' < \theta(\sigma_1)$.

Pick an arbitrary $\sigma^*$. We will show that the Kullback-Leibler divergence $K(\theta,\sigma^*)$ is single-peaked in $[\underline{\theta}, \overline{\theta}]$.
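First, consider the case in which $\theta(\sigma^*) = \underline{\theta}$. Let $\tilde{\sigma} = \sigma^*$, and let $\sigma$ be such that $\theta(\sigma) = \overline{\theta}$. Then from the claim above, $K(\theta,\sigma^*)$ is strictly increasing with respect to $\theta$ in $[\underline{\theta}, \overline{\theta}]$, which implies single-peakedness. (The case $\theta(\sigma^*) = \overline{\theta}$ is symmetric.)

Next, consider the case in which $\underline{\theta} < \theta(\sigma^*) < \overline{\theta}$. Let $\tilde{\sigma} = \sigma^*$, and let $\sigma$ be such that $\theta(\sigma) = \overline{\theta}$. Then from the claim above, $K(\theta,\sigma^*)$ is strictly increasing with respect to $\theta$ in $[\theta(\sigma^*), \overline{\theta}]$. Similarly, letting $\sigma = \sigma^*$ and $\tilde{\sigma}$ be such that $\theta(\tilde{\sigma}) = \underline{\theta}$, the claim above implies that $K(\theta,\sigma^*)$ is strictly decreasing with respect to $\theta$ in $[\underline{\theta}, \theta(\sigma^*)]$. Hence $K(\theta,\sigma^*)$ is single-peaked.

Proof of Proposition 5: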

Part (i): A standard algebra shows that K ( θ , σ β ) = β K ( θ , σ ) + ( − β ) K ( θ , ˜ σ ) for each θ . Then the result follows immediately.Part (ii): We ﬁrst show that θ ( σ β ) ≥ θ ( ˜ σ ) for all β . Suppose not so that there is β ∈ ( , ) such that θ ( σ β ) < θ ( ˜ σ ) . Then since θ ( σ β ) is continuous in β and θ ( σ β ) < θ ( ˜ σ ) < θ ( σ ) ,there must be some β such that β < β < θ ( σ β ) = θ ( ˜ σ ) . But then from part (i), wehave θ ( σ β ) = θ ( ˜ σ ) for all β ∈ [ , β ] , and in particular θ ( σ β ) = θ ( ˜ σ ) . This is a contradiction.Similarly, we can show that θ ( σ β ) ≤ θ ( σ ) for all β . Taken together, we have θ ( σ β ) ∈ [ θ ( ˜ σ ) , θ ( σ )] for all β . Now, from Claim 7 in the proof of Lemma 1, K ( θ , σ ) has increasingdifferences, in that ∂ K ( θ , σ ) ∂ θ ∂ β = ∂ K ( θ , σ ) ∂ θ − ∂ K ( θ , ˜ σ ) ∂ θ ≥ . for all β and θ ∈ [ θ ( ˜ σ ) , θ ( σ )] . So the monotone selection theorem of Topkis implies theresult we want.Part (iii): Pick β and β as stated. Let θ ∗ = θ ( σ β ) . This θ ∗ is an interior solution, so itmust solve the ﬁrst-order condition ∂ K ( θ ∗ , σ β ) ∂ θ =

0, which is equivalent to β ∂ K ( θ ∗ , σ ) ∂ θ + ( − β ) ∂ K ( θ ∗ , ˜ σ ) ∂ θ = . (28)We claim that each term in the left-hand side is non-zero: Claim 8. ∂ K ( θ ∗ , σ ) ∂ θ (cid:54) = .Proof. Suppose not so that ∂ K ( θ ∗ , σ ) ∂ θ =

0. Then from (28), we have ∂ K ( θ ∗ , ˜ σ ) ∂ θ =

0, that is, θ ∗ satisﬁes the ﬁrst-order condition for σ and ˜ σ . Then we must have ∂ K ( θ ∗ , σ ) ∂ θ ≥

0. Indeed, ifnot and ∂ K ( θ ∗ , σ ) ∂ θ < θ ∗ becomes the local maxima for K ( θ , σ ) , which contradicts with thesingle-peakedness of K ( θ , σ ) . Similarly we have ∂ K ( θ ∗ , ˜ σ ) ∂ θ ≥ ∂ K ( θ ∗ , σ β ) ∂ θ >

0, is52atisﬁed for σ β , which is equivalent to β ∂ K ( θ ∗ , σ ) ∂ θ + ( − β ) ∂ K ( θ ∗ , ˜ σ ) ∂ θ > . This inequality implies ∂ K ( θ ∗ , σ ) ∂ θ > ∂ K ( θ ∗ , ˜ σ ) ∂ θ >

0. Suppose for now that ∂ K ( θ ∗ , σ ) ∂ θ > ∂ K ( θ ∗ , ˜ σ ) ∂ θ > ∂ K ( θ ∗ , ˜ σ ) ∂ θ ≥

0, we have ∂ K ( θ ∗ , σ β ) ∂ θ > β (cid:54) =

0. Also, since ∂ K ( θ ∗ , σ ) ∂ θ = ∂ K ( θ ∗ , ˜ σ ) ∂ θ = ∂ K ( θ ∗ , σ β ) ∂ θ = β . So θ ∗ satisﬁes both the ﬁrst-order and the second-orderconditions, which implies that θ ( σ β ) = θ ∗ for all β (cid:54) =

0. Then since θ ( σ β ) is continuousin β , we have θ ( σ β ) = θ ∗ for all β ∈ [ , ] . But this is a contradiction, because we have θ ( ˜ σ ) < θ ( σ ) .The above claim and (28) imply that ∂ K ( θ ∗ , σ β ) ∂ θ = β ∂ K ( θ ∗ , σ ) ∂ θ + ( − β ) ∂ K ( θ ∗ , ˜ σ ) ∂ θ (cid:54) = , which means that θ ∗ cannot be the optimal solution for β . (Note that θ ∗ is an interior value,so the ﬁrst-order condition is necessary for it to be optimal.) Then from part (ii), the resultfollows. A.11 Proof of Proposition 6

Let Θ ∗∗ be the union of the equilibrium models and the boundary points, that is, Θ ∗∗ = Θ ∗ ∪{ , } . Since Θ ∗ is ﬁnite, it can be written as Θ ∗∗ = { θ , θ , · · · , θ N } where 0 = θ < · · · < θ N = ( θ n , θ n + ) has a useful property. Lemma 3.

Each interval $(\theta_n, \theta_{n+1})$ must satisfy one of the following properties:

(i) For each $\theta \in (\theta_n, \theta_{n+1})$ and for each $x \in F(\delta_\theta)$, we have $\theta(\delta_x) > \theta$.
(ii) For each $\theta \in (\theta_n, \theta_{n+1})$ and for each $x \in F(\delta_\theta)$, we have $\theta(\delta_x) < \theta$.

Proof. If there is $\theta \in (\theta_n, \theta_{n+1})$ such that $\theta(\delta_x) = \theta$ for some $x \in F(\delta_\theta)$, then this $\theta$ is an equilibrium model, which is a contradiction. So such $\theta$ does not exist.

Similarly, if there is $\theta \in (\theta_n, \theta_{n+1})$ such that $\theta(\delta_x) < \theta < \theta(\delta_{\tilde{x}})$ for some $x, \tilde{x} \in F(\delta_\theta)$, then there is a mixture $\sigma$ of $x$ and $\tilde{x}$ such that $\theta(\sigma) = \theta$, which implies that $\theta$ is a mixed-strategy equilibrium model. So again such $\theta$ does not exist.

Accordingly, $(\theta_n, \theta_{n+1})$ must be the union of two sets: the set of all $\theta \in (\theta_n, \theta_{n+1})$ such that $\theta(\delta_x) > \theta$ for all $x \in F(\delta_\theta)$, and the set of all $\theta \in (\theta_n, \theta_{n+1})$ such that $\theta(\delta_x) < \theta$ for all $x \in F(\delta_\theta)$. However, since $F(\delta_\theta)$ is upper hemi-continuous in $\theta$, one of these sets must be empty. This implies the result.

Next, we characterize how the KL minimizer $\theta(\sigma_t)$ changes over time, when the motion of $\sigma_t$ is determined by the differential inclusion. Consider an interval $(\theta_n, \theta_{n+1})$ which satisfies property (i) in the lemma above. Pick a solution $\boldsymbol{\sigma}$ to the differential inclusion, and suppose that $\theta(\boldsymbol{\sigma}(t)) \in (\theta_n, \theta_{n+1})$ at the current time $t$. Then from property (i), the agent will choose an action $x$ such that $\theta(\delta_x) > \theta(\boldsymbol{\sigma}(t))$, which means that $\theta(\boldsymbol{\sigma}(t))$ should move up and eventually reach (a neighborhood of) $\theta_{n+1}$. Also, once $\theta(\boldsymbol{\sigma}(t))$ goes above $\theta_{n+1}$, it cannot be lower than $\theta_{n+1}$ at any later time. Formally, we have the following result:

Lemma 4.

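Suppose that the interval $(\theta_n, \theta_{n+1})$ satisfies property (i) stated in Lemma 3. Then for any $\varepsilon > 0$, there is $T > 0$ such that given any initial value $\boldsymbol{\sigma}(0)$ with $\theta(\boldsymbol{\sigma}(0)) > \theta_n$ and given any solution $\boldsymbol{\sigma} \in S^{\infty}_{\boldsymbol{\sigma}(0)}$ to the differential inclusion, we have $\theta(\boldsymbol{\sigma}(t)) > \theta_{n+1} - \varepsilon$ for all $t \ge T$.

Proof.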

Let X ∗ = ∪ θ ∈ ( θ n , θ n + ) F ( δ θ ) . We ﬁrst consider the special case in which θ ( δ x ) ≥ θ n + for all x ∈ X ∗ . Then we will explain how to extend the proof for a general case. Case 1: θ ( δ x ) ≥ θ n + for all x ∈ X ∗ . Let X be the set of all mixed strategies σ such that θ ( σ ) ≥ θ n + . From Proposition 5(ii), this set is convex. Similarly, the set (cid:52) X \ X is convex.So there is a hyperplane H which separates these two sets; i.e., there is a vector λ ∈ R | X | and k ∈ R such that λ · σ ≥ k for all σ such that θ ( σ ) ≥ θ n + , and λ · σ < k for all σ such that θ ( σ ) < θ n + . From Proposition 5(ii), for any σ ∈ (cid:52) X ∗ , we have θ ( σ ) ≥ θ n + and hence λ · σ ≥ k .Pick an arbitrary solution σσσ to the differential inclusion. Pick any time t such that θ ( σσσ ( t )) ∈ ( θ n , θ n + ) . Then we have ˙ σσσ ( t ) = σ − σσσ ( t ) (29)for some σ ∈ (cid:52) X ∗ , and also we have λ · ˙ σσσ ( t ) = λ · ( σ − σσσ ( t )) ≥ k − λ · σσσ ( t ) > . (30)54he ﬁrst equation (29), together with Proposition 5(ii), implies that θ ( σσσ ( t )) weakly increasesas time goes for all these t . That is, if θ ( σσσ ( t )) ∈ ( θ n , θ n + ) in the current time t , then we have θ ( σσσ ( t + η )) ≥ θ ( σσσ ( t )) at the next instant t + η . The second equation (30) implies that λ · ˙ σσσ ( t ) strictly increases as time goes. So σσσ ( t ) moves toward the hyperplane H if θ ( σσσ ( t )) ∈ ( θ n , θ n + ) in the current time t .These observations immediately imply the result we want. Pick an arbitrary ε >

0, and let˜ ε > θ ( σ ) > θ n + − ε for all σ such that λ · σ > k − ˜ ε . Pick T large enough that˜ ε T > k − λ · σ (31)for all σ . From (30), if λ · σσσ ( t ) > k − ˜ ε in the current period t , we have λ · ˙ σσσ ( t ) ≥ ˜ ε thatis, λ · σσσ ( t ) increases at a rate at least ˜ ε . Then from (31), given any initial value σσσ ( ) with θ ( σσσ ( )) > θ n , there is t < T such that λ · σσσ ( t ) > k − ˜ ε , which implies θ ( σσσ ( t )) > θ n + − ε .Also (29) implies that after this time t , θ ( σσσ ( ˜ t )) cannot fall below θ n + − ε , that is, θ ( σσσ ( ˜ t )) > θ n + − ε for all ˜ t > t . This implies the result, because t < T . Case 2: θ ( δ x ) < θ n + for some x ∈ X ∗ . Let X ∗∗ = { x , x , · · · , x M } denote the set ofall x ∈ X ∗ such that θ ( δ x ) < θ n + . For each action x m , let ξ m denote the maximal value of θ ∈ ( θ n , θ n + ) such that x m ∈ F ( δ θ ) . Note that the maximum exists, because F is upperhemi-continuous. Also, by the assumption, we have ξ m < θ ( δ x m ) . Without loss of generality,assume that θ n < ξ ≤ ξ · · · ≤ ξ M < θ n + .Then we can show that there is T such that given any initial value σσσ ( ) with θ ( σσσ ( )) ∈ ( θ , ξ ] and given any solution σσσ to the differential inclusion, we have θ ( σσσ ( t )) > ξ for sometime t < T . The proof is very similar to the argument in the previous case: Let λ and k besuch that λ · σ ≥ k for all θ ( σ ) ≥ θ ( δ x ) and λ · σ < k for all θ ( σ ) < θ ( δ x ) . Then for any t such that θ ( σσσ ( t )) ∈ ( θ n , ξ ] , we have ˙ σσσ ( t ) = σ − σσσ ( t ) for some σ ∈ (cid:52) X ∗ , and also λ · ˙ σσσ ( t ) = λ · ( σ − σσσ ( t )) > k − λ · σσσ ( t ) > . Note that k − λ · σσσ ( t ) is bounded away from zero uniformly in σσσ ( t ) with θ ( σσσ ( t )) ∈ ( θ n , ξ ] ,because property (i) in Lemma 3 ensures θ ( δ x m ) > ξ m for each m . 
This immediately impliesthe existence of T .Similarly, there is T such that given any initial value σσσ ( ) with θ ( σσσ ( )) ∈ ( ξ , ξ ] andgiven any solution σσσ to the differential inclusion, we have θ ( σσσ ( t )) > ξ for some time t < T .Again the proof is very similar to the argument in Case 1; the only difference is that here weuse the fact that the action x is never chosen when θ ( σσσ ( t )) ∈ ( ξ , ξ ] .55e iterate this process and deﬁne T , T , · · · , T M . Also, pick an arbitrarily small ε > T M + be such that given any initial value σσσ ( ) with θ ( σσσ ( )) ∈ ( ξ M , θ n + ) and givenany solution σσσ to the differential inclusion, we have θ ( σσσ ( t )) > θ n + − ε for some time t < T M + . Then let T = T + · · · + T M + . This ( ε , T ) obviously satisﬁes the property stated in thelemma.The next lemma relates the result in the previous lemma to the motion of θ ( ( t )) , where ( t ) is the actual frequency. It shows that if θ ( ( t )) visits the interval ( θ n , θ n + ) inﬁnitelyoften, then after a long time, θ ( ( t )) cannot be less than θ n + . That is, θ ( ( t )) cannot moveagainst the solution to the differential inclusion in the long run. Lemma 5.

Consider an interval ( θ n , θ n + ) which satisﬁes property (i) in Lemma 3. Pick asample path h such that the property stated in Theorem 3 is satisﬁed and such that θ ( ( t )) exceeds θ n inﬁnitely often, i.e., for any T > , there is t > T such that θ ( ( t )) > θ n . Then lim inf t → ∞ θ ( ( t )) ≥ θ n + .Proof. The proof is very similar to that of Proposition 2(ii). Pick ( θ n , θ n + ) and h as stated.Pick an arbitrarily small η >

0. Then pick ε > θ ( σ ) > θ n + − η for all σ suchthat (cid:107) σ − ˜ σ (cid:107) < ε for some ˜ σ with θ ( ˜ σ ) > θ n + − η .From Lemma 4, there is T > σσσ ( ) with θ ( σσσ ( )) > θ n and given any solution σσσ ∈ S ∞ σσσ ( ) to the differential inclusion, θ ( σσσ ( t )) > θ n + − η t ≥ T . Pick such T . Also, pick ˜ T large enough that (22) holds for any t > ˜ T and for any s ∈ [ , T ] .By the assumption, there is t > ˜ T such that θ ( ( t )) > θ n . Pick such t . Then from (22),(32), and the deﬁnition of ε , we have θ ( ( t + s )) > θ n + − η for all s ∈ [ T , T ] . Applyingthe same argument again, we obtain θ ( ( t + s )) > θ n + − η for all s ≥ T , which implies thatlim inf t → ∞ θ ( ( t )) ≥ θ n + − η . Since η can be arbitrarily small, we obtain the result.Now we will show that θ ( σ t ) converges almost surely. Suppose not, so that we havelim inf t → ∞ θ ( σ t ) < lim sup t → ∞ θ ( σ t ) with positive probability. Then there is a path h such thatthe property stated in Theorem 3 is satisﬁed and such that lim inf t → ∞ θ ( σ t ) < lim sup t → ∞ θ ( σ t ) .Pick such h .Let ( θ n , θ n + ) be an interval such that the intersection of the interval and [ lim inf t → ∞ θ ( σ t ) , lim sup t → ∞ θ ( σ t )] is non-empty. Assume for now that this interval satisﬁes property (i) stated56n lemma 3. By the deﬁnition of h , θ ( ( t )) must exceed θ n inﬁnitely often, so Lemma 5implies lim inf t → ∞ θ ( ( t )) ≥ θ n + . But this is a contradiction, because it implies that theintersection of ( θ n , θ n + ) and [ lim inf t → ∞ θ ( σ t ) , lim sup t → ∞ θ ( σ t )] is empty.Likewise, if the interval ( θ n , θ n + ) satisﬁes property (ii) in Lemma 3, there is a contradic-tion. Hence we must have lim inf t → ∞ θ ( σ t ) = lim sup t → ∞ θ ( σ t ) almost surely.Also, Lemma 5 implies that for each interval ( θ n , θ n + ) which satisﬁes property (i) inLemma 3, we have lim t → ∞ θ ( σ t ) ∈ ( θ n , θ n + ) with zero probability. 
Obviously the same istrue for each interval ( θ n , θ n + ) which satisﬁes property (ii). Hence lim t → ∞ θ ( σ t ) ∈ Θ ∗∗ almostsurely.So for the case in which the boundary points { , } are equilibrium models, we havelim t → ∞ θ ( σ t ) ∈ Θ ∗ . If θ = θ ( δ x ) > θ for any model θ ∈ [ θ , θ ) and for any x ∈ F ( δ θ ) . Then as in Lemma 5, we canshow that if a sample path h satisﬁes the property stated in Theorem 3 and θ ( ( t )) ∈ [ θ , θ ) inﬁnitely often, then lim inf t → ∞ θ ( ( t )) ≥ θ . This immediately implies that σ t converges to θ = θ = σ t convergesto this model with zero probability. Hence the result follows. A.12 Proof of Proposition 7

It is obvious that (c) implies (b). So in this proof, we will show that (a) implies (c), and (b)implies (a).
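The plan of the proof can be summarized as a cycle of implications. The display below only restates the plan above, with (a), (b), and (c) referring to the three statements of Proposition 7.

```latex
\[
(a) \;\Longrightarrow\; (c) \;\Longrightarrow\; (b) \;\Longrightarrow\; (a)
\]
```

Once the two non-trivial implications are established, every statement implies the other two, so the three statements are equivalent.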

A.12.1 Step 1: (a) implies (c)

Pick an attracting model $\theta^*$, and let $A = \{\sigma \in \Delta F(\delta_{\theta^*}) \mid \theta(\sigma) = \theta^*\}$. This set is non-empty, because $F$ is upper hemi-continuous in $\sigma$ and $\theta(\sigma)$ is continuous. We will show that this set $A$ is robustly attracting.

The following notation is useful. Let $X_1$ be the set of all mixed strategies $\sigma$ such that $\theta(\sigma) < \theta^*$. From Proposition 5, this set is convex. Similarly, the set $\Delta X \setminus X_1$ is convex. So there is a hyperplane $H_1$ which separates these two sets; i.e., there are a vector $\lambda_1 \in \mathbb{R}^{|X|}$ and $k_1$ such that $\lambda_1 \cdot \sigma < k_1$ for all $\sigma$ such that $\theta(\sigma) < \theta^*$, and $\lambda_1 \cdot \sigma \ge k_1$ for all $\sigma$ such that $\theta(\sigma) \ge \theta^*$. Similarly, letting $X_2$ be the set of all $\sigma$ such that $\theta(\sigma) > \theta^*$, there is a hyperplane $H_2$ which separates $X_2$ and $\Delta X \setminus X_2$, i.e., there are a vector $\lambda_2 \in \mathbb{R}^{|X|}$ and $k_2$ such that $\lambda_2 \cdot \sigma < k_2$ for all $\sigma$ such that $\theta(\sigma) > \theta^*$, and $\lambda_2 \cdot \sigma \ge k_2$ for all $\sigma$ such that $\theta(\sigma) \le \theta^*$. (These hyperplanes $H_1$ and $H_2$ may or may not coincide.) Let $X^*$ be the set of all $\sigma$ such that $\theta(\sigma) = \theta^*$.

We first consider the special case in which $F(\delta_{\theta^*}) = X$, i.e., the agent is indifferent over all actions in the model $\theta^*$. In this case, $A = X^*$, i.e., the set $A$ is the set of all mixed actions $\sigma$ with $\theta(\sigma) = \theta^*$. Later on, we will explain how to extend the proof technique to the case with $F(\delta_{\theta^*}) \subset X$.

Case 1: $F(\delta_{\theta^*}) = X$.

Pick a sufficiently small $\varepsilon > 0$, and let $X_\varepsilon$ be the set of all $\sigma$ such that $|\theta(\sigma) - \theta^*| < \varepsilon$. We show that this set $X_\varepsilon$ is (a subset of) the basin of attraction. That is, given any initial value $\boldsymbol{\sigma}(0) \in X_\varepsilon$, any solution $\boldsymbol{\sigma} \in S^{\infty}_{\boldsymbol{\sigma}(0)}$ to the differential inclusion will enter a neighborhood of the set $A = X^*$ in finite time and stay there forever.

So pick any initial value $\boldsymbol{\sigma}(0) \in X_\varepsilon$ and any solution $\boldsymbol{\sigma} \in S^{\infty}_{\boldsymbol{\sigma}(0)}$. We first show that this solution $\boldsymbol{\sigma}$ never leaves the set $X_\varepsilon$.

Lemma 6. $\boldsymbol{\sigma}(t) \in X_\varepsilon$ for all $t$, that is, $|\theta(\boldsymbol{\sigma}(t)) - \theta^*| < \varepsilon$ for all $t$.

Proof. Suppose that $\theta(\boldsymbol{\sigma}(t)) \in (\theta^* - \varepsilon, \theta^*)$ for some $t$. Then we have $\dot{\boldsymbol{\sigma}}(t) = \sigma - \boldsymbol{\sigma}(t)$ for some $\sigma \in \Delta F(\delta_{\theta(\boldsymbol{\sigma}(t))})$. By the definition of $\varepsilon$, we must have $\theta(\sigma) \ge \theta^*$; then from Proposition 5, at the next instant $t + \eta$, we have $\theta(\boldsymbol{\sigma}(t+\eta)) \ge \theta(\boldsymbol{\sigma}(t))$, i.e., $\theta(\boldsymbol{\sigma}(t))$ is weakly increasing in $t$ if $\theta(\boldsymbol{\sigma}(t)) \in (\theta^* - \varepsilon, \theta^*)$. Similarly, if $\theta(\boldsymbol{\sigma}(t)) \in (\theta^*, \theta^* + \varepsilon)$, then $\theta(\boldsymbol{\sigma}(t))$ is weakly decreasing in $t$. This implies the result we want.

Next, we show that $A$ is attracting. It suffices to prove the following lemma:

Lemma 7.

For any ε > , there is T > such that for any initial value σσσ ( ) ∈ X ε and anysolution σσσ ∈ S ∞ σσσ ( ) , we have d ( σσσ ( t ) , X ∗ ) < ε for all t > T .Proof.

Suppose that $\theta(\boldsymbol{\sigma}(t)) \in (\theta^* - \varepsilon, \theta^*)$ for some $t$. Then as shown in the proof of the previous lemma, we have $\dot{\boldsymbol{\sigma}}(t) = \sigma - \boldsymbol{\sigma}(t)$ for some $\sigma$ such that $\theta(\sigma) \ge \theta^*$. This in turn implies that
\[
  \lambda_1 \cdot \dot{\boldsymbol{\sigma}}(t) = \lambda_1 \cdot (\sigma - \boldsymbol{\sigma}(t)) \ge k_1 - \lambda_1 \cdot \boldsymbol{\sigma}(t) > 0 .
\]
Here the weak inequality follows from $\theta(\sigma) \ge \theta^*$, and the strict inequality follows from $\theta(\boldsymbol{\sigma}(t)) < \theta^*$. Note that $k_1 - \lambda_1 \cdot \boldsymbol{\sigma}(t)$ measures the current distance from $\boldsymbol{\sigma}(t)$ to the hyperplane $H_1$, and $\lambda_1 \cdot \dot{\boldsymbol{\sigma}}(t)$ measures how much $\boldsymbol{\sigma}(t)$ gets closer to the hyperplane $H_1$ at the next instant. So the inequality above implies that $\boldsymbol{\sigma}(t)$ gets closer to $H_1$ as time goes on, and the speed of convergence is bounded away from zero until $\boldsymbol{\sigma}(t)$ enters a neighborhood of $H_1$.

Similarly, if $\theta(\boldsymbol{\sigma}(t)) \in (\theta^*, \theta^* + \varepsilon)$ for some $t$, then $\boldsymbol{\sigma}(t)$ gets closer to the hyperplane $H_2$ as time goes on, and the speed of convergence is bounded away from zero until $\boldsymbol{\sigma}(t)$ enters a neighborhood of $H_2$. This implies the result we want, because the set $A = X^*$ is the space sandwiched by $H_1$ and $H_2$ (formally, $A = \Delta X \setminus (X_1 \cup X_2)$).

As a last step, we show that the set $A$ is robustly attracting:

Lemma 8.

The set $A$ is robustly attracting.

Proof.

Let $H_1'$ be the set of all $\sigma$ with $\theta(\sigma) = \theta^* - \varepsilon$, and $H_2'$ be the set of all $\sigma$ with $\theta(\sigma) = \theta^* + \varepsilon$. Take a small $\varepsilon^* > 0$ such that $\theta(\sigma) \in (\theta^* - \varepsilon, \theta^*)$ for all $\sigma$ with $d(\sigma, H_1') < \varepsilon^*$, and such that $\theta(\sigma) \in (\theta^*, \theta^* + \varepsilon)$ for all $\sigma$ with $d(\sigma, H_2') < \varepsilon^*$. Note that such $\varepsilon^*$ exists because $H_1'$, $H_2'$, $X^*$, $\{\sigma \mid \theta(\sigma) = \theta^* - \varepsilon\}$, and $\{\sigma \mid \theta(\sigma) = \theta^* + \varepsilon\}$ are all compact and disjoint.

Consider any solution to the $\varepsilon^*$-perturbed differential inclusion, and suppose that $\boldsymbol{\sigma}(t) \in H_1'$ for some $t$, i.e., suppose that $\theta(\boldsymbol{\sigma}(t)) = \theta^* - \varepsilon$. Then by the definition of $\varepsilon^*$, $\dot{\boldsymbol{\sigma}}(t) = \sigma - \boldsymbol{\sigma}(t)$ for some $\sigma$ such that $\theta(\sigma) \ge \theta^*$, which implies that $\theta(\boldsymbol{\sigma}(t))$ moves up at the next instant. Likewise, if $\theta(\boldsymbol{\sigma}(t)) = \theta^* + \varepsilon$ for some $t$, then $\theta(\boldsymbol{\sigma}(t))$ moves down at the next instant. Accordingly, if the initial value is in the set $\{\sigma \mid \theta^* - \varepsilon \le \theta(\sigma) \le \theta^* + \varepsilon\}$, any solution to the $\varepsilon^*$-perturbed differential inclusion cannot leave this set. This implies the result.

Case 2: $F(\delta_{\theta^*}) \subset X$.

Pick a small $\varepsilon > 0$ such that $F(\delta_{\tilde{\theta}}) \subseteq F(\delta_{\theta^*})$ for all $\tilde{\theta}$ such that $|\tilde{\theta} - \theta^*| < \varepsilon$. (Take $\varepsilon$ small, if necessary.)

Let $X_\varepsilon$ be as in the previous case. We show that this set $X_\varepsilon$ is (a subset of) the basin of attraction. That is, given any initial value $\boldsymbol{\sigma}(0) \in X_\varepsilon$, any solution $\boldsymbol{\sigma} \in S^\infty_{\boldsymbol{\sigma}(0)}$ to the differential inclusion will enter a neighborhood of the set $A$ in finite time and stay there forever. Note that now the set $A$ is a strict subset of $X^*$.

Pick any initial value $\boldsymbol{\sigma}(0) \in X_\varepsilon$ and any solution $\boldsymbol{\sigma} \in S^\infty_{\boldsymbol{\sigma}(0)}$. Then Lemma 6 still holds, that is, $\boldsymbol{\sigma}(t)$ never leaves the set $X_\varepsilon$. Also Lemma 7 still holds, that is, $\boldsymbol{\sigma}(t)$ moves toward the set $X^*$ as time goes on.

Also, by the definition of $\varepsilon$, we have $F(\Delta\Theta(\sigma)) \subseteq F(\delta_{\theta^*})$ for any $\sigma \in X_\varepsilon$. This implies that at every time $t$, we have $\dot{\boldsymbol{\sigma}}(t)[x] = -\boldsymbol{\sigma}(t)[x]$ for each $x \notin F(\delta_{\theta^*})$.
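The linear equation just stated can be solved in closed form, which makes the rate of decay explicit. The following is a sketch, assuming the equation holds for (almost) every $t \ge 0$, which is the case here because the solution never leaves $X_\varepsilon$ by Lemma 6; the tolerance $\eta$ is a symbol introduced only for this illustration.

```latex
% For each action x outside F(\delta_{\theta^*}), the coordinate
% \boldsymbol{\sigma}(t)[x] solves a linear ODE with decay rate 1:
\[
  \dot{\boldsymbol{\sigma}}(t)[x] = -\boldsymbol{\sigma}(t)[x]
  \quad\Longrightarrow\quad
  \boldsymbol{\sigma}(t)[x] = e^{-t}\,\boldsymbol{\sigma}(0)[x]
  \;\longrightarrow\; 0
  \quad (t \to \infty).
\]
% Since \boldsymbol{\sigma}(0)[x] \le 1, the bound is uniform in the initial value:
\[
  \boldsymbol{\sigma}(t)[x] \le e^{-t} < \eta
  \quad\text{for all } t > \ln(1/\eta), \qquad \eta \in (0,1).
\]
```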
This implies that $\boldsymbol{\sigma}(t)$ assigns probability zero to any action $x \notin F(\delta_{\theta^*})$ in the limit as $t \to \infty$, and in particular, for any $\varepsilon > 0$, there is $T$ such that $\boldsymbol{\sigma}(t)[x] < \varepsilon$ for all $x \notin F(\delta_{\theta^*})$ and $t > T$. This and Lemma 7 imply that the set $A$ is an attractor.

Also, we can show that the set $A$ is robustly attracting; the proof is very similar to that of Lemma 8, and hence omitted.

A.12.2 Step 2: (b) implies (a)

Pick an arbitrary $\theta^*$, and let $A = \{\sigma \in \Delta F(\delta_{\theta^*}) \mid \theta(\sigma) = \theta^*\}$. Suppose that $A$ is an attractor. We will show that the model $\theta^*$ is attracting.

Let $U_A$ be the basin of the set $A$, and let $\boldsymbol{\sigma}(0) \in U_A$ be such that $\theta(\boldsymbol{\sigma}(0)) < \theta^*$. Then we have the following lemma:

Lemma 9.

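For any $\theta \in (\theta(\boldsymbol{\sigma}(0)), \theta^*)$ and for any $\sigma \in F(\delta_\theta)$, we have $\theta(\sigma) > \theta$.

Proof. Suppose not, so that there is $\theta \in (\theta(\boldsymbol{\sigma}(0)), \theta^*)$ and $\sigma \in F(\delta_\theta)$ such that $\theta(\sigma) \le \theta$. Consider a solution to the differential inclusion $\boldsymbol{\sigma} \in S^\infty_{\boldsymbol{\sigma}(0)}$ such that for any time $t$ with $\theta(\boldsymbol{\sigma}(t)) = \theta$, we have $\dot{\boldsymbol{\sigma}}(t) = \sigma - \boldsymbol{\sigma}(t)$. Then $\theta(\boldsymbol{\sigma}(t)) \le \theta$ for all $t$; by the definition of $\sigma$, $\theta(\boldsymbol{\sigma}(t))$ must go down whenever it hits $\theta(\boldsymbol{\sigma}(t)) = \theta$. This contradicts the fact that $\boldsymbol{\sigma}(0)$ is in the basin of attraction.

Since $F(\delta_\theta)$ is upper hemi-continuous in $\theta$, there is $\varepsilon > 0$ such that $F(\delta_\theta) = F(\delta_{\tilde{\theta}})$ for all $\theta, \tilde{\theta} \in (\theta^* - \varepsilon, \theta^*)$. Then the lemma above implies that for any $\theta \in (\theta^* - \varepsilon, \theta^*)$ and for any $\sigma \in F(\delta_\theta)$, we have $\theta(\sigma) \ge \theta^*$.

Similarly, we can show that there is $\tilde{\varepsilon} > 0$ such that for any $\theta \in (\theta^*, \theta^* + \tilde{\varepsilon})$ and for any $\sigma \in F(\delta_\theta)$, we have $\theta(\sigma) \le \theta^*$. Hence $\theta^*$ is attracting.

A.13 Proof of Proposition 8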

Only if:

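Suppose that a model $\theta^* \in (0, 1)$ is repelling. Then upper hemi-continuity of $F$ implies that there are pure actions $x$ and $\tilde{x}$ such that $\theta(\delta_x) < \theta^* < \theta(\delta_{\tilde{x}})$ and $x, \tilde{x} \in F(\delta_{\theta^*})$. Then from Proposition 5, there is a mixture $\sigma^*$ of these actions $x$ and $\tilde{x}$ such that $\theta(\sigma^*) = \theta^*$. Obviously this $\sigma^*$ is a mixed equilibrium with $\theta(\sigma^*) = \theta^*$. So it suffices to show that all mixed equilibria with $\theta(\sigma^*) = \theta^*$ are repelling.

Choose $\varepsilon > 0$ as in the definition of a repelling model. By Proposition 5, there is a hyperplane which separates the set of all $\sigma$ with $\theta(\sigma) \ge \theta^* + \varepsilon$ from others. That is, there is $\lambda_1 \in \mathbb{R}^{|X|}$ and $k_1 \in \mathbb{R}$ such that $\lambda_1 \cdot \sigma \ge k_1$ if and only if $\theta(\sigma) \ge \theta^* + \varepsilon$. Likewise, there is $\lambda_2$ and $k_2$ such that $\lambda_2 \cdot \sigma \ge k_2$ if and only if $\theta(\sigma) \le \theta^* - \varepsilon$.

Let $U$ be the set of all $\sigma$ such that $\theta(\sigma) \in (\theta^* - \varepsilon, \theta^* + \varepsilon)$. Also, choose $T$ sufficiently large so that
\[
  (k_1 - \lambda_1 \cdot \sigma)\, T > k_1 - \lambda_1 \cdot \tilde{\sigma} \tag{33}
\]
for all $\sigma$ with $\theta(\sigma) \in (\theta^*, \theta^* + \varepsilon)$ and for all $\tilde{\sigma}$ with $\theta(\tilde{\sigma}) \in (\theta^*, \theta^* + \varepsilon)$, and that
\[
  (k_2 - \lambda_2 \cdot \sigma)\, T > k_2 - \lambda_2 \cdot \tilde{\sigma} \tag{34}
\]
for all $\sigma$ with $\theta(\sigma) \in (\theta^* - \varepsilon, \theta^*)$ and for all $\tilde{\sigma}$ with $\theta(\tilde{\sigma}) \in (\theta^* - \varepsilon, \theta^*)$.

We will show that these $U$ and $T$ satisfy the property stated in the definition of repelling equilibria. This completes the proof, because any mixed equilibrium $\sigma^*$ with $\theta(\sigma^*) = \theta^*$ is in the interior of $U$.

The following result is useful:

Claim 9.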

For any initial point $\sigma \in U$ with $\theta(\sigma) \ne \theta^*$ and for any solution $\boldsymbol{\sigma} \in S^\infty_\sigma$ to the differential inclusion, there is $t < T$ such that $\boldsymbol{\sigma}(t) \notin U$.

Proof. First, consider the case in which $\theta(\sigma) \in (\theta^*, \theta^* + \varepsilon)$. Pick an arbitrary path $\boldsymbol{\sigma} \in S^\infty_\sigma$. Suppose that $\theta(\boldsymbol{\sigma}(t)) \in (\theta^*, \theta^* + \varepsilon)$ in some period $t$. Then since $\theta^*$ is repelling, $\dot{\boldsymbol{\sigma}}(t) = \sigma' - \boldsymbol{\sigma}(t)$ for some $\sigma'$ such that $\theta(\sigma') \ge \theta^* + \varepsilon$. Hence
\[
  \lambda_1 \cdot \dot{\boldsymbol{\sigma}}(t) = \lambda_1 \cdot (\sigma' - \boldsymbol{\sigma}(t)) \ge k_1 - \lambda_1 \cdot \boldsymbol{\sigma}(t) > 0 .
\]
Here the weak inequality follows from $\theta(\sigma') \ge \theta^* + \varepsilon$, and the strict inequality follows from $\theta(\boldsymbol{\sigma}(t)) < \theta^* + \varepsilon$. This implies that $\lambda_1 \cdot \boldsymbol{\sigma}(t)$ increases whenever $\theta(\boldsymbol{\sigma}(t)) \in (\theta^*, \theta^* + \varepsilon)$ in the current period $t$. Hence there is $t < T$ such that $\theta(\boldsymbol{\sigma}(t)) \ge \theta^* + \varepsilon$, implying $\boldsymbol{\sigma}(t) \notin U$. A similar argument applies to the case in which $\theta(\sigma) \in (\theta^* - \varepsilon, \theta^*)$.

Now we will show that $U$ and $T$ satisfy the property stated in the definition of repelling equilibria. Pick an arbitrary $\sigma \in U$. There are two cases to be considered.

Case 1: $\theta(\sigma) \ne \theta^*$. Pick any action $x$. For $\beta$ close to one, a perturbed mixture $\beta\sigma + (1 - \beta)\delta_x$ is still in the set $U$, and $\theta(\beta\sigma + (1 - \beta)\delta_x) \ne \theta^*$. Hence from the claim above, starting from this perturbed mixture $\beta\sigma + (1 - \beta)\delta_x$, any solution to the differential inclusion must leave the set $U$ within time $T$. So this $\sigma$ satisfies the property stated in the definition of repelling equilibria.

Case 2: $\theta(\sigma) = \theta^*$. Pick an arbitrary pure action $x \in F(\delta_{\theta^*})$. Since $\theta^*$ is repelling, $\theta(\delta_x) \ne \theta^*$. So from Proposition 5(iii), for any $\beta$ sufficiently close to one, $\theta(\beta\sigma + (1 - \beta)\delta_x) \in (\theta^* - \varepsilon, \theta^*) \cup (\theta^*, \theta^* + \varepsilon)$. Hence, from the above claim, starting from this perturbed mixture $\beta\sigma + (1 - \beta)\delta_x$, any solution to the differential inclusion must leave the set $U$ within time $T$. So this $\sigma$ satisfies the property stated in the definition of repelling equilibria.

If:

Let $\theta^* \in (0, 1)$ be such that $\theta(\delta_x) \ne \theta^*$ for each pure action $x \in F(\delta_{\theta^*})$, there is at least one mixed equilibrium $\sigma^*$ with $\theta(\sigma^*) = \theta^*$, and all such mixed equilibria are repelling. We will show that the model $\theta^*$ is repelling.

Pick an arbitrary repelling equilibrium $\sigma^*$ with $\theta(\sigma^*) = \theta^*$, and let $T$ and $U$ be as in the definition of repelling equilibria. Then each point $\sigma \in U$ satisfies property (i) or (ii) in the definition of repelling equilibria. In particular, $\sigma = \sigma^*$ satisfies property (ii), i.e., starting from a perturbed action frequency $\beta\sigma^* + (1 - \beta)\delta_x$, any solution to the differential inclusion must leave the neighborhood $U$ of $\sigma^*$ within time $T$. This is so because $\sigma^*$ is an equilibrium and never satisfies property (i).

As the following lemma shows, this property implies that $\theta^*$ is indeed repelling.

Lemma 10. $\theta^*$ is repelling.

Proof. We prove by contradiction, so suppose that $\theta^*$ is not repelling. Then from the upper hemi-continuity of $F$, there is $\varepsilon > 0$ and $x^* \in F(\delta_{\theta^*})$ such that
(a) $\theta(\delta_{x^*}) > \theta^*$ and $x^* \in F(\delta_\theta)$ for all $\theta \in (\theta^* - \varepsilon, \theta^*)$, or
(b) $\theta(\delta_{x^*}) < \theta^*$ and $x^* \in F(\delta_\theta)$ for all $\theta \in (\theta^*, \theta^* + \varepsilon)$.
Pick such $\varepsilon$ and $x^*$. In what follows, we focus on the case in which this action $x^*$ satisfies property (a). (The proof for the other case is symmetric, and hence omitted.)

Since $\sigma^*$ is a mixed equilibrium with $\theta(\sigma^*) = \theta^*$ and since there is no pure action $x$ with $\theta(\delta_x) = \theta^*$, there must be two actions $x$ and $\tilde{x}$ such that $x, \tilde{x} \in F(\delta_{\theta^*})$ and $\theta(\delta_x) < \theta^* < \theta(\delta_{\tilde{x}})$. Pick such $x$ and $\tilde{x}$. Pick $\beta \in (0, 1)$ close to one so that $\theta(\beta\sigma^* + (1 - \beta)\delta_x) \in (\theta^* - \varepsilon, \theta^*)$. Then consider the following path $\boldsymbol{\sigma}$ which starts from $\beta\sigma^* + (1 - \beta)\delta_x$:
\[
  \dot{\boldsymbol{\sigma}}(t) =
  \begin{cases}
    \delta_{x^*} - \boldsymbol{\sigma}(t) & \text{if } \theta(\boldsymbol{\sigma}(t)) < \theta^*, \\
    \sigma^* - \boldsymbol{\sigma}(t) & \text{if } \theta(\boldsymbol{\sigma}(t)) = \theta^*.
  \end{cases}
\]
In words, on this path, the share of $x^*$ increases until $\theta(\boldsymbol{\sigma}(t))$ hits $\theta^*$, and after that $\boldsymbol{\sigma}(t)$ moves toward the equilibrium $\sigma^*$.
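Each branch of this path is a linear equation with a constant target, so the path can be written out explicitly. The sketch below assumes that $\tau$ denotes the first time at which $\theta(\boldsymbol{\sigma}(t)) = \theta^*$; this symbol is introduced here only for illustration and does not appear in the original argument.

```latex
\[
  \boldsymbol{\sigma}(t) =
  \begin{cases}
    \delta_{x^*} + e^{-t}\bigl(\boldsymbol{\sigma}(0) - \delta_{x^*}\bigr)
      & \text{if } 0 \le t \le \tau, \\[4pt]
    \sigma^* + e^{-(t-\tau)}\bigl(\boldsymbol{\sigma}(\tau) - \sigma^*\bigr)
      & \text{if } t > \tau,
  \end{cases}
  \qquad
  \boldsymbol{\sigma}(0) = \beta\sigma^* + (1-\beta)\delta_x .
\]
% Check by differentiation:
%   first branch:  \dot{\boldsymbol{\sigma}}(t) = -e^{-t}(\boldsymbol{\sigma}(0) - \delta_{x^*})
%                                               = \delta_{x^*} - \boldsymbol{\sigma}(t);
%   second branch: \dot{\boldsymbol{\sigma}}(t) = -e^{-(t-\tau)}(\boldsymbol{\sigma}(\tau) - \sigma^*)
%                                               = \sigma^* - \boldsymbol{\sigma}(t).
```

On the second branch, $\boldsymbol{\sigma}(t)$ is a convex combination of $\boldsymbol{\sigma}(\tau)$ and $\sigma^*$, both of which satisfy $\theta(\cdot) = \theta^*$. Since the set of such mixed actions is convex by Proposition 5, $\theta(\boldsymbol{\sigma}(t))$ stays at $\theta^*$ for all $t > \tau$.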
Clearly this path solves the differential inclusion with the initial value $\beta\sigma^* + (1 - \beta)\delta_x$, and in particular, if $\beta$ is sufficiently close to one, this path never leaves the neighborhood $U$ of $\sigma^*$. This implies that $\sigma = \sigma^*$ does not satisfy property (ii) in the definition of repelling equilibria, which is a contradiction.

A.14 Proof of Proposition 9

We say a correspondence $B : [0, 1] \to \mathbb{R}$ has the staircase property if there exist $K < \infty$, $0 = a_0 < a_1 < \dots < a_K = 1$, and $(A_i)_{i=0}^K$ with each $A_i \subseteq \mathbb{R}$ closed, bounded and convex, such that for each $i \in \{0, \dots, K\}$, (i) $B(a_i) = A_i$ and for each $\theta \in (a_i, a_{i+1})$, $B(\theta) = \bar{x}_i \equiv \max\{x : x \in A_i\}$, and (ii) $\bar{x}_i = \underline{x}_{i+1} \equiv \min\{x : x \in A_{i+1}\}$.

Claim 10.

The correspondence $\theta \mapsto B(\theta) \equiv \theta(\Delta F(\delta_\theta))$ has the staircase property.

Proof. We first show that if the mapping $\theta \mapsto G(\theta) \equiv F(\delta_\theta)$ is uhc and satisfies $\max F(\delta_\theta) \le \min F(\delta_{\theta'})$ for all $\theta' > \theta$, then there exist $K < \infty$, $0 = a_0 < a_1 < \dots < a_K = 1$, and $(A_i)_{i=0}^K$ with each $A_i \subseteq X$ finite, such that for each $i \in \{0, \dots, K\}$, (i) $G(a_i) = A_i$ and for each $\theta \in (a_i, a_{i+1})$, $G(\theta) = \bar{x}_i \equiv \max\{x : x \in A_i\}$, and (ii) $\bar{x}_i = \underline{x}_{i+1} \equiv \min\{x : x \in A_{i+1}\}$. Second, we will show that $B$ has the staircase property with $K$, $0 = a_0 < a_1 < \dots < a_K = 1$ and $(B_i)_{i=0}^K$ where $B_i \equiv \theta(\Delta A_i) = [\theta(\delta_{\underline{x}_i}), \theta(\delta_{\underline{x}_{i+1}})]$.

For the first part, note that if $G$ is constant over a non-trivial interval, it must be single valued. If not, there exists a $\theta' > \theta$ such that $\max F(\delta_\theta) > \min F(\delta_{\theta'})$. Moreover, by the order condition and the fact that $X$ is finite, it follows that the set of points in $[0, 1]$ wherein $G$ is multi-valued is finite; we denote it as $0 = a_0 < a_1 < \dots < a_K = 1$. For each $i \in \{0, \dots, K\}$, let $A_i \equiv G(a_i)$. As $\max G(\theta) \le \min G(\theta')$ for all $\theta' > \theta$, it follows that for each $\theta \in (a_i, a_{i+1})$, $G(\theta) \ge \bar{x}_i \equiv \max\{x : x \in A_i\}$ and $\bar{x}_i \le \underline{x}_{i+1} \equiv \min\{x : x \in A_{i+1}\}$. By the fact that $G$ is uhc and $X$ is finite, it must hold that $G(\theta) = \bar{x}_i$; otherwise there exists a sequence $(\theta_n, x)_n$ converging to $(a_i, x)$ such that $x = G(\theta_n)$ but $x > \max\{x' : x' \in G(a_i)\}$. A similar argument shows that $\bar{x}_i = \underline{x}_{i+1}$.

We now show the correspondence $\theta \mapsto B(\theta) \equiv \theta(\Delta F(\delta_\theta))$ has the staircase property. Take any $i \in \{0, \dots, K\}$; for any $b \in (a_i, a_{i+1})$, $B(b) = \theta_i \equiv \theta(\delta_{\bar{x}_i})$ and $B(a_i) = \theta(\Delta F(\delta_{a_i})) = \theta(\Delta A_i)$; this establishes condition (i) of the definition. We will show now that $\theta(\Delta A_i) = [\theta(\delta_{\underline{x}_i}), \theta(\delta_{\underline{x}_{i+1}})]$, which will finish the proof. By finiteness of $X$, $A_i = \{x(1), \dots, x(L)\}$ such that $x(1) < x(2) < \dots < x(L)$ (where the indexes can depend on $i$), and as $x \mapsto \theta(\delta_x)$ is increasing, $\theta(\delta_{x(1)}) < \dots < \theta(\delta_{x(L)})$.

We first show that $\theta(\Delta A_i) \subseteq [\theta(\delta_{\underline{x}_i}), \theta(\delta_{\underline{x}_{i+1}})]$. For this, we first show that $\theta(\delta_{\underline{x}_i}) = \theta(\delta_{x(1)}) = \min\{\theta : \theta \in \theta(\Delta A_i)\}$. We do this iteratively. If $L = 2$, then $\Delta A_i = \{\sigma_\lambda(x(1), x(2)) \equiv \lambda_1 \delta_{x(1)} + \lambda_2 \delta_{x(2)} : \lambda \in \Delta\}$ (throughout, $\sigma_\lambda(x(i), \dots, x(m))$ denotes a mixed action over actions $x(i), \dots, x(m)$ with weights $\lambda$). By Proposition 5, $\lambda \mapsto \theta(\sigma_\lambda(x(1), x(2)))$ is non-decreasing, therefore $\theta(\delta_{x(1)}) = \min\{\theta : \theta \in \theta(\Delta A_i)\} \le \theta(\delta_{x(2)}) = \max\{\theta : \theta \in \theta(\Delta A_i)\}$.

If $L = 3$, we want to show that $\theta(\delta_{x(1)}) \le \theta(\sigma_\lambda(x(1), x(2), x(3)))$ for any $\lambda \in \Delta$. If $\lambda_1 = 0$, then it suffices to show that $\theta(\delta_{x(1)}) \le \theta(\sigma_\lambda(x(2), x(3)))$ for any $\lambda \in \Delta$. By our calculations for the case $L = 2$, it follows that $\theta(\delta_{x(2)}) \le \theta(\sigma_\lambda(x(2), x(3)))$, and since $\theta(\delta_{x(1)}) \le \theta(\delta_{x(2)})$ the desired result follows. If $\lambda_1 > 0$, then
\[
  \sigma_\lambda(x(1), x(2), x(3))
  = \lambda_1 \delta_{x(1)} + (1 - \lambda_1)\left( \frac{\lambda_2}{1 - \lambda_1}\,\delta_{x(2)} + \frac{\lambda_3}{1 - \lambda_1}\,\delta_{x(3)} \right)
  = \lambda_1 \delta_{x(1)} + (1 - \lambda_1)\,\sigma_{\lambda'}(x(2), x(3))
\]
for a $\lambda'$ that is a function of $\lambda$. So, by Proposition 5, it suffices to show that $\theta(\delta_{x(1)}) \le \theta(\sigma_\lambda(x(2), x(3)))$ for any $\lambda \in \Delta$, which we already showed. Iterating in this fashion, the result can be shown for general $L$. The proof that $\theta(\delta_{\bar{x}_i}) = \theta(\delta_{x(L)}) = \max\{\theta : \theta \in \theta(\Delta A_i)\}$ is analogous and thus omitted. Finally, since $\bar{x}_i = \underline{x}_{i+1}$, it follows that $\theta(\delta_{\bar{x}_i}) = \theta(\delta_{\underline{x}_{i+1}})$. Thus, we have shown that $\theta(\Delta A_i) \subseteq [\theta(\delta_{\underline{x}_i}), \theta(\delta_{\underline{x}_{i+1}})]$ as desired.

We now show that $\theta(\Delta A_i) \supseteq [\theta(\delta_{\underline{x}_i}), \theta(\delta_{\underline{x}_{i+1}})]$. Suppose not, that is, there exists a $\theta \in [\theta(\delta_{\underline{x}_i}), \theta(\delta_{\underline{x}_{i+1}})] \setminus \theta(\Delta A_i)$. There exists an $l \in \{1, \dots, L - 1\}$ such that $\theta(\delta_{x(l)}) \le \theta \le \theta(\delta_{x(l+1)})$. By Lemma 1, $\lambda \mapsto \theta(\sigma_\lambda(x(l), x(l+1)))$ is continuous. By Bolzano's theorem, there exists a $\lambda^*$ such that $\theta = \theta(\sigma_{\lambda^*}(x(l), x(l+1)))$. But $\sigma_{\lambda^*}(x(l), x(l+1)) \in \Delta A_i$, and we thus arrive at a contradiction.

Let $\Theta^*$ be the set of equilibrium models. It is easy to check that $\Theta^*$ coincides with the set of fixed points of $B$. Also, this set is nonempty: we can take a nondecreasing selection from the correspondence $B$ and, by Tarski's fixed point theorem, it must have at least one fixed point; since it is a selection of $B$, then $B$ must have at least one fixed point. Moreover, as $B$ has the staircase property, the set of fixed points $\Theta^*$ is finite and its elements can only be of the following forms:
1. An end-point of a vertical segment of $B$.
2. An interior point of a vertical segment of $B$.
3. A point in the interior of a horizontal segment of $B$.
4. $\theta = 0$ and there is $a > 0$ such that $B(\theta') = \theta$ for all $\theta' \in [0, a]$, or $\theta = 1$ and there is $a > 0$ such that $B(\theta') = \theta$ for all $\theta' \in [1 - a, 1]$.

By Proposition 6, $(\theta(\sigma_t))_t$ converges almost surely to one element in $\Theta^*$.
We now show that it has to converge to an element characterized by cases 3 or 4.

Case 1 is ruled out because at an end-point of a vertical segment $\theta'$, $F(\delta_{\theta'})$ contains more than one action, and so the corresponding pure action equilibrium is not strict, a situation that was ruled out by assumption.

An equilibrium model characterized by case 2 is repelling by Definition 7. By Proposition 8, the mixed action associated with this equilibrium model is repelling. Thus, by Proposition 3, the action frequency converges to such an equilibrium model with probability zero. Therefore, convergence occurs to an equilibrium model characterized in case 3 or 4.

Equilibrium models characterized in cases 3 and 4 are attracting by Definition 2 and are supported by a pure action. To conclude, let $\theta^*$ be the point to which the sequence $(\theta(\sigma_t))_t$ converges, and denote the associated pure strategy by $x^*$. The fact that $\theta(\cdot)$ is increasing implies that $(\sigma_t)_t$ converges to $x^*$. Finally, we observe that using Theorem 1 it can be shown that $(\sigma_t)_t$ converging to $x^*$ implies that the beliefs $(\mu_t)_t$ converge to $\delta_{\theta^*}$ where $\theta^* = \theta(\delta_{x^*})$. As $F(\delta_{\theta^*})$ is a pure action, upper hemi-continuity of $F$ implies that $(F(\mu_t))_t$ also converges to $x^*$.

A.15 Proof of Proposition 10

$\Delta \cup_{\mu \in \Delta\Theta(\sigma)} F_\beta(\mu) \subseteq \Delta \cup_{\mu \in \Delta\Theta(\sigma)} F(\mu)$: Let $\sigma \in \Delta \cup_{\mu \in \Delta\Theta(\sigma)} F_\beta(\mu)$. Fix any $x$ such that $\sigma(x) > 0$. Since $\sigma \in \Delta \cup_{\mu \in \Delta\Theta(\sigma)} F_\beta(\mu)$, there exists $\mu_x \in \Delta\Theta(\sigma)$ such that $x \in F_\beta(\mu_x)$. It suffices to show that $x \in F(\mu_x)$. Since $x \in F_\beta(\mu_x)$, for any $x' \in X$,
\begin{align*}
  \int \bigl(\pi(x, y) + \beta V(B(x, y, \mu_x))\bigr)\, \bar{Q}_{\mu_x}(dy \mid x)
  &= \int \pi(x, y)\, \bar{Q}_{\mu_x}(dy \mid x) + \beta V(\mu_x) \\
  &\ge \int \bigl(\pi(x', y) + \beta V(B(x', y, \mu_x))\bigr)\, \bar{Q}_{\mu_x}(dy \mid x') \\
  &\ge \int \pi(x', y)\, \bar{Q}_{\mu_x}(dy \mid x') + \beta V(\mu_x),
\end{align*}
where the first line follows from weak identification (which implies $B(x, y, \mu_x) = \mu_x$ for all $y$ in the support of $\bar{Q}_{\mu_x}(\cdot \mid x)$), the second line follows from $x \in F_\beta(\mu_x)$, and the third line follows from the convexity of the value function and the martingale property of Bayesian updating (which imply, using Jensen's inequality, $\int V(B(x', y, \mu_x))\, \bar{Q}_{\mu_x}(dy \mid x') \ge V\bigl(\int B(x', y, \mu_x)\, \bar{Q}_{\mu_x}(dy \mid x')\bigr) = V(\mu_x)$). Therefore, $x$ is myopically the best action, i.e., $x \in F(\mu_x)$.

$\Delta \cup_{\mu \in \Delta\Theta(\sigma)} F(\mu) = \cup_{\mu \in \Delta\Theta(\sigma)} \Delta F(\mu)$: The direction $\supseteq$ holds trivially, so we only establish $\subseteq$. Let $\sigma \in \Delta \cup_{\mu \in \Delta\Theta(\sigma)} F(\mu)$. Fix any $x, x'$ such that $\sigma(x) > 0$ and $\sigma(x') > 0$. Since $\sigma \in \Delta \cup_{\mu \in \Delta\Theta(\sigma)} F(\mu)$, there exist $\mu_x, \mu_{x'} \in \Delta\Theta(\sigma)$ such that $x \in F(\mu_x)$ and $x' \in F(\mu_{x'})$. By weak identification and the fact that $\mu_x$ and $\mu_{x'}$ both belong to $\Delta\Theta(\sigma)$, $\bar{Q}_{\mu_x}(\cdot \mid \tilde{x}) = \bar{Q}_{\mu_{x'}}(\cdot \mid \tilde{x})$ for all $\tilde{x}$ in the support of $\sigma$. Therefore, for any $x'' \in X$,
\[
  \int \pi(x', y)\, \bar{Q}_{\mu_x}(dy \mid x') = \int \pi(x, y)\, \bar{Q}_{\mu_x}(dy \mid x) \ge \int \pi(x'', y)\, \bar{Q}_{\mu_x}(dy \mid x''),
\]
and so $x' \in F(\mu_x)$. Since $x'$ is an arbitrary element in the support of $\sigma$, we have shown that there is a common belief $\mu_x$ under which any action in the support of $\sigma$