Berk-Nash Equilibrium: A Framework for Modeling Agents with Misspecified Models
Ignacio Esponda (WUSTL)    Demian Pouzo (UC Berkeley)

May 10, 2016
Abstract
We develop an equilibrium framework that relaxes the standard assumption that people have a correctly specified view of their environment. Each player is characterized by a (possibly misspecified) subjective model, which describes the set of feasible beliefs over payoff-relevant consequences as a function of actions. We introduce the notion of a Berk-Nash equilibrium: Each player follows a strategy that is optimal given her belief, and her belief is restricted to be the best fit among the set of beliefs she considers possible. The notion of best fit is formalized in terms of minimizing the Kullback-Leibler divergence, which is endogenous and depends on the equilibrium strategy profile. Standard solution concepts such as Nash equilibrium and self-confirming equilibrium constitute special cases where players have correctly specified models. We provide a learning foundation for Berk-Nash equilibrium by extending and combining results from the statistics literature on misspecified learning and the economics literature on learning in games.

∗ We thank Vladimir Asriyan, Pierpaolo Battigalli, Larry Blume, Aaron Bodoh-Creed, Sylvain Chassang, Emilio Espino, Erik Eyster, Drew Fudenberg, Yuriy Gorodnichenko, Stephan Lauermann, Natalia Lazzati, Kristóf Madarász, Matthew Rabin, Ariel Rubinstein, Joel Sobel, Jörg Stoye, several seminar participants, and especially a co-editor and four anonymous referees for very helpful comments. Esponda: Olin Business School, Washington University in St. Louis, 1 Brookings Drive, Campus Box 1133, St. Louis, MO 63130, [email protected]; Pouzo: Department of Economics, UC Berkeley, 530-1 Evans Hall.

Introduction
Most economists recognize that the simplifying assumptions underlying their models are often wrong. But, despite recognizing that models are likely to be misspecified, the standard approach (with exceptions noted below) assumes that economic agents have a correctly specified view of their environment. We present an equilibrium framework that relaxes this standard assumption and allows the modeler to postulate that economic agents have a subjective and possibly incorrect view of their world.

An objective game represents the true environment faced by the agent (or players, in the case of several interacting agents). Payoff-relevant states and privately observed signals are drawn from an objective probability distribution. Each player observes her own private signal and then players simultaneously choose actions. The action profile and the realized state determine consequences, and consequences determine payoffs. In addition, each player has a subjective model representing her own view of the environment. Formally, a subjective model is a set of probability distributions over own consequences as a function of a player's own action and information. Crucially, we allow the subjective model of one or more players to be misspecified, which roughly means that the set of subjective distributions does not include the true, objective distribution. For example, a consumer might perceive a nonlinear price schedule to be linear and, therefore, respond to average, not marginal, prices. Or traders might not realize that the value of trade is partly determined by the terms of trade.
A Berk-Nash equilibrium is a strategy profile such that, for each player, there exists a belief with support in her subjective model satisfying two conditions. First, the strategy is optimal given the belief. Second, the belief puts probability one on the set of subjective distributions over consequences that are "closest" to the true distribution, where the true distribution is determined by the objective game and the actual strategy profile. The notion of "closest" is given by a weighted version of the Kullback-Leibler divergence, also known as relative entropy.

Berk-Nash equilibrium includes standard and boundedly rational solution concepts in a common framework, such as Nash, self-confirming (e.g., Battigalli (1987), Fudenberg and Levine (1993a), Dekel et al. (2004)), fully cursed (Eyster and Rabin, 2005), and analogy-based expectation equilibrium (Jehiel (2005), Jehiel and Koessler (2008)). For example, suppose that the game is correctly specified (i.e., the support of each player's prior contains the true distribution) and that the game is strongly identified (i.e., there is a unique distribution, whether or not correct, that matches the observed data). Then Berk-Nash equilibrium is equivalent to Nash equilibrium. If the strong identification assumption is dropped, then Berk-Nash is a self-confirming equilibrium. In addition to unifying previous work, our framework provides a systematic approach for extending previous cases and exploring new types of misspecifications.

We provide a foundation for Berk-Nash equilibrium (and the use of Kullback-Leibler divergence as a measure of "distance") by studying a dynamic setup with a fixed number of players playing the objective game repeatedly. Each player believes that the environment is stationary and starts with a prior over her subjective model. In each period, players use the observed consequences to update their beliefs according to Bayes' rule.
The main objective is to characterize limiting behavior when players behave optimally but learn with a possibly misspecified subjective model. The main result is that, if players' behavior converges, then it converges to a Berk-Nash equilibrium. A converse result, showing that we can converge to any Berk-Nash equilibrium of the game for some initial (non-doctrinaire) prior, does not hold. But we obtain a positive convergence result by relaxing the assumption that players exactly optimize. For any given Berk-Nash equilibrium, we show that convergence to that equilibrium occurs if agents are myopic and make asymptotically optimal choices (i.e., optimization mistakes vanish with time).

There is a longstanding interest in studying the behavior of agents who hold misspecified views of the world. Examples come from diverse fields including industrial organization, mechanism design, information economics, macroeconomics, and psychology and economics (e.g., Arrow and Green (1973), Kirman (1975), Sobel (1984), Kagel and Levin (1986), Nyarko (1991), Sargent (1999), Rabin (2002)), although there is often no explicit reference to misspecified learning. Most of the literature, however, focuses on particular settings, and there has been little progress in developing a unified framework. Our treatment unifies both "rational" and "boundedly rational" approaches, thus emphasizing that modeling the behavior of misspecified players does not constitute a large departure from the standard framework.

Arrow and Green (1973) provide a general treatment and make a distinction between objective and subjective games. Their framework, though, is more restrictive than ours in terms of the types of misspecifications that players are allowed to have.

In the case of multiple agents, the environment need not be stationary, and so we are ignoring repeated-game considerations where players take into account how their actions affect others' future play.
We discuss the extension to a population model with a continuum of agents in Section 5.

Our paper is also related to the bandit (e.g., Rothschild (1974), McLennan (1984), Easley and Kiefer (1988)) and self-confirming equilibrium (SCE) literatures, which highlight that agents might optimally end up with incorrect beliefs if experimentation is costly. We also allow beliefs to be incorrect due to insufficient feedback, but our main contribution is to allow for misspecified learning. When players have misspecified models, beliefs may be incorrect and endogenously depend on own actions even if there is persistent experimentation; thus, an equilibrium framework is needed to characterize steady-state behavior even in single-agent settings.

From a technical perspective, we extend and combine results from two literatures. First, the idea that equilibrium is a result of a learning process comes from the literature on learning in games. This literature studies explicit learning models to justify Nash and SCE (e.g., Fudenberg and Kreps (1988), Fudenberg and Kreps (1993), Fudenberg and Kreps (1995), Fudenberg and Levine (1993b), Kalai and Lehrer (1993)). We extend this literature by allowing players to learn with models of the world that are misspecified even in steady state.

Second, we rely on and contribute to the literature studying the limiting behavior of Bayesian posteriors. The results from this literature have been applied to decision problems with correctly specified agents (e.g., Easley and Kiefer, 1988). In particular, an application of the martingale convergence theorem implies that beliefs converge almost surely under the agent's subjective prior. This result, however, does not guarantee convergence of beliefs according to the true distribution if the agent has a misspecified model and the support of her prior does not include the true distribution. Thus, we take a different route and follow the statistics literature on misspecified learning.
This literature characterizes limiting beliefs in terms of the Kullback-Leibler divergence (e.g., Berk (1966), Bunke and Milhaud (1998)). We extend the statistics literature on misspecified learning.

Some explanations for why players may have misspecified models include the use of heuristics (Tversky and Kahneman, 1973), complexity (Aragones et al., 2005), the desire to avoid over-fitting the data (Al-Najjar (2009), Al-Najjar and Pai (2013)), and costly attention (Schwartzstein, 2009). In the macroeconomics literature, the term SCE is sometimes used in a broader sense to include cases where agents have misspecified models (e.g., Sargent, 1999). Two extensions of SCE are also potentially applicable: restrictions on beliefs based on introspection (e.g., Rubinstein and Wolinsky, 1994), and ambiguity aversion (Battigalli et al., 2012). See Fudenberg and Levine (1998, 2009) for a survey of this literature. White (1982) shows that the Kullback-Leibler divergence also characterizes the limiting behavior of the maximum quasi-likelihood estimator.
A (simultaneous-move) game G = ⟨O, Q⟩ is composed of a (simultaneous-move) objective game O and a subjective model Q.

Objective game. A (simultaneous-move) objective game is a tuple O = ⟨I, Ω, S, p, X, Y, f, π⟩, where: I is the set of players; Ω is the set of payoff-relevant states; S = ×_{i∈I} S^i is the set of profiles of signals, where S^i is the set of signals of player i; p is a probability distribution over Ω × S, and, for simplicity, it is assumed to have marginals with full support; we use standard notation to denote marginal and conditional distributions, e.g., p_{Ω|S^i}(· | s^i) denotes the conditional distribution over Ω given S^i = s^i; X = ×_{i∈I} X^i is a set of profiles of actions, where X^i is the set of actions of player i; Y = ×_{i∈I} Y^i is a set of profiles of (observable) consequences, where Y^i is the set of consequences of player i; f = (f^i)_{i∈I} is a profile of feedback or consequence functions, where f^i : X × Ω → Y^i maps outcomes in X × Ω into consequences of player i; and π = (π^i)_{i∈I}, where π^i : X^i × Y^i → R is the payoff function of player i. For simplicity, we prove the results for the case where all of the above sets are finite.

The timing of the objective game is as follows: First, a state and a profile of signals are drawn according to p. Second, each player privately observes her own signal. Third, players simultaneously choose actions. Finally, each player observes her consequence and obtains a payoff. We implicitly assume that players observe at least their own payoffs.

The concept of a feedback function is borrowed from the SCE literature. Also, while it is redundant to have π^i depend on x^i, it simplifies the notation in applications. In the working paper version (Esponda and Pouzo, 2014), we provide technical conditions under which the results extend to nonfinite Ω and Y.

A strategy of player i is a mapping σ^i : S^i → Δ(X^i).
The probability that player i chooses action x^i after observing signal s^i is denoted by σ^i(x^i | s^i). A strategy profile is a vector of strategies σ = (σ^i)_{i∈I}; let Σ denote the space of all strategy profiles.

Fix an objective game. For each strategy profile σ, there is an objective distribution over player i's consequences, Q^i_σ : S^i × X^i → Δ(Y^i), where

    Q^i_σ(y^i | s^i, x^i) = Σ_{(ω, x^{-i}) : f^i(x^i, x^{-i}, ω) = y^i} Σ_{s^{-i}} Π_{j≠i} σ^j(x^j | s^j) p_{Ω×S^{-i}|S^i}(ω, s^{-i} | s^i),   (1)

for all (s^i, x^i, y^i) ∈ S^i × X^i × Y^i. The objective distribution represents the true distribution over consequences, conditional on a player's own action and signal, given the objective game and a strategy profile followed by the players.
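Equation (1) amounts to summing out the state, the other players' signals, and the other players' actions. As a minimal sketch (a hypothetical two-player toy game of our own devising, with no signals; every primitive below is our own choice, not from the paper), the computation looks like:

```python
# Sketch: computing the objective distribution Q^1_sigma of equation (1)
# for a toy two-player game with no signals. All primitives are hypothetical.

from itertools import product

omegas = {0: 0.5, 1: 0.5}   # p: distribution over payoff-relevant states
X2 = {"L": 0.3, "R": 0.7}   # player 2's (fixed) mixed strategy sigma^2

def f1(x1, x2, omega):
    """Player 1's feedback function: an arbitrary illustrative rule that
    maps the action profile and the state into a consequence y."""
    return int((x1 == "L") == (x2 == "L")) + omega

def Q1_sigma(x1):
    """Objective distribution over player 1's consequences given her own
    action, obtained by summing out omega and x2 as in equation (1)."""
    dist = {}
    for (omega, p_w), (x2, p_x2) in product(omegas.items(), X2.items()):
        y = f1(x1, x2, omega)
        dist[y] = dist.get(y, 0.0) + p_w * p_x2
    return dist

print(Q1_sigma("L"))  # a probability distribution over y in {0, 1, 2}
```

The returned weights sum to one by construction, since each (ω, x²) pair contributes its joint probability to exactly one consequence y.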
Subjective model. The subjective model represents the set of distributions over consequences that players consider possible a priori. For a fixed objective game, a subjective model is a tuple Q = ⟨Θ, (Q_θ)_{θ∈Θ}⟩, where Θ = ×_{i∈I} Θ^i and Θ^i is player i's parameter set; and Q_θ = (Q^i_{θ^i})_{i∈I}, where Q^i_{θ^i} : S^i × X^i → Δ(Y^i) is the conditional distribution over player i's consequences parameterized by θ^i ∈ Θ^i; we denote the conditional distribution by Q^i_{θ^i}(· | s^i, x^i).

While the objective game represents the true environment, the subjective model represents the players' perception of their environment. This separation between objective and subjective models is crucial in this paper.
Remark 1. A special case of a subjective model is one where each player understands the objective game being played but is uncertain about the distribution over states, the consequence function, and (in the case of multiple players) the strategies of other players. In this special case, player i's uncertainty about p, f^i, and σ^{-i} can be described by a parametric model p_{θ^i}, f^i_{θ^i}, σ^{-i}_{θ^i}, where θ^i ∈ Θ^i. A subjective distribution Q^i_{θ^i} is then derived by replacing p, f^i, and σ^{-i} with p_{θ^i}, f^i_{θ^i}, σ^{-i}_{θ^i} in equation (1). □

See Online Appendix E for the case where players do not observe own payoffs. As usual, the superscript −i denotes a profile where the i'th component is excluded. For simplicity, we assume that players know the distribution over own signals. In this case, a player understands that other players mix independently but, due to uncertainty over the parameter θ^i that indexes σ^{-i}_{θ^i} = (σ^j_{θ^i})_{j≠i}, she may have correlated beliefs about her opponents' strategies, as in Fudenberg and Levine (1993a).
By defining Q^i_{θ^i} as a primitive, we stress two points. First, this object is sufficient to characterize behavior. Second, working with general subjective distributions allows for more general types of misspecifications, where players do not even have to understand the structural elements that determine their payoff-relevant consequences. We maintain the following assumptions about the subjective model.

Assumption 1.
For all i ∈ I: (i) Θ^i is a compact subset of a Euclidean space; (ii) Q^i_{θ^i}(y^i | s^i, x^i) is continuous as a function of θ^i ∈ Θ^i for all (y^i, s^i, x^i) ∈ Y^i × S^i × X^i; (iii) for all θ^i ∈ Θ^i, there exists a sequence (θ^i_n)_n in Θ^i such that lim_{n→∞} θ^i_n = θ^i and such that, for all n, Q^i_{θ^i_n}(y^i | s^i, x^i) > 0 for all (s^i, x^i) ∈ S^i × X^i, y^i ∈ f^i(x^i, X^{-i}, ω), and ω ∈ supp(p_{Ω|S^i}(· | s^i)).

Conditions (i) and (ii) are the standard conditions used to define a parametric model in statistics (e.g., Bickel et al. (1993)). Condition (iii) plays two roles. First, it guarantees that there exists at least one parameter value that attaches positive probability to every feasible observation. In particular, it rules out what can be viewed as a stark misspecification in which every element of the subjective model attaches zero probability to an event that occurs with positive true probability. Second, it imposes a "richness" condition on the subjective model: If a feasible event is deemed impossible by some parameter value, then that parameter value is not isolated, in the sense that there are nearby parameter values that consider every feasible event to be possible. In Section 5, we show that equilibrium may fail to exist and steady-state behavior need not be characterized by equilibrium without this assumption.

We illustrate the environment by presenting several examples that had previously not been integrated into a common framework. In examples with a single agent, we drop the i subscript from the notation.

Example 2.1.
Monopolist with unknown demand. A monopolist faces demand y = f(x, ω) = φ(x) + ω, where x ∈ X is the price chosen by the monopolist and ω is a mean-zero shock with distribution p ∈ Δ(Ω). The monopolist observes sales y, but not the shock. The monopolist does not observe any signal, and so we omit signals from the notation. The monopolist's payoff is π(x, y) = xy (i.e., there are no costs). The monopolist's uncertainty about p and f is described by a parametric model f_θ, p_θ, where y = f_θ(x, ω) = a − bx + ω is the demand function, θ = (a, b) ∈ Θ is a parameter vector, and ω ∼ N(0, 1) (i.e., p_θ is a standard normal distribution for all θ ∈ Θ). In particular, this example corresponds to the special case discussed in Remark 1, and Q_θ(· | x) is a normal density with mean a − bx and unit variance. □

Nyarko (1991) studies a special case of Example 2.1 and shows that a steady state does not exist in pure strategies; Sobel (1984) considers a misspecification similar to Example 2.2; Tversky and Kahneman's (1973) story motivates Example 2.3; Sargent (1999, Chapter 7) studies Example 2.4; and Kagel and Levin (1986), Eyster and Rabin (2005), Jehiel and Koessler (2008), and Esponda (2008) study Example 2.5. See Esponda and Pouzo (2014) for additional examples.

Example 2.2.
Nonlinear taxation. An agent chooses effort x ∈ X at cost c(x) and obtains income z = x + ω, where ω is a zero-mean shock with distribution p ∈ Δ(Ω). The agent pays taxes t = τ(z), where τ(·) is a nonlinear tax schedule. The agent does not observe any signal, and so we omit them. The agent observes y = (z, t) and obtains payoff π(x, z, t) = z − t − c(x). She understands how effort translates into income but fails to realize that the marginal tax rate depends on income. We compare two models that capture this misspecification. In model A, the agent believes in a random-coefficient model, t = (θ_A + ε)z, in which the marginal and average tax rates are both equal to θ_A + ε, where θ_A ∈ Θ_A = R. In model B, the agent believes that t = θ_{B,1} + θ_{B,2} z + ε, where θ_{B,2} is the constant marginal tax rate and θ_B = (θ_{B,1}, θ_{B,2}) ∈ Θ_B = R². In both models, ε ∼ N(0, 1) measures uncertain aspects of the schedule (e.g., variations in tax rates or credits). Thus, Q^j_θ(t, z | x) = Q^j_θ(t | z) p(z − x), where Q^j_θ(· | z) is a normal density with mean θ_A z and variance z² in model j = A and mean θ_{B,1} + θ_{B,2} z and unit variance in model j = B. □

Example 2.3.
Regression to the mean. An instructor observes the initial performance s of a student and decides to praise or criticize him, x ∈ {C, P}. The student then performs again and the instructor observes his final performance, s′. The truth is that performances y = (s, s′) are independent, standard normal random variables. The instructor's payoff is π(x, s, s′) = s′ − c(x, s), where c(x, s) = κ|s| if either s > 0 and x = C or s < 0 and x = P, and, in all other cases, c(x, s) = 0. The function c represents a (reputation) cost from lying (i.e., criticizing above-average or praising below-average performance). The instructor, however, does not admit the possibility of regression to the mean and believes that s′ = s + θ_x + ε, where ε ∼ N(0, 1) and θ = (θ_C, θ_P) ∈ Θ parameterizes her perceived influence on performance. Thus, letting Q̄_θ(· | s, x) be a normal density with mean s + θ_x and unit variance, it follows that Q_θ(ŝ, s′ | s, x) = Q̄_θ(s′ | s, x) if ŝ = s and 0 otherwise. □

Formally, f(x, ω) = (z(x, ω), t(x, ω)), where z(x, ω) = x + ω and t(x, ω) = τ(x + ω). It is not necessary to assume that Θ_A and Θ_B are compact for an equilibrium to exist; the same comment applies to Examples 2.3 and 2.4. Formally, ω = (s, s′), p is the product of standard normal distributions, and y = f(x, ω) = ω.

Example 2.4.
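A short simulation (our own illustration, not from the paper; we assume the natural policy of praising above-average and criticizing below-average initial performance) shows the data pattern that feeds the instructor's misinference: after praise the change s′ − s is negative on average, and after criticism it is positive.

```python
# Sketch: regression to the mean in Example 2.3. Performances s, s' are
# independent standard normals; the praise/criticize policy is our assumption.

import random
random.seed(0)

n = 200_000
after_praise, after_criticism = [], []
for _ in range(n):
    s, s_prime = random.gauss(0, 1), random.gauss(0, 1)  # independent draws
    (after_praise if s > 0 else after_criticism).append(s_prime - s)

# Under the instructor's unit-variance normal model, the best-fitting theta_x
# is the average observed change s' - s after action x.
theta_P = sum(after_praise) / len(after_praise)
theta_C = sum(after_criticism) / len(after_criticism)
print(theta_P, theta_C)  # theta_P < 0 < theta_C: praise "hurts", criticism "helps"
```

The magnitudes are close to ∓√(2/π) ≈ ∓0.80, the conditional mean of a standard normal truncated at zero; the instructor reads pure mean reversion as a causal effect of her feedback.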
Monetary policy. Two players, the government (G) and the public (P), i.e., I = {G, P}, choose monetary policy x_G and inflation forecasts x_P, respectively. They do not observe signals, and so we omit them. Inflation, e, and unemployment, U, are determined by

    e = x_G + ε_e   (2)
    U = u* − λ(e − x_P) + ε_U,   (3)

where u* > 0, λ ∈ (0, 1), and ω = (ε_e, ε_U) ∈ Ω = R² are shocks with a full-support distribution p ∈ Δ(Ω) and Var(ε_e) > 0. The public and the government observe realized inflation and unemployment, but not the error terms. The government's payoff is π(x_G, e, U) = −(U² + e²). For simplicity, we focus on the government's problem and assume that the public has correct beliefs and chooses x_P = x_G. The government understands how its policy x_G affects inflation, but does not realize that unemployment is affected by surprise inflation:

    U = θ_1 − θ_2 e + ε_U.   (4)

The subjective model is parameterized by θ = (θ_1, θ_2) ∈ Θ, and it follows that Q_θ(e, U | x_G) is the density implied by equations (2) and (4). □

Example 2.5.
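A quick Monte Carlo sketch (our own; the values u* = 5, λ = 0.5, x_G = 2 and the independent standard normal shocks are arbitrary choices) illustrates what the government's misspecified model (4) recovers when the public forecasts correctly: unemployment co-moves only with the inflation surprise, yet the regression of U on e still has slope about −λ, so the fitted θ₂ is close to λ.

```python
# Sketch: fitting the government's misspecified Phillips curve (4) to data
# generated by (2)-(3) with x_P = x_G. All parameter values are hypothetical.

import random
random.seed(1)

u_star, lam, x_G = 5.0, 0.5, 2.0
n = 200_000
es, Us = [], []
for _ in range(n):
    eps_e, eps_U = random.gauss(0, 1), random.gauss(0, 1)
    e = x_G + eps_e
    U = u_star - lam * (e - x_G) + eps_U   # x_P = x_G: only surprises matter
    es.append(e)
    Us.append(U)

mean_e, mean_U = sum(es) / n, sum(Us) / n
cov = sum((e - mean_e) * (U - mean_U) for e, U in zip(es, Us)) / n
var = sum((e - mean_e) ** 2 for e in es) / n
theta_2 = -cov / var   # the OLS slope of U on e is -theta_2 in model (4)
print(theta_2)         # close to lambda = 0.5
```

So the government comes to believe in an exploitable inflation-unemployment tradeoff even though, with correct forecasts, none exists.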
Trade with adverse selection.
A buyer with valuation v ∈ V and a seller submit a (bid) price x ∈ X and an ask price a ∈ A, respectively. The seller's ask price and the buyer's value are drawn from p ∈ Δ(A × V), so that Ω = A × V is the state space. Thus, the buyer is the only decision maker. After submitting a price, the buyer observes y = ω = (a, v) and gets payoff π(x, a, v) = v − x if a ≤ x and zero otherwise. In other words, the buyer observes perfect feedback, gets v − x if there is trade, and 0 otherwise. When making an offer, she does not know her value or the seller's ask price. She also does not observe any signals, and so we omit them. Finally, suppose that A and V are correlated but that the buyer believes they are independent. This is captured by letting Q_θ = θ and Θ = Δ(A) × Δ(V). □

A model that allows for regression to the mean is s′ = αs + θ_x + ε; in this case, the agent would correctly learn that α = 0 and θ_x = 0 for all x. Rabin and Vayanos (2010) study a related setup in which the agent believes that shocks are autoregressive when in fact they are i.i.d. Formally, ω = (ε_e, ε_U) and y = (e, U) = f(x_G, x_P, ω) is given by equations (2) and (3).

Distance to true model. In equilibrium, we will require players' beliefs to put probability one on the set of subjective distributions over consequences that are "closest" to the objective distribution. The following function, which we call the weighted Kullback-Leibler divergence (wKLD) function of player i, is a weighted version of the standard Kullback-Leibler divergence in statistics (Kullback and Leibler, 1951). It represents a "distance" between the objective distribution over i's consequences given a strategy profile σ ∈ Σ and the distribution as parameterized by θ^i ∈ Θ^i:

    K^i(σ, θ^i) = Σ_{(s^i, x^i) ∈ S^i × X^i} E_{Q^i_σ(·|s^i,x^i)}[ ln ( Q^i_σ(Y^i | s^i, x^i) / Q^i_{θ^i}(Y^i | s^i, x^i) ) ] σ^i(x^i | s^i) p_{S^i}(s^i).   (5)

The set of closest parameter values of player i given σ is the set

    Θ^i(σ) ≡ arg min_{θ^i ∈ Θ^i} K^i(σ, θ^i).

The interpretation is that Θ^i(σ) ⊂ Θ^i is the set of parameter values that player i can believe to be possible after observing feedback consistent with strategy profile σ.

Remark 2. We show in Section 4 that wKLD is the right notion of distance in a learning model with Bayesian players. Here, we provide a heuristic argument for a Bayesian agent (we drop i superscripts for clarity) with parameter set Θ = {θ_1, θ_2} who observes data over t periods, (s_τ, x_τ, y_τ)_{τ=0}^{t−1}, that comes from repeated play of σ.
Here, we provide an heuristic argument fora Bayesian agent (we drop i subscripts for clarity) with parameter set Θ = { θ , θ } who observes data over t periods, ( s τ , x τ , y τ ) t − τ =0 , that comes from repeated play of The typical story is that there is a population of sellers each of whom follows the weakly dominantstrategy of asking for her valuation; thus, the ask price is a function of the seller’s valuation and, ifbuyer and seller valuations are correlated, then the ask price and buyer valuation are also correlated. The notation E Q denotes expectation with respect to the probability distribution Q . Also, weuse the convention that − ln 0 = ∞ and 0 ln 0 = 0. σ . Let ρ = µ ( θ ) /µ ( θ ) denote the agent’s ratioof priors. Applying Bayes’ rule and simple algebra, the posterior probability over θ after t periods is µ t ( θ ) = (cid:18) ρ Π t − τ =0 Q θ ( y τ | s τ , x τ ) Q θ ( y τ | s τ , x τ ) (cid:19) − = (cid:18) ρ Π t − τ =0 Q θ ( y τ | s τ , x τ ) /Q σ ( y τ | s τ , x τ ) Q θ ( y τ | s τ , x τ ) /Q σ ( y τ | s τ , x τ ) (cid:19) − = (cid:32) ρ exp (cid:40) − t (cid:32) t t − (cid:88) τ =0 ln Q σ ( y τ | s τ , x τ ) Q θ ( y τ | s τ , x τ ) − t t − (cid:88) τ =0 ln Q σ ( y τ | s τ , x τ ) Q θ ( y τ | s τ , x τ ) (cid:33)(cid:41)(cid:33) − where the second equality follows by multiplying and dividing by Π t − τ =0 Q σ ( y τ | s τ , x τ ).By a law of large numbers argument and the fact that the true joint distributionover ( s, x, y ) is given by Q σ ( y | x, s ) σ ( x | s ) p S ( s ), the difference in the log-likelihoodratios converges to K ( σ, θ ) − K ( σ, θ ). Suppose that K ( σ, θ ) > K ( σ, θ ). Then,for sufficiently large t , the posterior belief µ t ( θ ) is approximately equal to 1 / (1 + ρ exp ( − t ( K ( σ, θ ) − K ( σ, θ )))), which converges to 0. Therefore, the posterior even-tually assigns zero probability to θ . On the other hand, if K ( σ, θ ) < K ( σ, θ ), thenthe posterior eventually assigns zero probability to θ . 
Thus, the posterior eventually assigns zero probability to parameter values that do not minimize K(σ, ·). □

Remark 3. Because the wKLD function is weighted by a player's own strategy, it places no restrictions on beliefs about outcomes that only arise following out-of-equilibrium actions (beyond the restrictions imposed by Θ). □
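The heuristic argument in Remark 2 is easy to check numerically. In this sketch (our own toy specification: a binary consequence, no signals, a fixed action, and all numbers chosen by us), both parameter values are wrong, yet the posterior concentrates on the one with the smaller Kullback-Leibler divergence:

```python
# Sketch: Bayesian updating with a misspecified two-point parameter set.
# Neither theta matches the true distribution; the posterior still settles
# on the KL-divergence minimizer, as the heuristic argument predicts.

import math
import random
random.seed(2)

q_true = 0.7                        # true probability that y = 1
q = {"theta1": 0.6, "theta2": 0.2}  # subjective model: both parameters wrong

def kl(p, r):
    """KL divergence between Bernoulli(p) and Bernoulli(r)."""
    return p * math.log(p / r) + (1 - p) * math.log((1 - p) / (1 - r))

log_odds = 0.0  # log of mu_t(theta1) / mu_t(theta2); uniform prior, rho = 1
for _ in range(2000):
    y = 1 if random.random() < q_true else 0
    for name, sign in (("theta1", +1), ("theta2", -1)):
        lik = q[name] if y == 1 else 1 - q[name]
        log_odds += sign * math.log(lik)

posterior_theta1 = 1 / (1 + math.exp(-log_odds))
assert kl(q_true, q["theta1"]) < kl(q_true, q["theta2"])
print(posterior_theta1)  # essentially 1: belief settles on the best fit
```

The per-period drift of the log-odds is exactly the KL gap, so the posterior on the worse-fitting parameter vanishes at an exponential rate.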
The next result collects some useful properties of the wKLD function.
Lemma 1. (i) For all i ∈ I, θ^i ∈ Θ^i, and σ ∈ Σ, K^i(σ, θ^i) ≥ 0, with equality holding if and only if Q^i_{θ^i}(· | s^i, x^i) = Q^i_σ(· | s^i, x^i) for all (s^i, x^i) such that σ^i(x^i | s^i) > 0. (ii) For all i ∈ I, Θ^i(·) is nonempty, upper hemicontinuous, and compact valued.

Proof. See the Appendix.

The upper hemicontinuity of Θ^i(·) would follow from the Theorem of the Maximum had we assumed Q^i_{θ^i} to be positive for all feasible events and θ^i ∈ Θ^i, since the wKLD function would then be finite and continuous. But this assumption may be strong in some cases; for example, it rules out cases where a player believes others follow pure strategies. Assumption 1(iii) weakens this assumption by requiring that it holds for a dense subset of Θ, and still guarantees that Θ^i(·) is upper hemicontinuous.

Optimality. In equilibrium, we will require each player to choose a strategy that is optimal given her beliefs. A strategy σ^i for player i is optimal given μ^i ∈ Δ(Θ^i) if σ^i(x^i | s^i) > 0 implies

    x^i ∈ arg max_{x̄^i ∈ X^i} E_{Q̄^i_{μ^i}(·|s^i, x̄^i)}[ π^i(x̄^i, Y^i) ],   (6)

where Q̄^i_{μ^i}(· | s^i, x^i) = ∫_{Θ^i} Q^i_{θ^i}(· | s^i, x^i) μ^i(dθ^i) is the distribution over consequences of player i, conditional on (s^i, x^i) ∈ S^i × X^i, induced by μ^i.

Definition of equilibrium.
We propose the following solution concept.
Definition 1.
A strategy profile σ is a Berk-Nash equilibrium of game G if, for all players i ∈ I, there exists μ^i ∈ Δ(Θ^i) such that

(i) σ^i is optimal given μ^i, and

(ii) μ^i ∈ Δ(Θ^i(σ)), i.e., if θ̂^i is in the support of μ^i, then θ̂^i ∈ arg min_{θ^i ∈ Θ^i} K^i(σ, θ^i).

Definition 1 places two restrictions on equilibrium behavior: (i) optimization given beliefs, and (ii) endogenous restrictions on beliefs. For comparison, note that the definition of Nash equilibrium is identical to Definition 1 except that condition (ii) is replaced with the condition that players have correct beliefs, i.e., Q̄^i_{μ^i} = Q^i_σ.

Existence of equilibrium. The standard existence proof of Nash equilibrium cannot be used here because the analogous version of a best-response correspondence is not necessarily convex valued. To prove existence, we first perturb payoffs and establish that equilibrium exists in the perturbed game. We then consider a sequence of equilibria of perturbed games, where perturbations go to zero, and establish that the limit is a Berk-Nash equilibrium of the (unperturbed) game. The nonstandard part of the proof is to prove existence of equilibrium in the perturbed game. The perturbed best-response correspondence is still not necessarily convex valued. Our approach is to characterize equilibrium as a fixed point of a belief correspondence and show that it satisfies the requirements of a generalized version of Kakutani's fixed point theorem.

The idea of perturbations and the strategy of the existence proof date back to Harsanyi (1973); Selten (1975) and Kreps and Wilson (1982) also used these ideas to prove existence of perfect and sequential equilibrium, respectively.

Theorem 1. Every game has at least one Berk-Nash equilibrium.

Proof.
See the Appendix.
Example 2.1, continued.
Monopolist with unknown demand . Let σ = ( σ x ) x ∈ X denote a strategy, where σ x is the probability of choosing price x ∈ X .Because this is a single-agent problem, the objective distribution does not depend on σ ; hence, we denote it by Q ( · | x ), which is a normal density with mean φ ( x ) andunit variance. Similarly, Q θ ( · | x ) is a normal density with mean φ θ ( x ) = a − bx andunit variance. It follows from equation (5) that K ( σ, θ ) = (cid:88) x ∈ X σ x E Q ( ·| x ) (cid:2) ( Y − φ θ ( x )) − ( Y − φ ( x )) (cid:3) = (cid:88) x ∈ X σ x
12 ( φ ( x ) − φ θ ( x )) . For concreteness, let X = { , } , φ (2) = 34 and φ (10) = 2, and Θ = [33 , × [3 , . Let θ ∈ R provide a perfect fit for demand, i.e., φ θ ( x ) = φ ( x ) for all x ∈ X . In this example, θ = ( a , b ) = (42 , / ∈ Θ and, therefore, we say that themonopolist has a misspecified model. The dashed line in Figure 1 depicts optimalbehavior: the optimal price is 10 to the left, it is 2 to the right, and the monopolist isindifferent for parameter values on the dashed line.To solve for equilibrium, we first consider pure strategies. If σ = (0 ,
1) (i.e., theprice is x = 10), the first order conditions ∂K ( σ, θ ) /∂a = ∂K ( σ, θ ) /∂b = 0 imply φ (10) = φ θ (10) = a − b
10, and any ( a, b ) ∈ Θ on the segment AB in Figure 1minimizes K ( σ, · ). These minimizers, however, lie to the right of the dashed line,where it is not optimal to set a price of 10. Thus, σ = (0 ,
1) is not an equilibrium.A similar argument establishes that σ = (1 ,
0) is not an equilibrium: If it were, theminimizer would be at D , where it is in fact not optimal to choose a price of 2.Finally, consider mixed strategies. Because both first order conditions cannot holdsimultaneously, the parameter value that minimizes K ( σ, θ ) lies on the boundary of Θ.A bit of algebra shows that, for any totally mixed σ , there is a unique minimizer θ σ =( a σ , b σ ) characterized as follows. If σ ≤ /
4, the minimizer is on the segment BC : In particular, the deterministic part of the demand function can have any functional form pro-vided it passes through (2 , φ o (2)) and (10 , φ (10)). Slope=2Slope=10Θ ba θ Optimal Price = 10 Optimal Price = 2D A B θ ˆ σ . ba θ Optimal Price = 10 Optimal Price = 2D A B θ σ ∗ . Figure 1: Monopolist with misspecified demand function.
Left panel: the parameter value that minimizes the wKLD function given strategy σ̂ is θ_σ̂. Right panel: σ* is a Berk-Nash equilibrium (σ* is optimal given θ_σ*, because θ_σ* lies on the indifference line, and θ_σ* minimizes the wKLD function given σ*).

On the segment BC, b_σ = 3.5 and a_σ = 4σ + 37 solves ∂K(σ, θ)/∂a = 0. The left panel of Figure 1 depicts an example where the unique minimizer θ_σ̂ under strategy σ̂ is given by the tangency between the contour lines of K(σ̂, ·) and the feasible set Θ. If σ ∈ [3/4, 15/16], then θ_σ = C is the northeast vertex of Θ. Finally, if σ > 15/16, the minimizer is on the segment DC: a_σ = 40 and b_σ = (380 − 368σ)/(100 − 96σ) solves ∂K(σ, θ)/∂b = 0.

Because the monopolist mixes, optimality requires that the equilibrium belief θ_σ lie on the dashed line. The unique Berk-Nash equilibrium is σ* = (35/36, 1/36), with θ_σ* = (40, 10/3) on DC, as depicted in the right panel of Figure 1. It is not the case, however, that the equilibrium belief about the mean of Y is correct. Thus, an approach that had focused on fitting the mean, rather than minimizing K, would have led to the wrong conclusion. □

(It can be shown that K(σ, θ) = (θ − θ₀)′ M_σ (θ − θ₀), where M_σ is a weighting matrix that depends on σ; in particular, the contour lines of K(σ, ·) are ellipses. The example also illustrates the importance of mixed strategies for existence of Berk-Nash equilibrium, even in single-agent settings. As an antecedent, Esponda and Pouzo (2011) argue that this is the reason why mixed-strategy equilibrium cannot be purified in a voting application.)

Example 2.2, continued from pg. 7. Nonlinear taxation.
For any pure strategy x and parameter value θ^A ∈ Θ^A = R (model A) or θ^B ∈ Θ^B = R² (model B), the wKLD function K^j(x, θ) for model j ∈ {A, B} equals

E[ ln ( Q(T | Z) p(Z − x) / ( Q^j_θ(T | Z) p(Z − x) ) ) | X = x ],

which equals (1/2) E[ (τ(Z)/Z − θ^A)² | X = x ] + C^A for model A and (1/2) E[ (τ(Z) − θ^B_0 − θ^B_1 Z)² | X = x ] + C^B for model B, where E denotes the true conditional expectation and C^A and C^B are constants.

For model A, θ^A(x) = E[τ(x + W)/(x + W)] is the unique parameter value that minimizes K^A(x, ·). Intuitively, the agent believes that the expected marginal tax rate is equal to the true expected average tax rate. For model B, the slope satisfies θ^B_1(x) = Cov(τ(x + W), x + W)/Var(x + W) = E[τ′(x + W)], where the second equality follows from Stein's lemma (Stein (1972)), provided that τ is differentiable. Intuitively, the agent believes that the marginal tax rate is constant and given by the true expected marginal tax rate.

We now compare equilibrium under these two models with the case in which the agent has correct beliefs and chooses an optimal strategy x_opt that maximizes x − E[τ(x + W)] − c(x). In contrast, a strategy x^{j*} is a Berk-Nash equilibrium of model j if and only if x = x^{j*} maximizes x − θ^j(x^{j*}) x − c(x). For example, suppose that the cost of effort and the true tax schedule are both smooth, increasing, and convex functions (e.g., taxes are progressive) and that X ⊂ R is a compact interval. Then first-order conditions are sufficient for optimality, and x_opt is the unique solution to 1 − E[τ′(x_opt + W)] = c′(x_opt). Moreover, the unique Berk-Nash equilibrium solves 1 − E[τ(x^{A*} + W)/(x^{A*} + W)] = c′(x^{A*}) for model A and 1 − E[τ′(x^{B*} + W)] = c′(x^{B*}) for model B. In particular, effort in model B is optimal, x^{B*} = x_opt.
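This comparison is easy to check numerically. The sketch below uses hypothetical primitives of our own choosing, not the paper's: a progressive tax τ(z) = 0.2z + 0.05z², marginal effort cost c′(x) = 0.1x, and W standard normal. It estimates each model's best-fit tax rate by Monte Carlo, solves each equilibrium condition by bisection, and checks the Stein's-lemma step equating the OLS slope with E[τ′(Z)].

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical primitives (ours, not the paper's).
tau      = lambda z: 0.2 * z + 0.05 * z ** 2   # true tax schedule, increasing and convex
tau_prim = lambda z: 0.2 + 0.10 * z            # tau'
c_prime  = lambda x: 0.1 * x                   # marginal cost of effort

def rate_true(x, n=100_000):
    """True expected marginal tax rate E[tau'(x + W)], W ~ N(0,1)."""
    return np.mean(tau_prim(x + rng.standard_normal(n)))

def rate_A(x, n=100_000):
    """Model A best fit: expected average rate E[tau(Z)/Z], Z = x + W."""
    z = x + rng.standard_normal(n)
    return np.mean(tau(z) / z)

def rate_B(x, n=100_000):
    """Model B best-fit slope: Cov(tau(Z), Z) / Var(Z), the OLS estimand."""
    z = x + rng.standard_normal(n)
    return np.cov(tau(z), z)[0, 1] / np.var(z, ddof=1)

def solve(rate):
    """Bisection on the equilibrium condition 1 - rate(x) = c'(x)."""
    lo, hi = 0.0, 10.0
    for _ in range(40):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if 1.0 - rate(mid) - c_prime(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)

x_opt, x_A, x_B = solve(rate_true), solve(rate_A), solve(rate_B)

# Stein's lemma at a fixed effort level: OLS slope equals E[tau'(Z)].
slope_ols, slope_stein = rate_B(4.0), rate_true(4.0)
print(x_opt, x_B, x_A)          # x_B close to x_opt; x_A higher
print(slope_ols, slope_stein)   # approximately equal
```

Under these assumed primitives, x_opt = x^{B*} = 4 while x^{A*} = 16/3 ≈ 5.33, matching the ranking x^{A*} > x_opt = x^{B*} derived above.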
Intuitively, the agent has correct beliefs about the true expected marginal tax rate at her equilibrium choice of effort, and so she has the right incentives on the margin, despite believing incorrectly that the marginal tax rate is constant. In contrast, effort is higher than optimal in model A, x^{A*} > x_opt. Intuitively, the agent equates her perceived marginal tax rate to the true expected average tax rate which, because taxes are progressive, is lower than the true expected marginal tax rate, so she perceives too high a return to effort.

(We use W to denote the random variable that takes on realizations ω. By linearity and normality, the minimizers of K^B(x, ·) coincide with the OLS estimands. We assume normality for tractability, although the framework allows for general distributional assumptions. There are other tractable distributions; for example, the minimizer of the wKLD under the Laplace distribution corresponds to the estimates of a median (not a linear) regression.) □

Example 2.3, continued from pg. 7.
Regression to the mean. Since optimal strategies are characterized by a cutoff, we let σ ∈ R represent the strategy where the instructor praises an initial performance if it is above σ and criticizes it otherwise. Let S₁ denote the initial performance, with realization s, and S₂ the next performance. The wKLD function for any θ = (θ_C, θ_P) ∈ Θ = R² is

K(σ, θ) = ∫_{−∞}^{σ} E[ ln ( φ(S₂) / φ(S₂ − (θ_C + s)) ) ] φ(s) ds + ∫_{σ}^{∞} E[ ln ( φ(S₂) / φ(S₂ − (θ_P + s)) ) ] φ(s) ds,

where φ is the density of N(0, 1) and E denotes the true expectation. For each σ, the unique parameter vector that minimizes K(σ, ·) is given by θ_C(σ) = E[S₂ − S₁ | S₁ < σ] = −E[S₁ | S₁ < σ] > 0 and θ_P(σ) = −E[S₁ | S₁ > σ] < 0. Intuitively, the instructor is critical for performances below a threshold and, therefore, the mean performance conditional on a student being criticized is lower than the unconditional mean performance; thus, a student who is criticized delivers a better next performance in expectation. Similarly, a student who is praised delivers a worse next performance in expectation.

The instructor who follows a cutoff strategy σ believes, after observing initial performance s > 0, that her expected payoff is s + θ_C(σ) − κs if she criticizes and s + θ_P(σ) if she praises. By optimality, the cutoff makes her indifferent between praising and criticizing. Thus, σ* = (1/κ)(θ_C(σ*) − θ_P(σ*)) > 0. As κ → 0, meaning that instructors care only about performance and not about lying, σ* → ∞: instructors only criticize (as in Tversky and Kahneman's (1973) story). □

We show that Berk-Nash equilibrium includes several solution concepts (both standard and boundedly rational) as special cases.

Properties of games

correctly-specified games. In Bayesian statistics, a model is correctly specified if the support of the prior includes the true data generating process. The extension to single-agent decision problems is straightforward. In games, however, we must account for the fact that the objective distribution over consequences (i.e., the true model) depends on the strategy profile.

Definition 2.
A game is correctly specified given σ if, for all i ∈ I, there exists θ^i ∈ Θ^i such that Q^i_{θ^i}(· | s^i, x^i) = Q^i_σ(· | s^i, x^i) for all (s^i, x^i) ∈ S^i × X^i; otherwise, the game is misspecified given σ. A game is correctly specified if it is correctly specified for all σ; otherwise, it is misspecified.

identification. From the player's perspective, what matters is identification of the distribution over consequences Q^i_θ, not the parameter θ. If the model is correctly specified, then the true Q^i_σ is trivially identified. Of course, this is not true if the model is misspecified, because the true distribution will never be learned. But we want a definition that captures the same spirit: If two distributions are judged to be equally a best fit (given the true distribution), then we want these two distributions to be identical; otherwise, we cannot identify which distribution is a best fit. The fact that players take actions introduces an additional nuance to the definition of identification. We can ask for identification of the distribution over consequences either for those actions that are taken by the player (i.e., on the path of play) or for all actions (i.e., on and off the path).

Definition 3.
A game is weakly identified given σ if, for all i ∈ I: if θ^i_1, θ^i_2 ∈ Θ^i(σ), then Q^i_{θ^i_1}(· | s^i, x^i) = Q^i_{θ^i_2}(· | s^i, x^i) for all (s^i, x^i) ∈ S^i × X^i such that σ^i(x^i | s^i) > 0 (recall that p_{S^i} has full support). If the condition is satisfied for all (s^i, x^i) ∈ S^i × X^i, then we say that the game is strongly identified given σ. A game is [weakly or strongly] identified if it is [weakly or strongly] identified for all σ.

A correctly specified game is weakly identified. Also, two games that are identical except for their feedback may differ in terms of being correctly specified or identified. (It would be more precise to say that the game is correctly specified in steady state.)

Relationship to Nash and self-confirming equilibrium

The next result shows that Berk-Nash equilibrium is equivalent to Nash equilibrium when the game is both correctly specified and strongly identified.
Proposition 1. (i) Suppose that the game is correctly specified given σ and that σ is a Nash equilibrium of its objective game. Then σ is a Berk-Nash equilibrium of the (objective and subjective) game. (ii) Suppose that σ is a Berk-Nash equilibrium of a game that is correctly specified and strongly identified given σ. Then σ is a Nash equilibrium of the corresponding objective game.

Proof. (i) Let σ be a Nash equilibrium and fix any i ∈ I. Then σ^i is optimal given Q^i_σ. Because the game is correctly specified given σ, there exists θ^{i*} ∈ Θ^i such that Q^i_{θ^{i*}} = Q^i_σ and, therefore, by Lemma 1, θ^{i*} ∈ Θ^i(σ). Thus, σ^i is also optimal given Q^i_{θ^{i*}} and θ^{i*} ∈ Θ^i(σ), so that σ is a Berk-Nash equilibrium. (ii) Let σ be a Berk-Nash equilibrium and fix any i ∈ I. Then σ^i is optimal given Q̄^i_{µ^i}, for some µ^i ∈ ∆(Θ^i(σ)). Because the game is correctly specified given σ, there exists θ^{i*} ∈ Θ^i such that Q^i_{θ^{i*}} = Q^i_σ and, therefore, by Lemma 1, θ^{i*} ∈ Θ^i(σ). Moreover, because the game is strongly identified given σ, any θ̂^i ∈ Θ^i(σ) satisfies Q^i_{θ̂^i} = Q^i_{θ^{i*}} = Q^i_σ. Then σ^i is also optimal given Q^i_σ. Thus, σ is a Nash equilibrium.

Example 2.4, continued from pg. 8.
Monetary policy. Fix a strategy x^{P*} for the public. Note that U = u* − λ(x^G − x^{P*} + ε_e) + ε_U, whereas the government believes U = θ_0 − θ_1(x^G + ε_e) + ε_U. Thus, by choosing θ* = (θ*_0, θ*_1) ∈ Θ = R² such that θ*_0 = u* + λx^{P*} and θ*_1 = λ, it follows that the distribution over Y = (U, e) parameterized by θ* coincides with the objective distribution given x^{P*}. So, despite appearances, the game is correctly specified given x^{P*}. Moreover, since Var(ε_e) > 0, θ* is the unique minimizer of the wKLD function given x^{P*}. Because there is a unique minimizer, the game is strongly identified given x^{P*}. Since these properties hold for all x^{P*}, Proposition 1 implies that Berk-Nash equilibrium is equivalent to Nash equilibrium. Thus, the equilibrium policies are the same whether or not the government realizes that unemployment is driven by surprise, not actual, inflation. □

(Sargent (1999) derived this result for a government doing OLS-based learning (a special case of our example when errors are normal). We assumed linearity for simplicity, but the result is true for the more general case with true unemployment U = f_U(x^G, x^P, ω) and subjective model f_{U,θ}(x^G, x^P, ω) if, for all x^P, there exists θ such that f_U(x^G, x^P, ω) = f_{U,θ}(x^G, x^P, ω) for all (x^G, ω).)

Proposition 2.
Suppose that the game is correctly specified given σ, and that σ is a Berk-Nash equilibrium. Then σ is also a self-confirming equilibrium.

Proof.
Fix any i ∈ I and let θ̂^i be in the support of µ^i, where µ^i is player i's belief supporting the Berk-Nash equilibrium strategy σ^i. Because the game is correctly specified given σ, there exists θ^{i*} ∈ Θ^i such that Q^i_{θ^{i*}} = Q^i_σ and, therefore, by Lemma 1, K^i(σ, θ^{i*}) = 0. Thus, it must also be that K^i(σ, θ̂^i) = 0. By Lemma 1, it follows that Q^i_{θ̂^i}(· | s^i, x^i) = Q^i_σ(· | s^i, x^i) for all (s^i, x^i) such that σ^i(x^i | s^i) > 0. In particular, σ^i is optimal given Q^i_{θ̂^i}, and Q^i_{θ̂^i} satisfies the desired self-confirming restriction.

For games that are not correctly specified, beliefs can be incorrect on the equilibrium path, and so a Berk-Nash equilibrium is not necessarily Nash or SCE.

An analogy-based game satisfies the following four properties: (i) States and information structure: The state space Ω is finite with distribution p_Ω ∈ ∆(Ω). In addition, for each i, there is a partition S^i of Ω, and the element of S^i that contains ω (i.e., the signal of player i in state ω) is denoted by s^i(ω); (ii) Perfect feedback: For each i, f^i(x, ω) = (x^{−i}, ω) for all (x, ω); (iii) Analogy partition: For each i, there exists a partition of Ω, denoted by A^i, and the element of A^i that contains ω is denoted by α^i(ω); (iv) Conditional independence: (Q^i_{θ^i})_{θ^i ∈ Θ^i} is the set of all joint probability distributions over X^{−i} × Ω that satisfy

Q^i_{θ^i}(x^{−i}, ω | s^i(ω′), x^i) = Q^i_{Ω,θ^i}(ω | s^i(ω′)) Q^i_{X^{−i},θ^i}(x^{−i} | α^i(ω)).

(A strategy profile σ is an SCE if, for all players i ∈ I, σ^i is optimal given Q̂^i_σ, where Q̂^i_σ(· | s^i, x^i) = Q^i_σ(· | s^i, x^i) for all (s^i, x^i) such that σ^i(x^i | s^i) > 0. This definition is slightly more general than the typical one, e.g., Dekel et al. (2004), because it does not restrict players to believe that consequences are driven by other players' strategies. A converse does not necessarily hold for a fixed game. The reason is that the definition of SCE does not impose any restrictions on off-equilibrium beliefs, while a particular subjective game may impose ex-ante restrictions on beliefs. The following converse, however, does hold: For any σ that is an SCE, there exists a game that is correctly specified for which σ is a Berk-Nash equilibrium. The perfect-feedback assumption is made to facilitate comparison with Jehiel and Koessler's (2008) ABEE.)
In other words, every player i believes that x^{−i} and ω are independent conditional on the analogy partition. For example, if A^i = S^i for all i, then each player believes that the actions of other players are independent of the state, conditional on their own private information.

Definition 4. (Jehiel and Koessler, 2008) A strategy profile σ is an analogy-based expectation equilibrium (ABEE) if, for all i ∈ I, ω ∈ Ω, and x^i such that σ^i(x^i | s^i(ω)) > 0,

x^i ∈ arg max_{x̄^i ∈ X^i} Σ_{ω′ ∈ Ω} p_{Ω|S^i}(ω′ | s^i(ω)) Σ_{x^{−i} ∈ X^{−i}} σ̄^{−i}(x^{−i} | ω′) π^i(x̄^i, x^{−i}, ω′),

where σ̄^{−i}(x^{−i} | ω′) = Σ_{ω″ ∈ Ω} p_{Ω|A^i}(ω″ | α^i(ω′)) Π_{j ≠ i} σ^j(x^j | s^j(ω″)).

Proposition 3.
In an analogy-based game, σ is a Berk-Nash equilibrium if and onlyif it is an ABEE.Proof. See the Appendix.As mentioned by Jehiel and Koessler (2008), ABEE is equivalent to Eyster andRabin’s (2005) fully cursed equilibrium in the special case where A i = S i for all i .In particular, Proposition 3 provides a misspecified-learning foundation for these twosolution concepts. Jehiel and Koessler (2008) discuss an alternative foundation forABEE, where players receive coarse feedback aggregated over past play and multiplebeliefs are consistent with this feedback. Under this different feedback structure,ABEE can be viewed as a natural selection of the set of SCE. Example 2.5, continued from pg. 8.
Trade with adverse selection.
In Online Appendix A, we show that x* is a Berk-Nash equilibrium price if and only if x = x* maximizes an equilibrium belief function Π(x, x*), which represents the belief about expected profit from choosing any price x under a steady state x*. The equilibrium belief function depends on the feedback/misspecification assumptions, and we discuss the following four cases:

Π_NE(x) = Pr(A ≤ x) (E[V | A ≤ x] − x)
Π_CE(x) = Pr(A ≤ x) (E[V] − x)
Π_BE(x, x*) = Pr(A ≤ x) (E[V | A ≤ x*] − x)
Π_ABEE(x) = Σ_{j=1}^{k} Pr(V ∈ V_j) { Pr(A ≤ x | V ∈ V_j) (E[V | V ∈ V_j] − x) }.

The first case, Π_NE, is the benchmark case in which beliefs are correct. The second case, Π_CE, corresponds to perfect feedback and subjective model Θ = ∆(A) × ∆(V), as described on page 8. This is an example of an analogy-based game with a single analogy class V. The buyer learns the true marginal distributions of A and V and believes the joint distribution equals the product of the marginal distributions. Berk-Nash coincides with fully cursed equilibrium. The third case, Π_BE, has the same misspecified model as the second case, but assumes partial feedback, in the sense that the ask price a is always observed but the valuation v is only observed if there is trade. The equilibrium price x* affects the sample of valuations observed by the buyer and, therefore, her beliefs. Berk-Nash coincides with naive behavioral equilibrium. The last case, Π_ABEE, corresponds to perfect feedback and the following misspecification: Consider a partition of V into k "analogy classes" (V_j)_{j=1,...,k}. The buyer believes that (A, V) are independent conditional on V ∈ V_j, for each j = 1, ..., k. The parameter set is Θ_A = ×_{j=1}^{k} ∆(A) × ∆(V), where, for a value θ = (θ_1, ..., θ_k, θ_V) ∈ Θ_A, θ_V parameterizes the marginal distribution over V and, for each j = 1, ..., k, θ_j ∈ ∆(A) parameterizes the distribution over A conditional on V ∈ V_j.
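For a concrete sense of how these belief functions differ, the sketch below evaluates the first three in a toy parameterization of our own (ask A uniform on [0, 10] and buyer value V = A + 2, so trade always generates gains), computing the NE and fully cursed prices by grid search and the naive-behavioral-equilibrium price by best-response iteration; Π_ABEE can be handled the same way, class by class.

```python
import numpy as np

# Hypothetical primitives (ours): A ~ Uniform[0, 10], buyer value V = A + 2.
x = np.linspace(0.0, 10.0, 1001)     # grid of candidate prices

P_trade = x / 10.0                   # Pr(A <= x)
EV_cond = x / 2.0 + 2.0              # E[V | A <= x]  (adverse selection)
EV = 7.0                             # unconditional E[V]

pi_NE = P_trade * (EV_cond - x)      # correct beliefs
pi_CE = P_trade * (EV - x)           # fully cursed: ignores the A-V correlation

x_NE = x[np.argmax(pi_NE)]
x_CE = x[np.argmax(pi_CE)]

# Naive behavioral equilibrium: best respond to the mean value observed in
# past trades at the steady-state price, then iterate to a fixed point.
x_BE = x_NE
for _ in range(100):
    believed = x_BE / 2.0 + 2.0      # E[V | A <= x_BE], learned from past trades
    x_BE = x[np.argmax(P_trade * (believed - x))]

print(x_NE, x_CE, x_BE)
```

In this assumed specification the cursed buyer trades at a higher price (x_CE = 3.5) than the correct-beliefs benchmark (x_NE = 2), while the naive behavioral equilibrium settles near x_BE ≈ 4/3; none of these numbers are from the paper, which treats the general case in Online Appendix A.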
Berk-Nash coincides with the ABEE of the game with analogy classes (V_j)_{j=1,...,k}. □

We provide a learning foundation for equilibrium. We follow Fudenberg and Kreps (1993) in considering games with (slightly) perturbed payoffs because, as they highlight in the context of providing a learning foundation for mixed-strategy Nash equilibrium, behavior need not be continuous in beliefs without perturbations. Thus, even if beliefs were to converge, behavior need not settle down in the unperturbed game. Perturbations guarantee that if beliefs converge, then behavior also converges.

A perturbation structure is a tuple P = ⟨Ξ, P_ξ⟩, where: Ξ = ×_{i∈I} Ξ^i and Ξ^i ⊆ R^{X^i} is a set of payoff perturbations for each action of player i; P_ξ = (P_{ξ^i})_{i∈I}, where P_{ξ^i} ∈ ∆(Ξ^i) is a distribution over payoff perturbations of player i that is absolutely continuous with respect to the Lebesgue measure and satisfies ∫_{Ξ^i} ||ξ^i|| P_{ξ^i}(dξ^i) < ∞. A perturbed game G_P = ⟨G, P⟩ is composed of a game G and a perturbation structure P. The timing of a perturbed game G_P coincides with the timing of G, except for two differences. First, before taking an action, each player not only observes her signal s^i but also privately observes a vector of own-payoff perturbations ξ^i ∈ Ξ^i, where ξ^i(x^i) denotes the perturbation for action x^i. Second, her payoff given action x^i and consequence y^i is π^i(x^i, y^i) + ξ^i(x^i).

A strategy σ^i for player i is optimal in the perturbed game given µ^i ∈ ∆(Θ^i) if, for all (s^i, x^i) ∈ S^i × X^i, σ^i(x^i | s^i) = P_ξ( ξ^i : x^i ∈ Ψ^i(µ^i, s^i, ξ^i) ), where

Ψ^i(µ^i, s^i, ξ^i) ≡ arg max_{x^i ∈ X^i} E_{Q̄^i_{µ^i}(·|s^i, x^i)}[ π^i(x^i, Y^i) ] + ξ^i(x^i).

(In Online Appendix A, we also consider the case of ABEE with partial feedback.)
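The role of the perturbations is easiest to see in a small computation. With i.i.d. Gumbel-distributed ξ^i(x^i) (one absolutely continuous specification satisfying the conditions above; the Gumbel choice is ours, made because it yields the closed-form logit rule), the optimal strategy is totally mixed and continuous in beliefs:

```python
import numpy as np

rng = np.random.default_rng(1)

u = np.array([1.0, 0.5, 0.0])       # E[pi(x, Y)] for each action under the current belief
n = 400_000

# sigma(x) = P_xi( x in argmax_x' u(x') + xi(x') ), estimated by Monte Carlo.
xi = rng.gumbel(size=(n, u.size))   # i.i.d. payoff perturbations, one per action
sigma_mc = np.bincount(np.argmax(u + xi, axis=1), minlength=u.size) / n

# For Gumbel perturbations this probability is the logit (softmax) rule.
sigma_logit = np.exp(u) / np.exp(u).sum()
print(sigma_mc, sigma_logit)
```

Every action receives positive probability, which is what makes mixed strategies reachable as limits of optimal behavior; any other absolutely continuous perturbation distribution works as well, only without the closed form.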
In other words, if σ^i is an optimal strategy, then σ^i(x^i | s^i) is the probability that x^i is optimal when the signal is s^i, taken over all possible realizations of the perturbation ξ^i. The definition of Berk-Nash equilibrium of a perturbed game G_P is analogous to Definition 1, with the only difference that optimality must be required with respect to the perturbed game.

We fix a perturbed game G_P and assume that players repeatedly play the corresponding objective game at each t = 0, 1, 2, ..., where the time-t state and signals, (ω_t, s_t), and perturbations ξ_t, are independently drawn every period from the same distributions p and P_ξ, respectively. In addition, each player i has a prior µ^i_0 with full support over her (finite-dimensional) parameter set, Θ^i. At the end of every period t, each player uses Bayes' rule and the information obtained in all past periods (her own signals, actions, and consequences) to update beliefs. Players believe that they face a stationary environment and myopically maximize the current period's expected payoff.

Let ∆̂(Θ^i) denote the set of probability distributions on Θ^i with full support. Let B^i : ∆̂(Θ^i) × S^i × X^i × Y^i → ∆̂(Θ^i) denote the Bayesian operator of player i: for all A ⊆ Θ^i Borel measurable and all (µ^i, s^i, x^i, y^i) ∈ ∆̂(Θ^i) × S^i × X^i × Y^i,

B^i(µ^i, s^i, x^i, y^i)(A) = ∫_A Q^i_θ(y^i | s^i, x^i) µ^i(dθ) / ∫_{Θ^i} Q^i_θ(y^i | s^i, x^i) µ^i(dθ).

(We restrict attention to parametric models (i.e., finite-dimensional parameter spaces) because, otherwise, Bayesian updating need not converge to the truth for most priors and parameter values even in correctly specified statistical settings (Freedman (1963), Diaconis and Freedman (1986)).)

Bayesian updating is well defined by Assumption 1. Because players believe they face a stationary environment with i.i.d. perturbations, it is without loss of generality to restrict player i's behavior at time t to depend on (µ^i_t, s^i_t, ξ^i_t).

Definition 5. A policy of player i is a sequence of functions φ^i = (φ^i_t)_t, where φ^i_t : ∆(Θ^i) × S^i × Ξ^i → X^i. A policy φ^i is optimal if φ^i_t ∈ Ψ^i for all t. A policy profile φ = (φ^i)_{i∈I} is optimal if φ^i is optimal for all i ∈ I.

Let H ⊆ (S × Ξ × X × Y)^∞ denote the set of histories, where any history h = (s_0, ξ_0, x_0, y_0, ..., s_t, ξ_t, x_t, y_t, ...) ∈ H satisfies the feasibility restriction: for all i ∈ I and all t, y^i_t = f^i(x^i_t, x^{−i}_t, ω_t) for some ω_t ∈ supp(p_{Ω|S^i}(· | s^i_t)). Let P_{µ_0,φ} denote the probability distribution over H that is induced by the priors µ_0 = (µ^i_0)_{i∈I} and the policy profile φ = (φ^i)_{i∈I}. Let (µ_t)_t denote the sequence of beliefs µ_t : H → ×_{i∈I} ∆(Θ^i) such that, for all t ≥ 1 and i ∈ I, µ^i_t is the posterior at time t defined recursively by µ^i_t(h) = B^i(µ^i_{t−1}(h), s^i_{t−1}(h), x^i_{t−1}(h), y^i_{t−1}(h)) for all h ∈ H, where s^i_{t−1}(h) is player i's signal at t − 1 under history h, and similarly for x^i_{t−1}(h) and y^i_{t−1}(h).

Definition 6.
The sequence of intended strategy profiles given policy profile φ = (φ^i)_{i∈I} is the sequence (σ_t)_t of random variables σ_t : H → ×_{i∈I} ∆(X^i)^{S^i} such that, for all t, all i ∈ I, and all (x^i, s^i) ∈ X^i × S^i,

σ^i_t(h)(x^i | s^i) = P_ξ( ξ^i : φ^i_t(µ^i_t(h), s^i, ξ^i) = x^i ).   (7)

An intended strategy profile σ_t describes how each player would behave at time t for each possible signal; it is a random variable because it depends on the players' beliefs at time t, µ_t, which in turn depend on the past history.

(By Assumption 1(ii)-(iii), there exists θ ∈ Θ and an open ball containing it such that Q^i_{θ′} > 0 for all θ′ in the ball. Thus the Bayesian operator is well defined for any µ^i ∈ ∆̂(Θ^i). Moreover, by Assumption 1(iii), such θ's are dense in Θ, so the Bayesian operator maps ∆̂(Θ^i) into itself.)

Definition 7.
A strategy profile σ is stable [or strongly stable] under policy profile φ if the sequence of intended strategies, (σ_t)_t, converges to σ with positive probability [or with probability one], i.e., P_{µ_0,φ}( lim_{t→∞} ||σ_t(h) − σ|| = 0 ) > 0 [or = 1].

Lemma 2 says that, if behavior stabilizes to a strategy profile σ, then, for each player i, beliefs become increasingly concentrated on Θ^i(σ). This result extends findings from the statistics literature on misspecified learning (Berk (1966), Bunke and Milhaud (1998)) to a setting with active learning (i.e., players learn from data that is endogenously generated by their own actions). Three new issues arise: (i) Previous results need to be extended to the case of non-i.i.d. and endogenous data; (ii) It is not obvious that steady-state beliefs can be characterized based on steady-state behavior, independently of the path of play (Assumption 1 plays an important role here; see Section 5 for an example); (iii) We allow the wKLD function to be nonfinite so that players can believe that other players follow pure strategies.

Lemma 2.
Suppose that, for a policy profile φ, the sequence of intended strategies, (σ_t)_t, converges to σ for all histories in a set H̄ ⊆ H such that P_{µ_0,φ}(H̄) > 0. Then, for all open sets U^i ⊇ Θ^i(σ), lim_{t→∞} µ^i_t(U^i) = 1, a.s.-P_{µ_0,φ} in H̄.

Proof. See the Appendix.

The sketch of the proof of Lemma 2 is as follows (we omit the i subscript to ease the notational burden). Consider an arbitrary ε > 0 and the set Θ^ε(σ) ⊆ Θ defined as the points which are within ε distance of Θ(σ). The time-t posterior over the complement of Θ^ε(σ), µ_t(Θ ∖ Θ^ε(σ)), can be expressed as

∫_{Θ∖Θ^ε(σ)} ∏_{τ=0}^{t−1} Q_θ(y_τ | s_τ, x_τ) µ_0(dθ) / ∫_Θ ∏_{τ=0}^{t−1} Q_θ(y_τ | s_τ, x_τ) µ_0(dθ) = ∫_{Θ∖Θ^ε(σ)} e^{t K_t(θ)} µ_0(dθ) / ∫_Θ e^{t K_t(θ)} µ_0(dθ),

where K_t(θ) equals minus the sample log-likelihood ratio, K_t(θ) = −(1/t) Σ_{τ=0}^{t−1} ln [ Q_{σ_τ}(y_τ | s_τ, x_τ) / Q_θ(y_τ | s_τ, x_τ) ]. This expression and straightforward algebra imply that

µ_t(Θ ∖ Θ^ε(σ)) ≤ ∫_{Θ∖Θ^ε(σ)} e^{t(K_t(θ) + K(σ,θ̄) + δ)} µ_0(dθ) / ∫_{Θ^η(σ)} e^{t(K_t(θ) + K(σ,θ̄) + δ)} µ_0(dθ)

for any δ > 0, θ̄ ∈ Θ(σ), and η > 0. The integral in the numerator is taken over points which are "ε-separated" from Θ(σ), whereas the integral in the denominator is taken over points which are "η-close" to Θ(σ). Intuitively, if K_t(·) behaves asymptotically like −K(σ, ·), there exist sufficiently small δ > 0 and η > 0 such that K_t(θ) + K(σ, θ̄) + δ is negative for all θ which are "ε-separated" from Θ(σ), and positive for all θ which are "η-close" to Θ(σ).

(For example, if player 1 believes that player 2 plays A with probability θ and B with probability 1 − θ, then the wKLD function is infinity at θ = 1 if player 2 plays B with positive probability.)
Thus, the numerator converges to zero, whereas the denominator diverges to infinity, provided that Θ^η(σ) has positive measure under the prior.

The nonstandard part of the proof consists of establishing that Θ^η(σ) has positive measure under the prior, which relies on Assumption 1, and that K_t(·) indeed behaves asymptotically like −K(σ, ·). By virtue of Fatou's lemma, for θ ∈ Θ^η(σ) it suffices to show almost sure pointwise convergence of K_t(θ) to −K(σ, θ); this is done in Claim B(i) in the Appendix and relies on a LLN argument for non-iid variables. On the other hand, over θ ∈ Θ ∖ Θ^ε(σ), we need to control the asymptotic behavior of K_t(·) uniformly to be able to interchange the limit and the integral. In Claims B(ii) and B(iii) in the Appendix, we establish that there exists α > 0 such that, asymptotically, K_t(θ) < −K(σ, θ̄) − α for all θ ∈ Θ ∖ Θ^ε(σ).

While Lemma 2 implies that the support of posteriors converges, posteriors need not converge. We can always find, however, a subsequence of posteriors that converges. By continuity of behavior in beliefs and the assumption that players are myopic, the stable strategy profile must be statically optimal. Thus, we obtain the following characterization of the set of stable strategy profiles when players follow optimal policies.

Theorem 2.
Suppose that a strategy profile σ is stable under an optimal policy profile for a perturbed game. Then σ is a Berk-Nash equilibrium of the perturbed game.

Proof. Let φ denote the optimal policy profile under which σ is stable. By Lemma 2, there exists H̄ ⊆ H with P_{µ_0,φ}(H̄) > 0 such that, for all h ∈ H̄, lim_{t→∞} σ_t(h) = σ and lim_{t→∞} µ^i_t(U^i) = 1 for all i ∈ I and all open sets U^i ⊇ Θ^i(σ); for the remainder of the proof, fix any h ∈ H̄. For all i ∈ I, compactness of ∆(Θ^i) implies the existence of a subsequence, which we denote as (µ^i_{t(j)})_j, such that µ^i_{t(j)} converges (weakly) to µ^i_∞ (the limit could depend on h). We conclude by showing, for all i ∈ I:

(i) µ^i_∞ ∈ ∆(Θ^i(σ)): Suppose not, so that there exists θ̂^i ∈ supp(µ^i_∞) such that θ̂^i ∉ Θ^i(σ). Then, since Θ^i(σ) is closed (by Lemma 1), there exists an open set U^i ⊃ Θ^i(σ) with closure Ū^i such that θ̂^i ∉ Ū^i. Then µ^i_∞(Ū^i) < 1, but this contradicts the fact that µ^i_∞(Ū^i) ≥ lim sup_{j→∞} µ^i_{t(j)}(Ū^i) ≥ lim_{j→∞} µ^i_{t(j)}(U^i) = 1, where the first inequality holds because Ū^i is closed and µ^i_{t(j)} converges (weakly) to µ^i_∞.

(ii) σ^i is optimal for the perturbed game given µ^i_∞ ∈ ∆(Θ^i): σ^i(x^i | s^i) = lim_{j→∞} σ^i_{t(j)}(h)(x^i | s^i) = lim_{j→∞} P_ξ( ξ^i : x^i ∈ Ψ^i(µ^i_{t(j)}, s^i, ξ^i) ) = P_ξ( ξ^i : x^i ∈ Ψ^i(µ^i_∞, s^i, ξ^i) ), where the second equality follows because φ^i is optimal and Ψ^i is single-valued, a.s.-P_{ξ^i}, and the third equality follows from a standard continuity argument.

Theorem 2 provides our main justification for Berk-Nash equilibria: any strategy profile that is not an equilibrium cannot represent limiting behavior of optimizing players. Theorem 2, however, does not imply that behavior stabilizes. It is well known that convergence is not guaranteed for Nash equilibrium, which is a special case of Berk-Nash equilibrium. Thus, some assumption needs to be relaxed to prove convergence for general games. Fudenberg and Kreps (1993) show that a converse for the case of Nash equilibrium can be obtained by relaxing optimality and allowing players to make vanishing optimization mistakes.
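The belief dynamics behind Lemma 2 can be illustrated in the simplest passive case (a single action and i.i.d. data), with toy numbers of our own: observations are N(2, 1), but the agent's parameter set only contains means in [−1, 1], so the posterior concentrates on the KL-minimizing boundary point θ = 1 rather than on any true parameter.

```python
import numpy as np

rng = np.random.default_rng(0)

theta = np.linspace(-1.0, 1.0, 201)    # misspecified model: true mean 2 is outside
log_post = np.zeros_like(theta)        # flat prior over the grid

for y in rng.normal(2.0, 1.0, size=3000):    # i.i.d. draws from the true model
    log_post += -0.5 * (y - theta) ** 2      # Gaussian log-likelihood, unit variance

post = np.exp(log_post - log_post.max())
post /= post.sum()

theta_hat = theta[np.argmax(post)]
mass_near_min = post[theta >= 0.99].sum()
print(theta_hat, mass_near_min)    # posterior piles up on the KL minimizer 1.0
```

With endogenous data the same logic applies, but, as the proof of Lemma 2 emphasizes, the likelihood terms are no longer i.i.d. and the data distribution depends on the agent's own actions, which is what the nonstandard parts of the proof handle.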
Definition 8.
A policy profile φ is asymptotically optimal if there exists a positive real-valued sequence (ε_t)_t with lim_{t→∞} ε_t = 0 such that, for all i ∈ I, all (µ^i, s^i, ξ^i) ∈ ∆(Θ^i) × S^i × Ξ^i, all t, and all x^i ∈ X^i,

E_{Q̄^i_{µ^i}(·|s^i, x^i_t)}[ π^i(x^i_t, Y^i) ] + ξ^i(x^i_t) ≥ E_{Q̄^i_{µ^i}(·|s^i, x^i)}[ π^i(x^i, Y^i) ] + ξ^i(x^i) − ε_t,

where x^i_t = φ^i_t(µ^i, s^i, ξ^i).

Fudenberg and Kreps' (1993) insight is to suppose that players are convinced early on that the equilibrium strategy is the right one to play, and continue to play this strategy unless they have strong enough evidence to think otherwise. And, as they continue to play the equilibrium strategy, the evidence increasingly convinces them that it is the right thing to do. This idea, however, need not work for Berk-Nash equilibrium because beliefs may not converge if the model is misspecified (see Berk (1966) for an example). If the game is weakly identified, however, Lemma 2 and Fudenberg and Kreps' (1993) insight can be combined to obtain the following converse of Theorem 2.

(Ψ^i is single-valued a.s.-P_{ξ^i} because the set of ξ^i for which Ψ^i(µ^i, s^i, ξ^i) contains more than one element has Lebesgue measure zero in R^{X^i} and, by absolute continuity of P_{ξ^i}, this set has measure zero. Jordan (1993) shows that non-convergence is robust to the choice of initial conditions; Benaim and Hirsch (1999) replicate this finding for the perturbed version of Jordan's game. In the game-theory literature, general global convergence results have only been obtained in special classes of games, e.g., zero-sum, potential, and supermodular games (Hofbauer and Sandholm, 2002).)

Theorem 3.
Suppose that σ is a Berk-Nash equilibrium of a perturbed game that is weakly identified given σ. Then there exists a profile of priors with full support and an asymptotically optimal policy profile φ such that σ is strongly stable under φ.

Proof. See Online Appendix B.

importance of assumption 1. The following example illustrates that equilibrium may not exist and Lemma 2 fails if Assumption 1 does not hold. A single agent chooses action x ∈ {A, B} and obtains outcome y ∈ {0, 1}. The agent's model is parameterized by θ = (θ_A, θ_B), where Q_θ(y = 1 | A) = θ_A and Q_θ(y = 1 | B) = θ_B. The true model is θ_0 = (1/2, 3/4), but the agent considers only θ_1 = (0, 3/4) and θ_2 = (1/2, 1/4) to be possible, i.e., Θ = {θ_1, θ_2}. In particular, Assumption 1(iii) fails for parameter value θ_1. Suppose that A is uniquely optimal for parameter value θ_1 and B is uniquely optimal for θ_2 (further details about payoffs are not needed).

(The requirement that the priors have full support makes the statement non-trivial. Assumption 1(iii) would hold if, for some ε̄ > 0, θ_ε = (ε, 3/4) were also in Θ for all 0 < ε ≤ ε̄.)
Berk-Nash equilibrium does not exist: If A is played with positive probability, then the wKLD is infinite at θ₁ (i.e., θ₁ cannot rationalize y = 1 given A) and θ₂ is the best fit; but then A is not optimal. If B is played with probability 1, then θ₁ is the best fit; but then B is not optimal. In addition, Lemma 2 fails: Suppose that the path of play converges to pure strategy B. The best fit given B is θ₁, but the posterior need not converge weakly to a degenerate probability distribution on θ₁; it is possible that, along the path of play, the agent tried action A and observed y = 1, in which case the posterior would immediately assign probability 1 to θ₂.

forward-looking agents. In the dynamic model, we assumed that players are myopic. In Online Appendix C, we extend Theorem 2 to the case of non-myopic players who solve a dynamic optimization problem with beliefs as a state variable. A key fact used in the proof of Theorem 2 is that myopically optimal behavior is continuous in beliefs. Non-myopic optimal behavior is also continuous in beliefs, but the issue is that it may not coincide with myopic behavior in the steady state if players still have incentives to experiment. We prove the extension by requiring that the game is weakly identified, which guarantees that players have no incentives to experiment in steady state.

large population models.
The framework assumes that there is a fixed number of players but, by focusing on stationary subjective models, rules out aspects of "repeated games" where players attempt to influence each others' play. In Online Appendix D, we adapt the equilibrium concept to settings in which there is a population of a large number of agents in the role of each player, so that agents have negligible incentives to influence each other's play.

extensive-form games. Our results hold for an alternative timing where player i commits to a signal-contingent plan of action (i.e., a strategy) and observes both the realized signal s^i and the consequence y^i ex post. In particular, Berk-Nash equilibrium is applicable to extensive-form games provided that players compete by choosing contingent plans of action and know the extensive form. But the right approach is less clear if players have a misspecified view of the extensive form (for example, they may not even know the set of strategies available to them) or if players play the game sequentially (for example, we would need to define and update beliefs at each information set). The extension to extensive-form games is left for future work.

(Footnote: Jehiel (1995) considers the class of repeated alternating-move games and assumes that players only forecast a limited number of time periods into the future; see Jehiel (1998) for a learning foundation. Jehiel and Samet (2007) consider the general class of extensive-form games with perfect information and assume that players simplify the game by partitioning the nodes into similarity classes. In both cases, players are required to have correct beliefs, given their limited or simplified view of the game.)

relationship to bounded rationality literature. By providing a language that makes the underlying misspecification explicit, we offer some guidance for choosing between different models of bounded rationality. For example, we could model the observed behavior of an instructor in Example 2.3 by directly assuming that she believes criticism improves performance and praise worsens it. But extrapolating this observed belief to other contexts may lead to erroneous conclusions. Instead, we postulate what we think is a plausible misspecification (i.e., failure to account for regression to the mean) and then derive beliefs endogenously, as a function of the context.

We mentioned in the paper several instances of bounded rationality that can be formalized via misspecified, endogenous learning. Other examples in the literature can also be viewed as restricting beliefs using the wKLD measure, but fall outside the scope of our paper either because interactions are mediated by a price or because the problem is dynamic (we focus on the repetition of a static problem). For example, Blume and Easley (1982) and Rabin and Vayanos (2010) explicitly characterize beliefs using the limit of a likelihood function, while Bray (1982), Radner (1982), Sargent (1993), and Evans and Honkapohja (2001) focus specifically on OLS learning with misspecified models. Piccione and Rubinstein (2003), Eyster and Piccione (2013), and Spiegler (2013) study pattern recognition in dynamic settings and impose consistency requirements on beliefs that could be interpreted as minimizing the wKLD measure. In the sampling equilibrium of Osborne and Rubinstein (1998) and Spiegler (2006), beliefs may be incorrect due to learning from a limited sample, rather than from misspecified learning. Other instances of bounded rationality that do not seem naturally fitted to misspecified learning include biases in information processing due to computational complexity (e.g., Rubinstein (1986), Salant (2011)), bounded memory (e.g., Wilson, 2003), self-deception (e.g., Bénabou and Tirole (2002), Compte and Postlewaite (2004)), or sparsity-based optimization (Gabaix (2014)).

(Footnote: This assumption corresponds to a singleton set Θ, thus fixing beliefs at the outset and leaving no space for learning. This approach is common in past work that assumes that agents have a misspecified model but there is no learning about parameter values, e.g., Barberis et al. (1998).)

References

Al-Najjar, N., "Decision Makers as Statisticians: Diversity, Ambiguity and Learning,"
Econometrica, 2009, (5), 1371–1401.
——— and M. Pai, "Coarse decision making and overfitting," Journal of Economic Theory, forthcoming, 2013.
Aliprantis, C.D. and K.C. Border, Infinite Dimensional Analysis: A Hitchhiker's Guide, Springer Verlag, 2006.
Aragones, E., I. Gilboa, A. Postlewaite, and D. Schmeidler, "Fact-Free Learning," American Economic Review, 2005, (5), 1355–1368.
Arrow, K. and J. Green, "Notes on Expectations Equilibria in Bayesian Settings," Institute for Mathematical Studies in the Social Sciences Working Paper No. 33, 1973.
Barberis, N., A. Shleifer, and R. Vishny, "A model of investor sentiment," Journal of Financial Economics, 1998, (3), 307–343.
Battigalli, P., Comportamento razionale ed equilibrio nei giochi e nelle situazioni sociali, Università Bocconi, Milano, 1987.
———, S. Cerreia-Vioglio, F. Maccheroni, and M. Marinacci, "Self-confirming equilibrium and model uncertainty," Technical Report, 2012.
Bénabou, Roland and Jean Tirole, "Self-confidence and personal motivation," The Quarterly Journal of Economics, 2002, (3), 871–915.
Benaim, M. and M.W. Hirsch, "Mixed equilibria and dynamical systems arising from fictitious play in perturbed games," Games and Economic Behavior, 1999, (1-2), 36–72.
Benaim, Michel, "Dynamics of stochastic approximation algorithms," in "Séminaire de Probabilités XXXIII," Vol. 1709 of Lecture Notes in Mathematics, Springer Berlin Heidelberg, 1999, pp. 1–68.
Berk, R.H., "Limiting behavior of posterior distributions when the model is incorrect," The Annals of Mathematical Statistics, 1966, (1), 51–58.
Bickel, P.J., C.A.J. Klaassen, Y. Ritov, and J.A. Wellner, Efficient and Adaptive Estimation for Semiparametric Models, Johns Hopkins University Press, Baltimore, 1993.
Billingsley, P., Probability and Measure, Wiley, 1995.
Blume, L.E. and D. Easley, "Learning to be Rational,"
Journal of Economic Theory, 1982, (2), 340–351.
Bray, M., "Learning, estimation, and the stability of rational expectations," Journal of Economic Theory, 1982, (2), 318–339.
Bunke, O. and X. Milhaud, "Asymptotic behavior of Bayes estimates under possibly incorrect models," The Annals of Statistics, 1998, (2), 617–644.
Compte, Olivier and Andrew Postlewaite, "Confidence-enhanced performance," American Economic Review, 2004, pp. 1536–1557.
Dekel, E., D. Fudenberg, and D.K. Levine, "Learning to play Bayesian games," Games and Economic Behavior, 2004, (2), 282–303.
Diaconis, P. and D. Freedman, "On the consistency of Bayes estimates," The Annals of Statistics, 1986, pp. 1–26.
Doraszelski, Ulrich and Juan F. Escobar, "A theory of regular Markov perfect equilibria in dynamic stochastic games: Genericity, stability, and purification," Theoretical Economics, 2010, (3), 369–402.
Durrett, R., Probability: Theory and Examples, Cambridge University Press, 2010.
Easley, D. and N.M. Kiefer, "Controlling a stochastic process with unknown parameters," Econometrica, 1988, pp. 1045–1064.
Esponda, I., "Behavioral equilibrium in economies with adverse selection," The American Economic Review, 2008, (4), 1269–1291.
——— and D. Pouzo, "Learning Foundation for Equilibrium in Voting Environments with Private Information," working paper, 2011.
Esponda, I. and D. Pouzo, "Berk-Nash Equilibrium: A Framework for Modeling Agents with Misspecified Models,"
ArXiv 1411.1152 , November 2014.
Evans, G. W. and S. Honkapohja , Learning and Expectations in Macroeconomics ,Princeton University Press, 2001.
Eyster, E. and M. Rabin, "Cursed equilibrium," Econometrica, 2005, (5), 1623–1672.
Eyster, Erik and Michele Piccione, "An approach to asset-pricing under incomplete and diverse perceptions," Econometrica, 2013, (4), 1483–1506.
Freedman, D.A., "On the asymptotic behavior of Bayes' estimates in the discrete case," The Annals of Mathematical Statistics, 1963, (4), 1386–1403.
Fudenberg, D. and D. Kreps, "Learning Mixed Equilibria," Games and Economic Behavior, 1993, 320–367.
——— and D.K. Levine, "Self-confirming equilibrium," Econometrica, 1993, pp. 523–545.
——— and ———, "Steady state learning and Nash equilibrium," Econometrica, 1993, pp. 547–573.
——— and ———, The Theory of Learning in Games, Vol. 2, The MIT Press, 1998.
——— and ———, "Learning and Equilibrium," Annual Review of Economics, 2009, 385–420.
——— and D.M. Kreps, "A Theory of Learning, Experimentation, and Equilibrium in Games," Technical Report, mimeo, 1988.
——— and ———, "Learning in extensive-form games I. Self-confirming equilibria," Games and Economic Behavior, 1995, (1), 20–55.
Gabaix, Xavier, "A sparsity-based model of bounded rationality," The Quarterly Journal of Economics, 2014, (4), 1661–1710.
Harsanyi, J.C., "Games with randomly disturbed payoffs: A new rationale for mixed-strategy equilibrium points," International Journal of Game Theory, 1973, (1), 1–23.
Hirsch, M. W., S. Smale, and R. L. Devaney, Differential Equations, Dynamical Systems, and An Introduction to Chaos, Elsevier Academic Press, 2004.
Hofbauer, J. and W.H. Sandholm, "On the global convergence of stochastic fictitious play," Econometrica, 2002, (6), 2265–2294.
Jehiel, P., "Analogy-based expectation equilibrium," Journal of Economic Theory, 2005, (2), 81–104.
——— and D. Samet, "Valuation equilibrium," Theoretical Economics, 2007, (2), 163–185.
——— and F. Koessler, "Revisiting games of incomplete information with analogy-based expectations," Games and Economic Behavior, 2008, (2), 533–557.
Jehiel, Philippe, "Learning to play limited forecast equilibria," Games and Economic Behavior, 1998, (2), 274–298.
Jehiel, Philippe, "Limited horizon forecast in repeated alternate games," Journal of Economic Theory, 1995, (2), 497–519.
Jordan, J. S., "Three problems in learning mixed-strategy Nash equilibria," Games and Economic Behavior, 1993, (3), 368–386.
Kagel, J.H. and D. Levin, "The winner's curse and public information in common value auctions," The American Economic Review, 1986, pp. 894–920.
Kalai, E. and E. Lehrer, "Rational learning leads to Nash equilibrium," Econometrica, 1993, pp. 1019–1045.
Kirman, A. P., "Learning by firms about demand conditions," in R. H. Day and T. Groves, eds., Adaptive Economic Models, Academic Press, 1975, pp. 137–156.
Kreps, D. M. and R. Wilson, "Sequential equilibria," Econometrica, 1982, pp. 863–894.
Kullback, S. and R. A. Leibler, "On Information and Sufficiency," Annals of Mathematical Statistics, 1951, (1), 79–86.
Kushner, H. J. and G. G. Yin, Stochastic Approximation and Recursive Algorithms and Applications, Springer Verlag, 2003.
McLennan, A., "Price dispersion and incomplete learning in the long run," Journal of Economic Dynamics and Control, 1984, (3), 331–347.
Nyarko, Y., "Learning in mis-specified models and the possibility of cycles," Journal of Economic Theory, 1991, (2), 416–427.
———, "On the convexity of the value function in Bayesian optimal control problems," Economic Theory, 1994, (2), 303–309.
Osborne, M.J. and A. Rubinstein, "Games with procedurally rational players," American Economic Review, 1998, 834–849.
Piccione, M. and A. Rubinstein, "Modeling the economic interaction of agents with diverse abilities to recognize equilibrium patterns," Journal of the European Economic Association, 2003, (1), 212–223.
Pollard, D., A User's Guide to Measure Theoretic Probability, Cambridge University Press, 2001.
Rabin, M., "Inference by Believers in the Law of Small Numbers," Quarterly Journal of Economics, 2002, (3), 775–816.
——— and D. Vayanos, "The gambler's and hot-hand fallacies: Theory and applications," The Review of Economic Studies, 2010, (2), 730–778.
Radner, R., Equilibrium Under Uncertainty, Vol. II of Handbook of Mathematical Economics, North-Holland Publishing Company, 1982.
Rothschild, M., "A two-armed bandit theory of market pricing," Journal of Economic Theory, 1974, (2), 185–202.
Rubinstein, A. and A. Wolinsky, "Rationalizable conjectural equilibrium: between Nash and rationalizability," Games and Economic Behavior, 1994, (2), 299–311.
Rubinstein, Ariel, "Finite automata play the repeated prisoner's dilemma," Journal of Economic Theory, 1986, (1), 83–96.
Salant, Y., "Procedural analysis of choice rules with applications to bounded rationality," The American Economic Review, 2011, (2), 724–748.
Sargent, T. J., Bounded Rationality in Macroeconomics, Oxford University Press, 1993.
———, The Conquest of American Inflation, Princeton University Press, 1999.
Schwartzstein, J., "Selective Attention and Learning," working paper, 2009.
Selten, R., "Reexamination of the perfectness concept for equilibrium points in extensive games," International Journal of Game Theory, 1975, (1), 25–55.
Sobel, J., "Non-linear prices and price-taking behavior," Journal of Economic Behavior & Organization, 1984, (3), 387–396.
Spiegler, R., "The Market for Quacks," Review of Economic Studies, 2006, 1113–1131.
———, "Placebo reforms," The American Economic Review, 2013, (4), 1490–1506.
———, "Bayesian Networks and Boundedly Rational Expectations," Working Paper, 2014.
Stein, Charles, "A bound for the error in the normal approximation to the distribution of a sum of dependent random variables," in "Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, Volume 2: Probability Theory," University of California Press, Berkeley, Calif., 1972, pp. 583–602.
Tversky, A. and D. Kahneman, "Availability: A heuristic for judging frequency and probability," Cognitive Psychology, 1973, 207–232.
White, Halbert, "Maximum likelihood estimation of misspecified models," Econometrica: Journal of the Econometric Society, 1982, pp. 1–25.
Wilson, A., "Bounded Memory and Biases in Information Processing," Working Paper, 2003.

Appendix
Let $Z^i = \{(s^i, x^i, y^i) \in S^i \times X^i \times Y^i : y^i = f^i(x^i, x^{-i}, \omega),\; x^{-i} \in X^{-i},\; \omega \in \mathrm{supp}(p_{\Omega|S^i}(\cdot\mid s^i))\}$. For all $z^i = (s^i, x^i, y^i) \in Z^i$, define $\bar P^i_\sigma(z^i) = Q^i_\sigma(y^i\mid s^i, x^i)\,\sigma^i(x^i\mid s^i)\, p_{S^i}(s^i)$. We sometimes abuse notation and write $Q^i_\sigma(z^i) \equiv Q^i_\sigma(y^i\mid s^i, x^i)$, and similarly for $Q^i_{\theta^i}$. The following claim is used in the proofs below.

Claim A.
For all $i \in I$: (i) There exist $\theta^{i*} \in \Theta^i$ and $\bar K < \infty$ such that, for all $\sigma \in \Sigma$, $K^i(\sigma, \theta^{i*}) \le \bar K$. (ii) Fix any $\theta^i \in \Theta^i$ with $Q^i_{\theta^i}(z^i) > 0$ for all $z^i \in Z^i$, and any $(\sigma_n)_n$ with $\lim_{n\to\infty}\sigma_n = \sigma$. Then $\lim_{n\to\infty} K^i(\sigma_n, \theta^i) = K^i(\sigma, \theta^i)$. (iii) $K^i$ is (jointly) lower semicontinuous: fix any $(\theta^i_n)_n$ and $(\sigma_n)_n$ such that $\lim_{n\to\infty}\theta^i_n = \theta^i$ and $\lim_{n\to\infty}\sigma_n = \sigma$. Then $\liminf_{n\to\infty} K^i(\sigma_n, \theta^i_n) \ge K^i(\sigma, \theta^i)$. (iv) Let $\xi^i$ be a random vector in $\mathbb{R}^{X^i}$ with absolutely continuous probability distribution $P_\xi$. Then, for all $(s^i, x^i) \in S^i \times X^i$, the map
$$\mu^i \mapsto \sigma^i(\mu^i)(x^i\mid s^i) = P_\xi\Big(\xi^i : x^i \in \arg\max_{\bar x^i \in X^i} E_{\bar Q^i_{\mu^i}(\cdot\mid s^i,\bar x^i)}\big[\pi^i(\bar x^i, Y^i)\big] + \xi^i(\bar x^i)\Big)$$
is continuous.

Proof. (i) By Assumption 1 and finiteness of $Z^i$, there exist $\theta^{i*} \in \Theta^i$ and $\alpha \in (0,1)$ such that $Q^i_{\theta^{i*}}(z^i) \ge \alpha$ for all $z^i \in Z^i$. Thus, for all $\sigma \in \Sigma$, $K^i(\sigma, \theta^{i*}) \le -E_{\bar P^i_\sigma}[\ln Q^i_{\theta^{i*}}(Z^i)] \le -\ln\alpha$.

(ii) $K^i(\sigma_n, \theta^i) - K^i(\sigma, \theta^i) = \sum_{z^i\in Z^i}\big\{\big(\bar P^i_{\sigma_n}(z^i)\ln Q^i_{\sigma_n}(z^i) - \bar P^i_{\sigma}(z^i)\ln Q^i_{\sigma}(z^i)\big) + \big(\bar P^i_{\sigma}(z^i) - \bar P^i_{\sigma_n}(z^i)\big)\ln Q^i_{\theta^i}(z^i)\big\}$. The first term on the RHS converges to 0 because $\lim_{n\to\infty}\sigma_n = \sigma$, $Q_\sigma$ is continuous, and $x\ln x$ is continuous for all $x \in [0,1]$; the second term converges to 0 because $\lim_{n\to\infty}\sigma_n = \sigma$, $\bar P^i_\sigma$ is continuous, and $\ln Q^i_{\theta^i}(z^i)$ is finite for all $z^i \in Z^i$.

(iii) $K^i(\sigma_n, \theta^i_n) - K^i(\sigma, \theta^i) = \sum_{z^i\in Z^i}\big\{\big(\bar P^i_{\sigma_n}(z^i)\ln Q^i_{\sigma_n}(z^i) - \bar P^i_{\sigma}(z^i)\ln Q^i_{\sigma}(z^i)\big) + \big(\bar P^i_{\sigma}(z^i)\ln Q^i_{\theta^i}(z^i) - \bar P^i_{\sigma_n}(z^i)\ln Q^i_{\theta^i_n}(z^i)\big)\big\}$. The first term on the RHS converges to 0 (same argument as in part (ii)). The proof concludes by showing that, for all $z^i \in Z^i$,
$$\liminf_{n\to\infty} -\bar P^i_{\sigma_n}(z^i)\ln Q^i_{\theta^i_n}(z^i) \ge -\bar P^i_{\sigma}(z^i)\ln Q^i_{\theta^i}(z^i). \quad (8)$$
Suppose $\liminf_{n\to\infty} -\bar P^i_{\sigma_n}(z^i)\ln Q^i_{\theta^i_n}(z^i) \le M < \infty$ (if not, (8) holds trivially). Then either (i) $\bar P^i_{\sigma_n}(z^i) \to \bar P^i_{\sigma}(z^i) > 0$, in which case (8) holds with equality (by continuity of $\theta^i \mapsto Q^i_{\theta^i}$), or (ii) $\bar P^i_{\sigma_n}(z^i) \to \bar P^i_{\sigma}(z^i) = 0$, in which case (8) holds because its RHS is 0 (by the convention that $0\ln 0 = 0$) and its LHS is always nonnegative.

(iv) The proof is standard and, therefore, omitted. □

Proof of Lemma 1.
Part (i). Note that
$$K^i(\sigma, \theta^i) \ge -\sum_{(s^i,x^i)\in S^i\times X^i} \ln\Big(E_{Q^i_\sigma(\cdot\mid s^i,x^i)}\Big[\frac{Q^i_{\theta^i}(Y^i\mid s^i,x^i)}{Q^i_{\sigma}(Y^i\mid s^i,x^i)}\Big]\Big)\,\sigma^i(x^i\mid s^i)\, p_{S^i}(s^i) = 0$$
by Jensen's inequality applied to $\ln(\cdot)$, and it holds with equality if and only if $Q^i_{\theta^i}(\cdot\mid s^i,x^i) = Q^i_{\sigma}(\cdot\mid s^i,x^i)$ for all $(s^i,x^i)$ such that $\sigma^i(x^i\mid s^i) > 0$ and $p_{S^i}(s^i) > 0$.

$\Theta^i(\sigma)$ is nonempty: By Claim A(i), there exists $\bar K < \infty$ such that the minimizers are in the constraint set $\{\theta^i \in \Theta^i : K^i(\sigma,\theta^i) \le \bar K\}$. Because $K^i(\sigma,\cdot)$ is continuous over a compact set, a minimum exists.

$\Theta^i(\sigma)$ is upper hemicontinuous (uhc): Fix any $(\sigma_n)_n$ and $(\theta^i_n)_n$ such that $\lim_{n\to\infty}\sigma_n = \sigma$, $\lim_{n\to\infty}\theta^i_n = \theta^i$, and $\theta^i_n \in \Theta^i(\sigma_n)$ for all $n$. We show that $\theta^i \in \Theta^i(\sigma)$ (so that $\Theta^i(\cdot)$ has a closed graph and, by compactness of $\Theta^i$, is uhc). Suppose, to obtain a contradiction, that $\theta^i \notin \Theta^i(\sigma)$. By Claim A(i), there exist $\hat\theta^i \in \Theta^i$ and $\varepsilon > 0$ such that $K^i(\sigma, \hat\theta^i) \le K^i(\sigma, \theta^i) - 2\varepsilon$ and $K^i(\sigma, \hat\theta^i) < \infty$. By Assumption 1, there exists $(\hat\theta^i_j)_j$ with $\lim_{j\to\infty}\hat\theta^i_j = \hat\theta^i$ and, for all $j$, $Q^i_{\hat\theta^i_j}(z^i) > 0$ for all $z^i \in Z^i$. We show that there is an element of the sequence, $\hat\theta^i_J$, that "does better" than $\theta^i_n$ given $\sigma_n$, which is a contradiction. Because $K^i(\sigma, \hat\theta^i) < \infty$, continuity of $K^i(\sigma,\cdot)$ implies that there exists $J$ large enough such that $|K^i(\sigma, \hat\theta^i_J) - K^i(\sigma, \hat\theta^i)| \le \varepsilon/2$. Moreover, Claim A(ii) applied to $\theta^i = \hat\theta^i_J$ implies that there exists $N_{\varepsilon,J}$ such that, for all $n \ge N_{\varepsilon,J}$, $|K^i(\sigma_n, \hat\theta^i_J) - K^i(\sigma, \hat\theta^i_J)| \le \varepsilon/2$. Thus, for all $n \ge N_{\varepsilon,J}$, $|K^i(\sigma_n, \hat\theta^i_J) - K^i(\sigma, \hat\theta^i)| \le |K^i(\sigma_n, \hat\theta^i_J) - K^i(\sigma, \hat\theta^i_J)| + |K^i(\sigma, \hat\theta^i_J) - K^i(\sigma, \hat\theta^i)| \le \varepsilon$. Therefore,
$$K^i(\sigma_n, \hat\theta^i_J) \le K^i(\sigma, \hat\theta^i) + \varepsilon \le K^i(\sigma, \theta^i) - \varepsilon. \quad (9)$$
Suppose $K^i(\sigma, \theta^i) < \infty$. By Claim A(iii), there exists $n_\varepsilon \ge N_{\varepsilon,J}$ such that $K^i(\sigma_{n_\varepsilon}, \theta^i_{n_\varepsilon}) \ge K^i(\sigma, \theta^i) - \varepsilon/2$. This result and expression (9) imply $K^i(\sigma_{n_\varepsilon}, \hat\theta^i_J) \le K^i(\sigma_{n_\varepsilon}, \theta^i_{n_\varepsilon}) - \varepsilon/2$. But this contradicts $\theta^i_{n_\varepsilon} \in \Theta^i(\sigma_{n_\varepsilon})$. Finally, if $K^i(\sigma, \theta^i) = \infty$, Claim A(iii) implies that there exists $n_\varepsilon \ge N_{\varepsilon,J}$ such that $K^i(\sigma_{n_\varepsilon}, \theta^i_{n_\varepsilon}) \ge \bar K$, where $\bar K$ is the bound defined in Claim A(i). But this also contradicts $\theta^i_{n_\varepsilon} \in \Theta^i(\sigma_{n_\varepsilon})$.

$\Theta^i(\sigma)$ is compact: As shown above, $\Theta^i(\cdot)$ has a closed graph, and so $\Theta^i(\sigma)$ is a closed set. Compactness of $\Theta^i(\sigma)$ follows from compactness of $\Theta^i$. □

Proof of Theorem 1.
We prove the result in two parts.
Part 1. We show existence of equilibrium in the perturbed game (defined in Section 4.1). Let $\Gamma : \times_{i\in I}\Delta(\Theta^i) \rightrightarrows \times_{i\in I}\Delta(\Theta^i)$ be a correspondence such that, for all $\mu = (\mu^i)_{i\in I} \in \times_{i\in I}\Delta(\Theta^i)$,
$$\Gamma(\mu) = \times_{i\in I}\,\Delta\big(\Theta^i(\sigma(\mu))\big),$$
where $\sigma(\mu) = (\sigma^i(\mu^i))_{i\in I} \in \Sigma$ is defined by
$$\sigma^i(\mu^i)(x^i\mid s^i) = P_\xi\Big(\xi^i : x^i \in \arg\max_{\bar x^i \in X^i} E_{\bar Q^i_{\mu^i}(\cdot\mid s^i,\bar x^i)}\big[\pi^i(\bar x^i, Y^i)\big] + \xi^i(\bar x^i)\Big) \quad (10)$$
for all $(x^i, s^i) \in X^i \times S^i$. Note that if there exists $\mu^* \in \times_{i\in I}\Delta(\Theta^i)$ such that $\mu^* \in \Gamma(\mu^*)$, then $\sigma^* \equiv (\sigma^i(\mu^{i*}))_{i\in I}$ is an equilibrium of the perturbed game. We show that such $\mu^*$ exists by checking the conditions of the Kakutani-Fan-Glicksberg fixed-point theorem: (i) $\times_{i\in I}\Delta(\Theta^i)$ is a compact, convex subset of a locally convex Hausdorff space: the set $\Delta(\Theta^i)$ is convex and, since $\Theta^i$ is compact, $\Delta(\Theta^i)$ is also compact under the weak topology (Aliprantis and Border (2006), Theorem 15.11). By Tychonoff's theorem, $\times_{i\in I}\Delta(\Theta^i)$ is compact too. Finally, the set is also locally convex under the weak topology. (ii) $\Gamma$ has convex, nonempty images: it is clear that $\Delta(\Theta^i(\sigma(\mu)))$ is convex-valued for all $\mu$; also, by Lemma 1, $\Theta^i(\sigma(\mu))$ is nonempty for all $\mu$. (iii) $\Gamma$ has a closed graph: let $(\mu_n, \hat\mu_n)_n$ be such that $\hat\mu_n \in \Gamma(\mu_n)$, $\mu_n \to \mu$, and $\hat\mu_n \to \hat\mu$ (under the weak topology). By Claim A(iv), $\mu^i \mapsto \sigma^i(\mu^i)$ is continuous. Thus, $\sigma_n \equiv (\sigma^i(\mu^i_n))_{i\in I} \to \sigma \equiv (\sigma^i(\mu^i))_{i\in I}$. By Lemma 1, $\sigma \mapsto \Theta^i(\sigma)$ is uhc; thus, by Theorem 17.13 in Aliprantis and Border (2006), $\sigma \mapsto \times_{i\in I}\Delta(\Theta^i(\sigma))$ is also uhc. Therefore, $\hat\mu \in \times_{i\in I}\Delta(\Theta^i(\sigma)) = \Gamma(\mu)$.

Part 2. Fix a sequence of perturbed games indexed by the probability of perturbations $(P_{\xi,n})_n$. By Part 1, there is a corresponding sequence of fixed points $(\mu_n)_n$ such that $\mu_n \in \times_{i\in I}\Delta(\Theta^i(\sigma_n))$ for all $n$, where $\sigma_n \equiv (\sigma^i(\mu^i_n, P_{\xi,n}))_{i\in I}$ (see equation (10), where we now explicitly account for the dependence on $P_{\xi,n}$). By compactness, there exist subsequences of $(\mu_n)_n$ and $(\sigma_n)_n$ that converge to $\mu$ and $\sigma$, respectively. Since $\sigma \mapsto \times_{i\in I}\Delta(\Theta^i(\sigma))$ is uhc, $\mu \in \times_{i\in I}\Delta(\Theta^i(\sigma))$. We now show that if we choose $(P_{\xi,n})_n$ such that, for all $\varepsilon > 0$, $\lim_{n\to\infty} P_{\xi,n}(\|\xi^i_n\| \ge \varepsilon) = 0$, then $\sigma$ is optimal given $\mu$ in the unperturbed game; this establishes existence of equilibrium in the unperturbed game. Suppose not, so that there exist $i, s^i, x^i, \hat x^i$, and $\varepsilon > 0$ such that $\sigma^i(x^i\mid s^i) > 0$ but $E_{\bar Q^i_{\mu^i}(\cdot\mid s^i,x^i)}[\pi^i(x^i, Y^i)] + 4\varepsilon \le E_{\bar Q^i_{\mu^i}(\cdot\mid s^i,\hat x^i)}[\pi^i(\hat x^i, Y^i)]$. By continuity of $\mu^i \mapsto \bar Q^i_{\mu^i}$ and the fact that $\lim_{n\to\infty}\mu^i_n = \mu^i$, there exists $n_0$ such that, for all $n \ge n_0$, $E_{\bar Q^i_{\mu^i_n}(\cdot\mid s^i,x^i)}[\pi^i(x^i, Y^i)] + 2\varepsilon \le E_{\bar Q^i_{\mu^i_n}(\cdot\mid s^i,\hat x^i)}[\pi^i(\hat x^i, Y^i)]$. It then follows from (10) and $\lim_{n\to\infty} P_{\xi,n}(\|\xi^i_n\| \ge \varepsilon) = 0$ that $\lim_{n\to\infty}\sigma^i(\mu^i_n, P_{\xi,n})(x^i\mid s^i) = 0$. But this contradicts $\lim_{n\to\infty}\sigma^i(\mu^i_n, P_{\xi,n})(x^i\mid s^i) = \sigma^i(x^i\mid s^i) > 0$. □

Proof of Proposition 3.
In the next paragraph, we prove the following result: for all $\sigma$ and $\bar\theta^i_\sigma \in \Theta^i(\sigma)$, (a) $Q^i_{\Omega,\bar\theta^i_\sigma}(\omega'\mid s^i) = p_{\Omega|S^i}(\omega'\mid s^i)$ for all $s^i \in S^i$, $\omega' \in \Omega$, and (b) $Q^i_{X^{-i},\bar\theta^i_\sigma}(x^{-i}\mid \alpha^i) = \sum_{\omega''\in\Omega} p_{\Omega|\mathcal{A}^i}(\omega''\mid\alpha^i)\prod_{j\ne i}\sigma^j(x^j\mid s^j(\omega''))$ for all $\alpha^i \in \mathcal{A}^i$, $x^{-i} \in X^{-i}$. Equivalence between Berk-Nash equilibrium and ABEE follows immediately from (a), (b), and the fact that the expected utility of player $i$ with signal $s^i$ and beliefs $\bar\theta_\sigma$ is $\sum_{\omega'\in\Omega} Q^i_{\Omega,\bar\theta^i_\sigma}(\omega'\mid s^i)\sum_{x^{-i}\in X^{-i}} Q^i_{X^{-i},\bar\theta^i_\sigma}(x^{-i}\mid \alpha^i(\omega'))\,\pi^i(\bar x^i, x^{-i}, \omega')$.

(Footnote to the proof of Theorem 1: the claim that the set is locally convex under the weak topology follows since the weak topology is induced by the family of seminorms $\rho(\mu,\mu') = |E_\mu[f] - E_{\mu'}[f]|$, for $f$ continuous and bounded and any $\mu, \mu'$ in $\Delta(\Theta^i)$.)

Proof of (a) and (b): $-K^i(\sigma,\theta^i)$ equals, up to a constant,
$$\sum_{s^i,\tilde\omega,\tilde x^{-i}} \ln\big(Q^i_{\Omega,\theta^i}(\tilde\omega\mid s^i)\, Q^i_{X^{-i},\theta^i}(\tilde x^{-i}\mid\alpha^i(\tilde\omega))\big)\prod_{j\ne i}\sigma^j(\tilde x^j\mid s^j(\tilde\omega))\, p_{\Omega|S^i}(\tilde\omega\mid s^i)\, p_{S^i}(s^i)$$
$$= \sum_{s^i,\tilde\omega}\ln\big(Q^i_{\Omega,\theta^i}(\tilde\omega\mid s^i)\big)\, p_{\Omega|S^i}(\tilde\omega\mid s^i)\, p_{S^i}(s^i) + \sum_{\tilde x^{-i},\,\alpha^i\in\mathcal{A}^i}\ln\big(Q^i_{X^{-i},\theta^i}(\tilde x^{-i}\mid\alpha^i)\big)\sum_{\tilde\omega\in\alpha^i}\prod_{j\ne i}\sigma^j(\tilde x^j\mid s^j(\tilde\omega))\, p_\Omega(\tilde\omega).$$
It is straightforward to check that any parameter value that maximizes the above expression satisfies (a) and (b). □
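The last step of this proof rests on a standard fact: for a fixed distribution $p$, the expected log-likelihood $q \mapsto \sum_\omega p(\omega)\ln q(\omega)$ is maximized at $q = p$, which is why each additively separable term above is maximized by matching the corresponding objective conditional distribution. A minimal numerical sketch for a hypothetical binary state:

```python
import math

# For a fixed Bernoulli distribution with parameter p, the expected
# log-likelihood q -> p*ln(q) + (1-p)*ln(1-q) is maximized at q = p.
p = 0.3  # hypothetical probability of the state taking value 1

def expected_loglik(q):
    return p * math.log(q) + (1 - p) * math.log(1 - q)

# Grid search over q in (0, 1); the maximizer coincides with p.
grid = [k / 1000 for k in range(1, 1000)]
q_star = max(grid, key=expected_loglik)
assert abs(q_star - p) < 1e-9
```

This is the one-dimensional version of the observation that any maximizer of the separable expression must satisfy (a) and (b).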
Proof of Lemma 2.
The proof uses Claim B, which is stated and proven after this proof. It is sufficient to establish that $\lim_{t\to\infty}\int_{\Theta^i} d^i(\sigma,\theta^i)\,\mu^i_t(d\theta^i) = 0$ a.s. in $H$, where $d^i(\sigma,\theta^i) = \inf_{\hat\theta^i\in\Theta^i(\sigma)}\|\theta^i - \hat\theta^i\|$. Fix $i \in I$ and $h \in H$. Then, by Bayes' rule,
$$\int_{\Theta^i} d^i(\sigma,\theta^i)\,\mu^i_t(d\theta^i) = \frac{\int_{\Theta^i} d^i(\sigma,\theta^i)\prod_{\tau=0}^{t-1}\frac{Q^i_{\theta^i}(y^i_\tau\mid s^i_\tau,x^i_\tau)}{Q^i_{\sigma_\tau}(y^i_\tau\mid s^i_\tau,x^i_\tau)}\,\mu^i(d\theta^i)}{\int_{\Theta^i}\prod_{\tau=0}^{t-1}\frac{Q^i_{\theta^i}(y^i_\tau\mid s^i_\tau,x^i_\tau)}{Q^i_{\sigma_\tau}(y^i_\tau\mid s^i_\tau,x^i_\tau)}\,\mu^i(d\theta^i)} = \frac{\int_{\Theta^i} d^i(\sigma,\theta^i)\, e^{tK^i_t(h,\theta^i)}\,\mu^i(d\theta^i)}{\int_{\Theta^i} e^{tK^i_t(h,\theta^i)}\,\mu^i(d\theta^i)},$$
where the first equality is well-defined by Assumption 1, full support of $\mu^i$, and the fact that $P^{\mu_0,\phi}(H) > 0$ implies that the $Q^i_{\sigma_\tau}$ terms are positive, and where we define $K^i_t(h,\theta^i) = -\frac{1}{t}\sum_{\tau=0}^{t-1}\ln\frac{Q^i_{\sigma_\tau}(y^i_\tau\mid s^i_\tau,x^i_\tau)}{Q^i_{\theta^i}(y^i_\tau\mid s^i_\tau,x^i_\tau)}$ for the second equality.

(Footnote: If, for some $\theta^i$, $Q^i_{\theta^i}(y^i_\tau\mid s^i_\tau,x^i_\tau) = 0$ for some $\tau \in \{0,\ldots,t-1\}$, then we define $K^i_t(h,\theta^i) = -\infty$ and $\exp\{tK^i_t(h,\theta^i)\} = 0$.)

For any $\alpha > 0$, define $\Theta^i_\alpha(\sigma) \equiv \{\theta^i \in \Theta^i : d^i(\sigma,\theta^i) < \alpha\}$. Then, for all $\varepsilon > 0$ and $\eta > 0$,
$$\int_{\Theta^i} d^i(\sigma,\theta^i)\,\mu^i_t(d\theta^i) \le \varepsilon + C\,\frac{A^i_t(h,\sigma,\varepsilon)}{B^i_t(h,\sigma,\eta)},$$
where $C \equiv \sup_{\theta^i_1,\theta^i_2\in\Theta^i}\|\theta^i_1 - \theta^i_2\| < \infty$ (because $\Theta^i$ is bounded), $A^i_t(h,\sigma,\varepsilon) = \int_{\Theta^i\setminus\Theta^i_\varepsilon(\sigma)} e^{tK^i_t(h,\theta^i)}\,\mu^i(d\theta^i)$, and $B^i_t(h,\sigma,\eta) = \int_{\Theta^i_\eta(\sigma)} e^{tK^i_t(h,\theta^i)}\,\mu^i(d\theta^i)$. The proof concludes by showing that, for all (sufficiently small) $\varepsilon > 0$, there exists $\eta_\varepsilon > 0$ such that $\lim_{t\to\infty} A^i_t(h,\sigma,\varepsilon)/B^i_t(h,\sigma,\eta_\varepsilon) = 0$. This result is achieved in several steps.

First, for all $\varepsilon > 0$, define $K^i_\varepsilon(\sigma) = \inf\{K^i(\sigma,\theta^i) \mid \theta^i \in \Theta^i\setminus\Theta^i_\varepsilon(\sigma)\}$ and $\alpha_\varepsilon = (K^i_\varepsilon(\sigma) - K^i(\sigma))/4$, where $K^i(\sigma) = \inf_{\theta^i\in\Theta^i} K^i(\sigma,\theta^i)$. By continuity of $K^i(\sigma,\cdot)$, there exist $\bar\varepsilon$ and $\bar\alpha$ such that, for all $\varepsilon \le \bar\varepsilon$, $0 < \alpha_\varepsilon \le \bar\alpha < \infty$. Henceforth, let $\varepsilon \le \bar\varepsilon$. It follows that
$$K^i(\sigma,\theta^i) \ge K^i_\varepsilon(\sigma) > K^i(\sigma) + 2\alpha_\varepsilon \quad (11)$$
for all $\theta^i$ such that $d^i(\sigma,\theta^i) \ge \varepsilon$. Also, by continuity of $K^i(\sigma,\cdot)$, there exists $\eta_\varepsilon > 0$ such that, for all $\theta^i \in \Theta^i_{\eta_\varepsilon}(\sigma)$,
$$K^i(\sigma,\theta^i) < K^i(\sigma) + \alpha_\varepsilon/2. \quad (12)$$

Second, let $\hat\Theta^i = \{\theta^i \in \Theta^i : Q^i_{\theta^i}(y^i_\tau\mid s^i_\tau,x^i_\tau) > 0 \ \forall\tau\}$ and $\hat\Theta^i_{\eta_\varepsilon}(\sigma) = \hat\Theta^i \cap \Theta^i_{\eta_\varepsilon}(\sigma)$. We now show that $\mu^i(\hat\Theta^i_{\eta_\varepsilon}(\sigma)) > 0$. By Lemma 1, $\Theta^i(\sigma)$ is nonempty. Pick any $\theta^i \in \Theta^i(\sigma)$. By Assumption 1, there exists $(\theta^i_n)_n$ in $\Theta^i$ such that $\lim_{n\to\infty}\theta^i_n = \theta^i$ and $Q^i_{\theta^i_n}(y^i\mid s^i,x^i) > 0$ for all $y^i \in f^i(\Omega, x^i, X^{-i})$ and all $(s^i,x^i) \in S^i\times X^i$. In particular, there exists $\theta^i_{\bar n}$ such that $d^i(\sigma,\theta^i_{\bar n}) < 0.5\,\eta_\varepsilon$ and, by continuity of $Q_\cdot$, there exists an open set $U$ around $\theta^i_{\bar n}$ such that $U \subseteq \hat\Theta^i_{\eta_\varepsilon}(\sigma)$. By full support, $\mu^i(\hat\Theta^i_{\eta_\varepsilon}(\sigma)) > 0$. Next, note that
$$\liminf_{t\to\infty} B^i_t(h,\sigma,\eta_\varepsilon)\, e^{t(K^i(\sigma)+\alpha_\varepsilon)} \ge \liminf_{t\to\infty}\int_{\hat\Theta^i_{\eta_\varepsilon}(\sigma)} e^{t(K^i(\sigma)+\alpha_\varepsilon+K^i_t(h,\theta^i))}\,\mu^i(d\theta^i) \ge \int_{\hat\Theta^i_{\eta_\varepsilon}(\sigma)} e^{\lim_{t\to\infty} t(K^i(\sigma)+\alpha_\varepsilon-K^i(\sigma,\theta^i))}\,\mu^i(d\theta^i) = \infty \quad (13)$$
a.s. in $H$, where the first inequality follows because $\hat\Theta^i_{\eta_\varepsilon}(\sigma) \subseteq \Theta^i_{\eta_\varepsilon}(\sigma)$ and $\exp$ is a positive function, the second inequality follows from Fatou's lemma and a LLN for non-iid random variables that implies $\lim_{t\to\infty} K^i_t(h,\theta^i) = -K^i(\sigma,\theta^i)$ for all $\theta^i \in \hat\Theta^i$, a.s. in $H$ (see Claim B(i) below), and the last equality follows from (12) and the fact that $\mu^i(\hat\Theta^i_{\eta_\varepsilon}(\sigma)) > 0$.

Finally, consider $A^i_t(h,\sigma,\varepsilon)$. Claims B(ii) and B(iii) (see below) imply that there exists $T$ such that, for all $t \ge T$, $K^i_t(h,\theta^i) < -(K^i(\sigma) + (3/2)\alpha_\varepsilon)$ for all $\theta^i \in \Theta^i\setminus\Theta^i_\varepsilon(\sigma)$, a.s. in $H$. Thus,
$$\lim_{t\to\infty} A^i_t(h,\sigma,\varepsilon)\, e^{t(K^i(\sigma)+\alpha_\varepsilon)} = \lim_{t\to\infty}\int_{\Theta^i\setminus\Theta^i_\varepsilon(\sigma)} e^{t(K^i(\sigma)+\alpha_\varepsilon+K^i_t(h,\theta^i))}\,\mu^i(d\theta^i) \le \mu^i(\Theta^i\setminus\Theta^i_\varepsilon(\sigma))\,\lim_{t\to\infty} e^{-t\alpha_\varepsilon/2} = 0$$
a.s. in $H$. The above expression and equation (13) imply that $\lim_{t\to\infty} A^i_t(h,\sigma,\varepsilon)/B^i_t(h,\sigma,\eta_\varepsilon) = 0$ a.s.-$P^{\mu_0,\phi}$. □

We state and prove Claim B, used in the proof above. For any $\xi > 0$, define $\Theta^i_{\sigma,\xi}$ to be the set such that $\theta^i \in \Theta^i_{\sigma,\xi}$ if and only if $Q^i_{\theta^i}(y^i\mid s^i,x^i) \ge \xi$ for all $(s^i,x^i,y^i)$ such that $Q^i_\sigma(y^i\mid s^i,x^i)\,\sigma^i(x^i\mid s^i)\, p_{S^i}(s^i) > 0$.

Claim B.
For all i ∈ I : (i) For all θ i ∈ ˆΘ i , lim t →∞ K it ( h, θ i ) = − K i ( σ, θ i ) , a.s. in H ; (ii) There exist ξ ∗ > and T ξ ∗ such that, ∀ t ≥ T ξ ∗ , K it ( h, θ i ) < − ( K i ( σ )+(3 / α ε ) ∀ θ i / ∈ Θ iσ,ξ , a.s. in H ; (iii) For all ξ > , ∃ ˆ T ξ such that, ∀ t ≥ ˆ T ξ , K it ( h, θ i ) < ( K i ( σ ) + (3 / α ε ) ∀ θ i ∈ Θ iσ,ξ \ Θ iε ( σ ) , a.s. in H .Proof : Define f req it ( z i ) = t (cid:80) t − τ =0 z i ( z iτ ) ∀ z i ∈ Z i . K it can be written as K it ( h, θ i ) = κ i t ( h )+ κ i t ( h )+ κ i t ( h, θ i ), where κ i t ( h ) = − t − (cid:80) t − τ =0 (cid:80) z i ∈ Z i (cid:0) z i ( z iτ ) − ¯ P iσ τ ( z i ) (cid:1) ln Q iσ τ ( z i ), κ i t ( h ) = − t − (cid:80) t − τ =0 (cid:80) z i ∈ Z i ¯ P iσ τ ( z i ) ln Q iσ τ ( z i ), and κ i t ( h, θ i ) = (cid:80) z i ∈ Z i f req it ( z i ) ln Q iθ i ( z i ).The statements made below hold almost surely in H , but we omit this qualification.First, we show lim t →∞ κ i t ( h ) = 0. Define l iτ ( h, z i ) = (cid:0) z i ( z iτ ) − ¯ P iσ τ ( z i ) (cid:1) ln Q iσ τ ( z i )and L it ( h, z i ) = (cid:80) tτ =1 τ − l iτ ( h, z i ) ∀ z i ∈ Z i . Fix any z i ∈ Z i . We show that L it ( · , z i )converges a.s. to an integrable, and, therefore, finite function L i ∞ ( · , z i ). To show this,we use martingale convergence results. Let h t denote the partial history until time t . Since E P µ ,φ ( ·| h t ) (cid:2) l it +1 ( h, z i ) (cid:3) = 0, then E P µ ,φ ( ·| h t ) (cid:2) L it +1 ( h, z i ) (cid:3) = L it ( h, z i ) and so( L it ( h, z i )) t is a martingale with respect to P µ ,φ . Next, we show that sup t E P µ ,φ [ | L it ( h, z i ) | ] ≤ M for M < ∞ . Note that E P µ ,φ (cid:2)(cid:0) L it ( h, z i ) (cid:1) (cid:3) = E P µ ,φ (cid:2)(cid:80) tτ =1 τ − (cid:0) l iτ ( h, z i ) (cid:1) +2 (cid:80) τ (cid:48) >τ τ (cid:48) τ l iτ ( h, z i ) l iτ (cid:48) ( h, z i ) (cid:3) . 
Since $(l^i_t)_t$ is a martingale difference sequence, for $\tau' > \tau$, $E_{P^{\mu,\phi}}\left[l^i_\tau(h,z^i)\, l^i_{\tau'}(h,z^i)\right] = 0$. Therefore, $E_{P^{\mu,\phi}}\left[\left(L^i_t(h,z^i)\right)^2\right] = \sum_{\tau=1}^t \tau^{-2} E_{P^{\mu,\phi}}\left[\left(l^i_\tau(h,z^i)\right)^2\right]$. Note also that $E_{P^{\mu,\phi}(\cdot|h_{\tau-1})}\left[\left(l^i_\tau(h,z^i)\right)^2\right] \leq \left(\ln Q^i_{\sigma_\tau}(z^i)\right)^2 Q^i_{\sigma_\tau}(z^i)$. Therefore, by the law of iterated expectations, $E_{P^{\mu,\phi}}\left[\left(L^i_t(h,z^i)\right)^2\right] \leq \sum_{\tau=1}^t \tau^{-2} E_{P^{\mu,\phi}}\left[\left(\ln Q^i_{\sigma_\tau}(z^i)\right)^2 Q^i_{\sigma_\tau}(z^i)\right]$, where each summand's inner expectation is bounded above by 1 because $(\ln x)^2 x \leq 1$ for all $x \in [0,1]$. Hence $\sup_t E_{P^{\mu,\phi}}\left[\left(L^i_t(h,z^i)\right)^2\right] \leq \sum_{\tau\geq 1}\tau^{-2} < 2$, so $\sup_t E_{P^{\mu,\phi}}[|L^i_t(h,z^i)|] < \infty$. By Theorem 5.2.8 in Durrett (2010), $L^i_t(h,z^i)$ converges a.s.-$P^{\mu,\phi}$ to a finite $L^i_\infty(h,z^i)$. Thus, by Kronecker's lemma (Pollard (2001), page 105), $\lim_{t\to\infty}\sum_{z^i\in Z^i}\left\{t^{-1}\sum_{\tau=1}^t\left(1_{z^i}(z^i_\tau) - \bar P^i_{\sigma_\tau}(z^i)\right)\ln Q^i_{\sigma_\tau}(z^i)\right\} = 0$. (Kronecker's lemma implies that, for a sequence $(\ell_t)_t$ with $\sum_\tau \ell_\tau < \infty$, $b_t^{-1}\sum_{\tau=1}^t b_\tau \ell_\tau \to 0$, where $(b_t)_t$ is a nondecreasing, positive, real-valued sequence that diverges to $\infty$; we apply the lemma with $\ell_t \equiv t^{-1} l^i_t$ and $b_t = t$.) Therefore, $\lim_{t\to\infty}\kappa^i_{1t}(h) = 0$.

Next, consider $\kappa^i_{2t}(h)$. The assumption that $\lim_{t\to\infty}\sigma_t = \sigma$ and continuity of $Q^i_\sigma \ln Q^i_\sigma$ in $\sigma$ imply that $\lim_{t\to\infty}\kappa^i_{2t}(h) = -\sum_{(s^i,x^i)\in S^i\times X^i} E_{Q_\sigma(\cdot|s^i,x^i)}[\ln Q^i_\sigma(Y^i|s^i,x^i)]\,\sigma^i(x^i|s^i)\,p_{S^i}(s^i)$. The limits of $\kappa^i_{1t},\kappa^i_{2t}$ imply that, for all $\gamma > 0$, there exists $\hat t_\gamma$ such that, for all $t \geq \hat t_\gamma$,
$$\left|\kappa^i_{1t}(h) + \kappa^i_{2t}(h) + \sum_{(s^i,x^i)\in S^i\times X^i} E_{Q_\sigma(\cdot|s^i,x^i)}\left[\ln Q^i_\sigma(Y^i|s^i,x^i)\right]\sigma^i(x^i|s^i)\,p_{S^i}(s^i)\right| \leq \gamma. \quad (14)$$
We now prove (i)-(iii) by characterizing the limit of $\kappa^i_{3t}(h,\theta^i)$.

(i) For all $z^i \in Z^i$, $\left|freq^i_t(z^i) - \bar P^i_\sigma(z^i)\right| \leq \left|t^{-1}\sum_{\tau=0}^{t-1}\left(1_{z^i}(z^i_\tau) - \bar P^i_{\sigma_\tau}(z^i)\right)\right| + \left|t^{-1}\sum_{\tau=0}^{t-1}\left(\bar P^i_{\sigma_\tau}(z^i) - \bar P^i_\sigma(z^i)\right)\right|$. The first term on the RHS goes to 0 (the proof is essentially identical to the proof that $\kappa^i_{1t}$ goes to 0). The second term goes to 0 because $\lim_{t\to\infty}\sigma_t = \sigma$ and $\bar P^i_{\cdot}$ is continuous. Thus, for all $\zeta > 0$, there exists $\hat t_\zeta$ such that, for all $t \geq \hat t_\zeta$ and all $z^i \in Z^i$,
$$\left|freq^i_t(z^i) - \bar P^i_\sigma(z^i)\right| < \zeta. \quad (15)$$
Thus, since $\theta^i \in \hat\Theta^i$, $\lim_{t\to\infty}\kappa^i_{3t}(h,\theta^i) = \sum_{(s^i,x^i)\in S^i\times X^i} E_{Q_\sigma(\cdot|s^i,x^i)}\left[\ln Q^i_{\theta^i}(Y^i|s^i,x^i)\right]\sigma^i(x^i|s^i)\,p_{S^i}(s^i)$.
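The limit of $\kappa^i_{3t}$ is the population expected log-likelihood under the true data-generating process, so the parameter with the highest sample fit is asymptotically the one minimizing the Kullback-Leibler divergence. A minimal simulation sketch of this selection logic (a toy binary model with purely hypothetical numbers, not an object from the paper):

```python
import math
import random

random.seed(0)

# True outcome distribution over {0, 1}: P(1) = 0.7 (hypothetical).
p_true = 0.7

# Misspecified model: the agent only entertains these values of theta,
# none of which equals the truth.
thetas = [0.2, 0.4, 0.5]

T = 20000
draws = [1 if random.random() < p_true else 0 for _ in range(T)]

def avg_loglik(theta, data):
    # Sample analogue of E_P[ln Q_theta(Z)].
    return sum(math.log(theta if z == 1 else 1 - theta) for z in data) / len(data)

def kl(theta):
    # Kullback-Leibler divergence E_P[ln(P(Z)/Q_theta(Z))].
    return (p_true * math.log(p_true / theta)
            + (1 - p_true) * math.log((1 - p_true) / (1 - theta)))

best_sample = max(thetas, key=lambda th: avg_loglik(th, draws))
best_kl = min(thetas, key=kl)
```

With 20,000 draws the sample ranking already coincides with the KL ranking; the same logic underlies the limit of $\kappa^i_{3t}$ displayed above.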
This expression and (14) establish part (i).

(ii) For all $\theta^i \notin \Theta^i_{\sigma,\xi}$, let $z^i_{\theta^i}$ be such that $\bar P^i_\sigma(z^i_{\theta^i}) > 0$ and $Q^i_{\theta^i}(z^i_{\theta^i}) < \xi$. By (15), there exists $t_{p^i_L/2}$ such that, for all $t \geq t_{p^i_L/2}$,
$$\kappa^i_{3t}(h,\theta^i) \leq freq^i_t(z^i_{\theta^i})\ln Q^i_{\theta^i}(z^i_{\theta^i}) \leq (p^i_L/2)\ln\xi \quad \forall \theta^i \notin \Theta^i_{\sigma,\xi},$$
where $p^i_L = \min\{\bar P^i_\sigma(z^i) : \bar P^i_\sigma(z^i) > 0\}$. This result and (14) imply that, for all $t \geq T \equiv \max\{t_{p^i_L/2}, \hat t_1\}$,
$$K^i_t(h,\theta^i) \leq -\sum_{(s^i,x^i)\in S^i\times X^i} E_{Q_\sigma(\cdot|s^i,x^i)}\left[\ln Q^i_\sigma(Y^i|s^i,x^i)\right]\sigma^i(x^i|s^i)\,p_{S^i}(s^i) + 1 + (p^i_L/2)\ln\xi \leq |Z^i| + 1 + (p^i_L/2)\ln\xi \quad (16)$$
for all $\theta^i \notin \Theta^i_{\sigma,\xi}$, where the second inequality follows from the facts that $\sigma^i(x^i|s^i)\,p_{S^i}(s^i) \leq 1$ and $x\ln(x) \in [-1,0]$ for all $x \in [0,1]$. The facts that $K^i(\sigma) < \infty$ and $\alpha_\varepsilon \leq \bar\alpha < \infty$ for all $\varepsilon \leq \bar\varepsilon$ imply that the RHS of (16) can be made lower than $-(K^i(\sigma) + (3/4)\alpha_\varepsilon)$ for some sufficiently small $\xi^*$.

(iii) For any $\xi > 0$, let $\zeta_\xi = -\alpha_\varepsilon/(4|Z^i|\ln\xi) > 0$. By (15), there exists $\hat t_{\zeta_\xi}$ such that, for all $t \geq \hat t_{\zeta_\xi}$,
$$\kappa^i_{3t}(h,\theta^i) \leq \sum_{\{z^i:\bar P^i_\sigma(z^i)>0\}} freq^i_t(z^i)\ln Q^i_{\theta^i}(z^i) \leq \sum_{\{z^i:\bar P^i_\sigma(z^i)>0\}}\left(\bar P^i_\sigma(z^i) - \zeta_\xi\right)\ln Q^i_{\theta^i}(z^i)$$
$$\leq \sum_{(s^i,x^i)\in S^i\times X^i} E_{Q_\sigma(\cdot|s^i,x^i)}\left[\ln Q^i_{\theta^i}(Y^i|s^i,x^i)\right]\sigma^i(x^i|s^i)\,p_{S^i}(s^i) - |Z^i|\zeta_\xi\ln\xi$$
for all $\theta^i \in \Theta^i_{\sigma,\xi}$ (since $Q^i_{\theta^i}(z^i) \geq \xi$ for all $z^i$ such that $\bar P^i_\sigma(z^i) > 0$). This bound, the fact that $\alpha_\varepsilon/4 = -|Z^i|\zeta_\xi\ln\xi$, and (14) imply that, for all $t \geq \hat T_\xi \equiv \max\{\hat t_{\zeta_\xi}, \hat t_{\alpha_\varepsilon/4}\}$, $K^i_t(h,\theta^i) < -K^i(\sigma,\theta^i) + \alpha_\varepsilon/2$ for all $\theta^i \in \Theta^i_{\sigma,\xi}$. This result and (11) imply the desired result. $\square$

Online Appendix

A Example: Trading with adverse selection
In this section, we provide the formal details for the trading environment in Example 2.5. Let $p \in \Delta(A\times V)$ be the true distribution; we use subscripts, such as $p_A$ and $p_{V|A}$, to denote the corresponding marginal and conditional distributions. Let $Y = A\times(V\cup\{\square\})$ denote the space of observable consequences, where $\square$ will be a convenient way to represent the fact that there is no trade. We denote the random variable taking values in $V\cup\{\square\}$ by $\hat V$. Notice that the state space in this example is $\Omega = A\times V$. Partial feedback is represented by the function $f_P: X\times A\times V \to Y$ such that $f_P(x,a,v) = (a,v)$ if $a\leq x$ and $f_P(x,a,v) = (a,\square)$ if $a>x$. Full feedback is represented by $f_F(x,a,v) = (a,v)$. In all cases, payoffs are given by $\pi: X\times Y\to\mathbb{R}$, where $\pi(x,(a,v)) = v-x$ if $a\leq x$ and 0 otherwise. The objective distribution for the case of partial feedback, $Q_P$, is, for all $x\in X$ and $(a,v)\in A\times V$, $Q_P(a,v|x) = p(a,v)\,1_{\{x\geq a\}}(x)$, and, for all $x\in X$ and $a\in A$, $Q_P(a,\square|x) = p_A(a)\,1_{\{x<a\}}(x)$.

Behavioral equilibrium. Feedback is $f_P$ and the parameter set is $\Theta_I = \Delta(A)\times\Delta(V)$; the subjective model treats $A$ and $V$ as independent. For each $x\in X$, the wKLD function is
$$K_{BE}(x,\theta) = \sum_{\{a\in A: a>x\}} p_A(a)\ln\frac{p_A(a)}{\theta_A(a)} + \sum_{\{(a,v)\in A\times V: a\leq x\}} p(a,v)\ln\frac{p(a,v)}{\theta_A(a)\theta_V(v)}.$$
For each $x\in X$, $\theta(x) = (\theta_A(x),\theta_V(x)) \in \Theta_I = \Delta(A)\times\Delta(V)$, where $\theta_A(x) = p_A$ and $\theta_V(x)(v) = p_{V|A}(v\mid A\leq x)$ for all $v\in V$, is the unique parameter value that minimizes $K_{BE}(x,\cdot)$. Together with (18), we obtain equation $\Pi_{BE}$ in the main text.

Analogy-based expectations equilibrium. Feedback is $f_F$ and the parameter set is $\Theta_A$. The subjective model is, for all $x\in X$ and $(a,v)\in A\times V_j$, all $j = 1,\dots,k$, $Q^{ABEE}_\theta(a,v|x) = \theta_j(a)\theta_V(v)$, and, for all $x\in X$ and $a\in A$, $Q^{ABEE}_\theta(a,\square|x) = 0$, where $\theta = (\theta_1,\dots,\theta_k,\theta_V)\in\Theta_A$. This is an analogy-based game. From (17), perceived expected profit from $x\in X$ is
$$\sum_{j=1}^k Pr_{\theta_V}(V\in V_j)\left\{Pr_{\theta_j}(A\leq x)\left(E_{\theta_V}[V\mid V\in V_j] - x\right)\right\}. \quad (19)$$
(In all cases, the extension to mixed strategies is straightforward.) For each $x\in X$, the wKLD function is
$$K_{ABEE}(x,\theta) = E_{Q_F(\cdot|x)}\left[\ln\frac{Q_F(A,\hat V|x)}{Q^{ABEE}_\theta(A,\hat V|x)}\right] = \sum_{j=1}^k\sum_{(a,v)\in A\times V_j} p(a,v)\ln\frac{p(a,v)}{\theta_j(a)\theta_V(v)}.$$
For each $x\in X$, $\theta(x) = (\theta_1(x),\dots,\theta_k(x),\theta_V(x))\in\Theta_A = \times_j\Delta(A)\times\Delta(V)$, where $\theta_j(x)(a) = p_{A|V_j}(a\mid V\in V_j)$ for all $a\in A$ and $\theta_V(x) = p_V$, is the unique parameter value that minimizes $K_{ABEE}(x,\cdot)$. Together with (19), we obtain equation $\Pi_{ABEE}$ in the main text.
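To make the behavioral-equilibrium belief concrete, the following sketch computes the KL-minimizing independent belief under partial feedback for a hypothetical joint distribution $p$ (all numbers illustrative, not from the paper): the fitted $\theta_V$ at the equilibrium price $x^*$ is the value distribution conditional on trade, so perceived and objective profits agree at $x^*$ but diverge at deviations.

```python
# Hypothetical primitives: asks A, values V, and a positively correlated
# joint distribution p(a, v); high-value sellers tend to ask more.
A_SET = [1, 3]
V_SET = [0, 4]
p = {(1, 0): 0.4, (1, 4): 0.1, (3, 0): 0.1, (3, 4): 0.4}

def theta_V(x_star):
    # Best-fit marginal over V under partial feedback at price x_star:
    # the distribution of V conditional on trade (A <= x_star).
    mass = {v: 0.0 for v in V_SET}
    total = 0.0
    for (a, v), pr in p.items():
        if a <= x_star:
            mass[v] += pr
            total += pr
    return {v: m / total for v, m in mass.items()}

def perceived_profit(x, x_star):
    # The buyer treats A and V as independent, with theta_A = p_A and
    # theta_V fitted at the equilibrium price x_star.
    pr_trade = sum(pr for (a, _), pr in p.items() if a <= x)
    ev = sum(v * q for v, q in theta_V(x_star).items())
    return pr_trade * (ev - x)

def true_profit(x):
    # Objective expected profit accounts for adverse selection.
    return sum((v - x) * pr for (a, v), pr in p.items() if a <= x)
```

At the equilibrium price the fitted belief matches the observed data exactly, so the misspecification shows up only in counterfactual deviation payoffs, which is the force behind naive play under adverse selection.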
Behavioral equilibrium (naive version) with analogy classes. It is natural to also consider a case, unexplored in the literature, where feedback $f_P$ is partial and the subjective model is parameterized by $\Theta_A$. Suppose that the buyer's behavior has stabilized to some price $x^*$. Due to the possible correlation across analogy classes, the buyer might now believe that deviating to a different price $x\neq x^*$ affects her valuation. In particular, the buyer might have multiple beliefs at $x^*$. To obtain a natural equilibrium refinement, we assume that the buyer also observes the analogy class that contains her realized valuation, whether she trades or not, and that $\Pr(V\in V_j, A\leq x) > 0$ for all $j = 1,\dots,k$ and $x\in X$. We denote this new feedback assumption by a function $f_{P^*}: X\times A\times V\to Y^*$, where $Y^* = (A\times V)\cup(A\times\{1,\dots,k\})$ and $f_{P^*}(x,a,v) = (a,v)$ if $a\leq x$ and $f_{P^*}(x,a,v) = (a,j)$ if $a>x$ and $v\in V_j$. The objective distribution given this feedback function is, for all $x\in X$ and $(a,v)\in A\times V$, $Q_{P^*}(a,v|x) = p(a,v)\,1_{\{x\geq a\}}(x)$, and, for all $x\in X$, $a\in A$, and $j = 1,\dots,k$, $Q_{P^*}(a,j|x) = p_{A|V_j}(a\mid V\in V_j)\,p_V(V_j)\,1_{\{x<a\}}(x)$. For each $x\in X$, the wKLD function is
$$K_{BEA}(x,\theta) = \sum_{j=1}^k\sum_{\{(a,v)\in A\times V_j: a\leq x\}} p(a,v)\ln\frac{p(a,v)}{\theta_j(a)\theta_V(v)} + \sum_{j=1}^k\sum_{\{a\in A: a>x\}} p_{A|V_j}(a\mid V\in V_j)\,p_V(V_j)\ln\frac{p_{A|V_j}(a\mid V\in V_j)\,p_V(V_j)}{\theta_j(a)\sum_{v\in V_j}\theta_V(v)}.$$
For each $x\in X$, $\theta(x) = (\theta_1(x),\dots,\theta_k(x),\theta_V(x))\in\Theta_A = \times_j\Delta(A)\times\Delta(V)$, where $\theta_j(x)(a) = p_{A|V_j}(a\mid V\in V_j)$ for all $a\in A$ and $\theta_V(x)(v) = p_{V|A}(v\mid V\in V_j, A\leq x)\,p_V(V_j)$ for all $v\in V_j$, all $j = 1,\dots,k$, is the unique parameter value that minimizes $K_{BEA}(x,\cdot)$. Together with (19), we obtain
$$\Pi_{BEA}(x,x^*) = \sum_{j=1}^k \Pr(V\in V_j)\Pr(A\leq x\mid V\in V_j)\left(E\left[V\mid V\in V_j, A\leq x^*\right] - x\right).$$

B Proof of converse result: Theorem 3
Let $(\bar\mu^i)_{i\in I}$ be a belief profile that supports $\sigma$ as an equilibrium. Consider the following policy profile $\phi = (\phi^i_t)_{i,t}$: For all $i\in I$ and all $t$,
$$(\mu^i,s^i,\xi^i)\mapsto\phi^i_t(\mu^i,s^i,\xi^i)\equiv\begin{cases}\varphi^i(\bar\mu^i,s^i,\xi^i) & \text{if } \max_{i\in I}\|\bar Q^i_{\mu^i}-\bar Q^i_{\bar\mu^i}\|\leq C^{-1}\varepsilon_t\\ \varphi^i(\mu^i,s^i,\xi^i) & \text{otherwise},\end{cases}$$
where $\varphi^i$ is an arbitrary selection from $\Psi^i$, $C\equiv\max_{i\in I}\{2|Y^i|\times\sup_{X^i\times Y^i}|\pi^i(x^i,y^i)|\} < \infty$, and the sequence $(\varepsilon_t)_t$ will be defined below. For all $i\in I$, fix any prior $\mu^i_0$ with full support on $\Theta^i$ such that $\mu^i_0(\cdot\mid\Theta^i(\sigma)) = \bar\mu^i$ (where, for any Borel $A\subset\Theta$, $\mu(\cdot\mid A)$ is the conditional probability given $A$).

We now show that if $\varepsilon_t\geq 0$ for all $t$ and $\lim_{t\to\infty}\varepsilon_t = 0$, then $\phi$ is asymptotically optimal. Throughout this argument, we fix an arbitrary $i\in I$. Abusing notation, let $U^i(\mu^i,s^i,\xi^i,x^i) = E_{\bar Q^i_{\mu^i}(\cdot|s^i,x^i)}[\pi^i(x^i,Y^i)] + \xi^i(x^i)$. It suffices to show that
$$U^i(\mu^i,s^i,\xi^i,\phi^i_t(\mu^i,s^i,\xi^i))\geq U^i(\mu^i,s^i,\xi^i,x^i) - \varepsilon_t \quad (20)$$
for all $(i,t)$, all $(\mu^i,s^i,\xi^i)$, and all $x^i$. By construction of $\phi$, equation (20) is satisfied if $\max_{i\in I}\|\bar Q^i_{\mu^i}-\bar Q^i_{\bar\mu^i}\| > C^{-1}\varepsilon_t$. If, instead, $\max_{i\in I}\|\bar Q^i_{\mu^i}-\bar Q^i_{\bar\mu^i}\|\leq C^{-1}\varepsilon_t$, then
$$U^i(\bar\mu^i,s^i,\xi^i,\phi^i_t(\mu^i,s^i,\xi^i)) = U^i(\bar\mu^i,s^i,\xi^i,\varphi^i(\bar\mu^i,s^i,\xi^i))\geq U^i(\bar\mu^i,s^i,\xi^i,x^i) \quad (21)$$
for all $x^i\in X^i$. Moreover, for all $x^i$,
$$\left|U^i(\bar\mu^i,s^i,\xi^i,x^i) - U^i(\mu^i,s^i,\xi^i,x^i)\right| = \left|\sum_{y^i\in Y^i}\pi^i(x^i,y^i)\left(\bar Q^i_{\bar\mu^i}(y^i|s^i,x^i)-\bar Q^i_{\mu^i}(y^i|s^i,x^i)\right)\right|$$
$$\leq\sup_{X^i\times Y^i}|\pi^i(x^i,y^i)|\sum_{y^i\in Y^i}\left|\bar Q^i_{\bar\mu^i}(y^i|s^i,x^i)-\bar Q^i_{\mu^i}(y^i|s^i,x^i)\right|\leq\sup_{X^i\times Y^i}|\pi^i(x^i,y^i)|\times|Y^i|\times\max_{y^i,x^i,s^i}\left|\bar Q^i_{\bar\mu^i}(y^i|s^i,x^i)-\bar Q^i_{\mu^i}(y^i|s^i,x^i)\right|,$$
so, by our choice of $C$, $|U^i(\bar\mu^i,s^i,\xi^i,x^i) - U^i(\mu^i,s^i,\xi^i,x^i)|\leq(1/2)\varepsilon_t$ for all $x^i$. Therefore, equation (21) implies equation (20); thus $\phi$ is asymptotically optimal if $\varepsilon_t\geq 0$ for all $t$ and $\lim_{t\to\infty}\varepsilon_t = 0$.

We now construct a sequence $(\varepsilon_t)_t$ such that $\varepsilon_t\geq 0$ for all $t$ and $\lim_{t\to\infty}\varepsilon_t = 0$. Let $\bar\phi^i = (\bar\phi^i_t)_t$ be such that $\bar\phi^i_t(\mu^i,\cdot,\cdot) = \varphi^i(\bar\mu^i,\cdot,\cdot)$ for all $\mu^i$; i.e., $\bar\phi^i$ is a stationary policy that maximizes utility under the assumption that the belief is always $\bar\mu^i$. Let $\zeta^i(\mu^i)\equiv C\|\bar Q^i_{\mu^i}-\bar Q^i_{\bar\mu^i}\|$ and suppose (the proof is at the end) that
$$P^{\mu,\bar\phi}\left(\lim_{t\to\infty}\max_{i\in I}|\zeta^i(\mu^i_t(h))| = 0\right) = 1 \quad (22)$$
(recall that $P^{\mu,\bar\phi}$ is the probability measure over $H$ induced by the policy profile $\bar\phi$; by definition of $\bar\phi$, $P^{\mu,\bar\phi}$ does not depend on $\mu$). Then, by the 2nd Borel-Cantelli lemma (Billingsley (1995), pages 59-60), for any $\gamma > 0$, $\sum_t P^{\mu,\bar\phi}(\max_{i\in I}|\zeta^i(\mu^i_t(h))|\geq\gamma) < \infty$. Hence, for any a >
0, there exists a sequence $(\tau(j))_j$ such that
$$\sum_{t\geq\tau(j)} P^{\mu,\bar\phi}\left(\max_{i\in I}|\zeta^i(\mu^i_t(h))|\geq 1/j\right) < a^{-j} \quad (23)$$
and $\lim_{j\to\infty}\tau(j) = \infty$. For all $t\leq\tau(1)$, we set $\varepsilon_t = 3C$, and, for any $t > \tau(1)$, we set $\varepsilon_t\equiv 1/N(t)$, where $N(t)\equiv\sum_{j=1}^\infty 1\{\tau(j)\leq t\}$. Observe that, since $\lim_{j\to\infty}\tau(j) = \infty$, $N(t)\to\infty$ as $t\to\infty$ and thus $\varepsilon_t\to 0$.

It remains to show that $P^{\mu,\phi}(\lim_{t\to\infty}\|\sigma_t(h^\infty)-\sigma\| = 0) = 1$, where $(\sigma_t)_t$ is the sequence of intended strategies given $\phi$, i.e., $\sigma^i_t(h)(x^i|s^i) = P^\xi(\xi^i:\phi^i_t(\mu^i_t(h),s^i,\xi^i) = x^i)$. Observe that, by definition, $\sigma^i(x^i|s^i) = P^\xi\left(\xi^i: x^i\in\arg\max_{\hat x^i\in X^i} E_{\bar Q^i_{\bar\mu^i}(\cdot|s^i,\hat x^i)}[\pi^i(\hat x^i,Y^i)] + \xi^i(\hat x^i)\right)$. Since $\varphi^i\in\Psi^i$, it follows that we can write $\sigma^i(x^i|s^i) = P^\xi(\xi^i:\varphi^i(\bar\mu^i,s^i,\xi^i) = x^i)$. Let $H_1\equiv\{h:\|\sigma_t(h)-\sigma\| = 0\text{ for all }t\}$. It is sufficient to show that $P^{\mu,\phi}(H_1) = 1$. To show this, observe that
$$P^{\mu,\phi}(H_1)\geq P^{\mu,\phi}\left(\cap_t\{\max_i\zeta^i(\mu_t)\leq\varepsilon_t\}\right) = \prod_{t=\tau(1)+1}^\infty P^{\mu,\phi}\left(\max_i\zeta^i(\mu_t)\leq\varepsilon_t\,\Big|\,\cap_{l<t}\{\max_i\zeta^i(\mu_l)\leq\varepsilon_l\}\right) > 0;$$
hence, $P^{\mu,\phi}(H_1) = 1$.

We conclude the proof by showing that equation (22) indeed holds. Observe that $\sigma$ is trivially stable under $\bar\phi$. By Lemma 2, for all $i\in I$ and all open sets $U^i\supseteq\Theta^i(\sigma)$,
$$\lim_{t\to\infty}\mu^i_t\left(U^i\right) = 1 \quad (24)$$
a.s.-$P^{\mu,\bar\phi}$ (over $H$). Let $H_2$ denote the set of histories such that $x^i_t(h) = x^i$ and $s^i_t(h) = s^i$ implies that $\sigma^i(x^i|s^i) > 0$. By definition of $\bar\phi$, $P^{\mu,\bar\phi}(H_2) = 1$. Thus, it suffices to show that $\lim_{t\to\infty}\max_{i\in I}|\zeta^i(\mu^i_t(h))| = 0$ a.s.-$P^{\mu,\bar\phi}$ over $H_2$. To do this, take any $A\subseteq\Theta$ that is closed. By equation (24), for all $i\in I$ and almost all $h\in H_2$,
$$\limsup_{t\to\infty}\int 1_A(\theta)\,\mu^i_{t+1}(d\theta) = \limsup_{t\to\infty}\int 1_{A\cap\Theta^i(\sigma)}(\theta)\,\mu^i_{t+1}(d\theta).$$
Moreover,
$$\int 1_{A\cap\Theta^i(\sigma)}(\theta)\,\mu^i_{t+1}(d\theta)\leq\int 1_{A\cap\Theta^i(\sigma)}(\theta)\left\{\frac{\prod_{\tau=1}^t Q^i_\theta(y^i_\tau|s^i_\tau,x^i_\tau)\,\mu^i_0(d\theta)}{\int_{\Theta^i(\sigma)}\prod_{\tau=1}^t Q^i_\theta(y^i_\tau|s^i_\tau,x^i_\tau)\,\mu^i_0(d\theta)}\right\} = \mu^i_0(A\mid\Theta^i(\sigma)) = \bar\mu^i(A),$$
where the first inequality follows from the fact that $\Theta^i(\sigma)\subseteq\Theta^i$; the first equality follows from the fact that, since $h\in H_2$, the game being weakly identified given $\sigma$ implies that $\prod_{\tau=1}^t Q^i_\theta(y^i_\tau|s^i_\tau,x^i_\tau)$ is constant with respect to $\theta$ for all $\theta\in\Theta^i(\sigma)$; and the last equality follows from our choice of $\mu^i_0$. Therefore, we established that, a.s.-$P^{\mu,\bar\phi}$ over $H_2$, $\limsup_{t\to\infty}\mu^i_{t+1}(h)(A)\leq\bar\mu^i(A)$ for $A$ closed. By the portmanteau lemma, this implies that, a.s.-$P^{\mu,\bar\phi}$ over $H_2$, $\lim_{t\to\infty}\int_\Theta f(\theta)\,\mu^i_{t+1}(h)(d\theta) = \int_\Theta f(\theta)\,\bar\mu^i(d\theta)$ for any $f$ real-valued, bounded, and continuous. Since, by assumption, $\theta\mapsto Q^i_\theta(y^i|s^i,x^i)$ is bounded and continuous, the previous result applies to $Q^i_\theta(y^i|s^i,x^i)$, and since $y,s,x$ take a finite number of values, this result implies that $\lim_{t\to\infty}\|\bar Q^i_{\mu^i_t(h)}-\bar Q^i_{\bar\mu^i}\| = 0$ for all $i\in I$ a.s.-$P^{\mu,\bar\phi}$ over $H_2$. $\square$

C Non-myopic players
In the main text, we proved the results for the case where players are myopic. Here, we assume that players maximize discounted expected payoffs, where $\delta^i\in[0,1)$ is the discount factor of player $i$. In particular, players can be forward looking and decide to experiment. Players believe, however, that they face a stationary environment and, therefore, have no incentives to influence the future behavior of other players. We assume for simplicity that players know the distribution of their own payoff perturbations.

Because players believe that they face a stationary environment, they solve a (subjective) dynamic optimization problem that can be cast recursively as follows. By the Principle of Optimality, $V^i(\mu^i,s^i)$ denotes the maximum expected discounted payoff (i.e., the value function) of player $i$ who starts a period by observing signal $s^i$ and holding belief $\mu^i$ if and only if
$$V^i(\mu^i,s^i) = \int_{\Xi^i}\left\{\max_{x^i\in X^i} E_{\bar Q^i_{\mu^i}(\cdot|s^i,x^i)}\left[\pi^i(x^i,Y^i) + \xi^i(x^i) + \delta E_{p_{S^i}}\left[V^i(\hat\mu^i,S^i)\right]\right]\right\}P^\xi(d\xi^i), \quad (25)$$
where $\hat\mu^i = B^i(\mu^i,s^i,x^i,Y^i)$ is the updated belief. For all $(\mu^i,s^i,\xi^i)$, let
$$\Phi^i(\mu^i,s^i,\xi^i) = \arg\max_{x^i\in X^i} E_{\bar Q^i_{\mu^i}(\cdot|s^i,x^i)}\left[\pi^i(x^i,Y^i) + \xi^i(x^i) + \delta E_{p_{S^i}}\left[V^i(\hat\mu^i,S^i)\right]\right].$$
The proof of the next lemma relies on standard arguments and is, therefore, omitted.

Lemma 3.
There exists a unique solution $V^i$ to the Bellman equation (25); this solution is bounded in $\Delta(\Theta^i)\times S^i$ and continuous as a function of $\mu^i$. Moreover, $\Phi^i$ is single-valued and continuous with respect to $\mu^i$, a.s.-$P^\xi$.

Because players believe they face a stationary environment with i.i.d. perturbations, it is without loss of generality to restrict behavior to depend on the state of the recursive problem. Optimality of a policy is defined as usual (with the requirement that $\phi^i_t\in\Phi^i$ for all $t$).

Lemma 2 implies that the support of posteriors converges, but posteriors need not converge. We can always find, however, a subsequence of posteriors that converges. By continuity of dynamic behavior in beliefs, the stable strategy profile is dynamically optimal (in the sense of solving the dynamic optimization problem) given this convergent posterior. For weakly identified games, the convergent posterior is a fixed point of the Bayesian operator. Thus, the players' limiting strategies will provide no new information. Since the value of experimentation is nonnegative, it follows that the stable strategy profile must also be myopically optimal (in the sense of solving the optimization problem that ignores the future), which is the definition of optimality used in the definition of Berk-Nash equilibrium. Thus, we obtain the following characterization of the set of stable strategy profiles when players follow optimal policies.

Theorem 4.
Suppose that a strategy profile $\sigma$ is stable under an optimal policy profile for a perturbed and weakly identified game. Then $\sigma$ is a Berk-Nash equilibrium of the game.

(Doraszelski and Escobar (2010) study a similarly perturbed version of the Bellman equation.)

Proof. The first part of the proof is identical to the proof of Theorem 2. Here, we prove that, given that $\lim_{j\to\infty}\sigma_{t(j)} = \sigma$ and $\lim_{j\to\infty}\mu^i_{t(j)} = \mu^i_\infty\in\Delta(\Theta^i(\sigma))$ for all $i$, then, for all $i$, $\sigma^i$ is optimal for the perturbed game given $\mu^i_\infty\in\Delta(\Theta^i)$; i.e., for all $(s^i,x^i)$,
$$\sigma^i(x^i|s^i) = P^\xi\left(\xi^i:\psi^i(\mu^i_\infty,s^i,\xi^i) = \{x^i\}\right), \quad (26)$$
where $\psi^i(\mu^i_\infty,s^i,\xi^i)\equiv\arg\max_{x^i\in X^i} E_{\bar Q^i_{\mu^i_\infty}(\cdot|s^i,x^i)}[\pi^i(x^i,Y^i)] + \xi^i(x^i)$.

To establish (26), fix $i\in I$ and $s^i\in S^i$. Then
$$\lim_{j\to\infty}\sigma^i_{t(j)}(h)(x^i|s^i) = \lim_{j\to\infty} P^\xi\left(\xi^i:\phi^i_{t(j)}(\mu^i_{t(j)},s^i,\xi^i) = x^i\right) = P^\xi\left(\xi^i:\Phi^i(\mu^i_\infty,s^i,\xi^i) = \{x^i\}\right),$$
where the second equality follows by optimality of $\phi^i$ and Lemma 3. This implies that $\sigma^i(x^i|s^i) = P^\xi(\xi^i:\Phi^i(\mu^i_\infty,s^i,\xi^i) = \{x^i\})$. Thus, it remains to show that
$$P^\xi\left(\xi^i:\Phi^i(\mu^i_\infty,s^i,\xi^i) = \{x^i\}\right) = P^\xi\left(\xi^i:\psi^i(\mu^i_\infty,s^i,\xi^i) = \{x^i\}\right) \quad (27)$$
for all $x^i$ such that $P^\xi(\xi^i:\Phi^i(\mu^i_\infty,s^i,\xi^i) = \{x^i\}) > 0$. From now on, fix any such $x^i$. Since $\sigma^i(x^i|s^i) > 0$, the assumption that the game is weakly identified implies that $Q^i_{\theta^i_1}(\cdot|x^i,s^i) = Q^i_{\theta^i_2}(\cdot|x^i,s^i)$ for all $\theta^i_1,\theta^i_2\in\Theta(\sigma)$. The fact that $\mu^i_\infty\in\Delta(\Theta^i(\sigma))$ then implies that
$$B^i(\mu^i_\infty,s^i,x^i,y^i) = \mu^i_\infty \quad (28)$$
for all $y^i\in Y^i$. Thus, $\Phi^i(\mu^i_\infty,s^i,\xi^i) = \{x^i\}$ is equivalent to
$$E_{\bar Q^i_{\mu^i_\infty}(\cdot|s^i,x^i)}\left[\pi^i(x^i,Y^i) + \xi^i(x^i) + \delta E_{p_{S^i}}\left[V^i(\mu^i_\infty,S^i)\right]\right]$$
$$> E_{\bar Q^i_{\mu^i_\infty}(\cdot|s^i,\tilde x^i)}\left[\pi^i(\tilde x^i,Y^i) + \xi^i(\tilde x^i) + \delta E_{p_{S^i}}\left[V^i(B^i(\mu^i_\infty,s^i,\tilde x^i,Y^i),S^i)\right]\right]$$
$$\geq E_{\bar Q^i_{\mu^i_\infty}(\cdot|s^i,\tilde x^i)}\left[\pi^i(\tilde x^i,Y^i) + \xi^i(\tilde x^i)\right] + \delta E_{p_{S^i}}\left[V^i\left(E_{\bar Q^i_{\mu^i_\infty}(\cdot|s^i,\tilde x^i)}\left[B^i(\mu^i_\infty,s^i,\tilde x^i,Y^i)\right],S^i\right)\right]$$
$$= E_{\bar Q^i_{\mu^i_\infty}(\cdot|s^i,\tilde x^i)}\left[\pi^i(\tilde x^i,Y^i) + \xi^i(\tilde x^i)\right] + \delta E_{p_{S^i}}\left[V^i(\mu^i_\infty,S^i)\right]$$
for all $\tilde x^i\in X^i$, where the first line follows by equation (28) and the definition of $\Phi^i$, the second inequality follows by the convexity of $V^i$ as a function of $\mu^i$ and Jensen's inequality, and the last line by the fact that Bayesian beliefs have the martingale property. (See, for example, Nyarko (1994) for a proof of convexity of the value function.) In turn, this is equivalent to $\psi^i(\mu^i_\infty,s^i,\xi^i) = \{x^i\}$. $\square$

D Population models
We discuss some variants of population models that differ in the matching technology and feedback. The right variant of population model will depend on the specific application.

Single pair model. Each period, a single group of players, one from each of the $I$ populations, is randomly selected to play the game. At the end of the period, the signals, actions, and outcomes of each population are revealed to everyone. Steady-state behavior in this case corresponds exactly to the notion of Berk-Nash equilibrium described in the paper.
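In contrast to the single-pair model, agents who learn only from their own experiences can hold different beliefs and therefore play different strategies in steady state. A toy simulation of this heterogeneity (all numbers hypothetical, not from the paper): each agent privately samples outcomes of a risky action, forms an empirical belief, and best-responds to it; the aggregate strategy is then a strict convex combination of the pure best responses.

```python
import random

random.seed(7)

# Toy setup: each of N agents privately samples DRAWS outcomes of a risky
# action with true success probability TRUE_P, then best-responds to her own
# empirical belief (risky iff her estimate beats the safe payoff SAFE).
N, DRAWS = 200, 11
TRUE_P, SAFE = 0.5, 0.5

individual_strategies = []
for _ in range(N):
    wins = sum(random.random() < TRUE_P for _ in range(DRAWS))
    belief = wins / DRAWS
    individual_strategies.append(1 if belief > SAFE else 0)  # 1 = risky, 0 = safe

# Aggregate population strategy: the share playing risky.
aggregate = sum(individual_strategies) / N
```

Each agent plays a pure strategy that is optimal given her own belief, yet the aggregate is strictly mixed: a convex combination of best responses, in the spirit of the heterogeneous equilibrium notion defined below.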
Random matching model. Each period, all players are randomly matched and observe only feedback from their own match. We now modify the definition of Berk-Nash equilibrium to account for this random-matching setting. The idea is similar to Fudenberg and Levine's (1993) definition of a heterogeneous self-confirming equilibrium. Now each agent in population $i$ can have different experiences and, hence, hold different beliefs and play different strategies in steady state. For all $i\in I$, define
$$BR^i(\sigma^{-i}) = \left\{\sigma^i:\sigma^i\text{ is optimal given some }\mu^i\in\Delta\left(\Theta^i(\sigma^i,\sigma^{-i})\right)\right\}.$$
Note that $\sigma$ is a Berk-Nash equilibrium if and only if $\sigma^i\in BR^i(\sigma^{-i})$ for all $i\in I$.

Definition 9. A strategy profile $\sigma$ is a heterogeneous Berk-Nash equilibrium of game $G$ if, for all $i\in I$, $\sigma^i$ is in the convex hull of $BR^i(\sigma^{-i})$.

Intuitively, a heterogeneous equilibrium strategy $\sigma^i$ is the result of convex combinations of strategies that belong to $BR^i(\sigma^{-i})$; the idea is that each of these strategies is followed by a segment of the population $i$. (In some cases, it may be unrealistic to assume that players are able to observe the private signals of previous generations, so some of these models might be better suited to cases with public, but not private, information. Alternatively, we can think of different incarnations of players born every period who are able to observe the history of previous generations.)

Random-matching model with population feedback. Each period, all players are randomly matched; at the end of the period, each player in population $i$ observes the signals, actions, and outcomes of their own population. Define
$$\bar{BR}^i(\sigma^i,\sigma^{-i}) = \left\{\hat\sigma^i:\hat\sigma^i\text{ is optimal given some }\mu^i\in\Delta\left(\Theta^i(\sigma^i,\sigma^{-i})\right)\right\}.$$

Definition 10.
A strategy profile $\sigma$ is a heterogeneous Berk-Nash equilibrium with population feedback of game $G$ if, for all $i\in I$, $\sigma^i$ is in the convex hull of $\bar{BR}^i(\sigma^i,\sigma^{-i})$.

The main difference when players receive population feedback is that their beliefs no longer depend on their own strategies but rather on the aggregate population strategies.

D.1 Equilibrium foundation
Using arguments similar to the ones in the text, it is now straightforward to conclude that the definition of heterogeneous Berk-Nash equilibrium captures the steady state of a learning environment with a population of agents in the role of each player. To see the idea, let each population $i$ be composed of a continuum of agents indexed by the unit interval $K\equiv[0,1]$. The strategy of agent $ik$ (meaning agent $k\in K$ from population $i$) is denoted by $\sigma^{ik}$. The aggregate strategy of population (i.e., player) $i$ is $\sigma^i = \int_K\sigma^{ik}dk$.

Random matching model. Suppose that each agent is optimizing and that, for all $i$, $(\sigma^{ik}_t)_t$ converges to $\sigma^{ik}$ a.s. in $K$, so that individual behavior stabilizes. (We need individual behavior to stabilize; it is not enough that it stabilizes in the aggregate. This is natural, for example, if we believe that agents whose behavior is unstable will eventually realize they have a misspecified model.) Then Lemma 2 says that the support of beliefs must eventually be $\Theta^i(\sigma^{ik},\sigma^{-i})$ for agent $ik$. Next, for each $ik$, take a convergent subsequence of beliefs $\mu^{ik}_t$ and denote its limit $\mu^{ik}_\infty$. It follows that $\mu^{ik}_\infty\in\Delta(\Theta^i(\sigma^{ik},\sigma^{-i}))$ and, by continuity of behavior in beliefs, $\sigma^{ik}$ is optimal given $\mu^{ik}_\infty$. In particular, $\sigma^{ik}\in BR^i(\sigma^{-i})$ for all $ik$ and, since $\sigma^i = \int_K\sigma^{ik}dk$, it follows that $\sigma^i$ is in the convex hull of $BR^i(\sigma^{-i})$. (Unlike the case of heterogeneous self-confirming equilibrium, a definition where each action in the support of $\sigma$ is supported by a (possibly different) belief would not be appropriate here: $BR^i(\sigma^{-i})$ might contain only mixed, but not pure, strategies (e.g., Example 1).)

Random-matching model with population feedback. Suppose that each agent is optimizing and that, for all $i$, $\sigma^i_t = \int_K\sigma^{ik}_t dk$ converges to $\sigma^i$. Then Lemma 2 says that the support of beliefs must eventually be $\Theta^i(\sigma^i,\sigma^{-i})$ for any agent in population $i$. Next, for each $ik$, take a convergent subsequence of beliefs $\mu^{ik}_t$ and denote its limit $\mu^{ik}_\infty$.
It follows that $\mu^{ik}_\infty\in\Delta(\Theta^i(\sigma^i,\sigma^{-i}))$ and, by continuity of behavior in beliefs, $\sigma^{ik}$ is optimal given $\mu^{ik}_\infty$. In particular, $\sigma^{ik}\in\bar{BR}^i(\sigma^i,\sigma^{-i})$ for all $i,k$ and, since $\sigma^i = \int_K\sigma^{ik}dk$, it follows that $\sigma^i$ is in the convex hull of $\bar{BR}^i(\sigma^i,\sigma^{-i})$.

E Lack of payoff feedback
In the paper, players are assumed to observe their own payoffs. We now provide two alternatives to relax this assumption. In the first alternative, players observe no feedback about payoffs; in the second alternative, players may observe partial feedback.
No payoff feedback. In the paper we had a single, deterministic payoff function $\pi^i: X^i\times Y^i\to\mathbb{R}$, which can be represented in vector form as an element $\pi^i\in\mathbb{R}^{|X^i\times Y^i|}$. We now generalize it to allow for uncertain payoffs. Player $i$ is endowed with a probability distribution $P_{\pi^i}\in\Delta(\mathbb{R}^{|X^i\times Y^i|})$ over the possible payoff functions. In particular, the random variable $\pi^i$ is independent of $Y^i$, and so there is nothing new to learn about payoffs from observing consequences. With random payoff functions, the results extend provided that optimality is defined as follows: A strategy $\sigma^i$ for player $i$ is optimal given $\mu^i\in\Delta(\Theta^i)$ if $\sigma^i(x^i|s^i) > 0$ implies
$$x^i\in\arg\max_{\bar x^i\in X^i} E_{P_{\pi^i}} E_{\bar Q^i_{\mu^i}(\cdot|s^i,\bar x^i)}\left[\pi^i(\bar x^i,Y^i)\right].$$
Note that, by interchanging the order of integration, this notion of optimality is equivalent to the notion in the paper where the deterministic payoff function is given by $E_{P_{\pi^i}}\pi^i(\cdot,\cdot)$.

Partial payoff feedback. Suppose that player $i$ knows her own consequence function $f^i: X\times\Omega\to Y^i$ and that her payoff function is now given by $\pi^i: X\times\Omega\to\mathbb{R}$. In particular, player $i$ may not observe her own payoff, but observing a consequence may provide partial information about $(x^{-i},\omega)$ and, therefore, about payoffs. Unlike the case in the text where payoffs are observed, a belief $\mu^i\in\Delta(\Theta^i)$ may not uniquely determine expected payoffs. The reason is that the distribution over consequences implied by $\mu^i$ may be consistent with several distributions over $X^{-i}\times\Omega$; i.e., the distribution over $X^{-i}\times\Omega$ is only partially identified. Define the set $M_{\mu^i}\subseteq\Delta(X^{-i}\times\Omega)^{S^i\times X^i}$ to be the set of conditional distributions over $X^{-i}\times\Omega$ given $(s^i,x^i)\in S^i\times X^i$ that are consistent with belief $\mu^i\in\Delta(\Theta^i)$; i.e., $m\in M_{\mu^i}$ if and only if $\bar Q^i_{\mu^i}(y^i|s^i,x^i) = m(f^i(x^i,X^{-i},W) = y^i\mid s^i,x^i)$ for all $(s^i,x^i)\in S^i\times X^i$ and $y^i\in Y^i$.
Then optimality should be defined as follows: A strategy $\sigma^i$ for player $i$ is optimal given $\mu^i\in\Delta(\Theta^i)$ if there exists $m_{\mu^i}\in M_{\mu^i}$ such that $\sigma^i(x^i|s^i) > 0$ implies
$$x^i\in\arg\max_{\bar x^i\in X^i} E_{m_{\mu^i}(\cdot|s^i,\bar x^i)}\left[\pi^i(\bar x^i,X^{-i},W)\right].$$
Finally, the definition of identification would also need to be changed to require not only that there is a unique distribution over consequences that matches the observed data, but also that this unique distribution implies a unique expected utility function.
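A two-state sketch (purely illustrative, not an object from the paper) of why the set $M_{\mu^i}$ matters: when feedback collapses the states into a single consequence, every distribution over $\Omega$ is consistent with the observed consequences, so expected payoffs, and hence the optimal action, are only set-identified.

```python
# Hypothetical two-state, two-action illustration of partial identification.
OMEGA = ["w1", "w2"]

def f(x, w):
    # Feedback never reveals the state: both states map to the same consequence.
    return "y0"

def pi(x, w):
    # Payoffs DO depend on the state, even though feedback does not.
    if x == "risky":
        return 1.0 if w == "w1" else 0.0
    return 0.5  # safe action

# Any m in Delta(OMEGA) reproduces the (degenerate) consequence distribution,
# so the identified set M_mu here is the whole simplex.
def expected_payoff(x, m_w1):
    return m_w1 * pi(x, "w1") + (1 - m_w1) * pi(x, "w2")

# Bounds on the risky action's expected payoff over the identified set.
lo = min(expected_payoff("risky", m) for m in (0.0, 1.0))
hi = max(expected_payoff("risky", m) for m in (0.0, 1.0))

# Each action is optimal for SOME selection from the identified set.
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
risky_optimal_somewhere = any(
    expected_payoff("risky", m) >= expected_payoff("safe", m) for m in grid)
safe_optimal_somewhere = any(
    expected_payoff("safe", m) >= expected_payoff("risky", m) for m in grid)
```

Because both actions are justified by some consistent $m$, the definition above requires only that there *exists* a selection $m_{\mu^i}$ rationalizing the strategy.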
F Global stability: Example 2.1 (monopoly with unknown demand)

Theorem 3 says that all Berk-Nash equilibria can be approached with probability 1 provided we allow for vanishing optimization mistakes. In this appendix, we illustrate how to use the techniques of stochastic approximation theory to establish stability of equilibria under the assumption that players make no optimization mistakes. We present the explicit learning dynamics for the monopolist with unknown demand, Example 2.1, and show that the unique equilibrium in this example is globally stable. The intuition behind global stability is that switching from the equilibrium strategy to a strategy that puts more weight on a price of 2 changes beliefs in a way that makes the monopolist want to put less weight on a price of 2, and similarly for a deviation to a price of 10.

We first construct a perturbed version of the game. Then we show that the learning problem is characterized by a nonlinear stochastic system of difference equations and employ stochastic approximation methods for studying the asymptotic behavior of such a system. Finally, we take the payoff perturbations to zero.

In order to simplify the exposition and thus better illustrate the mechanism driving the dynamics, we modify the subjective model slightly. We assume the monopolist only learns about the parameter $b\in\mathbb{R}$; i.e., her beliefs about parameter $a$ are degenerate at the point $a_0 = 40\neq a$ and thus are never updated. Therefore, beliefs $\mu$ are probability distributions over $\mathbb{R}$, i.e., $\mu\in\Delta(\mathbb{R})$.

Perturbed Game. Let $\xi$ be a real-valued random variable distributed according to $P^\xi$; we use $F$ to denote the associated cdf and $f$ the pdf. The perturbed payoffs are given by $yx - \xi 1\{x = 10\}$. Thus, given beliefs $\mu\in\Delta(\mathbb{R})$, the probability of optimally playing $x = 10$ is
$$\sigma(\mu) = F(8a_0 - 96E_\mu[B]).$$
Note that the only aspect of $\mu$ that matters for the decision of the monopolist is $E_\mu[B]$. Thus, letting $m = E_\mu[B]$ and slightly abusing notation, we use $\sigma(\mu) = \sigma(m)$ as the optimal strategy.

Bayesian Updating.
We now derive the Bayesian updating procedure. We assume that the prior $\mu_0$ is given by a Gaussian distribution with mean $m_0$ and variance $\tau_0$. It is possible to show that, given a realization $(y,x)$ and a prior $N(m,\tau)$, the posterior is also Gaussian, and the mean and variance evolve as follows:
$$m_{t+1} = m_t + \left(-\frac{Y_{t+1}-a_0}{X_{t+1}} - m_t\right)\left(\frac{X^2_{t+1}}{X^2_{t+1}+\tau^{-1}_t}\right) \quad\text{and}\quad \tau_{t+1} = \left(X^2_{t+1}+\tau^{-1}_t\right)^{-1}.$$

Nonlinear Stochastic Difference Equations and Stochastic Approximation.
For simplicity, let $r_{t+1}\equiv\frac{1}{t+1}\left(\tau^{-1}_t + X^2_{t+1}\right)$ and note that the previous nonlinear system of stochastic difference equations can be written as
$$m_{t+1} = m_t + \frac{1}{t+1}\frac{X^2_{t+1}}{r_{t+1}}\left(-\frac{Y_{t+1}-a_0}{X_{t+1}} - m_t\right), \qquad r_{t+1} = r_t + \frac{1}{t+1}\left(X^2_{t+1} - r_t\right).$$
(The Gaussian prior assumed above is standard in settings like ours and, as these recursions show, simplifies the exposition considerably.) Let $\beta_t = (m_t,r_t)'$, $Z_t = (X_t,Y_t)$,
$$G(\beta_t,z_{t+1}) = \begin{bmatrix}\frac{x^2_{t+1}}{r_{t+1}}\left(-\frac{y_{t+1}-a_0}{x_{t+1}} - m_t\right)\\ x^2_{t+1} - r_t\end{bmatrix}$$
and
$$\bar G(\beta) = \begin{bmatrix}\bar G_1(\beta)\\ \bar G_2(\beta)\end{bmatrix} = E_{P^\sigma}[G(\beta,Z_{t+1})] = \begin{bmatrix}F(8a_0-96m)\frac{100}{r}\left(-\frac{a-a_0}{10} + b_0 - m\right) + (1-F(8a_0-96m))\frac{4}{r}\left(-\frac{a-a_0}{2} + b_0 - m\right)\\ 4 + F(8a_0-96m)\,96 - r\end{bmatrix},$$
where $P^\sigma$ is the probability over $Z$ induced by $\sigma$ (and $y = a - b_0x + \omega$). Therefore, the dynamical system can be cast as
$$\beta_{t+1} = \beta_t + \frac{1}{t+1}\bar G(\beta_t) + \frac{1}{t+1}V_{t+1}, \qquad V_{t+1} = G(\beta_t,Z_{t+1}) - \bar G(\beta_t).$$
Stochastic approximation theory (e.g., Kushner and Yin (2003)) implies, roughly speaking, that in order to study the asymptotic behavior of $(\beta_t)_t$ it is enough to study the behavior of the orbits of the following ODE:
$$\dot\beta(t) = \bar G(\beta(t)).$$

Characterization of the Steady States.
In order to find the steady states of $(\beta_t)_t$, it is enough to find $\beta^*$ such that $\bar{G}(\beta^*) = 0$. Let
\[
H(m) \equiv F(8a - 96m)\,10\left(-(a_0 - a) + (b_0 - m)10\right) + \left(1 - F(8a - 96m)\right)2\left(-(a_0 - a) + (b_0 - m)2\right).
\]
Observe that $\bar{G}_1(\beta) = r^{-1}H(m)$ and that $H$ is continuous with $\lim_{m \to -\infty} H(m) = \infty$ and $\lim_{m \to \infty} H(m) = -\infty$. Thus, there exists at least one solution to $H(m) = 0$. Therefore, there exists at least one $\beta^*$ such that $\bar{G}(\beta^*) = 0$.

Let $\bar{b} = b_0 - \frac{a_0 - a}{10} = 4 - \frac{2}{10} = \frac{19}{5}$ and $\underline{b} = b_0 - \frac{a_0 - a}{2} = 4 - \frac{2}{2} = 3$, $\bar{r} = 4 + 96F(8a - 96\underline{b})$ and $\underline{r} = 4 + 96F(8a - 96\bar{b})$, and $B \equiv [\underline{b}, \bar{b}] \times [\underline{r}, \bar{r}]$. It follows that $H(m) < 0$ for all $m > \bar{b}$, and thus $m^*$ must be such that $m^* \leq \bar{b}$. It is also easy to see that $m^* \geq \underline{b}$. Moreover,
\[
\frac{dH(m)}{dm} = 96 f(8a - 96m)\left(8(a_0 - a) - 96(b_0 - m)\right) - 4 - 96F(8a - 96m).
\]
Thus, for any $m \leq \bar{b}$, $\frac{dH(m)}{dm} < 0$, because $m \leq \bar{b}$ implies $8(a_0 - a) \leq (b_0 - m)80 < (b_0 - m)96$. Therefore, on the relevant domain $m \in [\underline{b}, \bar{b}]$, $H$ is decreasing, thus implying that there exists only one $m^*$ such that $H(m^*) = 0$. Therefore, there exists only one $\beta^*$ such that $\bar{G}(\beta^*) = 0$.

We are now interested in characterizing the limit of $\beta^*$ as the perturbation vanishes, i.e., as $F$ converges to $\mathbf{1}\{\xi \geq 0\}$. To do this we introduce some notation. We consider a sequence $(F_n)_n$ that converges to $\mathbf{1}\{\xi \geq 0\}$, use $\beta^*_n$ to denote the steady state associated to $F_n$, and, finally, use $H_n$ to denote the $H$ associated to $F_n$.

We proceed as follows. First note that since $\beta^*_n \in B$ for all $n$, the limit exists (going to a subsequence if needed). We show that $m^* \equiv \lim_{n \to \infty} m^*_n = a/12 = 40/12 = 10/3$. Suppose not; in particular, suppose that $\lim_{n \to \infty} m^*_n < a/12$ (the argument for the reverse inequality is analogous and thus omitted). In this case $\lim_{n \to \infty} (8a - 96m^*_n) > 0$, and thus $\lim_{n \to \infty} F_n(8a - 96m^*_n) = 1$. Therefore
\[
\lim_{n \to \infty} H_n(m^*_n) = 10\left(-(a_0 - a) + (b_0 - m^*)10\right) \geq 10\left(-2 + \tfrac{2}{3} \cdot 10\right) > 0.
\]
But this implies that there exists $N$ such that $H_n(m^*_n) > 0$ for all $n \geq N$, which is a contradiction since $H_n(m^*_n) = 0$ for all $n$.

Moreover, define $\sigma^*_n = F_n(8a - 96m^*_n)$ and $\sigma^* = \lim_{n \to \infty} \sigma^*_n$. Since $H_n(m^*_n) = 0$ for all $n$ and $m^* = 10/3$, it follows that
\[
\sigma^* = \frac{-2\left(-2 + \left(4 - \tfrac{10}{3}\right)2\right)}{10\left(-2 + \left(4 - \tfrac{10}{3}\right)10\right) - 2\left(-2 + \left(4 - \tfrac{10}{3}\right)2\right)} = \frac{1}{36}.
\]

Global convergence to the Steady State. In our example, it is in fact possible to establish that behavior converges with probability 1 to the unique equilibrium. By the results in Benaim (1999), Section 6.3, it is sufficient to establish the global asymptotic stability of $\beta^*_n$ for any $n$, i.e., that the basin of attraction of $\beta^*_n$ is all of $B$. In order to do this, let $L(\beta) = (\beta - \beta^*_n)'P(\beta - \beta^*_n)$ for all $\beta$, where $P \in \mathbb{R}^{2 \times 2}$ is positive definite and diagonal and will be determined later. Note that $L(\beta) = 0$ iff $\beta = \beta^*_n$. Also,
\[
\frac{dL(\beta(t))}{dt} = \nabla L(\beta(t))'\,\dot{\beta}(t) = 2\left(\beta(t) - \beta^*_n\right)'P\,\bar{G}(\beta(t)) = 2\left\{(m(t) - m^*_n)P_{[11]}\bar{G}_1(\beta(t)) + (r(t) - r^*_n)P_{[22]}\bar{G}_2(\beta(t))\right\}.
\]
Since $\bar{G}(\beta^*_n) = 0$,
\[
\begin{aligned}
\frac{dL(\beta(t))}{dt} &= 2\left(\beta(t) - \beta^*_n\right)'P\left(\bar{G}(\beta(t)) - \bar{G}(\beta^*_n)\right)\\
&= 2(m(t) - m^*_n)P_{[11]}\left(\bar{G}_1(\beta(t)) - \bar{G}_1(\beta^*_n)\right) + 2(r(t) - r^*_n)P_{[22]}\left(\bar{G}_2(\beta(t)) - \bar{G}_2(\beta^*_n)\right)\\
&= 2(m(t) - m^*_n)^2 P_{[11]}\int_0^1 \frac{\partial \bar{G}_1(m^*_n + s(m(t) - m^*_n), r^*_n)}{\partial m}\,ds + 2(r(t) - r^*_n)^2 P_{[22]}\int_0^1 \frac{\partial \bar{G}_2(m^*_n, r^*_n + s(r(t) - r^*_n))}{\partial r}\,ds,
\end{aligned}
\]
where the last equality holds by the mean value theorem. Note that $\frac{\partial \bar{G}_2(m^*_n, r^*_n + s(r(t) - r^*_n))}{\partial r} = -1$ and
\[
\int_0^1 \frac{\partial \bar{G}_1(m^*_n + s(m(t) - m^*_n), r^*_n)}{\partial m}\,ds = \int_0^1 (r^*_n)^{-1}\,\frac{dH_n(m^*_n + s(m(t) - m^*_n))}{dm}\,ds < 0,
\]
since $r^*_n \geq \underline{r} > 0$ and $\frac{dH_n(m)}{dm} < 0$ for all $m$ in the relevant domain.
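The limiting values can be double-checked with exact rational arithmetic. This is a sketch under the parameter values reconstructed in this appendix ($a_0 = 42$, $b_0 = 4$, $a = 40$, prices $\{2, 10\}$), which are assumptions inferred from the constants in the text; under them, $m^* = a/12$ and $H(m^*) = 0$ pin down $\sigma^*$.

```python
from fractions import Fraction

# Assumed parameter values from this appendix's example.
a0, b0, a = Fraction(42), Fraction(4), Fraction(40)

m_star = a / 12  # limit of m*_n: solves 8a - 96m = 0
term10 = 10 * (-(a0 - a) + (b0 - m_star) * 10)  # price-10 branch of H at m*
term2 = 2 * (-(a0 - a) + (b0 - m_star) * 2)     # price-2 branch of H at m*

# H(m*) = sigma*term10 + (1 - sigma)*term2 = 0  =>  sigma* = -term2 / (term10 - term2)
sigma_star = -term2 / (term10 - term2)
print(m_star, sigma_star)  # -> 10/3 1/36
```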
Thus, by choosing $P_{[11]} > P_{[22]} > 0$ we obtain $\frac{dL(\beta(t))}{dt} < 0$. The function $L$ therefore satisfies the following properties: $L(\beta)$ is strictly positive for all $\beta \neq \beta^*_n$, $L(\beta^*_n) = 0$, and $\frac{dL(\beta(t))}{dt} < 0$ along trajectories of the ODE. Thus, $L$ satisfies all the conditions of a Lyapunov function and, therefore, $\beta^*_n$ is globally asymptotically stable for all $n$.
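The global-stability conclusion can be illustrated numerically by integrating the ODE $\dot{\beta} = \bar{G}(\beta)$ from several corners of (an approximation of) $B$ and checking that all trajectories reach the same rest point. As before, the parameter values and the logistic stand-in for $F_n$ are illustrative assumptions, and `G_bar` and `flow` are hypothetical helper names.

```python
import math

# Same illustrative parameters as above (assumptions, not the paper's calibration).
A0, B0, A = 42.0, 4.0, 40.0
SCALE = 20.0  # scale of the logistic cdf standing in for F_n

def F(z):
    return 1.0 / (1.0 + math.exp(-z / SCALE))

def G_bar(m, r):
    """Averaged dynamics: G1 = H(m)/r and G2 = 4 + 96 F(8a - 96m) - r."""
    s = F(8 * A - 96 * m)
    H = (s * 10 * (-(A0 - A) + (B0 - m) * 10)
         + (1 - s) * 2 * (-(A0 - A) + (B0 - m) * 2))
    return H / r, 4 + 96 * s - r

def flow(m, r, dt=1e-3, T=50.0):
    """Euler-integrate the ODE beta'(t) = G_bar(beta(t)) from (m, r)."""
    for _ in range(int(T / dt)):
        g1, g2 = G_bar(m, r)
        m, r = m + dt * g1, r + dt * g2
    return m, r
```

Starting from the four corners of a rectangle approximating $B$, every trajectory ends at the same rest point with $\bar{G} \approx 0$ there, consistent with the Lyapunov argument above.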