Berk-Nash Equilibrium: A Framework for Modeling Agents with Misspecified Models
Ignacio Esponda (WUSTL)    Demian Pouzo (UC Berkeley)

May 10, 2016
Abstract
We develop an equilibrium framework that relaxes the standard assumption that people have a correctly specified view of their environment. Each player is characterized by a (possibly misspecified) subjective model, which describes the set of feasible beliefs over payoff-relevant consequences as a function of actions. We introduce the notion of a Berk-Nash equilibrium: Each player follows a strategy that is optimal given her belief, and her belief is restricted to be the best fit among the set of beliefs she considers possible. The notion of best fit is formalized in terms of minimizing the Kullback-Leibler divergence, which is endogenous and depends on the equilibrium strategy profile. Standard solution concepts such as Nash equilibrium and self-confirming equilibrium constitute special cases where players have correctly specified models. We provide a learning foundation for Berk-Nash equilibrium by extending and combining results from the statistics literature on misspecified learning and the economics literature on learning in games.

∗ We thank Vladimir Asriyan, Pierpaolo Battigalli, Larry Blume, Aaron Bodoh-Creed, Sylvain Chassang, Emilio Espino, Erik Eyster, Drew Fudenberg, Yuriy Gorodnichenko, Stephan Lauermann, Natalia Lazzati, Kristóf Madarász, Matthew Rabin, Ariel Rubinstein, Joel Sobel, Jörg Stoye, several seminar participants, and especially a co-editor and four anonymous referees for very helpful comments. Esponda: Olin Business School, Washington University in St. Louis, 1 Brookings Drive, Campus Box 1133, St. Louis, MO 63130, [email protected]; Pouzo: Department of Economics, UC Berkeley, 530-1 Evans Hall.

Introduction
Most economists recognize that the simplifying assumptions underlying their models are often wrong. But, despite recognizing that models are likely to be misspecified, the standard approach (with exceptions noted below) assumes that economic agents have a correctly specified view of their environment. We present an equilibrium framework that relaxes this standard assumption and allows the modeler to postulate that economic agents have a subjective and possibly incorrect view of their world.

An objective game represents the true environment faced by the agent (or players, in the case of several interacting agents). Payoff-relevant states and privately observed signals are drawn from an objective probability distribution. Each player observes her own private signal and then players simultaneously choose actions. The action profile and the realized state determine consequences, and consequences determine payoffs. In addition, each player has a subjective model representing her own view of the environment. Formally, a subjective model is a set of probability distributions over own consequences as a function of a player's own action and information. Crucially, we allow the subjective model of one or more players to be misspecified, which roughly means that the set of subjective distributions does not include the true, objective distribution. For example, a consumer might perceive a nonlinear price schedule to be linear and, therefore, respond to average, not marginal, prices. Or traders might not realize that the value of trade is partly determined by the terms of trade.
A Berk-Nash equilibrium is a strategy profile such that, for each player, there exists a belief with support in her subjective model satisfying two conditions. First, the strategy is optimal given the belief. Second, the belief puts probability one on the set of subjective distributions over consequences that are "closest" to the true distribution, where the true distribution is determined by the objective game and the actual strategy profile. The notion of "closest" is given by a weighted version of the Kullback-Leibler divergence, also known as relative entropy.

Berk-Nash equilibrium includes standard and boundedly rational solution concepts in a common framework, such as Nash, self-confirming (e.g., Battigalli (1987), Fudenberg and Levine (1993a), Dekel et al. (2004)), fully cursed (Eyster and Rabin, 2005), and analogy-based expectation equilibrium (Jehiel (2005), Jehiel and Koessler (2008)). For example, suppose that the game is correctly specified (i.e., the support of each player's prior contains the true distribution) and that the game is strongly identified (i.e., there is a unique distribution, whether or not correct, that matches the observed data). Then Berk-Nash equilibrium is equivalent to Nash equilibrium. If the strong identification assumption is dropped, then Berk-Nash is a self-confirming equilibrium. In addition to unifying previous work, our framework provides a systematic approach for extending previous cases and exploring new types of misspecifications.

We provide a foundation for Berk-Nash equilibrium (and the use of Kullback-Leibler divergence as a measure of "distance") by studying a dynamic setup with a fixed number of players playing the objective game repeatedly. Each player believes that the environment is stationary and starts with a prior over her subjective model. In each period, players use the observed consequences to update their beliefs according to Bayes' rule.
The main objective is to characterize limiting behavior when players behave optimally but learn with a possibly misspecified subjective model. The main result is that, if players' behavior converges, then it converges to a Berk-Nash equilibrium. A converse result, showing that we can converge to any Berk-Nash equilibrium of the game for some initial (non-doctrinaire) prior, does not hold. But we obtain a positive convergence result by relaxing the assumption that players exactly optimize. For any given Berk-Nash equilibrium, we show that convergence to that equilibrium occurs if agents are myopic and make asymptotically optimal choices (i.e., optimization mistakes vanish with time).

There is a longstanding interest in studying the behavior of agents who hold misspecified views of the world. Examples come from diverse fields including industrial organization, mechanism design, information economics, macroeconomics, and psychology and economics (e.g., Arrow and Green (1973), Kirman (1975), Sobel (1984), Kagel and Levin (1986), Nyarko (1991), Sargent (1999), Rabin (2002)), although there is often no explicit reference to misspecified learning. Most of the literature, however, focuses on particular settings, and there has been little progress in developing a unified framework. Our treatment unifies both "rational" and "boundedly rational" approaches, thus emphasizing that modeling the behavior of misspecified players does not constitute a large departure from the standard framework.

Arrow and Green (1973) provide a general treatment and make a distinction between objective and subjective games. Their framework, though, is more restrictive than ours in terms of the types of misspecifications that players are allowed to have.

In the case of multiple agents, the environment need not be stationary, and so we are ignoring repeated-game considerations where players take into account how their actions affect others' future play.
We discuss the extension to a population model with a continuum of agents in Section 5.

Our paper is also related to the bandit (e.g., Rothschild (1974), McLennan (1984), Easley and Kiefer (1988)) and self-confirming equilibrium (SCE) literatures, which highlight that agents might optimally end up with incorrect beliefs if experimentation is costly. We also allow beliefs to be incorrect due to insufficient feedback, but our main contribution is to allow for misspecified learning. When players have misspecified models, beliefs may be incorrect and endogenously depend on own actions even if there is persistent experimentation; thus, an equilibrium framework is needed to characterize steady-state behavior even in single-agent settings.

From a technical perspective, we extend and combine results from two literatures. First, the idea that equilibrium is a result of a learning process comes from the literature on learning in games. This literature studies explicit learning models to justify Nash and SCE (e.g., Fudenberg and Kreps (1988), Fudenberg and Kreps (1993), Fudenberg and Kreps (1995), Fudenberg and Levine (1993b), Kalai and Lehrer (1993)). We extend this literature by allowing players to learn with models of the world that are misspecified even in steady state.

Second, we rely on and contribute to the literature studying the limiting behavior of Bayesian posteriors. The results from this literature have been applied to decision problems with correctly specified agents (e.g., Easley and Kiefer, 1988). In particular, an application of the martingale convergence theorem implies that beliefs converge almost surely under the agent's subjective prior. This result, however, does not guarantee convergence of beliefs according to the true distribution if the agent has a misspecified model and the support of her prior does not include the true distribution. Thus, we take a different route and follow the statistics literature on misspecified learning.
This literature characterizes limiting beliefs in terms of the Kullback-Leibler divergence (e.g., Berk (1966), Bunke and Milhaud (1998)). We extend the statistics literature on misspecified learning.

Some explanations for why players may have misspecified models include the use of heuristics (Tversky and Kahneman, 1973), complexity (Aragones et al., 2005), the desire to avoid over-fitting the data (Al-Najjar (2009), Al-Najjar and Pai (2013)), and costly attention (Schwartzstein, 2009). In the macroeconomics literature, the term SCE is sometimes used in a broader sense to include cases where agents have misspecified models (e.g., Sargent, 1999). Two extensions of SCE are also potentially applicable: restrictions on beliefs based on introspection (e.g., Rubinstein and Wolinsky, 1994), and ambiguity aversion (Battigalli et al., 2012). See Fudenberg and Levine (1998, 2009) for a survey of this literature. White (1982) shows that the Kullback-Leibler divergence also characterizes the limiting behavior of the maximum quasi-likelihood estimator.
A (simultaneous-move) game G = ⟨O, Q⟩ is composed of a (simultaneous-move) objective game O and a subjective model Q.

Objective game. A (simultaneous-move) objective game is a tuple O = ⟨I, Ω, S, p, X, Y, f, π⟩, where: I is the set of players; Ω is the set of payoff-relevant states; S = ×_{i∈I} S^i is the set of profiles of signals, where S^i is the set of signals of player i; p is a probability distribution over Ω × S, and, for simplicity, it is assumed to have marginals with full support; we use standard notation to denote marginal and conditional distributions, e.g., p_{Ω|S^i}(· | s^i) denotes the conditional distribution over Ω given S^i = s^i; X = ×_{i∈I} X^i is a set of profiles of actions, where X^i is the set of actions of player i; Y = ×_{i∈I} Y^i is a set of profiles of (observable) consequences, where Y^i is the set of consequences of player i; f = (f^i)_{i∈I} is a profile of feedback or consequence functions, where f^i : X × Ω → Y^i maps outcomes in X × Ω into consequences of player i; and π = (π^i)_{i∈I}, where π^i : X^i × Y^i → R is the payoff function of player i. For simplicity, we prove the results for the case where all of the above sets are finite.

The timing of the objective game is as follows: First, a state and a profile of signals are drawn according to p. Second, each player privately observes her own signal. Third, players simultaneously choose actions. Finally, each player observes her consequence and obtains a payoff. We implicitly assume that players observe at least their own payoffs.

The concept of a feedback function is borrowed from the SCE literature. Also, while it is redundant to have π^i depend on x^i, it simplifies the notation in applications. In the working paper version (Esponda and Pouzo, 2014), we provide technical conditions under which the results extend to nonfinite Ω and Y.

A strategy of player i is a mapping σ^i : S^i → Δ(X^i).
The probability that player i chooses action x^i after observing signal s^i is denoted by σ^i(x^i | s^i). A strategy profile is a vector of strategies σ = (σ^i)_{i∈I}; let Σ denote the space of all strategy profiles.

Fix an objective game. For each strategy profile σ, there is an objective distribution over player i's consequences, Q^i_σ : S^i × X^i → Δ(Y^i), where

    Q^i_σ(y^i | s^i, x^i) = Σ_{(ω, x^{-i}) : f^i(x^i, x^{-i}, ω) = y^i} Σ_{s^{-i}} Π_{j≠i} σ^j(x^j | s^j) p_{Ω×S^{-i}|S^i}(ω, s^{-i} | s^i),   (1)

for all (s^i, x^i, y^i) ∈ S^i × X^i × Y^i. The objective distribution represents the true distribution over consequences, conditional on a player's own action and signal, given the objective game and a strategy profile followed by the players.
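Equation (1) amounts to summing out the state, the other players' signals, and the other players' actions. As a minimal sketch (a hypothetical two-player toy game of our own devising, with no signals; every primitive below is our own choice, not from the paper), the computation looks like:

```python
# Sketch: computing the objective distribution Q^1_sigma of equation (1)
# for a toy two-player game with no signals. All primitives are hypothetical.

from itertools import product

omegas = {0: 0.5, 1: 0.5}   # p: distribution over payoff-relevant states
X2 = {"L": 0.3, "R": 0.7}   # player 2's (fixed) mixed strategy sigma^2

def f1(x1, x2, omega):
    """Player 1's feedback function: an arbitrary illustrative rule that
    maps the action profile and the state into a consequence y."""
    return int((x1 == "L") == (x2 == "L")) + omega

def Q1_sigma(x1):
    """Objective distribution over player 1's consequences given her own
    action, obtained by summing out omega and x2 as in equation (1)."""
    dist = {}
    for (omega, p_w), (x2, p_x2) in product(omegas.items(), X2.items()):
        y = f1(x1, x2, omega)
        dist[y] = dist.get(y, 0.0) + p_w * p_x2
    return dist

print(Q1_sigma("L"))  # a probability distribution over y in {0, 1, 2}
```

The returned weights sum to one by construction, since each (ω, x²) pair contributes its joint probability to exactly one consequence y.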
Subjective model. The subjective model represents the set of distributions over consequences that players consider possible a priori. For a fixed objective game, a subjective model is a tuple Q = ⟨Θ, (Q_θ)_{θ∈Θ}⟩, where Θ = ×_{i∈I} Θ^i and Θ^i is player i's parameter set; and Q_θ = (Q^i_{θ^i})_{i∈I}, where Q^i_{θ^i} : S^i × X^i → Δ(Y^i) is the conditional distribution over player i's consequences parameterized by θ^i ∈ Θ^i; we denote the conditional distribution by Q^i_{θ^i}(· | s^i, x^i).

While the objective game represents the true environment, the subjective model represents the players' perception of their environment. This separation between objective and subjective models is crucial in this paper.
Remark 1. A special case of a subjective model is one where each player understands the objective game being played but is uncertain about the distribution over states, the consequence function, and (in the case of multiple players) the strategies of other players. In this special case, player i's uncertainty about p, f^i, and σ^{-i} can be described by a parametric model p_{θ^i}, f^i_{θ^i}, σ^{-i}_{θ^i}, where θ^i ∈ Θ^i. A subjective distribution Q^i_{θ^i} is then derived by replacing p, f^i, and σ^{-i} with p_{θ^i}, f^i_{θ^i}, σ^{-i}_{θ^i} in equation (1). □

See Online Appendix E for the case where players do not observe own payoffs. As usual, the superscript −i denotes a profile where the i'th component is excluded. For simplicity, we assume that players know the distribution over own signals. In this case, a player understands that other players mix independently but, due to uncertainty over the parameter θ^i that indexes σ^{-i}_{θ^i} = (σ^j_{θ^i})_{j≠i}, she may have correlated beliefs about her opponents' strategies, as in Fudenberg and Levine (1993a).
By defining Q^i_{θ^i} as a primitive, we stress two points. First, this object is sufficient to characterize behavior. Second, working with general subjective distributions allows for more general types of misspecifications, where players do not even have to understand the structural elements that determine their payoff-relevant consequences. We maintain the following assumptions about the subjective model.

Assumption 1.
For all i ∈ I: (i) Θ^i is a compact subset of a Euclidean space; (ii) Q^i_{θ^i}(y^i | s^i, x^i) is continuous as a function of θ^i ∈ Θ^i for all (y^i, s^i, x^i) ∈ Y^i × S^i × X^i; (iii) for all θ^i ∈ Θ^i, there exists a sequence (θ^i_n)_n in Θ^i such that lim_{n→∞} θ^i_n = θ^i and such that, for all n, Q^i_{θ^i_n}(y^i | s^i, x^i) > 0 for all (s^i, x^i) ∈ S^i × X^i, y^i ∈ f^i(x^i, X^{-i}, ω), and ω ∈ supp(p_{Ω|S^i}(· | s^i)).

Conditions (i) and (ii) are the standard conditions used to define a parametric model in statistics (e.g., Bickel et al. (1993)). Condition (iii) plays two roles. First, it guarantees that there exists at least one parameter value that attaches positive probability to every feasible observation. In particular, it rules out what can be viewed as a stark misspecification in which every element of the subjective model attaches zero probability to an event that occurs with positive true probability. Second, it imposes a "richness" condition on the subjective model: If a feasible event is deemed impossible by some parameter value, then that parameter value is not isolated, in the sense that there are nearby parameter values that consider every feasible event to be possible. In Section 5, we show that equilibrium may fail to exist and steady-state behavior need not be characterized by equilibrium without this assumption.

We illustrate the environment by presenting several examples that had previously not been integrated into a common framework. In examples with a single agent, we drop the i subscript from the notation.

Example 2.1.
Monopolist with unknown demand. A monopolist faces demand y = f(x, ω) = φ(x) + ω, where x ∈ X is the price chosen by the monopolist and ω is a mean-zero shock with distribution p ∈ Δ(Ω). The monopolist observes sales y, but not the shock. The monopolist does not observe any signal, and so we omit signals from the notation. The monopolist's payoff is π(x, y) = xy (i.e., there are no costs). The monopolist's uncertainty about p and f is described by a parametric model f_θ, p_θ, where y = f_θ(x, ω) = a − bx + ω is the demand function, θ = (a, b) ∈ Θ is a parameter vector, and ω ∼ N(0, 1) (i.e., p_θ is a standard normal distribution for all θ ∈ Θ). In particular, this example corresponds to the special case discussed in Remark 1, and Q_θ(· | x) is a normal density with mean a − bx and unit variance. □

Nyarko (1991) studies a special case of Example 2.1 and shows that a steady state does not exist in pure strategies; Sobel (1984) considers a misspecification similar to Example 2.2; Tversky and Kahneman's (1973) story motivates Example 2.3; Sargent (1999, Chapter 7) studies Example 2.4; and Kagel and Levin (1986), Eyster and Rabin (2005), Jehiel and Koessler (2008), and Esponda (2008) study Example 2.5. See Esponda and Pouzo (2014) for additional examples.

Example 2.2.
Nonlinear taxation. An agent chooses effort x ∈ X at cost c(x) and obtains income z = x + ω, where ω is a zero-mean shock with distribution p ∈ Δ(Ω). The agent pays taxes t = τ(z), where τ(·) is a nonlinear tax schedule. The agent does not observe any signal, and so we omit them. The agent observes y = (z, t) and obtains payoff π(x, z, t) = z − t − c(x). She understands how effort translates into income but fails to realize that the marginal tax rate depends on income. We compare two models that capture this misspecification. In model A, the agent believes in a random-coefficient model, t = (θ_A + ε)z, in which the marginal and average tax rates are both equal to θ_A + ε, where θ_A ∈ Θ_A = R. In model B, the agent believes that t = θ_{B,1} + θ_{B,2} z + ε, where θ_{B,2} is the constant marginal tax rate and θ_B = (θ_{B,1}, θ_{B,2}) ∈ Θ_B = R². In both models, ε ∼ N(0, 1) measures uncertain aspects of the schedule (e.g., variations in tax rates or credits). Thus, Q^j_θ(t, z | x) = Q^j_θ(t | z) p(z − x), where Q^j_θ(· | z) is a normal density with mean θ_A z and variance z² in model j = A and mean θ_{B,1} + θ_{B,2} z and unit variance in model j = B. □

Example 2.3.
Regression to the mean. An instructor observes the initial performance s of a student and decides to praise or criticize him, x ∈ {C, P}. The student then performs again and the instructor observes his final performance, s′. The truth is that performances y = (s, s′) are independent, standard normal random variables. The instructor's payoff is π(x, s, s′) = s′ − c(x, s), where c(x, s) = κ|s| if either s > 0 and x = C or s < 0 and x = P, and, in all other cases, c(x, s) = 0. The function c represents a (reputation) cost from lying (i.e., criticizing above-average or praising below-average performance). The instructor, however, does not admit the possibility of regression to the mean and believes that s′ = s + θ_x + ε, where ε ∼ N(0, 1) and θ = (θ_C, θ_P) ∈ Θ parameterizes her perceived influence on performance. Thus, letting Q̄_θ(· | s, x) be a normal density with mean s + θ_x and unit variance, it follows that Q_θ(ŝ, s′ | s, x) = Q̄_θ(s′ | s, x) if ŝ = s and 0 otherwise. □

Formally, f(x, ω) = (z(x, ω), t(x, ω)), where z(x, ω) = x + ω and t(x, ω) = τ(x + ω). It is not necessary to assume that Θ_A and Θ_B are compact for an equilibrium to exist; the same comment applies to Examples 2.3 and 2.4. Formally, ω = (s, s′), p is the product of standard normal distributions, and y = f(x, ω) = ω.

Example 2.4.
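A short simulation (our own illustration, not from the paper; we assume the natural policy of praising above-average and criticizing below-average initial performance) shows the data pattern that feeds the instructor's misinference: after praise the change s′ − s is negative on average, and after criticism it is positive.

```python
# Sketch: regression to the mean in Example 2.3. Performances s, s' are
# independent standard normals; the praise/criticize policy is our assumption.

import random
random.seed(0)

n = 200_000
after_praise, after_criticism = [], []
for _ in range(n):
    s, s_prime = random.gauss(0, 1), random.gauss(0, 1)  # independent draws
    (after_praise if s > 0 else after_criticism).append(s_prime - s)

# Under the instructor's unit-variance normal model, the best-fitting theta_x
# is the average observed change s' - s after action x.
theta_P = sum(after_praise) / len(after_praise)
theta_C = sum(after_criticism) / len(after_criticism)
print(theta_P, theta_C)  # theta_P < 0 < theta_C: praise "hurts", criticism "helps"
```

The magnitudes are close to ∓√(2/π) ≈ ∓0.80, the conditional mean of a standard normal truncated at zero; the instructor reads pure mean reversion as a causal effect of her feedback.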
Monetary policy. Two players, the government (G) and the public (P), i.e., I = {G, P}, choose monetary policy x_G and inflation forecasts x_P, respectively. They do not observe signals, and so we omit them. Inflation, e, and unemployment, U, are determined by

    e = x_G + ε_e   (2)
    U = u* − λ(e − x_P) + ε_U,   (3)

where u* > 0, λ ∈ (0, 1), and ω = (ε_e, ε_U) ∈ Ω = R² are shocks with a full-support distribution p ∈ Δ(Ω) and Var(ε_e) > 0. The public and the government observe realized inflation and unemployment, but not the error terms. The government's payoff is π(x_G, e, U) = −(U² + e²). For simplicity, we focus on the government's problem and assume that the public has correct beliefs and chooses x_P = x_G. The government understands how its policy x_G affects inflation, but does not realize that unemployment is affected by surprise inflation:

    U = θ_1 − θ_2 e + ε_U.   (4)

The subjective model is parameterized by θ = (θ_1, θ_2) ∈ Θ, and it follows that Q_θ(e, U | x_G) is the density implied by equations (2) and (4). □

Example 2.5.
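A quick Monte Carlo sketch (our own; the values u* = 5, λ = 0.5, x_G = 2 and the independent standard normal shocks are arbitrary choices) illustrates what the government's misspecified model (4) recovers when the public forecasts correctly: unemployment co-moves only with the inflation surprise, yet the regression of U on e still has slope about −λ, so the fitted θ₂ is close to λ.

```python
# Sketch: fitting the government's misspecified Phillips curve (4) to data
# generated by (2)-(3) with x_P = x_G. All parameter values are hypothetical.

import random
random.seed(1)

u_star, lam, x_G = 5.0, 0.5, 2.0
n = 200_000
es, Us = [], []
for _ in range(n):
    eps_e, eps_U = random.gauss(0, 1), random.gauss(0, 1)
    e = x_G + eps_e
    U = u_star - lam * (e - x_G) + eps_U   # x_P = x_G: only surprises matter
    es.append(e)
    Us.append(U)

mean_e, mean_U = sum(es) / n, sum(Us) / n
cov = sum((e - mean_e) * (U - mean_U) for e, U in zip(es, Us)) / n
var = sum((e - mean_e) ** 2 for e in es) / n
theta_2 = -cov / var   # the OLS slope of U on e is -theta_2 in model (4)
print(theta_2)         # close to lambda = 0.5
```

So the government comes to believe in an exploitable inflation-unemployment tradeoff even though, with correct forecasts, none exists.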
Trade with adverse selection.
A buyer with valuation v ∈ V and a seller submit a (bid) price x ∈ X and an ask price a ∈ A, respectively. The seller's ask price and the buyer's value are drawn from p ∈ Δ(A × V), so that Ω = A × V is the state space. Thus, the buyer is the only decision maker. After submitting a price, the buyer observes y = ω = (a, v) and gets payoff π(x, a, v) = v − x if a ≤ x and zero otherwise. In other words, the buyer observes perfect feedback, gets v − x if there is trade, and 0 otherwise. When making an offer, she does not know her value or the seller's ask price. She also does not observe any signals, and so we omit them. Finally, suppose that A and V are correlated but that the buyer believes they are independent. This is captured by letting Q_θ = θ and Θ = Δ(A) × Δ(V). □

A model that allows for regression to the mean is s′ = αs + θ_x + ε; in this case, the agent would correctly learn that α = 0 and θ_x = 0 for all x. Rabin and Vayanos (2010) study a related setup in which the agent believes that shocks are autoregressive when in fact they are i.i.d. Formally, ω = (ε_e, ε_U) and y = (e, U) = f(x_G, x_P, ω) is given by equations (2) and (3).

Distance to true model. In equilibrium, we will require players' beliefs to put probability one on the set of subjective distributions over consequences that are "closest" to the objective distribution. The following function, which we call the weighted Kullback-Leibler divergence (wKLD) function of player i, is a weighted version of the standard Kullback-Leibler divergence in statistics (Kullback and Leibler, 1951). It represents a "distance" between the objective distribution over i's consequences given a strategy profile σ ∈ Σ and the distribution as parameterized by θ^i ∈ Θ^i:

    K^i(σ, θ^i) = Σ_{(s^i, x^i) ∈ S^i × X^i} E_{Q^i_σ(·|s^i,x^i)}[ ln ( Q^i_σ(Y^i | s^i, x^i) / Q^i_{θ^i}(Y^i | s^i, x^i) ) ] σ^i(x^i | s^i) p_{S^i}(s^i).   (5)

The set of closest parameter values of player i given σ is the set

    Θ^i(σ) ≡ arg min_{θ^i ∈ Θ^i} K^i(σ, θ^i).

The interpretation is that Θ^i(σ) ⊂ Θ^i is the set of parameter values that player i can believe to be possible after observing feedback consistent with strategy profile σ.

Remark 2. We show in Section 4 that wKLD is the right notion of distance in a learning model with Bayesian players. Here, we provide a heuristic argument for a Bayesian agent (we drop i superscripts for clarity) with parameter set Θ = {θ_1, θ_2} who observes data over t periods, (s_τ, x_τ, y_τ)_{τ=0}^{t−1}, that comes from repeated play of σ.
Here, we provide an heuristic argument fora Bayesian agent (we drop i subscripts for clarity) with parameter set Θ = { θ , θ } who observes data over t periods, ( s τ , x τ , y τ ) t − τ =0 , that comes from repeated play of The typical story is that there is a population of sellers each of whom follows the weakly dominantstrategy of asking for her valuation; thus, the ask price is a function of the seller’s valuation and, ifbuyer and seller valuations are correlated, then the ask price and buyer valuation are also correlated. The notation E Q denotes expectation with respect to the probability distribution Q . Also, weuse the convention that − ln 0 = ∞ and 0 ln 0 = 0. σ . Let ρ = µ ( θ ) /µ ( θ ) denote the agent’s ratioof priors. Applying Bayes’ rule and simple algebra, the posterior probability over θ after t periods is µ t ( θ ) = (cid:18) ρ Π t − τ =0 Q θ ( y τ | s τ , x τ ) Q θ ( y τ | s τ , x τ ) (cid:19) − = (cid:18) ρ Π t − τ =0 Q θ ( y τ | s τ , x τ ) /Q σ ( y τ | s τ , x τ ) Q θ ( y τ | s τ , x τ ) /Q σ ( y τ | s τ , x τ ) (cid:19) − = (cid:32) ρ exp (cid:40) − t (cid:32) t t − (cid:88) τ =0 ln Q σ ( y τ | s τ , x τ ) Q θ ( y τ | s τ , x τ ) − t t − (cid:88) τ =0 ln Q σ ( y τ | s τ , x τ ) Q θ ( y τ | s τ , x τ ) (cid:33)(cid:41)(cid:33) − where the second equality follows by multiplying and dividing by Π t − τ =0 Q σ ( y τ | s τ , x τ ).By a law of large numbers argument and the fact that the true joint distributionover ( s, x, y ) is given by Q σ ( y | x, s ) σ ( x | s ) p S ( s ), the difference in the log-likelihoodratios converges to K ( σ, θ ) − K ( σ, θ ). Suppose that K ( σ, θ ) > K ( σ, θ ). Then,for sufficiently large t , the posterior belief µ t ( θ ) is approximately equal to 1 / (1 + ρ exp ( − t ( K ( σ, θ ) − K ( σ, θ )))), which converges to 0. Therefore, the posterior even-tually assigns zero probability to θ . On the other hand, if K ( σ, θ ) < K ( σ, θ ), thenthe posterior eventually assigns zero probability to θ . 
Thus, the posterior eventually assigns zero probability to parameter values that do not minimize K(σ, ·). □

Remark 3. Because the wKLD function is weighted by a player's own strategy, it places no restrictions on beliefs about outcomes that only arise following out-of-equilibrium actions (beyond the restrictions imposed by Θ). □
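The heuristic argument in Remark 2 is easy to check numerically. In this sketch (our own toy specification: a binary consequence, no signals, a fixed action, and all numbers chosen by us), both parameter values are wrong, yet the posterior concentrates on the one with the smaller Kullback-Leibler divergence:

```python
# Sketch: Bayesian updating with a misspecified two-point parameter set.
# Neither theta matches the true distribution; the posterior still settles
# on the KL-divergence minimizer, as the heuristic argument predicts.

import math
import random
random.seed(2)

q_true = 0.7                        # true probability that y = 1
q = {"theta1": 0.6, "theta2": 0.2}  # subjective model: both parameters wrong

def kl(p, r):
    """KL divergence between Bernoulli(p) and Bernoulli(r)."""
    return p * math.log(p / r) + (1 - p) * math.log((1 - p) / (1 - r))

log_odds = 0.0  # log of mu_t(theta1) / mu_t(theta2); uniform prior, rho = 1
for _ in range(2000):
    y = 1 if random.random() < q_true else 0
    for name, sign in (("theta1", +1), ("theta2", -1)):
        lik = q[name] if y == 1 else 1 - q[name]
        log_odds += sign * math.log(lik)

posterior_theta1 = 1 / (1 + math.exp(-log_odds))
assert kl(q_true, q["theta1"]) < kl(q_true, q["theta2"])
print(posterior_theta1)  # essentially 1: belief settles on the best fit
```

The per-period drift of the log-odds is exactly the KL gap, so the posterior on the worse-fitting parameter vanishes at an exponential rate.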
The next result collects some useful properties of the wKLD function.
Lemma 1. (i) For all i ∈ I, θ^i ∈ Θ^i, and σ ∈ Σ, K^i(σ, θ^i) ≥ 0, with equality holding if and only if Q^i_{θ^i}(· | s^i, x^i) = Q^i_σ(· | s^i, x^i) for all (s^i, x^i) such that σ^i(x^i | s^i) > 0. (ii) For all i ∈ I, Θ^i(·) is nonempty, upper hemicontinuous, and compact valued.

Proof. See the Appendix.

The upper hemicontinuity of Θ^i(·) would follow from the Theorem of the Maximum had we assumed Q^i_{θ^i} to be positive for all feasible events and θ^i ∈ Θ^i, since the wKLD function would then be finite and continuous. But this assumption may be strong in some cases; for example, it rules out cases where a player believes others follow pure strategies. Assumption 1(iii) weakens this assumption by requiring that it holds for a dense subset of Θ, and still guarantees that Θ^i(·) is upper hemicontinuous.

Optimality. In equilibrium, we will require each player to choose a strategy that is optimal given her beliefs. A strategy σ^i for player i is optimal given μ^i ∈ Δ(Θ^i) if σ^i(x^i | s^i) > 0 implies

    x^i ∈ arg max_{x̄^i ∈ X^i} E_{Q̄^i_{μ^i}(·|s^i, x̄^i)}[ π^i(x̄^i, Y^i) ],   (6)

where Q̄^i_{μ^i}(· | s^i, x^i) = ∫_{Θ^i} Q^i_{θ^i}(· | s^i, x^i) μ^i(dθ^i) is the distribution over consequences of player i, conditional on (s^i, x^i) ∈ S^i × X^i, induced by μ^i.

Definition of equilibrium.
We propose the following solution concept.
Definition 1.
A strategy profile σ is a Berk-Nash equilibrium of game G if, for all players i ∈ I, there exists μ^i ∈ Δ(Θ^i) such that

(i) σ^i is optimal given μ^i, and

(ii) μ^i ∈ Δ(Θ^i(σ)), i.e., if θ̂^i is in the support of μ^i, then θ̂^i ∈ arg min_{θ^i ∈ Θ^i} K^i(σ, θ^i).

Definition 1 places two restrictions on equilibrium behavior: (i) optimization given beliefs, and (ii) endogenous restrictions on beliefs. For comparison, note that the definition of Nash equilibrium is identical to Definition 1 except that condition (ii) is replaced with the condition that players have correct beliefs, i.e., Q̄^i_{μ^i} = Q^i_σ.

Existence of equilibrium. The standard existence proof of Nash equilibrium cannot be used here because the analogous version of a best-response correspondence is not necessarily convex valued. To prove existence, we first perturb payoffs and establish that equilibrium exists in the perturbed game. We then consider a sequence of equilibria of perturbed games, where perturbations go to zero, and establish that the limit is a Berk-Nash equilibrium of the (unperturbed) game. The nonstandard part of the proof is to prove existence of equilibrium in the perturbed game. The perturbed best-response correspondence is still not necessarily convex valued. Our approach is to characterize equilibrium as a fixed point of a belief correspondence and show that it satisfies the requirements of a generalized version of Kakutani's fixed point theorem.

The idea of perturbations and the strategy of the existence proof date back to Harsanyi (1973); Selten (1975) and Kreps and Wilson (1982) also used these ideas to prove existence of perfect and sequential equilibrium, respectively.

Theorem 1. Every game has at least one Berk-Nash equilibrium.

Proof.
See the Appendix.
Example 2.1, continued.
Monopolist with unknown demand . Let σ = ( σ x ) x ∈ X denote a strategy, where σ x is the probability of choosing price x ∈ X .Because this is a single-agent problem, the objective distribution does not depend on σ ; hence, we denote it by Q ( · | x ), which is a normal density with mean φ ( x ) andunit variance. Similarly, Q θ ( · | x ) is a normal density with mean φ θ ( x ) = a − bx andunit variance. It follows from equation (5) that K ( σ, θ ) = (cid:88) x ∈ X σ x E Q ( ·| x ) (cid:2) ( Y − φ θ ( x )) − ( Y − φ ( x )) (cid:3) = (cid:88) x ∈ X σ x
12 ( φ ( x ) − φ θ ( x )) . For concreteness, let X = { , } , φ (2) = 34 and φ (10) = 2, and Θ = [33 , × [3 , . Let θ ∈ R provide a perfect fit for demand, i.e., φ θ ( x ) = φ ( x ) for all x ∈ X . In this example, θ = ( a , b ) = (42 , / ∈ Θ and, therefore, we say that themonopolist has a misspecified model. The dashed line in Figure 1 depicts optimalbehavior: the optimal price is 10 to the left, it is 2 to the right, and the monopolist isindifferent for parameter values on the dashed line.To solve for equilibrium, we first consider pure strategies. If σ = (0 ,
1) (i.e., theprice is x = 10), the first order conditions ∂K ( σ, θ ) /∂a = ∂K ( σ, θ ) /∂b = 0 imply φ (10) = φ θ (10) = a − b
10, and any ( a, b ) ∈ Θ on the segment AB in Figure 1minimizes K ( σ, · ). These minimizers, however, lie to the right of the dashed line,where it is not optimal to set a price of 10. Thus, σ = (0 ,
1) is not an equilibrium.A similar argument establishes that σ = (1 ,
0) is not an equilibrium: If it were, theminimizer would be at D , where it is in fact not optimal to choose a price of 2.Finally, consider mixed strategies. Because both first order conditions cannot holdsimultaneously, the parameter value that minimizes K ( σ, θ ) lies on the boundary of Θ.A bit of algebra shows that, for any totally mixed σ , there is a unique minimizer θ σ =( a σ , b σ ) characterized as follows. If σ ≤ /
4, the minimizer is on the segment BC : In particular, the deterministic part of the demand function can have any functional form pro-vided it passes through (2 , φ o (2)) and (10 , φ (10)). Slope=2Slope=10Θ ba θ Optimal Price = 10 Optimal Price = 2D A B θ ˆ σ . ba θ Optimal Price = 10 Optimal Price = 2D A B θ σ ∗ . Figure 1: Monopolist with misspecified demand function.
Left panel: the parameter value that minimizes the wKLD function given strategy σ̂ is θ_σ̂. Right panel: σ* is a Berk-Nash equilibrium (σ* is optimal given θ_σ*, because θ_σ* lies on the indifference line, and θ_σ* minimizes the wKLD function given σ*).

On the segment BC, b_σ = 3.5 and a_σ = 4σ + 37 solves ∂K(σ, θ)/∂a = 0. The left panel of Figure 1 depicts an example where the unique minimizer θ_σ̂ under strategy σ̂ is given by the tangency between the contour lines of K(σ̂, ·) and the feasible set Θ. If σ ∈ [3/4, 15/16], then θ_σ = C is the northeast vertex of Θ. Finally, if σ > 15/16, the minimizer is on the segment DC: a_σ = 40 and b_σ = (380 − 368σ)/(100 − 96σ) solves ∂K(σ, θ)/∂b = 0.

Because the monopolist mixes, optimality requires that the equilibrium belief θ_σ lie on the dashed line. The unique Berk-Nash equilibrium is σ* = (35/36, 1/36), with θ_σ* = (40, 10/3) on DC, as depicted in the right panel of Figure 1. It is not the case, however, that the equilibrium belief about the mean of Y is correct. Thus, an approach that had focused on fitting the mean, rather than minimizing K, would have led to the wrong conclusion. □

(It can be shown that K(σ, θ) = (θ − θ₀)′ M_σ (θ − θ₀), where M_σ is a weighting matrix that depends on σ; in particular, the contour lines of K(σ, ·) are ellipses. The example also illustrates the importance of mixed strategies for existence of Berk-Nash equilibrium, even in single-agent settings. As an antecedent, Esponda and Pouzo (2011) argue that this is the reason why mixed-strategy equilibrium cannot be purified in a voting application.)

Example 2.2, continued from pg. 7. Nonlinear taxation.
For any pure strategy x and parameter value θ^A ∈ Θ^A = R (model A) or θ^B ∈ Θ^B = R² (model B), the wKLD function K^j(x, θ) for model j ∈ {A, B} equals

E[ ln ( Q(T | Z) p(Z − x) / ( Q^j_θ(T | Z) p(Z − x) ) ) | X = x ],

which equals (1/2) E[ (τ(Z)/Z − θ^A)² | X = x ] + C^A for model A and (1/2) E[ (τ(Z) − θ^B_0 − θ^B_1 Z)² | X = x ] + C^B for model B, where E denotes the true conditional expectation and C^A and C^B are constants.

For model A, θ^A(x) = E[τ(x + W)/(x + W)] is the unique parameter value that minimizes K^A(x, ·). Intuitively, the agent believes that the expected marginal tax rate is equal to the true expected average tax rate. For model B, the slope satisfies θ^B_1(x) = Cov(τ(x + W), x + W)/Var(x + W) = E[τ′(x + W)], where the second equality follows from Stein's lemma (Stein (1972)), provided that τ is differentiable. Intuitively, the agent believes that the marginal tax rate is constant and given by the true expected marginal tax rate.

We now compare equilibrium under these two models with the case in which the agent has correct beliefs and chooses an optimal strategy x_opt that maximizes x − E[τ(x + W)] − c(x). In contrast, a strategy x^{j*} is a Berk-Nash equilibrium of model j if and only if x = x^{j*} maximizes x − θ^j(x^{j*}) x − c(x). For example, suppose that the cost of effort and the true tax schedule are both smooth, increasing, and convex functions (e.g., taxes are progressive) and that X ⊂ R is a compact interval. Then first-order conditions are sufficient for optimality, and x_opt is the unique solution to 1 − E[τ′(x_opt + W)] = c′(x_opt). Moreover, the unique Berk-Nash equilibrium solves 1 − E[τ(x^{A*} + W)/(x^{A*} + W)] = c′(x^{A*}) for model A and 1 − E[τ′(x^{B*} + W)] = c′(x^{B*}) for model B. In particular, effort in model B is optimal, x^{B*} = x_opt.
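This comparison is easy to check numerically. The sketch below uses hypothetical primitives of our own choosing, not the paper's: a progressive tax τ(z) = 0.2z + 0.05z², marginal effort cost c′(x) = 0.1x, and W standard normal. It estimates each model's best-fit tax rate by Monte Carlo, solves each equilibrium condition by bisection, and checks the Stein's-lemma step equating the OLS slope with E[τ′(Z)].

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical primitives (ours, not the paper's).
tau      = lambda z: 0.2 * z + 0.05 * z ** 2   # true tax schedule, increasing and convex
tau_prim = lambda z: 0.2 + 0.10 * z            # tau'
c_prime  = lambda x: 0.1 * x                   # marginal cost of effort

def rate_true(x, n=100_000):
    """True expected marginal tax rate E[tau'(x + W)], W ~ N(0,1)."""
    return np.mean(tau_prim(x + rng.standard_normal(n)))

def rate_A(x, n=100_000):
    """Model A best fit: expected average rate E[tau(Z)/Z], Z = x + W."""
    z = x + rng.standard_normal(n)
    return np.mean(tau(z) / z)

def rate_B(x, n=100_000):
    """Model B best-fit slope: Cov(tau(Z), Z) / Var(Z), the OLS estimand."""
    z = x + rng.standard_normal(n)
    return np.cov(tau(z), z)[0, 1] / np.var(z, ddof=1)

def solve(rate):
    """Bisection on the equilibrium condition 1 - rate(x) = c'(x)."""
    lo, hi = 0.0, 10.0
    for _ in range(40):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if 1.0 - rate(mid) - c_prime(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)

x_opt, x_A, x_B = solve(rate_true), solve(rate_A), solve(rate_B)

# Stein's lemma at a fixed effort level: OLS slope equals E[tau'(Z)].
slope_ols, slope_stein = rate_B(4.0), rate_true(4.0)
print(x_opt, x_B, x_A)          # x_B close to x_opt; x_A higher
print(slope_ols, slope_stein)   # approximately equal
```

Under these assumed primitives, x_opt = x^{B*} = 4 while x^{A*} = 16/3 ≈ 5.33, matching the ranking x^{A*} > x_opt = x^{B*} derived above.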
Intuitively, the agent has correct beliefs about the true expected marginal tax rate at her equilibrium choice of effort, and so she has the right incentives on the margin, despite believing incorrectly that the marginal tax rate is constant. In contrast, effort is higher than optimal in model A, x^{A*} > x_opt. Intuitively, the agent equates her perceived marginal tax rate to the true expected average tax rate which, because taxes are progressive, is lower than the true expected marginal tax rate, so she perceives too high a return to effort.

(We use W to denote the random variable that takes on realizations ω. By linearity and normality, the minimizers of K^B(x, ·) coincide with the OLS estimands. We assume normality for tractability, although the framework allows for general distributional assumptions. There are other tractable distributions; for example, the minimizer of the wKLD under the Laplace distribution corresponds to the estimates of a median (not a linear) regression.) □

Example 2.3, continued from pg. 7.
Regression to the mean. Since optimal strategies are characterized by a cutoff, we let σ ∈ R represent the strategy where the instructor praises an initial performance if it is above σ and criticizes it otherwise. Let S₁ denote the initial performance, with realization s, and S₂ the next performance. The wKLD function for any θ = (θ_C, θ_P) ∈ Θ = R² is

K(σ, θ) = ∫_{−∞}^{σ} E[ ln ( φ(S₂) / φ(S₂ − (θ_C + s)) ) ] φ(s) ds + ∫_{σ}^{∞} E[ ln ( φ(S₂) / φ(S₂ − (θ_P + s)) ) ] φ(s) ds,

where φ is the density of N(0, 1) and E denotes the true expectation. For each σ, the unique parameter vector that minimizes K(σ, ·) is given by θ_C(σ) = E[S₂ − S₁ | S₁ < σ] = −E[S₁ | S₁ < σ] > 0 and θ_P(σ) = −E[S₁ | S₁ > σ] < 0. Intuitively, the instructor is critical for performances below a threshold and, therefore, the mean performance conditional on a student being criticized is lower than the unconditional mean performance; thus, a student who is criticized delivers a better next performance in expectation. Similarly, a student who is praised delivers a worse next performance in expectation.

The instructor who follows a cutoff strategy σ believes, after observing initial performance s > 0, that her expected payoff is s + θ_C(σ) − κs if she criticizes and s + θ_P(σ) if she praises. By optimality, the cutoff makes her indifferent between praising and criticizing. Thus, σ* = (1/κ)(θ_C(σ*) − θ_P(σ*)) > 0. As κ → 0, meaning that instructors care only about performance and not about lying, σ* → ∞: instructors only criticize (as in Tversky and Kahneman's (1973) story). □

We show that Berk-Nash equilibrium includes several solution concepts (both standard and boundedly rational) as special cases.

Properties of games

correctly-specified games. In Bayesian statistics, a model is correctly specified if the support of the prior includes the true data generating process. The extension to single-agent decision problems is straightforward. In games, however, we must account for the fact that the objective distribution over consequences (i.e., the true model) depends on the strategy profile.

Definition 2.
A game is correctly specified given σ if, for all i ∈ I, there exists θ^i ∈ Θ^i such that Q^i_{θ^i}(· | s^i, x^i) = Q^i_σ(· | s^i, x^i) for all (s^i, x^i) ∈ S^i × X^i; otherwise, the game is misspecified given σ. A game is correctly specified if it is correctly specified for all σ; otherwise, it is misspecified.

identification. From the player's perspective, what matters is identification of the distribution over consequences Q^i_θ, not the parameter θ. If the model is correctly specified, then the true Q^i_σ is trivially identified. Of course, this is not true if the model is misspecified, because the true distribution will never be learned. But we want a definition that captures the same spirit: If two distributions are judged to be equally a best fit (given the true distribution), then we want these two distributions to be identical; otherwise, we cannot identify which distribution is a best fit. The fact that players take actions introduces an additional nuance to the definition of identification. We can ask for identification of the distribution over consequences either for those actions that are taken by the player (i.e., on the path of play) or for all actions (i.e., on and off the path).

Definition 3.
A game is weakly identified given σ if, for all i ∈ I: if θ^i_1, θ^i_2 ∈ Θ^i(σ), then Q^i_{θ^i_1}(· | s^i, x^i) = Q^i_{θ^i_2}(· | s^i, x^i) for all (s^i, x^i) ∈ S^i × X^i such that σ^i(x^i | s^i) > 0 (recall that p_{S^i} has full support). If the condition is satisfied for all (s^i, x^i) ∈ S^i × X^i, then we say that the game is strongly identified given σ. A game is [weakly or strongly] identified if it is [weakly or strongly] identified for all σ.

A correctly specified game is weakly identified. Also, two games that are identical except for their feedback may differ in terms of being correctly specified or identified. (It would be more precise to say that the game is correctly specified in steady state.)

Relationship to Nash and self-confirming equilibrium

The next result shows that Berk-Nash equilibrium is equivalent to Nash equilibrium when the game is both correctly specified and strongly identified.
Proposition 1. (i) Suppose that the game is correctly specified given σ and that σ is a Nash equilibrium of its objective game. Then σ is a Berk-Nash equilibrium of the (objective and subjective) game. (ii) Suppose that σ is a Berk-Nash equilibrium of a game that is correctly specified and strongly identified given σ. Then σ is a Nash equilibrium of the corresponding objective game.

Proof. (i) Let σ be a Nash equilibrium and fix any i ∈ I. Then σ^i is optimal given Q^i_σ. Because the game is correctly specified given σ, there exists θ^{i*} ∈ Θ^i such that Q^i_{θ^{i*}} = Q^i_σ and, therefore, by Lemma 1, θ^{i*} ∈ Θ^i(σ). Thus, σ^i is also optimal given Q^i_{θ^{i*}} and θ^{i*} ∈ Θ^i(σ), so that σ is a Berk-Nash equilibrium. (ii) Let σ be a Berk-Nash equilibrium and fix any i ∈ I. Then σ^i is optimal given Q̄^i_{µ^i}, for some µ^i ∈ ∆(Θ^i(σ)). Because the game is correctly specified given σ, there exists θ^{i*} ∈ Θ^i such that Q^i_{θ^{i*}} = Q^i_σ and, therefore, by Lemma 1, θ^{i*} ∈ Θ^i(σ). Moreover, because the game is strongly identified given σ, any θ̂^i ∈ Θ^i(σ) satisfies Q^i_{θ̂^i} = Q^i_{θ^{i*}} = Q^i_σ. Then σ^i is also optimal given Q^i_σ. Thus, σ is a Nash equilibrium.

Example 2.4, continued from pg. 8.
Monetary policy. Fix a strategy x^{P*} for the public. Note that U = u* − λ(x^G − x^{P*} + ε_e) + ε_U, whereas the government believes U = θ_0 − θ_1(x^G + ε_e) + ε_U. Thus, by choosing θ* = (θ*_0, θ*_1) ∈ Θ = R² such that θ*_0 = u* + λx^{P*} and θ*_1 = λ, it follows that the distribution over Y = (U, e) parameterized by θ* coincides with the objective distribution given x^{P*}. So, despite appearances, the game is correctly specified given x^{P*}. Moreover, since Var(ε_e) > 0, θ* is the unique minimizer of the wKLD function given x^{P*}. Because there is a unique minimizer, the game is strongly identified given x^{P*}. Since these properties hold for all x^{P*}, Proposition 1 implies that Berk-Nash equilibrium is equivalent to Nash equilibrium. Thus, the equilibrium policies are the same whether or not the government realizes that unemployment is driven by surprise, not actual, inflation. □

(Sargent (1999) derived this result for a government doing OLS-based learning (a special case of our example when errors are normal). We assumed linearity for simplicity, but the result is true for the more general case with true unemployment U = f_U(x^G, x^P, ω) and subjective model f_{U,θ}(x^G, x^P, ω) if, for all x^P, there exists θ such that f_U(x^G, x^P, ω) = f_{U,θ}(x^G, x^P, ω) for all (x^G, ω).)

Proposition 2.
Suppose that the game is correctly specified given σ, and that σ is a Berk-Nash equilibrium. Then σ is also a self-confirming equilibrium.

Proof.
Fix any i ∈ I and let θ̂^i be in the support of µ^i, where µ^i is player i's belief supporting the Berk-Nash equilibrium strategy σ^i. Because the game is correctly specified given σ, there exists θ^{i*} ∈ Θ^i such that Q^i_{θ^{i*}} = Q^i_σ and, therefore, by Lemma 1, K^i(σ, θ^{i*}) = 0. Thus, it must also be that K^i(σ, θ̂^i) = 0. By Lemma 1, it follows that Q^i_{θ̂^i}(· | s^i, x^i) = Q^i_σ(· | s^i, x^i) for all (s^i, x^i) such that σ^i(x^i | s^i) > 0. In particular, σ^i is optimal given Q^i_{θ̂^i}, and Q^i_{θ̂^i} satisfies the desired self-confirming restriction.

For games that are not correctly specified, beliefs can be incorrect on the equilibrium path, and so a Berk-Nash equilibrium is not necessarily Nash or SCE.

An analogy-based game satisfies the following four properties: (i) States and information structure: The state space Ω is finite with distribution p_Ω ∈ ∆(Ω). In addition, for each i, there is a partition S^i of Ω, and the element of S^i that contains ω (i.e., the signal of player i in state ω) is denoted by s^i(ω); (ii) Perfect feedback: For each i, f^i(x, ω) = (x^{−i}, ω) for all (x, ω); (iii) Analogy partition: For each i, there exists a partition of Ω, denoted by A^i, and the element of A^i that contains ω is denoted by α^i(ω); (iv) Conditional independence: (Q^i_{θ^i})_{θ^i ∈ Θ^i} is the set of all joint probability distributions over X^{−i} × Ω that satisfy

Q^i_{θ^i}(x^{−i}, ω | s^i(ω′), x^i) = Q^i_{Ω,θ^i}(ω | s^i(ω′)) Q^i_{X^{−i},θ^i}(x^{−i} | α^i(ω)).

(A strategy profile σ is an SCE if, for all players i ∈ I, σ^i is optimal given Q̂^i_σ, where Q̂^i_σ(· | s^i, x^i) = Q^i_σ(· | s^i, x^i) for all (s^i, x^i) such that σ^i(x^i | s^i) > 0. This definition is slightly more general than the typical one, e.g., Dekel et al. (2004), because it does not restrict players to believe that consequences are driven by other players' strategies. A converse does not necessarily hold for a fixed game. The reason is that the definition of SCE does not impose any restrictions on off-equilibrium beliefs, while a particular subjective game may impose ex-ante restrictions on beliefs. The following converse, however, does hold: For any σ that is an SCE, there exists a game that is correctly specified for which σ is a Berk-Nash equilibrium. The perfect-feedback assumption is made to facilitate comparison with Jehiel and Koessler's (2008) ABEE.)
In other words, every player i believes that x^{−i} and ω are independent conditional on the analogy partition. For example, if A^i = S^i for all i, then each player believes that the actions of other players are independent of the state, conditional on their own private information.

Definition 4. (Jehiel and Koessler, 2008) A strategy profile σ is an analogy-based expectation equilibrium (ABEE) if, for all i ∈ I, ω ∈ Ω, and x^i such that σ^i(x^i | s^i(ω)) > 0,

x^i ∈ arg max_{x̄^i ∈ X^i} Σ_{ω′ ∈ Ω} p_{Ω|S^i}(ω′ | s^i(ω)) Σ_{x^{−i} ∈ X^{−i}} σ̄^{−i}(x^{−i} | ω′) π^i(x̄^i, x^{−i}, ω′),

where σ̄^{−i}(x^{−i} | ω′) = Σ_{ω″ ∈ Ω} p_{Ω|A^i}(ω″ | α^i(ω′)) Π_{j ≠ i} σ^j(x^j | s^j(ω″)).

Proposition 3.
In an analogy-based game, σ is a Berk-Nash equilibrium if and onlyif it is an ABEE.Proof. See the Appendix.As mentioned by Jehiel and Koessler (2008), ABEE is equivalent to Eyster andRabin’s (2005) fully cursed equilibrium in the special case where A i = S i for all i .In particular, Proposition 3 provides a misspecified-learning foundation for these twosolution concepts. Jehiel and Koessler (2008) discuss an alternative foundation forABEE, where players receive coarse feedback aggregated over past play and multiplebeliefs are consistent with this feedback. Under this different feedback structure,ABEE can be viewed as a natural selection of the set of SCE. Example 2.5, continued from pg. 8.
Trade with adverse selection.
In Online Appendix A, we show that x* is a Berk-Nash equilibrium price if and only if x = x* maximizes an equilibrium belief function Π(x, x*), which represents the belief about expected profit from choosing any price x under a steady state x*. The equilibrium belief function depends on the feedback/misspecification assumptions, and we discuss the following four cases:

Π_NE(x) = Pr(A ≤ x) (E[V | A ≤ x] − x)
Π_CE(x) = Pr(A ≤ x) (E[V] − x)
Π_BE(x, x*) = Pr(A ≤ x) (E[V | A ≤ x*] − x)
Π_ABEE(x) = Σ_{j=1}^{k} Pr(V ∈ V_j) { Pr(A ≤ x | V ∈ V_j) (E[V | V ∈ V_j] − x) }.

The first case, Π_NE, is the benchmark case in which beliefs are correct. The second case, Π_CE, corresponds to perfect feedback and subjective model Θ = ∆(A) × ∆(V), as described on page 8. This is an example of an analogy-based game with a single analogy class V. The buyer learns the true marginal distributions of A and V and believes the joint distribution equals the product of the marginal distributions. Berk-Nash coincides with fully cursed equilibrium. The third case, Π_BE, has the same misspecified model as the second case, but assumes partial feedback, in the sense that the ask price a is always observed but the valuation v is only observed if there is trade. The equilibrium price x* affects the sample of valuations observed by the buyer and, therefore, her beliefs. Berk-Nash coincides with naive behavioral equilibrium. The last case, Π_ABEE, corresponds to perfect feedback and the following misspecification: Consider a partition of V into k "analogy classes" (V_j)_{j=1,...,k}. The buyer believes that (A, V) are independent conditional on V ∈ V_j, for each j = 1, ..., k. The parameter set is Θ_A = ×_{j=1}^{k} ∆(A) × ∆(V), where, for a value θ = (θ_1, ..., θ_k, θ_V) ∈ Θ_A, θ_V parameterizes the marginal distribution over V and, for each j = 1, ..., k, θ_j ∈ ∆(A) parameterizes the distribution over A conditional on V ∈ V_j.
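For a concrete sense of how these belief functions differ, the sketch below evaluates the first three in a toy parameterization of our own (ask A uniform on [0, 10] and buyer value V = A + 2, so trade always generates gains), computing the NE and fully cursed prices by grid search and the naive-behavioral-equilibrium price by best-response iteration; Π_ABEE can be handled the same way, class by class.

```python
import numpy as np

# Hypothetical primitives (ours): A ~ Uniform[0, 10], buyer value V = A + 2.
x = np.linspace(0.0, 10.0, 1001)     # grid of candidate prices

P_trade = x / 10.0                   # Pr(A <= x)
EV_cond = x / 2.0 + 2.0              # E[V | A <= x]  (adverse selection)
EV = 7.0                             # unconditional E[V]

pi_NE = P_trade * (EV_cond - x)      # correct beliefs
pi_CE = P_trade * (EV - x)           # fully cursed: ignores the A-V correlation

x_NE = x[np.argmax(pi_NE)]
x_CE = x[np.argmax(pi_CE)]

# Naive behavioral equilibrium: best respond to the mean value observed in
# past trades at the steady-state price, then iterate to a fixed point.
x_BE = x_NE
for _ in range(100):
    believed = x_BE / 2.0 + 2.0      # E[V | A <= x_BE], learned from past trades
    x_BE = x[np.argmax(P_trade * (believed - x))]

print(x_NE, x_CE, x_BE)
```

In this assumed specification the cursed buyer trades at a higher price (x_CE = 3.5) than the correct-beliefs benchmark (x_NE = 2), while the naive behavioral equilibrium settles near x_BE ≈ 4/3; none of these numbers are from the paper, which treats the general case in Online Appendix A.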
Berk-Nash coincides with the ABEE of the game with analogy classes (V_j)_{j=1,...,k}. □

We provide a learning foundation for equilibrium. We follow Fudenberg and Kreps (1993) in considering games with (slightly) perturbed payoffs because, as they highlight in the context of providing a learning foundation for mixed-strategy Nash equilibrium, behavior need not be continuous in beliefs without perturbations. Thus, even if beliefs were to converge, behavior need not settle down in the unperturbed game. Perturbations guarantee that if beliefs converge, then behavior also converges.

A perturbation structure is a tuple P = ⟨Ξ, P_ξ⟩, where: Ξ = ×_{i∈I} Ξ^i and Ξ^i ⊆ R^{X^i} is a set of payoff perturbations for each action of player i; P_ξ = (P_{ξ^i})_{i∈I}, where P_{ξ^i} ∈ ∆(Ξ^i) is a distribution over payoff perturbations of player i that is absolutely continuous with respect to the Lebesgue measure and satisfies ∫_{Ξ^i} ||ξ^i|| P_{ξ^i}(dξ^i) < ∞. A perturbed game G_P = ⟨G, P⟩ is composed of a game G and a perturbation structure P. The timing of a perturbed game G_P coincides with the timing of G, except for two differences. First, before taking an action, each player not only observes her signal s^i but also privately observes a vector of own-payoff perturbations ξ^i ∈ Ξ^i, where ξ^i(x^i) denotes the perturbation for action x^i. Second, her payoff given action x^i and consequence y^i is π^i(x^i, y^i) + ξ^i(x^i).

A strategy σ^i for player i is optimal in the perturbed game given µ^i ∈ ∆(Θ^i) if, for all (s^i, x^i) ∈ S^i × X^i, σ^i(x^i | s^i) = P_ξ( ξ^i : x^i ∈ Ψ^i(µ^i, s^i, ξ^i) ), where

Ψ^i(µ^i, s^i, ξ^i) ≡ arg max_{x^i ∈ X^i} E_{Q̄^i_{µ^i}(·|s^i, x^i)}[ π^i(x^i, Y^i) ] + ξ^i(x^i).

(In Online Appendix A, we also consider the case of ABEE with partial feedback.)
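The role of the perturbations is easiest to see in a small computation. With i.i.d. Gumbel-distributed ξ^i(x^i) (one absolutely continuous specification satisfying the conditions above; the Gumbel choice is ours, made because it yields the closed-form logit rule), the optimal strategy is totally mixed and continuous in beliefs:

```python
import numpy as np

rng = np.random.default_rng(1)

u = np.array([1.0, 0.5, 0.0])       # E[pi(x, Y)] for each action under the current belief
n = 400_000

# sigma(x) = P_xi( x in argmax_x' u(x') + xi(x') ), estimated by Monte Carlo.
xi = rng.gumbel(size=(n, u.size))   # i.i.d. payoff perturbations, one per action
sigma_mc = np.bincount(np.argmax(u + xi, axis=1), minlength=u.size) / n

# For Gumbel perturbations this probability is the logit (softmax) rule.
sigma_logit = np.exp(u) / np.exp(u).sum()
print(sigma_mc, sigma_logit)
```

Every action receives positive probability, which is what makes mixed strategies reachable as limits of optimal behavior; any other absolutely continuous perturbation distribution works as well, only without the closed form.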
In other words, if σ^i is an optimal strategy, then σ^i(x^i | s^i) is the probability that x^i is optimal when the signal is s^i, taken over all possible realizations of the perturbation ξ^i. The definition of Berk-Nash equilibrium of a perturbed game G_P is analogous to Definition 1, with the only difference that optimality must be required with respect to the perturbed game.

We fix a perturbed game G_P and assume that players repeatedly play the corresponding objective game at each t = 0, 1, 2, ..., where the time-t state and signals, (ω_t, s_t), and perturbations ξ_t, are independently drawn every period from the same distributions p and P_ξ, respectively. In addition, each player i has a prior µ^i_0 with full support over her (finite-dimensional) parameter set, Θ^i. At the end of every period t, each player uses Bayes' rule and the information obtained in all past periods (her own signals, actions, and consequences) to update beliefs. Players believe that they face a stationary environment and myopically maximize the current period's expected payoff.

Let ∆̂(Θ^i) denote the set of probability distributions on Θ^i with full support. Let B^i : ∆̂(Θ^i) × S^i × X^i × Y^i → ∆̂(Θ^i) denote the Bayesian operator of player i: for all A ⊆ Θ^i Borel measurable and all (µ^i, s^i, x^i, y^i) ∈ ∆̂(Θ^i) × S^i × X^i × Y^i,

B^i(µ^i, s^i, x^i, y^i)(A) = ∫_A Q^i_θ(y^i | s^i, x^i) µ^i(dθ) / ∫_{Θ^i} Q^i_θ(y^i | s^i, x^i) µ^i(dθ).

(We restrict attention to parametric models (i.e., finite-dimensional parameter spaces) because, otherwise, Bayesian updating need not converge to the truth for most priors and parameter values even in correctly specified statistical settings (Freedman (1963), Diaconis and Freedman (1986)).)

Bayesian updating is well defined by Assumption 1. Because players believe they face a stationary environment with i.i.d. perturbations, it is without loss of generality to restrict player i's behavior at time t to depend on (µ^i_t, s^i_t, ξ^i_t).

Definition 5. A policy of player i is a sequence of functions φ^i = (φ^i_t)_t, where φ^i_t : ∆(Θ^i) × S^i × Ξ^i → X^i. A policy φ^i is optimal if φ^i_t ∈ Ψ^i for all t. A policy profile φ = (φ^i)_{i∈I} is optimal if φ^i is optimal for all i ∈ I.

Let H ⊆ (S × Ξ × X × Y)^∞ denote the set of histories, where any history h = (s_0, ξ_0, x_0, y_0, ..., s_t, ξ_t, x_t, y_t, ...) ∈ H satisfies the feasibility restriction: for all i ∈ I and all t, y^i_t = f^i(x^i_t, x^{−i}_t, ω_t) for some ω_t ∈ supp(p_{Ω|S^i}(· | s^i_t)). Let P_{µ_0,φ} denote the probability distribution over H that is induced by the priors µ_0 = (µ^i_0)_{i∈I} and the policy profile φ = (φ^i)_{i∈I}. Let (µ_t)_t denote the sequence of beliefs µ_t : H → ×_{i∈I} ∆(Θ^i) such that, for all t ≥ 1 and i ∈ I, µ^i_t is the posterior at time t defined recursively by µ^i_t(h) = B^i(µ^i_{t−1}(h), s^i_{t−1}(h), x^i_{t−1}(h), y^i_{t−1}(h)) for all h ∈ H, where s^i_{t−1}(h) is player i's signal at t − 1 under history h, and similarly for x^i_{t−1}(h) and y^i_{t−1}(h).

Definition 6.
The sequence of intended strategy profiles given policy profile φ = (φ^i)_{i∈I} is the sequence (σ_t)_t of random variables σ_t : H → ×_{i∈I} ∆(X^i)^{S^i} such that, for all t, all i ∈ I, and all (x^i, s^i) ∈ X^i × S^i,

σ^i_t(h)(x^i | s^i) = P_ξ( ξ^i : φ^i_t(µ^i_t(h), s^i, ξ^i) = x^i ).   (7)

An intended strategy profile σ_t describes how each player would behave at time t for each possible signal; it is a random variable because it depends on the players' beliefs at time t, µ_t, which in turn depend on the past history.

(By Assumption 1(ii)-(iii), there exists θ ∈ Θ and an open ball containing it such that Q^i_{θ′} > 0 for all θ′ in the ball. Thus the Bayesian operator is well defined for any µ^i ∈ ∆̂(Θ^i). Moreover, by Assumption 1(iii), such θ's are dense in Θ, so the Bayesian operator maps ∆̂(Θ^i) into itself.)

Definition 7.
A strategy profile σ is stable [or strongly stable] under policy profile φ if the sequence of intended strategies, (σ_t)_t, converges to σ with positive probability [or with probability one], i.e., P_{µ_0,φ}( lim_{t→∞} ||σ_t(h) − σ|| = 0 ) > 0 [or = 1].

Lemma 2 says that, if behavior stabilizes to a strategy profile σ, then, for each player i, beliefs become increasingly concentrated on Θ^i(σ). This result extends findings from the statistics literature on misspecified learning (Berk (1966), Bunke and Milhaud (1998)) to a setting with active learning (i.e., players learn from data that is endogenously generated by their own actions). Three new issues arise: (i) Previous results need to be extended to the case of non-i.i.d. and endogenous data; (ii) It is not obvious that steady-state beliefs can be characterized based on steady-state behavior, independently of the path of play (Assumption 1 plays an important role here; see Section 5 for an example); (iii) We allow the wKLD function to be nonfinite so that players can believe that other players follow pure strategies.

Lemma 2.
Suppose that, for a policy profile φ, the sequence of intended strategies, (σ_t)_t, converges to σ for all histories in a set H̄ ⊆ H such that P_{µ_0,φ}(H̄) > 0. Then, for all open sets U^i ⊇ Θ^i(σ), lim_{t→∞} µ^i_t(U^i) = 1, a.s.-P_{µ_0,φ} in H̄.

Proof. See the Appendix.

The sketch of the proof of Lemma 2 is as follows (we omit the i subscript to ease the notational burden). Consider an arbitrary ε > 0 and the set Θ^ε(σ) ⊆ Θ defined as the points which are within ε distance of Θ(σ). The time-t posterior over the complement of Θ^ε(σ), µ_t(Θ ∖ Θ^ε(σ)), can be expressed as

∫_{Θ∖Θ^ε(σ)} ∏_{τ=0}^{t−1} Q_θ(y_τ | s_τ, x_τ) µ_0(dθ) / ∫_Θ ∏_{τ=0}^{t−1} Q_θ(y_τ | s_τ, x_τ) µ_0(dθ) = ∫_{Θ∖Θ^ε(σ)} e^{t K_t(θ)} µ_0(dθ) / ∫_Θ e^{t K_t(θ)} µ_0(dθ),

where K_t(θ) equals minus the sample log-likelihood ratio, K_t(θ) = −(1/t) Σ_{τ=0}^{t−1} ln [ Q_{σ_τ}(y_τ | s_τ, x_τ) / Q_θ(y_τ | s_τ, x_τ) ]. This expression and straightforward algebra imply that

µ_t(Θ ∖ Θ^ε(σ)) ≤ ∫_{Θ∖Θ^ε(σ)} e^{t(K_t(θ) + K(σ,θ̄) + δ)} µ_0(dθ) / ∫_{Θ^η(σ)} e^{t(K_t(θ) + K(σ,θ̄) + δ)} µ_0(dθ)

for any δ > 0, θ̄ ∈ Θ(σ), and η > 0. The integral in the numerator is taken over points which are "ε-separated" from Θ(σ), whereas the integral in the denominator is taken over points which are "η-close" to Θ(σ). Intuitively, if K_t(·) behaves asymptotically like −K(σ, ·), there exist sufficiently small δ > 0 and η > 0 such that K_t(θ) + K(σ, θ̄) + δ is negative for all θ which are "ε-separated" from Θ(σ), and positive for all θ which are "η-close" to Θ(σ).

(For example, if player 1 believes that player 2 plays A with probability θ and B with probability 1 − θ, then the wKLD function is infinity at θ = 1 if player 2 plays B with positive probability.)
Thus, the numerator converges to zero, whereas the denominator diverges to infinity, provided that Θ^η(σ) has positive measure under the prior.

The nonstandard part of the proof consists of establishing that Θ^η(σ) has positive measure under the prior, which relies on Assumption 1, and that K_t(·) indeed behaves asymptotically like −K(σ, ·). By virtue of Fatou's lemma, for θ ∈ Θ^η(σ) it suffices to show almost sure pointwise convergence of K_t(θ) to −K(σ, θ); this is done in Claim B(i) in the Appendix and relies on a LLN argument for non-iid variables. On the other hand, over θ ∈ Θ ∖ Θ^ε(σ), we need to control the asymptotic behavior of K_t(·) uniformly to be able to interchange the limit and the integral. In Claims B(ii) and B(iii) in the Appendix, we establish that there exists α > 0 such that, asymptotically, K_t(θ) < −K(σ, θ̄) − α for all θ ∈ Θ ∖ Θ^ε(σ).

While Lemma 2 implies that the support of posteriors converges, posteriors need not converge. We can always find, however, a subsequence of posteriors that converges. By continuity of behavior in beliefs and the assumption that players are myopic, the stable strategy profile must be statically optimal. Thus, we obtain the following characterization of the set of stable strategy profiles when players follow optimal policies.

Theorem 2.
Suppose that a strategy profile σ is stable under an optimal policy profile for a perturbed game. Then σ is a Berk-Nash equilibrium of the perturbed game.

Proof. Let φ denote the optimal policy profile under which σ is stable. By Lemma 2, there exists H̄ ⊆ H with P_{µ_0,φ}(H̄) > 0 such that, for all h ∈ H̄, lim_{t→∞} σ_t(h) = σ and lim_{t→∞} µ^i_t(U^i) = 1 for all i ∈ I and all open sets U^i ⊇ Θ^i(σ); for the remainder of the proof, fix any h ∈ H̄. For all i ∈ I, compactness of ∆(Θ^i) implies the existence of a subsequence, which we denote as (µ^i_{t(j)})_j, such that µ^i_{t(j)} converges (weakly) to µ^i_∞ (the limit could depend on h). We conclude by showing, for all i ∈ I:

(i) µ^i_∞ ∈ ∆(Θ^i(σ)): Suppose not, so that there exists θ̂^i ∈ supp(µ^i_∞) such that θ̂^i ∉ Θ^i(σ). Then, since Θ^i(σ) is closed (by Lemma 1), there exists an open set U^i ⊃ Θ^i(σ) with closure Ū^i such that θ̂^i ∉ Ū^i. Then µ^i_∞(Ū^i) < 1, but this contradicts the fact that µ^i_∞(Ū^i) ≥ lim sup_{j→∞} µ^i_{t(j)}(Ū^i) ≥ lim_{j→∞} µ^i_{t(j)}(U^i) = 1, where the first inequality holds because Ū^i is closed and µ^i_{t(j)} converges (weakly) to µ^i_∞.

(ii) σ^i is optimal for the perturbed game given µ^i_∞ ∈ ∆(Θ^i): σ^i(x^i | s^i) = lim_{j→∞} σ^i_{t(j)}(h)(x^i | s^i) = lim_{j→∞} P_ξ( ξ^i : x^i ∈ Ψ^i(µ^i_{t(j)}, s^i, ξ^i) ) = P_ξ( ξ^i : x^i ∈ Ψ^i(µ^i_∞, s^i, ξ^i) ), where the second equality follows because φ^i is optimal and Ψ^i is single-valued, a.s.-P_{ξ^i}, and the third equality follows from a standard continuity argument.

Theorem 2 provides our main justification for Berk-Nash equilibria: any strategy profile that is not an equilibrium cannot represent limiting behavior of optimizing players. Theorem 2, however, does not imply that behavior stabilizes. It is well known that convergence is not guaranteed for Nash equilibrium, which is a special case of Berk-Nash equilibrium. Thus, some assumption needs to be relaxed to prove convergence for general games. Fudenberg and Kreps (1993) show that a converse for the case of Nash equilibrium can be obtained by relaxing optimality and allowing players to make vanishing optimization mistakes.
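The belief dynamics behind Lemma 2 can be illustrated in the simplest passive case (a single action and i.i.d. data), with toy numbers of our own: observations are N(2, 1), but the agent's parameter set only contains means in [−1, 1], so the posterior concentrates on the KL-minimizing boundary point θ = 1 rather than on any true parameter.

```python
import numpy as np

rng = np.random.default_rng(0)

theta = np.linspace(-1.0, 1.0, 201)    # misspecified model: true mean 2 is outside
log_post = np.zeros_like(theta)        # flat prior over the grid

for y in rng.normal(2.0, 1.0, size=3000):    # i.i.d. draws from the true model
    log_post += -0.5 * (y - theta) ** 2      # Gaussian log-likelihood, unit variance

post = np.exp(log_post - log_post.max())
post /= post.sum()

theta_hat = theta[np.argmax(post)]
mass_near_min = post[theta >= 0.99].sum()
print(theta_hat, mass_near_min)    # posterior piles up on the KL minimizer 1.0
```

With endogenous data the same logic applies, but, as the proof of Lemma 2 emphasizes, the likelihood terms are no longer i.i.d. and the data distribution depends on the agent's own actions, which is what the nonstandard parts of the proof handle.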
Definition 8.
A policy profile φ is asymptotically optimal if there exists a positive real-valued sequence (ε_t)_t with lim_{t→∞} ε_t = 0 such that, for all i ∈ I, all (µ^i, s^i, ξ^i) ∈ ∆(Θ^i) × S^i × Ξ^i, all t, and all x^i ∈ X^i,

E_{Q̄^i_{µ^i}(·|s^i, x^i_t)}[ π^i(x^i_t, Y^i) ] + ξ^i(x^i_t) ≥ E_{Q̄^i_{µ^i}(·|s^i, x^i)}[ π^i(x^i, Y^i) ] + ξ^i(x^i) − ε_t,

where x^i_t = φ^i_t(µ^i, s^i, ξ^i).

Fudenberg and Kreps' (1993) insight is to suppose that players are convinced early on that the equilibrium strategy is the right one to play, and continue to play this strategy unless they have strong enough evidence to think otherwise. And, as they continue to play the equilibrium strategy, the evidence increasingly convinces them that it is the right thing to do. This idea, however, need not work for Berk-Nash equilibrium because beliefs may not converge if the model is misspecified (see Berk (1966) for an example). If the game is weakly identified, however, Lemma 2 and Fudenberg and Kreps' (1993) insight can be combined to obtain the following converse of Theorem 2.

(Ψ^i is single-valued a.s.-P_{ξ^i} because the set of ξ^i for which Ψ^i(µ^i, s^i, ξ^i) contains more than one element has Lebesgue measure zero in R^{X^i} and, by absolute continuity of P_{ξ^i}, this set has measure zero. Jordan (1993) shows that non-convergence is robust to the choice of initial conditions; Benaim and Hirsch (1999) replicate this finding for the perturbed version of Jordan's game. In the game-theory literature, general global convergence results have only been obtained in special classes of games, e.g., zero-sum, potential, and supermodular games (Hofbauer and Sandholm, 2002).)

Theorem 3.
Suppose that σ is a Berk-Nash equilibrium of a perturbed game that is weakly identified given σ. Then there exists a profile of priors with full support and an asymptotically optimal policy profile φ such that σ is strongly stable under φ.

Proof. See Online Appendix B.

importance of assumption 1. The following example illustrates that equilibrium may not exist and Lemma 2 fails if Assumption 1 does not hold. A single agent chooses action x ∈ {A, B} and obtains outcome y ∈ {0, 1}. The agent's model is parameterized by θ = (θ_A, θ_B), where Q_θ(y = 1 | A) = θ_A and Q_θ(y = 1 | B) = θ_B. The true model is θ_0 = (1/2, 3/4), but the agent considers only θ_1 = (0, 3/4) and θ_2 = (1/2, 1/4) to be possible, i.e., Θ = {θ_1, θ_2}. In particular, Assumption 1(iii) fails for parameter value θ_1. Suppose that A is uniquely optimal for parameter value θ_1 and B is uniquely optimal for θ_2 (further details about payoffs are not needed).

(The requirement that the priors have full support makes the statement non-trivial. Assumption 1(iii) would hold if, for some ε̄ > 0, θ_ε = (ε, 3/4) were also in Θ for all 0 < ε ≤ ε̄.)
Berk-Nash equilibrium does not exist: If A is played with positive probability, then the wKLD is infinite at θ₁ (i.e., θ₁ cannot rationalize y = 1 given A) and θ₂ is the best fit; but then A is not optimal. If B is played with probability 1, then θ₁ is the best fit; but then B is not optimal. In addition, Lemma 2 fails: Suppose that the path of play converges to pure strategy B. The best fit given B is θ₁, but the posterior need not converge weakly to a degenerate probability distribution on θ₁; it is possible that, along the path of play, the agent tried action A and observed y = 1, in which case the posterior would immediately assign probability 1 to θ₂.

forward-looking agents. In the dynamic model, we assumed that players are myopic. In Online Appendix C, we extend Theorem 2 to the case of non-myopic players who solve a dynamic optimization problem with beliefs as a state variable. A key fact used in the proof of Theorem 2 is that myopically optimal behavior is continuous in beliefs. Non-myopic optimal behavior is also continuous in beliefs, but the issue is that it may not coincide with myopic behavior in the steady state if players still have incentives to experiment. We prove the extension by requiring that the game is weakly identified, which guarantees that players have no incentives to experiment in steady state.

large population models.
The framework assumes that there is a fixed number of players but, by focusing on stationary subjective models, rules out aspects of "repeated games" where players attempt to influence each others' play. In Online Appendix D, we adapt the equilibrium concept to settings in which there is a population of a large number of agents in the role of each player, so that agents have negligible incentives to influence each other's play.

extensive-form games. Our results hold for an alternative timing where player i commits to a signal-contingent plan of action (i.e., a strategy) and observes both the realized signal s^i and the consequence y^i ex post. In particular, Berk-Nash equilibrium is applicable to extensive-form games provided that players compete by choosing contingent plans of action and know the extensive form. But the right approach is less clear if players have a misspecified view of the extensive form (for example, they may not even know the set of strategies available to them) or if players play the game sequentially (for example, we would need to define and update beliefs at each information set). The extension to extensive-form games is left for future work.

(Footnote: Jehiel (1995) considers the class of repeated alternating-move games and assumes that players only forecast a limited number of time periods into the future; see Jehiel (1998) for a learning foundation. Jehiel and Samet (2007) consider the general class of extensive-form games with perfect information and assume that players simplify the game by partitioning the nodes into similarity classes. In both cases, players are required to have correct beliefs, given their limited or simplified view of the game.)

relationship to bounded rationality literature. By providing a language that makes the underlying misspecification explicit, we offer some guidance for choosing between different models of bounded rationality. For example, we could model the observed behavior of an instructor in Example 2.3 by directly assuming that she believes criticism improves performance and praise worsens it. But extrapolating this observed belief to other contexts may lead to erroneous conclusions. Instead, we postulate what we think is a plausible misspecification (i.e., failure to account for regression to the mean) and then derive beliefs endogenously, as a function of the context.

We mentioned in the paper several instances of bounded rationality that can be formalized via misspecified, endogenous learning. Other examples in the literature can also be viewed as restricting beliefs using the wKLD measure, but fall outside the scope of our paper either because interactions are mediated by a price or because the problem is dynamic (we focus on the repetition of a static problem). For example, Blume and Easley (1982) and Rabin and Vayanos (2010) explicitly characterize beliefs using the limit of a likelihood function, while Bray (1982), Radner (1982), Sargent (1993), and Evans and Honkapohja (2001) focus specifically on OLS learning with misspecified models. Piccione and Rubinstein (2003), Eyster and Piccione (2013), and Spiegler (2013) study pattern recognition in dynamic settings and impose consistency requirements on beliefs that could be interpreted as minimizing the wKLD measure. In the sampling equilibrium of Osborne and Rubinstein (1998) and Spiegler (2006), beliefs may be incorrect due to learning from a limited sample, rather than from misspecified learning. Other instances of bounded rationality that do not seem naturally fitted to misspecified learning include biases in information processing due to computational complexity (e.g., Rubinstein (1986), Salant (2011)), bounded memory (e.g., Wilson, 2003), self-deception (e.g., Bénabou and Tirole (2002), Compte and Postlewaite (2004)), or sparsity-based optimization (Gabaix (2014)).

(Footnote: This assumption corresponds to a singleton set Θ, thus fixing beliefs at the outset and leaving no space for learning. This approach is common in past work that assumes that agents have a misspecified model but there is no learning about parameter values, e.g., Barberis et al. (1998).)

References

Al-Najjar, N., "Decision Makers as Statisticians: Diversity, Ambiguity and Learning,"
Econometrica, 2009, (5), 1371–1401.
——— and M. Pai, "Coarse decision making and overfitting," Journal of Economic Theory, forthcoming, 2013.
Aliprantis, C.D. and K.C. Border, Infinite Dimensional Analysis: A Hitchhiker's Guide, Springer Verlag, 2006.
Aragones, E., I. Gilboa, A. Postlewaite, and D. Schmeidler, "Fact-Free Learning," American Economic Review, 2005, (5), 1355–1368.
Arrow, K. and J. Green, "Notes on Expectations Equilibria in Bayesian Settings," Institute for Mathematical Studies in the Social Sciences Working Paper No. 33, 1973.
Barberis, N., A. Shleifer, and R. Vishny, "A model of investor sentiment," Journal of Financial Economics, 1998, (3), 307–343.
Battigalli, P., Comportamento razionale ed equilibrio nei giochi e nelle situazioni sociali, Università Bocconi, Milano, 1987.
———, S. Cerreia-Vioglio, F. Maccheroni, and M. Marinacci, "Self-confirming equilibrium and model uncertainty," Technical Report, 2012.
Bénabou, Roland and Jean Tirole, "Self-confidence and personal motivation," The Quarterly Journal of Economics, 2002, (3), 871–915.
Benaim, M. and M.W. Hirsch, "Mixed equilibria and dynamical systems arising from fictitious play in perturbed games," Games and Economic Behavior, 1999, (1-2), 36–72.
Benaim, Michel, "Dynamics of stochastic approximation algorithms," in "Séminaire de Probabilités XXXIII," Vol. 1709 of Lecture Notes in Mathematics, Springer Berlin Heidelberg, 1999, pp. 1–68.
Berk, R.H., "Limiting behavior of posterior distributions when the model is incorrect," The Annals of Mathematical Statistics, 1966, (1), 51–58.
Bickel, P.J., C.A.J. Klaassen, Y. Ritov, and J.A. Wellner, Efficient and Adaptive Estimation for Semiparametric Models, Johns Hopkins University Press, Baltimore, 1993.
Billingsley, P., Probability and Measure, Wiley, 1995.
Blume, L.E. and D. Easley, "Learning to be Rational,"
Journal of Economic Theory, 1982, (2), 340–351.
Bray, M., "Learning, estimation, and the stability of rational expectations," Journal of Economic Theory, 1982, (2), 318–339.
Bunke, O. and X. Milhaud, "Asymptotic behavior of Bayes estimates under possibly incorrect models," The Annals of Statistics, 1998, (2), 617–644.
Compte, Olivier and Andrew Postlewaite, "Confidence-enhanced performance," American Economic Review, 2004, pp. 1536–1557.
Dekel, E., D. Fudenberg, and D.K. Levine, "Learning to play Bayesian games," Games and Economic Behavior, 2004, (2), 282–303.
Diaconis, P. and D. Freedman, "On the consistency of Bayes estimates," The Annals of Statistics, 1986, pp. 1–26.
Doraszelski, Ulrich and Juan F. Escobar, "A theory of regular Markov perfect equilibria in dynamic stochastic games: Genericity, stability, and purification," Theoretical Economics, 2010, (3), 369–402.
Durrett, R., Probability: Theory and Examples, Cambridge University Press, 2010.
Easley, D. and N.M. Kiefer, "Controlling a stochastic process with unknown parameters," Econometrica, 1988, pp. 1045–1064.
Esponda, I., "Behavioral equilibrium in economies with adverse selection," The American Economic Review, 2008, (4), 1269–1291.
——— and D. Pouzo, "Learning Foundation for Equilibrium in Voting Environments with Private Information," working paper, 2011.
Esponda, I. and D. Pouzo, "Berk-Nash Equilibrium: A Framework for Modeling Agents with Misspecified Models,"
ArXiv 1411.1152 , November 2014.
Evans, G. W. and S. Honkapohja , Learning and Expectations in Macroeconomics ,Princeton University Press, 2001.
Eyster, E. and M. Rabin, "Cursed equilibrium," Econometrica, 2005, (5), 1623–1672.
Eyster, Erik and Michele Piccione, "An approach to asset-pricing under incomplete and diverse perceptions," Econometrica, 2013, (4), 1483–1506.
Freedman, D.A., "On the asymptotic behavior of Bayes' estimates in the discrete case," The Annals of Mathematical Statistics, 1963, (4), 1386–1403.
Fudenberg, D. and D. Kreps, "Learning Mixed Equilibria," Games and Economic Behavior, 1993, 320–367.
——— and D.K. Levine, "Self-confirming equilibrium," Econometrica, 1993, pp. 523–545.
——— and ———, "Steady state learning and Nash equilibrium," Econometrica, 1993, pp. 547–573.
——— and ———, The Theory of Learning in Games, Vol. 2, The MIT Press, 1998.
——— and ———, "Learning and Equilibrium," Annual Review of Economics, 2009, 385–420.
——— and D.M. Kreps, "A Theory of Learning, Experimentation, and Equilibrium in Games," Technical Report, mimeo, 1988.
——— and ———, "Learning in extensive-form games I. Self-confirming equilibria," Games and Economic Behavior, 1995, (1), 20–55.
Gabaix, Xavier, "A sparsity-based model of bounded rationality," The Quarterly Journal of Economics, 2014, (4), 1661–1710.
Harsanyi, J.C., "Games with randomly disturbed payoffs: A new rationale for mixed-strategy equilibrium points," International Journal of Game Theory, 1973, (1), 1–23.
Hirsch, M. W., S. Smale, and R. L. Devaney, Differential Equations, Dynamical Systems, and An Introduction to Chaos, Elsevier Academic Press, 2004.
Hofbauer, J. and W.H. Sandholm, "On the global convergence of stochastic fictitious play," Econometrica, 2002, (6), 2265–2294.
Jehiel, P., "Analogy-based expectation equilibrium," Journal of Economic Theory, 2005, (2), 81–104.
——— and D. Samet, "Valuation equilibrium," Theoretical Economics, 2007, (2), 163–185.
——— and F. Koessler, "Revisiting games of incomplete information with analogy-based expectations," Games and Economic Behavior, 2008, (2), 533–557.
Jehiel, Philippe, "Learning to play limited forecast equilibria," Games and Economic Behavior, 1998, (2), 274–298.
Jehiel, Philippe, "Limited horizon forecast in repeated alternate games," Journal of Economic Theory, 1995, (2), 497–519.
Jordan, J. S., "Three problems in learning mixed-strategy Nash equilibria," Games and Economic Behavior, 1993, (3), 368–386.
Kagel, J.H. and D. Levin, "The winner's curse and public information in common value auctions," The American Economic Review, 1986, pp. 894–920.
Kalai, E. and E. Lehrer, "Rational learning leads to Nash equilibrium," Econometrica, 1993, pp. 1019–1045.
Kirman, A. P., "Learning by firms about demand conditions," in R. H. Day and T. Groves, eds., Adaptive Economic Models, Academic Press, 1975, pp. 137–156.
Kreps, D. M. and R. Wilson, "Sequential equilibria," Econometrica, 1982, pp. 863–894.
Kullback, S. and R. A. Leibler, "On Information and Sufficiency," Annals of Mathematical Statistics, 1951, (1), 79–86.
Kushner, H. J. and G. G. Yin, Stochastic Approximation and Recursive Algorithms and Applications, Springer Verlag, 2003.
McLennan, A., "Price dispersion and incomplete learning in the long run," Journal of Economic Dynamics and Control, 1984, (3), 331–347.
Nyarko, Y., "Learning in mis-specified models and the possibility of cycles," Journal of Economic Theory, 1991, (2), 416–427.
———, "On the convexity of the value function in Bayesian optimal control problems," Economic Theory, 1994, (2), 303–309.
Osborne, M.J. and A. Rubinstein, "Games with procedurally rational players," American Economic Review, 1998, 834–849.
Piccione, M. and A. Rubinstein, "Modeling the economic interaction of agents with diverse abilities to recognize equilibrium patterns," Journal of the European Economic Association, 2003, (1), 212–223.
Pollard, D., A User's Guide to Measure Theoretic Probability, Cambridge University Press, 2001.
Rabin, M., "Inference by Believers in the Law of Small Numbers," Quarterly Journal of Economics, 2002, (3), 775–816.
——— and D. Vayanos, "The gambler's and hot-hand fallacies: Theory and applications," The Review of Economic Studies, 2010, (2), 730–778.
Radner, R., Equilibrium Under Uncertainty, Vol. II of Handbook of Mathematical Economics, North-Holland Publishing Company, 1982.
Rothschild, M., "A two-armed bandit theory of market pricing," Journal of Economic Theory, 1974, (2), 185–202.
Rubinstein, A. and A. Wolinsky, "Rationalizable conjectural equilibrium: between Nash and rationalizability," Games and Economic Behavior, 1994, (2), 299–311.
Rubinstein, Ariel, "Finite automata play the repeated prisoner's dilemma," Journal of Economic Theory, 1986, (1), 83–96.
Salant, Y., "Procedural analysis of choice rules with applications to bounded rationality," The American Economic Review, 2011, (2), 724–748.
Sargent, T. J., Bounded Rationality in Macroeconomics, Oxford University Press, 1993.
———, The Conquest of American Inflation, Princeton University Press, 1999.
Schwartzstein, J., "Selective Attention and Learning," working paper, 2009.
Selten, R., "Reexamination of the perfectness concept for equilibrium points in extensive games," International Journal of Game Theory, 1975, (1), 25–55.
Sobel, J., "Non-linear prices and price-taking behavior," Journal of Economic Behavior & Organization, 1984, (3), 387–396.
Spiegler, R., "The Market for Quacks," Review of Economic Studies, 2006, 1113–1131.
———, "Placebo reforms," The American Economic Review, 2013, (4), 1490–1506.
———, "Bayesian Networks and Boundedly Rational Expectations," Working Paper, 2014.
Stein, Charles, "A bound for the error in the normal approximation to the distribution of a sum of dependent random variables," in "Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, Volume 2: Probability Theory," University of California Press, Berkeley, Calif., 1972, pp. 583–602.
Tversky, A. and D. Kahneman, "Availability: A heuristic for judging frequency and probability," Cognitive Psychology, 1973, 207–232.
White, Halbert, "Maximum likelihood estimation of misspecified models," Econometrica: Journal of the Econometric Society, 1982, pp. 1–25.
Wilson, A., "Bounded Memory and Biases in Information Processing," Working Paper, 2003.

Appendix
Let $Z^i = \{(s^i, x^i, y^i) \in S^i \times X^i \times Y^i : y^i = f^i(x^i, x^{-i}, \omega),\; x^{-i} \in X^{-i},\; \omega \in \mathrm{supp}(p_{\Omega|S^i}(\cdot\mid s^i))\}$. For all $z^i = (s^i, x^i, y^i) \in Z^i$, define $\bar P^i_\sigma(z^i) = Q^i_\sigma(y^i\mid s^i, x^i)\,\sigma^i(x^i\mid s^i)\, p_{S^i}(s^i)$. We sometimes abuse notation and write $Q^i_\sigma(z^i) \equiv Q^i_\sigma(y^i\mid s^i, x^i)$, and similarly for $Q^i_{\theta^i}$. The following claim is used in the proofs below.

Claim A.
For all $i \in I$: (i) There exist $\theta^{i*} \in \Theta^i$ and $\bar K < \infty$ such that, for all $\sigma \in \Sigma$, $K^i(\sigma, \theta^{i*}) \le \bar K$. (ii) Fix any $\theta^i \in \Theta^i$ with $Q^i_{\theta^i}(z^i) > 0$ for all $z^i \in Z^i$, and any $(\sigma_n)_n$ with $\lim_{n\to\infty}\sigma_n = \sigma$. Then $\lim_{n\to\infty} K^i(\sigma_n, \theta^i) = K^i(\sigma, \theta^i)$. (iii) $K^i$ is (jointly) lower semicontinuous: fix any $(\theta^i_n)_n$ and $(\sigma_n)_n$ such that $\lim_{n\to\infty}\theta^i_n = \theta^i$ and $\lim_{n\to\infty}\sigma_n = \sigma$. Then $\liminf_{n\to\infty} K^i(\sigma_n, \theta^i_n) \ge K^i(\sigma, \theta^i)$. (iv) Let $\xi^i$ be a random vector in $\mathbb{R}^{X^i}$ with absolutely continuous probability distribution $P_\xi$. Then, for all $(s^i, x^i) \in S^i \times X^i$, the map
$$\mu^i \mapsto \sigma^i(\mu^i)(x^i\mid s^i) = P_\xi\Big(\xi^i : x^i \in \arg\max_{\bar x^i \in X^i} E_{\bar Q^i_{\mu^i}(\cdot\mid s^i,\bar x^i)}\big[\pi^i(\bar x^i, Y^i)\big] + \xi^i(\bar x^i)\Big)$$
is continuous.

Proof. (i) By Assumption 1 and finiteness of $Z^i$, there exist $\theta^{i*} \in \Theta^i$ and $\alpha \in (0,1)$ such that $Q^i_{\theta^{i*}}(z^i) \ge \alpha$ for all $z^i \in Z^i$. Thus, for all $\sigma \in \Sigma$, $K^i(\sigma, \theta^{i*}) \le -E_{\bar P^i_\sigma}[\ln Q^i_{\theta^{i*}}(Z^i)] \le -\ln\alpha$.

(ii) $K^i(\sigma_n, \theta^i) - K^i(\sigma, \theta^i) = \sum_{z^i\in Z^i}\big\{\big(\bar P^i_{\sigma_n}(z^i)\ln Q^i_{\sigma_n}(z^i) - \bar P^i_{\sigma}(z^i)\ln Q^i_{\sigma}(z^i)\big) + \big(\bar P^i_{\sigma}(z^i) - \bar P^i_{\sigma_n}(z^i)\big)\ln Q^i_{\theta^i}(z^i)\big\}$. The first term on the RHS converges to 0 because $\lim_{n\to\infty}\sigma_n = \sigma$, $Q_\sigma$ is continuous, and $x\ln x$ is continuous for all $x \in [0,1]$; the second term converges to 0 because $\lim_{n\to\infty}\sigma_n = \sigma$, $\bar P^i_\sigma$ is continuous, and $\ln Q^i_{\theta^i}(z^i)$ is finite for all $z^i \in Z^i$.

(iii) $K^i(\sigma_n, \theta^i_n) - K^i(\sigma, \theta^i) = \sum_{z^i\in Z^i}\big\{\big(\bar P^i_{\sigma_n}(z^i)\ln Q^i_{\sigma_n}(z^i) - \bar P^i_{\sigma}(z^i)\ln Q^i_{\sigma}(z^i)\big) + \big(\bar P^i_{\sigma}(z^i)\ln Q^i_{\theta^i}(z^i) - \bar P^i_{\sigma_n}(z^i)\ln Q^i_{\theta^i_n}(z^i)\big)\big\}$. The first term on the RHS converges to 0 (same argument as in part (ii)). The proof concludes by showing that, for all $z^i \in Z^i$,
$$\liminf_{n\to\infty} -\bar P^i_{\sigma_n}(z^i)\ln Q^i_{\theta^i_n}(z^i) \ge -\bar P^i_{\sigma}(z^i)\ln Q^i_{\theta^i}(z^i). \quad (8)$$
Suppose $\liminf_{n\to\infty} -\bar P^i_{\sigma_n}(z^i)\ln Q^i_{\theta^i_n}(z^i) \le M < \infty$ (if not, (8) holds trivially). Then either (i) $\bar P^i_{\sigma_n}(z^i) \to \bar P^i_{\sigma}(z^i) > 0$, in which case (8) holds with equality (by continuity of $\theta^i \mapsto Q^i_{\theta^i}$), or (ii) $\bar P^i_{\sigma_n}(z^i) \to \bar P^i_{\sigma}(z^i) = 0$, in which case (8) holds because its RHS is 0 (by the convention that $0\ln 0 = 0$) and its LHS is always nonnegative.

(iv) The proof is standard and, therefore, omitted. □

Proof of Lemma 1.
Part (i). Note that
$$K^i(\sigma, \theta^i) \ge -\sum_{(s^i,x^i)\in S^i\times X^i} \ln\Big(E_{Q^i_\sigma(\cdot\mid s^i,x^i)}\Big[\frac{Q^i_{\theta^i}(Y^i\mid s^i,x^i)}{Q^i_{\sigma}(Y^i\mid s^i,x^i)}\Big]\Big)\,\sigma^i(x^i\mid s^i)\, p_{S^i}(s^i) = 0$$
by Jensen's inequality applied to $\ln(\cdot)$, and it holds with equality if and only if $Q^i_{\theta^i}(\cdot\mid s^i,x^i) = Q^i_{\sigma}(\cdot\mid s^i,x^i)$ for all $(s^i,x^i)$ such that $\sigma^i(x^i\mid s^i) > 0$ and $p_{S^i}(s^i) > 0$.

$\Theta^i(\sigma)$ is nonempty: By Claim A(i), there exists $\bar K < \infty$ such that the minimizers are in the constraint set $\{\theta^i \in \Theta^i : K^i(\sigma,\theta^i) \le \bar K\}$. Because $K^i(\sigma,\cdot)$ is continuous over a compact set, a minimum exists.

$\Theta^i(\sigma)$ is upper hemicontinuous (uhc): Fix any $(\sigma_n)_n$ and $(\theta^i_n)_n$ such that $\lim_{n\to\infty}\sigma_n = \sigma$, $\lim_{n\to\infty}\theta^i_n = \theta^i$, and $\theta^i_n \in \Theta^i(\sigma_n)$ for all $n$. We show that $\theta^i \in \Theta^i(\sigma)$ (so that $\Theta^i(\cdot)$ has a closed graph and, by compactness of $\Theta^i$, is uhc). Suppose, to obtain a contradiction, that $\theta^i \notin \Theta^i(\sigma)$. By Claim A(i), there exist $\hat\theta^i \in \Theta^i$ and $\varepsilon > 0$ such that $K^i(\sigma, \hat\theta^i) \le K^i(\sigma, \theta^i) - 2\varepsilon$ and $K^i(\sigma, \hat\theta^i) < \infty$. By Assumption 1, there exists $(\hat\theta^i_j)_j$ with $\lim_{j\to\infty}\hat\theta^i_j = \hat\theta^i$ and, for all $j$, $Q^i_{\hat\theta^i_j}(z^i) > 0$ for all $z^i \in Z^i$. We show that there is an element of the sequence, $\hat\theta^i_J$, that "does better" than $\theta^i_n$ given $\sigma_n$, which is a contradiction. Because $K^i(\sigma, \hat\theta^i) < \infty$, continuity of $K^i(\sigma,\cdot)$ implies that there exists $J$ large enough such that $|K^i(\sigma, \hat\theta^i_J) - K^i(\sigma, \hat\theta^i)| \le \varepsilon/2$. Moreover, Claim A(ii) applied to $\theta^i = \hat\theta^i_J$ implies that there exists $N_{\varepsilon,J}$ such that, for all $n \ge N_{\varepsilon,J}$, $|K^i(\sigma_n, \hat\theta^i_J) - K^i(\sigma, \hat\theta^i_J)| \le \varepsilon/2$. Thus, for all $n \ge N_{\varepsilon,J}$, $|K^i(\sigma_n, \hat\theta^i_J) - K^i(\sigma, \hat\theta^i)| \le |K^i(\sigma_n, \hat\theta^i_J) - K^i(\sigma, \hat\theta^i_J)| + |K^i(\sigma, \hat\theta^i_J) - K^i(\sigma, \hat\theta^i)| \le \varepsilon$. Therefore,
$$K^i(\sigma_n, \hat\theta^i_J) \le K^i(\sigma, \hat\theta^i) + \varepsilon \le K^i(\sigma, \theta^i) - \varepsilon. \quad (9)$$
Suppose $K^i(\sigma, \theta^i) < \infty$. By Claim A(iii), there exists $n_\varepsilon \ge N_{\varepsilon,J}$ such that $K^i(\sigma_{n_\varepsilon}, \theta^i_{n_\varepsilon}) \ge K^i(\sigma, \theta^i) - \varepsilon/2$. This result and expression (9) imply $K^i(\sigma_{n_\varepsilon}, \hat\theta^i_J) \le K^i(\sigma_{n_\varepsilon}, \theta^i_{n_\varepsilon}) - \varepsilon/2$. But this contradicts $\theta^i_{n_\varepsilon} \in \Theta^i(\sigma_{n_\varepsilon})$. Finally, if $K^i(\sigma, \theta^i) = \infty$, Claim A(iii) implies that there exists $n_\varepsilon \ge N_{\varepsilon,J}$ such that $K^i(\sigma_{n_\varepsilon}, \theta^i_{n_\varepsilon}) \ge \bar K$, where $\bar K$ is the bound defined in Claim A(i). But this also contradicts $\theta^i_{n_\varepsilon} \in \Theta^i(\sigma_{n_\varepsilon})$.

$\Theta^i(\sigma)$ is compact: As shown above, $\Theta^i(\cdot)$ has a closed graph, and so $\Theta^i(\sigma)$ is a closed set. Compactness of $\Theta^i(\sigma)$ follows from compactness of $\Theta^i$. □

Proof of Theorem 1.
We prove the result in two parts.
Part 1. We show existence of equilibrium in the perturbed game (defined in Section 4.1). Let $\Gamma : \times_{i\in I}\Delta(\Theta^i) \rightrightarrows \times_{i\in I}\Delta(\Theta^i)$ be a correspondence such that, for all $\mu = (\mu^i)_{i\in I} \in \times_{i\in I}\Delta(\Theta^i)$,
$$\Gamma(\mu) = \times_{i\in I}\,\Delta\big(\Theta^i(\sigma(\mu))\big),$$
where $\sigma(\mu) = (\sigma^i(\mu^i))_{i\in I} \in \Sigma$ is defined by
$$\sigma^i(\mu^i)(x^i\mid s^i) = P_\xi\Big(\xi^i : x^i \in \arg\max_{\bar x^i \in X^i} E_{\bar Q^i_{\mu^i}(\cdot\mid s^i,\bar x^i)}\big[\pi^i(\bar x^i, Y^i)\big] + \xi^i(\bar x^i)\Big) \quad (10)$$
for all $(x^i, s^i) \in X^i \times S^i$. Note that if there exists $\mu^* \in \times_{i\in I}\Delta(\Theta^i)$ such that $\mu^* \in \Gamma(\mu^*)$, then $\sigma^* \equiv (\sigma^i(\mu^{i*}))_{i\in I}$ is an equilibrium of the perturbed game. We show that such $\mu^*$ exists by checking the conditions of the Kakutani-Fan-Glicksberg fixed-point theorem: (i) $\times_{i\in I}\Delta(\Theta^i)$ is a compact, convex subset of a locally convex Hausdorff space: the set $\Delta(\Theta^i)$ is convex and, since $\Theta^i$ is compact, $\Delta(\Theta^i)$ is also compact under the weak topology (Aliprantis and Border (2006), Theorem 15.11). By Tychonoff's theorem, $\times_{i\in I}\Delta(\Theta^i)$ is compact too. Finally, the set is also locally convex under the weak topology. (ii) $\Gamma$ has convex, nonempty images: it is clear that $\Delta(\Theta^i(\sigma(\mu)))$ is convex-valued for all $\mu$; also, by Lemma 1, $\Theta^i(\sigma(\mu))$ is nonempty for all $\mu$. (iii) $\Gamma$ has a closed graph: let $(\mu_n, \hat\mu_n)_n$ be such that $\hat\mu_n \in \Gamma(\mu_n)$, $\mu_n \to \mu$, and $\hat\mu_n \to \hat\mu$ (under the weak topology). By Claim A(iv), $\mu^i \mapsto \sigma^i(\mu^i)$ is continuous. Thus, $\sigma_n \equiv (\sigma^i(\mu^i_n))_{i\in I} \to \sigma \equiv (\sigma^i(\mu^i))_{i\in I}$. By Lemma 1, $\sigma \mapsto \Theta^i(\sigma)$ is uhc; thus, by Theorem 17.13 in Aliprantis and Border (2006), $\sigma \mapsto \times_{i\in I}\Delta(\Theta^i(\sigma))$ is also uhc. Therefore, $\hat\mu \in \times_{i\in I}\Delta(\Theta^i(\sigma)) = \Gamma(\mu)$.

Part 2. Fix a sequence of perturbed games indexed by the probability of perturbations $(P_{\xi,n})_n$. By Part 1, there is a corresponding sequence of fixed points $(\mu_n)_n$ such that $\mu_n \in \times_{i\in I}\Delta(\Theta^i(\sigma_n))$ for all $n$, where $\sigma_n \equiv (\sigma^i(\mu^i_n, P_{\xi,n}))_{i\in I}$ (see equation (10), where we now explicitly account for the dependence on $P_{\xi,n}$). By compactness, there exist subsequences of $(\mu_n)_n$ and $(\sigma_n)_n$ that converge to $\mu$ and $\sigma$, respectively. Since $\sigma \mapsto \times_{i\in I}\Delta(\Theta^i(\sigma))$ is uhc, $\mu \in \times_{i\in I}\Delta(\Theta^i(\sigma))$. We now show that if we choose $(P_{\xi,n})_n$ such that, for all $\varepsilon > 0$, $\lim_{n\to\infty} P_{\xi,n}(\|\xi^i_n\| \ge \varepsilon) = 0$, then $\sigma$ is optimal given $\mu$ in the unperturbed game; this establishes existence of equilibrium in the unperturbed game. Suppose not, so that there exist $i, s^i, x^i, \hat x^i$, and $\varepsilon > 0$ such that $\sigma^i(x^i\mid s^i) > 0$ but $E_{\bar Q^i_{\mu^i}(\cdot\mid s^i,x^i)}[\pi^i(x^i, Y^i)] + 4\varepsilon \le E_{\bar Q^i_{\mu^i}(\cdot\mid s^i,\hat x^i)}[\pi^i(\hat x^i, Y^i)]$. By continuity of $\mu^i \mapsto \bar Q^i_{\mu^i}$ and the fact that $\lim_{n\to\infty}\mu^i_n = \mu^i$, there exists $n_0$ such that, for all $n \ge n_0$, $E_{\bar Q^i_{\mu^i_n}(\cdot\mid s^i,x^i)}[\pi^i(x^i, Y^i)] + 2\varepsilon \le E_{\bar Q^i_{\mu^i_n}(\cdot\mid s^i,\hat x^i)}[\pi^i(\hat x^i, Y^i)]$. It then follows from (10) and $\lim_{n\to\infty} P_{\xi,n}(\|\xi^i_n\| \ge \varepsilon) = 0$ that $\lim_{n\to\infty}\sigma^i(\mu^i_n, P_{\xi,n})(x^i\mid s^i) = 0$. But this contradicts $\lim_{n\to\infty}\sigma^i(\mu^i_n, P_{\xi,n})(x^i\mid s^i) = \sigma^i(x^i\mid s^i) > 0$. □

Proof of Proposition 3.
In the next paragraph, we prove the following result: for all $\sigma$ and $\bar\theta^i_\sigma \in \Theta^i(\sigma)$, (a) $Q^i_{\Omega,\bar\theta^i_\sigma}(\omega'\mid s^i) = p_{\Omega|S^i}(\omega'\mid s^i)$ for all $s^i \in S^i$, $\omega' \in \Omega$, and (b) $Q^i_{X^{-i},\bar\theta^i_\sigma}(x^{-i}\mid \alpha^i) = \sum_{\omega''\in\Omega} p_{\Omega|\mathcal{A}^i}(\omega''\mid\alpha^i)\prod_{j\ne i}\sigma^j(x^j\mid s^j(\omega''))$ for all $\alpha^i \in \mathcal{A}^i$, $x^{-i} \in X^{-i}$. Equivalence between Berk-Nash equilibrium and ABEE follows immediately from (a), (b), and the fact that the expected utility of player $i$ with signal $s^i$ and beliefs $\bar\theta_\sigma$ is $\sum_{\omega'\in\Omega} Q^i_{\Omega,\bar\theta^i_\sigma}(\omega'\mid s^i)\sum_{x^{-i}\in X^{-i}} Q^i_{X^{-i},\bar\theta^i_\sigma}(x^{-i}\mid \alpha^i(\omega'))\,\pi^i(\bar x^i, x^{-i}, \omega')$.

(Footnote to the proof of Theorem 1: the claim that the set is locally convex under the weak topology follows since the weak topology is induced by the family of seminorms $\rho(\mu,\mu') = |E_\mu[f] - E_{\mu'}[f]|$, for $f$ continuous and bounded and any $\mu, \mu'$ in $\Delta(\Theta^i)$.)

Proof of (a) and (b): $-K^i(\sigma,\theta^i)$ equals, up to a constant,
$$\sum_{s^i,\tilde\omega,\tilde x^{-i}} \ln\big(Q^i_{\Omega,\theta^i}(\tilde\omega\mid s^i)\, Q^i_{X^{-i},\theta^i}(\tilde x^{-i}\mid\alpha^i(\tilde\omega))\big)\prod_{j\ne i}\sigma^j(\tilde x^j\mid s^j(\tilde\omega))\, p_{\Omega|S^i}(\tilde\omega\mid s^i)\, p_{S^i}(s^i)$$
$$= \sum_{s^i,\tilde\omega}\ln\big(Q^i_{\Omega,\theta^i}(\tilde\omega\mid s^i)\big)\, p_{\Omega|S^i}(\tilde\omega\mid s^i)\, p_{S^i}(s^i) + \sum_{\tilde x^{-i},\,\alpha^i\in\mathcal{A}^i}\ln\big(Q^i_{X^{-i},\theta^i}(\tilde x^{-i}\mid\alpha^i)\big)\sum_{\tilde\omega\in\alpha^i}\prod_{j\ne i}\sigma^j(\tilde x^j\mid s^j(\tilde\omega))\, p_\Omega(\tilde\omega).$$
It is straightforward to check that any parameter value that maximizes the above expression satisfies (a) and (b). □
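The last step of this proof rests on a standard fact: for a fixed distribution $p$, the expected log-likelihood $q \mapsto \sum_\omega p(\omega)\ln q(\omega)$ is maximized at $q = p$, which is why each additively separable term above is maximized by matching the corresponding objective conditional distribution. A minimal numerical sketch for a hypothetical binary state:

```python
import math

# For a fixed Bernoulli distribution with parameter p, the expected
# log-likelihood q -> p*ln(q) + (1-p)*ln(1-q) is maximized at q = p.
p = 0.3  # hypothetical probability of the state taking value 1

def expected_loglik(q):
    return p * math.log(q) + (1 - p) * math.log(1 - q)

# Grid search over q in (0, 1); the maximizer coincides with p.
grid = [k / 1000 for k in range(1, 1000)]
q_star = max(grid, key=expected_loglik)
assert abs(q_star - p) < 1e-9
```

This is the one-dimensional version of the observation that any maximizer of the separable expression must satisfy (a) and (b).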
Proof of Lemma 2.
The proof uses Claim B, which is stated and proven after this proof. It is sufficient to establish that $\lim_{t\to\infty}\int_{\Theta^i} d^i(\sigma,\theta^i)\,\mu^i_t(d\theta^i) = 0$ a.s. in $H$, where $d^i(\sigma,\theta^i) = \inf_{\hat\theta^i\in\Theta^i(\sigma)}\|\theta^i - \hat\theta^i\|$. Fix $i \in I$ and $h \in H$. Then, by Bayes' rule,
$$\int_{\Theta^i} d^i(\sigma,\theta^i)\,\mu^i_t(d\theta^i) = \frac{\int_{\Theta^i} d^i(\sigma,\theta^i)\prod_{\tau=0}^{t-1}\frac{Q^i_{\theta^i}(y^i_\tau\mid s^i_\tau,x^i_\tau)}{Q^i_{\sigma_\tau}(y^i_\tau\mid s^i_\tau,x^i_\tau)}\,\mu^i(d\theta^i)}{\int_{\Theta^i}\prod_{\tau=0}^{t-1}\frac{Q^i_{\theta^i}(y^i_\tau\mid s^i_\tau,x^i_\tau)}{Q^i_{\sigma_\tau}(y^i_\tau\mid s^i_\tau,x^i_\tau)}\,\mu^i(d\theta^i)} = \frac{\int_{\Theta^i} d^i(\sigma,\theta^i)\, e^{tK^i_t(h,\theta^i)}\,\mu^i(d\theta^i)}{\int_{\Theta^i} e^{tK^i_t(h,\theta^i)}\,\mu^i(d\theta^i)},$$
where the first equality is well-defined by Assumption 1, full support of $\mu^i$, and the fact that $P^{\mu_0,\phi}(H) > 0$ implies that the $Q^i_{\sigma_\tau}$ terms are positive, and where we define $K^i_t(h,\theta^i) = -\frac{1}{t}\sum_{\tau=0}^{t-1}\ln\frac{Q^i_{\sigma_\tau}(y^i_\tau\mid s^i_\tau,x^i_\tau)}{Q^i_{\theta^i}(y^i_\tau\mid s^i_\tau,x^i_\tau)}$ for the second equality.

(Footnote: If, for some $\theta^i$, $Q^i_{\theta^i}(y^i_\tau\mid s^i_\tau,x^i_\tau) = 0$ for some $\tau \in \{0,\ldots,t-1\}$, then we define $K^i_t(h,\theta^i) = -\infty$ and $\exp\{tK^i_t(h,\theta^i)\} = 0$.)

For any $\alpha > 0$, define $\Theta^i_\alpha(\sigma) \equiv \{\theta^i \in \Theta^i : d^i(\sigma,\theta^i) < \alpha\}$. Then, for all $\varepsilon > 0$ and $\eta > 0$,
$$\int_{\Theta^i} d^i(\sigma,\theta^i)\,\mu^i_t(d\theta^i) \le \varepsilon + C\,\frac{A^i_t(h,\sigma,\varepsilon)}{B^i_t(h,\sigma,\eta)},$$
where $C \equiv \sup_{\theta^i_1,\theta^i_2\in\Theta^i}\|\theta^i_1 - \theta^i_2\| < \infty$ (because $\Theta^i$ is bounded), $A^i_t(h,\sigma,\varepsilon) = \int_{\Theta^i\setminus\Theta^i_\varepsilon(\sigma)} e^{tK^i_t(h,\theta^i)}\,\mu^i(d\theta^i)$, and $B^i_t(h,\sigma,\eta) = \int_{\Theta^i_\eta(\sigma)} e^{tK^i_t(h,\theta^i)}\,\mu^i(d\theta^i)$. The proof concludes by showing that, for all (sufficiently small) $\varepsilon > 0$, there exists $\eta_\varepsilon > 0$ such that $\lim_{t\to\infty} A^i_t(h,\sigma,\varepsilon)/B^i_t(h,\sigma,\eta_\varepsilon) = 0$. This result is achieved in several steps.

First, for all $\varepsilon > 0$, define $K^i_\varepsilon(\sigma) = \inf\{K^i(\sigma,\theta^i) \mid \theta^i \in \Theta^i\setminus\Theta^i_\varepsilon(\sigma)\}$ and $\alpha_\varepsilon = (K^i_\varepsilon(\sigma) - K^i(\sigma))/4$, where $K^i(\sigma) = \inf_{\theta^i\in\Theta^i} K^i(\sigma,\theta^i)$. By continuity of $K^i(\sigma,\cdot)$, there exist $\bar\varepsilon$ and $\bar\alpha$ such that, for all $\varepsilon \le \bar\varepsilon$, $0 < \alpha_\varepsilon \le \bar\alpha < \infty$. Henceforth, let $\varepsilon \le \bar\varepsilon$. It follows that
$$K^i(\sigma,\theta^i) \ge K^i_\varepsilon(\sigma) > K^i(\sigma) + 2\alpha_\varepsilon \quad (11)$$
for all $\theta^i$ such that $d^i(\sigma,\theta^i) \ge \varepsilon$. Also, by continuity of $K^i(\sigma,\cdot)$, there exists $\eta_\varepsilon > 0$ such that, for all $\theta^i \in \Theta^i_{\eta_\varepsilon}(\sigma)$,
$$K^i(\sigma,\theta^i) < K^i(\sigma) + \alpha_\varepsilon/2. \quad (12)$$

Second, let $\hat\Theta^i = \{\theta^i \in \Theta^i : Q^i_{\theta^i}(y^i_\tau\mid s^i_\tau,x^i_\tau) > 0 \ \forall\tau\}$ and $\hat\Theta^i_{\eta_\varepsilon}(\sigma) = \hat\Theta^i \cap \Theta^i_{\eta_\varepsilon}(\sigma)$. We now show that $\mu^i(\hat\Theta^i_{\eta_\varepsilon}(\sigma)) > 0$. By Lemma 1, $\Theta^i(\sigma)$ is nonempty. Pick any $\theta^i \in \Theta^i(\sigma)$. By Assumption 1, there exists $(\theta^i_n)_n$ in $\Theta^i$ such that $\lim_{n\to\infty}\theta^i_n = \theta^i$ and $Q^i_{\theta^i_n}(y^i\mid s^i,x^i) > 0$ for all $y^i \in f^i(\Omega, x^i, X^{-i})$ and all $(s^i,x^i) \in S^i\times X^i$. In particular, there exists $\theta^i_{\bar n}$ such that $d^i(\sigma,\theta^i_{\bar n}) < 0.5\,\eta_\varepsilon$ and, by continuity of $Q_\cdot$, there exists an open set $U$ around $\theta^i_{\bar n}$ such that $U \subseteq \hat\Theta^i_{\eta_\varepsilon}(\sigma)$. By full support, $\mu^i(\hat\Theta^i_{\eta_\varepsilon}(\sigma)) > 0$. Next, note that
$$\liminf_{t\to\infty} B^i_t(h,\sigma,\eta_\varepsilon)\, e^{t(K^i(\sigma)+\alpha_\varepsilon)} \ge \liminf_{t\to\infty}\int_{\hat\Theta^i_{\eta_\varepsilon}(\sigma)} e^{t(K^i(\sigma)+\alpha_\varepsilon+K^i_t(h,\theta^i))}\,\mu^i(d\theta^i) \ge \int_{\hat\Theta^i_{\eta_\varepsilon}(\sigma)} e^{\lim_{t\to\infty} t(K^i(\sigma)+\alpha_\varepsilon-K^i(\sigma,\theta^i))}\,\mu^i(d\theta^i) = \infty \quad (13)$$
a.s. in $H$, where the first inequality follows because $\hat\Theta^i_{\eta_\varepsilon}(\sigma) \subseteq \Theta^i_{\eta_\varepsilon}(\sigma)$ and $\exp$ is a positive function, the second inequality follows from Fatou's lemma and a LLN for non-iid random variables that implies $\lim_{t\to\infty} K^i_t(h,\theta^i) = -K^i(\sigma,\theta^i)$ for all $\theta^i \in \hat\Theta^i$, a.s. in $H$ (see Claim B(i) below), and the last equality follows from (12) and the fact that $\mu^i(\hat\Theta^i_{\eta_\varepsilon}(\sigma)) > 0$.

Finally, consider $A^i_t(h,\sigma,\varepsilon)$. Claims B(ii) and B(iii) (see below) imply that there exists $T$ such that, for all $t \ge T$, $K^i_t(h,\theta^i) < -(K^i(\sigma) + (3/2)\alpha_\varepsilon)$ for all $\theta^i \in \Theta^i\setminus\Theta^i_\varepsilon(\sigma)$, a.s. in $H$. Thus,
$$\lim_{t\to\infty} A^i_t(h,\sigma,\varepsilon)\, e^{t(K^i(\sigma)+\alpha_\varepsilon)} = \lim_{t\to\infty}\int_{\Theta^i\setminus\Theta^i_\varepsilon(\sigma)} e^{t(K^i(\sigma)+\alpha_\varepsilon+K^i_t(h,\theta^i))}\,\mu^i(d\theta^i) \le \mu^i(\Theta^i\setminus\Theta^i_\varepsilon(\sigma))\,\lim_{t\to\infty} e^{-t\alpha_\varepsilon/2} = 0$$
a.s. in $H$. The above expression and equation (13) imply that $\lim_{t\to\infty} A^i_t(h,\sigma,\varepsilon)/B^i_t(h,\sigma,\eta_\varepsilon) = 0$ a.s.-$P^{\mu_0,\phi}$. □

We state and prove Claim B, used in the proof above. For any $\xi > 0$, define $\Theta^i_{\sigma,\xi}$ to be the set such that $\theta^i \in \Theta^i_{\sigma,\xi}$ if and only if $Q^i_{\theta^i}(y^i\mid s^i,x^i) \ge \xi$ for all $(s^i,x^i,y^i)$ such that $Q^i_\sigma(y^i\mid s^i,x^i)\,\sigma^i(x^i\mid s^i)\, p_{S^i}(s^i) > 0$.

Claim B.
For all i ∈ I : (i) For all θ i ∈ ˆΘ i , lim t →∞ K it ( h, θ i ) = − K i ( σ, θ i ) , a.s. in H ; (ii) There exist ξ ∗ > and T ξ ∗ such that, ∀ t ≥ T ξ ∗ , K it ( h, θ i ) < − ( K i ( σ )+(3 / α ε ) ∀ θ i / ∈ Θ iσ,ξ , a.s. in H ; (iii) For all ξ > , ∃ ˆ T ξ such that, ∀ t ≥ ˆ T ξ , K it ( h, θ i ) < ( K i ( σ ) + (3 / α ε ) ∀ θ i ∈ Θ iσ,ξ \ Θ iε ( σ ) , a.s. in H .Proof : Define f req it ( z i ) = t (cid:80) t − τ =0 z i ( z iτ ) ∀ z i ∈ Z i . K it can be written as K it ( h, θ i ) = κ i t ( h )+ κ i t ( h )+ κ i t ( h, θ i ), where κ i t ( h ) = − t − (cid:80) t − τ =0 (cid:80) z i ∈ Z i (cid:0) z i ( z iτ ) − ¯ P iσ τ ( z i ) (cid:1) ln Q iσ τ ( z i ), κ i t ( h ) = − t − (cid:80) t − τ =0 (cid:80) z i ∈ Z i ¯ P iσ τ ( z i ) ln Q iσ τ ( z i ), and κ i t ( h, θ i ) = (cid:80) z i ∈ Z i f req it ( z i ) ln Q iθ i ( z i ).The statements made below hold almost surely in H , but we omit this qualification.First, we show lim t →∞ κ i t ( h ) = 0. Define l iτ ( h, z i ) = (cid:0) z i ( z iτ ) − ¯ P iσ τ ( z i ) (cid:1) ln Q iσ τ ( z i )and L it ( h, z i ) = (cid:80) tτ =1 τ − l iτ ( h, z i ) ∀ z i ∈ Z i . Fix any z i ∈ Z i . We show that L it ( · , z i )converges a.s. to an integrable, and, therefore, finite function L i ∞ ( · , z i ). To show this,we use martingale convergence results. Let h t denote the partial history until time t . Since E P µ ,φ ( ·| h t ) (cid:2) l it +1 ( h, z i ) (cid:3) = 0, then E P µ ,φ ( ·| h t ) (cid:2) L it +1 ( h, z i ) (cid:3) = L it ( h, z i ) and so( L it ( h, z i )) t is a martingale with respect to P µ ,φ . Next, we show that sup t E P µ ,φ [ | L it ( h, z i ) | ] ≤ M for M < ∞ . Note that E P µ ,φ (cid:2)(cid:0) L it ( h, z i ) (cid:1) (cid:3) = E P µ ,φ (cid:2)(cid:80) tτ =1 τ − (cid:0) l iτ ( h, z i ) (cid:1) +2 (cid:80) τ (cid:48) >τ τ (cid:48) τ l iτ ( h, z i ) l iτ (cid:48) ( h, z i ) (cid:3) . 
Since $(l^i_t)_t$ is a martingale difference sequence, for $\tau' > \tau$, $E_{P^{\mu,\phi}}\left[l^i_\tau(h,z^i)\, l^i_{\tau'}(h,z^i)\right] = 0$. Therefore, $E_{P^{\mu,\phi}}\left[\left(L^i_t(h,z^i)\right)^2\right] = \sum_{\tau=1}^t \tau^{-2} E_{P^{\mu,\phi}}\left[\left(l^i_\tau(h,z^i)\right)^2\right]$. Note also that $E_{P^{\mu,\phi}(\cdot|h_{\tau-1})}\left[\left(l^i_\tau(h,z^i)\right)^2\right] \leq \left(\ln Q^i_{\sigma_\tau}(z^i)\right)^2 Q^i_{\sigma_\tau}(z^i)$. Therefore, by the law of iterated expectations, $E_{P^{\mu,\phi}}\left[\left(L^i_t(h,z^i)\right)^2\right] \leq \sum_{\tau=1}^t \tau^{-2} E_{P^{\mu,\phi}}\left[\left(\ln Q^i_{\sigma_\tau}(z^i)\right)^2 Q^i_{\sigma_\tau}(z^i)\right]$, where each summand's inner expectation is bounded above by 1 because $(\ln x)^2 x \leq 1$ for all $x \in [0,1]$. Hence $\sup_t E_{P^{\mu,\phi}}\left[\left(L^i_t(h,z^i)\right)^2\right] \leq \sum_{\tau\geq 1}\tau^{-2} < 2$, so $\sup_t E_{P^{\mu,\phi}}[|L^i_t(h,z^i)|] < \infty$. By Theorem 5.2.8 in Durrett (2010), $L^i_t(h,z^i)$ converges a.s.-$P^{\mu,\phi}$ to a finite $L^i_\infty(h,z^i)$. Thus, by Kronecker's lemma (Pollard (2001), page 105), $\lim_{t\to\infty}\sum_{z^i\in Z^i}\left\{t^{-1}\sum_{\tau=1}^t\left(1_{z^i}(z^i_\tau) - \bar P^i_{\sigma_\tau}(z^i)\right)\ln Q^i_{\sigma_\tau}(z^i)\right\} = 0$. (Kronecker's lemma implies that, for a sequence $(\ell_t)_t$ with $\sum_\tau \ell_\tau < \infty$, $b_t^{-1}\sum_{\tau=1}^t b_\tau \ell_\tau \to 0$, where $(b_t)_t$ is a nondecreasing, positive, real-valued sequence that diverges to $\infty$; we apply the lemma with $\ell_t \equiv t^{-1} l^i_t$ and $b_t = t$.) Therefore, $\lim_{t\to\infty}\kappa^i_{1t}(h) = 0$.

Next, consider $\kappa^i_{2t}(h)$. The assumption that $\lim_{t\to\infty}\sigma_t = \sigma$ and continuity of $Q^i_\sigma \ln Q^i_\sigma$ in $\sigma$ imply that $\lim_{t\to\infty}\kappa^i_{2t}(h) = -\sum_{(s^i,x^i)\in S^i\times X^i} E_{Q_\sigma(\cdot|s^i,x^i)}[\ln Q^i_\sigma(Y^i|s^i,x^i)]\,\sigma^i(x^i|s^i)\,p_{S^i}(s^i)$. The limits of $\kappa^i_{1t},\kappa^i_{2t}$ imply that, for all $\gamma > 0$, there exists $\hat t_\gamma$ such that, for all $t \geq \hat t_\gamma$,
$$\left|\kappa^i_{1t}(h) + \kappa^i_{2t}(h) + \sum_{(s^i,x^i)\in S^i\times X^i} E_{Q_\sigma(\cdot|s^i,x^i)}\left[\ln Q^i_\sigma(Y^i|s^i,x^i)\right]\sigma^i(x^i|s^i)\,p_{S^i}(s^i)\right| \leq \gamma. \quad (14)$$
We now prove (i)-(iii) by characterizing the limit of $\kappa^i_{3t}(h,\theta^i)$.

(i) For all $z^i \in Z^i$, $\left|freq^i_t(z^i) - \bar P^i_\sigma(z^i)\right| \leq \left|t^{-1}\sum_{\tau=0}^{t-1}\left(1_{z^i}(z^i_\tau) - \bar P^i_{\sigma_\tau}(z^i)\right)\right| + \left|t^{-1}\sum_{\tau=0}^{t-1}\left(\bar P^i_{\sigma_\tau}(z^i) - \bar P^i_\sigma(z^i)\right)\right|$. The first term on the RHS goes to 0 (the proof is essentially identical to the proof that $\kappa^i_{1t}$ goes to 0). The second term goes to 0 because $\lim_{t\to\infty}\sigma_t = \sigma$ and $\bar P^i_{\cdot}$ is continuous. Thus, for all $\zeta > 0$, there exists $\hat t_\zeta$ such that, for all $t \geq \hat t_\zeta$ and all $z^i \in Z^i$,
$$\left|freq^i_t(z^i) - \bar P^i_\sigma(z^i)\right| < \zeta. \quad (15)$$
Thus, since $\theta^i \in \hat\Theta^i$, $\lim_{t\to\infty}\kappa^i_{3t}(h,\theta^i) = \sum_{(s^i,x^i)\in S^i\times X^i} E_{Q_\sigma(\cdot|s^i,x^i)}\left[\ln Q^i_{\theta^i}(Y^i|s^i,x^i)\right]\sigma^i(x^i|s^i)\,p_{S^i}(s^i)$.
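The limit of $\kappa^i_{3t}$ is the population expected log-likelihood under the true data-generating process, so the parameter with the highest sample fit is asymptotically the one minimizing the Kullback-Leibler divergence. A minimal simulation sketch of this selection logic (a toy binary model with purely hypothetical numbers, not an object from the paper):

```python
import math
import random

random.seed(0)

# True outcome distribution over {0, 1}: P(1) = 0.7 (hypothetical).
p_true = 0.7

# Misspecified model: the agent only entertains these values of theta,
# none of which equals the truth.
thetas = [0.2, 0.4, 0.5]

T = 20000
draws = [1 if random.random() < p_true else 0 for _ in range(T)]

def avg_loglik(theta, data):
    # Sample analogue of E_P[ln Q_theta(Z)].
    return sum(math.log(theta if z == 1 else 1 - theta) for z in data) / len(data)

def kl(theta):
    # Kullback-Leibler divergence E_P[ln(P(Z)/Q_theta(Z))].
    return (p_true * math.log(p_true / theta)
            + (1 - p_true) * math.log((1 - p_true) / (1 - theta)))

best_sample = max(thetas, key=lambda th: avg_loglik(th, draws))
best_kl = min(thetas, key=kl)
```

With 20,000 draws the sample ranking already coincides with the KL ranking; the same logic underlies the limit of $\kappa^i_{3t}$ displayed above.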
This expression and (14) establish part (i).

(ii) For all $\theta^i \notin \Theta^i_{\sigma,\xi}$, let $z^i_{\theta^i}$ be such that $\bar P^i_\sigma(z^i_{\theta^i}) > 0$ and $Q^i_{\theta^i}(z^i_{\theta^i}) < \xi$. By (15), there exists $t_{p^i_L/2}$ such that, for all $t \geq t_{p^i_L/2}$,
$$\kappa^i_{3t}(h,\theta^i) \leq freq^i_t(z^i_{\theta^i})\ln Q^i_{\theta^i}(z^i_{\theta^i}) \leq (p^i_L/2)\ln\xi \quad \forall \theta^i \notin \Theta^i_{\sigma,\xi},$$
where $p^i_L = \min\{\bar P^i_\sigma(z^i) : \bar P^i_\sigma(z^i) > 0\}$. This result and (14) imply that, for all $t \geq T \equiv \max\{t_{p^i_L/2}, \hat t_1\}$,
$$K^i_t(h,\theta^i) \leq -\sum_{(s^i,x^i)\in S^i\times X^i} E_{Q_\sigma(\cdot|s^i,x^i)}\left[\ln Q^i_\sigma(Y^i|s^i,x^i)\right]\sigma^i(x^i|s^i)\,p_{S^i}(s^i) + 1 + (p^i_L/2)\ln\xi \leq |Z^i| + 1 + (p^i_L/2)\ln\xi \quad (16)$$
for all $\theta^i \notin \Theta^i_{\sigma,\xi}$, where the second inequality follows from the facts that $\sigma^i(x^i|s^i)\,p_{S^i}(s^i) \leq 1$ and $x\ln(x) \in [-1,0]$ for all $x \in [0,1]$. The facts that $K^i(\sigma) < \infty$ and $\alpha_\varepsilon \leq \bar\alpha < \infty$ for all $\varepsilon \leq \bar\varepsilon$ imply that the RHS of (16) can be made lower than $-(K^i(\sigma) + (3/4)\alpha_\varepsilon)$ for some sufficiently small $\xi^*$.

(iii) For any $\xi > 0$, let $\zeta_\xi = -\alpha_\varepsilon/(4|Z^i|\ln\xi) > 0$. By (15), there exists $\hat t_{\zeta_\xi}$ such that, for all $t \geq \hat t_{\zeta_\xi}$,
$$\kappa^i_{3t}(h,\theta^i) \leq \sum_{\{z^i:\bar P^i_\sigma(z^i)>0\}} freq^i_t(z^i)\ln Q^i_{\theta^i}(z^i) \leq \sum_{\{z^i:\bar P^i_\sigma(z^i)>0\}}\left(\bar P^i_\sigma(z^i) - \zeta_\xi\right)\ln Q^i_{\theta^i}(z^i)$$
$$\leq \sum_{(s^i,x^i)\in S^i\times X^i} E_{Q_\sigma(\cdot|s^i,x^i)}\left[\ln Q^i_{\theta^i}(Y^i|s^i,x^i)\right]\sigma^i(x^i|s^i)\,p_{S^i}(s^i) - |Z^i|\zeta_\xi\ln\xi$$
for all $\theta^i \in \Theta^i_{\sigma,\xi}$ (since $Q^i_{\theta^i}(z^i) \geq \xi$ for all $z^i$ such that $\bar P^i_\sigma(z^i) > 0$). This bound, the fact that $\alpha_\varepsilon/4 = -|Z^i|\zeta_\xi\ln\xi$, and (14) imply that, for all $t \geq \hat T_\xi \equiv \max\{\hat t_{\zeta_\xi}, \hat t_{\alpha_\varepsilon/4}\}$, $K^i_t(h,\theta^i) < -K^i(\sigma,\theta^i) + \alpha_\varepsilon/2$ for all $\theta^i \in \Theta^i_{\sigma,\xi}$. This result and (11) imply the desired result. $\square$

Online Appendix

A Example: Trading with adverse selection
In this section, we provide the formal details for the trading environment in Example 2.5. Let $p \in \Delta(A\times V)$ be the true distribution; we use subscripts, such as $p_A$ and $p_{V|A}$, to denote the corresponding marginal and conditional distributions. Let $Y = A\times(V\cup\{\square\})$ denote the space of observable consequences, where $\square$ will be a convenient way to represent the fact that there is no trade. We denote the random variable taking values in $V\cup\{\square\}$ by $\hat V$. Notice that the state space in this example is $\Omega = A\times V$. Partial feedback is represented by the function $f_P: X\times A\times V \to Y$ such that $f_P(x,a,v) = (a,v)$ if $a\leq x$ and $f_P(x,a,v) = (a,\square)$ if $a>x$. Full feedback is represented by $f_F(x,a,v) = (a,v)$. In all cases, payoffs are given by $\pi: X\times Y\to\mathbb{R}$, where $\pi(x,(a,v)) = v-x$ if $a\leq x$ and 0 otherwise. The objective distribution for the case of partial feedback, $Q_P$, is, for all $x\in X$ and $(a,v)\in A\times V$, $Q_P(a,v|x) = p(a,v)\,1_{\{x\geq a\}}(x)$, and, for all $x\in X$ and $a\in A$, $Q_P(a,\square|x) = p_A(a)\,1_{\{x<a\}}(x)$.

Behavioral equilibrium. Feedback is $f_P$ and the parameter set is $\Theta_I = \Delta(A)\times\Delta(V)$; the subjective model treats $A$ and $V$ as independent. For each $x\in X$, the wKLD function is
$$K_{BE}(x,\theta) = \sum_{\{a\in A: a>x\}} p_A(a)\ln\frac{p_A(a)}{\theta_A(a)} + \sum_{\{(a,v)\in A\times V: a\leq x\}} p(a,v)\ln\frac{p(a,v)}{\theta_A(a)\theta_V(v)}.$$
For each $x\in X$, $\theta(x) = (\theta_A(x),\theta_V(x)) \in \Theta_I = \Delta(A)\times\Delta(V)$, where $\theta_A(x) = p_A$ and $\theta_V(x)(v) = p_{V|A}(v\mid A\leq x)$ for all $v\in V$, is the unique parameter value that minimizes $K_{BE}(x,\cdot)$. Together with (18), we obtain equation $\Pi_{BE}$ in the main text.

Analogy-based expectations equilibrium. Feedback is $f_F$ and the parameter set is $\Theta_A$. The subjective model is, for all $x\in X$ and $(a,v)\in A\times V_j$, all $j = 1,\dots,k$, $Q^{ABEE}_\theta(a,v|x) = \theta_j(a)\theta_V(v)$, and, for all $x\in X$ and $a\in A$, $Q^{ABEE}_\theta(a,\square|x) = 0$, where $\theta = (\theta_1,\dots,\theta_k,\theta_V)\in\Theta_A$. This is an analogy-based game. From (17), perceived expected profit from $x\in X$ is
$$\sum_{j=1}^k Pr_{\theta_V}(V\in V_j)\left\{Pr_{\theta_j}(A\leq x)\left(E_{\theta_V}[V\mid V\in V_j] - x\right)\right\}. \quad (19)$$
(In all cases, the extension to mixed strategies is straightforward.) For each $x\in X$, the wKLD function is
$$K_{ABEE}(x,\theta) = E_{Q_F(\cdot|x)}\left[\ln\frac{Q_F(A,\hat V|x)}{Q^{ABEE}_\theta(A,\hat V|x)}\right] = \sum_{j=1}^k\sum_{(a,v)\in A\times V_j} p(a,v)\ln\frac{p(a,v)}{\theta_j(a)\theta_V(v)}.$$
For each $x\in X$, $\theta(x) = (\theta_1(x),\dots,\theta_k(x),\theta_V(x))\in\Theta_A = \times_j\Delta(A)\times\Delta(V)$, where $\theta_j(x)(a) = p_{A|V_j}(a\mid V\in V_j)$ for all $a\in A$ and $\theta_V(x) = p_V$, is the unique parameter value that minimizes $K_{ABEE}(x,\cdot)$. Together with (19), we obtain equation $\Pi_{ABEE}$ in the main text.
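To make the behavioral-equilibrium belief concrete, the following sketch computes the KL-minimizing independent belief under partial feedback for a hypothetical joint distribution $p$ (all numbers illustrative, not from the paper): the fitted $\theta_V$ at the equilibrium price $x^*$ is the value distribution conditional on trade, so perceived and objective profits agree at $x^*$ but diverge at deviations.

```python
# Hypothetical primitives: asks A, values V, and a positively correlated
# joint distribution p(a, v); high-value sellers tend to ask more.
A_SET = [1, 3]
V_SET = [0, 4]
p = {(1, 0): 0.4, (1, 4): 0.1, (3, 0): 0.1, (3, 4): 0.4}

def theta_V(x_star):
    # Best-fit marginal over V under partial feedback at price x_star:
    # the distribution of V conditional on trade (A <= x_star).
    mass = {v: 0.0 for v in V_SET}
    total = 0.0
    for (a, v), pr in p.items():
        if a <= x_star:
            mass[v] += pr
            total += pr
    return {v: m / total for v, m in mass.items()}

def perceived_profit(x, x_star):
    # The buyer treats A and V as independent, with theta_A = p_A and
    # theta_V fitted at the equilibrium price x_star.
    pr_trade = sum(pr for (a, _), pr in p.items() if a <= x)
    ev = sum(v * q for v, q in theta_V(x_star).items())
    return pr_trade * (ev - x)

def true_profit(x):
    # Objective expected profit accounts for adverse selection.
    return sum((v - x) * pr for (a, v), pr in p.items() if a <= x)
```

At the equilibrium price the fitted belief matches the observed data exactly, so the misspecification shows up only in counterfactual deviation payoffs, which is the force behind naive play under adverse selection.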
Behavioral equilibrium (naive version) with analogy classes. It is natural to also consider a case, unexplored in the literature, where feedback $f_P$ is partial and the subjective model is parameterized by $\Theta_A$. Suppose that the buyer's behavior has stabilized to some price $x^*$. Due to the possible correlation across analogy classes, the buyer might now believe that deviating to a different price $x\neq x^*$ affects her valuation. In particular, the buyer might have multiple beliefs at $x^*$. To obtain a natural equilibrium refinement, we assume that the buyer also observes the analogy class that contains her realized valuation, whether she trades or not, and that $\Pr(V\in V_j, A\leq x) > 0$ for all $j = 1,\dots,k$ and $x\in X$. We denote this new feedback assumption by a function $f_{P^*}: X\times A\times V\to Y^*$, where $Y^* = (A\times V)\cup(A\times\{1,\dots,k\})$ and $f_{P^*}(x,a,v) = (a,v)$ if $a\leq x$ and $f_{P^*}(x,a,v) = (a,j)$ if $a>x$ and $v\in V_j$. The objective distribution given this feedback function is, for all $x\in X$ and $(a,v)\in A\times V$, $Q_{P^*}(a,v|x) = p(a,v)\,1_{\{x\geq a\}}(x)$, and, for all $x\in X$, $a\in A$, and $j = 1,\dots,k$, $Q_{P^*}(a,j|x) = p_{A|V_j}(a\mid V\in V_j)\,p_V(V_j)\,1_{\{x<a\}}(x)$. For each $x\in X$, the wKLD function is
$$K_{BEA}(x,\theta) = \sum_{j=1}^k\sum_{\{(a,v)\in A\times V_j: a\leq x\}} p(a,v)\ln\frac{p(a,v)}{\theta_j(a)\theta_V(v)} + \sum_{j=1}^k\sum_{\{a\in A: a>x\}} p_{A|V_j}(a\mid V\in V_j)\,p_V(V_j)\ln\frac{p_{A|V_j}(a\mid V\in V_j)\,p_V(V_j)}{\theta_j(a)\sum_{v\in V_j}\theta_V(v)}.$$
For each $x\in X$, $\theta(x) = (\theta_1(x),\dots,\theta_k(x),\theta_V(x))\in\Theta_A = \times_j\Delta(A)\times\Delta(V)$, where $\theta_j(x)(a) = p_{A|V_j}(a\mid V\in V_j)$ for all $a\in A$ and $\theta_V(x)(v) = p_{V|A}(v\mid V\in V_j, A\leq x)\,p_V(V_j)$ for all $v\in V_j$, all $j = 1,\dots,k$, is the unique parameter value that minimizes $K_{BEA}(x,\cdot)$. Together with (19), we obtain
$$\Pi_{BEA}(x,x^*) = \sum_{j=1}^k \Pr(V\in V_j)\Pr(A\leq x\mid V\in V_j)\left(E\left[V\mid V\in V_j, A\leq x^*\right] - x\right).$$

B Proof of converse result: Theorem 3
Let $(\bar\mu^i)_{i\in I}$ be a belief profile that supports $\sigma$ as an equilibrium. Consider the following policy profile $\phi = (\phi^i_t)_{i,t}$: For all $i\in I$ and all $t$,
$$(\mu^i,s^i,\xi^i)\mapsto\phi^i_t(\mu^i,s^i,\xi^i)\equiv\begin{cases}\varphi^i(\bar\mu^i,s^i,\xi^i) & \text{if } \max_{i\in I}\|\bar Q^i_{\mu^i}-\bar Q^i_{\bar\mu^i}\|\leq C^{-1}\varepsilon_t\\ \varphi^i(\mu^i,s^i,\xi^i) & \text{otherwise},\end{cases}$$
where $\varphi^i$ is an arbitrary selection from $\Psi^i$, $C\equiv\max_{i\in I}\{2|Y^i|\times\sup_{X^i\times Y^i}|\pi^i(x^i,y^i)|\} < \infty$, and the sequence $(\varepsilon_t)_t$ will be defined below. For all $i\in I$, fix any prior $\mu^i_0$ with full support on $\Theta^i$ such that $\mu^i_0(\cdot\mid\Theta^i(\sigma)) = \bar\mu^i$ (where, for any Borel $A\subset\Theta$, $\mu(\cdot\mid A)$ is the conditional probability given $A$).

We now show that if $\varepsilon_t\geq 0$ for all $t$ and $\lim_{t\to\infty}\varepsilon_t = 0$, then $\phi$ is asymptotically optimal. Throughout this argument, we fix an arbitrary $i\in I$. Abusing notation, let $U^i(\mu^i,s^i,\xi^i,x^i) = E_{\bar Q^i_{\mu^i}(\cdot|s^i,x^i)}[\pi^i(x^i,Y^i)] + \xi^i(x^i)$. It suffices to show that
$$U^i(\mu^i,s^i,\xi^i,\phi^i_t(\mu^i,s^i,\xi^i))\geq U^i(\mu^i,s^i,\xi^i,x^i) - \varepsilon_t \quad (20)$$
for all $(i,t)$, all $(\mu^i,s^i,\xi^i)$, and all $x^i$. By construction of $\phi$, equation (20) is satisfied if $\max_{i\in I}\|\bar Q^i_{\mu^i}-\bar Q^i_{\bar\mu^i}\| > C^{-1}\varepsilon_t$. If, instead, $\max_{i\in I}\|\bar Q^i_{\mu^i}-\bar Q^i_{\bar\mu^i}\|\leq C^{-1}\varepsilon_t$, then
$$U^i(\bar\mu^i,s^i,\xi^i,\phi^i_t(\mu^i,s^i,\xi^i)) = U^i(\bar\mu^i,s^i,\xi^i,\varphi^i(\bar\mu^i,s^i,\xi^i))\geq U^i(\bar\mu^i,s^i,\xi^i,x^i) \quad (21)$$
for all $x^i\in X^i$. Moreover, for all $x^i$,
$$\left|U^i(\bar\mu^i,s^i,\xi^i,x^i) - U^i(\mu^i,s^i,\xi^i,x^i)\right| = \left|\sum_{y^i\in Y^i}\pi^i(x^i,y^i)\left(\bar Q^i_{\bar\mu^i}(y^i|s^i,x^i)-\bar Q^i_{\mu^i}(y^i|s^i,x^i)\right)\right|$$
$$\leq\sup_{X^i\times Y^i}|\pi^i(x^i,y^i)|\sum_{y^i\in Y^i}\left|\bar Q^i_{\bar\mu^i}(y^i|s^i,x^i)-\bar Q^i_{\mu^i}(y^i|s^i,x^i)\right|\leq\sup_{X^i\times Y^i}|\pi^i(x^i,y^i)|\times|Y^i|\times\max_{y^i,x^i,s^i}\left|\bar Q^i_{\bar\mu^i}(y^i|s^i,x^i)-\bar Q^i_{\mu^i}(y^i|s^i,x^i)\right|,$$
so, by our choice of $C$, $|U^i(\bar\mu^i,s^i,\xi^i,x^i) - U^i(\mu^i,s^i,\xi^i,x^i)|\leq(1/2)\varepsilon_t$ for all $x^i$. Therefore, equation (21) implies equation (20); thus $\phi$ is asymptotically optimal if $\varepsilon_t\geq 0$ for all $t$ and $\lim_{t\to\infty}\varepsilon_t = 0$.

We now construct a sequence $(\varepsilon_t)_t$ such that $\varepsilon_t\geq 0$ for all $t$ and $\lim_{t\to\infty}\varepsilon_t = 0$. Let $\bar\phi^i = (\bar\phi^i_t)_t$ be such that $\bar\phi^i_t(\mu^i,\cdot,\cdot) = \varphi^i(\bar\mu^i,\cdot,\cdot)$ for all $\mu^i$; i.e., $\bar\phi^i$ is a stationary policy that maximizes utility under the assumption that the belief is always $\bar\mu^i$. Let $\zeta^i(\mu^i)\equiv C\|\bar Q^i_{\mu^i}-\bar Q^i_{\bar\mu^i}\|$ and suppose (the proof is at the end) that
$$P^{\mu,\bar\phi}\left(\lim_{t\to\infty}\max_{i\in I}|\zeta^i(\mu^i_t(h))| = 0\right) = 1 \quad (22)$$
(recall that $P^{\mu,\bar\phi}$ is the probability measure over $H$ induced by the policy profile $\bar\phi$; by definition of $\bar\phi$, $P^{\mu,\bar\phi}$ does not depend on $\mu$). Then, by the 2nd Borel-Cantelli lemma (Billingsley (1995), pages 59-60), for any $\gamma > 0$, $\sum_t P^{\mu,\bar\phi}(\max_{i\in I}|\zeta^i(\mu^i_t(h))|\geq\gamma) < \infty$. Hence, for any a >
0, there exists a sequence $(\tau(j))_j$ such that
$$\sum_{t\geq\tau(j)} P^{\mu,\bar\phi}\left(\max_{i\in I}|\zeta^i(\mu^i_t(h))|\geq 1/j\right) < a^{-j} \quad (23)$$
and $\lim_{j\to\infty}\tau(j) = \infty$. For all $t\leq\tau(1)$, we set $\varepsilon_t = 3C$, and, for any $t > \tau(1)$, we set $\varepsilon_t\equiv 1/N(t)$, where $N(t)\equiv\sum_{j=1}^\infty 1\{\tau(j)\leq t\}$. Observe that, since $\lim_{j\to\infty}\tau(j) = \infty$, $N(t)\to\infty$ as $t\to\infty$ and thus $\varepsilon_t\to 0$.

It remains to show that $P^{\mu,\phi}(\lim_{t\to\infty}\|\sigma_t(h^\infty)-\sigma\| = 0) = 1$, where $(\sigma_t)_t$ is the sequence of intended strategies given $\phi$, i.e., $\sigma^i_t(h)(x^i|s^i) = P^\xi(\xi^i:\phi^i_t(\mu^i_t(h),s^i,\xi^i) = x^i)$. Observe that, by definition, $\sigma^i(x^i|s^i) = P^\xi\left(\xi^i: x^i\in\arg\max_{\hat x^i\in X^i} E_{\bar Q^i_{\bar\mu^i}(\cdot|s^i,\hat x^i)}[\pi^i(\hat x^i,Y^i)] + \xi^i(\hat x^i)\right)$. Since $\varphi^i\in\Psi^i$, it follows that we can write $\sigma^i(x^i|s^i) = P^\xi(\xi^i:\varphi^i(\bar\mu^i,s^i,\xi^i) = x^i)$. Let $H_1\equiv\{h:\|\sigma_t(h)-\sigma\| = 0\text{ for all }t\}$. It is sufficient to show that $P^{\mu,\phi}(H_1) = 1$. To show this, observe that
$$P^{\mu,\phi}(H_1)\geq P^{\mu,\phi}\left(\cap_t\{\max_i\zeta^i(\mu_t)\leq\varepsilon_t\}\right) = \prod_{t=\tau(1)+1}^\infty P^{\mu,\phi}\left(\max_i\zeta^i(\mu_t)\leq\varepsilon_t\,\Big|\,\cap_{l<t}\{\max_i\zeta^i(\mu_l)\leq\varepsilon_l\}\right) > 0;$$
hence, $P^{\mu,\phi}(H_1) = 1$.

We conclude the proof by showing that equation (22) indeed holds. Observe that $\sigma$ is trivially stable under $\bar\phi$. By Lemma 2, for all $i\in I$ and all open sets $U^i\supseteq\Theta^i(\sigma)$,
$$\lim_{t\to\infty}\mu^i_t\left(U^i\right) = 1 \quad (24)$$
a.s.-$P^{\mu,\bar\phi}$ (over $H$). Let $H_2$ denote the set of histories such that $x^i_t(h) = x^i$ and $s^i_t(h) = s^i$ implies that $\sigma^i(x^i|s^i) > 0$. By definition of $\bar\phi$, $P^{\mu,\bar\phi}(H_2) = 1$. Thus, it suffices to show that $\lim_{t\to\infty}\max_{i\in I}|\zeta^i(\mu^i_t(h))| = 0$ a.s.-$P^{\mu,\bar\phi}$ over $H_2$. To do this, take any $A\subseteq\Theta$ that is closed. By equation (24), for all $i\in I$ and almost all $h\in H_2$,
$$\limsup_{t\to\infty}\int 1_A(\theta)\,\mu^i_{t+1}(d\theta) = \limsup_{t\to\infty}\int 1_{A\cap\Theta^i(\sigma)}(\theta)\,\mu^i_{t+1}(d\theta).$$
Moreover,
$$\int 1_{A\cap\Theta^i(\sigma)}(\theta)\,\mu^i_{t+1}(d\theta)\leq\int 1_{A\cap\Theta^i(\sigma)}(\theta)\left\{\frac{\prod_{\tau=1}^t Q^i_\theta(y^i_\tau|s^i_\tau,x^i_\tau)\,\mu^i_0(d\theta)}{\int_{\Theta^i(\sigma)}\prod_{\tau=1}^t Q^i_\theta(y^i_\tau|s^i_\tau,x^i_\tau)\,\mu^i_0(d\theta)}\right\} = \mu^i_0(A\mid\Theta^i(\sigma)) = \bar\mu^i(A),$$
where the first inequality follows from the fact that $\Theta^i(\sigma)\subseteq\Theta^i$; the first equality follows from the fact that, since $h\in H_2$, the game being weakly identified given $\sigma$ implies that $\prod_{\tau=1}^t Q^i_\theta(y^i_\tau|s^i_\tau,x^i_\tau)$ is constant with respect to $\theta$ for all $\theta\in\Theta^i(\sigma)$; and the last equality follows from our choice of $\mu^i_0$. Therefore, we established that, a.s.-$P^{\mu,\bar\phi}$ over $H_2$, $\limsup_{t\to\infty}\mu^i_{t+1}(h)(A)\leq\bar\mu^i(A)$ for $A$ closed. By the portmanteau lemma, this implies that, a.s.-$P^{\mu,\bar\phi}$ over $H_2$, $\lim_{t\to\infty}\int_\Theta f(\theta)\,\mu^i_{t+1}(h)(d\theta) = \int_\Theta f(\theta)\,\bar\mu^i(d\theta)$ for any $f$ real-valued, bounded, and continuous. Since, by assumption, $\theta\mapsto Q^i_\theta(y^i|s^i,x^i)$ is bounded and continuous, the previous result applies to $Q^i_\theta(y^i|s^i,x^i)$, and since $y,s,x$ take a finite number of values, this result implies that $\lim_{t\to\infty}\|\bar Q^i_{\mu^i_t(h)}-\bar Q^i_{\bar\mu^i}\| = 0$ for all $i\in I$ a.s.-$P^{\mu,\bar\phi}$ over $H_2$. $\square$

C Non-myopic players
In the main text, we proved the results for the case where players are myopic. Here, we assume that players maximize discounted expected payoffs, where $\delta^i\in[0,1)$ is the discount factor of player $i$. In particular, players can be forward looking and decide to experiment. Players believe, however, that they face a stationary environment and, therefore, have no incentives to influence the future behavior of other players. We assume for simplicity that players know the distribution of their own payoff perturbations.

Because players believe that they face a stationary environment, they solve a (subjective) dynamic optimization problem that can be cast recursively as follows. By the Principle of Optimality, $V^i(\mu^i,s^i)$ denotes the maximum expected discounted payoff (i.e., the value function) of player $i$ who starts a period by observing signal $s^i$ and holding belief $\mu^i$ if and only if
$$V^i(\mu^i,s^i) = \int_{\Xi^i}\left\{\max_{x^i\in X^i} E_{\bar Q^i_{\mu^i}(\cdot|s^i,x^i)}\left[\pi^i(x^i,Y^i) + \xi^i(x^i) + \delta E_{p_{S^i}}\left[V^i(\hat\mu^i,S^i)\right]\right]\right\}P^\xi(d\xi^i), \quad (25)$$
where $\hat\mu^i = B^i(\mu^i,s^i,x^i,Y^i)$ is the updated belief. For all $(\mu^i,s^i,\xi^i)$, let
$$\Phi^i(\mu^i,s^i,\xi^i) = \arg\max_{x^i\in X^i} E_{\bar Q^i_{\mu^i}(\cdot|s^i,x^i)}\left[\pi^i(x^i,Y^i) + \xi^i(x^i) + \delta E_{p_{S^i}}\left[V^i(\hat\mu^i,S^i)\right]\right].$$
The proof of the next lemma relies on standard arguments and is, therefore, omitted.

Lemma 3.
There exists a unique solution $V^i$ to the Bellman equation (25); this solution is bounded in $\Delta(\Theta^i)\times S^i$ and continuous as a function of $\mu^i$. Moreover, $\Phi^i$ is single-valued and continuous with respect to $\mu^i$, a.s.-$P^\xi$.

Because players believe they face a stationary environment with i.i.d. perturbations, it is without loss of generality to restrict behavior to depend on the state of the recursive problem. Optimality of a policy is defined as usual (with the requirement that $\phi^i_t\in\Phi^i$ for all $t$).

Lemma 2 implies that the support of posteriors converges, but posteriors need not converge. We can always find, however, a subsequence of posteriors that converges. By continuity of dynamic behavior in beliefs, the stable strategy profile is dynamically optimal (in the sense of solving the dynamic optimization problem) given this convergent posterior. For weakly identified games, the convergent posterior is a fixed point of the Bayesian operator. Thus, the players' limiting strategies will provide no new information. Since the value of experimentation is nonnegative, it follows that the stable strategy profile must also be myopically optimal (in the sense of solving the optimization problem that ignores the future), which is the definition of optimality used in the definition of Berk-Nash equilibrium. Thus, we obtain the following characterization of the set of stable strategy profiles when players follow optimal policies.

Theorem 4.
Suppose that a strategy profile $\sigma$ is stable under an optimal policy profile for a perturbed and weakly identified game. Then $\sigma$ is a Berk-Nash equilibrium of the game.

(Doraszelski and Escobar (2010) study a similarly perturbed version of the Bellman equation.)

Proof. The first part of the proof is identical to the proof of Theorem 2. Here, we prove that, given that $\lim_{j\to\infty}\sigma_{t(j)} = \sigma$ and $\lim_{j\to\infty}\mu^i_{t(j)} = \mu^i_\infty\in\Delta(\Theta^i(\sigma))$ for all $i$, then, for all $i$, $\sigma^i$ is optimal for the perturbed game given $\mu^i_\infty\in\Delta(\Theta^i)$; i.e., for all $(s^i,x^i)$,
$$\sigma^i(x^i|s^i) = P^\xi\left(\xi^i:\psi^i(\mu^i_\infty,s^i,\xi^i) = \{x^i\}\right), \quad (26)$$
where $\psi^i(\mu^i_\infty,s^i,\xi^i)\equiv\arg\max_{x^i\in X^i} E_{\bar Q^i_{\mu^i_\infty}(\cdot|s^i,x^i)}[\pi^i(x^i,Y^i)] + \xi^i(x^i)$.

To establish (26), fix $i\in I$ and $s^i\in S^i$. Then
$$\lim_{j\to\infty}\sigma^i_{t(j)}(h)(x^i|s^i) = \lim_{j\to\infty} P^\xi\left(\xi^i:\phi^i_{t(j)}(\mu^i_{t(j)},s^i,\xi^i) = x^i\right) = P^\xi\left(\xi^i:\Phi^i(\mu^i_\infty,s^i,\xi^i) = \{x^i\}\right),$$
where the second equality follows by optimality of $\phi^i$ and Lemma 3. This implies that $\sigma^i(x^i|s^i) = P^\xi(\xi^i:\Phi^i(\mu^i_\infty,s^i,\xi^i) = \{x^i\})$. Thus, it remains to show that
$$P^\xi\left(\xi^i:\Phi^i(\mu^i_\infty,s^i,\xi^i) = \{x^i\}\right) = P^\xi\left(\xi^i:\psi^i(\mu^i_\infty,s^i,\xi^i) = \{x^i\}\right) \quad (27)$$
for all $x^i$ such that $P^\xi(\xi^i:\Phi^i(\mu^i_\infty,s^i,\xi^i) = \{x^i\}) > 0$. From now on, fix any such $x^i$. Since $\sigma^i(x^i|s^i) > 0$, the assumption that the game is weakly identified implies that $Q^i_{\theta^i_1}(\cdot|x^i,s^i) = Q^i_{\theta^i_2}(\cdot|x^i,s^i)$ for all $\theta^i_1,\theta^i_2\in\Theta(\sigma)$. The fact that $\mu^i_\infty\in\Delta(\Theta^i(\sigma))$ then implies that
$$B^i(\mu^i_\infty,s^i,x^i,y^i) = \mu^i_\infty \quad (28)$$
for all $y^i\in Y^i$. Thus, $\Phi^i(\mu^i_\infty,s^i,\xi^i) = \{x^i\}$ is equivalent to
$$E_{\bar Q^i_{\mu^i_\infty}(\cdot|s^i,x^i)}\left[\pi^i(x^i,Y^i) + \xi^i(x^i) + \delta E_{p_{S^i}}\left[V^i(\mu^i_\infty,S^i)\right]\right]$$
$$> E_{\bar Q^i_{\mu^i_\infty}(\cdot|s^i,\tilde x^i)}\left[\pi^i(\tilde x^i,Y^i) + \xi^i(\tilde x^i) + \delta E_{p_{S^i}}\left[V^i(B^i(\mu^i_\infty,s^i,\tilde x^i,Y^i),S^i)\right]\right]$$
$$\geq E_{\bar Q^i_{\mu^i_\infty}(\cdot|s^i,\tilde x^i)}\left[\pi^i(\tilde x^i,Y^i) + \xi^i(\tilde x^i)\right] + \delta E_{p_{S^i}}\left[V^i\left(E_{\bar Q^i_{\mu^i_\infty}(\cdot|s^i,\tilde x^i)}\left[B^i(\mu^i_\infty,s^i,\tilde x^i,Y^i)\right],S^i\right)\right]$$
$$= E_{\bar Q^i_{\mu^i_\infty}(\cdot|s^i,\tilde x^i)}\left[\pi^i(\tilde x^i,Y^i) + \xi^i(\tilde x^i)\right] + \delta E_{p_{S^i}}\left[V^i(\mu^i_\infty,S^i)\right]$$
for all $\tilde x^i\in X^i$, where the first line follows by equation (28) and the definition of $\Phi^i$, the second inequality follows by the convexity of $V^i$ as a function of $\mu^i$ and Jensen's inequality, and the last line by the fact that Bayesian beliefs have the martingale property. (See, for example, Nyarko (1994) for a proof of convexity of the value function.) In turn, this is equivalent to $\psi^i(\mu^i_\infty,s^i,\xi^i) = \{x^i\}$. $\square$

D Population models
We discuss some variants of population models that differ in the matching technology and feedback. The right variant of population model will depend on the specific application.

Single pair model. Each period, a single group of players, one from each of the $I$ populations, is randomly selected to play the game. At the end of the period, the signals, actions, and outcomes of each population are revealed to everyone. Steady-state behavior in this case corresponds exactly to the notion of Berk-Nash equilibrium described in the paper.
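In contrast to the single-pair model, agents who learn only from their own experiences can hold different beliefs and therefore play different strategies in steady state. A toy simulation of this heterogeneity (all numbers hypothetical, not from the paper): each agent privately samples outcomes of a risky action, forms an empirical belief, and best-responds to it; the aggregate strategy is then a strict convex combination of the pure best responses.

```python
import random

random.seed(7)

# Toy setup: each of N agents privately samples DRAWS outcomes of a risky
# action with true success probability TRUE_P, then best-responds to her own
# empirical belief (risky iff her estimate beats the safe payoff SAFE).
N, DRAWS = 200, 11
TRUE_P, SAFE = 0.5, 0.5

individual_strategies = []
for _ in range(N):
    wins = sum(random.random() < TRUE_P for _ in range(DRAWS))
    belief = wins / DRAWS
    individual_strategies.append(1 if belief > SAFE else 0)  # 1 = risky, 0 = safe

# Aggregate population strategy: the share playing risky.
aggregate = sum(individual_strategies) / N
```

Each agent plays a pure strategy that is optimal given her own belief, yet the aggregate is strictly mixed: a convex combination of best responses, in the spirit of the heterogeneous equilibrium notion defined below.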
Random matching model. Each period, all players are randomly matched and observe only feedback from their own match. We now modify the definition of Berk-Nash equilibrium to account for this random-matching setting. The idea is similar to Fudenberg and Levine's (1993) definition of a heterogeneous self-confirming equilibrium. Now each agent in population $i$ can have different experiences and, hence, hold different beliefs and play different strategies in steady state. For all $i\in I$, define
$$BR^i(\sigma^{-i}) = \left\{\sigma^i:\sigma^i\text{ is optimal given some }\mu^i\in\Delta\left(\Theta^i(\sigma^i,\sigma^{-i})\right)\right\}.$$
Note that $\sigma$ is a Berk-Nash equilibrium if and only if $\sigma^i\in BR^i(\sigma^{-i})$ for all $i\in I$.

Definition 9. A strategy profile $\sigma$ is a heterogeneous Berk-Nash equilibrium of game $G$ if, for all $i\in I$, $\sigma^i$ is in the convex hull of $BR^i(\sigma^{-i})$.

Intuitively, a heterogeneous equilibrium strategy $\sigma^i$ is the result of convex combinations of strategies that belong to $BR^i(\sigma^{-i})$; the idea is that each of these strategies is followed by a segment of the population $i$. (In some cases, it may be unrealistic to assume that players are able to observe the private signals of previous generations, so some of these models might be better suited to cases with public, but not private, information. Alternatively, we can think of different incarnations of players born every period who are able to observe the history of previous generations.)

Random-matching model with population feedback. Each period, all players are randomly matched; at the end of the period, each player in population $i$ observes the signals, actions, and outcomes of their own population. Define
$$\bar{BR}^i(\sigma^i,\sigma^{-i}) = \left\{\hat\sigma^i:\hat\sigma^i\text{ is optimal given some }\mu^i\in\Delta\left(\Theta^i(\sigma^i,\sigma^{-i})\right)\right\}.$$

Definition 10.
A strategy profile $\sigma$ is a heterogeneous Berk-Nash equilibrium with population feedback of game $G$ if, for all $i\in I$, $\sigma^i$ is in the convex hull of $\bar{BR}^i(\sigma^i,\sigma^{-i})$.

The main difference when players receive population feedback is that their beliefs no longer depend on their own strategies but rather on the aggregate population strategies.

D.1 Equilibrium foundation
Using arguments similar to the ones in the text, it is now straightforward to conclude that the definition of heterogeneous Berk-Nash equilibrium captures the steady state of a learning environment with a population of agents in the role of each player. To see the idea, let each population $i$ be composed of a continuum of agents indexed by the unit interval $K\equiv[0,1]$. The strategy of agent $ik$ (meaning agent $k\in K$ from population $i$) is denoted by $\sigma^{ik}$. The aggregate strategy of population (i.e., player) $i$ is $\sigma^i = \int_K\sigma^{ik}dk$.

Random matching model. Suppose that each agent is optimizing and that, for all $i$, $(\sigma^{ik}_t)_t$ converges to $\sigma^{ik}$ a.s. in $K$, so that individual behavior stabilizes. (We need individual behavior to stabilize; it is not enough that it stabilizes in the aggregate. This is natural, for example, if we believe that agents whose behavior is unstable will eventually realize they have a misspecified model.) Then Lemma 2 says that the support of beliefs must eventually be $\Theta^i(\sigma^{ik},\sigma^{-i})$ for agent $ik$. Next, for each $ik$, take a convergent subsequence of beliefs $\mu^{ik}_t$ and denote its limit $\mu^{ik}_\infty$. It follows that $\mu^{ik}_\infty\in\Delta(\Theta^i(\sigma^{ik},\sigma^{-i}))$ and, by continuity of behavior in beliefs, $\sigma^{ik}$ is optimal given $\mu^{ik}_\infty$. In particular, $\sigma^{ik}\in BR^i(\sigma^{-i})$ for all $ik$ and, since $\sigma^i = \int_K\sigma^{ik}dk$, it follows that $\sigma^i$ is in the convex hull of $BR^i(\sigma^{-i})$. (Unlike the case of heterogeneous self-confirming equilibrium, a definition where each action in the support of $\sigma$ is supported by a (possibly different) belief would not be appropriate here: $BR^i(\sigma^{-i})$ might contain only mixed, but not pure, strategies (e.g., Example 1).)

Random-matching model with population feedback. Suppose that each agent is optimizing and that, for all $i$, $\sigma^i_t = \int_K\sigma^{ik}_t dk$ converges to $\sigma^i$. Then Lemma 2 says that the support of beliefs must eventually be $\Theta^i(\sigma^i,\sigma^{-i})$ for any agent in population $i$. Next, for each $ik$, take a convergent subsequence of beliefs $\mu^{ik}_t$ and denote its limit $\mu^{ik}_\infty$.
It follows that $\mu^{ik}_\infty\in\Delta(\Theta^i(\sigma^i,\sigma^{-i}))$ and, by continuity of behavior in beliefs, $\sigma^{ik}$ is optimal given $\mu^{ik}_\infty$. In particular, $\sigma^{ik}\in\bar{BR}^i(\sigma^i,\sigma^{-i})$ for all $i,k$ and, since $\sigma^i = \int_K\sigma^{ik}dk$, it follows that $\sigma^i$ is in the convex hull of $\bar{BR}^i(\sigma^i,\sigma^{-i})$.

E Lack of payoff feedback
In the paper, players are assumed to observe their own payoffs. We now provide two alternatives to relax this assumption. In the first alternative, players observe no feedback about payoffs; in the second alternative, players may observe partial feedback.
No payoff feedback. In the paper we had a single, deterministic payoff function $\pi^i: X^i\times Y^i\to\mathbb{R}$, which can be represented in vector form as an element $\pi^i\in\mathbb{R}^{|X^i\times Y^i|}$. We now generalize it to allow for uncertain payoffs. Player $i$ is endowed with a probability distribution $P_{\pi^i}\in\Delta(\mathbb{R}^{|X^i\times Y^i|})$ over the possible payoff functions. In particular, the random variable $\pi^i$ is independent of $Y^i$, and so there is nothing new to learn about payoffs from observing consequences. With random payoff functions, the results extend provided that optimality is defined as follows: A strategy $\sigma^i$ for player $i$ is optimal given $\mu^i\in\Delta(\Theta^i)$ if $\sigma^i(x^i|s^i) > 0$ implies
$$x^i\in\arg\max_{\bar x^i\in X^i} E_{P_{\pi^i}} E_{\bar Q^i_{\mu^i}(\cdot|s^i,\bar x^i)}\left[\pi^i(\bar x^i,Y^i)\right].$$
Note that, by interchanging the order of integration, this notion of optimality is equivalent to the notion in the paper where the deterministic payoff function is given by $E_{P_{\pi^i}}\pi^i(\cdot,\cdot)$.

Partial payoff feedback. Suppose that player $i$ knows her own consequence function $f^i: X\times\Omega\to Y^i$ and that her payoff function is now given by $\pi^i: X\times\Omega\to\mathbb{R}$. In particular, player $i$ may not observe her own payoff, but observing a consequence may provide partial information about $(x^{-i},\omega)$ and, therefore, about payoffs. Unlike the case in the text where payoffs are observed, a belief $\mu^i\in\Delta(\Theta^i)$ may not uniquely determine expected payoffs. The reason is that the distribution over consequences implied by $\mu^i$ may be consistent with several distributions over $X^{-i}\times\Omega$; i.e., the distribution over $X^{-i}\times\Omega$ is only partially identified. Define the set $M_{\mu^i}\subseteq\Delta(X^{-i}\times\Omega)^{S^i\times X^i}$ to be the set of conditional distributions over $X^{-i}\times\Omega$ given $(s^i,x^i)\in S^i\times X^i$ that are consistent with belief $\mu^i\in\Delta(\Theta^i)$; i.e., $m\in M_{\mu^i}$ if and only if $\bar Q^i_{\mu^i}(y^i|s^i,x^i) = m(f^i(x^i,X^{-i},W) = y^i\mid s^i,x^i)$ for all $(s^i,x^i)\in S^i\times X^i$ and $y^i\in Y^i$.
Then optimality should be defined as follows: A strategy $\sigma^i$ for player $i$ is optimal given $\mu^i\in\Delta(\Theta^i)$ if there exists $m_{\mu^i}\in M_{\mu^i}$ such that $\sigma^i(x^i|s^i) > 0$ implies
$$x^i\in\arg\max_{\bar x^i\in X^i} E_{m_{\mu^i}(\cdot|s^i,\bar x^i)}\left[\pi^i(\bar x^i,X^{-i},W)\right].$$
Finally, the definition of identification would also need to be changed to require not only that there is a unique distribution over consequences that matches the observed data, but also that this unique distribution implies a unique expected utility function.
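A two-state sketch (purely illustrative, not an object from the paper) of why the set $M_{\mu^i}$ matters: when feedback collapses the states into a single consequence, every distribution over $\Omega$ is consistent with the observed consequences, so expected payoffs, and hence the optimal action, are only set-identified.

```python
# Hypothetical two-state, two-action illustration of partial identification.
OMEGA = ["w1", "w2"]

def f(x, w):
    # Feedback never reveals the state: both states map to the same consequence.
    return "y0"

def pi(x, w):
    # Payoffs DO depend on the state, even though feedback does not.
    if x == "risky":
        return 1.0 if w == "w1" else 0.0
    return 0.5  # safe action

# Any m in Delta(OMEGA) reproduces the (degenerate) consequence distribution,
# so the identified set M_mu here is the whole simplex.
def expected_payoff(x, m_w1):
    return m_w1 * pi(x, "w1") + (1 - m_w1) * pi(x, "w2")

# Bounds on the risky action's expected payoff over the identified set.
lo = min(expected_payoff("risky", m) for m in (0.0, 1.0))
hi = max(expected_payoff("risky", m) for m in (0.0, 1.0))

# Each action is optimal for SOME selection from the identified set.
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
risky_optimal_somewhere = any(
    expected_payoff("risky", m) >= expected_payoff("safe", m) for m in grid)
safe_optimal_somewhere = any(
    expected_payoff("safe", m) >= expected_payoff("risky", m) for m in grid)
```

Because both actions are justified by some consistent $m$, the definition above requires only that there *exists* a selection $m_{\mu^i}$ rationalizing the strategy.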
F Global stability: Example 2.1 (monopoly with unknown demand)

Theorem 3 says that all Berk-Nash equilibria can be approached with probability 1 provided we allow for vanishing optimization mistakes. In this appendix, we illustrate how to use the techniques of stochastic approximation theory to establish stability of equilibria under the assumption that players make no optimization mistakes. We present the explicit learning dynamics for the monopolist with unknown demand, Example 2.1, and show that the unique equilibrium in this example is globally stable. The intuition behind global stability is that switching from the equilibrium strategy to a strategy that puts more weight on a price of 2 changes beliefs in a way that makes the monopolist want to put less weight on a price of 2, and similarly for a deviation to a price of 10.

We first construct a perturbed version of the game. Then we show that the learning problem is characterized by a nonlinear stochastic system of difference equations and employ stochastic approximation methods for studying the asymptotic behavior of such a system. Finally, we take the payoff perturbations to zero.

In order to simplify the exposition and thus better illustrate the mechanism driving the dynamics, we modify the subjective model slightly. We assume the monopolist only learns about the parameter $b\in\mathbb{R}$; i.e., her beliefs about parameter $a$ are degenerate at the point $a_0 = 40\neq a$ and thus are never updated. Therefore, beliefs $\mu$ are probability distributions over $\mathbb{R}$, i.e., $\mu\in\Delta(\mathbb{R})$.

Perturbed Game. Let $\xi$ be a real-valued random variable distributed according to $P^\xi$; we use $F$ to denote the associated cdf and $f$ the pdf. The perturbed payoffs are given by $yx - \xi 1\{x = 10\}$. Thus, given beliefs $\mu\in\Delta(\mathbb{R})$, the probability of optimally playing $x = 10$ is
$$\sigma(\mu) = F(8a_0 - 96E_\mu[B]).$$
Note that the only aspect of $\mu$ that matters for the decision of the monopolist is $E_\mu[B]$. Thus, letting $m = E_\mu[B]$ and slightly abusing notation, we use $\sigma(\mu) = \sigma(m)$ as the optimal strategy.

Bayesian Updating.
We now derive the Bayesian updating procedure. We assume that the prior $\mu_0$ is given by a Gaussian distribution with mean $m_0$ and variance $\tau_0$. It is possible to show that, given a realization $(y,x)$ and a prior $N(m,\tau)$, the posterior is also Gaussian, and the mean and variance evolve as follows:
$$m_{t+1} = m_t + \left(-\frac{Y_{t+1}-a_0}{X_{t+1}} - m_t\right)\left(\frac{X^2_{t+1}}{X^2_{t+1}+\tau^{-1}_t}\right) \quad\text{and}\quad \tau_{t+1} = \left(X^2_{t+1}+\tau^{-1}_t\right)^{-1}.$$

Nonlinear Stochastic Difference Equations and Stochastic Approximation.
For simplicity, let $r_{t+1}\equiv\frac{1}{t+1}\left(\tau^{-1}_t + X^2_{t+1}\right)$ and note that the previous nonlinear system of stochastic difference equations can be written as
$$m_{t+1} = m_t + \frac{1}{t+1}\frac{X^2_{t+1}}{r_{t+1}}\left(-\frac{Y_{t+1}-a_0}{X_{t+1}} - m_t\right), \qquad r_{t+1} = r_t + \frac{1}{t+1}\left(X^2_{t+1} - r_t\right).$$
(The Gaussian prior assumed above is standard in settings like ours and, as these recursions show, simplifies the exposition considerably.) Let $\beta_t = (m_t,r_t)'$, $Z_t = (X_t,Y_t)$,
$$G(\beta_t,z_{t+1}) = \begin{bmatrix}\frac{x^2_{t+1}}{r_{t+1}}\left(-\frac{y_{t+1}-a_0}{x_{t+1}} - m_t\right)\\ x^2_{t+1} - r_t\end{bmatrix}$$
and
$$\bar G(\beta) = \begin{bmatrix}\bar G_1(\beta)\\ \bar G_2(\beta)\end{bmatrix} = E_{P^\sigma}[G(\beta,Z_{t+1})] = \begin{bmatrix}F(8a_0-96m)\frac{100}{r}\left(-\frac{a-a_0}{10} + b_0 - m\right) + (1-F(8a_0-96m))\frac{4}{r}\left(-\frac{a-a_0}{2} + b_0 - m\right)\\ 4 + F(8a_0-96m)\,96 - r\end{bmatrix},$$
where $P^\sigma$ is the probability over $Z$ induced by $\sigma$ (and $y = a - b_0x + \omega$). Therefore, the dynamical system can be cast as
$$\beta_{t+1} = \beta_t + \frac{1}{t+1}\bar G(\beta_t) + \frac{1}{t+1}V_{t+1}, \qquad V_{t+1} = G(\beta_t,Z_{t+1}) - \bar G(\beta_t).$$
Stochastic approximation theory (e.g., Kushner and Yin (2003)) implies, roughly speaking, that in order to study the asymptotic behavior of $(\beta_t)_t$ it is enough to study the behavior of the orbits of the following ODE:
$$\dot\beta(t) = \bar G(\beta(t)).$$

Characterization of the Steady States.
In order to find the steady states of $(\beta_t)_t$, it is enough to find $\beta^*$ such that $\bar{G}(\beta^*) = 0$. Let
\[
H(m) \equiv F(8a - 96m)\,10\left(-(a_0 - a) + (b_0 - m)10\right) + \left(1 - F(8a - 96m)\right)2\left(-(a_0 - a) + (b_0 - m)2\right).
\]
Observe that $\bar{G}_1(\beta) = r^{-1}H(m)$ and that $H$ is continuous with $\lim_{m \to -\infty} H(m) = \infty$ and $\lim_{m \to \infty} H(m) = -\infty$. Thus, there exists at least one solution to $H(m) = 0$. Therefore, there exists at least one $\beta^*$ such that $\bar{G}(\beta^*) = 0$.

Let $\bar{b} = b_0 - \frac{a_0 - a}{10} = 4 - \frac{2}{10} = \frac{19}{5}$ and $\underline{b} = b_0 - \frac{a_0 - a}{2} = 4 - \frac{2}{2} = 3$, $\bar{r} = 4 + 96F(8a - 96\underline{b})$ and $\underline{r} = 4 + 96F(8a - 96\bar{b})$, and $B \equiv [\underline{b}, \bar{b}] \times [\underline{r}, \bar{r}]$. It follows that $H(m) < 0$ for all $m > \bar{b}$, and thus $m^*$ must be such that $m^* \leq \bar{b}$. It is also easy to see that $m^* \geq \underline{b}$. Moreover,
\[
\frac{dH(m)}{dm} = 96 f(8a - 96m)\left(8(a_0 - a) - 96(b_0 - m)\right) - 4 - 96F(8a - 96m).
\]
Thus, for any $m \leq \bar{b}$, $\frac{dH(m)}{dm} < 0$, because $m \leq \bar{b}$ implies $8(a_0 - a) \leq (b_0 - m)80 < (b_0 - m)96$. Therefore, on the relevant domain $m \in [\underline{b}, \bar{b}]$, $H$ is decreasing, thus implying that there exists only one $m^*$ such that $H(m^*) = 0$. Therefore, there exists only one $\beta^*$ such that $\bar{G}(\beta^*) = 0$.

We are now interested in characterizing the limit of $\beta^*$ as the perturbation vanishes, i.e., as $F$ converges to $\mathbf{1}\{\xi \geq 0\}$. To do this we introduce some notation. We consider a sequence $(F_n)_n$ that converges to $\mathbf{1}\{\xi \geq 0\}$, use $\beta^*_n$ to denote the steady state associated to $F_n$, and, finally, use $H_n$ to denote the $H$ associated to $F_n$.

We proceed as follows. First note that since $\beta^*_n \in B$ for all $n$, the limit exists (going to a subsequence if needed). We show that $m^* \equiv \lim_{n \to \infty} m^*_n = a/12 = 40/12 = 10/3$. Suppose not; in particular, suppose that $\lim_{n \to \infty} m^*_n < a/12$ (the argument for the reverse inequality is analogous and thus omitted). In this case $\lim_{n \to \infty} (8a - 96m^*_n) > 0$, and thus $\lim_{n \to \infty} F_n(8a - 96m^*_n) = 1$. Therefore
\[
\lim_{n \to \infty} H_n(m^*_n) = 10\left(-(a_0 - a) + (b_0 - m^*)10\right) \geq 10\left(-2 + \tfrac{2}{3} \cdot 10\right) > 0.
\]
But this implies that there exists $N$ such that $H_n(m^*_n) > 0$ for all $n \geq N$, which is a contradiction since $H_n(m^*_n) = 0$ for all $n$.

Moreover, define $\sigma^*_n = F_n(8a - 96m^*_n)$ and $\sigma^* = \lim_{n \to \infty} \sigma^*_n$. Since $H_n(m^*_n) = 0$ for all $n$ and $m^* = 10/3$, it follows that
\[
\sigma^* = \frac{-2\left(-2 + \left(4 - \tfrac{10}{3}\right)2\right)}{10\left(-2 + \left(4 - \tfrac{10}{3}\right)10\right) - 2\left(-2 + \left(4 - \tfrac{10}{3}\right)2\right)} = \frac{1}{36}.
\]

Global convergence to the Steady State. In our example, it is in fact possible to establish that behavior converges with probability 1 to the unique equilibrium. By the results in Benaim (1999), Section 6.3, it is sufficient to establish the global asymptotic stability of $\beta^*_n$ for any $n$, i.e., that the basin of attraction of $\beta^*_n$ is all of $B$. In order to do this, let $L(\beta) = (\beta - \beta^*_n)'P(\beta - \beta^*_n)$ for all $\beta$, where $P \in \mathbb{R}^{2 \times 2}$ is positive definite and diagonal and will be determined later. Note that $L(\beta) = 0$ iff $\beta = \beta^*_n$. Also,
\[
\frac{dL(\beta(t))}{dt} = \nabla L(\beta(t))'\,\dot{\beta}(t) = 2\left(\beta(t) - \beta^*_n\right)'P\,\bar{G}(\beta(t)) = 2\left\{(m(t) - m^*_n)P_{[11]}\bar{G}_1(\beta(t)) + (r(t) - r^*_n)P_{[22]}\bar{G}_2(\beta(t))\right\}.
\]
Since $\bar{G}(\beta^*_n) = 0$,
\[
\begin{aligned}
\frac{dL(\beta(t))}{dt} &= 2\left(\beta(t) - \beta^*_n\right)'P\left(\bar{G}(\beta(t)) - \bar{G}(\beta^*_n)\right)\\
&= 2(m(t) - m^*_n)P_{[11]}\left(\bar{G}_1(\beta(t)) - \bar{G}_1(\beta^*_n)\right) + 2(r(t) - r^*_n)P_{[22]}\left(\bar{G}_2(\beta(t)) - \bar{G}_2(\beta^*_n)\right)\\
&= 2(m(t) - m^*_n)^2 P_{[11]}\int_0^1 \frac{\partial \bar{G}_1(m^*_n + s(m(t) - m^*_n), r^*_n)}{\partial m}\,ds + 2(r(t) - r^*_n)^2 P_{[22]}\int_0^1 \frac{\partial \bar{G}_2(m^*_n, r^*_n + s(r(t) - r^*_n))}{\partial r}\,ds,
\end{aligned}
\]
where the last equality holds by the mean value theorem. Note that $\frac{\partial \bar{G}_2(m^*_n, r^*_n + s(r(t) - r^*_n))}{\partial r} = -1$ and
\[
\int_0^1 \frac{\partial \bar{G}_1(m^*_n + s(m(t) - m^*_n), r^*_n)}{\partial m}\,ds = \int_0^1 (r^*_n)^{-1}\,\frac{dH_n(m^*_n + s(m(t) - m^*_n))}{dm}\,ds < 0,
\]
since $r^*_n \geq \underline{r} > 0$ and $\frac{dH_n(m)}{dm} < 0$ for all $m$ in the relevant domain.
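The limiting values can be double-checked with exact rational arithmetic. This is a sketch under the parameter values reconstructed in this appendix ($a_0 = 42$, $b_0 = 4$, $a = 40$, prices $\{2, 10\}$), which are assumptions inferred from the constants in the text; under them, $m^* = a/12$ and $H(m^*) = 0$ pin down $\sigma^*$.

```python
from fractions import Fraction

# Assumed parameter values from this appendix's example.
a0, b0, a = Fraction(42), Fraction(4), Fraction(40)

m_star = a / 12  # limit of m*_n: solves 8a - 96m = 0
term10 = 10 * (-(a0 - a) + (b0 - m_star) * 10)  # price-10 branch of H at m*
term2 = 2 * (-(a0 - a) + (b0 - m_star) * 2)     # price-2 branch of H at m*

# H(m*) = sigma*term10 + (1 - sigma)*term2 = 0  =>  sigma* = -term2 / (term10 - term2)
sigma_star = -term2 / (term10 - term2)
print(m_star, sigma_star)  # -> 10/3 1/36
```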
Thus, by choosing $P_{[11]} > P_{[22]} > 0$ we obtain $\frac{dL(\beta(t))}{dt} < 0$. The function $L$ therefore satisfies the following properties: $L(\beta)$ is strictly positive for all $\beta \neq \beta^*_n$, $L(\beta^*_n) = 0$, and $\frac{dL(\beta(t))}{dt} < 0$ along trajectories of the ODE. Thus, $L$ satisfies all the conditions of a Lyapunov function and, therefore, $\beta^*_n$ is globally asymptotically stable for all $n$.
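The global-stability conclusion can be illustrated numerically by integrating the ODE $\dot{\beta} = \bar{G}(\beta)$ from several corners of (an approximation of) $B$ and checking that all trajectories reach the same rest point. As before, the parameter values and the logistic stand-in for $F_n$ are illustrative assumptions, and `G_bar` and `flow` are hypothetical helper names.

```python
import math

# Same illustrative parameters as above (assumptions, not the paper's calibration).
A0, B0, A = 42.0, 4.0, 40.0
SCALE = 20.0  # scale of the logistic cdf standing in for F_n

def F(z):
    return 1.0 / (1.0 + math.exp(-z / SCALE))

def G_bar(m, r):
    """Averaged dynamics: G1 = H(m)/r and G2 = 4 + 96 F(8a - 96m) - r."""
    s = F(8 * A - 96 * m)
    H = (s * 10 * (-(A0 - A) + (B0 - m) * 10)
         + (1 - s) * 2 * (-(A0 - A) + (B0 - m) * 2))
    return H / r, 4 + 96 * s - r

def flow(m, r, dt=1e-3, T=50.0):
    """Euler-integrate the ODE beta'(t) = G_bar(beta(t)) from (m, r)."""
    for _ in range(int(T / dt)):
        g1, g2 = G_bar(m, r)
        m, r = m + dt * g1, r + dt * g2
    return m, r
```

Starting from the four corners of a rectangle approximating $B$, every trajectory ends at the same rest point with $\bar{G} \approx 0$ there, consistent with the Lyapunov argument above.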