Expectations, Networks, and Conventions
BENJAMIN GOLUB AND STEPHEN MORRIS

ABSTRACT. In coordination games and speculative over-the-counter financial markets, solutions depend on higher-order average expectations: agents’ expectations about what counterparties, on average, expect their counterparties to think, etc. We offer a unified analysis of these objects and their limits, for general information structures, priors, and networks of counterparty relationships. Our key device is an interaction structure combining the network and agents’ beliefs, which we analyze using Markov methods. This device allows us to nest classical beauty contests and network games within one model and unify their results. Two applications illustrate the techniques: The first characterizes when slight optimism about counterparties’ average expectations leads to contagion of optimism and extreme asset prices. The second describes the tyranny of the least-informed: agents coordinating on the prior expectations of the one with the worst private information, despite all having nearly common certainty, based on precise private signals, of the ex post optimal action.
1. INTRODUCTION
Consider a situation in which each agent has strong incentives to match the behavior of others. An outcome that agents coordinate on in such a setting has been called a convention in philosophy and economics (see Lewis (1969), Young (1996), and Shin and Williamson (1996)). In deciding how to coordinate, agents will take into account their beliefs about (i) the state of the world, which determines the best action; and (ii) one another’s actions. Agents may differ from one another, and hence have an incentive to choose differently, for three reasons: first, because they are asymmetrically informed; second, because they interpret the same information differently, that is, they have different priors; and third, because they differ in whom they want to coordinate with. Which conventions emerge in such an environment will depend on the information asymmetries, the heterogeneous prior beliefs, and the network describing the coordination motives of agents. Our purpose is to characterize this dependence.

Date: September 2020. Golub: Department of Economics, Harvard University, Cambridge, U.S.A., [email protected], website: bengolub.net. Morris: Department of Economics, MIT, U.S.A., [email protected], website: economics.mit.edu/faculty/semorris. We are grateful for conversations with Nageeb Ali, Dirk Bergemann, Larry Blume, Ben Brooks, Andrea Galeotti, Jason Hartline, Tibor Heumann, Matthew O. Jackson, Bobby Kleinberg, Eric Maskin, Dov Samet, and Omer Tamuz; for comments from Selman Erol, Alireza Tahbaz-Salehi and Muhamet Yildiz, who served as discussants; as well as many questions and comments from seminar and conference participants. We especially thank Ryota Iijima for several inspiring conversations early in this project. Cristian Gradinaru, Georgia Martin, and Eduard Talamàs provided excellent assistance in preparing the manuscript.

We informally describe a simple model of this environment: a coordination game with linear best responses. Nature draws an external state $\theta$, and each agent $i$ chooses a real-valued action $a^i$ based on some private information. This occurs simultaneously. Agents’ payoffs capture two motivations: First, they seek to coordinate with a basic random variable $y(\theta)$, a random variable, common to everyone, that depends only on the external state; second, they seek to take actions that are close to the actions that others take, namely the various $a^j$ for $j \neq i$. The disutilities they experience are proportional to the squares of the differences between $a^i$ and these various targets; this feature induces best responses linear in an agent’s expectations of $y$ and others’ actions.
A network of weights captures the coordination concerns of the agents, that is, which others each agent cares about coordinating with.

If just the coordination motive were present, with no desire to match the basic random variable, there would be a continuum of equilibria. Indeed, for any action, there would be an equilibrium with everyone choosing that action. The choice of action would be an arbitrary convention. We will be interested in the case where the convention is not arbitrary, because agents put some weight on the accuracy motive (matching the basic random variable) while still being strongly motivated to choose actions close to others’ actions. In this case, it turns out there is a unique equilibrium. When the weight on others’ actions is high, it can be shown that agents essentially choose a common action. We call this action, the common action played in the limit, the convention. If there were common knowledge of the external state $\theta$, the convention would be equal to $y(\theta)$, the value everyone seeks to match. But we are interested in characterizing the convention when there is incomplete information about the state.

The convention will depend on higher-order expectations of the agents. Suppose Ann cares mainly about coordinating with Bob, who cares mainly about coordinating with Charlie. (Recall that the agents all care a little about matching their own expectations of the external variable.) Then Ann’s expectation of Bob’s expectation of Charlie’s expectation of the external variable becomes relevant for Ann’s decision. In this scenario, each agent is seeking to coordinate with only one other, but in the general model each seeks to match a (weighted) average of the actions of several others.
By an elaboration of the above reasoning about Ann, Bob, and Charlie, higher-order average expectations become relevant: Each agent cares about the average of his neighbors’ expectations of the average of their neighbors’ expectations of the external variable, and so on. Thus our analysis of coordination games leads naturally to a study of higher-order average expectations. We will define the consensus expectation to be (essentially) the limit of such higher-order average expectations as the order becomes large. The consensus expectation will equal the convention that obtains in the linear best-response game described above, in the limit as agents’ coordination concerns dominate. We will focus on this limit, though many of the techniques we will develop can be extended to study the case where coordination motives are not dominant.

We will report three kinds of substantive results about consensus expectations. To establish these results, we introduce a key technical device: a Markov matrix on the union of agents’ signals, which we call an interaction structure, capturing both the network and agents’ beliefs. A key observation is that consensus expectations are determined by the stationary distribution corresponding to the Markov matrix. We now present the substantive results, and we discuss the technique in more detail at the end of the Introduction.
Unifying and Generalizing Network and Asymmetric Information Results.
The first results unify and generalize facts known in the literatures on network games and on asymmetric information:

(a) Suppose agents have the same information but may have heterogeneous beliefs about $\theta$, that is, different priors, which are commonly known. Then the consensus expectation is simply a weighted average of agents’ heterogeneous prior expectations of the external random variable. The weight on an agent’s expectation is his eigenvector centrality in the network. This corresponds to the seminal result of Ballester, Calvó-Armengol, and Zenou (2006) on equilibrium actions in certain network games being weighted averages of individuals’ ideal points, with someone’s weight determined by the extent to which others want to directly and indirectly coordinate with him. The appearance of network centrality here (a statistic of individuals defined from the matrix of coordination weights) is a consequence of the matrix algebra that naturally appears when studying higher-order average expectations.

Footnote: For recent surveys of economic applications related to network centrality, see Jackson (2008, Section 2.2.4), Acemoglu, Ozdaglar, and Tahbaz-Salehi (2016b), Zenou (2016), and Golub and Sadler (2016).

(b) If there is asymmetric information, but agents have common prior beliefs, then the consensus expectation is equal to the (common prior) ex ante expectation of the external state. Thus consensus expectations are independent of the network structure, and also independent of all features of the information structure except the common prior. This result turns out to be a corollary of the result of Samet (1998a).

(c) Embedding both (a) and (b), if agents have both heterogeneous prior beliefs and asymmetric information but a common prior on signals, then the consensus expectation is equal to a weighted average of agents’ different ex ante expectations of the basic random variable. Just as in (a), the weight on an agent’s expectation is his eigenvector centrality in the network. This goes beyond existing work on network games with incomplete information due to Calvó-Armengol, Martí, and Prat (2015), de Martí and Zenou (2015), Bergemann, Heumann, and Morris (2015b) and Blume, Brock, Durlauf, and Jayaraman (2015): we will discuss these connections in Section 6 when we have introduced the model and key results.
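As an informal illustration of result (a), the sketch below computes eigenvector centrality weights for a small network and the corresponding weighted average of prior expectations. The three-agent network, the prior expectations, and the function name `eigenvector_centrality` are hypothetical choices made for this example, not objects from the paper; the "lazy" averaging step is simply one convenient way to make the iteration converge on a periodic network.

```python
def eigenvector_centrality(gamma, iterations=200):
    """Approximate the row vector e with e * gamma = e and entries summing to 1.

    gamma is assumed to be row-stochastic and irreducible (an assumption of
    this sketch).
    """
    n = len(gamma)
    e = [1.0 / n] * n  # start from the uniform distribution on agents
    for _ in range(iterations):
        # One step of e -> e * gamma (row vector times matrix).
        stepped = [sum(e[j] * gamma[j][i] for j in range(n)) for i in range(n)]
        # Averaging with the current vector keeps the same fixed point but
        # avoids oscillation when the network is periodic.
        e = [0.5 * e[i] + 0.5 * stepped[i] for i in range(n)]
    return e


# Hypothetical network: agent 0 splits weight between agents 1 and 2,
# who each put all of their weight on agent 0. Each row sums to 1.
gamma = [
    [0.0, 0.5, 0.5],
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
]

# Hypothetical prior expectations of the basic random variable, one per agent.
prior_expectations = [0.2, 0.6, 1.0]

centrality = eigenvector_centrality(gamma)

# Centrality-weighted average of prior expectations, as in result (a).
consensus = sum(w * m for w, m in zip(centrality, prior_expectations))
```

In this toy network agent 0 receives half of the total weight, so his prior expectation counts as much as those of the other two agents combined.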
Contagion of Optimism.
Our second category of results studies second-order optimism. We assume that each agent, given any signal, assesses his average counterparty as more optimistic than himself about the value of the basic random variable, unless the agent himself has a first-order expectation that is already very high (close to the highest induced by any signal). Agents whose expectations are high may be somewhat pessimistic: they may assess the average counterparty as less optimistic than themselves.

We study when arbitrarily slight second-order optimism leads consensus expectations to be very high, near the highest possible expectation of $y$, via a contagion of optimism through higher-order expectations. The proof is via a reduction to a Markov chain inequality. The key subtlety in the analysis is: how much pessimism can be allowed without destroying the contagion of optimism? We give a bound that answers this question, and describe a sense in which this bound is tight (Section 7.4.2). Recent work of Han and Kyle (2017) discusses a different contagion of optimism in a CARA-normal rational expectations model. We examine connections with related models in Section 7.4.1.

Tyranny of the Least-Informed.
Third, we consider a setting where agents start with heterogeneous priors about the external state but share a common interpretation of signals. That is, agents observe signals of the external state. They agree on the probability of any particular signal of a given agent conditional on any external state. However, their priors over external states may differ, and thus their interim beliefs may not be compatible with a common prior. Given common interpretation of signals, it makes sense to define notions of more and less precisely informed agents, because the distributions of signals given the external state (and thus the levels of noise in them) are common knowledge.

We show that, in a suitable sense, the consensus expectation approximates the ex ante expectation of the agent whose private information is least precise. This is true even if all agents have very precise private signals about the state, as long as the least-informed has signals sufficiently less precise than others. The quantitative details of how to define “sufficiently” are subtle, and rely on a Markov chain connection that we discuss next.

Footnote (to “a common prior on signals” in result (c) above): That is, a common prior on how the signal random variables are distributed.

Footnote (to result (c) above): This decomposition separates, in a suitable sense, the effects of the network and the beliefs. A companion paper, Golub and Morris (2017), gives the necessary and sufficient condition on the information structure for this sort of weighted average decomposition to be possible.

Footnote (to “common interpretation of signals”): An interpretation of a signal random variable is its conditional distribution given the state, in line with the terminology of Kandel and Pearson (1995) and Acemoglu, Chernozhukov, and Yildiz (2016a).
The Interaction Structure and Markov Formalism.
The techniques underlying the results discussed above are based on a Markov matrix description of higher-order average expectations. While we defer most of the details until Section 4, when we have more notation, the basic idea is simple. We define a Markov process whose state space is the union of all agents’ signals. Transition probabilities between any two states combine both the network weights and the subjective probabilities of the agents. In particular, the transition probability from a signal $t^i$ of agent $i$ to a signal $t^j$ of agent $j$ is defined as the product of (i) the network weight that $i$ places on $j$ and (ii) the subjective probability that agent $i$, given signal $t^i$, places on $t^j$. We call the transition matrix of this Markov process the interaction structure, and it is our key technical device. This formalism treats beliefs and network weights entirely symmetrically. This symmetric treatment enables the analysis to be reduced to Markov chain results, which provide both a tool and novel insights. Other work on network games with incomplete information (Calvó-Armengol, Martí, and Prat (2015), de Martí and Zenou (2015), Bergemann, Heumann, and Morris (2015b) and Blume, Brock, Durlauf, and Jayaraman (2015)) does not use this general device and must develop more tailored techniques.

Footnote (to the tyranny of the least-informed application above): In the environment we study for this application, the signals are conditionally independent given the state, so that signals are correlated only through the state.

Footnote: The symmetric treatment follows Morris (1997). As discussed in detail in Section C.3, our approach echoes Samet (1998a) in using a Markov process to represent incomplete information, although our Markov process is actually a different one in significant ways.

The essence of our approach is that the iteration of the Markov matrix associated with the interaction structure enables a brief, explicit description of higher-order average expectations: The $n$th-order average expectations can be obtained by suitably combining the $n$-step transition probabilities of the Markov process with the first-order expectations associated to various signals.

To study consensus expectations, we consider the limit as $n$ grows large. Under suitable conditions, in this limit the Markov transition probability to any state, regardless of where the process starts, becomes the stationary probability of that state. This can be used to show that the stationary distribution of the Markov process determines the consensus expectation. Indeed, the consensus expectation turns out to be a weighted average of first-order expectations given various signals $t^i$. The weight on a signal $t^i$ is its weight in the stationary distribution of the Markov process.

Thus, our results on the consensus expectation are proved by studying the stationary distribution of the Markov process and deriving properties of it from more primitive assumptions about the environment. For example, in the analysis of the contagion of optimism, the essential idea is that when second-order optimism holds, probability mass in the Markov process flows on average to signals associated with higher first-order expectations of the basic random variable. It can be shown that, as a consequence, states with high first-order expectations have a larger share of the stationary probability. By our description of the consensus expectation as a weighted average of first-order expectations, with weights given by the stationary probabilities, it follows that the consensus expectation is high.

Other results rely on different reasoning. The most technically involved arguments are the ones associated with the tyranny of the least-informed. These arguments rely on perturbation bounds for Markov chains, which are used to show that the priors of highly informed agents cannot play a substantial role in the stationary distribution that determines the consensus expectation.
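The construction described above can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: the two agents, their signals, beliefs, and first-order expectations are all hypothetical numbers, and the helper names (`interaction_structure`, `stationary_distribution`, `higher_order`) are invented for this sketch. It builds the transition matrix as network weight times subjective probability, applies it repeatedly to the first-order expectations, and computes the stationary-distribution-weighted average of first-order expectations.

```python
# States of the Markov process: the union of all agents' signals.
states = [(0, "a0"), (0, "a1"), (1, "b0"), (1, "b1")]

# Hypothetical network: each of the two agents puts all weight on the other.
gamma = [[0.0, 1.0],
         [1.0, 0.0]]

# belief[(i, t_i)][(j, t_j)]: probability that agent i, at signal t_i,
# assigns to agent j's signal being t_j (hypothetical numbers).
belief = {
    (0, "a0"): {(1, "b0"): 0.8, (1, "b1"): 0.2},
    (0, "a1"): {(1, "b0"): 0.3, (1, "b1"): 0.7},
    (1, "b0"): {(0, "a0"): 0.6, (0, "a1"): 0.4},
    (1, "b1"): {(0, "a0"): 0.1, (0, "a1"): 0.9},
}

# Hypothetical first-order expectations of y at each signal.
first_order = {(0, "a0"): 0.0, (0, "a1"): 1.0, (1, "b0"): 0.2, (1, "b1"): 0.9}
f = [first_order[s] for s in states]


def interaction_structure(states, gamma, belief):
    """Entry (t_i, t_j) is gamma[i][j] times i's probability of t_j at t_i."""
    return [
        [gamma[i][j] * belief[(i, t_i)].get((j, t_j), 0.0) for (j, t_j) in states]
        for (i, t_i) in states
    ]


def stationary_distribution(matrix, iterations=500):
    """Approximate p with p * matrix = p by iterating a lazy version of the chain."""
    n = len(matrix)
    p = [1.0 / n] * n
    for _ in range(iterations):
        stepped = [sum(p[r] * matrix[r][c] for r in range(n)) for c in range(n)]
        # The lazy step has the same stationary distribution and converges
        # even though this two-agent chain is periodic.
        p = [0.5 * p[c] + 0.5 * stepped[c] for c in range(n)]
    return p


def higher_order(matrix, first, n):
    """Apply the matrix n - 1 times to the vector of first-order expectations."""
    x = list(first)
    for _ in range(n - 1):
        x = [sum(row[c] * x[c] for c in range(len(x))) for row in matrix]
    return x


B = interaction_structure(states, gamma, belief)
p = stationary_distribution(B)

# Weighted average of first-order expectations, weights = stationary probabilities.
consensus = sum(p[k] * f[k] for k in range(len(states)))
second_order = higher_order(B, f, 2)
```

Because the two agents only look at each other, this toy chain alternates between the two agents' signals, which is one reason to work with the stationary distribution rather than with a simple limit of the iterates.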
Overall, our main methodological claim, illustrated by the various applications, is that the structure of higher-order expectations is illuminated by the Markov formalism.

The remainder of the paper is organized as follows. Section 2 presents the environment, defines higher-order average expectations and consensus expectations, and illustrates them with some simple examples. Section 3 motivates higher-order average expectations and consensus expectations by discussing a coordination game and an asset market where they are relevant. Section 4 presents our key technical device, the interaction structure, and the correspondence between higher-order average expectations and statistics of a Markov process. Section 5 relates the interaction structure to the underlying network. Section 6 relates consensus expectations to agents’ priors. Together these results unify and extend the known network games and incomplete-information results. Section 7 focuses on higher-order optimism, while Section 8 reports our results on the tyranny of the least-informed. Section C is a discussion of relations to the literature, subtleties, and extensions.

2. MODEL
2.1. The Information Structure.
2.1.1. States, Signals, and Expectations.
There is a finite set $\Theta$ of states of the world. There is a finite set $N$ of agents. Associated to each agent $i \in N$ is a finite set $T^i$ of signals (i.e., possible signal realizations), and these sets of signals are disjoint across agents. Let $T = \prod_{i \in N} T^i$ be the product of all the signal spaces, with a typical element being a tuple $t = (t^i)_{i \in N}$; let $T^{-i} = \prod_{j \in N \setminus \{i\}} T^j$ be the product of the signal spaces of all the others, viewed from $i$’s perspective. An agent’s signal fully determines all the information he has, including the information he has about others’ signals. Let $\Omega = \Theta \times T$ be the set of all realizations.

For each $i$ and signal $t^i$, there is a belief $\pi^i(\cdot \mid t^i) \in \Delta(\Theta \times T^{-i})$, that is, a probability distribution over $\Theta \times T^{-i}$. This is the interim or conditional belief that agent $i$ has when he gets signal $t^i$. We introduce some notation to refer to marginal distributions: $\pi^i(t^j \mid t^i)$ denotes the probability this belief assigns to agent $j$’s signal being $t^j$. For states $\theta \in \Theta$, the notation $\pi^i(\theta \mid t^i)$ has an analogous definition. We refer to $\pi = (\pi^i(\cdot \mid t^i))_{i \in N,\, t^i \in T^i}$ as the information structure. In situations where only interim beliefs matter, we will use the language of types. That is, we will identify each signal with a corresponding (belief) type of the agent. If signal $t^i$ induces a certain belief over $\Theta \times T^{-i}$, we will say that type $t^i$ (of agent $i$) has that belief. We will call $T$ the type space. On the other hand, when we wish to emphasize the ex ante stage and the literal process of drawing signals, we will use the language of signals.

Footnote (to the marginal $\pi^i(t^j \mid t^i)$): That is, $\pi^i(\{(\hat\theta, \hat t^{-i}) : \hat\theta \in \Theta,\ \hat t^j = t^j\} \mid t^i)$.

Footnote: As always, uncertainty about how signals are generated can be built into this description of an information structure. Thus, following Harsanyi (1968), the information structure itself is taken to be common knowledge. For more on this see Aumann (1976, p. 1237) and Brandenburger and Dekel (1993).

A random variable measurable with respect to $i$’s information is a function $x^i : T^i \to \mathbb{R}$, i.e., an element of $\mathbb{R}^{T^i}$ (this set being defined as the set of functions from signals in $T^i$ to real numbers). Given a random variable $z : \Omega \to \mathbb{R}$, let $E^i z \in \mathbb{R}^{T^i}$ give $i$’s conditional expectation of $z$. It is defined by
\[ (E^i z)(t^i) = \sum_{(\theta, t^{-i}) \in \Theta \times T^{-i}} \pi^i(\theta, t^{-i} \mid t^i)\, z(\theta, t^i, t^{-i}). \tag{1} \]
The summation runs over all $(\theta, t^{-i})$, and states are weighted using the probabilities assigned by the interim belief $\pi^i(\cdot \mid t^i)$. We will often abuse notation, as we have done here, by dropping parentheses in referring to elements of $\Omega$ in the arguments of beliefs and random variables.

2.1.2. Priors.
The information structure was defined above in terms of agents’ interim beliefs, i.e., their beliefs about external states and others’ signals conditional on their own signals. This interim information is enough to define higher-order average expectations and to state our main results. However, we are interested in the ex ante interpretation of our results: There is a prior stage before agents observe their own signals, and thus where they face uncertainty as to what signals they will observe.

We write $(\mu^i)_{i \in N}$ for agents’ ex ante beliefs, with $\mu^i \in \Delta(T^i)$. Combined with conditional beliefs $\pi^i(\cdot \mid t^i) \in \Delta(\Theta \times T^{-i})$, there is a prior $\mathbf{P}^{\mu^i} \in \Delta(\Omega)$ on the entire space of realizations, assigning to any $(\theta, t) \in \Omega$ a probability
\[ \mathbf{P}^{\mu^i}(\theta, t) = \mu^i(t^i)\, \pi^i(\theta, t^{-i} \mid t^i). \tag{2} \]
If one started from agent $i$’s prior $\mathbf{P}^{\mu^i} \in \Delta(\Omega)$, one would define conditional beliefs $\pi^i(\cdot \mid t^i) \in \Delta(\Theta \times T^{-i})$ by updating according to Bayes’ rule.

Footnote: Note that the probability under $\mathbf{P}^{\mu^i}$ of any subset of $\Omega$ can be written as a sum of probabilities defined in equation (2), and a similar statement holds for the interim probabilities $\pi^i(\cdot \mid t^i)$.

The probability measure $\mathbf{P}^{\mu^i}$ gives rise to an ex ante expectation operator,
\[ \mathbf{E}^{\mu^i} z = \sum_{\omega \in \Omega} \mathbf{P}^{\mu^i}(\omega)\, z(\omega) = \sum_{t^i \in T^i} \mu^i(t^i)\, (E^i z)(t^i). \]
To emphasize when an ex ante perspective is being taken, we adopt the convention that ex ante probabilities, expectations, etc. are in bold.

We will later be interested in what an agent’s ex ante beliefs would be if we had fixed his conditional beliefs $\pi^i(\cdot \mid t^i) \in \Delta(\Theta \times T^{-i})$ but endowed him with alternative prior beliefs. Priors for $i$ other than the true priors $\mu^i$ are denoted by $\lambda^i \in \Delta(T^i)$, and we use $\lambda^i$ in place of $\mu^i$ in the notations introduced above.
2.2. The Network.
For each pair of agents, $i$ and $j$, there is a number $\gamma^{ij} \in [0, 1]$, where $\sum_{j \in N} \gamma^{ij} = 1$, with the interpretation that agent $i$ assigns “weight” $\gamma^{ij}$ to agent $j$. A matrix $\Gamma$, whose rows and columns are indexed by $N$ and whose entries are $\gamma^{ij}$, records these weights and is called the network. The fact that the weights of any agent add up to 1 corresponds to this matrix being row-stochastic.

The network is to be contrasted with the information structure encoded in the interim beliefs $\pi^i(\cdot \mid t^i)$. One interpretation of the network weight $\gamma^{ij}$, which will be used when we discuss coordination games, is that it measures how much agent $i$ cares about the action of $j$. We define $N^i$, the neighborhood of $i$, to be the set of $j$ such that $\gamma^{ij} > 0$, and the elements of $N^i$ are $i$’s neighbors. Note that $j$ may be a neighbor of $i$ without $i$ being a neighbor of $j$.

We now define an important set of statistics arising from the network.

Definition 1. The eigenvector centrality weights of the agents are the entries of the unique row vector $e \in \Delta(N)$ satisfying $e\Gamma = e$, i.e., for each $i$,
\[ e^i = \sum_{j \in N} e^j \gamma^{ji}. \]

Assuming that $\Gamma$ is irreducible, the Perron–Frobenius Theorem states that the eigenvector centrality weights are well-defined, that is, there is indeed a unique such vector $e$. Moreover, the theorem says that all the eigenvector centrality weights are positive.

2.3. Higher-Order Average Expectations.
We now define higher-order average expectations. A basic random variable is a random variable measurable with respect to the external states, i.e., a function $y : \Theta \to \mathbb{R}$, or an element of $\mathbb{R}^{\Theta}$. Consider a random variable $y \in \mathbb{R}^{\Theta}$ and define
\[ x^i(1; y) = E^i y \tag{3} \]
for every $i \in N$. This is $i$’s first-order expectation, given $i$’s own signal, of $y$.

We can now define the key objects we will focus on: the iterated expectations, or higher-order average expectations. For $n \geq 1$, given $(x^i(n))_{i \in N}$, define
\[ x^i(n+1; y, \Gamma) = \sum_{j \in N} \gamma^{ij} E^i x^j(n; y, \Gamma). \tag{4} \]
This is $i$’s subjective expectation of the average of the random variables corresponding to the previous iteration of the process; the average is taken with respect to the network weights. When we do not wish to emphasize the dependence on $y$ and $\Gamma$, or when they are clear from context, we omit these arguments.

Footnote: Here, abusing notation, we have identified $y \in \mathbb{R}^{\Theta}$ in the obvious way with a random variable $z \in \mathbb{R}^{\Theta \times T}$, namely, with the random variable $z$ for which $z(\theta, t) = y(\theta)$ for each $(\theta, t) \in \Theta \times T$. Equation (4) relies on a similar understanding.

[FIGURE 1. The network of Example 2.]

Note that equation (4), despite the presence of an iteration, is defined in a static environment: Higher-order average expectations do not correspond to dynamic updating over time, but rather to a hierarchy of beliefs when agents are simultaneously given different information. For this reason, these will figure in the solution of a static game (see Section 3.1.1, and a contrast with dynamics in Section 9).

2.4. Examples.

Example 1.
If we have $\gamma^{ij} = 1/|N|$ for all $i, j$, then every agent is weighting all others equally. Such averages will turn out to be relevant for beauty contests with homogeneous weights: $x^i(n)$ is a random agent’s expectation of a random agent’s expectation . . . of a random agent’s expectation of $y$.

Example 2.
Suppose the only nonzero entries of $\Gamma$ are $\gamma^{i,i+1} = 1$, where indices are interpreted modulo $|N|$, the number of agents. This corresponds to agents being arranged in a cycle, with each paying attention to the one with the next index. Take, for example, $|N| = 3$. Then $x^1(3) = E^1 E^2 E^3 y$. We could continue this process, and then we would essentially look at $(E^1 E^2 E^3)^a E^1 y$, where $a$ is some positive integer (possibly with $E^3$ or $E^2 E^3$ appended to the front). Our study of higher-order average expectations will allow us to study the limiting properties of this sequence.
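To make the recursion in equation (4) concrete, the following sketch iterates it on the cycle of Example 2 with three agents. Everything numerical here is an assumption made for illustration: each agent has two signals, believes the next agent’s signal matches his own with probability 0.9, and the first-order expectations are invented values. Agents are indexed 0, 1, 2 in the code, corresponding to agents 1, 2, 3 in the text, and `next_order` is a hypothetical helper name.

```python
def next_order(x, gamma, belief):
    """Apply x^i(n+1) = sum_j gamma[i][j] * E^i[x^j(n)] once.

    x[i][s] is agent i's order-n expectation at signal s, and
    belief[(i, j)][s][s2] is the probability agent i, at signal s,
    assigns to agent j having signal s2.
    """
    n_agents = len(gamma)
    result = []
    for i in range(n_agents):
        values = []
        for s in range(len(x[i])):
            total = 0.0
            for j in range(n_agents):
                if gamma[i][j] > 0.0:
                    # E^i[x^j(n)] at signal s, using i's belief about j's signal.
                    expectation = sum(
                        belief[(i, j)][s][s2] * x[j][s2]
                        for s2 in range(len(x[j]))
                    )
                    total += gamma[i][j] * expectation
            values.append(total)
        result.append(values)
    return result


# Cycle network: each agent puts weight 1 on the agent with the next index.
gamma = [
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
]

# Hypothetical beliefs: the next agent's signal equals one's own with prob. 0.9.
persistence = [[0.9, 0.1], [0.1, 0.9]]
belief = {(i, (i + 1) % 3): persistence for i in range(3)}

# Hypothetical first-order expectations x^i(1) at signals 0 and 1.
x1 = [[0.0, 1.0], [0.2, 0.8], [0.4, 0.6]]

x2 = next_order(x1, gamma, belief)  # e.g. x2[0] plays the role of E^1 E^2 y
x3 = next_order(x2, gamma, belief)  # x3[0] plays the role of E^1 E^2 E^3 y
```

Here `x3[0]` is the third-order object for the first agent: his expectation of the second agent’s expectation of the third agent’s expectation of $y$.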
2.5. Joint Connectedness: A Maintained Technical Assumption.
A key technical assumption, joint connectedness of the information structure and network, will be convenient in formulating statements about limits of higher-order average expectations. This assumption will be maintained unless we state otherwise.

Say that a signal $t^j$ (of an agent $j$) is a neighbor of a signal $t^i$ (of agent $i$) if agent $j$ is a neighbor of $i$ (i.e., $\gamma^{ij} > 0$) and agent $i$, when he observes signal $t^i$, considers signal $t^j$ possible (i.e., $\pi^i(t^j \mid t^i) > 0$). Let $S = \bigcup_{i \in N} T^i$. We say the information structure and network are jointly connected if every nonempty, proper subset $S' \subsetneq S$ contains some signal that is a neighbor of a signal not in $S'$. We will discuss the content and significance of this assumption below in Section C.1.

2.6. Consensus Expectations: Definition and Existence.
An object central to our general theoretical results and the applications will be a kind of limit of higher-order average expectations as we consider many iterations.
Definition 2.
For any information structure $\pi$, network $\Gamma$, and basic random variable $y$, the consensus expectation $c(y; \pi, \Gamma)$ is defined to be any entry of the vector
\[ \lim_{\beta \uparrow 1} (1 - \beta) \left( \sum_{n=1}^{\infty} \beta^{n-1} x^i(n; y) \right), \tag{5} \]
for any $i$, if the limit exists (in the sense of pointwise convergence) and is equal to a constant vector.

The vector in (5) is sometimes called an Abel average of the sequence $\left( x^i(n; y) \right)_{n=1}^{\infty}$ (see, e.g., Kozitsky, Shoikhet, and Zemánek, 2013). Proposition 1 in Section 4 below asserts that the consensus expectation is well-defined under the maintained assumption of joint connectedness.

The consensus expectation is equal to any entry of the simple limit $\lim_{n \to \infty} x^i(n; y)$ if the latter exists. It also coincides with the Cesàro limit, which is obtained by taking simple averages over many values of $n$. We will discuss these issues further in Section 4.2.

3. WHY HIGHER-ORDER AVERAGE EXPECTATIONS AND CONSENSUS EXPECTATIONS MATTER
We now discuss two economic problems where higher-order average expectations arise. First, we consider the network game with incomplete information discussed in the Introduction, where equilibrium actions are weighted averages of higher-order average expectations. Second, we describe a stylized asset market with fragmented markets, where asset prices reduce to the solution of the game, and are thus also weighted averages of higher-order average expectations. In each of these two cases, we will (i) show how outcomes are characterized by higher-order average expectations; (ii) motivate the study of consensus expectations, a limit of higher-order average expectations; and (iii) interpret our later results in the context of these applications.

3.1. Coordination.
How will a group of agents coordinate their behavior when they have strong incentives to take the same action as others but have different beliefs about what the best action to take is? We consider a class of games with linear best responses where each agent wants to set her action equal to a weighted average of (i) her expectation of a random variable and (ii) the weighted average of actions taken by others. We show how the equilibrium is determined by higher-order average expectations and then focus on the limit as coordination concerns dominate. There will be a particular single action taken in this limit by all agents after all signals: “the convention.” We first describe the game.

3.1.1. The Game.
We will consider an incomplete-information game where payoffs depend on the states of the world, $\Theta$. Beliefs and higher-order beliefs about $\Theta$ are described by the belief functions introduced in Section 2.1.1. The strategic dependencies are encoded in a network $\Gamma$. We also assume that $\gamma^{ii} = 0$ for every $i$. The game will also depend on $y$, a basic (i.e., $\theta$-measurable) random variable with support in the interval $[0, M]$. We will consider the “$\beta$-game” parameterized by $\beta \in [0, 1]$. Each agent $i$ chooses an action $a^i \in [0, M]$, and the best-response action of agent $i$ after observing signal $t^i$ is given by
\[ a^i = (1 - \beta) E^i y + \beta \sum_{j \neq i} \gamma^{ij} E^i a^j, \]
where other players’ actions are viewed as random variables that depend on their own signal realizations. The best response can be derived from a quadratic loss function, where the ex post utility of agent $i$ under realized state $\theta \in \Theta$, if action profile $a = (a^i)_{i \in N} \in \mathbb{R}^N$ is played, is
\[ u^i(a, \theta) = -(1 - \beta) \left( a^i - y(\theta) \right)^2 - \beta \sum_{j \neq i} \gamma^{ij} \left( a^i - a^j \right)^2. \]

Footnote: If we identify types with their (interim) beliefs, we can say that the beliefs are encoded in the type space.

Footnote: The assumption that the diagonal is 0 is the most natural one for this game. Analogous results hold without this assumption and have game-theoretic interpretations. See Section C.2.

Footnote: We will focus on the case where agents care about the same basic random variable. But the analysis extends readily to the case where agents care about different random variables, since their heterogeneous expectations of random variables conditional on their signals can be interpreted as agent-specific random variables. See Section C.5 for further discussion.
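As a brief check, the linear best response can be recovered from the quadratic loss by a first-order condition. The display below is a sketch of that calculation, assuming (as stated above) that $\Gamma$ is row-stochastic with zero diagonal, so that $\sum_{j \neq i} \gamma^{ij} = 1$; it treats others’ actions as random variables whose conditional expectations $E^i a^j$ agent $i$ takes as given.

```latex
% Agent i, at signal t^i, maximizes E^i[u^i(a, \theta)] over a^i.
\begin{align*}
\frac{\partial}{\partial a^i} E^i\!\left[ u^i(a, \theta) \right]
  &= -2(1-\beta)\bigl(a^i - E^i y\bigr)
     - 2\beta \sum_{j \neq i} \gamma^{ij} \bigl(a^i - E^i a^j\bigr) = 0 \\
\Longrightarrow \quad
a^i \Bigl[ (1-\beta) + \beta \sum_{j \neq i} \gamma^{ij} \Bigr]
  &= (1-\beta) E^i y + \beta \sum_{j \neq i} \gamma^{ij} E^i a^j \\
\Longrightarrow \quad
a^i &= (1-\beta) E^i y + \beta \sum_{j \neq i} \gamma^{ij} E^i a^j ,
\end{align*}
% where the last step uses \sum_{j \neq i} \gamma^{ij} = 1.
```

The objective is strictly concave in $a^i$ whenever $\beta \in [0, 1]$, so the first-order condition identifies the maximizer; and since the right-hand side is a convex combination of quantities in $[0, M]$, the constraint $a^i \in [0, M]$ does not bind.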
A “meetings” interpretation of the weights $\gamma^{ij}$ is that $i$ has to commit to an action before knowing which agent he will interact with, and $i$ assesses that the probability of interacting with $j$ is $\gamma^{ij}$.

3.1.2. Solution of the Game for Any $\beta < 1$.
To summarize the previous section, the environment in which the game is played is described by a tuple consisting of an external random variable, a network, and a coordination weight: $(y, \Gamma, \beta)$. A strategy of agent $i$ in the incomplete-information game, $s^i : T^i \to \mathbb{R}$, specifies an action for each signal. Write $s^i(t^i)$ for the action chosen by agent $i$ upon observing signal $t^i$. Then agent $i$’s best response to strategy profile $s = (s^i)_{i \in N}$ is given by
\[ \mathrm{BR}^i(s) = (1 - \beta) E^i y + \beta \sum_{j \neq i} \gamma^{ij} E^i s^j. \tag{6} \]

Fact 1. If $\beta < 1$, the $\beta$-game in the environment given by $(y, \Gamma, \beta)$ has a unique rationalizable strategy profile, and it is given by
\[ s^{i*}(y, \Gamma, \beta) = (1 - \beta) \left( \sum_{n=1}^{\infty} \beta^{n-1} x^i(n; y, \Gamma) \right). \tag{7} \]

Footnote: One reason to focus on rationalizability is that because we do not have a common prior, there is some inconsistency in using a solution concept (equilibrium) which builds in common prior beliefs about strategic behavior (see Dekel, Fudenberg, and Levine, 2004).

Footnote: This game has a unique equilibrium (even with unbounded action spaces). This follows from the observation that the game is best-response equivalent to a team decision problem, and uniqueness in the team decision problem is shown in Radner (1962). Ui (2009) gives a general statement of this result, expressed in the language of Bayesian potential functions. Since this is a game with strategic complementarities, bounded action spaces imply that the unique equilibrium is the unique strategy profile surviving iterated deletion of strictly dominated strategies (Milgrom and Roberts (1990)). Our proof of Fact 1 proceeds by explicitly calculating the iterated elimination of dominated strategies.

To establish (7), write $R^i(k)$ for the set of $i$’s pure strategies surviving $k$ rounds of iterated deletion of strictly dominated strategies. The map $s \mapsto \mathrm{BR}(s)$ from $[0, M]^S$ to $[0, M]^S$ is a contraction mapping (with Lipschitz constant $\beta$). Thus, the sets $R^i(k)$, which are produced by the repeated application of this map to the set $[0, M]^S$, must converge to a single point satisfying $s = \mathrm{BR}(s)$, which is an equilibrium of our game. A more detailed proof can be found in an Appendix, Section A.1.

This analysis is the asymmetric version of the analysis in Morris and Shin (2002).

3.1.3. Conventions: Equilibrium for $\beta = 1$ and $\beta \uparrow 1$.
There is a sharp distinction between the game with $\beta < 1$ and the game with $\beta = 1$. In the latter case, there is a continuum of equilibria, one for each $a \in [0, M]$. In these equilibria, agents all choose the same action independent of their signals and thus of the state. To see why, recall that every agent’s action must be equal to his (weighted) expectation of others’ actions. But now consider the highest action ever played in some equilibrium (i.e., given some signal of some agent). The agent $i$ taking that highest action at some signal $t^i$ must be sure that that highest action is being taken by every other agent $j$ who observes a signal $t^j$ that $i$ considers possible when he observes signal $t^i$. Now, however, the same logic applies to agent $j$ observing that signal $t^j$. Continuing in this way, our joint connectedness assumption implies that the highest action must be played by all agents for all signal realizations.
This argument and result appear in Shin and Williamson (1996), who label the resulting play (constant across agents and signals) a convention, because each agent is always choosing the same action and is choosing that action because others do.

To summarize: When $\beta < 1$, there is a unique equilibrium, with agents' actions depending on their higher-order expectations of $y$. When $\beta = 1$, there is a continuum of “conventional” equilibria. What happens as $\beta \uparrow 1$? The play is described by a limit of unique equilibria, which turns out to be well-defined:

$$\lim_{\beta \uparrow 1} (1-\beta) \left( \sum_{n=1}^{\infty} \beta^{n-1} x^i(n) \right).$$

By an application of the argument of the previous paragraph to the limiting payoffs, under joint connectedness the limit must feature “conventional” play, not depending on one's signal or identity. The existence of the limit and a characterization of the action played in it will be formalized in the next section; the main result is Proposition 1. The limit can be seen as a selection among the continuum of equilibria of the $\beta = 1$ game [...] $\beta \uparrow 1$ [...] $\beta \in [0, 1]$. From now on, we will focus on the $\beta \uparrow 1$ limit.

Footnote: Weinstein and Yildiz (2007a) have argued that, in a fixed linear best response game, very high-order beliefs have only a small impact on rationalizable play; this contrasts with the better-known observation in Weinstein and Yildiz (2007b) that very high-order beliefs can have an arbitrarily high impact in general games. We get sensitivity to high-level higher-order beliefs in linear best response games because we are looking at $\beta \uparrow 1$.
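The contraction argument and the behavior as $\beta \uparrow 1$ can be illustrated numerically. The following is a minimal sketch, not the paper's own computation: the two-agent environment, the numbers in `FIRST_ORDER` and `AVERAGING`, and all function names are hypothetical choices made for illustration. It iterates the best-response map (6) to its fixed point, compares the result with a truncation of the series (7), and shows that the spread of actions across signals shrinks as $\beta$ approaches 1.

```python
# Hypothetical two-agent example; every number below is an illustrative assumption.
# Signals are stacked in the order (t^1_a, t^1_b, t^2_a, t^2_b).
FIRST_ORDER = [0.2, 0.9, 0.4, 0.6]  # E^i[y | t^i] for each signal

# AVERAGING[r][c] = gamma^{ij} * pi^i(t^j | t^i) for row signal t^i, column signal t^j.
# With two agents and gamma^{12} = gamma^{21} = 1, each row is a belief over the
# other agent's signals, so every row sums to 1.
AVERAGING = [
    [0.0, 0.0, 0.7, 0.3],
    [0.0, 0.0, 0.2, 0.8],
    [0.6, 0.4, 0.0, 0.0],
    [0.1, 0.9, 0.0, 0.0],
]


def expected_others(s):
    """For each signal t^i, return sum_j gamma^{ij} E^i[s^j | t^i]."""
    return [sum(w * v for w, v in zip(row, s)) for row in AVERAGING]


def best_response(s, beta):
    """Best-response map (6), applied signal by signal."""
    others = expected_others(s)
    return [(1 - beta) * f + beta * o for f, o in zip(FIRST_ORDER, others)]


def solve_by_iteration(beta, tol=1e-12, max_rounds=1_000_000):
    """Iterate the best-response map from s = 0 until it stops moving."""
    s = [0.0] * len(FIRST_ORDER)
    for _ in range(max_rounds):
        new_s = best_response(s, beta)
        if max(abs(a - b) for a, b in zip(new_s, s)) < tol:
            return new_s
        s = new_s
    raise RuntimeError("no convergence; is beta < 1?")


def solve_by_series(beta, terms):
    """Truncation of formula (7): (1 - beta) * sum_n beta^(n-1) x(n)."""
    x = list(FIRST_ORDER)  # x(1): first-order expectations
    total = [0.0] * len(x)
    for n in range(1, terms + 1):
        total = [t + beta ** (n - 1) * v for t, v in zip(total, x)]
        x = expected_others(x)  # x(n+1) obtained from x(n)
    return [(1 - beta) * t for t in total]


if __name__ == "__main__":
    for beta in (0.5, 0.9, 0.999):
        s = solve_by_iteration(beta)
        print(beta, [round(v, 4) for v in s], "spread:", round(max(s) - min(s), 4))
```

Starting the iteration from any other profile in $[0, M]^S$ would reach the same fixed point, since the map shrinks sup-norm distances by the factor $\beta$.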
Our main results focus on $\beta$ being very close to, but not equal to, 1. Some of our results apply to, or have implications for, the more general situation with $\beta$ much smaller than 1, and we discuss that case when appropriate.

3.1.4. Conventions with High Coordination Weights: Preview of Main Results. Our results in Sections 6, 7, and 8 characterize that limit convention in some environments:

1. Under the common prior assumption, the convention is equal to the common ex ante expectation of $y$. If agents share a common prior on signals, but not necessarily on states, then the convention is equal to a weighted average of (different) ex ante expectations that the agents hold of $y$, with each agent's expectation weighted by his eigenvector centrality in the interaction network $\Gamma$.

2. If all agents always have a small amount of second-order optimism (believing that their average counterparty is a bit more optimistic than they are), the convention will equal the highest interim expectation ever held by any agent.

3. If there is common interpretation of signals and one agent is sufficiently less informed than all other agents, then the convention will equal the ex ante expectation of that least-informed agent.

3.2. Asset Pricing.
Keynes (1936, p. 156) famously likened investment to a “beauty contest” whose outcome depends on higher-order beliefs:

“. . . professional investment may be likened to those newspaper competitions in which the competitors have to pick out the six prettiest faces from a hundred photographs, the prize being awarded to the competitor whose choice most nearly corresponds to the average preferences of the competitors as a whole; so that each competitor has to pick, not those faces which he himself finds prettiest, but those which he thinks likeliest to catch the fancy of the other competitors, all of whom are looking at the problem from the same point of view. It is not a case of choosing those which, to the best of one's judgement, are really the prettiest, nor even those which average opinion genuinely thinks the prettiest. We have reached the third degree where we devote our intelligences to anticipating what average opinion expects the average opinion to be. And there are some, I believe, who practise the fourth, fifth and higher degrees.”
Keynes is presumably not suggesting that the newspaper competition winner is completely independent of “prettiness” but rather that each competitor has an incentive to try to match the average expectation of prettiness, and then some average expectation of such average expectations, and so on. We will study an asset pricing model where asset prices will correspond to solutions of the coordination game above and thus to the description of investment behavior that Keynes gives.

3.2.1. Asset Market. Suppose that there are several populations or classes, indexed by the elements of $N$, and each of these consists of a continuum of infinitesimal traders. There is an asset whose payoff will depend on the realization of a random variable $y$ that is measurable with respect to $\Theta$ and that takes values in $[0, M]$. The beliefs and higher-order beliefs of traders in class $i$ about the state space $\Theta$ will be given by the belief function $\pi^i$ defined in the general model; in particular, they share the same belief function. Each trader in class $i$ will also observe the same signal $t^i$. Thus, they share the same interim beliefs. All traders are risk-neutral and there is no discounting. A single unit of the asset will be traded among all classes of traders. There is a network $\Gamma$, which will determine where traders resell their assets in a way we are about to describe.

The trading game works as follows. Time is discrete. At each time $t$, one trader (say, in class $i$) enters owning the asset. With probability $1 - \beta$, the state is realized and the owner of the asset consumes the realization of the asset (with the interpretation that this corresponds to liquidity needs). He then exits the game. If not (and so with probability $\beta$), a class of traders $j$ is selected randomly (and exogenously). The asset owner believes that class $j$ is selected with probability $\gamma^{ij}$. The owner must then sell the asset in a market consisting of all traders of class $j$ who have not yet exited. There is Bertrand competition in market $j$, with each buyer (i.e., remaining trader in class $j$) offering a price $p$ and the seller (in class $i$) deciding to whom to sell the asset. We then enter period $t + 1$ with a trader in class $j$ holding the asset.

3.2.2. Equilibrium Asset Prices.
We will consider symmetric Markov subgame perfect equilibria of the asset trading game described in Section 3.2.1. By “symmetric Markov,” we mean that each trader's offer will depend only on the class to which he belongs and the class of the current owner from whom he is buying.

The main result about this asset market is that there is a unique symmetric, Markov, subgame-perfect equilibrium where, whenever the asset is sold in market $j$, the traders in market $j$ with signal $t^j$ offer a price equal to $s(t^j)$, where $s = s_*(\beta)$ as defined in (7), and owners sell to any trader in class $j$ offering the highest price. In other words, traders always set prices equal to the equilibrium of the linear best-response game of the previous section. To see why, note first that a trader's willingness to pay for the asset does not depend on whom he is buying the asset from. Also, observe that in a symmetric equilibrium, traders must be setting prices equal to their willingness to pay. Thus equilibrium asset prices must satisfy equation (7), i.e., the equilibrium condition from the linear best-response game.

Our analysis does depend on the restriction to symmetric Markov strategies and equilibrium rather than rationalizability as a solution concept. If we did not impose the Markov assumption, there would be “bubble” equilibria, with the asset price growing exponentially. We also used the assumption of equilibrium in our analysis, when we directly assumed that the prices satisfy equation (7), rather than (as we did in Section 3.1.2) arguing that this condition follows from some weaker solution concept. We used symmetry when we assumed that all members of a given class price the asset the same way whenever they have the opportunity to buy.

3.2.3. Asset Pricing with High-Frequency Trading: Preview of Main Results. Taking the limit $\beta \uparrow 1$ [...] Our main results below have implications for the asset prices which parallel the statements of Section 3.1.4 applied to the game.

3.2.4. Techniques and Related Models of Asset Pricing.
In the special case where the network is uniform, we could have derived the same asset pricing formula in a standard dynamic CARA-normal rational expectations model, with overlapping generations of agents, as studied by Grundy and McNichols (1989) and many others. In each period, the market will shut down with probability $1 - \beta$, and the current old agents will consume a terminal value of the asset. If it does not shut down, the old will sell the asset to the young. In each period, the young will inherit the distribution of signals about the terminal value of the old. The asset price will equal the forward-looking risk-adjusted iterated expectation of the value of the asset. If the variance of noise traders in the market increased without bound, there would be no learning in the market and the expected risk-adjusted price would be equal to the iterated average expectation. Allen, Morris, and Shin (2006) show this for a finite truncation of this environment with $\beta = 1$. The dynamic CARA-normal rational expectations model is studied under the common prior assumption. Banerjee and Kremer (2010) and Han and Kyle (2017) have studied the role of heterogeneous prior beliefs in the static version of the model.

Footnote: Recall footnote 14 on the comparison of rationalizability and equilibrium in this context.

Footnote: Steiner and Stewart (2015) have used high-frequency limits to show a similar convergence to public random variables.

This asset market combines features that appear in many other asset pricing models, and we now review some of the connections. Harrison and Kreps (1978) study an asset market where an asset is re-traded in each period between different risk-neutral agents with heterogeneous prior beliefs. They focus on the minimal price paths, in order to rule out bubbles based purely on everyone's expectation that prices will rise based on calendar time; we achieve a similar effect with our stationarity assumption. We allow asymmetric information but make exogenous the agent to whom another agent must sell. Duffie and Manso (2007) study a random matching model of trade, where traders are matched in pairs at each time period. They focus on information percolation over time with a simple updating rule, while we focus on effects due to higher-order beliefs; our matching technology is also more general. Malamud and Rostek (2016) study markets with an exogenous network structure of access to multiple markets, but endogenize agents' choice of how much to trade in each market.

A key simplification in our model of trading is that each agent is infinitesimal, so any learning about the asset value does not affect anyone's expectations. Steiner and Stewart (2015) obtain the same effect in a model of asymmetric information where agents do not condition on others' information. They give a behavioral interpretation of this restriction via coarse perceptions. Our model and that of Steiner and Stewart (2015) both feature the same dependence of prices only on public information among the agents; the limit where trading becomes frequent is critical to this.

4. THE INTERACTION STRUCTURE

4.1. Interaction Structure.
One contribution of this paper is to show that the information structure and the network structure can be seen from a unified perspective, both in studying higher-order average expectations and, consequently, for our applications. In particular, we will define an interaction structure (a square matrix indexed by the set $S$ comprising the union of everyone's signals) that simultaneously captures beliefs and the network. This serves two purposes. First, it highlights the symmetry between information and the network. Second, it facilitates relating higher-order average expectations to a Markov matrix and its iteration, which is an important technique for us. Indeed, we will use a Markov process representation to deduce the results that follow from results about Markov processes.

Footnote: One can also give their results an interpretation in terms of heterogeneous beliefs and asymmetric information.

Let $S = \bigcup_{i \in N} T^i$ be the union of the (disjoint) sets of signals. Define $x(n) : S \to \mathbb{R}$ by $[x(n)](t^i) = [x^i(n)](t^i)$. In words, this one function is a parsimonious way of keeping track of the higher-order average expectations of all agents at stage $n$. A random variable $y : \Theta \to \mathbb{R}$ that depends on the external state is viewed as a vector indexed by $\Theta$, i.e., $y \in \mathbb{R}^{\Theta}$. The first-order expectation map $y \mapsto x(1)$ can then be viewed as a map $\mathbb{R}^{\Theta} \to \mathbb{R}^{S}$. Using the standard bases for the domain and codomain, we can represent this map via a matrix. Indeed, we can write $x(1) = F y$, where $F$ is a matrix with rows indexed by the signals $t^i \in S$ and columns indexed by $\Theta$, and whose entries are

(8) $F(t^i, \theta) = \pi^i(\theta \mid t^i).$

Even though the rows and columns of this matrix are not ordered, we can define matrix multiplication by stipulating that

$$(F y)(t^i) = \sum_{\theta \in \Theta} F(t^i, \theta) \, y(\theta).$$

It is immediate to check that with this definition, $(F y)(t^i)$ is indeed $i$'s subjective expectation of $y$ when $i$ receives signal $t^i$.

Along the same lines, the formula of (4), $x^i(n+1; y) = \sum_{j \in N} \gamma^{ij} E^i x^j(n; y)$, can be described in matrix notation. Equation (4) defines a linear map $\mathbb{R}^{S} \to \mathbb{R}^{S}$ such that $x(n) \mapsto x(n+1)$. Using the standard basis for $\mathbb{R}^{S}$ (as both the domain and codomain) we can write

$$x(n+1) = B \, x(n),$$

where $B$ is a matrix with rows and columns indexed by $S$, and entries

(9) $B(t^i, t^j) = \gamma^{ij} \pi^i(t^j \mid t^i).$

We call $B$ the interaction structure. It captures the weights (arising from both the network and agents' beliefs) that matter for iterating agents' expectations.

Combining the above, we find, for $n \geq 1$, the short formula

(10) $x(n) = B^{n-1} F y,$

which describes the step-$n$ higher-order average expectations. Thus, understanding their behavior boils down to studying powers of the linear operator $B$. One can check that:

Fact 2. The interaction structure $B$ is row-stochastic.

To verify this, note that for each $t^i \in S$ we have

$$\sum_{t^j \in S} B(t^i, t^j) = \sum_{j \in N} \sum_{t^j \in T^j} \gamma^{ij} \pi^i(t^j \mid t^i) = \sum_{j \in N} \gamma^{ij} \sum_{t^j \in T^j} \pi^i(t^j \mid t^i) = 1.$$

The last equality follows because $\pi^i(\cdot \mid t^i)$ induces a probability distribution over $T^j$ and $\Gamma$ is row-stochastic.

We will occasionally emphasize the dependence of the matrices we have defined on $\pi = (\pi^i(\cdot \mid t^i))_{t^i \in T^i, \, i \in N}$, and the dependence of $B$ on the network $\Gamma$, by writing $F_{\pi}$ and $B_{\pi, \Gamma}$, and similarly for derived objects.

The interaction structure $B$ allows us to recover a matrix corresponding to one agent's beliefs about another. For any $i$ and $j$, if we set $\gamma^{ij}$ to 1 and all the other entries of $\Gamma$ to 0, then $B$ restricts naturally to an operator $B^{ij} : \mathbb{R}^{T^j} \to \mathbb{R}^{T^i}$ sending $T^j$-measurable random variables to $i$'s conditional beliefs about them. The entries of the matrix are $B^{ij}(t^i, t^j) = \pi^i(t^j \mid t^i)$.

Equation (10) entails a sharp separation between (i) agents' first-order beliefs about $\Theta$, on the one hand, and (ii) the network and their beliefs about each other's signals, on the other. The former are encoded in $F$, and the latter in $B$.

Footnote: Samet (1998a) introduced and used a Markov process as a representation of an information structure. We construct a related, but different, process: Ours simultaneously captures the network and agents' beliefs and operates on the union of signals instead of realizations. See Golub and Morris (2017) for the exact analogue of Samet's process.

Footnote: Recall that this object appeared in the definition of joint connectedness in Section 2.5. It should not be confused with the product set $T = \prod_{i \in N} T^i$, whose elements are signal profiles.

4.2. The Consensus Expectation via the Interaction Structure.
In Section 2.6, we defined the consensus expectation. The formalism we have introduced will allow us to prove Proposition 1, below, on its existence, and in the process also to relate it to properties of the matrix $B$.

Recalling Definition 2, the consensus expectation is the number in every entry of the following vector:

(11) $\lim_{\beta \uparrow 1} (1-\beta) \sum_{n=1}^{\infty} \beta^{n-1} x(n; y).$

The notation introduced in Section 4.1 above allows us to rewrite this as

(12) $\lim_{\beta \uparrow 1} (1-\beta) \left( \sum_{n=0}^{\infty} \beta^{n} B^{n} \right) F y.$

In this section, we use the formalism we have introduced to explain why this limit exists and why it is a constant vector, as well as to characterize it. The following is our main result on this, which shows that the consensus expectation (recall Definition 2 in Section 2.6) is well-defined.

Proposition 1. The consensus expectation exists and

(13) $c(y; \pi, \Gamma) = \sum_{t^i \in S} p(t^i) \, E^i[y \mid t^i],$

where $p$ is the unique vector $p \in \Delta(S)$ satisfying $pB = p$. All entries of $p$ are positive, and it is called the vector of agent-type weights.

Thus the consensus expectation of $y$ is a weighted average of the expectations associated with the various signals of each agent, encoded in $F y$; the weight on the expectation of signal $t^i$ of agent $i$, or simply type $t^i$, is given by $p(t^i)$. Note that, by definition, $p$ is the stationary distribution of $B$ viewed as a Markov matrix.

A simple but important separation can be read off from the formula of Proposition 1. The vector $p$, because it is uniquely defined by $B$ (by the Perron–Frobenius Theorem), depends only on the entries of $B$, which in turn depend only on the network weights $\gamma^{ij}$ and on agents' interim marginals on one another's signals, $\pi^i(t^j \mid t^i)$. Thus, these features of the model jointly determine the weights $p(t^i)$. Beliefs about $\Theta$ enter only through $E^i[y \mid t^i]$. This reflects the separation noted at the end of Section 4.1. Thus, the interesting effects arising from higher-order beliefs will be characterized by explaining how the information structure affects $p$; see, for instance, Sections 7 and 8.

For our analysis, we have fixed a $y$ throughout; however, note that if $y$ were arbitrary, Proposition 1 would hold with the same $p$ for all $y$.

To see why Proposition 1 holds, first note that if

(14) $\lim_{n \to \infty} B^{n} F y$

exists, then this limit will equal (11). This is because (11) is the weighted mean of terms of the form $B^{n} F y$; as $\beta \uparrow 1$, most of the weight is assigned to the terms corresponding to large values of $n$. To give intuition, here we will assume that (14) exists, though our result is more general as shown in the proof of Proposition 1 in Appendix A.2.

Footnote: Recalling Section 2.1.1, we use the terminology of a type here for a signal to emphasize the interim perspective: All that matters for higher-order expectations (and hence consensus expectations) are an agent's interim beliefs (including higher-order beliefs), and agents' types fully capture these.

Footnote: Sometimes (11) will exist when (14) does not, because for large $n$, the vector $B^{n} F y$ cycles (approximately) among several limit vectors. In this case, (11) takes an average of these vectors. We discuss these issues further in Appendix D.1.

Joint connectedness will imply that the matrix $B$ is irreducible. Thus, by a standard fact about such matrices, every row of $B^{\infty}$ is $p$, assuming this limit exists. Writing $\mathbf{1}$ for the function (vector in $\mathbb{R}^{S}$) that takes a constant value of 1 on all of $S$, for any vector $z \in \mathbb{R}^{S}$ we have

(15) $\lim_{\beta \uparrow 1} (1-\beta) \sum_{n=0}^{\infty} \beta^{n} B^{n} z = (pz) \mathbf{1},$

where $p$ is as defined in the statement of Proposition 1. In the analysis of (11), we set $z = F y$. A variant of the standard Markov chain result shows that (15) holds more generally, even when the limit of the $x(n)$ in (10) does not exist.

Proposition 1 implies that higher-order average expectations converge in the sense of (11) to a number which is independent of the agent and of his signal: the consensus expectation. Thus, in the coordination game, agents' actions in the $\beta \uparrow 1$ limit [...] Of course, the consensus expectation depends, in general, on all the interim beliefs $(\pi^i)_{i \in N}$ and on the network $\Gamma$.

4.3. A Markov Process Interpretation of the Interaction Structure and the Consensus Expectation.
The interaction structure $B$ is a row-stochastic or Markov matrix, and corresponds to a Markov process that we construct, with $S$ playing the role of the state space. We can imagine a particle starting at some state $t^i \in S$, and the probability of transitioning to $t^j \in S$ being $\gamma^{ij} \pi^i(t^j \mid t^i)$.

This process can be useful for understanding the behavior of higher-order average expectations. Fix a signal $t^i \in S$ and consider the Markov process started at this $t^i$, with its (random) location over time captured by the random variables $W_1 = t^i, W_2, W_3, \ldots$. If we define a function $f : S \to \mathbb{R}$ such that $f(t^i) = (F y)(t^i)$, then $x^i(n)$, the $n$th-order average expectation of $y$, is the expected value of $f(W_n)$. The vector of agent-type weights discussed in Section 4.2 is the stationary distribution of the chain, and the consensus expectation of $y$ is the expected value of $f(W)$ where $W$ is drawn according to the stationary distribution.

The process we have defined provides a physical analogy that is useful for intuition and also suggests proof techniques; see Sections 7 and 8.

Footnote: The meaning of irreducibility in our context is discussed further in Section C.1.

Footnote: If there are public events, the consensus is nonrandom once public information is taken into account. See Section C.1.2 for further discussion.
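The objects of this section can be assembled in a few lines of code. The sketch below is an illustration under stated assumptions rather than the paper's own procedure: the three-agent network `GAMMA`, the interim beliefs `BELIEFS`, the first-order expectations `F_Y`, and all function names are hypothetical. It builds $B$ from (9), checks Fact 2, approximates the stationary vector $p$ with $pB = p$, and compares the weighted average in (13) with the discounted average in (12) for $\beta$ close to 1.

```python
# Hypothetical three-agent example with two signals per agent; all numbers are
# illustrative assumptions, not taken from the paper.
GAMMA = [  # network weights: zero diagonal, rows sum to 1
    [0.0, 0.5, 0.5],
    [0.8, 0.0, 0.2],
    [0.3, 0.7, 0.0],
]
# BELIEFS[(i, j)][a][b] = pi^i(t^j = b | t^i = a): i's interim belief about j's signal.
BELIEFS = {
    (0, 1): [[0.7, 0.3], [0.2, 0.8]],
    (0, 2): [[0.6, 0.4], [0.5, 0.5]],
    (1, 0): [[0.9, 0.1], [0.4, 0.6]],
    (1, 2): [[0.3, 0.7], [0.1, 0.9]],
    (2, 0): [[0.5, 0.5], [0.2, 0.8]],
    (2, 1): [[0.6, 0.4], [0.3, 0.7]],
}
F_Y = [0.1, 0.8, 0.3, 0.9, 0.2, 0.6]  # (F y)(t^i) = E^i[y | t^i], stacked over S


def interaction_structure(gamma, beliefs, signals_per_agent=2):
    """Assemble B(t^i, t^j) = gamma^{ij} * pi^i(t^j | t^i), as in (9)."""
    n = len(gamma)
    size = n * signals_per_agent
    b = [[0.0] * size for _ in range(size)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # gamma^{ii} = 0, so the diagonal blocks stay zero
            for a in range(signals_per_agent):
                for c in range(signals_per_agent):
                    row = i * signals_per_agent + a
                    col = j * signals_per_agent + c
                    b[row][col] = gamma[i][j] * beliefs[(i, j)][a][c]
    return b


def stationary_distribution(b, rounds=5000):
    """Approximate p with p B = p by iterating the lazy chain (I + B) / 2."""
    size = len(b)
    p = [1.0 / size] * size
    for _ in range(rounds):
        moved = [sum(p[r] * b[r][c] for r in range(size)) for c in range(size)]
        p = [0.5 * old + 0.5 * new for old, new in zip(p, moved)]
    return p


def consensus_expectation(p, first_order):
    """Weighted average in (13): sum over signals of p(t^i) * E^i[y | t^i]."""
    return sum(w * f for w, f in zip(p, first_order))


def discounted_average(b, z, beta, rounds=30000):
    """Approximate (1 - beta) * sum_{n>=0} beta^n B^n z via v = (1-beta) z + beta B v."""
    v = list(z)
    for _ in range(rounds):
        bv = [sum(w * x for w, x in zip(row, v)) for row in b]
        v = [(1 - beta) * zi + beta * x for zi, x in zip(z, bv)]
    return v


if __name__ == "__main__":
    B = interaction_structure(GAMMA, BELIEFS)
    print("row sums:", [round(sum(row), 12) for row in B])
    p = stationary_distribution(B)
    print("agent-type weights:", [round(w, 4) for w in p])
    print("consensus:", round(consensus_expectation(p, F_Y), 4))
    print("beta = 0.999:", [round(x, 4) for x in discounted_average(B, F_Y, 0.999)])
```

The lazy chain $(I + B)/2$ is used only as a numerical device: it has the same stationary distribution as $B$ but cannot cycle, so the iteration settles even when powers of $B$ do not.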
5. THE CONSENSUS EXPECTATION AND THE NETWORK

One simple special case of Proposition 1 arises when $|T^i| = 1$ for each $i$: There is complete information about each agent's signal. In that case, $B = \Gamma$ and so $p = e$, the eigenvector centrality vector of the network $\Gamma$. (Recall Definition 1 in Section 2.2.) It follows from (13) that

$$c(y; \pi, \Gamma) = \sum_{i} e^i E^i y,$$

where, abusing notation, $E^i y$ denotes the interim expectation of $y$ induced by the one signal that agent $i$ ever gets. This relates to network game results of Ballester, Calvó-Armengol, and Zenou (2006), and especially to the limit with high coordination motives studied in Calvó-Armengol, Martí, and Prat (2015), where play is determined by ideal points weighted by eigenvector centralities.

There is a much more general sense in which the eigenvector centralities of the agents figure in the consensus expectation:

Proposition 2. There are strictly positive priors $(\lambda^i)_{i \in N}$, with $\lambda^i \in \Delta(T^i)$, such that, for all $y$,

(16) $c(y; \pi, \Gamma) = \sum_{i} e^i \, \mathbf{E}^{\lambda^i} y,$

where the $e^i$ are the eigenvector centralities of the agents.

The expression $\mathbf{E}^{\lambda^i} y$ corresponds to an ex ante expectation of agent $i$, where the expectation is taken according to a pseudoprior $\lambda^i$ over $i$'s signals that need not be related to agent $i$'s actual prior $\mu^i$.

Recalling (13), we can see that this result asserts $e^i \lambda^i(t^i) = p(t^i)$, and indeed its content is that agent $i$'s agent-type weights sum to his eigenvector centrality, $e^i$. This is formally stated in the following lemma, which is what we use to prove Proposition 2, and which also relates to the Markov interpretation of consensus expectations in Section 4.3.

Lemma 1. For each $i$, the agent-type weights associated with agent $i$'s types add up to the eigenvector centrality of $i$:

$$\sum_{t^i \in T^i} p(t^i) = e^i.$$

Proof. Let $\iota : S \to N$ map any type $t^i$ to the agent $i$ whose type it is. Check that $V(n) := \iota(W(n))$ is a Markov process on $N$ with transition matrix $\Gamma$. Now the stationary probabilities of the process $W$ are given by $p$, and the total stationary probability of the set $T^i \subseteq S$ under $W$ is therefore $\sum_{t^i \in T^i} p(t^i)$. By the coupling between $V$ and $W$, this must be equal to the stationary probability of $i$ under $V$, which is $e^i$. □

The proof of Proposition 2 is completed by making the definition $\lambda^i(t^i) = p(t^i) / e^i$, which is legitimate because all the centralities $e^i$ are positive (see comments after Definition 1).

Generally, the pseudopriors $\lambda^i$ will depend on both the information structure $\pi$ and the network $\Gamma$. We will be especially interested in when the $\lambda^i$ depend only on beliefs. The next section gives some conditions for this, and the issue is discussed more generally in Section C.3.

5.1. Interpreting the Interaction Structure as a Network.
As we mentioned at the end of Section 3.1.1, in the context of the interaction game, the weights $\gamma^{ij}$ can be interpreted as $i$'s subjective probabilities of meeting or interacting with various others at the time he has to commit to his action. In light of this interpretation, $B(t^i, t^j) = \gamma^{ij} \pi^i(t^j \mid t^i)$ can be seen as a subjective probability assessed by agent $i$, when he has signal $t^i$, that his partner in the game will have signal $t^j$: The first factor, $\gamma^{ij}$, is $i$'s probability of meeting $j$, and $\pi^i(t^j \mid t^i)$ is the probability, conditional on that meeting, that $j$ has signal $t^j$. (An agent may be privately informed about his weights or interaction probabilities. This kind of uncertainty relates to that studied by Galeotti, Goyal, Jackson, Vega-Redondo, and Yariv (2010); see our discussion in Section C.5.2.)

In this sense, the environment can be reduced, from the perspective of each player, purely to incomplete information. Relatedly, we can reduce the analysis purely to networks. To this end, we construct a new environment (whose objects are distinguished by hats) based on the primitives of the original environment. In this environment the new set of agents, $\hat{N}$, is $S$, the set of all signals. The network is $\hat{\Gamma}(t^i, t^j) = \gamma^{ij} \pi^i(t^j \mid t^i)$; there is complete information about signals (each agent has a singleton type $t^i$, which is also his agent label); and the first-order beliefs of the new agents replicate those of the corresponding types. Now, the higher-order average expectation vector of this new environment, $\hat{x}(n; y)$, is the same as $x(n; y)$. All statements about higher-order average expectations in the original game of incomplete information can be reinterpreted in this complete-information environment as network quantities. For instance, to get the second-order average expectation of a type $t^i$, we look at the corresponding agent in the network, and take the average, across all his neighbors, of their neighbors' first-order expectations.

Footnote (to the proof of Lemma 1): The reasoning is as follows: The probability of the event $\{V(n+1) = j\}$ conditional on $\{V(n) = i\}$ is equal to $\gamma^{ij}$: For any $t^i \in T^i \subseteq S$, we have $\sum_{t^j \in T^j} B(t^i, t^j) = \sum_{t^j \in T^j} \gamma^{ij} \pi^i(t^j \mid t^i) = \gamma^{ij} \sum_{t^j \in T^j} \pi^i(t^j \mid t^i) = \gamma^{ij}$.

Footnote: Under the obvious bijection of indices.

To summarize: We have taken all uncertainty about others' signals, and combined it with the original network weights, to obtain the new network weights $\hat{\Gamma}$. From this perspective, the game of incomplete information of Section 3.1.1 is reduced to the network game studied by Ballester, Calvó-Armengol, and Zenou (2006). This transformation is essentially the transformation of the game of incomplete information into an agent normal form. (For a conceptually similar reduction, see Morris (1997). The tensor products of de Martí and Zenou (2015) can also be seen as instances of this in a specific setting of exchangeable information.)

6. UNIFYING AND GENERALIZING NETWORK AND ASYMMETRIC INFORMATION RESULTS
We now study conditions under which the agent-type weights take a particularly simple form. Under these conditions, there are formulas for consensus expectations that decompose nicely into different individuals' prior expectations, weighted by those individuals' centralities.

Recall that agents' priors are given by the profile $(\mu^i)_{i \in N}$ of distributions, with $\mu^i \in \Delta(T^i)$.

Definition 3. There is a common prior over signals (CPS) if, for each signal profile $t \in T$ and each $i, j \in N$, we have $\mu^i(t^i) \pi^i(t^{-i} \mid t^i) = \mu^j(t^j) \pi^j(t^{-j} \mid t^j)$.

CPS does not imply a common prior over the states $\Theta$; agents may have inconsistent beliefs about $\theta$. A common prior on signals could arise if each agent first observed a signal drawn according to the common prior but interpreted signals differently. However, CPS does imply that there is a common prior over agents' second-order and higher-order beliefs.

Now we can show that under CPS, the distributions $\lambda^i$ in the representation $c(y; \pi, \Gamma) = \sum_{i} e^i \mathbf{E}^{\lambda^i} y$ of Proposition 2 are ex ante probability distributions on signals, i.e., $\lambda^i = \mu^i$; the pseudopriors are the actual priors. Recall from Section 2.1.2 that bold expectation operators denote ex ante expectations.

Proposition 3. If there is a common prior over signals, then the consensus expectation is equal to the eigenvector-centrality weighted average of the ex ante expectations of the agents:

$$c(y; \pi, \Gamma) = \sum_{i} e^i \, \mathbf{E}^{\mu^i} y,$$

where $\mu^i$ is the prior over $i$'s signals.

Proposition 3 shows that the consensus expectation is a weighted average of agents' prior expectations, $\mathbf{E}^{\mu^i} y$, weighted by agents' network centralities, $e^i$. We say in this case that there is a separability between the network and the information structure: The network enters only into the centralities, and the information structure determines $\mathbf{E}^{\mu^i} y$. (See Section C.3 for further discussion of this property.) Under complete information about signals but heterogeneous priors about $\Theta$, this yields a reinterpretation of the DeGroot model, as we discuss further in Section 9.

In terms of the generality of the information structure, Proposition 3 goes beyond previous related results that decomposed equilibrium actions into agent-specific quantities weighted by agents' centralities. Results in this category include Calvó-Armengol, Martí, and Prat (2015), Bergemann, Heumann, and Morris (2015b) and Myatt and Wallace (2017) (which rely on Gaussian signals), de Martí and Zenou (2015) (which relies on exchangeable signals), and Blume, Brock, Durlauf, and Jayaraman (2015) (which does not characterize the contribution of higher-order expectation terms). Formal details of each of these models differ in several ways from our model, but in not imposing parametric or symmetry conditions, and in allowing heterogeneous priors about states, our result on the decomposition at the $\beta \uparrow 1$ limit [...]

Corollary 1. If there is a common prior over signals and $\mathbf{E}^{\mu^i} y = \bar{y}$ for all $i$ (that is, agents have a common ex ante expectation of the external random variable), then the consensus expectation is equal to the (common) ex ante expectation:

$$c(y; \pi, \Gamma) = \bar{y}.$$

Corollary 1 is closely related to Samet (1998a), which shows that if the common prior assumption (over the whole space $\Omega$) holds, then any sequence of expectations ($A$'s expectation of $B$'s expectation . . .) of the random variable is equal to the ex ante expectation of the random variable $y$. Since limits of such iterated expectations determine the consensus expectation, it is also equal to $\bar{y}$. Note, however, that the hypotheses of Corollary 1 are weaker than the full common prior assumption, because they impose no restrictions on the joint distribution of $\theta$ and signals.

Proposition 3 is also closely related to Samet (1998a) in the following sense: given a common prior over signals, the highly iterated expectation of any random variable measurable with respect to agent $i$'s signal is equal to the common prior expectation of that random variable, that is, the expectation of it with respect to the measure $\mu^i$. Our results show that these prior expectations are combined according to agents' network centrality weights. Sections C.3 and C.4.2 elaborate further on these issues, as well as a converse to Samet's result.

7. CONTAGION OF OPTIMISM
Consider a case in which agents are second-order optimistic: they are optimistic about the expectations of those they interact with. That is, they believe that, on average, those others have higher expectations than their own. In this circumstance, we will give conditions under which consensus expectations are driven to extremes via a contagion of optimism. Sections 3.1.4 and 3.2.3 state the interpretation of this in the game and in the asset market, respectively.
7.1. Three Illustrative Cases.
To motivate these results and to gain intuition, we first consider some extreme cases. These illustrate how the Markov process representation of higher-order expectations and its physical interpretation from Section 4.3 can yield striking results about consensus expectations. Fix a random variable with minimum realization 0 and maximum realization 1. Say that agent i considers j over-optimistic if agent i’s expectation of agent j’s expectation is always strictly greater than his own expectation (unless his own expectation is 1, in which case he is sure that agent j’s expectation is 1). Say that agent i considers j over-pessimistic if agent i’s expectation of agent j’s expectation is always strictly less than his own expectation (unless his own expectation is 0, in which case he is sure that agent j’s expectation is 0).
Case I. First, suppose that each agent considers every other agent over-optimistic. In this case the consensus expectation must be 1, independent of the network structure. (This, and the other examples we describe here, may involve violating the otherwise maintained joint connectedness assumption (irreducibility of B), but our main result in this section, Proposition 4, does not rely on the joint connectedness of B.)
[Figure 2: agents i − 1, i, and i + 1, each with signals $t^j_{k-1}$, $t^j_k$, $t^j_{k+1}$ ordered from less to more optimistic; network weights $\gamma^{i+1,i} = \gamma^{i,i-1} = 1$.]
FIGURE 2. The example of Case II, with a counterclockwise network.
Case II. Second, suppose that for every agent i, there is an agent he considers over-optimistic and another agent he considers over-pessimistic. Then there is a network structure under which the consensus expectation is 1. We can simply look at the network structure in which each agent puts all weight on agents he thinks are over-optimistic. Symmetrically, there is a network structure in which the consensus expectation is 0. These results do not depend on agents’ ex ante expectations, which might take any value between 0 and 1.
Figure 2 illustrates one example of this occurring. There are I agents, indexed by N = {1, . . . , I}, and their indices are interpreted modulo I. Each agent has many signal realizations, $t^i_k$, with indices k ∈ {1, . . . , K}, with higher-k signals inducing more optimistic first-order beliefs about y. Assume that the most extreme signals lead to expectations 1 and 0. Agent i, when he has signal $t^i_k$, is certain that agent i − 1 has signal $t^{i-1}_{k+1}$, the next more optimistic signal. He is also certain that agent i + 1 has signal $t^{i+1}_{k-1}$, the next more pessimistic signal. If k is already extreme (that is, k = K or k = 1), then we replace k + 1 (respectively, k − 1) by k in the above description.
Now the two networks considered are as follows. One has each agent assigning all weight to the agent counterclockwise from him (i.e., to his left, as depicted in Figure 2). The other network has each agent assigning all weight to the one clockwise from him (i.e., to his right). Then in the counterclockwise network, the consensus expectation is 1, and in the clockwise network (not shown), the consensus expectation is 0.
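The ring example of Case II can be checked numerically. The sketch below is illustrative only and rests on stated assumptions: it takes the Markov representation of Section 4.3 to mean that a type of agent i moves to a type of agent j with probability equal to the network weight times i’s interim belief, it assumes first-order expectations that are linear in the signal index k, and it approximates the ergodic distribution by time-averaging a long run of the chain. The function names are hypothetical.

```python
def limiting_distribution(transition, start, steps=20000):
    """Time-averaged occupation frequencies of a finite Markov chain.

    `transition` maps each state to a dict {next_state: probability}.
    Averaging over time copes with periodic chains such as the ring below.
    """
    dist = {start: 1.0}
    total = {}
    for _ in range(steps):
        for s, mass in dist.items():
            total[s] = total.get(s, 0.0) + mass
        nxt = {}
        for s, mass in dist.items():
            for s2, prob in transition[s].items():
                nxt[s2] = nxt.get(s2, 0.0) + mass * prob
        dist = nxt
    return {s: mass / steps for s, mass in total.items()}


def ring_consensus(num_agents, num_signals, direction):
    """Approximate consensus expectation on the ring of Case II.

    direction = -1: each agent puts all weight on agent i - 1 (counterclockwise).
    direction = +1: each agent puts all weight on agent i + 1 (clockwise).
    """
    K = num_signals
    # Assumption: first-order expectations rise linearly from 0 (k=1) to 1 (k=K).
    first_order = {(i, k): (k - 1) / (K - 1)
                   for i in range(num_agents) for k in range(1, K + 1)}
    transition = {}
    for (i, k) in first_order:
        j = (i + direction) % num_agents
        # Type (i, k) is certain that agent i-1 holds signal k+1 and that
        # agent i+1 holds signal k-1, truncated at the extreme signals.
        k_next = min(k + 1, K) if direction == -1 else max(k - 1, 1)
        transition[(i, k)] = {(j, k_next): 1.0}
    p = limiting_distribution(transition, start=(0, (K + 1) // 2))
    return sum(p[s] * first_order[s] for s in p)


counterclockwise = ring_consensus(num_agents=4, num_signals=5, direction=-1)
clockwise = ring_consensus(num_agents=4, num_signals=5, direction=+1)
```

Under these assumptions the counterclockwise value is close to 1 and the clockwise value is close to 0; the small remainder comes from the finitely many early steps before the particle reaches an extreme signal.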
Case III. For our final case, rather than assuming any agent is over-optimistic about any other, assume instead that each agent’s expectation of the average of others’ expectations is greater than his own expectation. As always, averages are taken with respect to network weights, and “greater” is strict except in the case where an agent’s expectation is 1. This constitutes a milder form of over-optimism. Note that it is not implied by the assumptions we imposed in either of the above results: While the condition of Case III depends on the network (as in Case II), it allows for the possibility that an agent is never over-optimistic about any other particular agent (recalling that over-optimism is a condition uniform over one’s signals). Rather, which agent someone is over-optimistic about may depend on his signal. But again, the consensus expectation is 1.
Markov Process Intuitions.
The results in each of the cases above can be established by using the representation of higher-order average expectations via a Markov process, which we presented in Section 4.3.
Let us begin by explaining Case I. If a particle makes transitions over the states S according to the Markov process, then at each step it moves toward strictly more optimistic types of agents, unless it is already at a most optimistic type. In the long run the particle therefore spends all of its time at types whose expectation is 1, which is why the consensus expectation is 1. Similar arguments can be given for the other cases; see the proof in Section 7.3 for the general argument.
The cases discussed so far involve the unsatisfactory assumption that some types are certain that they are the most optimistic. It will often be unreasonable for agents to hold such extreme beliefs, or for the analyst to assume that they do. Thus, we wish to have a result that is more quantitative and more robust. Also, the networks involved in Case II are extreme, not allowing an agent to put even small amounts of weight on others whom he does not consider over-optimistic (or over-pessimistic). Our general results will relax all these assumptions.
The basic idea behind that generalization is clear: it follows from the arguments above and continuity. But the details are subtle. Indeed, what will be most interesting about the general results we obtain is the nature of the conditions that are involved. How much second-order pessimism can be permitted for the very optimistic agents without losing the contagion of optimism? By relating the situation of second-order optimism to a suitable Markov chain, we are able to give a precise bound describing how strong second-order optimism (of “most” types) must be relative to the pessimism about counterparties’ beliefs permitted for very optimistic agents.
A General Case.
We now weaken our assumptions on the most optimistic types, and allow for the possibility that when agents are maximally optimistic they assign only probability 1 − ε, for some ε > 0, to any given other being maximally optimistic. But now we assume that when an agent is not maximally optimistic, there is a uniform lower bound, δ, on the degree of over-optimism. With this weakening of our earlier assumptions, the above results remain true with an error of order ε/δ. Among other things, this allows us to use a network with $\gamma^{ij} > 0$ for all i, j in Case II above.
We state and prove a formal version of our claims in this general case, and then discuss how the claims made about our illustrative cases follow. Critically, in addition to demonstrating the continuity in beliefs we needed, this result gives quantitative bounds on how agents’ interim over-optimism translates into the consensus outcome.
Proposition 4.
Consider an arbitrary information structure π and an arbitrary network Γ (i.e., drop for this result the maintained assumption that B is irreducible). Suppose there exist $\underline{f}$ and δ > 0, ε ≥ 0 such that beliefs about neighbors are mildly optimistic in the following sense:
(1) Every type whose first-order expectation of y is strictly below $\underline{f}$ expects the first-order expectation, averaged across his counterparties, to be at least δ above his own. That is, for every $t^i$ such that $(E^i y)(t^i) < \underline{f}$, we have $\sum_j \gamma^{ij} (E^i E^j y)(t^i) \ge (E^i y)(t^i) + \delta$.
(2) Every type whose first-order expectation of y is at least $\underline{f}$ expects the first-order expectation, averaged across his counterparties, to be almost as large as his own, with a shortfall of at most ε. That is, for every $t^i$ such that $(E^i y)(t^i) \ge \underline{f}$, we have $\sum_j \gamma^{ij} (E^i E^j y)(t^i) \ge (E^i y)(t^i) - \varepsilon$.
Then the consensus expectation of y is at least $\underline{f} / (1 + \varepsilon/\delta)$.
The proof of this result, via a suitable Markov chain inequality, is provided in Section 7.3 below.
An important feature of this result is that, fixing the constants δ and ε, its hypotheses do not depend on the finite type space used to represent the environment. This allows the result to extend readily to infinite signal spaces, by considering sequences of finite ones approximating the infinite one.
We now return to Cases I–III, with expectations taking values in [0, 1], and describe how to obtain them formally as applications of this result. For Case I, where each agent considers every other one over-optimistic, set $\underline{f} = 1$. Because of finiteness of the type space, there is a δ > 0 so that hypothesis (1) of Proposition 4 holds for all types whose first-order expectations of y are strictly below $\underline{f}$. For this case, we can take ε = 0. Applying Proposition 4, we get that the consensus expectation is 1. For the case of over-pessimism, we simply apply a change of variables from y to 1 − y and use the same result to find that the consensus expectation is 0.
For Case II, we constructed two networks. In one network, each agent places all weight on some agent he considers over-optimistic. For this network, the hypotheses of Proposition 4 hold for the same reasons discussed in the previous paragraph, and we conclude that the consensus expectation is 1. In the other network, each agent places all weight on some agent he considers over-pessimistic, and by symmetry the consensus expectation is 0.
Case III is a direct application of the proposition, with ε = 0.
7.3. Markov Chain for Second-Order Optimism.
We now analyze the interaction structures corresponding to second-order optimism and discuss how to establish our results using Markov chain arguments.
Consider an arbitrary finite state space S with a Markov kernel, with B(s, s′) being the probability of transitioning from state s to s′, and fix a function f : S → R.
The purpose of this subsection is to present the following lemma: Assume that for all s such that f(s) is below a certain value $\underline{f}$, taking one step from s (according to the Markov kernel) to reach a random state W yields a value f(W) that is higher by at least δ, in expectation, than f(s). Assume also that if, in contrast, s is chosen such that f(s) exceeds $\underline{f}$, then the expected value of f(W) can decrease relative to f(s) by only a smaller amount, ε. Under these assumptions, we will show that if s is drawn from a stationary distribution of B, the expectation of f(s) is not much below $\underline{f}$. The lemma we now state makes this quantitative and precise.
We denote by $W_0, W_1, \ldots$ the stochastic process induced by the Markov chain. The symbol $P_{W_0 \sim \nu}$ denotes the probability measure corresponding to this process when $W_0$ is drawn according to a distribution ν. The notation for expectations is analogous.
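The drift conditions described above can be explored on a small example. The following sketch is not the paper’s proof; it uses a hypothetical chain on heights 0, . . . , m with f(s) equal to the height and the threshold set to m, so that the upward drift below the threshold is exactly δ and the downward drift at the threshold is exactly ε. Because the expected one-step change in f is zero under a stationary distribution, the mass at the top must satisfy δ(1 − p_top) = ε p_top, and the code compares a time-averaged simulation with that value.

```python
def stationary_by_iteration(P, start, steps=5000):
    """Time-averaged distribution of the chain with transition matrix P."""
    n = len(P)
    dist = [0.0] * n
    dist[start] = 1.0
    total = [0.0] * n
    for _ in range(steps):
        total = [t + d for t, d in zip(total, dist)]
        dist = [sum(dist[s] * P[s][s2] for s in range(n)) for s2 in range(n)]
    return [t / steps for t in total]


def height_chain(m, delta, eps):
    """Heights 0..m: step up w.p. delta below m, step down w.p. eps at m."""
    P = [[0.0] * (m + 1) for _ in range(m + 1)]
    for k in range(m):
        P[k][k + 1] = delta
        P[k][k] = 1.0 - delta
    P[m][m - 1] = eps
    P[m][m] = 1.0 - eps
    return P


# Hypothetical parameter values chosen only for illustration.
m, delta, eps = 4, 0.2, 0.05
P = height_chain(m, delta, eps)
p = stationary_by_iteration(P, start=0)

mass_at_top = p[m]                      # share of time with f(s) at the threshold
balance_value = delta / (delta + eps)   # solves delta * (1 - x) = eps * x
expected_height = sum(k * p[k] for k in range(m + 1))
```

The simulated mass at the top is close to δ/(δ + ε), so the long-run expectation of f falls short of the threshold by an amount that shrinks as ε/δ shrinks.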
Lemma 2.
Let B be a Markov chain as described above. Suppose there are real numbers δ, ε > 0 and $\underline{f}$ such that the following hold:
(1) For every s such that $f(s) < \underline{f}$, we have $E_{W_0 = s}[f(W_1)] \ge f(s) + \delta$.
(2) For every s such that $f(s) \ge \underline{f}$, we have $E_{W_0 = s}[f(W_1)] \ge f(s) - \varepsilon$.
(When ν is a point measure on s, we write $W_0 = s$ in the subscript as a shorthand.)
Fix an arbitrary starting state, and let p denote the ergodic distribution over states that is reached starting from that state. Then
$p(s : f(s) \ge \underline{f}) \ge \frac{1}{1 + \varepsilon/\delta}$.
The proof, which appears in Section A, uses the fact that $f(W_0)$ and $f(W_1)$ have the same expectations under the ergodic distribution, and uses the hypotheses of the lemma in this equation to derive the desired inequality. With this result, we can establish all the conclusions about the consensus expectation, as discussed after Proposition 4 above.
7.4. Discussion.
7.4.1. Related Results.
The result also relates to Harrison and Kreps (1978), who consider the case where risk-neutral agents have heterogeneous beliefs (but symmetric information) and trade and re-trade an asset through time. The asset is always sold to the (endogenously) most optimistic agent at the current history. The price is driven above the highest expectation of the asset’s value held by any agent. Harrison and Kreps (1978) motivate their exercise as a model of “speculation,” and our result has a similar interpretation. In both cases, which agent is most optimistic can vary: in their case, the identity of the most optimistic is determined by the public history of the performance of the asset, whereas for us it is because of asymmetric information.
A closely related paper is that of Izmalkov and Yildiz (2010). They make a primitive assumption similar to our assumption about optimism: Beliefs are all distorted in the same direction. They consider two-agent, two-action coordination games, and show that agents can be induced to take any rationalizable action, including risk-dominated ones, if the degree of optimism is high enough. (This observation illustrates a more general point of Weinstein and Yildiz (2007b) that any rationalizable action can be made uniquely rationalizable if a type is perturbed in the product topology.)
Finally, Han and Kyle (2017) report a “contagious optimism” result in a CARA-normal asset pricing model. They study a static CARA-normal pricing game in which an agent, in equilibrium, conditions on the information revealed by a counterparty’s trading. While our game is designed to pick up higher-order average expectations, their result depends on different properties of higher-order expectations (certain kinds of hierarchies in which agents wrongly assume common knowledge of the mean of an asset value). However, they similarly show that a small amount of optimism can give rise to arbitrarily high asset prices. Another difference is that their result is written for the two-agent case; any extension to many agents would require that the network be uniform, because trade takes place in centralized markets. Because they consider a world with normally distributed uncertainty, there is no upper bound on first-order expectations, and this allows contagious optimism to drive prices up without bound.
(Footnote to Lemma 2: Note that the chain need not have a unique ergodic distribution, but there is an ergodic distribution reached from any initial state.)
7.4.2. Tightness.
We now construct a chain to show the bound of Lemma 2 is tight. This shows the sufficient condition for contagion of optimism is tight: in at least some cases, it gives exactly the amount of second-order optimism needed to guarantee high consensus expectations.
Consider a chain with states $t^i_k$ for i ∈ {1, 2} and k ∈ {0, . . . , m}. Let $f(t^i_k) = k$ and define, whenever j ≠ i:
$$B(t^i_k, t^j_\ell) = \begin{cases} \delta & \text{if } \ell = k + 1 \le m, \\ 1 - \delta & \text{if } \ell = k < m, \\ \varepsilon & \text{if } k = m,\ \ell = m - 1, \\ 1 - \varepsilon & \text{if } k = m,\ \ell = m. \end{cases}$$
Thinking of k as the “height” of the chain, it ascends a step with probability δ when k is in the interval {0, 1, . . . , m − 1}. When k = m, the maximum, it moves with probability ε to height k = m − 1. Otherwise, it stands still. While we have described B as a Markov process, it can be realized as an interaction structure. Our notation suggests how to realize this chain as an interaction structure with two agents, each having m + 1 signals. In this chain the bound of Lemma 2 holds with equality, so it is essential that ε (pessimism) be bounded as in the formula of the lemma relative to the guaranteed “optimistic drift” δ.
8. TYRANNY OF THE LEAST-INFORMED
In Proposition 3, we gave a sufficient condition (common prior on signals) under which the consensus expectation is the centrality-weighted average of agents’ prior expectations. In this section, we will find conditions on the information structure under which the consensus expectation is (almost) equal to one agent’s expectation. That is, rather than influence being shared according to network centrality, it will all be allocated to one agent, in a way that will depend on the information structure. In particular, it will turn out to be the least informed agent who accumulates influence.
To motivate these results, we can again consider some extreme cases. First, suppose that one agent is completely ignorant and has no private information, while other agents know the state perfectly. The agents other than the ignorant agent will have degenerate interim beliefs, so nothing about their priors can matter for iterated expectations or the consensus. Thus, if anyone’s ex ante beliefs play a role in determining consensus expectations, it must be those of the least informed agent. It turns out that the consensus expectation is simply equal to the ignorant agent’s prior expectation of y. A simple way to see this is to note that, because the ex ante beliefs of the informed agents don’t matter, we may as well take them to be equal to the prior of the ignorant agent; then the conclusion follows by Proposition 1 on the common prior. By continuity, our result continues to hold if the ignorant agent has almost no information and the other agents have almost perfect information.
Surprisingly, this conclusion remains true when the ignorant agent is only relatively ignorant, and when his beliefs are not public as they were in the toy example. The ignorant agent may possess very precise private information about the state. But if others have even more precise (i.e., less noisy) private information, then their priors will still not matter, and only the relatively ignorant agent’s priors will determine the consensus expectation.
We now present the statement and proof of the result, and then discuss it and compare it with related results in Section 8.4.
8.1. Common Interpretation of Signals Framework.
Fix a complete Γ, i.e., one such that $\gamma^{ij} > 0$ for all i ≠ j. We specialize to a framework that we call common interpretation of signals, following the terminology of Kandel and Pearson (1995) and Acemoglu, Chernozhukov, and Yildiz (2016a). There is a state θ ∈ Θ that is drawn by nature. Each agent receives conditionally independent signals about it according to a full-support distribution $\eta^i(\cdot \mid \theta) \in \Delta(T^i)$; these distributions are common knowledge. However, the agents have different full-support priors, $\rho^i \in \Delta(\Theta)$, over the state space. Combined with the conditional distributions encoded in the $\eta^i$, these uniquely define a prior distribution over Θ × T. We denote by $E^{\rho^i}$ the corresponding prior expectation operator. These primitives also induce in each agent, via Bayes’ rule, an interim belief function; for each $t^i \in T^i$, there is a distribution $\pi^i(\cdot \mid t^i)$ over both the state and others’ signals.
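The way these primitives induce interim beliefs through Bayes’ rule can be sketched in a few lines. The example below is only an illustration: the state names, signal names, and all probabilities are hypothetical, and the helper functions are not part of the paper. It computes one agent’s posterior over states from his own prior and signal distribution, and then his belief about another agent’s signal using conditional independence of signals given the state.

```python
def interim_belief_over_states(prior, eta_i, signal):
    """Posterior over states for an agent with prior `prior` who saw `signal`.

    prior:  dict state -> probability (the agent's own prior over states)
    eta_i:  dict state -> dict signal -> probability (signal distribution)
    """
    joint = {theta: prior[theta] * eta_i[theta][signal] for theta in prior}
    norm = sum(joint.values())
    return {theta: w / norm for theta, w in joint.items()}


def interim_belief_over_other_signal(prior, eta_i, eta_j, signal):
    """Agent i's belief about agent j's signal, given i's own signal.

    Uses conditional independence of signals given the state:
    Pr(t_j | t_i) = sum over theta of Pr(theta | t_i) * eta_j(t_j | theta).
    """
    post = interim_belief_over_states(prior, eta_i, signal)
    signals_j = next(iter(eta_j.values())).keys()
    return {tj: sum(post[theta] * eta_j[theta][tj] for theta in post)
            for tj in signals_j}


# Hypothetical numbers: two states, two signals per agent.
eta_1 = {"low": {"a": 0.6, "b": 0.4}, "high": {"a": 0.4, "b": 0.6}}      # noisy
eta_2 = {"low": {"a": 0.95, "b": 0.05}, "high": {"a": 0.05, "b": 0.95}}  # precise
rho_1 = {"low": 0.5, "high": 0.5}
rho_2 = {"low": 0.8, "high": 0.2}   # a different prior over the same states

post_1 = interim_belief_over_states(rho_1, eta_1, "b")
post_2 = interim_belief_over_states(rho_2, eta_2, "b")
about_2 = interim_belief_over_other_signal(rho_1, eta_1, eta_2, "b")
```

In this illustration the agent with the precise signal ends up nearly certain of the state despite his skewed prior, while the agent with the noisy signal stays close to his prior.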
Definition 4.
We say that $\eta^i$ is at most ε-noisy if: for every θ ∈ Θ, there is exactly one signal $t^i_\theta$ satisfying $\eta^i(t^i_\theta \mid \theta) \ge 1 - \varepsilon$, and this $t^i_\theta$ also satisfies $\eta^i(t^i_\theta \mid \theta') \le \varepsilon$ for all θ′ ≠ θ.
This condition requires that for any θ, there is exactly one signal $t^i$ that i receives with very high probability conditional on θ being realized; moreover, no two different θ, θ′ can be associated with the same such signal.
Definition 5.
We say that $\eta^i$ is uniformly at least δ-noisy if, for every θ ∈ Θ and $t^i \in T^i$, the inequality $\eta^i(t^i \mid \theta) \ge \delta$ holds.
This condition says that each signal has at least δ probability of being observed under each state, limiting the amount of information that can be inferred from any signal.
8.2. Sufficient Conditions for Tyranny of the Least-Informed.
Before stating the main proposition, we introduce some quantities that will figure in it. Let $\gamma_{\min} = \min_{i \ne j} \gamma^{ij}$ be the smallest off-diagonal entry of Γ, which is positive by assumption. Let $\rho^i_{\min}$ be the minimal probability assigned to any θ ∈ Θ by the prior $\rho^i \in \Delta(\Theta)$ of agent i. Let $\rho_{\min} = \min_i \rho^i_{\min}$ be the minimum of all of these, across agents. Finally, let $y_{\max} = \max_{\theta \in \Theta} |y(\theta)|$.
Proposition 5.
Suppose that for some δ ∈ (0, 1) and ε ∈ (0, 1/2),
(1) $\eta^1$ is uniformly at least δ-noisy;
(2) $\eta^i$ for all i ≠ 1 is at most ε-noisy.
Then
(17) $|c(y; B^\pi, F^\pi) - E^{\rho^1}[y]| \le \frac{|\Theta||S|}{(\gamma_{\min}\rho_{\min})} \cdot y_{\max} \cdot \frac{\varepsilon}{\delta}$.
This bound is designed for cases where ε is much smaller than δ. It says that if agent 1’s information is at least δ-noisy, while all others’ information is quite precise (at most ε-noisy), then the difference between the consensus expectation of y and agent 1’s expectation of y is small: The upper bound is linear in ε/δ. The constants depend on the sizes of the state space and the signal space S, and on the minimum network and belief weights in the denominator.
We could formulate a version of Proposition 5 without requiring the rather strong assumption of full support of the conditional distributions $\eta^i(\cdot \mid \theta)$ that is implied by the hypotheses of Proposition 5. This is discussed below in Section 8.3.1, once we have a bit more notation.
8.3. Key Steps in the Proof of Proposition 5.
We will analyze the consensus expectation in the situation of Proposition 5 by analyzing the interaction structure B and its stationary distribution, p. Indeed, the analysis here is intended as our main illustration of the value of reducing informational questions to questions about the Markov chain corresponding to the interaction structure.
The key insight in proving Proposition 5 is to construct an artificial signal structure $\hat{\eta}$ in which all agents except agent 1 are certain of what θ is. This is done by rounding the signal probabilities $\eta^i(t^i \mid \theta)$ for i ≠ 1 to 0 or 1. Together with the priors $(\rho^i)_{i \in N}$ over θ that are part of the setup, this induces an artificial information structure $\hat{\pi} = (\hat{\pi}^i)_i$. We let $\hat{B} = B^{\hat{\pi}, \Gamma}$.
The proof then proceeds in three steps. First, we prove that p, the stationary distribution of B, is well-approximated by that of $\hat{B}$, which is denoted by $\hat{p}$. Second, we claim that $\hat{\pi}$ can be viewed as having a common prior (corresponding to agent 1’s prior beliefs). This is because only agent 1 is uncertain under $\hat{\pi}$ about θ, and so the ex ante beliefs of the others about θ can make no difference; indeed, it can be shown that the other agents’ interim beliefs are compatible with agent 1’s prior. Thus the consensus expectation of y under $\hat{B}$ is equal to $E^{\rho^1}[y]$. Finally, we combine these facts to derive the proposition. We carry out these steps below, deferring technical details to Appendix A.5.
The key technique in this argument deserves some extra comment. In the first step, where we approximate p by $\hat{p}$, we apply a result of Cho and Meyer (2000) on perturbations of Markov chains. This result, loosely speaking, says the following: As long as the changes in weights in going from $\hat{B}$ to B are small relative to the reciprocal of the maximum mean first passage time (MMFPT) of $\hat{B}$, then p is close to $\hat{p}$.
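The two quantities in this comparison, the maximum mean first passage time and the gap between stationary distributions, can be computed directly for a small chain. The sketch below is illustrative only: the three-state matrices are hypothetical, the helper names are not from the paper, and the code does not reproduce the precise inequality of Cho and Meyer (2000). It simply shows a small perturbation producing a small change in the stationary distribution when passage times are short.

```python
def stationary(P, iters=2000):
    """Stationary distribution by power iteration (assumes P is irreducible
    and aperiodic, so that the iteration converges)."""
    n = len(P)
    p = [1.0 / n] * n
    for _ in range(iters):
        p = [sum(p[s] * P[s][s2] for s in range(n)) for s2 in range(n)]
    return p


def max_mean_first_passage_time(P, sweeps=2000):
    """Largest expected number of steps to travel between two distinct states.

    Iterates m[s][t] = 1 + sum over u != t of P[s][u] * m[u][t]
    until it is numerically at its fixed point.
    """
    n = len(P)
    m = [[0.0] * n for _ in range(n)]
    for _ in range(sweeps):
        m = [[0.0 if s == t else
              1.0 + sum(P[s][u] * m[u][t] for u in range(n) if u != t)
              for t in range(n)] for s in range(n)]
    return max(m[s][t] for s in range(n) for t in range(n) if s != t)


# Hypothetical unperturbed chain and a perturbation of size e in one row.
e = 0.01
P_hat = [[0.50, 0.50, 0.00],
         [0.25, 0.50, 0.25],
         [0.00, 0.50, 0.50]]
P = [[0.50 - e, 0.50 + e, 0.00],
     [0.25, 0.50, 0.25],
     [0.00, 0.50, 0.50]]

p_hat = stationary(P_hat)
p = stationary(P)
mmfpt = max_mean_first_passage_time(P_hat)
largest_gap = max(abs(a - b) for a, b in zip(p, p_hat))
```

Here the perturbation times the passage time is small, and the stationary distributions differ by a correspondingly small amount; making the unperturbed chain slower to traverse would allow the same perturbation to move the stationary distribution more.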
In our application, the change in the interaction structure (corresponding to interim beliefs about θ of the relatively informed agents i ≠ 1) is of order ε, and that is why ε appears in the numerator of the bound in Proposition 5. In the situation of Proposition 5, the MMFPT is of order 1/δ, the inverse of the lower bound on the uninformed agent’s noise. (That is why δ appears in the denominator in the bound of Proposition 5.) But the technique we have outlined applies more broadly, in any setting where the size of the perturbation to the interaction structure can be bounded relative to the MMFPT. This could be used to weaken the assumptions of Proposition 5, for example to cover cases where noise does not have full support or the network is not complete; see Section 8.3.1 below.
(The MMFPT in the interaction structure $\hat{B}$ is defined to be the maximum expected time it takes to get from one state to another in the physical process of Section 4.3. It is a measure of the connectedness of $\hat{B}$ as a network; in belief terms, it is a measure of the maximum number of iterations required for there to be contagion of higher-order beliefs between the two “farthest” states in S.)
We carry out the details of the proof in Section A.5.
8.3.1. The Case Where No Player Is at Least δ-Uncertain. Suppose we did not assume that player 1 is at least δ-uncertain, which entails the strong assumption that there is a lower bound on the conditional probability of seeing any one of his signals, given any possible state. Then we would define the uncertainty, δ, of player 1’s information as the minimum nonzero value of $\eta^i(t^i \mid \theta)$ (as $t^i$ and θ range over all possibilities). Along the same lines, we might wish to relax the assumption that Γ is complete, with every player putting weight on every other. We now discuss how the general principles of our argument would go through and the nature of the subtleties that would arise.
As mentioned in the sketch of the proof above, what really matters in the proof is MMFPTs in $\hat{B}$. Assuming $\hat{B}$ is irreducible, we can still bound these in terms of δ even with the weaker assumptions just discussed. But, as an examination of our bounds on the MMFPT shows, the bounds will involve path lengths in $\hat{B}$: the number of steps in $\hat{B}$ that must be taken to link any two states. Thus, rather than a bound on the MMFPT in $\hat{B}$ of order $\delta^{-1}$, which is what we use in our result, we might have a bound whose order is a higher power of $\delta^{-1}$. The exponent will depend both on the information structure and on Γ. In the end, this will translate into a difference on the right-hand side of (17) in Proposition 5. Indeed, we conjecture that the ratio ε/δ would be replaced by $C\varepsilon/\delta^{\kappa}$ for a number κ that is increasing in the maximum path length in $\hat{B}$. Moreover, this adjustment would be necessary: In the more general setting we are discussing here, it is not possible to write a bound analogous to (17) that depends on ε and our generalized δ only through ε/δ.
While a full exploration of these elaborations is beyond the scope of the present work, our point is to say: (i) the MMFPT technique discussed here does cover less restrictive assumptions on information than we made for our illustrative result; and (ii) the topology of connections among types in the interaction structure $\hat{B}$ will matter in interesting ways for more general results.
8.4. Interpretation and Discussion.
Why Focus on the Least-Informed?
The results of this section may seem paradoxical. In models of coordination on a network motivated by organizational questions, a common result is that agents have an incentive to focus on more informed agents, in the sense of paying more attention to them or putting more weight on their signals; see, for example, Calvó-Armengol, Martí, and Prat (2015) and Herskovic and Ramos (2015). Part of the reason for the difference in our result is that asymmetric information gets washed out in our limit of higher-order expectations (recall Proposition 1), rather than being learned or aggregated, and this makes the forces determining influence different. In Myatt and Wallace (2017), the agents are choosing which signal sources to listen to (of a commonly available set) in a coordination game; there publicness and clarity also play a role, though in different ways.
The Least-Informed Become Effectively More Central.
It is also interesting to compare the result on the tyranny of the least-informed with the result of Proposition 3 in Section 6, where we showed that, under a consistency condition on beliefs, it is an agent’s centrality that determines his influence. However, as is seen in our simple benchmark example above, for sufficiently well-informed agents, their priors cannot possibly matter, no matter how central they are in the network. Thus, an implication of Proposition 3 and Proposition 5 taken together is that the conditions of Proposition 5 cannot, in general, be reconciled with common priors over beliefs/signals.
We can get some further intuition for our result by expressing it in the language of our applications. Suppose that agents are making investment decisions, but with strategic complementarities in those decisions. We might say that there is confidence in the economy if positive expectations about others’ investment are driving agents to invest more. In other words, confidence is founded on common perceptions of what is going on in the economy. Ignorant agents’ (prior) views will have a disproportionate role in determining confidence. Similarly, in asset markets with frequent re-trading and random matching, assets will sometimes pass through the hands of ignorant agents. Their views will form a focal point around which market expectations will form.
A Subtlety in the Meaning of “Informed.”
To interpret and apply our results, it is important to remember that “prior” really means “belief conditional on public information only.” (See Section C.1.2, where we note that all our analysis is conditional on public information.) In view of this, we call an agent “uninformed” if the beliefs of that agent are not sensitive to his private information once we have conditioned on public information. This might not correspond to other natural senses of “uninformed,” so the distinction is worth keeping in mind.
Least-Informed versus Public.
We note in closing that this result is very different from the familiar case of coordinating on something public or “commonly understood” in a beauty contest. The less informed agent’s information is not public or approximately public. Indeed, in our example, individuals’ signals are conditionally independent given the state. A highly informed player’s signal provides very good information about the external state, but no further information about the signals of the others who are badly informed.
Moreover, in contrast to the standard case of coordinating on a public signal, our result does not hinge on a qualitative matter of determining which information is public (something that, actually, is held constant as we vary the noise rates). It is rather a quantitative matter of how low the noise rate of the relatively informed players must be in order for it to “wash out” of the (public) consensus expectation. As discussed in Section 8.3.1, this can depend in a subtle way on priors and the information structure. In particular, it can happen that the noise of the more informed players is vanishing compared to the noise of the less informed, and nevertheless the structure of the smaller noise is decisive for the consensus expectation. How small the noise must be in order not to matter depends in general on the network, priors, and information structure, through quantities that we have described.
9. CONCLUDING DISCUSSION
In Appendix C, we give some detailed discussions of important assumptions, as well as some extensions. Here we briefly summarize some of the key points.
Joint Connectedness (Section C.1).
The assumption of joint connectedness was a key maintained assumption in our results. In this section, we relate it to properties of the beliefs and the network, in particular the connectedness of the network and the absence of public events (joint connectedness implies both properties but is not equivalent to their conjunction). We also discuss what can be done without joint connectedness. This comes down to the standard analysis of a Markov matrix where not all states are recurrent.
Heterogeneous Self-Weights (Section C.2).
In the linear best-response game, we assumed that all agents put a common weight β on others’ actions. If this assumption does not hold, we may reduce to the case where it does hold by changing the network. In particular, we show how the linear best-response game with weights $(\beta^1, \ldots, \beta^{|N|})$ and network Γ has the same solution as the game with a common coordination weight $\hat{\beta}$ (that depends on $(\beta^1, \ldots, \beta^{|N|})$) and an alternative network $\hat{\Gamma}$. The diagonal entries of the matrix $\hat{\Gamma}$ capture the variation in self-weights. This transformation permits the application of our main results to the case of heterogeneous self-weights. We give interpretations in terms of both the financial market and the game.
Separability and Connection to Samet (1998a) (Section C.3).
In Section 5, we showed that, fixing the information structure and network, there are strictly positive pseudopriors $(\lambda^i_{\pi,\Gamma})_{i \in N}$ such that $c(y; \pi, \Gamma) = \sum_i e^i E^{\lambda^i_{\pi,\Gamma}} y$. At the same time, we made the observation, which here is explicit in the subscripts of $\lambda^i$, that those pseudopriors may depend on both the information structure π and the network Γ. We say an information structure π satisfies separability if the pseudopriors depend only on the information structure. Section 6 shows that a common prior on signals is sufficient for separability. In contrast, the assumptions made for the results on contagion of optimism and tyranny of the least-informed are not, in general, consistent with separability. In Golub and Morris (2017) we give a necessary and sufficient condition for separability, which describes the boundary between these cases exactly; Section C.3 sketches the essential ideas.
Our results in both this paper and Golub and Morris (2017) relate closely to and build on those of Samet (1998a). The similarity is that, as in his work, limiting properties of higher-order expectations are shown to depend only on a summary statistic of the information structure (in our case, the pseudoprior). Section C.3 discusses the difference in the results and techniques in detail.
Ex Ante and Interim Interpretation (Section C.4).
We take an ex ante perspective in our anal-ysis: At an initial date, agents have prior beliefs—and no information—about a state of theworld. They then receive information and update their beliefs. We can interpret the results asanswering the question:
How does the consensus expectation change after agents observe theirsignals?
Our results give conditions under which: (i) the beliefs do not change (under commonpriors over signals); (ii) they change to the most optimistic conceivable beliefs (contagion ofoptimism); (iii) they change to the beliefs of the least-informed (tyranny of the least-informed).Though we take an ex ante view throughout, consensus expectations can be seen from apurely interim perspective. Indeed, consensus expectations depend only on agents’ interimbeliefs (across all possible types)—i.e. on the belief functions π . We discuss how certain mainresults would look if we were to stick to a purely interim interpretation. As in our discussionof separability above, there is a close connection to the characterization of the common priorassumption in purely interim terms given by Samet (1998a). We highlight both how our results XPECTATIONS, NETWORKS, AND CONVENTIONS 41 can be related to his, and also where an ex ante perspective makes them distinct. While conta-gion of optimism has purely interim interpretation, tyranny of the least-informed depends onassumptions about priors and has no simple interim interpretation.
Agent-Specific Random Variables and Incomplete Information about the Network (Section C.5). Our focus throughout the paper has been on agents' higher-order expectations of a random variable of common concern, $y$. But an equally interesting application considers a case where agents have different preferred actions (which correspond to the different random variables $y^i$) in the absence of coordination motives, and where one's network neighbors also influence one's choice, with linear best responses assumed (Ballester, Calvó-Armengol, and Zenou, 2006; Calvó-Armengol, Martí, and Prat, 2015; Bergemann, Heumann, and Morris, 2015b). This case can be embedded readily into our formalism. Indeed, we can define our $x^i(n)$ almost identically to capture this case. This embodies an equivalence between different priors over the external states and caring about different random variables, an equivalence which does not extend to higher-order beliefs, as we explain. In discussing this connection, we highlight how our results relate to Calvó-Armengol, Martí, and Prat (2015) and Bergemann, Heumann, and Morris (2015a).

A related point is that there need not be common perceptions or complete information of the network weights $\gamma^{ij}$. By allowing these to depend on individuals' types, we can embed incomplete information about the network into our framework.

Static Higher-Order Expectations, Dynamic Conditional Expectations, Behavioral Learning, and the DeGroot Model.
We have studied higher-order average expectations of a random variable in this paper. These higher-order expectations may be interpreted as being computed at a moment of time. We can call them “static higher-order expectations,” as they are properties of the agents' static beliefs and higher-order beliefs at that moment. All the iteration of computing higher-order expectations occurs “in the agents' minds” rather than in an interactive dynamic process unfolding over time.

These static higher-order expectations can be contrasted with agents' “dynamic conditional expectations”: the beliefs formed via a dynamic process of updating expectations after observing other agents' conditional expectations up to that point. In this section, we will use this dichotomy to discuss connections with some important related literatures.

DeGroot (1974) suggested a behavioral model where, at each stage in a process, each of many agents takes a weighted average of the beliefs or estimates of his neighbors. He interpreted this as a heuristic procedure according to which statisticians might average their own estimates or beliefs with the estimates or beliefs of others whose opinions they respect, toward the goal of reaching a reasonable consensus. In the DeGroot model, the vector of agents' estimates at stage $n$ is $x(n) = \Gamma^n x(0)$, where (as in our model) $\Gamma$ is an exogenous, fixed stochastic matrix corresponding to the weights agents assign to various others. Under the classical interpretation, the DeGroot model is a dynamic process, where agents start out with different estimates (perhaps based on their private information) and then updating occurs according to a behavioral rule.
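The DeGroot rule $x(n) = \Gamma^n x(0)$ described above can be illustrated with a minimal sketch. The three-agent matrix `GAMMA` and the initial estimates `X0` are hypothetical values chosen only for illustration, and the helper names are not taken from the paper. The sketch iterates the averaging step and compares the result with the average of initial estimates weighted by the left unit eigenvector of $\Gamma$ (eigenvector centrality).

```python
"""DeGroot averaging: x(n) = Γ^n x(0) for a row-stochastic matrix Γ."""


def mat_vec(gamma, x):
    """Return the matrix-vector product Γ x (one round of averaging)."""
    return [sum(w * v for w, v in zip(row, x)) for row in gamma]


def degroot(gamma, x0, steps):
    """Apply the DeGroot update `steps` times, i.e. compute Γ^steps x(0)."""
    x = list(x0)
    for _ in range(steps):
        x = mat_vec(gamma, x)
    return x


def left_unit_eigenvector(gamma, iters=500):
    """Approximate the probability vector e with e = e Γ by power iteration."""
    n = len(gamma)
    e = [1.0 / n] * n
    for _ in range(iters):
        e = [sum(e[i] * gamma[i][j] for i in range(n)) for j in range(n)]
    return e


# Hypothetical network (each row sums to one) and hypothetical initial estimates.
GAMMA = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]
X0 = [1.0, 0.0, 4.0]

estimates = degroot(GAMMA, X0, steps=200)
centrality = left_unit_eigenvector(GAMMA)
# Centrality-weighted average of the initial estimates.
consensus = sum(e * v for e, v in zip(centrality, X0))

if __name__ == "__main__":
    print("estimates after 200 rounds:", estimates)
    print("centrality-weighted average:", consensus)
```

In this example every agent's estimate approaches the same number, and that number is the centrality-weighted average of the initial estimates, which is the sense in which centrality always matters in the dynamic DeGroot interpretation.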
Economic foundations and implications of this process have been developed by DeMarzo, Vayanos, and Zwiebel (2003), Golub and Jackson (2010), Molavi, Tahbaz-Salehi, and Jadbabaie (2017), and others.

(DeGroot's work grew out of studying aggregation procedures for statistical estimates. Lehrer and Wagner (1981) worked on a related model, seeking normative foundations for agents' weights in the consensus, based on the problem of aggregating views in a network of peers. Friedkin and Johnsen (1999) studied versions of this model in which each agent persistently weights a fixed opinion, which can be interpreted as a personal ideal point; see Section C.5 for a version of this in our setting. See Golub and Sadler (2016), whose Section 3.5.1 we have partly paraphrased here.)

Mathematically, the complete-information special case of our static higher-order expectations model is isomorphic to the classic DeGroot model, in the sense that equation (10) for updating the vector of static higher-order expectations, $x(n) = \Gamma^{n-1} F y$, looks very much like a DeGroot rule of the form $x(n) = \Gamma^n x(0)$. (Note that under complete information, $B = \Gamma$.) But it has a different interpretation. Our agents start out with different priors, captured by $F y$. In the dynamic interpretation, $x(2)$ corresponds to taking the weighted average of neighbors' first-period beliefs. In the static interpretation, $x(2)$ contains agents' expectation of the average first-order expectations of others. In this static interpretation, agents' higher-order expectations are fully Bayesian but based on heterogeneous priors and no asymmetric information, with weights (i.e., the network $\Gamma$) which are taken as exogenous.

Indeed, the general incomplete-information version of our model can also be related to the DeGroot model. If we draw a parallel where the types in our model correspond to DeGroot agents, and $x(1)$ is taken to be the profile of initial estimates, then the “DeGroot estimate” of a given type at stage $n$ is the $n$th-order iterated average expectation of that type in our model. In this way, our model can be viewed as an alternative interpretation of DeGroot's formulas.

Despite the formal similarity, substantively, the two interpretations differ very significantly in how they answer a key question in the DeGroot model literature: How does the network $\Gamma$ affect the ultimate consensus? Recall that in the DeGroot model, the consensus is a weighted average of the agents' initial opinions, with the weight of an agent equal to her eigenvector centrality. (Thus, in DeGroot's model, if high-centrality agents have high first-order expectations, the consensus will also be high.) There is an analogous centrality formula in our setting: Proposition 1. Despite this, in our model, under the common prior assumption, there is no interesting dependence of outcomes on $\Gamma$, even when the network gives some agents very large network centrality: Higher-order average expectations will always converge to the common prior estimate, independent of the network. It is only when agents have heterogeneous priors that the network matters in our model. Thus, whereas in the dynamic learning DeGroot model, the updating implies that centrality always matters, the additional structure present in our model says that it matters (to our outcomes) only in specific circumstances, and not under the common prior assumption.

There is another approach to DeGroot's questions that is different from his own behavioral model and from our interpretation of his equations sketched above. That approach is to study standard Bayesian agents learning dynamically from each other's beliefs, making Bayesian inferences at each stage. In this case we get a very different updating process. Geanakoplos and Polemarchakis (1982) considered this updating process under the common prior assumption. Their finding, in a finite-state model, was that posteriors would converge and there would be common certainty of posteriors in the limit. This model has been generalized in various directions. For example, Parikh and Krasucki (1990) considered the case when one observes posteriors of only some neighbors, while Nielsen et al. (1990) studied the partial revelation of posteriors. Recently, Rosenberg, Solan, and Vieille (2009) and Mueller-Frank (2013) have explored such models further. Taken together, this literature provides a fairly rich understanding of dynamically updating conditional expectations with common priors and asymmetric information on a general unweighted graph. Note that it contrasts sharply with our analysis; in the model we have studied in this paper, private information gets “washed out” rather than aggregated as we take $n$ to the infinite limit.

REFERENCES

Acemoglu, D., V. Chernozhukov, and M. Yildiz (2016a): “Fragility of Asymptotic Agreement under Bayesian Learning,” Theoretical Economics, 11, 187–225.
Acemoglu, D., A. Ozdaglar, and A. Tahbaz-Salehi (2016b): “Networks, Shocks, and Systemic Risk,” in Oxford Handbook of the Economics of Networks, ed. by Y. Bramoullé, A. Galeotti, and B. Rogers, Oxford University Press.
Allen, F., S. Morris, and H. S. Shin (2006): “Beauty Contests and Iterated Expectations in Asset Markets,” Review of Financial Studies, 19, 719–752.
Aumann, R. (1976): “Agreeing to Disagree,” Annals of Statistics, 4, 1236–1239.
Ballester, C., A. Calvó-Armengol, and Y. Zenou (2006): “Who's Who in Networks. Wanted: the Key Player,” Econometrica, 74, 1403–1417.
Banerjee, S. and I. Kremer (2010): “Disagreement and Learning: Dynamic Patterns of Trade,” Journal of Finance, 65, 1269–1302.
Bergemann, D., T. Heumann, and S. Morris (2015a): “Information and Volatility,” Journal of Economic Theory, forthcoming.
Bergemann, D., T. Heumann, and S. Morris (2015b): “Networks, Information and Volatility,” Yale University and Princeton University Working paper.
Blume, L., W. Brock, S. Durlauf, and R. Jayaraman (2015): “Linear Social Interaction Models,” Journal of Political Economy, 123, 444–496.
Brandenburger, A. and E. Dekel (1993): “Hierarchies of Beliefs and Common Knowledge,” Journal of Economic Theory, 59, 189–198.
Calvó-Armengol, A., J. Martí, and A. Prat (2015): “Communication and influence,” Theoretical Economics, 10, 649–690.
Cho, G. E. and C. D. Meyer (2000): “Markov chain sensitivity measured by mean first passage times,” Linear Algebra and its Applications, 316, 21–28.
de Martí, J. and Y. Zenou (2015): “Network games with incomplete information,” Journal of Mathematical Economics, 61, 221–240.
DeGroot, M. H. (1974): “Reaching a Consensus,” Journal of the American Statistical Association, 69, 118–121.
Dekel, E., D. Fudenberg, and D. K. Levine (2004): “Learning to Play Bayesian games,” Games and Economic Behavior, 46, 282–303.
DeMarzo, P. M., D. Vayanos, and J. Zwiebel (2003): “Persuasion Bias, Social Influence, and Unidimensional Opinions,” Quarterly Journal of Economics, 118, 909–968.
Duffie, D. and G. Manso (2007): “Information Percolation in Large Markets,” American Economic Review, 97, 203–209.
Friedkin, N. E. and E. C. Johnsen (1999): “Social Influence Networks and Opinion Change,” Advances in Group Processes, 16, 1–29.
Galeotti, A., S. Goyal, M. O. Jackson, F. Vega-Redondo, and L. Yariv (2010): “Network games,” Review of Economic Studies, 77, 218–244.
Geanakoplos, J. D. and H. M. Polemarchakis (1982): “We Can't Disagree Forever,” Journal of Economic Theory, 28, 192–200.
Golub, B. and M. O. Jackson (2010): “Naïve Learning in Social Networks and the Wisdom of Crowds,” American Economic Journal: Microeconomics, 2, 112–49.
Golub, B. and S. Morris (2017): “Higher-Order Expectations,” Available at SSRN: http://ssrn.com/abstract=2979089.
Golub, B. and E. Sadler (2016): “Learning in Social Networks,” in The Oxford Handbook of the Economics of Networks, ed. by Y. Bramoullé, A. Galeotti, and B. Rogers, Oxford University Press, chap. 19, 504–542.
Grundy, B. D. and M. McNichols (1989): “Trade and the Revelation of Information through Prices and Direct Disclosure,” Review of Financial Studies, 2, 495–526.
Han, J. and A. Kyle (2017): “Speculative Equilibrium with Differences in Higher-Order Beliefs,” Management Science, forthcoming.
Harrison, J. M. and D. M. Kreps (1978): “Speculative Investor Behavior in a Stock Market with Heterogeneous Expectations,” The Quarterly Journal of Economics, 92, 323–336.
Harsanyi, J. C. (1968): “Games with incomplete information played by ‘Bayesian’ players, Part III. The basic probability distribution of the game,” Management Science, 14, 486–502.
Hellman, Z. (2011): “Iterated expectations, compact spaces, and common priors,” Games and Economic Behavior, 72, 163–171.
Herskovic, B. and J. Ramos (2015): “Acquiring Information Through Peers,” Mimeo., NYU.
Izmalkov, S. and M. Yildiz (2010): “Investor Sentiments,” American Economic Journal: Microeconomics, 2, 21–38.
Jackson, M. O. (2008): Social and Economic Networks, Princeton, NJ: Princeton University Press.
Kandel, E. and N. D. Pearson (1995): “Differential Interpretation of Public Signals and Trade in Speculative Markets,” Journal of Political Economy, 103, 831–872.
Keynes, J. M. (1936): The General Theory of Employment, Interest and Money, Macmillan.
Kozitsky, Y., D. Shoikhet, and J. Zemánek (2013): “Power convergence of Abel averages,” Archiv der Mathematik, 100, 539–549.
Lehrer, K. and C. Wagner (1981): Rational Consensus in Science and Society: A Philosophical and Mathematical Study, vol. 21, Springer Science & Business Media.
Lewis, D. (1969): Convention: A Philosophical Study, Harvard University Press.
Malamud, S. and M. Rostek (2016): “Decentralized Exchange,” American Economic Review, forthcoming.
Meyer, C. D., ed. (2000): Matrix Analysis and Applied Linear Algebra, Philadelphia, PA, USA: Society for Industrial and Applied Mathematics.
Milgrom, P. and J. Roberts (1990): “Rationalizability, Learning and Equilibrium in Games with Strategic Complementarities,” Econometrica, 58, 1255–1277.
Molavi, P., A. Tahbaz-Salehi, and A. Jadbabaie (2017): “Foundations of Non-Bayesian Social Learning,” Columbia Business School Research Paper No. 15-95. Available at SSRN: ssrn.com/abstract=2683607.
Morris, S. (1994): “Trade with Heterogeneous Prior Beliefs and Asymmetric Information,” Econometrica, 62, 1327–1347.
Morris, S. (1997): “Interaction games: A unified analysis of incomplete information, local interaction and random matching games,” Santa Fe Institute Working Paper.
Morris, S. (2002a): “Notes on Iterated Expectations,” Princeton University Working paper.
Morris, S. (2002b): “Typical Types,” Available at princeton.edu/~smorris/pdfs/typicaltypes.pdf.
Morris, S. and H. Shin (2002): “Social Value of Public Information,” American Economic Review, 92, 1521–1534.
Mueller-Frank, M. (2013): “A General Framework for Rational Learning in Social Networks,” Theoretical Economics, 8, 1–40.
Myatt, D. P. and C. Wallace (2017): “Information Acquisition and Use by Networked Players,” Mimeo., London Business School, available at dpmyatt.org/uploads/information-networks-2017-july.pdf.
Myerson, R. B. (1997): Game Theory, Cambridge, Mass.: Harvard University Press.
Nehring, K. (2001): “Common priors under incomplete information: a unification,” Economic Theory, 18, 535–553.
Nielsen, L. T., A. Brandenburger, J. Geanakoplos, R. McKelvey, and T. Page (1990): “Common knowledge of an aggregate of expectations,” Econometrica, 1235–1239.
Parikh, R. and P. Krasucki (1990): “Communication, Consensus, and Knowledge,” Journal of Economic Theory, 52, 178–89.
Radner, R. (1962): “Team Decision Problems,” The Annals of Mathematical Statistics, 857–881.
Rosenberg, D., E. Solan, and N. Vieille (2009): “Informational Externalities and Emergence of Consensus,” Games and Economic Behavior, 66, 979–994.
Rubinstein, A. (1989): “The Electronic Mail Game: Strategic Behavior under ‘Almost Common Knowledge’,” American Economic Review, 79, 385–391.
Samet, D. (1998a): “Iterated Expectations and Common Priors,” Games and Economic Behavior, 24, 131–141.
Samet, D. (1998b): “Common Priors and Separation of Convex Sets,” Games and Economic Behavior, 24, 172–174.
Shin, H. and T. Williamson (1996): “How Much Common Belief is Necessary for a Convention,” Games and Economic Behavior, 13, 252–268.
Steiner, J. and C. Stewart (2015): “Price distortions under coarse reasoning with frequent trade,” Journal of Economic Theory, 159, 574–595.
Ui, T. (2009): “Bayesian Potentials and Information Structures: Team Decision Problems Revisited,” International Journal of Economic Theory, 5, 271–291.
Weinstein, J. and M. Yildiz (2007a): “Impact of higher-order uncertainty,” Games and Economic Behavior, 60, 200–212.
Weinstein, J. and M. Yildiz (2007b): “A Structure Theorem for Rationalizability with Application to Robust Predictions of Refinements,” Econometrica, 75, 365–400.
Young, H. P. (1996): “The Economics of Convention,” The Journal of Economic Perspectives, 105–122.
Zenou, Y. (2016): “Key Players,” in Oxford Handbook of the Economics of Networks, ed. by Y. Bramoullé, A. Galeotti, and B. Rogers, Oxford University Press.
APPENDIX A. OMITTED PROOFS

A.1. Proof of Fact 1. To establish (7), write $R^i(k)$ for the set of $i$'s pure strategies surviving $k$ rounds of iterated deletion of strictly dominated strategies. By assumption, $R^i(0) = [0, M]^{T^i}$. Then using (6),
$$R^i(1) = \left\{ s^i : (1-\beta) E^i y \le s^i \le (1-\beta) E^i y + \beta M \right\} = \left\{ s^i : (1-\beta) x^i(1) \le s^i \le (1-\beta) x^i(1) + \beta M \right\}.$$
For induction, we may assume that for some $k \ge 1$, each $R^i(k)$ for $i \in N$ has the form
$$R^i(k) = \left\{ s^i : (1-\beta) \left( \sum_{n=1}^{k} \beta^{n-1} x^i(n) \right) \le s^i \le (1-\beta) \left( \sum_{n=1}^{k} \beta^{n-1} x^i(n) \right) + \beta^k M \right\}.$$
We have already established the base case, $k = 1$. We will argue that then
$$R^i(k+1) = \left\{ s^i : (1-\beta) \left( \sum_{n=1}^{k+1} \beta^{n-1} x^i(n) \right) \le s^i \le (1-\beta) \left( \sum_{n=1}^{k+1} \beta^{n-1} x^i(n) \right) + \beta^{k+1} M \right\}.$$
The reason is that if $i$ conjectures a strategy profile $s$ satisfying
$$(1-\beta) \left( \sum_{n=1}^{k} \beta^{n-1} x^j(n) \right) \le s^j$$
for each $j \ne i$, then since best responses $BR^i(s)$ are nondecreasing in $s$, the minimum best response $s^i$ is obtained by applying $BR^i$ to the profile of lower bounds, which yields
$$(1-\beta) E^i y + \beta \sum_{j \ne i} \gamma^{ij} E^i \left[ (1-\beta) \left( \sum_{n=1}^{k} \beta^{n-1} x^j(n) \right) \right] = (1-\beta) \left( \sum_{n=1}^{k+1} \beta^{n-1} x^i(n) \right).$$
The argument for the upper bound is analogous. As $k \to \infty$, the lower and upper bounds both converge to the $s^*(\beta)$ of (7).

A.2. Existence and Characterization of the Consensus Expectation: Proof of Proposition 1.
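As a numerical illustration of the claim proved in this subsection, the following minimal sketch evaluates the Abel average $(1-\beta)\sum_{n \ge 0} \beta^n B^n z$ for values of $\beta$ approaching 1 and compares it with $pz$, where $p$ is the stationary distribution of $B$. The 3-state matrix `B` and the vector `Z` are hypothetical stand-ins for an irreducible interaction structure and a vector of first-order expectations; they are not taken from the paper, and the series is truncated once the remaining weights are negligible.

```python
"""Abel averages of B^n z approach the stationary average p z as β ↑ 1."""


def mat_vec(b, v):
    """Return the matrix-vector product B v."""
    return [sum(w * x for w, x in zip(row, v)) for row in b]


def abel_average(b, z, beta, tol=1e-14):
    """Truncated evaluation of x(β) = (1 − β) Σ_{n≥0} β^n B^n z."""
    total = [0.0] * len(z)
    term = list(z)        # holds B^n z
    weight = 1.0 - beta   # holds (1 − β) β^n
    while weight > tol:
        total = [t + weight * v for t, v in zip(total, term)]
        term = mat_vec(b, term)
        weight *= beta
    return total


def stationary(b, iters=500):
    """Approximate the probability vector p with p = p B by power iteration."""
    n = len(b)
    p = [1.0 / n] * n
    for _ in range(iters):
        p = [sum(p[i] * b[i][j] for i in range(n)) for j in range(n)]
    return p


# Hypothetical irreducible Markov matrix (rows sum to one) and vector z.
B = [
    [0.1, 0.6, 0.3],
    [0.4, 0.2, 0.4],
    [0.5, 0.3, 0.2],
]
Z = [1.0, 0.0, 2.0]

p = stationary(B)
target = sum(pi * zi for pi, zi in zip(p, Z))  # the scalar p z

# Largest deviation of any entry of x(β) from p z, for several β.
gaps = {}
for beta in (0.5, 0.9, 0.99, 0.999):
    x_beta = abel_average(B, Z, beta)
    gaps[beta] = max(abs(v - target) for v in x_beta)

if __name__ == "__main__":
    for beta, gap in gaps.items():
        print(f"beta={beta}: max deviation from p z = {gap:.6f}")
```

In this example the deviation shrinks as $\beta$ grows, in line with every entry of $x(\beta)$ converging to the same number $pz$.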
Recall that $p$ is the unique vector $p \in \Delta(S)$ satisfying $p = pB$; this vector is uniquely determined and positive by a standard result for irreducible Markov chains. Write
$$(18) \qquad x(\beta) = (1-\beta) \sum_{n=0}^{\infty} \beta^n B^n z.$$
We will show that for any $z \in \mathbb{R}^S$, we have
$$(19) \qquad \lim_{\beta \uparrow 1} x(\beta) = (pz)\mathbf{1}.$$
Note that by the Neumann series, which can be used since the spectral radius of $\beta B$ is $\beta < 1$, we have $\sum_{n=0}^{\infty} (\beta B)^n = (I - \beta B)^{-1}$, where $I$ denotes the identity matrix of appropriate size; in particular, $I - \beta B$ is invertible. So $x(\beta) = (1-\beta)(I - \beta B)^{-1} z$, or, equivalently,
$$(20) \qquad (I - \beta B)\, x(\beta) = (1-\beta) z.$$
The formula (18) says that $x(\beta)$ is an average, because the weights $(1-\beta)\beta^n$ sum to 1, of the vectors $B^n z$. Because $B^n$ is a Markov matrix, no entry of $B^n z$ can exceed the largest value of $z$ in absolute value. So the same is true of $x(\beta)$, and therefore all the $x(\beta)$ lie in a compact set.

Consider a sequence $\beta_k \uparrow 1$. By what we have said, the sequence $(x(\beta_k))_k$ lies inside a compact set. By a standard fact about compact sets, such a sequence converges, and has the limit $(pz)\mathbf{1}$, if and only if every convergent subsequence of it converges to $(pz)\mathbf{1}$. So consider a convergent subsequence, $(x(\beta_\kappa))_\kappa$, and let $\bar{x}$ denote its limit. We will show that $\bar{x} = (pz)\mathbf{1}$, which will conclude the proof of (19).

By taking $\beta_\kappa \uparrow 1$ in (20), we find that $\bar{x}$ satisfies $\bar{x} = B \bar{x}$, which, given that our matrix $B$ is irreducible, means that $\bar{x} = a\mathbf{1}$ for some constant $a$. It remains only to prove that $a = pz$. Premultiplying (20) by $p$, and using $pB = p$, gives $(1-\beta_\kappa)\, p\, x(\beta_\kappa) = (1-\beta_\kappa)\, pz$. Canceling $(1-\beta_\kappa)$, we get $p\, x(\beta_\kappa) = pz$. Letting $\kappa \to \infty$ and recalling that $\bar{x}$ is defined as the limit of the subsequence yields $p\bar{x} = pz$. When we plug in $\bar{x} = a\mathbf{1}$ (the statement that $\bar{x}$ is a constant vector) we find that $a\, p\mathbf{1} = pz$. Since $p$ is a probability vector, we have $p\mathbf{1} = 1$, and so we conclude that $a = pz$.

A.3. Proof of Lemma 2. If $W_0$ is drawn from the ergodic distribution $p$, the distributions of $W_0$ and $W_1$ are the same, and so the expected difference between $f(W_1)$ and $f(W_0)$ is 0:
$$(21) \qquad E_{W_0 \sim p}\left[ f(W_1) - f(W_0) \right] = 0.$$
On the other hand, using hypotheses (1) and (2) in the second line below, we have
$$\begin{aligned}
E_{W_0 \sim p}\left[ f(W_1) - f(W_0) \right] &= \sum_{s : f(s) < \bar{f}} p(s)\, E_{W_0 = s}\left[ f(W_1) - f(s) \right] + \sum_{s : f(s) \ge \bar{f}} p(s)\, E_{W_0 = s}\left[ f(W_1) - f(s) \right] \\
&\ge \delta\, p(s : f(s) < \bar{f}) - \varepsilon\, p(s : f(s) \ge \bar{f}).
\end{aligned}$$
Combining this result with (21) and using the shorthand $\chi = p(s : f(s) \ge \bar{f})$, we deduce $0 \ge \delta(1-\chi) - \varepsilon\chi$, from which the lower bound on $\chi$ claimed in the proposition follows.

A.4. Proof for Claims in Section 7.4.2 about Tightness Result. To demonstrate the claim made in Section 7.4.2, first note that the chain satisfies the assumptions of Lemma 2 with $\bar{f} = m$. Let $S_k$ be the set of states $\{ t^i_k : i \in \{1, 2\} \}$. The stationary mass entering $S_m$ has to be equal to the mass exiting it. Transitions to $S_m$ come only from $S_{m-1}$. Finally, the absorbing states are $S_{m-1} \cup S_m$. Combining these facts:
$$p(S_m)\, \varepsilon = p(S_{m-1})\, \delta = [1 - p(S_m)]\, \delta,$$
so that $p(S_m) = 1/(1 + \varepsilon/\delta)$. A slight perturbation of the chain will result in very nearly the same bound for an irreducible chain. Note that we can generate such an example for as many agents as we want, and as many types per agent (so tightness is established for all “sizes” of the setting).

A.5. Proofs of Results on Tyranny of the Least-Informed.
The key lemma behind our proof of Proposition 5 is:

Lemma 3. Under the hypotheses of Proposition 5,
$$\left| \frac{p(s) - \hat{p}(s)}{\hat{p}(s)} \right| \le C\, \frac{|\Theta|\,|S|^2}{(\gamma_{\min} \rho_{\min})^2} \cdot \frac{\varepsilon}{\delta},$$
where $C$ is a numerical constant.

Proof. The proof relies on Theorem 2.1 of Cho and Meyer (2000), which says that, for any $s \in S$,
$$(22) \qquad \left| \frac{p(s) - \hat{p}(s)}{\hat{p}(s)} \right| \le \left\| B - \hat{B} \right\|_\infty \max_{z \ne z'} M_{\hat{B}}(z, z'),$$
where $M_{\hat{B}}(z, z')$ is the mean first passage time in $\hat{B}$ to $z'$ starting at $z$; the norm is the maximum absolute row sum. (Consider a Markov chain making transitions according to $\hat{B}$. The mean first-passage time from $z$ to $z'$ in $\hat{B}$ is denoted by $M_{\hat{B}}(z, z')$ and defined to be the expected number of steps that the chain started at $z$ takes up to its first visit to $z'$, inclusive.) Two key technical lemmas, stated in Section A.5.1 below, allow us to bound the right-hand side. Using Lemma 4 (summing the upper bounds on absolute differences across any row and taking the maximum over all rows $i$):
$$\left\| B - \hat{B} \right\|_\infty \le |S| \cdot C_4\, \frac{|\Theta|\,|S|\, \varepsilon}{\min_i \rho^i_{\min}},$$
where $C_4$ is the numerical constant in Lemma 4. To finish bounding the right-hand side of (22), it remains to bound $\max_{z \ne z'} M_{\hat{B}}(z, z')$. Lemma 5 does exactly this, giving
$$\max_{z \ne z'} M_{\hat{B}}(z, z') \le \frac{C_5}{\delta\, \rho^1_{\min}\, \gamma_{\min}^2},$$
where $C_5$ is the numerical constant in Lemma 5. Recall that $\gamma_{\min}$ is the minimum off-diagonal entry of $\Gamma$, by assumption a positive number. Combining the two inequalities gives the claimed bound. □

Now we can show how this result implies Proposition 5.

The first step is to show that the consensus expectation under the hatted information structure is equal to the first agent's prior expectation:
$$c(y; B^{\hat\pi}, F^{\hat\pi}) = E^{\rho^1}[y].$$
The key to this is to establish that the information structure $(\hat\pi^i)_{i \in N}$ is consistent with a common prior over signals. Indeed, we will show that agent 1's prior can be taken to be this common prior. Let $\hat\mu^1 \in \Delta(T^1)$ be the prior on $T^1$ induced by $\rho^1$, and let
$$\hat\mu^i(t^i) = \sum_{t^1 \in T^1} \hat\pi^1(t^i \mid t^1)\, \hat\mu^1(t^1).$$
For agents $i \ne 1$, the interim beliefs $\hat\pi^i(\cdot \mid t^i)$ are compatible with their respective priors $\hat\mu^i$ trivially, because the interim beliefs place probability 0 or 1 on any state, and are compatible with any prior: Bayes' rule implies no restrictions. Moreover, with this profile $(\hat\mu^i)_{i \in N}$, the information structure $(\hat\pi^i)_{i \in N}$ is consistent with a common prior over signals. Now note that the prior over $\Theta$ corresponding to any $\hat\mu^i$ is $\rho^1$. By Proposition 3, the consensus expectation $c(y; B^{\hat\pi}, F^{\hat\pi})$ is the common prior expectation of $y$, namely $E^{\rho^1}[y]$.

The second step is to bound the distance between $c(y; B^{\hat\pi}, F^{\hat\pi})$, which we have computed, and $c(y; B^{\pi}, F^{\pi})$, which we would like to characterize. It is here that Lemma 3 is relevant:
$$\begin{aligned}
\left| c(y; B^{\pi}, F^{\pi}) - c(y; B^{\hat\pi}, F^{\hat\pi}) \right| &= \left| \sum_{s \in S} [p(s) - \hat{p}(s)]\, E^i[y \mid s] \right| \\
&= \left| \sum_{s \in S} \frac{p(s) - \hat{p}(s)}{\hat{p}(s)}\, \hat{p}(s)\, E^i[y \mid s] \right| && \text{multiply and divide by } \hat{p}(s) \\
&\le \sum_{s \in S} \left| \frac{p(s) - \hat{p}(s)}{\hat{p}(s)} \right| \hat{p}(s) \left| E^i[y \mid s] \right| && \text{triangle inequality} \\
&\le C\, \frac{|\Theta|\,|S|^2}{(\gamma_{\min} \rho_{\min})^2} \cdot \frac{\varepsilon}{\delta} \sum_{s \in S} \hat{p}(s) \left| E^i[y \mid s] \right| && \text{Lemma 3} \\
&\le C\, \frac{|\Theta|\,|S|^2}{(\gamma_{\min} \rho_{\min})^2} \cdot y_{\max} \cdot \frac{\varepsilon}{\delta}. && \text{definition of } y_{\max}
\end{aligned}$$
This completes the proof of the proposition, except for the technical lemmas, which are the subject of the next section.

A.5.1. Statements of Technical Lemmas. The proof of Lemma 3 used two key bounds. We state both here, and give proofs in Appendix B.

The first result, which was used to bound $\| B - \hat{B} \|_\infty$, converts hypotheses about the signal structures $(\eta^i)_{i \in N}$ into statements about the agents' interim beliefs (recall that the entries of $B$ are products of network weights from $\Gamma$ and interim beliefs):

Lemma 4. For any $t^i, t^j \in S$ with $j \ne i$, we have
$$\left| \pi^i(t^j \mid t^i) - \hat\pi^i(t^j \mid t^i) \right| \le C_4\, \frac{|\Theta|\,|S|\, \varepsilon}{\rho^i_{\min}},$$
where $C_4$ is a numerical constant.

This follows from Bayes' rule, but the exact statement requires a good deal of calculation. The core idea is that $\hat\eta$ is obtained by changing the probabilities in $\eta$ only slightly. Given full support priors, each $\pi^i(t^j \mid t^i)$ is continuous in $\eta^i(t^i \mid \theta)$, so it is natural that the two should be close; our calculation simply gives a quantitative version of this statement.

We also used a bound on mean first-passage times in $\hat{B}$:

Lemma 5. For any two states $z, z' \in S$,
$$M_{\hat{B}}(z, z') \le \frac{C_5}{\delta\, \rho^1_{\min}\, \gamma_{\min}^2},$$
where $C_5$ is a numerical constant.

The key idea here is that, as a consequence of agent 1 having noisy information, the subjective probability agent 1 puts on any type of any other agent is reasonably high: The lower bound is $\rho^1_{\min} \delta$, as we establish in the proof. Thus the corresponding weights in $\hat{B}$ are lower-bounded by $\delta \rho^1_{\min} \gamma_{\min}$, once we take into account the network part of the weight. The other agents' types have perfect information, so each of them has an edge of weight at least $\gamma_{\min}$ to a type of agent 1. Thus the Markov chain is well-interconnected by agent 1's types: Starting from any state, one gets to agent 1's types quickly, and then to any other given state in $S$ with substantial probability, so the chain cannot take too long to visit that state (by a standard bound on geometric random variables).

The proofs of the technical lemmas appear in Appendix B.

APPENDIX B. FOR ONLINE PUBLICATION: PROOFS OF TECHNICAL LEMMAS
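B.1. Proof of Lemma 4. The proof relies on the following fact about prior probabilities of signals.

Fact 3. For any $i$ and any $t^i$,
$$\mu^i(t^i) = \sum_{\theta' \in \Theta} \eta^i(t^i \mid \theta')\, \rho^i(\theta') \ge (1-\varepsilon)\, \rho^i_{\min}.$$
This bound holds because $\eta^i$ is assumed to be at most $\varepsilon$-noisy, and so there must be some $\theta_{t^i}$ such that $\eta^i(t^i \mid \theta_{t^i}) \ge 1 - \varepsilon$.

The first step of the proof of Lemma 4 is to write the probabilities in question via sums over states $\theta$. For any $t^i, t^j$ with $j \ne i$, we have
$$\pi^i(t^j \mid t^i) = \sum_{\theta \in \Theta} \eta^j(t^j \mid \theta)\, \pi^i(\theta \mid t^i).$$
Define $\hat\pi^i(t^j \mid t^i)$ analogously, replacing $\pi^i$ by $\hat\pi^i$ and $\eta^j$ by $\hat\eta^j$. Let
$$H^j(t^j \mid \theta) = \left| \eta^j(t^j \mid \theta) - \hat\eta^j(t^j \mid \theta) \right| \quad \text{and} \quad \Delta^i(\theta \mid t^i) = \left| \pi^i(\theta \mid t^i) - \hat\pi^i(\theta \mid t^i) \right|.$$
Now note that by the triangle inequality,
$$(23) \qquad \left| \pi^i(t^j \mid t^i) - \hat\pi^i(t^j \mid t^i) \right| \le \sum_{\theta \in \Theta} \left[ \Delta^i(\theta \mid t^i) + H^j(t^j \mid \theta) + \Delta^i(\theta \mid t^i)\, H^j(t^j \mid \theta) \right].$$
Having written the difference we are studying in this way, we will bound it piece by piece. If $j \ne 1$, by definition of “at most $\varepsilon$-noisy,” we have that $|H^j(t^j \mid \theta)| \le \varepsilon$. If $j = 1$, then $H^j(t^j \mid \theta)$ is identically zero. Also, note that $|\Delta^i(\theta \mid t^i)| \le 1$. So in all cases, we can bound the last two terms in the brackets by $2\varepsilon$.

Now, we turn to $\Delta^i(\theta \mid t^i)$. If $i = 1$, then $\Delta^i(\theta \mid t^i) = 0$, because 1's signals are the same in both the original information structure $\pi$ and the new one $\hat\pi$.

So assume $i \ne 1$; we will show that $\Delta^i(\theta \mid t^i) \le \frac{(|S|-1)\,\varepsilon}{(1-\varepsilon)\,\rho^i_{\min}}$, and this will allow us to complete the proof. Let $\theta_{t^i}$ be such that $\eta^i(t^i \mid \theta_{t^i}) \ge 1 - \varepsilon$, which is guaranteed to exist by the definition of “at most $\varepsilon$-noisy.” We will bound $\Delta^i(\theta \mid t^i)$, considering the cases $\theta \ne \theta_{t^i}$ and $\theta = \theta_{t^i}$ separately. If $\theta \ne \theta_{t^i}$, then by Bayes' rule,
$$\begin{aligned}
\pi^i(\theta \mid t^i) &= \frac{\eta^i(t^i \mid \theta)\, \rho^i(\theta)}{\mu^i(t^i)} \\
&\le \frac{\eta^i(t^i \mid \theta)\, \rho^i(\theta)}{(1-\varepsilon)\, \rho^i_{\min}} && \text{by Fact 3} \\
&\le \frac{\varepsilon}{(1-\varepsilon)\, \rho^i_{\min}} && \text{by definition of at most } \varepsilon\text{-noisy.}
\end{aligned}$$
Since $\hat\pi^i(\theta \mid t^i) = 0$, it follows that
$$(24) \qquad \Delta^i(\theta \mid t^i) \le \frac{\varepsilon}{(1-\varepsilon)\, \rho^i_{\min}}.$$
By the law of total probability,
$$\pi^i(\theta_{t^i} \mid t^i) \ge 1 - \frac{(|S|-1)\,\varepsilon}{(1-\varepsilon)\, \rho^i_{\min}}.$$
Since $\hat\pi^i(\theta_{t^i} \mid t^i) = 1$, it follows that
$$(25) \qquad \Delta^i(\theta_{t^i} \mid t^i) \le \frac{(|S|-1)\,\varepsilon}{(1-\varepsilon)\, \rho^i_{\min}}.$$
This is the looser of the two bounds (24) and (25), so we can say in general that
$$(26) \qquad \Delta^i(\theta \mid t^i) \le \frac{(|S|-1)\,\varepsilon}{(1-\varepsilon)\, \rho^i_{\min}}.$$
Putting everything together, it follows that
$$\begin{aligned}
\left| \pi^i(t^j \mid t^i) - \hat\pi^i(t^j \mid t^i) \right| &\le \sum_{\theta \in \Theta} \left[ \Delta^i(\theta \mid t^i) + 2\varepsilon \right] && \text{by (23)} \\
&\le \sum_{\theta \in \Theta} \left[ \frac{(|S|-1)\,\varepsilon}{(1-\varepsilon)\, \rho^i_{\min}} + 2\varepsilon \right] && \text{by (26)} \\
&\le |\Theta| \left( \frac{(|S|-1)\,\varepsilon}{(1-\varepsilon)\, \rho^i_{\min}} + 2\varepsilon \right) \\
&\le |\Theta|\, \varepsilon \left( (|S|-1)\, \frac{2}{\rho^i_{\min}} + 2 \right) && \text{using } 1 - \varepsilon \ge \tfrac{1}{2}.
\end{aligned}$$
Since $\rho^i_{\min} \le 1$, the right-hand side is at most a numerical constant times $|\Theta|\,|S|\,\varepsilon / \rho^i_{\min}$, which is the bound claimed in Lemma 4.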
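The two elementary bounds used in this proof, Fact 3 and inequality (24), can be checked numerically in a small example. The sketch below is only illustrative: the prior `RHO`, the signal structure `ETA`, and the reading of “at most $\varepsilon$-noisy” (each signal $t$ has a matching state $\theta_t$ with $\eta(t \mid \theta_t) \ge 1 - \varepsilon$, and $\eta(t \mid \theta) \le \varepsilon$ for every other state) are assumptions made for the example rather than definitions quoted from the paper.

```python
"""Check Fact 3 and bound (24) for one agent in a hypothetical example."""


def signal_prior(eta, rho):
    """Prior over signals: μ(t) = Σ_θ η(t | θ) ρ(θ)."""
    signals = next(iter(eta.values())).keys()
    return {t: sum(eta[th][t] * rho[th] for th in rho) for t in signals}


def posterior(eta, rho, t):
    """Interim belief over states given signal t, by Bayes' rule."""
    mu_t = sum(eta[th][t] * rho[th] for th in rho)
    return {th: eta[th][t] * rho[th] / mu_t for th in rho}


EPS = 0.1  # noise level ε (assumed)

# Hypothetical full-support prior over three states.
RHO = {"low": 0.2, "mid": 0.3, "high": 0.5}

# Hypothetical signal structure: ETA[θ][t] = η(t | θ); each row sums to one.
ETA = {
    "low": {"a": 0.90, "b": 0.05, "c": 0.05},
    "mid": {"a": 0.05, "b": 0.90, "c": 0.05},
    "high": {"a": 0.05, "b": 0.05, "c": 0.90},
}

# State θ_t that each signal t points to (η(t | θ_t) ≥ 1 − ε).
MATCH = {"a": "low", "b": "mid", "c": "high"}

rho_min = min(RHO.values())
mu = signal_prior(ETA, RHO)

# Fact 3: μ(t) ≥ (1 − ε) ρ_min for every signal t.
fact3_holds = all(mu[t] >= (1 - EPS) * rho_min for t in mu)

# Bound (24): π(θ | t) ≤ ε / ((1 − ε) ρ_min) whenever θ is not the matching state.
bound_24 = EPS / ((1 - EPS) * rho_min)
bound_24_holds = all(
    posterior(ETA, RHO, t)[th] <= bound_24
    for t in MATCH
    for th in RHO
    if th != MATCH[t]
)

if __name__ == "__main__":
    print("signal prior:", mu)
    print("Fact 3 holds:", fact3_holds)
    print("bound (24) holds:", bound_24_holds)
```

Both checks pass in this example; the bounds are typically slack, since Fact 3 discards the probability mass contributed by the non-matching states.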
B.2.
Proof of Lemma 5.
The proof requires the following fact:
Fact 4.
For any t ∈ T and t j ∈ T j with j we have: b π ( t j | t ) = P θ ∈ Θ ρ ( θ ) b η j ( t j | θ ) P θ ∈ Θ ρ ( θ ) η ( t | θ ) ≥ δρ .To establish this fact, we note that there is some θ t j such that b η j ( t j | θ t j ) =
1, and the denomi-nator is at most 1 since it is the prior probability of the signal t under the information structureassociated with b η .Now we prove Lemma . Let ( c W n ) n be a stochastic process corresponding to the Markovmatrix b B . Defining the function ι : S → N by ι ( t i ) = i , we have a coupling between the chain( c W n ) n and a chain on N , the set of agents, with transition matrix Γ .Case 1: z ′ ∉ T . Let us analyze the first passage time to some z ′ ∉ T . Starting from any z ∈ S ,the mean first passage time of the process ( ι ( c W n )) n to 1 (the state corresponding to agent 1)is at most 1/ γ min . Then every time the process visits a state in T , it has probability at least δρ γ min of visiting z ′ , by Fact 4. Conditional on not visiting it at this time, we wait on average1/ γ min steps for the process to return to a state in T and have another δρ γ min chance atvisiting z ′ . Thus, using the formula for the expectation of a geometric random variable, we have M b B ( z , z ′ ) ≤ δρ γ whenever z ′ ∉ T i .Case 2: z ′ ∈ T . Let z ′ = t . If z = t j ∉ T , then there is a θ t j such that b π j ( θ t j | t j ) =
1. Then b π j ( t | t j ) = η ( t | θ t j ),which is at least δ by the definition of “at least δ -noisy.” Thus, every time the process ( c W n ) n visits any state in S \ T , it has probability at least δ of visiting z ′ . If z ∈ T , then the processsurely visits the set S \ T one step later. Thus the process takes at most two steps to be in aposition where it has probability δ of visiting z . By the same reasoning discussed above about ageometric random variable, we conclude that M b B ( z , z ′ ) ≤ δ . A PPENDIX
C. FOR ONLINE PUBLICATION: DISCUSSION OF ASSUMPTIONS AND VARIANTS OF OUR RESULTS
We now discuss robustness and extensions of our results (Sections C.1 and C.2), as well as their context and broader implications (Sections C.3 through 9). More technical issues are postponed to Appendix D.

C.1. Joint Connectedness.
An assumption maintained throughout was a joint connectedness condition (recall Section 2.5), which amounts to the interaction structure $B$ being irreducible. In the present subsection, we no longer treat this condition as a maintained assumption, and examine its content and what can be said without it. Proposition 6 reviews a characterization of the irreducibility condition: It is equivalent to the agent-type vector $p$ being strictly positive. We then relate the condition to properties of the primitives $\Gamma$ and $\pi$. Finally, we discuss results that hold under weakenings of the assumption.

C.1.1. Relations to Beliefs and the Network.
Some key properties of the network and beliefs will feature in our characterization of irreducibility. A network $\Gamma$ is complete if $\gamma^{ij} > 0$ for all $i$ and $j$. Beliefs $\pi$ have full support marginals if $\pi^i(t^j \mid t^i) > 0$ for all $i$ and $j$, and all signals $t^i \in T^i$, $t^j \in T^j$. An event $G \subseteq T$ is a product event if $G = \prod_{i \in N} G^i$, where $G^i \subseteq T^i$ for each $i$. Say that a product event $G = \prod_{i \in N} G^i$ is a public or closed event (under beliefs $\pi$) if, for each agent $i$ and each signal $t^i \in G^i$, the following implication holds for any $t^{-i} \in T^{-i}$:
$\pi^i(t^{-i} \mid t^i) > 0 \implies (t^i, t^{-i}) \in G.$
A public or closed event is one that, when it occurs, is common certainty among all the agents: For any observed signal, no probability is assigned to any signal outside the event. Beliefs $\pi$ are connected if $\emptyset$ and $T$ are the only public events. This corresponds to the notion of no (nontrivial) common certainty: Every nontrivial product event has a connection (via beliefs placed by some agent) to states outside itself. A subset of agents $J \subseteq N$ is closed if $i \in J$ and $\gamma^{ij} > 0$ imply $j \in J$. A network $\Gamma$ is connected if $\emptyset$ and $N$ are the only closed subsets of the agent set $N$. Recall that a network is complete if $\gamma^{ij} > 0$ for all $i$ and $j$.

The properties mentioned so far are restrictions on either the beliefs or the network, but not both. The property of joint connectedness from Section 2.5 is a joint restriction on both. The following result relates the two sorts of conditions.

FIGURE 3. An example illustrating that imposing connectedness of the network and of beliefs is not sufficient to ensure joint connectedness, i.e. irreducibility of the interaction structure $B$. (The figure shows three agents, each with signals $a$ and $b$, linked in a cycle by unit network weights.)

Proposition 6.
The matrix $B$ is irreducible if and only if beliefs and the network are jointly connected.
Necessary conditions for this are:
Beliefs are connected.
The network is connected.
Sufficient conditions for this are:
The network is complete and beliefs are connected.
The network is connected and beliefs have full support marginals.

Proof. The “if and only if” part is just a rewriting of the statement that there are no nonempty, proper closed communicating classes in the Markov process corresponding to $B$. The two sufficient conditions are strengthenings of this property. □

The following example illustrates that requiring a connected network and connected beliefs separately is not sufficient for irreducibility.
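The graph-theoretic content of Proposition 6 can be checked mechanically. The following is a minimal sketch, not part of the formal argument: it assumes a hypothetical three-agent cycle with two signals per agent, builds the interaction structure with entries $\gamma^{ij}\pi^i(t^j \mid t^i)$, and tests irreducibility by reachability. The belief functions `sure_same` and `noisy`, and all numbers, are illustrative assumptions.

```python
# Sketch: irreducibility of B(t_i, t_j) = gamma[i][j] * pi_i(t_j | t_i).
# The agents, signals, and both belief functions below are hypothetical.

def interaction_structure(gamma, marginal, signals):
    """Return (states, B), where states are (agent, signal) pairs."""
    states = [(i, s) for i in range(len(gamma)) for s in signals]
    B = {u: {v: gamma[u[0]][v[0]] * marginal(u[0], u[1], v[0], v[1])
             for v in states}
         for u in states}
    return states, B

def reachable(B, start):
    """States reachable from `start` along positive entries of B."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v, weight in B[u].items():
            if weight > 0 and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def is_irreducible(states, B):
    """B is irreducible iff every state reaches every other state."""
    return all(reachable(B, s) == set(states) for s in states)

# Cycle network: agent i puts all weight on agent i + 1 (mod 3).
cycle = [[0.0, 1.0, 0.0],
         [0.0, 0.0, 1.0],
         [1.0, 0.0, 0.0]]
signals = ["a", "b"]

def sure_same(i, s, j, t):
    # Agent i is certain that agent j observed the same signal label.
    return 1.0 if s == t else 0.0

def noisy(i, s, j, t):
    # Full support marginals: some weight on the other signal label.
    return 0.9 if s == t else 0.1

irreducible_when_sure = is_irreducible(
    *interaction_structure(cycle, sure_same, signals))
irreducible_when_noisy = is_irreducible(
    *interaction_structure(cycle, noisy, signals))
print(irreducible_when_sure, irreducible_when_noisy)  # False True
```

In the first case the all-$a$ types never reach the all-$b$ types, so $B$ is reducible even though the network is connected. In the second case the network is connected and the marginals have full support, which is one of the sufficient conditions in Proposition 6.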
Example. Suppose that there are three agents and each agent observes one of two signals, so that $T^i = \{a^i, b^i\}$. The network is given by a cycle,
$\Gamma = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix},$
and the information structure is such that each agent $i$ is sure that agent $i+1$ has observed the same signal as agent $i$ and that agent $i-1$ has observed the other signal (indices read modulo 3). Then $B$ is not irreducible, because beliefs and the network are not jointly connected. The subset $\{a^1, a^2, a^3\}$ places no weight in $B$ on its complement.

The conditions we have discussed have placed restrictions on beliefs directly, rather than on observable consequences. In Appendix D.2, we give a “no trade” behavioral characterization of irreducibility.

C.1.2. Consensus without Irreducibility.
All our results have analogues when irreducibility fails. As we saw in Section 4.2, what matters for the limit of $x(n; y)$ as $n \to \infty$ is the behavior of $B^n$. This can be characterized quite generally based on the graph described in Section 5. The general result can be found in many textbooks (e.g. Meyer, 2000, Section 8.4), and we summarize it informally. First, consider the set $S_A$, defined as the set of absorbing states in $S$ according to the transition matrix $B$. For such $t^i$, the analysis of $x(n; y)$ can proceed exactly as in Section 4.2, restricting $B$ to the maximal strongly connected component containing $t^i$.

The simplest case is when $S$ can be partitioned into several strongly connected components (so $S = S_A$) with $B$ having no edges between these components. This occurs, for instance, if there are exactly two public (product) events. Then the analysis can be done on each of these components separately. That is, the analysis can be done conditional on public information. More generally, when there are public events, our assertion that the consensus expectation is nonrandom (recall Section 4.2) really means that it is nonrandom conditional on the public event that has occurred (and which, by definition of its being public, is common knowledge).

Now suppose there are some nonabsorbing states. For each nonabsorbing state $t^i \notin S_A$, the corresponding row of $B^\infty$ is a distribution that allocates mass (in a particular way) across the set $S_A$ of absorbing states.

An important case is relevant to several of our discussions. When $S_A$ consists of exactly one strongly connected component (though it may be a strict subset of $S$), we can refine our statements above to obtain the following generalization of Proposition 1:

Proposition 7. If $S_A$ has exactly one strongly connected component, the consensus expectation exists and
(27) $c(y; \pi, \Gamma) = \sum_{t^i \in S_A} p(t^i) E^i[y \mid t^i],$
where $p \in \Delta(S_A)$, called the vector of agent-type weights, is the stationary distribution of $B_{S_A}$ ($B$ restricted to $S_A$), i.e. the unique vector $p \in \Delta(S_A)$ satisfying $p B_{S_A} = p$. Moreover, all entries of $p$ are positive.

This specializes to Proposition 1 in case $S_A = S$. In general, the consensus expectation still exists and is unique, which is what we need for the examples of Section 7, where irreducibility fails to hold.

Our results do not hold if we relax our maintained finiteness assumption: in Appendix D.3 we report an example of Hellman (2011) showing this.

C.2. Heterogeneous Coordination Weights and Their Relation to Self-Weights in the Network.
In the linear best-response game, we assumed that all agents put a common weight $\beta$ on others’ actions, and studied the limit $\beta \uparrow 1$. We again maintain the assumption that $\Gamma$ is irreducible and consider now a more general class of environments, characterized by $(\Gamma, \beta, y)$, where $\beta = (\beta^i)_{i \in N}$ is a profile of agent-specific weights. In the coordination game associated with such an environment, the linear best responses are given by
(28) $a^i = (1 - \beta^i) E^i y + \beta^i \sum_{j \neq i} \gamma^{ij} E^i a^j.$
Paralleling our main study, we can ask what happens as $\beta^i \to 1$ for each $i$. As we show in this section, this issue is closely related to “self-weights” $\gamma^{ii}$ in the network.

First, we consider some simple examples. Suppose $|N| = 2$, with the network $\Gamma = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$. If $\beta^1 = 1$ and $\beta^2 < 1$, then in the limit $\beta^2 \uparrow 1$, iterating the (modified) best-response equation
$a^i = (1 - \beta^i) E^i y + \beta^i \sum_{j \neq i} \gamma^{ij} E^i a^j$
shows that we would have a convention given by
$\lim_{n \to \infty} [E^2 E^1]^n E^2 y.$
See Appendix D.1 for discussion of such limits, called simple higher-order expectations, and related calculations.
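To make the dependence on the order of limits concrete, the following sketch computes the two simple higher-order expectations for a hypothetical two-agent example. The belief matrices `P12` and `P21` and the first-order expectations `g1` and `g2` are assumptions chosen for illustration; they are not taken from the paper.

```python
# Sketch: simple higher-order expectations with two agents, two types each.
# P12[s][t] = probability that agent 1, at type s, assigns to agent 2
# having type t. All numbers are hypothetical.

def expect(P, v):
    """Apply an expectation operator: (P v)[s] = sum_t P[s][t] * v[t]."""
    return [sum(p * x for p, x in zip(row, v)) for row in P]

def simple_hoe(P_outer, P_inner, g, rounds=200):
    """Iterate v <- E_outer E_inner v, starting from first-order expectations g."""
    v = list(g)
    for _ in range(rounds):
        v = expect(P_outer, expect(P_inner, v))
    return v

P12 = [[0.8, 0.2], [0.4, 0.6]]      # agent 1's beliefs about agent 2's type
P21 = [[2 / 3, 1 / 3], [0.25, 0.75]]  # agent 2's beliefs about agent 1's type
g1 = [1.0, 3.0]                     # E^1[y | t^1], by type of agent 1
g2 = [0.0, 2.0]                     # E^2[y | t^2], by type of agent 2

c1 = simple_hoe(P12, P21, g1)       # approximates lim [E^1 E^2]^n E^1 y
c2 = simple_hoe(P21, P12, g2)       # approximates lim [E^2 E^1]^n E^2 y
print(c1, c2)
```

In this example each limit is constant across types, but the two constants differ (approximately 2.0 and 0.8), so the convention depends on which agent's first-order expectation anchors the iteration.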
On the other hand, suppose $|N| = 2$, $\beta^2 = 1$, and $\beta^1 < 1$. Now, in the limit $\beta^1 \uparrow 1$, we would symmetrically have a convention given by the (different) simple higher-order expectation
$\lim_{n \to \infty} [E^1 E^2]^n E^1 y.$
Thus, with agent-specific weights, the corresponding convention will depend on the details of how the limit $(\beta^1, \ldots, \beta^{|N|}) \to (1, \ldots, 1)$ is taken; in particular, on whose $\beta^i$ converges faster to 1.

We briefly sketch how our analysis can be adapted to this case by changing the network. In particular, we will show how the linear best-response game with weights $(\beta^1, \ldots, \beta^{|N|})$ and network $\Gamma$ has the same solution as the game with a common coordination weight $\hat{\beta}$ (that depends on $(\beta^1, \ldots, \beta^{|N|})$) and an alternative network $\hat{\Gamma}$. The alternative network $\hat{\Gamma}$ may, in general, have nonzero self-weights ($\hat{\gamma}^{ii} > 0$ for some $i$) even if $\Gamma$ has zero self-weights ($\gamma^{ii} = 0$ for all $i$). The transformation applies to any $\Gamma$, with or without positive entries on its diagonal.

Note that, in general, consensus expectations were defined allowing the possibility of positive self-weights in $\Gamma$, and our analysis of their basic properties (e.g., Propositions 1, 2 and 3) applies in that case as well. Positive self-weights are unnatural in some applications. For instance, in the game with the interpretation that each agent is a single player, one’s best response cannot (by definition) depend on one’s own action. On the other hand, there are other applications where self-weights have reasonable interpretations. For example, in the game with linear best responses, if we replace each agent with a continuum of identical agents (as we have done in the finance application), it would be natural to think of an agent caring about the average action of individuals like himself (i.e., in the same class). The same holds in the financial trading application, assuming there is a possibility that a player will sell into his own market (whose traders have the same expectation, and thus the same interim beliefs). In those cases, characterizing the average action of each population results in the equilibrium equations we have been studying, but with positive entries permitted on the diagonal of $\Gamma$. Formally, we can construct an analogue of the game in Section 3.1.1 and prove, paralleling part of Fact 1, that the game has a unique rationalizable strategy profile. (The proof is by the same contraction argument used to prove Fact 1.)

To see what that strategy profile is, we describe the transformation of any environment to one with an agent-independent common weight on others’ actions:
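Proposition 8. Given an environment with a network $\Gamma$ and a vector $\beta = (\beta^i)_{i \in N}$, define $\hat{\beta} = \max_{i \in N} \beta^i$ and define $\hat{\Gamma}$ by
$\hat{\gamma}^{ii} = \frac{\hat{\beta} - \beta^i}{\hat{\beta}(1 - \beta^i)}$ and $\hat{\gamma}^{ij} = \gamma^{ij}(1 - \hat{\gamma}^{ii})$ for $j \neq i$.
For any $y$, the environments described by $(\hat{\Gamma}, \hat{\beta}, y)$ and $(\Gamma, \beta, y)$ have identical play in their respective unique rationalizable strategy profiles.

C.2.1. Proof of Proposition 8.
By Fact 1, there is a unique rationalizable strategy profile in environment $(\hat{\Gamma}, \hat{\beta}, y)$. In that strategy profile, player $i$’s action given his signal satisfies
$a^i = (1 - \hat{\beta}) E^i y + \hat{\beta} \sum_{j} \hat{\gamma}^{ij} E^i a^j.$
Splitting the $j = i$ term out of the last summation, and then using the definition $\hat{\gamma}^{ij} = \gamma^{ij}(1 - \hat{\gamma}^{ii})$, we have
$a^i = (1 - \hat{\beta}) E^i y + \hat{\beta} \hat{\gamma}^{ii} E^i a^i + \hat{\beta}(1 - \hat{\gamma}^{ii}) \sum_{j \neq i} \gamma^{ij} E^i a^j.$
Rearranging and using $E^i a^i = a^i$ gives
$(1 - \hat{\beta} \hat{\gamma}^{ii}) a^i = (1 - \hat{\beta}) E^i y + \hat{\beta}(1 - \hat{\gamma}^{ii}) \sum_{j \neq i} \gamma^{ij} E^i a^j$
and thus
(29) $a^i = \frac{1 - \hat{\beta}}{1 - \hat{\beta} \hat{\gamma}^{ii}} E^i y + \frac{\hat{\beta}(1 - \hat{\gamma}^{ii})}{1 - \hat{\beta} \hat{\gamma}^{ii}} \sum_{j \neq i} \gamma^{ij} E^i a^j = (1 - \beta^i) E^i y + \beta^i \sum_{j \neq i} \gamma^{ij} E^i a^j,$
where in the last step we have deduced from the formula $\hat{\gamma}^{ii} = \frac{\hat{\beta} - \beta^i}{\hat{\beta}(1 - \beta^i)}$ the fact that
$\frac{1 - \hat{\beta}}{1 - \hat{\beta} \hat{\gamma}^{ii}} = 1 - \beta^i$ and $\frac{\hat{\beta}(1 - \hat{\gamma}^{ii})}{1 - \hat{\beta} \hat{\gamma}^{ii}} = \beta^i.$
(Both identities follow because $1 - \hat{\beta} \hat{\gamma}^{ii} = \frac{1 - \hat{\beta}}{1 - \beta^i}$ and $\hat{\beta}(1 - \hat{\gamma}^{ii}) = \frac{\beta^i (1 - \hat{\beta})}{1 - \beta^i}$.)
Now (29) is the best-response condition of the coordination game in environment $(\Gamma, \beta, y)$ (recall equation (28) defining that game), and so by uniqueness of the rationalizable outcome, the proof is complete.

C.3. Separability and Connection to Samet (1998a).
In Section 5, we showed that, fixing the information structure and network, there are strictly positive pseudopriors $(\lambda^i_{\pi, \Gamma})_{i \in N}$ such that
(30) $c(y; \pi, \Gamma) = \sum_{i} e^i E^{\lambda^i_{\pi, \Gamma}} y.$
At the same time, we made the observation (which we have now made explicit in the subscripts of $\lambda^i$) that those pseudopriors may depend on both the information structure $\pi$ and the network $\Gamma$. We say an information structure $\pi$ satisfies separability if the pseudopriors depend only on the information structure:

Definition 6.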
In Section 5, we showed that—fixing theinformation structure and network—there are strictly positive pseudopriors ³ λ i π , Γ ´ i ∈ N such that(30) c ( y ; π , Γ ) = X i e i E λ i π , Γ y . At the same time, we made the observation—which we have now made explicit in the subscriptsof λ i —that those pseudopriors may depend on both the information structure π and the net-work Γ . We say an information structure π satisfies separability if the pseudopriors depend only on the information structure: Definition 6.
The information structure π satisfies separability if there exists a profile ( λ i π ) i ∈ N such that, for every irreducible Γ , we have λ i π , Γ = λ i π .When separability holds, the asymmetric information affects the consensus expectation inan additively separable way, with each agent’s pseudoprior being weighted by his eigenvectorcentrality. Thus the incomplete information and the network can be analyzed separately. Net-works matter only via the network centrality weights, and the information structure π affectsonly the pseudopriors λ = ( λ i π ) i ∈ N .We can illustrate the failure of separability with an example building on the one in Case II ofSection 7.1. Suppose we have three agents arranged in a cycle as shown in Figure 2, with eachconsidering his counterclockwise neighbor over-optimistic and his clockwise neighbor over-pessimistic. The network Γ in which all weight goes counterclockwise (i.e., γ i , i − = i , with indices read modulo 3) gives the maximum consensus expectation. The network—callit Γ ′ —in which all weight goes clockwise (i.e., γ i , i + = i , with indices read modulo 3)gives the minimum consensus expectation given the beliefs. Note that all agents are symmetricin each network. Thus, in both networks, by symmetry all agents have the same eigenvectorcentrality.If separability held, then the two networks would have the same consensus expectation: Wehave just said that the centralities are the same across them, and that the information structurealso remains the same if we reverse the direction of each link in the network. Since in fact theconsensus expectation differs (indeed, differs as much as possible) across the two networks, wehave a failure of the separability property.We have already given one sufficient condition for separability in Section 6: a common prioron signals. Thus the example described above cannot be consistent with a common prior onsignals. In Golub and Morris (2017) we give a necessary condition for separability. 
We now infor-mally report the condition in stages. First, note that for higher-order expectations, and there-fore consensus expectations, the only beliefs about others that enter are marginal distributionsover another’s signal. An agent is never concerned about the correlation in the signals of two ormore others. This already suggests that the common prior assumption on signals is more than XPECTATIONS, NETWORKS, AND CONVENTIONS 63 we need: Recall from Definition 3 that the common prior assumption on signals places strongrestrictions on beliefs about profiles of signals. In fact, separability is implied by a weaker suf-ficient condition—one that requires priors about signals to agree only in their marginals onevery agent’s signal. Like the existence of a common prior on signals, such a property puts norestrictions on agents’ beliefs about Θ conditional on signals, but it also relaxes substantiallythe restrictions on beliefs about signals.In Golub and Morris (2017) we show that an even weaker condition is necessary and suf-ficient: We call it higher-order expectation-consistency . This condition specifies that we canfind a “pseudoprior” for each agent with the property that those pseudopriors have the sameexpectations of all random variables in a certain class. The class consists of all higher-order ex-pectations of random variables that are Θ -measurable. In effect, this necessary and sufficientcondition imposes only those restrictions on higher-order beliefs that are relevant to higher-order expectations.Our results in both this paper and Golub and Morris (2017) relate closely to and build on thoseof Samet (1998a). Samet showed that—if one fixes a state space and agents’ information on thatstate space (modeled via a partitional information structure)—then higher-order expectationsof all random variables converge. If the common prior assumption holds, they converge toex ante expectations under the common prior. 
Our Proposition 3 is a version of this result;critically, however, the reasoning is applied not to the whole state space but to the space ofsignal profiles. Samet also showed a converse: If all higher-order expectations of any randomvariable converge to the same number (depending on the random variable) regardless of theorder in which they are taken, then the information structure must satisfy the common priorassumption. We do not have a converse in this paper. The characterization of the separabilityresult in Golub and Morris (2017), which we have described above, is tight and thus is the closestanalogue to Samet (1998a). There are many conceptual and technical issues that distinguishour notion of separability from the properties that matter in Samet (1998a); these differencesare discussed in detail in Golub and Morris (2017).There is also another important technical and methodological connection to Samet (1998a).We follow Samet (1998a) in representing information structures—as well as a network, whichwe add to the model—via a Markov process. However, we actually work with a different sortof Markov process than the one in Samet (1998a): Our Markov process operates on the union of agents’ types (which we denote by S ), whereas Samet’s process applied to our questions op-erates on profiles of agents’ types T . There are a number of reasons why the former Markovprocess (on S ) is the appropriate one for our problem. First, it permits a unified or symmetrictreatment of networks and asymmetric information, as discussed in Section 5.1. If one adds anetwork structure to Samet’s Markov formulation, networks and asymmetric information enterin very different ways in the formalism (see Golub and Morris (2017) for a presentation alongthese lines). Second, and relatedly, our formalism allows us to relate key elements of our analy-sis to results in the literature on network games. 
Finally, the Samet (1998a) approach works withmatrices whose rows and columns are indexed by Ω = Θ × Q i ∈ N T i , which can be much largerthan S = S i ∈ N T i ; thus it can be convenient to have our formalism for doing explicit computa-tions.C.4. Ex Ante and Interim Interpretation.
We take an ex ante perspective in our analysis: At an initial date, agents have prior beliefs, and no information, about a state of the world. This interpretation entails common certainty among the agents of everyone’s prior beliefs and the way agents update their beliefs. In this section, we discuss some consequences of our ex ante approach and interim interpretations of our results.

C.4.1. Dynamic Interpretation: The Arrival of Information.
Under the ex ante perspective, the results of this paper can be given an explicitly dynamic interpretation. Before the arrival of information, there is symmetric information and, therefore, the consensus expectation is equal to the average of agents’ ex ante expectations, weighted by their eigenvector centralities (Section 5). In other words, if the agents had to select actions at that stage, this is what their actions would be equal to. One interpretation of our results is as an answer to the question, How does the consensus expectation change after agents observe their signals? We show that a common prior over signals is a sufficient condition for no change in the consensus expectation (Proposition 3); second-order optimism causes the consensus expectation to increase to the highest possible interim belief (Proposition 4); and, under the conditions in the results on the tyranny of the least-informed, the weights on agents’ priors change from those induced by the network $\Gamma$ to a degenerate vector which places all the weight on the least informed.

Footnote: Samet works with a partitional formalism; Golub and Morris (2017, Section 6) restates our framework in that formalism.
Footnote: Under an interim interpretation, it is without loss of generality to assume common certainty of types’ interim beliefs, i.e. how beliefs are updated: see Aumann (1976, p. 1237) and Brandenburger and Dekel (1993).
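The first comparison in this dynamic interpretation can be checked numerically. The sketch below assumes a hypothetical two-agent network and a common prior `mu` over signal pairs; the interim expectations `f` are arbitrary illustrative numbers. It computes eigenvector centralities and agent-type weights as stationary distributions, and compares the consensus expectation before and after signals arrive.

```python
# Sketch: under a common prior on signals, the consensus expectation is the
# same before and after information arrives. All numbers are hypothetical.

def stationary(M, rounds=2000):
    """Stationary distribution of a row-stochastic, irreducible matrix M.
    Uses the lazy update p <- (p + pM) / 2, which also converges when M
    is periodic."""
    n = len(M)
    p = [1.0 / n] * n
    for _ in range(rounds):
        pM = [sum(p[s] * M[s][t] for s in range(n)) for t in range(n)]
        p = [0.5 * (a + b) for a, b in zip(p, pM)]
    return p

gamma = [[0.0, 1.0], [1.0, 0.0]]        # each agent weights only the other
mu = [[0.4, 0.1], [0.2, 0.3]]           # common prior over (t1, t2)
mu1 = [sum(row) for row in mu]          # marginal on agent 1's signal
mu2 = [sum(col) for col in zip(*mu)]    # marginal on agent 2's signal

# Interim expectations of y by agent-type, ordered (1,a), (1,b), (2,a), (2,b).
f = [1.0, 3.0, 0.0, 2.0]

# Interaction structure B(t_i, t_j) = gamma_ij * pi_i(t_j | t_i).
B = [[0.0, 0.0, mu[0][0] / mu1[0], mu[0][1] / mu1[0]],
     [0.0, 0.0, mu[1][0] / mu1[1], mu[1][1] / mu1[1]],
     [mu[0][0] / mu2[0], mu[1][0] / mu2[0], 0.0, 0.0],
     [mu[0][1] / mu2[1], mu[1][1] / mu2[1], 0.0, 0.0]]

e = stationary(gamma)                       # eigenvector centralities
ex_ante = [mu1[0] * f[0] + mu1[1] * f[1],   # agent 1's prior expectation of y
           mu2[0] * f[2] + mu2[1] * f[3]]   # agent 2's prior expectation of y
before = sum(w * x for w, x in zip(e, ex_ante))

p = stationary(B)                           # agent-type weights
after = sum(w * x for w, x in zip(p, f))
print(before, after)
```

Both quantities come out at approximately 1.4 in this example, consistent with the statement that a common prior over signals leaves the consensus expectation unchanged even when agents disagree about $y$.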
C.4.2. Interim Interpretation.
Though we take an ex ante view throughout, consensus expectations, which emerge from agents’ play at the interim stage, cannot depend on agents’ ex ante beliefs about their own types. Thus consensus expectations must depend only on agents’ interim beliefs (across all possible types). We have emphasized this in our notation, by first expressing the information structure in interim terms (i.e., via the beliefs $\pi^i(\cdot \mid t^i)$), and only then adding in ex ante beliefs over each agent’s signals (the $\lambda^i$ in Proposition 2).

Let us discuss how certain main results would look if we were to stick to a purely interim interpretation. First, results such as the representation of Proposition 2 would still make sense, but the $\lambda^i$ would not be interpreted as anyone’s beliefs. More substantially, consider Proposition 3. Let us focus on a particularly simple consequence of it: Under the common prior assumption on all of $\Theta \times T$, the consensus expectation of $y$ is the prior expectation of $y$. To make sense of this in interim terms, we first have to say what the common prior assumption means in interim terms. Samet (1998a) has characterized that assumption as the condition that, for any random variable $y$, higher-order expectations converge to the same number, independent of the order in which expectations are taken (as long as each agent appears infinitely often); this number can be identified with the common prior expectation of $y$. Thus an interim statement of the simple consequence of Proposition 3 is: Under this convergence condition, the consensus expectation of $y$ is simply the prior expectation of $y$. This is natural: We can write the consensus expectation as an average of higher-order expectations, and the irreducibility of $\Gamma$ ensures that all agents appear infinitely often in each of them. An interim version of Proposition 3 follows from very similar reasoning, with more attention paid to the network, and this is carried out in Golub and Morris (2017).

The consequence of Proposition 3 that we have discussed is similar to Corollary 1 but differs in an important way. Corollary 1 does not depend on there being a common prior on the whole state space (i.e., on signals and beliefs jointly); rather, it requires that the ex ante first-order expectations of $y$ be the same across agents. This assumption does not have an obvious interim interpretation. Thus, the contrast between the “full common prior” result we have discussed in the previous paragraph and the actual result of Corollary 1 helps bring out where an ex ante perspective is important for us.

Our second-order optimism result (Proposition 4) is stated in terms of interim beliefs only (the consensus expectation is equal to the highest possible interim belief), so ex ante beliefs do not play a role in the interpretation. On the other hand, the common interpretation of signals property used in the result on the tyranny of the least-informed (Proposition 5) does not have any natural interim interpretation.

C.5. Agent-Specific Random Variables and Incomplete Information about the Network.
Our focus throughout the paper has been on agents’ higher-order expectations of a given random variable, which is the same across all agents. But for many applications of interest, there is a different random variable corresponding to each agent, and then higher-order expectations are taken. For example, a literature on coordination games in networks focuses on the case where agents have different preferred actions (which correspond to the different random variables) in the absence of coordination motives, and where one’s network neighbors also influence one’s choice, with linear best responses assumed (Ballester, Calvó-Armengol, and Zenou, 2006; Calvó-Armengol, Martí, and Prat, 2015; Bergemann, Heumann, and Morris, 2015b).

This case can be embedded readily into our formalism. Specifically, suppose that instead of being interested in a (common) random variable $y \in \mathbb{R}^\Theta$ measurable with respect to the external state, each agent has a different random variable, $y^i \in \mathbb{R}^\Theta$. Now, in Section 2.3, equation (3) is changed to
$x^i(1; y) = E^i y^i.$
Once $x^i(1; y)$ is set, the higher-order average expectations are defined by the same equation, (4), as before:
$x^i(n+1; y) = \sum_{j \in N} \gamma^{ij} E^i x^j(n; y).$
Correspondingly, in the matrix notation of Section 4, where the key iteration is $x(n) = B^{n-1} F y$, the vector $F y$ is replaced by a vector $f \in \mathbb{R}^S$, with $f(t^i) = E^i[y^i \mid t^i]$. The analogue of (12) is
$\lim_{\beta \uparrow 1} (1 - \beta) \left( \sum_{n=0}^{\infty} \beta^n B^n \right) f,$
and $B^n f$ has the interpretation that it describes the higher-order average expectations of the agents’ first-order expectations of their agent-specific random variables.

One can generalize further and consider a “pure private values” setting: $f$ can be replaced by an arbitrary vector $f \in \mathbb{R}^S$, with the interpretation that $f(t^i)$ is the action that agent $i$ would like to take, when he has signal $t^i$, in the absence of coordination motives (an action he knows). In this case, each agent faces no uncertainty about the random variable of interest to him. Note that the case of different $y^i \in \mathbb{R}^\Theta$ is a special case of this, because in that case $f(t^i)$ is agent $i$’s expectation of his own $y^i$ given signal $t^i$.

This brings us closer to Calvó-Armengol, Martí, and Prat (2015) and Bergemann, Heumann, and Morris (2015a). Motivated by a study of endogenous attention allocation, they work with a network version of a setting commonly studied in organizational economics and focus on an analogue of our $\beta \uparrow 1$ limit; their analysis involves objects of the form $B^n f$, just as ours must in order to study the heterogeneous-values variation of our model that we have just presented.

Footnote: The ex ante properties of Definitions 4 and 5 do imply properties of interim beliefs; see, for example, Lemma 4 in Section 8.3.

C.5.1. Equivalence Between Agent-Specific Random Variables and Different Priors over $\Theta$.
Given any environment with agent-specific random variables and a common prior on signals, we can find another environment with the same prior on signals in which agents all care about the same random variable but have heterogeneous beliefs about external states. That is, given any profile $(y^i)_{i \in N}$, we can define a new environment with new beliefs over $\Theta$ and a random variable $y$ so that the resulting $f \in \mathbb{R}^S$ mimics that arising from the original environment. Then results such as Proposition 3 can be applied.

This equivalence relies on the common prior on signals assumption: Without a common prior on signals, we could maintain such an equivalence only if the “own random variables” could depend on others’ signals (cf. Myerson, 1997, p. 74). This is related to the essential differences we observed between the model with a common prior over signals (Section 6) and the model without it.

C.5.2. Type-Dependent Network Weights.
A related extension allows for type-dependence in $\gamma^{ij}$. In this case, we take this network weight to depend on the signal of $i$, and write $\gamma^{ij}(t^i)$. Much of our analysis goes through unchanged: Equation (10) still describes $x(n)$, but now under the definition
$B(t^i, t^j) = \gamma^{ij}(t^i) \, \pi^i(t^j \mid t^i).$
If we interpret $\gamma^{ij}$ as $i$’s probability of meeting or interacting with $j$, then signal-dependence of these weights corresponds to private information about interactions. The only results that we lose in this generalization are those of Section 6, because there is now no information-independent notion of the network or of centrality. But the limits we study still exist, and much of their structure (e.g., the structure described in Proposition 1, with $p$ the left-hand unit eigenvector of the generalized $B$) is still present and can be used to study this more general setting.

APPENDIX D. FOR ONLINE PUBLICATION: ADDITIONAL DISCUSSION
D.1. Periodicity and Simple Higher-Order Expectations.
In defining consensus expectations, or the limit of higher-order average expectations, we considered the Abel average
(31) $\lim_{\beta \uparrow 1} (1 - \beta) \sum_{n=0}^{\infty} \beta^n x(n+1; y) = \lim_{\beta \uparrow 1} (1 - \beta) \left( \sum_{n=0}^{\infty} \beta^n B^n \right) F y,$
which is always well defined. It is natural to ask how the higher-order average expectations $x(n; y)$ behave without this averaging, and about the limit
(32) $\lim_{n \to \infty} B^n F y.$
As long as $B$ is aperiodic, the limit (32) exists and is equal to the right-hand side of (31).

Footnote: A matrix is said to be aperiodic if, in the associated weighted directed graph, the greatest common divisor of all cycles’ lengths is equal to 1. A sufficient condition for this is that the matrix $\Gamma$ have all positive entries. Even if $\gamma^{ii} = 0$ for all $i$ (a natural special case for some interpretations and applications) and if there are at least 3 agents, $\gamma^{ij} > 0$ for all $j \neq i$ is another sufficient condition for aperiodicity.

Aperiodicity, and the existence of the limit (32), is not relevant for many of the applications reported in the paper. For the linear best-response game and asset pricing, we are explicitly interested in the limit of the weighted sum of higher-order expectations, i.e., (31) above, and not in limits of unweighted higher-order expectations, i.e., (32) above. Nothing about the structure of agent-type weights depends on aperiodicity.

However, periodicity does affect the behavior of the $x(n; y)$ in the limit, and here we discuss how. Suppose that we have a cycle of agents $i_1, i_2, \ldots, i_{|N|}, i_1$: that is, that the network $\Gamma$ has each agent $i_k$ putting weight 1 on agent $i_{k+1}$. Then the corresponding matrix will not be aperiodic. For example, if there are two agents, $N = \{1, 2\}$, and $\gamma^{12} = \gamma^{21} = 1$, then we have
$\Gamma = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ and $B = \begin{pmatrix} 0 & B^{12} \\ B^{21} & 0 \end{pmatrix}$
(recall the definition of $B^{ij}$ from Section 4) and $B$ will be periodic and give rise to a two-cycle. In particular, there will be well-defined limits
$\lim_{n \to \infty} [E^1 E^2]^n E^1 y = c_1$ and $\lim_{n \to \infty} [E^2 E^1]^n E^2 y = c_2,$
but they will not, in general, be equal. In the limit, the vector $x(n)$ will cycle between $(c_1, c_2)$ and $(c_2, c_1)$, where each entry is understood as constant across the corresponding agent’s types.

For more general cycles of agents, we will have limits of the form
$\lim_{n \to \infty} [E^{i_1} E^{i_2} \cdots E^{i_k}]^n E^{i_1} E^{i_2} \cdots E^{i_j} y,$
but they will be different for different values of $j = 1, \ldots, k$. We will refer to such expressions as simple higher-order expectations. If the network $\Gamma$ were given by this cycle, then the entries of $B^n F y$ would be the simple higher-order expectations. The general higher-order expectations that we study will end up being complicated weighted sums of such simple higher-order expectations, although we will not in general work with the decompositions.

Without the assumption of finitely many types, behavior more complicated than cycling can arise, and $(1 - \beta) \sum_{n=0}^{\infty} \beta^n x(n+1; y)$ need not converge as $\beta \uparrow 1$. This phenomenon is discussed in Morris (2002b) and Morris (2002a). A related but different lack of convergence plays a role in Han and Kyle (2017): there, because of the lack of finiteness of the type space, arbitrary higher-order expectations can obtain.

D.2. A Behavioral Interpretation of Irreducibility via No Trade.
What is the behavioral con-tent of the joint connectedness of beliefs and the network, i.e., the irreducibility of B ?We report a characterization of the joint connectedness property, and therefore the existenceand uniqueness of a distribution of positive agent-type weights. Just as the common prior as-sumption can be characterized as the non-existence of profitable trades among agents (seeMorris (1994) and Samet (1998b)), the property we are studying here has a no-trade charac-terization. See Nehring (2001) for more on the various relations between no-trade conditions, higher-order expectations,and common priors.
Let $x^i$ be a payment rule for agent $i$, $x^i : T^i \to \mathbb{R}$, which is measurable with respect to agent $i$'s signal. A trade consists of a profile of payment rules, $(x^i)_{i \in N}$. The trade generates strict expected bilateral gains from trade if
$$x^i(t^i) \le \sum_{j \ne i} \gamma^{ij} \sum_{t^j \in T^j} \pi^i(t^j \mid t^i) \, x^j(t^j)$$
for each agent $i$ and $t^i \in T^i$, with strict inequality for at least one agent $i$ and $t^i \in T^i$. The interpretation is that agent $i$ is committed to making a payment $x^i(t^i)$ as a function of his signal, but he anticipates receiving the payments to which others are committed.

Proposition 9.
There exists a separable trade generating strict expected bilateral gains from trade if and only if beliefs and the network are not jointly connected.

Proof.
The existence of a separable trade giving strict expected bilateral gains from trade is equivalent to the requirement that there exists a vector $x$ such that $Bx > x$, where $>$ means a weak inequality on all components and strict inequality on some component. Suppose that such an $x$ exists and that $B$ is irreducible. Recall that irreducibility implies the existence of a strictly positive vector of agent-type weights $p$ with $pB = p$. Then $pBx > px$, while $pB = p$ gives $pBx = px$, a contradiction. So irreducibility fails. Conversely, suppose that irreducibility fails. Then there exists at least one type $t^i \in S$ that no one assigns positive probability to, so that $\gamma^{ji} \pi^j(t^i \mid t^j) = 0$ for all $t^j$. But now if we set $x^i(t^i) < 0$ and $x^j(t^j) = 0$ for all $t^j \ne t^i$, then we have a separable trade with strict expected gains. $\square$

D.3.
Non-Existence of Consensus Expectations on Infinite State Spaces.
We have maintained the assumption that $\Theta$ and all the $T^i$ are finite. In general, without finiteness, there may not be a vector of agent-type weights as defined in Proposition 1. An example offered by Hellman (2011, Section 6) demonstrates this. The example uses a version of the two-player information structure in Rubinstein's (1989) electronic mail game, with $T^1$ and $T^2$ both having the cardinality of $\mathbb{N}$, the natural numbers. If we take a network $\Gamma$ on two players such that each puts all weight on the other, and construct a suitable infinite analogue of $B$, Hellman's result implies that there is no invariant measure for $B$, i.e., no vector $p \in \Delta(S)$ of agent-type weights such that $pB = p$. Therefore, there is no analogue of Proposition 1, which was the foundation for all our results.

[Footnote: Hellman works with a partitional formalism similar to that of Samet (1998a); see Golub and Morris (2017) for a translation of higher-order expectations into this framework.]

We conjecture that if, like Hellman (2011), we require the state space $\Omega$ underlying $T^1$ and $T^2$ to be compact and the information structure to be everywhere mutually positive in his