Feasible Joint Posterior Beliefs∗

Itai Arieli†  Yakov Babichenko‡  Fedor Sandomirskiy§  Omer Tamuz¶

December 24, 2020
Abstract
We study the set of possible joint posterior belief distributions of a group of agents who share a common prior regarding a binary state, and who observe some information structure. For two agents we introduce a quantitative version of Aumann's Agreement Theorem, and show that it is equivalent to a characterization of feasible distributions due to Dawid et al. (1995). For any number of agents, we characterize feasible distributions in terms of a "no-trade" condition. We use these characterizations to study information structures with independent posteriors. We also study persuasion problems with multiple receivers, exploring the extreme feasible distributions.
The question of whether agents' observed behavior is compatible with rationality is fundamental and pervasive in microeconomics. A particularly interesting case is that of beliefs: when are agents' beliefs compatible with Bayes' Law?

∗ This paper greatly benefited from multiple suggestions and comments of our colleagues. We are grateful (in alphabetic order) to Kim Border, Ben Brooks, Laura Doval, Piotr Dworczak, Nikita Gladkov, Sergiu Hart, Kevin He, Aviad Heifetz, Yuval Heller, Matthew Jackson, Eliott Lipnowski, Jeffrey Mensch, Benny Moldovanu, Inés Moreno de Barreda, Stephen Morris, Alexander Nesterov, Abraham Neyman, Michael Ostrovsky, Thomas Palfrey, Jim Pitman, Luciano Pomatto, Doron Ravid, Marco Scarsini, Eilon Solan, Theodore Zhu, Gabriel Ziegler, and seminar participants at Bar-Ilan University, Caltech, Hebrew University, HSE St. Petersburg, Technion, Tel Aviv University, Stanford, and UC San Diego.
† Technion, Haifa (Israel). Itai Arieli is supported by the Ministry of Science and Technology.
‡ Technion, Haifa (Israel). Yakov Babichenko is supported by a BSF award.
§ Technion, Haifa (Israel) and Higher School of Economics, St. Petersburg (Russia). Fedor Sandomirskiy is partially supported by the Lady Davis Foundation, by Grant 19-01-00762 of the Russian Foundation for Basic Research, and by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program.
¶ Caltech. Omer Tamuz is supported by a grant from the Simons Foundation.

We ask the same question, but regarding groups of more than one agent. In particular, we consider a group of agents for which one can observe a common prior regarding a binary state, as well as a joint distribution of posteriors. This distribution is said to be feasible if it is the result of Bayesian updating induced by some joint information structure observed by the agents.
Is there also in this case a simple characterization of feasibility, one that does not require the analyst to test infinitely many possible information structures?

Clearly, feasibility implies that each agent's posterior distribution must satisfy the martingale condition. An additional important obstruction to feasibility is given by Aumann's seminal Agreement Theorem (Aumann, 1976). Aumann showed that rational agents cannot agree to disagree on posteriors: when posteriors are common knowledge then they cannot be unequal. This implies that feasibility is precluded if, for example, agent 1 always has posterior 1/5, and likewise agent 2 always has posterior 2/5. There are, however, examples of distributions that are not feasible, even though they do not involve agents who agree to disagree as above. Thus the Agreement Theorem does not provide a necessary and sufficient condition for feasibility. Our first result is a quantitative version of the Agreement Theorem: a relaxation that constrains beliefs to be approximately equal when they are approximately common knowledge.

For the case of two agents, a characterization of feasible distributions was obtained by Dawid, DeGroot, and Mortera (1995). Their criterion of feasibility takes the form of a family of inequalities, which arise from results due to Kellerer (1961) and Strassen (1965). As we explain, these inequalities correspond exactly to those in our quantitative Agreement Theorem, which thus provides a necessary and sufficient condition for feasibility: a joint posterior belief distribution is feasible if and only if agents do not approximately agree to disagree. (The question becomes more subtle when an agent is exposed to a collection of information sources: Brooks, Frankel, and Kamenica (2019) demonstrated that the family of feasible distributions that arises as we vary the subset of information sources does not admit a simple characterization even in the single agent case.)

We also study independent joint posterior belief distributions: these are induced by information structures in which each agent receives non-trivial information regarding the state, and yet gains no information about the others' posteriors. We give a simple condition for feasibility of independent distributions in the case of two agents with identically distributed, symmetric posteriors: such distributions are feasible if and only if the uniform distribution on [0, 1] is a mean-preserving spread of the marginal distribution of posteriors.

Finally, we study first-order Bayesian persuasion. A first-order Bayesian persuasion problem includes a single sender and multiple receivers. First, the sender chooses an information structure to be revealed to the agents. Then, each receiver chooses an action.
Finally, both the receivers and the sender receive payoffs. The sender's utility depends generally on the receivers' actions. The key assumption is that each receiver's utility depends only on the unknown state and her own action, and so only her first-order beliefs matter for her choice; we therefore refer to these problems as first-order persuasion problems. As each receiver's action depends only on her own posterior, and as the sender's utility depends on the actions of the receivers, the sender's problem is to choose an optimal feasible joint posterior belief distribution for the receivers.

As an example of a first-order Bayesian persuasion problem, we study the problem of a "polarizing" sender who wishes to maximize the difference between two receivers' posteriors. This example highlights the limitations imposed by feasibility, as the sender cannot hope to always achieve complete polarization, in which one receiver has posterior 0 and the other posterior 1; such joint posterior distributions are precluded by the Agreement Theorem. We solve this problem in some particular regimes, and show that non-trivial solutions exist in others, depending on the details of how the difference between posteriors is quantified.

A related question is how anti-correlated feasible posteriors can be. While they can clearly be perfectly correlated, by the Agreement Theorem they cannot be perfectly anti-correlated. This question can be phrased as a first-order Bayesian persuasion problem, which we solve. When the two states are a priori equally likely, we show that the covariance between posteriors has to be at least −1/32, and we construct a feasible distribution, supported on four points, that achieves this bound.

The question of first-order Bayesian persuasion is closely tied to the study of the extreme points of the set of feasible distributions, as the sender's optimum is always achieved at an extreme point. In the single agent case, the well-known concavification argument of Aumann and Maschler (1995) and Kamenica and Gentzkow (2011) shows that every extreme point of the set of feasible posterior distributions has support of size at most two. In contrast, we show that for two or more agents there exist extreme points with countably infinite support. In the other direction, we show that every extreme point is supported on a set that is small, in the sense of having zero Lebesgue measure. To this end, we do not use our characterization of feasibility, but rather a classical result of Lindenstrauss (1965) regarding extreme points of the set of measures with given marginals. Likewise, our analysis of first-order persuasion is not based on the aforementioned characterizations of feasibility; to study these problems we exploit the fact that conditional expectations are orthogonal projections in the Hilbert space of square-integrable random variables.

This work leaves open a number of interesting questions. In particular, we leave for future studies the extension of our work beyond the setting of a binary state and common priors. Likewise, the study of the extreme feasible distributions is far from complete. For example, we do not have a simple characterization of extreme points. We also do not know if there are non-atomic extreme points.
Related literature
Coherent experts’ opinions.
The study of feasible joint distributions of posteriors was pioneered by Dawid et al. (1995). They were motivated by the question of aggregating the opinions of experts who rely on different information sources. As a byproduct of their analysis, Dawid et al. (1995) characterized the set of feasible distributions for two agents and a binary state; a detailed discussion of this result is in §2. Their foundational paper and the mathematical literature inspired by it seem to be little known in the economic theory community. We refer to Burdzy and Pitman (2020) and Burdzy and Pal (2019) for references to the literature on experts, a summary of the results known in the two-agent case (the main focus of this literature), and tight bounds on the probability that the pair of posteriors differ by more than δ. Maximizing the latter probability can be seen as an example of a first-order Bayesian persuasion problem. (The questions of first-order persuasion for polarization and anti-correlation were mentioned as open problems by Burdzy and Pitman (2020, Problem 5.2). We are indebted to Jim Pitman for introducing us to this literature.) Another example is offered by Dubins and Pitman (1980), who found the distribution maximizing the expected maximal posterior for any number of agents. A particular case of this result for two agents follows from our analysis; see the discussion after Proposition 5. Independently and concurrently, Cichomski (2020) proved an analog of our Proposition 5 using a Hilbert-space technique similar to ours. Gutmann et al. (1991) discovered an analog of the characterization by Dawid et al. (1995) in the particular case of independent posteriors and demonstrated that the uniform distribution on the square is feasible; they did not discover the special role played by this distribution (see our Proposition 2).
Information design and necessary conditions for feasibility.
Recently, several necessary conditions for feasibility appeared in the economic literature studying information design with boundedly rational receivers. Independently from us, Ziegler (2020) considered an information design setting with two receivers, and derived a necessary condition for feasibility. His condition is sufficient for the case of binary signals. We further discuss Ziegler's condition in §D. Another necessary condition was found by Levy, de Barreda, and Razin (2020, Proposition 4); it is equivalent to the corollary of Aumann's Agreement Theorem (Corollary A) and, hence, is not sufficient. Levy et al. (2020) also offered several recipes for constructing feasible distributions starting from infeasible ones.
Alternative approaches to multi-agent information design.
Mathevet, Perego, and Taneva (2019) studied Bayesian persuasion with multiple receivers and a finite number of signals. They found an implicit characterization of feasibility: considering the entire belief hierarchy, they showed that feasibility is equivalent to the consistency of the hierarchy. (Certain compatibility questions for belief hierarchies, without application to the feasibility of joint belief distributions or to Bayesian persuasion, were recently addressed by Brooks, Frankel, and Kamenica (2019).) Bergemann and Morris (2019), Taneva (2019), and Arieli and Babichenko (2019) related the optimal information disclosure to the best Bayes Correlated Equilibrium from the sender's perspective. Even if receivers' actions are not free of externalities, finding the best such equilibrium leads to a linear program. This linear program happens to be tractable for a finite number of actions (Taneva (2019) and Arieli and Babichenko (2019) focused on the binary case). Our approach is conceptually closer to the geometric point of view on persuasion of Kamenica and Gentzkow (2011), where the distribution of posteriors plays the key role. Our approach helps "visualize" the solution and does not require the set of actions to be finite; however, it is limited to first-order persuasion problems, i.e., we rule out strategic externalities. (In the example of a polarizing sender that we discuss in §4, receivers have a continuum of actions, as the set of actions coincides with the set of possible posterior beliefs, and the above linear-programming approach leads to an infinite-dimensional program.) In Section 4 we provide several applications of our results and study the feasible correlation of posterior beliefs; related questions arise in numerous papers on information design, see, e.g., Ely (2017) and Bergemann, Heumann, and Morris (2020).

Common prior and no-trade.

A question related to ours is the common prior problem studied in the literature on interactive epistemology. Concretely, an Aumann knowledge partition model with a finite state space and finitely many agents is considered. Each agent has a partition over the state space, and to each partition element corresponds a posterior probability distribution that is supported on that partition element. The question is whether there exists a common prior: a single probability distribution over the state space that gives rise to all the posterior distributions by means of conditional probability. Morris (1991, Theorem 1a) offered a characterization of the existence of a common prior in no-trade terms, thus providing a variant of the converse to the no-trade theorem of Milgrom and Stokey (1982). This result was rediscovered by Feinberg (2000), and a simple geometric proof was given by Samet (1998).

There is a fundamental distinction between the common prior problem and ours. While in the common prior problem the conditional probability is given, and therefore the full belief hierarchy at every state can be inferred, in our case only the unconditional posterior is considered, and the belief hierarchy is not specified. Despite this distinction, there is a connection between the no-trade characterizations of a common prior and of feasible distributions. In a follow-up paper, Morris (2020) demonstrated how to deduce a no-trade characterization of feasibility similar to ours from his earlier characterization of a common prior.
This approach leads to a characterization of feasibility for finitely-supported distributions and arbitrary finite sets of states. Morris (2020) also offered a comprehensive discussion of the history of the no-trade approach to the common prior problem.
Measures with given marginals.
From a technical perspective, our characterization of feasibility relies on the existence of measures with given marginals. Instead of the classic results of Kellerer (1961) and Strassen (1965) used by Dawid et al. (1995), we apply a more recent result due to Hansel and Troallic (1986). In the economic literature, such tools were applied by Gershkov, Goeree, Kushnir, Moldovanu, and Shi (2013) and Gershkov, Moldovanu, and Strack (2018). Our feasibility condition for product distributions (Proposition 2) shares some similarity with Border's condition of feasibility for reduced-form auctions; see Hart and Reny (2015).

(In the notation introduced in the model section below, the common prior problem can be phrased as follows. Fix a signal space S_i for each agent i, and denote Ω = ∏_i S_i. Say we are given, for each agent i and signal realization s_i, a conditional distribution Q_{s_i}(·) supported on the subset of Ω in which agent i's signal is s_i. When does there exist a single probability measure P on Ω such that for P-almost every s_i it holds that Q_{s_i}(·) = P(· | s_i)? The restriction to finitely-supported distributions can possibly be eliminated by approximation arguments.)

Model
Information structures and posterior beliefs.
We consider a binary state space Ω = {ℓ, h} and a set of n agents N = {1, 2, . . . , n}. An information structure I = ((S_i)_{i∈N}, P) consists of signal sets S_i (each equipped with a sigma-algebra, which we suppress) for each agent i, and a distribution P ∈ ∆(Ω × S_1 × · · · × S_n). Let ω, s_1, . . . , s_n be the random variables corresponding to the n + 1 coordinates of the underlying space Ω × S_1 × · · · × S_n. When it is unambiguous, we also use s_i to denote a generic element of S_i.

The prior probability of the high state is denoted by p = P(ω = h). Throughout the paper we assume that p ∈ (0, 1). All n agents initially have prior p regarding the state ω. Then, each agent i observes the signal s_i. The posterior belief x_i attributed to the high state by agent i after receiving the signal s_i is

x_i = P(ω = h | s_i).

We denote by P_I the joint distribution of posterior beliefs induced by the information structure I. This probability measure on [0, 1]^N is the joint distribution of (x_1, x_2, . . . , x_n); i.e., for each measurable B ⊆ [0, 1]^N,

P_I(B) = P((x_1, . . . , x_n) ∈ B).

We similarly denote the conditional joint distributions of posterior beliefs by P_I^ℓ and P_I^h; these are the joint distributions of (x_1, . . . , x_n), conditioned on the state ω. For a probability measure P ∈ ∆([0, 1]^N) and for i ∈ N, we denote by P_i the marginal distribution, i.e., the distribution of the projection on the i-th coordinate, or the distribution of the posterior of agent i.

Feasible joint posterior beliefs.
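As a concrete companion to these definitions, the forward map from a finite information structure I to the induced distribution P_I can be computed mechanically; the feasibility question below asks for the inverse. The following Python sketch is our own illustration, not code from the paper; the toy structure (two conditionally independent binary signals of precision r = 0.6 with prior 1/2) is an assumption chosen for concreteness:

```python
from itertools import product
from collections import defaultdict

# A finite information structure for n = 2 agents: P maps
# (omega, s1, s2) -> probability, with omega in {"l", "h"}.
# Toy structure (our assumption): each signal equals the state w.p. r = 0.6.
r, prior = 0.6, 0.5
P = {}
for omega, s1, s2 in product("lh", repeat=3):
    p1 = r if s1 == omega else 1 - r
    p2 = r if s2 == omega else 1 - r
    P[(omega, s1, s2)] = prior * p1 * p2

def posterior(P, agent, signal):
    """x_i = P(omega = h | s_i), by Bayes' law."""
    joint_h = sum(p for (w, *s), p in P.items() if w == "h" and s[agent] == signal)
    marginal = sum(p for (w, *s), p in P.items() if s[agent] == signal)
    return joint_h / marginal

def posterior_distribution(P):
    """The induced joint distribution P_I of (x_1, x_2) on [0,1]^2."""
    dist = defaultdict(float)
    for (w, s1, s2), p in P.items():
        dist[(posterior(P, 0, s1), posterior(P, 1, s2))] += p
    return dict(dist)

P_I = posterior_distribution(P)
# Martingale condition: E[x_1] equals the prior.
assert abs(sum(x[0] * p for x, p in P_I.items()) - prior) < 1e-9
```

Here P_I puts mass 0.26 on each of (0.6, 0.6) and (0.4, 0.4) and 0.24 on each mixed pair, and the martingale condition E[x_i] = p holds by construction.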
When is a given distribution on [0, 1]^N equal to the distribution of posterior beliefs induced by some information structure? This is the main question we study in this paper. The following definition captures this notion formally.

Definition 1.
Given p ∈ (0, 1), we say that a distribution P ∈ ∆([0, 1]^N) is p-feasible if there exists some information structure I with prior p such that P = P_I. When p is understood from the context we will simply use the term "feasible" rather than p-feasible.

The single agent case.
Before tackling the question of feasibility for n agents, it is instructive to review the well-understood case of a single agent. The so-called martingale condition states that the average posterior belief of a feasible posterior distribution is equal to the prior. Formally, given a p-feasible distribution P ∈ ∆([0, 1]),

∫ x dP(x) = p.

The necessity of this condition for p-feasibility follows from the law of iterated expectations. For the single agent case, the martingale condition is necessary and sufficient for P to be p-feasible. This result is known as the Splitting Lemma.

The two agent case and the Agreement Theorem.
In the two agent case the martingale condition is not sufficient for feasibility. An additional obstruction to feasibility is given by Aumann's celebrated Agreement Theorem (Aumann, 1976). We here provide a rephrasing of this theorem in a form that will be useful for us later.
Theorem A (Aumann). Let ((S_i)_i, P) be an information structure, and let B_1 ⊆ S_1 and B_2 ⊆ S_2 be subsets of possible signal realizations for agents 1 and 2, respectively. If

P(s_1 ∈ B_1, s_2 ∉ B_2) = P(s_1 ∉ B_1, s_2 ∈ B_2) = 0   (1)

then E(x_1 · 1_{s_1 ∈ B_1}) = E(x_2 · 1_{s_2 ∈ B_2}).

To understand why this is a reformulation of the Agreement Theorem, note that condition (1) implies that

P(s_1 ∈ B_1, s_2 ∈ B_2) = P(s_1 ∈ B_1) = P(s_2 ∈ B_2),

and thus the event {s_1 ∈ B_1, s_2 ∈ B_2} is self-evident, i.e., is common knowledge whenever it occurs. (An event is self-evident if, whenever it occurs, all agents almost surely know that it has occurred. An event A is common knowledge at an outcome a ∈ A if A contains a self-evident event that contains a.) Hence this form of the Agreement Theorem states that if agents have common knowledge of the event {s_1 ∈ B_1, s_2 ∈ B_2} then their average beliefs on this event must coincide. The original theorem follows by choosing B_1 and B_2 such that x_1 is constant on B_1 and x_2 is constant on B_2. The statement of Theorem A is close in form to that of the No-Trade Theorem of Milgrom and Stokey (1982), which provides the same obstruction to feasibility; we discuss this further below. We do not provide a proof of Theorem A, as it is a special case of our quantitative Agreement Theorem (Theorem 1), which we prove below.

Using Theorem A it is easy to construct examples of distributions that are not feasible, even though they satisfy the martingale condition. For example, consider the prior p = 1/2 and the distribution P = (1/2)δ_{1,0} + (1/2)δ_{0,1}, where δ_{x_1,x_2} denotes the point mass at (x_1, x_2). Clearly P satisfies the martingale condition. Suppose that P is equal to P_I for some information structure I = ((S_i)_i, P). Then the events B_1 = {s_1 ∈ S_1 : x_1(s_1) = 1} and B_2 = {s_2 ∈ S_2 : x_2(s_2) = 0} satisfy the common knowledge condition (1). But

P(ω = h | s_1 ∈ B_1) = 1 ≠ 0 = P(ω = h | s_2 ∈ B_2),

in contradiction to Theorem A. This example can be extended to a more general necessary condition for feasibility of a joint posterior distribution.
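The contradiction above can also be read off mechanically from the distribution P itself, without reference to signals; a small Python check (entirely our own illustration, not code from the paper):

```python
# The infeasible two-point distribution from the text:
# P = (1/2) delta_{(1,0)} + (1/2) delta_{(0,1)}, prior p = 1/2.
P = {(1.0, 0.0): 0.5, (0.0, 1.0): 0.5}

# The martingale condition holds for both agents: E[x_i] = 1/2.
assert sum(x1 * q for (x1, x2), q in P.items()) == 0.5
assert sum(x2 * q for (x1, x2), q in P.items()) == 0.5

# Take the belief events A1 = {x1 = 1} and A2 = {x2 = 0}.  The event
# A1 x A2 is self-evident: leaving it in either coordinate has probability 0.
assert sum(q for (x1, x2), q in P.items() if x1 == 1.0 and x2 != 0.0) == 0.0
assert sum(q for (x1, x2), q in P.items() if x1 != 1.0 and x2 == 0.0) == 0.0

# Yet the average beliefs on this event differ (1/2 vs. 0), so by the
# Agreement Theorem no information structure induces P.
avg1 = sum(x1 * q for (x1, x2), q in P.items() if x1 == 1.0)   # = 1/2
avg2 = sum(x2 * q for (x1, x2), q in P.items() if x2 == 0.0)   # = 0
assert avg1 == 0.5 and avg2 == 0.0
```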
Corollary A.
Let P ∈ ∆([0, 1]^2) be feasible for some p, and let A_1 and A_2 be measurable subsets of [0, 1]. Denote complements by Ā_i = [0, 1] \ A_i. If

P(A_1 × Ā_2) = P(Ā_1 × A_2) = 0   (2)

then

∫_{A_1} x_1 dP(x) = ∫_{A_2} x_2 dP(x).   (3)

Corollary A is a recasting of Theorem A into a direct condition for feasibility: condition (2) states that the event A_1 × A_2 is self-evident: it has the same probability as A_1 × [0, 1] and the same probability as [0, 1] × A_2. And each of the two integrals in (3) is equal to the average belief of agent i conditioned on A_i, times the probability of A_i. Hence Corollary A follows immediately from Theorem A, by setting B_i = {s_i : x_i(s_i) ∈ A_i} for i = 1, 2. The advantage of this formulation is that it takes the form of a direct condition on P.

A quantitative Agreement Theorem.
While Aumann's Agreement Theorem provides an obstruction to feasibility, a joint posterior distribution can be infeasible even when it does not imply that agents agree to disagree. A larger set of necessary conditions follows from our first result, which is a quantitative version of the Agreement Theorem:
Theorem 1.
Let ((S_i)_i, P) be an information structure, and let B_1 ⊆ S_1 and B_2 ⊆ S_2 be sets of possible signal realizations for agents 1 and 2, respectively. Then

P(s_1 ∈ B_1, s_2 ∉ B_2) ≥ E(x_1 · 1_{s_1 ∈ B_1}) − E(x_2 · 1_{s_2 ∈ B_2}) ≥ −P(s_1 ∉ B_1, s_2 ∈ B_2).

(A particular case of Corollary A appears in Dawid et al. (1995, Theorem 5.2); an inaccuracy in that statement was later corrected by Burdzy and Pitman (2020, Proposition 2.1). Levy et al. (2020) found a necessary condition for feasibility equivalent to Corollary A for distributions with finite support. None of these papers mention the connection to the Agreement Theorem.)

Proof. By the law of total expectation, we have that

E(x_i · 1_{s_i ∈ B_i}) = E(P(ω = h | s_i) · 1_{s_i ∈ B_i}) = P(ω = h, s_i ∈ B_i).

We thus need to show that

P(s_1 ∈ B_1, s_2 ∉ B_2) ≥ P(ω = h, s_1 ∈ B_1) − P(ω = h, s_2 ∈ B_2) ≥ −P(s_1 ∉ B_1, s_2 ∈ B_2).

We show the first inequality; the second follows by an identical argument. We in fact demonstrate a stronger inequality:

P(ω = h, s_1 ∈ B_1, s_2 ∉ B_2) ≥ P(ω = h, s_1 ∈ B_1) − P(ω = h, s_2 ∈ B_2).   (4)

Denote the conditional probability P(C | ω = h) by P^h(C) for any event C. Then inequality (4) is equivalent to the elementary inequality P^h(A \ B) ≥ P^h(A) − P^h(B), which holds for any pair of events A and B and any probability measure P^h.

A comparison to the Agreement Theorem (Theorem A) is illustrative. In Theorem A the common knowledge assumption (1) implies equality of average posteriors. Here (1) has been removed, and we instead bound the difference in the average posteriors by the extent to which (1) is violated. Thus, one can think of Theorem 1 as quantifying the extent to which approximate common knowledge implies approximate agreement. The Agreement Theorem becomes the special case in which (1) holds.

(An alternative approach to quantitative extensions of the Agreement Theorem is via the concept of common p-beliefs (Monderer and Samet, 1989). Neeman (1996) showed that when posteriors are common p-belief then they cannot differ by more than 1 − p. However, we are not aware of any formal connection between this extension and our Theorem 2.)

In analogy to Corollary A, we use Theorem 1 to derive further necessary conditions for feasibility.
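The two inequalities of Theorem 1 can be stress-tested numerically by drawing random finite information structures and random signal events. The following Python sketch is our own code, not from the paper; the 3-signal setup and the number of trials are arbitrary assumptions:

```python
import random
from itertools import product

random.seed(0)

def posteriors(P, n_sig):
    """x_i(s) = P(omega = h | s_i = s), computed by Bayes' law from P."""
    x = [{}, {}]
    for i in range(2):
        for s in range(n_sig):
            num = sum(p for (w, s1, s2), p in P.items() if w == "h" and (s1, s2)[i] == s)
            den = sum(p for (w, s1, s2), p in P.items() if (s1, s2)[i] == s)
            x[i][s] = num / den
    return x

n_sig = 3
for _ in range(200):
    # a random information structure with 3 signals per agent
    raw = {k: random.random() for k in product("lh", range(n_sig), range(n_sig))}
    tot = sum(raw.values())
    P = {k: v / tot for k, v in raw.items()}
    x = posteriors(P, n_sig)
    # random signal events B1, B2
    B1 = {s for s in range(n_sig) if random.random() < 0.5}
    B2 = {s for s in range(n_sig) if random.random() < 0.5}
    upper = sum(p for (w, s1, s2), p in P.items() if s1 in B1 and s2 not in B2)
    lower = -sum(p for (w, s1, s2), p in P.items() if s1 not in B1 and s2 in B2)
    gap = (sum(x[0][s1] * p for (w, s1, s2), p in P.items() if s1 in B1)
           - sum(x[1][s2] * p for (w, s1, s2), p in P.items() if s2 in B2))
    # Theorem 1: upper >= E(x1 1_{B1}) - E(x2 1_{B2}) >= lower
    assert lower - 1e-12 <= gap <= upper + 1e-12
```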
Corollary 1.
Let P ∈ ∆([0, 1]^2) be p-feasible for some p, and let A_1 and A_2 be measurable subsets of [0, 1]. Then

P(A_1 × Ā_2) ≥ ∫_{A_1} x_1 dP(x) − ∫_{A_2} x_2 dP(x) ≥ −P(Ā_1 × A_2).   (5)

Corollary 1 admits a simple interpretation in terms of the No-Trade Theorem (Milgrom and Stokey, 1982). Consider three risk-neutral agents: two traders and a mediator. Trader 2 owns a good with an unknown quality ω ∈ {0, 1}. The mediator also owns a copy of the same good. The two traders receive private information regarding the quality of the good, with a joint belief distribution P ∈ ∆([0, 1]^2). The mediator knows P and the realized pair (x_1, x_2).

Let A_1, A_2 ⊆ [0, 1] be any measurable sets and consider the following trading scheme: the mediator buys the good from trader 2 whenever x_2 ∈ A_2, at a price of x_2. The mediator sells one unit of the good to trader 1 whenever x_1 ∈ A_1, at a price of x_1. Thus the mediator may need to use her copy of the good, in case she sells but does not buy.

We argue that the mediator's expected profit is at least

∫_{A_1} x_1 dP(x) − ∫_{A_2} x_2 dP(x) − P(A_1 × Ā_2).

The first two addends correspond to the expected transfers between each trader and the mediator. The last addend corresponds to the event that the mediator has to sell her own good to trader 1, since trader 2's belief x_2 is not in A_2 while trader 1's belief is in A_1. In this case the mediator loses at most 1.

Clearly, the mediator does not provide any additional information to the two traders, and so their expected profit is zero. Thus the mediator's expected profit is also zero, and so we have arrived at the left inequality of (5). The right inequality follows by symmetry.
Dawid et al. (1995) characterized the feasible distributions for the case of two agents, by applying a result of Kellerer (1961) and Strassen (1965). Although they do not relate their result to the Agreement Theorem or phrase it in these terms, what they show is that the condition of feasibility from Corollary 1 is both necessary and sufficient.
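For finitely supported distributions, this condition can be verified exhaustively: both sides depend on A_1 and A_2 only through their intersections with the marginal supports, so it suffices to enumerate subsets of the support. A Python sketch of such a checker (our own implementation, not from the paper):

```python
from itertools import chain, combinations

def agreement_condition_holds(P, tol=1e-12):
    """Check the two-sided inequality of Corollary 1 for a finitely
    supported P on [0,1]^2, given as a dict (x1, x2) -> probability."""
    supp = [sorted({x[i] for x in P}) for i in range(2)]
    def subsets(vals):
        return chain.from_iterable(combinations(vals, k) for k in range(len(vals) + 1))
    for A1 in subsets(supp[0]):
        for A2 in subsets(supp[1]):
            gap = (sum(x[0] * p for x, p in P.items() if x[0] in A1)
                   - sum(x[1] * p for x, p in P.items() if x[1] in A2))
            # left inequality: P(A1 x comp(A2)) >= gap
            if gap > sum(p for x, p in P.items() if x[0] in A1 and x[1] not in A2) + tol:
                return False
            # right inequality: gap >= -P(comp(A1) x A2)
            if -gap > sum(p for x, p in P.items() if x[0] not in A1 and x[1] in A2) + tol:
                return False
    return True

# Perfectly correlated posteriors pass; perfect anti-correlation fails,
# in line with the Agreement Theorem.
assert agreement_condition_holds({(0.8, 0.8): 0.5, (0.2, 0.2): 0.5})
assert not agreement_condition_holds({(0.8, 0.2): 0.5, (0.2, 0.8): 0.5})
```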
Theorem 2 (Dawid et al. (1995)). A probability measure P ∈ ∆([0, 1]^2) is p-feasible for some p if and only if

P(A_1 × Ā_2) ≥ ∫_{A_1} x_1 dP(x) − ∫_{A_2} x_2 dP(x) ≥ −P(Ā_1 × A_2)  for all measurable A_1, A_2 ⊆ [0, 1].   (6)

The necessity of (6) is Corollary 1. The sufficiency requires another argument, which uses a theorem of Kellerer (1961). For the reader's convenience, we present the complete proof of Theorem 2 in Appendix A.

It follows from Theorem 2 and the single agent martingale condition that when (6) holds then P is p-feasible for p = ∫ x_1 dP(x) = ∫ x_2 dP(x). We note that in Dawid et al. (1995), condition (6) was written in the equivalent form

P(A × B) + p ≥ ∫_A x_1 dP(x) + ∫_B x_2 dP(x),

from which the relation to the Agreement Theorem is harder to see.

It is natural to wonder if (6) can be relaxed to a simpler sufficient condition, and in particular whether it suffices to check it on sets A_1, A_2 that are intervals. As we show in Appendix D, restricting (6) to intervals results in a condition that is not sufficient: we construct a measure that is not feasible, but satisfies (5) for all intervals A_1, A_2. The constructed measure demonstrates that the condition derived by Ziegler (2020) for feasibility is necessary but insufficient.

A characterization for any number of agents.
For three or more agents, Theorem 2 provides a necessary condition for feasibility, as the joint belief distribution of each pair of agents must clearly satisfy (6). However, this condition is not sufficient: we construct below an example of three agents whose belief distribution satisfies (6) for each pair of agents, and yet is not feasible. The violation of feasibility stems from a violation of the No-Trade Theorem, in a manner similar to the one illustrated above for two agents. We use this approach to provide a necessary and sufficient condition for feasibility for an arbitrary number of agents.

A trading scheme consists of n measurable functions a_i : [0, 1] → [−1, 1], i = 1, . . . , n. Given agent i's posterior x_i, a mediator sells a_i(x_i) units of the good to agent i at the price of x_i per unit, so that the total transfer is a_i(x_i) x_i.

Clearly, each agent's expected profit is zero, since she is buying or selling the good at her expected value. Hence the mediator's expected profit is also zero. We argue that

∫_{[0,1]^n} ( Σ_{i=1}^n a_i(x_i) x_i − max{0, Σ_{i=1}^n a_i(x_i)} ) dP(x_1, . . . , x_n)   (7)

is a lower bound on the mediator's profit. Indeed, the first addend in the integral is the total transfer to the mediator. The second is equal to the total number of units of the good that the mediator needs to contribute to the transaction, in case the total amount that she sells exceeds the total amount that she buys. Since each unit is worth at most 1, she loses at most Σ_i a_i(x_i) whenever this sum is positive. Thus, since the mediator's profit is zero, it follows that (7) cannot be positive if P is feasible.

Our characterization shows that this condition is also sufficient for feasibility. The proof is given in Appendix B.

Theorem 3.
A probability measure P ∈ ∆([0, 1]^n) is p-feasible for some p if and only if for every trading scheme (a_1, . . . , a_n)

∫_{[0,1]^n} ( Σ_{i=1}^n a_i(x_i) x_i − max{0, Σ_{i=1}^n a_i(x_i)} ) dP(x_1, . . . , x_n) ≤ 0.   (8)

In the case of two agents, by taking a_i = ±1_{A_i} in (8), we recover condition (6) from Theorem 2. However, we are not aware of a simple argument for deducing (8) from (6) in the two agent case. In particular, Theorem 2 is not a simple corollary of Theorem 3, since the latter requires a broader set of trading schemes, while indicators are enough for the former. Relatedly, Kellerer's theorem that underlies Theorem 2 holds only for n = 2 and cannot be extended to the multidimensional case without expanding the set of test functions; see the discussion in Strassen (1965, pp. 436-437).

A natural question is whether Theorem 3 can be strengthened, along the lines of Theorem 2, to consider only indicator trading schemes: Is it sufficient to consider trading schemes of the form a_i = ±1_{A_i}, so that each agent is either a buyer or a seller, and has a set of beliefs in which she trades one unit? By computerized verification it is possible to show that the answer is no. A counterexample is the distribution ν × ν × ν ∈ ∆([0, 1]^3), where ν is the uniform distribution over three atoms. This distribution is not feasible, and yet none of the small number of possible indicator trading schemes is profitable.

Since ν × ν is feasible, this example also shows that Theorem 3 provides additional obstructions to feasibility when n ≥ 3, beyond the pairwise condition implied by Theorem 2. We end this section with another such example. Consider three agents whose beliefs (x_1, x_2, x_3) are distributed uniformly and independently on [0, 1]; that is, let P ∈ ∆([0, 1]^3) be the Lebesgue measure. By Proposition 2 below, the agents pairwise satisfy the condition of Theorem 2. We argue that this is nevertheless not a feasible distribution.

To see this, consider the trading scheme given by

a_1(x) = a_2(x) = a_3(x) = 1_{x ≥ 2/3} − 1_{x ≤ 1/3}.

In this scheme each agent buys a unit whenever her belief is above 2/3, and sells when it is below 1/3. A simple calculation shows that condition (8) of Theorem 3 is violated. These examples illustrate the general phenomenon that is captured by Proposition 3 below: for any distribution ν on [0, 1] not concentrated at one point, ν^n becomes infeasible for large enough n. In these two examples n = 3 suffices.

Identically Distributed Binary Signals.
(We get ∫_{[0,1]} a_i(x_i)·x_i dP_i = −∫_0^{1/3} x dx + ∫_{2/3}^1 x dx = 2/9. Hence ∫_{[0,1]^3} Σ_{i=1}^3 a_i(x_i)·x_i dP = 2/3. The hyperplanes x_i ∈ {1/3, 2/3} partition [0,1]^3 into 27 small cubes. There is 1 small cube where the sum Σ_{i=1}^3 a_i(x_i) is equal to 3, there are 3 cubes where the sum equals 2, and 6 cubes where it has the value 1; on the other cubes it is non-positive. Hence ∫_{[0,1]^3} max{0, Σ_{i=1}^3 a_i(x_i)} dP = 15/27 = 5/9. Since 2/3 > 5/9, we see that condition (8) is violated and conclude that the uniform distribution on [0,1]^3 is infeasible. We thank Eric Neyman for alerting us to an error in a previous version of this example.)

As an illustration of the restrictions imposed by the requirement of feasibility, consider a setting with prior p = 1/2, and two agents who each receive a binary signal that equals the state with some probability r > 1/2. What joint distributions are feasible?

The canonical setting is the one in which the signals are independent, conditioned on the state. Note that they are not unconditionally independent: while each agent has each posterior with probability 1/2, conditioned on the first agent acquiring a high posterior, the second agent is more likely to also have a high posterior than a low one. In this case the induced belief distribution is

P = (r² + (1−r)²)/2 · [δ_{(r,r)} + δ_{(1−r,1−r)}] + r(1−r) · [δ_{(1−r,r)} + δ_{(r,1−r)}].

Another simple case is the one in which both agents observe the same signal, in which case the posteriors are of course perfectly correlated, and the distribution is

P = 1/2 δ_{(r,r)} + 1/2 δ_{(1−r,1−r)},

as in the single agent case. In both of these cases P is feasible, since we derive it from a joint signal distribution.

The case in which agents' posteriors are perfectly anti-correlated, i.e.,

P = 1/2 δ_{(r,1−r)} + 1/2 δ_{(1−r,r)},

is precluded by the Agreement Theorem and its Corollary A, as agents here agree to disagree on their posteriors.

More generally, we can consider the case in which, conditioned on an agent's posterior, the other agent has the same posterior with probability c. That is,

P = c/2 · [δ_{(r,r)} + δ_{(1−r,1−r)}] + (1−c)/2 · [δ_{(1−r,r)} + δ_{(r,1−r)}].    (9)

In this case there is never common knowledge of beliefs, as long as c < 1. The natural question is: to what extent can the signals be anti-correlated? Can they, for example, be (unconditionally) independent, so that after observing a signal, the probability that an agent assigns to the other agent's posterior is still uniform over {r, 1−r}? A common intuition suggests that this is impossible: even if signals are independent given the state, the induced unconditional distribution of posteriors inherits the dependence on the state, and thus the posteriors must be dependent; on the other hand, if the conditional distribution of signals is the same in both states, they convey no information, and thus the posterior just equals the prior. Perhaps surprisingly, this intuition is wrong, and posteriors can be independent and even anti-correlated; see, e.g., Burdzy and Pitman (2020).

Proposition 1.
A joint belief distribution P as given in (9) is 1/2-feasible if and only if c ≥ 2r − 1.

(It suffices to verify condition (6) for A_1 = {r} and A_2 = {1−r}. More generally, for finitely supported P only finitely many conditions need be checked.)

Proposition 1 implies that indeed too much anti-correlation is infeasible, especially as the signals become more informative. Nevertheless, it is possible that the agents' posteriors are independent of each other (i.e., c = 1/2) as long as r ≤ 3/4. Moreover, for r < 3/4, the posteriors can be negatively correlated; for example, P is feasible for c = 1/3 and r = 2/3. In this case, posteriors are either 1/3 or 2/3, each obtained with probability 1/2. When an agent has the high posterior, she assigns probability 2/3 to the other agent having the low posterior.

Unconditionally independent signals.
As another application of Theorem 2 we further explore independent joint posterior belief distributions. To motivate this application, consider a sender (e.g., a consulting agency) who wants to reveal some information to receivers (its clients). However, there is an additional concern: none of the receivers should be able to form any non-trivial guess about the information received by their counterpart. This can be motivated either by privacy concerns or by the desire to avoid complicated strategic reasoning on the receivers' side. For example, consider the case in which the receivers are two firms competing in the same market that plan to use the received information to adjust their strategies. If their posteriors are not independent, they might engage in complicated reasoning involving higher-order beliefs, as in Weinstein and Yildiz (2007). Another motivation for studying independent joint beliefs comes from mechanism design, where these distributions arise endogenously (see, e.g., Bergemann, Brooks, and Morris, 2017; Brooks and Du, 2019).

We already saw above that identically distributed binary signals can be independent for prior p = 1/2 as long as r ≤ 3/4. As a second example, let P be the uniform distribution on the unit square. Following Gutmann et al. (1991), we verify that it is 1/2-feasible and find the information structure inducing it. This distribution clearly satisfies the martingale condition, so it remains to check that

P(A_1 × A_2^c) ≥ ∫_{A_1} x dP_1(x) − ∫_{A_2} x dP_2(x).

The other inequality of (6) will follow by symmetry. Let 1 − a be the Lebesgue measure of A_1 and b that of A_2. Then the left-hand side equals (1 − a)(1 − b), and the right-hand side is maximized when A_1 is pushed as much as possible towards high values of the integrand (i.e., A_1 = [a, 1]) and A_2 is pushed towards low values (A_2 = [0, b]). We get the following inequality:

(1 − a)(1 − b) ≥ ∫_a^1 x dx − ∫_0^b x dx = (1/2)(1 − a² − b²).

This is equivalent to (a + b − 1)² ≥ 0, which holds for any a and b. Thus the uniform distribution is 1/2-feasible.

The equality attained at a + b = 1 suggests the structure of an inducing information structure: it indicates that P(ω = h, x_1 ∈ [a,1], x_2 ∈ [b,1]) = P(x_1 ∈ [a,1], x_2 ∈ [b,1]) whenever a + b = 1, i.e., P(ω = h | x_1 ∈ [a,1], x_2 ∈ [b,1]) = 1: whenever the pair of posteriors belongs to T = {(x_1, x_2) ∈ [0,1]² : x_1 + x_2 > 1}, the state is ω = h; by the symmetric argument, ω = ℓ whenever the posteriors are in [0,1]² ∖ T. Hence, one can use the following information structure: sample a pair of signals (s_1, s_2) uniformly from T if ω = h and from [0,1]² ∖ T if ω = ℓ. This information structure leads to x_i(s_i) = s_i and thus induces the uniform distribution. In §6, we describe a family of information structures inducing the uniform distribution on [c, 1 − c]².

We generalize this example to study the conditions for 1/2-feasibility of more general product distributions. The next proposition provides a necessary and sufficient condition for feasibility of a large class of such distributions, and shows that the uniform distribution is, in fact, an important edge case. We say that a distribution ν ∈ Δ([0,1]) is symmetric around 1/2 if its cumulative distribution function F(a) = ν([0,a]) satisfies F(a) = 1 − F(1 − a) for all a ∈ [0,1]. Recall that µ ∈ Δ([0,1]) is a mean preserving spread of µ′ ∈ Δ([0,1]) if there are random variables x, x′ such that x ∼ µ, x′ ∼ µ′ and E(x | x′) = x′.

Proposition 2.
Let P = ν × ν, where ν ∈ Δ([0,1]) is symmetric around 1/2. Then P is 1/2-feasible if and only if the uniform distribution on [0,1] is a mean preserving spread of ν.

In particular, among symmetric, 1/2-feasible product distributions, the uniform is maximal in the convex order. The proof of Proposition 2 is relegated to § C. The "if" direction is a consequence of the following, more general lemma.
Lemma 1.
Let P = µ_1 × ··· × µ_n ∈ Δ([0,1]^n) be p-feasible, and let P′ = µ′_1 × ··· × µ′_n, where each µ_i is a mean preserving spread of µ′_i. Then P′ is also p-feasible.

(Equivalently, ∫ f(x) dµ(x) ≥ ∫ f(x) dµ′(x) for every bounded convex f. Another equivalent condition is that both have the same expectation and ∫_0^y F(x) dx ≥ ∫_0^y F′(x) dx for all y ∈ [0,1], where F and F′ are the cumulative distribution functions of µ and µ′, respectively. See Blackwell (1953).)

(We note that the symmetry assumption cannot be dropped. Indeed, the uniform distribution is not a mean-preserving spread of the non-symmetric distribution ν = a·δ_{1−a} + (1−a)·δ_1 with a = 1/√2, but nevertheless ν × ν is feasible. Feasibility can be checked via Theorem 2 or directly by constructing the information structure: both agents have two signals S_1 = S_2 = {s, s̄}; if the state is ω = h, then the pair of signals is (s̄, s̄) with probability 2(1−a)² and (s̄, s) or (s, s̄) with probabilities 1/2 − (1−a)² each; if the state is ω = ℓ, the signals are always (s, s). The uniform distribution is not a mean-preserving spread of ν since ∫ (x − 1/2)² dν(x) = (√2 − 1)/4 > ∫_0^1 (x − 1/2)² dx = 1/12.)

Given an information structure that induces P, one can apply an independent garbling to each coordinate to arrive at a structure that induces P′. Our proof illustrates how the result can be obtained via the no-trade arguments of our Theorem 3, without invoking Blackwell's Theorem.

Using Proposition 2, we show in § F that Gaussian signals may induce independent beliefs, provided that they are not too informative. Let ν be the belief distribution induced by a signal which, conditioned on the state, is a unit-variance Gaussian with mean ±d. Then ν × ν is 1/2-feasible if and only if d lies between the 1/4-quantile and the 3/4-quantile of the standard normal distribution.

Interestingly, product distributions with a given marginal cease being feasible once the number of agents becomes large enough, as demonstrated in the following proposition.
Proposition 3.
For every probability measure ν ∈ Δ([0,1]) that differs from a Dirac measure, for sufficiently large n the product distribution ν^n ∈ Δ([0,1]^n) is not feasible.

The proof, which is relegated to § C, uses our Theorem 3. We show that with a sufficiently large number of agents, a mediator can implement a strictly beneficial trading scheme, i.e., one that violates Theorem 3. In fact, we show that the product distribution ν^n is infeasible whenever ⌊n/2⌋ > ( ∫_m^1 x dν(x) − ∫_0^m x dν(x) )^{−1}, where m is the median of the marginal distribution ν.

First-order Bayesian persuasion.
In this section we consider a sender who sends information regarding an underlying state to a group of n receivers. The sender's utility depends on the actions of the receivers. We assume that each receiver's utility depends only on the state and her own action, as, for example, is common in the social learning literature. Since the equilibrium action of a receiver is dictated solely by her first-order beliefs, we call this setting first-order Bayesian persuasion. We note that the results of this section do not rely on the characterizations of feasibility given in Theorems 2 and 3, but rather offer an additional set of tools to study the set of feasible distributions of posteriors.

Formally, a first-order Bayesian persuasion problem is given by B = (N, p, (A_i)_{i∈N}, (u_i)_{i∈N}, u_s). As above, ω ∈ {ℓ, h} is a binary state for which the n = |N| receivers have a common prior p ∈ (0,1). Each receiver i ∈ N has to choose an action a_i from a compact metric set of actions A_i. Her utility u_i(ω, a_i) depends only on the state and her action. A single sender has utility u_s(a_1, ..., a_n), which depends on the receivers' actions. The sender chooses an information structure I = ((S_i)_{i∈N}, P) with prior p, and each receiver i observes a private signal s_i ∈ S_i and then chooses an action a_i. In equilibrium, the action

ã_i ∈ argmax_{a_i ∈ A_i} E(u_i(ω, a_i) | s_i)

is chosen by i to maximize her expected utility conditioned on her information s_i. Note that since u_i depends only on ω and a_i, it follows that given the information structure I, receiver i's posterior x_i = P(ω = h | s_i) is a sufficient statistic for her utility, i.e.,

max_{a_i ∈ A_i} E(u_i(ω, a_i) | s_i) = max_{a_i ∈ A_i} E(u_i(ω, a_i) | x_i),

and so a receiver does not decrease her expected utility by discarding her private signal, retaining only the first-order posterior belief x_i.
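The sufficiency of the first-order posterior can be illustrated with a small sketch (the payoff function below is hypothetical and not from the paper): because expected utility is linear in the posterior, the receiver's best reply depends on her signal only through x_i = P(ω = h | s_i).

```python
# A minimal sketch with a hypothetical payoff function: the best reply depends
# on the signal only through the posterior x = P(omega = h | s_i), since
# E(u(omega, a) | x) = x * u("h", a) + (1 - x) * u("l", a) is linear in x.
def best_action(x, actions, u):
    return max(actions, key=lambda a: x * u("h", a) + (1 - x) * u("l", a))

# Example: a receiver who wants to guess the state.
u = lambda omega, a: 1.0 if omega == a else 0.0
print(best_action(0.3, ["l", "h"], u))  # -> l
print(best_action(0.7, ["l", "h"], u))  # -> h
```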
We accordingly consider only equilibria in which the agents—even when they are indifferent—use only x_i to choose their actions, so that ã_i is a function of x_i. The information structure I is chosen to maximize the expectation of u_s. We assume that u_i and u_s are upper semicontinuous, to ensure the existence of equilibria. The value V(B) is the sender's expected equilibrium utility in the first-order Bayesian persuasion problem B.

A crucial feature is the assumption that u_i does not depend on the other agents' actions. The case in which externalities are allowed is the general problem of Bayesian mechanism design, which is beyond the scope of this paper. In contrast, in first-order Bayesian persuasion the receivers have no strategic incentives. This implies that their higher-order beliefs are irrelevant to the sender, who in turn is solely interested in their first-order posterior beliefs. This is captured by the following proposition, which states that for every first-order Bayesian persuasion problem there is an indirect utility function that the sender maximizes by choosing the receivers' posteriors. Of course, the posterior distribution must be feasible. Denote by P^N_p the set of p-feasible P ∈ Δ([0,1]^N).

Proposition 4.
For every first-order Bayesian persuasion problem B = (N, p, (A_i)_i, (u_i)_i, u_s) there is an indirect utility function v : [0,1]^N → R such that the value V(B) is given by

V(B) = max_{P ∈ P^N_p} ∫ v(x_1, ..., x_n) dP(x_1, ..., x_n).

(This refinement rules out equilibria in which, for example, an agent uses higher-order beliefs to break ties when indifferent between two actions. Nevertheless, our approach remains applicable even in the presence of externalities if the game played by the receivers is "simple" in the sense of Börgers and Li (2019), i.e., the equilibrium behavior of a receiver does not require her to form higher-order beliefs.)

Example: Polarizing receivers. Consider two receivers and a sender whose indirect utility is v(x_1, x_2) = |x_1 − x_2|^a, for some parameter a > 0. Informally, the sender wishes to maximize the polarization between the receivers, or the discrepancy between their posteriors. For the case a = 2 we solve the sender's problem completely.

Proposition 5. Let V(B) be the value of the two-receiver first-order Bayesian persuasion problem where the sender's indirect utility is v(x_1, x_2) = (x_1 − x_2)² and the prior is p ∈ (0,1). Then V(B) = (1 − p)p.

In the case a = 2, the optimum can be achieved by completely informing one agent and leaving the other completely uninformed, e.g., by letting s_1 = ω and s_2 = 0. We show in § C that when p = 1/2, the same policy is also optimal for all a < 2. Dubins and Pitman (1980) show the same result for a = 1 and any p.

For a ≥ 3, it is no longer true that it is optimal to reveal the state to one receiver and leave the other uninformed, which yields utility 2^{−a} to the sender. For example, the posterior distribution

P = 1/4 δ_{(0, 2/3)} + 1/4 δ_{(2/3, 0)} + 1/2 δ_{(2/3, 2/3)}    (10)

can be easily verified to be 1/2-feasible using (6), and yields utility (1/2)(2/3)^a, which for a = 3 (for example) is larger than 2^{−a}.

Another approach to quantifying the extent to which beliefs can be anti-correlated is by directly minimizing their covariance. This question admits a simple, non-trivial solution, as the next proposition shows: for the prior p = 1/2, the smallest possible covariance between posterior beliefs is −1/32, and is achieved on a distribution supported on four points.
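This bound can be verified exactly on the four-point distribution of Proposition 6 below; the atoms used here are an assumed reconstruction of that distribution, and the arithmetic is exact rational arithmetic.

```python
from fractions import Fraction as F

# Exact check (atoms are an assumed reconstruction of Proposition 6): both
# marginal means equal the prior 1/2, and the covariance of the posteriors
# equals -1/32.
atoms = {(F(1, 4), F(1, 1)): F(1, 8), (F(1, 4), F(1, 2)): F(3, 8),
         (F(3, 4), F(0, 1)): F(1, 8), (F(3, 4), F(1, 2)): F(3, 8)}

m1 = sum(w * x1 for (x1, _), w in atoms.items())
m2 = sum(w * x2 for (_, x2), w in atoms.items())
cov = sum(w * (x1 - m1) * (x2 - m2) for (x1, x2), w in atoms.items())
print(m1 == F(1, 2) and m2 == F(1, 2))  # True
print(cov)                              # -1/32
```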
Proposition 6.
Let V(B) be the value of the two-receiver first-order Bayesian persuasion problem where the prior is p = 1/2 and the sender's indirect utility is v(x_1, x_2) = −(x_1 − p)·(x_2 − p). Then V(B) = 1/32, and is achieved by

P = 1/8 δ_{(1/4, 1)} + 3/8 δ_{(1/4, 1/2)} + 1/8 δ_{(3/4, 0)} + 3/8 δ_{(3/4, 1/2)}.

P is depicted in Figure 1. The proofs of Propositions 5 and 6 are presented in § C. They do not rely on our characterization of feasible beliefs, and instead use the fact that conditional expectations are orthogonal projections in the Hilbert space of square-integrable random variables. This technique exploits the quadratic form of these persuasion problems. A natural avenue for future research is the extension of these techniques—or the development of new techniques—to tackle non-quadratic problems.

Figure 1: An optimal distribution of the first-order Bayesian persuasion problem given by the indirect utility v(x_1, x_2) = −(x_1 − p)·(x_2 − p) and prior p = 1/2; see Proposition 6. For p = 1/2, this distribution achieves the lowest possible covariance between posterior beliefs: −1/32. Blue points describe the distribution of posteriors conditional on ω = 0. Red points describe the distribution of posteriors conditional on ω = 1.

Extreme feasible joint posterior belief distributions.
The set P^N_p of p-feasible joint posterior belief distributions is a convex, compact subset of Δ([0,1]^N), when the latter is naturally equipped with the weak* topology; see Dubins and Pitman (1980) or Proposition 8 in the Appendix. Together with Proposition 4, this fact implies that V(B) is always achieved at an extreme point of the set of p-feasible distributions P^N_p. It is thus natural to study the set of extreme points.

In the single-agent case the concavification argument of Aumann and Maschler (1995) and Kamenica and Gentzkow (2011) implies that every extreme point is a distribution with support of size at most 2. This is not true for 2 or more agents. For example, the posterior distribution with support of size 3 defined in (10) is extreme in P²_{1/2}, since its restriction to any support of smaller cardinality is not feasible, as it cannot satisfy the martingale condition. The next theorem shows that there in fact exist extreme points with countably infinite support. It also states that the support cannot be too large, in the sense that every extreme point is supported on a set of zero Lebesgue measure.

Theorem 4.
Let |N| ≥ 2. Then:

1. For every p ∈ (0,1) there exists an extreme point in P^N_p whose support has an infinite countable number of points.

2. For every extreme P ∈ P^N_p there exists a measurable A ⊆ [0,1]^N such that P(A) = 1, and the Lebesgue measure of A is zero.

To prove the first part of Theorem 4 we explicitly construct an extreme feasible belief distribution with countably infinite support; see § C. The construction is a variant of Rubinstein's e-mail game (Rubinstein, 1989). Unlike Rubinstein's e-mail game, no agent in our construction is fully informed about the state. The resulting belief distribution for two agents is depicted in Figure 2. Interestingly, it is possible to modify our construction in a way that places the beliefs closer and closer to the diagonal. This results in a sequence of extreme points that converge to a distribution that is supported on the diagonal and has support size larger than two. Every extreme point that is supported on the diagonal is supported on at most two points, since this case reduces to the single-agent case. Therefore, for two agents or more, the set of extreme points is not closed within the set of feasible distributions. This demonstrates another distinction from the single-agent case, where the set of extreme points is closed.

The proof of the second part of Theorem 4 relies on the classic result of Lindenstrauss (1965) regarding extreme points of the set of joint distributions with given marginals.

Theorem 4 leaves a natural question open: Are there any non-atomic extreme points? Or, conversely, does every extreme point have countable support?
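The support-size-3 example discussed above can be checked mechanically. Using the atoms of (10) as reconstructed earlier (an assumption), the full distribution satisfies the martingale condition for p = 1/2, while every restriction to a proper sub-support fails it:

```python
from fractions import Fraction as F
from itertools import combinations

# Atoms of the three-point distribution (10), as reconstructed in the text.
atoms = [((F(0), F(2, 3)), F(1, 4)),
         ((F(2, 3), F(0)), F(1, 4)),
         ((F(2, 3), F(2, 3)), F(1, 2))]

def means(sub):
    tot = sum(w for _, w in sub)
    return (sum(w * x1 for (x1, _), w in sub) / tot,
            sum(w * x2 for (_, x2), w in sub) / tot)

print(means(atoms) == (F(1, 2), F(1, 2)))  # True: martingale holds for p = 1/2
for k in (1, 2):
    for sub in combinations(atoms, k):
        assert means(sub) != (F(1, 2), F(1, 2))  # every restriction fails it
print("no proper sub-support satisfies the martingale condition for p = 1/2")
```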
Assume we are given a feasible distribution P ∈ Δ([0,1]^n). A natural question is: which information structures induce P? And, relatedly, which conditional posterior belief distributions (P^ℓ, P^h) are compatible with P?

By Lemma 3 (stated in the Appendix), P is feasible if and only if there exists a distribution Q ∈ Δ([0,1]^n) such that Q ≤ (1/p)·P and dQ_i(x) = (x/p)·dP_i(x) for all agents i. Note that this is a linear program, which in general is infinite-dimensional. Given a solution Q, define P^h = Q and P^ℓ = (P − p·Q)/(1 − p). Then the distribution of posteriors P is induced by the information structure in which the signals have joint distribution P^ω conditional on ω (see Lemma 2 and its proof); in this case, we say that P is implemented by the pair (P^ℓ, P^h), which are also the conditional distributions of the beliefs. Thus, to find an information structure that induces P, it suffices to solve the above-mentioned linear program.

Figure 2: An extreme point of P²_{1/2} with infinite countable support. The numbers near the points indicate their probabilities. Conditional on ω = ℓ the pair of posteriors belongs to the set of blue points. Conditional on ω = h the pair of posteriors belongs to the set of red points.

When P has finite support, this linear program has finite dimension, and thus a solution can be numerically calculated by simply applying a linear program solver. In the general, infinite-dimensional case, we do not expect that simple, closed-form solutions always exist. An exception is the single-agent case, in which an implementing pair is given by (see Lemma 2)

dP^ℓ(x) = ((1 − x)/(1 − p)) dP(x),    dP^h(x) = (x/p) dP(x).

We make two observations about the single-agent case. First, there is a unique pair of conditional belief distributions (P^ℓ, P^h) that implements P: every information structure that induces P will have the same conditional distributions of beliefs. Second, the two distributions P^ℓ, P^h have the Radon–Nikodym derivative dP^h/dP^ℓ(x) = ((1 − p)/p)·(x/(1 − x)), and so are mutually absolutely continuous, unless there is an atom on 0 or on 1. As we now discuss, neither of these two properties holds in general beyond the single-agent case. We consider the case that the number of agents is n ≥
2, and that P ∈ Δ([0,1]^n) is a feasible distribution that admits a density. In this case, the next proposition shows that P can be implemented by a pair (P^ℓ, P^h) that is very far from being mutually absolutely continuous: the two distributions are supported on disjoint sets.

Proposition 7. Let n ≥ 2, and let P ∈ Δ([0,1]^n) be p-feasible for some p. Assume that P admits a density. Then there exists a subset D ⊂ [0,1]^n such that P can be implemented by (P^ℓ, P^h) with the property that P^h is supported on D and P^ℓ is supported on the complement D^c. Furthermore, restricted to D and D^c respectively, P^ℓ = (1/(1−p))·P and P^h = (1/p)·P.

Proof. Let R be a measure on [0,1]^n. For a subset D ⊆ [0,1]^n we denote by R|_D the restriction of R to this subset, i.e., R|_D(A) = R(D ∩ A) for any measurable set A. In Theorem 3 of Gutmann et al. (1991) it was shown that for any absolutely continuous, finite measure R on [0,1]^n and every measure Q ≤ R, there exists a measurable set D such that R|_D and Q have identical marginals.

Apply this result to R = (1/p)·P and a measure Q ≤ (1/p)·P that solves the linear program of Lemma 3. Then there is a D ⊂ [0,1]^n such that Q and (1/p)·P|_D have the same marginals. Let Q′ = (1/p)·P|_D. Then Q′ is also a solution to the linear program of Lemma 3: Q′ ≤ (1/p)·P and dQ′_i(x) = (x/p)·dP_i(x). Hence, P is implemented by P^h = Q′ = (1/p)·P|_D and P^ℓ = (P − p·Q′)/(1 − p) = (1/(1−p))·P|_{D^c}, which proves the claim.

When n ≥ 2, an implementation is not always unique, in contrast to the single-agent case. (These pairs are also easily seen to be extreme points among the convex set of pairs that implement P.) To see this, consider the case of n = 2 and P equal to the uniform distribution U_S on the square S = [c, 1−c]² with c ∈ [0, 1/2). It is 1/2-feasible since the marginals are second-order dominated by the uniform distribution on [0,1] (see Proposition 2). In §2, we saw the implementation of the uniform distribution on [0,1]², which corresponds to c = 0. For c > 0, that implementation suggests that solutions of the linear program of Lemma 3 might be found in the form of a convex combination of U_S and the uniform distribution U_T on the upper triangle T = S ∩ {x_1 + x_2 ≥ 1}. Indeed, it is easy to check that

Q = (1 − 2c)·U_T + 2c·U_S

has the correct marginals and satisfies Q ≤ 2·U_S. The corresponding pair of distributions (P^ℓ, P^h) is illustrated in Figure 3.

Figure 3: The implementation (P^ℓ, P^h) of the uniform distribution on [c, 1−c]². Darker/lighter colors indicate higher/lower densities.

Note that in this implementation the two distributions (P^ℓ, P^h) do not have disjoint supports. Hence Proposition 7 implies that another implementation exists. This is illustrated in Figure 4.

Figure 4: An implementation of the uniform distribution on [c, 1−c]² in which P^ℓ (depicted in blue) and P^h (depicted in red) have disjoint supports. We denote c∗ = 2c(1−c).

References
I. Arieli and Y. Babichenko. Private bayesian persuasion.
Journal of Economic Theory , 182:185–217,2019.I. Arieli, Y. Babichenko, and R. Smorodinsky. Identifiable information structures.
Games andEconomic Behavior , 120:16–27, 2020.R. J. Aumann. Agreeing to disagree.
The Annals of Statistics , pages 1236–1239, 1976.R. J. Aumann and M. Maschler.
Repeated games with incomplete information . MIT press, 1995.In collaboration with Richard E. Stearns.D. Bergemann and S. Morris. Information design: A unified perspective.
Journal of EconomicLiterature , 57(1):44–95, 2019.D. Bergemann, B. Brooks, and S. Morris. First-price auctions with general information structures:Implications for bidding and revenue.
Econometrica, 85(1):107–143, 2017.
D. Bergemann, T. Heumann, and S. Morris. Information, market power and price volatility. 2020.
D. Blackwell. Comparison of experiments. In
Proceedings of the Second Berkeley Symposium onMathematical Statistics and Probability , pages 93–102. University of California Press, 1951.D. Blackwell. Equivalent comparisons of experiments.
The annals of mathematical statistics , pages265–272, 1953.T. B¨orgers and J. Li. Strategically simple mechanisms.
Econometrica , 87(6):2003–2035, 2019.B. Brooks, A. P. Frankel, and E. Kamenica. Information hierarchies.
Available at SSRN 3448870 ,2019.B. A. Brooks and S. Du. Optimal auction design with common values: An informationally-robustapproach.
Available at SSRN 3137227 , 2019.K. Burdzy and S. Pal. Contradictory predictions. arXiv preprint arXiv:1912.00126 , 2019.K. Burdzy and J. Pitman. Bounds on the probability of radically different opinions.
Electronic Communications in Probability, 25, 2020.
S. Cichomski. Maximal spread of coherent distributions: a geometric and combinatorial perspective. arXiv preprint arXiv:2007.08022, 2020.
A. Dawid, M. DeGroot, and J. Mortera. Coherent combination of experts' opinions.
Test , 4(2):263–313, 1995.L. E. Dubins and J. Pitman. A maximal inequality for skew fields.
Zeitschrift f¨ur Wahrschein-lichkeitstheorie und verwandte Gebiete , 52(3):219–227, 1980.J. C. Ely. Beeps.
American Economic Review , 107(1):31–53, 2017.Y. Feinberg. Characterizing common priors in the form of posteriors.
Journal of Economic Theory ,91(2):127–179, 2000.D. Gale et al. A theorem on flows in networks.
Pacific J. Math , 7(2):1073–1082, 1957.A. Gershkov, J. K. Goeree, A. Kushnir, B. Moldovanu, and X. Shi. On the equivalence of bayesianand dominant strategy implementation.
Econometrica , 81(1):197–220, 2013.A. Gershkov, B. Moldovanu, and P. Strack. A theory of auctions with endogenous valuations.
Available at SSRN 3097217 , 2018.S. Gutmann, J. Kemperman, J. Reeds, and L. A. Shepp. Existence of probability measures withgiven marginals.
The Annals of Probability, pages 1781–1797, 1991.
G. Hansel and J.-P. Troallic. Sur le problème des marges.
Probability theory and related fields , 71(3):357–366, 1986.S. Hart and P. J. Reny. Implementation of reduced form mechanisms: a simple approach and anew characterization.
Economic Theory Bulletin , 3(1):1–8, 2015.E. Kamenica and M. Gentzkow. Bayesian persuasion.
American Economic Review , 101(6):2590–2615, 2011.H. G. Kellerer. Funktionen auf Produktr¨aumen mit vorgegebenen Marginal-Funktionen.
Mathe-matische Annalen , 144:323–344, 1961.G. Levy, I. M. de Barreda, and R. Razin. Persuasion with correlation neglect. 2020.J. Lindenstrauss. A remark on extreme doubly stochastic measures.
The American MathematicalMonthly , 72(4):379–382, 1965.G. Lorentz. A problem of plane measure.
American Journal of Mathematics , 71(2):417–426, 1949.L. Mathevet, J. Perego, and I. Taneva. On information design in games.
Journal of PoliticalEconomy , 2019.P. Milgrom and N. Stokey. Information, trade and common knowledge.
Journal of economic theory ,26(1):17–27, 1982.D. Monderer and D. Samet. Approximating common knowledge with common beliefs.
Games andEconomic Behavior , 1(2):170–190, 1989.S. E. Morris.
The role of beliefs in economic theory . PhD thesis, Yale University, 1991. URL https://economics.mit.edu/files/19990 .S. E. Morris. No trade and feasible joint posterior beliefs. a working paper , 2020. URL https://economics.mit.edu/files/20126 .Z. Neeman. Approximating agreeing to disagree results with common p-beliefs.
Games and Eco-nomic Behavior , 12(1):162–164, 1996.M. M. Neumann. The theorem of gale for infinite networks and applications. In
Infinite Program-ming , pages 154–171. Springer, 1985.A. Rubinstein. The electronic mail game: Strategic behavior under “almost common knowledge”.
The American Economic Review, pages 385–391, 1989.
H. J. Ryser. Combinatorial properties of matrices of zeros and ones.
Canadian Journal of Mathematics ,9:371–377, 1957.D. Samet. Common priors and separation of convex sets.
Games and economic behavior , 24(1-2):172–174, 1998.R. Shortt. The singularity of extremal measures.
Real Analysis Exchange , 12(1):205–215, 1986.V. Strassen. The existence of probability measures with given marginals.
The Annals of Mathe-matical Statistics , 36(2):423–439, 1965.I. Taneva. Information design.
American Economic Journal: Microeconomics , 11(4):151–85, 2019.J. Weinstein and M. Yildiz. Impact of higher-order uncertainty.
Games and Economic Behavior ,60(1):200–212, 2007.G. Ziegler. Adversarial bilateral information design.
Working Paper , 2020.
A Proof of Theorem 2
In this section we present a proof of Theorem 2, a result due to Dawid et al. (1995). We begin with the following simple lemma, which gives a direct revelation principle for joint belief distributions. Similar statements have appeared in the literature (e.g., Gutmann et al., 1991; Dawid et al., 1995; Levy et al., 2020; Ziegler, 2020; Arieli et al., 2020). We include a proof for the convenience of the reader.
Lemma 2.
Let P ∈ Δ([0,1]^n) be a p-feasible belief distribution. Then there exist P^ℓ, P^h ∈ Δ([0,1]^n) such that P = (1 − p)·P^ℓ + p·P^h and for every i ∈ {1, ..., n} the marginal distributions P^h_i and P_i satisfy

dP^h_i(x) = (x/p) dP_i(x).    (11)

In particular, every feasible belief distribution can be induced by an information structure in which, for each i, the belief x_i is equal to the signal s_i.

The next lemma gives a necessary and sufficient condition for feasibility in terms of the existence of a measure with given marginals and with a given upper bound.
Lemma 3.
For n ≥ 1 agents, a distribution P ∈ Δ([0,1]^n) is p-feasible if and only if there exists a probability measure Q ∈ Δ([0,1]^n) such that

1. Q is upper-bounded by (1/p)·P, i.e., Q(A) ≤ (1/p)·P(A) for any measurable A ⊆ [0,1]^n, and

2. for every i ∈ {1, ..., n} the marginal distribution Q_i is given by

dQ_i(x) = (x/p) dP_i(x).    (12)

Proof. If P is feasible, then by Lemma 2 there is a pair P^ℓ, P^h ∈ Δ([0,1]^n) such that P = (1 − p)·P^ℓ + p·P^h and such that (11) holds. Picking Q = P^h, it follows from (11) that Q ≤ (1/p)·P and that Q has the desired marginals. In the opposite direction: if there is a Q satisfying the theorem hypothesis, then let P^h = Q and P^ℓ = (P − p·Q)/(1 − p). Consider the information structure I in which ω = h with probability p, and (s_1, ..., s_n) is chosen from P^h conditioned on ω = h, and from P^ℓ conditioned on ω = ℓ. It follows from (12) that P is the joint posterior distribution induced by I, and it is hence p-feasible.

Lemma 3 reduces the question of feasibility to the question of the existence of a bounded measure with given marginals. This is a well-studied question in the case n = 2. When the upper bound is proportional to the Lebesgue measure, Lorentz (1949) provided the answer in terms of first-order stochastic dominance of the marginals. The discrete analog of this result for matrices is known as the Gale–Ryser Theorem (Gale et al., 1957; Ryser, 1957). The condition for general upper-bound measures was derived by Kellerer (1961, Satz 4.2); the formulation below is due to Strassen (1965, Theorem 6).

Theorem B (Kellerer, Strassen). Fix P ∈ Δ([0,1]²), 0 < p ≤ 1, and M_1, M_2 ∈ Δ([0,1]). Then the following are equivalent:

1. There exists a probability measure Q ∈ Δ([0,1]²) that is bounded from above by (1/p)·P, and which has marginals Q_i = M_i.

2. For every measurable B_1, B_2 ⊆ [0,1] it holds that

(1/p)·P(B_1 × B_2) ≥ M_1(B_1) + M_2(B_2) − 1.    (13)

As we now show, Theorem 2 is a consequence of Kellerer's Theorem and Lemma 3.

Proof of Theorem 2.
When P is p-feasible, (6) follows immediately from Corollary 1. To prove the other direction, we assume that P satisfies (6), and prove that P is p-feasible for some p. (We note that this result can also be proved by interpreting the infinite-dimensional linear program for Q from Lemma 3 as the condition that there is a feasible flow of magnitude 1 in an auxiliary network, as in Gale et al. (1957) but with a continuum of edges, and then using max-flow/min-cut duality for infinite networks; see Neumann, 1985.)
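For finitely supported P the existence of the measure Q in Lemma 3 indeed becomes a finite problem, and for n = 2 it can be settled by enumerating Kellerer's condition (13) over all pairs of subsets of the supports. A sketch under two hypothetical toy distributions of our own (not from the paper): a feasible one, in which agent 1 is fully informed and agent 2 learns nothing, and an infeasible one, in which the two agents are certain of opposite states.

```python
from fractions import Fraction as Fr
from itertools import chain, combinations

def subsets(s):
    """All subsets of a finite collection, as tuples."""
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def kellerer_ok(P, p):
    """P: dict {(x1, x2): prob}. Check condition (13) on all subsets of the supports."""
    supp1 = sorted({x for x, _ in P})
    supp2 = sorted({y for _, y in P})
    # Rescaled marginals dM_i(x) = (x / p) dP_i(x), as in the proof of Theorem 2.
    M1 = {x: sum(m for (a, _), m in P.items() if a == x) * x / p for x in supp1}
    M2 = {y: sum(m for (_, b), m in P.items() if b == y) * y / p for y in supp2}
    for B1 in subsets(supp1):
        for B2 in subsets(supp2):
            lhs = sum(m for (a, b), m in P.items() if a in B1 and b in B2) / p
            rhs = sum(M1[x] for x in B1) + sum(M2[y] for y in B2) - 1
            if lhs < rhs:
                return False
    return True

half = Fr(1, 2)
# Agent 1 fully informed, agent 2 uninformed: feasible.
P_good = {(Fr(1), half): half, (Fr(0), half): half}
# Perfectly anti-correlated certainty: infeasible (the agents "agree to disagree").
P_bad = {(Fr(1), Fr(0)): half, (Fr(0), Fr(1)): half}
assert kellerer_ok(P_good, half) and not kellerer_ok(P_bad, half)
```

The enumeration is exponential in the support size, which is fine for illustrations; a linear-programming or flow formulation scales better.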
We will use Kellerer's Theorem to show that there exists a Q ∈ ∆([0,1]²) satisfying the conditions of Lemma 3.

Assume then that P satisfies (6), i.e. for every measurable A_1, A_2 ⊆ [0,1] it holds that

P(A_1 × A_2^c) ≥ ∫_{A_1} x dP_1(x) − ∫_{A_2} x dP_2(x) ≥ −P(A_1^c × A_2).    (14)

Applying this to A_1 = A_2 = [0,1] yields

∫ x dP_1(x) = ∫ x dP_2(x).

We accordingly set p = ∫ x dP_1(x), and define the measures M_1, M_2 ∈ ∆([0,1]) by dM_i(x) = (x/p)·dP_i(x). Given any two measurable subsets B_1, B_2 ⊆ [0,1], applying the left-hand inequality of (14) to A_1 = B_1 and A_2 = B_2^c yields

P(B_1 × B_2) ≥ ∫_{B_1} x dP_1(x) − ∫_{B_2^c} x dP_2(x).

Since dM_i(x) = (x/p)·dP_i(x) we can write this as

P(B_1 × B_2) ≥ p·M_1(B_1) − p·M_2(B_2^c);

dividing by p and substituting M_2(B_2^c) = 1 − M_2(B_2) yields

(1/p)·P(B_1 × B_2) ≥ M_1(B_1) + M_2(B_2) − 1.

Thus condition (13) holds, and we can directly apply Kellerer's theorem to conclude that there exists a measure Q that satisfies the conditions of Lemma 3, as condition (12) is simply Q_i = M_i. Hence P is p-feasible.

B Proof of Theorem 3
In this section we prove Theorem 3. The proof of necessity is straightforward and explained before the theorem statement. The proof of sufficiency uses Théorème 2.6 of Hansel and Troallic (1986), which is a generalization of Kellerer's Theorem.

Recall that a paving of a set X is a set of subsets of X that includes the empty set, and an algebra is a paving that is closed under unions and complements. By a collection we mean a multiset, i.e., the same element can enter the collection several times. Denote R̄_+ = [0, ∞]. A finitely additive measure is a map from an algebra to R̄_+ that is additive for disjoint sets.

Theorem C (Hansel and Troallic (1986)). Let X be a set, and F an algebra of subsets of X. Let F_1, …, F_n, G_1, …, G_m be subpavings of F. For i ∈ {1, …, n} and j ∈ {1, …, m} let α_i : F_i → R̄_+ and β_j : G_j → R̄_+ be maps that vanish on the empty set. Then the following are equivalent:

1. There is a finitely additive measure Q : F → R̄_+ such that, for every i ∈ {1, …, n} and every A ∈ F_i it holds that α_i(A) ≤ Q(A), and for every j ∈ {1, …, m} and every B ∈ G_j it holds that Q(B) ≤ β_j(B).

2. For all finite collections of sets A_1 ⊆ F_1, …, A_n ⊆ F_n, B_1 ⊆ G_1, …, B_m ⊆ G_m, if

Σ_{i=1}^n Σ_{A ∈ A_i} 1_A ≤ Σ_{j=1}^m Σ_{B ∈ B_j} 1_B

pointwise on X, then

Σ_{i=1}^n Σ_{A ∈ A_i} α_i(A) ≤ Σ_{j=1}^m Σ_{B ∈ B_j} β_j(B).

We will need the following corollary of this result.
Corollary C.
Fix P ∈ ∆([0,1]^n), 0 < p ≤ 1, and M_1, …, M_n ∈ ∆([0,1]). Then the following are equivalent:

1. There exists a probability measure Q ∈ ∆([0,1]^n) that is upper-bounded by (1/p)·P, and which has marginals Q_i = M_i.

2. For every A_1, …, A_n, finite collections of Borel subsets of [0,1], every C, a finite collection of Borel subsets of [0,1]^n, and for every non-negative integer K, if

Σ_{i=1}^n Σ_{A ∈ A_i} 1_A(x_i) ≤ K + Σ_{C ∈ C} 1_C(x_1, …, x_n)    (15)

for all (x_1, …, x_n), then

Σ_{i=1}^n Σ_{A ∈ A_i} M_i(A) ≤ K + Σ_{C ∈ C} (1/p)·P(C).    (16)

Proof.
The proof that 1. implies 2. is simple and omitted; we will only make use of the other, non-trivial direction.

Let F be the Borel sigma-algebra of [0,1]^n. For each i ∈ {1, …, n} let F_i be the sub-sigma-algebra of sets that are measurable in the i-th coordinate. That is, F_i consists of sets of the form [0,1]^{i−1} × A × [0,1]^{n−i}, where A is any Borel subset of [0,1]; denote by π_i : [0,1]^n → [0,1] the projection on the i-th coordinate. Let m = 2. Define G_1 as the trivial algebra {∅, [0,1]^n} and G_2 = F. Let α_i = M_i, β_1([0,1]^n) = 1, β_1(∅) = 0, and β_2 = (1/p)·P.

Let A_1, …, A_n and C satisfy (15). By abuse of notation we identify each A ∈ A_i with its preimage π_i^{−1}(A) = [0,1]^{i−1} × A × [0,1]^{n−i}. Thus A_i ⊆ F_i. We define B_1 and B_2 as follows. The collection B_1 ⊆ G_1 contains K copies of [0,1]^n, and B_2 = C ⊆ G_2.

We can apply Theorem C directly to conclude that there is a finitely additive measure Q that is upper-bounded by (1/p)·P, has Q([0,1]^n) ≤ 1, and whose marginals Q_i bound M_i from above for i ∈ N. Each M_i is a probability measure and the total mass of Q is at most 1; hence, Q_i can upper-bound M_i only if Q_i = M_i. We conclude that Q is a probability measure with marginals M_i for each i. Since Q is upper-bounded by a sigma-additive measure, it is also itself sigma-additive (see Lemma 4 below).

Lemma 4.
Let F be a sigma-algebra of subsets of X. Let µ : F → R̄_+ be a finitely additive measure, and let ν : F → R̄_+ be a sigma-additive measure. If µ ≤ ν and ν(X) < ∞ then µ is sigma-additive.

Proof. Let A_1, A_2, … ∈ F be pairwise disjoint. Denote A = ∪_i A_i. Then, by additivity, we have that

µ(A) = µ(∪_{i=1}^n A_i) + µ(∪_{i=n+1}^∞ A_i) ≥ µ(∪_{i=1}^n A_i) = Σ_{i=1}^n µ(A_i).

Hence µ(A) ≥ Σ_{i=1}^∞ µ(A_i). For the other direction,

µ(A) = µ(∪_{i=1}^n A_i) + µ(∪_{i=n+1}^∞ A_i) ≤ Σ_{i=1}^n µ(A_i) + ν(∪_{i=n+1}^∞ A_i),

since µ ≤ ν. By the sigma-additivity and finiteness of ν, the last addend vanishes as n tends to infinity. Hence µ(A) ≤ Σ_{i=1}^∞ µ(A_i).

We are now ready to finish the proof of Theorem 3. We will show that (8) suffices for feasibility by checking that it implies the condition of Lemma 3. Assume then that P satisfies (8), i.e. for every trading scheme (a_1, …, a_n) it holds that

∫_{[0,1]^n} ( Σ_{i=1}^n a_i(x_i)·x_i − max{0, Σ_{i=1}^n a_i(x_i)} ) dP(x_1, …, x_n) ≤ 0.    (17)

Choose i, j ∈ {1, …, n}, and consider the trading scheme in which a_i = 1, a_j = −1, and a_k = 0 for all k
∉ {i, j}. Then (17) implies that ∫ x dP_i(x) = ∫ x dP_j(x). We accordingly set p = ∫ x dP_i(x), and note that this definition is independent of the choice of i. Define the measures M_i ∈ ∆([0,1]) by dM_i(x) = (x/p)·dP_i(x).

Let A_1, …, A_n, K, and C satisfy (15). We will show that (16) must hold. This will conclude the proof, since then it follows from Corollary C that there exists a measure Q that satisfies the conditions of Lemma 3, as condition (12) is simply Q_i = M_i.

To show that (16) holds, define the trading scheme (a_1, …, a_n) by

a_i(x) = c·( Σ_{A ∈ A_i} 1_A(x) − K/n ),

where c is a normalization constant chosen small enough so that the image of all the a_i's is in [−1, 1]. By (17),

∫_{[0,1]^n} Σ_{i=1}^n a_i(x_i)·x_i dP(x_1, …, x_n) ≤ ∫_{[0,1]^n} max{0, Σ_{i=1}^n a_i(x_i)} dP(x_1, …, x_n).

We substitute the definition of the trading scheme and the measures M_i, and divide both sides by c. This yields

p·Σ_{i=1}^n Σ_{A ∈ A_i} M_i(A) − p·K ≤ ∫_{[0,1]^n} max{0, Σ_{i=1}^n Σ_{A ∈ A_i} 1_A(x_i) − K} dP(x_1, …, x_n).

By (15), the integrand on the right-hand side is at most Σ_{C ∈ C} 1_C(x_1, …, x_n). Hence

p·Σ_{i=1}^n Σ_{A ∈ A_i} M_i(A) − p·K ≤ Σ_{C ∈ C} P(C).

Dividing by p and rearranging yields

Σ_{i=1}^n Σ_{A ∈ A_i} M_i(A) ≤ K + Σ_{C ∈ C} (1/p)·P(C).

Thus condition (16) holds, and by Lemma 3 the distribution P is p-feasible. This completes the proof of Theorem 3.

We end this section by noting a different avenue for proving this theorem. Lemma 3 gives a necessary and sufficient condition for a distribution P to be feasible, in terms of the existence of a distribution Q satisfying two properties. Since the support of Q is always a subset of the support of P, for finitely supported P the conditions of the lemma reduce to a finite number of equalities and inequalities on a finite number of variables. Theorem 3 then becomes a corollary of the Farkas lemma. Theorem 3 for general distributions can then be deduced by approximation arguments.
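The finite-support point of view also applies to the no-trade condition (8) itself: for a finitely supported P, the mediator's expected profit from any given trading scheme is a finite sum. The sketch below (the distributions and the scheme are again hypothetical examples of our own) certifies infeasibility of one distribution by exhibiting a profitable scheme; a non-positive profit for a single scheme does not, by itself, prove feasibility.

```python
from fractions import Fraction as Fr

def profit(P, scheme):
    """E( sum_i a_i(x_i) x_i - max(0, sum_i a_i(x_i)) ) for P a dict {beliefs: prob}."""
    total = Fr(0)
    for xs, mass in P.items():
        s = sum(ai(xi) for ai, xi in zip(scheme, xs))
        gain = sum(ai(xi) * xi for ai, xi in zip(scheme, xs))
        total += mass * (gain - max(Fr(0), s))
    return total

half = Fr(1, 2)
a = lambda x: 2 * x - 1   # buy at high beliefs, sell at low ones; a(x) in [-1, 1]

# Anti-correlated certainty: the scheme earns 1 in expectation, so P_bad is infeasible.
P_bad = {(Fr(1), Fr(0)): half, (Fr(0), Fr(1)): half}
assert profit(P_bad, [a, a]) > 0

# Fully informing agent 1 only: this scheme yields no profit, as (8) requires.
P_good = {(Fr(1), half): half, (Fr(0), half): half}
assert profit(P_good, [a, a]) <= 0
```

Searching over all schemes is exactly the linear program whose dual, by the Farkas lemma, gives the finite-support version of Theorem 3.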
A similar approach to the characterization of finitely supported feasible distributions was taken by Morris (2020).

C Additional missing claims and proofs
Proof of Lemma 1.
To prove the lemma we assume that P′ is not p-feasible and show that P is not p-feasible.

Let (x_1, x′_1, …, x_n, x′_n) be random variables such that (i) each pair (x_i, x′_i) is independent of the rest, (ii) x_i has distribution µ_i and x′_i has distribution µ′_i, and (iii) E(x_i | x′_i) = x′_i, which is possible since µ_i is a mean-preserving spread of µ′_i.

By this definition, P is the joint distribution of (x_1, …, x_n) and P′ is the joint distribution of (x′_1, …, x′_n). Hence, by Theorem 3, in order to prove our claim it suffices to find a trading scheme (a_1, …, a_n) such that

E( Σ_{i=1}^n a_i(x_i)·x_i − max{0, Σ_{i=1}^n a_i(x_i)} ) > 0,    (18)

which would violate (8).

Since P′ is not p-feasible, it follows from Theorem 3 that there exists a trading scheme (a′_1, …, a′_n) such that

E( Σ_{i=1}^n a′_i(x′_i)·x′_i − max{0, Σ_{i=1}^n a′_i(x′_i)} ) > 0.    (19)

Define a_i by

a_i(x_i) = E(a′_i(x′_i) | x_i).

(This only defines a_i µ_i-almost everywhere, which is sufficient for our needs; the rest of the definition can be completed using any version of the conditional expectation.)

It follows from the definition of a_i, from the law of iterated expectation, and from E(x_i | x′_i) = x′_i that

E(a_i(x_i)·x_i) = E(E(a′_i(x′_i) | x_i)·x_i)
 = E(E(a′_i(x′_i)·x_i | x_i))
 = E(a′_i(x′_i)·x_i)
 = E(E(a′_i(x′_i)·x_i | x′_i))
 = E(a′_i(x′_i)·E(x_i | x′_i))
 = E(a′_i(x′_i)·x′_i).    (20)
In addition, we have

E(−max{0, Σ_{i=1}^n a_i(x_i)})
 = E(−max{0, Σ_{i=1}^n E(a′_i(x′_i) | x_i)})
 = E(−max{0, E(Σ_{i=1}^n a′_i(x′_i) | x_1, …, x_n)})
 ≥ E(E(−max{0, Σ_{i=1}^n a′_i(x′_i)} | x_1, …, x_n))
 = E(−max{0, Σ_{i=1}^n a′_i(x′_i)}),    (21)

where the first equality holds by the definition of a_i, the second equality holds since the pairs (x_i, x′_i) are independent across i, the third inequality follows from Jensen's inequality since the function x ↦ −max{0, x} is concave, and the last equality is again the law of iterated expectation.

Together, (20) and (21) imply that the left-hand side of (18) is at least as large as that of (19). Since the latter is positive, we conclude that

E( Σ_{i=1}^n a_i(x_i)·x_i − max{0, Σ_{i=1}^n a_i(x_i)} ) ≥ E( Σ_{i=1}^n a′_i(x′_i)·x′_i − max{0, Σ_{i=1}^n a′_i(x′_i)} ) > 0,

as desired.

Proof of Proposition 2.
Assume that the uniform distribution is a mean-preserving spread of ν (or, equivalently, that ν is a mean-preserving contraction of the uniform distribution). As shown before the statement of the proposition, the uniform distribution is 1/2-feasible. Hence P = ν × ν is 1/2-feasible by Lemma 1.

Conversely, let ν be a probability measure that is symmetric around 1/2, and assume that ν is not a mean-preserving contraction of the uniform distribution. We show that ν × ν is not 1/2-feasible. Let F(x) = ν([0, x]) be the cumulative distribution function of ν. Since the uniform distribution is not a mean-preserving spread of ν, there must be y ∈ [0,1] such that

H(y) = ∫_0^y F(x) dx − ∫_0^y x dx = ∫_0^y F(x) dx − y²/2 > 0.

Note that H is continuous, and furthermore differentiable. Since ν is symmetric and ∫_0^1 F(x) dx = 1/2, the function H is also symmetric around 1/2: H(y) = H(1 − y).

The function H vanishes at the end-points of the interval, and, as we noted above, is positive somewhere on [0,1]. Let y be a maximizer of H; by symmetry, we can assume y ∈ (0, 1/2]. By construction, H(y) > 0, and since H is differentiable, we also have the first-order condition H′(y) = F(y) − y = 0.

Let z = (1/F(y))·∫_0^y x dν(x). We claim that z < y/2. To see this note that

z = (1/F(y))·∫_0^y x dF(x) = (1/F(y))·[ y·F(y) − ∫_0^y F(x) dx ] < y − (y²/2)/F(y) = y − y/2 = y/2.

The second equality follows from integration by parts, the inequality holds since ∫_0^y F(x) dx > y²/2, and the last equality follows since F(y) = y.

Let A_1 = [1 − y, 1] and A_2 = [0, y]. Let P = ν × ν. Note that by symmetry ν([1 − y, 1]) = F(y) = y and ν((y, 1]) = 1 − F(y) = 1 − y, so by independence P(A_1 × A_2^c) = y·(1 − y). In addition, by construction, ∫_{A_1} x dP_1(x) = y − yz and ∫_{A_2} x dP_2(x) = yz. Therefore

∫_{A_1} x dP_1(x) − ∫_{A_2} x dP_2(x) − P(A_1 × A_2^c) = y − 2yz − y(1 − y) = y² − 2yz > y² − y² = 0,

where the inequality follows since z < y/2. Therefore, Theorem 2 implies that P = ν × ν is not feasible.

Proof of Proposition 3.
Let m ∈ (0,1) be the median of ν. We first prove the proposition for the case where ν has no atom at m. We assume for simplicity that the number of agents is even, i.e., n = 2k (otherwise, we simply ignore the last agent).

We consider the trading scheme (a_1, …, a_{2k}) in which the first k agents buy one unit when their belief exceeds the median and the last k agents sell one unit when their belief is below it. Let B_1 be the number of agents among the first k with x_i ∈ (m, 1] (i.e., those that buy the product), and let B_2 be the number of agents among the last k with x_i ∈ [0, m] (those that sell it). Since ν^n is a product distribution, B_1 is independent of B_2, and each is a Binomial(k, 1/2) random variable. Note also that Σ_{i=1}^n a_i(x_i) = B_1 − B_2, and so the lower bound on the mediator's profit given by (8) equals

k·∫_m^1 x dν(x) − k·∫_0^m x dν(x) − E(max{0, B_1 − B_2}).

We observe that E(max{0, B_1 − B_2}) ≤ (1/2)·√(k/2): we have E(B_1 − B_2) = 0 and E((B_1 − B_2)²) = k/2. Jensen's inequality implies that E(|B_1 − B_2|) ≤ √(k/2), and by symmetry we deduce that E(max{B_1 − B_2, 0}) ≤ (1/2)·√(k/2).

It follows that for k large enough, namely k > (1/8)·( ∫_m^1 x dν − ∫_0^m x dν )^{−2}, this lower bound on the mediator's profit is positive, and thus by Theorem 3 the belief distribution ν^n is not feasible.

In case the distribution ν has an atom at m, the only needed change is to choose a_i(m) so that the expected number of units that each agent buys or sells is 1/2; the same calculation applies in this case.

Proof of Lemma 2.
Let I = ((S_i)_{i ∈ N}, P) be an information structure with prior p that induces P. By the law of total expectation,

x_i = P(ω = h | s_i) = P(ω = h | x_i)

almost everywhere. Hence, if we define a new information structure I′ for which the signals are s′_i = x_i and the beliefs are, accordingly, x′_i = P(ω = h | s′_i), we will have that x′_i = s′_i, which proves the second part of the claim.

Denote by P^ℓ and P^h the conditional distributions of (x′_1, …, x′_n), conditioned on ω = ℓ and ω = h, respectively. Then clearly P = (1 − p)·P^ℓ + p·P^h. Finally, (11) holds, since by Bayes' law and the fact that x′_i = s′_i,

x′_i = P(ω = h | x′_i) = p · (dP^h_i / dP_i)(x′_i).

Proof of Proposition 4.
Let B = (N, p, (A_i)_i, (u_i)_i, u_s) be a first-order Bayesian persuasion problem. By our equilibrium refinement assumption, for each receiver i there is a map α_i : [0,1] → A_i such that in equilibrium receiver i chooses action ã_i = α_i(x_i) when their posterior is x_i. Hence, if we set

v(x_1, …, x_n) = u_s(α_1(x_1), …, α_n(x_n)),

then, given an information structure chosen by the sender,

E(u_s(ã_1, …, ã_n)) = E(u_s(α_1(x_1), …, α_n(x_n))) = E(v(x_1, …, x_n)).

Denoting by P the posterior belief distribution induced by the chosen information structure, we have shown that the sender's expected utility is

∫ v(x_1, …, x_n) dP(x_1, …, x_n).

Since the sender, by choosing the information structure, can choose any p-feasible distribution, we arrive at the desired conclusion.

Proof of Proposition 5.
We first show that every information structure I = (S_1, S_2, P) with prior p chosen by the sender yields utility at most (1 − p)·p.

This proof uses the fact that conditional expectations are orthogonal projections in the Hilbert space of square-integrable random variables. We now review this fact. For the probability space (Ω × S_1 × S_2, P), denote by L² the Hilbert space of square-integrable random variables, equipped with the usual inner product (X, Y) = E(X·Y) and corresponding norm ‖X‖ = √(E(X²)). Given a sub-sigma-algebra G ⊆ F, denote by L²(G) ⊆ L² the closed subspace of G-measurable, square-integrable random variables. Recall that the map X ↦ E(X | G) is simply the orthogonal projection L² → L²(G).

The following is an elementary lemma regarding Hilbert spaces. We will, of course, apply it to L².

Lemma 5.
Let u be a vector in a Hilbert space U, and let w be an orthogonal projection of u to a closed subspace W ⊆ U. Then w lies on a sphere of radius ‖u‖/2 around u/2, i.e. ‖w − u/2‖ = ‖u‖/2.

Proof. Since w is an orthogonal projection of u, we can write u = w + w′, where w and w′ are orthogonal. This orthogonality implies that ‖w − w′‖ = ‖w + w′‖ = ‖u‖. It follows that

‖w − u/2‖ = ‖w − (w + w′)/2‖ = ‖(w − w′)/2‖ = (1/2)·‖w − w′‖ = (1/2)·‖u‖.

Let X = 1_{ω = h} be the indicator of the event that the state is high. Hence x_i = E(X | F_i), where F_i = σ(s_i) is the sigma-algebra generated by the private signal s_i. Denote X̂ = X/p − 1. Hence P(X̂ = 1/p − 1) = p, P(X̂ = −1) = 1 − p, and ‖X̂‖² = (1 − p)/p. Denote X̂_i = x_i/p − 1, so that X̂_i = E(X̂ | F_i). Since X̂_i is the projection of X̂ to the subspace L²(F_i), it follows from Lemma 5 that

‖X̂_i − X̂/2‖ = (1/2)·√((1 − p)/p).

By the triangle inequality, ‖X̂_1 − X̂_2‖ ≤ √((1 − p)/p), and since X̂_i = x_i/p − 1,

‖x_1/p − x_2/p‖ ≤ √((1 − p)/p), or E((x_1/p − x_2/p)²) ≤ (1 − p)/p.

Thus

E((x_1 − x_2)²) ≤ (1 − p)·p.

To finish the proof of Proposition 5, we note that a simple calculation shows that the sender can achieve the expected utility (1 − p)·p by completely informing one agent, and giving no information to the other. That is, by inducing the joint posterior distribution P = p·δ_{(1,p)} + (1 − p)·δ_{(0,p)}.

Remark on the polarizing first-order Bayesian persuasion problem with a < 2 and p = 1/2. For the first-order Bayesian persuasion problem B_a given by v(x_1, x_2) = |x_1 − x_2|^a and symmetric prior p = 1/2, we argue that P* = (1/2)·δ_{(1,1/2)} + (1/2)·δ_{(0,1/2)} is an optimal posterior distribution for all a ∈ (0, 2], so that the value V(B_a) of the problem equals 2^{−a}. To prove this, it is enough to check the upper bound V(B_a) ≤ 2^{−a}.

Consider an arbitrary 1/2-feasible policy P′. Hölder's inequality implies

∫ v(x_1, x_2) dP′ = ∫ v(x_1, x_2)·1 dP′ ≤ ( ∫ (v(x_1, x_2))^q dP′ )^{1/q} · ( ∫ 1^{q′} dP′ )^{1/q′},

where q, q′ > 1 and 1/q + 1/q′ = 1. Picking q = 2/a and taking the supremum over P′ on both sides, we get V(B_a) ≤ (V(B_2))^{a/2}. By Proposition 5, V(B_2) = 1/4 and we obtain the required upper bound.

Proof of Proposition 6.
It is immediate to check that the expected utility for the distribution P given in the theorem statement is 1/32. It thus remains to prove that no other distribution can achieve a higher expected utility.

We retain the notation used in the proof of Proposition 5. Let (x_1, x_2) be the posterior beliefs of two agents induced by some information structure. Since the prior is p = 1/2, X̂_i = 2x_i − 1, and so

E(v(x_1, x_2)) = −E((x_1 − 1/2)·(x_2 − 1/2)) = −(1/4)·E(X̂_1·X̂_2).

We recall that (·, ·) denotes the inner product in the Hilbert space L². Thus we can write

E(v(x_1, x_2)) = −(1/4)·(X̂_1, X̂_2).    (22)

By Lemma 5, the vectors X̂_1, X̂_2 ∈ L² lie on a sphere of radius ‖X̂‖/2 around X̂/2 ∈ L². Note that since p = 1/2, ‖X̂‖ = 1. Since {X̂, X̂_1, X̂_2} span a subspace that is (at most) three dimensional, the question is reduced to the following elementary question in three-dimensional geometry: given a vector w ∈ R³, what is the minimum of the inner product (u_1, u_2), as u_1, u_2 ∈ R³ range over the sphere of radius ‖w‖ around w? Lemma 6 states that the minimum is −‖w‖²/2. Hence by (22)

E(v(x_1, x_2)) ≤ 1/32.

Figure 5: Lemma 6 states that −1/2 is the smallest possible inner product between two vectors that lie on a unit sphere which intersects the origin. The depicted u_1 and u_2 achieve this minimum.

Lemma 6. Let w ∈ R³. Then

min{ (u_1, u_2) : ‖w − u_1‖ = ‖w − u_2‖ = ‖w‖ } = −‖w‖²/2.

The proof is left as an exercise to the reader. Alternatively, this problem can be solved symbolically using the Mathematica command
Minimize[{x1*y1 + x2*y2 + x3*y3,(x1 - 1)^2 + (x2)^2 + (x3)^2 == 1 && (y1 - 1)^2 + (y2)^2 + (y3)^2 == 1},{x1, x2, x3, y1, y2, y3}]
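A quick numerical cross-check of Lemma 6 (ours, not from the paper): writing u_i = w + v_i with ‖v_i‖ = ‖w‖ = 1 and splitting each v_i into a component a_i along w and a component perpendicular to w, the inner product (u_1, u_2) is smallest when the perpendicular components are anti-aligned. A two-dimensional grid search over (a_1, a_2) then recovers the minimum −‖w‖²/2 = −1/2.

```python
import math

# Lemma 6 check for w of norm 1: u_i = w + v_i with |v_i| = |w| = 1.
# With v_i = (a_i, q_i), q_i perpendicular to w, the inner product is
#   (u_1, u_2) = 1 + a_1 + a_2 + a_1*a_2 + q_1 . q_2,
# and q_1 . q_2 >= -|q_1||q_2|, with equality for anti-aligned q_i.
def inner(a1, a2):
    return 1 + a1 + a2 + a1 * a2 - math.sqrt((1 - a1 * a1) * (1 - a2 * a2))

grid = [i / 200 - 1 for i in range(401)]   # a_i ranges over [-1, 1]
best = min(inner(a1, a2) for a1 in grid for a2 in grid)
assert abs(best - (-0.5)) < 1e-3           # Lemma 6: the minimum is -|w|^2 / 2 = -1/2
```

The minimum is attained at a_1 = a_2 = −1/2, matching the configuration depicted in Figure 5.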
Proof of Theorem 4.
We start by constructing an extreme point of P_p with infinite support. Thereafter, we extend this construction for N ≥ 2 agents. Let S_1 = S_2 = {1, 2, …}. Choose at random a number K, which is distributed geometrically with parameter r that depends on the state: when ω = ℓ, r = 2/3, and when ω = h, r = 1/2. The signals (s_1, s_2) that the agents receive are equal to (K, K) when ω = ℓ, and to (K + 1, K) when ω = h.

A simple calculation shows that when agent 1 observes a signal s_1 = k her posterior is x_1(k) = t_k, and when agent 2 observes s_2 = k, her posterior is x_2(k) = w_k, where t_k and w_k are given by

t_k = 2^{−k+1}·p / ( 2^{−k+1}·p + 2·3^{−k}·(1 − p) ) for k ≥ 2,  t_1 = 0,

w_k = 2^{−k}·p / ( 2^{−k}·p + 2·3^{−k}·(1 − p) ) for k ≥ 1.

The induced conditional distributions of (x_1, x_2) are

P^ℓ = Σ_{k=1}^∞ 2·3^{−k}·δ_{(t_k, w_k)},  P^h = Σ_{k=1}^∞ 2^{−k}·δ_{(t_{k+1}, w_k)}.

Note that P^ℓ and P^h have disjoint supports; see Figure 2. Hence the induced (unconditional) distribution is P = (1 − p)·P^ℓ + p·P^h. It satisfies the pair of identities

t_k·P({(t_k, w_k)}) = (1 − t_k)·P({(t_k, w_{k−1})}),  k ≥ 2,    (23)

w_k·P({(t_k, w_k)}) = (1 − w_k)·P({(t_{k+1}, w_k)}),  k ≥ 1,    (24)

which we use below.

We first argue that (P^ℓ, P^h) is the unique implementation of P. By Lemma 2, for any pair (Q^ℓ, Q^h) implementing a feasible distribution Q, the marginals satisfy dQ^h_i(x) = (x/p)·dQ_i(x) for each agent i ∈ N. Since Q^ℓ = (Q − p·Q^h)/(1 − p), we get the complementary equation dQ^ℓ_i(x) = ((1 − x)/(1 − p))·dQ_i(x). Combining the two equations, we obtain

(x/p)·dQ^ℓ_i(x) = ((1 − x)/(1 − p))·dQ^h_i(x) for i ∈ N.    (25)

Let (P̂^ℓ, P̂^h) be an implementation of P.
The identity (25), applied to i = 1 and x = t_k, gives the following equation for k = 1:

(t_1/p)·P̂^ℓ({(t_1, w_1)}) = ((1 − t_1)/(1 − p))·P̂^h({(t_1, w_1)}),    (26)

and the following family of equations for k ≥ 2:

(t_k/p)·( P̂^ℓ({(t_k, w_k)}) + P̂^ℓ({(t_k, w_{k−1})}) ) = ((1 − t_k)/(1 − p))·( P̂^h({(t_k, w_k)}) + P̂^h({(t_k, w_{k−1})}) ).    (27)

Similarly, applying (25) to i = 2 and x = w_k, we get

(w_k/p)·( P̂^ℓ({(t_k, w_k)}) + P̂^ℓ({(t_{k+1}, w_k)}) ) = ((1 − w_k)/(1 − p))·( P̂^h({(t_k, w_k)}) + P̂^h({(t_{k+1}, w_k)}) ).    (28)

We now show that these equations and the condition P = (1 − p)·P̂^ℓ + p·P̂^h completely determine the pair (P̂^ℓ, P̂^h). Since t_1 = 0, equation (26) results in P̂^h({(t_1, w_1)}) = 0, and hence the entire mass (1/(1 − p))·P({(t_1, w_1)}) of the point (t_1, w_1) must be assigned to P̂^ℓ. Given this, the equality (28) implies that the entire mass (1/p)·P({(t_2, w_1)}) of the point (t_2, w_1) must be assigned to P̂^h. Indeed, expressing P̂^ℓ({(t_k, w_k)}) and P̂^h({(t_{k+1}, w_k)}) through P and taking into account that P̂^h({(t_k, w_k)}) = 0 for k = 1, we rewrite (28) as

(w_k/p)·( (1/(1 − p))·P({(t_k, w_k)}) + P̂^ℓ({(t_{k+1}, w_k)}) ) = ((1 − w_k)/(1 − p))·( P({(t_{k+1}, w_k)}) − (1 − p)·P̂^ℓ({(t_{k+1}, w_k)}) )/p    (29)

for k = 1. This equality and the identity (24) lead to P̂^ℓ({(t_2, w_1)}) = 0.

Next, a similar argument demonstrates that the entire mass (1/(1 − p))·P({(t_2, w_2)}) of the point (t_2, w_2) must be assigned to P̂^ℓ. Indeed, we know that P̂^ℓ({(t_k, w_{k−1})}) = 0 for k = 2, which allows us to rewrite (27) as

(t_k/p)·( P({(t_k, w_k)}) − p·P̂^h({(t_k, w_k)}) )/(1 − p) = ((1 − t_k)/(1 − p))·( P̂^h({(t_k, w_k)}) + (1/p)·P({(t_k, w_{k−1})}) )    (30)

for k = 2. Combining this equality with identity (23), we get P̂^h({(t_2, w_2)}) = 0. We proceed inductively: knowing that P̂^h({(t_k, w_k)}) = 0, we deduce equality (29) and infer using (24) that P̂^ℓ({(t_{k+1}, w_k)}) = 0; from this, we derive equality (30) and, with the help of (23), see that P̂^h({(t_{k+1}, w_{k+1})}) = 0, and so on. Thus P̂^ℓ coincides with (1/(1 − p))·P restricted to {(t_k, w_k) : k ≥ 1} and P̂^h coincides with (1/p)·P restricted to {(t_{k+1}, w_k) : k ≥ 1}, and so P̂^ℓ = P^ℓ and P̂^h = P^h. Hence P has a unique implementation.

Given a convex combination P = α·Q + (1 − α)·Q̂, where Q and Q̂ are both feasible and α ∈ (0, 1), we need to show that Q = P. Let (Q^ℓ, Q^h) and (Q̂^ℓ, Q̂^h) be some pairs of conditional probability distributions that implement Q and Q̂, respectively. The pair (α·Q^ℓ + (1 − α)·Q̂^ℓ, α·Q^h + (1 − α)·Q̂^h) implements P. From the uniqueness of P's implementation we deduce that supp(Q^ℓ) ⊂ supp(P^ℓ) and supp(Q^h) ⊂ supp(P^h), and thus Q^ℓ({(t_{k+1}, w_k)}) = Q^h({(t_k, w_k)}) = 0.
Therefore, condition (25) implies

(t_k/p)·Q^ℓ({(t_k, w_k)}) = ((1 − t_k)/(1 − p))·Q^h({(t_k, w_{k−1})}),  k ≥ 2,

(w_k/p)·Q^ℓ({(t_k, w_k)}) = ((1 − w_k)/(1 − p))·Q^h({(t_{k+1}, w_k)}),  k ≥ 1.

This family of equations uniquely determines the weights of each point (t_k, w_k) and (t_{k+1}, w_k), k ≥ 1, up to a multiplicative factor, which is pinned down by the condition that Q is a probability measure. Hence in the decomposition P = α·Q + (1 − α)·Q̂, the distribution Q is unique. Thus Q must be equal to P, and hence P is an extreme point.

This construction extends to N ≥ 2 agents by sending no information to the remaining N − 2 agents.

We now prove the singularity of extreme points of P^N_p with respect to the Lebesgue measure λ on the unit cube [0,1]^N. The proof relies on the classical theorem of Lindenstrauss (1965), which states that all extreme points of the set of all probability measures with given marginals are singular. Lindenstrauss (1965) proved this result for n = 2 and Shortt (1986) extended it to general n. In § E (Theorem 5), we include an alternative proof for n ≥ 2.

Recall that every p-feasible distribution P can be represented as (1 − p)·P^ℓ + p·P^h, where the marginals of the pair (P^ℓ, P^h) satisfy the identity

(x/p)·dP^ℓ_i(x) = ((1 − x)/(1 − p))·dP^h_i(x).    (31)

Denote by H^N_p the set of all pairs satisfying (31). The set of p-feasible distributions P^N_p is, therefore, the image of the convex set H^N_p under the linear map (P^ℓ, P^h) ↦ (1 − p)·P^ℓ + p·P^h. The extreme points of P^N_p are hence all contained in the image of the extreme points of H^N_p.

So it is enough to check that in any extreme pair (P^ℓ, P^h), both measures are necessarily singular with respect to λ. We check singularity of P^ℓ; the argument for P^h is analogous. We assume towards a contradiction that P^ℓ is not singular. Then by Theorem 5, P^ℓ is not an extreme point among measures with the same marginals and thus can be represented as P^ℓ = (1/2)·(Q + Q′), where Q and Q′ have the same marginals as P^ℓ and Q ≠ Q′. This induces a representation of (P^ℓ, P^h) as the average of (Q, P^h) and (Q′, P^h), where both pairs satisfy (31). Contradiction.

Proposition 8.
The set of p-feasible posterior distributions is a convex, weak* compact subset of ∆([0,1]^N).

Proof. Pick a pair of feasible distributions
P, P′ ∈ P^N_p and consider the corresponding Q, Q′ ∈ ∆([0,1]^N) that satisfy the conditions of Lemma 3 for P and P′, respectively. For a convex combination P″ = α·P + (1 − α)·P′, the conditions of the lemma are satisfied by Q″ = α·Q + (1 − α)·Q′. Thus P″ also belongs to P^N_p, and so we have shown that the set of feasible distributions is convex.

To verify the weak* compactness of P^N_p, consider a sequence of p-feasible distributions P^(k) weakly converging to some P^(∞) ∈ ∆([0,1]^N) as k → ∞. To prove weak* compactness we show that the limit distribution P^(∞) is also feasible. For each P^(k), select some Q^(k) from Lemma 3. The set of all probability distributions on [0,1]^N is weak* compact and, therefore, there is a subsequence Q^(k_m) weakly converging to some Q^(∞) ∈ ∆([0,1]^N).

The conditions of Lemma 3 can be rewritten in an equivalent integrated form. Condition (1) becomes

∫ f(x_1, …, x_n)·( (1/p)·dP(x_1, …, x_n) − dQ(x_1, …, x_n) ) ≥ 0    (32)

for any continuous non-negative f on the unit cube. Condition (2) is equivalent to

∫ g(x_i)·( (x_i/p)·dP(x_1, …, x_n) − dQ(x_1, …, x_n) ) = 0    (33)

for any agent i ∈ N and any continuous g of arbitrary sign on the unit interval.

With this reformulation it is immediate that both conditions are closed, and hence withstand the weak* limit. Therefore, since P^(k_m), Q^(k_m) satisfy the conditions (32) and (33), the limiting pair P^(∞), Q^(∞) satisfies them as well. We deduce that P^(∞) is feasible.

D Checking on Intervals is not Sufficient
In this section we show that restricting the condition (6) to intervals A , A provides a conditionthat—while clearly necessary—is not sufficient for feasibility.To see this, consider the following distribution P , depicted in Figure 6. It is parameterizedby small ε > (cid:0) − ε , (cid:1) and (cid:0) , (cid:1) withprobabilities − ε and / , respectively, and two “light” points, (cid:0) − ε , (cid:1) and (cid:0) − ε , (cid:1) , each withprobability ε . Thus P = 1 − ε δ − ε , + 12 δ , + ε (cid:0) δ / − ε/ , + δ / − ε/ , (cid:1) (34)This distribution satisfies the martingale condition with prior / ; however, it is not / -feasible.To see that, pick A = (cid:8) − ε , (cid:9) and A = (cid:8) (cid:9) , then A × A has zero P -measure and the condition(6) is violated: 0 ≥ − ε · − ε · − (cid:18) − ε (cid:19) = ε . x
Figure 6: The distribution defined in (34), for ε = 1/10. The "heavy" points are large, and the "light" points are small.

We now check that none of the inequalities (6) is violated for intervals A_1, A_2. Since P satisfies the martingale condition, it suffices to check the left inequality

P(A_1 × A_2^c) ≥ ∫_{A_1} x dP_1(x) − ∫_{A_2} x dP_2(x).    (35)

Since P has finite support, different choices of A_1, A_2 yield the same inequality if each set contains the same points of the support of P_1 and P_2, respectively. Thus, we need to check that (35) is satisfied whenever A_1 is a subset of the three-point support of P_1 and A_2 a subset of the three-point support of P_2, except for the pair of subsets that exclude the middle points, which does not correspond to any pair of intervals.

We consider the following cases:

• Inequality (35) holds if one of the sets A_i is empty or contains the whole support of P_i, as it then boils down to the martingale condition, which we have already verified.

• Consider the case when A_1 × A_2^c contains exactly one heavy point; by the interval constraint, if it contains two, then A_1 contains the support of P_1, which is the case we already considered. In this case, P(A_1 × A_2^c) ≥ (1 − ε)/2, while for ε small enough the right-hand side of (35) falls short of (1 − ε)/2 regardless of the choice of A_1 and A_2, so condition (35) is satisfied. (Using the martingale condition ∫_{A_2} x dP_2(x) + ∫_{A_2^c} x dP_2(x) = p, the right inequality in (6) follows from the left by a simple calculation.)

• Consider the remaining case, in which there are no heavy points in A_1 × A_2^c, and both A_1 and A_2 are nonempty strict subsets of the supports. This can only be possible if A_2 contains the middle point of the support of P_2 or A_1 = {1/2 − ε/2}. In the former case, ∫_{A_2} x dP_2(x) is, for small ε, at least as large as ∫_{A_1} x dP_1(x) for any A_1 excluding at least one of the heavy points.
Hence, the right-hand side of (35) is negative and the inequality is satisfied. Consider the remaining case of A_1 = {1 − ε} and A_2 a singleton subset of the support of P_2. The left-hand side of (35) equals ε, and ∫_{A_1} x dP_1(x) = (1 − ε) · ε ≤ ε. Thus (35) is satisfied on all intervals.

This example also demonstrates that the condition of Ziegler (2020) (Theorem 1) is not sufficient for feasibility. Using our notation, his condition can be written as follows:

max{ ∫_0^a x dP_1(x) + ∫_0^b x dP_2(x) − p,  ∫_0^a (1 − x) dP_1(x) + ∫_0^b (1 − x) dP_2(x) − (1 − p) }
   ≤ P([0, a] × [0, b]) ≤
min{ ∫_0^a (1 − x) dP_1(x) + ∫_0^b x dP_2(x),  ∫_0^a x dP_1(x) + ∫_0^b (1 − x) dP_2(x) }

for every a, b ∈ [0, 1]. For given a, b ∈ [0,
1], this condition is equivalent to four of our conditions (see Equation 6), namely those for the four combinations of A_1 = [0, a] or A_1 = [a,
1] and A_2 = [0, b] or A_2 = [b, 1], rather than for arbitrary intervals A_1, A_2 ⊂ [0, 1]. If P_1 and P_2 are supported on at most two points, such intervals exhaust all possible non-trivial sets in Equation 6; in this case, Ziegler’s condition is necessary and sufficient for feasibility. However, the example above demonstrates that this condition becomes insufficient if the support contains at least 3 points.

E Singularity of extreme measures with given marginals
In this section we formulate an extension of a celebrated result of Lindenstrauss (1965) regarding extreme points of the set of measures with given marginals. The extension is two-fold: the classic result assumes n = 2 and uniform marginals, and we get rid of both assumptions. This extension can also be deduced from a more general statement by Shortt (1986), which allows for non-orthogonal multidimensional projections. For the reader’s convenience, we include a proof similar to Lindenstrauss’s.

Theorem 5 (n = 2: Lindenstrauss (1965); general n: this paper). Any extreme point P of the set of probability measures on [0, 1]^N with given one-dimensional marginals ν_i ∈ ∆([0, 1]), i ∈ N, is singular with respect to the Lebesgue measure λ on the unit cube. In other words, for each extreme P there exists a measurable set B such that P(B) = 1 and λ(B) = 0.

Proof. Assume the converse: P is not singular. By the Lebesgue decomposition theorem, the continuous part of P can be singled out: P = µ + λ_⊥, where µ ≠ 0 is absolutely continuous with respect to λ (dµ = f dλ with a non-negative integrable density f) and λ_⊥ is singular with respect to λ. Consider the Lebesgue space L_1(µ) of integrable functions with respect to µ (defined µ-almost everywhere) and its closed subspace S generated by “separable” functions g(x_1, . . . , x_n) = Σ_{i∈N} g_i(x_i), where each g_i ∈ L_1(µ), i ∈ N, depends on the variable x_i only.

Let us assume that S ≠ L_1(µ), i.e., that separable integrable functions are not dense in the set of all integrable functions (we check this condition at the end of the proof). We now show that, under this assumption, µ can be represented as a convex combination of measures µ′ and µ″ that are distinct but have the same marginals. By the Hahn–Banach theorem, there exists a continuous functional θ of norm 1 such that θ is identically zero on S. Since the dual space of L_1(µ) is the space L_∞(µ) of essentially bounded functions, the functional θ can be identified with a non-zero function θ(x_1, . . .
, x_n) bounded by 1 in absolute value. The condition of vanishing on S reads as

∫ Σ_{i∈N} g_i(x_i) · θ(x_1, . . . , x_n) dµ = 0,   ∀ g_i = g_i(x_i) ∈ L_1(µ), i ∈ N.   (36)

We see that the measure θ dµ is non-zero but has zero marginals! Define µ′ and µ″ by dµ′ = (1 − θ) dµ and dµ″ = (1 + θ) dµ. By the construction, µ is the average of µ′ and µ″, the measures µ′ and µ″ are distinct, and all three measures µ, µ′, and µ″ have the same marginals. Thus P is also represented as the average of µ′ + λ_⊥ and µ″ + λ_⊥, i.e., P is not an extreme point. This contradiction completes the proof (under the assumption that S ≠ L_1(µ)).

Now we check that S ≠ L_1(µ). Our goal is to construct a function h ∈ L_1(µ) such that for any g(x_1, . . . , x_n) = Σ_{i∈N} g_i(x_i) ∈ L_1(µ), the L_1-distance ∫ |h − g| dµ is bounded below by some positive constant independent of g.

Recall that dµ = f dλ. Fix δ > 0 such that the set A_δ = {x ∈ [0, 1]^N : f(x) ≥ δ} has non-zero Lebesgue measure. Fix another small constant ε >
0; by the Lebesgue density theorem applied to A_δ, there exist a point x* ∈ (0, 1)^N and a number a > 0 such that for the cube C = ∏_{i∈N} [x*_i − a, x*_i + a) ⊂ [0, 1]^N of size 2a centered at x* the following inequality holds: λ(C ∖ A_δ)/λ(C) ≤ ε. Define h(x) = 1 if x_i ≥ x*_i for all i ∈ N, and h(x) = 0 otherwise.

Cut the cube C into 2^n small cubes indexed by subsets of N: for M ⊂ N, the cube C_M is given by ∏_{i∈N∖M} [x*_i − a, x*_i) × ∏_{i∈M} [x*_i, x*_i + a). No function g(x) = Σ_{i∈N} g_i(x_i) (we refer here to separability in the economic sense) can approximate h well on all the small cubes at the same time. The intuition is the following: since h is zero on all the cubes except C_N, g must be close to zero on these cubes; however, the values of g on these cubes determine its values on C_N via

g(x) = 1/(n − 1) · ( Σ_{i∈N} g(x_1, . . . , x_{i−1}, x_i − a, x_{i+1}, . . . , x_n) − g(x_1 − a, x_2 − a, . . . , x_n − a) ),   (37)

and therefore g is close to zero on C_N as well and cannot approximate h there.

To formalize this intuition, we assume that ∫_{C_M} |h − g| dµ is less than α · λ(C_M) for every M ⊂ N and show that the constant α cannot be too small. For M ≠ N we get ∫_{C_M} |g| · f dλ ≤ α λ(C_M). Applying the Markov inequality on the set C_M ∩ A_δ and taking into account that this set is big enough (by the construction of the original cube, λ(C_M ∖ A_δ)/λ(C_M) ≤ 2^n ε), we obtain the existence of a set B_M ⊂ C_M such that |g(x)| ≤ √α/δ on B_M and λ(C_M ∖ B_M)/λ(C_M) ≤ 2^n ε + √α.

Consider the subset B* of C_N such that, whenever x ∈ B*, all the arguments on the right-hand side of (37) belong to the respective sets B_M, i.e., B* = ∩_{i∈N} (B_{N∖{i}} + a·e_i) ∩ (B_∅ + a·Σ_{i∈N} e_i), where the sets B_M are translated along the elements of the standard basis (e_i)_{i∈N}. The union bound implies that the set B* is dense enough in C_N: λ(C_N ∖ B*)/λ(C_N) ≤ (n + 1)(2^n ε + √α).
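As a sanity check (ours, not part of the original argument), the reconstruction identity (37) for separable functions can be verified numerically; the coordinate functions g_i below are arbitrary illustrative choices.

```python
# Numerical sanity check of identity (37): for a separable function
# g(x) = sum_i g_i(x_i), the value g(x) is recovered from the shifted
# evaluations g(..., x_i - a, ...) and g(x_1 - a, ..., x_n - a).
import math

# arbitrary illustrative coordinate functions g_i (any choice works)
g_components = [math.sin, math.exp, lambda t: t ** 2]
n = len(g_components)

def g(x):
    # the separable function g(x) = sum_i g_i(x_i)
    return sum(gi(xi) for gi, xi in zip(g_components, x))

a = 0.3
x = [0.7, 0.5, 0.9]

# sum over i of g evaluated with the i-th coordinate shifted by -a
shifted_sum = sum(
    g([xj - a if j == i else xj for j, xj in enumerate(x)]) for i in range(n)
)
# identity (37): g(x) = (shifted_sum - g(x - a*1)) / (n - 1)
reconstructed = (shifted_sum - g([xj - a for xj in x])) / (n - 1)

assert abs(reconstructed - g(x)) < 1e-12
```

The check works because each shifted sum contributes the other n − 1 coordinate terms unshifted, so the unshifted terms appear n − 1 times in total once the fully shifted evaluation is subtracted.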
By formula (37), the absolute value of g is bounded by ((n + 1)/(n − 1)) · √α/δ on B*. We get the following chain of inequalities:

α ≥ (1/λ(C_N)) ∫_{C_N} |h − g| dµ ≥ (1/λ(C_N)) ∫_{B*∩A_δ} |h − g| · f dλ ≥ (1 − ((n + 1)/(n − 1)) · √α/δ) · δ · λ(B*∩A_δ)/λ(C_N) ≥ (δ − ((n + 1)/(n − 1)) · √α) · (1 − (n + 1)(2^n ε + √α) − 2^n ε).   (38)

Denote by α* = α*(δ, n, ε) the minimal value of α ≥ 0 compatible with (38). Recall that ε is a free parameter in the construction; selecting it to be small enough (so that (n + 2) · 2^n ε is sufficiently small) guarantees that α* > 0. Hence, for any separable g = Σ_{i∈N} g_i(x_i),

∫_{[0,1]^N} |h − g| dµ ≥ max_{M⊂N} ∫_{C_M} |h − g| dµ ≥ α*(δ, n, ε) · 2^{−n} λ(C) > 0.

Since the constant on the right-hand side is independent of g and positive, we see that h cannot be approximated by separable functions. Thus h does not belong to S and S ≠ L_1(µ).

F Independent beliefs induced by Gaussian signals
Let φ be the density of the standard Gaussian random variable: φ(t) = (1/√(2π)) e^{−t²/2}.

For prior p = 1/2, consider an agent who gets a signal s ∈ ℝ distributed according to the Gaussian distribution with variance 1 and mean equal to d in the state ω = h, and equal to −d in the state ω = ℓ, so that the conditional distributions have the densities φ(s − d) and φ(s + d), respectively. The density f(s) of the unconditional distribution is

f(s) = (1/2) φ(s − d) + (1/2) φ(s + d) = (1/(2√(2π))) e^{−d²/2} e^{−s²/2} (e^{ds} + e^{−ds}).

For the sake of definiteness we assume d > 0. Denote by ν the induced distribution of posteriors. By Bayes’ Law, the posterior x(s) upon receiving the signal s is equal to

x(s) = (1/2) φ(s − d) / ((1/2) φ(s − d) + (1/2) φ(s + d)) = e^{ds}/(e^{ds} + e^{−ds})   (39)

and

ν([0, t]) = ∫_{−∞}^{x^{−1}(t)} f(s) ds.

We are interested in the question of when ν × ν is 1/2-feasible. By Proposition 2, it is feasible if and only if ν is a mean-preserving contraction of the uniform distribution. The next lemma provides a simple sufficient condition for this property.

Lemma 7.
Let µ ∈ ∆([0, 1]) be a non-atomic distribution, symmetric around 1/2, with cumulative distribution function F(a) = µ([0, a]). Assume that there is at most one point 1/2 < a < 1 such that F(a) = a, and that F(x) − x > 0 for all x close enough to 1, but not equal to 1. Then the uniform distribution is a mean-preserving spread of µ if and only if

1/8 ≥ ∫_{1/2}^{1} (x − 1/2) dµ(x).   (40)

Proof.
Necessity follows directly from the convex-order definition of mean-preserving spreads. Indeed, the condition for µ to be a mean-preserving contraction of the uniform distribution is equivalent to

H(y) = ∫_0^y (F(x) − x) dx ≤ 0   ∀ y ∈ [0, 1].   (41)

Integration by parts implies that for y = 1/2 this condition becomes exactly (40). To prove sufficiency, we must check the inequality (41) for all y ∈ [0, 1]. The symmetry of µ implies H(y) = H(1 − y), and thus we can focus on the right sub-interval y ∈ [1/2, 1]. For y ∈ [1/2, 1] we need H(y) ≤
0. We claim that the maximum is attained at one of the end-points. Indeed, if there is an internal maximum a ∈ (1/2, 1), then H′(a) = F(a) − a = 0. On the whole interval (a, 1), the derivative H′(y) has a constant sign, since it is continuous and y = a is its unique zero in (1/2, 1). By assumption, F(x) − x > 0 for x close to 1 and, hence, the derivative is positive on (a,
1) and H is increasing on this interval. Therefore, the point a cannot be a maximum of H on [1/2, 1], and it remains to check (41) at the end-points y = 1/2 and y = 1. For y = 1/2, this inequality coincides with (40), and for y = 1 it holds trivially since H(1) = 0.

Below we check that ν, as induced by Gaussian signals, satisfies the assumptions of Lemma 7 and, therefore, ν × ν is 1/2-feasible if and only if ∫_{1/2}^{1} (x − 1/2) dν ≤ 1/8. This is equivalent to ∫_{1/2}^{1} x dν ≤ 3/8. We can make this condition more explicit by rewriting its left-hand side as

∫_{1/2}^{1} x dν(x) = ∫_0^∞ x(s) · f(s) ds = (1/2) ∫_0^∞ φ(s − d) ds = (1/2) ∫_{−∞}^{d} φ(t) dt,

where in the last equation we applied the change of variable s − d = −t. This results in the following condition of feasibility:

∫_{−∞}^{d} φ(t) dt ≤ 3/4,

i.e., d must be below the 3/4-quantile of the standard normal distribution. It remains to prove that ν indeed satisfies the assumptions of Lemma 7. We check that ν([0, x]) − x is positive for x close to 1. Recall that we can write the belief x as a function x(s) of the signal s using (39). We denote the derivative of x(s) by x′(s). Substituting x = x(s), we get

ν([0, x(s)]) − x(s) = 1 − x(s) − ν([x(s), 1]) = ∫_s^∞ (x′(t) − f(t)) dt.

For large t, we have f(t) = (1/(2√(2π))) e^{−d²/2} · e^{−t²/2 + d·t} (1 + o(1)) and

x′(t) = 2d/(e^{d·t} + e^{−d·t})² = 2d · e^{−2d·t} (1 + o(1)).

Therefore, the asymptotic behavior of the integrand is dictated by x′(t): x′(t) − f(t) = x′(t)(1 + o(1)). The integrand is positive for t large enough. This implies the desired positivity of ν([0, x(s)]) − x(s) for large values of s.

Now we check that there is at most one point a ∈ (1/2, 1) such that F(a) = a or, equivalently, that there is at most one point s ∈ (0, ∞) such that ∫_{−∞}^{s} f(t) dt − x(s) = 0. Denote G(s) = ∫_{−∞}^{s} f(t) dt − x(s). (Note that in this computation we do not use the explicit formula for φ.)
In particular, one gets the same condition of feasibility for any absolutely continuous distribution of signals on ℝ for which the induced distribution of posteriors ν satisfies the assumptions of Lemma 7.

G is smooth and G(0) = lim_{s→+∞} G(s) = 0. If G has k zeros in (0, ∞), then it also has at least k + 1 extrema (minima or maxima) in this interval and, hence, at least k + 1 critical points (zeros of the derivative G′). We will show that there are no more than two critical points, and thus G has at most one zero. The equation for critical points takes the following form:

G′(s) = 0 ⟺ (1/(2√(2π))) e^{−d²/2} e^{−s²/2} (e^{d·s} + e^{−d·s}) − 2d/(e^{d·s} + e^{−d·s})² = 0

and can be rewritten as

e^{−s²/2} (e^{d·s} + e^{−d·s})³ = 4d √(2π) e^{d²/2}.

Denote the left-hand side by H(s). The derivative H′ has at most one zero in (0, +∞). Indeed, H′(s) = (3d · tanh(ds) − s) · H(s), and the equation s − 3d · tanh(ds) = 0 has at most one positive solution by concavity of the hyperbolic tangent on [0, +∞). Consequently, H is monotone or unimodal on [0, +∞), so each of its level sets in (0, ∞) contains at most two points. Therefore, G has at most 2 critical points and thus at most one zero in (0, ∞), which completes the argument and justifies the application of Lemma 7 to ν.
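The threshold derived above can be probed numerically. The sketch below (our illustration, not part of the original text) tests the mean-preserving-contraction condition H(y) = ∫_0^y (F(x) − x) dx ≤ 0 on a grid, for values of d on both sides of the 3/4-quantile Φ^{−1}(3/4) ≈ 0.6745; the grid size and tolerances are arbitrary choices.

```python
# Numerical check: for Gaussian signals with means +/- d, the induced
# posterior distribution nu should be a mean-preserving contraction of the
# uniform distribution iff Phi(d) <= 3/4, i.e. d below the 3/4-quantile.
import math

def Phi(t):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def F(x, d):
    # F(x) = nu([0, x]): the signal generating posterior x is
    # s = logit(x)/(2d), and the signal CDF is (Phi(s-d) + Phi(s+d))/2.
    s = math.log(x / (1.0 - x)) / (2.0 * d)
    return 0.5 * (Phi(s - d) + Phi(s + d))

def max_H(d, m=20000):
    # trapezoidal integration of F(x) - x, tracking the running maximum
    # of H(y) over the grid; H(0) = 0, so the maximum is at least 0.
    eps = 1e-9
    xs = [eps + (1.0 - 2 * eps) * k / m for k in range(m + 1)]
    H, H_max, prev = 0.0, 0.0, F(xs[0], d) - xs[0]
    for k in range(1, m + 1):
        cur = F(xs[k], d) - xs[k]
        H += 0.5 * (prev + cur) * (xs[k] - xs[k - 1])
        H_max = max(H_max, H)
        prev = cur
    return H_max

assert max_H(0.60) < 1e-6   # below the 3/4-quantile: H <= 0 everywhere
assert max_H(0.75) > 1e-4   # above the 3/4-quantile: H(1/2) > 0
```

At d = 0.75 the violation appears exactly at y = 1/2, where H(1/2) = Φ(d)/2 − 3/8 > 0, matching the computation of ∫_{1/2}^1 x dν in the text.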