Competing to Persuade a Rationally Inattentive Agent
Vasudha Jain ∗ Mark Whitmeyer † February 4, 2020
Abstract
Firms strategically disclose product information in order to attract consumers, but recipients often find it costly to process all of it, especially when products have complex features. We study a model of competitive information disclosure by two senders, in which the receiver may garble each sender's experiment, subject to a cost increasing in the informativeness of the garbling. For a large class of parameters, it is an equilibrium for the senders to provide the receiver's first best level of information, i.e., as much as she would learn if she herself controlled information provision. Information on one sender substitutes for information on the other, which nullifies the profitability of a unilateral provision of less information. Thus, we provide a novel channel through which competition with attention costs encourages information disclosure.
Keywords:
Bayesian persuasion; Information design; Multiple senders; Competition; Rational Inattention; Search
JEL Classifications:
D82; D83

∗ Corresponding author. Department of Economics, The University of Texas at Austin. Email: [email protected].
† Department of Economics, The University of Texas at Austin. Thanks to V Bhaskar, Vasiliki Skreta, Max Stinchcombe, and Tom Wiseman for helpful feedback and discussions; and to various seminar audiences for helpful comments.

arXiv [econ.TH], February 2020

Constant attention wears the active mind,
Blots out our powers, and leaves a blank behind.

Charles Churchill, Epistle to William Hogarth

1 Introduction
The standard Bayesian persuasion framework allows senders to design arbitrarily informative signal structures, and assumes that receivers costlessly process all information made available to them. This is an unrealistic assumption in many natural contexts, in which agents may rationally choose to stay partly ignorant. Moreover, there are many situations in which multiple senders compete via information provision to be chosen by the agent. In this competitive scenario, we ask how the consumer's information-processing (or attention) costs shape the information provided by the senders.

Consider, for instance, the situation encountered by doctors and pharmaceutical companies. Patients rely on their doctors to make important medical decisions for them, such as the decision of which medication to take. Often, multiple drugs exist to treat the same condition, but nevertheless differ in subtle ways that can prove crucial for patients. Which alternative is best might depend on the particular circumstances of individual patients; e.g., someone's medical history might make him more prone to the side effects of one of them.

A well intentioned doctor has her task clearly cut out: she should study all primary research published on each drug, and let that information guide her prescription decisions. This means that when she reads about a clinical trial, she should dig into details such as whether, for instance, adverse side effects had led many trial subjects of a certain demographic group to drop out midway, or whether the drug had a differential impact depending on the stage of the illness.

However, getting detailed information involves substantial time and effort, and doctors typically find it difficult to keep up. Tellingly, Alper et al. (2004) find that it would take a doctor six hundred hours to skim all research relevant to general practice that is published in just one month.
Consequently, they are likely to pay attention only to some published summary statistics.

Pharmaceutical companies are prohibited from falsifying facts when marketing to doctors. They do, however, strategically decide how much information to reveal and in what form, and in doing so, take into consideration the lack of attention on the part of the recipients: designing pamphlets in a way that the most favorable pieces of evidence stand out, or other strategies of that ilk. As Goldacre (2014) explains, "They (doctors) need good quality information, but they need it, crucially, under their noses. The problem of the modern world is not information poverty, but information overload... So doctors will not be going through every trial, about every treatment relevant to their field... They will take shortcuts, and these shortcuts can be exploited [emphasis added]."

Motivated by this setting, we study a model of information disclosure à la Kamenica and Gentzkow (2011), with two senders, and a receiver who can save on attention costs by drawing from a less informative experiment than what is chosen by the senders. The question we are interested in is how, and to what extent, the degree of attention costs matters for the relationship between competition and information disclosure.

More specifically, our baseline model has two senders who simultaneously commit to a Blackwell experiment for the quality of their respective products, which are ex-ante identical. An experiment can simply be identified by a distribution of beliefs that averages to the prior. A receiver, who wishes to choose the sender with a higher quality, visits the senders sequentially. When she visits the first one, she observes the distribution of beliefs induced by his (the sender's) experiment, and is free to choose any mean preserving contraction, or garbling, of that.
A draw from that garbling determines her posterior belief about that sender.

Think back to the doctor example, and the shortcuts she might take: she might read just the first few pages of an article, only the nontechnical parts, only the technical sections, or even just the title. All of these correspond to different levels of information, and all of these impose on the receiver different costs: a grueling slog through a complicated model takes more out of the receiver than does a quick skim of the conversational portions.

We capture this relationship by imposing that the less informative the receiver's garbling, the lower her attention costs. (Here q is less informative than p if q is a garbling of p. So, it is not the act of garbling that is costly, but rather the act of drawing more accurate beliefs.) Intuitively, further garbling of an experiment reduces costs because it leads to a belief distribution that is more concentrated around the prior, and hence involves less learning about the state. The receiver has to balance this reduction in attention costs against the worsening quality of information on a decision-relevant variable.

With the first posterior in hand, the receiver decides whether to visit the second sender. Importantly, we do not impose that she must visit the second sender in order to choose him. If she does decide to visit him, the protocol is identical to that for the first sender: she chooses a garbling of his chosen experiment subject to an information cost. Finally, she chooses the sender for whom her posterior belief is higher. Each sender wants to maximize the probability of being chosen.

As we show later on, the receiver's learning strategy has an intuitive feature that drives our analysis: it can happen that she will have "seen enough" at the first sender and need not visit the second sender. Her belief about the first sender may be so high that she chooses him without ever visiting the second, and it may be so low that she chooses the second, sight unseen.
Returning to the example: if a doctor is fairly certain that drug A is of low quality, then she would be willing to prescribe drug B without learning anything about it, and vice versa.

In this setting, a pertinent benchmark is what the receiver would do if she had potential access to full information on each sender, and could effectively use any pair of Bayes plausible distributions to learn, subject only to attention costs as described above. We show that in this case, the 'first best' scenario for her, she would always learn something from at least one sender, but never learn any sender's type with certainty. Furthermore, and this is crucial: fixing any prior, for a high enough attention cost parameter, it is optimal for her to learn from exactly one sender.

The main question we address in this paper is whether in our game with strategic senders, there is an equilibrium in which the receiver ends up with this first best outcome. That is, whether senders voluntarily provide as much information as the receiver would acquire in her first best scenario.

Surprisingly, the answer is yes under general conditions. In particular, for any prior, this holds as long as an attention cost parameter is above a threshold. This is a departure from closely related models in the literature, where either there is only one sender and the receiver faces attention costs, or there are two senders but no attention costs.

Our analysis produces a sharp economic insight into why a combination of these ingredients, competition and attention costs, gives us more information disclosure. Recall our observation that for high enough costs it is optimal for the receiver to learn from exactly one sender. Now suppose that the sender from whom she does plan to learn unilaterally deviates and restricts her learning. Then since the other sender continues to provide full information, the receiver could just switch to learning from him instead.
Her ex-ante payoffs, and the probability with which she selects each sender, would be unchanged, which nullifies the profitability of the deviation.

Three footnotes from the preceding discussion bear noting. If she does not visit a sender, her posterior is equal to the prior. The first best benchmark is equivalent to an environment where the receiver herself controls how much information is provided by senders. And, stated equivalently, fixing attention costs above a threshold, our main result holds over an interior interval of prior means; the interval expands as attention costs grow, and approaches the full range as they explode.
1.1 Related Literature

To the best of our knowledge, this paper is the first to look at competitive information design with information-processing costs faced by the receiver. This relates thematically to several strands of the literature.

Since in our model the receiver's decision to garble a sender's experiment is the result of an optimization problem that accounts for attention costs, she is rationally inattentive as in the economics literature pioneered by Sims (2003). The particular framework of rational inattention that we adopt is the same as in Lipnowski et al. (2019) and Wei (2018). The former paper considers the problem of a principal whose preferences over actions are perfectly aligned with those of an agent. Attention costs are borne only by the agent, and the authors establish conditions under which the principal would want to restrict her information with a view to manipulating her attention.

Wei (2018) belongs to the small but growing literature on persuasion of a rationally inattentive receiver by a single sender. This paper, like ours, considers a binary types, binary action model with a single sender who has state independent preferences, and an exogenous threshold of acceptance for the receiver. It is shown that the sender necessarily finds it in his interest to restrict the receiver's learning; we show how competitive forces change this.

Bloedel and Segal (2018) take a different approach to a problem similar to Wei's. In their framework, after observing the sender's experiment, but before seeing its realization, the receiver can choose a mapping from signal realizations to distributions over 'perceptions', incurring an entropy reduction cost. Then, the receiver observes the realized perception, and not the actual signal realization. As Lipnowski et al. (2019) explain, this is conceptually different from our paper (and theirs), since the receiver in our model pays a cost to reduce uncertainty about the state, and not the sender's message.
Matyskova (2018) studies a persuasion model where the receiver, after observing the realization from the sender's signal, can acquire additional information on the state at a cost proportional to the reduction in entropy.

Our work is also closely related to the papers on competitive information design without any attention costs. With two senders, this has been studied (albeit with slightly different timing than our baseline model) by Boleslavsky and Cotton (2015), who identify the unique equilibrium. Hulko and Whitmeyer (2018) extend this analysis to n > 2 senders, while also incorporating the possibility of search frictions. Crucially, providing full information is not an equilibrium with zero attention costs, and we show that this continues to hold for positive but small attention costs.

Some other papers in the competitive information design literature that bear mentioning are Au and Kawai (2017a,b), Albrecht (2017), and Boleslavsky and Cotton (2018). The result that competition encourages information disclosure is familiar from some of these, but as detailed above, introducing attention costs offers a novel perspective on why that might be true.

Board and Lu (2018) also look at sellers competing through information to entice buyers. However, there, search is random, the number of sellers is uncountable, and the decision by a seller of how much and what sort of information to disclose is made upon the buyer's visit. Thus, the problem is different from the scenario analyzed here (or in Hulko and Whitmeyer (2018)), where the searcher must choose whom to visit as part of a stopping problem. Moreover, in Board and Lu (2018) the values of each seller's goods to the buyer are perfectly correlated, whereas here they are independent. Accordingly, one of the key inputs of their model, how much a seller can observe about the consumer's belief, is absent here.

Another group of papers look at what happens if there is no competition but costs are on the sender's side instead of the receiver's.
Gentzkow and Kamenica (2014) look at optimal persuasion mechanisms when the sender pays higher costs (proportional to entropy reduction) of designing more informative experiments. Likewise, Treust and Tomala (2017) consider n copies of identical persuasion problems, where the sender is constrained to send only k < n messages, which are transmitted with exogenous noise. Interestingly, they find that the sender's payoff from the optimal solution is the concave closure of his payoff function, net of entropy reduction costs. Thus, these costs arise endogenously in their model. (The same is also true of some papers in the cheap talk literature, e.g. Battaglini (2002), which have a very different flavor.)

The rest of the paper is organized as follows. Section 2 presents our baseline model. Section 3 presents results for the benchmark with a single sender. Section 4 presents the equilibrium analysis with two senders and spells out how the level of attention costs matters. Section 5 illustrates the robustness of our results to the various modifications mentioned in the introduction, and Section 6 concludes. The Appendix contains proofs that are not presented in the main text.

2 Model

There are two senders indexed by i ∈ {1, 2}, and a receiver (R). Sender i has type ω_i ∈ Ω_i := {0, 1}, with the types being drawn independently. The common prior belief is that Pr(ω_i = 1) = µ ∈ (0, 1) for i ∈ {1, 2}.

R has to select one of the two senders, and she has no outside option. Her payoff is equal to the type of the selected sender, minus attention costs that we elaborate on below. Sender i's payoff is 1 if he is selected, and 0 if not. All players maximize expected payoffs. The game proceeds in the following three stages.
Stage 0:
Each (ex-ante uninformed) sender simultaneously commits to a Blackwell experiment that generates information about his own type. Such an experiment is a mapping from {0, 1} to the set of Borel probability measures over a compact metric space of signal realizations. Each signal realization, then, is associated with a posterior belief distribution on {0, 1}, and an experiment induces a distribution over posterior beliefs. Hereafter, we identify a posterior belief with the belief on ω_i = 1.

From the work of Kamenica and Gentzkow (2011), we know that the set of Blackwell experiments is isomorphic to the set of distributions of posterior beliefs whose average is the prior. Thus, at this stage 0, sender i commits to a distribution p_i ∈ ∆[0, 1], with ∫_[0,1] x dp_i(x) = µ. (Our results hold with an outside option, as long as its expected quality is not too high.)

Stage 1: R, who at this point does not observe the chosen distributions, decides whether to visit any sender, and if yes, which one. Say she visits sender 1 first. Upon visiting 1 she observes 1's distribution p_1, and is free to choose any q_1 that is a mean preserving contraction (or garbling) of p_1. Associated with any such q_1 is an attention cost given by the following:

C(q_1) = ∫_[0,1] k (x − µ)² dq_1(x),   (1)

where k > 0. Note that costs depend on q_1 and not directly on p_1. We defer a discussion of these costs to Section 2.2. R takes a draw from q_1, which determines her posterior belief about sender 1.

Stage 2: R then decides whether to visit sender 2. If she does, she observes p_2 and chooses a garbling q_2, once again incurring an attention cost C(q_2). She takes a draw from q_2, which determines her posterior belief about this sender. Finally, she chooses the sender for whom her posterior belief is higher. She need not have visited a sender or learned anything from him in order to select him.

Notice that R's optimal garbling at stage 2 potentially depends on the belief she draws at stage 1.
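As a concrete numeric sketch of the quadratic posterior-separable cost in equation (1), consider the cost of the fully informative experiment, which puts weight 1 − µ on posterior 0 and µ on posterior 1: C = k[(1 − µ)µ² + µ(1 − µ)²] = kµ(1 − µ). Any garbling costs weakly less. The parameter values below are purely illustrative and not from the paper:

```python
mu, k = 0.4, 2.0  # illustrative prior and cost parameter (not from the paper)

def cost(q):
    """Attention cost C(q): sum of k*(x - mu)^2 weighted by q, for a finite-support q."""
    return sum(p * k * (x - mu) ** 2 for x, p in q.items())

# Fully informative experiment: posterior 0 w.p. 1 - mu, posterior 1 w.p. mu.
full = {0.0: 1 - mu, 1.0: mu}

# A garbling: pull each posterior halfway toward the prior (the mean is still mu,
# and the support lies inside [0, 1], so this is a mean preserving contraction).
garbled = {mu / 2: 1 - mu, (1 + mu) / 2: mu}

# The costless, uninformative garbling delta_mu.
uninformative = {mu: 1.0}

# Costs satisfy 0 = C(delta_mu) < C(garbled) < C(full) = k*mu*(1 - mu).
print(cost(full), cost(garbled), cost(uninformative))
```

Halving each posterior's distance from the prior quarters the cost, reflecting the strict convexity of k(x − µ)² that drives the Jensen's inequality comparison discussed in Section 2.2.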
She may be more or less inclined to learn about the second sender, depending on how much uncertainty has already been resolved about the first one. Indeed, as we shall see, if the stage 1 belief is close enough to 0 or 1, she chooses not to learn at all at stage 2, and this fact plays a crucial role in our analysis.

The distribution offered by the sender visited first dictates how much can be learned at stage 1. Then in light of the preceding observation, if both senders offer different distributions, the choice of whom to visit first (if anyone) matters for payoffs. (We relax this assumption in Section 5.1.)

A few remarks on terminology. q is a garbling of p if the random variable associated with q second order stochastically dominates, and has the same mean as, the random variable associated with p. It is a strict garbling if additionally q ≠ p. Trivially, q = p or q = δ_µ is always an option. Also, as long as R learns something from at least one sender, the posteriors would never be equal. (See Footnote 14.) They would be equal only if she does not learn from either sender; in that case she may choose between the senders in any way.

A pure strategy for sender i is a choice of a distribution p_i ∈ ∆[0, 1] whose average is µ. A pure strategy for R consists of i) a choice of which sender to visit first, if any; ii) a choice of garbling for any distribution offered by the sender she visits first; iii) a choice of whether to visit the second sender for each belief drawn in the previous stage; iv) a choice of garbling for the second sender, for each distribution offered by him and each posterior belief drawn in the previous stage.

Our solution concept is subgame perfect equilibrium (hereafter, equilibrium), defined in the standard way. We restrict the players to pure strategies, with one exception, viz.
, we allow R to randomize over the order of visits.

Before proceeding to our analysis, we point out the following characterization of the set of garblings of a binary distribution, which we shall use extensively: q (with mean equal to that of p) is a garbling of a distribution p with support {ν₁, ν₂} if and only if supp(q) ⊆ [min{ν₁, ν₂}, max{ν₁, ν₂}].

2.2 Attention Costs

Attention costs, in our framework, are costs incurred to process information on a sender's type. Through his choice of a Blackwell experiment, a sender can control how much information on his type is available; in other words, he can put a cap on what can be learned. But a recipient may choose to ignore some of that information and take a draw from a less informative experiment, thereby reducing attention costs. For instance, a pharmaceutical company can decide how much research on its drug to publish, but a doctor might choose to read a subset of that. Her costs would depend on how much of the research she chooses to read, not on how much was published. In particular, the act of garbling per se is costless.

The cost function in equation (1) captures this notion. Associated with each posterior x is a cost k(x − µ)², so that more accurate beliefs, those that are further away from the prior, cost more. This is integrated to determine the cost of a distribution of posteriors. Thus, costs are posterior separable as in Caplin et al. (2019). (As discussed in the section on related literature, this is closely related to the framework in Lipnowski et al. (2019) and Wei (2018).)

Since k(x − µ)² is strictly convex, by Jensen's inequality we have: q is a garbling of p ⟹ C(q) ≤ C(p), with the inequality strict for strict garblings.
(Of course, this property holds for any strictly convex function instead of k(x − µ)², but we work with this specific form for tractability.) For instance, C(q) is minimized for the uninformative distribution δ_µ, and maximized for the fully informative one with support {0, 1}.

Clearly then, R faces a trade-off in her choice of garblings q_1 and q_2: a garbling costs less, but also corresponds to a less informative experiment and is less valuable for her decision problem (Blackwell 1951, 1953). Returning to our example, the more extensive or detailed the research a doctor chooses to read, the costlier it is to draw an inference from it; but also, the more confidence she can place in that inference.

3 Benchmark: A Single Sender

We begin by taking a brief look at what happens if there is only one sender. R chooses a garbling of that sender's distribution and accepts his product if the belief drawn from it is above a threshold λ ∈ (0, 1). Payoffs clearly depend only on the distribution finally chosen by R. In any (subgame perfect) equilibrium, the sender offers a distribution to maximize his expected payoff, correctly anticipating R's optimal garbling behavior.

Following the arguments in Wei (2018), the setup with a single sender permits two simplifications that are not valid in our two-sender model. One: to obtain the set of equilibrium outcomes, it is without loss of generality to restrict the sender to incentive compatible distributions, those that R would not want to garble. This leads to the second simplification, which is that it is without loss to restrict him to binary and degenerate distributions. (Since R has only two actions, she never wants to pay to generate more than two beliefs.)

We refer to a distribution offered by the sender in a sender-preferred equilibrium as sender-optimal. If λ < µ, it is immediate that any sender-optimal distribution is such that nothing is learned and he is accepted with certainty. For example, he can simply design an uninformative experiment.

Now say λ > µ.
If k = 0, then we have a standard Bayesian persuasion problem, and we know from prior work that the sender-optimal distribution has support {0, λ}. But if k > 0, this is no longer optimal, because the garbling chosen by R in response to that would be δ_µ, and the sender would not be accepted. This is easy to see intuitively: at a belief λ, R is indifferent between accepting and rejecting. When offered {0, λ}, her gross payoff from choosing any garbling is the same as the payoff from rejecting with certainty. But then there
is no reason for her to pay to learn anything. To make it worth her while to do so, the sender would have to allow her to generate beliefs above λ.

The following proposition summarizes the results for this benchmark case.

Proposition 3.1.
Suppose there is a single sender, and R has a threshold of acceptance λ > µ. Then,

1. A sender-preferred equilibrium exists.

2. In a sender-preferred equilibrium, R's garbling on path is strictly less informative (in the Blackwell sense) than her optimal garbling in response to full information.

Proof. See Lemma 1 and Proposition 2 in Wei (2018). □
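The indifference argument above can be written out compactly. Let v(y) denote R's gross payoff at posterior y, under the natural normalization (ours, not the paper's notation) that v is nondecreasing and v(λ) equals the payoff from rejecting outright. Any garbling q of the distribution with support {0, λ} has supp(q) ⊆ [0, λ], so

$$
\mathbb{E}_{q}\!\left[v(y)\right] - C(q) \;\le\; v(\lambda) - C(q) \;\le\; v(\lambda) - C(\delta_\mu) \;=\; v(\lambda),
$$

with the last inequality strict unless q = δ_µ, since C(δ_µ) = 0 < C(q) for every other garbling. The receiver therefore garbles all the way down to δ_µ and rejects; to be accepted with positive probability when k > 0, the sender must let R reach beliefs strictly above λ.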
Importantly, as will be shown ahead in this paper, R has a unique optimal garbling in response to any binary distribution, which implies that this result holds for any equilibrium and we can omit the 'sender-preferred' qualifier.

Corollary 3.1.1.
If there is a single sender, and R has a threshold of acceptance λ ∈ (0, 1), then full information is not offered by the sender in equilibrium.

In response to full information, say R's optimal garbling has support {ν₁, ν₂} where ν₁ < λ < ν₂. Then these results state that in equilibrium, the distribution R ends up with would be a strict garbling of this. Stated differently, in equilibrium the sender does restrict R's learning, not allowing her to choose her first best. The intuition roughly is that although the sender cannot implement his optimal no-garbling solution {0, λ}, he need not go all the way to providing full information. He can profitably restrict learning so that the higher belief in the support of R's garbling is below ν₂, and the probability of its realization is higher.

As we see next, introducing an additional sender yields an interesting comparison to this.

4 Equilibrium Analysis with Two Senders

We now analyze the game described in Section 2, for an arbitrary k > 0 and µ ∈ (0, 1).

To start off, recall our observation that R's order of visits matters when the two distributions on offer are different. In equilibrium she must correctly anticipate the distributions chosen, and the order of visits must be a best response to those. However, since she does not observe the chosen distributions at stage 0, any deviation by a sender goes undetected until and unless he is visited. This has the following implication, which we note for further reference.

Remark.
Any deviation by a sender cannot affect either R's decision to visit a sender, or the order of her visits.

Next, note that if both senders offer the same distribution, then R is indifferent between the two orders of visit (if she visits anyone). The analysis below will make it clear that the tie breaking rule in this case does not matter for our results, and we do not assume anything about it.

We now turn to the question of equilibrium existence. Suppose that each of the two senders offers no information, i.e. the distribution δ_µ. Then upon visiting either sender, R is also restricted to choosing δ_µ. But then she expects to gain nothing by visiting a sender, and not visiting either of them is a best response. She may simply select sender 1 with any probability p ∈ [0, 1], and sender 2 with probability 1 − p. Clearly, if this best response is played, a deviation by a sender goes undetected, and does not make any difference to the outcome. Thus we have the following.

Claim 4.1 (Equilibrium existence). An equilibrium exists for all µ ∈ (0, 1), and for all k > 0. In particular, there is always an equilibrium in which each sender offers an uninformative distribution.

Naturally, we are interested in finding other, more interesting equilibria. Of particular interest are equilibria that give R her first best payoff. First, let us clarify what exactly we mean by this. R's first best payoff is essentially the best she can achieve, across all profiles of sender behavior. In other words, it is the payoff she would get if she herself could choose the senders' distributions. Now, since every Bayes-plausible distribution is a garbling of the fully informative distribution, she has greatest latitude when both senders offer the fully informative distribution. Thus, her first best payoff is attained when both senders offer full information.

However, she may attain the same payoff even when senders choose other less informative distributions.
Suppose, for illustration, that when offered full information, the following is a best response for R: visit Sender 1 first, choose the garbling {µ − ε, µ + ε} for him; then visit Sender 2 and choose the uninformative garbling for him. Then even if, e.g., Sender 1 offers the distribution {µ − ε, µ + 2ε} and Sender 2 offers no information, she gets to choose the exact same response and secure her first best payoff. At the heart of this is the fact that due to attention costs, she might not (and as we shall see, does not) really use full information even when allowed to.

The next observation is easy to make.
Suppose there is an equilibrium in which Sender i offers p_i. Then R achieves her first best payoff in this equilibrium if and only if her best response on path is also a best response when both senders offer full information.

This observation is used to prove the 'only if' direction of the following proposition; the argument draws on some results from later sections in the paper, and is presented in the Appendix. The 'if' direction is obvious.

Claim 4.2 (First best). For given parameters, there is an equilibrium that gives R her first best payoff if and only if there is an equilibrium in which both senders offer full information.

An implication of this is that in establishing conditions for the existence of a full information equilibrium, we establish conditions for an equilibrium where R gets her first best. Hence, we focus our attention on full information, and the next proposition presents our main result.

Proposition 4.3 (Full information equilibrium).
1. For all k > 1/2, there is an equilibrium in which both senders offer full information if and only if µ ∈ [1/(4k), 1 − 1/(4k)].

2. For all k ∈ (0, 1/2], µ ∈ (0, 1), there is no equilibrium in which both senders offer full information.

It is worth highlighting that the parameter in the attention cost function is crucial. If k is above 1/2, we obtain an interval of priors over which full information is an equilibrium, and this interval expands as k grows. In the limit, as k → ∞, the interval converges to (0, 1), the full range of priors. Thus, by having higher attention costs, the receiver might elicit better information from competing senders. The following corollary states the same result differently.

Corollary 4.3.1.
For all µ ∈ (0, 1), there is an equilibrium in which both senders offer full information if and only if k > max{1/(4µ), 1/(4(1 − µ))} (weak inequality if µ ≠ 1/2).

Stated this way, one might conjecture that the result is trivially obtained because for high enough values of k, R finds it optimal to not learn anything at all even when offered full information. As it turns out, this is not the case, and for any finite k she does undertake some learning from at least one sender when offered full information. Instead, we obtain the existence result because for high enough values of k, R finds it optimal to learn only about the quality of one sender, and completely ignore information on the other. The analysis ahead will clarify how this fact plays a crucial role.

Section 4.2 provides a proof of this result (and presents additional results), but before we move on to that, it is instructive to examine another benchmark, where k = 0.

4.1 Benchmark: No Attention Costs (k = 0)

When k = 0, it is costless for R to learn. This makes a substantive difference to the analysis, because she never has a strict incentive to garble either sender's distribution; there is no reason to leave any information on the table. Notice that in this case R's first best payoff is obtained only when both senders offer full information.

For simplicity, here we assume the following tie-breaking rules: i) if the stage 1 draw is 0 (or 1), she rejects (or accepts) that sender without visiting the other one, and ii) if the draws from both stages are the same, she selects the sender visited last.

Proposition 4.4 (No attention costs). Suppose k = 0. Then the following are true.

1. For all µ ∈ (0, 1), there is an equilibrium in which both senders choose an uninformative distribution.

2. For all µ ∈ (0, 1), there is no equilibrium in which both senders offer full information.
The reason an uninformative equilibrium exists is identical to that for k > 0: it is a best response for R to not visit either sender, but then a deviation is not detected and makes no difference to the outcome. The reasoning behind non-existence of a full information equilibrium, on the other hand, is very different for k = 0 and for small, positive k.

For k = 0, in response to full information from both senders, R visits either one of them, learns his type perfectly, and immediately takes a decision. A sender's deviation cannot make a difference if he is not visited first. But if he is, a deviation to support {ε, 1} is profitable, where ε is arbitrarily close to zero. This is because if R's draw from this distribution is ε, she continues to learn from the second sender, and rejects him if the draw then is 0.

When k is any positive quantity, a deviation of this nature does not help. Intuitively, even if the stage 1 draw is a small, positive ε, R is sure enough of the quality of the first sender that she does not find it worth her while to learn about the other one.

The following result establishes existence of other (less than fully) informative equilibria when attention costs are absent. (This setup has been studied in Boleslavsky and Cotton (2018), Hulko and Whitmeyer (2018) and other papers, with the difference that they assume that R observes the chosen distributions at stage 0.)

Claim 4.5.
1. Let k = 0 and µ ≤ 1/2. There is an equilibrium in which each player chooses the uniform distribution on [0, 2µ].
2. Let k = 0 and µ > 1/2. There is an equilibrium in which each sender chooses a CDF with a continuous portion F(x) = x/(2µ) on [0, 2(1 − µ)) and a point mass of size (2µ − 1)/µ on 1. In such an equilibrium, R's decision about whom to visit first must be fair (each sender is visited first with probability 1/2).

Positive attention costs (k > 0)

As previously discussed, for positive attention costs our main result pertains to the full information equilibrium, which is stated in Proposition 4.3 above. We begin by showing why it is true, and for ease of exposition present the key arguments for k = 1. The structure of the proof is the same for a generic k > 0, and the details are relegated to the Appendix. In essence, the argument will be that full information is an equilibrium when R wishes to learn only from one sender.

The case k = 1

Recall that for k = 1, Proposition 4.3 states that full information is an equilibrium if and only if µ ∈ [1/4, 3/4]. Start by considering any µ ∈ (0, 1), and suppose that each sender offers the fully informative distribution with support {0, 1}. To analyze R's (on path) best response, we proceed in two steps–first, we determine her stage 2 best response for each belief drawn at stage 1; second, we use that to solve for the optimal stage 1 behavior. We make use of the technique of concavification for this.

R's stage 2 best response: First let us find the optimal stage 2 garbling, if R visits the sender at that stage. Say the draw from stage 1 is x ∈ [0, 1]. Then, R selects the second sender if and only if the stage 2 draw y turns out to be higher than x. Her payoff from a stage 2 belief y is then max{x, y}, minus the attention cost associated with y. Denote this stage 2 payoff by U(y; x). It does not matter what we assume about the tie breaking rule when y = x: for any distribution offered, x would not belong to the support of the garbling chosen at stage 2.
The reasoning is similar to the argument for why the standard Bayesian persuasion solution is not incentive compatible in the single-sender case (see Section 3). We have

U(y; x) = max{x, y} − (y − µ)², for x, y ∈ [0, 1].

This is piecewise concave in y, and Figure 1 plots it for a representative value of x.

Figure 1: R's stage 2 payoffs

Now, since any distribution is a garbling of the one with support {0, 1}, we know from Kamenica and Gentzkow (2011) that for any given x, R's optimal garbling is determined using the concavification of U(y; x) over [0, 1]. The concavification is the red curve in Figure 2. It is evident that depending on where µ lies, the optimal distribution of beliefs is either degenerate on µ, or is binary.

Lemma 4.6 (Stage 2 optimal garbling). Suppose that R's stage 1 draw is x ∈ [0, 1] and she visits the sender at stage 2. R's stage 2 optimal garbling is either degenerate or binary, and its support is as follows.

1. If µ < 1/2:
{x − 1/4, x + 1/4} if 1/4 ≤ x < µ + 1/4
{0, √x} if µ² < x < 1/4
{µ} if x ≤ µ² or x ≥ µ + 1/4
2. If µ = 1/2:
{x − 1/4, x + 1/4} if 1/4 < x < 3/4
{µ} if x ≤ 1/4 or x ≥ 3/4
3. If µ > 1/2:
{x − 1/4, x + 1/4} if µ − 1/4 < x ≤ 3/4
{1 − √(1 − x), 1} if 3/4 < x < 1 − (1 − µ)²
{µ} if x ≤ µ − 1/4 or x ≥ 1 − (1 − µ)²

The interesting thing to note here is that regardless of the prior, if the first stage draw is either very high or very low, then R chooses not to learn anything from the second sender. This is intuitive–for a high enough belief that the first sender's quality is good, she deems it very unlikely that the second sender is better, and does not invest in learning about him. Instead, she accepts the first sender with certainty. Conversely, if the first stage draw is very low, she accepts the second sender with certainty. Furthermore, the thresholds beyond which there is no learning at stage 2 depend on the prior. The prior is the expected quality of the second sender, so the higher it is, the larger (smaller) the range of first stage beliefs over which the second sender is accepted (rejected) without learning.

If R does choose a binary distribution at stage 2, then she selects the second (first) sender at the higher (lower) belief. For any stage 1 draw, if the stage 2 optimal garbling involves any learning, R strictly gains from visiting the second sender. If it does not involve any learning, R is indifferent between making the second visit and not, and she may resolve this in any manner.

R's stage 1 best response: Using the above result, it is straightforward to obtain R's first stage continuation payoffs for an arbitrary x, and determine her first stage optimal garbling from its concavification over [0, 1]. This leads to the following.

Lemma 4.7 (Stage 1 optimal garbling). Any Bayes plausible distribution with support drawn from the following sets is optimal for R at stage 1.
1. {µ − 1/4} ∪ [1/4, µ + 1/4] if µ ∈ [1/4, 1/2].
2. [µ − 1/4, 3/4] ∪ {µ + 1/4} if µ ∈ [1/2, 3/4].
3. {0, y₁(µ)} if µ < 1/4, where y₁(µ) ∈ (µ, 1/4).
4. {y₂(µ), 1} if µ > 3/4, where y₂(µ) ∈ (3/4, µ).
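The closed forms in Lemma 4.6 can be spot-checked by brute force: since the stage 2 garbling solves a concavification problem, one can scan binary supports {y₁, y₂} straddling µ with Bayes-plausible weights and compare the numerical argmax with the lemma. A rough sketch (our code, k = 1; the grid construction is ours):

```python
# Brute-force check (our sketch, k = 1) of the stage 2 concavification.
# U(y; x) = max(x, y) - (y - mu)^2.  The optimal garbling with mean mu
# is found by scanning binary supports {y1, y2} with y1 <= mu < y2 and
# comparing against the degenerate (no-learning) garbling {mu}.

def U(y, x, mu):
    return max(x, y) - (y - mu) ** 2

def best_stage2_support(x, mu, n=400):
    grid = [i / n for i in range(n + 1)]
    best, support = U(mu, x, mu), (mu, mu)   # start from no learning: {mu}
    for y1 in (g for g in grid if g <= mu):
        for y2 in (g for g in grid if g > mu):
            nu = (y2 - mu) / (y2 - y1)       # weight on y1 so the mean is mu
            val = nu * U(y1, x, mu) + (1 - nu) * U(y2, x, mu)
            if val > best:
                best, support = val, (y1, y2)
    return support
```

For µ = 0.4 and x = 0.5 ∈ [1/4, µ + 1/4) the scan returns (0.25, 0.75), matching {x − 1/4, x + 1/4}; for x = 0.2 ∈ (µ², 1/4) it returns approximately (0, √x), matching the boundary case.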
The exact expressions for y₁(µ) and y₂(µ) are not important. The main thing to note here is that the stage 1 solution always involves some learning, and is unique if and only if µ ∉ [1/4, 3/4]. Interestingly, in spite of the fact that there are only two senders and binary types in this model, R may choose to generate more than two beliefs at stage 1. The reason is that each stage 1 belief is optimally followed by a different degree of learning at stage 2. Note also that since the stage 1 optimal distribution always involves learning, a visit is necessarily made at this stage. R does not care which sender is visited first, and she may randomize her choice in any way.

Since there are multiple best responses for µ ∈ [1/4, 3/4], we need to make a selection among them. Notice that the most informative (in the Blackwell sense) of the optimal distributions has support {µ − 1/4, µ + 1/4}, and by Lemma 4.6, this is the only one among them that is necessarily followed by no learning at stage 2. We assume that R breaks her indifference in favor of this distribution. That is, when indifferent, she'd rather not put off learning until the next stage.

In summary: if µ ∈ [1/4, 3/4], R's best response to full information is the following. Visit sender 1 with probability q ∈ [0, 1], and sender 2 with probability 1 − q. Choose the garbling with support {µ − 1/4, µ + 1/4} for the sender visited. If the belief drawn is µ − 1/4, select the other sender without learning anything from him. If the belief drawn is µ + 1/4, select the visited sender without learning anything from the other one.

We now know what happens on path if full information is offered. For µ ∈ [1/4, 3/4], it turns out that we can rule out profitable deviations without exactly knowing R's best response to any deviation. For µ ∉ [1/4, 3/4], we show that there exists a profitable deviation for a sender.
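The receiver's first best value under full information can also be checked end to end. Solving the two-stage problem numerically (stage 2 by concavification for each x, then stage 1 by concavifying the continuation payoff) should reproduce the value µ + 1/16 attained by the support {µ − 1/4, µ + 1/4}. A brute-force sketch (our code, k = 1, µ = 1/2):

```python
# Two-stage brute-force check (ours) for k = 1, mu = 1/2.
# Stage 2: W(x) = max over garblings with mean mu of E[U(y; x)].
# Stage 1: max over garblings of x (mean mu) of E[W(x) - (x - mu)^2].
# Lemma 4.7 predicts the optimum is attained by support {mu-1/4, mu+1/4},
# with value mu + 1/16.

MU = 0.5

def U(y, x):
    return max(x, y) - (y - MU) ** 2

def W(x, n=200):
    grid = [i / n for i in range(n + 1)]
    best = U(MU, x)  # degenerate stage 2 garbling {mu}
    for y1 in (g for g in grid if g < MU):
        for y2 in (g for g in grid if g > MU):
            nu = (y2 - MU) / (y2 - y1)
            best = max(best, nu * U(y1, x) + (1 - nu) * U(y2, x))
    return best

def U1(x):
    return W(x) - (x - MU) ** 2  # stage 1 continuation payoff

def stage1_value(n=100):
    grid = [i / n for i in range(n + 1)]
    vals = {x: U1(x) for x in grid}
    best = vals[MU]  # degenerate stage 1 garbling
    for x1 in (g for g in grid if g < MU):
        for x2 in (g for g in grid if g > MU):
            nu = (x2 - MU) / (x2 - x1)
            best = max(best, nu * vals[x1] + (1 - nu) * vals[x2])
    return best
```

In line with the multiplicity in Lemma 4.7, the grid optimum is attained (up to grid error) both by the degenerate stage 1 garbling and by the pair {1/4, 3/4}, each giving 0.5625 = µ + 1/16.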
No profitable deviation for µ ∈ [1/4, 3/4]: Consider what a sender achieves by deviating. We have already seen that this does not affect the probability of being the one to be visited first. Moreover, if he is not the one to be visited first, his payoffs are not affected, since R does not plan to learn anything from him. So, we only need to consider what happens if he deviates and is visited first. In this case, R's behavior would be altered if {µ − 1/4, µ + 1/4} is not a garbling of the distribution he deviates to.

Now, say R visits a sender and finds out that she may no longer choose support {µ − 1/4, µ + 1/4}. Regardless of what the sender's deviation is, though, she is permitted to learn nothing, i.e. choose support {µ}. By Lemma 4.7, this is one of her best responses, and by Lemma 4.6, this would be optimally followed by visiting the other sender (who has not deviated) and choosing support {µ − 1/4, µ + 1/4} for him. By responding to the deviation in this manner, R ensures a payoff equal to what is attained in the absence of the deviation. Naturally, any other response specified in Lemma 4.7, if permissible under the deviation, would also give her the same payoff, and she may choose that instead of support {µ}.

What this essentially implies is that in response to any deviation by the sender visited first, R would choose from the set specified in Lemma 4.7, and depending on the belief she draws, follow it with stage 2 behavior specified in Lemma 4.6. This observation, and the next Lemma, are key to our analysis.

Lemma 4.8.
For all µ ∈ [1/4, 3/4], conditional on being visited first, a sender's expected payoff is the same for any of the receiver responses specified in Lemma 4.7.

This immediately implies that a unilateral deviation does not affect a sender's payoffs, and it is proven that full information is an equilibrium for µ ∈ [1/4, 3/4]. (Either by not visiting him at all, or by visiting but not learning.)

The intuition is as follows. Under R's first best, i.e. when both senders allow her perfect information, attention costs lead her to learn from only one sender. It does not matter which sender that is–if it is a best response to learn only from the sender visited first, then clearly it is also a best response to learn only from the one visited second. Now, we say that on path she chooses to learn from the first sender she visits. If, however, that sender deviates and restricts her learning, she is able to compensate for it by learning more from the other sender. The ex-ante probability that she makes the correct choice thereby remains unaffected, and the deviating sender is unable to gain.

Existence of a profitable deviation for µ ∉ [1/4, 3/4]: When µ ∉ [1/4, 3/4], similar reasoning does not apply, since R's best response is unique and involves learning from both senders on path. In this case, there exists a deviation where a sender profitably restricts R's learning in case he is visited second, without affecting what happens if he is visited first. In particular, say for instance µ < 1/4. Recall that in response to full information, R chooses support {0, y₁(µ)} at stage 1. Following belief 0 she immediately accepts the second sender, and following belief y₁(µ), she chooses support {0, √(y₁(µ))} at stage 2. It can be shown that there exists p ∈ (y₁(µ), √(y₁(µ))) such that if a sender deviates to {0, p} and is visited second (by R holding a belief y₁(µ) from stage 1), R chooses {0, p} instead of {0, √(y₁(µ))}.
If he is instead visited first, R's best response is unchanged. Evidently this deviation increases the probability of being selected, and is therefore profitable.

General k > 0

The analysis so far tells us that for any k, first, an uninformative equilibrium always exists; and second, a full information equilibrium exists for a large class of parameter values. Our focus on full information is not misplaced, in spite of the fact that R never makes use of all of it even when it is on offer. The reason we focus on it–as we have discussed at length–is Claim 4.2: full information is an equilibrium if and only if R can get her first best outcome in an equilibrium. Our interest in equilibria where R gets her first best is natural–first, the existence of such equilibria is surprising; second, in many situations it is appropriate to select receiver-preferred equilibria. We showed this for k = 1, but Appendix A.1 shows that it is true generally: in response to full information, she visits only one sender and picks support {µ − 1/(4k), µ + 1/(4k)}.

Two questions remain. First, what other equilibria give R her first best payoff? And second, what can we say about equilibria that do not give her the first best? The answer to the first question is easy. There is a whole class of such equilibria, where the distributions offered by the senders allow her to respond exactly as under full information.

Claim 4.9.
Suppose k > 0 and µ ∈ [1/(4k), 1 − 1/(4k)]. For i ∈ {1, 2}, let pᵢ ∈ Δ[0, 1] be any distribution with expectation µ, and of which the distribution with support {µ − 1/(4k), µ + 1/(4k)} is a garbling. Then, there is an equilibrium in which sender i offers the distribution pᵢ. Such an equilibrium is outcome equivalent to full information.

This can be used to construct specific examples such as the following.
Corollary 4.9.1.
1. Let µ ≤ 1/2. Then there is an equilibrium in which both senders offer the uniform distribution on [0, 2µ] if k ≥ 1/(2µ).
2. Let µ > 1/2. Then there is an equilibrium in which both senders offer a CDF with a continuous portion F(x) = x/(2µ) on [0, 2(1 − µ)) and a point mass of size (2µ − 1)/µ on 1 if k ≥ 1/(2µ) for µ ≤ 2/3, and if k ≥ 1/(4(1 − µ)) for µ ≥ 2/3.

We choose to present these examples for a reason: Recall from Claim 4.5 that these distributions are also equilibria in the k = 0 scenario, where full information is not an equilibrium. In contrast, here these equilibria are in fact outcome equivalent to full information. The difference arises since with attention costs, both full information and these distributions are garbled down to the same thing by R.

Turning to the question on equilibria that do not give R her first best, we have the following sharp result that rules out their existence in the class of binary, symmetric equilibria. Essentially, it implies that if a symmetric binary equilibrium exists (for any parameters), so must the full information equilibrium, and in fact it must be outcome equivalent to the full information one–so that R must be getting her first best.

Proposition 4.10.
Let the distribution p have support {l, h} with l ∈ [0, µ) and h ∈ (µ, 1].
1. If k > 1/(2(h − l)), there is an equilibrium where both senders offer p if and only if µ ∈ [l + 1/(4k), h − 1/(4k)].
2. If k ≤ 1/(2(h − l)), there is no equilibrium where both senders offer p.

That is, equilibria where both senders offer the same distribution, that has binary support. Beyond this, characterizing all equilibria of the game is beyond the scope of this paper, a major reason being that little practical can be said about the set of garblings of a non-binary distribution.
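The garbling requirement behind these constructions is easy to test numerically: a binary distribution {µ − a, µ + a} is a garbling (a mean-preserving contraction) of a candidate equilibrium distribution exactly when the integrated CDF of the candidate weakly dominates that of the binary everywhere. A rough check (our code) for the uniform example from Corollary 4.9.1:

```python
# Mean-preserving-contraction test (ours): the binary distribution
# {mu - a, mu + a}, a = 1/(4k), is a garbling of Uniform[0, 2mu] iff the
# integrated CDF of the uniform weakly dominates that of the binary.

def is_garbling_of_uniform(mu: float, k: float, n: int = 2000) -> bool:
    a = 1 / (4 * k)
    if mu - a < 0 or mu + a > 2 * mu:  # support must fit inside [0, 2mu]
        return False
    for i in range(n + 1):
        t = 2 * mu * i / n
        int_unif = t * t / (4 * mu)  # integral of the uniform CDF t/(2mu)
        int_bin = 0.5 * max(0.0, t - (mu - a)) + 0.5 * max(0.0, t - (mu + a))
        if int_unif < int_bin - 1e-12:
            return False
    return True
```

Consistent with the corollary's condition, for µ = 0.4 the test passes for k slightly above 1/(2µ) = 1.25 and fails below it.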
Here, we illustrate that our main results continue to hold under a variety of different modelling choices. We begin with the case in which the senders' experiment choices are public.
So far we have considered a scenario where R does not observe the experiment chosen by a sender until she visits him. This is a natural assumption in many situations–e.g., pharmaceutical companies have much leeway in dissemination of information to doctors, and it may not be known how detailed the research published on a drug is until one takes a look through it. As another example, one does not know how many customer reviews a seller has allowed to be posted on his website until one visits the website.

However, there can be other applications where one might expect the senders' experiments to be observed when they are chosen, i.e. at Stage 0 of our game. In this case, the strategic considerations remain the same except for an important difference–posted experiments allow a sender's deviation to be observed by R, and hence deviations affect her order of visits. For example, if both senders are expected to choose the same experiment, then R is indifferent between the order of visits; by deviating, a sender can break this indifference. We show that our main results continue to hold in this scenario.

Proposition 5.1.
Suppose that the experiments chosen by the senders are observed by R at Stage 0 of the game. Then Proposition 4.10 still holds. (For h = 1, l = 0 this proposition is identical to Proposition 4.3.)

Note that Proposition 4.10 encapsulates our main results: it establishes conditions for existence of a full information equilibrium, and proves non-existence of any binary, symmetric equilibrium that is not outcome equivalent to full information. A full proof of Proposition 5.1 follows in the Appendix, but the intuition can be seen by recalling the arguments that lead up to existence of a full information equilibrium for some parameters in the baseline model: since R's best response to full information in the baseline model is to learn from exactly one sender, a deviation cannot affect payoffs even if it is observed at Stage 0. Put succinctly, if R knows that a sender has deviated, she can simply visit the other sender and learn from him.

Our baseline model assumes that the distributions of the senders' types have identical means. There, the receiver's problem is the most interesting, since ex-ante she has very little information to base her choice on. In this section, we relax this assumption. Our aim is to show that our main result applies to settings where the prior beliefs on the two senders are different but sufficiently close. Namely, if the two means µ₁ and µ₂ lie in a particular interval, then there is an equilibrium in which both senders offer full information.

For expositional convenience let k = 1. We proceed in the same vein as in the proof for Lemma 4.7: using Lemma 4.6 we can obtain the receiver's first stage continuation payoffs for an arbitrary first stage draw x, which we then concavify to obtain the receiver's first stage optimal garbling. Suppose that sender 2 is visited second (we'll revisit this assumption shortly).

Lemma 5.2 (Stage 1 optimal garbling). Any Bayes plausible distribution with support drawn from the following sets is optimal for R at stage 1.
1. [µ₂ − 1/4, 3/4] ∪ {µ₂ + 1/4} if µ₂ ∈ [1/2, 3/4] and µ₁ ∈ [µ₂ − 1/4, µ₂ + 1/4].
2. [µ₂ − 1/4, 3/4] if µ₂ ∈ [1/2, 3/4] and µ₁ ∈ [µ₂ − 1/4, 3/4].
3. [1/4, µ₂ + 1/4] ∪ {µ₂ − 1/4} if µ₂ ∈ [1/4, 1/2] and µ₁ ∈ [µ₂ − 1/4, µ₂ + 1/4].
4. [1/4, µ₂ + 1/4] if µ₂ ∈ [1/4, 1/2] and µ₁ ∈ [1/4, µ₂ + 1/4].

Note that we have not yet determined which sender should be visited first. Our first step is to show that if µ₁ and µ₂ satisfy one of the conditions for Lemma 5.2 when sender 2 is visited second, then they satisfy one of the conditions for Lemma 5.2 when sender 2 is visited first. Formally,

Lemma 5.3.
One of the four parametric restrictions in Lemma 5.2 holds if and only if one of the four parametric restrictions in Lemma 5.2 holds in which µ₁ and µ₂ are replaced with each other.

Our second step is to show that the receiver's expected payoff under any optimal protocol in which sender 1 is visited first is the same as her expected payoff under any optimal protocol in which we assume sender 2 is visited first. Hence, it does not matter which sender she visits first, and so she can break ties in that manner in any way that she chooses. This step requires just a couple sentences to prove: in each of the four cases described in Lemma 5.2, there is a stage 1 optimal distribution in which the receiver learns nothing at the first sender. Her expected payoff under the optimal search protocol is thus

(µ₁ + µ₂)/2 + µ₁² + µ₂² − 2µ₁µ₂ + 1/16,

which is invariant to an exchange of µ₁ and µ₂. Finally, we arrive at the heterogeneous means analog to Proposition 4.3:

Proposition 5.4.
There is a full information equilibrium if |µ₁ − µ₂| ≤ 1/4 and
1. µ₁, µ₂ ∈ [1/4, 3/4]; or
2. µᵢ ≤ 1/4 ≤ µⱼ for i, j ∈ {1, 2} and i ≠ j; or
3. µᵢ ≤ 3/4 ≤ µⱼ for i, j ∈ {1, 2} and i ≠ j.

Here, we modify the model by allowing the receiver's information processing cost at a sender to itself depend on the distribution chosen by the sender. That is, we amend the attention cost so that it is now given by (at sender i)

C(qᵢ, pᵢ) = k(pᵢ) ∫_[0,1] (x − µ)² dqᵢ(x),   (2)

where pᵢ is the choice of distribution of posterior beliefs by sender i and qᵢ is the garbling of that distribution chosen by the receiver. We assume that k is weakly decreasing in the Blackwell order: the more informative the sender's experiment, the less costly a given information structure is for the receiver. This corresponds to the following intuition: the less informative the seller is, the costlier it is for the buyer to maintain a particular information structure, since she is forced to pay closer attention. We thank the associate editor for suggesting such a cost of information.
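The payoff expression in the proof of Lemma 5.3 above can be sanity-checked directly: under the protocol in which the receiver learns nothing at the first sender and then garbles the second sender to the binary support {µ_first − 1/4, µ_first + 1/4}, her expected payoff with k = 1 equals (µ₁ + µ₂)/2 + (µ₁ − µ₂)² + 1/16 (the factored form of the expression above), symmetric in the two means. A small check (our code; it assumes the interior case where that support is feasible):

```python
# Check (ours, k = 1) that the "learn nothing at the first sender" protocol
# yields (mu1 + mu2)/2 + (mu1 - mu2)^2 + 1/16, symmetric in mu1 and mu2.
# Assumes the interior case: the stage 2 garbling {mu_first - 1/4,
# mu_first + 1/4} is feasible.

def protocol_payoff(mu_first: float, mu_second: float) -> float:
    y1, y2 = mu_first - 0.25, mu_first + 0.25  # stage 2 binary support
    nu = (y2 - mu_second) / (y2 - y1)          # weight on y1 (mean mu_second)

    def payoff(y):  # pick the better sender, pay the quadratic attention cost
        return max(mu_first, y) - (y - mu_second) ** 2

    return nu * payoff(y1) + (1 - nu) * payoff(y2)

def closed_form(mu1: float, mu2: float) -> float:
    return (mu1 + mu2) / 2 + (mu1 - mu2) ** 2 + 1 / 16
```

Swapping which mean is visited first leaves the payoff unchanged, which is the tie-breaking point used in the text.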
We define k_F := k(p_B) to be the minimum cost parameter, where p_B is the Bernoulli distribution begotten by full information provision by the sender. Naturally, we stipulate that k_F is non-negative. With this modified cost function, our main result continues to hold. Namely, provided the attention cost is sufficiently high, there is an equilibrium in which both senders offer full information:

Proposition 5.5 (Full information equilibrium). For all k_F > 0, if µ ∈ [1/(4k_F), 1 − 1/(4k_F)] then there is an equilibrium in which both sellers offer full information.

Proof. On path, where each sender provides full information, the analysis is unchanged from earlier sections (with k_F in lieu of k), and the searcher's optimal protocol is unaltered. Moreover, should a sender deviate, then again the searcher can behave optimally by learning nothing at the deviating sender, eliminating the possibility for a sender to deviate profitably. ∎

We study a model of information disclosure by two senders who compete to persuade a receiver. The receiver, instead of passively accepting the experiment adopted by a sender, may choose to garble it before drawing a belief. The lower the informativeness of the chosen garbling, the lower her attention costs are. We show how for a large class of parameters, it is an equilibrium for the senders to offer at least as much information to the receiver as she would choose for herself, if she could control information provision. In particular, full disclosure by both senders is an equilibrium. Moreover, there is no binary symmetric equilibrium (for any value of parameters) that does not give the receiver this first best outcome. We prove robustness to various modeling assumptions.

This is the result of an interesting trade-off that generalizes beyond the specifics of our model. Due to attention costs, the receiver never finds it worthwhile to learn either sender's type perfectly.
That is, even with access to full information, she leaves some scope for further learning about each. Moreover, since her task is to choose between the senders, information on the quality of one sender partially substitutes for information on the quality of the other. For example, learning a lot about the quality of one drug on the market can be just as good (for the accuracy of a doctor's decision) as learning a bit about both alternatives.

Consequently, starting from a situation of full disclosure by both senders, if either sender deviates and restricts the receiver's learning, she has an opportunity to make up for it by using some of the 'surplus' information–so far unused–about the other sender. The deviating sender thus has limited ability–if any–to affect the overall quality of the receiver's information across the two alternatives. This channel clearly breaks down in the absence of attention costs (so that the receiver always uses all available information), or if there is only one sender (so that there is no notion of substitutability). Our model identifies novel strategic incentives for greater information disclosure.

We motivated our study with the example of pharmaceutical companies strategically disclosing information to prescribing physicians. The assumption of high attention costs, as well as a low outside option for the receiver, are reasonable in this context. The model is well suited to study strategic disclosure in numerous other settings where information is 'complex', e.g. the disclosure of features of retirement savings plans to consumers, or the informational content of political campaigns.
References

Albrecht, B.C. Mimeo.

Alper, Brian S, Jason A Hand, Susan G Elliott, Scott Kinkade, Michael J Hauan, Daniel K Onion, and Bernard M Sklar. 2004. "How much effort is needed to keep up with the literature relevant for primary care?" Journal of the Medical Library Association, 92(4): 429.

Au, Pak Hung and Keiichi Kawai. 2017a. Working paper.

Au, Pak Hung and Keiichi Kawai. 2017b. "Competitive Information Disclosure by Multiple Senders." Working paper.

Battaglini, Marco. 2002. "Multiple Referrals and Multidimensional Cheap Talk." Econometrica, 70(4): 1379–1401.

Blackwell, David. 1951. "Comparison of Experiments." Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, 93–102. Berkeley, Calif.: University of California Press.

Blackwell, David. 1953. "Equivalent comparisons of experiments." The Annals of Mathematical Statistics, 24(2): 265–272.

Bloedel, Alexander W and Ilya R Segal. Working paper.

Board, Simon and Jay Lu. 2018. "Competitive Information Disclosure in Search Markets." Journal of Political Economy, 126(5): 1965–2010.

Boleslavsky, Raphael and Christopher Cotton. 2015. American Economic Journal: Microeconomics, 7(2): 248–79.

Boleslavsky, Raphael and Christopher Cotton. 2018. "Limited capacity in project selection: Competition through evidence production." Economic Theory, 65(2): 385–421.

Caplin, Andrew, Mark Dean, and John Leahy. Working paper.

Gentzkow, Matthew and Emir Kamenica. 2014. "Costly Persuasion." American Economic Review, 104(5): 457–62.

Goldacre, Ben. 2012. Bad Pharma: How Drug Companies Mislead Doctors and Harm Patients. Macmillan.

Hulko, Artem and Mark Whitmeyer. 2018. arXiv preprint arXiv:1802.09396.

Kamenica, Emir and Matthew Gentzkow. 2011. "Bayesian Persuasion." American Economic Review, 101(6): 2590–2615.

Lipnowski, Elliot, Laurent Mathevet, and Dong Wei. "Attention Management." Forthcoming in American Economic Review: Insights.

Matyskova, Ludmila. Working paper.

Sims, Christopher A. 2003. "Implications of Rational Inattention." Journal of Monetary Economics, 50(3): 665–690.

Treust, Maël Le and Tristan Tomala. Working paper.

Wei, Dong. Working paper.

Whitmeyer, Joseph and Mark Whitmeyer. 2019. arXiv preprint arXiv:1905.05157.

A Proofs
Consider any k > 0 and µ ∈ (0, 1). Let each sender offer support {l, h}, with l ∈ [0, µ) and h ∈ (µ, 1]. We begin by proving a series of Lemmata.

Lemma A.1.
Suppose that R's stage 1 draw is x ∈ [l, h] and she visits the sender at stage 2. R's stage 2 optimal garbling is either degenerate or binary, and its support is as follows.

1. If k > 1/(2(h − l)) and µ ≤ min{h − 1/(2k), l + 1/(2k)}:
{µ} if x ∈ [l, l + k(µ − l)²]
{l, l + √((x − l)/k)} if x ∈ (l + k(µ − l)², l + 1/(4k))
{x − 1/(4k), x + 1/(4k)} if x ∈ [l + 1/(4k), µ + 1/(4k))
{µ} if x ∈ [µ + 1/(4k), h]
2. If k > 1/(2(h − l)) and µ ≥ max{h − 1/(2k), l + 1/(2k)}:
{µ} if x ∈ [l, µ − 1/(4k)]
{x − 1/(4k), x + 1/(4k)} if x ∈ (µ − 1/(4k), h − 1/(4k)]
{h − √((h − x)/k), h} if x ∈ (h − 1/(4k), h − k(h − µ)²)
{µ} if x ∈ [h − k(h − µ)², h]
3. If l + 1/(2k) ≤ µ ≤ h − 1/(2k):
{x − 1/(4k), x + 1/(4k)} if x ∈ (µ − 1/(4k), µ + 1/(4k))
{µ} if x ∈ [l, µ − 1/(4k)] ∪ [µ + 1/(4k), h]

4. If k > 1/(2(h − l)) and h − 1/(2k) < µ < l + 1/(2k):
{µ} if x ∈ [l, l + k(µ − l)²]
{l, l + √((x − l)/k)} if x ∈ (l + k(µ − l)², l + 1/(4k))
{x − 1/(4k), x + 1/(4k)} if x ∈ [l + 1/(4k), h − 1/(4k)]
{h − √((h − x)/k), h} if x ∈ (h − 1/(4k), h − k(h − µ)²)
{µ} if x ∈ [h − k(h − µ)², h]
5. If k ≤ 1/(2(h − l)):
{µ} if x ∈ [l, l + k(µ − l)²]
{l, l + √((x − l)/k)} if x ∈ (l + k(µ − l)², l + k(h − l)²)
{l, h} if x ∈ [l + k(h − l)², h − k(h − l)²]
{h − √((h − x)/k), h} if x ∈ (h − k(h − l)², h − k(h − µ)²)
{µ} if x ∈ [h − k(h − µ)², h]

Proof. R's stage 2 payoff from a stage 2 belief y is given by U(y; x) = max{x, y} − k(y − µ)². This is piecewise concave. We first obtain the concavification of U(y; x) over [l, h] and then use it to find the optimal garbling. The concavification of U(y; x) is obtained by joining two points y₁, y₂ (in a straight line) with l ≤ y₁ < x < y₂ ≤ h. By the definition of concavification of a function, we must have

U′(y₁; x) ≤ (U(y₂; x) − U(y₁; x))/(y₂ − y₁) ≤ U′(y₂; x),   (3)

with the first inequality holding with equality if y₁ > l and the second one holding with equality if y₂ < h. (The best way to see this is to assume it does not hold and see that the definition of concavification is violated.)

If l + 1/(4k) < x < h − 1/(4k), the concavification is given by y₁ = x − 1/(4k), y₂ = x + 1/(4k). If x ≤ min{l + 1/(4k), h − 1/(4k)}, the lower bound l binds and the concavification has y₁ = l; y₂ = l + √((x − l)/k) is obtained from the second equality in Inequation 3. If x ≥ max{h − 1/(4k), l + 1/(4k)}, the upper bound h binds and the concavification has y₂ = h; y₁ = h − √((h − x)/k) is obtained from the first equality in Inequation 3. If h − 1/(4k) < x < l + 1/(4k), the concavification is:
1. y₁ = l, y₂ = l + √((x − l)/k) if l + √((x − l)/k) ≤ h.
2. y₂ = h, y₁ = h − √((h − x)/k) if h − √((h − x)/k) ≥ l.
3. y₁ = l, y₂ = h otherwise.

Having obtained the concavification for any x, the optimal stage 2 garbling has support {y₁, y₂} if µ ∈ (y₁, y₂), and support {µ} otherwise. Straightforward algebra then gives us the stated result. ∎

Lemma A.2.
1. If k > 1/(2(h − l)) and µ ≤ min{h − 1/(2k), l + 1/(2k)}, R's optimal stage 1 garbling is
(a) Any Bayes plausible distribution with support drawn from the set {µ − 1/(4k)} ∪ [l + 1/(4k), µ + 1/(4k)] if µ ≥ l + 1/(4k).
(b) The distribution with support {l, y₁(µ)} with y₁(µ) ∈ (µ, l + 1/(4k)) if µ < l + 1/(4k).
2. If k > 1/(2(h − l)) and µ ≥ max{h − 1/(2k), l + 1/(2k)}, R's optimal stage 1 garbling is:
(a) Any Bayes plausible distribution with support drawn from the set [µ − 1/(4k), h − 1/(4k)] ∪ {µ + 1/(4k)} if µ ≤ h − 1/(4k).
(b) The distribution with support {y₂(µ), h} with y₂(µ) ∈ (h − 1/(4k), µ) if h − 1/(4k) < µ.
3. If l + 1/(2k) ≤ µ ≤ h − 1/(2k), R's optimal stage 1 garbling is any Bayes plausible distribution with support on [µ − 1/(4k), µ + 1/(4k)].
4. If k > 1/(2(h − l)) and h − 1/(2k) < µ < l + 1/(2k), R's stage 1 optimal garbling is:
(a) Any Bayes plausible distribution with support drawn from {µ − 1/(4k)} ∪ [l + 1/(4k), h − 1/(4k)] ∪ {µ + 1/(4k)} if l + 1/(4k) ≤ µ ≤ h − 1/(4k).
(b) The distribution with support {l, y₁(µ)} with y₁(µ) ∈ (µ, l + 1/(4k)) if µ < l + 1/(4k).
(c) The distribution with support {y₂(µ), h} with y₂(µ) ∈ (h − 1/(4k), µ) if h − 1/(4k) < µ.
5. If k ≤ 1/(2(h − l)), then
(a) If µ ≤ (l + h)/2, R's optimal stage 1 garbling is {l, y₁(µ)}, where y₁(µ) > µ is either on (l + k(µ − l)², l + k(h − l)²) or on [l + k(h − l)², h − k(h − l)²].
(b) If µ > (l + h)/2, R's optimal stage 1 garbling is {y₂(µ), h}, where y₂(µ) < µ is either on [l + k(h − l)², h − k(h − l)²] or on (h − k(h − l)², h − k(h − µ)²).

Proof. Let U₁(x) be R's first stage continuation payoff for a first stage belief x. Say the stage 2 distribution following x has support {y₁, y₂}, with y₁ ≤ y₂ and νy₁ + (1 − ν)y₂ = µ. Then U₁(x) = νU(y₁; x) + (1 − ν)U(y₂; x) − k(x − µ)².
The concavification of U₁ over [l, h] is used to obtain the stage 1 optimal distribution. For any µ, U₁ is continuous. Note that U₁ is affine over any interval of x for which the stage 2 optimal garbling is {x − 1/(4k), x + 1/(4k)}.

Remark.
If the stage 1 optimal garbling is unique, then it cannot have support {µ}. The reason for this is the following. If the stage 1 unique optimal garbling is degenerate, then it is verified from Lemma A.1 that the stage 2 optimal garbling has binary support, say {y₁, y₂}. But then, choosing the garbling {y₁, y₂} at stage 1 and {µ} at stage 2 must give the same expected payoff, and hence must be optimal. This is a contradiction.

Now, first let k > 1/(2(h − l)) and µ ≤ min{h − 1/(2k), l + 1/(2k)}. Then U₁ is strictly convex in a right neighborhood of l + k(µ − l)² and concave everywhere else (weakly on (l + 1/(4k), µ + 1/(4k))). Then, the concavification must join points z₁ ≤ l + k(µ − l)² and z₂ > l + k(µ − l)² (in a straight line), with z₁, z₂ determined by a condition analogous to Inequation 3.

Say µ ≥ l + 1/(4k). Then it is verified that z₁ = µ − 1/(4k) and z₂ = l + 1/(4k). Since µ ∈ [l + 1/(4k), µ + 1/(4k)] and U₁ is affine over this interval, a distribution with support on {µ − 1/(4k)} ∪ [l + 1/(4k), µ + 1/(4k)] would be optimal. Now say µ < l + 1/(4k). Clearly the lower bound l would bind and z₁ = l must hold. z₂ is obtained from the second equality in Inequation 3, and it must be higher than µ, since otherwise the optimal garbling would uniquely be degenerate, and we ruled that out above. z₂ is denoted by y₁(µ) in the statement of the Lemma.

Now let k > 1/(2(h − l)) and µ ≥ max{h − 1/(2k), l + 1/(2k)}. The argument is symmetric to the preceding one. In this case U₁ is strictly convex in a left neighborhood of h − k(h − µ)² and concave everywhere else (weakly on (µ − 1/(4k), h − 1/(4k))). The concavification is obtained by joining points z₁ and z₂ as before. It is verified that for µ ≤ h − 1/(4k), z₁ = h − 1/(4k) and z₂ = µ + 1/(4k). This tells us that a distribution with support on [µ − 1/(4k), h − 1/(4k)] ∪ {µ + 1/(4k)} would be optimal. For µ > h − 1/(4k), z₂ = h must hold.
Now z₁ is found from the first equality in Inequation 3, and it must be lower than µ, since otherwise the stage 1 optimal garbling would uniquely be degenerate. z₁ is denoted by y(µ) in the statement of the Lemma.

The remaining cases are dealt with completely analogously.

Finally, let k ≤ 1/(2(h − l)). Then U is strictly convex in a right neighborhood of l + k(µ − l)², and in a left neighborhood of h − k(h − µ)², and strictly concave everywhere else. Clearly, the concavification must:

1. join points z₁ ∈ [l, l + k(µ − l)²) and z₂ > l + k(µ − l)² in a straight line, and
2. join points z₃ < h − k(h − µ)² and z₄ ∈ (h − k(h − µ)², h] in a straight line.

As usual, these points are determined by a condition analogous to Inequation 3. It turns out that z₁ = l and z₄ = h, while the positions of z₂ and z₃ depend on parameters. The optimal garbling is either {l, z₂} or {z₃, h}, depending on where µ lies. ∎

The previous result immediately gives us the following useful corollary.

Corollary A.2.1.
The following two statements are equivalent:

1. µ ∈ [l + 1/(4k), h − 1/(4k)] and k > 1/(2(h − l)).
2. There are multiple stage 1 optimal garblings for R, including one with support {µ} and one with support {µ − 1/(4k), µ + 1/(4k)}.

Lemma A.3.
Suppose k > 1/(2(h − l)) and µ ∈ [l + 1/(4k), h − 1/(4k)]. If R's behavior is as specified in Lemmata A.1 and A.2, then conditional on being the first sender to be visited, the probability of being selected is the same regardless of which stage 1 optimal garbling is chosen by R.

Proof. We show the proof for µ ≤ min{h − 1/(4k), l + 1/(2k)}. It is entirely analogous for the other cases from Lemma A.1.

Suppose l + 1/(4k) ≤ µ ≤ min{h − 1/(4k), l + 1/(2k)} and R's first stage response is a distribution F on {µ − 1/(4k)} ∪ [l + 1/(4k), µ + 1/(4k)]. Using Lemma A.1 it is easy to see that the probability of the first sender being selected conditional on a first stage belief x is given by

P(x) = 0 if x = µ − 1/(4k), and P(x) = 2k(x − µ) + 1/2 if x ∈ [l + 1/(4k), µ + 1/(4k)].

Suppose that F places a mass p ≥ 0 on µ − 1/(4k). Then conditional on being visited first, a sender's expected probability of being selected is given by

V = p · 0 + ∫_{l+1/(4k)}^{µ+1/(4k)} P(x) dF(x).   (4)

Next note that

p(µ − 1/(4k)) + ∫_{l+1/(4k)}^{µ+1/(4k)} x dF(x) = µ   (5)

and

∫_{l+1/(4k)}^{µ+1/(4k)} dF(x) = 1 − p.   (6)

Inserting Equations 5 and 6 into Equation 4, we get that V = 1/2, which is independent of F. ∎

A.1 Proof of Proposition 4.10

Suppose each sender offers support {l, h}, with l ∈ [0, µ) and h ∈ (µ, 1]. First let k > 1/(2(h − l)) and µ ∈ [l + 1/(4k), h − 1/(4k)].

Given a stage 1 draw x, R's optimal stage 2 garbling is specified in Lemma A.1. If this garbling does not have support {µ}, R necessarily visits the second sender. If it is {µ}, she is indifferent between visiting him and not, and may choose either way. At stage 1, she has multiple best responses. The most informative one among them has support {µ − 1/(4k), µ + 1/(4k)}, and from Lemma A.2 it is the only one that is necessarily followed by no learning at stage 2.
We assume that she breaks her indifference in favor of this distribution. At belief µ + 1/(4k) she accepts the first sender with certainty, and at belief µ − 1/(4k) accepts the other one with certainty.

Then if a sender deviates to a different distribution, his payoffs may be affected only if he is visited first and the distribution he deviates to is such that {µ − 1/(4k), µ + 1/(4k)} is not a garbling of it. In this case, regardless of the deviation, R can secure a payoff equal to what she gets in the absence of the deviation, by picking {µ} at stage 1, followed by visiting the other sender and choosing {µ − 1/(4k), µ + 1/(4k)}. Thus the deviation cannot force R to choose from outside the set of optimal garblings from Lemma A.2. But then due to Lemma A.3, the deviating sender's payoffs are unaffected. Thus, there does not exist a profitable deviation and we have an equilibrium.

Next say that either k > 1/(2(h − l)) and µ ∉ [l + 1/(4k), h − 1/(4k)], or k ≤ 1/(2(h − l)). Then from Lemmata A.2 and A.1, R chooses a unique binary garbling at stage 1, and exactly one belief in the support is followed by a visit to the second sender. Denote the stage 1 belief following which R does learn at stage 2 by w. Under each possibility we show that there is a profitable deviation for a sender.

Possibility 1: If w < µ, then there exists l′ ∈ [max{0, µ − 1/(2k)}, µ) such that w = l′ + k(µ − l′)². Suppose a sender deviates to support {l′, h}. Then from Lemma A.1, if the deviating sender is visited second, R chooses support {µ} and selects the deviating sender with certainty. This does not affect R's behavior if the deviating sender is visited first, since l′ < w. Thus the sender profits from this deviation.

Possibility 2: If w > µ and is followed by a stage 2 best response {l, l + √((w − l)/k)}, then it must be true that w ∈ (l + k(µ − l)², l + 1/(4k)).
There exists h′ < l + √((w − l)/k) such that w < h′ − k(h′ − l)². Suppose a sender deviates to {l, h′}. Then k ≤ 1/(2(h′ − l)), and Lemma A.1 tells us that if the deviating sender is visited at stage 2, R's response changes to {l, h′}. w < h′ < l + √((w − l)/k) implies that this is profitable if visited at stage 2, without affecting what happens if visited at stage 1. Thus the deviation is profitable.

Possibility 3: If w > µ and is followed by a stage 2 best response of {l, h}. Then it is seen that w < h, so that h does not bind at stage 1. This implies that a sender can increase or decrease h slightly to h′ (so that {l, h′} is instead chosen), without affecting what happens if he is visited first. This is clearly profitable. ∎

A.2 Proof of Claim 4.2

‘Only if’: Suppose that there is no equilibrium in which both senders offer full information. Then, Proposition 4.3 tells us that either k ≤ 1/2, or k > 1/2 and µ ∉ [1/(4k), 1 − 1/(4k)]. Lemma A.1 and Lemma A.2 tell us R's unique best response (on path) to full information from both senders. Now we need to show that there is no equilibrium where she gets her first best payoff. For the sake of contradiction, suppose that there is such an equilibrium, in which sender i offers some pᵢ. From the discussion in the main text, this just means that R's best response on path to (p₁, p₂) is the same as the best response to full information. We argue, however, that the same deviations that we identified for full information also work for this supposed equilibrium. Recall the nature of those deviations from A.1: they do not make a difference if the deviating sender is visited first, and restrict learning if visited second.
Now if (p₁, p₂) is the equilibrium under consideration and the same deviation occurs, R's response to this deviation would be as under full information: if she visits the deviating sender first, she would realize she can continue to choose as on path; if she visits him second, she would make the same adjustment as under the full information scenario. Thus, since the deviation was profitable under full information, it must be profitable here, and (p₁, p₂) cannot be an equilibrium. ∎

A.3 Proof of Proposition 4.3
See the proof of Proposition 4.10, setting l = 0, h = 1.

A.4 Proof of Proposition 4.4

Existence of the uninformative equilibrium is proven in the text. Here we show non-existence of a full information equilibrium.

Suppose that each sender chooses a fully informative distribution. Because each sender has chosen the same distribution (on path), R is indifferent as to whom she visits first. Hence, suppose that she visits sender 1 first with probability λ ∈ [0, 1] and sender 2 with its complement. If sender 1 is visited first, then upon R's visit, 1 is realized with probability µ. At this point, she will stop and select sender 1. On the other hand, if 0 is realized then she will select sender 2 without visiting. The symmetric statements hold for sender 2, and his payoff is

u₂ = λ(1 − µ) + (1 − λ)µ.

Now suppose that sender 2 deviates and chooses a distribution that consists of 1 with probability η := µ − 1/n, n ∈ ℕ, n > 1/µ, and ε with probability (n + 1 − µn)/n, where ε := 1/(n + 1 − µn). If sender 1 is visited first then again sender 2 obtains an expected payoff of (1 − µ). If sender 2 is visited first, with probability η, 1 is realized and sender 2 is selected, and with probability (1 − η), ε is realized. At this point R visits sender 1 and obtains a realization of 0 with probability 1 − µ, at which point she selects sender 2. Accordingly,

u₂ = λ(1 − µ) + (1 − λ)(η + (1 − η)(1 − µ)),

and so sender 2 has a profitable deviation if and only if

λ(1 − µ) + (1 − λ)(η + (1 − η)(1 − µ)) > λ(1 − µ) + (1 − λ)µ,

which reduces to

(2n + 1 − √(4n + 1))/(2n) > µ,

provided λ < 1. Without loss of generality we may assume this, since otherwise the same argument would suffice for a deviation by sender 1. The limit of the left hand side goes to 1 as n goes to ∞; hence for any µ < 1 there exists an n̂ such that the left hand side is strictly greater than µ for all n > n̂. We conclude that for any µ < 1 there exists a profitable deviation, negating the possibility that full information is an equilibrium.
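The deviation just constructed can be checked numerically. In this sketch the expressions for η and ε are the reconstructed forms η = µ − 1/n and ε = 1/(n + 1 − µn) (assumptions, since the source is garbled); the check confirms the deviating distribution preserves the mean µ and is profitable for parameters satisfying the stated condition.

```python
# Sanity check of the Proposition 4.4 deviation (eta, eps are assumed
# reconstructions; the mean-preservation and profitability checks below
# only use the payoff expressions from the proof).
import math

mu, lam, n = 0.9, 0.5, 100

eta = mu - 1 / n               # probability of realization 1
eps = 1 / (n + 1 - mu * n)     # the low realization

# The deviation must keep the prior mean mu.
mean = eta * 1 + (1 - eta) * eps
assert abs(mean - mu) < 1e-12

u_on_path = lam * (1 - mu) + (1 - lam) * mu
u_deviation = lam * (1 - mu) + (1 - lam) * (eta + (1 - eta) * (1 - mu))

# The reduced profitability condition from the proof.
threshold = (2 * n + 1 - math.sqrt(4 * n + 1)) / (2 * n)

assert mu < threshold           # parameters satisfy the condition
assert u_deviation > u_on_path  # so the deviation is profitable
print(round(u_deviation - u_on_path, 6))  # 0.0005
```

As n grows the threshold approaches 1, which is exactly the limiting argument used to rule out the full information equilibrium for any µ < 1.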
∎

A.5 Proof of Claim 4.5

For µ ≤ 1/2:

Let each sender choose the uniform distribution on [0, 2µ], and suppose that R visits sender 1 first with probability λ ∈ [0, 1] and sender 2 with its complement. No matter the realization at stage 1, R will proceed and visit the other sender as well before selecting one of them. Hence, u₁ = u₂ = 1/2. Next, we check for a profitable deviation.

Suppose sender 2 deviates to a distribution that contains a probability measure of size a on [2µ, 1] and some portion F on [0, 2µ). It is clear that it is without loss of generality to set a to be a point mass on 2µ. If sender 2 is visited first then with probability a, he is selected and sender 1 is never visited; and otherwise, sender 1 is visited, after which R selects the sender with the highest realization. If sender 1 is visited first, then no matter what, sender 2 is also visited, after which the comparison ensues. Sender 2's payoff is

u₂ = λ( a + ∫₀^{2µ} ∫₀^x dG(y) dF(x) ) + (1 − λ)( a + ∫₀^{2µ} ∫₀^x dG(y) dF(x) ) = a + ∫₀^{2µ} ∫₀^x dG(y) dF(x),

where G(y) = y/(2µ) is the (on-path) distribution chosen by sender 1 and where ∫₀^{2µ} dF = 1 − a and ∫₀^{2µ} x dF = µ − 2µa.

Next, we use the result in Whitmeyer and Whitmeyer (2019), who establish that it suffices to show that sender 2 has no profitable deviation to any binary distribution. Let F be described by α with probability p and β with probability 1 − p, where 0 ≤ α ≤ µ, µ ≤ β ≤ 2µ, and αp + β(1 − p) = µ. Consequently, we rewrite u₂, which becomes

u₂ = (1 − p)G(β) + pG(α) = (1 − p)β/(2µ) + pα/(2µ) = 1/2.

Hence, there is no profitable deviation. ∎
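The final display is a one-line consequence of the mean constraint: against a rival playing G(y) = y/(2µ), any binary deviation supported in [0, 2µ] with mean µ earns exactly (pα + (1 − p)β)/(2µ) = 1/2. A minimal numerical sweep (the grid of deviations is illustrative, not part of the proof):

```python
# Every mean-mu binary deviation {alpha w.p. p, beta w.p. 1-p} supported
# in [0, 2*mu] earns exactly 1/2 against the uniform rival G(y) = y/(2*mu).
mu = 0.3

def G(y):
    return y / (2 * mu)

for p_pct in range(1, 100):
    p = p_pct / 100
    for step in range(31):
        alpha = step * mu / 30            # alpha in [0, mu]
        beta = (mu - p * alpha) / (1 - p) # forced by the mean constraint
        if beta > 2 * mu:                 # would fall outside [0, 2*mu]
            continue
        payoff = (1 - p) * G(beta) + p * G(alpha)
        assert abs(payoff - 0.5) < 1e-9
print("all binary deviations earn 1/2")
```

The payoff is linear in the deviation's mean, so fixing the mean at µ pins it to 1/2 regardless of (p, α, β).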
For µ > 1/2:

On path, sender 2's payoff is

u₂ = λ( (2µ − 1)/µ + ∫₀^{2(1−µ)} ∫₀^x (1/(2µ))(1/(2µ)) dy dx ) + (1 − λ)( ((2µ − 1)/µ)((1 − µ)/µ) + ∫₀^{2(1−µ)} ∫₀^x (1/(2µ))(1/(2µ)) dy dx )
= ((2µ − 1)²/µ²)λ + (1 − µ)(3µ − 1)/(2µ²).

If sender 2 deviates to 1 with probability µ and 0 with probability 1 − µ, his payoff from deviating is

u₂D = λµ + (1 − λ)(1 − µ) = 1 + 2λµ − λ − µ.

The difference, u₂D − u₂, is

(1 − µ)²(2µ − 1)(2λ − 1)/(2µ²).

Since µ > 1/2, this is positive provided λ > 1/2 and negative provided λ < 1/2. Thus, if λ ≠ 1/2 there exists a profitable deviation (if λ < 1/2, sender 1 can deviate profitably in the analogous fashion).

It remains to show that this vector of distributions is an equilibrium for λ = 1/2. Substituting λ = 1/2 into u₂, we see that u₂ = 1/2 on path. Just as for µ ≤ 1/2, from Whitmeyer and Whitmeyer (2019) we need check only deviations to binary distributions. Let F be described by α with probability p and β with probability 1 − p, where αp + β(1 − p) = µ and 0 ≤ α ≤ 2(1 − µ). There are two cases that we need to consider: 1. µ ≤ β ≤ 2(1 − µ); and 2. β = 1. In the first case,

u₂ = (1 − p)G(β) + pG(α) = (1 − p)β/(2µ) + pα/(2µ) = 1/2,

and in the second case

u₂ = (1/2)(1 − p + pG(α)) + (1/2)( ((1 − µ)/µ)(1 − p) + pG(α) ) = pα/(2µ) + (1 − p)/(2µ) = 1/2,

where we used the fact that β = 1 implies that 1 − p = µ − pα. Hence, there is no profitable deviation. ∎

A.6 Proof of Lemma 4.6

See the proof of Lemma A.1, setting l = 0, h = 1 and k = 1.

A.7 Proof of Lemma 4.7
See the proof of Lemma A.2, setting l = 0, h = 1 and k = 1.

A.8 Proof of Lemma 4.8
See the proof of Lemma A.3, setting l = 0, h = 1 and k = 1.

A.9 Proof of Claim 4.9
Let k > 1/2 and µ ∈ [1/(4k), 1 − 1/(4k)]. As shown in Appendix A.1, one of R's best responses to full information (l = 0, h = 1) from both senders is to choose the garbling {µ − 1/(4k), µ + 1/(4k)} at stage 1 and to learn nothing at stage 2.

Suppose sender i offers a distribution of which {µ − 1/(4k), µ + 1/(4k)} is a garbling. Then, the aforementioned best response to full information is permissible, and thus continues to be a best response. Suppose R chooses this response. Then if a sender unilaterally deviates and is the one to be visited first, R may respond by choosing {µ} and visiting the other sender, choosing {µ − 1/(4k), µ + 1/(4k)} for him. Exactly as in the proof for existence of a full information equilibrium (Proposition 4.10 for h = 1, l = 0), Lemma A.3 can be used to argue that the deviation cannot be profitable. ∎

A.10 Proof of Corollary 4.9.1
For µ ≤ 1/2:

We show that {µ − 1/(4k), µ + 1/(4k)} is a mean preserving contraction of the uniform distribution on [0, 2µ] when k ≥ 1/(2µ).

Define l(x) as

l(x) = 0 for 0 ≤ x < µ − 1/(4k);
l(x) = (1/2)(x − µ + 1/(4k)) for µ − 1/(4k) ≤ x < µ + 1/(4k);
l(x) = x − µ for µ + 1/(4k) ≤ x ≤ 1.

Define j(x) := ∫₀ˣ G(t) dt:

j(x) = x²/(4µ) for 0 ≤ x < 2µ;
j(x) = x − µ for 2µ ≤ x ≤ 1.

It suffices to show that µ > 1/(4k), that j(x) − l(x) = 0 has at most one real root, and that j(µ + 1/(4k)) > l(µ + 1/(4k)). Set j(x) = l(x), which holds if and only if

x = (4µk ± √(8kµ(1 − 2kµ)))/(4k).

This is imaginary if and only if k > 1/(2µ), and has a unique root for k = 1/(2µ) (at µ). µ − 1/(4k) ≥ µ/2 > 0 for k ≥ 1/(2µ). It remains to verify that j(µ + 1/(4k)) > l(µ + 1/(4k)); but it is simple to verify that this must hold. Thus, if k ≥ 1/(2µ), we have the result.

For µ > 1/2:

The proof is analogous to the preceding one, with the exception that k must be sufficiently large so that µ + 1/(4k) ≤ 1. This holds if and only if k ≥ 1/(4(1 − µ)). This constraint binds for µ ≥ 2/3, and k ≥ 1/(2µ) binds for µ ≤ 2/3. ∎

A.11 Proof of Proposition 5.1
Suppose each sender offers support {l, h}, with l ∈ [0, µ) and h ∈ (µ, 1]. Lemma A.1 and Lemma A.2 continue to describe on path behavior. Lemma A.3 still holds.

First let k > 1/(2(h − l)) and µ ∈ [l + 1/(4k), h − 1/(4k)]. On path behavior is exactly as in the baseline model: visit any one sender, pick {µ − 1/(4k), µ + 1/(4k)} and take a decision without learning from the other sender. Then if a sender deviates to a different distribution, it would be observed. Then R can simply respond by visiting the other, non-deviating sender, picking {µ − 1/(4k), µ + 1/(4k)} for him and taking a decision immediately. Due to Lemma A.3, the deviating sender's payoffs are the same as on path. Thus, there does not exist a profitable deviation and we have an equilibrium.

Next say that either k > 1/(2(h − l)) and µ ∉ [l + 1/(4k), h − 1/(4k)], or k ≤ 1/(2(h − l)). Then from Lemmata A.2 and A.1, on path R chooses a unique binary garbling at stage 1, and exactly one belief in the support is followed by a visit to the second sender. Denote the stage 1 belief following which R does learn at stage 2 by w. Under each possibility we show that there is a profitable deviation for a sender.

Possibility 1: Say w < µ and the stage 2 garbling is {l, h}. There must be a sender, say sender i, who is visited first with probability < 1 on path. Suppose sender i deviates to {l′, h}, where l < l′ < w. But on observing this deviation, R would choose to visit sender i first. By doing this she could get her first best. Thus, behavior is as on path, except that the order of visits is changed: sender i is visited first with probability 1. It is easy to verify that the payoff from being visited first is strictly higher than the payoff from being visited second, which means that this increase in the probability of being visited first is profitable.

Possibility 2: Say w < µ and the stage 2 garbling is {h − √((h − w)/k), h}.
Everything is as in Possibility 1, except that l′ is chosen such that h − √((h − w)/k) < l′ < w.

Possibility 3: If w > µ and is followed by a stage 2 best response {l, l + √((w − l)/k)} or {l, h}. Then if a sender deviates to no information, clearly R would just learn from the other sender with a threshold of acceptance µ. It is verified that the deviating sender's payoffs then are higher than the payoffs on path, conditional on being visited first as well as conditional on being visited second. ∎

A.12 Proof of Lemma 5.2
See the proof of Lemma A.2, setting l = 0, h = 1 and k = 1, and using µ₂ as the mean for the second sender and µ₁ as the mean for the first sender.

A.13 Proof of Lemma 5.3
Let us begin by looking at the parametric conditions given in bullet points 1 and 2 of Lemma 5.2. By symmetry it suffices to assume that one of these two pairs of conditions holds for the scenario in which a given sender is visited second, and show that this implies that one of the four pairs of conditions for the scenario in which that sender is visited first must hold. Observe that the conditions for bullet points 1 and 2 reduce to |µ₁ − µ₂| ≤ 1/(2k) and µ₂ ∈ [1/(4k), 1 − 1/(4k)]. It is easy to see that if µ₁ ∈ [1/(4k), 1 − 1/(4k)] then we are done. What if µ₁ ∉ [1/(4k), 1 − 1/(4k)]? WLOG suppose that µ₁ < 1/(4k). By assumption we must have µ₂ − 1/(2k) ≤ µ₁ and µ₂ ≥ 1/(4k). Hence, the corresponding condition (with µ₁ and µ₂ transposed) must hold.

Next, we turn our attention to the conditions given in bullet points 3 and 4. WLOG it suffices to focus on the conditions in bullet point 3. As we did in the previous paragraph, it suffices to assume that these conditions hold for the scenario in which the sender is visited second, and show that this implies that one of the four pairs of conditions for the scenario in which the sender is visited first must hold. By construction, µ₂ ≤ µ₁ + 1/(2k) and µ₁ ∈ [1/(4k), 1 − 1/(4k)]. Moreover, µ₂ ≥ µ₁ > µ₂ − 1/(2k), and so the corresponding condition (with µ₁ and µ₂ transposed) must hold. ∎

A.14 Proof of Proposition 5.4
It suffices to show that conditional on being the first sender to be visited, the probability of being selected is the same regardless of which stage 1 optimal garbling is chosen by R. The remainder of the proof follows analogously to the proof of Lemma A.3. Alternatively, observe that it follows from the fact that the probability of the first sender being selected, conditional on a first stage belief x, is either 0, 1, or a function that is affine in x. ∎
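The affinity observation is the whole argument: an affine P integrates against any garbling F through its mean alone, so every mean-µ garbling induces the same selection probability. A minimal sketch, using the selection probability P(x) = 2k(x − µ) + 1/2 reconstructed in the proof of Lemma A.3 (an assumption here; only the affinity of P matters):

```python
# An affine selection probability P(x) = 2k(x - mu) + 1/2 gives
# E_F[P] = 2k(E_F[x] - mu) + 1/2 = 1/2 for every garbling F with mean mu.
k, mu = 0.8, 0.4
r = 1 / (4 * k)  # radius of the interior stage 1 garbling (assumed form)

def V(support, probs):
    # Expected selection probability under a discrete garbling.
    assert abs(sum(probs) - 1.0) < 1e-12
    return sum(q * (2 * k * (x - mu) + 0.5) for x, q in zip(support, probs))

# Three different mean-mu garblings, including the degenerate one {mu}.
garblings = [
    ([mu], [1.0]),
    ([mu - r, mu + r], [0.5, 0.5]),
    ([mu - r, mu, mu + r / 2], [0.2, 0.4, 0.4]),  # mean is still mu
]
for support, probs in garblings:
    assert abs(V(support, probs) - 0.5) < 1e-9
print("V = 1/2 for every mean-mu garbling")
```

This is why the deviating sender's payoff in Proposition 5.4 cannot depend on which optimal garbling R picks: all of them share the prior mean µ.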