Competing to Persuade a Rationally Inattentive Agent
Vasudha Jain ∗ Mark Whitmeyer † February 4, 2020
Abstract
Firms strategically disclose product information in order to attract consumers, but recipients often find it costly to process all of it, especially when products have complex features. We study a model of competitive information disclosure by two senders, in which the receiver may garble each sender's experiment, subject to a cost increasing in the informativeness of the garbling. For a large class of parameters, it is an equilibrium for the senders to provide the receiver's first best level of information, i.e., as much as she would learn if she herself controlled information provision. Information on one sender substitutes for information on the other, which nullifies the profitability of a unilateral provision of less information. Thus, we provide a novel channel through which competition with attention costs encourages information disclosure.
Keywords:
Bayesian persuasion; Information design; Multiple senders; Competition; Rational Inattention; Search
JEL Classifications:
D82; D83

∗ Corresponding author. Department of Economics, The University of Texas at Austin. Email: [email protected].
† Department of Economics, The University of Texas at Austin. Thanks to V Bhaskar, Vasiliki Skreta, Max Stinchcombe, and Tom Wiseman for helpful feedback and discussions; and to various seminar audiences for helpful comments.

arXiv [econ.TH], February 2020

Constant attention wears the active mind,
Blots out our powers, and leaves a blank behind.

Charles Churchill, Epistle to William Hogarth

1 Introduction
The standard Bayesian persuasion framework allows senders to design arbitrarily informative signal structures, and assumes that receivers costlessly process all information made available to them. This is an unrealistic assumption in many natural contexts, in which agents may rationally choose to stay partly ignorant. Moreover, there are many situations in which multiple senders compete via information provision to be chosen by the agent. In this competitive scenario, we ask how the consumer's information-processing (or attention) costs shape the information provided by the senders.

Consider, for instance, the situation encountered by doctors and pharmaceutical companies. Patients rely on their doctors to make important medical decisions for them, such as the decision of which medication to take. Often, multiple drugs exist to treat the same condition, but nevertheless differ in subtle ways that can prove crucial for patients. Which alternative is best might depend on the particular circumstances of individual patients; e.g., someone's medical history might make him more prone to the side effects of one of them.

A well intentioned doctor has her task clearly cut out: she should study all primary research published on each drug, and let that information guide her prescription decisions. This means that when she reads about a clinical trial, she should dig into details such as whether, for instance, adverse side effects had led many trial subjects of a certain demographic group to drop out midway, or whether the drug had a differential impact depending on the stage of the illness.

However, getting detailed information involves substantial time and effort, and doctors typically find it difficult to keep up. Tellingly, Alper et al. (2004) find that it would take a doctor six hundred hours to skim all research relevant to general practice that is published in just one month.
Consequently, they are likely to pay attention only to some published summary statistics.

Pharmaceutical companies are prohibited from falsifying facts when marketing to doctors. They do, however, strategically decide how much information to reveal and in what form, and in doing so, take into consideration the lack of attention on the part of the recipients: designing pamphlets in a way that the most favorable pieces of evidence stand out, or other strategies of that ilk. As Goldacre (2014) explains, "They (doctors) need good quality information, but they need it, crucially, under their noses. The problem of the modern world is not information poverty, but information overload... So doctors will not be going through every trial, about every treatment relevant to their field... They will take shortcuts, and these shortcuts can be exploited [emphasis added]."

Motivated by this setting, we study a model of information disclosure à la Kamenica and Gentzkow (2011), with two senders, and a receiver who can save on attention costs by drawing from a less informative experiment than what is chosen by the senders. The question we are interested in is how, and to what extent, the degree of attention costs matters for the relationship between competition and information disclosure.

More specifically, our baseline model has two senders who simultaneously commit to a Blackwell experiment for the quality of their respective products, which are ex-ante identical. An experiment can simply be identified by a distribution of beliefs that averages to the prior. A receiver, who wishes to choose the sender with a higher quality, visits the senders sequentially. When she visits the first one, she observes the distribution of beliefs induced by his (the sender's) experiment, and is free to choose any mean preserving contraction, or garbling, of that.
A draw from that garbling determines her posterior belief about that sender.

Think back to the doctor example, and the shortcuts she might take: she might read just the first few pages of an article, only the nontechnical parts, only the technical sections, or even just the title. All of these correspond to different levels of information, and all of these impose on the receiver different costs: a grueling slog through a complicated model takes more out of the receiver than does a quick skim of the conversational portions.

We capture this relationship by imposing that the less informative the receiver's garbling, the lower her attention costs. (Here q is less informative than p if q is a garbling of p. So, it is not the act of garbling that is costly, but rather the act of drawing more accurate beliefs.) Intuitively, further garbling of an experiment reduces costs because it leads to a belief distribution that is more concentrated around the prior, and hence involves less learning about the state. The receiver has to balance this reduction in attention costs against the worsening quality of information on a decision-relevant variable.

With the first posterior in hand, the receiver decides whether to visit the second sender. Importantly, we do not impose that she must visit the second sender in order to choose him. If she does decide to visit him, the protocol is identical to that for the first sender: she chooses a garbling of his chosen experiment subject to an information cost. Finally, she chooses the sender for whom her posterior belief is higher. Each sender wants to maximize the probability of being chosen.

As we show later on, the receiver's learning strategy has an intuitive feature that drives our analysis: it can happen that she will have "seen enough" at the first sender and need not visit the second sender. Her belief about the first sender may be so high that she chooses him without ever visiting the second, and it may be so low that she chooses the second, sight unseen.
Returning to the example: if a doctor is fairly certain that drug A is of low quality, then she would be willing to prescribe drug B without learning anything about it, and vice versa.

In this setting, a pertinent benchmark is what the receiver would do if she had potential access to full information on each sender, and could effectively use any pair of Bayes plausible distributions to learn, subject only to attention costs as described above. We show that in this case, the 'first best' scenario for her, she would always learn something from at least one sender, but never learn any sender's type with certainty. Furthermore, and this is crucial: fixing any prior, for a high enough attention cost parameter, it is optimal for her to learn from exactly one sender.

The main question we address in this paper is whether in our game with strategic senders, there is an equilibrium in which the receiver ends up with this first best outcome. That is, whether senders voluntarily provide as much information as the receiver would acquire in her first best scenario.

Surprisingly, the answer is yes under general conditions. In particular, for any prior, this holds as long as an attention cost parameter is above a threshold. This is a departure from closely related models in the literature, where either there is only one sender and the receiver faces attention costs, or there are two senders but no attention costs.

Our analysis produces a sharp economic insight into why a combination of these ingredients, competition and attention costs, gives us more information disclosure. Recall our observation that for high enough costs it is optimal for the receiver to learn from exactly one sender. Now suppose that the sender from whom she does plan to learn unilaterally deviates and restricts her learning. Then since the other sender continues to provide full information, the receiver could just switch to learning from him instead.
Her ex-ante payoffs, and the probability with which she selects each sender, would be unchanged, which nullifies the profitability of the deviation.

Three footnotes from the preceding discussion bear noting. If she does not visit a sender, her posterior is equal to the prior. The first best benchmark is equivalent to an environment where the receiver herself controls how much information is provided by senders. And, stated equivalently, fixing attention costs above a threshold, our main result holds over an interior interval of prior means; the interval expands as attention costs grow, and approaches the full range as they explode.
1.1 Related Literature

To the best of our knowledge, this paper is the first to look at competitive information design with information-processing costs faced by the receiver. This relates thematically to several strands of the literature.

Since in our model the receiver's decision to garble a sender's experiment is the result of an optimization problem that accounts for attention costs, she is rationally inattentive as in the economics literature pioneered by Sims (2003). The particular framework of rational inattention that we adopt is the same as in Lipnowski et al. (2019) and Wei (2018). The former paper considers the problem of a principal whose preferences over actions are perfectly aligned with those of an agent. Attention costs are borne only by the agent, and the authors establish conditions under which the principal would want to restrict her information with a view to manipulating her attention.

Wei (2018) belongs to the small but growing literature on persuasion of a rationally inattentive receiver by a single sender. This paper, like ours, considers a binary types, binary action model with a single sender who has state independent preferences, and an exogenous threshold of acceptance for the receiver. It is shown that the sender necessarily finds it in his interest to restrict the receiver's learning; we show how competitive forces change this.

Bloedel and Segal (2018) take a different approach to a problem similar to Wei's. In their framework, after observing the sender's experiment, but before seeing its realization, the receiver can choose a mapping from signal realizations to distributions over 'perceptions', incurring an entropy reduction cost. Then, the receiver observes the realized perception, and not the actual signal realization. As Lipnowski et al. (2019) explain, this is conceptually different from our paper (and theirs), since the receiver in our model pays a cost to reduce uncertainty about the state, and not the sender's message.
Matyskova (2018) studies a persuasion model where the receiver, after observing the realization from the sender's signal, can acquire additional information on the state at a cost proportional to the reduction in entropy.

Our work is also closely related to the papers on competitive information design without any attention costs. With two senders, this has been studied (albeit with slightly different timing than our baseline model) by Boleslavsky and Cotton (2015), who identify the unique equilibrium. Hulko and Whitmeyer (2018) extend this analysis to n > 2 senders, while also incorporating the possibility of search frictions. Crucially, providing full information is not an equilibrium with zero attention costs, and we show that this continues to hold for positive but small attention costs.

Some other papers in the competitive information design literature that bear mentioning are Au and Kawai (2017a,b), Albrecht (2017), and Boleslavsky and Cotton (2018). The result that competition encourages information disclosure is familiar from some of these, but as detailed above, introducing attention costs offers a novel perspective on why that might be true.

Board and Lu (2018) also look at sellers competing through information to entice buyers. However, there, search is random, the number of sellers is uncountable, and the decision by a seller of how much and what sort of information to disclose is made upon the buyer's visit. Thus, the problem is different from the scenario analyzed here (or in Hulko and Whitmeyer (2018)), where the searcher must choose whom to visit as part of a stopping problem. Moreover, in Board and Lu (2018) the values of each seller's goods to the buyer are perfectly correlated, whereas here they are independent. Accordingly, one of the key inputs of their model, how much a seller can observe about the consumer's belief, is absent here.

Another group of papers look at what happens if there is no competition but costs are on the sender's side instead of the receiver's.
Gentzkow and Kamenica (2014) look at optimal persuasion mechanisms when the sender pays higher costs (proportional to entropy reduction) of designing more informative experiments. Likewise, Treust and Tomala (2017) consider n copies of identical persuasion problems, where the sender is constrained to send only k < n messages, which are transmitted with exogenous noise. Interestingly, they find that the sender's payoff from the optimal solution is the concave closure of his payoff function, net of entropy reduction costs. Thus, these costs arise endogenously in their model. (The same is also true of some papers in the cheap talk literature, e.g. Battaglini (2002), which have a very different flavor.)

The rest of the paper is organized as follows. Section 2 presents our baseline model. Section 3 presents results for the benchmark with a single sender. Section 4 presents the equilibrium analysis with two senders and spells out how the level of attention costs matters. Section 5 illustrates the robustness of our results to the various modifications mentioned in the introduction, and Section 6 concludes. The Appendix contains proofs that are not presented in the main text.

2 Model

There are two senders indexed by i ∈ {1, 2}, and a receiver (R). Sender i has type ω_i ∈ Ω_i := {0, 1}, with the types being drawn independently. The common prior belief is that Pr(ω_i = 1) = µ ∈ (0, 1) for i ∈ {1, 2}.

R has to select one of the two senders, and she has no outside option. Her payoff is equal to the type of the selected sender, minus attention costs that we elaborate on below. Sender i's payoff is 1 if he is selected, and 0 if not. All players maximize expected payoffs. The game proceeds in the following three stages.
Stage 0:
Each (ex-ante uninformed) sender simultaneously commits to a Blackwell experiment that generates information about his own type. Such an experiment is a mapping from {0, 1} to the set of Borel probability measures over a compact metric space of signal realizations. Each signal realization, then, is associated with a posterior belief distribution on {0, 1}, and an experiment induces a distribution over posterior beliefs. Hereafter, we identify a posterior belief with the belief on ω_i = 1.

From the work of Kamenica and Gentzkow (2011), we know that the set of Blackwell experiments is isomorphic to the set of distributions of posterior beliefs whose average is the prior. Thus, at this stage 0, sender i commits to a distribution p_i ∈ ∆[0, 1], with ∫_[0,1] x dp_i(x) = µ. (Our results hold with an outside option, as long as its expected quality is not too high.)

Stage 1: R, who at this point does not observe the chosen distributions, decides whether to visit any sender, and if yes, which one. Say she visits sender 1 first. Upon visiting 1 she observes 1's distribution p_1, and is free to choose any q_1 that is a mean preserving contraction (or garbling) of p_1. Associated with any such q_1 is an attention cost given by the following:

C(q_1) = ∫_[0,1] k (x − µ)² dq_1(x),   (1)

where k > 0. Note that costs depend on q_1 and not directly on p_1. We defer a discussion of these costs to Section 2.2. R takes a draw from q_1, which determines her posterior belief about sender 1.

Stage 2: R then decides whether to visit sender 2. If she does, she observes p_2 and chooses a garbling q_2, once again incurring an attention cost C(q_2). She takes a draw from q_2, which determines her posterior belief about this sender. Finally, she chooses the sender for whom her posterior belief is higher. She need not have visited a sender or learned anything from him in order to select him.

Notice that R's optimal garbling at stage 2 potentially depends on the belief she draws at stage 1.
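As a concrete numeric sketch of the quadratic posterior-separable cost in equation (1), consider the cost of the fully informative experiment, which puts weight 1 − µ on posterior 0 and µ on posterior 1: C = k[(1 − µ)µ² + µ(1 − µ)²] = kµ(1 − µ). Any garbling costs weakly less. The parameter values below are purely illustrative and not from the paper:

```python
mu, k = 0.4, 2.0  # illustrative prior and cost parameter (not from the paper)

def cost(q):
    """Attention cost C(q): sum of k*(x - mu)^2 weighted by q, for a finite-support q."""
    return sum(p * k * (x - mu) ** 2 for x, p in q.items())

# Fully informative experiment: posterior 0 w.p. 1 - mu, posterior 1 w.p. mu.
full = {0.0: 1 - mu, 1.0: mu}

# A garbling: pull each posterior halfway toward the prior (the mean is still mu,
# and the support lies inside [0, 1], so this is a mean preserving contraction).
garbled = {mu / 2: 1 - mu, (1 + mu) / 2: mu}

# The costless, uninformative garbling delta_mu.
uninformative = {mu: 1.0}

# Costs satisfy 0 = C(delta_mu) < C(garbled) < C(full) = k*mu*(1 - mu).
print(cost(full), cost(garbled), cost(uninformative))
```

Halving each posterior's distance from the prior quarters the cost, reflecting the strict convexity of k(x − µ)² that drives the Jensen's inequality comparison discussed in Section 2.2.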
She may be more or less inclined to learn about the second sender, depending on how much uncertainty has already been resolved about the first one. Indeed, as we shall see, if the stage 1 belief is close enough to 0 or 1, she chooses not to learn at all at stage 2, and this fact plays a crucial role in our analysis.

The distribution offered by the sender visited first dictates how much can be learned at stage 1. Then in light of the preceding observation, if both senders offer different distributions, the choice of whom to visit first (if anyone) matters for payoffs. (We relax this assumption in Section 5.1.)

A few remarks on terminology. q is a garbling of p if the random variable associated with q second order stochastically dominates, and has the same mean as, the random variable associated with p. It is a strict garbling if additionally q ≠ p. Trivially, q = p or q = δ_µ is always an option. Also, as long as R learns something from at least one sender, the posteriors would never be equal. (See Footnote 14.) They would be equal only if she does not learn from either sender; in that case she may choose between the senders in any way.

A pure strategy for sender i is a choice of a distribution p_i ∈ ∆[0, 1] whose average is µ. A pure strategy for R consists of i) a choice of which sender to visit first, if any; ii) a choice of garbling for any distribution offered by the sender she visits first; iii) a choice of whether to visit the second sender for each belief drawn in the previous stage; iv) a choice of garbling for the second sender, for each distribution offered by him and each posterior belief drawn in the previous stage.

Our solution concept is subgame perfect equilibrium (hereafter, equilibrium), defined in the standard way. We restrict the players to pure strategies, with one exception, viz.
, we allow R to randomize over the order of visits.

Before proceeding to our analysis, we point out the following characterization of the set of garblings of a binary distribution, which we shall use extensively: q (with mean equal to that of p) is a garbling of a distribution p with support {ν₁, ν₂} if and only if supp(q) ⊆ [min{ν₁, ν₂}, max{ν₁, ν₂}].

2.2 Attention Costs

Attention costs, in our framework, are costs incurred to process information on a sender's type. Through his choice of a Blackwell experiment, a sender can control how much information on his type is available; in other words, he can put a cap on what can be learned. But a recipient may choose to ignore some of that information and take a draw from a less informative experiment, thereby reducing attention costs. For instance, a pharmaceutical company can decide how much research on its drug to publish, but a doctor might choose to read a subset of that. Her costs would depend on how much of the research she chooses to read, not on how much was published. In particular, the act of garbling per se is costless.

The cost function in equation (1) captures this notion. Associated with each posterior x is a cost k(x − µ)², so that more accurate beliefs, those that are further away from the prior, cost more. This is integrated to determine the cost of a distribution of posteriors. Thus, costs are posterior separable as in Caplin et al. (2019). (As discussed in the section on related literature, this is closely related to the framework in Lipnowski et al. (2019) and Wei (2018).)

Since k(x − µ)² is strictly convex, by Jensen's inequality we have: q is a garbling of p ⟹ C(q) ≤ C(p), with the inequality strict for strict garblings.
(Of course, this property holds for any strictly convex function instead of k(x − µ)², but we work with this specific form for tractability.) For instance, C(q) is minimized for the uninformative distribution δ_µ, and maximized for the fully informative one with support {0, 1}.

Clearly then, R faces a trade-off in her choice of garblings q_1 and q_2: a garbling costs less, but also corresponds to a less informative experiment and is less valuable for her decision problem (Blackwell 1951, 1953). Returning to our example, the more extensive or detailed the research a doctor chooses to read, the costlier it is to draw an inference from it; but also, the more confidence she can place in that inference.

3 Benchmark: A Single Sender

We begin by taking a brief look at what happens if there is only one sender. R chooses a garbling of that sender's distribution and accepts his product if the belief drawn from it is above a threshold λ ∈ (0, 1). Payoffs clearly depend only on the distribution finally chosen by R. In any (subgame perfect) equilibrium, the sender offers a distribution to maximize his expected payoff, correctly anticipating R's optimal garbling behavior.

Following the arguments in Wei (2018), the setup with a single sender permits two simplifications that are not valid in our two-sender model. One: to obtain the set of equilibrium outcomes, it is without loss of generality to restrict the sender to incentive compatible distributions, those that R would not want to garble. This leads to the second simplification, which is that it is without loss to restrict him to binary and degenerate distributions. (Since R has only two actions, she never wants to pay to generate more than two beliefs.)

We refer to a distribution offered by the sender in a sender-preferred equilibrium as sender-optimal. If λ < µ, it is immediate that any sender-optimal distribution is such that nothing is learned and he is accepted with certainty. For example, he can simply design an uninformative experiment.

Now say λ > µ.
If k = 0, then we have a standard Bayesian persuasion problem, and we know from prior work that the sender-optimal distribution has support {0, λ}. But if k > 0, this is no longer optimal, because the garbling chosen by R in response to that would be δ_µ, and the sender would not be accepted. This is easy to see intuitively: at a belief λ, R is indifferent between accepting and rejecting. When offered {0, λ}, her gross payoff from choosing any garbling is the same as the payoff from rejecting with certainty. But then there
is no reason for her to pay to learn anything. To make it worth her while to do so, the sender would have to allow her to generate beliefs above λ.

The following proposition summarizes the results for this benchmark case.

Proposition 3.1.
Suppose there is a single sender, and R has a threshold of acceptance λ > µ. Then,

1. A sender-preferred equilibrium exists.

2. In a sender-preferred equilibrium, R's garbling on path is strictly less informative (in the Blackwell sense) than her optimal garbling in response to full information.

Proof. See Lemma 1 and Proposition 2 in Wei (2018). □
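The indifference argument above can be written out compactly. Let v(y) denote R's gross payoff at posterior y, under the natural normalization (ours, not the paper's notation) that v is nondecreasing and v(λ) equals the payoff from rejecting outright. Any garbling q of the distribution with support {0, λ} has supp(q) ⊆ [0, λ], so

$$
\mathbb{E}_{q}\!\left[v(y)\right] - C(q) \;\le\; v(\lambda) - C(q) \;\le\; v(\lambda) - C(\delta_\mu) \;=\; v(\lambda),
$$

with the last inequality strict unless q = δ_µ, since C(δ_µ) = 0 < C(q) for every other garbling. The receiver therefore garbles all the way down to δ_µ and rejects; to be accepted with positive probability when k > 0, the sender must let R reach beliefs strictly above λ.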
Importantly, as will be shown ahead in this paper, R has a unique optimal garbling in response to any binary distribution, which implies that this result holds for any equilibrium and we can omit the 'sender-preferred' qualifier.

Corollary 3.1.1.
If there is a single sender, and R has a threshold of acceptance λ ∈ (0, 1), then full information is not offered by the sender in equilibrium.

In response to full information, say R's optimal garbling has support {ν₁, ν₂} where ν₁ < λ < ν₂. Then these results state that in equilibrium, the distribution R ends up with would be a strict garbling of this. Stated differently, in equilibrium the sender does restrict R's learning, not allowing her to choose her first best. The intuition roughly is that although the sender cannot implement his optimal no-garbling solution {0, λ}, he need not go all the way to providing full information. He can profitably restrict learning so that the higher belief in the support of R's garbling is below ν₂, and the probability of its realization is higher.

As we see next, introducing an additional sender yields an interesting comparison to this.

4 Equilibrium Analysis with Two Senders

We now analyze the game described in Section 2, for an arbitrary k > 0 and µ ∈ (0, 1).

To start off, recall our observation that R's order of visits matters when the two distributions on offer are different. In equilibrium she must correctly anticipate the distributions chosen, and the order of visits must be a best response to those. However, since she does not observe the chosen distributions at stage 0, any deviation by a sender goes undetected until and unless he is visited. This has the following implication, which we note for further reference.

Remark.
Any deviation by a sender cannot affect either R's decision to visit a sender, or the order of her visits.

Next, note that if both senders offer the same distribution, then R is indifferent between the two orders of visit (if she visits anyone). The analysis below will make it clear that the tie breaking rule in this case does not matter for our results, and we do not assume anything about it.

We now turn to the question of equilibrium existence. Suppose that each of the two senders offers no information, i.e. the distribution δ_µ. Then upon visiting either sender, R is also restricted to choosing δ_µ. But then she expects to gain nothing by visiting a sender, and not visiting either of them is a best response. She may simply select sender 1 with any probability p ∈ [0, 1], and sender 2 with probability 1 − p. Clearly, if this best response is played, a deviation by a sender goes undetected, and does not make any difference to the outcome. Thus we have the following.

Claim 4.1 (Equilibrium existence). An equilibrium exists for all µ ∈ (0, 1), and for all k > 0. In particular, there is always an equilibrium in which each sender offers an uninformative distribution.

Naturally, we are interested in finding other, more interesting equilibria. Of particular interest are equilibria that give R her first best payoff. First, let us clarify what exactly we mean by this. R's first best payoff is essentially the best she can achieve, across all profiles of sender behavior. In other words, it is the payoff she would get if she herself could choose the senders' distributions. Now, since every Bayes-plausible distribution is a garbling of the fully informative distribution, she has greatest latitude when both senders offer the fully informative distribution. Thus, her first best payoff is attained when both senders offer full information.

However, she may attain the same payoff even when senders choose other less informative distributions.
Suppose, for illustration, that when offered full information, the following is a best response for R: visit Sender 1 first, choose the garbling {µ − ε, µ + ε} for him; then visit Sender 2 and choose the uninformative garbling for him. Then even if, e.g., Sender 1 offers the distribution {µ − ε, µ + 2ε} and Sender 2 offers no information, she gets to choose the exact same response and secure her first best payoff. At the heart of this is the fact that due to attention costs, she might not (and as we shall see, does not) really use full information even when allowed to.

The next observation is easy to make.
Suppose there is an equilibrium in which Sender i offers p_i. Then R achieves her first best payoff in this equilibrium if and only if her best response on path is also a best response when both senders offer full information.

This observation is used to prove the 'only if' direction of the following proposition; the argument draws on some results from later sections in the paper, and is presented in the Appendix. The 'if' direction is obvious.

Claim 4.2 (First best). For given parameters, there is an equilibrium that gives R her first best payoff if and only if there is an equilibrium in which both senders offer full information.

An implication of this is that in establishing conditions for the existence of a full information equilibrium, we establish conditions for an equilibrium where R gets her first best. Hence, we focus our attention on full information, and the next proposition presents our main result.

Proposition 4.3 (Full information equilibrium).
1. For all k > 1/2, there is an equilibrium in which both senders offer full information if and only if µ ∈ [1/(4k), 1 − 1/(4k)].

2. For all k ∈ (0, 1/2], µ ∈ (0, 1), there is no equilibrium in which both senders offer full information.

It is worth highlighting that the parameter in the attention cost function is crucial. If k is above 1/2, we obtain an interval of priors over which full information is an equilibrium, and this interval expands as k grows. In the limit, as k → ∞, the interval converges to (0, 1), the full range of priors. Thus, by having higher attention costs, the receiver might elicit better information from competing senders. The following corollary states the same result differently.

Corollary 4.3.1.
For all µ ∈ (0, 1), there is an equilibrium in which both senders offer full information if and only if k > max{1/(4µ), 1/(4(1 − µ))} (weak inequality if µ ≠ 1/2).

Stated this way, one might conjecture that the result is trivially obtained because for high enough values of k, R finds it optimal to not learn anything at all even when offered full information. As it turns out, this is not the case, and for any finite k she does undertake some learning from at least one sender when offered full information. Instead, we obtain the existence result because for high enough values of k, R finds it optimal to learn only about the quality of one sender, and completely ignore information on the other. The analysis ahead will clarify how this fact plays a crucial role.

Section 4.2 provides a proof of this result (and presents additional results), but before we move on to that, it is instructive to examine another benchmark, where k = 0.

4.1 Benchmark: No Attention Costs (k = 0)

When k = 0, it is costless for R to learn. This makes a substantive difference to the analysis, because she never has a strict incentive to garble either sender's distribution; there is no reason to leave any information on the table. Notice that in this case R's first best payoff is obtained only when both senders offer full information.

For simplicity, here we assume the following tie-breaking rules: i) if the stage 1 draw is 0 (or 1), she rejects (or accepts) that sender without visiting the other one, and ii) if the draws from both stages are the same, she selects the sender visited last.

Proposition 4.4 (No attention costs). Suppose k = 0. Then the following are true.

1. For all µ ∈ (0, 1), there is an equilibrium in which both senders choose an uninformative distribution.

2. For all µ ∈ (0, 1), there is no equilibrium in which both senders offer full information.
The reason an uninformative equilibrium exists is identical to that for k > 0: it is a best response for R to not visit either sender, but then a deviation is not detected and makes no difference to the outcome. The reasoning behind non-existence of a full information equilibrium, on the other hand, is very different for k = 0 and for small, positive k.

For k = 0, in response to full information from both senders, R visits either one of them, learns his type perfectly, and immediately takes a decision. A sender's deviation cannot make a difference if he is not visited first. But if he is, a deviation to support {ε, 1} is profitable, where ε is arbitrarily close to zero. This is because if R's draw from this distribution is ε, she continues to learn from the second sender, and rejects him if the draw then is 0.

When k is any positive quantity, a deviation of this nature does not help. Intuitively, even if the stage 1 draw is a small, positive ε, R is sure enough of the quality of the first sender that she does not find it worth her while to learn about the other one.

The following result establishes existence of other (less than fully) informative equilibria when attention costs are absent. (This setup has been studied in Boleslavsky and Cotton (2018), Hulko and Whitmeyer (2018) and other papers, with the difference that they assume that R observes the chosen distributions at stage 0.)

Claim 4.5.
1. Let k = 0 and µ ≤ 1/2. There is an equilibrium in which each player chooses the uniform distribution on [0, 2µ].
2. Let k = 0 and µ > 1/2. There is an equilibrium in which each sender chooses a CDF with a continuous portion F(x) = x/(2µ) on [0, 2(1 − µ)) and a point mass of size (2µ − 1)/µ on 1. In such an equilibrium, R's decision about whom to visit first must be fair (each sender is visited first with probability 1/2).

Positive attention costs (k > 0)

As previously discussed, for positive attention costs our main result pertains to the full information equilibrium, which is stated in Proposition 4.3 above. We begin by showing why it is true, and for ease of exposition present the key arguments for k = 1. The structure of the proof is the same for a generic k > 0, and the details are relegated to the Appendix. In essence, the argument will be that full information is an equilibrium when R wishes to learn only from one sender.

The case k = 1

Recall that for k = 1, Proposition 4.3 states that full information is an equilibrium if and only if µ ∈ [1/4, 3/4]. Start by considering any µ ∈ (0, 1), and suppose that each sender offers the fully informative distribution with support {0, 1}. To analyze R's (on path) best response, we proceed in two steps–first, we determine her stage 2 best response for each belief drawn at stage 1; second, we use that to solve for the optimal stage 1 behavior. We make use of the technique of concavification for this.

R's stage 2 best response: First let us find the optimal stage 2 garbling, if R visits the sender at that stage. Say the draw from stage 1 is x ∈ [0, 1]. Then, R selects the second sender if and only if the stage 2 draw y turns out to be higher than x. Her payoff from a stage 2 belief y is then max{x, y}, minus the attention cost associated with y. Denote this stage 2 payoff by U(y; x). It does not matter what we assume about the tie breaking rule when y = x: for any distribution offered, x would not belong to the support of the garbling chosen at stage 2.
The reasoning is similar to the argument for why the standard Bayesian persuasion solution is not incentive compatible in the single-sender case (see Section 3). We have

U(y; x) = max{x, y} − (y − µ)², for x, y ∈ [0, 1].

This is piecewise concave in y, and Figure 1 plots it for a representative value of x.

Figure 1: R's stage 2 payoffs

Now, since any distribution is a garbling of the one with support {0, 1}, we know from Kamenica and Gentzkow (2011) that for any given x, R's optimal garbling is determined using the concavification of U(y; x) over [0, 1]. The concavification is the red curve in Figure 2. It is evident that depending on where µ lies, the optimal distribution of beliefs is either degenerate on µ, or is binary.

Lemma 4.6 (Stage 2 optimal garbling). Suppose that R's stage 1 draw is x ∈ [0, 1] and she visits the sender at stage 2. R's stage 2 optimal garbling is either degenerate or binary, and its support is as follows.

1. If µ < 1/2:
{x − 1/4, x + 1/4} if 1/4 ≤ x < µ + 1/4
{0, √x} if µ² < x < 1/4
{µ} if x ≤ µ² or x ≥ µ + 1/4
2. If µ = 1/2:
{x − 1/4, x + 1/4} if 1/4 < x < 3/4
{µ} if x ≤ 1/4 or x ≥ 3/4
3. If µ > 1/2:
{x − 1/4, x + 1/4} if µ − 1/4 < x ≤ 3/4
{1 − √(1 − x), 1} if 3/4 < x < 1 − (1 − µ)²
{µ} if x ≤ µ − 1/4 or x ≥ 1 − (1 − µ)²

The interesting thing to note here is that regardless of the prior, if the first stage draw is either very high or very low, then R chooses not to learn anything from the second sender. This is intuitive–for a high enough belief that the first sender's quality is good, she deems it very unlikely that the second sender is better, and does not invest in learning about him. Instead, she accepts the first sender with certainty. Conversely, if the first stage draw is very low, she accepts the second sender with certainty. Furthermore, the thresholds beyond which there is no learning at stage 2 depend on the prior. The prior is the expected quality of the second sender, so the higher it is, the larger (smaller) the range of first stage beliefs over which the second sender is accepted (rejected) without learning.

If R does choose a binary distribution at stage 2, then she selects the second (first) sender at the higher (lower) belief. For any stage 1 draw, if the stage 2 optimal garbling involves any learning, R strictly gains from visiting the second sender. If it does not involve any learning, R is indifferent between making the second visit and not, and she may resolve this in any manner.

R's stage 1 best response: Using the above result, it is straightforward to obtain R's first stage continuation payoffs for an arbitrary x, and determine her first stage optimal garbling from its concavification over [0, 1]. This leads to the following.

Lemma 4.7 (Stage 1 optimal garbling). Any Bayes plausible distribution with support drawn from the following sets is optimal for R at stage 1.
1. {µ − 1/4} ∪ [1/4, µ + 1/4] if µ ∈ [1/4, 1/2].
2. [µ − 1/4, 3/4] ∪ {µ + 1/4} if µ ∈ [1/2, 3/4].
3. {0, y₁(µ)} if µ < 1/4, where y₁(µ) ∈ (µ, 1/4).
4. {y₂(µ), 1} if µ > 3/4, where y₂(µ) ∈ (3/4, µ).
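The closed forms in Lemma 4.6 can be spot-checked by brute force: since the stage 2 garbling solves a concavification problem, one can scan binary supports {y₁, y₂} straddling µ with Bayes-plausible weights and compare the numerical argmax with the lemma. A rough sketch (our code, k = 1; the grid construction is ours):

```python
# Brute-force check (our sketch, k = 1) of the stage 2 concavification.
# U(y; x) = max(x, y) - (y - mu)^2.  The optimal garbling with mean mu
# is found by scanning binary supports {y1, y2} with y1 <= mu < y2 and
# comparing against the degenerate (no-learning) garbling {mu}.

def U(y, x, mu):
    return max(x, y) - (y - mu) ** 2

def best_stage2_support(x, mu, n=400):
    grid = [i / n for i in range(n + 1)]
    best, support = U(mu, x, mu), (mu, mu)   # start from no learning: {mu}
    for y1 in (g for g in grid if g <= mu):
        for y2 in (g for g in grid if g > mu):
            nu = (y2 - mu) / (y2 - y1)       # weight on y1 so the mean is mu
            val = nu * U(y1, x, mu) + (1 - nu) * U(y2, x, mu)
            if val > best:
                best, support = val, (y1, y2)
    return support
```

For µ = 0.4 and x = 0.5 ∈ [1/4, µ + 1/4) the scan returns (0.25, 0.75), matching {x − 1/4, x + 1/4}; for x = 0.2 ∈ (µ², 1/4) it returns approximately (0, √x), matching the boundary case.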
The exact expressions for y₁(µ) and y₂(µ) are not important. The main thing to note here is that the stage 1 solution always involves some learning, and is unique if and only if µ ∉ [1/4, 3/4]. Interestingly, in spite of the fact that there are only two senders and binary types in this model, R may choose to generate more than two beliefs at stage 1. The reason is that each stage 1 belief is optimally followed by a different degree of learning at stage 2. Note also that since the stage 1 optimal distribution always involves learning, a visit is necessarily made at this stage. R does not care which sender is visited first, and she may randomize her choice in any way.

Since there are multiple best responses for µ ∈ [1/4, 3/4], we need to make a selection among them. Notice that the most informative (in the Blackwell sense) of the optimal distributions has support {µ − 1/4, µ + 1/4}, and by Lemma 4.6, this is the only one among them that is necessarily followed by no learning at stage 2. We assume that R breaks her indifference in favor of this distribution. That is, when indifferent, she'd rather not put off learning until the next stage.

In summary: if µ ∈ [1/4, 3/4], R's best response to full information is the following. Visit sender 1 with probability q ∈ [0, 1], and sender 2 with probability 1 − q. Choose the garbling with support {µ − 1/4, µ + 1/4} for the sender visited. If the belief drawn is µ − 1/4, select the other sender without learning anything from him. If the belief drawn is µ + 1/4, select the visited sender without learning anything from the other one.

We now know what happens on path if full information is offered. For µ ∈ [1/4, 3/4], it turns out that we can rule out profitable deviations without exactly knowing R's best response to any deviation. For µ ∉ [1/4, 3/4], we show that there exists a profitable deviation for a sender.
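The receiver's first best value under full information can also be checked end to end. Solving the two-stage problem numerically (stage 2 by concavification for each x, then stage 1 by concavifying the continuation payoff) should reproduce the value µ + 1/16 attained by the support {µ − 1/4, µ + 1/4}. A brute-force sketch (our code, k = 1, µ = 1/2):

```python
# Two-stage brute-force check (ours) for k = 1, mu = 1/2.
# Stage 2: W(x) = max over garblings with mean mu of E[U(y; x)].
# Stage 1: max over garblings of x (mean mu) of E[W(x) - (x - mu)^2].
# Lemma 4.7 predicts the optimum is attained by support {mu-1/4, mu+1/4},
# with value mu + 1/16.

MU = 0.5

def U(y, x):
    return max(x, y) - (y - MU) ** 2

def W(x, n=200):
    grid = [i / n for i in range(n + 1)]
    best = U(MU, x)  # degenerate stage 2 garbling {mu}
    for y1 in (g for g in grid if g < MU):
        for y2 in (g for g in grid if g > MU):
            nu = (y2 - MU) / (y2 - y1)
            best = max(best, nu * U(y1, x) + (1 - nu) * U(y2, x))
    return best

def U1(x):
    return W(x) - (x - MU) ** 2  # stage 1 continuation payoff

def stage1_value(n=100):
    grid = [i / n for i in range(n + 1)]
    vals = {x: U1(x) for x in grid}
    best = vals[MU]  # degenerate stage 1 garbling
    for x1 in (g for g in grid if g < MU):
        for x2 in (g for g in grid if g > MU):
            nu = (x2 - MU) / (x2 - x1)
            best = max(best, nu * vals[x1] + (1 - nu) * vals[x2])
    return best
```

In line with the multiplicity in Lemma 4.7, the grid optimum is attained (up to grid error) both by the degenerate stage 1 garbling and by the pair {1/4, 3/4}, each giving 0.5625 = µ + 1/16.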
No profitable deviation for µ ∈ [1/4, 3/4]: Consider what a sender achieves by deviating. We have already seen that this does not affect the probability of being the one to be visited first. Moreover, if he is not the one to be visited first, his payoffs are not affected, since R does not plan to learn anything from him. So, we only need to consider what happens if he deviates and is visited first. In this case, R's behavior would be altered if {µ − 1/4, µ + 1/4} is not a garbling of the distribution he deviates to.

Now, say R visits a sender and finds out that she may no longer choose support {µ − 1/4, µ + 1/4}. Regardless of what the sender's deviation is, though, she is permitted to learn nothing, i.e. choose support {µ}. By Lemma 4.7, this is one of her best responses, and by Lemma 4.6, this would be optimally followed by visiting the other sender (who has not deviated) and choosing support {µ − 1/4, µ + 1/4} for him. By responding to the deviation in this manner, R ensures a payoff equal to what is attained in the absence of the deviation. Naturally, any other response specified in Lemma 4.7, if permissible under the deviation, would also give her the same payoff, and she may choose that instead of support {µ}.

What this essentially implies is that in response to any deviation by the sender visited first, R would choose from the set specified in Lemma 4.7, and depending on the belief she draws, follow it with stage 2 behavior specified in Lemma 4.6. This observation, and the next Lemma, are key to our analysis.

Lemma 4.8.
For all µ ∈ [1/4, 3/4], conditional on being visited first, a sender's expected payoff is the same for any of the receiver responses specified in Lemma 4.7.

This immediately implies that a unilateral deviation does not affect a sender's payoffs, and it is proven that full information is an equilibrium for µ ∈ [1/4, 3/4]. (Either by not visiting him at all, or by visiting but not learning.)

The intuition is as follows. Under R's first best, i.e. when both senders allow her perfect information, attention costs lead her to learn from only one sender. It does not matter which sender that is–if it is a best response to learn only from the sender visited first, then clearly it is also a best response to learn only from the one visited second. Now, we say that on path she chooses to learn from the first sender she visits. If, however, that sender deviates and restricts her learning, she is able to compensate for it by learning more from the other sender. The ex-ante probability that she makes the correct choice thereby remains unaffected, and the deviating sender is unable to gain.

Existence of a profitable deviation for µ ∉ [1/4, 3/4]: When µ ∉ [1/4, 3/4], similar reasoning does not apply, since R's best response is unique and involves learning from both senders on path. In this case, there exists a deviation where a sender profitably restricts R's learning in case he is visited second, without affecting what happens if he is visited first. In particular, say for instance µ < 1/4. Recall that in response to full information, R chooses support {0, y₁(µ)} at stage 1. Following belief 0 she immediately accepts the second sender, and following belief y₁(µ), she chooses support {0, √(y₁(µ))} at stage 2. It can be shown that there exists p ∈ (y₁(µ), √(y₁(µ))) such that if a sender deviates to {0, p} and is visited second (by R holding a belief y₁(µ) from stage 1), R chooses {0, p} instead of {0, √(y₁(µ))}.
If he is instead visited first, R's best response is unchanged. Evidently this deviation increases the probability of being selected, and is therefore profitable.

General k > 0

The analysis so far tells us that for any k, first, an uninformative equilibrium always exists; and second, a full information equilibrium exists for a large class of parameter values. Our focus on full information is not misplaced, in spite of the fact that R never makes use of all of it even when it is on offer. The reason we focus on it–as we have discussed at length–is Claim 4.2: full information is an equilibrium if and only if R can get her first best outcome in an equilibrium. Our interest in equilibria where R gets her first best is natural–first, the existence of such equilibria is surprising; second, in many situations it is appropriate to select receiver-preferred equilibria. We showed this for k = 1, but Appendix A.1 shows that it is true generally: in response to full information, she visits only one sender and picks support {µ − 1/(4k), µ + 1/(4k)}.

Two questions remain. First, what other equilibria give R her first best payoff? And second, what can we say about equilibria that do not give her the first best? The answer to the first question is easy. There is a whole class of such equilibria, where the distributions offered by the senders allow her to respond exactly as under full information.

Claim 4.9.
Suppose k > 0 and µ ∈ [1/(4k), 1 − 1/(4k)]. For i ∈ {1, 2}, let pᵢ ∈ Δ[0, 1] be any distribution with expectation µ, and of which the distribution with support {µ − 1/(4k), µ + 1/(4k)} is a garbling. Then, there is an equilibrium in which sender i offers the distribution pᵢ. Such an equilibrium is outcome equivalent to full information.

This can be used to construct specific examples such as the following.
Corollary 4.9.1.
1. Let µ ≤ 1/2. Then there is an equilibrium in which both senders offer the uniform distribution on [0, 2µ] if k ≥ 1/(2µ).
2. Let µ > 1/2. Then there is an equilibrium in which both senders offer a CDF with a continuous portion F(x) = x/(2µ) on [0, 2(1 − µ)) and a point mass of size (2µ − 1)/µ on 1 if k ≥ 1/(2µ) for µ ≤ 2/3, and if k ≥ 1/(4(1 − µ)) for µ ≥ 2/3.

We choose to present these examples for a reason: Recall from Claim 4.5 that these distributions are also equilibria in the k = 0 scenario, where full information is not an equilibrium. In contrast, here these equilibria are in fact outcome equivalent to full information. The difference arises since with attention costs, both full information and these distributions are garbled down to the same thing by R.

Turning to the question on equilibria that do not give R her first best, we have the following sharp result that rules out their existence in the class of binary, symmetric equilibria. Essentially, it implies that if a symmetric binary equilibrium exists (for any parameters), so must the full information equilibrium, and in fact it must be outcome equivalent to the full information one–so that R must be getting her first best.

Proposition 4.10.
Let the distribution p have support {l, h} with l ∈ [0, µ) and h ∈ (µ, 1].
1. If k > 1/(2(h − l)), there is an equilibrium where both senders offer p if and only if µ ∈ [l + 1/(4k), h − 1/(4k)].
2. If k ≤ 1/(2(h − l)), there is no equilibrium where both senders offer p.

That is, equilibria where both senders offer the same distribution, that has binary support. Beyond this, characterizing all equilibria of the game is beyond the scope of this paper, a major reason being that little practical can be said about the set of garblings of a non-binary distribution.
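The garbling requirement behind these constructions is easy to test numerically: a binary distribution {µ − a, µ + a} is a garbling (a mean-preserving contraction) of a candidate equilibrium distribution exactly when the integrated CDF of the candidate weakly dominates that of the binary everywhere. A rough check (our code) for the uniform example from Corollary 4.9.1:

```python
# Mean-preserving-contraction test (ours): the binary distribution
# {mu - a, mu + a}, a = 1/(4k), is a garbling of Uniform[0, 2mu] iff the
# integrated CDF of the uniform weakly dominates that of the binary.

def is_garbling_of_uniform(mu: float, k: float, n: int = 2000) -> bool:
    a = 1 / (4 * k)
    if mu - a < 0 or mu + a > 2 * mu:  # support must fit inside [0, 2mu]
        return False
    for i in range(n + 1):
        t = 2 * mu * i / n
        int_unif = t * t / (4 * mu)  # integral of the uniform CDF t/(2mu)
        int_bin = 0.5 * max(0.0, t - (mu - a)) + 0.5 * max(0.0, t - (mu + a))
        if int_unif < int_bin - 1e-12:
            return False
    return True
```

Consistent with the corollary's condition, for µ = 0.4 the test passes for k slightly above 1/(2µ) = 1.25 and fails below it.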
Here, we illustrate that our main results continue to hold under a variety of different modelling choices. We begin with the case in which the senders' experiment choices are public.
So far we have considered a scenario where R does not observe the experiment chosen by a sender until she visits him. This is a natural assumption in many situations–e.g., pharmaceutical companies have much leeway in dissemination of information to doctors, and it may not be known how detailed the research published on a drug is until one takes a look through it. As another example, one does not know how many customer reviews a seller has allowed to be posted on his website until one visits the website.

However, there can be other applications where one might expect the senders' experiments to be observed when they are chosen, i.e. at Stage 0 of our game. In this case, the strategic considerations remain the same except for an important difference–posted experiments allow a sender's deviation to be observed by R, and hence deviations affect her order of visits. For example, if both senders are expected to choose the same experiment, then R is indifferent between the order of visits; by deviating, a sender can break this indifference. We show that our main results continue to hold in this scenario.

Proposition 5.1.
Suppose that the experiments chosen by the senders are observed by R at Stage 0 of the game. Then Proposition 4.10 still holds. (For h = 1, l = 0 this proposition is identical to Proposition 4.3.)

Note that Proposition 4.10 encapsulates our main results: it establishes conditions for existence of a full information equilibrium, and proves non-existence of any binary, symmetric equilibrium that is not outcome equivalent to full information. A full proof of Proposition 5.1 follows in the Appendix, but the intuition can be seen by recalling the arguments that lead up to existence of a full information equilibrium for some parameters in the baseline model: since R's best response to full information in the baseline model is to learn from exactly one sender, a deviation cannot affect payoffs even if it is observed at Stage 0. Put succinctly, if R knows that a sender has deviated, she can simply visit the other sender and learn from him.

Our baseline model assumes that the distributions of the senders' types have identical means. There, the receiver's problem is the most interesting, since ex-ante she has very little information to base her choice on. In this section, we relax this assumption. Our aim is to show that our main result applies to settings where the prior beliefs on the two senders are different but sufficiently close. Namely, if the two means µ₁ and µ₂ lie in a particular interval, then there is an equilibrium in which both senders offer full information.

For expositional convenience let k = 1. We proceed in the same vein as in the proof for Lemma 4.7: using Lemma 4.6 we can obtain the receiver's first stage continuation payoffs for an arbitrary first stage draw x, which we then concavify to obtain the receiver's first stage optimal garbling. Suppose that sender 2 is visited second (we'll revisit this assumption shortly).

Lemma 5.2 (Stage 1 optimal garbling). Any Bayes plausible distribution with support drawn from the following sets is optimal for R at stage 1.
1. [µ₂ − 1/4, 3/4] ∪ {µ₂ + 1/4} if µ₂ ∈ [1/2, 3/4] and µ₁ ∈ [µ₂ − 1/4, µ₂ + 1/4].
2. [µ₂ − 1/4, 3/4] if µ₂ ∈ [1/2, 3/4] and µ₁ ∈ [µ₂ − 1/4, 3/4].
3. [1/4, µ₂ + 1/4] ∪ {µ₂ − 1/4} if µ₂ ∈ [1/4, 1/2] and µ₁ ∈ [µ₂ − 1/4, µ₂ + 1/4].
4. [1/4, µ₂ + 1/4] if µ₂ ∈ [1/4, 1/2] and µ₁ ∈ [1/4, µ₂ + 1/4].

Note that we have not yet determined which sender should be visited first. Our first step is to show that if µ₁ and µ₂ satisfy one of the conditions for Lemma 5.2 when sender 2 is visited second, then they satisfy one of the conditions for Lemma 5.2 when sender 2 is visited first. Formally,

Lemma 5.3.
One of the four parametric restrictions in Lemma 5.2 holds if and only if one of the four parametric restrictions in Lemma 5.2 holds in which µ₁ and µ₂ are replaced with each other.

Our second step is to show that the receiver's expected payoff under any optimal protocol in which sender 1 is visited first is the same as her expected payoff under any optimal protocol in which we assume sender 2 is visited first. Hence, it does not matter which sender she visits first, and so she can break ties in that manner in any way that she chooses. This step requires just a couple sentences to prove: in each of the four cases described in Lemma 5.2, there is a stage 1 optimal distribution in which the receiver learns nothing at the first sender. Her expected payoff under the optimal search protocol is thus

(µ₁ + µ₂)/2 + µ₁² + µ₂² − 2µ₁µ₂ + 1/16,

which is invariant to an exchange of µ₁ and µ₂. Finally, we arrive at the heterogeneous means analog to Proposition 4.3:

Proposition 5.4.
There is a full information equilibrium if |µ₁ − µ₂| ≤ 1/4 and
1. µ₁, µ₂ ∈ [1/4, 3/4]; or
2. µᵢ ≤ 1/4 ≤ µⱼ for i, j ∈ {1, 2} and i ≠ j; or
3. µᵢ ≤ 3/4 ≤ µⱼ for i, j ∈ {1, 2} and i ≠ j.

Here, we modify the model by allowing the receiver's information processing cost at a sender to itself depend on the distribution chosen by the sender. That is, we amend the attention cost so that it is now given by (at sender i)

C(qᵢ, pᵢ) = k(pᵢ) ∫_[0,1] (x − µ)² dqᵢ(x),   (2)

where pᵢ is the choice of distribution of posterior beliefs by sender i and qᵢ is the garbling of that distribution chosen by the receiver. We assume that k is weakly decreasing in the Blackwell order: the more informative the sender's experiment, the less costly a given information structure is for the receiver. This corresponds to the following intuition: the less informative the seller is, the costlier it is for the buyer to maintain a particular information structure, since she is forced to pay closer attention. We thank the associate editor for suggesting such a cost of information.
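The payoff expression in the proof of Lemma 5.3 above can be sanity-checked directly: under the protocol in which the receiver learns nothing at the first sender and then garbles the second sender to the binary support {µ_first − 1/4, µ_first + 1/4}, her expected payoff with k = 1 equals (µ₁ + µ₂)/2 + (µ₁ − µ₂)² + 1/16 (the factored form of the expression above), symmetric in the two means. A small check (our code; it assumes the interior case where that support is feasible):

```python
# Check (ours, k = 1) that the "learn nothing at the first sender" protocol
# yields (mu1 + mu2)/2 + (mu1 - mu2)^2 + 1/16, symmetric in mu1 and mu2.
# Assumes the interior case: the stage 2 garbling {mu_first - 1/4,
# mu_first + 1/4} is feasible.

def protocol_payoff(mu_first: float, mu_second: float) -> float:
    y1, y2 = mu_first - 0.25, mu_first + 0.25  # stage 2 binary support
    nu = (y2 - mu_second) / (y2 - y1)          # weight on y1 (mean mu_second)

    def payoff(y):  # pick the better sender, pay the quadratic attention cost
        return max(mu_first, y) - (y - mu_second) ** 2

    return nu * payoff(y1) + (1 - nu) * payoff(y2)

def closed_form(mu1: float, mu2: float) -> float:
    return (mu1 + mu2) / 2 + (mu1 - mu2) ** 2 + 1 / 16
```

Swapping which mean is visited first leaves the payoff unchanged, which is the tie-breaking point used in the text.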
We define k_F := k(p_B) to be the minimum cost parameter, where p_B is the Bernoulli distribution begotten by full information provision by the sender. Naturally, we stipulate that k_F is non-negative. With this modified cost function, our main result continues to hold. Namely, provided the attention cost is sufficiently high, there is an equilibrium in which both senders offer full information:

Proposition 5.5 (Full information equilibrium). For all k_F > 0, if µ ∈ [1/(4k_F), 1 − 1/(4k_F)] then there is an equilibrium in which both sellers offer full information.

Proof. On path, where each sender provides full information, the analysis is unchanged from earlier sections (with k_F in lieu of k), and the searcher's optimal protocol is unaltered. Moreover, should a sender deviate, then again the searcher can behave optimally by learning nothing at the deviating sender, eliminating the possibility for a sender to deviate profitably. ∎

We study a model of information disclosure by two senders who compete to persuade a receiver. The receiver, instead of passively accepting the experiment adopted by a sender, may choose to garble it before drawing a belief. The lower the informativeness of the chosen garbling, the lower her attention costs are. We show how for a large class of parameters, it is an equilibrium for the senders to offer at least as much information to the receiver as she would choose for herself, if she could control information provision. In particular, full disclosure by both senders is an equilibrium. Moreover, there is no binary symmetric equilibrium (for any value of parameters) that does not give the receiver this first best outcome. We prove robustness to various modeling assumptions.

This is the result of an interesting trade-off that generalizes beyond the specifics of our model. Due to attention costs, the receiver never finds it worthwhile to learn either sender's type perfectly.
That is, even with access to full information, she leaves some scope for further learning about each. Moreover, since her task is to choose between the senders, information on the quality of one sender partially substitutes for information on the quality of the other. For example, learning a lot about the quality of one drug on the market can be just as good (for the accuracy of a doctor's decision) as learning a bit about both alternatives.

Consequently, starting from a situation of full disclosure by both senders, if either sender deviates and restricts the receiver's learning, she has an opportunity to make up for it by using some of the 'surplus' information–so far unused–about the other sender. The deviating sender thus has limited ability–if any–to affect the overall quality of the receiver's information across the two alternatives. This channel clearly breaks down in the absence of attention costs (so that the receiver always uses all available information), or if there is only one sender (so that there is no notion of substitutability). Our model identifies novel strategic incentives for greater information disclosure.

We motivated our study with the example of pharmaceutical companies strategically disclosing information to prescribing physicians. The assumption of high attention costs, as well as a low outside option for the receiver, are reasonable in this context. The model is well suited to study strategic disclosure in numerous other settings where information is 'complex', e.g. the disclosure of features of retirement savings plans to consumers, or the informational content of political campaigns.
References

Albrecht, B.C. Mimeo.

Alper, Brian S, Jason A Hand, Susan G Elliott, Scott Kinkade, Michael J Hauan, Daniel K Onion, and Bernard M Sklar. 2004. "How much effort is needed to keep up with the literature relevant for primary care?" Journal of the Medical Library Association, 92(4): 429.

Au, Pak Hung and Keiichi Kawai. 2017a. Working paper.

Au, Pak Hung and Keiichi Kawai. 2017b. "Competitive Information Disclosure by Multiple Senders." Working paper.

Battaglini, Marco. 2002. "Multiple Referrals and Multidimensional Cheap Talk." Econometrica, 70(4): 1379–1401.

Blackwell, David. 1951. "Comparison of Experiments." Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, 93–102. Berkeley, Calif.: University of California Press.

Blackwell, David. 1953. "Equivalent comparisons of experiments." The Annals of Mathematical Statistics, 24(2): 265–272.

Bloedel, Alexander W and Ilya R Segal. Working paper.

Board, Simon and Jay Lu. 2018. "Competitive Information Disclosure in Search Markets." Journal of Political Economy, 126(5): 1965–2010.

Boleslavsky, Raphael and Christopher Cotton. 2015. American Economic Journal: Microeconomics, 7(2): 248–79.

Boleslavsky, Raphael and Christopher Cotton. 2018. "Limited capacity in project selection: Competition through evidence production." Economic Theory, 65(2): 385–421.

Caplin, Andrew, Mark Dean, and John Leahy. Working paper.

Gentzkow, Matthew and Emir Kamenica. 2014. "Costly Persuasion." American Economic Review, 104(5): 457–62.

Goldacre, Ben. 2012. Bad Pharma: How Drug Companies Mislead Doctors and Harm Patients. Macmillan.

Hulko, Artem and Mark Whitmeyer. 2018. arXiv preprint arXiv:1802.09396.

Kamenica, Emir and Matthew Gentzkow. 2011. "Bayesian Persuasion." American Economic Review, 101(6): 2590–2615.

Lipnowski, Elliot, Laurent Mathevet, and Dong Wei. "Attention Management." Forthcoming in American Economic Review: Insights.

Matyskova, Ludmila. Working paper.

Sims, Christopher A. 2003. "Implications of Rational Inattention." Journal of Monetary Economics, 50(3): 665–690.

Treust, Maël Le and Tristan Tomala. Working paper.

Wei, Dong. Working paper.

Whitmeyer, Joseph and Mark Whitmeyer. 2019. arXiv preprint arXiv:1905.05157.

A Proofs
Consider any k > 0 and µ ∈ (0, 1). Let each sender offer support {l, h}, with l ∈ [0, µ) and h ∈ (µ, 1]. We begin by proving a series of Lemmata.

Lemma A.1.
Suppose that R's stage 1 draw is x ∈ [l, h] and she visits the sender at stage 2. R's stage 2 optimal garbling is either degenerate or binary, and its support is as follows.

1. If k > 1/(2(h − l)) and µ ≤ min{h − 1/(2k), l + 1/(2k)}:
{µ} if x ∈ [l, l + k(µ − l)²]
{l, l + √((x − l)/k)} if x ∈ (l + k(µ − l)², l + 1/(4k))
{x − 1/(4k), x + 1/(4k)} if x ∈ [l + 1/(4k), µ + 1/(4k))
{µ} if x ∈ [µ + 1/(4k), h]
2. If k > 1/(2(h − l)) and µ ≥ max{h − 1/(2k), l + 1/(2k)}:
{µ} if x ∈ [l, µ − 1/(4k)]
{x − 1/(4k), x + 1/(4k)} if x ∈ (µ − 1/(4k), h − 1/(4k)]
{h − √((h − x)/k), h} if x ∈ (h − 1/(4k), h − k(h − µ)²)
{µ} if x ∈ [h − k(h − µ)², h]
3. If l + 1/(2k) ≤ µ ≤ h − 1/(2k):
{x − 1/(4k), x + 1/(4k)} if x ∈ (µ − 1/(4k), µ + 1/(4k))
{µ} if x ∈ [l, µ − 1/(4k)] ∪ [µ + 1/(4k), h]

4. If k > 1/(2(h − l)) and h − 1/(2k) < µ < l + 1/(2k):
{µ} if x ∈ [l, l + k(µ − l)²]
{l, l + √((x − l)/k)} if x ∈ (l + k(µ − l)², l + 1/(4k))
{x − 1/(4k), x + 1/(4k)} if x ∈ [l + 1/(4k), h − 1/(4k)]
{h − √((h − x)/k), h} if x ∈ (h − 1/(4k), h − k(h − µ)²)
{µ} if x ∈ [h − k(h − µ)², h]
5. If k ≤ 1/(2(h − l)):
{µ} if x ∈ [l, l + k(µ − l)²]
{l, l + √((x − l)/k)} if x ∈ (l + k(µ − l)², l + k(h − l)²)
{l, h} if x ∈ [l + k(h − l)², h − k(h − l)²]
{h − √((h − x)/k), h} if x ∈ (h − k(h − l)², h − k(h − µ)²)
{µ} if x ∈ [h − k(h − µ)², h]

Proof. R's stage 2 payoff from a stage 2 belief y is given by U(y; x) = max{x, y} − k(y − µ)². This is piecewise concave. We first obtain the concavification of U(y; x) over [l, h] and then use it to find the optimal garbling. The concavification of U(y; x) is obtained by joining two points y₁, y₂ (in a straight line) with l ≤ y₁ < x < y₂ ≤ h. By the definition of concavification of a function, we must have

U′(y₁; x) ≤ (U(y₂; x) − U(y₁; x))/(y₂ − y₁) ≤ U′(y₂; x),   (3)

with the first inequality holding with equality if y₁ > l and the second one holding with equality if y₂ < h. (The best way to see this is to assume it does not hold and see that the definition of concavification is violated.)

If l + 1/(4k) < x < h − 1/(4k), the concavification is given by y₁ = x − 1/(4k), y₂ = x + 1/(4k). If x ≤ min{l + 1/(4k), h − 1/(4k)}, the lower bound l binds and the concavification has y₁ = l; y₂ = l + √((x − l)/k) is obtained from the second equality in Inequation 3. If x ≥ max{h − 1/(4k), l + 1/(4k)}, the upper bound h binds and the concavification has y₂ = h; y₁ = h − √((h − x)/k) is obtained from the first equality in Inequation 3. If h − 1/(4k) < x < l + 1/(4k), the concavification is:
1. y₁ = l, y₂ = l + √((x − l)/k) if l + √((x − l)/k) ≤ h.
2. y₂ = h, y₁ = h − √((h − x)/k) if h − √((h − x)/k) ≥ l.
3. y₁ = l, y₂ = h otherwise.

Having obtained the concavification for any x, the optimal stage 2 garbling has support {y₁, y₂} if µ ∈ (y₁, y₂), and support {µ} otherwise. Straightforward algebra then gives us the stated result. ∎

Lemma A.2.
1. If k > 1/(2(h − l)) and µ ≤ min{h − 1/(2k), l + 1/(2k)}, R's optimal stage 1 garbling is
(a) Any Bayes plausible distribution with support drawn from the set {µ − 1/(4k)} ∪ [l + 1/(4k), µ + 1/(4k)] if µ ≥ l + 1/(4k).
(b) The distribution with support {l, y₁(µ)} with y₁(µ) ∈ (µ, l + 1/(4k)) if µ < l + 1/(4k).
2. If k > 1/(2(h − l)) and µ ≥ max{h − 1/(2k), l + 1/(2k)}, R's optimal stage 1 garbling is:
(a) Any Bayes plausible distribution with support drawn from the set [µ − 1/(4k), h − 1/(4k)] ∪ {µ + 1/(4k)} if µ ≤ h − 1/(4k).
(b) The distribution with support {y₂(µ), h} with y₂(µ) ∈ (h − 1/(4k), µ) if h − 1/(4k) < µ.
3. If l + 1/(2k) ≤ µ ≤ h − 1/(2k), R's optimal stage 1 garbling is any Bayes plausible distribution with support on [µ − 1/(4k), µ + 1/(4k)].
4. If k > 1/(2(h − l)) and h − 1/(2k) < µ < l + 1/(2k), R's stage 1 optimal garbling is:
(a) Any Bayes plausible distribution with support drawn from {µ − 1/(4k)} ∪ [l + 1/(4k), h − 1/(4k)] ∪ {µ + 1/(4k)} if l + 1/(4k) ≤ µ ≤ h − 1/(4k).
(b) The distribution with support {l, y₁(µ)} with y₁(µ) ∈ (µ, l + 1/(4k)) if µ < l + 1/(4k).
(c) The distribution with support {y₂(µ), h} with y₂(µ) ∈ (h − 1/(4k), µ) if h − 1/(4k) < µ.
5. If k ≤ 1/(2(h − l)), then
(a) If µ ≤ (l + h)/2, R's optimal stage 1 garbling is {l, y₁(µ)}, where y₁(µ) > µ is either on (l + k(µ − l)², l + k(h − l)²) or on [l + k(h − l)², h − k(h − l)²].
(b) If µ > (l + h)/2, R's optimal stage 1 garbling is {y₂(µ), h}, where y₂(µ) < µ is either on [l + k(h − l)², h − k(h − l)²] or on (h − k(h − l)², h − k(h − µ)²).

Proof. Let U₁(x) be R's first stage continuation payoff for a first stage belief x. Say the stage 2 distribution following x has support {y₁, y₂}, with y₁ ≤ y₂ and νy₁ + (1 − ν)y₂ = µ. Then U₁(x) = νU(y₁; x) + (1 − ν)U(y₂; x) − k(x − µ)².
The concavification of U₁ over [l, h] is used to obtain the stage 1 optimal distribution. For any µ, U₁ is continuous. Note that U₁ is affine over any interval of x for which the stage 2 optimal garbling is {x − 1/(4k), x + 1/(4k)}.

Remark.
If the stage 1 optimal garbling is unique, then it cannot have support {µ}. The reason for this is the following. If the stage 1 unique optimal garbling is degenerate, then it is verified from Lemma A.1 that the stage 2 optimal garbling has binary support, say {y₁, y₂}. But then, choosing the garbling {y₁, y₂} at stage 1 and {µ} at stage 2 must give the same expected payoff, and hence must be optimal. This is a contradiction.

Now, first let k > 1/(2(h − l)) and µ ≤ min{h − 1/(2k), l + 1/(2k)}. Then U₁ is strictly convex in a right neighborhood of l + k(µ − l)² and concave everywhere else (weakly on (l + 1/(4k), µ + 1/(4k))). Then, the concavification must join points z₁ ≤ l + k(µ − l)² and z₂ > l + k(µ − l)² (in a straight line), with z₁, z₂ determined by a condition analogous to Inequation 3.

Say µ ≥ l + 1/(4k). Then it is verified that z₁ = µ − 1/(4k) and z₂ = l + 1/(4k). Since µ ∈ [l + 1/(4k), µ + 1/(4k)] and U₁ is affine over this interval, a distribution with support on {µ − 1/(4k)} ∪ [l + 1/(4k), µ + 1/(4k)] would be optimal. Now say µ < l + 1/(4k). Clearly the lower bound l would bind and z₁ = l must hold. z₂ is obtained from the second equality in Inequation 3, and it must be higher than µ, since otherwise the optimal garbling would uniquely be degenerate, and we ruled that out above. z₂ is denoted by y₁(µ) in the statement of the Lemma.

Now let k > 1/(2(h − l)) and µ ≥ max{h − 1/(2k), l + 1/(2k)}. The argument is symmetric to the preceding one. In this case U₁ is strictly convex in a left neighborhood of h − k(h − µ)² and concave everywhere else (weakly on (µ − 1/(4k), h − 1/(4k))). The concavification is obtained by joining points z₁ and z₂ as before. It is verified that for µ ≤ h − 1/(4k), z₁ = h − 1/(4k) and z₂ = µ + 1/(4k). This tells us that a distribution with support on [µ − 1/(4k), h − 1/(4k)] ∪ {µ + 1/(4k)} would be optimal. For µ > h − 1/(4k), z₂ = h must hold.
Now z₁ is found from the first equality in Inequation 3, and it must be lower than µ, since otherwise the stage 1 optimal garbling would uniquely be degenerate. z₁ is denoted by y(µ) in the statement of the Lemma.

The remaining cases are dealt with completely analogously.

Finally, let k ≤ 1/(2(h − l)). Then U is strictly convex in a right neighborhood of l + k(µ − l)², and in a left neighborhood of h − k(h − µ)², and strictly concave everywhere else. Clearly, the concavification must:

1. join points z₁ ∈ [l, l + k(µ − l)²) and z₂ > l + k(µ − l)² in a straight line, and
2. join points z₃ < h − k(h − µ)² and z₄ ∈ (h − k(h − µ)², h] in a straight line.

As usual, these points are determined by a condition analogous to Inequation 3. It turns out that z₁ = l and z₄ = h, while the positions of z₂ and z₃ depend on parameters. The optimal garbling is either {l, z₂} or {z₃, h}, depending on where µ lies. ∎

The previous result immediately gives us the following useful corollary.

Corollary A.2.1.
The following two statements are equivalent:

1. µ ∈ [l + 1/(4k), h − 1/(4k)] and k > 1/(2(h − l)).
2. There are multiple stage 1 optimal garblings for R, including one with support {µ} and one with support {µ − 1/(4k), µ + 1/(4k)}.

Lemma A.3.
Suppose k > 1/(2(h − l)) and µ ∈ [l + 1/(4k), h − 1/(4k)]. If R's behavior is as specified in Lemmata A.1 and A.2, then conditional on being the first sender to be visited, the probability of being selected is the same regardless of which stage 1 optimal garbling is chosen by R.

Proof. We show the proof for µ ≤ min{h − 1/(4k), l + 1/(2k)}. It is entirely analogous for the other cases from Lemma A.1.

Suppose l + 1/(4k) ≤ µ ≤ min{h − 1/(4k), l + 1/(2k)} and R's first stage response is a distribution F on {µ − 1/(4k)} ∪ [l + 1/(4k), µ + 1/(4k)]. Using Lemma A.1 it is easy to see that the probability of the first sender being selected conditional on a first stage belief x is given by

P(x) = 0 if x = µ − 1/(4k), and P(x) = 2k(x − µ) + 1/2 if x ∈ [l + 1/(4k), µ + 1/(4k)].

Suppose that F places a mass p ≥ 0 on µ − 1/(4k). Then conditional on being visited first, a sender's expected probability of being selected is given by

V = p · 0 + ∫_{l+1/(4k)}^{µ+1/(4k)} P(x) dF(x).   (4)

Next note that

p(µ − 1/(4k)) + ∫_{l+1/(4k)}^{µ+1/(4k)} x dF(x) = µ   (5)

and

∫_{l+1/(4k)}^{µ+1/(4k)} dF(x) = 1 − p.   (6)

Inserting Equations 5 and 6 into Equation 4, we get that V = 1/2, which is independent of F. ∎

A.1 Proof of Proposition 4.10

Suppose each sender offers support {l, h}, with l ∈ [0, µ) and h ∈ (µ, 1]. First let k > 1/(2(h − l)) and µ ∈ [l + 1/(4k), h − 1/(4k)].

Given a stage 1 draw x, R's optimal stage 2 garbling is specified in Lemma A.1. If this garbling does not have support {µ}, R necessarily visits the second sender. If it is {µ}, she is indifferent between visiting him and not, and may choose either way. At stage 1, she has multiple best responses. The most informative one among them has support {µ − 1/(4k), µ + 1/(4k)}, and from Lemma A.2 it is the only one that is necessarily followed by no learning at stage 2.
We assume that she breaks her indifference in favor of this distribution. At belief µ + 1/(4k) she accepts the first sender with certainty, and at belief µ − 1/(4k) accepts the other one with certainty.

Then if a sender deviates to a different distribution, his payoffs may be affected only if he is visited first and the distribution he deviates to is such that {µ − 1/(4k), µ + 1/(4k)} is not a garbling of it. In this case, regardless of the deviation, R can secure a payoff equal to what she gets in the absence of the deviation, by picking {µ} at stage 1, followed by visiting the other sender and choosing {µ − 1/(4k), µ + 1/(4k)}. Thus the deviation cannot force R to choose from outside the set of optimal garblings from Lemma A.2. But then due to Lemma A.3, the deviating sender's payoffs are unaffected. Thus, there does not exist a profitable deviation and we have an equilibrium.

Next say that either k > 1/(2(h − l)) and µ ∉ [l + 1/(4k), h − 1/(4k)], or k ≤ 1/(2(h − l)). Then from Lemmata A.2 and A.1, R chooses a unique binary garbling at stage 1, and exactly one belief in the support is followed by a visit to the second sender. Denote the stage 1 belief following which R does learn at stage 2 by w. Under each possibility we show that there is a profitable deviation for a sender.

Possibility 1: If w < µ, then there exists l′ ∈ [max{0, µ − 1/(2k)}, µ) such that w = l′ + k(µ − l′)². Suppose a sender deviates to support {l′, h}. Then from Lemma A.1, if the deviating sender is visited second, R chooses support {µ} and selects the deviating sender with certainty. This does not affect R's behavior if the deviating sender is visited first, since l′ < w. Thus the sender profits from this deviation.

Possibility 2: If w > µ and is followed by a stage 2 best response {l, l + √((w − l)/k)}, then it must be true that w ∈ (l + k(µ − l)², l + 1/(4k)).
There exists h′ < l + √((w − l)/k) such that w < h′ − k(h′ − l)². Suppose a sender deviates to {l, h′}. Then k ≤ 1/(2(h′ − l)), and Lemma A.1 tells us that if the deviating sender is visited at stage 2, R's response changes to {l, h′}. w < h′ < l + √((w − l)/k) implies that this is profitable if visited at stage 2, without affecting what happens if visited at stage 1. Thus the deviation is profitable.

Possibility 3: If w > µ and is followed by a stage 2 best response of {l, h}. Then it is seen that w < h, so that h does not bind at stage 1. This implies that a sender can increase or decrease h slightly to h′ (so that {l, h′} is instead chosen), without affecting what happens if he is visited first. This is clearly profitable. ∎

A.2 Proof of Claim 4.2

‘Only if’: Suppose that there is no equilibrium in which both senders offer full information. Then, Proposition 4.3 tells us that either k ≤ 1/2, or k > 1/2 and µ ∉ [1/(4k), 1 − 1/(4k)]. Lemma A.1 and Lemma A.2 tell us R's unique best response (on path) to full information from both senders. Now we need to show that there is no equilibrium where she gets her first best payoff. For the sake of contradiction, suppose that there is such an equilibrium, in which sender i offers some pᵢ. From the discussion in the main text, this just means that R's best response on path to (p₁, p₂) is the same as the best response to full information. We argue, however, that the same deviations that we identified for full information also work for this supposed equilibrium. Recall the nature of those deviations from A.1: they do not make a difference if the deviating sender is visited first, and restrict learning if visited second.
Now if (p₁, p₂) is the equilibrium under consideration and the same deviation occurs, R's response to this deviation would be as under full information: if she visits the deviating sender first, she would realize she can continue to choose as on path; if she visits him second, she would make the same adjustment as under the full information scenario. Thus, since the deviation was profitable under full information, it must be profitable here, and (p₁, p₂) cannot be an equilibrium. ∎

A.3 Proof of Proposition 4.3
See the proof of Proposition 4.10, setting l = 0, h = 1.

A.4 Proof of Proposition 4.4

Existence of the uninformative equilibrium is proven in the text. Here we show non-existence of a full information equilibrium.

Suppose that each sender chooses a fully informative distribution. Because each sender has chosen the same distribution (on path), R is indifferent as to whom she visits first. Hence, suppose that she visits sender 1 first with probability λ ∈ [0, 1] and sender 2 with its complement. If sender 1 is visited first, then upon R's visit, 1 is realized with probability µ. At this point, she will stop and select sender 1. On the other hand, if 0 is realized then she will select sender 2 without visiting. The symmetric statements hold for sender 2, and his payoff is

u₂ = λ(1 − µ) + (1 − λ)µ.

Now suppose that sender 2 deviates and chooses a distribution that consists of 1 with probability η := µ − 1/n, n ∈ ℕ, n > 1/µ, and ε with probability (n + 1 − µn)/n, where ε := 1/(n + 1 − µn). If sender 1 is visited first then again sender 2 obtains an expected payoff of (1 − µ). If sender 2 is visited first, with probability η, 1 is realized and sender 2 is selected, and with probability (1 − η), ε is realized. At this point R visits sender 1 and obtains a realization of 0 with probability 1 − µ, at which point she selects sender 2. Accordingly,

u₂ = λ(1 − µ) + (1 − λ)(η + (1 − η)(1 − µ)),

and so sender 2 has a profitable deviation if and only if

λ(1 − µ) + (1 − λ)(η + (1 − η)(1 − µ)) > λ(1 − µ) + (1 − λ)µ,

which reduces to

(2n + 1 − √(4n + 1))/(2n) > µ,

provided λ < 1. Without loss of generality we may assume this, since otherwise the same argument would suffice for a deviation by sender 1. The limit of the left hand side goes to 1 as n goes to ∞; hence for any µ < 1 there exists an n̂ such that the left hand side is strictly greater than µ for all n > n̂. We conclude that for any µ < 1 there exists a profitable deviation, negating the possibility that full information is an equilibrium.
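The deviation just constructed can be checked numerically. In this sketch the expressions for η and ε are the reconstructed forms η = µ − 1/n and ε = 1/(n + 1 − µn) (assumptions, since the source is garbled); the check confirms the deviating distribution preserves the mean µ and is profitable for parameters satisfying the stated condition.

```python
# Sanity check of the Proposition 4.4 deviation (eta, eps are assumed
# reconstructions; the mean-preservation and profitability checks below
# only use the payoff expressions from the proof).
import math

mu, lam, n = 0.9, 0.5, 100

eta = mu - 1 / n               # probability of realization 1
eps = 1 / (n + 1 - mu * n)     # the low realization

# The deviation must keep the prior mean mu.
mean = eta * 1 + (1 - eta) * eps
assert abs(mean - mu) < 1e-12

u_on_path = lam * (1 - mu) + (1 - lam) * mu
u_deviation = lam * (1 - mu) + (1 - lam) * (eta + (1 - eta) * (1 - mu))

# The reduced profitability condition from the proof.
threshold = (2 * n + 1 - math.sqrt(4 * n + 1)) / (2 * n)

assert mu < threshold           # parameters satisfy the condition
assert u_deviation > u_on_path  # so the deviation is profitable
print(round(u_deviation - u_on_path, 6))  # 0.0005
```

As n grows the threshold approaches 1, which is exactly the limiting argument used to rule out the full information equilibrium for any µ < 1.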
∎

A.5 Proof of Claim 4.5

For µ ≤ 1/2:

Let each sender choose the uniform distribution on [0, 2µ], and suppose that R visits sender 1 first with probability λ ∈ [0, 1] and sender 2 with its complement. No matter the realization at stage 1, R will proceed and visit the other sender as well before selecting one of them. Hence, u₁ = u₂ = 1/2. Next, we check for a profitable deviation.

Suppose sender 2 deviates to a distribution that contains a probability measure of size a on [2µ, 1] and some portion F on [0, 2µ). It is clear that it is without loss of generality to set a to be a point mass on 2µ. If sender 2 is visited first then with probability a, he is selected and sender 1 is never visited; and otherwise, sender 1 is visited, after which R selects the sender with the highest realization. If sender 1 is visited first, then no matter what, sender 2 is also visited, after which the comparison ensues. Sender 2's payoff is

u₂ = λ( a + ∫₀^{2µ} ∫₀^x dG(y) dF(x) ) + (1 − λ)( a + ∫₀^{2µ} ∫₀^x dG(y) dF(x) ) = a + ∫₀^{2µ} ∫₀^x dG(y) dF(x),

where G(y) = y/(2µ) is the (on-path) distribution chosen by sender 1 and where ∫₀^{2µ} dF = 1 − a and ∫₀^{2µ} x dF = µ − 2µa.

Next, we use the result in Whitmeyer and Whitmeyer (2019), who establish that it suffices to show that sender 2 has no profitable deviation to any binary distribution. Let F be described by α with probability p and β with probability 1 − p, where 0 ≤ α ≤ µ, µ ≤ β ≤ 2µ, and αp + β(1 − p) = µ. Consequently, we rewrite u₂, which becomes

u₂ = (1 − p)G(β) + pG(α) = (1 − p)β/(2µ) + pα/(2µ) = 1/2.

Hence, there is no profitable deviation. ∎
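The final display is a one-line consequence of the mean constraint: against a rival playing G(y) = y/(2µ), any binary deviation supported in [0, 2µ] with mean µ earns exactly (pα + (1 − p)β)/(2µ) = 1/2. A minimal numerical sweep (the grid of deviations is illustrative, not part of the proof):

```python
# Every mean-mu binary deviation {alpha w.p. p, beta w.p. 1-p} supported
# in [0, 2*mu] earns exactly 1/2 against the uniform rival G(y) = y/(2*mu).
mu = 0.3

def G(y):
    return y / (2 * mu)

for p_pct in range(1, 100):
    p = p_pct / 100
    for step in range(31):
        alpha = step * mu / 30            # alpha in [0, mu]
        beta = (mu - p * alpha) / (1 - p) # forced by the mean constraint
        if beta > 2 * mu:                 # would fall outside [0, 2*mu]
            continue
        payoff = (1 - p) * G(beta) + p * G(alpha)
        assert abs(payoff - 0.5) < 1e-9
print("all binary deviations earn 1/2")
```

The payoff is linear in the deviation's mean, so fixing the mean at µ pins it to 1/2 regardless of (p, α, β).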
For µ > 1/2:

On path, sender 2's payoff is

u₂ = λ( (2µ − 1)/µ + ∫₀^{2(1−µ)} ∫₀^x (1/(2µ))(1/(2µ)) dy dx ) + (1 − λ)( ((2µ − 1)/µ)((1 − µ)/µ) + ∫₀^{2(1−µ)} ∫₀^x (1/(2µ))(1/(2µ)) dy dx )
= ((2µ − 1)²/µ²)λ + (1 − µ)(3µ − 1)/(2µ²).

If sender 2 deviates to 1 with probability µ and 0 with probability 1 − µ, his payoff from deviating is

u₂D = λµ + (1 − λ)(1 − µ) = 1 + 2λµ − λ − µ.

The difference, u₂D − u₂, is

(1 − µ)²(2µ − 1)(2λ − 1)/(2µ²).

Since µ > 1/2, this is positive provided λ > 1/2 and negative provided λ < 1/2. Thus, if λ ≠ 1/2 there exists a profitable deviation (if λ < 1/2, sender 1 can deviate profitably in the analogous fashion).

It remains to show that this vector of distributions is an equilibrium for λ = 1/2. Substituting λ = 1/2 into u₂, we see that u₂ = 1/2 on path. Just as for µ ≤ 1/2, from Whitmeyer and Whitmeyer (2019) we need check only deviations to binary distributions. Let F be described by α with probability p and β with probability 1 − p, where αp + β(1 − p) = µ and 0 ≤ α ≤ 2(1 − µ). There are two cases that we need to consider: 1. µ ≤ β ≤ 2(1 − µ); and 2. β = 1. In the first case,

u₂ = (1 − p)G(β) + pG(α) = (1 − p)β/(2µ) + pα/(2µ) = 1/2,

and in the second case

u₂ = (1/2)(1 − p + pG(α)) + (1/2)( ((1 − µ)/µ)(1 − p) + pG(α) ) = pα/(2µ) + (1 − p)/(2µ) = 1/2,

where we used the fact that β = 1 implies that 1 − p = µ − pα. Hence, there is no profitable deviation. ∎

A.6 Proof of Lemma 4.6

See the proof of Lemma A.1, setting l = 0, h = 1 and k = 1.

A.7 Proof of Lemma 4.7
See the proof of Lemma A.2, setting l = 0, h = 1 and k = 1.

A.8 Proof of Lemma 4.8
See the proof of Lemma A.3, setting l = 0, h = 1 and k = 1.

A.9 Proof of Claim 4.9
Let k > 1/2 and µ ∈ [1/(4k), 1 − 1/(4k)]. As shown in Appendix A.1, one of R's best responses to full information (l = 0, h = 1) from both senders is to choose the garbling {µ − 1/(4k), µ + 1/(4k)} at stage 1 and to learn nothing at stage 2.

Suppose sender i offers a distribution of which {µ − 1/(4k), µ + 1/(4k)} is a garbling. Then, the aforementioned best response to full information is permissible, and thus continues to be a best response. Suppose R chooses this response. Then if a sender unilaterally deviates and is the one to be visited first, R may respond by choosing {µ} and visiting the other sender, choosing {µ − 1/(4k), µ + 1/(4k)} for him. Exactly as in the proof for existence of a full information equilibrium (Proposition 4.10 for h = 1, l = 0), Lemma A.3 can be used to argue that the deviation cannot be profitable. ∎

A.10 Proof of Corollary 4.9.1
For µ ≤ 1/2:

We show that {µ − 1/(4k), µ + 1/(4k)} is a mean preserving contraction of the uniform distribution on [0, 2µ] when k ≥ 1/(2µ).

Define l(x) as

l(x) = 0 for 0 ≤ x < µ − 1/(4k);
l(x) = (1/2)(x − µ + 1/(4k)) for µ − 1/(4k) ≤ x < µ + 1/(4k);
l(x) = x − µ for µ + 1/(4k) ≤ x ≤ 1.

Define j(x) := ∫₀ˣ G(t) dt:

j(x) = x²/(4µ) for 0 ≤ x < 2µ;
j(x) = x − µ for 2µ ≤ x ≤ 1.

It suffices to show that µ > 1/(4k), that j(x) − l(x) = 0 has at most one real root, and that j(µ + 1/(4k)) > l(µ + 1/(4k)). Set j(x) = l(x), which holds if and only if

x = (4µk ± √(8kµ(1 − 2kµ)))/(4k).

This is imaginary if and only if k > 1/(2µ), and has a unique root for k = 1/(2µ) (at µ). µ − 1/(4k) ≥ µ/2 > 0 for k ≥ 1/(2µ). It remains to verify that j(µ + 1/(4k)) > l(µ + 1/(4k)); but it is simple to verify that this must hold. Thus, if k ≥ 1/(2µ), we have the result.

For µ > 1/2:

The proof is analogous to the preceding one, with the exception that k must be sufficiently large so that µ + 1/(4k) ≤ 1. This holds if and only if k ≥ 1/(4(1 − µ)). This constraint binds for µ ≥ 2/3, and k ≥ 1/(2µ) binds for µ ≤ 2/3. ∎

A.11 Proof of Proposition 5.1
Suppose each sender offers support {l, h}, with l ∈ [0, µ) and h ∈ (µ, 1]. Lemma A.1 and Lemma A.2 continue to describe on path behavior. Lemma A.3 still holds.

First let k > 1/(2(h − l)) and µ ∈ [l + 1/(4k), h − 1/(4k)]. On path behavior is exactly as in the baseline model: visit any one sender, pick {µ − 1/(4k), µ + 1/(4k)} and take a decision without learning from the other sender. Then if a sender deviates to a different distribution, it would be observed. Then R can simply respond by visiting the other, non-deviating sender, picking {µ − 1/(4k), µ + 1/(4k)} for him and taking a decision immediately. Due to Lemma A.3, the deviating sender's payoffs are the same as on path. Thus, there does not exist a profitable deviation and we have an equilibrium.

Next say that either k > 1/(2(h − l)) and µ ∉ [l + 1/(4k), h − 1/(4k)], or k ≤ 1/(2(h − l)). Then from Lemmata A.2 and A.1, on path R chooses a unique binary garbling at stage 1, and exactly one belief in the support is followed by a visit to the second sender. Denote the stage 1 belief following which R does learn at stage 2 by w. Under each possibility we show that there is a profitable deviation for a sender.

Possibility 1: Say w < µ and the stage 2 garbling is {l, h}. There must be a sender, say sender i, who is visited first with probability < 1 on path. Suppose sender i deviates to {l′, h}, where l < l′ < w. But on observing this deviation, R would choose to visit sender i first. By doing this she could get her first best. Thus, behavior is as on path, except that the order of visits is changed: sender i is visited first with probability 1. It is easy to verify that the payoff from being visited first is strictly higher than the payoff from being visited second, which means that this increase in the probability of being visited first is profitable.

Possibility 2: Say w < µ and the stage 2 garbling is {h − √((h − w)/k), h}.
Everything is as in Possibility 1, except that l′ is chosen such that h − √((h − w)/k) < l′ < w.

Possibility 3: If w > µ and is followed by a stage 2 best response {l, l + √((w − l)/k)} or {l, h}. Then if a sender deviates to no information, clearly R would just learn from the other sender with a threshold of acceptance µ. It is verified that the deviating sender's payoffs then are higher than the payoffs on path, conditional on being visited first as well as conditional on being visited second. ∎

A.12 Proof of Lemma 5.2
See the proof of Lemma A.2, setting l = 0, h = 1 and k = 1, and using µ₂ as the mean for the second sender and µ₁ as the mean for the first sender.

A.13 Proof of Lemma 5.3
Let us begin by looking at the parametric conditions given in bullet points 1 and 2 of Lemma 5.2. By symmetry it suffices to assume that one of these two pairs of conditions holds for the scenario in which a given sender is visited second, and show that this implies that one of the four pairs of conditions for the scenario in which that sender is visited first must hold. Observe that the conditions for bullet points 1 and 2 reduce to |µ₁ − µ₂| ≤ 1/(2k) and µ₂ ∈ [1/(4k), 1 − 1/(4k)]. It is easy to see that if µ₁ ∈ [1/(4k), 1 − 1/(4k)] then we are done. What if µ₁ ∉ [1/(4k), 1 − 1/(4k)]? WLOG suppose that µ₁ < 1/(4k). By assumption we must have µ₂ − 1/(2k) ≤ µ₁ and µ₂ ≥ 1/(4k). Hence, the corresponding condition (with µ₁ and µ₂ transposed) must hold.

Next, we turn our attention to the conditions given in bullet points 3 and 4. WLOG it suffices to focus on the conditions in bullet point 3. As we did in the previous paragraph, it suffices to assume that these conditions hold for the scenario in which the sender is visited second, and show that this implies that one of the four pairs of conditions for the scenario in which the sender is visited first must hold. By construction, µ₂ ≤ µ₁ + 1/(2k) and µ₁ ∈ [1/(4k), 1 − 1/(4k)]. Moreover, µ₂ ≥ µ₁ > µ₂ − 1/(2k), and so the corresponding condition (with µ₁ and µ₂ transposed) must hold. ∎

A.14 Proof of Proposition 5.4
It suffices to show that conditional on being the first sender to be visited, the probability of being selected is the same regardless of which stage 1 optimal garbling is chosen by R. The remainder of the proof follows analogously to the proof of Lemma A.3. Alternatively, observe that it follows from the fact that the probability of the first sender being selected, conditional on a first stage belief x, is either 0, 1, or a function that is affine in x. ∎
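The affinity observation is the whole argument: an affine P integrates against any garbling F through its mean alone, so every mean-µ garbling induces the same selection probability. A minimal sketch, using the selection probability P(x) = 2k(x − µ) + 1/2 reconstructed in the proof of Lemma A.3 (an assumption here; only the affinity of P matters):

```python
# An affine selection probability P(x) = 2k(x - mu) + 1/2 gives
# E_F[P] = 2k(E_F[x] - mu) + 1/2 = 1/2 for every garbling F with mean mu.
k, mu = 0.8, 0.4
r = 1 / (4 * k)  # radius of the interior stage 1 garbling (assumed form)

def V(support, probs):
    # Expected selection probability under a discrete garbling.
    assert abs(sum(probs) - 1.0) < 1e-12
    return sum(q * (2 * k * (x - mu) + 0.5) for x, q in zip(support, probs))

# Three different mean-mu garblings, including the degenerate one {mu}.
garblings = [
    ([mu], [1.0]),
    ([mu - r, mu + r], [0.5, 0.5]),
    ([mu - r, mu, mu + r / 2], [0.2, 0.4, 0.4]),  # mean is still mu
]
for support, probs in garblings:
    assert abs(V(support, probs) - 0.5) < 1e-9
print("V = 1/2 for every mean-mu garbling")
```

This is why the deviating sender's payoff in Proposition 5.4 cannot depend on which optimal garbling R picks: all of them share the prior mean µ.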