Optimal Disclosure of Information to a Privately Informed Receiver
Ozan Candogan    Philipp Strack

January 29, 2021
Abstract
We study information design problems where the designer controls information about a state and the receiver is privately informed about his preferences. The receiver's action set is general and his preferences depend linearly on the state. We show that to optimally screen the receiver, the designer can use a menu of "laminar partitional" signals. These signals partition the states such that the same message is sent in each partition element, and the convex hulls of any two partition elements are either nested or have an empty intersection. Furthermore, each state is either perfectly revealed or lies in an interval in which at most n + 2 different messages are sent, where n is the number of receiver types. In the finite action case an optimal menu can be obtained by solving a finite-dimensional convex program. Along the way we shed light on the solutions of optimization problems over distributions subject to a mean-preserving contraction constraint and additional constraints, which might be of independent interest.

We study how a sender can use her private information about a real-valued state to influence the mean belief and actions of a receiver who possesses private information himself. For example, the sender could be a seller who releases information about features of the product (e.g., through limited product tests), while the receiver could be a buyer who has private information about his preferences and decides how many units of the product (if any) to purchase. To influence the receiver's mean belief the sender offers a menu of signals. The receiver chooses which of these signals to observe based on his own private information. For example, if each signal would reveal information about a specific feature of the product,
the buyer (receiver) might optimally decide to observe the signal which reveals information about the feature he cares about the most.

We show there always exists a menu of "simple" signals that is optimal for the sender. First, each signal in the optimal menu is partitional, which means that it reveals to the buyer a non-random set in which the state lies. In other words, there exists a (deterministic) function mapping each state to a message revealed to the receiver. Thus, such an optimal signal does not introduce additional noise. Second, we show that the partition of the optimal signal is laminar, which means that the convex hulls of the sets of states in which different messages are sent are either nested or do not overlap. This means that an optimal signal can be completely described by the collection of (smallest) intervals in which each message is sent. This characterization is invaluable for tractability, as it allows us to derive simple and transparent necessary conditions for the optimality of the signal by taking first-order conditions with respect to changes of the bounds of the aforementioned intervals. Finally, we show that the laminar partition structure has "depth" at most n + 2, where n is the number of possible different realizations of the private information of the receiver. That is, the interval associated with a signal realization overlaps with at most n + 1 other intervals (associated with different signal realizations). This final property further reduces the complexity of the problem, as it implies that a state (i) is either perfectly revealed, or (ii) lies in an interval in which the distribution of the posterior mean of the receiver has at most n + 2 atoms.

To obtain our results, we study optimization problems over distributions, where the objective is linear in the chosen distribution, and a distribution is feasible if it satisfies (i) a majorization constraint (together with an underlying state distribution) as well as (ii) some side constraints.
We characterize the properties of the optimal solutions to such problems. In particular, we show that the optimal distributions have mass points whose locations and numbers depend on the side constraints. Moreover, there exists a laminar partition of the underlying state space such that the signal based on this laminar partition "generates" the optimal distribution. Given the generality of the optimization formulations described above, we suspect that these results may have applications beyond the private persuasion problem studied in the paper. We discuss some immediate applications in Section 5.

We also investigate the special case where the receiver has finitely many possible actions. In this case, we show that the designer's problem can be formulated as a tractable finite-dimensional convex program (despite there being a continuum of states).

(Footnote: A Gaussian signal which equals the state plus an independent normal shock is, for example, not partitional, as its realization is random conditional on the state.)
Literature Review.
Following the seminal work by Kamenica and Gentzkow (2011), the literature on Bayesian persuasion studies how a sender can use information to influence the action taken by a receiver. This framework has proven useful for analyzing a variety of economic applications, such as the design of grading systems (Ostrovsky and Schwarz, 2010; Boleslavsky and Cotton, 2015), medical testing (Schweizer and Szech, 2018), stress tests and banking regulation (Inostroza and Pavan, 2018; Goldstein and Leitner, 2018; Orlov et al., 2018), voter mobilization (Alonso and Câmara, 2016), as well as various applications in social networks (Candogan and Drakopoulos, 2017; Candogan, 2019b).

Initial papers focused on the case where either the receiver possesses no private information or the sender uses public signals (Brocas and Carrillo, 2007; Rayo and Segal, 2010; Kamenica and Gentzkow, 2011; Gentzkow and Kamenica, 2016b). Kolotilin et al. (2017) extend this model by considering the case where the receiver possesses private information about his preferences. Guo and Shmaya (2019) consider the case where the receiver possesses private information about the state. Both papers derive the optimal menu of signals to screen the receiver. Importantly, they assume that the action taken by the receiver is binary. Kolotilin et al. (2017) consider the case where the sender maximizes the probability with which the receiver takes one of two actions. Assuming that the receiver's payoff is linear and additive in the state and his type, they show that it is without loss of generality to restrict to public signals, in the sense that every outcome that can be implemented with private signals can also be implemented with public signals. Guo and Shmaya (2019) consider a general monotone utility of the sender and receiver, but maintain the assumption of binary actions. They show that even though not every outcome that can be implemented
with private signals can also be implemented with public signals, a designer-optimal outcome can always be implemented with public signals. (Footnote: For an excellent survey of the literature see Kamenica (2019) and Bergemann and Morris (2019).)

We complement this line of the literature by studying the case where the receiver can potentially choose among more than two actions, and find that public signals are in general not optimal. In this setting, we fully characterize the structure of the optimal menus.

The Bayesian persuasion literature is closely related to the notion of "Bayes correlated equilibria" (Bergemann and Morris, 2013, 2016). Bayes correlated equilibria characterize the set of all outcomes that can be induced in a given game by revealing a signal. Thus, a Bayesian persuasion problem can be solved by maximizing over the set of Bayes correlated equilibria. While the basic concept does not allow for private information of the receiver, it can be extended to account for this case (c.f. Section 6.1 in Bergemann and Morris, 2019).

In this paper, the state belongs to a continuum and the designer's payoff depends on the induced posterior mean. In such settings, assuming that the receiver has no private information, the approaches in Bergemann and Morris (2016); Kolotilin (2018); Dworczak and Martini (2019) can be used to characterize the optimal information structure. On the other hand, even if the receiver has finitely many actions, these approaches yield infinite-dimensional optimization problems. Alternatively, as established in Gentzkow and Kamenica (2016b), it is possible to associate a convex function with each information structure, and cast the information design problem as an optimization problem over all convex functions that are sandwiched in between two convex functions (associated with the full-disclosure and no-disclosure information structures), which also yields an infinite-dimensional optimization problem. This constraint is equivalent to a majorization constraint restricting the set of feasible posterior distributions. Arieli et al.
(2020) and Kleiner et al. (2020) characterize the extreme points of this set. This characterization implies that in the case without private information one can restrict attention to signals where each state lies in an interval such that for all states in that interval at most 2 messages are sent – an insight also observed in Candogan (2019a,b).

Our results generalize this insight to the case where the receiver has private information, and show that each state lies in an interval in which at most n + 2 messages are used, where n is the number of types of the receiver. Furthermore, we show that in the case where the receiver has finitely many actions, even if the state space is continuous and the receiver has private information, the optimal menu can be obtained by solving a simple and tractable (finite-dimensional) convex optimization problem.

Model
States and Types
We consider an information design setting in which a sender (she) tries to influence the action taken by a privately informed receiver (he). We call the information controlled by the sender the state ω ∈ Ω and the private information of the receiver the type θ ∈ Θ. The state ω lies in an interval Ω = [0, 1] and is distributed according to the (cumulative) distribution F : Ω → [0, 1] with density f ≥ 0. The receiver's type θ lies in a finite set Θ = {1, . . . , n} and we denote by g_θ > 0 the probability of type θ ∈ Θ. Throughout, we assume that the state ω and the type θ are independent.

Signals and Mechanisms
The sender commits to a menu M of signals, revealing information about the state. We also refer to M as a mechanism. A signal µ assigns to each state ω a conditional distribution µ(·|ω) ∈ ∆(S) over the set of signal realizations S, i.e., µ(·|ω) = P[s ∈ · | ω]. We restrict attention to signals for which Bayes' rule is well defined, and denote by P_µ[· | s] ∈ ∆(Ω) the posterior distribution induced over states by observing the signal realization s from the signal µ, and by E_µ[· | s] the corresponding expectation. In the case of finitely many signal realizations,

P_µ[ω ≤ x | s] = ( ∫_0^x µ({s}|ω) dF(ω) ) / ( ∫_0^1 µ({s}|ω) dF(ω) ). (Bayes Rule)

Actions and Utilities
After observing his type, the receiver chooses a signal µ from M and observes its realization s, which we will call a message. Then the receiver chooses an action a in a compact set A to maximize his expected utility

max_{a ∈ A} E[u(a, ω, θ) | s].

We make no additional assumptions on the set of actions A, and allow it to be finite or infinite.

For a given mechanism M, a strategy of the receiver specifies the signal µ_θ chosen by type θ, as well as the action a_θ(s) he takes upon observing the message s. A strategy (µ, a) is optimal for the receiver in the mechanism M if (i) the receiver's actions are optimal for all types θ ∈ Θ and almost all messages s in the support of µ_θ,

a_θ(s) ∈ argmax_{b ∈ A} E_{µ_θ}[u(b, ω, θ) | s], (Opt-Act)

and (ii) each type θ ∈ Θ chooses the expected-utility-maximizing signal (given the subsequently chosen actions),

µ_θ ∈ argmax_{ν ∈ M} ∫_Ω ∫_S max_{b ∈ A} E_ν[u(b, ω, θ) | s] dν(s|ω) dF(ω). (Opt-Signal)

One challenge in this environment is that the receiver can deviate by simultaneously misreporting his type and taking an action different from the one that would be optimal had he told the truth.

We denote by v : A × Ω × Θ → R the sender's utility. For a given mechanism M and optimal strategy of the receiver (µ, a), the sender's expected utility equals

Σ_{θ ∈ Θ} g_θ ∫_Ω ∫_S v(a_θ(s), ω, θ) dµ_θ(s|ω) dF(ω). (1)

The sender's information design problem is to pick a mechanism M and an optimal receiver strategy (µ, a) to maximize (1).

For tractability, we focus on the case of preferences which are quasi-linear in the state.

Assumption 1 (Quasi-Linearity). The receiver's and sender's utilities u, v are quasi-linear in the state, i.e., there exist functions u_1, u_0, v_1, v_0 : A × Θ → R, continuous in a ∈ A, such that

u(a, ω, θ) = u_1(a, θ) ω + u_0(a, θ)
v(a, ω, θ) = v_1(a, θ) ω + v_0(a, θ).

(Footnote: The assumption that the state lies in [0, 1] is a normalization that is without loss of generality for distributions with bounded support, as we impose no structure on the utility functions. Our arguments can be easily extended to unbounded distributions with finite mean.)
(Footnote: We follow the convention of the Bayesian persuasion literature and call a Blackwell experiment a signal.)
(Footnote: Formally, this requires that P_µ[· | s] is a regular conditional probability.)
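Under Assumption 1, expected utilities depend on the posterior belief only through its mean, since E[u(a, ω, θ) | s] = u_1(a, θ) E[ω | s] + u_0(a, θ). The following minimal sketch illustrates this for a partitional signal; the uniform prior, the partition, and the coefficients u_1, u_0 are illustrative assumptions, not taken from the paper.

```python
# Posterior means of a partitional signal under a uniform prior on [0, 1].
# With quasi-linear utility (Assumption 1), E[u(a, omega, theta) | s]
# depends on the message s only through the posterior mean E[omega | s].

def posterior_mean_uniform(interval):
    """E[omega | omega in interval] for a uniform state."""
    lo, hi = interval
    return (lo + hi) / 2.0

def expected_utility(u1, u0, interval):
    """E[u | s] = u1 * E[omega | s] + u0 under quasi-linearity."""
    return u1 * posterior_mean_uniform(interval) + u0

# A signal that reveals which of three intervals contains the state.
partition = [(0.0, 0.25), (0.25, 0.5), (0.5, 1.0)]
print([posterior_mean_uniform(p) for p in partition])  # [0.125, 0.375, 0.75]
```

Because only these conditional means enter payoffs, the designer's choice of a signal reduces to the choice of the distribution over posterior means it induces, which is the reduction pursued below.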
Assumption 1 is natural in many economic situations and is commonly made in the literature (e.g., Ostrovsky and Schwarz 2010; Ivanov 2015; Gentzkow and Kamenica 2016b; Kolotilin 2018; Dworczak and Martini 2019). For example, Kolotilin et al. (2017) assume A = {0, 1} and that the receiver's utility for one action is zero, and for the other action it is the sum of the type and state, which implies that u(a, ω, θ) = a × (ω + θ). This restriction is similar to the quasi-linearity assumption commonly made in mechanism design settings. There, it is often assumed that the agent's preferences are quasi-linear in the allocation, which is the natural analogue of the belief in our information design setting, in that it is chosen by the designer. Notably, in mechanism design it is typically assumed that the type is a real number and the utility is multiplicative (or super-modular) in the type and allocation. This assumption ensures that only local incentive constraints are binding and greatly simplifies the analysis. In our information design setting, this would correspond to assuming that u_1 is non-decreasing in θ. We do not make this super-modularity assumption. As we show later (see Section 4), even if one imposes it, non-local incentive constraints might bind, and a simplification of the problem as in mechanism design settings cannot be obtained.

(Footnote: For a more detailed discussion of this setting and its economic applications see Section 3.2 in Kamenica (2019).)
(Footnote: Our results directly generalize to the case where the utilities of the sender and receiver are arbitrary continuous (potentially non-linear) functions of the mean belief of the receiver.)

Remark.
Our results generalize to the case where u(a, ω, θ) = u_1(a, θ) h(ω) + u_0(a, θ) and v(a, ω, θ) = v_1(a, θ) h(ω) + v_0(a, θ) for some (potentially) non-monotone and non-linear function h : Ω → R, by redefining the state as h(ω). This generalization does not require any structure on the state space as long as the distribution of h(ω) admits a density.

Our analysis will proceed in two steps. In the first step we restate the persuasion problem with a privately informed receiver as a problem without private information, which is subject to side constraints on the agent's beliefs. These side constraints capture the restrictions placed on the mechanism due to possible deviations of different types, both in choosing a signal from the mechanism and in taking an action after observing the message. In the second step we establish a bound on the number of messages used in any persuasion problem with side constraints, which then implies our main result for the persuasion problem with private information.

To simplify notation we define the receiver's indirect utility u* : Ω × Θ → R as his utility from taking an optimal action given his posterior mean m and type θ:

u*(m, θ) = max_{a ∈ A} u(a, m, θ).
We also define the indirect utility v* : Ω × Θ → R of the sender as the maximal payoff she can get from a type θ receiver with a posterior mean m who takes an optimal action:

v*(m, θ) = max_{a ∈ A*(m, θ)} v(a, m, θ),

where A*(m, θ) = argmax_{b ∈ A} u(b, m, θ). We note that u*(·, θ) is continuous, as it is the maximum over linear functions, and it is bounded as u is bounded; moreover, v*(·, θ) is upper semicontinuous for every θ.

We let G_θ : Ω → [0, 1] denote the CDF of the posterior mean belief of the receiver when observing the signal µ_θ:

G_θ(x) = P[E_{µ_θ}[ω | s] ≤ x]. (2)

We say that the distribution G_θ over posterior means is induced by the signal µ_θ if the above equation holds.

Incentive Compatibility
To solve the information design problem we first focus on direct mechanisms, where it is optimal for the receiver to truthfully reveal his type to the sender.
Definition 1.
A mechanism M = {µ_1, . . . , µ_n} is a direct incentive compatible mechanism if for all θ, θ′ ∈ Θ,

∫_Ω u*(s, θ) dG_θ(s) ≥ ∫_Ω u*(s, θ) dG_{θ′}(s). (IC)

The incentive compatibility constraint (IC) requires each type θ of the receiver to achieve a weakly higher expected payoff by observing the signal µ_θ designated for that type instead of observing any other signal µ_{θ′} offered by the mechanism. As the sender can always remove signals that are not picked by any receiver type without affecting the outcome (doing so only relaxes the incentive constraints), it is without loss of generality to restrict attention to incentive compatible direct mechanisms.

Lemma 1 (Revelation Principle). For every mechanism M and associated optimal strategy of the receiver there exists a direct incentive compatible mechanism that leaves each type of the receiver and the sender with the same utility.

(Footnote: As u is continuous in the action and A is compact, it follows from Berge's Maximum Theorem that for every θ the correspondence m ↦ A*(m, θ) is non-empty, upper hemicontinuous, and compact valued. This implies that m ↦ v*(m, θ) is upper semicontinuous for every θ. See for example Theorem 17.30 and Theorem 17.31 in Aliprantis and Border (2013).)

Feasible Posterior Mean Distributions As the payoffs depend only on the mean of the receiver's posterior belief, but not on the complete distribution, a natural question is which distributions over posterior means the sender can induce using a Blackwell signal. An important notion for addressing this question is that of a mean-preserving contraction (MPC). A distribution over states H : Ω → [0, 1] is a MPC of a distribution H̃ : Ω → [0, 1], denoted H̃ ⪯ H, if and only if for all ω,

∫_ω^1 H(z) dz ≥ ∫_ω^1 H̃(z) dz, (MPC)

and the inequality holds with equality for ω = 0. The next result goes back to Strassen (1965) and shows that the aforementioned set corresponds exactly to the set of distributions that majorize the distribution of states F.

Lemma 2 (Strassen 1965). There exists a signal µ that induces the distribution G_θ over posterior means, i.e., (2) is satisfied, if and only if F ⪯ G_θ for all θ ∈ Θ.

Thus, instead of considering the optimization problem over signals, we can equivalently optimize over feasible distributions of posterior means.
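The (MPC) condition can be checked numerically on a grid. The sketch below is illustrative (uniform prior assumed; the tolerance absorbs discretization error) and is not code from the paper.

```python
import numpy as np

# Numerical check of (MPC): G is a mean-preserving contraction of F iff
# int_x^1 G(z) dz >= int_x^1 F(z) dz for all x, with equality at x = 0.
# F and G are arrays of CDF values on a common uniform grid over [0, 1].

def upper_integral(cdf, grid):
    """Right Riemann sums of x -> int_x^1 H(z) dz on the grid."""
    dx = grid[1] - grid[0]
    return np.cumsum((cdf * dx)[::-1])[::-1]

def is_mpc(F, G, grid, tol=1e-2):
    """True if G is (up to grid error) a mean-preserving contraction of F."""
    IF, IG = upper_integral(F, grid), upper_integral(G, grid)
    return bool(np.all(IG >= IF - tol) and abs(IG[0] - IF[0]) <= tol)

grid = np.linspace(0.0, 1.0, 1001)
F = grid.copy()                              # uniform prior: F(x) = x
no_disclosure = (grid >= 0.5).astype(float)  # all mass at the prior mean 1/2
spread = np.where(grid < 1.0, 0.5, 1.0)      # mass 1/2 at 0 and 1/2 at 1
print(is_mpc(F, no_disclosure, grid), is_mpc(F, spread, grid))  # True False
```

No disclosure is the extreme contraction and is always feasible, while the two-point distribution is more dispersed than the prior, so by Lemma 2 no signal induces it.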
The Optimal Mechanism
Combining the characterization of incentive compatibility from Lemma 1 and the characterization of feasibility from Lemma 2, we obtain a characterization of optimal direct mechanisms.
Proposition 1 (Optimal Mechanisms). The direct mechanism (µ_1, µ_2, . . . , µ_n) is incentive compatible and maximizes the sender's payoff if and only if the associated (G_1, . . . , G_n) solve

max_{G_1,...,G_n} Σ_{θ ∈ Θ} g_θ ∫_Ω v*(s, θ) dG_θ(s)
s.t. ∫ u*(s, θ) dG_θ(s) ≥ ∫ u*(s, θ) dG_{θ′}(s) ∀ θ, θ′ ∈ Θ (3)
G_θ ⪰ F ∀ θ ∈ Θ. (4)

The above problem is a simplification of the original information design problem in two dimensions. First, there are no actions of the receiver in the above problem. Second, instead of optimizing over signals µ, which specify the distribution over messages conditional on each state, the above formulation involves only the unconditional distributions G_θ over posterior means. The main challenge in the above optimization problem is that it involves maximization over a vector of distributions (G_1, . . . , G_n) whose set of feasible components is strongly interdependent due to (3). This interdependence is naturally caused by the incentive compatibility constraint, as the sender cannot pick the signal she provides to one type θ without taking into account the fact that this might give another type θ′ incentives to misreport.

Our next result decouples the above optimization problem into n independent problems, one for each type θ of the receiver. To do so we define the value c_θ the receiver could achieve when deviating optimally from the distribution G*_θ:

c_θ = max_{θ′ ≠ θ} ∫ u*(s, θ) dG*_{θ′}(s).

We also define d_θ to be the value the receiver gets when reporting his type truthfully:

d_θ = ∫ u*(s, θ) dG*_θ(s).

We note that c_θ and d_{−θ} are independent of G*_θ, i.e., the signal observed by type θ in the optimal policy. We can thus optimize over G_θ taking G_{−θ} as given, which leads to the following problem.

Lemma 3.
If µ is a direct incentive compatible mechanism that maximizes the utility of the sender, then the CDF G_θ of the induced posterior mean for type θ solves

max_{G_θ ⪰ F} ∫_Ω v*(s, θ) dG_θ(s)
s.t. ∫ u*(s, θ) dG_θ(s) ≥ c_θ (5)
∫ u*(s, η) dG_θ(s) ≤ d_η ∀ η ≠ θ. (6)

In this decomposition we maximize the utility the sender receives from each type θ of the receiver separately under the constraint (5). This constraint ensures that type θ does not want to deviate and report to be another type. Similarly, the constraint (6) ensures that no other type wants to report to be type θ. We note that (5) and (6) encode the incentive constraints given in (3) that restrict the signal of type θ. By considering the optimal deviation we reduced the number of incentive constraints associated with each type from 2(n − 1) to n.

Figure 1: The partition of the state space Ω = [0, 1] on the left is not laminar, while the partition on the right is laminar, as the convex hulls of all pairs of the sets P_1, P_2, P_3 are either nested or have an empty intersection.

Laminar Partitional Signals
We next describe a small class of signals, laminar partitional signals, and show that there always exists an optimal signal within this class. We first define partitional signals:
Definition 2 (Partitional Signal). A signal is partitional if for each message s ∈ S there exists a set P_s ⊆ Ω such that µ({s}|ω) = 𝟙{ω ∈ P_s}.

A partitional signal partitions the state space into sets (P_s)_s and reveals to the receiver the set in which the state ω lies. Partitional signals are thus noiseless in the sense that the mapping from the state to the message is deterministic. A simple example of signals which are not partitional are normal signals, where the signal equals the state ω plus normal noise and is thus random conditional on the state. Our next definition imposes further restrictions on the partition structure of a partitional signal.

Definition 3 (Laminar Partitional Signal). A partition (P_s)_s is laminar if cx P_s ∩ cx P_{s′} ∈ {∅, cx P_s, cx P_{s′}} for any s, s′. A partitional signal is laminar if its associated partition is laminar.

The restrictions imposed by laminar partitional signals are illustrated in Figure 1. Our next result establishes that there always exists an optimal policy such that the signal of each type is laminar partitional. To simplify notation we will denote by M_µ(ω) ⊆ Ω the set of states where the same message is sent as in the state ω for a partitional signal µ.

Theorem 1.
There exists an optimal mechanism (µ_1, . . . , µ_n) such that each signal µ_θ is laminar partitional with partition P^θ. Furthermore, for each type θ there exists a countable collection of intervals I^θ_1, I^θ_2, . . . such that

(i) ω ∉ ∪_k I^θ_k implies M_{µ_θ}(ω) = {ω};
(ii) ω ∈ I^θ_k implies that M_{µ_θ}(ω) ⊆ I^θ_k, and in I^θ_k at most n + 2 distinct messages are sent: |{s : P^θ_s ∩ I^θ_k ≠ ∅}| ≤ n + 2.

An optimal mechanism thus reveals to the receiver of type θ a subset M_{µ_θ}(ω) ⊆ Ω of the state space in which the state lies. Furthermore, this subset is a deterministic function of the state. Theorem 1 thus implies that the designer does not need to rely on random signals whose distribution conditional on the state could be arbitrarily complex.

The fact that the partition can be chosen to be laminar is a further simplification. It implies a partial order, or tree structure, on the messages such that a message s is larger in this partial order than s′ whenever the convex hull of P^θ_s contains the convex hull of P^θ_{s′}. The laminar partition element P^θ_s can be generated by taking the convex hull of the set where a message is sent and subtracting the convex hulls of all messages that are lower in this order, i.e.,

P^θ_s = cx P^θ_s \ ∪_{s′ : cx P^θ_{s′} ⊊ cx P^θ_s} cx P^θ_{s′}.

Thus, the partition P^θ can be recovered from the intervals cx P^θ_s.

To see why Theorem 1 provides a drastic reduction in complexity, consider the case where the receiver chooses among |A| < ∞ actions. As it is never optimal to reveal to the receiver more information than the optimal action for each type of receiver, the optimal signal uses at most |A| messages. Since the optimal signal is partitional, these messages correspond to |A| subsets of the state space. Recall that the optimal partition is laminar, and hence each subset can be identified with its convex hull, which is an interval.
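The nestedness condition of Definition 3 is easy to verify for one-dimensional partition elements. A small sketch (the partitions are hypothetical examples, not from the paper), where each element is represented by a finite set of points of Ω that it contains:

```python
# Check whether a partition of [0, 1] is laminar (Definition 3): the convex
# hulls of any two partition elements must be nested or disjoint.

def hull(element):
    """Convex hull of a one-dimensional set, as an interval (min, max)."""
    return (min(element), max(element))

def is_laminar(partition):
    hulls = [hull(p) for p in partition]
    for i in range(len(hulls)):
        for j in range(i + 1, len(hulls)):
            (a, b), (c, d) = hulls[i], hulls[j]
            disjoint = b < c or d < a
            nested = (a <= c and d <= b) or (c <= a and b <= d)
            if not (disjoint or nested):
                return False
    return True

# Right panel of Figure 1 in spirit: the second element's hull sits inside
# the first element's hull; the third is disjoint from both.
laminar = [[0.0, 0.2, 0.8], [0.3, 0.5], [0.85, 1.0]]
crossing = [[0.0, 0.5], [0.3, 0.8]]  # hulls overlap without nesting
print(is_laminar(laminar), is_laminar(crossing))  # True False
```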
As each interval is completely described by its endpoints, it follows that each laminar partitional signal can be identified with a point in R^{2|A|}. Thus, the problem of finding the optimal partitional signals can be written as an optimization problem over R^{2|A|×|Θ|}. We explore this approach in Section 3.2.

The next section discusses an abstract mathematical result about optimization under constraints that implies Theorem 1. We discuss this result separately as similar mathematical problems arise in economic problems other than the Bayesian persuasion application. For example, Kleiner et al. (2020) discuss how optimization problems under mean-preserving contraction constraints naturally arise in delegation problems. We leave the exploration of other applications of this mathematical result for future work to keep the exposition focused on the persuasion problem.

Consider the problem of maximizing the expectation of an arbitrary upper semicontinuous function v : [0, 1] → R over all distributions G that are mean-preserving contractions of a given distribution F : [0, 1] → [0, 1], subject to n ≥ 1 side constraints:

max_{G ⪰ F} ∫ v(s) dG(s)
subject to ∫ u_i(s) dG(s) ≥ 0 for i ∈ {1, . . . , n}. (7)

Throughout, we assume that the functions u_i : [0, 1] → R are continuous. The next result establishes conditions that need to be satisfied by any solution of the problem (7).

Proposition 2.
The problem (7) admits a solution. Let G be a solution to (7). There exists a countable collection of intervals I_1, I_2, . . . such that G equals the original distribution outside the intervals,

G(x) = F(x) for x ∉ ∪_j I_j,

and in each interval I_j = (a_j, b_j) redistributes the mass of F among at most n + 2 mass points m_{1,j}, m_{2,j}, . . . , m_{n+2,j} ∈ I_j, preserving the expectation:

G(x) = G(a_j) + Σ_{r=1}^{n+2} p_{r,j} 𝟙{m_{r,j} ≤ x} for x ∈ I_j, with Σ_{r=1}^{n+2} p_{r,j} = F(b_j) − F(a_j).

The existence of an optimal solution follows from standard arguments exploiting the compactness of the feasible set of (7). To establish the remaining claims of Proposition 2, in the proof we first fix an optimal solution, and consider intervals where the mean-preserving contraction constraint (MPC) does not bind at this solution. As both the constraints as well as the objective function in (7) are linear, we can optimize over (any subinterval of) this interval, fixing the solution on the complement of this interval, to obtain another optimal solution. In this optimization problem the mean-preserving contraction constraint is relaxed to a constraint fixing the conditional mean of the distribution on this interval. This problem is now a maximization problem over distributions subject to the n original constraints and an additional identical-mean constraint. It was shown in Winkler (1988) that each extreme point of a set of distributions defined by a given number k of constraints is the sum of at most k + 1 mass points. Combining the initial optimal solution with the optimal solution obtained over the aforementioned interval, we obtain a new solution that satisfies the conditions of the proposition over this interval. Repeating this argument for all intervals inductively, it follows that the claim holds for the entire support.

Laminar Structure
Let ω be a random variable distributed according to F. Our next result shows that each interval I_j in Proposition 2 admits a laminar partition such that, when the realization of ω belongs to some I_j, revealing the partition element that contains it, and simply revealing ω when it does not belong to any I_j, induces the posterior mean distribution given by G. Proposition 2 together with this result yields the optimality of partitional signals as stated in Theorem 1.

Proposition 3.
Consider the setting of Proposition 2 and let ω be distributed according to F. For each interval I_j there exists a laminar partition Π_j = (Π_{j,k})_k such that for all k ∈ {1, . . . , n + 2},

P[ω ∈ Π_{j,k}] = p_{j,k} and E[ω | ω ∈ Π_{j,k}] = m_{j,k}. (8)

The proof of this claim relies on a partition lemma (stated in the appendix), which strengthens this result by shedding light on how the partition Π_j can be constructed. The proof of the latter lemma is inductive over the number of mass points. When G given in Proposition 2 has two atoms in I_j, the partition element that corresponds to one of these atoms is an interval, and the other one is the complement of this interval relative to I_j. Moreover, it can be obtained by solving a system of equations, expressed in terms of the endpoints of this interval, that satisfies condition (8) of Proposition 3. As this partition is laminar, this yields the result for the case where there are only 2 mass points in I_j.

When G consists of k > 2 mass points in I_j, one can find a subinterval such that the expected value of ω ∼ F conditional on ω being inside this subinterval equals the value of the largest mass point, and the probability F assigns to this subinterval equals the probability G assigns to the largest mass point. Conditional on ω being outside this interval, the distribution thus only admits k − 1 mass points. This allows us to invoke the induction hypothesis to generate a laminar partition such that revealing in which partition element ω lies generates the desired conditional distribution of the posterior mean. Finally, as this laminar partition combined with the subinterval is again a laminar partition, we obtain the result for distributions consisting of k > 2 mass points.

Next, we focus on the special case where the set of receiver's actions is finite, A = {1, . . . , |A|} with cardinality |A| = K, and the designer's utility depends only on the action and type of the receiver, but not the state.
We denote the designer's utility when the type θ receiver takes action a ∈ A by v(a, θ). It follows from Assumption 1 that for each receiver type θ, the corresponding indirect utility u*(·, θ) is a piecewise linear function of the induced posterior mean. We parameterize this function with parameters b_{θ,k}, c_{θ,k}, h_{θ,k}, such that

u*(m, θ) = h_{θ,k} + c_{θ,k}(m − b_{θ,k}) for m ∈ [b_{θ,k−1}, b_{θ,k}]. (9)

Here, {b_{θ,k}}_{k=0}^{K_θ+1} denotes a sequence of increasing cutoffs 0 = b_{θ,0} ≤ b_{θ,1} ≤ . . . ≤ b_{θ,K_θ+1} = 1 with K_θ ≤ K. Note that (9) implies that

h_{θ,k} = h_{θ,k+1} + c_{θ,k+1}(b_{θ,k} − b_{θ,k+1}). (10)

The indirect utility function is convex (in the induced posterior mean), which in turn implies that the {c_{θ,k}} are increasing in k. The convexity also implies that for any m, the indirect utility can alternatively be expressed as:

u*(m, θ) = max_k h_{θ,k} + c_{θ,k}(m − b_{θ,k}). (11)

To simplify the exposition we assume that when the posterior mean (i) is in (b_{θ,k−1}, b_{θ,k}), the type θ receiver has an action, hereafter a_{θ,k}, that uniquely maximizes his payoff, and (ii) equals b_{θ,k} for k ∈ [K_θ], the type θ receiver is indifferent between exactly two actions a_{θ,k} and a_{θ,k+1}. We use the shorthand notation r_{θ,k} = v(a_{θ,k}, θ) to denote the sender's payoff when the type θ receiver takes action a_{θ,k}. Observe that the sender's payoff is a piecewise step function of the posterior mean: there exists a set B_{θ,k} satisfying (b_{θ,k−1}, b_{θ,k}) ⊂ B_{θ,k} ⊂ [b_{θ,k−1}, b_{θ,k}] such that when the posterior mean lies in B_{θ,k}, the receiver takes action a_{θ,k}, inducing a payoff of r_{θ,k}. Moreover, the sets {B_{θ,k}}_{k ∈ [K_θ]} constitute a disjoint partition of Ω.

(Footnote: Note that this condition holds, e.g., if the receiver's payoff in Assumption 1 has a different slope (when expressed as a function of ω) for each action a. Our arguments and optimization formulation immediately generalize to settings where this condition does not hold, but at the expense of more complicated notation.)
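The parameterization in (9)-(11) can be computed directly from the action payoffs: u*(·, θ) is the upper envelope of the affine functions m ↦ u_1(a, θ)m + u_0(a, θ), and the cutoffs b_{θ,k} are the indifference points between consecutive actions. A sketch with illustrative coefficients (not from the paper), assuming actions are sorted by strictly increasing slope and each is optimal on a nondegenerate interval:

```python
# Indirect utility as the upper envelope of affine action payoffs, and the
# indifference cutoffs between consecutive actions (cf. (9)-(11)).
# The (slope, intercept) pairs below are illustrative.

def indirect_utility(m, actions):
    """u*(m) = max over actions of u1 * m + u0: convex, piecewise linear."""
    return max(u1 * m + u0 for (u1, u0) in actions)

def cutoffs(actions):
    """Indifference points between consecutive actions, assuming actions are
    sorted by strictly increasing slope u1 and each is optimal somewhere."""
    return [(i1 - i2) / (s2 - s1)
            for (s1, i1), (s2, i2) in zip(actions, actions[1:])]

actions = [(0.0, 0.5), (1.0, 0.0), (3.0, -1.2)]  # (u1, u0) per action
print(cutoffs(actions))  # [0.5, 0.6]
print(indirect_utility(0.8, actions))
```

Between consecutive cutoffs a single action attains the maximum, which is exactly the interval structure that the sets B_{θ,k} formalize.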
Here B_{θ,k} contains (b_{θ,k−1}, b_{θ,k}) and its end point(s) which yield(s) a higher payoff to the sender when the receiver is indifferent.

We next show that it is without loss of optimality to restrict attention to direct mechanisms whose posterior mean distributions are discrete, with at most one mass point m_{θ,k} in B_{θ,k} for each θ and k ∈ [K_θ].

Lemma 4.
There exists a direct incentive compatible mechanism M = {µ_1, . . . , µ_n} that is optimal and satisfies P[E_{µ_θ}[ω | s] ∈ B_{θ,k}] = P[E_{µ_θ}[ω | s] = m_{θ,k}] for all k ∈ [K_θ], θ ∈ Θ and some m_{θ,k} ∈ B_{θ,k}.

In what follows we restrict attention to mechanisms of the type characterized in the lemma. Given such a mechanism, the expected payoff of a type θ receiver from reporting her type as θ′ can be given as follows:

∫_Ω v∗(s, θ) dG_{θ′}(s) = Σ_{k′ ∈ [K_{θ′}]} p_{θ′,k′} v∗(m_{θ′,k′}, θ) = Σ_{k′ ∈ [K_{θ′}]} p_{θ′,k′} ( max_k h_{θ,k} + c_{θ,k}(m_{θ′,k′} − b_{θ,k}) ), (12)

where we use the shorthand notation p_{θ,k} = P[E_{µ_θ}[ω | s] = m_{θ,k}]. Since m_{θ,k} ∈ B_{θ,k} ⊂ [b_{θ,k−1}, b_{θ,k}], when θ′ = θ, using (9) these expressions simplify to:

∫_Ω v∗(s, θ) dG_θ(s) = Σ_{k ∈ [K_θ]} p_{θ,k} (h_{θ,k} + c_{θ,k}(m_{θ,k} − b_{θ,k})). (13)

Defining z_{θ,k} = m_{θ,k} p_{θ,k}, and using (12) and (13), the incentive compatibility constraint ∫_Ω v∗(s, θ) dG_θ(s) ≥ ∫_Ω v∗(s, θ) dG_{θ′}(s) can be more explicitly expressed as follows:

Σ_{k ∈ [K_θ]} h_{θ,k} p_{θ,k} + c_{θ,k}(z_{θ,k} − b_{θ,k} p_{θ,k}) ≥ Σ_{k′ ∈ [K_{θ′}]} max_k h_{θ,k} p_{θ′,k′} + c_{θ,k}(z_{θ′,k′} − b_{θ,k} p_{θ′,k′}). (14)

Recall that the distribution G_θ is an MPC of F for all θ (Lemma 2). Our next lemma establishes that the MPC constraints admit an equivalent restatement in terms of {p_{θ,k}, z_{θ,k}}.

Lemma 5. G_θ ⪰ F if and only if Σ_{k ≥ ℓ} z_{θ,k} ≤ ∫_{1 − Σ_{k ≥ ℓ} p_{θ,k}}^{1} F^{−1}(x) dx for all ℓ ∈ [K_θ], where the inequality holds with equality for ℓ = 1.

This lemma restates the MPC constraint solely in terms of the {p_{θ,k}, z_{θ,k}} tuple. Moreover, it can be readily seen that there is a one-to-one correspondence between such a tuple and posterior mean distributions of the type given in Lemma 4.
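As a quick sanity check of Lemma 5, the restated majorization constraint can be verified numerically for a hypothetical uniform prior. For F uniform on [0, 1], F^{−1}(x) = x and ∫_{1−q}^{1} F^{−1}(x) dx = q − q²/2; the sketch below (with made-up two-atom distributions, not objects from the paper) tests the tail inequalities:

```python
# Sanity check of Lemma 5 for a hypothetical uniform prior F on [0, 1]:
# F^{-1}(x) = x, so the upper-quantile integral is q - q^2/2.

def upper_quantile_integral(q):
    return q - q * q / 2.0

def satisfies_mpc(p, z):
    """p[k] and z[k] = p[k]*m[k], listed in increasing order of m[k].
    Checks the tail inequalities of Lemma 5 (equality at the full sum)."""
    K = len(p)
    for l in range(K):
        tail_z, tail_p = sum(z[l:]), sum(p[l:])
        bound = upper_quantile_integral(tail_p)
        if l == 0 and abs(tail_z - bound) > 1e-9:
            return False          # the mean of G must equal the mean of F
        if tail_z > bound + 1e-9:
            return False          # tail too heavy: G is not an MPC of F
    return True

p, m = [0.4, 0.6], [0.25, 2.0 / 3.0]       # made-up two-atom G
print(satisfies_mpc(p, [pi * mi for pi, mi in zip(p, m)]))      # -> True
# A distribution with the right mean but more spread than uniform fails:
print(satisfies_mpc([0.5, 0.5], [0.5 * 0.05, 0.5 * 0.95]))      # -> False
```

The first distribution is a mean-preserving contraction of the uniform; the second has the correct mean but places its atoms too far apart, violating the tail inequality at ℓ = 2.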
Motivated by this, we next reformulate the problem of obtaining optimal mechanisms (see Proposition 1) in terms of these variables as follows:

max_{p_{θ,k}, z_{θ,k}, y_{θ,θ′,k′}} Σ_{θ ∈ Θ} g_θ Σ_{k ∈ [K_θ]} p_{θ,k} r_{θ,k}

s.t. Σ_{k ∈ [K_θ]: k ≥ ℓ} z_{θ,k} ≤ ∫_{1 − Σ_{k ∈ [K_θ]: k ≥ ℓ} p_{θ,k}}^{1} F^{−1}(x) dx for θ ∈ Θ, ℓ ∈ [K_θ] \ {1},

Σ_{k ∈ [K_θ]} z_{θ,k} = ∫_0^1 F^{−1}(x) dx for all θ ∈ Θ,

h_{θ,k} p_{θ′,k′} + c_{θ,k}(z_{θ′,k′} − b_{θ,k} p_{θ′,k′}) ≤ y_{θ,θ′,k′} for all θ, θ′ ∈ Θ, k ∈ [K_θ], k′ ∈ [K_{θ′}],

Σ_{k′ ∈ [K_{θ′}]} y_{θ,θ′,k′} ≤ Σ_{k ∈ [K_θ]} h_{θ,k} p_{θ,k} + c_{θ,k}(z_{θ,k} − b_{θ,k} p_{θ,k}) for all θ, θ′ ∈ Θ,

p_{θ,k} b_{θ,k−1} ≤ z_{θ,k} ≤ p_{θ,k} b_{θ,k} for all θ ∈ Θ, k ∈ [K_θ],

Σ_{k ∈ [K_θ]} p_{θ,k} = 1 for θ ∈ Θ, and p_{θ,k} ≥ 0 for θ ∈ Θ, k ∈ [K_θ]. (OPT)

In this formulation, the first two constraints are the restatement of the MPC constraints (see Lemma 5), and the last two constraints ensure that {p_{θ,k}}_k (or equivalently G_θ) constitutes a valid probability distribution for all θ. It can be easily checked that y_{θ,θ′,k′} = max_{k ∈ [K_θ]} h_{θ,k} p_{θ′,k′} + c_{θ,k}(z_{θ′,k′} − b_{θ,k} p_{θ′,k′}) at an optimal solution: whenever y_{θ,θ′,k′} is strictly larger than this maximum, it can be decreased to construct another feasible solution with the same objective. Thus, the third and fourth constraints restate the incentive compatibility constraint in (14), using y_{θ,θ′,k′} to capture the summands on the right-hand side of that constraint. Finally, recall that z_{θ,k}/p_{θ,k} captures the posterior mean induced in B_{θ,k} by a feasible mechanism, and B_{θ,k} ⊂ [b_{θ,k−1}, b_{θ,k}]. The fifth constraint relaxes this restriction by taking the closure of B_{θ,k}.
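To see concretely why the change of variables z_{θ,k} = m_{θ,k} p_{θ,k} matters, note that constraints which are bilinear in (p, m) define non-convex sets, while the same constraints are linear in (p, z). A minimal numeric sketch (the bound c = 0.1 is an arbitrary stand-in, not a quantity from the paper):

```python
# A constraint of the form p*m <= c (bilinear in (p, m)) defines a
# non-convex set, while the same constraint in (p, z) with z = p*m is
# linear.  c = 0.1 is an arbitrary illustrative value.

def feasible_pm(p, m, c=0.1):
    return p * m <= c + 1e-12

A, B = (0.1, 1.0), (1.0, 0.1)                      # both feasible
mid = ((A[0] + B[0]) / 2.0, (A[1] + B[1]) / 2.0)   # their midpoint
print(feasible_pm(*A), feasible_pm(*B), feasible_pm(*mid))
# -> True True False: the (p, m) feasible set is not convex

# After the change of variables z = p*m the constraint reads z <= c,
# so convex combinations of feasible (p, z) points remain feasible:
zA, zB = A[0] * A[1], B[0] * B[1]
print((zA + zB) / 2.0 <= 0.1 + 1e-12)   # -> True
```

The same mechanism is at work in (14): the products p_{θ,k} m_{θ,k} appearing there become the linear variables z_{θ,k}, which is what makes (OPT) a convex program.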
It can be seen that this relaxation does not impact the optimal solution, since by construction the end points of (b_{θ,k−1}, b_{θ,k}) are assigned to B_{θ,ℓ} for ℓ ∈ {k − 1, k, k + 1} in a way that yields a higher payoff to the sender.

It is worth pointing out that (OPT) is a finite-dimensional convex optimization problem. This is unlike the infinite-dimensional optimization formulation given in Proposition 1. Note that (OPT) is based on restating the sender's problem in terms of the {p_{θ,k}, z_{θ,k}} tuple. Two points about this reformulation are important to highlight. First, an alternative approach would involve optimizing directly over distributions that satisfy the conditions of Lemma 4. Moreover, this could be formulated as a finite-dimensional problem as well, e.g., by searching over the location m_{θ,k} and weight p_{θ,k} of each atom. However, this approach does not yield a convex optimization formulation. For instance, it can be readily seen that {p_{θ,k}, m_{θ,k}} tuples that satisfy the conditions of Lemma 4 do not constitute a convex set. The formulation in (OPT) amounts to a change of variables that yields a convex program. Second, given an optimal solution to (OPT), the optimal distributions {G_θ} can be obtained straightforwardly (by placing an atom with weight p_{θ,k} at z_{θ,k}/p_{θ,k} for each k ∈ [K_θ]). Moreover, as discussed in Section 3.1, an optimal mechanism that induces these distributions can be obtained by constructing a laminar partition of the state space (by following the approach in Proposition 3 and Lemma 10). These observations imply our next proposition.

Proposition 4.
An optimal mechanism can be obtained by solving (OPT).

We next illustrate our results through a simple example. In this example, the receiver is a buyer who decides how many units of an indivisible good to purchase. He is privately informed about his type, which captures his taste for the good. The designer is a seller who controls information about the quality of the good, captured by the state. We assume that prices are linear in consumption, with a fixed price per unit of the good. The utility the buyer derives from the k-th good is proportional to θ + ω and decreases linearly in k. As a consequence, the buyer never wants to consume more than 5 units of the good; his marginal utility of consumption decreases linearly in the number of goods and increases in the good's quality ω and in his taste parameter θ. The quality of the good is distributed uniformly in [0, 1] and the buyer derives either a low (θ = 0.3), an intermediate (θ = 0.45), or a high (θ = 0.6) value from the good. The seller designs a menu of signals, tailored to the private type θ of the buyer, to maximize the (expected) number of units sold.

The indirect utility v∗(m, θ) of the receiver is displayed in Figure 2.

[Figure 2: The indirect utility of the receiver, for θ ∈ {0.3, 0.45, 0.6}.]

When the expected quality m of the good is low, all types find it optimal to purchase zero units, yielding a payoff of zero. As the expected quality improves, the purchase quantity increases. In Figure 2, the curve for each type is piecewise linear, and the break-points of each curve correspond to the posterior mean levels where the receiver increases her purchase quantity. Since the state (and hence the posterior mean) belongs to [0, 1], each type purchases one of finitely many quantities, and hence the sender's problem can be formulated and solved using the finite-dimensional convex program of Section 3.2. We solve this optimization formulation and construct the laminar partitional signal as discussed at the end of Section 3.1. The resulting optimal menu is given in Figure 3.

[Figure 3: The optimal menu of laminar partitional signals for the three types.]

Given the partition element revealed by the signal, the buyer finds it optimal to purchase the corresponding number of units. Under the optimal menu, the expected purchase quantity increases with the type. For instance, the high type purchases two units in the states where the medium type purchases only one unit, which in turn leads to a higher expected purchase. Similarly, when the low type purchases zero units, the medium type purchases zero or one units; and the set of states where the medium type purchases two units is larger than that for the low type.

On the other hand, the purchase quantities of different types are not ordered for each state. For instance, there are states where the low and the high types purchase two units, and the medium type purchases one unit. Note that this implies that the purchase regions of buyers are not "nested" in the sense of Guo and Shmaya (2019). Moreover, low and medium types may end up purchasing lower quantities in some high states than they do in lower states. In fact, under the optimal mechanism, for the best and the worst states, the low type purchases zero units. Thus, in the optimal mechanism, the low and medium types of the buyer sometimes consume a smaller quantity of the good if it is of higher quality. This (maybe counterintuitive) feature of the optimal mechanism is a consequence of the incentive constraints: By pooling some high states with low states, one makes it less appealing for the high type to deviate and observe the signal of a lower type.
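The qualitative shape of Figure 2 can be reproduced with a short computation. Since the exact per-unit utility and price are garbled in the source text, the sketch below assumes, purely as hypothetical stand-ins, utility (θ + ω)·max{5 − k, 0} from the k-th unit and a per-unit price of 5/2; under these assumptions the features described above (no purchases at low posterior means, quantities increasing in m and θ, piecewise linear convex indirect utility) emerge:

```python
# Hypothetical reconstruction of the buyer's problem (the exact per-unit
# utility and the price are garbled in the source; the values below are
# illustrative assumptions only).

PRICE = 2.5   # assumed per-unit price

def quantity(m, theta):
    """Optimal purchase quantity at posterior mean m: keep buying while
    the k-th unit's expected marginal utility covers the price."""
    q = 0
    for k in range(1, 6):
        if (theta + m) * (5 - k) >= PRICE:   # assumed (θ+ω)·max{5−k, 0}
            q += 1
        else:
            break
    return q

def indirect_utility(m, theta):
    """Piecewise linear and convex in m, as in Figure 2."""
    return sum((theta + m) * (5 - k) - PRICE
               for k in range(1, quantity(m, theta) + 1))

for theta in (0.3, 0.45, 0.6):
    print(theta, [quantity(m, theta) for m in (0.0, 0.25, 0.5, 0.75, 1.0)])
```

With these assumed parameters, every type buys zero units at m = 0 and the purchase quantity is nondecreasing in both m and θ, mirroring the ordering of the curves in Figure 2; the actual numbers in the paper's figures may differ.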
Relation to Public Revelation Results in the Binary Action Case
In the case of binary actions (and under some assumptions on the payoff structure), Kolotilin et al. (2017) and Guo and Shmaya (2019) establish that the optimal mechanism admits a "public" implementation. In this case, an optimal menu can be obtained as in Section 3.2. For each type, the corresponding laminar partitional signal induces one action in a subinterval of the state space, and the other action in the complement of this interval. It can be shown that the mechanism that reveals the signal realizations associated with different types to all receiver types is still optimal. Thus, as opposed to first eliciting types and then sharing with each type the realization of the signal associated with this type, the sender can achieve the optimal outcome by sharing a signal (which encodes the information of the signals of all types) publicly with all receiver types.

By contrast, the mechanism illustrated in Figure 3 does not admit a public implementation. For instance, under this mechanism the high type purchases two units whenever the state realization exceeds the corresponding cutoff. Suppose that this type of receiver had access to the signals of, for instance, the low type as well. Then, he could refine this event and infer whether the state lies in a subset on which the conditional expectation of the state is lower. (In the figures, the cutoffs are reported after rounding; e.g., the cutoff for the high type is approximately at 0.06. For sake of exposition, in our discussion we stick to the rounded values.) This implies that the expected payoff of the high type from the second unit, conditional on this refined event, is negative. Thus, for state realizations that belong to the aforementioned set, the high type finds it optimal to strictly reduce her consumption (relative to the one in Figure 3). In other words, observing the additional signal reduces the expected purchase of the high type (and the other types). Hence, such a public implementation is strictly suboptimal. As a side note, the optimal public implementation can be obtained by replacing different types with a single "representative type" and using the framework of Section 3. This amounts to replacing the sender's payoff with Σ_θ g_θ max_{a ∈ A⋆(m,θ)} v(a, m, θ) and removing the incentive compatibility constraints in the optimization formulation of Section 3. We conducted this exercise and also verified that restricting attention to public mechanisms yields a strictly lower expected payoff to the sender.

Which Incentive Constraints Bind?
Finally, given the mechanism, it is straightforward to check which incentive compatibility conditions are binding under the mechanism of Figure 3. Both the medium and the high type are indifferent among reporting their types as low, medium, or high. Similarly, the low type is indifferent between reporting her type as low or medium, but achieves a strictly lower payoff from reporting her type as high. Interestingly, these observations imply that, unlike in classical mechanism design settings, "non-local" incentive constraints might bind under the optimal mechanism. The effect of the incentive compatibility constraints on the optimal mechanism is easily seen from the figure. For instance, the high type's payoff from a truthful type report is strictly positive. If this were the only relevant type, the sender could choose a strictly smaller threshold than 0.06 and still ensure purchase of two units, thereby increasing the expected purchase amount of the high type. However, when the other types are also present, such a change in the signal of the high type incentivizes this type to deviate and misreport his type as low or medium. Changing the signals of the remaining types to recover incentive compatibility reduces the payoff the sender derives from them. The mechanism in Figure 3 maximizes the sender's payoff while carefully satisfying such incentive compatibility constraints.
Our results can be easily extended along various dimensions. Below we discuss a few immediate extensions.

Type-Dependent Participation Constraints
In our analysis we can allow each type of the receiver to face a participation constraint, which means that the mechanism must leave that type of agent with at least some given expected utility. Our analysis and results carry over to this case unchanged, as (5) already encodes such an endogenous constraint capturing the value of deviating by observing the signal meant for another type. To adjust the result for this case, one just needs to replace c_θ by the lower bound on the agent's utility whenever this utility is larger than c_θ.

Competition with Multiple Senders
Another application of our approach is to competition among multiple senders. Suppose that each sender offers a menu of signals and the receiver can choose one of them to observe. Each sender receives a higher payoff if the receiver chooses her menu. Again, the sender has to ensure that the signal she provides each type with yields a sufficiently high utility, so that this type does not prefer to observe either another signal of the same sender or a signal provided by a different sender. This situation corresponds to an endogenous type-dependent participation constraint which is determined in equilibrium. As our analysis works for any given participation constraint, it carries over to this case. (Another plausible model of competition is that the receiver can observe all the signals sent by different senders. For an analysis of this situation see Gentzkow and Kamenica (2016a).)

Appendix

Lemma 6.
Suppose u_i : [0, 1] → ℝ is a continuous function for i ∈ {1, . . . , n}. The set of distributions G : [0, 1] → [0, 1] that satisfy G ⪰ F and

∫ u_i(s) dG(s) ≥ 0 for i ∈ {1, . . . , n} (15)

is compact in the weak topology.

Proof. First, note that as each u_i is continuous it is bounded on [0, 1]. Consider a sequence of distributions G_i, i ∈ {1, 2, . . .} that satisfies the above constraints. By Helly's selection theorem there exists a subsequence that converges pointwise. From now on assume that (G_i) is such a subsequence and denote by G_∞ the right-continuous representation of its pointwise limit. Thus, any sequence of random variables m_i such that m_i ∼ G_i converges in distribution to a random variable distributed according to G_∞. As the (u_k) are continuous and bounded, this implies that for all k

lim_{i→∞} ∫ u_k(s) dG_i(s) = ∫ u_k(s) dG_∞(s).

Furthermore, for all x ∈ [0,
1]

lim_{i→∞} ∫_x^1 G_i(s) ds = ∫_x^1 G_∞(s) ds,

and hence G_∞ also satisfies G_∞ ⪰ F. Thus, the set of distributions given in the statement of the lemma is compact with respect to the weak topology.

Lemma 7.
Let F, G : [0, 1] → [0, 1] be CDFs and let F be continuous. Suppose that G is a mean-preserving contraction of F and for some x ∈ [0, 1]

∫_x^1 F(s) ds = ∫_x^1 G(s) ds.

Then F(x) = G(x). Furthermore, G is continuous at x.

Proof. Define the function L : [0, 1] → ℝ as

L(z) = ∫_z^1 F(s) − G(s) ds.

As G is a mean-preserving contraction of F, we have that L(z) ≤ 0 for all z ∈ [0, 1], and by assumption L(x) = 0. By definition L is absolutely continuous and has a weak derivative, which we denote by L′. As F is continuous, L′(z) = G(z) − F(z) almost everywhere and L′ has only upward jumps. For L to have a maximum at x we need that lim_{z↗x} L′(z) ≥ 0 and lim_{z↘x} L′(z) ≤ 0. This implies that

lim_{z↘x} G(z) − F(z) ≤ 0 ≤ lim_{z↗x} G(z) − F(z).

In turn, this implies that lim_{z↘x} G(z) ≤ lim_{z↗x} G(z). As G is a CDF it is non-decreasing, and thus G is continuous at x. Consequently, L is continuously differentiable at x and, as L admits a maximum at x, we have that 0 = L′(x) = G(x) − F(x).

Lemma 8.
Fix an interval [a, b] ⊆ [0, 1], c ∈ ℝ, an upper-semicontinuous v : [0, 1] → [0, 1], and continuous ũ_1, . . . , ũ_n : [0, 1] → ℝ, and consider the problem

max_G̃ ∫ v(s) dG̃(s) (16)

subject to

∫ ũ_i(s) dG̃(s) ≥ 0 for i ∈ {1, . . . , n} (17)

∫_a^b G̃(s) ds = c (18)

∫_{[a,b]} dG̃(s) = 1. (19)

If the set of distributions that satisfy (17)-(19) is non-empty, then there exists a solution to the above optimization problem that is supported on at most n + 2 points.

Proof. Consider the set of distributions that assign probability 1 to the set [a, b]. The extreme points of this set are the Dirac measures in [a, b]. Let D be the set of distributions which satisfy (17)-(18) and are supported on [a, b]. By Theorem 2.1 in Winkler (1988), each extreme point of the set D is the sum of at most n + 2 mass points, as (17) and (18) specify n + 1 constraints. Note that the set of distributions satisfying (17)-(19) is compact. As v is upper-semicontinuous, the function G̃ ↦ ∫ v(s) dG̃(s) is upper-semicontinuous and linear. Thus, by Bauer's maximum principle (see for example Result 7.69 in Aliprantis and Border 2013) there exists a maximizer at an extreme point of D, which establishes the result.

Lemma 9.
Suppose that H, G are distributions that assign probability 1 to [a, b]. Let M be an absolutely continuous function such that ∫_x^b G(s) ds > M(x) for all x ∈ [a, b], and

∫_x̂^b H(y) dy < M(x̂) for some x̂ ∈ [a, b].

Then, there exists λ ∈ (0, 1) such that for all x ∈ [a, b]

∫_x^b (1 − λ)G(s) + λH(s) ds ≥ M(x),

with equality for some x ∈ [a, b].

Proof. Define

L_λ(x) = ∫_x^b (1 − λ)G(y) + λH(y) dy − M(x),

and φ(λ) = min_{z ∈ [a,b]} L_λ(z). As M is continuous, by the assumptions of the lemma we have that

φ(0) = min_{x ∈ [a,b]} L_0(x) = min_{x ∈ [a,b]} [ ∫_x^b G(s) ds − M(x) ] > 0,

φ(1) = min_{x ∈ [a,b]} L_1(x) = min_{x ∈ [a,b]} [ ∫_x^b H(s) ds − M(x) ] ≤ ∫_x̂^b H(s) ds − M(x̂) < 0.

Furthermore,

| ∂L_λ(z)/∂λ | = | ∫_z^b H(s) − G(s) ds | ≤ b − a.

Hence, λ ↦ L_λ(z) is uniformly Lipschitz continuous, and the envelope theorem thus implies that φ is Lipschitz continuous. As φ(0) > 0 and φ(1) < 0, there exists λ∗ ∈ (0, 1) such that φ(λ∗) = 0. Consequently, for all x ∈ [a, b]

∫_x^b (1 − λ∗)G(s) + λ∗H(s) ds ≥ M(x),

with equality for some x ∈ [a, b]. This completes the proof.
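The intermediate-value argument in Lemma 9 is constructive enough to locate λ∗ by bisection. The sketch below uses hypothetical choices of G, H, and M (not objects from the paper) satisfying the lemma's hypotheses on [a, b] = [0, 1]:

```python
# Bisection for the λ* of Lemma 9, with hypothetical choices:
# G = point mass at 0 (CDF ≡ 1 on [0, 1]), H = uniform (CDF(s) = s),
# and M(x) = 0.8*(1 - x) - 0.01, on [a, b] = [0, 1].

def tail_G(x):        # ∫_x^1 G(s) ds for G(s) ≡ 1
    return 1.0 - x

def tail_H(x):        # ∫_x^1 H(s) ds for H(s) = s
    return (1.0 - x * x) / 2.0

def M(x):
    return 0.8 * (1.0 - x) - 0.01

def phi(lam, grid=401):
    """phi(λ) = min_x  ∫_x^1 [(1-λ)G + λH](s) ds − M(x), over a grid."""
    xs = [i / (grid - 1) for i in range(grid)]
    return min((1 - lam) * tail_G(x) + lam * tail_H(x) - M(x) for x in xs)

lo, hi = 0.0, 1.0     # phi(0) > 0 > phi(1), and phi is continuous
for _ in range(60):
    mid = (lo + hi) / 2.0
    if phi(mid) > 0:
        lo = mid
    else:
        hi = mid
lam_star = (lo + hi) / 2.0
print(round(lam_star, 4))   # -> 0.42 (here λ* solves 0.21 − λ/2 = 0)
```

For these particular choices the minimum of L_λ is attained at an end point of [0, 1], so the grid evaluation is exact and the bisection recovers λ∗ = 0.42 in closed agreement with a hand computation.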
As the set of feasible distributions is compact with respect to the weak topology by Lemma 6, and the function G ↦ ∫ v∗(s) dG(s) is upper-semicontinuous in the weak topology, the optimization problem (7) admits a solution.

Let G be a solution to the optimization problem (7) and denote by B_G the set of points where the mean-preserving contraction (MPC) constraint is binding, i.e.,

B_G = { z ∈ [0, 1] : ∫_z^1 F(s) ds = ∫_z^1 G(s) ds }. (20)

Suppose that this solution is maximal in the sense that there does not exist another solution G′ for which the set of points where the MPC constraint binds is larger, i.e., B_G ⊂ B_{G′} (where B_{G′} is defined as in (20) after replacing G with G′). The existence of such a maximal optimal solution follows from Zorn's Lemma (see for example Section 1.12 in Aliprantis and Border 2013).

Fix a point x ∉ B_G. We define (a, b) to be the largest interval containing x on which the mean-preserving contraction constraint does not bind for the solution G, i.e.,

a = max{ z ≤ x : z ∈ B_G },  b = min{ z ≥ x : z ∈ B_G }.

Fix any â, b̂ with a < â and b̂ < b, and consider the interval [â, b̂] ⊂ [a, b]. We define G_{[â,b̂]} : [0, 1] → [0, 1] to be the CDF of a random variable that is distributed according to G conditional on the realization being in the interval [â, b̂]:

G_{[â,b̂]}(z) = (G(z) − G(â−)) / (G(b̂) − G(â−)),

where G(â−) = lim_{s↗â} G(s). We note that G_{[â,b̂]} is non-decreasing, right-continuous, and satisfies G_{[â,b̂]}(b̂) = 1. Thus, it is a well-defined cumulative distribution supported on [â, b̂]. As G is feasible, we get that

∫_â^b̂ u_k(s) dG_{[â,b̂]}(s) + (1/(G(b̂) − G(â−))) ∫_{[0,1]\[â,b̂]} u_k(s) dG(s) ≥ 0 for k ∈ {1, . . . , n}. (21)

To simplify notation we define the functions ũ_1, . . . , ũ_n, where for all k

ũ_k(z) = u_k(z) + (1/(G(b̂) − G(â−))) ∫_{[0,1]\[â,b̂]} u_k(y) dG(y). (22)

Note that using this notation (21) can be restated as:

∫ ũ_k(s) dG_{[â,b̂]}(s) ≥ 0 for k ∈ {1, . . . , n}. (23)

As G satisfies the mean-preserving contraction constraint relative to F, using the fact that a < â and b̂ < b, for z ∈ [â, b̂] we obtain:

∫_z^b̂ G_{[â,b̂]}(s) ds > (1/(G(b̂) − G(â−))) [ ∫_z^1 F(s) ds − ∫_b̂^1 G(s) ds − (b̂ − z) G(â−) ] = M(z). (24)

Consider now the maximization problem over distributions supported on [â, b̂] that satisfy the constraints derived above (after replacing the strict inequality in (24) with a weak inequality) and maximize the original objective:

max_H ∫ v(s) dH(s)

subject to

∫ ũ_i(s) dH(s) ≥ 0 for i ∈ {1, . . . , n},

∫_z^b̂ H(s) ds ≥ M(z) for z ∈ [â, b̂],

∫_{[â,b̂]} dH(s) = 1. (25)

By (23) and (24) the conditional CDF G_{[â,b̂]} is feasible in the problem above. We claim that it is also optimal. Suppose, towards a contradiction, that there exists a CDF H that is feasible in (25) and achieves a strictly higher value than G_{[â,b̂]}.
Consider the CDF

K(z) = { G(z) if z ∈ [0, 1] \ [â, b̂];  G(â−) + H(z)(G(b̂) − G(â−)) if z ∈ [â, b̂] },

which equals G outside the interval [â, b̂] and H conditional on being in [â, b̂]. Using (22), the definition of M(z), and the feasibility of H in (25), it can be readily verified that this CDF is feasible in the original problem (7). Moreover, it achieves a higher value than G, since H achieves a strictly higher value than G_{[â,b̂]} in (25). However, this leads to a contradiction to the optimality of G in (7), thereby implying that G_{[â,b̂]} is optimal in (25).

Next, we establish that there cannot exist an optimal solution H to the problem (25) where for some z ∈ (â, b̂)

∫_z^b̂ H(s) ds = M(z). (26)

Suppose such an optimal solution exists. Then, K would be an optimal solution to the original problem satisfying z ∈ B_K ⊃ B_G, where B_K is defined as in (20) (after replacing G with K) and is the set of points where the mean-preserving contraction constraint binds. However, this contradicts that G is a solution to the original problem that is maximal (in terms of the set where the MPC constraints bind).

We next consider a relaxed version of the optimization problem (25) where we replace the second constraint of (25) with a constraint that ensures that H has the same mean as G_{[â,b̂]}:

max_H ∫ v(s) dH(s)

subject to

∫ ũ_i(s) dH(s) ≥ 0 for i ∈ {1, . . . , n},

∫_â^b̂ H(s) ds = ∫_â^b̂ G_{[â,b̂]}(s) ds,

∫_{[â,b̂]} dH(s) = 1.

By Lemma 8 there exists a solution J to this relaxed problem that is the sum of at most n + 2 mass points. Since G_{[â,b̂]} is feasible in this problem, it readily follows that

∫ v(s) dJ(s) ≥ ∫ v(s) dG_{[â,b̂]}(s). (27)

Suppose, towards a contradiction, that there exists z ∈ [â, b̂] such that

∫_z^b̂ J(s) ds < M(z). (28)

Then, by Lemma 9, there exists some λ ∈ (0,
1) such that (1 − λ ) G [ˆ a, ˆ b ] + λJ satisfies (cid:90) ˆ br (1 − λ ) G [ˆ a, ˆ b ] ( s ) + λJ ( s ) ds ≥ M ( r ) , (29)for all r ∈ [ˆ a, ˆ b ], and the inequality holds with equality for some r ∈ [ˆ a, ˆ b ]. This impliesthat (1 − λ ) G [ˆ a, ˆ b ] + λJ is feasible for the problem (25). Furthermore, by the linearity of theobjective, (27), and the optimality of G [ˆ a, ˆ b ] in (25), it follows that (1 − λ ) G [ˆ a, ˆ b ] + λJ is alsooptimal in (25). However, this leads to a contradiction to the fact that (25) does not admitan optimal solution where the equality in (26) holds for some z ∈ [ˆ a, ˆ b ] ⊂ [ a, b ].Consequently, the inequality (28) cannot hold, and J must be feasible in problem (25).28ogether with (27) this implies that J is an optimal solution to (25). that assigns mass toonly n + 2 points in the interval [ˆ a, ˆ b ]. This implies that the CDF G ( z ) if z ∈ [0 , \ [ˆ a, ˆ b ] G (ˆ a − ) + J ( z )( G (ˆ b ) − G (ˆ a − )) if z ∈ [ˆ a, ˆ b ] (30)is a solution of the original problem that assigns mass to only n + 2 points in the interval[ˆ a, ˆ b ]. By setting ˆ a = a + r and ˆ b = b − r we can thus find a sequence of solutions ( H r )to (7) that each have at most n + 2 mass points in the interval [ a + r , b − r ]. As the setof feasible distributions is closed and the objective function is upper-semicontinuous thissequence admits a limit point H ∞ which itself is optimal in (7). This limit distributionconsists of at most n + 2 mass-points in the interval ( a, b ). Furthermore, by definition of a, b and our construction in (30) each solution H r and hence H ∞ satisfies the MPC constraintwith equality at { a, b } . Thus, Lemma 7 implies that H ∞ is continuous at these points, and H ∞ ( a ) = F ( a ) and H ∞ ( b ) = F ( b ).Hence, we have established that for every solution G for which B G is maximal, either x ∈ B G which by Lemma 7 implies that G ( x ) = F ( x ). 
Or x ∉ B_G, and then one can find a new solution G̃ such that (i) G̃ has at most n + 2 mass points in the interval (a, b) with a = max{z ≤ x : z ∈ B_G} and b = min{z ≥ x : z ∈ B_G}, (ii) G̃(a) = F(a) and G̃(b) = F(b), which implies that the mass inside the interval [a, b] is preserved, and (iii) G̃ matches G outside (a, b). Since every such interval contains a rational number, there can be at most countably many such intervals. Proceeding inductively, the claim follows.

To establish Proposition 3, we make use of the partition lemma, stated next:

Lemma 10 (Partition Lemma). Suppose that distributions
F, G are such that ∫_x^b G(t) dt ≥ ∫_x^b F(t) dt for x ∈ I = [a, b], where the inequality holds with equality only for the end points of I. Suppose further that G(a) = F(a) and G(x) = G(a) + Σ_{r=1}^K p_r 1{x ≥ m_r} for x ∈ I, where Σ_{r=1}^K p_r = F(b) − F(a), {m_r} is a strictly increasing collection in r, and m_r ∈ I for r ∈ [K]. There exists a collection of intervals {J_r}_{r ∈ [K]} such that {P_k} = {J_k \ ∪_{ℓ ∈ [K]: ℓ > k} J_ℓ} is a laminar partition, which satisfies:

(a) J_1 = I, and if K > 1, then F(inf J_1) < F(inf J_K) < F(sup J_K) < F(sup J_1);
(b) ∫_{P_k} dF(x) = p_k for all k ∈ [K];
(c) ∫_{P_k} x dF(x) = p_k m_k for all k ∈ [K].

Proof of Proposition 3. The proof of the lemma is given after the proof of the proposition. By definition, the interval I_j in the statement of Proposition 3 satisfies the conditions of this lemma (after setting a = a_j, b = b_j). The lemma defines auxiliary intervals {J_r} and explicitly constructs a laminar partition that satisfies conditions (a)-(c). Here, conditions (b) and (c) readily imply that the constructed laminar partition satisfies the claim in Proposition 3, concluding the proof.

Proof of Lemma 10.
We prove the claim by induction on K. Note that when K = 1 we have J_1 = P_1 = I, which readily implies properties (a) and (b). In addition, the definition of p_1, m_1 implies that

G(b)b − G(a)a − p_1 m_1 = G(a)(b − a) + p_1(b − m_1) = ∫_a^b G(t) dt = ∫_a^b F(t) dt = F(b)b − F(a)a − ∫_I t dF(t) = G(b)b − G(a)a − ∫_{P_1} t dF(t). (31)

Hence, property (c) also follows. We proceed by considering two cases: K = 2 and K > 2.

K = 2: Let t_1, t_2 ∈ I be such that F(t_1) − F(a) = F(b) − F(t_2) = p_1. Observe that since ∫_x^b G(t) dt ≥ ∫_x^b F(t) dt for x ∈ I and this inequality holds with equality only at the end points of I, we have (i) ∫_a^{t_1} F(x) dx > ∫_a^{t_1} G(x) dx and (ii) ∫_{t_2}^b F(x) dx < ∫_{t_2}^b G(x) dx. Using the first inequality and the definition of G we obtain:

p_1(t_1 − m_1)^+ + G(a)(t_1 − a) ≤ ∫_a^{t_1} G(x) dx < ∫_a^{t_1} F(x) dx = F(t_1)t_1 − F(a)a − ∫_a^{t_1} x dF(x) = (G(a) + p_1)t_1 − G(a)a − ∫_a^{t_1} x dF(x). (32)

Rearranging the terms, this yields

p_1 m_1 ≥ p_1 t_1 − p_1(t_1 − m_1)^+ > ∫_a^{t_1} x dF(x). (33)

Similarly, using (ii) and the definition of G we obtain:

G(b)(b − t_2) − p_2(m_2 − t_2)^+ ≥ ∫_{t_2}^b G(x) dx > ∫_{t_2}^b F(x) dx = F(b)b − F(t_2)t_2 − ∫_{t_2}^b x dF(x) = G(b)b − (G(b) − p_1)t_2 − ∫_{t_2}^b x dF(x). (34)

Rearranging the terms, this yields

p_1 m_1 ≤ p_1 t_2 + p_2(m_2 − t_2)^+ < ∫_{t_2}^b x dF(x). (35)

Combining (33) and (35) with the fact that F(t_1) − F(a) = F(b) − F(t_2) = p_1 implies that there exist t̂_1, t̂_2 ∈ int(I) satisfying t̂_1 < t̂_2 such that

F(t̂_1) − F(a) + F(b) − F(t̂_2) = p_1 and ∫_a^{t̂_1} x dF(x) + ∫_{t̂_2}^b x dF(x) = p_1 m_1. (36)

Note that

(b − a)G(a) + (b − m_1)p_1 + (b − m_2)p_2 = ∫_a^b G(x) dx = ∫_a^b F(x) dx = bF(b) − aF(a) − ∫_a^b x dF(x) = bG(b) − aG(a) − ∫_a^b x dF(x).
Since p_1 + p_2 = G(b) − G(a), this in turn implies that

∫_a^b x dF(x) = p_1 m_1 + p_2 m_2.

Combining this observation with (36), we conclude that

∫_{t̂_1}^{t̂_2} x dF(x) = p_2 m_2. (37)

Let J_2 = [t̂_1, t̂_2] and J_1 = I, and define P_1, P_2 as in the statement of the lemma. Observe that this construction immediately satisfies (a) and (b). Moreover, (c) also follows from (36) and (37). Thus, the claim holds when K = 2.
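For a concrete instance of the K = 2 construction, suppose (hypothetically) that F is uniform on I = [a, b] = [0, 1], so that a subinterval with F-probability p and conditional mean m is simply [m − p/2, m + p/2]. The sketch below builds J_2 for the larger atom of a made-up two-atom G and checks that the complement P_1 = I \ J_2 carries the remaining mass and mean:

```python
# K = 2 base case of the partition lemma, assuming hypothetically that
# F is uniform on I = [0, 1].  G places mass p1 at m1 and p2 at m2 with
# p1 + p2 = 1 and p1*m1 + p2*m2 = 1/2 (the mean of F).

def peel_interval(p2, m2):
    """J_2: subinterval of [0, 1] with F-probability p2 and conditional
    mean m2; for uniform F it is centered at m2 with length p2."""
    return (m2 - p2 / 2.0, m2 + p2 / 2.0)

def complement_stats(lo, hi):
    """F-probability and conditional mean of P_1 = [0, 1] minus [lo, hi]
    under uniform F."""
    mass = 1.0 - (hi - lo)
    # total mean of F is 1/2; subtract the removed interval's contribution
    mean = (0.5 - (hi - lo) * (lo + hi) / 2.0) / mass
    return mass, mean

p1, m1, p2, m2 = 0.4, 0.25, 0.6, 2.0 / 3.0   # made-up two-atom G
lo, hi = peel_interval(p2, m2)
mass, mean = complement_stats(lo, hi)
print((lo, hi), (mass, mean))   # P_1 recovers (p1, m1) = (0.4, 0.25)
```

Revealing whether ω lies in J_2 or in P_1 then induces exactly the two posterior means m_2 and m_1 with probabilities p_2 and p_1, and {J_1, J_2} is laminar, matching conditions (a)-(c).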
K > 2: Suppose that K > 2 and that the induction hypothesis holds for any K′ ≤ K − 1. Set p̂_2 = p_K, m̂_2 = m_K; and p̂_1 = Σ_{k ∈ [K−1]} p_k, m̂_1 = (1/p̂_1) Σ_{k ∈ [K−1]} p_k m_k. Define a distribution Ĝ such that Ĝ(x) = G(x) for x ∉ I, Ĝ(a) = F(a), and Ĝ(x) = Ĝ(a) + Σ_{r=1}^2 p̂_r 1{x ≥ m̂_r} for x ∈ I. This construction ensures that p̂_1 + p̂_2 = F(b) − F(a) and m̂_2 > m̂_1. Moreover, G is a mean-preserving spread of Ĝ, and hence ∫_x^b Ĝ(t) dt ≥ ∫_x^b G(t) dt. Since Ĝ(x) = G(x) for x ∉ I, this in turn implies that ∫_x^b Ĝ(t) dt ≥ ∫_x^b F(t) dt for x ∈ I, where the inequality holds with equality only for the end points of I. Thus, the assumptions of the lemma hold for Ĝ and F, and using the induction hypothesis for K′ = 2, we conclude that there exist intervals Ĵ_1, Ĵ_2 and sets P̂_1 = Ĵ_1 \ Ĵ_2, P̂_2 = Ĵ_2, such that

(â) I = Ĵ_1 ⊃ Ĵ_2, and F(inf Ĵ_1) < F(inf Ĵ_2) < F(sup Ĵ_2) < F(sup Ĵ_1);
(b̂) ∫_{P̂_k} dF(x) = p̂_k for k ∈ {1, 2};
(ĉ) ∫_{P̂_k} x dF(x) = p̂_k m̂_k for all k ∈ {1, 2}.

Note that (b̂) and (ĉ) imply that m̂_2 ∈ Ĵ_2.

Denote by x_1, x_2 the end points of Ĵ_2 and let q_1 = F(x_1) > F(a), q_2 = F(x_2) < F(b). Define a cumulative distribution function F′(·) such that

F′(x) = { F(x)/(1 − p̂_2) for x ≤ x_1;  F(x_1)/(1 − p̂_2) for x_1 < x < x_2;  (F(x) − p̂_2)/(1 − p̂_2) for x_2 ≤ x }. (38)

Set p′_k = p_k/(1 − p̂_2) and m′_k = m_k for k ∈ [K − 1], and let G′ be such that G′(x) = G(x)/(1 − p̂_2) for x ∉ I, and G′(x) = G′(a) + Σ_{r ∈ [K−1]} p′_r 1{x ≥ m′_r} for x ∈ I.
Observe that by construction $G'(a) = F'(a)$, $\sum_{r \in [K-1]} p'_r = F'(b) - F'(a)$, and $\{m'_r\}$ is a strictly increasing collection in $r$, where $m'_r \in I$ and $m'_r < \hat m_2$ for $r \in [K-1]$. The next lemma establishes that $G'$ and $F'$ also satisfy the MPC constraints over $I$:

Lemma 11. $\int_x^1 G'(t)\,dt \ge \int_x^1 F'(t)\,dt$ for $x \in I$, where the inequality holds with equality only for the end points of $I$.

Proof. The definition of $G'$ implies that it can alternatively be expressed as follows:
$$G'(x) = \begin{cases} G(x)/(1 - \hat p_2) & \text{for } x < \hat m_2, \\ (G(x) - \hat p_2)/(1 - \hat p_2) & \text{for } x \ge \hat m_2. \end{cases} \quad (39)$$
Since $\int_b^1 G(t)\,dt = \int_b^1 F(t)\,dt$, (38) and (39) readily imply that $\int_b^1 G'(t)\,dt = \int_b^1 F'(t)\,dt$. Similarly, using these observations and (38) we have
$$\begin{aligned} (1 - \hat p_2)\int_a^1 F'(t)\,dt &= \int_a^1 F(t)\,dt - \int_{x_1}^{x_2} F(t)\,dt + F(x_1)(x_2 - x_1) - \hat p_2(1 - x_2) \\ &= \int_a^1 F(t)\,dt - F(x_2)x_2 + F(x_1)x_1 + \hat p_2 \hat m_2 + F(x_1)(x_2 - x_1) - \hat p_2(1 - x_2) \\ &= \int_a^1 G(t)\,dt - \hat p_2(1 - \hat m_2). \end{aligned} \quad (40)$$
Here, the second line rewrites $\int_{x_1}^{x_2} F(t)\,dt$ using integration by parts and leverages (ĉ). The third line uses the fact that $\hat p_2 = F(x_2) - F(x_1)$ and $\int_a^1 G(t)\,dt = \int_a^1 F(t)\,dt$. On the other hand, (39) readily implies that:
$$(1 - \hat p_2)\int_a^1 G'(t)\,dt = \int_a^1 G(t)\,dt - \hat p_2(1 - \hat m_2). \quad (41)$$
Together with (40), this equation implies that $\int_a^1 G'(t)\,dt = \int_a^1 F'(t)\,dt$. Thus, the inequality in the claim holds with equality for the end points of $I$.

Recall that $\hat m_2 \in \hat J_2$ and hence $a < x_1 \le \hat m_2 = m_K \le x_2 < b$. We complete the proof by focusing on the value $x$ takes in the following cases: (i) $a < x \le x_1$, (ii) $x_1 \le x \le \hat m_2$, (iii) $\hat m_2 \le x \le x_2$, (iv) $x_2 \le x < b$.

Case (i):
Using the observations $\int_x^1 G(t)\,dt > \int_x^1 F(t)\,dt$ and $\int_a^1 G(t)\,dt = \int_a^1 F(t)\,dt$ together with (38) and (39) yields
$$\int_a^x G'(t)\,dt = \frac{1}{1 - \hat p_2}\int_a^x G(t)\,dt < \frac{1}{1 - \hat p_2}\int_a^x F(t)\,dt = \int_a^x F'(t)\,dt. \quad (42)$$
Together with the fact that $\int_a^1 G'(t)\,dt = \int_a^1 F'(t)\,dt$, this implies that $\int_x^1 G'(t)\,dt > \int_x^1 F'(t)\,dt$ in case (i).

Case (ii):
Using these observations together with (38) and (39), we obtain:
$$(1 - \hat p_2)\int_x^1 \big(G'(t) - F'(t)\big)\,dt = \int_x^1 G(t)\,dt - (1 - \hat m_2)\hat p_2 - \int_{x_2}^1 F(t)\,dt - \int_x^{x_2} F(x_1)\,dt + (1 - x_2)\hat p_2.$$
Since $G$ is an increasing function, it can be seen that the right hand side is a concave function of $x$. Thus, for $x \in [x_1, \hat m_2]$ this expression is minimized at $x = x_1$ or $x = \hat m_2$. For $x = x_1$, case (i) implies that the expression is non-negative. We next argue that for $x = \hat m_2$ the expression remains non-negative. This in turn implies that $\int_x^1 (G'(t) - F'(t))\,dt \ge 0$ for $x \in [x_1, \hat m_2]$, as claimed.

Setting $x = \hat m_2$, recalling that $\int_b^1 G(t)\,dt = \int_b^1 F(t)\,dt$, and observing that $G(t) = G(b) = F(b)$ for $t \in [\hat m_2, b]$, the right hand side of the previous equation reduces to:
$$\begin{aligned} R := {} & (b - \hat m_2)F(b) - (1 - \hat m_2)\hat p_2 - \int_{x_2}^b F(t)\,dt - (x_2 - \hat m_2)F(x_1) + (1 - x_2)\hat p_2 \\ = {} & (b - \hat m_2)F(b) - \int_{x_2}^b F(t)\,dt - (x_2 - \hat m_2)F(x_1) - (x_2 - \hat m_2)\hat p_2 \\ = {} & (b - x_2)F(b) - \int_{x_2}^b F(t)\,dt + (x_2 - \hat m_2)\big(F(b) - F(x_1) - \hat p_2\big). \end{aligned} \quad (43)$$
Since $F(b) \ge F(x_2) = \hat p_2 + F(x_1)$, we conclude:
$$R \ge (b - x_2)F(b) - \int_{x_2}^b F(t)\,dt \ge 0, \quad (44)$$
where the last inequality applies since $F$ is weakly increasing. Thus, we conclude that $\int_{\hat m_2}^1 (G'(t) - F'(t))\,dt \ge 0$, and the claim follows.
Case (iii):
First observe that (38) and (39) imply that
$$(1 - \hat p_2)\int_x^1 \big(G'(t) - F'(t)\big)\,dt = \int_x^1 G(t)\,dt - (1 - x)\hat p_2 - \int_{x_2}^1 F(t)\,dt - \int_x^{x_2} F(x_1)\,dt + (1 - x_2)\hat p_2.$$
Similar to case (ii), the right hand side is a concave function of $x$. Thus, for $x \in [\hat m_2, x_2]$ this expression is minimized at $x = \hat m_2$ or $x = x_2$. When $x = \hat m_2$, case (ii) implies that $\int_x^1 (G'(t) - F'(t))\,dt \ge 0$. Similarly, when $x = x_2$, case (iv) implies that $\int_x^1 (G'(t) - F'(t))\,dt \ge 0$. Thus, it follows that $\int_x^1 (G'(t) - F'(t))\,dt \ge 0$ for $x \in [\hat m_2, x_2]$.

Case (iv):
In this case, (38) and (39) readily imply that
$$(1 - \hat p_2)\int_x^1 \big(G'(t) - F'(t)\big)\,dt = \int_x^1 \big(G(t) - F(t)\big)\,dt > 0,$$
where the inequality follows since the assumptions of the lemma hold for $F$ and $G$.

Summarizing, we have established that the distributions $G'$ and $F'$ satisfy the conditions of the lemma. By the induction hypothesis, there exist intervals $\{J'_k\}_{k \in [K-1]}$ and sets $P'_k = J'_k \setminus \cup_{\ell \in [K-1] \mid \ell > k} J'_\ell$ for all $k \in [K-1]$ such that:
(a') $J'_1 = I$, and $F'(\inf J'_1) < F'(\inf J'_{K-1}) < F'(\sup J'_{K-1}) < F'(\sup J'_1)$;
(b') $\int_{P'_k} dF'(x) = p'_k$ for all $k \in [K-1]$;
(c') $\int_{P'_k} x\,dF'(x) = p'_k m'_k$ for all $k \in [K-1]$.

Let $J_k = J'_k \setminus \hat J_2$ for $k \in [K-1]$ such that $\hat J_2 \not\subset J'_k$, and $J_k = J'_k$ for the remaining $k \in [K-1]$. Additionally, let $J_K = \hat J_2 = [x_1, x_2]$. For $k \in [K]$, let $P_k = J_k \setminus \cup_{\ell \in [K] \mid \ell > k} J_\ell$. Note that the definition of the collection $\{P_k\}_{k \in [K]}$ implies that it constitutes a laminar partition of $I$. Observe that the construction of $\{J_k\}_{k \in [K]}$ and (â), (a') imply that these intervals also satisfy condition (a) of the lemma.

Note that by construction we have $P_k \subset P'_k \subset P_k \cup J_K$ and $P_k \cap J_K = \emptyset$ for $k \in [K-1]$. Since $\int_{J_K} dF'(t) = 0$ by (38), this observation implies that
$$\int_{P'_k} dF'(t) = \int_{P_k} dF'(t) \quad \text{for } k \in [K-1]. \quad (45)$$
Using (b') and (45), we obtain
$$\int_{P_k} dF(t) = (1 - \hat p_2)\int_{P_k} dF'(t) = (1 - \hat p_2)\int_{P'_k} dF'(t) = (1 - \hat p_2)\, p'_k = p_k,$$
for $k \in [K-1]$. Moreover, using (b̂) we have $\int_{P_K} dF(t) = \int_{\hat P_2} dF(t) = \hat p_2 = p_K$.

Finally, observe that by (ĉ) we have $\int_{P_K} t\,dF(t) = \int_{\hat P_2} t\,dF(t) = \hat p_2 \hat m_2 = p_K m_K$. Similarly, (38) and (45) imply that for $k \in [K-1]$:
$$\int_{P_k} t\,dF(t) = (1 - \hat p_2)\int_{P_k} t\,dF'(t) = (1 - \hat p_2)\int_{P'_k} t\,dF'(t) = (1 - \hat p_2)\, p'_k m'_k = p_k m_k.$$
These observations imply that the constructed $\{J_k\}_{k \in [K]}$ and $\{P_k\}_{k \in [K]}$ satisfy the induction hypotheses (a)–(c) for $K$. Thus, the claim follows by induction.

Proof of Lemma 4. The existence of a direct incentive compatible mechanism $M = \{\mu_1, \ldots, \mu_n\}$ that is optimal follows from Theorem 1. Consider such a mechanism, and denote by $G_\theta$ the posterior mean distribution that type report $\theta$ induces under this mechanism.
We claim that there exists a direct incentive compatible mechanism whose posterior mean distribution is discrete with a single mass point in $B_{\theta,k}$ for $k \in [K_\theta]$, $\theta \in \Theta$, and yields a weakly larger payoff.

If the posterior mean distribution $G_\theta$ is discrete with at most one mass point in $B_{\theta,k}$ for all $k \in [K_\theta]$, $\theta \in \Theta$, then the claim trivially follows. Suppose this is not the case for some $k \in [K_{\theta^\star}]$, $\theta^\star \in \Theta$. Define another mechanism $M' = \{\mu'_1, \ldots, \mu'_n\}$ such that for type report $\theta^\star$, whenever mechanism $M$ has a signal realization that induces a posterior mean in $B_{\theta^\star,k}$, the signal of $M'$ has the same realization. Note that this realization induces a posterior mean of $\frac{1}{\mathbb{P}[\mathbb{E}_{\mu_{\theta^\star}}[\omega \mid s] \in B_{\theta^\star,k}]} \int_{B_{\theta^\star,k}} s\, dG_{\theta^\star}(s) \in B_{\theta^\star,k}$. Observe that by construction, under such a signal realization type $\theta^\star$ finds it optimal to take the same action ($a_{\theta^\star,k}$) as the one he takes under the corresponding signals of the initial mechanism. This implies that his expected payoff from type report $\theta^\star$ is the same under both mechanisms. Similarly, any type $\theta \ne \theta^\star$ expects the same payoff from type reports other than $\theta^\star$. On the other hand, the convexity of the indirect utility functions implies that under the new mechanism the type report $\theta^\star$ yields a weakly lower expected payoff to types $\theta \ne \theta^\star$. These observations imply that $M'$ is also a direct incentive compatible mechanism. Since for type $\theta$ the designer's payoff is $r_{\theta,k}$ whenever the posterior mean is in $B_{\theta,k}$, it also follows that $M$ and $M'$ are payoff equivalent. Repeating this argument for all $\theta \in \Theta$, $k \in [K_\theta]$ yields a direct incentive compatible mechanism that is payoff equivalent to $M$ and that has at most one atom in $B_{\theta,k}$ for all $k \in [K_\theta]$, $\theta \in \Theta$, and the claim follows.
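The merging step used in the proof of Lemma 4 can be illustrated numerically. The sketch below is a hypothetical example (uniform posterior-mean draws and three assumed bins $B_{\theta,k}$): it pools all realizations falling in a bin into a single atom at their conditional mean, and checks that the overall mean is preserved, that each bin carries at most one atom, and that the expectation of a convex function weakly decreases, which is the Jensen step behind incentive compatibility.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical example: posterior-mean draws for one type report, and three
# assumed bins B_{theta,k} on which the receiver's optimal action is constant.
draws = rng.uniform(0.0, 1.0, size=10_000)
bins = [(0.0, 0.3), (0.3, 0.7), (0.7, 1.0)]

# Merge: every realization in a bin is replaced by the bin's conditional mean,
# mirroring the construction of M' in the proof of Lemma 4.
merged = draws.copy()
for lo, hi in bins:
    mask = (draws >= lo) & (draws < hi)
    if mask.any():
        merged[mask] = draws[mask].mean()

assert abs(merged.mean() - draws.mean()) < 1e-9   # posterior mean preserved
for lo, hi in bins:                               # at most one atom per bin
    assert len(np.unique(merged[(merged >= lo) & (merged < hi)])) <= 1
# Jensen: E[phi(merged)] <= E[phi(draws)] for convex phi, e.g. phi(x) = x^2,
# so a convex indirect utility weakly falls for deviating types.
assert (merged**2).mean() <= (draws**2).mean() + 1e-12
```

Because the bin's conditional mean lies inside the bin, the truthful type's optimal action, and hence his payoff, is unchanged, while deviators can only lose.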
Proof of Lemma 5.
The condition $G_\theta \succeq F$ can equivalently be stated as:
$$\int_0^x (1 - G_\theta(t))\,dt \ge \int_0^x (1 - F(t))\,dt, \quad (46)$$
for all $x$, where the inequality holds with equality for $x = 1$. This inequality can be expressed in the quantile space as follows:
$$\int_0^x G_\theta^{-1}(t)\,dt \ge \int_0^x F^{-1}(t)\,dt, \quad (47)$$
for all $x \in [0, 1]$, where the inequality holds with equality for $x = 1$. Note that since $G_\theta$ is a discrete distribution, this condition holds if and only if it holds for $x = \sum_{k \le \ell} p_{\theta,k}$ and $\ell \in [K_\theta]$. On the other hand, for such $x$ we have
$$\int_0^x G_\theta^{-1}(t)\,dt = \sum_{k \le \ell} p_{\theta,k} m_{\theta,k} = \sum_{k \le \ell} z_{\theta,k}, \quad (48)$$
and (47) becomes
$$\sum_{k \le \ell} z_{\theta,k} \ge \int_0^{\sum_{k \le \ell} p_{\theta,k}} F^{-1}(t)\,dt. \quad (49)$$
Since $\int_0^1 F^{-1}(t)\,dt = \int_0^1 G_\theta^{-1}(t)\,dt = \sum_k z_{\theta,k}$, the claim follows from (49) after rearranging terms.
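The partial-sum condition (49) is easy to check computationally. The sketch below is an assumed example with $F$ uniform on $[0,1]$, so that $\int_0^q F^{-1}(t)\,dt = q^2/2$; it tests whether a discrete candidate distribution of posterior means satisfies (49) at the breakpoints, i.e., whether it is a mean-preserving contraction of the prior.

```python
import numpy as np

def mpc_of_uniform(p, m):
    """Check condition (49) for F uniform on [0, 1], where the quantile
    integral is q^2 / 2: the discrete distribution with mass p[k] at m[k]
    (m increasing) is a mean-preserving contraction of F iff every partial
    sum of z = p * m weakly exceeds the corresponding quantile integral,
    with equality of total mass and mean at q = 1."""
    p, m = np.asarray(p, float), np.asarray(m, float)
    q = np.cumsum(p)        # breakpoints x = sum_{k <= l} p_k
    z = np.cumsum(p * m)    # partial sums of z_{theta,k}
    if abs(q[-1] - 1.0) > 1e-9 or abs(z[-1] - 0.5) > 1e-9:
        return False        # total mass or mean does not match the prior
    return bool(np.all(z >= q**2 / 2 - 1e-12))

# Binary split of the uniform prior at 1/2: posterior means 1/4 and 3/4.
assert mpc_of_uniform([0.5, 0.5], [0.25, 0.75])
# Atoms at 0 and 1 would be a mean-preserving *spread*, not a contraction.
assert not mpc_of_uniform([0.5, 0.5], [0.0, 1.0])
```

Checks of this kind are what make the finite-dimensional convex program in the finite action case tractable: the majorization constraint reduces to finitely many linear inequalities in the variables $z_{\theta,k}$.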