Optimal Rating Design

Abstract

We study the design of optimal rating systems in the presence of adverse selection and moral hazard. Buyers and sellers interact in a competitive market where goods are vertically differentiated according to their qualities. Sellers differ in their cost of quality provision, which is private information to them. An intermediary observes sellers' quality and chooses a rating system, i.e., a signal of quality for buyers, in order to incentivize sellers to produce high-quality goods. We provide a full characterization of the set of payoffs and qualities that can arise in equilibrium under an arbitrary rating system. We use this characterization to analyze Pareto optimal rating systems when sellers' quality choice is deterministic and when it is random.


Maryam Saeedi, Carnegie Mellon University

Ali Shourideh, Carnegie Mellon University

September 8, 2020

∗ We thank James Best, Aislinn Bohren, Odilon Camara, Emir Kamenica, Alexey Kushnir, Jacobo Perego, Ilya Segal, and Ariel Zetlin-Jones, as well as various seminar and conference participants, for their helpful comments.

Introduction

The problem of information control is at the heart of the design of markets with asymmetric information. On platforms such as Airbnb or eBay, where buyers often find it difficult to evaluate the quality of the service or product being offered, information provided by the platform can mitigate some of the problems arising from adverse selection and moral hazard. In insurance markets, where providers do not have precise information about insurees and want to condition contracts on public signals, regulators control which information can be used by providers. This centrally provided information can be used to incentivize the parties involved and improve allocative efficiency in the market. In this paper, we study the design of information control (henceforth, rating systems) in markets with adverse selection and moral hazard.

We perform this exercise in a competitive model with adverse selection and moral hazard. In our model, seller types are privately known, and each seller produces a single product that is vertically differentiated by its quality. Producing higher-quality goods is costly for the sellers, but it is less so for higher-type sellers than for lower-type ones. Buyers are uninformed about the quality of the product sold in the market and have to rely on the information provided by an intermediary, who observes product quality and sends a signal to buyers. We refer to such a signal structure as a rating system.

Our goal in this paper is twofold. First, we characterize the allocations of qualities that are achievable by an arbitrary rating system (i.e., implementable allocations). Second, we describe the properties of optimal rating systems when quality outcomes are deterministic and when they are random.
In our analysis, the optimality of rating systems is evaluated according to objectives associated with Pareto optimality, i.e., those that maximize a weighted average of buyers' and sellers' payoffs, as well as the revenue earned by the intermediary.

Buyers use the information provided by the intermediary to form expectations about the quality of the goods in the market. This information also impacts sellers' choice of quality. In particular, sellers' incentives are affected by how their choice of quality affects their expected prices. Moreover, since all buyers value quality equally, their posterior belief about sellers' quality is reflected in product prices. Therefore, the main determinant of sellers' incentives is their (second-order) belief about buyers' posterior beliefs after observing their signal. We refer to these second-order expectations as sellers' signaled qualities. As a result, characterizing the set of payoffs and allocations boils down to characterizing these second-order expectations. This is in addition to the standard notion of incentive compatibility, which is associated with the optimal choice of quality by sellers.

Our main implementability result is that these second-order expectations are related to the chosen qualities via a weighted-majorization ranking. In particular, we show that if type $\theta$ sellers (distributed according to $F$ and ranked by their efficiency of quality provision) choose quality $q(\theta)$, then their expectation of buyers' expectation (upon realizing a signal), $\bar q(\theta)$, must be $F$-majorized by $q(\theta)$; in other words,

$$\int_{\underline\theta}^{\theta} \bar q(\theta')\,dF(\theta') \ge \int_{\underline\theta}^{\theta} q(\theta')\,dF(\theta'), \quad \forall \theta,$$

with equality at $\theta = \bar\theta$. Conversely, if $q(\theta)$ $F$-majorizes $\bar q(\theta)$, then a rating system can be constructed so that $\bar q(\theta)$ is the expected value of buyers' average posterior from the perspective of a type $\theta$ seller.

This ranking of the two functions, $q(\theta)$ and $\bar q(\theta)$, is equivalent to the standard notion of mean preserving spread.
In particular, it can be shown that $q(\cdot)$ $F$-majorizes $\bar q(\cdot)$ if and only if the random variable induced by $q(\cdot)$ using $F(\cdot)$ is a mean preserving spread of that induced by $\bar q(\cdot)$. This result, thus, can be regarded as an extension of Blackwell (1953)'s result on the relationship between the distribution of the posterior mean and the prior for an arbitrary signal structure. Recall that in our setup, since we need to consider sellers' incentives in choosing quality, the characterization of the posterior mean is not sufficient. We have, thus, shown that the expectation of the posterior mean conditional on the state (second-order expectations or signaled qualities) also satisfies the mean preserving spread property of Blackwell (1953).

Using the majorization formulation has two benefits: first, it allows us to work with functions of type $\theta$ sellers as opposed to distributions; second, when this inequality is binding for a certain type $\theta$, the equivalent rating system must separate sellers with quality below $q(\theta)$ from those with higher quality, so their signals must not overlap. In other words, the signals sent by the types below $\theta$ must be different from those sent by the types above $\theta$, almost surely. Additionally, we provide an algorithm that constructs a rating system which implements any schedule of second-order expectations (signaled qualities) and true qualities when the type space is discrete. For continuous type spaces, this algorithm can be applied by approximating continuous distributions with a discrete one.

This characterization of the set of implementable qualities and signaled qualities (i.e., sellers' second-order expectations) allows us to cast the problem of optimal rating design as a mechanism design problem. In particular, an allocation of qualities and signaled qualities is implementable if and only if it satisfies the standard notion of incentive compatibility together with the majorization ranking described above.
If we interpret signaled qualities as transfers, the problem of optimal rating design is equivalent to the standard mechanism design with transferable utility (such as that of Mussa and Rosen (1978), Baron and Myerson (1982), and Myerson (1981), among many others) where transfers have to satisfy a dispersion constraint implied by the majorization ranking. In the second part of the paper, we provide techniques to solve a subset of such mechanism design problems. More specifically, we study two versions of our model: one in which quality outcomes for buyers are deterministic and one in which they are random.

When chosen qualities are deterministic, the main class of objectives that we consider is the weighted sum of sellers' payoffs. In particular, we allow for sellers' welfare weights to depend on their cost type. We consider three classes of rating systems: (1) low-quality-seller optimal, in which the welfare weights are decreasing in sellers' types; (2) high-quality-seller optimal, in which welfare weights increase with $\theta$ (i.e., the sellers with a lower cost of quality provision have a higher weight); and (3) mid-quality-seller optimal, in which welfare weights are hump-shaped in $\theta$. This is similar to Dworczak et al. (forthcoming) where they consider a mechanism design problem with [...] full mixing; one in which the quality is revealed with some probability and otherwise a generic message, common to all quality levels, is sent.

Second, information revelation interacts with whether welfare weights are increasing or decreasing in seller quality, i.e., $\theta$. Increasing welfare weights creates a motive for profits to be reallocated to higher quality types. We show that in this case, the best that can be done is to reveal all information. Note that when welfare weights are increasing, a mechanism designer that has access to unrestricted monetary transfers reallocates profits by marginally compensating choice of quality by more than one-to-one.
When the rating system is incentivizing sellers, the majorization result constrains the rewards for quality. More specifically, since the dispersion of signaled qualities is less than that of chosen qualities, they cannot create steep rewards for choice of quality by sellers. As a result, the best that can be done is to fully reveal quality. We establish this intuition via a series of perturbation arguments in our mechanism design problem.

In contrast, when the welfare weights are decreasing in seller quality, use of random mixing is optimal. In this case, partial pooling of quality choices reduces the reward for choice of quality and allows profits to be reallocated to lower types. Consideration of mid-quality-seller optimal ratings confirms this insight. When welfare weights are hump-shaped in $\theta$, the optimal rating system reveals all information for low qualities while it is full mixing for middle and high qualities.

While probabilistic information structures are a common feature of Pareto optimal ratings, they are somewhat uncommon in reality. We show that when quality outcomes are random, such probabilistic information structures are not needed. That is, the mixing that is required in the deterministic model can be generated with monotone partitions. We consider a version of our model where buyers' quality outcomes cannot be directly controlled by the seller; the sellers control the mean of the outcome while their cost depends on their type and the mean. We focus on monotone rating systems (those in which higher quality outcomes lead to higher signaled qualities). In this environment, we show that optimal ratings always take the form of monotone partitions where qualities are either fully revealed or pooled with an interval around them. While in general the characterization of optimal ratings is more difficult, we develop a mathematical result that helps us provide a partial characterization of optimal ratings.
We then use this result to show that in some cases, information must be fully revealed for extreme realizations while it should be pooled for intermediate realizations.

Finally, we consider the case of a revenue maximizing intermediary that charges the [...] arbitrary Pareto weights. [...] $\theta$. The rating system affects this entry margin by affecting the sellers' profits. Note that conditional on the cutoff type for entry, the intermediary's revenue is maximized when the payoff of the cutoff type is maximized. This would allow the intermediary to charge a higher fee and increase its revenue. Thus, the optimal rating system takes a full mixing form and a self-interested intermediary does not have an incentive to reveal all information.

Our paper is related to a few strands of literature in information economics and mechanism design. Most closely, it is related to the Bayesian persuasion literature, as in Kamenica and Gentzkow (2011), Rayo and Segal (2010), Alonso and Câmara (2016), and Dworczak and Martini (2019), among many others. However, unlike most of this literature, in our setup, the state upon which an information structure is designed is itself endogenous and is affected by the choice of information structure. A notable exception is the paper by Boleslavsky and Kim (2020), where they consider a model with moral hazard in which an agent controls the distribution of the state with her effort. They show that Kamenica and Gentzkow (2011)'s concavification method extends to their environment. In our setup, we are able to provide a sharp characterization of the set of implementable outcomes. Furthermore, we are able to solve the resulting mechanism design problem under fairly general assumptions on the cost function and distribution of types. Kolotilin et al. (2017) study a problem of information transmission where one of the parties is privately informed. However, in their setup, the informed party possesses information about her payoff which is independent of the state.
In contrast, in our model sellers are informed about the state (their cost type), and the information disclosure affects their choice of quality.

From a technical perspective, our paper is also related to a subset of the Bayesian persuasion literature that studies problems in which receivers' actions depend on their posterior mean. For example, Gentzkow and Kamenica (2016), Kolotilin (2018), Dworczak and Martini (2019), and Roesler and Szentes (2017) use Blackwell (1953)'s result that the existence of an information structure is equivalent to the distribution of the posterior mean second-order stochastically dominating (SOSD) the prior. However, in our study finding this posterior mean is not enough, since sellers' incentives depend on the expected prices, which are themselves determined by the expectation of the posterior mean conditional on the state. Our contribution to this literature is to show that any profile of second-order expectations that dominates the chosen qualities in the sense of second-order stochastic dominance can be derived from some information structure. Moreover, we use the majorization ranking in order to shed light on key properties of all the information structures that induce a certain distribution of second-order expectations. In our formulation, we use the majorization [...]

Footnote: A few other papers have also focused on the joint problem of mechanism and information design; Guo and Shmaya (2019) and Doval and Skreta (2019) are notable examples.

Footnote: Dworczak and Martini (2019) develop a methodology akin to duality to solve a large class of such problems. However, their methods do not apply to our problem due to the non-convexity introduced by incentive constraints. In contrast, we have to use perturbation techniques, as described in section 4, to verify solutions.

Footnote: Gershkov et al. (2020) study optimal auction design with risk-averse bidders who have dual risk aversion à la Yaari (1987). In their problem, the feasibility of allocations implies a majorization constraint on quantities, i.e., the probability of allocation of the object to each bidder. Similar to our paper, they use calculus of variations to solve this problem. In contrast, our mechanism design problem is equivalent to a problem in which transfers must be majorized by qualities. This together with incentive compatibility puts more restrictions on the set of implementable allocations.

Our paper is also related to the extensive literature on contracting and mechanism design. Whereas often the main assumption is that monetary transfers are available to provide incentives, in our setup incentives for quality provision are provided using the rating system. In fact, this is often the case in multi-sided platforms: seller badges in eBay and Airbnb as well as rider and driver ratings in Uber and Lyft are a few examples. A few notable exceptions are models that study the problem of certification and its interactions with moral hazard: Albano and Lizzeri (2001), Zubrickas (2015), and Zapechelnyuk (2020). An important contribution is that of Albano and Lizzeri (2001), where a key assumption is that the intermediary can charge an arbitrary fee schedule. The presence of an unrestricted fee schedule potentially reduces the importance of the certification mechanism. This is in contrast with our model, where monetary transfers are not flexible. More recently, Zubrickas (2015) and Zapechelnyuk (2020) also study variants of this problem. Their focus is, however, on deterministic ratings. As we show, random signals are an important feature of optimal mechanisms. Additionally, we analyze ratings when qualities are random and not fully controlled by the providers.

Footnote: Evidently, our paper is also related to the extensive and growing literature that studies the problem of certification and information disclosure (e.g., Lizzeri (1999), Ostrovsky and Schwarz (2010), Boleslavsky and Cotton (2015), Harbaugh and Rasmusen (2018), and Hopenhayn and Saeedi (2020)).

The rest of the paper is organized as follows: in section 2, we set up the model; in section 3, we describe the set of implementable allocations; in section 4, we describe Pareto-optimal rating systems with deterministic quality choice; in section 5, we analyze the model with random qualities; and finally, in section 6, we consider some extensions of our model, including the problem of a revenue-maximizing intermediary.

In this section, we describe our baseline model of adverse selection and moral hazard that will provide the main framework for our analysis. We consider an economy with a continuum [...] Each seller produces a single product that is vertically differentiated by quality. Upon making a purchase, the buyer evaluates the good according to the following payoff function:

$$q - t,$$

where $q$ is the quality of the good produced and $t$ is the transfer made to the seller.

Sellers choose whether to produce or not and at which quality level. The cost of producing a good with quality $q$ is given by $C(q, \theta)$, where $\theta$ is the type of seller. We assume that $\theta$ is drawn from a distribution with a c.d.f. given by $F(\theta)$ and support $\Theta$. We allow $F(\cdot)$ to be a piecewise continuous function with a finite set of discontinuity points; that is, $F(\cdot)$ could be a mixture of a continuous and a (finite) discrete distribution. We make the following assumptions on the cost function:

Assumption 1.

The function $C(q, \theta)$ satisfies $C_q \ge 0$, $C_\theta \le 0$, $C_{qq} \ge 0$, and $C_{q\theta} \le 0$. Moreover, $C(0, \theta) = 0$ and $C_q(0, \theta) = 0$.

The submodularity assumption on $C(\cdot, \cdot)$, $C_{\theta q} \le 0$, ensures that it is efficient to have higher $\theta$'s produce higher-quality goods; that is, higher values of $\theta$ have a lower marginal cost of producing higher-quality goods. Finally, sellers' payoffs are given by

$$t - C(q, \theta),$$

where $t$ is the transfer they receive.

For simplicity, we normalize the outside option of buyers and sellers to 0. In the first part of our analysis, we focus on the cases where all sellers produce. In section 6, we discuss various assumptions about the entry of both buyers and sellers into the market.

We assume that buyers are uninformed about the quality of the product sold in the market and have to rely on the information provided by an intermediary, who observes the quality of the products sold by each seller and sends a partially informative signal to buyers. This is represented by an information structure or experiment à la Blackwell (1953) and is given by a signal space $S$ and a probability measure $\pi(\cdot \mid q) \in \Delta(S)$. We refer to this information structure $(\pi, S)$ as a rating system. One can interpret this assumption on the information structure in multiple ways. One interpretation is that of a platform which observes certain information about sellers' past behavior and uses aggregated signals to provide information to buyers. Another interpretation is that of a regulator who regulates the information that can be used in contracts. Such regulations are fairly common in insurance markets. For example, community ratings in health insurance markets restrict the extent to which insurance rates can vary across individuals.

Given the information provided by the intermediary, buyers form expectations about the quality of the goods in the market and compete over them.
Since the only information buyers observe about the products is the signal $s \in S$ provided by the intermediary, there is a price $p(s)$ for each signal realization. In our baseline model, we assume that buyers compete away their surplus and the price for each signal realization satisfies

$$p(s) = E[q \mid s],$$

where the conditional expectation is taken using a prior on the distribution of qualities chosen by the seller and the signal structure chosen by the intermediary. Thus, our assumption is that buyers know the signal structure together with the strategies used by sellers in terms of their quality choices.

Footnote: Alternatively, we can think of this as an economy with one seller and one buyer. While this setup is mathematically equivalent to ours, the assumption of perfect competition is easier to interpret with a large number of sellers and buyers.

Footnote: In section 5, we allow the buyers' experience to be random but dependent on the sellers' choice.

A seller of type $\theta$ that chooses a quality level $q'$ has the following payoff:

$$\int_S p(s)\,\pi(ds \mid q') - C(q', \theta),$$

where $\int_S p(s)\,\pi(ds \mid q')$ is the expected price received by the seller. In other words, sellers must take into account the fact that upon choosing a quality, there will be a distribution over the posteriors formed by buyers, which in turn affects the prices they face. Simply put, sellers' payoffs depend on their beliefs about buyers' beliefs (i.e., their second-order beliefs). Hence, given a rating system $(\pi, S)$, equilibrium quality choices $\{q(\theta)\}_{\theta \in \Theta}$ by the sellers must satisfy the following incentive compatibility constraint

$$q(\theta) \in \arg\max_{q'} \int_S E[q \mid s]\,\pi(ds \mid q') - C(q', \theta); \tag{1}$$

together with their participation constraint:

$$\max_{q'} \int_S E[q \mid s]\,\pi(ds \mid q') - C(q', \theta) \ge 0.$$

We define a seller's signaled quality as

$$\bar q(\theta) = \int_S E[q \mid s]\,\pi(ds \mid q(\theta)). \tag{2}$$

As an example, when signals are deterministic and $\pi(\cdot \mid q)$ is degenerate, the signaled quality, $\bar q(\theta)$, is the average quality among the sellers who send the same signal as $\theta$.
Signaled qualities and their dependence on $\theta$ are the main determinants of the sellers' incentives to choose their desired level of quality.

The above definition of equilibrium for an arbitrary rating system or information structure clarifies the key difference between our setting and that of the models of persuasion à la Kamenica and Gentzkow (2011). What differentiates our setup from Bayesian persuasion is the fact that, due to moral hazard (i.e., $q$ is a seller's choice), the state is endogenous to the information structure. As we show in section 3, this endogeneity leads to incentive compatibility (as it does in mechanism design) and to a characterization of second-order expectations. We then use this characterization to describe the properties of optimal rating systems.

Footnote: In section 6 we consider alternative determination of prices. The zero surplus assumption for the buyers is made out of convenience and is not necessary for the analysis. What is necessary is that buyers' surplus is equated across signal realizations.

Characterization of General Rating Systems

In this section, we provide a characterization of the set of payoffs and qualities that can be achieved in equilibrium by any rating system. The analysis sheds light on the restrictions that are imposed by the particular way that incentives are provided via the rating system.
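Before turning to the characterization, the equilibrium objects defined so far, namely the prices $p(s) = E[q \mid s]$, the signaled qualities in (2), and the incentive constraints against on-path qualities, can be illustrated numerically. The following Python sketch is only an illustration under assumptions that are not part of the paper: three types, the quadratic cost $C(q, \theta) = q^2/(2\theta)$ (which satisfies Assumption 1), a finite signal space, and the hypothetical helper names `prices_and_signaled_qualities` and `is_incentive_compatible`.

```python
import numpy as np

def prices_and_signaled_qualities(f, q, pi):
    """f[i]: mass of type i; q[i]: quality chosen by type i;
    pi[i, s]: probability that quality q[i] receives signal s.
    Assumes every signal is sent with positive probability."""
    f, q, pi = (np.asarray(v, dtype=float) for v in (f, q, pi))
    joint = f[:, None] * pi                 # P(type i, signal s)
    p = (q @ joint) / joint.sum(axis=0)     # p(s) = E[q | s]
    q_bar = pi @ p                          # signaled quality, equation (2)
    return p, q_bar

def is_incentive_compatible(q, q_bar, theta, cost, tol=1e-12):
    """Check q_bar_i - C(q_i, theta_i) >= q_bar_j - C(q_j, theta_i) for all i, j."""
    q, q_bar, theta = (np.asarray(v, dtype=float) for v in (q, q_bar, theta))
    own = q_bar - cost(q, theta)
    # deviate[i, j]: payoff of type i when it picks type j's quality
    deviate = q_bar[None, :] - cost(q[None, :], theta[:, None])
    return bool(np.all(own[:, None] >= deviate - tol))

# Illustrative primitives (assumptions, not taken from the paper).
theta = np.array([1.0, 2.0, 3.0])
f = np.array([0.5, 0.25, 0.25])
q = theta.copy()                             # candidate quality schedule q_i = theta_i
cost = lambda quality, th: quality**2 / (2 * th)

full_info = np.eye(3)                        # each quality gets its own signal
pooled = np.array([[1, 0], [1, 0], [0, 1]])  # two lowest qualities share a signal

p_full, q_bar_full = prices_and_signaled_qualities(f, q, full_info)
p_pool, q_bar_pool = prices_and_signaled_qualities(f, q, pooled)
print(q_bar_full, is_incentive_compatible(q, q_bar_full, theta, cost))
print(q_bar_pool, is_incentive_compatible(q, q_bar_pool, theta, cost))
```

In this example the deterministic pooled rating gives each pooled type the average quality of its pool, as described above, and the same candidate schedule stops being incentive compatible once the two lowest qualities are pooled, because the middle type would rather choose the cheaper quality. The check covers only deviations to on-path qualities; the participation constraint would need a separate check.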

We start our analysis by first assuming that the distribution of sellers’ types is discrete:

$\Theta = \{\theta_1 < \cdots < \theta_N\}$ and $f_i$ is the probability that a seller's type is $\theta_i$ (we still refer to the distribution of $\theta$ as $F$). As it is convenient to use a vector notation to describe allocations, we describe the distribution of $\theta$ by its vector of point masses $f = (f_1, \cdots, f_N)$. Additionally, the vectors of qualities and signaled qualities are given by $q = (q_1, \cdots, q_N)$ and $\bar q = (\bar q_1, \cdots, \bar q_N)$, respectively, where $q_i$ is the quality chosen by a seller of type $\theta_i$ and $\bar q_i$ is her signaled quality implied by the rating system. Throughout our analysis, vectors are column vectors and are row vectors when they are transposed (e.g., $q$ is a column vector and $q^T$ is a row vector).

As we discuss below, the problem of characterizing the equilibrium payoffs and qualities for arbitrary information structures boils down to the characterization of the set of possible signaled qualities $\bar q$ for a given allocation of quality $q$. In what follows, we proceed towards a full characterization of this set.

We first establish that the main determinant of sellers' incentives is signaled qualities, $\bar q$. To see this, note that since a seller of type $\theta_i$ can choose $q_j$, incentive compatibility (1) implies that

$$\bar q_i - C(q_i, \theta_i) \ge \bar q_j - C(q_j, \theta_i), \quad \forall j, i.$$

We can resort to an argument in the spirit of the revelation principle and use the above inequalities in place of the constraint (1). This is mainly because we can always choose a particular signal $s_\emptyset$ to be associated with off-path qualities together with buyers' belief that the quality associated with such a signal is 0. This would imply that by deviating to a quality other than some $q_j$, prices will be 0. Therefore, the above constraint is equivalent to the constraint in (1).

The following lemma characterizes standard incentive compatibility:

Lemma 1.

If a vector of qualities, $q$, and a vector of signaled qualities, $\bar q$, arise from an equilibrium, then they must satisfy:

$$q_N \ge \cdots \ge q_1, \qquad \bar q_N \ge \cdots \ge \bar q_1,$$

$$\bar q_i - C(q_i, \theta_i) \ge \bar q_j - C(q_j, \theta_i), \quad \forall i, j.$$

The proof is standard and is omitted.

Lemma 1 establishes that signaled qualities, $\bar q_i$'s, paired with chosen qualities, $q_i$, must be incentive compatible. A natural question then arises: Does the fact that $\bar q_i$'s are derived from second-order expectations in (2) impose any restrictions on them? In what follows, we show that the fact that signaled qualities are derived from second-order expectations for an arbitrary rating system is equivalent to second-order stochastic dominance. That is, it is equivalent to the random variable implied by $\bar q_i$ distributed according to $F$ dominating the random variable induced by $q_i$ distributed according to $F$ in the sense of second-order stochastic dominance.

In formulating second-order stochastic dominance, we use an alternative to the familiar formulation of Rothschild and Stiglitz (1970). In particular, we use the majorization formulation of second-order stochastic dominance, which, as we show later, allows us to provide a sharp characterization of rating systems that induce a certain distribution of signaled qualities. Our approach is based on the majorization ranking introduced by Hardy et al. (1934). See Marshall et al. (1979) for a thorough treatment of the concept.

More specifically, consider two random variables, $x$ and $y$, that take on real values in $\{x_1 \le \cdots \le x_N\}$ and $\{y_1 \le \cdots \le y_N\}$, respectively, and whose distribution is given by $\Pr(x = x_i) = \Pr(y = y_i) = f_i$. We say that $y \succcurlyeq_F x$, or $x$ is $F$-majorized by $y$, if the following holds:

$$\sum_{i=1}^{N} f_i x_i = \sum_{i=1}^{N} f_i y_i, \qquad \sum_{j=1}^{i} f_j x_j \ge \sum_{j=1}^{i} f_j y_j, \quad \forall i = 1, \cdots, N-1. \tag{3}$$

Hardy et al. (1929) showed that the above is equivalent to $\sum_{i=1}^{N} f_i u(x_i) \ge \sum_{i=1}^{N} f_i u(y_i)$ for any concave function $u(\cdot)$; that is, it is equivalent to the standard notion of second-order stochastic dominance. As will be clear later, we prefer this formulation of second-order stochastic dominance, since it informs us of the properties of rating systems. We follow Hardy et al. (1929) and refer to the inequalities in (3) as majorization inequalities.

In order to describe the properties of $\bar q$ and $q$, we follow Kamenica and Gentzkow (2011) and represent the rating system $(\pi, S)$ as a distribution over the distribution of posteriors, $\tau \in \Delta(\Delta(\Theta))$, that satisfies the Bayes plausibility constraint, which can be written in vector form as

$$f = \int_{\Delta(\Theta)} \mu \, d\tau,$$

where $\mu$ is the posterior over types, represented as a vector in $\mathbb{R}^N$. If a rating system generates a finite number of signals, then $\tau$ must have a finite support and we can construct a signal structure from $\tau$ using Bayes' rule:

$$\forall \mu^s \in \mathrm{Supp}(\tau), \qquad \frac{\pi(\{s\} \mid q_i)\, f_i}{\mu^s_i} = \tau(\{\mu^s\}),$$

where $s$ is the signal associated with the posterior $\mu^s$.

Footnote: Since $\theta$ is a discrete random variable with finitely many values, it is without loss of generality to focus on rating systems with finitely many signals.

We can thus use the above to formulate the signaled qualities as a function of actual qualities:

$$\bar q_i = \sum_s \pi(\{s\} \mid q_i)\, \frac{\sum_j \pi(\{s\} \mid q_j) f_j q_j}{\sum_j \pi(\{s\} \mid q_j) f_j} = \frac{1}{f_i} \int_{\mu \in \mathrm{Supp}(\tau)} \mu_i\, \mu^T q \, d\tau \tag{4}$$

where $\mu^T$ is the transpose of $\mu$, and $\mu^T q$ is the inner product of $\mu$ and $q$. In other words, $\mu^T q$ is the posterior mean of quality, and the above integral is the expectation of the posterior mean quality from the perspective of the seller. One can write (4) in vector form as $\bar q = A q$, where $A$ is an $N \times N$ positive matrix which satisfies

$$f^T A = f^T, \qquad A e = e, \tag{5}$$

where $e = (1, \cdots, 1)^T$. The existence of $A$ which satisfies (5) implies the following result:

Proposition 1.

Let $\bar q$, $q$ be vectors of signaled and true qualities, respectively, that arise in an equilibrium for some information structure. Then, $q \succcurlyeq_F \bar q$.

The proof is relegated to the appendix.

That $q$ $F$-majorizes $\bar q$ is a direct result of the existence of a matrix $A$ that satisfies (5). Loosely speaking, (5) implies that $\bar q_i$'s are less dispersed than $q_i$ and thus $q \succcurlyeq_F \bar q$. Our main characterization result in this section is that the reverse of Proposition 1 holds as well:

Theorem 1.

Consider the vectors of signaled and true qualities, $\bar q$, $q$, respectively, and suppose that they satisfy

$$\bar q_1 \le \cdots \le \bar q_N, \qquad q_1 \le \cdots \le q_N,$$

with equality in one implying the other. Moreover, suppose that $q \succcurlyeq_F \bar q$. Then there exists a rating system $(\pi, S)$ so that $\bar q_i = E[E[q \mid s] \mid q_i]$.

The formal proof of Theorem 1 is relegated to the appendix. Here, we provide an outline. Consider the set of all signaled quality vectors that are induced by some rating system:

$$\mathcal{S} = \left\{ r \;\middle|\; \exists \tau \in \Delta(\Delta(\Theta)),\ r = \mathrm{diag}(f)^{-1} \int \mu \mu^T d\tau \, q \ \text{with}\ f = \int \mu \, d\tau \right\}. \tag{6}$$

The set $\mathcal{S}$ is convex since for any two measures $\tau_1$ and $\tau_2$ that satisfy $f = \int \mu \, d\tau_i$, i.e., Bayes plausibility, their convex combination also satisfies Bayes plausibility. This together with the fact that $\int \mu \mu^T d\tau$ is linear in $\tau$ implies that $\mathcal{S}$ is convex. Now consider $q \succcurlyeq_F \bar q$. In order to show that $\bar q \in \mathcal{S}$, we show that for every vector $\lambda \in \mathbb{R}^N$, there exists $r \in \mathcal{S}$ such that $\lambda^T r \ge \lambda^T \bar q$. Since this would also be true for $-\lambda$, along any direction one can find two members of $\lambda^T \mathcal{S}$ that are on either side of $\lambda^T \bar q$ on the real line. This observation together with the separating hyperplane theorem implies that $\bar q \in \mathcal{S}$.

Footnote: We formally show this in Lemma 2 in the Appendix.
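The objects appearing in Proposition 1 and in this outline can be checked numerically for a finite rating system. The Python sketch below is a hedged illustration, not the construction used in the proof: it builds $A = \mathrm{diag}(f)^{-1} \int \mu \mu^T d\tau$ from a signal matrix, verifies the two identities in (5), and tests the majorization inequalities in (3). The function names `averaging_matrix` and `f_majorizes`, the three-type example, and the choice of a full mixing rating (quality revealed with probability `alpha`, a generic message otherwise) are assumptions made for the example.

```python
import numpy as np

def averaging_matrix(f, pi):
    """A = diag(f)^{-1} * sum_s tau_s * mu_s mu_s^T for a finite rating system.
    pi[i, s] is the probability that type i's quality receives signal s;
    every signal is assumed to occur with positive probability."""
    f, pi = np.asarray(f, dtype=float), np.asarray(pi, dtype=float)
    tau = f @ pi                          # tau[s]: probability of signal s
    mu = (f[:, None] * pi) / tau          # mu[:, s]: posterior over types given s
    return (mu * tau) @ mu.T / f[:, None]

def f_majorizes(f, y, x, tol=1e-12):
    """True if y F-majorizes x as in (3); x and y are listed in increasing order."""
    f, x, y = (np.asarray(v, dtype=float) for v in (f, x, y))
    cx, cy = np.cumsum(f * x), np.cumsum(f * y)
    same_mean = abs(cx[-1] - cy[-1]) <= tol
    return bool(same_mean and np.all(cx[:-1] >= cy[:-1] - tol))

# Illustrative example (assumed values): full mixing with alpha = 0.5.
f = np.array([0.5, 0.25, 0.25])
q = np.array([1.0, 2.0, 3.0])
alpha = 0.5
# Signals 0..2 reveal the quality; signal 3 is the generic message.
pi = np.hstack([alpha * np.eye(3), np.full((3, 1), 1 - alpha)])

A = averaging_matrix(f, pi)
q_bar = A @ q                             # signaled qualities, q_bar = A q
print(q_bar)
print(np.allclose(f @ A, f), np.allclose(A @ np.ones(3), np.ones(3)))
print(f_majorizes(f, q, q_bar), f_majorizes(f, q_bar, q))
```

For this full mixing example the signaled qualities equal `alpha * q + (1 - alpha) * (f @ q)`, so they are a contraction of the true qualities toward the prior mean, which is why `q` $F$-majorizes `q_bar` but not the other way around.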

In order to show that for any $\lambda$ there exists $r \in \mathcal{S}$ such that $\lambda^T r \ge \lambda^T \bar q$, we use induction. When $N = 2$, the proof is straightforward. The definition (3) in this case implies that $\bar q_2 - \bar q_1 \le q_2 - q_1$ (Figure 1). Since $f^T \bar q = f^T q$, $\bar q$ must lie on a line connecting $q$ to $(f^T q) e$, i.e., the signaled qualities associated with no information. This implies our claim is true.

For $N > 2$, we show that the inequality either holds for $r$ induced by full information or that we can use induction. Specifically, it is possible to pool two consecutive states and focus on rating systems that do not distinguish between these two states. This reduces the number of states to $N - 1$, so we can use the induction hypothesis and construct the vector $r$.

Figure 1: Depiction of $\bar q$ satisfying (3) when $N = 2$.

It is worth comparing our main characterization result to those of other papers in the literature on Bayesian persuasion. A strand of that literature has considered a class of sender-receiver problems in which the receiver's action depends on her posterior expectations. For example, Gentzkow and Kamenica (2016), Kolotilin (2018), and Dworczak and Martini (2019) use a version of Blackwell's result (Blackwell (1953)) and show that the posterior mean is a random variable that second-order stochastically dominates the state. They then use techniques from linear programming or optimization with stochastic dominance constraints to solve the problem. Our work is different from theirs in two respects: First, since we have to respect sellers' (i.e., senders') incentives, our variable of interest is the second-order expectation of the sender about the receiver's observed posterior.
Second, we use a different formulation of the stochastic dominance relationship that is informative of the signal structure, as we illustrate below.

The above theorem can be used to provide a full characterization of the set of qualities and payoffs that arise from the equilibrium defined above, which is summarized in the following corollary.

Footnote: A generalization of Blackwell's result is Strassen's theorem; see Theorem 7.A.1 in Shaked and Shanthikumar (2007).

Corollary 1.

The vectors of signaled qualities $\bar q$ and qualities $q$ arise in equilibrium if and only if the following are satisfied:

\[ q \succcurlyeq_F \bar q, \]
\[ \bar q_i - C(q_i, \theta_i) \ge \bar q_j - C(q_j, \theta_i), \quad \forall i, j. \]

In this section, we extend our analysis to allow for a general distribution of types. Let $F(\theta)$ be a cumulative distribution function that has finitely many discontinuities. For any two increasing functions $\bar q(\theta)$ and $q(\theta)$ representing signaled and true quality, respectively, we say $q \succcurlyeq_F \bar q$ if the following holds:

\[ \int_{\underline\theta}^{\theta} \bar q(\theta') \, dF(\theta') \ge \int_{\underline\theta}^{\theta} q(\theta') \, dF(\theta'), \quad \forall \theta \in [\underline\theta, \bar\theta], \tag{7} \]
\[ \int_{\underline\theta}^{\bar\theta} \bar q(\theta) \, dF(\theta) = \int_{\underline\theta}^{\bar\theta} q(\theta) \, dF(\theta). \tag{8} \]

When $\bar q(\theta)$ and $q(\theta)$ are continuous, one implication of majorization is that for low values of $\theta$, $\bar q(\theta) \ge q(\theta)$, while for higher values of $\theta$, $q(\theta) \ge \bar q(\theta)$. Using this definition of $F$-majorization, we can show the following proposition:

Proposition 2.

Let $\bar q(\theta)$ and $q(\theta)$ be two functions representing signaled and true quality, respectively. Then, these functions arise from an equilibrium for some rating system if and only if they satisfy the following:

1. The profit function $\Pi(\theta) = \bar q(\theta) - C(q(\theta), \theta)$ is continuous for all $\theta$. When its derivative exists, it satisfies
\[ \Pi'(\theta) = -C_\theta(q(\theta), \theta). \tag{9} \]

2. The functions $\bar q(\theta)$ and $q(\theta)$ are increasing in $\theta$ and satisfy $q \succcurlyeq_F \bar q$.

We prove this proposition by considering the joint distribution of the posterior mean $E[q|s]$ and $q(\theta)$. We approximate the distribution $F(\cdot)$ with a sequence of discrete distributions whose supports are ordered according to the subset order, i.e., they form a filtration. We can apply the result of Theorem 1 to construct an information structure associated with each of these discrete approximations. By compactness of the space of measures over the posterior mean and $q(\theta)$, these information structures must have a convergent subsequence with a limiting information structure. It thus remains to be shown that the expectation of the posterior mean conditional on $q(\theta)$ under this limiting information structure coincides with $\bar q(\theta)$. To show that, we resort to the martingale convergence theorem. In particular, given our construction of the discrete distributions, the supports of such distributions can be used to construct a filtration. This filtration and the realizations of $q(\theta)$ and the posterior mean form a bounded martingale. As a result, we can apply Doob's martingale convergence theorem to show that the posterior mean conditional on $q(\theta)$ under the limiting information structure coincides with $\bar q(\theta)$. We formalize this argument in the appendix.

The conditions in Proposition 2 represent incentive compatibility (9) and majorization. Incentive compatibility implies that the surplus function is continuous. However, it is possible that qualities (signaled and true) exhibit discontinuities. This implies that such discontinuities must occur at the same points and in such a way that $\Pi(\theta)$ is continuous.

While the above result fully characterizes the set of payoffs and allocations, it is not informative about the rating systems that implement a given pair of signaled and true quality functions, $\bar q(\theta)$ and $q(\theta)$.
In this section, we describe what the majorization constraints imply about the rating systems that implement certain payoffs. While in general it is difficult to provide a full characterization of the rating systems that implement a certain pair of signaled and true quality, it is possible to provide a partial characterization of their properties.

Our first result describes when different sets of qualities must be separated by rating systems. We say a rating system $(\pi, S)$ is separating at $\hat q$ if the set of signals generated by the types with $q(\theta) \le \hat q$ is different from that generated by the types with $q(\theta) > \hat q$ almost surely. Formally, if we define the sets of signals generated by the types below and above $\hat\theta$ as

\[ \underline{S} = \bigcup_{\theta :\, q(\theta) \le \hat q} \mathrm{Supp}\big(\pi(\cdot \mid q(\theta))\big), \qquad \overline{S} = \bigcup_{\theta :\, q(\theta) > \hat q} \mathrm{Supp}\big(\pi(\cdot \mid q(\theta))\big), \]

then $(\pi, S)$ is separating at $\hat q$ if

\[ \int_\Theta \pi\big(\underline{S} \cap \overline{S} \mid q(\theta)\big) \, dF(\theta) = 0. \]

The following proposition states the condition under which a signal is separating at $\hat q$:

Proposition 3.

Let $\bar q(\theta)$ and $q(\theta)$ be a pair of signaled and true quality functions that satisfy the conditions in Proposition 2. Let $(\pi, S)$ be a rating system for which $\bar q(\theta) = E[E[q|s] \mid q(\theta)]$. Then $(\pi, S)$ is separating at $\hat q$ if and only if the majorization inequality (7) binds at $\hat\theta = \max_{\{\theta :\, q(\theta) = \hat q\}} \theta$.

The proof is relegated to the appendix.

When $(\pi, S)$ is separating at $\hat q$, a buyer who receives a signal that belongs to qualities below $\hat q = q(\hat\theta)$ is certain that the type she faces is below $\hat\theta$. Therefore, a standard application of Bayes' rule implies that the expectations of signaled qualities and true qualities conditional on types being below $\hat\theta$ must be equal. The reverse statement can be shown by considering the inequalities in the proof of Proposition 1. This implies that for a given pair of $\bar q(\cdot)$ and $q(\cdot)$ functions, a first step in identifying the rating system that delivers such qualities and signaled qualities is to identify the points at which the majorization constraint is binding.

Figure 2: Full mixing ratings and their associated signaled quality.

The following illustrates one example where majorization is always slack.

Example. Full Mixing Rating Systems.

Suppose that $\bar q(\theta)$ and $q(\theta)$ are both continuous, increasing, and satisfy $q \succcurlyeq_F \bar q$. Furthermore, suppose that there exists a unique $\hat\theta$ for which $\bar q(\hat\theta) = q(\hat\theta)$. This together with majorization implies that for values of $\theta < \hat\theta$, $\bar q(\theta) > q(\theta)$ holds, while for values of $\theta > \hat\theta$, $\bar q(\theta) < q(\theta)$ must hold. Moreover, it implies that the majorization inequality (7) never binds for values of $\theta < \bar\theta$. Let us define the function $\alpha(\theta)$ as follows:

\[ \bar q(\theta) = \alpha(\theta) \, q(\theta) + (1 - \alpha(\theta)) \, q(\hat\theta). \]

Given our assumption on $\bar q$ and $q$, $\alpha(\theta) \in [0, 1]$. This implies that a signal that reveals the quality with probability $\alpha(\theta)$ and reveals nothing (sends a generic signal) with probability $1 - \alpha(\theta)$ can implement the signaled quality function $\bar q(\theta)$. Figure 2 depicts the signaled qualities in this example. Since $\bar q(\cdot)$ and $q(\cdot)$ intersect only once, the value of $\alpha$ is between 0 and 1 and therefore well-defined. We refer to such rating systems as full mixing.

The above example illustrates the property that when the majorization constraint does not bind (in the interior of $\Theta$), some pooling of signals is required; in this case, it takes an extreme form, as all types send the generic signal with positive probability. As we will show, full mixing rating systems are a key feature of Pareto optimal rating systems.

Construction of Rating Systems

While in general characterizing the properties of rating systems from a profile of signaled qualities is difficult, we provide an algorithm to construct the rating system based on quality and signaled quality profiles (see the appendix). This algorithm generates a rating system when the type space is discrete. Moreover, it shows that repeated application of a small class of rating systems, those that simply pool qualities together, can always implement a vector of signaled qualities.
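Returning to the full mixing example, the construction of $\alpha(\theta)$ can be checked numerically on a discretized type space. The sketch below is illustrative only and is not the algorithm from the appendix: the grid, the uniform weights, and the particular $\bar q$ are assumptions chosen so that $\bar q$ and $q$ cross once and have the same mean. It backs out $\alpha$ from the defining equation, applies Bayes' rule to the generic signal, and confirms that the induced signaled qualities reproduce $\bar q$.

```python
def full_mixing_alpha(q, q_bar, q_hat, tol=1e-12):
    """Solve q_bar = alpha * q + (1 - alpha) * q_hat for alpha, type by type."""
    alphas = []
    for qi, qbi in zip(q, q_bar):
        if abs(qi - q_hat) < tol:
            alphas.append(0.0)   # at the crossing point any alpha works; pick 0
        else:
            alphas.append((qbi - q_hat) / (qi - q_hat))
    return alphas


def implied_signaled_quality(f, q, alphas):
    """Signaled qualities when quality is revealed w.p. alpha, pooled otherwise."""
    pooled_mass = sum(fi * (1 - a) for fi, a in zip(f, alphas))
    # Buyers' posterior mean after the generic (uninformative) signal.
    pooled_mean = sum(fi * (1 - a) * qi for fi, a, qi in zip(f, alphas, q)) / pooled_mass
    signaled = [a * qi + (1 - a) * pooled_mean for a, qi in zip(alphas, q)]
    return signaled, pooled_mean


# Hypothetical discretization: 11 equally likely types, q(theta) = theta.
thetas = [k / 10 for k in range(11)]
f = [1 / 11] * 11
q = thetas
q_hat = 0.5                                          # q(theta_hat), the crossing point
q_bar = [0.5 + 2 * (t - 0.5) ** 3 for t in thetas]   # flatter than q, same mean

alphas = full_mixing_alpha(q, q_bar, q_hat)
implied, pooled_mean = implied_signaled_quality(f, q, alphas)
print(max(abs(a - b) for a, b in zip(implied, q_bar)))  # numerically zero
```

The check works because condition (8) forces the posterior mean after the generic signal to equal $q(\hat\theta)$, which is exactly the value that appears in the definition of $\alpha(\theta)$.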

The results in the previous section provide a full characterization of the set of payoffs and qualities that can arise in equilibrium under an arbitrary rating system. In this section, we use this characterization to derive the properties of an optimal rating system given different objectives. Our focus will be on the set of Pareto-optimal allocations, i.e., those that maximize a weighted average of sellers' surplus. In Section 6, we consider alternative objectives, such as the revenue of the intermediary that charges a flat fee. Throughout the rest of our analysis, we assume that $F(\cdot)$ is continuous in $\theta$.

The class of objectives that we consider is given by the following expression:

\[ \int_{\underline\theta}^{\bar\theta} \lambda(\theta) \, \Pi(\theta) \, dF(\theta). \tag{10} \]

In the above, $\Pi(\theta)$ is the payoff of a seller of type $\theta$, while $\lambda(\theta)$ is the welfare weight of sellers of different types; without loss of generality, we set $\int_{\underline\theta}^{\bar\theta} \lambda(\theta) \, dF = 1$. We consider a few cases for this objective in order to convey the main insight of optimal rating design: (1) total profits, i.e., $\lambda(\theta) = 1$, $\forall \theta$; (2) low-quality optimal, i.e., $\lambda(\theta)$ decreasing in $\theta$; (3) high-quality optimal, i.e., $\lambda(\theta)$ strictly increasing in $\theta$; and (4) mid-quality optimal, i.e., $\lambda(\theta)$ strictly increasing for $\theta < \theta^*$ and strictly decreasing for $\theta > \theta^*$.

Hence, the problem of Pareto-optimal rating design is to maximize the objective in (10) subject to incentive compatibility (as described in Proposition 2) and the majorization constraints (7) and (8). This problem is not a convex programming problem: the constraint set is not convex, mainly due to the presence of the majorization and incentive constraints. As a result, standard Lagrangian methods cannot be used. In what follows, we use perturbation arguments to prove our results.

In order to characterize optimal rating systems, it is useful to start from the first best benchmark, the one in which total profits are maximized. In the unconstrained optimum, ignoring the fact that $\theta$ is unobservable and that incentives are provided via rating systems, qualities must satisfy

\[ C_q\big(q^{FB}(\theta), \theta\big) = 1. \]

Letting signaled quality be defined by $\bar q(\theta) = q(\theta)$, it is straightforward to see that the pair $(\bar q, q)$ is incentive compatible and satisfies $q \succcurlyeq_F \bar q$. Moreover, since the majorization constraint binds for all $\theta$'s, it implies that $q$ should be implemented with an information system that fully reveals sellers' qualities. We use this simple benchmark as a point of comparison against other Pareto-optimal allocations.

Footnote: As shown by Guesnerie and Laffont (1984), it is possible to make assumptions about the cost function and transform variables in order to make the set implied by the incentive constraints convex. However, under such a transformation, the majorization constraint becomes non-convex.

To characterize low-quality-optimal rating systems, it is useful to consider the standard mechanism design problem in which there are no restrictions on $\bar q(\cdot)$: one in which the $\bar q(\cdot)$'s are interpreted as monetary payments. In this case, the problem of solving for the optimal mechanism is similar to the familiar mechanism design problems considered by Mussa and Rosen (1978) and Baron and Myerson (1982), among others. Thus, similar techniques can be used to solve this relaxed version of the problem.

This relaxed mechanism design problem is given by

\[ \max \int \lambda(\theta) \, \Pi(\theta) \, dF(\theta) \tag{P} \]

subject to

\[ \Pi(\theta) \ge 0, \]
\[ \Pi'(\theta) = -C_\theta(q(\theta), \theta), \]
\[ q(\theta) \text{ increasing}, \]
\[ \int_{\underline\theta}^{\bar\theta} \Pi(\theta) \, dF(\theta) = \int_{\underline\theta}^{\bar\theta} \big[ q(\theta) - C(q(\theta), \theta) \big] \, dF(\theta). \]

The solution can be found using standard techniques for solving mechanism design problems, for example those described by Myerson (1981).
In particular, there are two possibilities: (1) the cost function, $C(\cdot, \cdot)$, and distribution, $F(\cdot)$, are such that the monotonicity constraint on $q(\theta)$ is slack, and (2) the monotonicity constraint sometimes binds. In the first case, it is straightforward to show that $q(\theta)$ must satisfy

\[ C_q(q(\theta), \theta) - 1 = C_{\theta q}(q(\theta), \theta) \left[ \frac{1 - F(\theta) - \int_{\theta}^{\bar\theta} \lambda(\theta') \, dF(\theta')}{f(\theta)} \right]. \tag{11} \]

Since $\lambda(\theta)$ is decreasing, the bracketed term is nonnegative, and since $C_{\theta q} \le 0$, the above implies that $C_q \le 1$. If $\bar q(\theta)$ and $q(\theta)$ are differentiable, we must have that

\[ \bar q'(\theta) = C_q(q(\theta), \theta) \, q'(\theta) \le q'(\theta). \]

Therefore, signaled qualities are flatter than actual qualities. Combined with the equality of the average value of signaled qualities and that of true qualities, this implies that the majorization inequality (7) is always satisfied. In other words, the optimal rating system is full mixing (see section 3.3). When the monotonicity constraint binds, Myerson (1981)'s ironing procedure can be used to find the optimal qualities. In that solution, either $q(\theta)$ is constant or it satisfies (11). Therefore, a similar argument can be used to show that the majorization inequality is always satisfied.

We thus have the following proposition:

Proposition 4.

A quality allocation $q(\theta)$ is low-quality-optimal if and only if it is a solution to the relaxed problem (P). Moreover, if the cost function $C(\cdot, \cdot)$ is strictly submodular, then a low-quality-optimal rating system is full mixing.

The proof is relegated to the appendix.

Intuitively, in a low-quality-optimal allocation, the goal is to reduce high-quality sellers' surplus as much as possible while at the same time respecting incentive compatibility. The existence of information rents arising from incentive compatibility implies that quality provision is always lower than that of the first best, i.e., $C_q \le 1$ has to hold. Moreover, as (11) establishes, when $C(\cdot, \cdot)$ is strictly submodular, $C_q < 1$, so some degree of pooling is required in order to increase low-quality sellers' surplus.

In this section, we discuss the properties of optimal rating systems when the Pareto weight of higher-quality sellers is higher. As in section 4.2, we can again consider the relaxed mechanism design problem without the majorization constraint. Since $\lambda(\theta)$ is increasing, the same argument as in section 4.2 shows that the solution to the relaxed problem (i.e., when the monotonicity constraint is slack) should satisfy $C_q > 1$. This, combined with the incentive constraint, implies that $\bar q(\cdot)$ will be steeper than $q(\theta)$ as a function of $\theta$, and thus violates the majorization constraint. Intuitively, when $\lambda(\theta)$ is increasing, it is optimal to allocate profits to higher-quality sellers. However, since this is not incentive compatible, optimal allocations involve the overprovision of quality in order to make allocations incentive compatible, but this cannot be achieved, as our result in section 3 implies. Intuitively, hiding any information will not help high-quality sellers, as they will be mixed with low-quality sellers and thus get lower prices in equilibrium.
Furthermore, given that all information is revealed, first best allocations are optimal.

The following proposition establishes that high-quality-seller-optimal rating systems must be fully revealing:

Proposition 5.

A quality allocation is high-quality optimal if and only if it satisfies $q(\theta) = q^{FB}(\theta)$. Moreover, a high-quality optimal rating system is fully revealing.

For a fully revealing rating system, the majorization constraint always binds. The proof of this proposition involves constructing perturbations of an allocation for which majorization is slack and thereby reaching a contradiction with optimality.

In particular, in order to prove Proposition 5, we focus on a relaxed version of the problem, where we only impose incentive compatibility as

\[ \Pi(\theta) - \Pi(\underline\theta) \le -\int_{\underline\theta}^{\theta} C_\theta(q(\theta'), \theta') \, d\theta'. \tag{12} \]

When the above inequality binds for all values of $\theta$, it becomes equivalent to the incentive compatibility constraint. We show that the solution of this relaxed problem is the first best allocation and thus at the solution this inequality binds.

Footnote: Note that the inequality in (12) is derived from integrating the envelope condition when sellers are restricted to lie upward, i.e., when they can only pretend to be of a higher type. It is thus similar to restricting attention to upward incentive constraints in an environment with discrete types. That these constraints are the relevant ones here is natural since the objective is to allocate profits to higher-quality sellers.

Suppose that the majorization constraint is slack on an interval. Then, two properties have to be true for any such interval:

1. It must be that over this interval $C_q(q(\theta), \theta) \ge 1$. To see that, suppose to the contrary that for a subinterval $I$, $C_q(q(\theta), \theta) < 1$. Then, one can consider a perturbation of $q(\theta)$ over such a subinterval that increases $q(\theta)$, given by $\delta q(\theta)$. Additionally, we perturb $\bar q(\cdot)$ as follows:

\[ \delta \bar q(\theta) = \begin{cases} C_q(q(\theta), \theta) \, \delta q(\theta) + \int_I [1 - C_q(q(\theta), \theta)] \, \delta q(\theta) \, dF(\theta), & \theta \in I, \\ \int_I [1 - C_q(q(\theta), \theta)] \, \delta q(\theta) \, dF(\theta), & \theta \notin I. \end{cases} \]

This perturbation increases $q(\theta)$ for the $\theta$'s in $I$, compensates these types for their cost increase, and, since $C_q < 1$, allocates the extra surplus generated by this perturbation across all types. Under this perturbation, $\delta \Pi(\theta) = \delta \Pi(\underline\theta)$ for all values of $\theta$ (Figure 3). The perturbation increases some values of $q(\theta)$ and leaves the rest unchanged, and $C_{\theta q} \le 0$, so it does not violate (12). As majorization is slack over this interval, it is always possible to make $\delta q(\theta)$ small enough so that it holds under the perturbed allocation. Therefore, this perturbation increases profits and satisfies all the constraints.

2. Over this interval, the incentive constraint (12) is binding. If not, it is possible to take a subinterval $I$, increase $\Pi(\theta)$ for high values of $\theta$, and decrease $\Pi(\theta)$ for low values of $\theta$ without violating the incentive constraint. This is possible because the majorization inequalities are slack over $I$. Since the $\lambda(\theta)$'s are increasing, this only increases the objective. Thus, at the optimum, the incentive constraint is binding.

Figure 3: ... $C_q(\theta) < 1$ and majorization is slack. The graph of $\bar q(\cdot)$ is shifted up by $\int_I (1 - C_q) \, \delta q \, dF > 0$.

It is fairly straightforward to show that the above properties are in contradiction with a slack majorization constraint. In particular, since over this interval $C_q \ge 1$ and the incentive constraint is binding, it must be that $\bar q(\cdot)$ is steeper than $q(\cdot)$. Hence, majorization will be violated and the claim is proven. We formalize this argument in the appendix.

The above result points toward a force opposite to that discussed in section 4.2. That is, a rating system is unable to reallocate profits to higher-quality sellers. In particular, when the objective calls for reallocating profits to higher-quality sellers, the best that can be done is to reveal all information. This would imply that the quality allocation coincides with the first best. In the next section, we show that this insight holds also for other objectives.

In this section, we illustrate the insight from section 4.3 that when profits must be allocated to sellers with higher quality, in this case mid-quality sellers, optimal rating systems must involve revealing information. In particular, we assume that there exists $\theta^*$ for which $\lambda(\theta)$ is strictly increasing in $\theta$ when $\theta \le \theta^*$ and $\lambda(\theta)$ is strictly decreasing in $\theta$ when $\theta \ge \theta^*$. The following proposition characterizes optimal rating systems:

Proposition 6.

Suppose that $\lambda(\theta)$ is hump-shaped. Then there exist $\hat\theta \le \tilde\theta < \theta^*$ such that for all values of $\theta \le \hat\theta$, $q(\theta) = q^{FB}(\theta)$; for all values of $\theta \in [\hat\theta, \tilde\theta)$, $q(\theta) = q^{FB}(\hat\theta)$; while the rating system is full mixing for values of $q$ above $q(\tilde\theta)$.

Figure 4 depicts the structure of the optimal rating system described in Proposition 6. As can be seen, when the objective values the profits of medium-quality sellers, it is always optimal to hide some information about high-quality sellers. In particular, by creating uncertainty about middle- and high-quality types, the rating system can deliver higher profits to mid-quality types. For low-quality sellers, as in the case discussed in section 4.3, the rating system is fully revealing. Additionally, a possible element of mid-quality optimal allocations is bunching of types. This occurs mainly because at $\tilde\theta$, the optimal level of profit could be less than the first best, since this allows the allocation to push profits to higher-quality sellers. This feature, combined with the fact that majorization is binding below $\tilde\theta$, implies that there must be bunching of types.

Figure 4: Mid-quality optimal ratings. The left panel depicts the position of the full revelation and bunching cutoffs; the right panel depicts optimal qualities and signaled qualities.

The resulting rating system reveals all information for low qualities; reveals only that $q \in [q(\hat\theta), q(\tilde\theta+))$; and is a full mixing rating system when $q \ge q(\tilde\theta+)$, where $q(\theta+)$ is the right limit of $q(\theta')$ as $\theta'$ approaches $\theta$.

Our proof of Proposition 6 involves elements that are similar to the perturbation argument in the proof of Proposition 5.
In particular, we first focus on a more relaxed version of the problem, one in which for values of $\theta \le \theta^*$, the inequality (12) must be satisfied, while for values of $\theta \ge \theta^*$, the inequality

\[ \Pi(\bar\theta) - \Pi(\theta) \ge -\int_{\theta}^{\bar\theta} C_\theta(q(\theta'), \theta') \, d\theta' \tag{13} \]

must be satisfied; we will then show that both sets of inequalities must bind at the optimum and thus the solution of the relaxed problem coincides with that of the main problem. However, our perturbations allow the inequalities to be relaxed.

Footnote: Similar to inequality (12), (13) is derived from integrating the envelope condition between $\theta$ and $\bar\theta$ when sellers can only lie downward.

In this relaxed version of the problem, we are able to show that if the majorization constraint binds for some $\theta' < \theta^*$, then it must bind for all values of $\theta \le \theta'$. To do this, we use the same perturbations discussed in section 4.3.

For values of $\theta > \theta^*$, we can show that the majorization constraint never binds. We demonstrate that for values of $\theta > \theta^*$, $C_q(q(\theta), \theta) \le 1$. Otherwise, a perturbation that reduces quality increases the objective and keeps majorization satisfied. Moreover, for all values of $\theta > \theta^*$, (13) must bind, which implies that $\Pi(\theta)$ must be continuous. We then show that if majorization is to bind at some $\bar\theta > \theta' > \theta^*$, then $C_q(q(\theta'), \theta') > 1$, which is a contradiction.

An interesting observation is that even though the objective puts the highest weight on sellers of type $\theta^*$, the rating system combines the types below $\theta^*$ with those above it, i.e., $\tilde\theta < \theta^*$, because of the incentive constraint. In particular, if one solves for the optimal rating system when $\theta$ is restricted to be in $[\theta^*, \bar\theta]$, at the optimum $q(\theta^*) = q^{FB}(\theta^*)$ while $\bar q(\theta^*) > q(\theta^*)$. In other words, the surplus generated by type $\theta^*$ is higher than in the first best.
As a result, continuity of $\Pi(\theta)$ cannot hold since its value is at most equal to the first best.

Finally, an optimal rating system must necessarily create a jump in qualities. That is, it should be designed in a way that no one chooses qualities in $[q(\tilde\theta-), q(\tilde\theta)]$. This is because for sellers below $\tilde\theta$ the allocations are fully revealing, while the rating system partially pools sellers above $\tilde\theta$ with higher types. For payoffs to be continuous, as implied by incentive compatibility, allocations must exhibit a jump.

In summary, our results in this section highlight the main trade-offs in reallocating profits using rating systems. On the one hand, when profits should be allocated from lower-quality to higher-quality types of sellers, the rating system must reveal all information. On the other hand, when the opposite should occur, the rating system must be partially revealing. One can use this as a general guide in designing a rating system.

We should note that while the immediate interpretation of the analysis here is that of characterizing the Pareto frontier of this environment, one can interpret the Pareto weights as arising from heterogeneous outside options (see Jullien (2000) for an analysis of this problem), where the $\lambda(\theta)$'s are associated with the tightness of the participation constraints. While equivalent to our problem, the problem of analyzing the solution with heterogeneous outside options is much less tractable. Although we cannot show this formally, one can loosely associate a higher outside option with a higher welfare weight. Under this admittedly loose interpretation, our analysis provides a connection between the design of rating systems and the side of the market where entry conditions are tighter, i.e., what is sometimes referred to as the shorter side of the market.
According to our analysis, when high-quality sellers are the shorter side of the market, ratings should reveal all information in order to relax their entry. If instead mid- or low-quality sellers are the shorter side, optimal rating systems must involve some mixing.
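The perturbation used in the argument for Proposition 5 can be sanity-checked in a discrete-type analogue. The sketch below is an illustration under stated assumptions rather than part of the proof: types are finite with probabilities `f`, the marginal costs `c_q` are taken as given numbers, and all changes are first-order (linearized). It confirms two properties claimed in the text: the perturbation preserves the equality of average signaled and true quality, as required by (8), and it raises every type's profit by the same amount.

```python
def perturb_signaled_quality(f, c_q, dq, in_I):
    """First-order change in signaled quality for the perturbation in the text.

    f[i]    : probability of type i
    c_q[i]  : marginal cost C_q(q(theta_i), theta_i), taken as given
    dq[i]   : change in true quality (zero outside I)
    in_I[i] : True if type i belongs to the perturbed set I
    """
    # Extra surplus generated on I, spread equally across all types.
    shift = sum(fi * (1 - ci) * di
                for fi, ci, di, flag in zip(f, c_q, dq, in_I) if flag)
    d_qbar = [(ci * di if flag else 0.0) + shift
              for ci, di, flag in zip(c_q, dq, in_I)]
    return d_qbar, shift


# Hypothetical numbers: four equally likely types, I = {type 1, type 2}.
f = [0.25, 0.25, 0.25, 0.25]
c_q = [0.9, 0.6, 0.8, 1.0]
dq = [0.0, 0.01, 0.02, 0.0]
in_I = [False, True, True, False]

d_qbar, shift = perturb_signaled_quality(f, c_q, dq, in_I)

# Average signaled quality moves one-for-one with average true quality.
mean_gap = sum(fi * (a - b) for fi, a, b in zip(f, d_qbar, dq))
# First-order profit change of each type: d_qbar - C_q * dq.
d_profit = [a - ci * di for a, ci, di in zip(d_qbar, c_q, dq)]
print(mean_gap, d_profit)
```

Because every type's profit rises by the same positive amount whenever $C_q < 1$ on $I$, the perturbation raises the objective for any welfare weights, which is why such an interval cannot exist at the optimum.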

The model used in the previous section assumes that sellers can precisely choose their level of quality. However, sellers often may not be able to make such a precise choice. For example, two-sided platforms often use measures of quality that are not fully controlled by the service provider. These platforms often rely on buyers' feedback, which can be subject to randomness: an eBay seller can be subject to shipment delays that are not fully controlled by her; an Airbnb host can be matched with an overcritical guest, etc. In this section, we allow for random quality outcomes. As we will show, while the optimal rating systems are deterministic partitions, they share some similarities with those derived in section 4.

Footnote: In Dworczak et al. (forthcoming), agents are heterogeneous in their marginal value of money and the $\lambda$'s represent this heterogeneity.

Given a seller's choice $q$, the realized quality is given by $x$, where $x$ is random. Without loss of generality, assume that $q, x \in [0, 1]$. We denote the distribution (p.d.f.) of $x$ conditional on the choice of $q$ by $g(x|q)$, while its cumulative distribution is $G(x|q)$. As in section 2, the sellers can be of different types $\theta$ and the cost of choosing $q$ is given by $C(q, \theta)$, which satisfies Assumption 1. We make the following assumptions on $g(x|q)$:

Assumption 2.

The distribution function $g(x|q)$ satisfies:

1. The average value of $x$ is $q$, i.e., $\int_0^1 x \, g(x|q) \, dx = q$.
2. The distribution function $g(x|q)$ is continuously differentiable with respect to $x$ and $q$ for all values of $x \in [0, 1]$ and $q \in (0, 1)$.
3. The distribution function $g(x|q)$ satisfies full support, i.e., $g(x|q) > 0$, $\forall x \in (0, 1)$, and the monotone likelihood ratio property, i.e., $g_q(x|q) / g(x|q)$ is strictly increasing in $x$.

The first part of this assumption is a normalization of the mean of $x$. The second part is a differentiability assumption and is made so that we can use standard results from the calculus of variations. The third part of the assumption implies that one cannot back out the chosen $q$ from an observation of $x$ and that the likelihood ratio $g_q / g$ is increasing. The latter implies that an increase in $q$ shifts the distribution of $x$ to the right, i.e., increases the distribution of $x$ in the sense of first-order stochastic dominance.

Footnote: More generally, one can think of an action $a \in \mathbb{R}$ that controls the distribution of quality outcomes. As long as the average value of the outcome is a concave and increasing function of $a$, the normalization in part 1 of Assumption 2 is without loss of generality.

As in section 2, we assume that the intermediary observes the quality experienced by the buyers, $x$, and designs a rating system represented by $(S, \pi)$, where $\pi(s|x)$ is the distribution of signals conditional on the realization of quality $x$. As before, prices are given by buyers' posterior beliefs, and expected prices for each realization of quality $x$ are given by

\[ \bar x(x) = \int E[x|s] \, \pi(ds|x). \]

We impose that $\bar x(x)$ must be increasing in $x$. Recall that, in the deterministic case, incentive compatibility implies the monotonicity of the signaled qualities. This assumption is also a realistic one, as leading examples of rating and certification typically exhibit monotone signals.

Footnote: This is partly because we do not know if our characterization result in section 3 applies absent this monotonicity assumption.

Footnote: The incentive compatibility constraint (14) implies that $\int \bar x(x) \, g(x|q(\theta)) \, dx$ and $q(\theta)$ must be increasing in $\theta$. Under Assumption 2, this is satisfied when $\bar x(\cdot)$ is monotone increasing.

Similar to the model with deterministic qualities, a characterization result as in Theorem 1 and Proposition 2 should hold here. However, the definition of majorization now involves the distribution of realized quality $x$. For an average quality profile $\{q(\theta)\}_{\theta \in \Theta}$, the (cumulative) distribution of $x$ is given by

\[ H(x) = \int_\Theta \int_0^x g(x'|q(\theta)) \, dx' \, dF(\theta). \]

We can, thus, say that $\bar x \preccurlyeq_H x$ if and only if it satisfies $\int_0^x [\bar x(x') - x'] \, dH(x') \ge 0$ for all $x$ and $\int_0^1 [\bar x(x) - x] \, dH(x) = 0$.

Given a rating system and the signaled qualities it induces, i.e., the function $\bar x(x)$, sellers choose $q$ optimally, and the choice of average quality by the seller of type $\theta$, $q(\theta)$, must satisfy the following incentive compatibility condition:

\[ q(\theta) \in \arg\max_{q \in [0, 1]} \int_0^1 \bar x(x) \, g(x|q) \, dx - C(q, \theta). \tag{14} \]

In order to simplify the problem of optimal rating design, we replace the above incentive constraint with its local version

\[ C_q(q(\theta), \theta) = \int_0^1 \bar x(x) \, g_q(x|q(\theta)) \, dx. \tag{15} \]

The above constraint replaces incentive compatibility with its first-order condition. When this constraint is sufficient, we say that the first-order approach is valid. We derive all of our results under the assumption that the first-order approach is valid.

Given the formulation of majorization and incentive compatibility, the problem of finding a Pareto optimal rating system is then to choose $q(\theta)$ and $\bar x(\cdot)$ to maximize a weighted average of sellers' payoffs, $\int_\Theta \lambda(\theta) \, \Pi(\theta) \, dF(\theta)$, subject to the local incentive constraint (15) and $x \succcurlyeq_H \bar x$.
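The majorization condition with respect to $H$ can be illustrated on a discretized quality grid. In the minimal sketch below, the grid `x`, the weights `h` (standing in for $dH$), and the pooled interval are all hypothetical; in the model $H$ would instead be derived from $g$ and $F$. The code pools an interval of realized qualities into a single signal, so that pooled points receive their conditional mean while all other points are revealed, and then checks the two conditions that define $x \succcurlyeq_H \bar x$.

```python
def pooled_signal(x, h, lo, hi):
    """Signaled quality when grid points with lo <= x < hi share one signal."""
    mass = sum(hk for xk, hk in zip(x, h) if lo <= xk < hi)
    if mass == 0:
        return list(x)              # nothing to pool: fully revealing
    mean = sum(xk * hk for xk, hk in zip(x, h) if lo <= xk < hi) / mass
    return [mean if lo <= xk < hi else xk for xk in x]


def h_majorizes(x, x_bar, h, tol=1e-9):
    """Check x >=_H x_bar on a grid: nonnegative partial sums, zero total."""
    gap = 0.0
    for xk, xbk, hk in zip(x, x_bar, h):
        gap += hk * (xbk - xk)      # running sum of [x_bar - x] dH
        if gap < -tol:
            return False
    return abs(gap) <= tol


# Hypothetical discretized distribution of realized quality.
x = [0.0, 0.25, 0.5, 0.75, 1.0]
h = [0.1, 0.2, 0.4, 0.2, 0.1]

x_bar = pooled_signal(x, h, 0.25, 1.0)   # pool the three middle grid points
print(x_bar)
print(h_majorizes(x, x_bar, h))
```

Pooling replaces realized qualities with their conditional mean, a mean-preserving contraction, so the check passes, while reversing the roles of `x` and `x_bar` fails it.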
In this subsection, we show that the optimal signal when quality provision is random can be obtained within the class of monotone partitions, in which for each element of the partition we either fully reveal all information about $x$ or just inform the buyer that the seller's quality is within that element.

Let us formally define our notion of deterministic monotone partitions. A rating system $(S, \pi)$ is called a monotone partition if there exists a partition of $[0, 1]$ into a collection of sets $\{I_\alpha\}_{\alpha \in A}$ where:

1. each $I_\alpha$ is either a closed interval or a half-open interval of the form $[x_1, x_2)$;
2. for all $\alpha, \beta \in A$, $I_\alpha$ and $I_\beta$ are ranked, i.e., either $\min I_\alpha \ge \sup I_\beta$ or $\min I_\beta \ge \sup I_\alpha$;
3. for each $\alpha \in A$ there exists a unique signal $s_\alpha \in S$ such that $\pi(\{s_\alpha\} \mid x) = 1$ for all $x \in I_\alpha$.

In words, a monotone partition either fully reveals each value of quality (in which case $I_\alpha$ is a singleton) or pools it with an interval around it. Note that the signaled qualities $\bar x(x)$ associated with a monotone partition rating system are always of the form depicted in Figure 5. The points at which the majorization constraints are binding create the partition $I_\alpha$, wherein between any two such points $\bar x(x)$ is constant and equal to the mean value of $x$ conditional on $x$ belonging to such an interval.

Figure 5: Signaled qualities for a monotone partition rating system.

The following proposition establishes that all optimal rating systems must be monotone partitions:

Proposition 7.

Suppose that the first-order approach is valid and Assumption 2 holds. Then a Pareto optimal rating system is a monotone partition.

Proof.

We show the claim by first showing that for all $x$, either the monotonicity constraint is binding or the majorization constraint is binding at the optimum. Suppose to the contrary that this does not hold. Note that a change in $\bar x(x)$ on a measure-zero set of $x$'s does not affect the objective or the majorization constraint. This implies that in order to achieve a contradiction, we need to rule out an interval on which neither the majorization nor the monotonicity constraint is binding. Suppose that there exists an interval $I = [x_1, x_2]$ on which majorization and monotonicity are slack. Note that under the validity of the first-order approach, the optimal rating system must be a solution to the following planning problem:

\[ \max_{q(\cdot), \bar x(\cdot)} \int_\Theta \lambda(\theta) \left[ \int_0^1 \bar x(x) \, g(x|q(\theta)) \, dx - C(q(\theta), \theta) \right] dF(\theta) \tag{P1} \]

subject to

\[ \int_0^1 \bar x(x) \, g_q(x|q(\theta)) \, dx = C_q(q(\theta), \theta), \]
\[ \int_0^x [\bar x(x') - x'] \int_\Theta g(x'|q(\theta)) \, dF(\theta) \, dx' \ge 0, \]
\[ \int_0^1 [\bar x(x) - x] \int_\Theta g(x|q(\theta)) \, dF(\theta) \, dx = 0, \]
\[ \bar x(x) - \bar x(x') \ge 0, \quad \forall x \ge x'. \]

Footnote: The assumption that the $I_\alpha$'s are half-open intervals implies that $\bar x(\cdot)$ is right continuous. This is without loss of generality as the distribution of $x$ does not have atoms.

By combining Theorem 1 in sections 9.3 and 9.4 of Luenberger (1997), there must exist Lagrange multipliers $\gamma(\theta)$ for the incentive compatibility constraint, a positive Lagrange multiplier $M(x)$ associated with the majorization constraint, $m$, the Lagrange multiplier associated with the equality constraint, and $\eta(x, x') \ge 0$ for the monotonicity constraint.

The optimality condition for values of $x \in I$ is given by

\[ \int_\Theta \lambda(\theta) \, g(x|q(\theta)) \, dF(\theta) + \int_\Theta \gamma(\theta) \, g_q(x|q(\theta)) \, d\theta + \int_x^1 M(x') \, dx' \int_\Theta g(x|q(\theta)) \, dF(\theta) + m \int_\Theta g(x|q(\theta)) \, dF(\theta) = 0. \tag{16} \]

Moreover, since by assumption the majorization constraint is slack for $x \in I$, $\mu(x) = \int_x^1 M(x') \, dx'$ is constant for all $x \in I$. We can write the above as

\[ \int_\Theta \big( \lambda(\theta) + \mu(x) + m \big) \, g(x|q(\theta)) \, dF(\theta) + \int_\Theta \gamma(\theta) \, g_q(x|q(\theta)) \, d\theta = 0, \quad \forall x \in I. \tag{17} \]

Let $\hat x(x)$ be defined as

\[ \hat x(x) = \begin{cases} \bar x(x), & x \notin I, \\ \dfrac{\int_I \bar x(x) \, dH(x)}{\int_I dH(x)}, & x \in I. \end{cases} \]

Note that $\bar x \succcurlyeq_H \hat x$ and therefore, by transitivity of majorization, $x \succcurlyeq_H \hat x$. Since $\hat x(\cdot)$ and $\bar x(\cdot)$ only differ on $I$ and (17) holds, the value of the Lagrangian is the same for $\hat x$ and $\bar x$. Thus, we can replace $\bar x$ with a signaled quality function for which monotonicity is binding on $I$.

Finally, note that since $\bar x(\cdot)$ is a bounded function, $\int_0^x [\bar x(x') - x'] \int_\Theta g(x'|q(\theta)) \, dF(\theta) \, dx'$ is a continuous function of $x$. Hence, the set of points at which it takes a positive value is an open subset of $[0, 1]$. Thus, it is a union of disjoint intervals.
For values of $x$ belonging to any of these intervals, the monotonicity constraint must be binding, by the previous step. Hence, $\bar{x}(\cdot)$ is constant on any interval where majorization is slack. Additionally, the set of points at which the majorization constraint is binding is a closed subset of $[0,1]$, and thus it is a union of disjoint closed intervals (possibly isolated points). Over any such interval, taking the derivative of the binding majorization constraint with respect to $x$ implies that $\bar{x}(x) = x$. This establishes the claim.

The above proposition implies that when quality outcomes are random, one does not need rating systems with random signals, as in Section 4, to provide incentives. Unlike in the deterministic model of Section 4, deterministic monotone partitions do not lead to bunching of types. This is because the realization of quality is random and cannot be fully controlled by the sellers. For example, in the deterministic model, a monotone partition as shown in Figure 5 necessarily leads to bunching of types at points of discontinuity of $\bar{x}(\cdot)$, while this is not the case when $x$ is random.

Before characterizing optimal rating systems, we first provide a mathematical result regarding optimal signaled qualities in a class of auxiliary problems. For an arbitrary function $\Gamma(x)$, consider the following optimization problem:

$$\max_{\bar{x}(\cdot)} \int_0^1 \Gamma(x)\, \bar{x}(x)\, h(x)\, dx \tag{P'}$$

subject to

$$\int_0^x \left[ \bar{x}(x') - x' \right] h(x')\, dx' \geq 0,$$

$$\int_0^1 \left[ \bar{x}(x) - x \right] h(x)\, dx = 0,$$

$$\bar{x}(x) \geq \bar{x}(x'), \quad \forall x \geq x'.$$

Note that for any solution of (P1), the problem of choosing $\bar{x}(\cdot)$ conditional on the choice of $q(\theta)$ boils down to solving a problem of the form (P'), where $\Gamma(x)$ is a function of the Pareto weights $\lambda(\cdot)$, the distribution functions $g(x \mid q(\theta))$, and the Lagrange multipliers associated with the incentive constraints (15). We will refer to the function $\Gamma(x)$ as the gain function. As we show in the following proposition, the main determinant of the solution of (P') is the derivative of $\Gamma(\cdot)$ and its sign. We refer to a solution as an alternating partition if $[0,1]$ can be partitioned into a collection of intervals such that each interval is either fully pooled or fully revealed, with no two consecutive intervals being of the same type. The following proposition shows that under some fairly general assumptions on the gain function, the optimal solution of problem (P') is an alternating partition:

Proposition 8.

Suppose that the gain function $\Gamma(x)$ is continuously differentiable and that its derivative changes sign $k < \infty$ times, i.e., we can partition $[0,1]$ into $k$ intervals where in each interval $\Gamma'(x)$ has the same sign but not in two consecutive intervals. Then, the optimal information structure is an alternating partition with at most $k$ intervals.

If $\Gamma(x)$ is increasing, then the solution of (P') is fully revealing, while if $\Gamma(x)$ is decreasing, the solution is pooling. The main idea of the proof is to show that in a monotone partition, two subsequent intervals cannot both be pooling. This implies that the optimal $\bar{x}(\cdot)$ should be an alternating partition. We then use the following insights, which come from examining the optimality condition (16):

1. when the majorization constraint binds over an interval, the gain function must be strictly increasing;
2. if $[x_1, x_2)$ is a pooling interval, then $\Gamma(x_1) \geq \Gamma(x_2)$.

This implies that there are at most $k$ intervals.

While the above results illustrate the sufficiency of monotone partitions and give a general understanding of optimal ratings, the exact nature of optimal rating systems depends on the details of the distribution of qualities and the welfare weights. In what follows, we use a two-type example to illustrate that some of the insights from the deterministic case carry through and to shed light on the determinants of optimal rating design in the presence of random quality outcomes.

Specifically, suppose that

$\Theta = \{\theta_1, \theta_2\}$ with $\theta_1 < \theta_2$, $f(\theta_j) = f_j$, and $\lambda(\theta_1) = 1$, $\lambda(\theta_2) = 0$. (As we show in Section 6.1, maximizing revenue from a flat fee for an intermediary leads to the same outcome as this particular Pareto optimal rating.) Before stating our formal result, we provide a heuristic analysis of the main determinants of the optimal rating. Suppose that the Lagrange multipliers on the incentive compatibility constraints are $\gamma_j$. Then, the gain function associated with optimal rating design is given by

$$\Gamma(x) = \frac{g(x \mid q_1)}{h(x)} \left( 1 + \gamma_1 \frac{g_q(x \mid q_1)}{g(x \mid q_1)} + \gamma_2 \frac{g_q(x \mid q_2)}{g(x \mid q_2)} \frac{g(x \mid q_2)}{g(x \mid q_1)} \right), \tag{18}$$

where $h(x) = f_1\, g(x \mid q_1) + f_2\, g(x \mid q_2)$. Formally, given $q_1$ and $q_2$, the optimal $\bar{x}(x)$ must maximize $\int_0^1 \Gamma(x)\, \bar{x}(x)\, h(x)\, dx$ subject to monotonicity and majorization.

Analyzing the terms in the gain function identifies two forces that shape the properties of the optimal rating system:

1. Redistributive: The first term in the gain function, $g(x \mid q_1)/h(x)$, is a decreasing function of the likelihood ratio $g(x \mid q_2)/g(x \mid q_1)$. Under part three of Assumption 2, i.e., the monotone likelihood ratio property, this likelihood ratio is increasing in $x$, and as a result the term $g(x \mid q_1)/h(x)$ is decreasing in $x$. Thus, when $\gamma_1 = \gamma_2 = 0$, i.e., when we do not have to worry about the effect of the rating system on incentives, the optimal rating system is simply one that provides no information. This is because in this case the gain function is decreasing in $x$.
2. Incentive: The second and third terms in the gain function represent the importance of incentive provision for types 1 and 2. The function $g_q(x \mid q)/g(x \mid q)$ is an increasing function of $x$ that is negative for lower values of $x$ and positive for higher values of $x$. As a result, the second term creates a force for information revelation. In fact, when $\gamma_1$ and $\gamma_2$ are very large, the gain function $\Gamma(x)$ becomes increasing, and as a result it is optimal to reveal all information.

At the optimum, the exact nature of the optimal rating system depends on how these forces interact. While the guiding principle for the design of rating systems is Proposition 8, in what follows we provide conditions under which revelation must occur for high and low values. As we show later, various classes of distribution functions $g(x \mid q)$ satisfy this assumption:

Assumption 3.

For arbitrary $q_2 > q_1$, define the function $\hat{x}(z)$ as the solution of $z = g(\hat{x}(z) \mid q_2)/g(\hat{x}(z) \mid q_1)$. The function $\hat{x}(z)$ must satisfy the following properties:

1. The function $\varphi(z) = g_q(\hat{x}(z) \mid q_1)/g(\hat{x}(z) \mid q_1)$ satisfies $\varphi''(z) \leq 0$,
2. The function $\psi(z) = z\, g_q(\hat{x}(z) \mid q_2)/g(\hat{x}(z) \mid q_2)$ satisfies $\psi''(z) \geq 0$,
3. The function $\varphi''(z)/\psi''(z)$ is increasing in $z$.

Using the above assumption, we have the following proposition:

Proposition 9.

Suppose that Assumptions 2 and 3 hold. Furthermore, suppose that $\Theta = \{\theta_1, \theta_2\}$ with $\theta_1 < \theta_2$ and that $\lambda(\theta_1) = 1$, $\lambda(\theta_2) = 0$. If at the optimum $q_2 \geq q_1$, then there exist two thresholds $x_1 < x_2$ such that the optimal rating system is fully revealing for values of $x$ below $x_1$ and above $x_2$, while it is pooling for values of $x \in (x_1, x_2)$.

The proof is relegated to the Appendix.

Under Assumption 3, the incentive effects are strongest for extreme values of $x$, while the redistributive force is strongest for mid values of $x$. As the proposition illustrates, the optimal rating system pools intermediate values of $x$ while fully revealing extreme values. Roughly speaking, the full revelation of extreme values of $x$ is associated with incentive provision for types 1 and 2. Under Assumption 3, the incentive effect for type 1 (the second term in the gain function (18)) is steepest for low values of $x$, while the incentive effect for type 2 (the third term in the gain function (18)) is steepest for high values of $x$. As a result, mid-values of $x$ are pooled, i.e., the redistributive effect dominates, while at the extremes incentive effects dominate.

Some examples of distributions that satisfy Assumption 3 are:

1. P.d.f. is a power of $x$: $G(x \mid q) = x^{\frac{q}{1-q}}$, which implies that $\dfrac{g_q(x \mid q)}{g(x \mid q)} = \dfrac{1}{(1-q)^2} \left( \log x + \dfrac{1-q}{q} \right)$.
2. P.d.f. is a power of $1 - x$: $G(x \mid q) = 1 - (1-x)^{\frac{1-q}{q}}$, which implies that $\dfrac{g_q(x \mid q)}{g(x \mid q)} = \dfrac{1}{q^2} \left( -\dfrac{q}{1-q} - \log(1-x) \right)$.
3. P.d.f. is exponential in $x$: $G(x \mid q) = \left( e^{\lambda(q) x} - 1 \right) / \left( e^{\lambda(q)} - 1 \right)$ for some function $\lambda(q)$. This implies that $g_q/g = A(q)\, x + B(q)$, i.e., the ratio is affine in $x$ and hence affine in the log of the likelihood ratio. A similar property holds for $G(x \mid q) = \left( 1 - e^{-\lambda(q) x} \right) / \left( 1 - e^{-\lambda(q)} \right)$.

The analysis in this section illustrates two main lessons: 1. monotone and alternating partitions are optimal when qualities are random, 2. the interplay between redistributive and incentive effects determines when outcomes are pooled and when they are fully revealed.
While the details of the distribution function $g(x \mid q)$ matter, our analysis suggests that full revelation is important for the extreme realizations.

In this section, we show how our results and analysis extend beyond the model considered above.
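Returning briefly to the two-type example of the preceding section, the gain function (18) and the two forces behind it can be explored numerically. The sketch below assumes the power family $G(x \mid q) = x^{q/(1-q)}$ from the list of examples; the parameter values, function names, and grid are illustrative assumptions, and the multipliers `gamma1`, `gamma2` are taken as given rather than solved for.

```python
import math
from typing import List


def exponent(q: float) -> float:
    """a(q) = q / (1 - q), so that G(x|q) = x**a(q) for q in (0, 1)."""
    return q / (1.0 - q)


def density(x: float, q: float) -> float:
    """g(x|q) = a(q) * x**(a(q) - 1) on (0, 1]."""
    a = exponent(q)
    return a * x ** (a - 1.0)


def score(x: float, q: float) -> float:
    """g_q(x|q) / g(x|q) in closed form for the power family."""
    return (math.log(x) + (1.0 - q) / q) / (1.0 - q) ** 2


def gain(x: float, q1: float, q2: float, f1: float, f2: float,
         gamma1: float, gamma2: float) -> float:
    """Gain function (18): redistributive factor times (1 + incentive terms)."""
    g1 = density(x, q1)
    g2 = density(x, q2)
    h = f1 * g1 + f2 * g2  # unconditional density of realized quality
    incentive = gamma1 * score(x, q1) + gamma2 * score(x, q2) * g2 / g1
    return (g1 / h) * (1.0 + incentive)


def slope_sign_changes(values: List[float]) -> int:
    """Count sign changes of successive differences along a grid."""
    signs = [1 if b > a else -1 for a, b in zip(values, values[1:]) if b != a]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)


# Illustrative grid on the open interval (0, 1).
GRID = [i / 100.0 for i in range(1, 100)]
```

With `gamma1 = gamma2 = 0` only the redistributive factor remains, and the gain is decreasing on the grid, in line with the no-information benchmark in the text. For positive multipliers, `slope_sign_changes` gives a rough grid-based count of how often $\Gamma'$ changes sign, which is the quantity that bounds the number of intervals in Proposition 8.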

In the above analysis, we have considered optimal rating systems under Pareto optimality. However, it is often the case that intermediaries are self-interested. Here, we discuss the incentives of a revenue-maximizing intermediary that can charge the sellers a flat fee for entry.

Since the intermediary is a monopolist, a flat fee charged to all sellers that enter the market might lead to the exclusion of some sellers. In other words, if the intermediary charges a fee $e$, sellers only enter if their payoff is higher than $e$. Note that since the profits of the sellers depend on the rating system, $\pi(s \mid q)$, their decision to enter depends on the rating system. Thus, for any $e$ and rating system $\pi(\cdot)$, there must exist an entry cutoff for sellers' types given by $\hat{\theta}(e, \pi(\cdot))$ which satisfies

$$e = \Pi(\hat{\theta}) = \max_{q'} \int_S E[q \mid s]\, \pi(ds, q') - C(q', \hat{\theta}). \tag{19}$$

Note that when the right-hand side of (19) is higher than $e$ for all values of $\theta$, then $\hat{\theta} = \underline{\theta}$. Given the entry cutoff, the revenue of the intermediary is $e \left[ 1 - F\left( \hat{\theta}(e, \pi(\cdot)) \right) \right]$. The problem of the intermediary is thus to choose an entry fee and a rating system to maximize this revenue.

An insight that helps us characterize the optimal rating system from the intermediary's perspective is that we can think of the intermediary as choosing the cutoff $\hat{\theta}$ and the rating system $\pi(\cdot)$, and using (19) to calculate the fee required to induce the entry of types $\hat{\theta}$ and higher. Viewing the intermediary's problem this way, we can write its revenue as $\Pi(\hat{\theta}) \left[ 1 - F(\hat{\theta}) \right]$. Thus, given $\hat{\theta}$, the problem of finding the optimal rating system is to maximize the payoff of type $\hat{\theta}$, $\Pi(\hat{\theta})$. This is the same as the problem studied in Section 4.2. We thus have the following proposition:

Proposition 10.

A revenue-maximizing optimal rating system is full mixing.

(We have assumed deterministic quality here. The analysis does not really change when quality is random.)

… $\theta$. In this case, the problem of the intermediary becomes similar to the objectives considered in Section 4, where one can interpret the welfare weights $\lambda(\theta)$ as the Lagrange multipliers on the participation constraints faced by the intermediary. While in general solving the resulting mechanism design problem is difficult (see Jullien (2000) for a treatment of participation constraints in classic mechanism design), one can loosely argue that these multipliers are positively associated with the outside options, i.e., higher outside options are associated with higher multipliers. Thus, under this interpretation, our analysis implies that when higher-quality sellers have a tighter participation constraint, optimal ratings should be perfectly revealing. As Hui et al. (2020) have illustrated using a change in eBay's certification policy, middle-quality sellers' entry decisions are more sensitive to changes in information policy. This evidence suggests that the type of objectives in Section 4.4 is more relevant in that context.

In our analysis so far, we have assumed that all buyers and sellers have the same outside option of 0 and that buyers compete away their surplus. This is mainly done for the sake of exposition. Here, we describe what happens in the presence of endogenous entry.

In particular, suppose that buyers' outside option is random, given by $\nu$, and distributed according to a differentiable cumulative distribution function $G(\nu)$. We assume that sellers' outside option is 0. If we assume that the support of $G(\cdot)$ is the entire real line, then there must exist a threshold $\nu_e$ such that buyers buy the object if and only if their outside option is not greater than $\nu_e$. Moreover, since our equilibrium concept is competitive equilibrium, there must exist a threshold $\theta_e$ such that sellers produce if and only if $\theta \geq \theta_e$. In equilibrium, the level of prices must adjust so that markets clear, i.e.,

$$G(\nu_e) = 1 - F(\theta_e). \tag{20}$$

In essence, with random outside options for buyers, the overall level of prices is determined by the market clearing condition (20). The properties of optimal rating systems that we have shown in Section 4 hold independently of the division of the surplus between buyers and sellers. This implies that these properties go through even in the presence of endogenous entry. The only additional constraint that endogenous entry, as modeled here, imposes on optimal rating systems is that the rating system must punish the seller types below $\theta_e$ in such a way as to discourage their entry. (Mechanisms like this are employed on platforms such as Uber, where drivers with low ratings are excluded.)

Conclusion

In this paper, we have studied the role and the design of rating systems in providing incentives for the provision of quality. To solve the problem, we have shown a characterization result establishing that sellers' second-order expectations are majorized by the sellers' true quality choices. This characterization result allows us to provide a fairly general characterization of a certain subset of Pareto optimal rating systems and to draw general insights on the optimal design of rating systems.

In our analysis, we have mainly focused on heterogeneity in quality in the form of vertical differentiation, where all buyers value quality in the same fashion. Naturally, one can ask about the effect of horizontal differentiation. In such an environment, information provision improves the allocation and sorting of buyers among sellers with different quality. In a companion paper, Saeedi and Shourideh (2020), we undertake the analysis of this problem. Additionally, one can argue that rating systems often use past performance to provide information to the market. Designing rating systems in such a dynamic setting is an important problem which we leave for future work. (See Best and Quigley (2020) for some work along this line.)

References

Albano, G. L. and A. Lizzeri (2001): “Strategic certification and provision of quality,” International Economic Review, 42, 267–283.

Aliprantis, C. D. and K. Border (2013): Infinite Dimensional Analysis: A Hitchhiker's Guide, Springer-Verlag Berlin and Heidelberg GmbH & Company KG.

Alonso, R. and O. Câmara (2016): “Persuading voters,” American Economic Review, 106, 3590–3605.

Baron, D. P. and R. B. Myerson (1982): “Regulating a monopolist with unknown costs,” Econometrica: Journal of the Econometric Society, 911–930.

Best, J. and D. Quigley (2020): “Persuasion for the long run,” Available at SSRN 2908115.

Blackwell, D. (1953): “Equivalent comparisons of experiments,” The Annals of Mathematical Statistics, 265–272.

Boleslavsky, R. and C. Cotton (2015): “Grading standards and education quality,” American Economic Journal: Microeconomics, 7, 248–79.

Boleslavsky, R. and K. Kim (2020): “Bayesian persuasion and moral hazard,” Working Paper, Emory University.

Doob, J. L. (1994): Measure Theory, Springer Science & Business Media.

… arXiv preprint arXiv:1811.03579.

Dworczak, P., S. D. Kominers, and M. Akbarpour (forthcoming): “Redistribution Through Markets,” Econometrica.

Dworczak, P. and G. Martini (2019): “The simple economics of optimal persuasion,” Journal of Political Economy, 127, 1993–2048.

Gentzkow, M. and E. Kamenica (2016): “A Rothschild-Stiglitz Approach to Bayesian Persuasion,” American Economic Review, 106.

Gershkov, A., B. Moldovanu, P. Strack, and M. Zhang (2020): “Optimal Auctions for Dual Risk Averse Bidders: Myerson meets Yaari,” Available at SSRN.

Guesnerie, R. and J.-J. Laffont (1984): “A Complete Solution to a Class of Principal-Agent Problems with an Application to the Control of a Self-Managed Firm,” Journal of Public Economics, 25, 329–369.

Guo, Y. and E. Shmaya (2019): “The interval structure of optimal disclosure,” Econometrica, 87, 653–675.

Harbaugh, R. and E. Rasmusen (2018): “Coarse grades: Informing the public by withholding information,” American Economic Journal: Microeconomics, 10, 210–35.

Hardy, G., J. Littlewood, and G. Polya (1934): Inequalities, Cambridge University Press, Cambridge, UK.

Hardy, G., G. Polya, and J. Littlewood (1929): “Some Simple Inequalities Satisfied by Convex Functions,” Messenger of Mathematics, 58, 145–152.

Hopenhayn, H. and M. Saeedi (2020): “Optimal Quality Ratings and Market Outcomes,” National Bureau of Economic Research Working Paper.

Hui, X., M. Saeedi, G. Spagnolo, and S. Tadelis (2020): “Raising the Bar: Certification Thresholds and Market Outcomes,” Working Paper, Carnegie Mellon University.

Jullien, B. (2000): “Participation constraints in adverse selection models,” Journal of Economic Theory, 93, 1–47.

Kamenica, E. and M. Gentzkow (2011): “Bayesian persuasion,” American Economic Review, 101, 2590–2615.

Kleiner, A., B. Moldovanu, and P. Strack (2020): “Extreme Points and Majorization: Economic Applications,” mimeo.

Kolotilin, A. (2018): “Optimal information disclosure: A linear programming approach,” Theoretical Economics, 13, 607–635.

Kolotilin, A., T. Mylovanov, A. Zapechelnyuk, and M. Li (2017): “Persuasion of a privately informed receiver,” Econometrica, 85, 1949–1964.

Lizzeri, A. (1999): “Information revelation and certification intermediaries,” The RAND Journal of Economics, 214–231.

Luenberger, D. G. (1997): Optimization by Vector Space Methods, John Wiley & Sons.

Marshall, A. W., I. Olkin, and B. C. Arnold (1979): Inequalities: Theory of Majorization and Its Applications, vol. 143, Springer.

Mussa, M. and S. Rosen (1978): “Monopoly and product quality,” Journal of Economic Theory, 18, 301–317.

Myerson, R. B. (1981): “Optimal auction design,” Mathematics of Operations Research, 6, 58–73.

Ostrovsky, M. and M. Schwarz (2010): “Information disclosure and unraveling in matching markets,” American Economic Journal: Microeconomics, 2, 34–63.

Rayo, L. and I. Segal (2010): “Optimal information disclosure,” Journal of Political Economy, 118, 949–987.

Roesler, A.-K. and B. Szentes (2017): “Buyer-optimal learning and monopoly pricing,” American Economic Review, 107, 2072–80.

Rothschild, M. and J. E. Stiglitz (1970): “Increasing risk: I. A definition,” Journal of Economic Theory, 2, 225–243.

Royden, H. L. and P. Fitzpatrick (1988): Real Analysis, vol. 32, Macmillan New York.

Saeedi, M. and A. Shourideh (2020): “Rate to Match: Optimal Rating Design with Sorting,” Work in Progress.

Shaked, M. and J. G. Shanthikumar (2007): Stochastic Orders, Springer Science & Business Media.

Yaari, M. E. (1987): “The dual theory of choice under risk,” Econometrica, 95–115.

Zapechelnyuk, A. (2020): “Optimal quality certification,” American Economic Review: Insights, 2, 161–76.

Zubrickas, R. (2015): “Optimal grading,” International Economic Review, 56, 751–776.

Appendix

A Proofs

A.1 Proof of Proposition 1

Proof.

We first show that (5) holds:

Lemma 2.

Let $q$ be a vector of qualities and $\bar{q}$ be a vector of signaled qualities for an arbitrary rating system. Then, there exists an $N \times N$ positive matrix $A$ such that $\bar{q} = A q$, where the matrix $A$ satisfies

$$f^T A = f^T, \qquad A e = e, \tag{21}$$

where $e = (1, \cdots, 1)^T$.

Proof. Given (4), $A$ is given by

$$A = \operatorname{diag}(f)^{-1} \int \mu \mu^T\, d\tau,$$

which is simply a rewriting of (4) in matrix form; $\operatorname{diag}(f)$ is an $N \times N$ matrix that has $f$ on its diagonal and 0 elsewhere. We have

$$A e = \operatorname{diag}(f)^{-1} \int \mu \mu^T e\, d\tau = \operatorname{diag}(f)^{-1} \int \mu\, d\tau = e,$$

where we have used $\mu^T e = 1$ and $f = \int \mu\, d\tau$. Moreover,

$$f^T A = \sum_i f_i \frac{1}{f_i} \int \mu_i \mu^T\, d\tau = \sum_i \int \mu_i \mu^T\, d\tau = \int \mu^T\, d\tau = f^T.$$

Now write $A = (a_{ij})_{i,j}$. We have that

$$\sum_{i=1}^k f_i \bar{q}_i = \sum_{i=1}^k f_i \sum_{j=1}^N a_{ij} q_j = \sum_{j=1}^N q_j \sum_{i=1}^k f_i a_{ij} \geq \sum_{j=1}^{k-1} q_j \sum_{i=1}^k f_i a_{ij} + q_k \sum_{j=k}^N \sum_{i=1}^k f_i a_{ij}. \tag{22}$$

Since $A$ satisfies (5), the following equality holds:

$$\sum_{i=1}^k f_i \sum_{j=1}^N a_{ij} = \sum_{i=1}^k f_i.$$

Thus, we can write the above as

$$\begin{aligned}
\sum_{i=1}^k f_i \bar{q}_i
&\geq \sum_{j=1}^{k-1} q_j \sum_{i=1}^k f_i a_{ij} + q_k \sum_{j=k}^N \sum_{i=1}^k f_i a_{ij} \\
&= \sum_{j=1}^{k-1} q_j \sum_{i=1}^k f_i a_{ij} + q_k \left[ \sum_{i=1}^k f_i - \sum_{j=1}^{k-1} \sum_{i=1}^k f_i a_{ij} \right] \\
&= \sum_{j=1}^{k-1} q_j f_j + \sum_{j=1}^{k-1} q_j \left[ \sum_{i=1}^k f_i a_{ij} - f_j \right] + q_k \left[ f_k + \sum_{j=1}^{k-1} f_j - \sum_{j=1}^{k-1} \sum_{i=1}^k f_i a_{ij} \right] \\
&= \sum_{j=1}^{k} q_j f_j + \sum_{j=1}^{k-1} (q_k - q_j) \left[ f_j - \sum_{i=1}^k f_i a_{ij} \right] \\
&= \sum_{j=1}^{k} q_j f_j + \sum_{j=1}^{k-1} (q_k - q_j) \sum_{i=k+1}^N f_i a_{ij} \\
&\geq \sum_{j=1}^{k} q_j f_j,
\end{aligned}$$

where we have used $q_k \geq q_j$ for all $j \leq k-1$ and $f^T A = f^T$. Finally, $f^T \bar{q} = f^T A q = f^T q$, which concludes the proof.

A.2 Proof of Theorem 1

Proof.

We define $S$ as follows:

$$S = \left\{ r \;\middle|\; \exists\, \tau \in \Delta(\Delta(\Theta)),\ r = \operatorname{diag}(f)^{-1} \int \mu \mu^T\, d\tau \cdot q \ \text{ with } f = \int \mu\, d\tau \right\}.$$

This is a compact set since $S \subset [\min_i q_i, \max_i q_i]^N$. Moreover, $S$ is a convex set, since if $\tau_1$ and $\tau_2$ satisfy Bayes plausibility then so does their convex combination, and $\operatorname{diag}(f)^{-1} \int \mu \mu^T\, d\tau$ is linear in $\tau$. We show that if $\bar{q}$ satisfies the majorization property, then it must be a member of $S$. To do so, we show that for any $\lambda \in \mathbb{R}^N$, there exists $r \in S$ such that $\lambda^T \bar{q} \leq \lambda^T r$. If $\bar{q} \notin S$, then there must exist $\lambda \in \mathbb{R}^N$ such that $\lambda^T \bar{q} > \lambda^T r$ for all $r \in S$. This contradicts the previous claim, and so we must have $\bar{q} \in S$.

Note that without loss of generality, we can assume that $\lambda \geq 0$. This is because if $\lambda^T \bar{q} \leq \lambda^T r$, then for any $\alpha > 0$ we have

$$\alpha f^T \bar{q} = \alpha f^T q,$$

$$\alpha f^T r = \alpha \sum_i f_i \frac{1}{f_i} \int \mu_i \mu^T\, d\tau \cdot q = \alpha \int \sum_i \mu_i \mu^T\, d\tau \cdot q = \alpha \int \mu^T\, d\tau \cdot q = \alpha f^T q,$$

and hence $(\lambda + \alpha f)^T \bar{q} \leq (\lambda + \alpha f)^T r$. That is, a choice of $\alpha$ can guarantee that $\lambda + \alpha f$ has all elements positive.

We prove that for all $\lambda \geq 0$ there exists $r \in S$ such that $\lambda^T \bar{q} \leq \lambda^T r$ using induction on $N$.

When $N = 2$, there are two cases:

1. $\frac{\lambda_1}{\lambda_1 + \lambda_2} \geq f_1$. In this case,

$$\frac{\lambda_1}{\lambda_1 + \lambda_2} \bar{q}_1 + \frac{\lambda_2}{\lambda_1 + \lambda_2} \bar{q}_2 \leq f_1 \bar{q}_1 + f_2 \bar{q}_2 = f_1 q_1 + f_2 q_2,$$

since $\bar{q}_1 \leq \bar{q}_2$. Thus, if we choose $\tau(\{f\}) = 1$ (no information), then $r = (f^T q, f^T q) \in S$. The above inequality then implies that $\lambda^T \bar{q} \leq (\lambda_1 + \lambda_2) f^T q = \lambda^T r$, which proves the claim.

2. $\frac{\lambda_1}{\lambda_1 + \lambda_2} \leq f_1$. Since $\bar{q}_1 \geq q_1$ and $f^T \bar{q} = f^T q$, we must have $\bar{q}_2 \leq q_2$. Therefore, $\bar{q}_2 - \bar{q}_1 \leq q_2 - q_1$. Multiplying both sides by $\left( \frac{\lambda_2}{\lambda_1 + \lambda_2} - f_2 \right) \geq 0$, we can write

$$\left( \frac{\lambda_2}{\lambda_1 + \lambda_2} - f_2 \right) (\bar{q}_2 - \bar{q}_1) \leq \left( \frac{\lambda_2}{\lambda_1 + \lambda_2} - f_2 \right) (q_2 - q_1),$$

and we can add $f_1 \bar{q}_1 + f_2 \bar{q}_2 = f_1 q_1 + f_2 q_2$ to both sides of the above inequality to obtain

$$\begin{aligned}
\left( \frac{\lambda_2}{\lambda_1 + \lambda_2} - f_2 \right) (\bar{q}_2 - \bar{q}_1) + f_1 \bar{q}_1 + f_2 \bar{q}_2 &\leq \left( \frac{\lambda_2}{\lambda_1 + \lambda_2} - f_2 \right) (q_2 - q_1) + f_1 q_1 + f_2 q_2, \\
\frac{\lambda_2}{\lambda_1 + \lambda_2} \bar{q}_2 + \left[ f_1 + f_2 - \frac{\lambda_2}{\lambda_1 + \lambda_2} \right] \bar{q}_1 &\leq \frac{\lambda_2}{\lambda_1 + \lambda_2} q_2 + \left[ f_1 + f_2 - \frac{\lambda_2}{\lambda_1 + \lambda_2} \right] q_1, \\
\frac{\lambda_2}{\lambda_1 + \lambda_2} \bar{q}_2 + \frac{\lambda_1}{\lambda_1 + \lambda_2} \bar{q}_1 &\leq \frac{\lambda_2}{\lambda_1 + \lambda_2} q_2 + \frac{\lambda_1}{\lambda_1 + \lambda_2} q_1.
\end{aligned}$$

If we choose $\tau(\{e_i\}) = f_i$, then $\tau$ satisfies Bayes plausibility and $p' = \operatorname{diag}(f)^{-1} \int \mu \mu^T\, d\tau\, q = \operatorname{diag}(f)^{-1} \operatorname{diag}(f)\, q = q$, and the above inequality implies $\lambda^T \bar{q} \leq \lambda^T p'$, which proves the claim.

Now consider $\bar{q}$, $q$ and $f$ and suppose that they satisfy the hypothesis of the claim. There are two possibilities:

Case 1. $\exists\, i \in \{2, \cdots, N\}$ such that $\lambda_i / f_i \leq \lambda_{i-1} / f_{i-1}$. In this case, consider the following $(N-1)$-dimensional vectors:

$$\begin{aligned}
q' &= \left( q_1, \cdots, q_{i-2}, \frac{f_{i-1} q_{i-1} + f_i q_i}{f_{i-1} + f_i}, q_{i+1}, \cdots, q_N \right), \\
\bar{q}' &= \left( \bar{q}_1, \cdots, \bar{q}_{i-2}, \frac{f_{i-1} \bar{q}_{i-1} + f_i \bar{q}_i}{f_{i-1} + f_i}, \bar{q}_{i+1}, \cdots, \bar{q}_N \right), \\
f' &= \left( f_1, \cdots, f_{i-2}, f_{i-1} + f_i, f_{i+1}, \cdots, f_N \right), \\
\lambda' &= \left( \lambda_1, \cdots, \lambda_{i-2}, \lambda_{i-1} + \lambda_i, \lambda_{i+1}, \cdots, \lambda_N \right).
\end{aligned}$$

We have that

$$\sum_{j=1}^k f'_j q'_j = \begin{cases} \sum_{j=1}^{k} f_j q_j & k \leq i-2, \\ \sum_{j=1}^{k+1} f_j q_j & k \geq i-1, \end{cases}$$

and a similar property holds for $\bar{q}'$. This implies that $\bar{q}'$, $q'$ and $f'$ satisfy the hypothesis of our claim. As a result, by the induction hypothesis, there exists $\tau' \in \Delta(\Delta(\Theta'))$, with $\Theta' = \{1, 2, \cdots, i-2, \hat{i}, i+1, \cdots, N\}$, such that

$$f' = \int \mu\, d\tau', \qquad r' = \operatorname{diag}(f')^{-1} \int \mu \mu^T\, d\tau'\, q', \qquad (\lambda')^T \bar{q}' \leq (\lambda')^T r'. \tag{23}$$

We construct $\tau \in \Delta(\Delta(\Theta))$ from $\tau'$ by assuming that $\tau$ always sends the same signal for $q_{i-1}$ and $q_i$ as $\tau'$. Formally, we define a subset $K$ of $\Delta(\Theta)$ as follows:

$$K = \left\{ \mu \in \Delta(\Theta) \;\middle|\; \exists\, \mu' \in \Delta(\Theta'),\ \mu_j = \mu'_j,\ j \neq i, i-1,\ \mu_{i-1} = \frac{f_{i-1}}{f_{i-1} + f_i} \mu'_{\hat{i}},\ \mu_i = \frac{f_i}{f_{i-1} + f_i} \mu'_{\hat{i}} \right\}.$$

This is a Borel subset of $\Delta(\Theta)$: the set of beliefs where $\mu_i / \mu_{i-1} = f_i / f_{i-1}$. Moreover, for any Borel subset $A$ of $K$, we can define its projection $P(A)$ in $\Delta(\Theta')$ as

$$P(A) = \left\{ \mu' \;\middle|\; \exists\, \mu \in A,\ \mu'_j = \mu_j,\ j \neq \hat{i},\ \mu'_{\hat{i}} = \mu_{i-1} + \mu_i \right\}.$$

Given this, we define

$$\tau(A) = \tau'(P(A \cap K)).$$

In words, the above information structure keeps the receiver fully uninformed about states $i$ and $i-1$, since their relative probabilities are always equal to the relative probability under the prior. We have

$$\begin{aligned}
\int \mu\, d\tau = \int_K \mu\, d\tau
&= \int_{\Delta(\Theta')} \left( \mu_1, \cdots, \mu_{i-2}, \frac{f_{i-1}}{f_{i-1} + f_i} \mu_{\hat{i}}, \frac{f_i}{f_{i-1} + f_i} \mu_{\hat{i}}, \mu_{i+1}, \cdots, \mu_N \right)^T d\tau' \\
&= \left( \int \mu_1\, d\tau', \cdots, \int \mu_{i-2}\, d\tau', \frac{f_{i-1}}{f_{i-1} + f_i} \int \mu_{\hat{i}}\, d\tau', \frac{f_i}{f_{i-1} + f_i} \int \mu_{\hat{i}}\, d\tau', \int \mu_{i+1}\, d\tau', \cdots, \int \mu_N\, d\tau' \right)^T \\
&= \left( f_1, \cdots, f_{i-2}, \frac{f_{i-1}}{f_{i-1} + f_i} (f_{i-1} + f_i), \frac{f_i}{f_{i-1} + f_i} (f_{i-1} + f_i), f_{i+1}, \cdots, f_N \right)^T = f,
\end{aligned}$$

where we have used the fact that $\tau'$ satisfies (23).

Moreover, we have

$$\operatorname{diag}(f)^{-1} \int \mu \mu^T\, d\tau = \left( \frac{1}{f_k} \int \mu_k \mu_j\, d\tau \right)_{k,j \in \{1, \cdots, N\}}, \qquad
\frac{1}{f_k} \int \mu_k \mu_j\, d\tau = \begin{cases}
\dfrac{1}{f_k} \int \mu_k \mu_j\, d\tau' & k, j \neq i, i-1, \\[8pt]
\dfrac{1}{f_i + f_{i-1}} \int \mu_{\hat{i}} \mu_j\, d\tau' & k \in \{i, i-1\},\ j \neq i, i-1, \\[8pt]
\dfrac{f_j}{f_k (f_i + f_{i-1})} \int \mu_k \mu_{\hat{i}}\, d\tau' & k \neq i, i-1,\ j \in \{i, i-1\}, \\[8pt]
\dfrac{f_j}{(f_i + f_{i-1})^2} \int (\mu_{\hat{i}})^2\, d\tau' & k, j \in \{i, i-1\}.
\end{cases}$$

Therefore, if we let $r = \operatorname{diag}(f)^{-1} \left( \int \mu \mu^T\, d\tau \right) q$, we have

$$r_k = \frac{1}{f_k} \sum_{j=1}^N \int \mu_k \mu_j\, d\tau\, q_j.$$

If $k \neq i, i-1$, we can write the above as

$$\begin{aligned}
r_k &= \frac{1}{f_k} \sum_{j \neq i, i-1} \int \mu_k \mu_j\, d\tau'\, q_j + \frac{1}{f_k} \frac{f_{i-1}}{f_i + f_{i-1}} \int \mu_k \mu_{\hat{i}}\, d\tau'\, q_{i-1} + \frac{1}{f_k} \frac{f_i}{f_i + f_{i-1}} \int \mu_k \mu_{\hat{i}}\, d\tau'\, q_i \\
&= \frac{1}{f_k} \sum_{j \neq i, i-1} \int \mu_k \mu_j\, d\tau'\, q_j + \frac{1}{f_k} \int \mu_k \mu_{\hat{i}}\, d\tau'\, \frac{f_i q_i + f_{i-1} q_{i-1}}{f_i + f_{i-1}} \\
&= \frac{1}{f_k} \sum_{j \in \Theta'} \int \mu_k \mu_j\, d\tau'\, q'_j = r'_k,
\end{aligned}$$

where we have used the fact that $q_j = q'_j$ for all $j \neq i, i-1$, that $q'_{\hat{i}} = \frac{f_i q_i + f_{i-1} q_{i-1}}{f_{i-1} + f_i}$, and the definition of $r'$. If $k = i, i-1$, then

$$\begin{aligned}
r_k &= \frac{1}{f_i + f_{i-1}} \sum_{j \neq i, i-1} \int \mu_{\hat{i}} \mu_j\, d\tau'\, q_j + \frac{f_i}{(f_i + f_{i-1})^2} \int (\mu_{\hat{i}})^2\, d\tau'\, q_i + \frac{f_{i-1}}{(f_i + f_{i-1})^2} \int (\mu_{\hat{i}})^2\, d\tau'\, q_{i-1} \\
&= \frac{1}{f_i + f_{i-1}} \sum_{j \neq i, i-1} \int \mu_{\hat{i}} \mu_j\, d\tau'\, q_j + \frac{1}{f_i + f_{i-1}} \int (\mu_{\hat{i}})^2\, d\tau'\, \frac{f_i q_i + f_{i-1} q_{i-1}}{f_i + f_{i-1}} \\
&= \frac{1}{f'_{\hat{i}}} \sum_{j \in \Theta'} \int \mu_{\hat{i}} \mu_j\, d\tau'\, q'_j = r'_{\hat{i}}.
\end{aligned}$$

This implies that

$$\lambda^T r = \sum_{j=1}^N \lambda_j r_j = \sum_{j \neq i, i-1} \lambda_j r'_j + (\lambda_i + \lambda_{i-1})\, r'_{\hat{i}} = \sum_{j \neq \hat{i}} \lambda'_j r'_j + \lambda'_{\hat{i}} r'_{\hat{i}} = (\lambda')^T r'.$$

Moreover,

$$\lambda^T \bar{q} = \sum_{j=1}^N \lambda_j \bar{q}_j = \sum_{j \neq i, i-1} \lambda_j \bar{q}_j + \lambda_{i-1} \bar{q}_{i-1} + \lambda_i \bar{q}_i.$$

We can write

$$\lambda_{i-1} \bar{q}_{i-1} + \lambda_i \bar{q}_i = f_{i-1} \frac{\lambda_{i-1}}{f_{i-1}} \bar{q}_{i-1} + f_i \frac{\lambda_i}{f_i} \bar{q}_i.$$

Since $\frac{\lambda_{i-1}}{f_{i-1}} \geq \frac{\lambda_i}{f_i}$ and $\bar{q}_{i-1} \leq \bar{q}_i$, Chebyshev's sum inequality implies that

$$\begin{aligned}
\frac{f_{i-1}}{f_{i-1} + f_i} \frac{\lambda_{i-1}}{f_{i-1}} \bar{q}_{i-1} + \frac{f_i}{f_i + f_{i-1}} \frac{\lambda_i}{f_i} \bar{q}_i
&\leq \left( \frac{f_{i-1}}{f_{i-1} + f_i} \frac{\lambda_{i-1}}{f_{i-1}} + \frac{f_i}{f_i + f_{i-1}} \frac{\lambda_i}{f_i} \right) \times \left( \frac{f_{i-1}}{f_{i-1} + f_i} \bar{q}_{i-1} + \frac{f_i}{f_i + f_{i-1}} \bar{q}_i \right) \\
&= \frac{\lambda_{i-1} + \lambda_i}{f_{i-1} + f_i} \cdot \frac{f_{i-1} \bar{q}_{i-1} + f_i \bar{q}_i}{f_{i-1} + f_i} = \frac{\lambda'_{\hat{i}}}{f_{i-1} + f_i}\, \bar{q}'_{\hat{i}}.
\end{aligned}$$

That is,

$$\lambda_{i-1} \bar{q}_{i-1} + \lambda_i \bar{q}_i \leq \lambda'_{\hat{i}}\, \bar{q}'_{\hat{i}}.$$

We can therefore write $\lambda^T \bar{q} \leq (\lambda')^T \bar{q}'$, and as a result

$$\lambda^T \bar{q} \leq (\lambda')^T \bar{q}' \leq (\lambda')^T r' = \lambda^T r,$$

which establishes the claim.

Case 2. $\forall\, i \in \{2, \cdots, N\}$, $\frac{\lambda_{i-1}}{f_{i-1}} \leq \frac{\lambda_i}{f_i}$. Then, with the convention $\lambda_0 / f_0 = 0$, we can write

$$\begin{aligned}
\lambda^T \bar{q} = \sum_{i=1}^N \lambda_i \bar{q}_i = \sum_{i=1}^N \frac{\lambda_i}{f_i} f_i \bar{q}_i
&= \sum_{i=1}^N \left( \frac{\lambda_i}{f_i} - \frac{\lambda_{i-1}}{f_{i-1}} \right) \sum_{j=i}^N f_j \bar{q}_j \\
&= \sum_{i=1}^N \left( \frac{\lambda_i}{f_i} - \frac{\lambda_{i-1}}{f_{i-1}} \right) \left[ f^T q - \sum_{j=1}^{i-1} f_j \bar{q}_j \right] \\
&\leq \sum_{i=1}^N \left( \frac{\lambda_i}{f_i} - \frac{\lambda_{i-1}}{f_{i-1}} \right) \left[ f^T q - \sum_{j=1}^{i-1} f_j q_j \right] \\
&= \sum_{i=1}^N \left( \frac{\lambda_i}{f_i} - \frac{\lambda_{i-1}}{f_{i-1}} \right) \sum_{j=i}^N f_j q_j = \sum_{i=1}^N \frac{\lambda_i}{f_i} f_i q_i = \lambda^T q.
\end{aligned}$$

Note that $q$ is the vector of expected signaled qualities under full information, i.e., $\tau(\{e_i\}) = f_i$, and thus $q \in S$. This completes the proof.

A.3 Proof of Proposition 2

Proof.

The proof of incentive compatibility and the "only if" direction of the claim is straightforward and is omitted. Here, we prove the "if" part. In other words, consider a pair of functions $\{q(\theta), \bar q(\theta)\}_{\theta\in\Theta}$ that satisfy conditions 1 and 2 in the statement of the proposition. Consider a sequence of partitions of $\Theta$ given by
\[
\Theta^n = \left\{ \underline\theta = \theta^n_0 < \theta^n_1 < \cdots < \theta^n_n = \bar\theta \right\}
\]
for $n = 1, 2, \cdots$ with $\min_{i: 0\le i\le n-1} \theta^n_{i+1} - \theta^n_i \to 0$ and $\Theta^n \subset \Theta^{n+1}$. Define $f^n$, $q^n$, $\bar q^n$ as follows:
\[
f^n_i = F(\theta^n_i -) - F\left(\theta^n_{i-1}\right), \quad 1 \le i \le n-1, \qquad f^n_n = F\left(\bar\theta\right) - F\left(\theta^n_{n-1}\right),
\]
\[
q^n_i = \begin{cases} \dfrac{\int_{\theta^n_{i-1}}^{\theta^n_i} q(\theta)\, dF(\theta)}{f^n_i} & \text{if } f^n_i > 0 \\[2ex] q\left(\theta^n_{i-1}\right) & \text{if } f^n_i = 0 \end{cases}
\qquad
\bar q^n_i = \begin{cases} \dfrac{\int_{\theta^n_{i-1}}^{\theta^n_i} \bar q(\theta)\, dF(\theta)}{f^n_i} & \text{if } f^n_i > 0 \\[2ex] \bar q\left(\theta^n_{i-1}\right) & \text{if } f^n_i = 0 \end{cases}
\]
Let $q^n$ and $\bar q^n$ represent the discrete random variables whose values are given by $q^n_i$ and $\bar q^n_i$ with probability $f^n_i$. Given that $q(\theta)$ and $\bar q(\theta)$ are increasing functions, $q^n_i$ and $\bar q^n_i$ are increasing in $i$. Moreover, by construction, $q^n \succcurlyeq_{F^n} \bar q^n$. Thus, we can use Theorem 1: an information structure $(\pi^n, S^n)$ exists, where $\pi^n : \Theta^n \to \Delta(S^n)$, under which $\bar q^n_i = E\left[ E\left[ q^n_i \mid s \right] \mid q^n_i \right]$. Note that each $(\pi^n, S^n)$ induces a distribution over posterior beliefs of the buyers given by $\tau^n \in \Delta(\Delta(\Theta^n))$. Note also that any measure in $\Delta(\Theta^n)$ can be embedded in $\Delta(\Theta)$: for any $\mu = (\mu_1, \cdots, \mu_n) \in \Delta(\Theta^n)$ we can construct $\hat\mu \in \Delta(\Theta)$ defined by $\hat\mu(A) = \sum_{i=1}^n \mu_i \mathbf{1}[\theta^n_i \in A]$, where $A$ is an arbitrary Borel subset of $\Theta$. Similarly, we can find $\hat\tau^n \in \Delta(\Delta(\Theta))$ which is equivalent to $\tau^n$.

Now consider the random variable representing the joint distribution of $\theta^n$ and posterior mean $\mu[q] = \int q(\theta)\, d\mu$ for any $\mu \in \mathrm{Supp}(\hat\tau^n)$. Let this be given by $\zeta^n = (q^n, \mu[q])$ where $\zeta^n \in \Delta(q(\Theta) \times q(\Theta))$, with $q(\Theta) = \{ q(\theta) \mid \theta \in \Theta \}$.
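The discretization step above can be sketched numerically. The following is a minimal illustration, not the authors' procedure: the helper names `discretize` and `majorizes`, the uniform distribution, and the two quality functions are all hypothetical choices made for the example. It also assumes the convention that $q \succcurlyeq_F \bar q$ means the lower partial sums of $f_i q_i$ never exceed those of $f_i \bar q_i$, with equal totals.

```python
def discretize(q, F, grid, steps=1000):
    """Cell masses f_i and F-weighted cell averages of q on each cell of `grid`.

    Assumption: each cell is treated as [lo, hi) and the Stieltjes integral
    of q dF is approximated by a midpoint sum with `steps` sub-cells.
    """
    masses, averages = [], []
    for lo, hi in zip(grid[:-1], grid[1:]):
        mass = F(hi) - F(lo)
        if mass > 0:
            width = (hi - lo) / steps
            total = 0.0
            for s in range(steps):
                a = lo + s * width
                b = a + width
                total += q(0.5 * (a + b)) * (F(b) - F(a))
            avg = total / mass
        else:
            # Zero-mass cell: fall back to the value at the left endpoint.
            avg = q(lo)
        masses.append(mass)
        averages.append(avg)
    return masses, averages


def majorizes(f, q, q_bar, tol=1e-9):
    """Check q >=_F q_bar: partial sums of f*q stay below those of f*q_bar,
    and the two totals coincide (same mean)."""
    s_q = s_bar = 0.0
    for fi, qi, bi in zip(f, q, q_bar):
        s_q += fi * qi
        s_bar += fi * bi
        if s_q > s_bar + tol:
            return False
    return abs(s_q - s_bar) <= tol


# Hypothetical example: theta uniform on [0, 1], true quality q(theta) = theta,
# signaled quality a mean-preserving contraction of it.
def F(t):
    return t


def q_true(t):
    return t


def q_sig(t):
    return 0.25 + 0.5 * t


grid = [i / 8 for i in range(9)]
f_n, q_n = discretize(q_true, F, grid)
_, qbar_n = discretize(q_sig, F, grid)
print(majorizes(f_n, q_n, qbar_n))
```

In this example the discretized vectors inherit monotonicity and the majorization ordering from the continuous functions, which is the property the proof relies on when invoking Theorem 1 on each partition.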
By an application of the Riesz representation theorem (see Theorem 14.12 in Aliprantis and Border (2013)), $\Delta(q(\Theta) \times q(\Theta))$ is compact according to the weak-* topology. This implies that the sequence $\{\zeta^n\}$ must have a convergent subsequence, whose limit we denote by $\zeta \in \Delta(q(\Theta) \times q(\Theta))$.

Footnote: A rough argument for sequential compactness of $\Delta(q(\Theta) \times q(\Theta))$ is as follows. Note that $C(X)$, the space of all continuous functions on $X = q(\Theta) \times q(\Theta)$, is separable since $X$ is a compact, metrizable, and Hausdorff space (see Riesz's Theorem in Royden and Fitzpatrick (1988), Section 12.3, page 251). This implies that there exists a countable subset $\{f_i\}_{i=1}^\infty$ of $C(X)$ which is dense in $C(X)$ according to the sup-norm. Thus, for any sequence of measures $\{\mu_m\}_{m=1}^\infty$ in $\Delta(X)$ and for a given $i$, the sequence $\{\mu_m(f_i)\}_{m=1}^\infty$, where $\mu(f) = \int f(\theta)\, d\mu$, must have a convergent subsequence. Iterating repeatedly, as we increase $i$, we can find a subsequence $\{\mu_{m_k}\}_{k=1}^\infty$ along which $\{\mu_{m_k}(f_i)\}$ converges for every $i$. We define $\zeta(f_i) = \lim_{k\to\infty} \mu_{m_k}(f_i)$. Since $\{f_i\}$ is dense in $C(X)$, $\zeta(f) = \lim_{k\to\infty} \mu_{m_k}(f)$ must exist for all $f \in C(X)$ and can be similarly defined.

Let $\mathcal{G}^n$ be the $\sigma$-field generated by the sets
\[
\left\{ \left[ q(\theta^n_i), q\left(\theta^n_{i+1}\right) \right) \right\}_{i \le n-2} \cup \left\{ \left[ q\left(\theta^n_{n-1}\right), q\left(\bar\theta\right) \right] \right\}
\]
and let $\mathcal{F}^n = \mathcal{G}^n \times \{\emptyset, \Delta(q(\Theta))\}$. In words, $\mathcal{F}^n$ conveys the information that $q(\theta) \in \left[ q(\theta^n_i), q\left(\theta^n_{i+1}\right) \right)$ or $q(\theta) \in \left[ q\left(\theta^n_{n-1}\right), q\left(\bar\theta\right) \right]$. Note that $\mathcal{F}^n \subset \mathcal{F}^{n+1}$ because $\Theta^n \subset \Theta^{n+1}$. Moreover,
\[
E\left[ \zeta^n \mid \mathcal{F}^n \right] = (q^n, \bar q^n),
\]
where the above holds by the construction of $\tau^n$ and $\zeta^n$.
As a result,
\[
E\left[ \zeta^{n+1} \mid \mathcal{F}^n \right] = E\left[ E\left[ \zeta^{n+1} \mid \mathcal{F}^{n+1} \right] \mid \mathcal{F}^n \right] = E\left[ \left( q^{n+1}, \bar q^{n+1} \right) \mid \mathcal{F}^n \right] = (q^n, \bar q^n),
\]
where the last equality follows because $E[q(\theta) \mid \mathcal{F}^n] = q^n$ and $E[\bar q(\theta) \mid \mathcal{F}^n] = \bar q^n$ given the definition of $q^n$ and $\bar q^n$ above. All of this implies that $\mathcal{F}^n$ is a filtration and $(\zeta^n, \mathcal{F}^n)$ forms a bounded martingale (for a definition see Doob (1994)). Hence, by Doob's martingale convergence theorem (see Theorem XI.14 in Doob (1994)), we must have that
\[
\lim_{n\to\infty} E\left[ \zeta^n \mid \mathcal{F}^n \right] = E\left[ \zeta \mid \mathcal{F} \right].
\]
Therefore, $E_\zeta\left[ \mu[q] \mid q(\theta) \right] = \bar q(\theta)$. This concludes the proof.

A.4 Proof of Proposition 3

Proof.

We prove the claim for a discrete distribution. The general claim follows from arguments similar to those made in the proof of Proposition 2.

Suppose that the majorization inequality holds with equality for some $k < N$, i.e., we have
\[
\sum_{i=1}^k f_i \bar q_i = \sum_{i=1}^k f_i q_i .
\]
Recall the proof of Proposition 1. If for all $j > k$, $q_j > q_k$, then the above equality implies that the inequality (22) must hold with equality. As a result, we must have that $a_{ij} = 0$ for all $j \ge k+1$, $i \le k$. By definition, $a_{ij}$ is given by
\[
a_{ij} = \sum_{s\in S} \pi(\{s\} \mid q_i)\, \frac{\pi(\{s\} \mid q_j)\, f_j}{\sum_{l=1}^N \pi(\{s\} \mid q_l)\, f_l},
\]
where $S = \cup_{l=1}^N \mathrm{Supp}(\pi(\cdot \mid q_l))$. Hence, for all $i \le k$, $j \ge k+1$, $\pi(\{s\} \mid q_i)\, \pi(\{s\} \mid q_j) = 0$. This implies that
\[
\cup_{i\le k}\, \mathrm{Supp}(\pi(\cdot \mid q_i)) \cap \cup_{j \ge k+1}\, \mathrm{Supp}(\pi(\cdot \mid q_j)) = \emptyset,
\]
which establishes the claim.

Footnote (continued): It is easy to show that $\zeta(f)$ is a linear functional over $C(X)$ and thus a member of its dual, $C(X)^*$. Hence, there must exist a measure $\hat\zeta \in \Delta(X)$ where $\zeta(f) = \int f\, d\hat\zeta$. This implies that $\mu_{m_k}$ converges to $\hat\zeta$ according to the weak-* topology and hence $\Delta(X)$ is sequentially compact.

A.5 Proof of Proposition 4

Proof.

Consider the relaxed optimization problem given by max (cid:90) θθ λ ( θ ) Π ( θ ) dF ( θ ) subject to Π (cid:48) ( θ ) = − C θ ( q ( θ ) , θ ) , ∀ θq ( θ ) : increasing (cid:90) θθ Π ( θ ) dF ( θ ) = (cid:90) θθ [ q ( θ ) − C ( q ( θ ) , θ )] dF ( θ )Π ( θ ) ≥ We show that the solution to the above optimization satisfies the majorization constraint.Given incentive compatibility, we can calculate

Π ( θ ) using integration by parts Π ( θ ) = (cid:90) θθ (cid:20) q ( θ ) − C ( q ( θ ) , θ ) + C θ ( q ( θ ) , θ ) 1 − F ( θ ) f ( θ ) (cid:21) dF ( θ ) Hence, the objective becomes (cid:90) θθ (cid:40) q ( θ ) − C ( q ( θ ) , θ ) + C θ ( q ( θ ) , θ ) [1 − F ( θ )] − (cid:82) θθ λ ( θ (cid:48) ) dF ( θ (cid:48) ) f ( θ ) (cid:41) dF ( θ ) (24)Note that since (cid:82) λ ( θ ) dF = 1 and λ ( θ ) is decreasing, we have > (cid:82) θθ λ ( θ (cid:48) ) dF ( θ (cid:48) )1 − F ( θ ) , ∀ θ Now suppose that for an interval I = [ θ , θ ] of θ ’s, C q ( q ( θ ) , θ ) > at the optimum. Ifover this interval, q ( θ ) is strictly increasing, then we can reduce q ( θ ) such that at its lowerend, q ( θ ) does not decrease. If C q > , a perturbation of q ( θ ) given by δq ( θ ) < changesthe objective by (cid:90) I (cid:40) − C q ( q ( θ ) , θ ) + C θq ( q ( θ ) , θ ) 1 − F ( θ ) − (cid:82) θθ λ ( θ (cid:48) ) dF ( θ (cid:48) ) f ( θ ) (cid:41) δq ( θ ) dF ( θ ) We have C θq ( q ( θ ) , θ ) < , δq ( θ ) < , − F ( θ ) − (cid:82) θθ λ ( θ (cid:48) ) dF ( θ (cid:48) ) f ( θ ) > (cid:90) I C θq ( q ( θ ) , θ ) 1 − F ( θ ) − (cid:82) θθ λ ( θ (cid:48) ) dF ( θ (cid:48) ) f ( θ ) δq ( θ ) dF > Moreover, (cid:90) [1 − C q ( q ( θ ) , θ )] δq ( θ ) dF > Thus, this perturbation increases the objective. Therefore, we cannot have C q > at theoptimum. Thus, we have C q ≤ . If on the other hand, q ( θ ) is constant over an interval ofthe form (cid:104) θ , ˆ θ (cid:105) , we can find the lowest θ for which q ( θ ) = q ( θ ) . Since C q > over I ,either there is a discontinuity at θ in which case the above argument works or C q > evenfor values below θ . In this case, we extend I below θ and repeat the above perturbation.From the incentive constraint, we have q (cid:48) ( θ ) = C q ( q ( θ ) , θ ) q (cid:48) ( θ ) ≤ q (cid:48) ( θ ) Therefore, the function q ( θ ) − q ( θ ) is a weakly decreasing function. 
This implies that (cid:82) θθ [ q ( θ (cid:48) ) − q ( θ (cid:48) )] dF ( θ (cid:48) ) F ( θ ) ≥ (cid:82) θθ [ q ( θ (cid:48) ) − q ( θ (cid:48) )] dF ( θ (cid:48) )1 − F ( θ ) Hence, (cid:90) θθ [ q ( θ (cid:48) ) − q ( θ (cid:48) )] dF ( θ (cid:48) ) + (cid:90) θθ [ q ( θ (cid:48) ) − q ( θ (cid:48) )] dF ( θ (cid:48) ) ≤ (cid:90) θθ [ q ( θ (cid:48) ) − q ( θ (cid:48) )] dF ( θ (cid:48) ) + (1 − F ( θ )) (cid:82) θθ [ q ( θ (cid:48) ) − q ( θ (cid:48) )] dF ( θ (cid:48) ) F ( θ )= (cid:82) θθ [ q ( θ (cid:48) ) − q ( θ (cid:48) )] dF ( θ (cid:48) ) F ( θ ) (cid:18) − F ( θ ) F ( θ ) (cid:19) = (cid:90) θθ [ q ( θ (cid:48) ) − q ( θ (cid:48) )] dF ( θ (cid:48) ) (cid:18) − F ( θ ) F ( θ ) (cid:19) = (cid:90) θθ [ q ( θ (cid:48) ) − q ( θ (cid:48) )] dF ( θ (cid:48) ) 1 F ( θ ) which implies that the allocation satisfies the majorization constraint.The above proof also illustrates that when C ( · , · ) is strictly sub-modular, then it mustbe that C q < for all values of θ . This is because the integrand in objective in (24) is strictlydecreasing in q ( θ ) at q F B ( θ ) . This concludes the proof.45 .6 Proof of Proposition 5 Proof.

We show that the first-best allocation is the solution to the relaxed problem where the incentive constraint is replaced with

Π ( θ ) − Π ( θ ) ≤ − (cid:90) θθ C θ ( q ( θ (cid:48) ) , θ (cid:48) ) dθ (cid:48) (25)Suppose to the contrary that at the optimum, there is an interval of θ ’s such that the ma-jorization constraint is slack. Let’s consider such an interval I = ( θ , θ ) and assume that themajorization constraint binds at θ and θ . Such interval must exists since (cid:82) θθ [ q ( θ (cid:48) ) − q ( θ (cid:48) )] dF ( θ (cid:48) ) is continuous function of θ and as a result the set of θ ’s for which it takes positive values isan open set. Hence, it must be a countable union of disjoint intervals.We show a contradiction in a few steps: Step 1.

For any subinterval of I , (25) cannot be slack. Suppose, to the contrary, that thisis the case and that there is a subinterval I (cid:48) ⊂ I in which (25) is slack. Let ˆ θ be themid-point of I (cid:48) . Then consider the following perturbation δq ( θ ) =  − ε (cid:48) θ ∈ I (cid:48) , θ < ˆ θε θ ∈ I (cid:48) , θ ≥ ˆ θ θ / ∈ I (cid:48) where − ε (cid:48) (cid:104) F (cid:16) ˆ θ (cid:17) − F (min I (cid:48) ) (cid:105) + ε (cid:104) F (max I (cid:48) ) − F (cid:16) ˆ θ (cid:17)(cid:105) = 0 and ε, ε (cid:48) > . Sincemajorization is slack over I (cid:48) , there exists a value of ε and ε (cid:48) small enough so that thisperturbation does not violate the majorization constraint. Moreover, (25) is slack over I (cid:48) there exists a value of ε and ε (cid:48) small enough so that (25) is satisfied. As a result, theperturbed allocation is still feasible and satisfies the constraints. The change in theobjective resulting from this perturbation is given by (cid:90) I (cid:48) λ ( θ ) δ Π ( θ ) dF ( θ ) = − ε (cid:48) (cid:90) ˆ θ min I (cid:48) λ ( θ ) dF ( θ ) + ε (cid:90) max I (cid:48) ˆ θ λ ( θ ) dF ( θ )= ε (cid:90) max I (cid:48) ˆ θ λ ( θ ) dF ( θ ) − F (max I (cid:48) ) − F (cid:16) ˆ θ (cid:17) F (cid:16) ˆ θ (cid:17) − F (min I (cid:48) ) (cid:90) ˆ θ min I (cid:48) λ ( θ ) dF ( θ )  = ε (cid:16) F (max I (cid:48) ) − F (cid:16) ˆ θ (cid:17)(cid:17)  (cid:82) max I (cid:48) ˆ θ λ ( θ ) dF ( θ ) F (max I (cid:48) ) − F (cid:16) ˆ θ (cid:17) − (cid:82) ˆ θ min I (cid:48) λ ( θ ) dF ( θ ) F (cid:16) ˆ θ (cid:17) − F (min I (cid:48) )  > where the last inequality holds because λ ( θ ) is strictly increasing. The above impliesthe required contradiction since this perturbation increases the objective. Step 2. If q ( θ ) is strictly increasing over a subinterval of I , then it must be that C q ( q ( θ ) , θ ) ≥ . Suppose not. 
Then, since q ( θ ) is strictly increasing over46 (cid:48) ⊂ I , it is possible to find a perturbation δq ( θ ) of q ( θ ) with δq (max I (cid:48) ) = 0 and δq ( θ ) > , ∀ θ ∈ I (cid:48) / { max I (cid:48) } which keeps q ( θ ) monotone – see Figure 3.Let δq ( θ ) be given by δq ( θ ) = (cid:40) δq ( θ ) C q ( q ( θ ) , θ ) + (cid:82) I (cid:48) [1 − C q ( q ( θ ) , θ )] δq ( θ ) dF ( θ ) θ ∈ I (cid:48) (cid:82) I (cid:48) [1 − C q ( q ( θ ) , θ )] δq ( θ ) dF ( θ ) θ / ∈ I (cid:48) We have ∀ θ ∈ I (cid:48) ,δ Π ( θ ) = δq ( θ ) − C q ( q ( θ ) , θ ) δq ( θ ) = (cid:90) I (cid:48) [1 − C q ( q ( θ ) , θ )] δq ( θ ) dF ( θ ) ∀ θ / ∈ I (cid:48) ,δ Π ( θ ) = (cid:90) I (cid:48) [1 − C q ( q ( θ ) , θ )] δq ( θ ) dF ( θ ) This implies that the perturbation keeps the LHS of (25) unchanged. Since C θq ≤ , the perturbation increases the RHS of (25) and as a result the inequality (25) issatisfied for this perturbed allocation. Moreover, since majorization is slack over I (cid:48) for a small enough perturbation δq ( θ ) it is still satisfied. This perturbationincreases the profits of all sellers while it keep buyers’ utility unchanged. Thisimplies the required contradiction. Step 3.

We show that the above two statements lead to a contradiction. Since (25)binds for all values of θ except for a measure zero set, it must be that Π ( θ ) isalmost everywhere monotone and as a result almost everywhere differentiable.We thus have Π (cid:48) ( θ ) = − C θ ( q ( θ ) , θ ) which then implies q (cid:48) ( θ ) = C q ( q ( θ ) , θ ) q (cid:48) ( θ ) ≥ q (cid:48) ( θ ) Since majorization binds at θ and θ , we must have that q ( θ ) ≥ q ( θ ) and (cid:82) θ θ [ q ( θ ) − q ( θ )] dF ( θ ) = 0 . Therefore, we must have that q ( θ ) ≥ q ( θ ) foralmost all values of θ ∈ I . Since (cid:82) θ θ [ q ( θ ) − q ( θ )] dF ( θ ) = 0 , we must have that q ( θ ) = q ( θ ) for almost all values of θ ∈ I . This in turn implies that majorizationis binding for almost all values of θ ∈ I which is a contradiction.The above arguments establishes that the majorization constraint must be binding for allvalues of θ . Hence, q ( θ ) = q ( θ ) for all values of θ and thus the objective is maximized at q ( θ ) = q F B ( θ ) which concludes the proof. 47 .7 Proof of Proposition 6 Proof.

We show the desired properties in the solution to the more relaxed problem where incentive compatibility

Π ( θ ) − Π ( θ ) = − (cid:90) θθ C θ ( q ( θ (cid:48) ) , θ (cid:48) ) dθ (cid:48) q ( θ ) : increasingis replaced with the following Π ( θ ) − Π ( θ ) ≤ − (cid:90) θθ C θ ( q ( θ (cid:48) ) , θ (cid:48) ) dθ (cid:48) , ∀ θ ≤ θ ∗ (26) Π (cid:0) θ (cid:1) − Π ( θ ) ≥ − (cid:90) θθ C θ ( q ( θ (cid:48) ) , θ (cid:48) ) dθ (cid:48) , ∀ θ ≥ θ ∗ (27) q ( θ ) : increasingAs we will show, in the solution of this more relaxed programing problem, the above in-equalities are binding.In order to show the claim, we first show that if the majorization inequality binds forsome value of θ (cid:48) ≤ θ ∗ , then it must be binding for all values of θ ≤ θ (cid:48) . The argument issimilar to that of proof of Proposition 5. Hence, we skip the details and describe in brief. Inparticular, suppose that there exists an interval I = [ θ , θ ] with θ ≤ θ ∗ , where majorizationis binding at θ and θ while it is slack for all values of θ ∈ ( θ , θ ) . Then the exact samesteps as in proof of Proposition 5 lead to a contradiction.Next, we show that for any θ > θ ∗ , the majorization constraint is slack. First, notethat the incentive constraints (27) must be binding for all values of θ > θ ∗ . Suppose to thecontrary that the incentive constraint is slack for an interval, I , of θ ’s above θ ∗ . Then let ˆ θ be the mid-point of I and consider the following perturbation: δq ( θ ) =  ε θ ∈ I, θ ≤ ˆ θ − ε (cid:48) θ ∈ I, θ > ˆ θ θ / ∈ I where ε, ε (cid:48) > and ε (cid:104) F (cid:16) ˆ θ (cid:17) − F (min I ) (cid:105) − ε (cid:48) (cid:104) F (max I ) − F (cid:16) ˆ θ (cid:17)(cid:105) = 0 . This is incentivecompatible for a small enough values of ε, ε (cid:48) since incentive compatibility is slack over I .Moreover, since signaled qualities are being allocated to lower θ ’s, the perturbed allocationsatisfies majorization. 
Hence, it must increase the value of the objective since $\lambda(\theta)$ is strictly decreasing for values of $\theta \ge \theta^*$.

Now, suppose to the contrary that for some $\theta' > \theta^*$, majorization is binding. Given that we have argued that (27) is binding for all values $\theta \ge \theta^*$ and since $\int_\theta^{\bar\theta} C_\theta\left( q(\hat\theta), \hat\theta \right) d\hat\theta$ is continuous in $\theta$, $\Pi(\theta)$ must be continuous over $\left[ \theta^*, \bar\theta \right]$. Moreover, since majorization is binding at $\theta'$, we must have
\[
\int_{\theta'}^{\theta} \left[ \bar q(\hat\theta) - q(\hat\theta) \right] dF(\hat\theta) \ge 0, \quad \theta > \theta',
\qquad
\int_{\theta}^{\theta'} \left[ \bar q(\hat\theta) - q(\hat\theta) \right] dF(\hat\theta) \le 0, \quad \theta < \theta'.
\]
Dividing the top inequality by $F(\theta) - F(\theta')$ and the bottom one by $F(\theta') - F(\theta)$ and taking the limit as $\theta$ tends to $\theta'$, using l'Hôpital's rule, we have
\[
\bar q(\theta') \ge q(\theta'), \qquad \bar q(\theta'-) \le q(\theta'-).
\]
These imply that

Π ( θ (cid:48) ) = q ( θ (cid:48) − ) − C ( q ( θ (cid:48) − ) , θ (cid:48) ) ≤ q ( θ (cid:48) − ) − C ( q ( θ (cid:48) − ) , θ (cid:48) ) ≤ q F B ( θ (cid:48) ) − C (cid:0) q F B ( θ (cid:48) ) , θ (cid:48) (cid:1) Now, we show that given this property there is an alternative allocation that improves theobjective in the relaxed problem. In particular, consider the solution of the problem max q ( θ ) ,q ( θ ) (cid:90) θθ (cid:48) Π ( θ ) λ ( θ ) dF ( θ ) subject to Π (cid:48) ( θ ) = − C θ ( q ( θ ) , θ )Π ( θ ) = q ( θ ) − C ( q ( θ ) , θ ) q ( θ ) : monotone (cid:90) θθ (cid:48) Π ( θ ) dF ( θ ) = (cid:90) θθ (cid:48) [ q ( θ ) − C ( q ( θ ) , θ )] dF ( θ ) Let the solution to the above be referred to as { q r ( θ ) , q r ( θ ) } θ ∈ [ θ (cid:48) ,θ ] . As we have shownin the proof of Proposition 4, the solution of the above problem satisfies C q ≤ with astrict inequality for a positive measure of types. This would imply that in the solution ofthe above problem Π r ( θ (cid:48) ) > q F B ( θ (cid:48) ) − C (cid:0) q F B ( θ (cid:48) ) , θ (cid:48) (cid:1) ; otherwise, the first best allocationwould deliver a higher objective. This also implies that given the contradiction assumption, (cid:82) θθ (cid:48) Π ( θ ) λ ( θ ) dF ( θ ) < (cid:82) θθ (cid:48) Π r ( θ ) λ ( θ ) dF ( θ ) .Now, we consider the following allocation: { q ( θ ) , q ( θ ) } θ<θ (cid:48) , { q r ( θ ) , q r ( θ ) } θ ≥ θ (cid:48) . Thisobviously satisfies incentive compatibility for values of θ ≤ θ ∗ . Moreover, we have Π r (cid:0) θ (cid:1) − Π ( θ (cid:48) ) > Π r (cid:0) θ (cid:1) − Π r ( θ (cid:48) ) = − (cid:90) θθ (cid:48) C θ (cid:16) q r (cid:16) ˆ θ (cid:17) , ˆ θ (cid:17) d ˆ θ q ( θ − ) is the left limit of q ( · ) at θ . ˜ θ ≤ θ ∗ below which ma-jorization constraint is binding while above it the majorization constraint is slack. Sincebelow ˜ θ majorization is binding, we must have that q ( θ ) = q ( θ ) for all values of θ ≤ ˜ θ . 
Asa result, Π (cid:16) ˜ θ (cid:17) ≤ q F B (cid:16) ˜ θ (cid:17) − C (cid:16) q F B (cid:16) ˜ θ (cid:17) , ˜ θ (cid:17) . Note that incentive compatibility combinedwith q ( θ ) = q ( θ ) implies that there exists a threshold ˆ θ such that for θ ≤ ˆ θ , q ( θ ) = q F B ( θ ) and q ( θ ) = q F B (cid:16) ˆ θ (cid:17) = q (cid:16) ˜ θ (cid:17) for all θ ∈ (cid:104) ˆ θ, ˜ θ (cid:105) . A.8 Proof of Proposition 8

Proof.

In order to prove this result, we first show the following lemma:

Lemma 3.

For any subinterval $[x_1, x_3)$ of $[0,1]$ and any $x_2 \in [x_1, x_3)$, consider signaled quality functions $\bar x_s(x)$ and $\bar x_p(x)$ defined over $[x_1, x_3)$ as follows:
\[
\bar x_s(x) = \begin{cases} \dfrac{\int_{x_1}^{x_2} x h(x)\, dx}{\int_{x_1}^{x_2} h(x)\, dx} & x \in [x_1, x_2) \\[2ex] \dfrac{\int_{x_2}^{x_3} x h(x)\, dx}{\int_{x_2}^{x_3} h(x)\, dx} & x \in [x_2, x_3) \end{cases}
\qquad
\bar x_p(x) = \frac{\int_{x_1}^{x_3} x h(x)\, dx}{\int_{x_1}^{x_3} h(x)\, dx}, \quad \forall x \in [x_1, x_3).
\]
Then, $\int_{x_1}^{x_3} \Gamma(x)\, \bar x_p(x)\, h(x)\, dx \ge \int_{x_1}^{x_3} \Gamma(x)\, \bar x_s(x)\, h(x)\, dx$ if and only if
\[
\frac{\int_{x_1}^{x_2} \Gamma(x) h(x)\, dx}{\int_{x_1}^{x_2} h(x)\, dx} \ge \frac{\int_{x_2}^{x_3} \Gamma(x) h(x)\, dx}{\int_{x_2}^{x_3} h(x)\, dx}. \tag{28}
\]

Proof.

Let us define
\[
a_1 = \frac{\int_{x_1}^{x_2} \Gamma(x) h(x)\, dx}{\int_{x_1}^{x_2} h(x)\, dx}, \quad
a_2 = \frac{\int_{x_2}^{x_3} \Gamma(x) h(x)\, dx}{\int_{x_2}^{x_3} h(x)\, dx}, \quad
b_1 = \frac{\int_{x_1}^{x_2} x h(x)\, dx}{\int_{x_1}^{x_2} h(x)\, dx}, \quad
b_2 = \frac{\int_{x_2}^{x_3} x h(x)\, dx}{\int_{x_2}^{x_3} h(x)\, dx},
\]
and $\alpha = \int_{x_1}^{x_2} h(x)\, dx$, $\beta = \int_{x_2}^{x_3} h(x)\, dx$. Since $b_1 < b_2$, by Chebyshev's sum inequality (see Hardy et al. (1934), Theorem 43), $a_1 \ge a_2$ if and only if
\[
(\alpha + \beta)(\alpha a_1 b_1 + \beta a_2 b_2) \le (\alpha a_1 + \beta a_2)(\alpha b_1 + \beta b_2). \tag{29}
\]
We have
\[
\begin{aligned}
\alpha a_1 b_1 + \beta a_2 b_2 &= \int_{x_1}^{x_2} \Gamma(x) h(x)\, dx\, \frac{\int_{x_1}^{x_2} x h(x)\, dx}{\int_{x_1}^{x_2} h(x)\, dx} + \int_{x_2}^{x_3} \Gamma(x) h(x)\, dx\, \frac{\int_{x_2}^{x_3} x h(x)\, dx}{\int_{x_2}^{x_3} h(x)\, dx} = \int_{x_1}^{x_3} \Gamma(x)\, \bar x_s(x)\, h(x)\, dx, \\
\alpha a_1 + \beta a_2 &= \int_{x_1}^{x_2} \Gamma(x) h(x)\, dx + \int_{x_2}^{x_3} \Gamma(x) h(x)\, dx = \int_{x_1}^{x_3} \Gamma(x) h(x)\, dx, \\
\alpha b_1 + \beta b_2 &= \int_{x_1}^{x_2} x h(x)\, dx + \int_{x_2}^{x_3} x h(x)\, dx = \int_{x_1}^{x_3} x h(x)\, dx.
\end{aligned}
\]
Hence, (29) is equivalent to
\[
\int_{x_1}^{x_3} h(x)\, dx \int_{x_1}^{x_3} \Gamma(x)\, \bar x_s(x)\, h(x)\, dx \le \int_{x_1}^{x_3} \Gamma(x) h(x)\, dx \int_{x_1}^{x_3} x h(x)\, dx
\]
or
\[
\int_{x_1}^{x_3} \Gamma(x)\, \bar x_s(x)\, h(x)\, dx \le \int_{x_1}^{x_3} \Gamma(x)\, \bar x_p(x)\, h(x)\, dx.
\]
This proves the claim.

In words, the above lemma implies that if the solution to (29) involves two consecutive pooling intervals, then it must be that the average value of $\Gamma(x)$ is lower at the lower pooling interval. A similar argument to that of Proposition 7 shows that for the solution to (P'), there must exist a collection of half-intervals $\{K_\alpha\}_{\alpha\in A}$ where each half-interval involves either pooling, i.e., $\bar x(\cdot)$ is constant for $x \in K_\alpha$, or is fully separating, i.e., $\bar x(x) = x$ for all $x \in K_\alpha$. We use the above lemma to show that if two half-intervals $K_\alpha$ and $K_\beta$ are both pooling and $\sup K_\alpha = \min K_\beta$, then
\[
\frac{\int_{K_\alpha} \Gamma(x) h(x)\, dx}{\int_{K_\alpha} h(x)\, dx} = \frac{\int_{K_\beta} \Gamma(x) h(x)\, dx}{\int_{K_\beta} h(x)\, dx}
\]
and thus we can simply replace $K_\alpha$, $K_\beta$ with their union $K_\alpha \cup K_\beta$ (this is also implied by Lemma 3):

Lemma 4.

Let $\bar x(x)$ be a solution to (P') and suppose that $x_1 < x_2 < x_3$ exist such that $\bar x(\cdot)$ is constant over $[x_1, x_2)$ and $[x_2, x_3)$, with $\bar x(x_1-) < \bar x(x_1) = \bar x(x_2-) < \bar x(x_3-) < \bar x(x_3)$. Then,
\[
\frac{\int_{x_1}^{x_2} \Gamma(x) h(x)\, dx}{\int_{x_1}^{x_2} h(x)\, dx} = \frac{\int_{x_2}^{x_3} \Gamma(x) h(x)\, dx}{\int_{x_2}^{x_3} h(x)\, dx}.
\]

Proof. Suppose to the contrary that
\[
\frac{\int_{x_1}^{x_2} \Gamma(x) h(x)\, dx}{\int_{x_1}^{x_2} h(x)\, dx} \neq \frac{\int_{x_2}^{x_3} \Gamma(x) h(x)\, dx}{\int_{x_2}^{x_3} h(x)\, dx}.
\]
If $\frac{\int_{x_1}^{x_2} \Gamma(x) h(x)\, dx}{\int_{x_1}^{x_2} h(x)\, dx} > \frac{\int_{x_2}^{x_3} \Gamma(x) h(x)\, dx}{\int_{x_2}^{x_3} h(x)\, dx}$, then an alternative signaled quality function $\bar x_a(x)$ would deliver a higher payoff:
\[
\bar x_a(x) = \begin{cases} \dfrac{\int_{x_1}^{x_3} x h(x)\, dx}{\int_{x_1}^{x_3} h(x)\, dx} & x \in [x_1, x_3) \\[2ex] \bar x(x) & \text{otherwise} \end{cases}
\]
That $\bar x_a(x)$ delivers a higher objective relative to $\bar x(\cdot)$ is a direct result of Lemma 3. This is because $\bar x_a$ pools all values of $x$ in $[x_1, x_3)$ while $\bar x$ separates the interval $[x_1, x_2)$ from $[x_2, x_3)$. This is a contradiction as $\bar x(\cdot)$ was assumed to be optimal.

Now, suppose that $\frac{\int_{x_1}^{x_2} \Gamma(x) h(x)\, dx}{\int_{x_1}^{x_2} h(x)\, dx} < \frac{\int_{x_2}^{x_3} \Gamma(x) h(x)\, dx}{\int_{x_2}^{x_3} h(x)\, dx}$. Since $\bar x(\cdot)$ is optimal and it pools values of $x$ in each interval $[x_1, x_2)$ and $[x_2, x_3)$, an argument similar to the above can be used to show that the following must hold:
\[
\frac{\int_{x_1}^{x} \Gamma(x') h(x')\, dx'}{\int_{x_1}^{x} h(x')\, dx'} \ge \frac{\int_{x}^{x_2} \Gamma(x') h(x')\, dx'}{\int_{x}^{x_2} h(x')\, dx'}, \quad \forall x \in [x_1, x_2) \tag{30}
\]
\[
\frac{\int_{x_2}^{x} \Gamma(x') h(x')\, dx'}{\int_{x_2}^{x} h(x')\, dx'} \ge \frac{\int_{x}^{x_3} \Gamma(x') h(x')\, dx'}{\int_{x}^{x_3} h(x')\, dx'}, \quad \forall x \in [x_2, x_3) \tag{31}
\]
This is because if for some $x$ the above are reversed, for example (30), we can use Lemma 3 to show that separating $[x_1, x)$ and $[x, x_2)$ would increase the value of the objective, which is a contradiction. Now, consider (30). We can take the limit of $x$ as it converges to $x_1$. Using l'Hôpital's rule, we have
\[
\Gamma(x_1) \ge \frac{\int_{x_1}^{x_2} \Gamma(x) h(x)\, dx}{\int_{x_1}^{x_2} h(x)\, dx}.
\]
Similarly, by taking the limit as $x$ converges to $x_2$, we have
\[
\frac{\int_{x_1}^{x_2} \Gamma(x) h(x)\, dx}{\int_{x_1}^{x_2} h(x)\, dx} \ge \Gamma(x_2).
\]
Hence,
\[
\Gamma(x_1) \ge \frac{\int_{x_1}^{x_2} \Gamma(x) h(x)\, dx}{\int_{x_1}^{x_2} h(x)\, dx} \ge \Gamma(x_2).
\]
Using a similar argument, we have
\[
\Gamma(x_2) \ge \frac{\int_{x_2}^{x_3} \Gamma(x) h(x)\, dx}{\int_{x_2}^{x_3} h(x)\, dx} \ge \Gamma(x_3).
\]
This is in contradiction with our initial assumption that $\frac{\int_{x_1}^{x_2} \Gamma(x) h(x)\, dx}{\int_{x_1}^{x_2} h(x)\, dx} < \frac{\int_{x_2}^{x_3} \Gamma(x) h(x)\, dx}{\int_{x_2}^{x_3} h(x)\, dx}$, as the above inequalities imply that
\[
\frac{\int_{x_1}^{x_2} \Gamma(x) h(x)\, dx}{\int_{x_1}^{x_2} h(x)\, dx} \ge \Gamma(x_2) \ge \frac{\int_{x_2}^{x_3} \Gamma(x) h(x)\, dx}{\int_{x_2}^{x_3} h(x)\, dx}.
\]
This completes the proof.

The above lemma establishes that for the solution of (P') we can focus our attention on signaled quality functions that are alternating partitions, since we can assume that there are no consecutive pooling intervals at the optimum. The next lemma establishes that there can only be $k$ partitions by showing that if there is a fully revealing interval at the optimum, $\Gamma(x)$ must be increasing over this interval:

Lemma 5.

Let $\bar x(\cdot)$ be a solution to (P') and suppose that $x_1 < x_2$ exist such that $\bar x(x) = x$, $\forall x \in [x_1, x_2)$, and $\bar x(x_1-) < x_1$ and $\bar x(x_2) > x_2$. Then, $\Gamma(x)$ is weakly increasing over $[x_1, x_2)$.

Proof. Suppose to the contrary that the gain function is not increasing over $[x_1, x_2)$. Then there must exist a subinterval $[x_3, x_4)$ of $[x_1, x_2)$ wherein $\Gamma(x)$ is strictly decreasing; this is because $\Gamma'(x)$ is a continuous function. Now, let us consider the following alternative signaled quality function:
\[
\bar x_a(x) = \begin{cases} \dfrac{\int_{x_3}^{x_4} x h(x)\, dx}{\int_{x_3}^{x_4} h(x)\, dx} & x \in [x_3, x_4) \\[2ex] \bar x(x) & \text{otherwise} \end{cases}
\]
Note that by Chebyshev's sum (integral) inequality, since $\Gamma(x)$ is strictly decreasing over $[x_3, x_4)$ and $h(\cdot)$ has full support,
\[
\int_{x_3}^{x_4} x h(x)\, dx \int_{x_3}^{x_4} \Gamma(x) h(x)\, dx > \int_{x_3}^{x_4} \Gamma(x)\, x\, h(x)\, dx \int_{x_3}^{x_4} h(x)\, dx
\]
or
\[
\frac{\int_{x_3}^{x_4} x h(x)\, dx}{\int_{x_3}^{x_4} h(x)\, dx} \int_{x_3}^{x_4} \Gamma(x) h(x)\, dx > \int_{x_3}^{x_4} x\, \Gamma(x)\, h(x)\, dx.
\]
Hence,
\[
\int \Gamma(x)\, \bar x_a(x)\, h(x)\, dx - \int \Gamma(x)\, \bar x(x)\, h(x)\, dx = \frac{\int_{x_3}^{x_4} x h(x)\, dx}{\int_{x_3}^{x_4} h(x)\, dx} \int_{x_3}^{x_4} \Gamma(x) h(x)\, dx - \int_{x_3}^{x_4} x\, \Gamma(x)\, h(x)\, dx > 0,
\]
which is a contradiction since $\bar x(\cdot)$ is optimal.

Lemmas 4 and 5 establish our claim. By Lemma 4, the solution of (P') is an alternating partition, alternating between pooling and separating, and by Lemma 5, its separating parts are a subset of the increasing parts of $\Gamma(x)$. Since $\Gamma'(x)$ switches sign $k$ times, we must have that at the optimum there are at most $k$ intervals.

A.9 Proof of Proposition 9

Proof.

To prove the claim, we provide a characterization of the properties of the gain func-tion and use . Note that the gain function is given by

Γ ( x ) = g ( x | q ) h ( x ) (cid:18) γ g q ( x | q ) g ( x | q ) + γ g q ( x | q ) g ( x | q ) g ( x | q ) g ( x | q ) (cid:19) − Before, characterizing properties of the gain function, we prove that γ is positive. To doso, we consider an alternative planning problem max (cid:90) x ( x ) g ( x | q ) dx − C ( q , θ ) (P2)subject to (cid:90) x ( x ) g q ( x | q ) dx = C ( q , θ ) (32) (cid:90) x ( x ) g q ( x | q ) dx ≥ C ( q , θ ) (33)together with majorization and monotonicity constraints. By Kuhn-Tucker’s conditions, thelagrange multiplier associated with the inequality incentive constraint (33) must be either53ositive or zero (in case the constraint is slack). Thus, in order to show that γ is positive,it is sufficient to show that in (P2), (33) is binding. Proof that (33) is binding.

Suppose that (33) is slack. In this case, we can show that γ is positive. To see that we consider a planning problem without the constraint (33) and withan inequality version of (32). It is straightforward to see that in this problem this constraintmust be binding since if slack the gain function Γ ( x ) = g ( x | q ) /h ( x ) is decreasing andthus optimal x ( x ) is full pooling which then violates the inequality incentive constraint.Hence, γ ≥ .Therefore, the gain function is given by Γ ( x ) = g ( x | q ) /h ( x ) (cid:104) γ g q ( x | q ) g ( x | q ) (cid:105) . Defin-ing z ( x ) = g ( x | q ) /g ( x | q ) , we can write Γ ( x ) = ˆΓ ( z ( x )) = 1 f + f z ( x ) [1 + γ φ ( z ( x ))] where φ ( · ) is defined in AssumptionSince z ( · ) is increasing in x , determining the sign of Γ (cid:48) ( x ) is equivalent to that of ˆΓ (cid:48) ( z ) .We have ˆΓ (cid:48) ( z ) = − f γ φ ( z )( f + f z ) + γ φ (cid:48) ( z )( f + f z ) (cid:16) ( f + f z ) ˆΓ (cid:48) ( z ) (cid:17) (cid:48) = − f γ φ (cid:48) ( z ) + f γ φ (cid:48) ( z ) + ( f + f z ) γ φ (cid:48)(cid:48) ( z )= ( f + f z ) γ φ (cid:48)(cid:48) ( z ) ≤ Since by Assumption 3 φ (cid:48)(cid:48) ( z ) ≤ , this implies that ( f + f z ) ˆΓ (cid:48) ( z ) is decreasing. Thus,there must exist z where ˆΓ (cid:48) ( z ) ≤ for values of z ≤ z and ˆΓ (cid:48) ( z ) ≥ for values of z ≥ z .It is possible that z = z = min x ∈ [0 , g ( x | q ) /g ( x | q ) or z = ¯ z = max x ∈ [0 , g ( x | q ) /g ( x | q ) .Since Γ ( x ) must have the same property, using Proposition 8 there must exist a cutoff x so that below x , optimal x ( · ) is fully revealing while above x is pooling. Now given thatoptimal x ( x ) has this shape when (32) is slack for i = 2 , consider an infinitesimal increasein q accompanied by a uniform increase in x ( · ) for all values of x so that the change in x ( x ) , δx ( x ) = f δq . Since (33) is slack, this perturbed allocation keeps it satisfied. 
More-over, since δx ( x ) is constant this perturbation keeps (32) unchanged. Finally, majorizationis satisfied since the perturbed x ( · ) is equal to x only at one point and is higher than x forvalues of x below this point. This leads to the desired contradiction as this perturbationincreases the payoff of type 1 sellers.Having proven that γ is positive, we can use Assumption 3 to characterize the optimalrating system. Note that in this case, the gain function is given by Γ ( x ) = ˆΓ ( z ( x )) = 1 f + f z ( x ) [1 + γ φ ( z ( x )) + γ ψ ( z ( x ))]

54e have ˆΓ (cid:48) ( z ) = − f γ φ ( z ) + γ ψ ( z )( f + f z ) + γ φ (cid:48) ( z ) + γ ψ (cid:48) ( z ) f + f z ( f + f z ) ˆΓ (cid:48) ( z ) = − f [1 + γ φ ( z ) + γ ψ ( z )]+ ( f + f z ) [ γ φ (cid:48) ( z ) + γ ψ (cid:48) ( z )] (cid:16) ( f + f z ) ˆΓ (cid:48) ( z ) (cid:17) (cid:48) = ( f + f z ) [ γ φ (cid:48)(cid:48) ( z ) + γ ψ (cid:48)(cid:48) ( z )] There are two possibilities:1. The multiplier γ is negative. In this case, γ φ (cid:48)(cid:48) ( z ) + γ ψ (cid:48)(cid:48) ( z ) ≥ by Assumption (3)which implies that either Γ (cid:48) ( x ) is always positive or negative for low values of x andpositive for high values. Hence, by Proposition (8) the solution is of the form x ( x ) = (cid:40) E h [ x | x < x ] x < x x x ≥ x where E h [ ·|· ] is conditional expectation according to the distribution H ( · ) and x ∈ [0 , . The above proves the claim since in this case x = 0 .2. The multiplier γ is positive. In this case, the sign of γ φ (cid:48)(cid:48) ( z ) + γ ψ (cid:48)(cid:48) ( z ) cannot bedetermined. However, part 3 of Assumption 3 implies that it is negative for low valuesof z while it is positive for high values of z . Hence, either ˆΓ (cid:48) ( z ) < which meansthat ˆΓ (cid:48) ( z ) (and as a result Γ (cid:48) ( x ) )only switches sign once, or ˆΓ (cid:48) ( z ) > which meansthat ˆΓ (cid:48) ( z ) at most changes sign twice. By Proposition 3, all of these cases lead to theoptimal rating structure stated in the Proposition. This concludes the proof. B An Algorithm for Construction of Signals

Here, we provide a construction algorithm when the distribution of $\theta$ is discrete. The algorithm illustrates that a combination of a rather small class of rating systems, those that simply pool qualities together, can always implement a vector of signaled qualities. Before describing the algorithm, we define two classes of signals; for convenience, we work with measures over posterior beliefs.

1. Interval pooled system: For any two indices $k < l$, an interval pooled signal, represented by $\sigma_{k\to l} \in \Delta(\Delta(\Theta))$, is one in which all qualities $q_k, \cdots, q_l$ send the same signal while the qualities of all other types are fully revealed. Formally,
\[
\sigma_{k\to l}(\{e_i\}) = f_i, \quad \forall i < k,\ i > l,
\qquad
\sigma_{k\to l}\left( \left\{ \frac{f_k}{f_k + \cdots + f_l}\, e_k + \cdots + \frac{f_l}{f_k + \cdots + f_l}\, e_l \right\} \right) = f_k + \cdots + f_l,
\]
where $e_i \in \Delta(\Theta)$ is a vector that is 1 in its $i$-th element and 0 otherwise.

2. Two-point pooled system: For any two indices $k < l$, a two-point pooled signal, represented by $\sigma_{k,l} \in \Delta(\Delta(\Theta))$, is one in which qualities $q_k$ and $q_l$ send the same signal while the qualities of all other types are fully revealed. Formally,
\[
\sigma_{k,l}(\{e_i\}) = f_i, \quad \forall i \neq k, l,
\qquad
\sigma_{k,l}\left( \left\{ \frac{f_k}{f_k + f_l}\, e_k + \frac{f_l}{f_k + f_l}\, e_l \right\} \right) = f_l + f_k.
\]

We also refer to the fully informative signal as $\tau^{FI} \in \Delta(\Delta(\Theta))$ with $\tau^{FI}(\{e_i\}) = f_i$. Now consider the vectors of signaled and true qualities, $\bar q \neq q$, such that $q \succcurlyeq_F \bar q$. Then the following algorithm can be used to construct the rating system that implements the signaled quality $\bar q$:

Algorithm 1.

Start by letting $r = q$. Let $l$ and $k$ be defined as follows:

$$k = \min\{i : \bar{q}_i > r_i\} \quad \text{and} \quad l = \min\{i > k : \bar{q}_i < r_i\}.$$

• If for all values of $j \in \{k, \cdots, l-1\}$, $\bar{q}_j > r_j$, let $\hat\lambda$ be the highest value of $\lambda < 1$ such that

$$\lambda r_j + (1-\lambda)\frac{f_k r_k + \cdots + f_l r_l}{f_k + \cdots + f_l} \leq \bar{q}_j, \quad \forall j \in \{k, \cdots, l-1\}$$

$$\lambda r_l + (1-\lambda)\frac{f_k r_k + \cdots + f_l r_l}{f_k + \cdots + f_l} \geq \bar{q}_l$$

with at least one equality. Using this value of $\hat\lambda$, we construct $\hat\tau = (1-\hat\lambda)\,\sigma_{k \to l} + \hat\lambda\,\tau_{FI}$ and $\tilde{r} = \mathrm{diag}(f)^{-1}\int_{\Delta(\Theta)} \mu\mu^T r \, d\hat\tau$.

• If for some value of $j \in \{k, \cdots, l-1\}$, $\bar{q}_j = r_j$, let $k' \in \{k+1, \cdots, l-1\}$ satisfy $\bar{q}_{k'} > r_{k'}$ and $\bar{q}_{k'+1} = r_{k'+1}$. In addition, let $\hat\lambda$ be the highest value of $\lambda < 1$ that satisfies

$$\lambda r_{k'} + (1-\lambda)\frac{f_{k'} r_{k'} + f_l r_l}{f_{k'} + f_l} \leq \bar{q}_{k'}$$

$$\lambda r_l + (1-\lambda)\frac{f_{k'} r_{k'} + f_l r_l}{f_{k'} + f_l} \geq \bar{q}_l,$$

with at least one of the above holding with equality. Using this value of $\hat\lambda$, we construct $\hat\tau = (1-\hat\lambda)\,\sigma_{k',l} + \hat\lambda\,\tau_{FI}$ and $\tilde{r} = \mathrm{diag}(f)^{-1}\int_{\Delta(\Theta)} \mu\mu^T r \, d\hat\tau$.

• If $\tilde{r} \neq \bar{q}$, repeat the above steps, replacing $r$ with $\tilde{r}$.

The proof that this algorithm works uses the fact that the set $S$ defined in (6) is convex. In each step of the above algorithm, the number of elements of $r$ and $\bar{q}$ that are different shrinks by at least 1. As a result, the repetition of this procedure, while keeping majorization intact, gives us the rating system that maps $q$ into $\bar{q}$. As the algorithm shows, the constructed signal is derived from a repeated application of interval and two-point pooled signals. While we do not show an equivalent result for continuous distributions, one can use the above algorithm in an approximate form by approximating continuous distributions with discrete ones.

B.1 Proof of Algorithm (1)

Proof.

For any $q \succcurlyeq_F \bar{q}$, define $l$ and $k$ as follows:

$$k = \min\{i : \bar{q}_i > q_i\} \quad \text{and} \quad l = \min\{i > k : \bar{q}_i < q_i\}.$$

There are two possibilities:

1. For all values of $j \in \{k, \cdots, l-1\}$ we have that $\bar{q}_j > q_j$. In this case, we define the following signal

$$\hat\tau = \lambda \cdot \tau_{FI} + (1-\lambda) \cdot \sigma_{k \to l}$$

and its associated matrix

$$A = \mathrm{diag}(f)^{-1}\left[\lambda \int \mu\mu^T d\tau_{FI} + (1-\lambda) \int \mu\mu^T d\sigma_{k \to l}\right] = \lambda I + (1-\lambda)\,\Sigma_{k \to l}.$$

Therefore

$$\hat{r}_i = (Aq)_i = \begin{cases} q_i & i < k,\ i > l \\ \lambda q_i + (1-\lambda)\, q_{k \to l} & i = k, \cdots, l \end{cases}$$

where $q_{k \to l} = \frac{f_k q_k + \cdots + f_l q_l}{f_k + \cdots + f_l}$ is the pooled mean. Note that because of our choice of this transformation, the elements of $\hat{r}$ are monotone. This is because

$$\cdots \geq q_{l+1} \geq q_l \geq \lambda q_l + (1-\lambda) q_{k \to l} \geq \cdots \geq \lambda q_{k+1} + (1-\lambda) q_{k \to l} \geq \lambda q_k + (1-\lambda) q_{k \to l} > q_k \geq q_{k-1} \geq \cdots$$

We can find the highest value of $\lambda \in [0,1]$ such that one of the following equalities holds:

$$\lambda q_i + (1-\lambda)\, q_{k \to l} = \bar{q}_i, \quad i = k, \cdots, l.$$

This is possible since $\bar{q}_i \geq q_i$, $\forall k < i < l$, and $\bar{q}_l < q_l$. Let this value of $\lambda$ be called $\hat\lambda$. The definition of $\hat\lambda$ implies that the following inequalities must hold:

$$\forall j = k, \cdots, l-1, \quad \hat\lambda q_j + (1-\hat\lambda)\, q_{k \to l} \leq \bar{q}_j$$

$$\hat\lambda q_l + (1-\hat\lambda)\, q_{k \to l} \geq \bar{q}_l$$

Note further that by construction

$$f_k \hat{r}_k + \cdots + f_l \hat{r}_l = f_k q_k + \cdots + f_l q_l.$$

We then have

$$\sum_{j=1}^{i} f_j \hat{r}_j = \begin{cases} \sum_{j=1}^{i} f_j q_j & i < k \\ \sum_{j=1}^{k-1} f_j q_j + \sum_{j=k}^{i} f_j\left[\hat\lambda q_j + (1-\hat\lambda)\, q_{k \to l}\right] & i = k, \cdots, l-1 \\ \sum_{j=1}^{i} f_j q_j & i \geq l \end{cases}$$

By the above inequalities, this means that

$$\sum_{j=1}^{i} f_j \hat{r}_j \leq \sum_{j=1}^{i} f_j \bar{q}_j.$$

This implies that in this constructed signal $\hat{r} \succcurlyeq_F \bar{q}$. Moreover, we must have that $q \succcurlyeq_F \hat{r}$, from Lemma 2.

2. There exists $j \in \{k+1, \cdots, l-1\}$ such that $\bar{q}_j = q_j$. In this case, let $k'$ satisfy $\bar{q}_{k'} > q_{k'}$ and $\bar{q}_{k'+1} = q_{k'+1}$. This must necessarily exist since $\bar{q}_k > q_k$.
Now let $\hat\lambda$ be the highest value of $\lambda \in [0,1]$ that satisfies

$$\lambda q_{k'} + (1-\lambda)\frac{f_{k'} q_{k'} + f_l q_l}{f_{k'} + f_l} \leq \bar{q}_{k'}$$

$$\lambda q_l + (1-\lambda)\frac{f_{k'} q_{k'} + f_l q_l}{f_{k'} + f_l} \geq \bar{q}_l$$

with at least one of the above holding with equality. Then we define

$$\hat\tau = \hat\lambda \cdot \tau_{FI} + (1-\hat\lambda)\,\sigma_{k',l}.$$

Then the resulting $\hat{r}$ is monotone in its elements and, by an argument similar to the one above, it $F$-majorizes $\bar{q}$ and is $F$-majorized by $q$.
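As an illustration of Algorithm 1 and of the majorization argument in the proof, the following Python sketch runs the iteration on a small discrete example. It is a sketch under stated assumptions, not the authors' implementation. It tracks only the vector r across iterations, not the measure over posteriors. For the mixtures used in the algorithm, the updated value of a pooled type is the weight λ on its current value plus weight (1 − λ) on the pooled mean, and all other types are unchanged. The function names, zero-based indices, the use of exact `Fraction` arithmetic, the choice of k' as the index just below the first tie, and the equal-totals condition in `f_majorizes` are all assumptions made here. The sketch also assumes that the signaled qualities (`q_bar`) are nondecreasing and are F-majorized by the true qualities (`q`).

```python
from fractions import Fraction


def partial_sums(f, x):
    """Cumulative sums of f_j * x_j for j = 0, ..., i."""
    total, out = Fraction(0), []
    for fj, xj in zip(f, x):
        total += fj * xj
        out.append(total)
    return out


def f_majorizes(f, a, b):
    """True if a F-majorizes b: each partial sum of a is at most that of b.

    Requiring equal totals is an assumption of this sketch.
    """
    sa, sb = partial_sums(f, a), partial_sums(f, b)
    return all(x <= y for x, y in zip(sa, sb)) and sa[-1] == sb[-1]


def pooling_step(f, r, q_bar):
    """One pass of the algorithm: returns (new r, pooled indices, lambda_hat)."""
    n = len(r)
    k = next(i for i in range(n) if q_bar[i] > r[i])
    l = next(i for i in range(k + 1, n) if q_bar[i] < r[i])
    ties = [j for j in range(k, l) if q_bar[j] == r[j]]
    # Interval pooling if no target in k..l-1 is already met, else two-point
    # pooling of l with the index just below the first tie (assumed k').
    pool = list(range(k, l + 1)) if not ties else [ties[0] - 1, l]
    weight = sum(f[i] for i in pool)
    mean = sum(f[i] * r[i] for i in pool) / weight
    lam = Fraction(0)
    for j in pool:
        if r[j] != mean:
            # Value of lambda at which type j reaches its target exactly.
            lam_j = (q_bar[j] - mean) / (r[j] - mean)
            if lam_j < 1:  # ratios above 1 never bind
                lam = max(lam, lam_j)
    new_r = list(r)
    for j in pool:
        new_r[j] = lam * r[j] + (1 - lam) * mean
    return new_r, pool, lam


def construct(f, q, q_bar, max_steps=1000):
    """Repeat pooling steps until r equals the target vector q_bar."""
    r, steps = list(q), []
    while r != list(q_bar):
        if len(steps) >= max_steps:
            raise RuntimeError("no convergence; check the input assumptions")
        r, pool, lam = pooling_step(f, r, q_bar)
        steps.append((pool, lam))
    return r, steps


# Hypothetical example: four equally likely types, all pooled at the mean.
f = [Fraction(1, 4)] * 4
q = [Fraction(i) for i in range(4)]
q_bar = [Fraction(3, 2)] * 4
r, steps = construct(f, q, q_bar)
for pool, lam in steps:
    print("pooled indices:", pool, "lambda_hat:", lam)
```

In this example the first pass is an interval pooling step and the later passes are two-point pooling steps. Calling `f_majorizes` on each intermediate vector checks the invariant used in the proof: the vector is F-majorized by `q` and F-majorizes `q_bar`.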