Optimal Pricing of Information

Abstract

A decision maker looks to take an active action (e.g., purchase some goods or make an investment). The payoff of this active action depends on his own private type as well as a random and unknown state of nature. To decide between this active action and a passive action, which always leads to a safe constant utility, the decision maker may purchase information from an information seller. The seller can access the realized state of nature, and this information helps the decision maker (i.e., the information buyer) better estimate his payoff from the active action. We study the seller's problem of designing a revenue-optimal pricing scheme to sell her information to the buyer. Assuming the buyer's private type and the state of nature are drawn from two independent distributions, we fully characterize the optimal pricing mechanism for the seller in closed form. Specifically, under a natural linearity assumption on the buyer's payoff function, we show that an optimal pricing mechanism is the threshold mechanism, which charges each buyer type some upfront payment and then reveals whether the realized state is above or below some threshold. The payment and the threshold are generally different for different buyer types, and are carefully tailored to accommodate the different amounts of risk each buyer type can take. The proof of our results relies on novel techniques and concepts, such as upper/lower virtual values and their mixtures, which may be of independent interest.



Shuze Liu, University of Virginia

Weiran Shen, Renmin University of China

Haifeng Xu, University of Virginia

arXiv [cs.GT]

Introduction

In numerous situations, a decision maker wishes to take an active move but is uncertain about its outcome and payoff. Such active moves range from financial decisions of investing in a stock or startup to daily-life decisions of purchasing a house or used car, and from macro-level enterprise decisions of developing a new product to micro-level decisions of approving a loan applicant or displaying online ads to a particular Internet user. In these situations, the decision maker's payoff for the active move relies on uncertain information regarding, e.g., the potential of the invested company, the quality of the house, the popularity of the new product, or the credit of the loan applicant. The decision maker typically also has a passive backup option of not making the move, in which case he obtains a safe utility without any risk. To decide between the active and passive actions, the decision maker can turn to an information seller who can access more accurate information about the uncertainties and thus help him better estimate the payoff of his action. Given the usefulness of the seller's information to the decision maker, the seller can profit from how much her information improves the utility of the decision maker, i.e., the information buyer.

This paper studies how a monopoly information seller (she) as described above can design an optimal pricing mechanism to sell her information to an information buyer (he) randomly drawn from some population. The buyer, a decision maker, needs to take one of two actions. The active action results in a payoff $v(q, t)$, where $t$ captures the buyer's private type and the state of nature $q$ summarizes the payoff-relevant uncertainty unknown to the buyer. Here $q$ and $t$ are independent random variables, both supported on continuous sets. The passive backup action always results in the same utility, normalized to $0$, regardless of $q$ and $t$.
While the buyer and seller both know the distribution of $q$, the seller is more informed and can additionally observe the realized state. With the information about $q$, the seller would like to design an optimal pricing scheme to sell her information to the buyer, whose type is drawn from a known distribution.

The problem setup described above is a fundamental monopoly pricing problem. However, unlike the classic pricing problem for goods, here we look for an optimal pricing scheme for information. Despite bearing a similar structure, these two pricing problems turn out to differ significantly. First, when selling goods, physical or digital, the seller's allocation rule can simply be described by a probability of giving out the goods. However, it is unclear here how the seller should "pass" her information to the buyer. Second, and more importantly, in selling goods, any individually rational buyer should participate in the mechanism as long as his expected utility is at least $0$. In our setup, however, the buyer may already have positive utility from his active action without participating in the mechanism. An individually rational buyer would participate only when his utility becomes even higher. This key difference turns out to render standard mechanism design techniques inapplicable here (more details will be discussed later). This is also evidenced by our characterization of the optimal mechanism for the pricing of information, which is significantly different from, and more intricate than, the optimal pricing of goods.

Main Result: Optimal Mechanism for Pricing Information.

We characterize the optimal mechanism for the above information pricing problem, within the general class of mechanisms that can be described as a generic interactive protocol introduced by Babaioff et al. [2]. (This class of mechanisms is quite general and includes all possible ways that the seller may sequentially reveal information and ask for charges.) Specifically, assuming the buyer's value function is linear and monotone non-decreasing in $t$, i.e., $v(q, t) = v_1(q)[t + \rho(q)]$ for some $v_1(q) \ge 0$, we show that the optimal mechanism always admits a simple format: any buyer is incentivized to report his true type $t$ (i.e., the mechanism is incentive compatible); the seller then charges the buyer $p_t$ and, afterwards, reveals whether the realized state $q$ satisfies $\rho(q) \ge \theta_t$ or not for some carefully chosen threshold $\theta_t$. We thus call such a mechanism a threshold mechanism. The thresholds and payments are different for different buyer types, and are carefully designed to accommodate the amount of risk each buyer type can take. We will [...] upper and lower virtual values as well as their mixtures.

As a byproduct, we also exhibit several interesting properties of the optimal mechanism. For example, the optimal mechanism charges any two buyer types the same amount if the information revealed to them is the same. In this sense, the optimal mechanism does not discriminate between buyer types who receive the same information. Moreover, we provide clean characterizations of the monotonicity properties of the buyer's payment, utility, and surplus as functions of the buyer type. It turns out that they may increase or decrease, depending on the setup.

We make two remarks. First, threshold mechanisms are ubiquitous in reality. In many application domains, one pays for an inspection or test of some entity and then receives an outcome stating whether it passed the inspection or not.
These are precisely threshold mechanisms. Viewed this way, our results characterize the optimal threshold and payment for a buyer drawn from a random population. Second, since both $q$ and $t$ lie in continuous spaces in our setup, typical variable-optimization-based approaches (e.g., linear programming as used in many previous works [2, 7, 10]) do not apply to our setting. Our analysis is instead based on optimizing functional variables. The fortunate aspect, though, is that such functional optimization allows us to derive more structural optimal mechanisms, in a spirit similar to Myerson's seminal work on optimal auction design [17].

Related Works.

The study of markets for information has attracted extensive recent research interest. For a comprehensive overview of the progress in this field, we refer interested readers to a very recent survey by Bergemann and Bonatti [3]. This paper adopts the mechanism design approach and looks to design the optimal mechanism (within some design space) that maximizes the seller's revenue. Next we discuss the recent works that are most relevant to ours. Note that the welfare maximization problem in selling information is relatively straightforward, since the seller can simply reveal full information. Therefore, optimality in all our discussions always means optimality of revenue.

A starting point of our paper is the work by Babaioff et al. [2], who study information selling from a mechanism design perspective. They also consider a monopoly pricing setup with one information seller and a buyer who relies on the seller's information to improve his decision making. They consider mechanisms within a very general design space captured by generic interactive protocols [2] (see our descriptions in Section 2). These mechanisms include all possible ways the seller can sequentially reveal information via signaling schemes and ask for charges along the way. Within this space, Babaioff et al. establish a revelation-principle-type result and show that the seller can always maximize revenue with a somewhat direct mechanism, which uses a single signaling scheme and asks for charges before or after sending the signal, depending on the setup. Building on this characterization, they also develop linear program formulations to compute the optimal mechanism in polynomial time. Our work utilizes the revelation principle of [2] but goes significantly beyond it to develop closed-form optimal mechanisms for selling information.

Recent works by Chen et al. [10] and Cai and Velegkas [7] also adopted algorithmic approaches to compute the optimal mechanisms in polynomial time for a budgeted buyer or in the setup with multiple buyers. Different from all these algorithmic approaches [2, 10, 7], our results are analytical. The optimal mechanisms we develop can be explicitly described and thus reveal more structural insights about the optimal mechanism. This choice of analytical approaches is also somewhat necessary in our setup, since agents have continuous utility functions and typical variable optimization approaches are not applicable to our setting. Bergemann et al. [4] also exhibit structural properties of the optimal mechanism within a special menu-based class of mechanisms. In their setting, each buyer type may have different beliefs about the state of nature whereas, like [2, 10], we assume all buyers share the same prior beliefs in our model. In their more challenging model, Bergemann et al. show that an analytical solution can be characterized when there are two buyer types or only two buyer actions and two states, whereas in general only partial properties of the optimal mechanism can be derived.

Our work is also relevant to the recent rich body of work on information design, a.k.a. Bayesian persuasion [15]. We refer curious readers to recent surveys by Bergemann and Morris [6] and by Kamenica [14], and to a survey from the algorithmic perspective by Dughmi [12]. Most relevant to ours is the persuasion problem with a privately informed receiver [16]. However, different from the sender in the persuasion problem, who has a specific utility function, the seller in our model does not have any utility function except the transfer she receives from the buyer. The optimal mechanism we obtain is also not comparable to the optimal signaling scheme characterized by Kolotilin et al. [16].
Notably, the case of binary receiver actions (or binary buyer actions) is an important special case of information design and has attracted significant attention in previous works [8, 9, 13, 1, 16]. Candogan [9] shows that a threshold signaling scheme can be optimal when, e.g., the receiver's utility is linear in the payoff parameters as well as the action he takes. Finally, Daskalakis et al. [11] study the joint design of signaling schemes and auction mechanisms. Though it is a very different setup from ours, the signaling scheme in their model can be viewed as a way to extract more revenue from bidders.

Motivated by various applications of quality testing, we consider the following optimal information pricing problem between an information Seller (she) and an information Buyer (he). The buyer is a decision maker who chooses one of two actions: a passive action $0$ and an active action $1$. Buyer obtains an uncertain payoff $v(q, t)$ for the active action, where $q \in Q$ is a random state of nature (unknown to the buyer) and $t \in T$ is the buyer's private type. Buyer's utility for the passive action is always $0$, irrespective of his type and the state of nature. So the passive action serves as a backup option for the buyer. For example, if Buyer is a potential purchaser of some goods with uncertain quality (e.g., a house or a used car), the passive action corresponds to not purchasing, in which case Buyer has neither gain nor loss, whereas the active action corresponds to purchasing, in which case Buyer's utility depends on the quality $q$ of the goods as well as how much he values the goods, i.e., the private type $t$.

Both $t$ and $q$ are modeled as random variables that are independently distributed according to the cumulative distribution functions (CDFs) $F(t)$ and $G(q)$, respectively. The buyer's type $t \in \mathbb{R}$ is a real value supported on $T = [t_1, t_2]$. The state of nature $q$ is supported on a general measurable set $Q$ and does not need to be a real value. Such an abstract representation of $q$ is used to accommodate applications where $q$ may include the features relevant to Buyer's decisions (e.g., the brand and production year of a used car). Throughout the paper, we assume both $F(t)$ and $G(q)$ are differentiable with corresponding probability density functions (PDFs) $f(t)$ and $g(q)$, though our analyses and results also extend to the non-differentiable situations.

Both $F(t)$ and $G(q)$ are public knowledge. However, the realized $q$ can be observed only by the information seller. We study Seller's problem of designing a revenue-optimal pricing scheme to sell her private information about $q$ to Buyer. Note that the buyer's private type $t$ is known only to the buyer.
Therefore, Seller will have to incentivize the buyer to report his type $t$ before deciding how to reveal information to the buyer and how much to charge. Indeed, had the seller known the buyer's type $t$, the seller's optimal pricing mechanism would be to reveal full information and then charge Buyer the value of information [4]:

$$\int_{q \in Q} \max\{0, v(q, t)\}\, g(q)\, \mathrm{d}q \;-\; \max\Big\{0, \int_{q \in Q} v(q, t)\, g(q)\, \mathrm{d}q\Big\}.$$

We assume Buyer's utility function $v(q, t)$ is linear and monotone non-decreasing in the buyer's type $t$ for any $q \in Q$. Consequently, there exist functions $v_1(q) \ge 0$ and $\rho(q)$ such that

$$v(q, t) = v_1(q)\,(t + \rho(q)). \tag{1}$$

Since $q$ is a random variable, $\rho(q)$ also has a probability distribution. For expositional simplicity, we assume the distribution of $\rho$ does not have point masses, i.e., the probability measure of the set $\{q \mid \rho(q) = d\}$ is $0$ for any $d$. Similar analysis applies to the general case where $\rho(q)$ contains point masses, except that the optimal mechanism may need some randomization in that case.

(Footnote on the non-differentiable case of $F$ and $G$: in this case, the density functions are called distributions or generalized functions.)

With slight abuse of notation, let $v(t)$ denote the buyer's expected utility for action $1$ under his prior beliefs about $q$, namely, when no information is purchased. That is,

$$\text{Buyer's initial utility of action 1:} \quad v(t) = \int_{q \in Q} v(q, t)\, g(q)\, \mathrm{d}q. \tag{2}$$

Note that our setup above is similar to persuasion of a privately informed receiver by Kolotilin et al. [16]. The key difference is that in our model, Seller has no utility function and only cares about the transfer from Buyer (i.e., the revenue), whereas in [16] the sender has a particular utility function. Moreover, our buyer utility function in Equation (1) strictly generalizes the receiver's utility in [16], which assumes linearity in both the receiver's type $t$ and the state of nature $q$.

To maximize revenue, Seller can design arbitrary mechanisms with possibly multiple rounds of interaction. We restrict our design space to all mechanisms that can be expressed as a finite extensive-form game where each node in the game tree is one of the following three types: (1) a transfer node, which is associated with a (possibly negative) transfer to the seller and has a single child node; (2) a seller node, which associates each state of nature $q$ with a distribution prescribing the probabilities of moving to its child nodes; (3) a buyer node, which has arbitrary buyer actions and an arbitrary set of children. This general space of mechanisms is also referred to as the generic interactive protocol [2] and includes all possible ways that Seller may sequentially reveal partial information to Buyer and ask for transfers along the way.

The space of all such possible mechanisms appears enormous. Fortunately, similar to classic mechanism design [17], the setting of information pricing also admits a revelation principle, as shown in [2, 4, 10]. To describe the space of the so-called direct mechanisms, it suffices to introduce the notion of a signaling scheme, which formalizes the way that Seller reveals information to Buyer. (Such a signaling scheme is sometimes also referred to as an experiment [16, 4].) Formally, given a set of possible signals $\Sigma$, a signaling scheme $\pi : Q \to \Delta_\Sigma$ is a mapping from the state of nature $q$ to a distribution over the signals in $\Sigma$. Such a signaling scheme can be mathematically described by $\{\pi(\sigma; q)\}_{q \in Q, \sigma \in \Sigma}$, where $\pi(\sigma; q)$ is the probability of sending signal $\sigma$ conditioned on state $q$. Given signal $\sigma$, the buyer infers the posterior probability of any state $q$ via a standard Bayes update:

$$\Pr(q \mid \sigma) = \frac{\pi(\sigma; q)\, g(q)}{\int_{q' \in Q} \pi(\sigma; q')\, g(q')\, \mathrm{d}q'} = \frac{\pi(\sigma; q)\, g(q)}{\mathbb{E}_{q' \sim G}[\pi(\sigma; q')]}. \tag{3}$$

Consequently, conditioned on signal $\sigma$, a buyer of type $t$ has expected utility $\int_{q \in Q} v(q, t) \Pr(q \mid \sigma)\, \mathrm{d}q$ for the active action $1$, and will take the active action if and only if $\int_{q \in Q} v(q, t) \Pr(q \mid \sigma)\, \mathrm{d}q \ge 0$.

The following revelation principle is the starting point of our design of the optimal mechanism.

Lemma 1. [Revelation Principle [2, 4, 10]] There always exists a revenue-maximizing mechanism for the above setting which consists of a menu $\{\pi_t, p_t\}_{t \in T}$ such that: (1) $\pi_t$ is a signaling scheme for buyer type $t$ which uses at most two signals $\{\sigma_1, \sigma_0\}$, resulting in the best buyer action $1$ and $0$, respectively; (2) $p_t$ is the payment from the buyer of type $t$.

This mechanism with one-round buyer-seller interaction proceeds in order as follows:

1. Seller announces the mechanism $\{\pi_t, p_t\}_{t \in T}$;
2. Buyer type $t$ is realized;
3. Buyer reports a type $t' \in T$ to Seller (not necessarily the true $t$), and is charged payment $p_{t'}$;
4. The state of nature $q$ is realized to Seller, who then samples a signal according to signaling scheme $\pi_{t'}$ and sends the signal to Buyer.

Moreover, the mechanism without loss of generality is incentive compatible and individually rational.
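To make the Bayes update in Equation (3) concrete, here is a minimal Python sketch. It assumes a finite state set in place of the continuous $Q$, so integrals become sums; the function names, the two-state example, and all numbers are hypothetical illustrations, not part of the paper's model.

```python
from typing import Dict


def posterior(prior: Dict[str, float], pi_sigma: Dict[str, float]) -> Dict[str, float]:
    """Posterior Pr(q | sigma) over a finite state set, following Equation (3).

    prior[q] plays the role of g(q); pi_sigma[q] is pi(sigma; q), the
    probability of sending the signal sigma when the state is q.
    """
    norm = sum(pi_sigma[q] * prior[q] for q in prior)  # E_{q' ~ G}[pi(sigma; q')]
    if norm == 0:
        raise ValueError("this signal is sent with probability zero")
    return {q: pi_sigma[q] * prior[q] / norm for q in prior}


def active_payoff(post: Dict[str, float], v1: Dict[str, float],
                  rho: Dict[str, float], t: float) -> float:
    """Expected payoff of the active action under a posterior, with
    v(q, t) = v1(q) * (t + rho(q)) as in Equation (1)."""
    return sum(post[q] * v1[q] * (t + rho[q]) for q in post)


def best_action(post: Dict[str, float], v1: Dict[str, float],
                rho: Dict[str, float], t: float) -> int:
    """Return 1 (active) if the posterior expected payoff is at least 0, else 0."""
    return 1 if active_payoff(post, v1, rho, t) >= 0 else 0


# Hypothetical two-state example (all values are illustrative assumptions).
prior = {"good": 0.5, "bad": 0.5}
v1 = {"good": 1.0, "bad": 1.0}
rho = {"good": 1.0, "bad": -1.0}

# A partially informative scheme: sigma_1 is always sent in the good state
# and half of the time in the bad state; sigma_0 is sent otherwise.
pi_sigma1 = {"good": 1.0, "bad": 0.5}
pi_sigma0 = {q: 1.0 - pi_sigma1[q] for q in prior}

post_sigma1 = posterior(prior, pi_sigma1)
post_sigma0 = posterior(prior, pi_sigma0)
```

For a buyer of type `t = 0.5` in this toy instance, signal $\sigma_1$ leaves the active action attractive, while $\sigma_0$ reveals the bad state and makes the passive action the better response.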

As in classic mechanism design, incentive compatibility (IC) means that reporting the true type $t$ is an optimal strategy for Buyer, and individual rationality (IR) means that participating in the mechanism is no worse for Buyer than not participating. Babaioff et al. [2] proved Lemma 1 in the form of general signaling schemes, and later results [4, 10] further simplified it by showing that it suffices to restrict to signaling schemes with $n$ signals, where $n$ is the number of buyer actions ($n = 2$ in our case).

Our next result further simplifies the design space and shows that we can without loss of generality restrict to mechanisms with non-negative payment for every type $t$. Though this is natural to expect, we point out that this result does not hold trivially; in fact, it was shown that an optimal single-round mechanism may sometimes have to involve negative payments when the buyer type $t$ and the state $q$ are correlated [2]. The proof of this technical lemma is deferred to Appendix B.

Lemma 2. [Non-Negative Payments] There exists an optimal IC and IR mechanism in which $p_t \ge 0$ for all $t \in T$.

We refer to the mechanisms characterized by Lemma 1 as direct mechanisms. Given the revelation principle, we can without loss of generality focus on the design of optimal direct mechanisms. Any direct mechanism can be characterized by two functions: (1) $\pi(q, t) \in [0, 1]$, which is the probability of sending signal $\sigma_1$ to the buyer of type $t$ at the state of nature $q$; (2) $p(t) \ge 0$, which is the non-negative payment (due to Lemma 2) from the buyer. Note that $[1 - \pi(q, t)]$ is the seller's probability of sending signal $\sigma_0$ to the buyer of type $t$ conditioned on state $q$.

Our main goal is to derive a feasible and optimal pair $(\pi^*, p^*)$ for the seller's information pricing mechanism. Note that this is a functional optimization problem, since both $\pi(q, t)$ and $p(t)$ are function variables that depend on the continuous variable $t \in [t_1, t_2] (= T)$ and the abstract variable $q$ in the measurable set $Q$. We thus refer to $\pi$ as the signaling function and $p$ as the payment function. Next, we formulate the problem based on the constraints described by the revelation principle in Lemma 1. We start by deriving Seller's revenue objective:

$$\text{Seller Objective:} \quad \max_{\pi, p} \int_{t \in T} f(t)\, p(t)\, \mathrm{d}t.$$

The revelation principle shows that the signaling scheme will have two signals $\sigma_1, \sigma_0$, resulting in Buyer's best response of action $1$ and $0$, respectively. This poses two constraints on the signaling function $\pi(q, t)$: (1) $\int_{q \in Q} \pi(q, t)\, v(q, t)\, g(q)\, \mathrm{d}q \ge 0$ for any $t \in T$; (2) $\int_{q \in Q} [1 - \pi(q, t)]\, v(q, t)\, g(q)\, \mathrm{d}q \le 0$ for any $t \in T$. The first constraint ensures that when signal $\sigma_1$ is recommended to the buyer of type $t$, the buyer's expected utility $\frac{\mathbb{E}_{q \sim G}[\pi(q, t)\, v(q, t)]}{\mathbb{E}_{q \sim G}[\pi(q, t)]}$ for the active action is at least $0$, i.e., the utility of the passive action. Conversely, the second constraint ensures that conditioned on signal $\sigma_0$, the expected buyer utility for action $1$, namely $\frac{\mathbb{E}_{q \sim G}[(1 - \pi(q, t))\, v(q, t)]}{\mathbb{E}_{q \sim G}[1 - \pi(q, t)]}$, is at most $0$. These two constraints are also widely referred to as the obedience constraints in the literature on information design [5, 6]. Slightly manipulating the second constraint above, we obtain $\int_{q \in Q} \pi(q, t)\, v(q, t)\, g(q)\, \mathrm{d}q \ge \int_{q \in Q} v(q, t)\, g(q)\, \mathrm{d}q = v(t)$. Therefore, we can conveniently summarize the obedience constraint as follows:

$$\text{Obedience:} \quad \int_{q \in Q} \pi(q, t)\, v(q, t)\, g(q)\, \mathrm{d}q \ge \max\{0, v(t)\}, \quad \forall t \in T. \tag{4}$$

Given the obedience constraint and the signaling scheme for type $t$, a buyer of true type $t$ is incentivized to take action $1$ whenever receiving signal $\sigma_1$ and action $0$ otherwise. Therefore, we can also think of the signaling scheme as obedient action recommendations.
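The Obedience Constraint (4) can be checked numerically for a single buyer type. The following is a rough sketch under the assumption of a finite state set (the paper's $Q$ is a general measurable set); the function name `obedience_holds` and the three-state instance are hypothetical.

```python
from typing import Dict


def obedience_holds(pi: Dict[str, float], prior: Dict[str, float],
                    v1: Dict[str, float], rho: Dict[str, float],
                    t: float, tol: float = 1e-12) -> bool:
    """Check Constraint (4) for one type t on a finite state set:
    sum_q pi(q, t) * v(q, t) * g(q) >= max{0, v(t)}.

    pi[q] is the probability of recommending the active action in state q,
    prior[q] plays the role of g(q), and v(q, t) = v1(q) * (t + rho(q)).
    """
    v = {q: v1[q] * (t + rho[q]) for q in prior}
    lhs = sum(pi[q] * v[q] * prior[q] for q in prior)
    prior_utility = sum(v[q] * prior[q] for q in prior)  # v(t) in Equation (2)
    return lhs >= max(0.0, prior_utility) - tol


# Hypothetical three-state instance with a uniform prior.
prior = {"low": 1 / 3, "mid": 1 / 3, "high": 1 / 3}
v1 = {"low": 1.0, "mid": 1.0, "high": 1.0}
rho = {"low": -1.0, "mid": 0.0, "high": 1.0}
t = 0.5

# Recommend the active action exactly when rho(q) >= 0.
threshold_pi = {q: 1.0 if rho[q] >= 0.0 else 0.0 for q in prior}
# Recommend the active action only in the worst state (not obedient).
reversed_pi = {"low": 1.0, "mid": 0.0, "high": 0.0}

threshold_ok = obedience_holds(threshold_pi, prior, v1, rho, t)
reversed_ok = obedience_holds(reversed_pi, prior, v1, rho, t)
```

In this toy instance the first recommendation rule satisfies the constraint and the second does not, since it recommends the active action only where the payoff is negative.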
Note that Buyer derives utility from the recommended action as well. Therefore, the expected utility of buyer type $t$ when he reports his type truthfully and follows $\pi$'s obedient recommendation is

$$u(t) = \mathbb{E}_{q \sim G}[\pi(q, t)\, v(q, t)] - p(t) = \int_{q \in Q} \pi(q, t)\, v(q, t)\, g(q)\, \mathrm{d}q - p(t), \tag{5}$$

where the first term is the utility from his decision making under Seller's signaling scheme and the second term is the payment to the seller. To ensure that Buyer is willing to participate in the mechanism, we impose the following individual rationality (IR) constraint:

$$\text{IR:} \quad \int_{q \in Q} \pi(q, t)\, v(q, t)\, g(q)\, \mathrm{d}q - p(t) \ge \max\{0, v(t)\}, \quad \forall t \in T, \tag{6}$$

where the right-hand side is the buyer's expected utility of not participating in the mechanism. We remark that if the buyer does not participate in the information selling mechanism, he still has the freedom to pick the better of actions $0$ and $1$ based on his prior beliefs about $q$, leading to a utility of $\max\{0, v(t)\}$ under no information purchase. Therefore, we also define the surplus $s(t)$, the additional utility gain from participating in the mechanism, as a function of the buyer type $t$:

$$s(t) = u(t) - \max\{v(t), 0\} = \int_{q \in Q} \pi(q, t)\, v(q, t)\, g(q)\, \mathrm{d}q - p(t) - \max\{v(t), 0\}. \tag{7}$$

The IR constraint is equivalent to non-negative surplus. Interestingly, since the payment function is always non-negative, the IR Constraint (6) implies the Obedience Constraint (4).

Derivation of Incentive Compatibility (IC) Constraints.

Finally, the IC constraint turns out to require some careful derivation. To guarantee IC, we require that a buyer of type $t$ obtain at least as much utility from truthful reporting as from misreporting any type $t'$. This requires some analysis, since when a type-$t$ buyer misreports type $t'$, the signaling scheme described by $\{\pi(q, t')\}_{q \in Q}$ may not be obedient anymore. Therefore, the buyer's utility for signal $\sigma_1$ is the maximum between $0$ and the following expected utility for action $1$:

$$U(t'; t) := \int_{q \in Q} \pi(q, t')\, v(q, t)\, g(q)\, \mathrm{d}q = \int_{q \in Q} \pi(q, t')\, v_1(q)[t + \rho(q)]\, g(q)\, \mathrm{d}q. \tag{8}$$

Conversely, the buyer's utility for signal $\sigma_0$ is the maximum between $0$ and the following expected utility for action $1$:

$$\int_{q \in Q} [1 - \pi(q, t')]\, v_1(q)[t + \rho(q)]\, g(q)\, \mathrm{d}q = v(t) - U(t'; t). \tag{9}$$

Consequently, the type-$t$ buyer's utility from signaling scheme $\{\pi(q, t')\}_{q \in Q}$ equals $\max\{U(t'; t), 0\} + \max\{v(t) - U(t'; t), 0\}$. Therefore, the incentive compatibility constraint becomes the following complex constraint:

$$u(t) \ge \max\{U(t'; t), 0\} + \max\{v(t) - U(t'; t), 0\} - p(t'). \tag{10}$$

Fortunately, some cases of the above constraint are implied by previous constraints. To see this, note that both $U(t'; t)$ and $v(t) - U(t'; t)$ are non-decreasing in $t$ because $\pi \in [0, 1]$ and $v_1 \ge 0$, and distinguish between two cases:

1. When $t > t'$, $U(t'; t)$ defined in Equation (8) is at least $U(t'; t')$, which is at least $0$ by the Obedience Constraint (4) for $t'$. In this case, the right-hand side of Constraint (10) becomes $U(t'; t) + \max\{v(t) - U(t'; t), 0\} - p(t')$, or equivalently $\max\{v(t), U(t'; t)\} - p(t')$. Note that $u(t) \ge v(t) - p(t')$ is already implied by the IR constraint $u(t) \ge v(t)$ and the condition $p(t') \ge 0$. Therefore, the only non-redundant constraint in this case is $u(t) \ge U(t'; t) - p(t')$.
2. When $t < t'$, $v(t) - U(t'; t)$ defined in Equation (9) is at most $v(t') - U(t'; t')$, which is at most $0$ by the Obedience Constraint (4) for $t'$. In this case, the right-hand side of Constraint (10) becomes $\max\{U(t'; t), 0\} - p(t')$. Note that $u(t) \ge -p(t')$ is already implied by the IR constraint $u(t) \ge 0$ and the condition $p(t') \ge 0$. Therefore, the only non-redundant constraint in this case is also $u(t) \ge U(t'; t) - p(t')$.

Consequently, given the IR and Obedience constraints, the IC constraint can finally be expressed as follows:

$$\text{IC:} \quad \int_{q \in Q} \pi(q, t)\, v(q, t)\, g(q)\, \mathrm{d}q - p(t) \ge \int_{q \in Q} \pi(q, t')\, v(q, t)\, g(q)\, \mathrm{d}q - p(t'), \quad \forall t, t' \in T. \tag{11}$$

Differences from Classic Mechanism Design for Goods.

First, the mechanism in our setup has differentdesign space and is characterized by a different set of variables. For example, there appears no naturalcorrespondence between our the signaling scheme functional variable π and variables in classic mechanismdesign. The second important difference between selling information and classic mechanism design is thatthe IR constraint (6) in our setting is different from the IR constraint in classic mechanism design, whichsimply requires the utility of participation is at least . In our setting, however, a buyer of type t willhave utility max { , v ( t ) } without additional information. Therefore, our IR constraint has to guaranteethat the buyer’s utility from the mechanism is at least max { , v ( t ) } . This important difference turns out tosigniﬁcantly change the problem structure and raise new challenges to the design of the information sellingmechanism. As will be shown later, it leads to very different optimal mechanisms from what we see inclassic mechanism design. In this section, we present the characterization of the optimal pricing mechanism. Mathematically, wederive an optimal solution in closed form to the functional optimization problem formulated in Section 2.The optimal mechanism we obtain turns out to belong to the following category of threshold mechanisms . Deﬁnition 1. [Threshold Mechanism] A mechanism ( π, p ) is called a threshold mechanism if there exists afunction θ ( t ) , such that for any t ∈ [ t , t ] , π ( q, t ) = (cid:40) ρ ( q ) ≥ θ ( t )0 otherwise . Since π is fully described by θ ( t ) here, they are both referred to as a threshold signaling function . Note that the term “threshold” is only a property about the signaling function π and does not pose anyconstraint on the payment function p . To state our mechanism, we will need the following notions of lower and upper virtual values. Deﬁnition 2 (Lower/Upper Virtual Values) . 
For any type t with PDF f ( t ) and CDF F ( t ) , the function φ ( t ) = t − − F ( t ) f ( t ) is called the lower virtual value function and φ ( t ) = t + F ( t ) f ( t ) is called the upper virtualvalue function . A lower/upper virtual value function is regular if it is monotone non-decreasing in t . The lower virtual value function φ ( t ) is precisely the virtual value function commonly used in classicmechanism design [17]. However, the upper virtual value function is a new format, which to our knowledgedoes not appear in previous literature. The regularity deﬁnition is standard.When a virtual value function is irregular, we will need to apply the so-called “ironing” trick to makeit monotone non-decreasing in t . Myerson [17] developed a procedure for ironing the lower virtual value7unction φ ( t ) . This procedure can be easily generalized to ironing any function about the buyer type t ,speciﬁcally, also our upper virtual value function φ ( t ) and mixtures of φ ( t ) , φ ( t ) . We defer the formaldescription of this ironing procedure to Appendix A.1 and only introduce them as a deﬁnition here. Deﬁnition 3 (Mixed Virtual Values and Ironing) . For any c ∈ [0 , , deﬁne φ c ( t ) = cφ ( t ) + (1 − c ) φ ( t ) as a mixed virtual value function . For any virtual value function φ ( t ) (upper or lower or mixed), let φ + ( t ) denote the ironed version of φ ( t ) obtained via the standard ironing procedure of Myerson [17]. If a virtual value function φ ( t ) is already monotone non-decreasing, it will remain the same after theironing process, i.e., φ + ( t ) = φ ( t ) , ∀ t . The following monotonicity property of the ironed mixed virtualvalue functions will be needed for proving our main results. Its proof is a bit technical and is deferred toAppendix A.2. Lemma 3. [Monotonicity of Ironed Mixed Virtual Values] Deﬁne φ c ( t ) = cφ ( t ) + (1 − c ) φ ( t ) . Then wehave for any ≤ c < c (cid:48) ≤ , φ + c ( t ) ≥ φ + c (cid:48) ( t ) for any t . 
Moreover, $\phi_1^+(t) = \underline{\phi}^+(t) < t < \bar{\phi}^+(t) = \phi_0^+(t)$ for all $t \in (t_1, t_2)$.

We are now ready to state the characterization of the optimal mechanism after introducing the following two quantities:
\[ V_L = \max\{\bar v(t_1), 0\} + \int_{t_1}^{t_2} \int_{q:\,\rho(q) \ge -\underline{\phi}^+(x)} g(q)\, v_1(q)\, dq\, dx, \tag{12} \]
\[ V_H = \max\{\bar v(t_1), 0\} + \int_{t_1}^{t_2} \int_{q:\,\rho(q) \ge -\bar{\phi}^+(x)} g(q)\, v_1(q)\, dq\, dx, \tag{13} \]
where $\bar{\phi}^+(x)$ and $\underline{\phi}^+(x)$ are the ironed upper and lower virtual value functions. Note that Lemma 3 implies $-\underline{\phi}^+(x) \ge -\bar{\phi}^+(x)$ and consequently $V_L \le V_H$, since $g(q) v_1(q)$ is always non-negative and $V_L$ integrates over a smaller region of $q$.

Our main result is summarized in the following theorem.

Theorem 1 (Characterization of the optimal pricing mechanism for selling information).

1. If $\bar v(t_2) \le V_L$, the threshold mechanism with threshold signaling function $\theta^*(t) = -\underline{\phi}^+(t)$ and the following payment function represents an optimal mechanism:
\[ p^*(t) = \int_{q \in Q} \pi^*(q,t)\, g(q)\, v(q,t)\, dq - \int_{t_1}^{t} \int_{q \in Q} \pi^*(q,x)\, g(q)\, v_1(q)\, dq\, dx, \]
where $\pi^*$ is determined by $\theta^*(t)$ as in Definition 1. Moreover, $p^*(t)$ is monotone non-decreasing for $t \in [t_1, t_2]$.

2. If $\bar v(t_2) \ge V_H$, the threshold mechanism with threshold signaling function $\theta^*(t) = -\bar{\phi}^+(t)$ and the following payment function represents an optimal mechanism:
\[ p^*(t) = \int_{q \in Q} \pi^*(q,t)\, g(q)\, v(q,t)\, dq + \int_{t}^{t_2} \int_{q \in Q} \pi^*(q,x)\, g(q)\, v_1(q)\, dq\, dx - \bar v(t_2), \]
where $\pi^*$ is determined by $\theta^*(t)$ as in Definition 1. Moreover, $p^*(t)$ is monotone non-increasing for $t \in [t_1, t_2]$.

3. If $V_L < \bar v(t_2) < V_H$, define $\phi_c(t) = c\,\underline{\phi}(t) + (1-c)\,\bar{\phi}(t)$ to be the mixed virtual value function, where $c \in (0,1)$ is a constant that satisfies
\[ \int_{t_1}^{t_2} \int_{q:\,\rho(q) \ge -\phi_c^+(t)} g(q)\, v_1(q)\, dq\, dt = \bar v(t_2), \]
where $\phi_c^+(t)$ is the ironed version of $\phi_c(t)$.
Then the threshold mechanism with threshold signaling function $\theta^*(t) = -\phi_c^+(t)$ and the following payment function represents an optimal mechanism:
\[ p^*(t) = \int_{q \in Q} \pi^*(q,t)\, g(q)\, v(q,t)\, dq - \int_{t_1}^{t} \int_{q \in Q} \pi^*(q,x)\, g(q)\, v_1(q)\, dq\, dx. \]
Moreover, $p^*(t)$ is monotone non-decreasing in $t$ when $F(t) \le c$ and monotone non-increasing when $F(t) > c$.

Let $\bar t$ satisfy $\bar v(\bar t) = 0$. Then in all cases above, the buyer surplus function $s(t)$ is monotone non-decreasing when $t \le \bar t$ and monotone non-increasing when $t \ge \bar t$.

Note that in the optimal mechanism of Theorem 1, if the signaling schemes for two types $t, t'$ are the same, then their payments must also be the same, i.e., $p^*(t) = p^*(t')$. This is a simple consequence of the IC constraint: if $p^*(t) > p^*(t')$, the buyer of type $t$ would misreport $t'$, and vice versa.

Remark 1. In all three cases of Theorem 1, a threshold mechanism is optimal. However, the format of the optimal mechanism and the properties of the payment depend on how $\bar v(t_2)$ compares to $V_L$ and $V_H$. Note that threshold mechanisms are ubiquitous in reality. In many inspections, examinations and recommendations, we often see some goods (or services) pass a test (or deserve a recommendation). These can be viewed as threshold signaling schemes. What we pay for conducting these tests or receiving recommendations is precisely the payment required for receiving such information. From this perspective, Theorem 1 characterizes the optimal signaling threshold and payment for buyers drawn from a random population.

Remark 2. We briefly discuss the choice of the constant $c$ in Case 3 of Theorem 1. As we will show later in our proof, $\bar v(t_2) \le V_H$ implies $\bar v(t_1) \le 0$ for any feasible mechanism. Therefore, in Case 3, the quantities $V_L, V_H$ defined in Equations (12) and (13) only have the integral term, and the condition of Case 3 boils down to
\[ \int_{t_1}^{t_2} \int_{q:\,\rho(q) \ge -\underline{\phi}^+(x)} g(q)\, v_1(q)\, dq\, dx \;<\; \bar v(t_2) \;<\; \int_{t_1}^{t_2} \int_{q:\,\rho(q) \ge -\bar{\phi}^+(x)} g(q)\, v_1(q)\, dq\, dx. \]
Since $\underline{\phi}^+(x) < \bar{\phi}^+(x)$ for any $x$, the choice of $c \in (0,1)$ interpolates between the two integration regions $\{q : \rho(q) \ge -\underline{\phi}^+(x)\}$ and $\{q : \rho(q) \ge -\bar{\phi}^+(x)\}$. Since we assume that the distribution has no point mass, the expression
\[ \int_{t_1}^{t_2} \int_{q:\,\rho(q) \ge -\phi_c^+(x)} g(q)\, v_1(q)\, dq\, dx \]
is continuous in $c$. (This is where the assumption that the distribution of $\rho$ has no point masses is needed. Without this assumption, the threshold mechanism needs randomization for those $q$ with $\rho(q) = -\phi_c^+(t)$. See Appendix G for the refined characterization of the optimal mechanism for general $\rho$.) Lemma 3 implies that the expression is also monotone weakly decreasing in $c$. Therefore, we can binary search for the $c$ that makes the value of this integral exactly equal to $\bar v(t_2)$. This also leads to a tractable algorithm for computing the parameter $c$.

We conclude this section by describing two examples that illustrate what the optimal mechanism looks like in concrete instances.

3.1 An Example of Regular Case 1

Consider $v(q,t) = qt - 2 = v_1(q)[t + \rho(q)]$ where $v_1(q) = q$ and $\rho(q) = -2/q$. Suppose $q \in Q = [0,1]$ is uniformly distributed, i.e., $g(q) = 1$. Let $t \in T = [2,3]$ also be uniformly distributed with $f(t) = 1$. Among others, this utility function captures online advertising, where $q$ is the probability that an Internet user will purchase the product of an advertiser (the information buyer) and $t$ is the advertiser's value for selling a product.
The constant $2$ is interpreted as the advertiser's payment for displaying his ads to an Internet user. In this instance, $\underline{\phi}(t)$ is non-decreasing, so ironing does not change its value. We have $\underline{\phi}^+(t) = \underline{\phi}(t) = t - \frac{1-F(t)}{f(t)} = 2t - 3$. Note that $\bar v(t) = \int_{q \in Q} g(q)\, v(q,t)\, dq = \int_0^1 (tq - 2)\, dq = \frac{t}{2} - 2$ for any $t \in [2,3]$. Since $V_L$ defined in Equation (12) is clearly non-negative, we have $\bar v(t_2) = -0.5 < 0 \le V_L$, so the instance falls into Case 1 of Theorem 1. This implies that an optimal mechanism can be specified by the threshold signaling scheme $\theta^*(t) = -\underline{\phi}^+(t) = 3 - 2t$. That is, for any buyer type $t$ the mechanism makes an obedient recommendation of the active action when $\rho(q) \ge -\underline{\phi}^+(t)$, or concretely, when $q \ge \frac{2}{2t-3}$. There are two situations.

• When $t \le 2.5$, the mechanism would recommend action 1 only when $q \ge \frac{2}{2t-3} \ge 1$, which means the mechanism never recommends action 1. Therefore, $\pi^*(q,t) = 0$ for all $q \in Q$ in this situation and the payment is $p^*(t) = 0$. For these buyer types, the seller sells no information and charges them $0$ as well.

• When $t > 2.5$, the mechanism recommends action 1 when $q \ge \frac{2}{2t-3}$, a threshold that lies in $(0,1)$ and decreases in $t$. In this situation, the payment function $p^*(t)$ can be computed as
\[ p^*(t) = \int_{\frac{2}{2t-3}}^{1} (qt - 2)\, dq - \int_{2.5}^{t} \int_{\frac{2}{2x-3}}^{1} q\, dq\, dx = -0.25 + \frac{4t - 9}{(2t-3)^2}. \]
For these buyer types, the utility from the mechanism is
\[ u(t) = \int_{\frac{2}{2t-3}}^{1} (qt - 2)\, dq - p^*(t) = -1.75 + \frac{t}{2} + \frac{1}{2t-3}. \]
Since without information the optimal buyer action is action 0 due to $\bar v(t) \le 0$, $u(t)$ also equals the buyer surplus and can be verified to be monotone increasing.

Notably, to achieve optimal revenue, the optimal mechanism does not simply recommend the active action whenever $v(q,t) \ge 0$. For example, for types $2 < t \le 2.5$ the mechanism reveals no information (and charges nothing) even though $v(q,t) > 0$ whenever $q > 2/t$. Therefore, the revenue-optimal mechanism is generally not welfare-optimal. Indeed, it sacrifices the welfare of buyers with smaller types to extract more revenue from buyers with higher types.

Consider another instance $v(q,t) = 10q(t + q - 72) = v_1(q)[t + \rho(q)]$ where $v_1(q) = 10q$ and $\rho(q) = q - 72$. Suppose $q \in Q = [0,60]$ is uniformly distributed, i.e., $g(q) = \frac{1}{60}$. Let $t \in T = [30,60]$ be piecewise uniformly distributed. Specifically, $t$ is uniformly distributed on $[30,54]$ with $f(t) = \frac{1}{48}$ and on $[54,60]$ with $f(t) = \frac{1}{12}$. This value function may capture house selling, where $t$ is the buyer's private preference on the house quality $q$, and $v_1(q)\rho(q)$ models the net value the house brings independently of personal preferences. The upper virtual value and its ironed version are
\[ \bar{\phi}(t) = \begin{cases} 2t - 30 & \text{if } 30 \le t < 54, \\ 2t - 48 & \text{if } 54 \le t \le 60, \end{cases} \qquad \bar{\phi}^+(t) = \begin{cases} 2t - 30 & \text{if } 30 \le t < 48, \\ 66 & \text{if } 48 \le t < 57, \\ 2t - 48 & \text{if } 57 \le t \le 60. \end{cases} \]
Here $\bar v(t_2) = 8400 > V_H = 7944$ (note that $\max\{0, \bar v(t_1)\} = 0$ in this instance). Thus, Theorem 1 implies that an optimal mechanism can be specified by the threshold signaling scheme $\theta^*(t) = -\bar{\phi}^+(t)$. That is, for any buyer type $t$ the mechanism makes an obedient recommendation of the active action when $\rho(q) \ge -\bar{\phi}^+(t)$. One can verify that in this case the optimal mechanism reveals a non-trivial amount of information to every buyer type (except the single type $t_2$) and charges each of them a positive amount. Concretely, the payment function for this instance is
\[ p(t) = \begin{cases} \frac{2}{9}(t-51)^3 + 7(t-51)^2 - 27 & \text{if } 30 \le t < 48, \\ 30 & \text{if } 48 \le t < 57, \\ \frac{2}{9}(t-60)^3 + 4(t-60)^2 & \text{if } 57 \le t \le 60. \end{cases} \]

Figure 1: Functions in example 2

Figure 1 plots the (non-increasing) payment $p(t)$ as a function of the buyer type $t$, as well as the buyer's utility $u(t)$ and surplus $s(t)$, for which we omit the standard calculations. Note that a large utility $u(t)$ does not mean a large surplus, since a buyer with a large type may already have a very large utility to begin with, so participating in the mechanism does not give him much additional utility, i.e., surplus. This is exactly the situation illustrated in Figure 1.

One interesting observation concerns the interval $48 \le t < 57$. The mechanism recommends action 1 when $q - 72 \ge -66$, or equivalently $q \ge 6$, for all these $t$.
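The quantities quoted for this second instance can be checked numerically. The following Python sketch is only an illustration: it assumes $Q = [0,60]$ with $g(q) = 1/60$, $T = [30,60]$, and an ironed upper virtual value equal to $2t-30$ on $[30,48)$, $66$ on $[48,57)$ and $2t-48$ on $[57,60]$. All function names are hypothetical helpers introduced for this check.

```python
# Sanity check of the second example (assumed parameters, illustrative names).

def ironed_upper_virtual_value(t):
    """Ironed upper virtual value, assumed piecewise form."""
    if t < 48:
        return 2 * t - 30
    if t < 57:
        return 66.0
    return 2 * t - 48

def q_threshold(t):
    """Smallest q in [0, 60] with rho(q) = q - 72 >= -ironed virtual value."""
    return min(60.0, max(0.0, 72 - ironed_upper_virtual_value(t)))

def weighted_prob(t):
    """P(t): integral of g(q) v1(q) = q / 6 over q in [threshold, 60]."""
    a = q_threshold(t)
    return (60.0 ** 2 - a ** 2) / 12

def recommended_value(t):
    """Integral of g(q) v(q, t) = (q / 6)(t + q - 72) over q in [threshold, 60]."""
    a = q_threshold(t)
    return (t - 72) * (60.0 ** 2 - a ** 2) / 12 + (60.0 ** 3 - a ** 3) / 18

def prior_value(t):
    """Expected payoff of the active action without any information."""
    return (t - 72) * 60.0 ** 2 / 12 + 60.0 ** 3 / 18

def midpoint_integral(fn, lo, hi, n=100_000):
    """Plain midpoint rule, accurate enough for this piecewise-smooth integrand."""
    h = (hi - lo) / n
    return h * sum(fn(lo + (i + 0.5) * h) for i in range(n))

def payment(t):
    """Case 2 payment of Theorem 1 for this instance."""
    return recommended_value(t) + midpoint_integral(weighted_prob, t, 60) - prior_value(60)

V_H = max(prior_value(30), 0.0) + midpoint_integral(weighted_prob, 30, 60)
print(round(prior_value(60), 3), round(V_H, 3), round(payment(50), 3))
```

Under these assumptions the prior value of the highest type and $V_H$ come out at the two values quoted in the text, and the payment is flat on the ironed interval.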
Moreover, the mechanism charges the same amount $p(t) = 30$ for every $t$ in the ironed interval $48 \le t < 57$. This is predicted by the fact that, for any two buyer types, the optimal mechanism charges them the same payment if the recommendation policies for them are the same. As we discussed earlier, this fact is induced by IC.

Another interesting observation is that the optimal mechanism sometimes recommends the active action 1 even when $v(q,t) < 0$. For example, when $t = 40$, the mechanism recommends action 1 whenever $q \ge 22$. However, when $q = 22$, the buyer's valuation is actually negative: $v(q,t) = -2200$. Notably, in Case 2 the mechanism always recommends action 1 whenever $v(q,t) \ge 0$, while in Case 1 this is not true. This is due to the fact that $\bar{\phi}^+(t) \ge t \ge \underline{\phi}^+(t)$.

Note that the surplus $s(t)$ is the extra utility Buyer can get by participating in the mechanism and is defined in (7). Non-negativity of this function guarantees the IR constraint. We shall show that it is monotone non-increasing in Case 2 (as plotted in Figure 1), monotone non-decreasing in Case 1, and first increasing and then decreasing in Case 3. The optimal mechanism makes $s(t_1) = 0$ in Case 1, $s(t_2) = 0$ in Case 2, and $s(t_1) = s(t_2) = 0$ in Case 3.

In this section, we prove Theorem 1. Due to space limits, we only provide a proof for Case 3. The core ideas for proving Cases 1 and 2 are similar; we thus defer them to Appendices E and F, respectively. As we will see, our derivation here differs significantly from the derivation of optimal mechanisms for goods. The proof can be divided into two major steps: (1) characterizing the space of feasible mechanisms; (2) deriving the optimal mechanism within the feasible space. While the first step is also based on an analysis of the IC constraints, as in classic mechanism design, the conclusions we obtain are quite different since our IC constraints are different. The second step deviates significantly from classic approaches, and is arguably much more involved due to additional constraints that we have to handle.

4.1 Characterization of Feasible Mechanisms

We first characterize the set of feasible mechanisms that satisfy the IR Constraints (6) and the IC Constraints (11). By Lemma 2, we can focus on the space of mechanisms with non-negative payments. In this case, the Obedience Constraints (4) are implied by the IR constraints.

To describe our characterization, it is useful to introduce the following quantity:
\[ P_\pi(t) = \int_{q \in Q} \pi(q,t) \cdot g(q)\, v_1(q)\, dq. \tag{14} \]
Note that $P_\pi(t)$ can be interpreted as the expected weighted probability of being recommended the active action, where the weight is $v_1(q)$. The following lemma summarizes our characterization. To give readers some intuition, we only provide a proof of sufficiency here and defer the proof of necessity to Appendix C.

Lemma 4 (Characterization of Feasible Mechanisms). A mechanism $(\pi, p)$ with non-negative payments is feasible if and only if it satisfies the following constraints:
\[ P_\pi(t) \text{ is monotone non-decreasing in } t, \tag{15} \]
\[ u(t) = u(t_1) + \int_{t_1}^{t} P_\pi(x)\, dx, \quad \forall t \in T, \tag{16} \]
\[ u(t_2) \ge \bar v(t_2), \quad u(t_1) \ge 0, \tag{17} \]
\[ p(t) \ge 0, \quad \forall t \in T. \tag{18} \]

Proof of Sufficiency.

We prove that Constraints (15)-(18) imply the Obedience (4), IR (6) and IC (11) constraints. The IC constraint (11) is equivalent to
\[ u(t) \ge u(t') + \int_{q \in Q} \pi(q,t') \cdot g(q)\,[v(q,t) - v(q,t')]\, dq = u(t') + (t - t')\, P_\pi(t'). \]
Constraints (15) and (16) imply the IC constraint (11) because, if $t' < t$, we have
\[ u(t) - u(t') = \int_{t'}^{t} P_\pi(x)\, dx \ge \int_{t'}^{t} P_\pi(t')\, dx = (t - t')\, P_\pi(t'). \]
Similarly, when $t' > t$, we also have $u(t) - u(t') \ge (t - t')\, P_\pi(t')$.

The IR constraint (6) is equivalent to $u(t) \ge 0$ and $u(t) \ge \bar v(t)$. Since $P_\pi(x) \ge 0$, Constraint (16) together with $u(t_1) \ge 0$ implies $u(t) \ge 0$ for any $t$. We now leverage $u(t_2) \ge \bar v(t_2)$ to prove $u(t) \ge \bar v(t)$ for any $t$, as follows:
\[ u(t) = u(t_1) + \int_{t_1}^{t} P_\pi(x)\, dx = u(t_2) - \int_{t}^{t_2} P_\pi(x)\, dx \ge \bar v(t_2) - \int_{t}^{t_2} P_\pi(x)\, dx. \]
Using the definitions of $\bar v(t_2)$ and $P_\pi(x)$ and the fact that $\pi(q,x) \le 1$, we get
\[ \begin{aligned} u(t) &\ge \int_{q \in Q} g(q)\, v_1(q)\,[t_2 + \rho(q)]\, dq - \int_{t}^{t_2} \int_{q \in Q} \pi(q,x)\, g(q)\, v_1(q)\, dq\, dx \\ &\ge \int_{q \in Q} g(q)\, v_1(q)\,[t_2 + \rho(q)]\, dq - \int_{t}^{t_2} \int_{q \in Q} g(q)\, v_1(q)\, dq\, dx \\ &= \int_{q \in Q} g(q)\, v_1(q)\,[t + \rho(q)]\, dq = \bar v(t). \end{aligned} \]
Finally, the Obedience constraint (4) follows from the IR constraint (6) and $p(t) \ge 0$.

Note that Condition (15) is analogous to Myerson's allocation monotonicity condition in auction design, but is different. Specifically, in Myerson's optimal auction, the value of an item depends directly on the buyer type $t$ with no weight associated with it. In information selling, the value of taking the active action depends on the utility coefficient $v_1(q)$.

Next we characterize Buyer's surplus from participating in the information selling mechanism, defined as follows:
\[ \text{Buyer Surplus:} \quad s(t) = u(t) - \max\{0, \bar v(t)\}. \tag{19} \]
That is, the Buyer surplus is the difference between his utility from the information selling mechanism and his utility from directly picking the better action among $0, 1$ without purchasing any information. Note that the IR constraint (6) is equivalent to $s(t) \ge 0$. Recall that Buyer of type $t$ has expected utility $\bar v(t) = \int_{q \in Q} v(q,t)\, g(q)\, dq$ for action 1 without purchasing any information. Since $v(q,t)$ is monotone non-decreasing in $t$, we know that $\bar v(t)$ is also monotone non-decreasing. Let $\bar t$ be the Buyer type at which $\bar v(t)$ crosses $0$ ($\bar t$ can be any such type if there are several). The following lemma characterizes how Buyer's surplus changes as a function of his type.

Lemma 5. Let $\bar t$ be any buyer type such that $\bar v(\bar t) = \int_{q \in Q} v(q,\bar t)\, g(q)\, dq = 0$. In any feasible mechanism $(\pi, p)$ with non-negative payments, the buyer's surplus $s(t)$ is non-negative for $t \in [t_1, t_2]$, monotone (weakly) increasing for $t \in [t_1, \bar t]$, and monotone (weakly) decreasing for $t \in [\bar t, t_2]$.

Proof. When $t \le \bar t$, we have $\bar v(t) \le 0$. Therefore, without participating in the mechanism to purchase additional information, Buyer derives utility $0$ by taking the passive action 0. So his surplus from participation is
\[ s(t) = u(t) = u(t_1) + \int_{t_1}^{t} P_\pi(x)\, dx \]
by the utility identity in Equation (16). Since $u(t_1) \ge 0$ and $P_\pi(x) \ge 0$, it is easy to see that $s(t)$ is non-negative and monotone non-decreasing in $t$.

When $t \ge \bar t$, we have $\bar v(t) \ge 0$. So Buyer derives utility $\bar v(t)$ without participating in the information selling mechanism. We thus have
\[ \begin{aligned} s(t) &= u(t) - \bar v(t) \\ &= \left[ u(\bar t) + \int_{\bar t}^{t} \int_{q \in Q} \pi(q,x)\, v_1(q)\, g(q)\, dq\, dx \right] - \left[ \int_{q \in Q} v_1(q)\,[t + \rho(q)]\, g(q)\, dq \right] \\ &= \left[ u(\bar t) + \int_{\bar t}^{t} \int_{q \in Q} \pi(q,x)\, v_1(q)\, g(q)\, dq\, dx \right] - \left[ \int_{\bar t}^{t} \int_{q \in Q} v_1(q)\, g(q)\, dq\, dx + \bar v(\bar t) \right] \\ &= u(\bar t) - \bar v(\bar t) + \int_{\bar t}^{t} \int_{q \in Q} [\pi(q,x) - 1]\, v_1(q)\, g(q)\, dq\, dx. \end{aligned} \]
Since $\pi(q,x) - 1 \le 0$ and $v_1(q) g(q) \ge 0$, $s(t)$ is monotone non-increasing in $t$. Consequently, $s(t) \ge s(t_2) = u(t_2) - \bar v(t_2) \ge 0$ by Inequality (17).

With the characterization of feasible mechanisms in Lemma 4, we are now ready to derive the optimal mechanism. This is where our proof starts to deviate significantly from typical approaches in classic mechanism design. To see the reasons, first recall that Lemma 5 shows that Buyer's surplus $s(t)$ in our problem first increases and then decreases. In classic settings such as feasible single-item mechanisms, Buyer's utility is always increasing in his type, so the optimal auction always sets Buyer's surplus to $0$ at the lowest type. In our case, however, both $s(t_1)$ and $s(t_2)$ could be the lowest surplus, and we have to figure out which one needs to be the lowest and under what conditions. Second, to guarantee feasibility, the mechanism has to be designed to satisfy $u(t_2) \ge \bar v(t_2)$ (coming from the IR constraints $u(t) \ge \bar v(t)$, $\forall t \in T$). This constraint is unique to selling information and is not required when designing mechanisms for selling goods. Guaranteeing this constraint results in much more intricate derivations as well as more intricate optimal mechanisms.

It turns out that whether the minimum buyer surplus is attained at $t_1$, at $t_2$, or simultaneously at both depends on how large $\bar v(t_1)$ and $\bar v(t_2)$ are. In fact, the optimal mechanism has different forms depending on whether $\bar v(t_2) \le V_L$, $\bar v(t_2) \ge V_H$, or $V_L < \bar v(t_2) < V_H$. We start with a technical lemma showing that the conditions for these three cases can also be written in terms of $\bar v(t_1)$. The proof of this technical Lemma 6 is deferred to Appendix D.1.

Lemma 6.

Define
\[ V'_L = -\int_{t_1}^{t_2} \int_{q:\,\rho(q) < -\underline{\phi}^+(x)} g(q)\, v_1(q)\, dq\, dx \quad \text{and} \quad V'_H = -\int_{t_1}^{t_2} \int_{q:\,\rho(q) < -\bar{\phi}^+(x)} g(q)\, v_1(q)\, dq\, dx. \]
Then the three conditions $\bar v(t_2) \le V_L$, $\bar v(t_2) \ge V_H$, and $V_L < \bar v(t_2) < V_H$ are equivalent to $\bar v(t_1) \le V'_L$, $\bar v(t_1) \ge V'_H$, and $V'_L < \bar v(t_1) < V'_H$, respectively.

From now on, we focus on the case $V_L < \bar v(t_2) < V_H$, and restate Case 3 of Theorem 1 in the following proposition.

Proposition 1. If $V_L < \bar v(t_2) < V_H$, define $\phi_c(t) = c\,\underline{\phi}(t) + (1-c)\,\bar{\phi}(t)$ to be the mixed virtual value function, where $c \in (0,1)$ is a constant that satisfies
\[ \int_{t_1}^{t_2} \int_{q:\,\rho(q) \ge -\phi_c^+(t)} g(q)\, v_1(q)\, dq\, dt = \bar v(t_2), \tag{20} \]
where $\phi_c^+(t)$ is the ironed version of $\phi_c(t)$. Then the threshold mechanism with threshold signaling function $\theta^*(t) = -\phi_c^+(t)$ and the following payment function represents an optimal mechanism:
\[ p^*(t) = \int_{q \in Q} \pi^*(q,t)\, g(q)\, v(q,t)\, dq - \int_{t_1}^{t} \int_{q \in Q} \pi^*(q,x)\, g(q)\, v_1(q)\, dq\, dx. \]
Moreover, $p^*(t)$ is monotone non-decreasing in $t$ when $F(t) \le c$ and monotone non-increasing when $F(t) > c$.

Before proving the optimality of our mechanism, we first argue that the constant $c$ described in Proposition 1 actually exists.

Lemma 7. If $V_L < \bar v(t_2) < V_H$, there exists a constant $c \in (0,1)$ that satisfies Equation (20).

The proof of Lemma 7 is deferred to Appendix D.2. We emphasize that the proof uses the assumption that the distribution of $\rho(q)$ does not contain a point mass. However, even if this assumption does not hold, we can slightly modify our mechanism to get a threshold mechanism with partial recommendations on the threshold boundary, and still achieve the optimal revenue. For clarity, we put the solution and proof for that general case in Appendix G.

Lemma 7 implies that the mechanism proposed in Proposition 1 exists.
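Because the left-hand side of Equation (20) is continuous and weakly decreasing in $c$, the constant can be located by bisection. The Python sketch below illustrates the idea on a toy instance that is an assumption made here for illustration and does not appear in the paper: $v_1(q) = 1$, $\rho(q) = -q$, $q$ uniform on $[0,1]$ and $t$ uniform on $[0,1]$. In this instance $\phi_c(t) = 2t - c$ is already monotone, so no ironing is needed, and the prior value of the highest type is $0.5$, which lies strictly between the two extreme values of the integral. The function names are hypothetical.

```python
# Bisection for the mixing constant c on a toy instance (illustrative assumptions).

def weighted_prob(t, c):
    """P(t) for the toy instance: measure of {q in [0, 1] : -q >= -(2t - c)}."""
    return min(1.0, max(0.0, 2.0 * t - c))

def lhs_of_eq20(c, n=20_000):
    """Midpoint-rule approximation of the double integral in Equation (20)."""
    h = 1.0 / n
    return h * sum(weighted_prob((i + 0.5) * h, c) for i in range(n))

def find_c(target, tol=1e-9):
    """Bisection on c in [0, 1]; relies on lhs_of_eq20 being decreasing in c."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if lhs_of_eq20(mid) > target:
            lo = mid  # integral too large: shift weight toward the lower virtual value
        else:
            hi = mid
    return 0.5 * (lo + hi)

prior_value_at_t2 = 0.5  # expected payoff of the active action for the type t_2 = 1
c_star = find_c(prior_value_at_t2)
print(round(c_star, 6))
```

For this symmetric toy instance the search settles close to $c = 0.5$. In an irregular instance, each evaluation of the integral would first require ironing $\phi_c$.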
Now we show that it is also feasible.

Lemma 8. The mechanism $(\pi^*, p^*)$ defined according to $\phi_c^+(t)$ is feasible.

Proof. To prove Lemma 8, it suffices to show that the mechanism $(\pi^*, p^*)$ satisfies Constraints (15), (16), (17), and (18). By definition,
\[ P_{\pi^*}(t) = \int_{q:\,\rho(q) \ge -\phi_c^+(t)} g(q)\, v_1(q)\, dq. \]
Since $\phi_c^+(t)$ is already ironed, it is non-decreasing in $t$. Thus the integration domain of $P_{\pi^*}(t)$ gets larger as $t$ increases. So $P_{\pi^*}(t)$ is non-decreasing since $g(q) v_1(q) \ge 0$, satisfying Constraint (15).

To show that the mechanism satisfies Constraint (16), note that
\[ u(t) = \int_{q \in Q} g(q)\, \pi^*(q,t)\, v(q,t)\, dq - p^*(t) = \int_{t_1}^{t} P_{\pi^*}(x)\, dx. \]
Thus $u(t_1) = 0$ and
\[ u(t) = u(t_1) + \int_{t_1}^{t} P_{\pi^*}(x)\, dx. \]
As for Constraint (17), we already have $u(t_1) = 0$. Moreover,
\[ u(t_2) = \int_{t_1}^{t_2} P_{\pi^*}(x)\, dx = \int_{t_1}^{t_2} \int_{q:\,\rho(q) \ge -\phi_c^+(t)} g(q)\, v_1(q)\, dq\, dt = \bar v(t_2), \]
where the last equation is the definition of the constant $c$.

Finally, we show that the payment is non-negative, i.e., the mechanism $(\pi^*, p^*)$ satisfies Constraint (18). We claim that $p^*(t)$ is monotone non-decreasing when $F(t) \le c$, and monotone non-increasing when $F(t) \ge c$ (recall that $c \in (0,1)$).

Define $t_c$ as the buyer type with $F(t_c) = c$. Since
\[ \phi_c(t) = c\,\underline{\phi}(t) + (1-c)\,\bar{\phi}(t) = t + \frac{F(t) - c}{f(t)}, \]
we have $\phi_c(t) \le t$ for all $t$ with $F(t) \le c$, and $\phi_c(t) \ge t$ for all $t$ with $F(t) \ge c$. Following arguments similar to the proof of Lemma 3, we know that $\phi_c^+(t) \le t$ for all $t \le t_c$ and $\phi_c^+(t) \ge t$ for all $t \ge t_c$. For any $t < t_c$, let $t'$ be any number in the interval $[\phi_c^+(t), t]$. Thus $\phi_c^+(t) \ge \phi_c^+(t')$, and
\[ \begin{aligned} p^*(t) - p^*(t') &= \int_{q \in Q} g(q)\, \pi^*(q,t)\, v(q,t)\, dq - \int_{q \in Q} g(q)\, \pi^*(q,t')\, v(q,t')\, dq - \int_{t'}^{t} P_{\pi^*}(x)\, dx \\ &= \int_{q:\,\rho(q) \ge -\phi_c^+(t)} g(q)\, v(q,t)\, dq - \int_{q:\,\rho(q) \ge -\phi_c^+(t')} g(q)\, v(q,t')\, dq - \int_{t'}^{t} P_{\pi^*}(x)\, dx. \end{aligned} \]
When $\rho(q) \ge -\phi_c^+(t)$, we have $v(q,t') = v_1(q)[t' + \rho(q)] \ge v_1(q)[t' - \phi_c^+(t)] \ge 0$, where the last inequality is due to the choice of $t'$. So the second term in the above equation satisfies
\[ \begin{aligned} \int_{q:\,\rho(q) \ge -\phi_c^+(t')} g(q)\, v(q,t')\, dq &= \int_{q:\,\rho(q) \ge -\phi_c^+(t)} g(q)\, v(q,t')\, dq - \int_{q:\,-\phi_c^+(t) \le \rho(q) < -\phi_c^+(t')} g(q)\, v(q,t')\, dq \\ &\le \int_{q:\,\rho(q) \ge -\phi_c^+(t)} g(q)\, v(q,t')\, dq. \end{aligned} \]
Thus,
\[ \begin{aligned} p^*(t) - p^*(t') &\ge \int_{q:\,\rho(q) \ge -\phi_c^+(t)} g(q)\, v(q,t)\, dq - \int_{q:\,\rho(q) \ge -\phi_c^+(t)} g(q)\, v(q,t')\, dq - \int_{t'}^{t} P_{\pi^*}(x)\, dx \\ &= \int_{q:\,\rho(q) \ge -\phi_c^+(t)} g(q)\, v_1(q)\,(t - t')\, dq - \int_{t'}^{t} P_{\pi^*}(x)\, dx \\ &= (t - t')\, P_{\pi^*}(t) - \int_{t'}^{t} P_{\pi^*}(x)\, dx \ge 0, \end{aligned} \]
where the last inequality follows from the monotonicity of $P_{\pi^*}(t)$.

Therefore, the payment function $p^*(t)$ is monotone non-decreasing on the interval $[\phi_c^+(t), t]$. Since the set of intervals $\{[\phi_c^+(t), t] \mid t \in [t_1, t_c]\}$ covers $[t_1, t_c]$, we conclude that $p^*(t)$ is monotone non-decreasing on $[t_1, t_c]$.

Using similar analyses, we can show that $p^*(t)$ is monotone non-increasing on the interval $[t_c, t_2]$. Therefore, to prove that $p^*(t) \ge 0$ for all $t \in T$, it suffices to show that $p^*(t_1) \ge 0$ and $p^*(t_2) \ge 0$. Indeed, we have
\[ p^*(t_1) = \int_{q \in Q} \pi^*(q,t_1)\, g(q)\, v(q,t_1)\, dq - u(t_1) = \int_{q:\,\rho(q) \ge -\phi_c^+(t_1)} g(q)\, v(q,t_1)\, dq \ge 0. \]
The inequality holds because when $\rho(q) \ge -\phi_c^+(t_1) \ge -t_1$, we have $v(q,t_1) = v_1(q)[t_1 + \rho(q)] \ge 0$. Moreover,
\[ \begin{aligned} p^*(t_2) &= \int_{q \in Q} \pi^*(q,t_2)\, g(q)\, v(q,t_2)\, dq - u(t_2) \\ &= \int_{q:\,\rho(q) \ge -\phi_c^+(t_2)} g(q)\, v_1(q)\,[t_2 + \rho(q)]\, dq - \int_{q \in Q} g(q)\, v_1(q)\,[t_2 + \rho(q)]\, dq \\ &= -\int_{q:\,\rho(q) < -\phi_c^+(t_2)} g(q)\, v_1(q)\,[t_2 + \rho(q)]\, dq \ge 0, \end{aligned} \]
where the inequality holds because $\rho(q) < -\phi_c^+(t_2) \le -t_2$.

To prove that the mechanism $(\pi^*, p^*)$ is the optimal feasible mechanism, we need to apply the ironing trick. We will first derive the revenue function for any feasible mechanism using both $t_1$ and $t_2$ as reference points, and manipulate the expression so that it contains multiple terms. Then we show that the mechanism $(\pi^*, p^*)$ optimizes all these terms simultaneously.

Now we are ready to show the proof of Proposition 1.

Proof of Proposition 1.

Let $(\pi, p)$ be any feasible mechanism. Since the utility of Buyer is just the difference between the value obtained from the purchased information and his payment, we can write the revenue of Seller as
\[ REV(\pi, p) = \int_{t_1}^{t_2} f(t)\, p(t)\, dt = \int_{t_1}^{t_2} f(t) \left[ \int_{q \in Q} g(q)\, \pi(q,t)\, v(q,t)\, dq - u(t) \right] dt. \]
Applying Equation (16), we get
\[ \begin{aligned} REV(\pi, p) &= \int_{t_1}^{t_2} f(t) \left[ \int_{q \in Q} g(q)\, \pi(q,t)\, v(q,t)\, dq - \int_{t_1}^{t} P_\pi(x)\, dx - u(t_1) \right] dt \\ &= \int_{t_1}^{t_2} f(t) \left[ \int_{q \in Q} g(q)\, \pi(q,t)\, v(q,t)\, dq \right] dt - \int_{t_1}^{t_2} \int_{t_1}^{t} f(t)\, P_\pi(x)\, dx\, dt - u(t_1) \\ &= \int_{t_1}^{t_2} f(t) \left[ \int_{q \in Q} g(q)\, \pi(q,t)\, v(q,t)\, dq \right] dt - \int_{t_1}^{t_2} \int_{x}^{t_2} f(t)\, P_\pi(x)\, dt\, dx - u(t_1) \\ &= \int_{t_1}^{t_2} f(t) \left[ \int_{q \in Q} g(q)\, \pi(q,t)\, v(q,t)\, dq \right] dt - \int_{t_1}^{t_2} [1 - F(x)]\, P_\pi(x)\, dx - u(t_1). \end{aligned} \]
Expanding $P_\pi$ via (14), we obtain
\[ \begin{aligned} REV(\pi, p) &= \int_{q \in Q} g(q) \left[ \int_{t_1}^{t_2} f(t)\, \pi(q,t)\, v(q,t)\, dt \right] dq - \int_{t_1}^{t_2} [1 - F(t)] \int_{q \in Q} g(q)\, \pi(q,t)\, v_1(q)\, dq\, dt - u(t_1) \\ &= \int_{q \in Q} g(q) \left[ \int_{t_1}^{t_2} f(t)\, \pi(q,t) \left( v(q,t) - v_1(q)\, \frac{1 - F(t)}{f(t)} \right) dt \right] dq - u(t_1) \\ &= \int_{q \in Q} g(q) \left[ \int_{t_1}^{t_2} f(t)\, \pi(q,t)\, v_1(q) \left[ \underline{\phi}(t) + \rho(q) \right] dt \right] dq - u(t_1). \end{aligned} \tag{21} \]
The above derivation uses $u(t_1)$ as the "reference" point. Similarly, using a variant of Equation (16), namely $u(t) = u(t_2) - \int_{t}^{t_2} P_\pi(x)\, dx$, we can derive an alternative form of the revenue with $u(t_2)$ as the reference point:
\[ REV(\pi, p) = \int_{q \in Q} g(q) \left[ \int_{t_1}^{t_2} f(t)\, \pi(q,t)\, v_1(q) \left[ \bar{\phi}(t) + \rho(q) \right] dt \right] dq - u(t_2). \tag{22} \]
Note that Equations (21) and (22) are just different representations of the (same) revenue of the mechanism $(\pi, p)$. Thus any convex combination of them also represents the same revenue. Using the constant $c$ given in Proposition 1 as the combination coefficient, we have
\[ \begin{aligned} REV(\pi, p) &= c \left[ \int_{q \in Q} g(q) \int_{t_1}^{t_2} f(t)\, \pi(q,t)\, v_1(q) \left[ \underline{\phi}(t) + \rho(q) \right] dt\, dq - u(t_1) \right] \\ &\quad + (1-c) \left[ \int_{q \in Q} g(q) \int_{t_1}^{t_2} f(t)\, \pi(q,t)\, v_1(q) \left[ \bar{\phi}(t) + \rho(q) \right] dt\, dq - u(t_2) \right] \\ &= \int_{t_1}^{t_2} \int_{q \in Q} [\phi_c(t) + \rho(q)]\, \pi(q,t)\, f(t)\, g(q)\, v_1(q)\, dq\, dt - c\, u(t_1) - (1-c)\, u(t_2). \end{aligned} \]
Let $H_c(\cdot)$, $L_c(\cdot)$, $h_c(\cdot)$, $l_c(\cdot)$ be the corresponding functions when ironing the function $\phi_c(t)$ (defined in detail in Appendix A). By definition, we have $h_c(F(t)) = \phi_c(t)$ and $l_c(F(t)) = \phi_c^+(t)$. So the first term on the right-hand side of the above equation can be written as
\[ \begin{aligned} \int_{t_1}^{t_2} \int_{q \in Q} [\phi_c(t) + \rho(q)]\, \pi(q,t)\, f(t)\, g(q)\, v_1(q)\, dq\, dt &= \int_{t_1}^{t_2} \int_{q \in Q} \left[ \phi_c^+(t) + \rho(q) \right] \pi(q,t)\, f(t)\, g(q)\, v_1(q)\, dq\, dt \\ &\quad + \int_{t_1}^{t_2} \int_{q \in Q} [h_c(F(t)) - l_c(F(t))]\, \pi(q,t)\, f(t)\, g(q)\, v_1(q)\, dq\, dt. \end{aligned} \]
Using integration by parts, we can simplify the second term as follows:
\[ \begin{aligned} \int_{t_1}^{t_2} \int_{q \in Q} [h_c(F(t)) - l_c(F(t))]\, \pi(q,t)\, f(t)\, g(q)\, v_1(q)\, dq\, dt &= \int_{t_1}^{t_2} [h_c(F(t)) - l_c(F(t))]\, P_\pi(t)\, dF(t) \\ &= [H_c(F(t)) - L_c(F(t))]\, P_\pi(t) \Big|_{t_1}^{t_2} - \int_{t_1}^{t_2} [H_c(F(t)) - L_c(F(t))]\, dP_\pi(t). \end{aligned} \]
Because $L_c$ is the "convex hull" of $H_c$, we have $L_c(0) = H_c(0)$ and $L_c(1) = H_c(1)$. Thus the first term above is simply $0$. Therefore, we have
\[ \begin{aligned} REV(\pi, p) &= \int_{t_1}^{t_2} \int_{q \in Q} \left[ \phi_c^+(t) + \rho(q) \right] \pi(q,t)\, f(t)\, g(q)\, v_1(q)\, dq\, dt \\ &\quad - \int_{t_1}^{t_2} [H_c(F(t)) - L_c(F(t))]\, dP_\pi(t) - c\, u(t_1) - (1-c)\, u(t_2). \end{aligned} \tag{23} \]
Now consider the mechanism $(\pi^*, p^*)$, which is feasible according to Lemma 8. This mechanism clearly maximizes the first term in Equation (23), as $\pi^*(q,t) = 1$ if and only if $\phi_c^+(t) + \rho(q) \ge 0$. We also have $u(t_1) = 0$ and $u(t_2) = \bar v(t_2)$ for this mechanism, as shown in the proof of Lemma 8. Constraint (17) requires $u(t_1) \ge 0$ and $u(t_2) \ge \bar v(t_2)$, which implies that this mechanism also optimizes the last two terms. As for the second term, note that $H_c(F(t)) - L_c(F(t)) \ge 0$ by definition, and $dP_\pi(t) \ge 0$ for any feasible mechanism. Thus the integral in the second term is always non-negative. We claim that with the mechanism $(\pi^*, p^*)$, this term achieves $0$. Clearly, the only interesting case is when $H_c(F(t)) - L_c(F(t)) > 0$. We will show that $dP_{\pi^*}(t) = 0$ in this case. Here $t$ must lie in an ironed interval $I$. Thus $L_c(z)$ is linear on the interval $I$, where $z = F(t)$. This implies that $\phi_c^+(t) = l_c(z) = L'_c(z)$ is constant. So
\[ P_{\pi^*}(t) = \int_{q \in Q} \pi^*(q,t)\, g(q)\, v_1(q)\, dq = \int_{q:\,\rho(q) \ge -\phi_c^+(t)} g(q)\, v_1(q)\, dq \]
is also constant on the interval $I$, which leads to $dP_{\pi^*}(t)$ being $0$.

Therefore, the mechanism $(\pi^*, p^*)$ optimizes all four terms in Equation (23) simultaneously, and thus is an optimal feasible mechanism.

References

[1] Itai Arieli and Yakov Babichenko. Private Bayesian persuasion.
Journal of Economic Theory, 182:185–217, 2019.
[2] Moshe Babaioff, Robert Kleinberg, and Renato Paes Leme. Optimal mechanisms for selling information. In Proceedings of the 13th ACM Conference on Electronic Commerce, EC '12, pages 92–109, New York, NY, USA, 2012. ACM.
[3] Dirk Bergemann and Alessandro Bonatti. Markets for information: An introduction. Annual Review of Economics, 11:85–107, 2019.
[4] Dirk Bergemann, Alessandro Bonatti, and Alex Smolin. The design and price of information. American Economic Review, 108(1):1–48, 2018.
[5] Dirk Bergemann and Stephen Morris. Bayes correlated equilibrium and the comparison of information structures in games. Theoretical Economics, 11(2):487–522, 2016.
[6] Dirk Bergemann and Stephen Morris. Information design: A unified perspective. Journal of Economic Literature, 57(1):44–95, 2019.
[7] Yang Cai and Grigoris Velegkas. How to sell information optimally: An algorithmic study. In James R. Lee, editor, volume 185 of LIPIcs, pages 81:1–81:20. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021.
[8] Ozan Candogan. Persuasion in networks: Public signals and k-cores. In Proceedings of the 2019 ACM Conference on Economics and Computation, pages 133–134, 2019.
[9] Ozan Candogan and Kimon Drakopoulos. Optimal signaling of content accuracy: Engagement vs. misinformation. Operations Research, 68(2):497–515, 2020.
[10] Yiling Chen, Haifeng Xu, and Shuran Zheng. Selling information through consulting. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 2412–2431. SIAM, 2020.
[11] Constantinos Daskalakis, Christos Papadimitriou, and Christos Tzamos. Does information revelation improve revenue? In Proceedings of the 2016 ACM Conference on Economics and Computation, EC '16, pages 233–250, New York, NY, USA, 2016. Association for Computing Machinery.
[12] Shaddin Dughmi. Algorithmic information structure design: a survey. ACM SIGecom Exchanges, 15(2):2–24, 2017.
[13] Yingni Guo and Eran Shmaya. The interval structure of optimal disclosure. Econometrica, 87(2):653–675, 2019.
[14] Emir Kamenica. Bayesian persuasion and information design. Annual Review of Economics, 11:249–272, 2019.
[15] Emir Kamenica and Matthew Gentzkow. Bayesian persuasion. American Economic Review, 101(6):2590–2615, 2011.
[16] Anton Kolotilin, Tymofiy Mylovanov, Andriy Zapechelnyuk, and Ming Li. Persuasion of a privately informed receiver. Econometrica, 85(6):1949–1964, 2017.
[17] Roger Myerson. Optimal auction design. Mathematics of Operations Research, 6(1):58–73, 1981.

APPENDIX

A Ironing

A.1 Formal Description of the Ironing Procedure

Definition 4 (Ironing [17]). Let $t$ be the buyer's type with CDF $F(t)$ and PDF $f(t)$, and let $\phi(t)$ be any function of the type $t$, called a virtual value function. The ironed function $\phi^+(t)$ can be obtained through the following process:

1. Let $z = F(t)$ be another random variable and define $h(z) = \phi(F^{-1}(z))$, where $F^{-1}$ is the inverse function of $F$.
2. Define $H : [0,1] \to \mathbb{R}$ to be the integral of $h$: $H(z) = \int_0^z h(r)\, \mathrm{d}r$.
3. Define $L : [0,1] \to \mathbb{R}$ to be the "convex hull" of the function $H$: $L(z) = \min_{z_1, z_2, \gamma} \{ \gamma H(z_1) + (1-\gamma) H(z_2) \}$, where $z_1, z_2, \gamma \in [0,1]$ and $\gamma z_1 + (1-\gamma) z_2 = z$.
4. Let $l(z)$ be the derivative of $L$: $l(z) = L'(z)$.
5. Obtain $\phi^+(t)$ by variable substitution: $\phi^+(t) = l(z) = l(F(t))$.

The above ironing trick is widely used in the literature. The process is illustrated in Figure 2. Myerson's original work [17] only considered ironing for the lower virtual value function $\underline{\phi}(t) = t - \frac{1 - F(t)}{f(t)}$. However, this procedure generalizes to any virtual value function; in our setting, $\phi(t)$ can be any of the three virtual values.
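The five steps of Definition 4 can be sketched numerically on a discretized type space. The Python snippet below is only an illustration under stated assumptions: types are replaced by a finite sorted grid with probability masses `probs`, the function name `iron` and its arguments are hypothetical, and the convex hull is computed with a standard monotone-chain scan rather than any method prescribed by the paper.

```python
def iron(phi_vals, probs):
    """Iron a discretized virtual value function (hypothetical helper).

    phi_vals[i] is the virtual value of the i-th grid type (types sorted
    increasingly) and probs[i] > 0 is its probability mass.
    Returns the ironed values, which are non-decreasing.
    """
    # Steps 1-2: tabulate H at the cumulative points z_k = F(t_k).
    z, H = [0.0], [0.0]
    for v, p in zip(phi_vals, probs):
        z.append(z[-1] + p)
        H.append(H[-1] + v * p)

    # Step 3: lower convex hull L of the points (z_k, H_k).
    hull = []  # indices of hull vertices
    for k in range(len(z)):
        while len(hull) >= 2:
            i, j = hull[-2], hull[-1]
            cross = (z[j] - z[i]) * (H[k] - H[i]) - (H[j] - H[i]) * (z[k] - z[i])
            if cross <= 0:   # vertex j lies on or above the chord from i to k
                hull.pop()
            else:
                break
        hull.append(k)

    # Steps 4-5: the ironed value on each grid cell is the slope of L there.
    ironed = []
    for i, j in zip(hull, hull[1:]):
        slope = (H[j] - H[i]) / (z[j] - z[i])
        ironed.extend([slope] * (j - i))
    return ironed


example = iron([1, 3, 2, 4], [0.25] * 4)
```

In this toy input the values 3 and 2 violate monotonicity, so they fall into one ironed interval and are replaced by their probability-weighted average; `example` evaluates to `[1.0, 2.5, 2.5, 4.0]`. An already monotone input is returned unchanged.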

Figure 2: The ironing process. (a) Ironing procedure: $h(z) = \phi(F^{-1}(z))$; $H(z) = \int_0^z h(r)\, \mathrm{d}r$; take the convex hull $L(z) \le H(z)$; $l(z) = L'(z)$; $\phi^+(t) = l(F(t))$. (b) Taking the convex hull.

A.2 Proof of Lemma 3: Monotonicity of Mixed Virtual Values

Lemma (Restatement of Lemma 3). Define the mixed virtual value $\phi_c(t) = c\,\underline{\phi}(t) + (1-c)\,\bar{\phi}(t)$. Then for any $0 \le c < c' \le 1$, we have $\phi_c^+(t) \ge \phi_{c'}^+(t)$ for any $t$. Moreover, $\phi_1^+(t) = \underline{\phi}^+(t) < t < \bar{\phi}^+(t) = \phi_0^+(t)$ for all $t \in (t_1, t_2)$.

Proof. We first prove the monotonicity of $\phi_c^+(t)$ in $c$. We work in the variable space $z = F(t) \in [0,1]$ rather than the original space of $t$. Let $l_c(z) = \phi_c^+(F^{-1}(z))$ and $h_c(z) = \phi_c(F^{-1}(z))$. Recall that the ironing procedure guarantees that $L_c(z) = \int_0^z l_c(r)\, \mathrm{d}r$ is the convex hull of $H_c(z) = \int_0^z h_c(r)\, \mathrm{d}r$.

Note that ironing the function $H_c(z)$ divides the compact variable space $[0,1]$ into a finite number of small intervals with breaking points $z_0 = 0, z_1, \cdots, z_k = 1$ such that, on any interval $[z_i, z_{i+1}]$, either (1) $H_c(z) = L_c(z)$ for any $z \in [z_i, z_{i+1}]$, or (2) $l_c(z)$ is a constant and $H_c(z) \ge L_c(z)$ for any $z \in [z_i, z_{i+1}]$. In the latter case, we call $[z_i, z_{i+1}]$ an ironing interval and say $H_c(z)$ is in the ironing state in this interval. We call $z_i$ the ironing starting point and $z_{i+1}$ the ironing ending point. Similarly, in the former case, we call $[z_i, z_{i+1}]$ an un-ironing interval and say $H_c(z)$ is in the un-ironing state in this interval. Note that in this case, $z_i$ is an ironing ending point and $z_{i+1}$ is an ironing starting point. In fact, in the sequence $z_0, z_1, \cdots, z_k = 1$, ironing starting and ending points show up alternately. The following are a few useful properties that will be needed.

1. If $z \ne 0, 1$ is an ironing starting or ending point, then $h_c(z) = l_c(z)$ and $H_c(z) = L_c(z)$.
2. If $[z_i, z_{i+1}]$ is an ironing interval for $H_c(z)$, then $l_c(z) = \frac{H_c(z_{i+1}) - H_c(z_i)}{z_{i+1} - z_i}$ for any $z \in [z_i, z_{i+1}]$.

3. For any $z \in [0,1]$, we have $h_c(z) = t + \frac{F(t)}{f(t)} - \frac{c}{f(t)} \ge t + \frac{F(t)}{f(t)} - \frac{c'}{f(t)} = h_{c'}(z)$, where $t = F^{-1}(z)$.
4. Due to Property (3) above, we have $H_c(z') - H_c(z) \ge H_{c'}(z') - H_{c'}(z)$ for any $z' > z$. Moreover, $H_c(0) = H_{c'}(0) = 0$.

Similarly, we can also obtain a sequence of ironing starting and ending points for the function $H_{c'}(z)$. Let us merge all the ironing starting and ending points of $H_c(z)$ and $H_{c'}(z)$ together, and list them in order as $z_0, z_1, \cdots, z_k = 1$. Notably, within any interval $[z_i, z_{i+1}]$, each of the functions $H_c(z)$ and $H_{c'}(z)$ can only be in a single state, either the ironing state or the un-ironing state.

We first prove $l_c(0) \ge l_{c'}(0)$. This follows from a case analysis on whether 0 is an ironing ending or starting point for $H_c$.

• If 0 is an ironing ending point for both $H_c$ and $H_{c'}$, meaning both functions are not in the ironing state at 0 and its neighborhood, we know $l_c(0) = h_c(0) \ge h_{c'}(0) = l_{c'}(0)$, as desired.
• If 0 is an ironing ending point for $H_c$ but an ironing starting point for $H_{c'}$, then $H_c$ is in the un-ironing state at 0 and its neighborhood whereas $H_{c'}$ is in the ironing state. Then we have $l_c(0) = h_c(0) \ge h_{c'}(0) \ge l_{c'}(0)$, as desired.
• If 0 is an ironing starting point for $H_c$ (it does not matter whether it is an ironing ending or starting point for $H_{c'}$), then $H_c$ is in the ironing state at 0 and its neighborhood. Let $\bar{z} \ge z_1$ be the immediate next ironing ending point for $H_c$. Suppose, for the sake of contradiction, that $l_c(0) < l_{c'}(0)$.
We thus have

$$
\begin{aligned}
H_c(\bar{z}) - H_c(0) &= l_c(0) \cdot (\bar{z} - 0) && \text{by Property (2) above} \\
&< l_{c'}(0) \cdot (\bar{z} - 0) && \text{by assumption} \\
&\le L_{c'}(\bar{z}) - L_{c'}(0) && \text{by convexity of } L_{c'} \\
&\le H_{c'}(\bar{z}) - H_{c'}(0) && L_{c'}(0) = H_{c'}(0),\ L_{c'}(\bar{z}) \le H_{c'}(\bar{z})
\end{aligned}
$$

This contradicts Property (4) above. Therefore, we must have $l_c(0) \ge l_{c'}(0)$, as desired.

Next, we prove that for any $i = 0, \cdots, k-1$ and any interval $[z_i, z_{i+1}]$ (more conveniently denoted as $[a,b]$, with lower bound $a$ and upper bound $b$), we have $l_c(z) \ge l_{c'}(z)$ for any $z \in [a,b]$.

Our proof uses an induction argument over the intervals indexed by $i$. Specifically, supposing we already have $l_c(a) \ge l_{c'}(a)$, we show $l_c(z) \ge l_{c'}(z)$ for any $z \in [a,b]$. This, together with the base case $a = 0$ proved above, proves the monotonicity of $l_c(z)$ in $c$.

The proof uses a case analysis on whether the ending point $b$ of the interval $[a,b]$ is an ironing starting point or ending point for $H_c$ or for $H_{c'}$. There are four cases, because we do not know whether $b$ is an ironing point of $H_c$ or of $H_{c'}$ and thus have to consider both possibilities. Here, we use the crucial property that each of $H_c$ and $H_{c'}$ stays in a single state, i.e., the ironing or the un-ironing state, within $[a,b]$ due to our choice of $a, b$.

• If $b$ is an ironing ending point for the function $H_{c'}$, we have for any $z \in [a,b]$

$$
\begin{aligned}
l_c(z) &\ge l_c(a) && \text{by convexity of } L_c \\
&\ge l_{c'}(a) && \text{by induction hypothesis} \\
&= l_{c'}(z) && H_{c'} \text{ is in the ironing state in } [a,b]
\end{aligned}
$$

• If $b$ is an ironing starting point for the function $H_{c'}$, then $H_{c'}$ is in the un-ironing state within $[a,b]$.
If $H_c$ is also in the un-ironing state within $[a,b]$, then we have $l_c(z) = h_c(z) \ge h_{c'}(z) = l_{c'}(z)$, as desired. Thus, we now consider the case that $H_c$ is in the ironing state within $[a,b]$. Let $\bar{z} \ge b$ be the immediate next ironing ending point for $H_c$. Suppose, for the sake of contradiction, that $l_c(z) < l_{c'}(z)$ for some $z \in [a,b]$. Since $H_c$ is in the ironing state within $[a, \bar{z}]$, we know that $l_c(r) = l_c(z) < l_{c'}(z) \le l_{c'}(r)$ for any $r \in [b, \bar{z}]$, since $l_{c'}$ is monotone non-decreasing. Therefore, we have

$$
\begin{aligned}
H_c(\bar{z}) - H_c(b) &\le L_c(\bar{z}) - L_c(b) && L_c(\bar{z}) = H_c(\bar{z}),\ L_c(b) \le H_c(b) \\
&\le l_c(b) \cdot (\bar{z} - b) && H_c \text{ is in the ironing state in } [a, \bar{z}] \\
&< l_{c'}(b) \cdot (\bar{z} - b) && \text{by assumption} \\
&\le L_{c'}(\bar{z}) - L_{c'}(b) && \text{by convexity of } L_{c'} \\
&\le H_{c'}(\bar{z}) - H_{c'}(b) && b \text{ is an ironing starting point for } H_{c'}
\end{aligned}
$$

This contradicts Property (4) above. Therefore, we must have $l_c(z) \ge l_{c'}(z)$ for all $z \in [a,b]$, as desired.

Note that one corner case for this situation is when $\bar{z}$ happens to equal $b$, i.e., $b$ is both the ironing starting point of $H_{c'}$ and the ironing ending point of $H_c$. Our argument above does not apply to this corner case since the strict "$<$" above becomes "$=$". However, this corner case can be proved via a simpler argument: for all $z \in [a,b]$, $l_c(z) = l_c(b) = h_c(b) \ge h_{c'}(b) \ge h_{c'}(z) = l_{c'}(z)$, where the second equality is due to the fact that $b$ is an ironing ending point of $H_c$ and the last equality is due to the fact that $H_{c'}$ is in the un-ironing state within $[a,b]$.

• If $b$ is an ironing starting point for the function $H_c$, then $H_c$ is in the un-ironing state within $[a,b]$.
If $H_{c'}$ is also in the un-ironing state within $[a,b]$, then we have $l_c(z) = h_c(z) \ge h_{c'}(z) = l_{c'}(z)$, as desired. If $H_{c'}$ is in the ironing state within $[a,b]$, then we have $l_c(z) \ge l_c(a) \ge l_{c'}(a) = l_{c'}(z)$, as desired.

• Finally, if $b$ is an ironing ending point for the function $H_c$, then $H_c$ is in the ironing state within $[a,b]$. If $H_{c'}$ is also in the ironing state within $[a,b]$, then we have $l_c(z) = l_c(a) \ge l_{c'}(a) = l_{c'}(z)$, as desired. If $H_{c'}$ is in the un-ironing state within $[a,b]$, then we have $l_c(z) = l_c(b) = h_c(b) \ge h_{c'}(b) \ge l_{c'}(z)$, where (1) the first equality holds because $H_c$ is in the ironing state within $[a,b]$; (2) the second equality holds because $b$ is an ironing ending point for $H_c$; and (3) the last inequality holds because $H_{c'}$ is in the un-ironing state within $[a,b]$, and thus $h_{c'}(z) = l_{c'}(z)$ is monotone non-decreasing in $z$.

Finally, we prove $\phi_1^+(t) = \underline{\phi}^+(t) < t < \bar{\phi}^+(t) = \phi_0^+(t)$ for all $t \in (t_1, t_2)$. We only show $\underline{\phi}^+(t) \le t$ for all $t$, as the other part follows from similar arguments.

Let $H$ and $L$ be the corresponding functions defined in Definition 4 when ironing the lower virtual value function $\underline{\phi}(t)$. Define an ironed interval $I = (a,b) \subset [0,1]$ to be an interval with $H(a) = L(a)$ and $H(b) = L(b)$ but $H(z) > L(z)$ for all $z \in I$. Thus the interval $[0,1]$ can be partitioned into a set of disjoint ironed and non-ironed intervals.

For any $t$, if the corresponding $z = F(t)$ falls into a non-ironed interval, then we have $H(z^*) = L(z^*)$ for all $z^*$ in the same interval. So $\underline{\phi}^+(t) = L'(z) = H'(z) = \underline{\phi}(t)$. Since $\underline{\phi}(t) \le t$, we have $\underline{\phi}^+(t) \le t$. Therefore the only interesting case is when the corresponding $z = F(t)$ falls into an ironed interval $I = (a,b)$.
In this case, the function $L$ is linear in $I$, so $l(z) = L'(z) = L'(a) = l(a)$. But $H'(a) > L'(a)$ since $I$ is an ironed interval and $H(a + \epsilon) > L(a + \epsilon)$ for any arbitrarily small $\epsilon > 0$. Therefore, $\underline{\phi}^+(t) = L'(z) = L'(a) < H'(a) = \underline{\phi}(F^{-1}(a)) \le F^{-1}(a) < F^{-1}(z) = t$. Thus, we can conclude $\underline{\phi}^+(t) \le t$ for all $t$.

B Proof of Technical Lemma 2

Lemma (Non-Negative Payments) [Restatement of Lemma 2]. There exists an optimal IC and IR mechanism in which $p_t \ge 0$ for all $t \in T$.

Proof. Let $(\pi, p)$ be any optimal IC and IR mechanism. We construct a different mechanism $(\pi^*, p^*)$ which satisfies the same constraints and remains optimal, but with $p^*_t \ge 0$ for any $t$. For convenience, we divide buyer types into two sets: $T^+ = \{ t \in T : p_t \ge 0 \}$ is the set of types who have non-negative payments in mechanism $(\pi, p)$, and $T^- = T \setminus T^+$ is the set of types who have negative payments.

The mechanism $(\pi^*, p^*)$ is constructed from $(\pi, p)$ as follows:

1. The mechanism for any $t \in T^+$ remains the same: for any $t \in T^+$, let $p^*_t = p_t$ and $\pi^*_t = \pi_t$ for all $q \in Q$;
2. The mechanism for any $t \in T^-$ becomes no information and no payment: for any $t \in T^-$, let $p^*_t = 0$, and let $\pi^*_t$ be the mechanism that reveals no information (e.g., sending a single signal).

We observe that the constructed mechanism $(\pi^*, p^*)$ has three useful properties: (1) it yields revenue at least that of $(\pi, p)$ by construction; (2) all buyer types' payments are now non-negative; (3) the individual rationality constraint is satisfied for every buyer type. The third property follows from the construction: the utility of any buyer type $t \in T^+$ did not change, and a type $t \in T^-$ now pays 0 and receives no information, so the IR constraint is always satisfied.

However, the major issue with the constructed mechanism $(\pi^*, p^*)$ is that it may not be incentive compatible, i.e., buyer type $t$ may want to misreport $t'$. We first observe that the IC constraint for any $t \in T^+$ remains satisfied. First of all, any type $t \in T^+$ has no incentive to deviate to another type $t' \in T^+$, due to the original IC constraint of $(\pi, p)$ and the fact that the mechanism for types in $T^+$ remains the same. We claim that any type $t \in T^+$ has no incentive to deviate to a type $t' \in T^-$ either. This is because the information for $t' \in T^-$ is less (since Seller now reveals no information) and the payment is more (since $p^*_{t'} = 0 > p_{t'}$). Therefore, if in mechanism $(\pi, p)$ buyer type $t$ has no incentive to deviate to $t'$, this remains true for $(\pi^*, p^*)$.

However, a buyer type $t \in T^-$ may indeed have an incentive to deviate to some type $t' \in T^+$ now, since they may want to receive beneficial information under some amount of payment. Here comes the last step of our construction: adjusting the above $(\pi^*, p^*)$ so that every type $t \in T^-$ also satisfies IC, without decreasing the revenue or violating the IR and obedience constraints. To do so, for any $t \in T^-$, let $t' \in T^+$ be the most profitable deviation of type $t$, i.e., the deviation that maximizes type $t$'s utility. We adjust $(\pi^*, p^*)$ simply by adopting the scheme of type $t'$ for type $t$, i.e., resetting $\pi^*_t = \pi_{t'}$ and $p^*_t = p_{t'}$. After this adjustment, the IC constraint for any type $t \in T^-$ is satisfied by construction, because each of these types indeed has their most profitable mechanism. Meanwhile, this also maintains the IC constraint for any type $t \in T^+$, since the adjustment did not add more menus. Note that the IR constraint remains satisfied since the utility of any type $t \in T^+$ is non-decreasing in this adjustment. The revenue did not decrease, as the payment $p^*_t$ did not decrease in our adjustment for any $t \in T^+$. The only non-obvious part to verify is the obedience constraint. Indeed, the obedience constraint may be violated for a type $t \in T^-$ during this adjustment, since the recommended optimal action for $t' \in T^+$ might not be optimal for $t$. To achieve obedience, we simply "rename" the recommended action for $t$ to indeed be his optimal action. This restores the obedience constraint for $t$. Note that this will either not change the revealed information or lead to less revealed information (when type $t$'s optimal actions are the same under $\pi(\cdot, t')$), and thus will not hurt the IC constraints.

C Characterization of Feasible Mechanisms: Proof of Lemma 4

In this appendix section, we show that the conditions in Lemma 4 are also necessary for any feasible mechanism. We start by analyzing the IC constraints. First, Constraint (11) can be re-arranged as follows:

$$\int_{q \in Q} [\pi(q,t) - \pi(q,t')] \cdot g(q)\, v(q,t)\, \mathrm{d}q \ge p(t) - p(t').$$

Therefore, the IC constraint implies the following two inequalities for any two types $t, t'$:

$$\int_{q \in Q} [\pi(q,t) - \pi(q,t')] \cdot g(q)\, v(q,t)\, \mathrm{d}q \ge p(t) - p(t'), \quad (24)$$

$$\int_{q \in Q} [\pi(q,t') - \pi(q,t)] \cdot g(q)\, v(q,t')\, \mathrm{d}q \ge p(t') - p(t). \quad (25)$$

Combining Inequalities (24) and (25), we obtain the following constraint for any pair of types $t, t'$:

$$\int_{q \in Q} [\pi(q,t') - \pi(q,t)] \cdot g(q)\, v(q,t)\, \mathrm{d}q \le p(t') - p(t) \le \int_{q \in Q} [\pi(q,t') - \pi(q,t)] \cdot g(q)\, v(q,t')\, \mathrm{d}q.$$

Therefore, the right-hand side of the above inequality must be at least its left-hand side. This implies the following necessary condition for any IC information selling mechanism $(\pi, p)$: for any $t, t' \in T$, we have

$$0 \le \int_{q \in Q} [\pi(q,t') - \pi(q,t)] \cdot g(q)\, [v(q,t') - v(q,t)]\, \mathrm{d}q = [t' - t] \int_{q \in Q} [\pi(q,t') - \pi(q,t)] \cdot g(q)\, v_1(q)\, \mathrm{d}q. \quad (26)$$

Recall the definition of $P_\pi(t)$ in (14):

$$P_\pi(t) = \int_{q \in Q} \pi(q,t) \cdot g(q)\, v_1(q)\, \mathrm{d}q.$$

Note that $P_\pi(t)$ can be interpreted as the expected weighted probability of being recommended the active action, where the weights are $v_1(q)$. A simple case analysis for $t' > t$ and $t' < t$ shows that Inequality (26) is equivalent to $P_\pi(t)$ being monotone non-decreasing in $t$. We thus term this the signaling monotonicity. This is analogous to Myerson's allocation monotonicity condition in auction design, but is different. Specifically, in Myerson's optimal auction, the value of an item directly depends on the buyer type $t$ with no weight associated with it. In information selling, the value of taking the active action depends on the utility coefficient $v_1(q)$.

We now derive a relation between the signaling scheme $\pi$ and the payment rule $p$ for any IC mechanism. We start by analyzing Buyer's utility. Note that any buyer of type $t$ derives non-zero utility only from the active action recommendation (i.e., action 1), since the passive action always leads to buyer utility 0. Therefore, as defined in (5), Buyer of type $t$ has the following utility:

$$\text{Utility of Buyer type } t: \quad u(t) = \int_{q \in Q} [g(q)\, \pi(q,t)\, v(q,t)]\, \mathrm{d}q - p(t).$$

Re-arranging Inequality (24), we have

$$
\begin{aligned}
u(t) &= \int_{q \in Q} [g(q)\, \pi(q,t)\, v(q,t)]\, \mathrm{d}q - p(t) \\
&\ge \int_{q \in Q} [g(q)\, \pi(q,t')\, v(q,t)]\, \mathrm{d}q - p(t') && \text{by Ineq. (24)} \\
&= \int_{q \in Q} [g(q)\, \pi(q,t')\, v(q,t)]\, \mathrm{d}q + u(t') - \int_{q \in Q} [g(q)\, \pi(q,t')\, v(q,t')]\, \mathrm{d}q \\
&= \int_{q \in Q} \big[ g(q)\, \pi(q,t')\, [v(q,t) - v(q,t')] \big]\, \mathrm{d}q + u(t') && \text{algebraic manipulation} \\
&= (t - t')\, P_\pi(t') + u(t') && \text{def. of } P_\pi(t) \text{ and } v(q,t)
\end{aligned}
$$

As a result, Inequality (24) implies $u(t) - u(t') \ge (t - t')\, P_\pi(t')$. Together with a similar derivation from Inequality (25), we have the following inequality:

$$(t - t')\, P_\pi(t') \le u(t) - u(t') \le (t - t')\, P_\pi(t).$$

Note that the above inequality holds for any $t, t'$. Therefore, by letting $t' \to t$ and invoking the monotonicity of $P_\pi(t)$, we can integrate the above inequality from $t_1$ to $t$ and obtain the inequalities:

$$\int_{t_1}^{t} P_\pi(x)\, \mathrm{d}x \le u(t) - u(t_1) \le \int_{t_1}^{t} P_\pi(x)\, \mathrm{d}x.$$

This implies the following utility identity:

$$u(t) = u(t_1) + \int_{t_1}^{t} P_\pi(x)\, \mathrm{d}x.$$

Note that both the signaling monotonicity and the utility identity above are necessary consequences of the incentive compatibility constraints, more precisely, of Constraints (24) and (25).
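As a rough numerical illustration of the two conditions derived above (signaling monotonicity and the utility identity), the Python sketch below builds a small discrete example and checks incentive compatibility on a grid of types. Everything specific in it is an assumption made for illustration and not taken from the paper: the three states, the weights `w` standing in for $g(q) v_1(q)$, types uniform on $[0, 2]$ (so the lower virtual value is $2t - 2$ and needs no ironing), and all helper names. It checks IC only, not optimality.

```python
# Hypothetical example: each state is (w, rho), with w standing in for g(q) * v1(q) >= 0.
STATES = [(0.2, -3.0), (0.5, -2.5), (0.3, -1.0)]
T1, T2 = 0.0, 2.0  # assumed type interval [t1, t2]


def phi(t):
    """Lower virtual value for types uniform on [0, 2]: t - (1 - F(t)) / f(t) = 2t - 2."""
    return 2.0 * t - 2.0


def recommend(rho, t):
    """Threshold rule pi(q, t): recommend the active action iff rho(q) >= -phi(t)."""
    return 1.0 if rho >= -phi(t) else 0.0


def P(t):
    """P_pi(t): sum over states of pi(q, t) * g(q) * v1(q)."""
    return sum(w * recommend(rho, t) for w, rho in STATES)


def utility(t):
    """Utility identity with u(t1) = 0: u(t) = integral of P_pi from t1 to t.

    P_pi is a step function that switches state q on at type 1 - rho / 2,
    so the integral can be written down exactly.
    """
    return sum(w * max(0.0, t - max(T1, 1.0 - rho / 2.0)) for w, rho in STATES)


def payment(t):
    """Payment implied by the utility identity: expected gain minus u(t)."""
    return sum(w * recommend(rho, t) * (t + rho) for w, rho in STATES) - utility(t)


def report_utility(t, s):
    """Utility of true type t who reports s and follows the recommendation."""
    gain = sum(w * recommend(rho, s) * (t + rho) for w, rho in STATES)
    return gain - payment(s)


GRID = [T1 + (T2 - T1) * i / 40 for i in range(41)]
monotone = all(P(a) <= P(b) for a, b in zip(GRID, GRID[1:]))
ic_holds = all(utility(t) >= report_utility(t, s) - 1e-12 for t in GRID for s in GRID)
```

Because `P` is non-decreasing, `utility` is convex with `P(s)` as a slope at `s`, which is exactly why no misreport `s` beats truthful reporting in this toy instance.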

D Omitted Proofs in Section 4.2

D.1 Proof of Technical Lemma 6

Lemma (Restatement of Lemma 6). Define

$$V'_L = -\int_{t_1}^{t_2} \int_{q:\, \rho(q) \le -\underline{\phi}^+(x)} g(q)\, v_1(q)\, \mathrm{d}q\, \mathrm{d}x \quad \text{and} \quad V'_H = -\int_{t_1}^{t_2} \int_{q:\, \rho(q) \le -\bar{\phi}^+(x)} g(q)\, v_1(q)\, \mathrm{d}q\, \mathrm{d}x.$$

Then the three conditions $\bar{v}(t_2) \le V_L$, $\bar{v}(t_2) \ge V_H$, and $V_L < \bar{v}(t_2) < V_H$ are equivalent to $\bar{v}(t_1) \le V'_L$, $\bar{v}(t_1) \ge V'_H$, and $V'_L < \bar{v}(t_1) < V'_H$, respectively.

Proof. We will only show that $\bar{v}(t_2) \le V_L$ is equivalent to $\bar{v}(t_1) \le V'_L$, as the other two cases follow from similar arguments.

By definition, we have

$$\bar{v}(t_2) = \int_{q \in Q} g(q)\, v_1(q)\, [t_2 + \rho(q)]\, \mathrm{d}q = \bar{v}(t_1) + (t_2 - t_1) \int_{q \in Q} g(q)\, v_1(q)\, \mathrm{d}q.$$

Thus $\bar{v}(t_2) \le V_L$ can be written as:

$$\bar{v}(t_1) + (t_2 - t_1) \int_{q \in Q} g(q)\, v_1(q)\, \mathrm{d}q \le \max\{\bar{v}(t_1), 0\} + \int_{t_1}^{t_2} \int_{q:\, \rho(q) \ge -\underline{\phi}^+(x)} g(q)\, v_1(q)\, \mathrm{d}q\, \mathrm{d}x.$$

Re-arranging yields:

$$\bar{v}(t_1) - \max\{\bar{v}(t_1), 0\} \le \int_{t_1}^{t_2} \int_{q:\, \rho(q) \ge -\underline{\phi}^+(x)} g(q)\, v_1(q)\, \mathrm{d}q\, \mathrm{d}x - (t_2 - t_1) \int_{q \in Q} g(q)\, v_1(q)\, \mathrm{d}q,$$

which is equivalent to:

$$\min\{\bar{v}(t_1), 0\} \le -\int_{t_1}^{t_2} \int_{q:\, \rho(q) \le -\underline{\phi}^+(x)} g(q)\, v_1(q)\, \mathrm{d}q\, \mathrm{d}x = V'_L.$$

Note that the right-hand side is always non-positive. So the left-hand side has to be $\bar{v}(t_1)$. Thus the condition $\bar{v}(t_2) \le V_L$ is equivalent to $\bar{v}(t_1) \le V'_L$, and also implies that $\bar{v}(t_1) \le 0$.

D.2 Proof of Technical Lemma 7

Lemma (Restatement of Lemma 7). If $V_L < \bar{v}(t_2) < V_H$, there exists a constant $c \in (0,1)$ that satisfies Equation (20).

Proof. Lemma 6 implies that the condition $\bar{v}(t_2) < V_H$ is equivalent to the following:

$$\bar{v}(t_1) < -\int_{t_1}^{t_2} \int_{q:\, \rho(q) \le -\bar{\phi}^+(t)} g(q)\, v_1(q)\, \mathrm{d}q\, \mathrm{d}t. \quad (27)$$

The right-hand side of the above inequality is clearly non-positive. Thus $\bar{v}(t_1) \le 0$ and $\max\{\bar{v}(t_1), 0\} = 0$. The condition $V_L < \bar{v}(t_2) < V_H$ can then be written as:

$$\int_{t_1}^{t_2} \int_{q:\, \rho(q) \ge -\underline{\phi}^+(t)} g(q)\, v_1(q)\, \mathrm{d}q\, \mathrm{d}t < \bar{v}(t_2) < \int_{t_1}^{t_2} \int_{q:\, \rho(q) \ge -\bar{\phi}^+(t)} g(q)\, v_1(q)\, \mathrm{d}q\, \mathrm{d}t.$$

When $c = 1$, we have $-\underline{\phi}^+(t) = -\phi_c^+(t)$ by Lemma 3, and

$$\int_{t_1}^{t_2} \int_{q:\, \rho(q) \ge -\phi_c^+(t)} g(q)\, v_1(q)\, \mathrm{d}q\, \mathrm{d}t = \int_{t_1}^{t_2} \int_{q:\, \rho(q) \ge -\underline{\phi}^+(t)} g(q)\, v_1(q)\, \mathrm{d}q\, \mathrm{d}t < \bar{v}(t_2).$$

When $c = 0$, we have $-\bar{\phi}^+(t) = -\phi_c^+(t)$, and

$$\int_{t_1}^{t_2} \int_{q:\, \rho(q) \ge -\phi_c^+(t)} g(q)\, v_1(q)\, \mathrm{d}q\, \mathrm{d}t = \int_{t_1}^{t_2} \int_{q:\, \rho(q) \ge -\bar{\phi}^+(t)} g(q)\, v_1(q)\, \mathrm{d}q\, \mathrm{d}t > \bar{v}(t_2).$$

Since the distribution of $\rho$ does not contain a point mass, $\int_{t_1}^{t_2} \int_{q:\, \rho(q) \ge -\phi_c^+(t)} g(q)\, v_1(q)\, \mathrm{d}q\, \mathrm{d}t$ is continuous in $c$. Thus there must exist $c \in (0,1)$ that satisfies Equation (20).

E Optimal Mechanism for Case 1 ($\bar{v}(t_2) \le V_L$)

In this section, we derive the optimal mechanism for the first case of Theorem 1. Similar to Section 4.2, we first prove that our mechanism is feasible. Then we show that it achieves the optimal revenue among all feasible mechanisms.

Lemma 9. The threshold mechanism $(\pi^*, p^*)$ defined according to $\underline{\phi}^+(t)$ is feasible.

Proof. Using the characterization of Lemma 4, it suffices to show that the given mechanism satisfies Constraints (15)-(18). Since the ironed lower virtual value function $\underline{\phi}^+(t)$ is monotone non-decreasing, we know that the threshold $\theta^*(t) = -\underline{\phi}^+(t)$ is monotone non-increasing in $t$. This implies that

$$P_{\pi^*}(t) = \int_{q \in Q} \pi^*(q,t)\, g(q)\, v_1(q)\, \mathrm{d}q = \int_{q:\, \rho(q) \ge -\underline{\phi}^+(t)} g(q)\, v_1(q)\, \mathrm{d}q$$

is monotone non-decreasing in $t$, since a larger $t$ leads to a smaller integral lower bound, satisfying Constraint (15).

The utility function is, by definition,

$$u(t) = \int_{q \in Q} [g(q)\, \pi^*(q,t)\, v(q,t)]\, \mathrm{d}q - p^*(t) = \int_{t_1}^{t} P_{\pi^*}(x)\, \mathrm{d}x,$$

which implies $u(t_1) = 0$ and $u(t) = u(t_1) + \int_{t_1}^{t} P_{\pi^*}(x)\, \mathrm{d}x$, satisfying Constraint (16).

For Constraint (17), we already have $u(t_1) = 0$. Now we prove $u(t_2) \ge \bar{v}(t_2)$. Lemma 6 shows that the condition $\bar{v}(t_2) \le V_L$ is equivalent to $\bar{v}(t_1) \le V'_L$. Also, it is easy to see that $V'_L \le 0$, which implies $\bar{v}(t_1) \le 0$. So $\max\{\bar{v}(t_1), 0\} = 0$, and

$$
\begin{aligned}
u(t_2) = \int_{t_1}^{t_2} P_{\pi^*}(x)\, \mathrm{d}x &= \int_{t_1}^{t_2} \int_{q:\, \rho(q) \ge -\underline{\phi}^+(x)} g(q)\, v_1(q)\, \mathrm{d}q\, \mathrm{d}x \\
&= \int_{t_1}^{t_2} \int_{q:\, \rho(q) \ge -\underline{\phi}^+(x)} g(q)\, v_1(q)\, \mathrm{d}q\, \mathrm{d}x + \max\{0, \bar{v}(t_1)\} \\
&= V_L \ge \bar{v}(t_2).
\end{aligned}
$$

Finally, we argue that the payment is non-negative, i.e., Constraint (18) is satisfied. By Lemma 3, we have $\underline{\phi}^+(t) \le t$ for all $t \in T$. Let $t'$ be any number in the interval $[\underline{\phi}^+(t), t]$. Then

$$
\begin{aligned}
p^*(t) - p^*(t') &= \int_{q \in Q} [g(q)\, \pi^*(q,t)\, v(q,t)]\, \mathrm{d}q - \int_{q \in Q} [\pi^*(q,t')\, g(q)\, v(q,t')]\, \mathrm{d}q - \int_{t'}^{t} P_{\pi^*}(x)\, \mathrm{d}x \\
&= \int_{q:\, \rho(q) \ge -\underline{\phi}^+(t)} g(q)\, v(q,t)\, \mathrm{d}q - \int_{q:\, \rho(q) \ge -\underline{\phi}^+(t')} g(q)\, v(q,t')\, \mathrm{d}q - \int_{t'}^{t} P_{\pi^*}(x)\, \mathrm{d}x.
\end{aligned}
$$

When $\rho(q) \ge -\underline{\phi}^+(t)$, we have $v(q,t') = v_1(q)[t' + \rho(q)] \ge v_1(q)[t' - \underline{\phi}^+(t)] \ge 0$, where the last inequality is because of the choice of $t'$. So the second term in the above equation satisfies:

$$
\begin{aligned}
\int_{q:\, \rho(q) \ge -\underline{\phi}^+(t')} g(q)\, v(q,t')\, \mathrm{d}q &= \int_{q:\, \rho(q) \ge -\underline{\phi}^+(t)} g(q)\, v(q,t')\, \mathrm{d}q - \int_{q:\, -\underline{\phi}^+(t) \le \rho(q) < -\underline{\phi}^+(t')} g(q)\, v(q,t')\, \mathrm{d}q \\
&\le \int_{q:\, \rho(q) \ge -\underline{\phi}^+(t)} g(q)\, v(q,t')\, \mathrm{d}q.
\end{aligned}
$$

Therefore,

$$
\begin{aligned}
p^*(t) - p^*(t') &\ge \int_{q:\, \rho(q) \ge -\underline{\phi}^+(t)} g(q)\, v(q,t)\, \mathrm{d}q - \int_{q:\, \rho(q) \ge -\underline{\phi}^+(t)} g(q)\, v(q,t')\, \mathrm{d}q - \int_{t'}^{t} P_{\pi^*}(x)\, \mathrm{d}x \\
&= \int_{q:\, \rho(q) \ge -\underline{\phi}^+(t)} g(q)\, v_1(q)\, (t - t')\, \mathrm{d}q - \int_{t'}^{t} P_{\pi^*}(x)\, \mathrm{d}x \\
&= (t - t')\, P_{\pi^*}(t) - \int_{t'}^{t} P_{\pi^*}(x)\, \mathrm{d}x \ge 0,
\end{aligned}
$$

where the last inequality is due to the monotonicity of $P_{\pi^*}(t)$.

Therefore, the payment function $p^*(t)$ is monotone non-decreasing in the interval $[\underline{\phi}^+(t), t]$. Since the set of intervals $\{ [\underline{\phi}^+(t), t] \mid t \in T \}$ covers the interval $T$, we conclude that $p^*(t)$ is monotone non-decreasing in $T$. Therefore, to prove that $p^*(t) \ge 0$ for all $t \in T$, it suffices to show that $p^*(t_1) \ge 0$. Indeed, we have

$$p^*(t_1) = \int_{q \in Q} \pi^*(q,t_1)\, g(q)\, v(q,t_1)\, \mathrm{d}q - u(t_1) = \int_{q:\, \rho(q) \ge -\underline{\phi}^+(t_1)} g(q)\, v(q,t_1)\, \mathrm{d}q \ge 0.$$

The inequality holds because when $\rho(q) \ge -\underline{\phi}^+(t_1) \ge -t_1$, we get $v(q,t_1) = v_1(q)(t_1 + \rho(q)) \ge 0$.

Now we prove that the mechanism defined according to $\underline{\phi}^+(t)$ is optimal, i.e., achieves the maximum possible revenue among all feasible mechanisms.

Lemma 10. If $\bar{v}(t_2) \le V_L$, the threshold mechanism with threshold signaling function $\theta^*(t) = -\underline{\phi}^+(t)$ and the following payment function represents an optimal mechanism:

$$p^*(t) = \int_{q \in Q} \pi^*(q,t)\, g(q)\, v(q,t)\, \mathrm{d}q - \int_{t_1}^{t} \int_{q \in Q} \pi^*(q,x)\, g(q)\, v_1(q)\, \mathrm{d}q\, \mathrm{d}x,$$

where $\pi^*$ is determined by $\theta^*(t)$ as in Definition 1.

Proof. According to the proof of Lemma 1, the revenue of any feasible mechanism can be written as:

$$REV(\pi, p) = \int_{q \in Q} g(q) \left[ \int_{t_1}^{t_2} f(t)\, \pi(q,t)\, v_1(q)\, [\underline{\phi}(t) + \rho(q)]\, \mathrm{d}t \right] \mathrm{d}q - u(t_1).$$

Let $H(\cdot)$, $h(\cdot)$, $L(\cdot)$, and $l(\cdot)$ be the corresponding functions when ironing the virtual value $\underline{\phi}(t)$. We can write the first term of the revenue function as follows:

$$
\begin{aligned}
&\int_{q \in Q} \int_{t_1}^{t_2} [\underline{\phi}(t) + \rho(q)]\, f(t)\, \pi(q,t)\, g(q)\, v_1(q)\, \mathrm{d}t\, \mathrm{d}q \\
&\quad = \int_{q \in Q} \int_{t_1}^{t_2} [\underline{\phi}^+(t) + \rho(q)]\, f(t)\, \pi(q,t)\, g(q)\, v_1(q)\, \mathrm{d}t\, \mathrm{d}q + \int_{q \in Q} \int_{t_1}^{t_2} [h(F(t)) - l(F(t))]\, f(t)\, \pi(q,t)\, g(q)\, v_1(q)\, \mathrm{d}t\, \mathrm{d}q,
\end{aligned}
$$

where $\underline{\phi}^+(t) = l(F(t))$ and $\underline{\phi}(t) = h(F(t))$. Using integration by parts, we can simplify the second term:

$$
\begin{aligned}
\int_{q \in Q} \int_{t_1}^{t_2} [h(F(t)) - l(F(t))]\, f(t)\, \pi(q,t)\, g(q)\, v_1(q)\, \mathrm{d}t\, \mathrm{d}q &= \int_{t_1}^{t_2} [h(F(t)) - l(F(t))]\, P_\pi(t)\, \mathrm{d}F(t) \\
&= [H(F(t)) - L(F(t))]\, P_\pi(t) \Big|_{t_1}^{t_2} - \int_{t_1}^{t_2} [H(F(t)) - L(F(t))]\, \mathrm{d}P_\pi(t).
\end{aligned}
$$

Because $L$ is the "convex hull" of $H$ on $[0,1]$, we have $L(0) = H(0)$ and $L(1) = H(1)$. Thus the term $[H(F(t)) - L(F(t))]\, P_\pi(t) \big|_{t_1}^{t_2}$ is simply 0, and we have

$$REV(\pi, p) = \int_{q \in Q} \int_{t_1}^{t_2} [\underline{\phi}^+(t) + \rho(q)]\, f(t)\, \pi(q,t)\, g(q)\, v_1(q)\, \mathrm{d}t\, \mathrm{d}q - \int_{t_1}^{t_2} [H(F(t)) - L(F(t))]\, \mathrm{d}P_\pi(t) - u(t_1).$$

Now consider the mechanism $(\pi^*, p^*)$. The scheme $\pi^*$ maximizes the first term since $\pi^*(q,t) = 1$ for all $q, t$ with $\rho(q) + \underline{\phi}^+(t) \ge 0$. Also, by definition, we have

$$u(t) = \int_{q \in Q} \pi^*(q,t)\, g(q)\, v(q,t)\, \mathrm{d}q - p^*(t) = \int_{t_1}^{t} P_{\pi^*}(x)\, \mathrm{d}x.$$

Thus we have $u(t_1) = 0$.

As for the second term, note that $H(F(t)) - L(F(t)) \ge 0$ by definition, and $\mathrm{d}P_\pi(t) \ge 0$ for any feasible mechanism. Thus the second term is always non-negative. However, we claim that with mechanism $(\pi^*, p^*)$, this term is actually 0. The only interesting case is when $H(F(t)) - L(F(t)) > 0$. We will show that $\mathrm{d}P_{\pi^*}(t) = 0$ in this case. Here $t$ must lie in an ironed interval $I$. Thus $L(z)$ is linear in the interval $I$, where $z = F(t)$. This implies that $\underline{\phi}^+(t) = l(z) = L'(z)$ is constant. So

$$P_{\pi^*}(t) = \int_{q \in Q} \pi^*(q,t)\, g(q)\, v_1(q)\, \mathrm{d}q = \int_{q:\, \rho(q) \ge -\underline{\phi}^+(t)} g(q)\, v_1(q)\, \mathrm{d}q$$

is also constant in the interval $I$, which leads to $\mathrm{d}P_{\pi^*}(t)$ being 0.

Therefore, the mechanism $(\pi^*, p^*)$ optimizes all 3 terms in Equation (21) simultaneously, and hence is optimal.

Note that the above derivation of $REV(\pi, p)$ used the equality $u(t) = \int_{t_1}^{t} P_\pi(x)\, \mathrm{d}x + u(t_1)$ to expand $u(t)$ with $t_1$ as the reference point. This is also Myerson's original approach. This approach works in Myerson's optimal auction design because there Buyer's surplus equals Buyer's utility from participating in the mechanism, since the only outside option is to not purchase, resulting in utility 0. Therefore, in Myerson's optimal auction design, $u(t_1) \ge 0$ guarantees the IR constraint, i.e., $u(t) \ge 0$, for any feasible mechanism. This, however, ceases to be true in our setup because $s(t_1) \ge 0$ does not guarantee $s(t) \ge 0$. In fact, Lemma 5 shows that $s(t)$ is a concave function with $s(\bar{t})$ as the maximum surplus, where $\bar{t}$ is a zero of the function $\bar{v}(t)$. Nevertheless, we know that the optimal mechanism must satisfy either $s(t_1) = 0$ or $s(t_2) = 0$, since otherwise we can shift the entire $s(t)$ curve down by a constant (achieved by asking each buyer type to pay the same additional amount) until one of them reaches 0.

F Optimal Mechanism for Case 2 ($\bar{v}(t_2) \ge V_H$)

In this section, we discuss the second case of our main result, i.e., when $\bar{v}(t_2) \ge V_H$. In this case, if we still use $t_1$ as the reference point and follow the same analysis as in Case 1, we end up with a mechanism with $u(t_2) < \bar{v}(t_2)$, which is infeasible. To solve this problem, we write the revenue expression $REV(\pi, p)$ using $t_2$ as the reference point. Although the resulting mechanism looks different, the approach for deriving it is quite similar to that in the proof of Case 1.

We still start with showing the feasibility of the given mechanism $(\pi^*, p^*)$.

Lemma 11.

The threshold mechanism $(\pi^*, p^*)$ defined according to $\bar{\phi}^+(t)$ is feasible.

Proof. According to Lemma 4, it suffices to show that the given mechanism satisfies Constraints (15)-(18). Since the ironed upper virtual value function $\bar{\phi}^+(t)$ is monotone non-decreasing, the threshold $\theta^*(t) = -\bar{\phi}^+(t)$ is monotone non-increasing in $t$. This implies that

\[
P_{\pi^*}(t) = \int_{q \in Q} \pi^*(q,t)\, g(q)\, v_1(q)\, \mathrm{d}q = \int_{q:\, \rho(q) \ge -\bar{\phi}^+(t)} g(q)\, v_1(q)\, \mathrm{d}q
\]

is monotone non-decreasing in $t$, since a larger $t$ leads to a smaller threshold $-\bar{\phi}^+(t)$ and thus a larger integral domain for $q$, while the integrand $g(q)\, v_1(q)$ is non-negative. So Constraint (15) is satisfied.

We now prove that $(\pi^*, p^*)$ satisfies Constraint (16). Plugging the payment function $p^*(t)$ into the definition of $u(t)$, we get

\[
u(t) = \int_{q \in Q} g(q)\, \pi^*(q,t)\, v(q,t)\, \mathrm{d}q - p^*(t) = \bar{v}(t_2) - \int_{t}^{t_2} P_{\pi^*}(x)\, \mathrm{d}x.
\]

It is easy to see that $u(t_2) = \bar{v}(t_2)$, which can be plugged back into the above equality to obtain Constraint (16).

For Constraint (17), we already have $u(t_2) = \bar{v}(t_2)$. Moreover, for any $t \in T$,

\[
u(t) = \bar{v}(t_2) - \int_{t}^{t_2} P_{\pi^*}(x)\, \mathrm{d}x \ge \max\{\bar{v}(t_1), 0\} + \int_{t_1}^{t_2} P_{\pi^*}(x)\, \mathrm{d}x - \int_{t}^{t_2} P_{\pi^*}(x)\, \mathrm{d}x \ge 0,
\]

where the first inequality is due to the condition $\bar{v}(t_2) \ge V_H$ and the second holds because $P_{\pi^*}(x) \ge 0$.

Finally, we show that the payment $p^*(t)$ is non-negative, i.e., $p^*(t)$ satisfies Constraint (18). By Lemma 3, we have $t \le \bar{\phi}^+(t)$ for all $t \in T$. For any $t > t_1$ and $t' \in [t, \bar{\phi}^+(t)]$, we have

\[
\begin{aligned}
p^*(t') - p^*(t) &= \int_{q \in Q} \pi^*(q,t')\, g(q)\, v(q,t')\, \mathrm{d}q - \int_{q \in Q} g(q)\, \pi^*(q,t)\, v(q,t)\, \mathrm{d}q - \int_{t}^{t'} P_{\pi^*}(x)\, \mathrm{d}x \\
&= \int_{q:\, \rho(q) \ge -\bar{\phi}^+(t')} g(q)\, v(q,t')\, \mathrm{d}q - \int_{q:\, \rho(q) \ge -\bar{\phi}^+(t)} g(q)\, v(q,t)\, \mathrm{d}q - \int_{t}^{t'} P_{\pi^*}(x)\, \mathrm{d}x.
\end{aligned}
\tag{28}
\]

Observe that $\bar{\phi}^+(t) \le \bar{\phi}^+(t')$ since $t \le t'$.
So the first term on the right-hand side can be written as

\[
\int_{q:\, \rho(q) \ge -\bar{\phi}^+(t')} g(q)\, v(q,t')\, \mathrm{d}q = \int_{q:\, \rho(q) \ge -\bar{\phi}^+(t)} g(q)\, v(q,t')\, \mathrm{d}q + \int_{q:\, -\bar{\phi}^+(t') \le \rho(q) < -\bar{\phi}^+(t)} g(q)\, v(q,t')\, \mathrm{d}q.
\]

When $\rho(q) < -\bar{\phi}^+(t)$, we have $v(q,t') = v_1(q)[t' + \rho(q)] \le v_1(q)[t' - \bar{\phi}^+(t)] \le 0$, where the last inequality is due to the choice of $t'$. Therefore, the second term on the right-hand side of the above equation is non-positive. As a result,

\[
\int_{q:\, \rho(q) \ge -\bar{\phi}^+(t')} g(q)\, v(q,t')\, \mathrm{d}q \le \int_{q:\, \rho(q) \ge -\bar{\phi}^+(t)} g(q)\, v(q,t')\, \mathrm{d}q.
\]

Combining with Equation (28), we get

\[
\begin{aligned}
p^*(t') - p^*(t) &\le \int_{q:\, \rho(q) \ge -\bar{\phi}^+(t)} g(q)\, v(q,t')\, \mathrm{d}q - \int_{q:\, \rho(q) \ge -\bar{\phi}^+(t)} g(q)\, v(q,t)\, \mathrm{d}q - \int_{t}^{t'} P_{\pi^*}(x)\, \mathrm{d}x \\
&= \int_{q:\, \rho(q) \ge -\bar{\phi}^+(t)} g(q)\, v_1(q)\, (t' - t)\, \mathrm{d}q - \int_{t}^{t'} P_{\pi^*}(x)\, \mathrm{d}x \\
&= (t' - t)\, P_{\pi^*}(t) - \int_{t}^{t'} P_{\pi^*}(x)\, \mathrm{d}x \\
&\le 0,
\end{aligned}
\]

where the last step uses the monotonicity of $P_{\pi^*}$. This shows that $p^*(t)$ is monotone non-increasing in the interval $[t, \bar{\phi}^+(t)]$ for any $t > t_1$. Since the set of intervals $\{[t, \bar{\phi}^+(t)] \mid t \in T\}$ covers the interval $T$, we can conclude that $p^*(t)$ is monotone non-increasing in the entire interval $T$. (Similar techniques are also used to prove the existence of solutions for differential equations.) Thus, to show that the payment is always non-negative, we only need to prove that $p^*(t_2) \ge 0$. Indeed,

\[
\begin{aligned}
p^*(t_2) &= \int_{q \in Q} g(q)\, \pi^*(q,t_2)\, v(q,t_2)\, \mathrm{d}q - \bar{v}(t_2) + \int_{t_2}^{t_2} P_{\pi^*}(x)\, \mathrm{d}x \\
&= \int_{q \in Q} g(q)\, \pi^*(q,t_2)\, v(q,t_2)\, \mathrm{d}q - \int_{q \in Q} g(q)\, v(q,t_2)\, \mathrm{d}q \\
&= -\int_{q:\, \rho(q) < -\bar{\phi}^+(t_2)} g(q)\, v(q,t_2)\, \mathrm{d}q.
\end{aligned}
\]

When $\rho(q) < -\bar{\phi}^+(t_2) \le -t_2$, we have $v(q,t_2) = v_1(q)[t_2 + \rho(q)] \le 0$. Thus $p^*(t_2) \ge 0$.
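The feasibility argument above can be sanity-checked numerically on a small instance. The sketch below is only an illustration under assumptions that are not taken from the paper: types are uniform on $[0,1]$ (so the upper virtual value is $t + F(t)/f(t) = 2t - t_1$ and needs no ironing), states are uniform on $[0,1]$ with $g(q) = v_1(q) = 1$, and $\rho(q) = 1 - 2q$. All function names are hypothetical. The instance is chosen so that the inequality $\bar{v}(t_2) \ge \max\{\bar{v}(t_1), 0\} + \int_{t_1}^{t_2} P_{\pi^*}(x)\,\mathrm{d}x$ used in the proof holds.

```python
# Toy instance; every modelling choice here is an illustrative assumption.
T1, T2 = 0.0, 1.0          # type interval T = [t_1, t_2], types uniform on T
N_Q, N_T = 50, 1000        # midpoint-rule resolution for state / type integrals

def g(q):   return 1.0            # state density on Q = [0, 1]
def v1(q):  return 1.0            # non-negative factor v_1(q)
def rho(q): return 1.0 - 2.0 * q  # decreasing in q

def value(q, t):
    """Linear payoff v(q, t) = v_1(q) [t + rho(q)]."""
    return v1(q) * (t + rho(q))

def upper_virtual_value(t):
    """t + F(t)/f(t) for uniform types; increasing, so no ironing is needed."""
    return t + (t - T1)

def integrate(fn, a, b, n):
    """Midpoint rule for the integral of fn over [a, b]."""
    h = (b - a) / n
    return h * sum(fn(a + (i + 0.5) * h) for i in range(n))

def cutoff(t):
    """Largest state that is recommended the active action:
    rho(q) >= -phi(t)  <=>  q <= (1 + phi(t)) / 2, clipped to [0, 1]."""
    return min(1.0, max(0.0, (1.0 + upper_virtual_value(t)) / 2.0))

def P(t):
    """Integral of g(q) v_1(q) over the recommended states."""
    return integrate(lambda q: g(q) * v1(q), 0.0, cutoff(t), N_Q)

def gain(t):
    """Expected payoff of type t when following the recommendation."""
    return integrate(lambda q: g(q) * value(q, t), 0.0, cutoff(t), N_Q)

def v_bar(t):
    """Expected payoff of the active action without any information."""
    return integrate(lambda q: g(q) * value(q, t), 0.0, 1.0, N_Q)

def payment(t):
    """Case 2 payment: gain(t) - v_bar(t_2) + integral of P over [t, t_2]."""
    return gain(t) - v_bar(T2) + integrate(P, t, T2, N_T)

grid = [T1 + (T2 - T1) * i / 20 for i in range(21)]
P_vals = [P(t) for t in grid]
pay_vals = [payment(t) for t in grid]
u_vals = [gain(t) - p for t, p in zip(grid, pay_vals)]

# Inequality used in the proof of Constraint (17).
case2_condition = v_bar(T2) >= max(v_bar(T1), 0.0) + integrate(P, T1, T2, N_T)

print("Case 2 condition holds:", case2_condition)
print("P non-decreasing:",
      all(b >= a - 1e-9 for a, b in zip(P_vals, P_vals[1:])))
print("payment non-increasing:",
      all(b <= a + 1e-6 for a, b in zip(pay_vals, pay_vals[1:])))
print("payment non-negative:", min(pay_vals) > -1e-6)
print("utility non-negative:", min(u_vals) > -1e-6)
```

For this particular instance the integrals can also be done by hand: $P_{\pi^*}(t) = \min\{1, t + 1/2\}$ and $p^*(t) = 1/8 - t^2/2$ for $t \le 1/2$, with $p^*(t) = 0$ afterwards, which matches the monotonicity and non-negativity claims of the lemma.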
Threshold Mechanisms with Partial Recommendations

We assumed that the probability distribution of $\rho$ does not have point masses in the main body of the paper. This is to ensure the existence of the constant $c$ in Case 3 of Theorem 1. But if the distribution of $\rho$ has point masses, such a $c$ may not exist. In this case, we need to slightly modify our mechanism and incorporate partial recommendations. If such a $c$ does not exist, it must be that the distributions of both $\rho$ and $\phi_c^+$ contain point masses; more specifically, the measure of $\{(\rho, t) \mid \rho(q) = \phi_c^+(t) = \zeta\}$ is non-zero for some $\zeta$.

For any $c \in [0,1]$, let $\phi_c(t) = c\,\underline{\phi}(t) + (1-c)\,\bar{\phi}(t)$ be the mixed virtual value function and let

\[
Y(c) = \int_{t_1}^{t_2} \int_{q:\, \rho(q) \ge -\phi_c^+(t)} g(q)\, v_1(q)\, \mathrm{d}q\, \mathrm{d}t
\]

be a function of $c$.

We first prove the monotonicity of $Y(c)$. As shown in Lemma 3, for all $0 \le c < c' \le 1$ we have $\phi_c^+(t) \ge \phi_{c'}^+(t)$ for all $t$. Thus, when $c$ increases, $\phi_c^+(t)$ is (weakly) decreasing and the threshold $-\phi_c^+(t)$ is (weakly) increasing. So the function $Y(c)$ integrates a non-negative function $g(q)\, v_1(q)$ over a smaller region of $q$ and is (weakly) decreasing.

Next, we argue that $Y(c)$ is left-continuous. By monotonicity, we know that $Y(c)$ is continuous almost everywhere. For any $c \in (0,1]$ and any arbitrarily small positive $\epsilon$, we have

\[
\begin{aligned}
\lim_{\beta \to c^-} Y(\beta) - Y(c) &= \lim_{\epsilon \to 0^+} Y(c - \epsilon) - Y(c) \\
&= \lim_{\epsilon \to 0^+} \int_{t_1}^{t_2} \int_{q:\, \rho(q) \ge -\phi_{c-\epsilon}^+(t)} g(q)\, v_1(q)\, \mathrm{d}q\, \mathrm{d}t - \int_{t_1}^{t_2} \int_{q:\, \rho(q) \ge -\phi_c^+(t)} g(q)\, v_1(q)\, \mathrm{d}q\, \mathrm{d}t \\
&= \lim_{\epsilon \to 0^+} \int_{t_1}^{t_2} \int_{q:\, -\phi_{c-\epsilon}^+(t) \le \rho(q) < -\phi_c^+(t)} g(q)\, v_1(q)\, \mathrm{d}q\, \mathrm{d}t \\
&= 0,
\end{aligned}
\]

where the last equation holds because whenever there is a point mass such that the measure of $\{(\rho, t) \mid \rho(q) = \phi_\beta^+(t) = \zeta\}$ is non-zero for some $\beta < c$, we can always increase the lower bound of the integral to exclude this point mass by choosing an $\epsilon$ smaller than $c - \beta$.
Consequently, we have $\lim_{\beta \to c^-} Y(\beta) = Y(c)$, so the function $Y(c)$ is left-continuous on $(0,1]$.

Now, we are ready to define our signaling function for the case where $\rho(q)$ has point masses. Since $Y(c)$ is monotone (weakly) decreasing and left-continuous, the set $\{\beta \mid Y(\beta) \ge \bar{v}(t_2)\}$ contains its right endpoint, so the following maximum is well-defined:

\[
c = \max\{\beta \mid Y(\beta) \ge \bar{v}(t_2)\}, \tag{29}
\]

and moreover we can use binary search to find $c$.

Given the above $c$, we define the following signaling scheme. Define the constant $D$ as

\[
D = \frac{\bar{v}(t_2) - \left( Y(c) - \int_{t_1}^{t_2} \int_{q:\, \rho(q) = -\phi_c^+(t)} g(q)\, v_1(q)\, \mathrm{d}q\, \mathrm{d}t \right)}{\int_{t_1}^{t_2} \int_{q:\, \rho(q) = -\phi_c^+(t)} g(q)\, v_1(q)\, \mathrm{d}q\, \mathrm{d}t}
= \frac{\bar{v}(t_2) - \int_{t_1}^{t_2} \int_{q:\, \rho(q) > -\phi_c^+(t)} g(q)\, v_1(q)\, \mathrm{d}q\, \mathrm{d}t}{\int_{t_1}^{t_2} \int_{q:\, \rho(q) = -\phi_c^+(t)} g(q)\, v_1(q)\, \mathrm{d}q\, \mathrm{d}t},
\]

and let

\[
\pi^*(q,t) =
\begin{cases}
1 & \text{for all } q, t \text{ such that } \rho(q) > -\phi_c^+(t), \\
D & \text{for all } q, t \text{ such that } \rho(q) = -\phi_c^+(t), \\
0 & \text{otherwise.}
\end{cases}
\]

This signaling scheme gives rise to a threshold mechanism by using the payment function defined in Theorem 1. Notably, when $\rho(q) = -\phi_c^+(t)$ does not have a point mass at this $c$, $D$ will be 0 due to continuity, and the scheme degenerates to the threshold signaling scheme for Case 3 in Theorem 1. The feasibility and the optimality of the above mechanisms follow from the same argument as in the proofs of Lemma 8 and Proposition 1, essentially because the boundary case of $\rho(q) = -\phi_c^+(t)$