On the Disclosure of Promotion Value in Platforms with Learning Sellers

Abstract

We consider a platform facilitating trade between sellers and buyers with the objective of maximizing consumer surplus. Even though in many such marketplaces prices are set by revenue-maximizing sellers, platforms can influence prices through (i) price-dependent promotion policies that can increase demand for a product by featuring it in a prominent position on the webpage and (ii) the information revealed to sellers about the value of being promoted. Identifying effective joint information design and promotion policies is a challenging dynamic problem as sellers can sequentially learn the promotion value from sales observations and update prices accordingly. We introduce the notion of confounding promotion policies, which are designed to prevent a Bayesian seller from learning the promotion value (at the expense of the short-run loss of diverting consumers from the best product offering). Leveraging these policies, we characterize the maximum long-run average consumer surplus that is achievable through joint information design and promotion policies when the seller sets prices myopically. We then establish a Bayesian Nash equilibrium by showing that the seller's best response to the platform's optimal policy is to price myopically at every history. Moreover, the equilibrium we identify is platform-optimal within the class of horizon-maximin equilibria, in which strategies are not predicated on precise knowledge of the horizon length, and are designed to maximize payoff over the worst-case horizon. Our analysis allows one to identify practical long-run average optimal platform policies in a broad range of demand models.



Yonatan Gur

Stanford University

Gregory Macnamara

Stanford University

Daniela Saban ∗

Stanford University

April 2020


Keywords: Information design, Bayesian learning, revenue management, pricing, platforms, online marketplaces

∗ Correspondence: [email protected], [email protected], [email protected]. arXiv [econ.TH].

Online marketplaces allow consumers to evaluate, compare, and purchase products while simultaneously providing a channel for third-party sellers to reach a broader consumer base and increase demand for their products. As platforms seek to maintain a large consumer base, many platforms prioritize increasing consumer surplus by offering competitively priced products. At the same time, it is common practice in such marketplaces to let sellers determine their own price, but such flexibility may result in higher prices that reduce consumer surplus. However, platforms retain the ability to impact consumer surplus by influencing sellers’ pricing policies. One method for doing so is designing the search and recommendation environment to incentivize sellers to post low prices. For example, a platform can choose to prominently feature sellers that set competitive prices, thereby increasing their visibility and boosting the demand they face. A second method to influence prices is strategically sharing information on how the promotion policy impacts consumer demand. Platforms typically have the ability to observe and track consumer behavior across products on their site and thus often have better information about consumer demand than sellers. In particular, the additional demand that is associated with being promoted by the platform (e.g., being featured in a prominent position on the webpage) is typically a priori unknown to sellers. By strategically sharing this information, the platform can alter the seller’s perceived value of being promoted and thereby impact the seller’s posted prices.

In general, platforms may have many different methods of altering a given product’s or seller’s visibility throughout a consumer’s interaction with the platform. For a concrete example of one, consider Amazon’s featured offer (also known as the ‘Buy Box’), which is depicted in Figure 1. When a consumer reaches a product page on Amazon, she has the option to ‘Buy Now’ or ‘Add to Cart’ through links that are positioned in a designated, highly visible area of the webpage referred to as the ‘Buy Box’, or to consider ‘Other Sellers on Amazon,’ an option that is positioned in a less visible area of the webpage and typically requires the consumer to scroll down the page.

Figure 1: Example of Amazon Featured Offer (Buy Box)

If the consumer selects ‘Buy Now’ or ‘Add to Cart,’ then the demand will be assigned to the seller that is featured in the Buy Box. In this case Amazon, by promoting a seller to the Buy Box, effectively selects the seller from which the consumer is purchasing; this valuable advantage allows the promoted seller to capitalize on demand from consumers that are “impatient,” or have a high cost of search. A very similar ‘Buy Box’ promotion mechanism is also used by Walmart Marketplace (SellerActive, 2017) and eBay Product-Based Shopping Experience (EcommerceBytes, 2017). Thus, Amazon can influence pricing decisions not only through its Buy Box promotion policy, but also by leveraging the underlying information asymmetry by strategically disclosing information on the additional demand associated with being promoted.

A key challenge a platform faces in utilizing its private information is that sellers can, potentially, infer the value of promotions over time from sales observations, and update their prices accordingly. Therefore, the platform needs to strike a balance between providing incentives for prices that maximize consumer surplus in the current period, and controlling the information that is revealed by sales observations, which impacts the consumer surplus in subsequent periods. As the platform’s information disclosure policy impacts the optimal promotion policy, which in turn impacts the seller’s ability to collect information over time, the platform must consider the design of its promotion and information policies jointly. In this paper, we study how a platform can maximize consumer surplus through joint information design and dynamic promotion policies that balance the aforementioned tradeoff.

We note that while the ‘Buy Box’ example above describes a retail setting, our formulation and approach are relevant to similar “promotions” that are common in online marketplaces and platforms where prices are set by sellers. Examples include lodging platforms (e.g., Airbnb), booking and travel fare platforms (e.g., Expedia, Booking.com, TripAdvisor), freelancing platforms (e.g., Upwork), and ordering and delivery platforms (e.g., Uber Eats, Grubhub). While the structure of promotions and the criteria the platform uses for selecting the promoted sellers may vary across these settings, they all share common features: promotions are valuable to sellers, though the exact value may be a priori unknown to sellers, and the platform may share information about the value.

Amazon’s promotion decisions are based on a Featured Merchant Algorithm (FMA). While Amazon does not publicly reveal the factors accounted for by the FMA when selecting the featured seller, there are many resources suggesting that the featured sellers are those who set low prices, have high consumer ratings, etc. See, e.g., the blog post by Informed.co (2018) for a description of Amazon Featured Merchant Status and some details on the FMA algorithm, as well as Chen et al. (2016) for an overview and analysis of factors that impact Amazon’s promotion decision.

Main Contributions. Our contributions lie in (1) introducing a stylized model for studying the interaction between a platform and a strategic seller who does not know the value of promotions (has incomplete demand information); (2) characterizing the maximum long-run average expected consumer surplus that is achievable by the platform when the seller is myopic; (3) characterizing platform policies that achieve this consumer surplus in equilibrium; and (4) providing a prescription for identifying an optimal policy from a class of simple, practical policies given a concrete demand model. More specifically, our contribution is along the following dimensions.

(1) Modeling.

Our model considers a platform that can promote a single product to each arriving consumer, and a seller that sequentially sets prices and has access to its own sales observations. Our formulation considers a broad class of demand and consumer choice structures, and assumes that each arriving consumer is either impatient, and therefore considers only the promoted product (versus an outside option of not buying at all), or patient, and therefore considers all the available alternatives. As impatient consumers only have the promoted product in their consideration set, the fraction of these consumers captures the value of promotion to the seller.

The platform has private information about the true fraction of impatient consumers. At the beginning of the horizon, the platform provides an initial information signal regarding this fraction, and commits to a dynamic promotion policy (a dynamic sequence of functions) that at each period maps the true fraction of impatient consumers, the seller’s belief regarding this fraction, and the price posted by the seller, to a (possibly random) promotion decision. Subsequently, in each period the seller updates his belief regarding the fraction of impatient consumers and then posts a price. After the price is posted, the platform decides whether to promote the seller or one of its competitors. A consumer arrives, forms a consideration set depending on their patience type, and makes a purchase decision according to an underlying demand model. The seller observes whether it made a sale or not.

Our model is stylized, and considers a strategic Bayesian seller that operates in a competitive environment that is set exogenously, yet allows for tractability in a challenging dynamic problem that is relevant to many practical settings. Our model captures a fundamental tradeoff faced by the platform, between maximizing consumer surplus in a present period, and controlling the demand information revealed to the seller, which may impact the achievable consumer surplus in future periods.

(2) Characterizing the long-run average optimal platform performance.

We observe that fully disclosing its private information can be detrimental to the platform. As a method for controlling the seller’s uncertainty about the fraction of impatient consumers over time, we introduce the notion of confounding promotion policies. These policies are designed to ensure that the seller’s belief about the fraction of impatient consumers is fixed throughout the problem horizon (after the initial information signal is sent), at the cost of diverting consumers away from the best product offering. Leveraging the structure of confounding promotion policies, we characterize the maximum long-run average consumer surplus that is achievable by the platform when the seller is myopic.

(3) Equilibrium analysis.

We further show that myopic pricing is a best response to this platform strategy, thereby establishing a Bayesian Nash equilibrium between the platform and the seller. In particular, the platform cannot benefit from deviating to any other joint information design and promotion policy, and the seller cannot gain from deviating to any other dynamic pricing policy at any stage of the game. Moreover, the equilibrium we identify is platform-optimal within a class of horizon-maximin equilibria, in which strategies are not predicated on precise knowledge of the horizon length, and are designed to maximize payoff over the worst-case horizon length. While the literature on dynamic pricing suggests that sellers should avoid confounding prices in order to learn the underlying demand, our characterization implies that, in the presence of a strategic platform, it might be optimal for the seller to set confounding prices, even though doing so leads to incomplete learning.

(4) Policy design.

Finally, we leverage the class of confounding promotion policies to provide a prescription for identifying practical equilibrium strategies for the platform given a concrete demand model. Our approach is based on reducing the platform problem to one in which the platform needs to first identify the optimal confounding promotion policy for a given prior (which can be reduced to a static problem), and then identify the information signal that results in an optimal prior. Thus, this procedure allows one to identify practical long-run average optimal information design and promotion policies in a broad class of demand models, and to study the impact of the underlying structure of the demand model and the platform’s search environment on the design of effective promotion policies and the achievable consumer surplus.
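To make the idea of a confounding promotion policy concrete, the following Python sketch works through a single price with a two-point support for the fraction of impatient consumers. It rests on an assumption that is not spelled out in this overview: at a fixed price, the seller's sale probability is taken to be `phi * promo_prob * rho0 + (1 - phi) * rho_c`, where `phi` is the fraction of impatient consumers, `promo_prob` is the probability of being promoted, and `rho0` and `rho_c` are the purchase probabilities of a promoted-to impatient consumer and of a patient consumer. Under that assumption, a policy is treated as confounding when the sale probability is the same under both values of `phi`, so that a Bayesian seller's belief does not move. All function names and numbers are hypothetical.

```python
def sale_probability(phi, promo_prob, rho0, rho_c):
    """Sale probability at a fixed price when a fraction phi is impatient.

    Impatient consumers buy only if the seller is promoted (probability
    promo_prob); patient consumers buy with probability rho_c either way.
    """
    return phi * promo_prob * rho0 + (1.0 - phi) * rho_c


def confounding_promo_prob(phi_low, phi_high, promo_high, rho0, rho_c):
    """Promotion probability under phi_low that matches the sale
    probability generated by promo_high under phi_high.

    Returns None if no value in [0, 1] equalizes the two sale
    probabilities. Assumes phi_low > 0 and rho0 > 0.
    """
    target = sale_probability(phi_high, promo_high, rho0, rho_c)
    promo_low = (target - (1.0 - phi_low) * rho_c) / (phi_low * rho0)
    tol = 1e-12
    if promo_low < -tol or promo_low > 1.0 + tol:
        return None
    return min(max(promo_low, 0.0), 1.0)


def posterior_high(prior, sold, prob_high, prob_low):
    """Bayes update of P(phi = phi_high) after one sale observation."""
    like_high = prob_high if sold else 1.0 - prob_high
    like_low = prob_low if sold else 1.0 - prob_low
    return prior * like_high / (prior * like_high + (1.0 - prior) * like_low)


# Hypothetical numbers, chosen only for illustration.
phi_low, phi_high, rho0, rho_c = 0.3, 0.6, 0.5, 0.3
promo_high = 0.7
promo_low = confounding_promo_prob(phi_low, phi_high, promo_high, rho0, rho_c)
prob_high = sale_probability(phi_high, promo_high, rho0, rho_c)
prob_low = sale_probability(phi_low, promo_low, rho0, rho_c)

# Equal sale probabilities leave the belief unchanged after either outcome.
belief_after_sale = posterior_high(0.4, True, prob_high, prob_low)
belief_after_no_sale = posterior_high(0.4, False, prob_high, prob_low)
```

In this toy instance the platform promotes more often when the fraction of impatient consumers is low, which offsets the smaller pool of impatient consumers. When `promo_high` is too large the equalizing probability would exceed one and the helper returns `None`, a reminder that confounding may constrain how aggressively the platform can promote.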

Our work relates to several strands of literature in operations and economics. First, the consideration of the seller’s pricing decisions relates to the literature on dynamic pricing policies in settings characterized by demand uncertainty including Araman and Caldentey (2009), Besbes and Zeevi (2009), Farias and Van Roy (2010), Harrison et al. (2012), den Boer and Zwart (2014), and Keskin and Zeevi (2014), among others; see also surveys by Araman and Caldentey (2010) and den Boer (2015) for an overview. More broadly, the seller’s problem relates to an extensive literature on sequential decision making under uncertainty in which a decision maker must balance a tradeoff between taking actions that generate high immediate payoffs and taking actions that generate information and therefore increase future payoffs. This tradeoff has been studied in contexts including retail assortment selection (e.g., Caro and Gallien (2007), Sauré and Zeevi (2013)) and inventory management (e.g., Huh and Rusmevichientong (2009), Besbes and Muharremoglu (2013), Besbes et al. (2017)). Our work departs from these models, which assume that, conditional on the decision maker’s action, the payoff and information generated are exogenous, by considering the pricing dynamics of a learning seller when demand is endogenously […] confounding belief. On the other hand, we show that the platform may benefit from concealing the underlying demand structure from the seller, so the platform faces a counterpart tradeoff between maximizing instantaneous consumer surplus and concealing demand information, which may increase future consumer surplus. We show that, in many cases, effective platform strategies are in fact designed to confound the seller at certain beliefs and prevent him from learning the underlying demand structure even at the cost of diverting consumers to inferior product offerings. Moreover, while in Harrison et al. (2012) semi-myopic policies (in which the seller does not price myopically at confounding beliefs) are suggested as a vehicle to avoid incomplete learning, our analysis implies that a strategic platform may design its policy to ensure myopic pricing is a best response for the seller.

In fact, we establish the optimality of this solution for the platform in a class of horizon-maximin equilibria, which is related to previous work on settings where players maximize their worst-case payoff given uncertainty over other players’ preferences or action sets (see, e.g., Carroll (2015)). In our horizon-maximin solution concept, however, the platform and seller use strategies that maximize payoff over the worst-case horizon length.

In our formulation, the interaction between the platform and seller begins with a disclosure of information. In that sense, our work relates to the work on information design in the Bayesian Persuasion framework originating in the work of Segal and Rayo (2010) and Kamenica and Gentzkow (2011), and more broadly, to the work on repeated games of incomplete information in Aumann and Maschler (1995), which studies how an informed player’s actions influence the learning of an uninformed player. Thus, our work contributes to the growing field of communication and information design in operational settings including queueing (Lingenbrink and Iyer 2019), networks (Candogan and Drakopoulos (2017), Candogan (2019)), inventory (Drakopoulos et al. 2018), and exploration in platforms (Papanastasiou et al. (2017), Bimpikis and Papanastasiou (2019), Küçükgül et al. (2019)). The current paper departs from this line of work in terms of both the application domain and the setting. In particular, the above studies typically consider a static formulation whereas in our setting the information signal is followed by a dynamic interaction between the platform and the seller through which further information may be revealed to the seller. For additional models of dynamic Bayesian Persuasion see, e.g., Ely et al. (2015), and Ely (2017).

A few literature streams study the interaction between sellers, consumers, and platforms that facilitate their trade. In our model, the platform can impact purchase decisions through its selection of which product to promote. The phenomenon that consumers are more likely to consider and purchase products given prominence on a webpage has been documented empirically in many settings; see, e.g., Kim et al. (2010) and Chen and Yao (2016) in the context of consumer products, and Besbes et al. (2016) in the context of content recommendations in media sites. Some implications of this phenomenon have been studied from the perspective of retailers designing optimal rankings of their products (see, e.g., Derakhshan et al. (2018) and Ferreira et al. (2019)). However, we consider how the platform can use its promotion policy to leverage this behavior for incentivizing low prices from third-party sellers. Thus, our work relates more closely to the design of platform recommendations and search environments such as Hagiu and Jullien (2011), which studies how a revenue-maximizing platform directs consumer search and, in particular, how diverting consumer search may incentivize sellers to lower their prices. Our model and analysis identify a new reason for diverting consumers: when facing a seller with incomplete demand information, a platform may divert consumers to prevent the seller from learning this information. Dinerstein et al. (2018) empirically analyze a similar tradeoff between directing consumers to desired products and strengthening incentives for sellers to lower prices in the context of the eBay search environment. Hagiu and Wright (2019) studies a broadly related problem of whether a platform should induce customers to explore by steering them to new products.

Finally, we would like to distinguish the notion of promotions studied in the current paper from the one that has been studied in retail management; see, e.g., Cohen et al. (2017) and references therein. In this stream of work, promotions refer to times at which a retailer temporarily reduces its price to increase sales whereas promotion in our setting refers to the platform’s decision to increase the visibility of a seller to consumers.

In this section, we introduce a stylized model of the dynamic interactions between a seller and a platform. We start by providing an overview of the model, followed by a detailed description of each model component. A few modeling assumptions and extensions are discussed at the end of this section.

Overview of Incentives.

We study how a platform, which facilitates trade between sellers and consumers, should design a joint promotion and information sharing policy to maximize consumer surplus. Consumers arrive to the platform sequentially, and upon arrival, each arriving consumer sees a promoted product; depending on her type, she may consider additional products as well. For simplicity, we assume that each consumer is either impatient or patient. Impatient consumers only consider the promoted product whereas patient consumers consider all available products. Upon arrival, the consumer observes her (potentially idiosyncratic) value for each product in her consideration set and makes a purchase decision.

By selecting which product to promote, the policy directly affects impatient consumers’ choices, which impacts their surplus and the seller’s revenue. Moreover, the promotion policy can also influence the seller’s pricing decisions: as impatient consumers only consider purchasing from the seller if it is promoted, a policy that promotes low-priced products incentivizes the seller to set low prices in order to further increase sales.

The platform’s ability to influence pricing decisions, however, is determined by the fraction of impatient consumers. If a large fraction of consumers are impatient, the promotion decision impacts a substantial portion of the seller’s potential demand, and the platform can incentivize its desired prices by rewarding the promoted seller with significantly increased demand. On the other hand, if only a small fraction of consumers are impatient, promotion generates little benefit for the seller and the extent to which the promotion policy can influence prices is more limited. When this fraction is privately known by the platform, the seller’s belief about it impacts his pricing incentives. Thus, the platform can also influence the seller’s pricing decisions by strategically sharing information about the fraction of impatient consumers.

A key challenge introduced by this dynamic setting is that as consumers arrive and make purchase decisions sequentially, the seller may progressively collect information about the fraction of impatient consumers and dynamically update his price accordingly. However, as the platform’s promotion policy impacts consumer demand, it also impacts the informativeness of sales observations. Therefore, the promotion policy not only impacts the current price but also affects future prices (by affecting seller beliefs). Thus, the platform’s dynamic problem is inherently intertwined and it must jointly optimize its promotion and information policy.

The Dynamic Game.

We model the interaction between the platform and the seller as consisting of two stages. First, before consumers arrive, the platform publicly commits to (i) a signaling mechanism $\sigma$, which may reveal information about the fraction of impatient consumers, and (ii) a promotion policy $\boldsymbol{\alpha}$; both of these are described in detail below. The platform then privately observes the true fraction of impatient consumers $\phi \in \{\phi_L, \phi_H\}$, where $0 < \phi_L < \phi_H < 1$, and $\phi = \phi_H$ with commonly known probability $\mu_0$. (Note that, in the tradition of the information design literature, we assume that the platform commits to a signaling mechanism before observing the true fraction of impatient consumers, $\phi$.) Finally, the platform sends a signal, $s$, which is drawn according to $\sigma$. See Figure 2 for a summary of these dynamics.

Figure 2: Dynamics before horizon begins ($t = 0$). [Timeline from Period 0 to the start of Period 1: the platform commits to $\boldsymbol{\alpha}, \sigma$; the platform observes the fraction $\phi$; the seller observes the signal $s \sim \sigma(\phi)$.]

In the second stage, $T$ consumers arrive sequentially. In each period $t = 1, \ldots, T$, before consumer $t$ arrives, the seller sets a price $p_t \in P$ and then the platform uses its promotion policy $\boldsymbol{\alpha}$ to decide whether to promote the seller, $a_t = 1$, or one of its competitors, $a_t = 0$. Consumer $t$ then arrives and observes the products in her consideration set. With probability $\phi$, she is impatient and only considers the promoted product. With probability $1 - \phi$, she is patient and considers all products, regardless of the promotion decision.

The consumer then makes a purchase decision according to an underlying discrete choice model as described under ‘Consumer Demand’ below. Finally, the seller observes its own sales outcome, $y_t \in \{0, 1\}$. See Figure 3 for a summary of these dynamics.

Figure 3: Dynamics at each period $t = 1, \ldots, T$. [Timeline from the start of period $t$ to the start of period $t + 1$: the seller updates belief $\mu_t$; the seller sets price $p_t$; the platform makes promotion decision $a_t$; the consumer arrives and purchases (or not), $y_t$.]

Consumer Demand.

At each period, a consumer arrives with an independently drawn patience type, observes the products in her consideration set, and purchases according to a discrete choice model. The probability of purchasing from the seller is captured by the commonly known function $\rho$ that depends on the consumer type, the price $p \in P$ and a platform promotion decision $a \in \{0, 1\}$:

$$\rho(p, a) = \mathbb{P}(y = 1 \mid p, a) = \begin{cases} \bar{\rho}_c(p), & \text{if the consumer is patient} \\ \bar{\rho}_0(p), & \text{if the consumer is impatient and } a = 1 \\ 0, & \text{if the consumer is impatient and } a = 0. \end{cases} \tag{1}$$

The demand function $\rho$ captures the impact of the consumer’s patience type: $\bar{\rho}_c$ denotes demand for the seller when consumers are patient and thus the seller may face competition; $\bar{\rho}_0$, on the other hand, captures the demand when there is no competition apart from the consumer’s outside option (of not buying). Moreover, the probability of an impatient consumer purchasing from the seller equals 0 unless the seller is promoted. If $\bar{\rho}_c(p) \neq \bar{\rho}_0(p)$, then $\rho$ reflects a setting where the probability that the consumer purchases from the seller depends on whether the consumer considers other products, which may be competitors. We assume a stationary arrival process where each consumer’s patience type and purchase probability are independent of $t$. We make the following assumption on the demand function.

Assumption 1 (Demand). $\bar{\rho}_c(p)$ and $\bar{\rho}_0(p)$ are decreasing and Lipschitz continuous in $p$; $p\,\bar{\rho}_c(p)$ and $p\,\bar{\rho}_0(p)$ are strictly concave in $p$; and $\bar{\rho}_0(p) \geq \bar{\rho}_c(p)$ for all $p \in \mathbb{R}$.

Assumption 1 is mild and satisfied by many common demand models, including logit, mixed logit, and probit, among others. The concavity of the seller’s revenue function ensures that there is a unique optimal price for each consumer type, and the ordering on purchase probability requires competitors’ products to be substitutes for the seller’s product. We illustrate a simple demand model that satisfies these conditions in Example 1 (presented below).
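The period dynamics and the seller's Bayesian learning can be sketched in a few lines of Python. The sketch below is an illustration under stated assumptions, not the paper's procedure: the price is held fixed rather than chosen by the seller, the promotion probability is a constant that does not depend on $\phi$, the demand functions `rho0` and `rho_c` are hypothetical linear forms chosen to respect the ordering in Assumption 1, and the seller observes only the sale outcome $y_t$, so his likelihood averages over both the patience type and the promotion decision. All names are hypothetical.

```python
import random


def rho0(price):
    """Hypothetical demand from an impatient consumer when promoted."""
    return min(1.0, max(0.0, 1.0 - price))


def rho_c(price):
    """Hypothetical demand from a patient consumer; at most rho0(price)."""
    return 0.6 * rho0(price)


def sale_probability(phi, price, promo_prob):
    """P(y = 1) given phi, averaging over patience type and promotion."""
    return phi * promo_prob * rho0(price) + (1.0 - phi) * rho_c(price)


def update_belief(mu, price, sold, promo_high, promo_low, phi_high, phi_low):
    """Posterior probability that phi = phi_high after observing y_t."""
    q_high = sale_probability(phi_high, price, promo_high)
    q_low = sale_probability(phi_low, price, promo_low)
    like_high = q_high if sold else 1.0 - q_high
    like_low = q_low if sold else 1.0 - q_low
    return mu * like_high / (mu * like_high + (1.0 - mu) * like_low)


def simulate_beliefs(phi_true, mu_start, price, promo_prob, periods, seed,
                     phi_high=0.6, phi_low=0.3):
    """Simulate sales at a fixed price and return the seller's belief path."""
    rng = random.Random(seed)
    mu = mu_start
    path = [mu]
    for _ in range(periods):
        promoted = rng.random() < promo_prob      # promotion decision a_t
        impatient = rng.random() < phi_true       # consumer's patience type
        if impatient:
            buy_prob = rho0(price) if promoted else 0.0
        else:
            buy_prob = rho_c(price)
        sold = rng.random() < buy_prob            # sales outcome y_t
        # The same promotion probability applies under both values of phi.
        mu = update_belief(mu, price, sold, promo_prob, promo_prob,
                           phi_high, phi_low)
        path.append(mu)
    return path


# Always promote; the true fraction of impatient consumers is phi_high.
path = simulate_beliefs(phi_true=0.6, mu_start=0.5, price=0.5,
                        promo_prob=1.0, periods=20000, seed=1)
print("belief after 20000 periods:", path[-1])
```

Because this promotion rule ignores $\phi$, the sale probability differs across the two states and every observation is informative, so the belief drifts toward the true state. A promotion policy that depends on $\phi$ can weaken or remove this effect, which is the channel the platform exploits.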

Payoffs.

Without loss of generality, we normalize the cost of the seller to be 0, so the seller’s payoff in period $t$ as a function of his price, $p \in P$, and the consumer’s purchase decision, $y \in \{0, 1\}$, is:

$$v(p, y) = \begin{cases} p, & \text{if } y = 1 \\ 0, & \text{otherwise.} \end{cases}$$

The platform’s payoff in each period equals the expected consumer surplus, which is captured by a commonly known function of the seller’s price $p$ and the promotion decision $a$, and is equal to:

$$W(p, a) = \begin{cases} \bar{W}_c(p), & \text{if the consumer is patient} \\ \bar{W}_0(p), & \text{if the consumer is impatient and } a = 1 \\ \bar{W}_c, & \text{if the consumer is impatient and } a = 0. \end{cases} \tag{2}$$

We make the following mild assumption on consumer surplus.

Assumption 2 (Consumer Surplus). $\bar{W}_c(p)$ and $\bar{W}_0(p)$ are decreasing and Lipschitz continuous in $p$.

The following example illustrates the structure of the purchase probability in (1) and the consumer surplus in (2) when purchasing decisions correspond to uniformly distributed willingness to pay.
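The closed-form expressions for the uniform case in Example 1 can be checked by simulation. The Python sketch below is a sanity check under stated assumptions: the competitor's price is 0, prices are restricted to $\alpha - 1 \le p \le \alpha$ so that all probabilities lie in $[0, 1]$, and the function names are hypothetical. It encodes the patient-consumer purchase probability and expected surplus and compares them with Monte Carlo estimates.

```python
import random


def rho_c(p, alpha, beta):
    """Purchase probability of a patient consumer (two uniform valuations)."""
    a = alpha - p
    if a <= beta:                       # equivalently p >= alpha - beta
        return (1.0 - beta) * a + a * a / 2.0
    return a - beta * beta / 2.0


def surplus_patient(p, alpha, beta):
    """Expected surplus of a patient consumer, E[max(v1 - p, v2, 0)]."""
    a = alpha - p
    if a <= beta:
        return (3 * beta**2 + 3 * a**2 * (1 - beta) + a**3) / 6.0
    return (3 * a**2 + 3 * beta**2 * (1 - a) + beta**3) / 6.0


def monte_carlo_patient(p, alpha, beta, n, seed):
    """Simulate n patient consumers; return (purchase frequency, mean surplus)."""
    rng = random.Random(seed)
    buys = 0
    total_surplus = 0.0
    for _ in range(n):
        v1 = rng.uniform(alpha - 1.0, alpha)   # value for the seller's product
        v2 = rng.uniform(beta - 1.0, beta)     # value for the competitor
        net1 = v1 - p
        # Buy from the seller if it beats both the competitor and not buying.
        if net1 >= 0.0 and net1 >= v2:
            buys += 1
        total_surplus += max(net1, v2, 0.0)
    return buys / n, total_surplus / n


# Hypothetical parameters for illustration.
freq, avg = monte_carlo_patient(p=0.5, alpha=0.8, beta=0.5, n=200_000, seed=0)
print("purchase probability:", freq, "vs closed form", rho_c(0.5, 0.8, 0.5))
print("consumer surplus:", avg, "vs closed form", surplus_patient(0.5, 0.8, 0.5))
```

The two branches of each closed form agree at $p = \alpha - \beta$, so either can be used at the boundary.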

Example 1 (Uniform WtP). Suppose that there are two products on the platform, and that for $\alpha, \beta \in [0, 1]$, each customer $t$ has willingness to pay that is independent and distributed uniformly over a unit square:

$$v_{1t} \sim U[\alpha - 1, \alpha], \qquad v_{2t} \sim U[\beta - 1, \beta].$$

Suppose that each arriving customer maximizes its net surplus (which equals zero when not buying) and that seller 2 sets a fixed price equal to 0 (or equivalently, that $v_{2t}$ represents consumer $t$’s value relative to some fixed price). Then, each customer’s probability of purchase as a function of the first product’s price $p \in (-\infty, \alpha]$, the platform’s promotion decision, and the consumer type is:

$$\bar{\rho}_0(p) = \mathbb{P}(v_1 - p \geq 0) = \alpha - p, \qquad \bar{\rho}_c(p) = \begin{cases} (1 - \beta)(\alpha - p) + \tfrac{1}{2}(\alpha - p)^2, & \text{if } p \geq \alpha - \beta \\ \alpha - p - \tfrac{1}{2}\beta^2, & \text{if } p < \alpha - \beta. \end{cases}$$

Recall that $\bar{\rho}_0(p)$ is the purchase probability from an impatient consumer when the seller is promoted, and $\bar{\rho}_c(p)$ is the purchase probability from a patient consumer who considers both products. The consumer surplus is:

$$\bar{W}_0(p) = \int_{\alpha - 1}^{\alpha} \max\{v_1 - p, 0\}\, dv_1 = \tfrac{1}{2}(\alpha - p)^2, \qquad \bar{W}_c = \int_{\beta - 1}^{\beta} \max\{v_2, 0\}\, dv_2 = \tfrac{1}{2}\beta^2,$$

$$\bar{W}_c(p) = \int_{\beta - 1}^{\beta} \int_{\alpha - 1}^{\alpha} \max\{v_1 - p, v_2, 0\}\, dv_1\, dv_2 = \begin{cases} \tfrac{1}{6}\left(3\beta^2 + 3(\alpha - p)^2(1 - \beta) + (\alpha - p)^3\right), & \text{if } p \geq \alpha - \beta \\ \tfrac{1}{6}\left(3(\alpha - p)^2 + 3\beta^2(1 - \alpha + p) + \beta^3\right), & \text{if } p < \alpha - \beta. \end{cases}$$

Histories, Strategies and Beliefs.

Given a space $X$, let $\Delta(X)$ be the space of probability measures on $X$. At the beginning of the horizon, before the observation of $\phi$, the platform commits to a joint promotion and information disclosure strategy $(\boldsymbol{\alpha}, \sigma)$. Denoting the set of possible signals by $S$, the platform’s information disclosure strategy is a signaling mechanism, $\sigma : \{\phi_L, \phi_H\} \to \Delta(S)$, which maps the true fraction of impatient consumers to a distribution over signals. We denote the realized signal by $s \in S$ and the space of signaling mechanisms by $\Sigma$. Let $\bar{h}_t = \left\langle s, (p_{t'}, y_{t'})_{t'=1}^{t-1} \right\rangle$ denote the signal and the sequence of seller’s posted prices and sales realizations prior to the beginning of period $t$. Moreover, denote the set of these as $\bar{H}_t = S \times (P \times \{0, 1\})^{t-1}$. The platform’s promotion strategy, $\boldsymbol{\alpha} = \{\alpha_t\}_{t=1}^{T}$, is a vector of mappings where $\alpha_t : P \times \{\phi_L, \phi_H\} \times \bar{H}_t \to [0, 1]$ specifies the probability that the seller is promoted in period $t$ as a function of the seller’s current price, the value of $\phi$, and the previous prices and sales observations. We denote the realized promotion decision at time $t$ by $a_t \in \{0, 1\}$ and the set of dynamic promotion policies by $A$.

In addition to $\bar{h}_t$, the seller also observes the platform’s announced strategy $(\boldsymbol{\alpha}, \sigma)$. Thus, denote the seller’s information at the beginning of period $t$ as:

$$h_1 = \langle s, \boldsymbol{\alpha}, \sigma \rangle, \qquad h_t = \left\langle s, \boldsymbol{\alpha}, \sigma, (p_{t'}, y_{t'})_{t'=1}^{t-1} \right\rangle \text{ for } t > 1.$$

Moreover, we denote by $\{\mathcal{H}_t = \sigma(h_t),\ t = 1, \ldots, T\}$ the filtration associated with the process $\{h_t\}_{t=1}^{T}$, and the set of possible histories at the beginning of period $t$ as $H_t = S \times A \times \Sigma \times (P \times \{0, 1\})^{t-1}$. The seller’s strategy is a vector of non-anticipating mappings $\boldsymbol{\pi} = \{\pi_t\}_{t=1}^{T}$, where each $\pi_t : H_t \to \Delta(P)$ maps the seller’s information in period $t$ to a distribution from which the seller’s period $t$ price is drawn. Denote the set of non-anticipating seller strategies as $\Pi$.

In each period, based on the available history of information, the seller updates his belief about $\phi$ according to Bayes’ rule. We denote the seller’s belief system by $\boldsymbol{\mu} = \{\mu_t\}_{t=1}^{T}$ where $\mu_t : H_t \to [0, 1]$ is the probability that he assigns to $\{\phi = \phi_H\}$, the fraction of impatient consumers being high.

Given a platform strategy $(\boldsymbol{\alpha}, \sigma)$ and seller policy $\boldsymbol{\pi} \in \Pi$, denote the platform’s expected payoff as:

$$W_T^{\boldsymbol{\alpha}, \sigma, \boldsymbol{\pi}}(\mu_0) = \mathbb{E}\left( \sum_{t=1}^{T} W(p_t, a_t) \,\middle|\, \boldsymbol{\alpha}, \sigma, \boldsymbol{\pi} \right), \tag{3}$$

where the expectation is with respect to $(\boldsymbol{p}, \boldsymbol{a}, \boldsymbol{y}, s, \phi)$ and $\mu_0$ is the commonly known prior for $\phi$. Moreover, denote the seller’s expected continuation (present and future) payoff at period $t$ and history $h \in H_t$ as:

$$V_{t,T}^{\boldsymbol{\alpha}, \sigma, \boldsymbol{\pi}}(h) = \mathbb{E}\left( \sum_{t'=t}^{T} v(p_{t'}, y_{t'}) \,\middle|\, h_t = h, \boldsymbol{\alpha}, \sigma, \boldsymbol{\pi} \right). \tag{4}$$

Platform Maximizes Consumer Surplus.

For many platforms, long-term revenue is primarily driven by attracting consumers to the platform. The approach of maximizing consumer surplus has been considered in previous models of platform design (see, e.g., Dinerstein et al. 2018 and references therein). A platform that instead seeks to attract more sellers would also be interesting to model. Moreover, we note that Assumption 2 is quite general and is satisfied in many instances by functions that, for example, balance the surplus accrued to consumers and producers or seek to maximize the probability of a consumer purchase.

One Learning Seller.

For the sake of tractability, we focus on a setting where there is a single seller who is learning and all others set the same price each period. While it is an interesting extension to consider multiple learning sellers, it would require either complex belief updates (as prices become signals of information to other sellers) or the fairly strong assumption that each seller observes each consumer's purchase decision.

Patience and Search Costs.

To simplify exposition, we characterize consumers by a patience type, determining whether they consider only the promoted product or all available products. While we do not model search costs in an explicit consumer utility model, we note that, under simple assumptions on the search costs of consumers, the above consumer behavior could be maintained as an outcome of common consumer utility models that explicitly account for search costs.

Platform Leads, Seller Follows.

Our model assumes that the platform commits to a dynamic promotion decision and signaling mechanism upfront. Therefore, in each period, the seller knows the probability of being promoted as a function of the posted price, given the true value of $\phi$. We note that the platform's ability to commit to a strategy is in line with the information design literature, e.g., Kamenica and Gentzkow (2011).

Seller Information.

Our model and analysis are motivated by settings where many consumers arrive to the platform. In this case, it would be difficult for a seller to track how prominently their product is featured to each consumer, as the platform can direct customer traffic from many different avenues: home pages, search pages, competitors' products, etc. Therefore, the seller would have little way of knowing how many customers considered its product without the platform sharing that information. We do, however, assume that the seller knows how many potential consumers have arrived, which relies on market characteristics, and how many of them purchased its product.
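The Bayesian belief update used throughout the formulation can be illustrated with a minimal sketch. It assumes the seller knows the sale probability at the posted price under each value of $\phi$ (as implied by the announced policy); the function name and arguments are hypothetical and not part of the original model notation.

```python
def update_belief(mu, sale, prob_sale_high, prob_sale_low):
    """Posterior probability that phi = phi_H after one sales observation.

    mu             -- prior probability assigned to {phi = phi_H}
    sale           -- True if y_t = 1 (a sale occurred), False otherwise
    prob_sale_high -- P(y_t = 1 | phi = phi_H) at the posted price (assumed known)
    prob_sale_low  -- P(y_t = 1 | phi = phi_L) at the posted price (assumed known)
    """
    like_high = prob_sale_high if sale else 1.0 - prob_sale_high
    like_low = prob_sale_low if sale else 1.0 - prob_sale_low
    denom = mu * like_high + (1.0 - mu) * like_low
    if denom == 0.0:
        # The observation has probability zero under the prior; keep the belief.
        return mu
    return mu * like_high / denom
```

When the two sale probabilities coincide, the posterior equals the prior, so the observation carries no information about $\phi$.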

In this section, we consider outcomes when the seller uses a myopic Bayesian pricing policy; we return to consider the general class of dynamic pricing policies in §4. We start by formally defining the seller's myopic Bayesian pricing policy. We then analyze the performance of two natural platform policies, consisting of a myopic promotion policy together with truthful and uninformative information disclosure. In demonstrating their sub-optimality, we illustrate how, oftentimes, the expected consumer surplus might be greater if the seller does not learn $\phi$, and therefore a myopic promotion policy might not be optimal for the platform. We then leverage this insight to develop policies that balance the platform's goal of incentivizing the seller to set low prices with constraining the information revealed by sales observations. While our results apply generally to the formulation introduced in the previous section, we illustrate the results using the simple demand model in Example 1, where consumers have uniformly distributed values for two products.

We consider a myopic Bayesian pricing policy, denoted by $\boldsymbol{\pi}^*$, defined as follows.

Definition 1 (Myopic Bayesian Pricing Policy). In every period $t$ and at every history $h \in H_t$, a myopic Bayesian pricing policy $\boldsymbol{\pi}^* = \{\pi^*_t\}_{t=1}^{T}$ sets a price $p_t \in P$ that maximizes the seller's expected revenue in the current period given history $h$ and $\alpha_t$. That is, $\pi^*_t$ satisfies:

$$\mathbb{P}\left( p_t \in \arg\max_{p \in P} \mathbb{E}_{a_t, y_t, \phi}\left( v(p, y_t) \mid h_t = h \right) \,\middle|\, \pi^*_t \right) = 1. \quad (5)$$

If multiple prices satisfy (5), $\boldsymbol{\pi}^*$ selects one which maximizes the current consumer surplus. (This is akin to considering sender-preferred equilibria, which is standard in models of Bayesian persuasion; see the related discussion in Kamenica and Gentzkow (2011) as well as in Drakopoulos et al. (2018).)

Under $\boldsymbol{\pi}^*$, the price in each period depends only on the current promotion policy and the seller's current belief about $\phi$. Specifically, the promotion policy in future periods does not affect the price set by a myopic seller. We note that considering myopic pricing decisions reduces the complexity of the seller's decision, yet still reflects a fair level of seller sophistication, as it requires the seller to constantly update beliefs and prices. (Moreover, in many settings with uncertainty about demand, myopic pricing policies were shown to achieve good performance in terms of maximizing the seller's long-term payoffs; see, e.g., the related discussion in Harrison et al. (2012).)

In general, in each period the posted price may affect the seller's current revenue as well as the platform's future promotion policy, and therefore the seller's pricing policy could potentially depend on the history in complex ways. (Recall that we allow the platform's policy to be a function of the full history of prices.) Nevertheless, from an analysis perspective, there is an advantage in focusing on policies that depend on the history in a simple way. For that purpose, we next define the set of promotion policies that depend on the history only through the seller's current belief.

Definition 2 (Promotion Policies Based on Seller's Belief). The set of promotion policies $\mathcal{A}^M \subset \mathcal{A}$ are those which are constant across histories that generate the same belief. That is, $\boldsymbol{\alpha}' \in \mathcal{A}^M$ if and only if, for all $t = 1, \ldots, T$, $\sigma \in \Sigma$, and any $\bar{h}', \bar{h}'' \in \bar{H}_t$ such that $\mu_t(\langle \sigma, \boldsymbol{\alpha}', \bar{h}' \rangle) = \mu_t(\langle \sigma, \boldsymbol{\alpha}', \bar{h}'' \rangle)$, one has $\alpha'_t(p, \phi, \bar{h}') = \alpha'_t(p, \phi, \bar{h}'')$ for all $p \in P$ and $\phi \in \{\phi_L, \phi_H\}$.

In the following lemma we establish that when the seller is myopic, it is without loss of optimality to consider promotion policies in $\mathcal{A}^M$.

Lemma 1 (Dependence on Histories Through Beliefs). Suppose the seller is using the myopic Bayesian pricing policy $\boldsymbol{\pi}^*$. Then, for any $\boldsymbol{\alpha} \in \mathcal{A}$, $\sigma \in \Sigma$, there exists a promotion policy $\boldsymbol{\alpha}' \in \mathcal{A}^M$ such that:

$$W^{\boldsymbol{\alpha}, \sigma, \boldsymbol{\pi}^*}_{T}(\mu) \le W^{\boldsymbol{\alpha}', \sigma, \boldsymbol{\pi}^*}_{T}(\mu).$$

Formal proofs of this and subsequent results can be found in Appendix B. We next describe the key idea of the proof. We observe that, conditional on the current promotion policy (as a function of $\phi$ and $p$) and the current belief, the seller's expected revenue in the current period is independent of the history. Therefore, a myopic seller's pricing decisions at histories with the same belief and promotion policy are identical. Hence, if the platform uses a promotion policy $\boldsymbol{\alpha}$ where two different histories, $\bar{h}', \bar{h}'' \in \bar{H}_t$, generate the same belief but different continuation values, we can construct a policy $\boldsymbol{\alpha}'$ which generates the same expected continuation value at $\bar{h}'$ and $\bar{h}''$ by altering $\{\alpha_{t'}\}_{t'=t}^{T}$. We rely on the fact that: (i) the platform can generate the same consumer surplus at both histories, because the history affects the myopic Bayesian seller's pricing decisions only through its effect on the seller's belief and the current promotion policy; and (ii) changes to the promotion policy in periods $t, \ldots, T$ do not affect the expected consumer surplus in previous periods.

3.2 Myopic Promotion Policy

We first analyze the consumer surplus generated by the platform when it maximizes instantaneous consumer surplus without considering the long-term consequences of the seller's learning. From Lemma 1, myopic pricing decisions only depend on the history through the promotion policy and the current belief. Therefore, in any period and at any history that corresponds to the same belief $\mu \in [0, 1]$, the platform's instantaneous problem is the same. Thus, it is without loss to set $T = 1$ and characterize a myopic promotion policy by solving the following for each belief $\mu \in [0, 1]$:

$$\max_{p \in P,\ \alpha : P \times \{\phi_L, \phi_H\} \times [0,1] \to [0,1]} \ \mathbb{E}_{\phi}\left( \phi \left( \bar{W}_c + \alpha(p, \phi, \mu)\left( \bar{W}(p) - \bar{W}_c \right) \right) + (1 - \phi)\, \bar{W}_c(p) \,\middle|\, \mu \right)$$

s.t.
$$p \in \arg\max_{p' \in P} \ \mathbb{E}_{\phi}\left( \phi\, \alpha(p', \phi, \mu)\, p' \bar{\rho}(p') + (1 - \phi)\, p' \bar{\rho}_c(p') \,\middle|\, \mu \right), \quad (6)$$

where $p$ is the myopically optimal price that is induced by the promotion policy. The constraint ensures that $p$ is myopically optimal, and letting $p$ be a variable ensures that $p$ maximizes consumer welfare among all myopically optimal prices.

By Assumption 2, for each fixed $\phi$, the expected consumer surplus is decreasing in the price $p$; thus, in solving (6), the platform seeks to incentivize the seller to set a low price. However, the platform is constrained in its ability to do so because the seller can always choose to ignore the platform's promotion policy and focus on selling exclusively to patient customers. Let $p^*$ denote the seller's revenue-maximizing price for patient consumers, that is,

$$p^* := \arg\max_{p \in P} \ p\, \bar{\rho}_c(p). \quad (7)$$

Note that $p^*$ is unique due to Assumption 1, which requires $p \bar{\rho}_c(p)$ to be strictly concave. Denote the expected fraction of impatient customers as a function of the posterior belief $\mu$ as $\bar{\phi}(\mu) := \phi_L + (\phi_H - \phi_L)\mu$. Given belief $\mu$, the seller believes that a patient consumer arrives with probability $1 - \bar{\phi}(\mu)$, and therefore the seller's maximum expected payoff from selling exclusively to patient consumers is $(1 - \bar{\phi}(\mu))\, p^* \bar{\rho}_c(p^*)$. To incentivize the seller to set a lower price $p < p^*$, the platform must promote the seller with sufficiently high probability at price $p$ relative to the probability of promotion at price $p^*$, so that the seller's loss in revenue from patient consumers is at least made up for by revenue from impatient consumers. We next continue with Example 1 (put forth in §2) and solve for the optimal myopic promotion policy. This derivation and additional analysis are detailed in Appendix A.2.
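The incentive constraint described above can be explored numerically. The following is a minimal sketch, not the derivation in Appendix A.2: it assumes the platform promotes with probability one at a single candidate price (in both states) and never elsewhere, searches a price grid, and uses hypothetical linear demand functions chosen only for illustration. All function and variable names are assumptions.

```python
def best_patient_only_price(rho_c, prices):
    """Grid approximation of p* = argmax_p p * rho_c(p), as in (7)."""
    return max(prices, key=lambda p: p * rho_c(p))


def lowest_inducible_price(phi_bar, rho, rho_c, prices):
    """Lowest grid price a myopic seller weakly prefers to ignoring promotion.

    Assumes promotion with probability one at the candidate price (in both
    states) and probability zero at every other price, so the seller's
    alternative is the patient-only revenue (1 - phi_bar) * p* * rho_c(p*).
    """
    p_star = best_patient_only_price(rho_c, prices)
    outside = (1.0 - phi_bar) * p_star * rho_c(p_star)
    tol = 1e-12  # guards against floating-point ties at the boundary
    candidates = [
        p for p in prices
        if p <= p_star
        and phi_bar * p * rho(p) + (1.0 - phi_bar) * p * rho_c(p) >= outside - tol
    ]
    return min(candidates)


# Hypothetical linear demand, chosen only for illustration.
rho = lambda p: max(0.0, 1.0 - p)    # purchase probability when promoted
rho_c = lambda p: max(0.0, 0.8 - p)  # purchase probability under competition
grid = [i / 1000 for i in range(1001)]
p_low = lowest_inducible_price(0.3, rho, rho_c, grid)
```

Because $p^*$ itself always satisfies the constraint, the candidate set is non-empty whenever $\bar{\rho}(p^*) \ge 0$. A larger expected fraction of impatient consumers loosens the constraint and lowers the inducible price.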

Example 2 (Uniform WtP: Myopic Promotion Policy). Consider the demand structure from Example 1, and suppose $\alpha > \beta\left(2 - \frac{\beta}{2}\right)$. Then, $p^* = \frac{1}{4}(2\alpha - \beta^2)$, and a myopic promotion policy has:

$$\alpha(p, \phi, \mu) = \begin{cases} 1, & \text{if } p = p(\mu) \\ 0, & \text{otherwise,} \end{cases} \qquad p(\mu) = \frac{1}{4}\left( 2\alpha - \beta^2\left(1 - \bar{\phi}(\mu)\right) - \sqrt{\bar{\phi}(\mu)}\, \sqrt{4\alpha^2 - \beta^4\left(1 - \bar{\phi}(\mu)\right)} \right).$$

(There is not a unique myopic promotion policy, because the probability of promotion at prices not selected by the seller does not affect the outcome or expected payoffs.)

In Example 2, even if the seller sets price $p^*$, the seller's product generates greater expected consumer surplus than the competitor (note that $\alpha - p^* > \beta$). Therefore, the platform promotes the seller because doing so improves the surplus for impatient consumers and allows the platform to incentivize a lower price. However, even in cases where the seller's product generates less surplus for consumers, the myopic promotion policy typically still promotes the seller with positive probability because it allows the platform to incentivize a lower price. By doing so, the platform would lower the expected surplus for impatient customers in order to improve the surplus of patient customers.

For $\phi \in \{\phi_L, \phi_H\}$ and belief $\mu \in [0, 1]$, let $\bar{W}(\phi, \mu)$ denote the expected consumer surplus generated by the optimal promotion policy, conditional on the value $\phi$. For each $\phi \in \{\phi_L, \phi_H\}$, $\bar{W}(\phi, \mu)$ is increasing in the seller's belief $\mu$, because as $\mu$ increases the seller believes that the platform influences the consideration set of a larger fraction of consumers (see the decreasing price in Example 2).

Signaling Mechanisms. Along with the myopic promotion policy defined above, we consider two natural signaling mechanisms: (i) truthful, that is, $\sigma^T(\phi) = \phi$ for $\phi \in \{\phi_L, \phi_H\}$; and (ii) uninformative, such as $\sigma^U(\phi) = \phi_L$ for $\phi \in \{\phi_L, \phi_H\}$.
From the description of $\bar{W}(\phi, \mu)$, one may observe that, relative to an uninformative signal, a truthful signal increases the single-period expected consumer surplus when $\phi = \phi_H$ and decreases it when $\phi = \phi_L$. On the other hand, an uninformative signal pools the seller's pricing decision at the same price regardless of the realized value of $\phi$.

In general, the expected consumer surplus from a truthful signal may be larger or smaller than that from an uninformative one (depending on the shape of $W$ and $\rho$). Figure 4 illustrates that, in the setting described in Example 1, concealing information is valuable. In that example, in one period, revealing no information can generate consumer surplus that is 5% higher than the surplus generated by a truthful signal. The outcome that concealing information through the signaling mechanism can be valuable for the platform in the short term is not unique to Example 1; it appears in many other demand models as well. However, when there are multiple periods, the advantage of concealing information might be lost when the seller can learn the true value of $\phi$ from sales observations.

In particular, to maximize the expected future consumer surplus in cases where no (or partial) information disclosure is optimal, the platform may consider promotion policies that ensure the seller's sales observations convey no information about $\phi$. Restricting the information content of sales observations may come at a cost, however, because it imposes limitations on the platform's promotion policy. In fact, in order to guarantee that sales observations convey no information about $\phi$, the platform may have to promote the seller at higher prices and divert impatient consumers away from the best product offering.

[Figure 4 plot: average consumer surplus as a function of the belief $\mu$ for $T = 1$, under the uninformative signal $\sigma^U$ and the truthful signal $\sigma^T$.]

Figure 4: Comparison of myopic promotion policy with two different signaling mechanisms in the setting described in Examples 1 and 2.

The above discussion suggests that when designing its promotion policy, the platform faces a tradeoff between increasing the consumer surplus in the current period and limiting the information contained in sales observations, which in turn impacts consumer surplus in future periods. A class of promotion policies that will be key for balancing this tradeoff are those which confound a myopic seller's learning in all periods $t = 1, \ldots, T$. As we will establish, these policies are fundamental for determining the achievable long-run average consumer surplus when the seller is myopic.

Definition 3 (Confounding Promotion Policies). Suppose the seller uses the myopic pricing policy, $\boldsymbol{\pi}^*$. For each belief $\mu \in [0, 1]$, define the set of confounding promotion policies $\mathcal{A}^C(\mu) \subset \mathcal{A}^M$ as those which prevent the seller's belief from updating throughout periods $t = 1, \ldots, T$. That is, $\boldsymbol{\alpha} \in \mathcal{A}^C(\mu)$ if and only if, for all $t = 1, \ldots, T$, one has

$$\mathbb{P}\left( \mu_{t+1} = \mu \mid \mu_t = \mu, \boldsymbol{\pi}^*, \boldsymbol{\alpha} \right) = 1.$$

Definition 3 encompasses two ways for a sales observation to contain no new information about the true fraction of impatient consumers, $\phi$. The first one is trivial: if the seller knows the true value with certainty, that is, $\mu \in \{0, 1\}$, then sales observations do not affect his belief, and one has $\mathcal{A}^C(\mu) = \mathcal{A}$. On the other hand, if $\mu \in (0, 1)$, the sales observation must be uninformative about $\phi$. To do so, the platform must design $\boldsymbol{\alpha}$ so that, for each period $t$ and for all $\mu \in [0, 1]$, at the price $p$ set by a myopic seller given $\mu_t = \mu$ and $\boldsymbol{\alpha}$:

$$\mathbb{P}\left( y_t = 1 \mid \phi = \phi_H, \mu_t = \mu, p_t = p, \alpha_t \right) = \mathbb{P}\left( y_t = 1 \mid \phi = \phi_L, \mu_t = \mu, p_t = p, \alpha_t \right). \quad (8)$$

In any given period, a patient or impatient consumer may arrive. A patient consumer arrives and purchases from the seller with probability $(1 - \phi)\bar{\rho}_c(p)$, while an impatient consumer does so with probability $\phi\, \mathbb{P}(\alpha(\phi, p, h) = 1)\, \bar{\rho}(p)$. If $\bar{\rho}_c(p) > \bar{\rho}(p)$, then the probability that a patient consumer purchases in period $t$ depends on $\phi$, and $\alpha(\phi, p, h)$ must also depend on $\phi$ to ensure (8) holds.
In particular, the platform must promote the seller to more consumers if $\phi = \phi_H$ so that:

$$\phi_H\, \alpha(\phi_H, p, h)\, \bar{\rho}(p) + (1 - \phi_H)\, \bar{\rho}_c(p) = \phi_L\, \alpha(\phi_L, p, h)\, \bar{\rho}(p) + (1 - \phi_L)\, \bar{\rho}_c(p).$$

We establish that for all $\mu \in [0, 1]$ the set $\mathcal{A}^C(\mu)$ is non-empty through Example 4 in Appendix A.1. Note that the promotion policy we describe is one of possibly many different confounding policies, each of which may generate a different consumer surplus and seller revenue.

A promotion policy that confounds the seller and prevents learning might not maximize instantaneous consumer surplus, as it may weaken incentives for a low price and/or commit to divert some impatient consumers from the product that generates the largest expected consumer surplus. For a given posterior belief $\mu \in [0, 1]$, define the maximum consumer surplus achievable by a confounding promotion policy as:

$$W^C(\mu) := \max_{\boldsymbol{\alpha} \in \mathcal{A}^C(\mu)} \ \frac{1}{T}\, \mathbb{E}\left( \sum_{t=1}^{T} W(p_t, a_t) \,\middle|\, \boldsymbol{\alpha}, \boldsymbol{\pi}^*, \mu \right). \quad (9)$$

One benefit in characterizing $W^C(\mu)$, which we discuss in further detail in §5, is that it is independent of $T$, so we could formulate (9) with $T = 1$. The seller's beliefs are static by construction, so since the seller prices myopically, the maximum consumer surplus is the same in each period. We now continue Example 2 and characterize the optimal confounding promotion policy. One may observe that the myopically optimal promotion policy provided in Example 2 is not confounding as, at the myopically optimal price $p$, $\alpha(p, \phi_L, \mu) = \alpha(p, \phi_H, \mu) = 1$ and $\bar{\rho}_c(p) < \bar{\rho}(p)$. Thus, to confound the seller, the platform must decrease the sale probability when $\phi = \phi_H$ by decreasing $\alpha(p, \phi_H, \mu)$ and/or incentivize a price where the difference between $\bar{\rho}_c(p)$ and $\bar{\rho}(p)$ is smaller.

Example 3 (Uniform WtP: Optimal Confounding Promotion Policy). Consider the demand structure from Example 1, and let $\alpha > \beta\left(2 - \frac{\beta}{2}\right)$. For $\mu \in (0, 1)$, the optimal confounding promotion policy, $\alpha^C$, is:

$$\alpha^C(p, \phi_L, \mu) = \begin{cases} 1, & \text{if } p = p^C(\mu) \\ 0, & \text{otherwise,} \end{cases} \qquad \alpha^C(p, \phi_H, \mu) = \begin{cases} \dfrac{\alpha - p^C(\mu) - \beta^2/2}{\alpha - p^C(\mu)} \left( \dfrac{\phi_H - \phi_L}{\phi_H} \right) + \dfrac{\phi_L}{\phi_H}, & \text{if } p = p^C(\mu) \\ 0, & \text{otherwise,} \end{cases}$$

where $p^C(\mu)$ is defined as:

$$p^C(\mu) = \frac{1}{4}\left( 2\alpha - \beta^2(1 - \phi_L) - \sqrt{(2\alpha - \beta^2)^2 (\phi_H - \phi_L)\mu + \phi_L(4\alpha^2 - \beta^4) + \beta^4 \phi_L^2} \right).$$

For $\mu \in \{0, 1\}$, one has $\alpha^C(p, \phi, \mu) = \alpha(p, \phi, \mu)$ and $p^C(\mu) = p(\mu)$, where these are described in Example 2. Moreover, $p^C(\mu)$ is the price set by a myopic seller given belief $\mu$ and the promotion policy $\alpha^C(\mu)$.

In this example, the platform reduces the promotion probability of the seller at the myopically optimal price when $\phi = \phi_H$. Thus, some impatient consumers only see the product that generates lower value for them (in expectation). Moreover, the seller posts a higher price, which lowers the expected surplus for all consumers.
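The confounding condition (8) pins down the promotion probability under $\phi_H$ once the probability under $\phi_L$ and the purchase probabilities at the posted price are fixed. The sketch below solves the displayed equality for that probability; the function name, argument names, and numerical values are hypothetical and used only for illustration.

```python
def confounding_promotion_prob(phi_low, phi_high, alpha_low, rho, rho_c):
    """Promotion probability under phi_H that equalizes sale probabilities.

    Solves phi_H * a_H * rho + (1 - phi_H) * rho_c
         = phi_L * a_L * rho + (1 - phi_L) * rho_c   for a_H,
    where rho and rho_c are purchase probabilities at the posted price.
    """
    if phi_high <= 0.0 or rho <= 0.0:
        raise ValueError("phi_high and rho must be positive")
    a_high = (phi_low * alpha_low * rho + (phi_high - phi_low) * rho_c) / (phi_high * rho)
    if not 0.0 <= a_high <= 1.0:
        raise ValueError("no valid confounding probability at this price")
    return a_high


def sale_prob(phi, promo_prob, rho, rho_c):
    """Probability of a sale in one period for a given value of phi."""
    return phi * promo_prob * rho + (1.0 - phi) * rho_c


# Illustrative numbers only (not taken from the paper).
a_h = confounding_promotion_prob(0.2, 0.6, 1.0, rho=0.5, rho_c=0.3)
```

With these illustrative numbers the sale probability is the same under both values of $\phi$, so a Bayesian seller's belief does not move after either outcome.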
Importantly, at belief $\mu \in \{0, 1\}$ this policy coincides with the myopic promotion policy, since the confounding constraint is relaxed as $\mathcal{A}^C(\mu) = \mathcal{A}$ because the seller's belief never updates. Thus, since the space of confounding policies increases discontinuously at $\mu \in \{0, 1\}$, one may expect that $W^C(\mu)$ also increases discontinuously at these beliefs. Indeed, one may observe that, in Example 3, the price set by a myopic seller (and thus the consumer surplus) under the optimal confounding policy has a discontinuous jump at $\mu = 1$, because confounding is costly for the platform at beliefs $\mu$ near 1 but imposes no cost at $\mu = 1$. (One may also expect to observe a discontinuous increase in $W^C(\mu)$ at $\mu = 0$, but as discussed in Example 4 (see Appendix A.1), for any $\alpha(p, \phi_L, \mu)$ there exists $\alpha(p, \phi_H, \mu)$ such that the policy is confounding. Thus, a promotion policy can be made confounding only by changing $\alpha(p, \phi_H, \mu)$, which has a diminishing effect on the expected consumer surplus for $\mu$ near 0.)

In this section, we leverage the notion of confounding policies to characterize the long-run average optimal consumer surplus. For any function $f : \mathbb{R} \to \mathbb{R}$, define $co(f)$ as the concavification of $f$:

$$co(f)(\mu) := \sup\{ z \mid (\mu, z) \in Conv(f) \},$$

where $Conv(f)$ denotes the convex hull of the set $\{(x, t) : t \le f(x)\}$. (This function appears often in the information design literature when characterizing the optimal signaling mechanism and corresponding payoff; see, e.g., Aumann and Maschler (1995) and Kamenica and Gentzkow (2011).) The following result characterizes the maximum long-run average consumer surplus that can be generated by a signaling mechanism and a dynamic promotion policy.

Theorem 1 (Characterization of Long-Run Average Optimal Consumer Surplus). Let $W^C(\mu)$ be defined as in (9). For all $\mu \in [0, 1]$,

$$\lim_{T \to \infty} \ \sup_{\boldsymbol{\alpha} \in \mathcal{A},\, \sigma \in \Sigma} \ \frac{1}{T}\, W^{\boldsymbol{\alpha}, \sigma, \boldsymbol{\pi}^*}_{T}(\mu) = co(W^C)(\mu).$$
The characterization in Theorem 1 follows from the fact that for any platform promotion policy, the seller's belief $\mu_t$ asymptotically converges to a limit belief as the number of periods grows large, and the expected consumer surplus that is achievable at this limit belief determines the long-run average expected consumer surplus. Thus, the long-run average payoff is determined by the expected consumer surplus at the limit belief, and the platform's challenge is to design a policy that ensures the distribution of seller limit beliefs is optimal. Figure 5 demonstrates that for the same demand model that was used in Figure 4, the optimal long-run average consumer surplus generates a substantial increase (up to 3.5%) in consumer surplus relative to truthful revelation.

[Figure 5 plot: average consumer surplus as a function of the belief $\mu$, comparing the truthful signal $\sigma^T$ with $co(W^C)(\mu)$.]

Figure 5: Long-Run Average Optimal Policy in the setting described in Example 1.
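The concavification $co(f)$ that appears in Theorem 1 can be approximated on a grid of beliefs by computing the upper concave envelope of the sampled points. The sketch below is one standard way to do this (an upper convex hull scan followed by linear interpolation); it is an illustration under the assumption that $f$ is sampled at strictly increasing grid points, and the function name is hypothetical.

```python
def concavify(xs, fs):
    """Upper concave envelope of the points (xs[i], fs[i]), evaluated at xs.

    xs must be strictly increasing. Returns a list of envelope values.
    """
    hull = []  # indices of the vertices of the upper hull, left to right
    for i in range(len(xs)):
        while len(hull) >= 2:
            i0, i1 = hull[-2], hull[-1]
            # Positive or zero cross product: point i1 lies on or below the
            # chord from i0 to i, so it is not a vertex of the envelope.
            cross = (xs[i1] - xs[i0]) * (fs[i] - fs[i0]) \
                - (fs[i1] - fs[i0]) * (xs[i] - xs[i0])
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(i)

    if len(hull) == 1:
        return [fs[hull[0]] for _ in xs]

    envelope = []
    k = 0  # index of the current hull segment
    for x in xs:
        while k + 2 < len(hull) and xs[hull[k + 1]] < x:
            k += 1
        a, b = hull[k], hull[k + 1]
        w = (x - xs[a]) / (xs[b] - xs[a])
        envelope.append((1.0 - w) * fs[a] + w * fs[b])
    return envelope


# Illustrative values with jumps at the endpoints, loosely mimicking the
# discontinuity of W^C at beliefs 0 and 1 discussed in the text.
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
values = [1.0, 0.2, 0.3, 0.1, 1.0]
envelope = concavify(grid, values)
```

In this illustrative case the envelope is the chord between the two endpoint values, which corresponds to a signal that splits the prior between the extreme beliefs. A function that is already concave is returned unchanged.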

We next describe the key ideas of the proof of Theorem 1. To prove that the left-hand side is bounded from above by $co(W^C)$, we show that for any $\epsilon > 0$, the expected number of periods in which the promotion policy generates a consumer surplus strictly greater than $W^C(\mu_t) + \epsilon$ is finite. This result is established through two lemmas. First, we show that for any $\epsilon > 0$ there exists $\delta > 0$ such that, in any period $t$ and at any belief $\mu_t$, if the promotion policy $\alpha_t$ generates an expected consumer surplus greater than $W^C(\mu_t) + \epsilon$, then at the myopically optimal price, $|\mathbb{P}(y_t = 1 \mid \phi = \phi_H, \alpha_t) - \mathbb{P}(y_t = 1 \mid \phi = \phi_L, \alpha_t)| > \delta$. Second, we show that for any $\delta > 0$, the expected value of the seller's belief converges to the true value of $\phi$ exponentially fast in the number of periods in which the probabilities of a sale under $\phi = \phi_H$ and $\phi = \phi_L$ differ by $\delta$; see Lemma 3 (Appendix B) for more details. Finally, we show that the optimal signal generates a consumer surplus equal to $co(W^C)(\mu)$. By constructing a policy $(\boldsymbol{\alpha}, \sigma)$ for which $\frac{1}{T} W^{\boldsymbol{\alpha}, \sigma, \boldsymbol{\pi}^*}_{T}(\mu) = co(W^C)(\mu)$ for all $T \ge 1$, we establish that the left-hand side is also bounded from below by $co(W^C)$.

We note that the myopic pricing policy that is employed by the seller plays an important role in establishing Theorem 1. For example, a seller may deviate from myopic pricing and learn the true value of $\phi$, in which case the long-run average expected consumer surplus equals the one under truthful revelation. In the next section, however, we introduce equilibrium results which establish that it is not in the best interest of the seller to do so; in fact, the myopic pricing policy is a best response to platform policies that achieve the expected consumer surplus that is characterized in Theorem 1.

So far we have analyzed optimal policies under the assumption that the seller uses a Bayesian myopic pricing policy, $\boldsymbol{\pi}^*$. In this section we proceed to consider the general set of non-anticipating pricing policies $\Pi$ introduced in the model formulation.

The following theorem establishes that for any $T \ge 1$, there exists a Bayesian Nash equilibrium $(\boldsymbol{\alpha}, \sigma, \boldsymbol{\pi})$ where the seller's best response to the platform policy is to price myopically each period, and the platform policy generates average consumer surplus equal to $co(W^C)(\mu)$.

Theorem 2 (Bayesian Nash Equilibrium). Fix $T \ge 1$. There exists a platform policy $\boldsymbol{\alpha} \in \mathcal{A}$, $\sigma \in \Sigma$ and seller pricing policy $\boldsymbol{\pi}$ such that:

$$co(W^C)(\mu) = \frac{1}{T}\, W^{\boldsymbol{\alpha}, \sigma, \boldsymbol{\pi}}_{T}(\mu) \ \ge\ \frac{1}{T}\, W^{\boldsymbol{\alpha}', \sigma', \boldsymbol{\pi}}_{T}(\mu), \quad \forall \boldsymbol{\alpha}' \in \mathcal{A},\ \sigma' \in \Sigma, \quad (10)$$

and at each period $t = 1, \ldots, T$ and every $\bar{h} \in \bar{H}_t$, $\boldsymbol{\pi}$ is myopic (i.e., satisfies (5)) and the best response to $(\boldsymbol{\alpha}, \sigma)$. That is,

$$V^{\boldsymbol{\alpha}, \sigma, \boldsymbol{\pi}}_{t,T}\left( \langle \boldsymbol{\alpha}, \sigma, \bar{h} \rangle \right) \ \ge\ V^{\boldsymbol{\alpha}, \sigma, \boldsymbol{\pi}'}_{t,T}\left( \langle \boldsymbol{\alpha}, \sigma, \bar{h} \rangle \right), \quad \forall \boldsymbol{\pi}' \in \Pi,\ \bar{h} \in \bar{H}_t,\ t = 1, \ldots, T. \quad (11)$$

To prove Theorem 2, we first establish the existence of a promotion policy $\boldsymbol{\alpha} \in \mathcal{A}$ that guarantees, for all $\mu \in [0, 1]$, an expected consumer surplus of $W^C(\mu)$ in each period, given that the seller is pricing myopically. We then prove that an optimal signal exists by adapting standard analysis in information design to our setting, which establishes that the platform policy generates expected consumer surplus $co(W^C)(\mu)$ in each period. Since we do not require sequential rationality from the seller in response to non-equilibrium platform policies, we construct a pricing policy $\boldsymbol{\pi}$ that is myopic for all histories $\langle \boldsymbol{\alpha}, \sigma, \bar{h} \rangle$, but prices at $p^*$ (see (7)) at all other histories. Thus, the platform cannot improve consumer surplus by deviating from $(\boldsymbol{\alpha}, \sigma)$. To complete the result, we establish that pricing myopically is a best response to the platform's equilibrium policy in all time periods by showing that the seller's expected continuation payoff weakly decreases if he ever deviates from pricing myopically.

While Theorem 1 considers $\boldsymbol{\pi}^*$, which sets the myopically optimal price for all $h \in H_t$ (and therefore all $(\boldsymbol{\alpha}, \sigma)$), Theorem 2 establishes an equilibrium where the seller is myopic only in response to $(\boldsymbol{\alpha}, \sigma)$. Alternatively, we can establish an Approximate Bayesian Nash Equilibrium in finite time where the seller uses the pricing policy $\boldsymbol{\pi}^*$ and prices myopically at all histories. It is approximate because if the seller commits to $\boldsymbol{\pi}^*$, as discussed in §3, the platform could increase average consumer surplus by deviating to an alternative policy, but only by a small amount which diminishes over time.

Notably, Theorem 2 implies that "semi-myopic" policies that price myopically unless the resulting price, given the seller's belief, is confounding (see, e.g., Harrison et al. (2012)) are not effective deviations for the seller. When the myopic price is confounding, semi-myopic policies select an alternate price that generates information; in that sense these policies are designed to avoid precisely the prices that the platform incentivizes the seller to set. Interestingly, in equilibrium, the platform does not confound the seller by ensuring that every price is confounding; in fact, the consumer surplus generated by a promotion policy that confounds the seller at every price $p \in P$ is dominated by equilibria where the platform truthfully reveals $\phi$. Instead, the platform designs the promotion policy so that the seller is incentivized to set the confounding price and does not deviate, because the additional information gained by such a deviation would generate no value for the seller. The promotion policy is designed so that if the seller deviates from pricing myopically, then his payoff in each period equals the maximum expected revenue from selling exclusively to patient consumers: $(1 - \bar{\phi}(\mu_t))\, p^* \bar{\rho}_c(p^*)$. One may observe that this revenue is linear in $\mu_t$. Since the seller's beliefs $\mu_t$ are a martingale (as the seller is Bayesian) and the seller's expected continuation value is linear in $\mu_t$, in expectation, the acquired information generates no value for the seller, and myopic pricing is optimal.

The equilibrium characterization in Theorem 2 does not rule out the existence of other equilibria, including ones that may generate even higher consumer surplus.
For example, it is possible that there exist equilibria where the platform requires the seller to set low prices for a fixed number of periods [...] $\boldsymbol{\alpha}$ can depend on the entire history. Depending on the structure of $\rho$ and $W$, such alternation between high and low prices might generate greater average consumer welfare in equilibrium, relative to the equilibrium characterized in Theorem 2, while maintaining the seller's total expected revenue. However, such equilibria require both parties to have precise knowledge of the length of the selling horizon, as well as strong commitment on behalf of the platform; even slight misspecification of the horizon length may lead to profitable deviations from both sides. Thus, such equilibria are difficult to implement from a practical perspective.

(With a promotion policy that confounds the seller at every price, the purchase probability must be independent of $\phi$ at every price, which can be used to establish that the maximum consumer surplus under such a promotion policy equals $\mu \bar{W}(\phi_H, 0) + (1 - \mu) \bar{W}(\phi_L, 0)$, where $\bar{W}(\phi, \mu)$ is defined with the myopic promotion policy that solves (6). As noted in the previous section, $\bar{W}(\phi, \mu)$ is increasing in $\mu$, and therefore this policy is dominated by truthful revelation.)

Nevertheless, we next establish that the equilibrium characterized by Theorem 2 is platform-optimal within a class of maximin equilibria in which strategies are not predicated on precise knowledge of the horizon length, but rather designed to maximize payoff over the worst-case realized horizon length (up to time $T$). For $\mu \in [0, 1]$ and $T \ge 1$, and given the strategies $(\boldsymbol{\alpha}, \sigma, \boldsymbol{\pi})$, define the minimal time-average payoff obtained by the platform by some period $\bar{t}$:

$$RW^{\boldsymbol{\alpha}, \sigma, \boldsymbol{\pi}}_{T}(\mu) := \min_{\bar{t} \le T} \ \frac{1}{\bar{t}}\, W^{\boldsymbol{\alpha}, \sigma, \boldsymbol{\pi}}_{\bar{t}}(\mu),$$

and for a fixed period $t$, define the seller's minimal time-average continuation payoff given history $h \in H_t$ as:

$$RV^{\boldsymbol{\alpha}, \sigma, \boldsymbol{\pi}}_{t,T}(h) := \min_{t \le \bar{t} \le T} \left( \frac{1}{\bar{t} - t + 1} \right) V^{\boldsymbol{\alpha}, \sigma, \boldsymbol{\pi}}_{t, \bar{t}}(h).$$

Based on these payoff functions, we introduce equilibria in which the platform and the seller maximize their minimal time-average payoffs at every history on the equilibrium path.
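The worst-case time-average payoff defined above can be illustrated with a short sketch. It assumes that strategies do not depend on the horizon, so that the expected payoff over horizon $\bar{t}$ is the cumulative sum of the first $\bar{t}$ expected per-period payoffs; the function name and inputs are hypothetical.

```python
def min_time_average(per_period_payoffs):
    """Minimum over horizons t_bar = 1..T of the average payoff up to t_bar.

    per_period_payoffs -- expected payoffs in periods 1..T (assumed to be
    independent of the horizon length).
    """
    if not per_period_payoffs:
        raise ValueError("at least one period is required")
    total = 0.0
    worst = float("inf")
    for t_bar, payoff in enumerate(per_period_payoffs, start=1):
        total += payoff
        worst = min(worst, total / t_bar)
    return worst
```

A static policy that yields the same expected payoff in every period attains that payoff as its worst-case time average, whereas a policy that alternates between high and low payoffs is penalized at the horizon where its running average is lowest.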

Definition 4 (Horizon-Maximin Equilibrium). Fix $T$ and $\mu \in [0, 1]$. A strategy profile $(\boldsymbol{\alpha}, \sigma, \boldsymbol{\pi})$ is a horizon-maximin equilibrium if:

$$RW^{\boldsymbol{\alpha}, \sigma, \boldsymbol{\pi}}_{T}(\mu) \ \ge\ RW^{\boldsymbol{\alpha}', \sigma', \boldsymbol{\pi}}_{T}(\mu), \quad \forall \boldsymbol{\alpha}' \in \mathcal{A},\ \sigma' \in \Sigma,$$

and

$$RV^{\boldsymbol{\alpha}, \sigma, \boldsymbol{\pi}}_{t,T}\left( \langle \boldsymbol{\alpha}, \sigma, \bar{h} \rangle \right) \ \ge\ RV^{\boldsymbol{\alpha}, \sigma, \boldsymbol{\pi}'}_{t,T}\left( \langle \boldsymbol{\alpha}, \sigma, \bar{h} \rangle \right), \quad \forall \boldsymbol{\pi}' \in \Pi,\ \forall t = 1, \ldots, T,\ \bar{h} \in \bar{H}_t. \quad (12)$$

Let $\mathcal{E}(T) \subset \mathcal{A} \times \Sigma \times \Pi$ denote the set of horizon-maximin equilibria with maximal horizon length $T$. Through this definition, we focus on equilibria in which players' strategies do not depend on precise knowledge of the horizon length and are rather designed to ensure that their worst-case (with respect to the realized horizon length) time-average payoff is maximized. However, by requiring sequential rationality from the seller on the equilibrium path, the concept still captures that the seller utilizes collected information to dynamically improve performance. Our next result shows that the Bayesian Nash equilibrium characterized in Theorem 2 is a horizon-maximin equilibrium, and it is long-run optimal for the platform in the set of horizon-maximin equilibria.

Theorem 3 (Optimal Horizon-Maximin Equilibria). Fix $T \ge 1$ and $\mu \in [0, 1]$. There exists a horizon-maximin equilibrium $(\boldsymbol{\alpha}, \sigma, \boldsymbol{\pi})$ such that:

$$\frac{1}{T}\, W^{\boldsymbol{\alpha}, \sigma, \boldsymbol{\pi}}_{T}(\mu) = co(W^C)(\mu),$$

and for all $t = 1, \ldots, T$ and $\bar{h} \in \bar{H}_t$, the seller's pricing policy at $\langle \boldsymbol{\alpha}, \sigma, \bar{h} \rangle$ satisfies (5) (i.e., is myopically optimal). Moreover,

$$\lim_{T \to \infty} \ \sup_{(\boldsymbol{\alpha}, \sigma, \boldsymbol{\pi}) \in \mathcal{E}(T)} \ RW^{\boldsymbol{\alpha}, \sigma, \boldsymbol{\pi}}_{T}(\mu) = co(W^C)(\mu).$$

In the proof of Theorem 3, we establish that the equilibrium described in Theorem 2 is a horizon-maximin equilibrium for all $T$.
As in the proof of Theorem 2, by assuming that following deviations by the platform the seller prices at $p^*$, it is optimal for the platform to use the optimal confounding policy. On the other hand, given any platform policy, the seller's minimal time-average continuation payoff is, at most, the maximum expected revenue in the first period. Since the platform's policy is static and confounding, the seller can achieve this expected revenue in every period. Therefore, by pricing myopically in response to the confounding policy, the seller also maximizes his worst-case time-average payoff. To establish the long-run average optimality of the equilibrium, we prove that in the platform-optimal horizon-maximin equilibrium, it is without loss to assume that the seller follows a myopic pricing policy. Therefore, we can adapt the result of Theorem 1 to establish that the equilibrium described in Theorem 2 is, in fact, an optimal long-run average horizon-maximin equilibrium.

In this section we demonstrate that, on top of maintaining long-term platform optimality, considering confounding promotion policies simplifies the design of the platform's optimal policy. We introduce a simple subclass of confounding policies through which the dynamic problem of information and promotion design is simplified into a tractable, static problem. We then provide a procedure for designing the optimal policy from that class given a concrete demand model, and finally, numerically characterize the optimal simple policy for several parameterizations of the demand model in Example 1.

Given the information structure in our formulation, the platform's promotion policy can effectively determine the seller's optimal pricing policy. With that in mind, as a first step towards computing the platform-optimal policy, we show that it suffices to consider policies that promote only one price with positive probability.

Definition 5 (Single-Price Promotion Policies). Single-price promotion policies are those which, given any history $h \in H_t$, promote at most one price with positive probability in each period. We denote the set of single-price promotion policies by $\mathcal{A}^P \subset \mathcal{A}$, formally defined as follows:

$$\mathcal{A}^P := \{\alpha \in \mathcal{A} : \forall t = 1, \dots, T,\ h \in H_t,\ \exists\, \bar{p}_t(h) \in P \text{ s.t. } \alpha_t(\phi, p, h) = 0,\ \forall p \ne \bar{p}_t(h)\}.$$

Moreover, let $\Sigma^S$ denote the set of simple signaling mechanisms which are based on the reduced set of signals $S = \{\phi_L, \phi_H\}$. The next proposition establishes that considering promotion policies in $\mathcal{A}^P$ and signaling mechanisms in $\Sigma^S$ is without loss of optimality.

Proposition 1 (Payoff Equivalence of Single-Price Promotion Policies with Reduced Signal Set). For any $T \ge 1$, $\alpha \in \mathcal{A}$, $\sigma \in \Sigma$, there exists a single-price promotion policy $\alpha' \in \mathcal{A}^P$ and signaling mechanism $\sigma' \in \Sigma^S$ such that:

$$W^{\alpha',\sigma',\pi^*}_T(\mu) \ge W^{\alpha,\sigma,\pi^*}_T(\mu).$$

While Proposition 1 establishes a subclass of policies that maintain optimality, we leverage the insights from the previous analysis to further simplify the set of platform promotion policies while maintaining long-run average optimality.

Definition 6 (Simple Promotion Policies). The set of simple promotion policies $\mathcal{A}^S \subset \mathcal{A}^M$ consists of all policies that are static, single-price, and confounding: that is, policies where $\alpha \in \mathcal{A}^C(\mu) \cap \mathcal{A}^P$ for all $\mu \in [0,1]$ and $\alpha$ is static (that is, $\alpha_1(p, \phi, \mu) = \alpha_t(p, \phi, \mu)$, $\forall t = 2, \dots, T$, $p \in P$, $\phi \in \{\phi_L, \phi_H\}$, $\mu \in [0,1]$).
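To make the confounding requirement in Definition 6 concrete, the following minimal sketch shows a Bayesian seller who observes only sales (not promotion decisions) and whose belief does not move when the promotion probabilities equalize the sale probability across the two values of $\phi$. The numbers and the helper names `sale_prob` and `posterior` are illustrative assumptions and are not taken from the paper.

```python
# Sketch: a confounding policy leaves the seller's belief unchanged.
# All numbers below are hypothetical and chosen only for illustration.

def sale_prob(phi, promo_prob, rho, rho_c):
    """P(sale | phi): impatient buyers (share phi) buy only if the seller is promoted."""
    return phi * promo_prob * rho + (1.0 - phi) * rho_c


def posterior(mu, sale, p_high, p_low):
    """Bayes update of mu = P(phi = phi_H) after observing sale in {0, 1}."""
    like_high = p_high if sale else 1.0 - p_high
    like_low = p_low if sale else 1.0 - p_low
    return mu * like_high / (mu * like_high + (1.0 - mu) * like_low)


PHI_L, PHI_H = 0.2, 0.6   # low / high share of impatient consumers
RHO, RHO_C = 0.5, 0.3     # purchase probability without / with competition
MU = 0.4                  # seller's prior that phi = phi_H

# Confounding: choose alpha_H so both values of phi give the same sale probability.
ALPHA_L = 1.0
ALPHA_H = (PHI_L * ALPHA_L * RHO + (PHI_H - PHI_L) * RHO_C) / (PHI_H * RHO)
conf_high = sale_prob(PHI_H, ALPHA_H, RHO, RHO_C)
conf_low = sale_prob(PHI_L, ALPHA_L, RHO, RHO_C)

# Benchmark that is not confounding: always promote, regardless of phi.
full_high = sale_prob(PHI_H, 1.0, RHO, RHO_C)
full_low = sale_prob(PHI_L, 1.0, RHO, RHO_C)

print(posterior(MU, 1, conf_high, conf_low))  # stays at the prior
print(posterior(MU, 1, full_high, full_low))  # moves above the prior
```

Under the always-promote benchmark a sale is more likely when $\phi = \phi_H$, so each sale is informative; the confounding choice of `ALPHA_H` removes exactly that difference.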
We note that the promotion policy $\bar{\alpha}$ defined in Example 3 is not only confounding, but also single-price and static, and therefore $\mathcal{A}^S$ is non-empty. Considering only the class of promotion and signaling mechanisms $(\mathcal{A}^S, \Sigma^S)$ significantly reduces the set of policies to a subclass of policies that are intuitive and relatively easy to implement. For the seller, the problem is essentially static after he updates his belief based on the signal sent by the platform. His belief never changes and the promotion policy does not change, so the optimal price does not change across time periods. It is straightforward to adapt Theorem 1 to establish that long-run average optimality is achievable in the class of joint simple (that is, static, confounding, and single-price) promotion policies and simple signaling mechanisms when the seller uses the myopic Bayesian pricing policy.

Such policies are practical to implement as the platform needs only to communicate a single price and the probability of promotion that corresponds to it. Moreover, in many cases these policies are equivalent to threshold policies where the platform communicates the maximum price that is promoted with positive probability and the corresponding probability. For example, one sufficient condition for the optimality of threshold policies is that the revenue-maximizing price under competition is lower than the revenue-maximizing price without competition.

.2 Designing Optimal Simple Policies

Given any concrete demand model that satisfies Assumptions 1 and 2, solving for the optimal simple policy requires solving for the optimal confounding promotion policy and corresponding payoff, and then determining the optimal signaling mechanism. The key to solving this problem is to observe that by considering the class of simple policies, one only needs to consider $T = 1$ and then optimize over the recommended price and the probability of promotion at that price for each realization of $\phi$. In particular, for belief $\mu \in [0,1]$ the platform needs to solve:

$$W^C(\mu) := \max_{\alpha_{\phi_H}, \alpha_{\phi_L} \in [0,1],\ p \in P} \mathbb{E}_\phi\big(\phi\,\alpha_\phi \bar{W}(p) + \phi(1 - \alpha_\phi)\bar{W}_c + (1 - \phi)\bar{W}_c(p) \,\big|\, \mu\big)$$

subject to

$$p\,\bar{\rho}(p)\big(\phi_L \alpha_{\phi_L}(1 - \mu) + \phi_H \alpha_{\phi_H}\mu\big) + p\,\bar{\rho}_c(p)\big(1 - \phi_L - \mu(\phi_H - \phi_L)\big) \ge p^* \bar{\rho}_c(p^*)\big(1 - \phi_L - \mu(\phi_H - \phi_L)\big),$$

$$\phi_H \alpha_{\phi_H} \bar{\rho}(p) + (1 - \phi_H)\bar{\rho}_c(p) = \phi_L \alpha_{\phi_L} \bar{\rho}(p) + (1 - \phi_L)\bar{\rho}_c(p).$$

In comparison to (6), here one only needs to optimize over two constants $\alpha_{\phi_H}$ and $\alpha_{\phi_L}$ instead of the general function $\alpha$. Moreover, the first constraint, which ensures that the selected price is myopically optimal for the seller, is simplified as one only needs to compare the optimal price to $p^*$. Finally, the second constraint ensures that the policy is confounding: the probability of a sale is the same under $\phi_H$ and $\phi_L$. The confounding constraint fully defines $\alpha_{\phi_H}$ given $\alpha_{\phi_L}$ and $p$, and therefore one only needs to optimize over these two variables. For many demand models, one may further establish that $\alpha_{\phi_L}$ is fully defined given a price $p$, which allows one to optimize only over the price $p$. In the cases where it is possible to solve this optimization problem analytically, as in Example 3, additionally solving for the optimal signal is typically straightforward as $W^C(\mu)$ is continuous and oftentimes concave on $[0,1)$. However, when it is not possible to solve this problem analytically, one may still solve for the optimal promotion policy numerically (e.g., for a grid of beliefs in $[0,1]$).

We adopt the demand model from Example 1 for the purpose of demonstrating how one may follow the above approach to evaluate optimal simple policies.
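The numerical route mentioned above can be sketched as a plain grid search over the price and $\alpha_{\phi_L}$, with $\alpha_{\phi_H}$ pinned down by the confounding constraint. The sketch below assumes the uniform-valuation demand model of Appendix A.2 as we read its closed forms; the parameter values, grid sizes, and function names are illustrative assumptions, not the authors' implementation.

```python
# Sketch: grid search for the static problem W^C(mu).
# Demand closed forms follow our reading of Appendix A.2; values are hypothetical.

ALPHA, BETA = 0.9, 0.3      # demand parameters
PHI_L, PHI_H = 0.2, 0.6     # low / high share of impatient consumers


def rho(p):
    """Purchase probability when the seller faces no competition."""
    return ALPHA - p


def rho_c(p):
    """Purchase probability when the consumer also considers the competitor."""
    a = ALPHA - p
    return (1 - BETA) * a + 0.5 * a * a if a <= BETA else a - 0.5 * BETA ** 2


def w_bar(p):
    """Consumer surplus when only the seller's product is considered."""
    return 0.5 * (ALPHA - p) ** 2


W_BAR_C = 0.5 * BETA ** 2   # surplus when only the competitor is considered


def w_bar_c(p):
    """Consumer surplus when both products are considered."""
    a = ALPHA - p
    if a <= BETA:
        return (3 * BETA ** 2 + 3 * a * a * (1 - BETA) + a ** 3) / 6
    return (3 * a * a + 3 * BETA ** 2 * (1 - a) + BETA ** 3) / 6


P_GRID = [ALPHA * i / 300 for i in range(300)]     # prices in [0, ALPHA)
A_GRID = [j / 100 for j in range(101)]             # promotion probabilities
P_STAR = max(P_GRID, key=lambda p: p * rho_c(p))   # best price without promotion


def alpha_h_confounding(p, a_l):
    """alpha_{phi_H} implied by the equal-sale-probability constraint."""
    return (PHI_L * a_l * rho(p) + (PHI_H - PHI_L) * rho_c(p)) / (PHI_H * rho(p))


def surplus(p, a_l, a_h, mu):
    """Expected one-period consumer surplus under belief mu."""
    def per_type(phi, a):
        return phi * a * w_bar(p) + phi * (1 - a) * W_BAR_C + (1 - phi) * w_bar_c(p)
    return mu * per_type(PHI_H, a_h) + (1 - mu) * per_type(PHI_L, a_l)


def solve(mu):
    """Return (value, p, alpha_L, alpha_H) maximizing surplus over the grid."""
    phi_bar = PHI_L + mu * (PHI_H - PHI_L)
    outside = P_STAR * rho_c(P_STAR) * (1 - phi_bar)
    best = None
    for p in P_GRID:
        for a_l in A_GRID:
            a_h = alpha_h_confounding(p, a_l)
            if not 0.0 <= a_h <= 1.0:
                continue  # confounding is infeasible at this (p, alpha_L)
            revenue = (p * rho(p) * (PHI_L * a_l * (1 - mu) + PHI_H * a_h * mu)
                       + p * rho_c(p) * (1 - phi_bar))
            if revenue < outside - 1e-12:
                continue  # the seller would rather deviate to p*
            cand = (surplus(p, a_l, a_h, mu), p, a_l, a_h)
            if best is None or cand[0] > best[0]:
                best = cand
    return best
```

Calling `solve(mu)` over a grid of beliefs gives a numerical approximation of $W^C(\mu)$; a finer grid or a one-dimensional root-finder on the binding myopic constraint would be the natural refinement.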
We note that we follow a uniform WtP demand model here only for the sake of consistency with the analysis that was previously demonstrated; while the precise outcomes clearly depend on the specific demand model and measure of consumer welfare that are assumed, the phenomena that are next illustrated are quite broad and hold across many demand structures.

First, when there is a large range in the potential value of $\phi$ (that is, when $\phi_H - \phi_L$ is large), the platform can realize substantial long-run gains, relative to truthful revelation, by using an optimal simple policy. Define the relative gain in consumer surplus from using the optimal simple policy compared to the optimal truthful policy:

$$RG(\mu) := \frac{co(W^C)(\mu) - W^T(\mu)}{W^T(\mu)}.$$

Figure 6 depicts the relative gain compared to the optimal truthful policy for different parametric specifications of the demand model from Example 1. For each combination of parametric values for this model, we calculate the relative gain at a grid of beliefs $\mu \in [0,1]$ and plot the maximum, average, and minimum. One may observe that the gain $RG(\mu)$ that is captured by the optimal confounding policy relative to truthful revelation might be quite significant. Moreover, this value is larger when the seller's product is superior to the competitor's. We note that in Figure 6, for each value of $\phi_H$ the maximum value occurs at a belief in the range [0. , . ] [...] $\mu = 0$ and $\mu = 1$.

Figure 6: Long-term value of confounding. Panels: (a) Seller Competitive; (b) Seller Preferred; each plots $RG(\mu)$ against $\phi_H$. The plots depict three measures of the relative gain $RG(\mu)$ for a range of parametric specifications of Example 1. For each specification, we show the maximum, average, and minimum (over a grid of $\mu \in [0,1]$) of $RG(\mu)$, which is the relative gain in consumer surplus from the optimal simple policy relative to the optimal truthful policy. The left-hand plot reflects a demand model where the seller and competitor are competitive at the seller's equilibrium price. In the right-hand plot, the seller's product produces more value for consumers for nearly all beliefs.

Moreover, the optimal simple platform policy is nearly optimal even in the short run. Define the fraction of the optimal one-period consumer surplus that can be captured by a confounding policy:

$$CCS(\mu) := \frac{co(W^C)(\mu)}{W^{\max}(\mu)}.$$

Figure 7 depicts the maximum, average, and minimum value of the captured consumer surplus $CCS(\mu)$ over a grid of beliefs $\mu \in [0,1]$ for different parametric specifications of the demand model from Example 1. One may observe that the optimal simple platform policy, even in the worst case, captures nearly 97% of the maximum one-period surplus, and it typically guarantees an even better performance. We note that the parametric setting depicted in Figure 7 is, in fact, nearly the worst case, as the platform incurs smaller losses for larger $\phi_L$. In Figure 7, for each value of $\phi_H$ the minimum value occurs at a belief in the range [0. , . ] [...] $\mu = 0$ and $\mu = 1$.

Figure 7: Short-term loss from confounding. Panels: (a) Seller Competitive; (b) Seller Preferred; each plots $CCS(\mu)$ against $\phi_H$. The plots depict three measures of $CCS(\mu)$ for a range of parametric specifications of Example 1. For each specification, we show the maximum, average, and minimum (over a grid of $\mu \in [0,1]$) of $CCS(\mu)$, which is the percent of the maximum one-period consumer surplus that is captured by the optimal simple policy. The left-hand plot reflects a demand model where the seller and competitor are competitive at the seller's equilibrium price. In the right-hand plot, the seller's product produces more value for consumers for nearly all beliefs.

In this paper, we propose a model of a platform interacting with a third-party seller. The platform cannot directly set prices but can influence prices through its promotion policy as well as by disclosing information on the fraction of impatient consumers, which represents the additional demand associated with being promoted. We characterize the maximum long-run average consumer surplus achievable by a joint information disclosure and promotion policy in this setting for a broad class of demand models. We introduce the notion of confounding promotion policies, which are designed to prevent the seller from learning the fraction of impatient consumers, and establish that these policies play an essential role when maximizing long-term consumer welfare. Notably, confounding promotion policies can be long-run optimal even though they incur short-term costs from diverting impatient consumers from the best product offerings. Moreover, we establish a Bayesian Nash equilibrium by showing that, in response to the platform's optimal policy, the seller's best response at every period and every history is to use a Bayesian myopic pricing policy. We further establish that the equilibrium we identify is platform-optimal within the class of horizon-maximin equilibria, in which strategies are not predicated on precise knowledge of the horizon length, and are designed to maximize payoff over the worst-case horizon. Finally, we leverage this analysis to introduce a practical subclass of joint policies which maintain long-run average optimality. We demonstrate the identification of an optimal platform policy within this subclass, as well as the evaluation of such policies, for a given demand model.

Our analysis highlights that confounding promotion policies are key to generating optimal long-run average consumer surplus because they allow the platform to precisely control the information that a seller may collect. However, on their own, these policies do not guarantee long-run optimality. Instead, it is essential that the seller's beliefs first update to a point where it is optimal to confound, as the cost of confounding is not identical across beliefs. For example, recall from the earlier analysis that $\lim_{\mu \to 1} W^C(\mu) < W^C(1)$. This observation implies that confounding at beliefs close to $\mu = 1$ is costly whereas confounding at $\mu = 1$ incurs no cost since $\mathcal{A}^C(1) = \mathcal{A}$. In general this idea extends to the entire belief space, as it is typically only optimal for the platform to confound the seller at some subset of the belief space. Thus, for many priors, the platform improves consumer surplus by revealing (some) information and then confounding the seller.

In our model, we capture this initial revelation of information through a signaling mechanism as it simplifies the identification and implementation of an optimal confounding policy. Without a signaling mechanism, the result of Theorem 1 holds, but the long-run average optimal promotion policy must be dynamic and react to the evolution of the seller's beliefs. Essentially, the platform designs the promotion policy to confound the seller at the same beliefs but allows the seller to learn at others. However, such a promotion policy must carefully control the information that is revealed from a sales observation depending on how close the seller's belief is to one where it is optimal to confound. Thus, as the seller's belief evolves, both his optimal price and the probability of promotion change, which complicates the analysis and implementation of a platform-optimal policy.
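The "reveal some information, then confound" logic can be sketched numerically once $W^C$ has been evaluated on a grid of beliefs: take the upper concave envelope of the grid values and split the prior between the two neighbouring beliefs at which the envelope touches $W^C$. The grid values below are made up for illustration (they only mimic a jump at $\mu = 1$), the function names are ours, and the signal probabilities follow the Bayes-plausible split described in Appendix A.4.

```python
# Sketch: concavification of W^C on a grid and the induced two-point signal.
# The values in WS are hypothetical.

def upper_concave_envelope(xs, ws):
    """Vertices of the upper concave envelope of points (xs[i], ws[i]); xs increasing."""
    hull = []
    for point in zip(xs, ws):
        # Drop the last vertex while it lies on or below the chord to the new point.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (point[1] - y1) - (y2 - y1) * (point[0] - x1)
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(point)
    return hull


def split_prior(mu, hull):
    """Return (mu_lo, mu_hi, weight_lo, value) supporting the envelope at mu."""
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 <= mu <= x2:
            weight_lo = (x2 - mu) / (x2 - x1)
            return x1, x2, weight_lo, weight_lo * y1 + (1 - weight_lo) * y2
    raise ValueError("mu lies outside the grid")


def signal_probs(mu, mu_lo, weight_lo):
    """P(signal = phi_L | phi_L) and P(signal = phi_L | phi_H), for 0 < mu < 1."""
    return weight_lo * (1 - mu_lo) / (1 - mu), weight_lo * mu_lo / mu


XS = [0.0, 0.25, 0.5, 0.75, 1.0]
WS = [0.0, 0.3, 0.4, 0.3, 1.0]
HULL = upper_concave_envelope(XS, WS)

mu_lo, mu_hi, weight_lo, value = split_prior(0.5, HULL)
low_given_low, low_given_high = signal_probs(0.5, mu_lo, weight_lo)
print(HULL, mu_lo, mu_hi, value, low_given_low, low_given_high)
```

In this toy example the prior $\mu = 0.5$ is split between posteriors $0.25$ and $1$, and the envelope value exceeds the grid value $W^C(0.5)$, which is the gain from signaling before confounding.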
Our model emphasizes that a platform should not only carefully design the information that it shares with sellers, but also consider how design features may impact a seller's ability to procure information. For example, one feature a platform may consider is whether to reveal promotion decisions to sellers. In Appendix A.3, we consider this question formally and show, consistent with the insight of our nominal model, that it is optimal to conceal this information from sellers.

If the platform reveals promotion decisions, the maximum long-run average consumer surplus is still determined by the optimal confounding promotion policy and signaling mechanism. However, confounding the seller at $\mu \in (0,1)$ is not possible when promotion decisions are observed. In that case, the seller can learn based on sales observations and/or promotion decisions. Therefore, confounding the seller requires the platform to use policies that satisfy more stringent conditions. First, for the promotion decision itself not to reveal information at price $p$ and belief $\mu$ requires $\alpha(p, \phi_H, \mu) = \alpha(p, \phi_L, \mu)$. Second, for a purchase decision to not reveal information, the probability of a sale, conditional on a promotion, must be independent of $\phi$; that is, $\bar{\rho}(p) = \bar{\rho}_c(p)$. Third, for a purchase decision to reveal no information, conditional on no promotion, requires $\bar{\rho}_c(p) = 0$. In summary, confounding the seller requires $\alpha(p, \phi_H, \mu) = \alpha(p, \phi_L, \mu) = 1$ and $\bar{\rho}(p) = \bar{\rho}_c(p)$. When promotion decisions are not revealed, the equality in the purchase probability only needs to hold in expectation over the promotion decision rather than for each decision.

Typically one will not have $\bar{\rho}(p) = \bar{\rho}_c(p)$ because competition will decrease sales. Thus, the maximum long-run average optimal consumer surplus, when the seller is myopic and the platform reveals promotion decisions, will equal the surplus generated by a fully revealing signal and the myopic promotion policy. For example, under the demand model considered in Example 1, it is impossible to confound a seller that can observe promotion decisions. Therefore, one may observe that Figure 5 depicts a case where it is strictly better for the platform to not reveal promotion decisions under any prior $\mu \in (0,1)$.

There are several interesting extensions to our model. For one, understanding how strategic competition between sellers affects our results may be valuable. While we believe that the key ideas from our work, and particularly the importance of confounding policies, would remain relevant under such an extension, it is important to note that the way in which competition is modeled will have an important effect on the result. For example, whether the platform can promote at most one seller, exactly one seller, or distribute the impact of a promotion over as many sellers as it may choose to, might impact its ability to confound sellers. Moreover, how to model each seller's observations is critical, and it is not a priori clear what the appropriate assumption is in different settings.

Second, in many cases the platform may be able to observe relevant information about each arriving consumer; for example, the platform may be able to determine the consumer's patience type based on browsing and purchase history. While additional information may allow the platform to confound the seller more effectively, understanding the precise impact of additional information and identifying settings in which such information can increase consumer surplus is an interesting avenue of research.

Third, understanding how the platform can design confounding policies in a setting where the seller itself has private information, such as production cost or inventory, is an interesting and challenging research avenue that is of practical importance.

A Extensions and Examples

A.1 Example of a General Confounding Promotion Policy

Example 4 (Confounding Promotion Policy). Recall the price, $p^*$, which is defined in (7) as the unique price that the seller would set if he only sold to patient consumers. Note that $p^*$ is independent of the seller's belief about $\phi$. Define $\bar{\alpha} = \{\bar{\alpha}_t\}_{t=1}^T$ where for all $t$ and $h \in H_t$:

$$\bar{\alpha}_t(p, \phi, h) = \begin{cases} 1 \text{ w.p. } \left(\dfrac{\phi_H - \phi_L}{\phi_H}\right)\left(\dfrac{\bar{\rho}_c(p^*)}{\bar{\rho}(p^*)}\right), & \text{if } p = p^* \text{ and } \phi = \phi_H, \\ 0, & \text{otherwise.} \end{cases} \tag{13}$$

The promotion policy defined by (13) is well-defined as $0 < \frac{\phi_H - \phi_L}{\phi_H} < 1$ by definition, and $0 < \frac{\bar{\rho}_c(p^*)}{\bar{\rho}(p^*)} < 1$ by Assumption 1. One may observe that $p^*$ is the unique myopically optimal price to set in response to $\bar{\alpha}_t$ at each period $t$ and for all $\mu$. Moreover, by construction:

$$\bar{\alpha}_t(p^*, \phi_H, h)\,\phi_H \bar{\rho}(p^*) + (1 - \phi_H)\bar{\rho}_c(p^*) = (1 - \phi_L)\bar{\rho}_c(p^*) + \bar{\alpha}_t(p^*, \phi_L, h)\,\phi_L \bar{\rho}(p^*),$$

so the probability of a sale at price $p^*$ is independent of the true value of $\phi$. Thus, the seller's posterior belief will not update throughout the horizon, and $\bar{\alpha} \in \mathcal{A}^C(\mu)$ for all $\mu \in [0,1]$.

A.2 Consumers with Uniformly Distributed Valuations

Suppose that there are two sellers on the platform. For $\alpha, \beta \in [0,1]$, customer $t$ has independent values for the products distributed uniformly over a unit square: $v_{1t} \sim U[\alpha - 1, \alpha]$, $v_{2t} \sim U[\beta - 1, \beta]$. Assume seller two sets a fixed price equal to 0; or equivalently, that each customer's value $v_{2t}$ is net of some fixed price. Each customer's probability of purchase as a function of the seller's (seller one's) price $p \in (-\infty, \alpha]$, the platform's promotion decision, and the consumer type is (see (1)):

$$\bar{\rho}(p) = P(v_1 - p \ge 0) = \alpha - p, \qquad \bar{\rho}_c(p) = \begin{cases} (1 - \beta)(\alpha - p) + \frac{1}{2}(\alpha - p)^2, & \text{if } p > \alpha - \beta, \\ \alpha - p - \frac{1}{2}\beta^2, & \text{if } p < \alpha - \beta. \end{cases}$$

Finally, defining consumer surplus in the standard way, we have (see (2)):

$$\bar{W}(p) = \int_{[\alpha - 1, \alpha]} \max\{v_1 - p, 0\}\, dv_1 = \frac{1}{2}(\alpha - p)^2, \qquad \bar{W}_c = \int_{[\beta - 1, \beta]} \max\{v_2, 0\}\, dv_2 = \frac{1}{2}\beta^2,$$

$$\bar{W}_c(p) = \int_{[\alpha - 1, \alpha] \times [\beta - 1, \beta]} \max\{v_1 - p, v_2, 0\}\, dv_1\, dv_2 = \begin{cases} \frac{1}{6}\left(3\beta^2 + 3(\alpha - p)^2(1 - \beta) + (\alpha - p)^3\right), & \text{if } p > \alpha - \beta, \\ \frac{1}{6}\left(3(\alpha - p)^2 + 3\beta^2(1 - \alpha + p) + \beta^3\right), & \text{if } p < \alpha - \beta. \end{cases}$$

A.2.1 Optimal Policies
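Before turning to the optimal policies, the closed forms of Appendix A.2 can be sanity-checked by brute-force integration over the unit square of valuations. The sketch below compares a midpoint-rule estimate with the closed forms for $\bar{\rho}_c(p)$ and $\bar{W}_c(p)$, written in terms of $a = \alpha - p$; the exponents and the factors $\frac{1}{2}$ and $\frac{1}{6}$ are our reading of those formulas, and the function names and test points are illustrative assumptions.

```python
# Sketch: midpoint-rule check of the uniform-valuation closed forms (a = alpha - p).

def rho_c_closed(a, beta):
    """Closed form for P(v1 - p >= max(v2, 0)), with a in [0, 1]."""
    if a <= beta:
        return (1 - beta) * a + 0.5 * a * a
    return a - 0.5 * beta ** 2


def w_c_closed(a, beta):
    """Closed form for E[max(v1 - p, v2, 0)]."""
    if a <= beta:
        return (3 * beta ** 2 + 3 * a ** 2 * (1 - beta) + a ** 3) / 6
    return (3 * a ** 2 + 3 * beta ** 2 * (1 - a) + beta ** 3) / 6


def midpoint_estimates(a, beta, n=400):
    """Estimate both quantities on an n-by-n midpoint grid of the unit square."""
    h = 1.0 / n
    prob = 0.0
    surplus = 0.0
    for i in range(n):
        x = a - 1.0 + (i + 0.5) * h          # v1 - p, uniform on [a - 1, a]
        for j in range(n):
            y = beta - 1.0 + (j + 0.5) * h   # v2, uniform on [beta - 1, beta]
            if x >= 0.0 and x >= y:
                prob += 1.0                  # consumer buys from the seller
            surplus += max(x, y, 0.0)
    return prob * h * h, surplus * h * h


for a, beta in [(0.2, 0.3), (0.6, 0.3)]:
    est_prob, est_surplus = midpoint_estimates(a, beta)
    print(a, beta, est_prob, rho_c_closed(a, beta), est_surplus, w_c_closed(a, beta))
```

The two test points cover both branches of the piecewise formulas ($a \le \beta$ and $a > \beta$).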

We focus on the case where Seller 1 sells the product that most consumers prefer. In particular, $\alpha > \beta\left(2 - \frac{\beta}{2}\right)$, which ensures that at all prices that may arise in equilibrium, the seller's product generates more value than the competitor's. This condition is equivalent to $\alpha - p^* > \beta$, where we define $p^* := \arg\max_p p\,\bar{\rho}_c(p)$. In this case, the platform is incentivized to promote the seller with high probability.

Throughout this section, we will use the notation corresponding to simple promotion policies (see Definition 6). Thus, we only specify the price that is promoted with positive probability and the associated probabilities of promotion.

Proposition 2 (Optimal Policies with High Quality Seller). Fix $\mu \in [0,1]$ and suppose $\alpha > \beta\left(2 - \frac{\beta}{2}\right)$. The optimal myopic promotion policy has $\alpha_L(\mu) = \alpha_H(\mu) = 1$, and:

$$p(\mu) = \frac{1}{4}\left(2\alpha - \beta^2(1 - \bar{\phi}(\mu)) - \sqrt{\bar{\phi}(\mu)}\sqrt{4\alpha^2 - \beta^4(1 - \bar{\phi}(\mu))}\right).$$

The optimal confounding promotion policy has $\alpha^C_L(\mu) = 1$:

$$\alpha^C_H(\mu) = \frac{\alpha - p^C(\mu) - \frac{\beta^2}{2}}{\alpha - p^C(\mu)}\left(\frac{\phi_H - \phi_L}{\phi_H}\right) + \frac{\phi_L}{\phi_H},$$

$$p^C(\mu) = \frac{1}{4}\left(2\alpha - \beta^2(1 - \phi_L) - \sqrt{(2\alpha - \beta^2)^2(\phi_H - \phi_L)\mu + \phi_L(4\alpha^2 - \beta^4) + \beta^4\phi_L^2}\right).$$

Proof.

Myopic policy: First, determine the value of the outside option, which requires solving for the optimal price to set for patient consumers (assume it is low enough that $p^* \le \alpha - \beta$). Thus, we solve:

$$\max_{p \in [0,\alpha]} p\,\bar{\rho}_c(p) = \max_{p \in [0,\alpha]} p\left(\alpha - p - \frac{\beta^2}{2}\right).$$

The objective is concave in $p$, so from first-order conditions:

$$p^* = \frac{1}{4}(2\alpha - \beta^2).$$

We now verify that $p^* \le \alpha - \beta$ as:

$$\frac{1}{4}(2\alpha - \beta^2) < \alpha - \beta \iff \alpha > \beta\left(2 - \frac{\beta}{2}\right).$$

The value of the outside option is therefore $\pi^O(\mu) = (1 - \bar{\phi}(\mu))\left(\frac{2\alpha - \beta^2}{4}\right)^2$ [...] $p$, by Assumption 2. Moreover, since a feasible solution is $\alpha_L = \alpha_H = 1$ and $p = p^*$, and the objective at this solution dominates the consumer surplus at any $p > p^*$ (with any promotion probabilities), the optimal price must be less than $p^*$. Finally, at any price $p < p^*$, the objective is increasing in $\alpha_\phi$. Thus, at the optimal solution: $\alpha_L = 1$, $\alpha_H = 1$. Moreover, the objective is decreasing in price, so the optimal price is the smallest one that satisfies the myopic constraint:

$$\begin{aligned}
(1 - \mathbb{E}\phi)\, p^* \bar{\rho}_c(p^*) &= \mathbb{E}\phi\; p\,\bar{\rho}(p) + (1 - \mathbb{E}\phi)\, p\,\bar{\rho}_c(p) \\
(1 - \bar{\phi}(\mu))\left(\frac{2\alpha - \beta^2}{4}\right)^2 &= \bar{\phi}(\mu)\, p(\alpha - p) + (1 - \bar{\phi}(\mu))\, p\left(\alpha - p - \frac{\beta^2}{2}\right) \\
(1 - \bar{\phi}(\mu))\left(\frac{2\alpha - \beta^2}{4}\right)^2 &= p(\alpha - p) - (1 - \bar{\phi}(\mu))\, p\,\frac{\beta^2}{2} \\
\Rightarrow\ p(\mu) &= \frac{1}{4}\left(2\alpha - \beta^2(1 - \bar{\phi}(\mu)) - \sqrt{\bar{\phi}(\mu)}\sqrt{4\alpha^2 - \beta^4(1 - \bar{\phi}(\mu))}\right).
\end{aligned}$$

Confounding policy: Using the preceding analysis, we consider the possible values of $\alpha_{\phi_L}$. First, we rule out that $\alpha_{\phi_L} = 0$ because at any $p < p^*$, the objective is increasing in $\alpha_{\phi_L}$. We can now show that the optimal solution under the other two possibilities is the same. Assume that the constraint binds. Using this substitution, we have that the price that solves the following problem is optimal:

$$\max_{p \in P}\ \left(\bar{W}(p) - \bar{W}_c\right)\left(\frac{p^* \bar{\rho}_c(p^*) - p\,\bar{\rho}_c(p)}{p\,\bar{\rho}(p)}\right) + \bar{W}_c(p)$$

$$\text{s.t.}\quad \pi^O(\mu) - (1 - \phi_L)\, p\,\bar{\rho}_c(p) \le \phi_L\, p\,\bar{\rho}(p), \qquad p \le p^*.$$

The objective is decreasing in $p$ (by our assumption about $p^*$), so the optimal solution must have $\alpha^C_{\phi_L} = 1$ and we can solve for the lowest price that satisfies the constraint:

$$\begin{aligned}
\pi^O(\mu) - (1 - \phi_L)\, p\,\bar{\rho}_c(p) &\le \phi_L\, p\,\bar{\rho}(p) \\
(1 - \bar{\phi}(\mu))\left(\frac{2\alpha - \beta^2}{4}\right)^2 &= (1 - \phi_L)\, p\left(\alpha - p - \frac{\beta^2}{2}\right) + \phi_L\, p(\alpha - p) \\
\Rightarrow\ p(\mu) &= \frac{1}{4}\left(2\alpha - (1 - \phi_L)\beta^2 - \sqrt{-4\alpha\beta^2(\phi_H - \phi_L)\mu + 4\alpha^2(\phi_L + (\phi_H - \phi_L)\mu) + \beta^4(\phi_L^2 - \phi_L + (\phi_H - \phi_L)\mu)}\right) \\
&= \frac{1}{4}\left(2\alpha - (1 - \phi_L)\beta^2 - \sqrt{(4\alpha^2 - 4\alpha\beta^2 + \beta^4)(\phi_H - \phi_L)\mu + 4\alpha^2\phi_L + \beta^4(\phi_L^2 - \phi_L)}\right) \\
&= \frac{1}{4}\left(2\alpha - \beta^2 + \phi_L\beta^2 - \sqrt{(2\alpha - \beta^2)^2(\phi_H - \phi_L)\mu + \phi_L(4\alpha^2 - \beta^4) + \beta^4\phi_L^2}\right).
\end{aligned}$$

Finally, $\alpha^C_{\phi_H}(\mu)$ follows from the confounding constraint.

A.3 Seller Observes Promotions

In this section, we analyze a setting where the seller observes the promotion decision at each period. We show that, as in our nominal formulation, the achievable long-run average consumer surplus when the seller is myopic is determined by the optimal confounding payoff. Formally, we adjust the model of the main text so that the seller's history at the beginning of period $t$ is:

$$h^a_1 = \langle s, \alpha, \sigma \rangle, \quad \text{and} \quad h^a_t = \left\langle s, \alpha, \sigma, (p_{t'}, a_{t'}, y_{t'})_{t'=1}^{t-1} \right\rangle, \ \text{for } t > 1.$$

We denote by $\{\mathcal{H}^a_t = \sigma(h^a_t),\ t = 1, \dots, T\}$ the filtration associated with the process $\{h^a_t\}_{t=1}^T$, and we denote the set of possible histories at the beginning of period $t$ as $H^a_t = \{L, H\} \times \left(P \times \{0,1\}\right)^{t-1}$. The seller's belief system, $\mu$, is defined in terms of these histories. The payoffs and action spaces remain the same, so the seller's and platform's myopic policies remain the same (with respect to the seller's beliefs). However, with new information revealed, the space of confounding promotion policies changes. Confounding promotion policies are defined in the same way (though in terms of the adjusted histories and belief structure).

Definition 7 (Confounding Promotion Policies). Suppose the seller uses the myopic pricing policy, $\pi^*$. For each belief $\mu \in [0,1]$, define the set of confounding promotion policies $\mathcal{A}^{C,a}(\mu) \subset \mathcal{A}^M$ as those which prevent the seller's belief from updating throughout periods $t = 1, \dots, T$. That is, $\alpha \in \mathcal{A}^{C,a}(\mu)$ if and only if for all $t = 1, \dots, T$, one has

$$P(\mu_{t+1} = \mu \mid \mu_t = \mu, \pi^*, \alpha) = 1.$$

With access to promotion decisions, the seller can learn the true value of $\phi$ based on sales observations and/or promotion decisions, so confounding the seller requires the platform to use policies that satisfy more stringent conditions. First, confounding the seller at price $p$ requires $\alpha(p, \phi_H, \mu) = \alpha(p, \phi_L, \mu)$. Otherwise, if the promotion decision depends on $\phi$ (i.e., $\alpha(p, \phi_H, \mu) \ne \alpha(p, \phi_L, \mu)$), then the promotion decision itself reveals information about $\phi$. Second, the probability of a sale at price $p$ must be independent of the consumer's patience type. If the seller is promoted at price $p$, the policy is confounding only if $\bar{\rho}(p) = \bar{\rho}_c(p)$, and if the seller is not promoted at price $p$, then impatient customers do not purchase, so the price must have $\bar{\rho}_c(p) = 0$ as well. Such a price will not arise from a myopic seller, however, so confounding at $\mu \in (0,1)$ requires:

$$\alpha(p, \phi_H, \mu) = \alpha(p, \phi_L, \mu) = 1, \quad \text{and} \quad \bar{\rho}(p) = \bar{\rho}_c(p).$$

Thus, for many demand models satisfying Assumption 1 (including Example 1), confounding is not possible. That is, $\mathcal{A}^{C,a}(\mu) = \emptyset$. As in the nominal analysis, it is useful to define the optimal confounding payoff $W^{C,a}(\mu)$ under the alternate histories $h^a_t$. Define:

$$W^{C,a}(\mu) := \max_{\alpha \in \mathcal{A}^{C,a}(\mu)} \frac{1}{T}\, \mathbb{E}\left( \sum_{t=1}^{T} W(p_t, a_t, c) \,\middle|\, \alpha, \pi^*, \mu \right), \tag{14}$$

where $W^{C,a}(\mu) = 0$ if $\mathcal{A}^{C,a}(\mu) = \emptyset$. Since $\mathcal{A}^{C,a}(\mu) \subset \mathcal{A}^C(\mu)$, it follows that the optimal consumer surplus generated by promotion policies is smaller in the more refined information setting.

Proposition 3 (Access to Promotion Decisions Decreases Consumer Surplus). For all $\mu \in [0,1]$,

$$W^C(\mu) \ge W^{C,a}(\mu). \tag{15}$$

In many cases the inequality in Proposition 3 is strict. For example, under the demand model of Example 1, $\mathcal{A}^{C,a}(\mu) = \emptyset$ for all $\mu \in (0,1)$, so $W^{C,a}(\mu) = 0$ for all $\mu \in (0,1)$.

Theorem 4 (Characterization of Long-Run Average Optimal Consumer Surplus with Observed Promotions). If the seller observes promotion decisions, then for all $\mu \in [0,1]$,

$$\lim_{T \to \infty} \sup_{\alpha \in \mathcal{A},\, \sigma \in \Sigma} \frac{1}{T} W^{\alpha,\sigma,\pi^*}_T(\mu) = co(W^{C,a}(\mu)).$$

The proof of Theorem 4 follows the same structure as the proof of Theorem 1. Through Proposition 3 and Theorem 4, we have established that providing access to promotion decisions reduces the long-run average consumer surplus generated by the platform's policy when the seller prices myopically. Thus, one concrete policy recommendation for a platform seeking to maximize consumer surplus is to withhold access to individual promotion decisions.
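The effect of revealing promotion decisions can be illustrated with a small Bayes-update sketch in which the seller observes the pair (promotion, sale). A policy that equalizes the unconditional sale probability across $\phi_H$ and $\phi_L$ is confounding when promotions are hidden, yet becomes informative once the promotion decision is seen. The numbers and the function name `posterior_observed` are hypothetical and serve only as an illustration.

```python
# Sketch: belief update when the seller observes both the promotion and the sale.
# All numbers are hypothetical.

def posterior_observed(mu, promoted, sale, alpha_h, alpha_l, phi_h, phi_l, rho, rho_c):
    """Posterior P(phi = phi_H) after observing (promoted, sale) at a fixed price."""
    def likelihood(phi, alpha):
        p_promo = alpha if promoted else 1.0 - alpha
        # Impatient consumers (share phi) buy only from a promoted seller.
        p_sale = phi * rho + (1.0 - phi) * rho_c if promoted else (1.0 - phi) * rho_c
        return p_promo * (p_sale if sale else 1.0 - p_sale)

    num = mu * likelihood(phi_h, alpha_h)
    return num / (num + (1.0 - mu) * likelihood(phi_l, alpha_l))


PHI_L, PHI_H = 0.2, 0.6
RHO, RHO_C = 0.5, 0.3
MU = 0.4

# Promotion probabilities that equalize the unconditional sale probability.
ALPHA_L = 1.0
ALPHA_H = (PHI_L * ALPHA_L * RHO + (PHI_H - PHI_L) * RHO_C) / (PHI_H * RHO)

after_promo_sale = posterior_observed(MU, True, True, ALPHA_H, ALPHA_L,
                                      PHI_H, PHI_L, RHO, RHO_C)
after_no_promo = posterior_observed(MU, False, False, ALPHA_H, ALPHA_L,
                                    PHI_H, PHI_L, RHO, RHO_C)

# The stringent conditions: equal promotion probabilities and rho = rho_c.
stringent = posterior_observed(MU, True, True, 1.0, 1.0, PHI_H, PHI_L, 0.3, 0.3)

print(after_promo_sale, after_no_promo, stringent)
```

In this example a period without promotion reveals $\phi = \phi_H$ outright, because the seller is always promoted when $\phi = \phi_L$; only under the stringent conditions does the belief stay at the prior.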

A.4 Designing Simple Policies

In this section, we detail a recipe for designing optimal simple platform policies. In general, given any concrete demand structure that satisfies the mild conditions in Assumption 1, one can design the optimal simple policy in three steps: (i) characterize a simple promotion policy which generates value $W^C(\mu)$ for all $\mu$ given that the seller is myopic; (ii) determine $\mathrm{co}(W^C)(\mu)$ based on the characterization of $W^C(\mu)$; and (iii) determine an optimal simple signal given $\mu$, $W^C(\mu)$, and $\mathrm{co}(W^C)(\mu)$.

Working backwards, once the first two steps have been completed, determining the optimal signaling mechanism is straightforward. From the previous analysis, the optimal long-run average consumer surplus is $\mathrm{co}(W^C)(\mu)$, and the seller's belief does not update from sales observations. Therefore, an optimal signaling mechanism ensures the seller's posterior belief distribution (in period 1) is optimal given that the expected continuation value will be $W^C(\mu)$ in every period. Thus, an optimal simple signal, $\sigma'$, takes the form:

$$\sigma'(\phi_L) = \begin{cases} \phi_L, & \text{w.p. } \left(\dfrac{1-\mu'}{1-\mu}\right)\left(\dfrac{\mu''-\mu}{\mu''-\mu'}\right) \\[2mm] \phi_H, & \text{w.p. } 1 - \left(\dfrac{1-\mu'}{1-\mu}\right)\left(\dfrac{\mu''-\mu}{\mu''-\mu'}\right) \end{cases}
\qquad
\sigma'(\phi_H) = \begin{cases} \phi_L, & \text{w.p. } \left(\dfrac{\mu'}{\mu}\right)\left(\dfrac{\mu''-\mu}{\mu''-\mu'}\right) \\[2mm] \phi_H, & \text{w.p. } 1 - \left(\dfrac{\mu'}{\mu}\right)\left(\dfrac{\mu''-\mu}{\mu''-\mu'}\right) \end{cases}$$

where $\mu' = \sup\{\nu \le \mu : \mathrm{co}(W^C)(\nu) = W^C(\nu)\}$ and $\mu'' = \inf\{\nu \ge \mu : \mathrm{co}(W^C)(\nu) = W^C(\nu)\}$.

In the second step, one then solves for the concavification of $W^C(\mu)$. In many cases, for example the demand model of Example 1, $W^C(\mu)$ can be described analytically and is concave on the interior $(0,1)$, which makes determining $\mathrm{co}(W^C)$ a simple exercise.
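As a numerical illustration of steps (ii) and (iii), the following Python sketch computes the concave envelope of $W^C$ on a grid of beliefs, locates the support points $\mu'$ and $\mu''$ around a prior $\mu$, and evaluates the probabilities that define $\sigma'$. The function names, the grid, and any values of $W^C$ passed in are illustrative assumptions and are not taken from the paper; the grid is assumed to include the endpoints 0 and 1.

```python
def upper_hull(mu, w):
    """Indices of grid points on the upper concave envelope of (mu[i], w[i]).

    Assumes mu is strictly increasing. Points lying on or below a chord
    between two other hull points are dropped.
    """
    hull = []
    for i in range(len(mu)):
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            cross = (mu[b] - mu[a]) * (w[i] - w[a]) - (w[b] - w[a]) * (mu[i] - mu[a])
            if cross >= 0:  # b lies on or below the chord from a to i
                hull.pop()
            else:
                break
        hull.append(i)
    return hull


def support_points(mu, hull, prior):
    """Return (mu', mu''): the nearest hull beliefs below and above the prior."""
    lo = max(mu[i] for i in hull if mu[i] <= prior)
    hi = min(mu[i] for i in hull if mu[i] >= prior)
    return lo, hi


def simple_signal(prior, lo, hi):
    """Probabilities of sending the low signal given phi_L and given phi_H."""
    if lo == hi:
        # The prior is itself a support point, so the signal reveals nothing.
        return 1.0, 1.0
    k = (hi - prior) / (hi - lo)  # total probability of the low signal
    p_low_given_l = (1.0 - lo) / (1.0 - prior) * k
    p_low_given_h = lo / prior * k
    return p_low_given_l, p_low_given_h
```

Applying Bayes' rule to the returned probabilities recovers posterior $\mu'$ after the low signal and $\mu''$ after the high signal, so the posterior distribution averages back to the prior.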
In particular, when $W^C$ is concave on $(0,1)$, one only needs to find the unique belief $\tilde\mu$ where $W^C(\tilde\mu) + (\mu - \tilde\mu)(W^C(1) - W^C(\tilde\mu))$ lies entirely above $W^C(\mu)$ on $[0,1]$. If $W^C(\mu)$ cannot be described analytically, then $W^C(\mu)$ can be determined numerically over a grid of beliefs. In this case, finding $\mathrm{co}(W^C)(\mu)$ is a simple numerical procedure.

Finally, in completing the first step, the platform must solve the following optimization problem for each $\mu \in [0,1]$:

$$\max_{\boldsymbol\alpha \in \mathcal{A}^S} \frac{1}{T}\, \mathbb{E}\left( \sum_{t=1}^{T} W_c(p_t, a_t) \,\middle|\, \boldsymbol\alpha, \boldsymbol\pi^*, \mu \right).$$

Again, while $\mathcal{A}^S$ remains a large space of policies, we can simplify the problem in several steps. First, by considering simple promotion policies, we need only consider the analysis with $T = 1$. Second, by Proposition 1, for a fixed $\mu$, the promotion and pricing policies can be characterized using only three variables: the optimal price $p \in P$, and the probability of promotion for each realized value of $\phi$, $\alpha_\phi \in [0,1]$ for $\phi \in \{\phi_L, \phi_H\}$. Moreover, we observe that one may remove the dependence on one of the two latter quantities through the confounding constraint. If $\mu \in (0,1)$, then given $\alpha_{\phi_L}$ and price $p$, the confounding constraint fully defines $\alpha_{\phi_H}$:

$$\alpha_{\phi_H} = \left(\frac{\phi_H - \phi_L}{\phi_H}\right)\left(\frac{\bar\rho_c(p)}{\bar\rho(p)}\right) + \alpha_{\phi_L}\left(\frac{\phi_L}{\phi_H}\right).$$

In the case that $\mu \in \{0,1\}$, one of $\alpha_{\phi_L}$ or $\alpha_{\phi_H}$ does not affect the outcome and can be ignored; for simplicity we proceed describing the problem in terms of $\alpha_{\phi_L}$ assuming $\mu \in [0,1)$, but the method for determining $W^C(1)$ is essentially identical. Substituting this constraint results in the following optimization problem:

$$\begin{aligned}
W^C(\mu) := \max_{\alpha_{\phi_L} \in [0,1],\, p \in P}\ & \left(\bar W(p) - \bar W_c\right)\left(\mu(\phi_H - \phi_L)\frac{\bar\rho_c(p)}{\bar\rho(p)} + \phi_L \alpha_{\phi_L}\right) + \bar W_c(p)\left(1 - \bar\phi(\mu)\right) + \bar\phi(\mu)\bar W_c \\
\text{s.t.}\ & (1 - \phi_L)\, p\, \bar\rho_c(p) + \phi_L \alpha_{\phi_L}\, p\, \bar\rho(p) \ge p^* \bar\rho(p^*, c)\left(1 - \phi_L - (\phi_H - \phi_L)\mu\right).
\end{aligned} \tag{16}$$

Given a price $p \in P$, the objective is linear in $\alpha_{\phi_L}$, so at the optimal solution, at least one of the constraints will bind. That is,

$$\alpha_{\phi_L} \in \left\{ 0,\ \frac{\pi^O(\mu) - (1 - \phi_L)\, p\, \bar\rho_c(p)}{\phi_L\, p\, \bar\rho(p)},\ 1 \right\}.$$

In each of these three cases, the only optimization variable that remains is the price, so it is often possible to characterize $W^C(\mu)$ and the associated promotion policy analytically in closed form as shown in Example 1. However, even when the analytical characterization is not possible, with a concrete demand model specified, one may characterize $W^C(\mu)$ numerically and then follow steps (ii) and (iii).

The structure of the optimal $(p, \alpha_{\phi_L})$ depends on the underlying demand model and reflects the tension that the platform faces in achieving three goals: incentivizing low seller prices, promoting the best product offering to impatient consumers, and confounding the seller. In some cases, these goals are aligned. For example, if the seller's product generates more expected consumer surplus than the competition and confounding the seller is easier at lower prices (reflected by a decreasing ratio $\rho_c(p)/\rho(p)$), then setting $\alpha_{\phi_L} = 1$ and selecting $p$ as the smallest price that satisfies the constraint is optimal. Depending on the demand model, however, this may not be the case. If the competition generates more consumer surplus in expectation, then increasing $\alpha_{\phi_L}$ means more impatient consumers see an inferior product but also that the platform may incentivize a lower price which benefits patient consumers.
In this case, the platform must balance these competing objectives. Similarly, depending on the structure of $\rho_c(p)/\rho(p)$, the goal of confounding may or may not be aligned with the other two because the ratio can increase or decrease in $p$.

B Proofs

Throughout all proofs, we will refer to the patience type of consumer $t$ explicitly using $\psi_t \in \{I, P\}$ and include $\psi$ in the arguments for the functions $W(p, a)$ and $\rho(p, a)$ defined earlier in the paper.

B.1 Proof of Lemma 1

Recall the Lemma:

Suppose the seller is using the myopic Bayesian pricing policy $\boldsymbol\pi^*$. Then, for any $\boldsymbol\alpha \in \mathcal{A}$, $\sigma \in \Sigma$, there exists a promotion policy $\boldsymbol\alpha' \in \mathcal{A}^M$ such that:

$$W_T^{\boldsymbol\alpha,\sigma,\boldsymbol\pi^*}(\mu) \le W_T^{\boldsymbol\alpha',\sigma,\boldsymbol\pi^*}(\mu).$$

Fix $\boldsymbol\alpha \in \mathcal{A}$ and $\sigma \in \Sigma$. We construct $\boldsymbol\alpha'$ that satisfies the properties by inductively altering $\boldsymbol\alpha$ backwards over periods $t = T, \ldots, 1$. For $h \in H_t$, define the expected consumer surplus that is generated in the remaining periods:

$$W_{t,T}^{\boldsymbol\alpha,\sigma,\boldsymbol\pi^*}(h) = \mathbb{E}_Z\left( \sum_{t'=t}^{T} W(p_{t'}, a_{t'}) \,\middle|\, \boldsymbol\alpha, \sigma, \boldsymbol\pi^*, h_t = h \right).$$

Given $\bar h \in \bar H_t$, let $[\bar h] := \{\bar h' \in \bar H_t : \mu_t(\langle \boldsymbol\alpha, \sigma, \bar h' \rangle) = \mu_t(\langle \boldsymbol\alpha, \sigma, \bar h \rangle)\} \subset \bar H_t$. In other words, $[\bar h]$ is the equivalence class of $\bar h$, where $\bar h$ is equivalent to outcomes in $\bar H_t$ which induce the same seller belief.

Period $T$: Define a function $h_T : [0,1] \to \bar H_T$ that maps each belief $\mu$ to an element of $\bar H_T$ that 1) generates belief $\mu$ and 2) generates the highest expected continuation consumer surplus of outcomes that generate belief $\mu$:

$$h_T(\mu) = \bar h' \quad \text{where } \mu_T(\langle \boldsymbol\alpha, \sigma, \bar h' \rangle) = \mu \text{ and } W_{T,T}^{\boldsymbol\alpha,\sigma,\boldsymbol\pi^*}(\bar h') \ge W_{T,T}^{\boldsymbol\alpha,\sigma,\boldsymbol\pi^*}(\bar h'') \text{ for } \bar h'' \in [\bar h'].$$

Then, for each $\bar h \in \bar H_T$, define $\alpha'_T(p, \phi, \bar h) = \alpha_T(p, \phi, h_T(\mu_T(\boldsymbol\alpha, \sigma, \bar h)))$, and $\boldsymbol\alpha'_T = \{\alpha_t\}_{t=1}^{T-1} \cup \{\alpha'_T\}$.

First, $\boldsymbol\alpha'_T$ satisfies the conditions of Definition 2. Because $\{\alpha_t\}_{t=1}^{T-1}$ and $\sigma$ are unchanged, we have that for all $\bar h \in \bar H_T$: $\mu_T(\langle \boldsymbol\alpha'_T, \sigma, \bar h \rangle) = \mu_T(\langle \boldsymbol\alpha, \sigma, \bar h \rangle)$. Thus, by our construction of $\boldsymbol\alpha'_T$, at any $\bar h', \bar h''$ where $\mu_T(\langle \boldsymbol\alpha'_T, \sigma, \bar h' \rangle) = \mu_T(\langle \boldsymbol\alpha'_T, \sigma, \bar h'' \rangle)$, one has:

$$\alpha'_T(p, \phi, \bar h') = \alpha'_T(p, \phi, \bar h''), \quad \forall p \in P,\ \phi \in \{\phi_L, \phi_H\}.$$

Second, we show that the adjusted policy $\boldsymbol\alpha'_T$ generates at least as much consumer surplus. Note that at $\bar h'$ and $\bar h''$, the price set according to $\boldsymbol\pi^*$ is the same because the seller's belief is the same and $\alpha'_T$ is identical for every $p$ and $\phi$. Thus, the seller's expected revenue at every price $p$ is the same and the expected consumer surplus is also the same. Therefore,

$$W_{T,T}^{\boldsymbol\alpha'_T,\sigma,\boldsymbol\pi^*}(\bar h) = \max_{\bar h' \in [\bar h]} W_{T,T}^{\boldsymbol\alpha,\sigma,\boldsymbol\pi^*}(\bar h') \ge W_{T,T}^{\boldsymbol\alpha,\sigma,\boldsymbol\pi^*}(\bar h).$$

Thus, we can express $\alpha'_T$ as a function of $\mu_T$, the belief in period $T$. Moreover, because the promotion policy and pricing decisions in previous periods are unaffected by changes in period $T$, we have:

$$W_T^{\boldsymbol\alpha'_T,\sigma,\boldsymbol\pi^*}(\mu) \ge W_T^{\boldsymbol\alpha,\sigma,\boldsymbol\pi^*}(\mu).$$

We continue this procedure iteratively.
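The period-$T$ construction above can be sketched for a finite set of histories as follows. This is only an illustration under assumed inputs: the dictionaries `belief`, `surplus`, and `rule`, and the function name, are hypothetical and stand in for $\mu_T(\cdot)$, $W_{T,T}(\cdot)$, and $\alpha_T(\cdot,\cdot,\bar h)$; beliefs are compared by exact equality, which a numerical implementation would likely replace by a tolerance.

```python
def belief_measurable_rule(histories, belief, surplus, rule):
    """Assign to every history the promotion rule of the best history in its belief class.

    histories : iterable of history labels
    belief    : dict mapping a history to the seller belief it induces
    surplus   : dict mapping a history to its expected continuation consumer surplus
    rule      : dict mapping a history to the original period-T promotion rule
    """
    best = {}  # belief -> history with the highest continuation surplus
    for h in histories:
        mu = belief[h]
        if mu not in best or surplus[h] > surplus[best[mu]]:
            best[mu] = h
    # Every history inducing the same belief now uses the same rule.
    return {h: rule[best[belief[h]]] for h in histories}
```

The returned mapping depends on a history only through the belief it induces, which is the sense in which the adjusted period-$T$ policy can be written as a function of $\mu_T$.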

Induction Hypothesis: Fix $1 \le t \le T$. Assume that we have constructed $\boldsymbol\alpha'_{t+1}$ such that the policy for the first $t$ periods is the same as $\boldsymbol\alpha$: that is, $\alpha'_{t'} = \alpha_{t'}$ for all $t' \le t$. Moreover, for all $t' > t$, we can express the promotion policy as a function of the seller's belief. That is, $\alpha'_{t'} : P \times \{\phi_L, \phi_H\} \times [0,1] \to [0,1]$. Finally, the continuation consumer surplus is weakly higher:

$$W_{t+1,T}^{\boldsymbol\alpha'_{t+1},\sigma,\boldsymbol\pi^*}(\bar h) \ge W_{t+1,T}^{\boldsymbol\alpha,\sigma,\boldsymbol\pi^*}(\bar h), \quad \forall \bar h \in \bar H_{t+1}, \qquad \text{and} \qquad W_T^{\boldsymbol\alpha'_{t+1},\sigma,\boldsymbol\pi^*}(\mu) \ge W_T^{\boldsymbol\alpha,\sigma,\boldsymbol\pi^*}(\mu).$$

Induction Step: Fix $t$ and $\boldsymbol\alpha'_{t+1}$ that satisfies the induction hypothesis. Define a function $h_t : [0,1] \to \bar H_t$ that maps each belief $\mu$ to an element of $\bar H_t$ that 1) generates belief $\mu$ and 2) generates the highest expected continuation consumer surplus of outcomes that generate belief $\mu$:

$$h_t(\mu) = \bar h' \quad \text{where } \mu_t(\langle \boldsymbol\alpha'_{t+1}, \sigma, \bar h' \rangle) = \mu \text{ and } W_{t,T}^{\boldsymbol\alpha'_{t+1},\sigma,\boldsymbol\pi^*}(\bar h') \ge W_{t,T}^{\boldsymbol\alpha'_{t+1},\sigma,\boldsymbol\pi^*}(\bar h'') \text{ for } \bar h'' \in [\bar h'].$$

Then, for each $\bar h \in \bar H_t$, define $\alpha'_t(p, \phi, \bar h) = \alpha_t(p, \phi, h_t(\mu_t(\boldsymbol\alpha'_{t+1}, \sigma, \bar h)))$, and $\boldsymbol\alpha'_t = \{\alpha_\tau\}_{\tau=1}^{t-1} \cup \{\alpha'_\tau\}_{\tau=t}^{T}$.

We next show that $\boldsymbol\alpha'_t$ satisfies the conditions of Definition 2 for periods $t' = t, \ldots, T$. For all $\bar h \in \bar H_t$:

$$\mu_t(\langle \boldsymbol\alpha'_t, \sigma, \bar h \rangle) = \mu_t(\langle \boldsymbol\alpha'_{t+1}, \sigma, \bar h \rangle),$$

because $\{\alpha_{t'}\}_{t'=1}^{t-1}$ and $\sigma$ are unchanged. Therefore, by construction, at any $\bar h', \bar h''$ where $\mu_t(\langle \boldsymbol\alpha'_t, \sigma, \bar h' \rangle) = \mu_t(\langle \boldsymbol\alpha'_t, \sigma, \bar h'' \rangle)$, one has:

$$\alpha'_t(p, \phi, \bar h') = \alpha'_t(p, \phi, \bar h''), \quad \forall p \in P,\ \phi \in \{\phi_L, \phi_H\}.$$

The policy is unchanged in periods $t+1, \ldots, T$, so the condition is satisfied in those as well.

(Footnote: When multiple prices maximize the seller's revenue and the consumer surplus, it is possible, in principle, to have myopic pricing policies that depend on the history, but it is without loss of platform optimality to assume that the price set by every myopic pricing policy is the same one. Moreover, this does not occur under optimal promotion policies.)

Second, the expected continuation consumer surplus is at least as large. Fix history $\bar h \in \bar H_t$:

$$W_{t,T}^{\boldsymbol\alpha'_t,\sigma,\boldsymbol\pi^*}(\bar h) = \max_{\bar h' \in [\bar h]} W_{t,T}^{\boldsymbol\alpha'_{t+1},\sigma,\boldsymbol\pi^*}(\bar h') \ge W_{t,T}^{\boldsymbol\alpha'_{t+1},\sigma,\boldsymbol\pi^*}(\bar h).$$

The first equality again follows because the price set by a myopic seller is independent of the history except for its effect on the platform promotion policy (which by construction is the same) and the seller's belief (which is the same by assumption). Therefore, given a belief $\mu$, the seller sets the same price at every history that generates belief $\mu$ and the expected consumer surplus in the current period is the same. Moreover, the consumer surplus in future periods is the same because the distribution of beliefs that are induced in subsequent periods is also the same given the same belief in period $t$, and by the induction hypothesis, outcomes (including consumer surplus) are uniquely determined by the seller's belief in periods $t+1$ to $T$ under $\boldsymbol\alpha'_t$. Thus,

$$W_T^{\boldsymbol\alpha'_t,\sigma,\boldsymbol\pi^*}(\mu) \ge W_T^{\boldsymbol\alpha,\sigma,\boldsymbol\pi^*}(\mu),$$

and $\boldsymbol\alpha'_t$ satisfies the conditions of the induction hypothesis.

Continuing this process iteratively until period 1, we obtain $\boldsymbol\alpha' \in \mathcal{A}^M$ with $W_T^{\boldsymbol\alpha,\sigma,\boldsymbol\pi^*}(\mu) \le W_T^{\boldsymbol\alpha',\sigma,\boldsymbol\pi^*}(\mu)$.

B.2 Proof of Theorem 1

The proof is divided into three sections. We first state preliminaries and auxiliary results in § B.2.1. Using these results, we then prove the statement of Theorem 1 in § B.2.2. Finally, we prove the auxiliary results in § B.2.3.

B.2.1 Preliminaries and Auxiliary Results

First, recall that $\boldsymbol\pi^*$ (see Definition 1) is the Bayesian myopic pricing policy that maximizes the current consumer surplus. By Lemma 1, since the seller is myopic, it is without loss to specify the platform promotion strategy as a function of the seller belief instead of the entire history. Thus, throughout the proof we focus our analysis on $\mathcal{A}^M \subset \mathcal{A}$.

For a fixed $\epsilon > 0$ and $\boldsymbol\alpha \in \mathcal{A}^M$, define the sets of beliefs $M^{\alpha}_t(\epsilon) \subset [0,1]$, $t = 1, \ldots, T$, where the expected consumer surplus is at least $\epsilon$ more than the corresponding value $\mathrm{co}(W^C)(\mu)$:

$$M^{\alpha}_t(\epsilon) := \left\{ \mu \in [0,1] : \mathbb{E}_{a_t, p_t, \psi, \phi}\left( W(p_t, a_t, \psi) \mid \alpha_t, \boldsymbol\pi^*, \mu \right) > \mathrm{co}(W^C)(\mu) + \epsilon \right\}.$$

The following result establishes that if the platform uses a promotion policy that generates expected consumer surplus greater than $\mathrm{co}(W^C)(\mu) + \epsilon$, the sales observation is informative for the seller.

Lemma 2 (Separation of Purchase Probabilities). Fix $\epsilon > 0$. There exists $\delta > 0$ such that for all $\boldsymbol\alpha \in \mathcal{A}^M$, if $\mu \in M^{\alpha}_t(\epsilon)$, then:

$$\left| \phi_H \alpha(p_t, \phi_H, \mu)\bar\rho(p_t) + (1 - \phi_H)\bar\rho_c(p_t) - \phi_L \alpha(p_t, \phi_L, \mu)\bar\rho(p_t) - (1 - \phi_L)\bar\rho_c(p_t) \right| > \delta.$$

Next we show that beliefs converge to the truth exponentially fast (in the number of periods that $\mu_t \in M^{\alpha}_t(\epsilon)$), which is closely related to Harrison et al. (2012) Lemma A.1. Define

$$t_n = \min\left\{ t : \sum_{t'=1}^{t} \mathbf{1}\{\mu_{t'} \in M^{\alpha}_{t'}(\epsilon)\} \ge n \right\}, \tag{17}$$

where $t_n = T + 1$ if $n > \sum_{t'=1}^{T} \mathbf{1}\{\mu_{t'} \in M^{\alpha}_{t'}(\epsilon)\}$, and for convenience, we define history $h_{T+1}$ to include $\phi$ so that $\mu_{T+1} = 0$ if $\phi = \phi_L$ and $\mu_{T+1} = 1$ if $\phi = \phi_H$.

Lemma 3 (Convergence of Seller Beliefs). Fix $\mu \in [0,1]$ and let $\{t_n\}$ be defined according to (17). There exist constants $\chi, \psi > 0$ such that:

$$\mathbb{E}(\mu_{t_n} \mid \phi = \phi_L) \le \chi \exp(-\psi n), \qquad \mathbb{E}(1 - \mu_{t_n} \mid \phi = \phi_H) \le \chi \exp(-\psi n), \qquad \forall n = 1, 2, \ldots, T.$$

Finally, define $W^{\max}(\mu)$ as the maximum consumer surplus achievable by any promotion policy when $T = 1$ and the seller has belief $\mu$. Recalling the definition of $\bar W(\phi, \mu)$ from the main text:

$$W^{\max}(\mu) := \mu \bar W(\phi_H, \mu) + (1 - \mu)\bar W(\phi_L, \mu).$$

Lemma 4 ($W^C(\mu)$ Bounded by Linear Functions). Fix $\epsilon > 0$. There exists $\bar C \ge 0$ such that for all $\mu \in [0,1]$:

$$\mathrm{co}(W^{\max})(\mu) - \mathrm{co}(W^C)(\mu) < \frac{\epsilon}{4} + \bar C \mu, \quad \text{and} \quad \mathrm{co}(W^{\max})(\mu) - \mathrm{co}(W^C)(\mu) < \frac{\epsilon}{4} + \bar C (1 - \mu).$$

B.2.2 Proof of Theorem 1

Fix $\epsilon > 0$, $T \ge 1$, and platform strategies $\boldsymbol\alpha \in \mathcal{A}^M$, $\sigma \in \Sigma$. Let $\mathbb{E}_Z$ indicate expectation with respect to any randomness in the pricing policy, promotion policy, customer types, purchase decisions, and the true value of $\phi$: that is, $(p, a, \psi, y, \phi)$. Then:

$$\begin{aligned}
W_T^{\boldsymbol\alpha,\sigma,\boldsymbol\pi^*}(\mu) - T\,\mathrm{co}(W^C)(\mu)
&= \mathbb{E}_Z\left( \sum_{t=1}^{T} W(p_t, a_t, \psi_t)\mathbf{1}\{\mu_t \in M^{\alpha}_t(\epsilon/2)\} + W(p_t, a_t, \psi_t)\mathbf{1}\{\mu_t \notin M^{\alpha}_t(\epsilon/2)\} \right) - T\,\mathrm{co}(W^C)(\mu) \\
&\overset{(a)}{\le} \mathbb{E}_Z\left( \sum_{t=1}^{T} W(p_t, a_t, \psi_t)\mathbf{1}\{\mu_t \in M^{\alpha}_t(\epsilon/2)\} + W(p_t, a_t, \psi_t)\mathbf{1}\{\mu_t \notin M^{\alpha}_t(\epsilon/2)\} - \mathrm{co}(W^C)(\mu_t) \right) \\
&\overset{(b)}{\le} \sum_{t=1}^{T} \mathbb{E}_Z\left( \left( W(p_t, a_t, \psi_t) - \mathrm{co}(W^C)(\mu_t) \right)\mathbf{1}\{\mu_t \in M^{\alpha}_t(\epsilon/2)\} \right) + \frac{\epsilon}{2} \sum_{t=1}^{T} \mathbb{E}_Z\, \mathbf{1}\{\mu_t \notin M^{\alpha}_t(\epsilon/2)\}.
\end{aligned} \tag{18}$$

(a) $\mathrm{co}(W^C)$ is concave by construction and $\mathbb{E}\mu_t = \mu$ for all $t$ because Bayesian beliefs are a martingale. Thus, by Jensen's inequality: $\mathbb{E}\,\mathrm{co}(W^C)(\mu_t) \le \mathrm{co}(W^C)(\mathbb{E}\mu_t) = \mathrm{co}(W^C)(\mu)$. (b) Follows from splitting $\mathrm{co}(W^C)(\mu_t)$ across the outcomes $\mu_t \in M^{\alpha}_t(\epsilon/2)$ and $\mu_t \notin M^{\alpha}_t(\epsilon/2)$ and applying the definition of $M^{\alpha}_t(\epsilon/2)$.

The second term in (18) is bounded by $T\epsilon/2$ for all $T$. We complete the proof by showing the existence of $\bar T$ such that for all $T > \bar T$ the first term is less than $T\epsilon/2$. That is:

$$\sum_{t=1}^{T} \mathbb{E}_Z\left( \left( W(p_t, a_t, \psi_t) - \mathrm{co}(W^C)(\mu_t) \right)\mathbf{1}\{\mu_t \in M^{\alpha}_t(\epsilon/2)\} \right) \le \frac{T\epsilon}{2}.$$

Condition on $\phi = \phi_L$ and select $\chi, \psi > 0$ according to Lemma 3. Then:

$$\begin{aligned}
\frac{1}{T} \sum_{n=1}^{T} \mathbb{E}_Z\left( W(p_{t_n}, a_{t_n}, \psi_{t_n}) - \mathrm{co}(W^C)(\mu_{t_n}) \,\middle|\, \phi = \phi_L \right)
&\le \frac{1}{T} \sum_{n=1}^{T} \mathbb{E}_Z\left( \mathrm{co}(W^{\max})(\mu_{t_n}) - \mathrm{co}(W^C)(\mu_{t_n}) \,\middle|\, \phi = \phi_L \right) \\
&\le \frac{1}{T} \sum_{n=1}^{T} \left( \frac{\epsilon}{4} + \bar C\, \mathbb{E}_Z(\mu_{t_n} \mid \phi = \phi_L) \right) && \text{[by Lemma 4]} \\
&\le \frac{\epsilon}{4} + \frac{\bar C}{T} \sum_{n=1}^{T} \chi \exp(-\psi n) && \text{[by Lemma 3]} \\
&\le \frac{\epsilon}{4} + \frac{\bar C}{T} \sum_{n=1}^{\infty} \chi \exp(-\psi n) \\
&= \frac{\epsilon}{4} + \frac{\bar C \chi}{T\,(e^{\psi} - 1)}.
\end{aligned}$$

The same bound holds when conditioning on $\phi = \phi_H$. Thus we can select $\bar T > \dfrac{4 \bar C \chi}{\epsilon\,(e^{\psi} - 1)}$, and for any $T > \bar T$:

$$\frac{1}{T} W_T^{\boldsymbol\alpha,\sigma,\boldsymbol\pi^*}(\mu) < \mathrm{co}(W^C)(\mu) + \epsilon.$$

Letting $\epsilon \to 0$, we have:

$$\lim_{T\to\infty} \sup_{\boldsymbol\alpha \in \mathcal{A},\, \sigma \in \Sigma} \frac{1}{T} W_T^{\boldsymbol\alpha,\sigma,\boldsymbol\pi^*}(\mu) \le \mathrm{co}(W^C)(\mu).$$

To establish the reverse direction, we prove the existence of $\boldsymbol\alpha, \sigma$ that achieve $\mathrm{co}(W^C)(\mu)$ for all $\mu$. First, there exists a promotion policy that generates $W^C(\mu)$ for $\mu \in [0,1]$ and any $T \ge 1$. Second, there exists a signaling mechanism that, coupled with the promotion policy, generates expected payoff $\mathrm{co}(W^C)(\mu)$ in every period.

Existence of Optimal Confounding Promotion Policy. Consider the optimization (20) from the proof of Lemma 2. Fix $\mu \in [0,1]$, $\delta = 0$. The feasible set of (20), $F(\mu, \delta) \subset [0,1]^2 \times P$, is compact since it is closed and bounded for any fixed $\mu, \delta$. Thus, there exists an optimal solution to (20) by the extreme value theorem.

Let $\boldsymbol\alpha^C \in \mathcal{A}^M$ correspond to the simple promotion policy where the one-period solution to (20) with $\delta = 0$ is repeated $T$ times for every $\mu$. By construction the seller's belief, myopically optimal price, and the expected welfare are the same in each period. Thus, the payoff generated by this policy given posterior belief $\mu \in [0,1]$ is $T \cdot W^C(\mu)$.

Existence of Optimal Signaling Mechanism

We now show that an optimal signal achieves $\mathrm{co}(W^C)(\mu)$. $W^C(\mu)$ is continuous in $\mu$ by the proof of Lemma 4. Therefore, an optimal signal $\sigma \in \Sigma$ exists (see Kamenica and Gentzkow (2011) Corollary 1 and discussion), and by Kamenica and Gentzkow (2011) Corollary 2, the optimal signal at prior $\mu$ generates value $\mathrm{co}(W^C)(\mu)$ in each period.

Thus, for any $\mu, T$, there exist $\boldsymbol\alpha, \sigma$ such that:

$$\frac{1}{T} W_T^{\boldsymbol\alpha,\sigma,\boldsymbol\pi^*}(\mu) = \mathrm{co}(W^C)(\mu).$$

B.2.3 Proofs of Auxiliary Results

Proof of Lemma 2.

Recall the Lemma:

Fix $\epsilon > 0$. There exists $\delta > 0$ such that for all $\boldsymbol\alpha \in \mathcal{A}^M$, if $\mu \in M^{\alpha}_t(\epsilon)$, then:

$$\left| \phi_H \alpha(p_t, \phi_H, \mu)\bar\rho(p_t) + (1 - \phi_H)\bar\rho_c(p_t) - \phi_L \alpha(p_t, \phi_L, \mu)\bar\rho(p_t) - (1 - \phi_L)\bar\rho_c(p_t) \right| > \delta.$$

Proof. By Lemma 1, it is without loss to specify the promotion policy as a function of $\mu$ instead of the entire history. Moreover, by Proposition 1 it is without loss to consider $\boldsymbol\alpha \in \mathcal{A}^P$. Given this simplification, we simplify notation by letting, for $\phi \in \{\phi_L, \phi_H\}$, $\alpha_\phi = \alpha(p, \phi, \mu)$, where $\mu$ will be left implicit and $p$ is the single price where $\alpha_\phi$ may be greater than 0.

With this notation, define the following relaxed optimization problem where the policy $\alpha$ need only be $\delta$-confounding. That is, for $\mu \in (0,1)$, $\delta \ge 0$, define:

$$\begin{aligned}
W^C(\mu, \delta) := \max_{\alpha_{\phi_H}, \alpha_{\phi_L} \in [0,1],\, p \in P}\ & \mathbb{E}_\phi\left( \phi \alpha_\phi \bar W(p) + \phi(1 - \alpha_\phi)\bar W_c + (1 - \phi)\bar W_c(p) \,\middle|\, \mu \right) \\
\text{s.t.}\ & p\rho(p)\left( \phi_L \alpha_{\phi_L}(1 - \mu) + \phi_H \alpha_{\phi_H}\mu \right) + p\rho_c(p)\left( 1 - \phi_L - \mu(\phi_H - \phi_L) \right) \ge p^*\rho_c(p^*)\left( 1 - \phi_L - \mu(\phi_H - \phi_L) \right), \\
& \left| \phi_H \alpha_{\phi_H}\bar\rho(p_t) + (1 - \phi_H)\bar\rho_c(p_t) - \phi_L \alpha_{\phi_L}\bar\rho(p_t) - (1 - \phi_L)\bar\rho_c(p_t) \right| \le \delta.
\end{aligned} \tag{20}$$

The first constraint requires that the pricing policy be myopically optimal. Setting $p$ as a decision variable ensures that it is the price that maximizes consumer surplus. With this optimization problem, we prove the statement of the Lemma in a series of statements.

1. The objective of (20) is Lipschitz continuous in $(\alpha_{\phi_L}, \alpha_{\phi_H}, p)$, as it is linear in $\alpha_{\phi_L}, \alpha_{\phi_H}$ (with bounded coefficients) and is Lipschitz continuous in $p$ by Assumption 2.

2. Fix $\epsilon > 0$. There exists $\bar\delta > 0$ such that for all $\mu \in [0,1]$, $\delta < \bar\delta$: for $x \in F(\mu, \delta)$, there exists $y \in F(\mu, 0)$ such that $\|x - y\| < \epsilon$.

Proof. Fix $\mu \in [0,1]$, $\bar\delta \ge 0$, and $(\alpha_{\phi_L}, \alpha_{\phi_H}, p) \in F(\mu, \bar\delta)$. We construct a new point in $F(\mu, 0)$ whose distance from the original will be a function of $\bar\delta$. Thus, we can select a small enough $\bar\delta$ (independently of $\mu$) that the statement holds.

Case 1: Assume that for $0 \le \xi \le \bar\delta$:

$$\phi_H \alpha_{\phi_H}\bar\rho(p_t) + (\phi_L - \phi_H)\bar\rho_c(p_t) - \phi_L \alpha_{\phi_L}\bar\rho(p_t) = -\xi.$$

In this case, $\left( \alpha_{\phi_L},\ \alpha_{\phi_H} + \frac{\xi}{\phi_H \rho(p)},\ p \right) \in F(\mu, 0)$.

Proof. Consider the three constraints, evaluated at the new point:

$$0 \le \alpha_{\phi_H} \le 1,$$
$$\phi_H \alpha_{\phi_H}\bar\rho(p_t) + (\phi_L - \phi_H)\bar\rho_c(p_t) - \phi_L \alpha_{\phi_L}\bar\rho(p_t) = 0,$$
$$p\rho(p)\left( \phi_L \alpha_{\phi_L}(1 - \mu) + \phi_H \alpha_{\phi_H}\mu \right) + p\rho_c(p)\left( 1 - \phi_L - \mu(\phi_H - \phi_L) \right) \ge p^*\rho_c(p^*)\left( 1 - \phi_L - \mu(\phi_H - \phi_L) \right).$$

The second constraint holding implies the first constraint holds. The second holds by construction. The third follows because the original point satisfied the constraint and $\frac{\xi}{\phi_H \rho(p)} \ge 0$.

Case 2: Assume that for $0 \le \xi < \bar\delta$:

$$\phi_H \alpha_{\phi_H}\bar\rho(p_t) + (\phi_L - \phi_H)\bar\rho_c(p_t) - \phi_L \alpha_{\phi_L}\bar\rho(p_t) = \xi.$$

Case 2a: $\alpha_{\phi_L} \le 1 - \frac{\xi}{\phi_L \rho(p)}$. It follows by construction that $\left( \alpha_{\phi_L} + \frac{\xi}{\phi_L \rho(p)},\ \alpha_{\phi_H},\ p \right) \in F(\mu, 0)$.

Case 2b: $\alpha_{\phi_L} > 1 - \frac{\xi}{\phi_L \rho(p)}$. It is without loss to consider $\alpha_{\phi_L} = 1$, since otherwise we can always select $\delta$ smaller so that (with $\xi < \delta$) the other case is relevant.

Consider the point $\left( \alpha_{\phi_L},\ \alpha_{\phi_H} - \frac{\xi}{\phi_H \rho(p)},\ p \right)$. The seller's revenue from pricing at $p$ decreases by $\mu p \xi$ from this change. To ensure that the new point is feasible, we must alter the seller's optimal price. Thus we have $\alpha_{\phi_L} = 1$, and

$$\alpha_{\phi_H} = \left( \frac{\phi_H - \phi_L}{\phi_H} \right)\frac{\rho_c(p)}{\rho(p)} + \frac{\phi_L}{\phi_H} + \frac{\xi}{\rho(p)\phi_H}.$$

Therefore at $p$ we have:

$$p\rho(p)\phi_L + p\rho_c(p)(1 - \phi_L) + \mu\xi p \ge p^*\rho_c(p^*)\left( 1 - \phi_L - \mu(\phi_H - \phi_L) \right).$$

If we leave $\alpha_{\phi_L} = 1$ and ensure that $\alpha'_{\phi_H}$ is confounding, we need to change $p$ to $p'$ so that

$$p'\rho(p')\phi_L + p'\rho_c(p')(1 - \phi_L) - \left( p\rho(p)\phi_L + p\rho_c(p)(1 - \phi_L) \right) \ge \mu\xi p.$$

Assume that $p < p^*$ (the reverse holds in the same way). Consider the seller's revenue as a function of $p$: it is concave in $p$ (by Assumption 1) and at $p = p^*$ (with $\alpha'_{\phi_H}$ confounding), the left-hand side of the first constraint in (20) is at least $\mu(\phi_H - \phi_L)p^*\rho_c(p^*) + p^*\rho(p^*)\phi_L$ greater than the right-hand side. Thus, for $\Delta > 0$, if we have $p' = p + \Delta < p^*$, the seller's revenue under the policy $(1, \alpha'_{\phi_H}, p')$ is greater than its revenue under the original policy $(1, \alpha_{\phi_H}, p)$ by at least

$$\frac{\Delta}{p^* - p}\left( \mu(\phi_H - \phi_L)p^*\rho_c(p^*) + p^*\rho(p^*)\phi_L \right).$$

Thus, a price increase $\Delta > 0$ is sufficient if:

$$\begin{aligned}
& \frac{\Delta}{p^* - p}\left( \mu(\phi_H - \phi_L)p^*\rho_c(p^*) + p^*\rho(p^*)\phi_L \right) \ge \mu\xi p \\
\Leftrightarrow\ & \Delta \ge \frac{(p^* - p)\,\mu\xi p}{\mu(\phi_H - \phi_L)p^*\rho_c(p^*) + p^*\rho(p^*)\phi_L} \\
\Leftarrow\ & \Delta \ge \frac{(p^* - p)\,\xi p}{(\phi_H - \phi_L)p^*\rho_c(p^*) + p^*\rho(p^*)\phi_L}.
\end{aligned}$$

The last implication follows because the right-hand side is increasing in $\mu$, so select

$$\Delta = \frac{p^*\xi p}{(\phi_H - \phi_L)p^*\rho_c(p^*) + p^*\rho(p^*)\phi_L},$$

and the new point will be feasible (independently of $\mu$).

Given $\Delta$, we calculate the resulting distance $|\alpha'_{\phi_H} - \alpha_{\phi_H}|$. We first note that $\rho_c(p)/\rho(p)$ is Lipschitz continuous in $p$ since $\rho(p)$ and $\rho_c(p)$ are Lipschitz continuous by Assumption 1, and they are bounded away from 0 at $p < p^*$. Assuming that the corresponding Lipschitz constant is $\bar D$, we have:

$$|\alpha'_{\phi_H} - \alpha_{\phi_H}| \le \bar D \Delta\, \frac{\phi_H - \phi_L}{\phi_H}.$$

Therefore,

$$\left| (\alpha_{\phi_L}, \alpha_{\phi_H}, p) - (\alpha_{\phi_L}, \alpha'_{\phi_H}, p') \right| < (\bar D + 1)\Delta < (\bar D + 1)(\bar G \xi).$$

In all three cases, the distance between the original point and the constructed point in $F(\mu, 0)$ is bounded by a constant multiple of $\xi$, which is less than $\bar\delta$ by assumption. Thus, we can select $\bar\delta$ small enough so that the distance between the points is less than $\epsilon$. Finally, since $F(\mu, \delta)$ is increasing in $\delta$, for any $\delta < \bar\delta$, $x \in F(\mu, \delta) \Rightarrow x \in F(\mu, \bar\delta)$, and the distance bound holds for all $\delta < \bar\delta$.

3. Fix $\epsilon > 0$. There exists $\bar\delta > 0$ such that for all $\mu \in [0,1]$ one has $W^C(\mu, \bar\delta) - W^C(\mu, 0) < \epsilon$. This follows by the Lipschitz continuity of the objective and the continuity (in $\delta$) of the feasibility sets (independently of $\mu$) as proven in the previous point.

This completes the proof of the Lemma.

Proof of Lemma 3.

By Lemma 2,one may ﬁx δ > µ ∈ M α t ( (cid:15) ), then: | φ H α ( p t , φ H , µ )¯ ρ ( p t ) + (1 − φ H )¯ ρ c ( p t ) − φ L α ( p t , φ L , µ )¯ ρ ( p t ) + (1 − φ L )¯ ρ c ( p t ) | > δ. Consider the ﬁrst inequality in the statement of the lemma (i.e. conditioned on φ = φ L ). The proofof the second follows nearly verbatim. Let E L indicate that we are taking expectation conditional on φ = φ L . Assume that σ is uninformative (we will incorporate this adjustment at the end) and considerthe evolution of the seller’s belief from this point on. From the proof of Harrison et al. (2012) LemmaA.1 (see equation (A4) and the following equation), we have that: E L ( µ t n ) = E L 

11 + (cid:16) − µ µ (cid:17) exp( L t n )  , where L t n = t n (cid:88) t (cid:48) =1 ( y t (cid:48) − ρ Lt ) log (cid:18) ρ Lt (cid:48) (1 − ρ Ht (cid:48) ) ρ Ht (cid:48) (1 − ρ Lt (cid:48) (cid:19) + t n (cid:88) t (cid:48) =1 (cid:18) (1 − ρ Lt (cid:48) ) log (cid:18) − ρ Lt (cid:48) − ρ Ht (cid:48) (cid:19) + ρ Lt (cid:48) log (cid:18) − ρ Lt (cid:48) − ρ Ht (cid:48) (cid:19)(cid:19) . (21)Considering the second summation in (21), we have: t n (cid:88) t (cid:48) =1 (1 − ρ Lt (cid:48) ) log (cid:18) − ρ Lt (cid:48) − ρ Ht (cid:48) (cid:19) + ρ Lt (cid:48) log (cid:18) − ρ Lt (cid:48) − ρ Ht (cid:48) (cid:19) ≥ n (cid:88) t (cid:48) =1 δ . This can be established by observing (see, e.g., the proof of Harrison et al. (2012) Lemma A.3) that48or x, y ∈ [0 , x log (cid:18) xy (cid:19) − x log (cid:18) − x − y (cid:19) ≥ x − y ) , and therefore, in periods where µ t ∈ M α t ( (cid:15) ), the summand is ≥

0. Otherwise, one may lower thebound the summand with 2 δ . Deﬁne the ﬁrst summation in (21) as M t . That is: M t := t n (cid:88) t (cid:48) =1 ( y t (cid:48) − ρ Lt ) log (cid:18) ρ Lt (cid:48) (1 − ρ Ht (cid:48) ) ρ Ht (cid:48) (1 − ρ Lt (cid:48) (cid:19) . Fix ξ >

0. Then, E LZ (cid:32)

11 + − µ µ exp( L t n ) (cid:33) = E LZ (cid:32)

11 + − µ µ exp( M t n + 2 nδ ) (cid:33) ( a ) ≤ E LZ (cid:32)

11 + − µ µ exp( M t n + 2 nδ ) ; | M t n | < ξt n (cid:33) + E LZ (cid:32)

11 + − µ µ exp( M t n + 2 nδ ) ; | M t n | ≥ ξt n (cid:33) ( b ) ≤ E LZ (cid:32)

11 + − µ µ exp( − ξt n + 2 nδ ) ; | M t n | < ξt n (cid:33) + P ( | M t n | ≥ ξt n | φ = φ L ) ( c ) ≤

11 + − µ µ exp( − ξt n + 2 nδ ) + (cid:18) γξ (cid:19) e − ξ γ n ( d ) ≤ µ − µ exp( ξt n − nδ ) + (cid:18) γξ (cid:19) e − ξ γ n ( e ) ≤ χe − ψn , where: (a) uses a lower bound on L t ; (b) holds by conditioning on M t and the fraction in the expectationbeing less than 1; and (c) By Harrison et al. (2012) Lemma A.3, there exists γ > t P ( | M t | ≥ ξt | φ = φ L ) ≤ (cid:18) − ξ t γ (cid:19) In our case, we cannot apply this directly to M t n because the stopping time t n could depend on thevalue of M t . Thus we integrate out the probability for all t ≥ n and have: P ( | M t n | ≥ ξt n | φ = φ L ) ≤ ∞ (cid:88) t = n P ( | M t | ≥ ξt | φ = φ L ) ≤ (cid:18) γξ (cid:19) e − ξ γ n . In addition, (d) follows by algebra, and (e) holds by setting χ = 2 max (cid:110) µ − µ , γξ (cid:111) and ψ = δ min (cid:110) , δ γ (cid:111) .Now we consider the evolution of the seller’s belief accounting for the platform’s opportunity to use asignal. Thus, we take expectation over the signal s and show that the result holds for any σ ∈ Σ. Fix49 ∈ Σ S , which by Proposition 1, is without loss of optimality. Thus µ can take two values which wedenote: µ = µ ( (cid:104) φ L (cid:105) ) ≤ µ ≤ µ ( (cid:104) φ H (cid:105) ) = µ which, using Bayes’ rule and algebra, implies that: P ( s = L | L ) = (1 − µ )( µ − µ )(1 − µ )( µ − µ ) , P ( s = L | H ) = µ ( µ − µ ) µ ( µ − µ ) . Note that: χ = 2 max (cid:26) µ − µ , γξ (cid:27) , and that ψ is independent of µ . 
Thus, taking expectation over the signal s , one has: E s ( µ t n | φ = φ L ) = P ( s = H | φ = φ L ) E ( µ t n | φ = φ L , µ = µ ) + P ( s = L | φ = φ L ) E (cid:0) µ t n | φ = φ L , µ = µ (cid:1) ≤ P ( s = H | φ = φ L )2 max (cid:26) µ − µ , γξ (cid:27) e − ξn + P ( s = L | φ = φ L )2 max (cid:26) µ − µ , γξ (cid:27) e − ξn ( a ) = (cid:16) µ − µ (cid:17) e − ξn , if µ − µ > γξ (cid:18) µ ( µ − (1 − µ ) (cid:16) γξ (cid:17) ) − µ ( µ − (1 − µ ) (cid:16) γξ (cid:17) )(1 − µ )( µ − µ ) (cid:19) e − ξn , if µ − µ < γξ < µ − µ γξ e − ξn , if µ − µ < γξ . In any of these cases, the constant on e − ξn is bounded. This completes the proof for the case of φ = φ L .The same proof holds when φ = φ H , where ( ψ, χ ) may be diﬀerent, and taking the maximum of themestablishes the result. Proof of Lemma 4.

We first establish that for all $\delta \ge 0$, $W^C(\mu, \delta)$ is continuous in $\mu$ (see below). Thus, because the confounding constraint in (16) need not hold for $\mu \in \{0, 1\}$, we have for all $\delta \ge 0$:

$$\lim_{\mu \to 0^+} W^C(\mu, \delta) \le W^C(0, \delta), \quad \text{and} \quad \lim_{\mu \to 1^-} W^C(\mu, \delta) \le W^C(1, \delta).$$

Finally, $\mathrm{co}(W^C)(\mu, \delta)$ is bounded. Taken together, these prove that there exists the required $\bar C \ge 0$. It remains to establish the continuity of $W^C(\mu, \delta)$ in $\mu$ for all $\delta \ge 0$, which follows from the continuity of the objective of (20) and the lower and upper hemicontinuity of its feasible set, shown next.

Proof of Continuity of Objective Function. The objective of (20) is continuous in $(\mu, \alpha_{\phi_L}, \alpha_{\phi_H}, p)$, as it is linear in $\mu, \alpha_{\phi_L}, \alpha_{\phi_H}$ and is continuous in $p$ by Assumption 2.

Proof of Lower Hemicontinuity. Fix $\mu \in [0,1]$ and $(\alpha_{\phi_L}, \alpha_{\phi_H}, p) \in F(\mu, \delta)$. Consider a sequence $\mu_n \to \mu$. We need to construct $(\alpha_{\phi_L}, \alpha_{\phi_H}, p)_n \in F(\mu_n, \delta)$ such that $(\alpha_{\phi_L}, \alpha_{\phi_H}, p)_n \to (\alpha_{\phi_L}, \alpha_{\phi_H}, p)$. The only constraint affected by $\mu$ is:

$$p\rho(p)\left( \phi_L \alpha_{\phi_L}(1 - \mu) + \phi_H \alpha_{\phi_H}\mu \right) + p\rho_c(p)\left( 1 - \phi_L - \mu(\phi_H - \phi_L) \right) \ge p^*\rho_c(p^*)\left( 1 - \phi_L - \mu(\phi_H - \phi_L) \right). \tag{22}$$

For any $n$ where $(\alpha_{\phi_L}, \alpha_{\phi_H}, p) \in F(\mu_n, \delta)$, set $(\alpha_{\phi_L}, \alpha_{\phi_H}, p)_n = (\alpha_{\phi_L}, \alpha_{\phi_H}, p)$. Otherwise, set $\alpha^n_{\phi_L} = \alpha_{\phi_L}$ and $\alpha^n_{\phi_H} = \alpha_{\phi_H}$. Moreover, set $p_n$ as the smallest price that is feasible given $\mu_n$. Thus, $p_n$ is continuous in $\mu_n$, so we have $p_n \to p$, which completes the proof.

Proof of Upper Hemicontinuity. Fix $\mu \in [0,1]$ and an open set $V \subset [0,1]^2 \times P$. If $F(\mu, \delta) \subset V$, then we must show: there exists $\epsilon > 0$ such that $\mu - \epsilon < \mu' < \mu + \epsilon \Rightarrow F(\mu', \delta) \subset V$.

Proof. Fix an open set $V$ such that $F(\mu, \delta) \subset V$. For any $\epsilon > 0$, we have that $F(\mu - \epsilon, \delta) \subset F(\mu, \delta) \subset V$ because increasing $\mu$ relaxes the only constraint affected by $\mu$. Thus, consider $\epsilon_n \downarrow 0$. We want to show that there exists $\bar N$ such that for $n > \bar N$, $F(\mu + \epsilon_n, \delta) \subset V$. This follows because $F(\mu + \epsilon_n, \delta)$ is closed for each $\epsilon_n$ and it converges to $F(\mu, \delta)$.

This concludes the proof of the Lemma.

B.3 Proof of Theorem 2

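Fix $T \ge 1$. There exists a platform policy $\boldsymbol\alpha \in \mathcal{A}$, $\sigma \in \Sigma$ and seller pricing policy $\boldsymbol\pi$ such that:

$$\mathrm{co}(W^C)(\mu) = \frac{1}{T} W_T^{\boldsymbol\alpha,\sigma,\boldsymbol\pi}(\mu) \ge \frac{1}{T} W_T^{\boldsymbol\alpha',\sigma',\boldsymbol\pi}(\mu), \quad \forall \boldsymbol\alpha' \in \mathcal{A},\ \sigma' \in \Sigma,$$

and at each period $t = 1, \ldots, T$ and every $\bar h \in \bar H_t$, $\boldsymbol\pi$ is myopic (i.e., satisfies (5)) and the best response to $(\boldsymbol\alpha, \sigma)$. That is,

$$V_t^{\boldsymbol\alpha,\sigma,\boldsymbol\pi}\left( \langle \boldsymbol\alpha, \sigma, \bar h \rangle \right) \ge V_t^{\boldsymbol\alpha,\sigma,\boldsymbol\pi'}\left( \langle \boldsymbol\alpha, \sigma, \bar h \rangle \right), \quad \forall \boldsymbol\pi' \in \Pi,\ \bar h \in \bar H,\ t = 1, \ldots, T.$$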

Proof.

From the proof of Theorem 1, there exists a confounding promotion policy and signaling mechanism that generates average consumer surplus $\mathrm{co}(W^C)(\mu)$ for all $\mu \in [0,1]$ and $T \ge 1$. Let $\boldsymbol{\alpha}^C \in \mathcal{A}^M$ correspond to this promotion policy. Moreover, let $\sigma^C$ correspond to a signaling mechanism that achieves this expected consumer surplus with a signal space of cardinality 2 (e.g. $S = \{s', s''\}$): the existence of a mechanism that achieves the payoff with the reduced signal space is established by Proposition 1. We will adjust the promotion policy slightly at off-path beliefs below to establish an equilibrium.

Equilibrium.

We construct $(\tilde{\boldsymbol{\alpha}}, \tilde{\sigma}, \tilde{\boldsymbol{\pi}})$ that satisfies the statement of the Theorem. Fix $\mu$. Set $\tilde{\sigma} = \sigma^C$. Since $\tilde{\sigma}$ has two outcomes, there are two possible values for the posterior $\mu_1$: denote these as $\mu', \mu''$ and assume they correspond to signals $s', s''$, respectively. Define, for all $t$ and $\bar{h} \in \bar{H}_t$, the promotion policy:

$$\tilde{\alpha}_t(p, \phi, \bar{h}) = \begin{cases} \alpha^C(p, \phi, \mu'), & \text{if } s = s' \text{ and } \mu(\bar{h}) = \mu', \\ \alpha^C(p, \phi, \mu''), & \text{if } s = s'' \text{ and } \mu(\bar{h}) = \mu'', \\ 0, & \text{otherwise.} \end{cases}$$

Letting $\boldsymbol{\pi}^*$ be the Bayesian myopic pricing policy, define $\tilde{\boldsymbol{\pi}} = \{\tilde{\pi}_t\}_{t=1}^T$ where, for each $t = 1, \ldots, T$:

$$\tilde{\pi}_t(\langle \boldsymbol{\alpha}, \sigma, \bar{h} \rangle) = \begin{cases} \pi^*_t(h), & \text{if } \boldsymbol{\alpha} = \tilde{\boldsymbol{\alpha}} \text{ and } \sigma = \tilde{\sigma}, \\ p^*, & \text{otherwise.} \end{cases}$$

By construction, if the seller prices myopically at each history $\langle \tilde{\boldsymbol{\alpha}}, \tilde{\sigma}, \bar{h} \rangle$ at belief $\mu$, his expected per-period payoff is at least $p^* \bar{\rho}(p^*, c)(1 - \phi_L - (\phi_H - \phi_L)\mu)$. Also by construction, if he deviates to any other price, then his current payoff is weakly less than the payoff at the myopically optimal price (by definition). In future periods, the seller's expected value in periods where he holds a different belief is $p^* \bar{\rho}(p^*, c)(1 - \phi_L - (\phi_H - \phi_L)\mu)$, because the expected value is linear in $\mu$ and beliefs are Bayesian. Thus, the seller does not generate value in expectation from learning (but may lose it). Hence, pricing myopically in each period generates the highest possible expected payoff for the seller both in the current period and in future periods.

Finally, given that the seller's equilibrium response is $\tilde{\boldsymbol{\pi}}$, consider the consumer surplus generated by a platform deviation to any $\boldsymbol{\alpha}', \sigma'$. In this case, the seller sets $p^*$ every period. Recall that $\sigma^T$ is the truthful signaling mechanism and $\boldsymbol{\alpha}^*$ is the myopic promotion policy. We have

$$W_T^{\boldsymbol{\alpha}',\sigma',\tilde{\boldsymbol{\pi}}}(\mu) \le W_T^{\boldsymbol{\alpha}^*,\sigma^T,\tilde{\boldsymbol{\pi}}}(\mu)$$

because, regardless of the true $\phi$, the platform can incentivize price $p^*$ and design the optimal promotion probabilities at that price with truthful revelation. Truthful revelation is in turn dominated by the optimal confounding payoff because truthful revelation is a confounding policy. Therefore, any deviation by the platform weakly decreases the expected consumer surplus and we have a Bayesian Nash Equilibrium.

B.4 Proof of Theorem 3.

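Fix $T \ge 1$ and $\mu \in [0,1]$. There exists a Horizon-Maximin Equilibrium $(\boldsymbol{\alpha}, \sigma, \boldsymbol{\pi})$ such that:

$$\frac{1}{T} W_T^{\boldsymbol{\alpha},\sigma,\boldsymbol{\pi}}(\mu) = \mathrm{co}(W^C)(\mu),$$

and for all $t = 1, \ldots, T$ and $\bar{h} \in \bar{H}_t$, the seller's pricing policy at $\langle \boldsymbol{\alpha}, \sigma, \bar{h} \rangle$ satisfies (5) (i.e. is myopically optimal). Moreover,

$$\lim_{T \to \infty} \sup_{(\boldsymbol{\alpha},\sigma,\boldsymbol{\pi}) \in \mathcal{E}(T)} \frac{1}{T} RW_T^{\boldsymbol{\alpha},\sigma,\boldsymbol{\pi}}(\mu) = \mathrm{co}(W^C)(\mu).$$

Proof.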

Existence.

Fix $T \ge 1$. Define $\tilde{\boldsymbol{\alpha}}, \tilde{\sigma}$ as in Theorem 2. That is, for all $t$ and $\bar{h} \in \bar{H}_t$:

$$\tilde{\alpha}_t(p, \phi, \bar{h}) = \begin{cases} \alpha^C(p, \phi, \mu'), & \text{if } s = s' \text{ and } \mu(\bar{h}) = \mu', \\ \alpha^C(p, \phi, \mu''), & \text{if } s = s'' \text{ and } \mu(\bar{h}) = \mu'', \\ 0, & \text{otherwise.} \end{cases}$$

Define $\tilde{\boldsymbol{\pi}} = \{\tilde{\pi}_t\}_{t=1}^T$ where, for each $t = 1, \ldots, T$:

$$\tilde{\pi}_t(\langle \boldsymbol{\alpha}, \sigma, \bar{h} \rangle) = \begin{cases} \pi^*_t(h), & \text{if } \boldsymbol{\alpha} = \tilde{\boldsymbol{\alpha}} \text{ and } \sigma = \tilde{\sigma}, \\ p^*, & \text{otherwise.} \end{cases}$$

We prove that both equations in (12) hold. First consider the seller:

$$\max_{\boldsymbol{\pi}} RV_T^{\tilde{\boldsymbol{\alpha}},\tilde{\sigma},\boldsymbol{\pi}}(\mu) \overset{(a)}{\le} \max_{\boldsymbol{\pi}} \frac{1}{T} V_T^{\tilde{\boldsymbol{\alpha}},\tilde{\sigma},\boldsymbol{\pi}}(\mu) \overset{(b)}{=} \frac{1}{T} V_T^{\tilde{\boldsymbol{\alpha}},\tilde{\sigma},\tilde{\boldsymbol{\pi}}}(\mu).$$

(a) follows from the definition of $RV_T$. (b) follows from Theorem 2, which establishes that myopic pricing is a best response. Moreover, because the policies and belief are static under these policies, the seller's expected payoff is the same in every period and we have:

$$RV_T^{\tilde{\boldsymbol{\alpha}},\tilde{\sigma},\tilde{\boldsymbol{\pi}}}(\mu) = \frac{1}{T} V_T^{\tilde{\boldsymbol{\alpha}},\tilde{\sigma},\tilde{\boldsymbol{\pi}}}(\mu).$$

Therefore, the seller is best responding to the platform policy with respect to his robust payoffs. Now we establish that the platform is also best responding. Consider the consumer surplus generated by a platform deviation to $\boldsymbol{\alpha}', \sigma'$, so that the seller sets $p^*$ every period. Recall that $\sigma^T$ is the truthful signaling mechanism and $\boldsymbol{\alpha}^*$ is the myopic promotion policy. We have

$$RW_T^{\boldsymbol{\alpha}',\sigma',\tilde{\boldsymbol{\pi}}}(\mu) \overset{(a)}{\le} W_T^{\boldsymbol{\alpha}',\sigma',\tilde{\boldsymbol{\pi}}}(\mu) \overset{(b)}{\le} W_T^{\boldsymbol{\alpha}^*,\sigma^T,\tilde{\boldsymbol{\pi}}}(\mu) \overset{(c)}{\le} W_T^{\tilde{\boldsymbol{\alpha}},\tilde{\sigma},\tilde{\boldsymbol{\pi}}}(\mu) \overset{(d)}{=} RW_T^{\tilde{\boldsymbol{\alpha}},\tilde{\sigma},\tilde{\boldsymbol{\pi}}}(\mu).$$

(a) is by definition. (b) follows (as in the proof of Theorem 2) because, regardless of the true $\phi$, with a truthful signaling mechanism the platform can incentivize price $p^*$. (c) follows because truthful revelation is a confounding policy and is therefore dominated by the optimal confounding payoff. (d) follows because the expected consumer surplus is the same in every period under a confounding promotion policy. Therefore, any deviation by the platform weakly decreases the expected consumer surplus.

Long Run Average Optimality.

Fix $\mu$ and fix $(\boldsymbol{\alpha}, \sigma, \boldsymbol{\pi}) \in \mathcal{E}(T)$.

Claim: There exists $(\boldsymbol{\alpha}', \sigma', \boldsymbol{\pi}') \in \mathcal{E}(T)$ where, for all $t = 1, \ldots, T$ and $\bar{h} \in \bar{H}_t$, the seller's pricing policy is myopic (i.e. satisfies (5)) at $\langle \boldsymbol{\alpha}', \sigma', \bar{h} \rangle$ and:

$$RW_T^{\boldsymbol{\alpha},\sigma,\boldsymbol{\pi}}(\mu) = RW_T^{\boldsymbol{\alpha}',\sigma',\boldsymbol{\pi}'}(\mu).$$

Proof.

Set $\sigma' = \sigma$. As in the proof of Proposition 1, we can without loss assume that $\boldsymbol{\pi}'$ is deterministic at each history $\langle \boldsymbol{\alpha}', \sigma', \bar{h} \rangle$ for $\bar{h} \in \bar{H}_t$. Moreover, we can construct $\boldsymbol{\alpha}' \in \mathcal{A}^P$ so that, at every $\bar{h} \in \bar{H}_t$, the probability of promotion equals 0 for all histories $\langle \phi, p, (\boldsymbol{\alpha}', \sigma', \bar{h}) \rangle$ except when the price $p$ is set in accordance with pricing policy $\boldsymbol{\pi}'$. Note that these altered policies generate the same distribution of outcomes and thus the same robust payoffs.

Second, since $(\boldsymbol{\alpha}, \sigma, \boldsymbol{\pi}) \in \mathcal{E}(T)$, the seller's payoff must be at least $(1 - \bar{\phi}(\mu_t)) p^* \rho_c(p^*)$ in period $t$ for every $t$. If there exists $t'$ and a history $\langle \boldsymbol{\alpha}, \sigma, \bar{h} \rangle$ for $\bar{h} \in \bar{H}_{t'}$ where the current payoff is less than $(1 - \bar{\phi}(\mu_t)) p^* \rho_c(p^*)$, then the seller can deviate, set price $p^*$ in periods $t', \ldots, T$, and increase his robust payoff. This would not be a robust equilibrium, so we have a contradiction.

Therefore, if the seller sets a price $p \ne p^*$, it must generate revenue of at least $(1 - \bar{\phi}(\mu_t)) p^* \rho_c(p^*)$ in that period. And if the seller sets the price $p^*$, then no other price was promoted (by our construction of $\boldsymbol{\alpha}'$). Thus, the price set by the seller in each period generates the maximum possible expected payoff, so the seller's pricing policy on-path satisfies (5).

Thus,

$$\begin{aligned} \lim_{T \to \infty} \sup_{(\boldsymbol{\alpha},\sigma,\boldsymbol{\pi}) \in \mathcal{E}(T)} RW_T^{\boldsymbol{\alpha},\sigma,\boldsymbol{\pi}}(\mu) &\le \lim_{T \to \infty} \sup_{\boldsymbol{\alpha},\sigma} RW_T^{\boldsymbol{\alpha},\sigma,\boldsymbol{\pi}^*}(\mu) && \text{[By Claim]} \\ &\le \lim_{T \to \infty} \sup_{\boldsymbol{\alpha},\sigma} W_T^{\boldsymbol{\alpha},\sigma,\boldsymbol{\pi}^*}(\mu) && \text{[By Def. of } RW_T\text{]} \\ &= \mathrm{co}(W^C)(\mu). && \text{[By Theorem 1]} \end{aligned}$$

B.5 Proof of Proposition 1

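For any $T \ge 1$, $\boldsymbol{\alpha} \in \mathcal{A}$, $\sigma \in \Sigma$, there exists a single-price promotion policy $\boldsymbol{\alpha}' \in \mathcal{A}^P$ such that:

$$W_T^{\boldsymbol{\alpha},\sigma,\boldsymbol{\pi}^*}(\mu) = W_T^{\boldsymbol{\alpha}',\sigma,\boldsymbol{\pi}^*}(\mu).$$

Moreover, there exists a signaling mechanism $\sigma' \in \Sigma^S$ such that:

$$W_T^{\boldsymbol{\alpha},\sigma,\boldsymbol{\pi}^*}(\mu) \le W_T^{\boldsymbol{\alpha},\sigma',\boldsymbol{\pi}^*}(\mu).$$

For convenience, we will consider adjustments to the two parts of the strategy separately.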

Promotion Policy

Fix $\boldsymbol{\alpha} \in \mathcal{A}$, $\sigma \in \Sigma$. Recall that $\boldsymbol{\pi}^*$ is a Bayesian myopic pricing policy as defined earlier. We construct a $\boldsymbol{\pi}^*$ satisfying our definition that is deterministic at every history; note that by Lemma 1, it is without loss to consider promotion policies that depend only on the belief, as a myopic seller's optimal price is a function of his belief and the current promotion policy (conditioned on the history). Beginning in period $T$, for every $\mu \in [0,1]$ let $p^*_T(\mu) \in P$ be a price in the support of $\pi^*_T(\mu)$. Note that this implies we are replacing $\pi_T(\mu) = p^*_T(\mu)$ w.p. 1 […] $\boldsymbol{\pi}^*$ must have satisfied both. Working backwards, we can do the same in every period $t = T-1, \ldots, 1$, which yields a pricing policy $\boldsymbol{\pi}'$ that is deterministic at every history. Using these prices, define:

$$\alpha'_t(p, \phi, \mu) = \begin{cases} \alpha_t(p, \phi, \mu), & \text{if } p = p^*_t(\mu), \\ 0, & \text{otherwise.} \end{cases} \tag{23}$$

Letting $\boldsymbol{\alpha}' = \{\alpha'_t\}_{t=1}^T$, we have $\boldsymbol{\alpha}' \in \mathcal{A}^P$ and it generates the same consumer surplus.

Signaling Mechanism

Fix $\boldsymbol{\alpha} \in \mathcal{A}$ and $\sigma \in \Sigma$. The signaling mechanism $\sigma$ induces a probability distribution over posteriors $\mu_1 \in [0,1]$. Given $\mu_1$, by Lemma 1, the platform's expected value is independent of the realized signal $s$. Thus, given $\boldsymbol{\alpha}$, we can write the expected consumer surplus conditional on the belief in the first period:

$$W^{\boldsymbol{\alpha},\boldsymbol{\pi}^*}(\mu_1) := \mathbb{E}_Z\left( \sum_{t=1}^T W(p_t, a_t, \psi_t, c) \,\Big|\, \boldsymbol{\alpha}, \boldsymbol{\pi}^*, \mu_1 \right).$$

If $W^{\boldsymbol{\alpha},\boldsymbol{\pi}^*}(\mu) \ge \mathbb{E}_s\big( W^{\boldsymbol{\alpha},\boldsymbol{\pi}^*}(\mu_1) \mid \sigma \big)$, then we have the result by defining an uninformative simple signal. Namely, let $S = \{\phi_L, \phi_H\}$ and $\sigma'(\phi) = \phi_L$ w.p. 1, for $\phi \in \{\phi_L, \phi_H\}$.

Otherwise, since $\mathbb{E}_s W^{\boldsymbol{\alpha},\boldsymbol{\pi}^*}(\mu_1)$ is a convex combination of points in the set $[0,1] \times \mathbb{R}$ and $(\mu, W^{\boldsymbol{\alpha},\boldsymbol{\pi}^*}(\mu))$ is in the interior of the convex hull, there exist points $0 \le \mu' < \mu < \mu'' \le 1$ where

$$W^{\boldsymbol{\alpha},\boldsymbol{\pi}^*}(\mu') + \frac{W^{\boldsymbol{\alpha},\boldsymbol{\pi}^*}(\mu'') - W^{\boldsymbol{\alpha},\boldsymbol{\pi}^*}(\mu')}{\mu'' - \mu'}\,(\mu - \mu') \ge W^{\boldsymbol{\alpha},\boldsymbol{\pi}^*}(\mu).$$

Letting $S = \{\phi_L, \phi_H\}$,

$$\sigma'(\phi_L) = \begin{cases} \phi_L, & \text{w.p. } \left(\dfrac{1-\mu'}{1-\mu}\right)\left(\dfrac{\mu''-\mu}{\mu''-\mu'}\right), \\[2ex] \phi_H, & \text{w.p. } 1 - \left(\dfrac{1-\mu'}{1-\mu}\right)\left(\dfrac{\mu''-\mu}{\mu''-\mu'}\right), \end{cases}$$

and

$$\sigma'(\phi_H) = \begin{cases} \phi_L, & \text{w.p. } \left(\dfrac{\mu'}{\mu}\right)\left(\dfrac{\mu''-\mu}{\mu''-\mu'}\right), \\[2ex] \phi_H, & \text{w.p. } 1 - \left(\dfrac{\mu'}{\mu}\right)\left(\dfrac{\mu''-\mu}{\mu''-\mu'}\right), \end{cases}$$

yields the result: signal $\phi_L$ induces posterior $\mu'$ and signal $\phi_H$ induces posterior $\mu''$.

B.6 Proof of Theorem 4

The proof follows the proof of Theorem 1 closely, so we simply note the steps that differ. We first state adjusted auxiliary results in § B.6.1. Using these results, we identify how the proof of Theorem 1 changes in proving the statement under the altered histories. The proofs of the auxiliary results are omitted as they are nearly identical to those in § B.2.3.

B.6.1 Preliminaries and Auxiliary Results

First, recall that $\boldsymbol{\pi}^*$ (see Definition 1) is the Bayesian myopic pricing policy that maximizes the current consumer welfare. By Lemma 1, since the seller is myopic, it is without loss to specify the platform promotion strategy as a function of the seller belief instead of the entire history. Thus, throughout the proof we focus our analysis on $\mathcal{A}^M \subset \mathcal{A}$.

For a fixed $\epsilon > 0$ and $\boldsymbol{\alpha} \in \mathcal{A}^M$, define the sets of beliefs $M^{\boldsymbol{\alpha}}_t(\epsilon) \subset [0,1]$, $t = 1, \ldots, T$, where the expected consumer surplus is at least $\epsilon$ more than the corresponding value $\mathrm{co}(W^{C,a})(\mu)$:

$$M^{\boldsymbol{\alpha}}_t(\epsilon) := \big\{ \mu \in [0,1] : \mathbb{E}_{a_t, p_t, \psi, \phi}\big( W(p_t, a_t, \psi) \mid \alpha_t, \boldsymbol{\pi}^*, \mu \big) > \mathrm{co}(W^{C,a})(\mu) + \epsilon \big\}.$$

The following result establishes that if the platform uses a promotion policy that generates expected consumer surplus greater than $\mathrm{co}(W^{C,a})(\mu) + \epsilon$, the sales observation is informative for the seller.

Lemma 5 (Separation of Purchase Probabilities). Fix $\epsilon > 0$. There exists $\delta > 0$ such that for all $\boldsymbol{\alpha} \in \mathcal{A}$, if $\mu \in M^{\boldsymbol{\alpha}}_t(\epsilon)$, then at least one of the following holds:

$$|\alpha(p_t, \phi_H, \mu) - \alpha(p_t, \phi_L, \mu)| > \delta, \qquad |(\phi_H - \phi_L)(\bar{\rho}(p_t) - \rho_c(p_t))| > \delta, \qquad |(\phi_H - \phi_L)\bar{\rho}_c(p_t)| > \delta.$$

The first corresponds to the information revealed by the promotion decision $a_t$. The second corresponds to the information revealed from a sale when $a_t = 1$. The third corresponds to the information revealed from a sale when $a_t = 0$. Given this result, we again have that beliefs converge to the truth exponentially fast (in the number of periods in which $\mu_t \in M^{\boldsymbol{\alpha}}_t(\epsilon)$). Define

$$t_n = \min\left\{ t : \sum_{t'=1}^{t} \mathbb{1}\{\mu_{t'} \in M^{\boldsymbol{\alpha}}_{t'}(\epsilon)\} \ge n \right\}, \tag{24}$$

where $t_n = T + 1$ if $n > \sum_{t'=1}^{T} \mathbb{1}\{\mu_{t'} \in M^{\boldsymbol{\alpha}}_{t'}(\epsilon)\}$ and, for convenience, we define history $h_{T+1}$ to include $\phi$ so that $\mu_{T+1} = 0$ if $\phi = \phi_L$ and $\mu_{T+1} = 1$ if $\phi = \phi_H$.

Lemma 6 (Convergence of Seller Beliefs). Fix $\mu \in [0,1]$ and let $\{t_n\}$ be defined according to (24). There exist constants $\chi, \psi > 0$ such that:

$$\mathbb{E}(\mu_{t_n} \mid \phi = \phi_L) \le \chi \exp(-\psi n), \qquad \mathbb{E}(1 - \mu_{t_n} \mid \phi = \phi_H) \le \chi \exp(-\psi n), \qquad \forall n = 1, 2, \ldots, T.$$

Finally, define $W^{\max}(\mu)$ as the maximum consumer surplus achievable by any promotion policy when $T = 1$ and the seller has belief $\mu$. Recalling the definition of $\bar{W}(\phi, \mu)$ given earlier:

$$W^{\max}(\mu) := \mu \bar{W}(\phi_H, \mu) + (1 - \mu) \bar{W}(\phi_L, \mu).$$

Lemma 7 ($W^C(\mu)$ Bounded by Linear Functions). Fix $\epsilon > 0$.
There exists $\bar{C} \ge 0$ such that for all $\mu \in [0,1]$:

$$\mathrm{co}(W^{\max})(\mu) - \mathrm{co}(W^{C,a})(\mu) < \epsilon + \bar{C}\mu, \quad \text{and} \quad \mathrm{co}(W^{\max})(\mu) - \mathrm{co}(W^{C,a})(\mu) < \epsilon + \bar{C}(1 - \mu).$$

B.6.2 Proof of Theorem 1

The first step, $\limsup W^{\boldsymbol{\alpha},\sigma,\boldsymbol{\pi}^*} \le \mathrm{co}(W^{C,a})(\mu)$, follows from an identical proof.

Existence of Optimal Confounding Promotion Policy. For a given $\mu$ there exists an optimal confounding promotion policy by the extreme value theorem. Let $\boldsymbol{\alpha}^C \in \mathcal{A}^M$ correspond to the simple promotion policy where the one-period solution is repeated $T$ times for every $\mu$. By construction, the seller's belief, the myopically optimal price, and the expected welfare are the same in each period. Thus, the payoff generated by this policy given posterior belief $\mu \in [0,1]$ is $T \cdot W^C(\mu)$.

Existence of Optimal Signaling Mechanism

We now show that an optimal signal achieves $\mathrm{co}(W^C)(\mu)$. Since $\mathcal{A}^{C,a}(\mu) = \emptyset$ for some demand functions and some $\mu$, this requires establishing an upper bound on $W^C(\mu)$ that is continuous. Let $\bar{\mu} = \inf\{\mu > 0 : \mathcal{A}^{C,a}(\mu) \ne \emptyset\}$. If $\bar{\mu} = 0$, then one can establish that $W^{C,a}(\mu)$ is continuous as before because $\mathcal{A}^{C,a}(\mu) \ne \emptyset$ for all $\mu$. Otherwise, since the feasibility sets are increasing in $\mu$, $\mathcal{A}^{C,a}(\mu) = \emptyset$ for all $0 < \mu < \bar{\mu}$ and $\mathcal{A}^{C,a}(\mu) \ne \emptyset$ for all $\mu > \bar{\mu}$. Therefore, as before, $W^{C,a}(\mu)$ is continuous for all $\mu > \bar{\mu}$. At beliefs $\mu \in (0, \bar{\mu})$,

$$W^{C,a}(\mu) \le W^{C,a}(0) + \frac{\mu}{\bar{\mu}}\big( W^{C,a}(\bar{\mu}) - W^{C,a}(0) \big),$$

and we can achieve the right-hand side using a signaling mechanism that splits the seller's belief between 0 and $\bar{\mu}$. Thus, we can define a continuous function that upper bounds $W^C(\mu)$. Therefore, an optimal signal $\sigma \in \Sigma$ exists (see Kamenica and Gentzkow (2011), Corollary 1 and discussion) and, by Kamenica and Gentzkow (2011), Corollary 2, the optimal signal at prior $\mu$ generates value $\mathrm{co}(W^C)(\mu)$ in each period.

Thus, for any $\mu, T$, there exist $\boldsymbol{\alpha}, \sigma$ such that:

$$\frac{1}{T} W^{\boldsymbol{\alpha},\sigma,\boldsymbol{\pi}^*}(\mu, T) = \mathrm{co}(W^{C,a})(\mu).$$
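The two-point belief split used in this argument (between 0 and the threshold belief here, and between two posteriors in the proof of Proposition 1) can be checked numerically. The following is a minimal sketch rather than the authors' construction: the function names `two_point_signal`, `posteriors`, and `split_value` are hypothetical, the belief is taken to be the prior probability of the high state, and the value function `W` is an arbitrary illustrative choice. Exact rational arithmetic is used so that the Bayes-consistency check is not blurred by rounding.

```python
from fractions import Fraction


def two_point_signal(mu, mu_lo, mu_hi):
    """Signal probabilities that split prior mu into posteriors mu_lo < mu < mu_hi.

    mu is the prior probability of the high state (assumed convention).
    Returns (q_L, q_H): the probability of sending the "low" signal when the
    true state is low and high, respectively.
    """
    if not (0 <= mu_lo < mu < mu_hi <= 1):
        raise ValueError("need 0 <= mu_lo < mu < mu_hi <= 1")
    w = (mu_hi - mu) / (mu_hi - mu_lo)  # total probability of the low signal
    q_L = (1 - mu_lo) / (1 - mu) * w
    q_H = mu_lo / mu * w
    return q_L, q_H


def posteriors(mu, q_L, q_H):
    """Bayes update: (P(low signal), posterior after low, posterior after high)."""
    p_low = (1 - mu) * q_L + mu * q_H
    post_low = mu * q_H / p_low
    post_high = mu * (1 - q_H) / (1 - p_low)
    return p_low, post_low, post_high


def split_value(W, mu, mu_lo, mu_hi):
    """Expected value of W under the two-point split (the chord evaluated at mu)."""
    w = (mu_hi - mu) / (mu_hi - mu_lo)
    return w * W(mu_lo) + (1 - w) * W(mu_hi)


# Hypothetical example: prior 1/2 split into posteriors 1/4 and 3/4.
mu, mu_lo, mu_hi = Fraction(1, 2), Fraction(1, 4), Fraction(3, 4)
q_L, q_H = two_point_signal(mu, mu_lo, mu_hi)
p_low, post_low, post_high = posteriors(mu, q_L, q_H)

# Hypothetical value function, chosen only so that splitting strictly helps.
W = lambda m: (m - Fraction(1, 2)) ** 2
gain = split_value(W, mu, mu_lo, mu_hi) - W(mu)
```

In this sketch the induced posteriors coincide with the two targets, and the expected posterior equals the prior, which is the Bayes-plausibility requirement behind the concavification argument. The quantity `gain` is the improvement of the chord over the no-information value at the prior under the assumed `W`.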