Context-Based Dynamic Pricing with Online Clustering

Abstract

We consider a context-based dynamic pricing problem of online products which have low sales. Sales data from Alibaba, a major global online retailer, illustrate the prevalence of low-sale products. For these products, existing single-product dynamic pricing algorithms do not work well due to insufficient data samples. To address this challenge, we propose pricing policies that concurrently perform clustering over products and set individual pricing decisions on the fly. By clustering data and identifying products that have similar demand patterns, we utilize sales data from products within the same cluster to improve demand estimation and allow for better pricing decisions. We evaluate the algorithms using the regret, and the result shows that when product demand functions come from multiple clusters, our algorithms significantly outperform traditional single-product pricing policies. Numerical experiments using a real dataset from Alibaba demonstrate that the proposed policies, compared with several benchmark policies, increase the revenue. The results show that online clustering is an effective approach to tackling dynamic pricing problems associated with low-sale products. Our algorithms were further implemented in a field study at Alibaba with 40 products for 30 consecutive days, and compared to the products which use business-as-usual pricing policy of Alibaba. The results from the field experiment show that the overall revenue increased by 10.14%.

Submitted to Management Science; manuscript no. MS-0001-1922.65

Sentao Miao

Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, MI 48109, [email protected]

Xi Chen

Leonard N. Stern School of Business, New York University, New York City, NY 10012, [email protected]

Xiuli Chao

Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, MI 48109, [email protected]

Jiaxi Liu, Yidong Zhang

Alibaba Supply Chain Platform, Hangzhou, China 311121, [email protected], [email protected]

Key words : dynamic pricing, online clustering, regret analysis, low-sale product

History : Submitted February 20, 2019; revised October 2019.

1. Introduction

Over the past several decades, dynamic pricing has been widely adopted by industries, such as retail, airlines, and hotels, with great success (see, e.g., Smith et al. 1992, Cross 1995). Dynamic pricing has been recognized as an important lever not only for balancing supply and demand, but also for increasing revenue and profit. Recent advances in online retailing and increased availability of online sales data have created opportunities for firms to better use customer information to make pricing decisions; see, e.g., the survey paper by den Boer (2015). Indeed, the advances in information technology have made the sales data easily accessible, facilitating the estimation of demand and the adjustment of price in real time. Increasing availability of demand data allows for more knowledge to be gained about the market and customers, as well as the use of advanced analytics tools to make better pricing decisions.

However, in practice, there are often products with low sales amount or user views. For these products, few available data points exist. For example, Tmall Supermarket, a business division of Alibaba, is a large-scale online store. In contrast to a typical consumer-to-consumer (C2C) platform (e.g., Taobao under Alibaba) that has millions of products available, Tmall Supermarket is designed to provide carefully selected high-quality products to customers. We reviewed the sales data from May to July of 2018 on Tmall Supermarket with nearly 75,000 products offered during this period of time, and it shows that more than 16,000 products (21.6% of all products) have a daily average number of unique visitors less than 10, and more than 10,000 products (14.3% of all products) have a daily average number of unique visitors less than or equal to 2. Although each low-sale product alone may have little impact on the company’s revenue, the combined sales of all low-sale products are significant.

Pricing low-sale products is often challenging due to the limited sales records available for demand estimation. In fast-evolving markets (e.g., fashion or online advertising), demand data from the distant past may not be useful for predicting customers’ purchasing behavior in the near future. Classical statistical estimation theory has shown that data insufficiency leads to large estimation error of the underlying demand, which results in sub-optimal pricing decisions. In fact, the research on dynamic pricing of products with little sales data remains relatively unexplored. To the best of our knowledge, there exists no dynamic pricing policy in the literature for low-sale products that admits theoretical performance guarantee. This paper fills the gap by developing adaptive context-based dynamic pricing learning algorithms for low-sale products, and our results show that the algorithms perform well both theoretically and numerically (including a field experiment).

Footnote (on “unique visitors”): A terminology used within Alibaba to represent a unique user login identification.

Although each low-sale product only has a few sales records, the total number of low-sale products is usually quite large. In this paper, we address the challenge of pricing low-sale products using an important idea from machine learning: clustering. Our starting point is that there are some set of products out there, though we do not know which ones, that share similar underlying demand patterns. For these products, information can be extracted from their collective sales data to improve the estimation of their demand function. The problem is formulated as developing adaptive learning algorithms that identify the products exhibiting similar demand patterns, and extract the hidden information from sales data of seemingly unrelated products to improve the pricing decisions of low-sale products and increase revenue.

We first consider a generalized linear demand model with stochastic contextual covariate information about products and develop a learning algorithm that integrates product clustering with pricing decisions. Our policy consists of two phases. The first phase constructs confidence bounds on the distance between clusters, which enables dynamic clustering without any prior knowledge of the cluster structure. The second phase carefully controls the price variation based on the estimated clusters, striking a proper balance between price exploration and revenue maximization by exploiting the cluster structure. Since the pricing part of the algorithm is inspired by the semi-myopic policy proposed by Keskin and Zeevi (2014), we refer to our algorithm as the Clustered Semi-Myopic Pricing (CSMP) policy. We first establish the theoretical regret bound of the proposed policy. Specifically, when the demand functions of the products belong to $m$ clusters, where $m$ is smaller than the total number of products (denoted by $n$), the performance of our algorithm is better than that of existing dynamic pricing policies that treat each product separately. Let $T$ denote the length of the selling season; we show in Theorem 1 that our algorithm achieves the regret of $\widetilde{O}(\sqrt{mT})$, where $\widetilde{O}(\cdot)$ hides the logarithmic terms. This result, when $m$ is much smaller than $n$, is a significant improvement over the regret when applying a single-product pricing policy to individual products, which is typically $\widetilde{O}(\sqrt{nT})$.

When the demand function is linear in terms of covariates of products and price, we extend our result to the setting where the covariates are non-stochastic and even adversarial. In this case, we develop a variant of the CSMP policy (called CSMP-L, where L stands for “linear”), which handles a more general class of demand covariates. The parameter estimation for the linear demand function is based on a scheme developed by Nambiar et al. (2018), which is used to build separate confidence bounds for the parameters of demand covariates and price sensitivity. Similar to the CSMP algorithm, our theoretical analysis in Theorem 2 shows that the CSMP-L algorithm achieves the regret $\widetilde{O}(\sqrt{mT})$.

We carry out a thorough numerical experiment using both synthetic data and a real dataset from Alibaba consisting of a large number of low-sale products. Several benchmarks (one treats each product separately, one puts all products into a single cluster, and one applies a classical clustering method, with the $K$-means method used for illustration) are compared with our algorithms under various scenarios. The numerical results show that our algorithms are effective and their performances are consistent in different scenarios (e.g., with almost static covariates, model misspecification).
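To give a rough sense of the gap between the two regret rates quoted above, the following sketch compares the leading terms $\sqrt{nT}$ and $\sqrt{mT}$ while ignoring constants and logarithmic factors. The values of `n`, `m`, and `T` are hypothetical and chosen only for illustration; they are not taken from the paper's experiments.

```python
import math

def regret_scale(num_groups: int, horizon: int) -> float:
    """Leading-order term sqrt(groups * T), ignoring constants and log factors."""
    return math.sqrt(num_groups * horizon)

# Hypothetical sizes, chosen only for illustration.
n, m, T = 1000, 10, 100_000

single_product = regret_scale(n, T)  # each product learned separately
clustered = regret_scale(m, T)       # products pooled into m clusters

print(f"sqrt(nT) = {single_product:.0f}")
print(f"sqrt(mT) = {clustered:.0f}")
print(f"ratio    = {single_product / clustered:.1f}")  # equals sqrt(n / m)
```

Under these assumed values the ratio of the two leading terms is $\sqrt{n/m}$, which is what makes the improvement pronounced when $m$ is much smaller than $n$.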

Our algorithm was tested in a field experiment conducted at Alibaba by a Tmall Supermarket team. The algorithm was tested on 40 products for 30 consecutive days. The results from the field experiment show that the overall revenue was boosted by 10.14%.

It is well-known that providing a performance guarantee for a clustering method is challenging due to the non-convexity of the loss function (e.g., in $K$-means), which is why there exists no clustering and pricing policy with theoretical guarantees in the existing literature. This is the first paper to establish the regret bound for a dynamic clustering and pricing policy. Instead of adopting an existing clustering algorithm from the machine learning literature (e.g., $K$-means), which usually requires the number of clusters as an input, our algorithms dynamically update the clusters based on the gathered information about customers’ purchase behavior. In addition to significantly improving the theoretical performance as compared to classical dynamic pricing algorithms without clustering, our algorithms demonstrate excellent performance both in our simulation study and in our field experiments with Alibaba.

In this subsection, we review some related research from both the revenue management and machine learning literature.

Related literature in dynamic pricing.

Due to increasing popularity of online retailing, dynamic pricing has become an active research area in revenue management in the past decade. We only briefly review a few of the most related works and refer the interested readers to den Boer (2015) for a comprehensive literature survey. Earlier work and review of dynamic pricing include Gallego and Van Ryzin (1994, 1997), Bitran and Caldentey (2003), Elmaghraby and Keskinocak (2003). These papers assume that demand information is known to the retailer a priori and either characterize or compute the optimal pricing decisions. In some retailing industries, such as fast fashion, this assumption may not hold due to the quickly changing market environment. As a result, with the recent development of information technology, combining dynamic pricing with demand learning has attracted much interest in research. Depending on the structure of the underlying demand functions, these works can be roughly divided into two categories: parametric demand models (see, e.g., Carvalho and Puterman 2005, Bertsimas and Perakis 2006, Besbes and Zeevi 2009, Farias and Van Roy 2010, Broder and Rusmevichientong 2012, Harrison et al. 2012, den Boer and Zwart 2013, Keskin and Zeevi 2014) and nonparametric demand models (see, e.g., Araman and Caldentey 2009, Wang et al. 2014, Lei et al. 2014, Chen et al. 2015a, Besbes and Zeevi 2015, Cheung et al. 2017, Chen and Shi 2019). The aforementioned papers assume that the price is continuous. Other works consider a discrete set of prices, see, e.g., Ferreira et al. (2018), and recent studies examine pricing problems in dynamically changing environments, see, e.g., Besbes et al. (2015) and Keskin and Zeevi (2016).

Dynamic pricing and learning with demand covariates (or contextual information) has received increasing attention in recent years because of its flexibility and clarity in modeling customers and market environment. Research involving this information includes, among others, Chen et al. (2015b), Qiang and Bayati (2016), Nambiar et al. (2018), Ban and Keskin (2017), Lobel et al. (2018), Chen and Gallego (2018), Javanmard and Nazerzadeh (2019). In many online-retailing applications, sellers have access to rich covariate information reflecting the current market situation. Moreover, the covariate information is not static but usually evolves over time. Our paper incorporates time-evolving covariate information into the demand model. In particular, given the observable covariate information of a product, we assume that the customer decision depends on both the selling price and covariates. Although covariates provide richer information for accurate demand estimation, a demand model that incorporates covariate information involves more parameters to be estimated. Therefore, it requires more data for estimation with the presence of covariates, which poses an additional challenge for low-sale products.

Related literature in clustering for pricing.

To the best of our knowledge, there is no operations literature that dynamically learns about the clustering structure on the fly. There are, however, some interesting works that use historical data to determine the cluster structure of demand functions in an offline manner, and then dynamically make pricing decisions for another product by learning which cluster its demand belongs to.

Ferreira et al. (2015) study a pricing problem with flash sales on the Rue La La platform. Using historical information and offline optimization, the authors classify the demand of all products into multiple groups, and use demand information for products that did not experience lost sales to estimate demand for products that had lost sales. They construct “demand curves” on the percentage of total sales with respect to the number of hours after the sales event starts, then classify these curves into four clusters. For a sold-out product, they check which one of the four curves is the closest to its sales behavior and use that to estimate the lost sales. Cheung et al. (2017) consider the single-product pricing problem, where the demand of the product is assumed to be from one of the $K$ demand functions (called demand hypothesis in that paper). Those $K$ demand functions are assumed to be known, and the decision is to choose which of those functions is the true demand curve of the product. In their field experiment with Groupon, they applied $K$-means clustering to historical demand data to generate those $K$ demand functions offline. That is, clustering is conducted offline first using historical data, then dynamic pricing decisions are made in an online fashion for a new product, assuming that its demand is one of the $K$ demand functions.

Related literature in other operations management problems.

The method of clustering is quite popular for many operations management problems such as demand forecast for new products and customer segmentation. In the following, we give a brief review of some recent papers on these two topics that are based on data clustering approach.

Demand forecasting for new products is a prevalent yet challenging problem. Since new products at launch have no historical sales data, a commonly used approach is to borrow data from “similar old products” for demand forecasting. To connect the new product with old products, current literature typically uses product features. For instance, Baardman et al. (2017) assume a demand function which is a weighted sum of unknown functions (each representing a cluster) of product features. In Ban et al. (2018), similar products are predefined such that common demand parameters are estimated using sales data of old products. Hu et al. (2018) investigate the effectiveness of clustering based on product category, features, or time series of demand, respectively.

Customer segmentation is another application of clustering. Jagabathula et al. (2018) assume a general parametric model for customers’ features with unknown parameters, and use $K$-means clustering to segment customers. Bernstein et al. (2018) consider the dynamic personalized assortment optimization using clustering of customers. They develop a hierarchical Bayesian model for mapping from customer profiles to segments.

Compared with this literature, besides a totally different problem setting, our paper is also different in the approach. First, we consider an online clustering approach with provable performance instead of an offline setting as in Baardman et al. (2017), Ban et al. (2018), Hu et al. (2018), Jagabathula et al. (2018). Second, we know neither the number of clusters (in contrast to Baardman et al. 2017, Bernstein et al. 2018, who assume a known number of clusters), nor the set of products in each cluster (as compared with Ban et al. 2018, who assume known products in each cluster). Finally, we do not assume any specific probabilistic structure on the demand model and clusters (in contrast with Bernstein et al. 2018, who assign and update the probability for a product to belong to some cluster), but define clusters using product neighborhood based on their estimated demand parameters.

Related literature in multi-armed bandit problem.

A successful dynamic pricing algorithm requires a careful balancing between exploration (i.e., learning the underlying demand function) and exploitation (i.e., making the optimal pricing strategy based on the learned information so far). The exploration-exploitation trade-off has been extensively investigated in the multi-armed bandit (MAB) literature; see earlier works by Lai and Robbins (1985), Auer et al. (2002), Auer (2002), and Bubeck et al. (2012) for a comprehensive literature review. Among the vast MAB literature, there is a line of research on bandit clustering that addresses a different but related problem (see, e.g., Cesa-Bianchi et al. 2013, Gentile et al. 2014, Nguyen and Lauw 2014, Gentile et al. 2016). The setting is that there is a finite number of arms which belong to several unknown clusters, where unknown reward functions of arms in each cluster are the same. Under this assumption, the MAB algorithms aim to cluster different arms and learn the reward function for each cluster. The setting of the bandit-clustering problem is quite different from ours. In the bandit clustering problem, the arms belong to different clusters and the decision for each period is which arm to play. In our setting, the products belong to different clusters and the decision for each period is what prices to charge for all products, and we have a continuum set of prices to choose from for each product. In addition, in contrast to the linear reward in bandit-clustering problem, the demand functions in our setting follow a generalized linear model. As will be seen in Section 3, we design a price perturbation strategy based on the estimated cluster, which is very different from the algorithms in bandit-clustering literature.

Related literature in clustering.

We end this section by giving a brief overview of clustering methods in the machine learning literature. To save space, we only discuss several popular clustering methods, and refer the interested reader to Saxena et al. (2017) for a recent literature review on the topic. The first one is called hierarchical clustering (Murtagh 1983), which iteratively clusters objects (either bottom-up, from single objects to several big clusters; or top-down, from a big cluster to single objects). Comparable with hierarchical clustering, another class of clustering method is partitional clustering, in which the objects do not have any hierarchical structure, but rather are grouped into different clusters horizontally. Among these clustering methods, $K$-means clustering is probably the most well-known and most widely applied method (see, e.g., MacQueen et al. 1967, Hartigan and Wong 1979). Several extensions and modifications of the $K$-means clustering method have been proposed in the literature, e.g., $K$-means++ (Arthur and Vassilvitskii 2007, Bahmani et al. 2012) and fuzzy c-means clustering (Dunn 1973, Bezdek 2013). Another important class of clustering method is based on graph theory. For instance, spectral clustering uses the graph Laplacian to help determine clusters (Shi and Malik 2000, Von Luxburg 2007). Besides these general methods for clustering, there are many clustering methods for specific problems such as decision tree, neural network, etc. It should be noted that nearly all the clustering methods in the literature are based on offline data. This paper, however, integrates clustering into online learning and decision-making process.

The remainder of this paper is organized as follows. In Section 2, we present the problem formulation. Our main algorithm is presented in Section 3 together with the theoretical results for the algorithm performance. We develop another algorithm for the linear demand model in Section 4 when the contextual covariates are non-stochastic or adversarial. In Section 5, we report the results of several numerical experiments based on both synthetic data and a real dataset in addition to the findings from a field experiment carried out at Alibaba’s Tmall Supermarket. We conclude the paper with a discussion about future research in Section 6. Finally, all the technical proofs are presented in the supplement.

2. Problem Formulation

We consider a retailer that sells $n$ products, labeled by $i = 1, 2, \ldots, n$, with unlimited inventory (e.g., there is an inventory replenishment scheme such that products typically do not run out of stock). Following the literature, we denote the set of these products by $[n]$. We mainly focus on online retailing of low-sale products. These products are typically not offered to customers as a display; hence we do not consider substitutability/complementarity of products in our model. Furthermore, these products are usually not recommended by the retailer on the platform, and instead, customers search for them online. We associate a quantity $q_i > 0$ with each product $i \in [n]$; in this paper, we will treat $q_i$ as the probability that an arriving customer searches for product $i$.

Customers arrive sequentially at time $t = 1, 2, \ldots, T$, and we denote the set of all time indices by $[T]$. For simplicity, we assume without loss of generality that there is exactly one arrival during each period. In each time period $t$, the firm first observes some covariates for each product $i$, such as product rating, prices of competitors, average sales in past few weeks, and promotion-related information (e.g., whether the product is currently on sale). We denote the covariates of product $i$ by $z_{i,t} \in \mathbb{R}^d$, where $d$ is the dimension of the covariates that is usually small (as compared to $n$ or $T$). The covariates $z_{i,t}$ change over time and satisfy a bound on the norm $\|z_{i,t}\|$. The retailer then sets the price $p_{i,t} \in [\underline{p}, \overline{p}]$ for each product $i$, where $0 \le \underline{p} < \overline{p} < \infty$ (the assumption of the same price range for all products is without loss of generality). Let $i_t$ denote the product that the customer searches in period $t$ (or customer $t$). After observing the price and other details of product $i_t$, customer $t$ then decides whether or not to purchase it.
The sequence of events in period $t$ is summarized as follows:

i) In time $t$, the retailer observes the covariates $z_{i,t}$ for each product $i \in [n]$, then sets the price $p_{i,t}$ for each $i \in [n]$.
ii) Customer searches for product $i_t \in [n]$ in period $t$ with probability $q_{i_t}$ independent of others and then observes its price $p_{i_t,t}$.
iii) The customer decides whether or not to purchase product $i_t$.

The customer’s purchasing decision follows a generalized linear model (GLM; see, e.g., McCullagh and Nelder 1989). That is, given price $p_{i_t,t}$ of product $i_t$ at time $t$, the customer’s purchase decision is represented by a Bernoulli random variable $d_{i_t,t}(p_{i_t,t}; z_{i_t,t}) \in \{0, 1\}$, where $d_{i_t,t}(p_{i_t,t}; z_{i_t,t}) = 1$ if the customer purchases product $i_t$ and 0 otherwise. The purchase probability, which is the expectation of $d_{i_t,t}(p_{i_t,t}; z_{i_t,t})$, takes the form

$$\mathbb{E}[d_{i_t,t}(p_{i_t,t}; z_{i_t,t})] = \mu(\alpha_{i_t}' x_{i_t,t} + \beta_{i_t} p_{i_t,t}), \tag{1}$$

where $\mu(\cdot)$ is the link function, $x_{i_t,t}' = (1, z_{i_t,t}')$ is the corresponding extended demand covariate with the 1 in the first entry used to model the bias term in a GLM model, and the expectation is taken with respect to customer purchasing decision. Let $\theta_{i_t}' = (\alpha_{i_t}', \beta_{i_t})$ be the unknown parameter of product $i_t$, which is assumed to be bounded. That is, $\|\theta_i\| \le L$ for some constant $L$ for all $i \in [n]$.

Remark 1.

The commonly used linear and logistic models are special cases of GLM with link function $\mu(x) = x$ and $\mu(x) = \exp(x)/(1 + \exp(x))$, respectively. The parametric demand model (1) has been used in a number of papers on pricing with contextual information, see, e.g., Qiang and Bayati (2016) (for a special case of linear demand with $\mu(x) = x$) and Ban and Keskin (2017).

For convenience and with a slight abuse of notation, we write $p_t := p_{i_t,t}$, $z_t := z_{i_t,t}$, $x_t := x_{i_t,t}$, $d_t := d_{i_t,t}$, where “$:=$” stands for “defined as”. Let the feasible sets of $x_t$ and $\theta_i$ be denoted as $\mathcal{X}$ and $\Theta$, respectively. We further define

$$\mathcal{T}_{i,t} := \{s \le t : i_s = i\} \tag{2}$$

as the set of time periods up to $t$ in which product $i$ is viewed, and $T_{i,t} := |\mathcal{T}_{i,t}|$ its cardinality. With this demand model, the expected revenue $r_t(p_t)$ of each round $t$ is

$$r_t(p_t) := p_t\, \mu(\alpha_{i_t}' x_t + \beta_{i_t} p_t). \tag{3}$$

Note that we have made the dependency of $r_t(p_t)$ on $x_t$ implicit.

The firm’s optimization problem and regret.

The firm’s goal is to decide the price $p_t \in [\underline{p}, \overline{p}]$ at each time $t$ for each product to maximize the cumulative expected revenue $\sum_{t=1}^{T} \mathbb{E}[r_t(p_t)]$, where the expectation is taken with respect to the randomness of the pricing policy as well as the stream of $i_t$ for $t \in [T]$, and for the next section, also the stochasticity in contextual covariates $z_t$, $t \in [T]$. The goal of maximizing the expected cumulative revenue is equivalent to minimizing the so-called regret, which is defined as the revenue gap as compared with the clairvoyant decision maker who knew the underlying parameters in the demand model a priori. With the known demand model, the optimal price can be computed as

$$p_t^* = \arg\max_{p \in [\underline{p}, \overline{p}]} r_t(p),$$

and the corresponding revenue gap at time $t$ is $\mathbb{E}[r_t(p_t^*) - r_t(p_t)]$ (the dependency of $p_t^*$ on $x_t$ is again made implicit). The cumulative regret of a policy $\pi$ with prices $\{p_t\}_{t=1}^{T}$ is defined by the summation of revenue gaps over the entire time horizon, i.e.,

$$R^{\pi}(T) := \sum_{t=1}^{T} \mathbb{E}[r_t(p_t^*) - r_t(p_t)]. \tag{4}$$
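The demand model (1), the revenue function (3), and the regret (4) can be illustrated with a short numerical sketch. It assumes the logistic link from Remark 1, approximates the optimal price by a grid search over the price range, and evaluates a naive constant-price policy for a single product. The parameter values, the price range, and the function names are hypothetical and serve only as an illustration; this is not the paper's implementation.

```python
import numpy as np

def logistic(u):
    """Logistic link: mu(u) = exp(u) / (1 + exp(u))."""
    return 1.0 / (1.0 + np.exp(-u))

def expected_revenue(p, x, alpha, beta, link=logistic):
    """r(p) = p * mu(alpha' x + beta * p), following equation (3)."""
    return p * link(alpha @ x + beta * p)

def optimal_price(x, alpha, beta, p_low, p_high, grid_size=451):
    """Approximate the argmax of r(p) over [p_low, p_high] by grid search."""
    grid = np.linspace(p_low, p_high, grid_size)
    return grid[np.argmax(expected_revenue(grid, x, alpha, beta))]

def cumulative_regret(prices, xs, alpha, beta, p_low, p_high):
    """Sum over periods of r_t(p_t^*) - r_t(p_t), following equation (4)."""
    total = 0.0
    for p, x in zip(prices, xs):
        p_star = optimal_price(x, alpha, beta, p_low, p_high)
        total += (expected_revenue(p_star, x, alpha, beta)
                  - expected_revenue(p, x, alpha, beta))
    return total

# Hypothetical parameters: intercept plus one covariate, negative price sensitivity.
alpha = np.array([1.0, 0.5])
beta = -0.8
p_low, p_high = 0.5, 5.0

rng = np.random.default_rng(0)
xs = [np.array([1.0, z]) for z in rng.uniform(-1, 1, size=50)]  # x_t = (1, z_t)
fixed_prices = [2.0] * len(xs)                                   # naive constant price

regret = cumulative_regret(fixed_prices, xs, alpha, beta, p_low, p_high)
print(f"regret of the constant-price policy: {regret:.4f}")
```

Because the clairvoyant price adapts to the covariate $x_t$ in each period while the constant price does not, the accumulated revenue gap in this sketch is positive; a learning policy aims to make this gap grow slowly in $T$.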

Remark 2.

For consistency with the online pricing literature (see, e.g., Chen et al. (2015b), Qiang and Bayati (2016), Ban and Keskin (2017), Javanmard and Nazerzadeh (2019)), in this paper we use expected revenue as the objective to maximize. However, we point out that all our analyses and results carry over to the objective of profit maximization. That is, if $c_t$ is the cost of the product in round $t$, then the expected revenue in (3) can be replaced by the expected profit

$$r_t(p_t) = (p_t - c_t)\, \mu(\alpha_{i_t}' x_t + \beta_{i_t} p_t).$$

Cluster of products.

Two products i and i are said to be “similar” if they have similarunderlying demand functions, i.e., θ i and θ i are close. In this paper we assume that the n productscan be partitioned into m clusters, N j for j = 1 , , . . . , m , such that for arbitrary two products i and i , we have θ i = θ i if i and i belong to the same cluster; otherwise, || θ i − θ i || ≥ γ > γ . We refer to this cluster structure as the γ -gap assumption, which will be relaxedin Remark 7 of Section 3.2. For convenience, we denote the set of clusters by [ m ], and by a bitabuse of notation, let N i be the cluster to which product i belongs.It is important to note that the number of clusters m and each cluster N j are unknown to thedecision maker a priori . Indeed, in some applications such structure may not exist at all. If suchstructure does exist, then our policy can identify such a cluster structure and make use of it toimprove the practical performance and the regret bound. However, we point out that the clusterstructure is not a requirement for the pricing policy to be discussed. In other words, our policyreduces to a standard dynamic pricing algorithm when demand functions of the products are alldiﬀerent (i.e., when m = n ).It is also worthwhile to note that our clustering is based on demand parameters/patterns and not on product categories or features, since it is the demand of the products that we want to learn.The clustering approach based on demand is prevalent in the literature (besides Ferreira et al.2015, Cheung et al. 2017 and the references therein, we also refer to Van Kampen et al. 2012 for acomprehensive review). Clustering based on category/feature similarity is useful in some problems(see e.g., Su and Chen 2015 investigate customer segmentation using features of clicking data), butit does not apply to our setting, because, for instance, products with similar feature for diﬀerentbrands may have very diﬀerent demand. Remark 3.

For its application to the online pricing problem, the contextual information in our model is about the product. That is, at the beginning of each period, the firm observes the contextual information about each product, then determines the pricing decision for the product, and then the arriving customer makes a purchasing decision. We point out that our algorithm and result apply equally to personalized pricing, in which the contextual information is about the customer. That is, a customer arrives (e.g., logging on to the website) and reveals his/her contextual information, and then the firm makes a pricing decision based on that information. The objective is to make personalized pricing decisions to maximize total revenue (see, e.g., Ban and Keskin 2017).

3. Pricing Policy and Main Results

In this section we discuss the specifics of the learning algorithm, its theoretical performance, and a sketch of its proof. Specifically, we describe the policy procedure and discuss its intuition in Section 3.1 before presenting its regret and outlining the proof in Section 3.2.

Our policy consists of two phases for each period $t \in [T]$: the first phase constructs a neighborhood for each product $i \in [n]$, and the second phase determines its selling price. In the first phase, our policy uses the individual data of each product $i \in [n]$ to estimate the parameter vector $\hat\theta_{i,t-1}$. This estimate is used only to construct the neighborhood $\hat{\mathcal{N}}_{i,t}$ of product $i$. Once the neighborhood is defined, we treat all the products in this neighborhood as belonging to the same cluster and use the clustered data to estimate the parameter vector $\tilde\theta_{\hat{\mathcal{N}}_{i,t},t-1}$. The latter is used in computing the selling price of product $i$. We refer to Figure 1 for a flowchart of our policy, and present the detailed procedure in Algorithm 1.

In the following, we discuss the parameter estimation of GLM demand functions and the construction of a neighborhood in detail.

[Figure 1: Flow chart of the algorithm. In round $t$: estimate the parameter of each product; determine the neighborhood of each product; estimate the parameter using cluster data; set the selling price for each product; customer $t$ arrives and searches product $i_t$; customer $t$ observes the price and makes a purchase decision; record data and go to $t+1$.]

Parameter estimation of GLM.

As shown in Figure 1, parameter estimation is an important part of our policy construction. We adopt the classical maximum likelihood estimation (MLE) method for parameter estimation (see McCullagh and Nelder 1989). For completeness, we briefly describe the MLE method here. Let $u_t := (x_t', p_t)' \in \mathbb{R}^{d+2}$. The conditional distribution of the demand realization $d_t$, given $u_t$, belongs to the exponential family and can be written as

$$P(d_t \mid u_t) = \exp\left( \frac{d_t u_t'\theta - m(u_t'\theta)}{g(\eta)} + h(d_t, \eta) \right). \tag{5}$$

Here $m(\cdot)$, $g(\cdot)$, and $h(\cdot)$ are some specific functions, where $\dot m(u_t'\theta) = E[d_t] = \mu(u_t'\theta)$ depends on $\mu(\cdot)$, $h(d_t, \eta)$ is the normalization part, and $\eta$ is some known scale parameter. Suppose that we have $t$ samples $(d_s, p_s)$ for $s = 1, 2, \ldots, t$. The negative log-likelihood function of $\theta$ under model (5) is

$$\sum_{s=1}^{t} \left( \frac{m(u_s'\theta) - d_s u_s'\theta}{g(\eta)} - h(d_s, \eta) \right). \tag{6}$$

By extracting the terms in (6) that involve $\theta$, the maximum likelihood estimator $\hat\theta$ is

$$\hat\theta = \arg\min_{\theta \in \Theta} \sum_{s=1}^{t} l_s(\theta), \qquad l_s(\theta) := m(u_s'\theta) - d_s u_s'\theta. \tag{7}$$

Since the Hessian $\nabla^2 l_s(\theta) = \dot\mu(u_s'\theta)\, u_s u_s'$ is positive semi-definite in a standard GLM model (by Assumption A-2 in the next subsection), the optimization problem in (7) is convex and can be easily solved.

Determining the neighborhood of each product.

The first phase of our policy determines which products to include in the neighborhood of each product $i \in [n]$. We use the term "neighborhood" instead of cluster, though the two are closely related, because clusters are usually assumed to be disjoint in the machine learning literature. In contrast, by our definition of neighborhood, some products can belong to different neighborhoods depending on the estimated parameters. To define the neighborhood of $i$, which is denoted by $\hat{\mathcal{N}}_{i,t}$, we first estimate the parameter $\hat\theta_{i,t-1}$ of each product $i \in [n]$ using its own data, i.e., $\hat\theta_{i,t-1}$ is the maximum likelihood estimator using data in $\mathcal{T}_{i,t-1}$ defined in (2). Then, we include a product $i' \in [n]$ in the neighborhood $\hat{\mathcal{N}}_{i,t}$ of $i$ if their estimated parameters are sufficiently close, which is defined as

$$\|\hat\theta_{i',t-1} - \hat\theta_{i,t-1}\| \le B_{i',t-1} + B_{i,t-1},$$

where $B_{i,t-1}$ is a confidence bound for product $i$ given by

$$B_{i,t} := \frac{\sqrt{c\,(d+2)\log(1+t)}}{\sqrt{\lambda_{\min}(V_{i,t})}}. \tag{8}$$

Here, $V_{i,t} := I + \sum_{s \in \mathcal{T}_{i,t}} u_s u_s'$ is the empirical Fisher's information matrix of product $i \in [n]$ at time $t$, and $c$ is some positive constant, which will be specified in our theory development. Note that, by the $\gamma$-gap assumption discussed at the end of Section 2, the method will work even when $\mathcal{T}_{i,t-1}$ only contains a limited number of sales records.
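The individual estimation in (7) and the neighborhood test built on (8) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: it assumes logistic demand, so that $m(z) = \log(1 + e^z)$; it ignores the constraint $\theta \in \Theta$; and the function names `fit_logistic_mle`, `confidence_radius`, and `neighborhood`, as well as the small `ridge` safeguard, are hypothetical choices made for the example.

```python
import numpy as np

def fit_logistic_mle(U, d, n_iter=50, ridge=1e-6):
    """Newton iterations for (7) with m(z) = log(1 + e^z) (logistic demand).

    Assumptions of this sketch: the constraint theta in Theta is ignored, and
    a tiny ridge term keeps the Newton system solvable with few records.
    """
    k = U.shape[1]
    theta = np.zeros(k)
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(U @ theta)))        # mu(u_s' theta)
        grad = U.T @ (mu - d)                          # gradient of sum_s l_s(theta)
        hess = (U * (mu * (1.0 - mu))[:, None]).T @ U  # sum_s mu_dot * u_s u_s'
        theta = theta - np.linalg.solve(hess + ridge * np.eye(k), grad)
    return theta

def confidence_radius(V, t, c):
    """Confidence bound (8); V is the (d+2) x (d+2) information matrix."""
    dim = V.shape[0]
    lam_min = np.linalg.eigvalsh(V)[0]                 # smallest eigenvalue
    return np.sqrt(c * dim * np.log(1 + t)) / np.sqrt(lam_min)

def neighborhood(i, theta_hat, radius):
    """Products whose estimates lie within the sum of the two radii of product i."""
    dist = np.linalg.norm(theta_hat - theta_hat[i], axis=1)
    return [j for j in range(len(theta_hat)) if dist[j] <= radius[j] + radius[i]]
```

Because the test is applied separately for each product, the resulting neighborhoods may overlap, which is consistent with the distinction between neighborhoods and disjoint clusters drawn above.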

Setting the price of each product.

Once we define the (estimated) neighborhood $\hat{\mathcal{N}}_{i,t}$ of $i \in [n]$, we can pool the demand data of all products in $\hat{\mathcal{N}}_{i,t}$ to learn the parameter vector. That is, we let

$$\tilde{\mathcal{T}}_{\hat{\mathcal{N}}_{i,t},t-1} := \bigcup_{i' \in \hat{\mathcal{N}}_{i,t}} \mathcal{T}_{i',t-1} \quad \text{and} \quad \tilde T_{\hat{\mathcal{N}}_{i,t},t-1} := |\tilde{\mathcal{T}}_{\hat{\mathcal{N}}_{i,t},t-1}|.$$

The clustered parameter vector $\tilde\theta_{\hat{\mathcal{N}}_{i,t},t-1}$ is the maximum likelihood estimator using data in $\tilde{\mathcal{T}}_{\hat{\mathcal{N}}_{i,t},t-1}$.

To decide on the price, we first compute $p'_{i,t}$, which is the "optimal price" based on the estimated clustered parameters $\tilde\theta_{\hat{\mathcal{N}}_{i,t},t-1}$. Then we restrict $p'_{i,t}$ to the interval $[\underline{p} + |\Delta_{i,t}|, \bar p - |\Delta_{i,t}|]$ by the projection operator. That is, we compute

$$\tilde p_{i,t} = \mathrm{Proj}_{[\underline{p} + |\Delta_{i,t}|,\, \bar p - |\Delta_{i,t}|]}(p'_{i,t}), \quad \text{where } \mathrm{Proj}_{[a,b]}(x) := \min\{\max\{x, a\}, b\}.$$

The reasoning for this restriction is that our final price will be $p_{i,t} = \tilde p_{i,t} + \Delta_{i,t}$, and the projection operator forces the final price $p_{i,t}$ into the range $[\underline{p}, \bar p]$. Here, the price perturbation $\Delta_{i,t} = \pm \Delta_0 \tilde T^{-1/4}_{\hat{\mathcal{N}}_{i,t},t}$ takes a positive or a negative value with equal probability, where $\Delta_0$ is a positive constant. We add this price perturbation for the purpose of price exploration. Intuitively, the more price variation we have, the more accurate the parameter estimation will be. However, too much price variation leads to loss of revenue because we deliberately charge a "wrong" price. Therefore, it is crucial to find a balance between these two targets by defining an appropriate $\Delta_{i,t}$.

We note that this pricing scheme belongs to the class of semi-myopic pricing policies defined in Keskin and Zeevi (2014). Since our policy combines clustering with semi-myopic pricing, we refer to it as the Clustered Semi-Myopic Pricing (CSMP) algorithm.

We briefly discuss each step of the algorithm and the intuition behind the theoretical performance.
For Steps 1 and 2, the main purpose is to identify the correct neighborhood of the product searched in period $t$; i.e., $\hat{\mathcal{N}}_{i_t,t} = \mathcal{N}_{i_t}$ with high probability (for brevity of notation, we let $\hat{\mathcal{N}}_t := \hat{\mathcal{N}}_{i_t,t}$). To achieve that, two conditions are necessary. First, the estimator $\hat\theta_{i,t}$ should converge to $\theta_i$ as $t$ grows for all $i \in [n]$. Second, the confidence bound $B_{i,t}$ should converge to 0 as $t$ grows, so that in Step 2 we are able to identify different neighborhoods by the $\gamma$-gap assumption among clusters. To satisfy these conditions, classical statistical learning theory (see, e.g., Lemma EC.2 in the supplement) requires the minimum eigenvalue of the empirical Fisher's information matrix $V_{i,t}$ to be sufficiently above zero, or more specifically, $\lambda_{\min}(V_{i,t}) \ge \Omega(q_i \sqrt{t})$ (see Lemma EC.4 in the supplement). This requirement is guaranteed by the stochastic assumption on demand covariates $z_{i,t}$, which will be imposed in Assumption A-3 in the next subsection, plus our choice of price perturbation in Step 4.

Following the discussion above, when $\hat{\mathcal{N}}_t = \mathcal{N}_{i_t}$ with high probability, we can cluster the data within $\mathcal{N}_{i_t}$ to increase the number of samples for $i_t$. Because of the increased data samples, it is expected that the estimator $\tilde\theta_{\mathcal{N}_{i_t},t-1}$ for $\theta_{i_t}$ in Step 3 is more accurate than $\hat\theta_{i_t,t-1}$. Of course, the estimation accuracy again requires the minimum eigenvalue of the empirical Fisher's information matrix over the clustered set $\tilde{\mathcal{T}}_{\mathcal{N}_{i_t},t-1}$, i.e., $\lambda_{\min}(I + \sum_{s \in \tilde{\mathcal{T}}_{\mathcal{N}_{i_t},t-1}} u_s u_s')$, to be sufficiently large, which is again guaranteed by the stochastic assumption on $z_{i,t}$ and the price perturbation in Step 4.

Algorithm 1: The CSMP Algorithm

Require: $c$, the confidence bound parameter; $\Delta_0$, the price perturbation parameter.
Step 0. Initialization. Initialize $\mathcal{T}_{i,0} = \emptyset$ and $V_{i,0} = I$ for all $i \in [n]$. Let $t = 1$ and go to Step 1.
for $t = 1, 2, \ldots, T$ do
    Step 1. Individual Parametric Estimation. Compute the MLE using individual data
        $$\hat\theta_{i,t-1} = \arg\min_{\theta \in \Theta} \sum_{s \in \mathcal{T}_{i,t-1}} l_s(\theta)$$
    for all $i \in [n]$. Go to Step 2.
    Step 2. Neighborhood Construction. Compute the neighborhood of each product $i$ as
        $$\hat{\mathcal{N}}_{i,t} = \{ i' \in [n] : \|\hat\theta_{i',t-1} - \hat\theta_{i,t-1}\| \le B_{i',t-1} + B_{i,t-1} \},$$
    where $B_{i,t-1}$ is defined in (8) for each $i \in [n]$. Go to Step 3.
    Step 3. Clustered Parametric Estimation. Compute the MLE using clustered data
        $$(\tilde\alpha'_{\hat{\mathcal{N}}_{i,t},t-1}, \tilde\beta_{\hat{\mathcal{N}}_{i,t},t-1})' = \tilde\theta_{\hat{\mathcal{N}}_{i,t},t-1} = \arg\min_{\theta \in \Theta} \sum_{s \in \tilde{\mathcal{T}}_{\hat{\mathcal{N}}_{i,t},t-1}} l_s(\theta)$$
    for each $i \in [n]$. Go to Step 4.
    Step 4. Pricing. Compute the price for each $i \in [n]$ as
        $$p'_{i,t} = \arg\max_{p \in [\underline{p}, \bar p]} \mu(\tilde\alpha'_{\hat{\mathcal{N}}_{i,t},t-1} x_{i,t} + \tilde\beta_{\hat{\mathcal{N}}_{i,t},t-1}\, p)\, p,$$
    then project to $\tilde p_{i,t} = \mathrm{Proj}_{[\underline{p} + |\Delta_{i,t}|,\, \bar p - |\Delta_{i,t}|]}(p'_{i,t})$ and offer to the customer the price $p_{i,t} = \tilde p_{i,t} + \Delta_{i,t}$, where $\Delta_{i,t} = \pm \Delta_0 \tilde T^{-1/4}_{\hat{\mathcal{N}}_{i,t},t}$ takes the two signs with equal probability.
    Then, customer $t$ arrives, searches for product $i_t$, and makes the purchasing decision $d_{i_t,t}(p_{i_t,t}; z_{i_t,t})$. Update $\mathcal{T}_{i_t,t} = \mathcal{T}_{i_t,t-1} \cup \{t\}$ and $V_{i_t,t} = V_{i_t,t-1} + u_t u_t'$.
end for

The design of the CSMP algorithm depends critically on two things. First, by taking an appropriate price perturbation in Step 4, we balance exploration and exploitation. If the perturbation is too large, even though it helps to achieve good parameter estimation, it may lead to loss of revenue (due to purposely charging the wrong price). Second, the sequence of demand covariates $z_{i,t}$ has to satisfy an important stochastic assumption (Assumption A-3), which is commonly seen in the pricing literature with demand covariates (see, e.g., Chen et al. 2015b, Qiang and Bayati 2016, Ban and Keskin 2017, Javanmard and Nazerzadeh 2019). In the next section, we will drop the stochastic assumption by focusing on a special class of the generalized linear model, the linear demand model, in which the covariates $z_t$ can be non-stochastic or even adversarial.

This subsection presents the regret of the CSMP pricing policy. Before proceeding to the main result, we first make some technical assumptions that will be needed for the theorem.

Assumption A:

1. The expected revenue function $p\,\mu(\alpha' x + \beta p)$ has a unique maximizer $p^*(\alpha' x, \beta) \in [\underline{p}, \bar p]$, which is Lipschitz in $(\alpha' x, \beta)$ with parameter $L_0$ for all $x \in \mathcal{X}$ and $\theta \in \Theta$. Moreover, the unique maximizer is in the interior $(\underline{p}, \bar p)$ for the true $\theta_i$ for all $i \in [n]$ and $x \in \mathcal{X}$.
2. $\mu(\cdot)$ is monotonically increasing and twice continuously differentiable in its feasible region. Moreover, for all $x \in \mathcal{X}$, $\theta \in \Theta$ and $p \in [\underline{p}, \bar p]$, we have $\dot\mu(\alpha' x + \beta p) \in [l_1, L_1]$ and $|\ddot\mu(\alpha' x + \beta p)| \le L_2$ for some positive constants $l_1, L_1, L_2$.
3. For each $i \in [n]$ and $t \in \mathcal{T}_{i,T}$, we have $E[z_{i,t} \mid \mathcal{F}_{t-1}] = 0$ and $\lambda_{\min}(E[z_{i,t} z_{i,t}' \mid \mathcal{F}_{t-1}]) \ge \lambda_0$ for some $\lambda_0 > 0$, where $\mathcal{F}_{t-1}$ is the $\sigma$-algebra generated by the history (e.g., $\{i_s, z_s, p_s, d_{i_s,s} : s \le t-1\}$) until the end of period $t-1$.

Assumption A-2 states that the expected demand increases as $\alpha' x + \beta p$ increases, which is plausible. One can easily verify that the commonly used demand models, such as linear and logistic demand, satisfy Assumptions A-1 and A-2 with an appropriate choice of $\mathcal{X}$ and $\Theta$. The last assumption, A-3, is a standard stochastic assumption on demand covariates which has appeared in several pricing papers (see, e.g., Qiang and Bayati 2016, Ban and Keskin 2017, Nambiar et al. 2018, Javanmard and Nazerzadeh 2019). In Section 4, we will relax this stochastic assumption in the setting of linear demand. Note that A-3 does not require the feature sequence $z_{i,t}$ to be independent or identically distributed, and only requires it to be adapted to the filtration $\{\mathcal{F}_s\}_{s \ge 0}$. One may argue that there can be static or nearly static features in $z_{i,t}$ such that $\lambda_{\min}(E[z_{i,t} z_{i,t}' \mid \mathcal{F}_{t-1}]) \ge \lambda_0 > 0$ fails. Such features can be removed from $z_{i,t}$, since the utility corresponding to these static features can be absorbed in the constant term, i.e., the intercept in $\alpha_{i_t}'(1, z_{i,t})$. We will see in the numerical study in Section 5.1 that our algorithm performs well even when some features are nearly static or slowly changing.

Under Assumption A, we have the following result on the regret of the CSMP algorithm.

Theorem 1.

Let the input parameter $c$ be sufficiently large (its lower bound is inversely related to $l_1$). Then the expected regret of algorithm CSMP is

$$R(T) = O\left( \frac{d^2 \log^2(dT)}{\min_{i \in [n]} q_i^2} + d\sqrt{mT}\,\log T \right). \tag{9}$$

In particular, if $q_i = \Theta(1/n)$ for all $i \in [n]$ and we hide the logarithmic terms, then when $T \gg n$, the expected regret is at most $\tilde O(d\sqrt{mT})$.

Sketch of proof.

For ease of presentation and to highlight the main idea, we only provide a proof sketch for the "simplified" regret $\tilde O(d\sqrt{mT})$. The proof of the general case (9) is given in the supplement.

We show that there is a time threshold $\bar t = O(d^2 \log^2(dT) / \min_{i \in [n]} q_i^2)$ such that for all $t > \bar t$, with high probability we have $\hat{\mathcal{N}}_t = \mathcal{N}_{i_t}$ (see Lemma EC.5 in the supplement). This shows that parameters are accurately estimated when $t$ is sufficiently large, which leads to the desired regret. For $t \le \bar t$, the regret can be bounded by $O(\bar t)$, which is only poly-logarithmic in $T$. To provide a more detailed argument, we first define $\tilde q_j := \sum_{i \in \mathcal{N}_j} q_i$ as the probability that a customer views a product belonging to cluster $j$, and $\tilde\theta_{j,t-1} := \tilde\theta_{\mathcal{N}_j,t-1}$ as the estimated parameter of cluster $j$ using data in $\tilde{\mathcal{T}}_{j,t-1} := \bigcup_{i \in \mathcal{N}_j} \mathcal{T}_{i,t-1}$, and define $\tilde T_{j,t-1} := |\tilde{\mathcal{T}}_{j,t-1}|$. Then, we define

$$E_{N,t} := \{ \hat{\mathcal{N}}_t = \mathcal{N}_{i_t} \}, \quad E_{B_j,t} := \{ \|\tilde\theta_{j,t} - \theta_j\| \le \tilde B_{j,t} \}, \quad E_{V,t} := \Big\{ \lambda_{\min}\Big( \sum_{s \in \tilde{\mathcal{T}}_{j_t,t}} u_s u_s' \Big) \ge \lambda_1 \Delta_0^2 \sqrt{\tilde q_{j_t}\, t} \Big\},$$

where $\lambda_1 = \min(1, \lambda_0)/(1 + \bar p^2)$ is some constant. Moreover, define

$$\tilde B_{j,t} := \frac{\sqrt{c\,(d+2)\log(1+t)}}{\sqrt{\lambda_{\min}(\tilde V_{j,t})}},$$

where $\tilde V_{j,t} = I + \sum_{s \in \tilde{\mathcal{T}}_{j,t}} u_s u_s'$. We further define the event

$$E_t := \bigcap_{j \in [m]} E_{B_j,t} \cap E_{N,t} \cap E_{V,t}.$$

In the supplement, we show that $E_t$ holds with probability at least $1 - n/t$ when $t > \bar t$. So the regret on the event that $E_t$ fails is at most $O(n \log T)$ because

$$\sum_{t=1}^{T} E\big[ (r_t(p_t^*) - r_t(p_t))\, \mathbb{1}(\bar E_t) \big] \le \bar p \sum_{t=1}^{T} P(\bar E_t) \le \bar p\, n \sum_{t=1}^{T} 1/t = O(n \log T).$$

We bound the regret for each period on $E_t$ as follows. On the event $E_t$, applying Taylor's theorem (note that $p_t^*$ is an interior point of the price range) under Assumption A gives (see also the derivation of (EC.1) in the supplement)

$$E[ r_t(p_t^*) - r_t(p_t) ] \le O\Big( E\big[ \tilde B_{j_t,t-1}^2 + \Delta_t^2 \big] \Big), \tag{10}$$

where $\Delta_t = \Delta_{i_t,t}$ for the sake of brevity.

By plugging in the value of $\tilde B_{j_t,t}$ (with the lower bound of $\lambda_{\min}(\tilde V_{j_t,t})$ on the event $E_{V,t}$), we obtain

$$\sum_{t > \bar t} E\big[ \tilde B_{j_t,t-1}^2 \big] \le O(d \log T) \sum_{t > \bar t} E\left[ \frac{1}{\sqrt{\tilde q_{j_t}\, t}} \right] \le O(d \log T) \sum_{t > \bar t} \sum_{j \in [m]} \frac{\sqrt{\tilde q_j}}{\sqrt{t}} \le O(d \log T) \sum_{j \in [m]} \sqrt{\tilde q_j T} \le O(d \log T)\sqrt{mT}, \tag{11}$$

where the first inequality follows from the definition of $\tilde B_{j_t,t-1}$ and the event $E_{V,t}$; the second inequality is from the realizations of $j_t$ (i.e., $j_t = j$ with probability $\tilde q_j$ for all $j \in [m]$, so that $E[1/\sqrt{\tilde q_{j_t}}] = \sum_{j \in [m]} \sqrt{\tilde q_j}$); and the last inequality is by Cauchy-Schwarz, since $\sum_{j \in [m]} \sqrt{\tilde q_j} \le \sqrt{m \sum_{j \in [m]} \tilde q_j} = \sqrt{m}$.

On the other hand, because $\hat{\mathcal{N}}_t = \mathcal{N}_{i_t}$ for all $t > \bar t$, we have

$$E\Big[ \sum_{t > \bar t} \Delta_t^2 \Big] \le \sum_{j \in [m]} E\Big[ \sum_{t \in \tilde{\mathcal{T}}_{j,T}} \frac{\Delta_0^2}{\sqrt{\tilde T_{j,t}}} \Big] \le O\Big( E\Big[ \sum_{j \in [m]} \sqrt{\tilde T_{j,T}} \Big] \Big) \le O\big( \sqrt{mT} \big), \tag{12}$$

where the first inequality follows from the definition of $\Delta_t$ and the event $\hat{\mathcal{N}}_t = \mathcal{N}_{i_t}$.

Putting (10), (11), and (12) together, we obtain

$$\sum_{t > \bar t} E[ r_t(p_t^*) - r_t(p_t) ] \le O(d \log T \sqrt{mT}).$$

Thus, the result is proved.

We have a number of remarks about the CSMP algorithm and the result on regret, following in order.

Remark 4. (Comparison with single-product pricing)

Our pricing policy achieves the regret $\tilde O(d\sqrt{mT})$. A question arises as to how it compares with the baseline single-product pricing algorithm that treats each product separately. Ban and Keskin (2017) consider a single-product pricing problem with demand covariates. According to Theorem 2 in Ban and Keskin (2017), their algorithm, when applied to each product $i$ in our setting separately, achieves the regret $\tilde O(d\sqrt{T_{i,T}})$. Therefore, adding together all products $i \in [n]$, the upper bound of the total regret is $\tilde O(d\sqrt{nT})$. When the number of clusters $m$ is much smaller than $n$, the regret $\tilde O(d\sqrt{mT})$ of CSMP significantly improves the total regret obtained by treating each product separately.
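To make the comparison concrete, the following worked step is a sketch under the assumption that the per-product bounds are aggregated with the Cauchy-Schwarz inequality, using $\sum_{i \in [n]} T_{i,T} = T$. The values $n = 100$ and $m = 10$ are those of the simulation setup in Section 5.1 and are used here purely as an illustration.

```latex
\sum_{i \in [n]} d\sqrt{T_{i,T}}
  \;\le\; d\sqrt{n \sum_{i \in [n]} T_{i,T}}
  \;=\; d\sqrt{nT},
\qquad
\frac{d\sqrt{nT}}{d\sqrt{mT}} \;=\; \sqrt{\frac{n}{m}} \;=\; \sqrt{\frac{100}{10}} \;\approx\; 3.16 .
```

Under these illustrative values, the clustered bound is smaller than the separate-product bound by roughly a factor of three, ignoring logarithmic terms and constants.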


Remark 5. (Lower bound of regret)

To obtain a lower bound for the regret of our problem, we consider a special case of our model in which the decision maker knows the underlying true clusters $\mathcal{N}_j$. Since this is a special case of our problem (which is equivalent to single-product pricing for each cluster $\mathcal{N}_j$), the regret lower bound of this problem applies to ours as well. Theorem 1 in Ban and Keskin (2017) shows that the regret lower bound for each cluster $j$ has to be at least $\Omega\big(d\sqrt{\tilde T_{j,T}}\big)$. In the case that $\tilde q_j = 1/m$ for all $j \in [m]$, it can be derived that the regret lower bound for all clusters has to be at least $\Omega(d\sqrt{mT})$. This implies that the regret of the proposed CSMP policy is optimal up to a logarithmic factor.

Remark 6. (Improving the regret for large $n$)

When $n$ is large, the first term in our regret bound, $O(d^2 \log^2(dT) / \min_{i \in [n]} q_i^2)$, will also become large. For instance, if $q_i = O(1/n)$ for all $i \in [n]$, then this term becomes $O(d^2 n^2 \log^2(dT))$. One way to improve the regret, although it requires prior knowledge of $\gamma$, is to conduct more price exploration during the early stages. Specifically, if the confidence bound $B_{i,t-1}$ of product $i$ is larger than $\gamma/4$, then in Step 4 we let the price perturbation $\Delta_{i,t}$ be $\pm\Delta_0$ to introduce sufficient price variation (otherwise we let $\Delta_{i,t}$ be the same as in the original algorithm CSMP). Following a similar argument as in Lemma EC.4 in the supplement, it takes roughly $O(d \log(dT) / \min_{i \in [n]} q_i)$ time periods before all $B_{i,t-1} < \gamma/4$, so the same proof used in Theorem 1 applies. Therefore, when $q_i = O(1/n)$ for all $i \in [n]$, the final regret upper bound is $O(dn \log(dT) + d \log T \sqrt{mT})$.

Remark 7. (Relaxing the cluster assumption)

Our theoretical development assumes that products within the same cluster have exactly the same parameters $\theta_i$. This assumption can be relaxed as follows. Define two products $i_1, i_2$ as being in the same cluster if they satisfy $\|\theta_{i_1} - \theta_{i_2}\| \le \gamma_0$ for some positive constant $\gamma_0$ that is smaller than a constant fraction of $\gamma$ (products from different clusters still satisfy $\|\theta_{i_1} - \theta_{i_2}\| > \gamma$). Our policy in Algorithm 1 can adapt to this case by modifying Step 2 to

$$\hat{\mathcal{N}}_{i,t} = \{ i' \in [n] : \|\hat\theta_{i',t-1} - \hat\theta_{i,t-1}\| \le B_{i',t-1} + B_{i,t-1} + \gamma_0 \},$$

and we let

$$\Delta_{i,t} = \pm \Delta_0 \max\big( \tilde T^{-1/4}_{\hat{\mathcal{N}}_{i,t},t},\, \upsilon \big),$$

where $\upsilon = \Theta(\gamma_0^{1/3})$ is a constant. Following almost the same analysis, we can show that the regret is at most $\tilde O(d\sqrt{mT} + \gamma_0^{2/3} T)$. We refer the interested reader to Theorem EC.1 in the supplement for a more detailed discussion. The main difference between this regret and the one obtained in Theorem 1 is the extra term $\tilde O(\gamma_0^{2/3} T)$. It is clear that when $\gamma_0 = 0$, we have exactly the same regret as in Theorem 1. In general, if $\gamma_0$ is small (e.g., of the order of $T^{-3/4}$), then $\tilde O(d\sqrt{mT} + \gamma_0^{2/3} T)$ can still be a better regret than $\tilde O(d\sqrt{nT})$, which is the typical regret of single-product pricing problems for $n$ products. As a result, the idea of clustering can be useful even if the parameters within the same cluster are different.


4. Pricing Policy for Linear Model

The previous sections developed an adaptive policy for a generalized linear demand model under a stochastic assumption on the covariates $z_t$. This assumption may be too strong in some applications. As argued in some of the adversarial bandit literature, some terms in the reward function may not follow any probability distribution and can even appear adversarially. In our model, the contextual covariates usually include such information as the customer rating of the product, competitors' prices of similar products, promotion information, and the average demand of the product in the past few weeks, which may not follow any probability distribution.

In this section, we drop the stochastic assumption by focusing on the linear demand model, which is an important and widely adopted special case of the generalized linear demand model. With a linear demand function, the expected value in (1) with covariates $x_{i,t} = (1, z_{i,t}')'$ takes the form

$$\mu(\alpha_i' x_{i,t} + \beta_i p_{i,t}) = \alpha_i' x_{i,t} + \beta_i p_{i,t}. \tag{13}$$

We point out that (13) is interpreted as the purchasing probability in the previous section, when each period has a single customer. The linear demand model typically applies when the demand size in period $t$ is random and given by

$$d_{i,t}(x_{i,t}, p_{i,t}) = \alpha_i' x_{i,t} + \beta_i p_{i,t} + \epsilon_{i,t},$$

where $\epsilon_{i,t}$ is a zero-mean and sub-Gaussian random variable. Then (13) represents the average demand in period $t$. While our pricing policy applies to both cases, we focus on the case that (13) represents the purchasing probability, for consistency and simplicity of presentation.

For the linear demand model, we can relax Assumption A to the following.

Assumption B:

1. There exists some compact interval of negative numbers $\mathcal{B}$, such that $\beta_i \in \mathcal{B}$ for each $i \in [n]$, and $-\alpha_i' x/(2\beta_i) \in (\underline{p}, \bar p)$ for all $x \in \mathcal{X}$.
2. For any $i \in [n]$ and $t \in [T]$ such that $T_{i,t} \ge t_0$, $\lambda_{\min}\big(\sum_{s \in \mathcal{T}_{i,t}} x_s x_s'\big) \ge c_0 T_{i,t}^{\kappa}$ for some constants $c_0, t_0 > 0$ and $\kappa \in (1/2, 1]$.

Assumption B-1 requires $\beta_i < 0$, where $\beta_i$ is the coefficient of the price sensitivity in (13). Essentially, Assumption B-2 relaxes the stochastic assumption on demand covariates in Assumption A-3, such that covariates can be chosen arbitrarily as long as they have enough "variation". The reasons that Assumption B-2 is a relaxation of Assumption A-3 are the following. First, as mentioned earlier, the covariates may not follow any distribution at all. Second, one can verify that if Assumption A-3 is satisfied, then Assumption B-2 is also satisfied with probability at least $1 - \Delta$ (for any $\Delta > 0$) given $t_0 = O(\log(dn/\Delta))$, $c_0 = 1/2$, and $\kappa = 1$ (according to the proof of Lemma EC.4 in the supplement). Third, in real applications, Assumption A-3 is difficult to verify, while Assumption B-2 can be verified from the data by simply observing the historical demand covariates of each product. Finally, we point out that Assumption B-2 is needed only for identifying clusters of products, so it is not necessary and can be dropped for the single-product pricing problem.

For linear demand, we are able to estimate $\alpha_i$ and $\beta_i$ separately. First, it can be shown that $\beta_i$ can be estimated accurately using the simple estimation approach below. Then, $\alpha_i$ can be easily estimated using a regularized linear regression (e.g., ridge regression). To guarantee accurate parameter estimation for $\alpha_i$, classical regression theory requires the minimum eigenvalue of the empirical Fisher's information matrix to be sufficiently large. With $\alpha_i$ estimated separately from $\beta_i$, its empirical Fisher's information matrix is $\bar V_{i,t} := I + \sum_{s \in \mathcal{T}_{i,t}} x_s x_s'$. This explains why Assumption B-2 on $\bar V_{i,t}$, instead of the stochastic assumption A-3 on demand covariates for the GLM case, is required for the linear demand model.

To conduct separate parameter estimation, we adopt the idea from Nambiar et al. (2018). Let

$$\hat\beta_{i,t} := \mathrm{Proj}_{\mathcal{B}}\left( \frac{\sum_{s \in \mathcal{T}_{i,t}} \Delta_s d_s}{\sum_{s \in \mathcal{T}_{i,t}} \Delta_s^2} \right) \tag{14}$$

be the estimate of $\beta_i$ using individual data in $\mathcal{T}_{i,t}$. We will show that under certain conditions, $\hat\beta_{i,t}$ is an accurate estimate of $\beta_i$. To estimate $\alpha_i$, we apply the idea of regularization. That is,

$$\hat\alpha_{i,t} = \arg\min_{\alpha} \sum_{s \in \mathcal{T}_{i,t}} (d_s - \alpha' x_s - \hat\beta_{i,t}\, p_s)^2 + \lambda_\alpha \|\alpha\|_2^2. \tag{15}$$

We notice that when $\hat\beta_{i,t}$ is sufficiently close to $\beta_i$, $\hat\alpha_{i,t}$ is essentially a ridge regression estimator of $\alpha_i$, whose estimation error is well studied (see, e.g., Abbasi-Yadkori et al. 2011). To simplify our presentation, in what follows we set the $\ell_2$ regularization parameter $\lambda_\alpha$ in (15) to 1.
From our numerical studies, we observe that the performance is not sensitive to the choice of $\lambda_\alpha$ when $T$ is large. Similarly, using clustered data from $\tilde{\mathcal{T}}_{\hat{\mathcal{N}}_{i,t},t}$, we can obtain the estimators $\tilde\beta_{\hat{\mathcal{N}}_{i,t},t}$ and $\tilde\alpha_{\hat{\mathcal{N}}_{i,t},t}$.

We refer to our algorithm in this section as Clustered Semi-Myopic Pricing for Linear model (CSMP-L), which is presented in Algorithm 2. The structure of CSMP-L is similar to that of CSMP in Algorithm 1. The main difference is that CSMP-L constructs different confidence bounds to determine the neighborhood $\hat{\mathcal{N}}_{i,t}$ of product $i$. In particular, in Step 2 of Algorithm 2, we use

$$C_{i,t} = \sqrt{ (\tilde C^{\beta}_{i,t})^2 + (\tilde C^{\alpha}_{i,t})^2 / \lambda_{\min}(\bar V_{i,t}) }, \tag{16}$$

where

$$\tilde C^{\beta}_{i,t} = c_1 \sqrt{\log t}\, \Big( \sum_{s \in \mathcal{T}_{i,t}} \Delta_s^2 \Big)^{-1/2}, \qquad \tilde C^{\alpha}_{i,t} = c_2 \sqrt{(d+1)\log t}\, \Big( \sum_{s \in \mathcal{T}_{i,t}} \Delta_s^2 \Big)^{-1/2} \sqrt{T_{i,t}}, \tag{17}$$

for some constants $c_1 > 0$, $c_2 > 0$. The choice of $c_1$ and $c_2$ will be further discussed in the numerical experiments section.

The next theorem presents the theoretical performance of the CSMP-L algorithm in terms of the regret.

Theorem 2.

The expected regret of algorithm CSMP-L is

$$R(T) = O\left( \left( \frac{\sqrt{d}\,\log T}{\min_{i \in [n]} q_i^{\kappa/2}} \right)^{4/(2\kappa - 1)} + d^2 \sqrt{mT}\,(\log T)^2 \right). \tag{18}$$

If we hide logarithmic terms and suppose $\min_{i \in [n]} q_i = \Theta(1/n)$ with $T \gg n$, the expected regret is at most $\tilde O(d^2\sqrt{mT})$.

Compared with Theorem 1, the regret of CSMP-L is slightly worse than that of CSMP by the dimension $d$ and some logarithmic terms. This is attributed to the weakened assumption on covariate vectors. However, in contrast to Theorem 1, where the regret is taken over the expectation with regard to the stochastic features $z_t$, $t \in [T]$, the regret in (18) holds for any feature vectors, even when the feature vectors $z_t$, $t \in [T]$, are chosen adversarially.

Remark 8.

Assumption B-2 (for the linear model) and Assumption A-3 (for the generalized linear model) require the product features to have sufficient variation. These two assumptions are made only for the purpose of identifying product clusters. That is, if the clustering of products is known a priori, e.g., in the single-product dynamic pricing problem, then these assumptions can be completely dropped (i.e., $z_t$ can be chosen completely arbitrarily), and the results continue to hold. We offer a justification for making this assumption. By our definition of cluster, we need $E[\|\hat\theta_{i,t} - \theta_i\|] \le \gamma$ to identify the right cluster for product $i$. On the other hand, classical statistics theory (e.g., the Cramér-Rao lower bound) states that $E[\|\hat\theta_{i,t} - \theta_i\|] \ge \Omega\big(1/\sqrt{\lambda_{\min}(V_{i,t})}\big)$. Therefore, if the product features do not have sufficient variation, it is essentially not possible to have the estimation error bounded above by $\gamma$ to find the right cluster for $i$.
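The role of feature variation can be illustrated numerically. The sketch below is not from the paper: it assumes two-dimensional features, uses only the covariate part $x_s = (1, z_s')'$ of the information matrix (as in $\bar V_{i,t}$), and sets the constant $c = 1$ in a radius of the same form as (8); the names `min_eig_information` and `radius` are hypothetical. It compares features drawn uniformly at random with features that never change.

```python
import numpy as np

def min_eig_information(X):
    """lambda_min of V = I + sum_s x_s x_s', where the rows of X are the x_s."""
    V = np.eye(X.shape[1]) + X.T @ X
    return np.linalg.eigvalsh(V)[0]

def radius(lam_min, t, dim, c=1.0):
    """Confidence radius of the same form as (8); c = 1 is an arbitrary choice."""
    return np.sqrt(c * dim * np.log(1 + t)) / np.sqrt(lam_min)

rng = np.random.default_rng(0)
T = 2000
intercept = np.ones((T, 1))
z_varying = rng.uniform(-1.0, 1.0, size=(T, 2))   # features with variation
z_static = np.tile([0.5, -0.5], (T, 1))           # features that never change

lam_varying = min_eig_information(np.hstack([intercept, z_varying]))
lam_static = min_eig_information(np.hstack([intercept, z_static]))

# With static features, sum_s x_s x_s' has rank one, so lambda_min stays at 1
# and the radius never shrinks; with varying features it grows with T.
r_varying = radius(lam_varying, T, dim=3)
r_static = radius(lam_static, T, dim=3)
```

In this toy setting the radius for the static product does not decrease with more data, so two such products could not be separated by a neighborhood test no matter how long the horizon is.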


Algorithm 2

The CSMP-L Algorithm

Require: $c_1$, $c_2$, the confidence bound parameters; $\Delta_0$, the price perturbation parameter.
Step 0. Initialization. Initialize $\mathcal{T}_{i,0} = \emptyset$ and $\bar V_{i,0} = I$ for all $i \in [n]$. Let $t = 1$ and go to Step 1.
for $t = 1, 2, \ldots, T$ do
    Step 1. Individual Parametric Estimation. Compute the estimated parameters $\hat\theta'_{i,t-1} = (\hat\alpha'_{i,t-1}, \hat\beta_{i,t-1})$ for all $i \in [n]$ as
        $$\hat\beta_{i,t-1} = \mathrm{Proj}_{\mathcal{B}}\left( \frac{\sum_{s \in \mathcal{T}_{i,t-1}} \Delta_s d_s}{\sum_{s \in \mathcal{T}_{i,t-1}} \Delta_s^2} \right)$$
    and
        $$\hat\alpha_{i,t-1} = \arg\min_{\alpha} \sum_{s \in \mathcal{T}_{i,t-1}} (d_s - \alpha' x_s - \hat\beta_{i,t-1}\, p_s)^2 + \|\alpha\|_2^2.$$
    Go to Step 2.
    Step 2. Estimating Neighborhood. Compute the neighborhood of $i$ as
        $$\hat{\mathcal{N}}_{i,t} = \{ i' \in [n] : \|\hat\theta_{i',t-1} - \hat\theta_{i,t-1}\| \le C_{i',t-1} + C_{i,t-1} \},$$
    where $C_{i,t-1}$ is defined in (16) for all $i \in [n]$. Go to Step 3.
    Step 3. Clustered Parametric Estimation. Compute the estimated parameter $\tilde\theta'_{\hat{\mathcal{N}}_{i,t},t-1} = (\tilde\alpha'_{\hat{\mathcal{N}}_{i,t},t-1}, \tilde\beta_{\hat{\mathcal{N}}_{i,t},t-1})$ using clustered data:
        $$\tilde\beta_{\hat{\mathcal{N}}_{i,t},t-1} = \mathrm{Proj}_{\mathcal{B}}\left( \frac{\sum_{s \in \tilde{\mathcal{T}}_{\hat{\mathcal{N}}_{i,t},t-1}} \Delta_s d_s}{\sum_{s \in \tilde{\mathcal{T}}_{\hat{\mathcal{N}}_{i,t},t-1}} \Delta_s^2} \right)$$
    and
        $$\tilde\alpha_{\hat{\mathcal{N}}_{i,t},t-1} = \arg\min_{\alpha} \sum_{s \in \tilde{\mathcal{T}}_{\hat{\mathcal{N}}_{i,t},t-1}} (d_s - \alpha' x_s - \tilde\beta_{\hat{\mathcal{N}}_{i,t},t-1}\, p_s)^2 + \|\alpha\|_2^2$$
    for each $i \in [n]$. Go to Step 4.
    Step 4. Pricing. Compute the price for each $i \in [n]$ as
        $$p'_{i,t} = \arg\max_{p \in [\underline{p}, \bar p]} (\tilde\alpha'_{\hat{\mathcal{N}}_{i,t},t-1} x_{i,t} + \tilde\beta_{\hat{\mathcal{N}}_{i,t},t-1}\, p)\, p,$$
    then project to $\tilde p_{i,t} = \mathrm{Proj}_{[\underline{p} + |\Delta_{i,t}|,\, \bar p - |\Delta_{i,t}|]}(p'_{i,t})$ and offer to the customer the price $p_{i,t} = \tilde p_{i,t} + \Delta_{i,t}$, where $\Delta_{i,t} = \pm \Delta_0 \tilde T^{-1/4}_{\hat{\mathcal{N}}_{i,t},t}$ takes the two signs with equal probability.
    Then, the customer in period $t$ searches for product $i_t$ and makes the purchase decision $d_{i_t,t}(p_{i_t,t}; z_{i_t,t})$. Update $\mathcal{T}_{i_t,t} = \mathcal{T}_{i_t,t-1} \cup \{t\}$ and $\bar V_{i_t,t} = \bar V_{i_t,t-1} + x_t x_t'$.
end for
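The estimation and pricing steps of Algorithm 2 can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' code: the interval $\mathcal{B}$ is passed as two hypothetical bounds `beta_lo` and `beta_hi`, the ridge problem is solved in closed form, the myopic price uses the fact that $(\alpha' x + \beta p)p$ is a concave quadratic when $\beta < 0$, and $\Delta_0$ is assumed small enough that the projection interval is non-empty. All function names are hypothetical.

```python
import numpy as np

def proj(x, lo, hi):
    """Projection onto the interval [lo, hi]."""
    return min(max(x, lo), hi)

def estimate_beta(deltas, demands, beta_lo, beta_hi):
    """Estimator (14): sum(Delta_s d_s) / sum(Delta_s^2), projected onto B."""
    raw = np.dot(deltas, demands) / np.dot(deltas, deltas)
    return proj(raw, beta_lo, beta_hi)

def estimate_alpha(X, prices, demands, beta_hat, lam=1.0):
    """Estimator (15): ridge regression of d_s - beta_hat * p_s on x_s."""
    y = demands - beta_hat * prices
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)

def csmp_l_price(alpha, beta, x, n_pooled, p_lo, p_hi, delta0, rng):
    """Step 4: myopic price, projection, then a random-sign perturbation."""
    delta = delta0 * max(n_pooled, 1) ** -0.25      # Delta_0 * T^(-1/4)
    sign = rng.choice([-1.0, 1.0])                  # each sign with probability 1/2
    p_myopic = proj(-(alpha @ x) / (2.0 * beta), p_lo, p_hi)
    p_tilde = proj(p_myopic, p_lo + delta, p_hi - delta)
    return p_tilde + sign * delta, sign * delta     # offered price and its perturbation
```

The perturbation returned alongside the price is what the estimator in (14) consumes in later periods, which is why the sketch returns both values.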


5. Simulation Results and Field Experiments

This section provides the simulation and field experiment results for algorithms CSMP and CSMP-L. First, we conduct a simulation study using synthetic data in Section 5.1 to illustrate the effectiveness and robustness of our algorithms against several benchmark approaches. Second, the simulation results using a real dataset from Alibaba are provided in Section 5.2. Third, Section 5.3 reports the results from a field experiment at Alibaba. Finally, we summarize all numerical experiment results in Section 5.4.

In this subsection, we demonstrate the effectiveness of our algorithms using synthetic data simulation. We first show the performance of CSMP and CSMP-L against several benchmark algorithms. Then, several robustness tests are conducted for CSMP. The first test is for the case when the clustering assumption is violated (i.e., parameters within the same cluster are slightly different). The second test is when the demand covariates $z_{i,t}$ contain some features that change slowly in a deterministic manner. Finally, we test CSMP with a misspecified demand model.

We shall compare the performance of our algorithms with the following benchmarks:
• The Semi-Myopic Pricing (SMP) algorithm, which treats each product independently (IND); we refer to it as SMP-IND.
• The Semi-Myopic Pricing (SMP) algorithm, which treats all products as one (ONE) single cluster; we refer to it as SMP-ONE.
• The Clustered Semi-Myopic Pricing with $K$-means Clustering (CSMP-KMeans), which uses $K$-means clustering for product clustering in Step 2 of CSMP.

The first two benchmarks are natural special cases of our algorithm. Algorithm SMP-IND skips the clustering step in our algorithm and always sets the neighborhood as $\hat N_t = \{i_t\}$, while SMP-ONE keeps $\hat N_t = N$ for all $t \in [T]$. The last benchmark tests the effectiveness of other classical clustering approaches in our setting; we choose $K$-means clustering as an illustrative example because of its popularity.

Logistic demand with clusters.

We first simulate the demand using a logistic function. We set the time horizon $T = 30$, $q_i = 1/n$ for all $i \in [n]$ where $n = 100$, and the price range $\underline{p} = 0$ and $\overline{p} = 10$. In this study, it is assumed that all $n = 100$ products have $m = 10$ clusters (with products randomly assigned to clusters). Within a cluster $j$, each entry in $\alpha_j$ is generated uniformly from $[-L/\sqrt{d+2}, L/\sqrt{d+2}]$ with $L = 10$, and $\beta_j$ is generated uniformly from $[-L/\sqrt{d+2}, 0)$ (to guarantee that $\|\theta_i\| \le L$). For demand covariates, each feature in $z_{i,t}$, with dimension $d = 5$, is generated independently and uniformly from $[-1/\sqrt{d}, 1/\sqrt{d}]$ (to guarantee that $\|z_{i,t}\| \le$ […] $= 1$; and for the confidence bound $B_{i,t} = \sqrt{c(d+2)\log(1+t)/\lambda_{\min}(V_{i,t})}$, we first let $c = 0.$[…] $c$ for sensitivity analysis. For the benchmark CSMP-KMeans, we need to specify the number of clusters $K$; since the true number of clusters $m$ is not known a priori, we test different values of $K$ in $\{5, 10, 20, 30\}$. Note that when $K = 10$, the performance of CSMP-KMeans can be considered as an oracle since it correctly specifies the true number of product clusters.

To evaluate the performance of algorithms, we adopt both the cumulative regret in (4) and the percentage revenue loss defined by

$$L^{\pi}(T) = \frac{R^{\pi}(T)}{\sum_{t=1}^{T} E[r_t(p_t^*)]}, \qquad (19)$$

which measures the percentage of revenue loss with respect to the optimal revenue. The percentage revenue loss and the cumulative regret are equivalent in the sense that a better policy leads to both a smaller regret and a smaller percentage revenue loss.

For each experiment, we conduct 30 independent runs and take their average as the output. We also output the standard deviation of percentage revenue loss for all policies in Table 1. It can be seen that our policy CSMP has quite a small standard deviation, so we will omit standard deviation results in other experiments.

We recognize that a more appropriate measure for evaluating an algorithm is the regret (and percentage of loss) of expected total profit (instead of expected total revenue). We choose the latter for the following reasons. First, it is consistent with the objective of this paper, which is the choice of the existing literature. Second, it is revenue, not profit, that is being evaluated at our industry partner, Alibaba. Third, even if we wished to measure performance using profit, the cost data of products are not available to us, since the true costs depend on such critical things as the terms of contracts with suppliers, which are confidential.

The results are shown in Figure 2. According to this figure, our algorithm CSMP outperforms all the benchmarks except for CSMP-KMeans when $K = m = 10$. CSMP-KMeans with $K = 10$ has the best performance, which is not surprising because it uses the exact and correct number of clusters. However, in reality the true cluster number $m$ is not known. We also test CSMP-KMeans with $K = 5, 20, 30$. We find that when $K = 20$, its performance is similar to (slightly worse than) that of our algorithm CSMP. When $K = 5$ or $K = 30$, the performance of CSMP-KMeans becomes much worse (especially when $K = 5$). The other two benchmarks, SMP-ONE and SMP-IND, do not perform satisfactorily either, with SMP-ONE performing worst because clustering all products together leads to significant error. Sensitivity results of CSMP with different parameters $c$ are presented in Table 2, and it can be seen that CSMP is quite robust to different values of $c$.
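The percentage revenue loss in (19) and the averaging over independent runs can be sketched in a few lines of Python. This is an illustration only: it assumes the regret in (4), which is not reproduced in this section, is the gap between cumulative optimal revenue and cumulative earned revenue, and it assumes the reported spread is the sample standard deviation. The function names are hypothetical.

```python
import statistics


def percentage_revenue_loss(optimal_revenues, earned_revenues):
    """Cumulative regret divided by cumulative optimal revenue, as in (19)."""
    total_optimal = sum(optimal_revenues)
    # Assumed form of the regret: optimal revenue minus revenue actually earned.
    regret = total_optimal - sum(earned_revenues)
    return regret / total_optimal


def summarize_runs(losses):
    """Mean and sample standard deviation of the loss across independent runs."""
    return statistics.mean(losses), statistics.stdev(losses)


# Toy numbers, not taken from the paper.
loss = percentage_revenue_loss([2.0, 2.0], [1.5, 1.5])
mean_loss, std_loss = summarize_runs([0.1, 0.3])
```

Because the denominator does not depend on the policy, ranking policies by this ratio gives the same order as ranking them by cumulative regret.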


Figure 2 Performance of different policies for logistic demand with 10 clusters. The graph on the left-hand side shows the percentage revenue loss of all algorithms, and the graph on the right-hand side shows the cumulative regrets for each algorithm. The black solid line represents CSMP, the red dashed line represents SMP-IND, the blue dash-dotted line represents SMP-ONE, the green dotted line represents CSMP-KMeans with $K = 5$, the cyan solid line with round marks represents CSMP-KMeans with $K = 10$, the purple solid line with triangle marks represents CSMP-KMeans with $K = 20$, and the yellow solid line with square marks represents CSMP-KMeans with $K = 30$.

Policy | t = 5 | t = 10 | t = 15 | t = 20 | t = 25 | t = 30
CSMP-KMeans: K = 5 | 2.08 | 1.97 | 1.95 | 2.26 | 2.22 | 2.19
CSMP-KMeans: K = 10 | 2.06 | 1.53 | 1.09 | 0.87 | 0.74 | 0.66
CSMP-KMeans: K = 20 | 2.12 | 1.36 | 1.15 | 1.02 | 0.91 | 0.82
CSMP-KMeans: K = 30 | 1.41 | 0.88 | 0.77 | 0.67 | 0.59 | 0.49

Table 1 Standard deviation (%) of percentage revenue loss corresponding to different time periods for logistic demand with 10 clusters.

Table 2 Mean and standard deviation (%) of percentage revenue loss of CSMP (logistic demand with 10 clusters) with different parameters $c$.

Linear demand with clusters.

Now we present the results of CSMP and CSMP-L with a linear demand function. For synthetic data, $z_{i,t}$ is generated the same way as in the logistic demand case but with $L = 1$ (in order for the purchasing probability to be within $[0, 1]$). The parameters $n, m, T, q_i, d, \Delta_0$ and price ranges are also kept the same. For demand parameters, $\alpha_{j,k} \in [0, L/\sqrt{d+2}]$ for each entry $k$ corresponding to context $z_t$, $\alpha_{j,k} \in [L/\sqrt{d+2}, L/\sqrt{d+2}]$ for $k$ corresponding to the intercept, and the price sensitivity $\beta_j \in [-[\ldots]L/\sqrt{d+2}, -[\ldots]L/\sqrt{d+2}]$. The reason for this construction of data is to guarantee that the linear purchasing probabilities are mostly within $[0, 1]$. […] $c = 0.$[…] $C_{i,t}$ is set to

$$\sqrt{ c \left[ \frac{\log t}{\sum_{s \in \mathcal{T}_{i,t}} \Delta_s^2} + \frac{0.[\ldots]\,(d+1)\,\log t \; T_{i,t}}{\lambda_{\min}(\bar V_{i,t}) \sum_{s \in \mathcal{T}_{i,t}} \Delta_s^2} \right] },$$

with $c = 0.04$. The results are summarized in Figure 3. It can be seen that our algorithm CSMP has the best performance, even exceeding CSMP-KMeans with $K = 10$. The reason might be that since $L = 1$ (instead of $L = 10$ for the logistic demand case), the parameters are closer to each other, so it becomes more difficult for the $K$-means method to separate them clearly. The numerical performance of algorithm CSMP-L is slightly worse than that of CSMP, but it still performs better than the benchmarks SMP-IND and SMP-ONE.

Since logistic demand is more commonly used to model probability, in the following robustness checks of CSMP, we only test logistic demand as an illustration.

Figure 3 Performance of different policies for linear demand with 10 clusters. (a) Plot of percentage revenue loss. (b) Plot of cumulative regret. The grey solid line with X marks represents CSMP-L.

Logistic demand with relaxed clusters.

As we discussed in Section 3.2, the strict clustering assumption might not hold, and sometimes products within the same cluster are slightly different. This experiment tests the robustness of CSMP when parameters of products in the same cluster are slightly different. To this end, after we generate the $m = 10$ centers of parameters (with each center represented by $\theta_j$), for each product $i$ in cluster $j$, we let $\theta_i = \theta_j + \Delta\theta_i$, where $\Delta\theta_i$ is a random vector such that each entry is uniformly drawn from $[-L/(10\sqrt{d+2}), L/(10\sqrt{d+2})]$. All the other parameters are the same as in the case with 10 clusters. Results are summarized in Figure 4, and it can be seen that the performances of all algorithms are quite similar to those in Figure 2.

Figure 4 Performance of different policies for logistic demand with relaxed clusters. (a) Plot of percentage revenue loss. (b) Plot of cumulative regret.
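The parameter generation for the relaxed-cluster experiment can be sketched as follows. This is a hedged illustration rather than the authors' simulation code: it assumes that $\alpha_j$ has $d + 1$ entries (an intercept plus $d$ covariate coefficients, which is consistent with the $\sqrt{d+2}$ normalization), that products are assigned to clusters uniformly at random, and that no re-projection is applied after the perturbation. The function names are hypothetical.

```python
import math
import random


def draw_cluster_centers(m, d, L, rng):
    """Draw m centers theta_j = (alpha_j, beta_j); alpha_j is assumed to have d + 1 entries."""
    bound = L / math.sqrt(d + 2)
    centers = []
    for _ in range(m):
        alpha = [rng.uniform(-bound, bound) for _ in range(d + 1)]
        # 1 - random() lies in (0, 1], so beta lies in [-bound, 0) and is strictly negative.
        beta = -bound * (1.0 - rng.random())
        centers.append(alpha + [beta])
    return centers


def draw_relaxed_products(centers, n, d, L, rng):
    """Assign n products to random clusters and perturb each center entry by entry."""
    eps = L / (10.0 * math.sqrt(d + 2))
    products = []
    for _ in range(n):
        j = rng.randrange(len(centers))
        theta_i = [c + rng.uniform(-eps, eps) for c in centers[j]]
        products.append((j, theta_i))
    return products


rng = random.Random(0)
centers = draw_cluster_centers(m=10, d=5, L=10.0, rng=rng)
products = draw_relaxed_products(centers, n=100, d=5, L=10.0, rng=rng)
```

Each entry of a center is bounded by $L/\sqrt{d+2}$ in absolute value, so a center with $d + 2$ entries has norm at most $L$; the perturbation adds at most one tenth of that bound to every entry.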

Logistic demand with almost static features.

As we discussed after Assumption A-3, in some applications there might be features that have little variation (nearly static). We next test the robustness of our algorithm CSMP when the feature variations are small. To this end, we assume that one feature in $z_{i,t} \in \mathbb{R}^d$ for each $i \in [n]$ is almost static. More specifically, we let this feature be constantly $1/\sqrt{d}$ for 100 periods, then change to $-1/\sqrt{d}$ for another 100 periods, then switch back to $1/\sqrt{d}$ after 100 periods, and this process continues. The numerical results against benchmarks are summarized in Figure 5. It can be seen that with such an almost static feature, the performances of algorithms with clustering become worse, but they still outperform the benchmark algorithms. In particular, CSMP (with parameter $c = 0.$[…]

Figure 5 Performance of different policies for logistic demand with 10 clusters and almost static features. (a) Plot of percentage revenue loss. (b) Plot of cumulative regret.

Logistic demand with model misspecification.

In real applications, it may happen that the demand model is misspecified. In this experiment, we consider a misspecified logistic demand model. Specifically, we let the expected demand of product $i$ be $1/(1 + \exp(f_i(z_t, p_t)))$, where the utility function

$$f_i(z_t, p_t) := c_{i,0} + \sum_{k=1}^{d} c_{1,i,k} z_{t,k} + \sum_{k=1}^{d} c_{2,i,k} z_{t,k}^2 + \sum_{k=1}^{d} c_{3,i,k} z_{t,k}^3 + \beta_{1,i} p_t + \beta_{2,i} p_t^2 + \beta_{3,i} p_t^3$$

is a third degree polynomial of $z_t, p_t$, where $c_i, \beta_i$ are unknown parameters, and $z_{t,k}$ represents the $k$-th component of $z_t$. To generate this misspecified demand model, we let $c_{l,i,k} \in [-L/([\ldots]\sqrt{d+2}), L/([\ldots]\sqrt{d+2})]$ with $l \in \{1, 2, 3\}$, $k \in [d]$, $c_{i,0} \in [-L/\sqrt{d+2}, L/\sqrt{d+2}]$, and $\beta_{l,i} \in [-L/([\ldots]\sqrt{d+2}), 0)$ with $l \in \{1, 2, 3\}$, all drawn uniformly. All the other input parameters for the problem instance are the same as in the case of logistic demand with 10 clusters.

To test the robustness of the misspecified CSMP, it is compared with CSMP that correctly specifies the demand model. We call this benchmark CSMP-Oracle. The numerical results are summarized in Figure 6. As seen, when compared with the oracle, the misspecified CSMP has slightly worse performance, as expected. But the overall difference in percentage revenue loss is only 3.[…]

Figure 6 Performance of CSMP with (misspecified) logistic demand versus the oracle. (a) Plot of percentage revenue loss. (b) Plot of cumulative regret.

This subsection presents the results of our algorithms (for illustration, we use CSMP with logistic demand) and other benchmarks using a real dataset provided by Alibaba. To better simulate the real demand process, we fit the demand data to create a sophisticated ground truth model (hence our algorithm CSMP may have a model misspecification). Before presenting the results, we introduce the dataset and pre-processing of the data.

The dataset.

The dataset is from Tmall Supermarket, which is an online store owned by Alibaba. To motivate our study of pricing for low-sale products, we extract sales data from 05/29/2018 to 07/28/2018. During this period, nearly 75,000 products were offered by Tmall Supermarket. More than 21.6% of these products (i.e., 16,000) have an average number of daily unique visits below 10. Among all these low-sale products, Alibaba provided us with a test dataset comprising 100 products that have at least one sale during the 61-day period, and at least two prices charged with each price offered to more than 10% of all customers. Because these selected products have sufficient variation of prices and different observations of customers' purchases, demand parameters can be estimated quite accurately using the sales data in the dataset.

For the features of products, we are provided by Alibaba with 5 features (hence $d = 5$), which are described below:
• Average gross merchandise volume (GMV, i.e., product revenue) in past 30 days.
• Average demand in past 30 days.
• Average number of unique buyers (UB, i.e., unique IP which makes the purchase) in past 30 days.
• Average number of unique visitors (UV) in past 30 days.
• Average number of independent product views (IPV, i.e., total number of views on the product, including repetitive views from the same user) in past 30 days.

These features are selected by Alibaba's feature engineering team (via a recursive feature elimination approach from a raw set of features). We requested to include some other features, such as the number/score of customer ratings and competitors' prices on similar products, but were unable to obtain such data due to technical reasons during the field experiment. Note that these features are not exogenous, since features in the future can be affected by the current pricing decision. Such endogenous features are often used in the demand forecasting literature. For instance, a time series model uses past demand to predict future demand (see e.g., Brown 1959); an artificial neural network (ANN) model uses historical demand data of composite products as features for demand prediction (Chang et al. 2005). In the pricing literature, some endogenous features have also been used. For example, in Ban and Keskin (2017) and Bastani et al. (2019), the model features include auto loan data, e.g., competitors' rate, that are affected by the rate offered by the decision maker (the auto loan company). Incorporating the impact of pricing decisions on features leads to a challenging dynamic programming problem with partial information. Hence, features are considered as given and we only optimize for the current period (i.e., ignoring the long-run effect of the current pricing decision).

To run the simulation using the real dataset, we first create a ground truth model for the demand. We consider two ground truth models in this simulation study. The first one is the commonly used logistic demand function (hence no model misspecification for our algorithm CSMP), and the second is a random forest model (as used in the simulation study of Nambiar et al. 2018, hence there is model misspecification for CSMP). We use the demand data of each product to fit these two demand models, and then apply them to simulate the demand process.

We want to generate the customer's arrival at each time $t$, i.e., the product $i_t$ a customer chooses to search. Since the dataset contains the daily number of unique visitors for each product $i$, the arrival process $i_t$ is simulated by randomly permuting the unique visitors of each product on each day. For instance, if on day 1, product 1 and product 2 have 2 and 3 unique visitors respectively, then $i_t$ for $t = 1, \ldots, 5$ is a random permutation of the unique visitors for products 1 and 2 (two visits to product 1 and three visits to product 2).
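The arrival simulation described above amounts to shuffling a multiset of product indices. The sketch below is a minimal illustration of that idea; the function name and the dictionary input format are assumptions made for this example, not part of the paper.

```python
import random


def simulate_daily_arrivals(unique_visitors, rng):
    """Return a random order of product visits for one day.

    unique_visitors maps a product id to its number of unique visitors that day.
    """
    arrivals = [pid for pid, count in unique_visitors.items() for _ in range(count)]
    rng.shuffle(arrivals)  # uniformly random permutation, in place
    return arrivals


# Day 1 of the example in the text: product 1 has 2 visitors, product 2 has 3.
day_one = simulate_daily_arrivals({1: 2, 2: 3}, random.Random(0))
```

Whatever order the shuffle produces, the sequence has five arrivals, two of them for product 1 and three for product 2.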

Numerical results for the algorithms.

We first provide the specifications of the parameters in the CSMP algorithm in Algorithm 1.
• The confidence bound $B_{i,t}$ is $\sqrt{c(d+2)\log(1+t)/\lambda_{\min}(V_{i,t})}$, where $c = 0.01$ for logistic demand and $c = 0.05$ for random forest demand (selected by a few trials of different values).
• The price lower bound of each product is 50% lower than its lowest price during the 61-day period, and the price upper bound is 50% higher than its highest price during this period.
• The basic price perturbation parameter $\Delta_0$ of each product is set as the length of the price range divided by 4, i.e., $\Delta_0 = (\overline{p} - \underline{p})/4$.

[…] $K \in \{5, 10, 20, 30\}$. In addition, we test another benchmark proposed in Keskin and Zeevi (2016). More specifically, this benchmark assumes a simple linear demand model $E[d_{i,t}] = \alpha_{i,t} + \beta'_{i,t} p_{i,t}$ with changing parameters $\alpha_{i,t}, \beta_{i,t}$ but without demand covariates. Since this single-product pricing algorithm can be considered as a modified version of semi-myopic pricing, we call it semi-myopic pricing (SMP) with changing parameters (CP), or SMP-CP for short. We plot the cumulative revenue at different dates in Figure 7.

It can be seen that all the methods using clustering have better performance, and their performances are comparable. It is interesting to note that for clustering using the $K$-means method, the performances with different values of $K$ are actually quite close. Finally, it is observed that the advantage of using clustering is larger with the random forest model (i.e., the misspecified model) than with the logistic model.

Figure 7 Plot of cumulative revenue over different dates for two demand models. (a) Logistic demand model (without model misspecification). (b) Random forest demand model (with model misspecification). Policies shown: CSMP ($c = 0.01$), SMP-IND, SMP-ONE, SMP-CP, and CSMP-KMeans with $K = 5, 10, 20, 30$.

We have collaborated with Alibaba Group to implement our algorithm CSMP on a set of products on Tmall Supermarket, and we report some of the findings in this subsection. Due to the privacy policy of Alibaba, some details of the field experiment are not provided.

To conduct the experiment, we randomly selected 390 low-sale products from several categories for our study. Then, 40 products were chosen randomly from them as the testing group and the CSMP algorithm was implemented for their pricing decisions, and the rest were used as the control group that continued to use the original pricing policy at Alibaba. Purchasing probability is assumed to be a logistic function, and we use the same input parameters as in Section 5.2. We note two implementation details. First, according to the requirement from Alibaba, the price lower and upper bounds of each product are the minimum and maximum price of that product from the previous 30 days, respectively. Second, following the company's policy, we can only change the price once a day for each product (instead of changing the price for every customer).

We collect the testing data from 01/02/2019 to 01/31/2019 (a total of 30 days). To better present the results, let $g \in \{0, 1\}$ denote the index of groups such that $g = 0$ represents the control group, and $g = 1$ represents the testing group. Then we calculate the average revenue $r_{g,t}$ per customer in day $t$ for products in group $g$. The average revenue per customer is defined as the ratio between the collected revenue and the total number of unique visitors (including those who did not make a purchase) for group $g$ in day $t$. Due to the data privacy policy of Alibaba, we are not able to present the raw data of $r_{g,t}$. Instead, we compute the percentage change in average revenue per customer, $r_{g,t}$, compared with the average revenue per customer of group $g$ during the previous month, $\bar r_g$. More specifically, we define

$$\Delta r_{g,t} := \frac{r_{g,t} - \bar r_g}{\bar r_g}, \quad g = 0, 1.$$

To take away possible seasonal effects, our comparison will be between $\Delta r_{0,t}$ and $\Delta r_{1,t}$. The results are presented in Figure 8.

Figure 8 Comparison of $\Delta r_{g,t}$ between groups $g = 0, 1$ every day.

As shown by the field experiment results in Figure 8, the percentage increase of the average revenue per customer in the testing group is higher than that of the control group in 26 of the 30 days tested. By calculating the overall average revenue per customer for each group, we find that the average revenue per customer for the testing group increased by 10.14% compared with the previous month, while in the control group, the average revenue per customer increased by 4.[…] […]85% for the testing group, compared with a $-$[…]05% increase for the control group (see Table 3 for the summary). These results illustrate the effectiveness of our CSMP policy in boosting the revenue as compared with the current pricing policy of Alibaba.
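The comparison metric used in the field experiment is a relative change against each group's own baseline. The short Python sketch below illustrates the computation with made-up numbers, since the raw revenue data are not disclosed; the function names are hypothetical and the values are not taken from the experiment.

```python
def percentage_change(r_gt, r_bar_g):
    """Relative change of the daily average revenue per customer against the group baseline."""
    return (r_gt - r_bar_g) / r_bar_g


def days_test_beats_control(delta_test, delta_control):
    """Count the days on which the testing group's change exceeds the control group's."""
    return sum(1 for a, b in zip(delta_test, delta_control) if a > b)


# Hypothetical daily values for three days (illustration only).
baseline_control, baseline_test = 2.0, 1.0
control_daily = [2.0, 2.1, 1.9]
test_daily = [1.1, 1.2, 0.9]

delta_control = [percentage_change(r, baseline_control) for r in control_daily]
delta_test = [percentage_change(r, baseline_test) for r in test_daily]
better_days = days_test_beats_control(delta_test, delta_control)
```

Normalizing each group by its own previous-month average is what allows the two groups to be compared even though their revenue levels differ, and differencing the two series is what removes effects common to both groups, such as seasonality.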

Table 3 Overall performance of two groups in the testing period. "Revenue" represents percentage change of average revenue, and "Demand" represents percentage change of purchasing probability.

In this section, we first presented the simulation results using synthetic data under various scenarios to test the effectiveness and robustness of our algorithms, then the simulation results with real data from Alibaba using a more sophisticated ground truth demand model (for a more realistic simulation and a robustness test under model misspecification). Finally, we reported the results from a field experiment conducted at Alibaba. The main findings from the numerical study are summarized as follows.
• In all the numerical results, pricing with clustering (either using our method in CSMP or classical $K$-means clustering with an appropriate choice of $K$) outperforms the benchmarks of applying a single-product pricing algorithm to each product or naively putting all products into a single cluster.
• Dynamic pricing with the $K$-means clustering method sometimes works as effectively as (and at times even better than) our algorithm CSMP/CSMP-L. But its performance depends on the choice of the number of clusters $K$, which is unknown to the decision maker.
• The CSMP algorithm is quite robust under different scenarios: slightly different demand parameters within the same cluster, near static or slowly changing features, and a misspecified ground truth demand model.
• The CSMP algorithm (with logistic demand function) showed satisfactory performance in the field experiment at Tmall Supermarket. Compared with products in the control group that used the business-as-usual pricing policy of Alibaba, the CSMP algorithm significantly boosted the revenue of the testing products, demonstrating the effectiveness of the algorithm.

6. Conclusion

With the rapid development of e-commerce, data-driven dynamic pricing is becoming increasingly important due to the dynamic market environment and easy access to online sales data. While there is abundant literature on dynamic pricing of normal products, the pricing of products with low sales has received little attention. The data from Alibaba Group shows that the number of such low-sale products is large, and that even though the demand for each low-sale product is small, the total revenue for all the low-sale products is quite significant. In this paper, we present data clustering and dynamic pricing algorithms to address this challenging problem. We believe that this paper is the first to integrate online clustering learning in dynamic pricing of low-sale products.

Two learning algorithms are developed in this paper: one for a dynamic pricing problem with the generalized linear demand, and another for the special case of linear demand functions under weaker assumptions on product covariates. We have established the regret bounds for both algorithms under mild technical conditions. Moreover, we test our algorithms on a real dataset from Alibaba Group by simulating the demand function. Numerical results show that both algorithms outperform the benchmarks, where one either considers all products separately or treats all products as a single cluster. A field experiment was conducted at Alibaba by implementing the CSMP algorithm on a set of products, and the results show that our algorithm can significantly boost revenue.

There are several possible future research directions. The first one is an in-depth study of the method for product clustering. For instance, in the bandit clustering literature, Gentile et al. (2014) use a graph-based method to cluster different arms, and Nguyen and Lauw (2014) apply a $K$-means clustering method to identify different groups of arms. It will be interesting to understand the various product clustering methods and analyze their advantages and disadvantages under different scenarios. Second, to highlight the benefit of clustering techniques for low-sale products, in this paper we study a dynamic pricing problem with sufficient inventory. One extension is to apply the clustering method to the revenue management problem with inventory constraints. Third, in this paper we consider the generalized linear demand. There are other general demand functions, such as the nonparametric models in Araman and Caldentey (2009), Wang et al. (2014), Chen et al. (2015a), Besbes and Zeevi (2015), Nambiar et al. (2018), Ferreira et al. (2018), Chen and Gallego (2018), and it is an interesting research direction to explore other, and broader, classes of demand functions. To that end, an important step will be to define an appropriate metric for clustering the products, which is a challenge especially for nonparametric models. Finally, we believe that it will be interesting to include substitutability/complementarity of products and even assortment decisions.

Acknowledgment: The authors are grateful to the Department Editor Prof. J. George Shanthikumar, the Associate Editor, and three referees for their constructive comments and suggestions, which have helped us to significantly improve both the content and exposition of this paper.

References

Abbasi-Yadkori Y, Pál D, Szepesvári C (2011) Improved algorithms for linear stochastic bandits. Advances in Neural Information Processing Systems, 2312–2320.

Araman VF, Caldentey R (2009) Dynamic pricing for nonperishable products with demand learning. Operations Research, 57(5):1169–1188.

Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, 1027–1035.

Auer P (2002) Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research, 3(Nov):397–422.

Auer P, Cesa-Bianchi N, Fischer P (2002) Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3):235–256.

Baardman L, Levin I, Perakis G, Singhvi D (2017) Leveraging comparables for new product sales forecasting. Available at SSRN 3086237.

Bahmani B, Moseley B, Vattani A, Kumar R, Vassilvitskii S (2012) Scalable k-means++. Proceedings of the VLDB Endowment, 5(7):622–633.

Ban GY, Gallien J, Mersereau AJ (2018) Dynamic procurement of new products with covariate information: The residual tree method. Manufacturing & Service Operations Management.

Ban GY, Keskin NB (2017) Personalized dynamic pricing with machine learning. Available at SSRN 2972985.

Bastani H, Simchi-Levi D, Zhu R (2019) Meta dynamic pricing: Learning across experiments. Available at SSRN 3334629.

Bernstein F, Modaresi S, Sauré D (2018) A dynamic clustering approach to data-driven assortment personalization. Management Science, 65(5):2095–2115.

Bertsimas D, Perakis G (2006) Dynamic pricing: A learning approach. Mathematical and Computational Models for Congestion Charging, 45–79 (Springer).

Besbes O, Gur Y, Zeevi A (2015) Non-stationary stochastic optimization. Operations Research, 63(5):1227–1244.

Besbes O, Zeevi A (2009) Dynamic pricing without knowing the demand function: Risk bounds and near-optimal algorithms. Operations Research, 57(6):1407–1420.

Besbes O, Zeevi A (2015) On the (surprising) sufficiency of linear models for dynamic pricing with demand learning. Management Science, 61(4):723–739.

Bezdek JC (2013) Pattern recognition with fuzzy objective function algorithms (Springer Science & Business Media).

Bitran G, Caldentey R (2003) An overview of pricing models for revenue management. Manufacturing & Service Operations Management, 5(3):203–229.

Broder J, Rusmevichientong P (2012) Dynamic pricing under a general parametric choice model. Operations Research, 60(4):965–980.

Brown RG (1959) Statistical forecasting for inventory control (McGraw/Hill).

Bubeck S, Cesa-Bianchi N, et al. (2012) Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations and Trends in Machine Learning, 5(1):1–122.

Carvalho AX, Puterman ML (2005) Learning and pricing in an internet environment with binomial demands.

Journal of Revenue and Pricing Management , 3(4):320–336.Cesa-Bianchi N, Gentile C, Zappella G (2013) A gang of bandits.

Advances in Neural Information ProcessingSystems , 737–745.Chang PC, Wang YW, Tsai CY (2005) Evolving neural network for printed circuit board sales forecasting.

Expert Systems with Applications , 29(1):83–92.Chen N, Gallego G (2018) Nonparametric learning and optimization with covariates.

Available at SSRN3172697 .Chen Q, Jasin S, Duenyas I (2015a) Real-time dynamic pricing with minimal and ﬂexible price adjustment.

Management Science , 62(8):2437–2455.Chen X, Owen Z, Pixton C, Simchi-Levi D (2015b) A statistical learning approach to personalization inrevenue management.

Available at SSRN 2579462 .Chen Y, Shi C (2019) Network revenue management with online inverse batch gradient descent method.

Available at SSRN 3331939 .Cheung WC, Simchi-Levi D, Wang H (2017) Dynamic pricing and demand learning with limited priceexperimentation.

Operations Research , 65(6):1722–1731.Cross RG (1995) An introduction to revenue management.

Handbook of Airline Economics .den Boer AV (2015) Dynamic pricing and learning: historical origins, current research, and new directions.

Surveys in Operations Research and Management Science , 20(1):1–18.den Boer AV, Zwart B (2013) Simultaneously learning and optimizing using controlled variance pricing.

Management Science , 60(3):770–783.Dunn JC (1973) A fuzzy relative of the isodata process and its use in detecting compact well-separatedclusters.

Journal of Cybernetics , 3(3):32–57.Elmaghraby W, Keskinocak P (2003) Dynamic pricing in the presence of inventory considerations: Researchoverview, current practices, and future directions.

Management Science , 49(10):1287–1309.Farias VF, Van Roy B (2010) Dynamic pricing with a prior on market response.

Operations Research ,58(1):16–29.Ferreira KJ, Lee BHA, Simchi-Levi D (2015) Analytics for an online retailer: Demand forecasting and priceoptimization.

Manufacturing & Service Operations Management , 18(1):69–88.Ferreira KJ, Simchi-Levi D, Wang H (2018) Online network revenue management using thompson sampling.

Operations Research , 66(6):1586–1602.Gallego G, Van Ryzin G (1994) Optimal dynamic pricing of inventories with stochastic demand over ﬁnitehorizons.

Management Science , 40(8):999–1020. uthor:


Operations Research, 45(1):24–41.

Gentile C, Li S, Kar P, Karatzoglou A, Etrue E, Zappella G (2016) On context-dependent clustering of bandits. arXiv preprint arXiv:1608.03544.

Gentile C, Li S, Zappella G (2014) Online clustering of bandits. International Conference on Machine Learning, 757–765.

Harrison JM, Keskin NB, Zeevi A (2012) Bayesian dynamic pricing policies: Learning and earning under a binary prior distribution. Management Science, 58(3):570–586.

Hartigan JA, Wong MA (1979) Algorithm as 136: A k-means clustering algorithm. Journal of the Royal Statistical Society. Series C (Applied Statistics), 28(1):100–108.

Horn RA, Johnson CR (1990) Matrix analysis (Cambridge University Press).

Hu K, Acimovic J, Erize F, Thomas DJ, Van Mieghem JA (2018) Forecasting new product life cycle curves: Practical approach and empirical analysis: Finalist–2017 M&SOM practice-based research competition. Manufacturing & Service Operations Management, 21(1):66–85.

Jagabathula S, Subramanian L, Venkataraman A (2018) A model-based embedding technique for segmenting customers. Operations Research, 66(5):1247–1267.

Javanmard A, Nazerzadeh H (2019) Dynamic pricing in high-dimensions. Journal of Machine Learning Research, 20(9):1–49.

Keskin NB, Zeevi A (2014) Dynamic pricing with an unknown demand model: Asymptotically optimal semi-myopic policies. Operations Research, 62(5):1142–1167.

Keskin NB, Zeevi A (2016) Chasing demand: Learning and earning in a changing environment. Mathematics of Operations Research, 42(2):277–307.

Lai TL, Robbins H (1985) Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6(1):4–22.

Lei YM, Jasin S, Sinha A (2014) Near-optimal bisection search for nonparametric dynamic pricing with inventory constraint. Available at SSRN 2509425.

Lobel I, Leme RP, Vladu A (2018) Multidimensional binary search for contextual decision-making. Operations Research, 66(5):1346–1361.

MacQueen J, et al. (1967) Some methods for classification and analysis of multivariate observations. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, volume 1, 281–297.

McCullagh P, Nelder JA (1989) Generalized linear models, volume 37 (CRC press).

Murtagh F (1983) A survey of recent advances in hierarchical clustering algorithms. The Computer Journal, 26(4):354–359.


Nambiar M, Simchi-Levi D, Wang H (2018) Dynamic learning and price optimization with endogeneity effect. Forthcoming at Management Science.

Nguyen TT, Lauw HW (2014) Dynamic clustering of contextual multi-armed bandits. Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, 1959–1962 (ACM).

Qiang S, Bayati M (2016) Dynamic pricing with demand covariates. Available at SSRN 2765257.

Saxena A, Prasad M, Gupta A, Bharill N, Patel OP, Tiwari A, Er MJ, Ding W, Lin CT (2017) A review of clustering techniques and developments. Neurocomputing, 267:664–681.

Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell., 22(8):888–905.

Smith BC, Leimkuhler JF, Darrow RM (1992) Yield management at american airlines. Interfaces, 22(1):8–31.

Su Q, Chen L (2015) A method for discovering clusters of e-commerce interest patterns using click-stream data. Electronic Commerce Research and Applications, 14(1):1–13.

Tropp JA (2011) User-friendly tail bounds for matrix martingales. ACM Report 2011–01, California Inst. of Tech. Pasadena, CA.

Van Kampen TJ, Akkerman R, Pieter van Donk D (2012) Sku classification: a literature review and conceptual framework. International Journal of Operations & Production Management, 32(7):850–876.

Von Luxburg U (2007) A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416.

Wang Z, Deng S, Ye Y (2014) Close the gaps: A learning-while-doing algorithm for single-product revenue management problems. Operations Research, 62(2):318–331.


Online Supplement

In this Appendix, we present all the missing proofs from the main body of the paper. We also prove the result discussed in Remark 7 of Section 3 for a more general definition of clusters.

EC.1. Proof of Theorem 1

First of all, we deﬁne (cid:101) q j := (cid:80) i ∈N j q i as the probability that a customer views a product from cluster j . Then, deﬁne the events E N,t := { ˆ N t = N i t } , E B j ,t := {|| (cid:101) θ j,t − θ j || ≤ (cid:101) B j,t } , E V,t :=  λ min  (cid:88) s ∈ (cid:101) T jt,t u s u (cid:48) s  ≥ λ ∆ (cid:112)(cid:101) q j t t  , where λ = min(1 , λ ) / (1 + p ) and (cid:101) θ j,t is the estimated parameters using data from (cid:101) T j,t , and (cid:101) B j,t =: (cid:112) c ( d + 2) log(1 + t )) (cid:113) λ min ( (cid:101) V j,t )for some constant c ≥ /l and (cid:101) V j,t = I + (cid:80) s ∈ (cid:101) T j,t u s u (cid:48) s . These events hold at least with the followingprobabilities P ( E N,t ) ≥ − nt for t > ¯ t, P ( E B j ,t ) ≥ − t for any j ∈ [ m ] , t ∈ T , P ( E V,t ) ≥ − nt for t > t, where ¯ t is deﬁned in (EC.13). The ﬁrst inequality is from our analysis after Lemma EC.5; thesecond inequality is from Corollary EC.1; the third inequality is from Lemma EC.6. We furtherdeﬁne E B,t = (cid:83) j ∈ [ m ] E B j ,t , then it holds with probability at least 1 − m/t for any t ∈ T . Now wedeﬁne the event E t as the union of E N,t , E B,t , and E V,t . This event holds with probability at least1 − n/t obviously according to the probability of each event.We split the regret by considering t ≤ t and t > t , i.e., T (cid:88) t =1 E [ r t ( p ∗ t ) − r t ( p t )] = (cid:88) t ≤ t E [ r t ( p ∗ t ) − r t ( p t )] + (cid:88) t> t E [ r t ( p ∗ t ) − r t ( p t )] . c2 e-companion to Author:


Obviously, the regret of the ﬁrst summation can be bounded above by 2 p ¯ t . We focus on the secondsummation. For arbitrary t > t , E [ r t ( p ∗ t ) − r t ( p t )] = E [( r t ( p ∗ t ) − r t ( p t ))111( E t )] + E [( r t ( p ∗ t ) − r t ( p t ))111( ¯ E t )] ≤ E [( p ∗ t µ ( α (cid:48) i t x t + β i t p ∗ t ) − p t µ ( α (cid:48) i t x t + β i t p t ))111( E t )] + 10 pnt = E [( | β i t ˙ µ ( α (cid:48) i t x t + β i t ¯ p t ) + β i t ¯ p t ¨ µ ( α (cid:48) i t x t + β i t ¯ p t ) | ( p ∗ t − p t ) )111( E t )] + 10 pnt ≤ E [( (cid:101) L ( p ∗ t − (cid:101) p t − ∆ t ) )111( E t )] + 10 pnt ≤ (cid:101) L L E [ || (cid:101) θ ˆ N t ,t − − θ i t || E t )] + 4 (cid:101) L E [∆ t E t )] + 10 pnt =2 (cid:101) L L E [ || (cid:101) θ j t ,t − − θ j t || E t )] + 4 (cid:101) L E [∆ t E t )] + 10 pnt ≤ (cid:101) L L E [ (cid:101) B j t ,t − E t )] + 4 (cid:101) L E [∆ t E t )] + 10 pnt , where the ﬁrst inequality is from the probability of ¯ E t , the second equality is by applying Taylor’stheorem (where ¯ p t is some price between p ∗ t and p t ) with Assumption A-1 and Assumption A-2,the second inequality is from Assumption A-2 and (cid:101) L is some constant depending on L, L , L , p ,and both the last equality and the last inequality are from the deﬁnition of E t (i.e., events E N,t and E B,t ). Therefore, we have E [ r t ( p ∗ t ) − r t ( p t )] ≤ (cid:101) L L E [ (cid:101) B j t ,t − E t )] + 4 (cid:101) L E [∆ t E t )] + 10 pnt . (EC.1)Summing over t , the sum of the last terms above obviously lead to the regret O ( n log T ). 
For therest, we have (cid:88) t> t E [ (cid:101) B j t ,t − E t )] ≤ k d log T ∆ (cid:88) t> t E (cid:34) (cid:112)(cid:101) q j t t (cid:35) = k d log T ∆ (cid:88) t> t (cid:88) j ∈ [ m ] (cid:114) (cid:101) q j t ≤ k d log T ∆ (cid:88) j ∈ [ m ] (cid:112)(cid:101) q j T ≤ k d log T ∆ √ mT for some constant k , where the ﬁrst inequality is from E t (i.e., E V,t ) and the deﬁnition of (cid:101) B j t ,t , theequality is by conditioning on j t = j for all j ∈ [ m ], and the last inequality is because (cid:80) j (cid:101) q j = 1 andapply Cauchy-Schwarz. Hence (cid:88) t> t E [ (cid:101) B j t ,t − E t )] ≤ k d log T ∆ √ mT . (EC.2)On the other hand, because ˆ N t = N i t for all t > t on E t , (cid:88) t> t E [∆ t E t )] ≤ (cid:88) j ∈ [ m ] E  (cid:88) t ∈ (cid:101) T j,T ∆ (cid:113) (cid:101) T j,t  ≤ ∆ (cid:88) j ∈ [ m ] E (cid:20)(cid:113) (cid:101) T j,T (cid:21) ≤ ∆ √ mT . (EC.3) -companion to Author:


Putting (EC.1), (EC.2), and (EC.3) together, we have (cid:88) t> t E [( r t ( p ∗ t ) − r t ( p t ))] ≤ c d log( T ) √ mT + c n log T for some constant c , and together with the regret for t < t , we are done with the regret upperbound.In the rest of this subsection, we prove the lemmas used in the proof of Theorem 1. Lemma EC.1.

With probability at least $1 - \Delta$, $\tilde{T}_{j,t} \in [\tilde{q}_j t - \tilde{D}(t),\ \tilde{q}_j t + \tilde{D}(t)]$ for all $j \in [m]$, $t \in \mathcal{T}$, where $\tilde{D}(t) = \sqrt{t \log(2/\Delta)}$.

Proof: $\tilde{T}_{j,t}$ is a binomial random variable with parameters $t$ and $\tilde{q}_j$. The claim follows from Hoeffding's inequality applied to a sequence of i.i.d. Bernoulli random variables, together with a union bound over all $j \in [m]$ and $t \in \mathcal{T}$. □

Lemma EC.2.

For any i ∈ [ n ] and t ∈ T , let V i,t = I + (cid:80) s ∈T i,t u s u (cid:48) s , we have that || ˆ θ i,t − θ i || V i,t ≤ (cid:112) ( d + 2) log(1 + T i,t R / ( d + 2)) + 2 log(1 / ∆) + 2 l Ll with probability at least − ∆ .Proof: We ﬁrst ﬁx some i ∈ [ n ], and we drop the index dependency on i for convenience ofnotation. At round s , the gradient of likelihood function ∇ l s ( φ ) is equal to ∇ l s ( φ ) = ( µ ( u (cid:48) s φ ) − d s ) u s . (EC.4)And its Hessian is ∇ l s ( φ ) = ˙ µ ( u (cid:48) s φ ) u s u (cid:48) s . (EC.5)Applying Taylor’s theorem, we obtain0 ≥ (cid:88) s l s (ˆ θ t ) − l s ( θ )= (cid:88) s ∇ l s ( θ ) (cid:48) (ˆ θ t − θ ) + 12 (cid:88) s ˙ µ ( u (cid:48) s ¯ θ t )( u (cid:48) s (ˆ θ t − θ )) + l || ˆ θ t − θ || − l || ˆ θ t − θ || , (EC.6)where the ﬁrst inequality is from the optimality of ˆ θ t , and θ t is a point on line segment between ˆ θ t and θ . Note that by our assumption and boundedness of u s and θ , we have ˙ µ ( u (cid:48) s ¯ θ t ) ≥ l . Therefore,we have (cid:88) s ˙ µ ( u (cid:48) s ¯ θ t )( u (cid:48) s (ˆ θ t − θ )) + l || ˆ θ t − θ || ≥ l || ˆ θ t − θ || V t , (EC.7) c4 e-companion to Author:

Pricing with Clustering where V t = I + (cid:80) s u s u (cid:48) s . On the other hand, we have ∇ l s ( θ i ) = − (cid:15) s u s , (EC.8)where (cid:15) s is the zero-mean error, which is obviously sub-Gaussian with parameter 1 as it is bounded.Now combining (EC.6), (EC.7), and (EC.8), we have l || ˆ θ t − θ || V t ≤ (cid:88) s (cid:15) s u (cid:48) s (ˆ θ t − θ ) + 2 l L ≤ || ˆ θ t − θ || V t || Z t || V − t + 2 l L , (EC.9)where Z t := (cid:80) s (cid:15) s u s , and the second inequality is from Cauchy-Schwarz and || ˆ θ t − θ || ≤ L . Thisleads to || ˆ θ t − θ || V t ≤ l || Z t || V − t + 2 L. To bound || Z t || V − t , according to Theorem 1 in Abbasi-Yadkori et al. (2011), we have || Z t || V − t ≤ (cid:114) ( d + 2) log(1 + T i,t R d + 2 ) + 2 log(1 / ∆)with probability at least 1 − ∆ and we are done. (cid:3) Corollary EC.1.

For any j ∈ [ m ] and t ∈ T , let (cid:101) V j,t := I + (cid:80) s ∈ (cid:101) T j,t u s u (cid:48) s , we have that || (cid:101) θ j,t − θ j || (cid:101) V j,t ≤ (cid:113) ( d + 2) log(1 + (cid:101) T j,t R / ( d + 2)) + 2 log(1 / ∆) + 2 l Ll with probability at least − ∆ . Next result is the minimum eigenvalue of the Fisher’s information matrix.

Lemma EC.3.

Let u (cid:48) t = ( x (cid:48) t , (cid:101) p t + ∆ t ) where ∆ t is a zero mean error with variance E [∆ t |F t − ] = ω t > , we must have λ min ( E [ u t u (cid:48) t |F t − ]) ≥ ω t min [1 , λ ] / (1 + p ) > . So we can set λ min ( E [ u t u (cid:48) t |F t − ]) ≥ λ ω t for some constant λ = min [1 , λ ] / (1 + p ) .Proof: Note that the Fisher’s information matrix can be written as E [ u t u (cid:48) t |F t − ] =  (cid:101) p t z (cid:101) p t (cid:101) p t + µ t  which is a submatrix of the matrix M :=  (cid:101) p t

00 Σ z (cid:101) p t Σ z (cid:101) p t (cid:101) p t + ω t (cid:101) p t Σ z (cid:101) p t + ω t )Σ z  = M p ⊗ M z where M p = (cid:34) (cid:101) p t (cid:101) p t (cid:101) p t + ω t (cid:35) , M z = (cid:34) z (cid:35) , -companion to Author:

Pricing with Clustering ec5 and ⊗ is the Kronecker product.To derive the minimum eigenvalue of M p , note that it is just a 2 × λ min ( M p ) = ( (cid:101) p t + ω t + 1)(1 − (cid:112) − ω t / ( (cid:101) p t + ω t + 1) )2 ≥ ω t (cid:101) p t + ω t + 1 ≥ ω t p . For M z , let y (cid:48) = ( y , y (cid:48) ) ∈ R d +1 where y ∈ R and y ∈ R d , then y (cid:48) M z y = y + y (cid:48) Σ z y ≥ y + λ || y || ≥ min [1 , λ ] || y || . Therefore, λ min ( M z ) ≥ min [1 , λ ] > λ min ( M ) = λ min ( M p ) λ min ( M z ) ≥ ω t p min [1 , λ ] . Then we obtain the result as E [ u t u (cid:48) t ] is the submatrix of M . (cid:3) We apply a matrix concentration inequality result and obtain the minimum eigenvalue of theempirical Fisher’s information matrix.
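The closed-form bound on $\lambda_{\min}(M_p)$ used in the proof of Lemma EC.3 can be sanity-checked numerically. The sketch below assumes that $M_p$ is the $2 \times 2$ matrix with rows $(1, \tilde{p}_t)$ and $(\tilde{p}_t, \tilde{p}_t^2 + \omega_t)$, which is how the proof reads; the function names are illustrative and not part of the paper. It checks only the price block, not the Kronecker-product step.

```python
import math


def min_eig_2x2(a, b, c):
    """Smallest eigenvalue of the symmetric matrix [[a, b], [b, c]]."""
    trace = a + c
    det = a * c - b * b
    disc = math.sqrt(max(trace * trace - 4.0 * det, 0.0))
    return (trace - disc) / 2.0


def price_block_bound_holds(p_tilde, omega):
    """Check lambda_min(M_p) >= omega / (p_tilde^2 + omega + 1).

    M_p = [[1, p_tilde], [p_tilde, p_tilde^2 + omega]] is an assumed
    reading of the matrix in the proof of Lemma EC.3.
    """
    lam = min_eig_2x2(1.0, p_tilde, p_tilde ** 2 + omega)
    return lam >= omega / (p_tilde ** 2 + omega + 1.0) - 1e-12


# Grid over prices in [0, 10] and perturbation variances in (0, 1].
all_ok = all(
    price_block_bound_holds(p / 4.0, w / 20.0)
    for p in range(0, 41)
    for w in range(1, 21)
)
```

The bound holds because the determinant of this matrix equals $\omega_t$ and its trace equals $\tilde{p}_t^2 + \omega_t + 1$, so the smaller eigenvalue is at least the determinant divided by the trace.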

Lemma EC.4.

For any i ∈ [ n ] and t > (cid:18) R log(( d + 2) T ) λ ∆ min i ∈ [ n ] q i (cid:19) , where R := 2 + ¯ p , we have P (cid:18) λ min (cid:16) (cid:88) s ∈T i,t u s u (cid:48) s (cid:17) < λ ∆ q i √ t (cid:19) < t . Proof:

Note that λ max ( u s u (cid:48) s ) = || u s || ≤ R = 2 + p . We ﬁnd that (cid:88) s ∈T i,t u s u (cid:48) s = t (cid:88) s =1 ( i s = i ) u s u (cid:48) s , and, by Lemma EC.3, λ min ( E [ ( i s = i ) u s u (cid:48) s |F s − ]) = q i λ min ( E [ u s u (cid:48) s |F s − ]) ≥ λ q i ω s . Therefore, λ min (cid:32) t (cid:88) s =1 E [ ( i s = i ) u s u (cid:48) s |F s − ] (cid:33) ≥ t (cid:88) s =1 λ min ( E [ ( i s = i ) u s u (cid:48) s |F s − ]) ≥ q i λ t (cid:88) s =1 ω s ≥ q i λ ∆ t √ t ≥ q i λ ∆ √ t. c6 e-companion to Author:


As a result, we have that P  λ min ( (cid:88) s ∈T i,t u s u (cid:48) s ) < λ ∆ q i √ t  = P  λ min ( (cid:88) s ∈T i,t u s u (cid:48) s ) < λ ∆ q i √ t , t (cid:88) s =1 λ min ( E [ ( i s = i ) u s u (cid:48) s |F s − ]) ≥ λ ∆ q i √ t  ≤ P  λ min ( (cid:88) s ∈T i,t u s u (cid:48) s ) < λ ∆ q i √ t , λ min (cid:32) t (cid:88) s =1 E [ ( i s = i ) u s u (cid:48) s |F s − ] (cid:33) ≥ λ ∆ q i √ t  ≤ ( d + 2) e − λ qi √ t R , where the last inequality is from Theorem 3.1 in Tropp (2011) with ζ = 1 / i ∈ [ n ] and t > (cid:18) R log( T ( d + 2)) λ ∆ min i ∈ [ n ] q i (cid:19) , we have the simple union bound over i ∈ [ n ] , t ∈ T , ( d + 2) exp( − λ ∆ q i √ t/ (4 R )) < /t , and theproof is complete. (cid:3) Clearly, if we combine Lemma EC.4 and Lemma EC.2, for any i ∈ [ n ], t > ¯ t where¯ t = (cid:18) R log( T ( d + 2)) λ ∆ min i ∈ [ n ] q i (cid:19) , (EC.10)we have that || ˆ θ i,t − θ i || ≤ (cid:112) ( d + 2) log(1 + tR / ( d + 2)) + 2 log t + 2 l Ll (cid:112) λ min ( V i,t ) (EC.11) ≤ (cid:112) c ( d + 2) log(1 + t ) (cid:112) λ min ( V i,t ) = B i,t for some constant c > /l , and B i,t ≤ (cid:112) c ( d + 2) log(1 + t )∆ (cid:112) λ q i √ t (EC.12)with probability at least 1 − /t .The next lemma states that when estimation errors are bounded, under certain conditions wehave ˆ N t = N i t . Lemma EC.5.

Suppose for all i ∈ [ n ] it holds that || ˆ θ i,t − − θ i || ≤ B i,t − and B i,t − < γ/ . Then ˆ N t = N i t . -companion to Author:


Proof:

First of all, for i , i ∈ [ n ], if they belong to diﬀerent clusters and B i ,t − + B i ,t − < γ/ || ˆ θ i ,t − − ˆ θ i ,t − || > B i ,t − + B i ,t − because γ ≤|| θ i − θ i || ≤ || θ i − ˆ θ i ,t − || + || ˆ θ i ,t − − ˆ θ i ,t − || + || ˆ θ i ,t − − θ i || ≤ B i ,t − + || ˆ θ i ,t − − ˆ θ i ,t − || + B i ,t − < γ/ || ˆ θ i ,t − − ˆ θ i ,t − || , which implies that || ˆ θ i ,t − − ˆ θ i ,t − || > γ/ > B i ,t − + B i ,t − .On the other hand, if || ˆ θ i ,t − − ˆ θ i ,t − || > B i ,t − + B i ,t − , we must have i , i belongs to diﬀerentclusters because B i ,t − + B i ,t − < || ˆ θ i ,t − − ˆ θ i ,t − || ≤ || θ i − ˆ θ i ,t − || + || ˆ θ i ,t − − ˆ θ i ,t − || + || ˆ θ i ,t − − θ i || ≤ B i ,t − + || ˆ θ i ,t − − ˆ θ i ,t − || + B i ,t − , which implies || ˆ θ i ,t − − ˆ θ i ,t − || >

0, i.e., they belong to diﬀerent clusters.Therefore, if i ∈ ˆ N t , i.e., || ˆ θ i t ,t − − ˆ θ i,t − || ≤ B i t ,t − + B i,t − , we must have that i ∈ N i t as well or B i t ,t − + B i,t − ≥ γ/ B i,t − < γ/ i ∈ N i t , then we must have || ˆ θ i t ,t − − ˆ θ i,t − || ≤ B i t ,t − + B i,t − , which impliesthat i ∈ ˆ N t as well.Above all, we have shown that ˆ N i t = N i t . (cid:3) Note that given (EC.11) and (EC.12), we have that B i,t − < γ/ i if t > k (( d + 2) log(1 + T )) γ λ ∆ min i ∈ [ n ] q i for some constant k . Therefore, for each t > ¯ t where¯ t = max (cid:26) t , k (( d + 2) log(1 + T )) γ λ ∆ min i ∈ [ n ] q i (cid:27) , (EC.13)and ¯ t is deﬁned in (EC.10), ˆ N t = N i t with probability at least 1 − n/t .The next lemma shows that the clustered estimation will be quite accurate when most of the ˆ N t is actually equal to N i t . Lemma EC.6.

For any t such that t > t, we have P  λ min  (cid:88) s ∈ (cid:101) T jt,t u s u (cid:48) s  < λ ∆ (cid:112)(cid:101) q j t t  < nt , where ¯ t is deﬁned in (EC.13). c8 e-companion to Author:


Proof:

The proof is analogous to Lemma EC.4. Let E N,t be the event such that ˆ N t = N i t , and (cid:101) E j,t be the event such that (cid:101) T j,t ≤ (cid:101) q j t/

2. From our previous analysis, we know that given t > ¯ t , E N,t holds with probability at least 1 − n/t . Also, according to Lemma EC.1, event (cid:101) E j,t holds withprobability at least 1 − /t given t ≥ T ) / min j ∈ [ m ] (cid:101) q j (which is satisﬁed by taking t > ¯ t ).On event (cid:101) E j,t and E N,s for all s ∈ [ t/ , t ] (which holds with probability at least 1 − n/t ), we have λ min ( E [ ( j s = j ) u s u (cid:48) s |F s − ]) ≥ λ (cid:101) q j ω s = λ ∆ (cid:101) q j ( (cid:101) T j,s ) − / ≥ λ ∆ (cid:114) (cid:101) q j t by Lemma EC.3 and deﬁnition of (cid:101) q j . This implies that λ min (cid:32) t (cid:88) s =1 E [ ( j s = j ) u s u (cid:48) s |F s − ] (cid:33) ≥ t (cid:88) s = t/ λ min ( E [ ( j s = j ) u s u (cid:48) s |F s − ]) ≥ λ ∆ (cid:112)(cid:101) q j t . Therefore, we have for any t > t , P  λ min  (cid:88) s ∈ (cid:101) T jt,t u s u (cid:48) s  < λ ∆ (cid:112)(cid:101) q j t t  = (cid:88) j ∈ [ m ] P  λ min  (cid:88) s ∈ (cid:101) T jt,t u s u (cid:48) s  < λ ∆ (cid:112)(cid:101) q j t t (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) j t = j  P ( j t = j )= (cid:88) j ∈ [ m ] P  λ min  (cid:88) s ∈ (cid:101) T j,t u s u (cid:48) s  < λ ∆ (cid:112)(cid:101) q j t  (cid:101) q j . 
For each j ∈ [ m ], we have P  λ min  (cid:88) s ∈ (cid:101) T j,t u s u (cid:48) s  < λ ∆ (cid:112)(cid:101) q j t  ≤ P  λ min  (cid:88) s ∈ (cid:101) T j,t u s u (cid:48) s  < λ ∆ (cid:112)(cid:101) q j t , (cid:91) s ∈ [ t/ ,t ] ( E N,t ∪ (cid:101) E j,t )  + 6 nt = P  λ min  (cid:88) s ∈ (cid:101) T j,t u s u (cid:48) s  < λ ∆ (cid:112)(cid:101) q j t , λ min  (cid:88) s ∈ (cid:101) T j,t E [ u s u (cid:48) s |F s − ]  ≥ λ ∆ (cid:112)(cid:101) q j t , (cid:91) s ∈ [ t/ ,t ] ( E N,t ∪ (cid:101) E j,t )  + 6 nt ≤ nt , where the ﬁrst inequality is from the probability of the complement of (cid:83) s ∈ [ t/ ,t ] ( E N,t ∪ (cid:101) E j,t ), andthe last inequality is by Theorem 3.1 in Tropp (2011), and we take t > (cid:32) R log(2( d + 2) T ) λ ∆ min j ∈ [ m ] (cid:112)(cid:101) q j (cid:33) . Since ¯ t > (cid:0) R log(2( d + 2) T ) / ( λ ∆ min j ∈ [ m ] (cid:112)(cid:101) q j ) (cid:1) by deﬁnition, we complete the proof. (cid:3) -companion to Author:


EC.2. Proofs for the Linear Model

Proof of Theorem 2.

First of all, we deﬁne event (cid:101) E t := (cid:110) | (cid:101) β j t ,t − β j t | ≤ k (cid:112) log t ( (cid:101) q j t t ) − / / ∆ , | (cid:101) α (cid:48) j t ,t x − α (cid:48) j t x | ≤ k (cid:112) ( d + 1) log t ( (cid:101) q j t t ) / / ∆ || x || (cid:101) V − jt,t (cid:111) . According to Lemma EC.11, this event holds with probability at least 1 − n/t for any t > t (cid:48) where¯ t (cid:48) = O (cid:32) √ d log T min i ∈ [ n ] q κ/ i (cid:33) / (2 κ −  is deﬁned in (EC.17).Therefore, we can split the regret into t ≤ t (cid:48) (which has regret at most O (¯ t (cid:48) )) and t > t (cid:48) . Notethat for any t > t (cid:48) , on event (cid:101) E t and E N,t (such that ˆ N t = N i t , which holds with probability at least1 − n/t according to Lemma EC.10), we have r t ( p ∗ t ) − r t ( p t ) ≤ − β i t ( p ∗ t − (cid:101) p t − ∆ t ) ≤ − β i t ( | p ∗ t − p (cid:48) t | + | ∆ t | ) − β i t ∆ t ≤ c (( α (cid:48) i t x t − (cid:101) α (cid:48) j t ,t − x t ) + ( β i t − (cid:101) β j t ,t − ) + ∆ t ) ≤ c ( (cid:101) C αj t ,t − ( x t )) + c log T (∆ ( (cid:101) T j t ,t ) − / + ( (cid:101) q j t t ) − / / ∆ )for some constants c , c , where the third inequality is from the deﬁnition of optimal price givendemand parameters and covariates, and the fourth inequality is from Cauchy-Schwarz, event (cid:101) E t ,and the deﬁnition of ∆ t . Here (cid:101) C αj t ,t − ( x t ) is deﬁned as (cid:101) C αj t ,t − ( x t ) := k (cid:112) ( d + 1) log( t − (cid:101) q j t ( t − / / ∆ || x || (cid:101) V − jt,t − . For the second terms, if we sum them up over t , their summation can be bounded by c log T √ mT for some constant c as we did in the proof of Theorem 1. For the ﬁrst term, there is some constant c such that ( (cid:101) C αj t ,t − ( x t )) ≤ c ( d + 1) log T (cid:112)(cid:101) q j t t || x || (cid:101) V − jt,t . 
(EC.14)If we sum them over t , we have (on events (cid:101) E t and E N,t ) (cid:88) t> t (cid:48) E (cid:20)(cid:112)(cid:101) q j t t || x t || (cid:101) V − jt,t (cid:21) ≤ (cid:88) j ∈ [ m ] (cid:112)(cid:101) q j T E  (cid:88) t> t (cid:48) ,t ∈ (cid:101) T j,T || x t || (cid:101) V − j,t  ≤ c ( d + 1) log T (cid:88) j ∈ [ m ] (cid:112)(cid:101) q j T ≤ c ( d + 1) log T √ mT for some constant c where the second inequality is by Lemma 11 in Abbasi-Yadkori et al. (2011).Therefore, combined with (EC.14), its summation over t > t (cid:48) is at most O (cid:16) d log T √ mT (cid:17) . Notethat since the expected regret incurred on any of events (cid:101) E t or E N,t fail is at most O ( n log T ), weﬁnish the proof. c10 e-companion to Author:
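Both regret proofs close with the same Cauchy-Schwarz step: since the cluster view probabilities $\tilde{q}_j$ sum to one, $\sum_{j \in [m]} \sqrt{\tilde{q}_j T} \le \sqrt{mT}$. A minimal numerical sketch of that step follows; the helper name and the example probabilities are illustrative assumptions, not values from the paper.

```python
import math
import random


def sum_sqrt_bound(q, horizon):
    """Return (lhs, rhs) with lhs = sum_j sqrt(q_j * T) and rhs = sqrt(m * T)."""
    lhs = sum(math.sqrt(qj * horizon) for qj in q)
    rhs = math.sqrt(len(q) * horizon)
    return lhs, rhs


# Random probability vector over m = 5 clusters (illustrative only).
rng = random.Random(1)
weights = [rng.random() for _ in range(5)]
total = sum(weights)
q = [w / total for w in weights]
lhs, rhs = sum_sqrt_bound(q, horizon=10_000)

# The bound is tight when every cluster is viewed equally often.
uniform_lhs, uniform_rhs = sum_sqrt_bound([0.2] * 5, horizon=10_000)
```

Equality in the uniform case shows that the $\sqrt{m}$ factor in the regret bound cannot be removed by this argument alone.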


In the rest of this subsection, we prove several lemmas that are needed for the proof of Theorem2. The ﬁrst lemma is about length of T i,t . Lemma EC.7.

For any $i \in [n]$, $t \in \mathcal{T}$, with probability at least $1 - \Delta$, $T_{i,t} \in [q_i t - D(t),\ q_i t + D(t)]$, where $D(t) = \sqrt{t \log(2/\Delta)/2}$.

Proof: The proof is the same as that of Lemma EC.1 and is omitted. □
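The concentration of the view counts in Lemma EC.1 and Lemma EC.7 can be illustrated by simulation. The sketch below assumes the standard Hoeffding radius $\sqrt{t \log(2/\Delta)/2}$ for a single fixed $t$ and draws the count as a sum of i.i.d. Bernoulli views; the function name, the parameter values, and the seed are illustrative choices.

```python
import math
import random


def count_within_band(q, t, delta, trials=2000, seed=0):
    """Fraction of simulated counts T ~ Bin(t, q) with |T - q*t| <= D(t)."""
    rng = random.Random(seed)
    # Hoeffding radius for a fixed t (assumed form of D(t)).
    half_width = math.sqrt(t * math.log(2.0 / delta) / 2.0)
    hits = 0
    for _ in range(trials):
        # Each period, the product is viewed independently with probability q.
        count = sum(1 for _ in range(t) if rng.random() < q)
        if abs(count - q * t) <= half_width:
            hits += 1
    return hits / trials


# A low-sale product viewed in 10% of periods, over 400 periods.
coverage = count_within_band(q=0.1, t=400, delta=0.05)
```

The empirical coverage should be at least $1 - \Delta$; in practice it is much higher, because Hoeffding's inequality ignores the small variance of a rarely viewed product.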

Lemma EC.8.

For any $\mathcal{T}_{t_1,t_2} := \{t_1 + 1, \ldots, t_2\}$ and $j \in [m]$, we have $|\tilde{T}_{j,t_2} \cap \mathcal{T}_{t_1,t_2}| \in [\tilde{q}_j (t_2 - t_1) - \tilde{D}(t_2 - t_1),\ \tilde{q}_j (t_2 - t_1) + \tilde{D}(t_2 - t_1)]$ with probability at least $1 - \Delta$, where $\tilde{D}(t) = \sqrt{t \log(2/\Delta)}$.

Proof: This is an immediate consequence of Lemma EC.1 and Lemma EC.7. □

Lemma EC.9.

For any i ∈ [ n ] , t ∈ T , we have that | ˆ β i,t − β i | ≤ k (cid:112) log(1 / ∆) + log(1 + t )  (cid:88) s ∈T i,t ∆ s  − / || ˆ α i,t − α i || V i,t ≤ k √ d + 1(log(1 / ∆) + log(1 + t ))  (cid:88) s ∈T i,t ∆ s  − / (cid:112) T i,t for some constant k , k with probability at least − ∆ . In particular, we can show that | ˆ β i,t − β i | ≤ (cid:101) C βi,t and || ˆ α i,t − α i || V i,t ≤ (cid:101) C αi,t with probability at least − /t .Proof: First of all, we drop the index dependency on i for the sake of convenience. According todeﬁnition of ˆ β t , we have that ˆ β t − β = (cid:80) s ∈T t k s ∆ s (cid:80) s ∈T t ∆ s , where k s := α (cid:48) x s + β (cid:101) p s + (cid:15) s which satisﬁes | k s | ≤ (cid:101) L := 2 L + pL + 1 by the boundedness assumption.We can write k s ∆ s = | ∆ s | k s σ s where σ s = ± /

2, and | ˆ β t − β | (cid:115)(cid:88) s ∈T t | ∆ s | = | (cid:80) s ∈T t k s σ s | ∆ s || (cid:113) ∆ / √ t + (cid:80) s ∈T t k s | ∆ s | (cid:113) ∆ / √ t + (cid:80) s ∈T t k s | ∆ s | (cid:113)(cid:80) s ∈T t | ∆ s | ≤ (cid:113) (cid:101) L | (cid:80) s ∈T t k s σ s | ∆ s || (cid:113) ∆ / √ t + (cid:80) s ∈T t k s | ∆ s | , where the inequality is because | ∆ s | ≥ ∆ / √ t for any s ≤ t . Both σ s and k s | ∆ s | are adapted toﬁltration {F s } , and σ s , which is sub-Gaussian with parameter 1, form a martingale diﬀerence -companion to Author:

Pricing with Clustering ec11 sequence. Then Theorem 1 in Abbasi-Yadkori et al. (2011) (applied on single dimensional case)gives us that | ˆ β t − β | (cid:115)(cid:88) s ∈T t ∆ s ≤ (cid:113) (cid:101) L (cid:115) log( (cid:88) s ∈T t k s ∆ s + ∆ / √ t ) − log(∆ / √ t ) + 2 log(2 / ∆) ≤ k (cid:112) log(1 / ∆) + log(1 + t ) (EC.15)with probability at least 1 − ∆ / k .On the other hand, by deﬁnition of ˆ α t ,( (cid:88) s ∈T t x s x (cid:48) s + I )( ˆ α t − α ) = ( (cid:88) s ∈T t p s x s )( β − ˆ β t ) + (cid:88) s ∈T t (cid:15) s x s + α, which implies that || ˆ α t − α || V t ≤|| (cid:88) s ∈T t p s x s || V − t | β − ˆ β t | + || (cid:88) s ∈T t (cid:15) s x s || V − t + L ≤ p (cid:88) s ∈T t || x s || V − t | β − ˆ β t | + || (cid:88) s ∈T t (cid:15) s x s || V − t + L ≤ p | β − ˆ β t | (cid:112) T t ( d + 1) log(1 + 2 T t / ( d + 1))+ (cid:112) ( d + 1) log(1 + 2 T t / ( d + 1)) + 2 log(2 / ∆) + L, where V t = I + (cid:80) s ∈T t x s x (cid:48) s , and the last inequality hold with probability at least 1 − ∆ / k (cid:48) gives us the bound, i.e., || ˆ α t − α || V t ≤ k (cid:48) ( (cid:112) ( d + 1)(log(1 / ∆) + log(1 + t )) | β − ˆ β t | (cid:112) T t + 1) . (EC.16)Therefore, events (EC.15) and (EC.16) hold together with probability at least 1 − ∆.According to the result above, we can take ∆ = 1 /t and let c , c in (17) chosen appropriatelysuch that | ˆ β i,t − β i | ≤ (cid:101) C βi,t and || ˆ α i,t − α i || V i,t ≤ (cid:101) C αi,t with probability at least 1 − /t . (cid:3) Corollary EC.2.

For any j ∈ [ m ] , t ∈ T , we have that | (cid:101) β j,t − β i | ≤ k (cid:112) log(1 / ∆) + log(1 + t )  (cid:88) s ∈ (cid:101) T j,t ∆ s  − / || (cid:101) α j,t − α i || (cid:101) V j,t ≤ k √ d + 1(log(1 / ∆) + log(1 + t ))  (cid:88) s ∈ (cid:101) T j,t ∆ s  − / (cid:113) (cid:101) T j,t with probability at least − ∆ . Lemma EC.10.

For any t such that t > ¯ t (cid:48) := max  t min i q i , T )min i ∈ [ n ] q i , (cid:32) k √ d + 1 log Tγ ∆ min i ∈ [ n ] q κ/ i (cid:33) / (2 κ − , (cid:32) k γ min i ∈ [ n ] q κ/ i (cid:33) /κ  , (EC.17) where k is some constant, we have that ˆ N t = N i t with probability at least − n/t . c12 e-companion to Author:


Proof:

We consider the estimation error of β i and α i , and we want to show that both of them canbe controlled. According to Lemma EC.7, if t > t ) / min i ∈ [ n ] q i , we have that for any i ∈ [ n ] T i,t ≥ q i t/ − /t (since D ( t ) < q i t/ i ∈ [ n ]). If this this true, wehave (cid:80) s ∈T i,t ∆ s ≥ ∆ T i,t / √ t ≥ ∆ q i √ t/ . Moreover, because of Assumption B.2 and t > t / min i q i (which implies that T i,t > t for all i ∈ [ n ]), λ min ( V i,t ) ≥ c T κi,t . As a result, C i,t ≤ k (cid:32) √ d + 1 log t (cid:115) T i,t ∆ q i √ tλ min ( V i,t ) + (cid:115) log t ∆ q i √ t + (cid:115) λ min ( V i,t ) (cid:33) ≤ k (cid:32) √ d + 1 log t (cid:115) t / − κ ∆ q κi + (cid:115) log t ∆ q i √ t + (cid:112) ( q i t ) − κ (cid:33) for some constant k , k with probability at least 1 − /t . Since Lemma EC.9 implies that || ˆ θ i,t − θ i || ≤ C i,t with probability at least 1 − /t , if t > max (cid:32) k √ d + 1 log tγ ∆ min i ∈ [ n ] q κ/ i (cid:33) / (2 κ − , (cid:32) k γ min i ∈ [ n ] q κ/ i (cid:33) /κ  , we have || ˆ θ i,t − θ i || ≤ C i,t < γ/ i ∈ [ n ] with probability at least 1 − n/t . Then using LemmaEC.5 leads to the result. (cid:3) Lemma EC.11.

For any t > t (cid:48) , we have that | (cid:101) β j t ,t − β j t | ≤ k (cid:112) log t ( (cid:101) q j t t ) − / / ∆ | (cid:101) α (cid:48) j t ,t x − α (cid:48) j t x | ≤ k (cid:112) ( d + 1) log t ( (cid:101) q j t t ) / / ∆ || x || (cid:101) V − jt,t for some constants k , k with probability at least − n/t .Proof: According to Corollary EC.2 and Cauchy-Schwarz, we have | (cid:101) β j t ,t − β j | ≤ k (cid:112) t )  (cid:88) s ∈ (cid:101) T jt,t ∆ s  − / | (cid:101) α (cid:48) j t ,t x − α (cid:48) j x | ≤ k √ d + 12 log(1 + t )  (cid:88) s ∈ (cid:101) T jt,t ∆ s  − / (cid:113) (cid:101) T j t ,t || x || (cid:101) V − jt,t (EC.18)with probability at least 1 − /t .Deﬁne events E N,s = { ˆ N s = N i s } . According to Lemma EC.10, when s > ¯ t (cid:48) , E N,s holds with prob-ability at least 1 − n/s . Note that on events E N,s for all s ∈ [ t/ , t ] (which holds with probabilityat least 1 − n/t as t/ > ¯ t (cid:48) ), we have that (cid:88) s ∈ (cid:101) T jt,t ∆ s ≥ (cid:88) s ∈ (cid:101) T jt,t : s>t/ ∆ s ≥ (cid:88) s ∈ (cid:101) T jt,t : s>t/ ∆ | (cid:101) T j t ,t ∩ { s > t/ }| (cid:113) (cid:101) T j t ,t . -companion to Author:


Then according to Lemma EC.8, | (cid:101) T j t ,t ∩ { s > t/ }| ∈ [ (cid:101) q j t t/ − (cid:101) D ( t/ , (cid:101) q j t t/ (cid:101) D ( t/ (cid:101) D ( t/

2) = (cid:112) t log(2 t ) / ≤ (cid:101) q j t t/ t > t (cid:48) ) with probability at least 1 − /t (hence | (cid:101) T j t ,t ∩ { s >t/ }| ≥ (cid:101) q j t t/ (cid:101) T j t ,t ∈ [ (cid:101) q j t t/ , (cid:101) q j t t/

2] with probability at least 1 − /t . As aresult, combined with the above equation, with probability at least 1 − n/t , we have (cid:88) s ∈ (cid:101) T jt,t ∆ s ≥ ∆ (cid:112)(cid:101) q j t t . Combining with (EC.18), we obtain the desired result. (cid:3)
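The last step of the proof lower-bounds the accumulated squared price perturbation of a cluster. A minimal sketch of why this quantity grows like the square root of the number of observations is given below. It assumes that the $k$-th observation of a cluster is perturbed with variance $\Delta_0^2 / \sqrt{k}$, which is our reading of the definition of $\omega_s$ in the proofs; the names `perturbation_energy`, `n_obs`, and `delta0` are illustrative.

```python
import math


def perturbation_energy(n_obs, delta0):
    """Sum of squared perturbations delta0^2 / sqrt(k) for k = 1..n_obs."""
    return sum(delta0 ** 2 / math.sqrt(k) for k in range(1, n_obs + 1))


n_obs, delta0 = 2500, 0.5
energy = perturbation_energy(n_obs, delta0)

# Each term is at least delta0^2 / sqrt(n_obs), which gives the lower bound;
# comparing the sum with the integral of 1/sqrt(x) gives the upper bound.
lower = delta0 ** 2 * math.sqrt(n_obs)
upper = 2.0 * delta0 ** 2 * math.sqrt(n_obs)
```

Under this assumption the accumulated perturbation is of order $\Delta_0^2 \sqrt{n}$ for a cluster with $n$ observations, the same order as the $\Delta_0^2 \sqrt{\tilde{q}_{j_t} t}$ rate obtained in the proof.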

EC.3. Diﬀerent θ i for the Same Cluster As mentioned in Remark 1 in Section 3, this section talks about some technical lemmas in showingthe regret of the modiﬁed CSMP when parameters θ i within the same cluster can be diﬀerent.Note that we assume || θ i − θ i || ≤ γ for any i , i in any cluster N j .The ﬁrst result is an corollary of Lemma EC.5. Corollary EC.3.

Suppose for all i ∈ [ n ] it holds that || ˆ θ i,t − − θ i || ≤ B i,t − and B i,t − < γ/ .Then in the modiﬁed algorithm (with γ > γ ), we have that ˆ N t = N i t . Proof:

The proof is almost identical to Lemma EC.5. First of all, for i , i ∈ [ n ], if they belong todiﬀerent clusters and B i ,t − + B i ,t − < γ/

4, we must have || ˆ θ i ,t − − ˆ θ i ,t − || > B i ,t − + B i ,t − + γ because γ ≤|| θ i − θ i || ≤ || θ i − ˆ θ i ,t − || + || ˆ θ i ,t − − ˆ θ i ,t − || + || ˆ θ i ,t − − θ i || ≤ B i ,t − + || ˆ θ i ,t − − ˆ θ i ,t − || + B i ,t − < γ/ || ˆ θ i ,t − − ˆ θ i ,t − || , which implies that || ˆ θ i ,t − − ˆ θ i ,t − || > γ/ > γ/ γ > B i ,t − + B i ,t − + γ .On the other hand, if || ˆ θ i ,t − − ˆ θ i ,t − || > B i ,t − + B i ,t − + γ , we must have i , i belongs todiﬀerent clusters because B i ,t − + B i ,t − + γ < || ˆ θ i ,t − − ˆ θ i ,t − || ≤ || θ i − ˆ θ i ,t − || + || θ i ,t − − θ i ,t − || + || ˆ θ i ,t − − θ i || ≤ B i ,t − + || θ i ,t − − θ i ,t − || + B i ,t − which implies || θ i ,t − − θ i ,t − || > γ , i.e., they belong to diﬀerent clusters.Therefore, if i ∈ ˆ N t , i.e., || ˆ θ i t ,t − − ˆ θ i,t − || ≤ B i t ,t − + B i,t − + γ , we must have that i ∈ N i t as wellor B i t ,t − + B i,t − ≥ γ/ B i,t − < γ/ i ∈ N i t , then we must have || ˆ θ i t ,t − − ˆ θ i,t − || ≤ B i t ,t − + B i,t − + γ , whichimplies that i ∈ ˆ N t as well. Summarizing, we have shown that ˆ N i t = N i t . (cid:3) The next lemma measures the conﬁdence bound of (cid:101) θ j,t compared with any true parameter (cid:101) θ i for i ∈ N j , with respect to the empirical Fisher’s information matrix (cid:101) V j,t . c14 e-companion to Author:
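The neighborhood test analyzed in Lemma EC.5 and Corollary EC.3 can be sketched as follows: product $i$ is placed in $\hat{N}_t$ when its estimate lies within the sum of the two confidence radii of the estimate for product $i_t$, plus an extra tolerance in the modified algorithm. The function name, the `slack` parameter (standing in for that extra tolerance), and the numerical values are hypothetical and chosen only for illustration.

```python
import math


def estimated_neighborhood(i_t, theta_hat, radius, slack=0.0):
    """Products i with ||theta_hat[i_t] - theta_hat[i]|| <= B[i_t] + B[i] + slack."""

    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    return {
        i
        for i in range(len(theta_hat))
        if dist(theta_hat[i_t], theta_hat[i]) <= radius[i_t] + radius[i] + slack
    }


# Two well-separated clusters of demand parameters (illustrative estimates).
theta_hat = [
    (1.00, -0.50), (1.02, -0.49), (0.99, -0.52),  # cluster A
    (2.00, -1.00), (2.03, -0.98),                 # cluster B
]
radius = [0.05] * 5  # confidence radii B_{i,t-1}, small relative to the gap

neighbors_of_0 = estimated_neighborhood(0, theta_hat, radius)
neighbors_of_3 = estimated_neighborhood(3, theta_hat, radius)
```

When the radii are small relative to the separation between clusters, as the lemma requires, the test recovers the true cluster of the arriving product.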


Lemma EC.12.

Let $t$ satisfy

$$
t > \left(\frac{8R\log((d+2)T)}{\lambda_0\Delta_0^2\min_j \tilde q_j}\right)^2 .
$$

On the event that $\tilde T_{j_t,t} \ge \tilde q_{j_t} t/2$,

$$
\|\tilde\theta_{j_t,t} - \bar\theta_{j_t}\| \le \frac{\sqrt{(d+2)\log\left(1 + tR^2/(d+2)\right) + 4\log t} + 2 l_1 L}{l_1\sqrt{\lambda_{\min}(\tilde V_t)}} + \frac{2 L_1 R^2 \gamma_0}{l_1 \lambda_0 \Delta_0^2 \upsilon^2}
$$

with probability at least $1 - 2/t^2$.

Proof: The proof is quite similar to that of Lemma EC.2. We drop the index $j_t$ for convenience. Note that for an arbitrary parameter $\phi \in \Theta$, since $\tilde\theta_t$ is the MLE, we have

$$
\begin{aligned}
0 &\ge \sum_s l_s(\tilde\theta_t) - \sum_s l_s(\phi) \\
&= \sum_s \nabla l_s(\phi)'(\tilde\theta_t - \phi) + \frac{1}{2}\sum_s \dot\mu(u_s'\bar\phi_t)\left(u_s'(\tilde\theta_t - \phi)\right)^2 + l_1\|\tilde\theta_t - \phi\|^2 - l_1\|\tilde\theta_t - \phi\|^2 \\
&\ge \sum_s \nabla l_s(\phi)'(\tilde\theta_t - \phi) + l_1\|\tilde\theta_t - \phi\|^2_{\tilde V_t} - 2 l_1 L^2,
\end{aligned}
\tag{EC.19}
$$

where the first inequality is from the optimality of $\tilde\theta_t$, and $\bar\phi_t$ is a point on the line segment between $\tilde\theta_t$ and $\phi$.

Now we consider $\nabla l_s(\phi)$. By Taylor's theorem, $\nabla l_s(\phi) = \nabla l_s(\theta_s) + \nabla^2 l_s(\check\theta_s)(\phi - \theta_s)$, where $\theta_s$ is the true parameter at time $s$, and $\check\theta_s$ is a point between $\phi$ and $\theta_s$. As a result,

$$
\nabla l_s(\phi) = -\epsilon_s u_s + \dot\mu(u_s'\check\theta_s)\, u_s u_s'(\phi - \theta_s). \tag{EC.20}
$$

Since $\phi \in \Theta$ is an arbitrary vector, we can let $\phi = \theta_i$ for any $i \in N_j$. Combining (EC.19) and (EC.20), we have that with probability at least $1 - 1/t^2$,
$$
\begin{aligned}
l_1\|\tilde\theta_t - \theta_i\|^2_{\tilde V_t}
&\le \sum_s \epsilon_s u_s'(\tilde\theta_t - \theta_i) - \sum_s \dot\mu(u_s'\check\theta_s)(\theta_i - \theta_s)' u_s u_s'(\tilde\theta_t - \theta_i) + 2 l_1 L^2 \\
&\le \Big\|\sum_s \epsilon_s u_s\Big\|_{\tilde V_t^{-1}} \|\tilde\theta_t - \theta_i\|_{\tilde V_t} + \sum_s \big\|\dot\mu(u_s'\check\theta_s)\, u_s u_s'(\theta_i - \theta_s)\big\|_{\tilde V_t^{-1}} \|\tilde\theta_t - \theta_i\|_{\tilde V_t} + 2 l_1 L^2 \\
&\le \sqrt{(d+2)\log\left(1 + \frac{tR^2}{d+2}\right) + 4\log t}\; \|\tilde\theta_t - \theta_i\|_{\tilde V_t} + \frac{\sum_s \big\|\dot\mu(u_s'\check\theta_s)\, u_s u_s'(\theta_i - \theta_s)\big\|}{\sqrt{\lambda_{\min}(\tilde V_t)}}\, \|\tilde\theta_t - \theta_i\|_{\tilde V_t} + 2 l_1 L^2 \\
&\le \sqrt{(d+2)\log\left(1 + \frac{tR^2}{d+2}\right) + 4\log t}\; \|\tilde\theta_t - \theta_i\|_{\tilde V_t} + \frac{L_1 R^2 \gamma_0 \tilde q_j t}{\sqrt{\lambda_{\min}(\tilde V_t)}}\, \|\tilde\theta_t - \theta_i\|_{\tilde V_t} + 2 l_1 L^2,
\end{aligned}
$$

where the second inequality is from Theorem 1 in Abbasi-Yadkori et al. (2011) and the last inequality is because $\tilde T_{j_t,t} \ge \tilde q_{j_t} t/2$. By some simple algebra, the above inequality implies that

$$
\|\tilde\theta_t - \theta_i\|_{\tilde V_t} \le \frac{\sqrt{(d+2)\log\left(1 + \frac{tR^2}{d+2}\right) + 4\log t}}{l_1} + \frac{L_1 R^2 \gamma_0 \tilde q_j t}{l_1\sqrt{\lambda_{\min}(\tilde V_t)}} + 2L.
$$

This inequality further implies that

$$
\|\tilde\theta_t - \theta_i\| \le \frac{\sqrt{(d+2)\log\left(1 + \frac{tR^2}{d+2}\right) + 4\log t}}{l_1\sqrt{\lambda_{\min}(\tilde V_t)}} + \frac{L_1 R^2 \gamma_0 \tilde q_j t}{l_1 \lambda_{\min}(\tilde V_t)} + \frac{2L}{\sqrt{\lambda_{\min}(\tilde V_t)}}. \tag{EC.21}
$$

Since in the modified algorithm we let $\Delta_t = \pm\Delta_0 \max\left(\tilde T_{\hat N_t,t}^{-1/4}, \upsilon\right)$, on the event in the lemma statement, Lemma EC.4 implies that

$$
\lambda_{\min}(\tilde V_t) \ge \lambda_0 \Delta_0^2\, \tilde q_j \max\left(\sqrt{t}, \upsilon^2 t\right)/2 \ge \lambda_0 \Delta_0^2 \upsilon^2\, \tilde q_j t/2
$$

with probability at least $1 - 1/t^2$ for any $t$ satisfying $t > \left(8R\log((d+2)T)/(\lambda_0\Delta_0^2\min_j \tilde q_j)\right)^2$. Plugging $\lambda_{\min}(\tilde V_t) \ge \lambda_0\Delta_0^2\upsilon^2\tilde q_j t/2$ into the term $L_1 R^2 \gamma_0 \tilde q_j t/(l_1\lambda_{\min}(\tilde V_t))$ in (EC.21), we finally obtain that, with probability at least $1 - 2/t^2$,

$$
\|\tilde\theta_t - \theta_i\| \le \frac{\sqrt{(d+2)\log\left(1 + \frac{tR^2}{d+2}\right) + 4\log t} + 2 l_1 L}{l_1\sqrt{\lambda_{\min}(\tilde V_t)}} + \frac{2 L_1 R^2 \gamma_0}{l_1 \lambda_0 \Delta_0^2 \upsilon^2},
$$

and we finish the proof. $\square$

Now we provide the proof (sketch) of the theorem on the regret of the modified algorithm.

Theorem EC.1.

The expected regret of the modified algorithm CSMP is

$$
R(T) = O\left(\frac{d^2\log^2(dT)}{\min_{i\in[n]} q_i^2} + d\log T\sqrt{mT} + \gamma_0^{2/3}\, T\right).
$$

If we hide logarithmic terms and let $\min_{i\in[n]} q_i = \Theta(1/n)$ with $T \gg n$, the expected regret is at most $R(T) = \tilde O\left(d\sqrt{mT} + \gamma_0^{2/3}\, T\right)$.

Proof:

The proof is almost identical to that of Theorem 1, so we omit most of it. The only step that requires extra investigation is the following: conditioned on the same events as in Theorem 1, and for $t$ sufficiently large (larger than some time of the same scale as the maximum of $\bar t$), we want to bound $r_t(p_t^*) - r_t(p_t) = O\left(r_t(p_t^*) - r_t(p_t') + \Delta_t^2\right)$. Note that $\Delta_t^2 = O\left(\max\left(\tilde T_{\hat N_t,t}^{-1/2}, \upsilon^2\right)\right) \le O\left(\tilde T_{\hat N_t,t}^{-1/2} + \upsilon^2\right)$. The part of the regret $\sum_t O\left(\tilde T_{\hat N_t,t}^{-1/2}\right)$ is bounded as in Theorem 1, and the cumulative regret from the $\upsilon^2$ term is $O(\upsilon^2 T)$.

To bound $r_t(p_t^*) - r_t(p_t')$, note that $r_t(p_t^*) - r_t(p_t') \le O\left(\|\theta_{i_t} - \tilde\theta_{j_t,t}\|^2\right)$. Using the result in Lemma EC.12, we obtain

$$
r_t(p_t^*) - r_t(p_t') \le O\left(\frac{d\log t}{\lambda_{\min}(\tilde V_t)} + \frac{\gamma_0^2}{\upsilon^4}\right).
$$

The cumulative regret from summing $O\left(d\log t/\lambda_{\min}(\tilde V_t)\right)$ is the same as in Theorem 1, and the cumulative regret from $O(\gamma_0^2/\upsilon^4)$ is $O(\gamma_0^2 T/\upsilon^4)$.

Adding up all parts of the regret, the expected regret is at most

$$
R(T) = O\left(\frac{d^2\log^2(dT)}{\min_{i\in[n]} q_i^2} + d\log T\sqrt{mT} + \upsilon^2 T + \frac{\gamma_0^2 T}{\upsilon^4}\right).
$$

Taking $\upsilon = \Theta(\gamma_0^{1/3})$ gives the final result.
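The choice of $\upsilon$ in the last step can be checked numerically. The sketch below assumes that the two $\upsilon$-dependent terms of the bound are, up to constants, $\upsilon^2 T$ and $\gamma_0^2 T/\upsilon^4$, so that the per-period cost is $\upsilon^2 + \gamma_0^2/\upsilon^4$. The function names and the sample values of $\gamma_0$ are hypothetical and chosen only for illustration.

```python
def per_period_cost(upsilon, gamma0):
    """Per-period cost upsilon^2 + gamma0^2 / upsilon^4, with constants dropped."""
    return upsilon ** 2 + gamma0 ** 2 / upsilon ** 4


def balanced_upsilon(gamma0):
    """Stationary point of per_period_cost in upsilon.

    Setting the derivative 2*u - 4*gamma0^2 / u^5 to zero gives u^6 = 2*gamma0^2.
    """
    return (2.0 * gamma0 ** 2) ** (1.0 / 6.0)


for gamma0 in (1e-1, 1e-2, 1e-3):
    u = balanced_upsilon(gamma0)
    # If the cost scales as gamma0^(2/3), this ratio does not depend on gamma0.
    ratio = per_period_cost(u, gamma0) / gamma0 ** (2.0 / 3.0)
    print(f"gamma0={gamma0:g}  upsilon={u:.4f}  cost/gamma0^(2/3)={ratio:.4f}")
```

Under this assumption the minimizer satisfies $\upsilon^6 = 2\gamma_0^2$, i.e., $\upsilon = \Theta(\gamma_0^{1/3})$, and the minimized per-period cost is a constant multiple of $\gamma_0^{2/3}$, which matches the $\gamma_0^{2/3} T$ scaling of the last regret term.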