A Generalized Markov Chain Model to Capture Dynamic Preferences and Choice Overload
Kumar Goutam*, Vineet Goyal†, Agathe Soret‡

Abstract
Assortment optimization is an important problem that arises in many industries such as retailing and online advertising, where the goal is to find a subset of products from a universe of substitutable products that maximizes the seller's expected revenue. One of the key challenges in this problem is to model the customer substitution behavior. Many parametric random utility maximization (RUM) based choice models have been considered in the literature. However, in all these models, the probability of purchase increases as we include more products in an assortment. This is not true in general, and in many settings more choices hurt sales; this is commonly referred to as choice overload. In this paper we attempt to address this limitation of RUM through a generalization of the Markov chain based choice model considered in [5]. As a special case, we show that our model reduces to a generalization of MNL with a no-purchase attraction that depends on the assortment S and is strictly increasing in the size of S. While we show that assortment optimization under this model is NP-hard, we present a fully polynomial-time approximation scheme (FPTAS) under reasonable assumptions.

Key words: assortment optimization, choice overload, choice model, Markov chain

* Industrial Engineering and Operations Research, Columbia University, New York, NY, [email protected]
† Industrial Engineering and Operations Research, Columbia University, New York, NY, [email protected]
‡ Industrial Engineering and Operations Research, Columbia University, New York, NY, [email protected]

1 Introduction
Assortment optimization problems arise in many practical applications such as retailing or online advertising. In such settings, a seller or decision maker has to select a subset of products from a universe of substitutable products to offer to customers in order to maximize the expected revenue. The demand for any item, and therefore the expected revenue, depends on the substitution behavior of the customers. Hence it is important to choose the "right choice model", which specifies the probability that a random customer decides to purchase a particular item offered in the set. The objectives are two-fold: first, determine or learn how customers choose and substitute among products, and second, develop algorithms to find an optimal assortment.

More specifically, suppose we are given a universe of n substitutable products, N = {1, ..., n}, with prices p_1, ..., p_n. Let π(i, S) denote the probability that a random buyer selects product i when the set of offered products is S ⊆ N. This probability depends on the substitution behavior of customers and is referred to as the choice probability. The assortment optimization problem, where the goal is to find the subset S of products that maximizes expected revenue, can be formulated as

max_{S ⊆ N} Σ_{i ∈ S} π(i, S) · p_i.

Many parametric choice models based on random utility maximization (RUM) have been studied in the literature to capture customer substitution behavior and the probability π(i, S). One of the most popular choice models is the Multinomial Logit (MNL) model. The MNL model was introduced independently by [16] and [20], and later studied by [17]. The probability that a customer purchases product i from assortment S is given by

π(i, S) = e^{u_i} / (Σ_{j ∈ S} e^{u_j} + e^{u_0}) · 1_{i ∈ S} =: v_i / (Σ_{j ∈ S} v_j + v_0) · 1_{i ∈ S},

where v_j = e^{u_j}. [23] show that both the estimation and the assortment optimization under this model are tractable.
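As a quick illustration (our own sketch, not code from the paper), the MNL choice probability above can be computed directly; the utilities `u` below are made-up example values:

```python
# Sketch (assumed illustrative values): MNL choice probability
# pi(i, S) = v_i / (sum_{j in S} v_j + v_0), with v_j = exp(u_j).
import math

def mnl_choice_prob(i, S, u, u0=0.0):
    """Probability of choosing product i from offer set S under MNL.

    u maps product -> utility; u0 is the no-purchase utility.
    Returns 0 for products not in S.
    """
    if i not in S:
        return 0.0
    v = {j: math.exp(u[j]) for j in S}
    v0 = math.exp(u0)
    return v[i] / (sum(v.values()) + v0)

# Example: two products with equal (made-up) utilities split demand equally.
u = {1: 1.0, 2: 1.0}
p1 = mnl_choice_prob(1, {1, 2}, u)
```

Note that under MNL the no-purchase probability v_0 / (Σ_{j ∈ S} v_j + v_0) can only decrease as S grows; this is exactly the regularity property the paper relaxes.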
Several algorithms are known to solve the assortment optimization problem under the MNL model ([23], [12], [7], and [14]). However, this model suffers from simplifying assumptions, such as the Independence of Irrelevant Alternatives (IIA) property (see [3]), which limit its applicability. Therefore, more complex choice models have been developed to capture a richer class of substitution behaviors, including the Nested Logit model ([27], [8]) and the mixture of Multinomial Logit models ([21]). We refer the reader to [4] for a comprehensive survey. In addition, ranking-based choice models of demand have also been studied in the literature (see [14], [11], [9], [15], [24]). [5] present a Markov chain based model where the substitutions are modeled via transitions in a Markov chain. This model was first considered in [28], and several variants have since been considered ([6] and [19]).

However, all standard random utility based models and distribution-over-rankings models suffer from two serious limitations in practice, namely, i) the models assume that customer preferences are static and exogenous to the set of products offered by the seller, and ii) the total probability of buying any product always increases (not necessarily strictly) if the seller adds more products to the assortment. In many settings, these properties are not satisfied. In particular, customers may form their preferences based on the offered set of products, and the purchase probability might decrease when the seller adds more products to the assortment. The latter is referred to as the choice overload phenomenon and has been observed empirically in practice (see [13] and [22]). None of the random utility models in their fundamental and standard form can capture this. However, at least in principle, some modifications of existing methods might be able to achieve that.
For example, one could potentially estimate a different logit or mixed logit model for each assortment offered to consumers, which would allow the preference parameters to vary with the assortment. This seems to be a flexible way of letting the estimated demand functions depend on the products offered and thus should be able to capture choice overload. But in practice, this approach is not viable if the number of observations per assortment is small or the number of products itself is large. Hence more parsimonious models are a better option. [25] propose a model that incorporates an explicit search cost for users and can capture choice overload in some settings. [26] also consider a choice model with endogenous network effects that captures dynamic preferences in some settings, mainly where the utility of a product for a customer depends on the number of customers interested in that product.

1.1 Our Contributions

The main goal of this paper is to develop a model for substitutions that addresses the above-mentioned limitations and leads to a more practical framework for choice modeling and assortment optimization. We propose a generalization of the Markov chain model introduced in [5], where we consider a Markovian comparison based choice process instead of one that is only based on Markovian substitutions. The Markov chain choice model presents a natural framework to address these limitations, as the substitution behavior of the customers is modeled in an intuitive manner, making it a natural candidate.
Our Model.
We assume that we are given a universe of n substitutable products indexed from 1 to n: N = {1, ..., n}. In our Markov chain based model, the customer substitutions are modeled using a Markov chain over (n + 1) states, N+ = {0, 1, ..., n}: there is one state for each of the n substitutable products and a state 0 for the no-purchase alternative. We further assume that the transition probability matrix for the underlying Markov chain is given by ((ρ_ij))_{i ∈ N, j ∈ N+}. A customer starts at a random state i and then does a random walk according to the transition probability matrix ρ. When the customer arrives at a state j corresponding to a product j in the offer set S, they either buy the product and stop the random walk with a probability µ(j, S), which depends on the other products in the offer set, or continue the random walk with the remaining probability according to the transition probability matrix until they reach either the state 0 (the no-purchase option) or any other state corresponding to a product in the offer set. We call this model the Generalized Markov chain choice model.

A key outcome of introducing such a stopping probability function is that this model is no longer an RUM model, and hence the purchase probability may decrease when we add more products to the offer set, which we demonstrate via examples.

Addressing Limitations. In particular, our model addresses the discussed limitations in the following way:
• Dynamic Preferences:
In practice, the substitution behavior of the customers in our model can be different for different assortments. More specifically, the implied distribution over preferences depends on the assortment offered, and there may not be any single distribution over preferences or rankings that is consistent with choices for all assortments. Our model captures this and, to the best of our knowledge, this is the first systematic approach to capture dynamic preferences.
• Choice Overload Phenomenon:
An important consequence of our model is capturing the choice overload phenomenon. In particular, the probability of purchase does not necessarily increase if the seller includes more products in the assortment. More specifically, consider assortments S, T ⊆ {1, . . . , n} such that S ⊊ T. Then it is not necessarily true that π(0, S) ≥ π(0, T), where π(0, S) denotes the probability of no purchase when the set of products offered is S. We present several examples illustrating this.

Generalized Multinomial Logit Model.
We consider the special case of the Generalized Markov chain model where the underlying Markov chain has rank one and call it the Generalized Multinomial Logit model. [5] show that the Multinomial Logit model can be exactly captured by a Markov chain model where the transition probability matrix has rank one. In particular, the choice probability under our model has the following expression:

π(i, S) = v_i / (v_0 e^{α Σ_{j ∈ S+} v_j} + Σ_{j ∈ S} v_j),

where α ≥ 0 is a scale parameter. We can see that the no-purchase weight is not constant but depends on the assortment through the term v_0 e^{α Σ_{j ∈ S+} v_j}, which increases the attraction of the no-purchase alternative when the offer set is enlarged with more products.

Assortment Optimization Under Our Model.
We study the assortment optimization problem under the Generalized Markov chain choice model. Through examples, we demonstrate that an optimal assortment in our model balances between too few and too many choices and favors cluster centers. We show that the assortment optimization is NP-hard in general by a reduction from the partition problem. On the positive side, we present a fully polynomial-time approximation scheme (FPTAS) for the assortment optimization problem when the transition probability matrix has either rank one or a constant rank more than one.

Our algorithm for the FPTAS is based on exploiting the structure of the choice probability expression and, consequently, of the expected revenue function. In particular, we show that the choice probabilities for a given assortment can be calculated via a system of linear equations. While the optimization problem of revenue maximization is non-convex, we show that we can obtain a convex approximation of the objective function by guessing the values of a small number of linear functions for the optimal assortment. Our algorithm is a dynamic programming based algorithm that adapts ideas from the dynamic programming algorithm for the knapsack problem.
Outline
The rest of the paper is organized as follows. In Section 2, we present the Generalized Markov Chain model and our notation. In Section 3, we present choice probability computations and discuss some examples. In Section 4, we present the properties of the Generalized Multinomial Logit model, a special case of the previous model, along with an algorithm for its parameter estimation. Section 5 discusses the assortment optimization under the Generalized MNL model and shows that it is NP-hard. We also present a fully polynomial-time approximation scheme (FPTAS) for the assortment optimization problem. In Section 6, we present results on another case of the Generalized Markov chain model, where the initial transition matrix is of low rank, and show that this generalizes the mixture of MNLs model. These results allow us to present an FPTAS for this model as well. Finally, we include some numerical results on real-life data in Section 7 and conclude in Section 8.
2 Generalized Markov Chain Model

In this section we present the Generalized Markov Chain Model in detail.
We model the customer substitution behavior using transitions on a Markov chain on (n + 1) states, where there is a state corresponding to each product and a state 0 for the no-purchase alternative. We first describe the Markov chain choice model introduced in [5]. Let S ⊆ N be a subset of offered products, and let S+ := S ∪ {0}. The model is specified by the parameters λ_i, i ∈ [n], and ρ_ij, for all i ∈ [n] and j ∈ {0, ..., n}.

• λ_i is the arrival probability at state i: a customer starts at product i with probability λ_i,
• ρ_ij denotes the transition probability from state i to state j if product i is unavailable.

In the model introduced in [5], when S is offered, all the states corresponding to S are absorbing. If the random walk of any customer reaches state i ∈ S+, then they select product i with probability one, regardless of what else is being offered. This is again an RUM model and hence suffers from the limitations of choice overload and dynamic preferences as discussed previously.

In our model, we use the Markovian framework as above to model substitution behavior. However, we introduce a stopping probability function µ(i, S).

Stopping probability function.
For any i ∈ S, µ(i, S) denotes the probability that a customer selects product i, given that they are currently in state i of the Markov chain. In the model considered by [5], this probability is equal to 1. In this paper, we aim to capture the following fundamental component of customer choice, namely, that customer preferences and the eventual selection depend on comparisons among the offered products. To capture this behavior, we model µ(i, S) as a decreasing function of Σ_{j ∈ S} ρ_ij and consider the following formulation:

µ(i, S) = exp(−α Σ_{j ∈ N+} ρ_ij x_j), ∀ i ∈ S,

where x_j = 1 if j ∈ S+ and 0 otherwise. Also, we have µ(i, S) = 0 for all i ∉ S. Note that if a large number of products similar to i (i.e., with large ρ_ij) are offered in the assortment, then the stopping probability is small. This reflects the scenario that it is difficult to select a product if a large number of similar options are available. Similarly, if we include more products in the assortment, µ(i, S) decreases. This models the fact that customers need more comparisons and time (transitions) to select the best product. This can be interpreted as the higher search cost of finding the best product when many similar items are offered.

We refer to this model as the Generalized Markov Chain Model. Here α ≥ 0 is a scale parameter that amplifies the comparison effect; a large value of α implies a very picky customer. We would like to note that our model generalizes the model introduced in [5]. In particular, we recover the model in [5] by setting α = 0. Note that the choice of the exponential function is arbitrary; we use it as a parsimonious choice to model the stopping probability.

To model the eventual choice of product i by the customer, we add a vertex i′, which is an absorbing state.
A directed edge joins the vertex i to the vertex i′ with weight µ(i, S), which represents the probability of buying product i when the customer is at vertex i. This probability is equal to 0 if and only if the product is not offered, i.e., i ∉ S. We denote by N′+ the set of absorbing states {i′}_{i ∈ [n]} ∪ {0}. After a certain time t, for t large enough, the customer is either in a state i′, with i ∈ S, or in the no-purchase state 0.

Modified transition probabilities.
Since the sum of the probabilities of leaving state i has to be equal to 1, we change ρ_ij to ρ̄_ij defined as follows: ρ̄_ij = (1 − µ(i, S)) ρ_ij.

Customer substitution behavior on the new graph.
Let us summarize how a customer behaves on the new graph, given a set of products S ⊆ N to sell under the Generalized Markov Chain model:

• The customer arrives with probability λ_i at the vertex i.
• If product i is in S, then the customer, currently at vertex i,
  – either selects it with probability µ(i, S), arrives at the vertex i′ and then stops,
  – or goes to another vertex j with probability ρ̄_ij.
• If i ∉ S, the customer cannot purchase i, so with probability ρ̄_ij they go to another vertex j.
• If i = 0, then the customer has decided not to purchase any product, and they stop.

We then proceed recursively.

Example.
We consider the following 4-vertex graph (see Figure 1), where we have chosen to offer the subset S = {3, 4}. Each product i has a state i as well as a state i′, where the latter is the absorbing state denoting that the customer buys product i.

Figure 1: Example of a 4-vertex graph with S = {3, 4}.

We can see that since the offered set contains products 3 and 4, the transition probabilities of going from node 3 or node 4 to any other node get modified, while the other transition probabilities (i.e., from nodes 1 and 2) are unchanged. Firstly, nodes 3 and 4 are no longer absorbing, and the probabilities of going to nodes 3′ and 4′, which are now absorbing, are µ(3, S) and µ(4, S) respectively. Secondly, this also changes the transition probabilities from nodes 3 and 4 to other nodes accordingly.

3 Choice Probability Computation

Given the parameters λ_i, ρ̄_ij and µ(i, S) for all i ∈ N and j ∈ N+, we can compute the choice probabilities for any S ⊆ N in a very similar way as [5]. We assume that a customer arrives at state i with probability λ_i and continues to transition according to the probabilities ρ̄_ij until they either decide to buy a product i, with probability µ(i, S) when they are at vertex i, or decide not to buy any product and end at the no-purchase vertex 0. We therefore assume that any customer buys at most one product.

Let ρ(N, N) be the transition probability matrix from states N to N. We recall that since the total probability of exiting a vertex i is 1, and since there is a probability µ(i, S) of buying the product represented by vertex i, the transition probabilities are given by ρ̄_ij = (1 − µ(i, S)) ρ_ij. Consequently,

ρ(N, N) =
Diag((1 − µ(i, S))) × ρ,

where ρ is the initial transition probability matrix, ρ = (ρ_ij)_{i,j ∈ [n]}, and Diag((1 − µ(i, S))) is the diagonal matrix with (1 − µ(i, S)) on its diagonal. Recall also that µ(i, S) = 0 for all i ∉ S.

After a certain time, every customer will be in an absorbing state. In order to compute π(i, S), we have to know the probability that a customer arrives at the vertex i′. For i ∈ [n] we have:

π(i, S) = lim_{q → ∞} λ^T (P(S))^q e_i,

where, with a slight abuse of notation, λ is the arrival distribution extended with zeros on the absorbing states, and P(S) is the transition probability matrix of the graph when the subset S is offered, written in block form over the states N′+ followed by N:

P(S) = [ ρ(N′+, N′+)  ρ(N′+, N) ; ρ(N, N′+)  ρ(N, N) ] = [ I_{n+1}  0 ; Π(S)  Diag((1 − µ(i, S))) ρ ],

and Π(S) = ρ(N, N′+) is the n × (n + 1) matrix whose column for the absorbing state i′ has the single nonzero entry µ(i, S) in row i, and whose column for the no-purchase state 0 is (ρ̄_{10}, ..., ρ̄_{n0})^T:

Π(S) = [ Diag(µ(i, S))  ρ̄_0 ], where ρ̄_0 = ((1 − µ(i, S)) ρ_{i0})_{i ∈ [n]}.

Here e_i denotes the indicator vector with a 1 on the component corresponding to the absorbing state i′ and 0 elsewhere. Note that ρ(N′+, N′+) = I_{n+1} because all the states in N′+ are absorbing, which also implies that ρ(N′+, N) = 0.

For q ∈ ℕ, we have:

P(S)^q = [ I_{n+1}  0 ; Σ_{k=0}^{q−1} (Diag((1 − µ(i, S))) ρ)^k Π(S)  (Diag((1 − µ(i, S))) ρ)^q ].

Therefore, if we assume that the spectral radius of ρ(N, N) =
Diag((1 − µ(i, S))) ρ is strictly less than 1:

lim_{q → ∞} P(S)^q = [ I_{n+1}  0 ; (I_n − Diag((1 − µ(i, S))) ρ)^{−1} Π(S)  0 ],

and hence

π(i, S) = λ^T (I_n − Diag((1 − µ(i, S))) ρ)^{−1} Π(S) e_i,

where e_i ∈ {0, 1}^{n+1} is the indicator vector of the absorbing state i′. Lastly, if we want the probability of no purchase, we can simply compute

π(0, S) = λ^T (I_n − Diag((1 − µ(i, S))) ρ)^{−1} Π(S) e_0,

where e_0 = (0, ..., 0, 1)^T ∈ {0, 1}^{n+1} is the indicator vector of the no-purchase state.

We now provide a couple of examples to show that our model is better at capturing the choice overload phenomenon than other random utility based choice models.
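To make this computation concrete, here is a small self-contained sketch (our own illustration, not the authors' code; the transition matrix and arrival probabilities are made up) that evaluates the choice probabilities by solving the linear system above:

```python
# Sketch (assumed setup): choice probabilities under the Generalized
# Markov chain model via pi = lambda^T (I - Diag(1-mu) rho)^{-1} Pi(S).
import numpy as np

def choice_probs(lam, rho, S, alpha):
    """Choice probabilities under the Generalized Markov chain model.

    lam  : arrival probabilities over product states 1..n (length n)
    rho  : n x (n+1) transition matrix; column 0 is the no-purchase state,
           columns 1..n are the product states
    S    : offer set, a subset of {1, ..., n}
    alpha: scale parameter amplifying the comparison effect
    Returns (pi, pi0): purchase probabilities for products 1..n and the
    no-purchase probability.
    """
    n = len(lam)
    x = np.zeros(n + 1)
    x[0] = 1.0                       # state 0 always belongs to S+
    for j in S:
        x[j] = 1.0
    # Stopping probabilities mu(i, S) = exp(-alpha sum_j rho_ij x_j), i in S.
    mu = np.array([np.exp(-alpha * rho[i - 1] @ x) if i in S else 0.0
                   for i in range(1, n + 1)])
    A = (1.0 - mu)[:, None] * rho[:, 1:]        # Diag(1-mu) * rho(N, N)
    fundamental = np.linalg.inv(np.eye(n) - A)  # (I - Diag(1-mu) rho)^{-1}
    visits = lam @ fundamental                  # expected visits per state
    pi = visits * mu                            # absorbed into i' with mu(i,S)
    pi0 = visits @ ((1.0 - mu) * rho[:, 0])     # absorbed into state 0
    return pi, pi0

# A small homogeneous instance: n = 3, all transitions equal to 1/4.
n = 3
rho = np.full((n, n + 1), 1.0 / (n + 1))
lam = np.full(n, 1.0 / n)
pi, pi0 = choice_probs(lam, rho, S={1, 2}, alpha=1.0)
```

Since the chain is absorbing, the purchase probabilities and the no-purchase probability sum to one, and products outside the offer set get probability zero.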
Example 1 (Homogeneous Graph). We first consider a complete graph on n vertices with homogeneous transition probabilities, ρ_ij = 1/(n + 1) for all i ∈ N and j ∈ N+, and also homogeneous arrival probabilities, λ_i = 1/(n + 1) for all i ∈ N+. We suppose that all products have the same price p. The symmetry of this example implies that we only need to find the number k of vertices to offer that maximizes revenue, and then take a random subset of k elements. For any random utility based choice model presented in the first section, the optimal set that maximizes revenue is the entire set of products. Indeed, in all these models, the product with the highest price is always in the optimal assortment; since all the products have the same price, they will all be in the set. Furthermore, the probability of no purchase always decreases when we add more products to the offered set. However, this is not true in our model, where the probability of no purchase depends on the parameter α. We have chosen a universe of n = 15 substitutable products and, given the assumptions above, we compute the probability of no purchase under our model when k products are offered, for all k ∈ [n]; the results are shown in Figure 2. We see that, depending on the value of α, the probability of no purchase may increase when we add more items to the offered assortment. In particular, a larger α implies that the probability of no purchase starts increasing at a smaller number of offered products.

We present another example, the star graph, which shows that our model favours cluster centers as items for the assortment, unlike the Markov chain model.

Figure 2: Probability of no-purchase as a function of the number of products in the offered set, for α = 1, α = 2, and α = 10.
Example 2 (Star Graph). We consider the following star graph on n vertices. We suppose that vertex 1 is linked to all other vertices in the graph, but every other vertex is linked only to vertex 1 and to the no-purchase vertex 0. We suppose that the transition probabilities are homogeneous: ρ_{1i} = 1/n for all i ∈ N+ \ {1}, and ρ_{i1} = ρ_{i0} = 1/2 for all i ∈ N \ {1}. We suppose that the arrival probabilities are all equal, λ_i = 1/n for all i ∈ N. Finally, we suppose that all products except product 1 have price P, and product 1 has a smaller price p < P.

For any random utility based choice model considered in the literature, since the product with the highest price is always in the optimal offered set, we have {2, ..., n} ⊆ S*, where S* is the optimal set. And this is true for any n, even very large n. Under our model, however, selling only product 1 gives a higher revenue when α is large enough: for α sufficiently large, the optimal set is always {1}. This result is closer to reality. Indeed, it seems more logical in practice for the seller to offer only the product that is similar to many other products, even if this product is slightly less expensive than the others.

In order to get a generalization of the MNL model, we now suppose that the initial transition probability matrix ρ is of rank one, and show that even under this assumption, the optimization problem is NP-hard.

4 Generalized Multinomial Logit Model
In this section we suppose that the transition probability matrix ρ is of rank one. Under this assumption, we refer to our model as the Generalized Multinomial Logit model, a special case of the Generalized Markov chain model. Recall that N denotes the set of product vertices, N+ is N together with the vertex 0 representing the no-purchase option, and ρ(N, N+) is the transition probability matrix from N to N+. In this model, we suppose that there exists v = (v_0, v_1, ..., v_n) ∈ [0, 1]^{n+1} with Σ_{i=0}^n v_i = 1 such that ρ_ij = v_j for all i ∈ N, j ∈ N+; the modified transition matrix is then

ρ̄(N, N+) = Diag((1 − µ(i, S))) · 1 v^T,

where 1 is the all-ones column vector, so that row i of ρ̄(N, N+) equals (1 − µ(i, S)) (v_0, v_1, ..., v_n). Since there is a probability µ(i, S) that the customer goes from vertex i to vertex i′, the probability that the customer goes from vertex i to vertex j ∈ N+ has to be (1 − µ(i, S)) v_j, so that the total probability of exiting vertex i is 1. We also suppose in this model that λ = v, so the probability of arriving at a vertex i is proportional to v_i. Finally, we suppose that the probability of buying product i while at vertex i is given by

µ(i, S) := e^{−α Σ_{j ∈ S+} v_j} for i ∈ S,

and µ(i, S) = 0 for all i ∉ S. With the notation of Section 3.1:

P(S) = [ ρ(N′+, N′+)  ρ(N′+, N) ; ρ(N, N′+)  ρ(N, N) ] = [ I_{n+1}  0 ; Π(S)  D(S) ρ_v ],

where D(S) = Diag((1 − µ(i, S))), ρ_v = 1 (v_1, ..., v_n) is the rank-one matrix whose every row equals (v_1, ..., v_n), and

Π(S) = [ Diag(µ(i, S))  ((1 − µ(i, S)) v_0)_{i ∈ [n]} ],

i.e., the column of Π(S) for the absorbing state i′ has the single entry µ(i, S) = e^{−α Σ_{j ∈ S+} v_j} 1_{i ∈ S} in row i, and its column for the no-purchase state has entries (1 − µ(i, S)) v_0. In particular, for i ∈ S the corresponding row of D(S) ρ_v is (1 − e^{−α Σ_{j ∈ S+} v_j}) (v_1, ..., v_n).

We summarize here the assumptions of the Generalized Multinomial Logit model:
Generalized Multinomial Logit Model
In this model we make the following assumptions:

• the initial transition probability matrix within N+ is of rank one, i.e., there exists v = (v_i)_{i ∈ N+} ∈ [0, 1]^{n+1} with Σ_{j ∈ N} v_j + v_0 = 1 such that ρ(N, N) = Diag((1 − µ(i, S))) · 1 (v_1, ..., v_n),
• given a subset S ⊆ N, for all i ∈ S we have µ(i, S) = e^{−α Σ_{j ∈ S+} v_j} (and µ(i, S) = 0 for i ∉ S),
• for all j ∈ N+, λ_j = v_j.

As we show in Section 3.1, the assortment optimization problem under our model is given by:

max_{S ⊆ N} v^T (I_n − D(S) ρ_v)^{−1} Π(S) p.

We now give an exact formulation of the choice probability and see why it generalizes the MNL model.
We can compute the probability of choosing a product i in our model as follows.

Lemma 4.1.
The probability of purchasing a product i given a chosen subset S ⊆ N under the Generalized Multinomial Logit model is given by:

π(i, S) = v_i / (Σ_{k ∈ S} v_k + v_0 e^{α Σ_{j ∈ S+} v_j}) · 1_{i ∈ S}.

The proof follows from Section 3.1 applied to this particular case and is presented in detail in Appendix 9.1.
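Lemma 4.1 can be sanity-checked numerically against the absorbing-chain computation of Section 3.1. The sketch below is our own illustration (the vector `v` is made up), comparing the closed form with the matrix formula under the rank-one assumption:

```python
# Numerical check of the Lemma 4.1 closed form (assumed illustrative data).
import numpy as np

v = np.array([0.2, 0.3, 0.25, 0.15, 0.1])   # (v_0, v_1, ..., v_4), sums to 1
alpha, S = 2.0, {1, 3}
n = len(v) - 1

sS = v[0] + sum(v[j] for j in S)            # sum_{j in S+} v_j
mu = np.array([np.exp(-alpha * sS) if i in S else 0.0
               for i in range(1, n + 1)])   # stopping probabilities

# Absorbing-chain computation: lambda = (v_1, ..., v_n), rank-one rho.
A = (1.0 - mu)[:, None] * np.tile(v[1:], (n, 1))   # Diag(1-mu) * 1 v^T
pi_matrix = (v[1:] @ np.linalg.inv(np.eye(n) - A)) * mu

# Closed form from Lemma 4.1.
denom = sum(v[k] for k in S) + v[0] * np.exp(alpha * sS)
pi_closed = np.array([v[i] / denom if i in S else 0.0
                      for i in range(1, n + 1)])
```

The two computations agree; the residual probability v_0 e^{α Σ_{j ∈ S+} v_j} / denom is exactly the no-purchase probability, so the distribution sums to one.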
Our model is a generalization of the MNL model. We recall that under the MNL model, the probability of buying product i ∈ S when the set S is offered is given by

π_MNL(i, S) = v_i / (Σ_{j ∈ S} v_j + v_0) · 1_{i ∈ S}.

Therefore, our model can be considered a generalization of the MNL model where the no-purchase weight is not constant, as it is in MNL, but depends on the assortment S through v_0 e^{α Σ_{j ∈ S+} v_j}, which increases the attraction of the no-purchase alternative compared to the MNL model.

Suppose that, instead of µ(i, S) = e^{−α Σ_{j ∈ S+} v_j}, we had chosen a different function, say µ(i, S) = 1/(Σ_{j ∈ S+} v_j), which would also convey the idea that µ(i, S) is a decreasing function of Σ_{j ∈ S+} v_j. Then the probability of purchasing product i given a set S of offered products would have been

π̄(i, S) = v_i / (Σ_{j ∈ S} v_j + v_0 Σ_{j ∈ S+} v_j) = v_i / ((1 + v_0) Σ_{j ∈ S} v_j + v_0^2),

which, up to a rescaling, is exactly the choice probability of the MNL model. Therefore, such a function would have given a nesting-by-price structure and an optimization problem solvable in polynomial time, just as for MNL. However, this model would not have put sufficient weight from the v_j's, j ∈ S, on the no-purchase option, which is what we want to capture, namely the choice overload phenomenon. This is why we emphasize the importance of the choice of the function µ(i, S) in our model, and why the choice e^{−α Σ_{j ∈ S+} v_j} meets the requirements of our model.

Let us revisit the example of a homogeneous Markov chain from Section 3.2 in this context.
Example (Homogeneous Graph). We recall the assumptions of this example. Consider the complete graph on n vertices with homogeneous transition probabilities, ρ_ij = 1/(n + 1) for all i ∈ N and j ∈ N+, and homogeneous arrival probabilities, λ_i = 1/(n + 1) for all i ∈ N+. We suppose that all the products have the same price p. For any random utility based choice model presented in the first section, the optimal set that maximizes revenue is the entire set of products, as explained before. This example is a particular case where the initial transition probability matrix is of rank one. Therefore, under the Generalized MNL model, the assortment optimization problem for the homogeneous graph is:

max_{S ⊆ N} (|S|/(n+1)) p / (|S|/(n+1) + (1/(n+1)) e^{α (|S|+1)/(n+1)}) = max_{k ∈ [n]} k p / (k + e^{α (k+1)/(n+1)}).

A simple computation shows that the optimal (continuous) number of products in the offered set is k* = (n+1)/α. Therefore, if α < (n+1)/n, then the optimal assortment is the entire universe. However, if we take α large enough that (n+1)/α ≤ n − 1, there will be fewer products in the optimal offered set. This also highlights the meaning of α: it amplifies the comparison effect. A large value of α implies a risk-averse customer, and therefore a strategy where the seller should offer fewer products.
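The optimization above over the integer k can be carried out directly (a sketch we wrote for illustration, using the n = 15 instance from Example 1 and p = 1):

```python
# Sketch reproducing the homogeneous-graph computation: maximize
# k*p / (k + exp(alpha*(k+1)/(n+1))) over k, versus k* = (n+1)/alpha.
import math

def revenue(k, n, p, alpha):
    """Expected revenue when k of the n identical products are offered."""
    return k * p / (k + math.exp(alpha * (k + 1) / (n + 1)))

n, p, alpha = 15, 1.0, 2.0
best_k = max(range(1, n + 1), key=lambda k: revenue(k, n, p, alpha))
# The continuous optimum is k* = (n+1)/alpha = 8, and indeed best_k == 8.
```

With α = 1 < (n+1)/n, the same search returns the full universe k = n, matching the discussion above.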
Parameter Estimation. Recall that the choice probabilities for the Generalized MNL model are given by:

π(i, S) = v_i / (v_0 e^{α Σ_{j ∈ S+} v_j} + Σ_{k ∈ S} v_k), ∀ i ∈ S,
π(0, S) = v_0 e^{α Σ_{j ∈ S+} v_j} / (v_0 e^{α Σ_{j ∈ S+} v_j} + Σ_{k ∈ S} v_k),

where v_j = e^{β^T x_j}. Given a choice dataset D = {(j_t, S_t)}_{t=1}^T, where S_t is the assortment offered at time t and j_t is the choice made at time t (which could be the outside option of no purchase), the log-likelihood can be formulated as:

ℓ(D, β, α) = Σ_{t ∉ D_0} β^T x_{j_t} + Σ_{t ∈ D_0} (β^T x_0 + α Σ_{j ∈ S_t+} e^{β^T x_j}) − Σ_{t=1}^T log(e^{β^T x_0} e^{α Σ_{j ∈ S_t+} e^{β^T x_j}} + Σ_{k ∈ S_t} e^{β^T x_k}),

where D_0 is defined as the subset of D where there was no purchase. While ℓ(β, α) is not jointly concave in (β, α), we present an alternating algorithm based on searching over α. In particular, we have the following lemma.

Lemma 4.2.
For a given value of α, the maximization problem over β can be reformulated as a convex optimization problem.

Proof. The partial maximization problem over β when α is known is the following:

max_β Σ_{t ∉ D_0} β^T x_{j_t} + Σ_{t ∈ D_0} (β^T x_0 + α Σ_{j ∈ S_t+} e^{β^T x_j}) − Σ_{t=1}^T log(e^{β^T x_0} e^{α Σ_{j ∈ S_t+} e^{β^T x_j}} + Σ_{k ∈ S_t} e^{β^T x_k}).

We introduce the new variables

z_t = β^T x_0 + α Σ_{j ∈ S_t+} e^{β^T x_j}, t = 1, ..., T.

Then we can rewrite the above maximization as:

max_{β, z_t} Σ_{t ∉ D_0} β^T x_{j_t} + Σ_{t ∈ D_0} z_t − Σ_{t=1}^T log(e^{z_t} + Σ_{k ∈ S_t} e^{β^T x_k})   (1)
s.t. β^T x_0 + α Σ_{j ∈ S_t+} e^{β^T x_j} − z_t = 0, t = 1, ..., T.

The objective function is now jointly concave in (β, z_t), as it is a sum of linear functions of β and z_t and the negative of a log-sum-exp function. Also, the equality constraints are convex functions of (β, z_t). Hence we can solve the optimization problem in (1) efficiently. We also note that we introduce only T new variables and constraints. □

Lemma 4.3.
For a given value of $\beta$, the log-likelihood function is strictly concave in $\alpha$ and hence unimodal, so the maximization problem over $\alpha$ can be solved efficiently.

Proof. For a given value of $\beta$, the partial maximization problem over $\alpha$ is given by
$$\max_\alpha \; \sum_{t \in \mathcal{D}_0} \alpha \sum_{j \in S_t^+} v_j - \sum_{t=1}^T \log \Big( v_0 e^{\alpha \sum_{j \in S_t^+} v_j} + \sum_{k \in S_t} v_k \Big).$$
Defining $c_t := \sum_{k \in S_t} v_k = \sum_{k \in S_t} e^{\beta^\top x_k}$, we can rewrite this as
$$\max_\alpha \; \sum_{t \in \mathcal{D}_0} \alpha (v_0 + c_t) - \sum_{t=1}^T \log \Big( v_0 e^{\alpha (v_0 + c_t)} + c_t \Big). \quad (2)$$
A simple derivative calculation shows that the above function is strictly concave in $\alpha$. Hence the maximization problem in (2) is also easy to solve. □

In accordance with the above results, an iterative algorithm for maximizing the log-likelihood is to keep maximizing over $\alpha$ and $\beta$ alternately until convergence. This leads to a local maximum. Since this type of alternating maximization algorithm depends on the initial point, a good initial point is $\beta^{\mathrm{MNL}}_{\mathrm{MLE}}$, the maximum likelihood estimate for the MNL model (which can be found easily, as the MLE for the MNL model is a convex optimization problem). The details are given in Algorithm 1. We found Algorithm 1 to converge after only a few iterations, as evident from Figure 3.
Algorithm 1 Parameter Estimation for the Generalized Multinomial Logit model

procedure ParamEstGenMNL($\mathcal{D}$)
    Let $\beta^{\mathrm{MNL}}_{\mathrm{MLE}}$ be the maximum likelihood estimate from the given data $\mathcal{D}$ for the MNL model
    Set $\beta^{(0)} = \beta^{\mathrm{MNL}}_{\mathrm{MLE}}$
    Set $\alpha^{(0)} = 0$
    for $k = 1, 2, \ldots$ do
        Solve the optimization problem in (2) with $\beta = \beta^{(k-1)}$ and set $\alpha^{(k)}$ to be the optimal solution
        Solve the optimization problem in (1) with $\alpha = \alpha^{(k)}$ and set $\beta^{(k)}$ to be the optimal solution
        Stop when convergence is achieved
    end for
    Let $K$ be the index at the end of the for loop once convergence is achieved
    Set $\beta^{\mathrm{GMNL}}_{\mathrm{MLE}} = \beta^{(K)}$ and $\alpha^{\mathrm{GMNL}}_{\mathrm{MLE}} = \alpha^{(K)}$
    return $\beta^{\mathrm{GMNL}}_{\mathrm{MLE}}$ and $\alpha^{\mathrm{GMNL}}_{\mathrm{MLE}}$ as the estimates
end procedure
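In code, the alternating scheme of Algorithm 1 looks as follows. This is a minimal sketch on synthetic data, not the authors' implementation: it initializes $\beta$ at zero rather than the MNL MLE, and solves the two partial maximizations with off-the-shelf `scipy.optimize` routines (the $\alpha$-step uses bounded scalar minimization, justified by the unimodality in Lemma 4.3) instead of the explicit convex reformulation in (1). All variable names, the data layout, and the bound on $\alpha$ are our own assumptions.

```python
import numpy as np
from scipy.optimize import minimize, minimize_scalar

rng = np.random.default_rng(0)
d, n, T = 3, 5, 40
X = rng.normal(size=(n + 1, d))                     # feature rows; row 0 = outside option x_0
assortments = [rng.choice(np.arange(1, n + 1), size=3, replace=False) for _ in range(T)]
choices = [int(rng.choice(np.append(S, 0))) for S in assortments]   # 0 = no purchase

def neg_loglik(beta, alpha):
    """Negative log-likelihood of the Generalized MNL model (computed in log-space)."""
    v = np.exp(X @ beta)                            # v_j = exp(beta^T x_j)
    nll = 0.0
    for S, j in zip(assortments, choices):
        vS = v[S].sum()
        log_w0 = np.log(v[0]) + alpha * (v[0] + vS)     # log of inflated no-purchase weight
        log_num = log_w0 if j == 0 else np.log(v[j])
        nll -= log_num - np.logaddexp(log_w0, np.log(vS))
    return nll

beta, alpha = np.zeros(d), 0.0                      # the paper starts beta at the MNL MLE
for _ in range(10):                                 # alternate the two partial maximizations
    alpha = minimize_scalar(lambda a: neg_loglik(beta, a),
                            bounds=(0.0, 5.0), method="bounded").x
    beta = minimize(lambda b: neg_loglik(b, alpha), beta, method="BFGS").x
```

Each pass cannot decrease the likelihood, so the loop converges to a local maximum, as discussed above.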
Assortment Optimization under the Generalized MNL Model

In this section, we consider the assortment optimization problem under the Generalized MNL model. Unlike the MNL model, even unconstrained assortment optimization under the Generalized MNL model is NP-hard. Under the Generalized Multinomial Logit model, using the expression of the choice probability derived in Lemma 4.1, the assortment optimization problem can be written as
$$\max_{S \subseteq N} R(S) := \max_{S \subseteq N} \frac{\sum_{i \in S} v_i p_i}{\sum_{i \in S} v_i + v_0 e^{\alpha \sum_{i \in S^+} v_i}}. \quad (3)$$

Figure 3: Estimate of $\alpha$ vs. number of iterations for the estimation algorithm. The feature vector was 4-dimensional and we had 10 products, i.e., $d = 4$, $n = 10$. We can see convergence after a few iterations of the alternating maximization algorithm.

In particular, we prove the following result.
Theorem 5.1.
The assortment optimization problem under the Generalized MNL model in (3) is NP-hard.
We use a reduction from the partition problem to prove this result; the details of the proof are presented in Appendix 9.2.

Given that the assortment optimization problem in (3) is NP-hard, we can only hope for an approximation. We present the best possible approximation in the form of a fully polynomial time approximation scheme (FPTAS). Our algorithm for the FPTAS is based on the structure of the revenue function, which depends on a single linear function of the assortment $S$, namely
$$V(S) = \sum_{j \in S} v_j.$$
In particular, for any assortment $S \subseteq [n]$, the denominator of $R(S)$ is completely determined by $V(S)$. Hence if we can guess the value of $V(S^*)$ corresponding to an optimal assortment $S^*$, and find an assortment $S$ with $V(S) \approx V(S^*)$, we can use a dynamic programming based algorithm similar to the one for the knapsack problem to construct such an approximately optimal assortment. This technique has been used in the past (see [18]). One of the most closely related works is [10], which presents algorithms for constrained assortment optimization under many parametric models where the revenue function satisfies this linear structural property.

We now present a fully polynomial time approximation scheme (FPTAS) for the assortment optimization under the Generalized Multinomial Logit model discussed in (3). For the FPTAS, we first consider different guesses for $V(S^*)$ in increasing powers of $(1+\epsilon)$. Let $v$ (resp. $V$) be the minimum (resp. maximum) value of the transition probabilities. We can assume that $v > 0$. For any given $\epsilon > 0$, we use the following set of guesses for $V(S^*)$:
$$\mathcal{V}_\epsilon = \{ v(1+\epsilon)^l,\; l = 0, \ldots, L \}, \quad \text{where } L = O(\log(nV/v)/\epsilon).$$
Hence, the number of guesses is polynomial in the number of products and $1/\epsilon$. Then for each guess $h \in \mathcal{V}_\epsilon$, we consider discretized values of the $v_j$ and try to construct an assortment $S$ such that
$$h(1-\epsilon) \le V(S) \le h(1+\epsilon),$$
using a knapsack-like dynamic program. In particular, for a given guess $h \in \mathcal{V}_\epsilon$, we try to find the best revenue possible with
$$\sum_{j \in S} v_j \le h,$$
by a dynamic program. We consider the following discretized values of the $v_j$ in multiples of $\epsilon h/n$:
$$\bar{v}_j = \left\lceil \frac{v_j}{\epsilon h/n} \right\rceil \quad \forall j \in N.$$
Let $I = \lceil n/\epsilon \rceil + n$. For each $(i,k) \in [I] \times [n]$, let $R(i,k)$ be the maximum revenue of any subset $S \subseteq \{1, \ldots, k\}$ such that
$$\sum_{j \in S} \bar{v}_j \le i.$$
We compute $R(i,k)$ using the following dynamic program:
$$R(i,1) = \begin{cases} v_1 p_1 & \text{if } \bar{v}_1 \le i, \\ 0 & \text{if } i \ge 0, \\ -\infty & \text{otherwise,} \end{cases}$$
$$R(i, k+1) = \max \{\, v_{k+1} p_{k+1} + R(i - \bar{v}_{k+1}, k),\; R(i,k) \,\}.$$
Let $S_h$ be the subset corresponding to $R(I,n)$, that is, the assortment that maximizes the sum $\sum_{j \in S} v_j p_j$ subject to $\sum_{j \in S} \bar{v}_j \le I$. We then construct the set of candidate assortments $S_h$ for all guesses $h$, and return the best revenue that we get from all the candidates in the set. Algorithm 2 presents the details of the FPTAS.

Algorithm 2
FPTAS for the Generalized Multinomial Logit model

procedure FPTASGenMNL($\epsilon$, $v$)
    for $h \in \mathcal{V}_\epsilon$ do
        Compute the discretized coefficients $\bar{v}_j = \lceil v_j / (\epsilon h / n) \rceil$
        Compute $R(i,k)$ for all $(i,k) \in [I] \times [n]$ using the dynamic program above
        Let $S_h$ be the subset corresponding to $R(I,n)$
    end for
    Let $\mathcal{C} = \cup_{h \in \mathcal{V}_\epsilon} S_h$
    return the set $S^* \in \mathcal{C}$ that has the best revenue
end procedure

Theorem 5.2. Algorithm 2 returns a $(1 - O(\epsilon))$-optimal solution to the assortment optimization problem in (3). The running time is $O\big( \frac{n^2}{\epsilon^2} \log(nV/v) \big)$.

We present the complete proof in Appendix 9.3.
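A compact sketch of the FPTAS follows; this is our own illustrative code, not the paper's implementation. For each guess $h$ of $V(S^*)$ it rounds the weights, runs the knapsack-style dynamic program, and keeps the candidate assortment with the best exact revenue; on a small made-up instance the result can be compared against brute-force enumeration.

```python
import itertools
import math
import numpy as np

rng = np.random.default_rng(1)
n, alpha, eps = 6, 1.0, 0.1
v = rng.uniform(0.1, 1.0, size=n)          # product weights v_1..v_n (0-indexed here)
v0 = 0.2                                   # no-purchase weight v_0
p = rng.uniform(1.0, 5.0, size=n)          # prices

def revenue(S):
    """Expected revenue R(S) of the Generalized MNL model, cf. (3)."""
    S = list(S)
    if not S:
        return 0.0
    VS = v[S].sum()
    return float((v[S] * p[S]).sum() / (VS + v0 * math.exp(alpha * (v0 + VS))))

def fptas():
    best = ()
    L = math.ceil(math.log(n * v.max() / v.min()) / math.log1p(eps))
    for l in range(L + 1):
        h = v.min() * (1.0 + eps) ** l                 # guess for V(S*)
        vbar = np.ceil(v / (eps * h / n)).astype(int)  # rounded weights
        I = math.ceil(n / eps) + n
        # dp[i] = (max sum of v_j p_j over subsets with rounded weight <= i, subset)
        dp = [(0.0, ())] * (I + 1)
        for k in range(n):                             # 0/1-knapsack pass over products
            for i in range(I, vbar[k] - 1, -1):
                cand = dp[i - vbar[k]][0] + v[k] * p[k]
                if cand > dp[i][0]:
                    dp[i] = (cand, dp[i - vbar[k]][1] + (k,))
        S_h = dp[I][1]                                 # candidate for this guess
        if revenue(S_h) > revenue(best):
            best = S_h
    return best

S_fptas = fptas()
S_opt = max((S for r in range(n + 1) for S in itertools.combinations(range(n), r)),
            key=revenue)
```

Since every candidate is a feasible assortment, the sketch's output never exceeds the true optimum, and for the guess closest to $V(S^*)$ it recovers a near-optimal revenue.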
Generalized Markov Chain Model with Low Rank
In this section, we consider a more general model in which the initial transition matrix is of low rank. In particular, we assume that the rank is some constant $K < n$ and the initial transition matrix $\rho$ is given by
$$\rho(N, N^+) = \Big( \sum_{k \in [K]} u_{ik} v_{jk} \Big)_{i \in [n],\, j \in N^+} = \sum_{k \in [K]} u_k v_k^\top,$$
where $\sum_{j=0}^n \sum_{k=1}^K u_{ik} v_{jk} = 1$ for all $i \in [n]$. Given this initial transition probability matrix, the probability of purchasing product $i$ while being at vertex $i$ when $S$ is offered is
$$\mu_{LR}(i,S) = e^{-\alpha \left( \sum_{j \in S^+} \sum_{k \in [K]} u_{ik} v_{jk} \right)} = e^{-\alpha \left( \sum_{k \in [K]} u_{ik} V_k(S) \right)},$$
where we define
$$V_k(S) := \sum_{j \in S^+} v_{jk} \quad \forall k \in [K].$$
We also make the following assumptions:
• we suppose that $u_{jk} v_{jk} \le 1/n$ for all $j \in [n]$ and $k \in [K]$ (by this, we mean that the probability of staying at state $j$ without buying product $j$ cannot be too high);
• we also suppose that $\alpha$ is not too large compared to $n$; more precisely, $\alpha \le \log n$.
The first assumption is natural, as it stipulates that a customer either buys the product or moves to another state. The second assumption is a technical assumption which makes sure that the spectral radius of the following matrix is bounded away from 1.

Lemma 6.1.
Let $S \subseteq N$, and define the following matrix $M \in \mathcal{M}_K(\mathbb{R})$:
$$M = U_V(S) = \Big( \sum_{j=1}^n (1 - \mu_{LR}(j,S))\, u_{jk} v_{jm} \Big)_{k,m \in [K]}.$$
Then the spectral radius of $M$ satisfies $\rho(M) \le 1 - 1/n^2$.

We present the proof in Appendix 9.4.

6.1 Assortment Optimization and FPTAS
We present an FPTAS for the constant rank $K$ case, with the running time of the algorithm being exponential in $K$. In particular, we first show that the expected revenue of an assortment $S$, $R(S)$, depends on $O(K)$ linear functions of $S$. Therefore if we can guess the values of these $O(K)$ linear functions for an optimal assortment $S^*$, and then find an assortment that approximately matches these values, we can compute an approximately optimal assortment.

Unlike other problems in the literature (e.g., see [18]), the revenue function depends on a system of equations whose coefficients depend on the linear function values. Therefore, to control the error in $R(S)$, we need to control the error in the estimates of the solution to the system of equations and not just the linear function values. This is one of the main challenges we address while constructing the algorithm for the FPTAS. In particular, we choose the linear functions to guess more carefully, which allows us to give a theoretical bound on the error in the solution of the system of equations. We first compute the expected revenue of an assortment $S$ and give the following decomposition.

Lemma 6.2.
Under the Generalized Markov chain model with the rank of the transition matrix being $K$, the expected revenue from offering the assortment $S$ is
$$R_{LR}(S) = \sum_{i \in [n]} \lambda_i (1 - \mu_{LR}(i,S)) \Big( \sum_{j \in S} p_j \mu_{LR}(j,S)\, u_i^\top [I - U_V(S)]^{-1} v_j \Big) + \sum_{i \in S} \lambda_i \mu_{LR}(i,S) p_i,$$
where the matrix $U_V(S) \in \mathcal{M}_K(\mathbb{R})$ is defined by
$$U_V(S) = \Big( \sum_{j=1}^n (1 - \mu_{LR}(j,S))\, u_{jk} v_{jm} \Big)_{k,m \in [K]}.$$

The proof builds on the general choice probability expression from Section 3.1. We present the details in Appendix 9.5. For any $i$ and $S$, let
$$f(i,S) := \sum_{j \in S} p_j \mu_{LR}(j,S)\, u_i^\top [I - U_V(S)]^{-1} v_j. \quad (4)$$
The assortment optimization problem under the Generalized Markov chain model can then be formulated as
$$\max_{S \subseteq N} R_{LR}(S) := \max_{S \subseteq N} \sum_{i \in [n]} \lambda_i (1 - \mu_{LR}(i,S)) f(i,S) + \sum_{i \in S} \lambda_i \mu_{LR}(i,S) p_i. \quad (5)$$
We guess the following $K$ linear functions of $S$:
$$V_k(S) = \sum_{j \in S^+} v_{jk} \quad \forall k \in [K].$$
We show the following result, which stipulates that if our guesses are within $(1 \pm \epsilon)$ of the optimal values, the error in the solution of the system of equations is also within $(1 \pm O(\epsilon))$.

Lemma 6.3.
Let $S \subseteq N$. Suppose that there exist $H \in \mathcal{M}_K(\mathbb{R})$ and $\hat{v} \in \mathbb{R}^K$ such that
$$(1 - O(\epsilon)) H \le U_V(S) \le (1 + O(\epsilon)) H \quad \text{and} \quad (1 - O(\epsilon))\hat{v} \le v_j \le (1 + O(\epsilon))\hat{v}.$$
Then we have that
$$(1 - O(\epsilon)) [I - H]^{-1} \hat{v} \le [I - U_V(S)]^{-1} v_j \le (1 + O(\epsilon)) [I - H]^{-1} \hat{v}.$$

The proof builds on Lemma 6.1. We present the details in Appendix 9.6. Now we are ready to present the FPTAS. Let $v_k$ (resp. $V_k$) be the minimum (resp. maximum) value of $\{v_{ik}\}_{i \in N}$, for all $k \in [K]$. We can assume that $v_k > 0$. For any given $\epsilon > 0$, we use the following sets of guesses:
$$\mathcal{W}^k_\epsilon = \{ v_k (1+\epsilon)^t,\; t = 0, \ldots, T_k \} \quad \text{for all } k \in [K], \quad \text{where } T_k = O(\log(n V_k / v_k)/\epsilon).$$
A guess $h$ belongs to the set $\mathcal{W}_\epsilon = \mathcal{W}^1_\epsilon \times \cdots \times \mathcal{W}^K_\epsilon$; for fixed $K$, the number of guesses is polynomial in the number of products and $1/\epsilon$. For a given guess $h = (h_1, \ldots, h_K) \in \mathcal{W}_\epsilon$, we try to find the best revenue possible with
$$h_k \le \sum_{j \in S^+} v_{jk} \le h_k (1+\epsilon) \quad \text{for all } k \in [K],$$
using a dynamic program. In particular, consider the following discretized values of the $v_{jk}$ in multiples of $\epsilon h_k / n$:
$$\bar{v}_{jk} = \left\lceil \frac{v_{jk}}{\epsilon h_k / n} \right\rceil \quad \forall k \in [K], \; \forall j \in N.$$
We denote by $\bar{v}_j$ the vector $\bar{v}_j := (\bar{v}_{j1}, \ldots, \bar{v}_{jK})$. Let $L = \lceil n/\epsilon \rceil$ and $U = \lceil n/\epsilon \rceil + n$. We use a dynamic program to maximize the total expected revenue. For each $(l, u, m) \in [L]^K \times [U]^K \times [n]$, let $R_{DP}(l, u, m)$ be the maximum revenue of any subset $S \subseteq \{1, \ldots, m\}$ such that
$$l_k \le \sum_{j \in S} \bar{v}_{jk} \le u_k \quad \forall k \in [K].$$
For each guess $h$, let
$$\mu_i(h) := e^{-\alpha \left( \sum_{k \in [K]} h_k u_{ik} \right)} \quad \forall i \in N.$$
Therefore, $\mu(h) = (\mu_1(h), \ldots, \mu_n(h))$ is an estimate of the values of the $\mu_{LR}(i,S)$'s. Given this, we also use the following estimate $H$ of the matrix $U_V(S)$ defined before:
$$H(h) := \Big( \sum_{i=1}^n (1 - \mu_i(h))\, u_{ik} v_{ik'} \Big)_{k,k' \in [K]}.$$
Finally, let us also consider, for each $i \in N$ and each $S$, the following estimate of $f(i,S)$ defined in equation (4):
$$f_i(h, S) = \sum_{j \in S} p_j \mu_j(h)\, u_i^\top [I - H(h)]^{-1} v_j.$$
For each guess $h$, we define the approximate revenue of the subset $S$ as
$$R_{DP}(h, S) = \sum_{i \in S} \lambda_i \mu_i(h) p_i + \sum_{i=1}^n \lambda_i (1 - \mu_i(h)) f_i(h, S).$$
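Before turning to the dynamic program, the exact rank-$K$ decomposition of Lemma 6.2, which these estimates approximate, can be sanity-checked numerically. The sketch below uses made-up rank-2 data and our own variable names; it computes the expected revenue both directly from the $n \times n$ chain and through the $K \times K$ matrix $U_V(S)$, and the two computations agree. In code, the bilinear term $u_i^\top [I - U_V(S)]^{-1} v_j$ is evaluated as $v_j^\top (I_K - U_V(S))^{-1} u_i$, following the index order in the inductive proof of Lemma 6.2.

```python
import numpy as np

rng = np.random.default_rng(2)
n, K, alpha = 6, 2, 0.5
U = rng.uniform(0.05, 0.2, size=(n, K))        # u_{ik}
V = rng.uniform(0.05, 0.2, size=(n + 1, K))    # v_{jk}, j = 0..n (row 0 = no purchase)
U /= (U @ V.sum(axis=0))[:, None]              # make each row of rho sum to 1
lam = np.full(n, 1.0 / n)                      # arrival distribution over states
p = rng.uniform(1.0, 3.0, size=n)              # prices
S = [0, 2, 3]                                  # offered products (0-indexed)

Vk = V[0] + V[1:][S].sum(axis=0)               # V_k(S) = sum over S^+ of v_{jk}
in_S = np.isin(np.arange(n), S)
mu = np.where(in_S, np.exp(-alpha * (U @ Vk)), 0.0)   # purchase prob.; 0 off S

# direct n x n computation: expected visits, then absorption at offered states
rho = U @ V[1:].T                              # rho(i, j) for j = 1..n
M = (1.0 - mu)[:, None] * rho
visits = np.linalg.solve(np.eye(n) - M.T, lam) # lambda^T (I - M)^{-1}
R_direct = sum(visits[i] * mu[i] * p[i] for i in S)

# rank-K computation via the decomposition of Lemma 6.2
Uv = U.T @ np.diag(1.0 - mu) @ V[1:]           # (U_V)_{km} = sum_j (1-mu_j) u_{jk} v_{jm}
W = np.linalg.inv(np.eye(K) - Uv)
f = [sum(p[j] * mu[j] * (V[1 + j] @ W @ U[i]) for j in S) for i in range(n)]
R_lowrank = (sum(lam[i] * (1.0 - mu[i]) * f[i] for i in range(n))
             + sum(lam[i] * mu[i] * p[i] for i in S))
```

The agreement is exact (up to floating point), since the decomposition is an instance of the Woodbury identity applied to the low-rank transition matrix.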
We compute $R_{DP}(l, u, m)$ using the following dynamic program:
$$R_{DP}(l, u, 1) = \begin{cases} \lambda_1 \mu_1(h) p_1 + \sum_{i=1}^n \lambda_i (1 - \mu_i(h))\, p_1 \mu_1(h)\, u_i^\top [I - H(h)]^{-1} v_1 & \text{if } l \le \bar{v}_1 \le u, \\ 0 & \text{if } l \le 0 \text{ and } u \ge 0, \\ -\infty & \text{otherwise,} \end{cases}$$
$$R_{DP}(l, u, m) = \max \Big\{ \lambda_m \mu_m(h) p_m + \sum_{i=1}^n \lambda_i (1 - \mu_i(h))\, p_m \mu_m(h)\, u_i^\top [I - H(h)]^{-1} v_m + R_{DP}(l - \bar{v}_m, u - \bar{v}_m, m-1),\; R_{DP}(l, u, m-1) \Big\}.$$
Note that the number of states in the above dynamic program is $O\big( (n/\epsilon)^{2K}\, n \big)$. Each step of the dynamic program requires a summation that can be done in $O(nK)$ time. This results in a running time of $O\big( (n/\epsilon)^{2K}\, n^2 K \big)$ per guess. We construct the set of candidate assortments $S_h$ for all guesses $h$, and return the best revenue that we get from all the candidates in the set. Algorithm 3 presents the details of the FPTAS.

Algorithm 3
FPTAS for the Generalized Markov chain model with a low-rank matrix

procedure FPTASGenMixtMNL($\epsilon$, $u_k$, $v_k$)
    for $h \in \mathcal{W}_\epsilon$ do
        Compute the discretized coefficients $\bar{v}_{jk} = \lceil v_{jk} / (\epsilon h_k / n) \rceil$
        Compute $R_{DP}(l, u, m)$ for all $(l, u, m) \in [L]^K \times [U]^K \times [n]$ using the dynamic program
        Let $S_h$ be the subset corresponding to $R_{DP}(L, U, n)$
    end for
    Let $\mathcal{C} = \cup_{h \in \mathcal{W}_\epsilon} S_h$
    return the set $S^* \in \mathcal{C}$ that has the best revenue
end procedure

Theorem 6.4. Algorithm 3 returns a $(1 - O(\epsilon))$-optimal solution to the assortment optimization problem in (5). The running time is $O\big( \frac{n^{2K+2}}{\epsilon^{3K}} K \log(nV/v)^K \big)$.

We present the proof in Appendix 9.7. Note that the running time is exponential in the fixed rank $K$ of the transition probability matrix.

Numerical Results
In this section, we present numerical results on real data. We use the publicly available “RelatedArticle Recommendation Dataset” from [1] for performing this experiment.
Description.
The dataset contains information on about 57.4 million recommendations that were displayed in the form of an ordered list to the users of the digital library Sowiport. The information includes details such as which recommendation algorithm was used to order the list (one of content-based filtering, stereotype, most popular, and random) and also the times when those recommendations were requested, delivered, and clicked.

From the digital library's point of view, the decision to be made is which articles, how many of them, and in which order they should be displayed when a request is received. The objective is to maximize the overall click-through rate (CTR), which is the ratio of clicked recommendations to those delivered. This dataset has also been used by [2] to study empirical evidence of the choice overload phenomenon. That study finds that higher numbers of recommendations for a request lead to lower click-through rates.
Setup.
Since the choice models assume that at most one product is selected from the offered assortment, we first filter out the few recommendations which had multiple clicks. After that, we do feature engineering and build a few features on which both the MNL and Generalized MNL models are trained. Then we fit both models and estimate their respective parameters: $\beta \in \mathbb{R}^d$ for the MNL model, and $\beta \in \mathbb{R}^d$ and $\alpha > 0$ for the Generalized MNL model. Since the parameter estimation for the MNL model is a convex optimization problem, we use the popular gradient descent method. We use the parameter estimation algorithm discussed in Section 4.3 for the Generalized MNL model, i.e., Algorithm 1.

Once we have estimated the parameters for both models, we use them to predict the click probabilities on a separate held-out dataset. After getting these click probabilities, we simply order the recommended articles for each request according to these values. Since the objective is to generate clicks on the recommended articles, we compare the predictions from both models against the actually clicked articles.

Results.
The standard metric used in the literature when the objective is to maximize CTR is the area under the Receiver Operating Characteristic curve (the ROC AUC value, which lies between 0 and 1, with a higher value being preferable). We find that the Generalized MNL model improves the ROC AUC value over the MNL model by 7%. The plots of the ROC curves are shown in Figure 4.

Figure 4: ROC curves for the MNL and Generalized MNL models.

Another important observation concerns the $\alpha$ estimated from the data. We obtain a high value for the estimate of $\alpha$, which suggests that the choice overload phenomenon is prominent in such situations and hence that the Generalized MNL model is able to capture it much better than the MNL model.

Conclusion

Our main contribution in this paper is to build upon the existing Markov chain based choice model presented by [5] and present a generalized model that addresses two significant limitations of existing random utility maximization and rank-based choice models in capturing dynamic preferences and the choice overload phenomenon.

The Generalized Markov chain model attempts to capture both dynamic preferences and the choice overload phenomenon by considering a modified choice or selection process, where a customer stops at a state corresponding to an offered product with some probability that depends on the set of offered products. This implicitly models the search cost in the selection process and therefore captures both dynamic preferences and the choice overload phenomenon. We thus present a novel framework to overcome the limitations of existing choice models.

Considering the special cases when the transition matrix of the Markov chain is of rank one as well as when it has low rank, we show that the corresponding assortment optimization problem under these models is NP-hard. We also present a fully polynomial time approximation scheme (FPTAS) for both settings. The first model generalizes the MNL model, while the second generalizes the Mixture of MNLs model.
We also show the effectiveness of the Generalized MNL model on real data.
9.1 Proof of Lemma 4.1

Let $S \subseteq N$, and write $(I_n - D(S)\rho_v)^{-1} = (x_{ij})_{i,j \in N}$. Our assortment optimization problem becomes
$$\max_{S \subseteq N} \lambda^\top (I_n - D(S)\rho_v)^{-1} \Pi(S)\, p = \sum_{i \in S} \Big( \sum_{k \in N} \lambda_k x_{ki} \Big) e^{-\alpha \sum_{j \in S^+} v_j} p_i.$$
Let $i \in S$; we want to compute
$$\pi(i,S) = \Big( \sum_{k \in N} \lambda_k x_{ki} \Big) e^{-\alpha \sum_{j \in S^+} v_j}.$$
We can show that, for all $k, j \in N$,
$$x_{kj} = \frac{v_j (1 - \mu(k,S))}{\sum_{s \in N} v_s \mu(s,S) + v_0} \;\text{ if } k \ne j, \qquad x_{kk} - 1 = \frac{v_k (1 - \mu(k,S))}{\sum_{s \in N} v_s \mu(s,S) + v_0} \;\text{ otherwise.}$$
Let $\pi_S = e^{-\alpha \sum_{j \in S^+} v_j}$ and $x = \frac{1}{\sum_{k \in N} v_k \mu(k,S) + v_0} = \frac{1}{\pi_S \sum_{k \in S} v_k + v_0}$. Then
$$\sum_{k \in N} \lambda_k x_{ki} = \lambda_i x_{ii} + \sum_{k \ne i} \lambda_k\, x v_i (1 - \mu(k,S))$$
$$= \lambda_i x_{ii} + x v_i \sum_{k \notin S} \lambda_k + x v_i (1 - \pi_S) \sum_{k \in S \setminus \{i\}} \lambda_k$$
$$= \lambda_i x_{ii} + x v_i (1 - \lambda_i - \lambda_0) - x v_i \pi_S \sum_{k \in S \setminus \{i\}} \lambda_k$$
$$= \lambda_i \big( 1 + (1 - \pi_S) x v_i \big) + x v_i (1 - \lambda_i - \lambda_0) - x v_i \pi_S \sum_{k \in S \setminus \{i\}} \lambda_k$$
$$= \lambda_i + x v_i \Big( 1 - \lambda_0 - \pi_S \sum_{k \in S} \lambda_k \Big).$$
Since we supposed that $\lambda_j = v_j$ for all $j \in N$, the probability of buying product $i$ becomes
$$\pi(i,S) = \Big( \sum_{k \in N} \lambda_k x_{ki} \Big) e^{-\alpha \sum_{j \in S^+} v_j} = \pi_S v_i \left( 1 + \frac{1 - v_0 - \pi_S \sum_{k \in S} v_k}{\pi_S \sum_{k \in S} v_k + v_0} \right) = \frac{\pi_S v_i}{\pi_S \sum_{k \in S} v_k + v_0},$$
that is,
$$\pi(i,S) = \frac{v_i}{\sum_{k \in S} v_k + v_0 e^{\alpha \sum_{j \in S^+} v_j}}.$$
And if $i \notin S$, we have of course $\pi(i,S) = 0$, which finishes the proof.

9.2 Proof of Theorem 5.1

In this section, we present the details of the proof of the NP-hardness of the assortment optimization problem discussed in (3). We distinguish two different cases, $\alpha \le 1$ and $\alpha > 2$, based on a nice structural property of an optimal solution that holds when $\alpha \le 1$.

Lemma 9.1.
In the Generalized Multinomial Logit model with $\alpha \le 1$, the product with the highest price is always in the optimal set, i.e., if $p_1 > p_2 \ge \ldots \ge p_n$, then for every subset $S \subseteq N \setminus \{1\}$ we have $R(S \cup \{1\}) \ge R(S)$.

Proof. If $N \setminus \{1\} = \emptyset$ then the result is trivial since $R(\emptyset) = 0$. Now, suppose that there is at least one product other than $1$ in $N$. Let $S \subseteq N \setminus \{1\}$. We use the following notations:
$$V(S) = \sum_{j \in S} v_j \quad \text{and} \quad VP(S) = \sum_{j \in S} v_j p_j.$$
Therefore we have
$$R(S \cup \{1\}) - R(S) \ge 0 \iff \frac{VP(S) + v_1 p_1}{V(S) + v_1 + v_0 e^{\alpha V(S^+)} e^{\alpha v_1}} - \frac{VP(S)}{V(S) + v_0 e^{\alpha V(S^+)}} \ge 0$$
$$\iff v_1 \big( p_1 V(S) - VP(S) \big) + v_0 e^{\alpha V(S^+)} \big( v_1 p_1 - (e^{\alpha v_1} - 1) VP(S) \big) \ge 0.$$
Since $1$ has the highest price, $p_1 V(S) \ge VP(S)$ and
$$(e^{\alpha v_1} - 1) VP(S) \le (e^{\alpha v_1} - 1) V(S) p_1 = (e^{\alpha v_1} - 1)(1 - v_1 - \beta(S)) p_1, \quad \text{where } \beta(S) = \sum_{k \in N^+ \setminus (\{1\} \cup S)} v_k.$$
Moreover, since
$$(e^{\alpha v_1} - 1)(1 - v_1 - \beta(S)) - v_1 = e^{\alpha v_1}(1 - v_1 - \beta(S)) - 1 + \beta(S),$$
we want to prove that $g : x \mapsto e^{\alpha x}(1 - x - \beta(S)) - 1 + \beta(S)$ is a negative function on $(0, 1]$. Indeed, it is a strictly decreasing function on $(0, 1]$:
$$g'(x) = e^{\alpha x} \big( \alpha (1 - x - \beta(S)) - 1 \big) < 0 \iff 1 - \beta(S) - x < 1/\alpha,$$
and $1 - \beta(S) - x \le 1 - v_0 < 1/\alpha$ since we assumed $\alpha \le 1$. Moreover, $g(0) = 0$; therefore $g$ is negative on $(0, 1]$ and we have $R(S \cup \{1\}) - R(S) \ge 0$. □

Once we have this result, we can make a reduction from the partition problem. Consider the following instance of the partition problem: we are given $n$ integers $c_1, \ldots, c_n$ and the goal is to decide whether there is a subset $S \subseteq \{1, \ldots, n\}$ such that $\sum_{i \in S} c_i = \sum_{i \in \{1,\ldots,n\} \setminus S} c_i$. Let $2T = \sum_{i=1}^n c_i$; then $\sum_{i \in S} c_i = \sum_{i \in \{1,\ldots,n\} \setminus S} c_i$ if and only if $\sum_{i \in S} c_i = T$. We can suppose without loss of generality that $c_i > 0$ for all $i \in [n]$. We construct an instance of our problem as follows:
$$v_i = \frac{c_i}{2T+1} \;\text{ for } i \ge 1, \qquad v_0 = 1 - \sum_{i=1}^n v_i = \frac{1}{2T+1},$$
and let $c_0 := v_0 e^{\alpha v_0} > 0$. We define the prices as follows:
$$p_i = \begin{cases} \dfrac{1}{(2T+1)c_0} + \dfrac{e^{\alpha T/(2T+1)} - 1}{T} + \dfrac{1}{c_1} & \text{if } i = 1, \\[2mm] \dfrac{1}{(2T+1)c_0} + \dfrac{e^{\alpha T/(2T+1)} - 1}{T} & \text{otherwise.} \end{cases}$$
Finally we set the target revenue as
$$K = \frac{\frac{T}{(2T+1)c_0} + e^{\alpha T/(2T+1)}}{T + (2T+1) c_0\, e^{\alpha T/(2T+1)}}.$$
First, we can note that product $1$ is necessarily in the optimal set. Indeed, $1$ has the highest price in $N$, so the previous lemma implies that $1$ is necessarily in the optimal set (note that the choice of product $1$ is arbitrary, and we could have chosen any $i$ in $N$). In this case, our problem becomes
$$\max_{S \subseteq \{2,\ldots,n\}} \tilde{R}(S) := \max_{S \subseteq \{2,\ldots,n\}} R(S \cup \{1\}) = \max_{S \subseteq \{2,\ldots,n\}} \frac{\sum_{i \in S} v_i p_i + v_1 p_1}{\sum_{i \in S \cup \{1\}} v_i + c_0 e^{\alpha \sum_{i \in S \cup \{1\}} v_i}}$$
$$= \max_{S \subseteq \{2,\ldots,n\}} \frac{\Big( \frac{1}{(2T+1)c_0} + \frac{e^{\alpha T/(2T+1)} - 1}{T} \Big) \sum_{i \in S \cup \{1\}} c_i + 1}{\sum_{i \in S \cup \{1\}} c_i + (2T+1) c_0\, e^{\frac{\alpha}{2T+1} \sum_{i \in S \cup \{1\}} c_i}} =: \max_{S \subseteq \{2,\ldots,n\}} F \Big( \sum_{i \in S \cup \{1\}} c_i \Big),$$
where
$$F : [0, 2T] \to \mathbb{R}_+, \quad x \mapsto \frac{h(x)}{x + (2T+1) c_0\, e^{\frac{\alpha}{2T+1} x}}, \qquad h : x \mapsto \Big( \frac{1}{(2T+1)c_0} + \frac{e^{\alpha T/(2T+1)} - 1}{T} \Big) x + 1.$$
$F$ is increasing at $x$ if and only if
$$F'(x) \ge 0 \iff h'(x) \Big( x + (2T+1) c_0\, e^{\frac{\alpha}{2T+1} x} \Big) - h(x) \Big( 1 + c_0 \alpha\, e^{\frac{\alpha}{2T+1} x} \Big) \ge 0$$
$$\iff \frac{h'(x)}{h(x)} \ge \frac{1 + c_0 \alpha\, e^{\frac{\alpha}{2T+1} x}}{x + (2T+1) c_0\, e^{\frac{\alpha}{2T+1} x}} > 0 \quad \text{since } h > 0 \text{ on } [0, 2T]$$
$$\iff \ln(h(x)) - \ln(h(0)) \ge \ln \Big( x + (2T+1) c_0\, e^{\frac{\alpha}{2T+1} x} \Big) - \ln\big( (2T+1) c_0 \big)$$
$$\iff h(x) \ge \frac{1}{(2T+1)c_0} \Big( x + (2T+1) c_0\, e^{\frac{\alpha}{2T+1} x} \Big) \quad \text{since } h(0) = 1$$
$$\iff h(x) - g(x) \ge 0,$$
where $g : x \mapsto \frac{1}{(2T+1)c_0} \big( x + (2T+1) c_0\, e^{\frac{\alpha}{2T+1} x} \big)$. We note that $g$ is a strictly increasing function on $[0, 2T]$ such that $h(0) = g(0) = 1$ and $h(2T) < g(2T)$. Indeed,
$$h(2T) - g(2T) = 2 \big( e^{\alpha T/(2T+1)} - 1 \big) + 1 - e^{2\alpha T/(2T+1)} = - \big( e^{\alpha T/(2T+1)} - 1 \big)^2 < 0.$$
Therefore, since $h$ is a line with a positive slope, there exists a unique $x^* \in (0, 2T)$ such that $h(x) - g(x) > 0$ for all $0 < x < x^*$, $h(x^*) - g(x^*) = 0$, and $h(x) - g(x) < 0$ for all $x^* < x \le 2T$. But $h(T) = g(T)$, so $x^* = T$. This proves that $F$ is strictly increasing on $[0, T)$ and then strictly decreasing on $(T, 2T]$, so $F$ has a unique maximum on $(0, 2T)$, attained at $T$. Hence,
$$\max_{S \subseteq \{2,\ldots,n\}} \tilde{R}(S) = \max_{S \subseteq \{2,\ldots,n\}} F \Big( \sum_{i \in S \cup \{1\}} c_i \Big) \le F(T) = K.$$
So there exists an assortment whose expected revenue is at least $K$ if and only if the chain of inequalities holds with equality. For this to happen we need $\sum_{i \in S'} c_i = T$ for some subset $S' \subseteq \{1, \ldots, n\}$. Therefore there exists an assortment whose expected revenue is at least $K$ if and only if there exists a subset $S' \subseteq \{1, \ldots, n\}$ that satisfies $\sum_{i \in S'} c_i = T$.

Although when $\alpha > 2$ we no longer have this nice structure in the optimal assortment, and the product with the highest price may not necessarily be in an optimal assortment, we can still prove that the assortment optimization problem is NP-hard as long as $\alpha > 2$ (recall that we are interested in the high-$\alpha$ case). To prove the NP-hardness in this case, we once again make a reduction from the partition problem. Consider the following instance of the partition problem: we are given $n$ integers $c_1, \ldots, c_n$ and the goal is to decide whether there is a subset $S \subseteq \{1, \ldots, n\}$ such that $\sum_{i \in S} c_i = \sum_{i \in \{1,\ldots,n\} \setminus S} c_i$. Let $2T = \sum_{i=1}^n c_i$; then $\sum_{i \in S} c_i = \sum_{i \in \{1,\ldots,n\} \setminus S} c_i$ if and only if $\sum_{i \in S} c_i = T$. We can suppose without loss of generality that $c_i > 0$ for all $i \in [n]$.
We construct an instance of our problem as follows:
$$v_i = \frac{c_i}{T\alpha} \;\text{ for } i \ge 1, \qquad v_0 = 1 - \sum_{i=1}^n v_i = 1 - \frac{2}{\alpha},$$
and let $c_0 := v_0 e^{\alpha v_0} > 0$. We note that $v_0 > 0$ because we have supposed $\alpha > 2$. We define the prices as $p_i = 1$ for all $i \in [n]$. Finally we set the target revenue as
$$K = \frac{1}{1 + \alpha c_0 e}.$$
In this case, our problem becomes
$$\max_{S \subseteq \{1,\ldots,n\}} R(S) := \max_{S \subseteq \{1,\ldots,n\}} \frac{\sum_{i \in S} v_i p_i}{\sum_{i \in S} v_i + c_0 e^{\alpha \sum_{i \in S} v_i}} = \max_{S \subseteq \{1,\ldots,n\}} \frac{\frac{1}{T\alpha} \sum_{i \in S} c_i}{\frac{1}{T\alpha} \sum_{i \in S} c_i + c_0 e^{\frac{1}{T} \sum_{i \in S} c_i}} =: \max_{S \subseteq \{1,\ldots,n\}} F \Big( \sum_{i \in S} c_i \Big),$$
where
$$F : [0, 2T] \to \mathbb{R}_+, \quad x \mapsto \frac{x}{x + T\alpha c_0\, e^{x/T}}.$$
$F$ is increasing at $x$ if and only if
$$F'(x) \ge 0 \iff \frac{x + T\alpha c_0\, e^{x/T} - x \big( 1 + \alpha c_0\, e^{x/T} \big)}{\big( x + T\alpha c_0\, e^{x/T} \big)^2} \ge 0 \iff x \le T.$$
Therefore $F$ is strictly increasing on $[0, T)$ and then strictly decreasing on $(T, 2T]$, so $F$ has a unique maximum on $(0, 2T)$, attained at $T$. Hence,
$$\max_{S \subseteq \{1,\ldots,n\}} R(S) = \max_{S \subseteq \{1,\ldots,n\}} F \Big( \sum_{i \in S} c_i \Big) \le F(T) = \frac{1}{1 + \alpha c_0 e} = K.$$
So there exists an assortment $S \subseteq \{1, \ldots, n\}$ whose expected revenue is at least $K$ if and only if the chain of inequalities holds with equality. For this to happen we need $\sum_{i \in S'} c_i = T$ for some subset $S' \subseteq \{1, \ldots, n\}$. Therefore there exists an assortment whose expected revenue is at least $K$ if and only if there exists a subset $S' \subseteq \{1, \ldots, n\}$ that satisfies $\sum_{i \in S'} c_i = T$.

We make a special note here about the remaining case $1 < \alpha \le 2$. We point out that we are more interested in the "picky customer" case, i.e., when $\alpha$ is large enough, because this is when we actually observe choice overload (see the homogeneous graph example in Section 4.2, or the numerical results in Section 7). Hence this is a less interesting case, although we do believe that the assortment optimization problem is still NP-hard in this setting as well (and we have examples supporting this claim).
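The pivotal property of this second reduction, that $F$ increases up to the partition target $T$ and decreases afterwards, so that $F$ is maximized exactly at $T$, is easy to illustrate numerically. The toy partition instance below is made up and the names are ours.

```python
import math

c = [3, 1, 4, 2, 2]            # a partition instance; total 2T = 12, so T = 6
T = sum(c) // 2
alpha = 3.0                    # alpha > 2, so v_0 = 1 - 2/alpha > 0
v0 = 1.0 - 2.0 / alpha
c0 = v0 * math.exp(alpha * v0)

def F(x):
    """Revenue as a function of x = sum of the selected c_i's."""
    return x / (x + T * alpha * c0 * math.exp(x / T))

grid = [0.1 * k for k in range(1, 20 * T + 1)]   # sample points in (0, 2T]
x_star = max(grid, key=F)                        # numerically maximize F
```

On this grid the maximizer lands at $x = T$, and $F(T)$ matches the target revenue $K = 1/(1 + \alpha c_0 e)$, so the decision "revenue $\ge K$?" is exactly the partition question.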
9.3 Proof of Theorem 5.2

Let $S^*$ be the optimal solution to the assortment optimization problem. There exists $l$ such that
$$v(1+\epsilon)^{l-1} \le \sum_{j \in S^*} v_j \le v(1+\epsilon)^l.$$
Let $h = v(1+\epsilon)^l$. Then
$$\sum_{j \in S^*} \frac{v_j}{\epsilon h / n} \le \frac{n}{\epsilon h}\, h = \frac{n}{\epsilon},$$
and rounding up gives us
$$\sum_{j \in S^*} \bar{v}_j \le \left\lceil \frac{n}{\epsilon} \right\rceil + n = I.$$
Thus $S^*$ is feasible for the dynamic program for the guess $h$. Let $S_h$ be the assortment corresponding to $R(I,n)$ for the guess $h$, that is, the one that maximizes $\sum_{j \in S} v_j p_j$ subject to $\sum_{j \in S} \bar{v}_j \le I$. Then, since $S^*$ is feasible,
$$\sum_{j \in S_h} v_j p_j \ge \sum_{j \in S^*} v_j p_j.$$
Moreover,
$$\sum_{j \in S_h} v_j \le \frac{\epsilon h}{n} \sum_{j \in S_h} \bar{v}_j \le h(1 + \epsilon + \epsilon/n) \le h(1 + 2\epsilon).$$
Since $x \mapsto \frac{1}{x + v_0 e^{\alpha(v_0 + x)}}$ is a decreasing function,
$$R(S_h) = \frac{\sum_{j \in S_h} v_j p_j}{\sum_{k \in S_h} v_k + v_0 e^{\alpha(v_0 + \sum_{j \in S_h} v_j)}} \ge \frac{\sum_{j \in S_h} v_j p_j}{v(1+\epsilon)^l (1+2\epsilon) + v_0 e^{\alpha(v_0 + v(1+\epsilon)^l(1+2\epsilon))}}.$$
Let us first show that there exists $\beta > 0$ such that
$$v(1+\epsilon)^{l-1} + v_0 e^{\alpha(v_0 + v(1+\epsilon)^{l-1})} \ge \Big( v(1+\epsilon)^l (1+2\epsilon) + v_0 e^{\alpha(v_0 + v(1+\epsilon)^l(1+2\epsilon))} \Big) (1 - \beta\epsilon). \quad (6)$$
Indeed,
$$v(1+\epsilon)^{l-1} \ge v(1+\epsilon)^l (1+2\epsilon)(1-\beta\epsilon) \iff 1 \ge (1+\epsilon)(1+2\epsilon)(1-\beta\epsilon) = 1 + (3-\beta)\epsilon + (2 - 3\beta)\epsilon^2 - 2\beta\epsilon^3,$$
which is clearly true at least for $\beta \ge 3$. Moreover,
$$v_0 e^{\alpha(v_0 + v(1+\epsilon)^{l-1})} \ge v_0 e^{\alpha(v_0 + v(1+\epsilon)^l(1+2\epsilon))} (1 - \beta\epsilon) \iff e^{\alpha v(1+\epsilon)^{l-1} \left( 1 - (1+\epsilon)(1+2\epsilon) \right)} \ge 1 - \beta\epsilon.$$
Note that
$$e^{\alpha v(1+\epsilon)^{l-1} \left( 1 - (1+\epsilon)(1+2\epsilon) \right)} = 1 - 3\alpha v(1+\epsilon)^{l-1} \epsilon + o(\epsilon).$$
Therefore if we take $\beta \ge \max\big( 3,\; 3\alpha v(1+\epsilon)^{l-1} \big)$, then inequality (6) is verified.
Consequently,
$$\frac{1}{v(1+\epsilon)^l (1+2\epsilon) + v_0 e^{\alpha(v_0 + v(1+\epsilon)^l(1+2\epsilon))}} \ge \frac{1 - \beta\epsilon}{v(1+\epsilon)^{l-1} + v_0 e^{\alpha(v_0 + v(1+\epsilon)^{l-1})}} \ge \frac{1 - \beta\epsilon}{\sum_{k \in S^*} v_k + v_0 e^{\alpha(v_0 + \sum_{k \in S^*} v_k)}},$$
where the last inequality holds by the definition of $l$. Therefore,
$$R(S_h) \ge (1 - \beta\epsilon)\, \frac{\sum_{j \in S_h} v_j p_j}{\sum_{k \in S^*} v_k + v_0 e^{\alpha(v_0 + \sum_{k \in S^*} v_k)}} \ge (1 - \beta\epsilon)\, R(S^*),$$
where in the last inequality we used that $\sum_{j \in S_h} v_j p_j \ge \sum_{j \in S^*} v_j p_j$. This proves that our algorithm returns a $(1 - O(\epsilon))$-optimal solution to the assortment problem.

Running time.
We try a total of $L = O(\log(nV/v)/\epsilon)$ guesses $h$, and for each guess we solve a dynamic program with $O(n^2/\epsilon)$ states. Consequently, the running time of the algorithm is $O\big( \frac{n^2}{\epsilon^2} \log(nV/v) \big)$, which is polynomial in the input size $n$ and $1/\epsilon$.

9.4 Proof of Lemma 6.1

Let $\|\cdot\|$ be the usual Euclidean norm, and $\langle \cdot, \cdot \rangle$ its associated scalar product. We have that
$$\rho(U_V(S)) = \max_{x,\, \|x\| = 1} \langle U_V(S) x, x \rangle.$$
Let $k \in [K]$. Then
$$\langle U_V(S) e_k, e_k \rangle = \sum_{l=1}^n (1 - \mu_{LR}(l,S))\, u_{lk} v_{lk} \le \frac{1}{n} \sum_{l=1}^n (1 - \mu_{LR}(l,S)) \le \frac{1}{n} \Big( n - |S| + \sum_{l \in S} \big( 1 - e^{-\alpha \sum_{k' \in [K]} u_{lk'} V_{k'}(S)} \big) \Big),$$
where the first inequality follows from the assumption made above on the coefficients $u_{lk} v_{lk}$. Furthermore, since $\alpha \le \log n$,
$$e^{-\alpha \sum_{k' \in [K]} u_{lk'} V_{k'}(S)} \ge e^{-\alpha \sum_{k' \in [K]} u_{lk'} \sum_{j \in N^+} v_{jk'}} = e^{-\alpha} \ge \frac{1}{n}.$$
Hence,
$$\langle U_V(S) e_k, e_k \rangle \le \frac{1}{n} \Big( n - |S| + |S| \Big( 1 - \frac{1}{n} \Big) \Big) = 1 - \frac{|S|}{n^2} \le 1 - \frac{1}{n^2}.$$
The same bound holds for each basis vector $e_k$, $k \in [K]$, and hence for all $x \in \mathbb{R}^K$ such that $\|x\| = 1$. Therefore, $\rho(U_V(S)) \le 1 - \frac{1}{n^2}$.

9.5 Proof of Lemma 6.2

Let $S \subseteq N$ be the chosen subset of products, and let $i \in S$. We recall from Section 3.1 that
$$\pi(i,S) = \lambda^\top \big( I_n - \mathrm{Diag}\big( (1 - \mu(i,S))_{i \in [n]} \big)\, \rho(N,N) \big)^{-1} \Pi(S)\, e_i. \quad (7)$$
This result still holds, as the low-rank model satisfies the same assumptions as are needed to obtain it. However, there is a difference from the setting of Section 4: in contrast to the Generalized Markov chain model presented there, we now have that
$$\mu_{LR}(i,S) = e^{-\alpha \left( \sum_{k \in [K]} u_{ik} V_k(S) \right)} \quad \text{and} \quad \rho(N,N) = \sum_{k \in [K]} u_k v_k^\top,$$
where $v_k = (v_{1k}, \ldots, v_{nk})$. Therefore we have to compute the coefficients of the matrix $\big( I_n - \mathrm{Diag}\big( (1 - \mu_{LR}(i,S))_{i \in [n]} \big)\, \rho(N,N) \big)^{-1}$ under this new model. We know that
$$\big( I_n - \mathrm{Diag}\big( (1 - \mu_{LR}(i,S))_{i \in [n]} \big)\, \rho(N,N) \big)^{-1} = \sum_{l=0}^\infty \big( \mathrm{Diag}\big( (1 - \mu_{LR}(i,S))_{i \in [n]} \big)\, \rho(N,N) \big)^l. \quad (8)$$
We use the following notations:
• let $M := \mathrm{Diag}\big( (1 - \mu_{LR}(i,S))_{i \in [n]} \big)\, \rho(N,N) = \big( (1 - \mu_{LR}(i,S)) \rho_{ij} \big)_{i,j \in [n]}$;
• let $U_V(S) = \big( \sum_{l=1}^n (1 - \mu_{LR}(l,S))\, u_{lk} v_{lm} \big)_{k,m \in [K]}$;
• for $l \in \mathbb{N}$, we denote by $\big( U_V(S)^l \big)_{k'k}$ the coefficient with indices $k', k$ of the matrix $U_V(S)^l$, i.e., of the matrix $U_V(S)$ raised to the power $l$.
First, we show by induction that for all $l \ge 1$,
$$M^l = \Big( (1 - \mu_{LR}(i,S)) \sum_{k,k' \in [K]} u_{ik} v_{jk'} \big( U_V(S)^{l-1} \big)_{k'k} \Big)_{i,j \in [n]}.$$

Base case.
For $l = 1$, we use the definitions of $M$ and $\rho(N,N)$:
\[ M = \big((1-\mu_{LR}(i,S))\rho_{ij}\big)_{i,j\in[n]} = \Big((1-\mu_{LR}(i,S))\sum_{k\in[K]} u_{ik}v_{jk}\Big)_{i,j\in[n]}. \]
Since $UV(S)^0 = I_K$, we have $\big(UV(S)^0\big)_{k'k} = \mathbb{1}_{k'=k}$. Therefore
\[ M = \Big((1-\mu_{LR}(i,S))\sum_{k,k'\in[K]} u_{ik}v_{jk'}\big(UV(S)^0\big)_{k'k}\Big)_{i,j\in[n]}. \]
The result holds for $l=1$.
Inductive step
Suppose that the result holds for some $l \in \mathbb{N}^*$. We compute the coefficient with indices $i,j$ of $M^{l+1}$:
\[ \big(M^{l+1}\big)_{ij} = \sum_{q=1}^n \big(M^l\big)_{iq}\,(1-\mu_{LR}(q,S))\,\rho_{qj}. \]
Using the induction hypothesis,
\begin{align*}
\big(M^{l+1}\big)_{ij} &= \sum_{q=1}^n (1-\mu_{LR}(i,S))\sum_{k,k'\in[K]} u_{ik}v_{qk'}\big(UV(S)^{l-1}\big)_{k'k}\,(1-\mu_{LR}(q,S))\sum_{k''\in[K]} u_{qk''}v_{jk''} \\
&= (1-\mu_{LR}(i,S))\sum_{k,k''\in[K]} u_{ik}v_{jk''}\sum_{k'\in[K]}\sum_{q\in[n]}(1-\mu_{LR}(q,S))\,u_{qk''}v_{qk'}\big(UV(S)^{l-1}\big)_{k'k} \\
&= (1-\mu_{LR}(i,S))\sum_{k,k''\in[K]} u_{ik}v_{jk''}\sum_{k'\in[K]}\big(UV(S)\big)_{k''k'}\big(UV(S)^{l-1}\big)_{k'k} \\
&= (1-\mu_{LR}(i,S))\sum_{k,k''\in[K]} u_{ik}v_{jk''}\big(UV(S)^{l}\big)_{k''k}.
\end{align*}
The result holds for $l+1$.
Conclusion
For all $l \ge 1$, we have
\[ M^l = \Big((1-\mu_{LR}(i,S))\sum_{k,k'\in[K]} u_{ik}v_{jk'}\big(UV(S)^{l-1}\big)_{k'k}\Big)_{i,j\in[n]}. \]
With this result, we can compute the coefficient with indices $i,j$ of the matrix $\big(I_n - \mathrm{Diag}(1-\mu_{LR}(i,S))\,\rho(N,N)\big)^{-1} = (I_n - M)^{-1}$ using (8):
\begin{align*}
\big((I_n - M)^{-1}\big)_{ij} &= \sum_{l=0}^{\infty} (M^l)_{ij} \\
&= \mathbb{1}_{i=j} + \sum_{l=1}^{\infty} (1-\mu_{LR}(i,S))\sum_{k,k'\in[K]} u_{ik}v_{jk'}\big(UV(S)^{l-1}\big)_{k'k} \\
&= \mathbb{1}_{i=j} + (1-\mu_{LR}(i,S))\sum_{k,k'\in[K]} u_{ik}v_{jk'}\sum_{l=1}^{\infty}\big(UV(S)^{l-1}\big)_{k'k} \\
&= \mathbb{1}_{i=j} + (1-\mu_{LR}(i,S))\sum_{k,k'\in[K]} u_{ik}v_{jk'}\big((I_K - UV(S))^{-1}\big)_{k'k}.
\end{align*}
The last equality holds since $\rho(UV(S)) < 1$, as shown in Lemma 6.1. Injecting this in (7) and computing $\sum_{i\in S}\pi_{LR}(i,S)\,p_i$, we have
\begin{align*}
R_{LR}(S) &= \sum_{i=1}^n \lambda_i \sum_{j=1}^n \Big(\mathbb{1}_{i=j} + (1-\mu_{LR}(i,S))\sum_{k,k'\in[K]} u_{ik}v_{jk'}\big((I_K - UV(S))^{-1}\big)_{k'k}\Big)\mu_{LR}(j,S)\,p_j \\
&= \sum_{i=1}^n \lambda_i\mu_{LR}(i,S)\,p_i + \sum_{i=1}^n \lambda_i(1-\mu_{LR}(i,S))\sum_{j=1}^n\sum_{k,k'\in[K]} u_{ik}v_{jk'}\big((I_K - UV(S))^{-1}\big)_{k'k}\,\mu_{LR}(j,S)\,p_j \\
&= \sum_{i=1}^n \lambda_i\mu_{LR}(i,S)\,p_i + \sum_{i=1}^n \lambda_i(1-\mu_{LR}(i,S))\sum_{k,k'\in[K]} u_{ik}\big((I_K - UV(S))^{-1}\big)_{k'k}\sum_{j=1}^n v_{jk'}\,\mu_{LR}(j,S)\,p_j \\
&= \sum_{i=1}^n \lambda_i\sum_{k,k'\in[K]} u_{ik}\big((I_K - UV(S))^{-1}\big)_{k'k}\Big(\sum_{j=1}^n v_{jk'}\,\mu_{LR}(j,S)\,p_j\Big) \\
&\quad + \sum_{i\in S}\lambda_i\mu_{LR}(i,S)\Big(p_i - \sum_{k,k'\in[K]} u_{ik}\big((I_K - UV(S))^{-1}\big)_{k'k}\Big(\sum_{j=1}^n v_{jk'}\,\mu_{LR}(j,S)\,p_j\Big)\Big).
\end{align*}
Since $\mu_{LR}(j,S) = 0$ for all $j\notin S$, after reorganizing the terms we can rewrite this as
\[ R_{LR}(S) = \sum_{i\in[n]}\lambda_i(1-\mu_{LR}(i,S))\sum_{k,k'=1}^{K} u_{ik}\big((I_K - UV(S))^{-1}\big)_{k'k}\Big(\sum_{l\in S} v_{lk'}\,\mu_{LR}(l,S)\,p_l\Big) + \sum_{i\in S}\lambda_i\mu_{LR}(i,S)\,p_i. \]
Finally, writing the sum over $k,k'$ in matrix form and using the notation $u_i = \{u_{ik}\}_{k=1}^K$ and $v_j = \{v_{jk}\}_{k=1}^K$, we get
\[ R_{LR}(S) = \sum_{i\in[n]}\lambda_i(1-\mu_{LR}(i,S))\Big(\sum_{j\in S} p_j\,\mu_{LR}(j,S)\,u_i^T[I - UV(S)]^{-1}v_j\Big) + \sum_{i\in S}\lambda_i\mu_{LR}(i,S)\,p_i, \]
which concludes the proof.

From Lemma 6.1, we know that $\rho(UV(S)) \le 1 - 1/n^2 < 1$. Therefore we can write
\[ [I - UV(S)]^{-1} = \sum_{l=0}^{\infty} UV(S)^{l}. \]
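The rank-$K$ inversion identity above is what makes the revenue expression cheap to re-evaluate: only a $K \times K$ system depends on $S$. The following numerical sketch (ours, not from the paper) checks the entrywise formula against a direct $n \times n$ inverse on a randomly generated instance; the scaling of the $u_{ik}$ is only chosen so that $\rho(M) < 1$ holds.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 6, 2

# Hypothetical instance: U, V nonnegative, U scaled so that rho(M) < 1.
U = rng.uniform(0.0, 1.0, (n, K)) / (n * K)   # u_{ik}
V = rng.uniform(0.0, 1.0, (n, K))             # v_{jk}
mu = rng.uniform(0.2, 0.9, n)                 # stands in for mu_LR(i, S)

rho_mat = U @ V.T                             # rho_{ij} = sum_k u_{ik} v_{jk}
M = np.diag(1.0 - mu) @ rho_mat               # M = Diag(1 - mu) rho(N, N)
assert np.max(np.abs(np.linalg.eigvals(M))) < 1.0

# K x K matrix UV(S)_{km} = sum_l (1 - mu_l) u_{lk} v_{lm}
UV = U.T @ np.diag(1.0 - mu) @ V

# Direct n x n inverse (the limit of the Neumann series) ...
inv_direct = np.linalg.inv(np.eye(n) - M)

# ... versus the closed form derived above:
# (I_n - M)^{-1}_{ij} = 1{i=j} + (1 - mu_i) sum_{k,k'} u_{ik} v_{jk'} ((I_K - UV)^{-1})_{k'k}
W = np.linalg.inv(np.eye(K) - UV)
inv_lowrank = np.eye(n) + np.diag(1.0 - mu) @ U @ W.T @ V.T

assert np.allclose(inv_direct, inv_lowrank)
print("rank-K inverse formula matches the direct inverse")
```

The closed form is a Woodbury-type identity: inverting $I_K - UV(S)$ replaces inverting the full $n \times n$ matrix.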
We first show one side of the inequality. Let $\alpha_1 = 1 - O(\epsilon)$. Furthermore, let $L = [I - UV(S)]^{-1}v_j$ and $L_\epsilon = [I - \alpha_1 UV(S)]^{-1}\alpha_1 v_j$. Hence, we have
\begin{align*}
L - L_\epsilon &= \big([I - UV(S)]^{-1} - \alpha_1 [I - \alpha_1 UV(S)]^{-1}\big)v_j \\
&= \Big(\sum_{l=0}^{\infty} UV(S)^l - \alpha_1 \sum_{l=0}^{\infty} \alpha_1^l\, UV(S)^l\Big)v_j \\
&= \Big(\sum_{l=0}^{\infty} \big(1-\alpha_1^{l+1}\big)\, UV(S)^l\Big)v_j \\
&\ge \Big(\sum_{l=0}^{\infty} (1-\alpha_1)^{l+1}\, UV(S)^l\Big)v_j \qquad \big(\text{since } 0 < \alpha_1 < 1,\ 1-\alpha_1^{l+1} \ge (1-\alpha_1)^{l+1}\big) \\
&= (1-\alpha_1)\Big(\sum_{l=0}^{\infty} (1-\alpha_1)^{l}\, UV(S)^l\Big)v_j = (1-\alpha_1)\,[I - (1-\alpha_1) UV(S)]^{-1} v_j \ \ge\ O(\epsilon)\,L,
\end{align*}
which completes the proof. The other side of the inequality is proved similarly using the fact that for $\alpha_2 > 1$ (when we set $\alpha_2 = 1 + O(\epsilon)$), we have $\alpha_2^l - 1 \ge (\alpha_2 - 1)l$ for any $l \ge 1$.

Let $S^*$ be the optimal solution to the assortment optimization problem. There exist $t_1,\dots,t_K$ such that for all $k\in[K]$,
\[ v_k(1+\epsilon)^{t_k} \le \sum_{j\in S^*} v_{jk} =: V_k(S^*) \le v_k(1+\epsilon)^{t_k+1}. \]
Let $h = \big(v_1(1+\epsilon)^{t_1},\dots,v_K(1+\epsilon)^{t_K}\big)$. Choose the set $S^h$ that maximizes the dynamic program defined above. Then, by definition of $\bar v_{jk}$, for each $k\in[K]$,
\[ V_k(S^h) := \sum_{j\in S^h} v_{jk} \le \frac{\epsilon h_k}{n}\sum_{j\in S^h}\bar v_{jk} \le \frac{\epsilon h_k}{n}\,U = \frac{\epsilon h_k}{n}\big(\lceil n/\epsilon\rceil + n\big) \le h_k(1+2\epsilon), \]
\[ V_k(S^h) := \sum_{j\in S^h} v_{jk} \ge \frac{\epsilon h_k}{n}\sum_{j\in S^h}\big(\bar v_{jk} - 1\big) \ge \frac{\epsilon h_k}{n}\big(L - |S^h|\big) \ge \frac{\epsilon h_k}{n}\big(\lceil n/\epsilon\rceil - n\big) \ge h_k(1-\epsilon). \]
First of all, since for all $k\in[K]$, $h_k \le V_k(S^*) \le h_k(1+\epsilon)$,
\[ e^{-\alpha\sum_{k=1}^K h_k u_{jk}(1+\epsilon)} \le \mu_{LR}(j,S^*) = e^{-\alpha\sum_{k=1}^K V_k(S^*)u_{jk}} \le e^{-\alpha\sum_{k=1}^K h_k u_{jk}} =: \mu_j(h). \]
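The existence of such exponents $t_k$ is the standard geometric-grid argument: any value in $[v_k, V]$ falls between consecutive powers of $(1+\epsilon)$. A minimal sketch (function and variable names are ours, not the paper's):

```python
import math

def grid_index(V, v, eps):
    """Largest t with v * (1 + eps)**t <= V (assumes V >= v > 0, eps > 0)."""
    return math.floor(math.log(V / v) / math.log(1.0 + eps))

eps, v = 0.1, 0.5
for V in [0.5, 1.7, 42.0]:
    t = grid_index(V, v, eps)
    # V is sandwiched between consecutive grid points, as in the proof.
    assert v * (1 + eps) ** t <= V <= v * (1 + eps) ** (t + 1)
print("every value lands between consecutive grid points")
```

Since the index $t$ never exceeds $\log(V/v)/\log(1+\epsilon) = O(\log(V/v)/\epsilon)$, enumerating all grid points per coordinate is what gives the guess count used in the running-time analysis.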
Thus, if we denote $\sum_{k=1}^K h_k u_{jk} =: \langle h, u_j\rangle$,
\[ \mu_j(h)\big(1 - \epsilon\alpha\langle h,u_j\rangle\big) \le \mu_{LR}(j,S^*) \le \mu_j(h), \tag{9} \]
and using the same arguments for $h_k(1-\epsilon) \le V_k(S^h) \le h_k(1+2\epsilon)$, we can show that
\[ \mu_j(h)\big(1 - 2\epsilon\alpha\langle h,u_j\rangle\big) \le \mu_{LR}(j,S^h) \le \mu_j(h)\big(1 + \epsilon\alpha\langle h,u_j\rangle\big). \]
Therefore
\[ UV(S^*)_{kk'} = \sum_{i\in[n]}(1-\mu_{LR}(i,S^*))\,u_{ik}v_{ik'} \le \sum_{i\in[n]}(1-\mu_i(h))\,u_{ik}v_{ik'}\Big(1 + \frac{\mu_i(h)}{1-\mu_i(h)}\,\epsilon\alpha\langle h,u_i\rangle\Big) \le \sum_{i\in[n]}(1-\mu_i(h))\,u_{ik}v_{ik'}\Big(1 + \epsilon\alpha\max_{i\in[n]}\Big\{\frac{\mu_i(h)}{1-\mu_i(h)}\langle h,u_i\rangle\Big\}\Big). \]
Hence we get, for all $k,k'\in[K]$,
\[ H(h)_{kk'} \le UV(S^*)_{kk'} \le H(h)_{kk'}(1+\delta_1), \tag{10} \]
where $\delta_1 := \epsilon\alpha\max_{i\in[n]}\big\{\frac{\mu_i(h)}{1-\mu_i(h)}\langle h,u_i\rangle\big\}$, and likewise
\[ H(h)_{kk'}(1-\delta_1) \le UV(S^h)_{kk'} \le H(h)_{kk'}(1+2\delta_1). \]
Since $\delta_1 = O(\epsilon)$, we can use the result from Lemma 6.3. We want to prove that $R_{LR}(S^h) \ge (1-O(\epsilon))\,R_{LR}(S^*)$. Recall the expression of the optimal revenue:
\[ R_{LR}(S^*) = \sum_{i\in S^*}\lambda_i\mu_{LR}(i,S^*)\,p_i + \sum_{i=1}^n \lambda_i(1-\mu_{LR}(i,S^*))\,f(i,S^*). \]
Let us now compare $f_i(h,S^*)$ and $f(i,S^*) := \sum_{j=1}^n p_j\,\mu_{LR}(j,S^*)\,u_i^T[I - UV(S^*)]^{-1}v_j$ using the bounds from (9) and the result from Lemma 6.3. We have
\[ f_i(h,S^*)(1-\delta_2) \le f(i,S^*) \le f_i(h,S^*)(1+O(\epsilon)), \]
where $\delta_2 := \epsilon\alpha\max_{i\in[n]}\{\langle h,u_i\rangle\} = O(\epsilon)$. Using the bounds for $\mu_{LR}(j,S^h)$ and $f(i,S^h)$, we also have
\[ f_i(h,S^h)(1-O(\epsilon))(1-\delta_2) \le f(i,S^h) \le f_i(h,S^h)(1+O(\epsilon))(1+\delta_2). \]
Using these upper and lower bounds for $f(i,S^*)$ (resp. $f(i,S^h)$) and the corresponding lower and upper bounds for $\mu_{LR}(j,S^*)$ (resp. $\mu_{LR}(j,S^h)$) from equation (9), we have
\[ R_{LR}(S^*) \le \sum_{i\in S^*}\lambda_i\mu_i(h)\,p_i + \sum_{i=1}^n \lambda_i(1-\mu_i(h))(1+\delta_1)\,f_i(h,S^*)(1+O(\epsilon)) \le (1+\delta_1)(1+O(\epsilon))\,R_{DP}(h,S^*), \]
and
\[ R_{LR}(S^h) \ge \sum_{i\in S^h}\lambda_i(1-\delta_2)\,\mu_i(h)\,p_i + \sum_{i=1}^n \lambda_i(1-\mu_i(h))(1-\delta_1)(1-O(\epsilon))(1-\delta_2)\,f_i(h,S^h) \ge (1-\delta_1)(1-O(\epsilon))(1-\delta_2)\,R_{DP}(h,S^h). \]
Next, since $S^h$ is by definition the assortment that maximizes the dynamic program for the given guess $h$, we also have $R_{DP}(h,S^*) \le R_{DP}(h,S^h)$. Combining this with the bounds derived above, we get
\[ R_{LR}(S^*) \le \frac{(1+\delta_1)(1+O(\epsilon))}{(1-\delta_1)(1-O(\epsilon))(1-\delta_2)}\,R_{LR}(S^h). \tag{11} \]
Hence we get the following lower bound for the expected revenue obtained by the assortment $S^h$:
\[ \frac{(1-\delta_1)(1-O(\epsilon))(1-\delta_2)}{(1+\delta_1)(1+O(\epsilon))}\,R_{LR}(S^*) \le R_{LR}(S^h). \]
Finally, to conclude the proof and argue that this yields a $(1-O(\epsilon))$-approximate solution, we need to show that the factor $\frac{(1-\delta_1)(1-O(\epsilon))(1-\delta_2)}{(1+\delta_1)(1+O(\epsilon))}$ is not too far from 1. This holds since $\delta_1 = O(\epsilon)$ and $\delta_2 = O(\epsilon)$. Thus
\[ \frac{(1-\delta_1)(1-O(\epsilon))(1-\delta_2)}{(1+\delta_1)(1+O(\epsilon))} \ge 1 - O(\epsilon). \]
We have thus proven that our algorithm returns an assortment $S^h$ that is a $(1-O(\epsilon))$-optimal solution to our assortment problem:
\[ R_{LR}(S^*)(1-O(\epsilon)) \le R_{LR}(S^h) \le R_{LR}(S^*). \tag{12} \]
Running time
We try a total of $\prod_{k\in[K]} O\big(\log(nV_k/v_k)/\epsilon\big) = O\big((\log(nV/v)/\epsilon)^K\big)$ guesses for $h$ (where $V := \max_{k\in[K]} V_k$ and $v := \min_{k\in[K]} v_k$). For each guess, we formulate a dynamic program with $O\big((n/\epsilon)^K nK\big)$ run-time. Consequently, the running time of the algorithm is
\[ O\Big(\frac{n^{K+1}K}{\epsilon^{2K}}\,\log^K(nV/v)\Big), \]
which is polynomial in the input size $n$ and $1/\epsilon$ for any fixed $K$.
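The guess count can be checked by direct enumeration. The sketch below (ours; it simplifies by using one common range $[v, V]$ for all $K$ coordinates) builds the full grid of candidate vectors $h$ and confirms its size is $(T+1)^K$ with $T = O(\log(V/v)/\epsilon)$:

```python
import math
from itertools import product

def guess_grid(v_min, V_max, eps, K):
    """All candidate h = (v_min * (1+eps)^{t_k})_{k in [K]} covering [v_min, V_max]."""
    T = math.ceil(math.log(V_max / v_min) / math.log(1.0 + eps))
    points = [v_min * (1.0 + eps) ** t for t in range(T + 1)]
    return list(product(points, repeat=K))

grid = guess_grid(v_min=1.0, V_max=8.0, eps=0.25, K=2)
# (T+1)^K guesses: exponential in K, but polynomial in n and 1/eps for fixed K.
print(len(grid))
```

Running the dynamic program once per element of this grid and keeping the best assortment is exactly the enumeration that the bound above accounts for.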
References

[1] Joeran Beel, Zeljko Carevic, Johann Schaible, and Gabor Neusch. RARD: The related-article recommendation dataset, 2017.
[2] Felix Beierle, Akiko Aizawa, and Joeran Beel. Exploring choice overload in related-article recommendations in digital libraries, 2017.
[3] Moshe E. Ben-Akiva and Steven R. Lerman. Discrete Choice Analysis: Theory and Application to Travel Demand, volume 9. MIT Press, 1985.
[4] Gerardo Berbeglia, Agustín Garassino, and Gustavo Vulcano. A comparative empirical study of discrete choice models in retail operations. Available at SSRN 3136816, 2018.
[5] Jose Blanchet, Guillermo Gallego, and Vineet Goyal. A Markov chain approximation to choice modeling. Operations Research, 64(4):886–905, 2016.
[6] Hakjin Chung, Hyun-Soo Ahn, and Stefanus Jasin. On (re-scaled) multi-attempt approximation of customer choice model and its application to assortment optimization. Ross School of Business Paper, 1(1322), 2016.
[7] James Davis, Guillermo Gallego, and Huseyin Topaloglu. Assortment planning under the multinomial logit model with totally unimodular constraint structures, 2013.
[8] James M. Davis, Guillermo Gallego, and Huseyin Topaloglu. Assortment optimization under variants of the nested logit model. Operations Research, 62(2):250–273, 2014.
[9] Antoine Désir, Vineet Goyal, Srikanth Jagabathula, and Danny Segev. Assortment optimization under the Mallows model. In Advances in Neural Information Processing Systems, pages 4700–4708, 2016.
[10] Antoine Désir, Vineet Goyal, and Jiawei Zhang. Near-optimal algorithms for capacity constrained assortment optimization. Available at SSRN 2543309, 2014.
[11] Vivek F. Farias, Srikanth Jagabathula, and Devavrat Shah. A nonparametric approach to modeling choice with limited data. Management Science, 59(2):305–322, 2013.
[12] Guillermo Gallego, Garud Iyengar, R. Phillips, and Abha Dubey. Managing flexible products on a network. Available at corc.ieor.columbia.edu, 2004.
[13] Sheena S. Iyengar and Mark R. Lepper. When choice is demotivating: Can one desire too much of a good thing? Journal of Personality and Social Psychology, 79(6):995, 2000.
[14] S. Jagabathula and P. Rusmevichientong. A two-stage model of consideration set and choice: Learning, revenue prediction, and applications. Working paper, 2013.
[15] Srikanth Jagabathula and Gustavo Vulcano. A partial-order-based model to estimate individual preferences using panel data. Management Science, 2017.
[16] R. Duncan Luce. Individual Choice Behavior: A Theoretical Analysis. Courier Corporation, 2005.
[17] Daniel McFadden. Modeling the choice of residential location. Transportation Research Record, 1(673), 1978.
[18] Shashi Mittal and Andreas S. Schulz. A general framework for designing approximation schemes for combinatorial optimization problems with many objectives combined into one. Operations Research, 61(2):386–397, 2013.
[19] Kameng Nip, Zhenbo Wang, and Zizhuo Wang. Assortment optimization under a single transition model. Available at SSRN 2916110, 2017.
[20] Robin L. Plackett. The analysis of permutations. Applied Statistics, pages 193–202, 1975.
[21] Paat Rusmevichientong, David Shmoys, Chaoxu Tong, and Huseyin Topaloglu. Assortment optimization under the multinomial logit model with random choice parameters. Production and Operations Management, 2014.
[22] Benjamin Scheibehenne, Rainer Greifeneder, and Peter M. Todd. Can there ever be too many options? A meta-analytic review of choice overload. Journal of Consumer Research, 37(3):409–425, 2010.
[23] Kalyan Talluri and Garrett van Ryzin. Revenue management under a general discrete choice model of consumer behavior. Management Science, 50(1):15–33, 2004.
[24] Garrett van Ryzin and Gustavo Vulcano. An expectation-maximization method to estimate a rank-based choice model of demand. Operations Research, 65(2):396–407, 2017.
[25] Ruxian Wang and Ozge Sahin. The impact of consumer search cost on assortment planning and pricing. Management Science, 64(8):3649–3666, 2017.
[26] Ruxian Wang and Zizhuo Wang. Consumer choice models with endogenous network effects. Management Science, 2016.
[27] Huw C. W. L. Williams. On the formation of travel demand models and economic evaluation measures of user benefit. Environment and Planning A, 9(3):285–344, 1977.
[28] Dan Zhang and William L. Cooper. Revenue management for parallel flights with customer-choice behavior. Operations Research, 53(3):415–431, 2005.