Optimal Search and Discovery

Abstract

This paper studies a search problem where a consumer is initially aware of only a few products. At every point in time, the consumer then decides between searching among alternatives he is already aware of and discovering more products. I show that the optimal policy for this search and discovery problem is fully characterized by tractable reservation values. Moreover, I prove that a predetermined index fully specifies the purchase decision of a consumer following the optimal search policy. Finally, a comparison highlights differences to classical random and directed search.


Rafael P. Greminger†, September 21, 2020 (arXiv, econ.TH)

Abstract

This paper studies a search problem where a consumer initially is aware of only a few products. To find a good match, the consumer sequentially decides between searching among alternatives he is already aware of and discovering more products. I show that the optimal policy for this search and discovery problem is fully characterized by tractable reservation values. Moreover, I prove that a predetermined index fully specifies the purchase decision of a consumer following the optimal search policy. Finally, a comparison highlights differences to classical random and directed search.

In many settings, consumers have limited information and first need to search for product information before being able to compare alternatives. The resulting search frictions have received considerable attention in the literature. Under the rational choice paradigm, the analysis of such settings relies on optimal search policies that describe how a consumer with limited information optimally searches among the available alternatives. I add to this literature by developing and solving a sequential search problem that introduces a novel aspect: limited awareness.

To fix ideas, consider a consumer looking to buy a mobile phone. Through advertising or recommendations from friends, the consumer initially is aware of a single available phone and has [...]

∗ This paper previously was circulated under the title "Optimal Search and Awareness Expansion".

† I am deeply grateful to my advisors, Tobias Klein and Jaap Abbring, for their thoughtful guidance and support. I also thank Bart Bronnenberg, Nikolaus Schweitzer, Nicola Pavanini and various members of the Structural Econometrics Group in Tilburg for excellent comments and advice. Finally, I thank the Netherlands Organisation for Scientific Research (NWO) for financial support through Research Talent grant 406.18.568. Tilburg University, Department of Econometrics & Operations Research, [email protected].

Footnote: For example, Stigler (1961); Diamond (1971); Burdett and Judd (1983); Anderson and Renault (1999); Kuksov (2006); Choi et al. (2018); Moraga-González et al. (2017a,b) study search frictions in equilibrium models and Hortaçsu and Syverson (2004); Hong and Shum (2006); De Los Santos et al. (2012); Bronnenberg et al. (2016); Chen and Yao (2017); Zhang et al. (2018); Jolivet and Turon (2019) study implications of search empirically.

Figure 1 – Example of a choice sequence in the search and discovery problem.

The "search and discovery problem" introduced in this paper formalizes the consumer's dynamic decision process in this and similar settings. It nests classical random and directed sequential search as special cases but generalizes both by introducing limited awareness. In random search, a searcher has no prior information, searches randomly across alternatives and decides when to end search (e.g. McCall, 1970; Lippman and McCall, 1976). In directed search, the searcher is aware of all available alternatives and uses partial product information to determine an order in which to inspect products and when to end search (e.g. Weitzman, 1979; Chade and Smith, 2006). In contrast, in the search and discovery problem, the consumer is aware of only a few products. Hence, he not only decides in what order to inspect products and when to end search, but also when to try to discover more alternatives.

The resulting framework makes it possible to study settings that are difficult to accommodate in existing search problems. In particular, neither random nor directed search is well suited to study settings where rational consumers remain oblivious to some products while obtaining only partial information on others. However, such settings are common in practice. For example, in markets with a large number of alternatives, consumers remain unaware of many alternatives unless they actively set out to discover products beyond those they are already aware of. Similarly, in markets where rapid technological innovations lead to a constant stream of newly available alternatives, few consumers are aware of new releases without exerting effort to remain informed. Moreover, consumers also face a search and discovery problem when buying products online. Online retailers and search intermediaries present alternatives on a product list that reveals partial information only for some products.
Consumers then decide between clicking on products to reveal full information, and browsing further along the list to discover more products.

The contribution of this paper is to show that despite its complexity, optimal search decisions and outcomes in the search and discovery problem remain tractable. First, I prove that the optimal policy is fully characterized by reservation values similar to the well-known reservation prices derived by Weitzman (1979). In each period, a reservation value is assigned to each available action, and it is optimal to always choose the action with the largest value. Each of the reservation values is independent of any other available action and can be calculated without having to consider expectations over a myriad of future periods. Hence, reservation values remain tractable. This makes it possible to determine optimal search behavior under limited awareness without using numerical methods.

Second, I prove that the purchase of a consumer solving the search and discovery problem is equivalent to the same consumer having full information and directly choosing products from a predetermined index. This result generalizes the "eventual purchase theorem" derived independently by both Choi et al. (2018) and Armstrong (2017) to the case of limited awareness. Similar to the eventual purchase theorem, my generalization makes it possible to derive a consumer's expected payoff and market demand without having to consider a multitude of possible choice sequences that otherwise make aggregation difficult.

To provide further details on how the search and discovery problem relates to existing search frameworks, I compare a consumer's stopping decisions, expected payoff and the resulting market demand with classical random and directed sequential search. The comparison highlights several differences and reveals how limited awareness and the availability of partial product information determine search outcomes.
Throughout, the comparison focuses on the case where consumers discover products one at a time.

An immediate difference between the three search problems is that having two distinct search actions poses a novel question: Do consumers benefit more from making it easier to discover more alternatives (e.g. through search intermediaries), or from facilitating inspection by more readily providing detailed product information? I show that when the number of available alternatives exceeds a (possibly small) threshold, the expected payoff increases more when facilitating discovery instead of facilitating inspection. This reflects that discovery costs become more important as the number of alternatives grows.

Footnote: Choi et al. (2018) note that "Our eventual purchase theorem was anticipated by Armstrong and Vickers (2015) and has been independently discovered by Armstrong (2017) and Kleinberg et al. (2017)."

Footnote: A reduction in inspection costs can be more beneficial when more than one product is discovered at a time. In this case, the consumer will, in expectation, inspect more products, making inspection costs relatively more important.

[...] If a seller's marketing efforts make more consumers aware of a product before search or increase the probability of the product being discovered early on, ranking effects directly imply that they will increase the demand.

A directed search problem does not entail a similar mechanism. As consumers are aware of all products from the outset, advertising in the sense of informing consumers about the existence of a product cannot directly be considered in directed search. Nonetheless, a directed search problem can also generate ranking effects by assuming differences in inspection costs. I show that, in contrast to the search and discovery problem, this implies that ranking effects increase in the number of available alternatives, as well as in a product's position. Moreover, the analysis reveals that ranking effects generated through differences in inspection costs are sensitive to a cost specification that can be difficult to interpret in practice.

Both a search and discovery and a random search problem imply that consumers may not buy products they initially are unaware of. However, in random search the consumer either has full or no information about a product. He therefore cannot use partial information to decide whether to inspect a product. I show that this reduces the benefits of continuing search and the consumer's expected payoff when total costs of revealing full product information remain the same. This shows that consumers discover more products and are better off when discovering products on a product list that reveals only partial information instead of inspecting each product in detail before discovering the next.

Footnote: This relates to the "informative view" of advertising. See e.g. Bagwell (2007) for a summary and comparison to the "persuasive view".

The search and discovery (SD) problem nests several existing sequential search problems. Most notable is Pandora's problem introduced by Weitzman (1979), which results when the consumer initially is aware of all alternatives or cannot become aware of more alternatives.

To prove the optimality of the reservation value policy, I use a different approach than early contributions to search problems. Instead of directly proving that following the reservation value policy maximizes expected total payoff, I use results from the multi-armed bandit literature to first determine that a Gittins index policy is optimal, and then introduce a monotonicity condition to show that the Gittins index reduces to simple reservation values. Specifically, I use the results of Keller and Oldale (2003), who proved that a Gittins index policy is optimal in their branching bandits framework. This framework differs from the standard multi-armed bandit problem in that taking an action will reveal information on multiple other actions. However, as an action branches off into new actions and reveals information only on those, the state of other available actions is never altered. Hence, the important independence assumption continues to hold.

Footnote: Others include classical stopping problems such as those considered in McCall (1970) or Lippman and McCall (1976).

Footnote: Gittins et al. (2011) provide a textbook treatment of multi-armed bandit problems and the Gittins index policy. As purchasing a product ends search, search problems correspond to stoppable superprocesses as introduced by Glazebrook (1979).

Besides, as the SD problem nests both random and directed search, it also relates to the growing empirical literature that models search based on these search problems (e.g. Honka, 2014; Chen and Yao, 2017; Ursu, 2018).

A risk-neutral consumer with unit demand faces a market offering a (possibly infinite) number of products gathered in set $J$. Alternatives are heterogeneous with respect to their characteristics. The consumer has preferences over these characteristics which can be expressed in a utility ranking. To simplify exposition and facilitate a comparison to existing models from the consumer search literature (e.g. Armstrong, 2017; Choi et al., 2018), I assume that the consumer's ex post utility when purchasing alternative $j$ is given by

$$u(x_j, y_j) = x_j + y_j \tag{1}$$

where $x_j$ and $y_j$ are valuations derived from two distinct sets of characteristics. Note, however, that the results presented continue to hold for more general specifications that do not rely on linear additive utility. An outside option of aborting search without a purchase, offering $u_0$, is available.

Footnote: Koulayev (2014) solves the dynamic decision problem using numerical backwards induction. For the case where costs are increasing in time (which is the case in his results), the present results suggest that a simple index policy also characterizes the optimal policy for his model.

Footnote: The problems with infinitely many arms in a multi-armed bandit problem discussed by Banks and Sundaram (1992) do not arise in the present setting.

The consumer has limited information on available alternatives. More specifically, in periods $t = 0, 1, \dots$ the consumer knows both valuations $x_j$ and $y_j$ only for products in a consideration set $C_t \subseteq J$. For products in an awareness set $S_t \subseteq J$, the consumer only knows partial valuations $x_j$. This captures the notion that if the consumer is aware of a product, he has received some information on the total valuation of the product. Finally, the consumer has no information on any other product $j \in J \setminus (S_t \cup C_t)$.

During search, the consumer gathers information by sequentially deciding which action to take starting from period $t = 0$.
If the consumer decides to discover more products, $n_d$ alternatives are added to the awareness set. If fewer than $n_d$ alternatives have not yet been revealed, only the remaining alternatives are revealed. For each of the $n_d$ alternatives, the partial valuation $x_j$ is revealed. To reveal the remaining characteristics of a product $j$, summarized in $y_j$, the consumer has to inspect the product. This reveals full information on the product and moves it from the awareness into the consideration set. The latter implies $S_t \cap C_t = \emptyset$.

The order in which products are discovered is tracked by positions $h_j \in \{0, 1, \dots\}$, where a smaller position indicates that a product is discovered earlier, and $h_j = 0$ implies either $j \in C_0$ or $j \in S_0$. Without loss of generality, it is assumed that products are discovered in increasing order of their index.

Two precedence constraints on the consumer's actions are imposed. First, the consumer can only buy products from the consideration set. Second, the consumer can only inspect products from the awareness set. Whereas the first constraint is inherent in most search problems and implies that a product cannot be bought before having obtained full information on it, the latter is novel to the proposed search problem. It implies that a product cannot be inspected unless the consumer is aware of it. In an online setting where a consumer browses through a list of products, this constraint holds naturally: Individual product pages are reached by clicking on the respective link on the list. Hence, unless a product has been revealed on the list, it cannot be clicked on. In other environments, this precedence constraint reflects that, unless a consumer [...]

Footnote: Specifically, suppose that when the consumer becomes aware of alternative $j$, he reveals a signal on the distribution from which the utility of $j$ will be drawn. Appropriately defining the distribution of signals and the distribution of utilities conditional on these signals then yields an equivalent search problem.
Footnote: Note that in equilibrium settings, the order may be determined by sellers' actions, requiring a careful analysis of how these will determine the consumer's beliefs. For example, in online settings it is common for sellers to bid on the position at which their product adverts are shown (see e.g. Athey and Ellison, 2011).

Footnote: Doval (2018) is a notable exception.

i) [...] $C_t$ and end search.
ii) Inspecting any product from the awareness set $S_t$, thus revealing $y_j$ for that product and adding it to the consideration set.
iii) Discovering $n_d$ additional products, thus revealing their partial valuations $x_j$ and adding them to the awareness set.

The distinction between inspecting and discovering products is novel in the SD problem. The two actions differ in three important ways. First, whereas the consumer can use product-specific information to decide the order in which to inspect products from the awareness set, the decision whether to discover more products is based solely on beliefs over products that may be discovered. Second, if $n_d > 1$, discovering products reveals information on multiple products. Finally, discovering products adds them to the awareness set, whereas inspecting a product moves it into the consideration set. In combination with the precedence constraints, this implies that the actions that are available in the next period differ.

These actions are gathered in the set of available actions, $A_t = C_t \cup S_t \cup \{d\}$, where $d$ indicates discovery. If a consumer chooses an action $a = j \in C_t$, he buys product $j$, whereas if he chooses an action $a = j \in S_t$, he inspects product $j$. To clearly differentiate between the different types of actions, this set can also be written as $A_t = \{b_j : j \in C_t\} \cup \{s_j : j \in S_t\} \cup \{d\}$, where $b_j$ indicates purchasing and $s_j$ inspecting product $j$.

Both inspecting a product and discovering more products are costly. Inspection and discovery costs are denoted by $c_s > 0$ and $c_d > 0$ respectively.
These costs can be interpreted as the cost of mental effort necessary to evaluate the newly revealed information, or an opportunity cost of the time spent evaluating the new information. In line with this interpretation, I assume that there is free recall: Purchasing any of the products from the consideration set does not incur costs, and $c_s$ is the same for inspecting any of the products in the awareness set.

The consumer has beliefs over the products that he will discover, as well as the valuation he will reveal when inspecting a product $j$. In particular, $x_j$ and $y_j$ are independent (across $j$) realizations from random variables $X$ and $Y$, where the consumer has beliefs over their joint distribution. This implies that the consumer believes that in expectation, products are equivalent. A generalization where the distribution of $X$ depends on index $j$ is discussed in Section 3. Note that throughout, capital letters are used for random variables, lower case letters are used for the respective realizations and bold letters indicate vectors.

Footnote: Note that the latter two points also imply that products that the consumer is not aware of cannot be modeled as a set of ex ante homogeneous products that differ in terms of beliefs and associated costs from the products in the awareness set.

The consumer also has beliefs over the total number of available alternatives. I assume that the consumer believes that with constant probability $q \in [0, 1]$, the next discovery will be the last. As shown in the next section, the optimal policy is independent of the number of remaining discoveries that may be available in the future. Note, however, that this belief specification implicitly assumes that the consumer always knows whether he can reveal $n_d$ more alternatives. An extension presented in Section 3 covers the case where the consumer does not know how many alternatives will be revealed.

All information the consumer has in period $t$ is summarized in the information tuple $\Omega_t = \langle \bar\Omega, \omega_t \rangle$.
The tuple $\bar\Omega = \langle u(x, y), n_d, c_d, c_s, G_X(x), F_{Y|X=x}(y), q \rangle$ represents the consumer's knowledge and beliefs on the setting. It contains the utility function, how many products are discovered, and the different costs. It also contains the consumer's beliefs summarized in the probability $q$ and the cumulative distribution functions $G_X(x)$ and $F_{Y|X=x}(y)$. The latter specifies the cumulative distribution of $Y$, conditional on the realization of $X$, which is observed by the consumer before choosing to inspect a product. As a short-hand notation, I use $G(x)$ and $F(y)$ for these distributions. As a regularity condition, it is assumed that both $G(x)$ and $F(y)$ $\forall x$ have finite mean and variance.

During search, the consumer reveals valuations $x_j$ and $y_j$ for the various products. This information is tracked in the set $\omega_t$, containing realizations $x_j$ for $j \in S_t \cup C_t$ and $y_j$ for $j \in C_t$. The set of available actions $A_t$ and the information tuple $\Omega_t$ capture the state in $t$. The consumer's initial information on the alternatives is captured in $\omega_0$, which contains the (partial) valuations of products in the initial awareness and consideration sets. Figure 2 shows their transitions starting from period $t = 0$. The depicted example assumes that there are only two alternatives available and that products are discovered one at a time. If the consumer initially chooses the outside option ($b_0$), no new information is revealed, and no further actions remain. If the consumer instead reveals the first alternative, he can inspect it in $t = 1$.

Footnote: Note that one can translate beliefs over a specific number of available alternatives to this probability by assuming it varies during search. For example, if $n_d = 1$ and the consumer believes that there are 3 alternatives in total, then $q_t = 0$ when the consumer has not yet discovered the second alternative and $q_t = 1$ otherwise. A specification like this (and any specification where $q_t \le q_{t+1}$ $\forall t$) also satisfies the monotonicity condition (30) presented in Appendix C. Consequently, if it is assumed that the consumer knows $|J|$, monotonicity continues to hold.

Figure 2 – Transition of state variables $\Omega_t$ (information tuple) and $A_t$ (set of available actions) for $n_d = 1$ and $|J| = 2$.

The setting above describes a dynamic Markov decision process, where the consumer's choice of action determines the immediate rewards, as well as the state transitions. The state in $t$ is given by $\Omega_t$ and $A_t$. As the valuations $x_j$ and $y_j$ can take on any (finite) real values, the state space in general is infinite. Time $t$ itself is not included in the state; given $A_t$ and $\Omega_t$, it is irrelevant to the agent's choice, because beliefs (over valuations and termination of discovery) are time invariant.

The consumer's problem consists of finding a feasible sequential policy which maximizes the expected payoff of the whole decision process. A feasible sequential policy selects an action $a_t \in A_t$ given information in $\Omega_t$ in each period $t$. Let $\Pi$ denote the set containing all feasible policies. Formally, the consumer solves the following dynamic programming problem

$$\max_{\pi \in \Pi} V(\Omega_0, A_0; \pi) \tag{2}$$

where $V(\Omega_t, A_t; \pi)$ is the value function defined as the expected total payoff of following policy $\pi$ starting from the state in $t$.
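The state transitions summarized in Figure 2 can be sketched in code. The snippet below is a minimal illustration rather than the paper's implementation: the names `State`, `available_actions` and `step` are hypothetical, products are discovered one at a time ($n_d = 1$), valuations are left out, and the discover action is assumed to be available only while undiscovered products remain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    """Hypothetical container for the sets that determine A_t when n_d = 1."""
    consideration: frozenset   # C_t: x_j and y_j known, product can be bought
    awareness: frozenset       # S_t: only x_j known, product can be inspected
    undiscovered: tuple        # products not yet discovered, in discovery order
    done: bool = False         # True once a purchase has ended search


def available_actions(s):
    """Actions in A_t = C_t ∪ S_t ∪ {d}, written as (kind, product) pairs."""
    if s.done:
        return []
    actions = [("buy", j) for j in sorted(s.consideration)]
    actions += [("inspect", j) for j in sorted(s.awareness)]
    if s.undiscovered:
        actions.append(("discover", None))
    return actions


def step(s, action):
    """Apply one action and return the next state."""
    if action not in available_actions(s):
        raise ValueError(f"action {action} violates a precedence constraint")
    kind, j = action
    if kind == "buy":
        # A purchase ends search: no actions remain afterwards.
        return State(s.consideration, s.awareness, s.undiscovered, done=True)
    if kind == "inspect":
        # Inspection moves product j from the awareness to the consideration set.
        return State(s.consideration | {j}, s.awareness - {j}, s.undiscovered)
    # Discovery adds the next undiscovered product to the awareness set.
    nxt, rest = s.undiscovered[0], s.undiscovered[1:]
    return State(s.consideration, s.awareness | {nxt}, rest)
```

Starting from `State(frozenset({0}), frozenset(), (1, 2))`, where product 0 stands for the outside option, a product can only be inspected after it has been discovered and only be bought after it has been inspected, which is exactly what the two precedence constraints require.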
Let

$$[B_a V](\Omega_t, A_t; \pi) = R(a) + E_t\left[ V(\Omega_{t+1}, A_{t+1}; \pi) \mid a \right] \tag{3}$$

denote the Bellman operator, where the immediate rewards $R(a)$ either are inspection costs, [...] $j$ if it is bought. Immediate rewards $R(a)$ therefore are known for all available actions. $E_t[V(\Omega_{t+1}, A_{t+1}; \pi) \mid a]$ denotes the expected total payoff over the whole future, conditional on policy $\pi$ and having chosen action $a$. The expectations operator integrates over the respective distributions of $X$ and $Y$. A purchase in $t$ ends search such that $A_{t+1} = \emptyset$ and $E_t[V(\Omega_{t+1}, \emptyset; \pi) \mid a] = 0$ whenever $a \in C_t$. The corresponding Bellman equation is given by

$$V(\Omega_t, A_t; \pi) = \max_{a \in A_t} [B_a V](\Omega_t, A_t; \pi) \tag{4}$$

Footnote: An exception is when $x_j$ and $y_j$ are drawn from discrete distributions, which limits the number of possible valuations that can be observed.

The optimal policy for the SD problem is fully characterized by three reservation values. In what follows, I first define these reservation values, before stating the main result. At the end of this section, I discuss possible extensions based on a monotonicity condition, as well as limitations.

As in Weitzman (1979), suppose there is a hypothetical outside option offering utility $z$. Furthermore, suppose the consumer faces the following comparison of actions: Immediately take the outside option, or inspect a product with known $x_j$ and end search thereafter. In this decision, the consumer will choose to inspect alternative $j$ whenever the following holds:

$$Q_s(x_j, c_s, z) \equiv E_Y\left[ \max\{0, x_j + Y - z\} \right] - c_s \ge 0 \tag{5}$$

$Q_s(x_j, c_s, z)$ defines the expected myopic net gain of inspecting product $j$ over immediately taking the outside option. If the realization of $Y$ is such that $x_j + y_j \le z$, the consumer takes the hypothetical outside option after inspecting $j$ and the gain is zero. When $x_j + y_j > z$, the gain over immediately taking the hypothetical outside option is $x_j + y_j - z$.
The expectation operator $E_Y[\cdot]$ integrates over these realizations.

The search value of product $j$, denoted by $z^s_j$, then is defined as the value offered by a hypothetical outside option that makes the consumer indifferent in the above decision problem. Formally, $z^s_j$ satisfies

$$Q_s(x_j, c_s, z^s_j) = 0 \tag{6}$$

which has a unique solution (see Lemma 1 in Adam, 2001). The search value can be calculated as

$$z^s_j = x_j + \xi \tag{7}$$

where $\xi$ solves $\int_{\xi}^{\infty} [1 - F(y)] \, dy - c_s = 0$ (see Appendix B).

Footnote: In this formulation of the problem, the consumer does not discount future payoffs. This is in line with the consumer search literature, which usually assumes a finite number of alternatives without discounting. However, it is straightforward to show that the results continue to hold if a discount factor $\beta < 1$ is introduced. In this case, the search and discovery values defined in the next section need to be adjusted accordingly.

The purchase value of product $j$, denoted by $z^b_j$, is defined as the utility obtained when buying product $j$:

$$z^b_j = u(x_j, y_j) \tag{8}$$

Based on reservation values given by (6) and (8), Weitzman (1979) showed that it cannot be optimal to inspect a product that does not offer the largest search value, or to stop when the largest remaining search value exceeds the largest purchase value. Hence, for given $S_t$ and $C_t$, it is optimal to always inspect and buy in decreasing order of search and purchase values. However, this rule does not fully characterize an optimal policy in the SD problem, as the consumer can additionally discover more alternatives.

For this additional action, a third reservation value based on a similar myopic comparison is introduced. Suppose the consumer faces the following comparison of actions: Take a hypothetical outside option offering $z$ immediately, or discover more products and then search among the newly revealed products.
The consumer will choose the latter whenever the following holds:

$$Q_d(c_d, c_s, z) \equiv E_{\mathbf{X}}\left[ V\left( \langle \bar\Omega, \omega(\mathbf{X}, z) \rangle, \{b_0, s_1, \dots, s_{n_d}\}; \tilde\pi \right) \right] - z - c_d \ge 0 \tag{9}$$

where $\omega(\mathbf{X}, z) = \{z, x_1, \dots, x_{n_d}\}$ denotes the information the consumer has after revealing the $n_d$ additional alternatives and $\tilde\pi$ is the policy that optimally inspects the $n_d$ discovered products. Note that, with some abuse of notation, product indices were adjusted to the reduced decision problem, such that $j = 0, 1, \dots, n_d$ indicates the hypothetical outside option and the newly revealed products.

$Q_d(c_d, c_s, z)$ defines the myopic net gain of discovering more products and optimally searching among them over immediately taking the outside option. It is myopic in the sense that it ignores the option to continue searching beyond the products that are discovered. In particular, note that $V(\langle \bar\Omega, \omega(\mathbf{X}, z) \rangle, \{b_0, s_1, \dots, s_{n_d}\}; \tilde\pi)$ is the value function of having an outside option offering $z$ and optimally inspecting alternatives for which partial valuations in $\mathbf{X}$ are known. Possible future discoveries and any products in $S_t$ or $C_t$ are excluded from the set of available actions in this value function. This implies that the discovery value does not depend on the consumer's beliefs over whether the next discovery will be the last. Finally, $E_{\mathbf{X}}[\cdot]$ defines the expectation operator integrating over the joint distribution of the partial valuations in $\mathbf{X}$. Formal details on the calculation of the expectations and the value function are provided in Appendix B.

As for the search value, let the discovery value, denoted by $z^d$, be defined as the value of the hypothetical outside option that makes the consumer indifferent in the above decision. Formally, $z^d$ is such that

$$Q_d(c_d, c_s, z^d) = 0 \tag{10}$$

which has a unique solution.
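To make the search and discovery values concrete, the following sketch computes both numerically. It is an illustration under assumptions that are not part of the paper: $Y \sim N(0, 1)$ is independent of $X \sim N(\mu_X, \sigma_X^2)$, products are discovered one at a time, and the function names are hypothetical. With $n_d = 1$, the value function in (9) reduces to $z + \max\{0, Q_s(X, c_s, z)\}$, which is the reduction the paper states in (12).

```python
import math

SQRT2 = math.sqrt(2.0)


def normal_pdf(a):
    return math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)


def expected_excess(a):
    """E[max(0, Y - a)] for Y ~ N(0, 1), i.e. the integral of 1 - F(y) over [a, inf)."""
    return normal_pdf(a) - a * 0.5 * math.erfc(a / SQRT2)


def bisect_decreasing(f, lo, hi, iters=100):
    """Root of a decreasing function f, assuming f(lo) > 0 > f(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def solve_xi(c_s):
    """xi solving E[max(0, Y - xi)] = c_s, so that z^s_j = x_j + xi as in (7)."""
    return bisect_decreasing(lambda a: expected_excess(a) - c_s, -50.0, 50.0)


def q_s(x, c_s, z):
    """Myopic net gain of inspecting a product with partial valuation x, as in (5)."""
    return expected_excess(z - x) - c_s


def q_d(c_d, c_s, z, mu_x=0.0, sigma_x=1.0, n=2000, width=8.0):
    """Q_d for n_d = 1: E_X[max(0, Q_s(X, c_s, z))] - c_d, by the trapezoid rule."""
    h = 2.0 * width / n
    total = 0.0
    for i in range(n + 1):
        e = -width + i * h                      # standardized value of X
        weight = 0.5 if i in (0, n) else 1.0    # trapezoid end-point weights
        gain = max(0.0, q_s(mu_x + sigma_x * e, c_s, z))
        total += weight * gain * normal_pdf(e)
    return total * h - c_d


def discovery_value(c_d, c_s, mu_x=0.0, sigma_x=1.0):
    """z^d solving Q_d(c_d, c_s, z^d) = 0, as in (10)."""
    return bisect_decreasing(lambda z: q_d(c_d, c_s, z, mu_x, sigma_x),
                             mu_x - 50.0, mu_x + 50.0)
```

Calling `solve_xi(c_s)` and `discovery_value(c_d, c_s)` for a few cost levels gives a quick check that both values fall as the respective costs rise. Bisection is used only because both $Q_s$ and $Q_d$ are decreasing in $z$; any one-dimensional root finder would do.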
In the case where $Y$ is independent of $X$, the discovery value can be calculated as

$$z^d = \mu_X + \Xi(c_s, c_d) \tag{11}$$

where $\mu_X$ denotes the mean of $X$ and $\Xi(c_s, c_d)$ solves (10) for an alternative random variable $\tilde{X} = X - \mu_X$. Further details for the calculation are provided in Appendix B.

Theorem 1 provides the first main result. It states that the optimal policy for the search problem reduces to three simple rules based on a comparison of the search, purchase and discovery values. In particular, the rules imply that in each period $t$, it is optimal to take the action with the largest reservation value defined in (6), (8), and (10). Hence, despite being fully characterized by myopic comparisons to a hypothetical outside option, these reservation values rank the expected payoffs of actions over all future periods.

Theorem 1.

Let $\tilde{z}^b(t) = \max_{k \in C_t} u(x_k, y_k)$ and $\tilde{z}^s(t) = \max_{k \in S_t} z^s_k$ denote the largest purchase and search values in period $t$. An optimal policy for the search and discovery problem is characterized by the following three rules:

Stopping rule: Purchase $j \in C_t$ and end search whenever $z^b_j = \tilde{z}^b(t) \ge \max\{\tilde{z}^s(t), z^d\}$.
Inspection rule: Inspect $j \in S_t$ whenever $z^s_j = \tilde{z}^s(t) \ge \max\{\tilde{z}^b(t), z^d\}$.
Discovery rule: Discover more products whenever $z^d \ge \max\{\tilde{z}^b(t), \tilde{z}^s(t)\}$.

The proof of Theorem 1 relies on results from the literature on multi-armed bandit problems, specifically the branching bandits framework of Keller and Oldale (2003). These authors show that in a multi-armed bandit problem where taking an action branches off into new actions, a Gittins index policy is optimal. Importantly, as an action branches off, it cannot be taken again in its original state. This ensures that available actions are independent in the sense that taking one does not alter the state of any other available action. The imposed precedence constraints, combined with the fact that the consumer cannot discover a product for a second time, imply the same branching structure in the SD problem, and the results of Keller and Oldale (2003) therefore imply that a Gittins index policy is optimal. Introducing a monotonicity condition, I then show that the Gittins index is equivalent to the simple reservation values defined above.

Based on Theorem 1, optimal search behavior can be analyzed using only (6), (8) and (10). Weitzman (1979) showed that search values decrease in inspection costs and increase if larger realizations $y_j$ become more likely through a shift in the probability mass of $Y$. The same applies to the discovery value. It decreases in discovery costs and increases if probability mass of $X$ is shifted towards larger values.
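The three rules of Theorem 1 amount to taking the action with the largest reservation value. A minimal sketch of that comparison follows; the function name `choose_action`, the dictionary inputs and the tie-breaking order (buy before inspect before discover) are assumptions made for illustration, since the theorem leaves ties open.

```python
NEG_INF = float("-inf")


def choose_action(purchase_values, search_values, z_d=None):
    """Return the action prescribed by the stopping, inspection and discovery rules.

    purchase_values: dict product -> z^b_j for j in C_t (includes the outside option).
    search_values:   dict product -> z^s_j for j in S_t (may be empty).
    z_d:             discovery value, or None if nothing is left to discover.
    """
    best_b = max(purchase_values, key=purchase_values.get)
    z_b = purchase_values[best_b]
    best_s = max(search_values, key=search_values.get) if search_values else None
    z_s = search_values[best_s] if search_values else NEG_INF
    z_disc = NEG_INF if z_d is None else z_d

    if z_b >= max(z_s, z_disc):      # stopping rule
        return ("buy", best_b)
    if z_s >= z_disc:                # inspection rule
        return ("inspect", best_s)
    return ("discover", None)        # discovery rule
```

For example, `choose_action({0: 0.0, 1: 0.4}, {2: 0.9}, 0.5)` selects inspecting product 2, because its search value exceeds both the best purchase value and the discovery value.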
The discovery value also depends on inspection costs and the conditional distribution of $Y$ through the value function; it decreases in inspection costs and increases if larger values of $Y$ are more likely.

To see the latter, consider the case where alternatives are discovered one at a time. In this case, the myopic net gain of discovering more products reduces to

$$Q_d(c_d, c_s, z) = E_X\left[ \max\{0, Q_s(X, c_s, z)\} \right] - c_d \tag{12}$$

For any $c'_s > c_s$, it holds that $Q_s(x, c'_s, z) \le Q_s(x, c_s, z)$ for all finite values of $x$ and $z$, implying that $Q_d(c_d, c'_s, z) \le Q_d(c_d, c_s, z)$ for all $z$. As $Q_d(c_d, c_s, z)$ is decreasing in $z$ (see Appendix A), it follows that the respective discovery values satisfy $z^{d\prime} \le z^d$.

Because the optimal policy is fully characterized by simple rules, optimal choices are straightforward to analyze for any given awareness and consideration sets. For example, consider a period $t$ where $\max\{z^d, \tilde{z}^s(t)\} < \tilde{z}^b(t)$ such that the consumer stops searching. When inspection costs are decreased sufficiently in this case, the inequality reverses and the consumer will instead either first discover more products, or inspect the best product from the awareness set.

For the reservation value policy of Theorem 1 to be optimal, the discovery value needs to fully capture the expected net benefits of discovering more products, including the option value of being able to continue discovering products. The monotonicity condition used in the proof of the theorem ensures that this holds. It states that the expected net benefits of discovering more products do not increase during search. Hence, whenever the consumer is indifferent between taking the hypothetical outside option and discovering more products in $t$, he will either continue to be indifferent or take the outside option in $t + 1$.
Whether the consumer can continue to discover products in $t + 1$ thus does not affect expected net benefits in $t$, and the discovery value fully captures the expected net benefits.

Footnote: For the search and purchase values, no monotonicity condition is required. This follows from the fact that in the independent comparison to the hypothetical outside option, both actions do not provide the option to continue searching. After buying a product, search ends, and after having inspected a product, the only option that remains is to either buy the product or choose the hypothetical outside option. Consequently, for inspection and purchase, at most one future period needs to be considered to fully capture the respective net benefits over immediately taking the outside option.

In the baseline SD problem, several assumptions directly imply that the monotonicity condition [...] $q$ remains constant and (iii) $n_d$ is known. However, these assumptions can be relaxed to capture a wider range of settings. Below, three related extensions are presented. Formal results and further details are presented in Appendix C.

Ranking in distribution:

In some settings, the consumer's beliefs are such that the distribution of partial valuations depends on the position at which a product is discovered. Monotonicity will be satisfied if beliefs are such that the mean of $X_j$ decreases in a product's position $h_j$, or more generally if beliefs are such that $X_j$ first-order stochastically dominates $X_k$ if $h_j \leq h_k$. The optimal policy then continues to be characterized by Theorem 1, the only difference being that the discovery value is based on the position-specific beliefs and decreases during search, making it optimal to recall products in some cases. This could result in a market environment where sellers of differentiated products compete in marketing efforts for consumers to become aware of their products early on. If sellers offering better valuations have a stronger incentive to be discovered first, they will increase marketing efforts. Consumers' beliefs then will reflect this ordering such that monotonicity holds and the simple optimal policy can be used to characterize equilibria. Similarly, online stores often use algorithms to first present products that consumers may like more. This again satisfies monotonicity such that the tractable optimal policy can be used to rationalize search behavior in click-stream data from such stores.

Unknown $n_d$: In other environments, a consumer may not know how many alternatives he will discover. For example, a consumer may believe that there are still alternatives he is not aware of and thus try to discover them, only to realize that he already is aware of all the available alternatives. In such cases, a belief over how many alternatives are going to be discovered needs to be specified. The reservation value policy continues to be optimal if these beliefs are such that monotonicity is satisfied. This will be the case if beliefs are constant, or if (more realistically) the consumer expects to discover fewer alternatives the more alternatives he already has discovered. The only difference to the baseline is that in $Q_d(c_d, c_s, z)$, expectations are additionally based on beliefs over how many alternatives will be revealed.

Multiple discovery technologies:

Consumers may also have multiple discovery technologies at their disposal. In an online setting, for example, each technology may represent a different online shop offering alternatives. Moreover, advertising measures may separate products into different product pools. In such settings, the consumer also decides which technology to use to discover more alternatives. By assigning each of the discovery technologies a different discovery [...]

Footnote: See, for example, the discussion on non-price advertising and the related references cited in Armstrong (2017).

Footnote: This would reflect the case where the consumer expects it to become harder to discover alternatives the fewer alternatives have not yet been discovered. Alternatively, this could be modeled as either $q$ or $c_d$ increasing with each discovery, which also satisfies monotonicity.

Though the optimal policy applies to a broad class of search problems, two limitations exist. The first is that in the dynamic decision process, all available actions need to be independent of each other; performing one action in $t$ should not affect the payoff of any other action that is available in $t$. This is required to guarantee that the reservation values fully capture the effects of each action. Recall that each reservation value does not depend on the availability of other actions. If independence does not hold, however, the availability of other actions also influences the expected payoff of an action. Choosing actions based only on reservation values that disregard these effects therefore will not be optimal. Appendix D presents alternative search problems that violate this independence assumption.

The second limitation is that the monotonicity condition discussed above needs to hold for the discovery value to be based on myopic net benefits.
If this condition does not hold, then the discovery value does not fully capture the expected net benefits of discovering more products. However, as long as independence of the available actions is satisfied, a Gittins index policy remains optimal (see proof of Theorem 1). Hence, the optimal policy when monotonicity fails consists of comparing the search and purchase values from equations (7) and (8) with the Gittins index value for discovery that explicitly accounts for future discoveries.

One interesting case where this fails is if the consumer learns about the distribution of $X$ or the number of alternatives he will discover during search. So far, it was assumed that independently of the information the consumer reveals during search, his beliefs remain unchanged. This will be the case if either the consumer has rational expectations and hence knows the underlying distributions, or simply does not update beliefs. With learning, the consumer updates his beliefs based on partial valuations or the number of products revealed in a discovery.

Footnote: An interesting extension for future research is to model the case where a consumer can choose the order in which products are revealed based on a product characteristic such as price. This requires modeling beliefs that reflect this ordering through updating the support of the price distribution; in an ascending order the minimum price that can be discovered needs to increase with every discovery. Chen and Yao (2017) incorporate choices of such search refinements in their empirical model. However, in their model, a consumer simultaneously decides on the refinement and which position to inspect. In contrast, if such choices are modeled as an SD problem, the consumer would sequentially decide between a discovery technology and whether to inspect a product. This is more closely done by De los Santos and Koulayev (2017), who also model sequential choice of search refinements and clicks, but use simplifying assumptions and do not derive the optimal policy.

Footnote: The SD problem is equivalent to these learning problems in the case where $c_s = 0$ and the consumer updates beliefs about the distribution of the random variable $X + Y$.

Similar learning models have been studied in the context of classic search (and stopping) problems, where the consumer learns about the distribution he is sampling from (e.g. Rothschild, 1974; Rosenfield and Shapiro, 1981; Bikhchandani and Sharma, 1996; Adam, 2001). Whereas these studies determine prior beliefs or learning rules such that the optimal policy is based on [...] $X$ or the number of alternatives he will discover. The reason is that in classic search problems a consumer reveals full information when inspecting a product. Hence, if a product turns out to be a good match, the value of stopping increases along with the value of continuing search, where the learning rule guarantees that the expected net benefits of continuing search over stopping with the current best option weakly decrease with each inspection. In contrast, in the SD problem, discovering either more or better partial valuations does not necessarily increase the value of the best option in the consideration set. For example, the consumer can discover many products that look very promising based on partial valuations, but after inspection realize that these products are a bad match after all.
In this case, the value of stopping remains the same, whereas beliefs are shifted such that the consumer expects to find better or more products in future discoveries.

Extending the SD problem to the case where the consumer learns about the distribution of $X$ or the number of alternatives therefore comes at the cost of losing tractability of the discovery value; a tractable expression for the Gittins index value for the discovery action (henceforth denoted by $z^L_t$) is difficult to obtain as it is necessary to determine the value function of a dynamic decision process that includes many future periods. Moreover, whereas the discovery value in Theorem 1 remains constant throughout search, $z^L_t$ changes whenever the consumer updates beliefs. Consequently, the optimal policy when the consumer updates beliefs becomes more complex in that the discovery value changes with each discovery and explicitly includes future periods.

Whereas $z^L_t$ is not tractable and computationally expensive to obtain, it is possible to derive bounds on this value that are easier to compute and can serve as an approximation. First, $z^L_t$ can be approximated from below through $k$-step look-ahead values. The 1-step look-ahead value is defined by (10), where the expectation operator is adjusted to account for the consumer's beliefs in $t$. As $k$ increases, more future discoveries are considered in (10), leading to a more precise approximation of $z^L_t$ up to the point where $z^L_t$ is calculated precisely. Second, a result of Kohn and Shavell (1974) can be used to derive an upper bound. These authors show that the expected value of continuing search when the consumer fully resolves uncertainty on the underlying distributions in the next period exceeds the true continuation value in a classic search problem where a consumer samples from an unknown distribution. The same logic directly applies in the extension to the SD problem and the upper bound then can be computed using the results provided in the next section.
A more formal treatment of these bounds is provided in Appendix E.

Footnote: See e.g. Theorem 1 in Rosenfield and Shapiro (1981).
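As a rough numerical illustration of the myopic (1-step) discovery value, the sketch below solves $Q_d(c_d, c_s, z) = 0$ from (12) for $z$ when products are discovered one at a time. Several ingredients are assumptions made here for illustration and are not taken from the text: $X$ and $Y$ are independent standard normal, the inspection gain is written as $Q_s(x, c_s, z) = E_Y[\max\{0, x + Y - z\}] - c_s$, the integral over $X$ uses a plain trapezoid rule, and all function names are hypothetical.

```python
import math


def normal_pdf(a):
    return math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)


def normal_sf(a):
    # Survival function 1 - Phi(a) of the standard normal.
    return 0.5 * math.erfc(a / math.sqrt(2.0))


def q_s(x, c_s, z):
    # Assumed form: E_Y[max{0, x + Y - z}] - c_s with Y ~ N(0, 1),
    # using the closed form E[max{0, Y - a}] = pdf(a) - a * sf(a).
    a = z - x
    return normal_pdf(a) - a * normal_sf(a) - c_s


def q_d(c_d, c_s, z, lo=-8.0, hi=8.0, n=2000):
    # Equation (12): E_X[max{0, Q_s(X, c_s, z)}] - c_d with X ~ N(0, 1),
    # approximated by the trapezoid rule on [lo, hi].
    step = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        x = lo + i * step
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * max(0.0, q_s(x, c_s, z)) * normal_pdf(x)
    return total * step - c_d


def discovery_value(c_d, c_s, lo=-20.0, hi=20.0, iters=60):
    # Q_d is decreasing in z, so bisection finds the z with Q_d = 0
    # (requires c_d > 0 so that a sign change exists on [lo, hi]).
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if q_d(c_d, c_s, mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Under these assumptions, calling `discovery_value` with larger inspection or discovery costs returns a smaller value, in line with the comparative statics discussed for the discovery value. With learning, the same routine would have to be rerun with the beliefs held in period $t$, which is only the lower-bound approximation and not $z^L_t$ itself.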

In an environment where consumers sequentially inspect products, a consumer's expected payoff and the market demand result from integrating over different possible choice sequences leading to eventual purchases. Conceptually, this poses a major challenge, as the number of possible choice sequences grows extremely fast in the number of available alternatives.

Theorem 2 makes it possible to circumvent this difficulty. It states that the purchase outcome of a consumer solving the search problem is equivalent to a consumer directly buying a product that offers the highest effective value. Importantly, a product's effective value does not depend on the various possible choice sequences leading to its purchase.

Theorem 2.

Let

$$w_j \equiv \begin{cases} u_j & \text{if } j \in C \\ \tilde{w}_j & \text{if } \tilde{w}_j < z^d \text{ or } j \in S \\ z^d + f(h_j) + \varepsilon \tilde{w}_j & \text{else} \end{cases}$$

be the effective value for product $j$ revealed on position $h_j$, where $\tilde{w}_j \equiv \min\{z^s_j, z^b_j\} = x_j + \min\{\xi, y_j\}$, $f(h_j)$ is a non-negative function that is strictly decreasing in $h_j$, and $\varepsilon$ is an infinitesimal. The solution to the search and discovery problem with initial consideration set $C$ and awareness set $S$ leads to the eventual purchase of the product with the largest effective value.

This result is based on and generalizes the "eventual purchase theorem" of Choi et al. (2018) to the case where the consumer has limited awareness. The value $\tilde{w}_j$ used in the theorem is equivalent to the effective value defined by Choi et al. (2018), and the proof follows the same logic; as a product (incl. the outside option) is always bought, the proof only needs to establish that the optimal policy never prescribes to buy a product that does not have the largest effective value.

The generalization to the case of limited awareness follows from the following implication of the optimal policy: Whenever both the inspection and the purchase value of a product in the awareness set exceed the discovery value, the consumer will buy the product and end search. Hence, when $\tilde{w}_j \geq z^d$, the consumer never discovers products on positions beyond $h_j$. This is captured in the effective values by the term $z^d + f(h_j)$, which ranks alternatives based on when [...] $\tilde{w}_j > \tilde{w}_k \geq z^d$ for two products discovered on the same position. Without the infinitesimal, the effective value would be $w_j = w_k$, implying the consumer would be indifferent between buying either of the two products.

Footnote: For example, with only one alternative and an outside option, there are four possible choice sequences. With two alternatives, the number of possible choice sequences increases to 20, and with three alternatives, there are already more than 100 possible choice sequences.
This contrasts with the optimal policy, which for $\tilde{w}_j > \tilde{w}_k$ will never prescribe to buy $k$ if both $j$ and $k$ are in the awareness set. If $n_d = 1$, the infinitesimal is not required.

The result continues to hold for extensions of the SD problem, as long as the discovery values are predetermined. The only difference then is that in the effective value of an alternative $j$, the discovery value depends on the position at which $j$ is revealed.

Based on these results, it is now possible to derive a simple characterization of a consumer's expected payoff, as summarized in Proposition 1. In this expression, the expected payoff does not explicitly depend on inspection and discovery costs; they affect the expected payoff only through the discovery and search values. As the proof shows, this follows from the definition of these values, which relate expected payoffs and costs (as in Choi et al., 2018). Based on this characterization, it is only necessary to derive the distribution of the effective values without having to explicitly consider different choice sequences. Note also that as the effective value is adjusted, the expected payoff does not depend on the choice of function $f(h)$ which ranks alternatives based on position in the effective value.

Proposition 1.

A consumer's expected payoff in the SD problem is given by

$$V(\Omega, A; \pi) = E_{\hat{W}}\left[\max_{j \in J} \hat{W}_j\right]$$

where $E_{\hat{W}}[\cdot]$ integrates over the distribution of the vector $\hat{W}$ that collects the random variables $\hat{W}_j$ for $j \in J$, with $\hat{w}_j$ being the effective value adjusted with $f(h_j) = \varepsilon = 0 \ \forall h_j$. If $|J| = \infty$, $V(\Omega, A; \pi) = z^d$.

Whereas it is clear that making either inspection or discovery easier leads to an increase in the expected payoff, it is not obvious which of these two changes is more beneficial for a consumer. For the case where $n_d = 1$, Proposition 2 shows that if the number of alternatives exceeds some threshold, then the consumer benefits more from facilitating the discovery of additional products.

Footnote: Note that this threshold can be zero. For example, this is the case when $u_0 = 0$, $c_s = 0.$[...] and $c_d = 0.$[...], and the valuations are drawn from standard normal distributions.

Proposition 2. If $n_d = 1$, there exists a threshold $n^*$ such that whenever $|J| > n^*$, a consumer benefits more from a decrease in discovery costs than a decrease in inspection costs. This threshold decreases in the value of the alternatives in the initial consideration and awareness set.

Whereas the proof is more involved, the intuition is that when there are only few alternatives available, the consumer is more likely to first discover all alternatives and then start inspecting alternatives. Hence in expectation, he pays the inspection costs relatively often and a reduction in inspection costs will be more beneficial. Similarly, when the value of the outside option is large, the consumer is likely to inspect fewer of the products he discovers, leading to relatively small benefits of a reduction in inspection costs.

For settings where $n_d > 1$, it becomes difficult to obtain similarly general results. In particular, for some distributions and $n_d$, it is possible that decreasing inspection costs increases the discovery value $z^d$ by more than decreasing the discovery costs by the same amount.
In such cases, the consumer will benefit more from making inspection less costly. Nonetheless, the general intuition remains the same in such settings; a reduction in inspection costs is more beneficial, the more likely it is that the consumer inspects relatively many alternatives.

Using Theorem 2, it is straightforward to derive a market demand function when heterogeneous consumers optimally solve the SD problem. In particular, let the effective value $w_{ij}$ for each consumer $i$ be a realization of the random variable $W_j$ and gather the random variables $W_j$, $j \in J$, in the vector $W$. For a unit mass of consumers the market demand for a product $j$ then is given by

$$D_j = E_h\left[P_W\left(W_j \geq W_k \ \forall k \in J \setminus j\right)\right] \tag{13}$$

where the expectations operator $E_h[\cdot]$ integrates over all permutations of the order in which products are discovered by a consumer.

As the effective value decreases in the position at which a product is discovered, (13) reveals that the demand for a product depends on the probability of each position at which it is displayed. Specifically, the demand for a product exhibits ranking effects; products that are more likely to be discovered early are more likely to be bought. As discussed in detail in the next section, this follows from the structure of the SD problem. As search progresses, it becomes less likely that a consumer has not yet settled for an alternative; hence, fewer consumers become aware of products that would be revealed later, leading to a lower demand for such products.

5 Comparison of Search Problems

To highlight how the SD problem differs from existing approaches, I compare it with the two classical sequential search problems: directed search as in Weitzman (1979) and random search as in Lippman and McCall (1976). Both these search problems are nested within the SD problem. Directed search results if the consumer initially has full awareness (i.e. $S = J$), or discovery costs are equal to zero. Random search results if inspection costs are equal to zero such that any product that is discovered will also be inspected. More generally, however, directed search differs in that the consumer initially knows all partial valuations and does not discover products, and in random search the consumer cannot use partial information to decide whether to inspect a product in more detail.

For clarity, I focus the comparison on the case where products are discovered one at a time ($n_d = 1$) and where the consumer initially only knows an outside option ($S = \emptyset$). Furthermore, valuations $x_j$ and $y_j$ are assumed to be realizations of mutually independent random variables $X$ and $Y$, where the consumer has rational expectations such that beliefs are correct. Assumptions specific to each search problem are described below.

Search and Discovery (SD):

The consumer searches as described in Section 2, incurring inspection costs $c_s$ and discovery costs $c_d$. Without loss of generality, I assume that the consumer discovers products in increasing order of their index, making subscripts for position $h$ and product $j$ interchangeable.

Random Search (RS):

When discovering a product $j$, the consumer reveals both $x_j$ and $y_j$, and hence does not have to pay a cost to inspect the product. Costs to reveal this information are given by $c_{RS}$. In this case, the consumer optimally stops and buys product $j$ if $x_j + y_j \geq z^{RS}$. The reservation value is given by $z^{RS} = \mu_X + \mu_Y + \tilde{\xi}$, where $\tilde{\xi}$ is the same as in (7) but defined over the joint distribution of demeaned $X$ and $Y$. Products are discovered in the same order as in SD. Furthermore, I assume $u_0 < z^{RS}$ to ensure a non-trivial case.

Directed Search (DS):

The consumer initially observes $x_j \ \forall j$, based on which he chooses to search among alternatives following Weitzman's (1979) reservation value policy. Costs to inspect product $j$ are given by a function $c^{DS}_j = v^{DS}(c_s, h_j)$, where $c_s$ are baseline costs that are adjusted for the position through a function $v^{DS}: \mathbb{R} \to \mathbb{R}_+$ which is assumed to be strictly increasing in a product's position $h_j$. As costs vary across products, reservation values are given by $z^s_j = x_j + \xi_j$, where $\xi_j$ is the same as in (7) with product-specific inspection costs. The assumption on $v^{DS}(c_s, h_j)$ implies that $\xi_j$ decreases in $j$. I impose this functional form restriction as otherwise the DS problem does not generate similar patterns, as discussed in Section 5.2.

5.1 Stopping decisions

In search settings, consumers' stopping decisions determine which products consumers consider and buy. Stopping decisions therefore shape how firms compete in prices, quality or for being discovered early during search. Hence, comparing stopping decisions across the different search problems provides important insights on how well existing approaches are able to capture the more general setting where consumers are not aware of all alternatives and use partial information to determine whether to inspect products.

In the SD problem, a consumer always stops search at a product $k$ whenever the product is both promising enough to be inspected and offers a large enough valuation to not make it worthwhile to continue discovering more products. Formally, this is given by the condition $x_k + \min\{y_k, \xi\} \geq z^d$.
The probability that a consumer will stop searching before discovering product $j$ therefore is given by

$$1 - P_{\mathbf{X},\mathbf{Y}}\left(X_k + \min\{Y_k, \xi\} \leq z^d \ \forall k < j\right) = 1 - P_{X,Y}\left(X + \min\{Y, \xi\} \leq z^d\right)^{j-1} \tag{14}$$

Similarly, in the RS problem, a consumer will always stop search at a product $k$ whenever $x_k + y_k \geq z^{RS}$, hence the probability of stopping search before discovering product $j$ is given by

$$1 - P_{\mathbf{X},\mathbf{Y}}\left(X_k + Y_k \leq z^{RS} \ \forall k < j\right) = 1 - P_{X,Y}\left(X + Y \leq z^{RS}\right)^{j-1} \tag{15}$$

In both search problems, a consumer may stop search before discovering a product $j$. Consequently, stopping decisions in the SD and the RS problem imply the same feature: Products that a consumer initially has no information on may never be discovered and bought, independent of how the consumer values them.

However, as the consumer has the option of not inspecting products with low partial valuations, the stopping probabilities differ. In particular, in the case where the total costs to reveal all information about a product are the same, stopping probabilities are smaller in the SD problem. This is stated in Proposition 3 and follows from the fact that not having to inspect alternatives with small partial valuations makes it possible to save on inspection costs. This increases the expected benefit of discovering more products, which implies a smaller probability of search stopping, and that on average, more products will be discovered in the SD problem.

Proposition 3.

If costs in the RS problem are given by $c_{RS} = c_s + c_d$, a consumer is less likely to stop before discovering any given product in the SD problem than he is in the RS problem.

In contrast, stopping decisions are different in the DS problem. As the consumer initially knows of the existence of all products and can order them based on partial information, there is no stopping decision in terms of discovering products. Instead, the consumer directly compares all partial valuations and the different inspection costs, based on which he decides the order in which to inspect products. Hence, he can directly inspect highly valued products even when they are presented at the last position.

This difference arises from the different assumptions on consumers' initial information and is paramount in the analysis of search frictions. Consider an equilibrium setting where horizontally differentiated alternatives are supplied by firms that compete by setting mean partial valuations (e.g. by setting prices as in Choi et al., 2018). If consumers are aware of all alternatives and search as in the DS problem, all firms will compete directly with each other. In contrast, in an SD problem, the firm that is discovered first initially competes only with the option of discovering potentially better products. This difference is further illustrated in Appendix G, and as it determines how firms compete, will lead to different equilibrium dynamics.

The above analysis already suggests that the demand structure differs across the three search problems. To provide further details, I focus on a particular pattern that is generated by all three search problems: Market demand for a product decreases in its position. Such ranking effects are important as they determine how fiercely sellers compete for their products to be revealed on early positions, for example through informative advertising or position auctions (e.g. Athey and Ellison, 2011).
Furthermore, they have received considerable attention in the marketing literature, which has produced ample empirical evidence that suggests their importance in online sales (e.g. Ghose et al., 2014; De los Santos and Koulayev, 2017; Ursu, 2018).

Footnote: To give an example, Anderson and Renault (1999) and Choi et al. (2018) model a similar environment, with the difference that in the former, consumers initially are not aware of any alternatives, whereas in the latter they are aware and observe prices of all alternatives. Whereas in the former, decreasing inspection costs lowers the equilibrium price in a symmetric equilibrium, the opposite holds in the latter environment.

To compare the mechanism producing ranking effects across the search problems, I use the following definition: The ranking effect for a product is the difference in market demand of the product being revealed at position $h$ and at $h + 1$, with the corresponding exchange of the product previously revealed at position $h + 1$. Formally, the ranking effect is defined as

$$r_k(h) \equiv d_k(h) - d_k(h + 1) \tag{16}$$

where $d_k(h)$ denotes the market demand for a product when revealed at position $h$ in search problem $k \in \{SD, RS, DS\}$. For clarity, product-specific subscripts are either omitted or exchanged with position subscripts in the following. The former is feasible as effective values are [...] $W$.

To investigate ranking effects, it is first necessary to derive the market demand at a particular position $h$. For a unit mass of consumers with independent realizations of effective values, it is given by

$$d_{SD}(h) = P_W(W < z^d)^{h-1}\left[P_W(W \geq z^d) + P_W(W < z^d)^{|J| - (h-1)} P_W\left(W \geq \max_{k \in J} W_k \,\middle|\, W_k < z^d \ \forall k\right)\right] \tag{17}$$

The expression follows from Theorem 2, which implies that if a consumer discovers a product with $w_j \geq z^d$, he will stop searching and buy product $j$.
The consumer will only discover and have the option to buy a product on position $h$ if $w_j < z^d$ for all products on better positions. In contrast, when $w_j < z^d$, the consumer will first discover more products, and only recall $j$ if he discovers all products and $j$ is the best among them.

In the latter case, a product's position does not affect market demand; once all products are discovered, products are equivalent in terms of their inspection costs and the order in which they are inspected is only determined based on partial valuations. This implies that the ranking effect in the SD problem is independent of the number of alternatives and simplifies to

$$r_{SD}(h) = P_W\left(W \geq z^d\right)\left[P_W(W < z^d)^{h-1} - P_W(W < z^d)^{h}\right] \tag{18}$$

This expression reveals that the ranking effect in the SD problem solely results from the difference in the probability of a consumer reaching positions $h$ or $h + 1$ respectively. Besides the distribution of valuations and the inspection and discovery costs, Proposition 4 shows that the ranking effect is determined by the position $h$ to which the product is moved. When $h$ is large, fewer consumers will not have already stopped searching before reaching $h$. Hence, the later a product is revealed, the smaller is the increase in demand when moving one position ahead.

The demand in a random search problem is derived similarly. In RS, a consumer will only be able to buy a product if he has not stopped searching before, which requires that $x + y < z^{RS}$ for all products on better positions. Furthermore, a consumer will also only recall a product if he has inspected all alternatives. Similar to the SD problem, this implies that the ranking effect in the RS problem is given by

$$r_{RS}(h) = P_{X,Y}\left(X + Y \geq z^{RS}\right)\left[P_{X,Y}\left(X + Y < z^{RS}\right)^{h-1} - P_{X,Y}\left(X + Y < z^{RS}\right)^{h}\right] \tag{19}$$

Comparing (18) with (19) reveals that ranking effects in the RS problem are produced by the same mechanism as in the SD problem.
In both search problems, fewer consumers buy products at later positions due to the increasing probability of having stopped searching before discovering these products. It follows that in both search problems, ranking effects decrease in the position and are independent of the total number of alternatives.

Though their extent generally differs, Proposition 4 additionally shows that at later positions, ranking effects will be larger in the SD problem. The result is a direct implication of Proposition 3; as a consumer is more likely to reach a product at a later position in the SD problem, ranking effects at later positions will be larger.

Proposition 4.

The ranking effect in both the SD and the RS problem decreases in position $h$ and is independent of the number of alternatives. Furthermore, if $c_{RS} = c_s + c_d$, there exists a threshold $h^*$ such that $r_{SD}(h) \geq r_{RS}(h)$ for all $h > h^*$.

Given the different stopping decisions, ranking effects in directed search do not result from consumers having stopped searching before reaching products revealed at later positions. Instead, they result from differences in the cost of inspecting products at different positions. To see this, write the ranking effect in the DS problem as

$$r_{DS}(h) = E_{\tilde{W}_h}\left[\prod_{k \neq h} P(\tilde{W}_k \leq \tilde{W}_h)\right] - E_{\tilde{W}_{h+1}}\left[\prod_{k \neq h+1} P(\tilde{W}_k \leq \tilde{W}_{h+1})\right] \tag{20}$$

This expression reveals that the ranking effect results from two sources in the DS problem. First, by moving a product $j$ one position ahead, the product previously on position $h$ is now more costly to inspect, making it more likely that $j$ is bought for any $\tilde{w}_j$. Second, by making it less costly to inspect $j$, the distribution of $\tilde{w}_j$ shifts such that larger values $\tilde{w}_j$ become more likely.

In contrast to RS and SD, the ranking effect in the DS problem depends on the number of available alternatives. In RS and SD, ranking effects result from the increasing probability of a consumer having stopped searching before reaching a particular position, which does not depend on how many alternatives there are in total. In DS, however, a consumer directly compares all alternatives based on partial valuations. Adding more alternatives thus will affect the demand on each position.

Specifically, Proposition 5 shows that ranking effects in the DS problem will be smaller if there are many alternatives. The reason is that as the number of alternatives increases, each product is less likely to be bought and differences in the position-specific market demand decrease. Note, however, that in cases where the probability of consumers buying products on the last positions is very small or exactly zero (e.g. when inspection costs are large), adding more alternatives will not affect ranking effects in the DS problem.

Proposition 5.

The ranking effect in the DS problem is weakly decreasing in the number of alternatives.
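For contrast with the DS case, the position dependence in (18) and (19) can be sketched numerically. In both expressions the ranking effect depends only on the probability of stopping at any single product and on the position $h$, not on $|J|$. The function name `ranking_effect` and the stopping probabilities used in the usage note are hypothetical and chosen purely for illustration.

```python
def ranking_effect(p_stop, h):
    """Ranking effect of the form in (18) and (19).

    p_stop: probability of stopping at a given product, e.g. P(W >= z^d) in SD
            or P(X + Y >= z^RS) in RS (illustrative input, not estimated)
    h:      position (1, 2, ...) to which the product is moved
    """
    p_continue = 1.0 - p_stop
    # Stopping probability times the drop in the probability of reaching the product.
    return p_stop * (p_continue ** (h - 1) - p_continue ** h)
```

With, say, a stopping probability of 0.2 in SD and 0.3 in RS (the ordering implied by Proposition 3 when $c_{RS} = c_s + c_d$), both sequences decrease in $h$, the RS effect is larger at early positions, and the SD effect becomes the larger one from some position onwards, which is the pattern stated in Proposition 4.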

A second difference to the RS and SD problems is that the ranking effect does not necessarily decrease in position. This is possible as there are two counteracting channels through which position affects the ranking effect in a DS problem. First, as there is lower demand for products at later positions, differences between them will be smaller. Second, if $v^{DS}(c_s, h)$ is such that $\xi_h$ decreases in $h$ at an increasing rate, the difference in the purchase probability at $h$ instead of at $h + 1$ increases in the position. When the latter dominates, the ranking effect will first increase in position.

The above comparison highlights that the mechanism producing ranking effects in the DS problem is distinct from the one in the SD and RS problems, leading to a different demand structure. In the former, ranking effects result from differences in inspection costs relative to differences in partial valuations. Hence, a better partial valuation is a substitute for moving positions ahead. In contrast, in an SD or RS problem, a product's large partial valuation does not affect consumers that stop search before discovering it. Hence, offering a larger partial valuation does not substitute for being discovered early in an SD or RS problem.

Moreover, the size of ranking effects also determines how important it is for products to be revealed on an early position. As ranking effects are independent of the number of alternatives in SD and RS, so are sellers' incentives to have their products revealed early during search. In contrast, in DS, the demand increase of moving positions ahead becomes smaller when the number of alternatives increases. Hence, sellers can have smaller incentives to be revealed on early positions when there are many, relative to when there are only few alternatives.

Finally, the above comparison between the number of alternatives and ranking effects also suggests the existence of an empirical test to distinguish the search modes in some settings.
Ifdata is available that allows to test whether ranking eﬀects depend on the number of alternatives,then it will be possible to empirically determine whether a DS problem, instead of a RS orSD problem provides a framework that better captures ranking eﬀects in a particular setting.Furthermore, if data is available that allows to test whether a product’s partial valuation hasan eﬀect on whether it is inspected, it will be possible to distinguish between RS and SD. Note, however, that in an equilibrium setting, oﬀering larger partial valuations may indirectly serve as a substitutefor being discovered early by raising consumers’ expectations and induce them to search longer. .3 Expected Payoff If costs are speciﬁed such that the total costs of revealing all product information remain thesame, then the three search problems diﬀer only in the information the consumer can use duringsearch. A comparison of a consumer’s expected payoﬀ based on such a speciﬁcation thereforeprovides some insight into whether it is always to the consumer’s beneﬁt to provide informationthat helps to direct search towards some alternatives.For total costs of revealing full information about a product on position h to be the samein the three search problems, inspection costs in the RS and DS problem are speciﬁed as c RS = c s + c d and c DSj = c s + h j c d respectively.The SD problem extends the RS problem by additionally providing the consumer with theoption to not inspect products depending on their partial valuations. This allows the consumerto save on inspection costs by not inspecting products with small partial valuations. As statedin Proposition 6, this increases the expected payoﬀ which implies that providing product in-formation across two layers, as done for example by online retailers or search intermediaries, isbeneﬁcial for consumers. Proposition 6. If c RS = c s + c d , then a consumer’s expected payoﬀ in the SD problem is largerthan in the RS problem. 
In contrast to the SD problem, the consumer can use all partial valuations to direct search in the DS problem. Hence, if inspection costs for all products are the same in both problems (i.e. $c^{DS}_j = c_s \ \forall j$), a consumer will have a larger expected payoff in the DS problem as he can directly inspect products with large partial valuations. However, under the assumption that total costs of revealing full information are the same in both search problems, a more detailed analysis is necessary to determine which search problem offers a larger expected payoff.

Denote a consumer's expected payoff in a search problem $k$ as $\pi_k$ for $k \in \{SD, DS\}$. Proposition 1 implies that expected payoffs are given by

$$\pi_{SD} = E_{\hat W}\left[\max\{u_0, \max_{j \in J} \hat W_j\}\right]$$

$$\pi_{DS} = E_{\tilde W}\left[\max\{u_0, \max_{j \in J} \tilde W_j\}\right]$$

Furthermore, let $H_k(\cdot)$ denote the cumulative distribution function of the respective maximum value over which the expectation operator integrates in problem $k$. The difference in expected payoffs of the SD and the DS problem then is given by

$$\pi_{SD} - \pi_{DS} = \int_{z^d}^{\infty} \left(H_{DS}(w) - 1\right) \mathrm{d}w + \int_{u_0}^{z^d} \left(H_{DS}(w) - H_{SD}(w)\right) \mathrm{d}w \tag{21}$$

The first expression in (21) is negative, capturing the advantage of observing partial valuations for all products and being able to directly inspect a product at a later position. Given $H_{DS}(w) \geq H_{SD}(w)$ on $w \in [u_0, z^d]$, the second expression in (21) is positive, revealing that directly observing all partial valuations $x_j$ does not only yield benefits.

The latter stems from the difference in how inspection and discovery costs are taken into consideration in the two dynamic decision processes. In DS, the total cost of inspecting a product $j$ at a later position is directly weighed against its benefits given the partial valuations. In contrast, in SD, the consumer first weighs the discovery costs against the expected benefits of discovering a product with a larger partial valuation. Once product $j$ is revealed, the accumulated cost paid to discover $j$ ($j c_d$) is a sunk cost and does not affect the decision whether to inspect $j$.

Hence, in cases where products on early positions have below-average partial valuations $x_j$, the optimal policy in SD tends to prescribe inspecting these products less often compared to the direct cost comparison in DS. In some cases, the former can be more beneficial, leading to a larger expected payoff. Directly revealing all partial valuations therefore does not always improve a consumer's benefit, if the consumer continues to incur the same total costs to reveal the full valuation of any given product.

When using structural models to estimate search costs and preference parameters, it is also important to consider differences in the information consumers can use during search that distinguish the different search problems. For example, a structural search model will use price differences across all products to inform parameter estimates if it abstracts from limited awareness and assumes that consumers initially observe all prices. Consumers not inspecting low-price products at later positions then could be attributed either to a small (or even negative) price sensitivity, or large inspection costs. However, if instead this is the result of consumers not having discovered these products, estimates and counterfactuals (e.g. the effects of price changes) will be biased.

To analyze the extent to which this can be the case, I focus on a scenario where preference

Footnote: For example, this is the case if X ∼ N( , ), Y ∼ N( , ), c_s = c_d = 0. and |J| = 10.

Footnote: No threshold result as in Proposition 2 applies in this case. The first expression in (21) decreases whereas the second expression increases in the number of alternatives.

I first analyze how the different models attribute observed stopping decisions to structural parameters. A numerical exercise then reveals that this can lead to sizable differences in estimates of structural parameters.

Empirical setting: The data consist of consumers' consideration sets and purchases, as well as a number of characteristics for each of the available products. Utility of a consumer is specified as $u_j = x_j'\beta + y_j$, where $x_j$ is a vector containing the observed product characteristics, $\beta$ is a vector of preference parameters and $y_j$ is an idiosyncratic unobservable taste shock with mean zero. Depending on the model, consumers are assumed to reveal $x_j$ when either discovering $j$ (SD) or inspecting $j$ (RS), or know $x_j$ prior to search (DS). $y_j$ is revealed after inspecting $j$ in all three models.

Given this setting, Table 1 shows sufficient or necessary conditions for the purchase of product $j$ across the three models, under the assumption that the observed consideration set does not coincide with the set of all available alternatives. The condition for the SD problem shows that a purchase of product $j$ can be independent of realized valuations of products that the consumer is not aware of in the purchase period $\bar t$. Product $j$ only needs to offer good enough characteristics relative to the mean. The RS model features the same structure; a consumer will end search and buy product $j$ if its valuation exceeds the reservation value. However, $\Xi$ and $\tilde\xi$ depend differently on the underlying costs and distributions of characteristics in $x_j$ and $y_j$. Through these non-linear functions, a RS model will attribute observed limited consideration sets differently to preference and cost parameters.

In the DS model, rationalizing the purchase of $j$ requires that the valuation of the purchased product is larger than the search values of all uninspected products. If, for example, $x_k > x_j$ for an uninspected product, the DS model will require either relatively small preference parameters, or relatively large inspection costs. Hence, depending on the characteristics of the uninspected products, rationalizing limited consideration sets in a DS model will require a combination of large inspection costs and attenuated preference parameters, as the estimation procedure will try to fit an inequality for each uninspected product.

Table 1 – Purchase conditions

SD   $(x_j - \mu_X)'\beta + y_j \geq \Xi$   (sufficient)
RS   $(x_j - \mu_X)'\beta + y_j \geq \tilde\xi$   (necessary)
DS   $(x_j - x_k)'\beta + y_j \geq \xi \ \forall k \notin C_{\bar t}$   (necessary)

Notes: Sufficient or necessary conditions for purchase of product $j$ if $J \nsubseteq C_{\bar t}$. $\bar t$ denotes the purchase period.

Footnote: The empirical literature extends the simple specification for a range of settings, for example by introducing position-specific inspection costs or heterogeneous preferences. The main rationale continues to hold in such settings.

Footnote: The condition is sufficient but not necessary, as alternatively, the consumer can first become aware of all alternatives, before then purchasing $j$.

To investigate the extent to which this can introduce biases in estimates of the structural parameters, I perform various simulations for this setting. First, I simulate consumers solving a SD problem with the given utility specification. Using these data, I then estimate structural parameters in search models based on either the RS or the DS problem. For both models, the estimation fits inequalities based on the conditions of Table 1, as well as other inequalities coming from continuation and purchase decisions. Details of the estimation procedure are provided in Appendix F. As comparison, I also present estimates of a full information (FI) model.

Table 2 – Coefficient estimates

      β β β β β c s β c d β
SD    -3.50   0.71   0.57   0.03   0.03
DS    -0.79   0.71   -1.80  0.85
RS    -2.08   0.98   1.86   0.06
FI    -1.82   0.71   0.37

Notes: Mean parameter estimates from 20 independently simulated datasets with 2000 consumers and 30 products per consumer. Characteristics are independent draws (across consumers and products) from x j ∼ N (3 , . , x j ∼ N (3 . , . and y j ∼ N (0 , . The third characteristic is an outside dummy.

Results of a typical simulation are presented in Table 2. To account for the fact that imposing the distribution of $y_j$ is a normalization in the empirical context, estimates are presented as a ratio to the coefficient of the second characteristic. The results show that relative preferences over product characteristics are estimated relatively well in the DS model, despite the second coefficient being strongly attenuated. However, inspection costs are strongly accentuated. The latter indicates that the large estimates of baseline costs estimated in some DS models may be caused by not accounting for limited awareness (e.g. Chen and Yao, 2017; Ursu, 2018). In contrast, the RS model estimates inspection costs to be close to the combined inspection and discovery costs, but the ratio of preference parameters deviates from the true values. The large differences in the estimated coefficient for the outside option result from how the different models interpret consumers not inspecting or not buying. Whereas in the DS problem this is mainly rationalized through large inspection costs, the RS model attributes the lack of search mainly to a good outside option.

Though results from only a single simulation are presented, I obtained qualitatively similar results across a wide range of parameter values. Overall, these results show that limited awareness can have sizable effects on estimates of structural parameters; the DS model overestimates inspection costs, whereas the RS model yields biased estimates of the relative importance of characteristics. However, recall that the SD problem becomes closer to the DS problem if either consumers initially are aware of many alternatives or discovery costs are very small. Hence in such cases, the DS model can recover structural parameters. When estimating search models, researchers should therefore carefully consider the degree to which limited awareness plays a role in the specific setting they are studying.

This paper introduces a search problem that generalizes existing frameworks to settings where consumers have limited awareness and first need to become aware of alternatives before being able to search among them. The paper's contribution is to provide a tractable solution for optimal search decisions and expected outcomes for this search and discovery problem. Moreover, a comparison with classical random and directed search highlights how limited awareness and the availability of partial product information determine search outcomes and expected payoffs.

A promising avenue for future research is to build on this paper's results and study limited awareness in an equilibrium setting. This could yield novel insights into how consumers' limited information shapes price competition. Furthermore, the search and discovery problem can serve as a framework to analyze how firms compete for consumers' awareness. For example, informative advertising can make it more likely that consumers are aware of a seller's products from the outset. Ranking effects derived in this paper already suggest that it will be in a seller's best interest to make consumers aware of his product, but further research is needed to determine equilibrium dynamics.

Another avenue for future research entails incorporating the search and discovery problem into a structural model that is estimated with click-stream data. The available actions in the search and discovery problem closely match how consumers scroll through product lists (discovery) and click on products (inspection) on websites of search intermediaries and online retailers. By accounting for the fact that consumers initially do not observe entire list pages, such a model could improve the estimation of consumers' preferences, inspection costs and ranking effects relative to models that abstract from consumers not observing the whole product list.

Footnote: These results are produced in the supplementary material.

References

Adam, K. (2001). Learning while searching for the best alternative. Journal of Economic Theory 101(1), 252–280.

Anderson, S. P. and R. Renault (1999). Pricing, product diversity, and search costs: A Bertrand-Chamberlin-Diamond model. The RAND Journal of Economics 30(4), 719–735.

Armstrong, M. (2017). Ordered consumer search. Journal of the European Economic Association 15(5), 989–1024.

Armstrong, M. and J. Vickers (2015). Which demand systems can be generated by discrete choice? Journal of Economic Theory 158, 293–307.

Athey, S. and G. Ellison (2011). Position auctions with consumer search. The Quarterly Journal of Economics 126(3), 1213–1270.

Bagwell, K. (2007). The economic analysis of advertising. In Handbook of Industrial Organization, Volume 3, Chapter 28, pp. 1701–1844.

Banks, J. S. and R. K. Sundaram (1992). Denumerable-armed bandits. Econometrica 60(5), 1071–1096.

Banks, J. S. and R. K. Sundaram (1994). Switching costs and the Gittins index. Econometrica 62(3), 687–694.

Bikhchandani, S. and S. Sharma (1996, January). Optimal search with learning. Journal of Economic Dynamics and Control 20(1), 333–359.

Branco, F., M. Sun, and J. M. Villas-Boas (2012). Optimal search for product information. Management Science 58(11), 2037–2056.

Bronnenberg, B. J., J. B. Kim, and C. F. Mela (2016). Zooming in on choice: How do consumers search for cameras online? Marketing Science 35(5), 693–712.

Burdett, K. and K. L. Judd (1983). Equilibrium price dispersion. Econometrica 51(4), 955–969.

Chade, H. and L. Smith (2006). Simultaneous search. Econometrica 74(5), 1293–1307.

Chen, Y. and S. Yao (2017). Sequential search with refinement: Model and application with click-stream data. Management Science 63(12), 4345–4365.

Choi, H. and C. F. Mela (2019). Monetizing online marketplaces. Marketing Science 38(6), 913–1084.

Choi, M., A. Y. Dai, and K. Kim (2018). Consumer search and price competition. Econometrica 86(4), 1257–1281.

De Los Santos, B., A. Hortaçsu, and M. R. Wildenbeest (2012). Testing models of consumer search using data on web browsing and purchasing behavior. The American Economic Review 102(6), 2955–2980.

De los Santos, B. and S. Koulayev (2017). Optimizing click-through in online rankings with endogenous search refinement. Marketing Science 36(4), 542–564.

DeGroot, M. H. (1970). Optimal Statistical Decisions (Wiley Clas ed.). New Jersey: John Wiley & Sons, Inc.

Diamond, P. A. (1971). A model of price adjustment. Journal of Economic Theory 3(2), 156–168.

Doval, L. (2018). Whether or not to open Pandora's box. Journal of Economic Theory 175, 127–158.

Fershtman, D. and A. Pavan (2019). Searching for arms. Unpublished manuscript. Available at http://faculty.wcas.northwestern.edu/ apa522/SFA.pdf.

Ghose, A., P. G. Ipeirotis, and B. Li (2014). Examining the impact of ranking on consumer behavior and search engine revenue. Management Science 60(7), 1632–1654.

Gittins, J., K. Glazebrook, and R. Weber (2011). Multi-armed Bandit Allocation Indices (2nd ed.). John Wiley & Sons, Inc.

Glazebrook, K. D. (1979). Stoppable families of alternative bandit processes. Journal of Applied Probability 16(4), 843–854.

Hong, H. and M. Shum (2006). Using price distributions to estimate search costs. The RAND Journal of Economics 37(2), 257–275.

Honka, E. (2014). Quantifying search and switching costs in the US auto insurance industry. The RAND Journal of Economics 45(4), 847–884.

Honka, E. and P. Chintagunta (2017). Simultaneous or sequential? Search strategies in the U.S. auto insurance industry. Marketing Science 36(1), 21–42.

Honka, E., A. Hortaçsu, and M. A. Vitorino (2017). Advertising, consumer awareness, and choice: Evidence from the U.S. banking industry. RAND Journal of Economics 48(3), 611–646.

Hortaçsu, A. and C. Syverson (2004). Product differentiation, search costs, and competition in the mutual fund industry: A case study of S&P 500 index funds. The Quarterly Journal of Economics 119(2), 403–456.

Jolivet, G. and H. Turon (2019). Consumer search costs and preferences on the internet. Review of Economic Studies 86(3), 1258–1300.

Ke, T. T., Z.-J. M. Shen, and J. M. Villas-Boas (2016). Search for information on multiple products. Management Science 62(12), 3576–3603.

Ke, T. T. and J. M. Villas-Boas (2019). Optimal learning before choice. Journal of Economic Theory 180(January), 383–437.

Keller, G. and A. Oldale (2003). Branching bandits: A sequential search process with correlated pay-offs. Journal of Economic Theory 113(2), 302–315.

Kleinberg, R., B. Waggoner, and E. G. Weyl (2017). Descending price optimally coordinates search.

Kohn, M. G. and S. Shavell (1974). The theory of search. Journal of Economic Theory 9(2), 93–123.

Koulayev, S. (2014). Search for differentiated products: Identification and estimation. The RAND Journal of Economics 45(3), 553–575.

Kuksov, D. (2006). Search, common knowledge, and competition. Journal of Economic Theory 130(1), 95–108.

Lippman, S. A. and J. J. McCall (1976). The economics of job search: A survey. Economic Inquiry 14(2), 155–189.

McCall, J. J. (1970). Economics of information and job search. The Quarterly Journal of Economics 84(1), 113–126.

Moraga-González, J. L., Z. Sándor, and M. R. Wildenbeest (2017a). Nonsequential search equilibrium with search cost heterogeneity. International Journal of Industrial Organization 50, 392–414.

Moraga-González, J. L., Z. Sándor, and M. R. Wildenbeest (2017b). Prices and heterogeneous search costs. The RAND Journal of Economics 48(1), 125–146.

Morozov, I. (2019). Measuring benefits from new products in markets with information frictions. Stanford. Working paper.

Olszewski, W. and R. Weber (2015). A more general Pandora rule? Journal of Economic Theory 160, 429–437.

Rosenfield, D. B. and R. D. Shapiro (1981). Optimal adaptive price search. Journal of Economic Theory 25(1), 1–20.

Rothschild, M. (1974). Searching for the lowest price when the distribution of prices is unknown. Journal of Political Economy 82(4), 689–711.

Stigler, G. J. (1961). The economics of information. The Journal of Political Economy 69(3), 213–225.

Ursu, R. M. (2018). The power of rankings: Quantifying the effect of rankings on online consumer search and purchase decisions. Marketing Science 37(4), 530–552.

Weitzman, M. L. (1979). Optimal search for the best alternative. Econometrica 47(3), 641–654.

Zhang, X., T. Y. Chan, and Y. Xie (2018). Price search and periodic price discounts. Management Science 64(2), 495–510.

Appendix

A Proofs of main Theorems and Propositions

A.1 Theorem 1

Let $\Theta(\Omega_t, A_t, z)$ denote the value function of an alternative decision problem, where in addition to the available actions in $A_t$, there exists a hypothetical outside option offering value $z$. As the SD problem satisfies that taking an action does not change the state of another available action and has the same branching structure, Theorem 1 of Keller and Oldale (2003) yields that a Gittins index policy is optimal and that the following holds:

$$\Theta(\Omega_t, A_t, z) = b - \int_z^b \prod_{a \in A_t} \frac{\partial \Theta(\Omega_t, \{a\}, w)}{\partial w} \, \mathrm{d}w \tag{22}$$

Here, $b$ is some finite upper bound of the immediate rewards.

Footnote: Compared to the baseline branching framework discussed by Keller and Oldale (2003), the SD problem does not have discounting, and purchasing a product is a "terminal" action. Note also that whereas not explicitly stated by the authors, their framework accommodates the case where it is not known ex ante to how many "children" an available action branches into. This will be the case in the SD problem if the consumer does not know the number of products he will discover.

The Gittins index of action $d$ (discovering products) is defined by $g^d_t = E_X\left[\Theta(\Omega_{t+1}, A_{t+1} \setminus A_t, g^d_t)\right]$. Suppose the consumer knows the total number of alternatives $|J|$, and consider a period $t$ in which more discoveries will still be available in $t+1$ with certainty. In this case we have

$$\begin{aligned} g^d_t &= E_X\left[\Theta(\Omega_{t+1}, \{d, s_1, \dots, s_{n_d}\}, g^d_t)\right] - c_d \\ &= E_X\left[b - \int_{g^d_t}^b \frac{\partial \Theta(\Omega_{t+1}, \{d\}, w)}{\partial w} \prod_{k=1}^{n_d} \frac{\partial \Theta(\Omega_{t+1}, \{s_k\}, w)}{\partial w} \, \mathrm{d}w\right] - c_d \end{aligned} \tag{23}$$

where $s_k \in S_{t+1} \setminus S_t \ \forall k$. $\Theta(\Omega_t, \{s_k\}, z)$ is the value of a search problem with an outside option offering $z$ and the option of inspecting product $k$ (with known partial valuation $x_k$). $\Theta(\Omega_{t+1}, \{d\}, w)$ is the value of a search problem with an outside option offering $z$, and the option to discover more products. Finally, $E_X[\cdot]$ is the expectation operator integrating over the beliefs over the $n_d$ random variables in $\mathbf{X} = [X, \dots, X]$, which does not depend on time.

Optimality of the Gittins index policy then implies that when $z \geq g^d_{t+1}$, the consumer will choose the outside option in $t+1$. Hence $\Theta(\Omega_t, \{d\}, w) = w \ \forall w \geq g^d_{t+1}$, which yields $\frac{\partial \Theta(\Omega_t, \{d\}, w)}{\partial w} = 1 \ \forall w \geq g^d_{t+1}$. This implies that for $g^d_t \geq g^d_{t+1}$, $g^d_t$ does not depend on whether more products can be discovered in the future, and the optimal policy is independent of the beliefs over the number of available alternatives. As a result, as long as the Gittins index is weakly decreasing during search, i.e. $g^d_t \geq g^d_{t+1} \ \forall t$, it is independent of the availability of future discoveries and beliefs $q$.

It remains to show that $g^d_t \geq g^d_{t+1} \ \forall t$ holds in the proposed search problem. When $|J| = \infty$, $g^d_t = g^d_{t+1}$ is immediately given by the fact that in both periods infinitely many products remain to be discovered and that the consumer has stationary beliefs (i.e. $q$ is constant and valuations are independent and identically distributed). For $|J| < \infty$, backwards induction yields that this condition holds: Suppose that in period $t+1$, no discovery action is available as all products have been discovered. In this case, the Gittins index is given by

$$g^d_{t+1} = E_X\left[b - \int_{g^d_{t+1}}^b \prod_{k=1}^{n_d} \frac{\partial \Theta(\Omega_{t+1}, \{s_k\}, w)}{\partial w} \, \mathrm{d}w\right] - c_d \tag{24}$$

As $0 \leq \frac{\partial \Theta(\Omega_t, \{d\}, w)}{\partial w} \leq 1$ and $\frac{\partial \Theta(\Omega_t, \{s_k\}, w)}{\partial w} \geq 0$, it holds that

$$\begin{aligned} E_X\left[b - \int_{g^d_{t+1}}^b \prod_{k=1}^{n_d} \frac{\partial \Theta(\Omega_{t+1}, \{s_k\}, w)}{\partial w} \, \mathrm{d}w\right] \leq{} & q \, E_X\left[b - \int_{g^d_t}^b \prod_{k=1}^{n_d} \frac{\partial \Theta(\Omega_{t+1}, \{s_k\}, w)}{\partial w} \, \mathrm{d}w\right] \\ & + (1 - q) \, E_X\left[b - \int_{g^d_t}^b \frac{\partial \Theta(\Omega_t, \{d\}, w)}{\partial w} \prod_{k=1}^{n_d} \frac{\partial \Theta(\Omega_t, \{s_k\}, w)}{\partial w} \, \mathrm{d}w\right] \end{aligned} \tag{25}$$

which implies $g^d_t \geq g^d_{t+1}$.

Finally, $\Theta(\Omega_{t+1}, \{d, s_1, \dots, s_{n_d}\}, g^d_t) = V\left(\langle \bar\Omega, \omega(x, z) \rangle, \{b, s_1, \dots, s_{n_d}\}; \tilde\pi\right)$ in (9) implies $z^d = g^d_t$. Similarly, the definition of the inspection and purchase values (in (6) and (10)) are equivalent to the definition of Gittins index values for these actions and it follows that the

Footnote: Note that immediate rewards $R(a) \geq -\max\{c_s, c_d\}$, and that finite mean and variance of the distributions of $X$ and $Y$ imply that for all realizations $x, y$, there exists some $b$ such that $R(a) = x_j + y_j \leq b$.

A.2 Theorem 2

Proof. As a product always is bought, it suffices to show that the optimal policy never prescribes to buy product $j$ if there exists another product $k$ with $w_k > w_j$. To account for the case where $C = \infty$, define $z^s_k = \infty \ \forall k \in C$, which implies $\tilde w_k \equiv \min\{z^s_k, z^b_k\} = z^b_k \ \forall k \in C$.

First, consider the case where $k$ is revealed before $j$ ($h \leq h_k < h_j$). In this case, $w_k > w_j$ if and only if either (i) $\tilde w_k \geq z^d$ or (ii) $z^d > \tilde w_k > \tilde w_j$. In the former, the optimal policy prescribes to not discover products beyond $k$, hence not to buy product $j$. This follows as $z^s_k \geq z^d$ and $z^b_k \geq z^d$ imply that the optimal policy prescribes that search ends with buying $k$ before discovering $j$. In the latter, $w_j = \tilde w_j < w_k = \tilde w_k$, and the optimal policy prescribes to continue discovering such that both products are in the awareness set. The eventual purchase theorem of Choi et al. (2018) then applies, and hence the optimal policy does not prescribe to buy product $j$.

Second, consider the case where $k$ is discovered after $j$ ($h_k > h_j$). In this case, note that $w_j > w_k$ if $\tilde w_j \geq z^d$. Hence, $w_k > w_j$ if and only if $z^d > \tilde w_k > \tilde w_j$, which is the same as (ii) above.

Finally, consider the case where $k$ is discovered at the same time as $j$ ($h_k = h_j$). Then $w_k > w_j$ if and only if $\tilde w_k > \tilde w_j$, which follows from the construction of the effective values. This again is the same as (ii) above and hence the optimal policy does not prescribe to buy $j$.

A.3 Proposition 1

Proof.

The proof follows a similar structure as the proof of Corollary 1 in Choi et al. (2018).To simplify exposition, the following additional notation is used: Let ˜ w j ≡ x j + min { y j , ξ j } as in Theorem 1, and ˆ w j equal to the eﬀective value from Theorem 2, with the adjustmentthat f ( h j ) = ε = 0 . Furthermore, let ¯ w r ≡ max k ∈ J r − ˆ w k ∀ r ≥ , ˜¯ w r ≡ max k ∈ J r ˜ w k and ˜¯ w r,j ≡ max k ∈ J r \ j ˜ w k where J a : b denotes the set of products discovered on position r ∈ { a, . . . , b } ,and J r is short-hand for J r : r . Finally, let · ) denote the indicator function and ¯ h the maximumposition.The payoﬀ of a consumer given realizations x j and y j for all j is given by ¯ h X r =1

1( ¯ w r < z d )  X j ∈ J r

1( ˜ w j ≥ max (cid:8) z d , ˜¯ w r,j (cid:9) )( x j + y j ) − x j + ξ j ≥ max (cid:8) z d , ˜¯ w r,j (cid:9) ) c s  + 1( ¯ w ≥ z d ) ν − ¯ h X r =1

1( ¯ w r < z d ) c d + 1( w ¯ h < z d ) ν (26) which follows from the optimal policy and Theorem 2: (i) If ¯ w ≥ z d , the stopping ruleimplies that the consumer does not discover any products beyond the initial awareness set.Conditional on not discovering any additional products, the payoﬀ then is equal to v , whichdenotes the payoﬀ of a directed search problem over products k ∈ S and an outside optionoﬀering ¯ u = max k ∈ C u k . (ii) If ¯ w r < z d , the continuation rule implies that the consumercontinues beyond position r − , i.e. discovers products on position r and pays discovery costs c d . (iii) Conditional on discovering j , when ˜ w j ≥ max (cid:8) z d , ˜¯ w r,j (cid:9) , the stopping and inspectionrules imply that the consumer buys j , gets utility x j + y j and does not continue beyond position r . (iv) Conditional on discovering j , when x j + ξ j ≥ max (cid:8) z d , ˜¯ w r,j (cid:9) , the inspection rule implies37hat the consumer inspects j and incurs costs c s . (v) If w ¯ h < z d , the continuation rule impliesthat the consumer discovers all products, whereas the inspection rule implies that he inspects allproducts (cid:8) j | x j + ξ j ≥ z d (cid:9) . Conditional having discovered all products, the consumer thereforehas the payoﬀ of a directed search problem over products (cid:8) j | x j + ξ j < z d (cid:9) with outside option ˜ u = max { u , max k ∈ { j | x j + ξ j ≥ z d ,x j + y j ≤ ξ j } x k + y k } . This is denoted by ν .Let E [ · ] integrate over the distribution of X j , Y j ∀ j ∈ J , and substitute inspection anddiscovery costs by c s = E h Y j ≥ ξ j )( Y j + x j − z sj ) i = ∀ j (note that z sj = x j + ξ j ) and c d = E h

1( ˜¯ W r ≥ z d )( ˜¯ W r − z d ) i (see Appendix B). The expected payoﬀ then is given by: ¯ h X r =1 E "

1( ¯ W r < z d ) X j ∈ J r

1( ˜ W j ≥ max { z d , ˜¯ W r,j } )( X j + Y j ) − X j + ξ j ≥ max { z d , ˜¯ W r,j } )1( Y j ≥ ξ j )( Y j − ξ j ) ! − ¯ h X r =1 E h

1( ¯ W r < z d )1( ˜¯ W r ≥ z d )( ˜¯ W r − z d ) i + E h

1( ¯ W ≥ z d ) ν + 1( ¯ W < z d ) ν i = ¯ h X r =1 E 

1( ¯ W r < z d ) X j ∈ J r

1( ˜ W j ≥ max { z d , ˜¯ W r,j } )( X j + ξ j ) ! − ¯ h X r =0 E h

1( ¯ W r < z d )1( ˜¯ W r ≥ z d )( ˜¯ W r − z d ) i + E h

1( ¯ W ≥ z d ) ν + 1( ¯ W < z d ) ν i = ¯ h X r =1 E h

1( ¯ W r < z d )1( ˜¯ W r ≥ z d ) ˜¯ W r i − ¯ h X r =1 E h

1( ¯ W r < z d )1( ˜¯ W r ≥ z d )( ˜¯ W r − z d ) i + E h

1( ¯ W ≥ z d ) ν + 1( ¯ W < z d ) ν i = ¯ h X r =1 E h

1( ¯ W r < z d )1( ˜¯ W r ≥ z d ) z d i + E "

1( ¯ W ≥ z d ) max (cid:26) ¯ u , max k ∈ S ˜ W k (cid:27) + 1( ¯ W < z d ) max { ˜ u , max k ∈ { k | x k + ξ k z d ) > ,as otherwise Q d ( c d , c s , z d ) > . Hence with | J | = ∞ , P (cid:16) max j ∈ J ˜ W j < z d (cid:17) = 0 such that E h max j ∈ J ˆ W j i = z d . 38 .4 Proposition 2 Proof.

Consider a situation where we decrease costs $c_s$ and $c_d$ to either $c_s' = c_s - \Delta$ or $c_d' = c_d - \Delta$, while keeping the other cost constant. Let $H_1(\cdot)$ and $H_2(\cdot)$ denote the cumulative distribution function of $\bar W \equiv \max\{\bar w, \max_{j \in J \setminus (C \cup S)} \hat W_j\}$ in the former and the latter case respectively, where $\bar w \equiv \max\{\max_{k \in C} u_k, \max_{k \in S} \tilde w_k\}$ is the value of the alternatives in the initial consideration and awareness sets. Similarly, let $z^d_1$ and $z^d_2$ denote the associated discovery values. Given $n_d = 1$, we have $\frac{\partial Q_d(c_d, c_s, z)}{\partial c_d} < \frac{\partial Q_d(c_d, c_s, z)}{\partial c_s}$; hence $\left|\frac{\partial z^d}{\partial c_d}\right| > \left|\frac{\partial z^d}{\partial c_s}\right|$ and $z^d_2 > z^d_1$. Moreover, note that the definition of the adjusted effective value $\hat w_j$ implies $H_i(w) = 1 \ \forall w \geq z^d_i$ and $H_i(w) = 0 \ \forall w \leq \bar w$. Conditional on $\bar w < z^d_1$, the difference in a consumer's expected payoff across the two changes therefore can be written as

$$\int_{z^d_1}^{z^d_2} 1 - H_2(w) \, \mathrm{d}w - \int_{\bar w}^{z^d_1} H_2(w) - H_1(w) \, \mathrm{d}w \tag{27}$$

Whereas the first part is strictly positive, the second part is negative. The latter follows as for $w \in [\bar w, z^d_1]$, $\bar W = \max_{j \in J \setminus (C \cup S)} X_j + \min\{Y_j, \xi\}$ and $\frac{\partial \xi}{\partial c_s} < 0$ such that $H_1(w) \leq H_2(w)$. As valuations are independent across products, we have $H_k(w) = P_{X,Y}(X + \min\{Y, \xi_k\} \leq w)^{|J|}$; hence, as $|J|$ increases, $H_2(w) - H_1(w)$ and $H_2(w)$ decrease for $w \in [\bar w, z^d_1]$. Consequently, for all $\Delta > 0$ there exists some threshold $n^*$ for $|J|$ such that the difference in the expected payoff conditional on $\bar w < z^d_1$ is positive, i.e.

$$\int_{z^d_1}^{z^d_2} 1 - H_2(w) \, \mathrm{d}w > \int_{\bar w}^{z^d_1} H_2(w) - H_1(w) \, \mathrm{d}w \tag{28}$$

Conditional on $\bar w \geq z^d_1$, having $z^d_2 > z^d_1$ immediately implies that the expected payoff increases by at least as much when decreasing discovery costs. Note also that when $z^d_2 < \bar w$, neither change affects the expected payoff. Finally, integrating over the realizations $y_k$ for $k \in S$ that determine $\bar w$ yields the unconditional expected payoff as a combination of these cases, which implies the first result.

Increasing the value of the alternatives in the initial consideration and awareness set then makes larger values of $\bar w$ more likely. This implies the second result, as it both makes the case $\bar w \geq z^d_1$ more likely and decreases the right-hand side of (28).

A.5 Proposition 3

Proof. At $c_s = 0$, we have $z^d = z^{RS}$. $\left|\frac{\partial z^{RS}}{\partial c_s}\right| \geq \left|\frac{\partial z^d}{\partial c_s}\right|$ then implies $z^d \geq z^{RS}$. Using this in (14) and (15) immediately yields the result.

A.6 Proposition 4

Proof.

The first two statements immediately follow from (18) and (19). To see the latter, rewrite (18) as $P_W(W < z^d)^{h-1} P_W(W \geq z^d)$, and (19) in a similar way. $c^{RS} = c_s + c_d$ then implies $z^d \geq z^{RS}$. Hence, $P_W(W < z^d) = P_{X,Y}(X + \min\{Y, \xi\} < z^d) \geq P_{X,Y}(X + Y < z^{RS})$, which directly implies the existence of the threshold.

Footnote: Note that if $P_{X,Y}(X + \min\{Y, \xi_k\} \leq w)$ is large, then $H_2(w) - H_1(w)$ will first increase in $|J|$, before starting to decrease.

A.7 Proposition 5

Proof.

Write the first expression in (20) (demand at position $h$) as $E_{\tilde{W}_h}\left[ P(\tilde{W}_{h+1} \leq \tilde{W}_h) \prod_{k \notin \{h, h+1\}} P(\tilde{W}_k \leq \tilde{W}_h) \right]$. When $|J|$ increases, this expression decreases through the product term, which is weighted by the first term $P(\tilde{W}_{h+1} \leq \tilde{W}_h)$. As $P(\tilde{W}_{h+1} \leq t) \geq P(\tilde{W}_h \leq t) \ \forall t$, the first expression in (20) decreases by more than the second one when the number of alternatives increases.

A.8 Proposition 6

Proof.

The RS problem is equivalent to a policy in the SD problem that commits to inspecting every product that is discovered, conditional on which the consumer chooses to stop optimally. However, as the optimal policy in the SD problem is not this policy, it must yield a (weakly) larger payoff.
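The ordering $z^d \geq z^{RS}$ used in A.5 and A.8 can be checked numerically. The following is a minimal sketch under assumptions that are not part of the paper: $n_d = 1$, $X$ and $Y$ are independent standard normals, and $z^{RS}$ is taken to solve $E[\max\{0, X + Y - z\}] = c_s + c_d$ (in line with $c^{RS} = c_s + c_d$). It uses the fact that, with $g(a) \equiv \int_a^\infty [1 - F(y)] \, dy$ and $g(\xi) = c_s$, we have $E_Y[\max\{0, x + \min\{Y, \xi\} - z\}] = \max\{0, g(z - x) - c_s\}$. All function names are hypothetical.

```python
import math


def normal_pdf(v):
    """Density of a standard normal."""
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


def inspection_gain(a):
    """g(a) = E[max(0, Y - a)] = int_a^inf [1 - F(y)] dy for Y ~ N(0, 1)."""
    upper_tail = 0.5 * math.erfc(a / math.sqrt(2.0))  # 1 - F(a)
    return normal_pdf(a) - a * upper_tail


def make_grid(n=801, half_width=8.0):
    """Equally spaced nodes with normalized N(0, 1) weights (assumed law of X)."""
    step = 2.0 * half_width / (n - 1)
    nodes = [-half_width + i * step for i in range(n)]
    raw = [normal_pdf(x) for x in nodes]
    total = sum(raw)
    return nodes, [r / total for r in raw]


NODES, WEIGHTS = make_grid()


def net_gain_sd(z, c_s, c_d):
    """Myopic net gain of discovering (n_d = 1): E_X[max(0, g(z - X) - c_s)] - c_d."""
    gain = sum(w * max(0.0, inspection_gain(z - x) - c_s)
               for x, w in zip(NODES, WEIGHTS))
    return gain - c_d


def net_gain_rs(z, c_s, c_d):
    """Random search benchmark (assumption): E_X[g(z - X)] - (c_s + c_d)."""
    gain = sum(w * inspection_gain(z - x) for x, w in zip(NODES, WEIGHTS))
    return gain - (c_s + c_d)


def solve_decreasing(fn, lo=-20.0, hi=20.0, tol=1e-10):
    """Bisection for the root of a decreasing function with fn(lo) > 0 > fn(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if fn(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def reservation_values(c_s, c_d):
    """Return (z_d, z_rs) for the assumed normal specification."""
    z_d = solve_decreasing(lambda z: net_gain_sd(z, c_s, c_d))
    z_rs = solve_decreasing(lambda z: net_gain_rs(z, c_s, c_d))
    return z_d, z_rs
```

For example, `reservation_values(0.1, 0.1)` returns a pair with the first element above the second, while `reservation_values(0.0, 0.1)` returns two equal values, matching the statement that $z^d = z^{RS}$ at $c_s = 0$. Bisection is sufficient here because both net gains are decreasing in $z$.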

A.9 Uniqueness of discovery value

Proposition 7. (10) has a unique solution.

Proof. Differentiating $Q_d(c_d, c_s, z)$ with respect to $z$ yields (see Appendix B)

$$\frac{\partial Q_d(c_d, c_s, z)}{\partial z} = H(z) - 1 \tag{29}$$

where $H(\cdot)$ denotes the cumulative density of the random variable $\max_{k \in \tilde{J}} \tilde{W}_k$. This implies $\frac{\partial Q_d(c_d, c_s, z)}{\partial z} \leq 0$, which combined with continuity, $Q_d(c_d, c_s, \infty) = -c_d$ and $Q_d(c_d, c_s, -\infty) = \infty$ implies that a solution to (10) exists. Finally, uniqueness requires $Q_d(c_d, c_s, z)$ to be strictly decreasing at $z = z^d$. $\frac{\partial Q_d(c_d, c_s, z^d)}{\partial z} = 0$ would require that $H(z^d) = 1$, which contradicts the definition of the discovery value $z^d$ in (10), as it implies $Q_d(c_d, c_s, z^d) \leq -c_d < 0$.

B Further details on Search and Discovery Values

The search value of a product $j$ is defined by equation (6) and sets the myopic net gain of the inspection over immediately taking a hypothetical outside option offering utility $z$ to zero. This myopic net gain can be calculated as follows:

$$Q_s(x_j, c_s, z) = E_Y[\max\{0, x_j + Y - z\}] - c_s = \int_{z - x_j}^{\infty} (x_j + y - z) \, dF(y) - c_s = \int_{z - x_j}^{\infty} [1 - F(y)] \, dy - c_s$$

Substituting $\xi_j = z - x_j$ then yields (7). The second step holds as, with a change in the order of integration, we get

$$\int_{z - x_j}^{\infty} [1 - F(y)] \, dy = \int_{z - x_j}^{\infty} \int_{y}^{\infty} f_Y(t) \, dt \, dy = \int_{z - x_j}^{\infty} \int_{z - x_j}^{t} f_Y(t) \, dy \, dt = \int_{z - x_j}^{\infty} [y f_Y(t)]_{y = z - x_j}^{y = t} \, dt.$$

Similarly, the discovery value is defined by equation (10) and sets the myopic net gain of discovering more products over immediately taking a hypothetical outside option offering utility $z$ to zero. Corollary 1 in Choi et al. (2018) and similar steps as the above then imply that:

$$Q_d(c_d, c_s, z) = E_{\mathbf{X}, \mathbf{Y}}\left[\max\left\{z, \max_{k \in \{1, \dots, n_d\}} \tilde{W}_k\right\}\right] - z - c_d = E_{\mathbf{X}, \mathbf{Y}}\left[\max\left\{0, \max_{k \in \{1, \dots, n_d\}} \tilde{W}_k - z\right\}\right] - c_d = \int_{z}^{\infty} [1 - H(w)] \, dw - c_d$$

where $H(\cdot)$ denotes the cumulative density of the random variable $\max_{k \in \tilde{J}} \tilde{W}_k$. The above also implies that in the case where $Y$ is independent of $X$, a change in variables yields that the discovery value is linear in the mean of $X$, denoted by $\mu_X$:

$$z^d = \mu_X + \Xi(c_s, c_d)$$

where $\Xi(c_s, c_d)$ solves (10) for an alternative random variable $\tilde{X} = X - \mu_X$.

C Monotonicity and Extensions

Monotonicity of the Gittins index values ($g^d_t \geq g^d_{t+1} \ \forall t$) is satisfied whenever the following holds:

$$0 \leq E_{\mathbf{X}, Y, n_d, q, t}\left[\Theta(\Omega_{t+1}, \tilde{A}_{t+1}, g^d_t)\right] - E_{\mathbf{X}, Y, n_d, q, t+1}\left[\Theta(\Omega_{t+2}, \tilde{A}_{t+2}, g^d_{t+1})\right] \tag{30}$$

where $g^d_t$ is the Gittins index of discovering products (defined by (23)), and $\tilde{A}_{t+1} \equiv \{d, s1, \dots, sn_d\}$ is the set of actions available in $t+1$ containing the newly revealed products and (if available) the possible future discoveries. The expectation operator $E_{\mathbf{X}, Y, n_d, q, t}[\cdot]$ integrates over the following random realizations, where the respective joint distribution now can be time-dependent: (i) partial valuations drawn from $\mathbf{X} = [X_1, \dots, X_{n_d}]$; (ii) conditional distributions $F_{Y|X=x}(y)$; (iii) the number of revealed alternatives ($n_d$); (iv) whether more products can be discovered in future periods, determined by the belief $q$.

It goes beyond the scope of this paper to determine all possible specifications of beliefs which satisfy this condition. However, Proposition 8 provides two specifications that can be of interest and for which (30) holds (see also Section 3).

Proposition 8. (30) holds for the below deviations from the baseline model:
i) $Y$ is independent of $X$. Beliefs are such that the revealed partial valuations in $\mathbf{X}$ are i.i.d. with time-dependent cumulative density $G_t(x)$ such that $G_t(x) \leq G_{t+1}(x) \ \forall x \geq z^d - \xi$.
ii) The consumer does not know how many alternatives he will discover. Instead, he has beliefs such that with each discovery, at most the same number of alternatives are revealed as in previous periods ($n_{d,t+1} \leq n_{d,t}$).

Proof. Each part is proven using slightly different arguments.
i) Let $\tilde{x} \equiv \max_{k \in \{1, \dots, n_d\}} x_k$. If $\tilde{z}^s = \tilde{x} + \xi \leq z^d$, $\Theta(\Omega_{t+1}, \tilde{A}_{t+1}, z^d) = 1$, whereas for $\tilde{x} > z^d - \xi$, $\frac{\partial \Theta(\Omega_{t+1}, \tilde{A}_{t+1}, z^d)}{\partial \tilde{x}} \geq 0$. Independence implies that the cumulative density of the maximum $\tilde{x}$ is $\tilde{G}_t(x) = G_t(x)^{n_d}$. Consequently, whenever the distribution of $X$ shifts such that $G_t(x) \leq G_{t+1}(x) \ \forall x \geq z^d - \xi$, larger values of $\Theta(\Omega_{t+1}, \tilde{A}_{t+1}, g^d_t)$ become less likely in $t+1$, and hence (30) holds.
ii) Since $\frac{\partial \Theta(\Omega_{t+1}, \{s_k\}, w)}{\partial w} \leq 0$, we have $\frac{\partial \Theta(\Omega_{t+1}, \tilde{A}_{t+1}, g^d_t)}{\partial n_d} \geq 0$. Hence (30) holds given $n_{d,t+1} \leq n_{d,t}$.

Based on this monotonicity condition, Proposition 9 generalizes Theorem 1. It implies that whenever (30) holds, the discovery value can be calculated based on the expected myopic net gain of discovering products over immediately taking the hypothetical outside option. Hence, whenever (30) holds, the optimal policy continues to be fully characterized by reservation values that can be obtained without having to consider many future periods.

Proposition 9. Whenever (30) is satisfied, Theorem 1 continues to hold (with appropriate adjustment of the discovery value's time-dependence).

Proof. Follows directly from the proof of Theorem 1.
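The reservation values of Appendix B and the monotonicity in Proposition 8 i) can be illustrated with a small simulation. This is only a sketch under assumed primitives that are not taken from the paper: $Y \sim N(0, 1)$ independent of $X_k \sim N(\mu_X, 1)$, a fixed number `n_d` of products per discovery, and a Monte Carlo approximation of $E[\max\{0, \max_k \tilde{W}_k - z\}]$. The function names are hypothetical. The threshold $\xi$ solves $\int_\xi^\infty [1 - F(y)] \, dy = c_s$, and the discovery value solves $Q_d(c_d, c_s, z) = 0$ by bisection, which is valid because $Q_d$ is decreasing in $z$.

```python
import math
import random


def normal_pdf(v):
    """Density of a standard normal."""
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


def normal_sf(v):
    """Upper tail 1 - F(v) of a standard normal."""
    return 0.5 * math.erfc(v / math.sqrt(2.0))


def bisect_decreasing(fn, lo, hi, tol=1e-9):
    """Root of a decreasing function with fn(lo) > 0 > fn(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if fn(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def search_threshold(c_s):
    """xi solving int_xi^inf [1 - F(y)] dy = c_s for Y ~ N(0, 1), c_s > 0."""
    return bisect_decreasing(
        lambda xi: normal_pdf(xi) - xi * normal_sf(xi) - c_s, -20.0, 20.0)


def best_effective_values(mu_x, xi, n_d, n_draws, seed):
    """Draws of max_k W_k with W_k = X_k + min(Y_k, xi)."""
    rng = random.Random(seed)
    return [max(rng.gauss(mu_x, 1.0) + min(rng.gauss(0.0, 1.0), xi)
                for _ in range(n_d))
            for _ in range(n_draws)]


def discovery_value(c_s, c_d, mu_x=0.0, n_d=2, n_draws=5000, seed=1):
    """z solving E[max(0, max_k W_k - z)] = c_d on the simulated draws."""
    xi = search_threshold(c_s)
    w = best_effective_values(mu_x, xi, n_d, n_draws, seed)

    def net_gain(z):
        return sum(max(0.0, v - z) for v in w) / n_draws - c_d

    # net_gain is at least 1 at the lower bracket and equals -c_d at max(w).
    return bisect_decreasing(net_gain, min(w) - c_d - 1.0, max(w))
```

With a common seed, raising `mu_x` by some amount raises the computed discovery value by the same amount, which mirrors $z^d = \mu_X + \Xi(c_s, c_d)$. Evaluating `discovery_value` along a sequence of decreasing means (so that $G_t(x) \leq G_{t+1}(x)$ for all $x$, a stronger shift than Proposition 8 i) requires) therefore gives decreasing discovery values.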

D Violations of independence assumption

Costly recall:

Consider a variation of the search problem, where purchasing a product in the consideration set is costly unless it is bought immediately after it is inspected. If in period $t$ product $j$ is inspected, then inspecting another product or discovering more products in $t+1$ will change the payoff of purchasing product $j$ by adding the purchase cost. In the context of a multi-armed bandit problem, this case arises if there are nonzero costs of switching between arms. Banks and Sundaram (1994), for example, provide a more general discussion on switching costs and the nonexistence of optimal index-based strategies. The same reasoning also applies in a search problem where inspecting a product is more costly if the consumer first discovers more products. The exception is if there are infinitely many alternatives. In this case, the optimal policy never prescribes recalling an alternative.

Learning:

Independence is also violated for some types of learning. Consider a variation of the search problem, where the consumer updates his beliefs on the distribution of $Y$. In this case, by inspecting a product $k$ and revealing $y_{ik}$, the consumer will update his belief about the distribution of $Y$, thus affecting the expected payoffs of both discovering more and inspecting other products. Independence therefore is violated and the reservation value policy is no longer optimal. Note, however, that as long as learning is such that only payoffs of actions that will be available in the future are affected, independence continues to hold. This is for example the case when the consumer learns about the distribution of $X$ as discussed in Section 3.2. Adam (2001) studies a similar case where independence continues to hold across groups of products. However, his results do not extend to the case with limited awareness, as the beliefs on $Y$ also determine the expected benefits of discovering more products.

Purchase without inspection:

A final setting where independence does not hold is when a consumer can buy a product without first inspecting it. In this case, the consumer has two actions available for each product he is aware of. He can either inspect a product, or directly purchase it. Clearly, when the consumer first inspects the product, the information revealed changes the payoff of buying the product. Independence therefore is violated and the reservation value policy is not guaranteed to be optimal. Doval (2018) studies this search problem for the case where a consumer is aware of all available alternatives, and characterizes the optimal policy under additional conditions.

E Learning

Several studies consider priors or learning rules under which the optimal policy is myopic when searching with recall (Rothschild, 1974; Rosenfield and Shapiro, 1981; Bikhchandani and Sharma, 1996; Adam, 2001). A sufficient condition for the optimal policy to be myopic is given in Theorem 1 of Rosenfield and Shapiro (1981): Once the expected net benefits of continuing search over stopping with the current best option are negative, they remain so. Hence, whenever it is optimal to stop in $t$, it is also optimal to stop in all future periods. The monotonicity condition used in this paper directly imposes that this is satisfied; expected benefits of discovering more products remain constant or decrease during search. A fairly general assumption underlying learning rules that satisfy this condition is Assumption 1 in Bikhchandani and Sharma (1996). This assumption requires that beliefs are updated such that values above the largest value revealed so far become less likely. (Note that Bikhchandani and Sharma (1996) consider search for low prices.) Hence, whenever a better value is found than the current best, finding an even better match in the future becomes less likely.

In the SD problem, similar learning rules that satisfy this condition are difficult to find. When the consumer learns about the number of products that are revealed with each discovery, expected benefits of discovering more products increase if many products are revealed, but the value of stopping remains the same if all these products are bad matches. Hence, a learning rule would need to guarantee that beliefs shift such that the expected benefits of discovering more products do not increase, as opposed to only the net benefits over stopping. Similarly, if the consumer learns about the distribution of partial valuations $X$, the value of stopping need not increase even if partial information indicates a good match leading the consumer to shift beliefs towards larger values; after inspecting a promising product, the consumer may still realize that the product is worse than the previously best option.

Though the optimal policy is not myopic with learning, it is still based on the Gittins index, where the search and purchase values are as in the baseline SD problem. The main difficulty is calculating the index value for discovering more products, denoted by $z^L_t$. Whereas calculating this value precisely would require accounting for learning in future periods, it is possible to derive bounds on this value that are easier to calculate and can be used to judge how far off a myopic policy is.

To show this, I focus on the case where the consumer learns about the distribution of partial valuations. In particular, consider the following variation of the search and discovery problem: Let the distribution of partial valuations in $X$ be characterized by a parameter vector $\theta$ and denote its cumulative density by $G_\theta(\cdot)$. The consumer initially does not know the true parameter vector and updates his beliefs in period $t$ using Bayes' rule given some prior distribution. Denoting the consumer's beliefs on $\theta$ with cumulative density $P_t(\cdot)$, the consumer's beliefs about $X$ drawn in the next discovery are characterized by the cumulative density $\tilde{G}_t(X) = \int G_\theta(X) \, dP_t(\theta)$.

Denote a $k$-step look-ahead value as $z^d_t(k)$ and define it as the value of a hypothetical outside option that makes the consumer indifferent between stopping immediately, and discovering more products after which at most $k - 1$ more discoveries remain. For example, $z^d_t(1)$ satisfies the myopic comparison in (10), where expectations are calculated based on period $t$ beliefs $\tilde{G}_t(\cdot)$. The definition of $z^d_t(1)$ then implies that it is equal to the expected value of continuing to discover products if no future discoveries remain. As the consumer can stop and take this hypothetical outside option in $t+1$, allowing for more discoveries after $t+1$ can only increase the expected value, hence $z^d_t(1) \leq z^d_t(2) \leq \dots \leq z^L_t$. $z^d_t(1)$ therefore provides a lower bound on $z^L_t$, and $z^L_t$ can be approximated with increasing precision through $k$-step look-ahead values.

To derive an upper bound, consider the case where the consumer learns the true $\theta$ in $t+1$, if he chooses to discover more products in $t$. The value of discovering more products in $t$ when the true $\theta$ is revealed in $t+1$ then is larger compared to the case where the consumer continues to learn. This is formally derived by Kohn and Shavell (1974) for a search problem where a consumer samples from an unknown distribution. Intuitively, when the true $\theta$ is revealed, the consumer is able to choose the action in $t+1$ that maximizes the expected payoff going forward for each realization of $\theta$. In contrast, if the consumer does not learn the true $\theta$ in $t+1$, he cannot choose the maximizing action for each realization of $\theta$, but only the action that maximizes expected payoff on average across possible $\theta$.

An upper bound therefore is given by the value $\bar{z}^d_t$ such that the consumer is indifferent between stopping and taking a hypothetical outside option offering $\bar{z}^d_t$, and discovering more products after which the true $\theta$ is revealed. Formally, $\bar{z}^d_t$ satisfies

$$\bar{z}^d_t = \int \int \tilde{V}(\Omega_{t+1}, A_{t+1}, \bar{z}^d_t; \theta) \, dP_{t+1}(\theta) \, d\tilde{G}_t(X) \tag{31}$$

where $\tilde{V}(\Omega_{t+1}, A_{t+1}, \bar{z}^d_t; \theta)$ denotes the expected value of a search and discovery problem with known $\theta$ and an outside option offering $\bar{z}^d_t$. Proposition 1 then directly allows calculating this value without having to consider all the possible search paths.

Proposition 10 summarizes these results. A similar result can also be derived for the case where the consumer learns about a distribution from which the number of products that are discovered is drawn.

Proposition 10.

In the search and discovery problem with Bayesian learning about an unknown distribution of partial valuations $X$, it is optimal to:
i) continue whenever $\max_{k \in C_t} u_k \leq z^d_t(1)$
ii) stop whenever $\max_{k \in C_t} u_k \geq \bar{z}^d_t$

Footnote: For example, consider the case of sampling from a Normal distribution with unknown mean and known variance, and assume $n_d = 1$. If the consumer believes in $t$ that the mean is distributed normally with $\theta \sim N(\mu_t, \sigma_t)$, then $\tilde{G}_t(x) = \Phi\left(\frac{x - \mu_t}{\sigma_t}\right)$, where $\Phi(\cdot)$ is the standard normal cumulative density (see e.g. Theorem 1 in DeGroot, 1970, Ch. 9.5).

F Estimation Details

To estimate the three models I use a simulated maximum likelihood approach based on a kernel-smoothed frequency simulator. Using numerical optimization, parameters are found that maximize the simulated likelihood given by:

$$\max_\gamma \sum_i L_i(\gamma) = \sum_i \log\left( \frac{1}{N_d} \sum_{d=1}^{N_d} \frac{1}{1 + \sum_{k=1}^{N_k} \exp(-\lambda \kappa_{kdi})} \right)$$

where $\gamma$ is the parameter vector, $N_d$ is the number of simulation draws, $\lambda$ is a smoothing parameter and $\kappa_{kdi}$ is one of $N_k$ inequalities resulting from the optimal policy in the respective model evaluated for draw $d$. All three models are estimated with $\lambda = 10$ and $N_d = 500$. At these values, parameters are recovered well when data is generated with the same model.

DS conditions

These conditions are the same as in Ursu (2018), who provides further details on how they relate to the optimal policy in the DS problem. The difference to her specification is that inspection costs are linear and that there is no position. For observed consideration set $C_i$ for consumer $i$ and a given draw $d$ for the unobserved taste shocks $y_{jd}$, which defines product utilities $u_j(d)$ as well as the utility of the purchased option $u^*_i$, there are multiple stopping, continuation and purchase conditions expressed in inequalities:

Stopping: $\kappa_{kdi} = \max_{j \in C_i} u_j(d) - z_h \quad \forall h \notin C_i$
Continuation: $\kappa_{kdi} = z_{h+1} - \max_{j \in C_i(h)} u_j(d) \quad \forall h = 1, 2, \dots, N_{is} - 1$
Purchase: $\kappa_{kdi} = u^*_i(d) - u_j(d) \quad \forall j \in C_i$

In the continuation conditions, $N_{is}$ denotes the number of observed inspections, $z_{h+1}$ is the search value of the next inspection and $C_i(h)$ is the consideration set of $i$ after $h$ inspections. Note that the continuation condition relies on observing the order in which products are inspected; if this order were not observed, the method proposed by Honka and Chintagunta (2017) could be used to integrate over possible search orders. The stopping condition only applies if not all products are inspected; the continuation condition only applies if $i$ inspected at least one product.

RS conditions

The conditions in the RS model are similar to the ones in the DS model. However, the stopping and continuation conditions now are based on the reservation value $z^{RS}$, which follows directly from the optimal policy:

Stopping: $\kappa_{kdi} = \max_{j \in C_i} u_j(d) - z^{RS}$
Continuation: $\kappa_{kdi} = z^{RS} - \max_{j \in C_i(h)} u_j(d) \quad \forall h = 1, 2, \dots, N_{is} - 1$
Purchase: $\kappa_{kdi} = u^*_i(d) - u_j(d) \quad \forall j \in C_i$

FI conditions

In the FI model, standard purchase conditions apply:

$\kappa_{kdi} = u^*_i(d) - u_j(d) \quad \forall j$

G Sellers' decisions

To illustrate the difference in sellers' decision making across the SD and DS problem, we can compare the market demand generated by the SD problem with the one from the DS problem when there are infinitely many alternatives. Given a unit mass of consumers, market demand for a product discovered at position $h$ is given by

$$d_{SD}(h) = P_W\left(W_k < z^d \ \forall k < h\right) P_{W_h}\left(W_h \geq z^d\right) \tag{32}$$

where $W_h$ is the random effective value of a product on position $h$. The expression immediately follows from the stopping decision, which implies that if a consumer discovers a product with $w_j \geq z^d$, he will stop searching and buy product $j$. Hence, the consumer will only discover and have the option to buy a product on position $h$ if $w_k < z^d$ for all products $k$ on earlier positions. For the DS problem, Choi et al. (2018) showed that the market demand is given by

$$d_{DS}(h) = P_W\left(\tilde{W}_h \geq \max_{k \in J} \tilde{W}_k\right) \tag{33}$$

where $\tilde{W}_k = X_k + \min\{Y_k, \xi_k\}$.

Now suppose that the seller of a product on position $h$ sets the mean of $X_h$, for example by choosing a price. In the SD problem, this is equivalent to choosing $P_{W_h}(W_h \geq z^d)$, the probability that the consumer inspects and then stops search by buying the seller's product. Importantly, this does not directly depend on partial valuations of products at either earlier or later positions. This results from the stopping decisions: given the infinite number of products, a consumer will never recall a product discovered earlier.

In contrast, in the DS problem, choosing the mean of X hh