ROBUST SEQUENTIAL SEARCH
Karl H. Schlag and Andriy Zapechelnyuk
Abstract.
We study sequential search without priors. Our interest lies in decision rules that are close to being optimal under each prior and after each history. We call these rules dynamically robust. The search literature employs optimal rules based on cutoff strategies that are not dynamically robust. We derive dynamically robust rules and show that their performance exceeds 1/2 of the optimum against binary environments and 1/4 of the optimum against all environments. This performance improves substantially with the outside option value; for instance, it exceeds 2/3 of the optimum if the outside option exceeds 1/6 of the highest possible alternative.
JEL Classification:
D83, D81, C44
Keywords:
Sequential search; search without priors; robustness; dynamic consistency; competitive ratio
Date: 4th August 2020.
Schlag: Department of Economics, University of Vienna, Oskar-Morgenstern-Platz 1, 1090 Vienna, Austria.
E-mail: [email protected].
Zapechelnyuk: School of Economics and Finance, University of St Andrews, Castlecliffe, the Scores, St Andrews KY16 9AR, UK.
E-mail: [email protected] authors would like to thank Dirk Bergemann, Jeffrey Ely, Olivier Gossner, JohannesH¨orner, Bernhard Kasberger, and Alexei Parakhonyak for helpful comments and suggestions. a r X i v : . [ ec on . T H ] A ug SCHLAG AND ZAPECHELNYUK Introduction
Suppose that you check stores one by one in search of the cheapest place to buy some good. Your decision of when to stop searching depends on the distribution of prices you expect to encounter in unvisited stores. The methodology of Bayesian decision making proposes to turn this into an optimization problem, using as input your prior belief about possible distributions, mathematically formulated as a distribution over distributions. This is a complex and usually intractable intertemporal decision problem. Special cases can be solvable, but solutions are fragile as they depend on your beliefs about what you do not know (see Gastwirth, 1976).

We are interested in a robust approach to this problem that does not depend on specific prior beliefs of a decision maker. Instead of focusing on optimality for some prior, we look for epsilon-optimality for all priors. Furthermore, we are interested in a dynamically consistent approach in which the performance matters at any point in time, not only at the outset. In this paper, we formalize a performance criterion that fulfills these desiderata. Decision rules that are optimal under this criterion are called dynamically robust. We derive general properties of dynamically robust rules and then show how close their performance is to that of the optimal rules under each prior.

The practical relevance of robust decision making is apparent. How can a shopper know the distribution of prices offered in the next store? How does she form a prior about such distributions? Even if a prior is formed, will the shopper be able to overcome the complexity of Bayesian optimization? Will the decision rule still be good if the prior puts little or no weight on the environment that is realized? How will the shopper argue about the optimality of a particular decision rule in front of her peers if they do not have the same prior as she does? These questions can be addressed by a decision rule that performs relatively well for any prior.
Such a rule can be proposed as a compromise among Bayesian decision makers who have different priors. It is a shortcut to avoid cumbersome calculations involved in finding the Bayesian optimal rule. Finally, as a single rule that does not depend on individual (unobservable) beliefs, it is a useful benchmark for empirical studies.

The setting we consider in this paper is as follows. Alternatives arrive according to some i.i.d. process. An individual who does not know the underlying distribution has to decide after each draw whether to stop or to continue. There is free recall: when the individual stops she can choose the best alternative found so far. Values are
discounted over time; thus, waiting for better alternatives is costly. In an extension we also include an additive cost of waiting for better alternatives (see Appendix C.3).

As our first contribution, we develop a methodology for robust decision making that applies not only to sequential search. In a nutshell, we replace optimality for a given prior by epsilon-optimality under all priors and after all histories of past observations, and look for the smallest such epsilon. Specifically, we measure the performance of a given decision rule as follows. For each prior and each history, we compute the ratio of the rule's payoff to the maximal possible payoff. We then evaluate the rule by the smallest of these ratios and call it the performance ratio of the rule. This performance ratio describes what fraction of the maximal payoff can be guaranteed, regardless of the prior under which the payoffs are computed and regardless of which alternatives have realized over time. We are interested in a decision rule that achieves the largest possible performance ratio. Such a rule will be called dynamically robust.

As our second contribution, we solve the described sequential search problem. This is done first for binary environments, and then for more general environments. An environment is binary if it is a lottery over two alternatives, low and high. The values of these alternatives need not be known to the individual. We find that the dynamically robust performance ratio against binary environments is at least 1/2. So, the individual can always guarantee at least half of the maximal payoff, even if the value of the maximal payoff is not known. Moreover, if there is an upper bound on the possible values of the high alternative, then the dynamically robust performance ratio is strictly increasing in the individual's outside option, attaining 2/3 and 3/4 when the outside option is, respectively, 1/6 and 1/3 of that upper bound.
Surprisingly, these results extend to general environments, provided that possible values of alternatives have an upper bound and the outside option is not too small. The decision rule that supports these findings prescribes to stop after any given history with a probability that is increasing in the value of the best realized alternative. In general, we show that the dynamically robust performance ratio is always at least 1/4, where this lower bound is attained when alternatives are unbounded, or in the limit as the outside option approaches zero.

Our analysis reveals that a dynamically robust rule has three notable properties. First, any such rule prescribes randomization between stopping and continuing the search. Intuitively, one should not stop with certainty when concerned that future outcomes may be higher. Similarly, one should not continue with certainty when
concerned that future outcomes may never be higher. This stands in contrast with almost the entire search literature, which studies deterministic cutoff rules.

Second, a dynamically robust rule does not make any inference about the environment from past observations. The reason is that, after any history of explored alternatives, some degenerate environments can be ruled out. Yet, for every such environment, there are arbitrarily close nondegenerate environments that cannot be ruled out. Thus the closure of the set of feasible posteriors about unexplored alternatives remains unchanged.

Finally, the worst-case priors that determine the robust performance ratio are degenerate, assigning probability one to a specific i.i.d. distribution. This means that the payoff ratio of a decision rule can only be higher under nondegenerate priors. In addition, the worst-case distributions have support on at most two different values of alternatives. Loosely speaking, this is because the individual makes a binary choice in each round, and hence two values, high and low, provide enough freedom to construct worst-case distributions.

Our dynamically robust rules can be replaced by simpler rules without substantially changing the performance ratio. These simpler rules involve a stopping probability that is linear in the best realized alternative (see Appendix C.2).
Alternative Approaches to Performance Measurement.
Our paper deals with decision making under multiple priors. A prominent candidate criterion is maximin expected utility, as in Wald (1950) and Gilboa and Schmeidler (1989). There is a conceptual reason why we do not follow this approach. In this paper, we maintain the classic utility maximization preferences, moving only from optimality to epsilon-optimality. Our approach makes sense to one who is unable to solve the sequential decision problem, unsure which specific prior to assign, or in need of justifying behavior in front of others. In contrast, the maximin utility approach does not have any one of these interpretations. Moreover, it takes a very different approach to multiplicity of priors. Instead of trying to be good irrespective of the prior (as in the original meaning of the term "robust" as discussed below), it aims to do best for the very specific prior where payoffs are lowest. An exception is Janssen et al. (2017).
On top of this, the maximin utility approach is too restrictive in the sequential search problem. The rule selected by the maximin utility criterion prescribes to stop immediately and not to search at all. So, this criterion does not present useful insights for understanding how to search.

Another criterion that receives a lot of attention is minimax regret. The degree of suboptimality (referred to as regret) is measured either in terms of differences (Savage, 1951) or, as is popular in the computer science literature, in terms of ratios (Sleator and Tarjan, 1985; see also the axiomatization of Terlizzese, 2008), which can also be found in the robust contract literature (e.g., Chassang, 2013). We prefer ratios to obtain a scale-free measure and, thus, to be able to compare the performance after different histories, as well as across different specifications of the environment.

A common feature in the minimax regret literature is the evaluation of the payoffs retrospectively, after all uncertainty is resolved, as in the search models of Bergemann and Schlag (2011b) and Parakhonyak and Sobolev (2015). Instead, we adopt a forward-looking approach, similar to Hansen and Sargent (2001), Perakis and Roels (2008), Jiang et al. (2011), and Kasberger and Schlag (2017). The individual judges and compares decision rules by their discounted expected payoffs before the uncertainty is resolved, as a standard Bayesian decision maker would.

An innovative aspect of our methodology is that, in the spirit of Bayesian decision making, we evaluate the performance not only ex-ante, but also after each additional piece of information has been gathered. We identify a bound on the relative performance loss that the decision maker tolerates in exchange for having a rule that does not depend on a specific prior. The corresponding decision rule is dynamically consistent in the sense that this bound will not be exceeded, regardless of what alternatives are realized.
We are not aware of any paper that either formulates or derives dynamically consistent robust search behavior. In particular, ex-ante commitment is required in the literature on the secretary problem (Fox and Marnie, 1960) that studies sequential search within a nonrandom set of exchangeable alternatives (for a review, see Ferguson, 1989). An analysis of ex-ante robust search in the setting of this paper is difficult and remains unsolved.

Bergemann and Schlag (2011b) and Schlag and Zapechelnyuk (2017) consider dynamic decision making without priors in a non-search setting. A crucial difference from this paper is that they compare the performance of a decision rule to those of a few given benchmark strategies, not to the optimal behavior for the underlying model. We investigate the secretary problem under our criterion of dynamic robustness in a separate paper (Schlag and Zapechelnyuk, 2016).
Parakhonyak and Sobolev (2015) study a special case with two periods, and Babaioff et al. (2009) study the asymptotic performance of approximately optimal algorithms in a related problem with no recall, so these results are not comparable to our paper.
Other Related Literature.
The term robustness goes back to Huber (1964, 1965), defined as a procedure whose "performance is insensitive to small deviations of the actual situation from the idealized theoretical model" (Huber, 1965). Prasad (2003) and Bergemann and Schlag (2011a) formalize this notion for a policy choice; they measure insensitivity under small deviations as performance being close to that of the optimal policy. The same approach has been applied to large deviations, where the performance is evaluated under a large class of distributions, as in statistical treatment choice (Manski, 2004, Schlag, 2006, and Stoye, 2009), auctions (Kasberger and Schlag, 2017), and search in markets (Bergemann and Schlag, 2011b, and Parakhonyak and Sobolev, 2015). The term robustness has been used in the same spirit – to achieve an objective independently of modeling details – in robust mechanism design (Bergemann and Morris, 2005) and in the field of control theory (Zhou et al., 1995).

Dynamic consistency has been studied in other models of choice under ambiguity by Epstein and Schneider (2003), Maccheroni et al. (2006), Klibanoff et al. (2009), Riedel (2009), and Siniscalchi (2011). The challenge in this literature has been how to appropriately update information over time. In many cases this can only be done by artificially constraining possible environments and priors. We avoid the resulting conceptual and technical obstacles by letting a Bayesian decision maker process the information, which is dynamically consistent by definition.

2. Model
2.1. Setting.
An individual chooses among alternatives that arrive sequentially. She starts with an outside option x_0, which is given and strictly positive, so x_0 > 0. Then x_1, x_2, ... are realizations of an infinite sequence of i.i.d. random variables, with each x_t ≥ 0. In each round t = 0, 1, 2, ..., after having observed x_t, the individual decides whether to stop the search or to wait for another alternative. There is free recall: when the individual decides to stop, she chooses from all the alternatives she has seen so far. The highest alternative in a history h_t = (x_0, x_1, ..., x_t) is referred to as the best-so-far alternative and denoted by y_t, so y_t = max{x_0, x_1, ..., x_t}.

Payoffs are discounted over time with a discount factor δ ∈ (0, 1), so the payoff from stopping after t rounds is δ^t y_t. The discount factor incorporates various multiplicative costs of search, such as the individual's impatience and a decay of values that are not accepted.

We assume that alternatives are drawn from a given (Borel) set X ⊂ R_+, with 0 ∈ X, according to a probability distribution F. For instance, the set of alternatives X can be R_+, N, [0, x̄], or {0, x̄}. Let F_X denote the set of all distributions over X that have a finite mean. We refer to F as an environment and to F ⊂ F_X as a set of feasible environments.

We also allow for mixed environments. A mixed environment is a probability distribution with a finite support over the set of feasible environments F. The set of mixed environments is denoted by ∆(F). Mixed environments capture applications where each alternative x_t depends on two components, an independent value ξ_t and a common value θ. For example, in a job search model, the value x_t of a job offer may be expressed as x_t = θ + ξ_t, where θ is a market-wide or jobseeker-specific unobservable variable, and ξ_t is an idiosyncratic unobservable value specific to employer t.

The decision making of the individual is formalized as follows. Clearly, if the individual stops, she chooses the best-so-far alternative. So, the only relevant decision is when to stop. This is given by a decision rule p that prescribes for each history of alternatives h_t = (x_0, x_1, ..., x_t) the probability p(h_t) of stopping after that history.

The term robustness has also been used in other contexts. It appears in the maximin utility approach (Wald, 1950, and Gilboa and Schmeidler, 1989) adapted to robust contract design (Chassang, 2013, and Carroll, 2015), robust optimization (Ben-Tal et al., 2009), robust selling mechanisms (Carrasco et al., 2018), and robust control in macroeconomics (Hansen and Sargent, 2001). It also appears in Kajii and Morris (1997), where the concept of robustness is related to closeness in the strategy space rather than in the payoff space.

The restriction to multiplicative search costs is for simplicity and clarity of exposition. Our methodology extends to more general costs of search that include both additive and multiplicative components, as we show in Appendix C.3.

Inclusion of 0 in X is for notational convenience. Nothing changes if we replace 0 by some x̲ as long as the outside option satisfies x_0 ≥ x̲. Inclusion of 0 is natural in applications where search may not provide a new alternative in each round, so the absence of a new alternative is modeled as the zero-valued alternative.

Distribution F must have a finite mean to ensure that the optimal payoff under F is well defined.

We restrict attention to mixed environments with finite support to avoid technical complications of dealing with priors over infinite sets. In fact, we show later that the analysis reduces to dealing with pure environments only, so the restriction to finite support plays no role in the results.
2.2. Bayesian Decision Making.
A Bayesian approach to this search problem is as follows. A Bayesian decision maker starts with some prior over (mixed) environments. In each round, she updates this prior according to Bayes' rule and makes a choice that maximizes her expected payoff under the current posterior. Given a prior µ, we call such a decision rule optimal under µ.

Note that each prior, formally defined as a probability distribution with a finite support over the set of mixed environments ∆(F), is a compound lottery over the set of environments F. Because compound lotteries are equivalent to simple lotteries, any prior over mixed environments, µ ∈ ∆(∆(F)), is an element of ∆(F) itself. In what follows, we will refer to elements of ∆(F) synonymously as priors and mixed environments.

A prior is called degenerate if it assigns unit mass to a single environment F ∈ F. By convention, we associate each environment F with the corresponding degenerate prior, so F ∈ ∆(F).

An environment F ∈ F and a prior µ ∈ ∆(F) are called consistent with a history of alternatives h_t = (x_0, x_1, ..., x_t) if the sequence of alternatives x_1, ..., x_t occurs with positive probability under F and µ, respectively. Denote by F(h_t) the set of environments that are consistent with h_t. With a slight abuse of notation, denote by ∆(F(h_t)) the set of priors that are consistent with h_t.

2.3. Performance Criterion.
We consider an individual who does not know which environment she faces. Rather than being concerned with optimality under a particular prior, the individual wishes to find a decision rule that is approximately optimal under all priors and at all stages of the decision making. We formalize this performance criterion as follows.

Consider a set of alternatives X, a set of feasible environments F ⊂ F_X, a history of alternatives h_t = (x_0, x_1, ..., x_t), and a prior µ ∈ ∆(F(h_t)), so µ is consistent with history h_t. Let U_p(µ, h_t) denote the expected payoff of a decision rule p under µ, conditional on history h_t, so

U_p(µ, h_t) = p(h_t) y_t + (1 − p(h_t)) δ ∫_F ∫_X U_p(µ, h_t ⊕ x_{t+1}) dF(x_{t+1}) dµ(F),   (1)

where h_t ⊕ x_{t+1} = (x_0, ..., x_t, x_{t+1}).
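Under a degenerate prior (a pure environment F), recursion (1) simplifies considerably. As a purely illustrative sketch (not from the paper; the function and parameter names are ours, and restricting the rule to depend only on the best-so-far alternative is a simplification), the expected payoff of such a rule can be estimated by Monte Carlo simulation:

```python
import random

def simulate_payoff(stop_prob, draw, x0, delta, n_runs=100_000, horizon=400, seed=1):
    """Monte Carlo estimate of the expected payoff U_p in a pure environment.

    stop_prob(y): stopping probability given the best-so-far alternative y
                  (in general a rule may depend on the whole history h_t).
    draw(rng):    samples one alternative from the environment F.
    The payoff from stopping in round t is delta**t * y_t.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_runs):
        y = x0  # best-so-far alternative, initialized at the outside option
        for t in range(horizon):
            if rng.random() < stop_prob(y):
                total += delta**t * y
                break
            y = max(y, draw(rng))  # free recall: keep the best alternative seen
        # runs that hit the horizon contribute ~0, since delta**horizon is tiny
    return total / n_runs
```

For instance, a rule that always stops yields exactly the outside option, while a rule with constant stopping probability q against an environment that only ever delivers zero-valued alternatives yields the discounted geometric sum q·x_0/(1 − (1−q)δ).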
Let V(µ, h_t) denote the optimal payoff under µ conditional on h_t,

V(µ, h_t) = sup_p U_p(µ, h_t).

This is the highest possible expected payoff, in other words, the payoff of a Bayesian decision maker, under prior µ given history h_t.

The payoff ratio U_p(µ, h_t)/V(µ, h_t) describes the fraction of the optimal payoff that a given rule p attains under prior µ given history h_t. Note that V(µ, h_t) ≥ x_0 > 0. The performance ratio R_p(x_0, F) of a decision rule p is defined as the lowest payoff ratio over all histories of alternatives and all priors consistent with those histories,

R_p(x_0, F) = inf_{h ∈ H(x_0)} inf_{µ ∈ ∆(F(h))} U_p(µ, h)/V(µ, h),

where H(x_0) denotes the set of histories with outside option x_0. So, the performance ratio captures the fraction of the optimal payoff that a rule guarantees in each round. The highest possible performance ratio is called dynamically robust and is given by

R*(x_0, F) = sup_p R_p(x_0, F).

Note that R*(x_0, F) depends only on the information available from the start: the outside option x_0, the set of feasible environments F, and, implicitly, the discount factor δ.

A decision rule p* is called dynamically robust if it attains the dynamically robust performance ratio, so R_{p*}(x_0, F) = R*(x_0, F).

2.4. Motivation.
Our performance criterion can be motivated by the concept of epsilon-optimality. In this paper we replace the objective of optimality against a given environment or prior by the objective of epsilon-optimality against all environments and priors. Such a rule is robust in the sense that its performance remains close to the optimum irrespective of which particular environment in ∆(F) the individual faces.

An important aspect of economic models of search is their dynamic nature. Decisions are made in each round, and past search costs are sunk, hence irrelevant for today's choices. This dynamic nature is an integral part of our approach. We are interested in dynamic consistency of a rule, in the sense that its epsilon-optimality should hold not only ex-ante, but also in all subsequent rounds. This is why we use the term dynamically robust.

The dynamic robustness criterion does not require a decision maker to be too specific about the environment. It is appropriate for a decision maker who is willing to sacrifice payoffs in favor of more general applicability and performance stability. Imagine an individual (e.g., a CEO of a company) who must convince a group of observers (e.g., a board of directors), each with a different prior, that her decision rule is good. Assume that these observers can monitor the performance of this decision rule over time, so they must remain convinced at all stages of the decision making. If the individual's rule is dynamically robust, then no observer will ever be able to accuse the individual of underperforming by more than a specified threshold. Moreover, being dynamically robust means that the threshold is the smallest among all rules with this property.

Finally, our performance criterion can be used to quantify the value of information about the environment.
The dynamically robust performance ratio bounds the ratio of payoffs of two individuals: an ignorant one (who knows nothing about the environment) and an informed one (who knows everything about the environment). Thus, it defines the maximal payoff loss due to being uninformed about the environment.

2.5. First Insights.
Before unveiling our results, we present three simple but important insights.

2.5.1. Irrelevance of Priors.
The greatest obstacle in Bayesian optimization is that the problem of finding an optimal rule is generally intractable and only solvable for extremely simple priors. Our approach does not have this drawback, as we do not need to consider general priors. Below we show that it is enough to restrict attention to pure environments.

Note that optimal rules under pure environments are simple to find, as these are cutoff rules that prescribe searching until a certain cutoff is exceeded. Specifically, by Weitzman (1979), the optimal rule under any given environment F prescribes to stop whenever the best-so-far alternative y exceeds a reservation value c_F given by

c_F = δ ( ∫_0^{c_F} c_F dF(x) + ∫_{c_F}^∞ x dF(x) ).   (2)

The optimal payoff, given a best-so-far alternative y and an environment F, is

V(F, y) = max{y, c_F}.   (3)
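Equation (2) says that c_F is a fixed point of the map c ↦ δ·E_F[max(x, c)], a contraction with modulus δ < 1, so it can be computed by simple iteration. Below is a minimal sketch (our own illustration, assuming a finite-support environment; the function name is ours):

```python
def reservation_value(values, probs, delta, tol=1e-12, max_iter=10_000):
    """Solve equation (2), c = delta * E[max(x, c)], for a distribution F
    with finite support: P(x = values[i]) = probs[i]. The map is a
    contraction with modulus delta, so iteration from c = 0 converges."""
    c = 0.0
    for _ in range(max_iter):
        new_c = delta * sum(p * max(x, c) for x, p in zip(values, probs))
        if abs(new_c - c) < tol:
            return new_c
        c = new_c
    return c

# Binary example: z = 10 with probability 0.5, else 0, and delta = 0.9.
# For c < 10, equation (2) reads c = 0.9 * (0.5 * c + 0.5 * 10),
# so c = 4.5 / 0.55; by (3), the optimal payoff given y is max{y, c}.
```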
The proposition below shows that the performance ratio of a rule can be determined by looking only at the pure environments. Recall that F(h) and ∆(F(h)) denote the set of environments and the set of priors, respectively, that are consistent with a history h.

Proposition 1.
For each decision rule p and each history h,

inf_{µ ∈ ∆(F(h))} U_p(µ, h)/V(µ, h) = inf_{F ∈ F(h)} U_p(F, h)/V(F, h).

Note that the ratio U_p(µ, h)/V(µ, h) is nonlinear in µ, so the result does not immediately follow from the fact that each prior µ is a linear combination of points in F. The proof is in Appendix A.1.

2.5.2. Irrelevance of Histories.
How should the individual condition her decisions on past observations? For instance, what does the individual learn after having observed a history (x_1, x_2, ..., x_n)? All environments are still possible, except for the degenerate ones that assign zero probability to the values of x_1, ..., x_n. When the set of feasible environments is convex, exclusion of these degenerate environments does not change the infimum of the payoff ratios. Intuitively, this is because our performance measure involves evaluating the payoff ratio for each environment under which a given history occurs with positive probability. How likely this history occurs does not influence the payoff ratio. If the history contains observations that cannot be generated by some environment F, other environments arbitrarily close to F can generate this history with a positive, albeit arbitrarily small, probability, and F is a limit of a sequence of such environments.
Let F be convex. For each decision rule p and each history h,

inf_{F ∈ F(h)} U_p(F, h)/V(F, h) = inf_{F ∈ F} U_p(F, h)/V(F, h).

Proposition 2 states that, when evaluating the infimum of the payoff ratio, one should take into account the set of all environments, regardless of whether or not they are consistent with the observed history. The proof is in Appendix A.2.

2.5.3. Necessity to Randomize.
We now show that dynamically robust rules necessarily involve randomization. Stopping with certainty in any round is bad, because one might miss out on a high realization in the next round. Yet, continuing forever with certainty is bad too, because this destroys the value of the outside option. We show that no deterministic rule can guarantee a better performance ratio than the rule that stops in round zero.

Specifically, if one stops and obtains x_0, the maximum possible foregone payoff is sup_{F ∈ F} V(F, x_0). Thus, a performance ratio of x_0 / (sup_{F ∈ F} V(F, x_0)) is trivially obtained by stopping in round zero, that is, by not searching at all.

A decision rule p is called deterministic if p(h) ∈ {0, 1} for each history h. Let F_0 denote the Dirac environment that almost surely generates an alternative that has value 0. The next proposition shows that deterministic decision rules cannot perform better than not searching at all, as long as the environment F_0 is feasible.

Proposition 3.
Let p be a deterministic decision rule. Suppose that F_0 ∈ F. Then

R_p(x_0, F) ≤ x_0 / sup_{F ∈ F} V(F, x_0).

In particular, if the set of alternatives X is unbounded and all distributions are feasible, so F = F_X, then R_p(x_0, F) = 0. The proof is in Appendix A.3.

Remark 1.
Proposition 3 sheds light on the performance of decision rules used by Bayesian decision makers. By definition, any such rule is optimal for some prior. It stops the search if the best-so-far alternative is better than the expected continuation payoff under this prior, and continues otherwise. So, it is generically deterministic. Thus, by Proposition 3, for some priors, this rule is never better than not searching at all.

3. Binary Environments
Consider the simple case in which feasible environments can have at most one value above the outside option. We call such environments binary. This case is relevant for applications where the individual knows what she is looking for; she just does not know whether she will find it and, if so, how valuable it will be.

An environment is called binary, denoted by F_(z,σ), if it is a lottery over two values, 0 and z, with probabilities 1 − σ and σ, respectively.

Indifference between stopping and continuing under a given prior is nongeneric, in the sense that it does not hold under an open set of priors in the neighborhood of that prior.

This genericity follows from our assumption that the distribution of alternatives is exogenous. In Janssen et al. (2017) the distribution is endogenous, and the equilibrium Bayesian search rule is nondeterministic.

The assumption that the low
alternative has value 0 is for convenience: the results do not change as long as at most one alternative above the outside option realizes with positive probability. Even if the individual does not know the value of this alternative at the outset, she immediately knows it after it has realized, and stops the search. In particular, the assumption of free recall plays no role for these environments.

Given a set X of feasible alternatives, we denote by B_X the set of all binary environments over X, so

B_X = {F_(z,σ) : z ∈ X, σ ∈ [0, 1]}.

A special case of only two feasible alternatives, X = {0, z}, captures the situation where the individual knows the value of the high alternative. In this case, the only unknown parameter is how likely the high alternative is to emerge in each next round.

When facing a set B_X of binary environments, any decision rule is fully described by a sequence of probabilities

q = (q_0, q_1, q_2, ...),

where q_t is the probability to stop in round t conditional on only alternative 0 having realized in rounds 1, ..., t.

A decision rule q̄ is stationary if its stopping probability is constant, so q_0 = q_1 = q_2 = .... We will show that a particular stationary rule is dynamically robust in binary environments.

We now present our result for binary environments.

Theorem 1.
The stationary decision rule with the stopping probability

q̄* = (1 − δ)/(2 − δ)

(a) attains the performance ratio 1/2;

(b) is dynamically robust if sup X = ∞.

Theorem 1 shows that one can always guarantee at least 1/2 of the optimal payoff. The stopping probability q̄* = (1 − δ)/(2 − δ) balances the payoff ratio between environments where it is optimal to stop and those where it is optimal to keep searching until the high alternative realizes. Let us fix an outside option x_0. To simplify notation, we write v*_(z,σ) for the optimal payoff in a binary environment F_(z,σ), and u*_(z,σ) for the individual's payoff from the rule that stops with constant probability q̄* in that environment. Observe that

u*_(z,σ) = q̄* x_0 + (1 − q̄*) δ (σ max{z, x_0} + (1 − σ) u*_(z,σ)).

Substituting q̄* = (1 − δ)/(2 − δ) and solving for u*_(z,σ) yields

u*_(z,σ) = ((1 − δ) x_0 + δσ max{z, x_0}) / (2(1 − δ) + δσ).

First, consider an environment where it is optimal to stop immediately, so v*_(z,σ) = x_0. The payoff ratio is

u*_(z,σ)/v*_(z,σ) = ((1 − δ) x_0 + δσ max{z, x_0}) / ((2(1 − δ) + δσ) x_0) ≥ ((1 − δ) x_0 + δσ x_0) / ((2(1 − δ) + δσ) x_0) ≥ 1/2,   (4)

where the first inequality is by max{z, x_0} ≥ x_0, and the second inequality is because the ratio is increasing in σ ∈ [0, 1].

Next, consider an environment where z > x_0 and, moreover, it is optimal to search until z realizes, so the optimal payoff satisfies

v*_(z,σ) = δ (σz + (1 − σ) v*_(z,σ)).

Solving for v*_(z,σ) yields v*_(z,σ) = δσz/(1 − δ + δσ). The payoff ratio is

u*_(z,σ)/v*_(z,σ) = ((1 − δ) x_0 + δσz) / (2(1 − δ) + δσ) · (1 − δ + δσ)/(δσz) ≥ δσz / (2(1 − δ) + δσ) · (1 − δ + δσ)/(δσz) ≥ 1/2,   (5)

where the first inequality is by (1 − δ) x_0 ≥ 0, and the second inequality is because the ratio is increasing in σ ∈ [0, 1].

Observe that inequality (4) holds as equality when σ = 0 and would be violated for any stopping probability smaller than q̄*; and inequality (5) holds as equality in the limit as σ → 0 with σz → ∞ and would be violated for any stopping probability greater than q̄*. So, the performance ratio cannot be improved upon when sup X = ∞. The formal proof is in Section 3.1.

We achieve a better performance when environments are bounded.
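Before turning to bounded environments, the calculation above is easy to spot-check numerically. The sketch below (our own illustration, with hypothetical names) evaluates the closed-form payoff ratio u*_(z,σ)/v*_(z,σ) of the stationary rule on a grid of binary environments and confirms that it stays above 1/2, approaching 1/2 only in the limiting cases identified above:

```python
def payoff_ratio(x0, z, sigma, delta):
    """Payoff ratio u*/v* of the stationary rule with stopping probability
    (1 - delta)/(2 - delta) in the binary environment F_(z, sigma)."""
    m = max(z, x0)
    # payoff of the stationary rule (closed form derived in the text)
    u = ((1 - delta) * x0 + delta * sigma * m) / (2 * (1 - delta) + delta * sigma)
    # optimal payoff: stop now (x0), or search until z realizes
    v = max(x0, delta * sigma * z / (1 - delta + delta * sigma))
    return u / v

# Equality with 1/2 at sigma = 0, strict inequality elsewhere:
assert abs(payoff_ratio(1.0, 5.0, 0.0, 0.9) - 0.5) < 1e-12
ratios = [payoff_ratio(1.0, z, s, d)
          for z in (0.5, 1.0, 2.0, 10.0, 1e3, 1e6)
          for s in (0.001, 0.01, 0.1, 0.5, 1.0)
          for d in (0.5, 0.9, 0.99)]
assert min(ratios) > 0.5
```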
For each x ∈ [0, 1], define

   q*(x) = 2(1 − δ) / (4 − 2δ + x − √(x(x + 8)))   and   ρ(x) = 1/2 + (1/8)(x + √(x(x + 8))).   (6)

Theorem 1′. Let x̄ = sup X < ∞ and let 0 < x₀ ≤ x̄. The stationary decision rule with the stopping probability q*(x₀/x̄)

(a) attains the performance ratio ρ(x₀/x̄) > 1/2;
(b) is dynamically robust if x₀/x̄ ≤ δ²/(2 − δ).

Remark 2. If x₀/x̄ > δ²/(2 − δ), then the rule q*(x₀/x̄) is not dynamically robust (so, a higher performance ratio can be attained). Yet q*(x₀/x̄) attains the performance ratio ρ(x₀/x̄), which is already very good in this case:

   ρ(x₀/x̄) > ρ(δ²/(2 − δ)) = 1/(2 − δ) > δ   for all x₀/x̄ > δ²/(2 − δ).
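A quick numerical sketch (ours; the parameter values are illustrative) confirms the landmark values of ρ, the identity behind Remark 2, and the fact that ρ does not depend on the discount factor:

```python
import math

# Illustrative check of the closed forms in (6).
def q_star(x, delta):
    return 2 * (1 - delta) / (4 - 2 * delta + x - math.sqrt(x * (x + 8)))

def rho(x):
    return 0.5 + (x + math.sqrt(x * (x + 8))) / 8

# Landmark values of the performance ratio:
assert abs(rho(1 / 6) - 2 / 3) < 1e-12   # outside option 1/6 of the maximum
assert abs(rho(1 / 3) - 3 / 4) < 1e-12
assert abs(rho(1.0) - 1.0) < 1e-12

# rho(x) = q*(x) / (1 - delta*(1 - q*(x))) for every discount factor,
# so the performance ratio is independent of delta:
for delta in (0.5, 0.7, 0.9):
    for x in (0.01, 0.1, 0.5, 1.0):
        r = q_star(x, delta) / (1 - delta * (1 - q_star(x, delta)))
        assert abs(r - rho(x)) < 1e-9

# The boundary value of Remark 2: rho(delta^2/(2 - delta)) = 1/(2 - delta).
for delta in (0.5, 0.7, 0.9):
    assert abs(rho(delta**2 / (2 - delta)) - 1 / (2 - delta)) < 1e-9
```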
The dynamically robust rule and its performance ratio for all x₀/x̄ ∈ (0, 1] are derived in Section 3.1 below (see (14) and (15)).

Theorem 1′ shows that one can guarantee more than 1/2 of the optimal payoff when it is known how large the outside option x₀ is relative to the highest feasible alternative x̄. In fact, if x₀ is extremely small, then the performance ratio is close to 1/2. Yet one can guarantee at least 2/3 or 3/4 of the optimal payoff if x₀/x̄ exceeds, respectively, 1/6 or 1/3. Table 1 illustrates the performance ratio ρ(x₀/x̄) for a few values of x₀/x̄.

   x₀/x̄     | 1/89  | 1/50  | 1/20  | 1/10  | 1/6   | 1/5   | 1/4   | 1/3
   ρ(x₀/x̄)  | 0.538 | 0.552 | 0.586 | 0.625 | 0.666 | 0.685 | 0.71  | 0.75

Table 1. Some values of the performance ratio of rule q*(x₀/x̄).

Figure 1. Stopping probability q*(x₀/x̄) for several values of the discount factor δ, including δ = 0.5, 0.7, and 0.9.

Notice that the stopping probability q*(x₀/x̄) of the stationary rule is increasing in x₀/x̄. Figure 1 illustrates this stopping probability for some values of the discount factor.

Curiously, the performance ratios identified in Theorems 1 and 1′ are independent of the discount factor δ. Intuitively, this is because both the individual's payoff from the rule given by (6) and the optimal payoff V are evaluated using the same discount factor. So, when following a dynamically robust rule, a more patient individual simply waits longer in expectation.

3.1. Proof of Theorems 1 and 1′. Part (a) in each of the theorems can be proven by a simple verification that the specified stationary decision rule yields at least the claimed performance ratio. The proof of part (b) is more involved, as it requires showing that there is no other rule that attains a higher performance ratio.

Specifically, there are two main stepping stones to the proof of Theorems 1 and 1′. The first stepping stone was provided in Proposition 1, where we showed that the performance ratio can be determined by looking only at the pure environments, so we do not need to worry about mixed environments and priors.

The second stepping stone, which we now establish, is that we can restrict attention to stationary decision rules without loss of generality. This simplifies the problem tremendously, as any stationary rule is described by a single parameter: the constant stopping probability. So, it becomes a single-variable optimization problem.

Clearly, the individual should stop the search after observing any alternative other than zero, as she then knows that such an alternative is the best possible. However, as long as only zero-valued alternatives have realized, this history is irrelevant for the evaluation of the performance ratio, as shown by Proposition 2. That is, the individual faces the exact same problem again and again, as long as she draws zero-valued alternatives.
Given this unchanging problem, there are neither fundamental nor strategic reasons to condition decisions on the history. We now show that this intuition is correct, so we can search for a dynamically robust rule among stationary rules.
Proposition 4. For each decision rule q there exists a stationary decision rule q̄ such that

   R_q̄(x₀, B_X) ≥ R_q(x₀, B_X).

The proof is in Appendix A.4.
We are now ready to prove Theorem 1′. Theorem 1 will follow by taking the limit x̄ → ∞ for a fixed x₀, so that x₀/x̄ → 0.

Let X be a set of feasible alternatives with x̄ = sup X < ∞, and let x₀ ∈ (0, x̄] be an outside option. By Proposition 1, we restrict attention to the set of pure environments B_X. By Proposition 4, we consider stationary decision rules. Any such rule is identified with its constant probability of stopping, q ∈ [0, 1].

For each environment F_(z,σ) ∈ B_X, let U_q(F_(z,σ), x₀) be the individual's expected payoff from a stationary rule q:

   U_q(F_(z,σ), x₀) = qx₀ + (1 − q)δ((1 − σ)U_q(F_(z,σ), x₀) + σ max{z, x₀}).   (7)

By (2), the reservation value c_F(z,σ) satisfies c_F(z,σ) = δ(σz + (1 − σ)c_F(z,σ)). By (3), the optimal payoff is given by V(F_(z,σ), x₀) = max{x₀, c_F(z,σ)}. We thus obtain

   c_F(z,σ) = δσz/(1 − δ(1 − σ))   and   V(F_(z,σ), x₀) = max{x₀, δσz/(1 − δ(1 − σ))}.   (8)

The performance ratio of rule q is

   R_q(x₀, B_X) = inf_{F∈B_X} U_q(F_(z,σ), x₀)/V(F_(z,σ), x₀)
                = inf_{F∈B_X} min{ U_q(F_(z,σ), x₀)/x₀ , U_q(F_(z,σ), x₀)/c_F(z,σ) }
                = min{ inf_{F∈B_X} U_q(F_(z,σ), x₀)/x₀ , inf_{F∈B_X} U_q(F_(z,σ), x₀)/c_F(z,σ) }.   (9)

In words, the individual worries about two scenarios: c_F(z,σ) < x₀, in which case it is optimal to stop immediately, and c_F(z,σ) ≥ x₀, in which case a high value of z is sufficiently likely, and it is optimal to wait for it. The optimal stopping probability q should be large in the first scenario and small in the second scenario; thus it should balance this tradeoff. To find the optimal q, we evaluate the worst-case ratios for each of the two scenarios.

Consider the first expression under the minimum in (9). Solving (7) for U_q(F_(z,σ), x₀) yields

   U_q(F_(z,σ), x₀) = (qx₀ + (1 − q)δσ max{z, x₀}) / (1 − δ(1 − σ)(1 − q)).   (10)

We thus have

   inf_{F_(z,σ)∈B_X} U_q(F_(z,σ), x₀)/x₀ = inf_{z∈X, σ∈[0,1]} (qx₀ + (1 − q)δσ max{z, x₀}) / ((1 − δ(1 − σ)(1 − q))x₀)
                                        = q/(1 − δ(1 − q)),   (11)

where the last equality is because max{z, x₀} ≥ x₀ and the ratio is increasing in σ, and thus achieves the minimum at σ = 0. So, the worst-case environments in the first scenario are those environments F_(z,σ) in which σ = 0, so the high alternative z never occurs.

Next, consider the second expression under the minimum in (9). By (8) and (10),

   inf_{F_(z,σ)∈B_X} U_q(F_(z,σ), x₀)/c_F(z,σ)
      = inf_{z∈X, σ∈[0,1]} (qx₀ + (1 − q)δσz)/(1 − δ(1 − σ)(1 − q)) · (1 − δ(1 − σ))/(δσz)
      = inf_{σ∈[0,1]} ( inf_{z∈X} (q(x₀/z) + (1 − q)δσ)(1 − δ(1 − σ)) / ((1 − δ(1 − σ)(1 − q))δσ) )
      = inf_{σ∈[0,1]} (q(x₀/x̄) + (1 − q)δσ)(1 − δ(1 − σ)) / ((1 − δ(1 − σ)(1 − q))δσ),   (12)

where the last equality is by inf_{z∈X} x₀/z = x₀/x̄. So, the worst-case environments in the second scenario are those environments F_(z,σ) in which z = x̄, so z is the highest possible alternative.

Thus, from (11) and (12), we need to solve

   max_{q∈[0,1]} min_{σ∈[0,1]} min{ q/(1 − δ(1 − q)) , (q(x₀/x̄) + (1 − q)δσ)(1 − δ(1 − σ)) / ((1 − δ(1 − σ)(1 − q))δσ) }.   (13)

Denote x̂ = x₀/x̄. It is straightforward to verify that the unique solution (q̄, σ̄) of the maximin problem (13) is

   q̄ = 2(1 − δ)/(4 − 2δ + x̂ − √(x̂(x̂ + 8)))                              if 0 < x̂ ≤ δ²/(2 − δ),
        (√((1 − δ)((2δ − x̂)² − δx̂²)) − (1 − δ)(2δ − x̂)) / (2δ(δ − x̂))    if δ²/(2 − δ) < x̂ < δ,
        1                                                                if δ ≤ x̂ ≤ 1,   (14)

   σ̄ = (1 − δ)(3x̂ + √(x̂(x̂ + 8))) / (2δ(1 − x̂))   if 0 < x̂ ≤ δ²/(2 − δ),
        1                                           if δ²/(2 − δ) < x̂ ≤ 1.

We have thus derived a dynamically robust decision rule q̄. Substituting q̄ into (9) yields the dynamically robust performance ratio

   R*(x₀, B_X) = 1/2 + (1/8)(x̂ + √(x̂(x̂ + 8)))                            if 0 < x̂ ≤ δ²/(2 − δ),
                 (2δ − (1 − δ)x̂ − √((1 − δ)((2δ − x̂)² − δx̂²))) / (2δ²)    if δ²/(2 − δ) < x̂ < δ,
                 1                                                        if δ ≤ x̂ ≤ 1.   (15)
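The maximin problem (13) can also be attacked by brute-force grid search, which gives a useful numerical cross-check of the closed forms (14) and (15). The sketch below is ours (not from the paper); the parameter values and the grid resolution 1/1000 are arbitrary choices within the first region of (14):

```python
import math

# Illustrative grid-search check of (13): delta and xh = x0/xbar are
# arbitrary, with xh in the region 0 < xh <= delta^2/(2 - delta).
delta, xh = 0.9, 0.1

def worst_ratio(q):
    # First scenario (sigma = 0): ratio q/(1 - delta*(1 - q)); see (11).
    r1 = q / (1 - delta * (1 - q))
    # Second scenario: minimize the ratio in (12) over sigma on a grid.
    r2 = min(
        (q * xh + (1 - q) * delta * s) * (1 - delta * (1 - s))
        / ((1 - delta * (1 - s) * (1 - q)) * delta * s)
        for s in (k / 1000 for k in range(1, 1001))
    )
    return min(r1, r2)

best = max(worst_ratio(k / 1000) for k in range(1, 1000))
rho = 0.5 + (xh + math.sqrt(xh * (xh + 8))) / 8  # first line of (15)
assert abs(best - rho) < 1e-2  # grid approximation of the closed form
```

For δ = 0.9 and x̂ = 0.1 the grid maximin is attained near q̄ ≈ 0.143 with value ≈ 0.625 = ρ(0.1), matching (14) and (15).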
Finally, observe that the decision rule q*(x̂) given by (6) coincides with the dynamically robust rule q̄ for x̂ = x₀/x̄ ≤ δ²/(2 − δ). For all 0 < x̂ ≤ 1,

   R_{q*}(x₀, B_X) = q*(x̂)/(1 − δ(1 − q*(x̂))) = 1/2 + (1/8)(x̂ + √(x̂(x̂ + 8))) = ρ(x̂) > 1/2.

This completes the proof of Theorem 1′.

4. General Environments
Consider now more general environments that potentially generate multiple alternatives above the outside option. To keep the exposition simple, we fix a set of alternatives X and allow for all distributions over X that have finite support, so F = F_X. In contrast to binary environments, here the first alternative above the outside option need not be the best, so sometimes the individual may wish to search for even better alternatives. This makes decision making more complex. We deal with this complexity by building on and extending our insights obtained for the binary setting. Once again, we can restrict attention to simple decision rules, which here means that they are stationary and have some monotonicity properties. We can also restrict attention to binary environments, as only these determine the worst-case payoff ratio.

4.1. Simplicity of Decision Rules.
By Proposition 2, histories are irrelevant for the evaluation of the performance ratio. The only payoff-relevant variable is the best-so-far alternative. Intuitively, the individual has no reason to condition decisions on anything other than the best-so-far alternative. This suggests that we can restrict attention to stationary decision rules, in which the probability of stopping in each round depends only on the best-so-far alternative.

Formally, a decision rule p is stationary if the stopping probability is the same for any pair of histories h′ and h″ with the same best-so-far alternative, so

   max{x : x ∈ h′} = max{x : x ∈ h″}  ⟹  p(h′) = p(h″)   for all h′, h″ ∈ H(x₀).

With stationary decision rules, we simplify notation by replacing each history h_t with the best-so-far alternative y = max{x₀, x₁, ..., x_t}. So, a stationary rule p : R₊ → [0, 1] prescribes, for each best-so-far alternative y, to stop with probability p(y). For each environment F and each best-so-far alternative y, the optimal payoff is given by

   V(F, y) = max_{q∈[0,1]} ( qy + (1 − q)δ ∫₀^∞ V(F, max{y, x}) dF(x) ),

and the payoff of rule p is given by

   U_p(F, y) = p(y)y + (1 − p(y))δ ∫₀^∞ U_p(F, max{y, x}) dF(x).

We introduce two intuitive properties of a stationary decision rule.

A stationary decision rule p is monotone if p(y) is weakly increasing. It is natural that the individual is more likely to accept a greater best-so-far alternative.

A stationary decision rule p has the monotone ratio property if

   r_p(y) := inf_{F∈B_X} U_p(F, y)/V(F, y) is weakly increasing.   (16)

This is a "free-disposal" property. Suppose that the best-so-far alternative has increased from y to y′, but the payoff ratio has decreased.
Then the individual would be better off destroying some part of the value of the best-so-far alternative and decreasing it back to y.

Decision rules in general environments can be very complex. The next proposition shows that we can restrict attention to much simpler decision rules, namely, those that are stationary, monotone, and have the monotone ratio property.

Proposition 5.
For each decision rule p, there exists a stationary monotone decision rule p̃ with the monotone ratio property such that

   R_p̃(x₀, F_X) ≥ R_p(x₀, F_X).

The proof is in Appendix B.1.

4.2. Simplicity of Worst-Case Environments.
Proposition 5 shows that there exist simple dynamically robust rules. We now show that their simple nature causes worst-case environments to be very simple, too. Specifically, these environments are binary.
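The payoff recursions of Section 4.1 are straightforward to evaluate for any finite-support environment. The sketch below is ours (the environment F is a hypothetical example); it computes the optimal payoff V(F, y) by value iteration, using the fact that the maximum over q in the definition of V is attained at q ∈ {0, 1} because the objective is linear in q:

```python
# Illustrative value iteration (assumptions: F is given as a list of
# (value, probability) pairs with finite support; y is the best-so-far
# alternative; notation follows Section 4.1).
delta = 0.9
F = [(0.0, 0.6), (0.5, 0.3), (1.0, 0.1)]  # hypothetical environment

def value_iterate(F, delta, tol=1e-12):
    ys = sorted({v for v, _ in F})
    V = {y: y for y in ys}  # start from the "stop now" payoffs
    while True:
        Vn = {}
        for y in ys:
            cont = delta * sum(p * V[max(y, x)] for x, p in F)
            Vn[y] = max(y, cont)  # optimal q is 0 or 1: payoff linear in q
        if max(abs(Vn[y] - V[y]) for y in ys) < tol:
            return Vn
        V = Vn

V = value_iterate(F, delta)
assert V[1.0] == 1.0   # the highest alternative is accepted immediately
assert V[0.5] >= 0.5   # the option value of continuing can only help
```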
Proposition 6.
Let decision rule p be stationary, monotone, and satisfy the monotone ratio property. Then

   R_p(x₀, F_X) = inf_{y≥x₀} inf_{F∈B_X} U_p(F, y)/V(F, y).

The proof is in Appendix B.2.

To gain intuition for Proposition 6, recall that the individual cares about two contingencies: stopping when she should have waited for a higher realization of the
value, and continuing when there are no better alternatives in the future. The worst-case distributions for these contingencies need not be complex: they are binary-valued.

We hasten to point out that Proposition 6 does not imply that the individual should act as if she faces binary environments, as otherwise she would stop after seeing any alternative above the outside option. Instead, Proposition 6 implies that, when evaluating the payoff ratio after any history of realized alternatives, we only need to do so for all binary environments. The value of Proposition 6 is that it drastically simplifies the calculation of the performance ratio.

Note that binary environments are not consistent with histories that contain more than two values. However, by Proposition 2, we should not be worried about this inconsistency, as any binary distribution that is inconsistent with a history can be obtained as a limit of a sequence of distributions that are consistent with that history.

4.3.
Dynamically Robust Performance.
We are now ready to present our findings for general environments.
Theorem 2.
The stationary decision rule p̄ given by p̄(y) = (1 − δ)/(2 − δ) for each y

(a) attains the performance ratio R_p̄(x₀, F_X) ≥ 1/4;
(b) is dynamically robust if sup X = ∞.

The proof is in Appendix B.3.

Part (a) shows that the dynamically robust performance ratio R*(x₀, F_X) is at least 1/4. Intuitively, for each best-so-far alternative y, the relevant worst-case environments are those where there is an alternative z that is very large relative to y. The stopping probability (1 − δ)/(2 − δ) balances the payoff ratio between environments where z is sufficiently unlikely (so it is optimal to stop) and environments where z is likely enough and is worth waiting for. Consider decision rules with a constant stopping probability, q, in each round, in particular, before and after z realizes. A greater q means a shorter delay of obtaining z after it has realized, but also a greater probability of stopping before the first realization of z. In the limit, as y/z tends to 0, the payoff ratio takes the form

   (q/(1 − δ(1 − q)))(1 − q/(1 − δ(1 − q))).   (17)

The first factor in (17),

   q/(1 − δ(1 − q)) = q + δ(1 − q)q + δ²(1 − q)²q + ⋯,

is a reciprocal of the expected delay of obtaining z after its realization. The second factor in (17) is the probability of not stopping before z realizes for the first time. Setting q/(1 − δ(1 − q)) equal to 1/2 yields q = (1 − δ)/(2 − δ) and the guaranteed payoff ratio 1/4.

The next result uses the functions q* and ρ given by (6) in Section 3.

Theorem 2′. Let x̄ = sup X < ∞. Then there exists a constant L ∈ (0, 1) such that the dynamically robust performance ratio satisfies

   R*(x₀, F_X) ≥ ρ(x₀/x̄) > 1/2   if x₀/x̄ ≥ L.

Moreover, if x₀/x̄ ≥ 1/6, then the decision rule p* given by p*(y) = q*(y/x̄)

(a) attains the performance ratio ρ(x₀/x̄) > 1/2;
(b) is dynamically robust if 1/6 ≤ x₀/x̄ ≤ δ²/(2 − δ).
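The balancing argument behind (17) is easy to verify numerically. The sketch below is ours (the value of δ is arbitrary): it checks that the factor q/(1 − δ(1 − q)) equals 1/2 at q = (1 − δ)/(2 − δ), and that the product in (17) never exceeds 1/4:

```python
# Illustrative check of (17): with r(q) = q/(1 - delta*(1 - q)), the limiting
# payoff ratio is r(q)*(1 - r(q)), maximized at r = 1/2, i.e. at
# q = (1 - delta)/(2 - delta), where it equals 1/4.
delta = 0.6

def r(q):
    return q / (1 - delta * (1 - q))

q_bar = (1 - delta) / (2 - delta)
assert abs(r(q_bar) - 0.5) < 1e-12
assert all(r(q) * (1 - r(q)) <= 0.25 + 1e-12
           for q in (k / 1000 for k in range(1001)))
```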
The proof is in Appendix B.4.

Theorem 2′ shows that, if the outside option is not too small relative to the highest possible alternative, in the sense that x₀/x̄ ≥ L, then the dynamically robust performance ratio for general environments is the same as that for binary environments. That is, the expansion from the binary to the general set of environments confers no reduction in the dynamically robust performance. Remarkably, the constant L is very small: we numerically find an upper bound for L, L ≤ 1/89, which is independent of the discount factor δ. (If x₀/x̄ is not in [1/6, δ²/(2 − δ)], then the rule p* is not dynamically robust. The dynamically robust rule and its performance ratio for each x₀/x̄ ∈ [L, 1] are derived in the proof of Theorem 2′, Appendix B.4.)
Thus, as x₀/x̄ increases from 0 to a mere 1/89, the dynamically robust performance ratio climbs from 1/4 to above 1/2. In particular, one can guarantee at least 2/3 or 3/4 of the optimum if x₀/x̄ exceeds, respectively, 1/6 or 1/3. In Figure 2, the dynamically robust performance ratio R*(x₀, F_X) is shown as a solid line for x₀/x̄ ≥ L, and we hypothesize that it looks as depicted by the dotted line for x₀/x̄ < L.

Figure 2. The dashed line shows the performance ratio of rule p*. The solid line shows the dynamically robust performance ratio ρ(x₀/x̄) for x₀/x̄ ≥ L when δ is sufficiently large, so that x₀/x̄ ≤ δ²/(2 − δ). The dotted line shows the hypothetical value of the dynamically robust performance ratio when x₀/x̄ < L.

In addition, Theorem 2′ shows that the dynamically robust rule identified for binary environments is also dynamically robust in general environments when x₀/x̄ ≥ 1/6. The performance ratio ρ(x₀/x̄) is attained by rule p* (the dashed line) when x₀/x̄ ≥ 1/6. For x₀/x̄ ∈ [L, 1/6), the rule p* is no longer dynamically robust: in Figure 2, the dashed line showing R_{p*}(x₀, F_X) is below the solid line showing ρ(x₀/x̄). Nevertheless, the performance ratio ρ(x₀/x̄) can still be attained; in our proof of Theorem 2′ we present a rule that is dynamically robust in this case.

Two elements of Theorem 2′ prompt curiosity.

First, why is the dynamically robust performance ratio the same as under binary environments for such a large interval of outside options? It turns out that when x₀/x̄ ≥ L, the environments that determine the worst-case ratio are lotteries over the extreme alternatives, 0 and x̄, just as the worst-case binary environments put weight only on the extreme alternatives. Other alternatives do not play any role in this parameter region, but they do when x₀/x̄ < L.

Second, why is Theorem 2′ silent about dynamic robustness when 0 < x₀/x̄ < L? This is because a closed-form expression for the dynamically robust performance ratio is not available for this region. Yet, for each parameter value in this region, the dynamically robust performance ratio, together with an associated decision rule, can be derived using a recursive procedure that we describe in Appendix C.1.

5. Conclusion
It is difficult to search when the distribution of alternatives is not known. In fact, as outlined in the literature review in the introduction, the literature has not produced satisfactory insights into how to search in this setting. In this paper we identify that this difficulty is due to the desire to achieve the very highest payoff for the given beliefs. Namely, we find that it is easier to search if one reduces the target and replaces "very highest" by "relatively high". The ease refers to the ability to derive an optimal solution for a very general setting, the simplicity of our algorithm, and the minimality of assumptions one needs to impose on the environment.

The methodology developed in this paper is general, applicable to a spectrum of dynamic decision making problems, and should spark future research. Its strength is that it allows for dynamically consistent decision making with multiple priors. It is as if our decision maker is surrounded by other individuals, each of whom has her own prior. At any point in time, each of these individuals wants to complain that our rule is not appropriate given their prior. According to our concept, a dynamically robust rule may not be optimal given their prior, but their complaints cannot be large. Our analysis reveals bounds on the size of any such complaint.

Our results about the bounds on the size of possible complaints do not change if the searcher has more information about the environment, for example, if she can restrict the set of priors, or if the number of alternatives is finite and known. What
does change is the tightness of these bounds. Of course, when more information is available, better decision rules can be found.

The main insights (randomization is essential, dynamically robust rules are stationary, worst-case priors are simple) extend to search without recall and to search with exchangeable distributions.

New avenues for research on dynamic robustness are opened, such as extending this agenda to matching and to other search environments that involve strategic interaction. The economic insights of models that include agents searching under known distributions can now be reevaluated using agents that employ dynamically robust search.
Appendix A. Binary Environments
A.1.
Proof of Proposition 1.
Fix a history h and a prior μ that is consistent with that history, so μ ∈ Δ(F(h)). Let

   r_p(F) = U_p(F, h)/V(F, h)   and   η(F) = V(F, h)μ(F | h)/V(μ, h).

Note that the posterior μ(· | h) must assign zero probability to the set of all environments that are inconsistent with h; thus

   U_p(μ, h) = Σ_{F∈F(h)} U_p(F, h)μ(F | h).

Using the above notation we obtain

   U_p(μ, h)/V(μ, h) = Σ_{F∈F(h)} U_p(F, h)μ(F | h)/V(μ, h) = Σ_{F∈F(h)} r_p(F)V(F, h)μ(F | h)/V(μ, h)
                     = Σ_{F∈F(h)} r_p(F)η(F) ≥ inf_{F∈F(h)} r_p(F) = inf_{F∈F(h)} U_p(F, h)/V(F, h),

where the inequality follows from r_p(F) ≥ 0, η(F) ≥ 0, and

   Σ_{F∈F(h)} η(F) = Σ_{F∈F(h)} V(F, h)μ(F | h)/V(μ, h) = Σ_{F∈F(h)} sup_p U_p(F, h)μ(F | h)/V(μ, h)
                   ≥ sup_p Σ_{F∈F(h)} U_p(F, h)μ(F | h)/V(μ, h) = sup_p U_p(μ, h)/V(μ, h) = V(μ, h)/V(μ, h) = 1.

Since the above holds for all μ ∈ Δ(F(h)), we have

   inf_{μ∈Δ(F(h))} U_p(μ, h)/V(μ, h) ≥ inf_{F∈F(h)} U_p(F, h)/V(F, h).

The proof of the reverse of the above inequality is trivial, since F(h) is a subset of Δ(F(h)).

A.2. Proof of Proposition 2.
Fix an outside option x₀ > 0 and a history h ∈ H(x₀). Recall that F(h) ⊂ F denotes the set of environments that are consistent with a history h, and note that F(h) ≠ ∅. Consider two environments, F ∈ F(h) and G ∈ F. Let (G_k)_{k=1}^∞ be the sequence of environments given by

   G_k = (1/k)F + (1 − 1/k)G,   k ∈ N,

so lim_{k→∞} G_k = G. By convexity of F, G_k ∈ F for all k ∈ N. Consistency of F ∈ F(h) with the history h = (x₁, x₂, ..., x_t) means that supp(F) contains {x₁, ..., x_t}. So, {x₁, ..., x_t} ⊂ supp(F) ⊂ supp(G_k), and thus G_k ∈ F(h) for all k ∈ N. Since the above is true for all G ∈ F, it follows that Closure(F(h)) = F, which proves the proposition.

A.3. Proof of Proposition 3.
Let p be deterministic. Suppose that there exists k ∈ {0, 1, 2, ...} such that p stops searching after k zero-valued alternatives. Formally, p(x₀ ⊕ 0_k) = 1, where 0_k denotes the sequence of k zeros and '⊕' denotes the vector concatenation operator. For any F ∈ F, the individual's payoff in round k is U_p(F, x₀ ⊕ 0_k) = x₀, and the optimal payoff in round k satisfies

   V(F, x₀ ⊕ 0_k) ≤ sup_{F′∈F} V(F′, x₀),

because, by (3), V(F, x₀) = V(F, x₀ ⊕ 0_k). Consequently,

   R_p(x₀, F) ≤ inf_{F∈F} U_p(F, x₀ ⊕ 0_k)/V(F, x₀ ⊕ 0_k) = x₀ / sup_{F∈F} V(F, x₀).

Now, consider the complementary case where p never stops searching as long as only zeros occurred in the past. So, p(x₀ ⊕ 0_k) = 0 for all k ∈ {0, 1, 2, ...}. Consider an environment F₀ in which all alternatives are equal to zero with certainty. In round 0, the optimal payoff under F₀ is V(F₀, x₀) = x₀. Since p continues after each history with only zeros, it never stops under F₀, and hence its payoff is U_p(F₀, x₀) = 0. Consequently,

   R_p(x₀, F) ≤ U_p(F₀, x₀)/V(F₀, x₀) = 0/x₀ = 0.

A.4.
Proof of Proposition 4.
In what follows, we denote by q̄^∞ a constant sequence, so q̄^∞ = (q̄, q̄, ...) for q̄ ∈ [0, 1]. We show that any sequence q′ = (q′₀, q′₁, ...) can be replaced by a constant sequence q̄^∞ that has a weakly higher performance ratio in binary environments. Note that we only need to compare the individual's payoffs U_{q′} and U_{q̄^∞}, as the optimal payoff V does not depend on the decision rule.

For consistency with the notation in Appendix B, we write y = x₀. In the paper, y denotes the current best-so-far alternative, and in binary environments this is always the outside option x₀. Also, note that in binary environments we only have to consider histories in which only zeros occur, and hence we replace h_t by the round number t.

The expected payoff of a rule q in each round t = 0, 1, 2, ... is given by

   U_q(F_(z,σ), t) = q_t y + (1 − q_t)δ(σz + (1 − σ)U_q(F_(z,σ), t + 1)).   (18)

For each (z, σ), denote the worst expected payoff among all rounds by

   Ū_q(F_(z,σ)) = inf_{t=0,1,...} U_q(F_(z,σ), t).

Let q′ be an arbitrary sequence of probabilities. This q′ will be called a benchmark and will be fixed for the rest of the proof. We say that a sequence q is better than q′ for (z, σ) if its worst expected payoff under environment F_(z,σ) is at least as good as that of the benchmark q′, so

   Ū_q(F_(z,σ)) ≥ Ū_{q′}(F_(z,σ)).

Let q̄^∞ = (q̄, q̄, ...) be the constant sequence where q̄ is a solution of the equation

   U_{q′}(F_(z,0), 0) = q̄y + (1 − q̄)δU_{q′}(F_(z,0), 0),   (19)

so

   q̄ = (1 − δ)U_{q′}(F_(z,0), 0) / (y − δU_{q′}(F_(z,0), 0)).

By (18), U_{q′}(F_(z,0), 0) ∈ [0, y] when σ = 0, so q̄ ∈ [0, 1]. We will show that q̄^∞ is better than q′ for all z ≥ y and all σ ∈ [0, 1].

By (18), observe that for any sequence q and any t,

   U_q(F_(z,σ), t) − y = (1 − q_t)(δσz + δ(1 − σ)U_q(F_(z,σ), t + 1) − y)
                       = (1 − q_t)(δσz − (1 − δ(1 − σ))y + δ(1 − σ)(U_q(F_(z,σ), t + 1) − y)).

Iterating the above for t + 1, t + 2, ..., we obtain

   U_q(F_(z,σ), t) − y = (δσz − (1 − δ(1 − σ))y) Σ_{k=0}^∞ ( δ^k(1 − σ)^k Π_{s=t}^{t+k}(1 − q_s) ).   (20)

First, assume that δσz − (1 − δ(1 − σ))y = 0. Then U_q(F_(z,σ), t) − y = 0 for every q and every t. In particular, Ū_{q̄^∞}(F_(z,σ)) = Ū_{q′}(F_(z,σ)) = y. So, q̄^∞ is better than q′.

Next, assume that δσz − (1 − δ(1 − σ))y ≠ 0. Define

   ψ^σ_t(q) = (U_q(F_(z,σ), t) − y) / (δσz − (1 − δ(1 − σ))y).   (21)

By (20),

   ψ^σ_t(q) = (1 − q_t)(1 + δ(1 − σ)ψ^σ_{t+1}(q)) = Σ_{k=0}^∞ ( δ^k(1 − σ)^k Π_{s=t}^{t+k}(1 − q_s) ).   (22)

Note that for any constant sequence q^∞ = (q, q, ...),

   ψ^σ(q^∞) = Σ_{k=0}^∞ δ^k(1 − σ)^k(1 − q)^{k+1} = (1 − q)/(1 − δ(1 − σ)(1 − q)),   (23)

where we omit the subscript t for notational simplicity. When δσz − (1 − δ(1 − σ))y > 0, the constant sequence q̄^∞ is better than q′ if ψ^σ(q̄^∞) ≥ inf_t ψ^σ_t(q′). When δσz − (1 − δ(1 − σ))y < 0, the constant sequence q̄^∞ is better than q′ if −ψ^σ(q̄^∞) ≥ inf_t(−ψ^σ_t(q′)), that is, if ψ^σ(q̄^∞) ≤ sup_t ψ^σ_t(q′). Therefore, to prove that q̄^∞ is better than q′ for all z ≥ y and all σ ∈ [0, 1], it suffices to show that for all σ ∈ [0, 1],

   inf_t ψ^σ_t(q′) ≤ ψ^σ(q̄^∞) ≤ sup_t ψ^σ_t(q′).   (24)

Fix σ ∈ [0, 1]. Denote ψ_t(q) = ψ^0_t(q) (the case σ = 0) and ψ(q) = ψ_0(q). We find the range of values of ψ(q) achievable by choosing a sequence q subject to the constraint

   inf_t ψ^σ_t(q′) ≤ ψ^σ_s(q) ≤ sup_t ψ^σ_t(q′)   for all s = 0, 1, 2, ....   (25)

To do this, we solve

   min_q ψ(q) subject to (25),   (26)

and

   max_q ψ(q) subject to (25).   (27)

Lemma 1.
There exist a solution q^σ_min of (26) and a solution q^σ_max of (27) that are constant sequences.

We postpone the proof of this lemma to the end of this section and first complete the proof of Proposition 4.

By (21) and the definition of q̄^∞ (see (19)), ψ(q̄^∞) = ψ_0(q′). Because q′ satisfies constraint (25) by definition, we have

   ψ(q^σ_min) ≤ ψ(q̄^∞) = ψ_0(q′) ≤ ψ(q^σ_max).

By Lemma 1, q^σ_min and q^σ_max are constant sequences. By (23), for any constant sequence q̃^∞ = (q̃, q̃, ...), ψ(q̃^∞) is strictly decreasing in q̃. Thus, we have

   q^σ_max ≤ q̄ ≤ q^σ_min.   (28)

Again by (23), for any constant sequence q̃^∞ and any σ, ψ^σ(q̃^∞) is strictly decreasing in q̃. Since q^σ_min and q^σ_max satisfy the constraint (25), we have

   inf_t ψ^σ_t(q′) ≤ ψ^σ(q^σ_min) ≤ ψ^σ(q̄^∞) ≤ ψ^σ(q^σ_max) ≤ sup_t ψ^σ_t(q′).

So, (24) holds. This completes the proof.
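The geometric-series identity (23) can be checked directly. The sketch below is ours; the parameter values are arbitrary:

```python
# Illustrative check of (23): for a constant sequence q_inf = (q, q, ...),
# psi = sum_{k>=0} delta^k (1-sigma)^k (1-q)^(k+1)
#     = (1 - q) / (1 - delta*(1 - sigma)*(1 - q)).
delta, sigma, q = 0.85, 0.3, 0.2
series = sum((delta * (1 - sigma) * (1 - q)) ** k * (1 - q) for k in range(2000))
closed = (1 - q) / (1 - delta * (1 - sigma) * (1 - q))
assert abs(series - closed) < 1e-12
```

Since the summand is a geometric term with ratio δ(1 − σ)(1 − q) < 1, truncating the series at 2000 terms leaves a negligible tail.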
Proof of Lemma 1.
We prove that a solution of the maximization problem (27) is a constant sequence. The proof of this statement for the minimization problem (26) is analogous.

Fix σ ∈ [0, 1]. Denote

   ψ̲^σ(q′) = inf_t ψ^σ_t(q′)   and   ψ̄^σ(q′) = sup_t ψ^σ_t(q′).

Let q̃ be the solution of the equation

   ψ̄^σ(q′) = (1 − q̃)(1 + δ(1 − σ)ψ̄^σ(q′)).   (29)

We now show that the constant sequence q̃^∞ = (q̃, q̃, ...) is a solution of the maximization problem (27). To prove this, we solve a finite-horizon problem described below. We assume that the individual makes decisions in rounds t = 0, 1, ..., T, after which the individual's behavior is fixed by q_t = q̃ for all t > T. Because the maximal value of ψ(q) in the problem (27) can differ from that in the problem with horizon T by at most δ^T, we find the solution to the infinite-horizon problem (27) as the limit of the solutions to the finite-horizon problems as T → ∞.

For each T = 1, 2, ..., consider the following problem:

   max_q ψ(q) subject to
   ψ̲^σ(q′) ≤ ψ^σ_t(q) ≤ ψ̄^σ(q′) for all t,
   q_t = q̃ for all t = T + 1, T + 2, ....   (30)

We now show that q̃^∞ is a solution of (30). We proceed by induction, starting from round k = T, and then continuing to rounds k = T − 1, T − 2, ..., 0. Fix k ∈ {0, 1, ..., T} and suppose that q_t = q̃ for each t > k. Observe that, by (23), for all t > k,

   ψ_t(q) = (1 − q̃)/(1 − δ(1 − q̃))   and   ψ^σ_t(q) = (1 − q̃)/(1 − δ(1 − σ)(1 − q̃)) = ψ̄^σ(q′),   (31)

where the last equality is by the definition of q̃ in (29). Next, q must satisfy the constraint in (30), so ψ^σ_k(q) ≤ ψ̄^σ(q′). Using (22) and (31), we obtain that

   ψ^σ_k(q) = (1 − q_k)(1 + δ(1 − σ)ψ^σ_{k+1}(q)) = (1 − q_k)(1 + δ(1 − σ)ψ̄^σ(q′)) ≤ ψ̄^σ(q′)   (32)

implies, by (29),

   q_k ≥ q̃.   (33)

Let us first deal with the case of k ≥ 1. We show that if q_k > q̃, then ψ(q) can be increased by reducing q_k. Specifically, we keep q_t fixed for all t different from k − 1 and k, and vary q_{k−1} and q_k such that ψ^σ_{k−1}(q) remains constant, that is,

   dψ^σ_{k−1}(q) = −(1 + δ(1 − σ)ψ^σ_k(q))dq_{k−1} + (1 − q_{k−1})δ(1 − σ)(∂ψ^σ_k(q)/∂q_k)dq_k = 0.

By (31) and (32) we have

   ψ^σ_k(q) = (1 − q_k)(1 + δ(1 − σ)ψ̄^σ(q′)) = (1 − q_k)(1 + δ(1 − σ)(1 − q̃)/(1 − δ(1 − σ)(1 − q̃)))
            = (1 − q_k)/(1 − δ(1 − σ)(1 − q̃))

and

   ∂ψ^σ_k(q)/∂q_k = −1/(1 − δ(1 − σ)(1 − q̃)).

Thus,

   dq_{k−1}/dq_k = −δ(1 − σ)(1 − q_{k−1}) / (1 − δ(1 − σ)(q_k − q̃)).

Inserting σ = 0 into (22) and (31), by the induction assumption that q_t = q̃ for t > k,

   ψ_k(q) = (1 − q_k)(1 + δψ_{k+1}(q)) = (1 − q_k)(1 + δ(1 − q̃)/(1 − δ(1 − q̃))) = (1 − q_k)/(1 − δ(1 − q̃)),

and

   ψ_{k−1}(q) = (1 − q_{k−1})(1 + δψ_k(q)) = (1 − q_{k−1})(1 − δ(q_k − q̃))/(1 − δ(1 − q̃)).

Thus, by (22) with σ = 0,

   ∂ψ(q)/∂q_k = δ^k (Π_{s=0}^{k−1}(1 − q_s)) ∂ψ_k(q)/∂q_k = −δ^k (Π_{s=0}^{k−1}(1 − q_s)) / (1 − δ(1 − q̃))

and

   ∂ψ(q)/∂q_{k−1} = δ^{k−1} (Π_{s=0}^{k−2}(1 − q_s)) ∂ψ_{k−1}(q)/∂q_{k−1}
                  = −δ^{k−1} (Π_{s=0}^{k−2}(1 − q_s)) (1 − δ(q_k − q̃))/(1 − δ(1 − q̃))
                  = −δ^k (Π_{s=0}^{k−1}(1 − q_s)) (1 − δ(q_k − q̃)) / ((1 − δ(1 − q̃))δ(1 − q_{k−1})).

Therefore, if q_{k−1} < 1, then

   dψ(q)/dq_k = ∂ψ(q)/∂q_k + (∂ψ(q)/∂q_{k−1})(dq_{k−1}/dq_k)
              = −(δ^k/(1 − δ(1 − q̃))) (Π_{s=0}^{k−1}(1 − q_s)) (1 + ((1 − δ(q_k − q̃))/(δ(1 − q_{k−1}))) dq_{k−1}/dq_k)
              = −(δ^k/(1 − δ(1 − q̃))) (Π_{s=0}^{k−1}(1 − q_s)) (1 − (1 − δ(q_k − q̃))(1 − σ)/(1 − δ(1 − σ)(q_k − q̃)))
              = −(δ^k/(1 − δ(1 − q̃))) (Π_{s=0}^{k−1}(1 − q_s)) σ/(1 − δ(1 − σ)(q_k − q̃)) ≤ 0.

Alternatively, if q_{k−1} = 1, then ψ(q) is independent of q_k, so dψ(q)/dq_k = 0. Thus, if q_k > q̃, then decreasing q_k increases ψ(q) without violating the constraint in (30), as long as q_k ≥ q̃.

Next, we deal with the case of k = 0. By (22) and (31) we have

   dψ(q)/dq₀ = −(1 + δψ₁(q)) < 0.

So, again, if q₀ > q̃, then decreasing q₀ increases ψ(q) without violating the constraint in (30), as long as q₀ ≥ q̃.

We thus proved that if q is a solution of (30) with q_k > q̃ and q_t = q̃ for all t > k, then there exists a solution with q_t = q̃ for all t ≥ k. As this is true for each k = T, T − 1, ..., 0, the constant sequence q̃^∞ is a solution of (30), so q^σ_max = q̃^∞. □

Appendix B. General Environments
B.1.
Proof of Proposition 5.
We begin the proof with a lemma that will be useful here and in further proofs in Appendix B.
Lemma 2.
Let p be stationary. For each y ≥ x and each F(z,σ) ∈ B_X,
U_p(F(z,σ), y) = p(y)y/(1 − δ(1 − p(y))) if z ≤ y, (34)
U_p(F(z,σ), y) = (p(y)y + (1 − p(y))δσ p(z)z/(1 − δ(1 − p(z)))) / (1 − δ(1 − p(y))(1 − σ)) if z > y, (35)
U_p(F(z,σ), y)/V(F(z,σ), y) = (p(y)y + (1 − p(y))δσ p(z)z/(1 − δ(1 − p(z)))) / ((1 − δ(1 − p(y))(1 − σ)) δσz/(1 − δ(1 − σ))) if c_F(z,σ) ≥ y. (36)
Moreover, if p(y) is monotone, then
U_p(F(z,σ), y)/V(F(z,σ), y) ≥ U_p(F(z,0), y)/V(F(z,0), y) = p(y)/(1 − δ(1 − p(y))) if c_F(z,σ) ≤ y. (37)
Proof. If z ≤ y, then the best-so-far alternative never changes under F(z,σ). The payoff is U_p(F(z,σ), y) = p(y)y + (1 − p(y))δU_p(F(z,σ), y). Solving this equation for U_p(F(z,σ), y) yields (34). Alternatively, if z > y, then the payoff is
U_p(F(z,σ), y) = p(y)y + (1 − p(y))δ(σU_p(F(z,σ), z) + (1 − σ)U_p(F(z,σ), y)). (38)
Inserting y = z into (34) yields U_p(F(z,σ), z) = p(z)z/(1 − δ(1 − p(z))). Inserting this into (38) and solving for U_p(F(z,σ), y) yields (35). To prove (36), suppose that c_F(z,σ) = δσz/(1 − δ(1 − σ)) ≥ y. Note that z ≥ y/δ > y, since σ ∈ [0, 1]. Then U_p(F(z,σ), y) is given by (35) and V(F(z,σ), y) = δσz/(1 − δ(1 − σ)) by (8), and (36) follows immediately.
Finally, to prove (37), suppose that c_F(z,σ) ≤ y. Observe that V(F(z,σ), y) = y by (8). If z ≤ y, then by (34) the payoff ratio is
U_p(F(z,σ), y)/V(F(z,σ), y) = p(y)/(1 − δ(1 − p(y))).
Instead, if z > y, then, using (35) and p(z) ≥ p(y) by the monotonicity of p, we have
U_p(F(z,σ), y)/V(F(z,σ), y) = (p(y)y + (1 − p(y))δσ p(z)z/(1 − δ(1 − p(z)))) / ((1 − δ(1 − p(y))(1 − σ)) y) ≥ (p(y)y + (1 − p(y))δσ p(y)y/(1 − δ(1 − p(y)))) / ((1 − δ(1 − p(y))(1 − σ)) y) = p(y)/(1 − δ(1 − p(y))). □
We now prove Proposition 5. Fix X ⊂ R.
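As a sanity check on the closed forms (34) and (35), one can iterate the payoff recursion to its fixed point and compare. The Python sketch below does this for one arbitrary monotone stationary rule; the rule p and all parameter values are illustrative assumptions, not objects from the paper.

```python
# Numerical sanity check of the closed forms (34)-(35) in Lemma 2.
# A stationary rule stops with probability p(y) given the best-so-far y.
# Facing the binary environment F(z, sigma) (draw z w.p. sigma, else 0),
# the payoff recursion is iterated to a fixed point and compared with the
# closed forms.  All numbers here are illustrative.

def U_closed(p, y, z, sigma, delta):
    """Closed-form payoff: (34) if z <= y, (35) if z > y."""
    if z <= y:
        return p(y) * y / (1 - delta * (1 - p(y)))
    Uz = p(z) * z / (1 - delta * (1 - p(z)))           # (34) evaluated at y = z
    return (p(y) * y + (1 - p(y)) * delta * sigma * Uz) / (
        1 - delta * (1 - p(y)) * (1 - sigma))           # (35)

def U_iter(p, y, z, sigma, delta, n=4000):
    """Iterate U(y) = p(y)y + (1-p(y)) d (s U(z) + (1-s) U(y))."""
    Uy = Uz = 0.0
    for _ in range(n):
        Uz = p(z) * z + (1 - p(z)) * delta * Uz        # above z nothing improves
        if z <= y:
            Uy = p(y) * y + (1 - p(y)) * delta * Uy
        else:
            Uy = p(y) * y + (1 - p(y)) * delta * (sigma * Uz + (1 - sigma) * Uy)
    return Uy

p = lambda v: min(1.0, 0.2 + 0.5 * v)                  # some monotone stationary rule
for (y, z, sigma, delta) in [(0.4, 0.9, 0.3, 0.95), (0.8, 0.5, 0.6, 0.9)]:
    assert abs(U_closed(p, y, z, sigma, delta) - U_iter(p, y, z, sigma, delta)) < 1e-9
print("closed forms (34)-(35) match the iterated recursion")
```

The recursion is a contraction with modulus at most δ, so the fixed-point iteration converges to the unique payoff, which is what the closed forms solve for.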
Let R̂_p(x) denote the smallest payoff ratio when facing an environment in B_X, so
R̂_p(x) = inf_{h∈H(x)} inf_{F∈B_X} U_p(F, h)/V(F, h).
Note that R̂_p(x) is not the same value as the performance ratio R_p(x, B_X) calculated for rule p in the binary setting. This is because in the binary setting the individual's choice is trivial whenever she observes an alternative above the outside option.
To prove Proposition 5, we show that for each rule p there exists a rule q that is stationary, monotone, and has the monotone ratio property, such that
R_p(x, F_X) ≤ R̂_p(x) ≤ R̂_q(x) = R_q(x, F_X).
The first inequality trivially follows from B_X ⊂ F_X and the definitions of R_p and R̂_p. Proposition 6 proves the equality R̂_q(x) = R_q(x, F_X). We hasten to point out that the proof of Proposition 6 does not depend on Proposition 5. It remains to prove that
R̂_p(x) ≤ R̂_q(x).
We divide the proof into three parts. In each part, we consider a decision rule p that satisfies the restrictions imposed in the previous parts, and construct a different rule whose performance ratio over the set of binary environments is weakly better than that of p.

Part 1. Stationarity.
Let p be a decision rule. We now construct a stationary rule q whose performance ratio against environments in B_X is at least as high as that of p. Let H̄(y) denote the set of histories whose best-so-far alternative is y. Let W(y) be the maximal payoff of rule p among all these histories, against all binary environments in which no alternatives better than y will ever emerge, so
W(y) = sup_{h∈H̄(y)} (sup_{F(z,σ)∈B_X: z≤y} U_p(F(z,σ), h)). (39)
Define a stationary rule q as follows. For each y ≥ x, let q(y) be the solution of
W(y) = q(y)y + (1 − q(y))δW(y).
Note that there exists a unique solution q(y) ∈ [0, 1], because y ≥ x > 0 and
0 ≤ U_p(F(z,σ), h) ≤ V(F(z,σ), h) = y if z ≤ y and h ∈ H̄(y), (40)
so, in particular, 0 ≤ W(y) ≤ y. We now prove that the change from p to q does not decrease the performance ratio, so R̂_p(x) ≤ R̂_q(x).
Fix y′ ≥ x and F(z,σ) ∈ B_X. Denote by q|_{y′}p a decision rule in which the stopping probability is q(y) whenever the best-so-far alternative is y ≠ y′, and it is given by the original rule p(h) whenever the best-so-far alternative is y′, that is, h ∈ H̄(y′). We now prove that
inf_{h′∈H̄(y′)} U_p(F(z,σ), h′) ≤ inf_{h∈H̄(y′)} U_{q|_{y′}p}(F(z,σ), h) ≤ U_q(F(z,σ), y′). (41)
To prove the first inequality in (41), we fix an arbitrary h′ ∈ H̄(y′) and show that U_{q|_{y′}p}(F(z,σ), h′) ≥ U_p(F(z,σ), h′). We have
U_{q|_{y′}p}(F(z,σ), h′) = p(h′)y′ + (1 − p(h′))δ((1 − σ)U_{q|_{y′}p}(F(z,σ), h′ ⊕ 0) + σU_{q|_{y′}p}(F(z,σ), h′ ⊕ z)),
where '⊕' denotes the vector concatenation operator, so h′ ⊕ 0 is h′ with 0 appended at the end. If z ≤ y′, then U_{q|_{y′}p}(F(z,σ), h′) is independent of q (because the best-so-far alternative remains y′), so
U_{q|_{y′}p}(F(z,σ), h′) = U_p(F(z,σ), h′).
Otherwise, if z > y′, then, by the definitions of W(z) and q, for each k = 0, 1, ...,
U_{q|_{y′}p}(F(z,σ), h′ ⊕ 0_k ⊕ z) = U_q(F(z,σ), z) = W(z) ≥ U_p(F(z,σ), h′ ⊕ 0_k ⊕ z), (42)
where 0_k is the vector of k zeros.
So,
U_{q|_{y′}p}(F(z,σ), h′) = ∑_{k=0}^∞ [(p(h′ ⊕ 0_k)y′ + (1 − p(h′ ⊕ 0_k))δσW(z)) δ^k(1 − σ)^k ∏_{s=0}^{k−1}(1 − p(h′ ⊕ 0_s))]
≥ ∑_{k=0}^∞ [(p(h′ ⊕ 0_k)y′ + (1 − p(h′ ⊕ 0_k))δσU_p(F(z,σ), h′ ⊕ 0_k ⊕ z)) δ^k(1 − σ)^k ∏_{s=0}^{k−1}(1 − p(h′ ⊕ 0_s))] = U_p(F(z,σ), h′).
Summing up the above, we obtain U_{q|_{y′}p}(F(z,σ), h′) ≥ U_p(F(z,σ), h′) for each h′ ∈ H̄(y′), thus proving the first inequality in (41).
Let us prove the second inequality in (41). If z ≤ y′, then
U_q(F(z,σ), y′) = W(y′) ≥ inf_{h∈H̄(y′)} U_{q|_{y′}p}(F(z,σ), h)
by the definitions of W(y′) and q(y′). Alternatively, let z > y′. So, for each k = 0, 1, 2, ..., as long as z has not been realized, the only possible history is h′ ⊕ 0_k. Define
q′_k = p(h′ ⊕ 0_k), k = 0, 1, 2, ....
This is the problem with binary environments analyzed in Section 3, where x = y′ and X = {0, W(z)}, so the value of the high alternative is W(z). By Proposition 4, we can replace the sequence of probabilities (q′_0, q′_1, ...) by a constant sequence q̄_∞ = (q̄, q̄, ...). Moreover, q̄ = q(y′) by (19) and the definitions of W(y′) and q(y′). We thus proved the second inequality in (41).
By (8), V(F(z,σ), h′) depends on h′ only through the best-so-far alternative y′, so V(F(z,σ), h′) = V(F(z,σ), y′).
It follows from (41) that
inf_{h′∈H̄(y′)} U_p(F(z,σ), h′)/V(F(z,σ), h′) = inf_{h′∈H̄(y′)} U_p(F(z,σ), h′)/V(F(z,σ), y′) ≤ inf_{h′∈H̄(y′)} U_{q|_{y′}p}(F(z,σ), h′)/V(F(z,σ), y′) ≤ U_q(F(z,σ), y′)/V(F(z,σ), y′).
The above holds for each y′ ≥ x and each F(z,σ) ∈ B_X, thus proving R̂_p(x) ≤ R̂_q(x).

Part 2. Monotonicity.
Consider a stationary rule p. Suppose that p is nonmonotone, so there exist y′, y″ such that x ≤ y′ < y″ and p(y′) > p(y″). Define q by
q(y) = sup_{y′∈[x, y]} p(y′), y ≥ x. (43)
Note that, for all y ≥ x,
q(y) ≥ p(y) and q(y)/(1 − δ(1 − q(y))) ≥ p(y)/(1 − δ(1 − p(y))). (44)
We now show that this change from p to q does not decrease the performance ratio. Consider any y ≥ x. For each F(z,σ) such that c_F(z,σ) ≤ y,
U_q(F(z,σ), y)/V(F(z,σ), y) ≥ q(y)/(1 − δ(1 − q(y))) ≥ p(y)/(1 − δ(1 − p(y))) = U_p(F(z,0), y)/V(F(z,0), y) ≥ r_p(y).
The first inequality is by (37), where we use the monotonicity of q (by construction). The second inequality is by (44). The equality is by (34) and V(F(z,0), y) = y. The last inequality is by the definition of r_p(y) in (16).
Next, for each F(z,σ) such that c_F(z,σ) > y, by (36),
U_q(F(z,σ), y)/V(F(z,σ), y) = (q(y)y + (1 − q(y))δσ q(z)z/(1 − δ(1 − q(z)))) / ((1 − δ(1 − q(y))(1 − σ)) δσz/(1 − δ(1 − σ))).
By the definition of q(y), there exists y′ ∈ [x, y] such that q(y) = p(y′). Therefore,
U_q(F(z,σ), y)/V(F(z,σ), y) ≥ (p(y′)y′ + (1 − p(y′))δσ p(z)z/(1 − δ(1 − p(z)))) / ((1 − δ(1 − p(y′))(1 − σ)) δσz/(1 − δ(1 − σ))) = U_p(F(z,σ), y′)/V(F(z,σ), y′) ≥ r_p(y′).
We thus obtain that, for each y ≥ x, r_q(y) ≥ r_p(y′) for some y′ ∈ [x, y]. It follows that R̂_q(x) = inf_{y≥x} r_q(y) ≥ R̂_p(x) = inf_{y≥x} r_p(y). We conclude that, without loss of generality, we can restrict attention to monotone rules.

Part 3. Monotone Ratio Property.
Consider a monotone stationary rule p. Suppose that r_p(y) defined by (16) is nonmonotone.
First, we show that if x̄ = sup X < ∞, then, without loss of generality, we can assume
r_p(y) = 1 for all y ≥ δx̄, and r_p(y) ≥ y/(δx̄) for all y ∈ [x, δx̄]. (45)
The first line is trivial, as when y ≥ δx̄, one can trivially get the ratio of 1 by stopping and getting the best-so-far alternative y. To show the second line, suppose that r_p(y′) < y′/(δx̄) for some y′. Then define q(y) = 1 for all y ≥ y′ and q(y) = p(y) for all y < y′. For each y ≥ y′,
r_q(y) = y/(δx̄) ≥ y′/(δx̄) > r_p(y′).
For each y ∈ [x, y′), using (36), the definition of r_p(y), and q(z)/(1 − δ(1 − q(z))) ≥ p(z)/(1 − δ(1 − p(z))), we obtain r_q(y) ≥ r_p(y). It follows that R̂_q(x) = inf_{y≥x} r_q(y) ≥ R̂_p(x) = inf_{y≥x} r_p(y).
As r_p(y) is nonmonotone, there exist y′ and y″ such that δy″ ≤ y′ < y″ and r_p(y′) > r_p(y″) = inf_{y≥y′} r_p(y). We now construct a monotone stationary rule q(y) that differs from p(y) only on the interval [y′, y″) and has the following properties: r_q(y) is constant on [y′, y″), continuous at y″, and satisfies R̂_q(x) ≥ R̂_p(x). Let
D(y, g) = min{ g/(1 − δ(1 − g)), inf_{z>y″, σ∈[0,1]} (gy + (1 − g)δσ p(z)z/(1 − δ(1 − p(z)))) / ((1 − δ(1 − g)(1 − σ)) δσz/(1 − δ(1 − σ))) }.
Note that D(y, p(y)) = r_p(y) for each y ∈ [y′, y″). This is by (36) and (37), and the fact that c_F(z,σ) > y implies z > y/δ ≥ y″.
Since it is assumed that r_p(y) > r_p(y″) for each y ∈ [y′, y″), we have D(y, p(y)) > D(y″, p(y″)) for each y ∈ [y′, y″).
Next, we have
d/dg ((gy + (1 − g)δσ p(z)z/(1 − δ(1 − p(z)))) / ((1 − δ(1 − g)(1 − σ)) δσz/(1 − δ(1 − σ)))) = (y(1 − δ(1 − σ)) − δσ p(z)z/(1 − δ(1 − p(z)))) / ((1 − δ(1 − g)(1 − σ))² δσz/(1 − δ(1 − σ))),
which has a sign that does not depend on g. So (gy + (1 − g)δσ p(z)z/(1 − δ(1 − p(z)))) / ((1 − δ(1 − g)(1 − σ)) δσz/(1 − δ(1 − σ))) is monotone in g for each z, σ and y. Thus, D(y, g) is a lower envelope of monotone functions, so it is quasiconcave in g for each y. Moreover,
D(y, 1) = y/(δx̄) < D(y″, p(y″)) = r_p(y″) for each y ∈ [y′, y″),
because, by (45), r_p(y″) ≥ y″/(δx̄) > y/(δx̄). To sum up, D(y, 1) < D(y″, p(y″)) < D(y, p(y)) for each y ∈ [y′, y″).
Since D(y, g) is continuous and quasiconcave in g, for each y ∈ [y′, y″) there exists g∗(y) ≥ p(y) such that D(y, g∗(y)) = D(y″, p(y″)). Moreover, since D(y, g) is increasing in y for all g, by the monotone comparative statics theorem (Milgrom and Shannon 1994, Theorem 4′), g∗(y) is increasing.
Define a stationary rule q as follows. For each y ∈ [y′, y″), let q(y) = g∗(y), and for each y ∉ [y′, y″), let q(y) = p(y). We thus obtain
q(y) ≥ p(y) and q(y)/(1 − δ(1 − q(y))) ≥ p(y)/(1 − δ(1 − p(y))) for each y ≥ x, (46)
and
r_q(y) = r_p(y″) for each y ∈ [y′, y″]. (47)
Therefore, r_q(y) is monotone on [y′, y″]. Moreover, for each y < y′, by (36) and (46), r_q(y) ≥ r_p(y). For each y ∈ [y′, y″], by (47), r_q(y) = r_p(y″). For each y > y″, by q(y) = p(y), r_q(y) = r_p(y). We thus obtain that, for each y ≥ x, r_q(y) ≥ min{r_p(y), r_p(y″)}. It follows that R̂_q(x) = inf_{y≥x} r_q(y) ≥ R̂_p(x) = inf_{y≥x} r_p(y).
We thus conclude that, without loss of generality, we can restrict attention to rules p such that r_p(y) is weakly increasing in y. This completes the proof.
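The envelope step above uses a general fact: a lower envelope (pointwise minimum) of functions that are each monotone in g, some increasing and some decreasing, is quasiconcave in g. A small Python illustration, with arbitrary test functions standing in for the components of D(y, g):

```python
# A pointwise minimum of monotone functions is quasiconcave: on a grid,
# no interior point of the envelope lies strictly below some point to its
# left AND some point to its right.  The functions are arbitrary examples.
gs = [i / 500 for i in range(501)]
fns = [lambda g: 2 * g,                 # increasing
       lambda g: 1.3 - g,              # decreasing
       lambda g: 0.1 + g * g,          # increasing
       lambda g: 1.1 - 0.5 * g ** 3]   # decreasing
env = [min(f(g) for f in fns) for g in gs]
for j in range(1, len(env) - 1):
    dips = max(env[:j]) > env[j] + 1e-12 and max(env[j + 1:]) > env[j] + 1e-12
    assert not dips, "envelope failed quasiconcavity"
print("envelope of monotone functions is quasiconcave on the grid")
```

Quasiconcavity is what guarantees that the equation D(y, g∗(y)) = D(y″, p(y″)) has a solution on [p(y), 1] once the value at g = 1 lies below the target and the value at g = p(y) lies above it.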
B.2.
Proof of Proposition 6.
Let p be stationary, monotone, and satisfy the monotone ratio property. We now prove that
R_p(x, F_X) ≥ R̂_p(x) = inf_{y≥x} inf_{F∈B_X} U_p(F, y)/V(F, y).
Let x̄ = sup X. Note that x̄ = ∞ if X is unbounded. Fix a best-so-far alternative y such that y ≥ x. Consider an arbitrary environment G ∈ F_X, and denote its reservation value by c, so, by (2),
∫_0^c c dG(x) + ∫_c^x̄ x dG(x) = c/δ. (48)
We now find a binary environment F(z,σ) such that
U_p(F(z,σ), y)/V(F(z,σ), y) ≤ U_p(G, y)/V(G, y).
Let us denote by F(w,z,σ) the lottery between w and z with probabilities 1 − σ and σ, respectively. The construction of F(z,σ) consists of two parts. In Part 1, we find F(w,z,σ) such that U_p(F(w,z,σ), y) ≤ U_p(G, y) and V(F(w,z,σ), y) ≥ V(G, y). In Part 2, we show that w = 0 is without loss of generality.

Part 1.
Consider a stationary rule p. Fix a best-so-far alternative y such that y ≥ x. Consider an arbitrary environment G ∈ F_X, and denote its reservation value by c, so, by (2),
∫_0^c c dG(x) + ∫_c^x̄ x dG(x) = c/δ. (49)
We now find an environment F(w,z,σ) such that
U_p(F(w,z,σ), y)/V(F(w,z,σ), y) ≤ U_p(G, y)/V(G, y).
To find such F(w,z,σ), we first consider a one-shot deviation to some environment F under the constraint V(F, y) ≥ V(G, y). The "one-shot deviation" means that the individual will face F in the next round, and G in all subsequent rounds. We will show that there exists F = F(w,z,σ) such that the individual's expected payoff against the sequence of environments (F(w,z,σ), G, G, ...) is weakly lower than against the original i.i.d. sequence (G, G, G, ...). We then show that this expected payoff is even lower if we replace (F(w,z,σ), G, G, ...) by (F(w,z,σ), F(w,z,σ), F(w,z,σ), ...), thus proving U_p(F(w,z,σ), y) ≤ U_p(G, y).

Figure 3. Illustration of three cases that arise when solving (53). [Figure omitted: three panels (a)-(c) plotting u(x) against x, with the points c, c/δ, w, and z marked on the horizontal axis.]
Recall that
U_p(G, y) = p(y)y + (1 − p(y))δ ∫_0^x̄ U_p(G, max{y, x}) dG(x). (50)
Let us find a distribution F that minimizes the individual's expected payoff against all one-shot deviation sequences (F, G, G, ...), subject to the constraint that the reservation value of F is at least c (which implies V(F, y) ≥ V(G, y)):
inf_{F∈F_X} p(y)y + (1 − p(y))δ ∫_0^x̄ U_p(G, max{y, x}) dF(x) (51)
s.t. ∫_0^c c dF(x) + ∫_c^x̄ x dF(x) ≥ c/δ. (52)
Observe that the constraint (52) does not impose any restrictions on how a mass F(c) is assigned to the interval [0, c]. Thus, any positive mass F(c) should be assigned to a point x′ that minimizes U_p(G, max{y, x′}) on [0, c] ∩ X. So, we can simplify the problem (51)–(52) by using the notation
u(x) = inf_{x′∈[0,c]∩X} U_p(G, max{y, x′}) if x = c, and u(x) = U_p(G, max{y, x}) if x > c, x ∈ X,
where u(x) is linearly extended to (c, x̄)\X, so u(x) is defined on [c, x̄]. The problem (51)–(52) reduces to
inf_{F∈F_X} ∫_c^x̄ u(x) dF(x) s.t. ∫_c^x̄ x dF(x) ≥ c/δ. (53)
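Problem (53) is linear in F under a single mean constraint, which is why (as the convexification argument below shows) an optimal distribution needs at most two support points. The Python sketch below illustrates this on an arbitrary non-convex test function u; the function, grid, and constraint level are assumptions for illustration, not taken from the paper.

```python
# Minimising the expectation of u under a mean constraint: an optimum can
# be taken with at most two support points (a single feasible point, or a
# pair mixed so the mean constraint binds).  Random three-point lotteries
# are then checked to never do better.
import itertools
import random

def two_point_min(xs, us, m):
    """Best value over single points with x >= m and two-point mixtures
    whose mean is exactly m (the lower convex envelope evaluated at m)."""
    best = min((u for x, u in zip(xs, us) if x >= m), default=float('inf'))
    for (x1, u1), (x2, u2) in itertools.combinations(zip(xs, us), 2):
        if x1 < m < x2:
            s = (m - x1) / (x2 - x1)            # weight on x2
            best = min(best, (1 - s) * u1 + s * u2)
    return best

xs = [i / 20 for i in range(21)]
us = [(x - 0.3) ** 2 + 0.1 * (x > 0.5) for x in xs]   # non-convex test u
m = 0.55                                              # mean constraint level
best2 = two_point_min(xs, us, m)

random.seed(0)
for _ in range(20000):
    i, j, k = (random.randrange(len(xs)) for _ in range(3))
    a, b = sorted([random.random(), random.random()])
    w = [a, b - a, 1 - b]                             # random 3-point lottery
    if w[0] * xs[i] + w[1] * xs[j] + w[2] * xs[k] >= m - 1e-12:
        assert w[0] * us[i] + w[1] * us[j] + w[2] * us[k] >= best2 - 1e-9
print("two support points suffice; best value:", round(best2, 4))
```

This mirrors the role of the convex closure of u in the argument that follows: any feasible lottery's value lies weakly above the convex closure evaluated at its mean.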
This problem is solved by the convexification method as in Kamenica and Gentzkow (2011), by minimizing the convex closure of u (i.e., the supremum among all continuous and convex functions that do not exceed u) on the set [c, x̄], thus yielding a solution with a support on at most two points, w and z. Figure 3 illustrates how such a solution is found for three different shapes of u(x). The solid curve is u(x), which can be discontinuous at c, and the dashed line is the convex closure of u(x) where it is different from u(x). In Figure 3(a) the minimum of u(x) is attained at some z ≥ c/δ, so the solution puts the unit mass on the single point z. In Figures 3(b) and 3(c) the minimum of u(x) is below c/δ, so the solution minimizes the convex closure of u(x) at x = c/δ. In Figure 3(b) it is obtained by a convex combination of two points, w and z, and in Figure 3(c) it is obtained by a convex combination of c and z as shown in the picture. Note that in the last case, to solve the problem (51)–(52), one must replace c with a point w ≤ c where the value u(c) = inf_{x′∈[0,c]} U_p(G, max{y, x′}) is achieved.
Let us formalize the above. For every ε > 0, there exists (w, z, σ) such that
∫_c^∞ u(x) dF(x) ≥ ∫_c^∞ u(x) dF(w,z,σ)(x) − ε = (1 − σ)u(max{w, c}) + σu(z) − ε, (54)
and F(w,z,σ) satisfies the constraint in (53), so
(1 − σ)max{w, c} + σz ≥ c/δ. (55)
Therefore, U_p(G, y) + ε is weakly greater than the individual's expected payoff against the sequence of environments (F(w,z,σ), G, G, ...), where F(w,z,σ) satisfies (54) and (55). We now show that the individual's expected payoff is even lower if we replace (F(w,z,σ), G, G, ...) by (F(w,z,σ), F(w,z,σ), F(w,z,σ), ...).

Case 1.
Suppose that u(x) attains its infimum at or above c/δ, that is, inf_{x∈[c, c/δ)} u(x) ≥ inf_{x∈[c/δ, x̄]} u(x), as shown in Figure 3(a). Then the constraint in (53) is not binding. So, for every ε > 0, there exists z_ε ≥ c/δ such that F(0, z_ε, 1), which assigns the unit mass to z_ε, satisfies (54).
By (50), (54), and the definition of u(x) we have
u(z_ε) ≥ p(z_ε)z_ε + (1 − p(z_ε))δ(u(z_ε) − ε).
Solving the inequality for u(z_ε) − ε, we have
u(z_ε) − ε ≥ (p(z_ε)z_ε − ε)/(1 − δ(1 − p(z_ε))).
Therefore,
U_p(G, y) ≥ u(y) ≥ p(y)y + (1 − p(y))δ(u(z_ε) − ε) ≥ p(y)y + (1 − p(y))δ(p(z_ε)z_ε − ε)/(1 − δ(1 − p(z_ε))) = U_p(F(0, z_ε, 1), y) − (1 − p(y))δε/(1 − δ(1 − p(z_ε))) ≥ U_p(F(0, z_ε, 1), y) − δε/(1 − δ).
Also, observe that, by (3) and z_ε ≥ c/δ,
V(G, y) = max{y, c} ≤ max{y, δz_ε} = V(F(0, z_ε, 1), y).
Thus, for every ε > 0, there exists z_ε such that
U_p(G, y)/V(G, y) ≥ (U_p(F(0, z_ε, 1), y) − δε/(1 − δ))/V(F(0, z_ε, 1), y). (56)
In particular, F(0, z_ε, 1) satisfies (54) and (55) if one replaces ε by δε/(1 − δ).

Case 2.
Suppose that u(x) does not attain its infimum on [c/δ, ∞), that is,
inf_{x∈[c, c/δ)} u(x) < inf_{x∈[c/δ, x̄]} u(x), (57)
as shown in Figures 3(b) and 3(c). Then the constraint in (53) is binding. So, for every ε > 0, there exists (w_ε, z_ε, σ_ε) with w_ε ≤ c/δ ≤ z_ε such that F(w_ε, z_ε, σ_ε) satisfies (54), and satisfies (55) with equality.
As the solution lies on the convex closure of u(x), the straight line through the points (w_ε, u(w_ε) − ε) and (z_ε, u(z_ε) − ε) is weakly below the graph of u. Moreover, by (57), the slope of this straight line is nonnegative, so u(w_ε) ≤ u(z_ε). Thus, we obtain
u(w_ε) − ε ≤ u(x) for all x ≥ w_ε, and u(z_ε) − ε ≤ u(x) for all x ≥ z_ε. (58)
As in Case 1, it follows that
u(z_ε) − ε ≥ (p(z_ε)z_ε − ε)/(1 − δ(1 − p(z_ε))) ≥ U_p(F(w_ε, z_ε, σ_ε), z_ε) − ε/(1 − δ).
Also,
u(w_ε) ≥ p(w_ε)w_ε + (1 − p(w_ε))δ((1 − σ_ε)(u(w_ε) − ε) + σ_ε(u(z_ε) − ε)) ≥ p(w_ε)w_ε + (1 − p(w_ε))δ((1 − σ_ε)(u(w_ε) − ε) + σ_ε(p(z_ε)z_ε − ε)/(1 − δ(1 − p(z_ε)))).
Solving for u(w_ε) − ε, we obtain
u(w_ε) − ε ≥ (p(w_ε)w_ε − ε + (1 − p(w_ε))δσ_ε(p(z_ε)z_ε − ε)/(1 − δ(1 − p(z_ε)))) / (1 − δ(1 − σ_ε)(1 − p(w_ε))) ≥ U_p(F(w_ε, z_ε, σ_ε), max{y, w_ε}) − ε/(1 − δ).
We thus obtain
U_p(G, y) ≥ u(y) = p(y)y + (1 − p(y))δ ∫ u(x) dG(x) ≥ p(y)y + (1 − p(y))δ((1 − σ_ε)(u(w_ε) − ε) + σ_ε(u(z_ε) − ε)) ≥ p(y)y + (1 − p(y))δ((1 − σ_ε)U_p(F(w_ε, z_ε, σ_ε), max{y, w_ε}) + σ_ε U_p(F(w_ε, z_ε, σ_ε), z_ε)) − δε/(1 − δ) = U_p(F(w_ε, z_ε, σ_ε), y) − δε/(1 − δ).
By (3) and the fact that the constraint in (53) is binding, observe that
V(G, y) = max{y, c} = V(F(w_ε, z_ε, σ_ε), y).
Thus, for every ε > 0, there exists F(w_ε, z_ε, σ_ε) such that
U_p(G, y)/V(G, y) ≥ (U_p(F(w_ε, z_ε, σ_ε), y) − δε/(1 − δ))/V(F(w_ε, z_ε, σ_ε), y). (59)
In particular, F(w_ε, z_ε, σ_ε) satisfies (54) and (55) if one replaces ε by δε/(1 − δ). Taking ε → 0, we obtain that, for each y ≥ x and each environment G ∈ F_X,
U_p(G, y)/V(G, y) ≥ inf_{(w,z,σ)} U_p(F(w,z,σ), y)/V(F(w,z,σ), y) s.t. w ≤ c/δ ≤ z, σ ∈ [0, 1], w, z ∈ X.
It follows that
R_p(x) ≥ inf_{y≥x} inf_{(w,z,σ)} U_p(F(w,z,σ), y)/V(F(w,z,σ), y) s.t. w ≤ c/δ ≤ z, σ ∈ [0, 1], w, z ∈ X.
Thus we have shown that we can restrict attention to environments F(w,z,σ).

Part 2.
We now show that we can further restrict the set of environments to binary environments F(z,σ) = F(0,z,σ), so w = 0. In other words, for each y, no environment F(w,z,σ) with 0 < w < z and σ < 1 yields a payoff ratio U_p(F(w,z,σ), y)/V(F(w,z,σ), y) smaller than r_p(y) given by (16). By contradiction, suppose that there exist y and an environment F(w,z,σ) with 0 < w < z and σ < 1 such that
U_p(F(w,z,σ), y)/V(F(w,z,σ), y) < r_p(y). (60)
So,
U_p(F(w,z,σ), y)/V(F(w,z,σ), y) ≥ ((1 − σ)U_p(F(0,w,1), y) + σU_p(F(0,z,1), y)) / ((1 − σ)V(F(0,w,1), y) + σV(F(0,z,1), y)) ≥ min{ U_p(F(0,w,1), y)/V(F(0,w,1), y), U_p(F(0,z,1), y)/V(F(0,z,1), y) } ≥ r_p(y),
which is a contradiction.

Case 3. Let U_p(F(w,z,σ), z) > U_p(F(w,z,σ), w) and w ≤ c_F(w,z,σ) (see Figure 3(c)). Then the optimal rule waits for the realization of z and, by (3), satisfies
V(F(w,z,σ), y) = V(F(0,z,σ), y). (63)
Rearranging (35) we obtain
p(y)y + (1 − p(y))δσU_p(F(0,z,σ), z) = (1 − δ(1 − p(y))(1 − σ))U_p(F(0,z,σ), y). (64)
Thus,
U_p(F(w,z,σ), w) = (U_p(F(w,z,σ), w)/V(F(w,z,σ), w)) V(F(w,z,σ), w) = (U_p(F(0,z,σ), w)/V(F(0,z,σ), w)) V(F(w,z,σ), w) ≥ r_p(w)V(F(w,z,σ), w) ≥ r_p(y)V(F(w,z,σ), w) ≥ r_p(y)V(F(w,z,σ), y) > (U_p(F(w,z,σ), y)/V(F(w,z,σ), y)) V(F(w,z,σ), y) = U_p(F(w,z,σ), y), (65)
where the first inequality is by the definition of r_p(w), the second inequality is by the assumption that r_p(y) is nondecreasing, the third inequality follows from (3), and the fourth inequality is by (60). Then, using (61), (62), (64), and (65), we obtain
U_p(F(w,z,σ), y) > (1 − δ(1 − σ)(1 − p(y)))U_p(F(0,z,σ), y) + δ(1 − σ)(1 − p(y))U_p(F(w,z,σ), y).
Since 1 − δ(1 − σ)(1 − p(y)) > 0, it follows that
U_p(F(w,z,σ), y) > U_p(F(0,z,σ), y).
Since V(F(w,z,σ), y) = V(F(0,z,σ), y) by (63), we obtain
U_p(F(w,z,σ), y)/V(F(w,z,σ), y) > U_p(F(0,z,σ), y)/V(F(0,z,σ), y) ≥ r_p(y),
which is a contradiction. This completes the proof.

B.3.
Proof of Theorem 2.
Part (a).
We need to show that the decision rule p̄ given by the constant stopping probability
p̄(h_t) = π̄ = (1 − δ)/(2 − δ) for all h_t
always yields a performance ratio of at least 1/4. The rule p̄ is stationary and monotone. It also has the monotone ratio property, as can be easily verified by substitution of p(y) = p(z) = π̄ into (36) and (37). By Proposition 6, we can restrict attention to binary environments in B_X.
First, suppose that c_F(z,σ) ≤ y. Using p(y) = π̄ = (1 − δ)/(2 − δ), we have by (37)
U_p̄(F(z,σ), y)/V(F(z,σ), y) ≥ π̄/(1 − δ(1 − π̄)) = 1/2 > 1/4. (66)
Next, suppose that c_F(z,σ) > y. By (36), using p(y) = p(z) = π̄, we obtain
U_p̄(F(z,σ), y)/V(F(z,σ), y) = (π̄y + (1 − π̄)δσπ̄z/(1 − δ(1 − π̄))) (1 − δ(1 − σ)) / ((1 − δ(1 − σ)(1 − π̄))δσz)
> (1 − π̄)π̄/(1 − δ(1 − π̄)) · (1 − δ(1 − σ))/(1 − δ(1 − σ)(1 − π̄))
≥ (π̄/(1 − δ(1 − π̄)))(1 − π̄/(1 − δ(1 − π̄))) = 1/4, (67)
where the first inequality is by y > 0, the second inequality is by the minimum w.r.t. σ ∈ [0, 1] attained at σ = 0, and the last equality is by π̄ = (1 − δ)/(2 − δ).
Note that the expression
π̄/(1 − δ(1 − π̄)) = π̄ + δ(1 − π̄)π̄ + δ²(1 − π̄)²π̄ + ...
is the reciprocal of the expected delay of obtaining z after its realization, and the expression
1 − π̄/(1 − δ(1 − π̄))
is the probability of not stopping before z realizes for the first time. Setting π̄/(1 − δ(1 − π̄)) equal to 1/2 equalizes these two expressions and yields the bound of 1/4 in (67).

Part (b).
Let sup X = ∞. As shown above, the rule p̄ yields R_p̄(x, F_X) ≥ 1/4. We now show that no rule yields a performance ratio greater than 1/4, thus proving that p̄ is dynamically robust.
By Proposition 5, we can restrict attention to decision rules that are stationary, monotone, and have the monotone ratio property. Consider any such rule p(y).
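Both halves of Theorem 2 lend themselves to a quick numerical spot-check: the constant-probability rule of part (a) should keep the payoff ratio (36) above 1/4 whenever waiting is optimal, and the limiting expression (1 − δ)(1 − q̄)q̄/(1 − δ(1 − q̄))², which closes the part (b) argument, should never exceed 1/4. A Python sketch follows; the grids and tolerances are arbitrary choices, not from the paper.

```python
# (a) With the constant stopping probability pi = (1-d)/(2-d), the ratio
#     (36) with p(y) = p(z) = pi stays above 1/4 whenever the reservation
#     value of F(z, sigma) exceeds the best-so-far alternative y.
def ratio_constant_rule(y, z, sigma, d):
    pi = (1 - d) / (2 - d)
    Uz = pi * z / (1 - d * (1 - pi))                   # payoff once z is best
    U = (pi * y + (1 - pi) * d * sigma * Uz) / (1 - d * (1 - pi) * (1 - sigma))
    V = d * sigma * z / (1 - d * (1 - sigma))          # optimal (reservation) value
    return U / V

worst = 1.0
for d in (0.5, 0.9, 0.99, 0.999):
    for sigma in (0.001, 0.01, 0.1, 0.5, 1.0):
        for z in (0.2, 0.5, 1.0):
            for y in (0.0, 0.05 * z, 0.5 * z):
                c = d * sigma * z / (1 - d * (1 - sigma))
                if c > y:                              # waiting is optimal
                    worst = min(worst, ratio_constant_rule(y, z, sigma, d))
assert worst > 0.25

# (b) The limiting upper bound (1-d)(1-q)q / (1 - d(1-q))**2 <= 1/4,
#     with equality approached at q = (1-d)/(2-d).
def limit_ratio(d, q):
    return (1 - d) * (1 - q) * q / (1 - d * (1 - q)) ** 2

peak = max(limit_ratio(d / 200, q / 200)
           for d in range(1, 200) for q in range(201))
assert peak <= 0.25 + 1e-12
print("worst part-(a) ratio on grid:", round(worst, 4), "| part-(b) peak:", round(peak, 6))
```

The grid peak in (b) is attained exactly where q equals (1 − δ)/(2 − δ), consistent with the constant rule p̄ being the one that achieves the 1/4 bound in the limit.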
As sup X = ∞, there exists an increasing sequence (y_n)_{n∈N} of elements of X such that y_1 ≥ x and lim_{n→∞} y_n = ∞. Let (z_k)_{k∈N} be an increasing subsequence of (y_n)_{n∈N} and let (σ_k)_{k∈N} be a decreasing sequence of probabilities that satisfy the following. For all k ∈ N,
z_k ≥ y_k, lim_{k→∞} σ_k = 0, and lim_{k→∞} y_k/(σ_k z_k) = 0, (68)
and, in addition,
c_k := δσ_k z_k/(1 − δ(1 − σ_k)) > y_k. (69)
Such sequences always exist, as z_k can be chosen to increase fast enough relative to y_k; e.g., one can choose σ_k = 1/k, y_k = k, and z_k growing sufficiently fast.
For each k ∈ N, consider the binary environment F(z_k, σ_k), and let y_k denote the best-so-far alternative. By (8), c_k is the reservation value for the environment F(z_k, σ_k). By (69), c_k > y_k, so the optimal rule waits for z_k to realize. Therefore, by (36),
R_p(x, F_X) ≤ U_p(F(z_k, σ_k), y_k)/V(F(z_k, σ_k), y_k) = (p(y_k)y_k/(δσ_k z_k) + (1 − p(y_k))p(z_k)/(1 − δ(1 − p(z_k)))) (1 − δ(1 − σ_k))/(1 − δ(1 − σ_k)(1 − p(y_k))).
As p(y) is nondecreasing, and y_k and z_k diverge, both p(y_k) and p(z_k) converge to the same probability, denoted by q̄:
q̄ = lim_{k→∞} p(y_k) = lim_{k→∞} p(z_k).
Using the above and (68), we obtain
R_p(x, F_X) ≤ lim_{k→∞} U_p(F(z_k, σ_k), y_k)/V(F(z_k, σ_k), y_k) = (1 − q̄)q̄/(1 − δ(1 − q̄)) · (1 − δ)/(1 − δ(1 − q̄)) = (1 − δ)(1 − q̄)q̄/(1 − δ(1 − q̄))² ≤ 1/4,
where the last inequality is easily verified for δ ∈ (0, 1) and q̄ ∈ [0, 1]. □
Proof of Theorem 2′. Let sup X < ∞. By rescaling the values, without loss of generality assume that sup X = 1.
The proof consists of two steps. In Step 1, we assume that the set of feasible environments is F_{0,1} and find a dynamically robust rule for each x > 0. This rule will be different from the rule q∗ that we found in Section 3. In Step 2, we expand the set of feasible environments to F_X, where X ⊂ [0, 1] and {0, 1} ⊂ X. We show that the previously derived rule attains the same performance ratio if and only if the outside option x exceeds some constant L > 0. We numerically find an upper bound for this constant, which is 1/89.

Step 1.
Fix x ∈ (0, 1) and let
r∗(x) = R∗(x, B_{0,1}),
where R∗(x, B_{0,1}) is the dynamically robust ratio for binary environments B_{0,1} given by (15). Note that r∗(x) > 1/2 for each x > 0. For binary environments B_{0,1} and rule q, recall that the payoff ratio r_q(y) is given by
r_q(y) = inf_{F∈B_{0,1}} U_q(y, F)/V(y, F).
For each best-so-far alternative y ∈ [x, 1], let p_x(y) be the largest stopping probability under the constraint that the payoff ratio is equal to r∗(x):
p_x(y) = max{q ∈ [0, 1] : r_q(y) ≥ r∗(x)}. (70)
Following steps (11) and (12) in the proof of Theorem 1′, for each y ∈ [x, 1] we have
r_q(y) = min{ q/(1 − δ(1 − q)), min_{σ∈[0,1]} (qy + (1 − q)δσ)(1 − δ(1 − σ))/((1 − δ(1 − σ)(1 − q))δσ) }. (71)
Clearly, p_x(y) = 1 for each y ∈ [δr∗(x), 1], so consider y ∈ [x, δr∗(x)). Consider the two expressions under the minimum in (71). The first expression is strictly increasing in q, so it cannot be binding. The derivative of the second expression w.r.t. q has a constant sign for all q:
d/dq ((qy + (1 − q)δσ)/(1 − δ(1 − q)(1 − σ))) = (1 − δ(1 − σ))/(1 − δ(1 − q)(1 − σ))² · (y − δσ/(1 − δ(1 − σ))). (72)
If (72) is nonnegative, then the solution of (70) is p_x(y) = 1. If (72) is negative, then, if a solution of (70) exists, it must satisfy the equation
min_{σ∈[0,1]} (qy + (1 − q)δσ)(1 − δ(1 − σ))/((1 − δ(1 − q)(1 − σ))δσ) = r∗(x). (73)
Specifically, we fix δ and numerically (using Maple software) find the smallest value of x ∈ (0, 1) such that the payoff ratio of p_x over F_{[0,1]} is minimized by an environment that randomizes between 0 and 1. Let us call this value L_δ. Thus, as long as x ≥ L_δ, the restriction to F_{0,1} is w.l.o.g. It turns out that the numerically calculated value L_δ is constant in δ and is approximately equal to (bounded from above by) 1/89.
It is straightforward to verify that the unique solution (q̃, σ̃) of (73) is given by
q̃ = (1 − δ)(1 − r∗)(y + √(yr∗))(r∗ + √(yr∗)) / ((1 − δ)(1 − r∗)·2yr∗ + (δ(r∗)² + ((1 − δ + y)y + (1 − δ − (3 − δ)y)r∗)√(yr∗))) if y ∈ (0, δ(r∗)²), and q̃ = δ(1 − r∗)/(δ − y) if y ∈ [δ(r∗)², δr∗);
σ̃ = (1 − δ)(y + √(yr∗))/(δ(r∗ − y)) if y ∈ (0, δ(r∗)²), and σ̃ = 1 if y ∈ [δ(r∗)², δr∗),
where we write r∗ for r∗(x) for notational convenience. It is also straightforward to verify that q̃/(1 − δ(1 − q̃)) ≥ r∗(x), so q̃ is a solution of (70).
So, for each y ∈ [x, δr∗(x)), we have p_x(y) = q̃ and, by construction, r_{p_x}(y) = r∗(x). We thus obtain R_{p_x}(x, B_{0,1}) = r∗(x) = R∗(x, B_{0,1}).

Step 2.
Now consider all environments in F_X, where X ⊂ [0, 1] and {0, 1} ⊂ X. As B_{0,1} ⊂ F_X, we have
R_{p_x}(x, F_X) ≤ R∗(x, F_X) ≤ r∗(x) = R∗(x, B_{0,1}).
We now identify the lower bound L on x such that R∗(x, F_X) = r∗(x) for all x ∈ [L, 1], so the rule p_x derived in Step 1 is dynamically robust on F_{[0,1]}. Define
L = inf{x ∈ (0, 1] : R_{p_x}(x, F_X) = r∗(x)}.
Observe that R_{p_x}(x, F_X) = r∗(x) = 1 for all x ∈ [δ, 1]. Also, lim_{x→0} r∗(x) = ρ(0) = 1/2, whereas lim_{x→0} R_{p_x}(x, F_X) ≤ 1/4. Therefore, L ∈ (0, δ]. We numerically find that the value of L is at most 1/89, with the equality when X = [0, 1].
It remains to show statements (a) and (b) of Theorem 2′. By Theorem 1′, the rule p∗ satisfies R_{p∗}(x, B_{0,1}) = ρ(x) = R∗(x, B_{0,1}) for all x ≤ δ/(2 − δ). As B_{0,1} ⊂ F_X, we have for all x ≤ δ/(2 − δ)
R_{p∗}(x, F_X) ≤ R∗(x, F_X) ≤ ρ(x).
We now find the lower bound L′ on x such that R_{p∗}(x, F_X) ≥ ρ(x) for all x ∈ [L′, 1], so the rule p∗ is dynamically robust on F_X for x ∈ [L′, δ/(2 − δ)]. Define
L′ = inf{x ∈ (0, 1] : R_{p∗}(x, F_X) ≥ ρ(x)}.
Rescaling the values by x̄ = 1/x, we have
lim_{x→0} R∗(x, F_X) ≤ lim_{x̄→∞} R∗(1, F_{[0,x̄]}) ≤ 1/4.
For X = [0, 1], we numerically find that L′ = 1/6, by checking that, for x < 1/6, there exist z ∈ [0, 1] and σ ∈ [0, 1] such that
U_{p∗}(x, F(z,σ))/V(x, F(z,σ)) < ρ(x).
In words, for x < 1/6, the worst-case ratio is attained by a lottery over 0 and z with z < 1, which is why rule p∗ no longer attains the dynamically robust ratio ρ(x).

Appendix C. Variations and Extensions
C.1.
A Dynamically Robust Rule for Bounded Environments.
In the following we present a recursive procedure for constructing a dynamically robust rule for any value of x. For simplicity, we consider the case where the set of alternatives X is an interval. The proof is easily adapted to the more general case. By rescaling the values, without loss of generality assume that X = [0, 1]. We fix a target performance ratio r and find a rule, together with a threshold x(r), such that this rule attains a performance ratio of at least r when the outside option x is at least x(r). We also show that there is no rule that has a performance ratio better than r for x = x(r), and use this to argue that r is the dynamically robust performance ratio when x = x(r).

Let us introduce the following notation. Let q be a stationary and monotone decision rule such that r_q(y) is nondecreasing. By (16), (36), (37), and the fact that y ≤ c_{F_(z,σ)} implies y ≤ δz, the payoff ratio of a rule that stops with probability s ∈ [0, 1] when the best-so-far alternative is y, and stops with probability q(z) for all z > y/δ, is given by

r̃_q(y, s) = min{ s/(1 − δ(1 − s)), inf_{σ∈[0,1], z∈(y/δ,1]} (sy + (1 − s)δσ · q(z)z/(1 − δ(1 − q(z)))) (1 − δ(1 − σ)) / ((1 − δ(1 − s)(1 − σ)) δσz) }.   (74)

Note that r̃_q(y, s) with s = q(y) satisfies r̃_q(y, q(y)) = r_q(y) by the definition of r_q.

Fix a target performance ratio r ∈ (1/4, 1). We will construct a rule p and a lower bound x(r) such that p attains the ratio r_p(y) ≥ r for all y ∈ [x(r), 1]. The rule p will be compared to a different hypothetical rule q that guarantees a strictly better ratio at x(r), and hence at all higher best-so-far alternatives. So we suppose that

r_q(y) > r for all y ∈ [x(r), 1] ∩ X.   (75)
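The payoff ratio (74) lends itself to a direct numerical prototype of the interval-by-interval construction: on [δ, 1] stop with probability one, and below δ take the largest stopping probability whose ratio (74) still clears the target r. The Python sketch below is our own illustration, not the authors' implementation; the function names, the finite grids over y, s, σ, z, and the nearest-neighbor lookup for p(z) are all assumptions, and the infimum in (74) is only approximated on a grid.

```python
def construct_rule(delta, r, y_grid=10, s_grid=20, sz_grid=40, max_k=30):
    """Sketch: build a rule p interval by interval, working down from y = delta.

    On [delta, 1] the rule stops with probability one.  On each interval
    [delta**(k+1), delta**k) we set p(y) to the largest s whose payoff ratio
    rtilde(y, s) still clears the target r, and we terminate when no such s
    exists; the last y reached approximates the threshold x(r).
    """
    p = {}  # stopping probabilities on a finite grid of best-so-far values

    def p_of(z):
        # crude nearest-neighbor lookup: p = 1 on [delta, 1], otherwise the
        # closest stored grid point above z (a discretization shortcut)
        if z >= delta or not p:
            return 1.0
        keys = [y for y in p if y >= z]
        return p[min(keys)] if keys else 1.0

    def rtilde(y, s):
        # equation (74); the infimum over (sigma, z) is taken on a grid
        best = s / (1 - delta * (1 - s))
        for i in range(1, sz_grid + 1):
            sigma = i / sz_grid
            for j in range(1, sz_grid + 1):
                z = y / delta + (1 - y / delta) * j / sz_grid  # z in (y/delta, 1]
                qz = p_of(z)
                u_z = qz * z / (1 - delta * (1 - qz))  # continuation payoff at z
                num = (s * y + (1 - s) * delta * sigma * u_z) * (1 - delta * (1 - sigma))
                den = (1 - delta * (1 - s) * (1 - sigma)) * delta * sigma * z
                best = min(best, num / den)
        return best

    for k in range(1, max_k + 1):
        lo, hi = delta ** (k + 1), delta ** k
        for n in range(1, y_grid + 1):
            y = hi - (hi - lo) * n / y_grid  # sweep [lo, hi) from above
            feasible = [s / s_grid for s in range(s_grid + 1)
                        if rtilde(y, s / s_grid) >= r]
            if not feasible:
                return p, y  # no admissible stopping probability: y ~ x(r)
            p[y] = max(feasible)
    return p, delta ** (max_k + 1)
```

For example, construct_rule(0.5, 0.9) walks down from y = δ = 0.5, keeps p(y) = 1 for the first few grid points, and terminates shortly below δ with a coarse estimate of the threshold x(r).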
We will then show that no such q exists, thus proving dynamic robustness of p.

We now construct p by induction. During this construction, we will verify some properties of the hypothetical rule q. First, for each y ∈ [δ, 1] define

S_r(y) = { s ∈ [0, 1] : s/(1 − δ(1 − s)) ≥ r } and p(y) = max{ s ∈ [0, 1] : s ∈ S_r(y) } = 1.   (76)

By (74) and (75), the hypothetical rule q satisfies r_q(y) = q(y)/(1 − δ(1 − q(y))) > r for each y ∈ [δ, 1], so q(y) ∈ S_r(y).

We proceed by induction. For each k = 1, 2, ..., we derive p(y) for y ∈ [δ^(k+1), δ^k), using our solution p(z) for all z ≥ δ^k from the earlier induction steps. We also verify that q(y) ∈ S_r(y) for each y ∈ [δ^(k+1), δ^k), using the induction assumption

q(z) ∈ S_r(z) for all z ∈ [δ^k, 1].   (77)

For each y ∈ [δ^(k+1), δ^k), define

S_r(y) = { s ∈ [0, 1] : r̃_p(y, s) ≥ r } and p(y) = max{ s ∈ [0, 1] : s ∈ S_r(y) } if S_r(y) ≠ ∅.   (78)

Notice that S_r(y) depends on p only through the values of p(z) defined in the previous iterations of the procedure.

Now we check the properties of the hypothetical rule q. By (76) and (78) and the induction assumption (77), we have q(z) ≤ p(z) for all z ≥ y/δ. By (74), r̃_q(y, s) is increasing in q(z) for all z ≥ y/δ. Consequently,

r̃_q(y, s) ≤ r̃_p(y, s) for all s ∈ [0, 1].   (79)

Since r̃_q(y, q(y)) = r_q(y) > r by (75), we obtain

q(y) ∈ S_r(y).   (80)

Now let us return to the construction of p. If S_r(y) is nonempty for all y ∈ [δ^(k+1), δ^k), then we proceed to the next step of the induction, k + 1. Otherwise, we terminate the procedure. Upon termination, we define S_r(y) = ∅ for all y ∈ (0, δ^(k+1)) and

x(r) = min{ y : S_r(y) ≠ ∅ }.

We thus obtain p(y) that satisfies r_p(y) ≥ r for all y ∈ [x(r), 1], provided that r > 1/4. The proof of Theorem 2(b) actually shows that for each ε > 0 there exists x̄ > 0 such that R_p(1, F_X) ≥ 1/4 − ε whenever sup X ≤ x̄. By rescaling the values by 1/(sup X), we obtain that R_p(x, F_X) ≥ 1/4 − ε whenever x ≥ (sup X)/x̄.

Furthermore, since r̃_p(y, s) is continuous in y and s, S_r defined by the above procedure is continuous in r. Therefore, x(r) is continuous.

We now show that every stopping probability in S_r(x(r)) gives the same payoff ratio, r, that is,

r̃_p(x(r), s) = r for all s ∈ S_r(x(r)).   (81)

If there were s ∈ S_r(x(r)) such that r̃_p(x(r), s) > r, then, by continuity of r̃_p(y, s) in y, there would exist ε > 0 such that r̃_p(x(r) − ε, s) ≥ r, which is a contradiction to the definition of x(r).

By continuity of x(r) and (81), we obtain that x(r) is a one-to-one mapping. That is, for each x > 0 there exists a unique r, and a decision rule p defined by (76) and (78) for this value of r, such that r_p(y) ≥ r for all y ∈ [x, 1] with equality for y = x, and thus

R_p(x, F_X) = inf_{y ≥ x} r_p(y) = r.

We now show that p defined by (76) and (78) for a given r is dynamically robust. Recall the hypothetical rule q that satisfies (75). By (80), q(y) ∈ S_r(y) for all y ∈ [x(r), 1]. Substituting s = q(x(r)) into (81) we obtain r̃_p(x(r), q(x(r))) = r. By (79),

r_q(x(r)) = r̃_q(x(r), q(x(r))) ≤ r̃_p(x(r), q(x(r))) = r.

This is a contradiction to (75), thus proving dynamic robustness of p.

C.2. Linear Decision Rules.
Analogously to Appendix C.1, consider X = [0, 1]. (Footnote to Appendix C.1: specifically, the graph {(y, S_r(y))}_{y>0} is continuous in r in the topology of uniform convergence.) The analysis simplifies if we consider simple rules where the stopping probability is linear in the best-so-far alternative (wherever this probability is below 1). We find that these rules approximate the dynamically robust performance ratio identified in Theorem 2′ well, with a performance loss of around 5%, provided the discount factor is not too close to one.

Consider a truncated linear rule p_α given by

p_α(y) = min{ (1 − δ)/(2 − δ) + αy, 1 },

where α > 0. The intercept (1 − δ)/(2 − δ) is taken from decision rule p̄ in Theorem 2. The intercept ensures good performance when the best-so-far alternative y is very small. The slope α is used to ensure good performance for higher values of y.

By Proposition 6 it is sufficient to investigate performance when facing binary environments. For each value of α and x, we derive the performance ratio R_{p_α}(x, F_X) for the linear rule and evaluate the performance loss given by

ε_α = sup_{x∈(1/89, 1)} ( R*(x, F_X) − R_{p_α}(x, F_X) ),

where R* is the dynamically robust performance ratio (we consider x ≥ 1/89 to apply Theorem 2′). We search for the value α* that minimizes the performance loss,

ε* = ε_{α*} = inf_{α≥0} ε_α.

That is, we look for linear rules that are closest to being dynamically robust. Closeness refers here to the smallest maximal loss in performance ratio, ε*, as compared to the dynamically robust rule.

Table 2 presents, for various values of the discount factor δ, how much one loses in terms of the performance ratio when limiting attention to linear rules. It also presents the corresponding slopes of the linear rules.

δ      …      …      …      …      …      …      …      …      …      0.95   0.99   …
α*     ….68   2.48   1.75   1.38   1.19   1.05   0.93   0.81   0.…    0.39   0.…    …
ε*     4.9%   4.7%   4.6%   4.5%   4.8%   5%     4.8%   4.4%   5.5%   6.6%   8.1%   8.…%

Table 2.
Numerically computed coefficients α* of the linear rules that are closest to being dynamically robust, with the corresponding bounds ε*.

One may not be satisfied by the performance of the linear rules when δ is large. For large δ, there is a different simple rule that performs almost as well as the dynamically robust rule. Let

p̂_β(y) = min{ √(β(1 − δ)y/(1 − y)), 1 } for y < δ and p̂_β(y) = 1 for y ≥ δ,

where β > 0. We numerically find the value β* that makes p̂_β closest to being dynamically robust. We list the values of β* and the corresponding bounds on the performance loss in Table 3.

δ      …      0.95   0.99   …
β*     0.35   0.…    0.22   0.…
ε*     0.8%   1.6%   2.5%   3%

Table 3.

Numerically computed coefficients β* of rules p̂_β that are closest to being dynamically robust, with the corresponding bounds ε*.

C.3.
Additive-Multiplicative Search Costs.
Let us now extend the model by introducing an additive cost of search. Suppose that the individual incurs a cost of κ ≥ 0 in every round of search. In each round t, the individual has a choice between consuming the best-so-far alternative y_t, or proceeding to the next round, where a fixed cost of κ is incurred, and all future payoffs are discounted by δ. Thus, if the individual stops the search in round t ≥ 1, her payoff from the perspective of round 0 is

−(δ + ... + δ^(t−1) + δ^t)κ + δ^t y_t.

We assume that the cost parameters satisfy κ ≥ 0, 0 < δ ≤ 1, and κ + (1 − δ) > 0. The last assumption demands that the search is costly. We allow for either zero additive cost, κ = 0, or zero multiplicative cost, 1 − δ = 0, but not both.

First, we point out that Propositions 1 and 2, as well as Propositions 5 and 6, continue to hold in this setting. The proofs of these propositions are easily adjusted to take into account the additive cost of search. As before, Propositions 5 and 6 allow us to restrict attention to monotone decision rules that depend on the best-so-far alternative only, and to narrow down the set of priors to the set of binary environments B_X. Let p be a monotone rule. Let V(F, y) be the optimal payoff and U_p(F, y) be the payoff of rule p in environment F under best-so-far alternative y. Then for F ∈ B_X we obtain

V(F, y) = max_{q∈[0,1]} ( qy + (1 − q)δ( −κ + ∫ V(F, max{y, x}) dF(x) ) )

and

U_p(F, y) = p(y)y + (1 − p(y))δ( −κ + ∫ U_p(F, max{y, x}) dF(x) ).

The performance ratio of rule p is defined for each outside option x > 0 as

R_p(x, F_X) = inf_{y≥x} inf_{F∈B_X} U_p(F, y)/V(F, y).

We now find the dynamically robust performance ratio when the outside option is at least twice the present value of all future discounted costs, so x ≥ 2δκ/(1 − δ).

Theorem 3. Let x ≥ 2δκ/(1 − δ). The stationary decision rule p̄ given by p̄(y) = (1 − δ)/(2 − δ) for all y ≥ x

(a) attains a performance ratio of at least 1/4;
(b) is dynamically robust if sup X = ∞.

Proof. Let x ≥ 2δκ/(1 − δ) be an outside option. We show that the decision rule p̄ that stops with the constant probability q := p̄(y) = (1 − δ)/(2 − δ) for all y ≥ x attains the performance ratio of 1/4.

First, fix y ≥ x. Consider an environment F_(z,σ) ∈ B_X such that z ≤ y, so V(F_(z,σ), y) = y, and

U_p̄(F_(z,σ), y) = qy + (1 − q)δ(U_p̄(F_(z,σ), y) − κ) = (qy − (1 − q)δκ)/(1 − δ(1 − q)) = (1/2)(y − δκ/(1 − δ)).

By y ≥ x ≥ 2δκ/(1 − δ), the payoff ratio satisfies

U_p̄(F_(z,σ), y)/V(F_(z,σ), y) = (1/2)(1 − δκ/((1 − δ)y)) ≥ 1/4.

Second, consider an environment F_(z,σ) such that z > y, so

V(F_(z,σ), y) = max{ y, δ(σz − κ)/(1 − δ(1 − σ)) }.

The individual's payoff is

U_p̄(F_(z,σ), y) = qy + (1 − q)δ( (1 − σ)U_p̄(F_(z,σ), y) + σU_p̄(F_(z,σ), z) − κ ).

Substituting q = (1 − δ)/(2 − δ) and U_p̄(F_(z,σ), z) = (1/2)(z − δκ/(1 − δ)), solving for U_p̄(F_(z,σ), y), and simplifying the expression yields

U_p̄(F_(z,σ), y) = (qy + (1 − q)δ(σ·(1/2)(z − δκ/(1 − δ)) − κ)) / (1 − δ(1 − q)(1 − σ)) = (1/2)(y − δκ/(1 − δ)) + δσ(z − y)/(4(1 − δ) + 2δσ).

If V(F_(z,σ), y) = y, then, by y ≥ x ≥ 2δκ/(1 − δ),

U_p̄(F_(z,σ), y)/V(F_(z,σ), y) = (1/(2y))( y − δκ/(1 − δ) + δσ(z − y)/(2(1 − δ) + δσ) ) ≥ (1/(2y))( y − δκ/(1 − δ) ) ≥ 1/4.   (82)

If V(F_(z,σ), y) = δ(σz − κ)/(1 − δ(1 − σ)), then, by y ≥ x ≥ 2δκ/(1 − δ),

U_p̄(F_(z,σ), y)/V(F_(z,σ), y) = (1/2)( y − δκ/(1 − δ) + δσ(z − y)/(2(1 − δ) + δσ) ) · (1 − δ(1 − σ))/(δ(σz − κ)) ≥ (1/2)( y/2 + δσ(z − y)/(2(1 − δ) + δσ) ) · (1 − δ(1 − σ))/(δ(σz − κ)).   (83)

Assume that σ ≥ (1 − δ)/δ.
Then the right-hand side in (83) is increasing in z, so reducing z until δ(σz − κ)/(1 − δ(1 − σ)) = y yields inf_z U_p̄(F_(z,σ), y)/V(F_(z,σ), y) ≥ 1/4, as in (82). Next, assume that σ < (1 − δ)/δ. Then the right-hand side in (83) is decreasing in z, so, taking z → ∞, we obtain

U_p̄(F_(z,σ), y)/V(F_(z,σ), y) ≥ inf_{σ<(1−δ)/δ} lim_{z→∞} U_p̄(F_(z,σ), y)/V(F_(z,σ), y) = inf_{σ<(1−δ)/δ} (1 − δ(1 − σ))/(4(1 − δ) + 2δσ) = 1/4,

as can be easily verified. Therefore, for all σ ∈ [0, 1] and all z,

U_p̄(F_(z,σ), y)/V(F_(z,σ), y) ≥ 1/4.

Moreover, for a sequence (z_k, σ_k)_{k∈N} such that z_k → ∞, σ_k → 0, and δ(σ_k z_k − κ)/(1 − δ(1 − σ_k)) > y for all k,

lim_{k→∞} U_p̄(F_(z_k,σ_k), y)/V(F_(z_k,σ_k), y) = 1/4.   □

References

Babaioff, M., M. Dinitz, A. Gupta, N. Immorlica, and
K. Talwar (2009): “Secretary Problems: Weights and Discounts,” in
Proceedings of the Twentieth
Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’09, pp. 1245–1254, Philadelphia, PA, USA.
Ben-Tal, A., L. E. Ghaoui, and
A. Nemirovski (2009):
Robust Optimization. Princeton University Press.
Bergemann, D., and
S. Morris (2005): “Robust Mechanism Design,”
Econometrica, 73, 1771–1813.
Bergemann, D., and
K. H. Schlag (2011a): “Robust Monopoly Pricing,”
Journal of Economic Theory, 146, 2527–2543.
(2011b): “Should I Stay or Should I Go? Search Without Priors,” Mimeo.
Carrasco, V., V. F. Luz, N. Kos, M. Messner, P. Monteiro, and
H. Moreira (2018): “Optimal selling mechanisms under moment conditions,”
Journal of Economic Theory, 177, 245–279.
Carroll, G. (2015): “Robustness and Linear Contracts,”
American Economic Review, 105, 536–563.
Chassang, S. (2013): “Calibrated Incentive Contracts,”
Econometrica , 81, 1935–1971.
Epstein, L. G., and
M. Schneider (2003): “Recursive Multiple-Priors,”
Journal of Economic Theory, 113, 1–31.
Ferguson, T. S. (1989): “Who Solved the Secretary Problem?,”
Statistical Science, 4(3), 282–289.
Fox, J. H., and
L. G. Marnie (1960): “In Martin Gardner’s Column: Mathematical Games,”
Scientific American , 202(2), 150–153.
Gastwirth, J. L. (1976): “On Probabilistic Models of Consumer Search for In-formation,”
The Quarterly Journal of Economics , 90, 38–50.
Gilboa, I., and
D. Schmeidler (1989): “Maxmin Expected Utility with a Non-Unique Prior,”
Journal of Mathematical Economics , 18, 141–153.
Hansen, L. P., and
T. J. Sargent (2001): “Robust Control and Model Uncertainty,”
American Economic Review , 91, 60–66.
Huber, P. J. (1964): “Robust Estimation of a Location Parameter,”
The Annals of Mathematical Statistics, 35, 73–101.
(1965): “A Robust Version of the Probability Ratio Test,” The Annals of
Mathematical Statistics, 36, 1753–1758.
Janssen, M., A. Parakhonyak, and
A. Parakhonyak (2017): “Non-Reservation Price Equilibria and Consumer Search,”
Journal of Economic Theory, 172, 120–162.
Jiang, H., S. Netessine, and
S. Savin (2011): “Robust Newsvendor CompetitionUnder Asymmetric Information,”
Operations Research , 59, 254–261.
Kajii, A., and
S. Morris (1997): “The Robustness of Equilibria to IncompleteInformation,”
Econometrica , 65, 1283–1309.
Kamenica, E., and
M. Gentzkow (2011): “Bayesian Persuasion,”
American Economic Review, 101, 2590–2615.
Kasberger, B., and
K. H. Schlag (2017): “Robust Bidding in First-Price Auc-tions: How to Bid without Knowing What Others Are Doing,” Mimeo.
Klibanoff, P., M. Marinacci, and
S. Mukerji (2009): “Recursive SmoothAmbiguity Preferences,”
Journal of Economic Theory , 144, 930–976.
Maccheroni, F., M. Marinacci, and
A. Rustichini (2006): “Dynamic Variational Preferences,”
Journal of Economic Theory , 128, 4–44.
Manski, C. F. (2004): “Statistical Treatment Rules for Heterogeneous Populations,”
Econometrica , 72, 1221–1246.
Milgrom, P., and
C. Shannon (1994): “Monotone Comparative Statics,”
Econometrica, 62, 157–180.
Parakhonyak, A., and
A. Sobolev (2015): “Non-Reservation Price Equilibrium and Search without Priors,”
Economic Journal , 125, 887–909.
Perakis, G., and
G. Roels (2008): “Regret in the Newsvendor Model with Partial Information,”
Operations Research , 56, 188–203.
Prasad, K. (2003): “Non-Robustness of Some Economic Models,”
The B.E. Journal of Theoretical Economics, 3.
Riedel, F. (2009): “Optimal Stopping with Multiple Priors,”
Econometrica , 77,857–908.
Savage, L. J. (1951): “The Theory of Statistical Decision,”
Journal of the American Statistical Association, 46, 55–67.
Schlag, K. H. (2006): “Eleven – Tests Needed for a Recommendation,” Mimeo.
Schlag, K. H., and
A. Zapechelnyuk (2016): “Value of Information When Searching for a Secretary,” Mimeo.
(2017): “Dynamic Benchmark Targeting,” Journal of Economic Theory,
Journal of Economic Theory ,169, 145–169.
Siniscalchi, M. (2011): “Dynamic Choice Under Ambiguity,”
Theoretical Economics, 6, 379–421.
Sleator, D. D., and
R. E. Tarjan (1985): “Amortized Efficiency of List Update and Paging Rules,”
Communications of the ACM , 28, 202–208.
Stoye, J. (2009): “Minimax Regret Treatment Choice with Finite Samples,”
Journal of Econometrics, 151, 70–81.
Terlizzese, D. (2008): “Relative Minimax,” EIEF Working Papers Series 0804.
Wald, A. (1950):
Statistical Decision Functions. Wiley, New York.
Weitzman, M. L. (1979): “Optimal Search for the Best Alternative,”
Econometrica ,47, 641–654.
Zhou, K., J. C. Doyle, and
K. Glover (1995): Robust and Optimal Control. Prentice Hall.