Optimal Learning under Robustness and Time-Consistency
arXiv [q-fin.EC]

Larry G. Epstein and Shaolin Ji ∗

March 6, 2019
Abstract
We model learning in a continuous-time Brownian setting where there is prior ambiguity. The associated model of preference values robustness and is time-consistent. It is applied to study optimal learning when the choice between actions can be postponed, at a per-unit-time cost, in order to observe a signal that provides information about an unknown parameter. The corresponding optimal stopping problem is solved in closed form, with a focus on two specific settings: Ellsberg's two-urn thought experiment expanded to allow learning before the choice of bets, and a robust version of the classical problem of sequential testing of two simple hypotheses about the unknown drift of a Wiener process. In both cases, the link between robustness and the demand for learning is studied.

Key words: ambiguity, robust decisions, learning, partial information, optimal stopping, sequential testing of simple hypotheses, Ellsberg Paradox, recursive utility, time-consistency, model uncertainty

∗ Department of Economics, Boston University, [email protected], and Zhongtai Securities Institute of Financial Studies, Shandong University, [email protected]. Ji gratefully acknowledges the financial support of the National Natural Science Foundation of China (award No. 11571203). We are grateful for suggestions from two referees and for comments from Tomasz Strzalecki. An earlier version, titled "Optimal learning and Ellsberg's urns," was posted on arXiv in August 2017.

Introduction
We consider a decision-maker (DM) choosing between three actions whose payoffs are uncertain because they depend both on exogenous randomness and on an unknown parameter θ, θ = θ₁ or θ₂. She can postpone the choice of action so as to learn about θ by observing the realization of a signal modeled by a Brownian motion with drift. Because of a per-unit-time cost of sampling, which can be material or cognitive, she faces an optimal stopping problem. A key feature is that DM does not have sufficient information to arrive at a single prior about θ, that is, there is ambiguity about θ. Therefore, prior beliefs are represented by a nonsingleton set of probability measures, and DM seeks to make robust choices of both stopping time and action by solving a maxmin problem. In addition, she is forward-looking and dynamically consistent as in the continuous-time version of maxmin utility given by Chen and Epstein (2002). One contribution herein is to extend the latter model to accommodate learning. As a result, we capture robustness to ambiguity (or model uncertainty), learning and time-consistency. The other contribution is to investigate optimal learning in the above setting, with particular focus on two special cases that extend classical models. The corresponding optimal stopping problems are solved explicitly and the effects of ambiguity on optimal learning are determined.

The first specific context begins with Ellsberg's metaphorical thought experiment: There are two urns, each containing balls that are either red or blue, where the "known" or risky urn contains an equal number of red and blue balls, while no information is provided about the proportion of red balls in the "unknown" or ambiguous urn. DM must choose between betting on the color drawn from the risky urn or from the ambiguous urn.
The intuitive behavior highlighted by Ellsberg is the choice to bet on the draw from the risky urn no matter the color, behavior which is paradoxical for subjective expected utility theory, or indeed, for any model in which beliefs are represented by a single probability measure. Ellsberg's paradox is often taken as a normative critique of the Bayesian model and of the view that the single-prior representation of beliefs is implied by rationality (e.g., Gilboa 2009, 2015; Gilboa et al. 2012). Here we add to the thought experiment by including a possibility to learn. Specifically, we allow DM to postpone her choice so that she can observe realizations of a diffusion process whose drift is equal to the proportion of red in the ambiguous urn. Under specific parametric restrictions we completely describe the optimal joint learning and betting strategy. In particular, we show that it can be optimal to reject learning completely, and, if some learning is optimal, then it is never optimal to bet on the risky urn after stopping. The rationality of no learning suggests that one needs to reexamine and qualify the common presumption that ambiguity would fade away, or at least diminish, in the presence of learning opportunities (Marinacci 2002). It can also explain experimental findings (Trautman and Zeckhauser 2013) that some subjects neglect opportunities to learn about an ambiguous urn even at no visible (material) cost. In addition, our model is suggestive of laboratory experiments that could provide further evidence on the connection between ambiguity and the demand for learning.

The second application is to the classical problem of sequential testing of two simple hypotheses about the unknown drift of a Wiener process.
The seminal papers, both using a discrete-time framework, are Wald (1945, 1947), which shows that the sequential probability ratio test (SPRT) provides an optimal trade-off between type I and type II errors, and Arrow, Blackwell and Girshick (1949), which derives SPRT from utility maximization using dynamic programming arguments. More recently, Peskir and Shiryaev (2006, Ch. 6) employ a Bayesian subjectivist approach and derive SPRT as the solution to a continuous-time optimal stopping problem. We extend the latter analysis to accommodate situations where DM, a statistician/analyst, does not have sufficient information to justify reliance on a single prior. We show that it is optimal to stop if every "compatible" Bayesian (one whose prior is an element of the set of priors used by the robustness-seeking DM) would choose to do so. But the corresponding statement for "continue" is false: it may be optimal to stop under robustness even given a realized sample at which all compatible Bayesians would choose to continue. In this sense, "sensitivity analysis" overstates the robustness value of sampling.

We view our model as normative, a perspective that is most evident in the hypothesis testing context. Time-consistency of preference has obvious prescriptive appeal. It is important to understand that, roughly speaking, time-consistency is the requirement that a contingent plan (e.g., a stopping strategy) that is optimal ex ante remain optimal conditional on every subsequent realization, assuming there are no surprises or unforeseen events. A possible argument against such consistency, one that is sometimes expressed in the statistics literature, is that surprises are inevitable and thus that any prescription should take that into account rather than excluding their possibility. We would agree that a sophisticated decision-maker would expect that surprises may occur while (necessarily) being unable to describe what form they could take.
However, to the best of our knowledge there currently does not exist a convincing model in the economics, statistics or psychology literatures of how such an individual should (or would) behave, that is, how the awareness that she may be missing something in her perception of the future should (or would) affect current behavior. That leaves time-consistency as a sensible guiding principle, with the understanding that reoptimization can (and should) occur if there is a surprise.

A brief review of other relevant literature concludes this introduction. The classical Bayesian model of sequential decision-making, including in particular applications to inference and experimentation, is discussed in Howard (1970) and the references therein. The maxmin model of ambiguity-averse preference is axiomatized in a static setting in Gilboa and Schmeidler (1989), which owes an intellectual debt to the Arrow and Hurwicz (1972) model of decision-making under ignorance, and in a multi-period discrete-time framework in Epstein and Schneider (2003), where time-consistency is one of the key axioms. Optimal stopping problems have been studied in the absence of time-consistency. It is well known that modeling a concern with ambiguity and robust decision-making leads to "nonlinear" objective functions, which, in a dynamic setting and in the absence of commitment, can lead to time-inconsistency issues (Peskir 2017). A similar issue arises also in a risk context where there is a known objective probability law, but where preference does not conform to von Neumann-Morgenstern expected utility theory (Ebert and Strack 2018; Huang et al. 2018). Such models are problematic in normative contexts. It is not clear why one would ever prescribe to a decision-maker (who is unable or unwilling to commit) that she should adopt a criterion function that would imply time-inconsistent plans and that she should then resolve these inconsistencies by behaving strategically against her future selves (as is commonly assumed).
The recursive maxmin model has been used in macroeconomics and finance (e.g., Epstein and Schneider 2010) and also in robust multistage stochastic optimization (e.g., Shapiro (2016) and the references therein, including to the closely related literature on conditional risk measures). Shapiro focuses on a property of sets of measures, called rectangularity following Epstein and Schneider (2003), that underlies recursivity of utility and time-consistency. Most of the existing literature deals with a discrete-time setting. The theoretical literature on learning under ambiguity is sparse and limited to passive learning (e.g., Epstein and Schneider 2007, 2008). With regard to hypothesis testing, this paper adds to the literature on robust Bayesian statistics (Berger 1984, 1985, 1994; Rios-Insua and Ruggeri 2000), which is largely restricted to a static environment. Walley (1991) goes further and considers both a prior and a single posterior stage, but not sequential hypothesis testing. For a frequentist approach to robust sequential testing see Huber (1965).

Closest to the present paper is the literature on bandit problems with ambiguity and robustness (Caro and Das Gupta 2015; Li 2019). Both papers model endogenous learning (or experimentation) by maxmin dynamically consistent agents. Their models differ from ours in that they assume discrete time and an exogenously given horizon, and also in the nature of experimentation. In our model, the once-and-for-all choice of action and resulting payoff come after all learning has ceased, while in bandit problems, action choice and flow payoffs are continuous and intertwined with learning (for example, the cost of experimentation is the implied reduction in current flow payoffs). Consequently, their analyses and characterizations are much different; for example, their focus on the existence of a suitable Gittins index has no counterpart in our model.

The paper proceeds as follows.
The next section describes the model of utility, extending Chen-Epstein to accommodate learning. Readers who are primarily interested in applications can skip this relatively technical section and move directly to the applications in the sections that follow.

For background regarding time-consistency in the maxmin framework, consider first the following informal outline that anticipates the specific setting of this paper. DM faces uncertainty about a payoff-relevant state space Ω due to uncertainty about the value of a parameter θ ∈ Θ. Each θ determines a unique probability law on Ω, but there is prior ambiguity about the parameter that is represented by a nonsingleton set M of priors on Θ. As time proceeds, DM learns about the parameter through observation of a signal whose increments are distributed i.i.d. conditional on θ. At issue is how to model beliefs about Ω, that is, the set P of predictive priors. (Throughout we adopt the common practice of distinguishing terminologically between beliefs about the state space, referred to as predictive priors, and beliefs about parameters, which are referred to as priors.) A seemingly natural approach is to take P to be the set of all measures that can be obtained by combining some prior µ in M with the given conditionally i.i.d. likelihood. Learning is modeled through the set of posteriors M_t at t obtained via prior-by-prior Bayesian updating of M, and a corresponding set P_t of predictive posteriors is obtained as above. However, conditional evaluation at each t ≥ 0 then depends on the worst-case posterior µ_t in M_t, and worst cases at different nodes need not belong to the same prior µ. This is in contrast with the ex ante perspective expressed via P, where a single worst-case prior µ determines the entire ex ante optimal plan; dynamically inconsistent choices can result. To restore dynamic consistency, one can enlarge P by adding to it all measures obtained by pasting together alien posteriors, leading to a "rectangular" set that is closed with respect to further pasting.
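To see concretely what pasting adds, the following minimal Python sketch (a hypothetical two-period coin-flip example with made-up numbers, not taken from the paper) compares maxmin over a non-rectangular set of priors, the backward-induction recursion, and maxmin over the rectangular hull obtained by pasting one-step-ahead conditionals; the latter two coincide, while the ex ante non-rectangular evaluation differs:

```python
import itertools

# Two coin flips; the ambiguous parameter is the heads probability p in {0.3, 0.7}.
# The payoff xi rewards matching outcomes (a crude proxy for "learning the parameter").
xi = {('H', 'H'): 1.0, ('H', 'T'): 0.0, ('T', 'H'): 0.0, ('T', 'T'): 1.0}
P = [0.3, 0.7]
outcomes = list(itertools.product('HT', repeat=2))

def prob(p1, p2H, p2T, w):
    # p1: chance of H at time 1; p2H/p2T: chance of H at time 2 after H/T.
    q1 = p1 if w[0] == 'H' else 1 - p1
    p2 = p2H if w[0] == 'H' else p2T
    return q1 * (p2 if w[1] == 'H' else 1 - p2)

# Ex ante maxmin over the NON-rectangular set: one p governs both flips.
ex_ante = min(sum(prob(p, p, p, w) * xi[w] for w in outcomes) for p in P)

# Backward induction, i.e. the recursive evaluation: minimize node by node.
def cond_value(w1):
    return min(p * xi[(w1, 'H')] + (1 - p) * xi[(w1, 'T')] for p in P)
recursive = min(p * cond_value('H') + (1 - p) * cond_value('T') for p in P)

# Maxmin over the rectangular hull: paste together arbitrary selections of
# one-step-ahead conditionals (p2H and p2T may come from different priors).
rect = min(sum(prob(p1, p2H, p2T, w) * xi[w] for w in outcomes)
           for p1 in P for p2H in P for p2T in P)

print(ex_ante, recursive, rect)   # recursive == rect < ex_ante
```

The gap between the ex ante and recursive values is the dynamic inconsistency just described; enlarging the set of priors to its rectangular hull removes the gap without changing the one-step-ahead conditional beliefs.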
One can think of the enlarged set as capturing both the subjectively possible probability laws and backward-induction reasoning by DM. See Epstein and Schneider (2003) for further discussion and axiomatic foundations in a discrete-time framework, and Chen and Epstein (2002) (CE below) for a continuous-time formulation that we outline next. Then we describe how it can be adapted to include learning with partial information. The latter description is given in the simplest context adequate for the applications below. However, it should be clear that it can be adapted more generally.

Let (Ω, G_∞, P) be a probability space, and W = (W_t)_{0≤t<∞} a 1-dimensional Brownian motion which generates the filtration G = {G_t}_{t≥0}, with G_t ր G_∞. (All probability spaces are taken to be complete and all related filtrations are augmented in the usual sense.) The measure P is a reference measure whose role is only to define null events. CE define a set of predictive priors P on (Ω, G_∞) through specification of their densities with respect to P. To do so, they take as an additional primitive a (suitably adapted) set-valued process (Ξ_t). (Technical restrictions are that Ξ_t : Ω → K ⊂ R^d for some compact set K independent of t, that 0 ∈ Ξ_t(ω) dt ⊗ dP a.s., and that each Ξ_t is convex- and compact-valued.) Define the associated set of real-valued processes by

Ξ = { η = (η_t) | η_t(ω) ∈ Ξ_t(ω) dt ⊗ dP a.s. }.

Then each η ∈ Ξ defines a probability measure on G_∞, denoted P^η, that is equivalent to P on each G_t, and is given by

dP^η/dP |_{G_t} = exp{ −(1/2)∫₀ᵗ η_s² ds − ∫₀ᵗ η_s dW_s } for all t.

Accordingly, each η_t(ω) ∈ Ξ_t(ω) can be thought of roughly as defining conditional beliefs about G_{t+dt}, and Ξ_t(ω) is called the set of density generators
at (t, ω). By the Girsanov Theorem,

W^η_t = ∫₀ᵗ η_s ds + W_t    (1)

is a Brownian motion under P^η, which thus can be understood as an alternative hypothesis about the drift of the driving process W (the drift is 0 under P). Finally,

P ≡ { P^η : η ∈ Ξ }.    (2)

(The "pasting" referred to above is accomplished through the fact that Ξ is constructed by taking all selections from the Ξ_t's.)

The set P is used to define a time 0 utility function on a suitable set of random payoffs denominated in utils. In order to model in the sequel the choice of how long to learn (or sample), we consider a set of stopping times τ; that is, each τ is an R₊-valued random variable on Ω that is adapted to {G_t}, that is, {ω : τ(ω) > t} ∈ G_t for every t. For each such τ, utility is defined on the set L(τ) of real-valued random variables given by

L(τ) = { ξ | ξ is G_τ-measurable and sup_{Q∈P} E_Q |ξ| < ∞ }.

The time 0 utility of any ξ ∈ L(τ) is given by

U(ξ) = inf_{Q∈P} E_Q ξ = − sup_{Q∈P} E_Q[−ξ].    (3)

It is natural to consider also conditional utilities at each (t, ω), where

U_t(ξ) = ess inf_{Q∈P} E_Q[ξ | G_t].    (4)

In words, U_t(ξ) is the utility of ξ at time t conditional on the information available then and given the state ω (the dependence of U_t(ξ) on ω is suppressed notationally). The special construction of P delivers the following counterpart of the law of total probability (or law of iterated expectations): for each ξ, and 0 ≤ t < t′,

U_t(ξ) = ess inf_{Q∈P} E_Q[ U_{t′}(ξ) | G_t ].    (5)

This recursivity ultimately delivers the time-consistency of optimal choices.

The components P, W, (Ξ_t) and {G_t} are primitives in CE. Next we specify them in terms of the deeper primitives of a model that includes learning about an unknown parameter θ ∈ Θ ⊂ R. Specifically, begin with a measurable space (Ω, F), a filtration {F_t}, F_t ր F_∞ ⊂ F, and a collection {P^µ : µ ∈ M} of pairwise equivalent probability measures on (Ω, F).
Though θ is an unknown deterministic parameter, for mathematical precision we view θ as a random variable on (Ω, F). Further, for each µ ∈ M, P^µ induces the distribution µ for θ via

µ(A) = P^µ({θ ∈ A}) for all Borel measurable A ⊂ Θ.

Accordingly, M can be viewed as a set of priors on Θ, and its nonsingleton nature indicates ambiguity about θ. There is also a standard Brownian motion B = (B_t), with generated filtration {F^B_t}, such that B is independent of θ under each P^µ. B is the Brownian motion driving the signals process Z = (Z_t) according to

Z_t = ∫₀ᵗ θ ds + ∫₀ᵗ σ dB_s = θt + σB_t,    (6)

where σ is a known positive constant. Because only realizations of Z_t are observable, take {G_t} to be the filtration generated by Z. Assuming knowledge of the signal structure, Bayesian updating of µ ∈ M gives the posterior µ_t at time t. Thus prior-by-prior Bayesian updating leads to the set-valued process (M_t) of posteriors on θ. Proceed to specify the other CE components P, W and (Ξ_t).

Step 1.
Take µ ∈ M. By standard filtering theory (Liptser and Shiryaev 1977, Theorem 8.3), if we replace the unknown parameter θ by the estimate θ̂^µ_t = ∫ θ dµ_t, then we can rewrite (6) in the form

dZ_t = θ̂^µ_t(Z_t) dt + σ( dB_t + ((θ − θ̂^µ_t(Z_t))/σ) dt )    (7)
     = θ̂^µ_t(Z_t) dt + σ dB̃^µ_t,

where the innovation process (B̃^µ_t) is a standard {G_t}-adapted Brownian motion on (Ω, G_∞, P^µ). Thus (B̃^µ_t) takes the same role as (W^η_t) in CE (see (1) above). Rewrite (7) as

dB̃^µ_t = −(1/σ) θ̂^µ_t(Z_t) dt + (1/σ) dZ_t,

which suggests that (Z_t/σ) (resp. (−θ̂^µ_t(Z_t)/σ)) can be chosen as the Brownian motion (W_t) (resp. the drift (η_t)) in (1).

Step 2.
Find a reference probability measure P on (Ω, G_∞) under which (Z_t/σ) is a {G_t}-adapted Brownian motion. Fix µ ∈ M and define P by:

dP/dP^µ |_{G_t} = exp{ −(1/(2σ²))∫₀ᵗ (θ̂^µ_s(Z_s))² ds − (1/σ)∫₀ᵗ θ̂^µ_s(Z_s) dB̃^µ_s }
              = exp{ (1/(2σ²))∫₀ᵗ (θ̂^µ_s(Z_s))² ds − (1/σ²)∫₀ᵗ θ̂^µ_s(Z_s) dZ_s }.

By Girsanov's Theorem, (Z_t/σ) is a {G_t}-adapted Brownian motion under P.

Step 3.
Viewing P as a reference measure, perturb it. For each µ ∈ M, define P̄^µ on (Ω, G_∞) by

dP̄^µ/dP |_{G_t} = exp{ −(1/(2σ²))∫₀ᵗ (θ̂^µ_s(Z_s))² ds + (1/σ²)∫₀ᵗ θ̂^µ_s(Z_s) dZ_s }.

By Girsanov, dB̃^µ_t = −(1/σ) θ̂^µ_t(Z_t) dt + (1/σ) dZ_t is a Brownian motion under P̄^µ.

In general, P̄^µ ≠ P^µ. However, they induce the identical distribution for Z. This is because (B̃^µ_t) is a {G_t}-adapted Brownian motion under both P^µ and P̄^µ. Therefore, by the uniqueness of weak solutions to SDEs, the solution Z of (7) on (Ω, F_∞, P^µ) and the solution Z′ of (7) on (Ω, G_∞, P̄^µ) have identical distributions. (Argue as in Oksendal (2005, Example 8.6.9).) Given that only the distribution of signals matters in our model, there is no reason to distinguish between the two probability measures. Thus we apply CE to the following components: W and P defined in Step 2, and Ξ_t given by

Ξ_t = { −θ̂^µ_t/σ : µ ∈ M, θ̂^µ_t = ∫ θ dµ_t }.    (8)

In summary, taking these specifications for P, W, (Ξ_t) and {G_t} in the CE model yields a set P of predictive priors, and a corresponding utility function, that capture prior ambiguity about the parameter θ (through M), learning as signals are realized (through updating to the set of posteriors M_t), and robust (maxmin) and time-consistent decision-making (because of (5)). We use this model in the optimal stopping problems that follow. The only remaining primitive is M, which is specified to suit the particular setting of interest.

As indicated, the key technical step in our extension of CE is in adopting the weak formulation rather than their strong formulation. For readers who may be unfamiliar with this distinction we suggest Oksendal (2005, Section 5.3) for discussion of weak versus strong solutions of SDEs, and Zhang (2017, Chapter 9).
The latter exposits both the technical advantages of the weak formulation and its economic rationale, notably in models with imperfect information (such as here, where given (6), Z is observed but B is not), or asymmetric information (such as in principal-agent models). In our context, the weak formulation is suggested if one views B not as modeling a physical noise or shock, but rather as a way to specify that the distribution of (Z_t − θt)/σ is standard normal (conditional on θ).

DM must choose an action from the set A = {a₁, a₂, a₃}. Payoffs are uncertain and depend on an unknown parameter θ. Before choosing an action, DM can learn about θ by observing realizations of the signal process Z given by (6), where σ is a known positive constant. There is a constant per-unit-time cost of sampling c > 0. (The filtration {G_t} generated by Z, and other notation, are as in the previous section. Unless specified otherwise, all processes below are taken to be {G_t}-adapted even where not stated explicitly.)

If DM stops learning at t, then her conditional expected payoff (in utils) is X_t; think of X_t as the indirect utility she can attain by choosing optimally from A. DM is forward-looking and has time 0 beliefs about future signals given by the set P ⊂ ∆(Ω, G_∞) described in the previous section. Her choice of when to stop is described by a stopping time (or strategy) τ, which is restricted to be uniformly integrable (sup_{Q∈P} E_Q τ < ∞); the set of all stopping strategies is Γ. As a maxmin agent she chooses an optimal stopping strategy τ* by solving

max_{τ∈Γ} min_{P∈P} E_P (X_τ − cτ).    (9)

It remains to specify M, which determines P as described in the previous section, and X_t. We assume that all priors µ in M have binary support Θ = {θ₁, θ₂}, θ₁ < θ₂. Specifically, let

M = { µ_m = (1−m) δ_{θ₁} + m δ_{θ₂} : m̲ ≤ m ≤ m̄ }.    (10)

Therefore, M can be identified with the probability interval [m̲, m̄] for the larger parameter value θ₂. Let 0 < m̲ < m̄ < 1. At any t,

M_t = { (1−m) δ_{θ₁} + m δ_{θ₂} : m̲_t ≤ m ≤ m̄_t },    (11)

where, by Liptser and Shiryaev (1977, Theorem 9.1),

m̲_t = (m̲/(1−m̲)) ϕ(t, Z_t) / ( 1 + (m̲/(1−m̲)) ϕ(t, Z_t) ),   m̄_t = (m̄/(1−m̄)) ϕ(t, Z_t) / ( 1 + (m̄/(1−m̄)) ϕ(t, Z_t) ),    (12)

and

ϕ(t, z) = exp{ ((θ₂ − θ₁)/σ²) z − (1/(2σ²))(θ₂² − θ₁²) t }.    (13)

Conditional on the parameter value, payoffs are given by u(a_i, θ_j), where each u(a_i, θ_j) is nonnegative. Think of u(·, θ_j) as including the valuation of any risk remaining even if θ_j is known to be true; for example, u(a_i, θ_j) could be the expected utility of the lottery implied by (a_i, θ_j). Payoffs are assumed to satisfy: for i, j = 1, 2, i ≠ j,

u(a_j, θ_j) = u(a_i, θ_i) > u(a_j, θ_i).    (14)

Thus a₁ is better than a₂ given θ₁, and the reverse given θ₂, and the payoff to the better action is the same for both parameter values. The payoff to the third action a₃ does not depend on θ, and can be thought of as a default or outside option. Its payoff is not ambiguous because incomplete confidence about θ is the only source of ambiguity in the model, but choice of a₃ may entail risk. Adopt the notation

u₃ = u(a₃, θ₁) = u(a₃, θ₂).    (15)

It is evident that action a₃ may be irrelevant if its payoff is sufficiently low, for example, if u₃ = 0. To exclude the trivial case where a₃ is always chosen, assume that u₃ < u(a_i, θ_i), i = 1, 2.

At any t, DM has beliefs about θ as represented by the set of posteriors M_t. The Gilboa-Schmeidler utility of a_i is min_{µ∈M_t} ∫ u(a_i, θ) dµ. Therefore, if DM chooses an optimal action at time t, then her payoff is

X_t = max{ min_{µ∈M_t} ∫ u(a₁, θ) dµ,  min_{µ∈M_t} ∫ u(a₂, θ) dµ,  u₃ }.    (16)

The preceding completes specification of the optimal stopping problem (9).
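To make the updating and stopped-payoff formulas concrete, here is a minimal Python sketch of (12), (13) and (16). All parameter values are hypothetical (θ₁ < θ₂, payoffs satisfying (14), u₃ the outside option); none of the numbers come from the paper:

```python
import math

# Hypothetical illustrative parameters.
theta1, theta2, sigma = -0.25, 0.25, 1.0
u = {('a1', theta1): 0.75, ('a1', theta2): 0.25,   # a1 is the better action under theta1
     ('a2', theta1): 0.25, ('a2', theta2): 0.75}   # a2 is the better action under theta2
u3 = 0.5                                           # unambiguous default action a3
m_lo, m_hi = 0.4, 0.6                              # prior probability interval for theta2

def phi(t, z):
    # Likelihood ratio (13) of theta2 versus theta1 given Z_t = z.
    return math.exp((theta2 - theta1)*z/sigma**2
                    - (theta2**2 - theta1**2)*t/(2*sigma**2))

def posterior_interval(t, z):
    # Prior-by-prior Bayesian updating (12): update each endpoint prior's odds.
    def update(m):
        odds = m/(1 - m) * phi(t, z)
        return odds/(1 + odds)
    return update(m_lo), update(m_hi)

def X(t, z):
    # Stopped payoff (16): maxmin (Gilboa-Schmeidler) value of the best action.
    lo, hi = posterior_interval(t, z)
    # Expected payoffs are linear in m, so each minimum is at an interval endpoint.
    worst = lambda a: min((1 - m)*u[(a, theta1)] + m*u[(a, theta2)] for m in (lo, hi))
    return max(worst('a1'), worst('a2'), u3)
```

With an uninformative sample (Z_t = 0; here ϕ(t, 0) = 1 because θ₁² = θ₂²), the posterior interval equals the prior interval and the unambiguous default is optimal, while a large |Z_t| shifts the interval enough that one of the ambiguous actions takes over.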
Its solution is described below under two alternative additional assumptions:

Payoff symmetry: u(a₁, θ₂) = u(a₂, θ₁).

No risky option: u₃ ≤ u(a_i, θ_j), i, j = 1, 2, so that a₃ is (weakly) inferior to each of a₁ and a₂ conditional on either parameter value. Hence, it would never be chosen uniquely and can be ignored, leaving only two actions.

These assumptions are satisfied respectively by the two special models upon which we focus: Ellsberg's urns (payoff symmetry) and hypothesis testing (no risky option). We focus on these because they extend classic models in the literature and because they provide distinct insights into the connection between ambiguity and optimal learning.

There are two urns, each containing balls that are either red or blue: a risky urn in which the proportion of red balls is 1/2 and an ambiguous urn in which the color composition is unknown. Denote by θ + 1/2 the unknown proportion of red balls. Thus θ denotes the bias towards red: θ > 0 indicates a bias towards red, θ < 0 a bias towards blue, and θ = 0 indicates an equal number as in the risky urn. DM can choose between betting on the draw from the risky or ambiguous urn and also on drawing red or blue. In the absence of learning, the intuitive behavior highlighted by Ellsberg is to bet on the draw from the risky urn no matter the color. Here we consider betting preference when an ambiguity-averse decision-maker can defer the choice between bets until after learning optimally about θ.

To do so, we apply the model described above with particular specifications for its key primitives A, Θ, M and u. For A, let a₃ denote a bet on the risky urn and let a₂ (a₁) denote the bet on drawing red (blue) from the ambiguous urn. (Note that there is no need to differentiate between bets on red and blue for the risky urn.) Take Θ = {θ₁, θ₂}, where θ₁ + θ₂ = 0, or equivalently, for some 0 < α < 1/2,

θ₁ = −α,  θ₂ = α.    (17)

Thus only two possible biases, of equal size, are thought possible (the proportion of red is either 1/2 − α or 1/2 + α).
However, there is ambiguity about which direction of the bias is more likely. This ambiguity is modeled by M having the form in (10), where we assume in addition that the probability interval for α (the bias towards red) is such that m̲ + m̄ = 1, or equivalently, for some 0 < ǫ < 1,

m̲ = (1 − ǫ)/2,  m̄ = (1 + ǫ)/2.    (18)

Thus the model has two key parameters, α and ǫ. We interpret ǫ as modeling ambiguity (aversion): the probability interval [(1−ǫ)/2, (1+ǫ)/2] for the bias towards red is larger if ǫ increases. At the extreme when ǫ = 0, then M is the singleton according to which the two biases are equally likely, and DM is a Bayesian who faces uncertainty with variance α² about the true bias, but no ambiguity. We interpret α as measuring the degree of this prior uncertainty, or prior variance (α = 0 implies certainty that the composition of the ambiguous urn is identical to that of the risky urn).

Finally, specify payoffs u. All bets have the same winning and losing prizes, denominated in utils, which can be normalized to 1 and 0 respectively. Given the composition of the ambiguous urn, only risk is involved in every bet, and an expected utility calculation yields

u(a₂, α) = u(a₁, −α) = 1/2 + α,  u(a₂, −α) = u(a₁, α) = 1/2 − α,  and  u₃ = 1/2.    (19)

The assumptions of the preceding section are then satisfied, and (16) takes the form X_t = X(Z_t):

X(Z_t) = (1/2 + α) − 2α(1+ǫ) / [ (1+ǫ) + (1−ǫ)ϕ(Z_t) ]    if Z_t > (σ²/(2α)) log((1+ǫ)/(1−ǫ)),
X(Z_t) = (1/2 + α) − 2α(1+ǫ)ϕ(Z_t) / [ (1+ǫ)ϕ(Z_t) + (1−ǫ) ]    if Z_t < −(σ²/(2α)) log((1+ǫ)/(1−ǫ)),
X(Z_t) = 1/2    otherwise,    (20)

where ϕ(z) = exp(2αz/σ²). Thus if Z_t is large positive (negative), then a bet on drawing red (blue) from the ambiguous urn is optimal. For intermediate values, there is not enough evidence for a bias in either direction to compensate for the ambiguity, and betting on the risky urn is optimal. This is true in particular ex ante where Z₀ = 0, consistent with the intuitive ambiguity-averse behavior in Ellsberg's 2-urn experiment without learning.

We give an explicit solution to the optimal stopping problem (9) satisfying (17)-(19).
To do so, let l ( r ) = 2 log( r − r ) − r + 11 − r , r ∈ (0 , b r by l ( b r ) = 2 α cσ . (22)13 r is uniquely defined thereby and < b r <
1, because l ( · ) is strictly increas-ing, l (0) = −∞ , l ( ) = 0, and l (1) = ∞ . Theorem 3.1 (i) τ ∗ = 0 if and only if ǫ ≥ b r , in which case X τ ∗ = X = .(ii) Let ǫ < b r . Then the optimal stopping time satisfies τ ∗ > and isgiven by τ ∗ = min { t ≥ | Z t |≥ z } , where z = σ α (cid:20) log 1 + ǫ − ǫ + log r − r (cid:21) > , (23) and r , b r < r < , is the unique solution to the equation l ( r ) + l ( 1 + ǫ α cσ . (24) Moreover, on stopping either the bet on red is chosen (if Z τ ∗ ≥ z ) or the beton blue is chosen (if Z τ ∗ ≤ − z ); the bet on the risky urn is never optimal at τ ∗ > . Finally, if ǫ < ǫ ′ < b r − , and if τ ∗′ is the corresponding optimalstopping time, then τ ∗′ ≥ τ ∗ . The two cases are defined by the relative magnitudes of ǫ , parametrizingambiguity, and b r , which is an increasing function of α / (cid:0) cσ (cid:1) ; in particular,through α , it depends positively on the payoff to knowing the direction ofthe true bias. Thus (i) considers the case where ambiguity is large realtiveto payoffs (and taking also sampling cost and signal variance into account).Then no learning is optimal and the bet on the risky urn is chosen immedi-ately. In contrast, some learning is necessarily optimal given small ambiguity(case (ii)), including in the limiting Bayesian model with ǫ = 0. Thus it isoptimal to reject learning if and only if ambiguity, as measured by ǫ , is suit-ably large . In case (ii), it is optimal to sample as long as the signal Z t lies inthe continuation interval ( − z, z ). Two features of this learning region standout. First, when Z t hits either endpoint, learning stops and DM bets onthe ambiguous urn. Thus the risky urn is chosen (if and) only if it is notoptimal to learn . The second noteworthy feature is that sampling increaseswith greater ambiguity as measured by ǫ , though when ǫ reaches 2 b r −
1, then, by (i), it is optimal to reject any learning.

There is simple intuition for the preceding. First, consider the effect of ambiguity (large ǫ) on the incentive to learn. DM's prior beliefs admit only α and −α as the two possible values for the true bias. She will incur the cost of learning if she believes that she is likely to learn quickly which of these is true. She understands that she will come to accept α (or −α) as being true given realization of sufficiently large positive (negative) values for Z_t. A difficulty is that she is not sure which probability law in her set P describes the signal process. As a conservative decision-maker, she bases her decisions on the worst-case scenario P∗ in her set. Because she is trying to learn, the worst case minimizes the probability of extreme, hence revealing, signal realizations, which, informally speaking, occurs if P∗({dZ_t > 0} | Z_t > 0) and P∗({dZ_t < 0} | Z_t < 0) are as small as possible. That is, if Z_t > 0, then the distribution of the increment dZ_t is computed using the posterior associated with that prior in M which assigns the largest probability (1+ǫ)/2 to the negative bias −α, while if Z_t < 0, then the distribution of the increment is computed using the posterior associated with the prior assigning the largest probability (1+ǫ)/2 to the positive bias α. It follows that, from the perspective of the worst-case scenario, the signal structure is less informative the greater is ǫ. Accordingly, conditional on some learning being optimal, it must be with the expectation of a long sampling period that increases in length with ǫ. A second effect of an increase in ǫ is that it reduces the ex ante utility of betting on the ambiguous urn and hence implies that signals in an increasingly large interval would not change betting preference. Consequently, a small sample is unlikely to be of value – only long samples are useful. Together, these two effects suggest existence of a cutoff value for ǫ beyond which no amount of learning is sufficiently attractive to justify its cost. At the cutoff, here 2r̂ − 1, DM is just indifferent between stopping and learning for another instant.

There remains the following question for smaller values of ǫ: why is it never optimal to try learning for a while and then, for some sample realizations, to stop and bet on the risky urn? The intuition, adapted from Fudenberg, Strack and Strzalecki (2018), is that this feature is a consequence of the specification M for the set of priors. To see why, suppose that Z_t is small for some positive t. A possible interpretation, particularly for large t, is that the true bias is small and thus that there is little to be gained by continuing to sample – DM might as well stop and bet on the risky urn. But this reasoning is excluded when, as in our specification, DM is certain that the bias is ±α. Then signals sufficiently near 0 must be noise and the situation is essentially the same as it was at the start. Hence, if stopping to bet on the risky urn were optimal at t, it would have been optimal also at time 0. This intuition is suggestive of the likely consequences of generalizing the specification of M. Suppose, for example, that M is such that all its priors share a common finite support. We conjecture that then the predicted incompatibility of learning and betting on the risky urn would be overturned if the zero bias point is in the common support.

Finally, using the closed-form solution in the theorem, we can give more concrete expression to the effect of ambiguity on optimal learning. Restrict attention to values of ǫ in [0, 2r̂ − 1), and denote by P^θ the probability distribution of (Z_t) if θ is the true bias. Then, by well-known results regarding hitting times of Brownian motion with drift (Borodin and Salminen 2015), the mean sample length according to P^θ is

E^θ[τ∗] = (z̄/σ)² [tanh(θz̄/σ²)/(θz̄/σ²)] if θ ≠ 0, and (z̄/σ)² if θ = 0, (25)

which is increasing in ǫ.
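The mean sample length (25) can be cross-checked against two standard facts about Brownian motion with drift: the optional stopping theorem applied to the exponential martingale e^{−2θZ_t/σ²}, and Wald's identity E^θ[Z_{τ∗}] = θ E^θ[τ∗]. The sketch below does this numerically; the parameter values are arbitrary illustrations (not taken from the paper), and the formulas are the reconstructions displayed in the text.

```python
import math

def p_correct(theta, z_bar, sigma):
    # Probability that Z exits the symmetric band [-z_bar, z_bar] on the side
    # of the true drift theta (the "correct bet" probability in the text).
    return 1.0 / (1.0 + math.exp(-2.0 * abs(theta) * z_bar / sigma**2))

def mean_sample_length(theta, z_bar, sigma):
    # Formula (25): E^theta[tau*] = (z_bar/sigma)^2 * tanh(u)/u with
    # u = theta*z_bar/sigma^2, and the limit (z_bar/sigma)^2 at theta = 0.
    if theta == 0:
        return (z_bar / sigma) ** 2
    u = theta * z_bar / sigma**2
    return (z_bar / sigma) ** 2 * math.tanh(u) / u

theta, z_bar, sigma = 0.3, 1.7, 0.9   # illustrative values only
p = p_correct(theta, z_bar, sigma)
u = theta * z_bar / sigma**2

# Optional stopping: E[exp(-2*theta*Z_tau/sigma^2)] = 1 at the exit time.
assert math.isclose(p * math.exp(-2*u) + (1 - p) * math.exp(2*u), 1.0)

# Wald's identity: z_bar*(2p - 1) = theta * E[tau*] recovers (25).
assert math.isclose(z_bar * (2*p - 1) / theta,
                    mean_sample_length(theta, z_bar, sigma))

# Continuity at theta = 0: tanh(u)/u -> 1.
assert math.isclose(mean_sample_length(1e-9, z_bar, sigma),
                    (z_bar / sigma) ** 2, rel_tol=1e-6)
```

Both checks are exact identities, so the assertions hold for any choice of θ ≠ 0, z̄ > 0 and σ > 0.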
Note also that {θZ_{τ∗} > 0} is the event that the bet chosen on stopping matches the true bias. The probability, if θ ≠ 0 is the true bias, of choosing the "correct" bet on stopping is given by

P^θ({θZ_{τ∗} > 0}) = 1/(1 + exp(−2|θ|z̄/σ²)), if θ ≠ 0,

which increases with ǫ. (To prove this equality, apply the optional stopping theorem to the P^θ-martingale e^{−2θZ_t/σ²}.)

The proof of Theorem 3.1 yields a closed-form expression for the value function associated with the optimal stopping problem. In particular, the value at time 0 satisfies (from (44) and (50)),

v₀ − r = 0 if ǫ ≥ 2r̂ − 1, and v₀ − r = (cσ²/(4α²))[1/(r̄_ℓ(1 − r̄_ℓ)) − 4/((1 + ǫ)(1 − ǫ))] if ǫ < 2r̂ − 1. (26)

Since the payoff r is the best available without learning, v₀ − r is the value of the learning option. It is positive for ǫ < 2r̂ − 1 and decreases as ǫ increases to the switch point. (Note that ǫ = 2r̂ − 1 implies that the bracketed term vanishes, and hence that v₀ is continuous at ǫ = 2r̂ − 1.) To illustrate, consider parameter values (c, σ, α) for which learning is rejected if ǫ = .05, while for ǫ = .04, τ∗ > 0 and Eτ∗ = .61 under P^{θ=0}. Neither of the values for ǫ is extreme: in the classic Ellsberg setting (with no learning), they imply probability equivalents for the bet on red equal to .475 and .48 for ǫ = .05 and ǫ = .04 respectively.

3.3 A robust sequential hypothesis test
DM samples the signal process Z with the objective of then choosing between the two statistical hypotheses

H₁: θ = θ₁ = 0 and H₂: θ = θ₂ = β,

where β > 0. The novelty relative to Arrow, Blackwell and Girschik (1949) and Peskir and Shiryaev (2006) is that there is prior ambiguity about the value of θ and a robust decision procedure is sought.

The following specialization of the general model is adopted. Let Θ = {0, β}. The actions a₁ and a₂ are accept H₁ and accept H₂, respectively. A third action is absent because there is no "outside option" - one of the hypotheses must be chosen. (Formally, one could include a₀ and specify its payoff below to be zero, in which case it would never be chosen.) The set of priors M is as given in (10), corresponding to the probability interval [m, m̄] for θ = β. Finally, payoffs are given by

u(a₁, 0) = u(a₂, β) = a + b, u(a₁, β) = b, u(a₂, 0) = a,

where a, b > 0. (Payoffs in this context are usually specified in terms of a loss function that is to be minimized. The loss function L satisfying L(a₁, 0) = L(a₂, β) = 0, L(a₁, β) = a, and L(a₂, 0) = b, gives an equivalent reformulation.)

There are two differences in specification from the Ellsberg context. First, there is no counterpart of the risky urn when choosing between hypotheses. Second, while symmetry between colors is natural in the Ellsberg context, symmetry between hypotheses is not; thus, b need not equal a and the probability interval [m, m̄] need not be symmetric about 1/2.

The optimal stopping problem (9) admits a closed-form solution. For perspective, consider first the special Bayesian case (M = {μ}, hence M_t = {μ_t}, μ_t(β) = m_t). Denote by r̃_ℓB < r̃_RB the solutions to (33), which in this context simplify to

l(r̃_RB) − l(r̃_ℓB) = (a + b)/ĉ,
1/(r̃_RB(1 − r̃_RB)) − 1/(r̃_ℓB(1 − r̃_ℓB)) = (b − a)/ĉ. (27)

Then we have the following classical result.

Theorem 3.2 (Peskir and Shiryaev 2006)
In the Bayesian case, for any prior probability m₀ it is optimal to continue at t if and only if

r̃_ℓB < m_t < r̃_RB. (28)

Otherwise, it is optimal to accept H₂ or H₁ according as m_t ≥ r̃_RB or m_t ≤ r̃_ℓB respectively.

In the model with ambiguity, the cut-off values are r̃_ℓ and r̃_R, r̃_ℓ < r̃_R, that solve the appropriate version of (33), and we have the following generalization of the classical result.

Theorem 3.3 In the model with ambiguity, it is optimal to stop and accept H₂ or H₁ according as m_t ≥ r̃_R or m̄_t ≤ r̃_ℓ respectively. Otherwise, it is optimal to continue.

In addition, if a = b, then

r̃_ℓB < r̃_ℓ and r̃_R < r̃_RB. (29)

Under the assumption of payoff symmetry (a = b), the theorem has noteworthy implications for the relation between the optimal stopping strategies for the Bayesian and the robustness-seeking DM. (We conjecture that (29) is valid even if a ≠ b, but a proof has escaped us.) Refer to a Bayesian agent whose prior lies in [m, m̄] as a compatible Bayesian. The theorem implies:

1. If every compatible Bayesian stops and chooses a_i, then it is optimal also for DM to stop and choose a_i, i = 1, 2.

2. DM may stop even though some compatible Bayesian would continue sampling; in this sense, "sensitivity analysis" overstates the robustness value of sampling.

The intuition is clear. Prior ambiguity leads to the signal structure being perceived as less likely to be informative (seen from the perspective of the worst-case measure P∗ - see the outline at the start of the proof of Theorem 4.2), even though the signal structure itself is not ambiguous. In contrast, there is no counterpart given multiple Bayesian agents - each is confident in beliefs about θ and is certain that signal increments are conditionally i.i.d. Only DM internalizes uncertainty about the probability law and discounts the benefits of learning accordingly.

Remark 3.4
As is made clear in Theorem 4.2, stopping conditions can be stated equivalently in terms of either the signal process (as in the Ellsberg model), or posteriors (as here). In the text, we have adopted the formulations that seem more natural for each particular setting. For example, the use of posteriors above facilitates comparison with the classical Bayesian result.

Remark 3.5
Time-consistency in the present context is closely related to the Stopping Rule Principle – that the stopping rule should have no effect on what is inferred from observed data and hence on the decision taken after stopping (Berger 1985). It is well-known that: (i) conventional frequentist methods, based on ex ante fixed sample size significance levels, violate this Principle and permit the analyst to sample to a foregone conclusion when data-dependent stopping rules are permitted; and (ii) Bayesian posterior odds analysis satisfies the Principle. Kadane, Schervish and Seidenfeld (1996) point to the law of iterated expectations as responsible for excluding foregone conclusions (if the prior is countably additive). Equation (5) is a nonlinear counterpart that we suspect plays a similar role in our model (though details are beyond the scope of this paper).
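Returning to the payoff specification of the testing problem above: the utility and loss formulations differ only by the affine shift u(a, θ) = (a + b) − L(a, θ), so the two formulations rank actions identically at every posterior. A minimal check (the numerical values of a and b are arbitrary illustrations; the paper only requires a, b > 0):

```python
a_pay, b_pay = 2.0, 3.0   # illustrative values for the payoff parameters a, b

# keys: (action, state); 'a1' accepts H1 (theta = 0), 'a2' accepts H2 (theta = beta)
u = {('a1', 0): a_pay + b_pay, ('a1', 'beta'): b_pay,
     ('a2', 'beta'): a_pay + b_pay, ('a2', 0): a_pay}
L = {('a1', 0): 0.0, ('a1', 'beta'): a_pay,
     ('a2', 'beta'): 0.0, ('a2', 0): b_pay}

# the affine-shift equivalence u = (a + b) - L, state by state
for key in u:
    assert u[key] == (a_pay + b_pay) - L[key]

# hence, at any posterior m on theta = beta, the two criteria agree
for m in (0.1, 0.5, 0.9):
    eu = {act: (1 - m) * u[(act, 0)] + m * u[(act, 'beta')] for act in ('a1', 'a2')}
    el = {act: (1 - m) * L[(act, 0)] + m * L[(act, 'beta')] for act in ('a1', 'a2')}
    assert max(eu, key=eu.get) == min(el, key=el.get)
```

Because the shift a + b is independent of both the action and the state, the same equivalence holds for worst-case (maxmin) evaluations over a set of posteriors.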
In order to condense notation, we write u_ij in place of u(a_i, θ_j), i, j = 1, 2, and u₀ for the payoff to the default action a₀. §4 considers either payoff symmetry (u₁₁ = u₂₂) or no risky option (u₀ ≤ min{u₁₂, u₂₁}). Payoff symmetry is satisfied in Theorem 3.1, but the latter assumes more, specifically ex ante indifference between a₁ and a₂ (m + m̄ = 1) and u₀ = (u₁₁ + u₂₁)/2. Thus it is extended below by Theorem 4.2(a). The assumption of no risky option is the crucial element in the hypothesis testing example, and the corresponding optimal stopping problem is isomorphic to that in part (b) of Theorem 4.2.

Both m_t and m̄_t defined in (12) are increasing functions of φ(t, z_t). It follows that there exists a unique pair of probabilities π and π̄ and a unique (deterministic) signal realization trajectory (z̃_t) satisfying, for every t,

π = m_t(z̃_t), π̄ = m̄_t(z̃_t), and π̄u₁₂ + (1 − π̄)u₁₁ = πu₂₂ + (1 − π)u₂₁.

For example, z̃ = 0, π = m and π̄ = m̄ if and only if a₁ and a₂ are indifferent ex ante. More generally, a₁ and a₂ are indifferent conditional on the signal z̃_t at t, and a₁ (a₂) is preferred at t if Z_t < (>) z̃_t.

Normalize the cost of learning to ĉ,

ĉ = 2cσ²/(θ₂ − θ₁)².

Optimal stopping strategies will be described in terms of several critical values, that are, in turn, defined using the functions l and l̃: For all r in (0, 1),

l(r) = 2 log(r/(1 − r)) − 1/r + 1/(1 − r),
l̃(r) = log(r/(1 − r)) + r/(1 − r).

Let (r_R, r̄_R), (r_ℓ, r̄_ℓ), (r̄_R, r̄_ℓ) and (r̃_R, r̃_ℓ) solve the following equations respectively:

l(r_R) − l(r̄_R) = (u₂₂ − u₂₁)/ĉ, l̃(r_R) − l̃(r̄_R) = (u₀ − u₂₁)/ĉ, (30)

l(r_ℓ) − l(r̄_ℓ) = −(u₁₁ − u₁₂)/ĉ, l̃(r̄_ℓ) − l̃(r_ℓ) = (u₁₁ − u₀)/ĉ, (31)

l(r̄_R) − l(π) = (u₂₂ − u₂₁)/ĉ, l(r̄_ℓ) − l(π̄) = −(u₁₁ − u₁₂)/ĉ, (32)

l(r̃_R) − l(π) = l(r̃_ℓ) − l(π̄) + (u₁₁ − u₁₂ + u₂₂ − u₂₁)/ĉ,
l̃(r̃_R) − l̃(π) − π(l(r̃_R) − l(π)) = l̃(r̃_ℓ) − l̃(π̄) − π̄(l(r̃_ℓ) − l(π̄)).
(33)

(The latter reduces to (32) if payoff symmetry is satisfied.) Define

u∗∗ = (ĉ/2)[1/(r̄_ℓ(1 − r̄_ℓ)) − 1/(π(1 − π))] + (u₁₁ + u₂₁)/2. (34)

Besides the existence and uniqueness assertions, the next lemma proves a number of properties that are important for the optimal stopping theorem to follow.

Lemma 4.1
There exist unique solutions to (32) and (33), and the solutions to the latter satisfy

r̃_ℓ < π̄, r̃_R > π. (35)

If u₀ ≥ u∗∗, then there exist unique solutions also to (30) and (31), and the solutions satisfy

r_ℓ < r̄_ℓ, r̄_R < r_R, π < r_R, r_ℓ < π̄.

If payoff symmetry is also satisfied, then:

π + π̄ = 1 = r̄_ℓ + r̄_R, and (36)

r̄_ℓ ≤ π̄ ⟺ r̄_R ≥ π ⟺ u₀ ≥ u∗∗. (37)

Define

f(t, r) = ((θ₁ + θ₂)/2)t + (σ²/(θ₂ − θ₁)) log(((1 − m)/m)(r/(1 − r))),
f̄(t, r) = ((θ₁ + θ₂)/2)t + (σ²/(θ₂ − θ₁)) log(((1 − m̄)/m̄)(r/(1 − r))).

Then m_t(f(t, r)) = r = m̄_t(f̄(t, r)), and, for any r and r′,

f(t, r) ≤ z̃_t ⟺ r ≤ π, (38)
f̄(t, r′) ≥ z̃_t ⟺ r′ ≥ π̄.

Finally, define three stopping times:

τ₁ ≡ min{t ≥ 0 : Z_t ≤ f̄(t, r_ℓ)} = min{t ≥ 0 : m̄_t ≤ r_ℓ},
τ₂ ≡ min{t ≥ 0 : Z_t ≥ f(t, r_R)} = min{t ≥ 0 : m_t ≥ r_R}, and
τ₀ ≡ min{t ≥ 0 : f̄(t, r̄_ℓ) ≤ Z_t ≤ f(t, r̄_R)} = min{t ≥ 0 : m̄_t ≥ r̄_ℓ and m_t ≤ r̄_R}.

Theorem 4.2 (a) Assume payoff symmetry (u₁₁ = u₂₂).
(a.i) If r̄_ℓ ≤ π̄, then the optimal stopping time τ∗ is given by

τ∗ = min{τ_i : i = 0, 1, 2}.

Moreover, if τ∗ = τ_i, then a_i is optimal on stopping. In particular, if there is ex ante indifference between a₁ and a₂ (π = m and π̄ = m̄), then τ∗ = 0 and a₀ is chosen.
(a.ii) If r̄_ℓ > π̄, then

τ∗ = min{t ≥ 0 : Z_t ≤ f̄(t, r̄_ℓ) or Z_t ≥ f(t, r̄_R)} = min{t ≥ 0 : m̄_t ≤ r̄_ℓ or m_t ≥ r̄_R}.

Moreover, a₁ is optimal on stopping if Z_{τ∗} ≤ f̄(τ∗, r̄_ℓ) (equivalently if m̄_{τ∗} ≤ r̄_ℓ), a₂ is optimal if Z_{τ∗} ≥ f(τ∗, r̄_R) (equivalently if m_{τ∗} ≥ r̄_R), and a₀ is never optimal.

(b) Assume u₀ ≤ min{u₁₂, u₂₁}. Then

τ∗ = min{t ≥ 0 : Z_t ≤ f̄(t, r̃_ℓ) or Z_t ≥ f(t, r̃_R)} = min{t ≥ 0 : m̄_t ≤ r̃_ℓ or m_t ≥ r̃_R}.

Moreover, a₁ is optimal on stopping if Z_{τ∗} ≤ f̄(τ∗, r̃_ℓ) (equivalently if m̄_{τ∗} ≤ r̃_ℓ), a₂ is optimal if Z_{τ∗} ≥ f(τ∗, r̃_R) (equivalently if m_{τ∗} ≥ r̃_R), and a₀ is never optimal.

In (a), the distinction between the two subcases depends on the relative magnitudes of r̄_ℓ and π̄.
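Since l is continuous and strictly increasing on (0, 1) with range (−∞, ∞), each of the threshold equations above pins down its critical value uniquely given the other, and the values can be computed by simple bisection. A numerical sketch, assuming the reconstructed formula l(r) = 2 log(r/(1 − r)) − 1/r + 1/(1 − r), and with π and the payoff gap chosen arbitrarily for illustration:

```python
import math

def l(r):
    # reconstructed: l(r) = 2*log(r/(1-r)) - 1/r + 1/(1-r),
    # strictly increasing on (0, 1) with l(0+) = -inf, l(1-) = +inf
    return 2 * math.log(r / (1 - r)) - 1 / r + 1 / (1 - r)

def l_inv(target, lo=1e-9, hi=1 - 1e-9):
    # bisection; valid because l is continuous and strictly increasing
    # (the surjectivity used in the existence part of Lemma 4.1)
    for _ in range(200):
        mid = (lo + hi) / 2
        if l(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# monotonicity spot-check on a grid
grid = [i / 100 for i in range(1, 100)]
assert all(l(x) < l(y) for x, y in zip(grid, grid[1:]))

# a (32)-type equation: given pi and a scaled payoff gap Delta, the unique
# solution r_R in (pi, 1) of l(r_R) - l(pi) = Delta is l_inv(l(pi) + Delta)
pi, Delta = 0.4, 3.0   # illustrative values only
r_R = l_inv(l(pi) + Delta)
assert r_R > pi
assert math.isclose(l(r_R) - l(pi), Delta, rel_tol=1e-6)
```

The same inversion, applied equation by equation, yields all of the critical values used in Theorem 4.2.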
From (31) it follows that r̄_ℓ falls as u₀ increases, while π̄ does not depend on u₀. Therefore, (a.i) applies if the payoff u₀ to the unambiguous default is sufficiently large. The other factor leading to (a.i) is large π̄, equivalently (by (36)) small π, which is supported by m̄ large and m small. Thus, (a.i) is supported also by large prior ambiguity.

In (a.i), τ∗ = 0 if either m̄₀ ≤ r_ℓ (prior beliefs are strongly biased towards θ₁ and hence a₁ is chosen immediately), or m₀ ≥ r_R (prior beliefs are strongly biased towards θ₂ and hence a₂ is chosen), or m̄₀ ≥ r̄_ℓ and m₀ ≤ r̄_R (the worst-case probabilities of both θ₁ and θ₂ are both sufficiently low that neither a₁ nor a₂ is attractive enough to justify the cost of sampling and hence a₀ is chosen). That leaves continuation being optimal at time 0 if and only if prior beliefs are "intermediate" in the sense that

either: [r_ℓ < m̄₀ < r̄_ℓ] and m₀ < r_R,
or: [r̄_R < m₀ < r_R] and m̄₀ > r_ℓ.

This continuation region could be empty. Since learning is only about the payoffs to a₁ and a₂, the situation at time 0 that is least favorable to learning is where there is ex ante indifference between a₁ and a₂ – then a long and hence costly sample would likely be needed to modify the ex ante ranking of actions. In this case, therefore, it is optimal to reject learning and choose a₀, as in Theorem 3.1. However, if, for example, a₁ is strictly preferred initially, then an incentive to learn is that a relatively short interval of sampling may be enough to settle the choice of action. In addition, if m₀ is sufficiently large, say near 1, then near certainty that θ = θ₂ can lead to rejection of learning and the immediate choice of a₂, rather than of a₀ as in the Ellsberg context.

In (a.ii), τ∗ = 0 iff [m₀, m̄₀] is disjoint from (r̄_ℓ, r̄_R). Notably, the default action is not chosen regardless of when sampling stops. Its payoff u₀ is too low (from (37), u₀ < u∗∗) compared to the expected payoff of choosing a₁ or a₂, possibly after some learning. Moreover, even given some learning, it is not optimal to choose a₀ regardless of the realized sample, as explained in discussion of Theorem 3.1. Under ex ante indifference, Lemma 4.1 implies that τ∗ > 0; given the symmetric roles of a₁ and a₂, a₀ is chosen if and only if there is no learning, thus generalizing the result in the Ellsberg model. (The latter also assumes u₀ = (u₁₁ + u₂₁)/2, which we see here is not needed for the preceding conclusion.)

Finally, consider (b), where the payoff to the unambiguous action is so low that it would never be chosen, regardless of prior beliefs and even in the absence of the option to learn. The optimal strategy is similar to that in (a.ii) in form and interpretation - only the critical values may differ to reflect the different assumptions about payoffs. Another comment about (b) is that when m = m̄, then π = π̄ and the equations (33) defining the critical values r̃_R and r̃_ℓ become

l(r̃_R) − l(r̃_ℓ) = (u₁₁ − u₁₂ + u₂₂ − u₂₁)/ĉ, l̃(r̃_R) − l̃(r̃_ℓ) = (u₁₁ − u₂₁)/ĉ,

which are equations (21.1.14) and (21.1.15) in Peskir and Shiryaev (2006).

Proof of the theorem is provided in the e-companion. Here we comment briefly on the proof strategy. The strategy is to: (i) guess the P∗ in P that is the worst-case scenario; (ii) solve the classical optimal stopping problem given the single prior P∗; (iii) show that the value function derived in (ii) is also the value function for our problem (9); and (iv) use the value function to derive τ∗.

The intuition for the conjectured P∗ was given in §3: P∗ should make P∗({dZ_t > 0} | Z_t > z̃_t) and P∗({dZ_t < 0} | Z_t < z̃_t) as small as possible, by using m_t when Z_t > z̃_t and m̄_t when Z_t < z̃_t. (See (41) for the precise definition of P∗.)
The search for the value function v begins with the HJB equation, which yields its functional form up to some constants to be determined by smooth contact conditions between v and the payoff function X (see Peskir and Shiryaev (2006) for this free-boundary approach to analysing optimal stopping problems). A new ingredient relative to existing models stems from the nature of P∗, specifically from the fact that the relevant posterior probability at t switches between m̄_t and m_t as described, implying that the form of the value function differs between the regions Z_t > z̃_t and Z_t < z̃_t. Thus, in addition to ensuring a smooth contact at stopping points, one must also be concerned with the smooth connection at z̃_t.

We elaborate on the latter point in order to highlight the technical novelty that arises from ambiguity. For concreteness consider (a.ii), where a₀ is never chosen. Let y denote a posterior probability, computed using m or m̄ depending on the sub-domain, and let V^R(y): [π, 1] → [0, +∞) and V^ℓ(y): [0, π̄] → [0, +∞) denote corresponding candidates for the value in the indicated regions. Then the variational inequality and smooth contact lead to the following free-boundary differential equation, in which r̄_R ∈ (π, 1) and r̄_ℓ ∈ [0, π̄) are also unknowns to be determined:

V^R_yy(y) = ĉ/(y²(1 − y)²), y ∈ (π, r̄_R),
V^R(r̄_R) = (u₂₂ − u₂₁)r̄_R + u₂₁,
V^R_y(r̄_R) = u₂₂ − u₂₁,
V^ℓ_yy(y) = ĉ/(y²(1 − y)²), y ∈ (r̄_ℓ, π̄),
V^ℓ(r̄_ℓ) = −(u₁₁ − u₁₂)r̄_ℓ + u₁₁,
V^ℓ_y(r̄_ℓ) = −(u₁₁ − u₁₂), (39)

and the (new) smooth contact conditions due to ambiguity (π < π̄):

V^R(π) = V^ℓ(π̄), V^R_y(π) = V^ℓ_y(π̄). (40)

In (a.ii), payoff symmetry leads to the simplification V^R_y(π) = V^ℓ_y(π̄) = 0, which leads to (32) becoming two separated equations. However, in (b), the connection is not trivial.
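The smooth-fit construction works because any function of the form V(y) = ĉ l̂(y) + C₁y + C₂, with l̂(y) = (2y − 1)log(y/(1 − y)), solves the continuation-region ODE V″(y) = ĉ/(y²(1 − y)²) appearing in (39) (as reconstructed); the boundary and connection conditions only pin down the constants. A finite-difference check with arbitrary illustrative constants:

```python
import math

c_hat, C1, C2 = 0.7, -1.3, 2.1   # arbitrary illustrative constants

def l_hat(y):
    # second antiderivative used in the smooth-fit construction (cf. (43))
    return (2 * y - 1) * math.log(y / (1 - y))

def V(y):
    # candidate value on a continuation sub-domain: c_hat*l_hat + linear part
    return c_hat * l_hat(y) + C1 * y + C2

# central second difference approximates V''; the free-boundary ODE requires
# V''(y) = c_hat / (y^2 (1-y)^2) regardless of C1, C2
h = 1e-4
for y in (0.2, 0.5, 0.8):
    num = (V(y + h) - 2 * V(y) + V(y - h)) / h**2
    assert math.isclose(num, c_hat / (y**2 * (1 - y)**2), rel_tol=1e-4)
```

Because the linear part C₁y + C₂ drops out of the second derivative, the ODE holds on each sub-domain separately; it is only the connection conditions (40) at the junction that tie the two sub-domains together.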
Below "almost surely" qualifications should be understood, even where not stated explicitly, and as defined relative to any measure in P.

To compute the payoff X_t defined in (16), note that

min_{μ∈M_t} ∫ u(a₁, θ)dμ = (u₁₁ − u₁₂)(1 − m̄_t) + u₁₂,
min_{μ∈M_t} ∫ u(a₂, θ)dμ = (u₂₂ − u₂₁)m_t + u₂₁.

There is a critical level of u₀, denoted u∗,

u∗ = (u₁₁u₂₂ − u₁₂u₂₁)/(u₁₁ + u₂₂ − u₁₂ − u₂₁).

If u₀ ≤ u∗, then

X_t = (u₁₁ − u₁₂)(1 − m̄_t) + u₁₂ if m̄_t < π̄; (u₂₂ − u₂₁)m_t + u₂₁ if m̄_t ≥ π̄.

Accordingly, the default action a₀ is not optimal at any t, and a₁ (a₂) is optimal conditional on stopping at t if m̄_t < π̄ (m̄_t ≥ π̄). If u₀ > u∗, then

X_t = (u₁₁ − u₁₂)(1 − m̄_t) + u₁₂ if m̄_t < (u₁₁ − u₀)/(u₁₁ − u₁₂);
      (u₂₂ − u₂₁)m_t + u₂₁ if m_t ≥ (u₀ − u₂₁)/(u₂₂ − u₂₁);
      u₀ otherwise,

reflecting the conditional optimality of a₁, a₂ and a₀ respectively in the three indicated regions.

As in §2, for any μ ∈ M, μ_t denotes its Bayesian posterior at t and θ̂^μ_t = ∫θ dμ_t is the corresponding posterior estimate of θ. The two extreme measures μ = μ̄, μ are defined by μ̄_t(θ₂) = m̄_t and μ_t(θ₂) = m_t, and yield the estimates θ̂^μ̄_t and θ̂^μ_t respectively. Let P∗ be the probability measure in P which has density generator process (η∗_t),

−η∗_t = (θ̂^μ̄_t/σ)1_{Z_t ≤ z̃_t} + (θ̂^μ_t/σ)1_{Z_t > z̃_t}. (41)

It will be shown that P∗ is the worst-case scenario in P.

Proof of (a.ii): Consider the classical optimal stopping problem under P∗,

max_τ E^{P∗}[X_τ − cτ]. (42)

Define g₁ and g₂ by, for 0 < y < 1, i = 1, 2,

g_i(y; C_{2i−1}, C_{2i}) = ĉ(2y −
1) log(y/(1 − y)) + C_{2i−1}y + C_{2i}, (43)

where the constants C_i (i = 1, 2, 3, 4) are determined by smooth-contact conditions. We conjecture that the value function for (42) has the form:

v(t, z) = (u₁₁ − u₁₂)(1 − m̄_t(z)) + u₁₂ if z < f̄(t, r̄_ℓ);
          g₁(m̄_t(z); C₁, C₂) if f̄(t, r̄_ℓ) ≤ z < z̃_t;
          g₂(m_t(z); C₃, C₄) if z̃_t ≤ z < f(t, r̄_R);
          (u₂₂ − u₂₁)m_t(z) + u₂₁ if f(t, r̄_R) ≤ z, (44)

where

C₁ = −ĉl(π̄), C₃ = −ĉl(π),
C₂ = (u₁₁ − u₁₂)(1 − r̄_ℓ) + u₁₂ − ĉ[(2r̄_ℓ − 1)log(r̄_ℓ/(1 − r̄_ℓ)) − l(π̄)r̄_ℓ],
C₄ = (u₂₂ − u₂₁)r̄_R + u₂₁ − ĉ[(2r̄_R − 1)log(r̄_R/(1 − r̄_R)) − l(π)r̄_R].

(Note that the cut-off value u∗∗ defined in (34) satisfies u∗∗ = g₁(π̄; C₁, C₂) = g₂(π; C₃, C₄) = v(t, z̃_t).)

Lemma 5.1 v is the value function of the classical optimal stopping problem (42), i.e., for any t ≥ 0,

v(t, z) = max_{τ≥t} E^{P∗}[X_τ − c(τ − t) | Z_t = z].

Further, v satisfies the HJB equation

max{X(t, z) − v(t, z), −c + v_t(t, z) + ½σ²v_zz(t, z) + f(t, z)v_z(t, z)} = 0, (45)

where f(t, z) denotes the worst-case drift,

f(t, z) ≡ θ̂^μ̄_t(z)1_{z<z̃_t} + θ̂^μ_t(z)1_{z≥z̃_t}, (46)

the posterior estimates being computed from the extreme priors m̄ and m via the likelihood ratio φ(t, z). Finally, v also satisfies, ∀z ∈ (f̄(t, r̄_ℓ), f(t, r̄_R)),

−c + v_t(t, z) + ½σ²v_zz(t, z) + f(t, z)v_z(t, z) = 0. (47)

For the proof, first verify that v satisfies the HJB equation (45), and then apply El Karoui et al. (1997, Theorems 8.5, 8.6). Alternatively, a proof can be constructed along the lines of Peskir and Shiryaev (2006, Ch. 6).

Next prove that v is the value function of the (nonclassical) optimal stopping problem (9) (solving the HJB equation is not sufficient to imply this). We consider only t = 0 and prove

v(0, z) = max_{τ≥0} min_{P∈P} E^P[X(Z_τ) − cτ].

By Lemma 5.1,

v(0, z) = max_{τ≥0} E^{P∗}[X(Z_τ) − cτ] ≥ max_{τ≥0} min_{P∈P} E^P[X(Z_τ) − cτ].

To prove the opposite inequality, consider the stopping time

τ∗ = inf{t ≥ 0 : Z_t ≤ f̄(t, r̄_ℓ) or Z_t ≥ f(t, r̄_R)}.
For t ≤ τ∗, by Ito's formula, (45), and (47),

dv(t, Z_t) = [v_t(t, Z_t) + ½σ²v_zz(t, Z_t)]dt + v_z(t, Z_t)dZ_t (48)
           = [c − f(t, Z_t)v_z(t, Z_t)]dt + v_z(t, Z_t)dZ_t.

Each P = P^η ∈ P corresponds to a density generator process (η_t), and (W^η_t) is a Brownian motion under P^η, where

W^η_t = (1/σ)Z_t − (1/σ)∫₀ᵗ f̃(s, Z_s, η_s)ds,

and f̃(t, Z_t, η_t) ≡ −ση_t is the drift of Z under P^η, the analogue of (46) with the η-induced posterior in place of the worst-case posterior; in particular, f̃(t, Z_t, η∗_t) = f(t, Z_t). Therefore,

dv(t, Z_t) = [c + (f̃(t, Z_t, η_t) − f(t, Z_t))v_z(t, Z_t)]dt + σv_z(t, Z_t)dW^η_t.

Note that (f̃(t, Z_t, η_t) − f(t, Z_t))v_z(t, Z_t) ≥ 0. (Suppose Z_t < z̃_t. Then v_z(t, Z_t) ≤ 0 and f̃(t, Z_t, η_t) − f(t, Z_t) ≤ 0, the latter because the posterior drift estimate is increasing in the prior probability m of θ₂, and the worst case uses the largest such probability m̄ on this region. Argue similarly for Z_t > z̃_t.) Take expectation above under P^η to obtain

v(0, z) ≤ E^{P^η}[v(τ∗, Z_{τ∗}) − cτ∗] = E^{P^η}[X_{τ∗} − cτ∗].

The above inequality is due to

E^{P^η}[∫₀^{τ∗} σv_z(t, Z_t)dW^η_t] = 0,

which is guaranteed by

max_{P∈P} E^P[τ∗] < ∞; (49)

see Peskir and Shiryaev (2006, Theorem 21.1) for the classical case. In our setting, (49) is implied by the boundedness of X_t: since −∞ < max_{τ≥0} min_{P∈P} E^P(X_τ − cτ) and X is bounded, it follows that max_{P∈P} E^P[τ∗] < ∞.

Finally, because P^η can be any measure in P, deduce that

v(0, z) ≤ min_{P∈P} E^P[X_{τ∗} − cτ∗] ≤ max_{τ≥0} min_{P∈P} E^P[X_τ − cτ].

Conclude that v is the value function for our optimal stopping problem and that τ∗ is the optimal stopping time.
The preceding implies that P∗ is indeed the minimizing measure because the minimax property is satisfied:

max_{τ≥0} E^{P∗}[X_τ − cτ] = max_{τ≥0} min_{P∈P} E^P[X_τ − cτ]
≤ min_{P∈P} max_{τ≥0} E^P[X_τ − cτ] ≤ max_{τ≥0} E^{P∗}[X_τ − cτ]

⟹ min_{P∈P} max_{τ≥0} E^P[X_τ − cτ] = max_{τ≥0} min_{P∈P} E^P[X_τ − cτ].

Proof of (a.i):
The proof is similar to that of (a.ii). The only difference is that the value function v is given by

v(t, z) = (u₁₁ − u₁₂)(1 − m̄_t(z)) + u₁₂ if z < f̄(t, r_ℓ);
          ḡ₁(m̄_t(z); C̄₁, C̄₂) if f̄(t, r_ℓ) ≤ z < f̄(t, r̄_ℓ);
          u₀ if f̄(t, r̄_ℓ) ≤ z < f(t, r̄_R);
          ḡ₂(m_t(z); C̄₃, C̄₄) if f(t, r̄_R) ≤ z < f(t, r_R);
          (u₂₂ − u₂₁)m_t(z) + u₂₁ if f(t, r_R) ≤ z. (50)

Here ḡ₁ and ḡ₂ are identical to g₁ and g₂ (defined in (43)) respectively, except that the constants C₁, ..., C₄ are replaced respectively by C̄₁, ..., C̄₄ given by

C̄₁ = −ĉl(r̄_ℓ), C̄₃ = −ĉl(r̄_R),
C̄₂ = u₀ − ĉ[(2r̄_ℓ − 1)log(r̄_ℓ/(1 − r̄_ℓ)) − l(r̄_ℓ)r̄_ℓ],
C̄₄ = u₀ − ĉ[(2r̄_R − 1)log(r̄_R/(1 − r̄_R)) − l(r̄_R)r̄_R].

Proof of (b): Since it is never optimal to choose a₀, we can delete it from the set of feasible actions. The proof proceeds as in (a.ii), though we define

v(t, z) = (u₁₁ − u₁₂)(1 − m̄_t(z)) + u₁₂ if z < f̄(t, r̃_ℓ);
          g̃₁(m̄_t(z); C̃₁, C̃₂) if f̄(t, r̃_ℓ) ≤ z < z̃_t;
          g̃₂(m_t(z); C̃₃, C̃₄) if z̃_t ≤ z < f(t, r̃_R);
          (u₂₂ − u₂₁)m_t(z) + u₂₁ if f(t, r̃_R) ≤ z,

where g̃₁ and g̃₂ are identical to g₁ and g₂ (defined in (43)) respectively, except that the constants C₁, ..., C₄ are replaced respectively by C̃₁, ..., C̃₄ given by

C̃₁ = −ĉl(r̃_ℓ) + u₁₂ − u₁₁, C̃₃ = −ĉl(r̃_R) + u₂₂ − u₂₁,
C̃₂ = u₁₁ − ĉ[1 − l̃(r̃_ℓ)], C̃₄ = u₂₁ − ĉ[1 − l̃(r̃_R)].
1) log( r − r ). We prove the existence and uniqueness ofsolutions to the following equations: (32) : Follows from l : (0 , → ( −∞ , ∞ ) being surjective, continuous andstrictly increasing. (33) : Adapt the argument in Peskir and Shiryaev (2006, p. 290) used fora classical optimal stopping problem, generalized here to our context withambiguity. For fixed ˆ r l ∈ (0 , π ), consider the following equation for V l ( y ): V l ( y ) = ˆ c ˆ l ( y ) + ˆ C y + ˆ C V ly ( y ) = ˆ cl ( y ) + ˆ C V l (ˆ r l ) = − ( u − u )ˆ r l + u V ly (ˆ r l ) = u − u , (51)where y ∈ (0 ,
1) and ˆ C , ˆ C are constants to be determined. The solution is V l ( y ) = ˆ c ˆ l ( y ) − ( u − u + ˆ cl (ˆ r l )) y + u + ˆ c (ˆ r l l (ˆ r l ) − ˆ l (ˆ r l )).29ecause V l ( y ) depends on ˆ r l , we denote the solution by V l ( y ; ˆ r l ). If V l ( π ; ˆ r l )
1) and ˆ C , ˆ C are constants to be determined. The solution is V R ( y ) = ˆ c ˆ l ( y ) + ( V ly ( π ; ˆ r l ) − ˆ cl ( π )) y + V l ( π ; ˆ r l ) + ˆ c ( πl ( π ) − ˆ l ( π )) − πV ly ( π ; ˆ r l ).Denote the solution by V R ( y ; ˆ r l ). Since ˆ l ′′ ( y ) = l ′ ( y ) > y ∈ (0 , V l ( y ; ˆ r l ) and V R ( y ; ˆ r l ) are strictly convex functions. Recallthat π = m t ( e z t ), π = m t ( e z t ) and π ( u − u )+ u = (1 − π ) ( u − u )+ u .Then, V R ( π ) = V l ( π ; ˆ r l ) implies that the function y V R ( y ; ˆ r l ) intersects y ( u − u ) y + u for some y ∈ ( π,
1) when ˆ r l is close to π . Let y = ˆ y l satify V l ( y ; ˆ r l ) = u . Then, ˆ y l ↓ r l ↓ r l from π down to 0 and applying the properties estab-lished above, we obtain the existence of a unique point ˆ r l ∗ ∈ (0 , π ) for whichthere exists ˆ r R ∗ ∈ ( π,
1) such that V R (ˆ r R ∗ ; ˆ r l ∗ ) = ( u − u )ˆ r R ∗ + u (53) V Ry (ˆ r R ∗ ; ˆ r l ∗ ) = u − u .Combining (51), (52) and (53), we can verify that (ˆ r R ∗ , ˆ r l ∗ ) is a solution of(33). Note that each step of the derivation is reversible. Thus, there existsa unique solution ( e r R , e r l ) for (33). Inequalities (35) follow directly fromconstruction of the solution. (31) and (30) : By the definition of u ∗∗ and equation (32), it is easy tocheck that u ∗∗ > u . Set ˆ y = u − u ∗∗ u − u . Define the following payoff function V ( y ) = (cid:26) − ( u − u ) y + u if y ∈ (0 , ˆ y ); u ∗∗ if y ∈ (ˆ y, r l , r l ) for (31). The proof for (30) is similar.It is obvious that r l < r l and r R < r R due to l being strictly increasing.Turn to the remainder of the lemma (we skip the most obvious as-sertions). Given payoff symmetry, the definitions of π and π imply that π + π = 1. Then r l + r R = 1 follows from (32) and l ( r ) + l (1 − r ) = 0.30 rove (37) : Verify that l ( r ) = e l ( r ) − r (1 − r ) + 1 and rewrite (30) as˜ l ( r R ) − ˜ l ( r R ) = r R (1 − r R ) − r R (1 − r R ) + u − u ˆ c ˜ l ( r R ) − ˜ l ( r R ) = u − u ˆ c .If u = u ∗∗ , then, using payoff symmetry, we can verify that r R = r R , r R = π is the unique solution of (30). Next we prove that the solution r R of (30) is increasing with respect to u . Note that l ′ ( r ) = r (1 − r ) and˜ l ′ ( r ) = r (1 − r ) . From (30), derive l ′ ( r R ) dr R dr R − l ′ ( r R ) = 0˜ l ′ ( r R ) dr R dr R dr R du − ˜ l ( r R ) dr R du = 1ˆ c .Thus, dr R du = ( r R ) (1 − r R ) ˆ c ( r R − r R ) > r R ≥ π ⇐⇒ u ≥ u ∗∗ . Similarly, we can prove that r l ≤ π ⇐⇒ u ≥ u ∗∗ . (cid:4) Proof of Theorem 3.1 (Ellsberg): (i) Compute that ˆ c = cσ α , e z t = 0, π = − ǫ , π = ǫ . 
Equations (30) and (31) simplify to

r_R + r̄_R = 1, l(r_R) = α²/(cσ²); r_ℓ + r̄_ℓ = 1, l(r̄_ℓ) = α²/(cσ²),

(which exploit the fact that u₀ = (u₁₁ + u₂₁)/2), and the functions f̄ and f become

f̄(t, r) = (σ²/(2α))log(((1 − ǫ)/(1 + ǫ))(r/(1 − r))),
f(t, r) = (σ²/(2α))log(((1 + ǫ)/(1 − ǫ))(r/(1 − r))).

If r̄_ℓ ≤ (1 + ǫ)/2, then f̄(t, r̄_ℓ) ≤ 0 ≤ f(t, r̄_R). By Theorem 4.2(a.i), the signal Z₀ = 0 falls in the stopping region, which leads to τ∗ = 0. This proves (i) with r̂ = r̄_ℓ.

(ii) Equation (32) becomes

r̄_R + r̄_ℓ = 1, l(r̄_R) + l((1 + ǫ)/2) = 2α²/(cσ²),

and

z̄ ≡ f(t, r̄_R) = −f̄(t, r̄_ℓ) = (σ²/(2α))[log((1 + ǫ)/(1 − ǫ)) + log(r̄_R/(1 − r̄_R))].

By Theorem 4.2(a.ii), τ∗ = min{t ≥ 0 : |Z_t| ≥ z̄}.

Let z̲ be given by

z̲ = (σ²/(2α))log((1 + ǫ)/(1 − ǫ)) < z̄.

It follows from (16) and (11) that at any given t, not necessarily an optimal stopping time, betting on the ambiguous urn is preferred to betting on the risky urn iff |Z_t| ≥ z̲. Thus at τ∗ > 0, |Z_{τ∗}| = z̄ > z̲, and betting on the ambiguous urn is optimal on stopping.

Finally, we show that z̄ is increasing in ǫ: l′(r) = 1/(r²(1 − r)²) implies

dz̄/dǫ > 0 iff ((1 + ǫ)/2)((1 − ǫ)/2) > r̄_R(1 − r̄_R).

But ǫ < 2r̂ − 1 implies (1 + ǫ)/2 < r̂, which by (32) gives r̄_R > (1 + ǫ)/2 > 1/2; since x(1 − x) is decreasing for x > 1/2, the last inequality holds. □
References

[1] Arrow KJ, Blackwell D, Girshick MA (1949) Bayes and minimax solutions of sequential decision problems. Econometrica 17:213-244.
[2] Uncertainty and Expectations in Economics (Basil Blackwell, Oxford), 1-11.
[3] Berger JO (1984) The robust Bayesian viewpoint. Kadane J, ed. Robustness in Bayesian Statistics (North Holland, Amsterdam), 63-124.
[4] Berger JO (1985) Statistical Decision Theory and Bayesian Analysis (Springer, New York).
[5] Berger JO (1994) An overview of robust Bayesian analysis (with discussion). Test 3:5-124.
[6] Borodin AN, Salminen P (2015) Handbook of Brownian Motion–Facts and Formulae, 2nd ed. (Birkhauser, Basel).
[7] Caro F, Das Gupta A (2015) Robust control of the multi-armed bandit problem. Ann. Oper. Res. https://doi.org/10.1007/s10479-015-1965-7.
[8] Chen Z, Epstein LG (2002) Ambiguity, risk and asset returns in continuous time. Econometrica 70(4):1403-1443.
[9] Cheng X, Riedel F (2013) Optimal stopping under ambiguity in continuous time. Math. Finan. Econom. 7:29-68.
[10] Chernoff H (1961) Sequential tests for the mean of a normal distribution. Proc. Fourth Berkeley Symp. on Math. Statist. and Probab., vol 1 (U. California Press, Berkeley), 79-91.
[11] Choi H (2016) Learning under ambiguity: portfolio choice and asset returns. Working Paper, City University of Hong Kong.
[12] Ebert S, Strack P (2018) Never, ever getting started: on prospect theory without commitment. https://papers.ssrn.com/sol3/papers.cfm?abstract id=2765550.
[13] El Karoui N, Kapoudjian C, Pardoux E, Peng S, Quenez MC (1997) Reflected solutions of backward SDE's and related obstacle problems for PDE's. Ann. Probab. 25(2):702-737.
[14] Ellsberg D (1961) Risk, ambiguity, and the Savage axioms. Quart. J. Econom. 75(4):643-669.
[15] Epstein LG, Schneider M (2003) Recursive multiple-priors. J. Econom. Theory 113(1):1-31.
[16] Epstein LG, Schneider M (2007) Learning under ambiguity. Rev. Econom. Stud. 74(4):1275-1303.
[17] Epstein LG, Schneider M (2008) Ambiguity, information quality and asset pricing. J. Finan. 63(1):197-228.
[18] Epstein LG, Schneider M (2010) Ambiguity and asset markets. Ann. Rev. Finan. Econom. 2:315-346.
[19] Fudenberg D, Strack P, Strzalecki T (2018) Speed, accuracy, and the optimal timing of choices. Amer. Econom. Rev. 108(12):3651-3684.
[20] Gilboa I (2009) Theory of Decision under Uncertainty (Cambridge U. Press, New York).
[21] Gilboa I (2015) Rationality and the Bayesian paradigm. J. Econom. Method.
[22] Synthese.
[23] Gilboa I, Schmeidler D (1989) Maxmin expected utility with non-unique prior. J. Math. Econom. 18:141-153.
[24] Proc. IEEE.
[25] Ann. Math. Statist.
[26] Kadane JB, Schervish MJ, Seidenfeld T (1996) Reasoning to a foregone conclusion. JASA 91:1228-1235.
[27] J. Math. Econom.
[29] Liptser RS, Shiryaev AN, Statistics of Random Processes I: General Theory (Springer, Berlin).
[30] Marinacci M (2002) Learning from ambiguous urns. Statist. Papers.
[31] Ann. Econom. Finan.
[33] Stochastic Differential Equations.
[34] Peskir G, Shiryaev A (2006) Optimal Stopping and Free-Boundary Problems (Springer, Berlin).
[35] Rios-Insua D, Ruggeri F (2000) Robust Bayesian Analysis (Springer, New York).
[36] Shapiro A (2016) Rectangular sets of probability measures. Oper. Res.
[37] Shiryaev AN (1978) Optimal Stopping Rules (Springer, New York).
[38] Games Econom. Behav. 79:44-55.
[39] Wald A (1945) Sequential tests of statistical hypotheses. Ann. Math. Statist. 16:117-186.
[40] Wald A (1947) Sequential Analysis (Wiley, New York).
[41] Walley P (1991) Statistical Reasoning with Imprecise Probabilities (Chapman and Hall, London).
[42] Zhang J (2017)