Optimal Learning under Robustness and Time-Consistency
arXiv [q-fin.EC]

Larry G. Epstein and Shaolin Ji ∗

March 6, 2019
Abstract
We model learning in a continuous-time Brownian setting where there is prior ambiguity. The associated model of preference values robustness and is time-consistent. It is applied to study optimal learning when the choice between actions can be postponed, at a per-unit-time cost, in order to observe a signal that provides information about an unknown parameter. The corresponding optimal stopping problem is solved in closed form, with a focus on two specific settings: Ellsberg's two-urn thought experiment expanded to allow learning before the choice of bets, and a robust version of the classical problem of sequential testing of two simple hypotheses about the unknown drift of a Wiener process. In both cases, the link between robustness and the demand for learning is studied.

Key words: ambiguity, robust decisions, learning, partial information, optimal stopping, sequential testing of simple hypotheses, Ellsberg Paradox, recursive utility, time-consistency, model uncertainty

∗ Department of Economics, Boston University, [email protected], and Zhongtai Securities Institute of Financial Studies, Shandong University, [email protected]. Ji gratefully acknowledges the financial support of the National Natural Science Foundation of China (award No. 11571203). We are grateful for suggestions from two referees and for comments from Tomasz Strzalecki. An earlier version, titled "Optimal learning and Ellsberg's urns," was posted on arXiv in August 2017.

Introduction
We consider a decision-maker (DM) choosing between three actions whose payoffs are uncertain because they depend both on exogenous randomness and on an unknown parameter θ, θ = θ₁ or θ₂. She can postpone the choice of action so as to learn about θ by observing the realization of a signal modeled by a Brownian motion with drift. Because of a per-unit-time cost of sampling, which can be material or cognitive, she faces an optimal stopping problem. A key feature is that DM does not have sufficient information to arrive at a single prior about θ, that is, there is ambiguity about θ. Therefore, prior beliefs are represented by a nonsingleton set of probability measures, and DM seeks to make robust choices of both stopping time and action by solving a maxmin problem. In addition, she is forward-looking and dynamically consistent as in the continuous-time version of maxmin utility given by Chen and Epstein (2002). One contribution herein is to extend the latter model to accommodate learning. As a result, we capture robustness to ambiguity (or model uncertainty), learning and time-consistency. The other contribution is to investigate optimal learning in the above setting, with particular focus on two special cases that extend classical models. The corresponding optimal stopping problems are solved explicitly and the effects of ambiguity on optimal learning are determined.

The first specific context begins with Ellsberg's metaphorical thought experiment: There are two urns, each containing balls that are either red or blue, where the "known" or risky urn contains an equal number of red and blue balls, while no information is provided about the proportion of red balls in the "unknown" or ambiguous urn. DM must choose between betting on the color drawn from the risky urn or from the ambiguous urn.
The intuitive behavior highlighted by Ellsberg is the choice to bet on the draw from the risky urn no matter the color, behavior which is paradoxical for subjective expected utility theory, or indeed, for any model in which beliefs are represented by a single probability measure. Ellsberg's paradox is often taken as a normative critique of the Bayesian model and of the view that the single-prior representation of beliefs is implied by rationality (e.g., Gilboa 2009, 2015; Gilboa et al. 2012). Here we add to the thought experiment by including a possibility to learn. Specifically, we allow DM to postpone her choice so that she can observe realizations of a diffusion process whose drift is equal to the proportion of red in the ambiguous urn. Under specific parametric restrictions we completely describe the optimal joint learning and betting strategy. In particular, we show that it can be optimal to reject learning completely, and, if some learning is optimal, then it is never optimal to bet on the risky urn after stopping. The rationality of no learning suggests that one needs to reexamine and qualify the common presumption that ambiguity would fade away, or at least diminish, in the presence of learning opportunities (Marinacci 2002). It can also explain experimental findings (Trautman and Zeckhauser 2013) that some subjects neglect opportunities to learn about an ambiguous urn even at no visible (material) cost. In addition, our model is suggestive of laboratory experiments that could provide further evidence on the connection between ambiguity and the demand for learning.

The second application is to the classical problem of sequential testing of two simple hypotheses about the unknown drift of a Wiener process.
The seminal papers, both using a discrete-time framework, are Wald (1945, 1947), which shows that the sequential probability ratio test (SPRT) provides an optimal trade-off between type I and type II errors, and Arrow, Blackwell and Girshick (1949), which derives SPRT from utility maximization using dynamic programming arguments. More recently, Peskir and Shiryaev (2006, Ch. 6) employ a Bayesian subjectivist approach and derive SPRT as the solution to a continuous-time optimal stopping problem. We extend the latter analysis to accommodate situations where DM, a statistician/analyst, does not have sufficient information to justify reliance on a single prior. We show that it is optimal to stop if every "compatible" Bayesian (one whose prior is an element of the set of priors used by the robustness-seeking DM) would choose to do so. But the corresponding statement for "continue" is false: it may be optimal to stop under robustness even given a realized sample at which all compatible Bayesians would choose to continue. In this sense, "sensitivity analysis" overstates the robustness value of sampling.

We view our model as normative, a perspective that is most evident in the hypothesis testing context. Time-consistency of preference has obvious prescriptive appeal. It is important to understand that, roughly speaking, time-consistency is the requirement that a contingent plan (e.g., a stopping strategy) that is optimal ex ante remain optimal conditional on every subsequent realization, assuming there are no surprises or unforeseen events. A possible argument against such consistency, one that is sometimes expressed in the statistics literature, is that surprises are inevitable and thus that any prescription should take that into account rather than excluding their possibility. We would agree that a sophisticated decision-maker would expect that surprises may occur while (necessarily) being unable to describe what form they could take.
However, to the best of our knowledge there currently does not exist a convincing model in the economics, statistics or psychology literatures of how such an individual should (or would) behave, that is, how the awareness that she may be missing something in her perception of the future should (or would) affect current behavior. That leaves time-consistency as a sensible guiding principle, with the understanding that reoptimization can (and should) occur if there is a surprise.

A brief review of other relevant literature concludes this introduction. The classical Bayesian model of sequential decision-making, including in particular applications to inference and experimentation, is discussed in Howard (1970) and the references therein. The maxmin model of ambiguity-averse preference is axiomatized in a static setting in Gilboa and Schmeidler (1989), which owes an intellectual debt to the Arrow and Hurwicz (1972) model of decision-making under ignorance, and in a multi-period discrete-time framework in Epstein and Schneider (2003), where time-consistency is one of the key axioms. Optimal stopping problems have been studied in the absence of time-consistency. It is well known that modeling a concern with ambiguity and robust decision-making leads to "nonlinear" objective functions, which, in a dynamic setting and in the absence of commitment, can lead to time-inconsistency issues (Peskir 2017). A similar issue arises also in a risk context where there is a known objective probability law, but where preference does not conform to von Neumann-Morgenstern expected utility theory (Ebert and Strack 2018; Huang et al. 2018). Such models are problematic in normative contexts. It is not clear why one would ever prescribe to a decision-maker (who is unable or unwilling to commit) that she should adopt a criterion function that would imply time-inconsistent plans and that she should then resolve these inconsistencies by behaving strategically against her future selves (as is commonly assumed).
The recursive maxmin model has been used in macroeconomics and finance (e.g., Epstein and Schneider 2010) and also in robust multistage stochastic optimization (e.g., Shapiro (2016) and the references therein, including to the closely related literature on conditional risk measures). Shapiro focuses on a property of sets of measures, called rectangularity following Epstein and Schneider (2003), that underlies recursivity of utility and time-consistency. Most of the existing literature deals with a discrete-time setting. The theoretical literature on learning under ambiguity is sparse and limited to passive learning (e.g., Epstein and Schneider 2007, 2008). With regard to hypothesis testing, this paper adds to the literature on robust Bayesian statistics (Berger 1984, 1985, 1994; Rios-Insua and Ruggeri 2000), which is largely restricted to a static environment. Walley (1991) goes further and considers both a prior and a single posterior stage, but not sequential hypothesis testing. For a frequentist approach to robust sequential testing see Huber (1965).

Closest to the present paper is the literature on bandit problems with ambiguity and robustness (Caro and Das Gupta 2015; Li 2019). Both papers model endogenous learning (or experimentation) by maxmin dynamically consistent agents. Their models differ from ours in that they assume discrete time and an exogenously given horizon, and also in the nature of experimentation. In our model, the once-and-for-all choice of action and resulting payoff come after all learning has ceased, while in bandit problems, action choice and flow payoffs are continuous and intertwined with learning (for example, the cost of experimentation is the implied reduction in current flow payoffs). Consequently, their analyses and characterizations are much different; for example, their focus on the existence of a suitable Gittins index has no counterpart in our model.

The paper proceeds as follows.
The next section describes the model of utility, extending Chen-Epstein to accommodate learning. Readers who are primarily interested in applications can skip this relatively technical section and move directly to the applications in the sections that follow.

For background regarding time-consistency in the maxmin framework, consider first the following informal outline that anticipates the specific setting of this paper. DM faces uncertainty about a payoff-relevant state space Ω due to uncertainty about the value of a parameter θ ∈ Θ. Each θ determines a unique probability law on Ω, but there is prior ambiguity about the parameter that is represented by a nonsingleton set M of priors on Θ. As time proceeds, DM learns about the parameter through observation of a signal whose increments are distributed i.i.d. conditional on θ. At issue is how to model beliefs about Ω, that is, the set P of predictive priors. (Throughout we adopt the common practice of distinguishing terminologically between beliefs about the state space, referred to as predictive priors, and beliefs about parameters, which are referred to as priors.) A seemingly natural approach is to take P to be the set of all measures that can be obtained by combining some prior µ in M with the given conditionally i.i.d. likelihood. Learning is modeled through the set of posteriors M_t at t obtained via prior-by-prior Bayesian updating of M, and a corresponding set P_t of predictive posteriors is obtained as above. However, conditional evaluation at each t ≥ 0 then depends on the worst-case posterior µ_t in M_t, and worst cases at different nodes need not belong to the same prior µ. This is in contrast with the ex ante perspective expressed via P, where a single worst-case prior µ determines the entire ex ante optimal plan; dynamically inconsistent choices can result. To restore dynamic consistency, one can enlarge P by adding to it all measures obtained by pasting together alien posteriors, leading to a "rectangular" set that is closed with respect to further pasting.
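To see concretely what pasting adds, the following minimal Python sketch (a hypothetical two-period coin-flip example with made-up numbers, not taken from the paper) compares maxmin over a non-rectangular set of priors, the backward-induction recursion, and maxmin over the rectangular hull obtained by pasting one-step-ahead conditionals; the latter two coincide, while the ex ante non-rectangular evaluation differs:

```python
import itertools

# Two coin flips; the ambiguous parameter is the heads probability p in {0.3, 0.7}.
# The payoff xi rewards matching outcomes (a crude proxy for "learning the parameter").
xi = {('H', 'H'): 1.0, ('H', 'T'): 0.0, ('T', 'H'): 0.0, ('T', 'T'): 1.0}
P = [0.3, 0.7]
outcomes = list(itertools.product('HT', repeat=2))

def prob(p1, p2H, p2T, w):
    # p1: chance of H at time 1; p2H/p2T: chance of H at time 2 after H/T.
    q1 = p1 if w[0] == 'H' else 1 - p1
    p2 = p2H if w[0] == 'H' else p2T
    return q1 * (p2 if w[1] == 'H' else 1 - p2)

# Ex ante maxmin over the NON-rectangular set: one p governs both flips.
ex_ante = min(sum(prob(p, p, p, w) * xi[w] for w in outcomes) for p in P)

# Backward induction, i.e. the recursive evaluation: minimize node by node.
def cond_value(w1):
    return min(p * xi[(w1, 'H')] + (1 - p) * xi[(w1, 'T')] for p in P)
recursive = min(p * cond_value('H') + (1 - p) * cond_value('T') for p in P)

# Maxmin over the rectangular hull: paste together arbitrary selections of
# one-step-ahead conditionals (p2H and p2T may come from different priors).
rect = min(sum(prob(p1, p2H, p2T, w) * xi[w] for w in outcomes)
           for p1 in P for p2H in P for p2T in P)

print(ex_ante, recursive, rect)   # recursive == rect < ex_ante
```

The gap between the ex ante and recursive values is the dynamic inconsistency just described; enlarging the set of priors to its rectangular hull removes the gap without changing the one-step-ahead conditional beliefs.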
One can think of the enlarged set as capturing both the subjectively possible probability laws and backward-induction reasoning by DM. See Epstein and Schneider (2003) for further discussion and axiomatic foundations in a discrete-time framework, and Chen and Epstein (2002) (CE below) for a continuous-time formulation that we outline next. Then we describe how it can be adapted to include learning with partial information. The latter description is given in the simplest context adequate for the applications below. However, it should be clear that it can be adapted more generally.

Let (Ω, G_∞, P) be a probability space, and W = (W_t)_{0≤t<∞} a 1-dimensional Brownian motion which generates the filtration G = {G_t}_{t≥0}, with G_t ր G_∞. (All probability spaces are taken to be complete and all related filtrations are augmented in the usual sense.) The measure P is a reference measure whose role is only to define null events. CE define a set of predictive priors P on (Ω, G_∞) through specification of their densities with respect to P. To do so, they take as an additional primitive a (suitably adapted) set-valued process (Ξ_t). (Technical restrictions are that Ξ_t : Ω → K ⊂ R^d for some compact set K independent of t, that 0 ∈ Ξ_t(ω) dt ⊗ dP a.s., and that each Ξ_t is convex- and compact-valued.) Define the associated set of real-valued processes by

Ξ = { η = (η_t) | η_t(ω) ∈ Ξ_t(ω) dt ⊗ dP a.s. }.

Then each η ∈ Ξ defines a probability measure on G_∞, denoted P^η, that is equivalent to P on each G_t, and is given by

dP^η/dP |_{G_t} = exp{ −(1/2)∫₀ᵗ η_s² ds − ∫₀ᵗ η_s dW_s } for all t.

Accordingly, each η_t(ω) ∈ Ξ_t(ω) can be thought of roughly as defining conditional beliefs about G_{t+dt}, and Ξ_t(ω) is called the set of density generators
at (t, ω). By the Girsanov Theorem,

W^η_t = ∫₀ᵗ η_s ds + W_t    (1)

is a Brownian motion under P^η, which thus can be understood as an alternative hypothesis about the drift of the driving process W (the drift is 0 under P). Finally,

P ≡ { P^η : η ∈ Ξ }.    (2)

(The "pasting" referred to above is accomplished through the fact that Ξ is constructed by taking all selections from the Ξ_t's.)

The set P is used to define a time 0 utility function on a suitable set of random payoffs denominated in utils. In order to model in the sequel the choice of how long to learn (or sample), we consider a set of stopping times τ; that is, each τ is an R₊-valued random variable on Ω that is adapted to {G_t}, that is, {ω : τ(ω) > t} ∈ G_t for every t. For each such τ, utility is defined on the set L(τ) of real-valued random variables given by

L(τ) = { ξ | ξ is G_τ-measurable and sup_{Q∈P} E_Q |ξ| < ∞ }.

The time 0 utility of any ξ ∈ L(τ) is given by

U(ξ) = inf_{Q∈P} E_Q ξ = − sup_{Q∈P} E_Q[−ξ].    (3)

It is natural to consider also conditional utilities at each (t, ω), where

U_t(ξ) = ess inf_{Q∈P} E_Q[ξ | G_t].    (4)

In words, U_t(ξ) is the utility of ξ at time t conditional on the information available then and given the state ω (the dependence of U_t(ξ) on ω is suppressed notationally). The special construction of P delivers the following counterpart of the law of total probability (or law of iterated expectations): for each ξ, and 0 ≤ t < t′,

U_t(ξ) = ess inf_{Q∈P} E_Q[ U_{t′}(ξ) | G_t ].    (5)

This recursivity ultimately delivers the time-consistency of optimal choices.

The components P, W, (Ξ_t) and {G_t} are primitives in CE. Next we specify them in terms of the deeper primitives of a model that includes learning about an unknown parameter θ ∈ Θ ⊂ R. Specifically, begin with a measurable space (Ω, F), a filtration {F_t}, F_t ր F_∞ ⊂ F, and a collection {P^µ : µ ∈ M} of pairwise equivalent probability measures on (Ω, F).
Though θ is an unknown deterministic parameter, for mathematical precision we view θ as a random variable on (Ω, F). Further, for each µ ∈ M, P^µ induces the distribution µ for θ via

µ(A) = P^µ({θ ∈ A}) for all Borel measurable A ⊂ Θ.

Accordingly, M can be viewed as a set of priors on Θ, and its nonsingleton nature indicates ambiguity about θ. There is also a standard Brownian motion B = (B_t), with generated filtration {F^B_t}, such that B is independent of θ under each P^µ. B is the Brownian motion driving the signals process Z = (Z_t) according to

Z_t = ∫₀ᵗ θ ds + ∫₀ᵗ σ dB_s = θt + σB_t,    (6)

where σ is a known positive constant. Because only realizations of Z_t are observable, take {G_t} to be the filtration generated by Z. Assuming knowledge of the signal structure, Bayesian updating of µ ∈ M gives the posterior µ_t at time t. Thus prior-by-prior Bayesian updating leads to the set-valued process (M_t) of posteriors on θ. Proceed to specify the other CE components P, W and (Ξ_t).

Step 1.
Take µ ∈ M. By standard filtering theory (Liptser and Shiryaev 1977, Theorem 8.3), if we replace the unknown parameter θ by the estimate θ̂^µ_t = ∫ θ dµ_t, then we can rewrite (6) in the form

dZ_t = θ̂^µ_t(Z_t) dt + σ( dB_t + ((θ − θ̂^µ_t(Z_t))/σ) dt )    (7)
     = θ̂^µ_t(Z_t) dt + σ dB̃^µ_t,

where the innovation process (B̃^µ_t) is a standard {G_t}-adapted Brownian motion on (Ω, G_∞, P^µ). Thus (B̃^µ_t) takes the same role as (W^η_t) in CE (see (1) above). Rewrite (7) as

dB̃^µ_t = −(1/σ) θ̂^µ_t(Z_t) dt + (1/σ) dZ_t,

which suggests that (Z_t/σ) (resp. (−θ̂^µ_t(Z_t)/σ)) can be chosen as the Brownian motion (W_t) (resp. the drift (η_t)) in (1).

Step 2.
Find a reference probability measure P on (Ω, G_∞) under which (Z_t/σ) is a {G_t}-adapted Brownian motion. Fix µ ∈ M and define P by:

dP/dP^µ |_{G_t} = exp{ −(1/(2σ²))∫₀ᵗ (θ̂^µ_s(Z_s))² ds − (1/σ)∫₀ᵗ θ̂^µ_s(Z_s) dB̃^µ_s }
              = exp{ (1/(2σ²))∫₀ᵗ (θ̂^µ_s(Z_s))² ds − (1/σ²)∫₀ᵗ θ̂^µ_s(Z_s) dZ_s }.

By Girsanov's Theorem, (Z_t/σ) is a {G_t}-adapted Brownian motion under P.

Step 3.
Viewing P as a reference measure, perturb it. For each µ ∈ M, define P̄^µ on (Ω, G_∞) by

dP̄^µ/dP |_{G_t} = exp{ −(1/(2σ²))∫₀ᵗ (θ̂^µ_s(Z_s))² ds + (1/σ²)∫₀ᵗ θ̂^µ_s(Z_s) dZ_s }.

By Girsanov, dB̃^µ_t = −(1/σ) θ̂^µ_t(Z_t) dt + (1/σ) dZ_t is a Brownian motion under P̄^µ.

In general, P̄^µ ≠ P^µ. However, they induce the identical distribution for Z. This is because (B̃^µ_t) is a {G_t}-adapted Brownian motion under both P^µ and P̄^µ. Therefore, by the uniqueness of weak solutions to SDEs, the solution Z of (7) on (Ω, F_∞, P^µ) and the solution Z′ of (7) on (Ω, G_∞, P̄^µ) have identical distributions. (Argue as in Oksendal (2005, Example 8.6.9).) Given that only the distribution of signals matters in our model, there is no reason to distinguish between the two probability measures. Thus we apply CE to the following components: W and P defined in Step 2, and Ξ_t given by

Ξ_t = { −θ̂^µ_t/σ : µ ∈ M, θ̂^µ_t = ∫ θ dµ_t }.    (8)

In summary, taking these specifications for P, W, (Ξ_t) and {G_t} in the CE model yields a set P of predictive priors, and a corresponding utility function, that capture prior ambiguity about the parameter θ (through M), learning as signals are realized (through updating to the set of posteriors M_t), and robust (maxmin) and time-consistent decision-making (because of (5)). We use this model in the optimal stopping problems that follow. The only remaining primitive is M, which is specified to suit the particular setting of interest.

As indicated, the key technical step in our extension of CE is in adopting the weak formulation rather than their strong formulation. For readers who may be unfamiliar with this distinction we suggest Oksendal (2005, Section 5.3) for discussion of weak versus strong solutions of SDEs, and Zhang (2017, Chapter 9).
The latter exposits both the technical advantages of the weak formulation and its economic rationale, notably in models with imperfect information (such as here, where given (6), Z is observed but B is not), or asymmetric information (such as in principal-agent models). In our context, the weak formulation is suggested if one views B not as modeling a physical noise or shock, but rather as a way to specify that the distribution of (Z_t − θt)/σ is standard normal (conditional on θ).

DM must choose an action from the set A = {a₁, a₂, a₃}. Payoffs are uncertain and depend on an unknown parameter θ. Before choosing an action, DM can learn about θ by observing realizations of the signal process Z given by (6), where σ is a known positive constant. There is a constant per-unit-time cost of sampling c > 0. (The filtration {G_t} generated by Z, and other notation, are as in the previous section. Unless specified otherwise, all processes below are taken to be {G_t}-adapted even where not stated explicitly.)

If DM stops learning at t, then her conditional expected payoff (in utils) is X_t; think of X_t as the indirect utility she can attain by choosing optimally from A. DM is forward-looking and has time 0 beliefs about future signals given by the set P ⊂ ∆(Ω, G_∞) described in the previous section. Her choice of when to stop is described by a stopping time (or strategy) τ, which is restricted to be uniformly integrable (sup_{Q∈P} E_Q τ < ∞); the set of all stopping strategies is Γ. As a maxmin agent she chooses an optimal stopping strategy τ* by solving

max_{τ∈Γ} min_{P∈P} E_P (X_τ − cτ).    (9)

It remains to specify M, which determines P as described in the previous section, and X_t. We assume that all priors µ in M have binary support Θ = {θ₁, θ₂}, θ₁ < θ₂. Specifically, let

M = { µ_m = (1−m) δ_{θ₁} + m δ_{θ₂} : m̲ ≤ m ≤ m̄ }.    (10)

Therefore, M can be identified with the probability interval [m̲, m̄] for the larger parameter value θ₂. Let 0 < m̲ < m̄ < 1. At any t,

M_t = { (1−m) δ_{θ₁} + m δ_{θ₂} : m̲_t ≤ m ≤ m̄_t },    (11)

where, by Liptser and Shiryaev (1977, Theorem 9.1),

m̲_t = (m̲/(1−m̲)) ϕ(t, Z_t) / ( 1 + (m̲/(1−m̲)) ϕ(t, Z_t) ),   m̄_t = (m̄/(1−m̄)) ϕ(t, Z_t) / ( 1 + (m̄/(1−m̄)) ϕ(t, Z_t) ),    (12)

and

ϕ(t, z) = exp{ ((θ₂ − θ₁)/σ²) z − (1/(2σ²))(θ₂² − θ₁²) t }.    (13)

Conditional on the parameter value, payoffs are given by u(a_i, θ_j), where each u(a_i, θ_j) is nonnegative. Think of u(·, θ_j) as including the valuation of any risk remaining even if θ_j is known to be true; for example, u(a_i, θ_j) could be the expected utility of the lottery implied by (a_i, θ_j). Payoffs are assumed to satisfy: for i, j = 1, 2, i ≠ j,

u(a_j, θ_j) = u(a_i, θ_i) > u(a_j, θ_i).    (14)

Thus a₁ is better than a₂ given θ₁, and the reverse given θ₂, and the payoff to the better action is the same for both parameter values. The payoff to the third action a₃ does not depend on θ, and can be thought of as a default or outside option. Its payoff is not ambiguous because incomplete confidence about θ is the only source of ambiguity in the model, but choice of a₃ may entail risk. Adopt the notation

u₃ = u(a₃, θ₁) = u(a₃, θ₂).    (15)

It is evident that action a₃ may be irrelevant if its payoff is sufficiently low, for example, if u₃ = 0. To exclude the trivial case where a₃ is always chosen, assume that u₃ < u(a_i, θ_i), i = 1, 2.

At any t, DM has beliefs about θ as represented by the set of posteriors M_t. The Gilboa-Schmeidler utility of a_i is min_{µ∈M_t} ∫ u(a_i, θ) dµ. Therefore, if DM chooses an optimal action at time t, then her payoff is

X_t = max{ min_{µ∈M_t} ∫ u(a₁, θ) dµ,  min_{µ∈M_t} ∫ u(a₂, θ) dµ,  u₃ }.    (16)

The preceding completes specification of the optimal stopping problem (9).
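To make the updating and stopped-payoff formulas concrete, here is a minimal Python sketch of (12), (13) and (16). All parameter values are hypothetical (θ₁ < θ₂, payoffs satisfying (14), u₃ the outside option); none of the numbers come from the paper:

```python
import math

# Hypothetical illustrative parameters.
theta1, theta2, sigma = -0.25, 0.25, 1.0
u = {('a1', theta1): 0.75, ('a1', theta2): 0.25,   # a1 is the better action under theta1
     ('a2', theta1): 0.25, ('a2', theta2): 0.75}   # a2 is the better action under theta2
u3 = 0.5                                           # unambiguous default action a3
m_lo, m_hi = 0.4, 0.6                              # prior probability interval for theta2

def phi(t, z):
    # Likelihood ratio (13) of theta2 versus theta1 given Z_t = z.
    return math.exp((theta2 - theta1)*z/sigma**2
                    - (theta2**2 - theta1**2)*t/(2*sigma**2))

def posterior_interval(t, z):
    # Prior-by-prior Bayesian updating (12): update each endpoint prior's odds.
    def update(m):
        odds = m/(1 - m) * phi(t, z)
        return odds/(1 + odds)
    return update(m_lo), update(m_hi)

def X(t, z):
    # Stopped payoff (16): maxmin (Gilboa-Schmeidler) value of the best action.
    lo, hi = posterior_interval(t, z)
    # Expected payoffs are linear in m, so each minimum is at an interval endpoint.
    worst = lambda a: min((1 - m)*u[(a, theta1)] + m*u[(a, theta2)] for m in (lo, hi))
    return max(worst('a1'), worst('a2'), u3)
```

With an uninformative sample (Z_t = 0; here ϕ(t, 0) = 1 because θ₁² = θ₂²), the posterior interval equals the prior interval and the unambiguous default is optimal, while a large |Z_t| shifts the interval enough that one of the ambiguous actions takes over.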
Its solution is described below under two alternative additional assumptions:

Payoff symmetry: u(a₁, θ₂) = u(a₂, θ₁).

No risky option: u₃ ≤ u(a_i, θ_j), i, j = 1, 2, so that a₃ is (weakly) inferior to each of a₁ and a₂ conditional on either parameter value. Hence, it would never be chosen uniquely and can be ignored, leaving only two actions.

These assumptions are satisfied respectively by the two special models upon which we focus: Ellsberg's urns (payoff symmetry) and hypothesis testing (no risky option). We focus on these because they extend classic models in the literature and because they provide distinct insights into the connection between ambiguity and optimal learning.

There are two urns, each containing balls that are either red or blue: a risky urn in which the proportion of red balls is 1/2 and an ambiguous urn in which the color composition is unknown. Denote by θ + 1/2 the unknown proportion of red balls. Thus θ denotes the bias towards red: θ > 0 indicates a bias towards red, θ < 0 a bias towards blue, and θ = 0 indicates an equal number as in the risky urn. DM can choose between betting on the draw from the risky or ambiguous urn and also on drawing red or blue. In the absence of learning, the intuitive behavior highlighted by Ellsberg is to bet on the draw from the risky urn no matter the color. Here we consider betting preference when an ambiguity-averse decision-maker can defer the choice between bets until after learning optimally about θ.

To do so, we apply the model described above with particular specifications for its key primitives A, Θ, M and u. For A, let a₃ denote a bet on the risky urn and let a₂ (a₁) denote the bet on drawing red (blue) from the ambiguous urn. (Note that there is no need to differentiate between bets on red and blue for the risky urn.) Take Θ = {θ₁, θ₂}, where θ₁ + θ₂ = 0, or equivalently, for some 0 < α < 1/2,

θ₁ = −α,  θ₂ = α.    (17)

Thus only two possible biases, of equal size, are thought possible (the proportion of red is either 1/2 − α or 1/2 + α).
However, there is ambiguity about which direction of the bias is more likely. This ambiguity is modeled by M having the form in (10), where we assume in addition that the probability interval for α (the bias towards red) is such that m̲ + m̄ = 1, or equivalently, for some 0 < ǫ < 1,

m̲ = (1 − ǫ)/2,  m̄ = (1 + ǫ)/2.    (18)

Thus the model has two key parameters, α and ǫ. We interpret ǫ as modeling ambiguity (aversion): the probability interval [(1−ǫ)/2, (1+ǫ)/2] for the bias towards red is larger if ǫ increases. At the extreme when ǫ = 0, then M is the singleton according to which the two biases are equally likely, and DM is a Bayesian who faces uncertainty with variance α² about the true bias, but no ambiguity. We interpret α as measuring the degree of this prior uncertainty, or prior variance (α = 0 implies certainty that the composition of the ambiguous urn is identical to that of the risky urn).

Finally, specify payoffs u. All bets have the same winning and losing prizes, denominated in utils, which can be normalized to 1 and 0 respectively. Given the composition of the ambiguous urn, only risk is involved in every bet, and an expected utility calculation yields

u(a₂, α) = u(a₁, −α) = 1/2 + α,  u(a₂, −α) = u(a₁, α) = 1/2 − α,  and  u₃ = 1/2.    (19)

The assumptions of the preceding section are then satisfied, and (16) takes the form X_t = X(Z_t):

X(Z_t) = (1/2 + α) − 2α(1+ǫ) / [ (1+ǫ) + (1−ǫ)ϕ(Z_t) ]    if Z_t > (σ²/(2α)) log((1+ǫ)/(1−ǫ)),
X(Z_t) = (1/2 + α) − 2α(1+ǫ)ϕ(Z_t) / [ (1+ǫ)ϕ(Z_t) + (1−ǫ) ]    if Z_t < −(σ²/(2α)) log((1+ǫ)/(1−ǫ)),
X(Z_t) = 1/2    otherwise,    (20)

where ϕ(z) = exp(2αz/σ²). Thus if Z_t is large positive (negative), then a bet on drawing red (blue) from the ambiguous urn is optimal. For intermediate values, there is not enough evidence for a bias in either direction to compensate for the ambiguity, and betting on the risky urn is optimal. This is true in particular ex ante where Z₀ = 0, consistent with the intuitive ambiguity-averse behavior in Ellsberg's 2-urn experiment without learning.

We give an explicit solution to the optimal stopping problem (9) satisfying (17)-(19).
To do so, let l ( r ) = 2 log( r − r ) − r + 11 − r , r ∈ (0 , b r by l ( b r ) = 2 α cσ . (22)13 r is uniquely defined thereby and < b r <
1, because l ( · ) is strictly increas-ing, l (0) = −∞ , l ( ) = 0, and l (1) = ∞ . Theorem 3.1 (i) τ ∗ = 0 if and only if ǫ ≥ b r , in which case X τ ∗ = X = .(ii) Let ǫ < b r . Then the optimal stopping time satisfies τ ∗ > and isgiven by τ ∗ = min { t ≥ | Z t |≥ z } , where z = σ α (cid:20) log 1 + ǫ − ǫ + log r − r (cid:21) > , (23) and r , b r < r < , is the unique solution to the equation l ( r ) + l ( 1 + ǫ α cσ . (24) Moreover, on stopping either the bet on red is chosen (if Z τ ∗ ≥ z ) or the beton blue is chosen (if Z τ ∗ ≤ − z ); the bet on the risky urn is never optimal at τ ∗ > . Finally, if ǫ < ǫ ′ < b r − , and if τ ∗′ is the corresponding optimalstopping time, then τ ∗′ ≥ τ ∗ . The two cases are defined by the relative magnitudes of ǫ , parametrizingambiguity, and b r , which is an increasing function of α / (cid:0) cσ (cid:1) ; in particular,through α , it depends positively on the payoff to knowing the direction ofthe true bias. Thus (i) considers the case where ambiguity is large realtiveto payoffs (and taking also sampling cost and signal variance into account).Then no learning is optimal and the bet on the risky urn is chosen immedi-ately. In contrast, some learning is necessarily optimal given small ambiguity(case (ii)), including in the limiting Bayesian model with ǫ = 0. Thus it isoptimal to reject learning if and only if ambiguity, as measured by ǫ , is suit-ably large . In case (ii), it is optimal to sample as long as the signal Z t lies inthe continuation interval ( − z, z ). Two features of this learning region standout. First, when Z t hits either endpoint, learning stops and DM bets onthe ambiguous urn. Thus the risky urn is chosen (if and) only if it is notoptimal to learn . The second noteworthy feature is that sampling increaseswith greater ambiguity as measured by ǫ , though when ǫ reaches 2 b r −
1, then, by (i), it is optimal to reject any learning.

There is simple intuition for the preceding. First, consider the effect of ambiguity (large ǫ) on the incentive to learn. DM's prior beliefs admit only α and −α as the two possible values for the true bias. She will incur the cost of learning if she believes that she is likely to learn quickly which of these is true. She understands that she will come to accept α (or −α) as being true given realization of sufficiently large positive (negative) values for Z_t. A difficulty is that she is not sure which probability law in her set P describes the signal process. As a conservative decision-maker, she bases her decisions on the worst-case scenario P∗ in her set. Because she is trying to learn, the worst case minimizes the probability of extreme, hence revealing, signal realizations, which, informally speaking, occurs if P∗({dZ_t > 0} | Z_t > 0) and P∗({dZ_t < 0} | Z_t < 0) are as small as possible. That is, if Z_t > 0, then the distribution of the increment dZ_t is computed using the posterior associated with that prior in M which assigns the largest probability (1+ǫ)/2 to the negative bias −α, while if Z_t < 0, then the distribution of the increment is computed using the posterior associated with the prior assigning the largest probability (1+ǫ)/2 to the positive bias α. It follows that, from the perspective of the worst-case scenario, the signal structure is less informative the greater is ǫ. Accordingly, conditional on some learning being optimal, it must be with the expectation of a long sampling period that increases in length with ǫ. A second effect of an increase in ǫ is that it reduces the ex ante utility of betting on the ambiguous urn and hence implies that signals in an increasingly large interval would not change betting preference. Consequently, a small sample is unlikely to be of value – only long samples are useful. Together, these two effects suggest existence of a cutoff value for ǫ beyond which no amount of learning is sufficiently attractive to justify its cost. At the cutoff, here 2r̂ − 1, DM is just indifferent between stopping and learning for another instant.

There remains the following question for smaller values of ǫ: why is it never optimal to try learning for a while and then, for some sample realizations, to stop and bet on the risky urn? The intuition, adapted from Fudenberg, Strack and Strzalecki (2018), is that this feature is a consequence of the specification M for the set of priors. To see why, suppose that Z_t is small for some positive t. A possible interpretation, particularly for large t, is that the true bias is small and thus that there is little to be gained by continuing to sample – DM might as well stop and bet on the risky urn. But this reasoning is excluded when, as in our specification, DM is certain that the bias is ±α. Then signals sufficiently near 0 must be noise and the situation is essentially the same as it was at the start. Hence, if stopping to bet on the risky urn were optimal at t, it would have been optimal also at time 0. This intuition is suggestive of the likely consequences of generalizing the specification of M. Suppose, for example, that M is such that all its priors share a common finite support. We conjecture that then the predicted incompatibility of learning and betting on the risky urn would be overturned if the zero bias point is in the common support.

Finally, using the closed-form solution in the theorem, we can give more concrete expression to the effect of ambiguity on optimal learning. Restrict attention to values of ǫ in [0, 2r̂ − 1), and denote by P^θ the probability distribution of (Z_t) if θ is the true bias. Then, by well-known results regarding hitting times of Brownian motion with drift (Borodin and Salminen 2015), the mean sample length according to P^θ is

E^θ[τ∗] = (z̄/σ)² [tanh(θz̄/σ²)/(θz̄/σ²)] if θ ≠ 0, and (z̄/σ)² if θ = 0, (25)

which is increasing in ǫ.
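The mean sample length (25) can be cross-checked against two standard facts about Brownian motion with drift: the optional stopping theorem applied to the exponential martingale e^{−2θZ_t/σ²}, and Wald's identity E^θ[Z_{τ∗}] = θ E^θ[τ∗]. The sketch below does this numerically; the parameter values are arbitrary illustrations (not taken from the paper), and the formulas are the reconstructions displayed in the text.

```python
import math

def p_correct(theta, z_bar, sigma):
    # Probability that Z exits the symmetric band [-z_bar, z_bar] on the side
    # of the true drift theta (the "correct bet" probability in the text).
    return 1.0 / (1.0 + math.exp(-2.0 * abs(theta) * z_bar / sigma**2))

def mean_sample_length(theta, z_bar, sigma):
    # Formula (25): E^theta[tau*] = (z_bar/sigma)^2 * tanh(u)/u with
    # u = theta*z_bar/sigma^2, and the limit (z_bar/sigma)^2 at theta = 0.
    if theta == 0:
        return (z_bar / sigma) ** 2
    u = theta * z_bar / sigma**2
    return (z_bar / sigma) ** 2 * math.tanh(u) / u

theta, z_bar, sigma = 0.3, 1.7, 0.9   # illustrative values only
p = p_correct(theta, z_bar, sigma)
u = theta * z_bar / sigma**2

# Optional stopping: E[exp(-2*theta*Z_tau/sigma^2)] = 1 at the exit time.
assert math.isclose(p * math.exp(-2*u) + (1 - p) * math.exp(2*u), 1.0)

# Wald's identity: z_bar*(2p - 1) = theta * E[tau*] recovers (25).
assert math.isclose(z_bar * (2*p - 1) / theta,
                    mean_sample_length(theta, z_bar, sigma))

# Continuity at theta = 0: tanh(u)/u -> 1.
assert math.isclose(mean_sample_length(1e-9, z_bar, sigma),
                    (z_bar / sigma) ** 2, rel_tol=1e-6)
```

Both checks are exact identities, so the assertions hold for any choice of θ ≠ 0, z̄ > 0 and σ > 0.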
Note also that {θZ_{τ∗} > 0} is the event that the bet chosen on stopping matches the true bias. The probability, if θ ≠ 0 is the true bias, of choosing the "correct" bet on stopping is given by

P^θ({θZ_{τ∗} > 0}) = 1/(1 + exp(−2|θ|z̄/σ²)), if θ ≠ 0,

which increases with ǫ. (To prove this equality, apply the optional stopping theorem to the P^θ-martingale e^{−2θZ_t/σ²}.)

The proof of Theorem 3.1 yields a closed-form expression for the value function associated with the optimal stopping problem. In particular, the value at time 0 satisfies (from (44) and (50)),

v₀ − r = 0 if ǫ ≥ 2r̂ − 1, and v₀ − r = (cσ²/(4α²))[1/(r̄_ℓ(1 − r̄_ℓ)) − 4/((1 + ǫ)(1 − ǫ))] if ǫ < 2r̂ − 1. (26)

Since the payoff r is the best available without learning, v₀ − r is the value of the learning option. It is positive for ǫ < 2r̂ − 1 and decreases as ǫ increases to the switch point. (Note that ǫ = 2r̂ − 1 implies that the bracketed term vanishes, and hence that v₀ is continuous at ǫ = 2r̂ − 1.) To illustrate, consider parameter values (c, σ, α) for which learning is rejected if ǫ = .05, while for ǫ = .04, τ∗ > 0 and Eτ∗ = .61 under P^{θ=0}. Neither of the values for ǫ is extreme: in the classic Ellsberg setting (with no learning), they imply probability equivalents for the bet on red equal to .475 and .48 for ǫ = .05 and ǫ = .04 respectively.

3.3 A robust sequential hypothesis test
DM samples the signal process Z with the objective of then choosing between the two statistical hypotheses

H₁: θ = θ₁ = 0 and H₂: θ = θ₂ = β,

where β > 0. The novelty relative to Arrow, Blackwell and Girschik (1949) and Peskir and Shiryaev (2006) is that there is prior ambiguity about the value of θ and a robust decision procedure is sought.

The following specialization of the general model is adopted. Let Θ = {0, β}. The actions a₁ and a₂ are accept H₁ and accept H₂, respectively. A third action is absent because there is no "outside option" - one of the hypotheses must be chosen. (Formally, one could include a₀ and specify its payoff below to be zero, in which case it would never be chosen.) The set of priors M is as given in (10), corresponding to the probability interval [m, m̄] for θ = β. Finally, payoffs are given by

u(a₁, 0) = u(a₂, β) = a + b, u(a₁, β) = b, u(a₂, 0) = a,

where a, b > 0. (Payoffs in this context are usually specified in terms of a loss function that is to be minimized. The loss function L satisfying L(a₁, 0) = L(a₂, β) = 0, L(a₁, β) = a, and L(a₂, 0) = b, gives an equivalent reformulation.)

There are two differences in specification from the Ellsberg context. First, there is no counterpart of the risky urn when choosing between hypotheses. Second, while symmetry between colors is natural in the Ellsberg context, symmetry between hypotheses is not; thus, b need not equal a and the probability interval [m, m̄] need not be symmetric about 1/2.

The optimal stopping problem (9) admits a closed-form solution. For perspective, consider first the special Bayesian case (M = {μ}, hence M_t = {μ_t}, μ_t(β) = m_t). Denote by r̃_ℓB < r̃_RB the solutions to (33), which in this context simplify to

l(r̃_RB) − l(r̃_ℓB) = (a + b)/ĉ,
1/(r̃_RB(1 − r̃_RB)) − 1/(r̃_ℓB(1 − r̃_ℓB)) = (b − a)/ĉ. (27)

Then we have the following classical result.

Theorem 3.2 (Peskir and Shiryaev 2006)
In the Bayesian case, for any prior probability m₀ it is optimal to continue at t if and only if

r̃_ℓB < m_t < r̃_RB. (28)

Otherwise, it is optimal to accept H₂ or H₁ according as m_t ≥ r̃_RB or m_t ≤ r̃_ℓB respectively.

In the model with ambiguity, the cut-off values are r̃_ℓ and r̃_R, r̃_ℓ < r̃_R, that solve the appropriate version of (33), and we have the following generalization of the classical result.

Theorem 3.3 In the model with ambiguity, it is optimal to stop and accept H₂ or H₁ according as m_t ≥ r̃_R or m̄_t ≤ r̃_ℓ respectively. Otherwise, it is optimal to continue.

In addition, if a = b, then

r̃_ℓB < r̃_ℓ and r̃_R < r̃_RB. (29)

Under the assumption of payoff symmetry (a = b), the theorem has noteworthy implications for the relation between the optimal stopping strategies for the Bayesian and the robustness-seeking DM. (We conjecture that (29) is valid even if a ≠ b, but a proof has escaped us.) Refer to a Bayesian agent whose prior lies in [m, m̄] as a compatible Bayesian. The theorem implies:

1. If every compatible Bayesian stops and chooses a_i, then it is optimal also for DM to stop and choose a_i, i = 1, 2.

2. DM may stop even though some compatible Bayesian would continue sampling; in this sense, "sensitivity analysis" overstates the robustness value of sampling.

The intuition is clear. Prior ambiguity leads to the signal structure being perceived as less likely to be informative (seen from the perspective of the worst-case measure P∗ - see the outline at the start of the proof of Theorem 4.2), even though the signal structure itself is not ambiguous. In contrast, there is no counterpart given multiple Bayesian agents - each is confident in beliefs about θ and is certain that signal increments are conditionally i.i.d. Only DM internalizes uncertainty about the probability law and discounts the benefits of learning accordingly.

Remark 3.4
As is made clear in Theorem 4.2, stopping conditions can be stated equivalently in terms of either the signal process (as in the Ellsberg model), or posteriors (as here). In the text, we have adopted the formulations that seem more natural for each particular setting. For example, the use of posteriors above facilitates comparison with the classical Bayesian result.

Remark 3.5
Time-consistency in the present context is closely related to the Stopping Rule Principle – that the stopping rule should have no effect on what is inferred from observed data and hence on the decision taken after stopping (Berger 1985). It is well-known that: (i) conventional frequentist methods, based on ex ante fixed sample size significance levels, violate this Principle and permit the analyst to sample to a foregone conclusion when data-dependent stopping rules are permitted; and (ii) Bayesian posterior odds analysis satisfies the Principle. Kadane, Schervish and Seidenfeld (1996) point to the law of iterated expectations as responsible for excluding foregone conclusions (if the prior is countably additive). Equation (5) is a nonlinear counterpart that we suspect plays a similar role in our model (though details are beyond the scope of this paper).
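Returning to the payoff specification of the testing problem above: the utility and loss formulations differ only by the affine shift u(a, θ) = (a + b) − L(a, θ), so the two formulations rank actions identically at every posterior. A minimal check (the numerical values of a and b are arbitrary illustrations; the paper only requires a, b > 0):

```python
a_pay, b_pay = 2.0, 3.0   # illustrative values for the payoff parameters a, b

# keys: (action, state); 'a1' accepts H1 (theta = 0), 'a2' accepts H2 (theta = beta)
u = {('a1', 0): a_pay + b_pay, ('a1', 'beta'): b_pay,
     ('a2', 'beta'): a_pay + b_pay, ('a2', 0): a_pay}
L = {('a1', 0): 0.0, ('a1', 'beta'): a_pay,
     ('a2', 'beta'): 0.0, ('a2', 0): b_pay}

# the affine-shift equivalence u = (a + b) - L, state by state
for key in u:
    assert u[key] == (a_pay + b_pay) - L[key]

# hence, at any posterior m on theta = beta, the two criteria agree
for m in (0.1, 0.5, 0.9):
    eu = {act: (1 - m) * u[(act, 0)] + m * u[(act, 'beta')] for act in ('a1', 'a2')}
    el = {act: (1 - m) * L[(act, 0)] + m * L[(act, 'beta')] for act in ('a1', 'a2')}
    assert max(eu, key=eu.get) == min(el, key=el.get)
```

Because the shift a + b is independent of both the action and the state, the same equivalence holds for worst-case (maxmin) evaluations over a set of posteriors.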
In order to condense notation, we write u_ij in place of u(a_i, θ_j), i, j = 1, 2, and u₀ for the payoff to the default action a₀. §4 considers either payoff symmetry (u₁₁ = u₂₂) or no risky option (u₀ ≤ min{u₁₂, u₂₁}). Payoff symmetry is satisfied in Theorem 3.1, but the latter assumes more, specifically ex ante indifference between a₁ and a₂ (m + m̄ = 1) and u₀ = (u₁₁ + u₂₁)/2. Thus it is extended below by Theorem 4.2(a). The assumption of no risky option is the crucial element in the hypothesis testing example, and the corresponding optimal stopping problem is isomorphic to that in part (b) of Theorem 4.2.

Both m_t and m̄_t defined in (12) are increasing functions of φ(t, z_t). It follows that there exists a unique pair of probabilities π and π̄ and a unique (deterministic) signal realization trajectory (z̃_t) satisfying, for every t,

π = m_t(z̃_t), π̄ = m̄_t(z̃_t), and π̄u₁₂ + (1 − π̄)u₁₁ = πu₂₂ + (1 − π)u₂₁.

For example, z̃ = 0, π = m and π̄ = m̄ if and only if a₁ and a₂ are indifferent ex ante. More generally, a₁ and a₂ are indifferent conditional on the signal z̃_t at t, and a₁ (a₂) is preferred at t if Z_t < (>) z̃_t.

Normalize the cost of learning to ĉ,

ĉ = 2cσ²/(θ₂ − θ₁)².

Optimal stopping strategies will be described in terms of several critical values, that are, in turn, defined using the functions l and l̃: For all r in (0, 1),

l(r) = 2 log(r/(1 − r)) − 1/r + 1/(1 − r),
l̃(r) = log(r/(1 − r)) + r/(1 − r).

Let (r_R, r̄_R), (r_ℓ, r̄_ℓ), (r̄_R, r̄_ℓ) and (r̃_R, r̃_ℓ) solve the following equations respectively:

l(r_R) − l(r̄_R) = (u₂₂ − u₂₁)/ĉ, l̃(r_R) − l̃(r̄_R) = (u₀ − u₂₁)/ĉ, (30)

l(r_ℓ) − l(r̄_ℓ) = −(u₁₁ − u₁₂)/ĉ, l̃(r̄_ℓ) − l̃(r_ℓ) = (u₁₁ − u₀)/ĉ, (31)

l(r̄_R) − l(π) = (u₂₂ − u₂₁)/ĉ, l(r̄_ℓ) − l(π̄) = −(u₁₁ − u₁₂)/ĉ, (32)

l(r̃_R) − l(π) = l(r̃_ℓ) − l(π̄) + (u₁₁ − u₁₂ + u₂₂ − u₂₁)/ĉ,
l̃(r̃_R) − l̃(π) − π(l(r̃_R) − l(π)) = l̃(r̃_ℓ) − l̃(π̄) − π̄(l(r̃_ℓ) − l(π̄)).
(33)

(The latter reduces to (32) if payoff symmetry is satisfied.) Define

u∗∗ = (ĉ/2)[1/(r̄_ℓ(1 − r̄_ℓ)) − 1/(π(1 − π))] + (u₁₁ + u₂₁)/2. (34)

Besides the existence and uniqueness assertions, the next lemma proves a number of properties that are important for the optimal stopping theorem to follow.

Lemma 4.1
There exist unique solutions to (32) and (33), and the solutions to the latter satisfy

r̃_ℓ < π̄, r̃_R > π. (35)

If u₀ ≥ u∗∗, then there exist unique solutions also to (30) and (31), and the solutions satisfy

r_ℓ < r̄_ℓ, r̄_R < r_R, π < r_R, r_ℓ < π̄.

If payoff symmetry is also satisfied, then:

π + π̄ = 1 = r̄_ℓ + r̄_R, and (36)

r̄_ℓ ≤ π̄ ⟺ r̄_R ≥ π ⟺ u₀ ≥ u∗∗. (37)

Define

f(t, r) = ((θ₁ + θ₂)/2)t + (σ²/(θ₂ − θ₁)) log(((1 − m)/m)(r/(1 − r))),
f̄(t, r) = ((θ₁ + θ₂)/2)t + (σ²/(θ₂ − θ₁)) log(((1 − m̄)/m̄)(r/(1 − r))).

Then m_t(f(t, r)) = r = m̄_t(f̄(t, r)), and, for any r and r′,

f(t, r) ≤ z̃_t ⟺ r ≤ π, (38)
f̄(t, r′) ≥ z̃_t ⟺ r′ ≥ π̄.

Finally, define three stopping times:

τ₁ ≡ min{t ≥ 0 : Z_t ≤ f̄(t, r_ℓ)} = min{t ≥ 0 : m̄_t ≤ r_ℓ},
τ₂ ≡ min{t ≥ 0 : Z_t ≥ f(t, r_R)} = min{t ≥ 0 : m_t ≥ r_R}, and
τ₀ ≡ min{t ≥ 0 : f̄(t, r̄_ℓ) ≤ Z_t ≤ f(t, r̄_R)} = min{t ≥ 0 : m̄_t ≥ r̄_ℓ and m_t ≤ r̄_R}.

Theorem 4.2 (a) Assume payoff symmetry (u₁₁ = u₂₂).
(a.i) If r̄_ℓ ≤ π̄, then the optimal stopping time τ∗ is given by

τ∗ = min{τ_i : i = 0, 1, 2}.

Moreover, if τ∗ = τ_i, then a_i is optimal on stopping. In particular, if there is ex ante indifference between a₁ and a₂ (π = m and π̄ = m̄), then τ∗ = 0 and a₀ is chosen.
(a.ii) If r̄_ℓ > π̄, then

τ∗ = min{t ≥ 0 : Z_t ≤ f̄(t, r̄_ℓ) or Z_t ≥ f(t, r̄_R)} = min{t ≥ 0 : m̄_t ≤ r̄_ℓ or m_t ≥ r̄_R}.

Moreover, a₁ is optimal on stopping if Z_{τ∗} ≤ f̄(τ∗, r̄_ℓ) (equivalently if m̄_{τ∗} ≤ r̄_ℓ), a₂ is optimal if Z_{τ∗} ≥ f(τ∗, r̄_R) (equivalently if m_{τ∗} ≥ r̄_R), and a₀ is never optimal.

(b) Assume u₀ ≤ min{u₁₂, u₂₁}. Then

τ∗ = min{t ≥ 0 : Z_t ≤ f̄(t, r̃_ℓ) or Z_t ≥ f(t, r̃_R)} = min{t ≥ 0 : m̄_t ≤ r̃_ℓ or m_t ≥ r̃_R}.

Moreover, a₁ is optimal on stopping if Z_{τ∗} ≤ f̄(τ∗, r̃_ℓ) (equivalently if m̄_{τ∗} ≤ r̃_ℓ), a₂ is optimal if Z_{τ∗} ≥ f(τ∗, r̃_R) (equivalently if m_{τ∗} ≥ r̃_R), and a₀ is never optimal.

In (a), the distinction between the two subcases depends on the relative magnitudes of r̄_ℓ and π̄.
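Since l is continuous and strictly increasing on (0, 1) with range (−∞, ∞), each of the threshold equations above pins down its critical value uniquely given the other, and the values can be computed by simple bisection. A numerical sketch, assuming the reconstructed formula l(r) = 2 log(r/(1 − r)) − 1/r + 1/(1 − r), and with π and the payoff gap chosen arbitrarily for illustration:

```python
import math

def l(r):
    # reconstructed: l(r) = 2*log(r/(1-r)) - 1/r + 1/(1-r),
    # strictly increasing on (0, 1) with l(0+) = -inf, l(1-) = +inf
    return 2 * math.log(r / (1 - r)) - 1 / r + 1 / (1 - r)

def l_inv(target, lo=1e-9, hi=1 - 1e-9):
    # bisection; valid because l is continuous and strictly increasing
    # (the surjectivity used in the existence part of Lemma 4.1)
    for _ in range(200):
        mid = (lo + hi) / 2
        if l(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# monotonicity spot-check on a grid
grid = [i / 100 for i in range(1, 100)]
assert all(l(x) < l(y) for x, y in zip(grid, grid[1:]))

# a (32)-type equation: given pi and a scaled payoff gap Delta, the unique
# solution r_R in (pi, 1) of l(r_R) - l(pi) = Delta is l_inv(l(pi) + Delta)
pi, Delta = 0.4, 3.0   # illustrative values only
r_R = l_inv(l(pi) + Delta)
assert r_R > pi
assert math.isclose(l(r_R) - l(pi), Delta, rel_tol=1e-6)
```

The same inversion, applied equation by equation, yields all of the critical values used in Theorem 4.2.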
From (31) it follows that r̄_ℓ falls as u₀ increases, while π̄ does not depend on u₀. Therefore, (a.i) applies if the payoff u₀ to the unambiguous default is sufficiently large. The other factor leading to (a.i) is large π̄, equivalently (by (36)) small π, which is supported by m̄ large and m small. Thus, (a.i) is supported also by large prior ambiguity.

In (a.i), τ∗ = 0 if either m̄₀ ≤ r_ℓ (prior beliefs are strongly biased towards θ₁ and hence a₁ is chosen immediately), or m₀ ≥ r_R (prior beliefs are strongly biased towards θ₂ and hence a₂ is chosen), or m̄₀ ≥ r̄_ℓ and m₀ ≤ r̄_R (the worst-case probabilities of both θ₁ and θ₂ are both sufficiently low that neither a₁ nor a₂ is attractive enough to justify the cost of sampling and hence a₀ is chosen). That leaves continuation being optimal at time 0 if and only if prior beliefs are "intermediate" in the sense that

either: [r_ℓ < m̄₀ < r̄_ℓ] and m₀ < r_R,
or: [r̄_R < m₀ < r_R] and m̄₀ > r_ℓ.

This continuation region could be empty. Since learning is only about the payoffs to a₁ and a₂, the situation at time 0 that is least favorable to learning is where there is ex ante indifference between a₁ and a₂ – then a long and hence costly sample would likely be needed to modify the ex ante ranking of actions. In this case, therefore, it is optimal to reject learning and choose a₀, as in Theorem 3.1. However, if, for example, a₁ is strictly preferred initially, then an incentive to learn is that a relatively short interval of sampling may be enough to settle the choice of action. In addition, if m₀ is sufficiently large, say near 1, then near certainty that θ = θ₂ can lead to rejection of learning and the immediate choice of a₂, rather than of a₀ as in the Ellsberg context.

In (a.ii), τ∗ = 0 iff [m₀, m̄₀] is disjoint from (r̄_ℓ, r̄_R). Notably, the default action is not chosen regardless of when sampling stops. Its payoff u₀ is too low (from (37), u₀ < u∗∗) compared to the expected payoff of choosing a₁ or a₂, possibly after some learning. Moreover, even given some learning, it is not optimal to choose a₀ regardless of the realized sample, as explained in discussion of Theorem 3.1. Under ex ante indifference, Lemma 4.1 implies that τ∗ > 0; given the symmetric roles of a₁ and a₂, a₀ is chosen if and only if there is no learning, thus generalizing the result in the Ellsberg model. (The latter also assumes u₀ = (u₁₁ + u₂₁)/2, which we see here is not needed for the preceding conclusion.)

Finally, consider (b), where the payoff to the unambiguous action is so low that it would never be chosen, regardless of prior beliefs and even in the absence of the option to learn. The optimal strategy is similar to that in (a.ii) in form and interpretation - only the critical values may differ to reflect the different assumptions about payoffs. Another comment about (b) is that when m = m̄, then π = π̄ and the equations (33) defining the critical values r̃_R and r̃_ℓ become

l(r̃_R) − l(r̃_ℓ) = (u₁₁ − u₁₂ + u₂₂ − u₂₁)/ĉ, l̃(r̃_R) − l̃(r̃_ℓ) = (u₁₁ − u₂₁)/ĉ,

which are equations (21.1.14) and (21.1.15) in Peskir and Shiryaev (2006).

Proof of the theorem is provided in the e-companion. Here we comment briefly on the proof strategy. The strategy is to: (i) guess the P∗ in P that is the worst-case scenario; (ii) solve the classical optimal stopping problem given the single prior P∗; (iii) show that the value function derived in (ii) is also the value function for our problem (9); and (iv) use the value function to derive τ∗.

The intuition for the conjectured P∗ was given in §3: P∗ should make P∗({dZ_t > 0} | Z_t > z̃_t) and P∗({dZ_t < 0} | Z_t < z̃_t) as small as possible, by using m_t when Z_t > z̃_t and m̄_t when Z_t < z̃_t. (See (41) for the precise definition of P∗.)
The search for the value function v begins with the HJB equation, which yields its functional form up to some constants to be determined by smooth contact conditions between v and the payoff function X (see Peskir and Shiryaev (2006) for this free-boundary approach to analysing optimal stopping problems). A new ingredient relative to existing models stems from the nature of P∗, specifically from the fact that the relevant posterior probability at t switches between m̄_t and m_t as described, implying that the form of the value function differs between the regions Z_t > z̃_t and Z_t < z̃_t. Thus, in addition to ensuring a smooth contact at stopping points, one must also be concerned with the smooth connection at z̃_t.

We elaborate on the latter point in order to highlight the technical novelty that arises from ambiguity. For concreteness consider (a.ii), where a₀ is never chosen. Let y denote a posterior probability, computed using m or m̄ depending on the sub-domain, and let V^R(y): [π, 1] → [0, +∞) and V^ℓ(y): [0, π̄] → [0, +∞) denote corresponding candidates for the value in the indicated regions. Then the variational inequality and smooth contact lead to the following free-boundary differential equation, in which r̄_R ∈ (π, 1) and r̄_ℓ ∈ [0, π̄) are also unknowns to be determined:

V^R_yy(y) = ĉ/(y²(1 − y)²), y ∈ (π, r̄_R),
V^R(r̄_R) = (u₂₂ − u₂₁)r̄_R + u₂₁,
V^R_y(r̄_R) = u₂₂ − u₂₁,
V^ℓ_yy(y) = ĉ/(y²(1 − y)²), y ∈ (r̄_ℓ, π̄),
V^ℓ(r̄_ℓ) = −(u₁₁ − u₁₂)r̄_ℓ + u₁₁,
V^ℓ_y(r̄_ℓ) = −(u₁₁ − u₁₂), (39)

and the (new) smooth contact conditions due to ambiguity (π < π̄):

V^R(π) = V^ℓ(π̄), V^R_y(π) = V^ℓ_y(π̄). (40)

In (a.ii), payoff symmetry leads to the simplification V^R_y(π) = V^ℓ_y(π̄) = 0, which leads to (32) becoming two separated equations. However, in (b), the connection is not trivial.
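The smooth-fit construction works because any function of the form V(y) = ĉ l̂(y) + C₁y + C₂, with l̂(y) = (2y − 1)log(y/(1 − y)), solves the continuation-region ODE V″(y) = ĉ/(y²(1 − y)²) appearing in (39) (as reconstructed); the boundary and connection conditions only pin down the constants. A finite-difference check with arbitrary illustrative constants:

```python
import math

c_hat, C1, C2 = 0.7, -1.3, 2.1   # arbitrary illustrative constants

def l_hat(y):
    # second antiderivative used in the smooth-fit construction (cf. (43))
    return (2 * y - 1) * math.log(y / (1 - y))

def V(y):
    # candidate value on a continuation sub-domain: c_hat*l_hat + linear part
    return c_hat * l_hat(y) + C1 * y + C2

# central second difference approximates V''; the free-boundary ODE requires
# V''(y) = c_hat / (y^2 (1-y)^2) regardless of C1, C2
h = 1e-4
for y in (0.2, 0.5, 0.8):
    num = (V(y + h) - 2 * V(y) + V(y - h)) / h**2
    assert math.isclose(num, c_hat / (y**2 * (1 - y)**2), rel_tol=1e-4)
```

Because the linear part C₁y + C₂ drops out of the second derivative, the ODE holds on each sub-domain separately; it is only the connection conditions (40) at the junction that tie the two sub-domains together.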
Below "almost surely" qualifications should be understood, even where not stated explicitly, and as defined relative to any measure in P.

To compute the payoff X_t defined in (16), note that

min_{μ∈M_t} ∫ u(a₁, θ)dμ = (u₁₁ − u₁₂)(1 − m̄_t) + u₁₂,
min_{μ∈M_t} ∫ u(a₂, θ)dμ = (u₂₂ − u₂₁)m_t + u₂₁.

There is a critical level of u₀, denoted u∗,

u∗ = (u₁₁u₂₂ − u₁₂u₂₁)/(u₁₁ + u₂₂ − u₁₂ − u₂₁).

If u₀ ≤ u∗, then

X_t = (u₁₁ − u₁₂)(1 − m̄_t) + u₁₂ if m̄_t < π̄; (u₂₂ − u₂₁)m_t + u₂₁ if m̄_t ≥ π̄.

Accordingly, the default action a₀ is not optimal at any t, and a₁ (a₂) is optimal conditional on stopping at t if m̄_t < π̄ (m̄_t ≥ π̄). If u₀ > u∗, then

X_t = (u₁₁ − u₁₂)(1 − m̄_t) + u₁₂ if m̄_t < (u₁₁ − u₀)/(u₁₁ − u₁₂);
      (u₂₂ − u₂₁)m_t + u₂₁ if m_t ≥ (u₀ − u₂₁)/(u₂₂ − u₂₁);
      u₀ otherwise,

reflecting the conditional optimality of a₁, a₂ and a₀ respectively in the three indicated regions.

As in §2, for any μ ∈ M, μ_t denotes its Bayesian posterior at t and θ̂^μ_t = ∫θ dμ_t is the corresponding posterior estimate of θ. The two extreme measures μ = μ̄, μ are defined by μ̄_t(θ₂) = m̄_t and μ_t(θ₂) = m_t, and yield the estimates θ̂^μ̄_t and θ̂^μ_t respectively. Let P∗ be the probability measure in P which has density generator process (η∗_t),

−η∗_t = (θ̂^μ̄_t/σ)1_{Z_t ≤ z̃_t} + (θ̂^μ_t/σ)1_{Z_t > z̃_t}. (41)

It will be shown that P∗ is the worst-case scenario in P.

Proof of (a.ii): Consider the classical optimal stopping problem under P∗,

max_τ E^{P∗}[X_τ − cτ]. (42)

Define g₁ and g₂ by, for 0 < y < 1, i = 1, 2,

g_i(y; C_{2i−1}, C_{2i}) = ĉ(2y −
1) log(y/(1 − y)) + C_{2i−1}y + C_{2i}, (43)

where the constants C_i (i = 1, 2, 3, 4) are determined by smooth-contact conditions. We conjecture that the value function for (42) has the form:

v(t, z) = (u₁₁ − u₁₂)(1 − m̄_t(z)) + u₁₂ if z < f̄(t, r̄_ℓ);
          g₁(m̄_t(z); C₁, C₂) if f̄(t, r̄_ℓ) ≤ z < z̃_t;
          g₂(m_t(z); C₃, C₄) if z̃_t ≤ z < f(t, r̄_R);
          (u₂₂ − u₂₁)m_t(z) + u₂₁ if f(t, r̄_R) ≤ z, (44)

where

C₁ = −ĉl(π̄), C₃ = −ĉl(π),
C₂ = (u₁₁ − u₁₂)(1 − r̄_ℓ) + u₁₂ − ĉ[(2r̄_ℓ − 1)log(r̄_ℓ/(1 − r̄_ℓ)) − l(π̄)r̄_ℓ],
C₄ = (u₂₂ − u₂₁)r̄_R + u₂₁ − ĉ[(2r̄_R − 1)log(r̄_R/(1 − r̄_R)) − l(π)r̄_R].

(Note that the cut-off value u∗∗ defined in (34) satisfies u∗∗ = g₁(π̄; C₁, C₂) = g₂(π; C₃, C₄) = v(t, z̃_t).)

Lemma 5.1 v is the value function of the classical optimal stopping problem (42), i.e., for any t ≥ 0,

v(t, z) = max_{τ≥t} E^{P∗}[X_τ − c(τ − t) | Z_t = z].

Further, v satisfies the HJB equation

max{X(t, z) − v(t, z), −c + v_t(t, z) + ½σ²v_zz(t, z) + f(t, z)v_z(t, z)} = 0, (45)

where f(t, z) denotes the worst-case drift,

f(t, z) ≡ θ̂^μ̄_t(z)1_{z<z̃_t} + θ̂^μ_t(z)1_{z≥z̃_t}, (46)

the posterior estimates being computed from the extreme priors m̄ and m via the likelihood ratio φ(t, z). Finally, v also satisfies, ∀z ∈ (f̄(t, r̄_ℓ), f(t, r̄_R)),

−c + v_t(t, z) + ½σ²v_zz(t, z) + f(t, z)v_z(t, z) = 0. (47)

For the proof, first verify that v satisfies the HJB equation (45), and then apply El Karoui et al. (1997, Theorems 8.5, 8.6). Alternatively, a proof can be constructed along the lines of Peskir and Shiryaev (2006, Ch. 6).

Next prove that v is the value function of the (nonclassical) optimal stopping problem (9) (solving the HJB equation is not sufficient to imply this). We consider only t = 0 and prove

v(0, z) = max_{τ≥0} min_{P∈P} E^P[X(Z_τ) − cτ].

By Lemma 5.1,

v(0, z) = max_{τ≥0} E^{P∗}[X(Z_τ) − cτ] ≥ max_{τ≥0} min_{P∈P} E^P[X(Z_τ) − cτ].

To prove the opposite inequality, consider the stopping time

τ∗ = inf{t ≥ 0 : Z_t ≤ f̄(t, r̄_ℓ) or Z_t ≥ f(t, r̄_R)}.
For t ≤ τ∗, by Ito's formula, (45), and (47),

dv(t, Z_t) = [v_t(t, Z_t) + ½σ²v_zz(t, Z_t)]dt + v_z(t, Z_t)dZ_t (48)
           = [c − f(t, Z_t)v_z(t, Z_t)]dt + v_z(t, Z_t)dZ_t.

Each P = P^η ∈ P corresponds to a density generator process (η_t), and (W^η_t) is a Brownian motion under P^η, where

W^η_t = (1/σ)Z_t − (1/σ)∫₀ᵗ f̃(s, Z_s, η_s)ds,

and f̃(t, Z_t, η_t) ≡ −ση_t is the drift of Z under P^η, the analogue of (46) with the η-induced posterior in place of the worst-case posterior; in particular, f̃(t, Z_t, η∗_t) = f(t, Z_t). Therefore,

dv(t, Z_t) = [c + (f̃(t, Z_t, η_t) − f(t, Z_t))v_z(t, Z_t)]dt + σv_z(t, Z_t)dW^η_t.

Note that (f̃(t, Z_t, η_t) − f(t, Z_t))v_z(t, Z_t) ≥ 0. (Suppose Z_t < z̃_t. Then v_z(t, Z_t) ≤ 0 and f̃(t, Z_t, η_t) − f(t, Z_t) ≤ 0, the latter because the posterior drift estimate is increasing in the prior probability m of θ₂, and the worst case uses the largest such probability m̄ on this region. Argue similarly for Z_t > z̃_t.) Take expectation above under P^η to obtain

v(0, z) ≤ E^{P^η}[v(τ∗, Z_{τ∗}) − cτ∗] = E^{P^η}[X_{τ∗} − cτ∗].

The above inequality is due to

E^{P^η}[∫₀^{τ∗} σv_z(t, Z_t)dW^η_t] = 0,

which is guaranteed by

max_{P∈P} E^P[τ∗] < ∞; (49)

see Peskir and Shiryaev (2006, Theorem 21.1) for the classical case. In our setting, (49) is implied by the boundedness of X_t: since −∞ < max_{τ≥0} min_{P∈P} E^P(X_τ − cτ) and X is bounded, it follows that max_{P∈P} E^P[τ∗] < ∞.

Finally, because P^η can be any measure in P, deduce that

v(0, z) ≤ min_{P∈P} E^P[X_{τ∗} − cτ∗] ≤ max_{τ≥0} min_{P∈P} E^P[X_τ − cτ].

Conclude that v is the value function for our optimal stopping problem and that τ∗ is the optimal stopping time.
The preceding implies that P∗ is indeed the minimizing measure because the minimax property is satisfied:

max_{τ≥0} E^{P∗}[X_τ − cτ] = max_{τ≥0} min_{P∈P} E^P[X_τ − cτ]
≤ min_{P∈P} max_{τ≥0} E^P[X_τ − cτ] ≤ max_{τ≥0} E^{P∗}[X_τ − cτ]

⟹ min_{P∈P} max_{τ≥0} E^P[X_τ − cτ] = max_{τ≥0} min_{P∈P} E^P[X_τ − cτ].

Proof of (a.i):
The proof is similar to that of (a.ii). The only difference is that the value function v is given by

v(t, z) = (u₁₁ − u₁₂)(1 − m̄_t(z)) + u₁₂ if z < f̄(t, r_ℓ);
          ḡ₁(m̄_t(z); C̄₁, C̄₂) if f̄(t, r_ℓ) ≤ z < f̄(t, r̄_ℓ);
          u₀ if f̄(t, r̄_ℓ) ≤ z < f(t, r̄_R);
          ḡ₂(m_t(z); C̄₃, C̄₄) if f(t, r̄_R) ≤ z < f(t, r_R);
          (u₂₂ − u₂₁)m_t(z) + u₂₁ if f(t, r_R) ≤ z. (50)

Here ḡ₁ and ḡ₂ are identical to g₁ and g₂ (defined in (43)) respectively, except that the constants C₁, ..., C₄ are replaced respectively by C̄₁, ..., C̄₄ given by

C̄₁ = −ĉl(r̄_ℓ), C̄₃ = −ĉl(r̄_R),
C̄₂ = u₀ − ĉ[(2r̄_ℓ − 1)log(r̄_ℓ/(1 − r̄_ℓ)) − l(r̄_ℓ)r̄_ℓ],
C̄₄ = u₀ − ĉ[(2r̄_R − 1)log(r̄_R/(1 − r̄_R)) − l(r̄_R)r̄_R].

Proof of (b): Since it is never optimal to choose a₀, we can delete it from the set of feasible actions. The proof proceeds as in (a.ii), though we define

v(t, z) = (u₁₁ − u₁₂)(1 − m̄_t(z)) + u₁₂ if z < f̄(t, r̃_ℓ);
          g̃₁(m̄_t(z); C̃₁, C̃₂) if f̄(t, r̃_ℓ) ≤ z < z̃_t;
          g̃₂(m_t(z); C̃₃, C̃₄) if z̃_t ≤ z < f(t, r̃_R);
          (u₂₂ − u₂₁)m_t(z) + u₂₁ if f(t, r̃_R) ≤ z,

where g̃₁ and g̃₂ are identical to g₁ and g₂ (defined in (43)) respectively, except that the constants C₁, ..., C₄ are replaced respectively by C̃₁, ..., C̃₄ given by

C̃₁ = −ĉl(r̃_ℓ) + u₁₂ − u₁₁, C̃₃ = −ĉl(r̃_R) + u₂₂ − u₂₁,
C̃₂ = u₁₁ − ĉ[1 − l̃(r̃_ℓ)], C̃₄ = u₂₁ − ĉ[1 − l̃(r̃_R)].
1) log( r − r ). We prove the existence and uniqueness ofsolutions to the following equations: (32) : Follows from l : (0 , → ( −∞ , ∞ ) being surjective, continuous andstrictly increasing. (33) : Adapt the argument in Peskir and Shiryaev (2006, p. 290) used fora classical optimal stopping problem, generalized here to our context withambiguity. For fixed ˆ r l ∈ (0 , π ), consider the following equation for V l ( y ): V l ( y ) = ˆ c ˆ l ( y ) + ˆ C y + ˆ C V ly ( y ) = ˆ cl ( y ) + ˆ C V l (ˆ r l ) = − ( u − u )ˆ r l + u V ly (ˆ r l ) = u − u , (51)where y ∈ (0 ,
1) and ˆ C , ˆ C are constants to be determined. The solution is V l ( y ) = ˆ c ˆ l ( y ) − ( u − u + ˆ cl (ˆ r l )) y + u + ˆ c (ˆ r l l (ˆ r l ) − ˆ l (ˆ r l )).29ecause V l ( y ) depends on ˆ r l , we denote the solution by V l ( y ; ˆ r l ). If V l ( π ; ˆ r l )
1) and ˆ C , ˆ C are constants to be determined. The solution is V R ( y ) = ˆ c ˆ l ( y ) + ( V ly ( π ; ˆ r l ) − ˆ cl ( π )) y + V l ( π ; ˆ r l ) + ˆ c ( πl ( π ) − ˆ l ( π )) − πV ly ( π ; ˆ r l ).Denote the solution by V R ( y ; ˆ r l ). Since ˆ l ′′ ( y ) = l ′ ( y ) > y ∈ (0 , V l ( y ; ˆ r l ) and V R ( y ; ˆ r l ) are strictly convex functions. Recallthat π = m t ( e z t ), π = m t ( e z t ) and π ( u − u )+ u = (1 − π ) ( u − u )+ u .Then, V R ( π ) = V l ( π ; ˆ r l ) implies that the function y V R ( y ; ˆ r l ) intersects y ( u − u ) y + u for some y ∈ ( π,
1) when ˆ r l is close to π . Let y = ˆ y l satify V l ( y ; ˆ r l ) = u . Then, ˆ y l ↓ r l ↓ r l from π down to 0 and applying the properties estab-lished above, we obtain the existence of a unique point ˆ r l ∗ ∈ (0 , π ) for whichthere exists ˆ r R ∗ ∈ ( π,
1) such that V R (ˆ r R ∗ ; ˆ r l ∗ ) = ( u − u )ˆ r R ∗ + u (53) V Ry (ˆ r R ∗ ; ˆ r l ∗ ) = u − u .Combining (51), (52) and (53), we can verify that (ˆ r R ∗ , ˆ r l ∗ ) is a solution of(33). Note that each step of the derivation is reversible. Thus, there existsa unique solution ( e r R , e r l ) for (33). Inequalities (35) follow directly fromconstruction of the solution. (31) and (30) : By the definition of u ∗∗ and equation (32), it is easy tocheck that u ∗∗ > u . Set ˆ y = u − u ∗∗ u − u . Define the following payoff function V ( y ) = (cid:26) − ( u − u ) y + u if y ∈ (0 , ˆ y ); u ∗∗ if y ∈ (ˆ y, r l , r l ) for (31). The proof for (30) is similar.It is obvious that r l < r l and r R < r R due to l being strictly increasing.Turn to the remainder of the lemma (we skip the most obvious as-sertions). Given payoff symmetry, the definitions of π and π imply that π + π = 1. Then r l + r R = 1 follows from (32) and l ( r ) + l (1 − r ) = 0.30 rove (37) : Verify that l ( r ) = e l ( r ) − r (1 − r ) + 1 and rewrite (30) as˜ l ( r R ) − ˜ l ( r R ) = r R (1 − r R ) − r R (1 − r R ) + u − u ˆ c ˜ l ( r R ) − ˜ l ( r R ) = u − u ˆ c .If u = u ∗∗ , then, using payoff symmetry, we can verify that r R = r R , r R = π is the unique solution of (30). Next we prove that the solution r R of (30) is increasing with respect to u . Note that l ′ ( r ) = r (1 − r ) and˜ l ′ ( r ) = r (1 − r ) . From (30), derive l ′ ( r R ) dr R dr R − l ′ ( r R ) = 0˜ l ′ ( r R ) dr R dr R dr R du − ˜ l ( r R ) dr R du = 1ˆ c .Thus, dr R du = ( r R ) (1 − r R ) ˆ c ( r R − r R ) > r R ≥ π ⇐⇒ u ≥ u ∗∗ . Similarly, we can prove that r l ≤ π ⇐⇒ u ≥ u ∗∗ . (cid:4) Proof of Theorem 3.1 (Ellsberg): (i) Compute that ˆ c = cσ α , e z t = 0, π = − ǫ , π = ǫ . 
Equations (30) and (31) simplify to

r_R + r̄_R = 1, l(r_R) = α²/(cσ²); r_ℓ + r̄_ℓ = 1, l(r̄_ℓ) = α²/(cσ²),

(which exploit the fact that u₀ = (u₁₁ + u₂₁)/2), and the functions f̄ and f become

f̄(t, r) = (σ²/(2α))log(((1 − ǫ)/(1 + ǫ))(r/(1 − r))),
f(t, r) = (σ²/(2α))log(((1 + ǫ)/(1 − ǫ))(r/(1 − r))).

If r̄_ℓ ≤ (1 + ǫ)/2, then f̄(t, r̄_ℓ) ≤ 0 ≤ f(t, r̄_R). By Theorem 4.2(a.i), the signal Z₀ = 0 falls in the stopping region, which leads to τ∗ = 0. This proves (i) with r̂ = r̄_ℓ.

(ii) Equation (32) becomes

r̄_R + r̄_ℓ = 1, l(r̄_R) + l((1 + ǫ)/2) = 2α²/(cσ²),

and

z̄ ≡ f(t, r̄_R) = −f̄(t, r̄_ℓ) = (σ²/(2α))[log((1 + ǫ)/(1 − ǫ)) + log(r̄_R/(1 − r̄_R))].

By Theorem 4.2(a.ii), τ∗ = min{t ≥ 0 : |Z_t| ≥ z̄}.

Let z̲ be given by

z̲ = (σ²/(2α))log((1 + ǫ)/(1 − ǫ)) < z̄.

It follows from (16) and (11) that at any given t, not necessarily an optimal stopping time, betting on the ambiguous urn is preferred to betting on the risky urn iff |Z_t| ≥ z̲. Thus at τ∗ > 0, |Z_{τ∗}| = z̄ > z̲, and betting on the ambiguous urn is optimal on stopping.

Finally, we show that z̄ is increasing in ǫ: l′(r) = 1/(r²(1 − r)²) implies

dz̄/dǫ > 0 iff ((1 + ǫ)/2)((1 − ǫ)/2) > r̄_R(1 − r̄_R).

But ǫ < 2r̂ − 1 implies (1 + ǫ)/2 < r̂, which by (32) gives r̄_R > (1 + ǫ)/2 > 1/2; since x(1 − x) is decreasing for x > 1/2, the last inequality holds. □
References

[1] Arrow KJ, Blackwell D, Girshick MA (1949) Bayes and minimax solutions of sequential decision problems. Econometrica 17:213-244.
[2] Uncertainty and Expectations in Economics (Basil Blackwell, Oxford), 1-11.
[3] Berger JO (1984) The robust Bayesian viewpoint. Kadane J, ed. Robustness in Bayesian Statistics (North Holland, Amsterdam), 63-124.
[4] Berger JO (1985) Statistical Decision Theory and Bayesian Analysis (Springer, New York).
[5] Berger JO (1994) An overview of robust Bayesian analysis (with discussion). Test 3:5-124.
[6] Borodin AN, Salminen P (2015) Handbook of Brownian Motion–Facts and Formulae, 2nd ed. (Birkhauser, Basel).
[7] Caro F, Das Gupta A (2015) Robust control of the multi-armed bandit problem. Ann. Oper. Res. https://doi.org/10.1007/s10479-015-1965-7.
[8] Chen Z, Epstein LG (2002) Ambiguity, risk and asset returns in continuous time. Econometrica 70(4):1403-1443.
[9] Cheng X, Riedel F (2013) Optimal stopping under ambiguity in continuous time. Math. Finan. Econom. 7:29-68.
[10] Chernoff H (1961) Sequential tests for the mean of a normal distribution. Proc. Fourth Berkeley Symp. on Math. Statist. and Probab., vol 1 (U. California Press, Berkeley), 79-91.
[11] Choi H (2016) Learning under ambiguity: portfolio choice and asset returns. Working Paper, City University of Hong Kong.
[12] Ebert S, Strack P (2018) Never, ever getting started: on prospect theory without commitment. https://papers.ssrn.com/sol3/papers.cfm?abstract id=2765550.
[13] El Karoui N, Kapoudjian C, Pardoux E, Peng S, Quenez MC (1997) Reflected solutions of backward SDE's and related obstacle problems for PDE's. Ann. Probab. 25(2):702-737.
[14] Ellsberg D (1961) Risk, ambiguity, and the Savage axioms. Quart. J. Econom. 75(4):643-669.
[15] Epstein LG, Schneider M (2003) Recursive multiple-priors. J. Econom. Theory 113(1):1-31.
[16] Epstein LG, Schneider M (2007) Learning under ambiguity. Rev. Econom. Stud. 74(4):1275-1303.
[17] Epstein LG, Schneider M (2008) Ambiguity, information quality and asset pricing. J. Finan. 63(1):197-228.
[18] Epstein LG, Schneider M (2010) Ambiguity and asset markets. Ann. Rev. Finan. Econom. 2:315-346.
[19] Fudenberg D, Strack P, Strzalecki T (2018) Speed, accuracy, and the optimal timing of choices. Amer. Econom. Rev. 108(12):3651-3684.
[20] Gilboa I (2009) Theory of Decision under Uncertainty (Cambridge U. Press, New York).
[21] Gilboa I (2015) Rationality and the Bayesian paradigm. J. Econom. Method.
[22] Synthese.
[23] Gilboa I, Schmeidler D (1989) Maxmin expected utility with non-unique prior. J. Math. Econom. 18:141-153.
[24] Proc. IEEE.
[25] Ann. Math. Statist.
[26] Kadane JB, Schervish MJ, Seidenfeld T (1996) Reasoning to a foregone conclusion. JASA 91:1228-1235.
[27] J. Math. Econom.
[29] Liptser RS, Shiryaev AN, Statistics of Random Processes I: General Theory (Springer, Berlin).
[30] Marinacci M (2002) Learning from ambiguous urns. Statist. Papers.
[31] Ann. Econom. Finan.
[33] Stochastic Differential Equations.
[34] Peskir G, Shiryaev A (2006) Optimal Stopping and Free-Boundary Problems (Springer, Berlin).
[35] Rios-Insua D, Ruggeri F (2000) Robust Bayesian Analysis (Springer, New York).
[36] Shapiro A (2016) Rectangular sets of probability measures. Oper. Res.
[37] Shiryaev AN (1978) Optimal Stopping Rules (Springer, New York).
[38] Games Econom. Behav. 79:44-55.
[39] Wald A (1945) Sequential tests of statistical hypotheses. Ann. Math. Statist. 16:117-186.
[40] Wald A (1947) Sequential Analysis (Wiley, New York).
[41] Walley P (1991) Statistical Reasoning with Imprecise Probabilities (Chapman and Hall, London).
[42] Zhang J (2017)