Mislearning from Censored Data: The Gambler's Fallacy in Optimal-Stopping Problems

Abstract

I study endogenous learning dynamics for people expecting systematic reversals from random sequences - the "gambler's fallacy." Biased agents face an optimal-stopping problem. They are uncertain about the underlying distribution and learn its parameters from predecessors. Agents stop when early draws are "good enough," so predecessors' experiences contain negative streaks but not positive streaks. Since biased agents understate the likelihood of consecutive below-average draws, society converges to over-pessimistic beliefs about the distribution's mean and stops too early. Agents uncertain about the distribution's variance overestimate it to an extent that depends on predecessors' stopping thresholds. Subsidizing search partially mitigates long-run belief distortions.


Mislearning from Censored Data: The Gambler’s Fallacy in Optimal-Stopping Problems

Kevin He∗
First version: March 21, 2018
This version: August 15, 2019

Abstract

I study endogenous learning dynamics for people expecting systematic reversals from random sequences: the “gambler’s fallacy.” Biased agents face an optimal-stopping problem, such as managers conducting sequential interviews. They are uncertain about the underlying distribution (e.g., talent distribution in the labor pool) and learn its parameters from their predecessors. Agents stop when early draws are deemed “good enough,” so predecessors’ experiences contain negative streaks but not positive streaks. Since biased agents understate the likelihood of consecutive below-average draws, society converges to over-pessimistic beliefs about the distribution’s mean. When early agents decrease their acceptance thresholds due to pessimism, later agents will become more surprised by the lack of positive reversals in their predecessors’ histories, leading to more pessimistic inferences and lower acceptance thresholds: a positive-feedback cycle. Agents who are additionally uncertain about the distribution’s variance believe in fictitious variation (exaggerated variance) to an extent depending on the severity of data censoring.

∗ California Institute of Technology and University of Pennsylvania. Email: [email protected]. I am indebted to Drew Fudenberg, Matthew Rabin, Tomasz Strzalecki, and Ben Golub for their guidance and support. I thank Isaiah Andrews, Ruiqing Cao, In-Koo Cho, Martin Cripps, Krishna Dasaratha, Jetlir Duraj, Ben Enke, Ignacio Esponda, Jiacheng Feng, Mira Frick, Tristan Gagnon-Bartsch, Ashvin Gandhi, Oliver Hart, Johannes Hörner, Alice Hsiaw, Ryota Iijima, Yuhta Ishii, Lawrence Jin, Yizhou Jin, Michihiro Kandori, Max Kasy, Shengwu Li, Jonathan Libgober, Matthew Lilley, George Mailath, Eric Maskin, Weicheng Min, Xiaosheng Mu, Andy Newman, Harry Pei, Joshua Schwartzstein, Roberto Serrano, Philipp Strack, Elie Tamer, Omer Tamuz, Michael Thaler, Linh T. Tô, Maria Voronina, Yuichi Yamamoto, and my seminar participants for their insightful comments.

arXiv preprint [q-fin.EC]

1 Introduction

The gambler’s fallacy is widespread. Many people believe that a fair coin has a higher chance of landing on tails after landing on heads three times in a row, think a son is “due” to a woman who has given birth to consecutive daughters, and, in general, expect too much reversal from sequential realizations of independent random events. Studies have documented this bias in settings where it is strictly costly, such as state lotteries with pari-mutuel payouts (Terrell, 1994; Suetens, Galbo-Jørgensen, and Tyran, 2016) and incentivized lab experiments (Benjamin, Moore, and Rabin, 2017). The same bias also affects experienced decision-makers in high-stakes environments, including immigration judges (Chen, Moskowitz, and Shue, 2016). Section 1.3 surveys more of this empirical literature.

This paper highlights novel implications of the gambler’s fallacy in optimal-stopping problems when a society of biased agents learns about the underlying distributions. As a running example, consider a junior HR manager who sequentially interviews candidates for a single job opening. In deciding whether to hire a candidate or to keep searching, the junior manager must form a belief about the distribution of potential future applicants should she keep the position open. She consults with senior managers and adopts their belief about the labor pool based on their recruiting experience for similar positions in the past. The junior manager then implements a stopping strategy for her own recruiting problem, updates her belief at the end of the hiring season, and shares this new belief with future managers. Suppose all managers commit the gambler’s fallacy; that is, they exaggerate how unlikely it is to get consecutive above-average or consecutive below-average applicants (relative to the labor pool mean). This error stems from the same psychology that leads people to exaggerate how unlikely it is to get consecutive heads or consecutive tails when tossing a fair coin.
How does this bias influence the managers’ beliefs and behavior over time?

In this example and other natural optimal-stopping problems, agents tend to stop when early draws are deemed “good enough,” leading to an asymmetric truncation of experience. When a manager discovers a sufficiently strong candidate early in the hiring cycle, she stops her recruitment efforts and does not observe what additional candidates would have been found for the same job opening with a longer search. This endogenous censoring effect on histories interacts with the gambler’s fallacy bias and leads to pessimistic inference about the labor pool. Managers continue searching only when their early candidates are below-average. They misinterpret subsequent above-average candidates as the expected positive reversal after bad initial outcomes, not as strong signals about the labor pool. On the other hand, they are surprised by subsequent below-average candidates since they understate the likelihood of bad streaks, misreading consecutive bad draws as very strong negative signals about the pool. That is, after bad early draws, managers under-infer from subsequent good draws but over-infer from subsequent bad draws. On average, they communicate an over-pessimistic impression of the labor pool to today’s junior manager. This pessimism informs the junior manager’s stopping strategy and affects the kind of censored history she observes and the new belief she communicates to future managers.

This paper examines the endogenous learning dynamics of a society of agents believing in the gambler’s fallacy. All agents face a common stage game: an optimal-stopping problem with draws in different periods independently generated from fixed yet unknown distributions. They take turns playing the stage game, with each agent’s payoff determined by the game’s outcome. Agents are Bayesians except for the statistical bias.
That is, they start with a prior belief supported on a class of feasible models about the joint distribution of draws. Feasible models are symmetric, log-concave distributions indexed by different unconditional means (the fundamentals). I study the gambler’s fallacy as a misspecified prior: all feasible models specify that better earlier draws tend to lead to worse later draws, and vice versa. The feasible models exclude the true distribution where draws are independent, so agents undertake misspecified Bayesian learning.

I consider two social-learning environments. In the first environment, agents play the stage game one at a time. Before playing her own game, each agent adopts the final belief of her immediate predecessor as her prior belief and formulates a stopping strategy. At the end of her game, she updates her belief about the fundamentals by applying Bayes’ rule to her stage-game history, then passes on her posterior belief to her successor. I show that the stochastic processes of the agents’ beliefs and behavior almost surely converge to a unique steady state in which agents are over-pessimistic about the fundamentals and stop too early relative to the objectively optimal strategy.

In the second environment, agents arrive in large generations with everyone in the same generation playing simultaneously after observing all predecessors’ histories. Society converges to the same steady state as the previous environment. This large-generations model illustrates a positive-feedback cycle between distorted beliefs and distorted stopping strategies. More severely censored datasets lead to more pessimistic beliefs, while more pessimistic beliefs lead to earlier stopping and, as a consequence, heavier history censoring. Mapping back to the recruiting example, suppose a firm appoints HR managers in cohorts. Upon arrival, each junior manager learns the recruiting experience of all previous managers.
If managers in the first cohort start with the correct stopping strategy, then the average hiring outcome deteriorates monotonically across all future cohorts. After today’s cohort observes predecessors’ histories and makes an over-pessimistic inference, this belief leads them to act less “choosy” and only keep searching if their early candidates prove to be truly unsatisfactory. On average, early applicants rejected by today’s managers are worse than the [...]

Footnote: Mueller, Spinnewijn, and Topa (2018) find evidence consistent with people exhibiting the gambler’s fallacy in an optimal-stopping problem. They show job seekers’ beliefs about the probability of finding a job in the near future increase significantly over the course of the unemployment spell, after controlling for individual fixed effects. These beliefs contrast with theories that predict decreasing job-finding rates (e.g., human capital depreciation) and with the authors’ structural estimation that suggests constant rates. One application of my work is studying how a society of such biased job seekers makes inferences about job-finding rates from others’ job-search experience.

[...] fictitious variation both depends on severity of history censoring and influences the managers’ stopping strategy. I derive two results that illustrate how this belief in fictitious variation interacts with endogenous learning. First, when the stage-game payoff function is convex in draws (such as when previously rejected candidates can be recalled with some probability in the sequential interviewing game), the positive-feedback cycle of the baseline environment strengthens. More severely censored histories not only make agents more pessimistic about the fundamentals by the usual censoring effect, but also decrease their belief in fictitious variation. Both forces encourage earlier stopping due to the convexity of the optimal-stopping problem, so subsequent agents will face even heavier data censoring.
Second, a society where agents are uncertain about the variances can end up with a different long-run belief about the means than another society where agents know the correct variances. This is despite the fact that agents in both societies would make the same (mis)inference about the means given the same data.

I study a number of extensions in the Online Appendix, showing robustness of the results to a range of alternative specifications. The paper focuses on (misspecified) Bayesian agents, but the over-pessimism result and the positive-feedback loop continue to obtain under a non-Bayesian method-of-moments inference procedure (Online Appendix OA 5). For simplicity I consider a two-period optimal-stopping problem as the stage game, but the combination of the gambler’s fallacy and history truncation after good outcomes still produces over-pessimistic inferences in stage games of arbitrary length (Online Appendix OA 2). I assume all agents have the gambler’s fallacy. The presence of a subpopulation of unbiased agents or agents suffering from additional behavioral biases may mitigate the extent of over-pessimism (Online Appendix OA 8.2), but does not eliminate it.
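The pessimism mechanism described in this introduction can be illustrated with a small simulation. The sketch below is not part of the formal analysis. It assumes a Gaussian version of the model (formalized later in Example 2), a fixed cutoff rule, and agents who know the first-period mean and estimate the second-period mean by maximum likelihood under the biased model. Under that model, x2 + gamma * (x1 - mu1) is centered at the second-period mean, so the estimate is the sample average of this quantity over uncensored histories. All function names and parameter values are hypothetical.

```python
import math
import random


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def biased_estimate_mu2(cutoff, gamma, mu1=0.0, mu2=0.0, sigma=1.0,
                        n_agents=200_000, seed=0):
    """Estimate of the second-period mean formed by gambler's-fallacy agents.

    Assumption of this sketch: every predecessor stops iff x1 > cutoff, and
    the learner knows mu1. Draws are objectively independent, but the learner
    believes X2 | X1 = x1 has mean mu2 - gamma * (x1 - mu1).
    """
    rng = random.Random(seed)
    total, n_continued = 0.0, 0
    for _ in range(n_agents):
        x1 = rng.gauss(mu1, sigma)
        if x1 > cutoff:
            continue  # predecessor stopped: the second draw is censored
        x2 = rng.gauss(mu2, sigma)  # true second draw, independent of x1
        total += x2 + gamma * (x1 - mu1)  # "luck-corrected" observation
        n_continued += 1
    return total / n_continued


def large_sample_limit(cutoff, gamma, mu1=0.0, mu2=0.0, sigma=1.0):
    """Closed form for the same estimate with infinitely many histories."""
    a = (cutoff - mu1) / sigma
    # mu1 - E[X1 | X1 <= cutoff] for a Gaussian first draw
    truncated_gap = sigma * normal_pdf(a) / normal_cdf(a)
    return mu2 - gamma * truncated_gap


if __name__ == "__main__":
    for c in (0.0, -1.0):
        print(c, biased_estimate_mu2(c, gamma=0.5), large_sample_limit(c, gamma=0.5))
```

With a true second-period mean of 0, gamma = 0.5, and a cutoff at the first-period mean, the estimate is roughly -0.4, and lowering the cutoff makes it more negative. This is the sense in which heavier censoring produces more pessimistic beliefs. Setting gamma = 0 removes the distortion, even though the data remain censored.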

This work contributes to two strands of literature: the behavioral economics literature on inference mistakes for biased learners, and the theoretical literature on the dynamics of misspecified endogenous learning.

As a contribution to behavioral economics, I highlight a novel channel of misinference for behavioral agents: the interaction between psychological bias and data censoring. In many natural environments, agents learn from censored data. The economics literature has recently focused on the learning implications of selection neglect in these settings, where agents act as if their dataset is not censored. This work points out that other well-documented behavioral biases can also interact with data censoring to produce new implications. Mislearning stems precisely from this interaction, not from either censored data or the gambler’s fallacy alone. Agents who do not suffer from the statistical bias learn the fundamentals correctly even from censored histories. On the other hand, if we removed censoring by having agents observe ex-post what would have been drawn in each period of the optimal-stopping problem, then even biased agents would learn the fundamentals correctly. The intuition is that the gambler’s fallacy is a “symmetric” bias. The “asymmetric” outcome of over-pessimism only occurs when the bias interacts with an (endogenous) asymmetric censoring mechanism that tends to produce data containing negative streaks but not positive streaks. Environments that feature different censoring patterns (e.g., strategies that produce positive streaks) or other behavioral biases would produce different predictions, but again through the same basic mechanism: interaction between censoring and bias.

As a theoretical contribution, I prove convergence of beliefs and behavior in a non-self-[...]

Footnote: See, for example, Enke (2019) and Jehiel (2018).
Another difference is that I establish my convergence result in a setting with multiple dimensions of uncertainty (the distributional parameters for different periods of the stage game), whereas Heidhues, Koszegi, and Strack (2018) consider convergence of misspecified learning with one-dimensional uncertainty. Fudenberg, Romanyuk, and Strack (2017) study a continuous-time model of active learning under misspecification, but their learning problem only involves two feasible models. In this work, agents’ prior belief about each distributional parameter is supported on a continuum of possible values.

As another contribution to the theoretical literature on misspecified learning dynamics, this project studies a new source of endogeneity: the censoring effect in a dynamic stage game. The dynamic stage game is both essential for studying learning under the gambler’s fallacy (a behavioral bias concerning the serial correlation of data) and crucial for the censoring effect. In my setting, the type of data that an agent generates depends on her beliefs. To understand the distinction from the existing literature, consider the classic paper in this area, Nyarko (1991), who studies a monopolist setting a price on each date and observing the resulting sales. No matter what action the monopolist takes, she observes the same type of data: quantity sold. Similarly, the agent in Fudenberg, Romanyuk, and Strack (2017) always observes payoffs and the agent in Heidhues, Koszegi, and Strack (2018) always observes output levels, after any action. Endogenous learning in these other papers takes the form of agents attributing different meanings to the same data, when interpreted through the lenses of different actions. On the other hand, we may think of stage-game histories censored with different thresholds as different types of data that, by themselves, lead to different beliefs about the fundamentals for biased learners. Actions play no role in inference except to generate these different types of data, as the likelihood of a (feasible) history does not depend on the censoring threshold that produced it.

Rabin (2002) and Rabin and Vayanos (2010) are the first to study the inferential mistakes [...]

Footnote: Their follow-up work Heidhues, Koszegi, and Strack (2019) also focuses on mislearning with one-dimensional uncertainty in a self-confirming setting.

In Online Appendix OA 6, I modify Rabin’s example to induce the censoring effect. His finite-urn model then delivers a misinference result analogous to the results in this paper, which are derived in a different setting with continuously-valued draws. This exercise shows the robustness of my results within different modeling frameworks of the same statistical bias.

Steady state in this work corresponds to Esponda and Pouzo (2016)’s Berk-Nash equilibrium. Rather than focusing only on equilibrium analysis, however, I study non-equilibrium learning dynamics and prove global convergence of behavior. This paper also contains more specific results: I emphasize the interaction between censoring and bias as the driver of mislearning, discuss how changing the stage game affects long-run beliefs, and relate my results to previous findings on inference under the gambler’s fallacy (e.g., fictitious variation in an endogenous-data setting).

Although my learning framework involves a sequence of short-lived agents, the social-learning aspect of the framework is not central to the results. In fact, the environment where a sequence of short-lived agents acts one at a time is equivalent to an environment where a single long-lived agent plays the stage game repeatedly, myopically maximizing her expected payoff in each iteration of the stage game. In the growing literature on social learning with misspecified Bayesians (e.g., Eyster and Rabin (2010); Guarino and Jehiel (2013); Bohren (2016); Bohren and Hauser (2018); Frick, Iijima, and Ishii (2019)), agents observe their predecessors’ actions but make errors when inverting these actions to deduce said predecessors’ information.
This kind of action inversion does not take place here: later agents inherit all the information that their predecessors have seen, either by adopting their beliefs or by observing their histories, so predecessors’ actions are uninformative.

The econometrics literature has also studied data-generating processes with censoring [...]

Footnote: In Rabin (2002)’s example, biased agents (correctly) believe that the part of the data which is always observable is independent of the part of the data which is sometimes missing. However, what I term the “censoring effect” is about misinference resulting from agents wrongly believing in negative correlation between the early draws that are always observed and the later draws that may be censored, depending on the realizations of the early draws.

Footnote: Esponda, Pouzo, and Yamamoto (2019)’s work-in-progress considers misspecified learning environments with finite action sets and studies the convergence of empirical action frequencies. Their techniques and notion of convergence do not seem to apply to a setting with a continuum of actions.

This literature has primarily focused on the issue of model identification from censored data (Cox, 1962; Tsiatis, 1975; Heckman and Honoré, 1989). In my setting, there is no identification problem for correctly specified agents. Instead, I study how agents make wrong parameter estimates from censored data when they infer using a family of misspecified models. Another contrast is that the econometrics literature has focused on exogenous data-censoring mechanisms, but censoring is endogenous in this paper and depends on the beliefs of previous agents. This endogeneity is central to the results, as discussed before.

Bar-Hillel and Wagenaar (1991) review classical psychology studies on the gambler’s fallacy. The earliest lab evidence involves two types of tasks. In “production tasks,” subjects are asked to write down sequences using a given alphabet, with the goal of generating sequences that resemble the realizations of an i.i.d. random process. Subjects tend to produce sequences with too many alternations between symbols, as they attempt to locally balance out symbol frequencies. In “judgment tasks” where people are asked to identify which sequence of binary symbols appears most like consecutive tosses of a fair coin, subjects routinely judge sequences with an alternation rate of 60% as “more random” than those with an alternation rate of 50%. While most of these studies are unincentivized, Benjamin, Moore, and Rabin (2017) have found the gambler’s fallacy with strict monetary incentives, where a bet on a fair coin continuing its streak pays strictly more than the bet on the streak reversing. Barron and Leider (2010) have shown that experiencing a streak of binary outcomes one at a time exacerbates the gambler’s fallacy, compared with simply being told the past sequence of outcomes all at once.

Other studies have identified the gambler’s fallacy using field data on lotteries and casino games. Unlike in experiments, agents in field settings are typically not explicitly told the underlying probabilities of the randomization devices. In state lotteries, players tend to avoid betting on numbers that have very recently won. This under-betting behavior is strictly costly for the players when lotteries have a pari-mutuel payout structure (as in the studies of Terrell (1994) and Suetens, Galbo-Jørgensen, and Tyran (2016)), since it leads to a larger-than-average payout per winner in the event that the same number is drawn again the following week. Using security video footage, Croson and Sundali (2005) show that roulette gamblers in casinos bet more on a color after a long streak of the opposite color.
Narayanan and Manchanda (2012) use individual-level data tracked using casino loyalty cards to find that a larger recent win has a negative effect on the next bet that the gambler places, while a larger recent loss increases the size of the next bet. Finally, using field data from asylum judges, loan officers, and baseball umpires, Chen, Moskowitz, and Shue (2016) show that [...]

Footnote: References can be found in Amemiya (1985) and Crowder (2001).

[...] conditional on the underlying fundamentals and mislearn some parameters of the world as a result. But the misinference mechanism in this paper is further complicated by the presence of endogenous data censoring.

This section presents the basic elements of the model, previews the main results, and provides intuition for how the censoring effect drives the conclusions. I describe the (single-player) stage game, an optimal-stopping problem satisfying some conditions. Agents are uncertain about the distribution of draws in the stage game. They entertain a prior belief over a family of feasible models of how draws are generated. All feasible models specify the same negative correlation between draws, though they are objectively independent, an error that reflects the gambler’s fallacy. Sections 3 and 4 embed these model elements into social-learning environments and derive learning dynamics. Section 5 contains a number of extensions that verify robustness of the main results.

2.1 Basic Elements of the Model

The stage game is a two-period optimal-stopping problem. In the first period, the agent draws $x_1 \in \mathbb{R}$ and decides whether to stop. If she stops, her payoff is $u_1(x_1)$ and the stage game ends. Otherwise, she continues to the second period and draws $x_2 \in \mathbb{R}$. The stage game then ends with the payoff $u_2(x_1, x_2)$.

The payoff functions $u_1 : \mathbb{R} \to \mathbb{R}$ and $u_2 : \mathbb{R}^2 \to \mathbb{R}$ satisfy some regularity conditions to be introduced in Assumption 1. The following example satisfies Assumption 1 and will be used to illustrate my results throughout this paper.

Example 1 (search with $q$ probability of recall). Many industries have an annual hiring cycle. Consider a firm in such an industry and an HR manager who must fill a job opening during this year’s cycle. In the early phase of the hiring cycle, she finds a candidate who would bring net benefit $x_1$ to the organization if hired. She must decide between hiring this candidate immediately or waiting. Waiting means she continues searching in the late phase of the cycle, finding another candidate with benefit $x_2$. Waiting carries the risk that the early candidate accepts an offer from a different firm in the interim, which happens with probability $0 < 1 - q \le 1$. This situation has the payoff functions $u_1(x_1) = x_1$ and $u_2(x_1, x_2) = q \cdot \max(x_1, x_2) + (1 - q) x_2$. In the late phase, there is $q$ probability the manager gets payoff equal to the higher of the two candidates’ qualities, and complementary probability that only the second candidate is available.

The following regularity conditions define the class of optimal-stopping problems I study.

Assumption 1 (regularity conditions). The payoff functions satisfy:
(a) For $x_1' > x_1''$ and $x_2' > x_2''$, $u_1(x_1') > u_1(x_1'')$ and $u_2(x_1', x_2') > u_2(x_1', x_2'')$.
(b) For $x_1' > x_1''$ and any $\bar{x}_2$, $u_1(x_1') - u_1(x_1'') > |u_2(x_1', \bar{x}_2) - u_2(x_1'', \bar{x}_2)|$.
(c) There exist $x_1^g, x_2^b, x_1^b, x_2^g \in \mathbb{R}$ so that $u_1(x_1^g) > u_2(x_1^g, x_2^b)$ and $u_1(x_1^b) < u_2(x_1^b, x_2^g)$.
(d) $u_1, u_2$ are continuous and $x_2 \mapsto u_2(\bar{x}_1, x_2 + \bar{k})$ is absolutely integrable with respect to the objective distribution of $X_2$ for all $\bar{x}_1, \bar{k} \in \mathbb{R}$.

Assumption 1(a) says $u_1, u_2$ are strictly increasing in the draws of their respective periods. Assumption 1(b) says a higher realization of the early draw increases first-period payoff more than it changes second-period payoff. Under Assumption 1(a), Assumption 1(b) is satisfied whenever $u_2$ is not a function of $x_1$, as in optimal-stopping problems where stopping in period $k$ gives payoff only depending on the $k$-th draw. Assumption 1(c) says there exist situations where the agent wants to stop and other situations where the agent wants to continue. The technical Assumption 1(d) ensures continuation payoffs are well-defined. These conditions are satisfied by my recurring example.

Claim. Example 1 satisfies Assumption 1 whenever the objective distribution of $X_2$ has a finite first moment.

Proofs of results in Sections 2 to 4 can be found in Appendix A.

I now define strategies and histories of the stage game.

Definition 1. A strategy is a function $S : \mathbb{R} \to \{\text{Stop}, \text{Continue}\}$ that maps the realization of the first-period draw $X_1 = x_1$ into a stopping decision.

Without loss I only consider pure strategies, because there always exists a payoff-maximizing pure strategy under any belief about the distribution of draws.

Definition 2. The history of the stage game is an element $h \in \mathbb{H} := \mathbb{R} \times (\mathbb{R} \cup \{\varnothing\})$. If an agent decides to stop after $X_1 = x_1$, her history is $(x_1, \varnothing)$. If the agent continues after $X_1 = x_1$ and draws $X_2 = x_2$ in the second period, her history is $(x_1, x_2)$.

The symbol $\varnothing$ is a censoring indicator, emphasizing that the hypothetical second-period draw is unobserved when the agent does not continue into the second period. In Example 1, if the HR manager hires the first candidate, she stops her recruitment efforts early and the counterfactual second candidate that she would have found had she kept the position open remains unknown.

I work with a general class of distributions for the main results. Both the true data-generating process and the agents’ domain of learning can be described in terms of a pair of densities on $\mathbb{R}$ satisfying the following:

Assumption 2 (log-concavity and symmetry). $f_1(\cdot \mid 0)$ and $f_2(\cdot \mid 0)$ are strictly positive densities on $\mathbb{R}$ with finite second moments, and they are strictly log-concave, symmetric, and mean-zero.

A leading example of strictly log-concave and symmetric distributions is the Gaussian distribution. Another example is the logistic distribution. The mean-zero condition is only a normalization, since we can shift any log-concave distribution symmetric around its mean to be centered around 0.

For $\tau_1, \tau_2 \in \mathbb{R}$, let $f_1(\cdot \mid \tau_1)$ and $f_2(\cdot \mid \tau_2)$ represent shifted versions of $f_1(\cdot \mid 0)$ and $f_2(\cdot \mid 0)$ with means $\tau_1$ and $\tau_2$, respectively. More precisely, $f_1(x_1 \mid \tau_1) := f_1(x_1 - \tau_1 \mid 0)$ and $f_2(x_2 \mid \tau_2) := f_2(x_2 - \tau_2 \mid 0)$ for $x_1, x_2 \in \mathbb{R}$.

Objectively, draws $X_1, X_2$ in the stage game are independently distributed with $X_1 \sim f_1(\cdot \mid \mu_1^\bullet)$ and $X_2 \sim f_2(\cdot \mid \mu_2^\bullet)$. The parameters $\mu_1^\bullet, \mu_2^\bullet \in \mathbb{R}$ are the true fundamentals. In Example 1, $\mu_1^\bullet$ and $\mu_2^\bullet$ stand for the true qualities of the two applicant pools in the early and late phases of the hiring season.

Agents are uncertain about the distribution of $(X_1, X_2)$. The next definition describes the set of distributions that a gambler’s fallacy agent deems plausible.

Definition 3. The set of feasible models $\{\Psi(\mu_1, \mu_2; \gamma) : (\mu_1, \mu_2) \in \mathcal{M}\}$ is a family of joint distributions of $(X_1, X_2)$ indexed by feasible fundamentals $(\mu_1, \mu_2) \in \mathcal{M} \subseteq \mathbb{R}^2$, for some bias parameter $\gamma > 0$. Here $\Psi(\mu_1, \mu_2; \gamma)$ refers to the joint distribution

$$X_1 \sim f_1(\cdot \mid \mu_1), \qquad (X_2 \mid X_1 = x_1) \sim f_2(\cdot \mid \mu_2 - \gamma (x_1 - \mu_1)),$$

where $X_2 \mid (X_1 = x_1)$ is the conditional distribution of $X_2$ given $X_1 = x_1$.

I write $\mathbb{E}_\Psi$ and $\mathbb{P}_\Psi$ throughout for expectation and probability with respect to model $\Psi$. When $\mathbb{E}$ and $\mathbb{P}$ are used without subscripts, they refer to expectation and probability under the true model, $\Psi^\bullet = \Psi(\mu_1^\bullet, \mu_2^\bullet; 0)$.

I model the gambler’s fallacy as an additive shift in the agent’s belief about $X_2$’s distribution following different $X_1$ realizations, so that $(X_2 \mid X_1 = x_1)$ increases in first-order stochastic dominance order as $x_1$ decreases. Conditional on the fundamentals, if the realization of $X_1$ is higher than expected, then the agent believes bad luck is due in the near future and the second draw is likely below average. Conversely, an exceptionally bad early draw likely portends above-average luck in the next period. This interpretation is clearer in the following equivalent formulation of $\Psi(\mu_1, \mu_2; \gamma)$:

$$X_1 = \mu_1 + \epsilon_1, \qquad X_2 = \mu_2 + \epsilon_2,$$

where $\epsilon_1 \sim f_1(\cdot \mid 0)$ and $(\epsilon_2 \mid \epsilon_1) \sim f_2(\cdot \mid -\gamma \epsilon_1)$. The mean-zero terms $\epsilon_1, \epsilon_2$ represent the idiosyncratic factors, or “luck,” that determine how $X_1$ and $X_2$’s realizations deviate from their unconditional means $\mu_1$ and $\mu_2$. The negative correlation between $\epsilon_1$ and $\epsilon_2$ conditional on $\mu_1, \mu_2$ represents a belief in reversal of luck. Larger $\gamma > 0$ [...]

Example 2 (the Gaussian case). Objectively, $X_1 \sim \mathcal{N}(\mu_1^\bullet, \sigma^2)$ and $X_2 \sim \mathcal{N}(\mu_2^\bullet, \sigma^2)$ are independent Gaussian random variables each with variance $\sigma^2 > 0$. But the agent believes $X_1, X_2$ are a pair of correlated Gaussian random variables with $X_1 \sim \mathcal{N}(\mu_1, \sigma^2)$ and $(X_2 \mid X_1 = x_1) \sim \mathcal{N}(\mu_2 - \gamma (x_1 - \mu_1), \sigma^2)$ for some $(\mu_1, \mu_2) \in \mathcal{M}$.

The set of feasible models is indexed by the set of feasible fundamentals, $\mathcal{M}$. We may think of the agents as learning about the unconditional means of $X_1$ and $X_2$, with $\mathcal{M}$ as the domain of their inference.

Footnote: I study gambler’s fallacy for continuous random variables, where the magnitude of $X_1$ affects the agent’s prediction about $X_2$. Chen, Moskowitz, and Shue (2016)’s analysis of baseball umpire data provides support for the continuous version of the statistical bias. They find that an umpire is more likely to call the current pitch a ball after having called the previous pitch a strike, controlling for the actual location of the pitch. Crucially, the effect size is larger after more obvious strikes, where “obviousness” is based on the distance of the pitch to the center of the regulated strike zone. This distance can be thought of as a continuous measure of the “quality” of each pitch.

Remark. I will consider several specifications of $\mathcal{M}$ throughout this paper.
(a) $\mathcal{M} = \mathbb{R}^2$. The agent thinks all values $(\mu_1, \mu_2) \in \mathbb{R}^2$ are possible.
(b) $\mathcal{M} = \diamondsuit$, where $\diamondsuit$ is a bounded parallelogram in $\mathbb{R}^2$ whose left and right edges are parallel to the $y$-axis and whose top and bottom edges have slope $-\gamma$. The agent is uncertain about both $\mu_1$ and $\mu_2$, but her uncertainty has bounded support.
(c) $\mathcal{M} = \{\mu_1^\bullet\} \times [\underline{\mu}_2, \bar{\mu}_2]$. The agent has a correct, dogmatic belief about $\mu_1$, but has uncertainty about $\mu_2$ supported on a bounded interval.
(d) $\mathcal{M} = \{(\mu, \mu) : \mu \in \mathbb{R}\}$.
The agent is convinced that the first-period and second-period fundamentals are the same, but is uncertain what this common parameter is.

While the agent can freely update her belief about the fundamentals on $\mathcal{M}$, she holds a dogmatic belief about $\gamma > 0$. This implies the set of feasible models excludes the true model, $\Psi^\bullet = \Psi(\mu_1^\bullet, \mu_2^\bullet; 0)$, so Bayesian updating within the class of feasible models amounts to misspecified learning. I use misspecification as a tool to represent and study the gambler’s fallacy. This approach is motivated by field evidence on the bias’s persistence: for example, Chen, Moskowitz, and Shue (2016) show that even very experienced decision-makers exhibit a non-negligible amount of the gambler’s fallacy in high-stakes settings.

In the social-learning environment I study in Section 3, short-lived agents each observe one iteration of the stage game, so no one has a large enough dataset to identify the misspecification problem. In Online Appendix OA 7, I discuss why even agents with large datasets may never question their feasible models: the misspecification is “attentionally stable” in the sense of Gagnon-Bartsch, Rabin, and Schwartzstein (2018).

Before stating my main results, I first establish a proposition about the optimal stage-game strategy. This will motivate a slight strengthening of Assumption 1 that I need for some results. For $c \in \mathbb{R}$, write $S_c$ for the cutoff strategy such that $S_c(x_1) = \text{Stop}$ if and only if $x_1 > c$. That is, $S_c$ accepts all early draws above a cutoff threshold $c$.

Proposition 1. Under Assumption 1 and for $\gamma > 0$,
• Under each feasible model $\Psi(\mu_1, \mu_2; \gamma)$, there exists a cutoff threshold $C(\mu_1, \mu_2; \gamma) \in \mathbb{R}$ such that it is strictly optimal to continue whenever $x_1 < C(\mu_1, \mu_2; \gamma)$ and strictly optimal to stop whenever $x_1 > C(\mu_1, \mu_2; \gamma)$.
• For every $\mu_1 \in \mathbb{R}$, $\mu_2 \mapsto C(\mu_1, \mu_2; \gamma)$ is strictly increasing.
• For every $\mu_1 \in \mathbb{R}$, $\mu_2 \mapsto C(\mu_1, \mu_2; \gamma)$ is Lipschitz continuous with Lipschitz constant $1/\gamma$.

Footnote: Any prior belief over fundamentals $(\mu_1, \mu_2)$ supported on a bounded set in $\mathbb{R}^2$ can be arbitrarily well-approximated by a prior belief over a large enough $\diamondsuit$.

Footnote: Section 5.3 studies the extension where agents are uncertain about $\gamma$, but the support of their prior belief about $\gamma$ lies to the right of 0 and is bounded away from it.

The content of this proposition is threefold.

First, it shows that the best strategy for the class of optimal-stopping problems I study takes a cutoff form. This is because a higher $x_1$ both increases the payoff to stopping and, under the gambler’s fallacy, predicts worse draws in the next period. Both forces push in the direction of stopping. The optimality of cutoff strategies leads to an endogenous, asymmetric censoring of histories, formalizing the idea that agents stop after “good enough” draws.

Second, holding fixed $\mu_1$, the cutoff threshold increases with $\mu_2$. This is because the agent can afford to be choosier in the first period when prospects in the second period improve.

The third statement about Lipschitz continuity, on the other hand, gives a bound on how quickly $\mu_2 \mapsto C(\mu_1, \mu_2; \gamma)$ increases. Suppose that one agent believes draws are generated according to $\Psi(\mu_1, \mu_2; \gamma)$, while another agent believes they are generated according to $\Psi(\mu_1, \mu_2 + 1; \gamma)$. If the first agent is indifferent between stopping and continuing after $X_1 = c$, then the second agent prefers stopping after $X_1 = c + 1/\gamma$.
This is because the predicted conditional mean of $X_2$ falls by $(1/\gamma) \cdot \gamma = 1$ when $X_1$ increases by $1/\gamma$ under any feasible model, which cancels out the relative optimism of the second agent about the unconditional distribution of $X_2$.

The Lipschitz constant $1/\gamma$ is guaranteed for every optimal-stopping problem satisfying Assumption 1 and every $\gamma > 0$. But $1/\gamma$ may not be the best Lipschitz constant. My results use the slightly stronger condition that $\mu_2 \mapsto C(\mu_1^\bullet, \mu_2; \gamma)$ has a Lipschitz constant strictly smaller than $1/\gamma$. Instead of making an assumption on $C$ directly, I strengthen Assumption 1(b) on the stage-game primitives to imply the desired infinitesimally stronger Lipschitz continuity.

Assumption 3 ($\ell$-Lipschitz continuity). Either:

(a) There exists $0 < \ell < 1/\gamma$ so that for every $x_1, x_2 \in \mathbb{R}$ and $d > 0$,

$$u_1(x_1 + \ell d) - u_1(x_1) \ge u_2(x_1 + \ell d,\; x_2 + (1 - \gamma \ell) d) - u_2(x_1, x_2).$$

Or:

(b) $u_2$ is Lipschitz continuous and only a function of $x_2$, and furthermore there exists $\epsilon > 0$ so that $u_1'(x_1) > \epsilon$ for all $x_1 \in \mathbb{R}$.

Assumption 3(a) is satisfied by my recurring example.

Claim . Example 1 satisﬁes Assumption 3(a) with ‘ = γ for every probability of recall0 ≤ q < γ > I now state my two main results, which concern learning dynamics under the gambler’sfallacy in two diﬀerent social-learning environments. Precise details of these environments13ill follow in Sections 3 and 4, respectively.In the ﬁrst environment, short-lived agents arrive one per round, t = 1 , , , ... . Agent inround t = 1 starts with a full-support prior density m : ♦ → R > , where ♦ is a boundedparallelogram in R as in Remark 1(b). In round t, agent t adopts the ﬁnal belief ˜ m t − of herimmediate predecessor as her prior belief, then chooses a cutoﬀ threshold ˜ C t to maximizeher expected payoﬀ based on this belief. She observes what happens in her stage game anduses Bayes’ rule to update her belief from ˜ m t − to ˜ m t , which then becomes the prior beliefof agent t + 1 . In this environment, the sequences of cutoﬀs ( ˜ C t ) and posterior belief densities ( ˜ m t ) arestochastic processes whose randomness derives from the randomness of draws. Draws areobjectively independent, both between the two periods in the same round of the stage gameand across diﬀerent rounds. Write (˜ µ ,t , ˜ µ ,t ) for the random element in ♦ given by thedensity ˜ m t . Theorem 1.

Suppose Assumptions 1, 2, and 3 hold, and the second derivative of ln ( f ( x | µ • )) is uniformly bounded for x ∈ R . There exists a unique steady state µ ∞ , c ∞ ∈ R not depen-dent on m , so that provided ( µ • , µ ∞ ) ∈ ♦ , almost surely lim t →∞ ˜ C t = c ∞ and (˜ µ ,t , ˜ µ ,t ) t ≥ converges in L to ( µ • , µ ∞ ) . The steady state satisﬁes µ ∞ < µ • and c ∞ < c • , where c • is theobjectively optimal cutoﬀ threshold. In other words, almost surely behavior and belief converge in the society, and this steadystate is independent of the prior over fundamentals (provided its support is large enough).In the steady state, agents hold overly pessimistic beliefs about the fundamentals and stoptoo early, relative to the objectively optimal strategy. (The additional regularity assumptionthat the second derivative of ln ( f ( x | µ • )) is uniformly bounded is satisﬁed by the Gaussianand logistic distributions.)In the second environment, short-lived agents arrive in generations, t = 0 , , , ..., witha continuum of agents per generation. Agents’ prior belief about the fundamentals is givenby a full-support density m on R , as in Remark 1(a). Each agent observes the stage-gamehistories of all predecessors from all past generations to make inferences about the funda-mentals. Due to the large generations, cutoﬀs and beliefs are deterministic in generations t ≥ , which I denote as c [ t ] and µ [ t ] = ( µ , [ t ] , µ , [ t ] ) respectively. The society is initialized atan arbitrary cutoﬀ strategy S c [0] in the 0th generation, the initial condition . Theorem 2.

Suppose Assumptions 1 and 2 hold. Starting from any initial condition and any m , cutoﬀs ( c [ t ] ) t ≥ and beliefs ( µ , [ t ] ) t ≥ form monotonic sequences across generations. WhenAssumption 3 also holds, there exists a unique steady state µ ∞ , c ∞ ∈ R so that c [ t ] → c ∞ and ( µ , [ t ] , µ , [ t ] ) → ( µ • , µ ∞ ) monotonically, regardless of the initial condition and m . Thissteady state is the same as the one in Theorem 1. I focus on learning across diﬀerent iterations of the stage game and assume agents do not update beliefswithin the stage game. t is more pessimisticthan generation t − µ , [ t ] < µ , [ t − . The monotonicityresult implies that beliefs move in the same direction again in generation t + 1, that is µ , [ t +1] < µ , [ t ] . The information of generation t + 1 diﬀers from that of generation t only inthat agents in generation t + 1 observe all stage-game histories of generation t. This meansgeneration t ’s stopping behavior diﬀers from that of generation t − t − t. In the learning environments of this paper, each agent censors her stage game history throughher stopping strategy, where the strategy choice depends on her beliefs. To build intuitionfor how this censoring eﬀect relates to the two main theorems, I ﬁrst consider a biased agentwith feasible fundamentals M = R , facing a large sample of histories all censored accordingto some cutoﬀ threshold c . I characterize her inference about fundamentals when the samplesize grows and analyze how her inference depends on the cutoﬀ threshold c .Suppose c ∈ R ∪ {∞} and Ψ is a model. Then H (Ψ; c ) refers to the distribution ofhistories when draws are generated by Ψ and histories censored according to S c . Deﬁnition 4.

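For $c \in \mathbb{R}$ and $\Psi$ a model, $H(\Psi; c) \in \Delta(\mathbb{H})$ is the distribution of histories given by

$$H(\Psi; c)[E_1 \times E_2] := P_\Psi[(E_1 \cap (-\infty, c]) \times E_2] \quad \text{for } E_1, E_2 \in \mathcal{B}(\mathbb{R}),$$

$$H(\Psi; c)[E_1 \times \{\varnothing\}] := P_\Psi[(E_1 \cap (c, \infty)) \times \mathbb{R}] \quad \text{for } E_1 \in \mathcal{B}(\mathbb{R}),$$

where $\mathcal{B}(\mathbb{R})$ is the collection of Borel subsets of $\mathbb{R}$. The second draw is recorded exactly when $x_1 \le c$, because $S_c$ stops if and only if $x_1 > c$.

I abbreviate $H(\Psi^\bullet; c)$ as simply $H^\bullet(c)$, the true distribution of histories under the cutoff threshold $c$. The next definition gives a measure of the difference between the distribution of histories under the feasible model with fundamentals $(\mu_1, \mu_2)$ and the true distribution of histories, both with the same censoring threshold $c$.

Definition 5.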

For $\mu_1, \mu_2 \in \mathbb{R}$ and $c \in \mathbb{R} \cup \{\infty\}$, the Kullback-Leibler (KL) divergence from $H^\bullet(c)$ to $H(\Psi(\mu_1, \mu_2; \gamma); c)$, denoted by $D_{KL}(H^\bullet(c) \,\|\, H(\Psi(\mu_1, \mu_2; \gamma); c))$, is

$$\int_c^\infty f(x_1 \mid \mu_1^\bullet) \cdot \ln\left( \frac{f(x_1 \mid \mu_1^\bullet)}{f(x_1 \mid \mu_1)} \right) dx_1 + \int_{-\infty}^{c} \left\{ \int_{-\infty}^{\infty} f(x_1 \mid \mu_1^\bullet) \cdot f(x_2 \mid \mu_2^\bullet) \cdot \ln\left[ \frac{f(x_1 \mid \mu_1^\bullet) \cdot f(x_2 \mid \mu_2^\bullet)}{f(x_1 \mid \mu_1) \cdot f(x_2 \mid \mu_2 - \gamma(x_1 - \mu_1))} \right] dx_2 \right\} dx_1.$$

For a given $c$, the minimizers

$$(\mu_1^*, \mu_2^*) \in \arg\min_{\mu_1, \mu_2 \in \mathbb{R}} D_{KL}(H^\bullet(c) \,\|\, H(\Psi(\mu_1, \mu_2; \gamma); c))$$

are called the pseudo-true fundamentals with respect to $c$.

To interpret, the likelihood of the history $h = (x_1, x_2)$ with $x_1 \le c$ is $f(x_1 \mid \mu_1^\bullet) \cdot f(x_2 \mid \mu_2^\bullet)$ under the true model $\Psi^\bullet$, and $f(x_1 \mid \mu_1) \cdot f(x_2 \mid \mu_2 - \gamma(x_1 - \mu_1))$ under the feasible model $\Psi(\mu_1, \mu_2; \gamma)$. The likelihood of the history $h = (x_1, \varnothing)$ with $x_1 > c$ is $f(x_1 \mid \mu_1^\bullet)$ under the true model, and $f(x_1 \mid \mu_1)$ under the feasible model. The likelihoods of all other histories are 0 under both models. So the KL divergence expression in Definition 5 is the expected log-likelihood ratio of the history under the true model versus under the feasible model with fundamentals $(\mu_1, \mu_2)$, where the expectation over histories is taken under the true model. In general, this optimization objective depends on the cutoff threshold $c$, and I denote the pseudo-true fundamentals as $\mu_1^*(c), \mu_2^*(c)$ to emphasize this dependence. The pseudo-true fundamentals correspond to the biased agent's inference about the fundamentals in large samples.

The next proposition characterizes the pseudo-true fundamentals and contains the key intuition behind the two main theorems.

Proposition 2.

Under Assumption 2, for any $c \in \mathbb{R} \cup \{\infty\}$, the KL divergence minimization problem in Definition 5 admits a unique solution $(\mu_1^*(c), \mu_2^*(c)) \in \mathbb{R}^2$. Furthermore:

• $\mu_1^*(c) = \mu_1^\bullet$ for any $c \in \mathbb{R} \cup \{\infty\}$
• $\mu_2^*(c) < \mu_2^\bullet$ for any $c \in \mathbb{R}$, and $\mu_2^*(\infty) = \mu_2^\bullet$
• $\mu_2^*(c)$ is strictly increasing in $c$.

In the Gaussian case, the pseudo-true fundamental $\mu_2^*(c)$ admits a closed-form expression that readily verifies Proposition 2.

Example 2 (continued). In the Gaussian case, for $c \in \mathbb{R} \cup \{\infty\}$,

$$\mu_2^*(c) = \mu_2^\bullet - \gamma \left( \mu_1^\bullet - E[X_1 \mid X_1 \le c] \right).$$

The censoring effect is crucial for misinference: as Proposition 2 shows, the pseudo-true fundamentals are unbiased in the absence of censoring (i.e., when $c = \infty$). Here is why the directional data censoring I study leads to over-pessimism. In every feasible model of draws $\Psi(\mu_1, \mu_2; \gamma)$, the realization of $X_2$ depends on two factors: the second-period fundamental $\mu_2$, and a reversal effect based on the realization of $X_1$. The biased agent cannot end up with a correct or over-optimistic belief about $\mu_2$, else she would be systematically disappointed by realizations of $X_2$ in her dataset. This is because $X_2$ is only uncensored when $X_1$ is low enough, a contingency where the agent expects positive reversal on average. Over-pessimism can therefore be thought of as “two wrongs making a right,” as the biased agent's pessimism about the unconditional mean of $X_2$ counteracts her false expectation of positive reversals in the dataset of censored histories.

This mechanism explains the long-run pessimism in Theorem 1 and Theorem 2.
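The Gaussian closed form in Example 2 can be evaluated directly. The sketch below is illustrative only: it assumes a known standard deviation `sigma`, uses the standard truncated-normal formula for $E[X_1 \mid X_1 \le c]$, and every function name and parameter value is hypothetical rather than taken from the paper.

```python
import math


def normal_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z: float) -> float:
    """Standard normal CDF, written with erfc for accuracy in the left tail."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def mean_below(c: float, mu: float, sigma: float) -> float:
    """E[X | X <= c] for X ~ N(mu, sigma^2); c = +inf means no censoring."""
    if c == math.inf:
        return mu
    z = (c - mu) / sigma
    return mu - sigma * normal_pdf(z) / normal_cdf(z)


def pseudo_true_mu2(c, mu1_true, mu2_true, gamma, sigma=1.0):
    """Closed form from Example 2: mu2* = mu2_true - gamma*(mu1_true - E[X1 | X1 <= c])."""
    return mu2_true - gamma * (mu1_true - mean_below(c, mu1_true, sigma))


if __name__ == "__main__":
    # Assumed illustrative values: true means 0, gamma = 0.5, sigma = 1.
    for c in (-1.0, 0.0, 1.0, math.inf):
        print(c, pseudo_true_mu2(c, 0.0, 0.0, 0.5))
```

Under these assumed parameters the printed values rise with the cutoff and reach the true mean only at `c = inf`, which is the pattern Proposition 2 describes.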
Infact, in the large-generations setting of Theorem 2, every generation t ≥ µ = µ as in Remark1(d) (Section 5.4), when the stage game has more than two periods (Online Appendix OA2), under an alternative method-of-moments inference procedure (Online Appendix OA 5),when a fraction of agents suﬀer from selection neglect (Online Appendix OA 8.2), whenhigher draws bring worse payoﬀs (Online Appendix OA 8.1), and with high probability afterobserving a ﬁnite dataset containing just 100 censored histories (Online Appendix OA 9.1).The severity of the biased agent’s pessimism increases with the severity of censoring. Theintuition is that the agent wants to infer a lower µ ∗ to better match X ’s in histories thatstart with bad X ’s, but doing so carries the cost of a worse model ﬁt for histories that startwith intermediate X ’s. More severe censoring — generated by a strategy that accepts notonly very good X ’s but also intermediate ones— alleviates this cost, as histories that startwith intermediate X ’s no longer contain their associated X ’s. The optimal inference µ ∗ thus decreases.The comparative static dµ ∗ dc > H • ( c [0] ) and chooses a cutoﬀ c [1] . Generation 2 then observes histories from all predecessorgenerations, that is histories drawn from both H • ( c [0] ) and H • ( c [1] ). If c [1] < c [0] , thenGeneration 2’s dataset features (on average) more severe censoring than Generation 1’sdataset. Thus, Generation 2 comes to a more pessimistic inference about the second-periodfundamental. By Proposition 1, this leads to a further lowering of the cutoﬀ threshold, c [2] < c [1] , and the pattern continues. This section studies a social-learning environment where biased agents act one at a timeand pass down beliefs to their successors. I deﬁne the steady state of the stage game,prove its existence and uniqueness, and show it features over-pessimistic beliefs and earlystopping. 
Then, I turn to the stochastic process of beliefs and behavior in the social-learning environment, showing that this process almost surely converges to the steady state.

3.1 Steady State: Existence, Uniqueness, and Other Properties

A steady state is a triplet consisting of fundamentals $(\mu_1^\infty, \mu_2^\infty) \in \mathbb{R}^2$ and a cutoff threshold $c^\infty \in \mathbb{R}$ that endogenously determine each other. The cutoff strategy with acceptance threshold $c^\infty$ maximizes expected payoff under the feasible model $\Psi(\mu_1^\infty, \mu_2^\infty; \gamma)$, while the fundamentals are the pseudo-true fundamentals under data censoring with threshold $c^\infty$. More precisely,

Definition 6. A steady state consists of $\mu_1^\infty, \mu_2^\infty, c^\infty \in \mathbb{R}$ such that:
1. $c^\infty = C(\mu_1^\infty, \mu_2^\infty; \gamma)$.
2. $\mu_1^\infty = \mu_1^*(c^\infty)$ and $\mu_2^\infty = \mu_2^*(c^\infty)$.

Steady states correspond to Esponda and Pouzo (2016)'s pure Berk-Nash equilibria for an agent whose prior is supported on the feasible models with feasible fundamentals $\mathcal{M} = \mathbb{R}^2$, under the restriction that equilibrium belief puts full confidence in a single fundamental pair. The set of steady states depends on $\gamma$, since the severity of the bias changes both the optimal cutoff thresholds under different fundamentals and inference about fundamentals from stage-game histories.

The “steady state” defined here almost surely characterizes the long-run learning outcome in the society where biased agents act one by one. This convergence does not follow from Esponda and Pouzo (2016), for their results only imply local convergence from prior beliefs sufficiently close to the equilibrium beliefs, and only in a “perturbed game” environment where learners receive idiosyncratic payoff shocks to different actions. I will show global convergence of the stochastic processes of beliefs and behavior without payoff shocks.

Like almost all examples of Berk-Nash equilibrium in Esponda and Pouzo (2016), my steady state generates data with positive KL divergence relative to the implied data distribution under the steady-state beliefs. That is, $H^\bullet(c^\infty) \neq H(\Psi(\mu_1^\infty, \mu_2^\infty; \gamma); c^\infty)$, so the steady state is not a self-confirming equilibrium. This is because for every censoring threshold $c$ (and in particular for $c = c^\infty$), the KL divergences from the true history distribution to the history distributions under different feasible models are bounded away from 0.

To prove the existence and uniqueness of steady state, I define the following belief iteration map on the second-period fundamental.

Definition 7.

For γ > , the iteration map I : R → R is given by I ( µ ; γ ) := µ ∗ ( C ( µ • , µ , γ )) For example, under the history distribution H • ( c ∞ ), E [ h | c ∞ − ≤ h ≤ c ∞ ] = E [ h | c ∞ − ≤ h ≤ c ∞ −

1] since draws are objectively independent. However, under the history distribution driven by the steady-state feasible model Ψ( µ ∞ , µ ∞ ; γ ), we must have E [ h | c ∞ − ≤ h ≤ c ∞ ] < E [ h | c ∞ − ≤ h ≤ c ∞ − γ >

0. This feature contrasts with Heidhues, Koszegi, and Strack (2018)’s model that results in aself-conﬁrming learning outcome. µ ∗ ( c ) = µ • for all c from Proposition 2, it is straightforward to see thatsteady-state beliefs about µ are in bijection with ﬁxed points of I . This shows steady-statebelief about µ exhibits over-pessimism. Proposition 3.

Under Assumption 2, every steady state satisfies $\mu_2^\infty < \mu_2^\bullet$ and $\mu_1^\infty = \mu_1^\bullet$. Furthermore, the steady state is unique under the additional Assumption 3.
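A steady state can be located numerically by iterating the map $\mathcal{I}$ from Definition 7. The sketch below is a hedged illustration, not the paper's procedure. It assumes the Gaussian case with known `sigma` and a search problem without recall in which stopping pays $x_1$ and continuing pays $x_2$, so the indifference threshold solves $c = \mu_2 - \gamma(c - \mu_1)$. The payoff specification, function names, and parameter values are all assumptions made for this example.

```python
import math


def normal_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z: float) -> float:
    """Standard normal CDF via erfc (accurate in the left tail)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def mean_below(c: float, mu: float, sigma: float) -> float:
    """E[X | X <= c] for X ~ N(mu, sigma^2)."""
    z = (c - mu) / sigma
    return mu - sigma * normal_pdf(z) / normal_cdf(z)


def pseudo_true_mu2(c, mu1_true, mu2_true, gamma, sigma):
    """Gaussian closed form for the pseudo-true second-period fundamental."""
    return mu2_true - gamma * (mu1_true - mean_below(c, mu1_true, sigma))


def cutoff(mu1: float, mu2: float, gamma: float) -> float:
    """Assumed stage game: stop iff x1 exceeds the predicted mean of X2 given x1."""
    return (mu2 + gamma * mu1) / (1.0 + gamma)


def iteration_map(mu2, mu1_true, mu2_true, gamma, sigma):
    """I(mu2) = mu2*(C(mu1_true, mu2; gamma)), as in Definition 7."""
    return pseudo_true_mu2(cutoff(mu1_true, mu2, gamma), mu1_true, mu2_true, gamma, sigma)


def steady_state(mu1_true, mu2_true, gamma, sigma=1.0, start=0.0, tol=1e-12, max_iter=10_000):
    """Iterate I until successive beliefs differ by less than tol."""
    mu2 = start
    for _ in range(max_iter):
        nxt = iteration_map(mu2, mu1_true, mu2_true, gamma, sigma)
        done = abs(nxt - mu2) < tol
        mu2 = nxt
        if done:
            break
    return mu2, cutoff(mu1_true, mu2, gamma)


if __name__ == "__main__":
    mu2_inf, c_inf = steady_state(0.0, 0.0, 0.5)
    print("steady-state belief:", mu2_inf, "steady-state cutoff:", c_inf)
```

In this assumed example the rational cutoff is `cutoff(mu1_true, mu2_true, 0.0)`, which equals the true second-period mean, so comparing it with `c_inf` shows the early stopping discussed in the text.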

Proposition 4.

Under Assumptions 1, 2, and 3, $\mathcal{I}$ is a contraction mapping with contraction constant $0 < \ell\gamma < 1$. Therefore, a unique steady state exists.

The contraction mapping property of $\mathcal{I}$ comes from two lemmas. First, we can use the strict log-concavity assumption to show that $\mu_2^*(c)$ is Lipschitz continuous with constant $\gamma$.

Lemma 1.

Under Assumption 2, $\mu_2^*(c)$ is Lipschitz continuous with Lipschitz constant $\gamma$.

Next, the indifference threshold is Lipschitz continuous with a Lipschitz constant strictly less than $1/\gamma$ once we adopt Assumption 3.

Lemma 2.

Under Assumptions 1, 2, and 3, $\mu_2 \mapsto C(\mu_1^\bullet, \mu_2; \gamma)$ is Lipschitz continuous with a Lipschitz constant $\ell < 1/\gamma$.

Even under Assumptions 1 and 2 alone, the basic regularity conditions we maintain throughout, it turns out $\mathcal{I}$ is “almost” a contraction mapping for any $\gamma > 0$, in the sense that $|\mathcal{I}(\mu_2') - \mathcal{I}(\mu_2'')| < |\mu_2' - \mu_2''|$ for every distinct $\mu_2', \mu_2'' \in \mathbb{R}$. But there is no guarantee of a uniform contraction constant strictly less than 1. The slight strengthening in Assumption 3 ensures such a uniform contraction constant exists.

I now show the steady-state stopping threshold always features stopping too early. For every $\mu_1^\bullet, \mu_2^\bullet \in \mathbb{R}$, the objectively optimal stopping strategy takes the form of a cutoff $c^\bullet \in \mathbb{R} \cup \{\pm\infty\}$, where $c^\bullet = -\infty$ means always stopping and $c^\bullet = \infty$ means never stopping. (This follows from Lemma A.2 in the Appendix, which shows that even when $\gamma = 0$, the difference between the stopping payoff at $x_1$ and the expected continuation payoff after $x_1$ is strictly increasing and continuous in $x_1$.) I show that $c^\bullet > c^\infty$ for every steady-state cutoff $c^\infty$. (This result only requires Assumptions 1 and 2 and does not require uniqueness of steady states.)

Early stopping does not directly follow from over-pessimism. In fact, outside of the steady state, there is an intuition that a biased agent may stop later than a rational agent, not earlier. For a concrete illustration, consider Example 1 with $q = 0$, so there is no probability of recall. Suppose the true fundamentals are $\mu_1^\bullet \gg \mu_2^\bullet$. If a biased agent has the correct beliefs about the fundamentals, she perceives a greater continuation value after $X_1 = \mu_2^\bullet$ than a rational agent with the same correct beliefs, since the former holds a false expectation of positive reversals after a bad (relative to $\mu_1^\bullet$) early draw. Even though $c^\bullet = \mu_2^\bullet$ and the rational agent chooses to stop, the biased agent chooses to continue and has an indifference threshold strictly above $c^\bullet$. By continuity, the biased agent's cutoff threshold remains strictly above $c^\bullet$ even under slightly pessimistic beliefs about $\mu_2$.

Nevertheless, the next proposition shows that in the steady state, it is unambiguous that the biased agent stops too early relative to the objectively optimal threshold. The early-stopping result strengthens the over-pessimism result. In the steady state, agents must be sufficiently pessimistic as to overcome the opposite intuition about late stopping just discussed.
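The late-stopping intuition can be made explicit in a worked case. The derivation below is a sketch under an assumption: that with $q = 0$ the stage game pays $x_1$ for stopping and $x_2$ for continuing. The payoff specification of Example 1 is not restated in this section, so this form is assumed rather than quoted. A biased agent with fundamentals $(\mu_1, \mu_2)$ is then indifferent at the $c$ that equals her predicted mean of $X_2$:

```latex
\begin{aligned}
c &= \mu_2 - \gamma\,(c - \mu_1)
  \;\Longrightarrow\;
  C(\mu_1, \mu_2; \gamma) = \frac{\mu_2 + \gamma \mu_1}{1 + \gamma}, \\
C(\mu_1^\bullet, \mu_2^\bullet; \gamma) - c^\bullet
  &= \frac{\mu_2^\bullet + \gamma \mu_1^\bullet}{1 + \gamma} - \mu_2^\bullet
   = \frac{\gamma\,(\mu_1^\bullet - \mu_2^\bullet)}{1 + \gamma} > 0
   \quad \text{when } \mu_1^\bullet > \mu_2^\bullet .
\end{aligned}
```

So with correct beliefs and $\mu_1^\bullet > \mu_2^\bullet$, the biased cutoff sits above the rational cutoff $c^\bullet = \mu_2^\bullet$, and the gap widens with $\gamma$ and with $\mu_1^\bullet - \mu_2^\bullet$.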

Proposition 5.

Under Assumptions 1 and 2, every steady-state stopping threshold c ∞ isstrictly lower than the objectively optimal threshold, c • . To understand why, note the biased agent believes in diﬀerent distributions of X follow-ing diﬀerent realizations of X , with more pessimistic beliefs after higher realizations. In asteady state ( µ ∞ , µ ∞ , c ∞ ) , the agent’s subjective belief about X following X = c ∞ must bea leftward shift of the true distribution f ( · | µ • ). Else, the agent would have subjective dis-tributions of X that stochastically dominate the true distribution whenever S c ∞ prescribescontinuing, so heuristically she could improve the ﬁt of her model by lowering her beliefabout µ . The biased agent’s indiﬀerence at c ∞ is thus based on an overly pessimistic beliefabout the continuation value, so we must have c ∞ < c • . This section shows the steady state deﬁned and studied earlier corresponds to the long-run learning outcome for a society of biased agents acting one at a time. I outline theconvergence proof for a simpler variant of Theorem 1, where agents start oﬀ knowing µ • andonly entertain uncertainty over µ . That is, the feasible fundamentals are given by Remark1(c) rather than Remark 1(b). This simpliﬁcation is without much loss: even when agentsare initially uncertain about µ , they will almost surely learn it in the long run regardless ofthe stochastic process of their predecessors’ stopping strategies. Intuitively, this is because X can never be censored, so no belief distortion in µ is possible. Once agents have learned µ • , the rest of the argument proceeds much like the case where µ • is known from the start.In the next section I comment on the key steps in extending the proof to the case uncertaintyover two-dimensional fundamentals ( µ , µ ), but will defer the details to Online AppendixOA 3.In the learning environment, time is discrete and partitioned into rounds t = 1 , , , ... One short-lived agent arrives per round. 
Agent 1 starts with a prior belief M given by aprior density m : [ µ , ¯ µ ] → R > , while agent t ≥ M t − of agent t − Next, agent t chooses a cutoﬀ threshold ˜ C t maximizing expected This is similar to the intuition for why µ ∗ ( c ) = µ • for every c . The same learning dynamics obtain in an environment where every agent starts with the common priorbelief M and observes the stage-game histories of all predecessors. M t − to ˜ M t by applying Bayes’ rule to her stage-game history, ˜ H t ∈ H .The sequences ( ˜ M t ) , ( ˜ C t ) , ( ˜ H t ) are stochastic processes whose randomness stem from ran-domness of the stage-game draws realizations in diﬀerent rounds. The convergence theoremis about the almost sure convergence of processes ( ˜ M t ) and ( ˜ C t ) . To deﬁne the probabilityspace formally, consider the R -valued stochastic process ( X t ) t ≥ = ( X ,t , X ,t ) t ≥ , where X t and X t are independent for t = t . Within each t, X ,t ∼ f ( · | µ • ), X ,t ∼ f ( · | µ • ) are alsoindependent. Interpret X t as the pair of potential draws in the t -th round of the stage game.Clearly, there exists a probability space (Ω , A , P ), with sample space Ω = ( R ) ∞ interpretedas paths of the process just described, A the Borel σ -algebra on Ω , and P the measure onsample paths so that the process X t ( ω ) = ω t has the desired distribution. The term “almostsurely” means “with probability 1 with respect to the realization of the inﬁnite sequence ofall (potential) draws”, i.e. P -almost surely. The processes ( ˜ M t ) , ( ˜ C t ) , ( ˜ H t ) are deﬁned onthis probability space and adapted to the ﬁltration ( F t ) t ≥ , where F t is the sub- σ -algebragenerated by draws up to round t , F t = σ (( X s ) ts =1 ).Under Assumptions 1, 2, and 3, by Proposition 4 there exists a unique steady state( µ • , µ ∞ , c ∞ ). 
Theorem 1 shows that, provided the support of m contains µ ∞ , m is con-tinuous, and the second derivative of ln( f ( · | µ • )) is uniformly bounded, the stochasticprocesses ( ˜ C t ) and ( ˜ M t ) almost surely converge to the steady state. This is a global conver-gence result since the bounded interval [ µ , ¯ µ ] can be arbitrarily large and the prior density m can assign arbitrarily small probability to neighborhoods around µ ∞ . Theorem 1 . Suppose Assumptions 1, 2 and 3 hold and the second derivative of ln ( f ( x | µ • )) is uniformly bounded for x ∈ R . Suppose also µ ≤ µ ∞ ≤ ¯ µ where µ ∞ is the unique steady-state belief, and prior density m : [ µ , ¯ µ ] → R > is continuously diﬀerentiable. Almostsurely, lim t →∞ ˜ C t = c ∞ and lim t →∞ E µ ∼ ˜ M t | µ − µ ∞ | = 0 , where c ∞ is the unique steady-state cutoﬀ threshold. I will now discuss the obstacles to proving convergence and provide an outline of myargument. In each round t, the cutoﬀ choice of the t -th agent determines how history ˜ H t will be censored. We can think of each c ∈ R as generating a diﬀerent “type” of data. As wesaw in Proposition 2, diﬀerent “types” of data (in large samples) lead to diﬀerent inferencesabout the fundamentals for biased agents, so the cutoﬀ ˜ C t inﬂuences the belief that will bepassed on to agent t + 1. Yet ˜ C t is an endogenous, ex-ante random object that depends onthe belief of the t -th agent, meaning that belief and behavior co-evolve to complicate theanalysis of learning dynamics.To be more precise, the log-likelihood of all X data up to the end of round t underfundamental µ ∈ [ µ , ¯ µ ] is the random variable t X s =1 ln( f ( X ,s ; µ − γ ( X ,s − µ • )) · { X ,s ≤ ˜ C s } . s -th summand contains the indicator { X ,s ≤ C s } , referring to the fact that X ,s wouldbe censored if X ,s exceeds the cutoﬀ ˜ C s . The cutoﬀ ˜ C s depends on histories in periods1 , , ..., s − , hence indirectly on ( X k ) k

Proposition 10 from Heidhues, Koszegi, and Strack (2018):

Let ( y t ) t be a martingalethat satisﬁes a.s. [ y t ] ≤ vt for some constant v ≥ . We have that a.s. lim t →∞ y t t = 0 . After simplifying the problem with this result, I establish a pair of mutual bounds onasymptotic behavior and asymptotic beliefs. If we know cutoﬀ thresholds are asymptoticallybounded between c l and c h , c l < c h , then beliefs about µ must be asymptotically supportedon the interval [ µ ∗ ( c l ) , µ ∗ ( c h )]. Conversely, if belief is asymptotically supported on the subin-terval [ µ l , µ h ] ⊆ [ µ , ¯ µ ], then cutoﬀ thresholds must be asymptotically bounded between C ( µ • , µ l ; γ ) and C ( µ • , µ h ; γ ). Lemma A.19 . For c l ≥ C ( µ • , µ ; γ ) , if almost surely lim inf t →∞ ˜ C t ≥ c l , then almost surely lim t →∞ ˜ M t ( [ µ , µ ∗ ( c l )) ) = 0 . Also, for c h ≤ C ( µ • , ¯ µ ; γ ) , if almost surely lim sup t →∞ ˜ C t ≤ c h , then almost surely lim t →∞ ˜ M t ( ( µ ∗ ( c h ) , ¯ µ ]) = 0 . Lemma A.20 . For µ ≤ µ l < µ h ≤ ¯ µ , if lim t →∞ ˜ M t ([ µ l , µ h ]) = 1 almost surely, then lim inf t →∞ ˜ C t ≥ C ( µ • , µ l ; γ ) and lim sup t →∞ ˜ C t ≤ C ( µ • , µ h ; γ ) almost surely. Applying this pair of lemmas to supp( M ) = [ µ , ¯ µ ], we conclude that asymptotically˜ M t must be supported on the subinterval [ I ( µ ) , I (¯ µ )] , where I is the iteration map fromDeﬁnition 7 ﬁrst used in analyzing the existence and uniqueness of steady states. UnderAssumptions 1, 2, and 3, Proposition 4 implies that I is a contraction mapping whoseiterates converge to µ ∞ . Therefore by repeatedly applying the pair of Lemmas A.19 andA.20, we can reﬁne the bound on asymptotic beliefs down to the singleton { µ ∞ } , showingthe almost-sure convergence of beliefs there. The almost-sure convergence of behavior followseasily from Lemma A.20. µ The hypotheses of Theorem 1 diﬀer from those of Theorem 1 in that agents start oﬀ withuncertainty about µ . 
I now comment on the key step to proving almost-sure convergence of beliefs and behavior in the environment with two-dimensional uncertainty about fundamentals.

The structure of the inference problem in my setting is such that I can separately bound the agents' asymptotic beliefs in two “directions,” thus reducing the task of proving a two-dimensional belief bound into a pair of tasks involving one-dimensional belief bounds. To understand why, consider a pair of fundamentals, $(\mu_1, \mu_2)$ and $(\mu_1', \mu_2') = (\mu_1 + d, \mu_2 - \gamma d)$ for some $d > 0$, satisfying $\mu_1, \mu_1' \le \mu_1^\bullet$. That is, $(\mu_1, \mu_2)$ and $(\mu_1', \mu_2')$ lie on the same line with slope $-\gamma$. For any uncensored history $(x_1, x_2) \in \mathbb{R}^2$, the likelihood of the second-period draw $x_2$ is the same under both pairs of fundamentals,

$$f(x_2 \mid \mu_2 - \gamma(x_1 - \mu_1)) = f(x_2 \mid \mu_2' - \gamma(x_1 - \mu_1')),$$

since $\mu_2' - \gamma(x_1 - \mu_1') = (\mu_2 - \gamma d) - \gamma(x_1 - \mu_1 - d) = \mu_2 - \gamma(x_1 - \mu_1)$. Intuitively, the feasible model $\Psi(\mu_1, \mu_2; \gamma)$ has a lower first-period mean but also a higher second-period unconditional mean, compared to the model $\Psi(\mu_1', \mu_2'; \gamma)$. An agent who believes in the first model feels less disappointed by the draw $x_1$, since she evaluates it against a lower expectation. This leads to a weaker anticipation of positive reversal under the gambler's fallacy, compared to another agent who believes in the second model. But this difference is canceled out by the more optimistic belief about the unconditional distribution of the second-period draw, $\mu_2 > \mu_2'$.

This argument shows that both pairs of fundamentals $(\mu_1, \mu_2)$ and $(\mu_1', \mu_2')$ explain $X_2$ data equally well in all uncensored histories. This is important as it shows that regardless of agent $t$'s strategy, she would always find that $(\mu_1, \mu_2)$ and $(\mu_1', \mu_2')$ lead to the same likelihood of second-period data in her history $\tilde{H}_t$. At the same time, $(\mu_1', \mu_2')$ provides a strictly better fit for $X_1$ data on average than $(\mu_1, \mu_2)$, since $\mu_1 < \mu_1' \le \mu_1^\bullet$. This means in the long run, fundamentals $(\mu_1, \mu_2)$ should receive much less posterior probability than $(\mu_1', \mu_2')$, as the latter better rationalize the data overall.

This heuristic comparison of the asymptotic goodness-of-fit for two feasible models is formalized by computing the directional derivative of the data log-likelihood along the vector $(1, -\gamma)^\top$ in the space of fundamentals. I establish an (almost-sure) positive lower bound on this directional derivative to the left of $\mu_1^\bullet$, and an analogous negative upper bound to the right of $\mu_1^\bullet$.
This allows me to show the region colored in red receives 0 posterior probability asymptotically, by comparing each point in red with a corresponding point in blue along a line of slope $-\gamma$. By repeating this argument (and applying the symmetric bound to the right of $\mu_1^\bullet$), I show that belief is asymptotically concentrated along an $\epsilon$-width vertical strip containing the steady-state beliefs, $(\mu_1^\bullet, \mu_2^\infty)$.

Having restricted the long-run belief to a small vertical strip, we have completed one “direction” of the belief bounds and effectively reduced the dimensionality of uncertainty back to one. The rest of the argument proceeds similarly to the case where agents know $\mu_1^\bullet$ discussed before, iteratively restricting agents' asymptotic behavior and asymptotic belief about $\mu_2$. These restrictions amount to “vertical” belief refinements within the $\epsilon$-strip, so eventually belief is restricted to the single point $(\mu_1^\bullet, \mu_2^\infty)$, the unique steady-state belief.

In this section, I consider a social-learning environment where agents arrive in large generations and all agents in the same generation act simultaneously. I will prove Theorem 2, fully characterizing the learning dynamics in this environment. I will also discuss the positive-feedback loop between distorted beliefs about fundamentals and distorted stopping behavior.

There is an inﬁnite sequence of generations, t ∈ { , , , ... } . Each generation is “large” andwill be modeled as a continuum of agents, n ∈ [0 , n from generation 1 is unrelated toagent n from generation 2. The realizations of draws X , X are independent across all stagegames, including those from the same generation. Generation 0 agents play some strategy S c [0] , where c [0] ∈ R is the initial condition of social learning.Write h τ,n ∈ H for the stage-game history of agent n from generation τ. Before playingher own stage game, each agent in generation t ≥ h τ,n ) n ∈ [0 , from each predecessor generation, 0 ≤ τ ≤ t −

1. If all generation τ predecessors used the stopping strategy S c τ for some c τ ∈ R , then the sub-dataset ( h τ,n ) n ∈ [0 , All generation τ predecessors had the same information about the fundamentals, so all of them wouldhave found the same stopping strategy subjectively optimal. H • ( c τ ). Agents are told the stopping strategies of their predecessors fromall past generations and use the entire dataset of histories to infer fundamentals. The spaceof feasible fundamentals is M = R as in Remark 1(a), so agents can ﬂexibly estimate theunconditional means of draws from diﬀerent periods, subject to their dogmatic belief inreversals.Agents only infer from predecessors’ histories, not from their behavior. This is rationalas information sets are nested across generations. For t > t , generation t observes allthe social information that generation t saw. In addition, generation t ’s dataset contains acomplete record of everything that happened in generation t ’s stage games. Since generation t has no private information that is unobserved by generation t , the behavior of thesepredecessors is uninformative about the fundamentals beyond what generation t can learnfrom the dataset of histories.In the large-generations model, generation t agents infer fundamentals ( µ , [ t ] , µ , [ t ] ) thatminimize the sum of the KL divergences between the implied history distribution under thefeasible model Ψ( µ , [ t ] , µ , [ t ] ; γ ) on the one hand, and the t observed history distributions ingenerations 0 ≤ τ ≤ t − t ’s minimization objectivebelow. Deﬁnition 8.

The large-generations pseudo-true fundamentals with respect to cutoff thresholds $(c_\tau)_{\tau=0}^{t-1}$ solve

$$\min_{\mu_1, \mu_2 \in \mathbb{R}} \; \sum_{\tau=0}^{t-1} D_{KL}\big( H^\bullet(c_\tau) \,\|\, H(\Psi(\mu_1, \mu_2; \gamma); c_\tau) \big), \tag{1}$$

where $D_{KL}$ is KL divergence from Definition 5. Denote the minimizers as $\mu_1^*(c_0, \dots, c_{t-1})$ and $\mu_2^*(c_0, \dots, c_{t-1})$.

I interpret the continuum of agents in each generation as an idealized, tractable modeling device representing a large but finite number of agents. Appendix OA 4 provides a finite-population foundation for inference and behavior in the continuum-population model. There, for the Gaussian case, I show that when an agent observes $t$ finite sub-datasets of histories drawn from distributions $H^\bullet(c_\tau)$ for $0 \le \tau \le t-1$, as these datasets grow large her inference and behavior almost surely converge to the infinite-population analogs.
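To make objective (1) concrete, here is a minimal Python sketch for the Gaussian case with a known common variance. It rests on a derivation that is not spelled out in the text and should be read as an assumption: the first-order conditions of (1) give $\mu_1^* = \mu_1^\bullet$, and $\mu_2^*$ equals the average of the single-cutoff values $\mu_2^\bullet - \gamma(\mu_1^\bullet - E[X_1 \mid X_1 \le c_\tau])$, weighted by $P[X_1 \le c_\tau]$. All function names are hypothetical.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def truncated_mean(c, mu, sigma):
    """E[X | X <= c] for X ~ N(mu, sigma^2)."""
    a = (c - mu) / sigma
    return mu - sigma * norm_pdf(a) / norm_cdf(a)

def mu2_single(c, mu1_true, mu2_true, sigma, gamma):
    """Pseudo-true second-period mean from one generation censored at c."""
    return mu2_true - gamma * (mu1_true - truncated_mean(c, mu1_true, sigma))

def mu2_pooled(cutoffs, mu1_true, mu2_true, sigma, gamma):
    """Assumed solution of (1): average of the single-cutoff values,
    weighted by how often second-period draws are observed, P[X1 <= c]."""
    weights = [norm_cdf((c - mu1_true) / sigma) for c in cutoffs]
    values = [mu2_single(c, mu1_true, mu2_true, sigma, gamma) for c in cutoffs]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    print(mu2_pooled([0.0, -0.5], 0.0, 0.0, 1.0, 0.5))
```

With a single predecessor generation the pooled value reduces to the single-cutoff formula, and with several generations it lies between the most and least pessimistic single-cutoff values.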

Now I develop the proof of Theorem 2.

Theorem 2.

Suppose Assumptions 1 and 2 hold. Starting from any initial condition and any m, cutoffs $(c_{[t]})_{t \ge 1}$ and beliefs $(\mu_{2,[t]})_{t \ge 1}$ form monotonic sequences across generations. When Assumption 3 also holds, there exists a unique steady state $\mu_2^\infty, c^\infty \in \mathbb{R}$ so that $c_{[t]} \to c^\infty$ and $(\mu_{1,[t]}, \mu_{2,[t]}) \to (\mu_1^\bullet, \mu_2^\infty)$ monotonically, regardless of the initial condition and m. This steady state is the same as the one in Theorem 1.

(Footnote: The stopping rules of predecessors, which agents are told, can also be exactly inferred from the infinite dataset.)

Towards a proof, first consider learning dynamics in a related auxiliary environment. The auxiliary environment is identical to the large-generations environment just described, except that agents in each generation $t \ge 1$ only infer from the histories of the immediate predecessor generation, $t-1$. Write $\mu^A_{[t]}$ and $c^A_{[t]}$ for the inference and cutoff threshold in generation $t$ of the auxiliary environment, where the superscript “A” distinguishes them from the corresponding processes of the baseline large-generations environment.

We have $\mu^A_{1,[t]} = \mu_1^\bullet$ for every $t \ge 1$, from Proposition 2. Also, it is easy to see that $(\mu^A_{2,[t]})_{t \ge 1}$ are iterates of the $I$ map from Definition 7, and that they must be monotonic since the pair of comparative statics $\partial C / \partial \mu_2 > 0$ and $d\mu_2^* / dc > 0$ imply that $I$ is increasing.

Figure 1 illustrates these dynamics for search without recall, $q = 0$. Consider the Gaussian case with $\mu_1^\bullet = \mu_2^\bullet = 0$, $\sigma^2 = 1$, and a fixed bias parameter $\gamma$, where the society starts at the objectively optimal cutoff threshold, $c_{[0]} = 0$. Society mislearns monotonically in both the baseline large-generations environment and the auxiliary environment. This mislearning is more exaggerated in the auxiliary environment, but both environments lead to the same long-run outcome.

The map $I(\cdot\,; \gamma)$ connects the environment where large generations of agents act simultaneously to the environment where agents act one by one. We can think of $I(\cdot\,; \gamma)$ as the one-generation-forward belief map in the auxiliary society, whose belief dynamics are closely related to the belief dynamics of the baseline large-generations environment. There are no large generations at all in the environment where agents act one by one, but there $I$ still plays a critical role in establishing the long-run convergence of beliefs and behavior. Intuitively, in the learning environment of Section 3, a belief based on the histories of one predecessor from each of many past generations replaces a belief based on a large dataset of histories from many agents all belonging to the same past generation.

[Figure 1 plot, “Dynamics of Beliefs in the First Four Generations.” Horizontal axis: generation. Vertical axis: belief about $\mu_2$. Series: large-generations environment, auxiliary environment.]

Figure 1: The dynamics of beliefs about $\mu_2$ in the first four generations for the Gaussian case (with $\sigma^2 = 1$). The stage game is search (without recall), the true fundamentals are $\mu_1^\bullet = \mu_2^\bullet = 0$, the bias parameter is $\gamma > 0$, and the initial condition is $c_{[0]} = 0$. In both the baseline large-generations environment and the auxiliary environment, beliefs are monotonic across generations, an illustration of Theorem 2. Beliefs in both environments converge to the same steady-state beliefs, though the rate of convergence is faster in the auxiliary environment.

I combine the asymptotic early-stopping result of Theorem 1 with the monotonic learning dynamics of Theorem 2 to deduce:

Corollary 1.

Suppose Assumptions 1, 2, and 3 hold. In the large-generations environment, if society starts at the objectively optimal initial condition $c_{[0]} = c^\bullet$, then expected payoff strictly decreases across all successive generations.

This stark “monotonic” mislearning result relies crucially on the endogenous-data setting. Each generation uses a lower acceptance threshold than its predecessors, a change with the side effect of changing the censoring threshold of its successors’ data. The new “type” of censored data causes the next generation to become more pessimistic about the fundamentals than any past generation.
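The monotone dynamics can be traced numerically in a simple special case. The sketch below simulates the auxiliary environment (each generation learns only from its immediate predecessor) for Gaussian search without recall. It assumes $u_1(x_1) = x_1$ and $u_2(x_1, x_2) = x_2$, so that the subjectively optimal cutoff solves $c = \mu_2 - \gamma(c - \mu_1)$, and it sets $\mu_1^\bullet = \mu_2^\bullet = 0$, $\sigma = 1$. The value $\gamma = 0.5$ and all function names are hypothetical choices for illustration, not taken from the paper.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def infer_mu2(c, gamma):
    """Pseudo-true mu_2 from histories censored at c (true means 0, sigma 1).
    Uses E[X1 | X1 <= c] = -pdf(c) / cdf(c) for a standard normal."""
    return -gamma * norm_pdf(c) / norm_cdf(c)

def cutoff(mu1, mu2, gamma):
    """Indifference point of x1 = mu2 - gamma * (x1 - mu1)."""
    return (mu2 + gamma * mu1) / (1.0 + gamma)

def objective_payoff(c):
    """True expected payoff of cutoff c: E[X1 * 1{X1 > c}] + P[X1 <= c] * 0."""
    return norm_pdf(c)

def simulate(c0, gamma, generations):
    cutoffs, beliefs = [c0], []
    for _ in range(generations):
        mu2 = infer_mu2(cutoffs[-1], gamma)   # learn from predecessor data
        beliefs.append(mu2)
        cutoffs.append(cutoff(0.0, mu2, gamma))  # mu_1 is learned correctly
    return cutoffs, beliefs

if __name__ == "__main__":
    cs, mus = simulate(c0=0.0, gamma=0.5, generations=6)
    for t, (c, m) in enumerate(zip(cs[1:], mus), start=1):
        print(t, round(m, 4), round(c, 4), round(objective_payoff(c), 4))
```

Starting from the objectively optimal cutoff of 0, beliefs about $\mu_2$, cutoffs, and objective expected payoffs all fall from one generation to the next in this example, in line with Theorem 2 and Corollary 1.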

In this section I explore a number of alternative model specifications to examine the robustness of my main results. The Online Appendix contains the proofs of results in this section and additional extensions.

Comparative Statics

In the ﬁrst extension, I consider how learning dynamics react to changes in stage-gameparameters. In general, when agents learn from exogenous data, their decision problem doesnot inﬂuence learning outcomes. This observation holds independently of whether agentsare misspeciﬁed. On the other hand, correctly speciﬁed agents in my setting always learncorrectly in the long run, so the stage game is again irrelevant. With misspeciﬁed learners inan endogenous-data setting, however, changes in the stage game carry long-run consequenceson agents’ beliefs about the fundamentals.

Deﬁnition 9.

Given a pair of second-period payoff functions $u_{2,H}, u_{2,L}$, say $u_{2,H}$ payoff dominates $u_{2,L}$ (abbreviated $u_{2,H} \succ u_{2,L}$) if for every $x_1 \in \mathbb{R}$, $u_{2,H}(x_1, x_2) \ge u_{2,L}(x_1, x_2)$ for every $x_2 \in \mathbb{R}$, and also $u_{2,H}(x_1, x_2) > u_{2,L}(x_1, x_2)$ for a positive-measure set of $x_2$ in $\mathbb{R}$.

For instance, in Example 1, increasing $q$ (the probability of recall) creates a new optimal-stopping problem that payoff dominates the old one. More generally, starting from any stage game with payoff functions $u_1$ and $u_2$, we can impose an extra waiting cost $\kappa_{\text{wait}} > 0$ to obtain a modified stage game with payoff functions $u_1$ and $u_{2,L}$, where $u_{2,L} = u_2 - \kappa_{\text{wait}}$. The modified stage game is payoff dominated by the unmodified one.

When $u_{2,H} \succ u_{2,L}$, a society facing the problem $(u_1, u_{2,H})$ always uses a higher stopping threshold than a society facing the problem $(u_1, u_{2,L})$, given the same beliefs about fundamentals. To state this formally, let $C_{u_1, u_2}$ be the optimal cutoff threshold function for the stage game $(u_1, u_2)$.

Lemma 3.

Suppose Assumption 1 holds for stage games $(u_1, u_{2,H})$ and $(u_1, u_{2,L})$, and $u_{2,H} \succ u_{2,L}$. For all $\mu_1, \mu_2 \in \mathbb{R}$, $\gamma > 0$, $C_{u_1, u_{2,H}}(\mu_1, \mu_2; \gamma) > C_{u_1, u_{2,L}}(\mu_1, \mu_2; \gamma)$.

The next proposition shows that when one stage game payoff dominates another in terms of second-period payoffs, the dominated stage game leads to more pessimistic beliefs and a lower cutoff threshold in the steady state.

Proposition 6.

Suppose both $(u_1, u_{2,H})$ and $(u_1, u_{2,L})$ satisfy Assumptions 1 and 3, and that $u_{2,H} \succ u_{2,L}$. Under Assumption 2, the steady state of $(u_1, u_{2,H})$ features strictly more optimistic belief about the second-period fundamental and a strictly higher cutoff threshold than the steady state of $(u_1, u_{2,L})$.

Combined with my main results on learning dynamics (Theorems 1 and 2), Proposition 6 illustrates how changing the stage-game payoff structure affects long-run inference. Consider two societies of gambler’s fallacy agents with the same bias parameter $\gamma > 0$, facing stage games $(u_1, u_{2,H})$ and $(u_1, u_{2,L})$ respectively, where $u_{2,H}$ payoff dominates $u_{2,L}$. Even if the latter society starts with a much more optimistic belief about $\mu_2$, in the long run the second society will end up with strictly more pessimistic beliefs and will use strictly lower cutoff thresholds. Since steady-state beliefs are too pessimistic in both societies, the second society’s long-run beliefs are more distorted.

This comparative statics result provides novel predictions about how the economic environment affects biased agents’ inference. Applied to the hiring context from Example 1, this result says that when managers are more impatient (i.e., suffer a greater waiting cost) or when they have a lower chance of recalling previous applicants, they will end up with more pessimistic beliefs about the labor pool. The direction of the comparative statics is another expression of the positive-feedback cycle between stopping threshold and inference. When managers become more impatient, for instance, they use a lower acceptance threshold as they wish to finish recruiting earlier. The lower cutoff intensifies the censoring effect on histories, leading to more pessimistic inference about the fundamentals. The extra pessimism, in turn, leads future managers to further lower their acceptance threshold, amplifying the initial change in behavior that came from the increase in waiting cost.

From a policy perspective, subsidizing longer search (i.e., decreasing $\kappa_{\text{wait}}$) unambiguously improves asymptotic learning accuracy for biased agents. So, even a policymaker who is ignorant of $(\mu_1^\bullet, \mu_2^\bullet)$ can partially correct society’s long-run beliefs. We can also think of this policy as a test of misspecification, as it alters steady-state beliefs only when agents are misspecified. The test can be conducted without knowledge of the true data-generating process.
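The effect of the waiting cost on the steady state can be illustrated in the simplest Gaussian case. The sketch below assumes search without recall with $u_1(x_1) = x_1$ and $u_2(x_1, x_2) = x_2 - \kappa_{\text{wait}}$, true means of 0, $\sigma = 1$, and agents who learn from the immediate predecessor generation only. Under these assumptions the cutoff is $c = (\mu_2 + \gamma\mu_1 - \kappa_{\text{wait}})/(1+\gamma)$. The numerical values and function names are hypothetical.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def infer_mu2(c, gamma):
    """Pseudo-true mu_2 when histories are censored at c (true means 0)."""
    return -gamma * norm_pdf(c) / norm_cdf(c)

def cutoff(mu2, gamma, kappa_wait):
    """Solves x1 = mu2 - gamma * x1 - kappa_wait, with mu_1 = 0."""
    return (mu2 - kappa_wait) / (1.0 + gamma)

def steady_state(gamma, kappa_wait, iterations=200):
    """Iterate belief -> cutoff -> belief until it settles."""
    c = 0.0
    for _ in range(iterations):
        c = cutoff(infer_mu2(c, gamma), gamma, kappa_wait)
    return infer_mu2(c, gamma), c

if __name__ == "__main__":
    for kappa in (0.0, 0.3):  # hypothetical waiting costs
        mu2, c = steady_state(gamma=0.5, kappa_wait=kappa)
        print(kappa, round(mu2, 4), round(c, 4))
```

In this example the society with the higher waiting cost ends up with both a lower cutoff and a more pessimistic steady-state belief about $\mu_2$, which is the direction stated in Proposition 6; reading the two rows in reverse order shows the effect of a search subsidy.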
For this and subsequent extensions, I specialize to the Gaussian case.

So far, I have assumed agents hold dogmatic and correct beliefs about the variance of $X_1$ and the conditional variance of $X_2 \mid (X_1 = x_1)$. In this extension, I expand the set of feasible models and consider agents who are uncertain about the variances of the draws and jointly estimate means and variances. I show that agents exaggerate variances in a way that depends on the severity of data censoring, and study how this belief in fictitious variation strengthens the positive-feedback cycle between beliefs and behavior.

For $\mu_1, \mu_2 \in \mathbb{R}$, $\sigma_1^2, \sigma_2^2 \ge 0$, and $\gamma \ge 0$, let $\Psi(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2; \gamma)$ refer to the joint distribution

$$X_1 \sim N(\mu_1, \sigma_1^2), \qquad (X_2 \mid X_1 = x_1) \sim N(\mu_2 - \gamma(x_1 - \mu_1), \sigma_2^2).$$

Objectively, $X_1, X_2$ are independent Gaussian random variables each with a variance of $(\sigma^\bullet)^2 > 0$, so the true joint distribution of $(X_1, X_2)$ is $\Psi^\bullet = \Psi(\mu_1^\bullet, \mu_2^\bullet, (\sigma^\bullet)^2, (\sigma^\bullet)^2; 0)$. Suppose agents have a full-support belief over the class of feasible models $\{\Psi(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2; \gamma) : \mu_1, \mu_2 \in \mathbb{R}, \sigma_1^2, \sigma_2^2 \ge 0\}$ for a fixed $\gamma > 0$. For this extension, “fundamentals” refer to the four parameters $\mu_1, \mu_2, \sigma_1^2, \sigma_2^2$.

Following Definition 5, write $D_{KL}(H^\bullet(c) \,\|\, H(\Psi(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2; \gamma); c))$ to denote the KL divergence between the true distribution of histories with censoring threshold $c$ and the implied history distribution under the fundamentals $\mu_1, \mu_2, \sigma_1^2, \sigma_2^2$. This divergence is given by

$$\int_c^\infty \phi(x_1; \mu_1^\bullet, (\sigma^\bullet)^2) \cdot \ln\!\left( \frac{\phi(x_1; \mu_1^\bullet, (\sigma^\bullet)^2)}{\phi(x_1; \mu_1, \sigma_1^2)} \right) dx_1 \tag{2}$$
$$+ \int_{-\infty}^c \left\{ \int_{-\infty}^\infty \phi(x_1; \mu_1^\bullet, (\sigma^\bullet)^2) \cdot \phi(x_2; \mu_2^\bullet, (\sigma^\bullet)^2) \cdot \ln\!\left[ \frac{\phi(x_1; \mu_1^\bullet, (\sigma^\bullet)^2) \cdot \phi(x_2; \mu_2^\bullet, (\sigma^\bullet)^2)}{\phi(x_1; \mu_1, \sigma_1^2) \cdot \phi(x_2; \mu_2 - \gamma(x_1 - \mu_1), \sigma_2^2)} \right] dx_2 \right\} dx_1,$$

where $\phi(x; \mu, \sigma^2)$ is the Gaussian density with mean $\mu$ and variance $\sigma^2$, evaluated at $x$. The next proposition characterizes the pseudo-true fundamentals $\mu_1^*, \mu_2^*, (\sigma_1^*)^2, (\sigma_2^*)^2$ that minimize Equation (2) in closed form.

Proposition 7.

The solutions of

$$\min_{\mu_1, \mu_2 \in \mathbb{R},\; \sigma_1^2, \sigma_2^2 \ge 0} D_{KL}\big(H^\bullet(c) \,\|\, H(\Psi(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2; \gamma); c)\big)$$

are $\mu_1^* = \mu_1^\bullet$, $\mu_2^* = \mu_2^\bullet - \gamma(\mu_1^\bullet - E[X_1 \mid X_1 \le c])$, $(\sigma_1^*)^2 = (\sigma^\bullet)^2$, and $(\sigma_2^*)^2 = (\sigma^\bullet)^2 + \gamma^2 \, \mathrm{Var}[X_1 \mid X_1 \le c]$.

Comparing Proposition 7 with the expressions for $\mu_1^*, \mu_2^*$ in Example 2 shows that the agent makes the same misinference about the means regardless of whether she knows the variances. This shows the robustness of the over-pessimism prediction in an environment where agents jointly estimate both means and variances.

Biased agents correctly estimate the first-period variance, $(\sigma_1^*)^2 = (\sigma^\bullet)^2$, but overestimate second-period variance. The magnitude of this distortion increases in the severity of the gambler’s fallacy but decreases with the severity of the censoring, as $\mathrm{Var}[X_1 \mid X_1 \le c]$ increases in $c$ for $X_1$ Gaussian.

Here is the intuition. Whereas the objective conditional distribution of $X_2 \mid (X_1 = x_1)$ is independent of $x_1$, the agent entertains different beliefs about this distribution for different $x_1$. The agent’s inference about $\mu_2^*$ ensures her belief about $X_2 \mid X_1 = x_1$ fits the data well following “typical” realizations of $x_1$ under the censoring restriction $X_1 \le c$. But the agent continues to be surprised by streaks of bad draws in the data. To better account for these surprising observations, the agent increases the estimated conditional variance of $X_2 \mid (X_1 = x_1)$ and attributes these unexpected realizations of $X_2$ to “noise.” More “noise” is needed when $\mathrm{Var}[X_1 \mid X_1 \le c]$ is larger, for the frequency of these surprising observations depends on how much $X_1$ tends to deviate from its typical value of $E[X_1 \mid X_1 \le c]$ under the restriction $X_1 \le c$.

An equivalent formulation of this result helps interpret the distorted $(\sigma_2^*)^2$. We may write the feasible model $\Psi(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2; \gamma)$ with $\sigma_2^2 \ge \sigma_1^2$ as

$$X_1 = \mu_1 + \epsilon_1, \qquad X_2 = \mu_2 + \zeta + \epsilon_2,$$

where $\epsilon_1 \sim N(0, \sigma_1^2)$, $\epsilon_2 \mid \epsilon_1 \sim N(-\gamma \epsilon_1, \sigma_1^2)$, and $\zeta \sim N(0, \sigma_\zeta^2)$, with $\zeta$ independent of $\epsilon_1, \epsilon_2$.
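The closed forms in Proposition 7 are easy to evaluate with the standard moments of an upper-truncated normal. The following Python sketch does so; the function names and the parameter values in the example are hypothetical, and the truncated-normal mean and variance formulas are the textbook ones rather than anything specific to this paper.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def truncated_moments(c, mu, sigma):
    """Mean and variance of X | X <= c for X ~ N(mu, sigma^2)."""
    a = (c - mu) / sigma
    lam = norm_pdf(a) / norm_cdf(a)
    mean = mu - sigma * lam
    var = sigma ** 2 * (1.0 - a * lam - lam ** 2)
    return mean, var

def pseudo_true(c, mu1_true, mu2_true, sigma_true, gamma):
    """Proposition 7 closed forms: (mu1*, mu2*, var1*, var2*)."""
    mean_c, var_c = truncated_moments(c, mu1_true, sigma_true)
    mu2_star = mu2_true - gamma * (mu1_true - mean_c)
    var2_star = sigma_true ** 2 + gamma ** 2 * var_c
    return mu1_true, mu2_star, sigma_true ** 2, var2_star

if __name__ == "__main__":
    for c in (-1.0, 0.0, 1.0):  # hypothetical censoring thresholds
        print(c, pseudo_true(c, 0.0, 0.0, 1.0, 0.5))
```

Evaluating the function at a low and a high cutoff shows both forces described above: a lower cutoff makes the inferred second-period mean more pessimistic but shrinks the fictitious variation in the second-period variance.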
In the context where $X_1$ and $X_2$ represent the quality realizations of two candidates from the early and late applicant pools, $\zeta$ is a vacancy-specific shock to the average quality of the second-period applicant. A positive $\sigma_\zeta^2$ means there are some vacancies for which the late applicants are an especially poor fit and some others for which they are especially suitable. Proposition 7 says that in an environment where all jobs are objectively homogeneous with respect to the fit of the late applicants, biased managers who find it possible that jobs are heterogeneous in this dimension will indeed estimate a positive amount of this heterogeneity, $\sigma_\zeta^2 > 0$, from the censored histories of their predecessors. This added heterogeneity allows agents to better rationalize histories $(X_1, X_2)$ where both candidates have unusually high/low qualities as vacancies that happen to be an especially good/bad fit for second-period applicants. That is, the biased managers reason that the realization of the vacancy-specific fixed effect, $\zeta$, must have been far from 0.

This phenomenon relates to findings in Rabin (2002) and Rabin and Vayanos (2010), who refer to exaggeration of variance under the gambler’s fallacy as fictitious variation. The key innovation of Proposition 7 is to show, in an endogenous-data setting, how the degree of fictitious variation depends on the severity of censoring. To highlight this point, I now derive two results focusing on the interplay between fictitious variation and endogenous censoring. For simplicity, I derive these results using the auxiliary large-generations environment defined in Section 4.1, where agents arrive in large generations and only infer from the histories of the immediate predecessor generation.

The first result says that when the second-period payoff $u_2(x_1, x_2)$ is a linear or convex function of $x_2$, the positive-feedback cycle from Section 4 continues to obtain: cutoffs, beliefs about fundamentals, and beliefs about variances form monotonic sequences across generations. This weak convexity includes the case of search with recall in Example 1 for any recall probability $0 \le q < 1$.

Definition 10.

The optimal-stopping problem is convex if for every $x_1 \in \mathbb{R}$, $x_2 \mapsto u_2(x_1, x_2)$ is convex, with strict convexity for $x_1$ in a positive-measure subset of $\mathbb{R}$. The optimal-stopping problem is concave if for every $x_1 \in \mathbb{R}$, $x_2 \mapsto u_2(x_1, x_2)$ is concave, with strict concavity for $x_1$ in a positive-measure subset of $\mathbb{R}$.

Proposition 8.

Suppose the optimal-stopping problem is convex. Suppose agents start with a full-support prior over $\{\Psi(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2; \gamma) : \mu_1, \mu_2 \in \mathbb{R}, \sigma_1^2, \sigma_2^2 \ge 0\}$ and society starts at the initial condition $c_{[0]} \in \mathbb{R}$. For $t \ge 1$, denote the beliefs of generation $t$ as $(\mu_{1,[t]}, \mu_{2,[t]}, \sigma^2_{1,[t]}, \sigma^2_{2,[t]})$ and their cutoff threshold as $c_{[t]}$. Then $\mu_{1,[t]} = \mu_1^\bullet$ and $\sigma^2_{1,[t]} = (\sigma^\bullet)^2$ for all $t$, while $(\mu_{2,[t]})_{t \ge 1}$, $(\sigma^2_{2,[t]})_{t \ge 1}$, and $(c_{[t]})_{t \ge 1}$ are monotonic sequences.

The intuition is straightforward. Suppose generation $t$ uses a more relaxed acceptance threshold $c_{[t]} < c_{[t-1]}$ than generation $t-1$, resulting in a more severely censored dataset. By the usual censoring effect without variance uncertainty, generation $t+1$ becomes more pessimistic about the second-period mean than generation $t$. In addition, by Proposition 7 we know that generation $t+1$ suffers less from fictitious variation than generation $t$. This implies generation $t+1$ agents would perceive less continuation value than generation $t$ agents even if they held the same beliefs about the means, for a larger variance in $X_2 \mid (X_1 = x_1)$ improves the expected payoff when continuing due to the convexity of $u_2$ in $x_2$. Combining these two forces, we deduce $c_{[t+1]} < c_{[t]}$.

The intuition just discussed shows that uncertainty about variance strengthens the monotonicity result. To be more precise, suppose $c_{[t]} < c_{[t-1]}$. Consider a hypothetical generation $t+1$ agent who dogmatically adopts generation $t$’s beliefs about variances, $\sigma^2_{1,[t]}$ and $\sigma^2_{2,[t]}$, and infers from the class of models $\{\Psi(\mu_1, \mu_2, \sigma^2_{1,[t]}, \sigma^2_{2,[t]}; \gamma) : \mu_1, \mu_2 \in \mathbb{R}\}$. Based on generation $t$’s histories, this hypothetical agent makes inferences about means and chooses a cutoff threshold, $\hat\mu_{1,[t+1]}, \hat\mu_{2,[t+1]}, \hat c_{[t+1]}$. By comparing Proposition 7 and Example 2, $\hat\mu_{1,[t+1]} = \mu_{1,[t+1]}$ and $\hat\mu_{2,[t+1]} = \mu_{2,[t+1]}$, but $c_{[t+1]} < \hat c_{[t+1]} < c_{[t]}$.
That is, while the cutoff threshold of this hypothetical agent follows the monotonicity pattern in the previous two generations, $\hat c_{[t+1]}$
2, the society with uncertainty about variances ends up with a more pessimistic/optimistic belief about the second-period mean compared with the society that knows the variances, provided the optimal-stopping problem is convex/concave. This divergence depends crucially on the endogenous-learning setting, for Proposition 7 implies that the two societies make the same inferences about the means when given the same data. Allowing uncertainty on one dimension (variance) ends up affecting society’s long-run inference in another dimension (mean).

Formally, consider two societies of agents, A and B. Agents in society A start with a full-support prior over $\{\Psi(\mu_1, \mu_2, (\sigma^\bullet)^2, (\sigma^\bullet)^2; \gamma) : \mu_1, \mu_2 \in \mathbb{R}\}$. Agents in society B start with a full-support prior over $\{\Psi(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2; \gamma) : \mu_1, \mu_2 \in \mathbb{R}, \sigma_1^2, \sigma_2^2 \ge 0\}$. Fix the same Generation 0 initial condition $c_{[0]} \in \mathbb{R}$ for both societies. For $t \ge 1$, denote the beliefs of Generation $t$ in society $k \in \{A, B\}$ as $(\mu_{1,[k,t]}, \mu_{2,[k,t]}, (\sigma_{1,[k,t]})^2, (\sigma_{2,[k,t]})^2)$ and their cutoff threshold as $c_{[k,t]}$.

Proposition 9. In the first generation, $\mu_{1,[A,1]} = \mu_{1,[B,1]}$ and $\mu_{2,[A,1]} = \mu_{2,[B,1]}$. If the optimal-stopping problem is convex, then $\mu_{2,[B,t]} > \mu_{2,[A,t]}$ and $c_{[B,t]} > c_{[A,t]}$ for every $t \ge 2$. If the optimal-stopping problem is concave, then $\mu_{2,[B,t]} < \mu_{2,[A,t]}$ and $c_{[B,t]} < c_{[A,t]}$ for every $t \ge 2$.

$(X_1, X_2)$ and Uncertainty About $\gamma$

So far I have assumed that draws $(X_1, X_2)$ within the stage game are objectively independent, and that agents have a dogmatic $\gamma > 0$, interpreted as the severity of the gambler’s fallacy bias. This extension considers a joint relaxation of these two assumptions.

Suppose the true model is $(X_1, X_2) \sim \Psi(\mu_1^\bullet, \mu_2^\bullet; \gamma^\bullet)$, where possibly $\gamma^\bullet \ne 0$. Agents jointly estimate $(\mu_1, \mu_2, \gamma) \in \mathbb{R}^3$, with a prior supported on $\mathbb{R} \times \mathbb{R} \times [\underline{\gamma}, \bar\gamma]$ where $[\underline{\gamma}, \bar\gamma]$ is a finite interval. The next result generalizes the pseudo-true fundamentals in Example 2. It shows that when $\gamma^\bullet \notin [\underline{\gamma}, \bar\gamma]$, the agent infers $\gamma^*$ equal to the boundary point of the interval that is closer to $\gamma^\bullet$. Given the estimated pseudo-true parameter $\gamma^*$, the estimates of the first- and second-period fundamentals take similar forms to those in Example 2.

Proposition 10.

Suppose $\gamma^\bullet \notin [\underline{\gamma}, \bar\gamma]$. Let $\tilde\gamma = \bar\gamma$ if $\gamma^\bullet > \bar\gamma$, and $\tilde\gamma = \underline{\gamma}$ if $\gamma^\bullet < \underline{\gamma}$. The solution of the KL-divergence minimization problem

$$\min_{\mu_1, \mu_2 \in \mathbb{R},\; \gamma \in [\underline{\gamma}, \bar\gamma]} D_{KL}\big( H(\Psi(\mu_1^\bullet, \mu_2^\bullet; \gamma^\bullet); c) \,\|\, H(\Psi(\mu_1, \mu_2; \gamma); c) \big)$$

is given by $\mu_1^*(c) = \mu_1^\bullet$, $\mu_2^*(c) = \mu_2^\bullet + (\gamma^\bullet - \tilde\gamma) \cdot (\mu_1^\bullet - E_{\Psi^\bullet}[X_1 \mid X_1 \le c])$, and $\gamma^*(c) = \tilde\gamma$.

Intuitively, we may expect the closest distance (in the KL divergence sense) from the set of feasible models $\{\Psi(\mu_1, \mu_2; \gamma) : \mu_1, \mu_2 \in \mathbb{R}\}$ to the objective distribution $\Psi(\mu_1^\bullet, \mu_2^\bullet; \gamma^\bullet)$ to increase in $|\gamma - \gamma^\bullet|$. Proposition 10 confirms this intuition, showing that the pseudo-true model from the set $\{\Psi(\mu_1, \mu_2; \gamma) : \mu_1, \mu_2 \in \mathbb{R}, \gamma \in [\underline{\gamma}, \bar\gamma]\}$ lies in the subset $\{\Psi(\mu_1, \mu_2; \gamma) : \mu_1, \mu_2 \in \mathbb{R}, \gamma = \tilde\gamma\}$, where $\tilde\gamma$ is the closest point (in the Euclidean sense) to $\gamma^\bullet$ in the interval $[\underline{\gamma}, \bar\gamma]$.

When $\gamma^\bullet = 0$ and $\bar\gamma < 0$, this result shows that over-pessimism in inference is robust to agents learning the correlation of $X_1$ and $X_2$, provided the support of their uncertainty about correlation lies to the left of 0 and excludes 0. In this case, it is also easy to see that the learning dynamics in the large-generations auxiliary environment are the same as when agents start with a dogmatic belief in $\gamma = \bar\gamma$.

$\mu_1 = \mu_2$

I now consider the special case where the true fundamentals are time-invariant, $\mu_1^\bullet = \mu_2^\bullet = \mu^\bullet \in \mathbb{R}$. If agents’ feasible fundamentals are $M = \mathbb{R}^2$ as in Remark 1(a), then Proposition 2 continues to apply. But now suppose agents know the fundamentals are time-invariant and only have uncertainty over this common value, so the set of feasible fundamentals is the diagonal $M = \{(x, x) : x \in \mathbb{R}\}$, as in Remark 1(d). I investigate inference in this setting when agents’ prior belief over feasible models is supported on $\{\Psi(\mu, \mu; \gamma) : \mu \in \mathbb{R}\}$.

Let $\mu^*_\triangle(c) \in \mathbb{R}$ stand for the common fundamental that minimizes the KL divergence relative to the history distribution $H^\bullet(c)$, that is,

$$\mu^*_\triangle(c) := \arg\min_{\mu \in \mathbb{R}} D_{KL}\big( H^\bullet(c) \,\|\, H(\Psi(\mu, \mu; \gamma); c) \big).$$

The next result characterizes $\mu^*_\triangle(c)$.

Proposition 11.

$$\mu^*_\triangle(c) = \frac{1}{1 + P[X_1 \le c] \cdot (1+\gamma)^2}\, \mu_1^\circ(c) + \frac{P[X_1 \le c] \cdot (1+\gamma)^2}{1 + P[X_1 \le c] \cdot (1+\gamma)^2}\, \mu_2^\circ(c),$$

where $\mu_1^\circ(c) = \mu^\bullet$ and $\mu_2^\circ(c) = \mu^\bullet - \frac{\gamma}{1+\gamma}\,(\mu^\bullet - E[X_1 \mid X_1 \le c])$.

Agents face two kinds of data about the common fundamental: observations of first-period draws and observations of second-period draws. Feasible models $\Psi(\mu_1^\circ(c), \mu_1^\circ(c); \gamma)$ and $\Psi(\mu_2^\circ(c), \mu_2^\circ(c); \gamma)$ minimize the KL divergence of these two kinds of data, respectively. The overall KL-divergence minimizing estimator is a certain convex combination between these two points.
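As a numerical check on the convex-combination form in Proposition 11, the Python sketch below evaluates it in the Gaussian case and compares it against a direct quadratic loss. Two things here are assumptions rather than statements from the paper: the weights $1/(1 + P(1+\gamma)^2)$ and $P(1+\gamma)^2/(1 + P(1+\gamma)^2)$ are a reading of an extraction-damaged formula, and the loss function is the mean-dependent part of the KL objective under a known common variance, derived separately. Function names and parameter values are hypothetical.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def censored_stats(c, mu_true, sigma):
    """Return P[X1 <= c] and E[X1 | X1 <= c] for X1 ~ N(mu_true, sigma^2)."""
    a = (c - mu_true) / sigma
    p = norm_cdf(a)
    return p, mu_true - sigma * norm_pdf(a) / p

def common_mean(c, mu_true, sigma, gamma):
    """Assumed form of Proposition 11: weighted average of the two targets."""
    p, e1 = censored_stats(c, mu_true, sigma)
    mu2_circ = mu_true - gamma / (1.0 + gamma) * (mu_true - e1)
    w2 = p * (1.0 + gamma) ** 2 / (1.0 + p * (1.0 + gamma) ** 2)
    return (1.0 - w2) * mu_true + w2 * mu2_circ

def unconstrained_mu2(c, mu_true, sigma, gamma):
    """Pseudo-true mu_2 when the two means are estimated separately."""
    _, e1 = censored_stats(c, mu_true, sigma)
    return mu_true - gamma * (mu_true - e1)

def loss(mu, c, mu_true, sigma, gamma):
    """Mean-dependent part of the KL objective, up to positive constants."""
    p, e1 = censored_stats(c, mu_true, sigma)
    return (mu_true - mu) ** 2 + p * (mu_true + gamma * e1 - (1.0 + gamma) * mu) ** 2

if __name__ == "__main__":
    args = (-0.3, 0.0, 1.0, 0.5)  # hypothetical c, mu, sigma, gamma
    print(unconstrained_mu2(*args), common_mean(*args))
```

In this example the constrained estimate sits strictly between the unconstrained pseudo-true $\mu_2^*(c)$ and the true mean, and small perturbations of it in either direction raise the loss.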
Through the term $P[X_1 \le c]$, the relative weight given to $\mu_2^\circ(c)$ increases as the cutoff $c$ increases, because the second-period data is observed more often if the dataset of histories is censored with a higher cutoff in the first period.

(Footnote: Note that $\mu_2^\circ(c)$ differs from the pseudo-true fundamental $\mu_2^*(c)$ from Example 2. The estimator $\mu_2^\circ(c)$ minimizes the KL divergence of second-period draws under the constraint that the same fundamental must be inferred for both periods, whereas $\mu_2^*(c)$ minimizes this divergence when the first-period fundamental is fixed at its true value, $\mu_1^\bullet$.)

For any censoring threshold $c$ generating the history distribution, agents underestimate the common fundamental. We have $\mu_2^\circ(c) < \mu^\bullet$ while $\mu_1^\circ(c) = \mu^\bullet$. This shows the robustness of the over-pessimism result from the setting with $M = \mathbb{R}^2$. However, the extent of over-pessimism about $\mu_2$ is dampened relative to agents who can flexibly estimate different $\mu_1$ and $\mu_2$ for the two periods. Compared with the unconstrained pseudo-true fundamentals from Example 2, we have $\mu_2^\circ(c) > \mu_2^*(c)$ since $\frac{\gamma}{1+\gamma} < \gamma$, hence $\mu^*_\triangle(c) > \mu_2^*(c)$. This makes intuitive sense: when unconstrained, agents come to two different beliefs about $\mu_1$ and $\mu_2$, even though they are objectively the same. They hold correct beliefs about $\mu_1$ but pessimistic beliefs about $\mu_2$. When constrained to a common inference across two fundamentals, agents distort their belief about $\mu_1$ downwards and their belief about $\mu_2$ upwards, relative to the unconstrained environment.

This paper studies endogenous learning dynamics of misspecified agents. I focus on the gambler’s fallacy, a non-self-confirming misspecification where no feasible beliefs of the biased agents can exactly match the data. In natural optimal-stopping problems, agents tend to stop after “good enough” early draws, where the threshold for “good enough” evolves as agents update their beliefs about the underlying distributions. Stopping decisions thus impose an endogenous censoring effect on histories, which in turn affects the beliefs of subsequent agents. The statistical bias interacts with data censoring, generating over-pessimism about the fundamentals and resulting in stopping too early in the long run. These asymptotic mistakes are driven by a positive-feedback loop between distorted beliefs and distorted behavior.

I have studied a particular behavioral bias (the gambler’s fallacy) in a natural environment where censoring happens (histories in optimal-stopping problems). The key mechanism I highlight, the interaction between data censoring and bias, applies more broadly and delivers different predictions in different contexts. For example, the same mechanism would lead to an over-estimation of $\mu_2$ if the agents believe in some $\gamma < 0$. I leave the interaction of other kinds of behavioral learning with other censoring mechanisms to future work.

References

Amemiya, T. (1985): Advanced Econometrics, Harvard University Press.

Andrews, D. W. (1992): “Generic uniform convergence,” Econometric Theory, 8, 241–257.

Bar-Hillel, M. and W. A. Wagenaar (1991): “The perception of randomness,” Advances in Applied Mathematics, 12, 428–454.

Barron, G. and S. Leider (2010): “The role of experience in the gambler’s fallacy,” Journal of Behavioral Decision Making, 23, 117–129.

Benjamin, D. J., D. A. Moore, and M. Rabin (2017): “Biased Beliefs About Random Samples: Evidence from Two Integrated Experiments,” Working Paper.

Berk, R. H. (1966): “Limiting behavior of posterior distributions when the model is incorrect,” Annals of Mathematical Statistics, 37, 51–58.

Bohren, J. A. (2016): “Informational herding with model misspecification,” Journal of Economic Theory, 163, 222–247.

Bohren, J. A. and D. Hauser (2018): “Bounded Rationality And Learning: A Framework and A Robustness Result,” Working Paper.

Bromiley, P. (2018): “Products and Convolutions of Gaussian Probability Density Functions,” Working Paper.

Bunke, O. and X. Milhaud (1998): “Asymptotic behavior of Bayes estimates under possibly incorrect models,” Annals of Statistics, 26, 617–644.

Camerer, C. F. (1987): “Do biases in probability judgment matter in markets? Experimental evidence,” American Economic Review, 77, 981–997.

Chen, D. L., T. J. Moskowitz, and K. Shue (2016): “Decision making under the gambler’s fallacy: Evidence from asylum judges, loan officers, and baseball umpires,” Quarterly Journal of Economics, 131, 1181–1242.

Cox, D. R. (1962): Renewal Theory, Methuen.

Croson, R. and J. Sundali (2005): “The gambler’s fallacy and the hot hand: Empirical data from casinos,” Journal of Risk and Uncertainty, 30, 195–209.

Crowder, M. J. (2001): Classical Competing Risks, Chapman and Hall/CRC.

Enke, B. (2019): “What you see is all there is,” Working Paper.

Esponda, I. and D. Pouzo (2016): “Berk–Nash equilibrium: A framework for modeling agents with misspecified models,” Econometrica, 84, 1093–1130.

Esponda, I., D. Pouzo, and Y. Yamamoto (2019): “Asymptotic Behavior of Bayesian Learners with Misspecified Models,” In Preparation.

Eyster, E. and M. Rabin (2010): “Naive herding in rich-information settings,” American Economic Journal: Microeconomics, 2, 221–243.

Frick, M., R. Iijima, and Y. Ishii (2019): “Misinterpreting Others and the Fragility of Social Learning,” Working Paper.

Fudenberg, D., G. Romanyuk, and P. Strack (2017): “Active learning with a misspecified prior,” Theoretical Economics, 12, 1155–1189.

Gagnon-Bartsch, T., M. Rabin, and J. Schwartzstein (2018): “Channeled Attention and Stable Errors,” Working Paper.

Guarino, A. and P. Jehiel (2013): “Social Learning with Coarse Inference,” American Economic Journal: Microeconomics, 5, 147–74.

Gumbel, E. J. (1960): “Bivariate exponential distributions,” Journal of the American Statistical Association, 55, 698–707.

Heckman, J. J. and B. E. Honoré (1989): “The identifiability of the competing risks model,” Biometrika, 76, 325–330.

Heidhues, P., B. Koszegi, and P. Strack (2018): “Unrealistic expectations and misguided learning,” Econometrica, 86, 1159–1214.

Heidhues, P., B. Koszegi, and P. Strack (2019): “Convergence in Misspecified Learning Models with Endogenous Actions,” Working Paper.

Jehiel, P. (2018): “Investment strategy and selection bias: An equilibrium perspective on overoptimism,” American Economic Review, 108, 1582–97.

Mueller, A. I., J. Spinnewijn, and G. Topa (2018): “Job Seekers’ Perceptions and Employment Prospects: Heterogeneity, Duration Dependence and Bias,” Working Paper.

Narayanan, S. and P. Manchanda (2012): “An empirical analysis of individual level casino gambling behavior,” Quantitative Marketing and Economics, 10, 27–62.

Nyarko, Y. (1991): “Learning in mis-specified models and the possibility of cycles,” Journal of Economic Theory, 55, 416–427.

Rabin, M. (2002): “Inference by believers in the law of small numbers,” Quarterly Journal of Economics, 117, 775–816.

Rabin, M. and D. Vayanos (2010): “The gambler’s and hot-hand fallacies: Theory and applications,” Review of Economic Studies, 77, 730–778.

Suetens, S., C. B. Galbo-Jørgensen, and J.-R. Tyran (2016): “Predicting lotto numbers: a natural experiment on the gambler’s fallacy and the hot-hand fallacy,” Journal of the European Economic Association, 14, 584–607.

Terrell, D. (1994): “A test of the gambler’s fallacy: Evidence from pari-mutuel games,” Journal of Risk and Uncertainty, 8, 309–317.

Tsiatis, A. (1975): “A nonidentifiability aspect of the problem of competing risks,” Proceedings of the National Academy of Sciences, 72, 20–22.

Appendix

A Proofs of Results in Sections 2, 3, and 4

A.1 Proof of Claim 1

Proof.

For Example 1, clearly $u_1$ and $u_2$ are strictly increasing functions of $x_1$ and $x_2$ respectively. We also have that $|u_2(x_1', \bar x_2) - u_2(x_1'', \bar x_2)| \le q(x_1' - x_1'')$ for $x_1' > x_1''$ and any $\bar x_2$, while $u_1'(x_1) = 1$. This shows Assumption 1(b) holds. If $x_1 > 0$ and $x_2 < 0$, then $u_2(x_1, x_2) = q \cdot x_1 + (1-q) x_2 < x_1 = u_1(x_1)$, and conversely $x_1 < 0$, $x_2 > 0$ imply $u_2(x_1, x_2) > u_1(x_1)$. This shows Assumption 1(c) holds. It is clear that $u_1, u_2$ are continuous. Also,

$$|u_2(\bar x_1, x_2 + \bar k)| \le q(|\bar x_1| + |x_2 + \bar k|) + (1-q)|x_2 + \bar k| \le q|\bar x_1| + |\bar k| + |x_2|.$$

Since the objective distribution satisfies $E(|X_2|) < \infty$, we have $E(|u_2(\bar x_1, X_2 + \bar k)|) \le q|\bar x_1| + |\bar k| + E(|X_2|) < \infty$. This shows Assumption 1(d) holds.

A.2 Proofs of Proposition 1 and Lemma 2

The argument behind Proposition 1 consists of three lemmas (A.1, A.3, and A.4) that correspond to the three statements in the proposition. Along the way, I will also prove Lemma 2.

A.2.1 The Optimal Strategy Has a Cutoﬀ Form

In the first part, I establish Lemma A.1.

Lemma A.1.

Under Assumption 1 and the feasible model $\Psi(\mu_1, \mu_2; \gamma)$ for any $\gamma > 0$, there exists a cutoff threshold $C(\mu_1, \mu_2; \gamma) \in \mathbb{R}$ such that: (i) the agent strictly prefers stopping after any $x_1 > C(\mu_1, \mu_2; \gamma)$; (ii) the agent is indifferent between continuing and stopping after $x_1 = C(\mu_1, \mu_2; \gamma)$; (iii) the agent strictly prefers continuing after any $x_1 < C(\mu_1, \mu_2; \gamma)$.

Suppose $X_1 = x_1$. Consider the payoff difference between accepting it and continuing under the feasible model $\Psi(\mu_1, \mu_2; \gamma)$ for $\gamma \ge 0$:

$$D(x_1; \mu_1, \mu_2, \gamma) := u_1(x_1) - E_{\Psi(\mu_1, \mu_2; \gamma)}[u_2(x_1, X_2) \mid X_1 = x_1].$$

I abbreviate this as $D(x_1)$ when $\Psi$ is fixed. Lemma A.2 summarizes some properties of $D$. (The proofs of some technical results stated in the Appendix, like Lemma A.2, appear in the Online Appendix.)

Lemma A.2. $D$ is strictly increasing and continuous. If $\gamma > 0$, then there are $x_1' < x_1''$ so that $D(x_1') < 0 < D(x_1'')$.

Lemma A.1 follows readily from Lemma A.2.

Proof.

Applying Lemma A.2 and using the fact that $\gamma > 0$, $D$ changes sign and is strictly increasing and continuous. So, there exists a unique $c^* \in \mathbb{R}$ satisfying $D(c^*) = 0$. It is clear that the best stopping strategy under $\Psi$ is the cutoff strategy that stops after every $x_1 > c^*$ and continues after every $x_1 < c^*$. This establishes property (ii) of the optimal strategy. Properties (i) and (iii) follow from the fact that $D$ is strictly increasing.

A.2.2 Cutoff Threshold Increasing in $\mu_2$

In the second part, I prove the lemma:

Lemma A.3.

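Under Assumption 1, for any $\mu_1 \in \mathbb{R}$ and $\gamma > 0$, the indifference threshold $C(\mu_1, \mu_2; \gamma)$ is strictly increasing in $\mu_2$.

Proof.

Let $\hat{\mu}_1, \hat{\mu}_2, \hat{\hat{\mu}}_2 \in \mathbb{R}$ with $\hat{\hat{\mu}}_2 > \hat{\mu}_2$. I show that $C(\hat{\mu}_1, \hat{\mu}_2; \gamma) < C(\hat{\mu}_1, \hat{\hat{\mu}}_2; \gamma)$.

By Lemma A.1, the threshold $C(\hat{\mu}_1, \hat{\mu}_2; \gamma)$ is characterized by the indifference condition,

$$u_1(C(\hat{\mu}_1, \hat{\mu}_2; \gamma)) = \mathbb{E}_{\tilde{X}_2 \sim f(\cdot \mid \hat{\mu}_2 - \gamma(C(\hat{\mu}_1, \hat{\mu}_2; \gamma) - \hat{\mu}_1))}[u_2(C(\hat{\mu}_1, \hat{\mu}_2; \gamma), \tilde{X}_2)].$$

But if the agent were to instead believe $(\hat{\mu}_1, \hat{\hat{\mu}}_2)$ where $\hat{\hat{\mu}}_2 > \hat{\mu}_2$, then the conditional distribution of $X_2$ given $X_1 = C(\hat{\mu}_1, \hat{\mu}_2; \gamma)$ would be $f(\cdot \mid \hat{\hat{\mu}}_2 - \gamma(C(\hat{\mu}_1, \hat{\mu}_2; \gamma) - \hat{\mu}_1))$. We have

$$u_1(C(\hat{\mu}_1, \hat{\mu}_2; \gamma)) < \mathbb{E}_{\tilde{X}_2 \sim f(\cdot \mid \hat{\hat{\mu}}_2 - \gamma(C(\hat{\mu}_1, \hat{\mu}_2; \gamma) - \hat{\mu}_1))}[u_2(C(\hat{\mu}_1, \hat{\mu}_2; \gamma), \tilde{X}_2)]$$

by Assumption 1(a). This means $C(\hat{\mu}_1, \hat{\mu}_2; \gamma) < C(\hat{\mu}_1, \hat{\hat{\mu}}_2; \gamma)$ by Lemma A.1, as only values of $X_1$ below $C(\hat{\mu}_1, \hat{\hat{\mu}}_2; \gamma)$ lead to strict preference for continuing.

A.2.3 Proof of Lemma 2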

Proof.

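In fact, this lemma holds for any $\mu_1 \in \mathbb{R}$. I first prove this for Assumption 3(a). For $\mu_2'' > \mu_2'$, write the corresponding optimal cutoffs as $c'' := C(\mu_1, \mu_2''; \gamma)$ and $c' := C(\mu_1, \mu_2'; \gamma)$. I show that $|c'' - c'| < \ell |\mu_2'' - \mu_2'|$.

Under the model $\Psi(\mu_1, \mu_2''; \gamma)$, the expected continuation payoff after $X_1 = c' + \ell(\mu_2'' - \mu_2')$ is

$$\mathbb{E}_{\tilde{\tilde{X}}_2 \sim f(\cdot \mid \mu_2'' - \gamma(c' + \ell(\mu_2'' - \mu_2') - \mu_1))}[u_2(c' + \ell(\mu_2'' - \mu_2'), \tilde{\tilde{X}}_2)]$$
$$= \mathbb{E}_{\tilde{X}_2 \sim f(\cdot \mid \mu_2' - \gamma(c' - \mu_1))}[u_2(c' + \ell(\mu_2'' - \mu_2'), \tilde{X}_2 + (\mu_2'' - \mu_2') - \gamma\ell(\mu_2'' - \mu_2'))]$$
$$= \mathbb{E}_{\tilde{X}_2 \sim f(\cdot \mid \mu_2' - \gamma(c' - \mu_1))}[u_2(c' + \ell d, \tilde{X}_2 + (1 - \gamma\ell) d)]$$

where we put $d = |\mu_2'' - \mu_2'| > 0$. From Assumption 3(a), for every $x_2 \in \mathbb{R}$, $u_2(c' + \ell d, x_2 + (1 - \gamma\ell)d) - u_2(c', x_2) < u_1(c' + \ell d) - u_1(c')$. This means

$$\mathbb{E}_{\tilde{X}_2 \sim f(\cdot \mid \mu_2' - \gamma(c' - \mu_1))}[u_2(c' + \ell d, \tilde{X}_2 + (1 - \gamma\ell)d) - u_2(c', \tilde{X}_2)] < u_1(c' + \ell d) - u_1(c'),$$

that is,

$$\mathbb{E}_{\tilde{X}_2 \sim f(\cdot \mid \mu_2' - \gamma(c' - \mu_1))}[u_2(c' + \ell d, \tilde{X}_2 + (1 - \gamma\ell)d)] - u_1(c' + \ell d) < \mathbb{E}_{\tilde{X}_2 \sim f(\cdot \mid \mu_2' - \gamma(c' - \mu_1))}[u_2(c', \tilde{X}_2)] - u_1(c').$$

The cutoff $c'$ satisfies the indifference condition, $u_1(c') = \mathbb{E}_{\tilde{X}_2 \sim f(\cdot \mid \mu_2' - \gamma(c' - \mu_1))}[u_2(c', \tilde{X}_2)]$, so the RHS is 0. But the LHS is the difference between the expected continuation payoff and the stopping payoff at $X_1 = c' + \ell(\mu_2'' - \mu_2')$ under the model $\Psi(\mu_1, \mu_2''; \gamma)$, which shows the agent strictly prefers stopping. This means $c'' < c' + \ell(\mu_2'' - \mu_2')$. But $\mu_2 \mapsto C(\mu_1, \mu_2; \gamma)$ is increasing by Lemma A.3, which means $c'' > c'$. Together, these two inequalities imply $|c'' - c'| < \ell(\mu_2'' - \mu_2')$.

Now, replace Assumption 3(a) with Assumption 3(b). By Lipschitz continuity of $u_2$, suppose $|u_2(x_2') - u_2(x_2'')| < L \cdot |x_2' - x_2''|$ for some $L > 0$ and all $x_2', x_2'' \in \mathbb{R}$. Let $\beta = \min\left(\frac{\epsilon/\gamma}{L\gamma + \epsilon}, \frac{1}{\gamma}\right)$ and put $\ell = \frac{1}{\gamma} - \beta$, so $0 < \ell < \frac{1}{\gamma}$.

Take any $\Delta > 0$ and let $c = C(\mu_1, \mu_2, \gamma)$. I show that $C(\mu_1, \mu_2 + \Delta, \gamma) < c + \ell\Delta$.

We have $u_1(c + \ell\Delta) - u_1(c) > \left(\frac{1}{\gamma} - \beta\right)\epsilon\Delta > \left(\frac{1}{\gamma} - \frac{\epsilon/\gamma}{L\gamma + \epsilon}\right)\epsilon\Delta$ and

$$\mathbb{E}_{\Psi(\mu_1, \mu_2 + \Delta; \gamma)}[u_2(X_2) \mid X_1 = c + \ell\Delta] - \mathbb{E}_{\Psi(\mu_1, \mu_2; \gamma)}[u_2(X_2) \mid X_1 = c] \le L \cdot (\Delta - \ell\Delta\gamma) = \Delta L \gamma \beta \le \Delta L \gamma \cdot \frac{\epsilon/\gamma}{L\gamma + \epsilon}.$$

By simple algebra, $\left(\frac{1}{\gamma} - \frac{\epsilon/\gamma}{L\gamma + \epsilon}\right)\epsilon\Delta = \Delta L \gamma \cdot \frac{\epsilon/\gamma}{L\gamma + \epsilon}$. Since $u_1(c) = \mathbb{E}_{\Psi(\mu_1, \mu_2, \gamma)}[u_2(X_2) \mid X_1 = c]$, we conclude $u_1(c + \ell\Delta) > \mathbb{E}_{\Psi(\mu_1, \mu_2 + \Delta, \gamma)}[u_2(X_2) \mid X_1 = c + \ell\Delta]$. By Lemma A.1, this implies $c + \ell\Delta > C(\mu_1, \mu_2 + \Delta, \gamma)$.

A.2.4 Lipschitz Continuity with Constant $1/\gamma$

Now I prove the lemma: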

Lemma A.4.

Under Assumption 1, for every $\gamma > 0$ and $\mu_1 \in \mathbb{R}$, $\mu_2 \mapsto C(\mu_1, \mu_2; \gamma)$ is Lipschitz continuous with Lipschitz constant $1/\gamma$.

Proof.

The proof of Lemma 2 also applies when $\ell = \frac{1}{\gamma}$, which implies that when the inequality in Assumption 3(a) is satisfied with $\ell = \frac{1}{\gamma}$, $\mu_2 \mapsto C(\mu_1, \mu_2; \gamma)$ is Lipschitz continuous with Lipschitz constant $1/\gamma$. But this reduces the inequality to $u_1(x_1 + \frac{1}{\gamma}d) - u_1(x_1) \ge u_2(x_1 + \frac{1}{\gamma}d, x_2) - u_2(x_1, x_2)$, which is true by Assumption 1(b).

A.3 Proof of Claim 2
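The identity that the proof of Claim 2 establishes can be sanity-checked numerically. The sketch below evaluates both sides at random points, using the Example 1 payoffs as they are read off from the computations in this appendix, $u_1(x_1) = x_1$ and $u_2(x_1, x_2) = q\max(x_1, x_2) + (1-q)x_2$. The function names, sampling ranges, and seed are illustrative assumptions and not part of the paper.

```python
import random


def u1(x1):
    # First-period payoff of Example 1 as used in this appendix: u1(x1) = x1.
    return x1


def u2(x1, x2, q):
    # Second-period payoff of Example 1: q * max(x1, x2) + (1 - q) * x2.
    return q * max(x1, x2) + (1.0 - q) * x2


def claim2_gap(x1, x2, d, gamma, q):
    """Left side minus right side of the Claim 2 identity at ell = 1/(1+gamma)."""
    ell = 1.0 / (1.0 + gamma)
    lhs = u1(x1 + ell * d) - u1(x1)
    rhs = u2(x1 + ell * d, x2 + (1.0 - gamma * ell) * d, q) - u2(x1, x2, q)
    return lhs - rhs


def max_abs_gap(n_draws=10_000, seed=0):
    """Largest absolute gap over random draws (ranges are hypothetical)."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(n_draws):
        x1 = rng.uniform(-5.0, 5.0)
        x2 = rng.uniform(-5.0, 5.0)
        d = rng.uniform(0.01, 5.0)
        gamma = rng.uniform(0.01, 3.0)
        q = rng.uniform(0.0, 1.0)
        worst = max(worst, abs(claim2_gap(x1, x2, d, gamma, q)))
    return worst


if __name__ == "__main__":
    # Should be zero up to floating-point rounding.
    print(max_abs_gap())
```

Both arguments of $u_2$ shift by the same amount $\frac{1}{1+\gamma}d$ when $\ell = \frac{1}{1+\gamma}$, which is why the gap vanishes for every draw.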

Proof.

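For $d > 0$, $u_1(x_1 + \frac{1}{1+\gamma}d) - u_1(x_1) = \frac{1}{1+\gamma}d$ while

$$u_2\left(x_1 + \tfrac{1}{1+\gamma}d, \; x_2 + \left(1 - \tfrac{\gamma}{1+\gamma}\right)d\right) - u_2(x_1, x_2)$$
$$= u_2\left(x_1 + \tfrac{1}{1+\gamma}d, \; x_2 + \tfrac{1}{1+\gamma}d\right) - u_2(x_1, x_2)$$
$$= q\max\left(x_1 + \tfrac{1}{1+\gamma}d, \; x_2 + \tfrac{1}{1+\gamma}d\right) + (1-q)\left(x_2 + \tfrac{1}{1+\gamma}d\right) - q\max(x_1, x_2) - (1-q)x_2$$
$$= q\,\tfrac{1}{1+\gamma}d + (1-q)\,\tfrac{1}{1+\gamma}d = \tfrac{1}{1+\gamma}d.$$

So for $\ell = \frac{1}{1+\gamma}$, we have $u_1(x_1 + \ell d) - u_1(x_1) = u_2(x_1 + \ell d, x_2 + (1 - \gamma\ell)d) - u_2(x_1, x_2)$ for every $x_1, x_2 \in \mathbb{R}$, $d > 0$.

A.4 Proof of Proposition 2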

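A.4.1 Preliminary Definitions and Lemmas

I first require some preliminary definitions and lemmas. The first result says for any censoring threshold $c \in \mathbb{R} \cup \{\infty\}$, KL divergence cannot be minimized at $(\mu_1, \mu_2)$ where $\mu_1 \neq \mu_1^\bullet$.

Lemma A.5.

For every $\gamma > 0$, $c \in \mathbb{R} \cup \{\infty\}$, $\mu_1, \mu_2 \in \mathbb{R}$ with $\mu_1 \neq \mu_1^\bullet$, we have

$$D_{KL}(\mathcal{H}^\bullet(c) \,\|\, \mathcal{H}(\Psi(\mu_1, \mu_2; \gamma); c)) > D_{KL}(\mathcal{H}^\bullet(c) \,\|\, \mathcal{H}(\Psi(\mu_1^\bullet, \mu_2 - \gamma(\mu_1^\bullet - \mu_1); \gamma); c)).$$

Lemma A.5 shows that solutions to the KL divergence minimization problem (if any exist) can only take the form $(\mu_1^\bullet, \mu_2)$ for some $\mu_2 \in \mathbb{R}$. Thus motivated, we define

$$L(\mu_2 \mid x_1) := \int_{-\infty}^{\infty} f(x_2 \mid \mu_2^\bullet) \ln[f(x_2 \mid \mu_2 - \gamma(x_1 - \mu_1^\bullet))] \, dx_2,$$

the expected log-likelihood of second-period data under the fundamentals $(\mu_1^\bullet, \mu_2)$ and after the realization $X_1 = x_1$. Also, put

$$\bar{L}(\mu_2 \mid c) := \int_{-\infty}^{c} f(x_1 \mid \mu_1^\bullet) \cdot L(\mu_2 \mid x_1) \, dx_1$$

for $c \in \mathbb{R} \cup \{\infty\}$. Note $\bar{L}(\mu_2 \mid c)$ is $-D_{KL}(\mathcal{H}^\bullet(c) \,\|\, \mathcal{H}(\Psi(\mu_1^\bullet, \mu_2; \gamma); c))$ up to a constant not depending on $\mu_2$.

I establish some properties of $L$ and $\bar{L}$ that will be used in the remainder of the proof.

Lemma A.6.

For all $x_1, \mu_2 \in \mathbb{R}$, $L(\mu_2 \mid x_1) = L(\mu_2^\bullet \mid x_1 - \frac{1}{\gamma}(\mu_2 - \mu_2^\bullet))$, so $\frac{\partial}{\partial \mu_2} L(\mu_2 \mid x_1) = -\frac{1}{\gamma} \frac{\partial}{\partial x_1} L(\mu_2^\bullet \mid x_1 - \frac{1}{\gamma}(\mu_2 - \mu_2^\bullet))$.

Proof. Follows easily from the definition of $L(\mu_2 \mid x_1)$.

Lemma A.7.

For every $\mu_2 \in \mathbb{R}$, $L(\mu_2 \mid \cdot)$ is strictly concave. For every $x_1 \in \mathbb{R}$, $L(\cdot \mid x_1)$ is strictly concave. For every $c \in \mathbb{R} \cup \{\infty\}$, $\bar{L}(\cdot \mid c)$ is strictly concave. Finally, $\frac{\partial^2}{\partial x_1 \partial \mu_2} L(\mu_2 \mid x_1) > 0$.

Finally, I note a convenient property of strict log-concavity.

Lemma A.8. If $f(x) > 0$ is strictly log concave, then for any $K > 0$, $x \mapsto \frac{f(x + K)}{f(x)}$ is strictly decreasing.

A.4.2 Existence and Uniqueness of KL Divergence Minimizers

If $\mu_2^* \in \mathbb{R}$ satisfies the FOC $\frac{\partial}{\partial \mu_2} \bar{L}(\mu_2^* \mid c) = 0$, then $(\mu_1^\bullet, \mu_2^*)$ is the unique KL divergence minimizer across all of $\mathbb{R}^2$. This is because $\mu_2^*$ satisfies the FOC in minimizing $D_{KL}(\mathcal{H}^\bullet(c) \,\|\, \mathcal{H}(\Psi(\mu_1^\bullet, \mu_2; \gamma); c))$ across $\mu_2 \in \mathbb{R}$, a strictly convex objective function by the third statement in Lemma A.7. Furthermore, $D_{KL}(\mathcal{H}^\bullet(c) \,\|\, \mathcal{H}(\Psi(\mu_1^\bullet, \mu_2^*; \gamma); c)) < D_{KL}(\mathcal{H}^\bullet(c) \,\|\, \mathcal{H}(\Psi(\mu_1, \mu_2; \gamma); c))$ for any $\mu_1 \neq \mu_1^\bullet$, $\mu_2 \in \mathbb{R}$ by Lemma A.5.

The next lemma shows the FOC has a solution at $\mu_2 = \mu_2^\bullet$ when $c = \infty$.

Lemma A.9. $\frac{\partial}{\partial \mu_2} \bar{L}(\mu_2^\bullet \mid \infty) = 0$.

Proof.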

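I first show $L(\mu_2^\bullet \mid x_1)$ is symmetric around $x_1 = \mu_1^\bullet$. Suppose $x_1^h - \mu_1^\bullet = \mu_1^\bullet - x_1^l > 0$. Then, $L(\mu_2^\bullet \mid x_1^h)$ is:

$$\int_{-\infty}^{\infty} f(x_2 \mid \mu_2^\bullet) \ln[f(x_2 \mid \mu_2^\bullet - \gamma(x_1^h - \mu_1^\bullet))] \, dx_2$$
$$= \int_{-\infty}^{\mu_2^\bullet} f(x_2 \mid \mu_2^\bullet) \ln[f(x_2 \mid \mu_2^\bullet - \gamma(x_1^h - \mu_1^\bullet))] \, dx_2 + \int_{\mu_2^\bullet}^{\infty} f(x_2 \mid \mu_2^\bullet) \ln[f(x_2 \mid \mu_2^\bullet - \gamma(x_1^h - \mu_1^\bullet))] \, dx_2$$
$$= \int_{-\infty}^{\mu_2^\bullet} f(x_2 \mid \mu_2^\bullet) \ln[f(x_2 + \gamma(x_1^h - \mu_1^\bullet) \mid \mu_2^\bullet)] \, dx_2 + \int_{\mu_2^\bullet}^{\infty} f(x_2 \mid \mu_2^\bullet) \ln[f(x_2 + \gamma(x_1^h - \mu_1^\bullet) \mid \mu_2^\bullet)] \, dx_2.$$

Using the symmetry of $f(\cdot \mid \mu_2^\bullet)$ around $\mu_2^\bullet$, let $\tilde{g}(y) = f(\mu_2^\bullet + y \mid \mu_2^\bullet) = f(\mu_2^\bullet - y \mid \mu_2^\bullet)$ for $y \ge 0$. Let $y = \mu_2^\bullet - x_2$ in the first integral and $y = x_2 - \mu_2^\bullet$ in the second integral in the sum. We then get

$$L(\mu_2^\bullet \mid x_1^h) = \int_0^{\infty} \tilde{g}(y) \Big( \ln[f(\mu_2^\bullet - y + \gamma(x_1^h - \mu_1^\bullet) \mid \mu_2^\bullet)] + \ln[f(\mu_2^\bullet + y + \gamma(x_1^h - \mu_1^\bullet) \mid \mu_2^\bullet)] \Big) \, dy.$$

An analogous argument shows

$$L(\mu_2^\bullet \mid x_1^l) = \int_0^{\infty} \tilde{g}(y) \Big( \ln[f(\mu_2^\bullet - y + \gamma(x_1^l - \mu_1^\bullet) \mid \mu_2^\bullet)] + \ln[f(\mu_2^\bullet + y + \gamma(x_1^l - \mu_1^\bullet) \mid \mu_2^\bullet)] \Big) \, dy.$$

For every $y \ge 0$, we have $|[\mu_2^\bullet - y + \gamma(x_1^l - \mu_1^\bullet)] - \mu_2^\bullet| = |[\mu_2^\bullet + y + \gamma(x_1^h - \mu_1^\bullet)] - \mu_2^\bullet|$ since $x_1^l - \mu_1^\bullet = -(x_1^h - \mu_1^\bullet)$. As $f(\cdot \mid \mu_2^\bullet)$ is symmetric about $\mu_2^\bullet$, this shows $\ln[f(\mu_2^\bullet - y + \gamma(x_1^l - \mu_1^\bullet) \mid \mu_2^\bullet)] = \ln[f(\mu_2^\bullet + y + \gamma(x_1^h - \mu_1^\bullet) \mid \mu_2^\bullet)]$. A similar symmetry argument shows $\ln[f(\mu_2^\bullet + y + \gamma(x_1^l - \mu_1^\bullet) \mid \mu_2^\bullet)] = \ln[f(\mu_2^\bullet - y + \gamma(x_1^h - \mu_1^\bullet) \mid \mu_2^\bullet)]$ for all $y \ge 0$. Hence we conclude $L(\mu_2^\bullet \mid x_1^h) = L(\mu_2^\bullet \mid x_1^l)$.

To finish the argument, apply the second statement in Lemma A.6 to get:

$$\frac{\partial}{\partial \mu_2} \bar{L}(\mu_2^\bullet \mid \infty) = \int_{-\infty}^{\infty} f(x_1 \mid \mu_1^\bullet) \cdot \left(-\frac{1}{\gamma}\right) \cdot \frac{\partial L}{\partial x_1}\left(\mu_2^\bullet \,\Big|\, x_1 - \frac{1}{\gamma}(\mu_2^\bullet - \mu_2^\bullet)\right) dx_1$$
$$= -\frac{1}{\gamma} \int_{-\infty}^{\infty} f(x_1 \mid \mu_1^\bullet) \frac{\partial L}{\partial x_1}(\mu_2^\bullet \mid x_1) \, dx_1$$
$$= -\frac{1}{\gamma} \left( \int_{-\infty}^{\mu_1^\bullet} f(x_1 \mid \mu_1^\bullet) \frac{\partial L}{\partial x_1}(\mu_2^\bullet \mid x_1) \, dx_1 + \int_{\mu_1^\bullet}^{\infty} f(x_1 \mid \mu_1^\bullet) \frac{\partial L}{\partial x_1}(\mu_2^\bullet \mid x_1) \, dx_1 \right).$$

By symmetry of $x_1 \mapsto L(\mu_2^\bullet \mid x_1)$ around $x_1 = \mu_1^\bullet$ just established, $\frac{\partial L}{\partial x_1}(\mu_2^\bullet \mid \mu_1^\bullet - y) = -\frac{\partial L}{\partial x_1}(\mu_2^\bullet \mid \mu_1^\bullet + y)$ for all $y \ge 0$. At the same time, $f(\mu_1^\bullet - y \mid \mu_1^\bullet) = f(\mu_1^\bullet + y \mid \mu_1^\bullet)$. Therefore the sum of the two integrals is 0.

In fact, the FOC also has a solution for any $c \in \mathbb{R}$, as the next lemma shows.

Lemma A.10.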

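For any $\bar{c} \in \mathbb{R}$, there exists some $\mu_2^* \in \mathbb{R}$ so that $\frac{\partial}{\partial \mu_2} \bar{L}(\mu_2^* \mid \bar{c}) = 0$.

A.4.3 Monotonicity of $\mu_2^*(c)$ in $c$

So far, I have shown that $(\mu_1^*(c), \mu_2^*(c)) \in \mathbb{R}^2$ are well-defined for all $c \in \mathbb{R} \cup \{\infty\}$ and characterize the unique solution pair to the KL divergence minimization problem, with $\mu_1^*(c) = \mu_1^\bullet$ and $\mu_2^*(\infty) = \mu_2^\bullet$. To finish proving Proposition 2, it remains to show that $\mu_2^*(c)$ is strictly increasing over $(-\infty, \infty]$.

Lemma A.11.

Let $c_l, c, c_h \in \mathbb{R} \cup \{\infty\}$ with $c_l < c < c_h$. Then $\frac{\partial}{\partial \mu_2} \bar{L}(\mu_2^*(c) \mid c_l) < 0$ and $\frac{\partial}{\partial \mu_2} \bar{L}(\mu_2^*(c) \mid c_h) > 0$. As a result, whenever $c', c'' \in (-\infty, \infty]$ with $c' < c''$, we have $\mu_2^*(c') < \mu_2^*(c'')$.

Proof.

The first-order condition for $\mu_2^*(c)$ implies that

$$\frac{\partial}{\partial \mu_2} \bar{L}(\mu_2^*(c) \mid c) = 0 \;\Rightarrow\; -\frac{1}{\gamma} \int_{-\infty}^{c} f(x_1 \mid \mu_1^\bullet) \cdot \frac{\partial}{\partial x_1} L\left(0 \,\Big|\, x_1 - \frac{1}{\gamma}\mu_2^*(c)\right) dx_1 = 0.$$

From Lemma A.7, $x_1 \mapsto \frac{\partial}{\partial x_1} L(0 \mid x_1 - \frac{1}{\gamma}\mu_2^*(c))$ is strictly decreasing. If at the rightmost point of the integration interval we have $\frac{\partial}{\partial x_1} L(0 \mid c - \frac{1}{\gamma}\mu_2^*(c)) \ge 0$, then $\frac{\partial}{\partial x_1} L(0 \mid x_1 - \frac{1}{\gamma}\mu_2^*(c)) > 0$ for all $x_1 < c$. This would lead to $\frac{\partial}{\partial \mu_2} \bar{L}(\mu_2^*(c) \mid c) \neq 0$, a contradiction. Therefore $\frac{\partial}{\partial x_1} L(0 \mid c - \frac{1}{\gamma}\mu_2^*(c)) < 0$.

For $c_h > c$, we have that

$$\frac{\partial}{\partial \mu_2} \bar{L}(\mu_2^*(c) \mid c_h) = -\frac{1}{\gamma} \int_{-\infty}^{c_h} f(x_1 \mid \mu_1^\bullet) \cdot \frac{\partial}{\partial x_1} L\left(0 \,\Big|\, x_1 - \frac{1}{\gamma}\mu_2^*(c)\right) dx_1$$
$$= -\frac{1}{\gamma} \int_{-\infty}^{c} f(x_1 \mid \mu_1^\bullet) \cdot \frac{\partial}{\partial x_1} L\left(0 \,\Big|\, x_1 - \frac{1}{\gamma}\mu_2^*(c)\right) dx_1 + \left(-\frac{1}{\gamma}\right) \int_{c}^{c_h} f(x_1 \mid \mu_1^\bullet) \cdot \frac{\partial}{\partial x_1} L\left(0 \,\Big|\, x_1 - \frac{1}{\gamma}\mu_2^*(c)\right) dx_1$$
$$= 0 + \left(-\frac{1}{\gamma}\right) \int_{c}^{c_h} f(x_1 \mid \mu_1^\bullet) \cdot \frac{\partial}{\partial x_1} L\left(0 \,\Big|\, x_1 - \frac{1}{\gamma}\mu_2^*(c)\right) dx_1.$$

Since $\frac{\partial}{\partial x_1} L(0 \mid c - \frac{1}{\gamma}\mu_2^*(c)) < 0$, we also get $\frac{\partial}{\partial x_1} L(0 \mid x_1 - \frac{1}{\gamma}\mu_2^*(c)) < 0$ for all $x_1 > c$ since $L(0 \mid \cdot)$ is strictly concave by Lemma A.7. Therefore $\frac{\partial}{\partial \mu_2} \bar{L}(\mu_2^*(c) \mid c_h) > 0$, since $f(x_1 \mid \mu_1^\bullet)$ is strictly positive.

For $c_l < c$, we have that

$$\frac{\partial}{\partial \mu_2} \bar{L}(\mu_2^*(c) \mid c_l) = -\frac{1}{\gamma} \int_{-\infty}^{c_l} f(x_1 \mid \mu_1^\bullet) \cdot \frac{\partial}{\partial x_1} L\left(0 \,\Big|\, x_1 - \frac{1}{\gamma}\mu_2^*(c)\right) dx_1.$$

If $\frac{\partial}{\partial x_1} L(0 \mid x_1 - \frac{1}{\gamma}\mu_2^*(c)) \ge 0$ for all $x_1 \le c_l$, then clearly this gives $\frac{\partial}{\partial \mu_2} \bar{L}(\mu_2^*(c) \mid c_l) < 0$. Otherwise, we may write $\frac{\partial}{\partial \mu_2} \bar{L}(\mu_2^*(c) \mid c_l)$ as

$$-\frac{1}{\gamma} \left[ \int_{-\infty}^{c} f(x_1 \mid \mu_1^\bullet) \cdot \frac{\partial}{\partial x_1} L\left(0 \,\Big|\, x_1 - \frac{1}{\gamma}\mu_2^*(c)\right) dx_1 - \int_{c_l}^{c} f(x_1 \mid \mu_1^\bullet) \cdot \frac{\partial}{\partial x_1} L\left(0 \,\Big|\, x_1 - \frac{1}{\gamma}\mu_2^*(c)\right) dx_1 \right],$$

which simplifies to $\frac{1}{\gamma} \int_{c_l}^{c} f(x_1 \mid \mu_1^\bullet) \cdot \frac{\partial}{\partial x_1} L(0 \mid x_1 - \frac{1}{\gamma}\mu_2^*(c)) \, dx_1$ because the first integral is 0 by the first-order condition. If $\frac{\partial}{\partial x_1} L(0 \mid x_1 - \frac{1}{\gamma}\mu_2^*(c)) \ge 0$ for some $x_1 \in [c_l, c]$, then we must also get $\frac{\partial}{\partial x_1} L(0 \mid x_1 - \frac{1}{\gamma}\mu_2^*(c)) \ge 0$ for all $x_1 \le c_l$, but this returns us to the case we have already considered. Thus $\frac{\partial}{\partial x_1} L(0 \mid x_1 - \frac{1}{\gamma}\mu_2^*(c)) < 0$ for all $x_1 \in [c_l, c]$, and the integral is strictly negative, showing $\frac{\partial}{\partial \mu_2} \bar{L}(\mu_2^*(c) \mid c_l) < 0$.

Finally, consider $c', c'' \in (-\infty, \infty]$ with $c' < c''$. We must have $\frac{\partial}{\partial \mu_2} \bar{L}(\mu_2^*(c') \mid c'') > 0$. Since $\bar{L}(\cdot \mid c'')$ is strictly concave from Lemma A.7 and its FOC has a solution by Lemma A.10, it follows that $\mu_2^*(c'') > \mu_2^*(c')$.

A.5 Proof of the Expression for $\mu_2^*(c)$ in Example 2

Proof.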

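Rewrite Definition 5 as

$$\int_{c}^{\infty} \phi(x_1; \mu_1^\bullet, \sigma^2) \cdot \ln\left(\frac{\phi(x_1; \mu_1^\bullet, \sigma^2)}{\phi(x_1; \mu_1, \sigma^2)}\right) dx_1 + \int_{-\infty}^{c} \phi(x_1; \mu_1^\bullet, \sigma^2) \cdot \int_{-\infty}^{\infty} \phi(x_2; \mu_2^\bullet, \sigma^2) \cdot \ln\left[\frac{\phi(x_1; \mu_1^\bullet, \sigma^2)}{\phi(x_1; \mu_1, \sigma^2)}\right] dx_2 \, dx_1$$
$$+ \int_{-\infty}^{c} \phi(x_1; \mu_1^\bullet, \sigma^2) \cdot \int_{-\infty}^{\infty} \phi(x_2; \mu_2^\bullet, \sigma^2) \cdot \ln\left[\frac{\phi(x_2; \mu_2^\bullet, \sigma^2)}{\phi(x_2; \mu_2 - \gamma(x_1 - \mu_1), \sigma^2)}\right] dx_2 \, dx_1,$$

which is:

$$\int_{-\infty}^{\infty} \phi(x_1; \mu_1^\bullet, \sigma^2) \cdot \ln\left(\frac{\phi(x_1; \mu_1^\bullet, \sigma^2)}{\phi(x_1; \mu_1, \sigma^2)}\right) dx_1 + \int_{-\infty}^{c} \phi(x_1; \mu_1^\bullet, \sigma^2) \cdot \int_{-\infty}^{\infty} \phi(x_2; \mu_2^\bullet, \sigma^2) \ln\left[\frac{\phi(x_2; \mu_2^\bullet, \sigma^2)}{\phi(x_2; \mu_2 - \gamma(x_1 - \mu_1), \sigma^2)}\right] dx_2 \, dx_1.$$

The KL divergence between $\mathcal{N}(\mu_{true}, \sigma_{true}^2)$ and $\mathcal{N}(\mu_{model}, \sigma_{model}^2)$ is

$$\ln\frac{\sigma_{model}}{\sigma_{true}} + \frac{\sigma_{true}^2 + (\mu_{true} - \mu_{model})^2}{2\sigma_{model}^2} - \frac{1}{2},$$

so we may simplify the first term and the inner integral of the second term:

$$\frac{(\mu_1 - \mu_1^\bullet)^2}{2\sigma^2} + \int_{-\infty}^{c} \phi(x_1; \mu_1^\bullet, \sigma^2) \cdot \left[\frac{\sigma^2 + (\mu_2 - \gamma(x_1 - \mu_1) - \mu_2^\bullet)^2}{2\sigma^2} - \frac{1}{2}\right] dx_1.$$

Dropping constant terms not depending on $\mu_1$ and $\mu_2$ and multiplying by $\sigma^2$, we get a simplified expression of the objective,

$$\xi(\mu_1, \mu_2) := \frac{(\mu_1 - \mu_1^\bullet)^2}{2} + \int_{-\infty}^{c} \phi(x_1; \mu_1^\bullet, \sigma^2) \cdot \left[\frac{(\mu_2 - \gamma(x_1 - \mu_1) - \mu_2^\bullet)^2}{2}\right] dx_1.$$

We have the partial derivatives by differentiating under the integral sign,

$$\frac{\partial \xi}{\partial \mu_2} = \int_{-\infty}^{c} \phi(x_1; \mu_1^\bullet, \sigma^2) \cdot (\mu_2 - \gamma(x_1 - \mu_1) - \mu_2^\bullet) \, dx_1,$$
$$\frac{\partial \xi}{\partial \mu_1} = (\mu_1 - \mu_1^\bullet) + \gamma \int_{-\infty}^{c} \phi(x_1; \mu_1^\bullet, \sigma^2) \cdot (\mu_2 - \gamma(x_1 - \mu_1) - \mu_2^\bullet) \, dx_1 = (\mu_1 - \mu_1^\bullet) + \gamma \frac{\partial \xi}{\partial \mu_2}.$$

By the first order conditions, at the minimum $(\mu_1^*, \mu_2^*)$, we must have $\frac{\partial \xi}{\partial \mu_1}(\mu_1^*, \mu_2^*) = \frac{\partial \xi}{\partial \mu_2}(\mu_1^*, \mu_2^*) = 0 \Rightarrow \mu_1^* = \mu_1^\bullet$. So $\mu_2^*$ satisfies $\frac{\partial \xi}{\partial \mu_2}(\mu_1^\bullet, \mu_2^*) = 0$, which by straightforward algebra shows

$$\mu_2^*(c) = \mu_2^\bullet - \gamma\left(\mu_1^\bullet - \mathbb{E}[X_1 \mid X_1 \le c]\right).$$

A.6 Proof of Lemma 1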
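In the Gaussian case of Example 2, the closed form $\mu_2^*(c) = \mu_2^\bullet - \gamma(\mu_1^\bullet - \mathbb{E}[X_1 \mid X_1 \le c])$ from Section A.5 makes the bound $|\mu_2^*(c + K) - \mu_2^*(c)| < \gamma K$ of Lemma 1 easy to check numerically. The sketch below is only an illustration under that Gaussian assumption: it uses the standard truncated-normal mean, checks by trapezoid quadrature that the closed form zeroes the derivative $\partial\xi/\partial\mu_2$ at $\mu_1 = \mu_1^\bullet$, and checks the bound on a grid. All function names, the grid, and the parameter values are hypothetical choices.

```python
import math


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    # erfc keeps accuracy in the left tail.
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def truncated_mean(c, mu, sigma):
    """E[X | X <= c] for X ~ N(mu, sigma^2)."""
    z = (c - mu) / sigma
    return mu - sigma * norm_pdf(z) / norm_cdf(z)


def mu2_star(c, mu1_true, mu2_true, sigma, gamma):
    """Closed form from Section A.5 (Gaussian case of Example 2)."""
    return mu2_true - gamma * (mu1_true - truncated_mean(c, mu1_true, sigma))


def foc_residual(mu2, c, mu1_true, mu2_true, sigma, gamma, n=20_000):
    """Trapezoid estimate of d(xi)/d(mu2) evaluated at mu1 = mu1_true."""
    a = mu1_true - 10.0 * sigma  # left tail beyond this point is negligible
    h = (c - a) / n

    def integrand(x1):
        density = norm_pdf((x1 - mu1_true) / sigma) / sigma
        return density * (mu2 - gamma * (x1 - mu1_true) - mu2_true)

    total = 0.5 * (integrand(a) + integrand(c))
    for i in range(1, n):
        total += integrand(a + i * h)
    return total * h


def lemma1_bound_holds(mu1_true=0.0, mu2_true=0.0, sigma=1.0, gamma=0.5):
    """Check 0 < mu2*(c+K) - mu2*(c) < gamma*K on an illustrative grid."""
    for i in range(-6, 7):
        c = mu1_true + 0.5 * i * sigma
        for K in (0.1, 0.5, 1.0, 2.0):
            diff = (mu2_star(c + K, mu1_true, mu2_true, sigma, gamma)
                    - mu2_star(c, mu1_true, mu2_true, sigma, gamma))
            if not (0.0 < diff < gamma * K):
                return False
    return True


if __name__ == "__main__":
    m = mu2_star(0.3, 0.0, 0.0, 1.0, 0.5)
    print(foc_residual(m, 0.3, 0.0, 0.0, 1.0, 0.5), lemma1_bound_holds())
```

The grid check passes because the truncated mean of a normal distribution increases in $c$ with slope strictly between 0 and 1, which is the Gaussian instance of the general argument in this section.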

Proof.

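Let $c \in \mathbb{R}$, $K > 0$. I show $|\mu_2^*(c + K) - \mu_2^*(c)| < \gamma K$. We have

$$\bar{L}(\mu_2 \mid c) = \int_{-\infty}^{c} f(x_1 \mid \mu_1^\bullet) L\left(0 \,\Big|\, x_1 - \frac{1}{\gamma}\mu_2\right) dx_1$$

and so

$$\bar{L}(\mu_2 + \gamma K \mid c + K) = \int_{-\infty}^{c + K} f(x_1 \mid \mu_1^\bullet) L\left(0 \,\Big|\, (x_1 - K) - \frac{1}{\gamma}\mu_2\right) dx_1 = \int_{-\infty}^{c} f(x_1 + K \mid \mu_1^\bullet) L\left(0 \,\Big|\, x_1 - \frac{1}{\gamma}\mu_2\right) dx_1.$$

This implies

$$\frac{\partial}{\partial \mu_2} \bar{L}(\mu_2 + \gamma K \mid c + K) = -\frac{1}{\gamma} \int_{-\infty}^{c} f(x_1 + K \mid \mu_1^\bullet) \frac{\partial}{\partial x_1} L\left(0 \,\Big|\, x_1 - \frac{1}{\gamma}\mu_2\right) dx_1.$$

When $\mu_2 = \mu_2^*(c)$, the first-order condition implies that $\frac{\partial}{\partial \mu_2} \bar{L}(\mu_2^*(c) \mid c) = 0$, that is

$$\int_{-\infty}^{c} f(x_1 \mid \mu_1^\bullet) \frac{\partial}{\partial x_1} L\left(0 \,\Big|\, x_1 - \frac{1}{\gamma}\mu_2^*(c)\right) dx_1 = 0.$$

We may write $\frac{\partial}{\partial \mu_2} \bar{L}(\mu_2^*(c) + \gamma K \mid c + K)$ as

$$-\frac{1}{\gamma} \int_{-\infty}^{c} \left(\frac{f(x_1 + K \mid \mu_1^\bullet)}{f(x_1 \mid \mu_1^\bullet)}\right) f(x_1 \mid \mu_1^\bullet) \frac{\partial}{\partial x_1} L\left(0 \,\Big|\, x_1 - \frac{1}{\gamma}\mu_2^*(c)\right) dx_1.$$

The term $x_1 \mapsto \frac{\partial}{\partial x_1} L(0 \mid x_1 - \frac{1}{\gamma}\mu_2^*(c))$ is positive for low values of $x_1$ and negative for high values of $x_1$, due to strict concavity of $L(0 \mid \cdot)$ by Lemma A.7. Let $r$ be such that $\frac{\partial}{\partial x_1} L(0 \mid r - \frac{1}{\gamma}\mu_2^*(c)) = 0$. Then,

$$\frac{\partial}{\partial \mu_2} \bar{L}(\mu_2^*(c) + \gamma K \mid c + K) < -\frac{1}{\gamma} \cdot \int_{-\infty}^{r} \left(\frac{f(r + K \mid \mu_1^\bullet)}{f(r \mid \mu_1^\bullet)}\right) f(x_1 \mid \mu_1^\bullet) \frac{\partial}{\partial x_1} L\left(0 \,\Big|\, x_1 - \frac{1}{\gamma}\mu_2^*(c)\right) dx_1$$
$$- \frac{1}{\gamma} \cdot \int_{r}^{c} \left(\frac{f(r + K \mid \mu_1^\bullet)}{f(r \mid \mu_1^\bullet)}\right) f(x_1 \mid \mu_1^\bullet) \frac{\partial}{\partial x_1} L\left(0 \,\Big|\, x_1 - \frac{1}{\gamma}\mu_2^*(c)\right) dx_1.$$

By Lemma A.8, $x_1 \mapsto \frac{f(x_1 + K \mid \mu_1^\bullet)}{f(x_1 \mid \mu_1^\bullet)}$ is strictly decreasing. So the weight $\left(\frac{f(r + K \mid \mu_1^\bullet)}{f(r \mid \mu_1^\bullet)}\right)$ under-weighs the integrand on the interval $(-\infty, r)$, while the same weight over-weighs the integrand on $(r, c)$. This amounts to an under-weighting of the positive part of the integrand and an over-weighting of the negative part, thus under-estimating the integral value. Accounting for the term $-\frac{1}{\gamma}$ gives the inequality above.

The FOC $\frac{\partial}{\partial \mu_2} \bar{L}(\mu_2^*(c) \mid c) = 0$ then implies the RHS must be 0. So, $\frac{\partial}{\partial \mu_2} \bar{L}(\mu_2^*(c) + \gamma K \mid c + K) < 0$. Since $\bar{L}(\cdot \mid c + K)$ is strictly concave by Lemma A.7, this implies $\mu_2^*(c + K) < \mu_2^*(c) + \gamma K$. Given that we must have $\mu_2^*(c + K) > \mu_2^*(c)$ from Proposition 2, this shows $|\mu_2^*(c + K) - \mu_2^*(c)| < \gamma K$.

A.7 Proof of Proposition 4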

Proof.

Consider the map $I$ as discussed in the text, $I(\mu_2) := \mu_2^*(C(\mu_1^\bullet, \mu_2; \gamma))$. If $\hat{\mu}_2$ is a fixed point of $I$, then there is a steady state with $\mu_1^\infty = \mu_1^\bullet$, $\mu_2^\infty = \hat{\mu}_2$, $c^\infty = C(\mu_1^\bullet, \hat{\mu}_2; \gamma)$. So, existence of steady states follows from existence of fixed points of $I$.

Conversely, suppose $(\mu_1^\infty, \mu_2^\infty, c^\infty)$ is a steady state. From Proposition 2, $\mu_1^\infty = \mu_1^*(c^\infty) = \mu_1^\bullet$. From the definition of a steady state, $\mu_2^\infty = \mu_2^*(c^\infty)$ and $c^\infty = C(\mu_1^\infty, \mu_2^\infty; \gamma) = C(\mu_1^\bullet, \mu_2^\infty; \gamma)$. That is to say, $\mu_2^\infty = \mu_2^*(C(\mu_1^\bullet, \mu_2^\infty; \gamma))$, so $\mu_2^\infty$ is a fixed point of $I$. So, uniqueness of steady states follows from uniqueness of fixed points of $I$.

Since $\mu_2 \mapsto C(\mu_1^\bullet, \mu_2; \gamma)$ is Lipschitz continuous with Lipschitz constant $\ell < 1/\gamma$ by Lemma 2 and $\mu_2^*(c)$ is Lipschitz continuous with Lipschitz constant $\gamma$ by Lemma 1, their composition $I$ is a contraction mapping with Lipschitz constant $\ell\gamma < 1$. This proposition follows from properties of contraction mappings.
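The fixed-point argument can be illustrated by iterating $I$ directly. The sketch below is a minimal illustration under assumptions that go beyond this proof: Gaussian draws with common standard deviation $\sigma$, the Example 1 payoffs $u_1(x_1) = x_1$ and $u_2(x_1, x_2) = q\max(x_1, x_2) + (1-q)x_2$ with $q < 1$, the Gaussian closed form for $\mu_2^*(c)$ from Section A.5, and an objective model taken to be the $\gamma = 0$ case. The function names, the bisection bracket, and the parameter values are hypothetical.

```python
import math


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def cutoff(mu1, mu2, gamma, sigma, q):
    """Indifference cutoff C(mu1, mu2; gamma), found by bisection on D."""

    def payoff_gap(c):
        m = mu2 - gamma * (c - mu1)  # perceived mean of X2 after X1 = c
        z = (c - m) / sigma
        e_max = m + sigma * (z * norm_cdf(z) + norm_pdf(z))  # E[max(c, X2)]
        return c - (q * e_max + (1.0 - q) * m)  # stopping minus continuing

    lo = min(mu1, mu2) - 50.0 * sigma  # payoff_gap < 0 here when q < 1
    hi = max(mu1, mu2) + 50.0 * sigma  # payoff_gap > 0 here when q < 1
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if payoff_gap(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mu2_star(c, mu1_true, mu2_true, sigma, gamma):
    """Gaussian closed form: mu2_true - gamma * (mu1_true - E[X1 | X1 <= c])."""
    z = (c - mu1_true) / sigma
    trunc_mean = mu1_true - sigma * norm_pdf(z) / norm_cdf(z)
    return mu2_true - gamma * (mu1_true - trunc_mean)


def iterate_map(mu2, mu1_true, mu2_true, sigma, gamma, q):
    """One application of I(mu2) = mu2*(C(mu1_true, mu2; gamma))."""
    c = cutoff(mu1_true, mu2, gamma, sigma, q)
    return mu2_star(c, mu1_true, mu2_true, sigma, gamma)


def steady_state(mu1_true, mu2_true, sigma, gamma, q, n_iter=200):
    """Iterate I from the true mean; returns (mu2_inf, c_inf)."""
    mu2 = mu2_true
    for _ in range(n_iter):
        mu2 = iterate_map(mu2, mu1_true, mu2_true, sigma, gamma, q)
    return mu2, cutoff(mu1_true, mu2, gamma, sigma, q)


if __name__ == "__main__":
    mu2_inf, c_inf = steady_state(0.0, 0.0, 1.0, 0.5, 0.5)
    c_objective = cutoff(0.0, 0.0, 0.0, 1.0, 0.5)  # gamma = 0 benchmark
    print(mu2_inf, c_inf, c_objective)
```

In this parametrization the cutoff is linear in $\mu_2$ with slope $\frac{1}{1+\gamma}$, so the composition contracts and the iteration settles quickly. The resulting steady-state belief lies below the true second-period mean and the steady-state cutoff lies below the $\gamma = 0$ benchmark cutoff.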

A.8 Proof of Proposition 5

I will use the following lemma.

Lemma A.12.

For any $c \in \mathbb{R}$, $\mu_2^*(c) - \gamma(c - \mu_1^\bullet) < \mu_2^\bullet$.

Here is the proof of Proposition 5.

Proof.

Suppose $(\mu_1^\bullet, \mu_2^\infty, c^\infty)$ is a steady state. If $c^\bullet = \infty$, then $c^\infty < c^\bullet$ trivially as $c^\infty \in \mathbb{R}$. Now suppose $c^\bullet \neq \infty$. By Proposition 1, the agent is indifferent between stopping and continuing after $X_1 = c^\infty$ under the feasible model $\Psi(\mu_1^\bullet, \mu_2^\infty; \gamma)$. This implies

$$u_1(c^\infty) = \mathbb{E}_{\Psi(\mu_1^\bullet, \mu_2^\infty; \gamma)}[u_2(c^\infty, X_2) \mid X_1 = c^\infty] = \mathbb{E}_{\tilde{X}_2 \sim f(\cdot \mid \mu_2^\infty - \gamma(c^\infty - \mu_1^\bullet))}[u_2(c^\infty, \tilde{X}_2)].$$

By the definition of steady state, $\mu_2^\infty = \mu_2^*(c^\infty)$. By Lemma A.12, $\mu_2^*(c^\infty) - \gamma(c^\infty - \mu_1^\bullet) < \mu_2^\bullet$. Therefore, $f(\cdot \mid \mu_2^\infty - \gamma(c^\infty - \mu_1^\bullet))$ is first-order stochastically dominated by $f(\cdot \mid \mu_2^\bullet)$. Since $u_2$ is strictly increasing in its second argument by Assumption 1(a), we therefore have $u_1(c^\infty) < \mathbb{E}_{\tilde{X}_2 \sim f(\cdot \mid \mu_2^\bullet)}[u_2(c^\infty, \tilde{X}_2)]$. The LHS is the objective payoff of stopping at $c^\infty$ while the RHS is the objective expected payoff of continuing at $c^\infty$. Since the best stopping strategy under the objective model $\Psi^\bullet$ has the cutoff form, we must have $c^\infty < c^\bullet$.

A.9 Proof of Theorem 1

The hypotheses of Theorem 1 will be maintained throughout this section. I also abbreviate $f(\cdot \mid \mu_1^\bullet) =: g_1(\cdot)$ and $f(\cdot \mid \mu_2^\bullet) =: g_2(\cdot)$. Finally, let $\kappa_g \in \mathbb{R}_{>0}$ be such that $\left|\frac{d}{dx}\ln(g_2(x))\right| < \kappa_g$ for all $x \in \mathbb{R}$.

A.9.1 Optimality of Cutoff Strategies

I first develop an extension of Lemma A.1. I show that for an agent who knows $\mu_1^\bullet$ and has some belief over $\mu_2$ with support bounded by $[\underline{\mu}_2, \bar{\mu}_2]$, there exists a cutoff strategy that uniquely maximizes payoff across all cutoff strategies, so the "myopically optimal" cutoff strategy is well defined. Furthermore, this myopically optimal cutoff strategy also achieves weakly larger expected payoff compared to any arbitrary stopping strategy. So, restriction to cutoff strategies is without loss.

Lemma A.13.

For an agent who knows $\mu_1^\bullet$ and who holds some belief $\nu \in \Delta([\underline{\mu}_2, \bar{\mu}_2])$ about the second-period fundamental, there exists $c^* \in \mathbb{R}$ such that: (i) the cutoff strategy $S_{c^*}$ achieves weakly higher expected payoff than any other (not necessarily cutoff-based) stopping strategy $S : \mathbb{R} \to \{\text{Stop}, \text{Continue}\}$; (ii) for any other $c \neq c^*$, $S_{c^*}$ achieves strictly higher expected payoff than $S_c$.

Footnotes. (Proof of Proposition 5) In particular, this implies that if there exists at least one steady state, then $c^\bullet \neq -\infty$. (Lemma A.13) One can construct other stopping strategies with the same expected payoff by, for example, modifying the stopping decision of the optimal cutoff strategy at finitely many $x_1$.

A.9.2 The Log Likelihood Process

Next, I define the processes of data log likelihood (for a given fundamental). For each $\mu_2 \in [\underline{\mu}_2, \bar{\mu}_2]$, let $\ell_t(\mu_2)(\omega)$ be the log likelihood that the true second-period fundamental is $\mu_2$ and histories $(\tilde{H}_s)_{s \le t}(\omega)$ are generated by the end of round $t$. It is given by

$$\ell_t(\mu_2)(\omega) := \ln(m(\mu_2)) + \sum_{s=1}^{t} \ln(\text{lik}(\tilde{H}_s(\omega); \mu_2))$$

where $\text{lik}(x_1, \emptyset; \mu_2) := g_1(x_1)$ and $\text{lik}(x_1, x_2; \mu_2) := g_1(x_1) \cdot f(x_2 \mid \mu_2 - \gamma(x_1 - \mu_1^\bullet))$.

I record a useful decomposition of $\ell_t'(\mu_2)$, the derivative of the log-likelihood process. Let $\lambda(z) := \frac{d}{dz}\ln(g_2(z)) = \frac{g_2'(z)}{g_2(z)}$. Define two stochastic processes:

$$\varphi_s(\mu_2) := -\lambda(X_{2,s} - \mu_2 + \mu_2^\bullet + \gamma(X_{1,s} - \mu_1^\bullet)) \cdot \mathbf{1}\{X_{1,s} \le \tilde{C}_s\},$$
$$\bar{\varphi}_s(\mu_2) := \frac{\partial}{\partial \mu_2} \bar{L}(\mu_2 \mid \tilde{C}_s).$$

Note that $\bar{\varphi}_s(\mu_2)$ is measurable with respect to $\mathcal{F}_{s-1}$, since $(\tilde{C}_t)$ is a predictable process. Write $\xi_s(\mu_2) := \varphi_s(\mu_2) - \bar{\varphi}_s(\mu_2)$ and $y_t(\mu_2) := \sum_{s=1}^{t} \xi_s(\mu_2)$. Write $z_t(\mu_2) := \sum_{s=1}^{t} \bar{\varphi}_s(\mu_2)$.

Lemma A.14. $\ell_t'(\mu_2) = \frac{m'(\mu_2)}{m(\mu_2)} + y_t(\mu_2) + z_t(\mu_2)$.

Proof.

We may expand $\ell_t(\mu_2)$ as

$$\ln(m(\mu_2)) + \sum_{s=1}^{t} \ln(g_1(X_{1,s})) + \sum_{s=1}^{t} \ln(f(X_{2,s} \mid \mu_2 - \gamma(X_{1,s} - \mu_1^\bullet))) \cdot \mathbf{1}\{X_{1,s} \le \tilde{C}_s\}.$$

The derivative of the first term is $\frac{m'(\mu_2)}{m(\mu_2)}$. The second term does not depend on $\mu_2$. In the third term, we use the fact that $f(\cdot \mid \tau)$ are translations of each other and that $g_2(\cdot) = f(\cdot \mid \mu_2^\bullet)$ to write:

$$f(X_{2,s} \mid \mu_2 - \gamma(X_{1,s} - \mu_1^\bullet)) = g_2(X_{2,s} - \mu_2 + \mu_2^\bullet + \gamma(X_{1,s} - \mu_1^\bullet)).$$

This shows that the derivative of each summand in the third term with respect to $\mu_2$ is

$$-\frac{g_2'(X_{2,s} - \mu_2 + \mu_2^\bullet + \gamma(X_{1,s} - \mu_1^\bullet))}{g_2(X_{2,s} - \mu_2 + \mu_2^\bullet + \gamma(X_{1,s} - \mu_1^\bullet))} \cdot \mathbf{1}\{X_{1,s} \le \tilde{C}_s\} = \varphi_s(\mu_2).$$

So in sum, $\ell_t'(\mu_2) = \frac{m'(\mu_2)}{m(\mu_2)} + \sum_{s=1}^{t} \varphi_s(\mu_2)$. The lemma then follows from simple rearrangements.

Now I derive two results about the $\xi_t(\mu_2)$ processes for different values of $\mu_2$.

Lemma A.15.

There exists $\kappa_\xi < \infty$ so that for every $\mu_2 \in [\underline{\mu}_2, \bar{\mu}_2]$ and for every $t \ge 1$, $\omega \in \Omega$, $\mathbb{E}[\xi_t^2(\mu_2) \mid \mathcal{F}_{t-1}](\omega) \le \kappa_\xi$. The proof can be found in the Online Appendix.

Lemma A.16.

For every $t \ge 1$, $\mu_2 \in [\underline{\mu}_2, \bar{\mu}_2]$ and $\omega \in \Omega$, $|\xi_t(\mu_2)(\omega)| \le 2\kappa_g$.

Proof. In the proof of Lemma A.15, we established $\mathbb{E}[\varphi_t(\mu_2) \mid \mathcal{F}_{t-1}] = \bar{\varphi}_t(\mu_2)$. So, we have $|\xi_t(\mu_2)(\omega)| \le |\varphi_t(\mu_2)(\omega)| + |\mathbb{E}[\varphi_t(\mu_2) \mid \mathcal{F}_{t-1}](\omega)|$. We have $\varphi_t(\mu_2) = -\lambda(X_{2,t} - \mu_2 + \mu_2^\bullet + \gamma(X_{1,t} - \mu_1^\bullet)) \cdot \mathbf{1}\{X_{1,t} \le \tilde{C}_t\}$, with $|\lambda(z)| \le \kappa_g$ for all $z \in \mathbb{R}$. This shows $|\varphi_t(\mu_2)(\omega)| \le \kappa_g$ for all $\omega$, and similarly $|\mathbb{E}[\varphi_t(\mu_2) \mid \mathcal{F}_{t-1}](\omega)| \le \kappa_g$ for all $\omega$.

A.9.3 Heidhues, Koszegi, and Strack (2018)'s Law of Large Numbers

I use a statistical result from Heidhues, Koszegi, and Strack (2018) to show that the $y_t$ term in the decomposition of $\ell_t'$, once divided by $t$, almost surely converges to 0 in the long run, and furthermore this convergence is uniform on $[\underline{\mu}_2, \bar{\mu}_2]$. This lets me focus on summands of the form $\bar{\varphi}_s(\mu_2)$, which can be interpreted as the expected contribution to the log likelihood derivative from round $s$ data. This lends tractability to the problem as $\bar{\varphi}_s(\mu_2)$ only depends on $\tilde{C}_s$, but not on $X_{1,s}$ or $X_{2,s}$.

Lemma A.17.

For every $\mu_2 \in [\underline{\mu}_2, \bar{\mu}_2]$, $\lim_{t \to \infty} \left|\frac{y_t(\mu_2)}{t}\right| = 0$ almost surely.

Proof. Heidhues, Koszegi, and Strack (2018)'s Proposition 10 shows that if $(y_t)$ is a martingale such that there exists some constant $v \ge 0$ with $[y]_t \le vt$ almost surely, where $[y]_t$ is the quadratic variation of $(y_t)$, then almost surely $\lim_{t \to \infty} \frac{y_t}{t} = 0$.

Consider the process $y_t(\mu_2)$ for a fixed $\mu_2 \in [\underline{\mu}_2, \bar{\mu}_2]$. By definition $y_t = \sum_{s=1}^{t} \varphi_s(\mu_2) - \bar{\varphi}_s(\mu_2)$. As established in the proof of Lemma A.15, for every $s$, $\bar{\varphi}_s(\mu_2) = \mathbb{E}[\varphi_s(\mu_2) \mid \mathcal{F}_{s-1}]$. So for $t' < t$,

$$\mathbb{E}[y_t(\mu_2) \mid \mathcal{F}_{t'}] = \sum_{s=1}^{t'} \left(\varphi_s(\mu_2) - \bar{\varphi}_s(\mu_2)\right) + \mathbb{E}\left[\sum_{s=t'+1}^{t} \varphi_s(\mu_2) - \bar{\varphi}_s(\mu_2) \,\Big|\, \mathcal{F}_{t'}\right]$$
$$= \sum_{s=1}^{t'} \left(\varphi_s(\mu_2) - \bar{\varphi}_s(\mu_2)\right) + \sum_{s=t'+1}^{t} \mathbb{E}\left[\mathbb{E}[\varphi_s(\mu_2) - \bar{\varphi}_s(\mu_2) \mid \mathcal{F}_{s-1}] \,\Big|\, \mathcal{F}_{t'}\right]$$
$$= \sum_{s=1}^{t'} \left(\varphi_s(\mu_2) - \bar{\varphi}_s(\mu_2)\right) + 0 = y_{t'}(\mu_2).$$

This shows $(y_t(\mu_2))$ is a martingale. Also,

$$[y(\mu_2)]_t = \sum_{s=1}^{t-1} \mathbb{E}[(y_s(\mu_2) - y_{s-1}(\mu_2))^2 \mid \mathcal{F}_{s-1}] = \sum_{s=1}^{t-1} \mathbb{E}[\xi_s^2(\mu_2) \mid \mathcal{F}_{s-1}] \le \kappa_\xi \cdot t$$

by Lemma A.15. Therefore Heidhues, Koszegi, and Strack (2018) Proposition 10 applies.

Lemma A.18. $\lim_{t \to \infty} \sup_{\mu_2 \in [\underline{\mu}_2, \bar{\mu}_2]} \left|\frac{y_t(\mu_2)}{t}\right| = 0$ almost surely.

Proof. From the proof of Lemma 11 in Heidhues, Koszegi, and Strack (2018), it suffices to find a sequence of random variables $B_t$ such that $\sup_{\mu_2 \in [\underline{\mu}_2, \bar{\mu}_2]} |\xi_t(\mu_2)| \le B_t$ almost surely, $\sup_{t \ge 1} \frac{1}{t}\sum_{s=1}^{t} \mathbb{E}[B_s] < \infty$, and $\lim_{t \to \infty} \frac{1}{t}\sum_{s=1}^{t} (B_s - \mathbb{E}[B_s]) = 0$. But Lemma A.16 establishes the constant random variable $B_t = 2\kappa_g$ as a bound on $\xi_t(\mu_2)$ for every $t, \mu_2, \omega$, which satisfies these requirements.

A.9.4 Bounds on Asymptotic Beliefs and Asymptotic Cutoffs

For each $t$, let $\tilde{M}_t$ be the (random) posterior belief induced by the (random) posterior density $\tilde{m}_t$ after updating prior $m$ using $t$ rounds of histories.

Lemma A.19.

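For $c_l \ge C(\mu_1^\bullet, \underline{\mu}_2; \gamma)$, if almost surely $\liminf_{t \to \infty} \tilde{C}_t \ge c_l$, then almost surely $\lim_{t \to \infty} \tilde{M}_t([\underline{\mu}_2, \mu_2^*(c_l))) = 0$. Also, for $c_h \le C(\mu_1^\bullet, \bar{\mu}_2; \gamma)$, if almost surely $\limsup_{t \to \infty} \tilde{C}_t \le c_h$, then almost surely $\lim_{t \to \infty} \tilde{M}_t((\mu_2^*(c_h), \bar{\mu}_2]) = 0$.

Proof.

I first show that for all $\epsilon > 0$, there exists $\delta > 0$ such that almost surely

$$\liminf_{t \to \infty} \inf_{\mu_2 \in [\underline{\mu}_2, \mu_2^*(c_l) - \epsilon]} \frac{\ell_t'(\mu_2)}{t} \ge \delta.$$

From Lemma A.14, we may rewrite the LHS as

$$\liminf_{t \to \infty} \inf_{\mu_2 \in [\underline{\mu}_2, \mu_2^*(c_l) - \epsilon]} \left[\frac{1}{t}\frac{m'(\mu_2)}{m(\mu_2)} + \frac{y_t(\mu_2)}{t} + \frac{z_t(\mu_2)}{t}\right],$$

which is no smaller than taking the inf separately across the three terms in the bracket,

$$\liminf_{t \to \infty} \inf_{\mu_2 \in [\underline{\mu}_2, \mu_2^*(c_l) - \epsilon]} \frac{1}{t}\frac{m'(\mu_2)}{m(\mu_2)} + \liminf_{t \to \infty} \inf_{\mu_2 \in [\underline{\mu}_2, \mu_2^*(c_l) - \epsilon]} \frac{y_t(\mu_2)}{t} + \liminf_{t \to \infty} \inf_{\mu_2 \in [\underline{\mu}_2, \mu_2^*(c_l) - \epsilon]} \frac{z_t(\mu_2)}{t}.$$

Since $m'$ is continuous and $m$ is strictly positive (and continuous) on $[\underline{\mu}_2, \bar{\mu}_2]$ by the hypotheses of Theorem 1, $m'/m$ is bounded on $[\underline{\mu}_2, \bar{\mu}_2]$, so we in fact have

$$\lim_{t \to \infty} \inf_{\mu_2 \in [\underline{\mu}_2, \mu_2^*(c_l) - \epsilon]} \frac{1}{t}\frac{m'(\mu_2)}{m(\mu_2)} = 0.$$

To deal with the second term,

$$\liminf_{t \to \infty} \inf_{\mu_2 \in [\underline{\mu}_2, \mu_2^*(c_l) - \epsilon]} \frac{y_t(\mu_2)}{t} \ge \liminf_{t \to \infty} \inf_{\mu_2 \in [\underline{\mu}_2, \bar{\mu}_2]} \frac{y_t(\mu_2)}{t} = -\limsup_{t \to \infty} \sup_{\mu_2 \in [\underline{\mu}_2, \bar{\mu}_2]} \frac{-y_t(\mu_2)}{t}.$$

Lemma A.18 gives $\lim_{t \to \infty} \sup_{\mu_2 \in [\underline{\mu}_2, \bar{\mu}_2]} \frac{-y_t(\mu_2)}{t} = 0$ almost surely, so this second term is non-negative almost surely.

It suffices then to find $\delta > 0$ so that $\liminf_{t \to \infty} \inf_{\mu_2 \in [\underline{\mu}_2, \mu_2^*(c_l) - \epsilon]} \frac{z_t(\mu_2)}{t} \ge \delta$ almost surely. Put $\delta := \frac{\partial}{\partial \mu_2} \bar{L}(\mu_2^*(c_l) - \epsilon \mid c_l)$ and I will show $\bar{\varphi}_s(\mu_2)(\omega) \ge \delta$ whenever $\tilde{C}_s(\omega) \ge c_l$ and $\mu_2 \le \mu_2^*(c_l) - \epsilon$. To see this, note that when $\tilde{C}_s(\omega) = c \in \mathbb{R}$, $\bar{\varphi}_s(\mu_2)(\omega) = \frac{\partial}{\partial \mu_2} \bar{L}(\mu_2 \mid c)$ and $\bar{L}(\cdot \mid c)$ is strictly concave in its first argument by Lemma A.7. Therefore, if $\bar{\varphi}_s(\mu_2^*(c_l) - \epsilon)(\omega) \ge \delta$, then we also get $\bar{\varphi}_s(\mu_2)(\omega) \ge \delta$ for any $\mu_2 \le \mu_2^*(c_l) - \epsilon$. So it suffices to show $\frac{\partial}{\partial \mu_2} \bar{L}(\mu_2^*(c_l) - \epsilon \mid c) \ge \delta$ whenever $c \ge c_l$.

We have

$$\frac{\partial}{\partial \mu_2} \bar{L}(\mu_2 \mid c_l) = \int_{-\infty}^{c_l} g_1(x_1) \cdot \int_{-\infty}^{\infty} (-1) \cdot g_2(x_2) \cdot \lambda(x_2 - \mu_2 + \mu_2^\bullet + \gamma(x_1 - \mu_1^\bullet)) \, dx_2 \, dx_1.$$

The first-order condition implies that $\frac{\partial}{\partial \mu_2} \bar{L}(\mu_2^*(c_l) \mid c_l) = 0$. Since $\lambda$ is strictly decreasing, this implies $\delta = \frac{\partial}{\partial \mu_2} \bar{L}(\mu_2^*(c_l) - \epsilon \mid c_l) > 0$. Also, again using $\lambda$ strictly decreasing, the inner integrand is strictly increasing in $x_1$. Thus, $\frac{\partial}{\partial \mu_2} \bar{L}(\mu_2^*(c_l) - \epsilon \mid c_l) > 0$ implies

$$\int_{-\infty}^{\infty} (-1) \cdot g_2(x_2) \cdot \lambda(x_2 - (\mu_2^*(c_l) - \epsilon) + \mu_2^\bullet + \gamma(c - \mu_1^\bullet)) \, dx_2 > 0$$

for all $c \ge c_l$. This then shows $\frac{\partial}{\partial \mu_2} \bar{L}(\mu_2^*(c_l) - \epsilon \mid c) > \frac{\partial}{\partial \mu_2} \bar{L}(\mu_2^*(c_l) - \epsilon \mid c_l)$ for any $c > c_l$.

Having shown that $\bar{\varphi}_s(\mu_2)(\omega) \ge \delta$ for all $\mu_2 \in [\underline{\mu}_2, \mu_2^*(c_l) - \epsilon]$ whenever $\tilde{C}_s(\omega) \ge c_l$, this shows along any $\omega$ such that $\liminf_{t \to \infty} \tilde{C}_t \ge c_l$, we also have $\liminf_{s \to \infty} \inf_{\mu_2 \in [\underline{\mu}_2, \mu_2^*(c_l) - \epsilon]} \bar{\varphi}_s(\mu_2) \ge \delta$, and thus

$$\liminf_{t \to \infty} \inf_{\mu_2 \in [\underline{\mu}_2, \mu_2^*(c_l) - \epsilon]} \frac{z_t(\mu_2)}{t} = \liminf_{t \to \infty} \inf_{\mu_2 \in [\underline{\mu}_2, \mu_2^*(c_l) - \epsilon]} \frac{1}{t}\left[\sum_{s=1}^{t} \bar{\varphi}_s(\mu_2)\right] \ge \delta.$$

From here, it is a standard exercise to establish that $\lim_{t \to \infty} \tilde{M}_t([\underline{\mu}_2, \mu_2^*(c_l) - \epsilon)) = 0$ almost surely. Since the choice of $\epsilon > 0$ was arbitrary, the first statement follows. The argument for the second statement is analogous, using

$$\limsup_{t \to \infty} \sup_{\mu_2 \in [\mu_2^*(c_h) + \epsilon, \bar{\mu}_2]} \frac{z_t(\mu_2)}{t} \le -\delta$$

where

$$-\delta = \max\left(\frac{\partial}{\partial \mu_2} \bar{L}(\mu_2^*(c_h) + \epsilon \mid c_h), \; \frac{\partial}{\partial \mu_2} \bar{L}(\mu_2^*(c_h) + \epsilon \mid C(\mu_1^\bullet, \underline{\mu}_2; \gamma))\right) < 0.$$

Lemma A.20.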

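For $\underline{\mu}_2 \le \mu_2^l < \mu_2^h \le \bar{\mu}_2$, if $\lim_{t \to \infty} \tilde{M}_t([\mu_2^l, \mu_2^h]) = 1$ almost surely, then $\liminf_{t \to \infty} \tilde{C}_t \ge C(\mu_1^\bullet, \mu_2^l; \gamma)$ and $\limsup_{t \to \infty} \tilde{C}_t \le C(\mu_1^\bullet, \mu_2^h; \gamma)$ almost surely.

Proof.

I show $\liminf_{t \to \infty} \tilde{C}_t \ge C(\mu_1^\bullet, \mu_2^l; \gamma)$ almost surely. The argument establishing $\limsup_{t \to \infty} \tilde{C}_t \le C(\mu_1^\bullet, \mu_2^h; \gamma)$ is symmetric.

Let $c_l = C(\mu_1^\bullet, \mu_2^l; \gamma)$, $\underline{c} = C(\mu_1^\bullet, \underline{\mu}_2; \gamma)$, $\bar{c} = C(\mu_1^\bullet, \bar{\mu}_2; \gamma)$. Fix some $\epsilon > 0$. Since $c \mapsto U(c; \mu_1^\bullet, \mu_2)$ is single peaked for every $\mu_2$, and since $c_l \le C(\mu_1^\bullet, \mu_2; \gamma)$ for all $\mu_2 \in [\mu_2^l, \mu_2^h]$, we get $U(c_l; \mu_1^\bullet, \mu_2) - U(c_l - \epsilon; \mu_1^\bullet, \mu_2) > 0$ for every $\mu_2 \in [\mu_2^l, \mu_2^h]$. As $\mu_2 \mapsto \left(U(c_l; \mu_1^\bullet, \mu_2) - U(c_l - \epsilon; \mu_1^\bullet, \mu_2)\right)$ is continuous, there exists some $\kappa^* > 0$ so that $U(c_l; \mu_1^\bullet, \mu_2) - U(c_l - \epsilon; \mu_1^\bullet, \mu_2) > \kappa^*$ for all $\mu_2 \in [\mu_2^l, \mu_2^h]$. In particular, if $\nu \in \Delta([\mu_2^l, \mu_2^h])$ is a belief over the second-period fundamental supported on $[\mu_2^l, \mu_2^h]$, then $\int U(c_l; \mu_1^\bullet, \mu_2) - U(c_l - \epsilon; \mu_1^\bullet, \mu_2) \, d\nu(\mu_2) > \kappa^*$.

Now, let $\bar{\kappa} := \sup_{c \in [\underline{c}, \bar{c}]} \sup_{\mu_2 \in [\underline{\mu}_2, \bar{\mu}_2]} U(c; \mu_1^\bullet, \mu_2)$ and $\underline{\kappa} := \inf_{c \in [\underline{c}, \bar{c}]} \inf_{\mu_2 \in [\underline{\mu}_2, \bar{\mu}_2]} U(c; \mu_1^\bullet, \mu_2)$. Find $p \in (0, 1)$ so that $p\kappa^* - (1 - p)(\bar{\kappa} - \underline{\kappa}) = 0$. At any belief $\hat{\nu} \in \Delta([\underline{\mu}_2, \bar{\mu}_2])$ that assigns more than probability $p$ to the subinterval $[\mu_2^l, \mu_2^h]$, the optimal cutoff is larger than $c_l - \epsilon$. To see this, take any $\hat{c} \le c_l - \epsilon$ and I will show $\hat{c}$ is suboptimal. If $\hat{c} < \underline{c}$, then it is suboptimal after any belief on $[\underline{\mu}_2, \bar{\mu}_2]$. If $\underline{c} \le \hat{c} \le c_l - \epsilon$, I show that $\int U(c_l; \mu_1^\bullet, \mu_2) - U(\hat{c}; \mu_1^\bullet, \mu_2) \, d\hat{\nu}(\mu_2) > 0$. To see this, we may decompose $\hat{\nu}$ as the mixture of a probability measure $\nu$ on $[\mu_2^l, \mu_2^h]$ and another probability measure $\nu^c$ on $[\underline{\mu}_2, \bar{\mu}_2] \setminus [\mu_2^l, \mu_2^h]$. Let $\hat{p} > p$ be the probability that $\hat{\nu}$ assigns to $[\mu_2^l, \mu_2^h]$. The above integral is equal to:

$$\hat{p} \int_{\mu_2 \in [\mu_2^l, \mu_2^h]} U(c_l; \mu_1^\bullet, \mu_2) - U(\hat{c}; \mu_1^\bullet, \mu_2) \, d\nu(\mu_2) + (1 - \hat{p}) \int_{\mu_2 \in [\underline{\mu}_2, \bar{\mu}_2] \setminus [\mu_2^l, \mu_2^h]} U(c_l; \mu_1^\bullet, \mu_2) - U(\hat{c}; \mu_1^\bullet, \mu_2) \, d\nu^c(\mu_2).$$

Since $c_l$ is to the left of the optimal cutoff for all $\mu_2 \in [\mu_2^l, \mu_2^h]$ and $\hat{c} \le c_l - \epsilon$, we have $U(\hat{c}; \mu_1^\bullet, \mu_2) \le U(c_l - \epsilon; \mu_1^\bullet, \mu_2)$ for all $\mu_2 \in [\mu_2^l, \mu_2^h]$. The first summand is no less than

$$\hat{p} \int_{\mu_2 \in [\mu_2^l, \mu_2^h]} U(c_l; \mu_1^\bullet, \mu_2) - U(c_l - \epsilon; \mu_1^\bullet, \mu_2) \, d\nu(\mu_2) \ge \hat{p}\kappa^*.$$

Also, the integrand in the second summand is no smaller than $-(\bar{\kappa} - \underline{\kappa})$, therefore $\int U(c_l; \mu_1^\bullet, \mu_2) - U(\hat{c}; \mu_1^\bullet, \mu_2) \, d\hat{\nu}(\mu_2) \ge \hat{p}\kappa^* - (1 - \hat{p})(\bar{\kappa} - \underline{\kappa})$. Since $\hat{p} > p$, we get $\hat{p}\kappa^* - (1 - \hat{p})(\bar{\kappa} - \underline{\kappa}) > 0$.

Along any $\omega$ where $\lim_{t \to \infty} \tilde{M}_t([\mu_2^l, \mu_2^h])(\omega) = 1$, eventually $\tilde{M}_t([\mu_2^l, \mu_2^h])(\omega) > p$ for all large enough $t$, meaning $\liminf_{t \to \infty} \tilde{C}_t(\omega) \ge c_l - \epsilon$. This shows $\liminf_{t \to \infty} \tilde{C}_t \ge C(\mu_1^\bullet, \mu_2^l; \gamma) - \epsilon$ almost surely. Since the choice of $\epsilon > 0$ was arbitrary, we in fact conclude $\liminf_{t \to \infty} \tilde{C}_t \ge C(\mu_1^\bullet, \mu_2^l; \gamma)$ almost surely.

A.9.5 The Contraction Map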

I now combine the results established so far to prove Theorem 1.

Proof.

Let $\mu^l_{2,[1]} := \underline{\mu}_2$, $\mu^h_{2,[1]} := \bar{\mu}_2$. For $k = 2, 3, \ldots$, iteratively define $\mu^l_{2,[k]} := I(\mu^l_{2,[k-1]}; \gamma)$ and $\mu^h_{2,[k]} := I(\mu^h_{2,[k-1]}; \gamma)$.

From Lemma A.20, if $\lim_{t \to \infty} \tilde{M}_t([\mu^l_{2,[k]}, \mu^h_{2,[k]}]) = 1$ almost surely, then $\liminf_{t \to \infty} \tilde{C}_t \ge C(\mu_1^\bullet, \mu^l_{2,[k]}; \gamma)$ and $\limsup_{t \to \infty} \tilde{C}_t \le C(\mu_1^\bullet, \mu^h_{2,[k]}; \gamma)$ almost surely. But using these conclusions in Lemma A.19, we further deduce that $\lim_{t \to \infty} \tilde{M}_t([\mu_2^*(C(\mu_1^\bullet, \mu^l_{2,[k]}; \gamma)), \mu_2^*(C(\mu_1^\bullet, \mu^h_{2,[k]}; \gamma))]) = 1$ almost surely, that is to say $\lim_{t \to \infty} \tilde{M}_t([\mu^l_{2,[k+1]}, \mu^h_{2,[k+1]}]) = 1$ almost surely.

As shown in the proof of Proposition 4, under Assumptions 1, 2, and 3, $\mu_2 \mapsto I(\mu_2; \gamma)$ is a contraction mapping. Since $\underline{\mu}_2 < \mu_2^\infty$ and $\bar{\mu}_2 > \mu_2^\infty$, $(\mu^l_{2,[k]})_{k \ge 1}$ is a sequence whose limit is $\mu_2^\infty$, and $(\mu^h_{2,[k]})_{k \ge 1}$ is a sequence whose limit is $\mu_2^\infty$. Thus, the agent's posterior converges in $L^1$ to $\mu_2^\infty$ almost surely (since the support of the prior is bounded).

In addition, $\mu_2 \mapsto C(\mu_1^\bullet, \mu_2; \gamma)$ is continuous, so the sequences of bounds on asymptotic cutoffs also converge, $\lim_{k \to \infty} C(\mu_1^\bullet, \mu^l_{2,[k]}; \gamma) = c^\infty$ and $\lim_{k \to \infty} C(\mu_1^\bullet, \mu^h_{2,[k]}; \gamma) = c^\infty$. This means $\lim_{t \to \infty} \tilde{C}_t = c^\infty$ almost surely.

A.10 Proof of Theorem 2

I require a lemma that shows beliefs and cutoﬀs are monotonic in the auxiliary environment.

Lemma A.21.

Suppose Assumptions 1 and 2 hold. Starting from any initial condition and any $m$, cutoffs $(c^A_{[t]})_{t \ge 1}$ and beliefs $(\mu^A_{2,[t]})_{t \ge 1}$ in the auxiliary environment form monotonic sequences across generations. Also, $\lim_{t \to \infty} \mu^A_{2,[t]} = \mu_2^\infty$ where $\mu_2^\infty$ is the unique fixed point of $I(\cdot; \gamma)$ and $\lim_{t \to \infty} c^A_{[t]} = C(\mu_1^\bullet, \mu_2^\infty; \gamma)$.

Now I turn to the proof of Theorem 2.

Proof.

For the ﬁrst step of the proof, suppose Assumptions 1 and 2 hold.

Step 1: If $c_{[1]} > c_{[0]}$, then $(\mu_{2,[t]})_{t \ge 1}$ and $(c_{[t]})_{t \ge 1}$ are two increasing sequences, whereas $c_{[1]} \le c_{[0]}$ implies $(\mu_{2,[t]})_{t \ge 1}$ and $(c_{[t]})_{t \ge 1}$ are two decreasing sequences.

By simple algebra, the problem of generation $t + 1$ amounts to maximizing the sum of $\bar{L}(\cdot \mid c_{[0]}), \ldots, \bar{L}(\cdot \mid c_{[t]})$. For $c_0, \ldots, c_t \in \mathbb{R}$, denote $\mu_2^*(c_0, \ldots, c_t) := \arg\max_{\mu_2 \in \mathbb{R}} \sum_{s=0}^{t} \bar{L}(\mu_2 \mid c_s)$.

Suppose $c_{[1]} > c_{[0]}$. Then $\mu_{2,[1]} = \mu_2^*(c_{[0]})$, but by Lemma A.11, $\frac{\partial}{\partial \mu_2} \bar{L}(\mu_{2,[1]} \mid c_{[1]}) > 0$. Since $\bar{L}(\cdot \mid c_{[0]}) + \bar{L}(\cdot \mid c_{[1]})$ is strictly concave and since $\frac{\partial}{\partial \mu_2} \bar{L}(\mu_{2,[1]} \mid c_{[0]}) + \frac{\partial}{\partial \mu_2} \bar{L}(\mu_{2,[1]} \mid c_{[1]}) = \frac{\partial}{\partial \mu_2} \bar{L}(\mu_{2,[1]} \mid c_{[1]}) > 0$, we must have $\mu_{2,[2]} = \mu_2^*(c_{[0]}, c_{[1]}) > \mu_{2,[1]}$. This also shows that, since $C$ is strictly increasing, $c_{[2]} > c_{[1]}$.

Assume we have established that $c_{[0]} < c_{[1]} < \ldots < c_{[t]}$ and $\mu_{2,[1]} < \ldots < \mu_{2,[t]}$ for some $t \ge 2$. By the FOC of inference in generation $t$, $\sum_{s=0}^{t-1} \frac{\partial}{\partial \mu_2} \bar{L}(\mu_{2,[t]} \mid c_{[s]}) = 0$. If we had $\frac{\partial}{\partial \mu_2} \bar{L}(\mu_{2,[t]} \mid c_{[t-1]}) < 0$, then by the single-peaked nature of $\bar{L}(\cdot \mid c_{[t-1]})$, $\mu_{2,[t]} > \mu_2^*(c_{[t-1]})$. Since $c_{[0]} < c_{[1]} < \ldots < c_{[t-1]}$ implies $\mu_2^*(c_{[0]}) < \ldots < \mu_2^*(c_{[t-1]})$ by Proposition 2, we must also have $\mu_{2,[t]} > \mu_2^*(c_{[s]})$ for all $0 \le s \le t - 1$, that is to say $\frac{\partial}{\partial \mu_2} \bar{L}(\mu_{2,[t]} \mid c_{[s]}) < 0$ for all $0 \le s \le t - 1$, which contradicts the FOC. So we must have $\frac{\partial}{\partial \mu_2} \bar{L}(\mu_{2,[t]} \mid c_{[t-1]}) \ge 0$, which implies $\frac{\partial}{\partial \mu_2} \bar{L}(\mu_{2,[t]} \mid c_{[t]}) > 0$ since $c_{[t]} > c_{[t-1]}$ from the inductive hypothesis. Hence we see that $\sum_{s=0}^{t} \frac{\partial}{\partial \mu_2} \bar{L}(\mu_{2,[t]} \mid c_{[s]}) > 0$. This shows $\mu_{2,[t+1]} = \mu_2^*(c_{[0]}, \ldots, c_{[t]}) > \mu_{2,[t]}$ by the strict concavity of generation $t$'s objective. Also, $c_{[t+1]} > c_{[t]}$ follows.

So by induction, we have shown Step 1. (The other case of $c_{[1]} < c_{[0]}$ is symmetric.)

For the rest of this proof, suppose Assumption 3 also holds.

Step 2: $(\mu_{2,[t]})_{t \ge 1}$ is bounded and converges.

I first show that for every $t$, $\mu_{2,[t]}$ is bounded between $\mu_{2,[1]}$ and $\mu_2^\infty$. Combined with the fact that $(\mu_{2,[t]})_{t \ge 1}$ is monotonic from Step 1, the sequence must then converge.

Consider the case of $c_{[1]} > c_{[0]}$ (so $\mu_{2,[2]} > \mu_{2,[1]}$). Step 1 implies that $(\mu_{2,[t]})_{t \ge 1}$ forms an increasing sequence. We have $\mu^A_{2,[1]} = \mu_{2,[1]} = \mu_2^*(c_{[0]})$, so also $c^A_{[1]} = c_{[1]}$. We have $\mu^A_{2,[2]} = \mu_2^*(c_{[1]})$, but $\frac{\partial}{\partial \mu_2} \bar{L}(\mu^A_{2,[2]} \mid c_{[0]}) + \frac{\partial}{\partial \mu_2} \bar{L}(\mu^A_{2,[2]} \mid c_{[1]}) = \frac{\partial}{\partial \mu_2} \bar{L}(\mu^A_{2,[2]} \mid c_{[0]}) < 0$, using the FOC that $\frac{\partial}{\partial \mu_2} \bar{L}(\mu^A_{2,[2]} \mid c_{[1]}) = 0$ and $c_{[1]} > c_{[0]}$. This shows $\mu_{2,[2]} < \mu^A_{2,[2]}$, hence $c_{[2]} < c^A_{[2]}$. By induction, suppose we have shown that $\mu_{2,[t]} < \mu^A_{2,[t]}$ and $c_{[t]} < c^A_{[t]}$ for some $t \ge 2$. Then, the arguments from Step 1 establish that $\frac{\partial}{\partial \mu_2} \bar{L}(\mu_{2,[t+1]} \mid c_{[t]}) \ge$

0, which implies ∂∂µ ¯ L ( µ , [ t +1] | c A [ t ] ) > c A [ t ] > c [ t ] . By strict concavity of ¯ L ( · | c A [ t ] ) from Lemma A.7, this shows µ A , [ t +1] = µ ∗ ( c A [ t ] ) > µ , [ t +1] , hence also c A [ t +1] > c [ t +1] . So we have established that µ , [ t ] ≤ µ A , [ t ] by induction. But from the proof of Lemma A.21, ( µ A , [ t ] ) converge upwards to µ ∞ in thiscase (given that they are iterates of I , which is a contraction map by Proposition 4 whenAssumptions 1, 2, and 3 hold), meaning µ , [ t ] is bounded between µ , [1] and µ ∞ .The case of c [1] < c [0] is symmetric (and if c [1] = c [0] then µ , [1] = µ ∞ ) . We have proven

Step 2 . Denote ˜ µ = lim t →∞ µ , [ t ] and observe that since C is continuous in its secondargument, c [ t ] → ˜ c = C ( µ • , ˜ µ ; γ ) . Step 3 : ˜ µ is a ﬁxed point of I ( · ; γ ), so in particular ˜ µ = µ ∞ and ˜ c = c ∞ since I ( · ; γ )has a unique ﬁxed point by Proposition 4.Consider the case of c [1] > c [0] , for the other case is symmetric. From the proof of Step2 , µ , [ t ] is bounded above by ˜ µ ∞ , so if ˜ µ = µ ∞ by way of contradiction, then ˜ µ < µ ∞ . Since the iterates of I ( · ; γ ) are monotonic, this implies I (˜ µ ; γ ) > ˜ µ , that is µ ∗ (˜ c ) > ˜ µ .As ¯ L ( · | ˜ c ) is strictly concave, this implies R ˜ c −∞ ∂∂µ L (˜ µ | x ) dx > . Using the fact that ∂∂µ L ( · | x ) is decreasing, there must exist (cid:15) > R c −∞ ∂∂µ L ( µ | x ) dx ≥ (cid:15) whenever c ∈ [˜ c − (cid:15), ˜ c ] and µ ≤ ˜ µ . Since c [ t ] % ˜ c , ﬁnd large enough T so that c [ t ] ≥ ˜ c − (cid:15) whenever t ≥ T. Also, let B = max µ ∈ [ µ [2] , , ˜ µ ] max c ∈ [ c [0] , ˜ c ] (cid:12)(cid:12)(cid:12)R c −∞ ∂∂µ L ( µ | x ) dx (cid:12)(cid:12)(cid:12) . So for t ≥ T + 1 , P t − s =0 ∂∂µ ¯ L ( µ , [ t ] | c [ s ] ) ≥ − T B + (cid:15) ( t − T ) . This quantity must be strictly positive for largeenough t, a contradiction that says FOC is not satisﬁed for large t. Thus, we must have˜ µ = µ ∞ , hence ˜ c = C ( µ • , µ ∞ ; γ ). A.11 Proof of Corollary 1

Proof.

Suppose $c_{[1]} \ge c_{[0]}$. Since $\mu_2^*(c)$ is increasing, we have $\mu_{2,[2]} = \mu_2^*(c_{[1]}, c_{[0]}) \ge \mu_2^*(c_{[0]}) = \mu_{2,[1]}$. So we get $c_{[2]} \ge c_{[1]}$. By Theorem 2, we deduce $(c_{[t]})_{t \ge 0}$ is an increasing sequence, so in particular $c^\infty \ge c^\bullet$. But again by Theorem 2, $c^\infty$ is the same as the steady-state cutoff in Theorem 1. This is a contradiction because Theorem 1 implies $c^\infty < c^\bullet$.

This shows $c_{[1]} < c_{[0]}$, and similar arguments show $(c_{[t]})_{t \ge 0}$ is a strictly decreasing sequence. Since $c^\bullet$ is the objectively optimal cutoff threshold under the true model $\Psi^\bullet$, and since expected payoff under the true model is a single-peaked function of the acceptance threshold by Lemma A.2, this shows expected payoff is strictly decreasing across generations.

Online Appendix for “Mislearning from Censored Data: The Gambler’s Fallacy in Optimal-Stopping Problems”

Kevin He

August 19, 2019

OA 1 Proofs of Results in Section 5 and the Appendix

OA 1.1 Proof of Lemma A.2

Proof.

Step 1 : D is strictly increasing.Suppose x > ¯ x . Then, E Ψ [ u (¯ x , X ) | X = ¯ x ] = E ˜ X ∼ f ( ·| µ − γ (¯ x − µ )) [ u (¯ x , ˜ X )] , while E Ψ [ u (¯ x , X ) | X = x ] = E ˜˜ X ∼ f ( ·| µ − γ ( x − µ )) [ u (¯ x , ˜˜ X )]= E ˜ X ∼ f ( ·| µ − γ (¯ x − µ )) [ u (¯ x , ˜ X − γ ( x − ¯ x ))] . Since u is strictly increasing in its second argument by Assumption 1(a), we get E ˜ X ∼ f ( ·| µ − γ (¯ x − µ )) [ u (¯ x , ˜ X − γ ( x − ¯ x ))] ≤ E ˜ X ∼ f ( ·| µ − γ (¯ x − µ )) [ u (¯ x , ˜ X )]seeing that γ ( x − ¯ x ) ≥

0. Also, at any x ∈ R , by Assumption 1(b) we know that u ( x ) − u (¯ x ) > u ( x , x ) − u (¯ x , x ) . ⇒ u ( x ) − u ( x , x ) > u (¯ x ) − u (¯ x , x ) . This then shows u ( x ) − E ˜ X ∼ f ( ·| µ − γ (¯ x − µ )) [ u ( x , ˜ X − γ ( x − ¯ x ))] > u (¯ x ) − E ˜ X ∼ f ( ·| µ − γ (¯ x − µ )) [ u (¯ x , ˜ X − γ ( x − ¯ x ))] ≥ u (¯ x ) − E ˜ X ∼ f ( ·| µ − γ (¯ x − µ )) [ u (¯ x , ˜ X )]that is D ( x ) > D (¯ x ). Step 2 : D is continuous. 1ixing some ¯ x ∈ R , I show D is continuous at ¯ x . Since u is continuous, ﬁnd δ > | x − ¯ x | < , | u ( x ) − u (¯ x ) | < δ . Consider the function w : R → R ≥ deﬁnedby w ( x ) := | u (¯ x , x + γ ) | + | u (¯ x , x − γ ) | + δ . Claim

OA.1 . Whenever | x − ¯ x | < , | u ( x , x + γ (¯ x − x )) | ≤ w ( x ) for every x ∈ R . Proof.

Since u is increasing its second argument by Assumption 1(a), if u ( x , x + γ (¯ x − x )) ≥ , then | u ( x , x + γ (¯ x − x )) | ≤ | u ( x , x + γ ) | since | x − ¯ x | < . Otherwise, if u ( x , x + γ (¯ x − x )) <

0, then | u ( x , x + γ (¯ x − x )) | ≤ | u ( x , x − γ ) | . But we have | u ( x , x + γ ) | ≤ | u (¯ x , x + γ ) | + | u ( x , x + γ ) − u (¯ x , x + γ ) | for every x . By Assumption 1(b), | u ( x , x + γ ) − u (¯ x , x + γ ) | ≤ | u ( x ) − u (¯ x ) | < δ whenever | x − ¯ x | <

1. Similarly, | u ( x , x − γ ) | ≤ | u (¯ x , x − γ ) | + | u ( x , x − γ ) − u (¯ x , x − γ ) | ≤ | u (¯ x , x − γ ) | + δ. Claim

OA.2 . The function w is absolutely integrable with respect to the distribution f ( · | µ − γ (¯ x − µ )). Proof.

This is because both x u (¯ x , x + µ − γ (¯ x − µ ) + γ ) and x u (¯ x , x + µ − γ (¯ x − µ ) + γ ) are absolutely integrable with respect to f ( · | , by Assumption 2.Together, these two claims show that for the family of functions x u ( x , x + γ (¯ x − x )) for | x − ¯ x | < w is an integrable dominating function with respect to the distribution f ( · | µ − γ (¯ x − µ )). Consider a sequence ( x ( n )1 ) n ∈ N with x ( n )1 → ¯ x . By continuity, u ( x ( n )1 ) → u (¯ x ) . For all large enough n , the functions x u ( x ( n )1 , x + γ (¯ x − x ( n )1 ))falls within the family mentioned before. Since these functions converge pointwise in x to x u (¯ x , x ), the existence of the dominating function f implies the convergence of theintegrals by dominated convergence theorem, E ˜ X ∼ f ( ·| µ − γ (¯ x − µ )) [ u ( x ( n )1 , ˜ X + γ (¯ x − x ( n )1 )] → E ˜ X ∼ f ( ·| µ − γ (¯ x − µ )) [ u (¯ x , ˜ X )] . But E Ψ [ u ( x ( n )1 , X ) | X = x ( n )1 ] = E ˜˜ X ∼ f ( ·| µ − γ ( x ( n )1 − µ )) [ u ( x ( n )1 , ˜˜ X ]= E ˜ X ∼ f ( ·| µ − γ (¯ x − µ )) [ u ( x ( n )1 , ˜ X + γ (¯ x − x ( n )1 )] , n →∞ E Ψ [ u ( x ( n )1 , X ) | X = x ( n )1 ] = E ˜ X ∼ f ( ·| µ − γ (¯ x − µ )) [ u (¯ x , ˜ X )]= E Ψ [ u (¯ x , X ) | X = ¯ x ] . This establishes that D ( x ( n )1 ) → D (¯ x ), so D is continuous at ¯ x . Step 3 : If γ > , then there are x < x so that D ( x ) < < D ( x ) . I show D is not always negative; the other statement is symmetric.From u ( x g ) − u ( x g , x b ) > κ > , we get that for any x ≥ x g , x ≤ x b ,u ( x ) − u ( x , x ) ≥ u ( x g ) − u ( x g , x ) ≥ u ( x g ) − u ( x g , x b ) > κ where the ﬁrst inequality comes from Assumption 1(b) and the second one comes fromAssumption 1(a). We have for any x ,D ( x ) = u ( x ) − E Ψ [ u ( x , X ) | X = x ]= P Ψ [ X ≤ x b | X = x ] · ( u ( x ) − E Ψ [ u ( x , X ) | X = x , X ≤ x b ])+ P Ψ [ X > x b | X = x ] · ( u ( x ) − E Ψ [ u ( x , X ) | X = x , X > x b ]) . 
When x ≥ x g , u ( x ) − E Ψ [ u ( x , X ) | X = x , X ≤ x b ] > κ . Also, for x ≥ x g ,u ( x ) − E Ψ [ u ( x , X ) | X = x , X > x b ] ≤ u ( x g ) − E Ψ [ u ( x g , X ) | X = x , X > x b ] . But P Ψ [ X > x b | X = x ] · E Ψ [ u ( x g , X ) | X = x , X > x b ]= E Ψ [ { X > x b } · u ( x g , X ) | X = x ]= E ˜˜ X ∼ f ( ·| µ − γ ( x − µ )) [ { ˜˜ X > x b } · u ( x g , ˜˜ X )]= E ˜ X ∼ f ( ·| µ ) [ { ˜ X − γ ( x − µ ) > x b } · u ( x g , ˜ X − γ ( x − µ ))] ≤ E ˜ X ∼ f ( ·| µ ) [ { ˜ X − γ ( x − µ ) > x b } · | u ( x g , ˜ X ) | ]when x > µ . Since E ˜ X ∼ f ( ·| µ ) [ | u ( x g , ˜ X ) | ] = E ˜ X ∼ f ( ·| [ | u ( x g , ˜ X + µ ) | ] exists and isﬁnite by Assumption 2, as x → ∞ we must have E ˜ X ∼ f ( ·| µ ) [ { ˜ X − γ ( x − µ ) > x b } · | u ( x g , ˜ X ) | ] → γ >

0. So this shows for all largeenough x , D ( x ) ≥ κ/ > . A 1.2 Proof of Lemma A.5

Proof.

The LHS, up to a constant not depending on µ , µ , can be written as: − Z ∞−∞ f ( x | µ • ) ln( f ( x | µ )) dx − Z c −∞ (cid:26)Z ∞−∞ f ( x | µ • ) · f ( x | µ • ) ln [ f ( x | µ − γ ( x − µ ))] dx (cid:27) dx Replacing ( µ , µ ) with ( µ • , µ − γ ( µ • − µ )) , the above expression becomes: − Z ∞−∞ f ( x | µ • )) ln( f ( x | µ • )) dx − Z c −∞ (cid:26)Z ∞−∞ f ( x | µ • ) · f ( x | µ • ) ln [ f ( x | µ − γ ( µ • − µ ) − γ ( x − µ • ))] dx (cid:27) dx which simpliﬁes to − Z ∞−∞ f ( x | µ • ) ln( f ( x | µ • )) dx − Z c −∞ (cid:26)Z ∞−∞ f ( x | µ • ) · f ( x | µ • ) ln [ f ( x | µ − γ ( x − µ ))] dx (cid:27) dx So we see D KL ( H • ( c ) kH (Ψ( µ , µ ; γ ); c )) − D KL ( H • ( c ) kH (Ψ( µ • , µ − γ ( µ • − µ ); γ ); c ))= Z ∞−∞ f ( x | µ • ) ln( f ( x | µ • )) dx − Z ∞−∞ f ( x | µ • ) ln( f ( x | µ )) dx . Since { f ( · | µ ) : µ ∈ R } is a family of shifted densities, µ Z ∞−∞ f ( x | µ • ) ln( f ( x | µ )) dx is maximized at µ = µ • and attains a strictly smaller value for any µ = µ • . Thus thediﬀerence is strictly positive.

OA 1.3 Proof of Lemma A.7

Proof.

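I first show that $\frac{\partial^2}{\partial \tau^2} \ln[f(x \mid \tau)] < 0$ for all $x, \tau \in \mathbb{R}$. To see this, $f(x \mid \tau) = f(x - \tau \mid 0)$, so $\frac{\partial^2}{\partial \tau^2} \ln[f(x \mid \tau)] = \left[\frac{\partial^2}{\partial y^2} \ln(f(y \mid 0))\right]_{y = x - \tau}$. By Assumption 2, $f(\cdot \mid 0)$ is strictly log-concave, therefore $\frac{\partial^2}{\partial y^2} \ln(f(y \mid 0)) < 0$ for all $y \in \mathbb{R}$.

We have from the definition of $L(\mu_2 \mid x_1)$,
$$\frac{\partial^2}{\partial \mu_2^2} L(\mu_2 \mid x_1) = \int_{-\infty}^{\infty} f(x_2 \mid \mu_2^\bullet) \left[\frac{\partial^2}{\partial \tau^2} \ln[f(x_2 \mid \tau)]\right]_{\tau = \mu_2 - \gamma(x_1 - \mu_1^\bullet)} dx_2 < 0,$$
since $\frac{\partial^2}{\partial \tau^2} \ln[f(x_2 \mid \tau)] < 0$. Also, for the same reason,
$$\frac{\partial^2}{\partial x_1^2} L(\mu_2 \mid x_1) = (-\gamma)^2 \cdot \int_{-\infty}^{\infty} f(x_2 \mid \mu_2^\bullet) \left[\frac{\partial^2}{\partial \tau^2} \ln[f(x_2 \mid \tau)]\right]_{\tau = \mu_2 - \gamma(x_1 - \mu_1^\bullet)} dx_2 < 0.$$

Now, replacing $L(\mu_2 \mid x_1)$ in the definition of $\bar{L}(\mu_2 \mid c)$ with $L(\mu_2^\bullet \mid x_1 - \frac{1}{\gamma}(\mu_2 - \mu_2^\bullet))$ using Lemma A.6, we have for any $c \in \mathbb{R} \cup \{\infty\}$,
$$\frac{\partial^2}{\partial \mu_2^2} \bar{L}(\mu_2 \mid c) = \frac{\partial^2}{\partial \mu_2^2} \int_{-\infty}^{c} f(x_1 \mid \mu_1^\bullet) \cdot L\left(\mu_2^\bullet \mid x_1 - \tfrac{1}{\gamma}(\mu_2 - \mu_2^\bullet)\right) dx_1 = \left(-\tfrac{1}{\gamma}\right)^2 \cdot \int_{-\infty}^{c} f(x_1 \mid \mu_1^\bullet) \cdot \left[\frac{\partial^2}{\partial \tau^2} L(\mu_2^\bullet \mid \tau)\right]_{\tau = x_1 - \frac{1}{\gamma}(\mu_2 - \mu_2^\bullet)} dx_1.$$
As just established, $\frac{\partial^2}{\partial \tau^2} L(\mu_2^\bullet \mid \tau) < 0$ for all $\tau \in \mathbb{R}$, therefore $\frac{\partial^2}{\partial \mu_2^2} \bar{L}(\mu_2 \mid c) < 0$.

Finally, $\frac{\partial^2}{\partial x_1 \partial \mu_2} L(\mu_2 \mid x_1) = -\frac{1}{\gamma} \frac{\partial^2}{\partial x_1^2} L(\mu_2^\bullet \mid x_1 - \frac{1}{\gamma}(\mu_2 - \mu_2^\bullet))$. But $\frac{\partial^2}{\partial x_1^2} L(\mu_2^\bullet \mid x_1 - \frac{1}{\gamma}(\mu_2 - \mu_2^\bullet)) < 0$ by the strict concavity of $L(\mu_2 \mid \cdot)$ just derived, therefore $\frac{\partial^2}{\partial x_1 \partial \mu_2} L(\mu_2 \mid x_1) > 0$.

OA 1.4 Proof of Lemma A.8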

Proof.

We have
$$\frac{d}{dx}\left(\frac{f(x+K)}{f(x)}\right) = \frac{f'(x+K) f(x) - f(x+K) f'(x)}{f(x)^2}.$$
Since $\frac{d^2}{dx^2} \ln(f(x)) < 0$, we get $\frac{d}{dx}\left[\frac{f'(x)}{f(x)}\right] < 0$, so $\frac{f'(x+K)}{f(x+K)} < \frac{f'(x)}{f(x)}$ for all $x$. Rearranging this shows $f'(x+K) f(x) - f(x+K) f'(x) < 0$.

OA 1.5 Proof of Lemma A.10

Proof.

Using Lemma A.9’s conclusion that ∂∂µ ¯ L ( µ • | ∞ ) = 0, we get Z ∞−∞ f ( x | µ • ) ∂L∂x ( µ • | x ) dx = 0 . ∂L∂x ( µ • | · ) is strictly decreasing by Lemma A.7, we conclude Z c −∞ f ( x | µ • ) ∂L∂x ( µ • | x ) dx > c ∈ R , therefore ∂∂µ ¯ L ( µ • | c ) < µ l ∈ R where ∂∂µ ¯ L ( µ | ¯ c ) ≥ , then a solution to the FOC existsbetween µ l and µ • by intermediate value theorem. We show such µ l can always be found.We have ∂∂µ ¯ L ( µ • − | ∞ ) > , since µ ∂∂µ L ( µ | x ) is decreasing by Lemma A.7.By continuity, we may ﬁnd large enough c h ∈ R so that ∂∂µ ¯ L ( µ • − | c h ) > ∂∂µ ¯ L ( µ • − | c ) > c ∈ R , we are done by taking µ l = µ • − . Else, byintermediate value theorem there exists ˆ c ∈ R so that ∂∂µ ¯ L ( µ • − | ˆ c ) = 0. Using the ﬁnalfact from A.7 that ∂L∂µ ( µ • − | · ) is strictly increasing, if ¯ c ≥ ˆ c then we are done by taking µ l = µ • − , as ∂∂µ ¯ L ( µ • − | ˆ c ) = 0 implies ∂∂µ ¯ L ( µ • − | ¯ c ) ≥ c l = ˆ c − K for K > . I show that µ l may be taken to be µ • − − γK to get ∂∂µ ¯ L ( µ l | c l ) > . We have ¯ L ( µ | ˆ c ) = Z ˆ c −∞ f ( x | µ • ) L (0 | x − γ µ ) dx , and so ¯ L ( µ − γK | ˆ c − K ) = Z ˆ c − K −∞ f ( x | µ • ) L (0 | ( x + K ) − γ µ ) dx = Z ˆ c −∞ f ( x − K | µ • ) L (0 | x − γ µ ) dx . This implies ∂∂µ ¯ L ( µ − γK | ˆ c − K ) = − γ Z ˆ c −∞ f ( x − K | µ • ) ∂∂x L (0 | x − γ µ ) dx . For µ = µ • −

1, we rewrite ∂∂µ ¯ L ( µ − γK | ˆ c − K ) as − γ Z ˆ c −∞ f ( x − K | µ • ) f ( x | µ • ) ! f ( x | µ • ) ∂∂x L (0 | x − γ ( µ • − dx . By the construction of ˆ c , ∂∂µ ¯ L ( µ • − | ˆ c ) = 0 , that is to say Z ˆ c −∞ f ( x | µ • ) ∂∂x L (0 | x − γ ( µ • − dx = 0 . Since ∂∂x L (0 | x − γ ( µ • − x by Lemma A.7, it must be positive6or some low values of x and negative for some high values of x not exceeding ˆ c. Let r < ˆ c be such that ∂∂x L (0 | x − γ ( µ • − . Then we have Z ˆ c −∞ f ( x − K | µ • ) f ( x | µ • ) ! f ( x | µ • ) ∂∂x L (0 | x − γ ( µ • − dx < Z r −∞ f ( r − K | µ • ) f ( r | µ • ) ! f ( x | µ • ) ∂∂x L (0 | x − γ ( µ • − dx + Z ˆ cr f ( r − K | µ • )) f ( r | µ • )) ! f ( x | µ • ) ∂∂x L (0 | x − γ ( µ • − dx . To see this, ﬁrst observe that x f ( x | µ • ) f ( x − K | µ • ) is strictly decreasing by Lemma A.8, therefore x f ( x − K | µ • ) f ( x | µ • ) is strictly increasing. So replacing f ( x − K | µ • ) f ( x | µ • ) with f ( r − K | µ • ) f ( r | µ • ) over-weighs theintegrand on the interval ( −∞ , r ) , while the same weight under-weighs the integrand on( r, ˆ c ) . This amounts to an over-weighting of the positive part of the integrand and an under-weighting of the negative part, thus over-estimating the integral value.Multiplying both sides by − γ and reversing the inequality, ∂∂µ ¯ L ( µ l | ˆ c − K ) > − γ · Z r −∞ f ( r − K | µ • ) f ( r | µ • ) ! f ( x | µ • ) ∂∂x L (0 | x − γ ( µ • − dx − γ Z ˆ cr f ( r − K | µ • )) f ( r | µ • )) ! f ( x | µ • ) ∂∂x L (0 | x − γ ( µ • − dx . The RHS is − γ · f ( r − K | µ • )) f ( r | µ • )) · Z ˆ c −∞ f ( x | µ • ) ∂∂x L (0 | x − γ ( µ • − dx = 0 , hence we have ∂∂µ ¯ L ( µ l | ¯ c ) > OA 1.6 Proof of Lemma A.12

Proof.

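The first-order condition implies $\frac{\partial}{\partial \mu_2} \bar{L}(\mu_2^*(c) \mid c) = 0$. Using the second statement of Lemma A.6 in the FOC gives
$$-\frac{1}{\gamma} \int_{-\infty}^{c} f(x_1 \mid \mu_1^\bullet) \frac{\partial}{\partial x_1} L\left(\mu_2^\bullet \mid x_1 - \tfrac{1}{\gamma}(\mu_2^*(c) - \mu_2^\bullet)\right) dx_1 = 0.$$
By strict concavity of $L(\mu_2^\bullet \mid \cdot)$ from Lemma A.7, this requires that $\frac{\partial}{\partial x_1} L(\mu_2^\bullet \mid \cdot)$ takes on a strictly negative value at the rightmost point of the domain of integration, $\frac{\partial}{\partial x_1} L(\mu_2^\bullet \mid c - \frac{1}{\gamma}(\mu_2^*(c) - \mu_2^\bullet)) < 0$.

From its definition, $x_1 \mapsto L(\mu_2^\bullet \mid x_1)$ is maximized when $x_1 = \mu_1^\bullet$, for we have $L(\mu_2^\bullet \mid \mu_1^\bullet) = \int_{-\infty}^{\infty} f(x_2 \mid \mu_2^\bullet) \ln[f(x_2 \mid \mu_2^\bullet)] \, dx_2$. Since $L(\mu_2^\bullet \mid \cdot)$ is strictly concave, this means $\frac{\partial}{\partial x_1} L(\mu_2^\bullet \mid \tau) < 0$ if and only if $\tau > \mu_1^\bullet$. Combining with the previous inequality, $c - \frac{1}{\gamma}(\mu_2^*(c) - \mu_2^\bullet) > \mu_1^\bullet$, which rearranges to say $\mu_2^*(c) - \gamma(c - \mu_1^\bullet) < \mu_2^\bullet$ as desired.

OA 1.7 Proof of Lemma A.13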

Proof.

Consider the payoﬀ diﬀerence between accepting x and continuing under belief ν , D ( x ; ν ) := u ( x ) − Z E X ∼ f ( ·| µ − γ ( x − µ • ) ,σ ) [ u ( x , X )] dν ( µ ) . Note that D ( x , ν ) = R D ( x ; µ • , µ , γ ) dν ( µ ). Lemma A.2 shows that for every µ ∈ R , D ( x ; µ • , µ , γ ) is strictly increasing in x . Hence the same must hold for D ( x , ν ) . Also, Lemma A.2 implies there exists some x ∈ R so that D ( x ; µ • , µ , γ ) < , and thatthere exists some x ∈ R satisfying D ( x ; µ • , µ , γ ) >

0. Since u increases in its secondargument, we also get D ( x ; µ • , µ , γ ) < D ( x ; µ • , µ , γ ) > µ ∈ [ µ , ¯ µ ]. Thisimplies D ( x ; ν ) < D ( x ; ν ) >

0, as ν is supported on (a subset of) [ µ , ¯ µ ] . Finally, I show D ( x ; ν ) is continuous in x . Fix ¯ x ∈ R . Since u is continuous, ﬁnd δ > | x − ¯ x | < , | u ( x ) − u (¯ x ) | < δ. Consider the function φ : R → R ≥ deﬁned by φ ( x , µ ) := | u (¯ x , x − γ + µ ) | + | u (¯ x , x + γ + µ ) | + δ . Claim

OA.3 . Whenever | x − ¯ x | < , | u ( x , x + γ (¯ x − x ) + µ ) | ≤ φ ( x , µ ) for every x , µ ∈ R . Proof.

This is the same as the proof of Claim OA.1.

Claim

OA.4 . R ¯ µ µ (cid:16)R ∞−∞ | φ ( x , µ ) | · f ( x | − γ (¯ x − µ • )) dx (cid:17) dν ( µ ) < ∞ . Proof.

We may write φ ( x , µ ) := u + γ , + ( x , µ ) + u + γ , − ( x , µ ) + u − γ , + ( x , µ ) + u − γ , − ( x , µ ) + δ where u + γ , + and u + γ , − are the positive and negative parts of ( x , µ ) u (¯ x , x + γ + µ ) , and u − γ , + and u − γ , − are the positive and negative parts of ( x , µ ) u (¯ x , x − γ + µ ) . FromAssumption 2, for every µ ∈ [ µ , ¯ µ ], each of u + γ , + ( · , µ ) , u + γ , − ( · , µ ) ,u − γ , + ( · , µ ) , and u − γ , − ( · , µ )is integrable over R with respect to the density f ( · | − γ (¯ x − µ • )) . These integrals aremaximized at µ = ¯ µ for u + γ , + ( · , µ ) and u − γ , + ( · , µ ), and maximized at µ = µ for u + γ , − ( · , µ )8nd u − γ , − ( · , µ ). In other words, for every µ ∈ [ µ , ¯ µ ], Z ∞−∞ | φ ( x , µ ) | · f ( x | − γ (¯ x − µ • )) dx ≤ Z ∞−∞ (cid:16) u + γ , + ( x , ¯ µ ) + u − γ , + ( x , ¯ µ ) (cid:17) · f ( x | − γ (¯ x − µ • )) dx + Z ∞−∞ (cid:16) u + γ , − ( x , µ ) + u − γ , − ( x , µ ) (cid:17) · f ( x | − γ (¯ x − µ • )) dx . This bound is ﬁnite and does not depend on µ , so the overall integral over dν ( µ ) is alsoﬁnite.Consider a sequence x ( n )1 → ¯ x . We have D ( x ( n )1 ; ν ) = u ( x ( n )1 ) − Z E ˜˜ X ∼ f ( ·| µ − γ ( x ( n )1 − µ • )) [ u ( x ( n )1 , ˜˜ X )] dν ( µ )= u ( x ( n )1 ) − Z E ˜ X ∼ f ( ·|− γ (¯ x − µ • )) [ u ( x ( n )1 , ˜ X + γ (¯ x − x ( n )1 ) + µ )] dν ( µ )= u ( x ( n )1 ) − Z ¯ µ µ Z ∞−∞ u ( x ( n )1 , x + γ (¯ x − x ( n )1 ) + µ ) · f ( x | − γ (¯ x − µ • )) dx dν ( µ ) . The sequence of functions ( x , µ ) u ( x ( n )1 , x + γ (¯ x − x ( n )1 ) + µ ) pointwise convergeto u (¯ x , x + µ ) as n → ∞ . From the two claims, for all large enough n, this sequenceof functions are pointwise dominated by f, an absolutely integrable function on the samedomain. Therefore continuity follows from dominated convergence theorem, as in the proofof Lemma A.2.This means there exists a unique c ∗ so that D ( c ∗ ) = 0. 
The cutoﬀ strategy S c ∗ is optimal,because it stops at every x whose stopping payoﬀ exceeds expected continuation payoﬀ, andcontinues at every x where expected continuation payoﬀ is higher than stopping payoﬀ.For any c = c ∗ + δ for some δ >

0, the diﬀerence in expected payoﬀs of S c ∗ and S c is R c ∗ + δc ∗ D ( x ; ν ) > D ( x ; ν ) is strictly positive on the interval ( c ∗ , c ∗ + δ ]. So everystrictly higher cutoﬀ than c ∗ is strictly suboptimal. A similar argument shows every strictlylower cutoﬀ than c ∗ is also strictly suboptimal. OA 1.8 Proof of Lemma A.15

Proof.

Note that ¯ ϕ t ( µ ) is measurable with respect to F t − . Also, ϕ t ( µ ) |F t − = ϕ t ( µ ) | ˜ C t ,because by independence of X t from ( X s ) t − s =1 , the only information that F t − contains about ϕ t ( µ ) is in determining the cutoﬀ threshold ˜ C t .9t a sample path ω so that ˜ C t ( ω ) = c ∈ R , E [ ϕ t ( µ ) |F t − ]( ω ) = E [ − λ ( X ,s − µ + µ • + γ ( X ,s − µ • )) · { X ,s ≤ c } ]= ∂∂µ Z c −∞ g ( x ) · Z ∞−∞ g ( x ) · ln( f ( X ,s | µ − γ ( X ,s − µ • ))) dx dx = ∂∂µ ¯ L ( µ | c ) . This shows that E [ ϕ t ( µ ) |F t − ]( ω ) = ¯ ϕ t ( µ )( ω ). Since this holds regardless of c , we get that E [ ϕ t ( µ ) |F t − ] = ¯ ϕ t ( µ ) for all ω, that is to say E [ ξ t ( µ ) |F t − ] = Var[ ϕ t ( µ ) |F t − ] ≤ E [ ϕ t ( µ ) |F t − ] ≤ E [( λ ( X ,s − µ + µ • + γ ( X ,s − µ • ))) ]It suﬃces now to show E h ( λ ( X − µ + µ • + γ ( X − µ • ))) i exists for all µ ∈ R and iscontinuous in µ , for then the (ﬁnite) maximum value this expectation takes on the compactinterval [ µ , ¯ µ ] can be taken as κ ξ .Continuity is clear.By assumption on g , there exists some κ g < ∞ so that for all z ∈ R , − κ g < λ ( z ) < λ ( z ) is Lipschitz continuous with constant κ g . Let b := λ ( − µ + µ • − γµ • ).For any x , x ∈ R , ( λ ( x − µ + µ • + γ ( x − µ • ))) = b + ( λ ( x − µ + µ • + γ ( x − µ • ))) − ( λ ( − µ + µ • − γµ • )) ≤ b + | λ ( x − µ + µ • + γ ( x − µ • )) − λ ( − µ + µ • − γµ • ) | ·× | λ ( x − µ + γ ( x + µ • − µ • )) + λ ( − µ + µ • − γµ • ) |≤ b + ( κ g · ( | x | + γ | x | )) · (2 b + ( κ g · ( | x | + γ | x | ))) . Note the bound is a second-order polynomial in | x | and | x | . We have E h ( λ ( X − µ + µ • + γ ( X − µ • ))) i ≤ E h b + ( κ g · ( | X | + γ | X | )) · (2 b + ( κ g · ( | X | + γ | X | ))) i < ∞ , where the last inequality comes from the fact that X , X have ﬁnite second moments. OA 1.9 Proof of Lemma A.21

Proof.

Suppose $\mu^A_{2,[2]} \ge \mu^A_{2,[1]}$. From Lemma 1, $C$ is strictly increasing in its second argument. This shows $c^A_{[2]} = C(\mu_1^\bullet, \mu^A_{2,[2]}; \gamma) \ge C(\mu_1^\bullet, \mu^A_{2,[1]}; \gamma) = c^A_{[1]}$. But by Proposition 2, $\mu_2^*(c)$ increases in $c$, so $\mu^A_{2,[3]} = \mu_2^*(c^A_{[2]}) \ge \mu_2^*(c^A_{[1]}) = \mu^A_{2,[2]}$. Continuing this argument shows that $(\mu^A_{2,[t]})_{t \ge 1}$ is a monotonically increasing sequence. Since $C$ is strictly increasing in its second argument, $(c^A_{[t]})_{t \ge 1}$ must also form a monotonically increasing sequence.

Conversely, if $\mu^A_{2,[2]} < \mu^A_{2,[1]}$, then the analogous arguments show that $(\mu^A_{2,[t]})_{t \ge 1}$ and $(c^A_{[t]})_{t \ge 1}$ are monotonically decreasing sequences.

It is clear that the $\mu^A_{2,[t]}$ are iterates of $I(\cdot; \gamma)$, so they must converge to its fixed point, as $I(\cdot; \gamma)$ is a contraction mapping by Proposition 4. We have $\lim_{t\to\infty} c^A_{[t]} = \lim_{t\to\infty} C(\mu_1^\bullet, \mu^A_{2,[t]}; \gamma)$. We may take the limit inside the $C$ function since it is continuous, finding that $\lim_{t\to\infty} c^A_{[t]} = C(\mu_1^\bullet, \mu_2^\infty; \gamma)$.

OA 1.10 Proof of Lemma 3

Proof.

Indiﬀerence condition c L = C u ,u L ( µ , µ ; γ ) implies that u ( c L ) = E ˜ X ∼ f ( ·| µ − γ ( c L − µ )) [ u L ( c L , ˜ X )] . Since u H ( c L , x ) ≥ u L ( c L , x ) for all x ∈ R , with strict inequality on a positive-measure set,this shows u ( c L ) < E ˜ X ∼ f ( ·| µ − γ ( c L − µ )) [ u H ( c L , ˜ X )] . Because ( u , u L ) satisfy Assumptions 1, the best stopping strategy in the feasible modelΨ( µ , µ ; γ ) has a cutoﬀ form by Proposition 1. This shows C u ,u H ( µ , µ ; γ ) is strictly above c L . OA 1.11 Proof of Proposition 6

Proof.

Under Assumptions 1, 2, and 3, each of ( u , u H ) and ( u , u L ) has a unique steady state,( µ • , µ ∞ ,H , c ∞ H ) , ( µ • , µ ∞ ,L , c ∞ L ) respectively. Let I H , I L be the iteration maps corresponding tothese two stage games, that is to say I H ( µ ) := µ ∗ ( C u ,u H ( µ • , µ ; γ )) I L ( µ ) := µ ∗ ( C u ,u L ( µ • , µ ; γ )) . From Proposition 4, both I H and I L are contraction mappings. Consider their iterateswith a starting value of 0. That is, put µ [0]2 ,H = 0, µ [0]2 ,L = 0 and let µ [ t ]2 ,H = I H ( µ [ t − ,H ) ,µ [ t ]2 ,L = I L ( µ [ t − ,L ) for t ≥

1. By property of contraction mappings and since the ﬁxed pointsof the iteration maps are the steady state beliefs, µ [ t ]2 ,H → µ ∞ ,H and µ [ t ]2 ,L → µ ∞ ,L .By induction, I will show µ [ t ]2 ,L ≤ µ [ t ]2 ,H for every t ≥ . The base case of t = 0 is true bydeﬁnition. If µ [ T ]2 ,L ≤ µ [ T ]2 ,H , then C u ,u L ( µ • , µ [ T ]2 ,L ; γ ) ≤ C u ,u L ( µ • , µ [ T ]2 ,H ; γ ) < C u ,u H ( µ • , µ [ T ]2 ,H ; γ ) . C being increasing in the second argument and the inductivehypothesis, while the second inequality is due to Lemma 3. Therefore, I L ( µ [ T ]2 ,L ) ≤ I H ( µ [ T ]2 ,H )using the fact that µ ∗ is increasing by Proposition 2, so µ [ T +1]2 ,L ≤ µ [ T +1]2 ,H . Since weak inequalities are preserved by limits, we have µ ∞ ,H ≥ µ ∞ ,L . It is impossible tohave µ ∞ ,H = µ ∞ ,L , because this would lead to c ∞ H > c ∞ L by Lemma 3, which in turn implies µ ∞ ,H = µ ∗ ( c ∞ H ) > µ ∗ ( c ∞ L ) = µ ∞ ,L . This inequality contradicts µ ∞ ,H = µ ∞ ,L . Therefore, we infact have µ ∞ ,H > µ ∞ ,L . The conclusion that c ∞ H > c ∞ L follows from Lemma 3 and the fact that C is increases in its second argument. OA 1.12 Proof of Proposition 7

Proof.

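Rewrite Equation (2) as
$$\int_{-\infty}^{\infty} \phi(x_1; \mu_1^\bullet, (\sigma_1^\bullet)^2) \cdot \ln\left(\frac{\phi(x_1; \mu_1^\bullet, (\sigma_1^\bullet)^2)}{\phi(x_1; \mu_1, \sigma_1^2)}\right) dx_1 + \int_{-\infty}^{c} \phi(x_1; \mu_1^\bullet, (\sigma_1^\bullet)^2) \cdot \int_{-\infty}^{\infty} \phi(x_2; \mu_2^\bullet, (\sigma_2^\bullet)^2) \ln\left[\frac{\phi(x_2; \mu_2^\bullet, (\sigma_2^\bullet)^2)}{\phi(x_2; \mu_2 - \gamma(x_1 - \mu_1), \sigma_2^2)}\right] dx_2 \, dx_1.$$
The KL divergence between $\mathcal{N}(\mu_{\text{true}}, \sigma_{\text{true}}^2)$ and $\mathcal{N}(\mu_{\text{model}}, \sigma_{\text{model}}^2)$ is $\ln\frac{\sigma_{\text{model}}}{\sigma_{\text{true}}} + \frac{\sigma_{\text{true}}^2 + (\mu_{\text{true}} - \mu_{\text{model}})^2}{2\sigma_{\text{model}}^2} - \frac{1}{2}$, so we may simplify the first term and the inner integral of the second term:
$$\ln\frac{\sigma_1}{\sigma_1^\bullet} + \frac{(\mu_1 - \mu_1^\bullet)^2}{2\sigma_1^2} + \frac{(\sigma_1^\bullet)^2}{2\sigma_1^2} - \frac{1}{2} + \int_{-\infty}^{c} \phi(x_1; \mu_1^\bullet, (\sigma_1^\bullet)^2) \cdot \left[\ln\frac{\sigma_2}{\sigma_2^\bullet} + \frac{(\sigma_2^\bullet)^2 + (\mu_2 - \gamma(x_1 - \mu_1) - \mu_2^\bullet)^2}{2\sigma_2^2} - \frac{1}{2}\right] dx_1.$$
Dropping terms not dependent on any of the four variables gives a simplified version of the objective,
$$\xi(\mu_1, \mu_2, \sigma_1, \sigma_2) := \ln\frac{\sigma_1}{\sigma_1^\bullet} + \frac{(\mu_1 - \mu_1^\bullet)^2}{2\sigma_1^2} + \frac{(\sigma_1^\bullet)^2}{2\sigma_1^2} + \int_{-\infty}^{c} \phi(x_1; \mu_1^\bullet, (\sigma_1^\bullet)^2) \cdot \left[\ln\frac{\sigma_2}{\sigma_2^\bullet} + \frac{(\sigma_2^\bullet)^2 + (\mu_2 - \gamma(x_1 - \mu_1) - \mu_2^\bullet)^2}{2\sigma_2^2}\right] dx_1.$$
Differentiating under the integral sign,
$$\frac{\partial \xi}{\partial \mu_2} = \int_{-\infty}^{c} \phi(x_1; \mu_1^\bullet, (\sigma_1^\bullet)^2) \cdot \left[\frac{\mu_2 - \gamma(x_1 - \mu_1) - \mu_2^\bullet}{\sigma_2^2}\right] dx_1,$$
$$\frac{\partial \xi}{\partial \mu_1} = \frac{\mu_1 - \mu_1^\bullet}{\sigma_1^2} + \gamma \int_{-\infty}^{c} \phi(x_1; \mu_1^\bullet, (\sigma_1^\bullet)^2) \cdot \left[\frac{\mu_2 - \gamma(x_1 - \mu_1) - \mu_2^\bullet}{\sigma_2^2}\right] dx_1 = \frac{\mu_1 - \mu_1^\bullet}{\sigma_1^2} + \gamma \frac{\partial \xi}{\partial \mu_2}.$$
At the FOC $(\mu_1^*, \mu_2^*, \sigma_1^*, \sigma_2^*)$, we have $\frac{\partial \xi}{\partial \mu_2}(\mu_1^*, \mu_2^*, \sigma_1^*, \sigma_2^*) = 0$, hence $\mu_1^* = \mu_1^\bullet$. Similar arguments as before then establish $\mu_2^* = \mu_2^\bullet - \gamma(\mu_1^\bullet - E[X_1 \mid X_1 \le c])$, where the expectation is taken with respect to the true distribution of $X_1$ (with the true variance $(\sigma_1^\bullet)^2$). Then,
$$\frac{\partial \xi}{\partial \sigma_1}(\mu_1^*, \mu_2^*, \sigma_1^*, \sigma_2^*) = \frac{1}{\sigma_1^*} - \frac{(\sigma_1^\bullet)^2}{(\sigma_1^*)^3} = 0,$$
which gives $\sigma_1^* = \sigma_1^\bullet$ (since $\sigma_1^* \ge 0$). Finally, from the FOC for $\sigma_2$,
$$\int_{-\infty}^{c} \phi(x_1; \mu_1^\bullet, (\sigma_1^\bullet)^2) \cdot \left[\frac{1}{\sigma_2^*} - \frac{(\sigma_2^\bullet)^2 + (\mu_2^* - \gamma(x_1 - \mu_1^*) - \mu_2^\bullet)^2}{(\sigma_2^*)^3}\right] dx_1 = 0.$$
Substituting in the values of $\mu_1^*, \mu_2^*$ already solved for,
$$\begin{aligned}
(\sigma_2^*)^2 &= (\sigma_2^\bullet)^2 + E[(\mu_2^* - \gamma(X_1 - \mu_1^\bullet) - \mu_2^\bullet)^2 \mid X_1 \le c] \\
&= (\sigma_2^\bullet)^2 + E[(\mu_2^\bullet - \gamma(\mu_1^\bullet - E[X_1 \mid X_1 \le c]) - \gamma(X_1 - \mu_1^\bullet) - \mu_2^\bullet)^2 \mid X_1 \le c] \\
&= (\sigma_2^\bullet)^2 + \gamma^2 E\left[[(X_1 - \mu_1^\bullet) - (E[X_1 \mid X_1 \le c] - \mu_1^\bullet)]^2 \mid X_1 \le c\right] \\
&= (\sigma_2^\bullet)^2 + \gamma^2 \operatorname{Var}[X_1 - \mu_1^\bullet \mid X_1 \le c] \\
&= (\sigma_2^\bullet)^2 + \gamma^2 \operatorname{Var}[X_1 \mid X_1 \le c]
\end{aligned}$$
as desired.

OA 1.13 Proof of Proposition 8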

I start with a lemma that says, depending on the convexity of the decision problem, astronger belief in ﬁctitious variation either increases or decreases the subjectively optimalcutoﬀ threshold.

Lemma OA.1.

Suppose that under the feasible model $\Psi(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2; \gamma)$, the agent is indifferent between stopping at $c$ and continuing. Suppose $\hat{\sigma}_2^2 > \sigma_2^2$. Then: (i) if $x_2 \mapsto u_2(c, x_2)$ is convex with strict convexity for $x_2$ in a positive-measure set, then under the feasible model $\Psi(\mu_1, \mu_2, \sigma_1^2, \hat{\sigma}_2^2; \gamma)$ the agent strictly prefers continuing at $c$; (ii) if $x_2 \mapsto u_2(c, x_2)$ is concave with strict concavity for $x_2$ in a positive-measure set, then under the feasible model $\Psi(\mu_1, \mu_2, \sigma_1^2, \hat{\sigma}_2^2; \gamma)$ the agent strictly prefers stopping at $c$.

Proof. Indifference at $x_1 = c$ under the model $\Psi(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2; \gamma)$ implies that
$$u_1(c) = E_{X_2 \sim \mathcal{N}(\mu_2 - \gamma(c - \mu_1), \sigma_2^2)}[u_2(c, X_2)].$$
When the hypothesis in (i) is satisfied,
$$E_{X_2 \sim \mathcal{N}(\mu_2 - \gamma(c - \mu_1), \sigma_2^2)}[u_2(c, X_2)] < E_{X_2 \sim \mathcal{N}(\mu_2 - \gamma(c - \mu_1), \hat{\sigma}_2^2)}[u_2(c, X_2)],$$
since $\hat{\sigma}_2^2 > \sigma_2^2$ implies that $\mathcal{N}(\mu_2 - \gamma(c - \mu_1), \hat{\sigma}_2^2)$ is a strict mean-preserving spread of $\mathcal{N}(\mu_2 - \gamma(c - \mu_1), \sigma_2^2)$. The RHS is the expected continuation payoff under the model $\Psi(\mu_1, \mu_2, \sigma_1^2, \hat{\sigma}_2^2; \gamma)$, so the agent strictly prefers continuing when $X_1 = c$. The argument establishing (ii) is analogous.

Now I give the proof of Proposition 8.
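As a brief aside before the proof, the mean-preserving-spread step in Lemma OA.1 can be checked with closed forms. The sketch below is an illustration under stated assumptions, not the paper's construction: it uses hypothetical continuation payoffs $e^{x_2}$ (strictly convex) and $-e^{-x_2}$ (strictly concave), whose Gaussian expectations are $e^{m + s^2/2}$ and $-e^{-m + s^2/2}$, and it takes the stopping payoff to equal the continuation value under the smaller standard deviation, so that the agent starts out indifferent. All names and parameter values are illustrative.

```python
import math


def continuation_convex(mean, sd):
    """E[exp(X)] for X ~ N(mean, sd^2); the payoff exp(x) is strictly convex."""
    return math.exp(mean + 0.5 * sd ** 2)


def continuation_concave(mean, sd):
    """E[-exp(-X)] for X ~ N(mean, sd^2); the payoff -exp(-x) is strictly concave."""
    return -math.exp(-mean + 0.5 * sd ** 2)


# Conditional mean of X2 at the cutoff, i.e. mu2 - gamma * (c - mu1) (assumed value).
cond_mean = 0.2
sd, sd_hat = 1.0, 1.5  # sd_hat > sd: stronger belief in fictitious variation

# Stopping payoffs chosen so the agent is indifferent under sd in each problem.
stop_convex = continuation_convex(cond_mean, sd)
stop_concave = continuation_concave(cond_mean, sd)

# Gain from continuing rather than stopping once the variance belief rises.
gain_convex = continuation_convex(cond_mean, sd_hat) - stop_convex
gain_concave = continuation_concave(cond_mean, sd_hat) - stop_concave

print("convex problem, gain from continuing:", gain_convex)
print("concave problem, gain from continuing:", gain_concave)
```

In this example the gain is positive in the convex problem (the agent strictly prefers continuing) and negative in the concave problem (the agent strictly prefers stopping), matching cases (i) and (ii).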

Proof.

The result that $\mu_{1,[t]} = \mu_1^\bullet$ and $(\sigma_{1,[t]})^2 = (\sigma_1^\bullet)^2$ for all $t$ follows from Proposition 7.

Suppose $c_{[1]} \le c_{[0]}$. From Proposition 7, $\mu_{2,[2]} \le \mu_{2,[1]}$ and $(\sigma_{2,[2]})^2 \le (\sigma_{2,[1]})^2$. Let $c'_{[2]}$ be the indifference threshold under the model $\Psi(\mu_1^\bullet, \mu_{2,[2]}, (\sigma_1^\bullet)^2, (\sigma_{2,[1]})^2)$. By Lemma 1, $c'_{[2]} \le c_{[1]}$. Also, from Lemma OA.1, $c_{[2]} \le c'_{[2]}$, as generation 2 actually believes in the feasible model $\Psi(\mu_1^\bullet, \mu_{2,[2]}, (\sigma_1^\bullet)^2, (\sigma_{2,[2]})^2)$ where $(\sigma_{2,[2]})^2 \le (\sigma_{2,[1]})^2$. This shows $c_{[2]} \le c_{[1]}$. Continuing this argument shows that $(c_{[t]})_{t \ge 0}$ forms a monotonically decreasing sequence. Since the pseudo-true parameters $\mu_2^*$ and $(\sigma_2^*)^2$ are monotonic functions of the censoring threshold $c$, we have established the proposition in the case where $c_{[1]} \le c_{[0]}$.

The argument for the case where $c_{[1]} \ge c_{[0]}$ is exactly analogous and therefore omitted.

OA 1.14 Proof of Proposition 9

Proof.

In the first generation, both societies A and B observe large datasets of histories with distribution $\mathcal{H}^\bullet(c_{[0]})$. So, by Proposition 7, the two societies make the same inferences about the fundamentals.

Suppose the optimal-stopping problem is convex. Then due to fictitious variation in generation 1 and the convexity of $u_2$, it follows from Lemma OA.1 that $c_{[B,1]} > c_{[A,1]}$. In the second generation, $\mu_{2,[B,2]} > \mu_{2,[A,2]}$ because the pseudo-true second-period fundamental increases in the censoring cutoff. Together again with the existence of fictitious variation, we conclude $c_{[B,2]} > c_{[A,2]}$. Continuing this argument establishes the proposition for the case where the optimal-stopping problem is convex. The case of concave optimal-stopping problems is analogous.
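The monotonicity in the censoring cutoff used in this argument can be checked numerically from the closed forms in Proposition 7, $\mu_2^* = \mu_2^\bullet - \gamma(\mu_1^\bullet - E[X_1 \mid X_1 \le c])$ and $(\sigma_2^*)^2 = (\sigma_2^\bullet)^2 + \gamma^2 \operatorname{Var}[X_1 \mid X_1 \le c]$. The sketch below assumes Gaussian draws and uses the standard truncated-normal moment formulas; the function names, parameter values, and grid of cutoffs are illustrative choices, not taken from the paper.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def pseudo_true_parameters(c, mu1, mu2, sd1, sd2, gamma):
    """Return (mu2*, (sigma2*)^2) for censoring cutoff c, using Proposition 7."""
    z = (c - mu1) / sd1
    lam = norm_pdf(z) / norm_cdf(z)
    mean_below = mu1 - sd1 * lam                       # E[X1 | X1 <= c]
    var_below = sd1 ** 2 * (1.0 - z * lam - lam ** 2)  # Var[X1 | X1 <= c]
    mu2_star = mu2 - gamma * (mu1 - mean_below)
    var2_star = sd2 ** 2 + gamma ** 2 * var_below
    return mu2_star, var2_star


# Illustrative true parameters and bias level.
MU1, MU2, SD1, SD2, GAMMA = 0.0, 0.0, 1.0, 1.0, 0.5

cutoffs = [-2.0 + 0.25 * i for i in range(21)]  # cutoffs from -2 to 3
results = [pseudo_true_parameters(c, MU1, MU2, SD1, SD2, GAMMA) for c in cutoffs]

for c, (m, v) in zip(cutoffs, results):
    print(f"c = {c:5.2f}   mu2* = {m:8.4f}   (sigma2*)^2 = {v:7.4f}")
```

On this grid both pseudo-true parameters rise with the cutoff, the inferred mean stays below the true mean (over-pessimism), and the inferred variance stays above the true variance (fictitious variation).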

OA 1.15 Proof of Proposition 10

Proof.

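In the true model, $X_2 \mid (X_1 = x_1) \sim N(\mu_2^\bullet - \gamma^\bullet(x_1 - \mu_1^\bullet), \sigma^2)$, while the agents' feasible model $\Psi(\mu_1, \mu_2; \gamma)$ has $X_2 \mid (X_1 = x_1) \sim N(\mu_2 - \gamma(x_1 - \mu_1), \sigma^2)$. So, we can write $D_{KL}(\mathcal{H}(\Psi(\mu_1^\bullet, \mu_2^\bullet; \gamma^\bullet); c) \,\|\, \mathcal{H}(\Psi(\mu_1, \mu_2; \gamma); c))$ as the following:

\[
\int_c^\infty \phi(x_1; \mu_1^\bullet, \sigma^2) \ln\!\left(\frac{\phi(x_1; \mu_1^\bullet, \sigma^2)}{\phi(x_1; \mu_1, \sigma^2)}\right) dx_1
+ \int_{-\infty}^c \left\{ \int_{-\infty}^\infty \phi(x_1; \mu_1^\bullet, \sigma^2)\, \phi(x_2; \mu_2^\bullet - \gamma^\bullet(x_1 - \mu_1^\bullet), \sigma^2) \ln\!\left[\frac{\phi(x_1; \mu_1^\bullet, \sigma^2)\, \phi(x_2; \mu_2^\bullet - \gamma^\bullet(x_1 - \mu_1^\bullet), \sigma^2)}{\phi(x_1; \mu_1, \sigma^2)\, \phi(x_2; \mu_2 - \gamma(x_1 - \mu_1), \sigma^2)}\right] dx_2 \right\} dx_1 .
\]

Performing rearrangements similar to those in the proof of Proposition 2 and using the closed-form expression of KL divergence between two Gaussian distributions, the above can be rewritten as

\[
\frac{(\mu_1 - \mu_1^\bullet)^2}{2\sigma^2} + \int_{-\infty}^c \phi(x_1; \mu_1^\bullet, \sigma^2) \cdot \frac{\left(\mu_2 - \gamma(x_1 - \mu_1) - \mu_2^\bullet + \gamma^\bullet(x_1 - \mu_1^\bullet)\right)^2}{2\sigma^2}\, dx_1 .
\]

Multiplying through by $\sigma^2$ and dropping terms not depending on $\mu_1, \mu_2, \gamma$, we get a simplified objective with the same minimizers:

\[
\xi(\mu_1, \mu_2, \gamma) = \frac{(\mu_1 - \mu_1^\bullet)^2}{2} + \int_{-\infty}^c \phi(x_1; \mu_1^\bullet, \sigma^2) \cdot \frac{1}{2} \cdot \left[\mu_2 - \gamma(x_1 - \mu_1) - \mu_2^\bullet + \gamma^\bullet(x_1 - \mu_1^\bullet)\right]^2 dx_1 .
\]

We have the partial derivatives by differentiating under the integral sign,

\[
\begin{aligned}
\frac{\partial \xi}{\partial \mu_2} &= \int_{-\infty}^c \phi(x_1; \mu_1^\bullet, \sigma^2) \cdot \left[\mu_2 - \gamma(x_1 - \mu_1) - \mu_2^\bullet + \gamma^\bullet(x_1 - \mu_1^\bullet)\right] dx_1, \\
\frac{\partial \xi}{\partial \mu_1} &= (\mu_1 - \mu_1^\bullet) + \gamma \int_{-\infty}^c \phi(x_1; \mu_1^\bullet, \sigma^2) \cdot \left[\mu_2 - \gamma(x_1 - \mu_1) - \mu_2^\bullet + \gamma^\bullet(x_1 - \mu_1^\bullet)\right] dx_1 = (\mu_1 - \mu_1^\bullet) + \gamma \frac{\partial \xi}{\partial \mu_2}, \\
\frac{\partial \xi}{\partial \gamma} &= - \int_{-\infty}^c \phi(x_1; \mu_1^\bullet, \sigma^2) \cdot [x_1 - \mu_1] \cdot \left[\mu_2 - \gamma(x_1 - \mu_1) - \mu_2^\bullet + \gamma^\bullet(x_1 - \mu_1^\bullet)\right] dx_1 .
\end{aligned}
\]

Suppose $(\mu_1^*, \mu_2^*, \gamma^*)$ is the minimum. By the first-order conditions for $\mu_1$ and $\mu_2$, we have:

\[
\frac{\partial \xi}{\partial \mu_1}(\mu_1^*, \mu_2^*, \gamma^*) = \frac{\partial \xi}{\partial \mu_2}(\mu_1^*, \mu_2^*, \gamma^*) = 0 \;\Rightarrow\; \mu_1^* = \mu_1^\bullet .
\]

Substituting $\mu_1^* = \mu_1^\bullet$ into the first-order condition for $\mu_2$,

\[
\frac{\partial \xi}{\partial \mu_2}(\mu_1^\bullet, \mu_2^*, \gamma^*) = 0 \;\Rightarrow\; \mu_2^* = \mu_2^\bullet + (\gamma^\bullet - \gamma^*) \cdot (\mu_1^\bullet - E[X_1 \mid X_1 \le c]) .
\]

It remains to show $\gamma^* = \tilde\gamma$. We have

\[
\frac{\partial \xi}{\partial \gamma}(\mu_1^*, \mu_2^*, \gamma^*) = - P[X_1 \le c] \cdot E\left[(X_1 - \mu_1^*) \cdot \left(\mu_2^* - \gamma^*(X_1 - \mu_1^*) - \mu_2^\bullet + \gamma^\bullet(X_1 - \mu_1^\bullet)\right) \mid X_1 \le c\right] .
\]

We rearrange the expectation term as:

\[
\begin{aligned}
& E\left[(X_1 - \mu_1^*) \cdot \left(\mu_2^* - \gamma^*(X_1 - \mu_1^*) - \mu_2^\bullet + \gamma^\bullet(X_1 - \mu_1^\bullet)\right) \mid X_1 \le c\right] \\
&\quad = E[(X_1 - \mu_1^*) \mid X_1 \le c] \cdot E\left[\left(\mu_2^* - \gamma^*(X_1 - \mu_1^*) - \mu_2^\bullet + \gamma^\bullet(X_1 - \mu_1^\bullet)\right) \mid X_1 \le c\right] \\
&\qquad + \mathrm{Cov}\left(X_1 - \mu_1^*,\; \mu_2^* - \gamma^*(X_1 - \mu_1^*) - \mu_2^\bullet + \gamma^\bullet(X_1 - \mu_1^\bullet) \mid X_1 \le c\right) .
\end{aligned}
\]

The first-order condition for $\mu_2$ implies $E[(\mu_2^* - \gamma^*(X_1 - \mu_1^*) - \mu_2^\bullet + \gamma^\bullet(X_1 - \mu_1^\bullet)) \mid X_1 \le c] = 0$ at the optimum $(\mu_1^*, \mu_2^*, \gamma^*)$. Also, we may drop terms without $X_1$ in the conditional covariance operator, and we get:

\[
\frac{\partial \xi}{\partial \gamma}(\mu_1^*, \mu_2^*, \gamma^*) = P[X_1 \le c] \cdot (\gamma^* - \gamma^\bullet) \cdot \mathrm{Cov}(X_1, X_1 \mid X_1 \le c) .
\]

We have $P[X_1 \le c] > 0$ and $\mathrm{Cov}(X_1, X_1 \mid X_1 \le c) > 0$, hence we conclude

\[
\frac{\partial \xi}{\partial \gamma}(\mu_1^*, \mu_2^*, \gamma^*)
\begin{cases}
> 0 & \text{for } \gamma^* > \gamma^\bullet, \\
= 0 & \text{for } \gamma^* = \gamma^\bullet, \\
< 0 & \text{for } \gamma^* < \gamma^\bullet .
\end{cases}
\]

In case that $\underline{\gamma} > \gamma^\bullet$, at the optimum we must have $\frac{\partial \xi}{\partial \gamma}(\mu_1^*, \mu_2^*, \gamma^*) > 0$. By the Karush-Kuhn-Tucker condition, this means the minimizer is $\gamma^* = \underline{\gamma}$. Conversely, when $\bar\gamma < \gamma^\bullet$, at the optimum we must have $\frac{\partial \xi}{\partial \gamma}(\mu_1^*, \mu_2^*, \gamma^*) < 0$. In that case, the minimizer is $\gamma^* = \bar\gamma$. So in both cases, $\gamma^* = \tilde\gamma$ as desired.

OA 1.16 Proof of Proposition 11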

Proof.

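I start with the expression for the KL divergence from $\mathcal{H}^\bullet(c)$ to $\mathcal{H}(\Psi(\mu, \mu; \gamma); c)$. As in the proof of Proposition 2, this expression can be written as

\[
\frac{(\mu - \mu^\bullet)^2}{2\sigma^2} + \int_{-\infty}^c \phi(x_1; \mu^\bullet, \sigma^2) \cdot \left[\frac{\sigma^2 + (\mu - \gamma(x_1 - \mu) - \mu^\bullet)^2}{2\sigma^2} - \frac{1}{2}\right] dx_1 .
\]

Multiplying through by $\sigma^2$ and dropping terms not depending on $\mu$, we get a simplified expression of the objective,

\[
\xi(\mu) := \frac{(\mu - \mu^\bullet)^2}{2} + \int_{-\infty}^c \phi(x_1; \mu^\bullet, \sigma^2) \cdot \left[\frac{(\mu - \gamma(x_1 - \mu) - \mu^\bullet)^2}{2}\right] dx_1 .
\]

Taking the first-order condition,

\[
\xi'(\mu) = (\mu - \mu^\bullet) + (1 + \gamma) \cdot \int_{-\infty}^c \phi(x_1; \mu^\bullet, \sigma^2) \cdot ((1 + \gamma)\mu - \gamma x_1 - \mu^\bullet)\, dx_1 .
\]

The term $\int_{-\infty}^c \phi(x_1; \mu^\bullet, \sigma^2) \cdot ((1 + \gamma)\mu - \gamma x_1 - \mu^\bullet)\, dx_1$ may be rewritten as $P[X_1 \le c] \cdot E[(1 + \gamma)\mu - \gamma X_1 - \mu^\bullet \mid X_1 \le c]$. Setting the first-order condition to 0 and using straightforward algebra,

\[
\mu^*_{\triangle}(c) = \frac{1}{1 + P[X_1 \le c] \cdot (1 + \gamma)^2}\, \mu^\bullet + \frac{P[X_1 \le c] \cdot (1 + \gamma)^2}{1 + P[X_1 \le c] \cdot (1 + \gamma)^2}\, \mu^\circ(c) .
\]

OA 2 Optimal-Stopping Problems with $L$ Periods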

OA 2.1 An $L$-Periods Model of the Gambler's Fallacy

In an optimal-stopping problem with $L$ periods, the agent observes a draw $x_\ell \in \mathbb{R}$ in each period $1 \le \ell \le L$. At the end of period $\ell$, the agent must decide between stopping and receiving a payoff $u_\ell(x_1, \dots, x_\ell)$ that depends on the profile of draws $(x_i)_{i=1}^{\ell}$ observed so far, or continuing into the next period. If the agent continues into period $L$ without stopping, then the payoff will be $u_L(x_1, \dots, x_L)$.

I first introduce notation for a class of joint distributions of the $L$ possible draws $(X_i)_{i=1}^L$, which extends the Gaussian case from Example 2 to multiple periods.

Definition OA.1.

Let $\sigma^2 > 0$. For $\mu = (\mu_i)_{i=1}^L$ and triangular array $\gamma = (\gamma_{i,j})_{2 \le i \le L,\, 1 \le j \le i-1}$ with each $\gamma_{i,j} \in \mathbb{R}$, let $\Psi(\mu; \gamma)$ denote the joint distribution of $(X_i)_{i=1}^L$ where $X_1 \sim N(\mu_1, \sigma^2)$ and, for all $i \ge 2$ and $(x_j)_{j=1}^{i-1} \in \mathbb{R}^{i-1}$,

\[
X_i \mid (X_1 = x_1, \dots, X_{i-1} = x_{i-1}) \sim N\!\left(\mu_i - \sum_{j=1}^{i-1} \gamma_{i,j} \cdot (x_j - \mu_j),\; \sigma^2\right) .
\]

Under $\Psi(\mu; \gamma)$, $(X_i)_{i=1}^L$ are jointly Gaussian, such that the conditional mean of $X_i$ given the previous draws $X_1 = x_1, \dots, X_{i-1} = x_{i-1}$ depends linearly on these realizations. I consider agents who entertain a set of feasible models, $\{\Psi(\mu; \gamma) : \mu \in \mathbb{R}^L\}$ for a fixed array $\gamma$ where each $\gamma_{i,j} > 0$. The positive $\gamma_{i,j}$ capture the gambler's fallacy, as higher realizations of earlier draws lower the agent's prediction of later draws. The larger the $\gamma_{i,j}$, the more that the agent's prediction of $X_i$ depends on the realization of $X_j$. Agents hold a dogmatic belief in the correlation structure between $(X_i)_{i=1}^L$, but can flexibly estimate $(\mu_i)_{i=1}^L$, the fundamentals of the environment. Objectively, $(X_i)_{i=1}^L$ are independent, so the true joint distribution is $\Psi^\bullet = \Psi(\mu^\bullet; \mathbf{0})$ for some $(\mu_i^\bullet)_{i=1}^L$.

Footnote: An equivalent description of the model $\Psi(\mu; \gamma)$ is to consider a set of $L$ independent Gaussian random variables $Z_i \sim N(\mu_i, \sigma^2)$ for $1 \le i \le L$. Let $X_1 = Z_1$ and iteratively define $X_i = Z_i - \sum_{j=1}^{i-1} \gamma_{i,j}(X_j - \mu_j)$. Using induction, one can show that every $X_i$ is a linear function of the $Z_i$'s, so they are jointly Gaussian.

A useful functional form to keep in mind is $\gamma_{i,j} = \alpha \cdot \delta^{i-j-1}$ for $\alpha > 0$, $0 \le \delta \le 1$, which corresponds to Rabin and Vayanos (2010)'s specification of the gambler's fallacy in multiple periods. Here, $\alpha$ relates to the severity of the bias and $\delta$ captures how quickly the influence of past observations decays in predicting future draws.

OA 2.2 Inference from Censored Datasets in $L$ Periods

In general, a stopping strategy in an optimal-stopping problem over $L$ periods is a set of functions $S_i : \mathbb{R}^i \to \{\text{Stop}, \text{Continue}\}$ for $1 \le i \le L-1$, where $S_i(x_1, \dots, x_i)$ maps the realizations of the first $i$ draws to a stopping decision. I consider stopping strategies where $S_i$ is a cutoff rule in $x_i$ after each partial history $(x_1, \dots, x_{i-1})$, that is, there exist $(c_i)_{i=1}^{L-1}$ with $c_1 \in \mathbb{R}$ and, for $i \ge 2$, $c_i(x_1, \dots, x_{i-1}) \in \mathbb{R}$ for every $(x_1, \dots, x_{i-1}) \in \mathbb{R}^{i-1}$, so that the agent stops after $(x_1, \dots, x_i)$ if and only if $x_i \ge c_i(x_1, \dots, x_{i-1})$. A stopping strategy with stopping regions characterized by a profile of cutoff rules $c = (c_i)_{i=1}^{L-1}$ will be abbreviated as $S_c$.

For feasible model $\Psi$ and cutoff rule $S_c$, let $\mathcal{H}(\Psi; S_c)$ represent the distribution of histories when applying rule $S_c$ to draws $(X_i) \sim \Psi$. More precisely, consider a procedure where $X_1, X_2, \dots, X_L$ is drawn according to $\Psi$ and revealed one at a time. At the earliest $1 \le \bar{i} \le L-1$ such that $X_{\bar{i}} \ge c_{\bar{i}}(X_1, \dots, X_{\bar{i}-1})$, the process stops and the history records $(X_1, \dots, X_{\bar{i}}, \emptyset, \dots, \emptyset)$, with $L - \bar{i}$ instances of the censoring indicator $\emptyset$ replacing the unobserved subvector $(X_{\bar{i}+1}, \dots, X_L)$. If no such $\bar{i}$ exists, then the history records the entire profile of draws, $(X_1, \dots, X_L)$. The distribution of histories generated this way is denoted $\mathcal{H}(\Psi; S_c)$.

Definition OA.2.

For cutoff strategy $S_c$ and fundamentals $\mu$, the KL divergence between the objective distribution of histories and the predicted distribution under censoring is the sum of $L$ integrals,

\[
D_{KL}(\mathcal{H}(\Psi^\bullet; S_c) \,\|\, \mathcal{H}(\Psi(\mu; \gamma); S_c)) := \sum_{i=1}^L I_i ,
\]

where

\[
I_1 = \int_{c_1}^\infty \phi(x_1; \mu_1^\bullet, \sigma^2) \ln\!\left(\frac{\phi(x_1; \mu_1^\bullet, \sigma^2)}{\phi(x_1; \mu_1, \sigma^2)}\right) dx_1 ,
\]

and for $2 \le i \le L-1$, integral $I_i$ is

\[
\int_{-\infty}^{c_1} \cdots \int_{-\infty}^{c_{i-1}(x_1, \dots, x_{i-2})} \int_{c_i(x_1, \dots, x_{i-1})}^{\infty} \prod_{k=1}^{i} \phi(x_k; \mu_k^\bullet, \sigma^2) \ln\!\left(\frac{\prod_{k=1}^{i} \phi(x_k; \mu_k^\bullet, \sigma^2)}{\prod_{k=1}^{i} \phi\big(x_k; \mu_k - \sum_{j=1}^{k-1} \gamma_{k,j} \cdot (x_j - \mu_j), \sigma^2\big)}\right) dx_i \cdots dx_1 .
\]

$I_L$ is given by

\[
\int_{-\infty}^{c_1} \cdots \int_{-\infty}^{c_{L-1}(x_1, \dots, x_{L-2})} \int_{-\infty}^{\infty} \prod_{k=1}^{L} \phi(x_k; \mu_k^\bullet, \sigma^2) \ln\!\left(\frac{\prod_{k=1}^{L} \phi(x_k; \mu_k^\bullet, \sigma^2)}{\prod_{k=1}^{L} \phi\big(x_k; \mu_k - \sum_{j=1}^{k-1} \gamma_{k,j} \cdot (x_j - \mu_j), \sigma^2\big)}\right) dx_L \cdots dx_1 .
\]

To interpret, consider a history $h = (x_1, \dots, x_i, \emptyset, \dots, \emptyset)$ where $x_k < c_k(x_1, \dots, x_{k-1})$ for all $k \le i-1$ and $x_i \ge c_i(x_1, \dots, x_{i-1})$. This history is possible under the stopping strategy $S_c$. It has a likelihood of $\prod_{k=1}^{i} \phi(x_k; \mu_k^\bullet, \sigma^2)$ under $\Psi^\bullet$ and a likelihood of $\prod_{k=1}^{i} \phi(x_k; \mu_k - \sum_{j=1}^{k-1} \gamma_{k,j} \cdot (x_j - \mu_j), \sigma^2)$ under $\Psi(\mu; \gamma)$. So, the integral $I_i$ calculates the contribution of all possible histories of length $i$ to the KL divergence from $\mathcal{H}(\Psi(\mu; \gamma); S_c)$ to $\mathcal{H}(\Psi^\bullet; S_c)$. In the case of $L = 2$, this definition reduces to the Gaussian case of Definition 5, the KL divergence in the two-periods baseline model, with $\gamma = \gamma_{2,1}$ and $c_1 \in \mathbb{R}$ as the censoring threshold.

The KL-divergence minimizers

\[
\operatorname*{arg\,min}_{\mu \in \mathbb{R}^L} D_{KL}(\mathcal{H}(\Psi^\bullet; S_c) \,\|\, \mathcal{H}(\Psi(\mu; \gamma); S_c))
\]

are the pseudo-true fundamentals with respect to stopping strategy $S_c$. The next proposition gives an explicit characterization of them.
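The censoring procedure and the idea of a best-fitting misspecified model can be illustrated with a small simulation. The sketch below is not taken from the paper: the function names, the seed, the sample size, and the parameter values ($\mu^\bullet = (0, 0)$, $\sigma = 1$, $c_1 = 0$, $\gamma = 0.5$) are all illustrative assumptions, and the cutoffs are assumed to be history-independent constants. It draws censored histories from the true independent Gaussian model and then, for $L = 2$, computes the fundamentals in the feasible family that maximize the Gaussian likelihood of the sample, which is the sample analogue of minimizing the KL divergence above. In this special case the maximizer has a closed form: $\hat\mu_1$ is the mean of all first draws, and $\hat\mu_2$ is the mean of observed second draws plus $\gamma$ times the gap between the mean first draw among continued histories and $\hat\mu_1$.

```python
import math
import random
from statistics import fmean


def simulate_history(mu_true, sigma, cutoffs, rng):
    """Draw one censored history under the true model of independent Gaussian draws.

    cutoffs[i] is the (history-independent) acceptance threshold in period i + 1,
    for periods 1..L-1. The agent stops at the first draw x >= cutoff, and every
    later draw is recorded as None, playing the role of the censoring indicator.
    """
    L = len(mu_true)
    history = [None] * L
    for i in range(L):
        x = rng.gauss(mu_true[i], sigma)
        history[i] = x
        if i < L - 1 and x >= cutoffs[i]:
            break  # stopped: remaining entries stay censored
    return history


def fit_two_period(histories, gamma):
    """Best-fitting (mu1, mu2) within the feasible models for L = 2.

    Maximizing the Gaussian likelihood of the censored sample under
    Psi(mu1, mu2; gamma) is a least-squares problem with this closed form.
    """
    continued = [h for h in histories if h[1] is not None]
    mu1_hat = fmean([h[0] for h in histories])
    mean_x1_cont = fmean([h[0] for h in continued])
    mean_x2_cont = fmean([h[1] for h in continued])
    mu2_hat = mean_x2_cont + gamma * (mean_x1_cont - mu1_hat)
    return mu1_hat, mu2_hat


def truncated_mean(mu, sigma, c):
    """E[X | X <= c] for X ~ N(mu, sigma^2)."""
    z = (c - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return mu - sigma * pdf / cdf


rng = random.Random(0)  # arbitrary seed, used only for reproducibility
data = [simulate_history([0.0, 0.0], 1.0, [0.0], rng) for _ in range(50_000)]
mu1_hat, mu2_hat = fit_two_period(data, gamma=0.5)
# Large-sample benchmark: mu2_true - gamma * (mu1_true - E[X1 | X1 <= c1])
mu2_limit = 0.0 - 0.5 * (0.0 - truncated_mean(0.0, 1.0, 0.0))
print(mu1_hat, mu2_hat, mu2_limit)
```

Under these assumed values the estimate of the second-period fundamental should settle near the benchmark of roughly $-0.4$, below the true value of 0, while the first-period estimate stays near 0. The gap arises because second draws are only observed after below-cutoff first draws, which the biased model expects to be followed by reversals.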

Proposition OA.1.

Let stopping strategy $S_c$ be given. For each $i \ge 1$, let $R_i$ represent the region $\{(x_1, \dots, x_i) : x_1 < c_1, x_2 < c_2(x_1), \dots, x_i < c_i(x_1, \dots, x_{i-1})\} \subseteq \mathbb{R}^i$. The pseudo-true fundamentals with respect to $S_c$ are $\mu_1^* = \mu_1^\bullet$ and, iteratively,

\[
\mu_i^* = \mu_i^\bullet - \sum_{j=1}^{i-1} \gamma_{i,j} \cdot \left(\mu_j^* - E_{\Psi^\bullet}[X_j \mid (X_k)_{k=1}^{i-1} \in R_{i-1}]\right) .
\]

The expression for $\mu_i^*$ in the general $L$-periods setting resembles the expression for $\mu_2^*$ in the two-period setting. Relative to the truth $\mu_i^\bullet$, the estimate $\mu_i^*$ is distorted by the fact that $X_i$ is only observed when previous draws $(X_1, \dots, X_{i-1})$ fall into the continuation region $R_{i-1} \subseteq \mathbb{R}^{i-1}$ associated with $S_c$. The agent uses this censored empirical distribution of $(X_1, \dots, X_{i-1}, X_i)$ to infer the period-$i$ fundamental, under a dogmatic belief about the correlation structure between the draws given by $\gamma$. Importantly, whether a certain realization $X_j$ for $j < i$ should be judged as below-average (and thus predict a higher $X_i$) or above-average (and thus predict a lower $X_i$) depends on the agent's belief about the period-$j$ fundamental, $\mu_j^*$, which gives the iterative structure of the expression for $\mu_i^*$.

The proof of this result follows two steps. First, recall that $D_{KL}(\mathcal{H}(\Psi^\bullet; S_c) \,\|\, \mathcal{H}(\Psi(\mu; \gamma); S_c))$ is defined as the sum $\sum_{i=1}^L I_i$, where $I_i$ is the KL-divergence contribution from histories with length $i$. I rewrite this expression as the sum of $L$ different integrals, $\sum_{i=1}^L J_i$, where $J_i$ is the KL-divergence contribution from histories containing $X_i$. So, $J_i$ is a function of $\mu_1, \dots, \mu_i$. The second step is similar to deriving the explicit expressions of pseudo-true fundamentals for Example 2, where I show $\frac{\partial J_i}{\partial \mu_j}$ is a linear multiple of $\frac{\partial J_i}{\partial \mu_i}$ whenever $j < i$. The first-order condition at $\mu^*$ allows for a telescoping rearrangement, yielding $\frac{\partial J_i}{\partial \mu_i}(\mu^*) = 0$ for every $i$. The proposition readily follows.

Let $J_1 = \frac{(\mu_1^\bullet - \mu_1)^2}{2\sigma^2}$ and for $i \ge 2$, let $J_i$ be

\[
\int_{-\infty}^{c_1} \cdots \int_{-\infty}^{c_{i-1}(x_1, \dots, x_{i-2})} \prod_{j=1}^{i-1} \phi(x_j; \mu_j^\bullet, \sigma^2) \cdot \left[\frac{\left(\mu_i^\bullet - \mu_i + \sum_{j=1}^{i-1} \gamma_{i,j} \cdot (x_j - \mu_j)\right)^2}{2\sigma^2}\right] dx_{i-1} \cdots dx_1 .
\]

The expression in square brackets is the KL divergence from the agent's feasible model for $X_i \mid (X_1 = x_1, \dots, X_{i-1} = x_{i-1})$ to the true distribution of $X_i$, under fundamentals $\mu_1, \dots, \mu_i$. So, the integral $J_i$ is a weighted average of this divergence, taken across different realizations of previous draws $(x_1, \dots, x_{i-1})$ with weights given by the true likelihood of observing such a sequence of draws in periods 1 through $i-1$ under $S_c$. Note that for each $i$, $J_i$ (and $I_i$) depends on $\mu_1, \dots, \mu_i$.

I first develop an alternative expression of $D_{KL}(\mathcal{H}(\Psi^\bullet; S_c) \,\|\, \mathcal{H}(\Psi(\mu; \gamma); S_c))$ as the sum of $J_i$.

Lemma OA.2. $\sum_{i=1}^L I_i = \sum_{i=1}^L J_i$.

Proof. Let $\tilde{I}_i$ be a slightly modified version of $I_i$, where the inner-most integral over $x_i$ has the range $(-\infty, \infty)$, so $\tilde{I}_i$ is

\[
\int_{-\infty}^{c_1} \cdots \int_{-\infty}^{c_{i-1}(x_1, \dots, x_{i-2})} \int_{-\infty}^{\infty} \prod_{k=1}^{i} \phi(x_k; \mu_k^\bullet, \sigma^2) \ln\!\left(\frac{\prod_{k=1}^{i} \phi(x_k; \mu_k^\bullet, \sigma^2)}{\prod_{k=1}^{i} \phi\big(x_k; \mu_k - \sum_{j=1}^{k-1} \gamma_{k,j} \cdot (x_j - \mu_j), \sigma^2\big)}\right) dx_i \cdots dx_1 .
\]

Observe that $\tilde{I}_L = I_L$. Inductively I will show $\tilde{I}_{L'} + \sum_{i=1}^{L'-1} I_i = \sum_{i=1}^{L'} J_i$ for every $1 \le L' \le L$. When $L' = 1$, this just says $\tilde{I}_1 = J_1$, which is true by definition. Now suppose the statement holds for some $L' = S \le L - 1$. I show it also holds when $L' = S + 1$. We have

\[
\tilde{I}_{S+1} + \sum_{i=1}^{S} I_i = \tilde{I}_{S+1} + (I_S - \tilde{I}_S) + \left(\tilde{I}_S + \sum_{i=1}^{S-1} I_i\right) = \tilde{I}_{S+1} + (I_S - \tilde{I}_S) + \sum_{i=1}^{S} J_i ,
\]

where the last equality comes from the inductive hypothesis. Since $I_S$ and $\tilde{I}_S$ simply differ in terms of the bounds of the inner-most integral, $I_S - \tilde{I}_S$ is

\[
- \int_{-\infty}^{c_1} \cdots \int_{-\infty}^{c_S(x_1, \dots, x_{S-1})} \prod_{k=1}^{S} \phi(x_k; \mu_k^\bullet, \sigma^2) \cdot \ln\!\left(\frac{\prod_{k=1}^{S} \phi(x_k; \mu_k^\bullet, \sigma^2)}{\prod_{k=1}^{S} \phi\big(x_k; \mu_k - \sum_{j=1}^{k-1} \gamma_{k,j} \cdot (x_j - \mu_j), \sigma^2\big)}\right) dx_S \cdots dx_1 .
\]

Next, decompose the $\ln\!\left(\frac{\prod_{k=1}^{S+1} \phi(x_k; \mu_k^\bullet, \sigma^2)}{\prod_{k=1}^{S+1} \phi(x_k; \mu_k - \sum_{j=1}^{k-1} \gamma_{k,j} \cdot (x_j - \mu_j), \sigma^2)}\right)$ term in the integrand of $\tilde{I}_{S+1}$ into the sum

\[
\ln\!\left(\frac{\prod_{k=1}^{S} \phi(x_k; \mu_k^\bullet, \sigma^2)}{\prod_{k=1}^{S} \phi\big(x_k; \mu_k - \sum_{j=1}^{k-1} \gamma_{k,j} \cdot (x_j - \mu_j), \sigma^2\big)}\right) + \ln\!\left(\frac{\phi(x_{S+1}; \mu_{S+1}^\bullet, \sigma^2)}{\phi\big(x_{S+1}; \mu_{S+1} - \sum_{j=1}^{S} \gamma_{S+1,j} \cdot (x_j - \mu_j), \sigma^2\big)}\right) .
\]

We know that

\[
\begin{aligned}
& \int_{-\infty}^{c_1} \cdots \int_{-\infty}^{c_S(..)} \int_{-\infty}^{\infty} \prod_{k=1}^{S+1} \phi(x_k; \mu_k^\bullet, \sigma^2) \ln\!\left(\frac{\prod_{k=1}^{S} \phi(x_k; \mu_k^\bullet, \sigma^2)}{\prod_{k=1}^{S} \phi\big(x_k; \mu_k - \sum_{j=1}^{k-1} \gamma_{k,j} \cdot (x_j - \mu_j), \sigma^2\big)}\right) dx_{S+1} \cdots dx_1 \\
&= \int_{-\infty}^{c_1} \cdots \int_{-\infty}^{c_S(..)} \prod_{k=1}^{S} \phi(x_k; \mu_k^\bullet, \sigma^2) \left\{\int_{-\infty}^{\infty} \phi(x_{S+1}; \mu_{S+1}^\bullet, \sigma^2) \cdot \ln\!\left(\frac{\prod_{k=1}^{S} \phi(x_k; \mu_k^\bullet, \sigma^2)}{\prod_{k=1}^{S} \phi\big(x_k; \mu_k - \sum_{j=1}^{k-1} \gamma_{k,j} \cdot (x_j - \mu_j), \sigma^2\big)}\right) dx_{S+1}\right\} dx_S \cdots dx_1 \\
&= \int_{-\infty}^{c_1} \cdots \int_{-\infty}^{c_S(..)} \prod_{k=1}^{S} \phi(x_k; \mu_k^\bullet, \sigma^2) \cdot \ln\!\left(\frac{\prod_{k=1}^{S} \phi(x_k; \mu_k^\bullet, \sigma^2)}{\prod_{k=1}^{S} \phi\big(x_k; \mu_k - \sum_{j=1}^{k-1} \gamma_{k,j} \cdot (x_j - \mu_j), \sigma^2\big)}\right) dx_S \cdots dx_1 \\
&= -(I_S - \tilde{I}_S) ,
\end{aligned}
\]

where $c_S(..)$ abbreviates the bound of integration $c_S(x_1, \dots, x_{S-1})$. At the same time,

\[
\begin{aligned}
& \int_{-\infty}^{c_1} \cdots \int_{-\infty}^{c_S(..)} \int_{-\infty}^{\infty} \prod_{k=1}^{S+1} \phi(x_k; \mu_k^\bullet, \sigma^2) \ln\!\left(\frac{\phi(x_{S+1}; \mu_{S+1}^\bullet, \sigma^2)}{\phi\big(x_{S+1}; \mu_{S+1} - \sum_{j=1}^{S} \gamma_{S+1,j} \cdot (x_j - \mu_j), \sigma^2\big)}\right) dx_{S+1} \cdots dx_1 \\
&= \int_{-\infty}^{c_1} \cdots \int_{-\infty}^{c_S(..)} \prod_{k=1}^{S} \phi(x_k; \mu_k^\bullet, \sigma^2) \left\{\int_{-\infty}^{\infty} \phi(x_{S+1}; \mu_{S+1}^\bullet, \sigma^2) \ln\!\left(\frac{\phi(x_{S+1}; \mu_{S+1}^\bullet, \sigma^2)}{\phi\big(x_{S+1}; \mu_{S+1} - \sum_{j=1}^{S} \gamma_{S+1,j} \cdot (x_j - \mu_j), \sigma^2\big)}\right) dx_{S+1}\right\} dx_S \cdots dx_1 \\
&= \int_{-\infty}^{c_1} \cdots \int_{-\infty}^{c_S(..)} \prod_{k=1}^{S} \phi(x_k; \mu_k^\bullet, \sigma^2)\, D_{KL}\!\left(N(\mu_{S+1}^\bullet, \sigma^2) \,\Big\|\, N\Big(\mu_{S+1} - \sum_{j=1}^{S} \gamma_{S+1,j} \cdot (x_j - \mu_j), \sigma^2\Big)\right) dx_S \cdots dx_1 \\
&= \int_{-\infty}^{c_1} \cdots \int_{-\infty}^{c_S(..)} \prod_{k=1}^{S} \phi(x_k; \mu_k^\bullet, \sigma^2)\, \frac{\left(\mu_{S+1}^\bullet - \mu_{S+1} + \sum_{j=1}^{S} \gamma_{S+1,j} \cdot (x_j - \mu_j)\right)^2}{2\sigma^2}\, dx_S \cdots dx_1 \\
&= J_{S+1} ,
\end{aligned}
\]

where we used the closed-form expression of the KL divergence between two Gaussian distributions,

\[
D_{KL}\!\left(N(\mu_{S+1}^\bullet, \sigma^2) \,\Big\|\, N\Big(\mu_{S+1} - \sum_{j=1}^{S} \gamma_{S+1,j} \cdot (x_j - \mu_j), \sigma^2\Big)\right) = \frac{\left(\mu_{S+1}^\bullet - \mu_{S+1} + \sum_{j=1}^{S} \gamma_{S+1,j} \cdot (x_j - \mu_j)\right)^2}{2\sigma^2} .
\]

So by induction, $\tilde{I}_L + \sum_{i=1}^{L-1} I_i = \sum_{i=1}^{L} J_i$. As $\tilde{I}_L = I_L$, we are done.

Using Lemma OA.2, I can now give the proof of Proposition OA.1.

Proof.

Abbreviate $D_{KL}(\mathcal{H}(\Psi^\bullet; S_c) \,\|\, \mathcal{H}(\Psi(\mu; \gamma); S_c))$ as $\xi(\mu_1, \dots, \mu_L)$. By Lemma OA.2, $\xi(\mu_1, \dots, \mu_L) = \sum_{i=1}^{L} J_i(\mu_1, \dots, \mu_i)$. We show that the recursively defined parameters are the only ones satisfying the first-order condition, $\frac{\partial \xi}{\partial \mu_i}(\hat\mu_1, \dots, \hat\mu_L) = 0$ for each $i$.

In the integrand for $J_i$, each $\mu_j$ where $1 \le j \le i$ appears once in the term $\frac{(\mu_i^\bullet - \mu_i + \sum_{j=1}^{i-1} \gamma_{i,j} \cdot (x_j - \mu_j))^2}{2\sigma^2}$. For any $(x_1, \dots, x_{i-1})$, the partial derivative of this term with respect to $\mu_j$ for $j < i$ is $\gamma_{i,j}$ times its partial derivative with respect to $\mu_i$. That is, at any values of $\hat\mu_1, \dots, \hat\mu_i$, we get

\[
\frac{\partial J_i}{\partial \mu_j}(\hat\mu_1, \dots, \hat\mu_i) = \gamma_{i,j} \frac{\partial J_i}{\partial \mu_i}(\hat\mu_1, \dots, \hat\mu_i)
\]

for each $1 \le j < i$.

At any $(\mu_1^*, \dots, \mu_L^*)$ satisfying the first-order condition for $\mu_L$, we must have

\[
\frac{\partial \xi}{\partial \mu_L}(\mu_1^*, \dots, \mu_L^*) = \frac{\partial J_L}{\partial \mu_L}(\mu_1^*, \dots, \mu_L^*) = 0 .
\]

By the above, this also implies for each $1 \le j < L$, either $\frac{\partial J_L}{\partial \mu_j}(\mu_1^*, \dots, \mu_L^*) = 0$, or $\gamma_{L,j} = 0$ (in which case $J_L$ is not actually a function of $\mu_j$ and $\frac{\partial J_L}{\partial \mu_j} = 0$ everywhere). Either way, this shows for the case of $j = L - 1$,

\[
\frac{\partial \xi}{\partial \mu_{L-1}}(\mu_1^*, \dots, \mu_L^*) = \frac{\partial J_L}{\partial \mu_{L-1}}(\mu_1^*, \dots, \mu_L^*) + \frac{\partial J_{L-1}}{\partial \mu_{L-1}}(\mu_1^*, \dots, \mu_{L-1}^*) = \frac{\partial J_{L-1}}{\partial \mu_{L-1}}(\mu_1^*, \dots, \mu_{L-1}^*) .
\]

If $(\mu_1^*, \dots, \mu_L^*)$ also satisfies the first-order condition for $\mu_{L-1}$, then $\frac{\partial J_{L-1}}{\partial \mu_{L-1}}(\mu_1^*, \dots, \mu_{L-1}^*) = 0$. Continuing this telescoping argument, we conclude that if $(\mu_1^*, \dots, \mu_L^*)$ satisfies the first-order condition for all $\mu_i$, $1 \le i \le L$, then $\frac{\partial J_i}{\partial \mu_i}(\mu_1^*, \dots, \mu_i^*) = 0$ for every $1 \le i \le L$.

Given the form of $J_1$, it is clear that $\frac{\partial J_1}{\partial \mu_1}(\mu_1^*) = 0$ implies $\mu_1^* = \mu_1^\bullet$. Also,

\[
\frac{\partial J_i}{\partial \mu_i}(\mu_1^*, \dots, \mu_i^*) = - \int_{-\infty}^{c_1} \cdots \int_{-\infty}^{c_{i-1}(x_1, \dots, x_{i-2})} \prod_{j=1}^{i-1} \phi(x_j; \mu_j^\bullet, \sigma^2) \left[\frac{\mu_i^\bullet - \mu_i^* + \sum_{j=1}^{i-1} \gamma_{i,j} \cdot (x_j - \mu_j^*)}{\sigma^2}\right] dx_{i-1} \cdots dx_1 .
\]

Using the fact that $\frac{\partial J_i}{\partial \mu_i}(\mu_1^*, \dots, \mu_i^*) = 0$, we multiply the integrand by the constant

\[
-\sigma^2 \cdot \left(\int_{-\infty}^{c_1} \cdots \int_{-\infty}^{c_{i-1}(x_1, \dots, x_{i-2})} \prod_{j=1}^{i-1} \phi(x_j; \mu_j^\bullet, \sigma^2)\, dx_{i-1} \cdots dx_1\right)^{-1}
\]

and get

\[
E_{\Psi^\bullet}\!\left[\mu_i^\bullet - \mu_i^* + \sum_{j=1}^{i-1} \gamma_{i,j} \cdot (X_j - \mu_j^*) \,\Big|\, (X_k)_{k=1}^{i-1} \in R_{i-1}\right] = 0 .
\]

Rearranging, we have $\mu_i^* = \mu_i^\bullet - \sum_{j=1}^{i-1} \gamma_{i,j} \cdot (\mu_j^* - E_{\Psi^\bullet}[X_j \mid (X_k)_{k=1}^{i-1} \in R_{i-1}])$ as desired. This means the only $(\mu_1^*, \dots, \mu_L^*)$ satisfying the first-order condition for minimizing KL divergence is the one iteratively given in this proposition.

Now I turn to a special class of cutoff-based stopping rules where $c_k$ is independent of history. So, a stopping rule of this kind $S_c$ can be viewed simply as a list of $L$ constants, $c_1, \dots, c_L \in \mathbb{R}$, such that the agent stops after the draw $X_\ell = x_\ell$ if and only if $x_\ell \ge c_\ell$. I show that the expression for the pseudo-true fundamentals greatly simplifies and admits a path-counting interpretation.

Definition OA.3.

For $1 \le j < i \le L$, a path $p$ from $i$ to $j$ is a sequence of pairs $p = ((i_0, i_1), \dots, (i_{M-1}, i_M))$ with $M \ge 1$, $i_0 = i$, $i_M = j$, and $i_{m+1} < i_m$ for all $m = 0, 1, \dots, M-1$. The length of $p$ is $M$. The weight of $p$ is $W(p) := \prod_{0 \le m \le M-1} (-\gamma_{i_m, i_{m+1}})$. Denote the set of all paths from $i$ to $j$ as $P[i \to j]$.

That is, we may imagine a network with $L$ nodes, one per period of the optimal-stopping problem. There is a directed edge with weight $-\gamma_{i,j}$ for all pairs $i > j$. A path from $i$ to $j$ is a concatenation of edges, starting with $i$ and ending with $j$. Its weight is the product of the weights of all the edges used.

The next proposition differs from Proposition OA.1 in that the expression for the pseudo-true fundamental $\mu_i^*$ does not involve other pseudo-true fundamentals $\mu_j^*$. It shows that the distortion of $\mu_i^*$ from the true value $\mu_i^\bullet$ depends on the terms $\mu_j^\bullet - E_{\Psi^\bullet}[X_j \mid X_j \le c_j]$ and the total weight of the paths from $i$ to $j$ in the network that $\gamma$ defines.

Proposition OA.2.

For stopping strategy $S_c = (c_1, \dots, c_L) \in \mathbb{R}^L$, the pseudo-true fundamentals are given by

\[
\mu_i^* = \mu_i^\bullet + \sum_{j=1}^{i-1} \left(\sum_{p \in P[i \to j]} W(p)\right) \cdot \left(\mu_j^\bullet - E[X_j \mid X_j \le c_j]\right) .
\]

Proof.

This clearly holds for $i = 1$. By induction assume this holds for all $i \le K$ for some $K \le L - 1$. I show that this also holds for $i = K + 1$. From Proposition OA.1,

\[
\mu_i^* = \mu_i^\bullet - \sum_{j=1}^{i-1} \gamma_{i,j} \cdot \left(\mu_j^* - E_{\Psi^\bullet}[X_j \mid (X_k)_{k=1}^{i-1} \in R_{i-1}]\right) .
\]

The continuation region $R_{i-1}$ is the rectangle $(-\infty, c_1) \times \dots \times (-\infty, c_{i-1}) \subseteq \mathbb{R}^{i-1}$. As $(X_1, \dots, X_{i-1})$ are objectively independent, the events $\{X_k \le c_k\}$ for $k \ne j$ are independent of $X_j$, so the expression simplifies to

\[
\mu_i^* = \mu_i^\bullet - \sum_{j=1}^{i-1} \gamma_{i,j} \cdot \left(\mu_j^* - E_{\Psi^\bullet}[X_j \mid X_j \le c_j]\right) .
\]

Setting $i = K + 1$ and substituting the inductive hypothesis for $\mu_j^*$ for $1 \le j \le i - 1$,

\[
\begin{aligned}
\mu_{K+1}^* &= \mu_{K+1}^\bullet - \sum_{j=1}^{K} \gamma_{K+1,j} \cdot \left(\mu_j^\bullet - E_{\Psi^\bullet}[X_j \mid X_j \le c_j]\right) + \sum_{j=1}^{K} -\gamma_{K+1,j} \cdot \left(\sum_{k=1}^{j-1} \left(\sum_{p \in P[j \to k]} W(p)\right) \cdot \left(\mu_k^\bullet - E_{\Psi^\bullet}[X_k \mid X_k \le c_k]\right)\right) \\
&= \mu_{K+1}^\bullet + \sum_{j=1}^{K} \left[(-\gamma_{K+1,j}) + \sum_{k=j+1}^{K} -\gamma_{K+1,k} \cdot \left(\sum_{p \in P[k \to j]} W(p)\right)\right] \cdot \left(\mu_j^\bullet - E_{\Psi^\bullet}[X_j \mid X_j \le c_j]\right) .
\end{aligned}
\]

Paths in $P[K+1 \to j]$ come in two types. The first type is the direct path consisting of just one edge $(K+1, j)$, with weight $-\gamma_{K+1,j}$. The second type consists of the indirect paths $p = ((K+1, k), p')$ where $p' \in P[k \to j]$. We have $W(p) = -\gamma_{K+1,k} \cdot W(p')$. We therefore see that, for each $j$, the expression $\left[(-\gamma_{K+1,j}) + \sum_{k=j+1}^{K} -\gamma_{K+1,k} \cdot \left(\sum_{p \in P[k \to j]} W(p)\right)\right]$ in fact gives the sum of weights for all paths in $P[K+1 \to j]$. So, we have shown that the claim holds also for $i = K + 1$. By induction it holds for all $1 \le i \le L$.

As a corollary, suppose $L \ge 3$ and $\gamma$ has the Rabin and Vayanos (2010) functional form of $\gamma_{i,j} = \alpha \cdot \delta^{i-j-1}$ for $\alpha > 0$, $0 \le \delta \le 1$. I show that all pseudo-true fundamentals are too pessimistic in every dataset censored with $S_c = (c_1, \dots, c_L) \in \mathbb{R}^L$ if and only if $\delta > \alpha$. The idea is that the influence of the gambler's fallacy psychology must not decay "too quickly" relative to the influence of the most recent observation. This condition is satisfied in all the calibration exercises in Rabin and Vayanos (2010) and in the structural estimations of Benjamin, Moore, and Rabin (2017). The result shows the over-pessimism from the 2-periods model extends into the $L$-periods model for history-independent stopping rules, provided the regularity condition on the parametrization of the $L$-periods gambler's fallacy holds.

Corollary OA.1.

Suppose $L \ge 3$ and $\gamma_{i,j} = \alpha \cdot \delta^{i-j-1}$ for $\alpha > 0$, $0 \le \delta \le 1$. If $\delta > \alpha$, then for all stopping strategies $S_c = (c_1, \dots, c_L) \in \mathbb{R}^L$, the pseudo-true fundamentals satisfy $\mu_i^* < \mu_i^\bullet$ for all $i \ge 2$. If $\delta < \alpha$, then there exists a stopping strategy $S_c = (c_1, \dots, c_L) \in \mathbb{R}^L$ such that $\mu_i^* > \mu_i^\bullet$ for at least one $i$.

To understand the intuition, consider an example that violates the condition of the corollary, with $\alpha > 0$ and $\delta = 0$, so that $\gamma_{2,1} = \alpha$, $\gamma_{3,2} = \alpha$, and $\gamma_{3,1} = 0$. The agent expects reversals between the pairs $(X_1, X_2)$ and $(X_2, X_3)$, but his expectation for $X_3 \mid (X_1 = x_1, X_2 = x_2)$ does not vary with $x_1$. By the same logic as the two-periods censoring effect, inference about the second-period fundamental $\mu_2^*$ decreases as $c_1$ decreases, with $\lim_{c_1 \to -\infty} \mu_2^*(c_1) = -\infty$. This has an important indirect effect on $\mu_3^*$, since a very pessimistic $\mu_2^*$ leads the agent to interpret objectively typical draws of $X_2$ as greatly above average. Expecting low values of $X_3$ after these surprisingly high draws of $X_2$, the agent infers the fundamental $\mu_3^*$ to be above the sample mean of $X_3$ in the dataset, hence overestimating it as $c_1 \to -\infty$. When $\delta$ is strictly positive, however, there is an opposite effect where a lower sample mean of $X_1$ in observations containing uncensored $X_3$ leads to more pessimistic inference about the third-period fundamental. When $\delta > \alpha$, overoptimistic inference never happens because this second effect dominates.

Proof.

First suppose $\delta > \alpha$. By Proposition OA.2, since $\mu_j^\bullet - E[X_j \mid X_j \le c_j] > 0$ for every $c_j \in \mathbb{R}$, I only need to show that $\sum_{p \in P[i \to j]} W(p) < 0$ for every pair $i > j$. Due to the stationarity of $\gamma$ under the $\gamma_{i,j} = \alpha \cdot \delta^{i-j-1}$ functional form, it suffices to prove $\sum_{p \in P[i \to 1]} W(p) < 0$ for every $2 \le i \le L$.

When $i = 2$, $P[2 \to 1]$ consists of a single path with weight $-\alpha < 0$. By induction suppose $\sum_{p \in P[i \to 1]} W(p) < 0$ for all $i \le S$, for some $2 \le S \le L - 1$. We can exhaustively enumerate $p \in P[S+1 \to 1]$ by relating each path in $P[S \to 1]$ to a pair of paths in $P[S+1 \to 1]$. Relate $p = ((S, i_1), \dots, (i_{M-1}, 1)) \in P[S \to 1]$ to the pair $p' = ((S+1, i_1), \dots, (i_{M-1}, 1))$ and $p'' = ((S+1, S), (S, i_1), \dots, (i_{M-1}, 1))$. That is, $p'$ modifies the first edge in $p$ from $(S, i_1)$ to $(S+1, i_1)$, while $p''$ simply concatenates the extra edge $(S+1, S)$ in front of $p$. We have $W(p') = \delta \cdot W(p)$, because the weight of $(S, i_1)$ is $-\alpha\delta^{S-i_1-1}$ while the weight of $(S+1, i_1)$ is $-\alpha\delta^{S-i_1}$, and the two paths are otherwise identical. We have $W(p'') = -\alpha \cdot W(p)$, since the newly concatenated edge has weight $-\alpha$. This argument shows

\[
\sum_{p \in P[S+1 \to 1]} W(p) = (\delta - \alpha) \cdot \sum_{p \in P[S \to 1]} W(p) .
\]

Since $\delta - \alpha > 0$ and $\sum_{p \in P[S \to 1]} W(p) < 0$, we get $\sum_{p \in P[S+1 \to 1]} W(p) < 0$. By induction, we have shown that $\sum_{p \in P[i \to 1]} W(p) < 0$ for every $2 \le i \le L$.

Next, suppose $\delta < \alpha$. By Proposition OA.2,

\[
\mu_3^* = \mu_3^\bullet + \left(-\alpha\delta + \alpha^2\right) \cdot (\mu_1^\bullet - E[X_1 \mid X_1 \le c_1]) + (-\alpha)(\mu_2^\bullet - E[X_2 \mid X_2 \le c_2]) .
\]

The coefficient in front of $\mu_1^\bullet - E[X_1 \mid X_1 \le c_1]$ comes from the fact that there are two paths from 3 to 1, with weights $-\gamma_{3,1} = -\alpha\delta$ and $(-\gamma_{3,2}) \cdot (-\gamma_{2,1}) = (-\alpha) \cdot (-\alpha) = \alpha^2$. We have $(-\alpha\delta + \alpha^2) = \alpha(\alpha - \delta) > 0$ since $\alpha > 0$ and $\delta < \alpha$. So, fixing $c_2$, as $c_1 \to -\infty$ we get $\mu_1^\bullet - E[X_1 \mid X_1 \le c_1] \to \infty$ and therefore $\mu_3^* \to \infty$.

OA 3 Proof of Theorem 1

In this section I prove the almost-sure convergence of beliefs and behavior when biased agents act one at a time and entertain uncertainty over both $\mu_1$ and $\mu_2$.

For $\underline{\mu}_1 < \bar\mu_1$, $\underline{\mu}_2 < \bar\mu_2$, let $\Diamond([\underline{\mu}_1, \bar\mu_1], [\underline{\mu}_2, \bar\mu_2])$ refer to the parallelogram in $\mathbb{R}^2$ with the vertices:

• $(\underline{\mu}_1,\ \bar\mu_2 + \tfrac{\gamma}{2}(\bar\mu_1 - \underline{\mu}_1))$
• $(\underline{\mu}_1,\ \underline{\mu}_2 + \tfrac{\gamma}{2}(\bar\mu_1 - \underline{\mu}_1))$
• $(\bar\mu_1,\ \bar\mu_2 - \tfrac{\gamma}{2}(\bar\mu_1 - \underline{\mu}_1))$
• $(\bar\mu_1,\ \underline{\mu}_2 - \tfrac{\gamma}{2}(\bar\mu_1 - \underline{\mu}_1))$

In other words, $\Diamond([\underline{\mu}_1, \bar\mu_1], [\underline{\mu}_2, \bar\mu_2])$ is the parallelogram constructed by starting with the rectangle $[\underline{\mu}_1, \bar\mu_1] \times [\underline{\mu}_2, \bar\mu_2]$, then replacing the top and bottom edges with lines with slope $-\gamma$ (and adjusting the left and right edges accordingly to connect with the new top and bottom edges).

Consider a sequence of short-lived agents playing the stage game in rounds $t = 1, 2, 3, \dots$ They are uncertain about both $\mu_1$ and $\mu_2$, with the prior density of the first-round agent $m(\mu_1, \mu_2)$ supported on feasible fundamentals $\mathcal{M} = \Diamond([\underline{\mu}_1, \bar\mu_1], [\underline{\mu}_2, \bar\mu_2])$ as in Remark 1(b). I abbreviate this support as $\Diamond$ when no confusion arises. Each agent $t$ chooses the optimal cutoff $\tilde{C}_t$ maximizing expected payoff based on the final belief $\tilde{M}_{t-1}$ of the immediate predecessor. I show the almost sure convergence of the stochastic processes $(\tilde{C}_t)$ and $(\tilde{M}_t)$ to the unique steady state under the hypotheses of Theorem 1.

OA 3.1 Preliminary Results

First, I consider how the predicted second-period payoff after $X_1 = x_1$ depends on the parameters of the feasible model $\Psi(\mu_1, \mu_2; \gamma)$.

Lemma OA.3.

For every $\mu_1, \mu_2, x_1 \in \mathbb{R}$, the conditional distribution $X_2 \mid X_1 = x_1$ is the same under $\Psi(\mu^\bullet, \mu_2 + \gamma(\mu_1 - \mu^\bullet); \gamma)$ and $\Psi(\mu_1, \mu_2; \gamma)$. So in particular, $C(\mu_1, \mu_2; \gamma) = C(\mu^\bullet, \mu_2 + \gamma(\mu_1 - \mu^\bullet); \gamma)$.

Proof. Under the feasible model $\Psi(\mu^\bullet, \mu_2 + \gamma(\mu_1 - \mu^\bullet); \gamma)$, the conditional density of $X_2$ given $X_1 = x_1$ is $f(\cdot \mid \mu_2 + \gamma(\mu_1 - \mu^\bullet) - \gamma(x_1 - \mu^\bullet))$, which simplifies to $f(\cdot \mid \mu_2 - \gamma(x_1 - \mu_1))$. It is easy to see that this is also the expression for the same conditional density under $\Psi(\mu_1, \mu_2; \gamma)$.

Suppose $C(\mu_1, \mu_2; \gamma) = c$. This implies the indifference condition,

\[
u_1(c) = E_{\Psi(\mu_1, \mu_2; \gamma)}[u_2(c, X_2) \mid X_1 = c] .
\]

But by the equivalence of conditional distributions given above,

\[
u_1(c) = E_{\Psi(\mu^\bullet, \mu_2 + \gamma(\mu_1 - \mu^\bullet); \gamma)}[u_2(c, X_2) \mid X_1 = c] .
\]

This means $c$ is also the indifference threshold for the model $\Psi(\mu^\bullet, \mu_2 + \gamma(\mu_1 - \mu^\bullet); \gamma)$.

Footnote: I assume that agents do not update beliefs within the stage game.

As a corollary, this lemma shows the restriction to cutoff strategies is without loss, and that $\tilde{C}_t$ is well defined. That is, for any belief given by a density on $\mathcal{M}$, there exists a cutoff strategy that is weakly optimal among the class of all stopping strategies, and further this cutoff is unique. To see this, note that for any $x_1 \in \mathbb{R}$ and any density $\tilde{m}$ on $\mathcal{M}$,

\[
\int_{\mathcal{M}} E_{\Psi(\mu_1, \mu_2; \gamma)}[u_2(x_1, X_2) \mid X_1 = x_1] \cdot \tilde{m}(\mu_1, \mu_2)\, d(\mu_1, \mu_2) = \int_{\underline{\mu}_2^\circ}^{\bar\mu_2^\circ} E_{\Psi(\mu^\bullet, \mu_2; \gamma)}[u_2(x_1, X_2) \mid X_1 = x_1] \cdot \tilde{m}_V(\mu_2)\, d\mu_2 ,
\]

where $\bar\mu_2^\circ := \max\{\mu_2 : (\mu^\bullet, \mu_2) \in \Diamond\}$ and $\underline{\mu}_2^\circ := \min\{\mu_2 : (\mu^\bullet, \mu_2) \in \Diamond\}$, and $\tilde{m}_V(\mu_2)$ is the integral of $\tilde{m}(\mu_1, \mu_2)$ over the line in $\Diamond$ with slope $-\gamma$ that passes through $(\mu^\bullet, \mu_2)$. This equality holds because by Lemma OA.3, all fundamentals on that line imply the same continuation payoff after $X_1 = x_1$ as the fundamentals $(\mu^\bullet, \mu_2)$.
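The collapse of each line of slope $-\gamma$ onto a single point can be checked numerically. The following is a minimal sketch, not part of the paper: the function names and the numerical values of $\gamma$, $\mu^\bullet$, $\mu_1$, and $\mu_2$ are illustrative assumptions. It slides a pair of fundamentals along its line of slope $-\gamma$ until the first coordinate equals $\mu^\bullet$ and confirms that the predicted conditional mean of $X_2$ given $X_1 = x_1$ is unchanged for several values of $x_1$, which is the content of Lemma OA.3 when the feasible model is a location family.

```python
import math


def conditional_mean(mu1, mu2, gamma, x1):
    """Mean of X2 given X1 = x1 under the feasible model Psi(mu1, mu2; gamma)."""
    return mu2 - gamma * (x1 - mu1)


def slide_to_true_mu1(mu1, mu2, gamma, mu_true):
    """Move (mu1, mu2) along the line of slope -gamma until the first coordinate is mu_true."""
    return mu_true, mu2 + gamma * (mu1 - mu_true)


gamma, mu_true = 0.5, 0.0   # illustrative values, not from the paper
mu1, mu2 = 1.3, -0.4        # an arbitrary pair of fundamentals
nu1, nu2 = slide_to_true_mu1(mu1, mu2, gamma, mu_true)

for x1 in (-2.0, 0.0, 0.7, 3.1):
    # Both parameter pairs must predict the same conditional mean of X2.
    assert math.isclose(
        conditional_mean(mu1, mu2, gamma, x1),
        conditional_mean(nu1, nu2, gamma, x1),
        abs_tol=1e-12,
    )
```

Because the two parameter pairs generate the same conditional prediction after every first draw, they also imply the same continuation value and hence the same cutoff, which is why beliefs over the parallelogram can be summarized by a one-dimensional marginal along these lines.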
The proof of Lemma A.13 shows that

\[
x_1 \mapsto u_1(x_1) - \int_{\underline{\mu}_2^\circ}^{\bar\mu_2^\circ} E_{\Psi(\mu^\bullet, \mu_2; \gamma)}[u_2(x_1, X_2) \mid X_1 = x_1]\, \tilde{m}_V(\mu_2)\, d\mu_2
\]

is a strictly increasing, continuous function that crosses 0.

Now, the key step is to separate the two-dimensional inference problem into a pair of one-dimensional problems.

OA 3.2 Learning $\mu^\bullet$

I define the stochastic process of data log-likelihood (for a given fundamental). For each $(\mu_1, \mu_2) \in \mathrm{supp}(m)$, let $\ell_t(\mu_1, \mu_2)(\omega)$ be the log likelihood that the fundamentals are $(\mu_1, \mu_2)$ and histories $(\tilde{H}_s)_{s \le t}(\omega)$ are generated by the end of round $t$. It is given by

\[
\ell_t(\mu_1, \mu_2)(\omega) := \ln(m(\mu_1, \mu_2)) + \sum_{s=1}^{t} \ln(\mathrm{lik}(\tilde{H}_s(\omega); \mu_1, \mu_2)) ,
\]

where $\mathrm{lik}(x_1, \emptyset; \mu_1, \mu_2) := f(x_1 \mid \mu_1)$ and $\mathrm{lik}(x_1, x_2; \mu_1, \mu_2) := f(x_1 \mid \mu_1) \cdot f(x_2 \mid \mu_2 - \gamma(x_1 - \mu_1))$. We have $f(x_1 \mid \mu_1) = g(x_1 - \mu_1 + \mu^\bullet)$ and $f(x_2 \mid \mu_2 - \gamma(x_1 - \mu_1)) = g(x_2 - \mu_2 + \mu^\bullet + \gamma(x_1 - \mu_1))$. By simple algebra, we may expand

\[
\ell_t(\mu_1, \mu_2)(\omega) = \ln(m(\mu_1, \mu_2)) + \sum_{s=1}^{t} \ln[g(X_{1,s}(\omega) - \mu_1 + \mu^\bullet)] + \sum_{s=1}^{t} 1\{X_{1,s}(\omega) \le \tilde{C}_s(\omega)\} \cdot \ln\left[g(X_{2,s}(\omega) - \mu_2 + \mu^\bullet + \gamma(X_{1,s}(\omega) - \mu_1))\right] .
\]

I first establish that, without knowing anything about the process $(\tilde{C}_t)$, we can conclude agents learn $\mu^\bullet$ arbitrarily well.

Lemma OA.4.

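For every $\epsilon > 0$, almost surely $\lim_{t \to \infty} \tilde{M}_t(\Diamond \cap ([\mu^\bullet - \epsilon, \mu^\bullet + \epsilon] \times \mathbb{R})) = 1$.

Proof. I first calculate the directional derivative $\nabla_v \frac{1}{t}\ell_t(\mu_1, \mu_2)$, where

\[
v = \begin{pmatrix} 1/\sqrt{1 + \gamma^2} \\ -\gamma/\sqrt{1 + \gamma^2} \end{pmatrix}
\]

is the unit vector with slope $-\gamma$. We have

\[
\begin{aligned}
\frac{\partial(\ell_t/t)}{\partial \mu_1}(\mu_1, \mu_2) &= \frac{1}{t}\frac{D_1 m(\mu_1, \mu_2)}{m(\mu_1, \mu_2)} - \frac{1}{t}\sum_{s=1}^{t} \frac{g'(X_{1,s} - \mu_1 + \mu^\bullet)}{g(X_{1,s} - \mu_1 + \mu^\bullet)} - \frac{\gamma}{t}\sum_{s=1}^{t} 1\{X_{1,s} \le \tilde{C}_s\} \cdot \lambda(X_{2,s} - \mu_2 + \mu^\bullet + \gamma(X_{1,s} - \mu_1)), \\
\frac{\partial(\ell_t/t)}{\partial \mu_2}(\mu_1, \mu_2) &= \frac{1}{t}\frac{D_2 m(\mu_1, \mu_2)}{m(\mu_1, \mu_2)} - \frac{1}{t}\sum_{s=1}^{t} 1\{X_{1,s} \le \tilde{C}_s\} \cdot \lambda(X_{2,s} - \mu_2 + \mu^\bullet + \gamma(X_{1,s} - \mu_1)),
\end{aligned}
\]

where $D_1 m$ and $D_2 m$ are the two partial derivatives of $m$. At every $\omega$ and every $(\mu_1, \mu_2)$, note the last summand in $\frac{\partial(\ell_t/t)}{\partial \mu_1}$ is $\gamma$ times the last summand in $\frac{\partial(\ell_t/t)}{\partial \mu_2}$. Therefore,

\[
\nabla_v \frac{1}{t}\ell_t(\mu_1, \mu_2) = -\frac{1}{\sigma\sqrt{1 + \gamma^2}}\left(\frac{1}{t}\sum_{s=1}^{t} \frac{g'(X_{1,s} - \mu_1 + \mu^\bullet)}{g(X_{1,s} - \mu_1 + \mu^\bullet)}\right) + \frac{1}{t\sqrt{1 + \gamma^2}}\frac{D_1 m(\mu_1, \mu_2)}{m(\mu_1, \mu_2)} - \frac{\gamma}{t\sqrt{1 + \gamma^2}}\frac{D_2 m(\mu_1, \mu_2)}{m(\mu_1, \mu_2)} .
\]

Since $m$, $D_1 m$, $D_2 m$ are continuous on the compact set $\Diamond$, there exists some $0 < B < \infty$ so that $\left|\frac{D_1 m(\mu_1, \mu_2)}{m(\mu_1, \mu_2)}\right| < B$ and $\left|\frac{D_2 m(\mu_1, \mu_2)}{m(\mu_1, \mu_2)}\right| < B$ for all $(\mu_1, \mu_2) \in \Diamond$. This means for every $\omega$,

\[
\inf_{(\mu_1, \mu_2) \in \Diamond_L} \left[\left(\nabla_v \frac{1}{t}\ell_t(\mu_1, \mu_2)\right) + \frac{1}{\sigma\sqrt{1 + \gamma^2}}\left(\frac{1}{t}\sum_{s=1}^{t} \frac{g'(X_{1,s} - \mu_1 + \mu^\bullet)}{g(X_{1,s} - \mu_1 + \mu^\bullet)}\right)\right] \ge -\frac{1}{t}\frac{(1 + \gamma)}{\sqrt{1 + \gamma^2}} B ,
\]

where $\Diamond_L := \Diamond \cap ([\underline{\mu}_1, \mu^\bullet - \epsilon] \times \mathbb{R})$ is the sub-parallelogram to the left of $\mu^\bullet - \epsilon$. By the law of large numbers applied to the i.i.d. sequence $\left(\frac{g'(X_{1,s} - (\mu^\bullet - \epsilon) + \mu^\bullet)}{g(X_{1,s} - (\mu^\bullet - \epsilon) + \mu^\bullet)}\right)_{s \ge 1}$, almost surely

\[
\frac{1}{t}\sum_{s=1}^{t} \frac{g'(X_{1,s} - (\mu^\bullet - \epsilon) + \mu^\bullet)}{g(X_{1,s} - (\mu^\bullet - \epsilon) + \mu^\bullet)} \to E_{X \sim g}\left[\frac{g'(X + \epsilon)}{g(X + \epsilon)}\right] .
\]

Since $E_{X \sim g}\left[\frac{g'(X)}{g(X)}\right] = 0$ and since $z \mapsto \frac{g'(z)}{g(z)} = \frac{d}{dz}\ln(g(z))$ is strictly decreasing by log-concavity, there is some $\delta > 0$ so that $E_{X \sim g}\left[\frac{g'(X + \epsilon)}{g(X + \epsilon)}\right] = -\delta$.

Furthermore, for any $\mu_1 \le \mu^\bullet - \epsilon$ and any $x \in \mathbb{R}$, $\frac{g'(x - \mu_1 + \mu^\bullet)}{g(x - \mu_1 + \mu^\bullet)} \le \frac{g'(x + \epsilon)}{g(x + \epsilon)}$. Along any $\omega$ where $\frac{1}{t}\sum_{s=1}^{t} \frac{g'(X_{1,s} - (\mu^\bullet - \epsilon) + \mu^\bullet)}{g(X_{1,s} - (\mu^\bullet - \epsilon) + \mu^\bullet)} \to -\delta$, we therefore also have

\[
\limsup_{t \to \infty}\ \sup_{\mu_1 \le \mu^\bullet - \epsilon}\ \frac{1}{t}\sum_{s=1}^{t} \frac{g'(X_{1,s} - \mu_1 + \mu^\bullet)}{g(X_{1,s} - \mu_1 + \mu^\bullet)} \le -\delta .
\]

Therefore almost surely

\[
\liminf_{t \to \infty}\ \inf_{(\mu_1, \mu_2) \in \Diamond_L} \left(\nabla_v \frac{1}{t}\ell_t(\mu_1, \mu_2)\right) \ge \frac{\delta}{\sigma\sqrt{1 + \gamma^2}} .
\]

We may further divide $\Diamond_L$ into two halves:

\[
\Diamond_{L,1} := \Diamond \cap ([\underline{\mu}_1, \underline{\mu}_1 + d/2) \times \mathbb{R}), \qquad \Diamond_{L,2} := \Diamond \cap ([\underline{\mu}_1 + d/2, \mu^\bullet - \epsilon] \times \mathbb{R}),
\]

where $d := \mu^\bullet - \epsilon - \underline{\mu}_1$. I will show that $\lim_{t \to \infty} \tilde{M}_t(\Diamond_{L,1}) = 0$ almost surely. The idea is we can map every point in $\Diamond_{L,1}$ to another point in $\Diamond_{L,2}$ in the direction of $v$. For every point, its image under the map will have much higher posterior probability, since we have a uniform, strictly positive lower bound on the directional derivative of the log-likelihood $\ell_t$ in the direction of $v$.

\[
\begin{aligned}
\tilde{M}_t(\Diamond_{L,1}) &= \int_{\Diamond_{L,1}} \tilde{m}_t(\mu_1, \mu_2)\, d\mu \\
&= \int_{\Diamond_{L,2}} \tilde{m}_t(\mu_1, \mu_2) \cdot \frac{\tilde{m}_t(\mu_1 - d, \mu_2 - \gamma d)}{\tilde{m}_t(\mu_1, \mu_2)}\, d\mu \\
&= \int_{\Diamond_{L,2}} \tilde{m}_t(\mu_1, \mu_2) \exp\left(\ell_t(\mu_1 - d, \mu_2 - \gamma d) - \ell_t(\mu_1, \mu_2)\right) d\mu \\
&= \int_{\Diamond_{L,2}} \tilde{m}_t(\mu_1, \mu_2) \exp\left(-\int_0^d \nabla_v \ell_t(\mu_1 - d + z, \mu_2 - \gamma d + \gamma z)\, dz\right) d\mu .
\end{aligned}
\]

Almost surely,

\[
\liminf_{t \to \infty}\ \inf_{(\mu_1, \mu_2) \in \Diamond_{L,2},\, z \in [0, d]} \left(\nabla_v \ell_t(\mu_1 - d + z, \mu_2 - \gamma d + \gamma z)\right) \ge \frac{t\delta}{\sigma\sqrt{1 + \gamma^2}} ,
\]

so almost surely

\[
\limsup_{t \to \infty} \tilde{M}_t(\Diamond_{L,1}) \le \limsup_{t \to \infty} \int_{\Diamond_{L,2}} \tilde{m}_t(\mu_1, \mu_2) \exp\left(-\frac{d t \delta}{\sigma\sqrt{1 + \gamma^2}}\right) d\mu .
\]

For every $\omega$ and $t$, the RHS is bounded above by $\exp\left(-\frac{d t \delta}{\sigma\sqrt{1 + \gamma^2}}\right)$, which tends to 0 as $t \to \infty$ since $d, \delta > 0$. So in fact $\tilde{M}_t(\Diamond_{L,1}) \to 0$ almost surely. Dividing $\Diamond_{L,2}$ into two equal halves and iterating this argument, we eventually show $\lim_{t \to \infty} \tilde{M}_t(\Diamond \cap ([\mu^\bullet - \epsilon, \infty) \times \mathbb{R})) = 1$. A symmetric argument also shows $\lim_{t \to \infty} \tilde{M}_t(\Diamond \cap ((-\infty, \mu^\bullet + \epsilon] \times \mathbb{R})) = 1$.

OA 3.3 Decomposing Partial Derivative of Log-Likelihood With Respect to $\mu_2$

I record a decomposition of $\frac{\partial \ell_t}{\partial \mu_2}(\mu_1, \mu_2)$, the partial derivative of the log-likelihood process with respect to its second argument.

Define two stochastic processes:

\[
\begin{aligned}
\varphi_s(\mu_1, \mu_2) &:= -\lambda(X_{2,s} - \mu_2 + \mu^\bullet + \gamma(X_{1,s} - \mu_1)) \cdot 1\{X_{1,s} \le \tilde{C}_s\}, \\
\bar\varphi_s(\mu_1, \mu_2) &:= \frac{\partial}{\partial \mu_2} \bar{L}(\mu_2 + \gamma(\mu_1 - \mu^\bullet) \mid \tilde{C}_s) .
\end{aligned}
\]

Note that $\bar\varphi_s(\mu_1, \mu_2)$ is measurable with respect to $\mathcal{F}_{s-1}$, since $(\tilde{C}_t)$ is a predictable process. Write $\xi_s(\mu_1, \mu_2) := \varphi_s(\mu_1, \mu_2) - \bar\varphi_s(\mu_1, \mu_2)$ and $y_t(\mu_1, \mu_2) := \sum_{s=1}^{t} \xi_s(\mu_1, \mu_2)$. Write $z_t(\mu_1, \mu_2) := \sum_{s=1}^{t} \bar\varphi_s(\mu_1, \mu_2)$.

Lemma OA.5. $\frac{\partial \ell_t}{\partial \mu_2}(\mu_1, \mu_2) = \frac{D_2 m(\mu_1, \mu_2)}{m(\mu_1, \mu_2)} + y_t(\mu_1, \mu_2) + z_t(\mu_1, \mu_2)$.

Proof.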

This comes from expanding $\ell_t(\mu_1, \mu_2)$ and taking its derivative as in the proof of Lemma OA.4.

Now I derive two results about the $\xi_t(\mu_1, \mu_2)$ processes for different pairs $(\mu_1, \mu_2)$.

Lemma OA.6.

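There exists $\kappa_\xi < \infty$ so that for every $(\mu_1, \mu_2) \in \Diamond$ and for every $t \ge 1$, $\omega \in \Omega$, $E[\xi_t^2(\mu_1, \mu_2) \mid \mathcal{F}_{t-1}](\omega) \le \kappa_\xi$.

Proof. Note that $\bar\varphi_t(\mu_1, \mu_2)$ is measurable with respect to $\mathcal{F}_{t-1}$. Also, $\varphi_t(\mu_1, \mu_2) \mid \mathcal{F}_{t-1} = \varphi_t(\mu_1, \mu_2) \mid \tilde{C}_t$, because by independence of $X_t$ from $(X_s)_{s=1}^{t-1}$, the only information that $\mathcal{F}_{t-1}$ contains about $\varphi_t(\mu_1, \mu_2)$ is in determining the cutoff threshold $\tilde{C}_t$.

At a sample path $\omega$ so that $\tilde{C}_t(\omega) = c \in \mathbb{R}$,

\[
\begin{aligned}
E[\varphi_t(\mu_1, \mu_2) \mid \mathcal{F}_{t-1}](\omega) &= E\left[-\lambda(X_{2,t} - \mu_2 + \mu^\bullet + \gamma(X_{1,t} - \mu_1)) \cdot 1\{X_{1,t} \le c\}\right] \\
&= \frac{\partial}{\partial \mu_2} \int_{-\infty}^{c} g(x_1) \cdot \int_{-\infty}^{\infty} g(x_2) \cdot \ln\left(f(x_2 \mid \mu_2 - \gamma(x_1 - \mu_1))\right) dx_2\, dx_1 \\
&= \frac{\partial}{\partial \mu_2} \int_{-\infty}^{c} g(x_1) \cdot \int_{-\infty}^{\infty} g(x_2) \cdot \ln\left(f(x_2 \mid [\mu_2 + \gamma(\mu_1 - \mu^\bullet)] - \gamma(x_1 - \mu^\bullet))\right) dx_2\, dx_1 \\
&= \frac{\partial}{\partial \mu_2} \bar{L}(\mu_2 + \gamma(\mu_1 - \mu^\bullet) \mid c) .
\end{aligned}
\]

This shows $E[\varphi_t(\mu_1, \mu_2) \mid \mathcal{F}_{t-1}](\omega) = \bar\varphi_t(\mu_1, \mu_2)(\omega)$. Since this holds regardless of $c$, we get that $E[\varphi_t(\mu_1, \mu_2) \mid \mathcal{F}_{t-1}] = \bar\varphi_t(\mu_1, \mu_2)$ for all $\omega$, that is to say

\[
E[\xi_t^2(\mu_1, \mu_2) \mid \mathcal{F}_{t-1}] = \mathrm{Var}[\varphi_t(\mu_1, \mu_2) \mid \mathcal{F}_{t-1}] \le E[\varphi_t^2(\mu_1, \mu_2) \mid \mathcal{F}_{t-1}] \le E\left[\left(\lambda(X_{2,t} - \mu_2 + \mu^\bullet + \gamma(X_{1,t} - \mu_1))\right)^2\right] .
\]

By the same argument as in the proof of Lemma A.15, $E\left[\left(\lambda(X_2 - \mu_2 + \mu^\bullet + \gamma(X_1 - \mu_1))\right)^2\right]$ exists for all $\mu_1, \mu_2 \in \mathbb{R}$. The (finite) maximum value this expectation takes on the compact set $\Diamond$ can be taken as $\kappa_\xi$.

OA 3.4 Heidhues, Koszegi, and Strack (2018)'s Law of Large Numbers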

I use a statistical result from Heidhues, Koszegi, and Strack (2018) to show that the $y_t/t$ term in the decomposition of $\frac{1}{t}\frac{\partial \ell_t}{\partial \mu_2}$ almost surely converges to 0 in the long run, and furthermore that this convergence is uniform on $\diamondsuit$. This lets me focus on terms of the form $\bar{\varphi}_s(\mu_1, \mu_2)$, which can be interpreted as the expected contribution to the log-likelihood derivative from round $s$ data. This lends tractability to the problem, as $\bar{\varphi}_s(\mu_1, \mu_2)$ depends only on $\tilde{C}_s$, and not on $X_{1,s}$ or $X_{2,s}$.

Lemma OA.7.

For every $(\mu_1, \mu_2) \in \diamondsuit$, $\lim_{t \to \infty} \left| \frac{y_t(\mu_1, \mu_2)}{t} \right| = 0$ almost surely.

Proof. Proposition 10 of Heidhues, Koszegi, and Strack (2018) shows that if $(y_t)$ is a martingale and there exists some constant $v \ge 0$ such that $[y]_t \le vt$ almost surely, where $[y]_t$ is the quadratic variation of $(y_t)$, then almost surely $\lim_{t \to \infty} \frac{y_t}{t} = 0$.

Consider the process $y_t(\mu_1, \mu_2)$ for a fixed $(\mu_1, \mu_2) \in \diamondsuit$. By definition, $y_t = \sum_{s=1}^{t} \left[ \varphi_s(\mu_1, \mu_2) - \bar{\varphi}_s(\mu_1, \mu_2) \right]$. As established in the proof of Lemma OA.6, for every $s$, $\bar{\varphi}_s(\mu_1, \mu_2) = E[\varphi_s(\mu_1, \mu_2) \mid \mathcal{F}_{s-1}]$. So for $t' < t$,
\[
\begin{aligned}
E[y_t(\mu_1, \mu_2) \mid \mathcal{F}_{t'}]
&= \sum_{s=1}^{t'} \left[ \varphi_s(\mu_1, \mu_2) - \bar{\varphi}_s(\mu_1, \mu_2) \right] + E\left[ \sum_{s=t'+1}^{t} \left[ \varphi_s(\mu_1, \mu_2) - \bar{\varphi}_s(\mu_1, \mu_2) \right] \,\Big|\, \mathcal{F}_{t'} \right] \\
&= \sum_{s=1}^{t'} \left[ \varphi_s(\mu_1, \mu_2) - \bar{\varphi}_s(\mu_1, \mu_2) \right] + \sum_{s=t'+1}^{t} E\left[ E[\varphi_s(\mu_1, \mu_2) - \bar{\varphi}_s(\mu_1, \mu_2) \mid \mathcal{F}_{s-1}] \mid \mathcal{F}_{t'} \right] \\
&= \sum_{s=1}^{t'} \left[ \varphi_s(\mu_1, \mu_2) - \bar{\varphi}_s(\mu_1, \mu_2) \right] + 0 \\
&= y_{t'}(\mu_1, \mu_2).
\end{aligned}
\]
This shows $(y_t(\mu_1, \mu_2))_t$ is a martingale. Also,
\[
[y(\mu_1, \mu_2)]_t = \sum_{s=1}^{t-1} E\left[ (y_s(\mu_1, \mu_2) - y_{s-1}(\mu_1, \mu_2))^2 \mid \mathcal{F}_{s-1} \right] = \sum_{s=1}^{t-1} E\left[ \xi_s^2(\mu_1, \mu_2) \mid \mathcal{F}_{s-1} \right] \le \kappa_\xi \cdot t
\]
by Lemma OA.6. Therefore Proposition 10 of Heidhues, Koszegi, and Strack (2018) applies.

Lemma OA.8. $\lim_{t \to \infty} \sup_{(\mu_1, \mu_2) \in \diamondsuit} \left| \frac{y_t(\mu_1, \mu_2)}{t} \right| = 0$ almost surely.

Proof. This argument is similar to Lemma 11 in Heidhues, Koszegi, and Strack (2018). I apply Lemma 2 of Andrews (1992), which says that to prove this result I just need to check conditions BD, P-SLLN, and S-LIP from Andrews (1992). BD holds because $\diamondsuit$ is a bounded subset of $\mathbb{R}^2$. P-SLLN holds by Lemma OA.7, which shows that for all $(\mu_1, \mu_2) \in \diamondsuit$, $\lim_{t \to \infty} \left| \frac{y_t(\mu_1, \mu_2)}{t} \right| = 0$ almost surely.

Condition S-LIP is essentially a Lipschitz continuity condition.
It requires ﬁnding se-quence of random variables B t such that | ξ t ( µ , µ ) − ξ t ( µ , µ ) | ≤ B t · ( | µ − µ | + | µ − µ | )almost surely, such that these random variables satisfysup t ≥ t P ts =1 E [ B s ] < ∞ , and lim t →∞ t P ts =1 ( B s − E [ B s ]) = 0 almost surely.But for every ω, ϕ s ( µ , µ ) := − λ ( X ,s − µ + µ • + γ ( X ,s − µ )) · { X ,s ≤ ˜ C s }| ϕ s ( µ , µ ) − ϕ s ( µ , µ ) | ≤| λ ( X ,s − µ + µ • + γ ( X ,s − µ )) − λ ( X ,s − µ + µ • + γ ( X ,s − µ )) | . Since ln( g ( · )) has a bounded second derivative, the RHS is bounded by κ g · (cid:16) | µ − µ | + γ · | µ − µ | (cid:17) .Now that we know | ϕ s ( µ , µ ) − ϕ s ( µ , µ ) | ( ω ) ≤ κ g · (cid:16) | µ − µ | + γ · | µ − µ | (cid:17) for all ω, we must also have | ¯ ϕ s ( µ , µ ) − ¯ ϕ s ( µ , µ ) | ( ω ) ≤ κ g · (cid:16) | µ − µ | + γ · | µ − µ | (cid:17) for all ω since ¯ ϕ s ( µ , µ ) = E [ ϕ s ( µ , µ ) | F s − ].Setting B s as the constant 2 κ g for every s satisﬁes S-LIP.32 A 3.5 Bounds on Asymptotic Beliefs and Asymptotic Cutoﬀs

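Recall that Lemma OA.3 implies that if we draw the line with slope $-\gamma$ through the point $(\mu_1^\bullet, \mu_2)$, all pairs of fundamentals on this line have the same optimal cutoff threshold. Then against any feasible model $\Psi(\mu_1, \mu_2; \gamma)$ with $(\mu_1, \mu_2) \in \diamondsuit$, the best cutoff strategy is between $\underline{c}^\circ := C(\mu_1^\bullet, \underline{\mu}_2^\circ; \gamma)$ and $\bar{c}^\circ := C(\mu_1^\bullet, \bar{\mu}_2^\circ; \gamma)$.

For $\mu_2^l \le \mu_2^h$ in the interval $[\underline{\mu}_2^\circ, \bar{\mu}_2^\circ]$, let $\diamondsuit_{[\mu_2^l, \mu_2^h]} \subseteq \diamondsuit$ be constructed from $\diamondsuit$ by translating its top and bottom edges towards the center, so that they pass through $(\mu_1^\bullet, \mu_2^l)$ and $(\mu_1^\bullet, \mu_2^h)$ respectively. For $(\mu_1^\bullet, \mu_2) \in \diamondsuit$, let $li(\mu_2) \subseteq \diamondsuit$ be the line segment in $\mathrm{supp}(m)$ with slope $-\gamma$ that contains the point $(\mu_1^\bullet, \mu_2)$. So, $\diamondsuit_{[\mu_2^l, \mu_2^h]} = \cup_{\mu_2 \in [\mu_2^l, \mu_2^h]} \, li(\mu_2)$.

Lemma OA.9.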

For c ≥ c ◦ , if lim inf t →∞ ˜ C t ≥ c almost surely, then lim t →∞ ˜ M t ( ♦ [ µ ◦ ,µ ∗ ( c )) ) =0 almost surely. Also, for ¯ c ≤ ¯ c ◦ , if lim sup t →∞ ˜ C t ≤ ¯ c almost surely, then lim t →∞ ˜ M t ( ♦ ( µ ∗ (¯ c ) , ¯ µ ◦ ] ) =0 almost surely.Proof. I ﬁrst show that for all (cid:15) > , there exists δ > t →∞ inf ( µ ,µ ) ∈ ♦ [ µ ◦ ,µ ∗ c ) − (cid:15) ] ] t ∂‘ t ∂µ ( µ , µ ) ≥ δ. From Lemma OA.5, we may rewrite LHS aslim inf t →∞ inf ( µ ,µ ) ∈ ♦ [ µ ◦ ,µ ∗ c ) − (cid:15) ] ] " t D m ( µ , µ ) m ( µ , µ ) + y t ( µ , µ ) t + z t ( µ , µ ) t , which is no smaller than taking the inf separately across the three terms in the bracket,lim inf t →∞ inf ( µ ,µ ) ∈ ♦ [ µ ◦ ,µ ∗ c ) − (cid:15) ] ] t D m ( µ , µ ) m ( µ , µ ) + lim inf t →∞ inf ( µ ,µ ) ∈ ♦ [ µ ◦ ,µ ∗ c ) − (cid:15) ] ] y t ( µ , µ ) t + lim inf t →∞ inf ( µ ,µ ) ∈ ♦ [ µ ◦ ,µ ∗ c ) − (cid:15) ] ] z t ( µ , µ ) t . Since D g/g is bounded on ♦ as D m is continuous and m is continuous and strictlypositive on the compact set ♦ , the ﬁrst term is 0 for every ω . To deal with the second term,lim inf t →∞ inf ( µ ,µ ) ∈ ♦ [ µ ◦ ,µ ∗ c ) − (cid:15) ] ] y t ( µ , µ ) t ≥ lim inf t →∞ inf ( µ ,µ ) ∈ ♦ −| y t ( µ , µ ) t | = lim inf t →∞ ( − · sup ( µ ,µ ) ∈ ♦ | y t ( µ , µ ) t | ) . Lemma OA.8 gives lim t →∞ sup ( µ ,µ ) ∈ ♦ | y t ( µ ,µ ) t | = 0 almost surely. Hence, we concludelim inf t →∞ inf ( µ ,µ ) ∈ ♦ [ µ ◦ ,µ ∗ c ) − (cid:15) ] ] y t ( µ , µ ) t ≥ δ > t →∞ inf ( µ ,µ ) ∈ ♦ [ µ ◦ ,µ ∗ c ) − (cid:15) ] z t ( µ ,µ ) t ≥ δ almostsurely. The proof of Lemma A.19 establishes that, if we put δ = ∂∂µ ¯ L ( µ ∗ ( c ) − (cid:15) | c ) , then δ > ∂∂µ ¯ L ( µ | c ) ≥ δ whenever c ≥ c and µ ≤ µ ∗ ( c ) − (cid:15) . For every ( µ , µ ) ∈ li (ˆ µ ) ,µ + γ ( µ − µ • ) = ˆ µ . 
So, ¯ ϕ s ( µ , µ )( ω ) ≥ δ for all ( µ , µ ) ∈ ♦ [ µ ◦ ,µ ∗ ( c ) − (cid:15) ] , whenever ˜ C s ( ω ) ≥ c .Along any ω where lim inf t →∞ ˜ C t ≥ c , we therefore havelim inf s →∞ inf ( µ ,µ ) ∈ ♦ [ µ ◦ ,µ ∗ c ) − (cid:15) ] ¯ ϕ s ( µ , µ ) ≥ δ and thuslim inf t →∞ inf ( µ ,µ ) ∈ ♦ [ µ ◦ ,µ ∗ c ) − (cid:15) ] ] z t ( µ , µ ) t = lim inf t →∞ inf ( µ ,µ ) ∈ ♦ [ µ ◦ ,µ ∗ c ) − (cid:15) ] ] t " t X s =1 ¯ ϕ s ( µ , µ ) ≥ δ. From here, the same argument as in the proof of Lemma OA.4 showslim t →∞ ˜ M t ( ♦ [ µ ◦ ,µ ∗ ( c ) − (cid:15) ] ) =0 almost surely. Since the choice of (cid:15) > µ to deduce asymptotic restric-tions on their cutoﬀs. Lemma OA.10.

Suppose that there are µ ◦ ≤ µ l < µ h ≤ ¯ µ ◦ such that lim t →∞ ˜ M t ( ♦ [ µ l ,µ h ] ) =1 almost surely. Then lim inf t →∞ ˜ C t ≥ C ( µ • , µ l ; γ ) and lim sup t →∞ ˜ C t ≤ C ( µ • , µ h ; γ ) almostsurely.Proof. I show lim inf t →∞ ˜ C t ≥ C ( µ • , µ l ; γ ) almost surely. The argument establishing lim sup t →∞ ˜ C t ≤ C ( µ • , µ h ; γ ) is symmetric.Let c l = C ( µ • , µ l ; γ ), and recall before we deﬁned c ◦ := C ( µ • , µ ◦ ; γ ) and ¯ c ◦ := C ( µ • , ¯ µ ◦ ; γ ).By Lemma OA.3, C ( µ , µ ; γ ) = C ( µ • , µ ; γ ) for all ( µ , µ ) ∈ li ( µ ). Since c U ( c ; µ , µ ) is single peaked for every ( µ , µ ) , and since c l ≤ C ( µ • , µ ; γ ) for all µ ∈ [ µ l , µ h ] , we also get c l ≤ C ( µ , µ ; γ ) for every ( µ , µ ) ∈ ♦ [ µ l ,µ h ] , since ♦ [ µ l ,µ h ] is the union of the linesegments, ♦ [ µ l ,µ h ] = ∪ µ ∈ [ µ l ,µ h ] li ( µ ).Fix some (cid:15) > . We get U ( c l ; µ , µ ) − U ( c l − (cid:15) ; µ , µ ) > µ , µ ) ∈ ♦ [ µ l ,µ h ] . As( µ , µ ) (cid:16) U ( c l ; µ , µ ) − U ( c l − (cid:15) ; µ , µ ) (cid:17) is continuous, there exists some κ ∗ > U ( c l ; µ , µ ) − U ( c l − (cid:15) ; µ , µ ) > κ ∗ for all ( µ , µ ) ∈ ♦ [ µ l ,µ h ] . In particular, if ν ∈ ∆( ♦ [ µ l ,µ h ] )is a belief about fundamentals, then R U ( c l ; µ , µ ) − U ( c l − (cid:15) ; µ , µ ) > dν ( µ ) > κ ∗ . Now , let ¯ κ := sup c ∈ [ c ◦ , ¯ c ◦ ] sup ( µ ,µ ) ∈ ♦ U ( c ; µ , µ ) ,κ := inf c ∈ [ c ◦ , ¯ c ◦ ] inf ( µ ,µ ) ∈ ♦ U ( c ; µ , µ ) . p ∈ (0 ,

1) so that pκ ∗ − (1 − p )(¯ κ − κ ) = 0 . At any belief ˆ ν ∈ ∆( ♦ ) that assigns morethan probability p to the sub-parallelogram ♦ [ µ l ,µ h ] , the optimal cutoﬀ is larger than c l − (cid:15) .To see this, take any ˆ c ≤ c l − (cid:15) and I will show ˆ c is suboptimal. If ˆ c < c, then it is suboptimalafter any belief on ♦ . If c ≤ ˆ c ≤ c l − (cid:15) , I show that Z U ( c l ; µ , µ ) − U (ˆ c ; µ , µ ) d ˆ ν ( µ ) > . To see this, we may decompose ˆ ν as the mixture of a probability measure ν on ♦ [ µ l ,µ h ] andanother probability measure ν c on ♦ \ ♦ [ µ l ,µ h ] . Let ˆ p > p be the probability that ν assigns to ♦ [ µ l ,µ h ] . The above integral is equal to:ˆ p Z ♦ [ µl ,µh U ( c l ; µ , µ ) − U (ˆ c ; µ , µ ) dν ( µ ) + (1 − ˆ p ) Z ♦ \ ♦ [ µl ,µh U ( c l ; µ , µ ) − U (ˆ c ; µ , µ ) dν c ( µ )Since c l is to the left of the optimal cutoﬀ for all ( µ , µ ) ∈ ♦ [ µ l ,µ h ] and ˆ c ≤ c l − (cid:15) , then U (ˆ c ; µ , µ ) ≤ U ( c l − (cid:15) ; µ , µ ) for all ( µ , µ ) ∈ ♦ [ µ l ,µ h ] . The ﬁrst summand is no less thanˆ p Z ♦ [ µl ,µh U ( c l ; µ , µ ) − U ( c l − (cid:15) ; µ , µ ) dν ( µ ) ≥ ˆ pκ ∗ . Also, the integrand in the second summand is no smaller than − (¯ κ − κ ) , therefore R U ( c l ; µ , µ ) − U (ˆ c ; µ , µ ) d ˆ ν ( µ ) ≥ ˆ pκ ∗ − (1 − ˆ p )(¯ κ − κ ) . Since ˆ p > p , we get ˆ pκ ∗ − (1 − ˆ p )(¯ κ − κ ) > ω where lim t →∞ ˜ M t ( ♦ [ µ l ,µ h ] )( ω ) = 1 , eventually ˜ M t ( ♦ [ µ l ,µ h ] )( ω ) >p for all large enough t, meaning lim inf t →∞ ˜ C t ( ω ) ≥ c l − (cid:15). Since lim t →∞ ˜ M t ( ♦ [ µ l ,µ h ] ) = 1almost surely, this shows lim inf t →∞ ˜ C t ≥ C ( µ • , µ l ; γ ) − (cid:15) almost surely. Since the choice of (cid:15) > t →∞ ˜ C t ≥ C ( µ • , µ l ; γ ) almost surely. OA 3.6 The Contraction Map

I now combine the results established so far to prove the convergence statement in Theorem 1.
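As a rough picture of the argument in the proof that follows: a lower and an upper bound on long-run beliefs are pushed through the same map again and again, and because that map is a contraction the two bounds squeeze to a common limit. The sketch below uses a made-up affine contraction `toy_map` as a stand-in for $I(\cdot\,; \gamma)$, whose actual form depends on the payoff functions and is not reproduced here; the starting bounds and coefficients are likewise illustrative assumptions.

```python
def toy_map(mu, slope=0.4, intercept=-0.6):
    """Hypothetical contraction standing in for I(.; gamma); requires |slope| < 1."""
    return intercept + slope * mu


def squeeze_bounds(lower, upper, mapping, rounds):
    """Apply the same map to both bounds each round and record the intervals."""
    path = [(lower, upper)]
    for _ in range(rounds):
        lower, upper = mapping(lower), mapping(upper)
        path.append((lower, upper))
    return path


# Illustrative starting bounds, playing the role of the initial belief interval.
path = squeeze_bounds(lower=-5.0, upper=5.0, mapping=toy_map, rounds=40)
final_lower, final_upper = path[-1]

# The unique fixed point of the toy map solves mu = intercept + slope * mu.
fixed_point = -0.6 / (1.0 - 0.4)
```

In this toy version the interval width shrinks by the factor `slope` every round, so both bounds approach `fixed_point`; the proof obtains the analogous squeeze by alternating Lemma OA.10 and Lemma OA.9.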

Proof.

Let $\mu_2^{l,[1]} := \underline{\mu}_2^\circ$ and $\mu_2^{h,[1]} := \bar{\mu}_2^\circ$. For $k = 2, 3, \ldots$, iteratively define $\mu_2^{l,[k]} := I(\mu_2^{l,[k-1]}; \gamma)$ and $\mu_2^{h,[k]} := I(\mu_2^{h,[k-1]}; \gamma)$.

From Lemma OA.10, if $\lim_{t \to \infty} \tilde{M}_t(\diamondsuit_{[\mu_2^{l,[k]}, \mu_2^{h,[k]}]}) = 1$ almost surely, then $\liminf_{t \to \infty} \tilde{C}_t \ge C(\mu_1^\bullet, \mu_2^{l,[k]}; \gamma)$ and $\limsup_{t \to \infty} \tilde{C}_t \le C(\mu_1^\bullet, \mu_2^{h,[k]}; \gamma)$ almost surely. Using these conclusions in Lemma OA.9, we further deduce that
\[
\lim_{t \to \infty} \tilde{M}_t\left( \diamondsuit_{[\mu_2^*(C(\mu_1^\bullet, \mu_2^{l,[k]}; \gamma)),\; \mu_2^*(C(\mu_1^\bullet, \mu_2^{h,[k]}; \gamma))]} \right) = 1
\]
almost surely, that is to say $\lim_{t \to \infty} \tilde{M}_t(\diamondsuit_{[\mu_2^{l,[k+1]}, \mu_2^{h,[k+1]}]}) = 1$ almost surely.

The sequences $(\mu_2^{l,[k]})_{k \ge 1}$ and $(\mu_2^{h,[k]})_{k \ge 1}$ are the iterates of a contraction map, so $\lim_{k \to \infty} \mu_2^{l,[k]} = \mu_2^\infty = \lim_{k \to \infty} \mu_2^{h,[k]}$. Thus, the agent's posterior converges in $L^1$ to the line segment with slope $-\gamma$ containing $(\mu_1^\bullet, \mu_2^\infty)$ almost surely (since the support of the prior is bounded). In addition, the sequences of bounds on asymptotic actions also converge by continuity, $\lim_{k \to \infty} C(\mu_1^\bullet, \mu_2^{l,[k]}; \gamma) = c^\infty = \lim_{k \to \infty} C(\mu_1^\bullet, \mu_2^{h,[k]}; \gamma)$. This implies $\lim_{t \to \infty} \tilde{C}_t = c^\infty$ almost surely.

Finally, combining the asymptotic belief result with Lemma OA.4, we see that in fact $\tilde{M}_t$ converges in $L^1$ to the point $(\mu_1^\bullet, \mu_2^\infty)$ almost surely.

OA 4 Foundation for Inference and Behavior in the Large-Generation Environment

In Section 4, I introduced the large-generations social-learning environment with a continuum of agents in each generation. When agents in generations $\tau = 0, 1, \ldots, t-1$ use the cutoff thresholds $c_{[0]}, c_{[1]}, \ldots, c_{[t-1]}$, each generation-$t$ agent observes an infinite sample of histories $(h_{\tau,n})_{n \in [0,1]}$ drawn from the history distribution $\mathcal{H}^\bullet(c_{[\tau]})$ for each $0 \le \tau \le t-1$. Agents infer the large-generations pseudo-true fundamentals $\mu_1^*(c_{[0]}, \ldots, c_{[t-1]})$, $\mu_2^*(c_{[0]}, \ldots, c_{[t-1]})$ and choose the stopping strategy that best responds to the feasible model with these parameters.

In this section, I provide a finite-population foundation for inference and behavior in the large-generations environment for the Gaussian case. For $K \ge 1$, let $c_\dagger = (c_\dagger^{(k)})_{k=1}^{K} \in \mathbb{R}^K$ be a list of cutoff thresholds. I show that when an agent starts with a full-support prior on the space of fundamentals $\mathbb{R}^2$ and observes $N < \infty$ histories drawn i.i.d. from each of $\mathcal{H}^\bullet(c_\dagger^{(k)})$ for $1 \le k \le K$, her posterior belief almost surely converges to the dogmatic belief on the large-generations pseudo-true fundamentals $\mu_1^*(c_\dagger), \mu_2^*(c_\dagger)$ as $N \to \infty$. Also, if she chooses the cutoff strategy $S_c$ maximizing her posterior expected payoffs, then as $N \to \infty$, and provided the stage-game payoff functions $u_1, u_2$ are Lipschitz continuous, her cutoff choice almost surely converges to $C(\mu_1^*(c_\dagger), \mu_2^*(c_\dagger); \gamma)$.

OA 4.1 Setting up the Probability Space

Suppose an agent has a full-support prior density $m : \mathbb{R}^2 \to \mathbb{R}_{>0}$ over fundamentals $(\mu_1, \mu_2)$.

To formally define the problem, consider the $\mathbb{R}^{2K}$-valued stochastic process $(X_n)_{n \ge 1} = (X_{1,n}^{(k)}, X_{2,n}^{(k)})_{1 \le k \le K,\, n \ge 1}$, where $X_s$ and $X_{s'}$ are independent for $s \ne s'$. Here, the $X_n$ are i.i.d. $\mathbb{R}^{2K}$-valued random variables with independent components, distributed as $X_{1,n}^{(k)} \sim N(\mu_1^\bullet, \sigma^2)$, $X_{2,n}^{(k)} \sim N(\mu_2^\bullet, \sigma^2)$ for each $1 \le k \le K$. The interpretation is that there are $K$ different populations, who play the stage game using different cutoff thresholds. The random variables $(X_{1,n}^{(k)}, X_{2,n}^{(k)})$ are the potential draws in the $n$-th iteration of the stage game in population $k$ (but $X_{2,n}^{(k)}$ may not be observed if $X_{1,n}^{(k)}$ is sufficiently large). Clearly, there is a probability space $(\Omega, \mathcal{A}, P)$, with sample space $\Omega = (\mathbb{R}^{2K})^\infty$ interpreted as paths of the process just described, $\mathcal{A}$ the Borel $\sigma$-algebra on $\Omega$, and $P$ the measure on sample paths so that the process $X_n(\omega) = \omega_n$ has the desired distribution. The term "almost surely" means "with probability 1 with respect to the realization of the infinite sequence of all (potential) draws", i.e. $P$-almost surely.

For each $n \ge 1$ and $1 \le k \le K$, let $H_n^{(k)}$ be the (random) history given by $H_n^{(k)} = (X_{1,n}^{(k)}, \emptyset)$ if $X_{1,n}^{(k)} \ge c_\dagger^{(k)}$, and $H_n^{(k)} = (X_{1,n}^{(k)}, X_{2,n}^{(k)})$ if $X_{1,n}^{(k)} < c_\dagger^{(k)}$. Let $H_n = (H_n^{(1)}, \ldots, H_n^{(K)})$. After each finite $N$, the agent Bayesian-updates the prior density $m$ about the fundamentals, based on the finite dataset of histories $(H_n)_{n \le N}$. She ends up with a random, non-degenerate posterior density $\tilde{m}_N = m(\cdot \mid (H_n)_{n \le N})$, whose randomness comes from the randomness of the $2K \cdot N$ potential draws.

OA 4.2 Inference after Observing Large Samples

Proposition OA.3 shows that as $N \to \infty$, the random posterior $\tilde{m}_N$ converges in $L^1$ to the large-generations pseudo-true fundamentals.

Proposition OA.3.

Suppose $m : \mathbb{R}^2 \to \mathbb{R}_{>0}$ is integrable and has bounded magnitude. Almost surely,
\[
\lim_{N \to \infty} E_{(\mu_1, \mu_2) \sim \tilde{m}_N}\left( |\mu_1 - \mu_1^\bullet| + |\mu_2 - \mu_2^*(c_\dagger)| \right) = 0.
\]

Belief convergence in $L^1$ is required to later establish convergence of behavior in Proposition OA.5. This convergence does not follow from Berk (1966), because his result only establishes convergence in a weaker mode: for any open set containing the pseudo-true fundamentals, the mass that the posterior belief assigns to the open set almost surely converges to 1. Crucially, the prior distribution in this setting has full support on an unbounded domain of feasible fundamentals, $(\mu_1, \mu_2) \in \mathcal{M} = \mathbb{R}^2$. Indeed, one of the implications of my central inference result, Proposition 2, is that the pseudo-true parameter becomes unboundedly pessimistic as the censoring threshold decreases. So, the weak mode of convergence in Berk (1966)'s conclusion leaves open the possibility that posterior beliefs for increasing $N$ put decreasing mass on increasingly extreme values of $\mu_2$. If the magnitudes of these extreme values grow more quickly in $N$ than the speed with which probability concentrates on the open set around the pseudo-true fundamentals, then there can be a positive-probability event where the agent's behavior is bounded away from $C(\mu_1^\bullet, \mu_2^*(c_\dagger); \gamma)$ for every $N$.

Instead, I apply Bunke and Milhaud (1998)'s results to derive the stronger convergence in $L^1$ that subsequently allows for convergence of payoffs and behavior as the agent's sample grows large. One technical challenge is that the results of Bunke and Milhaud (1998) only apply in environments where observables are valued in some Euclidean space and given by densities, but censored histories are valued in the space of histories and their distributions have a probability mass on the missing-data indicator $\emptyset$. So, I first consider a noise-added observation structure where each history $H_n^{(k)}$ is replaced by the $\mathbb{R}^2$-valued pair $(X_{1,n}^{(k)}, Y_n^{(k)})$, where $Y_n^{(k)} = X_{2,n}^{(k)}$ if $X_{1,n}^{(k)} < c_\dagger^{(k)}$. But if $X_{1,n}^{(k)} \ge c_\dagger^{(k)}$, then $Y_n^{(k)} \sim N(0, 1)$ is a white noise term that is independent of the draws of any decision problem. The idea is that a censored draw is replaced by noise that is uninformative about the fundamentals, so the distribution of each $(X_{1,n}^{(k)}, Y_n^{(k)})$ pair is given by a density function on $\mathbb{R}^2$. After establishing the analogous belief convergence result in the auxiliary environment, I map the result back into the environment of observing censored histories. This translation is possible because in every finite dataset, the realizations of the white noise terms do not change the relative likelihoods of data under different parameters $(\mu_1, \mu_2)$, hence they do not affect the agent's posterior belief over fundamentals.

I now formally define this noise-added observation structure that replaces censored $X_2$'s with white noise. Let $P_{Z^\infty}$ be the measure on $(\mathbb{R}^\infty)^K$ corresponding to the product of $K$ i.i.d. sequences of $N(0, 1)$ random variables. Consider the expanded probability space $(\bar{\Omega}, \bar{\mathcal{A}}, \bar{P})$ where $\bar{\Omega} = \Omega \times (\mathbb{R}^\infty)^K$, $\bar{\mathcal{A}}$ is the product $\sigma$-algebra on $\bar{\Omega}$ where $(\mathbb{R}^\infty)^K$ is endowed with the usual product Borel $\sigma$-algebra, and $\bar{P}$ is the product measure $P \otimes P_{Z^\infty}$ on $\bar{\Omega}$. To interpret, each element $\bar{\omega} = (\omega, z) \in \bar{\Omega}$ consists of the sample path of a sequence of potential draws $(X_n)_{n=1}^\infty$ as well as the sample path of a sequence of white noise realizations $(Z_n)_{n=1}^\infty$, where each $Z_n$ is an $\mathbb{R}^K$-valued random variable.

On the expanded probability space, we can define two kinds of observations. The history dataset of size $N$ is $(H_n(\bar{\omega}))_{n \le N} = (H_n(\omega_n))_{n \le N}$, as the $K$ round-$n$ histories only depend on $\omega_n$ (and not on the white noise $z_n$). The noise-added dataset of size $N$ is $(X_{1,n}(\omega), Y_n(\omega, z))_{n \le N} = (X_{1,n}(\omega_n), Y_n(\omega_n, z_n))_{n \le N}$. Write $\tilde{m}_N^{H}$ for the posterior density from the history dataset of size $N$, and $\tilde{m}_N^{XY}$ for the posterior density from the noise-added dataset of size $N$. The next lemma formalizes the idea that replacing censored observations with white noise does not affect posterior beliefs.

Lemma OA.11.

For every $\bar{\omega} \in \bar{\Omega}$ and $N \in \mathbb{N}$, $\tilde{m}_N^{H}(\bar{\omega}) = \tilde{m}_N^{XY}(\bar{\omega})$.

Proof. Suppose $\bar{\omega} = ((x_{1,n}, x_{2,n})_{n=1}^\infty, (z_n)_{n=1}^\infty) \in \Omega \times (\mathbb{R}^\infty)^K$. The noise-added dataset of size $N$ is $(x_{1,n}, y_n)_{n=1}^N$ where $y_n^{(k)} = z_n^{(k)}$ for each $n, k$ where $x_{1,n}^{(k)} \ge c_\dagger^{(k)}$, and $y_n^{(k)} = x_{2,n}^{(k)}$ for each $n, k$ where $x_{1,n}^{(k)} < c_\dagger^{(k)}$. The history dataset of size $N$ is $(h_n)_{n=1}^N$, where $h_n^{(k)} = (x_{1,n}^{(k)}, \emptyset)$ for each $n, k$ where $x_{1,n}^{(k)} \ge c_\dagger^{(k)}$, and $h_n^{(k)} = (x_{1,n}^{(k)}, x_{2,n}^{(k)})$ for each $n, k$ where $x_{1,n}^{(k)} < c_\dagger^{(k)}$.

The likelihood of the noise-added dataset under parameters $\mu_1, \mu_2$ is:
\[
\prod_{k=1}^{K} \left[ \left( \prod_{n=1}^{N} \phi(x_{1,n}^{(k)}; \mu_1, \sigma^2) \right) \cdot \left( \prod_{n \,:\, x_{1,n}^{(k)} < c_\dagger^{(k)}} \phi(y_n^{(k)}; \mu_2 - \gamma(x_{1,n}^{(k)} - \mu_1), \sigma^2) \right) \cdot \left( \prod_{n \,:\, x_{1,n}^{(k)} \ge c_\dagger^{(k)}} \phi(y_n^{(k)}; 0, 1) \right) \right].
\]
The likelihood of the history dataset under parameters $\mu_1, \mu_2$ is:
\[
\prod_{k=1}^{K} \left[ \left( \prod_{n=1}^{N} \phi(x_{1,n}^{(k)}; \mu_1, \sigma^2) \right) \cdot \left( \prod_{n \,:\, x_{1,n}^{(k)} < c_\dagger^{(k)}} \phi(y_n^{(k)}; \mu_2 - \gamma(x_{1,n}^{(k)} - \mu_1), \sigma^2) \right) \right],
\]
where $y_n^{(k)} = x_{2,n}^{(k)}$ on the uncensored rounds. The two likelihoods differ only by the factor $\prod_{k=1}^{K} \left( \prod_{n \,:\, x_{1,n}^{(k)} \ge c_\dagger^{(k)}} \phi(y_n^{(k)}; 0, 1) \right)$, which is common across all parameters $(\mu_1, \mu_2)$. So the posterior likelihood of parameters $\mu_1, \mu_2$ must be the same under both $\tilde{m}_N^{H}$ and $\tilde{m}_N^{XY}$, that is $\tilde{m}_N^{H}(\bar{\omega}) = \tilde{m}_N^{XY}(\bar{\omega})$.

On the expanded probability space, inference from the history dataset and inference from the noise-added dataset give the same posterior beliefs everywhere. If $\tilde{m}_N^{XY}$ converges in $L^1$ to the dogmatic belief on $(\mu_1^\bullet, \mu_2^*(c_\dagger))$ $\bar{P}$-a.s., then $\tilde{m}_N^{H}$ also converges in $L^1$ to the same belief $\bar{P}$-a.s. Further, by the relationship between the expanded probability space and the original probability space, this would also show that $\tilde{m}_N$ converges in $L^1$ to the dogmatic belief on $(\mu_1^\bullet, \mu_2^*(c_\dagger))$ $P$-a.s., which proves Proposition OA.3. Therefore, to prove Proposition OA.3 one just needs the following on the expanded probability space.

Lemma OA.12. $\tilde{m}_N^{XY}$ converges in $L^1$ to the dogmatic belief on $(\mu_1^\bullet, \mu_2^*(c_\dagger))$ $\bar{P}$-a.s.

Proof.
First, I write down the KL divergence objective in the noise-added observation structure and show its minimizers are exactly the large-generations pseudo-true fundamentals. Each observation $(X_{1,n}^{(k)}, Y_n^{(k)})_{k=1}^{K}$ is an element of $\mathbb{R}^{2K}$, whose distribution is given by the product of $K$ densities over $K$ copies of $\mathbb{R}^2$. For $1 \le k \le K$, the $k$-th such density is
\[
f^{\bullet,(k)}(x, y) =
\begin{cases}
\phi(x; \mu_1^\bullet, \sigma^2) \cdot \phi(y; \mu_2^\bullet, \sigma^2) & \text{if } x < c_\dagger^{(k)} \\
\phi(x; \mu_1^\bullet, \sigma^2) \cdot \phi(y; 0, 1) & \text{if } x \ge c_\dagger^{(k)}.
\end{cases}
\]
Under the fundamentals $(\hat{\mu}_1, \hat{\mu}_2) \in \mathbb{R}^2$, the agent thinks the observations are distributed according to the product of $K$ densities where the $k$-th density is
\[
f^{(k)}_{\hat{\mu}_1, \hat{\mu}_2}(x, y) =
\begin{cases}
\phi(x; \hat{\mu}_1, \sigma^2) \cdot \phi(y; \hat{\mu}_2 - \gamma \cdot (x - \hat{\mu}_1), \sigma^2) & \text{if } x < c_\dagger^{(k)} \\
\phi(x; \hat{\mu}_1, \sigma^2) \cdot \phi(y; 0, 1) & \text{if } x \ge c_\dagger^{(k)}.
\end{cases}
\]
The log likelihood ratio of an observation $(x, y) = (x_1^{(k)}, y^{(k)})_{k=1}^{K} \in \mathbb{R}^{2K}$ is
\[
\ln\left( \prod_{k=1}^{K} \frac{f^{\bullet,(k)}(x_1^{(k)}, y^{(k)})}{f^{(k)}_{\hat{\mu}_1, \hat{\mu}_2}(x_1^{(k)}, y^{(k)})} \right) = \sum_{k=1}^{K} \ln\left( \frac{f^{\bullet,(k)}(x_1^{(k)}, y^{(k)})}{f^{(k)}_{\hat{\mu}_1, \hat{\mu}_2}(x_1^{(k)}, y^{(k)})} \right).
\]
So the KL divergence is defined as
\[
\begin{aligned}
&\int_{\mathbb{R}^{2K}} \left[ \sum_{k=1}^{K} \ln\left( \frac{f^{\bullet,(k)}(x_1^{(k)}, y^{(k)})}{f^{(k)}_{\hat{\mu}_1, \hat{\mu}_2}(x_1^{(k)}, y^{(k)})} \right) \right] \cdot \left( \prod_{k=1}^{K} f^{\bullet,(k)}(x_1^{(k)}, y^{(k)}) \right) d(x, y) \\
&\quad = \sum_{k=1}^{K} \int_{\mathbb{R}^{2K}} \ln\left( \frac{f^{\bullet,(k)}(x_1^{(k)}, y^{(k)})}{f^{(k)}_{\hat{\mu}_1, \hat{\mu}_2}(x_1^{(k)}, y^{(k)})} \right) \cdot \left( \prod_{j=1}^{K} f^{\bullet,(j)}(x_1^{(j)}, y^{(j)}) \right) d(x, y).
\end{aligned}
\]
For each $k$, the integrand $\ln\left( f^{\bullet,(k)}(x_1^{(k)}, y^{(k)}) / f^{(k)}_{\hat{\mu}_1, \hat{\mu}_2}(x_1^{(k)}, y^{(k)}) \right)$ only depends on $(x, y) \in \mathbb{R}^{2K}$ through two of its coordinates, $x_1^{(k)}$ and $y^{(k)}$. In addition, the density $\prod_{j=1}^{K} f^{\bullet,(j)}(x_1^{(j)}, y^{(j)})$ is a product density, so in fact the $k$-th summand is just
\[
\int_{\mathbb{R}^2} \ln\left( \frac{f^{\bullet,(k)}(x_1^{(k)}, y^{(k)})}{f^{(k)}_{\hat{\mu}_1, \hat{\mu}_2}(x_1^{(k)}, y^{(k)})} \right) \cdot f^{\bullet,(k)}(x_1^{(k)}, y^{(k)}) \, d(x_1^{(k)}, y^{(k)}).
\]
This expression is, up to a constant not depending on $\hat{\mu}_1, \hat{\mu}_2$ (due to the white noise term), equal to the KL divergence between $\mathcal{H}^\bullet(c_\dagger^{(k)})$ and $\mathcal{H}(\Psi(\hat{\mu}_1, \hat{\mu}_2; \gamma); c_\dagger^{(k)})$. Therefore the overall KL divergence is off by a constant from
\[
\sum_{k=1}^{K} D_{KL}\left( \mathcal{H}^\bullet(c_\dagger^{(k)}) \,\|\, \mathcal{H}(\Psi(\hat{\mu}_1, \hat{\mu}_2; \gamma); c_\dagger^{(k)}) \right),
\]
the objective defining large-generations pseudo-true fundamentals in Equation (1).

To finish the proof, Bunke and Milhaud (1998) show that provided the true density $f^\bullet$ and the family of subjective densities $\{ f_{\hat{\mu}_1, \hat{\mu}_2} : \hat{\mu}_1, \hat{\mu}_2 \in \mathbb{R} \}$ satisfy a number of conditions, then $\tilde{m}_N^{XY}$ $\bar{P}$-a.s. converges in $L^1$ to its KL-divergence minimizers, which I have shown to be exactly $(\mu_1^\bullet, \mu_2^*(c_\dagger))$. I now check the conditions of Bunke and Milhaud (1998) for the case of $K = 1$, so both $f^\bullet$ and $f_{\hat{\mu}_1, \hat{\mu}_2}$ are densities on $\mathbb{R}^2$.
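Before the conditions are checked, it may help to see what the KL minimizer looks like when $K = 1$. The following is a sketch worked out from the two Gaussian densities stated above, not a formula quoted from the paper: the first-order conditions of the KL objective give $\hat{\mu}_1 = \mu_1^\bullet$ and $\hat{\mu}_2 = \mu_2^\bullet + \gamma \left( E[X_1 \mid X_1 < c_\dagger] - \mu_1^\bullet \right) = \mu_2^\bullet - \gamma \sigma \, \varphi(z)/\Phi(z)$ with $z = (c_\dagger - \mu_1^\bullet)/\sigma$, where $\varphi$ and $\Phi$ denote the standard normal density and distribution function. The code evaluates this closed form and compares it with a simulated sample analogue; the function names and all parameter values are illustrative assumptions.

```python
import math
import random


def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def pseudo_true_mu2(mu1_true, mu2_true, sigma, gamma, cutoff):
    """Closed-form KL minimizer for mu2 when K = 1 (see the lead-in derivation)."""
    z = (cutoff - mu1_true) / sigma
    # E[X1 | X1 < cutoff] for X1 ~ N(mu1_true, sigma^2)
    truncated_mean = mu1_true - sigma * std_normal_pdf(z) / std_normal_cdf(z)
    return mu2_true + gamma * (truncated_mean - mu1_true)


def simulated_mu2(mu1_true, mu2_true, sigma, gamma, cutoff, n, seed=0):
    """Sample analogue of the first-order condition, using uncensored rounds only."""
    rng = random.Random(seed)
    total, count = 0.0, 0
    for _ in range(n):
        x1 = rng.gauss(mu1_true, sigma)
        if x1 < cutoff:  # the second draw is observed only after a low first draw
            x2 = rng.gauss(mu2_true, sigma)
            total += x2 + gamma * (x1 - mu1_true)
            count += 1
    return total / count


# Illustrative values: true means 0, sigma = 1, gamma = 0.5, cutoff at the true mean.
exact = pseudo_true_mu2(0.0, 0.0, 1.0, 0.5, cutoff=0.0)
approx = simulated_mu2(0.0, 0.0, 1.0, 0.5, cutoff=0.0, n=200_000)
```

Under these assumptions `exact` lies below the true mean of the second draw, and lowering `cutoff` pushes it further down, in line with the over-pessimism discussed around Proposition 2.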
Checking the conditions for larger K isexactly analogous, because both f • and f ˆ µ , ˆ µ can be separated as the product of K densitieson R .From the hypothesis on m ’s magnitude being bounded, there is some B < ∞ so that0 < m ( µ , µ ) < B for all µ , µ ∈ R . The parameter space is Θ = R . The data-generatingdensity of observation ( x, y ) is: f • ( x, y ) =  φ ( x ; µ • , σ ) · φ ( y ; µ • , σ ) if x < c † φ ( x ; µ • , σ ) · φ ( y ; 0 ,

1) if x ≥ c † where φ ( · ; µ, σ ) is the Gaussian density with mean µ and variance σ . Under the feasiblemodel Ψ(ˆ µ , ˆ µ ; γ ), the same observation has density: f ˆ µ , ˆ µ ( x, y ) =  φ ( x ; ˆ µ , σ ) · φ ( y ; ˆ µ − γ · ( x − ˆ µ ) , σ ) if x < c † φ ( x ; ˆ µ , σ ) · φ ( y ; 0 ,

1) if x ≥ c † . A1 . Parameter space is a closed, convex set in R with nonempty interior. The density f ˆ µ , ˆ µ ( x, y ) is bounded over (ˆ µ , ˆ µ , x, y ) and its carrier, { ( x, y ) : f ˆ µ , ˆ µ ( x, y ) > } is the samefor all ˆ µ , ˆ µ .Evidently R is closed in itself. The density f ˆ µ , ˆ µ ( x, y ) is bounded by the product ofthe modes of Gaussian densities with variance σ and variance 1. The density f ˆ µ , ˆ µ ( x, y ) is40trictly positive on R for any parameter values ˆ µ , ˆ µ . A2 . For all ˆ µ , ˆ µ , there is a sphere S [(ˆ µ , ˆ µ ) , η ] of center (ˆ µ , ˆ µ ) and radius η > E f •  sup ( µ ,µ ) ∈ S [(ˆ µ , ˆ µ ) ,η ] (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ln f • ( X, Y ) f µ ,µ ( X, Y ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) < ∞ . Pick say η = 1. Consider the rectangle R [(ˆ µ , ˆ µ ) , η ] consisting of points ( µ , µ ) such that | µ − ˆ µ | < η and | µ − ˆ µ | < η . Since the the Gaussian distribution is single-peaked, for any( x, y ) ∈ R the absolute value of the log likelihood ratio (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ln f • ( X,Y ) f µ ,µ ( X,Y ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) on all of R [(ˆ µ , ˆ µ ) , η ]must be bounded by its value at the 4 corners. 
That is to say,sup ( µ ,µ ) ∈ S [(ˆ µ , ˆ µ ) ,η ] (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ln f • ( X, Y ) f µ ,µ ( X, Y ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ sup ( µ ,µ ) ∈ R [(ˆ µ , ˆ µ ) ,η ] (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ln f • ( X, Y ) f µ ,µ ( X, Y ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ln f • ( X, Y ) f ˆ µ − η, ˆ µ − η ( X, Y ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) + (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ln f • ( X, Y ) f ˆ µ − η, ˆ µ + η ( X, Y ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) + (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ln f • ( X, Y ) f ˆ µ + η, ˆ µ − η ( X, Y ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) + (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ln f • ( X, Y ) f ˆ µ + η, ˆ µ + η ( X, Y ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) . It is easy to see that for any ﬁxed parameter E f • "(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ln f • ( X,Y ) f µ ,µ ( X,Y ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) is ﬁnite, so the sum of these4 terms gives a ﬁnite upper bound. A3 . For all ﬁxed ( x , y ) ∈ R , the map from parameters to density ( µ , µ ) f µ ,µ ( x , y )has continuous derivatives with respect to parameters ( µ , µ ) ∂f∂µ ( x , y ; µ , µ ), ( µ , µ ) ∂f∂µ ( x, y ; µ , µ ). There exist positive constants κ and b with Z Z (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) [ f µ ,µ ( x, y )] − ·  ∂f∂µ ( x, y ; µ , µ ) ∂f∂µ ( x, y ; µ , µ ) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) · f µ ,µ ( x, y ) · dydx < κ (1 + || ( µ , µ ) || b )satisﬁed for every ( µ , µ ) ∈ R , where || · || is a norm on R .Let’s choose the max norm, || v || = max( | v | , | v | ). For uncensored data ( x , y ) with x < c † , we can compute ∂f∂µ ( x , y ; µ , µ ) = f µ ,µ ( x , y ) · " (1 + γ ) σ · ( x − µ ) + γσ · ( y − µ ) and ∂f∂µ ( x , y ; µ , µ ) = f µ ,µ ( x , y ) · (cid:20) γσ · ( x − µ ) − σ · ( y − µ ) (cid:21) . 
While for censored data ( x , y ) where x > c † , the likelihood of the data is unchangedby parameter µ since it neither changes the distribution of the early draw quality nor the41istribution of the white noise term, meaning ∂f∂µ ( x , y ; µ , µ ) = 0. Also, for the censoredcase ∂f∂µ ( x , y ; µ , µ ) = f µ ,µ ( x , y ) · σ ( x − µ ) . This means the integral to be bounded is: Z x = c † x = −∞ Z ∞−∞ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) (1+ γ ) σ · ( x − µ ) + γσ · ( y − µ ) γσ · ( x − µ ) − σ · ( y − µ ) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) · f µ ,µ ( x, y ) · dy  dx + Z x = ∞ x = c † (cid:20)Z ∞−∞ ( 1 σ ( x − µ )) · f µ ,µ ( x, y ) · dy (cid:21) dx. Since the inner integrals are non-negative, this expression is smaller than the version wherethe domains of the outer integrals are expanded and the densities f µ ,µ ( x, y ) are simplyreplaced with the joint density on R of the feasible model for Ψ( µ , µ ; γ ), which I denoteas ˜ f µ ,µ ( x, y ). Z ∞−∞ Z ∞−∞ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) (1+ γ ) σ · ( x − µ ) + γσ · ( y − µ ) γσ · ( x − µ ) − σ · ( y − µ ) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) · ˜ f µ ,µ ( x, y ) · dy  dx + Z ∞−∞ (cid:20)Z ∞−∞ ( 1 σ ( x − µ )) · ˜ f µ ,µ ( x, y ) · dy (cid:21) dx. The second summand is a 12th moment of the joint normal random variable with distributionΨ( µ , µ ; γ ), so for all µ , µ it is given by some 12th order polynomial P ( µ , µ ). Similarlythe ﬁrst summand is also given by a 12th order polynomial P ( µ , µ ). Therefore by choosing b = 12 and choosing κ appropriately according to the coeﬃcients in P and P , we achievedthe desired bound. A4 . 
For some positive constants b and κ , the aﬃnity function A ( µ , µ ) := Z Z [ f µ ,µ ( x, y ) · f • ( x, y )] / dydx satisﬁes A ( µ , µ ) < κ · || ( µ , µ ) || − b for all µ , µ .We have A ( µ , µ ) ≤ [ R R [ f µ ,µ ( x, y ) · f • ( x, y )] dydx ] / , so it’s suﬃcient to ﬁnd some κ b that works to bound R R [ f µ ,µ ( x, y ) · f • ( x, y )] dydx . We have: Z Z [ f µ ,µ ( x, y ) · f • ( x, y )] dydx = Z c † x = −∞ Z ∞−∞ φ ( x ; µ , σ ) · φ ( x ; µ • , σ ) · φ ( y ; µ − γ ( x − µ ) , σ ) · φ ( y ; µ • , σ ) dydx + Z ∞ x = c † Z ∞−∞ φ ( x ; µ , σ ) · φ ( x ; µ • , σ ) · φ ( y ; 0 , · φ ( y ; 0 , dydx ≤ Z ∞−∞ Z ∞−∞ φ ( x ; µ , σ ) · φ ( x ; µ • , σ ) · φ ( y ; µ − γ ( x − µ ) , σ ) · φ ( y ; µ • , σ ) dydx + Z ∞−∞ Z ∞−∞ φ ( x ; µ , σ ) · φ ( x ; µ • , σ ) · φ ( y ; 0 , · φ ( y ; 0 , dydx. I show how to ﬁnd κ and b to bound the ﬁrst summand in the last expression above. It iseasy to similarly bound the second summand. By Bromiley (2018), the product of Gaussiandensities φ ( y ; µ − γ ( x − µ ) , σ ) · φ ( y ; µ • , σ ) is itself a Gaussian density in y , ˜ φ ( y ), multipliedby a scaling factor equal to (4 πσ ) − / · exp (cid:16) − γ σ · [ x − ( µ − µ • γ + µ γ )] (cid:17) . So we have Z ∞−∞ Z ∞−∞ φ ( x ; µ , σ ) · φ ( x ; µ • , σ ) · φ ( y ; µ − γ ( x − µ ) , σ ) · φ ( y ; µ • , σ ) dydx = Z ∞−∞ φ ( x ; µ , σ ) · φ ( x ; µ • , σ ) · (cid:16) πσ (cid:17) − / · exp − γ σ · [ x − ( µ − µ • γ + µ γ )] ! · Z ∞−∞ · ˜ φ ( y ) dydx = (cid:16) πσ (cid:17) − / · Z ∞−∞ φ ( x ; µ , σ ) · φ ( x ; µ • , σ ) · exp − γ σ · [ x − ( µ − µ • γ + µ γ )] ! · dx. Again applying Bromiley (2018), product of the two Gaussian densities φ ( x ; µ , σ ) · φ ( x ; µ • , σ )is another Gaussian density with mean µ • + µ , variance σ , and multiplied to a scaling factorof (4 πσ ) − / exp (cid:16) − ( µ − µ • ) σ (cid:17) . So above expression is: K · exp − ( µ − µ • ) σ ! · Z ∞−∞ φ ( x ; µ • + µ , σ · exp − γ σ · [ x − ( µ − µ • γ + µ γ )] ! dx where K is a constant not dependent on µ , µ . 
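The product-of-Gaussians fact used repeatedly in this step can be sanity-checked numerically. The sketch below assumes equal variances, as in this part of the proof, and compares $\phi(x; a, s^2) \cdot \phi(x; b, s^2)$ with $(4\pi s^2)^{-1/2} \exp\left(-\frac{(a-b)^2}{4 s^2}\right) \cdot \phi\left(x; \frac{a+b}{2}, \frac{s^2}{2}\right)$ on a grid of points. The function names and the test values are illustrative and do not come from the original.

```python
import math


def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def product_as_scaled_gaussian(x, a, b, var):
    """Right-hand side of the equal-variance product identity."""
    scale = (4.0 * math.pi * var) ** -0.5 * math.exp(-(a - b) ** 2 / (4.0 * var))
    return scale * normal_pdf(x, (a + b) / 2.0, var / 2.0)


def max_identity_gap(a, b, var, grid):
    """Largest absolute gap between the two sides over a grid of x values."""
    return max(
        abs(normal_pdf(x, a, var) * normal_pdf(x, b, var)
            - product_as_scaled_gaussian(x, a, b, var))
        for x in grid
    )


grid = [i / 10.0 for i in range(-50, 51)]
gap = max_identity_gap(a=0.3, b=-1.2, var=1.7, grid=grid)
```

The scaling factor decays like $\exp(-(a-b)^2/(4 s^2))$ as the two means move apart, which is the source of the exponential decay in $|\mu_1 - \mu_1^\bullet|$ that the bound on the affinity function relies on.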
Also, we may writeexp − γ σ · [ x − ( µ − µ • γ + µ γ )] ! = K · φ ( x ; ( µ − µ • γ + µ γ ) , σ B )where σ B = σ γ and K = (2 πσ B ) / . Applying Bromiley (2018) one ﬁnal time, the product φ ( x ; µ • + µ , σ ) · φ ( x ; ( µ − µ • γ + µ γ ) , σ B ) is a Gaussian density in x scaled by K exp( − K · ( µ • − µ − µ − µ • γ ) ) where K , K > µ , µ . So altogether,the second summand we are bounding is a constant multiple of exp (cid:16) − ( µ − µ • ) σ (cid:17) · exp( − K · ( µ • − µ − µ − µ • γ ) ). For | µ | ≥ | µ | , the max norm || ( µ , µ ) || = | µ | and exp (cid:16) − ( µ − µ • ) σ (cid:17) | µ | < | µ | , and | µ | − | µ • | + | µ • | γ > − K · ( µ • − µ − µ − µ • γ ) ) ≤ exp( − K · ( | µ | − | µ • | | µ • | γ ) ) . So for large enough | µ | , exp( − K · ( µ • − µ − µ − µ • γ ) ) will decrease exponentially fast in thenorm. These two facts imply that there is some K > || ( µ , µ ) || > K , Z ∞−∞ Z ∞−∞ φ ( x ; µ , σ ) · φ ( x ; µ • , σ ) · φ ( y ; µ − γ ( x − µ ) , σ ) · φ ( y ; µ • , σ ) dydx < || ( µ , µ ) || − . Now put κ = K − and we can ensure for any value of || ( µ , µ ) || we will have Z ∞−∞ Z ∞−∞ φ ( x ; µ , σ ) · φ ( x ; µ • , σ ) · φ ( y ; µ − γ ( x − µ ) , σ ) · φ ( y ; µ • , σ ) dydx < κ ·|| ( µ , µ ) || − . A5 . There are positive constants b , b so that for all ( µ , µ ) and r > m ( S [( µ , µ ) , r ]) ≤ cr b (1 + ( || ( µ , µ ) || + r ) b ). Moreover, g assigns positive mass to everysphere with positive radius.Since we have assumed that density g is bounded by B , the prior mass assigned to thesphere S [( µ , µ ) , r ] is bounded by B times its Euclidean volume. So, take b = 2 and c = πB and the ﬁrst statement is satisﬁed. Since we have assumed that m is strictlypositive everywhere, the second statement is satisﬁed. OA 4.3 Behavior after Observing Large Samples

Next, I turn to the convergence of expected payoffs for different cutoff strategies as the sample size grows large. For any $c \in \mathbb{R}$ and $N \in \mathbb{N}$, let $U_N(c) := \mathbb{E}_{(\mu_1,\mu_2) \sim \tilde{m}_N}[U(c; \mu_1, \mu_2, \gamma)]$, where $U(c; \mu_1, \mu_2, \gamma)$ is the expected payoff of using the stopping strategy $S_c$ when $(X_1, X_2) \sim \Psi(\mu_1, \mu_2; \gamma)$. Note that $U_N(c)$ is a real-valued random variable representing the agent's subjective expected payoff for the stopping strategy $S_c$, under the (random) non-degenerate posterior belief after observing a sample of size $N$. Proposition OA.4 shows that $U_N(c)$ converges almost surely to the subjective expected payoff of $S_c$ with a dogmatic belief in the pseudo-true fundamentals, provided the payoff functions $u_1, u_2$ of the optimal-stopping problem are Lipschitz continuous. Furthermore, this convergence is uniform across all cutoff thresholds.

Proposition OA.4.

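Suppose there are constants $K_1, K_2 > 0$ so that $|u_1(x_1') - u_1(x_1'')| \le K_1 \cdot |x_1' - x_1''|$ and $|u_2(x_1', x_2') - u_2(x_1'', x_2'')| \le K_2 \cdot (|x_1' - x_1''| + |x_2' - x_2''|)$ for all $x_1', x_1'', x_2', x_2'' \in \mathbb{R}$. Then, almost surely, $\sup_{c \in \mathbb{R}} |U_N(c) - U(c; \mu_1^\bullet, \mu_2^*(c^\dagger))| \to 0$ as $N \to \infty$.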
Lemma OA.13.

Suppose there are constants $K_1, K_2 > 0$ so that $|u_1(x_1') - u_1(x_1'')| \le K_1 \cdot |x_1' - x_1''|$ and $|u_2(x_1', x_2') - u_2(x_1'', x_2'')| \le K_2 \cdot (|x_1' - x_1''| + |x_2' - x_2''|)$ for all $x_1', x_1'', x_2', x_2'' \in \mathbb{R}$. For each center $(\mu_1^\circ, \mu_2^\circ) \in \mathbb{R}^2$, there corresponds a constant $K > 0$ so that for any $\mu_1, \mu_2 \in \mathbb{R}$ and any $c \in \mathbb{R}$, $|U(c; \mu_1, \mu_2) - U(c; \mu_1^\circ, \mu_2^\circ)| \le K \cdot (|\mu_1 - \mu_1^\circ| + |\mu_2 - \mu_2^\circ|)$.

Proof. Let $(\mu_1^\circ, \mu_2^\circ) \in \mathbb{R}^2$ be given. For any $\mu_1, \mu_2, c \in \mathbb{R}$, we have

$$U(c; \mu_1, \mu_2) = \int_c^\infty u_1(x_1)\, \phi(x_1; \mu_1, \sigma^2)\, dx_1 + \int_{-\infty}^c \left[ \int_{-\infty}^\infty u_2(x_1, x_2)\, \phi(x_2; \mu_2 - \gamma(x_1 - \mu_1), \sigma^2)\, dx_2 \right] \phi(x_1; \mu_1, \sigma^2)\, dx_1.$$

We first bound $\left| \int_c^\infty u_1(x_1) \phi(x_1; \mu_1, \sigma^2)\, dx_1 - \int_c^\infty u_1(x_1) \phi(x_1; \mu_1^\circ, \sigma^2)\, dx_1 \right|$ by a multiple of $|\mu_1 - \mu_1^\circ|$. Suppose first $\mu_1 = \mu_1^\circ + \Delta$ for some $\Delta > 0$. A change of variables gives

$$\int_c^\infty u_1(x_1)\, \phi(x_1; \mu_1, \sigma^2)\, dx_1 = \int_{c - \Delta}^\infty u_1(x_1 + \Delta)\, \phi(x_1; \mu_1^\circ, \sigma^2)\, dx_1.$$

By Lipschitz continuity of $u_1$, $|u_1(x_1) - u_1(x_1 + \Delta)| \le K_1 \Delta$ for all $x_1 \in \mathbb{R}$. Thus we conclude

$$\left| \int_c^\infty u_1(x_1)\, \phi(x_1; \mu_1, \sigma^2)\, dx_1 - \int_c^\infty u_1(x_1)\, \phi(x_1; \mu_1^\circ, \sigma^2)\, dx_1 \right| \le K_1 \Delta + \left| \int_{c-\Delta}^c u_1(x_1)\, \phi(x_1; \mu_1^\circ, \sigma^2)\, dx_1 \right|.$$

Again by Lipschitz continuity of $u_1$, for any $x_1 \in \mathbb{R}$,

$$|u_1(x_1)\, \phi(x_1; \mu_1^\circ, \sigma^2)| \le (|u_1(0)| + K_1 |x_1|) \cdot \phi(x_1; \mu_1^\circ, \sigma^2).$$

Since the Gaussian density decreases to 0 exponentially fast as $x_1 \to \pm\infty$, the RHS is uniformly bounded for all $x_1 \in \mathbb{R}$ by some constant, say $J_1 > 0$. (Note that the RHS is not a function of $c$, so $J_1$ does not depend on $c$.) This shows that

$$\left| \int_{c-\Delta}^c u_1(x_1)\, \phi(x_1; \mu_1^\circ, \sigma^2)\, dx_1 \right| \le \int_{c-\Delta}^c |u_1(x_1)\, \phi(x_1; \mu_1^\circ, \sigma^2)|\, dx_1 \le \int_{c-\Delta}^c J_1\, dx_1 = J_1 \Delta.$$

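So altogether,

$$\left| \int_c^\infty u_1(x_1)\, \phi(x_1; \mu_1, \sigma^2)\, dx_1 - \int_c^\infty u_1(x_1)\, \phi(x_1; \mu_1^\circ, \sigma^2)\, dx_1 \right| \le (K_1 + J_1) \Delta.$$

If instead $\mu_1 = \mu_1^\circ - \Delta$, then a similar argument shows that

$$\left| \int_c^\infty u_1(x_1)\, \phi(x_1; \mu_1, \sigma^2)\, dx_1 - \int_c^\infty u_1(x_1)\, \phi(x_1; \mu_1^\circ, \sigma^2)\, dx_1 \right| \le K_1 \Delta + \left| \int_c^{c+\Delta} u_1(x_1)\, \phi(x_1; \mu_1^\circ, \sigma^2)\, dx_1 \right|,$$

and again we may bound the second term by $J_1 \Delta$ as before.

We now turn to bounding the difference in the second summand making up $U(c; \mu_1, \mu_2)$. First consider the case where $\mu_2 = \mu_2^\circ$. For each $x_1 \in \mathbb{R}$, let $I(x_1; \mu_1) := \int_{-\infty}^\infty u_2(x_1, x_2)\, \phi(x_2; \mu_2^\circ - \gamma(x_1 - \mu_1), \sigma^2)\, dx_2$, the expected continuation utility after $X_1 = x_1$ in the feasible model $\Psi(\mu_1, \mu_2^\circ; \gamma)$. The second summand in $U(c; \mu_1, \mu_2^\circ)$ is given by $\int_{-\infty}^c I(x_1; \mu_1)\, \phi(x_1; \mu_1, \sigma^2)\, dx_1$. For $x_1'' = x_1' + d_1$ and $\mu_1'' = \mu_1' + d_2$, we have

$$I(x_1''; \mu_1'') = \int_{-\infty}^\infty u_2(x_1'', x_2)\, \phi(x_2; \mu_2^\circ - \gamma(x_1'' - \mu_1''), \sigma^2)\, dx_2 = \int_{-\infty}^\infty u_2(x_1' + d_1, x_2 - \gamma(d_1 - d_2))\, \phi(x_2; \mu_2^\circ - \gamma(x_1' - \mu_1'), \sigma^2)\, dx_2.$$

Lipschitz continuity of $u_2$ implies that

$$|u_2(x_1' + d_1, x_2 - \gamma(d_1 - d_2)) - u_2(x_1', x_2)| \le K_2 ((1 + \gamma) \cdot |d_1| + \gamma |d_2|) \le K_2 (1 + \gamma) \cdot (|d_1| + |d_2|),$$

which shows $|I(x_1'; \mu_1') - I(x_1''; \mu_1'')| \le K_2 (1 + \gamma) \cdot (|x_1' - x_1''| + |\mu_1' - \mu_1''|)$. That is, $I$ is Lipschitz continuous.

Suppose $\mu_1 = \mu_1^\circ + \Delta$ for some $\Delta > 0$. Similar to the above argument bounding the first summand in $U(c; \mu_1, \mu_2)$, we have

$$\int_{-\infty}^c I(x_1; \mu_1)\, \phi(x_1; \mu_1, \sigma^2)\, dx_1 = \int_{-\infty}^{c - \Delta} I(x_1 + \Delta; \mu_1^\circ + \Delta)\, \phi(x_1; \mu_1^\circ, \sigma^2)\, dx_1.$$

By Lipschitz continuity of $I$, $|I(x_1; \mu_1^\circ) - I(x_1 + \Delta; \mu_1^\circ + \Delta)| \le 2 K_2 (1 + \gamma) \Delta$ for all $x_1 \in \mathbb{R}$. Thus we conclude

$$\left| \int_{-\infty}^c I(x_1; \mu_1)\, \phi(x_1; \mu_1, \sigma^2)\, dx_1 - \int_{-\infty}^c I(x_1; \mu_1^\circ)\, \phi(x_1; \mu_1^\circ, \sigma^2)\, dx_1 \right| \le 2 K_2 (1 + \gamma) \Delta + \left| \int_{c-\Delta}^c I(x_1; \mu_1^\circ)\, \phi(x_1; \mu_1^\circ, \sigma^2)\, dx_1 \right|.$$

Since $x_1 \mapsto I(x_1; \mu_1^\circ)$ is Lipschitz continuous, there exists $J_2 > 0$ so that $|I(x_1; \mu_1^\circ)\, \phi(x_1; \mu_1^\circ, \sigma^2)| \le J_2$ for all $x_1 \in \mathbb{R}$, which means $\left| \int_{c-\Delta}^c I(x_1; \mu_1^\circ)\, \phi(x_1; \mu_1^\circ, \sigma^2)\, dx_1 \right| \le J_2 \Delta$. (Once again, $J_2$ does not depend on $c$.) The case of $\mu_1 = \mu_1^\circ - \Delta$ is symmetric, and we have shown that

$$\left| \int_{-\infty}^c I(x_1; \mu_1)\, \phi(x_1; \mu_1, \sigma^2)\, dx_1 - \int_{-\infty}^c I(x_1; \mu_1^\circ)\, \phi(x_1; \mu_1^\circ, \sigma^2)\, dx_1 \right| \le (2 K_2 (1 + \gamma) + J_2) \cdot |\mu_1 - \mu_1^\circ|.$$

Finally, we investigate the difference in the second summand of $U(c; \mu_1, \mu_2)$ between parameters $(\mu_1, \mu_2^\circ)$ and $(\mu_1, \mu_2)$ for $\mu_1, \mu_2 \in \mathbb{R}$. This difference is bounded by

$$\int_{-\infty}^c \left| \int_{-\infty}^\infty u_2(x_1, x_2)\, \phi(x_2; \mu_2^\circ - \gamma(x_1 - \mu_1), \sigma^2)\, dx_2 - \int_{-\infty}^\infty u_2(x_1, x_2)\, \phi(x_2; \mu_2 - \gamma(x_1 - \mu_1), \sigma^2)\, dx_2 \right| \phi(x_1; \mu_1, \sigma^2)\, dx_1. \tag{3}$$

But for every $x_1 \in \mathbb{R}$,

$$\int_{-\infty}^\infty u_2(x_1, x_2)\, \phi(x_2; \mu_2 - \gamma(x_1 - \mu_1), \sigma^2)\, dx_2 = \int_{-\infty}^\infty u_2(x_1, x_2 + (\mu_2 - \mu_2^\circ))\, \phi(x_2; \mu_2^\circ - \gamma(x_1 - \mu_1), \sigma^2)\, dx_2,$$

and $|u_2(x_1, x_2 + (\mu_2 - \mu_2^\circ)) - u_2(x_1, x_2)| \le K_2 |\mu_2 - \mu_2^\circ|$ by Lipschitz continuity of $u_2$. This shows that, for all values $\mu_1, \mu_2 \in \mathbb{R}$, (3) is bounded by $K_2 |\mu_2 - \mu_2^\circ|$.

Applying the triangle inequality to the second term, we conclude that

$$|U(c; \mu_1, \mu_2) - U(c; \mu_1^\circ, \mu_2^\circ)| \le (K_1 + J_1) |\mu_1 - \mu_1^\circ| + (2 K_2 (1 + \gamma) + J_2) \cdot |\mu_1 - \mu_1^\circ| + K_2 |\mu_2 - \mu_2^\circ|.$$

So we see that setting $K = K_1 + J_1 + (2 K_2 (1 + \gamma) + J_2)$ establishes the lemma.

Now I prove Proposition OA.4.

Proof.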

Let $\mu_1^\circ = \mu_1^\bullet$, $\mu_2^\circ = \mu_2^*(c^\dagger)$. Lemma OA.13 implies there is a constant $K > 0$, independent of $c$, so that $|U(c; \mu_1, \mu_2) - U(c; \mu_1^\circ, \mu_2^\circ)| \le K \cdot (|\mu_1 - \mu_1^\circ| + |\mu_2 - \mu_2^\circ|)$ for all $\mu_1, \mu_2, c \in \mathbb{R}$. So for $\nu$ a joint distribution about the fundamentals $(\mu_1, \mu_2)$, we get

$$\left| \mathbb{E}_{(\mu_1,\mu_2) \sim \nu}[U(c; \mu_1, \mu_2) - U(c; \mu_1^\circ, \mu_2^\circ)] \right| \le \mathbb{E}_{(\mu_1,\mu_2) \sim \nu}\left[ |U(c; \mu_1, \mu_2) - U(c; \mu_1^\circ, \mu_2^\circ)| \right] \le K \cdot \mathbb{E}_{(\mu_1,\mu_2) \sim \nu}\left[ |\mu_1 - \mu_1^\circ| + |\mu_2 - \mu_2^\circ| \right]$$

for every $c \in \mathbb{R}$. Therefore we also get the uniform bound

$$\sup_{c \in \mathbb{R}} \left| \mathbb{E}_{(\mu_1,\mu_2) \sim \nu}[U(c; \mu_1, \mu_2)] - U(c; \mu_1^\circ, \mu_2^\circ) \right| \le K \cdot \mathbb{E}_{(\mu_1,\mu_2) \sim \nu}\left[ |\mu_1 - \mu_1^\circ| + |\mu_2 - \mu_2^\circ| \right].$$

By Proposition OA.3, almost surely

$$\lim_{T \to \infty} \mathbb{E}_{(\mu_1,\mu_2) \sim \tilde{m}_T}\left[ |\mu_1 - \mu_1^\circ| + |\mu_2 - \mu_2^\circ| \right] = 0.$$

Along any $\omega \in \Omega$ where the above limit holds,

$$\lim_{T \to \infty} \sup_{c \in \mathbb{R}} |U_T(c) - U(c; \mu_1^\circ, \mu_2^\circ)| \le \lim_{T \to \infty} K \cdot \mathbb{E}_{(\mu_1,\mu_2) \sim \tilde{m}_T}\left[ |\mu_1 - \mu_1^\circ| + |\mu_2 - \mu_2^\circ| \right] = 0.$$

This shows that $\mathbb{P}$-a.s., $U_T(c)$ converges to $U(c; \mu_1^*(c^\dagger), \mu_2^*(c^\dagger))$ uniformly across all $c$ as $T \to \infty$.

To reach my main result on convergence of behavior, suppose the agent chooses a cutoff threshold after observing $N$ histories $(h_n)_{n \le N}$. The choices are given by the functions $\tilde{C}_N : \mathbb{H}^N \to \mathbb{R}$, so the cutoff after a sample of size $N$ is a random variable $C_N$ that depends on the first $N$ pairs of potential draws $(X_n)_{n \le N}$.

Definition OA.4.

Cutoff choice functions $(\tilde{C}_N)$ are asymptotically myopic if

$$\limsup_{N \to \infty} \left( \sup_{c \in \mathbb{R}} U_N(c) - U_N(\tilde{C}_N) \right) = 0$$

almost surely.

A simple example is that $\tilde{C}_N$ chooses a cutoff whose expected payoff differs from $\sup_{c \in \mathbb{R}} U_N(c)$ by no more than $1/N$ after every sample of size $N$.

Proposition OA.5.

Let $c^* = C(\mu_1^\bullet, \mu_2^*(c^\dagger); \gamma)$. Suppose cutoffs $C_N$ are generated using asymptotically myopic cutoff choice functions. Almost surely, $C_N \to c^*$ as $N \to \infty$.

The expected payoff of different cutoff strategies under the pseudo-true fundamentals, $c \mapsto U(c; \mu_1^\bullet, \mu_2^*(c^\dagger))$, is single peaked and maximized at $c^*$. Therefore cutoffs outside an open ball around $c^*$ have expected payoffs bounded away from the subjectively optimal payoff under the model $\Psi(\mu_1^\bullet, \mu_2^*(c^\dagger); \gamma)$.

Lemma OA.14.

For each $\mu_1, \mu_2 \in \mathbb{R}$, let $c^*$ be the subjectively optimal cutoff threshold under $\Psi(\mu_1, \mu_2; \gamma)$. For every $\epsilon > 0$, there exists $\delta > 0$ so that whenever $|c - c^*| \ge \epsilon$, we have $U(c; \mu_1, \mu_2) \le U(c^*; \mu_1, \mu_2) - \delta$.

Proof. First, I show $c \mapsto U(c; \mu_1, \mu_2)$ is single peaked: it is strictly increasing up to $c = c^*$, then strictly decreasing afterwards. Recall the cutoff form of the best stopping strategy comes from the fact that $u_1(x_1) < \mathbb{E}_{\Psi(\mu_1,\mu_2;\gamma)}[u_2(x_1, X_2) \mid X_1 = x_1]$ for $x_1 < c^*$, but $u_1(x_1) > \mathbb{E}_{\Psi(\mu_1,\mu_2;\gamma)}[u_2(x_1, X_2) \mid X_1 = x_1]$ for $x_1 > c^*$. For two cutoffs $c' < c'' < c^*$, the two stopping strategies $S_{c'}, S_{c''}$ only differ in how they treat first-period draws in the interval $[c', c'']$, so we can write the difference in their expected payoffs, $U(c''; \mu_1, \mu_2) - U(c'; \mu_1, \mu_2)$, as

$$\int_{c'}^{c''} \left( \mathbb{E}_{\Psi(\mu_1,\mu_2;\gamma)}[u_2(x_1, X_2) \mid X_1 = x_1] - u_1(x_1) \right) \phi(x_1; \mu_1, \sigma^2)\, dx_1.$$

The integrand is strictly positive on $[c', c'']$, therefore $U(c'; \mu_1, \mu_2) < U(c''; \mu_1, \mu_2)$. This shows $U(\cdot; \mu_1, \mu_2)$ is strictly increasing up until $c^*$; a symmetric argument shows it is strictly decreasing after $c^*$.

For a given $\epsilon > 0$, let $\delta = U(c^*; \mu_1, \mu_2) - \max(U(c^* - \epsilon; \mu_1, \mu_2), U(c^* + \epsilon; \mu_1, \mu_2))$, where $\delta > 0$ since both $c^* - \epsilon$ and $c^* + \epsilon$ must have a strictly positive loss relative to $c^*$. Since $U(\cdot; \mu_1, \mu_2)$ is single peaked, every $c$ more than $\epsilon$ away from $c^*$ must have a loss relative to $c^*$ at least as large as the loss of either $c^* - \epsilon$ or $c^* + \epsilon$, so $U(c^*; \mu_1, \mu_2) - U(c; \mu_1, \mu_2) \ge \delta$.

This fact, combined with the uniform convergence of $U_N(c)$ from Proposition OA.4, establishes Proposition OA.5.

Proof.

Consider any sample path $\omega = (x_n)_{n=1}^\infty$ where the conclusion of Proposition OA.4 holds and the cutoff choice functions are asymptotically myopic. For every $\epsilon > 0$, find $\delta > 0$ as in Lemma OA.14 for $\mu_1 = \mu_1^\bullet$, $\mu_2 = \mu_2^*(c^\dagger)$, and find large enough $\bar{N}_1$ so that $\sup_{c \in \mathbb{R}} |U_N(c)(\omega) - U(c; \mu_1^\bullet, \mu_2^*(c^\dagger))| < \delta/3$ for all $N \ge \bar{N}_1$. This means for $N \ge \bar{N}_1$,

$$\sup_{c \in \mathbb{R}} U_N(c)(\omega) \ge U_N(c^*)(\omega) \ge U(c^*; \mu_1^\bullet, \mu_2^*(c^\dagger)) - \delta/3,$$

while $U_N(c)(\omega) \le U(c^*; \mu_1^\bullet, \mu_2^*(c^\dagger)) - (2\delta)/3$ for all $c \notin [c^* - \epsilon, c^* + \epsilon]$. Find $\bar{N}_2$ so that for $N \ge \bar{N}_2$, $\sup_{c \in \mathbb{R}} U_N(c)(\omega) - U_N(C_N)(\omega) < \delta/3$. Then for $N \ge \max(\bar{N}_1, \bar{N}_2)$, $C_N(\omega) \in [c^* - \epsilon, c^* + \epsilon]$. Since $\epsilon > 0$ is arbitrary, $C_N(\omega) \to c^*$.

Therefore, we conclude $C_N \to c^*$ along any sample path $\omega$ where the conclusion of Proposition OA.4 holds and the cutoff choice functions are asymptotically myopic. Since these two events both happen almost surely, $C_N \to c^*$ almost surely.

OA 5 Misspecified Inference under Method of Moments

In the analysis so far, I have modeled the learners as misspecified Bayesians. In this section, I consider agents who use a method-of-moments (MOM) procedure as a simpler but natural alternative to Bayesian inference. Proposition OA.7 and Corollary OA.2 show that over-pessimism and the positive-feedback loop remain robust to this relaxation of Bayesian inference.

OA 5.1 Feasible Models for $(X_1, X_2)$

Each agent starts with a set of feasible models $\{F(\cdot; \theta_1, \theta_2) : \theta_1 \in \Theta_1, \theta_2 \in \Theta_2\}$ for the joint distribution of $(X_1, X_2)$, indexed by feasible fundamentals $\Theta_1 \times \Theta_2 \subseteq \mathbb{R}^2$. For each $(\theta_1, \theta_2)$, $F(\cdot; \theta_1, \theta_2)$ is a full-support measure on the rectangle $I_1 \times I_2$, where each of $I_1, I_2$ is a possibly infinite interval of $\mathbb{R}$. By "full-support" I mean that for every open ball $B \subseteq I_1 \times I_2$, $F(B; \theta_1, \theta_2) > 0$. For each $F(\cdot; \theta_1, \theta_2)$, let $F_1(\cdot; \theta_1, \theta_2)$ denote its marginal on $I_1$, and let $F_{2|1}(\cdot; \theta_1, \theta_2 \mid x_1)$ denote its conditional distribution on $I_2$ given $X_1 = x_1$. I will make the following assumptions on the family of feasible models:

Assumption OA.1.

The feasible models $\{F(\cdot; \theta_1, \theta_2) : \theta_1 \in \Theta_1, \theta_2 \in \Theta_2\}$ satisfy:

(a) $F_1(\cdot; \theta_1, \theta_2)$ is only a function of $\theta_1$, and $\mathbb{E}_{F_1(\cdot; \theta_1, \theta_2)}[X_1]$ is strictly increasing in $\theta_1$.
(b) For each $x_1 \in I_1$ and $\theta_1 \in \Theta_1$, $\mathbb{E}_{F_{2|1}(\cdot; \theta_1, \theta_2 \mid x_1)}[X_2]$ strictly increases in $\theta_2$.
(c) For any $\theta_1 \in \Theta_1$ and $\theta_2 \in \Theta_2$, $\mathbb{E}_{F_{2|1}(\cdot; \theta_1, \theta_2 \mid x_1)}[X_2]$ strictly decreases in $x_1$.

In light of Assumption OA.1(a), the marginal distribution on $X_1$ can be written simply as $F_1(\cdot; \theta_1)$, omitting $\theta_2$. Assumption OA.1(c) is the substantive assumption capturing the gambler's fallacy psychology. Every subjective distribution in the family is such that the agent predicts a lower mean for $X_2$ after a higher realization of $X_1$. The behavioral economics literature has not settled on a general definition of the gambler's fallacy that works under all distributional assumptions, but Assumption OA.1(c) seems like a reasonable first step. Note that this is a generalization of how I model the gambler's fallacy in the main text using a pair of symmetric, log-concave distributions.

Here are some examples satisfying these assumptions. The first example concerns the Gaussian case from Example 2.

Example OA.1.

Let $I_1 = I_2 = \mathbb{R}$ and let $\Theta_1 = \Theta_2 = \mathbb{R}$. Fixing some $\sigma^2 > 0$, $\gamma > 0$, let $F(\cdot; \theta_1, \theta_2)$ be such that $X_1 \sim N(\theta_1, \sigma^2)$ and $X_2 \mid (X_1 = x_1) \sim N(\theta_2 - \gamma(x_1 - \theta_1), \sigma^2)$. The marginal distribution on $X_1$ does not depend on $\theta_2$. Its mean is $\theta_1$, so it strictly increases in $\theta_1$. The conditional mean of $X_2 \mid X_1 = x_1$ is strictly increasing in $\theta_2$ and strictly decreasing in $x_1$ since $\gamma > 0$. So all conditions in Assumption OA.1 are satisfied.

The next example features bivariate exponential distributions supported on the half-line $[0, \infty)$.

Example OA.2.

Gumbel (1960) proposes the following family of bivariate exponential distributions, parametrized by $\alpha \in [-1, 1]$: consider a joint distribution with the density function $(\tilde{x}_1, \tilde{x}_2) \mapsto e^{-\tilde{x}_1 - \tilde{x}_2} \cdot [1 + \alpha (2 e^{-\tilde{x}_1} - 1) \cdot (2 e^{-\tilde{x}_2} - 1)]$ for $\tilde{x}_1, \tilde{x}_2 \ge 0$. If $(\tilde{X}_1, \tilde{X}_2)$ are random variables with this density, then they have full support on $[0, \infty) \times [0, \infty)$ and each $\tilde{X}_j$ has the marginal distribution of an exponential random variable with mean 1. The conditional mean of $\tilde{X}_2$ given a realization of $\tilde{X}_1$ is $\mathbb{E}[\tilde{X}_2 \mid \tilde{X}_1 = \tilde{x}_1] = 1 + \frac{\alpha}{2} - \alpha e^{-\tilde{x}_1}$. The correlation between $\tilde{X}_1$ and $\tilde{X}_2$ is $\alpha/4$.

Let $I_1 = I_2 = [0, \infty)$ and let $\Theta_1 = \Theta_2 = (0, \infty)$. Fixing some $-1 \le \alpha < 0$, let $F(\cdot; \theta_1, \theta_2)$ be the joint distribution generated by $X_1 = \theta_1 \cdot \tilde{X}_1$ and $X_2 = \theta_2 \cdot \tilde{X}_2$, where $(\tilde{X}_1, \tilde{X}_2)$ have the Gumbel bivariate distribution with parameter $\alpha$. Since $(\tilde{X}_1, \tilde{X}_2)$ have full support on $I_1 \times I_2$, the same holds for $(X_1, X_2)$ for any $\theta_1, \theta_2 > 0$. The marginal distribution of $X_1$ is exponential with a mean of $\theta_1$, so Assumption OA.1(a) is satisfied. The conditional mean of $X_2 \mid X_1 = x_1$ is given by

$$\mathbb{E}[\theta_2 \tilde{X}_2 \mid \theta_1 \tilde{X}_1 = x_1] = \theta_2 \cdot \mathbb{E}\left[ \tilde{X}_2 \,\Big|\, \tilde{X}_1 = \frac{x_1}{\theta_1} \right] = \theta_2 \cdot \left( 1 + \frac{\alpha}{2} - \alpha e^{-(x_1/\theta_1)} \right).$$

As $-1 \le \alpha < 0$, the term inside the bracket is strictly positive. So this conditional expectation is strictly increasing in $\theta_2$, showing that Assumption OA.1(b) is satisfied. Also, since $\theta_1, \theta_2 > 0$ and $\alpha < 0$, $x_1 \mapsto -\alpha \theta_2 e^{-(x_1/\theta_1)}$ is strictly decreasing, and so Assumption OA.1(c) is satisfied.

I give a third example where $I_1 = I_2 = [0, 1]$ are bounded intervals.

Example OA.3.

Let $\Theta_1 = \Theta_2 = (0, \infty)$ and consider the family of distributions $F(\cdot; \theta_1, \theta_2)$ such that under parameters $(\theta_1, \theta_2)$, $X_1 \sim \text{Beta}(\theta_1, 1)$ and $X_2 \mid X_1 = x_1 \sim \text{Beta}((1 - x_1)\theta_2, 1)$. For $\theta_1, \theta_2 > 0$, $X_1$ has full support on $[0, 1]$, and for every $x_1 \in (0, 1)$, $X_2 \mid X_1 = x_1$ has full support on $[0, 1]$, so $F(\cdot; \theta_1, \theta_2)$ has full support on $[0, 1]^2$ for every $(\theta_1, \theta_2) \in \Theta_1 \times \Theta_2$. The mean of $X_1$ is $\frac{\theta_1}{\theta_1 + 1}$, which only depends on $\theta_1$ and is strictly increasing in it. So Assumption OA.1(a) is satisfied. The conditional mean of $X_2 \mid X_1 = x_1$ is $\frac{(1 - x_1)\theta_2}{(1 - x_1)\theta_2 + 1}$, which is strictly increasing in $\theta_2$ and strictly decreasing in $x_1$. So Assumptions OA.1(b) and OA.1(c) are satisfied.

Finally, I give a general class of examples that allows any pair of given marginal distributions for $X_1$ and $X_2$ to be joined together using a copula so as to induce negative dependence for the joint distribution.

Example OA.4.

Consider two families of distribution functions $Q_1(\cdot; \theta_1) : I_1 \to [0, 1]$ and $Q_2(\cdot; \theta_2) : I_2 \to [0, 1]$, where $Q_1$ and $Q_2$ are supported on $I_1, I_2$ respectively for all $\theta_1 \in \Theta_1$ and $\theta_2 \in \Theta_2$. Suppose the mean of $Q_1$ is increasing in $\theta_1$, and $Q_2$ is increasing in stochastic dominance order with respect to $\theta_2$. Fix a differentiable copula: that is, a differentiable function $W : [0, 1]^2 \to [0, 1]$ so that $W(u, 0) = W(0, v) = 0$, $W(u, 1) = u$, $W(1, v) = v$ for all $u, v \in [0, 1]$, and for $u' \le u''$, $v' \le v''$ in $[0, 1]$, we get $W(u'', v'') - W(u', v'') - W(u'', v') + W(u', v') \ge 0$. Consider the family of distribution functions $Q(\cdot; \theta_1, \theta_2)$ on $\mathbb{R}^2$ generated by joining together $Q_1(\cdot; \theta_1)$ with $Q_2(\cdot; \theta_2)$ using the copula $W$, namely

$$Q((-\infty, x_1] \times (-\infty, x_2]; \theta_1, \theta_2) = W(Q_1(x_1; \theta_1), Q_2(x_2; \theta_2)).$$

Then $Q(\cdot; \theta_1, \theta_2)$ has marginal distributions on $X_1$ and $X_2$ given by the distribution functions $Q_1(\cdot; \theta_1)$, $Q_2(\cdot; \theta_2)$. The next lemma shows that when the copula $W$ is such that $u \mapsto \frac{\partial W}{\partial u}(u, v)$ is increasing, the resulting joint distribution satisfies the conditions in Assumption OA.1.

Lemma OA.15.

Suppose $\frac{\partial W}{\partial u}(u, v)$ is an increasing function in $u$ and that $\{Q_1(\cdot; \theta_1) : \theta_1 \in \Theta_1\}$, $\{Q_2(\cdot; \theta_2) : \theta_2 \in \Theta_2\}$ satisfy the conditions of this example. Then, the conditions in Assumption OA.1 are satisfied for the family of distributions $F(\cdot; \theta_1, \theta_2)$, where $F(\cdot; \theta_1, \theta_2)$ has the distribution function $Q(\cdot; \theta_1, \theta_2)$.

Proof. For Assumption OA.1(a), the marginal of $F(\cdot; \theta_1, \theta_2)$ on $X_1$ is simply $Q_1(\cdot; \theta_1)$, which I assumed is strictly increasing in mean with respect to $\theta_1$.

For Assumption OA.1(b), it is well-known that by the copula construction, for all $u, v \in [0, 1]$, $\mathbb{P}_{F(\cdot; \theta_1, \theta_2)}[X_2 \le Q_2^{-1}(v; \theta_2) \mid X_1 = Q_1^{-1}(u; \theta_1)] = \frac{\partial W}{\partial u}(u, v)$. This means $\frac{\partial W}{\partial u}(u, v)$ is increasing in $v$. Fixing some $x_1 \in I_1$ and $\theta_1 \in \Theta_1$, put $u = Q_1(x_1; \theta_1)$. Now for every $\theta_2$ and $x_2 \in I_2$, we have $\mathbb{P}_{F(\cdot; \theta_1, \theta_2)}[X_2 \le x_2 \mid X_1 = x_1] = \frac{\partial W}{\partial u}(u, Q_2(x_2; \theta_2))$. Since the family of marginals $Q_2(\cdot; \theta_2)$ increases in FOSD order as $\theta_2$ increases, $Q_2(x_2; \theta_2)$ is decreasing in $\theta_2$. Since $\frac{\partial W}{\partial u}$ increases in its second argument, $\mathbb{P}_{F(\cdot; \theta_1, \theta_2)}[X_2 \le x_2 \mid X_1 = x_1]$ must then decrease in $\theta_2$, that is to say the conditional distribution $X_2 \mid X_1 = x_1$ is increasing in FOSD order in $\theta_2$. So in particular Assumption OA.1(b) is satisfied.

For Assumption OA.1(c), again start with the expression $\mathbb{P}_{F(\cdot; \theta_1, \theta_2)}[X_2 \le Q_2^{-1}(v; \theta_2) \mid X_1 = Q_1^{-1}(u; \theta_1)] = \frac{\partial W}{\partial u}(u, v)$. For $x_1'' > x_1'$, put $u'' = Q_1(x_1''; \theta_1)$ and $u' = Q_1(x_1'; \theta_1)$. We have for every $v \in [0, 1]$ that

$$\mathbb{P}_{F(\cdot; \theta_1, \theta_2)}[X_2 \le Q_2^{-1}(v; \theta_2) \mid X_1 = x_1''] = \frac{\partial W}{\partial u}(Q_1(x_1''; \theta_1), v)$$

while

$$\mathbb{P}_{F(\cdot; \theta_1, \theta_2)}[X_2 \le Q_2^{-1}(v; \theta_2) \mid X_1 = x_1'] = \frac{\partial W}{\partial u}(Q_1(x_1'; \theta_1), v).$$

Since the distribution function $Q_1(\cdot; \theta_1)$ has full support, $Q_1(x_1''; \theta_1) > Q_1(x_1'; \theta_1)$. And since we assumed $\frac{\partial W}{\partial u}$ is increasing in its first argument, we see that $\mathbb{P}_{F(\cdot; \theta_1, \theta_2)}[X_2 \le x_2 \mid X_1 = x_1]$ is increasing in $x_1$. That is, the conditional distribution $X_2 \mid X_1 = x_1$ is decreasing in FOSD order in $x_1$. So Assumption OA.1(c) is satisfied.

Example OA.5.

The condition that $\frac{\partial W}{\partial u}(u, v)$ increases in $u$ is satisfied by, for example, the Gaussian copula with any negative correlation. The derivative of the Gaussian copula is given by $\frac{\partial W}{\partial u}(u, v) = \mathbb{P}[X_2 \le \Phi^{-1}(v) \mid X_1 = \Phi^{-1}(u)]$, where $\Phi$ is the standard Gaussian distribution function and $(X_1, X_2)$ are jointly Gaussian with standard Gaussian marginals and correlation $-1 < \rho < 0$. Since $X_2 \mid X_1 = x_1 \sim N(\rho x_1, 1 - \rho^2)$, it is clear that $X_2 \mid X_1 = x_1$ decreases in FOSD order as $x_1$ increases, so for any $v$ we have that $\mathbb{P}[X_2 \le \Phi^{-1}(v) \mid X_1 = \Phi^{-1}(u)]$ increases in $u$.

OA 5.2 Method of Moments Inference

For a distribution of histories $\mathcal{H} \in \Delta(\mathbb{H})$, let $a_1[\mathcal{H}]$ represent the average first-period draw under this distribution and let $a_2[\mathcal{H}]$ represent the average second-period draw (when uncensored). More precisely, $a_1[\mathcal{H}] := \mathbb{E}_{h \sim \mathcal{H}}[h_1]$ and $a_2[\mathcal{H}] := \mathbb{E}_{h \sim \mathcal{H}}[h_2 \mid h_2 \ne \emptyset]$. Suppose that objectively $X_1, X_2$ are independent with a joint distribution $F^\bullet$, and denote the true distribution of histories under censoring by the cutoff stopping rule $c \in \mathbb{R}$ as $\mathcal{H}^\bullet(c)$. Then by independence, $a_1[\mathcal{H}^\bullet(c)]$ and $a_2[\mathcal{H}^\bullet(c)]$ do not in fact depend on $c$.

Given the family of feasible models $\{F(\cdot; \theta_1, \theta_2) : \theta_1 \in \Theta_1, \theta_2 \in \Theta_2\}$ about the joint distribution of $(X_1, X_2)$, let $\mathcal{H}(\theta_1, \theta_2; c) := \mathcal{H}(F(\cdot; \theta_1, \theta_2); c)$ denote the distribution on histories under the model $F(\cdot; \theta_1, \theta_2)$ and censoring cutoff $c$. I now define the method of moments estimator.

Deﬁnition OA.5.

The method-of-moments estimator derived from an infinite dataset of histories with the distribution $\mathcal{H}^\bullet(c)$ is any pair $(\theta_1^M, \theta_2^M) \in \Theta_1 \times \Theta_2$ such that:

1. $a_1[\mathcal{H}(\theta_1^M, \theta_2^M; c)] = a_1[\mathcal{H}^\bullet(c)]$
2. $a_2[\mathcal{H}(\theta_1^M, \theta_2^M; c)] = a_2[\mathcal{H}^\bullet(c)]$

I will sometimes write $\theta_1^M(c), \theta_2^M(c)$ to emphasize the dependence of the MOM estimators on the censoring threshold $c$. It is easy to check that for the Gaussian case, the MOM estimators and the pseudo-true fundamentals coincide.

The MOM estimator need not exist: for example, it does not exist if all values of $\theta_1 \in \Theta_1$ generate a marginal distribution on $X_1$ whose mean is smaller than $a_1[\mathcal{H}^\bullet(c)]$. However, when it exists, it is unique under the assumptions I made.
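As an illustration of Definition OA.5, the two moment conditions can be solved in closed form for the Gaussian feasible models of Example OA.1. The sketch below is only a hedged illustration: the function names are hypothetical, and it assumes the true draws have means `mu1_true` and `mu2_true`. Matching the first moment gives $\theta_1^M = \mu_1^\bullet$; since the model implies $\mathbb{E}[X_2 \mid X_1 \le c] = \theta_2 - \gamma(\mathbb{E}[X_1 \mid X_1 \le c] - \theta_1)$, matching the second moment gives $\theta_2^M = \mu_2^\bullet + \gamma(\mathbb{E}[X_1 \mid X_1 \le c] - \theta_1^M)$.

```python
import math


def norm_pdf(z):
    """Standard Gaussian density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard Gaussian distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def truncated_mean_below(mu, sigma, c):
    """E[X | X <= c] for X ~ N(mu, sigma^2)."""
    z = (c - mu) / sigma
    return mu - sigma * norm_pdf(z) / norm_cdf(z)


def mom_gaussian(mu1_true, mu2_true, sigma, gamma, c):
    """MOM estimates in the Gaussian model of Example OA.1 (illustrative).

    Assumes X2 is observed only when X1 <= c, and that the true draws
    have means mu1_true and mu2_true.
    """
    theta1 = mu1_true  # first moment is matched exactly
    # Solve theta2 - gamma * (E[X1 | X1 <= c] - theta1) = mu2_true.
    theta2 = mu2_true + gamma * (truncated_mean_below(theta1, sigma, c) - theta1)
    return theta1, theta2


if __name__ == "__main__":
    for cutoff in (-1.0, 0.0, 1.0):
        print(cutoff, mom_gaussian(0.0, 0.0, 1.0, 0.5, cutoff))
```

Under these assumptions, the second estimate lies below the true mean whenever `gamma > 0`, and it is lower when the cutoff is lower, because a lower cutoff makes the uncensored first draws more negative on average.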

Lemma OA.16.

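When the family of feasible models satisfies Assumption OA.1, the MOM estimator is unique when it exists.

Proof. Suppose $(\theta_1^M, \theta_2^M)$ is an MOM estimator. I show any other MOM estimator $(\hat{\theta}_1, \hat{\theta}_2)$ must be equal to it.

We may rewrite the moments as $a_1[\mathcal{H}(\theta_1^M, \theta_2^M; c)] = \mathbb{E}_{F_1(\cdot; \theta_1^M)}[X_1]$ and $a_2[\mathcal{H}(\theta_1^M, \theta_2^M; c)] = \mathbb{E}_{F(\cdot; \theta_1^M, \theta_2^M)}[X_2 \mid X_1 < c]$.

The unconditional mean of $X_1$, namely $\mathbb{E}_{F_1(\cdot; \theta_1)}[X_1]$, is strictly increasing in $\theta_1$ by Assumption OA.1(a). So, at most one value of $\theta_1 \in \Theta_1$ can generate an unconditional mean that matches $a_1[\mathcal{H}^\bullet(c)]$, meaning we must have $\hat{\theta}_1 = \theta_1^M$.

Given this unique $\theta_1^M$, Assumption OA.1(b) implies the conditional mean $\mathbb{E}_{F_{2|1}(\cdot; \theta_1^M, \theta_2 \mid x_1)}[X_2]$ is strictly increasing in $\theta_2$ for every $x_1 < c$. The conditional mean $\mathbb{E}_{F(\cdot; \theta_1^M, \theta_2)}[X_2 \mid X_1 < c]$ is given by an integral over $\mathbb{E}_{F_{2|1}(\cdot; \theta_1^M, \theta_2 \mid x_1)}[X_2]$ across the values $x_1 < c$, therefore $\mathbb{E}_{F(\cdot; \theta_1^M, \theta_2)}[X_2 \mid X_1 < c]$ is also strictly increasing in $\theta_2$. So at most one value of $\theta_2 \in \Theta_2$ can match $a_2[\mathcal{H}^\bullet(c)]$, meaning $\hat{\theta}_2 = \theta_2^M$.

[...] $\theta_2^M(c)$.

When $(\theta_1, \theta_2)$ correspond to the unconditional means, the MOM estimators understate the $X_2$ mean of the objective distribution $F^\bullet$.

Proposition OA.7. Suppose parameters $(\theta_1, \theta_2)$ index the unconditional $X_1, X_2$ means in all feasible models, that is $\mathbb{E}_{F(\cdot; \theta_1, \theta_2)}[X_1] = \theta_1$ and $\mathbb{E}_{F(\cdot; \theta_1, \theta_2)}[X_2] = \theta_2$. Suppose $c \in \mathbb{R}$ and the MOM estimators $\theta_1^M(c), \theta_2^M(c)$ exist. Let $\theta_1^\bullet = \mathbb{E}_{F^\bullet}[X_1]$, $\theta_2^\bullet = \mathbb{E}_{F^\bullet}[X_2]$ be the unconditional means of the true distribution of draws. Then, $\theta_1^M(c) = \theta_1^\bullet$ and $\theta_2^M(c) < \theta_2^\bullet$.

Proof. For any $\theta_1 \in \Theta_1$, $\theta_2 \in \Theta_2$, and $c \in \mathbb{R}$, $a_1[\mathcal{H}(\theta_1, \theta_2; c)] = \theta_1$ since $(\theta_1, \theta_2)$ are assumed to parametrize means. Also, $a_2[\mathcal{H}(\theta_1, \theta_2; c)] = \mathbb{E}_{F(\cdot; \theta_1, \theta_2)}[X_2 \mid X_1 \le c] > \theta_2 = \mathbb{E}_{F(\cdot; \theta_1, \theta_2)}[X_2]$ due to Assumption OA.1(c). Finally, $a_1[\mathcal{H}^\bullet(c)] = \theta_1^\bullet$ and $a_2[\mathcal{H}^\bullet(c)] = \theta_2^\bullet$ due to independence in $F^\bullet$.

This means if $\theta_1^M, \theta_2^M$ are the MOM estimators under censoring threshold $c$, then $\theta_1^M = \theta_1^\bullet$. Also, we must have $a_2[\mathcal{H}(\theta_1^M, \theta_2^M; c)] = \theta_2^\bullet$. At the same time we have $a_2[\mathcal{H}(\theta_1^M, \theta_2^M; c)] > \theta_2^M$, so this means $\theta_2^M < \theta_2^\bullet$.

These conclusions show that the ideas behind my main results do not depend on full Bayesianism.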
Rather, the crucial assumption is the generalized notion of negative dependence between $X_1$ and $X_2$, as articulated by Assumption OA.1(c) for arbitrary joint distributions.

As a corollary, I characterize the large-generations learning dynamics for MOM agents using a general class of feasible models. The key idea is that the positive feedback between distorted stopping rules and distorted beliefs continues to hold, with the parametric version of the gambler's fallacy interpreted as $\gamma > 0$. I extend $a_1, a_2$ to take as argument multiple history distributions. That is, $a_1(\mathcal{H}^{(1)}, ..., \mathcal{H}^{(K)}) := \mathbb{E}_{h \sim \oplus_{k=1}^K \mathcal{H}^{(k)}}[h_1]$ and $a_2(\mathcal{H}^{(1)}, ..., \mathcal{H}^{(K)}) := \mathbb{E}_{h \sim \oplus_{k=1}^K \mathcal{H}^{(k)}}[h_2 \mid h_2 \ne \emptyset]$, where $\oplus_{k=1}^K \mathcal{H}^{(k)}$ is the mixture distribution assigning weight $1/K$ to each of the $K$ history distributions $(\mathcal{H}^{(k)})_{k=1}^K$. After $K$ sub-datasets of censored histories with distributions $\mathcal{H}^\bullet(c_{[0]}), ..., \mathcal{H}^\bullet(c_{[K-1]})$, the MOM estimators $\theta_1^M(c_{[0]}, ..., c_{[K-1]})$, $\theta_2^M(c_{[0]}, ..., c_{[K-1]})$ are such that

$$a_1(\mathcal{H}^\bullet(c_{[0]}), ..., \mathcal{H}^\bullet(c_{[K-1]})) = a_1(\mathcal{H}(\theta_1^M, \theta_2^M; c_{[0]}), ..., \mathcal{H}(\theta_1^M, \theta_2^M; c_{[K-1]})),$$
$$a_2(\mathcal{H}^\bullet(c_{[0]}), ..., \mathcal{H}^\bullet(c_{[K-1]})) = a_2(\mathcal{H}(\theta_1^M, \theta_2^M; c_{[0]}), ..., \mathcal{H}(\theta_1^M, \theta_2^M; c_{[K-1]})).$$

One caveat: we must now ensure the MOM estimator exists in each generation when the previous generation uses any cutoff stopping rule that has a positive probability of continuing into the next period. To guarantee existence, I impose an additional assumption on how the feasible models relate to the true distribution $F^\bullet$.

Assumption OA.2.

(a) The supports of $X_1$ and $X_2$ under the true distribution $F^\bullet$ are $I_1$ and $I_2$, respectively.
(b) The range of $\theta_1 \mapsto \mathbb{E}_{F_1(\cdot; \theta_1)}[X_1]$ is $I_1$.
(c) For every $\theta_1 \in \Theta_1$ and $x_1 \in I_1$, $\theta_2 \mapsto \mathbb{E}_{F_{2|1}(\cdot; \theta_1, \theta_2 \mid x_1)}[X_2]$ is continuous with a range of $I_2$.
Assumption OA.2(a) is a consistency requirement, saying that the supports for the objective distributions of $X_1$ and $X_2$ match their supports under the agents' feasible models. Assumptions OA.2(b) and OA.2(c) ensure the agents can always match the two moment conditions. It is easily verified that Examples OA.1, OA.2, and OA.3 satisfy Assumption OA.2 when the true joint distribution of $(X_1, X_2)$ is supported on $\mathbb{R}^2$, $[0, \infty)^2$, and $[0, 1]^2$ respectively.

Corollary OA.2.

Fix some objective, independent distribution $F^\bullet$ for $(X_1, X_2)$ and suppose agents' feasible models $\{F(\cdot; \theta_1, \theta_2) : \theta_1 \in \Theta_1, \theta_2 \in \Theta_2\}$ satisfy Assumptions OA.1 and OA.2. Suppose the payoff function $u_2(x_1, x_2)$ in the optimal-stopping problem is linear in $x_2$. Initiate the 0th generation at an arbitrary cutoff $c_{[0]}$ in the interior of $I_1$. Then, beliefs and cutoff thresholds $(\theta^M_{1,[t]})_{t \ge 1}$, $(\theta^M_{2,[t]})_{t \ge 1}$, and $(c^M_{[t]})_{t \ge 1}$ form monotonic sequences.

This corollary establishes the monotonicity of the beliefs and cutoffs for MOM agents, analogous to the monotonicity result of Theorem 2.
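The monotone dynamics in Corollary OA.2 can be traced numerically in a special case. The following is a rough sketch under assumptions that are not in the text: the feasible models are the Gaussian family of Example OA.1, the true draws are i.i.d. Gaussian with the same variance, and the payoffs are the hypothetical choice $u_1(x_1) = x_1$, $u_2(x_1, x_2) = x_2$, so the subjectively optimal cutoff solves $c = \theta_2 - \gamma(c - \theta_1)$. Generation $t$ pools the sub-datasets generated under all earlier cutoffs, and the pooled second-moment condition is solved in closed form. All function names are illustrative.

```python
import math


def norm_pdf(z):
    """Standard Gaussian density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard Gaussian distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def simulate_generations(c0, mu1, mu2, sigma, gamma, generations):
    """Iterate pooled MOM beliefs and cutoffs (illustrative assumptions).

    Returns (beliefs, cutoffs): beliefs[t-1] is the generation-t estimate
    of the second-period mean; cutoffs[0] is c0 and cutoffs[t] is the
    cutoff chosen by generation t.
    """
    cutoffs = [c0]
    beliefs = []
    for _ in range(generations):
        zs = [(c - mu1) / sigma for c in cutoffs]
        # Pooled uncensored histories: sub-dataset k contributes weight
        # P[X1 <= c_k]; the weighted sum of (E[X1 | X1 <= c_k] - mu1)
        # equals -sigma * sum of pdf(z_k).
        pdf_sum = sum(norm_pdf(z) for z in zs)
        cdf_sum = sum(norm_cdf(z) for z in zs)
        theta2 = mu2 - gamma * sigma * pdf_sum / cdf_sum
        beliefs.append(theta2)
        # Indifference under the assumed payoffs: c = theta2 - gamma * (c - mu1).
        cutoffs.append((theta2 + gamma * mu1) / (1.0 + gamma))
    return beliefs, cutoffs


if __name__ == "__main__":
    beliefs, cutoffs = simulate_generations(1.0, 0.0, 0.0, 1.0, 0.5, 10)
    for t, (b, c) in enumerate(zip(beliefs, cutoffs[1:]), start=1):
        print(t, round(b, 4), round(c, 4))
```

Starting from a high initial cutoff, both sequences fall over generations in this sketch; starting from a very low initial cutoff, both rise. In either case the estimated second-period mean stays below the true mean.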

Proof.

I first show that under any of the models $F(\cdot; \theta_1, \theta_2)$, the agent's subjectively optimal stopping rule is a cutoff rule (possibly involving never stopping or always stopping). It suffices to show that

$$x_1 \mapsto \left( u_1(x_1) - \mathbb{E}_{F_{2|1}(\cdot; \theta_1, \theta_2 \mid x_1)}[u_2(x_1, X_2)] \right)$$

is strictly increasing in $x_1$. By linearity of $u_2$ in its second argument, this expression is equal to

$$x_1 \mapsto \left( u_1(x_1) - u_2(x_1, \mathbb{E}_{F_{2|1}(\cdot; \theta_1, \theta_2 \mid x_1)}[X_2]) \right).$$

Suppose $x_1'' > x_1'$. By Assumption 1(b),

$$u_1(x_1'') - u_2(x_1'', \mathbb{E}_{F_{2|1}(\cdot; \theta_1, \theta_2 \mid x_1')}[X_2]) \ge u_1(x_1') - u_2(x_1', \mathbb{E}_{F_{2|1}(\cdot; \theta_1, \theta_2 \mid x_1')}[X_2]).$$

By Assumption OA.1(c), $\mathbb{E}_{F_{2|1}(\cdot; \theta_1, \theta_2 \mid x_1'')}[X_2] < \mathbb{E}_{F_{2|1}(\cdot; \theta_1, \theta_2 \mid x_1')}[X_2]$. Combined with Assumption 1(a), this gives $u_2(x_1'', \mathbb{E}_{F_{2|1}(\cdot; \theta_1, \theta_2 \mid x_1'')}[X_2]) < u_2(x_1'', \mathbb{E}_{F_{2|1}(\cdot; \theta_1, \theta_2 \mid x_1')}[X_2])$, hence showing

$$u_1(x_1'') - u_2(x_1'', \mathbb{E}_{F_{2|1}(\cdot; \theta_1, \theta_2 \mid x_1'')}[X_2]) > u_1(x_1') - u_2(x_1', \mathbb{E}_{F_{2|1}(\cdot; \theta_1, \theta_2 \mid x_1')}[X_2]).$$

Also, suppose $F(\cdot; \theta_1, \theta_2')$ induces either a stopping threshold which is an interior point of $I_1$, or always stopping. Then $F(\cdot; \theta_1, \theta_2'')$ induces a higher stopping threshold or always stopping whenever $\theta_2'' \ge \theta_2'$. To see this, if there is an indifference point $\bar{x}_1$ in the interior of $I_1$ with $u_1(\bar{x}_1) = u_2(\bar{x}_1, \mathbb{E}_{F_{2|1}(\cdot; \theta_1, \theta_2' \mid \bar{x}_1)}[X_2])$, then we have $\mathbb{E}_{F_{2|1}(\cdot; \theta_1, \theta_2'' \mid \bar{x}_1)}[X_2] > \mathbb{E}_{F_{2|1}(\cdot; \theta_1, \theta_2' \mid \bar{x}_1)}[X_2]$ due to Assumption OA.1(b), so $u_1(\bar{x}_1) < u_2(\bar{x}_1, \mathbb{E}_{F_{2|1}(\cdot; \theta_1, \theta_2'' \mid \bar{x}_1)}[X_2])$. This shows under $F(\cdot; \theta_1, \theta_2'')$ the agent strictly prefers continuing at $\bar{x}_1$, so the acceptance threshold must be higher. Similarly, if the agent prefers always stopping at every $x_1 \in I_1$ under $F(\cdot; \theta_1, \theta_2'')$, then she strictly prefers stopping at every $x_1$ under $F(\cdot; \theta_1, \theta_2')$.

I now show that $\theta^M_{1,[t]}$, $\theta^M_{2,[t]}$, and $c^M_{[t]}$ are well defined for every $t \ge 1$. MOM agents in generation $t \ge 1$ observe $t$ sub-datasets of censored histories, with the distributions $\mathcal{H}^\bullet(c_{[0]}), ..., \mathcal{H}^\bullet(c_{[t-1]})$ where $c_{[0]} \in \text{int}(I_1)$. The moments to match are

$$a_1(\mathcal{H}^\bullet(c_{[0]}), ..., \mathcal{H}^\bullet(c_{[K-1]})) = \mathbb{E}_{F^\bullet}[X_1], \qquad a_2(\mathcal{H}^\bullet(c_{[0]}), ..., \mathcal{H}^\bullet(c_{[K-1]})) = \mathbb{E}_{F^\bullet}[X_2],$$

where the second-period moment is well defined because $c_{[0]}$ is interior, so a positive fraction of histories in at least one sub-dataset contain uncensored $X_2$. These moments are interior values in $I_1, I_2$ respectively, since $F^\bullet$ has full-support marginal distributions.

By Assumption OA.2(b), there exists $\bar{\theta}_1 \in \Theta_1$, independent of $K$ and $(c_{[0]}, ..., c_{[K-1]})$, so that $\mathbb{E}_{F_1(\cdot; \bar{\theta}_1)}[X_1] = \mathbb{E}_{F^\bullet}[X_1]$. By combining Assumptions OA.1(b) and OA.2(c), we get that

$$\theta_2 \mapsto a_2(\mathcal{H}(\bar{\theta}_1, \theta_2; c_{[0]}), ..., \mathcal{H}(\bar{\theta}_1, \theta_2; c_{[K-1]}))$$

is increasing and continuous on $\Theta_2$ with a range of $I_2$. (This uses the fact that $c_{[0]}$ is in the interior of $I_1$.) Since MOM agents are matching an interior value $\mathbb{E}_{F^\bullet}[X_2] \in \text{int}(I_2)$, this shows that for any $K$ and $(c_{[0]}, ..., c_{[K-1]})$ with $c_{[0]} \in \text{int}(I_1)$, $\theta_1^M(c_{[0]}, ..., c_{[K-1]})$ and $\theta_2^M(c_{[0]}, ..., c_{[K-1]})$ exist, and furthermore $\theta_1^M(c_{[0]}, ..., c_{[K-1]}) = \bar{\theta}_1$.

By uniqueness of MOM estimators in Lemma OA.16, $\theta^M_{1,[t]}$, $\theta^M_{2,[t]}$ are well defined for each $t \ge 1$. Also, $c^M_{[t]}$ is well defined for each $t \ge 1$, given that we have shown the optimal strategy in the model $F(\cdot; \theta^M_{1,[t]}, \theta^M_{2,[t]})$ is a cutoff strategy.

To prove monotonicity, first suppose that $c_{[1]} \le c_{[0]}$. I have argued that we must have $\theta^M_{1,[2]} = \theta^M_{1,[1]} = \bar{\theta}_1$, so now I rule out $\theta^M_{2,[2]} > \theta^M_{2,[1]}$. Note that

$$a_2(\mathcal{H}(\bar{\theta}_1, \theta_2; c_{[0]}), \mathcal{H}(\bar{\theta}_1, \theta_2; c_{[1]})) = \frac{w_0}{w_0 + w_1}\, a_2(\mathcal{H}(\bar{\theta}_1, \theta_2; c_{[0]})) + \frac{w_1}{w_0 + w_1}\, a_2(\mathcal{H}(\bar{\theta}_1, \theta_2; c_{[1]})),$$

where $w_0 = \mathbb{P}_{F^\bullet}[X_1 \le c_{[0]}] > 0$ and $w_1 = \mathbb{P}_{F^\bullet}[X_1 \le c_{[1]}] \ge 0$. The moment-matching condition for generation 1 implies $a_2(\mathcal{H}(\bar{\theta}_1, \theta^M_{2,[1]}; c_{[0]})) = \mathbb{E}_{F^\bullet}[X_2]$. For any $\theta^M_{2,[2]} > \theta^M_{2,[1]}$, we have

$$a_2(\mathcal{H}(\bar{\theta}_1, \theta^M_{2,[2]}; c_{[0]})) > \mathbb{E}_{F^\bullet}[X_2]$$

from Assumption OA.1(b). If $c_{[1]} = \inf(I_1)$, we have found a contradiction since the weight $w_1$ is 0. When $c_{[1]} > \inf(I_1)$, we get

$$a_2(\mathcal{H}(\bar{\theta}_1, \theta^M_{2,[2]}; c_{[1]})) \ge a_2(\mathcal{H}(\bar{\theta}_1, \theta^M_{2,[2]}; c_{[0]})) > \mathbb{E}_{F^\bullet}[X_2]$$

by combining Assumption OA.1(c) with the fact that $c_{[1]} \le c_{[0]}$. Both $w_0$ and $w_1$ are strictly positive, and they multiply terms that are both strictly larger than $\mathbb{E}_{F^\bullet}[X_2]$. This shows $a_2(\mathcal{H}(\bar{\theta}_1, \theta^M_{2,[2]}; c_{[0]}), \mathcal{H}(\bar{\theta}_1, \theta^M_{2,[2]}; c_{[1]})) > \mathbb{E}_{F^\bullet}[X_2]$, again contradicting the moment condition.

Hence we must have $\theta^M_{2,[2]} \le \theta^M_{2,[1]}$, and thus $c^M_{[2]} \le c^M_{[1]}$ by monotonicity of the cutoff threshold in belief as discussed before. A similar argument establishes that $(\theta^M_{1,[t]})_{t \ge 1}$, $(\theta^M_{2,[t]})_{t \ge 1}$, and $(c^M_{[t]})_{t \ge 1}$ are decreasing sequences.

The case of $c_{[1]} > c_{[0]}$ is symmetric.

OA 6 The Censoring Effect in a Finite-Urn Model

Rabin (2002) Section 7 discusses an example with endogenous observations. There is an infinite population of financial analysts, each with quality $\theta \in \{\frac14, \frac12, \frac34\}$. Conditional on quality $\theta$, an analyst generates either a good return (signal $a$) or a bad return (signal $b$) each period, with probabilities $\theta$ and $1 - \theta$, independently across periods. The agent, however, believes successive returns from the same analyst are generated through a finite-urn model. Consider an urn with $N$ balls, where $N$ is a multiple of 4. For an analyst with quality $\theta$, initialize the urn with $\theta N$ balls labeled "$a$" and $(1 - \theta) N$ balls labeled "$b$." Successive returns are successive draws without replacement from the urn. The urn is refreshed every two draws. Rabin (2002) calls an agent with this finite-urn model an "$N$-Freddy." Since the urn is not refreshed between draws $2k - 1$ and $2k$ for $k = 1, 2, 3, \ldots$, such pairs of draws exhibit negative correlation in the agent's feasible model, generating the gambler's fallacy.

Returning to Rabin (2002) Section 7's example, objectively all financial analysts have quality $\theta = \frac12$. The agent samples a financial analyst at random and observes his returns over two periods. Depending on the realizations of these two returns, the agent either observes the same analyst for two more periods before sampling a new analyst, or immediately samples a new analyst. This procedure is infinitely repeated. Rabin (2002) investigates a 4-Freddy agent's long-run belief about the proportions of analysts with the three levels of quality in the population.

The endogenous observation in the example is distinct from what I term the "censoring effect" in this paper. The main mechanism behind my censoring effect is that some observations omit signals $X_2$, which biased agents judge to be negatively correlated with signals that are always observed, $X_1$. This then leads to distorted inference. However, in Rabin (2002)'s finite-urn model, the urn is refreshed every two periods. This means an $N$-Freddy agent judges the part of the data that is sometimes censored (the analyst's returns in periods 3 and 4) to be independent of the part of the data that is always observed (the analyst's returns in periods 1 and 2). Therefore the driving force behind Rabin (2002) Section 7's example is not the interaction between censoring and the gambler's fallacy, but rather between censoring and the "Bayesian aspect" of $N$-Freddy's quasi-Bayesian inference.

4-Freddy    θ = 1/4    θ = 1/2    θ = 3/4
aa          0          1/6        1/2
ab          1/4        1/3        1/4
ba          1/4        1/3        1/4
bb          1/2        1/6        0
b∅          3/4        1/2        1/4

8-Freddy    θ = 1/4    θ = 1/2    θ = 3/4
aa          1/28       6/28       15/28
ab          6/28       8/28       6/28
ba          6/28       8/28       6/28
bb          15/28      6/28       1/28
b∅          3/4        1/2        1/4

Table OA.1: The likelihoods of observations under different analyst qualities, for 4-Freddy and 8-Freddy agents.

In this section, I study a related problem where an $N$-Freddy agent observes each analyst for either one or two periods, depending on whether the analyst generates a bad first-period return. This setup features the censoring effect, because the finite-urn model generates negative correlation between the first and second draws from each urn. I find that the agent's inference under this censoring structure tends to be too optimistic. This conclusion is in line with predictions about the censoring effect in the baseline model of this paper, for the basic inference result in Proposition 2 shows that when the dataset is censored in the opposite way (i.e. censored when the first draw is good), the resulting inference is too pessimistic. That is, I demonstrate the robustness of my censoring effect to an alternative model of the gambler's fallacy in a binary-signals setting, showing that it is not an artifact of the continuous-signals setup in my baseline model.

Table OA.1 displays the likelihood of all signals of length 2 for the 4-Freddy and 8-Freddy agents, for different values of $\theta \in \{\frac14, \frac12, \frac34\}$. The last row of each table also shows the likelihoods of simply observing the signal $b$ in the first period, under the censoring rule that stops observing an analyst if his first return is bad.

I first discuss inference without censoring.
After $aa$, Freddy exaggerates the relative likelihood of $\theta = \frac34$ to $\theta = \frac14$ compared to a Bayesian, whereas after $ab$ Freddy's relative likelihoods of these two qualities are the same as a Bayesian's. Overall, given a sample with an equal number of $aa$ and $ab$ signals, Freddy exaggerates the relative likelihood of $\theta = \frac34$ to $\theta = \frac14$. This phenomenon is analogous to the continuous version of the gambler's fallacy, where a biased observer "partially forgives" a mediocre outcome following an outstanding outcome. Observations with $a$ in the first period therefore lead to an overly optimistic estimate about the analyst's ability. By the same logic, observing an equal number of $ba$ and $bb$ signals would lead to exaggeration of the likelihood of $\theta = \frac14$ relative to $\theta = \frac34$.

[Footnote: Proposition OA.12 in the Online Appendix shows that when the dataset is censored using a strategy that stops when $X_1 \le c$ for some $c \in \mathbb{R}$, inference about the second-period fundamental is always too high.]

However, now suppose the second observation is censored when the first observation is $b$. The otherwise symmetric situation becomes asymmetric. Following the observation of $b\emptyset$ (where the second draw is censored), Freddy's inference is the same as a Bayesian's. So we have turned off the channel that leads to exaggerating the probability of $\theta = \frac14$ but kept the channel that leads to exaggerating the probability of $\theta = \frac34$. This is analogous to the censoring effect in my model, where censoring the second-period draw following unfavorable first-period draws implies overly optimistic inferences.

In the long run, the agent observes a distribution of returns across different analysts: 25% of the time $aa$ is observed, 25% of the time $ab$ is observed, and 50% of the time $b\emptyset$ is observed. To calculate the agent's long-run beliefs, first suppose Freddy's prior specifies that either all analysts have $\theta = \frac14$ or all analysts have $\theta = \frac34$. Then Freddy's long-run inference is given by the parameter maximizing the expected log-likelihood of the data.
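The finite-urn likelihoods and the expected log-likelihood comparison described above can be checked with a short script. This is only a sketch under the stated model (draws without replacement from an urn of $N$ balls, with long-run frequencies 25% $aa$, 25% $ab$, 50% $b\emptyset$); the function names and the key `"b0"` standing for $b\emptyset$ are illustrative choices, not notation from Rabin (2002).

```python
from fractions import Fraction
from math import log

QUALITIES = (Fraction(1, 4), Fraction(1, 2), Fraction(3, 4))

# Long-run frequencies when every analyst truly has theta = 1/2 and the second
# return is censored after a bad first return ("b0" stands for b followed by no draw).
LONG_RUN = {"aa": Fraction(1, 4), "ab": Fraction(1, 4), "b0": Fraction(1, 2)}


def freddy_likelihoods(n_balls, theta):
    """Likelihood of each observation for an N-Freddy who believes in quality theta."""
    a = theta * n_balls      # balls labeled "a" in a fresh urn
    b = n_balls - a          # balls labeled "b" in a fresh urn
    rest = n_balls - 1       # balls left after the first draw
    return {
        "aa": (a / n_balls) * ((a - 1) / rest),
        "ab": (a / n_balls) * (b / rest),
        "ba": (b / n_balls) * (a / rest),
        "bb": (b / n_balls) * ((b - 1) / rest),
        "b0": b / n_balls,   # first draw is b, second draw is never observed
    }


def expected_loglik(n_balls, theta, frequencies=LONG_RUN):
    """Expected log-likelihood of the long-run data under the finite-urn model."""
    row = freddy_likelihoods(n_balls, theta)
    total = 0.0
    for observation, frequency in frequencies.items():
        if row[observation] == 0:
            return float("-inf")  # an observed signal is impossible under theta
        total += float(frequency) * log(float(row[observation]))
    return total


if __name__ == "__main__":
    for n_balls in (4, 8):
        for theta in (Fraction(1, 4), Fraction(3, 4)):
            print(n_balls, theta, expected_loglik(n_balls, theta))
```

Exact fractions keep the table entries comparable to the tabulated values, and the zero-likelihood branch covers the 4-Freddy case in which $aa$ is impossible under $\theta = \frac14$.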
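For 4-Freddy, the log-likelihood under $\theta = \frac14$ is $-\infty$ while the log-likelihood under $\theta = \frac34$ is a finite number. For 8-Freddy, the log-likelihood under $\theta = \frac14$ is

$$\tfrac14 \ln(1/28) + \tfrac14 \ln(6/28) + \tfrac12 \ln(3/4) \approx -1.36,$$

while the log-likelihood under $\theta = \frac34$ is

$$\tfrac14 \ln(15/28) + \tfrac14 \ln(6/28) + \tfrac12 \ln(1/4) \approx -1.23.$$

So in both cases, Freddy will come to believe $\theta = \frac34$ over $\theta = \frac14$ for all analysts.

Now consider a 4-Freddy who dogmatically believes some $1 - \kappa \in (0, 1)$ fraction of the analysts have $\theta = \frac12$, but the remaining analysts either have $\theta = \frac14$ or $\theta = \frac34$. So, the agent estimates $q_a \in [0, \kappa]$, the fraction of analysts who have $\theta = \frac34$. Straightforward algebra shows that the $q_a^*$ maximizing the expected log-likelihood of the data is $q_a^* = \frac{7}{18}\kappa + \frac19$ for $\kappa \ge \frac{2}{11}$, and $q_a^* = \kappa$ otherwise. Since $\frac{7}{18}\kappa + \frac19 > \frac12 \kappa$ for all $\kappa \in (\frac{2}{11}, 1)$, the estimate $q_a^*$ exceeds $\frac12 \kappa$ throughout this range, again reflecting overly optimistic inference.

OA 7 The Gambler's Fallacy and Attentional Stability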

Many papers on behavioral learning, including this one, can be thought of as studying agents whose prior (or "misspecified theory") over states of the world excludes the true, data-generating state. Agents in this paper start with a prior supported on the class of feasible models $\{\Psi(\mu_1, \mu_2; \gamma) : (\mu_1, \mu_2) \in \mathcal{M}\}$ for some fixed $\gamma > 0$, with different models viewed as different states. But the true state is the objective distribution $(X_1, X_2) \sim \Psi(\mu_1^\bullet, \mu_2^\bullet; 0)$, which lies outside the feasible set. This discrepancy is not an issue when agents move one at a time and pass down their beliefs, since each agent only updates using one history, her own. But in the large-generations environment, as an agent's dataset grows, her misspecified theory can appear infinitely less likely in the limit than an alternative prior belief (or "light-bulb theory") that includes the true state in its support.

Gagnon-Bartsch, Rabin, and Schwartzstein (2018) offer an explanation for why such misspecified theories persist with learning: attentional stability. Under a misspecified theory, some coarsened information may be sufficient for decision-making. When agents only pay attention to this coarsened information, the aspects of the data that they attend to may be so coarse that their misspecified theory no longer appears infinitely less likely than the light-bulb theory.

In this section, I investigate the attentional stability of the gambler's fallacy bias in my learning setting for the Gaussian case (so $\Psi(\mu_1, \mu_2; \gamma)$ will stand for a correlated Gaussian distribution). The main intuition is that when agents are dogmatic about $\gamma$, they are dogmatic about the correlation between $X_1$ and $X_2$. Therefore, under their misspecified theory, agents do not find it necessary to separately keep track of the conditional distributions $X_2 \mid (X_1 = x_1)$ for different values of $x_1$. Agents believe certain "statistics" of the dataset are sufficient for decision-making, and this process of reducing the entire dataset to these sufficient statistics removes features of the dataset that would otherwise have led the agents to question the validity of their theory.

My setting differs in two ways from that of Gagnon-Bartsch, Rabin, and Schwartzstein (2018). Each of my agents acts once (after observing a possibly large or even infinite dataset), while their agents observe one signal each period over an infinite number of periods. Another distinction is that data is endogenous in my setting, whereas Gagnon-Bartsch, Rabin, and Schwartzstein (2018) almost entirely focus on an exogenous-data environment. So, I begin by defining the key concepts surrounding attentional stability in my setting.

OA 7.1 A Deﬁnition of Attentional Stability in Large Datasets

In the learning environment where agents act in large generations, each agent in generation $t$ observes $t$ sub-datasets of infinitely many histories. The overall distribution of histories in the dataset is $\mathcal{H}^\bullet(c_{[0]}, \ldots, c_{[t-1]}) = \oplus_{k=0}^{t-1} \mathcal{H}^\bullet(c_{[k]})$, where the right-hand side refers to the mixture between the $t$ history distributions that assigns weight $1/t$ to each.

To develop a definition of attentional stability in large datasets, I consider an agent who directly observes a distribution of histories (instead of a dataset with this distribution) $\mathcal{H}^\bullet(c_1, \ldots, c_L) \in \Delta(\mathbb{H})$. This represents the observations of agents in each generation $t \ge 1$.

Definition OA.6.

Let $\pi, \lambda$ be beliefs over the joint distribution of $(X_1, X_2)$. Say $\pi$ is inexplicable relative to $\lambda$, conditional on the true model $\Psi^\bullet$ and censoring thresholds $c_1, \ldots, c_L$, if $\mathcal{H}^\bullet(c_1, \ldots, c_L) = \mathcal{H}(\Psi; c_1, \ldots, c_L)$ for some $\Psi \in \mathrm{supp}(\lambda)$, but $\mathcal{H}^\bullet(c_1, \ldots, c_L) \ne \mathcal{H}(\Psi; c_1, \ldots, c_L)$ for any $\Psi \in \mathrm{supp}(\pi)$.

Each feasible model $\Psi$ and list of censoring thresholds $c_1, \ldots, c_L$ together induce a distribution over histories. If the observed history distribution $\mathcal{H}^\bullet(c_1, \ldots, c_L)$ can be produced by some feasible model of $(X_1, X_2)$ in the support of the light-bulb theory $\lambda$, but not by any distribution in the support of the misspecified theory $\pi$, then I call $\pi$ inexplicable.

I now define a particular kind of limited attention. Given a distribution over histories, the agent maps the entire distribution to finitely many real numbers. This is an extreme form of data coarsening. If there is a strategy optimal under the misspecified theory $\pi$ that only makes use of these finitely many statistics, then we have a sufficient-statistics strategy.

Definition OA.7. A sufficient-statistics strategy (SSS) for large generations consists of a statistics map $\Lambda : \Delta(\mathbb{H}) \to \mathbb{R}^K$ for some finite $K < \infty$ and a cutoff map $\sigma : \mathrm{Im}(\Lambda) \to \mathbb{R}$, such that agents in each generation $t \ge 1$ find it optimal (under the prior $\Psi \sim \pi$) to use the stopping strategy with cutoff $\sigma(\Lambda(\mathcal{H}))$ whenever $\mathcal{H}$ is a dataset of predecessors' histories $\mathcal{H} = \mathcal{H}^\bullet(c_{[0]}, \ldots, c_{[t-1]})$.

An agent following the strategy $(\Lambda, \sigma)$ first extracts $K$ statistics (i.e. $K$ real numbers) from the infinite dataset of predecessors' histories. Then, she applies $\sigma$ to choose a cutoff threshold that only depends on the dataset through its $K$ extracted statistics, $\Lambda(\mathcal{H})$. The idea is that the agent only pays attention to the finitely many statistics, a perhaps more realistic behavior than paying full attention to the entire infinite dataset. If such a strategy is optimal for an agent believing the true joint distribution of $(X_1, X_2)$ is drawn according to her (misspecified) prior $\Psi \sim \pi$, I call the pair $(\Lambda, \sigma)$ an SSS.

A related definition of sufficiency works with finite datasets instead of infinite datasets. This corresponds to limited attention in an alternative version of my Section 3 environment, where agents act one at a time but observe all predecessors' histories instead of adopting the immediate predecessor's posterior belief.

Definition OA.8. A sufficient-statistics strategy (SSS) in datasets of size $N < \infty$ consists of a statistics map $\Lambda^{(N)} : \mathbb{H}^N \to \mathbb{R}^K$ for some finite $K < \infty$ and a cutoff map $\sigma^{(N)} : \mathrm{Im}(\Lambda^{(N)}) \times \mathbb{N} \to \mathbb{R}$, such that the subjectively optimal cutoff threshold (under the Bayesian posterior belief about the fundamentals after updating prior density $m(\mu_1, \mu_2)$) is $\sigma^{(N)}(\Lambda^{(N)}((h_n)_{n=1}^N), N_2)$ after observing a dataset $(h_n)_{n=1}^N$ with size $N$ and containing $N_2 \le N$ instances of second-period draws.

Finally, I combine these concepts to define attentional stability. Roughly speaking, the theory $\pi$ is attentionally stable if we can find a $(\Lambda, \sigma)$ pair that pays "fine" enough attention to be an SSS under $\pi$, but "coarse" enough attention so that the resulting statistics can be explained by some model in the support of $\pi$.

Definition OA.9.

Theory $\pi$ is attentionally stable, conditional on the objective model $\Psi^\bullet$ and censoring thresholds $c_1, \ldots, c_L$, if there exists an SSS $(\Lambda, \sigma)$ such that $\Lambda(\mathcal{H}^\bullet(c_1, \ldots, c_L)) = \Lambda(\mathcal{H}(\Psi; c_1, \ldots, c_L))$ for some $\Psi$ in the support of $\pi$.

OA 7.2 The Gambler's Fallacy is Inexplicable under Full Attention

Fix $\gamma > 0$. Let $\pi$ be any full-support belief over $\{\Psi(\mu_1, \mu_2; \gamma) : (\mu_1, \mu_2) \in \mathcal{M}\}$, where $\mathcal{M} \subseteq \mathbb{R}^2$ is any specification of feasible fundamentals. Let $\lambda$ be any belief with $\Psi^\bullet = \Psi(\mu_1^\bullet, \mu_2^\bullet; 0)$ in its support. I first show that without channeled attention, agents will come to realize that their misspecified theory $\pi$ is wrong after seeing a large dataset.

Proposition OA.8. $\pi$ is inexplicable relative to $\lambda$, conditional on $\Psi^\bullet$ and any censoring thresholds $c_1, \ldots, c_L \in \mathbb{R}$.

Proof. This is because $\Psi^\bullet \in \mathrm{supp}(\lambda)$ but every $\Psi \in \mathrm{supp}(\pi)$ has KL divergence bounded away from 0 relative to $\Psi^\bullet$ in terms of the histories they generate under censoring by $c_1, \ldots, c_L$, that is to say

$$\inf_{\Psi \in \mathrm{supp}(\pi)} D_{KL}\big(\mathcal{H}^\bullet(c_1, \ldots, c_L) \,\|\, \mathcal{H}(\Psi; c_1, \ldots, c_L)\big) = \inf_{(\mu_1, \mu_2) \in \mathcal{M}} D_{KL}\big(\mathcal{H}^\bullet(c_1, \ldots, c_L) \,\|\, \mathcal{H}(\Psi(\mu_1, \mu_2, \sigma^2, \sigma^2; \gamma); c_1, \ldots, c_L)\big) > 0.$$

This inequality holds because the derivation of $\mu_1^*, \mu_2^*$ in Example 2 implies the above KL-divergence minimization problem has a minimum strictly above 0 even over the unrestricted domain $(\mu_1, \mu_2) \in \mathbb{R}^2$. The restriction to some $\mathcal{M} \subseteq \mathbb{R}^2$ can only make the minimum larger.

OA 7.3 The Gambler's Fallacy is Attentionally Stable

Now I exhibit a family of SSS for finite datasets of size $N$ and another SSS for large generations that naturally corresponds to taking $N \to \infty$. These SSS have the additional property that they lead agents to the same beliefs about the fundamentals as the full-attention Bayesianism assumed in the rest of the paper. So, not only do these SSS justify agents not discarding their misspecified theory after seeing large datasets, they also provide a limited-attention foundation for the learning dynamics that I investigate in the main text of the paper for the Gaussian case.

In a dataset of size $N$, consider the statistics map with $K = 2$,

$$\Lambda^{(N)}\big((h_n)_{n=1}^N\big) = \left( \frac{1}{N} \sum_{n=1}^N h_{1,n},\;\; \frac{1}{\#\{n : h_{2,n} \ne \emptyset\}} \sum_{n : h_{2,n} \ne \emptyset} (h_{2,n} + \gamma h_{1,n}) \right).$$

The first statistic is the sample mean of the first-period draws. The second statistic is the sample mean of a "re-centered" observation $v_n := h_{2,n} + \gamma h_{1,n}$, taken over the histories $h_n$ where $h_{2,n} \ne \emptyset$. The agent only pays attention to the sample averages of $x_{1,n} = h_{1,n}$ and $v_n$. Under the feasible model $\Psi(\mu_1, \mu_2; \gamma)$, we may write the distributions of $X_1, X_2$ as

$$X_1 = \mu_1 + \epsilon_1, \qquad X_2 = \mu_2 - \gamma \epsilon_1 + z_2,$$

where $\epsilon_1, z_2 \sim N(0, \sigma^2)$ are independent. Defining $V := X_2 + \gamma X_1$, we see that under $\Psi(\mu_1, \mu_2; \gamma)$, $V = \mu_2 + \gamma \mu_1 + z_2$. So, observations of first-period draws are signals about $\mu_1$, while observations of the re-centered second-period variable $V$ are signals about $\mu_2 + \gamma \mu_1$.

Proposition OA.9. $\Lambda^{(N)}$ is part of an SSS in datasets of size $N$. The cutoff choice in this SSS is the same as for the full-attention agent.

Proof. Write $\phi(x; a, b^2)$ for the Gaussian density with mean $a$ and variance $b^2$, evaluated at $x$. Without loss, suppose $h_{2,n} \ne \emptyset$ for all $1 \le n \le N_2$, and $h_{2,n} = \emptyset$ for all $n > N_2$. I show that the posterior density over $(\mu_1, \mu_2)$ after the dataset $(h_n)_{n=1}^N$ only depends on $N_2$, $\frac{1}{N} \sum_{n=1}^N h_{1,n}$, and $\frac{1}{N_2} \sum_{n=1}^{N_2} (h_{2,n} + \gamma h_{1,n})$. Indeed,

$$m(\mu_1, \mu_2 \mid (h_n)_{n=1}^N) \propto m(\mu_1, \mu_2) \left[ \prod_{n=1}^{N_2} \phi(h_{1,n}; \mu_1, \sigma^2) \cdot \phi(h_{2,n}; \mu_2 - \gamma(h_{1,n} - \mu_1), \sigma^2) \right] \cdot \left[ \prod_{n=N_2+1}^{N} \phi(h_{1,n}; \mu_1, \sigma^2) \right]$$

$$= m(\mu_1, \mu_2) \left[ \prod_{n=1}^{N} \phi(h_{1,n}; \mu_1, \sigma^2) \right] \cdot \left[ \prod_{n=1}^{N_2} \phi(h_{2,n}; \mu_2 - \gamma(h_{1,n} - \mu_1), \sigma^2) \right]$$

$$= m(\mu_1, \mu_2) \left[ \prod_{n=1}^{N} \phi(h_{1,n}; \mu_1, \sigma^2) \right] \cdot \left[ \prod_{n=1}^{N_2} \phi(h_{2,n} + \gamma h_{1,n}; \mu_2 + \gamma \mu_1, \sigma^2) \right].$$

It is well known that under the Gaussian likelihood, the dependence of $(h_{1,n})_{n=1}^N \mapsto \prod_{n=1}^N \phi(h_{1,n}; \mu_1, \sigma^2)$ on $\mu_1$ runs only through $\frac{1}{N} \sum_{n=1}^N h_{1,n}$, and for the same reason the dependence of $(h_{2,n} + \gamma h_{1,n})_{n=1}^{N_2} \mapsto \prod_{n=1}^{N_2} \phi(h_{2,n} + \gamma h_{1,n}; \mu_2 + \gamma \mu_1, \sigma^2)$ on $(\mu_1, \mu_2)$ runs only through $\frac{1}{N_2} \sum_{n=1}^{N_2} (h_{2,n} + \gamma h_{1,n})$.

Since the posterior belief $m(\cdot \mid (h_n)_{n=1}^N)$ only depends on $N_2$ and the two statistics $\Lambda^{(N)}_1((h_n)_{n=1}^N), \Lambda^{(N)}_2((h_n)_{n=1}^N) \in \mathbb{R}$, the optimal cutoff rule may be expressed as a function of these two statistics, $N_2$, and $c$ of the predecessors.

In the environment where full-attention Bayesian agents move one at a time, their behavior is indistinguishable from agents using this SSS. Roughly speaking, this is because the subjective joint distribution of $(X_1, V)$ is Gaussian and the mean of a sequence of Gaussian random variables is a sufficient statistic for the likelihood of the entire sequence. Even when agents are full-attention Bayesians, their posterior distribution only depends on the history data through these statistics. Therefore, the statistics are sufficient for any decision problem.

Consider now the large-sample analog of the finite-sample SSS just defined. Again with $K = 2$, consider the statistics map $\Lambda$ that sends each distribution $\mathcal{H}$ to $E_{h \sim \mathcal{H}}[h_1]$ and $E_{h \sim \mathcal{H}}[h_2 + \gamma h_1 \mid h_2 \ne \emptyset]$. I show that $\Lambda$ makes $\pi$ attentionally explicable whenever $\pi$ has full support over the feasible models indexed by feasible fundamentals $\mathcal{M} = \mathbb{R}^2$.

Proposition OA.10.

For any list of censoring thresholds $c_1, \ldots, c_L \in \mathbb{R}$ and fundamentals $\mu_1, \mu_2 \in \mathbb{R}$,

$$\Lambda_1(\mathcal{H}(\Psi(\mu_1, \mu_2, \gamma); c_1, \ldots, c_L)) = \mu_1, \qquad \Lambda_2(\mathcal{H}(\Psi(\mu_1, \mu_2, \gamma); c_1, \ldots, c_L)) = \mu_2 + \gamma \mu_1.$$

Also,

$$\Lambda(\mathcal{H}^\bullet(c_1, \ldots, c_L)) = \Lambda\big(\mathcal{H}(\Psi(\mu_1^\bullet, \mu_2^*(c_1, \ldots, c_L); \gamma); c_1, \ldots, c_L)\big).$$

The first two equations in this claim show that for any $c_1, \ldots, c_L$, $\Psi \mapsto \Lambda(\mathcal{H}(\Psi; c_1, \ldots, c_L))$ is a one-to-one function on the support of $\pi$, and furthermore any values of the statistics $s_1, s_2$ can be rationalized through appropriate choices of $\mu_1, \mu_2$. We may put $\sigma(s_1, s_2) = C(s_1, s_2 - \gamma s_1; \gamma)$ to make $(\Lambda, \sigma)$ an SSS, thus showing the gambler's fallacy is attentionally stable in large datasets. Another implication of this claim is that the limited-attention agent comes to believe the large-generations pseudo-true fundamentals $(\mu_1^\bullet, \mu_2^*(c_1, \ldots, c_L))$ after seeing the history distribution $\mathcal{H}^\bullet(c_1, \ldots, c_L)$. Therefore, the large-generations SSS gives the same behavior as the full-attention Bayesianism in the baseline large-generations environment.
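The claim can be illustrated numerically. The sketch below simulates histories from the objective model (independent Gaussian draws), censors the second draw whenever the first draw exceeds the threshold, computes the two statistics, and compares the implied second-period fundamental $s_2 - \gamma s_1$ with the closed-form value $\mu_2^\bullet - \gamma \sum_\ell w_\ell (\mu_1^\bullet - E[X_1 \mid X_1 \le c_\ell])$. It assumes equal numbers of histories per threshold and a common standard deviation; all function names and parameter values are hypothetical choices for illustration.

```python
import math
import random


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def mean_below(mu, sigma, c):
    """E[X | X <= c] for X ~ N(mu, sigma^2)."""
    z = (c - mu) / sigma
    return mu - sigma * normal_pdf(z) / normal_cdf(z)


def pseudo_true_mu2(mu1_true, mu2_true, gamma, sigma, cutoffs):
    """Closed form, with weights proportional to P[X1 <= c_l]."""
    probs = [normal_cdf((c - mu1_true) / sigma) for c in cutoffs]
    total = sum(probs)
    gap = sum(p / total * (mu1_true - mean_below(mu1_true, sigma, c))
              for p, c in zip(probs, cutoffs))
    return mu2_true - gamma * gap


def simulated_statistics(mu1_true, mu2_true, gamma, sigma, cutoffs,
                         n_per_cutoff, seed=0):
    """Monte Carlo versions of the two statistics under the objective model."""
    rng = random.Random(seed)
    first_sum, first_count = 0.0, 0
    v_sum, v_count = 0.0, 0
    for c in cutoffs:
        for _ in range(n_per_cutoff):
            x1 = rng.gauss(mu1_true, sigma)
            first_sum += x1
            first_count += 1
            if x1 <= c:  # predecessor continued, so the second draw is recorded
                x2 = rng.gauss(mu2_true, sigma)  # independent of x1 under the truth
                v_sum += x2 + gamma * x1
                v_count += 1
    return first_sum / first_count, v_sum / v_count


if __name__ == "__main__":
    s1, s2 = simulated_statistics(0.0, 0.0, 0.5, 1.0, [-0.5, 0.5], 100_000)
    print("implied mu1:", s1)
    print("implied mu2:", s2 - 0.5 * s1)
    print("closed form:", pseudo_true_mu2(0.0, 0.0, 0.5, 1.0, [-0.5, 0.5]))
```

Under these assumptions the simulated and closed-form values should agree up to sampling error, and both lie below the true second-period mean.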

Proof.

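To see the first two equations, let $c \in \mathbb{R}$, $\mu_1, \mu_2 \in \mathbb{R}$, and write $\Psi = \Psi(\mu_1, \mu_2; \gamma)$. We have

$$E_{h \sim \mathcal{H}(\Psi; c)}[h_2 + \gamma h_1 \mid h_2 \ne \emptyset] = E_{h \sim \mathcal{H}(\Psi; c)}[h_2 \mid h_2 \ne \emptyset] + \gamma E_{h \sim \mathcal{H}(\Psi; c)}[h_1 \mid h_2 \ne \emptyset]$$
$$= E_\Psi[X_2 \mid X_1 \le c] + \gamma E_\Psi[X_1 \mid X_1 \le c]$$
$$= E_\Psi[\mu_2 - \gamma(X_1 - \mu_1) \mid X_1 \le c] + \gamma E_\Psi[X_1 \mid X_1 \le c]$$
$$= E_\Psi[\mu_2 + \gamma \mu_1 \mid X_1 \le c]$$
$$= \mu_2 + \gamma \mu_1.$$

Since this holds for any $c$, we must get that on the mixed history distribution, $\Lambda_2(\mathcal{H}(\Psi(\mu_1, \mu_2, \gamma); c_1, \ldots, c_L)) = \mu_2 + \gamma \mu_1$ as well. It is easy to see that we must have $\Lambda_1(\mathcal{H}(\Psi(\mu_1, \mu_2, \gamma); c_1, \ldots, c_L)) = \mu_1$.

To obtain the final equation, first note that we can rewrite the second statistic under the true distribution of histories $\mathcal{H}^\bullet(c_1, \ldots, c_L)$ as a weighted average,

$$E_{h \sim \mathcal{H}^\bullet(c_1, \ldots, c_L)}[h_2 + \gamma h_1 \mid h_2 \ne \emptyset] = \sum_{\ell=1}^{L} w_\ell \cdot E_{h \sim \mathcal{H}^\bullet(c_\ell)}[h_2 + \gamma h_1 \mid h_2 \ne \emptyset].$$

This is because the event $h_2 \ne \emptyset$ happens only when $h_1$ falls below the censoring threshold, so the posterior probability of $h$ being generated from the sub-distribution $\mathcal{H}^\bullet(c_\ell)$ given that $h_2 \ne \emptyset$ depends on the relative likelihoods of $X_1$ falling under the $L$ different censoring thresholds; that is, $w_\ell$ is proportional to $P[X_1 \le c_\ell]$.

For each $\ell$,

$$E_{h \sim \mathcal{H}^\bullet(c_\ell)}[h_2 + \gamma h_1 \mid h_1 \le c_\ell] = \mu_2^\bullet + \gamma E[X_1 \mid X_1 \le c_\ell],$$

where the conditional expectation of $h_2 \mid h_1 \le c_\ell$ is simply $\mu_2^\bullet$ by independence of $X_1$ and $X_2$ under $\Psi^\bullet$. Putting this into the weighted-average expression,

$$E_{h \sim \mathcal{H}^\bullet(c_1, \ldots, c_L)}[h_2 + \gamma h_1 \mid h_2 \ne \emptyset] = \mu_2^\bullet + \gamma \sum_{\ell=1}^{L} w_\ell \cdot E[X_1 \mid X_1 \le c_\ell].$$

In order to match the statistics $s_1 = \mu_1^\bullet$ and $s_2 = \mu_2^\bullet + \gamma \sum_{\ell=1}^{L} w_\ell \cdot E[X_1 \mid X_1 \le c_\ell]$ produced by $\Lambda(\mathcal{H}^\bullet(c_1, \ldots, c_L))$, we must therefore have $\mu_1 = \mu_1^\bullet$, and

$$\mu_2^\bullet + \gamma \sum_{\ell=1}^{L} w_\ell \cdot E[X_1 \mid X_1 \le c_\ell] = \mu_2 + \gamma \mu_1^\bullet,$$

which rearranges to

$$\mu_2 = \mu_2^\bullet - \gamma \sum_{\ell=1}^{L} w_\ell \cdot (\mu_1^\bullet - E[X_1 \mid X_1 \le c_\ell]) = \mu_2^*(c_1, \ldots, c_L).$$

OA 8 Additional Extensions of the Baseline Model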

I consider further extensions of the baseline model for the Gaussian case.

OA 8.1 Draws as Costs

In the baseline model, I have studied optimal-stopping problems satisfying Assumption 1. One implication of Assumption 1 is that higher draws are more beneficial to the agent, as $u_1$ and $u_2$ are strictly increasing functions of the draws in their respective periods. In this section, I verify the robustness of my positive-feedback loop result when draws are interpreted as costs. Here is the canonical example to keep in mind:

Example OA.6 (Do It Now or Later). The agent has two periods to complete a task. In period 1, she draws her cost of completing the task today, $X_1 = x_1$. The agent chooses between paying $x_1$ and finishing the task, or waiting until period 2. If she decides to wait, she will draw another cost $X_2 = x_2$ in period 2, which she must then pay. So, $u_1(x_1) = -x_1$ and $u_2(x_1, x_2) = -x_2$.

In optimal-stopping problems like Example OA.6, the subjectively optimal stopping rule given any beliefs about the fundamentals in the two periods features stopping for low values of $X_1$. This means agents observe censored datasets from their predecessors where $X_2$ is only observed following high values of $X_1$, the "opposite" kind of endogenous censoring compared to what happens in problems satisfying Assumption 1. Now, a more heavily censored dataset induces a higher belief about the second-period mean in the next generation, which causes the next generation to accept higher costs in the first period. This exacerbates the censoring and the positive-feedback cycle again obtains.

More generally, I will consider payoff functions $u_1(x_1)$, $u_2(x_1, x_2)$ satisfying the following assumptions.

Assumption OA.3.

The payoff functions satisfy:

(a) For $x_1' > x_1''$ and $x_2' > x_2''$, $u_1(x_1') < u_1(x_1'')$ and $u_2(x_1', x_2') < u_2(x_1'', x_2'')$.

(b) For $x_1' > x_1''$ and any $\bar x_2$, $u_1(x_1') - u_1(x_1'') < -|u_2(x_1', \bar x_2) - u_2(x_1'', \bar x_2)|$.

(c) There exist $x_1^h, x_2^l, x_1^l, x_2^h \in \mathbb{R}$ so that $u_1(x_1^h) - u_2(x_1^h, x_2^l) < 0$, while $u_1(x_1^l) - u_2(x_1^l, x_2^h) > 0$.

(d) $u_1, u_2$ are continuous. Also, for any $\bar x_1 \in \mathbb{R}$, $x_2 \mapsto u_2(\bar x_1, x_2)$ is absolutely integrable with respect to any Gaussian distribution on $\mathbb{R}$.

I show that the subjectively optimal stopping strategy under dogmatic belief in fundamentals $\mu_1, \mu_2$ takes a cutoff form, but the agent stops in period 1 for low realizations of period 1 costs, $X_1 \le c$. Furthermore, the optimal cutoff increases in $\mu_2$.

Proposition OA.11.

Under Assumption OA.3 and for $\gamma > 0$,

• Under each feasible model $\Psi(\mu_1, \mu_2; \gamma)$, there exists a cutoff threshold $C_{cost}(\mu_1, \mu_2; \gamma) \in \mathbb{R}$ such that it is strictly optimal to continue whenever $x_1 > C_{cost}(\mu_1, \mu_2; \gamma)$ and strictly optimal to stop whenever $x_1 < C_{cost}(\mu_1, \mu_2; \gamma)$.

• For every $\mu_1 \in \mathbb{R}$, $\mu_2 \mapsto C_{cost}(\mu_1, \mu_2; \gamma)$ is strictly increasing.

Proof. Consider the pair of payoff functions $\tilde u_1 : \mathbb{R} \to \mathbb{R}$ and $\tilde u_2 : \mathbb{R}^2 \to \mathbb{R}$ defined by $\tilde u_1(\tilde x_1) := u_1(-\tilde x_1)$ and $\tilde u_2(\tilde x_1, \tilde x_2) := u_2(-\tilde x_1, -\tilde x_2)$. It is easy to verify that since $u_1, u_2$ satisfy Assumption OA.3, $\tilde u_1, \tilde u_2$ must satisfy Assumption 1. When $(X_1, X_2) \sim \Psi(\mu_1, \mu_2; \gamma)$, we also have $(\tilde X_1, \tilde X_2) \sim \Psi(-\mu_1, -\mu_2; \gamma)$, where $(\tilde X_1, \tilde X_2) = (-X_1, -X_2)$. The best stopping strategy under the payoff functions $\tilde u_1, \tilde u_2$ when draws are generated from $\Psi(-\mu_1, -\mu_2; \gamma)$ involves the cutoff threshold $C(-\mu_1, -\mu_2; \gamma)$, stopping whenever $\tilde X_1$ exceeds the threshold and continuing whenever $\tilde X_1$ falls below it. Here, $C(-\mu_1, -\mu_2; \gamma)$ is the usual acceptance threshold from Proposition 1.

By the relationship between $u_1, u_2$ and $\tilde u_1, \tilde u_2$, we deduce that the optimal stopping strategy under the payoff functions $u_1, u_2$ when draws are generated from $\Psi(\mu_1, \mu_2; \gamma)$ involves the cutoff threshold $C_{cost}(\mu_1, \mu_2; \gamma) := -C(-\mu_1, -\mu_2; \gamma)$. The agent should stop when the first (cost-based) draw falls below $C_{cost}(\mu_1, \mu_2; \gamma)$, and continue when the first draw exceeds the cutoff.

For $\mu_2' > \mu_2''$, we have $C(-\mu_1, -\mu_2'; \gamma) < C(-\mu_1, -\mu_2''; \gamma)$ by Proposition 1. So,

$$C_{cost}(\mu_1, \mu_2'; \gamma) = -C(-\mu_1, -\mu_2'; \gamma) > -C(-\mu_1, -\mu_2''; \gamma) = C_{cost}(\mu_1, \mu_2''; \gamma)$$

as desired.

As Proposition OA.11 shows, the subjectively optimal stopping rules in problems satisfying Assumption OA.3 imply a different kind of censoring than in the baseline model. Specifically, the history contains the second-period draw only when $X_1$ is high. For $c \in \mathbb{R}$, let $\bar S_c$ denote the stopping strategy $S(x_1) = \text{Continue}$ if $x_1 > c$, $S(x_1) = \text{Stop}$ if $x_1 \le c$. The bar notation distinguishes it from $S_c$, the stopping strategy with the stopping region $[c, \infty)$. For $c, \mu_1, \mu_2 \in \mathbb{R}$, the KL divergence between $\mathcal{H}(\Psi^\bullet; \bar S_c)$ and $\mathcal{H}(\Psi(\mu_1, \mu_2; \gamma); \bar S_c)$ is given by

$$\int_{-\infty}^{c} \phi(x_1; \mu_1^\bullet, \sigma^2) \cdot \ln\left( \frac{\phi(x_1; \mu_1^\bullet, \sigma^2)}{\phi(x_1; \mu_1, \sigma^2)} \right) dx_1 + \int_{c}^{\infty} \left\{ \int_{-\infty}^{\infty} \phi(x_1; \mu_1^\bullet, \sigma^2) \cdot \phi(x_2; \mu_2^\bullet, \sigma^2) \cdot \ln\left[ \frac{\phi(x_1; \mu_1^\bullet, \sigma^2) \cdot \phi(x_2; \mu_2^\bullet, \sigma^2)}{\phi(x_1; \mu_1, \sigma^2) \cdot \phi(x_2; \mu_2 - \gamma(x_1 - \mu_1), \sigma^2)} \right] dx_2 \right\} dx_1.$$

Proposition OA.12.

The pseudo-true fundamentals minimizing $D_{KL}(\mathcal{H}(\Psi^\bullet; \bar S_c) \,\|\, \mathcal{H}(\Psi(\mu_1, \mu_2; \gamma); \bar S_c))$ are $\mu_1^*(c) = \mu_1^\bullet$ and $\mu_2^*(c) = \mu_2^\bullet - \gamma(\mu_1^\bullet - E[X_1 \mid X_1 \ge c])$. So $\mu_2^*(c)$ is strictly increasing in $c$.

Since $E[X_1 \mid X_1 \ge c] > \mu_1^\bullet$ for every $c \in \mathbb{R}$ and $\gamma > 0$, this shows the pseudo-true second-period fundamental is always too high for every stopping strategy $\bar S_c$. The direction of misinference about $\mu_2$ is the opposite of that in the main text, due to the opposite asymmetry in data censoring. Still, the key mechanism behind the misinference remains the same: the interaction between the (opposite kind of) censoring effect and the gambler's fallacy, as an unbiased agent with $\gamma = 0$ and a biased agent facing uncensored data both infer $\mu_2$ correctly. Since high values of draws are bad news in the environment where draws are interpreted as costs, this shows agents end up with over-pessimistic beliefs about the distributions, as overestimating $\mu_2$ corresponds to making an overly unfavorable assessment about the second period.

Proof.

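Rewrite $D_{KL}(\mathcal{H}(\Psi^\bullet; \bar S_c) \,\|\, \mathcal{H}(\Psi(\mu_1, \mu_2; \gamma); \bar S_c))$ as

$$\int_{-\infty}^{\infty} \phi(x_1; \mu_1^\bullet, \sigma^2) \cdot \ln\left( \frac{\phi(x_1; \mu_1^\bullet, \sigma^2)}{\phi(x_1; \mu_1, \sigma^2)} \right) dx_1 + \int_{c}^{\infty} \phi(x_1; \mu_1^\bullet, \sigma^2) \cdot \int_{-\infty}^{\infty} \phi(x_2; \mu_2^\bullet, \sigma^2) \ln\left[ \frac{\phi(x_2; \mu_2^\bullet, \sigma^2)}{\phi(x_2; \mu_2 - \gamma(x_1 - \mu_1), \sigma^2)} \right] dx_2 \, dx_1.$$

The KL divergence between $N(\mu_{true}, \sigma^2_{true})$ and $N(\mu_{model}, \sigma^2_{model})$ is $\ln \frac{\sigma_{model}}{\sigma_{true}} + \frac{\sigma^2_{true} + (\mu_{true} - \mu_{model})^2}{2 \sigma^2_{model}} - \frac12$, so we may simplify the first term and the inner integral of the second term to obtain

$$\frac{(\mu_1 - \mu_1^\bullet)^2}{2\sigma^2} + \int_{c}^{\infty} \phi(x_1; \mu_1^\bullet, \sigma^2) \cdot \left[ \frac{\sigma^2 + (\mu_2 - \gamma(x_1 - \mu_1) - \mu_2^\bullet)^2}{2\sigma^2} - \frac12 \right] dx_1.$$

Dropping constant terms not depending on $\mu_1$ and $\mu_2$ and multiplying by $\sigma^2$, we get a simplified expression of the objective,

$$\xi(\mu_1, \mu_2) := \frac{(\mu_1 - \mu_1^\bullet)^2}{2} + \int_{c}^{\infty} \phi(x_1; \mu_1^\bullet, \sigma^2) \cdot \left[ \frac{(\mu_2 - \gamma(x_1 - \mu_1) - \mu_2^\bullet)^2}{2} \right] dx_1.$$

We have the partial derivatives by differentiating under the integral sign,

$$\frac{\partial \xi}{\partial \mu_2} = \int_{c}^{\infty} \phi(x_1; \mu_1^\bullet, \sigma^2) \cdot (\mu_2 - \gamma(x_1 - \mu_1) - \mu_2^\bullet) \, dx_1,$$

$$\frac{\partial \xi}{\partial \mu_1} = (\mu_1 - \mu_1^\bullet) + \gamma \int_{c}^{\infty} \phi(x_1; \mu_1^\bullet, \sigma^2) \cdot (\mu_2 - \gamma(x_1 - \mu_1) - \mu_2^\bullet) \, dx_1 = (\mu_1 - \mu_1^\bullet) + \gamma \frac{\partial \xi}{\partial \mu_2}.$$

By the first-order conditions, at the minimum $(\mu_1^*, \mu_2^*)$, we must have

$$\frac{\partial \xi}{\partial \mu_1}(\mu_1^*, \mu_2^*) = \frac{\partial \xi}{\partial \mu_2}(\mu_1^*, \mu_2^*) = 0 \;\Rightarrow\; \mu_1^* = \mu_1^\bullet.$$

So $\mu_2^*$ satisfies $\frac{\partial \xi}{\partial \mu_2}(\mu_1^\bullet, \mu_2^*) = 0$, which by straightforward algebra shows $\mu_2^*(c) = \mu_2^\bullet - \gamma(\mu_1^\bullet - E[X_1 \mid X_1 \ge c])$.

By Proposition OA.11, a higher belief about the second-period fundamental leads to a higher cutoff $c$, which leads to more severe censoring of the dataset as $X_2$ is only observed when $X_1 \ge c$. This more severely censored dataset, in turn, leads to an even higher belief in the second-period fundamental by Proposition OA.12. So as in Theorem 2, cutoff thresholds and beliefs about the fundamentals form monotonic sequences across generations $t \ge 1$.

OA 8.2 Population with Heterogeneity in Selection Neglect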

In Section 3's learning environment, selection neglect is unlikely to appear. Bayesian inference simply takes the form of updating beliefs using what the agent sees during the stage game: either $X_1$ or the pair $(X_1, X_2)$.

I believe even the large-generations learning environment from Section 4 is unlikely to evoke selection neglect, a psychology most likely to be present when the observed dataset does not contain reminders about selection. Censoring is highly explicit and salient in my setting, which is not the type of framing that typically evokes selection neglect.

In Enke (2019)'s experiment on selection neglect, players (one human subject and five computer players following a mechanical rule) are asked to guess a "state of the world" based on the average of 6 private signals. Players are sorted into one of two groups based on whether their own private signal is high or low, then observe the signals of others in their group. In the baseline treatment, there is no reminder of the excluded data on the decision screen where subjects are shown the signals of others in the same group and asked to enter a guess. This treatment finds selection neglect. Another "nudge" treatment, where subjects are given a simple hint stating "Also pay attention to those randomly drawn balls that are not shown to you by the information source," reduces the number of selection neglecters by 50%. So I believe the much clearer reminders of selection in my environment should reduce the frequency of selection neglect even further.

Jehiel (2018) studies misperceived investment returns under selection neglect. In his model, each predecessor has a potential project and observes a private signal about the project's quality. Predecessors with high signals implement their projects. Agents in the current generation observe the pool of implemented projects, then generate their own signals about the qualities of these observed projects. These signals are independent of the actual private signals that the predecessors used for implementation decisions. Current agents infer the conditional quality given each signal using the empirical mean quality among past implemented projects generating the same signal. This is another environment where the dataset contains no hints about the existence of excluded data (the unimplemented projects) or the selection criterion (the private signals of predecessors). In fact, if datasets in Jehiel (2018)'s setting record the complete experience of the predecessors in their decision problems, as is the case in my history datasets, then the misinference result no longer holds.

Nevertheless, in this section, I study an extension of the baseline model where a fraction $0 \le \alpha < 1$ of agents in each generation are selection neglecters. Given a dataset of histories $(h_{1,n}, h_{2,n})_{n \in [0,1]}$, they treat $(h_{1,n})_{n \in [0,1]}$ as a sample from the unconditional distribution of $X_1$, and $(h_{2,n})_{n : h_{2,n} \ne \emptyset}$ as an independent sample from the unconditional distribution of $X_2$. Relative to the baseline agents, they mistake the selection process by which the $h_{2,n}$'s appear in the dataset: they are not censored at random, but only censored when $h_{1,n}$ exceeds the acceptance threshold used by the predecessors.
In this environment, the gambler's fallacy and selection neglect exactly cancel each other out, since in large datasets the mean of $h_{1,n}$ is $\mu_1^\bullet$ and the mean of uncensored $h_{2,n}$ is $\mu_2^\bullet$. This shows that from the dataset $\mathcal{H}^\bullet(c)$ for any $c \in \mathbb{R}$, the selection neglecters correctly infer the fundamentals and choose the stopping strategy with cutoff $C(\mu_1^\bullet, \mu_2^\bullet; \gamma)$.

Now consider a baseline agent with the gambler's fallacy, facing a dataset of histories generated by a heterogeneous population of predecessors. A fraction $\alpha$ of the histories are generated by selection neglecters using the stopping strategy $S_{C(\mu_1^\bullet, \mu_2^\bullet; \gamma)}$. The remaining $1 - \alpha$ fraction are generated by baseline predecessors using the stopping strategy $S_c$. The next proposition characterizes the pseudo-true fundamentals minimizing the weighted-average KL-divergence objective,

$$\alpha D_{KL}\big(\mathcal{H}^\bullet(C(\mu_1^\bullet, \mu_2^\bullet; \gamma)) \,\|\, \mathcal{H}(\Psi(\mu_1, \mu_2; \gamma); C(\mu_1^\bullet, \mu_2^\bullet; \gamma))\big) + (1 - \alpha) D_{KL}\big(\mathcal{H}^\bullet(c) \,\|\, \mathcal{H}(\Psi(\mu_1, \mu_2; \gamma); c)\big). \quad (4)$$

Proposition OA.13.

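The pseudo-true fundamentals minimizing Equation (4) when baseline predecessors use the stopping threshold $c$ are $\mu_1^{SN} = \mu_1^\bullet$ and

$$\mu_2^{SN}(c) = \frac{\alpha P[X_1 \le C(\mu_1^\bullet, \mu_2^\bullet; \gamma)]}{\alpha P[X_1 \le C(\mu_1^\bullet, \mu_2^\bullet; \gamma)] + (1 - \alpha) P[X_1 \le c]} \cdot \mu_2^*(C(\mu_1^\bullet, \mu_2^\bullet; \gamma)) + \frac{(1 - \alpha) P[X_1 \le c]}{\alpha P[X_1 \le C(\mu_1^\bullet, \mu_2^\bullet; \gamma)] + (1 - \alpha) P[X_1 \le c]} \cdot \mu_2^*(c).$$

Proof.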

Let $w_1 = \alpha$, $w_2 = 1 - \alpha$, $c_1 = C(\mu_1^\bullet, \mu_2^\bullet; \gamma)$, $c_2 = c$. By simple algebra, we may rewrite the objective as

$$\frac{(\mu_1 - \mu_1^\bullet)^2}{2\sigma^2} + \sum_{k=1}^{2} w_k \left\{ \int_{-\infty}^{c_k} \phi(x_1; \mu_1^\bullet, \sigma^2) \cdot \left[ \frac{\sigma^2 + (\mu_2 - \gamma(x_1 - \mu_1) - \mu_2^\bullet)^2}{2\sigma^2} - \frac12 \right] dx_1 \right\}.$$

[Footnote: This cutoff may nevertheless differ from the objectively optimal one, since the selection neglecters also suffer from the gambler's fallacy, so they believe in the joint distribution $\Psi(\mu_1^\bullet, \mu_2^\bullet; \gamma)$.]

Dropping terms not dependent on $\mu_1, \mu_2$ and multiplying through by $\sigma^2$, we get the simplified objective

$$\xi^{SN}(\mu_1, \mu_2) := \frac{(\mu_1 - \mu_1^\bullet)^2}{2} + \sum_{k=1}^{2} w_k \left\{ \int_{-\infty}^{c_k} \phi(x_1; \mu_1^\bullet, \sigma^2) \cdot \left[ \frac{(\mu_2 - \gamma(x_1 - \mu_1) - \mu_2^\bullet)^2}{2} \right] dx_1 \right\}.$$

The first-order condition is only satisfied at $\mu_1^{SN} = \mu_1^\bullet$ and

$$\mu_2^{SN} = \frac{1}{w_1 P[X_1 \le c_1] + w_2 P[X_1 \le c_2]} \sum_{k=1}^{2} w_k P[X_1 \le c_k] \left\{ \mu_2^\bullet - \gamma(\mu_1^\bullet - E[X_1 \mid X_1 \le c_k]) \right\}.$$

This shows, in terms of the expressions for the pseudo-true fundamentals $\mu_2^*$ in the baseline model,

$$\mu_2^{SN}(c) = \frac{\alpha P[X_1 \le C(\mu_1^\bullet, \mu_2^\bullet; \gamma)]}{\alpha P[X_1 \le C(\mu_1^\bullet, \mu_2^\bullet; \gamma)] + (1 - \alpha) P[X_1 \le c]} \cdot \mu_2^*(C(\mu_1^\bullet, \mu_2^\bullet; \gamma)) + \frac{(1 - \alpha) P[X_1 \le c]}{\alpha P[X_1 \le C(\mu_1^\bullet, \mu_2^\bullet; \gamma)] + (1 - \alpha) P[X_1 \le c]} \cdot \mu_2^*(c).$$

That is, with a mixture of selection-neglecter and baseline predecessors, baseline agents' inference about the second-period fundamental is a convex combination of what they would infer from the histories of the selection neglecters alone and what they would infer from the histories of the baseline predecessors alone. The relative weights given to these two pseudo-true second-period fundamentals depend on the relative sizes of the two subpopulations, as well as on how frequently second-period draws are observed in each of the two sub-datasets.

Since both $\mu_2^*(C(\mu_1^\bullet, \mu_2^\bullet; \gamma))$ and $\mu_2^*(c)$ are strictly below $\mu_2^\bullet$, we immediately conclude the same holds for $\mu_2^{SN}(c)$ for any $c \in \mathbb{R}$.
This shows the robustness of the over-pessimism result from the main text to the presence of a fraction of selection neglecters.

Next, I compare a baseline society with no selection neglecters with a second society containing a positive fraction of selection neglecters. I show that when the two societies start with the same generation 0 behavior, the society with selection neglecters holds more optimistic beliefs about the second-period fundamental and uses a higher stopping threshold in every generation $t \ge 2$. This is not simply due to the mechanical reason that the selection neglecters always make the correct inferences about the fundamentals, thereby making the "average" belief in the society more optimistic. The presence of the selection neglecters also moderates the over-pessimism of the baseline gambler's fallacy agents (without completely eliminating it), by making the censoring effect less severe.
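The convex-combination formula for $\mu_2^{SN}(c)$ and the two-society comparison described above can be illustrated numerically. The sketch below is not the paper's code. It assumes true fundamentals $\mu_1^\bullet = \mu_2^\bullet = 0$ with $\sigma = 1$, a hypothetical bias $\gamma = 0.5$ and fraction $\alpha = 0.3$, search-without-recall payoffs (so that the subjectively optimal cutoff is $C(\mu_1, \mu_2; \gamma) = (\mu_2 + \gamma\mu_1)/(1+\gamma)$, an assumption made here for concreteness), and an auxiliary environment in which each generation learns from a large dataset of the previous generation's histories. The helper names are illustrative.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

MU1, MU2, SIGMA = 0.0, 0.0, 1.0  # assumed true fundamentals (illustrative)


def mu2_star(c, gamma):
    """Baseline pseudo-true fundamental: mu2 - gamma * (mu1 - E[X1 | X1 <= c])."""
    z = (c - MU1) / SIGMA
    mean_below_c = MU1 - SIGMA * norm.pdf(z) / norm.cdf(z)
    return MU2 - gamma * (MU1 - mean_below_c)


def cutoff(mu1, mu2, gamma):
    """ASSUMPTION: search without recall, so the agent stops iff
    x1 >= mu2 - gamma * (x1 - mu1), i.e. iff x1 >= (mu2 + gamma * mu1) / (1 + gamma)."""
    return (mu2 + gamma * mu1) / (1.0 + gamma)


def mu2_sn(c, alpha, gamma):
    """Convex combination of mu2_star at the two subpopulations' cutoffs."""
    c_sn = cutoff(MU1, MU2, gamma)  # cutoff used by selection neglecters
    p_sn = alpha * norm.cdf((c_sn - MU1) / SIGMA)
    p_base = (1.0 - alpha) * norm.cdf((c - MU1) / SIGMA)
    return (p_sn * mu2_star(c_sn, gamma) + p_base * mu2_star(c, gamma)) / (p_sn + p_base)


def xi_sn(mu1, mu2, c, alpha, gamma):
    """Simplified objective xi^SN(mu1, mu2), evaluated by numerical integration."""
    total = (mu1 - MU1) ** 2
    for w, ck in ((alpha, cutoff(MU1, MU2, gamma)), (1.0 - alpha, c)):
        f = lambda x: norm.pdf(x, MU1, SIGMA) * (mu2 - gamma * (x - mu1) - MU2) ** 2
        total += w * quad(f, -np.inf, ck)[0]
    return total


def simulate(alpha, gamma, c0, generations):
    """(belief, cutoff) of baseline agents in generations 1..T; alpha=0 is society A."""
    mu2 = mu2_star(c0, gamma)  # generation 1 only sees generation-0 histories
    path = []
    for _ in range(generations):
        c = cutoff(MU1, mu2, gamma)
        path.append((mu2, c))
        mu2 = mu2_sn(c, alpha, gamma)  # next generation's inference
    return path


society_a = simulate(alpha=0.0, gamma=0.5, c0=0.0, generations=6)
society_b = simulate(alpha=0.3, gamma=0.5, c0=0.0, generations=6)
for t, ((mu_a, c_a), (mu_b, c_b)) in enumerate(zip(society_a, society_b), start=1):
    print(f"t={t}: mu2_A={mu_a:.4f} mu2_B={mu_b:.4f} c_A={c_a:.4f} c_B={c_b:.4f}")
```

Under these assumptions the two societies coincide in generation 1 and society B stays strictly less pessimistic afterwards, while both remain below the true value of zero. Evaluating `xi_sn` at `(MU1, mu2_sn(c, alpha, gamma))` and at nearby points gives a quick check that the closed form is a minimizer of the simplified objective.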

Corollary OA.3.

Let $0 < \alpha < 1$. Consider two societies, $A$ and $B$, where society $A$ has no selection neglecters and society $B$ has an $\alpha$ fraction of selection neglecters in each generation. Suppose both societies start at the same initial condition $c_{[0]} \in \mathbb{R}$. For $t \ge 1$, consider the auxiliary learning environment and denote the baseline agents' beliefs and cutoff thresholds in society $k \in \{A, B\}$ as $\mu^k_{1,[t]}, \mu^k_{2,[t]}, c^k_{[t]}$. Then for every $t \ge 2$, $\mu^B_{2,[t]} > \mu^A_{2,[t]}$ and $c^B_{[t]} > c^A_{[t]}$.

Proof. From Proposition OA.13 (and Example 2 for the case of $t = 1$), $\mu^A_{1,[t]} = \mu^B_{1,[t]} = \mu_1^\bullet$ for every $t \ge 1$. Also, in the first generation, $\mu^A_{2,[1]} = \mu^B_{2,[1]}$ and $c^A_{[1]} = c^B_{[1]}$ since both societies face the same dataset $H^\bullet(c_{[0]})$. Since $\mu^A_{2,[1]} < \mu_2^\bullet$, we must have $c^A_{[1]} = C(\mu_1^\bullet, \mu^A_{2,[1]}; \gamma) < C(\mu_1^\bullet, \mu_2^\bullet; \gamma)$. Hence $\mu^B_{2,[2]}$, a convex combination of $\mu_2^*(C(\mu_1^\bullet, \mu_2^\bullet; \gamma))$ and $\mu_2^*(c^B_{[1]}) = \mu_2^*(c^A_{[1]}) = \mu^A_{2,[2]}$ with positive weight on the former, satisfies $\mu^B_{2,[2]} > \mu^A_{2,[2]}$ and hence $c^B_{[2]} > c^A_{[2]}$. More generally, when $c^B_{[t]} > c^A_{[t]}$ and $C(\mu_1^\bullet, \mu_2^\bullet; \gamma) > c^A_{[t]}$ we have $\mu_2^*(c^A_{[t]}) < \mu_2^*(c^B_{[t]})$, which shows that in the next generation, $\mu^B_{2,[t+1]}$ is the convex combination of two terms both exceeding $\mu_2^*(c^A_{[t]})$. This implies $\mu^B_{2,[t+1]} > \mu^A_{2,[t+1]}$ and $c^B_{[t+1]} > c^A_{[t+1]}$. By induction, the corollary holds for all $t \ge 2$.

OA 8.3 Only Observing the Final Draw

In the baseline model, the history $h_n$ of predecessor $n \in [0, 1]$ records just the first-period draw, $h_n = (x_{1,n}, \emptyset)$, if $n$ stopped in period 1, and it records both draws, $h_n = (x_{1,n}, x_{2,n})$, if $n$ continued into period 2. An outcome history differs from a history of the baseline model in that it always records only one draw: the one from the period where the agent stops. So, predecessor $n$'s outcome history $h^o_n$ is either $h^o_n = (x_{1,n}, \emptyset)$ or $h^o_n = (\emptyset, x_{2,n})$. This kind of observation may be natural when the optimal-stopping problem is search without recall (i.e. Example 1 with $q = 0$) and managers in the current generation only know about the candidates who were eventually hired in the previous generation across various firms, but not the early-phase candidates who were discovered but let go.

Write $H^o(\Psi^\bullet; c)$ for the distribution of predecessors' outcome histories when $(X_1, X_2) \sim \Psi^\bullet$ and predecessors use the cutoff threshold $c$. I show that agents using a method-of-moments (MOM) inference procedure analogous to the one in Online Appendix OA 5 will still infer the pseudo-true fundamentals associated with the usual history distribution $H(\Psi^\bullet; c)$ in the baseline model. To be precise, MOM agents find $\mu_1^{MO}, \mu_2^{MO}$ so that $H^o(\Psi(\mu_1^{MO}, \mu_2^{MO}; \gamma); c)$ matches $H^o(\Psi^\bullet; c)$ in terms of the sample means of the uncensored first-period draws and uncensored second-period draws. As $\mu_1 \mapsto \mathbb{E}_{\tilde{X}_1 \sim \mathcal{N}(\mu_1, \sigma^2)}[\tilde{X}_1 \mid \tilde{X}_1 \ge c]$ is a strictly increasing function, the MOM inference $\mu_1^{MO}$ must correctly estimate the first-period fundamental, $\mu_1^{MO} = \mu_1^\bullet$. Also, note that for any $\hat{\mu}_1, \hat{\mu}_2 \in \mathbb{R}$ and any $\hat{\gamma} \ge 0$, the second moment is the same in the outcome histories distribution $H^o(\Psi(\hat{\mu}_1, \hat{\mu}_2; \hat{\gamma}); c)$ as in the baseline histories distribution $H(\Psi(\hat{\mu}_1, \hat{\mu}_2; \hat{\gamma}); c)$. By the method-of-moments interpretation of $\mu_2^*(c)$, we conclude that $\mu_2^{MO}(c) = \mu_2^*(c)$ for all $c \in \mathbb{R}$.

The KL-divergence minimizing pseudo-true fundamentals for agents observing outcomes prove difficult to calculate analytically.
This is because the likelihood of the outcome history $h^o_n = (\emptyset, x_{2,n})$ is given by an integral over its likelihoods for different censored realizations of $X_1$. Using numerical simulations, I show in Section OA 9.3 that when Bayesian agents with a correct dogmatic belief about $\mu_1^\bullet$ face a large, finite dataset of outcome histories, their inference about the second-period fundamental seems to closely match $\mu_2^*(c)$. It remains an open conjecture whether the minimizers in these two different KL-divergence minimization problems in fact coincide exactly.

OA 9 Numerical Simulations
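As a small numerical companion to the method-of-moments argument for outcome histories in Section OA 8.3, the sketch below simulates outcome histories and applies the two moment-matching steps. It is an illustration under stated assumptions rather than the paper's procedure: the true draws are taken to be independent standard normals, the cutoff is $c = 1$, the bias parameter $\gamma = 0.5$ is a hypothetical value, and the function names are invented for this example.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm


def simulate_outcomes(n, c, rng):
    """Outcome histories under the assumed truth: X1, X2 independent N(0, 1).

    Returns the recorded first-period draws (agents who stopped, x1 >= c) and
    the recorded second-period draws (agents who continued, x1 < c).
    """
    x1 = rng.standard_normal(n)
    x2 = rng.standard_normal(n)
    stopped = x1 >= c
    return x1[stopped], x2[~stopped]


def mom_from_outcomes(first_draws, second_draws, c, gamma, sigma=1.0):
    """Method-of-moments inference from outcome histories.

    Step 1: choose mu1 so that E[X1 | X1 >= c] under N(mu1, sigma^2) equals the
            sample mean of the recorded first-period draws.
    Step 2: under Psi(mu1, mu2; gamma), the mean of X2 among continuers is
            mu2 - gamma * (E[X1 | X1 <= c] - mu1); match it to the sample mean
            of the recorded second-period draws and solve for mu2.
    """
    target = first_draws.mean()

    def gap(mu1):
        z = (c - mu1) / sigma
        return mu1 + sigma * norm.pdf(z) / norm.sf(z) - target

    mu1 = brentq(gap, -10.0, 10.0)  # gap is strictly increasing in mu1
    z = (c - mu1) / sigma
    mean_below_c = mu1 - sigma * norm.pdf(z) / norm.cdf(z)
    mu2 = second_draws.mean() + gamma * (mean_below_c - mu1)
    return mu1, mu2


rng = np.random.default_rng(0)
c, gamma = 1.0, 0.5  # gamma is a hypothetical value chosen for illustration
first, second = simulate_outcomes(400_000, c, rng)
mu1_mo, mu2_mo = mom_from_outcomes(first, second, c, gamma)
mu2_star = -gamma * norm.pdf(c) / norm.cdf(c)  # baseline mu_2^*(c) when mu1 = mu2 = 0
print(f"mu1_MO={mu1_mo:.4f}  mu2_MO={mu2_mo:.4f}  mu2_star={mu2_star:.4f}")
```

With a large simulated dataset, the first estimate should sit near the true first-period fundamental and the second near the baseline expression $\mu_2^\bullet - \gamma(\mu_1^\bullet - \mathbb{E}[X_1 \mid X_1 \le c])$, in line with the conclusion $\mu_2^{MO}(c) = \mu_2^*(c)$.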

OA 9.1 Pessimism and Fictitious Variation in Finite Datasets

Lemma OA.3 proves that when an agent with a full-support prior $m : \mathbb{R} \to \mathbb{R}$ observes $N$ histories drawn from the distribution $H^\bullet(c)$ in the Gaussian case, then as $N$ goes to infinity her posterior belief almost surely concentrates on the KL-divergence minimizing pseudo-true parameters. In this section, I use simulations to check how well the predictions of Proposition 2 and Proposition 7 hold up in finite datasets. In particular, I am interested in the pessimistic inference in Proposition 2 and the fictitious variation in Proposition 7.

I consider the objective distribution $(X_1, X_2) \sim \Psi(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2; 0)$ with $\mu_1 = \mu_2 = 0$, $\sigma_1^2 = \sigma_2^2 = 1$, and a stopping rule that censors $X_2$ whenever $X_1 \ge 1$. I suppose agents have dogmatic belief in γ = 0 . R . In Figures OA.1 and OA.2, I plot distributions of the Bayesian posterior mode after a dataset of size $N = 100, 1000, 10000$. I find that when $N = 100$, there is a 91.9% chance that agents underestimate the second-period fundamental and a 78.9% chance that they believe in fictitious variation for the second-period conditional variance. These probabilities grow to virtually 100% for $N = 1000$ and $N = 10000$.

OA 9.2 Welfare Implications of Endogenous Learning

In this paper, I have emphasized the dynamics of mislearning and the interaction between distorted stopping strategy and distorted beliefs under the gambler's fallacy. The positive feedback cycle between censoring and gambler's fallacy leads to additional welfare implications beyond what would happen with gambler's fallacy alone in a static, exogenous-data setting. Figure OA.3 returns to the illustrative example used for Figure 1 and compares the expected loss (relative to using the objectively optimal stopping rule) in the learning steady state versus the expected loss for the first-generation agents. Recall that Figure 1 considers search without recall, so $u_1(x_1) = x_1$, $u_2(x_1, x_2) = x_2$ with true fundamentals $\mu_1^\bullet = \mu_2^\bullet = 0$. As I initialize the 0th generation with the objectively optimal stopping threshold $c_{[0]} = 0$, misinference from the gambler's fallacy is solely responsible for the first-generation loss. The long-run loss, on the other hand, is exacerbated by successive generations of predecessors lowering their stopping threshold and thus censoring the dataset with increasing severity. As Figure OA.3 shows, the fraction of long-run losses attributable to passive inference under gambler's fallacy falls with the degree of the bias, highlighting the need for dynamic analysis, especially in environments where we expect the bias to be more serious.

Figure OA.1: Histograms of inferences about fundamentals in finite datasets. (Panels: posterior mode for $\mu_1$ and for $\mu_2$ at $N = 100$, $N = 1000$, and $N = 10000$; horizontal axis: posterior mode; vertical axis: density.) The red lines in the histograms for $\mu_1$ denote the pseudo-true fundamental (and also the true fundamental) $\mu_1^*(c = 1) = 0$. The blue lines in the histograms for $\mu_2$ denote the true fundamental $\mu_2^\bullet = 0$, while the red lines show the pseudo-true fundamental $\mu_2^*(c = 1)$, which is negative.

Figure OA.2: Histograms of inferences about variances in finite datasets. (Panels: posterior mode for $\sigma_1^2$ and for $\sigma_2^2$ at $N = 100$, $N = 1000$, and $N = 10000$; horizontal axis: posterior mode; vertical axis: density.) The red lines in the histograms for $\sigma_1^2$ denote the pseudo-true variance (and also the true variance) $(\sigma_1^*)^2(c = 1) = 1$. The blue lines in the histograms for $\sigma_2^2$ denote the true variance $(\sigma_2^\bullet)^2 = 1$, while the red lines show the pseudo-true variance $(\sigma_2^*)^2(c = 1)$, which exceeds 1.

Figure OA.3: Welfare loss in the first generation as a fraction of the total long-run welfare loss, as a function of the believed correlation between $X_1$ and $X_2$. (Panel title: "Positive feedback amplifies first-generation loss"; horizontal axis: believed correlation; vertical axis: first-generation loss / long-run loss.) A more negative correlation corresponds to a larger $\gamma$ and a more severe gambler's fallacy bias.

OA 9.3 Inference of Misspecified Bayesian Agents when Observing Only the Final Draw

Consider a Bayesian agent with the (improper) flat prior over the class of models { Ψ(µ1, µ2, σ1², σ2², γ) : µ1 = 0, σ1² = σ2² = 1, γ = 0 . , µ2 ∈ R }.

[Figure: Inference from Baseline Histories and Outcome Histories]
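The inference problem of such an agent can be sketched as a one-dimensional maximum-likelihood problem, since under a flat prior the posterior mode over $\mu_2$ coincides with the likelihood maximizer. The code below is an illustrative reconstruction, not the paper's simulation: it assumes true draws that are independent standard normals, a cutoff $c = 1$, and a hypothetical $\gamma = 0.5$. It uses the fact that, under the model with $\mu_1 = 0$ and unit variances, the pair $(X_1, X_2)$ is jointly Gaussian, so the integral over censored realizations of $X_1$ has the closed form $\phi(x_2; \mu_2, 1 + \gamma^2) \cdot \Phi\big(\sqrt{1+\gamma^2}\, c + \gamma (x_2 - \mu_2)/\sqrt{1+\gamma^2}\big)$.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm


def neg_log_likelihood(mu2, second_draws, c, gamma):
    """Negative log-likelihood of outcome histories (empty, x2) as a function of mu2.

    Model (assumed): X1 ~ N(0, 1) and X2 | X1 = x1 ~ N(mu2 - gamma * x1, 1).
    Then X2 ~ N(mu2, 1 + gamma^2) and
    X1 | X2 = x2 ~ N(-gamma * (x2 - mu2) / (1 + gamma^2), 1 / (1 + gamma^2)),
    so the density of x2 together with the event {X1 <= c} is
    pdf(x2) * P[X1 <= c | X2 = x2].
    Histories (x1, empty) have a likelihood that does not depend on mu2 and are omitted.
    """
    s = np.sqrt(1.0 + gamma ** 2)
    log_density = norm.logpdf(second_draws, loc=mu2, scale=s)
    log_censor = norm.logcdf(s * c + gamma * (second_draws - mu2) / s)
    return -(log_density + log_censor).sum()


rng = np.random.default_rng(1)
c, gamma, n = 1.0, 0.5, 200_000  # gamma and c are illustrative assumptions
x1 = rng.standard_normal(n)
x2 = rng.standard_normal(n)
second_draws = x2[x1 < c]  # final draws of predecessors who continued

fit = minimize_scalar(
    neg_log_likelihood, bounds=(-2.0, 2.0), method="bounded",
    args=(second_draws, c, gamma),
)
mu2_mode = fit.x  # posterior mode under the flat prior
mu2_star = -gamma * norm.pdf(c) / norm.cdf(c)  # baseline mu_2^*(c) for comparison
print(f"posterior mode from outcome histories: {mu2_mode:.4f}")
print(f"baseline pseudo-true mu_2^*(c):        {mu2_star:.4f}")
```

Comparing the two printed numbers gives a rough sense of how closely inference from outcome histories tracks the baseline pseudo-true fundamental. A single simulated dataset cannot settle whether the two minimizers coincide exactly.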

Submitted on 21 Mar 2018 (v1), last revised 8 Dec 2020 (this version, v5) Updated
