Learning from Manipulable Signals*

Mehmet Ekmekci (Boston College), Leandro Gorno (FGV EPGE), Lucas Maestri (FGV EPGE), Jian Sun (MIT Sloan), Dong Wei (UC Berkeley)
September 9, 2020
Abstract
We study a dynamic stopping game between a principal and an agent. The agent is privately informed about his type. The principal learns about the agent's type from a noisy performance measure, which can be manipulated by the agent via a costly and hidden action. We fully characterize the unique Markov equilibrium of this game. We find that terminations/market crashes are often preceded by a spike in (expected) performance. Our model also predicts that, due to endogenous signal manipulation, too much transparency can inhibit learning. As the players get arbitrarily patient, the principal elicits no useful information from the observed signal.
Keywords:
Asymmetric information, learning, signal manipulation, venture capital
JEL classification:
C73, D82, D83, G24, M13

* This paper originated from two related but independent projects. All authors contributed equally to the current paper. We are grateful for the helpful comments from Alp Atakan, Costas Cavounidis, Doruk Cetemen, Brett Green, Sam Jindani, Aaron Kolb, Aditya Kuvalekar, Elliot Lipnowski, Andrey Malenko, Harry Di Pei, João Ramos, Yiman Sun, Philipp Strack, as well as seminar and conference participants at Stanford, MIT Sloan, Boston University, Boston College and the 2020 Econometric Society World Congress.
Email:
Ekmekci ([email protected]); Gorno ([email protected]); Maestri ([email protected]); Sun ([email protected]); Wei ([email protected]).

Introduction
Asymmetric information is pervasive in long-term relationships; meanwhile, learning often takes place during the interactions between different parties. For instance, venture capital (VC) firms face asymmetric information in their investments: startups often have better information about the odds of success of their projects than the investors (Brealey et al., 1977; Chan, 1983; Gompers and Lerner, 2004). Moreover, due to the private benefits from receiving continuous funding, startups are willing to pursue projects that are less viable than what VCs are willing to invest in. VCs, upon agreeing to finance a startup, receive periodical performance reports (subscription growth, number of patents, media and user reviews, etc.) from the startup. These reports may provide information about the viability of the startup. However, the startup may undertake hidden actions to inflate the performance report, tampering with its informativeness. Examples include rideshare platforms, which periodically announce their numbers of users and could inflate such statistics by specialized promotions, and Luckin Coffee and Theranos, which have been under investigation for fabricating key performance data.

We analyze learning problems with asymmetric information and hidden actions, and investigate the equilibrium learning dynamics. In our model, a principal (VC) and an agent (startup) are engaged in a relationship that takes place in continuous time. Performance reports are modeled as public signals evolving according to a Brownian motion whose drift depends on the agent's privately-known type and action. If the agent is an investible type, then the drift is µ > 0; if the agent is a noninvestible type, then the drift is 0 by default, but the agent can take a costly action to boost the drift up to µ. The signals serve only an informational role, and do not affect the principal's payoffs.
The principal receives opportunities to terminate the relationship according to a Poisson process, and chooses whether to terminate the relationship whenever such an opportunity arises; she prefers to continue the relationship with the investible type and to terminate the relationship against the noninvestible type. The Poisson arrival of stopping opportunities captures the frictions in the principal's decision making and implementation, and will technically help us avoid off-path histories in our equilibrium analysis. (An extreme example of such private benefits is the former CEO of WeWork, Adam Neumann, who allegedly purchased a corporate jet with the company's money for personal use.)

We study Markov equilibria of this game where the state variable is the public belief that the agent is a noninvestible type. We call the complementary probability, i.e., the probability that the agent is an investible type, the agent's reputation. Our first result establishes the existence and uniqueness of Markov equilibrium.

In the unique equilibrium, the principal's termination strategy has a cutoff structure: the principal terminates the relationship if and only if the agent's reputation is sufficiently bad. The agent's equilibrium strategy depends on the magnitude of his discount rate. If his discount rate is high, he never engages in performance boosting; if it is low, his mimicking intensity peaks at the principal's termination cutoff.

Our first qualitative finding concerns the relationship between the agent's reputation and the expected performance, measured by the expected drift of the signal from an outsider's perspective. If the agent is so impatient that he never engages in performance boosting, then the expected performance is increasing in the agent's reputation (decreasing in the state variable). However, for higher patience such that the agent engages in some performance boosting, we find that the expected performance is non-monotone in the agent's reputation.
Starting from an initial good reputation, as the agent's reputation deteriorates, the expected performance first declines, reaching a local minimum; then it rises, reaching a local maximum precisely at the principal's termination cutoff, and decreases again thereafter (see Figure 3). This finding may help explain why some startups deliver impressive performance reports, such as large sales growth (e.g., Luckin Coffee), extraordinary revenue flow (e.g., Theranos) or rapid expansion (e.g., WeWork), not long before investors pull their funds. It is also consistent with the observation that growing market suspicion and strong (expected) performance can coexist for a period of time.

Our second qualitative result concerns the relationship between the amount of information transmission and the transparency of the performance measure. Due to random events such as demand shocks and measurement errors, performance reports are imperfect signals of the agent's type and action, and we use the signal-to-noise ratio of the process to capture its transparency. In reality, differences in disclosure standards can lead to differences in transparency. We investigate how the principal's equilibrium payoff changes with the signal-to-noise ratio, and show that, due to the agent's endogenous signal manipulation, the principal may be worse off as transparency improves. This result has a policy implication regarding disclosure requirements for startups: more stringent disclosure requirements may harm the investors.

Specifically, if the opportunity to terminate the relationship arrives at a rate less than a cutoff, then in equilibrium, the agent never engages in performance boosting too aggressively, because termination is always unlikely.
In this case, as the signal-to-noise ratio grows, the information flow in the principal's optimal stopping problem approaches immediate revelation of the agent's type, which benefits the principal.

On the other hand, if the termination opportunity arrives at a rate greater than the aforementioned cutoff, then the agent has stronger incentives to engage in performance boosting. We find, perhaps surprisingly, that the principal's payoff is nonmonotone in the signal-to-noise ratio of the performance report, implying that the principal can be worse off when the performance measure becomes more transparent (less noisy). We obtain this result by looking at two extreme cases. At one extreme, if the performance report is independent of the agent's type and action (i.e., uninformative signals), then the principal can never learn about the agent's type and will receive her "no-information" value. At the other extreme, as the signal-to-noise ratio grows without bound, we show that the principal cannot utilize any information about the agent's type either. Intuitively, in this case the agent will engage in performance boosting aggressively, for otherwise his type would be revealed rapidly. Such aggressive performance boosting is anticipated by the principal, and thus largely reduces the informativeness of the signal. As a result, the principal's equilibrium payoff converges to her "no-information" value. In contrast to the extreme cases, for intermediate values of the signal-to-noise ratio, the principal will learn some information about the agent's type and get a payoff strictly above her "no-information" value.

Finally, we investigate the equilibrium outcomes as players get arbitrarily patient. We find a strong manifestation of the ratchet effect in the patient limit of our model. Since the principal cannot commit to refraining from using future information against the agent, a patient agent will engage in performance boosting with almost full intensity in order to maintain his reputation.
At the limit, no useful information is revealed, and the principal's lack of commitment hurts her in the most extreme way.

While the leading application of our model is the VC-startup relationship, we believe that the economic forces identified in our analysis apply more widely to other scenarios, such as voter-politician, manager-worker and purchaser-supplier relationships, where learning with asymmetric information is a critical aspect. On the technical side, our choice of modeling this game in continuous time enables us to fully solve the model for arbitrary parameter values, which also sets the stage for the asymptotic analyses we present. Meanwhile, we overcome additional technical obstacles to obtain these asymptotic results. For example, in order to establish our result on the patient limit, we prove a continuous-time analogue of the well-known learning result from Fudenberg and Levine (1992). Moreover, our analysis of the principal's learning speed allows us to better understand the mechanisms that cause the convergence of her value function as transparency improves.
Related Literature.
Our paper is most closely related to the reputation literature and the literature on dynamic games with stopping decisions.

(This result holds if both players get arbitrarily patient at the same rate, or if the agent gets patient at a faster rate than does the principal. Besides papers reviewed below, recent works that exploit the tractability of continuous-time methods include DeMarzo et al. (2012), Bonatti et al. (2016), Ortner (2017), Cisternas (2017) and Varas et al. (forthcoming), among others. See Lemma OA.5 in the Online Appendix for an analogue of Theorem A.1 in Fudenberg and Levine (1992).)

[...] In contrast, our analysis fully characterizes (Markov) equilibrium behavior for all discount rates, and we uncover new qualitative features that the equilibrium dynamics exhibit.

Faingold and Sannikov (2011) study reputation effects in games played in continuous time with one long-lived informed player against myopic opponents, and they characterize the set of sequential equilibria. Unlike Faingold and Sannikov (2011), the uninformed player in our game is forward-looking and can terminate the game. More importantly, the termination payoffs depend on the informed player's type, creating interdependence of payoffs between the players (similar to Pei, forthcoming) and thus making their characterization not applicable to our model.

There is a growing interest in dynamic games with stopping decisions. Daley and Green (2012), Kolb (2015, 2019), Dilmé (2019), Ekmekci and Maestri (2019) and Sun (2020) all study stopping games with two long-lived players, where the uninformed party receives information over time and obtains type-dependent payoffs. In Daley and Green (2012), Kolb (2015) and Dilmé (2019), the informed player makes the stopping decision, while in our paper such decision is made by the uninformed player. This makes the incentives and equilibrium structure in our model quite different from theirs.
In Kolb (2019), the agent can only influence the information process by irreversibly changing his type, while in our model the agent can directly manipulate the signal, with his type being persistent. Besides, the qualitative results on the equilibrium dynamics in our paper do not have a counterpart in these papers. Ekmekci and Maestri (2019) study a similar setting in discrete time, and focus solely on the limiting case with arbitrarily patient players. Our Theorem 4 is the continuous-time version of their main finding, while we obtain much richer equilibrium dynamics for any fixed discount rate. Finally, Sun (2020) studies dynamic censorship with Poisson news, wherein the agent can decide whether to show or hide the bad news after privately observing its realization.

(There are also papers that bound the equilibrium payoff of the informed player with long-lived uninformed players, e.g., Schmidt (1993); Cripps and Thomas (1997); Celentani et al. (1996); Atakan and Ekmekci (2012, 2015). Studies on reputation dynamics include Mailath and Samuelson (2001); Phelan (2006); Liu (2011); Ekmekci (2011); Lee and Liu (2013); Liu and Skrzypacz (2014). However, these papers do not share similar equilibrium dynamics or qualitative results that we obtain, partly because they look at repeated moral hazard games and/or the uninformed parties are myopic. In Sun (2020), the equilibrium censoring intensity is monotone in the agent's reputation, while in our model the intensity of performance boosting is non-monotone. Besides, his analysis focuses on the welfare implications of the censoring activity, while we examine the welfare effects of better transparency and the ratchet effect at the patient limit.)

[...] costless to the agent and observable to the principal. By contrast, in our model the agent has private information about his type, and his action is costly and hidden.
This necessarily makes the principal's inference problem more delicate, as she has to form a conjecture about the agent's action, which needs to coincide with the agent's actual strategy in equilibrium. Moreover, in our model the agent's trade-off is between improving his reputation and saving the mimicking cost, while in their models the agent is optimizing over the speed of learning (i.e., the variance, rather than the drift, of the belief process). Orlov et al. (2020) also consider a dynamic setting with stopping decisions and symmetrically informed players, and they study the agent's optimal information disclosure policy in a persuasion game.
The Model

A principal (she) and an agent (he), both risk-neutral, interact in continuous time t ∈ [0, ∞). At any time t, an exogenous stopping opportunity arrives according to a Poisson process {J_t}_{t≥0} with rate λ > 0. When the said opportunity arrives, the principal chooses whether to continue or irreversibly stop the game. The Poisson arrival of stopping opportunities captures the frictions in the principal's decision making and implementation. (Technically, this assumption ensures that there is no off-equilibrium history/belief. In Remark 1 we discuss what happens as the frictions vanish, i.e., as λ → ∞.)

The agent can be one of two types, denoted by θ: an investible type (θ = I), or a noninvestible type (θ = NI). The agent's type is his private information. From the principal's viewpoint, the initial probability that the agent is a noninvestible type is p_0 ∈ (0, 1).

There is a public signal {X_t}_{t≥0} that evolves over time. If the agent is an investible type, then the public signal evolves according to the process

dX_t = µ dt + σ dB_t,

where {B_t}_{t≥0} is a standard Brownian motion. Without loss, we assume that µ > 0 and we normalize σ to 1. If the agent is a noninvestible type, he chooses an α_t ∈ [0, 1] at any time t, and

dX_t = µ α_t dt + dB_t.

The model assumes that the investible type does not have any action choice, and the evolution of the public signal is exogenous conditional on this type (always having a drift of µ). Meanwhile, the noninvestible type chooses a mimicking intensity, which can be interpreted as the probability with which the noninvestible type acts the same as the investible type. In our leading application of VC investments, we can interpret the public signal as performance reports from the startup and the mimicking action taken by the noninvestible type as performance boosting.
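To fix ideas, the two signal laws can be simulated with a simple Euler discretization. This is an illustrative sketch (function names and parameter values are ours, not from the paper); it also shows why full mimicking (α ≡ 1) makes the two types statistically indistinguishable:

```python
import numpy as np

def simulate_signal(T=1.0, n=1000, mu=1.0, alpha=None, seed=0):
    """One Euler path of the public signal X on [0, T] (sigma normalized to 1).

    alpha=None     -> investible type:    dX = mu dt + dB
    alpha=callable -> noninvestible type: dX = mu * alpha(t) dt + dB
    """
    rng = np.random.default_rng(seed)
    dt = T / n
    dB = rng.normal(0.0, np.sqrt(dt), n)
    if alpha is None:
        drift = mu * np.ones(n)
    else:
        drift = mu * np.array([alpha(i * dt) for i in range(n)])
    return np.concatenate(([0.0], np.cumsum(drift * dt + dB)))

# Under full mimicking (alpha = 1), the noninvestible type's path coincides
# with the investible type's for the same Brownian realization.
x_inv = simulate_signal(seed=42)
x_mimic = simulate_signal(alpha=lambda t: 1.0, seed=42)
assert np.allclose(x_inv, x_mimic)
```

With α ≡ 0 instead, the paths separate in drift, which is exactly the information the principal filters on.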
A strategy for the noninvestible typeis a stochastic process { α t } t ≥ , which takes values in [0 , and is progressively measurable withrespect to the filtration generated by { B t } t ≥ . Let A be the set of admissible strategies for the agent.A strategy for the principal is a stochastic process β ≡ { β t } t ≥ , progressively measurable withrespect to the filtration generated by { X t ,J t } t ≥ , which represents the probability with which theprincipal takes the stopping action conditional on the arrival of a stopping opportunity. Let B bethe set of admissible strategies for the principal. Given a strategy profile ( α,β ) and a prior p , the principal updates her belief about the agent’stype using Bayes’ rule, and we let { p t } t ≥ denote the belief process defined by p t := P { θ = NI | { X s } s ≤ t } . (1)Note that the belief process p t , conditional on a continuing relationship, is determined by the strategyof the agent and not affected by the strategy of the principal or the arrival of stopping opportunities. If the game is stopped, the agent receives his outside option v , which we normalize to 0. If thegame is not yet stopped, the noninvestible agent receives a flow payoff that depends on his action, u +(1 − α t ) c , where u > and c > . , That is, if the noninvestible agent does (not) mimic the We note that the principal only observes the public signal, while the agent knows his own past actions, and thuscan recover { B t } t ≥ by removing the drift term. Interpreting the noninvestible type as choosing between mimicking ( A t =1 ) or not ( A t =0 ) and α t as the probabilityof taking the mimicking action, we can think of the noninvestible type’s flow payoff as defined by u + { A t =0 } c . The flow payoff of the investible type in the relationship is always some positive constant, say, u + c . u (resp., u + c ); thus, c is the flow cost ofmimicking. 
For a given strategy profile, ( α,β ) , the expected discounted payoff of the noninvestibleagent at time t is given by U ( t,α,β ) := E (cid:26)(cid:90) T t e − r ( τ − t ) r [ u +(1 − α τ ) c ] dτ (cid:12)(cid:12)(cid:12)(cid:12) θ = NI, { B s } s ≤ t (cid:27) , where T is the random time at which the game stops and the expectation is taken over T . Thisexpression can be simplified to U ( t,α,β ) := E (cid:26)(cid:90) ∞ t e − Λ ( t,τ,β ) r [ u +(1 − α τ ) c ] dτ (cid:12)(cid:12)(cid:12)(cid:12) θ = NI, { B s } s ≤ t (cid:27) , where we define the discounting exponent (taking into account the agent’s discount rate r andthe termination probability) Λ ( t,τ,β ) := (cid:90) τt ( r + λβ s ) ds. The principal’s flow payoff does not depend on the agent’s action or the public signal, and wenormalize her flow payoff to zero. However, the principal receives a lump-sum payoff of w NI > ifthe game stops against a noninvestible type, and w I < if the game stops against an investible type.That is, relative to continuing the relationship, the principal prefers stopping against a noninvestibletype but dislikes terminating an investible type. Thus, given a strategy profile ( α,β ) , the expecteddiscounted payoff of the principal at time t is given by U ( t,α,β ) := E (cid:26)(cid:90) ∞ t e − Λ ( t,τ,β ) λβ τ (cid:0) { θ = NI } w NI + { θ = I } w I (cid:1) dτ (cid:12)(cid:12)(cid:12)(cid:12) { X s } s ≤ t (cid:27) , where we define the discounting exponent (taking into account the principal’s discount rate r andthe termination probability) Λ ( t,τ,β ) := (cid:90) τt ( r + λβ s ) ds. Note that U ( t,α,β ) is calculated conditional on the stopping opportunity not arriving (or havingbeen forgone) at time t . 
(This assumption seems reasonable in our leading example of venture capital investments, wherein an investor's payoff is mainly driven by the viability (type) of the startup rather than its performance in the initial financing period, while the initial performance is still informative to the investor about the startup's type.)

Discussion of Model Assumptions
The essential ingredients of our model are the following:

1. The agent has private information about his type, and wants to stay in the relationship for as long as possible.
2. The principal faces a learning problem about the agent's type; she prefers to terminate against the noninvestible type and to continue with the investible type.
3. The noninvestible type can manipulate the drift of a noisy signal at a cost to mimic the investible type's performance.
4. The signal only serves an informational role and is payoff-irrelevant to the principal.

In addition to these assumptions, we adopt a normalization of flow payoffs and outside options to simplify the exposition. Below, we present an alternative but equivalent formulation, which fits better with our VC examples and provides a foundation for the principal's simplified payoff structure. Suppose that the principal's outside option is independent of the agent's type and is equal to 0. By continuing the relationship the principal incurs a flow cost equal to b > 0. There is a revealing event that arrives according to a Poisson process with rate ρ, independent of the agent's type and the signal process. When the event arrives, the game ends, delivering a lump-sum payoff to the principal. This payoff is equal to π_I > 0 if the agent is investible and π_NI = 0 otherwise. The flow cost represents the continuous financial inputs that the VC contributes to the startup. The revealing event corresponds to the VC's realization of the startup's profitability (type), and the ensuing type-dependent lump-sum payoffs correspond to the value of the startup to the VC upon learning its type. As in the original formulation, the principal can also terminate the relationship whenever a stopping opportunity arrives. The arrival follows a Poisson process at rate λ̂, and is independent of the revealing event. The (noninvestible) agent's flow payoff is û if he engages in performance boosting and û + ĉ otherwise.
The agent receives a payoff of 0 when the relationship ends, either because the principal terminates it or the revealing event occurs. The discount rates of the agent and the principal are r̂ and r̂_P, respectively. (We could also assume that the investible type gets a positive lump-sum reward when the revealing event occurs, but this would not change the game in any way because the revealing event is out of everyone's control.)

This formulation is strategically equivalent to the benchmark model with a type-dependent outside option for the principal. The equivalence is achieved through the following transformation of parameters, which can be verified by standard calculations. The agent's flow payoffs are identical, u = û and c = ĉ, and so is the arrival rate of the principal's stopping opportunity, λ = λ̂. The implied discount rates are augmented by the arrival rate of the revealing event, i.e., r = r̂ + ρ and r_P = r̂_P + ρ. And finally, the principal's type-dependent outside options are given by

w_I = (r̂_P b − ρ π_I)/(r̂_P + ρ),  w_NI = r̂_P b/(r̂_P + ρ).

As long as π_I > (r̂_P/ρ) b (i.e., if the reward to the principal upon learning that the agent is investible is large enough), we have w_I < 0 < w_NI, as in our baseline model.

Finally, our model focuses solely on the adverse-selection aspect of the problem, by assuming that performance reports do not directly affect the principal's payoff. This seems reasonable in contexts such as VC-startup relationships, wherein an investor's payoff is mostly driven by the viability (type) of the startup rather than its performance in the initial financing period, though the initial performance is informative about the startup's type.

An equilibrium is a strategy profile (α, β) such that

U_A(t, α, β) ≥ U_A(t, α̃, β),  U_P(t, α, β) ≥ U_P(t, α, β̃),

for all alternative strategies α̃ ∈ A and β̃ ∈ B, almost surely for all t ≥ 0. Let

P := {f : (0, 1) → [0, 1] : f is right-continuous and piecewise Lipschitz}.
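The parameter transformation in the alternative formulation above is easy to sanity-check numerically. In this sketch the function name and all parameter values are illustrative, and writing the principal's hatted rate as rP_hat is our subscript convention:

```python
def baseline_params(r_hat, rP_hat, rho, b, pi_I, lam_hat, u_hat, c_hat):
    """Map the alternative formulation (flow cost b, revealing rate rho,
    lump-sum reward pi_I) into the baseline parameters."""
    r = r_hat + rho                                   # agent's implied discount rate
    rP = rP_hat + rho                                 # principal's implied discount rate
    w_I = (rP_hat * b - rho * pi_I) / (rP_hat + rho)  # stopping payoff vs. investible
    w_NI = rP_hat * b / (rP_hat + rho)                # stopping payoff vs. noninvestible
    return r, rP, w_I, w_NI, u_hat, c_hat, lam_hat

r_hat, rP_hat, rho, b, pi_I = 0.05, 0.05, 0.2, 1.0, 2.0
r, rP, w_I, w_NI, u, c, lam = baseline_params(r_hat, rP_hat, rho, b, pi_I, 2.0, 1.0, 1.0)

# pi_I exceeds (rP_hat/rho)*b = 0.25, so the signs match the baseline model:
assert pi_I > (rP_hat / rho) * b
assert w_I < 0 < w_NI
```

Raising π_I only makes w_I more negative, so the sign condition w_I < 0 < w_NI is preserved for any sufficiently large reward.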
Recall that the belief process defined in (1) is determined by the agent's strategy. We say that a strategy α of the agent is Markovian if there exists a policy function a ∈ P such that α_t = a(p_t) for all t ≥ 0. An equilibrium (α, β) is Markovian if there exist policy functions a, b ∈ P such that α_t = a(p_t) and β_t = b(p_t) for all t ≥ 0. In this case, we say that the policy profile (a, b) ∈ P × P is induced by (α, β). Given a Markovian equilibrium (α, β), let SP(α) be the set of posteriors reached on the equilibrium path.

(Moreover, we have verified that our equilibrium characterization still holds even after allowing for some dependence of the principal's flow payoff on the agent's action. This analysis is not included in the manuscript and is readily available upon request. Note that for any t > 0, U_i(t, α, β) is player i's expected payoff conditional on the respective filtration at time t. Thus, U_i(t, α, β) itself is a random variable, and the inequality about U_i in this definition is interpreted to hold almost surely (i.e., for almost all histories up to t). A function f : (0, 1) → [0, 1] is piecewise Lipschitz if there exist n ∈ N and x [...])

Lemma 1. Any Markovian equilibrium (α, β) with an induced policy profile (a, b) satisfies (i) sup_{p ∈ (0,1)} a(p) < 1, and (ii) SP(α) = (0, 1).

Lemma 1 says that in any Markovian equilibrium the agent's action is always bounded away from full mimicking and that every posterior belief is reached with positive probability. Intuitively, if the noninvestible type is expected to choose a = 1 at some belief p̂, then the signal from that time on becomes uninformative, making p̂ an absorbing state.
But if the belief is not moving, the noninvestible type's best reply at state p̂ is to choose a = 0, which violates the (implicit) requirement that the agent's equilibrium action must coincide with the principal's conjecture about his action. In fact, one can show that an equilibrium policy function a(·) must be bounded away from 1, and thus the variance of the belief process is always bounded away from 0. Together with the Poisson arrival of stopping opportunities, this makes all interior beliefs reachable on the equilibrium path.

Given a Markovian equilibrium (α, β), the continuation payoff at time t depends only on the public belief p_t. Hence, we define the value function of the (noninvestible) agent as V(p) := E{U_A(t, α, β) | p_t = p, θ = NI} and the value function of the principal as W(p) := E{U_P(t, α, β) | p_t = p} for every p ∈ (0, 1).

We say that a value function is regular if it is continuously differentiable everywhere, and twice continuously differentiable everywhere except perhaps at a finite number of points. We say that a Markovian equilibrium (α, β) is smooth if the associated value functions are regular and the agent's policy function a(·) is Lipschitz. We refer to smooth Markovian equilibria simply as Markov equilibria.

(Consider a Markovian equilibrium (α, β) and the underlying probability space (Ω, F, P). For each p ∈ (0, 1), we define Φ(p) := {ω ∈ Ω : ∃ t ≤ T such that p_t(ω) = p}, where T is the equilibrium stopping time. The belief span, SP(α), is the set of all p such that P(Φ(p)) > 0. Because p_t in a continuing relationship depends only on the agent's strategy α and the principal's stopping opportunity may not arrive at any t, the belief span is also solely determined by α. Consequently, this notion of belief span can be defined for any (Markovian) strategy of the agent.)

Moreover, when there is no confusion, we denote a Markov equilibrium by the policy
We first introduce some terminology to define properties of policy functions for the principal andthe agent. Recall that the state variable p is the principal’s belief that the agent is noninvestible. Definition 1. The policy function b ∈ P for the principal has a cutoff structure if there exists ˜ p ∈ [0 , such that b ( p ) = 0 for p < ˜ p , and b ( p ) = 1 for p > ˜ p . We refer to ˜ p as the cutoff belief of b . Definition 2. The policy function a ∈ P for the (noninvestible) agent is fully separating if a ( p ) = 0 for all p ∈ (0 , . Definition 3. The policy function a ∈ P for the (noninvestible) agent is hump-shaped if a iscontinuous and there are cutoffs < p L < p ∗ < p R < such that a ( p ) = 0 for p ≤ p L , strictlyincreasing on ( p L ,p ∗ ) , strictly decreasing on ( p ∗ ,p R ) , and a ( p ) = 0 for p ≥ p R . Theorem 1. There always exists a unique Markov equilibrium ( a,b ) . In this equilibrium, b hasa cutoff structure with some cutoff belief p ∗ ∈ (0 , . Moreover, there exists r ∗ > such that1. If r ≥ r ∗ , then a is fully separating.2. If r < r ∗ , then a is hump-shaped and is maximized at p ∗ . Theorem 1 characterizes the structure of the unique Markov equilibrium. First, the principal usesa cutoff strategy. This follows from the type-dependent stopping payoff of the principal, and theabsence of flow payoffs.Second, the noninvestible agent’s behavior depends on his discount rate. If he is impatient(i.e., with a high discount rate), then he never mimics the investible type, because he always findsthe saving of the mimicking cost to outweigh the benefit of having a better reputation. A richerdynamics opens up if the agent is patient (i.e., with a low discount rate). In this case, his behaviorcan be described by three reputation phases: good, medium and bad, as depicted in Figure 1. 
Both in the good and the bad reputation phases, the noninvestible type does not mimic the investible type at all, but for different reasons: when his reputation is good (p < p_L), the relationship is highly stable, so the noninvestible agent gains little from further improving his reputation by mimicking; when his reputation is bad (p > p_R), termination is so imminent that the noninvestible agent gives up building reputation. In the intermediate phase, however, the noninvestible type first starts to mimic more often as his reputation worsens in order to slow down the principal's learning, and then he gradually gives up as the relationship becomes doomed. His mimicking intensity is highest at belief p∗, when the principal's action switches from continuing the relationship to termination.

Figure 1: Agent's Equilibrium Policy Function a(p) When r < r∗. The figure is plotted under parameter values including λ = 2, u = 1, c = 1 and w_NI = 1.

Finally, the cutoff discount rate, r∗, can be characterized in closed form. Specifically, r∗ is the unique solution to the following equation:

r∗(√r∗/µ + √(r∗ + λ)/µ) + λ(√r∗/µ + 1) = 4λ(u/c + 1).   (2)

Some comparative statics results are readily obtained. In particular, r∗ increases with λ, µ and u/c. This is intuitive, because the noninvestible type will have a higher incentive to mimic if: (i) the stopping opportunity arrives more frequently and thus the relationship is less stable; (ii) the signal-to-noise ratio is higher and thus a manipulation of the signal is more profitable; (iii) mimicking is relatively less costly. We also note that r∗ does not depend on the principal's payoff parameters (r_P, w_NI and w_I).

Remark 1. As λ increases to ∞, the cutoff discount rate r∗ converges from below to a finite number r̄.
For any fixed r < r̄, the agent's equilibrium policy function converges to one that resembles Figure 1 for p < p∗ and is equal to 0 for all p > p∗; that is, p_R converges to p∗, creating a discontinuity at p∗. This is intuitive because in the limit the agent's incentive when p < p∗ is similar to what we explained before, but once the belief is above p∗ the agent expects the relationship to be terminated in the next instant regardless of what he does, and thus he should choose a = 0 to save the mimicking cost. (Technically, if λ is set to ∞, i.e., if we literally allow the principal to terminate the relationship whenever she wants, additional refinement is needed for that game to preserve equilibrium uniqueness, because once p > p∗ the agent expects the relationship to be terminated right away, in which case his action choice in that instant has no payoff consequence.)

A Markov equilibrium consists of a policy profile (a, b), and a conjecture that the principal holds about the agent's strategy, such that: (i) the principal's conjecture determines her interpretation of public signal histories into her beliefs about the agent's type; (ii) the principal's policy b is optimal given her conjecture about the agent's strategy; (iii) the agent's policy a is optimal given b and the principal's conjecture; (iv) the principal's conjecture coincides with a. Specifically, any Markov equilibrium (a, b) satisfies the optimality conditions stated below.

Principal's Optimality:

b ∈ argmax_{b̃ ∈ P} Ŵ(p, a, b̃),   (3)

where Ŵ(p, a, b̃) := E{e^{−r_P ν} (1{θ = NI} w_NI + 1{θ = I} w_I)}, p_0 = p, ν is the time when the game stops, controlled by both b̃ and {J_t}_{t≥0}, and the evolution of {p_t}_{t≥0} is given by the SDE (for a formal derivation, see, e.g., Bolton and Harris, 1999)

dp_t = −µ(1 − a_t) γ(p_t) dB̃_t.
(4)
In (4), a_t is a function of p_t, γ : [0,1] → R_+ is defined by γ(p) := p(1 − p), and (B̃_t)_{t≥0} is the innovation process associated with the principal's filtering, i.e.,
dB̃_t = dX_t − µ(p_t a_t + 1 − p_t) dt. (5)
The optimality condition (3) requires that b maximize the principal's payoff when the agent is using policy a.

Agent's Optimality:
a ∈ argmax_{ã ∈ P} V̂(p, ã, b; a), (6)

[footnote, continued: consequence. See Kuvalekar and Lipnowski (2020) for a detailed discussion and a refinement that selects, in that limiting game, the limiting policy function we described. In our model, the Poisson arrival of stopping opportunities helps us avoid such complications and obtain equilibrium uniqueness for every finite λ without additional refinements.] [footnote: The optimality conditions (3) and (6) restrict the players to maximize their payoffs over Markov controls in P. This is for expository purposes and is without loss: in the proof of Theorem 1, we verify that the equilibrium strategies are mutual best replies among all admissible strategies in B and A.]

where
V̂(p, ã, b; a) := E{ ∫_0^ν e^{−r_2 τ} r_2 [ (1 − ã(p_τ)) c + u ] dτ },
p_0 = p, ν is the time when the game stops, and the evolution of {p_t}_{t≥0} is given by substituting dX_t = µ ã_t dt + dB_t into equations (4) and (5). Specifically, from the noninvestible type's perspective, the belief process satisfies:
dp_t = µ²(1 − a_t)[1 − ã_t − p_t(1 − a_t)] γ(p_t) dt − µ(1 − a_t) γ(p_t) dB_t. (7)
In a Markov equilibrium (a, b), the principal has a conjecture about the agent's behavior, which determines how she interprets any history of signal realizations into her belief about the agent's type. This conjecture has to coincide with the agent's policy a in equilibrium.
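To illustrate the filtering equations, the following sketch simulates the noninvestible type's belief process (7) under the fully separating profile (a = ã = 0). The parameter values (p_0 = 0.5, µ = 1, T = 5) are purely illustrative and not taken from the paper's figures.

```python
import random

def simulate_belief(p0=0.5, mu=1.0, T=5.0, dt=0.01, n_paths=2000, seed=0):
    """Euler-Maruyama sketch of the belief SDE (7) for a noninvestible agent
    under full separation (a = a_tilde = 0):
        dp = mu^2 (1 - p) gamma(p) dt - mu gamma(p) dB,  gamma(p) = p(1 - p).
    Returns the cross-path average of the terminal posterior."""
    rng = random.Random(seed)
    steps = int(T / dt)
    total = 0.0
    for _ in range(n_paths):
        p = p0
        for _ in range(steps):
            gamma = p * (1.0 - p)
            dB = rng.gauss(0.0, dt ** 0.5)
            p += mu ** 2 * (1.0 - p) * gamma * dt - mu * gamma * dB
            p = min(max(p, 1e-9), 1.0 - 1e-9)  # numerical clamp to (0, 1)
        total += p
    return total / n_paths
```

Along noninvestible paths the principal's posterior that the agent is noninvestible drifts upward, as the positive drift term in (7) indicates: starting from p_0 = 0.5, the cross-path mean ends well above 0.5.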
If the agent contemplates a deviation from the equilibrium, this does not affect the processes in equations (4) and (5) (which jointly describe how beliefs depend on the public history), but it does affect the process that governs the evolution of X_t (public histories). The necessary condition (6) requires that the agent have no profitable deviation from his equilibrium policy function a when the principal conjectures that the agent is using this policy function.

We now proceed by exploiting the implications of the necessary conditions outlined above. We first show that in any Markov equilibrium, the principal's policy function has a cutoff structure: she terminates the relationship if and only if the agent's reputation is bad enough. We then show that the agent's equilibrium policy function must be either fully separating (i.e., never mimicking) or hump-shaped. Finally, the existence and uniqueness of the Markov equilibrium follow from a fixed-point argument.

Let R(p) := p w_NI + (1 − p) w_I be the principal's expected payoff if the relationship is terminated at belief p. Define p∗∗ := R^{−1}(0) > 0 and p_H := R^{−1}( (λ/(r_1 + λ)) w_NI ) < 1.

Lemma 2. If (a, b) is a Markov equilibrium, then b has a cutoff structure with a cutoff belief p∗ ∈ [p∗∗, p_H].

To prove this result, we use the optimality condition in (3). Observe that the equilibrium value function W(p) is such that W(p) := Ŵ(p, a, b). The principal's value and policy functions must then satisfy the following HJB equation:
r_1 W(p) = max_{b̃ ∈ [0,1]} { ½ µ² [1 − a(p)]² γ(p)² W″(p) + λ b̃ [R(p) − W(p)] }. (8)
It is clear that b(p) = 0 whenever R(p) < W(p), and b(p) = 1 whenever R(p) > W(p). In the proof, we show that these functions have a unique intersection point.
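The structure of (8) and the bounds in Lemma 2 can be sketched numerically. The code below time-marches the HJB under the fully separating conjecture a(p) = 0 and computes p∗∗ and p_H for hypothetical parameter values (r_1 = 0.2, λ = 2, µ = 1, w_NI = 1, w_I = −1); these numbers are stand-ins, not the paper's.

```python
def solve_principal_value(r1=0.2, lam=2.0, mu=1.0, w_ni=1.0, w_i=-1.0,
                          n=101, t_steps=20000, dt=1e-3):
    """Explicit finite-difference time-marching for the HJB (8) with a(p) = 0:
        r1 W = 0.5 mu^2 gamma(p)^2 W'' + lam max{0, R(p) - W}.
    Boundary values: W(0) = 0 and W(1) = lam w_ni / (r1 + lam)."""
    dx = 1.0 / (n - 1)
    grid = [i * dx for i in range(n)]
    R = [p * w_ni + (1.0 - p) * w_i for p in grid]
    W = [0.0] * n
    W[-1] = lam * w_ni / (r1 + lam)
    for _ in range(t_steps):
        new = W[:]
        for i in range(1, n - 1):
            sig2 = (mu * grid[i] * (1.0 - grid[i])) ** 2
            curv = (W[i + 1] - 2.0 * W[i] + W[i - 1]) / dx ** 2
            new[i] = W[i] + dt * (0.5 * sig2 * curv - r1 * W[i]
                                  + lam * max(0.0, R[i] - W[i]))
        W = new
    return grid, W

def cutoff_bounds(r1, lam, w_ni, w_i):
    """Lemma 2 bounds: p** = R^{-1}(0) and p_H = R^{-1}(lam w_ni/(r1+lam)),
    using that R is linear and strictly increasing when w_ni > w_i."""
    inv = lambda y: (y - w_i) / (w_ni - w_i)
    return inv(0.0), inv(lam * w_ni / (r1 + lam))
```

With these stand-in payoffs, p∗∗ = 0.5 and p_H ≈ 0.95, so the equilibrium cutoff p∗ is confined to that interval; the computed W is nonnegative and increasing in p, consistent with the properties stated later in the appendix.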
Moreover, because terminating the relationship when p < p∗∗ gives the principal a negative payoff, and because the stopping opportunity arrives only once in a while, which bounds her payoff from waiting by (λ/(r_1 + λ)) w_NI, the principal's optimal stopping threshold must lie between p∗∗ and p_H. See Figure 2 for an illustration.

[Figure 2: Principal's Equilibrium Cutoff. This figure is plotted under the following parameter values: r_1 = 0.…, r_2 = 0.…, λ = 2, µ = 1.…, u = 1, c = 1, w_NI = 1, w_I = −…. In equilibrium, p∗∗ = 0.…, p∗ ≈ 0.…, p_H = 0.….]

We now turn to the agent's behavior.

Lemma 3. Suppose b ∈ P is a cutoff policy function for the principal with cutoff belief p∗. Then there is a unique policy function a ∈ P for the agent such that i) V̂(p, a(p), b(p); a(p)) is a regular function of p, ii) a is Lipschitz and sup_{p ∈ (0,1)} a(p) < 1, and iii) a satisfies (6). Moreover, this unique policy function is fully separating if r_2 ≥ r∗, and is hump-shaped if r_2 < r∗.

The proof of Lemma 3 is more involved. This is because finding a solution to program (6) is akin to finding a fixed point: the policy a for the agent is optimal when the principal holds the conjecture a. Nonetheless, we are able to characterize its unique solution in closed form.

Intuitively, if the agent is impatient, short-run incentives determine his behavior, and the noninvestible type will never pay the cost to mimic the investible type, leading to a fully separating policy function. If the agent is patient, full separation can no longer be part of an equilibrium. This is because the fully separating policy function, if conjectured by the principal, generates opportunities to build a reputation rather fast; and when the agent cares enough about the future, this gives the noninvestible type strict incentives to mimic.

What are the dynamics of the agent's mimicking intensity when he is patient?
When the public belief p is very small, it takes a long time for the belief to increase all the way up to the termination cutoff. Hence, the limited benefit of further improving his reputation cannot justify the mimicking cost. As a result, a(p) = 0 for low p. When p is very large, it takes so long for the agent to regain his reputation that he simply gives up. Consequently, a(p) = 0 for high p. For intermediate p, the agent's short-run temptation and long-run benefits are more balanced, so that a(p) ∈ (0,1). Specifically, after the good reputation phase, for p ∈ (p_L, p∗), the noninvestible type starts to mimic more often as his reputation worsens. This incentive peaks at p∗, where the principal's action is most sensitive to a change in belief. After that, for p ∈ (p∗, p_R), the noninvestible type gradually gives up restoring his reputation as termination becomes more imminent.

We saw in Theorem 1 that the noninvestible type engages in performance boosting whenever he is sufficiently patient, in which case a peaks at the termination cutoff p∗. We interpret this as a "scramble-to-rescue" effect: the agent increases his mimicking intensity (before p∗) as the relationship gets less stable.

What is its implication for the observables? An outsider (the principal or a modeler) does not see the agent's type or action, but can observe his performance, such as subscription growth or progress reports. In our model, the expected performance at time t is given by
EP_t := E[dX_t]/dt = µ [ (1 − p_t) + p_t a(p_t) ],
where the first term in brackets comes from the investible type and the second from the noninvestible type. Holding constant a < 1, the expected performance decreases with p. We call this the belief effect: if the agent is more likely to be noninvestible, then the expected performance is lower.
This is the entire story if the equilibrium is fully separating, in which case a(p) = 0 everywhere and EP(p) = µ(1 − p). However, if the equilibrium is not fully separating, the noninvestible type's mimicking intensity a is no longer constant: it is increasing below p∗ due to the scramble-to-rescue effect. Hence, whether the expected performance increases or decreases with the public belief depends on which of the two effects is stronger. The following theorem characterizes the evolution of the expected performance when the stopping opportunity arrives sufficiently fast.

Theorem 2. Fixing all parameters of the model other than r_2 and λ, there exists λ̄ such that for all λ > λ̄, EP(p) is non-monotone whenever the equilibrium is not fully separating (i.e., whenever r_2 < r∗). In particular, EP(p) is
• strictly decreasing for p ∈ [0, p̲), where p̲ is in [p_L, p∗);
• strictly increasing for p ∈ (p̲, p∗);
• strictly decreasing for p ∈ (p∗, 1].

Theorem 2 shows that if the arrival rate of the stopping opportunity is large enough, the scramble-to-rescue effect dominates the belief effect when the public belief is less than, but close to, the termination cutoff p∗. As a result, the expected performance reaches a local maximum at p∗ whenever the agent's equilibrium policy function is hump-shaped (see Figure 3).

[Figure 3: Agent's Expected Performance EP(p) When λ > λ̄ and r_2 < r∗. This figure is plotted under the following parameter values: r_1 = 0.…, r_2 = 0.…, λ = 2, µ = 1.…, u = 1, c = 1, w_NI = 1, w_I = −….]

In the context of our applications, Theorem 2 offers an empirical prediction of our model: when performance boosting is expected to happen, terminations are preceded by a spike in expected performance.
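The interplay between the two effects can be illustrated with a stylized, purely hypothetical hump-shaped mimicking policy (a Gaussian bump peaking near a cutoff of 0.6); this is not the equilibrium policy, which has no simple closed form here.

```python
import math

def EP(p, mu=1.0, p_star=0.6, width=0.12, peak=0.9):
    """Expected performance mu * [(1 - p) + p * a(p)] under a hypothetical
    hump-shaped mimicking intensity a(p) that peaks at p_star."""
    a = peak * math.exp(-((p - p_star) / width) ** 2)
    return mu * ((1.0 - p) + p * a)
```

With this stand-in policy, EP falls for small p (belief effect), rises just below p_star (scramble-to-rescue effect), and falls again past p_star, reproducing the decreasing/increasing/decreasing pattern of Theorem 2.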
The prediction seems consistent with a number of famous cases of corporate failure, such as Theranos, Luckin Coffee and WeWork: there were periods during which market suspicions about their business models grew, while the companies kept performing strongly and/or expanding aggressively prior to the crashes of their market values. [footnote: The lower bound on λ is not crucial for this qualitative prediction. In fact, even if λ is small, we can show that the expected performance is either decreasing or has the shape described in Theorem 2 (see Lemma 4). Moreover, for any λ, we can find an r̄ (less than r∗) such that the expected performance is non-monotone whenever r_2 < r̄. The only difference for small λ is that we cannot say definitively what happens when r_2 ∈ (r̄, r∗).] [footnote: The sudden drop of expected performance after p∗ carries over to the time domain. Specifically, because EP(p) is locally maximized (thus concave) at p∗, one can show that the stochastic process EP_t is a local supermartingale when p_t = p∗.] [footnote: For example, in the third quarter of 2019, Luckin Coffee reported a 470.1% increase in the total items sold from 7.8]

Better Transparency May Hurt the Principal

The noise in the performance measure may come from various random events, such as temporary demand shocks or measurement errors, making X_t only an imperfect signal of the agent's type. We say that a performance measure is more transparent if it is less affected by the noise component, and we use the signal-to-noise ratio of the process to capture its transparency. In reality, transparency may be improved (reduced) by having more (less) stringent disclosure requirements. In this section, we investigate whether policies that improve transparency will help the principal.
We find that, under some parameters, improving transparency can inhibit learning and hurt the principal, due to the agent's endogenous response through signal manipulation.

Recall that the drift of the signal process is µ and that its diffusion coefficient σ is normalized to 1. Therefore, the signal-to-noise ratio in our model is simply µ. In what follows, we fix all parameters other than µ and analyze the principal's payoff as µ increases. Hence, we make explicit the dependence of any variable or function on µ.

As a benchmark, suppose that the principal never receives any information about the agent's type. Recall that R(p) = p w_NI + (1 − p) w_I and p∗∗ = R^{−1}(0). In this case, the principal would continue the relationship if p < p∗∗, and she would terminate the relationship at the first stopping opportunity if p > p∗∗. This leads to the following "no-information value function" for the principal:
W̲(p) := (λ/(r_1 + λ)) max{0, p w_NI + (1 − p) w_I} = (λ/(r_1 + λ)) max{0, R(p)}.
Note that when µ is near 0, the signal process is close to pure noise regardless of the agent's action. So we have the following observation.

Observation 1. lim_{µ→0} W(p; µ) = W̲(p) and lim_{µ→0} p∗(µ) = p∗∗.

Now suppose that the agent's type is exogenously and immediately revealed to the principal. In this case, the principal obtains her highest possible payoff at each belief, summarized by her "full-information value function":
W̄(p) := (λ/(r_1 + λ)) [ p max{0, w_NI} + (1 − p) max{0, w_I} ] = (λ/(r_1 + λ)) p w_NI.

[footnote, continued: million in the same quarter of 2018. Its stock price was slashed by 75% in April 2020, following suspicion and then admission of fabricating sales data. Likewise, before the scandals started to unravel, Theranos falsely claimed in 2014 that the company had annual revenues of $100 million, a thousand times more than the actual figure of $100,000.]
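The two benchmark value functions admit a direct sketch. The payoff values below (r_1 = 0.2, λ = 2, w_NI = 1, w_I = −1) are hypothetical stand-ins for the garbled figure values.

```python
def w_no_info(p, r1=0.2, lam=2.0, w_ni=1.0, w_i=-1.0):
    """No-information benchmark: wait if R(p) < 0, otherwise terminate at the
    first Poisson opportunity, which carries discounted weight lam/(r1+lam)."""
    return lam / (r1 + lam) * max(0.0, p * w_ni + (1.0 - p) * w_i)

def w_full_info(p, r1=0.2, lam=2.0, w_ni=1.0, w_i=-1.0):
    """Full-information benchmark: each type is handled optimally, so only
    positive-payoff terminations survive the max operators."""
    return lam / (r1 + lam) * (p * max(0.0, w_ni) + (1.0 - p) * max(0.0, w_i))

# Sanity check: the no-information value never exceeds the full-information value.
assert all(w_no_info(i / 100) <= w_full_info(i / 100) + 1e-12 for i in range(101))
```

Observation 2 then says that every equilibrium value W(p; µ) with µ ∈ (0, ∞) is wedged strictly between these two curves.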
[footnote, continued: In the case of WeWork, the company once had expanded to over 86 cities in 32 countries, despite growing suspicion about its profitability. However, in September 2019, the company delayed its IPO, which was followed by a 90% slash in valuation and enormous layoffs.] [footnote: This normalization is without loss because, before normalization, µ and σ always show up together as the ratio µ/σ in the calculations. Hence, only the signal-to-noise ratio is relevant, and increasing µ to ∞ is equivalent to shrinking σ to 0.]

For any finite µ, the principal's equilibrium value function lies strictly between W̲(p) and W̄(p). This is because some learning takes place in equilibrium (as the noninvestible type never fully mimics), but the agent's type is not immediately revealed (as µ < ∞).

Observation 2. For all µ ∈ (0, ∞) and p ∈ (0,1), W(p; µ) ∈ (W̲(p), W̄(p)).

The next theorem characterizes the limiting behavior of the principal's payoff as µ goes to infinity.

Theorem 3. Let λ̃ := r_2 (c/u).
1. If λ < λ̃, then lim_{µ→∞} ||W(·; µ) − W̄(·)||_∞ = 0 and lim_{µ→∞} p∗(µ) = 1.
2. If λ > λ̃, then lim_{µ→∞} ||W(·; µ) − W̲(·)||_∞ = 0 and lim_{µ→∞} p∗(µ) = p∗∗.

When the stopping opportunity arrives slowly (λ < λ̃), the noninvestible type's incentives to mimic are not strong. Intuitively, the relationship is relatively stable from the agent's viewpoint because, due to the scarcity of stopping opportunities, it takes a long time for the relationship to end even once the principal has decided to terminate it. As a result, the agent's equilibrium action is bounded away from "full mimicking" for all µ. As µ grows without bound, the public signal becomes increasingly informative about the agent's type, and in the end, the agent's type is almost immediately revealed.
Thus, the principal can afford to wait until she is very certain that the agent is noninvestible, and her equilibrium value function converges to W̄ (see Figure 4, left panel).

On the other hand, when the stopping opportunity arrives fast (λ > λ̃), the noninvestible type has stronger incentives to mimic the investible type. In particular, as µ increases without bound, the equilibrium mimicking intensity at the termination cutoff converges to 1. The speed of this convergence is so fast that the variance of the belief process vanishes there (i.e., p∗ becomes an almost absorbing state). Meanwhile, the equilibrium policy function a(·) also converges to 1 for p < p∗, and it converges to a function that is strictly less than 1 for p > p∗. In both of these regions, the principal learns some information about the agent's type from the public signal. However, this information is not useful (payoff-relevant) for the principal, since it does not lead to a change in her action. Hence, the principal's termination cutoff converges to p∗∗ and her equilibrium value function converges to W̲, as if no information would ever arrive (see Figure 4, right panel).

To better understand the limiting equilibrium dynamics when λ > λ̃, note from equation (4) that the diffusion coefficient of the principal's belief process at time t is proportional to µ(1 − a_t). [footnote: For sufficiently small λ, the noninvestible type does not mimic at all (i.e., a(p) = 0 for all p), and the principal's problem becomes identical to a standard two-armed bandit problem. For relatively large λ (still less than λ̃), some mimicking appears in equilibrium, but a(·) is uniformly bounded away from 1 for all µ.]

[Figure 4: Convergence of the Principal's Equilibrium Value Function W(p; µ). Panel (a): λ < λ̃; panel (b): λ > λ̃. This figure is plotted under the following parameter values: r_1 = 0.…, r_2 = 0.…, u = 1, c = 1, w_NI = 1, w_I = −… (so that p∗∗ = 0.…, λ̃ = 0.…); λ = 0.…
for (a), λ = 2 for (b).]

For any fixed p, we can write this coefficient as µ × [1 − a(p; µ)], where the multiplier µ is the direct effect and the bracket [1 − a(p; µ)] is the equilibrium effect. As µ increases, the direct effect through the multiplier accelerates information revelation, while the equilibrium effect through the agent's strategy slows down learning. The limit depends on the value of p; in particular, one can show that
lim_{µ→∞} µ[1 − a(p; µ)] ∈ (0, ∞), for all p ∈ (0, p∗∗);
lim_{µ→∞} µ[1 − a(p; µ)] = ∞, for all p ∈ (p∗∗, 1);
lim_{µ→∞} µ[1 − a(p∗(µ); µ)] = 0.
This suggests the following limiting equilibrium dynamics. If the prior belief is above p∗∗, there is an immediate split of the belief to either 1 or a value very close to p∗∗, after which it (almost) stops moving. If the prior belief is below p∗∗, then learning takes place gradually but becomes slower and slower as the posterior approaches p∗∗. In both cases, the principal's learning does not stop despite the agent's extreme manipulation; but because her posterior never moves across p∗∗, the principal's action never changes with the information she learns, and so, payoff-wise, it is as if no information ever arrives.

The next corollary follows immediately from Theorem 3.

Corollary 1. For any λ > λ̃ and p ∈ (0,1), there exist µ_1, µ_2 such that µ_2 > µ_1 and W(p; µ_2) < W(p; µ_1).

We now investigate the equilibrium outcomes as players get arbitrarily patient. The purpose is to see more clearly the role of patience in the principal's incentives to wait for more information and in the agent's incentives to engage in performance boosting.

First, consider an extreme case where r_2 is constant and r_1 goes to 0 (i.e., the principal gets arbitrarily more patient than the agent). In this case, the agent's mimicking intensity (which is independent of r_1) stays bounded away from 1 everywhere, implying that the public signal always reveals some information about the agent's type.
As the principal gets more patient, her marginal cost of waiting for new information becomes lower. Consequently, a patient principal terminates the relationship only when p is very high. Indeed, the termination cutoff converges to 1 and the principal's payoff converges to W̄.

Next, consider the other extreme case where r_1 is constant and r_2 goes to 0 (i.e., the agent gets arbitrarily more patient than the principal). As the agent gets more patient, he cares more about staying in the relationship for long and less about the instantaneous mimicking cost. Thus, the noninvestible type has stronger incentives to mimic the investible type, and the equilibrium mimicking intensity approaches one at and below the termination cutoff. In the limit, the outcome is similar to the case of large µ and λ characterized in the previous section. That is, it is as if no information ever arrives, with the termination cutoff converging to p∗∗ and the principal's value function converging to W̲. [footnote: The possibility that better monitoring/more transparency may hurt a principal or a relationship has appeared in other settings, such as career-concern models (Holmström, 1999; Dewatripont et al., 1999), contracting in insurance markets (Hirshleifer, 1971; Schlee, 2001), contracting with moral hazard (Zhu, 2020), and dynamic team production (Cetemen et al., 2020; Ramos and Sadzik, 2020). In our model, this effect shows up for a different reason: better monitoring may give the agent stronger incentives to engage in performance boosting, which depresses the informativeness of the public signals.]

Finally, consider a sequence {r_{1,n}, r_{2,n}}_n of discount rates such that r_{i,n} → 0 for both i = 1, 2 and lim_n r_{1,n}/r_{2,n} = χ ∈ (0, ∞). Consider a sequence of games along which all other parameters are fixed, and let {W_n, V_n}_n be the corresponding sequence of value functions for the principal and the agent, respectively. The following theorem displays their limits.

Theorem 4.
W_n(·) converges uniformly to max{0, R(·)}, and V_n(·) converges pointwise to V∗(·), which satisfies
V∗(p) := u, if p < p∗∗; 0, if p > p∗∗.

Theorem 4 shows that if both players get arbitrarily patient at comparable rates, then it is as if the principal does not receive any information. Along the sequence, both the agent's incentives to mimic and the principal's resolve to wait for more information get stronger, but the former effect dominates. We view this result as a strong manifestation of the ratchet effect in the patient limit of our model. Since the principal cannot commit to not using future information against the agent, the noninvestible type engages in performance boosting with almost full intensity in order to maintain his reputation. In the end, no useful information is ever revealed, and the principal's lack of commitment hurts her in the most extreme way. In our applications, this result suggests that other instruments, such as some form of commitment (e.g., setting a deadline and/or grace period), additional screening devices (e.g., performance-based investment levels and/or salaries), or large fines that increase the expected cost of performance boosting, may be necessary to help the principal obtain more information.

In this paper, we study a stopping game with asymmetric information in which the performance measures that reflect the fundamental can be manipulated by the agent at a cost. Despite the model being stylized, we obtain rich equilibrium dynamics. Our model illustrates that inflated performance can coexist with growing suspicion about a project's viability. Our analysis also implies that too much transparency may hinder the principal's ability to learn, by encouraging excessive performance boosting. This result suggests that some noise in the monitoring technology may be beneficial for the principal. Furthermore, we find an extreme form of the ratchet effect in the patient limit, precluding any useful learning.
This happens because the principal lacks the commitment to refrain from using the information obtained during the relationship against the agent, giving a highly patient noninvestible type strong incentives to boost performance and maintain his reputation.

Several ways to extend our analysis are worth mentioning. While our main focus is the adverse selection problem, in some settings moral hazard is a prominent issue. Thus, it would be interesting to allow the agent's action to directly influence the principal's payoff. Relatedly, in that setting the principal might want to use history-dependent flow payoffs to reward/punish the agent. Another possibility is to expand the choice set of the principal by allowing her to elevate the "status" of the relationship, such as promoting the agent or upgrading the terms of financing. Finally, optimal contracting in this setting remains an open problem. We leave all these aspects as interesting directions for future research.

A Appendix: Proofs

Remark. Because only the noninvestible type of the agent has an active choice to make, whenever there is no confusion we simply refer to the "noninvestible-type agent" as the agent.

A.1 Equilibrium Characterization: Toward a Proof of Theorem 1

To establish Theorem 1, we use the results that the equilibrium belief process must have full support (Lemma 1), and that the principal's equilibrium strategy must have a cutoff structure (Lemma 2). These two lemmas are proved in the Online Appendix. The main proof characterizes the agent's (pseudo-)best reply to any cutoff termination rule (Lemma 3). Finally, we prove equilibrium existence and uniqueness using a fixed-point argument.

A.1.1 Proof of Lemma 3

In light of Lemma 2, let us fix a cutoff termination rule of the principal. We define a new state variable Z_t := log(p_t/(1 − p_t)), which is a strictly increasing transformation of p_t.
Note that Z_t is defined on (−∞, ∞). Given the principal's conjecture a(·) about the agent's policy function and the agent's actual policy function ã(·), the law of motion of p_t is given by (7). By Itô's lemma, the law of motion of Z_t is
dZ_t = µ²(1 − a_t)[ 1 − ã_t − ½(1 − a_t) ] dt − µ(1 − a_t) dB_t. (9)
Now, suppose that the principal uses a particular cutoff policy function b with cutoff belief p∗ ∈ (0,1). Suppose also that the noninvestible type's policy function a satisfies the conditions in Lemma 3: a(·) is Lipschitz, sup_{p ∈ (0,1)} a(p) < 1, and it satisfies (6), i.e., a ∈ argmax_{ã ∈ P} V̂(p, ã, b; a); moreover, the resulting V(p) = V̂(p, a(p), b(p); a(p)) is regular. For brevity, we call any a(·) that satisfies these conditions a pseudo-best reply to b(·). Lipschitz continuity of a(·) implies that, for any control in P or A, the controlled process p_t or Z_t in the agent's problem always admits a unique strong solution. [footnote: We call such an a(·) a pseudo-best reply because b(·) by itself does not define a strategy of the principal; the principal's interpretation of the observed signal into her posterior belief depends on (her conjecture of) the agent's strategy. The equilibrium condition that the principal's conjecture coincides with the agent's actual strategy is imposed as part of the definition of a pseudo-best reply.]

Let Z_t be the new state variable, and define v(z) := V(e^z/(1 + e^z)) and z∗ := log(p∗/(1 − p∗)). Because we work with Z_t most of the time in this appendix, we write a(z) to mean a(e^z/(1 + e^z)) whenever there is no confusion. The HJB equation for the agent is
[r_2 + b(z)λ] v(z) = max_{ã ∈ [0,1]} { r_2[u + (1 − ã)c] + µ²[1 − a(z)][1 − ã − ½(1 − a(z))] v′(z) + ½ µ²[1 − a(z)]² v″(z) }. (10)
The following sequence of claims establishes some necessary properties of any pseudo-best reply a(·).
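As a consistency check, the law of motion (9) can be recovered from (7) by applying Itô's lemma to Z_t = log(p_t/(1 − p_t)); a sketch of the computation:

```latex
% With f(p) = \log\frac{p}{1-p}: f'(p) = \frac{1}{\gamma(p)},
% f''(p) = \frac{2p-1}{\gamma(p)^2}, where \gamma(p) = p(1-p).
\begin{aligned}
dZ_t &= f'(p_t)\,dp_t + \tfrac12 f''(p_t)\,\mu^2(1-a_t)^2\gamma(p_t)^2\,dt \\
     &= \mu^2(1-a_t)\bigl[\,1-\tilde a_t - p_t(1-a_t)\,\bigr]\,dt
        + \tfrac12\,\mu^2(1-a_t)^2(2p_t-1)\,dt
        - \mu(1-a_t)\,dB_t \\
     &= \mu^2(1-a_t)\bigl[\,1-\tilde a_t - \tfrac12(1-a_t)\,\bigr]\,dt
        - \mu(1-a_t)\,dB_t,
\end{aligned}
```

where the last step uses −p(1 − a) + ½(1 − a)(2p − 1) = −½(1 − a); this matches (9), and the p-dependence of the drift drops out, which is what makes the z-space analysis tractable.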
Claim 1. a(z) = 1 − r_2 c / max{ r_2 c, −µ² v′(z) }, for all z ∈ (−∞, ∞).
Proof. Since the RHS of (10) is affine in the choice variable, optimality requires that, for almost every z ∈ R,
a(z) = 0 if r_2 c + µ²[1 − a(z)] v′(z) > 0; a(z) ∈ [0,1] if r_2 c + µ²[1 − a(z)] v′(z) = 0; and a(z) = 1 if r_2 c + µ²[1 − a(z)] v′(z) < 0.
This implies that, for almost every z ∈ R, we have
a(z) = 1 − r_2 c / max{ r_2 c, −µ² v′(z) }. (11)
Since both sides of (11) are continuous in z (recall that a(·) is Lipschitz by assumption, and v(·) is C¹ by assumption), we conclude that (11) must hold for every z ∈ R. [footnote: This is because any two continuous functions that are equal almost everywhere are equal everywhere.]

Claim 2. Fix any z_1 < z_2 ≤ z∗ such that a(z) = 0 for all z ∈ (z_1, z_2). Then v is given by (12) on (z_1, z_2).
Proof. Since b(z) = 0 and a(z) = 0 for all z ∈ (z_1, z_2), equation (10) becomes
r_2 v(z) = r_2(u + c) + ½ µ² [v′(z) + v″(z)].
It is easy to verify that its general solution is given by (12). [footnote: Since v is regular, v is C² except at possibly finitely many points. This HJB equation holds on any interval over which v is C².] [footnote: Since sup_{z ∈ R} a(z) < 1 by the definition of a pseudo-best reply (and by Lemma 1), the law of motion (9) implies that the distribution of Z_t has full support for any t > 0, i.e., supp(Z_t) = R for all t > 0.]

Claim 3. Fix any z∗ ≤ z_1 < z_2 such that a(z) = 0 for all z ∈ (z_1, z_2). Then v is given by (13) on (z_1, z_2).
Proof. Since b(z) = 1 and a(z) = 0 for all z ∈ (z_1, z_2), equation (10) becomes
(r_2 + λ) v(z) = r_2(u + c) + ½ µ² [v′(z) + v″(z)].
It is easy to verify that its general solution is given by (13).

Now, let us denote by Φ and φ the CDF and PDF of the standard normal distribution, respectively.

Claim 4. Fix any z_1 < z_2 ≤ z∗. On any subinterval of (z_1, z_2) where a(·) ∈ (0,1), a(·) is either strictly increasing, strictly decreasing, or first strictly decreasing and then strictly increasing.

Claim 5. Fix any z∗ ≤ z_1 < z_2. The same conclusion as in Claim 4 holds on (z_1, z_2).

Claim 7. For any z_1, z_2 such that either z_1 < z_2 ≤ z∗ or z∗ ≤ z_1 < z_2, a(z_1) = a(z_2) = 0 implies a(z) = 0 for all z ∈ [z_1, z_2].
Proof. Fix any z_1 < z_2 ≤ z∗ such that a(z_1) = a(z_2) = 0. Suppose (for a contradiction) that there exists z̃ ∈ (z_1, z_2) such that a(z̃) ∈ (0,1).
Let Z be the largest interval containing z̃ such that a(z) ∈ (0,1) for all z ∈ Z. Obviously, z_1 ≤ inf Z < sup Z ≤ z_2, and a(inf Z) = a(sup Z) = 0 because a(·) is continuous. By Claim 4, a(·) is strictly increasing, or strictly decreasing, or first strictly decreasing and then strictly increasing on Z. Since a(inf Z) = 0, a(·) can only be strictly increasing on Z, but this contradicts the continuity of a(·) at sup Z. An analogous argument, which invokes Claim 5, establishes the same result for any z∗ ≤ z_1 < z_2.

Corollary 2. One of the following must hold for any pseudo-best reply a(·):
• a(z) = 0 for all z ∈ R;
• a(z) is hump-shaped (and maximized at z∗).
Proof. Suppose that a(·) is not always equal to 0. Then there exists z̃ such that a(z̃) > 0. Let Z be the largest interval containing z̃ such that a(z) > 0 for all z ∈ Z. Let z_L = inf Z and z_R = sup Z. By Claim 6, −∞ < z_L < z_R < ∞. By continuity of a, a(z_L) = a(z_R) = 0. Then we must have z∗ ∈ (z_L, z_R), for otherwise we would reach a contradiction to Claim 7. Moreover, for any z ∈ (z_L, z_R)^c, we must have a(z) = 0, for otherwise we could construct another interval Z′ which also contains z∗ and satisfies Z′ − Z ≠ ∅, contradicting the maximality of Z. Since a(z_L) = 0, Claim 4 implies that a(·) must be strictly increasing on (z_L, z∗). Similarly, since a(z_R) = 0, Claim 5 implies that a must be strictly decreasing on (z∗, z_R).

In summary, if a(·) is not always equal to 0, then there exist z_L < z∗ < z_R such that a is strictly increasing on (z_L, z∗), strictly decreasing on (z∗, z_R), and equal to 0 outside (z_L, z_R); that is, a is hump-shaped.
Define a∗₋, a∗₊ : [v_R, v_L] → R by

a∗₋(x) := 1 − [ (√(2r_2)/µ) φ((x − u)/√κ_L) ] / [ (√(2r_2)/µ) φ((v_L − u)/√κ_L) + Φ((v_L − u)/√κ_L) − Φ((x − u)/√κ_L) ], (38)

a∗₊(x) := 1 − [ (√(2(r_2 + λ))/µ) φ((x − (r_2/(r_2 + λ))u)/√κ_R) ] / [ (√(2(r_2 + λ))/µ) φ((v_R − (r_2/(r_2 + λ))u)/√κ_R) + Φ((v_R − (r_2/(r_2 + λ))u)/√κ_R) − Φ((x − (r_2/(r_2 + λ))u)/√κ_R) ]. (39)

Then, a∗₋(·) is strictly decreasing on [v_R, v_L] with a∗₋(v_R) ∈ (0,1) and a∗₋(v_L) = 0; a∗₊(·) is strictly increasing on [v_R, v_L] with a∗₊(v_R) = 0 and a∗₊(v_L) ∈ (0,1).
Proof. See Online Appendix.

Proof of Lemma 3. Suppose first that r_2 ≥ r∗. By Corollary 2 and Claim 8, any pseudo-best reply a(·) must be such that a(z) = 0 for all z ∈ R. Obviously, such a function is unique. To verify that a(z) = 0 for all z ∈ R is indeed a solution to (6), note that we have shown in the proof of Claim 8 that, together with the v(·) given by (12) and (13) and the coefficients given by (34) through (37), it satisfies the agent's HJB equation (10). [footnote: Everywhere except at z∗, where v″ does not exist.] Since v(·) is bounded, we have lim_{t→∞} e^{−r_2 t} E[ v(z_t) 1{τ ≥ t} ] = 0, where τ is the stopping time at which the relationship is terminated. Then, by Ross (2008, Theorem 3.3.5), a(z) = 0 for all z ∈ R is indeed a solution to (6). In addition, v(·) is regular because the functions given by (12) and (13) are smooth, and value-matching and smooth-pasting conditions are imposed at z∗.

Suppose now that r_2 < r∗.

Lemmas 2 and 3 establish the unique structure of Markov equilibria. To prove Theorem 1, we still need an argument for equilibrium existence and uniqueness.

Proof of Theorem 1. Suppose first that r_2 ≥ r∗. By Lemma 3, the agent's pseudo-best reply to any cutoff termination rule satisfies a(p) = 0 for all p ∈ (0,1).
In fact, the verification theorem we invoke in proving Lemma 3 tells us that such an a(·) satisfies the agent's optimality condition (6) in a stronger sense, even if we allow him to maximize over all admissible controls in A instead of over Markov controls in P. On the other hand, given this Markovian strategy of the agent, under which the belief span is (0,1), the proof of Lemma 2 (in the Online Appendix) can be used verbatim to show that the principal has a unique best reply whose policy function b admits a cutoff p∗ ∈ (0,1). Hence, such a pair (a, b) is the unique Markov equilibrium in this case. Suppose now that r_2 < r∗.

The following hold:
1. W(·) is (weakly) increasing, nonnegative and convex on (0,1), and it satisfies lim_{p→0} W(p) = 0 and lim_{p→1} W(p) = (λ/(r_1 + λ)) w_NI.
2. v(·) is strictly decreasing on R, and it satisfies lim_{z→−∞} v(z) = u + c and lim_{z→∞} v(z) = (r_2/(r_2 + λ))(u + c). Moreover, v(·) is concave on (−∞, z∗) and convex on (z∗, ∞).
Proof. See Online Appendix.

A.2 Expected Performance: Toward a Proof of Theorem 2

In this section, we prove Theorem 2, which concerns the non-monotonicity of the expected performance. Given a Markov equilibrium (a, b) (where the equilibrium policy functions are defined on the z-space), let v be the agent's value function, let z∗ be the principal's termination cutoff, and recall that the agent's expected performance is given by
EP(z) = µ[1 − (1 − a(z)) p(z)], (41)
where p(z) = e^z/(1 + e^z). Our analyses in this section fix all model parameters except r_2 and/or λ.

Lemma 4. If r_2 ≥ r∗, then EP(·) is strictly decreasing on R. If r_2 < r∗, then either EP(·) is strictly decreasing on R, or EP(·) is
• strictly decreasing for z ∈ (−∞, z̲);
• strictly increasing for z ∈ (z̲, z∗);
• strictly decreasing for z ∈ (z∗, ∞),
for some z̲ < z∗.

Claim 11. There exist λ′ ≥ λ_2 and A > 0 such that if λ > λ′, then
a∗₊(u; r_2, λ) > 1 − A exp( −(µ² c/u)(r_2 λ/(r_2 + λ)) ), for all r_2 ∈ [r̲, µ]. (46)
Proof. See Online Appendix.

Lemma 8. There exists λ_3 ≥ λ_2 such that if λ > λ_3 and r̲ ≤ r_2 ≤ µ, then EP(·) is non-monotone.
Proof.
Let λ′ ≥ λ and A > 0 be delivered by Claim 11. For each r ∈ [r, µ], let λ(r) be the unique solution on ℝ₊ to A exp(−(µ/c)·(rλ²/(r+λ))·u) = 1 − a∗−(u; r). Note that λ(r) is well-defined for all r ∈ [r, µ] because the LHS is strictly decreasing in λ while the RHS is independent of λ. (When λ = 0, the LHS is equal to A > 1 − a∗−(u; r); when λ → ∞, the LHS converges to 0 < 1 − a∗−(u; r).) Hence, we have

A exp(−(µ/c)·(rλ²/(r+λ))·u) < 1 − a∗−(u; r),  ∀λ > λ(r).   (47)

Also, λ(r) is continuous in r by the implicit function theorem. Let λ′′ := max_{r∈[r,µ]} λ(r) and λ₃ := max{λ′, λ′′}. Combining (46) and (47), we have

a∗+(u; r, λ) > 1 − A exp(−(µ/c)·(rλ²/(r+λ))·u) > a∗−(u; r),

for all λ > λ₃ and r ∈ [r, µ]. Then, by Claim 10 and Corollary 5, we conclude that EP(·) is non-monotone if λ > λ₃ and r ∈ [r, µ].

Proof of Theorem 2. Let λ₁ be defined in (45), and let λ₂ and λ₃ be delivered by Lemmas 7 and 8, respectively. Define λ̄ := max{λ₁, λ₂, λ₃}. Lemmas 6 through 8 imply that if λ > λ̄ and r

A.3 Toward a Proof of Theorem 3

In this section, we prove Theorem 3, which is about the convergence of the principal's equilibrium value function when the signal-to-noise ratio µ grows without bound.

Take any sequence {µ_n}_n such that lim_n µ_n = +∞. For each n ∈ ℕ, take the unique Markov equilibrium (a_n, b_n) associated with the signal-to-noise ratio µ_n. Let V_n(·) be the agent's value function in the equilibrium (a_n, b_n) and W_n(·) be the principal's value function. We will often use z ≡ log(p/(1−p)) as the state variable when analyzing the agent's behavior. When doing so, we denote by v_n(z) := V_n(p(z)) the agent's value function in the z-space. Write z∗_n for the principal's equilibrium cutoff. Write z_{L,n} for the infimum belief z at which the agent plays a_n(z) > 0 and write z_{R,n} for the supremum.
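Stepping back to the expected-performance object of Section A.2, the mechanism behind Theorem 2 is easy to see numerically: plugging a hump-shaped manipulation policy into (41) produces a spike in expected performance just below the termination cutoff. The policy a(·) and all parameters below are illustrative shapes, not the equilibrium objects:

```python
import math

mu = 1.0
z_star = 2.0                       # illustrative termination cutoff in z-space

def p(z):                          # posterior as a function of the log-likelihood ratio
    return math.exp(z) / (1 + math.exp(z))

def a(z):                          # illustrative hump-shaped manipulation policy peaking at z_star
    width = 2.0 if z <= z_star else 0.5
    return max(0.0, 1 - ((z - z_star) / width) ** 2)

def EP(z):                         # expected performance, equation (41)
    return mu * (1 - (1 - a(z)) * p(z))

zs = [-4 + 0.05 * i for i in range(121)]    # grid on [-4, z_star]
vals = [EP(z) for z in zs]
falls = any(x > y for x, y in zip(vals, vals[1:]))
rises = any(x < y for x, y in zip(vals, vals[1:]))
assert falls and rises             # EP is non-monotone: it decays, then spikes toward mu near z_star
```

The decay comes from the belief p(z) rising while a ≈ 0; the spike comes from a(z) → 1 near the cutoff, which is the "spike in expected performance before termination" highlighted in the introduction.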
Write T for the equilibrium stopping time that stops the play of the game. Without labeling explicitly, we note that the distribution of T depends on n and the current state z. For i = 1, 2, let E^θ_n{e^{−r_i T}} be the expected discount factor when the stopping action is taken in the equilibrium (a_n, b_n), discounted at rate r_i and given the equilibrium strategy of type θ ∈ {NI, I}. When the game starts at state z, let E_n{e^{−r_i T}} := p(z)·E^NI_n{e^{−r_i T}} + (1 − p(z))·E^I_n{e^{−r_i T}}.

A.3.1 Case 1: λ < r(c/u), i.e., u < (r/(r+λ))(u + c)

Proof of Part 1 of Theorem 3. We first show that there is ε > 0 such that a_n(z∗_n) < 1 − ε for all n. Without loss, assume that a_n(z∗_n) > 0 for all n. Then condition (18) implies that

a_n(z∗_n) = 1 − (v_n(z∗_n) − u)/c − a′_n(z∗_n−) < 1 − ((r/(r+λ))(u + c) − u)/c,

where the inequality follows from v_n(·) ≥ (r/(r+λ))(u + c) and a′_n(z∗_n−) > 0. By the assumption of Case 1, we can set ε = ½((r/(r+λ))(u + c) − u)/c > 0.

Since a_n(·) is maximized at z∗_n, the above implies that a_n(·) is uniformly bounded away from 1. Consequently, as µ_n → ∞, the principal learns the agent's type almost immediately, and thus the principal's equilibrium value function W_n(·) converges uniformly to her full-information value function W̄(·).

A.3.2 Case 2: λ > r(c/u), i.e., u > (r/(r+λ))(u + c)

Claim 12. There exists N ∈ ℕ such that whenever n ≥ N, a_n(·) is hump-shaped.

Proof. From condition (2), it is easily verified that lim_n r∗_n = λ(u/c + 1). The assumption of Case 2 implies that r < λ(u/c) < λ(u/c + 1). By Theorem 1, the result follows.

We assume n ≥ N for the rest of the proofs in Case 2.

Claim 13. Take any compact set [p₁, p₂] ⊂ (0, 1).
We have limsup_{n→∞} [max_{z∈[z(p₁), z(p₂)]} v_n(z)] ≤ u.

Proof. Because p∗ ∈ [p∗∗, p_H] (by Lemma 2), we can without loss assume that p∗_n ∈ [p₁, p₂]. Since v_n(·) is decreasing, it suffices to show that limsup v_n(z(p₁)) ≤ u. By Claim 12, a_n(·) is hump-shaped for all n. First assume that a_n(z∗_n) → 1. Take any ε > 0, and let z^ε_n be the smallest z such that a_n(z) = 1 − ε, which is well-defined for every large n such that a_n(z∗_n) > 1 − ε. Consider the stochastic process Z_t in the equilibrium (a_n, b_n) under the noninvestible-type strategy and the initial condition Z_0 = z(p₁). Let T†_n be the stopping time that stops the game at the first time that Z_t ≥ z^ε_n. From the law of motion (9), as µ_n → ∞, we have E^NI_n[e^{−rT†_n}] → 1 and hence v_n(z(p₁)) → v_n(z^ε_n). Moreover, since v_n(·) is decreasing and concave to the left of z^ε_n (by Corollary 3), we have

r·v_n(z^ε_n) = r[u + (1 − a_n(z^ε_n))c] + ½µ²_n[1 − a_n(z^ε_n)]²[v′_n(z^ε_n−) + v′′_n(z^ε_n−)] ≤ r[u + (1 − a_n(z^ε_n))c],

which implies v_n(z^ε_n) ≤ u + εc, delivering the result as ε is arbitrary.

Next assume that liminf a_n(z∗_n) < 1. Take an ϵ > 0 such that u > (r/(r+λ))(u + c) + ϵ; such an ϵ exists because of the assumption of Case 2. Notice that we can find a z† sufficiently large such that v_n(z†) < (r/(r+λ))(u + c) + ϵ, ∀n. Let T†_n be the stopping time that stops the game at the first time that Z_t = z†. As µ_n → ∞, we have v_n(z(p₁)) → v_n(z†) < (r/(r+λ))(u + c) + ϵ < u.

Claim 14. z_{L,n} → −∞.

Proof. Suppose not; then, taking a subsequence if necessary, z_{L,n} → z ∈ ℝ. Then for any ε > 0, we have z_{L,n} ∈ [z − ε, z + ε] and v_n(z_{L,n}) ≥ u + c − ε when n is sufficiently large. Take any ε ∈ (0, c/2).
By Claim 13 and the monotonicity of v_n(·), there exists n∗ such that for every n > n∗ and for every z ∈ [z − ε, z + ε], we have v_n(z) < u + ε < u + c − ε, a contradiction to v_n(z_{L,n}) → u + c and z_{L,n} → z.

Claim 15. For any κ > 0, we have lim_{n→∞} a_n(z∗_n − κ) = 1.

Proof. Assume toward a contradiction that, taking a subsequence if necessary, lim a_n(z∗_n − κ) = 1 − ε for some ε > 0. Take any large M > 0 and notice that Claim 14 tells us that z_{L,n} < z∗_n − κ − M for large n. Since a_n(·) is increasing on [z∗_n − κ − M, z∗_n − κ] and lim a_n(z∗_n − κ) = 1 − ε, we know that a_n(z) < 1 − ε/2 (infinitely often) for all z ∈ [z∗_n − κ − M, z∗_n − κ]. Recall from condition (18) that for z ∈ [z∗_n − κ − M, z∗_n − κ],

a′_n(z) = 1 − a_n(z) − (v_n(z) − u)/c,

implying, in light of Claim 13, that for n sufficiently large, we have a′_n(z) > ε/4 for all z ∈ [z∗_n − κ − M, z∗_n − κ]. But then, we can take M large enough such that a_n(z∗_n − κ − M) < 0, a contradiction.

(To see the existence of the z† used in Claim 13's proof, note first that p∗_n is bounded above by p_H < 1 (Lemma 2). Next observe that the posterior is a submartingale according to the strategy of the noninvestible type. This implies that for every η > 0 we can find p_η < 1 such that, conditional on the noninvestible-type strategy, p₀ ∈ (p_η, 1) implies that the posterior goes below p_H with probability less than η. This immediately implies the existence of said z†.)

Claim 16. For any κ > 0, we have lim_{n→∞} v_n(z∗_n − κ) = u.

Proof. From Claim 13 and Lemma 2, we know limsup v_n(z∗_n − κ) ≤ u. Assume toward a contradiction, taking a subsequence if necessary, that lim_{n→∞} v_n(z∗_n − κ) = u − ε for some ε > 0. This implies that v_n(z∗_n − κ) < u − ε/2 for n sufficiently large. Note also that Claim 14 tells us that z∗_n − κ > z_{L,n} for n sufficiently large.
From condition (18), a′_n(z) = 1 − a_n(z) − (v_n(z) − u)/c for all z ∈ [z∗_n − κ, z∗_n − κ/2]. Then by Claim 15 and the monotonicity of a_n(·), we know that for n sufficiently large, a′_n(z) > ε/(2c) for all z ∈ [z∗_n − κ, z∗_n − κ/2]. But then, we have a_n(z∗_n − κ) < 1 − (κ/2)·ε/(2c), a contradiction to Claim 15.

Claim 17. Fix a prior p₀ ∈ (0, 1) and some p̄ ∈ (p₀, 1). For each µ > 0, consider an adapted Markov function α_µ(·) and a belief process defined by substituting α_µ(·) into (7). Take ε > 0 and let T̄ be the random time that stops the play at the first time that p_t ≥ p̄. Then we have:

limsup_{µ↑∞} E^NI{ r ∫₀^T̄ e^{−rt} 1{α_µ(p_t) ≤ 1 − ε} dt } = 0.

Proof. See Online Appendix (Section B.3, and Lemma OA.5 in Section B.4). In words, this lemma says that if the noninvestible type does not mimic too often, then as the noise in the signal vanishes, the principal can learn the agent's type almost immediately.

Lemma 9. Fix any κ > 0 and, for each n ∈ ℕ, assume that the game starts at the prior z∗_n − κ. Let T∗_n be the stopping time that stops the play at the first time that the posterior reaches [z∗_n, ∞). We have:

limsup_{n→∞} E^NI_n[e^{−rT∗_n}] = 0.

Proof. Let ẑ_n := z∗_n − κ and z̲_n := z∗_n − 2κ. Let T_n(z̲_n) be the stopping time that stops the play at the first time that the posterior reaches z̲_n, and T_n(z∗_n) be the stopping time that stops the play at the first time that the posterior reaches z∗_n.
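The "learn almost immediately" force behind Claim 17 can be sketched with a toy simulation: when the noninvestible type does not mimic (a = 0) and signal volatility is normalized to 1, the log-likelihood ratio against him drifts at rate µ²/2, so first-passage times to any fixed barrier shrink like 1/µ². These dynamics are a simplified stand-in, not the model's law of motion (9):

```python
import random

def mean_hitting_time(mu, barrier=2.0, dt=1e-3, n_paths=100, seed=0):
    """Average first time z_t >= barrier for dz = (mu**2/2)*dt + mu*dB_t, z_0 = 0.
    Toy stand-in for the revealed-type log-likelihood ratio; all parameters illustrative."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        z, t = 0.0, 0.0
        while z < barrier:               # hits the barrier a.s. since the drift is positive
            z += 0.5 * mu * mu * dt + mu * rng.gauss(0.0, dt ** 0.5)
            t += dt
        total += t
    return total / n_paths

slow = mean_hitting_time(mu=1.0)   # low signal-to-noise: detection takes a while
fast = mean_hitting_time(mu=4.0)   # high signal-to-noise: detection is almost immediate
assert fast < slow
```

Since the expected passage time is barrier/(µ²/2), quadrupling µ cuts the detection time by roughly a factor of 16, which is the vanishing-noise limit driving Theorem 3.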
Observe that for any n, P^NI_n[T_n(z̲_n) < ∞] + P^NI_n[T_n(z∗_n) < ∞] = 1. By Claim 15, we know that for any ε > 0, there exists n₀ ∈ ℕ such that n > n₀ implies that v_n(ẑ_n) is bounded above by:

P^NI_n[T_n(z̲_n) < ∞]·E^NI_n[ ∫₀^{T_n(z̲_n)} r u e^{−rt} dt + e^{−rT_n(z̲_n)} v_n(z̲_n) | T_n(z̲_n) < ∞ ] + P^NI_n[T_n(z∗_n) < ∞]·E^NI_n[ ∫₀^{T_n(z∗_n)} r u e^{−rt} dt + e^{−rT_n(z∗_n)} v_n(z∗_n) | T_n(z∗_n) < ∞ ] + ε.   (48)

Next we obtain an upper bound for v_n(z∗_n). For that, we let T_λ be the random time of the next Poisson shock. Note that

v_n(z∗_n) = P^NI_n[z_{T_λ} > z∗_n]·E^NI_n[ ∫₀^{T_λ} r e^{−rt}[u + (1 − a_n(z_t))c] dt | z_{T_λ} > z∗_n ] + P^NI_n[z_{T_λ} ≤ z∗_n]·E^NI_n[ ∫₀^{T_λ} r e^{−rt}[u + (1 − a_n(z_t))c] dt + e^{−rT_λ} v_n(z_{T_λ}) | z_{T_λ} ≤ z∗_n ].

Now we use the following facts to bound the expected value above:

i) Because a_n(z_t) ≥ 0, E^NI_n[ ∫₀^{T_λ} r e^{−rt}[u + (1 − a_n(z_t))c] dt | z_{T_λ} > z∗_n ] ≤ E^NI_n[ ∫₀^{T_λ} r e^{−rt}(u + c) dt | z_{T_λ} > z∗_n ].

ii) From Claim 13, limsup E^NI_n[ 1{z_{T_λ} ≤ z∗_n} v_n(z_{T_λ}) ] ≤ limsup E^NI_n[ 1{z_{T_λ} ≤ z∗_n} u ].

iii) For every ε > 0, from Claim 17, limsup P^NI_n[ {z_{T_λ} ≤ z∗_n} ∩ { ∫₀^{T_λ} e^{−rt}(1 − a(z_t)) dt > ε } ] = 0.

Conditions (i), (ii) and (iii) above imply that for every ϵ > 0, we can find n₁ ∈ ℕ with n₁ > n₀ such that n > n₁ implies v_n(z∗_n) ≤ P^NI_n[z_{T_λ} > z∗_n]·E^NI_n[ ∫₀^{T_λ} r e^{−rt}(u + c) dt | z_{T_λ} > z∗_n ] + P^NI_n[z_{T_λ} ≤ z∗_n]·u + ϵ. Since Poisson shocks are independent of the Brownian motion, E^NI_n[ ∫₀^{T_λ} r e^{−rt}(u + c) dt | z_{T_λ} > z∗_n ] = (r/(r+λ))(u + c), and P^NI_n[z_{T_λ} ≤ z∗_n] ≥ 1/2.
Therefore, the last two observations imply v_n(z∗_n) ≤ ½·(r/(r+λ))(u + c) + ½·u + ϵ. Since (r/(r+λ))(u + c) < u under the Case 2 assumption, this bound is strictly below u for ϵ small.
Recall that we always have z∗_n ≥ z∗∗. Hence assume (toward a contradiction) that we can find some ε > 0 such that, taking a subsequence if necessary, z∗_n > z∗∗ + 2ε for every n. For every n, consider the game started at z∗_n − ε > z∗∗ + ε. Let T∗_n be the stopping time that stops the play at the first time that the posterior reaches [z∗_n, ∞). By Lemma 9, we have E^NI_n[e^{−rT∗_n}] → 0, implying that limsup W_n(p(z∗_n − ε)) ≤ 0. But since z∗_n − ε > z∗∗ + ε, the principal can get a strictly positive payoff by terminating the relationship when the next stopping opportunity arrives. So the principal has a profitable deviation at z∗_n − ε when n is sufficiently large, a contradiction.

Proof of Part 2 of Theorem 3. In light of Corollary 3, we extend each W_n continuously from (0, 1) to [0, 1] by setting W_n(0) = 0 and W_n(1) = λ/(r+λ)·w_NI.

We first show that W_n(p∗∗) converges to W(p∗∗) = 0. By Lemmas 9 and 10, it is easy to see that lim_{n→∞} W_n(p) = 0 for all p < p∗∗. Suppose toward a contradiction, taking a subsequence if necessary, that lim_{n→∞} W_n(p∗∗) = δ > 0. Let ϵ = δ(1 − p∗∗)/(2 w_NI). For n large enough, we have W_n(p∗∗ − ϵ) < δ/2. Since W_n(·) is convex (by Corollary 3), we have

(W_n(1) − W_n(p∗∗))/(1 − p∗∗) ≥ (W_n(p∗∗) − W_n(p∗∗ − ϵ))/ϵ ≥ w_NI/(1 − p∗∗),

which implies W_n(1) > w_NI, a contradiction.

Next, for each n, because W_n(0) = W(0), W_n(1) = W(1), and W_n(·) is increasing and convex, we have 0 ≤ W′_n(·) ≤ λ/(r+λ). But then, we always have p∗∗ = argmax_{p∈[0,1]} |W_n(p) − W(p)|. Hence, uniform convergence of W_n follows immediately from its pointwise convergence at p∗∗.

References

Aghion, Philippe and Matthew O. Jackson, “Inducing Leaders to Take Risky Decisions: Dismissal, Tenure, and Term Limits,” American Economic Journal: Microeconomics, August 2016, (3), 1–38.
(Footnote to the proof of Part 2 of Theorem 3: Lemma 9 enables us to conclude that limsup_n E_n[e^{−rT}] = 0. This is because the principal only derives positive payoff from terminating against the noninvestible type. As a result, if limsup_n E^NI_n[e^{−rT}] = 0, then we must have limsup_n E^I_n[e^{−rT}] = 0, for otherwise the principal's equilibrium payoff would be negative.)

Atakan, Alp and Mehmet Ekmekci, “Reputation in Long-Run Relationships,” Review of Economic Studies, 2012, (2), 451–480.

Atakan, Alp and Mehmet Ekmekci, “Reputation in the Long-Run with Imperfect Monitoring,” Journal of Economic Theory, 2015, 553–605.

Bolton, Patrick and Christopher Harris, “Strategic Experimentation,” Econometrica, 1999, (2), 349–374.

Bonatti, Alessandro, Gonzalo Cisternas, and Juuso Toikka, “Dynamic Oligopoly with Incomplete Information,” The Review of Economic Studies, 2016, (2), 503–546.

Brealey, Richard, Hayne E. Leland, and David H. Pyle, “Informational Asymmetries, Financial Structure, and Financial Intermediation,” The Journal of Finance, 1977, (2), 371–387.

Celentani, Marco, Drew Fudenberg, David K. Levine, and Wolfgang Pesendorfer, “Maintaining a Reputation Against a Long-Lived Opponent,” Econometrica, 1996, (3), 691–704.

Cetemen, Doruk, Ilwoo Hwang, and Ayca Kaya, “Uncertainty-Driven Cooperation,” Theoretical Economics, 2020, (3), 1023–1058.

Chan, Yuk-Shee, “On the Positive Role of Financial Intermediation in Allocation of Venture Capital in a Market with Imperfect Information,” The Journal of Finance, 1983, (5), 1543–1568.

Cisternas, Gonzalo, “Two-Sided Learning and the Ratchet Principle,” The Review of Economic Studies, 2017, (1), 307–351.

Cripps, Martin and Jonathan Thomas, “Reputation and Perfection in Repeated Common Interest Games,” Games and Economic Behavior, 1997, 141–158.

Daley, Brendan and Brett Green, “Waiting for News in the Market for Lemons,” Econometrica, 2012, (4), 1433–1504.

DeMarzo, Peter M., Michael J.
Fishman, Zhiguo He, and Neng Wang, “Dynamic Agency and the q Theory of Investment,” The Journal of Finance, 2012, (6), 2295–2340.

Dewatripont, Mathias, Ian Jewitt, and Jean Tirole, “The Economics of Career Concerns, Part I: Comparing Information Structures,” Review of Economic Studies, 1999, (1), 183–198.

Dilmé, Francesc, “Dynamic Quality Signaling with Hidden Actions,” Games and Economic Behavior, 2019, 116–136.

Ekmekci, Mehmet, “Sustainable Reputations with Rating Systems,” Journal of Economic Theory, 2011, (2), 479–503.

Ekmekci, Mehmet and Lucas Maestri, “Reputation and Screening in a Noisy Environment with Irreversible Actions,” 2019. MPRA Paper 100885, University Library of Munich, Germany.

Faingold, Eduardo and Yuliy Sannikov, “Reputation in Continuous-Time Games,” Econometrica, 2011, (3), 773–876.

Fudenberg, Drew and David K. Levine, “Reputation and Equilibrium Selection in Games with a Patient Player,” Econometrica, 1989, (4), 759–778.

Fudenberg, Drew and David K. Levine, “Maintaining a Reputation When Strategies Are Imperfectly Observed,” The Review of Economic Studies, 1992, (3), 561–579.

Gompers, Paul and Josh Lerner, The Venture Capital Cycle, MIT Press, 2004.

Hirshleifer, Jack, “The Private and Social Value of Information and the Reward to Inventive Activity,” The American Economic Review, 1971, (4), 561–574.

Holmström, Bengt, “Managerial Incentive Problems: A Dynamic Perspective,” The Review of Economic Studies, 1999, (1), 169–182.

Kolb, Aaron M., “Optimal Entry Timing,” Journal of Economic Theory, 2015, 973–1000.

Kolb, Aaron M., “Strategic Real Options,” Journal of Economic Theory, 2019, 344–383.

Kreps, David and Robert Wilson, “Reputation and Imperfect Information,” Journal of Economic Theory, 1982, (2), 253–279.

Kuvalekar, Aditya and Elliot Lipnowski, “Job Insecurity,” American Economic Journal: Microeconomics, 2020, (2).

Lee, Jihong and Qingmin Liu, “Gambling Reputation: Repeated Bargaining With Outside Options,” Econometrica, 2013, (4), 1601–1672.
Liu, Qingmin, “Information Acquisition and Reputation Dynamics,” The Review of Economic Studies, 2011, (4), 1400–1425.

Liu, Qingmin and Andrzej Skrzypacz, “Limited Records and Reputation Bubbles,” Journal of Economic Theory, 2014, 2–29.

Mailath, George and Larry Samuelson, “Who Wants a Good Reputation?,” The Review of Economic Studies, 2001, (2), 415–441.

Milgrom, Paul and John Roberts, “Predation, Reputation, and Entry Deterrence,” Journal of Economic Theory, 1982, (2), 280–312.

Orlov, Dmitry, Andrzej Skrzypacz, and Pavel Zryumov, “Persuading the Principal to Wait,” Journal of Political Economy, 2020, (7), 2542–2578.

Ortner, Juan, “Durable Goods Monopoly with Stochastic Costs,” Theoretical Economics, 2017, (2), 817–861.

Pei, Harry Di, “Reputation Effects Under Interdependent Values,” Econometrica, forthcoming.

Phelan, Christopher, “Public Trust and Government Betrayal,” Journal of Economic Theory, 2006, (1), 27–43.

Ramos, João and Tomasz Sadzik, “Partnership with Persistence,” 2020. Working Paper.

Ross, Kevin, “Stochastic Control in Continuous Time,” 2008. Available at: http://statweb.stanford.edu/˜kjross/Stat220notes.pdf.

Schlee, Edward E., “The Value of Information in Efficient Risk-Sharing Arrangements,” The American Economic Review, 2001, (3), 509–524.

Schmidt, Klaus, “Reputation and Equilibrium Characterization in Repeated Games with Conflicting Interests,” Econometrica, 1993, (2), 325–351.

Sun, Yiman, “A Dynamic Model of Censorship,” 2020. Working Paper.

Varas, Felipe, Iván Marinovic, and Andrzej Skrzypacz, “Random Inspections and Periodic Reviews: Optimal Dynamic Monitoring,” The Review of Economic Studies, forthcoming.

Zhu, Yiran, “Better Monitoring...Worse Productivity?,” 2020. Working Paper.

Online Appendix

B.1 Omitted Proofs for Theorem 1

B.1.1 Proofs of Lemmas 1 and 2

Consider a Markovian equilibrium, (α, β), and the underlying probability space (Ω, F, P).
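Several of the claims below (OA.1, OA.4, OA.5) lean on the belief process p_t being a bounded martingale. A seeded Monte Carlo sketch of generic Gaussian filtering makes this concrete; the parameters are illustrative, signal volatility is normalized to 1, and this is plain Bayesian filtering rather than the equilibrium belief process with manipulation:

```python
import random

def posterior_path(q0, mu, T, dt, rng):
    """Bayesian posterior q_t = P(drift = mu) from observing dX_t = theta*mu*dt + dB_t,
    via the standard filtering equation dq = mu*q*(1-q)*(dX - q*mu*dt)."""
    theta = 1.0 if rng.random() < q0 else 0.0   # nature draws the true drift from the prior
    q = q0
    for _ in range(int(T / dt)):
        dX = theta * mu * dt + rng.gauss(0.0, dt ** 0.5)
        q += mu * q * (1 - q) * (dX - q * mu * dt)
        q = min(max(q, 1e-9), 1 - 1e-9)         # numerical safeguard against boundary overshoot
    return q

rng = random.Random(1)
avg = sum(posterior_path(0.5, 1.0, 1.0, 2e-3, rng) for _ in range(1000)) / 1000
assert abs(avg - 0.5) < 0.05   # martingale property: E[q_T] equals the prior q_0
```

The same boundedness-plus-martingale structure is what rules out absorbing interior beliefs in Claim OA.1 and drives the crossing arguments in Claims OA.4 and OA.5.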
For each p ∈ (0, 1), we define Φ(p) := {ω ∈ Ω : ∃t ≤ T such that p_t(ω) = p}, where T is the random stopping time of the relationship induced by (α, β). The belief span, SP(α), is the set of all p such that P(Φ(p)) > 0. Clearly, SP(α) is a connected set because the sample path of X_t is almost surely continuous. Let p̲ := inf SP(α), and p̄ := sup SP(α). Define the principal's value function W as in the main text on the domain SP(α). The function W is continuous because the agent's equilibrium policy function a ∈ P.

Claim OA.1. SP(α) is an open interval. That is, SP(α) = (p̲, p̄).

Proof. Since SP(α) is a connected set, we only need to show that p̄, p̲ ∉ SP(α). Suppose, toward a contradiction, that p̄ ∈ SP(α). Then, consider a history that leads to the belief p̄ and the continuation play starting from this history. Since the belief process is a martingale, we must have p_t = p̄ for all t ≤ T and almost all sample paths. Agent's optimality then implies a(p̄) = 0, and thus the diffusion coefficient of the belief process at p̄ is strictly positive. This contradicts p_t = p̄ for all t ≤ T and almost all sample paths. The same argument proves that p̲ ∉ SP(α).

Claim OA.2. The principal's equilibrium policy function b has a cutoff structure on SP(α). That is, there exists a unique p∗ ∈ [p̲, p̄] such that p ∈ (p̲, p∗) implies b(p) = 0 and p ∈ (p∗, p̄) implies b(p) = 1.

Proof. Recall that R(p) := p·w_NI + (1 − p)·w_I is the principal's expected payoff if the relationship is terminated when her belief is p. For any p ∈ SP(α), define F(p) := W(p) − R(p). At any time t such that the stopping opportunity arrives, given her belief p_t = p ∈ SP(α), if the principal terminates the relationship, her expected payoff is R(p); if the principal continues the relationship, her continuation value is W(p).
Thus, principal's optimality requires that b(p) = 1 if F(p) < 0 and that b(p) = 0 if F(p) > 0. We first establish two useful properties of F.

Property 1: If F(p̃) > 0 at some p̃ ∈ SP(α), then F(p) > 0 for all p < p̃.

To see this, suppose that F(p̃) > 0 at some p̃ ∈ SP(α). Let (p_a, p_b) be the largest interval containing p̃ such that F(p) > 0 for all p ∈ (p_a, p_b). We want to show that p_a = p̲. Suppose, toward a contradiction, that p_a > p̲. Since W is continuous, we have F(p_a) = 0, i.e., W(p_a) = R(p_a). Moreover, principal's optimality requires that b(p) = 0 for all p ∈ (p_a, p_b). We consider two cases, and will reach a contradiction in each of these cases.

Case 1: p_b < p̄. In this case, continuity of W also implies that F(p_b) = 0, i.e., W(p_b) = R(p_b). Consider now a history that leads to the belief p̃ and the continuation play starting from this history. Let T† be the first time that the posterior belief reaches (p_a, p_b)^c (setting T† = ∞ if this event does not occur in finite time). Let ϕ represent the probability measure (from the principal's perspective) induced by the distribution of p_{T†}. Then, R(p̃)

Property 2: Let p∗ := sup{p ∈ SP(α) : F(p) > 0}. Then, F(p) < 0 for all p > p∗.

By definition of p∗, we know that F(p) ≤ 0 for all p ≥ p∗, i.e., W(p) ≤ R(p) for all p ≥ p∗, so it is weakly optimal for the principal to terminate the relationship whenever p ∈ (p∗, p̄). Suppose, toward a contradiction, that F(p̃) = 0 for some p̃ > p∗. Consider a history that leads to the belief p̃ and the continuation play starting from this history. Let T† be the first time that the stopping opportunity arrives or the posterior reaches p∗ (setting T† = ∞ if this event does not occur in finite time). Let ϕ represent the probability measure (from the principal's perspective) induced by the distribution of p_{T†}.
Then,

R(p̃) = W(p̃) = ∫_{p∗}^{p̄} W(p)·E[e^{−rT†} | p_{T†} = p] ϕ(dp) ≤ ∫_{p∗}^{p̄} R(p)·E[e^{−rT†} | p_{T†} = p] ϕ(dp) < ∫_{p∗}^{p̄} R(p) ϕ(dp) = R(p̃).

The first equality follows from the contradiction assumption that F(p̃) = 0, the first inequality follows from the definition of p∗ such that F(p) ≤ 0 for all p ≥ p∗, the last inequality holds because 0 ≤ W(p∗) ≤ R(p∗) implies that R(p) > 0 for all p > p∗, and the final equality holds because p_t is a bounded martingale and R(·) is an affine function. (By convention, if {p ∈ SP(α) : F(p) > 0} = ∅, we set sup{p ∈ SP(α) : F(p) > 0} = p̲.) But then, we have an obvious contradiction, establishing Property 2.

These two properties of F immediately deliver our result. Specifically, let p∗ := sup{p ∈ SP(α) : F(p) > 0}. Then, Property 1 implies that F(p) > 0 (and thus b(p) = 0) for all p ∈ (p̲, p∗), and Property 2 implies that F(p) < 0 (and thus b(p) = 1) for all p ∈ (p∗, p̄).

We continue with a technical result that will be used later.

Claim OA.3. Fix a positive integer T. For any ε > 0 there exists η > 0 satisfying the following property: Take any pair of adapted processes dY_{1,t} = µ_{1,t} dt + σ dB_t and dY_{2,t} = µ_{2,t} dt + σ dB_t such that µ_{j,t} ∈ [0, 1] for j = 1, 2 and for every t. Let P₁ and P₂ be the probability distributions over (C([0,T]), B(C([0,T]))) generated by these stochastic processes. If A ∈ B(C([0,T])) is such that E_{P₁}[I_A] < η, then E_{P₂}[I_A] < ε.

Proof. Dividing both processes by σ, and subtracting the same drift from both processes if necessary, we may assume that dY_{1,t} = dB_t and dY_{2,t} = µ_{2,t} dt + dB_t with µ_{2,t} ∈ [−σ^{−1}, σ^{−1}]. Since the drift is bounded, we can invoke Girsanov's theorem to obtain E_{P₂}[I_A] = E_{P₁}[I_A M_T], where M_T = exp(∫₀^T µ_{2,t} dB_t − ½∫₀^T µ²_{2,t} dt). Notice that M_T ≤ F_µ := exp(∫₀^T µ_{2,t} dB_t).
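The Girsanov step in Claim OA.3 can be checked numerically for a constant drift (the claim allows any bounded adapted drift): reweighting driftless paths by the density M_T = exp(µB_T − ½µ²T) recovers the drifted measure's probability of a path event. All parameters below are illustrative:

```python
import math, random

rng = random.Random(7)
T, dt, mu = 1.0, 1e-2, 1.0            # horizon, step size, drift (illustrative)
n_paths, thresh = 4000, 1.8

p1_hits, p2_weighted = 0.0, 0.0
for _ in range(n_paths):
    B, hit = 0.0, False
    for _ in range(int(T / dt)):
        B += rng.gauss(0.0, dt ** 0.5)          # driftless path, i.e. a draw from P1
        hit = hit or (B > thresh)               # path event A = {sup_t Y_t > thresh}
    M_T = math.exp(mu * B - 0.5 * mu * mu * T)  # Girsanov density dP2/dP1 on F_T
    p1_hits += hit
    p2_weighted += hit * M_T                    # E_P1[I_A * M_T] = P2(A)
p1, p2 = p1_hits / n_paths, p2_weighted / n_paths
assert p1 < p2 < 1.0   # the drifted measure makes the upper crossing likelier, yet P2(A) stays controlled
```

Because M_T has bounded moments when the drift is bounded, a set that is rare under P₁ cannot be given large mass by the reweighting, which is exactly the uniform-integrability truncation that completes the proof below.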
Since this class of processes is uniformly integrable, we can take n∗ ∈ ℕ such that E_{P₁}[F_µ I{F_µ > n∗}] < ε/2 (holding for every process in this class) and consequently E_{P₁}[M_T I{M_T > n∗}] ≤ E_{P₁}[F_µ I{F_µ > n∗}] < ε/2. Therefore, taking η = ε/(2n∗), we obtain that

E_{P₂}[I_A] = E_{P₁}[I_A M_T I{M_T ≤ n∗}] + E_{P₁}[I_A M_T I{M_T > n∗}] ≤ E_{P₁}[I_A M_T I{M_T ≤ n∗}] + ε/2 ≤ n∗η + ε/2 = ε.

(Here B(C([0,T])) denotes the Borel sigma-field, and C([0,T]) the set of continuous functions over [0,T].)

Claim OA.4. p̄ = 1.

Proof. Assume toward a contradiction that p̄ < 1.

Case 1: p̄ > p∗. The belief process p_t is a martingale, so for every ε > 0 there exists an ϵ > 0 such that if p_t > p̄ − ϵ, then P(inf_{s>t} p_s > p∗ + ε | θ = NI) > 1 − ε. This implies that P(T = T_λ | θ = NI) > 1 − ε, where T_λ is the arrival of the next Poisson shock. Notice that for every η > 0 we can take ε_η > 0 such that the agent's payoff at p_t is no more than (u + c)(r/(r+λ)) + η. This implies that for every ν > 0 we can take η small enough (taking ε_η to satisfy the condition above) so that E(∫_t^{min{t+1,T}} a(p_s) ds | θ = NI) < ν. Hence, there exists ϖ > 0 such that E(∫_t^{min{t+1,T}} (1 − a(p_s)) ds | θ = NI) > ϖ.

Consider the law of motion (7) when p_t ∈ [p∗, p̄]. Observe that the instantaneous variance of the belief process when the (noninvestible type) agent plays a(·) is bounded below by a positive constant times (1 − a_t) min{p∗(1 − p∗), p̄(1 − p̄)} > 0. Because p̄ < 1 and because p_t − p₀ = ∫₀^t dp_s, we obtain that E[|p_{min{t+1,T}} − p_t|] ≥ ϱ for some positive constant ϱ.
Because (p_{min{t+1,T}} − p_t) has mean zero, we obtain that E[(p_{min{t+1,T}} − p_t)₊] ≥ ϱ/2. Taking ε < ϱ/2, we conclude that P(p_{min{t+1,T}} > p̄ + ε) > 0, which is a contradiction.

Case 2: p̄ ≤ p∗. Then, b(p) = 0 for all p ∈ SP(α). Claim OA.3 implies that, for every T > 0, if the noninvestible agent plays a_t = 0 for every t ∈ [0,T], then the relationship terminates before T with probability zero, which implies that the agent's best response must satisfy a_t = 0 for every t > 0. This contradicts the assumption that p̄ is never reached.

Claim OA.5. p̲ = 0.

Proof. Assume toward a contradiction that p̲ > 0.

Case 1: p̲ < p∗.
Step 1: For every η ∈ (0, p∗ − p̲) there exists ϵ > 0 such that if p_t < p̲ + ϵ, then P(sup_{s>t} p_s ≥ p∗ | θ = NI) < η, and consequently P(sup_{s>t} p_s ≥ p∗) < η. This follows from the martingale property of the belief process.

Step 2: For every ε > 0 and T ∈ ℕ there exists ϵ > 0 such that if p_t < p̲ + ϵ, then P_σ̃({T

Step 3: There exists T∗ ∈ ℕ and ε > 0 such that if P_σ̃({T

Step 4: There exists ϵ∗ > 0 and ε∗ > 0 such that if: 1. p_t < p̲ + ϵ∗; 2. P(sup_{s>t} p_s ≥ p∗) < ε∗; 3. E(∫_t^{min{t+T∗,T}} (1 − a(p_s)) ds | θ = NI) ≥ ϖ; then P(inf_{s>t} p_s < p̲) > 0. The argument is analogous to that used in Case 1 of Claim OA.4's proof, and is thus omitted.

Step 5: Step 3 guarantees that we can find ε∗ > 0 and T∗ such that if P_σ̃({T

which contradicts the definition of p̲.

Case 2: p∗ ≤ p̲. This case is analogous to Case 2 of Claim OA.4's proof, and is thus omitted.

Claim OA.6. In any Markovian equilibrium policy profile (a, b), lim_{p→1} a(p) < 1.

Proof. Assume toward a contradiction that lim_{p→1} a(p) = 1. Fix an ε > 0. Take T_ε ∈ ℕ such that e^{−rT_ε} < ε and P(T_λ > T_ε) < ε, where T_λ is the random time that the next stopping opportunity arrives. Let Z(p) := ln(p/(1−p)), for p ∈ (0, 1). Under the contradiction assumption, take Ẑ > Z(p∗) such that z > Ẑ implies a(z) > 1 − ε. Recall from the law of motion (9) that Z_t has bounded drift. So there exists Z_ε > Ẑ such that Z₀ ≥ Z_ε implies P(inf_t

Proof of Lemma 1. That SP(α) = (0, 1) follows directly from Claims OA.4 and OA.5. Consequently, a(p) < 1 for all p ∈ (0, 1), for otherwise there would be an absorbing state, contradicting SP(α) = (0, 1). By definition of a Markovian equilibrium, a(·) is piecewise Lipschitz, so a(·) has only finitely many discontinuities and its one-sided limits always exist. Hence, sup_{p∈(0,1)} a(p) < 1 if and only if lim_{p→0} a(p) < 1 and lim_{p→1} a(p) < 1, so by Claims OA.6 and OA.7 we are done.

Proof of Lemma 2. Take any Markov equilibrium (a, b).
By Claim OA.2 and Lemma 1, b has a cutoff structure on (0, 1). So we only need to argue that the cutoff belief p∗ satisfies 0 < p∗ < 1. First, since R(p) < 0 for all p ∈ (0, p∗∗), principal's optimality requires that b(p) = 0 for all such p, and so p∗ ≥ p∗∗ > 0. Moreover, since W is bounded above by λ/(r+λ)·w_NI, we have R(p) > W(p) for all p ∈ (p_H, 1), so principal's optimality requires that b(p) = 1 for all p ∈ (p_H, 1), and thus p∗ ≤ p_H < 1. (Recall that W(p) is the principal's value at p conditional on the stopping opportunity not arriving.)

B.1.2 Properties of a∗+(x) and a∗−(x)

Proof of Claim 9. Recall the definition of a∗−(·) and a∗+(·) in (38) and (39). In this proof, we focus on the properties of a∗+(·), which is the more difficult case. The properties of a∗−(·) can be established analogously.

For ease of notation, define q := (x − (r/(r+λ))u)/√κ_R and q_R := (v_R − (r/(r+λ))u)/√κ_R. Consequently, we can rewrite a∗+(·) as

a∗+(x(q)) = 1 − [√(2(r+λ))/µ]·φ(q) / { [√(2(r+λ))/µ]·φ(q_R) + Φ(q_R) − Φ(q) }.

First, notice that

q_R > µ/√(2(r+λ)),   (49)

which follows from the definition of v_R in (29), the definition of κ_R, and the fact that ξ_R, the positive root of the relevant quadratic, satisfies ξ_R < 1.

Next, notice that given (49), we have

[ (√(2(r+λ))/µ)·φ(q) + Φ(q) ]′ = φ(q)·[ 1 − (√(2(r+λ))/µ)·q ] < 0,  ∀q ∈ [q_R, ∞).   (50)

Now, let us take the derivative of 1 − a∗+(x(q)) with respect to q.
Using the fact that φ′(q) = −qφ(q), we have

[ (√(2(r+λ))/µ)·φ(q) / ( (√(2(r+λ))/µ)·φ(q_R) + Φ(q_R) − Φ(q) ) ]′ = (√(2(r+λ))/µ)·φ(q)·{ φ(q) − q[ (√(2(r+λ))/µ)·φ(q_R) + Φ(q_R) − Φ(q) ] } / [ (√(2(r+λ))/µ)·φ(q_R) + Φ(q_R) − Φ(q) ]².

So the sign of a∗+′(x(q)) is the same as that of

S(q) := q[ (√(2(r+λ))/µ)·φ(q_R) + Φ(q_R) − Φ(q) ] − φ(q).

Note that, given condition (49), we have S(q_R) = [ (√(2(r+λ))/µ)·q_R − 1 ]·φ(q_R) > 0. Moreover, given condition (50), we have S′(q) = (√(2(r+λ))/µ)·φ(q_R) + Φ(q_R) − Φ(q) > 0. Therefore, S(q) > 0 for all q ≥ q_R, implying a∗+(·) is strictly increasing on [v_R, v_L]. Finally, inspecting (39) we have a∗+(v_R) = 0; applying condition (50) we have a∗+(v_L) ∈ (0, 1).
Lemma OA.2 (Conditional Translation Invariance). Fix an arbitrary strategy profile (α, β) and some ϵ ∈ ℝ. Consider a new profile (α′, β′), defined by α′_t := α_t|_{{Z_s − ϵ}_{s≤t}} and β′_t := β_t|_{{Z_s − ϵ}_{s≤t}} for all t ≥ 0. Then, the conditional payoffs of the principal satisfy

E{U(t, α, β) | θ = NI, Z_t = z} = E{U(t, α′, β′) | θ = NI, Z_t = z + ϵ},
E{U(t, α, β) | θ = I, Z_t = z} = E{U(t, α′, β′) | θ = I, Z_t = z + ϵ},

almost surely, for all t ≥ 0 and z ∈ ℝ.

Proof. In the case of conditioning on θ = NI, the law of motion of {Z_t}_{t≥0} is as in the proof of Lemma OA.1. In the case of conditioning on θ = I, the dynamics of {Z_t}_{t≥0} satisfy dZ_t = −µ(1 − α_t) dt + µ(1 − α_t) dB_t. In both cases the dynamics are linear given α, so if the agent perturbs his strategy using a constant displacement in z-space, the principal can maintain her payoff distribution intact by imitating the perturbation.

B.1.4 Monotonicity and Curvature of Value Functions

Proof of Corollary 3.