Signaling with Private Monitoring∗

Gonzalo Cisternas and Aaron Kolb

May 8, 2020
Abstract
We study dynamic signaling when the informed party does not observe the signals generated by her actions. A long-run player signals her type continuously over time to a myopic second player who privately monitors her behavior; in turn, the myopic player transmits his private inferences back through an imperfect public signal of his actions. Preferences are linear-quadratic and the information structure is Gaussian. We construct linear Markov equilibria using belief states up to the long-run player's second-order belief. Because of the private monitoring, this state is an explicit function of the long-run player's past play. A novel separation effect then emerges through this second-order belief channel, altering the traditional signaling that arises when beliefs are public. Applications to models of leadership, reputation, and trading are examined.
The general interest in signaling—i.e., information transmission through costly actions—is reflected in its influence in virtually all subfields across economics. Despite this breadth, the great majority of signaling games share a key commonality: the "sender" knows the belief of the "receiver" about the sender's type at the moment of action. While this public nature of a receiver's belief can be a sensible approximation in some settings, it is far less appropriate in others, such as when imperfect private signals of behavior are at play: employers subjectively assessing their workers' performances (Levin, 2003); traders handling others' orders (Yang and Zhu, 2019); or data brokers collecting data about consumers (Bonatti and Cisternas, 2019). There, the beliefs of employers, financial intermediaries, or data brokers over variables such as a worker's ability, an asset's value, or a consumer's preferences, are private.

∗Cisternas: MIT Sloan School of Management, 100 Main St., Cambridge, MA 02142, [email protected]. Kolb: Indiana University Kelley School of Business, 1309 E. Tenth St., Bloomington, IN 47405, [email protected]. We thank Alessandro Bonatti, Isa Chavez, Wouter Dessein, Robert Gibbons, Marina Halac, Stephen Morris, Alessandro Pavan, Andy Skrzypacz, Bruno Strulovici, and Vish Viswanathan for useful conversations.

Allowing for private monitoring of an informed player's actions is an important agenda, as it can open the way for a new set of applied-theory questions to be analyzed. How do leaders gradually influence their followers when they do not know how their actions have been interpreted? Can career-concerned agents benefit by not being able to observe the signals generated by their actions when attempting to manage their reputations? How is trading behavior affected by the possibility of hidden leakages to other traders? While clearly realistic and relevant, these questions nonetheless present substantial challenges.
First, higher-order beliefs can arise: in most settings, the senders involved will have to form a nontrivial belief about their receivers' beliefs. Second, such settings can be inherently asymmetric: when facing a sender of a fixed type, the receiver develops evolving private information in the form of a belief. Third, most analyses will be nonstationary due to ongoing learning effects.

In this paper, we introduce a class of linear-quadratic-Gaussian games of incomplete information and private monitoring in which these questions and challenges can be addressed. A long-run player (she) and a myopic counterpart (he), both with linear-quadratic preferences, interact over a finite horizon. The long-run player has a normally distributed type. Our key innovation is to allow the myopic player to privately observe a noisy signal of the long-run player's action; in turn, we let the long-run player receive feedback about the myopic player's inferences via an imperfect public signal of the latter's behavior. The shocks in both signals are additive and Brownian. Using continuous-time methods, we construct linear Markov equilibria (LMEs) in which the players' beliefs are the relevant states.

Equilibrium construction and signaling.
It is well known that the construction of nontrivial equilibria in games of private monitoring can be a daunting task. In fact, to estimate rivals' continuation behavior under any strategy, players usually have to make an inference about their opponents' private histories. Not knowing what their rivals have seen, the players will then rely on their past play, but this implies that the players' inferences will vary with their own private histories. Thus, (i) probability distributions over histories must be computed, and (ii) the continuation games at off- versus on-path histories may differ. With incomplete information, one expects this statistical inference problem to become one of the estimation of belief states that summarize the payoff-relevant aspects of the players' private histories—our approach offers a parsimonious treatment of this issue. The quadratic preferences permit our players to employ strategies that are linear in their posterior beliefs' means (henceforth, beliefs). Conjecturing such linear strategies, learning is (conditionally) Gaussian: the myopic player's belief is linear in the history of his private signals, and the second-order belief—her belief about the myopic player's private belief—is linear in the histories of the public signal and her past play. The estimation of histories described in (i) is thus simplified by the fact that these are aggregated linearly.

Footnote: The myopic "receiver" assumption is convenient for focusing exclusively on how the long-run player's signaling motives respond to the introduction of higher-order uncertainty, but our construction, methods and main findings remain valid beyond this case. We discuss this and other assumptions in the conclusion.

Critically, the long-run player's second-order belief is also private, as her actions depend on her type; the myopic player must therefore forecast this state.
The problem of the state space expanding is then circumvented by a key representation of the (candidate, on-path) second-order belief in terms of the long-run player's type and the belief about it based exclusively on the public signal (Lemma 1). Thus, performing equilibrium analysis requires a nontrivial second-order belief that is spanned by the rest of the states along the path of play, in a reflection of how the game's structure changes after deviations, as noted in (ii). With Markov states as sufficient statistics, we can write the long-run player's best-response problem as one of stochastic control and use dynamic programming to find LMEs.

The long-run player controls her own second-order belief, in a generalization of the traditional control of a public belief under imperfect public monitoring. But since this state is now an explicit function of past play, private monitoring has novel implications for signaling—our representation result is again key. Specifically, because different types behave differently in equilibrium, their different past behavior leads them to expect their "receivers" to hold different beliefs. In other words, the perception of different continuation games—as measured by the value of the second-order belief in the representation—opens an additional channel for separation. We refer to this as the history-inference effect on signaling. The potential amplitude of this effect is largest when the public signal is pure noise ("no feedback"), and thus the reliance on past play is strongest; conversely, it disappears when beliefs are public.

From a positive standpoint, the relevance of this effect depends on the plausibility of individuals relying on their past behavior to forecast what others currently know.
Crucially, this notion strongly resonates with reality, such as when leaders reflect on their past behavior when assessing organizations' understanding of the leadership's long-term goals, when politicians gauge their reputations, or when traders estimate how much of their private information has been learned by others. We are not aware of an existing framework where the signaling implications of this natural use of past behavior can be studied. Our approach, which ultimately exploits the use of ordinary differential equations (ODEs), offers a venue.
Applications.
To leverage the flexibility of the model, we examine one instance of our baseline specification and two based on extensions of it. Our aim is to show how the precisions of the signals involved shape outcomes via the extent of higher-order uncertainty created.

In our leading application (Section 2), we examine the history-inference effect in a coordination [...] underperform their less informed counterparts.

Uncertainty about others' beliefs is also natural in reputational settings. In Section 5.1, we examine a model of horizontal reputation based on an extension that allows for terminal payoffs: the long-run player suffers a terminal quadratic loss that increases in the distance between the myopic player's belief and the type's prior (e.g., a politician facing reelection who desires a reputation for neutrality). In such a context, we show that not directly observing her reputation can benefit the long-run player, despite the negative direct effect of the increased uncertainty over her concave objective. Indeed, since higher types take higher actions due to their higher biases, those types must offset higher beliefs to appear unbiased; the history-inference effect then reduces the informativeness of the long-run player's action, making beliefs less sensitive to new information, a strategic effect that can dominate.

Finally, in Section 5.2, we exploit the presence of the public belief state in a linear trading model in which an informed trader faces both a myopic trader who privately monitors her and a competitive market maker who only observes the public total order flow. In this context, we show that there is no linear Markov equilibrium for any degree of noise of the private signal.

Footnote: The presence of this public belief creates signal-jamming motives for the long-run player.
Intuitively, the myopic player introduces momentum into the price, as the information he obtains is now distributed to the market maker through all future order flows. This causes prices to move against the insider and creates urgency—with an infinite number of opportunities to trade, the insider trades away all information in the first instant.
Existence of LME and technical contribution.
The bulk of our analysis unfolds in Sections 3 and 4, where we introduce the general model and lay out the methodological framework. A distinctive feature there is that the environment is asymmetric, both in terms of the players' preferences and their private information (a fixed state versus a changing one). In particular, the players can signal at substantially different rates, which is in stark contrast to the existing literature on symmetric multisided learning. With different rates of learning, however, the equilibrium analysis can become severely complicated.

Specifically, our belief states depend on both the myopic player's posterior variance, which determines the sensitivity of the myopic player's belief, and the weight attached to the long-run player's type in the representation result, which shapes the history-inference effect and is linked to the long-run player's learning. Both functions are deterministic due to the Gaussian structure. Using dynamic programming, one can then show that the problem of the existence of an LME reduces to a boundary value problem (BVP) including ODEs for the two aforementioned functions of time and for the weights in the long-run player's linear Markov strategy. The two learning
ODEs endow the BVP with exogenous initial conditions, while the rest carry terminal conditions arising from myopic play at the end of the game. With multiple ODEs in both directions, establishing the existence of a solution to such a BVP is a challenging "shooting" problem: not only must solutions to all individual ODEs exist, but they must land at specific (potentially endogenous) values. To address this complexity, we distinguish between two types of environments. In a private-value setting, the myopic player's best response is only a function of his expectation of the long-run player's action, i.e., it does not depend directly on his expectation of the type. In this context, there is enough (strategic) symmetry that a one-to-one mapping emerges between the solutions to the learning ODEs, which renders the shooting problem unidimensional (Lemma 4). Via traditional continuity arguments, we can guarantee the existence of an LME in the leadership model of Section 2 when the public signal is of intermediate quality, for a horizon length that is decreasing in the prior variance of the state of the world (Theorem 1).

In common-value settings, the multidimensionality issue must be confronted. Building on the literature on BVPs with intertemporal linear constraints (Keller, 1968), we can establish the existence of LME for our BVP that carries intratemporal nonlinear (terminal) constraints. Specifically, the multidimensional shooting problem can be formulated as one of finding a fixed point for a suitable function derived from the BVP, a problem that we tackle for a variation of the leadership model in which the follower directly cares about the state of the world (Theorem 2). Critically, this approach is general: we show how to apply it to the whole class of games under study, and more generally, it offers a promising venue for examining behavior in other settings exhibiting incomplete information and asymmetries.
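The backward–forward structure just described can be illustrated with a minimal shooting sketch. This is not the paper's actual BVP: the forward equation below is the learning ODE from Section 2, but the backward coupling β′ = −γβ, the terminal condition's placement, and all parameter values are hypothetical, chosen only to exhibit one state with an exogenous initial condition and one with a terminal condition.

```python
# Toy illustration of the shooting logic behind a BVP, NOT the paper's system:
# gamma carries the exogenous initial condition gamma(0) = gamma0 (a learning
# ODE), while beta must satisfy the terminal condition beta(T) = 1/2 (myopic
# terminal play). The coupling beta' = -gamma*beta is hypothetical; it makes
# beta(T) strictly increasing in the guess beta(0), so bisection on the
# "terminal miss" is valid.

def integrate(beta0, gamma0=1.0, sigma=1.0, T=2.0, n=2000):
    """RK4 on gamma' = -(gamma*beta/sigma)**2, beta' = -gamma*beta (forward)."""
    h = T / n
    g, b = gamma0, beta0

    def f(g, b):
        return -(g * b / sigma) ** 2, -g * b

    for _ in range(n):
        k1 = f(g, b)
        k2 = f(g + h / 2 * k1[0], b + h / 2 * k1[1])
        k3 = f(g + h / 2 * k2[0], b + h / 2 * k2[1])
        k4 = f(g + h * k3[0], b + h * k3[1])
        g += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        b += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return g, b

def shoot(target=0.5, lo=0.5, hi=5.0, iters=80):
    """Bisect on beta(0) until the forward solution hits beta(T) = target."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if integrate(mid)[1] < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

beta0_star = shoot()
print(abs(integrate(beta0_star)[1] - 0.5) < 1e-6)  # terminal condition met
```

The multidimensional version in the text replaces the scalar bisection with a fixed-point argument, since several terminal conditions must be hit simultaneously.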
Related Literature.
A long literature on multisided private monitoring has developed in repeated games with complete information, where the issue of inferences of private histories has been handled very differently relative to us. Closest in spirit is Phelan and Skrzypacz (2012), where such inferences are coarsened into beliefs over a finite set of states; instead, our players' states take infinitely many values and completely determine their beliefs about the other player's state. Other approaches include Mailath and Morris (2002), examining equilibria that condition on finite histories when monitoring is nearly public, and Ely and Välimäki (2002), where mixed-strategy equilibria render such inferences irrelevant. Relative to this literature, we focus on one-sided private monitoring but add private information at the outset to construct and quantify natural, yet nontrivial, belief-dependent Markov equilibria.

Regarding signaling models, in traditional static (i.e., sequential-move, one-shot) noisy signaling games (e.g., Matthews and Mirman, 1983; Carlsson and Dasgupta, 1997), the signal realization is trivially hidden from the sender at the moment of action, but the common prior makes the receiver's belief known at the same time. In dynamic environments, the receiver's belief is public in settings with observable actions and an exogenous, public stochastic process (e.g., Daley and Green, 2012; Gryglewicz and Kolb, 2019; Kolb, 2019) or when there is imperfect public monitoring, such as in Heinsalu (2018) and Dilmé (2019). By contrast, our assumptions on payoffs and signal structure make all players' beliefs private.

Private beliefs arise in Foster and Viswanathan (1996) and Bonatti et al. (2017), where all the players have fixed private information and there is an imperfect public signal; a representation result for first-order beliefs eliminates the need for higher-order beliefs.
Bonatti and Cisternas (2019) in turn examine two-sided signaling when firms privately observe a summary of a consumer's past behavior to price discriminate; however, via the prices they set, firms perfectly reveal their information to the consumer. Finally, private beliefs can also result from an exogenous private signal of the sender's type, as in Feltovich et al. (2002).

Turning to our applications, adaptation-coordination tradeoffs are a key element in recent analyses of organizations: in static, linear-quadratic settings, see Dessein and Santos (2006) and Rantakari (2008) for questions of specialization and governance, respectively; our focus is instead on the dynamics of information transmission with private signals of behavior. Bouvard and Lévy (2019) examine a model of horizontal reputation with quadratic payoffs and symmetric uncertainty; beliefs are public in the linear Markov equilibrium constructed. Lastly, Yang and Zhu (2019) study a two-period model in which a trader faces, in the second period, a "backrunner" who has observed a private signal of the former's first-period trade; there, the feedback element is absent, and so is the need for a belief representation like ours.

To conclude, this paper contributes to a growing literature using continuous-time methods to analyze dynamic incentives. Sannikov (2007) examines two-player games of imperfect public monitoring; Faingold and Sannikov (2011) reputation effects with behavioral types; Cisternas (2018) games of ex ante symmetric incomplete information; and Bergemann and Strack (2015) revenue maximization with privately informed buyers. Our representation result and derivation of belief states, the distinction between private and common-value settings, and the question of existence of equilibria make these methods virtually necessary.
Application: Coordinated Adaptation
A team consisting of a leader (she) and a follower (he) operates over a finite horizon [0, T]. The environment is parametrized by a state of the world θ that is normally distributed with mean µ ∈ R and variance γ^o > 0. Letting a_t ∈ R and â_t ∈ R denote the leader's action and follower's action at time t ∈ [0, T], respectively, the team's performance is given by

∫₀ᵀ e^(−rt) {−(a_t − θ)² − (â_t − a_t)²} dt,   (1)

where r ≥ 0. Being myopic, the follower seeks to minimize the expectation of (â_t − a_t)² at all times. To solve this prediction problem, this player relies solely on a private signal of the leader's action that is distorted by Brownian noise:

dY_t = a_t dt + σ_Y dZ^Y_t.

Intuitively, as the leader signals—e.g., as she takes actions intended to drive the organization in her desired direction—observing Y allows the follower to gradually adjust towards taking the "right" action (in this case, θ). But since Y is private to the follower, the leader loses track of the follower's belief in the process. Attempting to adapt the organization to new economic conditions then creates two-sided uncertainty: the organization's members do not know the long-term goals behind the leadership's actions, and the leadership does not know the organization's understanding of what should be done at all instants of time.

We are motivated by two elements that are critical to the performance of organizations.

1. Efficient adaptation.
Adjusting to the external economic environment is a key problem for organizations, requiring substantial coordination of multiple functions (Williamson, 1996; Milgrom and Roberts, 1992), which implies that (i) misaligned incentives and (ii) information frictions are key threats. The adaptation and coordination concerns are captured by −(a_t − θ)² and −(â_t − a_t)², respectively; in turn, the follower's preferences partially align the players' objectives to concentrate on information frictions.

2. Bounded rationality.
The barriers that people face in solving problems and processing information are at the core of every organization (Simon, 1957). As Williamson (1996) further remarks, "failures of coordination can arise because autonomous parties read [...]"

Footnote: Our general analysis allows for misalignments in the players' flow payoffs (see Section 4.3).

Y is noisy, and its noise is idiosyncratic to the follower: examples include Y being linked to a cognitive process of the follower, or to a chain of imperfect transmissions that hides the final realization from the leader. The leader's knowledge of the state of the world is therefore understood as expertise relevant to the current economic conditions; the transmission of this knowledge is then linked to behavior, but the transfer is slow and imperfect. In this regard, our choice to shut down communication is essentially a dimensionality constraint intended to reflect situations in which the knowledge involved is substantially richer than the code available.

We now study two information structures for the leader: in the perfect-feedback case, the leader observes the follower's action; in the no-feedback case, she observes nothing. That is, we keep the difficulty in transferring knowledge as given (the signal Y is fixed), and we vary the quality of the information fed to the leader (which is the more likely choice variable). These are limit instances of a model we explore in Sections 3 and 4 under general preferences.

Perfect feedback ("public") case.
If the leader perfectly observes the follower's action, she can potentially infer the follower's belief, in which case the latter belief becomes public. In a linear Markov equilibrium (LME), the leader chooses actions that are linear both in her type θ and in the follower's (commonly known) belief M̂_t := Ê_t[θ], where Ê_t[·] denotes the follower's expectation operator; in turn, the follower's action is his best prediction of the leader's action, and hence it is linear in M̂_t exclusively. For consistency throughout the paper, we write β_t for the weight on the type in the leader's strategy at t ∈ [0, T].

Proposition 1 (LME—Public Case). For all r ≥ 0 and T > 0:
(i) Existence of LME: There exists a unique LME. In this equilibrium, a_t = β_t θ + (1 − β_t) M̂_t and â_t = Ê_t[a_t] = M̂_t, where (β_t)_{t∈[0,T]} is deterministic.
(ii) Signaling and learning: β_t ∈ (1/2, 1) for t < T, β_T = 1/2, and β is strictly decreasing. Also, γ_t := Ê_t[(θ − M̂_t)²] evolves according to γ̇_t = −(γ_t β_t / σ_Y)².

Footnote: These features resonate with the notion of tacit knowledge—"know-how" that is difficult to codify and transfer. Recognized as a key input to production, Garicano (2000) examines its implications on hierarchies, while Grant (1996) argues that this knowledge is "only being observed through its application" and Nonaka (1991) that it is "rooted in action and in an individual's commitment to a specific context."

Footnote: This notion of LME is perfect when Y is public, but only Nash when Y is private but the follower's action is observed—our choice of exposition (observed actions as opposed to a public Y) stems from the form of our general model. We keep the abbreviation later on despite its subsequent "perfection" property.

In the LME, the leader reduces her degree of adaptation below the full-information solution a ≡ θ to coordinate with the follower—we refer to the weight β on the type as the
This coefficient shapes the follower’s learning captured by the posteriorvariance γ t , and it remains above 1/2—the value in the static equilibrium ( θ + ˆ M , ˆ M )—except at the end of the game. Indeed, by more aggressively signaling her know-how, theleader can steer the follower’s behavior toward the first-best action faster, effectively invest-ing in the follower’s adaptation. This incentive falls deterministically (i.e., β t is decreasing)because of both horizon and learning effects—there is less time remaining to enjoy thosebenefits, and steering behavior becomes more difficult as information accumulates. No-feedback case.
Suppose now that the leader ceases to receive any information about the follower: how does she forecast M̂, and how are signaling and learning affected?

To gain intuition, let us first elaborate on the form of the follower's belief when it is public. Upon conjecturing a linear Markov strategy by the leader, the follower's learning has a Gaussian structure, and so M̂ is a linear function of the history Y^t := (Y_s : 0 ≤ s < t): namely, there are deterministic A_0(·) and A_1(·,·) such that

M̂_t = A_0(t) + ∫₀ᵗ A_1(t, s) dY_s,  t ∈ [0, T].  (2)

The leader's forecast of M̂ is trivially given by the same formula, as observing â (or Y) reveals the follower's belief. That is, the leader forecasts by output: Y, which reflects the consequences of her actions from the follower's perspective, fully determines her inferences.

In the absence of feedback, the signal Y is not available, making the leader's forecasting problem nontrivial. However, to the extent that M̂ is as above (for potentially different A_0 and A_1), the leader can take an expectation in (2) to obtain her second-order belief

M_t = A_0(t) + ∫₀ᵗ A_1(t, s) a_s ds,  t ∈ [0, T].  (3)

Crucially, this belief now is a function of the leader's past actions, so the leader forecasts by input: absent any information, the leader must reflect on her past behavior to assess how much knowledge has been transferred. As natural as it seems, however, observe that past play was completely irrelevant with perfect feedback: higher (lower) past actions only indicated that more negative (positive) shocks thwarted the leader's efforts, which is immaterial for future decision-making.
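In discrete time, the passage from (2) to (3) is just linearity of the expectation: averaging the follower's belief over the noise in the increments dY replaces each increment by its drift a_s ds. A minimal sketch, with a hypothetical kernel (A_0, A_1) and an arbitrary action path (neither is the paper's equilibrium object):

```python
import math
import random

# Discretized illustration of equations (2)-(3): the follower's belief hat_M is
# a linear functional of the signal path Y; the leader's second-order belief M
# replaces the unobserved increments dY_i = a_i*dt + sigma_Y*dW_i by a_i*dt.
# The kernel (A0, A1) and the action path are hypothetical.

def follower_belief(actions, noises, dt, sigma_Y, A0, A1):
    """hat_M_T = A0 + sum_i A1[i] * dY_i, with dY_i = a_i*dt + sigma_Y*dW_i."""
    return A0 + sum(A1[i] * (actions[i] * dt + sigma_Y * noises[i])
                    for i in range(len(actions)))

def leader_forecast(actions, dt, A0, A1):
    """M_T = A0 + sum_i A1[i] * a_i * dt: expectation of hat_M_T over the noise."""
    return A0 + sum(A1[i] * actions[i] * dt for i in range(len(actions)))

n, dt, sigma_Y = 100, 0.01, 1.0
A0 = 0.0
A1 = [0.5 * math.exp(-0.01 * i) for i in range(n)]            # hypothetical kernel
actions = [1.0 + 0.3 * math.sin(0.1 * i) for i in range(n)]   # arbitrary action path

# Forecasting by input: averaging hat_M over many noise draws recovers M.
random.seed(0)
draws = []
for _ in range(5000):
    dW = [random.gauss(0.0, math.sqrt(dt)) for _ in range(n)]
    draws.append(follower_belief(actions, dW, dt, sigma_Y, A0, A1))
mc_mean = sum(draws) / len(draws)
M = leader_forecast(actions, dt, A0, A1)
print(abs(mc_mean - M) < 0.05)  # Monte Carlo average of hat_M is close to M
```

Because everything is linear, the leader's forecast needs no simulation in the model itself; the Monte Carlo step here only makes the expectation in (2)-to-(3) concrete.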
The importance of forecasting by input versus output depends on the setting, but many situations will entail both; our general model captures this feature.

It is clear from (2) and (3) that a common element in these two extreme information structures studied is that, in both, future beliefs respond to different continuation strategies.

Footnote: In fact, dM̂_t/da_t = γ_t β_t/σ_Y², so the sensitivity of the follower's belief falls with lower values of γ.

In a forward-looking exercise, the leader determines her best response to the follower's strategy. However, the follower's behavior will depend on his assessment of the informational content behind the leader's actions, and this is a backward-looking exercise: how do different types behave at their respective histories? Whether beliefs are a function of commonly observed versus private information then introduces important differences.

Abusing notation, let us now consider a linear strategy for the leader of the form

a_t = β_{0t} µ + β_{1t} M_t + β_{2t} θ,  (4)

with the coefficients again being deterministic. It is evident that M is generically private to the leader under (4), as her actions carry her type—how will the follower coordinate then? Inspection of (3) and (4), however, suggests a linear relationship between M and θ. Suppose then that the follower conjectures that, on path, (M_t)_{t∈[0,T]} satisfies the representation

M_t = (1 − γ_t/γ^o) θ + (γ_t/γ^o) µ,  (5)

where (γ_t)_{t∈[0,T]} again denotes the follower's posterior variance but now under (4)–(5). As a proof of concept, note that setting γ = γ^o in (5) leads to M = µ = M̂, consistent with the common prior at time zero; conversely, if enough signaling has occurred, the leader thinks that the follower must have learned the state: γ_t ≈ 0 and M_t ≈ θ in the same formula.

The representation (5) is key. First, it encodes how private monitoring alters the extent of information transmission.
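As a quick numerical sanity check of (5): substituting it into a linear strategy of the form (4) and collecting terms shows that the total weight on θ equals the strategy's own weight on θ plus the weight on M scaled by 1 − γ/γ^o. The coefficient names b0, b1, b2 and all numbers below are arbitrary stand-ins, not equilibrium values:

```python
# Sanity check of the representation (5): substituting
# M = (1 - gamma/gamma0)*theta + (gamma/gamma0)*mu into the linear strategy
# a = b0*mu + b1*M + b2*theta collects a total weight on theta equal to
# b2 + b1*chi, where chi = 1 - gamma/gamma0. All numbers are arbitrary.

theta, mu = 2.0, 0.5          # type and prior mean
gamma0, gamma = 1.0, 0.4      # prior and current posterior variance
b0, b1, b2 = 0.1, 0.3, 0.6    # stand-in strategy weights

chi = 1.0 - gamma / gamma0
M = chi * theta + (1.0 - chi) * mu           # representation (5)
a_direct = b0 * mu + b1 * M + b2 * theta     # strategy (4)

# Same action, rewritten with the total weight on theta made explicit:
total_theta_weight = b2 + b1 * chi
a_collected = total_theta_weight * theta + (b0 + b1 * (1.0 - chi)) * mu
print(abs(a_direct - a_collected) < 1e-12)  # True: identical action
```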
In fact, the new signaling coefficient—denoted α—is obtained as the total weight on θ when inserting (5) into the leader's strategy (4), which yields

α := β_2 + β_1 χ, where χ := 1 − γ/γ^o.

We refer to the correction term β_1 χ stemming from (5) as the history-inference effect on signaling. Indeed, since the leader forecasts by input, the follower needs to infer the leader's private histories to extract the correct informational content from Y. From his perspective, how differently would a leader of a marginally higher type behave given a history Y^t? With perfect feedback, the overall effect is β_2, as all types agree on the value that M̂ takes (i.e., they pool along the belief dimension); this is not the case when there is no feedback, as their differing past actions also lead them to perceive different continuation games via M.

Second, the representation prevents the state space from growing: via (5), the follower's belief about M (i.e., a third-order belief) is a function of M̂, and so E_t[Ê_t[M_t]] is a function of M_t, and so forth. The linear-quadratic-Gaussian structure then ensures that (θ, M, µ, t), with M as in (3), summarizes all that is payoff-relevant for the leader after all private histories.

Proposition 2 (LME—No-Feedback Case). For all r ≥ 0 and T > 0:
(i) Existence: There exists an LME. In any such equilibrium, β_0 + β_1 + β_2 = 1; β_{2t} > 1/2, t ∈ [0, T); β_{2T} = 1/2; and β_1 > 0 over [0, T].
(ii) Signaling and learning: α := β_2 + β_1 χ, where χ = 1 − γ_t/γ^o, satisfies α > 1/2; α_T → 1 as T → ∞; and α′_t ≥ 0, t ∈ [0, T), with strict inequality if and only if r > 0. Also, γ_t := Ê_t[(θ − M̂_t)²] evolves as γ̇_t = −(α_t γ_t / σ_Y)².

From part (ii) we see that private monitoring overturns strictly decreasing signaling effects expected to arise under the traditional logic of public beliefs: α is non-decreasing.

Figure 1 (each panel plots β^Public, β^NF, and α^NF against t): Left: r = 0; Right: r = 1. Other parameter values: γ^o = 1, σ_Y = 1,
T = 10.

Comparison.
Figure 1 plots the signaling coefficients in each LME. In the no-feedback case, β_2 is decreasing, so a nondecreasing α implies that the history-inference effect increases over time. Indeed, because higher types take higher actions holding everything else fixed, they will expect their followers to have higher beliefs: this effect then grows—reflected in M attaching an increasing weight χ to θ in (5)—as past play acquires more relevance for predicting the continuation game as time progresses. With a positive coordination motive (β_1 >
0) that also strengthens with time, higher types gradually take even higher actions via this second-order belief channel, enhancing the informational content of the leader's action. Being forced to rely on her past actions to forecast the follower's understanding essentially imposes discipline on the leader: she does not cater to the follower's belief as she would in the public case. In turn, this suggests that more knowledge is transferred to the follower. To assess the validity of this conjecture, we take advantage of the model's analytic solutions in the patient (r = 0) and myopic (r = ∞) cases. Let γ^Pub and γ^NF denote the follower's posterior variance in the public and no-feedback cases, respectively.

Footnote: To conjecture the outcome of the game (i.e., (5)), and hence to be able to interpret signals correctly, our players must generically anticipate how play unfolds when (5) fails. Lemma A.2 in the Appendix proves that (5) holds. The method for showing existence is part of a broader approach discussed in Section 4.3.

Proposition 3 (Total learning). (i) If r = 0, β_0^Pub > α_0 and γ^Pub_T > γ^NF_T, all T > 0; (ii) Given T > 0 and δ ∈ (0, T), γ^Pub_t > γ^NF_t for t ∈ [T − δ, T] if r is large enough.

Consequently, when the leader is either patient or very impatient, in the no-feedback case the follower always has learned more by the end of the interaction. To show that this result is nontrivial, part (i) states that in the beginning, a patient leader always signals less aggressively in the no-feedback case—this is due to the anticipation of the history-inference effect at play later. Conversely, when the leader is myopic, the previous inter-temporal substitution effect disappears, so the only difference between the signaling coefficients in the no-feedback and public cases is the history-inference effect that arises in the former. As argued previously, this effect is always positive; part (ii) then follows by uniform convergence.
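The role of the signaling coefficient in these learning comparisons can be sketched numerically. The snippet below is only illustrative: it holds the signaling coefficients constant (which the equilibrium paths are not) and checks that the Riccati dynamics γ̇ = −(kγ/σ_Y)² deliver a lower terminal variance, i.e., more total learning, for the uniformly larger coefficient.

```python
# Hypothetical constant signaling coefficients k_pub < k_nf. The posterior
# variance obeys gamma' = -(k*gamma/sigma_Y)**2, whose solution with constant k
# is gamma(t) = gamma0 / (1 + gamma0 * k**2 * t / sigma_Y**2).

def gamma_closed_form(k, gamma0, sigma_y, t):
    return gamma0 / (1.0 + gamma0 * k * k * t / (sigma_y * sigma_y))

def gamma_euler(k, gamma0, sigma_y, t_end, n=50000):
    """Euler integration of gamma' = -(k*gamma/sigma_y)**2 as a cross-check."""
    g, h = gamma0, t_end / n
    for _ in range(n):
        g -= h * (k * g / sigma_y) ** 2
    return g

gamma0, sigma_y, T = 1.0, 1.0, 10.0
k_pub, k_nf = 0.6, 0.8   # hypothetical constant coefficients, k_nf > k_pub

g_pub = gamma_closed_form(k_pub, gamma0, sigma_y, T)
g_nf = gamma_closed_form(k_nf, gamma0, sigma_y, T)
print(g_nf < g_pub)                                              # more learning
print(abs(gamma_euler(k_nf, gamma0, sigma_y, T) - g_nf) < 1e-3)  # ODE check
```

In the model itself the comparison is subtler, precisely because the equilibrium coefficient paths differ across cases and over time, which is what parts (i) and (ii) of the proposition handle.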
We conclude with a discussion on how payoffs and information transmission connect:
Proposition 4 (Team's payoff). (i) If r = 0, the team's ex ante payoff is larger in the public case; (ii) ∀ r ≥ 0, ex ante undiscounted coordination costs equal σ_Y² log(γ^o/γ_T) in each case.

The direct effect of shutting down the feedback channel is increased coordination costs: holding everything else fixed, the leader is now uncertain about a concave payoff. To understand the strategic effect, part (ii) is key: in both cases, the extent of the follower's learning is a measure of total coordination costs. Consider the perfect-feedback case: if the leader chooses an action that the follower can match, no coordination costs are created, but this necessarily implies that the leader is neglecting her type. Consequently, the leadership transmits its knowledge only when it introduces changes to which the organization does not perfectly know how to respond, generating transient miscoordination in the process.

From this perspective, private monitoring exacerbates such costs by making the follower's actions more volatile in response to more informative, yet stable, behavior by the leader; in particular, the team is worse off in the no-feedback case when the leader is patient (part (i)). This setting of action-based information transmission then reveals that an organization's better understanding of its leadership's goals need not be indicative of better past, or even future, performance: it can be reflective of the organization's painful struggle to coordinate.

The next sections develop a framework for general quadratic preferences and partially informative public feedback channels. This generality is important not only for the breadth of applications that can be explored but also because information channels often have intermediate quality. For organizations in particular, the value of the analysis is clear: improvements

Footnote: The ranking of terminal learning appears to hold for all r >
0. See Figure 1 in the online appendix. Marschak (1955) stresses the importance of incorporating action-based information transmission in theanalysis of organizations: “A realistic theory of teams would be dynamic. It takes time to process and passmessages along a chain of members; and messages must include not only information on external variables,but also information on what has been done [emphasis added] by other members in the team” (p. 137). SeeHermalin (1998) for a theory of leadership featuring noiseless signaling, albeit in a static setting.
12n quality can sometimes be prohibitively costly, and even when partial improvements arefeasible, cost-benefit analyses require assessing payoffs under varying levels of uncertainty.
We consider two-player, linear-quadratic-Gaussian games with private information and private monitoring. Extensions of the baseline setup are presented in Section 5 via applications.
Players, Actions and Payoffs.
A forward-looking long-run player (she) and a myopic counterpart (he) interact in a repeated game played continuously over a time interval [0, T], T < ∞. At each instant t ∈ [0, T], the long-run player chooses an action a_t, while the myopic player chooses â_t, both taking values in the real line. If at any instant the profile of actions chosen is (a, â), the long-run player's and the myopic player's flow payoffs are

  U(a, â, θ) and Û(a, â, θ),  (6)

respectively, where U and Û are quadratic functions. In (6), θ denotes a normally distributed random variable with mean µ ∈ R and variance γ_o > 0. The long-run player discounts the future at a rate r > 0, while the myopic player cares only about his flow payoff at all times.

To state our main assumptions on the functions U and Û, we introduce the scalars

  u_{xy} := (∂²U/∂x∂y)/|∂²U/∂a²| and û_{xy} := (∂²Û/∂x∂y)/|∂²Û/∂â²|, for x, y ∈ {a, â, θ}.

Indeed, if the players' flow payoffs are concave in their respective actions, best responses will exhibit denominators like the above, allowing us to state our conditions in normalized form:
Assumption 1.
Flow payoffs satisfy (i) u_{aa} = û_{ââ} = −1 (strict concavity); (ii) u_{aθ}(u_{aθ} + u_{aâ}û_{âθ}) > 0 (nontrivial signaling); (iii) |û_{âθ}| + |û_{aâ}| ≠ 0 and |u_{aâ}| + |u_{ââ}| ≠ 0 (second-order inferences); and (iv) u_{aâ}û_{aâ} < 1 (myopic best replies intersect).

Our first requirement is that θ be strategically relevant for the long-run player (i.e., u_{aθ} ≠ 0), which is implied by (ii). Part (iii) then invokes the use of higher-order inferences. Specifically, the first condition states that the myopic player's first-order belief influences his behavior, either because he cares about θ directly (û_{âθ} term) or indirectly via the long-run player's action (û_{aâ} term); in turn, the second condition forces the long-run player to forecast the myopic player's belief, due to either an interaction term (u_{aâ}) or a nonlinear effect (u_{ââ}). (Clearly, (iii) is a choice to focus on the most interesting cases rather than a limitation.)

The remaining conditions are used to find equilibria in linear strategies. The concavity of the players' objectives with respect to their own actions (part (i)) gives rise to linear best responses. Coupled with (iv), the static game of two-sided private information that arises at the end of the interaction will admit a Nash equilibrium; part (ii) then ensures that this equilibrium entails type dependence. We will revisit these assumptions in Section 4.

Information.
The long-run player observes the value of θ before play begins, while the myopic player only knows its distribution θ ∼ N(µ, γ_o) (and this is common knowledge). There are also two signals with full support due to the presence of Brownian noise:

  dX_t = â_t dt + σ_X dZ_t^X and dY_t = a_t dt + σ_Y dZ_t^Y,  (7)

where Z^X and Z^Y are orthogonal and the volatility parameters σ_Y and σ_X are strictly positive. Our key departure from existing analyses is to make Y—which carries information about the long-run player's actions—privately observed by the myopic player; instead, the signal X carrying the latter player's action remains public. This mixed private-public information structure is important for our construction, but it is also natural for analyzing sender-receiver games: it makes the departure minimal while still economically relevant for applications.

In what follows, we let E_t[·] denote the long-run player's conditional expectation operator, which conditions on the histories (θ, a_s, X_s : 0 ≤ s ≤ t) and on her conjecture of the myopic player's play. Similarly, Ê_t[·] denotes the myopic player's analog, which conditions on (â_s, X_s, Y_s : 0 ≤ s ≤ t) and on his belief about the long-run player's strategy, t ≥ 0.

Strategies and Equilibrium Concept.
With full-support monitoring, the only off-path histories for each player are those in which that player herself/himself has deviated. Thus, we use the Nash equilibrium concept for defining the equilibrium of the game, as imposing full sequential rationality places no additional restrictions on the set of equilibrium outcomes. From this perspective, an admissible strategy for the long-run player is any square-integrable real-valued process (a_t)_{t∈[0,T]} that is progressively measurable with respect to the filtration generated by (θ, X). The analogous notion for the myopic player involves the identical integrability condition, but the measurability restriction is with respect to (X, Y).

Definition 1 (Nash equilibrium). An admissible pair (a_t, â_t)_{t∈[0,T]} is a Nash equilibrium if: (i) the process (a_t)_{t∈[0,T]} maximizes E[∫_0^T e^{−rt} U(a_t, â_t, θ) dt]; and (ii) for each t ∈ [0, T], â_t maximizes Ê_t[Û(a_t, â_t, θ)] given (â_s)_{s<t}.

In the next section, we characterize Nash equilibria supported by strategies that are fully sequentially rational, thereby specifying behavior after deviations. The equilibria studied generalize that of Section 2 for the no-feedback case σ_X = ∞ to the whole range 0 < σ_X ≤ ∞.

Remark 1 (Extensions). Our methods can accommodate various extensions, including: (i) a terminal payoff Ψ(â_T) for the long-run player, where Ψ is quadratic; and (ii) the drift of X in (7) taking the form â_t + νa_t, where ν ∈ [0, 1] is a scalar. See Section 5.1 for a reputation model featuring (i) and Section 5.2 for an insider trading model featuring (ii) (that also accommodates ∂²U/∂a² = 0, as is traditional in that literature).

We construct linear Markov (perfect) equilibria using the players' beliefs as the relevant states. Indeed, the quadratic payoffs and linear signals open the possibility for quadratic value functions that are supported by strategies that are linear in some state variables.
For this to work, however, the states themselves must be linear in the available signals. With a Gaussian information structure, it is then natural to appeal to belief-based states.

The appeal of such equilibria is twofold. First, the Markov restriction captures that behavior depends only on the aspects of the histories that the players perceive to be payoff-relevant. Second, in equilibrium, the players' actions are linear in the signals observed, which generalizes the traditional linear equilibria widely employed in static applied-theory work.

Specifically, we characterize equilibria in which, after their corresponding (on- or off-path) private histories, the long-run player and the myopic counterpart play according to

  a_t = β_{0t} + β_{1t} M_t + β_{2t} L_t + β_{3t} θ  (8)
  â_t = δ_{0t} + δ_{1t} M̂_t + δ_{2t} L_t.  (9)

Here, M̂_t := Ê_t[θ] is the myopic player's first-order belief, M_t := E_t[M̂_t] the long-run player's second-order counterpart, and L_t := E[θ|F_t^X] is the belief about θ using the public information exclusively; the coefficients β_{it} and δ_{jt}, i = 0, 1, 2, 3, j = 0, 1, 2, are deterministic.

Intuitively, because the long-run player conditions her actions on her type ((ii) in Assumption 1), the myopic player's belief (M̂_t)_{t∈[0,T]} is a relevant state (first part in (iii), Assumption 1). However, this implies that the long-run player must forecast the myopic player's belief to determine her best response (second part in (iii), Assumption 1), which makes (M_t)_{t∈[0,T]} payoff-relevant. The appearance of L_t is in turn linked to the nature of M_t, as follows.

As is traditional, the long-run player will use X to forecast M̂ since this signal carries the myopic player's action. The novelty is that, due to the private monitoring, she will also use the history of her past actions in this forecasting exercise.
Intuitively, as long as the public signal is imperfect (i.e., σ_X > 0), the dependence of M on her past actions—as occurred in (3)—makes this state a private one even in equilibrium, as those actions depend on her actual type. The myopic player is then forced to make an inference about this belief (and so forth).

Along the path of play of any pure strategy, however, the outcome of the game should depend only on (θ, X, Y). In particular, M must be a function of the tuple (θ, X), which is the long-run player's only source of information. The Gaussian structure then suggests the existence of a process (L_t)_{t∈[0,T]} depending only on the public information, and a deterministic function χ, such that, under the linear profile (8)–(9) carrying this public process,

  M_t = χ_t θ + (1 − χ_t) L_t.  (10)

The representation (10) is at the core of our analysis. First, if true, it shows that the "beliefs about beliefs" problem is manageable: as the myopic player uses (10) to forecast M, all higher-order expectations can be written in terms of the aforementioned belief states. (Clearly, in this third-order inference step by the myopic player, L becomes payoff-relevant.)

Second, the representation encodes the separation that occurs via the second-order belief channel. Indeed, (10) captures how, under linear strategies, the long-run player balances her past play (χθ term) and the public signal ((1 − χ)L term) when forecasting the myopic player's belief. Inserting (10) into the long-run player's strategy (8) yields the action process

  a_t = β_{0t} + (β_{2t} + β_{1t}(1 − χ_t)) L_t + (β_{3t} + β_{1t}χ_t) θ =: α_{0t} + α_{1t} L_t + α_{2t} θ,  (11)

from which the signaling coefficient is α_{2t} := β_{3t} + β_{1t}χ_t.
The term β_{1t}χ_t encodes the aforementioned history-inference effect: different types take different actions in equilibrium partly because their differing past actions have led them to hold different beliefs today.

Our approach for characterizing (10) is constructive. To state the formal result, we omit the hat symbol for convenience and denote the myopic player's posterior variance simply by γ_t := Ê_t[(θ − M̂_t)²].

Lemma 1 (Second-order belief representation). Suppose that (X, Y) is driven by (8)–(9) and the myopic player believes that (10), with (L_t)_{t∈[0,T]} a process that depends only on the public information, holds. Then (10) holds at all times (path-by-path of X) if and only if

  γ̇_t = −γ_t²(β_{3t} + β_{1t}χ_t)²/σ_Y², γ_0 = γ_o,  (12)
  χ̇_t = γ_t(β_{3t} + β_{1t}χ_t)²(1 − χ_t)/σ_Y² − γ_tχ_tδ_{1t}²/σ_X², χ_0 = 0,  (13)
  dL_t = (l_{0t} + l_{1t}L_t) dt + B_t dX_t, L_0 = µ,  (14)

with (l_{0t}, l_{1t}, B_t) given in (B.7) deterministic. Also, L_t = E[θ|F_t^X] and γ_tχ_t = E_t[(M_t − M̂_t)²].

In light of the lemma, the representation (10) reads

  M_t = (Var_t/V̂ar_t) θ + (1 − Var_t/V̂ar_t) E[θ|F_t^X],

where we have used the notation V̂ar_t := Ê_t[(θ − M̂_t)²] and Var_t := E_t[(M_t − M̂_t)²], the latter measuring the long-run player's uncertainty about M̂. Indeed, in forecasting M̂, the only informational advantage that the long-run player has relative to an outsider who observes X exclusively is that she knows what actions she has taken, and such actions carry her type. Under linear strategies, learning is Gaussian, so (i) M_t is a linear combination of θ and E[M̂_t|F_t^X], and (ii) the weights are deterministic; the representation then follows from E[M̂_t|F_t^X] = E[θ|F_t^X]. Observe also that the linearity of E[θ|F_t^X] in the history (X_s : 0 ≤ s < t) can be deduced from the linearity of (14) both in L and in the increments of X.
The χ-ODE (13) quantifies the dynamics of the relevance of past behavior in the previous forecasting exercise. Indeed, by the common prior, Var_0 = 0 and E[θ|F_0^X] = µ; thus, M_0 = µ in the display above, and so the χ-ODE must start at zero. As signaling progresses, however, the long-run player loses track of M̂ (i.e., Var_t > 0); χ > 0 then simply reflects that the long-run player expects M̂ to gradually incorporate her type via this channel.

The relative importance of past play will naturally depend on the quality of the public information. Consider the last term γ_tχ_tδ_{1t}²/σ_X² in (13). If σ_X = ∞ or δ_1 ≡ 0, the public signal is uninformative: indeed, L_t = L_0 = µ and χ_t = 1 − γ_t/γ_o in both cases, as in the no-feedback analysis of Section 2. Apart from these cases, the public information is always useful. In particular, as δ_1²/σ_X² grows, more downward pressure is exerted on the growth of χ, reflecting a weaker dependence on past play as the quality of X improves, all else being equal; in the limit, χ ≡ 0, L → M̂, and so M → M̂ (i.e., the environment becomes public). The no-feedback case then maximizes the potential amplitude of the history-inference effect.

The resulting expression for L holds irrespective of the past private histories of the players: this is because deviations are hidden and hence each player thinks the counterparty has constructed L using (14). In (13), setting δ_1²/σ_X² ≡ 0 yields the χ-ODE of the no-feedback case (see the proof of Lemma A.2).

The myopic player's best reply takes (γ, χ) as an input, so we require (12)–(13) to have a unique solution to ensure that the ODE characterization is valid. Note that the weight on M̂ in the myopic player's best reply is given by û_{âθ} + û_{âa}[β_{3t} + β_{1t}χ_t].

Lemma 2 (Learning ODEs). Suppose (β_1, β_3) is continuous and δ_{1t} = û_{âθ} + û_{âa}[β_{3t} + β_{1t}χ_t]. Then (12)–(13) has a unique solution, and 0 < γ_t ≤ γ_o and 0 ≤ χ_t < 1 for all t ∈ [0, T]. If, moreover, β_3 ≠ 0, then these inequalities are strict over (0, T].
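The no-feedback benchmark above lends itself to a quick numerical check. The sketch below, with hypothetical parameter values and a constant signaling coefficient (in an LME this coefficient would be time-varying), Euler-integrates (12)–(13) with the last term of (13) shut down (σ_X = ∞) and verifies both the identity χ_t = 1 − γ_t/γ_o and the closed-form Riccati solution for γ.

```python
# Hedged numerical sketch of the learning ODEs (12)-(13) in the
# no-feedback case (delta_1/sigma_X = 0). All parameter values are
# hypothetical, and alpha2 is held constant for simplicity.

def simulate_no_feedback(gamma0, alpha2, sigma_Y, T, n):
    """Euler-integrate gamma and chi; returns the terminal pair (gamma, chi)."""
    dt = T / n
    gamma, chi = gamma0, 0.0
    for _ in range(n):
        dgamma = -(gamma**2) * alpha2**2 / sigma_Y**2        # ODE (12)
        dchi = gamma * alpha2**2 * (1.0 - chi) / sigma_Y**2  # ODE (13), sigma_X = infinity
        gamma += dgamma * dt
        chi += dchi * dt
    return gamma, chi

gamma_T, chi_T = simulate_no_feedback(gamma0=1.0, alpha2=0.8, sigma_Y=1.0, T=2.0, n=200_000)

# The text's identity chi_t = 1 - gamma_t/gamma_o should hold along the path:
assert abs(chi_T - (1.0 - gamma_T / 1.0)) < 1e-6
# gamma solves the Riccati ODE in closed form: gamma_t = gamma_o/(1 + gamma_o*alpha2^2*t/sigma_Y^2)
assert abs(gamma_T - 1.0 / (1.0 + 0.8**2 * 2.0)) < 1e-3
```

The identity is in fact preserved step-by-step by the Euler scheme, mirroring the fact that it is an exact invariant of the ODE flow.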
The filtering equations are valid under weak integrability conditions on the coefficients in (8)–(9), from which γ_t = Ê_t[(θ − M̂_t)²] and χ_t = Var_t/V̂ar_t = E_t[(M_t − M̂_t)²]/γ_t must solve the system; a mild strengthening of the conditions ensures that no other solutions exist, and if β_3 ≠ 0 (a property our equilibria satisfy), some information indeed gets transmitted.

Our derivation of the representation (10) exploits the tractability of the Gaussian filtering under linear strategies. Due to the full-support monitoring, the myopic player expects a_t = α_{0t} + α_{1t}L_t + α_{2t}θ as defined in (11), and L is public. The myopic player's learning problem of filtering θ from Y is thus (conditionally) Gaussian, so his belief is characterized by a stochastic mean (M̂_t)_{t∈[0,T]} and the deterministic variance (γ_t)_{t∈[0,T]}. But the linearity of the signal structure renders the pair (M̂, X) (conditionally) Gaussian too. The long-run player's filtering then yields a second mean-variance pair, with M_t now an explicit linear function of her past actions. One can insert the linear strategy (8) into M_t to pin down (χ, L).

The representation (10) then relies on the long-run player following the linear strategy (8). In particular, that M is spanned by θ and L is no longer true after deviations, and such deviations are needed to evaluate the candidacy of (8) as an LME. This brings us to the third property behind (10): it captures a divergence in the game's structure at on- versus off-path histories, a well-known feature of games with private monitoring. The next result introduces the law of motion of M and L for an arbitrary strategy of the long-run player.

Lemma 3 (Controlled dynamics). Suppose that the myopic player follows (9) and believes that (8) and (10) hold.
Then, if the long-run player follows (a'_t)_{t∈[0,T]}, from her perspective

  dM_t = (γ_tα_{2t}/σ_Y²)(a'_t − [α_{0t} + α_{1t}L_t + α_{2t}M_t]) dt + (χ_tγ_tδ_{1t}/σ_X) dZ_t  (15)
  dL_t = (χ_tγ_tδ_{1t}/[σ_X²(1 − χ_t)])[δ_{1t}(M_t − L_t) dt + σ_X dZ_t],  (16)

with Z_t := σ_X^{−1}[X_t − ∫_0^t (δ_{0s} + δ_{1s}M_s + δ_{2s}L_s) ds] being a Brownian motion.

The dynamic (15) illustrates how the long-run player expects her future choices to affect her future beliefs. In particular, she will revise her belief upward when a'_t > E_t[α_{0t} + α_{1t}L_t + α_{2t}M̂_t], i.e., when she expects to beat the myopic player's expectation of her own behavior. The intensity of the revision is given by γ_tα_{2t}/σ_Y²: it increases with both the myopic player's uncertainty (γ) and his conjecture of the long-run player's strength of signaling (α_2). Further, it is clear that M is deterministic only if δ_1/σ_X ≡ 0, exactly as in Section 2.

The appearance of M in the drift of (16) shows that the long-run player expects to influence L, despite her actions not entering the public signal: this happens through her influence on the myopic player's behavior. Consequently, a signal-jamming effect arises: the incentive to influence a public belief (albeit only indirectly), with such incentives being perfectly accounted for in equilibrium—this effect is obviously absent in the no-feedback case (σ_X = ∞). The drift of (16) also shows that L chases M on average, reflecting that someone who only observes X is able to gradually learn the long-run player's type over time.

Finally, observe that the pair (γ, χ) appears explicitly in the evolution of (M, L). Indeed, this is because of the role of (γ, χ) in the myopic player's learning process: since deviations are hidden, this player always assumes that (10) holds when constructing his belief.

Remark 2 (M as a function of past actions).
In Lemma 3, insert the definition of Z_t into (15) to solve for M_t as a linear function of (a_s, L_s, X_s)_{s≤t}.

Given a conjecture β⃗ := (β_0, β_1, β_2, β_3) by the myopic player, δ⃗ := (δ_0, δ_1, δ_2) is found by matching coefficients in

  â_t := δ_{0t} + δ_{1t}M̂_t + δ_{2t}L_t = argmax_{â'} Ê_t[Û(α_{0t} + α_{1t}L_t + α_{2t}θ, â', θ)],  (17)

with α⃗ := (α_0, α_1, α_2) as in (11). Since the flow U(a_t, â_t, θ) is quadratic and M_t := E_t[M̂_t], we can write the long-run player's total payoff as a function of (M_t)_{t∈[0,T]} as follows:

  E[∫_0^T e^{−rt} U(a_t, δ_{0t} + δ_{1t}M_t + δ_{2t}L_t, θ) dt] + (1/2)(∂²U/∂â²) ∫_0^T e^{−rt} δ_{1t}²γ_tχ_t dt.  (18)

Indeed, in writing E_t[M̂_t²] = M_t² + E_t[(M_t − M̂_t)²], the Gaussian learning structure guarantees that the variances E_t[(M_t − M̂_t)²] are independent of the long-run player's actual behavior, determined instead by the candidate equilibrium profile; by Lemma 1, their value is χ_tγ_t. From here, it is clear that (t, θ, L, M) is a sufficient statistic for the long-run player, with (γ, χ) entering as deterministic coefficients.

Our choice of dynamic programming over optimal control in Section 2 is not only due to the deterministic property being nongeneric: the latter approach obscures how the history-inference effect shapes signaling.

The long-run player's problem can then be stated as maximizing (18) subject to the dynamics (15)–(16) of (M, L), which depend on (γ, χ) satisfying (12)–(13).
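Before turning to the value function, the drift channel in (15) can be sketched numerically in the no-feedback case, where M is deterministic (the volatility term vanishes when σ_X = ∞). The parameters and the constant deviation below are hypothetical, and γ is frozen at γ_o for readability, whereas in the model it would decay via (12).

```python
# Hedged sketch of the controlled second-order belief dynamic (15),
# no-feedback case: M moves only through the gap between actual play
# and the conjectured action. All numbers are hypothetical.

def second_order_belief_path(deviation, alpha2, gamma, sigma_Y, M0, T, n):
    """Euler path of M when actual play exceeds the conjecture by `deviation`."""
    dt = T / n
    M = M0
    for _ in range(n):
        # a'_t - E_t[conjectured action] = deviation, by construction;
        # the noise term chi*gamma*delta_1/sigma_X vanishes when sigma_X = infinity.
        M += (gamma * alpha2 / sigma_Y**2) * deviation * dt
    return M

M_up = second_order_belief_path(deviation=1.0, alpha2=0.5, gamma=1.0,
                                sigma_Y=1.0, M0=0.0, T=1.0, n=1000)
M_flat = second_order_belief_path(deviation=0.0, alpha2=0.5, gamma=1.0,
                                  sigma_Y=1.0, M0=0.0, T=1.0, n=1000)
assert M_up > M_flat == 0.0       # beating the conjecture revises M upward
assert abs(M_up - 0.5) < 1e-9     # total revision = gamma*alpha2/sigma_Y^2 * deviation * T
```

Consistent with the text, the revision intensity scales with both the myopic player's uncertainty (γ) and the conjectured signaling strength (α_2).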
To tackle this best-response problem, we postulate a quadratic value function

  V(θ, m, ℓ, t) = v_{0t} + v_{1t}θ + v_{2t}m + v_{3t}ℓ + v_{4t}θ² + v_{5t}m² + v_{6t}ℓ² + v_{7t}θm + v_{8t}θℓ + v_{9t}mℓ,

where the coefficients v_{it}, i = 0, ..., 9, are deterministic, and the associated HJB equation

  rV = sup_{a'} { Ũ(a', E_t[â_t], θ) + V_t + µ_M(a')V_m + µ_L V_ℓ + (1/2)σ_M²V_{mm} + σ_Mσ_L V_{mℓ} + (1/2)σ_L²V_{ℓℓ} },

where Ũ := U + (1/2)(∂²U/∂â²)δ_{1t}²γ_tχ_t, µ_M(a') and µ_L (respectively, σ_M and σ_L) denote the drifts (respectively, volatilities) in (15) and (16), and â_t is determined via (17).

A Nash equilibrium in linear Markov strategies immediately follows when β_{0t} + β_{1t}M + β_{2t}L + β_{3t}θ is an optimal policy for the long-run player. Indeed, along the path of play of such a policy, the representation (10) holds by construction, and so the long-run player's behavior is given by a_t = α_{0t} + α_{1t}L_t + α_{2t}θ, where (L_t)_{t∈[0,T]} follows (14) in Lemma 1; i.e., actions are a function of (t, θ, X) exclusively. However, conditioning differently on L and M is profitable after deviations—the policy β_{0t} + β_{1t}M + β_{2t}L + β_{3t}θ then specifies how to behave at such off-path histories, effectively inducing an LME that is also perfect.

The boundary-value problem (BVP). We briefly explain how to obtain a system of ordinary differential equations (ODEs) for β⃗. Letting a(θ, m, ℓ, t) denote the maximizer of the right-hand side in the HJB equation, the first-order condition (FOC) reads

  (∂U/∂a)(a(θ, m, ℓ, t), δ_{0t} + δ_{1t}m + δ_{2t}ℓ, θ) + (γ_tα_{2t}/σ_Y²)[v_{2t} + 2v_{5t}m + v_{7t}θ + v_{9t}ℓ] = 0,  (19)

where the factor γ_tα_{2t}/σ_Y² is the marginal impact dM_t/da_t of the action on the belief, and the bracketed term is V_m(θ, m, ℓ, t). Solving for a(θ, m, ℓ, t) in (19), the equilibrium condition becomes a(θ, m, ℓ, t) = β_{0t} + β_{1t}m + β_{2t}ℓ + β_{3t}θ, which is a linear equation.
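The mechanics behind (19) are simple: a quadratic objective has a linear FOC, so the maximizer is affine in the states. The toy objective below is a hypothetical stand-in for U plus the continuation term (γ_tα_{2t}/σ_Y²)V_m, not the paper's calibrated payoff; the coefficients k are placeholders.

```python
# Minimal check that a strictly concave quadratic objective yields a
# policy that is linear in (theta, m, l), as the FOC (19) requires.
# All coefficients are hypothetical.

def optimal_action(theta, m, l, k=(0.3, 0.5, 0.2, 0.1)):
    """argmax_a of f(a) = -a^2/2 + (k0 + k1*m + k2*theta + k3*l) * a.

    FOC: -a + (k0 + k1*m + k2*theta + k3*l) = 0, hence a linear policy."""
    k0, k1, k2, k3 = k
    return k0 + k1 * m + k2 * theta + k3 * l

# Linearity: increments in the states enter additively.
base = optimal_action(0.0, 0.0, 0.0)
assert abs(optimal_action(1.0, 2.0, 3.0) - (base + 0.2 * 1.0 + 0.5 * 2.0 + 0.1 * 3.0)) < 1e-12

# Optimality: the candidate beats nearby actions for the induced slope.
f = lambda a, s: -a * a / 2 + s * a
s = 0.3 + 0.5 * 2.0 + 0.2 * 1.0 + 0.1 * 3.0
a_star = optimal_action(1.0, 2.0, 3.0)
assert f(a_star, s) >= f(a_star + 0.1, s) and f(a_star, s) >= f(a_star - 0.1, s)
```

Matching the resulting affine policy coefficient-by-coefficient against β_{0t} + β_{1t}m + β_{2t}ℓ + β_{3t}θ is exactly the step taken next in the text.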
We can then solve for (v_2, v_5, v_7, v_9) as a function of the coefficients β⃗ and insert the resulting expressions into the HJB equation along with a(θ, m, ℓ, t) = β_{0t} + β_{1t}m + β_{2t}ℓ + β_{3t}θ, to obtain a system of ODEs for the β⃗, coupled with the ODEs that v_3 and v_6 satisfy (and that are readily obtained from the HJB equation): since M feeds into L (see (16)), the envelope condition with respect to the controlled state M cannot deliver a self-contained system for the optimal policy. Finally, because the pair (γ, χ) affects the law of motion of (M, L), it also influences (β⃗, v_3, v_6), and so the ODEs (12)–(13) must be included.

The long-run player's problem is, in practice, one of optimally controlling an unobserved state. We are allowed to filter first and then optimize, because the separation principle applies. See the proof of Lemma 3.

The myopic player's behavior is specified in (17), where (t, L, M̂) is the relevant state ((B.1) shows how M̂ evolves). While deviations by this player do affect L, it is clear that no additional states are needed for our players after deviations. Also, all the payoff-relevant histories are reachable on path, so the sequential rationality requirement is trivial for this player in an LME. All this is true if this player is forward-looking.

This procedure leads to a system of ODEs for (β_0, β_1, β_2, β_3, v_3, v_6, γ, χ); we also need the boundary conditions. First, there are the exogenous initial conditions that γ and χ satisfy, i.e., γ_0 = γ_o > 0 and χ_0 = 0. Second, there are terminal conditions v_{3T} = v_{6T} = 0 due to the absence of a lump-sum terminal payoff in the long-run player's problem. Third, and more interesting, there are endogenous terminal conditions that are determined by the static (Bayes) Nash equilibrium that arises from myopic play at time T.
In fact, letting u_0 := (∂U/∂a)(0, 0, 0)/|∂²U/∂a²| and û_0 := (∂Û/∂â)(0, 0, 0)/|∂²Û/∂â²| denote the intercepts of the players' static best responses, it is easy to verify that this equilibrium entails the coefficients

  β_{0T} = (u_0 + u_{aâ}û_0)/(1 − u_{aâ}û_{âa}),
  β_{1T} = u_{aâ}[u_{aθ}û_{âa} + û_{âθ}]/(1 − u_{aâ}û_{âa}χ_T),
  β_{2T} = u_{aâ}û_{âa}[u_{aθ}û_{âa} + û_{âθ}](1 − χ_T)/[(1 − u_{aâ}û_{âa})(1 − u_{aâ}û_{âa}χ_T)],
  β_{3T} = u_{aθ}.

By part (iv) in Assumption 1 and the fact that χ_T ∈ (0, 1), these coefficients are well defined. Moreover, the terminal signaling coefficient α_{2T} = β_{3T} + β_{1T}χ_T is proportional to u_{aθ} + u_{aâ}û_{âθ}χ_T, which, by part (ii) in Assumption 1, never vanishes either. This latter property is sufficient for the dynamic equilibria that we construct to always exhibit nontrivial signaling throughout the game. We conclude that b := (β_0, β_1, β_2, β_3, v_3, v_6, γ, χ)^T satisfies a BVP of the form

  ḃ_t = f(b_t), s.t. D_0 b_0 + D_T b_T = (B(χ_T)^T, γ_o, 0)^T,  (20)

where f : R⁶ × R_+ × [0, 1] → R⁸; D_0 := diag(0, 0, 0, 0, 0, 0, 1, 1) and D_T := diag(1, 1, 1, 1, 1, 1, 0, 0); and B(χ) : [0, 1] → R⁶ carries the terminal conditions via

  B(χ) := ((u_0 + u_{aâ}û_0)/(1 − u_{aâ}û_{âa}), u_{aâ}[u_{aθ}û_{âa} + û_{âθ}]/(1 − u_{aâ}û_{âa}χ), u_{aâ}û_{âa}[u_{aθ}û_{âa} + û_{âθ}](1 − χ)/[(1 − u_{aâ}û_{âa})(1 − u_{aâ}û_{âa}χ)], u_{aθ}, 0, 0)^T ∈ R⁶.  (21)

The general expression that f(·) in (20) takes for a generic pair (U, Û) satisfying Assumption 1 is long, and can be found in spm.nb on our websites. (There, to simplify notation, we work with normalized payoffs U/|∂²U/∂a²| and Û/|∂²Û/∂â²|.) In the next subsection, we provide existence results that exploit the structure that f(·) can satisfy.

If both players are myopic, the strategies carry coefficients with χ_t as above, t ∈ [0, T]. By the arguments used for Lemma 2, the induced system in (γ, χ) has a unique solution, so there is a unique LME for all T > 0.

This requirement at time T can be relaxed, but it is beyond our scope of interest.
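The internal consistency of these terminal coefficients can be checked numerically. The primitives below are hypothetical values chosen to satisfy Assumption 1, and the formulas are as displayed above; the check confirms that α_{2T} = β_{3T} + β_{1T}χ_T is proportional to u_{aθ} + u_{aâ}û_{âθ}χ_T, with proportionality factor 1/(1 − u_{aâ}û_{âa}χ_T).

```python
# Numeric consistency check of the terminal (static Bayes-Nash)
# coefficients. All primitives are hypothetical, chosen to satisfy
# Assumption 1: u_atheta*(u_atheta + u_aahat*uhat_ahattheta) > 0 and
# u_aahat*uhat_ahata < 1.

u_atheta, u_aahat = 1.0, 0.4           # long-run player's normalized cross-partials
uhat_ahattheta, uhat_ahata = 0.6, 0.5  # myopic player's counterparts
chi_T = 0.3                            # any value in (0, 1)

denom = 1.0 - u_aahat * uhat_ahata * chi_T
beta1_T = u_aahat * (u_atheta * uhat_ahata + uhat_ahattheta) / denom
beta3_T = u_atheta
alpha2_T = beta3_T + beta1_T * chi_T

# Proportionality claim: alpha2_T = (u_atheta + u_aahat*uhat_ahattheta*chi_T)/denom.
target = (u_atheta + u_aahat * uhat_ahattheta * chi_T) / denom
assert abs(alpha2_T - target) < 1e-12
assert alpha2_T != 0.0  # nontrivial signaling at T, as Assumption 1(ii) guarantees
```

The algebra behind the assertion holds for any primitives meeting Assumption 1, which is exactly why the equilibria constructed never lose their signaling content at the terminal date.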
Also, it is easy to see that δ_{0T} = û_0 + û_{âa}β_{0T}, δ_{1T} = û_{âθ} + û_{âa}[β_{3T} + β_{1T}χ_T], and δ_{2T} = û_{âa}[β_{2T} + β_{1T}(1 − χ_T)].

The task of finding an LME is then reduced to solving the BVP (20) (and checking that the rest of the coefficients in the value function are well defined, which is a simpler task).

In this section, we present two existence results for LME. Behind these results are two approaches that separately address common- and private-value environments, as the corresponding BVPs have a different structure linked to the extent of asymmetry between the players' signaling rates that arises in each case. For the sake of exposition, we state the theorems for variations of the leadership game of Section 2. The "common-value" method is, nevertheless, fully general and can be exported to other asymmetric settings.

The shooting problem. Establishing the existence of a solution to the BVP (20) is complex because there are multiple ODEs in both directions: (β⃗, v_3, v_6) is traced backward from its terminal values, while (γ, χ) is traced forward using its initial values—see Figure 2. This means that some notion of "shooting" must be applied: say, to construct a backward initial value problem (IVP) in which (γ, χ) has a parametrized initial condition at T, and to ensure that the terminal value (now, at 0) exactly matches (γ_o, 0). However, tracing the values of γ and χ at T for all possible coefficients β⃗ would be required to find the right "tracing path" in a multidimensional domain.

Figure 2: The tuple (γ, χ) has initial conditions, while (β⃗, v_3, v_6) has terminal ones. We have allowed for non-zero v's and for a dependence on γ_T, as this can occur if terminal payoffs are allowed.
The reason behind this dimensionality problem is the asymmetry in the environment: the rate at which the long-run player signals, α_2 := β_3 + β_1χ, can be very different from the myopic player's counterpart, δ_1. When this is the case, a nontrivial history dependence between γ and χ—reflected in the coupled system of ODEs they satisfy—ensues. Two questions naturally arise: first, under what conditions can such history dependence be simplified; second, how can one tackle the issue of existence of LME when a simplification is not feasible?

Private values: one-dimensional shooting. We say that the environment is one of private values if the myopic player's flow utility satisfies û_{âθ} = 0, i.e., his best reply does not directly depend on his belief about θ, but only indirectly via the long-run player's action. Otherwise, the setting is one of common values (despite the long-run player knowing θ).

In a private-value setting, the players signal to each other at rates that are proportional. Indeed, the weight attached to M̂ in the myopic player's best response becomes δ_1 = û_{âa}α_2.

Lemma 4 (One-to-one mapping). Suppose that β_1 and β_3 are continuous and that δ_1 = û_{âa}α_2. If û_{âa} ≠ 0, there are positive constants c_1, c_2 and d independent of γ_o such that

  χ_t = c_1c_2(1 − [γ_t/γ_o]^d)/(c_1 + c_2[γ_t/γ_o]^d).

Moreover, (i) 0 ≤ χ_t < c_2 < 1 for all t ∈ [0, T]; and (ii) c_2 → 0 as σ_X → 0 and c_2 → 1 as σ_X → ∞. If instead û_{âa} = 0 or σ_X = ∞, χ_t = 1 − γ_t/γ_o.

Private-value settings, by inducing proportional signaling rates, create useful symmetry: while the players' posterior variances are not proportional, there is a decreasing relationship between χ and γ at all times. By (i), χ is uniformly below 1 if the public signal is informative, reflecting that the scope for the history-inference effect falls relative to the no-feedback case.
By (ii), the public and no-feedback cases are recovered as we take limits; further, the characterization of χ obtained in the latter case is recovered when, in addition, û_{âa} = 0.

This result enables the use of standard one-dimensional shooting arguments, making our leadership application of Section 2 a valid laboratory: the game has private values because the follower wants to match the leader's action. Below is the corresponding BVP for σ_X ∈ (0, ∞) and, for simplicity, for r = 0; we omit the ODE for β_0 because it is uncoupled and linear:

  v̇_{3t} = β_{2t}² + 2β_{1t}β_{2t}(1 − χ_t) − β_{1t}²(1 − χ_t)² + 2v_{3t}α_{2t}²γ_tχ_t/[σ_X²(1 − χ_t)]
  v̇_{6t} = −β_{2t} − (1 − α_{2t})β_{1t}(1 − χ_t) − β_{1t}²χ_t(1 − χ_t) + v_{6t}α_{2t}²γ_tχ_t/[σ_X²(1 − χ_t)]
  β̇_{1t} = [α_{2t}γ_t/(σ_X²σ_Y²(1 − χ_t))]{σ_X²(α_{2t} − β_{3t})β_{1t}(1 − χ_t) − α_{2t}β_{1t}γ_tχ_tv_{3t} − σ_Y²α_{2t}χ_t(β_{1t} − β_{2t}[1 − χ_t − β_{1t}χ_t])}
  β̇_{2t} = [α_{2t}γ_t/(σ_X²σ_Y²(1 − χ_t))]{σ_X²β_{2t}²(1 − χ_t) + 2σ_Y²α_{2t}β_{2t}χ_t(1 − β_{1t}) − α_{2t}γ_tχ_t(2v_{6t} + β_{2t}v_{3t})}
  β̇_{3t} = [α_{2t}γ_t/(σ_X²σ_Y²(1 − χ_t))]{−σ_X²β_{1t}(1 − χ_t)β_{3t} + 2σ_Y²α_{2t}β_{3t}χ_t(1 − β_{1t}) − α_{2t}β_{3t}γ_tχ_tv_{3t}}
  γ̇_t = −γ_t²α_{2t}²/σ_Y²,

with v_{3T} = v_{6T} = 0, (β_{1T}, β_{2T}, β_{3T}) given by the corresponding static Nash coefficients in (21) evaluated at χ_T, and γ_0 = γ_o, where α_{2t} := β_{3t} + β_{1t}χ_t and χ_t is as in the previous lemma. We have the following:

Theorem 1 (Existence of LME—private values). Let σ_X ∈ (0, ∞) and r = 0. There exists a strictly positive T(γ_o) ∈ O(1/γ_o) such that, for all T < T(γ_o), there is an LME based on the solution to the previous BVP that satisfies β_{0t} = 0, β_{1t} + β_{2t} + β_{3t} = 1 and α_{2t} > 0, t ∈ [0, T].

The key step in the proof is to show that (β_1, β_2, β_3, v_3, v_6, γ) can be bounded uniformly over [0, T(γ_o)), for some T(γ_o) > 0, when γ_t ∈ [0, γ_o] at all times.
This implies that tracing the (parametrized) initial condition of γ in the backward IVP from 0 upwards will lead to at least one γ-path landing at γ_o, due to the continuity of the solutions with respect to the initial conditions, while the rest of the ODEs still admit solutions.

Figure 3 below illustrates the signaling coefficient α_2 for various values of σ_X: as the latter increases, the dashed lines rotate counterclockwise from the public to the no-feedback case, justifying our earlier focus on σ_X ∈ {0, ∞}. Interestingly, when r > 0 and σ_X < ∞, α_2 is nonmonotonic. Intuitively, a partially informative signal combines the increasing history-inference effect of the no-feedback case with the decreasing signaling motive driving the public case. Discounting weakens the latter, while the former grows over time even with a myopic leader. As σ_X increases, moreover, the history-inference effect gains strength and the maximum of α_2 shifts to the right. Only a fully dynamic model can uncover such effects.

Figure 3: Signaling coefficients for several values of σ_X between 0 and +∞; "β Public" and "α NF" denote the public and no-feedback coefficients, and "α Interior" denotes α_2 for interior values of σ_X. Panel (a): r = 0; panel (b): r = 1.

Theorem 1 can be easily generalized to accommodate a conflict of interest between the players. Indeed, the one-to-one mapping between γ and χ still holds for any best response of the myopic player that is a time-independent affine function of his expectation of the long-run player's action. We defer a discussion of this topic to the common-value setting, where we address asymmetries at a more general level.

Existence in the discounted case can be shown with identical methods. For sharper visual effects, we are potentially plotting beyond the interval of existence ensured by the theorem (which is a crude lower bound).

Common-value settings: fixed-point methods.
When α and δ are not proportional, χ can depend on both current and past values of γ—the dimensionality problem resurfaces. Our key observation is that finding a solution to any given instance of the BVP (20) is, mathematically, a fixed-point problem. Specifically, note that the static Nash equilibrium at time T depends on the value that χ takes at that point. The latter value, however, depends on how much signaling has taken place along the way, i.e., on values of the coefficients β⃗ at times prior to T. Those values, in turn, depend on the value of the equilibrium coefficients at T by backward induction—thus, we are back to the same point where we started.

Our approach therefore applies a fixed-point argument adapted from the literature on BVPs with intertemporal linear constraints (Keller, 1968) to our problem with intratemporal nonlinear constraints. Because the method is novel and has the generality required to become useful in other settings, we briefly elaborate on how it works.

In essence, we will be "shooting" six ODEs forward. Specifically, let t ↦ b t (s, γ o ) denote the corresponding solution, with s ∈ R⁶ parametrizing the initial value of (β⃗, v 1 , v 2 ). From Lemma 2, the last two components of b, i.e., γ and χ, admit solutions as long as the others do; moreover, there are no constraints on their terminal values. Thus, for our fixed-point argument, we can focus on the first six components in b := (β 0 , β 1 , β 2 , β 3 , v 1 , v 2 , γ, χ)ᵀ by defining the gap function

g(s) = B(χ T (s, γ o )) − D T ∫ 0 T f(b t (s, γ o )) dt,

where D T := diag(1, 1, 1, 1, 1, 1, 0, 0): g measures the gap between the accumulated change in (β⃗, v 1 , v 2 ) (last term) and its target value, B(χ T (s, γ o )). Importantly, B(χ) is nonlinear: the static equilibrium imposes nonlinear relationships across variables at time T. By definition, b 0 (s, γ o ) = s.
Consequently, it follows that

g(s) = s ⟺ B(χ T (s, γ o )) = s + D T ∫ 0 T f(b t (s, γ o )) dt = D T b T (s, γ o ),

where the last equality follows from the definition of the ODE system that D T b satisfies. Thus, the shooting problem of finding s ∈ R⁶ such that B(χ T (s, γ o )) = D T b T (s, γ o ) can be restated as one of finding a fixed point of the function g. The goal is then to find a time T(γ o ) and a compact set S such that (i) for all s ∈ S, a unique solution to the aforementioned IVP over [0, T(γ o )] exists, and (ii) g is continuous from S to itself. The natural choice for S is a ball with center s := B(0), the terminal condition of the trivial game with T = 0; we then apply Brouwer's fixed-point theorem.

A BVP with intertemporal linear constraints (Keller, 1968) differs from ours in that our condition D 0 b 0 + D T b T = (B(χ T )ᵀ, γ o )ᵀ becomes A b 0 + B b T = ζ, where ζ is a constant column vector and A and B are general matrices. On the one hand, since A and B are not necessarily diagonal matrices, one may not be able to dispense with a subset of the system. On the other hand, our version of ζ is a nonlinear function of a subset of (endogenous) components of b T , which makes the fixed-point argument more involved.

We can now establish our main existence result for a variation of the leadership application in which the follower's best response is of the form â t = û âθ Ê t [θ] + û ââ Ê t [a t ] for (û âθ , û ââ ) as in Assumption 1; in particular, the myopic player's signaling coefficient is δ t = û âθ + û ââ α t . The BVP is stated in (B.8)–(B.14) in the Appendix.

Theorem 2 (Existence of LME—common values). Set σ X ∈ (0, ∞) and r = 0 in the leadership model, and let (û âθ , û ââ ) ∈ R² satisfy Assumption 1. There is a strictly positive function T(γ o ) ∈ O(1/γ o ) such that if T < T(γ o ), there exists an LME based on the BVP (B.8)–(B.14). In such an equilibrium, α > 0.
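To fix ideas, the gap-function construction can be replicated in a toy one-dimensional problem. The dynamics, the terminal map B, and all numbers below are invented for illustration; only the logic mirrors the text: shoot forward from a parametrized initial value s and look for a fixed point of the gap function g, at which the terminal value of the shot trajectory meets the nonlinear terminal condition.

```python
# Toy illustration of the gap-function fixed point.  The dynamics and the
# terminal map B are made up; B echoes the paper's terminal conditions of
# the form 1/(2 - chi_T).  We shoot a two-dimensional system (x, chi)
# forward from (s, 0) and seek s with g(s) = s, which is equivalent to the
# terminal value x_T(s) meeting the nonlinear condition B(chi_T(s)).
T, N = 1.0, 4000
DT = T / N

def B(chi):
    # invented nonlinear terminal condition (cf. beta_T = 1/(2 - chi_T))
    return 1.0 / (2.0 - chi)

def shoot_forward(s):
    """Euler-integrate the toy system (x, chi) from (s, 0); return (x_T, chi_T)."""
    x, chi = s, 0.0
    for _ in range(N):
        dx = -0.5 * x * (1.0 - chi)        # stand-in for a component of f(b_t)
        dchi = 0.4 * x * x * (1.0 - chi)   # stand-in for the chi-ODE
        x, chi = x + DT * dx, chi + DT * dchi
    return x, chi

def g(s):
    """Gap function: target terminal value minus the accumulated drift."""
    x_T, chi_T = shoot_forward(s)
    return B(chi_T) - (x_T - s)

# Start from s0 = B(0), the terminal condition of the trivial T = 0 game.
s = B(0.0)
for _ in range(200):
    s_new = g(s)
    if abs(s_new - s) < 1e-12:
        s = s_new
        break
    s = s_new
```

In this toy example g happens to be a contraction, so plain iteration converges; Theorem 2 instead obtains existence for the actual multidimensional system via Brouwer's theorem on a ball centered at B(0).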
There are three observations from this theorem. First, the time for which an LME is ensured to exist grows without bound as γ o ↘ 0. Indeed, f(·) naturally scales with this parameter, so the solutions converge to the full-information benchmark (β 0 , β 1 , β 2 , β 3 , v 1 , v 2 , χ, γ) = (0, u a â [u aθ û ââ + û âθ ], u a â û ââ [u aθ û ââ + û âθ ]/(1 − u a â û ââ ), u aθ , 0, 0, 0, 0). Second, while existence is not ensured for T > T(γ o ), relative to Theorem 1 the result is not vacuous either. In fact, since s = B(0) is the center of S, we have that

g(s) − s = B(χ T (s, γ o )) − B(0) − D T ∫ 0 T f(b t (s, γ o )) dt.

Bounding B(χ T (s, γ o )) − B(0) therefore imposes an additional constraint relative to those that ensure that the system is uniformly bounded (which in turn bound the last integral). Finally, the bound T(γ o ) is obtained under minimal knowledge of the system: it is the outcome of bounds that only exploit the degree of the polynomials in f(b) and hence that do not exploit any relationship between the equilibrium coefficients. Thus, the proof technique is (i) fully general and (ii) improvable provided more is known about the system at hand.

Appendix B.3 sketches how the proof of Theorem 2 applies to the whole class of games satisfying Assumption 1. Moreover, observe that this method, by being able to handle multiple ODEs in each direction, has the power to be applied to other asymmetric games of learning beyond the class under study (see the concluding remarks for more on this topic).

Asymmetric games. Let us briefly elaborate on how Theorems 1 and 2 enable us to explore natural settings in which the players' rates of signaling are inevitably different. Since u a â = u aθ = 1/2, Assumption 1 reduces to: û âθ > −1; best responses intersect if û ââ < 2; and second-order inferences arise if (û âθ , û ââ ) ≠ (0, 0) and σ X > 0. When the follower plays â = M̂ as opposed to â = α M̂, his actions are more sensitive to his private information.
This, in turn, magnifies the leader's signaling in two ways: the leader has a stronger incentive to steer the follower's behavior (i.e., β increases), and due to the imperfect learning, the leader relies more on M (at the expense of L) to coordinate (i.e., β also increases). This results in more signaling and learning, also compounded by an overall higher χ in the history-inference effect (despite the negative direct impact that a more informative X has on the reliance on past play); the left and center panels in Figure 4 illustrate these forces. It is noteworthy that this example must utilize our most general Theorem 2, in spite of its a priori simplicity stemming from the follower's myopic best reply being independent of the leader's strategy.

Figure 4: Signaling (left) and learning (center) for Û(a, â, θ) = −λ(â − θ)² − (1 − λ)(â − a)², λ ∈ {0, 0.5, 1}. Right: Leader's strategy coefficients for follower payoffs −(â − (3/2)a)².

Asymmetries also naturally arise from a conflict of interest between the players. Let us now use Theorem 1. In the leadership application, fixing the leader's payoffs, incentives are misaligned if û ââ + û âθ ≠ 1 on the follower's side. Suppose then that û 0 = û âθ = 0 and û ââ > 1, i.e., the follower overreacts to the leader's action. If the horizon is sufficiently long, the leader has an initial incentive to invest in mitigating the follower's reaction by shrinking the latter's belief, so β and β (the weights on M and L, respectively) are negative. Furthermore, the right panel in Figure 4 shows that β is the main component in this attempt: the manipulation occurs largely via the leader jamming the public belief L, as her direct incentives to coordinate soon are strong.
Finally, as the time to enjoy the benefits of such manipulation shrinks, the leader accommodates the follower, and these effects reverse.

We have chosen to continue with the leadership application for expositional reasons. In the next section, we explore other applications based on extensions of our baseline model.

Extensions

We extend our model to allow (i) a quadratic terminal payoff in a career-concerns model, and (ii) the long-run player affecting the public signal in a trading model à la Kyle (1985).

Suppose that the long-run player is now an expert or politician with career concerns. This agent has a hidden ideological bias θ and takes repeated actions—for example, adopting positions on critical issues or making campaign promises. The mean of the prior distribution denotes the unbiased type—without loss, let us use the normalization µ = 0.

We interpret the myopic player as a news outlet that always attempts to report on the true bias, i.e., that maximizes −Ê t [(â t − θ)²] at all times. In turn, the politician's payoff is

−∫ 0 T (a t − θ)² dt − ψ â T ²,

with ψ > 0. Given the myopic player's preferences, the termination payoff takes the form −ψ M̂ T ², and so the politician has career concerns: she wants to appear as unbiased at the end of the horizon. But this long-term goal conflicts with her short-term ideological desires: in her flow payoff, she benefits from taking actions that conform to her bias.

The private nature of dY t = a t dt + σ Y dZ t Y is understood as the outlet having access to imperfect private sources regarding the politician's actions. In turn, dX t = M̂ t dt + σ X dZ t X is the outlet's news process: the (public) reporting on the bias is fair on average, but imperfect. When does the politician fare better? In settings where the reporting is precise—i.e., low σ X —and hence she can tailor her actions to her reputation? Clearly, noisier environments entail a direct cost: they introduce increased uncertainty over a concave objective.
The next result shows that increasing an agent's uncertainty over her own reputation, thereby undermining her ability to take appropriate actions, can be beneficial:

Proposition 5. (i) Suppose that σ X ∈ {0, +∞}. Then, for all ψ, T > 0 there exists an LME. Moreover, if ψ < σ Y ²/γ o , the LME is unique, and learning is lower and ex ante payoffs higher in the no-feedback case. (ii) If σ X ∈ (0, ∞), there exists T(γ o ) ∈ O(1/γ o ) s.t. an LME exists for all T < T(γ o ).

Politicians or experts with larger biases take more extreme actions, and hence the equilibrium strategy attaches a positive weight to the type. Because of career concerns, however, the greater the perceived value of M̂, the greater the incentive to manipulate it downward. With private monitoring, higher types therefore must offset higher beliefs from their perspectives, leading to a history-inference effect that dampens the signaling coefficient α. The belief is then less responsive from an ex ante perspective, which facilitates maintaining a reputation for neutrality. Indeed, provided that the objective is not too concave and the environment not too uncertain (which strengthen the direct cost), this strategic effect dominates.

Regarding (ii), because this environment is one of common values, one can establish the existence of an LME with minimal changes to the method behind Theorem 2. Indeed, the only difference is that our baseline BVP changes to incorporate terminal conditions that depend not only on χ T , but also on γ T via β T = −ψγ T /(σ Y ² + ψγ T χ T ): with terminal lump-sum payoffs, there are last-minute incentives to manipulate the myopic player's belief that decrease in the associated precision. Our approach does not vary with this dependence.

An asset with fixed fundamental value θ is traded in continuous time until date T, the time at which its true value is revealed, ending the game. A patient insider (the long-run player) privately observes θ prior to the start of the game.
As in Yang and Zhu (2019), a second trader has a technology that allows him to privately observe imperfect signals of the insider's trades; this player is myopic. Both players and a flow of noise traders submit orders to a market maker who then executes those trades at a public price L t = E[θ | F t X ].

We depart from the baseline model along three dimensions. First, the public signal—the total order flow—is dX t = (a t + â t ) dt + σ X dZ t X , which now includes the long-run player's action; hence, the myopic player learns from both the private monitoring channel and the public price. Second, the players' flow payoffs depend directly on L, interpreted as the action taken by the market maker: the myopic player's flow payoff is given by ξ(θ − L)â − â², where ξ ≥ 0, while the long-run player's flow payoff is (θ − L t )a t ; the inverse of the parameter ξ is a measure of transaction costs for the myopic player. Finally, observe that the long-run player's flow payoff is linear in her action a t at all instants t ∈ [0, T].

Following the literature, we seek an equilibrium in which the informed trader reveals her private information gradually over time through a linear strategy of the form (8). Hence, we require that the coefficients of the insider's strategy be C¹ functions over strict compact subsets of [0, T); we can then apply Lemmas 1 and 2 to such sets.

It is easy to show that the ex ante expectation of M̂ T ² is γ o − γ T , so that greater learning by the myopic player results in larger terminal losses for the long-run player. This reverses for slightly negative ψ, but so does the history-inference effect: there is more learning but again a higher payoff in the no-feedback case.

The C¹ requirement suffices for the total order to be "inverted" from the price for t < T (hence, it is without loss to make X the source of public information), while the open interval allows for the possibility of full revelation of information by time T.
The proof of Lemma 1 derives learning ODEs for an additive drift in X. If ξ = 0 (or σ Y = ∞), the model reduces to Kyle (1985), and hence an LME with trading strategy of the form β(θ − L) always exists. This is not the case when ξ > 0:

Proposition 6. Fix ξ > 0. For all σ Y > 0, there does not exist a linear Markov equilibrium.

With linear Markov strategies, the myopic player acquires private information about θ over time. Thus, the myopic player's own repeated trades carry further information to the market maker, beyond that which the market maker learns from the insider alone. This introduces momentum into the price from the insider's perspective, measured by a term ξ(m − l) in the drift of L. Future trades then become less attractive to the insider, thereby placing the insider in a race against herself that results in all her information being traded away in the first instant, regardless of the amount of noise in the private signal Y.

In a closely related result, Yang and Zhu (2019) show that a linear equilibrium can cease to exist in a two-period setting where a trader who only participates in the last round receives a sufficiently precise signal of an informed player's first-period trade; a mixed-strategy equilibrium emerges instead. More generally, the existence problem relates to how, with common information, an informed player's rush to trade depends on the number of trading opportunities. The analysis of Foster and Viswanathan (1994) is illuminating in this respect: in a setting with nested information structures, a better informed insider trades a commonly known piece of information first, exploiting her superior information only later. While there are important differences between our setups (in their model, the belief of the less informed player is always known to the more informed player, and the common source of information is exogenous) there is a unifying theme: once common information is created, there is pressure to trade quickly on it.
Such pressure increases with the number of rounds ahead.

We have examined an important departure from the vast literature on signaling games: the case of a sender who does not see the signals emanating from her actions. A complex "beliefs about beliefs" problem arises in this case and leads to a novel separation effect via a second-order belief channel. Our contributions—namely, constructing belief-dependent equilibria and quantifying the impact of this natural separation effect on outcomes, along with the necessary new methodologies introduced—are at the frontier of what is known in these settings.

The drift in X is additive there, and it is easy to see that the steps of Lemma 2 (with û âθ = ξ, û ââ = 0) go through for this case. The linearity of the setting prevents a Nash equilibrium from being defined at T. Thus, our argument does not stem from a problem with a BVP but rather an impossibility of indifference for the long-run player. For a similar result in a symmetric setting, see Back et al. (2000). Both the presence of a myopic counterpart and a quadratic trading cost for this player only strengthen our nonexistence result.

We conclude with a discussion of our modeling and expositional choices. First, the signal structure that we employ is important in that it enables us to "close" the set of states at the second order. If instead the long-run player had a stochastic type, more states would be needed at the very least; and if both players had access to imperfect private signals, beliefs of even higher order would be payoff-relevant. While these are interesting exercises, a natural question is whether behavior truly relies on such considerably more complex strategies.

Second, a model with a forward-looking receiver is a tractable extension that requires no major conceptual changes. In fact, most of the results are derived for, or can be generalized to, continuous coefficients in the myopic player's strategy.
Those coefficients would then satisfy ODEs capturing optimal dynamic behavior, but crucially (i) no additional states are needed, and (ii) the fixed-point argument is applicable to an enlarged boundary value problem. Such an extension, however, only brings an old known force to the analysis: since X is public, a forward-looking receiver would exhibit a traditional signal-jamming motive.

Our choice of applications stems from their proximity to our informational assumptions, but others are also plausible: a deception game for business or military strategy arises in an asymmetric version of our coordination game in which the leader enjoys miscoordination; a leadership model to study encouragement effects is obtained when complementarities between the state of the world and aggregate effort are allowed; and trading models with quadratic trading costs that restore existence can shed light on an informed trader's behavior.

Finally, while stylized, the linear-quadratic-Gaussian class is of great value. First, it uncovers effects that are likely to be key in other, more nonlinear, settings: the history-inference effect coupled with the time effects arising from learning seem to exhaust the forces present when behavior depends on the payoff-relevant aspects of the histories. Second, it permits the development of methods that are exportable to other settings: the fixed-point method for BVPs, by handling multidimensional shooting, has the power to be taken to asymmetric oligopolies with private information, or to reputation models with multidimensional types.

Appendix A: Proofs for Section 2

Preliminary results. We state standard results on ODEs (Teschl, 2012) which we use in the proofs that follow. Let f(t, x) be continuous from [0, T] × R n to R n , where T > 0.

- Peano's Theorem (Theorem 2.19, p.
56): There exists T′ ∈ (0, T) such that there is at least one solution to the IVP ẋ = f(t, x), x(0) = x 0 , over t ∈ [0, T′).

If, moreover, f is locally Lipschitz continuous in x, uniformly in t, then:

- Picard-Lindelöf Theorem (Theorem 2.2, p. 38): For (t 0 , x 0 ) ∈ [0, T) × R n , there is an open interval I over which the IVP ẋ = f(t, x), x(t 0 ) = x 0 admits a unique solution.

- Comparison theorem (Theorem 1.3, p. 27): If x(·), y(·) are differentiable, x(t 0 ) ≤ y(t 0 ) for some t 0 ∈ [0, T), and ẋ t − f(t, x(t)) ≤ ẏ t − f(t, y(t)) ∀ t ∈ [t 0 , T), then x(t) ≤ y(t) ∀ t ∈ [t 0 , T). If, moreover, x(t 1 ) < y(t 1 ) for some t 1 ∈ [t 0 , T), then x(s) < y(s) ∀ s ∈ [t 1 , T).

A.1: Proofs for Public Case

Proof of Proposition 1. We aim to characterize an LME in which the leader backs out the follower's belief from his action at all times, with strategies of the form a t = β t + β t M̂ t + β t θ and â t = Ê t [a t ] = β t + (β t + β t )M̂ t , where M̂ t := Ê t [θ], and the β it are deterministic, satisfying β t + β t ≠ 0, t ∈ [0, T]. From standard results in filtering theory, if the follower expects (a t ) t≥0 as above, then whenever he is on path his beliefs are θ ∼ N(M̂ t , γ t ), where

dM̂ t = (β t γ t /σ Y ²)[dY t − {β t + (β t + β t )M̂ t } dt], M̂ 0 = µ, and γ̇ t = −(γ t β t /σ Y )², γ 0 = γ o , (A.1)

where the term in braces is Ê t [a t ]. Let V : R² × [0, T] → R denote the leader's value function. The HJB equation is

rV = sup a∈R {−(a − θ)² − (a − â t )² + (β t γ t /σ Y ²)[a − β t − (β t + β t )m]V m + (β t γ t )²/(2σ Y ²) V mm + V t }.
We guess a quadratic solution V(θ, m, t) = v t + v t θ + v t m + v t θ² + v t m² + v t θm, from which the FOC in the HJB reads 0 = −(β t + β t m + β t θ − θ) − β t (θ − m) + (β t γ t /σ Y ²)[v t + 2mv t + θv t ] when the maximizer is a* := β t + β t m + β t θ.

From here, (v t , v t , v t ) = (σ Y ²β t /(β t γ t ), σ Y ²(β t − β t )/(β t γ t ), σ Y ²(2β t − 1)/(β t γ t )), due to the FOC holding for all (θ, m, t) ∈ R² × [0, T]. And since v iT = 0 for i ∈ {0, . . . , 5}, we deduce that (β T , β T , β T ) = (0, 1/2, 1/2), the myopic equilibrium coefficients.

Inserting a* into the HJB equation, and using the previous expressions for (v t , v t , v t ) to replace (v t , v t , v t , v̇ t , v̇ t , v̇ t ), yields an equation in β⃗ := (β , β , β ) and β⃗̇. Grouping by coefficients (θ, m, θ², . . . , etc.) in the latter, we obtain a system of ODEs for (v , v , v , β , β , β ): v̇ t = rv t + β t γ t (β t − β t ), v̇ t = rv t − β t β t , and v̇ t = 1 + rv t − β t , along with

(β̇ t , β̇ t , β̇ t ) = (rβ t β t , β t [r(2β t − 1) + β t β t γ t /σ Y ²], β t [r(2β t − 1) − β t β t γ t /σ Y ²]), (A.2)

with conditions (v T , v T , v T , β T , β T , β T ) = (0, 0, 0, 0, 1/2, 1/2). Inspecting (β , β , β , γ) delivers the remaining v i , as their ODEs are uncoupled from one another and linear in themselves. The existence of an LME then reduces to the BVP defined by the γ-ODE (A.1) and (A.2) above, with γ 0 = γ o and (β T , β T , β T ) = (0, 1/2, 1/2).

If instead of σ X = 0, Y is public, this holds also after deviations by the myopic player; see footnote 5.

We next study the backward (reversal of time) IVP problem, using a parametrized initial value for γ. Abusing notation,

(β̇ t , β̇ t , β̇ t , γ̇ t ) = β t × (−rβ t , r(1 − β t ) − β t β t γ t /σ Y ², r(1 − β t ) + β t β t γ t /σ Y ², β t γ t ²/σ Y ²), (A.3)

with initial conditions β = 0, β = β = 1/2, and γ = γ F ≥ 0. Define B t Pub := β t + β t .

Lemma A.1. Fix any γ F ≥ 0.
If a solution to the backward system exists over [0, T], then any such solution must have the following properties. If γ F > 0, then (i) B t Pub = 1 for all t ∈ [0, T], (ii) β t ∈ (1/2, 1) and β t ∈ (0, 1/2) for all t ∈ (0, T], (iii) β is monotonically increasing while β is monotonically decreasing, and (iv) γ is strictly increasing. If γ F = 0, then β t = β t = 1/2 and γ t = 0 for all t ∈ [0, T]. For any γ F ≥ 0, β ≡ 0.

Proof of Lemma A.1. Because the system (A.3) is C¹, the solution is unique when it exists. If γ F = 0, it is clear by inspection that (β , β , β , γ) = (0, 1/2, 1/2, 0) (uniquely) solves the IVP, so assume hereafter that γ F > 0. We first claim that β > 0. Indeed, let f β (t, β t ) denote the RHS of the β-ODE in (A.3). Letting x t := 0 for all t ∈ [0, T], we have β 0 = 1/2 > x 0 and β̇ t − f β (t, β t ) = 0 = ẋ t − f β (t, x t ); by the comparison theorem, the claim follows. Now, add the ODEs that β and β satisfy to get Ḃ t Pub = 2rβ t (1 − B t Pub ) with B 0 Pub = 1; because the RHS is of class C¹, it has a unique solution, which is clearly B Pub ≡ 1. Hence, β + β = 1 and β̇ t = β t [r(1 − β t ) + β t (1 − β t )γ t /σ Y ²], and we maintain the label f β (t, β t ) for its RHS. Defining x t := 1 for all t ∈ [0, T], then, x 0 = 1 > β 0 = 1/2, and β̇ t − f β (t, β t ) = 0 ≤ r = ẋ t − f β (t, x t ); thus, β < 1, and so β = 1 − β > 0. Since β > 0, γ is clearly strictly increasing, and hence γ t > 0, t ∈ [0, T]. Now, β̇ t > 0 whenever β t = 1/2, and thus β t > 1/2 and β t < 1/2, t ∈ (0, T].

We now turn to (iii). Since β̇ t + β̇ t = 0, we just show that β̇ > 0; in turn, it suffices to show that H t := β̇ t /β t = r(1 − β t ) + β t (1 − β t )γ t /σ Y ² > 0, t ∈ [0, T]. Observe that H 0 > 0, and with algebra it can be shown that if H t = 0, then Ḣ t = (1 − β t )β t ³γ t ²/σ Y ⁴ > 0. It follows that H > 0. Finally, β starts at 0 and solves a linear ODE, so from (A.3), β ≡ 0.
Also, as long as γ F > 0, γ > 0, so (v , v , v ) are well defined. It remains to show that there is a γ F > 0 such that γ T = γ o in the backward system while all the other ODEs admit solutions. As we argue in the proof of Theorem 1, it suffices to show that the solutions are uniformly bounded when γ t ∈ [0, γ o ] for t ∈ [0, T]—refer to that proof for the details of the argument. Applied to this context, the bounds β , β , β ∈ [0, 1] imply that γ does not explode, so there is indeed a solution to the BVP, and hence an LME exists. To conclude, part (ii) in the proposition is implied by Lemma A.1, while the uniqueness property is shown in the online appendix.

A.2: Proofs for No-Feedback Case

Lemma A.2 (Belief Representation). Suppose that the follower expects a t = [β t + β t (1 − χ t )]µ + α t θ, where α = β + β χ, χ = 1 − γ/γ o , and γ t := Ê t [(θ − M̂ t )²]. Then γ̇ t = −(γ t α t /σ Y )². Moreover, if the leader follows (4), M t = χ t θ + (1 − χ t )µ holds at all times.

Proof of Lemma A.2. Anticipating a t = α t µ + α t θ, with α = β + β (1 − χ) and α = β + β χ, the myopic player's belief is θ ∼ N(M̂ t , γ t ), where dM̂ t = (α t γ t /σ Y ²)[dY t − (α t + α t M̂ t ) dt] and γ̇ t = −γ t ²α t ²/σ Y ². Thus, M̂ t = µR(t, 0) + ∫ 0 t R(t, s)(α s γ s /σ Y ²)[(a s − α s ) ds + σ Y dZ s Y ] and M t = µR(t, 0) + ∫ 0 t R(t, s)(α s γ s /σ Y ²)(a s − α s ) ds, where R(t, s) = exp(−∫ s t α u ²γ u /σ Y ² du). Solving for M after inserting a t = β t µ + β t M t + β t θ, and imposing the representation, it is easy to conclude that (5) will hold if and only if χ̇ t = (α t ²γ t /σ Y ²)(1 − χ t ). By arguments analogous to those used for Lemma 2, the (γ, χ)-ODE pair admits a unique solution, and it satisfies χ = 1 − γ/γ o .

Proof of Proposition 2. If the leader uses a t = β t µ + β t M t + β t θ then, using the representation M t = χ t θ + (1 − χ t )µ, â t = Ê t [a t ] = α t µ + α t M t , where α t := β t + β t [1 − χ t ] and α t = β t χ t + β t .
Taking an expectation in the leader's flow payoff −(a t − θ)² − (a t − â t )² then yields that (θ, M t , t) is the relevant state on and off path. (Indeed, expanding the squares in the previous expression, the only nontrivial component is E t [â t ²], which makes E t [M̂ t ²] appear; however, E t [M̂ t ²] = M t ² + E t [(M̂ t − M t )²] = M t ² + γ t χ t after all private histories.)

We can then set up the HJB equation. Since dM t = (α t γ t /σ Y ²)(a − α t − α t m) dt from the proof of Lemma A.2,

rV = sup a∈R {−(a − θ)² − (a² − 2a[α t + α t m] + α t ² + 2α t α t m + α t ²[m² + γ t χ t ]) + V t + (α t γ t /σ Y ²)(a − α t − α t m)V m }.

We then guess V(θ, m, t) = v t + v t θ + v t m + v t θ² + v t m² + v t θm and take analogous steps to those in the proof of Proposition 1. Namely, we first show that there is a core BVP consisting of (β , β , β , γ). Second, we construct a backward IVP version of our original BVP that has a parametrized initial condition γ F for the γ-ODE:

β̇ t = α t (2σ Y ²)⁻¹ × {−rσ Y ²β t (2 − χ t ) + rσ Y ²(1 − χ t ) − γ t β t (1 − χ t )} (A.4)
β̇ t = α t (2σ Y ²)⁻¹ × {rσ Y ² − β t [β t γ t + rσ Y ²(2 − χ t )] + 2β t γ t (1 − χ t )} (A.5)
β̇ t = α t (2σ Y ²)⁻¹ × {rσ Y ²(2 − χ t ) + 2β t [β t γ t − rσ Y ²(2 − χ t )]} (A.6)
γ̇ t = α t ²γ t ²/σ Y ² (A.7)

with initial condition (β , β , β , γ) = ((1 − χ 0 )/(2(2 − χ 0 )), 1/(2(2 − χ 0 )), 1/2, γ F ), and where χ = 1 − γ/γ o .

From the proof of Lemma A.2, E t [(M̂ t − M t )²] = E t [(∫ 0 t R(t, s)(α s γ s /σ Y ²)σ Y dZ s Y )²] = ∫ 0 t R(t, s)²α s ²γ s ²/σ Y ² ds = ∫ 0 t exp(2∫ s t (γ̇ u /γ u ) du)(−γ̇ s ) ds = ∫ 0 t (γ t /γ s )²(−γ̇ s ) ds = γ t ²(1/γ t − 1/γ o ) = γ t χ t . The detailed steps can be found in the online appendix.

We aim to prove that there exists γ F ∈ (0, γ o ) such that the IVP has a (unique) solution which satisfies γ T = γ o . (γ F = 0 cannot work, as (β , β , β , γ) = (0, 1/2, 1/2, 0) is the unique solution.)
As argued in the proof of Proposition 1, it suffices to show that the system is uniformly bounded if γ t ∈ [0, γ o ] over [0, T] (see the proof of Theorem 1 for further details). The α-ODE is α̇ t = f α (t, α t ) := rα t [1 − α t (2 − χ t )], with α 0 = 1/(2 − χ 0 ) > 0. By the comparison theorem, α > 0; hence, by the same argument as in the proof of Lemma A.1, γ is increasing (in the backward system), so χ = 1 − γ/γ o is decreasing. Since α 0 = 1/(2 − χ 0 ) and α̇ 0 > (d/dt)[1/(2 − χ t )]| t=0 , the comparison theorem can be applied to α and 1/(2 − χ) to show α t ≥ 1/(2 − χ t ) ≥ 1/2, with both inequalities strict for all t ∈ (0, T], for all r ≥ 0; in turn, α̇ t ≤ 0, with strict inequality for t ∈ (0, T] if and only if r > 0. It follows that for all t ∈ (0, T], α t ≤ α 0 = 1/(2 − χ 0 ) < 1. Further, B NF := β + β + β satisfies Ḃ t NF = (α t /σ Y ²){rσ Y ²(2 − χ t )[1 − B t NF ]} with B 0 NF = 1; thus B NF ≡ 1. By routine application of the comparison theorem to the backward system, β ∈ (1/2, 1) and β ∈ (0, 1/2); β is bounded too; thus, a solution to the BVP for (β , β , β , γ) exists, as discussed in the proof of Proposition 1. In the online appendix we check that the rest of the coefficients are well defined, ensuring the existence of an LME. The final claim is that α T → 1 as T → ∞ in the forward system. Indeed, since α > 1/2, we have γ T → 0 as T → ∞; thus χ T → 1 and α T = 1/(2 − χ T ) → 1, all in forward form.

For the proofs of Propositions 3 and 4, see the online appendix.

Appendix B: Proofs for Section 4

Proof of Lemma 1. We consider a drift of the form â t + νa t , ν ∈ [0, 1], for X. Also, let L in (10) denote a process that is measurable with respect to X. Inserting (10) into (8) yields a t = α t + α t L t + α t θ, which the myopic player thinks drives Y, where α t = β t , α t = β t + β t (1 − χ t ), and α t = β t + β t χ t (the latter often abbreviated α t in this appendix). The myopic player's filtering problem is then conditionally Gaussian.
Specifically, define

dX̂ t := dX t − [â t + ν(α t + α t L t )] dt = να t θ dt + σ X dZ t X ,
dŶ t := dY t − [α t + α t L t ] dt = α t θ dt + σ Y dZ t Y ,

which are in the myopic player's information set, and where the last equalities hold from his perspective. By Theorems 12.6 and 12.7 in Liptser and Shiryaev (1977), his posterior belief is Gaussian with mean M̂ t and variance γ t (simply γ t in the main body) that evolve as

dM̂ t = (να t γ t /σ X ²)[dX̂ t − να t M̂ t dt] + (α t γ t /σ Y ²)[dŶ t − α t M̂ t dt] and γ̇ t = −γ t ²α t ²Σ, (B.1)

with Σ := ν²/σ X ² + 1/σ Y ². (These expressions still hold after deviations, which go undetected.) The long-run player can affect M̂ t via her choice of actions. Indeed, using that dX̂ t = ν(a t − α t − α t L t ) dt + σ X dZ t X and dŶ t = (a t − α t − α t L t ) dt + σ Y dZ t Y from her standpoint,

dM̂ t = (κ t + κ t a t + κ t M̂ t ) dt + B t X dZ t X + B t Y dZ t Y , where (B.2)

κ t = α t γ t Σ, κ t = −κ t [α t + α t L t ], κ t = −α t κ t , B t X = να t γ t /σ X , B t Y = α t γ t /σ Y . (B.3)

On the other hand, since the long-run player always thinks that the myopic player is on path, the public signal evolves, from her perspective, as dX t = (νa t + δ t + δ t M̂ t + δ t L t ) dt + σ X dZ t X . Because the dynamics of M̂ and X have drifts that are affine in M̂—with intercepts and slopes that are in the long-run player's information set—and deterministic volatilities, the pair (M̂, X) is conditionally Gaussian.
Thus, by the filtering equations in Theorem 12.7 in Liptser and Shiryaev (1977), M_t := E_t[M̂_t] and γ_{2t} := E_t[(M_t − M̂_t)^2] satisfy

dM_t = (κ_{0t} + κ_{1t}a_t + κ_{2t}M_t)dt [= E_t[(κ_{0t} + κ_{1t}a_t + κ_{2t}M̂_t)dt]] + [(σ_XB^X_t + γ_{2t}δ_{1t})/σ_X^2][dX_t − (νa_t + δ_{0t} + δ_{1t}M_t + δ_{2t}L_t)dt] (B.4)
γ̇_{2t} = 2κ_{2t}γ_{2t} + (B^X_t)^2 + (B^Y_t)^2 − (B^X_t + γ_{2t}δ_{1t}/σ_X)^2, (B.5)

with dZ_t := [dX_t − (νa_t + δ_{0t} + δ_{1t}M_t + δ_{2t}L_t)dt]/σ_X a Brownian motion from the long-run player's standpoint. Critically, observe that since (B.4) is linear, one can solve for M_t as an explicit function of past actions (a_s)_{s≤t}. We now derive the law of motion of L, taking the system above as a primitive (we do it for ν = 0, but it also holds otherwise). Setting ν = 0 and γ_2 = χγ_1 in the third ODE, and writing γ for γ_1, the first and third ODEs become (12)–(13). If now ν = 0, using (i)–(v) that define (κ⃗, B̂) yields that (B.6) becomes dL_t = (ℓ_{0t} + ℓ_{1t}L_t)dt + B_tdX_t, where

(ℓ_{0t}, ℓ_{1t}, B_t) = [σ_X^2(1 − χ_t)]^{−1} × (−γ_tχ_tδ_{1t}δ_{0t}, −γ_tχ_tδ_{1t}(δ_{1t} + δ_{2t}), γ_tχ_tδ_{1t}). (B.7)

That L_t coincides with E[θ|F^X_t] is proved in the Online Appendix. □

Proof of Lemma 2. Consider the system in (γ_1, γ_2, χ) from the proof of the previous lemma when ν = 0, and let δ_{1t} := û_âθ + û_aâα_t. The local existence of a solution follows from Peano's Theorem. Suppose that the maximal interval of existence is [0, T̃), with T̃ ≤ T. Since the system is locally Lipschitz continuous in (γ_1, γ_2, χ) uniformly in t ∈ [0, T], its solution over [0, T̃) is unique (Picard-Lindelöf). Applying the comparison theorem to the pairs {γ_1, 0} and {γ_1, γ^o}, we get γ_{1t} ∈ (0, γ^o] over [0, T̃). Hence, γ_2/γ_1 is well defined, and since it solves the χ-ODE, χ = γ_2/γ_1 by uniqueness. Replacing γ_2 = χγ_1 and ν = 0 in the χ-ODE then yields (13).
A second application of the comparison theorem, to {χ, 0} and {χ, 1}, then implies χ ∈ [0, 1) and γ_2 = χγ_1 ∈ [0, γ^o), over [0, T̃). Since the solution is bounded, if T̃ < T, it can be extended to T̃ by the continuity of the RHS of the system, and then subsequently extended beyond T̃ by Peano's theorem, a contradiction. But if T̃ = T, it can be extended to T, so the first part of the lemma holds. If β_3 ≠ 0, then γ̇_1 < 0 and χ̇ > 0 at t = 0, so by continuity of γ̇_1 and χ̇, there exists ε > 0 such that γ_{1t} < γ^o and χ_t > 0 for all t ∈ (0, ε), and by the comparison theorem, these strict inequalities hold up to time T. □

Proof of Lemma 3. Inserting ν = 0 in (B.3) defining (κ_0, κ_1, κ_2, B^X_t) yields that (B.4) becomes

dM_t = (γ_{1t}α_t/σ_Y^2)(a_t − [α_{0t} + α_{1t}L_t + α_tM_t])dt + (χ_tγ_{1t}δ_{1t}/σ_X)dZ_t,

where dZ_t := [dX_t − (νa_t + δ_{0t} + δ_{1t}M_t + δ_{2t}L_t)dt]/σ_X is a Brownian motion from the long-run player's standpoint. (All the results in this proof extend (i) to ν ∈ [0, 1] and (ii) to δ_0 a generic continuous function over [0, T], the latter case arising when the myopic player becomes forward looking.) As for L, this one follows from (14) using (B.7) and that dX_t = (δ_{0t} + δ_{1t}M_t + δ_{2t}L_t)dt + σ_XdZ_t from the long-run player's perspective when ν = 0.

We conclude with three observations. First, from (B.2) and (B.4), M̂_t − M_t is independent of the strategy followed, and hence so is Z_t, due to σ_XdZ_t = δ_{1t}(M̂_t − M_t)dt + σ_XdZ^X_t under the true data-generating process.
This strategic independence enables us to fix an exogenous Brownian motion Z and then solve the best-response problem with Z in the laws of motion of M and L; i.e., the so-called separation principle for control problems with unobserved states applies (see, for instance, Liptser and Shiryaev, 1977, Chapter 16). Second, it is clear from (18), (B.4)–(B.5), and the proof of Lemma 2 that no additional state variables are needed, due to γ_{2t} := E_t[(M_t − M̂_t)^2] = χ_tγ_{1t} holding irrespective of the strategy chosen. Third, the set of admissible strategies for the best-response problem then consists of all square-integrable processes that are progressively measurable with respect to (θ, M, L). This set is clearly the appropriate set, and richer than that in Definition 1. □

Proof of Lemma 4. By Lemma 2, the system of ODEs (12)–(13) admits a unique solution. The proof then consists of showing that (γ, χ(γ)) as in the lemma, with

c_1 = σ_Xu_aâ[√(1/σ_Y^2 + 4(u_aâ/[σ_Xσ_Y])^2) − 1/σ_Y] ∈ (0, c_2), c_2 = σ_Xu_aâ[√(1/σ_Y^2 + 4(u_aâ/[σ_Xσ_Y])^2) + 1/σ_Y] > 0, d = [σ_Yû_aâ]^2(c_1 + c_2)/σ_X^2 > 0,

is a solution. This is done in the Online Appendix, where we also show how to construct the candidate χ and the above coefficients. □

B.1: Proof of Theorem 1

In light of the generality of Theorem 2, we only sketch the proof of Theorem 1 here (all the details are in the Online Appendix). Specifically, the proof can be divided into two steps: reduction and shooting. In the reduction step, we show that the problem of solving the BVP stated in Section 4.3 can be translated to finding a value γ_F such that the backward IVP consisting of ODEs for (v_1γ, β_1, β_2, γ) with γ_0 = γ_F has a solution over [0, T] that shoots γ to γ^o at T; this is done after recognizing that v_2 and β_3 can be written as functions of the other variables in closed form. We then tackle the shooting step via a contradiction.
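The reduction-and-shooting logic can be illustrated numerically on a toy one-dimensional system. The dynamics below (a forward Riccati equation) and the target value are assumptions of this sketch, not the paper's system; the point is only the mechanics of searching over initial conditions γ_F until the solution "shoots" onto the prescribed terminal value.

```python
# Toy shooting method: integrate gamma' = -gamma^2 from a candidate initial
# value gamma_F and bisect on gamma_F until gamma_T hits a prescribed target
# (playing the role of gamma^o). All dynamics here are assumed for the sketch.
def gamma_T(gamma_F, T=1.0, dt=1e-4):
    """Euler integration of the toy Riccati ODE up to time T."""
    g = gamma_F
    for _ in range(int(T / dt)):
        g += -g ** 2 * dt
    return g

target, T = 0.5, 1.0
lo, hi = 0.0, 10.0                  # gamma_T is increasing in gamma_F
for _ in range(60):                 # bisection on the initial condition
    mid = 0.5 * (lo + hi)
    if gamma_T(mid, T) < target:
        lo = mid
    else:
        hi = mid
gamma_F = 0.5 * (lo + hi)
print(gamma_F, gamma_T(gamma_F))
```

For this toy ODE, gamma_T = gamma_F/(1 + gamma_F*T) in closed form, so the target 0.5 at T = 1 is hit at gamma_F = 1; the bisection recovers that value up to discretization error.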
Specifically, we consider the supremum over values γ̃_F such that all the IVPs with γ_0 = γ_F ∈ [0, γ̃_F) admit a solution over [0, T]; towards a contradiction, assume that γ_T < γ^o for all initial conditions in the maximal set induced (otherwise, our desired conclusion follows from the continuity of the solutions). Under this assumption, γ_t ∈ [0, γ^o] at all times, and one can show that for horizons as in the theorem, the solutions can be uniformly bounded over all initial values in the maximal set. But this in turn implies that a solution to our IVP over [0, T] exists for initial values of γ strictly above the supremum, a contradiction. (A final step of verification, i.e., checking that the rest of the coefficients are well defined as the last step for finding an LME, is a special instance of "Step 5" in the proof of Theorem 2, and is thus omitted.)

B.2: Proof of Theorem 2

After a change of variables β̃_2 = β_2/(1 − χ), ṽ_1 = v_1γ/(1 − χ), ṽ_2 = v_2γ/(1 − χ), the BVP is

dṽ_{1t}/dt = γ_t{−β_{1t}^2 + 2β_{1t}β̃_{2t} + β̃_{2t}^2 + ṽ_{1t}[α_t^2/σ_Y^2 + 2(û_âθ + û_ââα_t)^2χ_t/σ_X^2]} (B.8)

dṽ_{2t}/dt = γ_t{(1 − α_t)β_{1t} − β_{3t} + ṽ_{2t}(û_âθ + û_ââα_t)^2χ_t/σ_X^2 − β̃_{2t}χ_t} (B.9)

dβ_{1t}/dt = γ_t[4σ_X^2σ_Y^2(1 + û_âθχ_t)]^{−1} × {σ_X^2α_t([û_âθ + û_ââα_t]^2 − α_t[û_âθ + û_ââα_t]) + 4σ_X^2α_tβ_{1t}(α_t − β_{1t}) + ṽ_{1t}α_tχ_t(û_âθ + û_ââα_t)(û_âθ − β_{1t}) + 4β_{1t}χ_t[û_aθσ_Y^2 + û_âθα_t(û_âθσ_X^2 + 2û_ââσ_Y^2 − σ_X^2β_{1t})] + 4û_ââβ_{1t}α_tχ_t(û_âθσ_X^2 + û_ââσ_Y^2 + σ_X^2α_t[û_ââ − 1]) − σ_Y^2(û_âθ + û_ââα_t)^2β̃_{2t}χ_t^2 + 4σ_Y^2(û_âθ + û_ââα_t)β_{1t}(û_âθ − β_{1t})χ_t} (B.10)

dβ̃_{2t}/dt = γ_t[4σ_X^2σ_Y^2(1 + û_âθχ_t)]^{−1} × {σ_X^2α_t[û_aθ + 2β_{1t} + α_t(û_âθ[2û_ââ − 1] + 2β̃_{2t})] + 2σ_X^2α_t^2û_ââ(û_ââ − 1) + α_tχ_t(û_âθ + û_ââα_t)[−ṽ_{1t} + ṽ_{2t}(û_âθ − β_{1t})] + 4α_tχ_tσ_X^2[û_âθβ_{1t} + (û_aθ + û_ââα_t[2û_âθ + (û_ââ − 1)α_t])β̃_{2t}] − (û_âθ + û_ââα_t)[û_âθṽ_{1t}α_t + σ_Y^2β̃_{2t}(−û_âθ + 2β̃_{2t})]χ_t^2} (B.11)

dβ_{3t}/dt = γ_t[4σ_X^2σ_Y^2(1 + û_âθχ_t)]^{−1} × {−4σ_X^2α_tβ_{3t} − χ_t^2ṽ_{2t}α_t(û_âθ + û_ââα_t)(û_âθ − β_{1t}) + 2α_tχ_t(û_âθ + û_ââα_t)[−û_âθσ_X^2 + σ_X^2α_t(2û_âθ + [û_ââ − 1]α_t − 1)] − α_tχ_t^2ṽ_{1t}(û_âθ + û_ââα_t)^2 − α_tχ_t[2û_âθσ_X^2α_tβ_{1t} − σ_X^2β_{1t}] − 4σ_X^2χ_tα_tβ_{3t}[û_âθα_t(2û_ââ − 1) + û_ââα_t^2(û_ââ − 1) + û_âθ(û_âθ − β_{1t})] − σ_Y^2χ_t^2(û_âθ + û_ââα_t)^2(1 − α_t)β̃_{2t} + 8σ_Y^2(û_âθ + û_ââα_t)β_{3t}β̃_{2t}χ_t^2} (B.12)

dγ_t/dt = −α_t^2γ_t^2/σ_Y^2 (B.13)

dχ_t/dt = γ_t{α_t^2(1 − χ_t)/σ_Y^2 − (û_âθ + û_ââα_t)^2χ_t^2/σ_X^2}, (B.14)

with boundaries (γ_0, χ_0, ṽ_{1T}, ṽ_{2T}, β_{1T}, β̃_{2T}, β_{3T}) = (γ^o, 0, 0, 0, (û_ââ + 2û_âθ)/(2(2 − û_ââχ_T)), (û_ââ + 2û_âθ)/(2(2 − û_ââχ_T)), 1/2), and where α := β_3 + β_2χ. (By Lemma 2 and Assumption 1 part (ii) (using u_aθ = u_aâ = 1/2), the denominators are strictly positive, and hence well defined after all possible histories.)

The proof proceeds in five steps. The main task is to obtain a solution (ṽ_1, ṽ_2, β_1, β̃_2, β_3, γ, χ) to the boundary value problem for all T < T(γ^o); from there, it is straightforward to verify that an LME exists. (See Bonatti et al. (2017) for an application of this method to a symmetric oligopoly model featuring dispersed fixed private information, imperfect public monitoring, and multiple long-run firms.)

Step 1: Convert the BVP to a fixed-point problem in terms of a parameterized IVP. It is useful to introduce z = (ṽ_1, ṽ_2, β_1, β̃_2, β_3, γ, χ)^T and write the system of ODEs (B.8)–(B.14) as ż_t = F(z_t). We write z̃ = (z_1, z_2, . . . , z_5)^T and F̃(z) = (F_1(z), F_2(z), . . .
, F_5(z))^T. Define B: [0, 1] → R^5 by B(χ) = (0, 0, (û_ââ + 2û_âθ)/(2(2 − û_ââχ)), (û_ââ + 2û_âθ)/(2(2 − û_ââχ)), 1/2)^T, formed by writing the terminal value of z̃ as a function of χ. Define s_0 ∈ R^5 by s_0 = B(0) = (0, 0, (û_ââ + 2û_âθ)/4, (û_ââ + 2û_âθ)/4, 1/2)^T. For x ∈ R^n, let ||x||_∞ denote the sup norm, sup_{1≤i≤n}|x_i|. For any ρ > 0, let S_ρ(s_0) denote the closed ρ-ball around s_0, S_ρ(s_0) := {s ∈ R^5 | ||s − s_0||_∞ ≤ ρ}.

For all s ∈ S_ρ(s_0), let IVP-s denote the initial value problem defined by (B.8)–(B.14) and initial conditions (ṽ_{10}, ṽ_{20}, β_{10}, β̃_{20}, β_{30}, γ_0, χ_0) = (s, γ^o, 0). When a solution to IVP-s exists, it is unique as F is of class C^1; denote it by z(s), where z(s) = (z̃(s)^T, γ(s), χ(s))^T = (ṽ_1(s), ṽ_2(s), β_1(s), β̃_2(s), β_3(s), γ(s), χ(s))^T, and where we suppress the additional dependence on (γ^o, 0), which remain fixed. Note that such a solution solves the BVP if and only if

z̃_T(s) = B(χ_T(s)), (B.15)

as the initial values γ_0(s) = γ^o and χ_0(s) = 0 are satisfied by construction. Note also that z̃_T(s) = s + ∫_0^T F̃(z_t(s))dt; hence (B.15) is satisfied if and only if s is a fixed point of the function g: S_ρ(s_0) → R^5 defined by g(s) := B(χ_T(s)) − ∫_0^T F̃(z_t(s))dt. Note, moreover, that for any solution, we have by Lemma 2 that χ_t ∈ [0, χ̄), where we define χ̄ := 1 for this proof.

Step 2: Obtain sufficient conditions for IVP-s to have unique and uniformly bounded solutions for all s ∈ S_ρ(s_0), any ρ > 0. Specifically, for arbitrary K > 0, we ensure that the solution z̃_t(s) varies at most K from its starting point s for all t ∈ [0, T], and thus by the triangle inequality, this solution varies at most ρ + K from s_0. These bounds will be used later.

Lemma B.1. Fix γ^o, ρ, K > 0.
There exists a threshold T_SBC(γ^o; ρ, K) > 0 such that if T < T_SBC(γ^o; ρ, K), then for all s ∈ S_ρ(s_0) a unique solution to IVP-s exists over [0, T] with z̃_t(s) ∈ S_{ρ+K}(s_0) for all t ∈ [0, T]. We call this property the System Bound Condition (SBC).

Proof. Fix any s ∈ S_ρ(s_0). Since F̃ is of class C^1, a local solution exists, and solutions are unique given existence. We now construct bounds on F̃ by writing F̃(z(s)) = F̃(z(s) − s + s) and using the conjectured bounds ||z̃(s) − s||_∞ < K, γ ∈ (0, γ^o], χ ∈ [0, χ̄) for the solution, when it exists. Using these bounds on F̃, we identify T_SBC(γ^o; ρ, K) such that for all t < T_SBC(γ^o; ρ, K) the solution to IVP-s (exists and) satisfies the conjectured bounds. Note that the desired component-wise inequalities |z_{it}(s) − s_{0i}| < ρ + K, i ∈ {1, 2, . . . , 5}, imply the further bounds |ṽ_{1t}|, |ṽ_{2t}| < v̄(ρ, K) := ρ + K, |β_{1t}|, |β̃_{2t}| < β̄(ρ, K) := |û_ââ + 2û_âθ|/4 + ρ + K, |β_{3t}| < β̄_3(ρ, K) := 1/2 + ρ + K, and |α_t| < ᾱ(ρ, K) := β̄(ρ, K)χ̄ + β̄_3(ρ, K). A lower bound on the term 1 + û_âθχ_t in the denominators of (B.10)–(B.12) is u := min{1, 1 + û_âθ}; using that u_aθ = u_aâ = 1/2, Assumption 1 part (ii) implies u > 0. We then construct bounds h_i(γ^o; ρ, K), i ∈ {1, 2, . . . , 5}, proportional to γ^o, which bound the magnitudes of the RHS in (B.8)–(B.12).

Now, for arbitrary (ρ, K), define T_SBC(γ^o; ρ, K) := min_{i∈{1,2,...,5}} K/h_i(γ^o; ρ, K). We claim that, for any t < T_SBC(γ^o; ρ, K), if a solution exists at time t, then ||z̃_t(s) − s||_∞ < K, γ_t ∈ (0, γ^o] and χ_t ∈ [0, χ̄). To see this, suppose by way of contradiction that there is some s ∈ S_ρ(s_0) and some t < T_SBC(γ^o; ρ, K) at which a solution to IVP-s exists but either |z_{it}(s) − s_i| ≥ K for some i ∈ {1, 2, . . . , 5}, γ_t ∉ (0, γ^o] or χ_t ∉ [0, χ̄); let τ be the infimum of such times.
Now, by Lemma 2, it cannot be that γ_t ∉ (0, γ^o] or χ_t ∉ [0, χ̄) while z_t(s) exists, so (by continuity of z_t(s) in t) it must be that for some i, |z_{iτ}(s) − s_i| ≥ K, while γ_t ∈ (0, γ^o] and χ_t ∈ [0, χ̄) hold for t ∈ [0, τ]. By construction of h_i(γ^o; ρ, K), for all t ∈ [0, τ] we have |F_i(z_t(s))| ≤ h_i(γ^o; ρ, K) and thus |z_{iτ}(s) − s_i| ≤ ∫_0^τ |F_i(z_t(s))|dt ≤ τ · h_i(γ^o; ρ, K) < K, a contradiction. Hence, for T < T_SBC(γ^o; ρ, K), a (unique) solution to IVP-s exists over [0, T], since an explosion at any time would imply that the previous bound is violated at an earlier time. □

Step 3: Establish that g is a well-defined, continuous self-map on S_ρ(s_0) when T is below a threshold T(γ^o; ρ, K). The expression for the latter is shown in the proof of Lemma B.2 below.

Lemma B.2. Fix γ^o > 0, ρ > 0 and K > 0. There exists T(γ^o; ρ, K) ≤ T_SBC(γ^o; ρ, K) such that for all T < T(γ^o; ρ, K), g is a well-defined, continuous self-map on S_ρ(s_0).

Proof. First, the inequality T(γ^o; ρ, K) ≤ T_SBC(γ^o; ρ, K), which holds by construction (as carried out below), ensures that a unique solution to IVP-s exists for all s ∈ S_ρ(s_0), and hence g is well defined on S_ρ(s_0). Now, g(s) is equal to B(χ_T(s)) − [z̃_T(s) − s]. The continuity of g then follows from z̃_T and χ_T being continuous in s, and B in χ.

To complete the proof, we show that if T < T(γ^o; ρ, K), then g satisfies the condition ||g(s) − s_0||_∞ ≤ ρ for all s ∈ S_ρ(s_0), which we refer to as the Self-Map Condition (SMC). Note that g(s) − s_0 = ∆(s) − ∫_0^T F̃(z_t(s))dt, where ∆(s) := B(χ_T(s)) − B(0) takes the value ((û_ââ + 2û_âθ)/2)[(2 − û_ââχ_T(s))^{−1} − 1/2] in its third and fourth components and zero in the others.
The h_i(γ^o; ρ, K) constructed in the proof of the previous lemma will provide a bound for the components of ∫_0^T F̃(z_t(s))dt, but we must also bound ∆_3(s) = ∆_4(s). Recalling that χ ∈ [0, 1), the χ-ODE implies χ̇_t ≤ γ_t{α_t^2(1 − χ_t)/σ_Y^2} ≤ γ^oᾱ^2/σ_Y^2. We also have χ_T = ∫_0^T χ̇_tdt ≤ (γ^oᾱ^2/σ_Y^2)T. Next, observe that 2 − û_ââχ_T(s) ≥ φ := min{2, 2 − û_ââ} > 0, where strict inequality follows from Assumption 1 part (iv) using u_aâ = 1/2. Hence,

|∆_3(s)| = (|û_ââ + 2û_âθ|/2)|û_ââχ_T(s)/(2(2 − û_ââχ_T(s)))| ≤ |û_ââ + 2û_âθ||û_ââ|(γ^oᾱ^2/σ_Y^2)T/(4φ).

Accordingly, define ∆̄_i: R^2 → R_+ by ∆̄_i(γ^o; ρ, K) = |û_ââ + 2û_âθ||û_ââ|γ^oᾱ^2/(4σ_Y^2φ) for i ∈ {3, 4} and ∆̄_i(γ^o; ρ, K) = 0 for i ∈ {1, 2, 5}. Note that for all i ∈ {1, 2, . . . , 5}, ∆̄_i(γ^o; ρ, K) is proportional to γ^o, and by construction, |∆_i(s)| ≤ T∆̄_i(γ^o; ρ, K). Finally, define

T(γ^o; ρ, K) := min{ T_SBC(γ^o; ρ, K), min_{i∈{1,2,...,5}} ρ/(∆̄_i(γ^o; ρ, K) + h_i(γ^o; ρ, K)) }. (B.16)

We now have |g_i(s) − s_{0i}| = |∆_i(s) − ∫_0^T F_i(z_t(s))dt| bounded above by |∆_i(s)| + ∫_0^T |F_i(z_t(s))|dt ≤ T∆̄_i(γ^o; ρ, K) + Th_i(γ^o; ρ, K) < ρ, where we have used that T < T(γ^o; ρ, K) ≤ ρ/(∆̄_i(γ^o; ρ, K) + h_i(γ^o; ρ, K)) by construction. Hence, |g_i(s) − s_{0i}| ≤ ρ for all i ∈ {1, 2, . . . , 5}, completing the proof. □

Step 4: Apply a fixed-point theorem to g to find s such that the solution to IVP-s solves the BVP. By Lemma B.2, we can apply Brouwer's Theorem: there exists s* ∈ S_ρ(s_0) such that s* = g(s*), and hence the solution to IVP-s* is a solution to the BVP. That T(γ^o) ∈ O(1/γ^o) follows simply from the denominators of the underlying expressions being proportional to γ^o. One can further optimize T(γ^o; ρ, K) over (ρ, K) to obtain T(γ^o).
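A minimal numerical instance of the Step 1–4 construction, on an assumed two-dimensional toy system (not the paper's), shows the map g and its fixed point at work: one state (γ below) has a fixed initial condition, the other (z1) has a prescribed terminal value, and iterating g on the free initial value solves the boundary value problem.

```python
# Toy version of the fixed-point construction in Steps 1-4 (assumed dynamics):
#   z1' = -gamma*z1 =: F1,   gamma' = -gamma^2,   gamma_0 = 1 fixed,
# with terminal condition z1_T = 1, so B(.) is the constant 1. IVP-s starts z1
# at s; since z1_T(s) = s + int_0^T F1 dt, the BVP holds iff s is a fixed
# point of g(s) := B - int_0^T F1(z_t(s)) dt.
def solve_ivp(s, T=1.0, dt=1e-4):
    """Euler-integrate IVP-s; return terminal z1 and the integral of F1."""
    z1, gamma, integral = s, 1.0, 0.0
    for _ in range(int(T / dt)):
        F1 = -gamma * z1
        integral += F1 * dt
        z1 += F1 * dt
        gamma += -gamma ** 2 * dt
    return z1, integral

def g(s, T=1.0):
    _, integral = solve_ivp(s, T)
    return 1.0 - integral          # B(.) minus the accumulated drift

s, T = 0.0, 1.0
for _ in range(200):               # fixed-point iteration; here g is a contraction
    s = g(s, T)

z1_T, _ = solve_ivp(s, T)
print(s, z1_T)
```

For this toy system, gamma_t = 1/(1+t) and z1_t = s/(1+t) in closed form, so g(s) = 1 + sT/(1+T): a contraction whose fixed point s* = 1 + T = 2 indeed delivers z1_T = 1. (The paper instead invokes Brouwer's theorem, since its map g need not be a contraction.)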
Step 5: Show that given a solution to the BVP as above, the remaining coefficients are well defined and thus an LME exists. Since γ > 0 and χ < 1 on [0, T] in our solution, the tuple (v_1, v_2, β_1, β_2, β_3, γ, χ) (obtained by reversing the change of variables at the beginning of the proof) solves our original boundary value problem by construction.

To verify that the remaining coefficients are well defined, consider first the ODE for α:

dα_t/dt = [α_t(û_âθ + û_ââα_t)γ_tχ_t/(σ_X^2σ_Y^2(1 + û_âθχ_t))] × {(û_âθ + û_ââα_t)[2σ_X^2α_t − ṽ_{1t}α_t − σ_Y^2β̃_{2t}χ_t] − σ_X^2α_t}.

By continuity of the solution to the BVP, the RHS of the equation above is locally Lipschitz continuous in α, uniformly in t. Moreover, α_T = β_{2T}χ_T + β_{3T} > 0, using u_aθ = u_aâ = 1/2. By the comparison theorem, α_t > 0 for all t ∈ [0, T].

Using the solution to the BVP and the inequalities above, we identify the remaining equilibrium coefficients. We have directly v_{3t} = σ_Y^2β_{1t}/(γ_tα_t), v_{4t} = σ_Y^2[β_{1t}(2 − χ_t) − β_{3t} − û_âθ]/(γ_tα_t), v_{5t} = −σ_Y^2(1 − β_{1t})/(γ_tα_t) and v_{6t} = σ_Y^2[β_{3t} − β_{1t}(1 − χ_t)]/(γ_tα_t). The last three are well defined as α, γ > 0. In the remaining ODEs, included in the online appendix, (β_0, v_0, v_1) is uncoupled from the other value coefficients. By inspection, the former has (unique) solution (β_0, v_0, v_1) = (0, 0, 0), and the solutions for the remaining pair of value coefficients can be obtained directly by integration, given their terminal values. □

B.3: Existence Proof Sketch for the General Model

In what follows, we refer the reader to the Mathematica file spm.nb on our websites. There, we work under the normalization ∂^2U/∂a^2 = ∂^2Û/∂â^2 = −1, as scaling flow payoffs by a factor does not affect incentives; consequently, U_xy = u_xy for x, y ∈ {a, â, θ} in that file. As in the paper, we first show that the task of finding LMEs can be reduced to a BVP in (v_1, v_2, β⃗, γ, χ).
In this BVP, Assumption 1 and the fact that χ ∈ [0, 1) ensure that the terminal conditions (with (1 − u_aâû_aâ) and (1 − u_aâû_aâχ_T) in their denominators), as well as the ODEs (with (u_aθ + u_aâû_âθχ_t) and (1 − u_aâû_aâ) in their denominators), are all well defined. The change of variables (ṽ_1, ṽ_2, β̃_2) = (v_1γ/(1 − χ), v_2γ/(1 − χ), β_2/(1 − χ)) takes us to a new well-defined BVP consisting of (ṽ_1, ṽ_2, β_1, β̃_2, β_3, γ, χ), where (1 − χ_t) is absent from the denominators.

The existence proof for the latter BVP now follows the same Steps 1–4 as in the proof of Theorem 2. Regarding Step 5: 1) 1 − χ > 0 and γ > 0 allow us to recover (v_1, v_2, β_2) from (ṽ_1, ṽ_2, β̃_2); 2) α_T ≠ 0 and the comparison theorem applied to α and 0 (in backward form) yield α ≠ 0, so (v_3, v_4, v_5, v_6) are well defined; 3) (β_0, v_0) form a linear system in themselves that does not contain the remaining value coefficients, so its solution exists and is unique; 4) the ODEs for the remaining value coefficients are linear in themselves and uncoupled, so they have unique solutions.

Proofs for Section 5: Refer to the online appendix.

References

Back, K., C. H. Cao, and G. A. Willard (2000): "Imperfect competition among informed traders," The Journal of Finance, 55, 2117–2155.

Bergemann, D. and P. Strack (2015): "Dynamic revenue maximization: A continuous time approach," Journal of Economic Theory, 159, 819–853.

Bonatti, A. and G. Cisternas (2019): "Consumer Scores and Price Discrimination," Review of Economic Studies.

Bonatti, A., G. Cisternas, and J. Toikka (2017): "Dynamic oligopoly with incomplete information," The Review of Economic Studies, 84, 503–546.

Bouvard, M. and R. Lévy (2019): "Horizontal Reputation and Strategic Audience Management," Journal of the European Economic Association.

Carlsson, H. and S. Dasgupta (1997): "Noise-proof equilibria in two-action signaling games," Journal of Economic Theory, 77, 432–460.

Cisternas, G.
(2018): "Two-sided learning and the ratchet principle," The Review of Economic Studies, 85, 307–351.

Daley, B. and B. Green (2012): "Waiting for News in the Market for Lemons," Econometrica, 80, 1433–1504.

Dessein, W. and T. Santos (2006): "Adaptive Organizations," Journal of Political Economy, 114, 956–995.

Dilmé, F. (2019): "Dynamic Quality Signaling with Hidden Actions," Games and Economic Behavior, 113, 116–136.

Ely, J. C. and J. Välimäki (2002): "A robust folk theorem for the prisoner's dilemma," Journal of Economic Theory, 102, 84–105.

Faingold, E. and Y. Sannikov (2011): "Reputation in continuous-time games," Econometrica, 79, 773–876.

Feltovich, N., R. Harbaugh, and T. To (2002): "Too Cool for School? Signalling and Countersignalling," The RAND Journal of Economics, 33, 630–649.

Foster, F. D. and S. Viswanathan (1994): "Strategic trading with asymmetrically informed traders and long-lived information," Journal of Financial and Quantitative Analysis, 29, 499–518.

——— (1996): "Strategic trading when agents forecast the forecasts of others," The Journal of Finance, 51, 1437–1478.

Garicano, L. (2000): "Hierarchies and the Organization of Knowledge in Production," Journal of Political Economy, 108, 874–904.

Grant, R. (1996): "Toward a knowledge-based theory of the firm," Strategic Management Journal, 17, 109–122.

Gryglewicz, S. and A. Kolb (2019): "Strategic Pricing in Volatile Markets," Tech. rep.

Heinsalu, S. (2018): "Dynamic Noisy Signaling," American Economic Journal: Microeconomics, 10, 225–249.

Hermalin, B. E. (1998): "Toward an economic theory of leadership: Leading by example," American Economic Review, 1188–1206.

Keller, H. B. (1968): Numerical Methods for Two-Point Boundary-Value Problems, Blaisdell Publishing Co.

Kolb, A. M. (2019): "Strategic real options," Journal of Economic Theory, 183, 344–383.

Kyle, A. S. (1985): "Continuous auctions and insider trading," Econometrica, 1315–1335.

Levin, J.
(2003): "Relational incentive contracts," American Economic Review, 93, 835–857.

Liptser, R. S. and A. Shiryaev (1977): Statistics of Random Processes 1, 2, Springer-Verlag, New York.

Mailath, G. J. and S. Morris (2002): "Repeated games with almost-public monitoring," Journal of Economic Theory, 102, 189–228.

Marschak, J. (1955): "Elements for a theory of teams," Management Science, 1, 127–137.

Marschak, J. and R. Radner (1972): Economic Theory of Teams, New Haven, CT: Yale University Press.

Matthews, S. A. and L. J. Mirman (1983): "Equilibrium limit pricing: The effects of private information and stochastic demand," Econometrica, 981–996.

Milgrom, P. R. and J. D. Roberts (1992): Economics, Organization and Management, Prentice-Hall.

Nonaka, I. (1991): "The knowledge-creating company," Harvard Business Review, 69, 96–104.

Phelan, C. and A. Skrzypacz (2012): "Beliefs and private monitoring," Review of Economic Studies, 79, 1637–1660.

Rantakari, H. (2008): "Governing Adaptation," Review of Economic Studies, 75, 1257–1285.

Sannikov, Y. (2007): "Games with imperfectly observable actions in continuous time," Econometrica, 75, 1285–1329.

Simon, H. A. (1957): Models of Man; Social and Rational, Wiley.

Teschl, G. (2012): Ordinary Differential Equations and Dynamical Systems, vol. 140, American Mathematical Society.

Williamson, O. E. (1996): The Mechanisms of Governance, Oxford University Press.

Yang, L. and H. Zhu (2019): "Back-Running: Seeking and Hiding Fundamental Information in Order Flows,"