Equilibrium Behaviors in Reputation Games
Yingkai Li ∗ Harry Pei † August 4, 2020
Abstract
We examine a patient player's behavior when he can build a reputation in front of a sequence of myopic opponents. With positive probability, the patient player is a commitment type who mechanically plays his Stackelberg action in every period. We characterize the patient player's action frequencies in equilibrium. Our results clarify the extent to which reputation effects can refine the patient player's equilibrium behavior.
Keywords: reputation, equilibrium behavior, equilibrium refinement, Wald’s identity
JEL Codes: D82, D83
1 Introduction

Economists have long recognized that individuals, firms, and governments can benefit from good reputations. As shown in the seminal work of Fudenberg and Levine (1989), a patient player can guarantee himself a high payoff when facing a sequence of myopic opponents who believe that the patient player might be committed to play a particular action. Their result can be viewed as an equilibrium refinement, which in many games of interest selects the patient player's highest equilibrium payoff in repeated complete information games.

This paper studies the effects of reputations on the patient player's behavior, which have been underexplored in the reputation literature.¹ Different from existing works on behavior that restrict attention to particular equilibria or to games with particular stage-game payoffs, we identify tight bounds on the patient player's action frequencies that apply to all equilibria under general stage-game payoffs. The motivation of our approach is to derive predictions that can be tested by researchers who do not know which equilibrium players coordinate on. Our findings also clarify the extent to which reputation effects can refine the patient player's behavior.

∗ Department of Computer Science, Northwestern University. Email: [email protected]
† Department of Economics, Northwestern University. Email: [email protected]
¹ This includes, for example, Bar-Isaac (2003), Phelan (2006), and Liu (2011). There is a separate strand of works that focuses on games with particular stage-game payoff functions and examines players' behaviors in finite-horizon reputation games, such as Kreps and Wilson (1982), Milgrom and Roberts (1982), Barro (1986), and Schmidt (1993). By contrast, we study players' behaviors in infinite-horizon games under more permissive conditions on stage-game payoffs.
We study a repeated game in which, with positive probability, the patient player is a commitment type who mechanically plays a particular action (his Stackelberg action) in every period. The myopic players learn about the patient player's type by observing all the actions taken in the past.

We examine the extent to which the option to imitate the commitment type can motivate the patient player to play his Stackelberg action. Theorem 1 characterizes tight bounds on the discounted frequencies (or frequencies, for short) with which the strategic-type patient player plays his Stackelberg action in equilibrium. We show that the maximal frequency equals one and that the minimal frequency equals the value of the following linear program, which can be computed in polynomial time: choose a distribution over action profiles in order to minimize the probability of the Stackelberg action subject to two constraints. First, each action profile in the support of this distribution satisfies the myopic player's incentive constraint. Second, the patient player's expected payoff from this distribution is no less than his Stackelberg payoff. Intuitively, the patient player can approximately attain his Stackelberg payoff by imitating the commitment type. In order to provide him an incentive not to play his Stackelberg action, his continuation value after separating from the commitment type must be at least his Stackelberg payoff.

The substantial part of our proof is to construct equilibria that approximately attain this lower bound when the patient player's discount factor is close to one. In order to calculate the patient player's action frequencies, we establish a discounted version of Wald's identity (Lemma A.5) that is of separate technical interest.

Theorem 2 builds upon Theorem 1 and identifies a sufficient condition under which a distribution over the patient player's actions corresponds to his action frequency in some equilibrium of the reputation game.
When the patient player's Stackelberg payoff coincides with his highest equilibrium payoff in the repeated complete information game (such as in the entry deterrence game and the product choice game), our sufficient condition is also necessary, in which case for every distribution over action profiles from which the patient player obtains his Stackelberg payoff, there exists an equilibrium of the reputation game in which the patient player's action frequencies coincide with this distribution. Our result implies that reputation effects cannot refine the patient player's behavior beyond the fact that his equilibrium payoff is weakly greater than his Stackelberg payoff.²

² Our results are robust when there are multiple pure-strategy commitment types. We exclude mixed-strategy commitment types for two reasons. First, as shown in Fudenberg and Levine (1992), the patient player's guaranteed payoff in reputation games with mixed-strategy commitment types can be strictly greater than his highest equilibrium payoff in the repeated complete information game. This goes against our interpretation that the presence of a commitment type is an equilibrium refinement. Second, commitment types that play the same mixed action in every period are hard to rationalize using rational types that have reasonable stage-game payoffs.
Related Literature:
Our paper contributes to the reputation literature by examining the patient player's behavior. Our research question contrasts with the ones in Fudenberg and Levine (1989, 1992), which focus exclusively on players' payoffs. Our result clarifies the extent to which reputation effects can refine the patient player's behavior in the repeated complete information games studied by Fudenberg, Kreps, and Maskin (1990).

Existing works that study players' behaviors in reputation games focus on finite-horizon games or restrict attention to particular equilibria or particular payoff structures. For example, Kreps and Wilson (1982) and Milgrom and Roberts (1982) characterize sequential equilibria in finite-horizon entry deterrence games. Schmidt (1993) characterizes Markov equilibria in repeated bargaining games. Phelan (2006), Ekmekci (2011), Liu (2011), and Liu and Skrzypacz (2014) restrict attention to games with monotone-supermodular payoffs or 2 × 2 games. Ekmekci and Maestri (2019) study the common properties of all equilibria in repeated games between two long-lived players in which the uninformed player chooses between continuing and irreversibly stopping the game. By contrast, the uninformed players can flexibly choose their actions in our model, and the reversibility of actions is crucial for our constructive proof. Pei (2020) provides sufficient conditions under which the patient player's on-path behavior is the same in all equilibria of a reputation game. In contrast to our model, which restricts attention to private-value environments but allows for general stage-game payoffs, his result requires nontrivial interdependent values and restricts attention to games with monotone-supermodular payoffs.

Cripps et al.
(2004) show that when the monitoring structure has full support, the myopic players eventually learn the patient player's type and play converges to an equilibrium of the repeated complete information game. However, their results do not characterize the speed of convergence or players' behaviors in finite time, and therefore do not pin down the discounted average frequencies with which the patient player plays each of his actions. By contrast, we focus on games with perfect monitoring and examine the patient player's discounted action frequencies. Our measure of the patient player's behavior is continuous at infinity, and therefore can be tested by researchers who can only observe players' behaviors in a finite number of periods.³

³ Our result imposes the additional requirement that the patient player has a unique Stackelberg action and that the myopic players have a strict best reply against this Stackelberg action. This is satisfied for generic stage-game payoffs given that the stage game is finite.
2 Model
Time is discrete, indexed by t = 0, 1, 2, .... A patient player 1 with discount factor δ ∈ (0, 1) interacts with an infinite sequence of myopic player 2s, arriving one in each period, each of whom plays the game only once. In period t, players simultaneously choose their actions (a_t, b_t) ∈ A × B, and receive stage-game payoffs u1(a_t, b_t) and u2(a_t, b_t). We assume that A and B are finite sets, with |A| ≥ 2 and |B| ≥ 2. Let BR1 : ∆(B) ⇒ 2^A\{∅} and BR2 : ∆(A) ⇒ 2^B\{∅} be player 1's and player 2's best-reply correspondences in the stage game. The set of player 1's (pure) Stackelberg actions is arg max_{a ∈ A} { min_{b ∈ BR2(a)} u1(a, b) }.

Assumption 1.
Player 1 has a unique Stackelberg action, and player 2 has a unique best reply against player 1's Stackelberg action.

Since A and B are finite sets, Assumption 1 is satisfied for generic u1 and u2, for example, when each player has a strict best reply against each of his opponent's pure actions. Let a∗ be player 1's Stackelberg action, and let b∗ be player 2's unique best reply against a∗. We call u1(a∗, b∗) player 1's (pure) Stackelberg payoff. Let

B∗ ≡ { β ∈ ∆(B) | ∃ α ∈ ∆(A) s.t. supp(β) ⊂ BR2(α) } ⊂ ∆(B),   (2.1)

which is the set of player 2's mixed actions that best reply against some α ∈ ∆(A). Since player 2s are myopic, they will never take actions that do not belong to B∗. As a result, player 1's minmax payoff is:

v̲ ≡ min_{β ∈ B∗} max_{a ∈ A} u1(a, β).   (2.2)

Assumption 2. (a∗, b∗) is not a Nash equilibrium of the stage game, and u1(a∗, b∗) > v̲.

Assumption 2 requires that player 1 have a strict incentive to deviate from his Stackelberg action in the one-shot game and that he can strictly benefit from committing to his Stackelberg action. This is satisfied (i) in the entry deterrence games of Kreps and Wilson (1982) and Milgrom and Roberts (1982), in which the incumbent's Stackelberg action is to fight potential entrants, even though its stage-game payoff is strictly higher when it accommodates entry; (ii) in the product choice game of Mailath and Samuelson (2001, 2006), in which the seller's Stackelberg action is to supply high quality, even though it can save costs by undercutting quality; (iii) in the monetary and fiscal policy games of Barro (1986) and Phelan (2006), in which the government's Stackelberg action is to set low inflation rates or low tax rates, although it is tempted to raise inflation in order to boost economic activity or to raise taxes in order to increase tax revenue.
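To fix ideas, the pure Stackelberg action arg max_{a} min_{b ∈ BR2(a)} u1(a, b) can be computed mechanically for any finite stage game. The following sketch is our own illustration, not part of the paper; the payoff numbers are assumed for the example only.

```python
# Toy 2x2 stage game (illustrative payoff numbers, not from the paper).
A = ["H", "L"]          # player 1's actions
B = ["T", "N"]          # player 2's actions
u1 = {("H", "T"): 1.0, ("H", "N"): -1.0, ("L", "T"): 2.0, ("L", "N"): 0.0}
u2 = {("H", "T"): 0.5, ("H", "N"): 0.0, ("L", "T"): -0.5, ("L", "N"): 0.0}

def BR2(a):
    """Player 2's pure best replies against player 1's pure action a."""
    best = max(u2[(a, b)] for b in B)
    return [b for b in B if u2[(a, b)] == best]

def stackelberg():
    """Pure Stackelberg action: argmax_a min_{b in BR2(a)} u1(a, b)."""
    value = {a: min(u1[(a, b)] for b in BR2(a)) for a in A}
    a_star = max(A, key=lambda a: value[a])
    return a_star, value[a_star]

a_star, v_star = stackelberg()
print(a_star, v_star)   # H 1.0 in this toy game
```

In this toy game (a∗, b∗) = (H, T) is not a stage-game Nash equilibrium, since player 1 strictly prefers L against T, so the example is consistent with the spirit of Assumption 2.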
Our assumption rules out coordination games such as the battle of the sexes, in which (a∗, b∗) is a Nash equilibrium of the stage game, and zero-sum games such as matching pennies and rock-paper-scissors.

Player 1 has private information about his type ω, which is perfectly persistent. Let ω ∈ {ωs, ωc}, in which ωc stands for a commitment type who mechanically plays a∗ in every period, and ωs stands for a strategic type who can flexibly choose his actions in order to maximize his discounted average payoff ∑_{t=0}^{+∞} (1 − δ) δ^t u1(a_t, b_t). Player 2s' prior belief attaches probability π0 ∈ (0, 1) to the commitment type.

Players' past actions can be perfectly monitored. A typical public history is denoted by h^t ≡ {a_s, b_s, ξ_s}_{s=0}^{t−1}, which consists of all actions taken in the past and the realizations of public randomization devices ξ_t ∈ [0, 1]. Let H^t be the set of h^t and let H ≡ ∪_{t ∈ N} H^t. Strategic-type player 1's strategy is σ1 : H → ∆(A). Player 2s' strategy is σ2 : H → ∆(B). Let Σ1 and Σ2 be the sets of player 1's and player 2s' strategies, respectively. The solution concept is (Bayes) Nash equilibrium. Let NE(δ, π0) ⊂ Σ1 × Σ2 be the set of Nash equilibria under parameter configuration (δ, π0). Since the stage game is finite and the game is continuous at infinity, NE(δ, π0) is non-empty for every δ and π0.

Existing Result on Equilibrium Payoffs:
The reputation result in Fudenberg and Levine (1989) implies that a patient player 1 can secure his Stackelberg payoff in all equilibria. Formally, for every π0 > 0 and ε > 0, there exists δ̄ ∈ (0, 1) such that for every δ > δ̄, we have:

inf_{(σ1, σ2) ∈ NE(δ, π0)} E_{(σ1, σ2)}[ ∑_{t=0}^{+∞} (1 − δ) δ^t u1(a_t, b_t) ] ≥ u1(a∗, b∗) − ε,   (2.3)

where E_{(σ1, σ2)}[·] is the expectation operator when player 1 uses σ1 and player 2s use σ2.

Inequality (2.3) unveils the significant effects of reputations on the patient player's payoff. Fudenberg and Levine (1989) view this reputation result as a refinement, which selects among the plethora of equilibria in repeated complete information games.⁴ According to the folk theorem result in Fudenberg, Kreps, and Maskin (1990), the patient player can attain any payoff between v̲ and

v̄ ≡ max_{(α, β): supp(β) ⊂ BR2(α)} min_{a ∈ supp(α)} u1(a, β)   (2.4)

in a repeated complete information game against a sequence of myopic opponents.⁵

⁴ Our result generalizes to models with multiple commitment types, as long as all commitment types play pure strategies. We exclude mixed-strategy commitment types since we focus on the refinement role of reputations. As shown in Fudenberg and Levine (1992), the patient player's lowest equilibrium payoff in a reputation game with mixed-strategy commitment types can be strictly greater than his highest equilibrium payoff in the repeated complete information game, which goes against our interpretation.
⁵ Establishing the common properties of all Nash equilibria is a common practice in the reputation literature. Although we focus on Nash equilibria, the equilibria we construct can survive standard refinements and are not driven by suboptimal behaviors or unreasonable beliefs off the equilibrium path. For example, they are also Perfect Bayesian equilibria and sequential equilibria.

One can verify that v̄ ≥
u1(a∗, b∗), which implies that introducing a commitment type that mechanically plays the Stackelberg action in every period selects equilibria in which player 1's payoff is between u1(a∗, b∗) and v̄. In games with monotone-supermodular payoffs, introduced in Definition 1, which include the product choice game and the entry deterrence game as special cases, v̄ = u1(a∗, b∗), in which case the reputation model selects equilibria in which the patient player attains his highest equilibrium payoff.

3 Results

We examine the extent to which the option to build a reputation can encourage the patient player to play his Stackelberg action, as well as how reputation effects can refine his behavior. We focus on the discounted frequencies of the patient player's actions. Formally, if the strategic-type patient player's strategy is σ1 and player 2s' strategy is σ2, then the discounted frequency of action a ∈ A is:

G_{(σ1, σ2)}(a) ≡ E_{(σ1, σ2)}[ ∑_{t=0}^{∞} (1 − δ) δ^t 1{a_t = a} ].   (3.1)

In Section 3.1, we characterize the range of discounted frequencies with which the patient player plays a∗ in equilibrium, i.e., the values of

inf_{(σ1, σ2) ∈ NE(δ, π0)} G_{(σ1, σ2)}(a∗)  and  sup_{(σ1, σ2) ∈ NE(δ, π0)} G_{(σ1, σ2)}(a∗)   (3.2)

when δ is close to 1. In Section 3.2, we characterize the set of action frequencies that can arise in equilibrium. When δ is above some cutoff, there exists an equilibrium (σ1, σ2) in which G_{(σ1, σ2)}(a∗) =
1. For example, σ1(h^t) = a∗ and σ2(h^t) = b∗ at every on-path history h^t. Once player 1 plays an action other than a∗, his continuation value equals his minmax payoff v̲. Such a punishment is feasible since player 1 separates from the commitment type after any deviation, and according to Fudenberg, Kreps, and Maskin (1990), there exists an equilibrium of the repeated complete information game in which player 1's payoff is v̲. This grim-trigger punishment provides the patient player an incentive to play a∗ given that u1(a∗, b∗) > v̲.⁶

⁶ By contrast, reputation results in models with mixed-strategy commitment types cannot be viewed as refinements, since the patient player's guaranteed payoff in the reputation game can be strictly higher than v̄ (for example, in the product choice game).

Theorem 1 characterizes a tight lower bound on the frequency with which the patient player plays a∗. Let

Γ ≡ { (α, b) ∈ ∆(A) × B | b ∈ BR2(α) },   (3.3)

which consists of pairs (α, b) such that b best replies against α. Let

F∗(u1, u2) ≡ min_{(α1, α2, b1, b2, q) ∈ ∆(A) × ∆(A) × B × B × [0, 1]} { q α1(a∗) + (1 − q) α2(a∗) },   (3.4)

subject to (α1, b1), (α2, b2) ∈ Γ, and

q u1(α1, b1) + (1 − q) u1(α2, b2) ≥ u1(a∗, b∗),   (3.5)

in which αi(a) is the probability of action a ∈ A under distribution αi ∈ ∆(A) for every i ∈ {1, 2}. One can verify that F∗(u1, u2) < 1 for every u1 and u2 that satisfy Assumption 2.

Theorem 1.
Suppose u1 and u2 satisfy Assumptions 1 and 2. For every π0 ∈ (0, 1),

lim_{δ→1} inf_{(σ1, σ2) ∈ NE(δ, π0)} G_{(σ1, σ2)}(a∗) = F∗(u1, u2).   (3.6)

Since players have access to a public randomization device, Theorem 1 implies that the frequency with which the patient player plays his Stackelberg action can be any number between F∗(u1, u2) and 1. When players' stage-game payoffs satisfy Assumption 2, the value of F∗(u1, u2) is strictly less than 1, which implies that there exist equilibria in which an arbitrarily patient player plays his Stackelberg action with frequency bounded away from one despite having the option to build a reputation.

For some intuition behind F∗(u1, u2), notice that the presence of the commitment type implies that the patient player can guarantee a payoff of approximately u1(a∗, b∗) by imitating the commitment type. Therefore, he has an incentive to play actions other than a∗ only if his continuation value after separating from the commitment type is no less than u1(a∗, b∗). This explains the necessity of constraint (3.5).

The substantial part of our result is to establish the existence of equilibria in which the patient player plays a∗ with frequency approximately F∗(u1, u2). This is nontrivial given that player 1's mixed actions cannot be perfectly monitored, which implies that player 1 needs to be indifferent when he is supposed to mix. However, player 1's indifference conditions are not in our linear program. This raises the concern that these indifference conditions may introduce additional constraints on players' action frequencies.
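As a numerical illustration (our own sketch, not part of the paper's analysis), the program (3.4)–(3.5) can be solved by enumerating the pair (b1, b2) and, for each pair, solving a linear program in the reweighted variables y1 = q·α1 and y2 = (1 − q)·α2, which makes both the best-reply and the payoff constraints linear. The payoffs below are an assumed product-choice-style parametrization (γ∗ = 0.5, b_T = b_N = 1), for which the minimum should be 0.5/1.5 = 1/3.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

# Illustrative payoffs (assumptions, not from the paper).
A = ["H", "L"]
B = ["T", "N"]
u1 = {("H", "T"): 1.0, ("H", "N"): -1.0, ("L", "T"): 2.0, ("L", "N"): 0.0}
u2 = {("H", "T"): 0.5, ("H", "N"): 0.0, ("L", "T"): -0.5, ("L", "N"): 0.0}
a_star, b_star = "H", "T"   # Stackelberg action and player 2's best reply

def F_star():
    """Minimize q*alpha1(a*) + (1-q)*alpha2(a*) subject to (3.5)-style
    constraints, via the change of variables y_i = q_i * alpha_i."""
    n = len(A)
    best = 1.0
    for b1, b2 in itertools.product(B, repeat=2):
        c = np.zeros(2 * n)
        c[A.index(a_star)] = 1.0          # y1(a*)
        c[n + A.index(a_star)] = 1.0      # y2(a*)
        A_ub, b_ub = [], []
        # Player 2's best-reply constraints: u2(alpha_i, b_i) >= u2(alpha_i, b').
        for i, bi in enumerate((b1, b2)):
            for b_alt in B:
                row = np.zeros(2 * n)
                for j, a in enumerate(A):
                    row[i * n + j] = u2[(a, b_alt)] - u2[(a, bi)]
                A_ub.append(row)
                b_ub.append(0.0)
        # Player 1's payoff is at least his Stackelberg payoff.
        row = np.zeros(2 * n)
        for j, a in enumerate(A):
            row[j] = -u1[(a, b1)]
            row[n + j] = -u1[(a, b2)]
        A_ub.append(row)
        b_ub.append(-u1[(a_star, b_star)])
        # Total mass one: sum(y1) + sum(y2) = 1 (default bounds keep y >= 0).
        res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                      A_eq=np.ones((1, 2 * n)), b_eq=np.array([1.0]))
        if res.success:
            best = min(best, res.fun)
    return best

print(F_star())   # ≈ 0.3333
```

The substitution y_i = q_i α_i is exactly the kind of reformulation that makes the problem polynomial-time solvable: infeasible pairs (b1, b2) are simply skipped, and the minimum over the |B|² feasible linear programs is returned.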
In addition, imperfect monitoring of mixed actions also implies that the patient player's payoff from either (α1, b1) or (α2, b2) cannot be attained in any equilibrium of the repeated game.

In order to illustrate the subtleties, consider the following game between a firm (row player) that chooses its effort level, H, M, or L, and a sequence of consumers (column player), each of whom chooses whether to trust the firm (T) or not (N). The firm's Stackelberg action is M. His Stackelberg payoff is 2, which is also his highest equilibrium payoff in the repeated complete information game. An action profile (α, T) in which α mixes between H and L belongs to Γ, and from this profile player 1's expected payoff is 2.5. According to (3.4) and (3.5), the value of F∗ is 0. Theorem 1 implies that there exist equilibria in which the patient player plays M with frequency arbitrarily close to 0.

However, the conclusion that the patient player plays M with frequency approximately zero seems to be at odds with the result that he can secure a payoff of approximately 2 in all equilibria. To see this, suppose there exists a history h^t at which the strategic-type patient player plays M with zero probability at every h^s ≽ h^t, i.e., only H and L are played in the continuation game. The folk theorem result in Fudenberg, Kreps, and Maskin (1990) suggests that player 1's continuation value at h^t is no more than 1, which is strictly lower than his Stackelberg payoff of 2. This implies that at every history where the strategic type plays actions other than M with positive probability, he needs to play M in an unbounded number of periods after separating from the commitment type in order to receive a continuation payoff close to 2. This requirement seems to be in conflict with the requirement that the frequency of M can be arbitrarily close to 0.

Our proof, which is in Appendix A, constructs equilibria in which the probability with which the patient player plays M infinitely often is strictly positive, but can be arbitrarily close to zero. We explain the main ideas using the above example. Play starts from a phase in which player 2 plays T and player 1 first mixes between M and L, and then mixes between H and L. If player 1 has played L too frequently in the past, then the continuation play enters an absorbing phase after which player 1 never plays M and his continuation value is at most 1. If player 1 has played H or M too frequently in the past, then the continuation play consists only of the outcome (M, T), from which player 1's continuation value is 2.
The calendar time at which play enters the absorbing phase, as well as player 1's continuation value in the absorbing phase, depends on the history of player 1's actions as well as on his discount factor, and these are constructed to provide him incentives to play his equilibrium mixed actions before play enters the absorbing phase.

An important step of our proof is to verify that the discounted frequency of action M is indeed close to 0 as δ approaches unity. We establish a discounted version of Wald's identity (Lemma A.5) that bounds the discounted average frequency of M from above, which might be of separate technical interest.

Computing F∗(u1, u2): We explain how to efficiently compute the value of F∗(u1, u2), which is a key step toward applying Theorem 1. First, we show that F∗(u1, u2) can be computed in polynomial time.

Proposition 1.
The value of F∗(u1, u2) can be computed in polynomial time.

The proof is in Appendix B. Intuitively, this is because for every (b1, b2) ∈ B × B, one can rewrite the constrained optimization problem as a linear program whose size is polynomial in |A| and |B|, and there are only |B|² such pairs given that A and B are finite sets.

Next, we show that the constrained optimization problem can be further simplified when players' stage-game payoff functions are monotone-supermodular, which fits a number of applications such as entry deterrence, business transactions, and fiscal policies.

Definition 1.
Players' payoffs are monotone-supermodular if A and B are totally ordered sets such that u1(a, b) is strictly decreasing in a, and u2(a, b) has strictly increasing differences in a and b.

Let a̲ be the lowest element in A and let b̲ ∈ B be player 2's best reply against a̲. If player 2 has multiple best replies against a̲, then pick the one that maximizes player 1's payoff. Let

Γ∗ ≡ { (α, b) ∈ Γ | |BR2(α)| ≥ 2 and b ∈ argmax_{b′ ∈ BR2(α)} u1(α, b′) },   (3.7)

which is a subset of Γ that consists of strategy profiles (α, b) in which player 2's incentive constraint is binding. One can verify that under generic (u1, u2), Γ∗ is a finite set.

Proposition 2.
If players' payoffs are monotone-supermodular, then:

F∗(u1, u2) = min_{(α1, α2, b1, b2, q) ∈ ∆(A) × ∆(A) × B × B × [0, 1]} { q α1(a∗) + (1 − q) α2(a∗) },

subject to (α1, b1), (α2, b2) ∈ Γ∗ ∪ {(a̲, b̲)}, and q u1(α1, b1) + (1 − q) u1(α2, b2) ≥ u1(a∗, b∗).

Proposition 2 implies that in games with monotone-supermodular stage-game payoffs, it is without loss of generality to choose (α1, b1) and (α2, b2) from a finite subset of Γ, which consists of strategy profiles in which either player 2's incentive constraint is binding or player 1's lowest-cost action is played with probability 1.⁷ To understand why, suppose toward a contradiction that {(α1, b1), (α2, b2), q} solves (3.4), player 2 has a strict incentive to play b1 against α1, and α1 does not attach probability 1 to player 1's lowest action. Then one can modify the distribution by slightly increasing the probability of a̲ and decreasing the probabilities of other actions, after which player 1's expected payoff strictly increases and the probability of action a∗ strictly decreases. This contradicts the presumption that {(α1, b1), (α2, b2), q} attains the minimum.

⁷ This definition resembles the one in Liu and Pei (2020) and Pei (2020), except that there is no state that affects players' payoffs, and furthermore, we do not require u2 to be strictly increasing in b.

We illustrate how to apply our result using the product choice game in Mailath and Samuelson (2006). Players' stage-game payoffs are given by:

      T              N
H   1, 1 − γ∗      −b_N, 0
L   1 + b_T, −γ∗    0, 0

where b_T > 0 is player 1's benefit from playing L instead of H when player 2 plays T, b_N ≥ 0 is his benefit from playing L instead of H when player 2 plays N, and γ∗ ∈ (0, 1) is the minimal probability with which player 1 needs to play H in order to induce player 2 to play T.
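For this parametrization, the minimization in Proposition 2 reduces to mixing between the binding profile (γ∗H + (1 − γ∗)L, T) and the lowest-cost profile (L, N). The following arithmetic sketch (our own check, with illustrative values γ∗ = 0.5 and b_T = 1, and the payoff normalizations u1(H, T) = 1 and u1(L, N) = 0 assumed above) solves the binding constraint for q.

```python
# Illustrative parameter values (assumptions for this check only).
gamma_star, b_T = 0.5, 1.0

u1_mix_T = 1.0 + (1.0 - gamma_star) * b_T   # u1(gamma* H + (1-gamma*) L, T)
u1_L_N = 0.0                                # u1(L, N): lowest-cost profile
u1_H_T = 1.0                                # Stackelberg payoff u1(H, T)

# Binding constraint: q * u1(mix, T) + (1 - q) * u1(L, N) >= u1(H, T),
# which gives q >= 1 / ((1 - gamma*) * b_T + 1).
q_min = (u1_H_T - u1_L_N) / (u1_mix_T - u1_L_N)
F_star = q_min * gamma_star                 # frequency of H under the minimizer

print(q_min, F_star)   # 0.666..., 0.333...
```

The result matches the closed form F∗(u1, u2) = γ∗ / ((1 − γ∗) b_T + 1) derived in the text.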
One can verify that this example satisfies Assumptions 1 and 2, and players' payoffs are monotone-supermodular once player 1's actions are ranked according to H ≻ L and player 2's actions are ranked according to T ≻ N. According to (3.7), Γ∗ contains only one action profile, (γ∗H + (1 − γ∗)L, T). Proposition 2 implies that:

F∗(u1, u2) = min_{q ∈ [0, 1]} q γ∗,  subject to  q u1(γ∗H + (1 − γ∗)L, T) + (1 − q) u1(L, N) ≥ u1(H, T),

from which we obtain that q ≥ 1/((1 − γ∗) b_T + 1), and therefore F∗(u1, u2) = γ∗/((1 − γ∗) b_T + 1).

Let

A ≡ { α∗ ∈ ∆(A) | ∃ q ∈ ∆(Γ) such that α∗ = ∫ α dq and ∫ u1(α, b) dq = u1(a∗, b∗) }   (3.8)

be the set of distributions over player 1's actions such that for every α∗ that belongs to this set, one can find a distribution over action profiles q ∈ ∆(Γ) from which player 1's expected payoff equals his Stackelberg payoff and the marginal distribution over his actions coincides with α∗.

Theorem 2.
Suppose u1 and u2 satisfy Assumptions 1 and 2. For every α∗ ∈ A and ε > 0, there exists δ̄ ∈ (0, 1) such that for every δ > δ̄, there exists (σ1, σ2) ∈ NE(δ, π0) such that:

|G_{(σ1, σ2)}(a) − α∗(a)| < ε for every a ∈ A.   (3.9)

Suppose u1 and u2 satisfy Assumptions 1 and 2, and u1(a∗, b∗) = v̄. For every α′ ∉ A, there exist η > 0 and δ̄ ∈ (0, 1) such that for every δ > δ̄ and every (σ1, σ2) ∈ NE(δ, π0), we have:

|G_{(σ1, σ2)}(a) − α′(a)| > η for some a ∈ A.   (3.10)

⁸ Assumption 2 implies that a∗ ≠ a̲ in games with monotone-supermodular stage-game payoffs.

The proof is in Appendix A. According to Theorem 2, every action distribution that belongs to A coincides with the patient player's action frequency in some equilibrium of the reputation game. In games where u1(a∗, b∗) = v̄, i.e., when reputation effects select the patient player's highest equilibrium payoff in the repeated complete information game, an action distribution is the patient player's discounted action frequency in some equilibrium if and only if it belongs to A.

In terms of refining the patient player's equilibrium behaviors in repeated complete information games, our result implies that when u1(a∗, b∗) = v̄, reputation effects cannot refine the patient player's behavior beyond the fact that his equilibrium payoff is weakly greater than his Stackelberg payoff.

4 Conclusion

We examine the effects of reputation on a patient informed player's equilibrium behavior, which contrasts with the existing literature that focuses on the patient player's equilibrium payoff.
Our analysis focuses on the patient player's discounted action frequencies and characterizes tight bounds that apply to all equilibria in a broad class of games.

Our results imply, first, that in games where the optimal commitment outcome is not a stage-game Nash equilibrium, the long-lived player may play his optimal commitment action with frequency bounded away from one no matter how patient he is. Second, when the patient player's optimal commitment payoff coincides with his highest equilibrium payoff in the repeated complete information game, reputation effects cannot further refine the patient player's behavior beyond the fact that his equilibrium payoff is at least his optimal commitment payoff.
A Proof of Theorems 1 and 2
Our proof consists of two parts. In Appendix A.1, we show that in every equilibrium, the discounted average frequency with which a patient player plays a∗ cannot be strictly lower than F∗(u1, u2). In other words,

lim inf_{δ→1} inf_{(σ1, σ2) ∈ NE(δ, π0)} G_{(σ1, σ2)}(a∗) ≥ F∗(u1, u2).   (A.1)

We then establish the second statement of Theorem 2: in games where v̄ = u1(a∗, b∗), any action distribution that does not belong to A cannot be the patient player's action frequency in any equilibrium.

In Appendix A.2, we provide a constructive proof of the sufficiency part of Theorem 1 and the first statement of Theorem 2. We construct a class of equilibria {(σ1^δ, σ2^δ)}_{δ ∈ (0, 1)} in which G_{(σ1^δ, σ2^δ)}(a∗) converges to F∗(u1, u2) as δ goes to 1, which implies that:

lim sup_{δ→1} inf_{(σ1, σ2) ∈ NE(δ, π0)} G_{(σ1, σ2)}(a∗) ≤ F∗(u1, u2).   (A.2)

More generally, for every α∗ ∈ A, we construct a sequence of equilibria such that G_{(σ1^δ, σ2^δ)}(a) converges to α∗(a) for every a ∈ A.

A.1 Part I: Necessity
Let ∆(Γ) be the set of probability distributions on Γ whose support has a countable number of elements. Let F(u1, u2, ε) be the value of the following constrained optimization problem:

F(u1, u2, ε) ≡ inf_{p ∈ ∆(Γ)} ∫ α(a∗) dp(α, b),   (A.3)

subject to

∫ u1(α, b) dp(α, b) ≥ u1(a∗, b∗) − ε.   (A.4)

Our proof of the necessity part of Theorem 1 consists of three lemmas, which together imply inequality (A.1).

Lemma A.1.
For every π0 > 0 and ε > 0, there exists δ̄ ∈ (0, 1) such that for every δ > δ̄,

G_{(σ1, σ2)}(a∗) ≥ F(u1, u2, ε) − 2(1 − δ) for every (σ1, σ2) ∈ NE(δ, π0).   (A.5)

Lemma A.2.
For every u1 and u2 that satisfy Assumptions 1 and 2, lim_{ε↓0} F(u1, u2, ε) = F(u1, u2, 0).

Lemma A.3.
For every u1 and u2 that satisfy Assumptions 1 and 2, F∗(u1, u2) = F(u1, u2, 0).
Proof of Lemma A.1:
The reputation result in Fudenberg and Levine (1989) implies that for every π0 > 0 and ε > 0, there exists δ̄ ∈ (0, 1) such that for every δ > δ̄,

E_{(σ1, σ2)}[ ∑_{t=0}^{+∞} (1 − δ) δ^t u1(a_t, b_t) ] ≥ u1(a∗, b∗) − ε/2 for every (σ1, σ2) ∈ NE(δ, π0).   (A.6)

For a given (σ1, σ2) ∈ NE(δ, π0), let H∗ be the set of on-path histories such that h^t ∈ H∗ if and only if

• a∗ was played from period 0 to t − 1, and σ1(h^t) assigns positive probability to actions other than a∗.

By construction, for every h^t ∈ H∗, player 2's posterior belief at h^t assigns probability at least π0 to the commitment type, and therefore player 1's continuation value at h^t is at least u1(a∗, b∗) − ε/
2. Let M ≡ max_{(a, b) ∈ A × B} |u1(a, b)|. For every a ∈ supp(σ1(h^t))\{a∗} and b ∈ supp(σ2(h^t)), player 1's continuation value at (h^t, a, b), denoted by v(h^t, a, b), satisfies:

v(h^t, a, b) ≥ δ^{−1}( u1(a∗, b∗) − ε/2 − (1 − δ)M ).

The right-hand side is strictly greater than u1(a∗, b∗) − ε when δ is close enough to 1. For every on-path history h^s such that h^s ≽ (h^t, a, b), player 2 attaches probability 1 to the strategic type at h^s, and therefore σ2(h^s) best replies against σ1(h^s). Therefore, (σ1(h^s), b) ∈ Γ for every b ∈ supp(σ2(h^s)). Let p_{(h^t, a, b)} ∈ ∆(Γ) be a probability measure on Γ such that for every (α, b′) ∈ Γ,

p_{(h^t, a, b)}(α, b′) ≡ E_{(σ1, σ2)}[ ∑_{s=t+1}^{∞} (1 − δ) δ^{s−t−1} 1{σ1(h^s) = α} σ2(h^s)(b′) | (h^t, a, b) ].   (A.7)

By construction, p_{(h^t, a, b)} has a countable number of elements in its support, and player 1's continuation value at (h^t, a, b) satisfies

v(h^t, a, b) = ∫ u1(α, b) dp_{(h^t, a, b)}(α, b) ≥ u1(a∗, b∗) − ε.   (A.8)

The definition of F(u1, u2, ε) in (A.3) and (A.4) then implies that:

G_{(h^t, a, b)}(a∗) ≡ E_{(σ1, σ2)}[ ∑_{s=t+1}^{∞} (1 − δ) δ^{s−t−1} 1{a_s = a∗} | (h^t, a, b) ] ≥ F(u1, u2, ε).   (A.9)

Next, we compute a lower bound on G_{(σ1, σ2)}(a∗). Let Ĥ be the set of on-path histories h^t ≡ (h^{t−1}, a_{t−1}, b_{t−1}) such that t ≥ 1, h^{t−1} ∈ H∗, and a_{t−1} ≠ a∗. Let p_{(σ1, σ2)}(h^t) be the ex ante probability of history h^t under the probability measure induced by (σ1, σ2). By definition, 1 − ∑_{h^t ∈ Ĥ} p_{(σ1, σ2)}(h^t) is the ex ante probability with which player 1 plays a∗ in every period conditional on him being the strategic type.
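The accounting behind the lower bound in (A.10) uses the fact that a history in Ĥ contributes (1 − δ^{t−1}) + δ^t X(h^t)(a∗) to the discounted frequency of a∗: a∗ is played in periods 0, ..., t − 2, some other action in period t − 1, and the continuation frequency from period t on is X(h^t)(a∗). A quick numerical check of this identity (with illustrative values, and a continuation in which a∗ is played forever, so X = 1):

```python
# Decomposition check: a* in periods 0..t-2, a deviation in period t-1,
# then a* forever after (continuation frequency X = 1).
delta, t = 0.9, 5
N = 10_000   # truncation horizon for the infinite sum

indicator = [1] * (t - 1) + [0] + [1] * (N - t)   # 1{a_s = a*} for s = 0..N-1
direct = (1 - delta) * sum(delta**s * x for s, x in enumerate(indicator))

formula = (1 - delta**(t - 1)) + delta**t * 1.0
print(abs(direct - formula) < 1e-6)   # True (up to truncation error)
```

The truncation error is of order δ^N, which is negligible here; the direct discounted sum agrees with the closed form.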
Therefore,
$$G_{(\sigma_1,\sigma_2)}(a^*) = \Big(1-\sum_{h^t\in\widehat{H}} p_{(\sigma_1,\sigma_2)}(h^t)\Big) + \sum_{h^t\in\widehat{H}} p_{(\sigma_1,\sigma_2)}(h^t)\Big((1-\delta^{t-1}) + \delta^t X(h^t)(a^*)\Big)$$
$$\geq -(1-\delta) + \Big(1-\sum_{h^t\in\widehat{H}} p_{(\sigma_1,\sigma_2)}(h^t)\Big) + \sum_{h^t\in\widehat{H}} p_{(\sigma_1,\sigma_2)}(h^t)\Big((1-\delta^{t}) + \delta^t X(h^t)(a^*)\Big)$$
$$\geq F(u_1,u_2,\varepsilon) - (1-\delta), \qquad (A.10)$$
where the first inequality uses $1-\delta^{t-1} \geq (1-\delta^t)-(1-\delta)$, and the second inequality uses (A.9) together with the fact that $(1-\delta^t)+\delta^t F(u_1,u_2,\varepsilon) \geq F(u_1,u_2,\varepsilon)$.

Proof of Lemma A.2:
By definition, the value of $F(u_1,u_2,\varepsilon)$ is a decreasing function of $\varepsilon$ and is bounded within $[0,1]$. Therefore, $\lim_{\varepsilon\downarrow 0} F(u_1,u_2,\varepsilon)$ exists, and moreover, $\lim_{\varepsilon\downarrow 0} F(u_1,u_2,\varepsilon) \leq F(u_1,u_2,0)$.

Next, we show that $\lim_{\varepsilon\downarrow 0} F(u_1,u_2,\varepsilon) \geq F(u_1,u_2,0)$. The optimization problem that defines $F(u_1,u_2,\varepsilon)$ implies that for every $\varepsilon > 0$, there exists $p_\varepsilon \in \Delta(\Gamma)$ with a countable number of elements in its support such that $\int \alpha_1(a^*)\,\mathrm{d}p_\varepsilon(\alpha_1,b) \leq F(u_1,u_2,\varepsilon) + \varepsilon$ and $\int u_1(\alpha_1,b)\,\mathrm{d}p_\varepsilon(\alpha_1,b) \geq u_1(a^*,b^*) - \varepsilon$.

According to Assumption 2, there exists $a' \in A$ such that $u_1(a',b^*) > u_1(a^*,b^*)$. According to Assumption 1, $b^*$ is player 2's strict best reply against $a^*$. This implies the existence of $\alpha^* \in \Delta(A)$ such that $\alpha^*(a^*) < 1$, $b^* \in BR(\alpha^*)$, and $u_1(\alpha^*,b^*) > u_1(a^*,b^*)$. Let $\rho \equiv u_1(\alpha^*,b^*) - u_1(a^*,b^*)$. Since the support of $p_\varepsilon$ is countable, there exists $\alpha^*_\varepsilon \in \Delta(A)$ such that $\alpha^*_\varepsilon(a^*) < 1$, $b^* \in BR(\alpha^*_\varepsilon)$, $u_1(\alpha^*_\varepsilon,b^*) - u_1(a^*,b^*) \geq \rho$, and $(\alpha^*_\varepsilon,b^*)$ does not belong to the support of $p_\varepsilon$. We construct the probability measure $p'_\varepsilon \in \Delta(\Gamma)$ according to:

• $p'_\varepsilon(\alpha^*_\varepsilon,b^*) \equiv \dfrac{\varepsilon}{\rho+\varepsilon}$;

• $p'_\varepsilon(\alpha_1,b) \equiv \dfrac{\rho}{\rho+\varepsilon}\, p_\varepsilon(\alpha_1,b)$ for every $(\alpha_1,b)$ that belongs to the support of $p_\varepsilon$.

By construction, $\int u_1(\alpha_1,b)\,\mathrm{d}p'_\varepsilon(\alpha_1,b) \geq u_1(a^*,b^*)$, and therefore
$$\frac{\varepsilon}{\rho+\varepsilon} + \frac{\rho}{\rho+\varepsilon}\Big(F(u_1,u_2,\varepsilon)+\varepsilon\Big) \geq \int \alpha_1(a^*)\,\mathrm{d}p'_\varepsilon(\alpha_1,b) \geq F(u_1,u_2,0). \qquad (A.11)$$
This implies that
$$\lim_{\varepsilon\downarrow 0}\Big\{\frac{\varepsilon}{\rho+\varepsilon} + \frac{\rho}{\rho+\varepsilon}\Big(F(u_1,u_2,\varepsilon)+\varepsilon\Big)\Big\} = \lim_{\varepsilon\downarrow 0} F(u_1,u_2,\varepsilon) \geq F(u_1,u_2,0).$$

Proof of Lemma A.3:
The inequality $F^*(u_1,u_2) \geq F(u_1,u_2,0)$ is implied by the definitions of $F^*(u_1,u_2)$ and $F(u_1,u_2,0)$. In what follows, we show that $F^*(u_1,u_2) \leq F(u_1,u_2,0)$. For every $\eta > 0$, there exists $p_\eta \in \Delta(\Gamma)$ with a countable number of elements in its support such that $\int \alpha_1(a^*)\,\mathrm{d}p_\eta(\alpha_1,b) \leq F(u_1,u_2,0) + \eta$ and $\int u_1(\alpha_1,b)\,\mathrm{d}p_\eta(\alpha_1,b) \geq u_1(a^*,b^*)$. Let $\Gamma_\eta$ be a countable subset of $\Gamma$ that contains the support of $p_\eta$. Consider the following minimization problem:
$$F_\eta \equiv \min_{p\in\Delta(\Gamma_\eta)} \sum_{(\alpha_1,b)\in\Gamma_\eta} p(\alpha_1,b)\,\alpha_1(a^*), \qquad (A.12)$$
subject to
$$\sum_{(\alpha_1,b)\in\Gamma_\eta} p(\alpha_1,b)\,u_1(\alpha_1,b) \geq u_1(a^*,b^*). \qquad (A.13)$$
By construction, $F_\eta \leq \int \alpha_1(a^*)\,\mathrm{d}p_\eta(\alpha_1,b) \leq F(u_1,u_2,0)+\eta$. We show that $F_\eta$ can be attained via a distribution with at most two elements in its support. The Lagrangian of the minimization problem is
$$\sum_{(\alpha_1,b)\in\Gamma_\eta} p(\alpha_1,b)\,\alpha_1(a^*) + \lambda\Big(\sum_{(\alpha_1,b)\in\Gamma_\eta} p(\alpha_1,b)\,u_1(\alpha_1,b) - u_1(a^*,b^*)\Big), \qquad (A.14)$$
where $\lambda$ is the Lagrange multiplier. If constraint (A.13) is not binding, then the minimum is zero and is attained by a degenerate distribution. If constraint (A.13) is binding, then for every pair of elements $(\alpha_1,b)$ and $(\alpha_1',b')$ in the support of the minimizer $p^*_\eta \in \Delta(\Gamma_\eta)$,
$$\alpha_1(a^*) + \lambda u_1(\alpha_1,b) = \alpha_1'(a^*) + \lambda u_1(\alpha_1',b'). \qquad (A.15)$$
Label the elements in the support of $p^*_\eta$ as $\{(\alpha_{1,i},b_i)\}_{i=1}^{+\infty}$. Equation (A.15) implies that for every $i,j$ with $\alpha_{1,i}(a^*) \neq \alpha_{1,j}(a^*)$,
$$\frac{u_1(\alpha_{1,i},b_i) - u_1(\alpha_{1,j},b_j)}{\alpha_{1,i}(a^*) - \alpha_{1,j}(a^*)} = -\frac{1}{\lambda}. \qquad (A.16)$$
Let $\overline{u} \equiv \sup_{i} u_1(\alpha_{1,i},b_i)$, $\underline{u} \equiv \inf_{i} u_1(\alpha_{1,i},b_i)$, $\overline{q} \equiv \sup_{i} \alpha_{1,i}(a^*)$, and $\underline{q} \equiv \inf_{i} \alpha_{1,i}(a^*)$. Equation (A.16) implies that $\dfrac{\overline{u}-\underline{u}}{\overline{q}-\underline{q}} = -\dfrac{1}{\lambda}$. Let $\gamma \in (0,1)$ be such that $\gamma\overline{u} + (1-\gamma)\underline{u} = u_1(a^*,b^*)$.
According to (A.16), we have $\gamma\overline{q} + (1-\gamma)\underline{q} = F_\eta$. Since $\Delta(A)\times B$ is compact, there exist $(\overline{\alpha}_1,\overline{b})$ and $(\underline{\alpha}_1,\underline{b})$, which are limit points of the set $\{(\alpha_{1,i},b_i)\}_{i=1}^{+\infty}$, such that $u_1(\overline{\alpha}_1,\overline{b}) = \overline{u}$, $\overline{\alpha}_1(a^*) = \overline{q}$, $u_1(\underline{\alpha}_1,\underline{b}) = \underline{u}$, and $\underline{\alpha}_1(a^*) = \underline{q}$. Since player 2's best reply correspondence is upper hemicontinuous, $(\overline{\alpha}_1,\overline{b}), (\underline{\alpha}_1,\underline{b}) \in \Gamma$. Our analysis above implies that there exists a distribution on $\Gamma_\eta \cup \{(\overline{\alpha}_1,\overline{b}),(\underline{\alpha}_1,\underline{b})\}$ with at most two elements in its support that satisfies constraint (A.4), with the value of the objective function (A.3) at most $F(u_1,u_2,0)+\eta$.

Take a decreasing sequence of positive real numbers $\{\eta_n\}_{n\in\mathbb{N}}$ such that $\lim_{n\to\infty}\eta_n = 0$. For every $n\in\mathbb{N}$, there exists $p_n \in \Delta(\Gamma)$ with at most two elements in its support that satisfies constraint (A.4), with the value of the objective function at most $F(u_1,u_2,0)+\eta_n$. Since $\Delta(A)\times B$ is compact, there exists a convergent subsequence $\{p_{k_n}\}_{n\in\mathbb{N}}$ whose limit $p^*$ has at most two elements in its support, satisfies constraint (A.4), and attains an objective value of at most $F(u_1,u_2,0)$. This implies that $F^*(u_1,u_2) \leq F(u_1,u_2,0)$.

Proof of Statement 2 of Theorem 2:
Since $v_1 = u_1(a^*,b^*)$, for every $\varepsilon > 0$, there exists $\overline{\delta} \in (0,1)$ such that player 1's payoff in every equilibrium with $\delta > \overline{\delta}$ is no more than $u_1(a^*,b^*)+\varepsilon$. Let
$$\mathcal{A}_\varepsilon \equiv \Big\{\alpha^* \in \Delta(A)\ \Big|\ \exists q \in \Delta(\Gamma) \text{ such that } \alpha^* = \int_{\alpha_1}\alpha_1\,\mathrm{d}q \text{ and } \Big|\int_{(\alpha_1,b)} u_1(\alpha_1,b)\,\mathrm{d}q - u_1(a^*,b^*)\Big| \leq \varepsilon\Big\}. \qquad (A.17)$$
Lemma A.1 implies that for every $\alpha^* \notin \mathcal{A}_\varepsilon$, there exist $\eta > 0$ and $\overline{\delta} \in (0,1)$ such that for every $\delta > \overline{\delta}$ and every $(\sigma_1,\sigma_2) \in NE(\delta,\pi_0)$, we have
$$\Big| G_{(\sigma_1,\sigma_2)}(a) - \alpha^*(a) \Big| > \eta \quad \text{for some } a \in A. \qquad (A.18)$$
The conclusion of Theorem 2 obtains since $\lim_{\varepsilon\to 0} \mathcal{A}_\varepsilon = \mathcal{A}$.

A.2 Part II: Sufficiency
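The sufficiency construction below hinges on a concentration inequality for discounted sums (Lemma A.5 below). As a quick numerical sanity check of that inequality, the following sketch estimates the crossing probability by Monte Carlo under made-up parameters (a binary $Z_t$, $\delta = 0.95$, $c = 2$); it is illustrative only and plays no role in the proof.

```python
import math
import random

# Check of the discounted concentration bound (Lemma A.5):
#   Pr[ sup_n  sum_{t=0}^n delta^t Z_t >= c ]  <=  exp(-r* c),
# where r* > 0 is the smallest positive root of E[exp(r Z)] = 1.
# All parameters here are hypothetical choices for illustration.
p_up = 0.3                     # Pr[Z = +1]; Pr[Z = -1] = 0.7, so E[Z] = -0.4 < 0
delta = 0.95
c = 2.0

# For Z in {+1, -1}: 0.3*e^r + 0.7*e^{-r} = 1 has positive root e^{r*} = 7/3.
r_star = math.log(7.0 / 3.0)
bound = math.exp(-r_star * c)  # equals (3/7)^2, about 0.184

random.seed(0)
trials, horizon = 10000, 200   # delta^200 is negligible, so 200 periods suffice
hits = 0
for _ in range(trials):
    s = 0.0
    for t in range(horizon):
        s += (delta ** t) * (1.0 if random.random() < p_up else -1.0)
        if s >= c:             # discounted partial sum crossed the threshold
            hits += 1
            break
estimate = hits / trials
print(f"empirical crossing probability {estimate:.4f} <= bound {bound:.4f}")
```

For $\delta = 1$ the crossing probability of this random walk is exactly $(3/7)^2$, i.e., the bound itself, which is consistent with the remark in the text that the inequality becomes Wald's identity in the undiscounted limit; discounting shrinks the increments and only makes crossing harder.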
First, we argue that in order to establish (A.2), it is without loss of generality to focus on solutions of the constrained optimization problem under which constraint (3.5) is binding. Suppose the constrained minimum of (3.4) is attained by $\{(\alpha_1,b_1),(\alpha_2,b_2),q\}$ in which $q u_1(\alpha_1,b_1) + (1-q)u_1(\alpha_2,b_2) > u_1(a^*,b^*)$. Since $a^*$ is player 1's unique Stackelberg action, for every $a' \neq a^*$, there exists $b' \in BR(a')$ such that $u_1(a',b') < u_1(a^*,b^*)$. Let $r \in [0,1]$ be defined via
$$r u_1(a',b') + (1-r)\Big(q u_1(\alpha_1,b_1) + (1-q)u_1(\alpha_2,b_2)\Big) = u_1(a^*,b^*).$$
Consider an alternative distribution $q' \in \Delta(\Gamma)$ that attaches probability $r$ to $(a',b')$, probability $(1-r)q$ to $(\alpha_1,b_1)$, and probability $(1-r)(1-q)$ to $(\alpha_2,b_2)$. The probability of $a^*$ is weakly lower under $q'$ than under $q$, and constraint (3.5) is binding. According to Lemma A.3, there exists a distribution over action profiles supported on two elements under which constraint (3.5) binds and (3.4) is minimized.

The above argument implies that the sufficiency part of Theorem 1 is implied by the first statement of Theorem 2, which we restate as Lemma A.4:

Lemma A.4.
For every $\alpha^* \in \mathcal{A}$ and $\varepsilon > 0$, there exists $\overline{\delta} \in (0,1)$ such that for every $\delta > \overline{\delta}$, there exists $(\sigma_1,\sigma_2) \in NE(\delta,\pi_0)$ such that
$$\Big|\mathbb{E}^{(\sigma_1,\sigma_2)}\Big[\sum_{t=0}^{\infty}(1-\delta)\delta^t\mathbf{1}\{a_t=a\}\Big] - \alpha^*(a)\Big| < \varepsilon \quad \text{for every } a \in A. \qquad (A.19)$$
Our constructive proof of Lemma A.4 hinges on the following concentration inequality, which can be viewed as a discounted version of Wald's identity. In the next lemma, we focus on the case in which $Z_t$ takes a positive value with positive probability. The other case is trivial, since the sum is then always non-positive.

Lemma A.5.
For every $\delta \in (0,1)$, every $c \geq 0$, and every sequence of i.i.d. random variables $\{Z_t\}$ with finite support and mean $\mu < 0$ such that $Z_t$ takes a positive value with positive probability, we have
$$\Pr\Big[\bigcup_{n=0}^{\infty}\Big\{\sum_{t=0}^{n}\delta^t Z_t \geq c\Big\}\Big] \leq \exp(-r^*\cdot c),$$
where $r^* > 0$ is the smallest positive real number such that $\mathbb{E}_{z\sim Z}[\exp(r^* z)] = 1$.

Proof. Let $\gamma_{Z,t}(r) \equiv \ln \mathbb{E}_{z\sim Z_t}[\exp(r z\delta^t)]$, and let
$$q_{Z,r,t}(z) \equiv p_Z(z)\exp\big(rz\delta^t - \gamma_{Z,t}(r)\big),$$
where $p_Z(z)$ is the probability mass function of the random variable $Z$. One can verify that $q$ is a well-defined probability measure. For a sequence of random variables $Z^n \equiv \{Z_0,\ldots,Z_n\}$, we have
$$q_{Z^n,r}(z_0,\ldots,z_n) = p_{Z^n}(z_0,\ldots,z_n)\exp\Big(\sum_{t=0}^{n} rz_t\delta^t - \sum_{t=0}^{n}\gamma_{Z_t,t}(r)\Big).$$
Let $s_n \equiv \sum_{t=0}^{n} z_t\delta^t$; we have
$$q_{S_n,r}(s_n) = p_{S_n}(s_n)\exp\Big(rs_n - \sum_{t=0}^{n}\gamma_{Z_t,t}(r)\Big).$$
Since $q_{S_n,r}$ is a probability measure, we have
$$\mathbb{E}\Big[\exp\Big(rs_n - \sum_{t=0}^{n}\gamma_{Z_t,t}(r)\Big)\Big] = 1. \qquad (A.20)$$
Let $\gamma(r) \equiv \ln\mathbb{E}_{z\sim Z}[\exp(rz)]$; we have $\gamma(0) = 0$ and $\gamma'(0) = \mathbb{E}_{z\sim Z}[z] < 0$. Since $r^* > 0$ is the smallest positive real number with $\mathbb{E}_{z\sim Z}[\exp(r^*z)] = 1$, we have $\gamma(r) \leq 0$ for every $0 \leq r \leq r^*$. Since the random variables $Z_t$ are i.i.d., we have
$$\gamma_{Z_t,t}(r^*) = \ln\mathbb{E}_{z\sim Z_t}\big[\exp(r^*z\delta^t)\big] = \ln\mathbb{E}_{z\sim Z}\big[\exp(r^*z\delta^t)\big] \leq 0 \quad \text{for every } t \geq 1.$$
By substituting $r = r^*$ into (A.20) and using $\sum_{t=0}^{n}\gamma_{Z_t,t}(r^*) \leq 0$, we have $\mathbb{E}[\exp(r^*s_n)] \leq 1$. Let $J$ be the stopping time at which the sum $s_J$ first exceeds the threshold $c$. We have
$$\Pr[s_J \geq c]\cdot\mathbb{E}\big[\exp(r^*s_J)\,\big|\,s_J\geq c\big] \leq 1,$$
which implies that
$$\Pr\Big[\bigcup_{n=0}^{\infty}\Big\{\sum_{t=0}^{n}\delta^t Z_t \geq c\Big\}\Big] = \Pr[s_J \geq c] \leq \exp(-r^*\cdot c).$$

Back to the proof of Lemma A.4. According to Assumption 2, there exists $\alpha' \in \Delta(A)$ such that $\alpha'(a^*) < 1$, $b^* \in BR(\alpha')$, and $u_1(\alpha',b^*) > u_1(a^*,b^*)$. Let $a' \neq a^*$ be such that $p \equiv \alpha'(a') > 0$ and $u_1(a',b^*) > u_1(a^*,b^*)$, and let $b' \in BR(a')$. Since $a^*$ is player 1's Stackelberg action and $a' \neq a^*$, we have $b' \neq b^*$.

For given $\alpha^* \in \mathcal{A}$, let $q \in \Delta(\Gamma)$ be such that $\alpha^* = \int\alpha_1\,\mathrm{d}q$ and $\int u_1(\alpha_1,b)\,\mathrm{d}q = u_1(a^*,b^*)$, and let $\varepsilon > 0$. Let $Z_1 = u_1(a^*,b^*) - u_1(a,b)$ be the random variable that

• equals $u_1(a^*,b^*) - u_1(a^*,b^*) = 0$ with probability $\varepsilon\,\alpha'(a^*)$;

• equals $u_1(a^*,b^*) - u_1(a',b^*)$ with probability $\varepsilon\,\alpha'(a')$;

• with probability $1-\varepsilon$, equals $u_1(a^*,b^*) - u_1(a,b)$, where $(a,b)$ is distributed according to $q$.

One can verify that $Z_1$ has finite support and $\mathbb{E}[Z_1] < 0$. Let $r_1^* > 0$ be the smallest positive real number such that $\mathbb{E}_{z\sim Z_1}[\exp(r_1^*z)] = 1$. Similarly, with $\bar{u} \equiv \varepsilon u_1(\alpha',b^*) + (1-\varepsilon)u_1(a^*,b^*)$, let $Z_2 = u_1(a,b) - \bar{u} - \varepsilon$ be the random variable that

• equals $u_1(a^*,b^*) - \bar{u} - \varepsilon$ with probability $\varepsilon\,\alpha'(a^*)$;

• equals $u_1(a',b^*) - \bar{u} - \varepsilon$ with probability $\varepsilon\,\alpha'(a')$;

• with probability $1-\varepsilon$, equals $u_1(a,b) - \bar{u} - \varepsilon$, where $(a,b)$ is distributed according to $q$.

Let $r_2^* > 0$ be the smallest positive real number such that $\mathbb{E}_{z\sim Z_2}[\exp(r_2^*z)] = 1$. Let
$$T \equiv \Big\lceil \frac{M+c}{u_1(a',b^*) - u_1(a^*,b^*)}\Big\rceil,$$
where $c \in \mathbb{R}_+$ is such that $\exp(-\min\{r_1^*,r_2^*\}\cdot c) \leq \varepsilon$. (Note that when $\delta = 1$, the inequality in Lemma A.5 holds with equality, which is the celebrated Wald's identity established in Wald (1944). Here we consider the case in which the random variable $Z_1$ takes a positive value with positive probability; as will become clear in the analysis, the case in which $Z_1$ has only non-positive support is trivial. We make the same assumption for $Z_2$ as well.)

In what follows, we construct an equilibrium when $\delta > \overline{\delta}$, with
$$\overline{\delta} = \max\Big\{(1-\varepsilon)^{1/T},\ 1-\varepsilon\Big\}. \qquad (A.21)$$

• Play starts from a preparation phase in which player 1 plays $\alpha'$ and player 2 plays $b^*$. This phase continues as long as action profile $(a^*,b^*)$ was played in all previous periods.

• If $(a',b^*)$ has been played before, then the following strategies are played repeatedly for infinitely many times, and we refer to each repetition as a stage.

1. At the beginning of each stage, both players follow strategy profile $(\alpha',b^*)$ for $T$ periods.

2. If action profile $(a',b^*)$ is observed for fewer than $T$ periods, then jump to Step 4.

3. If action profile $(a',b^*)$ is observed for $T$ periods, play enters a random walk phase. Let $\overline{T} \equiv \lceil \ln(1-\varepsilon)/\ln\delta\rceil$. For every integer $t \in [0,\overline{T}]$, if player 1's discounted payoff in the random walk phase is at least $(1-\delta^t)u_1(a^*,b^*) - c(1-\delta)$ and at most $(1-\delta^t)\big(\varepsilon u_1(\alpha',b^*) + (1-\varepsilon)\mathbb{E}_{(\alpha_1,b)\sim q}[u_1(\alpha_1,b)] + \varepsilon\big) + c(1-\delta)$, both players follow action profile $(\alpha',b^*)$ with probability $\varepsilon$ and follow the distribution over action profiles $q$ with probability $1-\varepsilon$, as dictated by the realization of the public randomization device. Otherwise, the random walk phase stops; let $T_1 \leq \overline{T}$ be the stopping time.

4. Both players follow action profile $(a',b')$ in the current stage until time $T_2$ such that player 1's discounted payoff in the current stage equals $(1-\delta^{T_2})u_1(a^*,b^*)$.
When the $T_2$ satisfying this requirement is not an integer, let $\xi_{T_2} \equiv T_2 - \lfloor T_2\rfloor$, and let the players use the public randomization device $\xi \sim U[0,1]$ to dictate the continuation play: both players follow action profile $(a',b')$ if $\xi \leq \xi_{T_2}$; otherwise, play enters the next stage.

• At every off-path history, player 2s have ruled out the possibility that player 1 is the commitment type, and the continuation play delivers player 1 his minmax payoff. This is feasible given the folk theorem result in Fudenberg, Kreps, and Maskin (1990).

In the above construction, player 1's discounted payoff in each stage equals $(1-\delta^{T_2})u_1(a^*,b^*)$, in which $T_2$ is the number of time periods in the stage. This implies that the strategic type has an incentive to play the mixed action at the beginning of the game in order to separate from the commitment type. In addition, one can verify that player 1 has no incentive to make any off-path deviation, since his expected continuation value at every on-path history is strictly greater than his minmax payoff when $\delta$ is sufficiently close to $1$.

In what follows, we show that inequality (A.19) holds, which is sufficient to imply the desired conclusion. Let $E_1$ be the event that player 1's discounted payoff in the random walk phase is less than $(1-\delta^t)u_1(a^*,b^*) - c(1-\delta)$. Let $E_2$ be the event that player 1's discounted payoff in the random walk phase is more than $(1-\delta^t)\big(\varepsilon u_1(\alpha',b^*) + (1-\varepsilon)\mathbb{E}_{(\alpha_1,b)\sim q}[u_1(\alpha_1,b)] + \varepsilon\big) + c(1-\delta)$. First, the probability that event $E_1$ happens is bounded from above by the probability that $\sum_{t=0}^{n}\delta^t z_t$ is greater than $c$ for some $n \geq 0$, where $z_t \sim Z_1$ for all $t$. According to Lemma A.5, the latter probability is bounded from above by $\exp(-r_1^*\cdot c) \leq \varepsilon$, which implies that $\Pr[E_1] \leq \varepsilon$. Similarly, we have $\Pr[E_2] \leq \varepsilon$. Let $E_3$ be the event that action profile $(a',b^*)$ is observed in each of the first $T$ periods of the stage; by definition, we have $\Pr[E_3] = p^T$.

We first show that $G_{(\sigma_1,\sigma_2)}(a) \leq \alpha^*(a) + \varepsilon$ for every $a \in A$.
Let $G$ denote the discounted number of times action $a$ is chosen from the beginning of each stage. By construction, we have
$$G \leq (1-\delta^T) + \big(1 - p^T(1-\varepsilon)\big)\delta^T G + (1-\varepsilon)p^T\delta^{T+\overline{T}}G + p^T\delta^T(1-\delta^{\overline{T}})\big(\varepsilon + (1-\varepsilon)\alpha^*(a)\big)$$
$$\Longrightarrow\quad G \leq \frac{1-\delta^T + p^T\delta^T(1-\delta^{\overline{T}})\big(\varepsilon + (1-\varepsilon)\alpha^*(a)\big)}{(1-\varepsilon)(1-\delta^{\overline{T}})\delta^T p^T + (1-\delta^T)} \leq \alpha^*(a) + \frac{\varepsilon}{1-\varepsilon}.$$
To interpret the above inequalities: the first term in the first inequality is the upper bound on the discounted number of times action $a$ is chosen from time $1$ to $T$; the second term is the upper bound on the discounted number of times action $a$ is chosen in future stages conditional on event $E_1\cup E_2$ happening; the third term is the upper bound conditional on $\neg(E_1\cup E_2)$ happening; and the last term is the upper bound on the discounted number of times action $a$ is chosen in the random walk phase. The second inequality holds by rearranging terms. By setting the parameter $\varepsilon \ll p^T$ and taking $\delta$ close enough to $1$, the last inequality holds, since $1-\delta^T \leq \varepsilon$ and $1-\delta^{\overline{T}} \approx \varepsilon$. Therefore, we have
$$\mathbb{E}^{(\sigma_1,\sigma_2)}\Big[\sum_{t=0}^{\infty}(1-\delta)\delta^t\mathbf{1}\{a_t=a\}\Big] \leq \sum_{t=0}^{\infty} p(1-p)^t\big((1-\delta^t) + \delta^t G\big) = \frac{(1-p)(1-\delta)}{1-(1-p)\delta} + \frac{p}{1-(1-p)\delta}\Big(\alpha^*(a) + \frac{\varepsilon}{1-\varepsilon}\Big) \leq \alpha^*(a) + \varepsilon,$$
where the last inequality holds for $\delta$ sufficiently close to $1$ and a sufficiently small choice of the mixing parameter in the construction.

Next, we show that $G_{(\sigma_1,\sigma_2)}(a) \geq \alpha^*(a) - \varepsilon$ for every action $a$. First, we provide upper bounds on the stopping time $T_2$ in different events.
When event $E_2\cap E_3$ happens, the stopping time $T_2$ satisfies
$$(1-\delta^{T+T_1})M + \delta^{T+T_1}(1-\delta^{T_2-T-T_1})u_1(a',b') \geq (1-\delta^{T_2})u_1(a^*,b^*)$$
$$\Longrightarrow\ \delta^{T_2} \geq \frac{u_1(a^*,b^*) - \delta^{T+T_1}u_1(a',b') - (1-\delta^{T+T_1})M}{u_1(a^*,b^*)-u_1(a',b')} \geq \delta^{T+T_1} - \frac{(1-\delta^{T+T_1})M}{u_1(a^*,b^*)-u_1(a',b')} \geq \delta^{T+\overline{T}} - \frac{(1-\delta^{T+\overline{T}})M}{u_1(a^*,b^*)-u_1(a',b')}. \qquad (A.22)$$
When event $(\neg E_2)\cap E_3$ happens, the stopping time $T_2$ satisfies
$$(1-\delta^T)M + \delta^T(1-\delta^{T_1})\big(\varepsilon u_1(\alpha',b^*) + (1-\varepsilon)\mathbb{E}_{(\alpha_1,b)\sim q}[u_1(\alpha_1,b)] + \varepsilon\big) + c(1-\delta) + \delta^{T+T_1}(1-\delta^{T_2-T-T_1})u_1(a',b') \geq (1-\delta^{T_2})u_1(a^*,b^*)$$
$$\Longrightarrow\ \delta^{T_2} \geq \frac{\delta^{T+T_1}\big(u_1(a^*,b^*)-u_1(a',b')\big) - (1-\delta^T)M - c(1-\delta) - \delta^T(1-\delta^{T_1})\big(\varepsilon u_1(\alpha',b^*)+\varepsilon\big)}{u_1(a^*,b^*)-u_1(a',b')} \geq \delta^{T+T_1} - \frac{\varepsilon(1-\delta^{T+T_1})(M+c)}{u_1(a^*,b^*)-u_1(a',b')} \geq \delta^{T+\overline{T}} - \frac{\varepsilon(1-\delta^{T+\overline{T}})(M+c)}{u_1(a^*,b^*)-u_1(a',b')}. \qquad (A.23)$$
Finally, when event $\neg E_3$ happens, the stopping time $T_2$ satisfies
$$(1-\delta^T)M + \delta^T(1-\delta^{T_2-T})u_1(a',b') \geq (1-\delta^{T_2})u_1(a^*,b^*)$$
$$\Longrightarrow\ \delta^{T_2} \geq \frac{u_1(a^*,b^*) - \delta^T u_1(a',b') - (1-\delta^T)M}{u_1(a^*,b^*)-u_1(a',b')} \geq \delta^T - \frac{(1-\delta^T)M}{u_1(a^*,b^*)-u_1(a',b')}. \qquad (A.24)$$
Let $G$ denote the discounted number of times action $a$ is chosen from the beginning of each stage. By construction, we have
$$G \geq (1-p^T)\Big(\delta^T - \frac{(1-\delta^T)M}{u_1(a^*,b^*)-u_1(a',b')}\Big)G + p^T(1-\varepsilon)\Big(\delta^{T+\overline{T}} - \frac{\varepsilon(1-\delta^{T+\overline{T}})(M+c)}{u_1(a^*,b^*)-u_1(a',b')}\Big)G$$
$$\qquad + p^T\varepsilon\Big(\delta^{T+\overline{T}} - \frac{(1-\delta^{T+\overline{T}})M}{u_1(a^*,b^*)-u_1(a',b')}\Big)G + p^T\delta^T(1-\delta^{\overline{T}})(1-\varepsilon)\alpha^*(a)$$
$$\Longrightarrow\quad G \geq \frac{p^T\delta^T(1-\delta^{\overline{T}})(1-\varepsilon)\alpha^*(a)}{p^T\delta^T(1-\delta^{\overline{T}}) + O(\varepsilon)} \geq \alpha^*(a)(1-\varepsilon) + O(\varepsilon).$$
The first term in the first inequality is the lower bound on the discounted number of times action $a$ is chosen in future stages conditional on event $\neg E_3$ happening; the second term is the lower bound conditional on $E_3\cap(\neg E_2)$ happening; the third term is the lower bound conditional on $E_3\cap E_2$ happening; and the last term is the lower bound on the discounted number of times action $a$ is chosen in the random walk phase. Finally, we have
$$\mathbb{E}^{(\sigma_1,\sigma_2)}\Big[\sum_{t=0}^{\infty}(1-\delta)\delta^t\mathbf{1}\{a_t=a\}\Big] \geq \sum_{t=0}^{\infty} p(1-p)^t\delta^t G = \alpha^*(a)(1-\varepsilon)\,\frac{p}{1-(1-p)\delta}\big(1+O(\varepsilon)\big) \geq \alpha^*(a) - \varepsilon,$$
where the last inequality holds for $\delta$ sufficiently close to $1$ and a sufficiently small choice of the mixing parameter in the construction. Combining these bounds, we have
$$\Big|\mathbb{E}^{(\sigma_1,\sigma_2)}\Big[\sum_{t=0}^{\infty}(1-\delta)\delta^t\mathbf{1}\{a_t=a\}\Big] - \alpha^*(a)\Big| \leq \varepsilon$$
for every action $a \in A$.

B Proof of Proposition 1
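Before the formal argument, the kind of two-point minimization that pins down the lower bound on the Stackelberg-action frequency can be illustrated numerically. The sketch below uses a hypothetical product-choice-style game (the payoffs are not from the paper) and replaces the linear program with a brute-force grid search over player 2's best-reply graph, which is a swapped-in approximation rather than the polynomial-time method analyzed below.

```python
import itertools

# Hypothetical product-choice game: A = {H, L}, B = {T, N},
# Stackelberg outcome (a*, b*) = (H, T). Payoffs are made up for illustration.
u1 = {('H', 'T'): 2.0, ('L', 'T'): 3.0, ('H', 'N'): 0.0, ('L', 'N'): 1.0}
u2 = {('H', 'T'): 1.0, ('L', 'T'): -1.0, ('H', 'N'): 0.0, ('L', 'N'): 0.0}
v_star = u1[('H', 'T')]        # u1(a*, b*)

# Grid approximation of Gamma = {(alpha, b): b is a best reply to alpha},
# storing (alpha(H), player 1's payoff) for each point.
Gamma = []
for i in range(201):
    aH = i / 200
    payoff2 = {b: aH * u2[('H', b)] + (1 - aH) * u2[('L', b)] for b in ('T', 'N')}
    for b in ('T', 'N'):
        if payoff2[b] >= max(payoff2.values()) - 1e-12:   # b in BR(alpha)
            Gamma.append((aH, aH * u1[('H', b)] + (1 - aH) * u1[('L', b)]))

# Minimize q*alpha(H) + (1-q)*alpha'(H) over pairs in Gamma subject to
# q*u1 + (1-q)*u1' >= u1(a*, b*); q is chosen optimally for each pair.
best = 1.0
for (x1, w1), (x2, w2) in itertools.product(Gamma, repeat=2):
    if w1 < v_star:
        continue               # the first point must deliver a high payoff
    q = 1.0 if w2 >= v_star else (v_star - w2) / (w1 - w2)
    best = min(best, q * x1 + (1 - q) * x2)

print(f"minimal discounted frequency of a* = H: {best:.4f}")  # 1/3 for these payoffs
```

Here the optimum mixes the commitment-like point $\alpha(H) = 1/2$ with $b = T$ (payoff $2.5 > 2$) against $(L, N)$ (payoff $1 < 2$) with weight $2/3$ on the former, so the patient player can play $H$ as rarely as $1/3$ of the time in some equilibrium of this example.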
Let
$$B^* \equiv \Big\{ b \in B\ \Big|\ \text{there exists } \alpha_1 \in \Delta(A) \text{ such that } b \in BR(\alpha_1)\Big\}. \qquad (B.1)$$
For every $(b_1,b_2) \in B^*\times B^*$, consider the following linear program with variables $\{\alpha_1(a)\}_{a\in A}$, $\{\alpha_2(a)\}_{a\in A}$, and $q$:
$$F^{**}(u_1,u_2,b_1,b_2) \equiv \min_{\{\alpha_1(a)\}_{a\in A},\,\{\alpha_2(a)\}_{a\in A},\,q}\ \Big\{ q\,\alpha_1(a^*) + (1-q)\,\alpha_2(a^*)\Big\}$$
subject to
$$q\sum_{a\in A}\alpha_1(a)\,u_1(a,b_1) + (1-q)\sum_{a\in A}\alpha_2(a)\,u_1(a,b_2) \geq u_1(a^*,b^*),$$
$$\sum_{a\in A}\alpha_1(a)\,u_2(a,b_1) \geq \sum_{a\in A}\alpha_1(a)\,u_2(a,b) \quad \forall b \in B,$$
$$\sum_{a\in A}\alpha_2(a)\,u_2(a,b_2) \geq \sum_{a\in A}\alpha_2(a)\,u_2(a,b) \quad \forall b \in B.$$
This implies that $F^{**}(u_1,u_2,b_1,b_2)$ can be solved in time polynomial in $|A|$ and $|B^*|$. The program that defines $F^*(u_1,u_2)$ can be solved by taking the minimum of $F^{**}(u_1,u_2,b_1,b_2)$ while varying $(b_1,b_2) \in B^*\times B^*$. This can also be computed in polynomial time, since there are at most $|B^*|^2$ pairs of $(b_1,b_2)$.

References
Heski Bar-Isaac. Reputation and survival: learning in a dynamic signalling model. The Review of Economic Studies, 70(2):231–251, 2003.

Robert Barro. Reputation in a model of monetary policy with incomplete information. Journal of Monetary Economics, 17(1):3–20, 1986.

Martin Cripps, George Mailath, and Larry Samuelson. Imperfect monitoring and impermanent reputations. Econometrica, 72(2):407–432, 2004.

Mehmet Ekmekci. Sustainable reputations with rating systems. Journal of Economic Theory, 146(2):479–503, 2011.

Mehmet Ekmekci and Lucas Maestri. Reputation and screening in a noisy environment with irreversible actions. Working Paper, 2019.

Drew Fudenberg and David Levine. Reputation and equilibrium selection in games with a patient player. Econometrica, 57(4):759–778, 1989.

Drew Fudenberg and David Levine. Maintaining a reputation when strategies are imperfectly observed. The Review of Economic Studies, 59(3):561–579, 1992.

Drew Fudenberg, David Kreps, and Eric Maskin. Repeated games with long-run and short-run players. The Review of Economic Studies, 57(4):555–573, 1990.

David Kreps and Robert Wilson. Reputation and imperfect information. Journal of Economic Theory, 27(2):253–279, 1982.

Qingmin Liu. Information acquisition and reputation dynamics. The Review of Economic Studies, 78(4):1400–1425, 2011.

Qingmin Liu and Andrzej Skrzypacz. Limited records and reputation bubbles. Journal of Economic Theory, 151:2–29, 2014.

Shuo Liu and Harry Pei. Monotone equilibria in signaling games. European Economic Review, 103408, 2020.

George J. Mailath and Larry Samuelson. Who wants a good reputation? The Review of Economic Studies, 68(2):415–441, 2001.

George J. Mailath and Larry Samuelson. Repeated Games and Reputations: Long-Run Relationships. Oxford University Press, 2006.

Paul Milgrom and John Roberts. Predation, reputation, and entry deterrence. Journal of Economic Theory, 27(2):280–312, 1982.

Harry Pei. Reputation effects under interdependent values. Econometrica, forthcoming, 2020.

Christopher Phelan. Public trust and government betrayal. Journal of Economic Theory, 130(1):27–43, 2006.

Klaus Schmidt. Commitment through incomplete information in a simple repeated bargaining game. Journal of Economic Theory, 60(1):114–139, 1993.

Abraham Wald. On cumulative sums of random variables. The Annals of Mathematical Statistics, 15(3):283–296, 1944.