Trembling-Hand Perfection and Correlation in Sequential Games
arXiv preprint [cs.GT]
Alberto Marchesi, Nicola Gatti
Politecnico di Milano, Piazza Leonardo da Vinci 32, I-20133, Milan, Italy
{name.surname}@polimi.it

Abstract
We initiate the study of trembling-hand perfection in sequential (i.e., extensive-form) games with correlation. We introduce the extensive-form perfect correlated equilibrium (EFPCE) as a refinement of the classical extensive-form correlated equilibrium (EFCE) that amends its weaknesses off the equilibrium path. This is achieved by accounting for the possibility that players may make mistakes while following recommendations independently at each information set of the game. After providing an axiomatic definition of EFPCE, we show that one always exists, since any perfect (Nash) equilibrium constitutes an EFPCE, and that it is a refinement of EFCE, as any EFPCE is also an EFCE. Then, we prove that, surprisingly, computing an EFPCE is not harder than finding an EFCE, since the problem can be solved in polynomial time for general n-player extensive-form games (also with chance). This is achieved by formulating the problem as that of finding a limit solution (as ε → 0) to a suitably defined trembling LP parametrized by ε, featuring exponentially many variables and polynomially many constraints. To this end, we show how a recently developed polynomial-time algorithm for trembling LPs can be adapted to deal with problems having an exponential number of variables. This calls for the solution of a sequence of (non-trembling) LPs with exponentially many variables and polynomially many constraints, which is possible in polynomial time by applying an ellipsoid against hope approach.

Introduction
Nash equilibrium (NE) (Nash 1951) computation in 2-player zero-sum games has been the flagship challenge in artificial intelligence for several years (see, e.g., landmark results in poker (Brown and Sandholm 2018, 2019)). Recently, increasing attention has been devoted to multi-player games, where equilibria based on correlation are now mainstream. Correlation in games is customarily modeled through a trusted external mediator that privately recommends actions to the players. The mediator acts as a correlation device that draws action recommendations according to a publicly known distribution. The seminal notion of correlated equilibrium (CE) introduced by Aumann (1974) requires that no player has an incentive to deviate from a recommendation. This is encoded by NE conditions applied to an extended game where the correlation device plays first by randomly selecting a profile of actions according to the public distribution; then, the original game is played with each player being informed only of the action selected for her. CEs are computationally appealing since they can be implemented in a decentralized way by letting players play independently according to no-regret procedures (Hart and Mas-Colell 2000). Computing CEs in sequential (i.e., extensive-form) games with imperfect information has received considerable attention in recent years (Celli et al. 2019; Farina, Bianchi, and Sandholm 2020). In this context, various CE definitions are possible, depending on the way recommendations are revealed to the players. The one that has emerged as the most suitable for sequential games is the extensive-form correlated equilibrium (EFCE) of Von Stengel and Forges (2008). The key feature of EFCE is that recommendations are revealed to the players only when they reach a decision point where the action is to be played, and, if a player defects from a recommendation, then she stops receiving them in the future.
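As a concrete illustration of Aumann's incentive conditions (a textbook example, not taken from this paper), the following sketch checks the CE constraints for the game of Chicken under the classic "traffic light" correlation device mixing uniformly over the three outcomes other than (D, D); the payoff numbers are the usual textbook ones and are illustrative assumptions:

```python
from itertools import product

ACTIONS = ["C", "D"]  # C = yield, D = dare
PAYOFF = {  # PAYOFF[(row, col)] = (row payoff, col payoff)
    ("C", "C"): (6, 6), ("C", "D"): (2, 7),
    ("D", "C"): (7, 2), ("D", "D"): (0, 0),
}
# The correlation device: a public distribution over action profiles.
MU = {("C", "C"): 1 / 3, ("C", "D"): 1 / 3, ("D", "C"): 1 / 3}

def is_correlated_equilibrium(mu, tol=1e-9):
    # For each player and each recommended action r, obeying must give at
    # least as much expected utility as any fixed deviation d, conditional
    # on having received the recommendation r.
    for player in (0, 1):
        for r, d in product(ACTIONS, repeat=2):
            obey = dev = 0.0
            for profile, p in mu.items():
                if profile[player] != r:
                    continue
                deviated = list(profile)
                deviated[player] = d
                obey += p * PAYOFF[profile][player]
                dev += p * PAYOFF[tuple(deviated)][player]
            if dev > obey + tol:
                return False
    return True
```

For instance, conditional on being recommended D, the opponent plays C for sure, so daring (payoff 7) beats deviating to C (payoff 6), and no deviation is profitable.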
Von Stengel and Forges (2008) show that EFCEs can be characterized by a polynomially-sized linear program (LP) in two-player games without chance. In the same restricted setting, Farina et al. (2019a) show how to find an EFCE by solving a bilinear saddle-point problem, which can be exploited to derive an efficient no-regret algorithm (Farina et al. 2019b). In general n-player games, Huang and von Stengel (2008) prove that an EFCE can be computed in polynomial time by means of an ellipsoid against hope (EAH) algorithm similar to that introduced by Papadimitriou and Roughgarden (2008) for CEs in compactly represented games (see also (Gordon, Greenwald, and Marks 2008) for another algorithm). Instead, finding a payoff-maximizing EFCE is NP-hard (Von Stengel and Forges 2008). Very recently, Celli et al. (2020) provide an efficient no-regret procedure for EFCE in n-player games. One of the crucial weaknesses of standard equilibrium notions, such as NE, in sequential games is that they may prescribe to play sub-optimally off the equilibrium path, i.e., at those information sets never reached when playing equilibrium strategies. One way to amend this issue is trembling-hand perfection (Selten 1975), whose rationale is to let players reason about the possibility that they may make mistakes in the future, playing sub-optimal actions with small, vanishing probabilities (a.k.a. trembles). This idea leads to the NE refinement known as perfect equilibrium (PE) (Selten 1975). Other refinements have been introduced in the literature; e.g., in the quasi-perfect equilibrium of Van Damme (1984), players only account for opponents' future trembles (see (Van Damme 1991) for other examples). Trembles can also be introduced in normal-form games, leading to robust equilibria that rule out weakly dominated strategies (Hillas and Kohlberg 2002). Recently, equilibrium refinement has been addressed beyond the NE case, such as, e.g., in Stackelberg settings (Farina et al.
2018; Marchesi et al. 2019). Trembling-hand perfection for CEs has only been studied from a theoretical viewpoint in normal-form games, by Dhillon and Mertens (1996). The authors introduce the concept of perfect CE by enforcing PE conditions in the extended game, rather than NE ones. Although equilibrium refinements in sequential games are ubiquitous, no work has addressed perfection and correlation together in such a setting.

Original Contributions
We give an axiomatic definition of extensive-form perfect correlated equilibrium (EFPCE), enforcing PE conditions, rather than NE ones, in the extended game introduced by Von Stengel and Forges (2008) for their original definition of EFCE. Intuitively, this accounts for the possibility that players may make mistakes while following recommendations independently at each information set of the game. Trembles are introduced on players' strategies, while the correlation device is defined as in classical CE notions. First, we show that an EFPCE always exists, since any PE constitutes an EFPCE, and that EFPCE is a refinement of EFCE, as any EFPCE is also an EFCE. Then, we show how an EFPCE can be computed in polynomial time in any n-player extensive-form game (also with chance). At first, we introduce a characterization of the equilibria of perturbed extended games (i.e., extended games with trembles) inspired by the definition of EFCE based on trigger agents, introduced by Gordon, Greenwald, and Marks (2008) and Farina et al. (2019a). This result allows us to formulate the EFPCE problem as that of finding a limit solution (as ε → 0) to a suitably defined trembling LP parametrized by ε, featuring exponentially many variables and polynomially many constraints. To this end, we show how the polynomial-time algorithm for trembling LPs developed by Farina, Gatti, and Sandholm (2018) can be adapted to deal with problems having an exponential number of variables. This calls for the solution of a sequence of (non-trembling) LPs with exponentially many variables and polynomially many constraints, which is possible in polynomial time by applying an EAH approach. The latter is inspired by the analogous algorithm of Huang and von Stengel (2008) for EFCEs, which is adapted to deal with a different set of dual constraints, requiring a modification of the polynomial-time separation oracle of Huang and von Stengel (2008).

[Footnote: Applying the perfect CE by Dhillon and Mertens (1996) to the normal-form representation of a sequential game does not generally solve equilibrium weaknesses. This would lead to a correlated version of the normal-form PE, which is known not to guard against sub-optimality off the equilibrium path (Van Damme 1991).]

Preliminaries
Extensive-Form Games
We focus on n-player extensive-form games (EFGs) with imperfect information. We let N := {1, . . . , n} be the set of players, and, additionally, we let c be the chance player representing exogenous stochasticity. The sequential structure is encoded by a game tree with node set H. Each node h ∈ H is identified by the ordered sequence σ(h) of actions encountered on the path from the root to h. We let Z ⊆ H be the subset of terminal nodes, which are the leaves of the game tree. For every non-terminal node h ∈ H \ Z, we let P(h) ∈ N ∪ {c} be the player who acts at h, while A(h) is the set of actions available at h. The function p_c : Z → (0, 1] defines the probability of reaching each terminal node given the chance moves on the path from the root to that node. For every player i ∈ N, the function u_i : Z → R encodes player i's utilities over terminal nodes. Imperfect information is modeled through information sets (infosets). An infoset I ⊆ H \ Z of player i ∈ N is a group of player i's nodes indistinguishable for her, i.e., for every h ∈ I, it must be the case that P(h) = i and A(h) = A(I), where A(I) is the set of actions available at the infoset. W.l.o.g., we assume that the sets A(I) are disjoint. We denote with I_i the collection of infosets of player i ∈ N. For every i ∈ N, we let A_i := ∪_{I ∈ I_i} A(I) be the set of all player i's actions. Moreover, we let A := ∪_{i ∈ N} A_i. We focus on EFGs with perfect recall, in which no player forgets what she did or knew in the past. Formally, for every player i ∈ N and infoset I ∈ I_i, it must be that every node h ∈ I is identified by the same ordered sequence σ_i(I) of player i's actions from the root to that node. Given two infosets I, J ∈ I_i of player i ∈ N, we say that J follows I, written I ≺ J, if there exist two nodes h ∈ I and k ∈ J such that h is on the path from the root to k. By perfect recall, ≺ is a partial order on I_i.
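The perfect-recall condition above can be checked mechanically. The following minimal sketch (a hypothetical encoding, not the paper's) identifies a node with the tuple σ(h) of actions from the root and tests whether all nodes of an infoset share the same subsequence σ_i of player i's actions:

```python
# Hypothetical toy action-to-player assignment; actions are globally
# distinct, as assumed in the text.
OWNER = {"a": 1, "b": 1, "c": 1, "m": 2, "n": 2}

def sigma_i(sigma, i):
    # sigma_i(h): the ordered sequence of player i's actions on the
    # path from the root to the node identified by sigma.
    return tuple(a for a in sigma if OWNER[a] == i)

def has_perfect_recall(infoset, i):
    # Perfect recall: every node h in an infoset of player i must be
    # identified by the same ordered sequence of player i's actions.
    seqs = {sigma_i(sigma, i) for sigma in infoset}
    return len(seqs) <= 1
```

For example, an infoset of player 1 grouping the nodes reached by ("a", "m") and ("a", "n") satisfies the condition (player 2's move is hidden), while grouping ("a", "m") with ("b", "n") would violate it.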
We also write I ⪯ J whenever either I = J or I ≺ J. For every infoset I ∈ I_i, we let C(I, a) ⊆ I_i be the set of all infosets that immediately follow I by playing action a ∈ A(I).

Strategies
A player's pure strategy specifies an action at every infoset of hers. For every i ∈ N, the set of player i's pure strategies π_i is Π_i := ×_{I ∈ I_i} A(I), with π_i(I) ∈ A(I) being the action at infoset I ∈ I_i. Moreover, Π := ×_{i ∈ N} Π_i denotes the set of strategy profiles specifying a strategy for each player, while, for i ∈ N, we let Π_{−i} := ×_{j ≠ i ∈ N} Π_j be the (partial) strategy profiles defining a strategy for each player other than i. Given π_i ∈ Π_i and a ∈ A_i, we write a ∈ π_i whenever π_i prescribes to play a. Analogously, for π ∈ Π and a ∈ A, we write a ∈ π. Players are allowed to randomize over pure strategies by playing mixed strategies. For i ∈ N, we let µ_i : Π_i → [0, 1] be a player i's mixed strategy, where Σ_{π_i ∈ Π_i} µ_i(π_i) = 1. The perfect recall assumption allows us to work with behavior strategies, which define probability distributions locally at each infoset. For i ∈ N, we let β_i : A_i → [0, 1] be a player i's behavior strategy, which is such that Σ_{a ∈ A(I)} β_i(a) = 1 for all I ∈ I_i.

[Footnote: All the omitted proofs are in Appendix F.]

Figure 1: (Left) Sample EFG. Black round nodes belong to player 1, white round nodes belong to player 2, and white square nodes are leaves (with players' payoffs specified under them). Rounded gray lines denote infosets. (Center) Set Π_1 of pure strategies for player 1. (Right) Examples of certain subsets of Π_1 used in this work.

Additional Notation
We introduce some subsets of Π_i (see Figure 1 for some examples). For every action a ∈ A_i of player i ∈ N, we define Π_i(a) := {π_i ∈ Π_i | a ∈ π_i} as the set of player i's pure strategies specifying a. For every infoset I ∈ I_i, we let Π_i(I) ⊆ Π_i be the set of strategies that prescribe to play so as to reach I whenever possible (depending on players' moves up to that point) and any action whenever reaching I is not possible anymore. Additionally, for every action a ∈ A(I), we let Π_i(I, a) ⊆ Π_i(I) ⊆ Π_i be the set of player i's strategies that reach I and play a. Given a terminal node z ∈ Z, we denote with Π_i(z) ⊆ Π_i the set of strategies by which player i plays so as to reach z, while Π(z) := ×_{i ∈ N} Π_i(z) and Π_{−i}(z) := ×_{j ≠ i ∈ N} Π_j(z). We also introduce the following subsets of Z. For every i ∈ N and I ∈ I_i, we let Z(I) ⊆ Z be the set of terminal nodes reachable from infoset I of player i. Moreover, Z(I, a) ⊆ Z(I) ⊆ Z is the set of terminal nodes reachable by playing action a ∈ A(I) at I, whereas Z_⊥(I, a) := Z(I, a) \ ∪_{J ∈ C(I,a)} Z(J) is the set of those reachable by playing a at I without traversing any other player i's infoset.

Nash Equilibrium and Its Refinements
Given an EFG, players' behavior strategies {β_i}_{i ∈ N} constitute an NE if no player has an incentive to unilaterally deviate from the equilibrium by playing another strategy (Nash 1951). The PE defined by Selten (1975) relies on the idea of introducing trembles in the game, representing the possibility that players may take non-equilibrium actions with small, vanishing probability. Trembles are encoded by means of Selten's perturbed games, which force lower bounds on the probabilities of playing actions. Given an EFG Γ, a pair (Γ, η) defines a perturbed game, where η : A → (0, 1) is a function assigning a positive lower bound η(a) on the probability of playing each action a ∈ A, with Σ_{a ∈ A(I)} η(a) < 1 for every i ∈ N and I ∈ I_i. Then:

[Footnote: EFGs with perfect recall admit a compact strategy representation called sequence form (Von Stengel 1996). See Appendix A.]
Definition 1.
Given an EFG Γ, {β_i}_{i ∈ N} is a PE of Γ if it is a limit point of NEs for at least one sequence of perturbed games {(Γ, η_t)}_{t ∈ N} such that, for all a ∈ A, the lower bounds η_t(a) converge to zero as t → ∞.

There are only a few computational works on NE refinements. For instance, Miltersen and Sørensen (2010) characterize quasi-perfect equilibria of 2-player EFGs using the sequence form (see the recent work by Gatti, Gilli, and Marchesi (2020) for its extension to n-player games) and exploit this to compute an equilibrium by solving a linear complementarity problem with trembles defined as polynomials of some parameter treated symbolically. Farina and Gatti (2017) do the same for the PE. Recently, Farina, Gatti, and Sandholm (2018) provide a general framework for computing NE refinements in 2-player zero-sum EFGs in polynomial time. The authors show how to reduce the task to the more general problem of solving trembling LPs parametrized by some parameter ε, i.e., finding their limit solutions as ε → 0. Then, they provide a general polynomial-time algorithm to find limit solutions to trembling LPs. Other works study the problem of computing (approximate) NE refinements in 2-player zero-sum EFGs by employing online convex optimization techniques (Kroer, Farina, and Sandholm 2017; Farina, Kroer, and Sandholm 2017).

Correlation in Extensive-Form Games
We model a correlation device as a probability distribution µ ∈ ∆_Π. In the classical CE by Aumann (1974), the correlation device draws a strategy profile π ∈ Π according to µ; then, it privately communicates π_i to each player i ∈ N. This notion of CE does not fit EFGs well, as it requires the players to reason over the exponentially-sized set Π_i. Von Stengel and Forges (2008) introduced the EFCE to solve this issue. The first crucial feature of the EFCE is a different way of giving recommendations: the strategy π_i is revealed to player i as the game progresses, i.e., the player is recommended to play the action π_i(I) at infoset I ∈ I_i only when I is actually reached during play. The second key aspect characterizing EFCEs is that, whenever a player decides to defect from a recommended action at some infoset, then she may choose any move at her subsequent infosets and she stops receiving recommendations from the correlation device. The definition of EFCE introduced by Von Stengel and Forges (2008) (Definition 3) requires the introduction of the notion of extended game with a correlation device.

Definition 2.
Given an EFG Γ and a distribution µ ∈ ∆_Π, the extended game Γ_ext(µ) is a new EFG in which chance first selects π ∈ Π according to µ, and, then, Γ is played with each player i ∈ N receiving the recommendation to play π_i(I) as a signal, whenever she reaches an infoset I ∈ I_i.

The signaling in Γ_ext(µ) induces a new infoset structure. Specifically, every infoset I ∈ I_i of the original game Γ corresponds to many, new infosets in Γ_ext(µ), one for each combination of possible action recommendations received at the infosets preceding I (this included). At each new infoset, player i can only distinguish among chance moves corresponding to strategy profiles π ∈ Π that differ in the recommendations at infosets J ∈ I_i : J ⪯ I. Figure 2 shows a simple EFG with its corresponding extended game.

Definition 3.
Given an EFG Γ, µ ∈ ∆_Π defines an EFCE of Γ if following recommendations is an NE of Γ_ext(µ).

Next, we introduce an equivalent characterization of EFCEs (Farina, Bianchi, and Sandholm 2020). It is based on the following concept of trigger agent, originally due to Gordon, Greenwald, and Marks (2008).
Definition 4.
Given an infoset I ∈ I_i of player i ∈ N, an action a ∈ A(I), and a distribution µ̂_i ∈ ∆_{Π_i(I)}, an (I, a, µ̂_i)-trigger agent for player i is an agent that takes on the role of player i and follows all recommendations unless she reaches I and gets recommended to play a. If this happens, she stops committing to recommendations and plays according to a strategy sampled from µ̂_i until the game ends.

Then, it follows that µ ∈ ∆_Π is an EFCE if, for every i ∈ N, player i's expected utility when following recommendations is at least as large as the expected utility that any (I, a, µ̂_i)-trigger agent for player i can achieve (assuming the opponents do not deviate from recommendations). We provide a formal statement in Appendix B.

Computing EFCEs in n-player EFGs  The algorithm of Huang and von Stengel (2008) relies on the following LP formulation of the problem of finding an EFCE, which has exponentially many variables and polynomially many constraints (for completeness, its derivation is in Appendix C):

max_{µ ≥ 0, v}  Σ_{π ∈ Π} µ[π]  (1a)
s.t.  A µ + B v ≥ 0,  (1b)

where µ is a vector of variables µ[π] for π ∈ Π, encoding a probability distribution µ ∈ ∆_Π. Problem 1 does not enforce any simplex constraint on the variables µ[π], and, thus, it is either unbounded or it has an optimal solution with value zero (by setting µ and v to zero). In the former case, any feasible µ encodes an EFCE after normalizing it.

[Footnote: For EFCEs, one can restrict the attention to distributions µ over reduced strategy profiles, i.e., those in which each player's pure strategy only specifies actions at infosets reachable given that player's moves (Vermeulen and Jansen 1998). In the following, we stick to general, un-reduced strategy profiles since, as shown in Appendix D, these are necessary for trembling-hand perfect CEs in order to define the players' behavior off the equilibrium path.]
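The "unbounded or zero" dichotomy follows from homogeneity: Constraint (1b) is preserved under scaling, so any feasible µ with positive objective can be scaled indefinitely and, conversely, normalized into a distribution. A minimal numerical sketch, with small made-up matrices A and B (illustrative assumptions, not derived from any real game):

```python
# Tiny made-up instance of the homogeneous system A mu + B v >= 0.
A = [[1.0, -1.0], [-1.0, 1.0]]
B = [[1.0], [1.0]]

def feasible(mu, v, tol=1e-9):
    # Check A mu + B v >= 0 and mu >= 0 componentwise.
    lhs = [sum(A[r][j] * mu[j] for j in range(len(mu)))
           + sum(B[r][k] * v[k] for k in range(len(v)))
           for r in range(len(A))]
    return all(x >= -tol for x in lhs) and all(m >= -tol for m in mu)

def normalize(mu):
    # Turn a feasible nonzero mu into a probability distribution.
    total = sum(mu)
    return [m / total for m in mu]
```

Scaling a feasible (µ, v) by any t > 0 leaves it feasible, which is exactly why no finite positive optimum can exist.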
As a result, since an EFCE always exists (Von Stengel and Forges 2008), the following dual of Problem 1 is always infeasible:

A^T y ≤ −1  (2a)
B^T y = 0  (2b)
y ≥ 0,  (2c)

where y is a vector of dual variables. The EAH approach applies the ellipsoid algorithm (Grötschel, Lovász, and Schrijver 1993) to Problem 2 in order to conclude that it is infeasible. Since there are exponentially many constraints, the algorithm runs in polynomial time only if a polynomial-time separation oracle is available. This is given by the following:

Lemma 1 (Lemma 5, (Huang and von Stengel 2008)). If y ≥ 0 is such that B^T y = 0, then there exists µ encoding a product distribution µ ∈ ∆_Π such that µ^T A^T y = 0. Moreover, µ can be computed in polynomial time.

Jiang and Leyton-Brown (2015) show how, given a product distribution µ computed as in Lemma 1, it is possible to recover, in polynomial time, a violated constraint for Problem 2, corresponding to some strategy profile π ∈ Π. This, together with some additional technical tricks ensuring that B^T y = 0 holds (see (Huang and von Stengel 2008) for more details), makes it possible to apply the ellipsoid algorithm to Problem 2 in polynomial time. Since the problem is infeasible, the algorithm must terminate after polynomially many iterations with a collection of violated constraints, which correspond to polynomially many strategy profiles. Then, solving (in polynomial time) Problem 1 with the variables µ restricted to these strategy profiles gives an EFCE of the game. Let us also remark that the EFCE obtained in this way has support size polynomial in the size of the game.
We are now ready to show how trembling-hand perfection can be injected into the definition of EFCE so as to amend its weaknesses off the equilibrium path (see the following for an example). We generalize the approach of Dhillon and Mertens (1996) (restricted to CEs in normal-form games) to the general setting of EFCEs in EFGs. The core idea is to use the PE rather than the NE in the definition of CE. Thus:
Definition 5.
Given an EFG Γ, a distribution µ ∈ ∆_Π is an extensive-form perfect correlated equilibrium (EFPCE) if following recommendations is a PE of Γ_ext(µ).

The definition of EFPCE crucially relies on the introduction of trembles in extended games, i.e., it takes into account the possibility that each player may not follow action recommendations with a small, vanishing probability. In the following, given a perturbed EFG (Γ, η) and µ ∈ ∆_Π, we denote with (Γ_ext(µ), η) a perturbed extended game in which the probability of playing each action is subject to a lower bound equal to the lower bound η(a) of the corresponding action a ∈ A in Γ. By recalling the definition of PE (Definition 1) and the structure of perturbed extended games, it is easy to infer the following characterization of EFPCEs:

Figure 2: (Left) An EFG Γ. (Right) The extended game Γ_ext(µ). The white square node at the root is a chance node, where each action corresponds to some π ∈ Π and is labeled with its probability µ(π). Infosets in Γ_ext(µ) are identified by pairs. For instance, infoset ⟨J, ac⟩ corresponds to J when being recommended to play a and c at I and J, respectively. Thick actions represent players' behavior when following recommendations (for ease of reading, action names are omitted).

Lemma 2.
Given an EFG Γ, a distribution µ ∈ ∆_Π is an EFPCE of Γ if following recommendations constitutes an NE for at least one sequence of perturbed extended games {(Γ_ext(µ), η_t)}_{t ∈ N} such that, for all a ∈ A, the lower bounds η_t(a) converge to zero as t → ∞.

We remark that, with an abuse of terminology, we say that players follow recommendations in a perturbed extended game (Γ_ext(µ), η) whenever they play strategies which place all the residual probability (given the lower bounds) on recommended actions. In the following sections, we crucially rely on the characterization of EFPCEs given in Lemma 2 in order to derive our computational results. First, we show an example of EFPCE and prove some of its properties.

Example of EFPCE
Consider the EFG in Figure 1 (Left) and lower bounds η_t : A → (0, 1) for t ∈ N, with η_t(a) → 0 as t → ∞ for all a ∈ A. First, notice that player 1 is always better off playing action a at the root infoset I, since, by selecting c at the following infoset J, she can guarantee herself a utility strictly larger than anything she can achieve by playing b. Thus, any EFPCE of the game (as well as any EFCE) must recommend a at I with probability 1. Then, in the sub-game reached when playing a at I, it is easy to check that recommending the pairs of actions (c, m), (c, n), and (d, m) each with probability 1/3 is an equilibrium, as no player has an incentive to deviate from a recommendation, even with trembles (see Appendix E for more details). The correlation device described so far is sufficient to define an EFCE, as recommendations at infosets Y, K, and L are not relevant given that they do not influence players' utilities at the equilibrium (b is never recommended). However, they become relevant for EFPCEs, since, in perturbed extended games, these infosets could be reached due to a tremble with probability η_t(b). Then, player 2 must be told to play p at Y, because playing p always gives her a strictly higher utility than playing o. Moreover, by an analogous reasoning, player 1 must be recommended to play e and h at K and L, respectively. In conclusion, we can state that µ ∈ ∆_Π with µ(aceh, mp) = µ(aceh, np) = µ(adeh, mp) = 1/3 is an EFPCE.

Properties of EFPCEs
We characterize the relation between EFPCEs and other equilibria, also showing that EFPCEs always exist and represent a refinement of EFCEs.

Theorem 1.
This relation holds: PE ⊆ EFPCE ⊆ EFCE.

Theorem 2.
The following relations hold:
• EFPCE ⊈ NE and NE ⊈ EFPCE;
• EFPCE ∩ NE = PE.

NEs of Perturbed Extended Games
We provide a characterization of NEs of perturbed extended games (Γ_ext(µ), η), useful for our main algorithmic result on EFPCEs given in the following section. Specifically, we give a set of easily interpretable conditions which ensure that following recommendations is an NE of (Γ_ext(µ), η). These are crucial for the derivation of the LP exploited by our algorithm. Our characterization is inspired by that of EFCEs based on trigger agents (see Lemma 4 in Appendix B). However, the presence of trembles in extended games requires some key changes, which we highlight in the following. First, we introduce some additional notation. Given a perturbed extended game (Γ_ext(µ), η), we let ξ_η(z, π) be the probability of reaching a node z ∈ Z when a strategy profile π ∈ Π is recommended and players obey recommendations, in the presence of trembles defined by η. Each ξ_η(z, π) is obtained by multiplying the probabilities of the actions in σ(z), which are those on the path from the root to z. For each a ∈ σ(z), two cases are possible: either a is prescribed by the recommended π and played with its maximum probability given η, or it is not, which means that a tremble occurred with probability η(a). Formally, letting 𝟙{a ∈ π} be an indicator for the event a ∈ π, for every z ∈ Z and π ∈ Π:

ξ_η(z, π) := ∏_{a ∈ A : a ∈ σ(z)} η̃(a)^{𝟙{a ∈ π}} η(a)^{1 − 𝟙{a ∈ π}} · p_c(z),

where, for a ∈ A(I), we let η̃(a) := 1 − Σ_{a′ ≠ a ∈ A(I)} η(a′) be the maximum probability assignable to a given η. Moreover, for every player i ∈ N, infoset I ∈ I_i, terminal node z ∈ Z(I) reachable from I, and strategy profile π ∈ Π, we let ξ_η(z, I, π) be defined as ξ_η(z, π) excluding player i's actions leading from I to z, i.e., with the product restricted to actions a ∈ σ_i(I) ∪ (σ(z) \ A_i).

[Footnote: In the following, we denote the sets of equilibria with their corresponding acronyms (e.g., NE is the set of all NEs of a game).]
Analogously, for every player i's strategy π_i ∈ Π_i(I), we let ξ_η(z, π_i) be defined for player i's actions a ∈ A_i ∩ (σ(z) \ σ_i(I)) from I to z. Following recommendations is an NE of the perturbed extended game (Γ_ext(µ), η) if, for every player i ∈ N, infoset I ∈ I_i, and action a ∈ A(I), player i's utility when obeying the recommendation a at I is at least as large as the utility achieved by any (I, a, µ̂_i)-trigger agent. The fundamental differences with respect to EFCE are: (i) an infoset I could be reached even when the actions recommended at preceding infosets do not allow it (due to trembles); and (ii) trigger agents are subject to trembles, which means that they may make mistakes while playing the strategy sampled from µ̂_i. For any terminal node z ∈ Z, the probability of reaching it when following recommendations is:

q^η_µ(z) := Σ_{π ∈ Π} ξ_η(z, π) µ(π),

where the summation accounts for the probability of reaching z for every possible π. The sum is over Π rather than Π(z) as for EFCE (see Equation (9) in Appendix B), since, due to trembles, z could be reached even when π ∉ Π(z). For any (I, a, µ̂_i)-trigger agent, the probability of reaching z ∈ Z(I) when the agent 'gets triggered' is defined as:

p^{η,I,a}_{µ,µ̂_i}(z) := ( Σ_{π_i ∈ Π_i(a), π_{−i} ∈ Π_{−i}} ξ_η(z, I, π) µ(π) ) · ( Σ_{π̂_i ∈ Π_i(I)} ξ_η(z, π̂_i) µ̂_i(π̂_i) ),

where the first summation is over Π_i(a) instead of Π_i(I, a) (as in the EFCE, see Equation (8) in Appendix B) since it might be the case that the agent is activated also when the recommended strategy π_i does not allow to reach infoset I. Finally, the overall probability of reaching z ∈ Z(I) is:

y^{η,I,a}_{µ,µ̂_i}(z) := p^{η,I,a}_{µ,µ̂_i}(z) + Σ_{π_i ∈ Π_i \ Π_i(a), π_{−i} ∈ Π_{−i}} ξ_η(z, π) µ(π),

where the first term is for when the agent 'gets triggered', while the second term accounts for the case in which the agent is not activated (the two events are independent).
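The reach probabilities just defined can be made concrete on a toy game. The sketch below (a hypothetical 2-player game with one binary infoset per player and no chance, so p_c(z) = 1; the instance and helper names are illustrative assumptions, not from the paper) implements ξ_η and q^η_µ exactly as above:

```python
from itertools import product

# One infoset per player, two actions each; a leaf is the pair of choices.
ACTIONS = {1: ["a", "b"], 2: ["c", "d"]}
ETA = {"a": 0.05, "b": 0.1, "c": 0.15, "d": 0.2}  # lower bounds eta(a)

def eta_tilde(action, infoset_actions, eta):
    # Maximum probability assignable to `action`: 1 minus the sum of the
    # lower bounds of the other actions at the same infoset.
    return 1.0 - sum(eta[a2] for a2 in infoset_actions if a2 != action)

def xi(z, pi, eta):
    # z: tuple of one action per player on the path to the leaf; pi: the
    # recommended profile. A recommended action is played with its maximum
    # probability; otherwise a tremble occurs with probability eta(a).
    prob = 1.0
    for player, a in zip((1, 2), z):
        if a == pi[player - 1]:          # a is prescribed by pi
            prob *= eta_tilde(a, ACTIONS[player], eta)
        else:                             # tremble onto a
            prob *= eta[a]
    return prob

def q(z, mu, eta):
    # Probability of reaching z when everybody follows recommendations:
    # the sum runs over ALL profiles, since trembles can reach any leaf.
    return sum(xi(z, pi, eta) * p for pi, p in mu.items())

leaves = list(product(ACTIONS[1], ACTIONS[2]))
mu = {("a", "c"): 0.5, ("b", "d"): 0.5}  # a toy correlation device
```

A useful sanity check is that, for any fixed µ, the values q^η_µ(z) sum to 1 over the terminal nodes, since each ξ_η(·, π) defines a full distribution over leaves.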
Theorem 3.
Given a perturbed extended game (Γ_ext(µ), η), following recommendations is an NE of the game if, for every i ∈ N and every (I, a, µ̂_i)-trigger agent for player i, it holds that:

Σ_{z ∈ Z(I)} Σ_{π_i ∈ Π_i(a), π_{−i} ∈ Π_{−i}} ξ_η(z, π) µ(π) u_i(z) ≥ Σ_{z ∈ Z(I)} p^{η,I,a}_{µ,µ̂_i}(z) u_i(z).

Computing an EFPCE in n-player EFGs

We provide a polynomial-time algorithm to compute an EFPCE in n-player EFGs (also with chance). The algorithm is built on three fundamental components: (i) a trembling LP (with exponentially many variables and polynomially many constraints) whose limit solutions define EFPCEs; (ii) an adaptation of the algorithm by Farina, Gatti, and Sandholm (2018) that finds such limit solutions by solving a sequence of (non-trembling) LPs; and (iii) a polynomial-time EAH procedure that solves these LPs.

Trembling LP for EFPCEs
It resembles the EFCE LP in Problem 1. In this case, the constraints appearing in the LP ensure that following recommendations is an NE in a given sequence of perturbed extended games, by exploiting the characterization given in Theorem 3. Then, Lemma 2 allows us to conclude that the limit solutions of the trembling LP define EFPCEs. In the following, we assume that a sequence of perturbed extended games {(Γ_ext(µ), η_t)}_{t ∈ N} is given. For every player i ∈ N, infoset I ∈ I_i, and action a ∈ A(I), we introduce a variable u[i, I, a] to encode player i's expected utility when following the recommendation to play a at I in the perturbed extended game (Γ_ext(µ), η_t). These variables are defined by the following constraints:

u[i, I, a] = Σ_{z ∈ Z(I)} Σ_{π_i ∈ Π_i(a), π_{−i} ∈ Π_{−i}} ξ_{η_t}(z, π) µ[π] u_i(z)  ∀i ∈ N, ∀I ∈ I_i, ∀a ∈ A(I).  (3)

Then, we introduce constraints that recursively define variables v[i, I, a, J] for every infoset J ∈ I_i : I ⪯ J. These encode the maximum expected utility obtained at infoset J by trigger agents associated with I and a. To this end, we also need some auxiliary non-negative variables w[i, I, a, J, a′], which are defined for every player i ∈ N, infoset I ∈ I_i, action a ∈ A(I), infoset J ∈ I_i : I ⪯ J following I (this included), and action a′ ∈ A(J) available at J:

v[i, I, a, J] − w[i, I, a, J, a′] ≥ Σ_{z ∈ Z_⊥(J,a′)} Σ_{π_i ∈ Π_i(a), π_{−i} ∈ Π_{−i}(z)} ξ_{η_t}(z, I, π) µ[π] u_i(z) + Σ_{K ∈ C(J,a′)} ( v[i, I, a, K] − Σ_{a″ ∈ A(K)} η_t(a″) w[i, I, a, K, a″] )  ∀i ∈ N, ∀I ∈ I_i, ∀a ∈ A(I), ∀J ∈ I_i : I ⪯ J, ∀a′ ∈ A(J).  (4)

Intuitively, each auxiliary variable w[i, I, a, J, a′] represents a penalty on v[i, I, a, J] due to the possibility of trembling by playing a (possibly) sub-optimal action a′ ∈ A(J) at J.
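The interplay between the v and w variables can be sketched as a simple recursion on a toy single-player structure (a hypothetical helper for intuition, not the paper's algorithm): v takes the value of an optimal action at each infoset, while w records the utility lost by trembling onto each sub-optimal action, with descendant penalties weighted by the lower bounds η_t. The instance below is made up:

```python
# Toy structure: infoset -> {action: (base utility, child infosets)}.
TREE = {
    "J": {"c": (3.0, []), "d": (1.0, [])},
    "K": {"e": (2.0, []), "f": (5.0, [])},
    "I": {"a": (0.0, ["J", "K"]), "b": (2.0, [])},
}
ETA = {"c": 0.1, "d": 0.1, "e": 0.1, "f": 0.1}  # lower bounds at J and K

def solve(infoset, v, w):
    # Bottom-up: the value of playing a' at an infoset is its base utility
    # plus, for each child infoset, the child's optimal value minus the
    # eta-weighted penalties for trembling onto the child's actions.
    vals = {}
    for action, (base, children) in TREE[infoset].items():
        total = base
        for child in children:
            solve(child, v, w)
            total += v[child] - sum(ETA[a2] * w[(child, a2)]
                                    for a2 in TREE[child])
        vals[action] = total
    v[infoset] = max(vals.values())
    for action, val in vals.items():
        w[(infoset, action)] = v[infoset] - val  # 0 for an optimal action
    return v, w
```

On this instance, c is optimal at J and f at K, so w[(J, "c")] = w[(K, "f")] = 0 while the sub-optimal actions carry the utility gap, mirroring the behavior of Constraint (4).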
Indeed, whenever $a'$ is an optimal action at infoset $J$, then $w[i, I, a, J, a']$ is set to $0$ in any solution; otherwise, $w[i, I, a, J, a']$ represents how much utility is lost by playing $a'$ instead of an optimal action (see Figure 3 (Right) for an example).

Figure 3: (Left) Simple EFG with player 1's infosets $I_1$ (actions $a, b$), $J_1$ (actions $c, d$), and $K_1$ (actions $e, f$), and terminal nodes $z_1, \dots, z_5$. (Right) Example of Constraints (4)–(5) for player 1, infoset $I_1$, and action $a$; we let $U(z) := \sum_{\pi_1 \in \Pi_1(a), \, \pi_2 \in \Pi_2(z)} \xi_\eta(z, I_1, \pi) \, \mu[\pi] \, u_1(z)$ for every $z \in Z$:
$$u[1, I_1, a] = v[1, I_1, a, I_1] - \eta(a) \, w[1, I_1, a, I_1, a] - \eta(b) \, w[1, I_1, a, I_1, b]$$
$$(I_1, a): \; v[1, I_1, a, I_1] - w[1, I_1, a, I_1, a] \geq v[1, I_1, a, J_1] + v[1, I_1, a, K_1] - \eta(c) \, w[1, I_1, a, J_1, c] - \eta(d) \, w[1, I_1, a, J_1, d] - \eta(e) \, w[1, I_1, a, K_1, e] - \eta(f) \, w[1, I_1, a, K_1, f]$$
$$(I_1, b): \; v[1, I_1, a, I_1] - w[1, I_1, a, I_1, b] \geq U(z_5)$$
$$(J_1, c): \; v[1, I_1, a, J_1] - w[1, I_1, a, J_1, c] \geq U(z_1) \qquad (J_1, d): \; v[1, I_1, a, J_1] - w[1, I_1, a, J_1, d] \geq U(z_2)$$
$$(K_1, e): \; v[1, I_1, a, K_1] - w[1, I_1, a, K_1, e] \geq U(z_3) \qquad (K_1, f): \; v[1, I_1, a, K_1] - w[1, I_1, a, K_1, f] \geq U(z_4)$$
Variables $v[1, I_1, a, \cdot]$ encode the optimal utility of trigger agents associated with $I_1, a$ at infosets following $I_1$ (without trembles). Variables $w[1, I_1, a, \cdot, \cdot]$ account for penalties due to trembles. To see this, fix $\mu[\pi]$. Assume that $U(z_4) > U(z_3)$ and consider the constraints for $(K_1, e)$ and $(K_1, f)$. Then, it must be $v[1, I_1, a, K_1] = U(z_4)$ and $w[1, I_1, a, K_1, f] = 0$, which implies $w[1, I_1, a, K_1, e] = U(z_4) - U(z_3)$ (constraint for $(I_1, a)$). Similarly, assuming $U(z_2) > U(z_1)$, it must be $v[1, I_1, a, J_1] = U(z_2)$, $w[1, I_1, a, J_1, d] = 0$, and $w[1, I_1, a, J_1, c] = U(z_2) - U(z_1)$. An analogous reasoning holds at infosets upwards in the game tree.

Finally, the incentive constraints are:
$$u[i, I, a] = v[i, I, a, I] - \sum_{a' \in A(I)} \eta_t(a') \, w[i, I, a, I, a'] \quad \forall i \in N, \forall I \in \mathcal{I}_i, \forall a \in A(I). \tag{5}$$
Figure 3 provides an example of Constraints (4) and (5) to better clarify their meaning. The following theorem shows that Constraints (3), (4), and (5) correctly encode the conditions given in Theorem 3, which ensure that following recommendations is an NE in $(\Gamma_{ext}(\mu), \eta_t)$.

Theorem 4.
Given a perturbed extended game $(\Gamma_{ext}(\mu), \eta_t)$, if Constraints (3), (4), and (5) can be satisfied for the vector $\mu$ of variables $\mu[\pi]$ encoding the distribution $\mu$, then following recommendations is an NE of $(\Gamma_{ext}(\mu), \eta_t)$.

By substituting the expression of $u[i, I, a]$ (given by Constraints (3) and (5)) into Constraints (4), we can formulate the following trembling LP parameterized by $t \in \mathbb{N}$:
$$\max_{\mu \geq 0, \, v, \, w \geq 0} \; \sum_{\pi \in \Pi} \mu[\pi] \tag{6a}$$
$$\text{s.t.} \quad A_t \mu + B v + C_t w \geq 0, \tag{6b}$$
where $A_t$ is the analogue of matrix $A$ in Problem 1, $w$ is a vector whose components are the variables $w[i, I, a, J, a']$, and $C_t$ is a matrix defining the constraint coefficients for these variables. Notice that the coefficients of the variables in $v$ (as defined by $B$) are the same as in Problem 1.

Limit Solutions of Trembling LP
Problem 6 can be castinto the framework of Farina, Gatti, and Sandholm (2018)by defining sequences of lower bounds η t by means of van-ishing polynomials in a parameter ǫ → . As a result, thepolynomial-time algorithm by Farina, Gatti, and Sandholm(2018) can be used, with the only difference that, at eachstep, for a fixed value of the parameter ǫ ( i.e. , particularlower bounds η t ), it needs to solve an instance of Prob-lem 6 featuring exponentially many variables. Provided thatthe latter can be done in polynomial time, the polynomialityof the overall procedure is preserved, since the bounds onthe running time provided by Farina, Gatti, and Sandholm(2018) do not depend on the number of variables in the LP. EAH Procedure
In order to solve Problem 6 for aparticular lower bound function η t in polynomial time,we can apply a procedure similar to the EAH algorithmby Huang and von Stengel (2008). Notice that Problem 6 isalways unbounded, since there always exists a distribution µ ∈ ∆ Π such that following recommendations is an NEof the perturbed extended game (Γ ext ( µ ) , η t ) (such µ is anEFCE of the corresponding perturbed, non-extended game).Thus, we only need to provide a polynomial-time separationoracle for the always-infeasible dual of Problem 6, whichreads as: A ⊤ t y ≤ − (7a) B ⊤ y = − (7b) C ⊤ t y ≥ − (7c) y ≥ − , (7d)where the vector of dual variables y has the same role as inProblem 2, since the constraints of the primal problems areindexed on the same sets. Notice that constraints C ⊤ t y ≥ are polynomially many. As a result, one can always checkwhether one of these constraints is violated in polynomialtime and, if this is the case, output one such constraint as aviolated inequality. This allows to focus on separation ora-cles for the other constraints. Then, the required one is givenby the following lemma, an analogous of Lemma 1. Lemma 3. If y ≥ is such that B ⊤ y = , then thereexists µ encoding a product distribution µ ∈ ∆ Π such that µ ⊤ A ⊤ t y = 0 . Moreover, µ can be computed in poly-time. The proof of Lemma 3 follows the same line as that ofLemma 5 by Huang and von Stengel (2008) (see (Huang2011) for its complete version) and it is based on the CEexistence proof by Hart and Schmeidler (1989).
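The EAH-style approach rests on having a polynomial-time separation oracle despite an exponential number of constraints. The following toy sketch (a hypothetical constraint family, not the paper's matrix $A_t$) shows how such an oracle can exploit structure: among the $2^n$ constraints $\sum_i s_i y_i \geq -1$, one for every sign vector $s \in \{-1,+1\}^n$, the most violated one is found in $O(n)$ time because the minimum over $s$ decomposes coordinate-wise.

```python
# Hedged sketch of a polynomial-time separation oracle for a family of
# 2^n linear constraints (a made-up family, for illustration only):
# for every sign vector s in {-1,+1}^n, require sum_i s_i * y_i >= -1.
def separation_oracle(y):
    # The most violating sign vector opposes each coordinate's sign,
    # so min over s of sum_i s_i*y_i equals -sum_i |y_i|.
    s = [-1 if yi > 0 else 1 for yi in y]
    value = sum(si * yi for si, yi in zip(s, y))
    if value >= -1:
        return None          # all 2^n constraints are satisfied
    return s                 # a violated constraint, usable as a cut

print(separation_oracle([0.3, -0.4]))   # feasible: prints None
print(separation_oracle([0.9, -0.8]))   # violated: prints [-1, 1]
```

In an ellipsoid-against-hope run, such an oracle supplies the separating hyperplanes; the polynomially many cuts it produces are then what make the final (polynomial-size) LP solvable.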
Discussion and Future Work
We started the study of trembling-hand perfection in sequential games with correlation, introducing the EFPCE as a refinement of the EFCE that amends its weaknesses off the equilibrium path. This paves the way to a new research line, raising novel game-theoretic and computational challenges. As for EFPCEs, an open question is whether compact correlated strategy representations, like the EFCE-based correlation plan by Von Stengel and Forges (2008), are possible in some restricted settings, such as 2-player games without chance. This would enable the optimization over the set of EFPCEs in polynomial time. The main challenge raised by EFPCEs with respect to EFCEs is that the former require reasoning about general, un-reduced strategy profiles.

Another possible future work is to extend our analysis to other CE-based solution concepts, such as the normal-form CE and the agent-form CE (see (Von Stengel and Forges 2008) for their definitions). This raises the interesting question of how different trembling-hand-based CEs are able to amend weaknesses off the equilibrium path.

Finally, an interesting direction is to consider different ways of refining CE-based equilibria in sequential games, such as, e.g., using quasi-perfection (Van Damme 1984).
Acknowledgments
This work has been partially supported by the Italian MIUR PRIN 2017 Project ALGADIMAR "Algorithms, Games, and Digital Markets".
References

Aumann, R. J. 1974. Subjectivity and correlation in randomized strategies. Journal of Mathematical Economics.

Brown, N.; and Sandholm, T. 2018. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science.

Brown, N.; and Sandholm, T. 2019. Superhuman AI for multiplayer poker. Science.

Celli, A.; Marchesi, A.; Bianchi, T.; and Gatti, N. 2019. Learning to correlate in multi-player general-sum sequential games. In Advances in Neural Information Processing Systems, 13076–13086.

Celli, A.; Marchesi, A.; Farina, G.; and Gatti, N. 2020. No-regret learning dynamics for extensive-form correlated equilibrium. In Advances in Neural Information Processing Systems.

Dhillon, A.; and Mertens, J. F. 1996. Perfect correlated equilibria. Journal of Economic Theory.

Farina, G.; Bianchi, T.; and Sandholm, T. 2020. Coarse correlation in extensive-form games. In AAAI Conference on Artificial Intelligence, 1934–1941.

Farina, G.; and Gatti, N. 2017. Extensive-form perfect equilibrium computation in two-player games. In AAAI Conference on Artificial Intelligence, 502–508.

Farina, G.; Gatti, N.; and Sandholm, T. 2018. Practical exact algorithm for trembling-hand equilibrium refinements in games. In Advances in Neural Information Processing Systems, 5039–5049.

Farina, G.; Kroer, C.; and Sandholm, T. 2017. Regret minimization in behaviorally-constrained zero-sum games. In International Conference on Machine Learning, 1107–1116.

Farina, G.; Ling, C. K.; Fang, F.; and Sandholm, T. 2019a. Correlation in extensive-form games: Saddle-point formulation and benchmarks. In Advances in Neural Information Processing Systems, 9229–9239.

Farina, G.; Ling, C. K.; Fang, F.; and Sandholm, T. 2019b. Efficient regret minimization algorithm for extensive-form correlated equilibrium. In Advances in Neural Information Processing Systems, 5187–5197.

Farina, G.; Marchesi, A.; Kroer, C.; Gatti, N.; and Sandholm, T. 2018. Trembling-hand perfection in extensive-form games with commitment. In International Joint Conference on Artificial Intelligence, 233–239.

Farina, G.; and Sandholm, T. 2020. Polynomial-time computation of optimal correlated equilibria in two-player extensive-form games with public chance moves and beyond. In Advances in Neural Information Processing Systems.

Gilboa, I.; and Zemel, E. 1989. Nash and correlated equilibria: Some complexity considerations. Games and Economic Behavior.

Gordon, G. J.; Greenwald, A.; and Marks, C. 2008. No-regret learning in convex games. In International Conference on Machine Learning, 360–367.

Grötschel, M.; Lovász, L.; and Schrijver, A. 1993. Geometric Algorithms and Combinatorial Optimization. Springer.

Hart, S.; and Mas-Colell, A. 2000. A simple adaptive procedure leading to correlated equilibrium. Econometrica.

Hart, S.; and Schmeidler, D. 1989. Existence of correlated equilibria. Mathematics of Operations Research.

Hillas, J.; and Kohlberg, E. 2002. Foundations of strategic equilibrium. In Handbook of Game Theory with Economic Applications, 3: 1597–1663.

Huang, W. 2011. Equilibrium Computation for Extensive Games. Ph.D. thesis.

Huang, W.; and von Stengel, B. 2008. Computing an extensive-form correlated equilibrium in polynomial time. In International Workshop on Internet and Network Economics, 506–513. Springer.

Jiang, A. X.; and Leyton-Brown, K. 2015. Polynomial-time computation of exact correlated equilibrium in compact games. Games and Economic Behavior, 91: 347–359.

Kroer, C.; Farina, G.; and Sandholm, T. 2017. Smoothing method for approximate extensive-form perfect equilibrium. In International Joint Conference on Artificial Intelligence, 295–301.

Marchesi, A.; Farina, G.; Kroer, C.; Gatti, N.; and Sandholm, T. 2019. Quasi-perfect Stackelberg equilibrium. In AAAI Conference on Artificial Intelligence, 2117–2124.

Miltersen, P. B.; and Sørensen, T. B. 2010. Computing a quasi-perfect equilibrium of a two-player game. Economic Theory.

Nash, J. 1951. Non-cooperative games. Annals of Mathematics.

Papadimitriou, C. H.; and Roughgarden, T. 2008. Computing correlated equilibria in multi-player games. Journal of the ACM (JACM).

Selten, R. 1975. Reexamination of the perfectness concept for equilibrium points in extensive games. International Journal of Game Theory.

Van Damme, E. 1984. A relation between perfect equilibria in extensive form games and proper equilibria in normal form games. International Journal of Game Theory.

Van Damme, E. 1991. Stability and Perfection of Nash Equilibria, volume 339. Springer.

Vermeulen, D.; and Jansen, M. 1998. The reduced form of a game. European Journal of Operational Research.

Von Stengel, B. 1996. Efficient computation of behavior strategies. Games and Economic Behavior.

Von Stengel, B.; and Forges, F. 2008. Extensive-form correlated equilibrium: Definition and computation. Mathematics of Operations Research.

Appendix
A Sequence-Form Representation for Perfect-Recall EFGs
The number of pure strategies $|\Pi_i|$ of each player $i \in N$ may be exponentially large in the size of an EFG, preventing the development of scalable computational tools using them. Moreover, the same holds for reduced pure strategies, which only specify actions at infosets that are reachable given the player's past moves. This problem is circumvented by the sequence form introduced by Von Stengel (1996), where each player selects a sequence of actions rather than a pure strategy. For any node $h \in H$, we let $\sigma_i(h)$ be the ordered sequence of actions of player $i \in N$ on the path from the root of the game tree to $h$. We recall that, given the perfect recall assumption, all nodes in an infoset $I \in \mathcal{I}_i$ of player $i \in N$ define the same sequence $\sigma_i(I)$ of player $i$'s actions, i.e., it holds $\sigma_i(h) = \sigma_i(I)$ for all $h \in I$. Moreover, $\sigma_i(I)$ can be extended by any action $a \in A(I)$, defining a new player $i$'s sequence $\sigma_i(I)a$. Thus, by introducing the empty sequence $\varnothing$ to represent the paths in the game tree in which a player does not play, the set of sequences available to player $i \in N$ is $\Sigma_i := \{\varnothing\} \cup \{\sigma_i(I)a \mid I \in \mathcal{I}_i, a \in A(I)\}$. Within the sequence form, mixed strategies are expressed as realization plans. A realization plan for player $i \in N$ is a function $x_i : \Sigma_i \to [0, 1]$, with $x_i(\sigma_i)$ expressing the realization probability of sequence $\sigma_i \in \Sigma_i$. In order to be well defined, $x_i$ must satisfy the linear constraints $x_i(\varnothing) = 1$ and $x_i(\sigma_i(I)) = \sum_{a \in A(I)} x_i(\sigma_i(I)a)$ for every infoset $I \in \mathcal{I}_i$. Since the number of sequences $|\Sigma_i|$ of each player $i \in N$ is polynomial in the size of an EFG and realization plans can be easily expressed by linear constraints, the sequence form is an appealing formalism for handling EFGs. Moreover, as shown by Von Stengel (1996), the crucial property of the sequence form is that realization plans and behavior strategies are equally expressive in EFGs with perfect recall.
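The sequence-form constraints above can be checked on a toy example. The tree below is hypothetical (a one-player tree with an infoset offering actions a, b, and a second infoset, reached after a, offering c, d); the realization plan is obtained by multiplying behavior probabilities along each sequence.

```python
# Minimal sketch of the sequence form on an assumed one-player tree.
# Sequences: (), (a,), (b,), (a, c), (a, d).
behavior = {"a": 0.6, "b": 0.4, "c": 0.25, "d": 0.75}  # assumed strategy

def realization_plan(beta):
    # x(sigma) is the product of the behavior probabilities along sigma.
    sequences = [(), ("a",), ("b",), ("a", "c"), ("a", "d")]
    x = {}
    for sigma in sequences:
        p = 1.0
        for action in sigma:
            p *= beta[action]
        x[sigma] = p
    return x

x = realization_plan(behavior)
# Sequence-form constraints: x(empty) = 1 and flow conservation at infosets.
assert x[()] == 1.0
assert abs(x[()] - (x[("a",)] + x[("b",)])) < 1e-12
assert abs(x[("a",)] - (x[("a", "c")] + x[("a", "d")])) < 1e-12
print(x[("a", "c")])  # 0.6 * 0.25 = 0.15
```

Going back from a realization plan to a behavior strategy divides out the parent sequence's probability, as stated next.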
In particular, $x_i$ is equivalent to a behavior strategy that selects $a \in A(I)$ with probability $x_i(\sigma_i(I)a) / x_i(\sigma_i(I))$ if $x_i(\sigma_i(I)) > 0$, and arbitrarily if $x_i(\sigma_i(I)) = 0$. Conversely, a behavior strategy $\beta_i$ is equivalent to a realization plan that selects each sequence $\sigma_i \in \Sigma_i$ with probability $\prod_{a \in \sigma_i} \beta_i(a)$.

B Characterization of EFCEs Using Trigger Agents
We provide a formal statement of the characterization of EFCEs based on trigger agents (see Definition 4), originally introduced by Gordon, Greenwald, and Marks (2008) and Farina et al. (2019a) (see also (Farina, Bianchi, and Sandholm 2020) for a more general treatment). We recall that such a characterization is based on the fact that $\mu \in \Delta_\Pi$ is an EFCE if, for every $i \in N$, player $i$'s expected utility when following recommendations is at least as large as the expected utility that any $(I, a, \hat\mu_i)$-trigger agent for player $i$ can achieve (assuming the opponents do not deviate from recommendations). For any $\mu \in \Delta_\Pi$ and $(I, a, \hat\mu_i)$-trigger agent, we define the probability of reaching a terminal node $z \in Z(I)$ as:
$$p^{I,a}_{\mu,\hat\mu_i}(z) := \sum_{\substack{\pi_i \in \Pi_i(I,a) \\ \pi_{-i} \in \Pi_{-i}(z)}} \mu(\pi_i, \pi_{-i}) \sum_{\hat\pi_i \in \Pi_i(z)} \hat\mu_i(\hat\pi_i) \, p_c(z), \tag{8}$$
which accounts for the fact that the agent follows recommendations until she receives the recommendation of playing $a$ at $I$, and, thus, she 'gets triggered' and plays according to $\hat\pi_i$ sampled from $\hat\mu_i$ from $I$ onwards. Moreover, the probability of reaching a terminal node $z \in Z$ when following the recommendations is defined as follows:
$$q_\mu(z) := \sum_{\pi \in \Pi(z)} \mu(\pi) \, p_c(z). \tag{9}$$
Then, the following lemma provides the trigger-agent-based characterization of EFCEs:

Lemma 4 (Farina, Bianchi, and Sandholm (2020)). Given an EFG $\Gamma$, $\mu \in \Delta_\Pi$ is an EFCE of $\Gamma$ if for every $i \in N$ and $(I, a, \hat\mu_i)$-trigger agent for player $i$, it holds that:
$$\sum_{z \in Z(I,a)} q_\mu(z) \, u_i(z) \geq \sum_{z \in Z(I)} p^{I,a}_{\mu,\hat\mu_i}(z) \, u_i(z).$$
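Equation (9) can be illustrated on a hypothetical toy game (not one from the paper): a chance coin flip with probability 1/2 each, followed by a correlated recommendation of a pure-strategy profile; each (coin, profile) pair identifies one terminal node, so $q_\mu(z)$ is just $\mu(\pi) \, p_c(z)$.

```python
# Hedged toy computation of the reach probabilities q_mu(z) of Eq. (9)
# in an assumed game: chance flips a coin, then the correlation device
# draws a profile of pure strategies; a terminal is a (coin, profile) pair.
from itertools import product

mu = {("L", "l"): 0.5, ("R", "r"): 0.5}    # assumed correlated distribution
p_chance = {"heads": 0.5, "tails": 0.5}    # assumed chance probabilities

def reach_probabilities(mu, p_chance):
    # q_mu(z) = sum over profiles reaching z of mu(pi) * p_c(z); here each
    # profile reaches exactly one terminal per chance outcome.
    out = {}
    for (profile, m), (coin, pc) in product(mu.items(), p_chance.items()):
        out[(coin,) + profile] = m * pc
    return out

reach = reach_probabilities(mu, p_chance)
assert abs(sum(reach.values()) - 1.0) < 1e-12  # q_mu is a distribution over Z
print(reach[("heads", "L", "l")])  # 0.5 * 0.5 = 0.25
```

Trigger-agent probabilities as in Eq. (8) would replace the recommended strategy with one sampled from $\hat\mu_i$ from $I$ onwards; the bookkeeping is the same.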
C LP Formulation for the Set of EFCEs in n-Player EFGs

We show how to derive the LP formulation (Problem 1) for the set of EFCEs in $n$-player EFGs originally introduced by Huang and von Stengel (2008), using the characterization of EFCEs based on trigger agents (see Definition 4 and Lemma 4). In the following, we assume that a probability distribution $\mu \in \Delta_\Pi$ is encoded by means of variables $\mu[\pi]$, defined for $\pi \in \Pi$. For every player $i \in N$, infoset $I \in \mathcal{I}_i$, and action $a \in A(I)$, we introduce a variable $u[i, I, a]$ representing player $i$'s expected utility when following the recommendation to play $a$ at infoset $I$. These variables are defined by the following constraints:
$$u[i, I, a] = \sum_{z \in Z(I,a)} \sum_{\pi \in \Pi(z)} \mu[\pi] \, p_c(z) \, u_i(z) \quad \forall i \in N, \forall I \in \mathcal{I}_i, \forall a \in A(I). \tag{10}$$
Then, we need to introduce constraints which ensure that following recommendations guarantees a utility at least as large as that achieved by any $(I, a, \hat\mu_i)$-trigger agent. For every infoset $J \in \mathcal{I}_i$ such that $I \preceq J$, we introduce a variable $v[i, I, a, J]$ that encodes the maximum expected utility obtained at infoset $J$ by trigger agents associated with $I$ and $a$. We can recursively define the variables $v[i, I, a, J]$ as follows:
$$v[i, I, a, J] \geq \sum_{z \in Z_\bot(J,a')} \sum_{\substack{\pi_i \in \Pi_i(I,a) \\ \pi_{-i} \in \Pi_{-i}(z)}} \mu[\pi_i, \pi_{-i}] \, p_c(z) \, u_i(z) + \sum_{K \in \mathcal{C}(J,a')} v[i, I, a, K] \tag{11}$$
$$\forall i \in N, \forall I \in \mathcal{I}_i, \forall a \in A(I), \forall J \in \mathcal{I}_i : I \preceq J, \forall a' \in A(J),$$
where we notice that the first summation is over the set of terminal nodes which are reachable from $J$ by playing $a'$ without traversing any other player $i$'s infoset. The following incentive constraints complete the formulation:
$$u[i, I, a] = v[i, I, a, I] \quad \forall i \in N, \forall I \in \mathcal{I}_i, \forall a \in A(I). \tag{12}$$
A direct application of Lemma 4 and LP duality is enough to prove that Constraints (10), (11), and (12) correctly characterize the set of EFCEs (formally, it is enough to follow steps similar to those in the proof of Theorem 4, with the only difference that Constraints (13d) in the inner maximization problems and the corresponding dual variables $w[i, I, a, J, a']$ are missing). By substituting the equalities in Constraints (10) and (12) into Constraints (11), we obtain the following set of linear constraints, which are equivalent to those introduced by Huang and von Stengel (2008):
$$A \mu + B v \geq 0,$$
where $\mu$ is the vector whose components are the variables $\mu[\pi]$ for $\pi \in \Pi$, while $v$ is the vector of variables $v[i, I, a, J]$ indexed by $i \in N$, $I \in \mathcal{I}_i$, $a \in A(I)$, and $J \in \mathcal{I}_i : I \preceq J$. Moreover, the matrices $A$ and $B$ encode the coefficients appearing in Constraints (10) and (11). Specifically, the non-zero entries of $A$ are products $p_c(z) u_i(z)$, while those of $B$ are either $1$ or $-1$.

D Discussion on EFPCEs and Un-Reduced Strategies
Next, we discuss the reasons why EFPCEs need un-reduced strategy profiles in order to be defined consistently. First, we remark that, as discussed by Von Stengel and Forges (2008), restricting the definition of the probability distributions $\mu$ to reduced strategy profiles (i.e., those in which each player's pure strategy only specifies actions at infosets reachable given that player's moves, see (Vermeulen and Jansen 1998) for a formal definition) is sufficient for the characterization of the classical notions of correlated equilibria. Intuitively, the reason is that, at the equilibrium, each player follows recommendations issued by the correlation device, and, thus, the latter does not need to specify action recommendations for the player at those infosets that are never reached when following recommendations at the preceding infosets of the same player.

This is no longer the case if we introduce trembles in the game, which make all the infosets reachable with positive probability even when committing to following recommendations. As a result, the correlation device has to be ready to issue action recommendations everywhere in the game. Then, when defining EFPCEs, we cannot restrict the attention to probability distributions over reduced strategy profiles, and un-reduced ones are necessary. The EFG in Figure 1 (Left) provides an example where un-reduced strategies are necessary to express EFPCEs. As shown in the main text, any EFPCE of the game must recommend to play $a$ at $I$, while, at the same time, it is crucial to define recommendations also at infosets $K$ and $L$, in order to achieve optimality off the equilibrium path. Clearly, this is incompatible with reduced strategies, as any player 1's reduced strategy prescribing $a$ at $I$ does not specify anything at infosets $K$ and $L$, which are unreachable when playing $a$ at $I$.

E Detailed Examples of EFPCEs
Consider the EFG in Figure 1 (Left) and lower bound functions $\eta_t : A \to (0, 1]$ for $t \in \mathbb{N}$, with $\eta_t(a)$ converging to zero as $t \to \infty$ for each $a \in A$. First, let us notice that, without trembles, player 1 is always better off playing action $a$ at the root infoset $I$, since by selecting $c$ at the following infoset $J$ she can guarantee herself a utility strictly larger than any utility she can achieve by playing $b$. Thus, any EFPCE of the game (as well as any EFCE) must recommend $a$ at $I$ with probability 1, since there is no way player 1 can be incentivized to play $b$. Then, in the sub-game reached when playing $a$ at $I$, it is easy to check that recommending the pairs of actions $(c, m)$, $(c, n)$, and $(d, m)$ each with probability $\frac{1}{3}$ is an equilibrium, as each player has no incentive to deviate from each possible recommendation, even in presence of trembles. As an example, consider the case in which player 1 is told to play action $c$ at $J$. Then, by following the recommendations, she gets a utility equal to:
$$\frac{1}{3} \Big[ 2 \, (1 - \eta_t(n))(1 - \eta_t(d)) + 3 \, (1 - \eta_t(n)) \, \eta_t(d) + 1 \cdot \eta_t(n)(1 - \eta_t(d)) + 0 \cdot \eta_t(n) \, \eta_t(d) \Big] + \frac{1}{3} \Big[ 1 \cdot (1 - \eta_t(m))(1 - \eta_t(d)) + 0 \cdot (1 - \eta_t(m)) \, \eta_t(d) + 2 \, \eta_t(m)(1 - \eta_t(d)) + 3 \, \eta_t(m) \, \eta_t(d) \Big],$$
where the first sum is for the case in which $(c, m)$ is recommended, while the second one is for $(c, n)$. Each term appearing in a sum is for one of the four possible outcomes that may result when following recommendations subject to trembles. Instead, player 1's utility if deviating to $d$ at $J$ is:
$$\frac{1}{3} \Big[ 3 \, (1 - \eta_t(n))(1 - \eta_t(c)) + 2 \, (1 - \eta_t(n)) \, \eta_t(c) + 0 \cdot \eta_t(n)(1 - \eta_t(c)) + 1 \cdot \eta_t(n) \, \eta_t(c) \Big] + \frac{1}{3} \Big[ 0 \cdot (1 - \eta_t(m))(1 - \eta_t(c)) + 1 \cdot (1 - \eta_t(m)) \, \eta_t(c) + 3 \, \eta_t(m)(1 - \eta_t(c)) + 2 \, \eta_t(m) \, \eta_t(c) \Big].$$
A simple calculation shows that the first quantity is greater than or equal to the second one as the lower bounds approach zero. Analogous conditions hold for the other recommendations at infosets $X$ and $J$. Notice that, when the lower bounds are zero, the conditions above collapse to the classical incentive constraints for EFCE. The correlation device described up to this point is sufficient to define an EFCE, as recommendations at infosets $Y$, $K$, and $L$ are not relevant given that they do not influence players' utilities at the equilibrium ($b$ is never recommended). However, in perturbed extended games, these infosets could be reached due to a tremble, which happens with probability $\eta_t(b)$, and, thus, recommendations at such infosets become relevant. Then, it is easy to check that player 2 must be told to play $p$ at $Y$, because her utility is always larger when playing $p$ than when playing $o$. Moreover, with an analogous reasoning, player 1 must be recommended to play $e$ and $h$ at $K$ and $L$, respectively. As an example, consider the case in which player 1 is recommended to play $e$ at $K$: following the recommendation yields a strictly larger expected utility than deviating to $f$, for any choice of the lower bounds. Similar conditions hold for infosets $Y$ and $L$. In conclusion, we can state that the following distribution $\mu \in \Delta_\Pi$ defines an EFPCE:
$$\mu(aceh, mp) = \mu(aceh, np) = \mu(adeh, mp) = \frac{1}{3}.$$
Let us remark that this is not the only EFPCE of the game, as there are other ways of correlating the players' behavior at infosets $X$ and $J$ while satisfying the required incentive constraints. For example, setting $\mu(aceh, mp) = \mu(aceh, np) = \mu(adeh, mp) = \mu(adeh, np) = \frac{1}{4}$ defines a valid EFPCE that results from a PE of the game (where players play uniform strategies at infosets $X$ and $J$).

F Proofs of Theorems and Lemmas
In this section, we provide the complete proofs of Theorems 1, 2, 3, 4, and Lemma 3.
Theorem 1.
This relation holds: $PE \subseteq EFPCE \subseteq EFCE$.

Proof.
Clearly, $EFPCE \subseteq EFCE$ holds since any PE of $\Gamma_{ext}(\mu)$ is also an NE. As for the other relation, let $\{\beta_i\}_{i \in N}$ be a PE of $\Gamma$ obtained for a sequence of perturbed games $\{(\Gamma, \eta_t)\}_{t \in \mathbb{N}}$ and a corresponding sequence of NEs in these games, namely $\{\beta_{i,t}\}_{i \in N}$ for $t \in \mathbb{N}$, where each $\beta_{i,t}$ is a well-defined player $i$'s behavior strategy in $(\Gamma, \eta_t)$, i.e., it holds $\beta_{i,t}(a) \geq \eta_t(a)$ for all $t \in \mathbb{N}$, $i \in N$, and $a \in A_i$. Let $\mu \in \Delta_\Pi$ be such that, for every $\pi \in \Pi$, it holds $\mu(\pi) = \prod_{i \in N} \prod_{I \in \mathcal{I}_i} \beta_i(\pi_i(I))$. Consider the extended game $\Gamma_{ext}(\mu)$, where we denote by $\mathcal{I}^{ext}_i$ the set of all infosets of player $i \in N$, one for each infoset $I \in \mathcal{I}_i$ of $\Gamma$ and possible combination of recommendations received by $i$ at the infosets $J \in \mathcal{I}_i : J \preceq I$. Overloading the notation, for each infoset $I \in \mathcal{I}^{ext}_i$ of the extended game, we use $I$ as well to denote the corresponding infoset in the original game. We also use $A(I)$ as the set of actions available at $I \in \mathcal{I}^{ext}_i$. Let $\{(\Gamma_{ext}(\mu), \eta_t)\}_{t \in \mathbb{N}}$ be the sequence of perturbed extended games resulting from $\{(\Gamma, \eta_t)\}_{t \in \mathbb{N}}$. Furthermore, for each $t \in \mathbb{N}$ and player $i \in N$, we define a player $i$'s behavior strategy for $(\Gamma_{ext}(\mu), \eta_t)$ such that, at each infoset $I \in \mathcal{I}^{ext}_i$:
• all the residual probability given the lower bounds, namely $1 - \sum_{a \in A(I) : a \neq \pi_i(I)} \eta_t(a)$, is placed on the action $\pi_i(I)$ which is recommended at $I$; and
• all the other, non-recommended actions $a \in A(I) : a \neq \pi_i(I)$ are played with probabilities equal to their corresponding lower bounds $\eta_t(a)$.
Intuitively, these strategies encode the fact that players follow recommendations in the perturbed extended games $(\Gamma_{ext}(\mu), \eta_t)$, where trembles prevent them from perfectly obeying recommendations.
Given the definition of $\mu$ and the fact that each $\{\beta_{i,t}\}_{i \in N}$ constitutes an NE for the perturbed game $(\Gamma, \eta_t)$, we can conclude that the behavior strategies defined above constitute NEs for the perturbed extended games $(\Gamma_{ext}(\mu), \eta_t)$. Thus, any limit point of the sequence defined by such behavior strategies for $t \in \mathbb{N}$ is a PE of $\Gamma_{ext}(\mu)$. Moreover, by definition, any limit point prescribes to play recommended actions, which shows that $\mu$ defines an EFPCE of $\Gamma$, proving that $PE \subseteq EFPCE$.

Theorem 2.
The following relations hold:
• $EFPCE \not\subseteq NE$ and $NE \not\subseteq EFPCE$;
• $EFPCE \cap NE = PE$.

Proof. Let us start with the first bullet point. We consider the EFG in Figure 1 (Left) in order to provide examples that prove the two relations. Notice that, in such a game, player 1 is always better off playing action $a$ at the first infoset $I$, since by playing $c$ at $J$ she can guarantee herself more than she can achieve by playing action $b$. Then, it is easy to check that, in any NE of the game, the players play behavior strategies $\beta_1$ and $\beta_2$ such that:
• $\beta_1(a) = 1$ and $\beta_1(b) = 0$, while $\beta_1(c) = \beta_1(d) = \frac{1}{2}$; and
• $\beta_2(m) = \beta_2(n) = \frac{1}{2}$.
The players' behavior at the other infosets can be any, as it does not affect players' utilities at the equilibrium (given that infosets $Y$, $K$, and $L$ are never reached due to $\beta_1(b) = 0$). As we have shown in the main text, one EFPCE of the game is the distribution $\mu \in \Delta_\Pi$ such that $\mu(aceh, mp) = \mu(adeh, mp) = \mu(aceh, np) = \frac{1}{3}$, which enforces each player to follow recommendations, even in presence of trembles. Clearly, this distribution $\mu$ cannot come from players' behavior strategies, and, thus, it cannot result from an NE. This shows that $EFPCE \not\subseteq NE$. Moreover, notice that any NE such that $\beta_1(f) > 0$ cannot determine a distribution $\mu \in \Delta_\Pi$ which is an EFPCE, since it would be the case that action $f$ is recommended with positive probability when reaching infoset $K$ (due to trembles). However, player 1 cannot have any incentive to follow such a recommendation, as she can gain a strictly larger utility by deviating to $e$. This proves that $NE \not\subseteq EFPCE$.

As for the second bullet point, notice that all the EFPCEs $\mu \in \Delta_\Pi$ which are also NEs must be such that $\mu$ is obtained from some players' behavior strategies defining an NE. As a result, by definition of EFPCE, we can conclude that such behavior strategies are indeed PEs.

Theorem 3.
Given a perturbed extended game $(\Gamma_{ext}(\mu), \eta)$, following recommendations is an NE of the game if for every $i \in N$ and $(I, a, \hat\mu_i)$-trigger agent for player $i$, it holds that:
$$\sum_{z \in Z(I)} \sum_{\substack{\pi_i \in \Pi_i(a) \\ \pi_{-i} \in \Pi_{-i}}} \xi_\eta(z, \pi) \, \mu(\pi) \, u_i(z) \geq \sum_{z \in Z(I)} p^{\eta,I,a}_{\mu,\hat\mu_i}(z) \, u_i(z).$$

Proof.
Given the definitions of $q^\eta_\mu(z)$, $p^{\eta,I,a}_{\mu,\hat\mu_i}(z)$, and $y^{\eta,I,a}_{\mu,\hat\mu_i}(z)$, following recommendations is an NE of $(\Gamma_{ext}(\mu), \eta)$ if for every $i \in N$ and $(I, a, \hat\mu_i)$-trigger agent for player $i$, it holds that:
$$\sum_{z \in Z} q^\eta_\mu(z) \, u_i(z) \geq \sum_{z \in Z \setminus Z(I)} q^\eta_\mu(z) \, u_i(z) + \sum_{z \in Z(I)} y^{\eta,I,a}_{\mu,\hat\mu_i}(z) \, u_i(z).$$
Equivalently, we can write:
$$\sum_{z \in Z(I)} q^\eta_\mu(z) \, u_i(z) \geq \sum_{z \in Z(I)} y^{\eta,I,a}_{\mu,\hat\mu_i}(z) \, u_i(z),$$
that is,
$$\sum_{z \in Z(I)} \Bigg( \sum_{\substack{\pi_i \in \Pi_i(a) \\ \pi_{-i} \in \Pi_{-i}}} \xi_\eta(z, \pi) \mu(\pi) + \sum_{\substack{\pi_i \in \Pi_i \setminus \Pi_i(a) \\ \pi_{-i} \in \Pi_{-i}}} \xi_\eta(z, \pi) \mu(\pi) \Bigg) u_i(z) \geq \sum_{z \in Z(I)} p^{\eta,I,a}_{\mu,\hat\mu_i}(z) \, u_i(z) + \sum_{z \in Z(I)} \sum_{\substack{\pi_i \in \Pi_i \setminus \Pi_i(a) \\ \pi_{-i} \in \Pi_{-i}}} \xi_\eta(z, \pi) \mu(\pi) \, u_i(z).$$
Canceling the common terms, we obtain:
$$\sum_{z \in Z(I)} \sum_{\substack{\pi_i \in \Pi_i(a) \\ \pi_{-i} \in \Pi_{-i}}} \xi_\eta(z, \pi) \, \mu(\pi) \, u_i(z) \geq \sum_{z \in Z(I)} p^{\eta,I,a}_{\mu,\hat\mu_i}(z) \, u_i(z),$$
which proves the result.

Theorem 4.
Given a perturbed extended game $(\Gamma_{ext}(\mu), \eta_t)$, if Constraints (3), (4), and (5) can be satisfied for the vector $\mu$ of variables $\mu[\pi]$ encoding the distribution $\mu$, then following recommendations is an NE of $(\Gamma_{ext}(\mu), \eta_t)$.

Proof. By Theorem 3, following recommendations is an NE of $(\Gamma_{ext}(\mu), \eta_t)$ if the vector $\mu$ of variables $\mu[\pi]$ encoding the distribution $\mu$ satisfies the following constraints (from here on, we omit the subscript $t$ for ease of notation):
$$\sum_{z \in Z(I)} \sum_{\substack{\pi_i \in \Pi_i(a) \\ \pi_{-i} \in \Pi_{-i}}} \xi_\eta(z, \pi) \, \mu(\pi) \, u_i(z) = \sum_{z \in Z(I)} p^{\eta,I,a}_{\mu,\hat\mu^{I,a}_i}(z) \, u_i(z) \quad \forall i \in N, \forall I \in \mathcal{I}_i, \forall a \in A(I)$$
$$\hat\mu^{I,a}_i \in \operatorname*{argmax}_{\hat\mu_i \in \Delta_{\Pi_i(I)}} \sum_{z \in Z(I)} p^{\eta,I,a}_{\mu,\hat\mu_i}(z) \, u_i(z) \quad \forall i \in N, \forall I \in \mathcal{I}_i, \forall a \in A(I),$$
where we replaced quantifications over all player $i$'s strategies $\hat\mu_i \in \Delta_{\Pi_i(I)}$ with inner maximizations, by introducing auxiliary variables $\hat\mu^{I,a}_i$ for each player $i \in N$, infoset $I \in \mathcal{I}_i$, and action $a \in A(I)$. Next, let us notice that, as long as the objective to be maximized in each inner problem is the sum $\sum_{z \in Z(I)} p^{\eta,I,a}_{\mu,\hat\mu_i}(z) u_i(z)$ (which only contains terms referred to terminal nodes reachable from $I$), strategies $\hat\mu_i \in \Delta_{\Pi_i(I)}$ can be replaced with realization plans $x_i : \Sigma_i \to [0, 1]$ such that $x_i(\sigma_i(I)) = 1$ (i.e., where the probability of reaching infoset $I$ given player $i$'s moves is $1$). This holds thanks to the equivalence between mixed strategies and realization plans (Von Stengel 1996). As a result, for every player $i \in N$, infoset $I \in \mathcal{I}_i$, and action $a \in A(I)$, we can write each inner maximization problem as follows:
$$\max \; \sum_{z \in Z(I)} \sum_{\substack{\pi_i \in \Pi_i(a) \\ \pi_{-i} \in \Pi_{-i}(z)}} \xi_\eta(z, I, \pi) \, \mu(\pi) \, u_i(z) \, x_i[\sigma_i(z)] \tag{13a}$$
$$\text{s.t.} \quad x_i[\sigma_i(I)] = 1 \tag{13b}$$
$$x_i[\sigma_i(J)] = \sum_{a' \in A(J)} x_i[\sigma_i(J)a'] \quad \forall J \in \mathcal{I}_i : I \preceq J \tag{13c}$$
$$x_i[\sigma_i(J)a'] \geq \eta(a') \, x_i[\sigma_i(J)] \quad \forall J \in \mathcal{I}_i : I \preceq J, \forall a' \in A(J) \tag{13d}$$
$$x_i[\sigma_i(I)] \geq x_i[\sigma_i(J)a'] \geq 0 \quad \forall J \in \mathcal{I}_i : I \preceq J, \forall a' \in A(J),$$
where $x_i[\sigma_i(I)]$ and $x_i[\sigma_i(J)a']$ are variables encoding a player $i$'s realization plan restricted to sequences extending $\sigma_i(I)$ (these are the only variables needed, since Objective (13a) does not depend on the realization probabilities of other sequences). We also notice that the trembles associated with player $i$'s actions at infosets $J \in \mathcal{I}_i : I \preceq J$ (managed by the terms $\xi_\eta(z, I, \hat\pi_i)$ in the definition of $p^{\eta,I,a}_{\mu,\hat\mu_i}(z)$) are encoded by Constraints (13d), which ensure that each action $a' \in A(J)$ is played with probability $x_i[\sigma_i(J)a'] / x_i[\sigma_i(J)] \geq \eta(a')$ (given that the denominator is non-null). The dual of Problem 13 reads as follows:
$$\min \; v[i, I, a, \varnothing] \tag{14a}$$
$$\text{s.t.} \quad v[i, I, a, \varnothing] \geq v[i, I, a, I] + \sum_{a' \in A(I)} \eta(a') \, w[i, I, a, I, a'] \tag{14b}$$
$$v[i, I, a, J] + w[i, I, a, J, a'] \geq \sum_{z \in Z_\bot(J,a')} \sum_{\substack{\pi_i \in \Pi_i(a) \\ \pi_{-i} \in \Pi_{-i}(z)}} \xi_\eta(z, I, \pi) \, \mu(\pi) \, u_i(z) + \sum_{K \in \mathcal{C}(J,a')} \Big( v[i, I, a, K] + \sum_{a'' \in A(K)} \eta(a'') \, w[i, I, a, K, a''] \Big) \quad \forall J \in \mathcal{I}_i : I \preceq J, \forall a' \in A(J) \tag{14c}$$
$$w[i, I, a, J, a'] \leq 0 \quad \forall J \in \mathcal{I}_i : I \preceq J, \forall a' \in A(J),$$
where $v[i, I, a, \varnothing]$ is the dual variable associated with Constraint (13b), the $v[i, I, a, J]$ for $J \in \mathcal{I}_i : I \preceq J$ are the dual variables associated with Constraints (13c), and the $w[i, I, a, J, a']$ for $J \in \mathcal{I}_i : I \preceq J$ and $a' \in A(J)$ are the dual variables associated with Constraints (13d).
By using the fact that the variable $v[i, I, a, \varnothing]$ appears only in Constraint (14b) and by changing the sign of the variables $w[i, I, a, J, a']$, we can re-write Problem (14) as follows:
$$\min \; v[i, I, a, I] - \sum_{a' \in A(I)} \eta(a') \, w[i, I, a, I, a'] \tag{15a}$$
$$\text{s.t.} \quad v[i, I, a, J] - w[i, I, a, J, a'] \geq \sum_{z \in Z_\bot(J,a')} \sum_{\substack{\pi_i \in \Pi_i(a) \\ \pi_{-i} \in \Pi_{-i}(z)}} \xi_\eta(z, I, \pi) \, \mu(\pi) \, u_i(z) + \sum_{K \in \mathcal{C}(J,a')} \Big( v[i, I, a, K] - \sum_{a'' \in A(K)} \eta(a'') \, w[i, I, a, K, a''] \Big) \quad \forall J \in \mathcal{I}_i : I \preceq J, \forall a' \in A(J) \tag{15b}$$
$$w[i, I, a, J, a'] \geq 0 \quad \forall J \in \mathcal{I}_i : I \preceq J, \forall a' \in A(J).$$
Then, we can remove the inner maximization problems by enforcing strong duality, i.e., we add constraints equating Objective (13a) and Objective (15a). Noticing that Objective (13a) is equal to $\sum_{z \in Z(I)} p^{\eta,I,a}_{\mu,\hat\mu^{I,a}_i}(z) u_i(z)$, we obtain the following set of linear constraints:
$$\sum_{z \in Z(I)} \sum_{\substack{\pi_i \in \Pi_i(a) \\ \pi_{-i} \in \Pi_{-i}}} \xi_\eta(z, \pi) \, \mu(\pi) \, u_i(z) = v[i, I, a, I] - \sum_{a' \in A(I)} \eta(a') \, w[i, I, a, I, a'] \quad \forall i \in N, \forall I \in \mathcal{I}_i, \forall a \in A(I)$$
$$v[i, I, a, J] - w[i, I, a, J, a'] \geq \sum_{z \in Z_\bot(J,a')} \sum_{\substack{\pi_i \in \Pi_i(a) \\ \pi_{-i} \in \Pi_{-i}(z)}} \xi_\eta(z, I, \pi) \, \mu(\pi) \, u_i(z) + \sum_{K \in \mathcal{C}(J,a')} \Big( v[i, I, a, K] - \sum_{a'' \in A(K)} \eta(a'') \, w[i, I, a, K, a''] \Big) \quad \forall i \in N, \forall I \in \mathcal{I}_i, \forall a \in A(I), \forall J \in \mathcal{I}_i : I \preceq J, \forall a' \in A(J)$$
$$w[i, I, a, J, a'] \geq 0 \quad \forall i \in N, \forall I \in \mathcal{I}_i, \forall a \in A(I), \forall J \in \mathcal{I}_i : I \preceq J, \forall a' \in A(J).$$
By introducing the variables $u[i, I, a]$ we get to the result.

Lemma 3. If $y \geq 0$ is such that $B^\top y = 0$, then there exists $\mu$ encoding a product distribution $\mu \in \Delta_\Pi$ such that $\mu^\top A_t^\top y = 0$. Moreover, $\mu$ can be computed in polynomial time.

Proof. The proof follows the same lines as the proof of Lemma 5 of Huang and von Stengel (2008) (its complete version can be found in (Huang 2011)).
This is an extension of the CE existence proof by Hart and Schmeidler (1989) to the case of EFCE. It is based on the construction of an auxiliary 2-player zero-sum EFG, where player 1 plays first by selecting a strategy profile $\pi \in \Pi$, and player 2 plays second by choosing an infoset $I \in \mathcal{I}_i$ of some player $i \in N$, an action $a \in A(I)$, and a combination of actions at the following infosets $J \in \mathcal{I}_i : I \preceq J$ (intuitively, player 2 chooses a trigger agent corresponding to $I$ and $a$, together with a possible trigger agent's behavior). It is easy to see that, for our Problem 7, the variables in $y$ have the same meaning as in Lemma 5 of Huang and von Stengel (2008), i.e., they represent valid player 2's strategies in the auxiliary game. This is because they satisfy the same linear restrictions $B^\top y = 0$. As a result, the only difference is in the coefficients of the exponentially many constraints, which, in our case, are defined by the (perturbed) matrix $A_t$, rather than $A$. These define the payoffs in the auxiliary game. In particular, following steps analogous to those by Huang (2011), we can conclude that, in the auxiliary game, player 2's expected payment to player 1 when the latter plays $\pi \in \Pi$ is given by the entry of $A_t^\top y$ corresponding to $\pi$.
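Lemma 3 revolves around product distributions, i.e., distributions $\mu$ that factorize into independent per-player strategies. A hedged sketch of that factorization test, applied to the example distributions reconstructed from the main text (the correlated one over $(c, m)$, $(c, n)$, $(d, m)$ cannot arise from independent behavior, while the uniform one can):

```python
# Check whether a two-player distribution mu over strategy pairs is a
# product distribution: mu(p1, p2) = mu_1(p1) * mu_2(p2) everywhere,
# where mu_1 and mu_2 are the marginals.
from itertools import product

def is_product(mu, tol=1e-9):
    rows = sorted({p1 for p1, _ in mu})
    cols = sorted({p2 for _, p2 in mu})
    m1 = {p1: sum(mu.get((p1, p2), 0) for p2 in cols) for p1 in rows}
    m2 = {p2: sum(mu.get((p1, p2), 0) for p1 in rows) for p2 in cols}
    return all(abs(mu.get((p1, p2), 0) - m1[p1] * m2[p2]) < tol
               for p1, p2 in product(rows, cols))

correlated = {("c", "m"): 1/3, ("c", "n"): 1/3, ("d", "m"): 1/3}
uniform = {(p1, p2): 1/4 for p1 in "cd" for p2 in "mn"}
print(is_product(correlated), is_product(uniform))  # False True
```

This is only the membership test, not the constructive oracle of Lemma 3, which additionally computes a suitable product $\mu$ from a given dual vector $y$.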