Correlation in Extensive-Form Games: Saddle-Point Formulation and Benchmarks
Gabriele Farina
Computer Science Department, Carnegie Mellon University
[email protected]
Chun Kai Ling
Computer Science Department, Carnegie Mellon University
[email protected]
Fei Fang
Institute for Software Research, Carnegie Mellon University
[email protected]
Tuomas Sandholm
Computer Science Department, Carnegie Mellon University; Strategic Machine, Inc.; Strategy Robot, Inc.; Optimized Markets, Inc.
[email protected]
October 27, 2019
Abstract
While Nash equilibrium in extensive-form games is well understood, very little is known about the properties of extensive-form correlated equilibrium (EFCE), both from a behavioral and from a computational point of view. In this setting, the strategic behavior of players is complemented by an external device that privately recommends moves to agents as the game progresses; players are free to deviate at any time, but will then not receive future recommendations. Our contributions are threefold. First, we show that an EFCE can be formulated as the solution to a bilinear saddle-point problem. To showcase how this novel formulation can inspire new algorithms to compute EFCEs, we propose a simple subgradient descent method which exploits this formulation and the structural properties of EFCEs. Our method has better scalability than the prior approach based on linear programming. Second, we propose two benchmark games, which we hope will serve as the basis for future evaluation of EFCE solvers. These games were chosen so as to cover two natural application domains for EFCE: conflict resolution via a mediator, and bargaining and negotiation. Third, we document the qualitative behavior of EFCE in our proposed games. We show that the social-welfare-maximizing equilibria in these games are highly nontrivial and exhibit surprisingly subtle sequential behavior that so far has not received attention in the literature.
1. Introduction
Nash equilibrium (NE) (Nash, 1950), the most seminal concept in non-cooperative game theory, captures a multi-agent setting where each agent is selfishly motivated to maximize their own payoff. The assumption underpinning NE is that the interaction is completely decentralized: the behavior of each agent is not regulated by any external orchestrator. Contrasted with the other—often utopian—extreme of a fully managed interaction, where an external dictator controls the behavior of each agent so that the whole system moves to a desired state, the social welfare that can be achieved by NE is generally lower, sometimes dramatically so (Koutsoupias & Papadimitriou, 1999; Roughgarden & Tardos, 2002). Yet, in many realistic interactions, some intermediate form of centralized control can be achieved. In particular, in his landmark paper, Aumann (1974) proposed the concept of correlated equilibrium (CE), where a mediator (the correlation device) can recommend behavior, but not enforce it. In a CE, the correlation device is constructed so that the agents—which are still modeled as fully rational and selfish, just like in an NE—have no incentive to deviate from the private recommendation. Allowing correlation of actions while ensuring selfishness makes CE a good candidate solution concept in multi-agent and semi-competitive settings such as traffic control, load balancing (Ashlagi et al., 2008), and carbon abatement (Ray & Gupta, 2009), and it can lead to win-win outcomes.

In this paper, we study the natural extension of correlated equilibrium to extensive-form (i.e., sequential) games, known as extensive-form correlated equilibrium (EFCE) (von Stengel & Forges, 2008). Like CE, EFCE assumes that the strategic interaction is complemented by an external mediator; however, in an EFCE the mediator only privately reveals the recommended next move to each acting player, instead of revealing the whole plan of action throughout the game (i.e., the recommended move at all decision points) for each player at the beginning of the game. Furthermore, while each agent is free to defect from the recommendation at any time, this comes at the cost of future recommendations.

While the properties of correlation in normal-form games are well studied, they do not automatically transfer to the richer world of sequential interactions. It is known in the study of NE that sequential interactions can pose different challenges, especially in settings where the agents retain private information. Conceptually, the players can strategically adjust to dynamic observations about the environment and their opponents as the game progresses. Despite tremendous interest and progress in recent years in computing NE in sequential interactions with private information, with significant milestones achieved in poker games (Bowling et al., 2015; Brown & Sandholm, 2017; Moravčík et al., 2017; Brown & Sandholm, 2019b) and other large, real-world domains, not much has been done to increase our understanding of (extensive-form) correlated equilibria in these settings.

∗ This paper was accepted for publication at NeurIPS 2019.
Contributions
Our primary objective with this paper is to spark more interest in the community towards a deeper understanding of the behavioral and computational aspects of EFCE.

• In Section 3 we show that an EFCE in a two-player general-sum game is the solution to a bilinear saddle-point problem (BSPP). This conceptual reformulation complements the EFCE construction by von Stengel & Forges (2008), and allows for the development of new and efficient algorithms. As a proof of concept, by using our reformulation we devise a variant of projected subgradient descent which outperforms the linear-programming (LP)-based algorithms proposed by von Stengel & Forges (2008) in large game instances.

• In Section 5 we propose two benchmark games; each game is parametric, so that these games can scale in size as desired. The first game is a general-sum variant of the classic war game Battleship. The second game is a simplified version of the Sheriff of Nottingham board game. These games were chosen so as to cover two natural application domains for EFCE: conflict resolution via a mediator, and bargaining and negotiation.

• By analyzing EFCE in our proposed benchmark games, we show that even if the mediator cannot enforce behavior, it can induce significantly higher social welfare than NE and successfully deter players from deviating in at least two (often connected) ways: (1) using certain sequences of actions as 'passcodes' to verify that a player has not deviated: defecting leads to incomplete or wrong passcodes, which indicate deviation; and (2) inducing opponents to play punitive actions against players that have deviated from the recommendation, if such a deviation is detected. Crucially, both deterrents are unique to sequential interactions and do not apply to non-sequential games. This corroborates the idea that the mediation of sequential interactions is a qualitatively different problem than that of non-sequential games, and further justifies the study of EFCE as an interesting direction for the community. To our knowledge, these are the first experimental results and observations on EFCE in the literature.

The source code for our game generators and subgradient method is published online at https://github.com/Sandholm-Lab/game-generators and https://github.com/Sandholm-Lab/efce-subgradient.
2. Preliminaries
Extensive-form games (EFGs) are sequential games that are played over a rooted game tree. Each node in the tree belongs to a player and corresponds to a decision point for that player. Outgoing edges from a node $v$ correspond to actions that can be taken by the player to which $v$ belongs. Each terminal node in the game tree is associated with a tuple of payoffs that the players receive should the game end in that state. To capture imperfect information, the set of vertices of each player is partitioned into information sets. The vertices in the same information set are indistinguishable to the player that owns those vertices. For example, in a game of poker, a player cannot distinguish between certain states that only differ in the opponent's private hand. As a result, the strategy of the player (specifying which action to take) is defined on the information sets instead of the vertices. For the purpose of this paper, we only consider perfect-recall EFGs. This property means that each player does not forget any of their previous actions, nor any private or public observation that the player has made. The perfect-recall property can be formalized by requiring that, for any two vertices in the same information set, the paths from those vertices to the root of the game tree contain the exact same sequence of actions for the acting player at the information set.

A pure normal-form strategy for Player $i$ defines a choice of action for every information set that belongs to $i$. A player can play a mixed strategy, i.e., sample from a distribution over their pure normal-form strategies. However, this representation contains redundancies: some information sets for Player $i$ may become unreachable after the player makes certain decisions higher up in the tree. Omitting these redundancies leads to the notion of reduced-normal-form strategies, which are known to be strategically equivalent to normal-form strategies (see (Shoham & Leyton-Brown, 2009) for more details). Both the normal-form and the reduced-normal-form representations are exponentially large in the size of the game tree.

Here, we fix some notation. Let $Z$ be the set of terminal states (or, equivalently, outcomes) in the game, and let $u_i(z)$ be the utility obtained by Player $i$ if the game terminates at $z \in Z$. Let $\Pi_i$ be the set of pure reduced-normal-form strategies for Player $i$. We define $\Pi_i(I)$, $\Pi_i(I, a)$ and $\Pi_i(z)$ to be the sets of reduced-normal-form strategies that (a) can lead to information set $I$, (b) can lead to $I$ and prescribe action $a$ at information set $I$, and (c) can lead to the terminal state $z$, respectively. We denote by $\Sigma_i$ the set of information set-action pairs $(I, a)$ (also referred to as sequences), where $I$ is an information set for Player $i$ and $a$ is an action at $I$. For a given terminal state $z$, let $\sigma_i(z)$ be the last $(I, a)$ pair belonging to Player $i$ encountered on the path from the root of the tree to $z$.

Extensive-Form Correlated Equilibrium
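The notation above can be made concrete with a small sketch. The following Python snippet (the tiny game tree, infoset names, and plan labels are all invented for illustration; they are not from the paper) encodes the map $z \mapsto \sigma_i(z)$ for a two-level tree and recovers $\Sigma_1$ from it:

```python
# Toy tree: Player 1 acts at infoset "I1" (actions a/b); Player 2 then acts at
# infoset "I2" (actions c/d) without observing Player 1's move, so both of
# Player 1's actions lead into the same infoset I2.
last_seq = {
    1: {"z_ac": ("I1", "a"), "z_ad": ("I1", "a"),
        "z_bc": ("I1", "b"), "z_bd": ("I1", "b")},
    2: {"z_ac": ("I2", "c"), "z_ad": ("I2", "d"),
        "z_bc": ("I2", "c"), "z_bd": ("I2", "d")},
}

def sigma(i, z):
    """sigma_i(z): the last (infoset, action) pair of Player i on the root->z path."""
    return last_seq[i][z]

# The set of sequences Sigma_i is the union of the pairs appearing above.
Sigma_1 = sorted(set(last_seq[1].values()))
print(Sigma_1)            # [('I1', 'a'), ('I1', 'b')]
print(sigma(2, "z_ad"))   # ('I2', 'd')
```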
Extensive-form correlated equilibrium (EFCE) is a solution concept for extensive-form games introduced by von Stengel & Forges (2008). As in the traditional correlated equilibrium (CE), introduced by Aumann (1974), a correlation device selects private signals for the players before the game starts. These signals are sampled from a correlated distribution $\mu$—a joint probability distribution over $\Pi_1 \times \Pi_2$—and represent recommended player strategies. However, while in a CE the recommended moves for the whole game tree are privately revealed to the players when the game starts, in an EFCE the recommendations are revealed incrementally as the players progress in the game tree. In particular, a recommended move is only revealed when the player reaches the decision point in the game for which the recommendation is relevant. Moreover, if a player ever deviates from the recommended move, they will stop receiving recommendations. To concretely implement an EFCE, one places recommendations into 'sealed envelopes', each of which may only be opened at its respective information set. Sealed envelopes may be implemented using cryptographic techniques (see (Dodis et al., 2000) for one such example).

In an EFCE, the players know less about the set of recommendations that were sampled by the correlation device. The benefits are twofold. First, the players can be more easily induced to play strategies that hurt them (but benefit the overall social welfare), as long as "on average" the players are indifferent as to whether or not to follow the recommendations: the set of EFCEs is a superset of that of CEs. Second, since the players observe less, the set of probability distributions for the correlation device for which no player has an incentive to deviate can be described succinctly in certain classes of games: von Stengel & Forges (2008, Theorem 1.1) show that in two-player, perfect-recall extensive-form games with no chance moves, the set of EFCEs can be described by a system of linear equations and inequalities of polynomial size in the game description. On the other hand, the same result cannot hold in more general settings: von Stengel & Forges (2008, Section 3.7) also show that in games with more than two players and/or chance moves, deciding the existence of an EFCE with social welfare greater than a given value is NP-hard. It is important to note that this last result only implies that the characterization of the set of all EFCEs cannot be of polynomial size in general (unless P = NP). However, the problem of finding one EFCE can be solved in polynomial time: Huang (2011) and Huang & von Stengel (2008) show how to adapt the Ellipsoid Against Hope algorithm (Papadimitriou & Roughgarden, 2008; Jiang & Leyton-Brown, 2015) to compute an EFCE in polynomial time in games with more than two players and/or with chance moves. Unfortunately, that algorithm is only theoretical, and is known to not scale beyond extremely small instances (Leyton-Brown, 2019).

Other CE-related solution concepts in sequential games include the agent-form correlated equilibrium (AFCE), where agents continue to receive recommendations even upon defection, and the normal-form coarse CE (NFCCE). NFCCE does not allow for defections during the game; in fact, before the game starts, players must decide either to commit to following all recommendations upfront (before receiving them), or elect to receive none.
3. Extensive-Form Correlated Equilibria as Bilinear Saddle-Point Problems
Our objective for this section is to cast the problem of finding an EFCE in a two-player game as a bilinear saddle-point problem, that is, a problem of the form $\min_{x \in \mathcal{X}} \max_{y \in \mathcal{Y}} x^\top A y$, where $\mathcal{X}$ and $\mathcal{Y}$ are compact convex sets. In the case of EFCE, $\mathcal{X}$ and $\mathcal{Y}$ are convex polytopes that belong to a space whose dimension is polynomial in the game tree size. This reformulation is meaningful:

• From a conceptual angle, it brings the problem of computing an EFCE closer to several other solution concepts in game theory that are known to be expressible as BSPPs. In particular, the BSPP formulation shows that an EFCE can be viewed as a NE in a two-player zero-sum game between a deviator, who is trying to decide how to best defect from recommendations, and a mediator, who is trying to come up with an incentive-compatible set of recommendations.

• From a geometric point of view, the BSPP formulation better captures the combinatorial structure of the problem: $\mathcal{X}$ and $\mathcal{Y}$ have a well-defined meaning in terms of the input game tree. This has algorithmic implications: for example, because of the structure of $\mathcal{Y}$ (which will be detailed later), the inner maximization problem can be solved via a single bottom-up game-tree traversal.

• From a computational standpoint, it opens the way to the plethora of optimization algorithms (both general-purpose and those specific to game theory) that have been developed to solve BSPPs. Examples include Nesterov's excessive gap technique (Nesterov, 2005), Nemirovski's mirror prox algorithm (Nemirovski, 2004), and regret-based methods such as mirror descent, follow-the-regularized-leader (e.g., (Hazan, 2016)), and CFR and its variants (Zinkevich et al., 2007; Farina et al., 2019; Brown & Sandholm, 2019a).

Furthermore, it is easy to show that by dualizing the inner maximization problem in the BSPP formulation, one recovers the linear program introduced by von Stengel & Forges (2008) (we show this in Appendix A). In this sense, our formulation subsumes the existing one.
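To give a self-contained flavor of BSPPs (this toy example uses plain probability simplices, not the EFCE polytopes $\mathcal{X}$ and $\mathcal{Y}$ described later), the following sketch approximately solves $\min_x \max_y x^\top A y$ for a small matrix by multiplicative-weights self-play; the matrix, step size, and iteration count are arbitrary illustrative choices:

```python
import math

# Toy BSPP: min over x in the simplex, max over y in the simplex, of x^T A y.
# This 2x2 zero-sum matrix game has value 1/5, attained at x* = (2/5, 3/5).
A = [[2, -1], [-1, 1]]
n = 2
eta = 0.02                     # step size (arbitrary small constant)
x, y = [0.5, 0.5], [0.5, 0.5]
x_avg, y_avg = [0.0, 0.0], [0.0, 0.0]
T = 50000
for _ in range(T):
    Ay = [sum(A[i][j] * y[j] for j in range(n)) for i in range(n)]   # grad wrt x
    ATx = [sum(A[i][j] * x[i] for i in range(n)) for j in range(n)]  # grad wrt y
    x = [xi * math.exp(-eta * g) for xi, g in zip(x, Ay)]   # x minimizes
    y = [yi * math.exp(+eta * g) for yi, g in zip(y, ATx)]  # y maximizes
    sx, sy = sum(x), sum(y)
    x = [v / sx for v in x]
    y = [v / sy for v in y]
    x_avg = [a + v / T for a, v in zip(x_avg, x)]
    y_avg = [a + v / T for a, v in zip(y_avg, y)]

# Saddle-point gap of the averaged iterates; it shrinks toward 0 as T grows.
best_y = max(sum(A[i][j] * x_avg[i] for i in range(n)) for j in range(n))
best_x = min(sum(A[i][j] * y_avg[j] for j in range(n)) for i in range(n))
print(round(best_y - best_x, 3))
```

The averaged iterates of the two no-regret learners form an approximate saddle point, which is the same convergence mechanism that regret-based BSPP solvers exploit at scale.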
Triggers and Deviations
One effective way to reason about extensive-form correlated equilibria is via the notion of trigger agents, which was introduced (albeit used in a different context) in Gordon et al. (2008) and Dudik & Gordon (2009):
Definition 1.
Let $\hat\sigma := (\hat I, \hat a) \in \Sigma_i$ be a sequence for Player $i$, and let $\hat\mu$ be a distribution over $\Pi_i(\hat I)$. A $(\hat\sigma, \hat\mu)$-trigger agent for Player $i$ is a player that follows all recommendations given by the mediator unless they get recommended $\hat a$ at $\hat I$; in that case, the player 'gets triggered', stops following the recommendations, and instead plays based on a pure strategy sampled from $\hat\mu$ until the game ends.

A correlated distribution $\mu$ is an EFCE if and only if any trigger agent for Player $i$ can get utility at most equal to the utility that Player $i$ earns by following the recommendations of the mediator at all decision points. In order to express the utility of the trigger agent, it is necessary to compute the probability of the game ending in each of the terminal states. As we show in Appendix B, this can be done concisely by partitioning the set of terminal nodes in the game tree into three different sets. In particular, let $Z_{\hat I, \hat a}$ be the set of terminal nodes whose path from the root of the tree contains taking action $\hat a$ at $\hat I$, and let $Z_{\hat I}$ be the set of terminal nodes whose path from the root passes through $\hat I$ but that are not in $Z_{\hat I, \hat a}$. We have

Lemma 1. Consider a $(\hat\sigma, \hat\mu)$-trigger agent for Player 1, where $\hat\sigma = (\hat I, \hat a)$. The value of the trigger agent, defined as the expected difference between the utility of the trigger agent and the utility of an agent that always follows recommendations sampled from the correlated distribution $\mu$, is computed as
$$v_{1, \hat\sigma}(\mu, \hat\mu) := \sum_{z \in Z_{\hat I}} u_1(z)\, \xi_1(\hat\sigma; z)\, y_{1, \hat\sigma}(z) - \sum_{z \in Z_{\hat I, \hat a}} u_1(z)\, \xi_1(\sigma_1(z); z),$$
where $\xi_1(\hat\sigma; z) := \sum_{\pi_1 \in \Pi_1(\hat\sigma)} \sum_{\pi_2 \in \Pi_2(z)} \mu(\pi_1, \pi_2)$ and $y_{1, \hat\sigma}(z) := \sum_{\hat\pi_1 \in \Pi_1(z)} \hat\mu(\hat\pi_1)$.

(A symmetric result holds for Player 2, with symbols $\xi_2(\hat\sigma; z)$ and $y_{2, \hat\sigma}(z)$.) It now seems natural to perform a change of variables, and pick distributions for the random variables $y_{1, \hat\sigma}(\cdot)$, $y_{2, \hat\sigma}(\cdot)$, $\xi_1(\cdot\,; \cdot)$ and $\xi_2(\cdot\,; \cdot)$ instead of $\mu$ and $\hat\mu$. Since there is only a polynomial number (in the game tree size) of combinations of arguments for these new random variables, this approach allows one to remove the redundancy of realization-equivalent normal-form plans and focus on a significantly smaller search space. In fact, the definition of $\xi = (\xi_1, \xi_2)$ also appears in (von Stengel & Forges, 2008), referred to as a (sequence-form) correlation plan. In the case of the $y_{1, \hat\sigma}$ and $y_{2, \hat\sigma}$ random variables, it is clear that the change of variables is possible via the sequence form (von Stengel, 2002); we let $\mathcal{Y}_{i, \hat\sigma}$ be the sequence-form polytope of feasible values for the vector $y_{i, \hat\sigma}$. Hence, the only hurdle is characterizing the space spanned by $\xi_1$ and $\xi_2$ as $\mu$ varies across the probability simplex. In two-player perfect-recall games with no chance moves, this is exactly one of the merits of the landmark work by von Stengel & Forges (2008). In particular, the authors prove that in those games the space of feasible $\xi$ can be captured by a polynomial number of linear constraints.
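Concretely, the quantities $\xi_1(\hat\sigma; z)$ and $y_{1, \hat\sigma}(z)$ in Lemma 1 are plain sums over the supports of $\mu$ and $\hat\mu$. A minimal sketch (the distributions, plan labels, and membership sets below are hypothetical stand-ins for $\Pi_1(\hat\sigma)$, $\Pi_2(z)$, and $\Pi_1(z)$):

```python
# mu: correlated distribution over pairs of pure reduced-normal-form plans
mu = {("p1A", "p2X"): 0.5, ("p1A", "p2Y"): 0.25, ("p1B", "p2X"): 0.25}
Pi1_of_sigma_hat = {"p1A"}   # plans of Player 1 consistent with sigma_hat
Pi2_of_z = {"p2X"}           # plans of Player 2 that can lead to terminal z

# xi_1(sigma_hat; z) = sum of mu(pi1, pi2) over pi1 in Pi1(sigma_hat), pi2 in Pi2(z)
xi = sum(prob for (pi1, pi2), prob in mu.items()
         if pi1 in Pi1_of_sigma_hat and pi2 in Pi2_of_z)

# y_{1,sigma_hat}(z) = sum of mu_hat(pi1) over pi1 in Pi1(z)
mu_hat = {"p1A": 0.2, "p1B": 0.8}    # deviation distribution over Pi_1(I_hat)
Pi1_of_z = {"p1A", "p1B"}
y = sum(prob for pi1, prob in mu_hat.items() if pi1 in Pi1_of_z)

print(xi, y)  # 0.5 1.0
```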
In more general cases the same does not hold (see the second half of Section 2), but we prove the following (Appendix C):

Lemma 2. In a two-player game, as $\mu$ varies over the probability simplex, the joint vector of the $\xi_1(\cdot\,; \cdot)$, $\xi_2(\cdot\,; \cdot)$ variables spans a convex polytope $\mathcal{X}$ in $\mathbb{R}^n$, where $n$ is at most quadratic in the game size.

Saddle-Point Reformulation

According to Lemma 1, for each Player $i$ and $(\hat\sigma, \hat\mu)$-trigger agent for them, the value of the trigger agent is a biaffine expression in the vectors $y_{i, \hat\sigma}$ and $\xi_i$, and can be written as $v_{i, \hat\sigma}(\xi_i, y_{i, \hat\sigma}) = \xi_i^\top A_{i, \hat\sigma}\, y_{i, \hat\sigma} - b_{i, \hat\sigma}^\top \xi_i$ for a suitable matrix $A_{i, \hat\sigma}$ and vector $b_{i, \hat\sigma}$, where the two terms in the difference correspond to the expected utility for deviating at $\hat\sigma$ according to the (sequence-form) strategy $y_{i, \hat\sigma}$ and the expected utility for not deviating at $\hat\sigma$. Given a correlation plan $\xi = (\xi_1, \xi_2) \in \mathcal{X}$, the maximum value of any deviation for any player can therefore be expressed as
$$v^*(\xi) := \max_{\{i, \hat\sigma, y_{i, \hat\sigma}\}} v_{i, \hat\sigma}(\xi_i, y_{i, \hat\sigma}) = \max_{i \in \{1, 2\}}\; \max_{\hat\sigma \in \Sigma_i}\; \max_{y_{i, \hat\sigma} \in \mathcal{Y}_{i, \hat\sigma}} \left\{ \xi_i^\top A_{i, \hat\sigma}\, y_{i, \hat\sigma} - b_{i, \hat\sigma}^\top \xi_i \right\}.$$
We can convert the maximization above into a continuous linear optimization problem by introducing multipliers $\lambda_{i, \hat\sigma} \in [0, 1]$ (one for each Player $i \in \{1, 2\}$ and trigger $\hat\sigma \in \Sigma_i$), and write
$$v^*(\xi) = \max_{\{\lambda_{i, \hat\sigma}, z_{i, \hat\sigma}\}} \sum_i \sum_{\hat\sigma} \xi_i^\top A_{i, \hat\sigma}\, z_{i, \hat\sigma} - \lambda_{i, \hat\sigma}\, b_{i, \hat\sigma}^\top \xi_i,$$
where the maximization is subject to the linear constraints [C1] $\sum_{i \in \{1, 2\}} \sum_{\hat\sigma \in \Sigma_i} \lambda_{i, \hat\sigma} = 1$ and [C2] $z_{i, \hat\sigma} \in \lambda_{i, \hat\sigma}\, \mathcal{Y}_{i, \hat\sigma}$ for all $i \in \{1, 2\}$, $\hat\sigma \in \Sigma_i$. These linear constraints define a polytope $\mathcal{Y}$.

A correlation plan $\xi$ is an EFCE if and only if $v_{i, \hat\sigma}(\xi_i, y_{i, \hat\sigma}) \le 0$ for every trigger agent, i.e., $v^*(\xi) \le 0$. Therefore, to find an EFCE, we can solve the optimization problem $\min_{\xi \in \mathcal{X}} v^*(\xi)$, which is a bilinear saddle-point problem over the convex domains $\mathcal{X}$ and $\mathcal{Y}$, both of which are convex polytopes that belong to $\mathbb{R}^n$, where $n$ is at most quadratic in the input game size (Lemma 2). If an EFCE exists, the optimal value should be non-positive, and the optimal solution is an EFCE (as it satisfies $v^*(\xi) \le 0$). In fact, since EFCEs always exist (as the set of EFCEs is a superset of the set of CEs (von Stengel & Forges, 2008)), and one can select triggers to be terminal sequences for a player, the optimal value of the BSPP is always 0. The BSPP can be interpreted as the NE of a zero-sum game between the mediator, who decides on a suitable correlation plan $\xi$, and a deviator, who selects the $y_{i, \hat\sigma}$'s to maximize each $v_{i, \hat\sigma}(\xi_i, y_{i, \hat\sigma})$. The value of this game is always 0.
Finally, we can enforce a lower bound $\tau$ on the sum of the players' utilities by introducing an additional variable $\lambda_{\mathrm{sw}} \in [0, 1]$ and maximizing the new convex objective
$$v^*_{\mathrm{sw}}(\xi) := \max_{\lambda_{\mathrm{sw}} \in [0, 1]} \left\{ (1 - \lambda_{\mathrm{sw}}) \cdot v^*(\xi) + \lambda_{\mathrm{sw}} \left[ \tau - \sum_{z \in Z} u_1(z)\, \xi(\sigma_1(z); \sigma_2(z)) - \sum_{z \in Z} u_2(z)\, \xi(\sigma_1(z); \sigma_2(z)) \right] \right\}. \quad (1)$$
4. Computing an EFCE using Subgradient Descent

von Stengel & Forges (2008) show that a SW-maximizing EFCE of a two-player game without chance may be expressed as the solution of an LP and computed using generic methods such as the simplex algorithm or interior-point methods. However, this does not scale to large games, as these methods require storing and inverting large matrices. Another way of computing SW-maximizing EFCEs was provided by Dudik & Gordon (2009). However, their algorithm assumes that sampling from correlation plans is possible using a Markov chain Monte Carlo algorithm, and does not factor in the convergence of the Markov chain. Furthermore, even though their formulation generalizes beyond our setting of two-player games without chance, our gradient descent method admits more complex objectives. In particular, it allows the mediator to maximize general concave objectives (in correlation plans) instead of only linear objectives with potentially some regularization. Here, we showcase the benefits of exploiting the combinatorial structure of the BSPP formulation of Section 3 by proposing a simple algorithm based on subgradient descent; in Section 6 we show that this method scales better than a commercial state-of-the-art LP solver in large games.

For brevity, we only provide a sketch of our algorithm, which computes a feasible EFCE; the extension to the slightly more complicated objective $v^*_{\mathrm{sw}}(\xi)$ (Equation 1) is straightforward—see Appendix D for more details. First, observe that the objective $v^*(\xi)$ is convex, since it is the maximum of linear functions of $\xi$. This suggests that we may perform subgradient descent on $v^*$, where a subgradient is given by $A_{i^*, \hat\sigma^*}\, y^*_{i^*, \hat\sigma^*} - b_{i^*, \hat\sigma^*}$, and $(i^*, \hat\sigma^*, y^*_{i^*, \hat\sigma^*})$ is a triplet which maximizes the objective function $v^*(\xi)$. The computation of such a triplet can be done via a straightforward bottom-up traversal of the game tree. In order to maintain feasibility (that is, $\xi \in \mathcal{X}$), it is necessary to project onto $\mathcal{X}$, which is challenging in practice because we are not aware of any distance-generating function that allows for efficient projection onto this polytope. This is so even in games without chance (where $\mathcal{X}$ can be expressed by a polynomial number of constraints (von Stengel & Forges, 2008)). Furthermore, iterative methods, such as Dykstra's algorithm, add a dramatic overhead to the cost of each iteration.

To overcome this hurdle, we observe that in games with no chance moves, the set $\mathcal{X}$ of correlation plans—as characterized by von Stengel & Forges (2008) via the notion of consistency constraints—can be expressed as the intersection of three sets: (i) $\mathcal{X}_1$, the set of vectors $\xi$ that only satisfy the consistency constraints for Player 1; (ii) $\mathcal{X}_2$, the set of vectors $\xi$ that only satisfy the consistency constraints for Player 2; and (iii) $\mathbb{R}^n_+$, the non-negative orthant. $\mathcal{X}_1$ and $\mathcal{X}_2$ are polytopes defined by equality constraints only. Therefore, an exact projection (in the Euclidean sense) onto $\mathcal{X}_1$ and $\mathcal{X}_2$ can be carried out efficiently by precomputing a suitable factorization of the constraint matrices that define $\mathcal{X}_1$ and $\mathcal{X}_2$. In particular, we are able to leverage the specific combinatorial structure of the constraints that form $\mathcal{X}_1$ and $\mathcal{X}_2$ to design an efficient and parallel sparse factorization algorithm (see Appendix D for the full details). Furthermore, projection onto the non-negative orthant can be done conveniently, as it just amounts to computing a component-wise maximum between $\xi$ and the zero vector. Since $\mathcal{X} = \mathcal{X}_1 \cap \mathcal{X}_2 \cap \mathbb{R}^n_+$, and since projecting onto $\mathcal{X}_1$, $\mathcal{X}_2$ and $\mathbb{R}^n_+$ individually is easy, we can adopt the recent algorithm proposed by Wang & Bertsekas (2013), which is designed to handle exactly this situation. In that algorithm, gradient steps are interlaced with projections onto $\mathcal{X}_1$, $\mathcal{X}_2$ and $\mathbb{R}^n_+$ in a cyclical manner. This is similar to projected gradient descent, but instead of projecting onto the intersection of $\mathcal{X}_1$, $\mathcal{X}_2$ and $\mathbb{R}^n_+$ (which we believe to be difficult), we project onto just one of them in round-robin fashion. This simple method was shown to converge by Wang & Bertsekas (2013). However, no convergence bound is currently known.
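The round-robin idea can be illustrated on a toy problem that is unrelated to the EFCE polytopes: minimizing $f(x) = \max_i x_i$ over the probability simplex, written as the intersection of the affine set $\{x : \sum_i x_i = 1\}$ and the non-negative orthant (the step sizes and iteration count below are arbitrary illustrative choices):

```python
# Subgradient steps interlaced with exact projections onto an affine set and
# the orthant, applied one after the other instead of projecting onto their
# intersection. The minimizer of max_i x_i over the simplex is the uniform
# point (1/3, 1/3, 1/3), with value 1/3.
n = 3
x = [1.0, 0.0, 0.0]

def proj_affine(v):
    """Exact Euclidean projection onto {x : sum(x) = 1}."""
    shift = (sum(v) - 1.0) / len(v)
    return [vi - shift for vi in v]

def proj_orthant(v):
    """Exact Euclidean projection onto the non-negative orthant."""
    return [max(vi, 0.0) for vi in v]

for t in range(1, 10001):
    step = 0.5 / t ** 0.5
    k = max(range(n), key=lambda i: x[i])   # a subgradient of max_i x_i is e_k
    x[k] -= step
    x = proj_affine(x)      # first projection in the cycle
    x = proj_orthant(x)     # second projection in the cycle

print([round(v, 2) for v in x])
```

Each projection individually is cheap and exact, which is the same property the full algorithm exploits for $\mathcal{X}_1$, $\mathcal{X}_2$ and $\mathbb{R}^n_+$.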
5. Introducing the First Benchmarks for EFCE
In this section we introduce the first two benchmark games for EFCE. These games are naturally parametric, so that they can scale in size as desired and hence be used to evaluate different EFCE solvers. In addition, we show that the EFCEs in these games are behaviorally interesting: the correlation plan in a social-welfare-maximizing EFCE is highly nontrivial and even seemingly counter-intuitive. We believe some of these induced behaviors may prove practical in real-world scenarios, and hope our analysis can spark an interest in EFCEs and other equilibria in sequential settings.

We now introduce our first proposed benchmark game to illustrate the power of correlation in extensive-form games. Our game is a general-sum variant of the classic game
Battleship. Each player takes turns secretly placing a set of ships S (of varying sizes and value) on separate grids of size H × W. After placement, players take turns firing at their opponent—ships which have been hit at all the tiles they lie on are considered destroyed. The game continues until either one player has lost all of their ships, or each player has completed r shots. At the end of the game, the payoff of each player is computed as the sum of the values of the opponent's ships that were destroyed, minus γ times the value of the ships they lost, where the parameter γ is called the loss multiplier of the game. The social welfare (SW) of the game is the sum of the utilities of all players.

In order to illustrate a few interesting features of the social-welfare-maximizing EFCE in this game, we will focus on the instance of the game with a board of size 1 × 3, in which each player commands just one ship of length 1, there are 2 rounds of shooting per player, and the loss multiplier is γ = 2. In this game, the social-welfare-maximizing Nash equilibrium is such that each player places their ship and shoots uniformly at random. This way, both players end the game by destroying the opponent's ship with substantial probability (Player 1 has an advantage, since they act first), while the probability that both players end the game with their ships unharmed is meagre. Correspondingly, the maximum SW reached by any NE of the game is negative.

In the EFCE model, it is possible to induce the players to end the game with a peaceful outcome—that is, no damage to either ship—with a probability several times higher than in NE, resulting in a much higher SW.
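The payoff rule just described is simple to state in code. A hedged sketch (the helper name, and the choice of a single ship of value 1 per player, are illustrative assumptions rather than details of the game generator):

```python
def battleship_payoff(opp_ships_destroyed, own_ships_lost, gamma):
    """Payoff of one player at the end of the game: total value of the
    opponent's ships they destroyed, minus gamma times the total value of
    their own ships that were lost."""
    return sum(opp_ships_destroyed) - gamma * sum(own_ships_lost)

# One ship of value 1 per player and loss multiplier gamma = 2, as in the
# instance discussed in the text. If Player 1 sinks Player 2's ship:
p1 = battleship_payoff(opp_ships_destroyed=[1], own_ships_lost=[], gamma=2)
p2 = battleship_payoff(opp_ships_destroyed=[], own_ships_lost=[1], gamma=2)
print(p1, p2, p1 + p2)  # 1 -2 -1
```

With γ = 2, any outcome in which a ship is sunk has social welfare −1, while a peaceful outcome has social welfare 0, which is why steering play toward peace raises SW.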
Before we continue with more details as to how the mediator (correlation device) is able to achieve this result in the case where γ = 2, we remark that the benefit of EFCE is even higher when the loss multiplier γ increases: Figure 1 (left) shows, as a function of γ, the probabilities with which Player 1 and Player 2 terminate the game by sinking their opponent's ship, if they play according to the SW-maximizing EFCE. For all values of γ, the SW-maximizing NE remains the same, while with a mediator the probability of reaching a peaceful outcome increases as γ increases, and the gap between the expected utilities of the two players asymptotically vanishes. This is remarkable, considering Player 1's advantage for acting first.

Figure 1: (Left) Probabilities of the players sinking their opponent when the players play according to the SW-maximizing EFCE, as a function of the ship loss value γ. For large γ, the probability of the game ending with no sunken ship and the probability of Player 2 sinking Player 1 coincide. (Right) Example of a playthrough of Battleship assuming both players are recommended to place their ship in the same position a. Edge labels represent the probability of an action being recommended. Squares and hexagons denote actions taken by Players 1 and 2, respectively. Blue and red nodes represent cases where Players 1 and 2 sink their opponent, respectively. The Shoot action is abbreviated 'Sh.'.
We now resume our analysis of the SW-maximizing EFCE in the instance where γ = 2 . In a nutshell, thecorrelation plan is constructed in a way that players are recommended to deliberately miss, and deviations7 RXIV P REPRINT - O
CTOBER
27, 2019from this are punished by the mediator, who reveals to the opponent the ship location that was recommended to the deviating player. First, the mediator recommends the players a ship placement that is sampleduniformly at random and independently for each players. This results in possible scenarios (one perpossible ship placement) in the game, each occurring with probability / . Due to the symmetric nature ofship placements, only two scenarios are relevant: whether the two players are recommended to place theirship in the same spot, or in different spots. Figure 1 (right) shows the probability of each recommendationfrom the mediator in the former case, assuming that the players do not deviate. The latter case is symmetric(see Appendix E for details). Now, we explain the first of the two methods in which the mediator compelsnon-violent behavior. We focus on the first shot made by Player 1 (i.e., the root in Figure 3). The mediatorsuggests that Player 1 shoot at the Player 2’s ship with a low / probability, and deliberately miss withhigh probability. One may wonder how it is possible for this behavior to be incentive-compatible (that is,what are the incentives that compel Player 1 into not defecting), since the player may choose to randomlyfire in any of the 2 locations that were not recommended, and get almost / chance of winning the gameimmediately. The key is that if Player 1 does so and does not hit the opponent’s ship, then the mediatorcan punish him by recommending that Player 2 shoot in the position where Player 1’s was recommendedto place their ship. Since players value their ships more than destroying their opponents’, the player isincentivized to avoid such a situation by accepting the recommendation to (most probably) miss. 
Here we see the first example of a deterrent used by the mediator: inducing the opponent to play punitive actions against a player that has deviated from the recommendations, whenever that deviation can be detected. A similar situation arises in the first move of Player 2, where Player 2 is recommended to deliberately miss, hitting each of the 2 empty spots with equal probability. A more detailed analysis is available in Appendix E.

Our second proposed benchmark is a simplified version of the
Sheriff of Nottingham board game. The game models the interaction of two players: the Smuggler—who is trying to smuggle illegal items in their cargo—and the Sheriff—who is trying to stop the Smuggler. At the beginning of the game, the Smuggler secretly loads his cargo with n ∈ {0, . . . , n_max} illegal items. At the end of the game, the Sheriff decides whether to inspect the cargo. If the Sheriff chooses to inspect the cargo and finds illegal goods, the Smuggler must pay a fine worth p · n to the Sheriff. On the other hand, the Sheriff has to compensate the Smuggler with a utility s if no illegal goods are found. Finally, if the Sheriff decides not to inspect the cargo, the Smuggler's utility is v · n whereas the Sheriff's utility is 0. The game is made interesting by two additional elements (which are also present in the board game): bribery and bargaining. After the Smuggler has loaded the cargo and before the Sheriff chooses whether or not to inspect, they engage in r rounds of bargaining. At each round i = 1, . . . , r, the Smuggler tries to tempt the Sheriff into not inspecting the cargo by proposing a bribe b_i ∈ {0, . . . , b_max}, and the Sheriff responds whether or not they would accept the proposed bribe. Only the proposal and response from round r are executed and have an impact on the final payoffs—that is, all but the r-th round of bargaining are non-consequential, and their purpose is for the two players to settle on a suitable bribe amount. If the Sheriff accepts bribe b_r, then the Smuggler gets v · n − b_r, while the Sheriff gets b_r. See Appendix F for a formal description of the game.

We now point out some interesting behavior of EFCE in this game. We refer to the game instance where v = 5, p = 1, s = 1, n_max = 10, b_max = 2, r = 2 as the baseline instance.

Effect of v, p and s. First, we show what happens in the baseline instance when the item value v, item penalty p, and Sheriff compensation (penalty) s are varied in isolation over a continuous range of values. The results are shown in Figure 2.
In terms of general trends, the effect of the parameters on the Smuggler is fairly consistent with intuition: the Smuggler benefits from a higher item value as well as from higher Sheriff penalties, and suffers when the penalty for smuggling is increased. However, the finer details are much more nuanced. For one, the effect of changing the parameters is not only non-monotonic, but also discontinuous. To our knowledge, this behavior has not been documented before, and we find it rather counterintuitive. More counterintuitive observations can be found in Appendix F.

Figure 2: Utility of players with varying v, p and s for the SW-maximizing EFCE. The three panels show the Sheriff game with varying illegal item value (v), with varying illegal item penalty (p), and with varying sheriff penalty s (upon inspection of a cargo with no illegal items). We verified that these plots are not the result of equilibrium selection issues.

Effect of n_max, b_max, and r. Here, we try to empirically understand the impact of n_max and r on the SW-maximizing equilibrium. As before, we set v = 5, p = 1, s = 1 and vary n_max and r simultaneously while keeping b_max constant. The results are shown in Table 1. The most striking observation is that increasing the capacity of the cargo n_max may decrease social welfare. For example, consider the case when b_max = 2, n_max = 2, r = 1 (shown in blue in Table 1) where the payoffs are (8.00, 2.00). This achieves the maximum attainable social welfare, by smuggling n_max = 2 items and having the Sheriff accept a bribe of 2. When n_max is increased to 3 (red entry in the table), the payoffs to both players drop significantly, and even more so when n_max increases further. While counterintuitive, this behavior is consistent in that the Smuggler may not benefit from loading items every time he is recommended to load; the Sheriff reacts by inspecting more, leading to lower payoffs for both players.

    n_max    r = 1           r = 2           r = 3
    1        (3.00, 2.00)    (3.00, 2.00)    (3.00, 2.00)
    2        (8.00, 2.00)    (8.00, 2.00)    (8.00, 2.00)
    3        (2.28, 1.26)    (8.00, 2.00)    (8.00, 2.00)
    4        (1.76, 0.93)    (7.26, 1.82)    (8.00, 2.00)

Table 1: Payoffs for (Smuggler, Sheriff) in the SW-maximizing EFCE.
That behavior is avoided by increasing the number of bargaining rounds: with r = 2 (entry shown in purple), the behavior disappears and we revert to achieving a social welfare of 10, just like in the instance with n_max = 2, r = 1. With sufficiently many bargaining steps, the Smuggler, with the aid of the mediator, is able to convince the Sheriff that they have complied with the mediator's recommendation. This is because the mediator spends the first r − 1 bribe proposals to give a 'passcode' to the Smuggler, so that the Sheriff can verify compliance—if an 'unexpected' bribe is suggested, then the Smuggler must have deviated, and the Sheriff will inspect the cargo as punishment. With more rounds, it is less likely that a deviating Smuggler will guess the correct passcode. See also Appendix F for additional insights.
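For concreteness, the terminal payoff rules of the Sheriff game described above can be sketched in a few lines of Python. This is our own illustrative sketch; the function name and argument layout are not from the paper, only the payoff rules are.

```python
def sheriff_payoffs(n, inspect, accepted, bribe, v=5, p=1, s=1):
    """Terminal payoffs (Smuggler, Sheriff) of the simplified Sheriff game.

    n        -- number of illegal items loaded by the Smuggler
    inspect  -- whether the Sheriff inspects the cargo
    accepted -- whether the Sheriff accepted the final-round bribe b_r
    bribe    -- final-round bribe amount b_r
    v, p, s  -- item value, item penalty, Sheriff compensation (baseline values)
    """
    if accepted:
        # Accepted bribe: no inspection; the Smuggler pays the bribe.
        return v * n - bribe, bribe
    if inspect:
        if n > 0:
            # Illegal goods found: the Smuggler pays a fine of p per item.
            return -p * n, p * n
        # Nothing found: the Sheriff compensates the Smuggler.
        return s, -s
    # No inspection and no accepted bribe.
    return v * n, 0

# Blue entry of Table 1: n = 2 smuggled items, final bribe of 2 accepted.
print(sheriff_payoffs(n=2, inspect=False, accepted=True, bribe=2))  # (8, 2)
```

With the baseline parameters, the blue entry of Table 1 is recovered: the Smuggler nets 5 · 2 − 2 = 8 and the Sheriff nets 2, for the maximum social welfare of 10.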
6. Experimental Evaluation
Even our proof-of-concept algorithm based on the BSSP formulation and subgradient descent, introduced in Section 3, is able to beat LP-based approaches using the commercial solver Gurobi (Gurobi Optimization, 2018) in large games. This mirrors known results about the scalability of methods for computing Nash equilibrium, where in recent years first-order methods have established themselves as the only algorithms able to handle large games.

We experimented on
Battleship over a range of parameters while fixing γ = 2. All experiments were run on a machine with 64 cores and 500GB of memory. For our method, we tuned step sizes based on multiples of 10. In Table 2, we report execution times until all constraints (feasibility and deviation) are violated by no more than each of three decreasing tolerances. Our method outperforms the LP-based approach on larger games. However, while we outperform the LP-based approach at the coarser tolerances, Gurobi spends most of its time reordering variables and preprocessing, and its solution converges faster at higher levels of precision; this is expected of a gradient-based method like ours. On very large games with more than 100 million variables, both our method and Gurobi fail—in Gurobi's case due to lack of memory, while in our case each iteration required nearly an hour, which was prohibitive. The main bottleneck in our method was the projection onto X_1 and X_2. We also experimented on the Sheriff game and obtained similar findings (Appendix H).
Table 2: Execution times of our method and of the LP-based approach on Battleship instances parameterized by (H, W), number of rounds r, and ship length, with ξ under the compact representation of (von Stengel & Forges, 2008). For LPs, we report the fastest of Barrier, Primal and Dual Simplex, and 3 different formulations (Appendix G). † Gurobi went out of memory and was killed by the system. ‡ Our method requires roughly an hour per iteration and did not achieve the required accuracy.
7. Conclusions
In this paper, we proposed two parameterized benchmark games in which EFCE exhibits interesting behaviors. We analyzed those behaviors both qualitatively and quantitatively, and isolated two ways through which a mediator is able to compel the agents to follow its recommendations. We also provided an alternative saddle-point formulation of EFCE and demonstrated its merit with a simple subgradient method which outperforms standard LP-based methods.

We hope that our analysis will bring attention to some of the computational and practical uses of EFCE, and that our benchmark games will be useful for evaluating future algorithms for computing EFCE in large games.
Acknowledgments
This material is based on work supported by the National Science Foundation under grants IIS-1718457, IIS-1617590, and CCF-1733556, and by the ARO under award W911NF-17-1-0082. Gabriele Farina is supported by a Facebook fellowship. Co-authors Ling and Fang are supported in part by a research grant from Lockheed Martin.
References
Ashlagi, I., Monderer, D., and Tennenholtz, M. On the value of correlation. Journal of Artificial Intelligence Research, 33:575–613, 2008.

Aumann, R. Subjectivity and correlation in randomized strategies. Journal of Mathematical Economics, 1:67–96, 1974.

Bowling, M., Burch, N., Johanson, M., and Tammelin, O. Heads-up limit hold'em poker is solved. Science, 2015.

Brown, N. and Sandholm, T. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science, Dec. 2017.

Brown, N. and Sandholm, T. Solving imperfect-information games via discounted regret minimization. In AAAI, 2019a.

Brown, N. and Sandholm, T. Superhuman AI for multiplayer poker. Science, 365(6456):885–890, 2019b. ISSN 0036-8075. doi: 10.1126/science.aay2400. URL https://science.sciencemag.org/content/365/6456/885.

Crawford, V. P. and Sobel, J. Strategic information transmission. Econometrica: Journal of the Econometric Society, pp. 1431–1451, 1982.

Dodis, Y., Halevi, S., and Rabin, T. A cryptographic solution to a game theoretic problem. In Annual International Cryptology Conference, pp. 112–130. Springer, 2000.

Dudik, M. and Gordon, G. J. A sampling-based approach to computing equilibria in succinct extensive-form games. In UAI, pp. 151–160. AUAI Press, 2009.

Farina, G., Kroer, C., and Sandholm, T. Online convex optimization for sequential decision processes and extensive-form games. In AAAI Conference on Artificial Intelligence, 2019.

Gordon, G. J., Greenwald, A., and Marks, C. No-regret learning in convex games. In Proceedings of the 25th International Conference on Machine Learning, pp. 360–367. ACM, 2008.

Gurobi Optimization, L. Gurobi optimizer reference manual, 2018.

Hazan, E. Introduction to online convex optimization. Foundations and Trends in Optimization, 2016.

Huang, W. Equilibrium computation for extensive games. PhD thesis, London School of Economics and Political Science, January 2011.

Huang, W. and von Stengel, B. Computing an extensive-form correlated equilibrium in polynomial time. In International Workshop On Internet And Network Economics (WINE), pp. 506–513. Springer, 2008.

Jiang, A. X. and Leyton-Brown, K. Polynomial-time computation of exact correlated equilibrium in compact games. Games and Economic Behavior, 91:347–359, 2015.

Koutsoupias, E. and Papadimitriou, C. Worst-case equilibria. In Symposium on Theoretical Aspects of Computer Science, 1999.

Leyton-Brown, K. Personal communication, 2019.

Moravčík, M., Schmid, M., Burch, N., Lisý, V., Morrill, D., Bard, N., Davis, T., Waugh, K., Johanson, M., and Bowling, M. DeepStack: Expert-level artificial intelligence in heads-up no-limit poker. Science, 2017.

Nash, J. Equilibrium points in n-person games. Proceedings of the National Academy of Sciences, 36:48–49, 1950.

Nemirovski, A. Prox-method with rate of convergence O(1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM Journal on Optimization, 2004.

Nesterov, Y. Excessive gap technique in nonsmooth convex minimization. SIAM Journal on Optimization, 2005.

Papadimitriou, C. H. and Roughgarden, T. Computing correlated equilibria in multi-player games. Journal of the ACM, 55(3):14, 2008.

Ray, I. and Gupta, S. S. Technical Report, 2009.

Roughgarden, T. and Tardos, É. How bad is selfish routing? Journal of the ACM (JACM), 49(2):236–259, 2002.

Shoham, Y. and Leyton-Brown, K. Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations. Cambridge University Press, 2009.

von Stengel, B. Efficient computation of behavior strategies. Games and Economic Behavior, 1996.

von Stengel, B. Computing equilibria for two-person games. In Aumann, R. and Hart, S. (eds.), Handbook of Game Theory, volume 3. North Holland, Amsterdam, The Netherlands, 2002.

von Stengel, B. and Forges, F. Extensive-form correlated equilibrium: Definition and computational complexity. Mathematics of Operations Research, 33(4):1002–1022, 2008.

Wang, M. and Bertsekas, D. P. Incremental constraint projection-proximal methods for nonsmooth convex optimization. SIAM J. Optim. (to appear), 2013.

Zinkevich, M., Bowling, M., Johanson, M., and Piccione, C. Regret minimization in games with incomplete information. In NIPS, 2007.
A. Recovering the Linear Program of (von Stengel & Forges, 2008)
Recall the continuous version of the primal of the inner maximization problem, which was obtained by adding the multipliers λ_{i,σ̂} ∈ [0, 1]:

    max_{λ,z}  Σ_{i ∈ {1,2}} Σ_{σ̂ ∈ Σ_i} ( ξ_i^T A_{i,σ̂} z_{i,σ̂} − λ_{i,σ̂} b_{i,σ̂}^T ξ_i )

    such that  Σ_{i ∈ {1,2}} Σ_{σ̂ ∈ Σ_i} λ_{i,σ̂} = 1,
               λ_{i,σ̂} ≥ 0,  z_{i,σ̂} ∈ λ_{i,σ̂} Y_{i,σ̂}   ∀ i ∈ {1,2}, σ̂ ∈ Σ_i,

where z_{i,σ̂} may be seen as the sequence-form representation of a game rooted at a particular information set of player i, scaled by the factor λ_{i,σ̂}. By expanding the sequence-form constraints which define Y_{i,σ̂}, we get

    max_{λ,z}  Σ_{i ∈ {1,2}} Σ_{σ̂ ∈ Σ_i} ( ξ_i^T A_{i,σ̂} z_{i,σ̂} − λ_{i,σ̂} b_{i,σ̂}^T ξ_i )

    such that  Σ_{i ∈ {1,2}} Σ_{σ̂ ∈ Σ_i} λ_{i,σ̂} = 1,
               λ_{i,σ̂} ≥ 0,  z_{i,σ̂} ≥ 0,
               F_{i,σ̂} z_{i,σ̂} − λ_{i,σ̂} f_{i,σ̂} = 0   ∀ i ∈ {1,2}, σ̂ ∈ Σ_i,

where F_{i,σ̂} and f_{i,σ̂} are sequence-form constraint matrices rooted at the information set Î containing σ̂, with the only difference that instead of requiring the 'empty sequence' to equal 1, we require that all actions belonging to Î sum to λ_{i,σ̂}. We are now in a position to take duals; the only nonzero elements on the right-hand side of the constraints come from the sum-to-one constraint over the λ_{i,σ̂}. This gives the following dual:

    min_{u, ν_i(σ̂,·)}  u

    such that  F_{i,σ̂}^T ν_i(σ̂,·) ≥ A_{i,σ̂}^T ξ_i        ∀ i ∈ {1,2}, σ̂ ∈ Σ_i,
               u − ν_i(σ̂, Î) ≥ − b_{i,σ̂}^T ξ_i           ∀ i ∈ {1,2}, σ̂ = (Î, â) ∈ Σ_i,

where u and the ν_i(σ̂,·) are free in sign. Combining this with the outer minimization over ξ_i gives us the linear program of (von Stengel & Forges, 2008), up to a change in variable names and conventions.

B. Derivation of Probabilities over Terminal States
In order to express the utility of a trigger agent, it is necessary to compute the probability of the game ending in each of the terminal states. Before that, we review the notation introduced in earlier sections in more detail.

• Z is the set of terminal states (or equivalently, outcomes) in the game, and z ∈ Z is some terminal state.
• u_i(z) is the utility obtained by player i if the game terminates at terminal state z ∈ Z.
• Π_i is the set of pure reduced-normal-form strategies for Player i. We also require notation for subsets of Π_i, namely:
  – Π_i(I) is the set of reduced-normal-form strategies that can lead to information set I (which belongs to player i), assuming that the other player acts to do so as well. This is equivalent (assuming no zero-chance nodes or disconnected game trees) to saying that all reduced-normal-form strategies in Π_i(I) have some action which belongs to information set I.
  – Π_i(I, a) is the set of reduced-normal-form strategies which lead to information set I and recommend the action a at I. This is equivalent to the set of reduced-normal-form strategies which contain a as part of their recommendation (this set is typically a strict subset of Π_i(I)).
  – Π_i(z) is the set of reduced-normal-form strategies which can lead to the terminal state z (assuming the other player plays to do so). This is equivalent to the set of reduced-normal-form strategies which contain the pair σ = (I, a), where σ = (I, a) is the unique last information set-action pair which has to be encountered by player i before the terminal state z.
• Σ_i is the set of information set-action pairs (I, a) (also known as sequences), where I is an information set for Player i and a is an action at I.
• σ_i(z) is the last (I, a) pair belonging to Player i encountered before terminal state z ∈ Z.

We are interested in characterizing the random variable t_σ̂ : Π_1 × Π_2 × Π_1(Î) → Z that maps a triple of reduced-normal-form strategies (π_1, π_2, π̂_1) to the terminal state of the game that is reached when Player 1 is a σ̂-trigger agent and Player 2 follows all recommendations. That is, we want to find the probability of terminating at each z ∈ Z for a σ̂-trigger agent, given the mediator's joint distribution μ over reduced-normal-form strategies and the trigger strategy μ̂ for the deviating player, which we assume without loss of generality to be Player 1. For each trigger σ̂, the terminal leaves may be partitioned into the following sets.

• Z_σ̂ (or equivalently Z_{Î,â}) is the set of terminal nodes that are descendants of the trigger σ̂ = (Î, â). In order for the game to end in one of these terminal nodes, it is necessary that the recommendation device recommended to Player 1 the trigger sequence σ̂, and therefore the agent must have deviated. Furthermore, Player 2 must have been recommended the terminal sequence σ_2(z) corresponding to the terminal state, and finally π̂_1 must be compatible with σ_1(z). We can capture all these constraints concisely by saying that the sampled (π_1, π_2, π̂_1) must be such that π_1 ∈ Π_1(σ̂), π_2 ∈ Π_2(z) and π̂_1 ∈ Π_1(z).
Therefore the probability that a σ̂-trigger agent terminates at some z ∈ Z_σ̂ is given by

    P_{μ,μ̂}[t_σ̂ = z ∈ Z_σ̂] = ( Σ_{π_1 ∈ Π_1(σ̂), π_2 ∈ Π_2(z)} μ(π_1, π_2) ) · ( Σ_{π̂_1 ∈ Π_1(z)} μ̂(π̂_1) ),

where the first term in the product is the probability that Player 2 plays to z and Player 1 gets triggered, and the second term is the probability that the deviation strategy of Player 1 upon getting triggered is one that reaches z.

• Z_Î is the set of terminal states that are descendants of any sequence in Î, except σ̂. In order for the game to reach such a terminal state, the recommendations issued to Player 1 by the correlation device must have been such that Player 1 reached Î. There are two cases: either the correlation device recommended σ̂ at Î, or it did not. In the former case, Player 1 started deviating (using the sampled reduced-normal-form plan π̂_1); hence, in this case it must be that π̂_1 ∈ Π_1(z). In the latter case, Player 1 does not deviate from the recommendation, and therefore it must be that π_1 ∈ Π_1(z). Either way, Player 2 must have been recommended the terminal sequence σ_2(z) corresponding to the terminal state z; that is, π_2 ∈ Π_2(z). Collecting all these constraints, it must be that

    (π_1, π_2, π̂_1) ∈ ( Π_1(σ̂) × Π_2(z) × Π_1(z) ) ∪ ( Π_1(z) × Π_2(z) × Π_1(Î) ).

Using the fact that the two cases as to whether or not Player 1 was recommended σ̂ at Î are disjoint, we can write

    P_{μ,μ̂}[t_σ̂ = z ∈ Z_Î] = ( Σ_{π_1 ∈ Π_1(σ̂), π_2 ∈ Π_2(z)} μ(π_1, π_2) ) · ( Σ_{π̂_1 ∈ Π_1(z)} μ̂(π̂_1) ) + Σ_{π_1 ∈ Π_1(z), π_2 ∈ Π_2(z)} μ(π_1, π_2).

The first term in the sum may be understood as the probability that the agent was triggered and its deviation plays to z; the second term is the probability that the agent was not triggered and the game simply terminates at z according to μ.

• Finally, Z_{−Î} is the set of terminal nodes that are neither in Z_σ̂ nor in Z_Î.
If the game has ended in a terminal state that belongs to Z_{−Î}, Player 1 has not deviated from the recommended strategy, since they have never even reached the trigger information set Î. Hence, in this case it must be that (π_1, π_2) ∈ Π_1(z) × Π_2(z), and

    P_{μ,μ̂}[t_σ̂ = z ∈ Z_{−Î}] = Σ_{π_1 ∈ Π_1(z), π_2 ∈ Π_2(z)} μ(π_1, π_2).

With the above, we can finally express the constraint that no deviation strategy μ̂ can lead to a higher utility for Player 1 than simply following each recommendation. Indeed, for all μ̂, the utility of the trigger agent is expressed as

    Σ_{z ∈ Z} u_1(z) P_{μ,μ̂}[t_σ̂ = z],

where the correct expression for P_{μ,μ̂}[t_σ̂ = z] must be selected depending on whether z ∈ Z_σ̂, z ∈ Z_Î or z ∈ Z_{−Î}. On the other hand, the utility of an agent that follows all recommendations is

    Σ_{z ∈ Z} u_1(z) P_μ[π_1 ∈ Π_1(z), π_2 ∈ Π_2(z)] = Σ_{z ∈ Z} u_1(z) Σ_{π_1 ∈ Π_1(z), π_2 ∈ Π_2(z)} μ(π_1, π_2).

Therefore, following all recommendations is a best response for the σ̂-trigger agent if and only if μ is chosen so that

    Σ_{z ∈ Z} u_1(z) ( P_{μ,μ̂}[t_σ̂ = z] − Σ_{π_1 ∈ Π_1(z), π_2 ∈ Π_2(z)} μ(π_1, π_2) ) ≤ 0   ∀ μ̂ ∈ Δ^{|Π_1(Î)|}.   (2)

The crucial observation is that all the probabilities P_{μ,μ̂}[t_σ̂ = z] defined above can be expressed via the following quantities:

    y_{1,σ̂}(z) := Σ_{π̂_1 ∈ Π_1(z)} μ̂(π̂_1)   ∀ z ∈ Z;
    ξ_1(σ_1; z) := Σ_{π_1 ∈ Π_1(σ_1), π_2 ∈ Π_2(z)} μ(π_1, π_2)   ∀ σ_1 ∈ Σ_1, z ∈ Z.

For example, for all z ∈ Z_Î we can write P_{μ,μ̂}[t_σ̂ = z] = ξ_1(σ̂; z) y_{1,σ̂}(z) + ξ_1(σ_1(z); z). When deviations relative to Player 2 are brought into the picture, the following two sets of symmetric quantities also become relevant:

    y_{2,σ̂}(z) := Σ_{π̂_2 ∈ Π_2(z)} μ̂(π̂_2)   ∀ z ∈ Z;
    ξ_2(σ_2; z) := Σ_{π_1 ∈ Π_1(z), π_2 ∈ Π_2(σ_2)} μ(π_1, π_2)   ∀ σ_2 ∈ Σ_2, z ∈ Z.

It would now seem natural to perform a change of variables, and pick (correlated) distributions for the random variables y_{1,σ̂}(·), y_{2,σ̂}(·), ξ_1(·;·) and ξ_2(·;·) instead of μ, μ̂_1 and μ̂_2.
Since there are only a polynomial number (in the game tree size) of combinations of arguments for these new random variables, this approach would allow one to remove the redundancy of realization-equivalent normal-form plans and focus on a polynomially small search space. In the case of the random variables y_{1,σ̂} and y_{2,σ̂}, it is clear that the change of variables is possible via the sequence form (von Stengel, 2002). Therefore, the only difficulty is in characterizing the space spanned by ξ_1 and ξ_2 as μ varies across the probability simplex. In two-player perfect-recall games with no chance moves, this is exactly the merit of the landmark work by von Stengel & Forges (2008). In particular, the authors prove that in those games the space of feasible ξ_1, ξ_2 can be captured by a polynomial number of linear constraints.
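The three cases above can be summarized in a small sketch. The dictionary encoding of μ and μ̂ and the set-membership oracles below are our own illustration, purely to make the case analysis concrete; they are not the paper's implementation.

```python
def prob_terminal(z_class, mu, mu_hat, in_Pi1, in_Pi2, sigma_hat, z):
    """Probability that a sigma_hat-trigger agent (Player 1) ends the game at z.

    z_class -- which cell of the partition z lies in: "Z_sigma" (below the
               trigger), "Z_I" (below I-hat but not the trigger), or
               "Z_minus_I" (everything else)
    mu      -- dict mapping pairs (pi1, pi2) of reduced-normal-form plans to
               probabilities (the mediator's joint distribution)
    mu_hat  -- dict mapping Player 1 plans to probabilities (the deviation)
    in_Pi1  -- in_Pi1(pi1, x): membership oracle for Player 1's sets Pi_1(x)
    in_Pi2  -- in_Pi2(pi2, z): membership oracle for Player 2's sets Pi_2(z)
    """
    def xi(sigma):
        # xi_1(sigma; z): mass the mediator puts on Player 1 reaching sigma
        # while Player 2 plays to z.
        return sum(p for (p1, p2), p in mu.items()
                   if in_Pi1(p1, sigma) and in_Pi2(p2, z))

    # y_{1,sigma_hat}(z): mass the deviation puts on plans reaching z.
    y = sum(p for p1, p in mu_hat.items() if in_Pi1(p1, z))

    if z_class == "Z_sigma":        # trigger seen, deviation must reach z
        return xi(sigma_hat) * y
    if z_class == "Z_I":            # either deviated toward z, or not triggered
        return xi(sigma_hat) * y + xi(z)
    return xi(z)                    # Z_minus_I: recommendations followed
```

The three branches mirror exactly the three displayed probability expressions, written in terms of the quantities ξ_1 and y_{1,σ̂} used for the change of variables.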
C. Proof of Lemma 2
The vectors of entries ξ_1(·;·), ξ_2(·;·) are obtained from μ via a linear mapping. Hence, the set of values that can be assumed by ξ is the image of the probability simplex under a linear mapping. Since images of polytopes under linear functions are polytopes, the lemma holds.

D. Details of Our Subgradient Method
First, observe that the objective v*(ξ) is convex, since it is the maximum of linear functions of ξ. This suggests that we may perform subgradient descent on v*, where the subgradients are given by

    ∂v*(ξ)/∂ξ = A_{i*,σ̂*} y*_{i*,σ̂*} − b_{i*,σ̂*},   (3)

where (i*, σ̂*, y*_{i*,σ̂*}) is a triplet which maximizes the objective function v*(ξ). The computation of such a triplet is a straightforward bottom-up traversal of the game tree. In order to maintain feasibility (that is, ξ ∈ X), it is necessary to project onto X, which is challenging in practice, because we are not aware of any distance-generating function which allows for efficient projection onto this polytope. This is so even in games without chance (where X can be expressed by a polynomial number of constraints (von Stengel & Forges, 2008)). Furthermore, iterative methods such as Dykstra's algorithm add a dramatic overhead to the cost of each iterate.

To overcome this hurdle, we observe that in games with no chance moves, the set X of correlation plans—as characterized by von Stengel & Forges (2008) via the notion of consistency constraints—can be expressed as the intersection of three sets: (i) X_1, the set of vectors ξ that satisfy only the consistency constraints for Player 1; (ii) X_2, the set of vectors ξ that satisfy only the consistency constraints for Player 2; and (iii) R^n_+, the non-negative orthant. X_1 and X_2 are polytopes defined by equality constraints only. Therefore, an exact projection (in the Euclidean sense) onto X_1 and X_2 can be carried out efficiently by precomputing a suitable factorization of the constraint matrices that define X_1 and X_2. In particular, we are able to leverage the specific combinatorial structure of these constraints to design an efficient and parallel sparse factorization algorithm (see Appendix D.1 for the full details).
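The iteration that this decomposition enables can be sketched as follows: subgradient steps interlaced with one projection per step, taken in round-robin order, in the spirit of Wang & Bertsekas (2013). This is a toy sketch under our own naming, not the paper's implementation, and the usage example below replaces X_1, X_2 with a simple hyperplane so the code is self-contained.

```python
import numpy as np

def cyclic_projected_subgradient(xi0, subgrad, projections, step_sizes):
    """Subgradient steps interlaced with projections in round-robin order.

    xi0         -- starting point
    subgrad     -- subgrad(xi): any subgradient of the objective at xi
    projections -- list of Euclidean projection operators; in our setting
                   these would be the projections onto X1, X2 and the
                   non-negative orthant
    step_sizes  -- iterable of step sizes
    """
    xi = np.array(xi0, dtype=float)
    for t, eta in enumerate(step_sizes):
        xi = xi - eta * subgrad(xi)                 # subgradient step
        xi = projections[t % len(projections)](xi)  # one projection per step
    return xi

# Toy usage: minimize ||xi - c||^2 over {sum(xi) = 1} intersected with the
# non-negative orthant, i.e., project c onto the probability simplex.
c = np.array([2.0, -1.0, 0.5])
proj_hyperplane = lambda x: x - (x.sum() - 1.0) / x.size
proj_orthant = lambda x: np.maximum(x, 0.0)
xi = cyclic_projected_subgradient(np.zeros(3), lambda x: 2.0 * (x - c),
                                  [proj_hyperplane, proj_orthant],
                                  [0.01] * 2000)
# xi ends up near (1, 0, 0), the projection of c onto the simplex.
```

Note that each iterate is only guaranteed to satisfy the constraint it was most recently projected onto; with a small constant step the iterates settle in a small neighborhood of the true constrained optimum.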
Furthermore, projection onto the non-negative orthant can be done conveniently, as it just amounts to computing a component-wise maximum between ξ and the zero vector. Since X = X_1 ∩ X_2 ∩ R^n_+, and since projecting onto X_1, X_2 and R^n_+ individually is easy, we can adopt the recent algorithm proposed by Wang & Bertsekas (2013), designed to handle exactly this situation. In that algorithm, gradient steps are interlaced with projections onto X_1, X_2 and R^n_+ in a cyclical manner. This is similar to projected gradient descent, but instead of projecting onto the intersection of X_1, X_2 and R^n_+ (which we believe to be difficult), we project onto just one of them in round-robin fashion. This simple method was shown to converge by Wang & Bertsekas (2013); however, no convergence bound is currently known.

D.1. Factorization of constraints over X

von Stengel & Forges (2008) showed that ξ may be represented compactly as a 2-dimensional matrix, with dimensions equal to the sequence-form representation (von Stengel, 1996) of each player, where one is only interested in entries corresponding to relevant sequence pairs (see von Stengel & Forges (2008) for details). Then, the aforementioned constraints (i) and (ii) defining X_1 and X_2 are equivalent to the sequence-form constraints for each row and column, respectively. Constraint (iii) ensures that the entries of ξ are non-negative and that the entry for the empty sequence pair is 1.

Observe that the projection (based on L2 distance) onto X_1 or X_2 individually can be decomposed into a series of disjoint projections (either on rows or on columns), and thus computed in parallel. We now show that the L2 projection of each individual row/column onto the sequence-form constraints (von Stengel & Forges, 2008) can be done efficiently. Let F and f be the matrix and vector corresponding to the sequence-form constraints F x − f = 0. Here, F is a (sparse) matrix of size (number of information sets) × (number of sequences) which contains entries in {−1, 0, 1}, and f is a vector containing 0's and 1's.
Each row of F corresponds to the 'flow' constraint for an information set, with a coefficient of −1 for the unique parent sequence leading to that information set, and a coefficient of +1 for all sequences immediately following that information set. Given a vector w, the projection onto the affine space given by F x − f = 0 is the solution of the optimization problem

    min_x ||x − w||²  s.t.  F x − f = 0.

The closed-form solution may be found using Lagrange multipliers, and is given by

    x* = F^T (F F^T)^{-1} (f − F w) + w.

Since F is sparse, the main difficulty in computing x* is overcome if we can efficiently compute (F F^T)^{-1} q for any vector q.
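As a sanity check, the closed form above can be exercised on a toy constraint system. The tiny matrix below is our own illustration (a two-information-set decision problem), not one of the benchmark games, and the dense solve stands in for the sparse Cholesky machinery described next.

```python
import numpy as np

# Toy sequence-form constraints F x = f: a root information set with actions
# {a, b}, and a second information set, reachable after a, with actions
# {c, d}.  Variables are ordered [empty, a, b, c, d].
F = np.array([[ 1.0,  0.0, 0.0, 0.0, 0.0],   # empty sequence equals 1
              [-1.0,  1.0, 1.0, 0.0, 0.0],   # flow at the root: a + b = empty
              [ 0.0, -1.0, 0.0, 1.0, 1.0]])  # flow after a: c + d = a
f = np.array([1.0, 0.0, 0.0])

def project_affine(w, F, f):
    """Euclidean projection of w onto the affine space {x : F x = f}."""
    # x* = F^T (F F^T)^{-1} (f - F w) + w; the full method applies
    # (F F^T)^{-1} through a precomputed sparse Cholesky factorization.
    return F.T @ np.linalg.solve(F @ F.T, f - F @ w) + w

w = np.array([0.3, 0.9, 0.4, 0.1, 0.2])
x = project_affine(w, F, f)   # satisfies F x = f up to round-off
```

The result satisfies the flow constraints exactly, and projecting a second time leaves it unchanged, as expected of a Euclidean projection onto an affine set.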
Lemma 3. Let F be the sequence-form constraint matrix. Then computing (F F^T)^{-1} q can be done efficiently.

Proof. The key is to exploit the structure of F F^T. Observe that F F^T is symmetric, positive definite, and has dimension equal to the number of information sets. Furthermore, F F^T may be expressed in closed form:

    (F F^T)_{ij} = −1 if i is the direct parent/child of j;
                    1 if i is a sibling of j, i ≠ j;
                    1 + (number of actions at i) if i = j;
                    0 otherwise,

where i, j above are information sets; i being the parent of j means that there is some action at i which can lead to information set j (without any other information set of the same player in between); and i being the sibling of j means that the (unique) sequences leading to i and j are the same. Observe that F F^T is almost, but not quite, tree-structured. However, it is sparse and, more importantly, incurs no fill-in if we order variables in a bottom-up fashion in the player's game tree. That is, if we treat F F^T as a graph with information sets as vertices, we can repeatedly remove vertices (information sets) in a bottom-up fashion, while forming cliques with all neighbors of the removed vertex. Due to the structure of F F^T, this introduces no new edges. In other words, Gaussian elimination on F F^T may be performed without introducing additional non-zero entries. If the maximum number of actions that an information set may have is upper bounded by a constant a_max, then eliminating a single variable only requires time quadratic in a_max. This means that computing (F F^T)^{-1} q can be done efficiently when a_max is small.

Remark.
Lemma 3, and the fact that L2 projections onto sequence-form constraints can be computed efficiently, may be of independent interest to researchers beyond the scope of EFCEs.

In practice, we precompute a sparse Cholesky factorization of F F^T. From the previous discussion, the Cholesky factors are guaranteed to be sparse and easily stored. With the Cholesky decomposition of F F^T, finding (F F^T)^{-1} q becomes straightforward. This precomputation is done once per trigger sequence σ̂, since the set of relevant sequence pairs for each trigger sequence (i.e., the location of the non-zero entries in the matrix representing ξ) differs. This precomputation step is trivially parallel. In our experiments, computing the Cholesky factors was rarely the bottleneck (although we do include this timing when evaluating our method).

D.2. Social Welfare Maximization
Observe that Equation (1) may be rewritten in the form (1 − λ_sw) v*_sw(ξ) − λ_sw b^T ξ for a suitable vector b. Hence, the gradient of the modified objective is given by

    ∂v*(ξ)/∂ξ = A_{i*,σ̂*} y*_{i*,σ̂*} − b_{i*,σ̂*}   if v*(ξ) ≥ κ(ξ),
                −b                                   otherwise,

where κ(ξ) = τ − Σ_{z ∈ Z} u_1(z) ξ(z; z) − Σ_{z ∈ Z} u_2(z) ξ(z; z) is the difference between τ and the social welfare obtained from ξ.

Footnote: In our implementation, f need not have this restriction, but it is included here to be more in line with the classic work of von Stengel (1996).

E. Battleship Game
E.1. Extended Description of the Game
A game of Battleship is parameterized by a tuple ( H, W, S , r, γ ) , where • the integers H, W ≥ define the height and width of the playing field for each player; • S is an ordered list containing ship descriptions s i for each player. Each description is a pair s i =( (cid:96) i , v i ) , where (cid:96) i is the length of the i -th ship and v i is its value; • r ≥ is the number of rounds in the game; • γ ≥ is a loss multiplier that controls the relative value of a losing versus destroying ships.The game proceeds in phases: ship placement and shooting . During the ship placement phase, the players(starting with Player 1) take turns placing their ships on their playing field. The players must place alltheir ships, in the same order in which they appear in S , on the playing field. The ship placement phaseends when all ships have been placed. We remark that the players’ playing fields are separate: in otherwords, there are two playing fields of dimensions H × W , one per player. The ships may be placed eitherhorizontally or vertically on each player’s grid (playing field); all ships must lie entirely within the playingfield and may not overlap with other ships the player has already placed. Finally, the locations of a player’sships is private information for each player.In the shooting phase, players take turns firing at each other; Player 1 starts first. This is done by selecting apair of integer coordinates ( x, y ) that identify a cell within the playing field. After taking a shot, the playeris told if the shot was a hit , that is, the selected cell ( x, y ) is occupied by a ship of the opponent, or if it is a miss , that is, ( x, y ) does not contain an opponent’s ship. If all cells covered by a ship have been shot at, theship is destroyed and this fact is announced. Note that the identity of the ship which was hit or sunk is notrevealed; players only know that some ships was hit or sunk. 
The game ends when r shots have been made by each player, or when one player has lost all their ships, whichever comes first. At the end of the game, each player's payoff is computed as follows: for each opponent's ship that the player has destroyed, the player receives a payoff equal to the value v of that ship; for each ship that the player has lost to the opponent, the player incurs a negative payoff equal to γ · v, that is, the value of the ship times the loss multiplier γ. Note that when γ > 1 the game is general-sum.

Since γ ≥ 1, this asymmetric model describes situations where players are encouraged to destroy other ships, but are ultimately more protective of their own assets. The loss multiplier γ governs this gap; a higher value of γ makes each player value protecting their own ships more than destroying the opponent's. Note that when γ = 1, we obtain a zero-sum version of Battleship (with varying scores for each ship).

For the remainder of the discussion, we define the social welfare (SW) of any outcome to be the sum of the payoffs of the two players. We will demonstrate that, with the aid of a mediator (the correlation device), the social welfare of the optimal correlated equilibrium is dramatically higher than the social welfare of even the best Nash equilibrium. In other words, the mediator leads to significantly less destructive outcomes and to more frequent ties, where the players sometimes agree to deliberately miss their opponents, while still retaining incentive-compatibility and rationality in the standard game-theoretic sense.

E.2. Analysis of Social-Welfare-Maximizing EFCE
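The scoring rule above can be summarized in a short sketch (the function name and the encoding of destroyed ships as lists of values are our own):

```python
def battleship_payoff(destroyed_by_player, destroyed_by_opponent, gamma):
    """Payoff of one player at the end of a game of Battleship.

    destroyed_by_player: values v of the opponent's ships this player destroyed.
    destroyed_by_opponent: values v of this player's ships the opponent destroyed.
    gamma: loss multiplier (gamma >= 1).
    """
    gained = sum(destroyed_by_player)
    lost = gamma * sum(destroyed_by_opponent)
    return gained - lost

# With gamma = 1 the game is zero-sum: the two players' payoffs cancel.
p1 = battleship_payoff([3.0], [2.0], gamma=1.0)  # destroyed a ship worth 3, lost one worth 2
p2 = battleship_payoff([2.0], [3.0], gamma=1.0)
assert p1 + p2 == 0.0

# With gamma > 1 the game is general-sum: mutual destruction hurts both players.
p1 = battleship_payoff([3.0], [2.0], gamma=2.0)  # 3 - 2*2 = -1
p2 = battleship_payoff([2.0], [3.0], gamma=2.0)  # 2 - 2*3 = -4
assert p1 + p2 < 0.0
```

The last two lines make the social-welfare point concrete: under γ > 1, outcomes with mutual ship losses have negative social welfare, which is exactly what the mediator tries to avoid.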
We analyze one social-welfare-maximizing EFCE in the same small instance of Battleship as the previous section. The mediator in this EFCE recommends to the players a ship placement that is sampled uniformly at random and independently for each player. This results in 9 possible scenarios (one per possible combination of ship placements) in the game, each occurring with probability 1/9. Due to the symmetric nature of ship placements, only two scenarios are relevant: whether the two players are recommended to place their ships in the same spot, or in different spots. Figure 3 details the strategy of the mediator in each of these two scenarios, assuming that the players do not deviate. Note that the game trees in Figure 3 are parametric in the recommended ship placements a and b; all possible ship placements can be recovered from Figure 3 by setting a and b to the appropriate cells of the playing field.

[Figure 3 omitted]

Figure 3: Example of a playthrough of Battleship assuming both players were recommended to place their ship in a (left), or that Players 1 and 2 were recommended to place their ships in a and b respectively (right). For both pictures, the numbers along each edge denote probabilities of each action being recommended; no edge is shown for actions recommended with zero probability. Squares and hexagons denote actions taken by Players 1 and 2 respectively. Similarly, blue and red nodes represent cases where Players 1 and 2 sink their opponent's ship, respectively. Green leaf nodes are where the game results in no ship loss. The Shoot action is abbreviated to 'Sh.'
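The symmetry argument can be verified by direct enumeration. The sketch below assumes, as in Figure 3, that each player places a single ship in one of three cells labeled a, b, and c:

```python
from itertools import product

CELLS = ["a", "b", "c"]

# Each player's placement is sampled independently and uniformly at random,
# giving 3 x 3 = 9 equally likely scenarios of probability 1/9 each.
scenarios = list(product(CELLS, repeat=2))
assert len(scenarios) == 9

# Up to relabeling of the cells, only two cases are distinct: both players
# place in the same spot, or in different spots.
same_spot = [s for s in scenarios if s[0] == s[1]]
different = [s for s in scenarios if s[0] != s[1]]
assert len(same_spot) / len(scenarios) == 1 / 3
assert len(different) / len(scenarios) == 2 / 3
```

This is why Figure 3 only needs two parametric game trees: every one of the 9 scenarios reduces to one of the two cases by renaming cells.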
For both game trees, note that the correlation device suggests that Player 1 shoot at Player 2's ship with low probability, and deliberately miss with high probability. As hinted in earlier sections, this type of recommendation is key to understanding why the EFCE succeeds in promoting less destructive outcomes. One may wonder why this behavior is incentive-compatible (that is, what incentives compel Player 1 not to defect), since the player may choose to fire randomly at either of the 2 locations that were not recommended, and get almost a 1/2 chance of winning the game immediately. The key is that if Player 1 does so and does not hit the opponent's ship, then the mediator can punish him by recommending that Player 2 shoot at the location of Player 1's ship. Since players value their own ships more than destroying their opponent's, the player is incentivized to avoid such a situation by accepting the recommendation to (most probably) miss.

A similar situation arises in the first move of Player 2. Here, Player 2 is recommended to deliberately miss, hitting each of the 2 empty spots with probability 1/2. If he deviates and attempts to destroy Player 1's ship, then he risks the mediator revealing his location to his opponent if his shot misses; this risk is enough to keep Player 2 'in line'. The second move of Player 1 (the third shot of the full game) bears a similar idea. Here, Player 1 is recommended to hit Player 2's ship with a certain probability. Similar to his first shot, Player 1 may deviate and fire at the remaining location, hoping to win the game outright. Yet, this behavior is discouraged, since if he misses the shot (i.e., if the recommendation was in fact the correct location of Player 2's ship), then his location will be revealed by the mediator and he loses the next round.
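This trade-off can be illustrated with a back-of-the-envelope expected-value comparison. All numbers below are hypothetical placeholders chosen for the sketch, not the equilibrium probabilities; the point is only that a certain punishment after a missed deviation can outweigh a higher immediate hit probability:

```python
def expected_payoff(p_win, p_punished, win_value, loss_value):
    """Expected payoff of a shot: win with probability p_win; otherwise the
    mediator punishes (reveals our ship) with probability p_punished."""
    return p_win * win_value + (1 - p_win) * p_punished * loss_value

# Hypothetical numbers only: winning is worth +1, losing one's own ship is
# worth -2 (i.e., a loss multiplier gamma = 2 on a ship of value 1).
follow = expected_payoff(p_win=0.05, p_punished=0.0, win_value=1.0, loss_value=-2.0)
deviate = expected_payoff(p_win=0.5, p_punished=1.0, win_value=1.0, loss_value=-2.0)

# Following the (mostly missing) recommendation dominates deviating, because
# a missed deviation is punished by the mediator with certainty.
assert follow > deviate
```

The exact equilibrium probabilities are those shown on the edges of Figure 3; the sketch only mirrors the qualitative argument in the text.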
Again, this threat from the mediator encourages peaceful behavior, even though the recommendation to Player 1 reveals a more accurate 'posterior' over Player 2's ship location, as compared to the uniform distribution of 1/3. While making these recommendations, the mediator ensures that Player 2 has a uniform distribution over Player 1's ship location, meaning that even though Player 2 has the final move, he can do no better than guessing uniformly at random at this stage.

Remark.
It is important to note that Figure 3 does not convey the full information of the correlated plans. Crucially, it does not show the consequences suffered if a player deviates from his recommended strategy: in this case, the deviating player stops receiving recommendations and risks having his ship's location revealed to the opponent. These 'counterfactual' scenarios may be counter-intuitive, but they are key to understanding how SW-maximizing EFCEs achieve their purpose.
F. Sheriff Game
F.1. Extended Version of the Game
The Sheriff game is described by the parameters v, p, s ∈ R₊ and n_max, b_max, r ∈ N. The parameters v, p, s describe the value of each illegal item, the penalty that the Smuggler has to pay for each discovered illegal item, and the compensation that the Sheriff pays to the Smuggler in the case of a false alarm. At the beginning of the game, the Smuggler loads n ∈ {0, . . . , n_max} items into his cargo. The amount of goods loaded is unknown to the Sheriff. The game then proceeds for r ≥ 1 rounds of bargaining. Each round comprises two steps. First, the Smuggler offers a bribe b_t ∈ {0, . . . , b_max} to the Sheriff, where t ≤ r is the round of bargaining. After that, the Sheriff responds with 'Yes' or 'No'.

All actions are public knowledge, except for the selection of cargo contents, which only the Smuggler knows. In the final step, we compute the payoffs to the players. The outcome of the game is decided by the last step of bargaining. In particular, the first r − 1 rounds of bargaining have no explicit bearing on the outcome of the game, except for purposes of coordination. The payoffs for each outcome are:

1. Sheriff accepts the bribe. The Smuggler gets n · v − b_r, and the Sheriff gets the bribe offered, b_r.
2. Sheriff inspects and discovers illegal items. The Smuggler is fined and gets a payoff of −n · p, while the Sheriff gets a payoff of n · p.
3. Sheriff chooses to inspect and does not find illegal items. The Smuggler receives a compensation of s, while the Sheriff gets −s.

The objective of the mediator is to maximize social welfare in the space of EFCEs. Ideally, this will involve the Smuggler bringing in goods and the Sheriff accepting bribes; any other outcome would simply be zero-sum, since no goods will be successfully smuggled and money only changes hands between the players. A qualitative description of the welfare-maximizing equilibrium is not obvious, since the game contains elements of both lying and bargaining.

Remark.
The communication in the bargaining steps is superficially similar to that in cheap talk (Crawford & Sobel, 1982), where costless and non-binding signals are transmitted between players. However, in our setting, the signals are transmitted in the middle of the game as opposed to just at the beginning. More importantly, the presence of the mediator during the bargaining phase bestows more uses on the signals; in particular, the mediator may be able to take punitive measures against players who deviate from recommendations, since future recommendations will be withheld from players who deviate. We will illustrate the importance of this at the end of Appendix F.2.
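For concreteness, the three payoff rules of the Sheriff game can be sketched as follows (the function and argument names are our own):

```python
def sheriff_payoffs(n, bribe, accepted, v, p, s):
    """Returns (Smuggler, Sheriff) payoffs from the final round of bargaining.

    n: number of illegal items loaded; bribe: final bribe b_r offered;
    accepted: whether the Sheriff said 'Yes' to the final bribe;
    v, p, s: item value, per-item penalty, false-alarm compensation.
    """
    if accepted:          # 1. Sheriff accepts the bribe and lets the cargo pass.
        return n * v - bribe, bribe
    if n > 0:             # 2. Sheriff inspects and discovers illegal items.
        return -n * p, n * p
    return s, -s          # 3. Sheriff inspects an empty cargo (false alarm).

# Example with v = 5, p = 1, s = 1: accepting a $2 bribe on a 3-item cargo.
assert sheriff_payoffs(3, 2, True, v=5, p=1, s=1) == (13, 2)
assert sheriff_payoffs(3, 2, False, v=5, p=1, s=1) == (-3, 3)
assert sheriff_payoffs(0, 2, False, v=5, p=1, s=1) == (1, -1)
```

Note that only the last bribe b_r enters the computation; the earlier bribes matter only as coordination signals, which is exactly what the analysis in Appendix F.2 exploits.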
F.2. Effect of Additional Rounds of Bargaining (r)

 n_max |    r = 1     |    r = 2     |    r = 3     |    r = 4
   1   | (4.00, 1.00) | (4.00, 1.00) | (4.00, 1.00) | (4.00, 1.00)
   2   | (1.24, 0.19) | (4.00, 1.00) | (4.00, 1.00) | (4.00, 1.00)
   3   | (0.89, 0.11) | (1.11, 1.00) | (4.00, 1.00) | (4.00, 1.00)
   4   | (0.82, 0.00) | (0.84, 1.00) | (3.62, 1.00) | (4.00, 1.00)

Table 3: Payoffs for (Smuggler, Sheriff) when players play according to the SW-maximizing EFCE in the Sheriff game with b_max = 2.

We illustrate the effect of the non-consequential bribes with two small settings, where v = 5, p = 1, s = 1, n_max = 3, b_max = 2, and r ∈ {1, 2}. Examples of SW-maximizing equilibria are shown in Figure 4 and Figure 5.
[Figure 4 omitted]

Figure 4: Example of a playthrough of the Sheriff game with r = 1. Edge labels correspond to action probabilities; edges with probability 0 are omitted. Squares and hexagons denote actions taken by Players 1 and 2 respectively, while green and red nodes denote the Sheriff choosing to pass or inspect.

[Figure 5 omitted]

Figure 5: Example of a playthrough of the Sheriff game with r = 2. Edge labels correspond to action probabilities; edges with probability 0 are omitted. Squares and hexagons denote actions taken by Players 1 and 2 respectively, while green and red nodes denote the Sheriff choosing to pass or inspect.

Note that, as with the analysis of Battleship, Figures 4 and 5 only show the interactions of the players on the equilibrium path; that is, they omit what would happen if some player deviated.

The SW-maximizing EFCE yields payoffs of approximately (3, ·) and (8, 2) for r = 1 and r = 2, respectively. We will first consider the case where r = 2 (Figure 5). Here, what occurs along the equilibrium path is straightforward. The Smuggler loads 1 or 3 items with equal probability. Next, he offers a (non-consequential) bribe of either $0, $1, or $2. Then, he receives some feedback from the Sheriff, and proceeds to offer a bribe of $2, which the Sheriff gladly accepts. The payoffs to the players are (13, 2) and (3, 2), depending on whether the Smuggler was recommended to load 3 or 1 items, leading to an average payoff of (8, 2).

The underlying mechanism is in fact fairly straightforward, and mirrors the idea in the modified signalling game of von Stengel & Forges (2008). Assume that a random number is chosen uniformly from {0, . . . , b_max}. This acts as a 'passcode' which the Sheriff expects from the Smuggler in the first round.
This passcode forms part of the correlated plan, and will eventually be revealed to the Smuggler assuming he did not deviate when selecting the number of illegal items (recall that the sequential nature of the EFCE means that the recommended amount to bribe is not revealed until the Smuggler loads the cargo with the recommended number of items). In other words, the first (non-consequential) bribe may be used as a signal which hints to the Sheriff whether the Smuggler has deviated: if it is not equal to the passcode, the Smuggler must have deviated somewhere. On the other hand, a deviating Smuggler may successfully guess the passcode with probability no greater than 1/(b_max + 1); if the number of signals is sufficiently large, then it is near impossible to guess the code. Using these tools, the mediator is able to engineer a 'deviation detector' which checks whether the Smuggler ever deviated. Note, however, that unlike the Signaling game, the Sheriff is not able to glean exactly what was recommended (in this case, the number of items in the cargo); he is only able to deduce whether the player deviated from the recommendation (in this case, a recommendation to load either 1 or 3 items).

Issuing threats to the Smuggler becomes straightforward with this deviation detector. If the Sheriff knows the Smuggler is lying, he employs a 'grim trigger' for the rest of the game: in this case, the Sheriff opts to inspect all of the player's cargo, regardless of the bribe offered in the second round. The Smuggler could also pretend to bring in illegal goods, i.e., by loading 0 items and hoping to send an incorrect passcode, resulting in the Sheriff making a false accusation. However, because the Smuggler's payoff for deceiving the Sheriff in this manner is just s = 1, he remains incentivized to stick to the recommendations, which guarantees him a payoff of either 3 or 13. We now make the following hypotheses.
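Before stating them, the deviation detector just described can be checked numerically. The simulation below is an illustrative sketch (the sampling code and names are ours): a deviating Smuggler, who never receives the passcode, can only guess it uniformly at random.

```python
import random

def deviation_detected(first_bribe, passcode):
    """The Sheriff flags a deviation whenever the first-round bribe, which
    doubles as a passcode, does not match the mediator's secret number."""
    return first_bribe != passcode

b_max = 2
rng = random.Random(0)
trials = 100_000

# A Smuggler who deviated is reduced to an uninformed guess, so he escapes
# detection with probability 1/(b_max + 1).
escaped = sum(
    not deviation_detected(rng.randint(0, b_max), rng.randint(0, b_max))
    for _ in range(trials)
)
assert abs(escaped / trials - 1 / (b_max + 1)) < 0.01

# With r rounds of bargaining, the first r - 1 bribes form
# (b_max + 1) ** (r - 1) distinct signals, so longer games shrink the
# probability of guessing the passcode even further.
assert (b_max + 1) ** (2 - 1) == 3   # r = 2: three possible passcodes
```

A compliant Smuggler, by contrast, always matches the passcode, so the grim trigger never fires on the equilibrium path.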
First, the effect of additional bargaining rounds r is that the chance of randomly guessing the passcode is reduced. If there are r rounds, then there are (b_max + 1)^(r−1) different possible signals that the Smuggler could have sent to the Sheriff through the first r − 1 rounds. When r = 1, this class of correlation plans fails, since the bribe by the Smuggler serves both as the answer to the 'secret question' and as the actual bribe to be offered. This aliasing of roles is what leads to a lower payoff; the risk of sending an incorrect passcode is not sufficiently high to dissuade the Smuggler from deviating.

G. 3 LP formulations for computing EFCEs
Refer to the dualized problem in Appendix A. Observe that u is the value of the maximum deviation over all σ̂; when all incentive constraints are met, u should be non-positive. We propose three different formulations:

• Min-Deviation: the formulation presented in Appendix A.
• Feas-Deviation: instead of minimizing u in the objective, replace it with the hard constraint u ≤ 0.
• Maximum-SW: formulate the LP similarly to Feas-Deviation, but with the SW-maximizing objective.
H. Additional Experiments on Sheriff Game
The results for the Sheriff game were run using the parameters p = 1, v = 5, b_max = 3, r = 5, s = 4, while varying the maximum number of items that can be smuggled, n_max. The time required for the error to drop below a certain threshold is reported for both Gurobi and our subgradient method. The results are reported in Table 4. As before, we observe that our method can outperform Gurobi if lower levels of accuracy are desired. However, it was observed that for higher levels of accuracy, Gurobi requires significantly less time.

[Table 4 omitted]

Table 4: Time required to compute ξ under the compact representation of (von Stengel & Forges, 2008). For LPs, we report the fastest of Barrier, Primal, and Dual Simplex across the 3 different formulations (Appendix G). Our subgradient method did not manage to achieve the smallest accuracy threshold after 1 hour of running.