Optimally Deceiving a Learning Leader in Stackelberg Games
Georgios Birmpas, Jiarui Gan, Alexandros Hollender, Francisco J. Marmolejo-Cossío, Ninad Rajgopal, Alexandros A. Voudouris
University of Oxford, UK · University of Essex, UK
Abstract
Recent results in the ML community have revealed that learning algorithms used to compute the optimal strategy for the leader to commit to in a Stackelberg game are susceptible to manipulation by the follower. Such a learning algorithm operates by querying the best responses or the payoffs of the follower, who consequently can deceive the algorithm by responding as if his payoffs were very different from what they actually are. For this strategic behavior to be successful, the main challenge faced by the follower is to pinpoint the payoffs that would make the learning algorithm compute a commitment so that best responding to it maximizes the follower's utility, according to his true payoffs. While this problem has been considered before, the related literature has only focused on the simplified scenario in which the payoff space is finite, thus leaving the general version of the problem unanswered. In this paper, we fill this gap by showing that it is always possible for the follower to compute (near-)optimal fake payoffs, for various scenarios of the learning interaction between the leader and the follower.
Introduction

Stackelberg games are a simple yet powerful model of sequential interaction among strategic agents. In such games there are two players: a leader and a follower. The leader commits to an action, and the follower acts upon observing the leader's commitment. The simple sequential structure of the game permits modeling a multitude of important scenarios. Indicative applications include the competition between a large and a small firm [Von Stackelberg, 2010], the allocation of defensive resources [Tambe, 2011], the competition among mining pools in the Bitcoin network [Marmolejo-Cossío et al., 2019, Sun et al., 2020], and the protection against manipulation in elections [Elkind et al., 2019, Yin et al., 2018].

In Stackelberg games, the leader is interested in finding the best commitment she can make, assuming that the follower behaves rationally. The combination of such a commitment by the leader and the follower's rational best response to it leads to a strong Stackelberg equilibrium (SSE). In general, the utility that the leader obtains in an SSE is larger than what she would obtain in a Nash equilibrium of the corresponding one-shot game [Stengel and Zamir, 2004], implying that the leader prefers to commit than to engage in a simultaneous game with the follower.

In case the leader has access to both her own and the follower's payoff parameters, computing an SSE is a computationally tractable problem [Conitzer and Sandholm, 2006]. In practice, however, the leader

∗ Georgios Birmpas is supported by the ERC Starting grant number 639945 (ACCORD). Jiarui Gan is supported by the EPSRC International Doctoral Scholars Grant EP/N509711/1. Alexandros Hollender is supported by an EPSRC doctoral studentship (Reference 1892947).
any payoff matrix, without restrictions on the space of possible values, has been considered only in two very recent papers [Gan et al., 2019a, Nguyen and Xu, 2019], which however focused on the specific application of Stackelberg games to security resource allocation problems. Besides that, no progress has been made for general Stackelberg games. In this paper, we aim to fill this gap by completely resolving this computational problem, a result that reflects the insecurity of learning to commit in Stackelberg games.
Our Contribution
We explore how a follower can optimally deceive a learning leader in Stackelberg games by misreporting his payoff matrix, and study the tractability of the corresponding optimization problem. As in previous work, our objective is to compute the fake payoff matrix according to which the follower can best respond to make the leader learn an SSE in which the true utility of the follower is maximized. However, unlike the related literature, we do not impose any restrictions on the space from which the payoffs are selected or on the type of the game. By exploiting an intuitive characterization of all strategy profiles that can be induced as SSEs in Stackelberg games, we show that the follower can always compute, in polynomial time, a payoff matrix inducing an SSE that maximizes his true utility. Furthermore, we strengthen this result to resolve possible equilibrium selection issues, by showing that the follower can construct a payoff matrix that induces a unique SSE, in which his utility is maximized up to an arbitrarily small loss.
Other Related Work
Our paper is related to an emerging line of work at the intersection of machine learning and algorithmic game theory, dealing with scenarios where the samples used for training learning algorithms are controlled by strategic agents who aim to optimize their personal benefit. Indicatively, there has been recent interest in the analysis of the effect of strategic behavior on the efficiency of existing algorithms, as well as the design of algorithms resilient to strategic manipulation, for linear regression [Ben-Porat and Tennenholtz, 2019, Chen et al., 2018, Dekel et al., 2010, Hossain and Shah, 2020, Perote and Perote-Peña, 2004, Waggoner et al., 2015] and classification [Chen et al., 2019, Dong et al., 2018, Meir et al., 2012, Zhang et al., 2019].

Beyond the strategic considerations above, our work is also related to the study of query protocols for learning game-theoretic equilibria. In this setting, as in ours, algorithms for computing equilibria via utility and best response queries are a natural starting point. For utility queries, there has been much work proving exponential lower bounds for randomized computation of exact, approximate, and well-supported Nash equilibria [Babichenko and Rubinstein, 2017, Babichenko, 2016, Chen et al., 2015, Goldberg and Roth, 2016, Hart and Mansour, 2010, Hart and Nisan, 2016], as well as providing query-efficient protocols for approximate Nash equilibrium computation in bimatrix games, congestion games [Fearnley et al., 2015], anonymous games [Goldberg and Turchetta, 2017], and large games [Goldberg et al., 2019]. Best response queries are weaker than utility queries, but they arise naturally in practice, and are also expressive enough to implement fictitious play, a dynamic first proposed in Brown [1949] and proven in Robinson [1951] to converge to an approximate Nash equilibrium in two-player zero-sum games.
In terms of equilibrium computation, Goldberg and Marmolejo-Cossío [2018] also provide query-efficient algorithms for computing approximate Nash equilibria in bimatrix games via best response queries, provided one agent has a constant number of strategies.

Finally, learning via incentive queries in games is directly related to the theory of preference elicitation, where the goal is to mine information about the private parameters of the agents by interacting with them [Blum et al., 2004, Lahaie and Parkes, 2004, Zinkevich et al., 2003, Goldberg et al., 2020]. This has many applications, most notably combinatorial auctions, where access to the valuation functions of the agents is achieved via value or demand queries [Blumrosen and Nisan, 2007, Conen and Sandholm, 2001, Nisan and Segal, 2006].
Preliminaries

A Stackelberg game (SG) is a sequential game between a leader and a follower. The leader commits to a strategy, and the follower then acts upon observing this commitment. We consider finite SGs, in which the leader and the follower have m and n pure strategies at their disposal, respectively, and their utilities for all possible outcomes are given by the matrices u_L, u_F ∈ R^{m×n}. The entries u_L(i, j) and u_F(i, j) denote the utilities of the leader and the follower under the pure strategy profile (i, j) ∈ [m] × [n]. We use G = (u_L, u_F) to denote the SG with payoff matrices u_L and u_F; we omit m and n as they are clear from context. Following the standard convention, we refer to the leader as a female and to the follower as a male.

Like in one-shot games, the agents are allowed to employ mixed strategies, whereby they randomize over the actions in their strategy set. A mixed strategy of the leader is a probability distribution over [m], denoted by x ∈ ∆_{m−1} = { x ≥ 0 : Σ_{i∈[m]} x_i = 1 }. By slightly abusing notation, we let u_L(x, j) = Σ_{i∈[m]} x_i · u_L(i, j) be the expected utility of the leader when she plays the mixed strategy x and the follower plays a pure strategy j. Similarly, we define u_F(x, j) = Σ_{i∈[m]} x_i · u_F(i, j) for the follower. For a given mixed strategy x ∈ ∆_{m−1} of the leader, we say that j ∈ [n] is a follower best response if u_F(x, j) = max_{ℓ∈[n]} u_F(x, ℓ); we denote the set of all follower best responses to x by BR(x) ⊆ [n], and refer to the function BR as the best response correspondence of the follower.

A strong Stackelberg equilibrium (SSE) is the standard solution concept in SGs, and captures the situation where the leader commits to a mixed strategy that maximizes her expected utility, while taking into account the follower's best response to her commitment. It is assumed that the follower breaks ties in favor of the leader when he has multiple best responses; this standard assumption is justified by the fact that such tie-breaking behavior can often be enforced by an infinitesimal perturbation in the leader's strategy [Stengel and Zamir, 2004].

Definition 2.1 (SSE). A strategy profile (x, j) is an SSE of the SG G = (u_L, u_F) if

(x, j) ∈ arg max_{y∈∆_{m−1}, ℓ∈BR(y)} u_L(y, ℓ).

Learning SSEs and Deceptive Follower Behavior. We consider the scenario where the leader has full knowledge of her utility matrix u_L, and aims to compute an SSE by interacting with the follower and gleaning information about u_F. For example, the leader could observe follower best responses in play (akin to having query access to BR), or observe follower payoffs at pure strategy profiles during play (akin to having query access to u_F as a function). Hence, this can be cast as the problem of learning an SSE with a specified notion of query access to information about the follower's incentives.

Consider an SG G = (u_L, u_F). If the follower controls the flow of information to the leader in this paradigm, he may consider perpetually interacting with the leader as if he had a different payoff matrix ũ_F, which can make the leader believe that both agents are playing the game G̃ = (u_L, ũ_F). This deceiving power provides the follower with an incentive to act according to G̃ for a judicious choice of ũ_F, because the SSEs in G̃ may provide larger utility (according to u_F) than the SSEs in G. More concretely, the example below shows that the follower can gain an arbitrary benefit by deceiving the leader to play a different game.

Example 2.2 (Beneficial deception).
Let α ∈ [0, 1] and consider the following matrices:

R = (1 0; 0 0),   C_α = (0 α; 1 α).

Now, suppose that u_L = R and u_F = C_α, and let x ∈ [0, 1] represent the probability mass that the leader (row player) places on the first row (her first strategy); thus, 1 − x is the probability with which she plays her second strategy. Given this mixed strategy of the leader, the utilities that the follower expects to derive from his two strategies (columns) are u_F(x, 1) = 1 − x and u_F(x, 2) = α. Consequently, the first strategy is a best response of the follower when x ∈ [0, 1 − α], and the second one is a best response when x ∈ (1 − α, 1] (when x = 1 − α, the tie is broken in favor of the leader). With this information, it is clear that the SSE of the game occurs when the leader chooses x = 1 − α and the follower plays his first strategy. As a result, the follower's utility is u_F(1 − α, 1) = α. However, for any α < 1, the follower has an incentive to deceive the leader into playing the game G̃ = (R, C_1), which will improve his utility in the resulting SSE to 1. This is an improvement by a multiplicative factor of 1/α, which can be arbitrarily large when α is arbitrarily close to 0.

Inducible Strategy Profiles.
The ultimate goal of the follower is to identify the SSE that maximizes his true utility, from the set of SSEs that he can deceive the leader into learning. We will refer to such SSEs as inducible strategy profiles. At a high level, the follower's problem can now be expressed as the following optimization problem:

max_{x,j} u_F(x, j)   subject to (x, j) is inducible.   (1)

The maximum utility for the follower is called the optimal inducible utility. If the maximum value is never achieved, then for every ε > 0, we would like to be able to find an inducible SSE that achieves a value ε-close to the supremum value.

As discussed previously, the leader can learn an SSE by gleaning information about the incentives of the follower, by querying the best responses of the follower to particular leader strategies, or more refined information about the follower's payoff matrix. Depending on the type of information queried, we can define various levels of inducible strategy profiles.

In more detail, suppose the leader can only query the best responses of the follower, who behaves according to some best response correspondence f̃BR : ∆_{m−1} → 2^{[n]} \ {∅}. This interaction between the leader and the follower leads to a game G̃ = (u_L, f̃BR) where only information about f̃BR is known (instead of a payoff matrix implying f̃BR). The definition of f̃BR enforces a best response answer to any possible query. Consequently, the leader learns an SSE (x, j) ∈ arg max_{y∈∆_{m−1}, ℓ∈f̃BR(y)} u_L(y, ℓ), which yields the following notion of BR-inducible strategy profiles.
Definition 2.3 (BR-inducibility). A strategy profile (x, j) is BR-inducible with respect to u_L if there exists a best response correspondence f̃BR : ∆_{m−1} → 2^{[n]} \ {∅} such that (x, j) is an SSE of the game G̃ = (u_L, f̃BR), in which case we say that (x, j) is induced by f̃BR.

Next, consider the case where the leader can query information about the payoffs of the follower, who can now behave according to a fake payoff matrix ũ_F. We refer to the SSEs of the resulting game G̃ = (u_L, ũ_F) as payoff-inducible strategy profiles.

Definition 2.4 (Payoff-inducibility). A strategy profile (x, j) is said to be payoff-inducible with respect to u_L if there exists ũ_F ∈ R^{m×n} such that (x, j) is an SSE in the game G̃ = (u_L, ũ_F), in which case we say that (x, j) is induced by ũ_F.

Clearly, payoff-inducibility is stricter than BR-inducibility: for every choice of ũ_F, the corresponding best response correspondence f̃BR(y) = arg max_{ℓ∈[n]} ũ_F(y, ℓ) induces the same SSEs as ũ_F does.

Note that the above definitions only require an inducible strategy profile to be a verifiable SSE, with respect to the information about the follower's incentives (either f̃BR or ũ_F). However, it may happen that the resulting game G̃ has multiple SSEs, which gives rise to an equilibrium selection issue. Indeed, in practice, it is not realistic to assume that the follower has any control over which SSE is chosen by the leader (who moves first in the game). To address this, and thus completely resolve the optimal deception problem for the follower, we introduce an even stricter notion of inducibility on top of payoff-inducibility, which requires G̃ to have a unique SSE.

Definition 2.5 (Strong inducibility).
A strategy profile (x, j) is said to be strongly inducible with respect to u_L if there exists a matrix ũ_F ∈ R^{m×n} such that (x, j) is the unique SSE of the game G̃ = (u_L, ũ_F), in which case we say that (x, j) is strongly induced by ũ_F.

In the next sections, we will investigate solutions to (1) under the inducibility notions above, from the weakest to the strongest. Our general approach is to decompose (1) into n sub-problems by enumerating all possible follower responses j ∈ [n]. For each strategy j, we solve the corresponding optimization problem, and pick the one that yields the maximum utility for the follower. Due to space constraints, some proofs are omitted and can be found in the supplementary material.

Let us start our analysis by considering the case in which the leader queries the best responses of the follower. The aim of the follower is to deceive the leader towards a strategy profile that is BR-inducible; see Definition 2.3. Indeed, if the follower is allowed to use an arbitrary f̃BR to induce a strategy profile (x, j), he can simply define f̃BR as follows:

f̃BR(y) = {j} if y = x;   arg min_{ℓ∈[n]} u_L(y, ℓ) if y ≠ x.

Namely, the follower threatens to choose the worst possible response against any leader strategy y ≠ x, so as to minimize the leader's incentive to commit to these strategies. This f̃BR will successfully convince the leader that (x, j) is an SSE of G̃, hence inducing (x, j), if the threat is powerful enough, that is, if u_L(x, j) ≥ min_{ℓ∈[n]} u_L(y, ℓ) for all y ∈ ∆_{m−1}. Equivalently, this means that

u_L(x, j) ≥ M := max_{y∈∆_{m−1}} min_{ℓ∈[n]} u_L(y, ℓ),   (2)

where M is exactly the leader's maximin utility.
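The maximin value M in (2) is the optimum of a standard linear program, so it can be computed with any off-the-shelf solver. A minimal sketch of this computation (an illustration only; the use of numpy/scipy and the function name are our assumptions, not part of the paper):

```python
import numpy as np
from scipy.optimize import linprog

def leader_maximin(u_L):
    """Compute M = max_{y in simplex} min_l u_L(y, l) by LP.

    Variables (y_1, ..., y_m, t); maximize t subject to
    t <= sum_i y_i * u_L[i, l] for every follower strategy l,
    and y lying on the probability simplex.
    """
    m, n = u_L.shape
    # linprog minimizes, so minimize -t.
    c = np.zeros(m + 1)
    c[-1] = -1.0
    # Constraint t - y . u_L[:, l] <= 0 for each column l.
    A_ub = np.hstack([-u_L.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    # Simplex constraint: sum_i y_i = 1.
    A_eq = np.zeros((1, m + 1))
    A_eq[0, :m] = 1.0
    bounds = [(0, None)] * m + [(None, None)]  # y >= 0, t free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0], bounds=bounds)
    return res.x[:m], -res.fun  # maximin strategy y*, value M
```

For instance, for u_L = [[1, 0], [0, 1]] this returns the maximin strategy y* = (1/2, 1/2) with value M = 1/2.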
Indeed, (2) is necessary for (x, j) to be BR-inducible: if on the contrary u_L(x, j) < M, then by committing to y* ∈ arg max_{y∈∆_{m−1}} min_{ℓ∈[n]} u_L(y, ℓ), the leader can obtain (at least) her maximin utility, which will be strictly larger than u_L(x, j).

Thus, condition (2) gives a simple criterion for BR-inducibility. The problem is that such an f̃BR may be far from being one that arises from a choice of ũ_F. To alleviate this limitation, we impose a stricter condition on f̃BR.

Polytopal BR Correspondence.
In a similar vein to Goldberg and Marmolejo-Cossío [2018], we require that, for every ℓ ∈ [n], the set of leader strategies to which ℓ is a best response, f̃BR⁻¹(ℓ) = { y ∈ ∆_{m−1} : ℓ ∈ f̃BR(y) }, is a closed convex polytope, and the union of all these sets forms a partition of ∆_{m−1} (for example, see the polytope partition of ∆_2 in Figure 1). Any best response correspondence f̃BR satisfying this assumption is called polytopal.

Definition 3.1 (Polytopal best response correspondence [Goldberg and Marmolejo-Cossío, 2018]). A best response correspondence f̃BR : ∆_{m−1} → 2^{[n]} \ {∅} is polytopal if it also satisfies the following:

• f̃BR⁻¹(ℓ) is a closed convex polytope for each ℓ ∈ [n], and
• for each k ≠ ℓ, either relint(f̃BR⁻¹(k)) ∩ relint(f̃BR⁻¹(ℓ)) = ∅ or f̃BR⁻¹(k) = f̃BR⁻¹(ℓ),

where relint(H) denotes the relative interior of a set H.

Being polytopal is necessary for f̃BR to arise from some payoff matrix. Indeed, the true best response correspondence BR that arises from u_F is polytopal: clearly, each BR⁻¹(ℓ) is a closed convex polytope defined by the hyperplanes u_F(y, ℓ) ≥ u_F(y, k) for all k ∈ [n] and the borders of ∆_{m−1}; in addition, ∪_{ℓ=1}^n BR⁻¹(ℓ) = ∆_{m−1}, and for any ℓ ≠ k, the polytopes BR⁻¹(ℓ) and BR⁻¹(k) only intersect at their borders unless u_F(·, ℓ) = u_F(·, k). Thus, if the follower attempts to deceive the leader via a fake f̃BR, the leader might spot the deception in case f̃BR is not polytopal.

It turns out that the following correspondence, which we denote by f̃BR_P, is polytopal and, as we will shortly show, it is in fact as powerful as any best response correspondence:

f̃BR_P(y) = {j} if y ∈ ∆_{m−1} \ cl U_j(x);
           {j} ∪ arg min_{ℓ∈[n]\{j}} u_L(y, ℓ) if y ∈ cl U_j(x) \ U_j(x);
           arg min_{ℓ∈[n]\{j}} u_L(y, ℓ) if y ∈ U_j(x),

where cl U_j(x) denotes the closure of U_j(x) = { y ∈ ∆_{m−1} : u_L(y, j) > u_L(x, j) }.
Intuitively, it is safe for the follower to respond by playing j against any leader strategy y with u_L(y, j) ≤ u_L(x, j), in which case the leader does not have a strong incentive to commit to y instead of x. In response to the other strategies, however, the follower needs to play a different strategy in order to minimize the leader's incentive to commit to such a y. Therefore, this approach will successfully induce (x, j) if and only if the following holds:

u_L(x, j) ≥ max_{y ∈ cl U_j(x)} min_{ℓ∈[n]\{j}} u_L(y, ℓ),   (3)

where we use the convention that max ∅ = −∞. It is easy to see that f̃BR_P is indeed polytopal: f̃BR_P⁻¹(j) = ∆_{m−1} \ U_j(x) is a closed convex polytope, and the same holds for the sets f̃BR_P⁻¹(ℓ) defined by the hyperplanes u_L(y, ℓ) ≤ u_L(y, k), k ∈ [n] \ {j}, and the borders of cl U_j(x), which further form a partition of cl U_j(x). Note that the use of U_j(x), instead of the set { y ∈ ∆_{m−1} : u_L(y, j) ≥ u_L(x, j) }, is important: when u_L(y, j) = u_L(x, j) for all y ∈ ∆_{m−1}, these two sets define different behaviors.
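When U_j(x) ≠ ∅, its closure is the polytope { y ∈ ∆_{m−1} : u_L(y, j) ≥ u_L(x, j) }, so the right-hand side of (3) is itself the value of an LP. A minimal sketch of the check (an illustration only, assuming numpy/scipy, n ≥ 2, and a helper name of our choosing; strategies are 0-indexed):

```python
import numpy as np
from scipy.optimize import linprog

def br_inducible_threshold(u_L, x, j):
    """Right-hand side of condition (3): the best utility the leader can
    secure inside cl(U_j(x)) while the follower avoids strategy j.
    Returns -inf when U_j(x) is empty (convention: max over the empty
    set is -inf)."""
    m, n = u_L.shape
    target = x @ u_L[:, j]
    # U_j(x) is nonempty iff some pure strategy beats u_L(x, j) on column j.
    if u_L[:, j].max() <= target:
        return -np.inf
    # Variables (y, t): maximize t subject to t <= u_L(y, l) for l != j,
    # u_L(y, j) >= u_L(x, j), and y on the simplex.
    c = np.zeros(m + 1)
    c[-1] = -1.0
    rows = [np.append(-u_L[:, l], 1.0) for l in range(n) if l != j]
    rows.append(np.append(-u_L[:, j], 0.0))  # -u_L(y, j) <= -target
    A_ub = np.array(rows)
    b_ub = np.append(np.zeros(n - 1), -target)
    A_eq = np.zeros((1, m + 1))
    A_eq[0, :m] = 1.0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])
    return -res.fun
```

A profile (x, j) is then induced by f̃BR_P exactly when u_L(x, j) is at least the returned value.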
In fact, (2) is equivalent to (3), meaning that the extra condition imposed on f̃BR_P does not compromise its power: if (x, j) can be induced by an arbitrary f̃BR, then it can also be induced by f̃BR_P. We state this result in Lemma 3.2.

Lemma 3.2. u_L(x, j) ≥ M if and only if u_L(x, j) ≥ max_{y ∈ cl U_j(x)} min_{ℓ∈[n]\{j}} u_L(y, ℓ).

Proof. Recall that we want to show that u_L(x, j) ≥ M if and only if

u_L(x, j) ≥ max_{y ∈ cl U_j(x)} min_{ℓ∈[n]\{j}} u_L(y, ℓ),   (4)

where M is the maximin utility of the leader. We show that (4) does not hold if and only if u_L(x, j) < M.

Suppose that (4) does not hold. Then u_L(x, j) < max_{y ∈ cl U_j(x)} min_{ℓ∈[n]\{j}} u_L(y, ℓ) by definition, which implies that U_j(x) ≠ ∅. By the continuity of min_{ℓ∈[n]\{j}} u_L(·, ℓ), there exists y* ∈ U_j(x) such that u_L(x, j) < min_{ℓ∈[n]\{j}} u_L(y*, ℓ). By the definition of U_j(x), we also have u_L(x, j) < u_L(y*, j). Thus,

u_L(x, j) < min_{ℓ∈[n]} u_L(y*, ℓ) ≤ max_{y∈∆_{m−1}} min_{ℓ∈[n]} u_L(y, ℓ) = M.

Conversely, suppose that u_L(x, j) < M. Let y* ∈ arg max_{y∈∆_{m−1}} min_{ℓ∈[n]} u_L(y, ℓ). Thus, M = min_{ℓ∈[n]} u_L(y*, ℓ), and we have u_L(x, j) < M = min_{ℓ∈[n]} u_L(y*, ℓ) ≤ u_L(y*, j), which implies that y* ∈ U_j(x). It follows that M = max_{y ∈ cl U_j(x)} min_{ℓ∈[n]} u_L(y, ℓ), and thus

u_L(x, j) < max_{y ∈ cl U_j(x)} min_{ℓ∈[n]} u_L(y, ℓ) ≤ max_{y ∈ cl U_j(x)} min_{ℓ∈[n]\{j}} u_L(y, ℓ),

so (4) does not hold.

Using Lemma 3.2, we can efficiently compute the best strategy profile that can be induced by f̃BR_P, simply by solving the following linear program (LP) for each j ∈ [n]:

max_{x∈∆_{m−1}} u_F(x, j)   subject to u_L(x, j) ≥ M.   (5)

At this point, it might be tempting to think that, with the polytopal constraint imposed, we would also be able to construct an explicit payoff matrix ũ_F to implement f̃BR_P.
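Concretely, each subproblem in (5) is a small LP. A minimal sketch of the enumeration over the follower strategies j (an illustration only, assuming scipy and that the maximin value M from (2) has been precomputed; strategies are 0-indexed):

```python
import numpy as np
from scipy.optimize import linprog

def best_br_inducible_profile(u_L, u_F, M):
    """Solve LP (5) for each follower strategy j and keep the best:
    maximize u_F(x, j) over x in the simplex with u_L(x, j) >= M,
    where M is the leader's maximin utility."""
    m, n = u_L.shape
    best = (None, None, -np.inf)
    for j in range(n):
        res = linprog(
            -u_F[:, j],                    # maximize u_F(x, j)
            A_ub=[-u_L[:, j]], b_ub=[-M],  # u_L(x, j) >= M
            A_eq=[np.ones(m)], b_eq=[1.0], # x on the simplex
            bounds=[(0, None)] * m,
        )
        if res.status == 0 and -res.fun > best[2]:
            best = (res.x, j, -res.fun)
    return best  # (x, j, optimal inducible utility)
```

Infeasible subproblems (strategies j for which no x satisfies u_L(x, j) ≥ M) are simply skipped.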
Unfortunately, no payoff matrix can implement f̃BR_P in general, as Example 3.3 illustrates. Surprisingly though, in the next section we will show that, even though we cannot construct a payoff matrix that implements f̃BR_P directly, every strategy profile (x, j) that is f̃BR_P-inducible is in fact payoff-inducible. We also present an efficient algorithm for computing a payoff matrix ũ_F that induces such an (x, j).

Example 3.3.
Consider a 3 × 3 game with the leader payoff matrix given in Figure 1. Let f̃BR_P be the polytopal BR correspondence defined by the regions R_1, R_2, and R_3 in Figure 1, such that ℓ ∈ f̃BR_P(y) if and only if y ∈ R_ℓ. This best response behavior cannot be realized by any payoff matrix. To see this, suppose f̃BR_P is realized by some ũ_F ∈ R^{3×3}. Let x = (1/2, 0, 1/2), w = (1/2, 1/2, 0), and z = (1/2, 1/4, 1/4). We have f̃BR_P(z) = {1, 2, 3} and f̃BR_P(w) = {1, 2}. This means that ũ_F(z, 1) = ũ_F(z, 2) = ũ_F(z, 3) and ũ_F(w, 1) = ũ_F(w, 2) > ũ_F(w, 3). Since x = 2z − w, by the linearity of the utility function, ũ_F(x, 1) = ũ_F(x, 2) < ũ_F(x, 3), which contradicts the fact that f̃BR_P(x) = {1, 3}.
Figure 1: No payoff matrix ũ_F realizes the polytopal BR correspondence f̃BR_P such that ℓ ∈ f̃BR_P(y) if and only if y ∈ R_ℓ, where R_1 = { y ∈ ∆_2 : y_1 ≥ y_2 + y_3 }, R_2 = { y ∈ ∆_2 : y_1 ≤ y_2 + y_3 and y_2 ≥ y_3 }, and R_3 = { y ∈ ∆_2 : y_1 ≤ y_2 + y_3 and y_3 ≥ y_2 }.

In this section, we will show that every strategy profile that can be induced by f̃BR_P is also payoff-inducible, and that a corresponding payoff matrix can be efficiently constructed. Recall that the maximin utility of the leader is denoted by M = max_{y∈∆_{m−1}} min_{ℓ∈[n]} u_L(y, ℓ). We will show the following characterization as one of our key results, which enables us to use the LP in (5) to efficiently compute a payoff matrix that achieves the optimal inducible utility.

Theorem 4.1.
A strategy profile (x, j) is payoff-inducible if and only if u_L(x, j) ≥ M. Furthermore, a matrix ũ_F inducing (x, j) can be constructed in polynomial time.

One direction of the characterization is easy to show. Indeed, if (x, j) is payoff-inducible, then it is also BR-inducible, and as seen in Section 3, it holds that u_L(x, j) ≥ M.

Now consider any profile (x, j) such that u_L(x, j) ≥ M. Recall that U_j(x) = { y ∈ ∆_{m−1} : u_L(y, j) > u_L(x, j) }. Without loss of generality, in what follows, we can also assume that U_j(x) ≠ ∅: if U_j(x) = ∅, then (x, j) will be an SSE if the follower always responds by playing j; this can easily be achieved by claiming that j strictly dominates all other strategies, i.e., by letting ũ_F(i, j) = 1 and ũ_F(i, ℓ) = 0 for all i ∈ [m] and ℓ ∈ [n] \ {j}.

We begin by analyzing the following payoff function, which forms the basis of our approach. Let Ŝ ⊆ [n] \ {j} and pick k ∈ arg min_{ℓ∈Ŝ} u_L(x, ℓ) arbitrarily. For all y ∈ ∆_{m−1}, let

ũ_F(y, ℓ) = −u_L(y, ℓ) if ℓ ∈ Ŝ;
            −u_L(y, k) − 1 if ℓ ∈ [n] \ (Ŝ ∪ {j});
            −u_L(y, k) + α (u_L(x, j) − u_L(y, j)) if ℓ = j,   (6)

where α > 0 is a constant. In what follows, we will let f̃BR denote the best response correspondence corresponding to ũ_F, i.e., f̃BR(y) = arg max_{ℓ∈[n]} ũ_F(y, ℓ). Note that we can compute the payoff matrix corresponding to ũ_F in polynomial time. Then, the hope is that, with appropriately chosen Ŝ and α, the payoff matrix will induce (x, j). Indeed, ũ_F has the following nice properties:

i. Strategy j is indeed a best response to x, since, by the choice of k, we have ũ_F(x, j) = −u_L(x, k) = −min_{ℓ∈Ŝ} u_L(x, ℓ) = max_{ℓ∈Ŝ} ũ_F(x, ℓ).

ii. Any ℓ ∈ [n] \ (Ŝ ∪ {j}) cannot be a best response of the follower, as it is strictly dominated by k, i.e., ũ_F(y, ℓ) < ũ_F(y, k) for all y ∈ ∆_{m−1}.
Thus, f̃BR(y) ⊆ Ŝ ∪ {j} for all y ∈ ∆_{m−1}.

iii. If j is a best response to some y ∈ ∆_{m−1}, then u_L(y, j) ≤ u_L(x, j). Indeed, j ∈ f̃BR(y) implies that ũ_F(y, j) = max_{ℓ∈[n]} ũ_F(y, ℓ) ≥ ũ_F(y, k). Substituting ũ_F(y, j) = −u_L(y, k) + α (u_L(x, j) − u_L(y, j)) into this inequality and rearranging the terms immediately gives u_L(y, j) ≤ u_L(x, j).

iv. If any ℓ ∈ Ŝ is a best response to some y ∈ ∆_{m−1}, then it holds that ũ_F(y, ℓ) = max_{ℓ′∈Ŝ} ũ_F(y, ℓ′), which implies that

u_L(y, ℓ) = min_{ℓ′∈Ŝ} u_L(y, ℓ′).   (7)

Therefore, if min_{ℓ′∈Ŝ} u_L(y, ℓ′) ≤ u_L(x, j) also holds for the y in (iv), then by (7) we will have u_L(y, ℓ) ≤ u_L(x, j) for every ℓ ∈ f̃BR(y) ∩ Ŝ. This, together with (ii) and (iii), will imply that u_L(x, j) ≥ u_L(y, ℓ) for every ℓ ∈ f̃BR(y). Therefore, (x, j) will indeed form an SSE, given that j ∈ f̃BR(x) by (i). We state this observation as the following lemma.

Lemma 4.2. If min_{ℓ′∈Ŝ} u_L(y, ℓ′) ≤ u_L(x, j) holds for all y ∈ ∆_{m−1} such that f̃BR(y) ∩ Ŝ ≠ ∅, then the payoff matrix defined by (6) induces (x, j).

The proof of Theorem 4.1 is then completed by showing the following result.
Proposition 4.3. If u_L(x, j) ≥ M and U_j(x) ≠ ∅, then we can construct Ŝ ⊆ [n] \ {j} and α > 0 in polynomial time, with which the condition of Lemma 4.2 holds for ũ_F as defined in (6).

The proof relies on the following useful lemma.
Lemma 4.4 (Farkas' Lemma [Boyd and Vandenberghe, 2014]). Let A ∈ R^{n_1×n_2} and b ∈ R^{n_1}. Then exactly one of the following statements is true:

1. there exists z ∈ R^{n_2} such that Az = b and z ≥ 0;
2. there exists z ∈ R^{n_1} such that Aᵀz ≥ 0 and b · z < 0.

Proof of Proposition 4.3. Consider any strategy profile (x, j) with u_L(x, j) ≥ M and U_j(x) ≠ ∅. We begin by taking care of a simple case, as an immediate corollary of Lemma 4.2.

Corollary 4.5.
A matrix ũ_F that induces (x, j) can be constructed in polynomial time if it holds that

u_L(x, j) ≥ M_{−j} := max_{y∈∆_{m−1}} min_{ℓ∈[n]\{j}} u_L(y, ℓ).   (8)

Proof.
Let Ŝ = [n] \ {j}. Then, for every y ∈ ∆_{m−1}, we immediately obtain that

u_L(x, j) ≥ max_{y′∈∆_{m−1}} min_{ℓ∈[n]\{j}} u_L(y′, ℓ) ≥ min_{ℓ∈Ŝ} u_L(y, ℓ).

By Lemma 4.2, the payoff matrix defined by (6) (with, say, α = 1) then induces (x, j), and can clearly be computed in polynomial time.

The more challenging case is when (8) does not hold (e.g., the case with the profile (x, 1) in Example 3.3). In what follows, we prove Proposition 4.3 by showing that there is still a choice of Ŝ and α that leads to the condition in Lemma 4.2, even when (8) does not hold. Thus, from now on, we assume that

u_L(x, j) < M_{−j}.   (9)

We define the following useful components. By Lemma 3.2 and the assumption that u_L(x, j) ≥ M, we know that

u_L(x, j) ≥ V,   (10)

where V = max_{y ∈ cl U_j(x)} min_{ℓ∈[n]\{j}} u_L(y, ℓ). Since U_j(x) ≠ ∅, there exists y* ∈ cl U_j(x) such that

min_{ℓ∈[n]\{j}} u_L(y*, ℓ) = V,   (11)

which can be computed efficiently by solving an LP (i.e., maximize µ, subject to µ ≤ u_L(y, ℓ) for all ℓ ∈ [n] \ {j} and y ∈ cl U_j(x)). We then let S = { ℓ ∈ [n] \ {j} : u_L(y*, ℓ) = V }.

Before we proceed, we prove two useful technical results.
Lemma 4.6. u_L(y*, j) = u_L(x, j).

Proof. For the sake of contradiction, suppose that u_L(y*, j) ≠ u_L(x, j). Since y* ∈ cl U_j(x), we have that u_L(y*, j) ≥ u_L(x, j), so it must be that u_L(y*, j) > u_L(x, j).

The assumption (9) that u_L(x, j) < M_{−j} implies that there exists ŷ ∈ ∆_{m−1} such that min_{ℓ∈[n]\{j}} u_L(ŷ, ℓ) > u_L(x, j) ≥ V, where we also use (10). Now that min_{ℓ∈[n]\{j}} u_L(y*, ℓ) = V by (11), by the concavity of min_{ℓ∈[n]\{j}} u_L(·, ℓ), it follows that min_{ℓ∈[n]\{j}} u_L(z, ℓ) > V for all z on the segment [ŷ, y*); note that z ∈ ∆_{m−1} as ∆_{m−1} is convex. Now that we have u_L(y*, j) > u_L(x, j) under our assumption, when z is sufficiently close to y*, we have u_L(z, j) ≥ u_L(x, j) and hence z ∈ cl U_j(x). This leads to the contradiction that

V = max_{y ∈ cl U_j(x)} min_{ℓ∈[n]\{j}} u_L(y, ℓ) ≥ min_{ℓ∈[n]\{j}} u_L(z, ℓ) > V.

Lemma 4.7. min_{ℓ∈S} u_L(y, ℓ) < V for all y ∈ U_j(x).

Proof. For the sake of contradiction, assume that there exists ŷ ∈ U_j(x) such that min_{ℓ∈S} u_L(ŷ, ℓ) ≥ V. By the assumption (9) that u_L(x, j) < M_{−j}, there exists z ∈ ∆_{m−1} such that min_{ℓ∈[n]\{j}} u_L(z, ℓ) > u_L(x, j) ≥ V, which immediately yields the following, given that S ⊆ [n] \ {j} by definition:

min_{ℓ∈S} u_L(z, ℓ) > V.
By definition, $u_L(\mathbf{y}^*, \ell) = V$ for all $\ell \in S$, which also implies that $u_L(\mathbf{y}^*, \ell) > V$ for all $\ell \in [n] \setminus (\{j\} \cup S)$ (otherwise, we would have $\min_{\ell \in [n] \setminus \{j\}} u_L(\mathbf{y}^*, \ell) < V$). Thus, we have
$$\min_{\ell \in S} u_L(\mathbf{y}^*, \ell) = V \quad \text{and} \quad \min_{\ell \in [n] \setminus (\{j\} \cup S)} u_L(\mathbf{y}^*, \ell) > V.$$
Now consider a point $\mathbf{w}$ on the segment $(\mathbf{y}^*, \hat{\mathbf{y}}\,]$. Since $u_L(\mathbf{y}^*, j) \ge u_L(\mathbf{x}, j)$ and $u_L(\hat{\mathbf{y}}, j) > u_L(\mathbf{x}, j)$ (as $\hat{\mathbf{y}} \in U_j(\mathbf{x})$), we have $u_L(\mathbf{w}, j) > u_L(\mathbf{x}, j)$ and hence $\mathbf{w} \in U_j(\mathbf{x})$. In addition, by continuity, when $\mathbf{w}$ is sufficiently close to $\mathbf{y}^*$, we have
$$\min_{\ell \in [n] \setminus (\{j\} \cup S)} u_L(\mathbf{w}, \ell) > V. \tag{12}$$
By concavity of the function $\min_{\ell \in S} u_L(\cdot, \ell)$, since $\min_{\ell \in S} u_L(\mathbf{y}, \ell) \ge V$ for both $\mathbf{y} \in \{\mathbf{y}^*, \hat{\mathbf{y}}\}$, we have
$$\min_{\ell \in S} u_L(\mathbf{w}, \ell) \ge V. \tag{13}$$
Analogously, we can find a point $\mathbf{w}' \in U_j(\mathbf{x})$ on the segment $(\mathbf{w}, \mathbf{z}]$ such that (12) and (13) hold for $\mathbf{w}'$, with (13) being strict, in particular. Thus, we have
$$\min_{\ell \in [n] \setminus \{j\}} u_L(\mathbf{w}', \ell) > V = \max_{\mathbf{y} \in U_j(\mathbf{x})} \min_{\ell \in [n] \setminus \{j\}} u_L(\mathbf{y}, \ell),$$
which is a contradiction as $\mathbf{w}' \in U_j(\mathbf{x})$.

In what follows, we use the coordinates $(y_1, \ldots, y_{m-1})$ for every point $\mathbf{y} \in \Delta^{m-1}$, i.e., we have
$$\Delta^{m-1} = \Big\{ (y_1, \ldots, y_{m-1}) \in \mathbb{R}^{m-1}_{\ge 0} : \sum_{i=1}^{m-1} y_i \le 1 \Big\}.$$
Accordingly, we can write the utility function as $u_L(\mathbf{y}, \ell) = \mathbf{g}_\ell \cdot \mathbf{y} + u_L(m, \ell)$, where $\mathbf{g}_\ell \in \mathbb{R}^{m-1}$ and its $i$-th component is $g_{\ell,i} = u_L(i, \ell) - u_L(m, \ell)$; "$\cdot$" denotes the inner product. Hence, we have
$$u_L(\mathbf{y}, \ell) = \mathbf{g}_\ell \cdot (\mathbf{y} - \mathbf{y}^*) + u_L(\mathbf{y}^*, \ell) = \begin{cases} \mathbf{g}_\ell \cdot (\mathbf{y} - \mathbf{y}^*) + V & \text{if } \ell \in S \\ \mathbf{g}_j \cdot (\mathbf{y} - \mathbf{y}^*) + u_L(\mathbf{x}, j) & \text{if } \ell = j \end{cases} \tag{14}$$
where $u_L(\mathbf{y}^*, \ell) = V$ for all $\ell \in S$ by the definition of $S$, and $u_L(\mathbf{y}^*, j) = u_L(\mathbf{x}, j)$ by Lemma 4.6. Note that since $U_j(\mathbf{x}) \neq \emptyset$, it must be that $\mathbf{g}_j \neq \mathbf{0}$.

We also write the $m$ boundary conditions that define $\Delta^{m-1}$ as $\mathbf{e}_i \cdot \mathbf{y} \ge \beta_i$. Namely, for each $i \in [m-1]$, let $\mathbf{e}_i \in \mathbb{R}^{m-1}$ be the $i$-th unit vector and $\beta_i = 0$, while $\mathbf{e}_m = (-1, \ldots, -1) \in \mathbb{R}^{m-1}$ and $\beta_m = -1$. Thus, $\Delta^{m-1} = \{\mathbf{y} \in \mathbb{R}^{m-1} : \mathbf{e}_i \cdot \mathbf{y} \ge \beta_i \text{ for all } i \in [m]\}$. Let $B = \{i \in [m] : \mathbf{e}_i \cdot \mathbf{y}^* = \beta_i\}$ be the set of boundary conditions that are tight for $\mathbf{y}^*$. Note that for any $\mathbf{y} \in \Delta^{m-1}$ we have
$$\mathbf{e}_i \cdot (\mathbf{y} - \mathbf{y}^*) \ge 0 \quad \text{for all } i \in B. \tag{15}$$
We can now prove the following result using Farkas' Lemma (Lemma 4.4), which allows us to express $-\mathbf{g}_j$ as a non-negative linear combination of the $\mathbf{g}_\ell$'s and $\mathbf{e}_i$'s.

Lemma 4.8. $-\mathbf{g}_j$ can be expressed as a non-negative linear combination of $\{\mathbf{g}_\ell : \ell \in S\} \cup \{\mathbf{e}_i : i \in B\}$, i.e., $-\mathbf{g}_j = \sum_{\ell \in S} \lambda_\ell \mathbf{g}_\ell + \sum_{i \in B} \mu_i \mathbf{e}_i$, where $\lambda_\ell \ge 0$ and $\mu_i \ge 0$.

Proof. We use Farkas' Lemma (Lemma 4.4) with $n_1 = m-1$ and $n_2 = |S| + |B|$. The columns of $A$ are exactly the vectors $\{\mathbf{g}_\ell : \ell \in S\} \cup \{\mathbf{e}_i : i \in B\}$, and we set $\mathbf{b} = -\mathbf{g}_j$. Note that the first alternative of Farkas' Lemma immediately yields the statement we want to prove. Thus, we set out to prove that the second alternative cannot hold.

Assume, for the sake of contradiction, that there exists $\mathbf{z} \in \mathbb{R}^{m-1}$ such that $A^T \mathbf{z} \ge \mathbf{0}$ and $\mathbf{b} \cdot \mathbf{z} < 0$, i.e., $\mathbf{g}_\ell \cdot \mathbf{z} \ge 0$ for all $\ell \in S$, $\mathbf{e}_i \cdot \mathbf{z} \ge 0$ for all $i \in B$, and $\mathbf{g}_j \cdot \mathbf{z} > 0$. Then, by picking $\delta > 0$ sufficiently small, the point $\mathbf{y} = \mathbf{y}^* + \delta \mathbf{z}$ satisfies the following:

• By (14), we have the following for all $\ell \in S$: $u_L(\mathbf{y}, \ell) = \mathbf{g}_\ell \cdot (\mathbf{y} - \mathbf{y}^*) + V = \delta\, \mathbf{g}_\ell \cdot \mathbf{z} + V \ge V$. In addition, $u_L(\mathbf{y}, j) = \mathbf{g}_j \cdot (\mathbf{y} - \mathbf{y}^*) + u_L(\mathbf{y}^*, j) = \delta\, \mathbf{g}_j \cdot \mathbf{z} + u_L(\mathbf{x}, j) > u_L(\mathbf{x}, j)$.

• $\mathbf{y} \in \Delta^{m-1}$: For $i \in B$, we immediately obtain that $\mathbf{e}_i \cdot \mathbf{y} = \mathbf{e}_i \cdot (\mathbf{y}^* + \delta \mathbf{z}) \ge \mathbf{e}_i \cdot \mathbf{y}^* = \beta_i$, which means that these boundary conditions are satisfied. For $i \in [m] \setminus B$, we know that $\mathbf{e}_i \cdot \mathbf{y}^* > \beta_i$, and thus by picking $\delta > 0$ small enough, we can ensure that $\mathbf{e}_i \cdot \mathbf{y} = \mathbf{e}_i \cdot \mathbf{y}^* + \delta (\mathbf{e}_i \cdot \mathbf{z}) \ge \beta_i$.

Thus, it follows that $\mathbf{y} \in U_j(\mathbf{x})$ and $\min_{\ell \in S} u_L(\mathbf{y}, \ell) \ge V$. But this cannot hold according to Lemma 4.7.

We can now complete the proof of Proposition 4.3. We first express $-\mathbf{g}_j$ as a non-negative linear combination of the vectors $\{\mathbf{g}_\ell : \ell \in S\} \cup \{\mathbf{e}_i : i \in B\}$.
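The decomposition asserted by Lemma 4.8 is exactly the first Farkas alternative, so the coefficients can be recovered with any LP solver. Below is a minimal sketch using scipy on a small hypothetical instance; the vectors in `g_S`, `e_B`, and `g_j` are illustrative placeholders, not data from the paper.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical data: columns of A are the vectors {g_l : l in S} and {e_i : i in B};
# we seek nonnegative coefficients c with A @ c = -g_j (Lemma 4.8).
g_S = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]  # g_l for l in S (illustrative)
e_B = [np.array([1.0, 0.0])]                        # e_i for i in B (illustrative)
g_j = np.array([-2.0, -1.0])                        # g_j (illustrative)

A = np.column_stack(g_S + e_B)  # shape (m-1, |S| + |B|)
b = -g_j

# Feasibility LP: minimize the zero objective subject to A c = b, c >= 0.
res = linprog(c=np.zeros(A.shape[1]), A_eq=A, b_eq=b,
              bounds=[(0, None)] * A.shape[1])
assert res.success

lam = res.x[:len(g_S)]  # lambda_l, l in S
mu = res.x[len(g_S):]   # mu_i, i in B
```

Any feasible point of this LP certifies the lemma's conclusion for the given instance; if the LP were infeasible, the second Farkas alternative would hold instead.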
By Lemma 4.8 we know that this is possible, and it is easy to see that we can find the coefficients in polynomial time (e.g., by solving an LP). We thus obtain $-\mathbf{g}_j = \sum_{\ell \in S} \lambda_\ell \mathbf{g}_\ell + \sum_{i \in B} \mu_i \mathbf{e}_i$, where $\lambda_\ell \ge 0$ for every $\ell \in S$ and $\mu_i \ge 0$ for every $i \in B$. Let $\widehat{S} = \{\ell \in S : \lambda_\ell > 0\}$. We will argue that $\widehat{S} \neq \emptyset$.

Observe that since $-\mathbf{g}_j = \sum_{\ell \in S} \lambda_\ell \mathbf{g}_\ell + \sum_{i \in B} \mu_i \mathbf{e}_i$ and, by (15), we have $\mathbf{e}_i \cdot (\mathbf{y} - \mathbf{y}^*) \ge 0$ for all $\mathbf{y} \in \Delta^{m-1}$ and $i \in B$, it follows that, for all $\mathbf{y} \in \Delta^{m-1}$, we have
$$-\mathbf{g}_j \cdot (\mathbf{y} - \mathbf{y}^*) = \sum_{\ell \in S} \lambda_\ell\, \mathbf{g}_\ell \cdot (\mathbf{y} - \mathbf{y}^*) + \sum_{i \in B} \mu_i\, \mathbf{e}_i \cdot (\mathbf{y} - \mathbf{y}^*) \ge \sum_{\ell \in S} \lambda_\ell\, \mathbf{g}_\ell \cdot (\mathbf{y} - \mathbf{y}^*) = \sum_{\ell \in \widehat{S}} \lambda_\ell\, \mathbf{g}_\ell \cdot (\mathbf{y} - \mathbf{y}^*), \tag{16}$$
where the last transition is due to the fact that $\lambda_\ell = 0$ for all $\ell \in S \setminus \widehat{S}$, as implied by the definition of $\widehat{S}$.

Since $U_j(\mathbf{x}) \neq \emptyset$, consider any $\mathbf{y} \in U_j(\mathbf{x})$. By definition, this means that $u_L(\mathbf{y}, j) > u_L(\mathbf{x}, j)$, which further implies that $\mathbf{g}_j \cdot (\mathbf{y} - \mathbf{y}^*) > 0$, since $u_L(\mathbf{y}, j) = \mathbf{g}_j \cdot (\mathbf{y} - \mathbf{y}^*) + u_L(\mathbf{x}, j)$ by (14). By (16), we then have $\sum_{\ell \in \widehat{S}} \lambda_\ell\, \mathbf{g}_\ell \cdot (\mathbf{y} - \mathbf{y}^*) < 0$. Hence, $\widehat{S} \neq \emptyset$.

It remains to show that with the above $\widehat{S}$ and, in particular, $\alpha = 1/\lambda_k$ (recall that $k \in \operatorname{argmin}_{\ell \in \widehat{S}} u_L(\mathbf{x}, \ell)$), the condition in Lemma 4.2 holds, i.e., we prove that $\min_{\ell \in \widehat{S}} u_L(\mathbf{y}, \ell) \le u_L(\mathbf{x}, j)$ holds for all $\mathbf{y} \in \Delta^{m-1}$ such that $f_{\mathrm{BR}}(\mathbf{y}) \cap \widehat{S} \neq \emptyset$.

For the sake of contradiction, suppose that there exists $\mathbf{y} \in \Delta^{m-1}$ such that $f_{\mathrm{BR}}(\mathbf{y}) \cap \widehat{S} \neq \emptyset$, but $u_L(\mathbf{y}, \ell) > u_L(\mathbf{x}, j)$ for all $\ell \in \widehat{S}$. By (10), we have $u_L(\mathbf{x}, j) \ge V$, and thus $u_L(\mathbf{y}, \ell) > V$ for all $\ell \in \widehat{S}$. By (14), we have $u_L(\mathbf{y}, \ell) = \mathbf{g}_\ell \cdot (\mathbf{y} - \mathbf{y}^*) + V$; thus, $\mathbf{g}_\ell \cdot (\mathbf{y} - \mathbf{y}^*) > 0$ for all $\ell \in \widehat{S}$.

Using (16) and the fact that $k \in \widehat{S}$ by our choice, we then obtain
$$-\mathbf{g}_j \cdot (\mathbf{y} - \mathbf{y}^*) \ge \sum_{\ell \in \widehat{S}} \lambda_\ell\, \mathbf{g}_\ell \cdot (\mathbf{y} - \mathbf{y}^*) \ge \lambda_k\, \mathbf{g}_k \cdot (\mathbf{y} - \mathbf{y}^*).$$
By (14), we have $u_L(\mathbf{x}, j) - u_L(\mathbf{y}, j) = -\mathbf{g}_j \cdot (\mathbf{y} - \mathbf{y}^*)$.
Recall that it is defined that $\tilde{u}_F(\mathbf{y}, j) = -u_L(\mathbf{y}, k) + \alpha \left( u_L(\mathbf{x}, j) - u_L(\mathbf{y}, j) \right)$, as in (6). Using the above two relations and (14), we then obtain the following:
$$\tilde{u}_F(\mathbf{y}, j) = -u_L(\mathbf{y}, k) + \alpha \left( u_L(\mathbf{x}, j) - u_L(\mathbf{y}, j) \right) = -\mathbf{g}_k \cdot (\mathbf{y} - \mathbf{y}^*) - V - \alpha\, \mathbf{g}_j \cdot (\mathbf{y} - \mathbf{y}^*) \ge -V + (\alpha \lambda_k - 1)\, \mathbf{g}_k \cdot (\mathbf{y} - \mathbf{y}^*) = -V,$$
where the last equality holds since $\alpha = 1/\lambda_k$. However, by (6) we also have $\tilde{u}_F(\mathbf{y}, \ell) = -u_L(\mathbf{y}, \ell)$ if $\ell \in \widehat{S}$, which implies that for all $\ell \in \widehat{S}$ it holds that
$$\tilde{u}_F(\mathbf{y}, j) \ge -V > -u_L(\mathbf{y}, \ell) = \tilde{u}_F(\mathbf{y}, \ell).$$
Hence, $f_{\mathrm{BR}}(\mathbf{y}) \cap \widehat{S} = \emptyset$, which contradicts our assumption.

As discussed in Section 2, a weakness of BR- and payoff-inducible strategy profiles is that the resulting games may have multiple SSEs, in which case the follower depends on the leader to choose the SSE that maximizes his utility. To avoid this, in this section we turn our attention to strong inducibility (see Definition 2.5) and attempt to find a payoff matrix $\tilde{u}_F$ such that $\widetilde{G}$ has a unique SSE.

We begin with an example showing that, in general, the best strongly inducible profile can be much worse than the best payoff-inducible profile.

Example 5.1.
Consider a game $G = (u_L, u_F)$ with the payoff matrices given in Figure 2. Note that the follower obtains positive utility only by playing his strategy 1. Now, observe that the SSE $(\mathbf{x}^*, 1)$, with $\mathbf{x}^* = (0, 1) \in \Delta$, is payoff-inducible and yields positive utility for the follower: it can be induced by any payoff matrix in which strategy 1 of the follower strictly dominates all other strategies. However, such a payoff matrix will also induce other SSEs, e.g., $(\mathbf{y}^*, 1)$ with $\mathbf{y}^* = (1, 0) \in \Delta$. Indeed, it holds that no profile of the form $(\mathbf{y}, 1)$ can be strongly induced, and thus the optimal utility the follower can obtain at a strongly inducible profile is 0. To see this, first note that, as seen above, if the follower claims that strategy 1 is his unique best response for all points in $\Delta$, then the SSE is not unique. On the other hand, if strategy 2 is a best response at some point $\mathbf{z} \in \Delta$, then $(\mathbf{y}, 1)$ will not be an SSE, since for the leader $u_L(\mathbf{y}, 1) < u_L(\mathbf{z}, 2)$ for any $\mathbf{y}, \mathbf{z} \in \Delta$.

Figure 2: A game where the optimal inducible utility is positive, but the optimal strongly inducible utility is 0.

Figure 3: A non-max-degenerate game for which the optimal inducible utility cannot be achieved by any strongly inducible profile.
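To make the mechanics of Example 5.1 concrete, the following is a quick numeric check on a hypothetical 2 × 2 instance that shares the example's structure; the entries below are illustrative assumptions (indices are 0-based), not the matrices of Figure 2.

```python
import numpy as np

# Hypothetical 2x2 instance mirroring Example 5.1's structure (not the Figure 2
# entries): column 0 is the follower's only profitable strategy, the leader's
# payoff in column 0 is constant, and column 1 dominates column 0 for the leader.
u_L = np.array([[0.5, 1.0],
                [0.5, 1.0]])  # leader payoffs u_L(i, l)
u_F = np.array([[1.0, 0.0],
                [1.0, 0.0]])  # follower's true payoffs u_F(i, l)

# If the follower reports payoffs under which strategy 0 strictly dominates,
# the leader best-responds by maximizing u_L(., 0); here that column is flat,
# so every commitment x is optimal and the induced game has multiple SSEs.
col0 = u_L[:, 0]
multiple_sses = bool(np.isclose(col0.max(), col0.min()))

# If instead strategy 1 is ever a best response at some z, no profile (y, 0)
# survives as an SSE: the leader strictly prefers (z, 1) to any (y, 0).
leader_prefers_1 = bool(u_L[:, 1].min() > u_L[:, 0].max())

print(multiple_sses, leader_prefers_1)  # True True
```

Under these assumed entries, the two printed checks correspond to the two cases in the example's argument: either the reported matrix makes the SSE non-unique, or the profitable strategy is abandoned somewhere and the follower's strongly inducible utility drops to 0.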
The problem in Example 5.1 stems from the following observation: if the follower reports a payoff matrix such that strategy 1 is the unique best response for all points in the domain, then there are multiple SSEs. This can be thought of as a "degenerate" case, since it would occur with probability 0 if the payoffs of the leader were drawn uniformly at random in $[0, 1]$. We formalize this as follows.

Definition 5.2.
A leader payoff matrix $u_L$ is said to be max-degenerate if there exists $j \in [n]$ such that $|\operatorname{argmax}_{i \in [m]} u_L(i, j)| > 1$.

We next provide an example showing that, even when $u_L$ is not max-degenerate, we cannot hope to exactly achieve the optimal inducible utility via a strongly inducible profile.

Example 5.3.
Consider a game with the leader and follower payoff matrices given in Figure 3. It is easy to check that $u_L$ is not max-degenerate. Now, observe that the maximin utility $M$ of the leader is achieved at the point $\mathbf{y}^* \in \Delta$ marked in Figure 3, and let $\mathbf{x}^* \in \Delta$ be as in the figure. Since $u_L(\mathbf{x}^*, 1) \ge M$, it follows that $(\mathbf{x}^*, 1)$ is payoff-inducible by Theorem 4.1. Indeed, the partition $(R_1, R_2)$ of $\Delta$ in Figure 3 shows how $(\mathbf{x}^*, 1)$ can be induced. Note that $u_F(\mathbf{x}^*, 1) = 1$, while any profile different from $(\mathbf{x}^*, 1)$ yields utility strictly less than 1 for the follower. We will now show that $(\mathbf{x}^*, 1)$ cannot be strongly induced, which implies that any strongly inducible profile gives utility strictly less than 1 to the follower.

Indeed, suppose that $(\mathbf{x}^*, 1)$ is induced by some $\tilde{u}_F$. If by $\tilde{u}_F$ strategy 1 is a best response to $\mathbf{y}^*$, then $(\mathbf{x}^*, 1)$ cannot be the unique SSE, since $u_L(\mathbf{x}^*, 1) = u_L(\mathbf{y}^*, 1)$. On the other hand, if strategy 2 is the only best response to $\mathbf{y}^*$, then there exists some sufficiently small $\delta > 0$ such that strategy 2 is also a best response to the nearby point $\mathbf{w}^*$ obtained by perturbing $\mathbf{y}^*$ by $\delta$ (see Figure 3). However, this means that $(\mathbf{x}^*, 1)$ cannot be an SSE, since $u_L(\mathbf{w}^*, 2) > u_L(\mathbf{x}^*, 1)$.

As a result, unlike in the previous section, here we cannot hope to solve the problem exactly. However, the next theorem shows that we can approximate the optimal utility with arbitrarily good precision.

Theorem 5.4. If $u_L$ is not max-degenerate, then for any $\varepsilon > 0$, the follower can strongly induce a profile $(\mathbf{x}, j)$ that yields the optimal inducible utility up to an additive loss of at most $\varepsilon$. Furthermore, a matrix $\tilde{u}_F$ strongly inducing $(\mathbf{x}, j)$ can be constructed in time polynomial in $\log(1/\varepsilon)$ (and the size of the representation of the game).

Proof. Let $(\mathbf{x}^*, j)$ be a payoff-inducible profile that yields the optimal inducible payoff for the follower. By Theorem 4.1, such a profile can be computed in polynomial time. We begin by solving the following LP:
$$\begin{aligned} \max_{\delta,\, \mathbf{x}} \quad & \delta \\ \text{s.t.} \quad & \mathbf{x} \in \Delta^{m-1} \\ & u_F(\mathbf{x}, j) \ge u_F(\mathbf{x}^*, j) - \varepsilon \\ & u_L(\mathbf{x}, j) = u_L(\mathbf{x}^*, j) + \delta \end{aligned} \tag{17}$$
Note that this LP can be solved in time polynomial in $\log(1/\varepsilon)$. Furthermore, note that the polytope of feasible points is not empty, since $\delta = 0$ and $\mathbf{x} = \mathbf{x}^*$ satisfy all the constraints. Finally, the LP is not unbounded, since $\delta$ can be at most $\max_{i \in [m]} u_L(i, j) - u_L(\mathbf{x}^*, j)$.

In the rest of this proof, let $\delta$ and $\mathbf{x}$ denote an optimal solution to this LP. Note that we can in particular assume that $\mathbf{x}$ is a vertex of the convex polytope $P_\delta = \{\mathbf{y} \in \Delta^{m-1} : u_L(\mathbf{y}, j) = u_L(\mathbf{x}^*, j) + \delta\}$. Indeed, given a solution $(\delta, \mathbf{x})$ to LP (17), if $\mathbf{x}$ is not a vertex of $P_\delta$, then we consider the LP
$$\begin{aligned} \max_{\mathbf{y}} \quad & u_F(\mathbf{y}, j) \\ \text{s.t.} \quad & \mathbf{y} \in \Delta^{m-1} \\ & u_L(\mathbf{y}, j) = u_L(\mathbf{x}^*, j) + \delta \end{aligned}$$
It is known that a solution of an LP that is also a vertex of the feasible polytope can be computed in polynomial time [Grötschel et al., 1981]. Note that in this case the feasible polytope is exactly $P_\delta$. Let $\mathbf{y}$ be an optimal solution that is a vertex of $P_\delta$. We know that $\mathbf{x} \in P_\delta$ and $u_F(\mathbf{x}, j) \ge u_F(\mathbf{x}^*, j) - \varepsilon$, which implies that $u_F(\mathbf{y}, j) \ge u_F(\mathbf{x}^*, j) - \varepsilon$.
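LP (17) is a standard linear program over $(\mathbf{x}, \delta)$ and can be handed directly to an off-the-shelf solver. A sketch with scipy.optimize.linprog, where the matrices `u_L`, `u_F`, the target strategy `j`, the point `x_star`, and `eps` are illustrative assumptions rather than data from the paper:

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative 3x2 game (assumed data, not from the paper).
u_L = np.array([[0.6, 0.2], [0.4, 0.9], [0.1, 0.5]])  # leader payoffs, m=3, n=2
u_F = np.array([[0.3, 0.0], [0.8, 0.1], [0.5, 0.2]])  # follower's true payoffs
j = 0                                                 # target follower strategy
x_star = np.array([0.0, 1.0, 0.0])                    # assumed payoff-inducible point
eps = 0.05

# Variables: (x_1, ..., x_m, delta). Maximize delta <=> minimize -delta.
m = u_L.shape[0]
c = np.zeros(m + 1); c[-1] = -1.0
A_eq = np.zeros((2, m + 1))
A_eq[0, :m] = 1.0                            # sum_i x_i = 1
A_eq[1, :m] = u_L[:, j]; A_eq[1, -1] = -1.0  # u_L(x, j) - delta = u_L(x*, j)
b_eq = np.array([1.0, u_L[:, j] @ x_star])
A_ub = np.zeros((1, m + 1))
A_ub[0, :m] = -u_F[:, j]                     # u_F(x, j) >= u_F(x*, j) - eps
b_ub = np.array([-(u_F[:, j] @ x_star - eps)])
bounds = [(0, None)] * m + [(None, None)]    # x >= 0, delta free

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
assert res.success
x_opt, delta = res.x[:m], res.x[-1]
```

The LP is always feasible (take $\mathbf{x} = \mathbf{x}^*$, $\delta = 0$) and bounded, so the solver returns an optimal $(\mathbf{x}, \delta)$; the vertex refinement discussed above would then be a second call with the fixed optimal $\delta$.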
But this means that $(\delta, \mathbf{y})$ is also an optimal solution to the original LP (17). Thus, by letting $\mathbf{x} := \mathbf{y}$, we indeed have that $\mathbf{x}$ is a vertex of the convex polytope $P_\delta$.

Let us first handle the case $\delta = 0$ by showing that $(\mathbf{x}^*, j)$ itself can be strongly induced. Since $\delta = 0$, it follows that $U_j(\mathbf{x}^*) = \emptyset$. Indeed, if there exists $\hat{\mathbf{y}} \in \Delta^{m-1}$ with $u_L(\hat{\mathbf{y}}, j) > u_L(\mathbf{x}^*, j)$, then there exists $\mathbf{y}$ on the segment $(\mathbf{x}^*, \hat{\mathbf{y}}\,]$ such that $u_F(\mathbf{y}, j) \ge u_F(\mathbf{x}^*, j) - \varepsilon$ (when $\mathbf{y}$ is sufficiently close to $\mathbf{x}^*$) and $u_L(\mathbf{y}, j) > u_L(\mathbf{x}^*, j)$, a contradiction to the optimality of $\delta = 0$. Now, given that $U_j(\mathbf{x}^*) = \emptyset$, we have that $u_L(\mathbf{y}, j) \le u_L(\mathbf{x}^*, j)$ for all $\mathbf{y} \in \Delta^{m-1}$. But since $u_L$ is not max-degenerate (in the sense of Definition 5.2), it follows that in fact $u_L(\mathbf{y}, j) < u_L(\mathbf{x}^*, j)$ for all $\mathbf{y} \in \Delta^{m-1} \setminus \{\mathbf{x}^*\}$. Thus, if the follower always best responds with strategy $j$, then $(\mathbf{x}^*, j)$ will be the unique SSE. As seen before, it is easy to implement this behavior by reporting $\tilde{u}_F(i, j) = 1$ and $\tilde{u}_F(i, \ell) = 0$ for all $i \in [m]$ and $\ell \in [n] \setminus \{j\}$.

In the rest of this proof, we consider the case $\delta > 0$ and show that $(\mathbf{x}, j)$ can be strongly induced. Since $u_F(\mathbf{x}, j) \ge u_F(\mathbf{x}^*, j) - \varepsilon$, this means that at $(\mathbf{x}, j)$ the follower achieves the optimal inducible utility up to an additive error of $\varepsilon$.

Using the same notation as in the proof of Proposition 4.3, we let $B = \{i \in [m] : \mathbf{e}_i \cdot \mathbf{x} = \beta_i\}$ denote the set of boundary conditions of $\Delta^{m-1}$ that are tight for $\mathbf{x}$. Note that since $\mathbf{x}$ is a vertex of the polytope $P_\delta$, it follows that $B \neq \emptyset$. We let $\mathbf{h} = \sum_{i \in B} \mathbf{e}_i$. As in the proof of Proposition 4.3, we have that for all $\mathbf{y} \in \Delta^{m-1}$ it holds that
$$\mathbf{h} \cdot (\mathbf{y} - \mathbf{x}) = \sum_{i \in B} \mathbf{e}_i \cdot (\mathbf{y} - \mathbf{x}) \ge 0. \tag{18}$$
Furthermore, since $\mathbf{x}$ is a vertex of $P_\delta$, it follows that for all $\mathbf{y} \in P_\delta \setminus \{\mathbf{x}\}$ there exists $i \in B$ such that $\mathbf{e}_i \cdot (\mathbf{y} - \mathbf{x}) > 0$, and thus
$$\mathbf{h} \cdot (\mathbf{y} - \mathbf{x}) > 0. \tag{19}$$
Indeed, if $\mathbf{e}_i \cdot (\mathbf{y} - \mathbf{x}) = 0$ for all $i \in B$ for some $\mathbf{y} \in P_\delta \setminus \{\mathbf{x}\}$, this would contradict the fact that $\mathbf{x}$ is a vertex of $P_\delta$ (i.e., the unique point in $P_\delta$ for which the boundary conditions in $B$ are tight).

We are now ready to construct the payoff matrix reported by the follower. Pick an arbitrary $k \in \operatorname{argmin}_{\ell \in [n] \setminus \{j\}} u_L(\mathbf{x}, \ell)$. For all $\mathbf{y} \in \Delta^{m-1}$, let
$$\tilde{u}_F(\mathbf{y}, \ell) = \begin{cases} -u_L(\mathbf{y}, \ell) & \text{if } \ell \in [n] \setminus \{j\} \\ -u_L(\mathbf{y}, k) + \alpha \left( u_L(\mathbf{x}, j) - u_L(\mathbf{y}, j) \right) - \mathbf{h} \cdot (\mathbf{y} - \mathbf{x}) & \text{if } \ell = j \end{cases} \tag{20}$$
where $\alpha = \big( 2 \max_{i \in [m]} \max_{\ell \in [n]} |u_L(i, \ell)| + m \big) / \delta > 0$. Note that we can compute the payoff matrix corresponding to this utility function in polynomial time. In the remainder of this proof, we show that $(\mathbf{x}, j)$ is the unique SSE of the game $(u_L, \tilde{u}_F)$.

Clearly, $j$ is a best response at $\mathbf{x}$, since $\tilde{u}_F(\mathbf{x}, j) = -u_L(\mathbf{x}, k) = -\min_{\ell \in [n] \setminus \{j\}} u_L(\mathbf{x}, \ell) = \max_{\ell \in [n] \setminus \{j\}} \tilde{u}_F(\mathbf{x}, \ell)$, by the choice of $k$.

Next, let us show that if $j$ is a best response at some $\mathbf{y} \in \Delta^{m-1} \setminus \{\mathbf{x}\}$, then $u_L(\mathbf{y}, j) < u_L(\mathbf{x}, j)$. Indeed, if $j$ is a best response at $\mathbf{y}$, then in particular $\tilde{u}_F(\mathbf{y}, j) \ge \tilde{u}_F(\mathbf{y}, k)$, which implies that
$$\alpha \left( u_L(\mathbf{x}, j) - u_L(\mathbf{y}, j) \right) \ge \mathbf{h} \cdot (\mathbf{y} - \mathbf{x}). \tag{21}$$
Since $\mathbf{h} \cdot (\mathbf{y} - \mathbf{x}) \ge 0$ by (18), and $\alpha > 0$, it follows that $u_L(\mathbf{x}, j) \ge u_L(\mathbf{y}, j)$. It remains to show that $u_L(\mathbf{x}, j) \neq u_L(\mathbf{y}, j)$.
But if $u_L(\mathbf{x}, j) = u_L(\mathbf{y}, j)$, then $\mathbf{y} \in P_\delta \setminus \{\mathbf{x}\}$, and so by (19) we have $\mathbf{h} \cdot (\mathbf{y} - \mathbf{x}) > 0$, which contradicts (21).

Finally, it remains to show that if $\ell \in [n] \setminus \{j\}$ is a best response at some $\mathbf{y} \in \Delta^{m-1}$, then it must be that $u_L(\mathbf{y}, \ell) < u_L(\mathbf{x}, j)$. Indeed, if $\ell \in [n] \setminus \{j\}$ is a best response at $\mathbf{y}$, then in particular $\tilde{u}_F(\mathbf{y}, j) \le \tilde{u}_F(\mathbf{y}, \ell)$, which by (20) means that
$$\alpha \left( u_L(\mathbf{x}, j) - u_L(\mathbf{y}, j) \right) \le -u_L(\mathbf{y}, \ell) + u_L(\mathbf{y}, k) + \mathbf{h} \cdot (\mathbf{y} - \mathbf{x}) \le -u_L(\mathbf{y}, \ell) + u_L(\mathbf{y}, k) + \|\mathbf{h}\| \cdot \|\mathbf{y} - \mathbf{x}\| \le 2 \max_{i \in [m]} \max_{\ell' \in [n]} |u_L(i, \ell')| + \sqrt{m-1} \cdot \sqrt{m-1} \le \alpha \delta,$$
by the choice of $\alpha$. Thus, we obtain that $u_L(\mathbf{x}, j) - u_L(\mathbf{y}, j) \le \delta$, which implies that $u_L(\mathbf{y}, j) \ge u_L(\mathbf{x}^*, j)$. Since $(\mathbf{x}^*, j)$ is payoff-inducible, which means that $u_L(\mathbf{x}^*, j) \ge M$, we can use Lemma 3.2 to obtain
$$u_L(\mathbf{x}, j) = u_L(\mathbf{x}^*, j) + \delta > u_L(\mathbf{x}^*, j) \ge \min_{\ell' \in [n] \setminus \{j\}} u_L(\mathbf{y}, \ell') = u_L(\mathbf{y}, \ell),$$
where the last equality comes from the fact that $\ell$ is a best response at $\mathbf{y}$, i.e., in particular $\tilde{u}_F(\mathbf{y}, \ell) = \max_{\ell' \in [n] \setminus \{j\}} \tilde{u}_F(\mathbf{y}, \ell')$.

An interesting first question that emerges from our results is how to design countermeasures that mitigate the potential loss of a learning leader caused by possible deceptive behavior of the follower. This was considered in [Gan et al., 2019b], where it was proposed as a solution that the leader could commit to a policy, i.e., a strategy conditioned on the report of the follower, instead of a fixed strategy. However, in contrast to [Gan et al., 2019b], where the follower's report is limited to a finite set of payoff matrices, computing the optimal policy in our model seems to be a very challenging problem.
In addition, it would be interesting to explore whether the optimal follower payoff matrix (or a good approximation of it) can still be computed efficiently when additional constraints are imposed on how much the follower can deviate from his true payoff matrix. Finally, another interesting direction would be to quantify, and provide tight bounds on, the leader's utility loss caused by the deceptive behavior of the follower.
References
Yakov Babichenko. Query complexity of approximate Nash equilibria. Journal of the ACM, 63(4):36:1–36:24, 2016.

Yakov Babichenko and Aviad Rubinstein. Communication complexity of approximate Nash equilibria. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing (STOC), pages 878–889, 2017.

Maria-Florina Balcan, Avrim Blum, Nika Haghtalab, and Ariel D. Procaccia. Commitment without regrets: Online learning in Stackelberg security games. In Proceedings of the 16th ACM Conference on Economics and Computation (EC), pages 61–78, 2015.

Marco Barreno, Blaine Nelson, Anthony D. Joseph, and J. Doug Tygar. The security of machine learning. Machine Learning, 81(2):121–148, 2010.

Omer Ben-Porat and Moshe Tennenholtz. Regression equilibrium. In Proceedings of the 2019 ACM Conference on Economics and Computation (EC), pages 173–191, 2019.

Avrim Blum, Jeffrey C. Jackson, Tuomas Sandholm, and Martin Zinkevich. Preference elicitation and query learning. Journal of Machine Learning Research, 5:649–667, 2004.

Avrim Blum, Nika Haghtalab, and Ariel D. Procaccia. Learning optimal commitment to overcome insecurity. In Proceedings of the 28th Conference on Neural Information Processing Systems (NIPS), pages 1826–1834, 2014.

Liad Blumrosen and Noam Nisan. Combinatorial auctions. In Algorithmic Game Theory, chapter 11, pages 267–299. Cambridge University Press, 2007.

Stephen P. Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2014.

George W. Brown. Some notes on computation of game solutions. RAND Corporation Report, page 78, 1949.

X. Chen, Y. Cheng, and B. Tang. Well-supported versus approximate Nash equilibria: Query complexity of large games. arXiv preprint arXiv:1511.00785, 2015.

Yiling Chen, Chara Podimata, Ariel D. Procaccia, and Nisarg Shah. Strategyproof linear regression in high dimensions. In Proceedings of the 2018 ACM Conference on Economics and Computation (EC), pages 9–26, 2018.

Yiling Chen, Yang Liu, and Chara Podimata. Grinding the space: Learning to classify against strategic agents. CoRR, abs/1911.04004, 2019.

Wolfram Conen and Tuomas Sandholm. Preference elicitation in combinatorial auctions. In Proceedings of the 3rd ACM Conference on Electronic Commerce (EC), pages 256–259, 2001.

Vincent Conitzer and Tuomas Sandholm. Computing the optimal strategy to commit to. In Proceedings of the 7th ACM Conference on Electronic Commerce (EC), pages 82–90, 2006.

Ofer Dekel, Felix A. Fischer, and Ariel D. Procaccia. Incentive compatible regression learning. Journal of Computer and System Sciences, 76(8):759–777, 2010.

Jinshuo Dong, Aaron Roth, Zachary Schutzman, Bo Waggoner, and Zhiwei Steven Wu. Strategic classification from revealed preferences. In Proceedings of the 2018 ACM Conference on Economics and Computation (EC), pages 55–70, 2018.

Edith Elkind, Jiarui Gan, Svetlana Obraztsova, Zinovi Rabinovich, and Alexandros A. Voudouris. Protecting elections by recounting ballots. In Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI), pages 259–265, 2019.

J. Fearnley, M. Gairing, P. W. Goldberg, and R. Savani. Learning equilibria of games via payoff queries. Journal of Machine Learning Research, 16:1305–1344, 2015.

Jiarui Gan, Qingyu Guo, Long Tran-Thanh, Bo An, and Michael Wooldridge. Manipulating a learning defender and ways to counteract. In Advances in Neural Information Processing Systems (NeurIPS), pages 8272–8281, 2019a.

Jiarui Gan, Haifeng Xu, Qingyu Guo, Long Tran-Thanh, Zinovi Rabinovich, and Michael Wooldridge. Imitative follower deception in Stackelberg games. In Proceedings of the 2019 ACM Conference on Economics and Computation (EC), pages 639–657, 2019b.

Paul W. Goldberg and Francisco J. Marmolejo-Cossío. Learning convex partitions and computing game-theoretic equilibria from best response queries. In International Conference on Web and Internet Economics (WINE), pages 168–187, 2018.

Paul W. Goldberg and Aaron Roth. Bounds for the query complexity of approximate equilibria. ACM Transactions on Economics and Computation, 4(4):24:1–24:25, 2016.

Paul W. Goldberg, Francisco J. Marmolejo-Cossío, and Zhiwei Steven Wu. Logarithmic query complexity for approximate Nash computation in large games. Theory of Computing Systems, 63(1):26–53, 2019.

Paul W. Goldberg, Edwin Lock, and Francisco Marmolejo-Cossío. Learning strong substitutes demand via queries. arXiv preprint arXiv:2005.01496, 2020.

P. W. Goldberg and S. Turchetta. Query complexity of approximate equilibria in anonymous games. Journal of Computer and System Sciences, 90:80–98, 2017.

Martin Grötschel, László Lovász, and Alexander Schrijver. The ellipsoid method and its consequences in combinatorial optimization. Combinatorica, 1(2):169–197, 1981.

S. Hart and Y. Mansour. How long to equilibrium? The communication complexity of uncoupled equilibrium procedures. Games and Economic Behavior, 69(1):107–126, 2010.

Sergiu Hart and Noam Nisan. The query complexity of correlated equilibria. Games and Economic Behavior, pages 401–410, 2016.

Safwan Hossain and Nisarg Shah. The effect of strategic noise in linear regression. In Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS), pages 511–519, 2020.

Sebastien M. Lahaie and David C. Parkes. Applying learning algorithms to preference elicitation. In Proceedings of the 5th ACM Conference on Electronic Commerce (EC), pages 180–188, 2004.

Joshua Letchford, Vincent Conitzer, and Kamesh Munagala. Learning and approximating the optimal strategy to commit to. In International Symposium on Algorithmic Game Theory, pages 250–262, 2009.

Daniel Lowd and Christopher Meek. Adversarial learning. In Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, pages 641–647, 2005.

Francisco J. Marmolejo-Cossío, Eric Brigham, Benjamin Sela, and Jonathan Katz. Competing (semi-)selfish miners in Bitcoin. In Proceedings of the 1st ACM Conference on Advances in Financial Technologies, pages 89–109, 2019.

Reshef Meir, Ariel D. Procaccia, and Jeffrey S. Rosenschein. Algorithms for strategyproof classification. Artificial Intelligence, 186:123–156, 2012.

Thanh H. Nguyen and Haifeng Xu. Imitative attacker deception in Stackelberg security games. In Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI), pages 528–534, 2019.

Noam Nisan and Ilya Segal. The communication requirements of efficient allocations and supporting prices. Journal of Economic Theory, 129(1):192–224, 2006.

Binghui Peng, Weiran Shen, Pingzhong Tang, and Song Zuo. Learning optimal strategies to commit to. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI), pages 2149–2156, 2019.

Javier Perote and Juan Perote-Peña. Strategy-proof estimators for simple regression. Mathematical Social Sciences, 47(2):153–176, 2004.

Julia Robinson. An iterative method of solving a game. The Annals of Mathematics, 54(2):296–301, 1951.

Aaron Roth, Jonathan Ullman, and Zhiwei Steven Wu. Watch and learn: Optimizing from revealed preferences feedback. In Proceedings of the 48th Annual ACM Symposium on Theory of Computing (STOC), pages 949–962, 2016.

Bernhard von Stengel and Shmuel Zamir. Leadership with commitment to mixed strategies. CDAM Research Report LSE-CDAM-2004-01, London School of Economics, 2004.

Jingchang Sun, Pingzhong Tang, and Yulong Zeng. Games of miners. In Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS), pages 1323–1331, 2020.

Milind Tambe. Security and Game Theory: Algorithms, Deployed Systems, Lessons Learned. Cambridge University Press, 2011.

Heinrich von Stackelberg. Market Structure and Equilibrium. Springer Science & Business Media, 2010.

Bo Waggoner, Rafael Frongillo, and Jacob D. Abernethy. A market framework for eliciting private data. In Advances in Neural Information Processing Systems (NIPS), pages 3510–3518, 2015.

Yue Yin, Yevgeniy Vorobeychik, Bo An, and Noam Hazon. Optimal defense against election control by deleting voter groups. Artificial Intelligence, 259:32–51, 2018.

Hanrui Zhang, Yu Cheng, and Vincent Conitzer. When samples are strategically selected. In Proceedings of the 36th International Conference on Machine Learning (ICML), volume 97, 2019.

Martin A. Zinkevich, Avrim Blum, and Tuomas Sandholm. On polynomial-time preference elicitation with value queries. In