Optimally Deceiving a Learning Leader in Stackelberg Games
Georgios Birmpas, Jiarui Gan, Alexandros Hollender, Francisco J. Marmolejo-Cossío, Ninad Rajgopal, Alexandros A. Voudouris
University of Oxford, UK · University of Essex, UK
Abstract
Recent results in the ML community have revealed that learning algorithms used to compute the optimal strategy for the leader to commit to in a Stackelberg game are susceptible to manipulation by the follower. Such a learning algorithm operates by querying the best responses or the payoffs of the follower, who consequently can deceive the algorithm by responding as if his payoffs were very different from what they actually are. For this strategic behavior to be successful, the main challenge faced by the follower is to pinpoint the payoffs that would make the learning algorithm compute a commitment so that best responding to it maximizes the follower's utility, according to his true payoffs. While this problem has been considered before, the related literature has only focused on the simplified scenario in which the payoff space is finite, thus leaving the general version of the problem unanswered. In this paper, we fill this gap by showing that it is always possible for the follower to compute (near-)optimal fake payoffs, for various scenarios of the learning interaction between the leader and the follower.
Introduction

Stackelberg games are a simple yet powerful model of sequential interaction among strategic agents. In such games there are two players: a leader and a follower. The leader commits to an action, and the follower acts upon observing the leader's commitment. The simple sequential structure of the game permits modeling a multitude of important scenarios. Indicative applications include the competition between a large and a small firm [Von Stackelberg, 2010], the allocation of defensive resources [Tambe, 2011], the competition among mining pools in the Bitcoin network [Marmolejo-Cossío et al., 2019, Sun et al., 2020], and the protection against manipulation in elections [Elkind et al., 2019, Yin et al., 2018].

In Stackelberg games, the leader is interested in finding the best commitment she can make, assuming that the follower behaves rationally. The combination of such a commitment by the leader and the follower's rational best response to it leads to a strong Stackelberg equilibrium (SSE). In general, the utility that the leader obtains in an SSE is larger than what she would obtain in a Nash equilibrium of the corresponding one-shot game [Stengel and Zamir, 2004], implying that the leader prefers to commit than to engage in a simultaneous game with the follower.

In case the leader has access to both her own and the follower's payoff parameters, computing an SSE is a computationally tractable problem [Conitzer and Sandholm, 2006]. In practice, however, the leader

∗ Georgios Birmpas is supported by the ERC Starting grant number 639945 (ACCORD). Jiarui Gan is supported by the EPSRC International Doctoral Scholars Grant EP/N509711/1. Alexandros Hollender is supported by an EPSRC doctoral studentship (Reference 1892947).
any payoff matrix, without restrictions on the space of possible values, has been considered only in two very recent papers [Gan et al., 2019a, Nguyen and Xu, 2019], which however focused on the specific application of Stackelberg games to security resource allocation problems. Besides that, no progress has been made for general Stackelberg games. In this paper, we aim to fill this gap by completely resolving this computational problem, a result that reflects the insecurity of learning to commit in Stackelberg games.
Our Contribution
We explore how a follower can optimally deceive a learning leader in Stackelberg games by misreporting his payoff matrix, and study the tractability of the corresponding optimization problem. As in previous work, our objective is to compute the fake payoff matrix according to which the follower can best respond to make the leader learn an SSE in which the true utility of the follower is maximized. However, unlike the related literature, we do not impose any restrictions on the space from which the payoffs are selected or on the type of the game. By exploiting an intuitive characterization of all strategy profiles that can be induced as SSEs in Stackelberg games, we show that the follower can always compute, in polynomial time, a payoff matrix inducing an SSE that maximizes his true utility. Furthermore, we strengthen this result to resolve possible equilibrium selection issues, by showing that the follower can construct a payoff matrix that induces a unique SSE, in which his utility is maximized up to an arbitrarily small loss.
Other Related Work
Our paper is related to an emerging line of work at the intersection of machine learning and algorithmic game theory, dealing with scenarios where the samples used for training learning algorithms are controlled by strategic agents who aim to optimize their personal benefit. Indicatively, there has been recent interest in the analysis of the effect of strategic behavior on the efficiency of existing algorithms, as well as the design of algorithms resilient to strategic manipulation, for linear regression [Ben-Porat and Tennenholtz, 2019, Chen et al., 2018, Dekel et al., 2010, Hossain and Shah, 2020, Perote and Perote-Peña, 2004, Waggoner et al., 2015] and classification [Chen et al., 2019, Dong et al., 2018, Meir et al., 2012, Zhang et al., 2019].

Beyond the strategic considerations above, our work is also related to the study of query protocols for learning game-theoretic equilibria. In this setting, as in ours, algorithms for computing equilibria via utility and best response queries are a natural starting point. For utility queries, there has been much work proving exponential lower bounds for randomized computation of exact, approximate, and well-supported Nash equilibria [Babichenko and Rubinstein, 2017, Babichenko, 2016, Chen et al., 2015, Goldberg and Roth, 2016, Hart and Mansour, 2010, Hart and Nisan, 2016], as well as providing query-efficient protocols for approximate Nash equilibrium computation in bimatrix games, congestion games [Fearnley et al., 2015], anonymous games [Goldberg and Turchetta, 2017], and large games [Goldberg et al., 2019]. Best response queries are weaker than utility queries, but they arise naturally in practice, and are also expressive enough to implement fictitious play, a dynamic first proposed in Brown [1949] and proven in Robinson [1951] to converge to an approximate Nash equilibrium in two-player zero-sum games.
In terms of equilibrium computation, Goldberg and Marmolejo-Cossío [2018] also provide query-efficient algorithms for computing approximate Nash equilibria in bimatrix games via best response queries, provided one agent has a constant number of strategies.

Finally, learning via incentive queries in games is directly related to the theory of preference elicitation, where the goal is to mine information about the private parameters of the agents by interacting with them [Blum et al., 2004, Lahaie and Parkes, 2004, Zinkevich et al., 2003, Goldberg et al., 2020]. This has many applications, most notably combinatorial auctions, where access to the valuation functions of the agents is achieved via value or demand queries [Blumrosen and Nisan, 2007, Conen and Sandholm, 2001, Nisan and Segal, 2006].
Preliminaries

A Stackelberg game (SG) is a sequential game between a leader and a follower. The leader commits to a strategy, and the follower then acts upon observing this commitment. We consider finite SGs, in which the leader and the follower have m and n pure strategies at their disposal, respectively, and their utilities for all possible outcomes are given by the matrices u_L, u_F ∈ R^{m×n}. The entries u_L(i, j) and u_F(i, j) denote the utilities of the leader and the follower under the pure strategy profile (i, j) ∈ [m] × [n]. We use G = (u_L, u_F) to denote the SG with payoff matrices u_L and u_F; we omit m and n as they are clear from context. Following the standard convention, we refer to the leader as a female and to the follower as a male.

Like in one-shot games, the agents are allowed to employ mixed strategies, whereby they randomize over the actions in their strategy set. A mixed strategy of the leader is a probability distribution over [m], denoted by x ∈ ∆_{m−1} = { x ≥ 0 : Σ_{i∈[m]} x_i = 1 }. By slightly abusing notation, we let u_L(x, j) = Σ_{i∈[m]} x_i · u_L(i, j) be the expected utility of the leader when she plays the mixed strategy x and the follower plays a pure strategy j. Similarly, we define u_F(x, j) = Σ_{i∈[m]} x_i · u_F(i, j) for the follower. For a given mixed strategy x ∈ ∆_{m−1} of the leader, we say that j ∈ [n] is a follower best response if u_F(x, j) = max_{ℓ∈[n]} u_F(x, ℓ); we denote the set of all follower best responses to x by BR(x) ⊆ [n], and refer to the function BR as the best response correspondence of the follower.

A strong Stackelberg equilibrium (SSE) is the standard solution concept in SGs, and captures the situation where the leader commits to a mixed strategy that maximizes her expected utility, while taking into account the follower's best response to her commitment. It is assumed that the follower breaks ties in favor of the leader when he has multiple best responses; this standard assumption is justified by the fact that such tie-breaking behavior can often be enforced by an infinitesimal perturbation in the leader's strategy [Stengel and Zamir, 2004].

Definition 2.1 (SSE). A strategy profile (x, j) is an SSE of the SG G = (u_L, u_F) if

(x, j) ∈ arg max_{y∈∆_{m−1}, ℓ∈BR(y)} u_L(y, ℓ).

Learning SSEs and Deceptive Follower Behavior. We consider the scenario where the leader has full knowledge of her utility matrix u_L, and aims to compute an SSE by interacting with the follower and gleaning information about u_F. For example, the leader could observe follower best responses in play (akin to having query access to BR), or observe follower payoffs at pure strategy profiles during play (akin to having query access to u_F as a function). Hence, this can be cast as the problem of learning an SSE with a specified notion of query access to information about the follower's incentives.

Consider an SG G = (u_L, u_F). If the follower controls the flow of information to the leader in this paradigm, he may consider perpetually interacting with the leader as if he had a different payoff matrix ũ_F, which can make the leader believe that both agents are playing the game G̃ = (u_L, ũ_F). This deceiving power provides the follower with an incentive to act according to G̃ for a judicious choice of ũ_F, because the SSEs in G̃ may provide larger utility (according to u_F) than the SSEs in G. More concretely, the example below shows that the follower can gain an arbitrary benefit by deceiving the leader to play a different game.

Example 2.2 (Beneficial deception).
Let α ∈ [0, 1] and consider the following matrices:

R = (1 0; 0 0),   C_α = (0 α; 1 α).

Now, suppose that u_L = R and u_F = C_α, and let x ∈ [0, 1] represent the probability mass that the leader (row player) places on the first row (her first strategy); thus, 1 − x is the probability with which she plays her second strategy. Given this mixed strategy of the leader, the utilities that the follower expects to derive from his two strategies (columns) are u_F(x, 1) = 1 − x and u_F(x, 2) = α. Consequently, the first strategy is a best response of the follower when x ∈ [0, 1 − α], and the second one is a best response when x ∈ (1 − α, 1] (when x = 1 − α, the tie is broken in favor of the leader). With this information, it is clear that the SSE of the game occurs when the leader chooses x = 1 − α and the follower plays his first strategy. As a result, the follower's utility is u_F(1 − α, 1) = α. However, for any α < 1, the follower has an incentive to deceive the leader into playing the game G̃ = (R, C_1), which will improve his utility in the resulting SSE to 1. This is an improvement by a multiplicative factor of 1/α, which can be arbitrarily large when α is arbitrarily close to 0.

Inducible Strategy Profiles.
The ultimate goal of the follower is to identify the SSE that maximizes his true utility, from the set of SSEs that he can deceive the leader into learning. We will refer to such SSEs as inducible strategy profiles. At a high level, the follower's problem can now be expressed as the following optimization problem:

max_{x,j} u_F(x, j)   subject to (x, j) is inducible.   (1)

The maximum utility for the follower is called the optimal inducible utility. If the maximum value is never achieved, then for every ε > 0, we would like to be able to find an inducible SSE that achieves a value ε-close to the supremum value.

As discussed previously, the leader can learn an SSE by gleaning information about the incentives of the follower, by querying the best responses of the follower to particular leader strategies, or more refined information about the follower's payoff matrix. Depending on the type of information queried, we can define various levels of inducible strategy profiles.

In more detail, suppose the leader can only query the best responses of the follower, who behaves according to some best response correspondence f̃BR : ∆_{m−1} → 2^{[n]} \ {∅}. This interaction between the leader and the follower leads to a game G̃ = (u_L, f̃BR) where only information about f̃BR is known (instead of a payoff matrix implying f̃BR). The definition of f̃BR enforces a best response answer to any possible query. Consequently, the leader learns an SSE (x, j) ∈ arg max_{y∈∆_{m−1}, ℓ∈f̃BR(y)} u_L(y, ℓ), which yields the following notion of BR-inducible strategy profiles.
Definition 2.3 (BR-inducibility). A strategy profile (x, j) is BR-inducible with respect to u_L if there exists a best response correspondence f̃BR : ∆_{m−1} → 2^{[n]} \ {∅} such that (x, j) is an SSE of the game G̃ = (u_L, f̃BR), in which case we say that (x, j) is induced by f̃BR.

Next, consider the case where the leader can query information about the payoffs of the follower, who can now behave according to a fake payoff matrix ũ_F. We refer to the SSEs of the resulting game G̃ = (u_L, ũ_F) as payoff-inducible strategy profiles.

Definition 2.4 (Payoff-inducibility). A strategy profile (x, j) is said to be payoff-inducible with respect to u_L if there exists ũ_F ∈ R^{m×n} such that (x, j) is an SSE in the game G̃ = (u_L, ũ_F), in which case we say that (x, j) is induced by ũ_F.

Clearly, payoff-inducibility is stricter than BR-inducibility: for every choice of ũ_F, the corresponding best response correspondence f̃BR(y) = arg max_{ℓ∈[n]} ũ_F(y, ℓ) induces the same SSEs as ũ_F does.

Note that the above definitions only require an inducible strategy profile to be a verifiable SSE, with respect to the information about the follower's incentives (either f̃BR or ũ_F). However, it may happen that the resulting game G̃ has multiple SSEs, which gives rise to an equilibrium selection issue. Indeed, in practice, it is not realistic to assume that the follower has any control over which SSE is chosen by the leader (who moves first in the game). To address this, and thus completely resolve the optimal deception problem for the follower, we introduce an even stricter notion of inducibility on top of payoff-inducibility, which requires G̃ to have a unique SSE.

Definition 2.5 (Strong inducibility).
A strategy profile (x, j) is said to be strongly inducible with respect to u_L if there exists a matrix ũ_F ∈ R^{m×n} such that (x, j) is the unique SSE of the game G̃ = (u_L, ũ_F), in which case we say that (x, j) is strongly induced by ũ_F.

In the next sections, we will investigate solutions to (1) under the inducibility notions above, from the weakest to the strongest. Our general approach is to decompose (1) into n sub-problems by enumerating all possible follower responses j ∈ [n]. For each strategy j, we solve the corresponding optimization problem, and pick the one that yields the maximum utility for the follower. Due to space constraints, some proofs are omitted and can be found in the supplementary material.

Let us start our analysis by considering the case in which the leader queries the best responses of the follower. The aim of the follower is to deceive the leader towards a strategy profile that is BR-inducible; see Definition 2.3. Indeed, if the follower is allowed to use an arbitrary f̃BR to induce a strategy profile (x, j), he can simply define f̃BR as follows:

f̃BR(y) = {j} if y = x;   arg min_{ℓ∈[n]} u_L(y, ℓ) if y ≠ x.

Namely, the follower threatens to choose the worst possible response against any leader strategy y ≠ x, so as to minimize the leader's incentive to commit to these strategies. This f̃BR will successfully convince the leader that (x, j) is an SSE of G̃, hence inducing (x, j), if the threat is powerful enough, that is, if u_L(x, j) ≥ min_{ℓ∈[n]} u_L(y, ℓ) for all y ∈ ∆_{m−1}. Equivalently, this means that

u_L(x, j) ≥ M := max_{y∈∆_{m−1}} min_{ℓ∈[n]} u_L(y, ℓ),   (2)

where M is exactly the leader's maximin utility.
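The maximin value M in (2) is the optimum of a standard linear program, so it can be computed with any off-the-shelf solver. A minimal sketch of this computation (an illustration only; the use of numpy/scipy and the function name are our assumptions, not part of the paper):

```python
import numpy as np
from scipy.optimize import linprog

def leader_maximin(u_L):
    """Compute M = max_{y in simplex} min_l u_L(y, l) by LP.

    Variables (y_1, ..., y_m, t); maximize t subject to
    t <= sum_i y_i * u_L[i, l] for every follower strategy l,
    and y lying on the probability simplex.
    """
    m, n = u_L.shape
    # linprog minimizes, so minimize -t.
    c = np.zeros(m + 1)
    c[-1] = -1.0
    # Constraint t - y . u_L[:, l] <= 0 for each column l.
    A_ub = np.hstack([-u_L.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    # Simplex constraint: sum_i y_i = 1.
    A_eq = np.zeros((1, m + 1))
    A_eq[0, :m] = 1.0
    bounds = [(0, None)] * m + [(None, None)]  # y >= 0, t free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0], bounds=bounds)
    return res.x[:m], -res.fun  # maximin strategy y*, value M
```

For instance, for u_L = [[1, 0], [0, 1]] this returns the maximin strategy y* = (1/2, 1/2) with value M = 1/2.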
Indeed, (2) is necessary for (x, j) to be BR-inducible: if on the contrary u_L(x, j) < M, then by committing to y* ∈ arg max_{y∈∆_{m−1}} min_{ℓ∈[n]} u_L(y, ℓ), the leader can obtain (at least) her maximin utility, which will be strictly larger than u_L(x, j).

Thus, condition (2) gives a simple criterion for BR-inducibility. The problem is that such an f̃BR may be far from being one that arises from a choice of ũ_F. To alleviate this limitation, we impose a stricter condition on f̃BR.

Polytopal BR Correspondence.
In a similar vein to Goldberg and Marmolejo-Cossío [2018], we require that, for every ℓ ∈ [n], the set of leader strategies to which ℓ is a best response, f̃BR⁻¹(ℓ) = { y ∈ ∆_{m−1} : ℓ ∈ f̃BR(y) }, is a closed convex polytope, and the union of all these sets forms a partition of ∆_{m−1} (for example, see the polytope partition of ∆_2 in Figure 1). Any best response correspondence f̃BR satisfying this assumption is called polytopal.

Definition 3.1 (Polytopal best response correspondence [Goldberg and Marmolejo-Cossío, 2018]). A best response correspondence f̃BR : ∆_{m−1} → 2^{[n]} \ {∅} is polytopal if it also satisfies the following:

• f̃BR⁻¹(ℓ) is a closed convex polytope for each ℓ ∈ [n], and
• for each k ≠ ℓ, either relint(f̃BR⁻¹(k)) ∩ relint(f̃BR⁻¹(ℓ)) = ∅ or f̃BR⁻¹(k) = f̃BR⁻¹(ℓ),

where relint(H) denotes the relative interior of a set H.

Being polytopal is necessary for f̃BR to arise from some payoff matrix. Indeed, the true best response correspondence BR that arises from u_F is polytopal: clearly, each BR⁻¹(ℓ) is a closed convex polytope defined by the hyperplanes u_F(y, ℓ) ≥ u_F(y, k) for all k ∈ [n] and the borders of ∆_{m−1}; in addition, ∪_{ℓ=1}^n BR⁻¹(ℓ) = ∆_{m−1}, and for any ℓ ≠ k, the polytopes BR⁻¹(ℓ) and BR⁻¹(k) only intersect at their borders unless u_F(·, ℓ) = u_F(·, k). Thus, if the follower attempts to deceive the leader via a fake f̃BR, the leader might spot the deception in case f̃BR is not polytopal.

It turns out that the following correspondence, which we denote by f̃BR_P, is polytopal and, as we will shortly show, it is in fact as powerful as any best response correspondence:

f̃BR_P(y) = {j} if y ∈ ∆_{m−1} \ cl U_j(x);
           {j} ∪ arg min_{ℓ∈[n]\{j}} u_L(y, ℓ) if y ∈ cl U_j(x) \ U_j(x);
           arg min_{ℓ∈[n]\{j}} u_L(y, ℓ) if y ∈ U_j(x),

where cl U_j(x) denotes the closure of U_j(x) = { y ∈ ∆_{m−1} : u_L(y, j) > u_L(x, j) }.
Intuitively, it is safe for the follower to respond by playing j against any leader strategy y with u_L(y, j) ≤ u_L(x, j), in which case the leader does not have a strong incentive to commit to y instead of x. In response to the other strategies, however, the follower needs to play a different strategy in order to minimize the leader's incentive to commit to such a y. Therefore, this approach will successfully induce (x, j) if and only if the following holds:

u_L(x, j) ≥ max_{y ∈ cl U_j(x)} min_{ℓ∈[n]\{j}} u_L(y, ℓ),   (3)

where we use the convention that max ∅ = −∞. It is easy to see that f̃BR_P is indeed polytopal: f̃BR_P⁻¹(j) = ∆_{m−1} \ U_j(x) is a closed convex polytope, and the same holds for the sets f̃BR_P⁻¹(ℓ) defined by the hyperplanes u_L(y, ℓ) ≤ u_L(y, k), k ∈ [n] \ {j}, and the borders of cl U_j(x), which further form a partition of cl U_j(x). Note that the use of U_j(x), instead of the set { y ∈ ∆_{m−1} : u_L(y, j) ≥ u_L(x, j) }, is important: when u_L(y, j) = u_L(x, j) for all y ∈ ∆_{m−1}, these two sets define different behaviors.
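When U_j(x) ≠ ∅, its closure is the polytope { y ∈ ∆_{m−1} : u_L(y, j) ≥ u_L(x, j) }, so the right-hand side of (3) is itself the value of an LP. A minimal sketch of the check (an illustration only, assuming numpy/scipy, n ≥ 2, and a helper name of our choosing; strategies are 0-indexed):

```python
import numpy as np
from scipy.optimize import linprog

def br_inducible_threshold(u_L, x, j):
    """Right-hand side of condition (3): the best utility the leader can
    secure inside cl(U_j(x)) while the follower avoids strategy j.
    Returns -inf when U_j(x) is empty (convention: max over the empty
    set is -inf)."""
    m, n = u_L.shape
    target = x @ u_L[:, j]
    # U_j(x) is nonempty iff some pure strategy beats u_L(x, j) on column j.
    if u_L[:, j].max() <= target:
        return -np.inf
    # Variables (y, t): maximize t subject to t <= u_L(y, l) for l != j,
    # u_L(y, j) >= u_L(x, j), and y on the simplex.
    c = np.zeros(m + 1)
    c[-1] = -1.0
    rows = [np.append(-u_L[:, l], 1.0) for l in range(n) if l != j]
    rows.append(np.append(-u_L[:, j], 0.0))  # -u_L(y, j) <= -target
    A_ub = np.array(rows)
    b_ub = np.append(np.zeros(n - 1), -target)
    A_eq = np.zeros((1, m + 1))
    A_eq[0, :m] = 1.0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])
    return -res.fun
```

A profile (x, j) is then induced by f̃BR_P exactly when u_L(x, j) is at least the returned value.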
In fact, (2) is equivalent to (3), meaning that the extra condition imposed on f̃BR_P does not compromise its power: if (x, j) can be induced by an arbitrary f̃BR, then it can also be induced by f̃BR_P. We state this result in Lemma 3.2.

Lemma 3.2. u_L(x, j) ≥ M if and only if u_L(x, j) ≥ max_{y ∈ cl U_j(x)} min_{ℓ∈[n]\{j}} u_L(y, ℓ).

Proof. Recall that we want to show that u_L(x, j) ≥ M if and only if

u_L(x, j) ≥ max_{y ∈ cl U_j(x)} min_{ℓ∈[n]\{j}} u_L(y, ℓ),   (4)

where M is the maximin utility of the leader. We show that (4) does not hold if and only if u_L(x, j) < M.

Suppose that (4) does not hold. Then u_L(x, j) < max_{y ∈ cl U_j(x)} min_{ℓ∈[n]\{j}} u_L(y, ℓ) by definition, which implies that U_j(x) ≠ ∅. By the continuity of min_{ℓ∈[n]\{j}} u_L(·, ℓ), there exists y* ∈ U_j(x) such that u_L(x, j) < min_{ℓ∈[n]\{j}} u_L(y*, ℓ). By the definition of U_j(x), we also have u_L(x, j) < u_L(y*, j). Thus,

u_L(x, j) < min_{ℓ∈[n]} u_L(y*, ℓ) ≤ max_{y∈∆_{m−1}} min_{ℓ∈[n]} u_L(y, ℓ) = M.

Conversely, suppose that u_L(x, j) < M. Let y* ∈ arg max_{y∈∆_{m−1}} min_{ℓ∈[n]} u_L(y, ℓ). Thus, M = min_{ℓ∈[n]} u_L(y*, ℓ), and we have u_L(x, j) < M = min_{ℓ∈[n]} u_L(y*, ℓ) ≤ u_L(y*, j), which implies that y* ∈ U_j(x). It follows that M = max_{y ∈ cl U_j(x)} min_{ℓ∈[n]} u_L(y, ℓ), and thus

u_L(x, j) < max_{y ∈ cl U_j(x)} min_{ℓ∈[n]} u_L(y, ℓ) ≤ max_{y ∈ cl U_j(x)} min_{ℓ∈[n]\{j}} u_L(y, ℓ),

so (4) does not hold.

Using Lemma 3.2, we can efficiently compute the best strategy profile that can be induced by f̃BR_P, simply by solving the following linear program (LP) for each j ∈ [n]:

max_{x∈∆_{m−1}} u_F(x, j)   subject to u_L(x, j) ≥ M.   (5)

At this point, it might be tempting to think that, with the polytopal constraint imposed, we would also be able to construct an explicit payoff matrix ũ_F to implement f̃BR_P.
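Concretely, each subproblem in (5) is a small LP. A minimal sketch of the enumeration over the follower strategies j (an illustration only, assuming scipy and that the maximin value M from (2) has been precomputed; strategies are 0-indexed):

```python
import numpy as np
from scipy.optimize import linprog

def best_br_inducible_profile(u_L, u_F, M):
    """Solve LP (5) for each follower strategy j and keep the best:
    maximize u_F(x, j) over x in the simplex with u_L(x, j) >= M,
    where M is the leader's maximin utility."""
    m, n = u_L.shape
    best = (None, None, -np.inf)
    for j in range(n):
        res = linprog(
            -u_F[:, j],                    # maximize u_F(x, j)
            A_ub=[-u_L[:, j]], b_ub=[-M],  # u_L(x, j) >= M
            A_eq=[np.ones(m)], b_eq=[1.0], # x on the simplex
            bounds=[(0, None)] * m,
        )
        if res.status == 0 and -res.fun > best[2]:
            best = (res.x, j, -res.fun)
    return best  # (x, j, optimal inducible utility)
```

Infeasible subproblems (strategies j for which no x satisfies u_L(x, j) ≥ M) are simply skipped.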
Unfortunately, no payoff matrix can implement f̃BR_P in general, as Example 3.3 illustrates. Surprisingly though, in the next section we will show that, even though we cannot construct a payoff matrix that implements f̃BR_P directly, every strategy profile (x, j) that is f̃BR_P-inducible is in fact payoff-inducible. We also present an efficient algorithm for computing a payoff matrix ũ_F that induces such an (x, j).

Example 3.3.
Consider a 3 × 3 game with the leader payoff matrix given in Figure 1. Let f̃BR_P be the polytopal BR correspondence defined by the regions R_1, R_2, and R_3 in Figure 1, such that ℓ ∈ f̃BR_P(y) if and only if y ∈ R_ℓ. This best response behavior cannot be realized by any payoff matrix. To see this, suppose f̃BR_P is realized by some ũ_F ∈ R^{3×3}. Let x = (1/2, 0, 1/2), w = (1/2, 1/2, 0), and z = (1/2, 1/4, 1/4). We have f̃BR_P(z) = {1, 2, 3} and f̃BR_P(w) = {1, 2}. This means that ũ_F(z, 1) = ũ_F(z, 2) = ũ_F(z, 3) and ũ_F(w, 1) = ũ_F(w, 2) > ũ_F(w, 3). Since x = 2z − w, by the linearity of the utility function, ũ_F(x, 1) = ũ_F(x, 2) < ũ_F(x, 3), which contradicts the fact that f̃BR_P(x) = {1, 3}.
Figure 1: No payoff matrix ũ_F realizes the polytopal BR correspondence f̃BR_P such that ℓ ∈ f̃BR_P(y) if and only if y ∈ R_ℓ, where R_1 = { y ∈ ∆_2 : y_1 ≥ y_2 + y_3 }, R_2 = { y ∈ ∆_2 : y_1 ≤ y_2 + y_3 and y_2 ≥ y_3 }, and R_3 = { y ∈ ∆_2 : y_1 ≤ y_2 + y_3 and y_3 ≥ y_2 }.

In this section, we will show that every strategy profile that can be induced by f̃BR_P is also payoff-inducible, and that a corresponding payoff matrix can be efficiently constructed. Recall that the maximin utility of the leader is denoted by M = max_{y∈∆_{m−1}} min_{ℓ∈[n]} u_L(y, ℓ). We will show the following characterization as one of our key results, which enables us to use the LP in (5) to efficiently compute a payoff matrix that achieves the optimal inducible utility.

Theorem 4.1.
A strategy profile (x, j) is payoff-inducible if and only if u_L(x, j) ≥ M. Furthermore, a matrix ũ_F inducing (x, j) can be constructed in polynomial time.

One direction of the characterization is easy to show. Indeed, if (x, j) is payoff-inducible, then it is also BR-inducible, and as seen in Section 3, it holds that u_L(x, j) ≥ M.

Now consider any profile (x, j) such that u_L(x, j) ≥ M. Recall that U_j(x) = { y ∈ ∆_{m−1} : u_L(y, j) > u_L(x, j) }. Without loss of generality, in what follows, we can also assume that U_j(x) ≠ ∅: if U_j(x) = ∅, then (x, j) will be an SSE if the follower always responds by playing j; this can easily be achieved by claiming that j strictly dominates all other strategies, i.e., by letting ũ_F(i, j) = 1 and ũ_F(i, ℓ) = 0 for all i ∈ [m] and ℓ ∈ [n] \ {j}.

We begin by analyzing the following payoff function, which forms the basis of our approach. Let Ŝ ⊆ [n] \ {j} and pick k ∈ arg min_{ℓ∈Ŝ} u_L(x, ℓ) arbitrarily. For all y ∈ ∆_{m−1}, let

ũ_F(y, ℓ) = −u_L(y, ℓ) if ℓ ∈ Ŝ;
            −u_L(y, k) − 1 if ℓ ∈ [n] \ (Ŝ ∪ {j});
            −u_L(y, k) + α (u_L(x, j) − u_L(y, j)) if ℓ = j,   (6)

where α > 0 is a constant. In what follows, we will let f̃BR denote the best response correspondence corresponding to ũ_F, i.e., f̃BR(y) = arg max_{ℓ∈[n]} ũ_F(y, ℓ). Note that we can compute the payoff matrix corresponding to ũ_F in polynomial time. Then, the hope is that, with appropriately chosen Ŝ and α, the payoff matrix will induce (x, j). Indeed, ũ_F has the following nice properties:

i. Strategy j is indeed a best response to x, since, by the choice of k, we have ũ_F(x, j) = −u_L(x, k) = −min_{ℓ∈Ŝ} u_L(x, ℓ) = max_{ℓ∈Ŝ} ũ_F(x, ℓ).

ii. Any ℓ ∈ [n] \ (Ŝ ∪ {j}) cannot be a best response of the follower, as it is strictly dominated by k, i.e., ũ_F(y, ℓ) < ũ_F(y, k) for all y ∈ ∆_{m−1}.
Thus, f̃BR(y) ⊆ Ŝ ∪ {j} for all y ∈ ∆_{m−1}.

iii. If j is a best response to some y ∈ ∆_{m−1}, then u_L(y, j) ≤ u_L(x, j). Indeed, j ∈ f̃BR(y) implies that ũ_F(y, j) = max_{ℓ∈[n]} ũ_F(y, ℓ) ≥ ũ_F(y, k). Substituting ũ_F(y, j) = −u_L(y, k) + α (u_L(x, j) − u_L(y, j)) into this inequality and rearranging the terms immediately gives u_L(y, j) ≤ u_L(x, j).

iv. If any ℓ ∈ Ŝ is a best response to some y ∈ ∆_{m−1}, then it holds that ũ_F(y, ℓ) = max_{ℓ′∈Ŝ} ũ_F(y, ℓ′), which implies that

u_L(y, ℓ) = min_{ℓ′∈Ŝ} u_L(y, ℓ′).   (7)

Therefore, if min_{ℓ′∈Ŝ} u_L(y, ℓ′) ≤ u_L(x, j) also holds for the y in (iv), then by (7) we will have u_L(y, ℓ) ≤ u_L(x, j) for every ℓ ∈ f̃BR(y) ∩ Ŝ. This, together with (ii) and (iii), will imply that u_L(x, j) ≥ u_L(y, ℓ) for every ℓ ∈ f̃BR(y). Therefore, (x, j) will indeed form an SSE, given that j ∈ f̃BR(x) by (i). We state this observation as the following lemma.

Lemma 4.2. If min_{ℓ′∈Ŝ} u_L(y, ℓ′) ≤ u_L(x, j) holds for all y ∈ ∆_{m−1} such that f̃BR(y) ∩ Ŝ ≠ ∅, then the payoff matrix defined by (6) induces (x, j).

The proof of Theorem 4.1 is then completed by showing the following result.
Proposition 4.3. If u_L(x, j) ≥ M and U_j(x) ≠ ∅, then we can construct Ŝ ⊆ [n] \ {j} and α > 0 in polynomial time, with which the condition of Lemma 4.2 holds for ũ_F as defined in (6).

The proof relies on the following useful lemma.
Lemma 4.4 (Farkas' Lemma [Boyd and Vandenberghe, 2014]). Let A ∈ R^{n_1×n_2} and b ∈ R^{n_1}. Then exactly one of the following statements is true:

1. there exists z ∈ R^{n_2} such that Az = b and z ≥ 0;
2. there exists z ∈ R^{n_1} such that Aᵀz ≥ 0 and b · z < 0.

Proof of Proposition 4.3. Consider any strategy profile (x, j) with u_L(x, j) ≥ M and U_j(x) ≠ ∅. We begin by taking care of a simple case, as an immediate corollary of Lemma 4.2.

Corollary 4.5.
A matrix ũ_F that induces (x, j) can be constructed in polynomial time if it holds that

u_L(x, j) ≥ M_{−j} := max_{y∈∆_{m−1}} min_{ℓ∈[n]\{j}} u_L(y, ℓ).   (8)

Proof.
Let Ŝ = [n] \ {j}. Then, for every y ∈ ∆_{m−1}, we immediately obtain that

u_L(x, j) ≥ max_{y′∈∆_{m−1}} min_{ℓ∈[n]\{j}} u_L(y′, ℓ) ≥ min_{ℓ∈Ŝ} u_L(y, ℓ).

By Lemma 4.2, the payoff matrix defined by (6) (with, say, α = 1) then induces (x, j), and can clearly be computed in polynomial time.

The more challenging case is when (8) does not hold (e.g., the case with the profile (x, 1) in Example 3.3). In what follows, we prove Proposition 4.3 by showing that there is still a choice of Ŝ and α that leads to the condition in Lemma 4.2, even when (8) does not hold. Thus, from now on, we assume that

u_L(x, j) < M_{−j}.   (9)

We define the following useful components. By Lemma 3.2 and the assumption that u_L(x, j) ≥ M, we know that

u_L(x, j) ≥ V,   (10)

where V = max_{y ∈ cl U_j(x)} min_{ℓ∈[n]\{j}} u_L(y, ℓ). Since U_j(x) ≠ ∅, there exists y* ∈ cl U_j(x) such that

min_{ℓ∈[n]\{j}} u_L(y*, ℓ) = V,   (11)

which can be computed efficiently by solving an LP (i.e., maximize µ, subject to µ ≤ u_L(y, ℓ) for all ℓ ∈ [n] \ {j} and y ∈ cl U_j(x)). We then let S = { ℓ ∈ [n] \ {j} : u_L(y*, ℓ) = V }.

Before we proceed, we prove two useful technical results.
Lemma 4.6. u_L(y*, j) = u_L(x, j).

Proof. For the sake of contradiction, suppose that u_L(y*, j) ≠ u_L(x, j). Since y* ∈ cl U_j(x), we have that u_L(y*, j) ≥ u_L(x, j), so it must be that u_L(y*, j) > u_L(x, j).

The assumption (9) that u_L(x, j) < M_{−j} implies that there exists ŷ ∈ ∆_{m−1} such that min_{ℓ∈[n]\{j}} u_L(ŷ, ℓ) > u_L(x, j) ≥ V, where we also use (10). Now that min_{ℓ∈[n]\{j}} u_L(y*, ℓ) = V by (11), by the concavity of min_{ℓ∈[n]\{j}} u_L(·, ℓ), it follows that min_{ℓ∈[n]\{j}} u_L(z, ℓ) > V for all z on the segment [ŷ, y*); note that z ∈ ∆_{m−1} as ∆_{m−1} is convex. Now that we have u_L(y*, j) > u_L(x, j) under our assumption, when z is sufficiently close to y*, we have u_L(z, j) ≥ u_L(x, j) and hence z ∈ cl U_j(x). This leads to the contradiction that

V = max_{y ∈ cl U_j(x)} min_{ℓ∈[n]\{j}} u_L(y, ℓ) ≥ min_{ℓ∈[n]\{j}} u_L(z, ℓ) > V.

Lemma 4.7. min_{ℓ∈S} u_L(y, ℓ) < V for all y ∈ U_j(x).

Proof. For the sake of contradiction, assume that there exists ŷ ∈ U_j(x) such that min_{ℓ∈S} u_L(ŷ, ℓ) ≥ V. By the assumption (9) that u_L(x, j) < M_{−j}, there exists z ∈ ∆_{m−1} such that min_{ℓ∈[n]\{j}} u_L(z, ℓ) > u_L(x, j) ≥ V, which immediately yields the following, given that S ⊆ [n] \ {j} by definition:

min_{ℓ∈S} u_L(z, ℓ) > V.
By definition, $u_L(\mathbf{y}^*, \ell) = V$ for all $\ell \in S$, which also implies that $u_L(\mathbf{y}^*, \ell) > V$ for all $\ell \in [n] \setminus (\{j\} \cup S)$ (otherwise, we would have $\min_{\ell \in [n] \setminus \{j\}} u_L(\mathbf{y}^*, \ell) < V$). Thus, we have
$$\min_{\ell \in S} u_L(\mathbf{y}^*, \ell) = V \quad \text{and} \quad \min_{\ell \in [n] \setminus (\{j\} \cup S)} u_L(\mathbf{y}^*, \ell) > V.$$
Now consider a point $\mathbf{w}$ on the segment $(\mathbf{y}^*, \hat{\mathbf{y}}\,]$. Since $u_L(\mathbf{y}^*, j) \ge u_L(\mathbf{x}, j)$ and $u_L(\hat{\mathbf{y}}, j) > u_L(\mathbf{x}, j)$ (as $\hat{\mathbf{y}} \in U_j(\mathbf{x})$), we have $u_L(\mathbf{w}, j) > u_L(\mathbf{x}, j)$ and hence $\mathbf{w} \in U_j(\mathbf{x})$. In addition, by continuity, when $\mathbf{w}$ is sufficiently close to $\mathbf{y}^*$, we have
$$\min_{\ell \in [n] \setminus (\{j\} \cup S)} u_L(\mathbf{w}, \ell) > V. \tag{12}$$
By concavity of the function $\min_{\ell \in S} u_L(\cdot, \ell)$, since $\min_{\ell \in S} u_L(\mathbf{y}, \ell) \ge V$ for both $\mathbf{y} \in \{\mathbf{y}^*, \hat{\mathbf{y}}\}$, we have
$$\min_{\ell \in S} u_L(\mathbf{w}, \ell) \ge V. \tag{13}$$
Analogously, we can find a point $\mathbf{w}' \in U_j(\mathbf{x})$ on the segment $(\mathbf{w}, \mathbf{z}]$ such that (12) and (13) hold for $\mathbf{w}'$, with (13) being strict, in particular. Thus, we have
$$\min_{\ell \in [n] \setminus \{j\}} u_L(\mathbf{w}', \ell) > V = \max_{\mathbf{y} \in U_j(\mathbf{x})} \min_{\ell \in [n] \setminus \{j\}} u_L(\mathbf{y}, \ell),$$
which is a contradiction as $\mathbf{w}' \in U_j(\mathbf{x})$.

In what follows, we use the coordinates $(y_1, \ldots, y_{m-1})$ for every point $\mathbf{y} \in \Delta^{m-1}$, i.e., we have
$$\Delta^{m-1} = \Big\{ (y_1, \ldots, y_{m-1}) \in \mathbb{R}^{m-1}_{\ge 0} : \sum_{i=1}^{m-1} y_i \le 1 \Big\}.$$
Accordingly, we can write the utility function as $u_L(\mathbf{y}, \ell) = \mathbf{g}_\ell \cdot \mathbf{y} + u_L(m, \ell)$, where $\mathbf{g}_\ell \in \mathbb{R}^{m-1}$ and its $i$-th component is $g_{\ell,i} = u_L(i, \ell) - u_L(m, \ell)$; "$\cdot$" denotes the inner product. Hence, we have
$$u_L(\mathbf{y}, \ell) = \mathbf{g}_\ell \cdot (\mathbf{y} - \mathbf{y}^*) + u_L(\mathbf{y}^*, \ell) = \begin{cases} \mathbf{g}_\ell \cdot (\mathbf{y} - \mathbf{y}^*) + V & \text{if } \ell \in S \\ \mathbf{g}_j \cdot (\mathbf{y} - \mathbf{y}^*) + u_L(\mathbf{x}, j) & \text{if } \ell = j \end{cases} \tag{14}$$
where $u_L(\mathbf{y}^*, \ell) = V$ for all $\ell \in S$ by the definition of $S$, and $u_L(\mathbf{y}^*, j) = u_L(\mathbf{x}, j)$ by Lemma 4.6. Note that since $U_j(\mathbf{x}) \neq \emptyset$, it must be that $\mathbf{g}_j \neq \mathbf{0}$.

We also write the $m$ boundary conditions that define $\Delta^{m-1}$ as $\mathbf{e}_i \cdot \mathbf{y} \ge \beta_i$. Namely, for each $i \in [m-1]$, let $\mathbf{e}_i \in \mathbb{R}^{m-1}$ be the $i$-th unit vector and $\beta_i = 0$, while $\mathbf{e}_m = (-1, \ldots, -1) \in \mathbb{R}^{m-1}$ and $\beta_m = -1$. Thus, $\Delta^{m-1} = \{\mathbf{y} \in \mathbb{R}^{m-1} : \mathbf{e}_i \cdot \mathbf{y} \ge \beta_i \text{ for all } i \in [m]\}$. Let $B = \{i \in [m] : \mathbf{e}_i \cdot \mathbf{y}^* = \beta_i\}$ be the set of boundary conditions that are tight for $\mathbf{y}^*$. Note that for any $\mathbf{y} \in \Delta^{m-1}$ we have
$$\mathbf{e}_i \cdot (\mathbf{y} - \mathbf{y}^*) \ge 0 \quad \text{for all } i \in B. \tag{15}$$
We can now prove the following result using Farkas' Lemma (Lemma 4.4), which allows us to express $-\mathbf{g}_j$ as a non-negative linear combination of the $\mathbf{g}_\ell$'s and $\mathbf{e}_i$'s.

Lemma 4.8. $-\mathbf{g}_j$ can be expressed as a non-negative linear combination of $\{\mathbf{g}_\ell : \ell \in S\} \cup \{\mathbf{e}_i : i \in B\}$, i.e., $-\mathbf{g}_j = \sum_{\ell \in S} \lambda_\ell \mathbf{g}_\ell + \sum_{i \in B} \mu_i \mathbf{e}_i$, where $\lambda_\ell \ge 0$ and $\mu_i \ge 0$.

Proof. We use Farkas' Lemma (Lemma 4.4) with $n_1 = m-1$ and $n_2 = |S| + |B|$. The columns of $A$ are exactly the vectors $\{\mathbf{g}_\ell : \ell \in S\} \cup \{\mathbf{e}_i : i \in B\}$, and we set $\mathbf{b} = -\mathbf{g}_j$. Note that the first alternative of Farkas' Lemma immediately yields the statement we want to prove. Thus, we set out to prove that the second alternative cannot hold.

Assume, for the sake of contradiction, that there exists $\mathbf{z} \in \mathbb{R}^{m-1}$ such that $A^T \mathbf{z} \ge \mathbf{0}$ and $\mathbf{b} \cdot \mathbf{z} < 0$, i.e., $\mathbf{g}_\ell \cdot \mathbf{z} \ge 0$ for all $\ell \in S$, $\mathbf{e}_i \cdot \mathbf{z} \ge 0$ for all $i \in B$, and $\mathbf{g}_j \cdot \mathbf{z} > 0$. Then, by picking $\delta > 0$ sufficiently small, the point $\mathbf{y} = \mathbf{y}^* + \delta \mathbf{z}$ satisfies the following:

• By (14), we have the following for all $\ell \in S$: $u_L(\mathbf{y}, \ell) = \mathbf{g}_\ell \cdot (\mathbf{y} - \mathbf{y}^*) + V = \delta\, \mathbf{g}_\ell \cdot \mathbf{z} + V \ge V$. In addition, $u_L(\mathbf{y}, j) = \mathbf{g}_j \cdot (\mathbf{y} - \mathbf{y}^*) + u_L(\mathbf{y}^*, j) = \delta\, \mathbf{g}_j \cdot \mathbf{z} + u_L(\mathbf{x}, j) > u_L(\mathbf{x}, j)$.

• $\mathbf{y} \in \Delta^{m-1}$: For $i \in B$, we immediately obtain that $\mathbf{e}_i \cdot \mathbf{y} = \mathbf{e}_i \cdot (\mathbf{y}^* + \delta \mathbf{z}) \ge \mathbf{e}_i \cdot \mathbf{y}^* = \beta_i$, which means that these boundary conditions are satisfied. For $i \in [m] \setminus B$, we know that $\mathbf{e}_i \cdot \mathbf{y}^* > \beta_i$, and thus by picking $\delta > 0$ small enough, we can ensure that $\mathbf{e}_i \cdot \mathbf{y} = \mathbf{e}_i \cdot \mathbf{y}^* + \delta (\mathbf{e}_i \cdot \mathbf{z}) \ge \beta_i$.

Thus, it follows that $\mathbf{y} \in U_j(\mathbf{x})$ and $\min_{\ell \in S} u_L(\mathbf{y}, \ell) \ge V$. But this cannot hold according to Lemma 4.7.

We can now complete the proof of Proposition 4.3. We first express $-\mathbf{g}_j$ as a non-negative linear combination of the vectors $\{\mathbf{g}_\ell : \ell \in S\} \cup \{\mathbf{e}_i : i \in B\}$.
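The decomposition asserted by Lemma 4.8 is exactly the first Farkas alternative, so the coefficients can be recovered with any LP solver. Below is a minimal sketch using scipy on a small hypothetical instance; the vectors in `g_S`, `e_B`, and `g_j` are illustrative placeholders, not data from the paper.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical data: columns of A are the vectors {g_l : l in S} and {e_i : i in B};
# we seek nonnegative coefficients c with A @ c = -g_j (Lemma 4.8).
g_S = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]  # g_l for l in S (illustrative)
e_B = [np.array([1.0, 0.0])]                        # e_i for i in B (illustrative)
g_j = np.array([-2.0, -1.0])                        # g_j (illustrative)

A = np.column_stack(g_S + e_B)  # shape (m-1, |S| + |B|)
b = -g_j

# Feasibility LP: minimize the zero objective subject to A c = b, c >= 0.
res = linprog(c=np.zeros(A.shape[1]), A_eq=A, b_eq=b,
              bounds=[(0, None)] * A.shape[1])
assert res.success

lam = res.x[:len(g_S)]  # lambda_l, l in S
mu = res.x[len(g_S):]   # mu_i, i in B
```

Any feasible point of this LP certifies the lemma's conclusion for the given instance; if the LP were infeasible, the second Farkas alternative would hold instead.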
By Lemma 4.8 we know that this is possible, and it is easy to see that we can find the coefficients in polynomial time (e.g., by solving an LP). We thus obtain $-\mathbf{g}_j = \sum_{\ell \in S} \lambda_\ell \mathbf{g}_\ell + \sum_{i \in B} \mu_i \mathbf{e}_i$, where $\lambda_\ell \ge 0$ for every $\ell \in S$ and $\mu_i \ge 0$ for every $i \in B$. Let $\widehat{S} = \{\ell \in S : \lambda_\ell > 0\}$. We will argue that $\widehat{S} \neq \emptyset$.

Observe that since $-\mathbf{g}_j = \sum_{\ell \in S} \lambda_\ell \mathbf{g}_\ell + \sum_{i \in B} \mu_i \mathbf{e}_i$ and, by (15), we have $\mathbf{e}_i \cdot (\mathbf{y} - \mathbf{y}^*) \ge 0$ for all $\mathbf{y} \in \Delta^{m-1}$ and $i \in B$, it follows that, for all $\mathbf{y} \in \Delta^{m-1}$, we have
$$-\mathbf{g}_j \cdot (\mathbf{y} - \mathbf{y}^*) = \sum_{\ell \in S} \lambda_\ell\, \mathbf{g}_\ell \cdot (\mathbf{y} - \mathbf{y}^*) + \sum_{i \in B} \mu_i\, \mathbf{e}_i \cdot (\mathbf{y} - \mathbf{y}^*) \ge \sum_{\ell \in S} \lambda_\ell\, \mathbf{g}_\ell \cdot (\mathbf{y} - \mathbf{y}^*) = \sum_{\ell \in \widehat{S}} \lambda_\ell\, \mathbf{g}_\ell \cdot (\mathbf{y} - \mathbf{y}^*), \tag{16}$$
where the last transition is due to the fact that $\lambda_\ell = 0$ for all $\ell \in S \setminus \widehat{S}$, as implied by the definition of $\widehat{S}$.

Since $U_j(\mathbf{x}) \neq \emptyset$, consider any $\mathbf{y} \in U_j(\mathbf{x})$. By definition, this means that $u_L(\mathbf{y}, j) > u_L(\mathbf{x}, j)$, which further implies that $\mathbf{g}_j \cdot (\mathbf{y} - \mathbf{y}^*) > 0$, since $u_L(\mathbf{y}, j) = \mathbf{g}_j \cdot (\mathbf{y} - \mathbf{y}^*) + u_L(\mathbf{x}, j)$ by (14). By (16), we then have $\sum_{\ell \in \widehat{S}} \lambda_\ell\, \mathbf{g}_\ell \cdot (\mathbf{y} - \mathbf{y}^*) < 0$. Hence, $\widehat{S} \neq \emptyset$.

It remains to show that with the above $\widehat{S}$ and, in particular, $\alpha = 1/\lambda_k$ (recall that $k \in \operatorname{argmin}_{\ell \in \widehat{S}} u_L(\mathbf{x}, \ell)$), the condition in Lemma 4.2 holds, i.e., we prove that $\min_{\ell \in \widehat{S}} u_L(\mathbf{y}, \ell) \le u_L(\mathbf{x}, j)$ holds for all $\mathbf{y} \in \Delta^{m-1}$ such that $f_{\mathrm{BR}}(\mathbf{y}) \cap \widehat{S} \neq \emptyset$.

For the sake of contradiction, suppose that there exists $\mathbf{y} \in \Delta^{m-1}$ such that $f_{\mathrm{BR}}(\mathbf{y}) \cap \widehat{S} \neq \emptyset$, but $u_L(\mathbf{y}, \ell) > u_L(\mathbf{x}, j)$ for all $\ell \in \widehat{S}$. By (10), we have $u_L(\mathbf{x}, j) \ge V$, and thus $u_L(\mathbf{y}, \ell) > V$ for all $\ell \in \widehat{S}$. By (14), we have $u_L(\mathbf{y}, \ell) = \mathbf{g}_\ell \cdot (\mathbf{y} - \mathbf{y}^*) + V$; thus, $\mathbf{g}_\ell \cdot (\mathbf{y} - \mathbf{y}^*) > 0$ for all $\ell \in \widehat{S}$.

Using (16) and the fact that $k \in \widehat{S}$ by our choice, we then obtain
$$-\mathbf{g}_j \cdot (\mathbf{y} - \mathbf{y}^*) \ge \sum_{\ell \in \widehat{S}} \lambda_\ell\, \mathbf{g}_\ell \cdot (\mathbf{y} - \mathbf{y}^*) \ge \lambda_k\, \mathbf{g}_k \cdot (\mathbf{y} - \mathbf{y}^*).$$
By (14), we have $u_L(\mathbf{x}, j) - u_L(\mathbf{y}, j) = -\mathbf{g}_j \cdot (\mathbf{y} - \mathbf{y}^*)$.
Recall that it is defined that $\tilde{u}_F(\mathbf{y}, j) = -u_L(\mathbf{y}, k) + \alpha \left( u_L(\mathbf{x}, j) - u_L(\mathbf{y}, j) \right)$, as in (6). Using the above two relations and (14), we then obtain the following:
$$\tilde{u}_F(\mathbf{y}, j) = -u_L(\mathbf{y}, k) + \alpha \left( u_L(\mathbf{x}, j) - u_L(\mathbf{y}, j) \right) = -\mathbf{g}_k \cdot (\mathbf{y} - \mathbf{y}^*) - V - \alpha\, \mathbf{g}_j \cdot (\mathbf{y} - \mathbf{y}^*) \ge -V + (\alpha \lambda_k - 1)\, \mathbf{g}_k \cdot (\mathbf{y} - \mathbf{y}^*) = -V,$$
where the last equality holds since $\alpha = 1/\lambda_k$. However, by (6) we also have $\tilde{u}_F(\mathbf{y}, \ell) = -u_L(\mathbf{y}, \ell)$ if $\ell \in \widehat{S}$, which implies that for all $\ell \in \widehat{S}$ it holds that
$$\tilde{u}_F(\mathbf{y}, j) \ge -V > -u_L(\mathbf{y}, \ell) = \tilde{u}_F(\mathbf{y}, \ell).$$
Hence, $f_{\mathrm{BR}}(\mathbf{y}) \cap \widehat{S} = \emptyset$, which contradicts our assumption.

As discussed in Section 2, a weakness of BR- and payoff-inducible strategy profiles is that the resulting games may have multiple SSEs, in which case the follower depends on the leader to choose the SSE that maximizes his utility. To avoid this, in this section we turn our attention to strong inducibility (see Definition 2.5) and attempt to find a payoff matrix $\tilde{u}_F$ such that $\widetilde{G}$ has a unique SSE.

We begin with an example showing that, in general, the best strongly inducible profile can be much worse than the best payoff-inducible profile.

Example 5.1.
Consider a game $G = (u_L, u_F)$ with the payoff matrices given in Figure 2. Note that the follower obtains positive utility only by playing his strategy 1. Now, observe that the SSE $(\mathbf{x}^*, 1)$, with $\mathbf{x}^* = (0, 1) \in \Delta$, is payoff-inducible and yields positive utility for the follower: it can be induced by any payoff matrix in which strategy 1 of the follower strictly dominates all other strategies. However, such a payoff matrix will also induce other SSEs, e.g., $(\mathbf{y}^*, 1)$ with $\mathbf{y}^* = (1, 0) \in \Delta$. Indeed, it holds that no profile of the form $(\mathbf{y}, 1)$ can be strongly induced, and thus the optimal utility the follower can obtain at a strongly inducible profile is 0. To see this, first note that, as seen above, if the follower claims that strategy 1 is his unique best response for all points in $\Delta$, then the SSE is not unique. On the other hand, if strategy 2 is a best response at some point $\mathbf{z} \in \Delta$, then $(\mathbf{y}, 1)$ will not be an SSE, since for the leader $u_L(\mathbf{y}, 1) < u_L(\mathbf{z}, 2)$ for any $\mathbf{y}, \mathbf{z} \in \Delta$.

Figure 2: A game where the optimal inducible utility is positive, but the optimal strongly inducible utility is 0.

Figure 3: A non-max-degenerate game for which the optimal inducible utility cannot be achieved by any strongly inducible profile.
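To make the mechanics of Example 5.1 concrete, the following is a quick numeric check on a hypothetical 2 × 2 instance that shares the example's structure; the entries below are illustrative assumptions (indices are 0-based), not the matrices of Figure 2.

```python
import numpy as np

# Hypothetical 2x2 instance mirroring Example 5.1's structure (not the Figure 2
# entries): column 0 is the follower's only profitable strategy, the leader's
# payoff in column 0 is constant, and column 1 dominates column 0 for the leader.
u_L = np.array([[0.5, 1.0],
                [0.5, 1.0]])  # leader payoffs u_L(i, l)
u_F = np.array([[1.0, 0.0],
                [1.0, 0.0]])  # follower's true payoffs u_F(i, l)

# If the follower reports payoffs under which strategy 0 strictly dominates,
# the leader best-responds by maximizing u_L(., 0); here that column is flat,
# so every commitment x is optimal and the induced game has multiple SSEs.
col0 = u_L[:, 0]
multiple_sses = bool(np.isclose(col0.max(), col0.min()))

# If instead strategy 1 is ever a best response at some z, no profile (y, 0)
# survives as an SSE: the leader strictly prefers (z, 1) to any (y, 0).
leader_prefers_1 = bool(u_L[:, 1].min() > u_L[:, 0].max())

print(multiple_sses, leader_prefers_1)  # True True
```

Under these assumed entries, the two printed checks correspond to the two cases in the example's argument: either the reported matrix makes the SSE non-unique, or the profitable strategy is abandoned somewhere and the follower's strongly inducible utility drops to 0.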
The problem in Example 5.1 stems from the following observation: if the follower reports a payoff matrix such that strategy 1 is the unique best response for all points in the domain, then there are multiple SSEs. This can be thought of as a "degenerate" case, since it would occur with probability 0 if the payoffs of the leader were drawn uniformly at random in $[0, 1]$. We formalize this as follows.

Definition 5.2.
A leader payoff matrix $u_L$ is said to be max-degenerate if there exists $j \in [n]$ such that $|\operatorname{argmax}_{i \in [m]} u_L(i, j)| > 1$.

We next provide an example showing that, even when $u_L$ is not max-degenerate, we cannot hope to exactly achieve the optimal inducible utility via a strongly inducible profile.

Example 5.3.
Consider a game with the leader and follower payoff matrices given in Figure 3. It is easy to check that $u_L$ is not max-degenerate. Now, observe that the maximin utility $M$ of the leader is achieved at the point $\mathbf{y}^* \in \Delta$ marked in Figure 3, and let $\mathbf{x}^* \in \Delta$ be as in the figure. Since $u_L(\mathbf{x}^*, 1) \ge M$, it follows that $(\mathbf{x}^*, 1)$ is payoff-inducible by Theorem 4.1. Indeed, the partition $(R_1, R_2)$ of $\Delta$ in Figure 3 shows how $(\mathbf{x}^*, 1)$ can be induced. Note that $u_F(\mathbf{x}^*, 1) = 1$, while any profile different from $(\mathbf{x}^*, 1)$ yields utility strictly less than 1 for the follower. We will now show that $(\mathbf{x}^*, 1)$ cannot be strongly induced, which implies that any strongly inducible profile gives utility strictly less than 1 to the follower.

Indeed, suppose that $(\mathbf{x}^*, 1)$ is induced by some $\tilde{u}_F$. If by $\tilde{u}_F$ strategy 1 is a best response to $\mathbf{y}^*$, then $(\mathbf{x}^*, 1)$ cannot be the unique SSE, since $u_L(\mathbf{x}^*, 1) = u_L(\mathbf{y}^*, 1)$. On the other hand, if strategy 2 is the only best response to $\mathbf{y}^*$, then there exists some sufficiently small $\delta > 0$ such that strategy 2 is also a best response to the nearby point $\mathbf{w}^*$ obtained by perturbing $\mathbf{y}^*$ by $\delta$ (see Figure 3). However, this means that $(\mathbf{x}^*, 1)$ cannot be an SSE, since $u_L(\mathbf{w}^*, 2) > u_L(\mathbf{x}^*, 1)$.

As a result, unlike in the previous section, here we cannot hope to solve the problem exactly. However, the next theorem shows that we can approximate the optimal utility with arbitrarily good precision.

Theorem 5.4. If $u_L$ is not max-degenerate, then for any $\varepsilon > 0$, the follower can strongly induce a profile $(\mathbf{x}, j)$ that yields the optimal inducible utility up to an additive loss of at most $\varepsilon$. Furthermore, a matrix $\tilde{u}_F$ strongly inducing $(\mathbf{x}, j)$ can be constructed in time polynomial in $\log(1/\varepsilon)$ (and the size of the representation of the game).

Proof. Let $(\mathbf{x}^*, j)$ be a payoff-inducible profile that yields the optimal inducible payoff for the follower. By Theorem 4.1, such a profile can be computed in polynomial time. We begin by solving the following LP:
$$\begin{aligned} \max_{\delta,\, \mathbf{x}} \quad & \delta \\ \text{s.t.} \quad & \mathbf{x} \in \Delta^{m-1} \\ & u_F(\mathbf{x}, j) \ge u_F(\mathbf{x}^*, j) - \varepsilon \\ & u_L(\mathbf{x}, j) = u_L(\mathbf{x}^*, j) + \delta \end{aligned} \tag{17}$$
Note that this LP can be solved in time polynomial in $\log(1/\varepsilon)$. Furthermore, note that the polytope of feasible points is not empty, since $\delta = 0$ and $\mathbf{x} = \mathbf{x}^*$ satisfy all the constraints. Finally, the LP is not unbounded, since $\delta$ can be at most $\max_{i \in [m]} u_L(i, j) - u_L(\mathbf{x}^*, j)$.

In the rest of this proof, let $\delta$ and $\mathbf{x}$ denote an optimal solution to this LP. Note that we can in particular assume that $\mathbf{x}$ is a vertex of the convex polytope $P_\delta = \{\mathbf{y} \in \Delta^{m-1} : u_L(\mathbf{y}, j) = u_L(\mathbf{x}^*, j) + \delta\}$. Indeed, given a solution $(\delta, \mathbf{x})$ to LP (17), if $\mathbf{x}$ is not a vertex of $P_\delta$, then we consider the LP
$$\begin{aligned} \max_{\mathbf{y}} \quad & u_F(\mathbf{y}, j) \\ \text{s.t.} \quad & \mathbf{y} \in \Delta^{m-1} \\ & u_L(\mathbf{y}, j) = u_L(\mathbf{x}^*, j) + \delta \end{aligned}$$
It is known that a solution of an LP that is also a vertex of the feasible polytope can be computed in polynomial time [Grötschel et al., 1981]. Note that in this case the feasible polytope is exactly $P_\delta$. Let $\mathbf{y}$ be an optimal solution that is a vertex of $P_\delta$. We know that $\mathbf{x} \in P_\delta$ and $u_F(\mathbf{x}, j) \ge u_F(\mathbf{x}^*, j) - \varepsilon$, which implies that $u_F(\mathbf{y}, j) \ge u_F(\mathbf{x}^*, j) - \varepsilon$.
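LP (17) is a standard linear program over $(\mathbf{x}, \delta)$ and can be handed directly to an off-the-shelf solver. A sketch with scipy.optimize.linprog, where the matrices `u_L`, `u_F`, the target strategy `j`, the point `x_star`, and `eps` are illustrative assumptions rather than data from the paper:

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative 3x2 game (assumed data, not from the paper).
u_L = np.array([[0.6, 0.2], [0.4, 0.9], [0.1, 0.5]])  # leader payoffs, m=3, n=2
u_F = np.array([[0.3, 0.0], [0.8, 0.1], [0.5, 0.2]])  # follower's true payoffs
j = 0                                                 # target follower strategy
x_star = np.array([0.0, 1.0, 0.0])                    # assumed payoff-inducible point
eps = 0.05

# Variables: (x_1, ..., x_m, delta). Maximize delta <=> minimize -delta.
m = u_L.shape[0]
c = np.zeros(m + 1); c[-1] = -1.0
A_eq = np.zeros((2, m + 1))
A_eq[0, :m] = 1.0                            # sum_i x_i = 1
A_eq[1, :m] = u_L[:, j]; A_eq[1, -1] = -1.0  # u_L(x, j) - delta = u_L(x*, j)
b_eq = np.array([1.0, u_L[:, j] @ x_star])
A_ub = np.zeros((1, m + 1))
A_ub[0, :m] = -u_F[:, j]                     # u_F(x, j) >= u_F(x*, j) - eps
b_ub = np.array([-(u_F[:, j] @ x_star - eps)])
bounds = [(0, None)] * m + [(None, None)]    # x >= 0, delta free

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
assert res.success
x_opt, delta = res.x[:m], res.x[-1]
```

The LP is always feasible (take $\mathbf{x} = \mathbf{x}^*$, $\delta = 0$) and bounded, so the solver returns an optimal $(\mathbf{x}, \delta)$; the vertex refinement discussed above would then be a second call with the fixed optimal $\delta$.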
But this means that $(\delta, \mathbf{y})$ is also an optimal solution to the original LP (17). Thus, by letting $\mathbf{x} := \mathbf{y}$, we indeed have that $\mathbf{x}$ is a vertex of the convex polytope $P_\delta$.

Let us first handle the case $\delta = 0$ by showing that $(\mathbf{x}^*, j)$ itself can be strongly induced. Since $\delta = 0$, it follows that $U_j(\mathbf{x}^*) = \emptyset$. Indeed, if there exists $\hat{\mathbf{y}} \in \Delta^{m-1}$ with $u_L(\hat{\mathbf{y}}, j) > u_L(\mathbf{x}^*, j)$, then there exists $\mathbf{y}$ on the segment $(\mathbf{x}^*, \hat{\mathbf{y}}\,]$ such that $u_F(\mathbf{y}, j) \ge u_F(\mathbf{x}^*, j) - \varepsilon$ (when $\mathbf{y}$ is sufficiently close to $\mathbf{x}^*$) and $u_L(\mathbf{y}, j) > u_L(\mathbf{x}^*, j)$, a contradiction to the optimality of $\delta = 0$. Now, given that $U_j(\mathbf{x}^*) = \emptyset$, we have that $u_L(\mathbf{y}, j) \le u_L(\mathbf{x}^*, j)$ for all $\mathbf{y} \in \Delta^{m-1}$. But since $u_L$ is not max-degenerate (in the sense of Definition 5.2), it follows that in fact $u_L(\mathbf{y}, j) < u_L(\mathbf{x}^*, j)$ for all $\mathbf{y} \in \Delta^{m-1} \setminus \{\mathbf{x}^*\}$. Thus, if the follower always best responds with strategy $j$, then $(\mathbf{x}^*, j)$ will be the unique SSE. As seen before, it is easy to implement this behavior by reporting $\tilde{u}_F(i, j) = 1$ and $\tilde{u}_F(i, \ell) = 0$ for all $i \in [m]$ and $\ell \in [n] \setminus \{j\}$.

In the rest of this proof, we consider the case $\delta > 0$ and show that $(\mathbf{x}, j)$ can be strongly induced. Since $u_F(\mathbf{x}, j) \ge u_F(\mathbf{x}^*, j) - \varepsilon$, this means that at $(\mathbf{x}, j)$ the follower achieves the optimal inducible utility up to an additive error of $\varepsilon$.

Using the same notation as in the proof of Proposition 4.3, we let $B = \{i \in [m] : \mathbf{e}_i \cdot \mathbf{x} = \beta_i\}$ denote the set of boundary conditions of $\Delta^{m-1}$ that are tight for $\mathbf{x}$. Note that since $\mathbf{x}$ is a vertex of the polytope $P_\delta$, it follows that $B \neq \emptyset$. We let $\mathbf{h} = \sum_{i \in B} \mathbf{e}_i$. As in the proof of Proposition 4.3, we have that for all $\mathbf{y} \in \Delta^{m-1}$ it holds that
$$\mathbf{h} \cdot (\mathbf{y} - \mathbf{x}) = \sum_{i \in B} \mathbf{e}_i \cdot (\mathbf{y} - \mathbf{x}) \ge 0. \tag{18}$$
Furthermore, since $\mathbf{x}$ is a vertex of $P_\delta$, it follows that for all $\mathbf{y} \in P_\delta \setminus \{\mathbf{x}\}$ there exists $i \in B$ such that $\mathbf{e}_i \cdot (\mathbf{y} - \mathbf{x}) > 0$, and thus
$$\mathbf{h} \cdot (\mathbf{y} - \mathbf{x}) > 0. \tag{19}$$
Indeed, if $\mathbf{e}_i \cdot (\mathbf{y} - \mathbf{x}) = 0$ for all $i \in B$ for some $\mathbf{y} \in P_\delta \setminus \{\mathbf{x}\}$, this would contradict the fact that $\mathbf{x}$ is a vertex of $P_\delta$ (i.e., the unique point in $P_\delta$ for which the boundary conditions in $B$ are tight).

We are now ready to construct the payoff matrix reported by the follower. Pick an arbitrary $k \in \operatorname{argmin}_{\ell \in [n] \setminus \{j\}} u_L(\mathbf{x}, \ell)$. For all $\mathbf{y} \in \Delta^{m-1}$, let
$$\tilde{u}_F(\mathbf{y}, \ell) = \begin{cases} -u_L(\mathbf{y}, \ell) & \text{if } \ell \in [n] \setminus \{j\} \\ -u_L(\mathbf{y}, k) + \alpha \left( u_L(\mathbf{x}, j) - u_L(\mathbf{y}, j) \right) - \mathbf{h} \cdot (\mathbf{y} - \mathbf{x}) & \text{if } \ell = j \end{cases} \tag{20}$$
where $\alpha = \big( 2 \max_{i \in [m]} \max_{\ell \in [n]} |u_L(i, \ell)| + m \big) / \delta > 0$. Note that we can compute the payoff matrix corresponding to this utility function in polynomial time. In the remainder of this proof, we show that $(\mathbf{x}, j)$ is the unique SSE of the game $(u_L, \tilde{u}_F)$.

Clearly, $j$ is a best response at $\mathbf{x}$, since $\tilde{u}_F(\mathbf{x}, j) = -u_L(\mathbf{x}, k) = -\min_{\ell \in [n] \setminus \{j\}} u_L(\mathbf{x}, \ell) = \max_{\ell \in [n] \setminus \{j\}} \tilde{u}_F(\mathbf{x}, \ell)$, by the choice of $k$.

Next, let us show that if $j$ is a best response at some $\mathbf{y} \in \Delta^{m-1} \setminus \{\mathbf{x}\}$, then $u_L(\mathbf{y}, j) < u_L(\mathbf{x}, j)$. Indeed, if $j$ is a best response at $\mathbf{y}$, then in particular $\tilde{u}_F(\mathbf{y}, j) \ge \tilde{u}_F(\mathbf{y}, k)$, which implies that
$$\alpha \left( u_L(\mathbf{x}, j) - u_L(\mathbf{y}, j) \right) \ge \mathbf{h} \cdot (\mathbf{y} - \mathbf{x}). \tag{21}$$
Since $\mathbf{h} \cdot (\mathbf{y} - \mathbf{x}) \ge 0$ by (18), and $\alpha > 0$, it follows that $u_L(\mathbf{x}, j) \ge u_L(\mathbf{y}, j)$. It remains to show that $u_L(\mathbf{x}, j) \neq u_L(\mathbf{y}, j)$.
But if $u_L(\mathbf{x}, j) = u_L(\mathbf{y}, j)$, then $\mathbf{y} \in P_\delta \setminus \{\mathbf{x}\}$, and so by (19) we have $\mathbf{h} \cdot (\mathbf{y} - \mathbf{x}) > 0$, which contradicts (21).

Finally, it remains to show that if $\ell \in [n] \setminus \{j\}$ is a best response at some $\mathbf{y} \in \Delta^{m-1}$, then it must be that $u_L(\mathbf{y}, \ell) < u_L(\mathbf{x}, j)$. Indeed, if $\ell \in [n] \setminus \{j\}$ is a best response at $\mathbf{y}$, then in particular $\tilde{u}_F(\mathbf{y}, j) \le \tilde{u}_F(\mathbf{y}, \ell)$, which by (20) means that
$$\alpha \left( u_L(\mathbf{x}, j) - u_L(\mathbf{y}, j) \right) \le -u_L(\mathbf{y}, \ell) + u_L(\mathbf{y}, k) + \mathbf{h} \cdot (\mathbf{y} - \mathbf{x}) \le -u_L(\mathbf{y}, \ell) + u_L(\mathbf{y}, k) + \|\mathbf{h}\| \cdot \|\mathbf{y} - \mathbf{x}\| \le 2 \max_{i \in [m]} \max_{\ell' \in [n]} |u_L(i, \ell')| + \sqrt{m-1} \cdot \sqrt{m-1} \le \alpha \delta,$$
by the choice of $\alpha$. Thus, we obtain that $u_L(\mathbf{x}, j) - u_L(\mathbf{y}, j) \le \delta$, which implies that $u_L(\mathbf{y}, j) \ge u_L(\mathbf{x}^*, j)$. Since $(\mathbf{x}^*, j)$ is payoff-inducible, which means that $u_L(\mathbf{x}^*, j) \ge M$, we can use Lemma 3.2 to obtain
$$u_L(\mathbf{x}, j) = u_L(\mathbf{x}^*, j) + \delta > u_L(\mathbf{x}^*, j) \ge \min_{\ell' \in [n] \setminus \{j\}} u_L(\mathbf{y}, \ell') = u_L(\mathbf{y}, \ell),$$
where the last equality comes from the fact that $\ell$ is a best response at $\mathbf{y}$, i.e., in particular $\tilde{u}_F(\mathbf{y}, \ell) = \max_{\ell' \in [n] \setminus \{j\}} \tilde{u}_F(\mathbf{y}, \ell')$.

An interesting first question that emerges from our results is how to design countermeasures that mitigate the potential loss of a learning leader caused by possible deceptive behavior of the follower. This was considered in [Gan et al., 2019b], where it was proposed as a solution that the leader could commit to a policy, i.e., a strategy conditioned on the report of the follower, instead of a fixed strategy. However, in contrast to [Gan et al., 2019b], where the follower's report is limited to a finite set of payoff matrices, computing the optimal policy in our model seems to be a very challenging problem.
In addition, it would be interesting to explore whether the optimal follower payoff matrix (or a good approximation of it) can still be computed efficiently when additional constraints are imposed on how much the follower can deviate from his true payoff matrix. Finally, another interesting direction would be to quantify, and provide tight bounds on, the leader's utility loss caused by the deceptive behavior of the follower.
References
Yakov Babichenko. Query complexity of approximate Nash equilibria. Journal of the ACM, 63(4):36:1–36:24, 2016.

Yakov Babichenko and Aviad Rubinstein. Communication complexity of approximate Nash equilibria. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing (STOC), pages 878–889, 2017.

Maria-Florina Balcan, Avrim Blum, Nika Haghtalab, and Ariel D. Procaccia. Commitment without regrets: Online learning in Stackelberg security games. In Proceedings of the 16th ACM Conference on Economics and Computation (EC), pages 61–78, 2015.

Marco Barreno, Blaine Nelson, Anthony D. Joseph, and J. Doug Tygar. The security of machine learning. Machine Learning, 81(2):121–148, 2010.

Omer Ben-Porat and Moshe Tennenholtz. Regression equilibrium. In Proceedings of the 2019 ACM Conference on Economics and Computation (EC), pages 173–191, 2019.

Avrim Blum, Jeffrey C. Jackson, Tuomas Sandholm, and Martin Zinkevich. Preference elicitation and query learning. Journal of Machine Learning Research, 5:649–667, 2004.

Avrim Blum, Nika Haghtalab, and Ariel D. Procaccia. Learning optimal commitment to overcome insecurity. In Proceedings of the 28th Conference on Neural Information Processing Systems (NIPS), pages 1826–1834, 2014.

Liad Blumrosen and Noam Nisan. Combinatorial auctions. In Algorithmic Game Theory, chapter 11, pages 267–299. Cambridge University Press, 2007.

Stephen P. Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2014.

George W. Brown. Some notes on computation of game solutions. RAND Corporation Report, page 78, 1949.

X. Chen, Y. Cheng, and B. Tang. Well-supported versus approximate Nash equilibria: Query complexity of large games. arXiv preprint arXiv:1511.00785, 2015.

Yiling Chen, Chara Podimata, Ariel D. Procaccia, and Nisarg Shah. Strategyproof linear regression in high dimensions. In Proceedings of the 2018 ACM Conference on Economics and Computation (EC), pages 9–26, 2018.

Yiling Chen, Yang Liu, and Chara Podimata. Grinding the space: Learning to classify against strategic agents. CoRR, abs/1911.04004, 2019.

Wolfram Conen and Tuomas Sandholm. Preference elicitation in combinatorial auctions. In Proceedings of the 3rd ACM Conference on Electronic Commerce (EC), pages 256–259, 2001.

Vincent Conitzer and Tuomas Sandholm. Computing the optimal strategy to commit to. In Proceedings of the 7th ACM Conference on Electronic Commerce (EC), pages 82–90, 2006.

Ofer Dekel, Felix A. Fischer, and Ariel D. Procaccia. Incentive compatible regression learning. Journal of Computer and System Sciences, 76(8):759–777, 2010.

Jinshuo Dong, Aaron Roth, Zachary Schutzman, Bo Waggoner, and Zhiwei Steven Wu. Strategic classification from revealed preferences. In Proceedings of the 2018 ACM Conference on Economics and Computation (EC), pages 55–70, 2018.

Edith Elkind, Jiarui Gan, Svetlana Obraztsova, Zinovi Rabinovich, and Alexandros A. Voudouris. Protecting elections by recounting ballots. In Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI), pages 259–265, 2019.

J. Fearnley, M. Gairing, P. W. Goldberg, and R. Savani. Learning equilibria of games via payoff queries. Journal of Machine Learning Research, 16:1305–1344, 2015.

Jiarui Gan, Qingyu Guo, Long Tran-Thanh, Bo An, and Michael Wooldridge. Manipulating a learning defender and ways to counteract. In Advances in Neural Information Processing Systems (NeurIPS), pages 8272–8281, 2019a.

Jiarui Gan, Haifeng Xu, Qingyu Guo, Long Tran-Thanh, Zinovi Rabinovich, and Michael Wooldridge. Imitative follower deception in Stackelberg games. In Proceedings of the 2019 ACM Conference on Economics and Computation (EC), pages 639–657, 2019b.

Paul W. Goldberg and Francisco J. Marmolejo-Cossío. Learning convex partitions and computing game-theoretic equilibria from best response queries. In International Conference on Web and Internet Economics (WINE), pages 168–187, 2018.

Paul W. Goldberg and Aaron Roth. Bounds for the query complexity of approximate equilibria. ACM Transactions on Economics and Computation, 4(4):24:1–24:25, 2016.

Paul W. Goldberg, Francisco J. Marmolejo-Cossío, and Zhiwei Steven Wu. Logarithmic query complexity for approximate Nash computation in large games. Theory of Computing Systems, 63(1):26–53, 2019.

Paul W. Goldberg, Edwin Lock, and Francisco Marmolejo-Cossío. Learning strong substitutes demand via queries. arXiv preprint arXiv:2005.01496, 2020.

P. W. Goldberg and S. Turchetta. Query complexity of approximate equilibria in anonymous games. Journal of Computer and System Sciences, 90:80–98, 2017.

Martin Grötschel, László Lovász, and Alexander Schrijver. The ellipsoid method and its consequences in combinatorial optimization. Combinatorica, 1(2):169–197, 1981.

S. Hart and Y. Mansour. How long to equilibrium? The communication complexity of uncoupled equilibrium procedures. Games and Economic Behavior, 69(1):107–126, 2010.

Sergiu Hart and Noam Nisan. The query complexity of correlated equilibria. Games and Economic Behavior, pages 401–410, 2016.

Safwan Hossain and Nisarg Shah. The effect of strategic noise in linear regression. In Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS), pages 511–519, 2020.

Sebastien M. Lahaie and David C. Parkes. Applying learning algorithms to preference elicitation. In Proceedings of the 5th ACM Conference on Electronic Commerce (EC), pages 180–188, 2004.

Joshua Letchford, Vincent Conitzer, and Kamesh Munagala. Learning and approximating the optimal strategy to commit to. In International Symposium on Algorithmic Game Theory, pages 250–262, 2009.

Daniel Lowd and Christopher Meek. Adversarial learning. In Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, pages 641–647, 2005.

Francisco J. Marmolejo-Cossío, Eric Brigham, Benjamin Sela, and Jonathan Katz. Competing (semi-)selfish miners in Bitcoin. In Proceedings of the 1st ACM Conference on Advances in Financial Technologies, pages 89–109, 2019.

Reshef Meir, Ariel D. Procaccia, and Jeffrey S. Rosenschein. Algorithms for strategyproof classification. Artificial Intelligence, 186:123–156, 2012.

Thanh H. Nguyen and Haifeng Xu. Imitative attacker deception in Stackelberg security games. In Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI), pages 528–534, 2019.

Noam Nisan and Ilya Segal. The communication requirements of efficient allocations and supporting prices. Journal of Economic Theory, 129(1):192–224, 2006.

Binghui Peng, Weiran Shen, Pingzhong Tang, and Song Zuo. Learning optimal strategies to commit to. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI), pages 2149–2156, 2019.

Javier Perote and Juan Perote-Peña. Strategy-proof estimators for simple regression. Mathematical Social Sciences, 47(2):153–176, 2004.

Julia Robinson. An iterative method of solving a game. The Annals of Mathematics, 54(2):296–301, 1951.

Aaron Roth, Jonathan Ullman, and Zhiwei Steven Wu. Watch and learn: Optimizing from revealed preferences feedback. In Proceedings of the 48th Annual ACM Symposium on Theory of Computing (STOC), pages 949–962, 2016.

Bernhard von Stengel and Shmuel Zamir. Leadership with commitment to mixed strategies. CDAM Research Report LSE-CDAM-2004-01, London School of Economics, 2004.

Jingchang Sun, Pingzhong Tang, and Yulong Zeng. Games of miners. In Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS), pages 1323–1331, 2020.

Milind Tambe. Security and Game Theory: Algorithms, Deployed Systems, Lessons Learned. Cambridge University Press, 2011.

Heinrich von Stackelberg. Market Structure and Equilibrium. Springer Science & Business Media, 2010.

Bo Waggoner, Rafael Frongillo, and Jacob D. Abernethy. A market framework for eliciting private data. In Advances in Neural Information Processing Systems (NIPS), pages 3510–3518, 2015.

Yue Yin, Yevgeniy Vorobeychik, Bo An, and Noam Hazon. Optimal defense against election control by deleting voter groups. Artificial Intelligence, 259:32–51, 2018.

Hanrui Zhang, Yu Cheng, and Vincent Conitzer. When samples are strategically selected. In Proceedings of the 36th International Conference on Machine Learning (ICML), volume 97, 2019.

Martin A. Zinkevich, Avrim Blum, and Tuomas Sandholm. On polynomial-time preference elicitation with value queries. In