Approximate Nash Equilibria via Sampling
arXiv [cs.GT]
Yakov Babichenko ∗ , Ron Peretz † September 26, 2018
Abstract
We prove that in a normal-form $n$-player game with $m$ actions for each player, there exists an approximate Nash equilibrium where each player randomizes uniformly among a set of $O(\log m + \log n)$ pure strategies. This result induces an $N^{\log\log N}$ algorithm for computing an approximate Nash equilibrium in games where the number of actions is polynomial in the number of players ($m = \mathrm{poly}(n)$), where $N = nm^n$ is the size of the game (the input size).

In addition, we establish an inverse connection between the entropy of Nash equilibria in the game and the time it takes to find such an approximate Nash equilibrium using the random sampling algorithm.

∗ Center for the Mathematics of Information, and Department of Computing and Mathematical Sciences, California Institute of Technology. E-mail: [email protected].
† Department of Mathematics, London School of Economics. E-mail: [email protected]

Sampling from a Nash equilibrium is a well-known method for proving existence of a simple approximate Nash equilibrium. By the sampling method, each mixed strategy $x_i$ of player $i$ is replaced by $k$ i.i.d. samples of pure strategies from the distribution $x_i$. Each of these $k$ samples is then chosen at random with probability $1/k$, and together they form a simple $k$-uniform strategy $s_i$. Equivalently, $k$-uniform strategies are mixed strategies that assign to each pure strategy a rational probability with denominator $k$.

The main advantage of the $k$-uniform strategy $s_i$ over the original strategy $x_i$ is that there are at most $m^k$ such strategies (actually $\binom{m+k-1}{k}$), where $m$ is the number of actions of player $i$. Therefore, in the case where we do not know the original strategy $x_i$ (and thus we cannot produce the strategy $s_i$ from $x_i$), we can search for the strategy $s_i$ over a relatively small set of size $m^k$.

The sampling method has a very important consequence for the computation of approximate Nash equilibria.
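As a rough illustration of the sampling step described above, the following Python sketch draws $k$ i.i.d. pure actions from a mixed strategy and returns the induced $k$-uniform strategy; it also enumerates all $k$-uniform strategies for small $m$ and $k$. The function names `sample_k_uniform` and `all_k_uniform` and the example strategy `x` are hypothetical and do not come from the paper.

```python
import random
from collections import Counter
from fractions import Fraction
from itertools import combinations_with_replacement
from math import comb


def sample_k_uniform(x, k, rng):
    """Draw k i.i.d. pure actions from the mixed strategy x (a list of
    probabilities) and return the k-uniform strategy they induce."""
    actions = range(len(x))
    draws = rng.choices(actions, weights=x, k=k)
    counts = Counter(draws)
    # Each sampled action receives weight 1/k, so every probability
    # is a rational number with denominator k.
    return [Fraction(counts[a], k) for a in actions]


def all_k_uniform(m, k):
    """Yield every k-uniform strategy over m actions (one per multiset)."""
    for multiset in combinations_with_replacement(range(m), k):
        counts = Counter(multiset)
        yield tuple(Fraction(counts[a], k) for a in range(m))


rng = random.Random(0)           # fixed seed, for reproducibility only
x = [0.5, 0.5, 0.0]              # hypothetical mixed strategy, m = 3
s = sample_k_uniform(x, k=4, rng=rng)
strategies = list(all_k_uniform(m=3, k=4))
```

For $m = 3$ and $k = 4$ the enumeration yields $\binom{6}{4} = 15$ distinct strategies, compared with $3^4 = 81$ ordered tuples. The sampled strategy `s` never puts weight on the third action, since its support is contained in the support of `x`.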
If we prove existence of a $k$-uniform approximate Nash equilibrium $(s_i)_{i=1}^{n}$ for small $k$, then we need only search exhaustively for an approximate Nash equilibrium over all the possible $n$-tuples of $k$-uniform strategies. Although this method seems naive, it provides the best upper bound that is known today for computing an approximate Nash equilibrium.

Althofer [1] was the first to introduce the sampling method, when he studied two-player zero-sum games and showed existence of $k$-uniform approximately optimal strategies with $k = O(\log m)$. Althofer [1] also showed that the order of $\log m$ is optimal (for two-player games). Lipton, Markakis, and Mehta [7] generalized this result to all two-player games; i.e., they proved existence of a $k$-uniform approximate Nash equilibrium for $k = O(\log m)$. For $n$-player games, Lipton, Markakis, and Mehta [7] proved existence of a $k$-uniform approximate Nash equilibrium for $k = O(n^2 \log m)$. Hémon, Rougemont, and Santha [5] simplified it to $k = O(n \log m)$.

In the present paper, we prove existence of a $k$-uniform approximate Nash equilibrium for $k = O(\log n + \log m)$ (see Theorem 1). The results in [7] and [5] induce a $\mathrm{poly}(N^{\log N})$ algorithm for computing an approximate Nash equilibrium (see [8]), where $N = nm^n$ is the input size. Our result yields a $\mathrm{poly}(N^{\log\log N})$ algorithm for games where the number of actions of each player is polynomial in $n$ (the number of players). To our knowledge, the best previously known upper bound for this class of games is the $\mathrm{poly}(N^{\log N})$ of [7].

Our second result establishes an inverse connection between the entropy of Nash equilibria in the game and the time that it takes the sampling method algorithm to find an approximate Nash equilibrium (see Theorem 2).
In particular, this result generalizes the result of Daskalakis and Papadimitriou [4] on existence of a polynomial algorithm for an approximate Nash equilibrium in small-probability games, which are a sub-class of the games where the entropy of a Nash equilibrium is very high. Daskalakis and Papadimitriou [4] proved this result for two-player games. A corollary of our result (see Corollary 3) is that an appropriate generalization of that statement holds for any number of players $n$.

We consider $n$-player games with $m$ actions for each player. (All the results in the paper hold also for the case where each player has a different number of actions, i.e., player $i$ has $m_i$ actions. For simplicity, we assume throughout that all players have the same number of actions $m$.) The size of the game is denoted by $N := nm^n$. We use the following standard notation. The set of players is $[n] = \{1, 2, \dots, n\}$. The set of actions of each player is $A_i = [m] = \{1, 2, \dots, m\}$. The set of strategy profiles is $A = [m]^n$. The payoff function of player $i$ is $u_i : A \to [0,1]$. The payoff function profile is denoted by $u = (u_i)_{i \in [n]}$. The set of probability distributions over a set $B$ is denoted by $\Delta(B)$. The set of mixed actions of player $i$ is $\Delta(A_i)$. The payoff function can be multilinearly extended to $u_i : \Delta(A) \to [0,1]$.

A profile of mixed actions $x = (x_i)_{i \in [n]}$, where $x_i \in \Delta(A_i)$, is an $\varepsilon$-equilibrium if no player can gain more than $\varepsilon$ by a unilateral deviation; i.e., $u_i(x) \ge u_i(a_i, x_{-i}) - \varepsilon$ for every player $i$ and every action $a_i \in [m]$, where $x_{-i}$ denotes the action profile of all players other than $i$. A 0-equilibrium is called an exact or Nash equilibrium.

A mixed strategy $x_i \in \Delta(A_i)$ is called $k$-uniform if $x_i(a_i) = c_i/k$, where $c_i \in \mathbb{Z}$, for every action $a_i \in A_i$. Equivalently, a $k$-uniform strategy is a uniform distribution over a multi-set of $k$ pure actions.
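The two definitions above can be checked mechanically on small games. The sketch below is only an illustration under stated assumptions: payoffs are given as Python functions on pure action profiles (0-indexed), the helper names `expected_payoff`, `is_eps_equilibrium`, and `is_k_uniform` are hypothetical, and the expected payoff is computed by brute force over all $m^n$ pure profiles, so it is practical only for tiny games.

```python
from itertools import product


def expected_payoff(u_i, x):
    """Multilinear extension: expected value of u_i under the mixed profile x.

    u_i maps a tuple of pure actions (one per player) to a payoff in [0, 1];
    x is a list of mixed strategies, each a list of probabilities.
    """
    total = 0.0
    for profile in product(*(range(len(x_j)) for x_j in x)):
        prob = 1.0
        for j, a in enumerate(profile):
            prob *= x[j][a]
        total += prob * u_i(profile)
    return total


def is_eps_equilibrium(u, x, eps):
    """True if no player gains more than eps by deviating to a pure action."""
    for i, u_i in enumerate(u):
        current = expected_payoff(u_i, x)
        for a in range(len(x[i])):
            pure = [1.0 if b == a else 0.0 for b in range(len(x[i]))]
            deviation = x[:i] + [pure] + x[i + 1:]
            if expected_payoff(u_i, deviation) > current + eps:
                return False
    return True


def is_k_uniform(x_i, k, tol=1e-9):
    """True if every probability in x_i is an integer multiple of 1/k."""
    return all(abs(p * k - round(p * k)) < tol for p in x_i)


# Matching pennies with payoffs rescaled to [0, 1] (illustrative game).
u = [
    lambda a: 1.0 if a[0] == a[1] else 0.0,
    lambda a: 1.0 if a[0] != a[1] else 0.0,
]
uniform = [[0.5, 0.5], [0.5, 0.5]]    # the exact equilibrium
skewed = [[0.75, 0.25], [0.5, 0.5]]   # player 2 can gain 0.25 by deviating
```

In this example the profile `skewed` is $4$-uniform but not $2$-uniform, and it is an $\varepsilon$-equilibrium only for $\varepsilon \ge 0.25$.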
A strategy profile $x = (x_i)_{i \in [n]}$ will be called $k$-uniform if every $x_i$ is $k$-uniform.

We use the notation $f(x) = \mathrm{poly}(g(x))$ if there exists a constant $c$ such that $f(x) \le g(x)^c$ for large enough $x$.

Our Main Theorem states the following:
Theorem 1.
Every $n$-player game with $m$ actions for each player admits a $k$-uniform $\varepsilon$-equilibrium for every
$$k \ge \frac{8(\ln m + \ln n - \ln \varepsilon + \ln 8)}{\varepsilon^2}.$$

Corollary 1.
Let $m = \mathrm{poly}(n)$, and let $N = nm^n$ be the input size of an $n$-player $m$-action normal-form game. For every constant $\varepsilon > 0$, there exists an algorithm for computing an $\varepsilon$-equilibrium in $\mathrm{poly}(N^{\log\log N})$ steps.

Proof of Corollary 1. The number of all the possible $k$-uniform profiles is at most $m^{nk}$. Note that
$$m^{nk} = \mathrm{poly}(m^{n \log n}) = \mathrm{poly}\big((m^n)^{\log\log(m^n)}\big) = \mathrm{poly}(N^{\log\log N}).$$
Therefore the exhaustive search algorithm that searches for an $\varepsilon$-equilibrium over all possible $k$-uniform profiles finds such an $\varepsilon$-equilibrium after at most $\mathrm{poly}(N^{\log\log N})$ iterations.

Proof of Theorem 1.
The proof uses the sampling method. Let $k \ge \frac{8(\ln m + \ln n - \ln \varepsilon + \ln 8)}{\varepsilon^2}$, and let $x = (x_i)_{i \in [n]}$ be an exact equilibrium of the game $u = (u_i)_{i \in [n]}$. For every player $i$, we sample $k$ i.i.d. pure strategies $(b^i_j)_{j \in [k]}$ according to the distribution $x_i$ ($b^i_j \in A_i$). Denote by $s_i$ the uniform distribution over the pure actions $(b^i_j)_{j \in [k]}$. It is enough to show that with positive probability the profile $(s_i)_{i \in [n]}$ forms an $\varepsilon$-equilibrium.

For every player $i$ and strategy $j \in A_i = [m]$, we define a set of forbidden $s$ values:
$$E_{i,j} = \Big\{ s \in \times_{l \in [n]} \Delta(A_l) : |u_i(j, x_{-i}) - u_i(j, s_{-i})| \ge \frac{\varepsilon}{2} \Big\}.$$
Note that almost every realization of $s$ is absolutely continuous with respect to $x$, written $s \ll x$; i.e., the event $\{\mathrm{support}(s) \subset \mathrm{support}(x)\}$ has probability 1. Therefore, it is sufficient to verify that $P(s \notin \cup_{i,j} E_{i,j}) > 0$, because every $s \ll x$ with $s \notin \cup_{i,j} E_{i,j}$ is an $\varepsilon$-equilibrium, by
$$u_i(a_i, s_{-i}) \le u_i(a_i, x_{-i}) + \frac{\varepsilon}{2} \le \sum_{b \in A_i} s_i(b)\, u_i(b, x_{-i}) + \frac{\varepsilon}{2} \le \sum_{b \in A_i} s_i(b)\, u_i(b, s_{-i}) + \varepsilon = u_i(s) + \varepsilon,$$
where the second inequality holds because all the strategies in the support of $s_i$ are in the support of $x_i$, which contains only best replies to $x_{-i}$.

To show that $P(s \in \cup_{i,j} E_{i,j}) < 1$, it is sufficient to show that $P(s \in E_{i,j}) \le \frac{1}{mn}$ because we have $mn$ such events $\{s \in E_{i,j}\}$.

Up to this point, the arguments of the proof are similar to [7] and [5]. The estimation of the probability $P(s \in E_{i,j})$, however, uses more delicate arguments. Let us estimate $P(s \in E_{1,1})$.

We begin by rewriting the payoff of player 1. For every $l \in [k]$, we can write
$$u_1(1, s_{-1}) = \frac{1}{k^{n-1}} \sum_{j_2, j_3, \dots, j_n \in [k]} u_1(1, b^2_{j_2+l}, b^3_{j_3+l}, \dots, b^n_{j_n+l}),$$
where the indexes $j_i + l$ are taken modulo $k$. If we take the average over all possible $l$ we have
$$u_1(1, s_{-1}) = \frac{1}{k^{n-1}} \sum_{j_2, j_3, \dots, j_n \in [k]} \frac{1}{k} \sum_{l \in [k]} u_1(1, b^2_{j_2+l}, b^3_{j_3+l}, \dots, b^n_{j_n+l}). \tag{1}$$
For every initial profile of indexes $j^* = (j_2, j_3, \dots, j_n) \in [k]^{n-1}$ and every $l \in [k]$, we denote $b^{-1}_{j^*+l} := (b^2_{j_2+l}, b^3_{j_3+l}, \dots, b^n_{j_n+l}) \in A_{-1}$, and we define the random variable
$$d(j^*) := \begin{cases} 0 & \text{if } \Big| \frac{1}{k} \sum_{l \in [k]} u_1(1, b^{-1}_{j^*+l}) - u_1(1, x_{-1}) \Big| \le \frac{\varepsilon}{4}, \\ 1 & \text{otherwise.} \end{cases} \tag{2}$$
By the definition of $d(j^*)$, we have
$$d(j^*) + \frac{\varepsilon}{4} \ge \Big| \frac{1}{k} \sum_{l \in [k]} u_1(1, b^{-1}_{j^*+l}) - u_1(1, x_{-1}) \Big|. \tag{3}$$
Note also that for any fixed $j^*$ the random action profiles $b^{-1}_{j^*+1}, \dots, b^{-1}_{j^*+k}$ are independent. Therefore by Hoeffding's inequality (see [6]) we have
$$E(d(j^*)) \le 2e^{-k\varepsilon^2/8}. \tag{4}$$
Using representation (1) of the payoffs and inequalities (3) and (4), we get
$$\begin{aligned} P(s \in E_{1,1}) &= P\Big( \Big| \frac{1}{k^{n-1}} \sum_{j^* \in [k]^{n-1}} \frac{1}{k} \sum_{l \in [k]} u_1(1, b^{-1}_{j^*+l}) - u_1(1, x_{-1}) \Big| \ge \frac{\varepsilon}{2} \Big) \\ &\le P\Big( \frac{1}{k^{n-1}} \sum_{j^* \in [k]^{n-1}} \Big| \frac{1}{k} \sum_{l \in [k]} u_1(1, b^{-1}_{j^*+l}) - u_1(1, x_{-1}) \Big| \ge \frac{\varepsilon}{2} \Big) \\ &\le P\Big( \frac{1}{k^{n-1}} \sum_{j^* \in [k]^{n-1}} d(j^*) \ge \frac{\varepsilon}{4} \Big) \le \frac{8e^{-k\varepsilon^2/8}}{\varepsilon}, \end{aligned} \tag{5}$$
where the last inequality follows from Markov's inequality. Putting $k \ge \frac{8(\ln m + \ln n - \ln \varepsilon + \ln 8)}{\varepsilon^2}$ in inequality (5), we get $P(E_{1,1}) \le \frac{1}{mn}$.

In the sequel it will be convenient to consider the set of $k$-uniform strategies as the set of ordered $k$-tuples of pure actions. To avoid ambiguity we will call those strategies $k$-uniform ordered strategies. Now the number of $k$-uniform ordered profiles is exactly $m^{nk}$.

The algorithm of Corollary 1 suggests that we should search over all the possible $k$-uniform profiles (or $k$-uniform ordered profiles), one by one, until we find an approximate equilibrium. Consider now the case where a large fraction of the $k$-uniform ordered strategies form an approximate equilibrium, say a fraction of $1/r$.
In such a case we can pick $k$-uniform ordered profiles at random, and then we will find the approximate equilibrium in expected time $r$.

Define the $k$-uniform random sampling algorithm ($k$-URS) to be the algorithm described above; i.e., it samples uniformly at random $n$-tuples of $k$-uniform ordered strategies and checks whether this profile forms an $\varepsilon$-equilibrium. (Many $k$-uniform ordered strategies correspond to the same mixed strategy of the player in the game.)

An interesting question arises: For which games does the $k$-URS algorithm find an approximate equilibrium fast? Daskalakis and Papadimitriou [4] focused on two-player games with $m$ actions, and they showed that the $k$-URS algorithm finds an approximate equilibrium after $\mathrm{poly}(m)$ samples for small-probability games. A small-probability game is a game that admits a Nash equilibrium where each pure action is played with probability at most $c/m$ for some constant $c$.

Here we generalize the result of Daskalakis and Papadimitriou to $n$-player games. Instead of focusing on the specific class of small-probability games we establish a general connection between the entropy of equilibria in the game and the expected number of samples of the $k$-URS algorithm until an approximate Nash equilibrium is found.

Theorem 2.
Let $u$ be an $n$-player game with $m$ actions for each player, with a Nash equilibrium $x = (x_i)$. Let k ≥ max{ ε (ln n + ln m − ln ε + 2), e /ε } = $O(\log m + \log n)$; then the $k$-uniform random sampling algorithm finds an $\varepsilon$-equilibrium after at most $4 \cdot 2^{k(n \log m - H(x))}$ samples in expectation, where $H(x)$ is Shannon's entropy of the Nash equilibrium $x$.

The following corollary of this theorem is straightforward.

Corollary 2.
Families of games where $n \log m - \max_{x \in NE} H(x)$ is bounded admit a $\mathrm{poly}(m, n)$ probabilistic algorithm for computing an approximate Nash equilibrium. (Checking whether a strategy profile forms an approximate equilibrium can always be done in $\mathrm{poly}(N)$ time. Actually, it can even be done by using only $\mathrm{poly}(n, m)$ samples from the mixed profile. Using the samples, the answer will be correct with a probability that is exponentially (in $n$ and $m$) close to 1; see, e.g., [3], proof of Theorem 2.)

Indeed, $k = O(\log m + \log n)$, and therefore $4 \cdot 2^{k \cdot O(1)} = \mathrm{poly}(n, m)$.

A special case where $n \log m - H(x)$ is constant is that of small-probability games with a constant number of players $n$.

Corollary 3.
Let $c \ge 1$, and let $u$ be an $n$-player $m$-action game with a Nash equilibrium $x = (x_i)_{i \in [n]}$, where $x_i(a_i) \le \frac{c}{m}$ for all players $i$ and all actions $a_i \in A_i$. Let $k = O(\log m)$, as defined in Theorem 2. Then the expected number of samples of the $k$-URS algorithm is at most $4 \cdot 2^{kn \log c} = \mathrm{poly}(m)$.

The corollary follows from the fact that the entropy of the Nash equilibrium $x$ is $H(x) = \sum_{i \in [n]} H(x_i) \ge n(\log m - \log c)$.

The following example demonstrates that even in the case of two-player games, the class of games that have a PTAS according to Corollary 2 is slightly wider than the class of small-probability games.

Example 1.
Consider a two-player $m$-action game where the equilibrium is $x = (x_1, x_2)$, where $x_1$ is the uniform distribution over all actions, $x_1 = (\frac{1}{m}, \frac{1}{m}, \dots, \frac{1}{m})$, and $x_2 = (\frac{1}{\sqrt{m}}, \frac{1}{m + \sqrt{m}}, \frac{1}{m + \sqrt{m}}, \dots, \frac{1}{m + \sqrt{m}})$. This game is not a small-probability game, but it does satisfy $n \log m - H(x) = o(1)$:
$$2 \log m - H(x) \le \log m - \frac{m - 1}{m + \sqrt{m}} \log(m + \sqrt{m}) \le \frac{\sqrt{m} + 1}{m + \sqrt{m}} \log m = o(1).$$

In the proof of Theorem 2 we use the following lemma from information theory.
Lemma 1.
Let $y$ be a random variable that assumes values in a finite set $M$. Let $S \subset M$ be such that $P(y \in S) \ge 1 - \frac{1}{\log |M|}$; then $|S| \ge \frac{2^{H(y)}}{4}$.

Proof.
$$\begin{aligned} H(y) &= P(y \in S)\, H(y \mid y \in S) + P(y \notin S)\, H(y \mid y \notin S) + H(\{y \in S\}) \\ &\le \log |S| + P(y \notin S) \log |M| + 1 \le \log |S| + 2. \end{aligned}$$

Proof of Theorem 2.
Note that the lower bound on $k$ assumed in Theorem 2 guarantees that
$$\frac{8e^{-k\varepsilon^2/8}}{\varepsilon} \le \frac{1}{mn \cdot nk \log m}.$$
By considering inequality (5) in the proof of Theorem 1, we can see that the above choice of $k$ implies that $P(E_{1,1}) \le \frac{1}{mn \cdot nk \log m}$, which implies that $P(s \in \cup_{i,j} E_{i,j}) \le \frac{1}{nk \log m}$. This means that if we sample $k$-uniform ordered strategy profiles according to the Nash equilibrium $x$, then the resulting $k$-uniform ordered strategies form an $\varepsilon$-equilibrium with a probability of at least $1 - \frac{1}{nk \log m} = 1 - \frac{1}{\log(m^{nk})}$.

Next, using Lemma 1, we provide a lower bound on the number of $k$-uniform profiles that form an $\varepsilon$-equilibrium. The random $k$-uniform profiles are elements of a set of size $m^{nk}$. The entropy of the random $k$-uniform profile is $kH(x)$. The probability that the random profile will form an $\varepsilon$-equilibrium is at least $1 - \frac{1}{\log(m^{nk})}$. Therefore, by Lemma 1, we get that there are at least $\frac{2^{kH(x)}}{4}$ different $k$-uniform profiles that are $\varepsilon$-equilibria.

To conclude, the fraction of the $k$-uniform profiles that form an $\varepsilon$-equilibrium (among all the $k$-uniform profiles) is at least
$$\frac{2^{kH(x)}/4}{m^{nk}} = \frac{1}{4}\, 2^{k(H(x) - n \log m)}.$$
Therefore, the expected time for finding an $\varepsilon$-equilibrium is at most $4 \cdot 2^{k(n \log m - H(x))}$.

Discussion
Having established an upper bound of $O(\log m + \log n)$, it is natural to ask whether it is tight. Althofer [1] provided a lower bound of the order $\log m$ that matches our upper bound in the case where the number of players is not much larger than the number of pure strategies; i.e., $n = \mathrm{poly}(m)$. In general, the tightness of our upper bound remains an open question. A similar question regarding the existence of pure approximate equilibria in Lipschitz games with many players arose in a related work by Azrieli and Shmaya [2].

Let us call games with $n$ players, $m$ actions for each player, and payoffs in $[0,1]$ normalized $n$-player $m$-action games. To pinpoint the limits of our understanding of the problem, consider the following questions.

Question 1.
Is there a function $k : (0,1) \to \mathbb{N}$ ($k$ depends on $\varepsilon$ only, and not on the number of players $n$), such that every normalized $n$-player two-action game admits an $\varepsilon$-equilibrium in which every player employs a mixed strategy whose coefficients are rational numbers with a denominator at most $k(\varepsilon)$?

Question 2.
Is there an $\varepsilon > 0$ and a $C > 0$ such that for every $n, m \in \mathbb{N}$ there exists a normalized $n$-player $m$-action game that does not admit any $\varepsilon$-equilibrium in which every player employs a mixed strategy whose coefficients are rational numbers with a denominator at most $C(\log n + \log m)$?

Note that a positive answer to Question 2 means that our upper bound is tight, whereas a positive answer to Question 1 implies that our upper bound is not tight. A positive answer to Question 1 means that one can find a $k$-uniform approximate equilibrium of the game for a constant $k$ (depending only on $\varepsilon$), which in particular implies that there exists a $\mathrm{poly}(N)$ algorithm for computing an approximate Nash equilibrium in two-action games.

References
[1] Althofer, I. (1994) “On Sparse Approximations to Randomized Strategies and Convex Combinations,”
Linear Algebra and Its Applications.
[2] Azrieli and Shmaya, Mathematics of Operations Research, forthcoming.
[3] Babichenko, Y. (2013) “Query Complexity of Approximate Nash Equilibrium,” arXiv:1306.6686.
[4] Daskalakis, C. and Papadimitriou, C. H. (2009) “On Oblivious PTAS’s for Nash Equilibrium,” Proceedings of the 41st Annual ACM Symposium on Theory of Computing, pp. 75–84.
[5] Hémon, S., Rougemont, M., and Santha, M. (2008) “Approximate Nash Equilibria for Multi-player Games,” Algorithmic Game Theory, Lecture Notes in Computer Science.
[6] Hoeffding, Journal of the American Statistical Association.