Playing Anonymous Games using Simple Strategies

Abstract

We investigate the complexity of computing approximate Nash equilibria in anonymous games. Our main algorithmic result is the following: For any $n$-player anonymous game with a bounded number of strategies and any constant $\delta > 0$, an $O(1/n^{1-\delta})$-approximate Nash equilibrium can be computed in polynomial time. Complementing this positive result, we show that if there exists any constant $\delta > 0$ such that an $O(1/n^{1+\delta})$-approximate equilibrium can be computed in polynomial time, then there is a fully polynomial-time approximation scheme for this problem. We also present a faster algorithm that, for any $n$-player $k$-strategy anonymous game, runs in time $\tilde{O}((n+k)\, k\, n^k)$ and computes an $\tilde{O}(n^{-1/3} k^{11/3})$-approximate equilibrium. This algorithm follows from the existence of simple approximate equilibria of anonymous games, where each player plays one strategy with probability $1-\delta$, for some small $\delta$, and plays uniformly at random with probability $\delta$. Our approach exploits the connection between Nash equilibria in anonymous games and Poisson multinomial distributions (PMDs). Specifically, we prove a new probabilistic lemma establishing the following: Two PMDs, with large variance in each direction, whose first few moments are approximately matching are close in total variation distance. Our structural result strengthens previous work by providing a smooth tradeoff between the variance bound and the number of matching moments.


arXiv [cs.GT], Aug

Yu Cheng∗, Ilias Diakonikolas†, Alistair Stewart‡
University of Southern California

∗ Supported in part by Shang-Hua Teng's Simons Investigator Award.
† Supported by a USC startup fund.
‡ Authors' emails: {yu.cheng.1, diakonik, alistais}@usc.edu

Introduction

Anonymous games are multiplayer games in which the utility of each player depends on her own strategy, as well as the number (as opposed to the identity) of other players who play each of the strategies. Anonymous games comprise an important class of succinct games, well studied in the economics literature (see, e.g., [Mil96, Blo99, Blo05]), that capture a wide range of phenomena frequently arising in practice, including congestion games, voting systems, and auctions.

In recent years, anonymous games have attracted significant attention in TCS [DP07, DP08, DP09, DP15, GT15, CDO15, DDKT16, DKS16a], with a focus on understanding the computational complexity of their (approximate) Nash equilibria. Consider the family of anonymous games where the number of players, $n$, is large and the number of strategies, $k$, is bounded. It was recently shown by Chen et al. [CDO15] that computing an $\epsilon$-approximate Nash equilibrium of these games is PPAD-complete when $\epsilon$ is exponentially small, even for anonymous games with 5 strategies.¹ On the algorithmic side, Daskalakis and Papadimitriou [DP07, DP08] presented the first polynomial-time approximation scheme (PTAS) for this problem, with running time $n^{(1/\epsilon)^{\Omega(k)}}$. For the case of 2 strategies, this bound was improved [DP09, DDS12, DP15] to $\mathrm{poly}(n) \cdot (1/\epsilon)^{O(\log^2(1/\epsilon))}$, and subsequently sharpened to $\mathrm{poly}(n) \cdot (1/\epsilon)^{O(\log(1/\epsilon))}$ in [DKS16b]. In recent work, Daskalakis et al. [DDKT16] and Diakonikolas et al. [DKS16a] generalized the aforementioned results [DP15, DKS16b] to any fixed number $k$ of strategies, obtaining algorithms for computing $\epsilon$-well-supported equilibria with runtime of the form $n^{\mathrm{poly}(k)} \cdot (1/\epsilon)^{k \log(1/\epsilon)^{O(k)}}$. That is, the problem of computing approximate Nash equilibria in anonymous games with a fixed number of strategies admits an efficient polynomial-time approximation scheme (EPTAS).
Moreover, the dependence of the running time on the parameter $1/\epsilon$ is quasi-polynomial, as opposed to exponential.

We note that all the aforementioned algorithmic results are obtained by exploiting a connection between Nash equilibria in anonymous games and Poisson multinomial distributions (PMDs). This connection, formalized in [DP07, DP08], translates constructive upper bounds on $\epsilon$-covers for PMDs into upper bounds on computing $\epsilon$-Nash equilibria in anonymous games (see Section 2 for formal definitions). Unfortunately, as shown in [DDKT16, DKS16a], this "cover-based" approach cannot lead to qualitatively faster algorithms, due to a matching existential lower bound on the size of the corresponding $\epsilon$-covers. In related algorithmic work, Goldberg and Turchetta [GT15] studied two-strategy anonymous games ($k = 2$) and designed a polynomial-time algorithm that computes an $\epsilon$-approximate Nash equilibrium for $\epsilon = \Omega(n^{-1/4})$.

The aforementioned discussion prompts the following natural question: What is the precise approximability of computing Nash equilibria in anonymous games?

In this paper, we make progress on this question by establishing the following result: For any $\delta > 0$, and any $n$-player anonymous game with a constant number of strategies, there exists a $\mathrm{poly}_\delta(n)$-time algorithm² that computes an $\epsilon$-approximate Nash equilibrium of the game, for $\epsilon = 1/n^{1-\delta}$. Moreover, we show that the existence of a polynomial-time algorithm that computes an $\epsilon$-approximate Nash equilibrium for $\epsilon = 1/n^{1+\delta}$, for any small constant $\delta > 0$, would imply an FPTAS for this problem.³ That is, $\epsilon = 1/n$ is the threshold for the polynomial-time approximability of Nash equilibria in anonymous games, unless there is an FPTAS. In the following subsection, we describe our results in detail and provide an overview of our techniques.

¹ [CDO15] showed that computing an equilibrium of 7-strategy anonymous games is PPAD-complete, but 3 of the 7 strategies in their construction can be merged, resulting in a 5-strategy anonymous game.
² The runtime of our algorithm depends exponentially on $1/\delta$.
³ We remind the reader that the algorithms of [DDKT16, DKS16a] run in quasi-polynomial time for any value of $\epsilon$ inverse polynomial in $n$.

We study the following question:

For $n$-player $k$-strategy anonymous games, how small can $\epsilon$ be (as a function of $n$), so that an $\epsilon$-approximate Nash equilibrium can be computed in polynomial time?

Upper Bounds.

We present two different algorithms (Theorems 1.1 and 1.2) for computing approximate Nash equilibria in anonymous games. Both algorithms run in polynomial time and compute $\epsilon$-approximate equilibria for an inverse polynomial $\epsilon$ above a certain threshold.

Theorem 1.1 (Main). For any $\delta > 0$, and any $n$-player $k$-strategy anonymous game, there is a $\mathrm{poly}_{\delta,k}(n)$-time algorithm that computes a $(1/n^{1-\delta})$-approximate equilibrium of the game.

Theorem 1.2.

For any $n$-player $k$-strategy anonymous game, we can compute an $\tilde{O}(n^{-1/3} k^{11/3})$-approximate equilibrium in time $\tilde{O}((n+k)\, k\, n^k)$.

Prior to our work, for $k > 2$, no polynomial-time $\epsilon$-approximation was known for any inverse polynomial $\epsilon$. For $k = 2$, the best previous result is due to [GT15], who gave a polynomial-time algorithm for $\epsilon = \Omega(n^{-1/4})$.

Overview of Techniques.

The high-level idea of our approach is this: If the desired accuracy $\epsilon$ is above a certain threshold, we do not need to enumerate over an $\epsilon$-cover for the set of all PMDs. Our approach is in part inspired by [GT15], who design an algorithm (for $k = 2$ and $\epsilon = \Omega(n^{-1/4})$) in which all players use one of two pre-selected mixed strategies. We note that for $k = 2$, PMDs are tantamount to Poisson Binomial distributions (PBDs), i.e., sums of independent Bernoulli random variables. The [GT15] algorithm can be equivalently interpreted as guessing a PBD from an appropriately small set. One reason this idea succeeds is the following: If every player randomizes, then the variance of the resulting PBD must be relatively high, and as a result the corresponding subset of PBDs has a smaller cover.

Our quantitative improvement for the $k = 2$ case is obtained as follows: Instead of forcing players to select specific mixed strategies, as in [GT15], we show that there always exists an $\epsilon$-approximate equilibrium where the associated PBD has variance at least $\Theta(n\epsilon)$. When $\epsilon = n^{-c}$ for some $c < 1$, this variance is $\Theta(n^{1-c})$, i.e., polynomially large in $n$. We then construct a polynomial-size $\epsilon$-cover for the subset of PBDs with at least this much variance, which leads to a polynomial-time algorithm for computing $\epsilon$-approximate equilibria in 2-strategy anonymous games.

The idea for the general case of $k > 2$ strategies is similar. We proceed as follows: We start by showing that there is an $\epsilon$-approximate equilibrium whose corresponding PMD has a large variance in each direction. Our main structural result is a robust moment-matching lemma (Lemma 3.4), which states that the closeness in low-degree moments of two PMDs, with large variance in each direction, implies their closeness in total variation distance. The proof of this lemma uses Fourier analytic techniques, building on and strengthening previous work [DKS16a]. As a consequence of our moment-matching lemma, we can construct a polynomial-size $(\epsilon/5)$-cover for high-variance PMDs and use it to search for an $\epsilon$-approximate equilibrium, using a dynamic programming approach similar to the one in [DP15].

We now provide a brief intuition for our moment-matching lemma. If the two PMDs in question are both very close to discrete Gaussians, then closeness in the first two moments is sufficient. Lemma 3.4 can be viewed as a generalization of this intuition: it gives a quantitative tradeoff between the number of moments we need to approximately match and the size of the variance. The proof of Lemma 3.4 exploits the sparsity of the Fourier transform of our PMDs, and the fact that higher variance allows us to take fewer terms in the Taylor expansion when we use moments to approximate the logarithm of the Fourier transform. This completes the proof sketch of Theorem 1.1.

Our second algorithm (Theorem 1.2) addresses the need to play simple strategies. Players tend to favor simple strategies, which are easier to learn and implement, even if these strategies might have slightly sub-optimal payoffs [Sim82]. In addition, our algorithm is significantly faster in this case. We build on the idea of [GT15] to "smooth" an anonymous game by forcing all the players to randomize.
We prove that the perturbed game is Lipschitz and therefore admits an approximate pure Nash equilibrium, which corresponds to a simple approximate equilibrium of a specific form in the original game: Each player plays one strategy with probability $1 - \delta$ for some small $\delta$, and plays the other strategies uniformly at random with probability $\delta$. To prove that the perturbed game is Lipschitz, we make essential use of the recently established multivariate central limit theorem (CLT) of Daskalakis et al. [DDKT16] and Diakonikolas, Kane and Stewart [DKS16a] to show that if we add a little more noise (corresponding to $\delta = \Theta(n^{-1/3})$), the associated PMD is sufficiently close to a discrete Gaussian.

Lower Bounds.

When $\epsilon = 1/n$, we can show that there is an $\epsilon$-approximate equilibrium where the associated PMD has variance at least $1/k$ in every direction. Unfortunately, the PMDs in the explicit quasi-polynomial-size lower bounds given in [DDKT16, DKS16a] also satisfy this property. Thus, we need a different approach to get a polynomial-time algorithm for $\epsilon = 1/n$ or smaller. In fact, we prove the following result, which states that even a slight improvement of our upper bound in Theorem 1.1 would imply an FPTAS for computing Nash equilibria in anonymous games. It is important to note that Theorem 1.3 applies to all algorithms, not only the ones that leverage the structure of PMDs.

Theorem 1.3.

For $n$-player $k$-strategy anonymous games with $k = O(1)$, if we can compute an $O(n^{-c})$-approximate equilibrium in polynomial time for some constant $c > 1$, then there is an FPTAS⁴ for computing (well-supported) Nash equilibria of $k$-strategy anonymous games.

Remark.

As observed in [DDKT16], because there is a quasi-polynomial-time algorithm for computing an $n^{-c}$-approximate equilibrium in anonymous games, the problem cannot be PPAD-complete unless PPAD ⊆ Quasi-PTIME. On the other hand, we do not know how to improve the quasi-polynomial-time upper bounds of [DDKT16, DKS16a] when $\epsilon < 1/n$.

Recall that computing an $\epsilon$-approximate equilibrium of a two-player general-sum $n \times n$ game (2-NASH) for constant $\epsilon$ also admits a quasi-polynomial-time algorithm [LMM03]. Very recently, it was shown that, assuming the Exponential Time Hypothesis for PPAD, there exists a constant $\epsilon > 0$ for which quasi-polynomial time is necessary to compute an $\epsilon$-approximate equilibrium of 2-NASH. It is a plausible conjecture that quasi-polynomial time is also required for $\epsilon$-Nash equilibria in anonymous games when $\epsilon = n^{-c}$ for some constant $c > 1$. In particular, this would imply that there is no FPTAS for computing approximate Nash equilibria in anonymous games, and consequently that the upper bound of Theorem 1.1 is essentially tight.

⁴ A fully polynomial-time approximation scheme (FPTAS) is an algorithm that runs in time $\mathrm{poly}(n, 1/\epsilon)$ and returns an $\epsilon$-optimal solution or, in our context, an $\epsilon$-approximate Nash equilibrium.

Anonymous Games.

We study anonymous games $(n, k, \{u^i_a\}_{i \in [n], a \in [k]})$ with $n$ players labeled by $[n] = \{1, \ldots, n\}$, and $k$ common strategies labeled by $[k]$ for each player. The payoff of a player depends on her own strategy, and on how many of her peers choose which strategy, but not on their identities. When player $i \in [n]$ plays strategy $a \in [k]$, her payoffs are given by a function $u^i_a$ that maps the possible outcomes (partitions of all other players) $\Pi^k_{n-1}$ to the interval $[0, 1]$, where $\Pi^k_{n-1} = \{(x_1, \ldots, x_k) \mid x_j \in \mathbb{Z}_{\ge 0} \wedge \sum_{j=1}^k x_j = n - 1\}$.

Approximate Equilibria.

We denote by $\Delta_S$ the set of distributions on the set $S$. A mixed strategy is an element of $\Delta_{[k]}$, and a mixed strategy profile $s = (s_1, \ldots, s_n)$ maps every player $i$ to her mixed strategy $s_i \in \Delta_{[k]}$. We use $s_{-i}$ to denote the strategies of players other than $i$ in $s$.

A mixed strategy profile $s$ is an $\epsilon$-approximate Nash equilibrium for some $\epsilon \ge 0$ if
$$\forall i \in [n],\ \forall a' \in [k]: \quad \mathbb{E}_{x \sim s_{-i}}\big[u^i_{a'}(x)\big] \le \mathbb{E}_{x \sim s_{-i},\, a \sim s_i}\big[u^i_a(x)\big] + \epsilon,$$
where $x \in \Pi^k_{n-1}$ is the partition formed by $n - 1$ random samples (independently) drawn from $[k]$ according to the distributions $s_{-i}$. Note that given a mixed strategy profile $s$, we can compute a player's expected payoff to precision $\epsilon$ in time $\mathrm{poly}(n^k \log(1/\epsilon))$ by straightforward dynamic programming, and hence throughout this paper we assume that we can compute players' payoffs exactly given their mixed strategies.

Poisson Multinomial Distributions. A $k$-Categorical Random Variable ($k$-CRV) is a vector random variable supported on the set of $k$-dimensional basis vectors $\{e_1, \ldots, e_k\}$. A $k$-CRV is $i$-maximal if $e_i$ is its most likely outcome (ties are broken by taking the smallest index $i$). A $k$-Poisson Multinomial Distribution of order $n$, or an $(n, k)$-PMD, is a vector random variable of the form $X = \sum_{i=1}^n X_i$, where the $X_i$'s are independent $k$-CRVs. The case $k = 2$ is usually referred to as a Poisson Binomial Distribution (PBD).

Note that a mixed strategy profile $s = (s_1, \ldots, s_n)$ of an $n$-player $k$-strategy anonymous game corresponds to the $k$-CRVs $\{X_1, \ldots, X_n\}$ where $\Pr[X_i = e_a] = s_i(a)$. The expected payoff of player $i \in [n]$ for playing pure strategy $a \in [k]$ can also be written as $\mathbb{E}[u^i_a(X_{-i})] = \mathbb{E}\big[u^i_a\big(\sum_{j \ne i, j \in [n]} X_j\big)\big]$.

Let $X = \sum_{i=1}^n X_i$ be an $(n, k)$-PMD, and for $i \in [n]$ and $j \in [k]$ denote $p_{i,j} = \Pr[X_i = e_j]$, where $\sum_{j=1}^k p_{i,j} = 1$. For $m = (m_1, \ldots, m_k) \in \mathbb{Z}_+^k$, we define the $m$-th parameter moment of $X$ to be $M_m(X) \stackrel{\text{def}}{=} \sum_{i=1}^n \prod_{j=1}^k p_{i,j}^{m_j}$.
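The dynamic program mentioned above can be sketched in Python as follows. This is our own illustration, not the authors' code, and it uses floating-point payoffs rather than tracking the precision $\epsilon$. A mixed strategy is a list of $k$ floats, and `u(i, a, x)` is a hypothetical payoff callable that receives the count vector `x` of the other players. The pmf of $X_{-i}$ is built by adding one $k$-CRV at a time, so at most $n^k$ count vectors are ever stored. The same helpers check the $\epsilon$-approximate Nash condition and evaluate the parameter moment $M_m(X)$.

```python
from math import prod

def pmd_distribution(crvs, k):
    """Exact pmf of X = X_1 + ... + X_m for independent k-CRVs, by dynamic programming.

    crvs: list of length-k probability vectors p_i = (p_{i,1}, ..., p_{i,k}).
    Returns {count vector x: Pr[X = x]}; there are at most (m + 1)^k keys.
    """
    pmf = {(0,) * k: 1.0}
    for p in crvs:
        nxt = {}
        for x, mass in pmf.items():
            for j in range(k):
                if p[j] > 0:
                    y = x[:j] + (x[j] + 1,) + x[j + 1:]
                    nxt[y] = nxt.get(y, 0.0) + mass * p[j]
        pmf = nxt
    return pmf

def expected_payoff(u, profile, i, a):
    """E[u^i_a(X_{-i})]: player i plays pure strategy a, the others follow profile."""
    k = len(profile[i])
    others = profile[:i] + profile[i + 1:]
    return sum(mass * u(i, a, x) for x, mass in pmd_distribution(others, k).items())

def is_eps_nash(u, profile, eps):
    """Check the epsilon-approximate Nash condition for every player i and deviation a'."""
    for i, s in enumerate(profile):
        pay = [expected_payoff(u, profile, i, a) for a in range(len(s))]
        mixed = sum(s[a] * pay[a] for a in range(len(s)))  # E_{x ~ s_-i, a ~ s_i}[u^i_a(x)]
        if max(pay) > mixed + eps:
            return False
    return True

def parameter_moment(crvs, m):
    """M_m(X) = sum_i prod_j p_{i,j}^{m_j}."""
    return sum(prod(p[j] ** m[j] for j in range(len(m))) for p in crvs)
```

For instance, with three players and the coordination payoff `u = lambda i, a, x: x[a] / 2`, the profile in which everyone plays strategy 0 passes `is_eps_nash` with `eps = 0`, while a profile in which player 0 is the only one on strategy 0 fails it for `eps = 0.5`.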
We refer to $\|m\|_1 = \sum_{j=1}^k m_j$ as the degree of the parameter moment $M_m(X)$.

Total Variation Distance and Covers. The total variation distance between two distributions $P$ and $Q$ supported on a finite domain $A$ is
$$d_{\mathrm{TV}}(P, Q) := \max_{S \subseteq A} |P(S) - Q(S)| = (1/2) \cdot \|P - Q\|_1.$$
If $X$ and $Y$ are two random variables ranging over a finite set, their total variation distance $d_{\mathrm{TV}}(X, Y)$ is defined as the total variation distance between their distributions. For convenience, we will often blur the distinction between a random variable and its distribution.

Let $(\mathcal{X}, d)$ be a metric space. Given $\epsilon > 0$, a subset $\mathcal{Y} \subseteq \mathcal{X}$ is said to be a proper $\epsilon$-cover of $\mathcal{X}$ with respect to the metric $d: \mathcal{X} \times \mathcal{X} \to \mathbb{R}_+$ if for every $X \in \mathcal{X}$ there exists some $Y \in \mathcal{Y}$ such that $d(X, Y) \le \epsilon$. In this work, we will be interested in constructing $\epsilon$-covers for high-variance PMDs under the total variation distance metric.

Multidimensional Fourier Transform.

For $x \in \mathbb{R}$, we will denote $e(x) \stackrel{\text{def}}{=} \exp(-2\pi i x)$. The (continuous) Fourier transform of a function $F: \mathbb{Z}^k \to \mathbb{C}$ is the function $\hat{F}: [0, 1]^k \to \mathbb{C}$ defined as $\hat{F}(\xi) = \sum_{x \in \mathbb{Z}^k} e(\xi \cdot x) F(x)$. For the case that $F$ is a probability mass function, we can equivalently write $\hat{F}(\xi) = \mathbb{E}_{x \sim F}[e(\xi \cdot x)]$.

Let $X = \sum_{i=1}^n X_i$ be an $(n, k)$-PMD with $p_{i,j} \stackrel{\text{def}}{=} \Pr[X_i = e_j]$. To avoid clutter in the notation, we will sometimes use the symbol $X$ to denote the corresponding probability mass function. With this convention, we can write $\hat{X}(\xi) = \prod_{i=1}^n \hat{X}_i(\xi) = \prod_{i=1}^n \sum_{j=1}^k e(\xi_j)\, p_{i,j}$.

In this section, we present a polynomial-time algorithm that, for $n$-player anonymous games with a bounded number of strategies, computes an $\epsilon$-approximate equilibrium with $\epsilon = n^{-c}$ for any constant $c < 1$. As a warm-up, we start by describing the simpler setting of two-strategy anonymous games ($k = 2$). The main result of this section is Theorem 1.1, which applies to general $k$-strategy anonymous games for any constant $k \ge 2$. We first show that there exist $\epsilon$-approximate Nash equilibria in which the corresponding PMDs have high variance and every player randomizes (Lemma 3.1). We then use our robust moment-matching lemma (Lemma 3.4) to show that when two PMDs have high variance, the closeness of their constant-degree parameter moments implies their closeness in total variation distance. The fact that matching the constant-degree moments suffices allows us to construct a polynomial-size $(\epsilon/5)$-cover for high-variance PMDs, and to use it to compute an $\epsilon$-approximate equilibrium (Algorithm 2).

Lemma 3.1.

For an $n$-player $k$-strategy anonymous game, there always exists an $\epsilon$-approximate equilibrium where every player plays each strategy with probability at least $\epsilon/(k-1)$.

Proof. Given an anonymous game $G = (n, k, \{u^i_a\}_{i \in [n], a \in [k]})$, we smooth players' utility functions by requiring every player to randomize. Fix $\epsilon > 0$; we define an $\epsilon$-perturbed game $G_\epsilon$ as follows. When a player plays some pure strategy $a \in [k]$ in $G_\epsilon$, we map it back to the original game as if she plays strategy $a$ with probability $1 - \epsilon$, and plays some other strategy $a' \ne a$ uniformly at random (i.e., she plays each $a'$ with probability $\epsilon/(k-1)$). Her payoff in $G_\epsilon$ also accounts for this perturbation, and is defined to be her expected payoff given that all the players (including herself) deviate to other strategies uniformly at random with probability $\epsilon$.

Formally, let $X_\epsilon(e_j)$ denote the $k$-CRV that takes value $e_j$ with probability $1 - \epsilon$, and takes value $e_{j'}$ with probability $\epsilon/(k-1)$ for each $j' \ne j$. The payoff structure of $G_\epsilon$ is given by
$$u'^i_a(x) := (1 - \epsilon)\, \mathbb{E}\big[u^i_a(M_\epsilon(x))\big] + \frac{\epsilon}{k-1} \sum_{a' \ne a} \mathbb{E}\big[u^i_{a'}(M_\epsilon(x))\big], \quad \forall i \in [n],\ a \in [k],\ x \in \Pi^k_{n-1},$$
where $M_\epsilon(x) = \sum_{j \in [k]} x_j X_\epsilon(e_j)$ (a sum of $x_j$ independent copies of $X_\epsilon(e_j)$ for each $j$) is an $(n-1, k)$-PMD that corresponds to the perturbed outcome of the partition $x \in \Pi^k_{n-1}$ of all other players.

Let $s' = (s'_1, \ldots, s'_n)$ denote any exact Nash equilibrium of $G_\epsilon$. We can interpret this mixed strategy profile in $G$ equivalently as $s = (s_1, \ldots, s_n)$, where $s_i = \big(1 - \frac{k\epsilon}{k-1}\big) s'_i + \frac{\epsilon}{k-1} \mathbf{1}$ and $\mathbf{1} = (1, \ldots, 1)$. Since $s'$ is an exact equilibrium of $G_\epsilon$, in $s$ each player has no incentive to deviate to the mixed strategies $X_\epsilon(e_j)$ for all $j \in [k]$. Moreover, the pure strategy $e_j$ and the mixed strategy $X_\epsilon(e_j)$ are at total variation distance $\epsilon$, so a player can gain at most $\epsilon$ by deviating to pure strategies in $G$. Hence $s$ is an $\epsilon$-approximate equilibrium with $s_i(j) \ge \epsilon/(k-1)$ for all $i \in [n]$, $j \in [k]$.

Warm-up: The Case of $k = 2$ Strategies.

For two-strategy anonymous games ($k = 2$), if all the players put at least $\epsilon$ probability mass on both strategies, the resulting PBD has variance at least $n\epsilon(1 - \epsilon)$. When $\epsilon = n^{-c}$ for some constant $c < 1$, the variance is at least $\Theta(n^{1-c}) = n^{\Theta(1)}$. We can now use the following lemma from [DKS16c], which states that if two PBDs $P$ and $Q$ are close in the first few moments, then $P$ and $Q$ are $\epsilon$-close in total variation distance. Note that without any assumption on the variance of the PBDs, we would need to check the first $O(\log(1/\epsilon))$ moments, but when the variance is $n^{\Omega(1)}$, which is the case in our application, we only need the first constant number of moments to match.
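The following Python sketch (our own illustration; all helper names are hypothetical) checks two facts used so far. First, the map from a mixed strategy $s'_i$ of the perturbed game $G_\epsilon$ back to $G$ in the proof of Lemma 3.1, $s_i = (1 - \frac{k\epsilon}{k-1}) s'_i + \frac{\epsilon}{k-1}\mathbf{1}$, agrees with mixing the $k$-CRVs $X_\epsilon(e_a)$ with weights $s'_i(a)$ and puts at least $\epsilon/(k-1)$ on every strategy. Second, a PBD whose parameters all lie in $[\epsilon, 1-\epsilon]$ has variance at least $n\epsilon(1-\epsilon)$, because $p(1-p)$ is smallest at the endpoints of that interval. It assumes $k \ge 2$ and $\epsilon \le (k-1)/k$, so that the mixture weights are nonnegative.

```python
import random

def perturb(s_prime, eps):
    """Closed form from Lemma 3.1: s = (1 - k*eps/(k-1)) * s' + eps/(k-1) * ones.

    Assumes k >= 2 and eps <= (k-1)/k, so the result is a probability vector.
    """
    k = len(s_prime)
    return [(1 - k * eps / (k - 1)) * q + eps / (k - 1) for q in s_prime]

def perturb_direct(s_prime, eps):
    """Same map, computed by mixing the k-CRVs X_eps(e_a) with weights s'(a)."""
    k = len(s_prime)
    s = [0.0] * k
    for a, weight in enumerate(s_prime):
        for j in range(k):
            # X_eps(e_a) puts 1 - eps on e_a and eps/(k-1) on every other e_j
            s[j] += weight * ((1 - eps) if j == a else eps / (k - 1))
    return s

def pbd_variance(ps):
    """Variance of a Poisson Binomial distribution with success probabilities ps."""
    return sum(p * (1 - p) for p in ps)

def random_mixed_strategy(k: int, rng: random.Random):
    """Random weights normalized into a probability vector of length k."""
    w = [rng.random() for _ in range(k)]
    total = sum(w)
    return [x / total for x in w]
```

For example, with $n = 10^4$ players and $c = 1/2$ we get $\epsilon = 10^{-2}$, so any such PBD has variance at least $10^4 \cdot 0.01 \cdot 0.99 = 99$, which grows like $n^{1-c}$.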

Lemma 3.2 ([DKS16c]). Let $\epsilon > 0$. Let $P$ and $Q$ be $n$-PBDs with $P$ having parameters $p_1, \ldots, p_s \le 1/2$ and $p'_1, \ldots, p'_{s'} > 1/2$, and $Q$ having parameters $q_1, \ldots, q_s \le 1/2$ and $q'_1, \ldots, q'_{s'} > 1/2$. Suppose that $V = \mathrm{Var}[P] + 1 = \Theta(\mathrm{Var}[Q] + 1)$, and let $C > 0$ be a sufficiently large constant. Suppose furthermore that for $A = C\sqrt{\log(1/\epsilon)/V}$ and for all positive integers $\ell$ it holds
$$A^\ell \left( \left| \sum_{i=1}^{s} p_i^\ell - \sum_{i=1}^{s} q_i^\ell \right| + \left| \sum_{i=1}^{s'} (1 - p'_i)^\ell - \sum_{i=1}^{s'} (1 - q'_i)^\ell \right| \right) < \frac{\epsilon}{C \log(1/\epsilon)}. \qquad (1)$$
Then $d_{\mathrm{TV}}(P, Q) < \epsilon$.

Let $\epsilon = n^{-c}$. For Lemma 3.2 we have $V \ge n\epsilon(1 - \epsilon)$ and $A = \Theta\big(\sqrt{\log(1/\epsilon)/V}\big) = O\big(\sqrt{\log n / n^{1-c}}\big)$. The difference in the parameter moments of $P$ and $Q$ in Equation (1) is bounded from above by $n$, so whenever $\ell > 2(1+c)/(1-c)$, the condition in Lemma 3.2 is automatically satisfied for sufficiently large $n$, because
$$A^\ell n = O\left( \frac{\log^{\ell/2} n}{n^{(1-c)\ell/2}} \cdot n \right) < \frac{1}{C \cdot n^c \cdot c \log n} = \frac{\epsilon}{C \log(1/\epsilon)}.$$
So it is enough to search over the first $\ell = \Theta(1/(1-c))$ moments when each player puts probability at least $\Omega(n^{-c})$ on both strategies. The algorithm for finding such an $\epsilon$-approximate equilibrium uses moment search and dynamic programming, and is given for the case of general $k$ in the remainder of this section.

The General Case: $k$ Strategies.

We now present our algorithm for $n$-player anonymous games with $k > 2$ strategies. The approach for the $k = 2$ case carries over to the general case, but the details are more elaborate. First, we show (Claim 3.3) that there exists an $\epsilon$-approximate equilibrium whose corresponding PMD has variance at least $n\epsilon/k$ in all directions orthogonal to the all-ones vector $\mathbf{1} = (1, \ldots, 1)$. We then use the robust moment-matching lemma (Lemma 3.4) to build a polynomial-size cover of such PMDs, over which we search for $\epsilon$-approximate Nash equilibria (Algorithm 2).

We first prove that when all the players put probability at least $\epsilon/(k-1)$ on each strategy, the covariance matrix of the resulting PMD has relatively large eigenvalues, except for the zero eigenvalue associated with the all-ones eigenvector. The all-ones eigenvector has eigenvalue zero because the coordinates of $X$ always sum to $n$.

Claim 3.3.

Let $X = \sum_{i=1}^n X_i$ be an $(n, k)$-PMD and let $\Sigma$ be the covariance matrix of $X$. If $p_{i,j} = \Pr[X_i = e_j] \ge \epsilon/(k-1)$ for all $i \in [n]$ and $j \in [k]$, then all eigenvalues of $\Sigma$ but one are at least $n\epsilon/(k-1)$.

Proof. Consider any unit vector $v \in \mathbb{R}^k$ that is orthogonal to the all-ones vector $\mathbf{1}$, i.e., $\sum_j v_j = 0$ and $\sum_j v_j^2 = 1$. Combining this with the assumption that $p_{i,j} \ge \epsilon/(k-1)$, we have
$$\begin{aligned}
\mathrm{Var}[v^T X_i] &= \mathbb{E}\Big[\big(v^T X_i - \mathbb{E}[v^T X_i]\big)^2\Big] = \sum_{j=1}^k p_{i,j} \Big(v_j - \sum_{j'=1}^k p_{i,j'} v_{j'}\Big)^2 \\
&\ge \min_j \{p_{i,j}\} \cdot \sum_{j=1}^k \Big(v_j^2 + \Big(\sum_{j'=1}^k p_{i,j'} v_{j'}\Big)^2 - 2 v_j \sum_{j'=1}^k p_{i,j'} v_{j'}\Big) \\
&= \min_j \{p_{i,j}\} \cdot \Big(1 + k \Big(\sum_{j'=1}^k p_{i,j'} v_{j'}\Big)^2\Big) \ge \frac{\epsilon}{k-1}.
\end{aligned}$$
Therefore,
$$v^T \Sigma v = \mathrm{Var}[v^T X] = \sum_{i=1}^n \mathrm{Var}[v^T X_i] \ge \frac{n\epsilon}{k-1}.$$
Since $\Sigma$ is symmetric and $\mathbf{1}$ is an eigenvector, the remaining eigenvectors can be taken orthogonal to $\mathbf{1}$; for every such unit eigenvector $v$ with eigenvalue $\lambda$, we have $v^T \Sigma v = \lambda v^T v = \lambda \ge n\epsilon/(k-1)$, as claimed.

The following robust moment-matching lemma provides a bound on how close the degree-$\ell$ moments need to be so that two $(n, k)$-PMDs are $\epsilon$-close to each other, under the assumption that $n \gg k$ (the anonymous game has many players and few strategies) and $p_{i,j} \ge \epsilon/(k-1)$ (every player randomizes). Lemma 3.4 allows us to build a polynomial-size cover of high-variance PMDs, so that if there exists an approximate equilibrium with high variance, we are guaranteed to find one in our cover.

Lemma 3.4. Fix $0 < c < 1$ and let $\epsilon = n^{-c}$. Assume that $n = k^{\Omega(k)}$ for some sufficiently large constant in the exponent. Let $X, Y$ be $(n, k)$-PMDs with $X = \sum_{i=1}^k X^i$, $Y = \sum_{i=1}^k Y^i$, where each $X^i, Y^i$ is an $i$-maximal PMD. Let $\Sigma$ and $\Sigma'$ denote the covariance matrices of $X$ and $Y$ respectively. If all eigenvalues of $\Sigma, \Sigma'$ but one are at least $\epsilon n/k$, and for all $i \in [k]$ and $\ell \le 2(1+c)/(1-c)$ all the parameter moments $m$ of degree $\ell$ satisfy $|M_m(X^i) - M_m(Y^i)| \le n^{-c}$, then $d_{\mathrm{TV}}(X, Y) \le \epsilon$.

Lemma 3.4 follows from the next proposition, whose proof is given in the following subsection.
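Claim 3.3 can also be checked numerically. The pure-Python sketch below is our own illustration (the sampling helpers are hypothetical, not from the paper): it builds the covariance matrix $\Sigma = \sum_i (\mathrm{diag}(p_i) - p_i p_i^T)$ of a random $(n,k)$-PMD whose probabilities are all at least $\epsilon/(k-1)$, and evaluates $v^T \Sigma v$ on random unit vectors orthogonal to $\mathbf{1}$, which the proof of Claim 3.3 bounds below by $n\epsilon/(k-1)$. It also confirms that $\mathbf{1}^T \Sigma \mathbf{1} = 0$, the zero eigenvalue discussed above.

```python
import math
import random

def pmd_covariance(crvs):
    """Covariance of X = sum of independent k-CRVs: sum_i (diag(p_i) - p_i p_i^T)."""
    k = len(crvs[0])
    sigma = [[0.0] * k for _ in range(k)]
    for p in crvs:
        for a in range(k):
            for b in range(k):
                sigma[a][b] += (p[a] if a == b else 0.0) - p[a] * p[b]
    return sigma

def quadratic_form(sigma, v):
    """v^T Sigma v."""
    k = len(v)
    return sum(v[a] * sigma[a][b] * v[b] for a in range(k) for b in range(k))

def random_crv_with_floor(k, floor, rng: random.Random):
    """Random probability vector whose entries are all >= floor (requires k * floor <= 1)."""
    w = [rng.random() for _ in range(k)]
    total = sum(w)
    return [floor + (1 - k * floor) * x / total for x in w]

def random_unit_orthogonal_to_ones(k, rng: random.Random):
    """Random unit vector v with sum_j v_j = 0."""
    v = [rng.gauss(0.0, 1.0) for _ in range(k)]
    mean = sum(v) / k
    v = [x - mean for x in v]  # project out the all-ones direction
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]
```

Sampling directions only spot-checks the bound, whereas the claim covers every direction orthogonal to $\mathbf{1}$; the sketch is meant as a sanity check of the constants, not as a proof.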

Proposition 3.5.

Let $\epsilon > 0$. Let $X, Y$ be $(n, k)$-PMDs with $X = \sum_{i=1}^k X^i$, $Y = \sum_{i=1}^k Y^i$, where each $X^i, Y^i$ is an $i$-maximal PMD. Let $\Sigma$ and $\Sigma'$ denote the covariance matrices of $X$ and $Y$ respectively, where all eigenvalues of $\Sigma$ and $\Sigma'$ but one are at least $\sigma^2$, with $\sigma \ge \mathrm{poly}(k \log(1/\epsilon))$. Suppose that for $1 \le i \le k$ and $\ell \ge 1$, all moments $m$ of degree $\ell$ with $m_i = 0$ satisfy
$$\left| M_m(X^i) - M_m(Y^i) \right| \le \frac{\epsilon \cdot \sigma^\ell}{C'^{\,k+\ell} \cdot k^{3\ell/2} \cdot \log^{k+\ell/2}(1/\epsilon)} \qquad (2)$$
for a sufficiently large constant $C'$. Then $d_{\mathrm{TV}}(X, Y) \le \epsilon$.

The proof of Proposition 3.5 exploits the sparsity of the continuous Fourier transform of our PMDs, as well as careful Taylor approximations of the logarithm of the Fourier transform.
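As a small illustration of the Fourier-analytic setup from Section 2 (and not of the proof itself), the sketch below evaluates $\hat{X}(\xi) = \prod_{i=1}^n \sum_{j=1}^k e(\xi_j) p_{i,j}$ with $e(x) = \exp(-2\pi i x)$ and compares it with the definition $\mathbb{E}[e(\xi \cdot X)]$ computed from the exact pmf. It also checks a consequence of the coordinates of $X$ summing to $n$: shifting $\xi$ along the all-ones direction only multiplies the transform by a unimodular factor, $\hat{X}(\xi + t\mathbf{1}) = e(tn)\,\hat{X}(\xi)$. All function names are our own.

```python
import cmath
import math
import random

def e(x):
    """e(x) = exp(-2*pi*i*x)."""
    return cmath.exp(-2j * math.pi * x)

def fourier_product(crvs, xi):
    """hat X(xi) = prod_i sum_j e(xi_j) p_{i,j} for X = sum of independent k-CRVs."""
    value = 1 + 0j
    for p in crvs:
        value *= sum(e(xi_j) * p_j for xi_j, p_j in zip(xi, p))
    return value

def fourier_from_pmf(crvs, xi):
    """hat X(xi) = E[e(xi . X)], with the pmf of X built by dynamic programming."""
    k = len(xi)
    pmf = {(0,) * k: 1.0}
    for p in crvs:
        nxt = {}
        for x, mass in pmf.items():
            for j in range(k):
                y = x[:j] + (x[j] + 1,) + x[j + 1:]
                nxt[y] = nxt.get(y, 0.0) + mass * p[j]
        pmf = nxt
    return sum(mass * e(sum(a * b for a, b in zip(xi, x))) for x, mass in pmf.items())

def random_crvs(n, k, rng: random.Random):
    """n random probability vectors of length k."""
    out = []
    for _ in range(n):
        w = [rng.random() for _ in range(k)]
        total = sum(w)
        out.append([x / total for x in w])
    return out
```

The product form costs $O(nk)$ per evaluation, while the pmf-based form enumerates up to $(n+1)^k$ outcomes; this gap is one reason Fourier-side arguments are convenient for PMDs.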

Proof of Lemma 3.4 from Proposition 3.5.

To guarantee that $d_{\mathrm{TV}}(X, Y) \le \epsilon$, it suffices by Proposition 3.5, with the eigenvalue bound $\epsilon n/k$, that the following condition holds for a sufficiently large constant $C'$:
$$\left| M_m(X^i) - M_m(Y^i) \right| \le \frac{\epsilon}{k\,(C' \log(1/\epsilon))^k} \cdot \left( \frac{\sqrt{\epsilon n/k}}{C' k^{3/2} \log^{1/2}(1/\epsilon)} \right)^\ell, \quad \forall i \in [k],\ \ell \ge 1. \qquad (3)$$
To prove the lemma, we use the fact that $n \gg k$ and essentially ignore all the terms except polynomials of $n$. Formally, we first need to show that
$$\frac{\epsilon}{k\,(C' \log(1/\epsilon))^k} \cdot \left( \frac{\sqrt{\epsilon n/k}}{C' k^{3/2} \log^{1/2}(1/\epsilon)} \right)^\ell \ge n^{-c}, \quad \forall \ell \ge 1,$$
under the assumption that $c < 1$, $\epsilon = n^{-c}$, and $n \ge k^{O(k/(1-c))}$. After substituting $\epsilon = n^{-c}$, observe that $n^{1-c} > C'^2 k^4 \log n$, so the term inside the $\ell$-th power is greater than 1. Thus, we only need to check this inequality for $\ell = 1$, where it is implied by $n^{1-c} \ge C'^{2k+2} k^6 (\log n)^{2k+1}$, which holds.

In addition, we need to show that condition (3) holds automatically for $\ell > 2(1+c)/(1-c)$. This follows from the fact that the difference in parameter moments is at most $n$ and $n \gg k$:
$$\left| M_m(X^i) - M_m(Y^i) \right| \le n \le \frac{\epsilon}{k\,(C' \log(1/\epsilon))^k} \cdot \left( \frac{\sqrt{\epsilon n/k}}{C' k^{3/2} \log^{1/2}(1/\epsilon)} \right)^\ell, \quad \forall \ell > \frac{2(1+c)}{1-c}.$$

We recall some of the notation for readability before we describe the construction of our $\epsilon$-cover of high-variance PMDs. We use $X$ to denote a generic $(\ell, k)$-PMD for some $\ell \in [n]$, and we denote $p_{i,j} = \Pr[X_i = e_j]$. We use $A_t \subseteq [\ell]$ to denote the set of $t$-maximal CRVs in $X$, where a $k$-CRV is $t$-maximal if $e_t$ is its most likely outcome, and we use $X^t = \sum_{i \in A_t} X_i$ to denote the $t$-maximal component PMD of $X$. For a vector $m = (m_1, \ldots, m_k) \in \mathbb{Z}_+^k$, we define the $m$-th parameter moment of $X^t$ to be $M_m(X^t) = \sum_{i \in A_t} \prod_{j=1}^k p_{i,j}^{m_j}$. We refer to $\|m\|_1 = \sum_{j=1}^k m_j$ as the degree of $M_m(X^t)$.
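The notation just recalled, the data vectors $D(\cdot)$ defined in Lemma 3.6, and the dynamic program GenerateData (Algorithm 1) can be sketched in Python as follows. This is a hypothetical illustration under our own choices: probabilities are floats, data entries are stored as integer multiples of the grid $n^{-c}/n$ so that exact set membership works, and the tracked moments are all $m$ with $\|m\|_1 \le$ `max_degree`, a parameter we introduce to stand for the $O(1/(1-c))$ degree bound. Each data vector keeps one witness list of CRVs, mirroring the "keep track of an $(\ell, k)$-PMD" step.

```python
from itertools import product
from math import prod

def t_maximal_index(p):
    """Index t of the most likely outcome e_t (ties broken by the smallest index)."""
    return max(range(len(p)), key=lambda j: (p[j], -j))

def tracked_moments(k, max_degree):
    """All m in Z_+^k with |m|_1 <= max_degree (our choice of 'low-degree' moments)."""
    return [m for m in product(range(max_degree + 1), repeat=k) if sum(m) <= max_degree]

def crv_data(p, n, c, max_degree):
    """D(W) for a k-CRV W with probabilities p, as integers in units of n^{-c}/n.

    Entry (t, m) holds M_m(W) = prod_j p_j^{m_j} rounded to the grid if W is
    t-maximal, and 0 otherwise. Python's round() breaks exact ties to even.
    """
    k = len(p)
    grid = n ** (-c) / n
    t_star = t_maximal_index(p)
    data = []
    for t in range(k):
        for m in tracked_moments(k, max_degree):
            if t == t_star:
                data.append(round(prod(p[j] ** m[j] for j in range(k)) / grid))
            else:
                data.append(0)
    return tuple(data)

def add_data(d1, d2):
    """Extensible property: data of independent PMDs add coordinate-wise."""
    return tuple(a + b for a, b in zip(d1, d2))

def generate_data(strategy_sets, k, n, c, max_degree):
    """Sketch of GenerateData: map every reachable data vector of X_1 + ... + X_L,
    with X_l drawn from strategy_sets[l], to one witness list of CRVs.
    n is the number of players of the game (it fixes the rounding grid)."""
    size = k * len(tracked_moments(k, max_degree))
    layer = {(0,) * size: []}  # D_0 = {0}
    for S_l in strategy_sets:
        step = [(w, crv_data(w, n, c, max_degree)) for w in S_l]
        nxt = {}
        for d, witness in layer.items():
            for w, dw in step:
                d2 = add_data(d, dw)
                if d2 not in nxt:  # add D + D(W) only if it is new
                    nxt[d2] = witness + [w]
        layer = nxt
    return layer
```

Storing integer grid coordinates instead of rounded floats is a design choice of this sketch: it makes "is $D$ already in $\mathcal{D}_\ell$?" an exact dictionary lookup. An empty strategy set yields an empty table, matching the case in Algorithm 2 where some player has no admissible best response.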
We use $\mathcal{S}$ to denote the set of all $k$-CRVs whose probabilities are multiples of $\epsilon/(20kn)$.

Lemma 3.4 states that the high-degree parameter moments match automatically, which allows us to impose an appropriate grid on the low-degree moments to cover the set of high-variance PMDs. The size of this cover can be bounded by a simple counting argument: There are at most $k^{O(1/(1-c))}$ moments of degree at most $O(1/(1-c))$, and we need to approximate these moments for each $t$-maximal component PMD, so there are at most $k \cdot k^{O(1/(1-c))} = k^{O(1/(1-c))}$ moments $M_m(X^t)$ that we care about. We approximate these moments to precision $n^{-c}$, and the moments are at most $n$, so the size of the cover is $(n/n^{-c})^{k^{O(1/(1-c))}} = n^{k^{O(1/(1-c))}}$.

We define this grid on low-degree moments formally in the following lemma. With every $(\ell, k)$-PMD $X$ with $\ell \in [n]$, we associate some data $D(X)$, which is a vector of the approximate values of the low-degree moments $M_m(X^t)$ of $X$.

Lemma 3.6.

Fix $0 < c < 1$ and $n$, and let $\epsilon = n^{-c}$. We define the data $D(W)$ of a $k$-CRV $W$ as
$$D(W)_{m,t} = \begin{cases} M_m(W) \text{ rounded to the nearest integer multiple of } n^{-c}/n, & \text{if } W \text{ is } t\text{-maximal}, \\ 0, & \text{otherwise.} \end{cases}$$
For $\ell \in [n]$, we define the data of an $(\ell, k)$-PMD $X = \sum_{i=1}^\ell X_i$ to be the sum of the data of its $k$-CRVs: $D(X) = \sum_{i=1}^\ell D(X_i)$. The data $D(X)$ satisfies two important properties:
1. (Representative) If $D(X) = D(Y)$ for two $(n, k)$-PMDs or two $(n-1, k)$-PMDs, then $d_{\mathrm{TV}}(X, Y) \le \epsilon$.
2. (Extensible) For independent PMDs $X$ and $Y$, we have $D(X + Y) = D(X) + D(Y)$.

Proof. The extensible property follows directly from the definition of $D(X)$. To see the representative property, note that we round $M_m(W)$ to the nearest integer multiple of $n^{-c}/n$, so the error in each moment of $W$ is at most $n^{-c}/(2n)$. When we add up the data of an $(n, k)$-PMD or an $(n-1, k)$-PMD, the error in the moments of each $t$-maximal component PMD is at most $n^{-c}/2$. So if $X$ and $Y$ have the same data, their low-degree moments differ by at most $n^{-c}$, and then by Lemma 3.4 we have $d_{\mathrm{TV}}(X, Y) \le \epsilon$.

Our algorithm (Algorithm 2) for computing approximate equilibria is similar to the approach used in [DP15] and [DKS16a]. We start by constructing polynomial-size $(\epsilon/5)$-covers, in the form of sets of data, of the $(n, k)$-PMDs and $(n-1, k)$-PMDs whose CRVs lie in $\mathcal{S}$. Then, for each data vector $D$ in the first cover, we collect for every player the $(3\epsilon/5)$-best responses consistent with $D$, and check whether these best responses can be combined into a PMD with data $D$; any such combination is an $\epsilon$-approximate Nash equilibrium.

Recall that a mixed strategy profile for a $k$-strategy anonymous game can be represented as a list of $k$-CRVs $(X_1, \ldots, X_n)$, where $X_i$ describes the mixed strategy of player $i$. Recall also that $(X_1, \ldots, X_n)$ is an $\epsilon$-approximate Nash equilibrium if for each player $i$ we have $\mathbb{E}[u^i_{X_i}(X_{-i})] \ge \mathbb{E}[u^i_a(X_{-i})] - \epsilon$ for all $a \in [k]$, where $X_{-i} = \sum_{j \ne i} X_j$ is the distribution of the sum of the other players' strategies.

Algorithm 1: GenerateData

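Input: $\{\mathcal{S}_i\}_{i=1}^n$, $\epsilon > 0$.
Output: The set $\mathcal{D}$ of all possible data $D$ of $(n, k)$-PMDs $X = \sum_{i=1}^n X_i$ where $X_i \in \mathcal{S}_i$.
  $\mathcal{D}_0 = \{\mathbf{0}\}$;
  for $\ell = 1, \ldots, n$ do
    forall $D \in \mathcal{D}_{\ell-1}$ do
      forall $W \in \mathcal{S}_\ell$ do
        Add $D + D(W)$ to $\mathcal{D}_\ell$ if it is not in $\mathcal{D}_\ell$ already;
        Keep track of an $(\ell, k)$-PMD whose data is $D + D(W)$;
      end
    end
  end
  return $\mathcal{D} = \mathcal{D}_n$;

Algorithm 2: Moment Search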

Input: An $n$-player $k$-strategy anonymous game $G$, and $\epsilon = n^{-c}$ for some $c < 1$.
Output: An $\epsilon$-approximate Nash equilibrium of $G$.
  $\mathcal{D}_n$ = GenerateData($\{\mathcal{S}_i = \mathcal{S}\}_{i=1}^n$, $\epsilon/5$);
  $\mathcal{D}_{n-1}$ = GenerateData($\{\mathcal{S}_i = \mathcal{S}\}_{i=1}^{n-1}$, $\epsilon/5$);
  forall $D \in \mathcal{D}_n$ do
    Set $\mathcal{S}_i = \emptyset$ for all $i$;
    forall $i \in [n]$ and $X_i \in \mathcal{S}$ do
      Let $D_{-i} = D - D(X_i)$;
      if there exists $Y^D_{-i}$ tracked in $\mathcal{D}_{n-1}$ with $D(Y^D_{-i}) = D_{-i}$, and $X_i$ is a $(3\epsilon/5)$-best response to $Y^D_{-i}$ then
        Add $X_i$ to $\mathcal{S}_i$;
      end
    end
    $\mathcal{D}'_n$ = GenerateData($\{\mathcal{S}_i\}_{i=1}^n$, $\epsilon/5$);
    if $D \in \mathcal{D}'_n$ then
      return the profile $(X_1, \ldots, X_n)$ tracked in $\mathcal{D}'_n$ with $D(\sum_{i=1}^n X_i) = D$;
    end
  end

Lemma 3.7. Fix an anonymous game $G = (n, k, \{u^i_a\}_{i \in [n], a \in [k]})$ with payoffs normalized to $[0, 1]$. Let $(X_1, \ldots, X_n)$ and $(Y_1, \ldots, Y_n)$ be two lists of $k$-CRVs. If $X_i$ is a $\delta$-best response to $X_{-i}$, and $d_{\mathrm{TV}}(X_{-i}, Y_{-i}) \le \epsilon$, then $X_i$ is a $(\delta + 2\epsilon)$-best response to $Y_{-i}$. Moreover, if $(X_1, \ldots, X_n)$ is a $\delta$-approximate equilibrium, and $d_{\mathrm{TV}}(X_i, Y_i) + d_{\mathrm{TV}}(X_{-i}, Y_{-i}) \le \epsilon$ for all $i \in [n]$, then $(Y_1, \ldots, Y_n)$ is a $(\delta + 2\epsilon)$-approximate equilibrium.

Proof. Since $u^i_a(x) \in [0, 1]$ for all $a \in [k]$ and $x \in \Pi^k_{n-1}$, we have
$$\big| \mathbb{E}[u^i_a(X_{-i})] - \mathbb{E}[u^i_a(Y_{-i})] \big| \le d_{\mathrm{TV}}(X_{-i}, Y_{-i}), \quad \forall i \in [n],\ a \in [k].$$
Therefore, if $d_{\mathrm{TV}}(X_{-i}, Y_{-i}) \le \epsilon$, and player $i$ cannot deviate and gain more than $\delta$ when the other players play $X_{-i}$, then she cannot gain more than $\delta + 2\epsilon$ when the other players play $Y_{-i}$ instead of $X_{-i}$. The second claim combines the inequality above with the fact that, if player $i$ plays $Y_i$ instead of $X_i$ and the mixed strategies of the other players remain the same, her payoff changes by at most $d_{\mathrm{TV}}(X_i, Y_i)$. Formally,
$$\big| \mathbb{E}[u^i_{X_i}(Z_{-i})] - \mathbb{E}[u^i_{Y_i}(Z_{-i})] \big| \le d_{\mathrm{TV}}(X_i, Y_i), \quad \forall\ k\text{-CRVs } X_i, Y_i,\ \forall\ (n-1, k)\text{-PMDs } Z_{-i}.$$

The next claim states that by rounding an $(\epsilon/10)$-approximate equilibrium, we can obtain an $(\epsilon/5)$-approximate equilibrium whose probabilities are multiples of $\epsilon/(20kn)$.

Claim 3.8.

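There is an $(\epsilon/5)$-approximate Nash equilibrium $(X_1, \ldots, X_n)$ such that for all $i \in [n]$ and $j \in [k]$, the probabilities $p_{i,j} = \Pr[X_i = e_j]$ are integer multiples of $\epsilon/(c_0 kn)$, and also $p_{i,j} \ge \epsilon/(c_1 k)$, for suitable absolute constants $c_0, c_1 > 0$.
Proof. We start with an $(\epsilon/10)$-approximate equilibrium $(Y_1, \ldots, Y_n)$ from Lemma 3.1 whose probabilities are bounded below by a constant multiple of $\epsilon/k$, and then round the probabilities to integer multiples of $\epsilon/(c_0 kn)$. We construct $X_i$ from $Y_i$ as follows: for every $j < k$, we set $\Pr[X_i = e_j]$ to be $\Pr[Y_i = e_j]$ rounded down to a multiple of $\epsilon/(c_0 kn)$, and we set $\Pr[X_i = e_k] = 1 - \sum_{j<k} \Pr[X_i = e_j]$.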
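The rounding step above, together with the data map $D(\cdot)$ and the cover construction of Algorithm 1, can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: CRVs are tuples of exact `Fraction`s; ties in "$t$-maximal" are broken toward the smallest index; the degree bound on the moments and the rounding unit are free parameters (the hypothetical names `degree` and `unit`); and we also keep the zeroth moment (the number of $t$-maximal CRVs, which enters Equation (4) through $e(|A_t|\xi_t)$), so that, for example, $e_1$ and $e_2$ receive different data. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def tv(p, q):
    """Total variation distance between two k-CRVs given as probability vectors."""
    return sum(abs(Fraction(a) - Fraction(b)) for a, b in zip(p, q)) / 2


def round_to_grid(p, g):
    """Claim 3.8-style rounding: round p_1..p_{k-1} down to multiples of g and
    put the leftover mass on the last coordinate."""
    q = [int(Fraction(pj) / g) * g for pj in p[:-1]]  # int() floors nonnegative values
    q.append(1 - sum(q))
    return tuple(q)


def grid_crvs(k, N):
    """The set S: all k-CRVs whose probabilities are multiples of 1/N."""
    out = []
    for c in product(range(N + 1), repeat=k - 1):
        if sum(c) <= N:
            out.append(tuple(Fraction(x, N) for x in c) + (Fraction(N - sum(c), N),))
    return out


def data_of(W, degree, unit):
    """D(W): moments M_m(W) = prod_{j != t} W_j^{m_j} with 0 <= |m|_1 <= degree,
    stored under the index (t, m) as integer counts of `unit` (nearest multiple).
    t is the maximal coordinate, ties broken toward the smallest index."""
    t = max(range(len(W)), key=lambda j: (W[j], -j))
    others = [j for j in range(len(W)) if j != t]
    d = {}
    for m in product(range(degree + 1), repeat=len(others)):
        if sum(m) <= degree:
            moment = Fraction(1)
            for j, mj in zip(others, m):
                moment *= Fraction(W[j]) ** mj
            d[(t, m)] = round(moment / unit)
    return d


def add_data(key, d):
    """Extensible property: the data of a sum is the sum of the data.
    Data are kept as sorted tuples of nonzero entries so they are hashable."""
    merged = dict(key)
    for idx, v in d.items():
        merged[idx] = merged.get(idx, 0) + v
    return tuple(sorted((idx, v) for idx, v in merged.items() if v != 0))


def generate_data(strategy_sets, data_fn):
    """Algorithm 1 (GenerateData): every reachable datum with one witness list of CRVs."""
    frontier = {(): ()}
    for S in strategy_sets:
        nxt = {}
        for key, witness in frontier.items():
            for W in S:
                new_key = add_data(key, data_fn(W))
                if new_key not in nxt:  # keep only the first witness per datum
                    nxt[new_key] = witness + (W,)
        frontier = nxt
    return frontier
```

Storing each rounded moment as an integer multiple of `unit` makes the data exactly hashable, so the "if it is not in $\mathcal{D}_\ell$ already" test of Algorithm 1 becomes a dictionary lookup; the number of distinct data, i.e., the cover size $N$, is what drives the running-time bound below.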

We first show that the output $(X_1, \ldots, X_n)$ is an $\epsilon$-approximate equilibrium. Recall that $S$ is the set of all $k$-CRVs whose probabilities are multiples of the grid size of Claim 3.8, and $S_i \subseteq S$ is the set of approximate best responses of player $i$. When we put $X_i$ in $S_i$, we checked that $X_i$ is a $(3\epsilon/5)$-best response to $Y_{D_{-i}}$. For this $Y_{D_{-i}}$, note that $D(Y_{D_{-i}}) = D - D(X_i) = D(X_{-i})$, so by Lemma 3.6, $d_{TV}(X_{-i}, Y_{D_{-i}}) \le \epsilon/5$ for all $i$. By Lemma 3.7, $X_i$ is indeed an $\epsilon$-best response to $X_{-i}$ for all $i$.

Next we show that the algorithm must always output something. By Claim 3.8 there exists an $(\epsilon/5)$-approximate equilibrium $(X'_1, \ldots, X'_n)$ with each $X'_i \in S$. If the algorithm does not terminate successfully first, it eventually considers $D(X')$. Because $X'_{-i}$ is an $(n-1,k)$-PMD, the algorithm can find some $Y_{D_{-i}}$ with $D(Y_{D_{-i}}) = D(X') - D(X'_i) = D(X'_{-i})$, and by Lemma 3.6 we have $d_{TV}(X'_{-i}, Y_{D_{-i}}) \le \epsilon/5$ for all $i$. Since $X'_i$ is an $(\epsilon/5)$-best response to $X'_{-i}$, Lemma 3.7 yields that $X'_i$ is a $(3\epsilon/5)$-best response to $Y_{D_{-i}}$, so we would add each $X'_i$ to $S_i$. Then our cover construction algorithm is guaranteed to generate a set of data that includes $D(X')$, and Algorithm 2 would produce an output.

Finally, we bound the running time of Algorithm 2. Let $N = O\big(n^{k^{O(1/(1-c))}}\big)$ denote the size of the $(\epsilon/5)$-cover. The cover construction (Algorithm 1) takes time $O(n \cdot N \cdot |S|)$, as we try to add one $k$-CRV from $S$ in each step. We iterate through the cover, and for each element in the cover, we need to find the subset $S_i \subseteq S$ of $(3\epsilon/5)$-best responses for each player $i$, and then run the cover construction algorithm again using only the best responses $\{S_i\}_{i=1}^n$. So the overall running time of the algorithm is
$$O(nN|S|) \cdot \big(\mathrm{poly}(n^k)\,|S| + O(nN|S|)\big) = n^{k^{O(1/(1-c))}}.$$
When both $c < 1$ and $k$ are constants, the running time is polynomial in $n$, as claimed in Theorem 1.1.

This subsection is devoted to the proof of Proposition 3.5.
For two $(n,k)$-PMDs with variance at least $\sigma$ in each direction, Proposition 3.5 gives a quantitative bound on how close the degree-$\ell$ moments need to be (as a function of $\epsilon$, $\sigma$, $k$ and $\ell$, but independent of $n$) in order for the two PMDs to be $\epsilon$-close in total variation distance.

The proof of Proposition 3.5 exploits the sparsity of the continuous Fourier transforms of our PMDs, as well as careful Taylor approximations of the logarithm of the Fourier transform. The fact that our PMDs have large variance enables us to keep fewer low-degree terms in the Taylor approximation. For technical reasons, we split our PMD as the sum of $k$ independent component PMDs, $X = \sum_{i=1}^k X^i$, where all the $k$-CRVs in the component PMD $X^i$ are $i$-maximal. Because the Fourier transform of $X$ is the product of the Fourier transforms of the $X^i$, it suffices to bound the pointwise difference between the logarithms of the Fourier transforms of each component PMD. One technicality is that, since we make no assumption on the variances of the component PMDs $X^i$, their Fourier transforms may not be sparse, so it is crucial that we bound this difference only on the effective support of the Fourier transform of the entire PMD.

We start by considering a set $S$ that includes the effective support of $X$ (and of $Y$, once we show that the means are close):

Lemma 3.9 (Essentially Corollary 5.3 of [DKS16a]). Let $X$ be an $(n,k)$-PMD with mean $\mu$ and covariance matrix $\Sigma$, such that all the non-zero eigenvalues of $\Sigma$ are at least $\sigma$, where $\sigma \ge \mathrm{poly}(1/\epsilon)$. Let $S$ be the set of points $x \in \mathbb{Z}^k$ where $(x-\mu)^T \mathbf{1} = 0$ and $(x-\mu)^T (\Sigma + I)^{-1} (x-\mu) \le Ck\log(1/\epsilon)$, for some sufficiently large constant $C$. Then $X \in S$ with probability at least $1 - \epsilon/2$, and $|S| = \sqrt{\det(\Sigma + I)} \cdot O(\log(1/\epsilon))^{k/2}$.
Proof.

Applying Lemma 5.2 of [DKS16a], we have that $(X-\mu)^T(\Sigma+I)^{-1}(X-\mu) = O(k\log(k/\epsilon))$ with probability at least $1 - \epsilon/2$. The set of integer points in this ellipsoid is the set $S$. Note that $|S|$ is equal to the volume of
$$S' = \{y \in \mathbb{R}^k : \exists x \in S \text{ with } \|y - x\|_\infty \le 1/2\},$$
because $S'$ is the disjoint union of cubes of volume 1, one for each integer point. But $S'$ is again contained in an ellipsoid $\{y : (y-\mu)^T(\Sigma+I)^{-1}(y-\mu) \le O(k\log(k/\epsilon))\}$, so $|S| = \mathrm{Vol}(S') = \sqrt{\det(\Sigma+I)} \cdot O(\log(1/\epsilon))^{k/2}$.

Next we show that $\hat{X}$, the Fourier transform of $X$, has a relatively small effective support. We fold the effective support onto $[0,1]^k$ to obtain the set $T$. We use $[x]$ to denote the additive distance of $x \in \mathbb{R}$ to the closest integer, i.e., $[x] = \min_{x' \in \mathbb{Z}} |x - x'|$.

Lemma 3.10.

Let $X$ be an $(n,k)$-PMD with mean $\mu$ and covariance matrix $\Sigma$, such that all the non-zero eigenvalues of $\Sigma$ are at least $\sigma$, where $\sigma \ge \mathrm{poly}(k\log(1/\epsilon))$. Let $S$ be as above, and let $\hat{X}$ be the Fourier transform of $X$. Let
$$T := \{\xi \in [0,1]^k : \exists\, \xi' \in \xi + \mathbb{Z}^k \text{ with } \xi'^T \Sigma \xi' \le Ck\log(1/\epsilon)\},$$
for some sufficiently large constant $C$. Then we have that
(i) For $\xi \in T$ and for all $1 \le i, j \le k$, $[\xi_i - \xi_j] \le 2\sqrt{Ck\log(1/\epsilon)/\sigma}$.
(ii) $\mathrm{Vol}(T)\,|S| = O(C\log(1/\epsilon))^k$.
(iii) $\int_{[0,1]^k \setminus T} |\hat{X}(\xi)|\, d\xi \le \epsilon/(2|S|)$.

Part (iii) says that the contribution to $\hat{X}$ coming from points outside of $T$ is negligibly small. We then use the sparsity of the Fourier transform to show that, if two PMDs have Fourier transforms that are pointwise sufficiently close within the effective support $T$, then the two PMDs are close in total variation distance.

Lemma 3.11.

Let $X$, $Y$, $S$, $T$ be as above. If $|\hat{X}(\xi) - \hat{Y}(\xi)| \le \epsilon\,(C'\log(1/\epsilon))^{-k}$ for all $\xi \in T$ and a sufficiently large constant $C'$, then $d_{TV}(X, Y) \le \epsilon$.
Proof.

For any $x \in \mathbb{Z}^k$, taking the inverse Fourier transform, we have $\Pr[X = x] = \int_{\xi \in [0,1]^k} e(-\xi \cdot x)\,\hat{X}(\xi)\,d\xi$ and similarly $\Pr[Y = x] = \int_{\xi \in [0,1]^k} e(-\xi \cdot x)\,\hat{Y}(\xi)\,d\xi$. Thus,
$$\begin{aligned}
|\Pr[X = x] - \Pr[Y = x]| &= \Big|\int_{\xi \in [0,1]^k} e(-\xi \cdot x)\big(\hat{X}(\xi) - \hat{Y}(\xi)\big)\,d\xi\Big| \le \int_{\xi \in [0,1]^k} |\hat{X}(\xi) - \hat{Y}(\xi)|\,d\xi\\
&= \int_{\xi \in T} |\hat{X}(\xi) - \hat{Y}(\xi)|\,d\xi + \int_{\xi \in [0,1]^k \setminus T} |\hat{X}(\xi) - \hat{Y}(\xi)|\,d\xi\\
&\le \mathrm{Vol}(T) \cdot \epsilon\,(C'\log(1/\epsilon))^{-k} + \frac{\epsilon}{|S|}\\
&\le \frac{O(C\log(1/\epsilon))^k}{|S|} \cdot \epsilon\,(C'\log(1/\epsilon))^{-k} + \frac{\epsilon}{|S|},
\end{aligned}$$
where the bound $\epsilon/|S|$ uses Lemma 3.10(iii) for $\hat{X}$ and $\hat{Y}$, and the last step uses Lemma 3.10(ii). Since $C'$ is a sufficiently large constant, the first term is negligible, so $|\Pr[X = x] - \Pr[Y = x]| = O(\epsilon/|S|)$ for every $x$. Since $X$ and $Y$ are each outside of $S$ with probability less than $\epsilon/2$, we have $d_{TV}(X, Y) \le \epsilon/2 + \frac{1}{2}\sum_{x \in S} |\Pr[X = x] - \Pr[Y = x]| = O(\epsilon)$, which gives $d_{TV}(X, Y) \le \epsilon$ after rescaling $\epsilon$ by an absolute constant in Lemmas 3.9 and 3.10.

We now have all the ingredients to prove Proposition 3.5. For two PMDs $X$ and $Y$ that are close in their low-degree moments, we show that their Fourier transforms $\hat{X}$ and $\hat{Y}$ are pointwise close on $T$, and then by Lemma 3.11, $X$ and $Y$ are close in total variation distance.

Proof of Proposition 3.5.

Let $X$, $Y$, $S$, $T$ be as above. Given Lemma 3.11, we only need to show that $|\hat{X}(\xi) - \hat{Y}(\xi)| \le \epsilon\,(C'\log(1/\epsilon))^{-k}$ for all $\xi \in T$.

Fix $\xi \in T$. We first examine, without loss of generality, the Fourier transform $\widehat{X^k}$ of the $k$-maximal component PMD. Let $A_k \subseteq [n]$ denote the set of $k$-maximal CRVs. Then
$$\begin{aligned}
\widehat{X^k}(\xi) &= \prod_{i \in A_k} \sum_{j=1}^{k} e(\xi_j)\,p_{i,j} = e(|A_k|\xi_k) \prod_{i \in A_k} \Big(1 - \sum_{j=1}^{k-1} \big(1 - e(\xi_j - \xi_k)\big) p_{i,j}\Big)\\
&= e(|A_k|\xi_k) \exp\Big(\sum_{i \in A_k} \log\Big(1 - \sum_{j=1}^{k-1} \big(1 - e(\xi_j - \xi_k)\big) p_{i,j}\Big)\Big)\\
&= e(|A_k|\xi_k) \exp\Big(-\sum_{i \in A_k} \sum_{\ell=1}^{\infty} \frac{1}{\ell} \Big(\sum_{j=1}^{k-1} \big(1 - e(\xi_j - \xi_k)\big) p_{i,j}\Big)^{\ell}\Big)\\
&= e(|A_k|\xi_k) \exp\Big(-\sum_{m \in \mathbb{Z}_{\ge 0}^{k-1},\ \|m\|_1 \ge 1} \frac{1}{\|m\|_1}\binom{\|m\|_1}{m} M_m(X^k) \prod_{j=1}^{k-1} \big(1 - e(\xi_j - \xi_k)\big)^{m_j}\Big), \qquad (4)
\end{aligned}$$
where $\binom{\|m\|_1}{m} = \|m\|_1! / \prod_j m_j!$ is the multinomial coefficient and $M_m(X^k) = \sum_{i \in A_k} \prod_{j=1}^{k-1} p_{i,j}^{m_j}$.

For notational convenience, we use $\Psi^k_X$ to denote the expression inside $\exp(\cdot)$ in Equation (4). A similar formula holds for the Fourier transforms $\widehat{X^t}$ and $\widehat{Y^t}$ of the other $t$-maximal component PMDs, and we use $\Psi^t_X$ and $\Psi^t_Y$ to denote the corresponding expressions inside $\exp(\cdot)$. Since the Fourier transform of a PMD is the product of the Fourier transforms of its component PMDs, we have
$$|\hat{X}(\xi) - \hat{Y}(\xi)| = \Big|\prod_{t=1}^{k} \widehat{X^t}(\xi) - \prod_{t=1}^{k} \widehat{Y^t}(\xi)\Big| = \Big|e\Big(\sum_{t=1}^{k} |A_t|\xi_t\Big)\Big(\prod_{t=1}^{k} \exp(\Psi^t_X) - \prod_{t=1}^{k} \exp(\Psi^t_Y)\Big)\Big| \le 2\pi \sum_{t=1}^{k} |\Psi^t_X - \Psi^t_Y|,$$
where the last inequality is due to $|e(\sum_{t=1}^k |A_t|\xi_t)| = 1$, and $|\exp(a) - \exp(b)| \le |a - b|$ if the real parts of $a$ and $b$ satisfy $\mathrm{Re}(a), \mathrm{Re}(b) \le 0$ (which holds here because Fourier transforms of distributions have modulus at most 1). Therefore, to show that $\hat{X}(\xi)$ and $\hat{Y}(\xi)$ are pointwise close for all $\xi \in T$, it is enough to bound $2\pi \sum_{t=1}^{k} |\Psi^t_X - \Psi^t_Y|$ from above.
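Equation (4) is an exact identity wherever the logarithm's power series converges, so it can be spot-checked numerically. The sketch below is our illustration, not part of the proof; it assumes the convention $e(x) = \exp(2\pi i x)$, uses 0-based indices with the last coordinate playing the role of the maximal coordinate $k$, and the function names are hypothetical. It compares the product form of $\widehat{X^k}(\xi)$ with the moment series truncated at total degree `L`.

```python
import cmath
import math
from itertools import product


def e(x):
    """Assumed convention: e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * math.pi * x)


def ft_direct(P, xi):
    """prod_i sum_j e(xi_j) p_{i,j} for a list P of k-CRVs (each row sums to 1)."""
    val = 1
    for row in P:
        val *= sum(e(x) * p for x, p in zip(xi, row))
    return val


def ft_moment_series(P, xi, L):
    """Right-hand side of Equation (4), truncated at 1 <= |m|_1 <= L.
    The last coordinate (index k-1) is the maximal coordinate of every row."""
    k = len(xi)
    a = [1 - e(xi[j] - xi[k - 1]) for j in range(k - 1)]
    psi = 0
    for m in product(range(L + 1), repeat=k - 1):
        ell = sum(m)
        if not 1 <= ell <= L:
            continue
        multinom = math.factorial(ell) // math.prod(math.factorial(mj) for mj in m)
        moment = sum(math.prod(row[j] ** m[j] for j in range(k - 1)) for row in P)  # M_m
        term = multinom / ell * moment
        for j in range(k - 1):
            term *= a[j] ** m[j]
        psi -= term
    return e(len(P) * xi[k - 1]) * cmath.exp(psi)
```

For $\xi$ close to the diagonal, the regime singled out by Lemma 3.10(i), each $|1 - e(\xi_j - \xi_k)|$ is small, so the series converges geometrically and a modest truncation degree already agrees with the product to near machine precision.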
We use the fact that $|1 - e(\xi_j - \xi_k)| = O([\xi_j - \xi_k])$, and recall that $[\xi_i - \xi_j] \le 2\sqrt{Ck\log(1/\epsilon)/\sigma}$ by Lemma 3.10. We also use the multinomial identity $\sum_{m \in \mathbb{Z}_{\ge 0}^{k-1},\ \|m\|_1 = \ell} \binom{\ell}{m} = (k-1)^{\ell}$. In the sums below, for each $t$ the multi-index $m$ is indexed by the coordinates $j \ne t$. When $C'$ is a sufficiently large constant, we have
$$\begin{aligned}
|\hat{X}(\xi) - \hat{Y}(\xi)| &\le 2\pi \sum_{t=1}^{k} |\Psi^t_X - \Psi^t_Y| \le 2\pi \sum_{t=1}^{k} \sum_{m} \frac{1}{\|m\|_1}\binom{\|m\|_1}{m} \big|M_m(X^t) - M_m(Y^t)\big| \prod_{j \ne t} \big|1 - e(\xi_j - \xi_t)\big|^{m_j}\\
&\le 2\pi \sum_{\ell=1}^{\infty} \frac{(k-1)^{\ell}}{\ell} \left(O\left(\sqrt{\frac{k\log(1/\epsilon)}{\sigma}}\right)\right)^{\ell} \sum_{t=1}^{k} \max_{m \in \mathbb{Z}_{\ge 0}^{k-1},\ \|m\|_1 = \ell} \big|M_m(X^t) - M_m(Y^t)\big|\\
&\le \sum_{\ell=1}^{\infty} 2^{-\ell}\,\epsilon\,(C'\log(1/\epsilon))^{-k} = \epsilon\,(C'\log(1/\epsilon))^{-k},
\end{aligned}$$
where the last inequality substitutes the bound on the degree-$\ell$ moment differences assumed in Proposition 3.5, which makes the $\ell$-th term at most $2^{-\ell}\epsilon\,(C'\log(1/\epsilon))^{-k}$.

In this section, we show that even a slight improvement of our upper bound would imply an FPTAS for computing (well-supported) Nash equilibria in anonymous games (Theorem 1.3). It is a plausible conjecture that, assuming ETH for PPAD, there is no such FPTAS, in which case our upper bound (Theorem 1.1) is essentially tight.

Theorem 1.3 follows directly from the following two lemmas. Lemma 4.1 converts an $(\epsilon^2/n)$-approximate Nash equilibrium into an $\epsilon$-well-supported Nash equilibrium by reallocating each player's probabilities on strategies with low expected payoffs to a best-response strategy (first observed in [DGP09]). Lemma 4.2 then uses a padding argument to show that, for $\epsilon$-well-supported Nash equilibria, the question of whether there is a polynomial-time algorithm for $\epsilon = n^{-c}$ is equivalent for all constants $c > 0$.

Lemma 4.1.

For any $n$-player game whose payoffs are normalized to $[0,1]$, if we have an oracle for computing players' payoffs, we can efficiently convert an $(\epsilon^2/n)$-approximate equilibrium into an $\epsilon$-well-supported equilibrium.
Proof. Take an $(\epsilon^2/n)$-approximate equilibrium of the game. We call a strategy "good" for a player if the strategy is an $\epsilon$-best response for the player, and we call it "bad" otherwise. Since every unit of probability on a "bad" strategy costs the player more than $\epsilon$, a player can put at most probability $\epsilon/n$ on the "bad" strategies without violating the $(\epsilon^2/n)$-approximate equilibrium condition. We move all the probabilities on "bad" strategies for all players to (any one of) their best responses simultaneously. After moving the probabilities, every player assigns non-zero probability only to "good" strategies. Since the total probability we moved is at most $n \cdot \epsilon/n = \epsilon$ and the payoffs are in $[0,1]$, every expected payoff changes by at most $\epsilon$, so all the "good" strategies (which were $\epsilon$-best responses) are now $3\epsilon$-best responses; this gives the lemma up to a constant-factor rescaling of $\epsilon$.

A mixed strategy profile $s$ is an $\epsilon$-well-supported Nash equilibrium iff for all $i \in [n]$ and $a, a' \in [k]$, $\mathbb{E}_{x \sim s_{-i}}[u^i_a(x)] > \mathbb{E}_{x \sim s_{-i}}[u^i_{a'}(x)] + \epsilon \implies s_i(a') = 0$, i.e., players can only put non-zero probability on $\epsilon$-best-response strategies.

Lemma 4.2. For $n$-player $k$-strategy anonymous games with $k = O(1)$, if a $(1/n^{\gamma})$-well-supported equilibrium can be computed in time $O(n^d)$ for constants $\gamma, d > 0$, then there is an FPTAS for computing approximate well-supported Nash equilibria in anonymous games.
Proof. Let $\epsilon$ be the desired quality of the well-supported equilibrium. If $1/n^{\gamma} \le \epsilon$ we are done, so we assume $n$ is smaller. We set $n' = (1/\epsilon)^{1/\gamma}$, so that $1/n'^{\gamma} = \epsilon$. Given an $n$-player anonymous game $G$, we build an $n'$-player anonymous game $G'$ as follows: we add $n' - n$ dummy players, and give the dummy players utility 1 on strategy 1 and 0 on every other strategy, so in any $\epsilon$-well-supported equilibrium the dummy players must all play strategy 1 with probability 1.
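The dummy-player construction, with the shifted payoffs that are formalized next, can be sketched as a wrapper around the original utility. This is an illustrative sketch, not the authors' code. The `Utility` interface is hypothetical; strategy index 0 stands for "strategy 1"; players $0, \ldots, n-1$ are the real players; `padded_size` rounds $n' = (1/\epsilon)^{1/\gamma}$ up to an integer; and the value returned when fewer than $n' - n$ other players are on strategy 1 is an arbitrary placeholder (an assumption, since that case does not arise in an $\epsilon$-well-supported equilibrium).

```python
import math
from typing import Callable, Tuple

Counts = Tuple[int, ...]
# Hypothetical interface: u(i, a, x) is player i's payoff for strategy a when the
# other players' strategies have counts x (index 0 plays the role of "strategy 1").
Utility = Callable[[int, int, Counts], float]


def padded_size(eps: float, gamma: float) -> int:
    """n' = (1/eps)^(1/gamma), rounded up so that 1/n'^gamma <= eps."""
    return math.ceil((1 / eps) ** (1 / gamma))


def pad_game(u: Utility, n: int, n_prime: int) -> Utility:
    """Utility u' of the padded game; players n, ..., n'-1 are dummies."""
    shift = n_prime - n

    def u_prime(i: int, a: int, x: Counts) -> float:
        if i >= n:  # dummy player: payoff 1 on strategy 0, 0 elsewhere
            return 1.0 if a == 0 else 0.0
        if x[0] < shift:  # cannot happen once all dummies sit on strategy 0
            return 0.0  # arbitrary placeholder value (assumption)
        phi_x = (x[0] - shift,) + tuple(x[1:])  # the map phi: remove the dummies
        return u(i, a, phi_x)

    return u_prime
```

Any algorithm for the padded game can then be run unchanged; dropping the dummy players from its output recovers a profile for the original $n$ players.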
(Note that this is only true for $\epsilon$-well-supported Nash equilibria; in an $\epsilon$-approximate Nash equilibrium, the dummy players can put $\epsilon$ probability elsewhere.) We shift the utility functions of the actual players to ignore the dummy players on strategy 1. Formally, the payoff structure of $G'$ is given by:
• For each $i > n$, $u'^i_a(x) = 1$ if $a = 1$, and $u'^i_a(x) = 0$ otherwise.
• For each $i \le n$, we subtract $n' - n$ from the number of players on strategy 1 and then apply the original utility function. We define $\phi : \mathbb{Z}^k \to \mathbb{Z}^k$ as $\phi(x_1, \ldots, x_k) = (x_1 - (n' - n), x_2, \ldots, x_k)$, and set $u'^i_a(x) = u^i_a(\phi(x))$ if $x_1 \ge n' - n$ (the value for $x_1 < n' - n$ does not matter, since all $n' - n$ dummy players are on strategy 1 in any $\epsilon$-well-supported equilibrium).
Since $\epsilon = 1/n'^{\gamma}$, by assumption we can compute an $\epsilon$-well-supported equilibrium of $G'$ in time $O(n'^d)$, and we can simply remove the dummy players to obtain an $\epsilon$-well-supported equilibrium of the original game $G$. The running time is $O(n'^d) = \mathrm{poly}(n, 1/\epsilon)$ when $\gamma = \Theta(1)$.

Proof of Theorem 1.3.

Assume that we can compute an $O(n^{-c})$-approximate equilibrium in polynomial time for some constant $c > 1$. Let $\gamma = c - 1$, so we can compute an $O(1/n^{1+\gamma})$-approximate equilibrium in polynomial time. By Lemma 4.1 (with $\epsilon = n^{-\gamma/2}$, so that $\epsilon^2/n = n^{-(1+\gamma)}$), we can convert it into an $O(1/n^{\gamma/2})$-well-supported equilibrium. Lemma 4.2 then states that any polynomial-time algorithm that computes a well-supported Nash equilibrium of inverse-polynomial precision gives an FPTAS for computing well-supported Nash equilibria in anonymous games.

In this section, we present a faster algorithm that computes an $\tilde{O}(n^{-1/3}k^{11/3})$-approximate Nash equilibrium in $n$-player $k$-strategy anonymous games. Note that this algorithm always runs in time polynomial in the input size, without assuming any relationship between $n$ and $k$.

Our approach builds on the idea of [GT15] to "smooth" an anonymous game by forcing all the players to randomize. We prove that the perturbed game is Lipschitz and therefore admits an approximate pure Nash equilibrium (Lemma 5.1), which corresponds to simple approximate equilibria of a specific form in the original game: each player plays one strategy with probability $1 - \delta$ for some small $\delta$, and plays the other strategies uniformly at random with probability $\delta$. To prove that the perturbed game is Lipschitz (Proposition 5.2), we rely on the recently established multivariate central limit theorem (CLT) of [DDKT16, DKS16a] to show that for $\delta = \Omega(n^{-1/3})$ the associated PMD is close to a discrete Gaussian.

Recall that an anonymous game $G = (n, k, \{u^i_a\}_{i \in [n], a \in [k]})$ is $\lambda$-Lipschitz if for all $i \in [n]$, $a \in [k]$ and $x, y \in \Pi^k_{n-1}$, we have $|u^i_a(x) - u^i_a(y)| \le \lambda \|x - y\|_1$. An approximate pure Nash equilibrium always exists in Lipschitz anonymous games.
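For a small game given explicitly, the Lipschitz constant in this definition can be computed by brute force. A minimal sketch, assuming the $\ell_1$ norm as above and one player's payoff passed as a Python function `u(a, x)` (a hypothetical interface; the helper names are ours as well). It only inspects pairs of partitions that differ by moving a single other player, which suffices by the triangle inequality.

```python
from itertools import product


def partitions(m, k):
    """All x in Z_{>=0}^k with x_1 + ... + x_k = m, i.e. the set Pi^k_m."""
    for c in product(range(m + 1), repeat=k - 1):
        if sum(c) <= m:
            yield tuple(c) + (m - sum(c),)


def lipschitz_constant(u, n, k):
    """Smallest lambda with |u(a, x) - u(a, y)| <= lambda * ||x - y||_1 on Pi^k_{n-1}.

    Moving one other player between strategies changes x by ||x - y||_1 = 2, and any
    x, y are joined by ||x - y||_1 / 2 such moves, so checking single moves suffices."""
    best = 0.0
    for x in partitions(n - 1, k):
        for src in range(k):
            if x[src] == 0:
                continue
            for dst in range(k):
                if dst == src:
                    continue
                y = list(x)
                y[src] -= 1
                y[dst] += 1
                y = tuple(y)
                for a in range(k):
                    best = max(best, abs(u(a, x) - u(a, y)) / 2)
    return best
```

For instance, the payoff $u(a, x) = x_a/(n-1)$ with $n = 4$, $k = 3$ is $\frac{1}{6}$-Lipschitz, since moving one player changes $x_a$ by 1 while $\|x - y\|_1 = 2$.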

Lemma 5.1 ([DP15, AS13]). Every $\lambda$-Lipschitz anonymous game with $k$ strategies admits a $(2k\lambda)$-approximate pure Nash equilibrium. Moreover, such an approximate equilibrium can be found in time $\tilde{O}(n + k)$ times the description size of the game.

We perturb the input game $G$ to get another game $G_\delta$ as follows. Let $X_\delta(e_j)$ denote the $k$-CRV that takes value $e_j$ with probability $1 - \delta$, and takes value $e_{j'}$ with probability $\delta/(k-1)$ for every other $j' \ne j$. When a player plays strategy $j$ in the perturbed game $G_\delta$, it is as if she is playing $X_\delta(e_j)$ in the original game $G$. For example, the strategy $(1, 0, \ldots, 0)$ in $G_\delta$ maps back to the mixed strategy $(1 - \delta, \frac{\delta}{k-1}, \ldots, \frac{\delta}{k-1})$ in $G$.

By forcing all players to randomize, we increase the uncertainty in the outcome of the game (i.e., the variance of the resulting PMD), and thus make the game "smoother". As we will prove later, the perturbed game $G_\delta$ is $\lambda$-Lipschitz for $\lambda = \tilde{O}(k^{9/2}/\sqrt{n\delta})$. It then follows from Lemma 5.1 that there exists a $(2k\lambda)$-approximate pure Nash equilibrium of $G_\delta$, which is a $(\delta + 2k\lambda)$-approximate mixed Nash equilibrium of $G$. The next proposition formally defines the payoff structure of $G_\delta$ and bounds its Lipschitz constant.

Proposition 5.2.

Given an anonymous game $G = (n, k, \{u^i_a\}_{i \in [n], a \in [k]})$ with payoffs normalized to $[0,1]$, we define an anonymous game $G_\delta = (n, k, \{u'^i_a\}_{i \in [n], a \in [k]})$ as follows: for all $i \in [n]$, $a \in [k]$ and $x \in \Pi^k_{n-1}$,
$$u'^i_a(x) := (1 - \delta)\,\mathbb{E}_{x' \sim M_\delta(x)}\big[u^i_a(x')\big] + \frac{\delta}{k-1} \sum_{a' \ne a} \mathbb{E}_{x' \sim M_\delta(x)}\big[u^i_{a'}(x')\big],$$
where $M_\delta(x) = \sum_{j \in [k]} x_j X_\delta(e_j)$ (the sum of $x_j$ independent copies of $X_\delta(e_j)$ for each $j$) is an $(n-1,k)$-PMD that corresponds to the perturbed outcome of the partition $x \in \Pi^k_{n-1}$. Then $G_\delta$ is $\tilde{O}(k^{9/2}/\sqrt{n\delta})$-Lipschitz.

We defer the proof of Proposition 5.2 to the next subsection. We now show how Theorem 1.2 follows from Proposition 5.2.
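For small games, the perturbed payoff $u'^i_a(x)$ can be evaluated directly by computing the distribution of $M_\delta(x)$ with a dynamic program over the $n-1$ other players. A minimal sketch, assuming one player's payoff is given as a Python function `u(b, counts)` (a hypothetical interface, as are the helper names); the number of count vectors grows like $n^{k-1}$, so this is only practical for small $n$ and $k$.

```python
from collections import defaultdict


def perturbed_crv(j, k, delta):
    """X_delta(e_j): e_j with probability 1 - delta, every other e_j' with delta/(k-1)."""
    return [1 - delta if jj == j else delta / (k - 1) for jj in range(k)]


def pmd_distribution(crvs, k):
    """Distribution of a sum of independent k-CRVs, as {count vector: probability}."""
    dist = {tuple([0] * k): 1.0}
    for p in crvs:
        nxt = defaultdict(float)
        for x, q in dist.items():
            for j, pj in enumerate(p):
                if pj > 0:
                    y = list(x)
                    y[j] += 1
                    nxt[tuple(y)] += q * pj
        dist = dict(nxt)
    return dist


def perturbed_payoff(u, a, x, delta):
    """u'_a(x) from Proposition 5.2 for one player with payoff function u(b, counts)."""
    k = len(x)
    crvs = [perturbed_crv(j, k, delta) for j in range(k) for _ in range(x[j])]
    M = pmd_distribution(crvs, k)  # the (n-1, k)-PMD M_delta(x)

    def expected(b):
        return sum(q * u(b, z) for z, q in M.items())

    weights = perturbed_crv(a, k, delta)  # the player's own strategy is perturbed too
    return sum(weights[b] * expected(b) for b in range(k))
```

With $\delta = 0$ this reduces to the original payoff, and a constant payoff stays constant for every $\delta$, which are quick sanity checks on the definition.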

Proof of Theorem 1.2.

Proposition 5.2 shows that $G_\delta$ is $\lambda$-Lipschitz for $\lambda = \tilde{O}(k^{9/2}/\sqrt{n\delta})$. By Lemma 5.1, there exists a $(2k\lambda)$-approximate pure Nash equilibrium in $G_\delta$, and as noted in [DP15], such an approximate equilibrium can be found in a total number of bit operations that is $\tilde{O}(n + k)$ times the description size of $G_\delta$, by enumerating pure strategy profiles and solving maximum flows to match players to mixed strategies. Since we can compute the payoff structure of $G_\delta$ in polynomial time given the input game $G$, the overall running time is polynomial in the input size.

We now bound the quality of the approximate Nash equilibrium. Note that a $(2k\lambda)$-approximate pure equilibrium of $G_\delta$ is a $(\delta + 2k\lambda)$-approximate mixed Nash equilibrium of $G$: an $\epsilon$-equilibrium in $G_\delta$ means that players cannot gain more than $\epsilon$ by deviating to mixed strategies of the form $X_\delta(e_j) = (1 - \delta)e_j + \frac{\delta}{k-1}(\mathbf{1} - e_j)$, so they gain at most $\delta + 2k\lambda$ by deviating to any $e_j$, because changing what a player does a $\delta$ fraction of the time can change her payoff by at most $\delta$. Therefore, we can compute a $(\delta + 2k\lambda) = \tilde{O}(\delta + k^{11/2}/\sqrt{n\delta})$-approximate equilibrium of the original game $G$ in polynomial time for any $\delta > 0$. Finally, setting $\delta = k^{11/3}/n^{1/3}$, we get an $\tilde{O}(k^{11/3}/n^{1/3})$-approximate Nash equilibrium.

5.1 Proof of Proposition 5.2

This section is devoted to the proof of Proposition 5.2. We will make use of the following two results. The first lemma is the multivariate central limit theorem from [DKS16a], which states that if an $(n,k)$-PMD $X$ has high variance in all directions orthogonal to the all-ones vector $\mathbf{1}$ (its variance along $\mathbf{1}$ is 0), then the projection of $X$ onto its first $k-1$ coordinates is close to a discretized Gaussian distribution with the same mean vector and covariance matrix.
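For $k = 2$, an $(n,2)$-PMD is determined by its first coordinate, a Poisson binomial distribution, and the statement becomes a one-dimensional CLT. The sketch below is our illustration rather than part of the proof (the function names are hypothetical): it computes the total variation distance to the Gaussian with matching mean and variance rounded to the nearest integer, which shrinks as the variance grows.

```python
import math


def poisson_binomial_pmf(ps):
    """pmf of a sum of independent Bernoulli(p) variables (first coordinate of an (n,2)-PMD)."""
    pmf = [1.0]
    for p in ps:
        nxt = [0.0] * (len(pmf) + 1)
        for s, q in enumerate(pmf):
            nxt[s] += q * (1 - p)
            nxt[s + 1] += q * p
        pmf = nxt
    return pmf


def tv_to_discretized_gaussian(ps):
    """TV distance between the sum and N(mean, variance) rounded to the nearest integer."""
    pmf = poisson_binomial_pmf(ps)
    mu = sum(ps)
    sd = math.sqrt(sum(p * (1 - p) for p in ps))

    def cdf(t):
        return 0.5 * (1 + math.erf((t - mu) / (sd * math.sqrt(2))))

    g = [cdf(s + 0.5) - cdf(s - 0.5) for s in range(len(pmf))]
    outside = max(0.0, 1 - sum(g))  # Gaussian mass rounded to points outside {0, ..., n}
    return 0.5 * (sum(abs(a - b) for a, b in zip(pmf, g)) + outside)
```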

Lemma 5.3 ([DKS16a]). Let $X$ be an $(n,k)$-PMD, and let $X'$ be the $(k-1)$-dimensional random variable that is the projection of $X$ onto its first $k-1$ coordinates. Let $\Sigma'$ be the covariance matrix of $X'$. Suppose that $\Sigma'$ has no eigenvectors with eigenvalue less than $\sigma'$. Let $G'$ be the distribution obtained by sampling from $N(\mathbb{E}[X'], \Sigma')$ and rounding to the nearest point in $\mathbb{Z}^{k-1}$. Then $d_{TV}(X', G') \le O\big(k^{7/2}\sqrt{\log^3(\sigma')/\sigma'}\big)$.

The second, simpler lemma states that if two Gaussian distributions have similar mean vectors and variances (in all directions), then they are close in total variation distance.

Lemma 5.4 ([DDKT16]). Let $X \sim N(\mu_1, \Sigma_1)$ and $Y \sim N(\mu_2, \Sigma_2)$ be two $k$-dimensional Gaussians such that for all unit vectors $v$, $|v^T(\mu_1 - \mu_2)| \le \epsilon\sqrt{s_v}$ and $|v^T(\Sigma_1 - \Sigma_2)v| \le \epsilon s_v/\sqrt{k}$, where $s_v = \max\{v^T\Sigma_1 v, v^T\Sigma_2 v\}$. Then $d_{TV}(X, Y) \le \epsilon$.

Proof of Proposition 5.2. To prove that the game $G_\delta$ is $\lambda$-Lipschitz, we need to show that for all $i \in [n]$, $a \in [k]$ and $x, y \in \Pi^k_{n-1}$,
$$\Big|\mathbb{E}_{x' \sim M_\delta(x)}\big[u^i_a(x')\big] - \mathbb{E}_{y' \sim M_\delta(y)}\big[u^i_a(y')\big]\Big| \le \lambda\|x - y\|_1.$$
In fact, because the payoff entries are normalized to $[0,1]$, it suffices to show that the total variation distance between the $(n-1,k)$-PMDs $M_\delta(x)$ and $M_\delta(y)$ is small, namely that $d_{TV}(M_\delta(x), M_\delta(y)) \le \lambda\|x - y\|_1$ for all $x, y \in \Pi^k_{n-1}$. Let $M'_\delta(x)$ and $M'_\delta(y)$ be the distributions $M_\delta(x)$ and $M_\delta(y)$ projected onto their first $k-1$ coordinates. Since the coordinates of $M_\delta(x)$ and $M_\delta(y)$ always sum to $n-1$, the $k$-th coordinate is redundant, and so $d_{TV}(M_\delta(x), M_\delta(y)) = d_{TV}(M'_\delta(x), M'_\delta(y))$.
To show that $M'_\delta(x)$ and $M'_\delta(y)$ are close in total variation distance, we first prove that the covariance matrix of $M'_\delta(x)$ has high variance in all directions, which allows us to use the multivariate central limit theorem (Lemma 5.3) to conclude that $M'_\delta(x)$ and $M'_\delta(y)$ are close to the (discretized) Gaussian distributions with the same mean vectors and covariance matrices, respectively. We then bound from above the total variation distance between two high-variance Gaussian distributions whose mean vectors are essentially $x$ and $y$.

Recall that $M_\delta(x)$ is the sum of $n-1$ independent $k$-CRVs, and let $\Sigma_1$ denote the covariance matrix of $M_\delta(x)$. For any unit vector $v \in \mathbb{R}^k$ that is orthogonal to the all-ones vector $\mathbf{1}$, we have
$$\begin{aligned}
\mathrm{Var}[v^T X_\delta(e_j)] &= \mathbb{E}\big[(v^T X_\delta(e_j))^2\big] - \big(\mathbb{E}[v^T X_\delta(e_j)]\big)^2\\
&= (1 - \delta)v_j^2 + \frac{\delta}{k-1}\sum_{j' \ne j} v_{j'}^2 - \Big((1 - \delta)v_j + \frac{\delta}{k-1}\sum_{j' \ne j} v_{j'}\Big)^2\\
&= (1 - \delta)v_j^2 + \frac{\delta}{k-1}(1 - v_j^2) - \Big((1 - \delta)v_j - \frac{\delta}{k-1}v_j\Big)^2 \ge \frac{\delta}{k-1},
\end{aligned}$$
where we simplify the expression using the facts that $\sum_j v_j = 0$ and $\sum_j v_j^2 = 1$, and then take the derivative with respect to $v_j$ to minimize it. (Indeed, the last expression equals $\frac{\delta}{k-1} + v_j^2 \cdot \frac{k\delta}{k-1}\big(1 - \frac{k\delta}{k-1}\big)$, which is minimized at $v_j = 0$ whenever $\frac{k\delta}{k-1} \le 1$.) Therefore, for any such unit vector $v$, $v^T\Sigma_1 v = \mathrm{Var}[v^T M_\delta(x)] = \sum_{j \in [k]} x_j \mathrm{Var}[v^T X_\delta(e_j)] \ge \frac{(n-1)\delta}{k-1}$, which implies that $\Sigma_1$ has no eigenvalues less than $\frac{(n-1)\delta}{k-1}$ (except the one associated with $\mathbf{1}$). We then use the following lemma to bound from below the eigenvalues of $\Sigma'_1$:

Lemma 5.5.

Suppose that $\Sigma$ is a positive semidefinite matrix with $\Sigma\mathbf{1} = 0$ and that all other eigenvalues of $\Sigma$ are at least $\sigma$. Then for all vectors $w \in \mathbb{R}^k$ with $w_k = 0$, we have $\frac{w^T\Sigma w}{w^T w} \ge \sigma/k$.
Proof.

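Let $w$ be a vector that minimizes $\frac{w^T\Sigma w}{w^T w}$ over $w \in \mathbb{R}^k$ with $w_k = 0$. Then $v = w - \frac{w^T\mathbf{1}}{k}\mathbf{1}$ has $v^T\mathbf{1} = 0$, and so $v^T\Sigma v \ge \sigma v^T v$. We have $v^T\Sigma v = w^T\Sigma w$ since $v - w$ is a multiple of $\mathbf{1}$, and
$$v^T v = \Big(w - \frac{w^T\mathbf{1}}{k}\mathbf{1}\Big)^T\Big(w - \frac{w^T\mathbf{1}}{k}\mathbf{1}\Big) = w^T w + \frac{(w^T\mathbf{1})^2}{k} - \frac{2(w^T\mathbf{1})^2}{k} = \|w\|_2^2 - \frac{(w^T\mathbf{1})^2}{k} \ge \|w\|_2^2 - \frac{\|w\|_1^2}{k} \ge \frac{w^T w}{k},$$
where the last step follows from the inequality $\|w\|_1 \le \sqrt{k-1}\,\|w\|_2$ (as $w$ has at most $k-1$ non-zero coordinates). Thus, $\frac{w^T\Sigma w}{w^T w} \ge \frac{v^T\Sigma v}{k\,v^T v} \ge \sigma/k$.

Since all except one eigenvalue of each of $\Sigma_1$ and $\Sigma_2$ (the covariance matrices of $M_\delta(x)$ and $M_\delta(y)$) are at least $\frac{(n-1)\delta}{k-1}$, Lemma 5.5 implies that the minimum eigenvalues of $\Sigma'_1$ and $\Sigma'_2$ are at least $\frac{(n-1)\delta}{k(k-1)} \ge \frac{(n-1)\delta}{k^2}$. Let $Z(\mu, \Sigma)$ be the discretized Gaussian obtained by rounding $N(\mu, \Sigma)$ to the nearest integer in every coordinate. Then, by Lemma 5.3, we have
$$d_{TV}\big(M'_\delta(x), Z(\mu'_1, \Sigma'_1)\big) \le \tilde{O}\Big(\frac{k^{9/2}}{\sqrt{n\delta}}\Big), \qquad d_{TV}\big(M'_\delta(y), Z(\mu'_2, \Sigma'_2)\big) \le \tilde{O}\Big(\frac{k^{9/2}}{\sqrt{n\delta}}\Big). \qquad (5)$$

Next, we use Lemma 5.4 to bound the total variation distance between the $(k-1)$-dimensional Gaussian distributions $N(\mu'_1, \Sigma'_1)$ and $N(\mu'_2, \Sigma'_2)$. Let $\mu_1, \mu_2$ and $\Sigma_1, \Sigma_2$ be the mean vectors and the covariance matrices of $M_\delta(x)$ and $M_\delta(y)$ respectively, and let $\mu'_1, \mu'_2, \Sigma'_1, \Sigma'_2$ be their restrictions to the first $k-1$ coordinates. Observe that $\mu_1 = \big(1 - \frac{k\delta}{k-1}\big)x + \frac{(n-1)\delta}{k-1}\mathbf{1}$. With $s_v = \max\{v^T\Sigma_1 v, v^T\Sigma_2 v\}$, we have $s_v \ge \frac{(n-1)\delta}{k-1}$ for every unit vector $v$ orthogonal to $\mathbf{1}$, and $s_v \ge \frac{(n-1)\delta}{k(k-1)}$ for every unit vector $v$ with $v_k = 0$ (by Lemma 5.5). Moreover, for any unit vector $v \in \mathbb{R}^k$,
$$|v^T(\mu_1 - \mu_2)| = \Big|\Big(1 - \frac{k\delta}{k-1}\Big)v^T(x - y)\Big| \le \Big(1 - \frac{k\delta}{k-1}\Big)\|x - y\|_2 \le \|x - y\|_1.$$
If the unit vector $v$ is orthogonal to $\mathbf{1}$, we can use the expression for $\mathrm{Var}[v^T X_\delta(e_j)]$ derived earlier. Taking the derivative with respect to $v_j$ shows that $\mathrm{Var}[v^T X_\delta(e_j)]$ increases with $v_j^2$, so it is at most its value at $v = \pm e_j$. Hence, we can write
$$\begin{aligned}
|v^T(\Sigma_1 - \Sigma_2)v| &= \Big|\sum_{j \in [k]} (x_j - y_j)\,\mathrm{Var}[v^T X_\delta(e_j)]\Big| \le \|x - y\|_1 \max_j \mathrm{Var}[v^T X_\delta(e_j)]\\
&\le \|x - y\|_1 \Big[(1 - \delta) - \Big(1 - \frac{k\delta}{k-1}\Big)^2\Big] \le \frac{k+1}{k-1}\,\delta\,\|x - y\|_1 \le 3\delta\|x - y\|_1.
\end{aligned}$$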
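The variance bounds used in this step and the inequality of Lemma 5.5 can be spot-checked numerically. A minimal sketch, not part of the argument (helper names are hypothetical): it samples random unit vectors $v \perp \mathbf{1}$ and checks $\frac{\delta}{k-1} \le \mathrm{Var}[v^T X_\delta(e_j)] \le (1 - \delta) - (1 - \frac{k\delta}{k-1})^2$; for Lemma 5.5 it checks only the particular matrix $\Sigma = \sigma(I - \mathbf{1}\mathbf{1}^T/k)$, an assumption made to keep the example free of a linear-algebra dependency.

```python
import math
import random


def var_projection(v, j, delta):
    """Var[v^T X_delta(e_j)] for the k-CRV X_delta(e_j)."""
    k = len(v)
    probs = [1 - delta if jj == j else delta / (k - 1) for jj in range(k)]
    mean = sum(p * vi for p, vi in zip(probs, v))
    second = sum(p * vi * vi for p, vi in zip(probs, v))
    return second - mean * mean


def random_unit_orthogonal_to_ones(k, rng):
    """A random unit vector v with sum(v) = 0."""
    v = [rng.gauss(0, 1) for _ in range(k)]
    avg = sum(v) / k
    v = [vi - avg for vi in v]
    norm = math.sqrt(sum(vi * vi for vi in v))
    return [vi / norm for vi in v]


def rayleigh_quotient_centered(w, sigma):
    """w^T Sigma w / w^T w for Sigma = sigma * (I - 11^T / k), which satisfies
    Sigma 1 = 0 and has every other eigenvalue equal to sigma."""
    k = len(w)
    s = sum(w)
    ww = sum(wi * wi for wi in w)
    return sigma * (ww - s * s / k) / ww
```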
To see that the upper bound on $|v^T(\Sigma_1 - \Sigma_2)v|$ holds for all unit vectors, observe that for both covariance matrices $\Sigma_1\mathbf{1} = \Sigma_2\mathbf{1} = \mathbf{0}$. For any unit vector $v'$, we can take its projection onto the subspace orthogonal to $\mathbf{1}$ and write $v' = \alpha v + \beta\mathbf{1}$, for some $|\alpha| \le 1$ and a unit vector $v$ that is orthogonal to $\mathbf{1}$. That is,
$$|v'^T(\Sigma_1 - \Sigma_2)v'| = |(\alpha v + \beta\mathbf{1})^T(\Sigma_1 - \Sigma_2)(\alpha v + \beta\mathbf{1})| = \alpha^2|v^T(\Sigma_1 - \Sigma_2)v| \le |v^T(\Sigma_1 - \Sigma_2)v|.$$
Thus, for all unit vectors $v \in \mathbb{R}^k$, we have $|v^T(\mu_1 - \mu_2)| \le \|x - y\|_1$ and $|v^T(\Sigma_1 - \Sigma_2)v| \le 3\delta\|x - y\|_1$. In particular, this holds for unit vectors with $k$-th coordinate 0. Hence, for all unit vectors $v \in \mathbb{R}^{k-1}$, we have $|v^T(\mu'_1 - \mu'_2)| \le \|x - y\|_1$ and $|v^T(\Sigma'_1 - \Sigma'_2)v| \le 3\delta\|x - y\|_1$.

Finally, since $s_v \ge \frac{(n-1)\delta}{k(k-1)}$ for all unit vectors $v$ with $v_k = 0$ (Lemma 5.5), we set $\epsilon = O\big(\frac{k}{\sqrt{n\delta}} + \frac{k^{5/2}}{n}\big)\|x - y\|_1$ to satisfy the requirements of Lemma 5.4, and therefore $d_{TV}(N(\mu'_1, \Sigma'_1), N(\mu'_2, \Sigma'_2)) \le \epsilon$. By the data processing inequality, rounding both distributions to the nearest integer coordinates does not increase their total variation distance, therefore
$$d_{TV}\big(Z(\mu'_1, \Sigma'_1), Z(\mu'_2, \Sigma'_2)\big) \le \epsilon. \qquad (6)$$
By the triangle inequality, Equations (5) and (6) yield
$$\begin{aligned}
d_{TV}(M_\delta(x), M_\delta(y)) &= d_{TV}(M'_\delta(x), M'_\delta(y))\\
&\le d_{TV}\big(M'_\delta(x), Z(\mu'_1, \Sigma'_1)\big) + d_{TV}\big(Z(\mu'_1, \Sigma'_1), Z(\mu'_2, \Sigma'_2)\big) + d_{TV}\big(Z(\mu'_2, \Sigma'_2), M'_\delta(y)\big)\\
&\le \tilde{O}\Big(\frac{k^{9/2}}{\sqrt{n\delta}}\Big) + O\Big(\frac{k}{\sqrt{n\delta}} + \frac{k^{5/2}}{n}\Big)\|x - y\|_1 \le \tilde{O}\Big(\frac{k^{9/2}}{\sqrt{n\delta}}\Big)\|x - y\|_1.
\end{aligned}$$
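The mean formula for $M_\delta(x)$ and the covariance bound $|v^T(\Sigma_1 - \Sigma_2)v| \le 3\delta\|x - y\|_1$ for all unit vectors can be checked numerically on a small instance. A minimal sketch (helper names are hypothetical; $x$ and $y$ are arbitrary points with the same coordinate sum $n - 1$), which builds the exact mean vector and covariance matrix as sums over the independent CRVs.

```python
import random


def crv(j, k, delta):
    """Probability vector of X_delta(e_j)."""
    return [1 - delta if jj == j else delta / (k - 1) for jj in range(k)]


def mean_cov(x, delta):
    """Mean vector and covariance matrix of M_delta(x) = sum_j x_j copies of X_delta(e_j)."""
    k = len(x)
    mu = [0.0] * k
    cov = [[0.0] * k for _ in range(k)]
    for j, cnt in enumerate(x):
        p = crv(j, k, delta)
        for r in range(k):
            mu[r] += cnt * p[r]
            for c in range(k):
                # covariance of a single CRV: diag(p) - p p^T
                cov[r][c] += cnt * ((p[r] if r == c else 0.0) - p[r] * p[c])
    return mu, cov


def quad(v, M):
    """The quadratic form v^T M v."""
    return sum(v[r] * M[r][c] * v[c] for r in range(len(v)) for c in range(len(v)))
```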
Finally, $M_\delta(x) = M_\delta(y)$ when $x = y$, so we can assume that $\|x - y\|_1 \ge 1$, which (together with $\sqrt{n\delta} \le n$) gives the last inequality above. Therefore, $G_\delta$ is $\lambda$-Lipschitz for $\lambda = \tilde{O}(k^{9/2}/\sqrt{n\delta})$.

References

[AS13] Yaron Azrieli and Eran Shmaya. Lipschitz games. Mathematics of Operations Research, 38(2):350–357, 2013.
[Blo99] Matthias Blonski. Anonymous games with binary actions. Games and Economic Behavior, 28(2):171–180, 1999.
[Blo05] Matthias Blonski. The women of Cairo: Equilibria in large anonymous games. Journal of Mathematical Economics, 41(3):253–264, 2005.
[CDO15] Xi Chen, David Durfee, and Anthi Orfanou. On the complexity of Nash equilibria in anonymous games. In Proceedings of the 47th Annual ACM Symposium on Theory of Computing (STOC), pages 381–390, 2015.
[DDKT16] Constantinos Daskalakis, Anindya De, Gautam Kamath, and Christos Tzamos. A size-free CLT for Poisson multinomials and its applications. In Proceedings of the 48th Annual ACM Symposium on Theory of Computing (STOC), pages 1074–1086, 2016.
[DDS12] Constantinos Daskalakis, Ilias Diakonikolas, and Rocco A. Servedio. Learning Poisson binomial distributions. In Proceedings of the 44th Annual ACM Symposium on Theory of Computing (STOC), pages 709–728, 2012.
[DGP09] Constantinos Daskalakis, Paul W. Goldberg, and Christos H. Papadimitriou. The complexity of computing a Nash equilibrium. SIAM J. Comput., 39(1):195–259, 2009.
[DKS16a] Ilias Diakonikolas, Daniel M. Kane, and Alistair Stewart. The Fourier transform of Poisson multinomial distributions and its algorithmic applications. In Proceedings of the 48th Annual ACM Symposium on Theory of Computing (STOC), pages 1060–1073, 2016.
[DKS16b] Ilias Diakonikolas, Daniel M. Kane, and Alistair Stewart. Optimal learning via the Fourier transform for sums of independent integer random variables. In Proceedings of the 29th Conference on Learning Theory (COLT), pages 831–849, 2016.
[DKS16c] Ilias Diakonikolas, Daniel M. Kane, and Alistair Stewart. Properly learning Poisson binomial distributions in almost polynomial time. In Proceedings of the 29th Conference on Learning Theory (COLT), pages 850–878, 2016.
[DP07] Constantinos Daskalakis and Christos H. Papadimitriou. Computing equilibria in anonymous games. In …, pages 83–93, 2007.
[DP08] Constantinos Daskalakis and Christos H. Papadimitriou. Discretized multinomial distributions and Nash equilibria in anonymous games. In …, pages 25–34, 2008.
[DP09] Constantinos Daskalakis and Christos H. Papadimitriou. On oblivious PTAS's for Nash equilibrium. In Proceedings of the 41st Annual ACM Symposium on Theory of Computing (STOC), pages 75–84, 2009.
[DP15] Constantinos Daskalakis and Christos H. Papadimitriou. Approximate Nash equilibria in anonymous games. J. Economic Theory, 156:207–245, 2015.
[GT15] Paul W. Goldberg and Stefano Turchetta. Query complexity of approximate equilibria in anonymous games. In Proceedings of the 11th Conference on Web and Internet Economics (WINE), pages 357–369, 2015.
[LMM03] Richard J. Lipton, Evangelos Markakis, and Aranyak Mehta. Playing large games using simple strategies. In Proceedings of the 4th ACM Conference on Electronic Commerce (EC), pages 36–41, 2003.
[Mil96] Igal Milchtaich. Congestion games with player-specific payoff functions. Games and Economic Behavior, 13(1):111–124, 1996.
[Rub16] Aviad Rubinstein. Settling the complexity of computing approximate two-player Nash equilibria. In …, 2016. To appear.
[Sim82] Herbert A. Simon. Models of bounded rationality: Empirically grounded economic reason, volume 3. MIT Press, 1982.

Appendix

A Proof of Lemma 3.10

This lemma is a generalization of Lemma 5.5 of [DKS16a], which assumes that $\epsilon = \widetilde{O}_k(1/\sigma)$. Thus, we need to be careful about where this relation was used in the proof.

Note that for a fixed $\xi$, if $\xi'$ satisfies $\xi' \in \xi + \mathbb{Z}^k$ and $\xi'^T\Sigma\xi' \le Ck\log(1/\epsilon)$, then so does $\xi' + i\mathbf{1}$ for all $i \in \mathbb{Z}$ (because $\Sigma\mathbf{1} = 0$). We define
$$T' := \{\xi' \in \mathbb{R}^k : \xi'^T\Sigma\xi' \le Ck\log(1/\epsilon) \text{ and } 0 \le \xi' \cdot \mathbf{1} \le k\}.$$
Then $\xi \in T$ if and only if there is a $\xi' \in T'$ with $\xi - \xi' \in \mathbb{Z}^k$.

(i) Because $\xi - \xi' \in \mathbb{Z}^k$, we have $[\xi_i - \xi_j] \le |\xi'_i - \xi'_j|$. So to prove (i), we need to show that $|\xi'_i - \xi'_j| \le 2\sqrt{Ck\log(1/\epsilon)/\sigma}$ for all $\xi' \in T'$ and all $i, j$. Fix $\xi' \in T'$, and define $\tilde{\xi}'$ to be the projection of $\xi'$ onto the subspace orthogonal to $\mathbf{1}$, i.e., $\tilde{\xi}' = \xi' - \frac{\xi' \cdot \mathbf{1}}{k}\mathbf{1}$. Since $\Sigma\mathbf{1} = 0$ and all other eigenvalues of $\Sigma$ are at least $\sigma$, for all $i, j$ we have
$$|\xi'_i - \xi'_j| = |\tilde{\xi}'_i - \tilde{\xi}'_j| \le 2\|\tilde{\xi}'\|_\infty \le 2\|\tilde{\xi}'\|_2 \le 2\sqrt{\xi'^T\Sigma\xi'/\sigma} \le 2\sqrt{Ck\log(1/\epsilon)/\sigma}.$$
This proves (i).

(ii) Next we consider $\mathrm{Vol}(T')$. If $\xi' \in T'$, we know that $\|\xi' - (\xi' \cdot \mathbf{1}/k)\mathbf{1}\|_2^2 \le Ck\log(1/\epsilon)/\sigma$. Also, $0 \le \xi' \cdot \mathbf{1} \le k$ implies that $\|(\xi' \cdot \mathbf{1}/k)\mathbf{1}\|_2^2 = (\xi' \cdot \mathbf{1})^2/k \le k$. Because these two vectors are orthogonal, we can write
$$\|\xi'\|_2^2 = \Big\|\xi' - \frac{\xi' \cdot \mathbf{1}}{k}\mathbf{1}\Big\|_2^2 + \Big\|\frac{\xi' \cdot \mathbf{1}}{k}\mathbf{1}\Big\|_2^2 \le Ck\log(1/\epsilon)/\sigma + k \le 2Ck\log(1/\epsilon),$$
where the last inequality holds by the assumption that $\sigma \ge 1$ (and $C\log(1/\epsilon) \ge 1$). Thus, $\xi'^T(\Sigma + I)\xi' = \xi'^T\Sigma\xi' + \|\xi'\|_2^2 \le 3Ck\log(1/\epsilon)$. By Claim 5.4 of [DKS16a], we get that $\mathrm{Vol}(T') \le \det(\Sigma + I)^{-1/2} \cdot O(C\log(1/\epsilon))^{k/2}$. It then follows from Lemma 3.9 that $\mathrm{Vol}(T')\,|S| = O(C\log(1/\epsilon))^k$.

To show (ii), we need to show that $\mathrm{Vol}(T) \le \mathrm{Vol}(T')$. Note that $T'$ is a disjoint union of its intersections with unit cubes with integer corners, and so
$$\mathrm{Vol}(T') = \sum_{b \in \mathbb{Z}^k} \mathrm{Vol}\Big(T' \cap \prod_{i=1}^{k} [b_i, b_i + 1)\Big).$$
On the other hand, $T$ is the union of translations of these sets,
$$T = \bigcup_{b \in \mathbb{Z}^k} \Big\{\xi' - b : \xi' \in T' \cap \prod_{i=1}^{k} [b_i, b_i + 1)\Big\},$$
so $\mathrm{Vol}(T) \le \mathrm{Vol}(T')$.

(iii) By the pigeonhole principle (the $k$ fractional parts of the coordinates leave a gap of length at least $1/k$ on the unit circle), for every $\xi \in \mathbb{R}^k$ there is an interval $I_\xi$ of length $\frac{k}{k+1}$ such that there exists $\xi' \in \xi + \mathbb{Z}^k$ whose coordinates all lie in $I_\xi$. We define
$$T_m := \Big\{\xi \in [0,1]^k : \exists\, \xi' \in (\xi + \mathbb{Z}^k) \cap I_\xi^k \text{ with } 2^m Ck\log(1/\epsilon) \le \xi'^T\Sigma\xi' \le 2^{m+1}Ck\log(1/\epsilon)\Big\}.$$
Then $T \cup \bigcup_{m=0}^{\infty} T_m = [0,1]^k$, although these sets need not be disjoint. Thus, $[0,1]^k \setminus T \subseteq \bigcup_{m=0}^{\infty} T_m$, and so
$$\int_{[0,1]^k \setminus T} |\hat{X}(\xi)|\,d\xi \le \sum_{m=0}^{\infty} \mathrm{Vol}(T_m) \sup_{\xi \in T_m} |\hat{X}(\xi)|.$$
If we apply (ii) of this lemma with $2^{m+1}C$ instead of $C$, the resulting set $T$ would be a superset of $T_m$. Thus, $\mathrm{Vol}(T_m) \le O(2^{m+1}C\log(1/\epsilon))^k/|S|$. To show (iii), we bound $\sup_{\xi \in T_m} |\hat{X}(\xi)|$ using the following claim, which gives a "Gaussian decay" upper bound on the magnitude of the Fourier transform.

Claim A.1.

For $\xi \in T_m$, it holds that $|\widehat{X}(\xi)| \le \exp(-\Omega(C2^m\log(1/\epsilon)/k))$. If additionally $m \le k$, then $|\widehat{X}(\xi)| = \exp(-\Omega(C2^m k\log(1/\epsilon)))$.

Proof. We take $\xi' \in (\xi + \mathbb{Z}^k) \cap I_\xi^k$ as in the definition of $T_m$. Lemma 3.10 of [DKS16a] gives that if the coordinates of $\xi'$ lie in an interval of length $1 - \delta$, then

$$|\widehat{X}(\xi)| = |\widehat{X}(\xi')| \le \exp(-\Omega(\delta^2\, \xi'^T \Sigma \xi')) = \exp(-\Omega(C2^m k\log(1/\epsilon)\,\delta^2)).$$

By the definition of $T_m$, we can take $\delta = \frac{1}{k+1}$ to get the bound $|\widehat{X}(\xi)| \le \exp(-\Omega(C2^m\log(1/\epsilon)/k))$.

To get the stronger bound, we need to show that when $m$ is small, all coordinates of $\xi'$ lie in a shorter interval. Indeed, if we apply (i) of this lemma with $2^{m+1}C$ instead of $C$, we have $|\xi'_i - \xi'_j| \le \sqrt{2^{m+3}Ck\log(1/\epsilon)/\sigma}$ for all $i, j$. When $m \le \log(\sigma/(Ck\log(1/\epsilon))) - 4$, this is at most $1/\sqrt{2} < 3/4$, so all coordinates of $\xi'$ lie in an interval of length $3/4$ and we can take $\delta = 1/4$, which gives $|\widehat{X}(\xi)| = \exp(-\Omega(C2^m k\log(1/\epsilon)))$. This is where the lower bound on $\sigma$ is used: we need $m \le k \le \log(\sigma/(Ck\log(1/\epsilon))) - 4$, which holds when $\sigma \ge 2^{k+4}Ck\log(1/\epsilon)$.

Finally, for (iii) we can write

$$\begin{aligned}
\int_{[0,1]^k \setminus T} |\widehat{X}(\xi)|\, d\xi &\le \sum_{m=0}^\infty \mathrm{Vol}(T_m) \sup_{\xi \in T_m} |\widehat{X}(\xi)| \\
&\le \sum_{m=0}^\infty \frac{O(2^{m+1}C\log(1/\epsilon))^k}{|S|} \sup_{\xi \in T_m} |\widehat{X}(\xi)| \\
&\le \frac{O(C\log(1/\epsilon))^k}{|S|} \sum_{m=0}^\infty 2^{mk} \sup_{\xi \in T_m} |\widehat{X}(\xi)|.
\end{aligned}$$

We divide this sum into two pieces:

$$\begin{aligned}
\sum_{m=0}^{k} 2^{mk} \sup_{\xi \in T_m} |\widehat{X}(\xi)| &\le \sum_{m=0}^{k} 2^{mk} \exp(-\Omega(C2^m k\log(1/\epsilon))) \\
&\le \sum_{m=0}^{k} \exp(-\Omega(C(2^m - m)k\log(1/\epsilon))) \\
&\le \sum_{m=0}^{k} 2^{-m} \exp(-\Omega(Ck\log(1/\epsilon))) \\
&\le \exp(-\Omega(Ck\log(1/\epsilon))) = \epsilon^{\Omega(Ck)},
\end{aligned}$$

and

$$\begin{aligned}
\sum_{m=3\log k}^{\infty} 2^{mk} \sup_{\xi \in T_m} |\widehat{X}(\xi)| &\le \sum_{m=3\log k}^{\infty} 2^{mk} \exp(-\Omega(C2^m\log(1/\epsilon)/k)) \\
&\le \sum_{m=3\log k}^{\infty} \exp(-\Omega(C(2^m - mk^2)\log(1/\epsilon)/k)) \\
&\le \sum_{m=3\log k}^{\infty} \exp(-\Omega(C(k^2 + mk)\log(1/\epsilon)/k)) \\
&\le \sum_{m=3\log k}^{\infty} 2^{-m} \exp(-\Omega(Ck\log(1/\epsilon))) \le \epsilon^{\Omega(Ck)}.
\end{aligned}$$

We thus have $\int_{[0,1]^k \setminus T} |\widehat{X}(\xi)|\, d\xi \le O(C\log(1/\epsilon))^k\, \epsilon^{\Omega(Ck)}/|S| \le$ ǫ/ (2 ||