Approximating Nash Equilibria and Dense Subgraphs via an Approximate Version of Carathéodory's Theorem
Siddharth Barman∗

Abstract
We present algorithmic applications of an approximate version of Carathéodory's theorem. The theorem states that given a set of vectors $X$ in $\mathbb{R}^d$, for every vector in the convex hull of $X$ there exists an $\varepsilon$-close (under the $p$-norm distance, for $2 \leq p < \infty$) vector that can be expressed as a convex combination of at most $b$ vectors of $X$, where the bound $b$ depends on $\varepsilon$ and the norm $p$ and is independent of the dimension $d$. This theorem can be derived by instantiating Maurey's lemma, early references to which can be found in the work of Pisier (1981) and Carl (1985). However, in this paper we present a self-contained proof of this result.

Using this theorem we establish that in a bimatrix game with $n \times n$ payoff matrices $A, B$, if the number of non-zero entries in any column of $A + B$ is at most $s$, then an $\varepsilon$-Nash equilibrium of the game can be computed in time $n^{O(\log s/\varepsilon^2)}$. This, in particular, gives us a polynomial-time approximation scheme for Nash equilibrium in games with fixed column sparsity $s$. Moreover, for arbitrary bimatrix games (since $s$ can be at most $n$) the running time of our algorithm matches the best-known upper bound, which was obtained by Lipton, Markakis, and Mehta (2003).

The approximate Carathéodory's theorem also leads to an additive approximation algorithm for the normalized densest $k$-subgraph problem. Given a graph with $n$ vertices and maximum degree $d$, the developed algorithm determines a subgraph with exactly $k$ vertices with normalized density within $\varepsilon$ (in the additive sense) of the optimal, in time $n^{O(\log d/\varepsilon^2)}$. Additionally, we show that a similar approximation result can be achieved for the problem of finding a $k \times k$-bipartite subgraph of maximum normalized density.

1 Introduction

Carathéodory's theorem is a fundamental dimensionality result in convex geometry. It states that any vector in the convex hull of a set $X \subset \mathbb{R}^d$ can be expressed as a convex combination of at most $d + 1$ vectors of $X$.
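For concreteness, here is a small worked instance of the exact theorem (our own example):

```latex
% X = {(0,0), (1,0), (0,1), (1,1)} \subset R^2, so d + 1 = 3.
% The point (1/2, 1/4) \in conv(X) is a convex combination of just
% three of the four vectors, matching the d + 1 bound:
\[
\left(\tfrac12, \tfrac14\right)
  \;=\; \tfrac14\,(0,0) \;+\; \tfrac12\,(1,0) \;+\; \tfrac14\,(0,1).
\]
```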
This paper considers a natural approximate version of Carathéodory's theorem, where the goal is to seek convex combinations that are close enough to vectors in the convex hull. (The bound of $d + 1$ in the exact theorem is tight.) We show that, given a set of vectors $X$ in the $p$-unit ball, i.e., $X \subseteq \{v \in \mathbb{R}^d \mid \|v\|_p \leq 1\}$, with norm $p \in [2, \infty)$, for every vector $\mu$ in the convex hull of $X$ there exists an $\varepsilon$-close (under the $p$-norm distance) vector $\mu'$ that can be expressed as a convex combination of $O(p/\varepsilon^2)$ vectors of $X$. A notable aspect of this result is that the number of vectors of $X$ required to express $\mu'$, i.e., $O(p/\varepsilon^2)$, is independent of the underlying dimension $d$. This theorem can be derived by instantiating Maurey's lemma, early references to which can be found in the work of Pisier [30] and Carl [10]. However, in this paper we present a self-contained proof of this result, which we proceed to outline below. The author was made aware of the connection with Maurey's lemma after a preliminary version of this work had appeared.

∗California Institute of Technology. [email protected]

To establish the approximate version of Carathéodory's theorem we use the probabilistic method. Given a vector $\mu$ in the convex hull of a set $X \subset \mathbb{R}^d$, consider a convex combination of vectors of $X$ that generates $\mu$. The coefficients in this convex combination induce a probability distribution over $X$, and the mean of this distribution is $\mu$. The approach is to draw $b$ independent and identically distributed (i.i.d.) samples from this distribution and show that, for an appropriate number of samples, with positive probability the sample mean is close to $\mu$ under the $p$-norm distance, for $p \in [2, \infty)$. Therefore, the probabilistic method implies that there exists a vector close to $\mu$ that can be expressed as a convex combination of at most $b$ vectors, where $b$ is the number of samples we drew.

Note that in this context applying the probabilistic method is a natural idea, but a direct application of this method will not work. Specifically, a dimension-free result is unlikely if we first try to prove that the $i$th component of the sample mean vector is close to the $i$th component of $\mu$ for every $i \in [d]$, since this would entail a union bound over the number of components $d$. Bypassing such a component-wise analysis requires the use of atypical ideas. We are able to accomplish this task and, in particular, bound (in expectation) the $p$-norm distance between $\mu$ and the sample mean vector via an interesting application of the Khintchine inequality (see Theorem 1).

Given the significance of Carathéodory's theorem, this approximate version is interesting in its own right. The key contribution of the paper is to substantiate the algorithmic relevance of this approximate version by developing new algorithmic applications. Our applications include additive approximation algorithms for (i) Nash equilibria in two-player games, and (ii) the densest subgraph problem. These algorithmic results are outlined below.

Algorithmic Applications
Approximate Nash Equilibria.
Nash equilibria are central constructs in game theory that are used to model likely outcomes of strategic interactions between self-interested entities, like human players. They denote distributions over actions of players under which no player can benefit, in expectation, by unilateral deviation. These solution concepts are arguably the most well-studied in game theory. While computing an exact Nash equilibrium is known to be computationally hard, whether an approximate Nash equilibrium can be computed in polynomial time still remains open. Throughout this paper we will consider the standard additive notion of approximate Nash equilibria, defined as follows: a pair of distributions, one for each player, is said to be an $\varepsilon$-Nash equilibrium if any unilateral deviation increases utility by at most $\varepsilon$, in expectation.

We apply the approximate version of Carathéodory's theorem to address this central open question. Specifically, we prove that in a bimatrix game with $n \times n$ payoff matrices $A, B$, i.e., a two-player game with $n$ actions for each player, if the number of non-zero entries in any column of $A + B$ is at most $s$, then an $\varepsilon$-Nash equilibrium of the game can be computed in time $n^{O(\log s/\varepsilon^2)}$. Our result, in particular, shows that games with fixed column sparsity $s$ admit a polynomial-time approximation scheme (PTAS) for Nash equilibrium. Along the lines of zero-sum games (which model strict competition), games with fixed column sparsity capture settings in which, except for particular action profiles, the gains and losses of the two players balance out. In other words, such games are a natural generalization of zero-sum games; recall that zero-sum games admit efficient computation of Nash equilibrium (see, e.g., [29]).

It is also worth pointing out that for an arbitrary bimatrix game the running time of our algorithm is $n^{O(\log n/\varepsilon^2)}$, since $s$ is at most $n$.
Given that the best-known algorithm for computing an $\varepsilon$-Nash equilibrium also runs in time $n^{O(\log n/\varepsilon^2)}$ [25], for general games the time complexity of our algorithm matches the best-known upper bound. Overall, this result provides a parameterized understanding of the complexity of computing approximate Nash equilibrium in terms of a very natural measure, the column sparsity $s$ of the matrix $A + B$.

Our framework can address other notions of sparsity as well. Specifically, if there exist constants $\alpha, \beta \in \mathbb{R}_+$ and $\gamma \in \mathbb{R}$ such that the matrix $\alpha A + \beta B + \gamma \mathbf{1}_{n \times n}$ has column or row sparsity $s$, then our algorithm can be directly adopted to find an $\varepsilon$-Nash equilibrium of the game $(A, B)$ in time $n^{O(\log s/\varepsilon^2)}$; here, $\mathbf{1}_{n \times n}$ is the all-ones $n \times n$ matrix. (Note that, given matrices $A$ and $B$, such parameters $\alpha$, $\beta$, and $\gamma$ can be efficiently computed.) Additionally, the same running-time bound can be achieved for approximating Nash equilibrium in games wherein both matrices $A$ and $B$ have column or row sparsity $s$. Note that this case is not subsumed by the previous result; in particular, if the columns of matrix $A$ and the rows of matrix $B$ are sparse, then it is not necessary that $A + B$ has low column or row sparsity.

We also refine the following result of Daskalakis and Papadimitriou [17]: they develop a PTAS for bimatrix games that admit an equilibrium with small, specifically $O(1/n)$, probability values. This result is somewhat surprising, since such small-probability equilibria have large, $\Omega(n)$, support, and hence are not amenable to, say, exhaustive search. We show that if a game has an equilibrium with probability values $O(1/m)$, for $m \in [n]$, then an approximate equilibrium can be computed in time $n^t$, where $t = O(\log(s/m)/\varepsilon^2)$. Since $s \leq n$, we get the result of [17] as a special case.

Densest Subgraph.
In the normalized densest $k$-subgraph problem (NDkS) we are given a simple graph and the objective is to find a size-$k$ subgraph (i.e., a subgraph containing exactly $k$ vertices) of maximum density; here, density is normalized to be at most one, i.e., for a subgraph with $k$ vertices, it is defined to be the number of edges in the subgraph divided by $\binom{k}{2}$. NDkS is simply a normalized version of the standard densest $k$-subgraph problem (see, e.g., [1] and references therein), wherein the goal is to find a subgraph with $k$ vertices with the maximum possible number of edges in it. The densest $k$-subgraph problem (DkS) is computationally hard, and it is shown in [1] that a constant-factor approximation for DkS is unlikely. This result implies that NDkS is hard to approximate (multiplicatively) within a constant factor as well.

In this paper we focus on an additive approximation for NDkS. In particular, our objective is to compute a size-$k$ subgraph whose density is close (in the additive sense) to the optimal. The paper also presents additive approximations for the densest $k$-bipartite subgraph (DkBS) problem. DkBS is a natural variant of NDkS, and the goal in this problem is to find two size-$k$ vertex subsets of maximum density. In the bipartite case, the density of vertex subsets $S$ and $T$ is defined to be the number of edges between the two subsets divided by $|S||T|$.

Hardness of additively approximating DkBS was studied by Hazan and Krauthgamer [20]. Specifically, the reduction in [20] rules out an additive PTAS for DkBS, under complexity-theoretic assumptions. In terms of upper bounds, the result of Alon et al. [3] presents an algorithm for this problem that runs in time exponential in the rank of the adjacency matrix.

This paper develops the following complementary upper bounds: given a graph with $n$ vertices and maximum degree $d$, an $\varepsilon$-additive approximation for NDkS can be computed in time $n^{O(\log d/\varepsilon^2)}$. This paper also presents an algorithm with the same time complexity for additively approximating DkBS.

Approximate Version of Carathéodory's Theorem.
In this paper we provide a self-contained proof of the approximate version of Carathéodory's theorem, employing the Khintchine inequality (see Theorem 1), and use the theorem to develop new approximation algorithms. As mentioned earlier, the approximate version of Carathéodory's theorem can also be obtained by instantiating Maurey's lemma, which, in particular, appears in the analysis and operator theory literatures; see, e.g., [30, 10, 8].

(In the reduction of [20] mentioned above, the problem of determining a planted clique is reduced to that of computing an $\varepsilon$-additive approximation for DkBS, with a sufficiently small but constant $\varepsilon$.)

Approximate Nash Equilibria. The computation of equilibria is an active area of research. Computing a Nash equilibrium is known to be computationally hard [12, 14], and in light of these findings, a considerable effort has been directed towards understanding the complexity of approximate Nash equilibrium. Results in this direction include both upper bounds [25, 22, 15, 21, 16, 23, 18, 7, 33, 34, 3, 2] and lower bounds [20, 13, 9]. In particular, it is known that for a general bimatrix game an approximate Nash equilibrium can be computed in quasi-polynomial time [25]. Polynomial-time algorithms have been developed for computing approximate Nash equilibria for fixed values of the approximation factor $\varepsilon$; the best-known result of this type shows that a $0.3393$-approximate Nash equilibrium can be computed in polynomial time. Efficient approximation algorithms are also known for games in which the sum of the payoff matrices, $A + B$, has logarithmic rank. Our result is incomparable to such rank-based results, since a sparse matrix can have high rank and a low-rank matrix can have high sparsity.

Chen et al. [11] considered sparsity in the context of games and showed that computing an exact Nash equilibrium is hard even if both the payoff matrices have a fixed number of non-zero entries in every row and column. It was observed in [17] that such games admit a trivial PTAS. Note that we study a strictly larger class of games and provide a PTAS for games in which the row or column sparsity of $A + B$ is fixed.

Densest Subgraph.
The best-known (multiplicative) approximation ratio for the densest $k$-subgraph problem is approximately $n^{1/4}$ [6]. But unlike this result, our work addresses additive approximations with normalized density as the maximization objective. In particular, we approximate NDkS by approximately solving a quadratic program, which is similar to the quadratic program used in the Motzkin–Straus theorem [28].

In addition, our approximation algorithm for DkBS is based on solving a bilinear program that was formulated by Alon et al. [3]. This bilinear program was used in [3] to develop an additive PTAS for DkBS in particular classes of graphs, including ones with low-rank adjacency matrices. This paper supplements prior work by developing an approximation algorithm whose running time is parametrized by the maximum degree of the given graph, and not by the rank of its adjacency matrix.

Approximate Nash Equilibria.
Our algorithm for computing an approximate Nash equilibrium relies on finding a near-optimal solution of a bilinear program (BP). The BP we consider was formulated by Mangasarian and Stone [26], and its optimal (near-optimal) solutions correspond to (approximate) Nash equilibria. (Regarding the trivial PTAS mentioned above: the product of uniform distributions over players' actions corresponds to an approximate Nash equilibrium in such games.) Its variables, $x$ and $y$, correspond to probability distributions that are mixed strategies of the players, and its objective is to maximize $x^T C y$, where $C$ is the sum of the payoff matrices of the game. Suppose we knew the vector $u := C\hat{y}$, for some Nash equilibrium $(\hat{x}, \hat{y})$. Then, a Nash equilibrium can be efficiently computed by solving a linear program (with variables $x$ and $y$) that is obtained by modifying the BP as follows: replace $x^T C y$ by $x^T u$ as the objective and include the constraint $Cy = u$. Section 4 shows that this idea can be used to find an approximate Nash equilibrium even if $u$ is not exactly equal to $C\hat{y}$ but close to it. That is, to find an approximate Nash equilibrium it suffices to have a vector $u$ for which $\|C\hat{y} - u\|_p$ is small.

To apply the approximate version of Carathéodory's theorem we observe that $C\hat{y}$ is a vector in the convex hull of the columns of $C$. Also, note that in the context of (additive) approximate Nash equilibria the payoff matrices are normalized; hence the absolute value of any entry of matrix $C$ is no more than, say, 2. This entry-wise normalization implies that if no column of matrix $C$ has more than $s$ non-zero entries, then for $p = \log s$ the $p$-norm of the columns is a fixed constant: $\|C_i\|_p \leq (s \cdot 2^p)^{1/p} = 2\, s^{1/p} \leq 4$, where $C_i$ is the $i$th column of $C$. This is a simple but critical observation, since it implies that, modulo a small scaling factor, the columns of $C$ lie in the $(\log s)$-unit ball. At this point we can apply the approximate version of Carathéodory's theorem to guarantee that close to $C\hat{y}$ there exists a vector $u$ that can be expressed as a convex combination of about $(\log s)/\varepsilon^2$ columns of $C$. We show in Section 4 that exhaustively searching for such a $u$ takes $n^{O(\log s/\varepsilon^2)}$ time, where $n$ is the number of columns of $C$. Thus we can find a vector close to $C\hat{y}$ and hence determine a near-optimal solution of the bilinear program. This way we get an approximate Nash equilibrium, and the running time of the algorithm is dominated by the exhaustive search.

Overall, this template for approximating Nash equilibria in sparse games is made possible by the approximate version of Carathéodory's theorem. It is notable that our algorithmic framework employs arbitrary norms $p \in [2, \infty)$, and in this sense it goes beyond standard $\varepsilon$-net-based results that typically use norms 1, 2, or $\infty$.

Densest Subgraph.
The algorithmic approach outlined above applies to any quadratic or bilinear program in which the objective matrix is column (or row) sparse and the feasible region is contained in the simplex. We use this observation to develop additive approximations for NDkS and DkBS.

Specifically, we formulate a quadratic program, near-optimal solutions of which correspond to approximate solutions of NDkS. The column sparsity of the objective matrix in the quadratic program is equal to the maximum degree of the underlying graph plus one. Hence, using the above-mentioned observation, we obtain the approximation result for NDkS. The same template applies to DkBS; for this problem we employ a bilinear program from [3]. (We ignore the linear part of the objective for ease of presentation; see Section 4 for details.)

1.3 Organization

We begin by setting up notation in Section 2. Then, in Section 3 we present the approximate version of Carathéodory's theorem. Algorithmic applications of the theorem are developed in Sections 4 and 5. In Section 6 we consider convex hulls of matrices and also detail approximate versions of the colorful Carathéodory theorem and Tverberg's theorem. Finally, Section 7 presents a lower bound showing that, in general, $\varepsilon$-close (under the $p$-norm distance with $p \in [2, \infty)$) vectors cannot be expressed as a convex combination of fewer than $\varepsilon^{-p/(p-1)}$ vectors of the given set.

2 Notation

Write $\|x\|_p$ to denote the $p$-norm of a vector $x \in \mathbb{R}^d$. The Euclidean norm is denoted by $\|x\|$, i.e., we drop the subscript from $\|x\|_2$. The number of non-zero components of a vector $x$ is specified via the $\ell_0$ "norm": $\|x\|_0 := |\{i \mid x_i \neq 0\}|$. Let $\Delta_n$ be the set of probability distributions over the set $[n]$. For $x \in \Delta_n$, we define $\mathrm{Supp}(x) := \{i \mid x_i \neq 0\}$. Similarly, for a vector $v \in \mathbb{R}^n$, write $\mathrm{Supp}(v)$ to denote the set $\{i \mid v_i \neq 0\}$.

Given a set $X = \{x_1, x_2, \ldots, x_n\} \subset \mathbb{R}^d$, we use the standard abbreviation $\mathrm{conv}(X)$ for the convex hull of $X$.
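A minimal numpy sketch of the norms and supports just defined (the helper names `p_norm` and `supp` are our own):

```python
import numpy as np

x = np.array([0.5, 0.0, 0.3, 0.2])  # a vector in the simplex Delta_4

def p_norm(v, p):
    """The p-norm ||v||_p = (sum_i |v_i|^p)^(1/p)."""
    return np.sum(np.abs(v) ** p) ** (1.0 / p)

def supp(v):
    """Supp(v): indices of the non-zero components of v."""
    return {i for i, vi in enumerate(v) if vi != 0}

ell0 = np.count_nonzero(x)   # the l0 "norm" ||x||_0

print(p_norm(x, 2))          # the Euclidean norm ||x||
print(ell0)                  # 3
print(sorted(supp(x)))       # [0, 2, 3]
```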
A vector $y \in \mathrm{conv}(X)$ is said to be $k$-uniform with respect to $X$ if there exists a size-$k$ multiset $S$ of $[n]$ such that $y = \frac{1}{k} \sum_{i \in S} x_i$. In particular, if vector $y$ is $k$-uniform with respect to $X$, then $y$ can be expressed as a convex combination of at most $k$ vectors from $X$. Throughout, the set $X$ will be clear from context, so we will simply say that a vector is $k$-uniform and not explicitly mention the fact that uniformity is with respect to $X$.

3 Approximate Carathéodory's Theorem

A key technical ingredient in our proof is the Khintchine inequality (see, e.g., [19] and [24]). The following form of the inequality is derived from a result stated in [31].
Theorem 1 (Khintchine Inequality). Let $r_1, r_2, \ldots, r_m$ be a sequence of i.i.d. Rademacher $\pm 1$ random variables, i.e., $\Pr(r_i = \pm 1) = \frac{1}{2}$ for all $i \in [m]$. In addition, let $u_1, u_2, \ldots, u_m \in \mathbb{R}^d$ be a deterministic sequence of vectors. Then, for $2 \leq p < \infty$,

$$\mathbb{E} \left\| \sum_{i=1}^m r_i u_i \right\|_p \leq \sqrt{p} \left( \sum_{i=1}^m \|u_i\|_p^2 \right)^{1/2}. \qquad (1)$$

Proof. Given a vector $v \in \mathbb{R}^d$, write $\mathrm{diag}(v)$ to denote the $d \times d$ diagonal matrix whose diagonal is equal to $v$. Note that if matrix $Q = \mathrm{diag}(v)$ then $\|Q\|_{S_p} = \|v\|_p$, where $\|Q\|_{S_p}$ denotes the Schatten $p$-norm of $Q$, i.e., $\|Q\|_{S_p} = \|\sigma(Q)\|_p$, with $\sigma(Q)$ the vector of singular values of $Q$. In addition, if we construct diagonal matrices for a sequence of vectors $u_1, u_2, \ldots, u_m \in \mathbb{R}^d$, i.e., set $Q_i = \mathrm{diag}(u_i)$ for all $i \in [m]$, then for any sequence of scalars $\xi_1, \xi_2, \ldots, \xi_m \in \mathbb{R}$ we have $\sum_{i=1}^m \xi_i Q_i = \mathrm{diag}\left( \sum_{i=1}^m \xi_i u_i \right)$.

In order to prove the theorem statement for vectors $u_1, u_2, \ldots, u_m \in \mathbb{R}^d$, we simply use Theorem 2 of [31]. In particular, setting $Q_i = \mathrm{diag}(u_i)$ for all $i \in [m]$ in Theorem 2 of [31] and using the above-stated observations we get:

$$\mathbb{E} \left\| \sum_{i=1}^m r_i u_i \right\|_p^p \leq p^{p/2} \left( \sum_{i=1}^m \|u_i\|_p^2 \right)^{p/2}. \qquad (2)$$

For $p \geq 1$ the $p$th root is a concave function; hence Jensen's inequality, applied to (2), gives us the desired result. □

We are ready to prove the main result of this section. Note that in the following theorem the scaling term $\gamma$ is defined with respect to the $p$-norm.

Theorem 2.
Given a set of vectors $X = \{x_1, x_2, \ldots, x_n\} \subset \mathbb{R}^d$ and $\varepsilon > 0$. For every $\mu \in \mathrm{conv}(X)$ and $2 \leq p < \infty$ there exists an $O\!\left(\frac{p\gamma^2}{\varepsilon^2}\right)$-uniform vector $\mu' \in \mathrm{conv}(X)$ such that $\|\mu - \mu'\|_p \leq \varepsilon$. Here, $\gamma := \max_{x \in X} \|x\|_p$.

Proof. Express $\mu \in \mathrm{conv}(X)$ as a convex combination of the $x_i$s: $\mu = \sum_{i=1}^n \alpha_i x_i$, where $\alpha_i \geq 0$ for all $i \in [n]$ and $\sum_{i=1}^n \alpha_i = 1$. Note that $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n)$ corresponds to a probability distribution over the vectors $x_1, x_2, \ldots, x_n$: under distribution $\alpha$, vector $x_i$ is drawn with probability $\alpha_i$. The vector $\mu$ is the mean of this distribution. Specifically, the $j$th component of $\mu$ is the expected value of the random variable that takes value $x_{i,j}$ with probability $\alpha_i$; here $x_{i,j}$ is the $j$th component of vector $x_i$. We succinctly express these component-wise equalities as follows:

$$\mathbb{E}_{v \sim \alpha}[v] = \mu. \qquad (3)$$

Let $v_1, v_2, \ldots, v_m$ be $m$ i.i.d. draws from $\alpha$. The sample mean vector is defined to be $\frac{1}{m} \sum_{i=1}^m v_i$. Below we specify a function $g : X^m \to \mathbb{R}$ to quantify the $p$-norm distance between the sample mean vector and $\mu$:

$$g(v_1, v_2, \ldots, v_m) := \left\| \frac{1}{m} \sum_{i=1}^m v_i - \mu \right\|_p. \qquad (4)$$

The key technical part of the remainder of the proof is to show that

$$\mathbb{E}[g] \leq \frac{2\sqrt{p}\,\gamma}{\sqrt{m}}. \qquad (5)$$

For $m = \frac{4p\gamma^2}{\varepsilon^2}$ this inequality reduces to $\mathbb{E}[g] \leq \varepsilon$. Therefore, when the number of samples is $m = \frac{4p\gamma^2}{\varepsilon^2}$ we have $\Pr(g \leq \varepsilon) > 0$, i.e., with positive probability the sample mean vector is $\varepsilon$-close to $\mu$ in the $p$-norm. Overall, the stated claim is implied by the probabilistic method.

Recall that in expectation the sample mean is equal to $\mu$, i.e., $\mathbb{E}_{v'_1, \ldots, v'_m \sim \alpha}\left[ \frac{1}{m} \sum_{i=1}^m v'_i \right] = \mu$. Hence, we have

$$\mathbb{E}[g] = \mathbb{E}_{v_1, \ldots, v_m} \left\| \frac{1}{m} \sum_{i=1}^m v_i - \mu \right\|_p \qquad (6)$$
$$= \mathbb{E}_{v_1, \ldots, v_m} \left\| \frac{1}{m} \sum_{i=1}^m v_i - \mathbb{E}_{v'_1, \ldots, v'_m}\left[ \frac{1}{m} \sum_{i=1}^m v'_i \right] \right\|_p \qquad (7)$$
$$= \mathbb{E}_{v_1, \ldots, v_m} \left\| \mathbb{E}_{v'_1, \ldots, v'_m}\left[ \frac{1}{m} \sum_{i=1}^m v_i - \frac{1}{m} \sum_{i=1}^m v'_i \right] \right\|_p. \qquad (8)$$

Note that $\|\cdot\|_p$ is convex for $p \geq 1$. Therefore, Jensen's inequality gives us:

$$\mathbb{E}_{v_1, \ldots, v_m} \left\| \mathbb{E}_{v'_1, \ldots, v'_m}\left[ \frac{1}{m} \sum_{i=1}^m v_i - \frac{1}{m} \sum_{i=1}^m v'_i \right] \right\|_p \leq \mathbb{E}_{v_1, \ldots, v_m}\, \mathbb{E}_{v'_1, \ldots, v'_m} \left\| \frac{1}{m} \sum_{i=1}^m v_i - \frac{1}{m} \sum_{i=1}^m v'_i \right\|_p \qquad (9)$$
$$= \frac{1}{m}\, \mathbb{E}_{\substack{v_1, \ldots, v_m \\ v'_1, \ldots, v'_m}} \left\| \sum_{i=1}^m (v_i - v'_i) \right\|_p. \qquad (10)$$

Let $r_1, r_2, \ldots, r_m$ be a sequence of i.i.d. Rademacher $\pm 1$ random variables, i.e., $\Pr(r_i = \pm 1) = \frac{1}{2}$ for all $i \in [m]$. Since, for all $i \in [m]$, $v_i$ and $v'_i$ are i.i.d. copies, we can write

$$\frac{1}{m}\, \mathbb{E}_{v_i, v'_i} \left\| \sum_{i=1}^m (v_i - v'_i) \right\|_p = \frac{1}{m}\, \mathbb{E}_{v_i, v'_i, r_i} \left\| \sum_{i=1}^m r_i (v_i - v'_i) \right\|_p$$
$$\leq \frac{1}{m}\, \mathbb{E}_{v_i, v'_i, r_i} \left[ \left\| \sum_{i=1}^m r_i v_i \right\|_p + \left\| \sum_{i=1}^m r_i v'_i \right\|_p \right] \qquad \text{(triangle inequality)}$$
$$= \frac{1}{m}\, \mathbb{E}_{r_i}\, \mathbb{E}_{v_i, v'_i} \left[ \left\| \sum_{i=1}^m r_i v_i \right\|_p + \left\| \sum_{i=1}^m r_i v'_i \right\|_p \,\middle|\, r_1, \ldots, r_m \right] \qquad \text{(tower property)}$$
$$= \frac{1}{m}\, \mathbb{E}_{r_i} \left[ \mathbb{E}_{v_i} \left[ \left\| \sum_{i=1}^m r_i v_i \right\|_p \,\middle|\, r_1, \ldots, r_m \right] + \mathbb{E}_{v'_i} \left[ \left\| \sum_{i=1}^m r_i v'_i \right\|_p \,\middle|\, r_1, \ldots, r_m \right] \right]$$
$$= \frac{2}{m}\, \mathbb{E}_{r_i}\, \mathbb{E}_{v_i} \left[ \left\| \sum_{i=1}^m r_i v_i \right\|_p \,\middle|\, r_1, \ldots, r_m \right] = 2\, \mathbb{E}_{v_i, r_i} \left\| \sum_{i=1}^m \frac{r_i v_i}{m} \right\|_p. \qquad (11)$$

The penultimate equality follows from the fact that the $v_i$s and $v'_i$s are i.i.d. copies:

$$\mathbb{E}_{v_i} \left[ \left\| \sum_{i=1}^m r_i v_i \right\|_p \,\middle|\, r_1, \ldots, r_m \right] = \mathbb{E}_{v'_i} \left[ \left\| \sum_{i=1}^m r_i v'_i \right\|_p \,\middle|\, r_1, \ldots, r_m \right]. \qquad (12)$$

Overall, inequalities (8), (10), and (11) imply

$$\mathbb{E}[g] \leq 2\, \mathbb{E}_{v_i, r_i} \left\| \sum_{i=1}^m \frac{r_i v_i}{m} \right\|_p, \qquad (13)$$

where $r_1, r_2, \ldots, r_m$ is a sequence of i.i.d. Rademacher $\pm 1$ random variables. We now apply the Khintchine inequality (Theorem 1) with $u_i = \frac{v_i}{m}$ to obtain

$$\mathbb{E}_{v_i, r_i} \left\| \sum_{i=1}^m \frac{r_i v_i}{m} \right\|_p = \mathbb{E}_{v_i}\, \mathbb{E}_{r_i} \left[ \left\| \sum_{i=1}^m \frac{r_i v_i}{m} \right\|_p \,\middle|\, v_1, \ldots, v_m \right] \qquad (14)$$
$$\leq \mathbb{E}_{v_i} \left[ \sqrt{p} \left( \sum_{i=1}^m \left\| \frac{v_i}{m} \right\|_p^2 \right)^{1/2} \right] \qquad (15)$$
$$\leq \mathbb{E}_{v_i} \left[ \sqrt{p} \left( \sum_{i=1}^m \frac{\gamma^2}{m^2} \right)^{1/2} \right] \qquad (16)$$
$$= \frac{\sqrt{p}\,\gamma}{\sqrt{m}}. \qquad (17)$$

Inequality (16) uses the fact that the random vectors $v_i$ are supported over $X$, so $\|v_i\|_p \leq \gamma$. Using (13) and (17) we get

$$\mathbb{E}[g] \leq \frac{2\sqrt{p}\,\gamma}{\sqrt{m}}. \qquad (18)$$

This completes the proof. □

We end this section by stating an $\infty$-norm variant of our result. This theorem follows directly from Hoeffding's inequality. Note that in the following theorem the scaling of the vectors in $X$ is with respect to the $\infty$-norm.

Theorem 3.
Given a set of vectors $X = \{x_1, x_2, \ldots, x_n\} \subset \mathbb{R}^d$, with $\max_{x \in X} \|x\|_\infty \leq 1$, and $\varepsilon > 0$. For every $\mu \in \mathrm{conv}(X)$ there exists an $O\!\left(\frac{\log d}{\varepsilon^2}\right)$-uniform vector $\mu' \in \mathrm{conv}(X)$ such that $\|\mu - \mu'\|_\infty \leq \varepsilon$.

Proof. Apply Hoeffding's inequality component-wise and take a union bound over the $d$ components. □
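The sampling argument behind Theorems 2 and 3 is easy to simulate. The sketch below (our own illustration, with arbitrary parameter choices) draws $m$ i.i.d. vectors from the distribution $\alpha$ induced by a convex combination and checks that some sample mean, an $m$-uniform vector by construction, is $\varepsilon$-close to $\mu$:

```python
import numpy as np

rng = np.random.default_rng(0)

# A set X of n points in the Euclidean (p = 2) unit ball of R^d, so gamma <= 1.
d, n = 200, 50
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # points on the unit sphere

# A point mu in conv(X), with mixing weights alpha.
alpha = rng.dirichlet(np.ones(n))
mu = alpha @ X

# Theorem 2 with p = 2 and gamma = 1 suggests m = 4*p*gamma^2/eps^2 samples;
# note that m is independent of the dimension d. We repeat a few trials,
# mirroring the "positive probability" step of the probabilistic method.
eps, p = 0.3, 2.0
m = int(np.ceil(4 * p / eps**2))                # m = 89
best = np.inf
for _ in range(20):
    idx = rng.choice(n, size=m, p=alpha)        # m i.i.d. draws from alpha
    mu_prime = X[idx].mean(axis=0)              # an m-uniform vector in conv(X)
    best = min(best, np.linalg.norm(mu - mu_prime))

print(m, best)
```

With these parameters some trial lands within $\varepsilon$ of $\mu$, even though $m \ll d$ would be impossible to certify by a per-coordinate union-bound argument alone.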
4 Computing Approximate Nash Equilibrium
Bimatrix Games.
Bimatrix games are two-player games in normal form. Such games are specified by a pair of $n \times n$ matrices $(A, B)$, which are termed the payoff matrices of the players. The first player, also called the row player, has payoff matrix $A$; the second player, or the column player, has payoff matrix $B$. The strategy set of each player is $[n] = \{1, 2, \ldots, n\}$, and if the row player plays strategy $i$ and the column player plays strategy $j$, then the payoffs of the two players are $A_{ij}$ and $B_{ij}$ respectively. The payoffs of the players are normalized between $-1$ and $1$, i.e., $A_{ij}, B_{ij} \in [-1, 1]$ for all $i, j \in [n]$.

Recall that $\Delta_n$ is the set of probability distributions over the set of pure strategies $[n]$. We use $e_i \in \mathbb{R}^n$ to denote the vector with 1 in the $i$th coordinate and 0s elsewhere. The players can randomize over their strategies by selecting any probability distribution in $\Delta_n$, called a mixed strategy. When the row and column players play mixed strategies $x$ and $y$ respectively, the expected payoff of the row player is $x^T A y$ and the expected payoff of the column player is $x^T B y$.

Definition 1 (Nash Equilibrium). A mixed strategy pair $(x, y)$, $x, y \in \Delta_n$, is said to be a Nash equilibrium if and only if:

$$x^T A y \geq e_i^T A y \quad \forall i \in [n], \text{ and} \qquad (19)$$
$$x^T B y \geq x^T B e_j \quad \forall j \in [n]. \qquad (20)$$

By definition, if $(x, y)$ is a Nash equilibrium, neither the row player nor the column player can benefit, in expectation, by unilaterally deviating to some other strategy. We say that a mixed strategy pair is an $\varepsilon$-Nash equilibrium if no player can benefit more than $\varepsilon$, in expectation, by unilateral deviation. Formally,

Definition 2 ($\varepsilon$-Nash Equilibrium). A mixed strategy pair $(x, y)$, $x, y \in \Delta_n$, is said to be an $\varepsilon$-Nash equilibrium if and only if:

$$x^T A y \geq e_i^T A y - \varepsilon \quad \forall i \in [n], \text{ and} \qquad (21)$$
$$x^T B y \geq x^T B e_j - \varepsilon \quad \forall j \in [n]. \qquad (22)$$

Throughout, we will write $C$ to denote the sum of the payoff matrices, $C := A + B$. We will denote the $i$th column of $C$ by $C_i$, for $i \in [n]$.
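The equilibrium conditions in Definitions 1 and 2 can be checked directly. A quick numerical check on a standard example (matching pennies, our choice for illustration): since $A = -B$, the uniform mixed strategies satisfy inequalities (19) and (20) with equality.

```python
import numpy as np

# Matching pennies: a zero-sum bimatrix game with payoffs in [-1, 1].
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])
B = -A                                  # zero-sum, so C = A + B = 0

x = np.array([0.5, 0.5])                # row player's mixed strategy
y = np.array([0.5, 0.5])                # column player's mixed strategy

# Gains from the best unilateral deviations:
#   max_i e_i^T A y - x^T A y   and   max_j x^T B e_j - x^T B y.
row_regret = np.max(A @ y) - x @ A @ y
col_regret = np.max(x @ B) - x @ B @ y

# Both regrets are 0, so (x, y) is a Nash (hence eps-Nash) equilibrium.
print(row_regret, col_regret)           # 0.0 0.0
```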
Note that $\|C_i\|_0$ is equal to the number of non-zero entries in the $i$th column of $C$.

In the following definition we ensure that the sparsity parameter $s$ is at least 4 for ease of presentation. In particular, the running time of our algorithm depends on the logarithm of the number of non-zero entries in the columns of $C$, i.e., on the log of the column sparsity of $C$. Setting $s \geq 4$ ensures that $p = \log s \geq 2$. This allows us to state a single running-time bound, which holds even for corner cases wherein the column sparsity is, say, zero (and hence $\log(\max_i \|C_i\|_0)$ is undefined).

Definition 3 ($s$-Sparse Games). The sparsity of a game $(A, B)$ is defined to be $s := \max\{\max_i \|C_i\|_0,\, 4\}$, where matrix $C = A + B$.

The quantitative connection between the sparsity of a game and the time it takes to compute an $\varepsilon$-Nash equilibrium is stated below.

Theorem 4.
Let $A, B \in [-1, 1]^{n \times n}$ be the payoff matrices of an $s$-sparse bimatrix game. Then, an $\varepsilon$-Nash equilibrium of $(A, B)$ can be computed in time $n^{O(\log s/\varepsilon^2)}$.

Our algorithm for computing an $\varepsilon$-Nash equilibrium relies on the following bilinear program, which was formulated by Mangasarian and Stone [26]. As formally specified in Lemma 1 below, approximate solutions of this bilinear program correspond to approximate Nash equilibria.

$$\begin{aligned}
\max_{x, y, \pi_1, \pi_2} \quad & x^T C y - \pi_1 - \pi_2 \\
\text{subject to} \quad & x^T B \leq \pi_2 \mathbf{1}^T \\
& A y \leq \pi_1 \mathbf{1} \\
& x, y \in \Delta_n \\
& \pi_1, \pi_2 \in [-1, 1].
\end{aligned} \qquad \text{(BP)}$$

Here $\mathbf{1}$ denotes the all-ones vector. Using the definition of Nash equilibrium one can show that the optimal solutions of this bilinear program correspond to Nash equilibria of the game $(A, B)$. Formally, we have
Theorem 5 (Equivalence Theorem [26]). A mixed strategy pair $(\hat{x}, \hat{y})$ is a Nash equilibrium of the game $(A, B)$ if and only if $\hat{x}$, $\hat{y}$, $\hat{\pi}_1$, and $\hat{\pi}_2$ form an optimal solution of the bilinear program (BP), for some scalars $\hat{\pi}_1$ and $\hat{\pi}_2$. In addition, the optimal value achieved by (BP) is equal to zero, and the payoffs of the row and column players at this equilibrium are $\hat{\pi}_1$ and $\hat{\pi}_2$ respectively.

A relevant observation is that an approximate solution of the bilinear program corresponds to an $\varepsilon$-Nash equilibrium.

Lemma 1.
Let $x, y \in \Delta_n$, along with scalars $\pi_1$ and $\pi_2$, form a feasible solution of (BP) that achieves an objective function value of more than $-\varepsilon$, i.e., $x^T C y \geq \pi_1 + \pi_2 - \varepsilon$. Then, $(x, y)$ is an $\varepsilon$-Nash equilibrium of the game $(A, B)$.

Proof. The feasibility of $x, y$ implies that $\max_j x^T B e_j \leq \pi_2$ and $\max_i e_i^T A y \leq \pi_1$. Since the objective function value achieved by $x, y$ is at least $-\varepsilon$, we have $x^T A y + x^T B y - \pi_1 - \pi_2 \geq -\varepsilon$. But $x^T A y$ is at most $\pi_1$ and $x^T B y$ is at most $\pi_2$. So, the following inequalities must hold:

$$x^T A y \geq \pi_1 - \varepsilon, \text{ and} \qquad (23)$$
$$x^T B y \geq \pi_2 - \varepsilon. \qquad (24)$$

Overall, we get $x^T A y \geq \max_i e_i^T A y - \varepsilon$ and $x^T B y \geq \max_j x^T B e_j - \varepsilon$. Hence $(x, y)$ satisfies the definition of an $\varepsilon$-Nash equilibrium. □

The proposed algorithm (see Algorithm 1) solves the following $p$-norm minimization problem, with $p \geq 2$. The program CP($u$) is parametrized by a vector $u \in \mathbb{R}^n$ and it can be solved in polynomial time.

$$\begin{aligned}
\min_{x, y, \pi_1, \pi_2} \quad & \|C y - u\|_p \qquad \text{(CP($u$))} \\
\text{subject to} \quad & x^T u \geq \pi_1 + \pi_2 - \varepsilon/2 \\
& A y \leq \pi_1 \mathbf{1} \\
& x^T B \leq \pi_2 \mathbf{1}^T \\
& x, y \in \Delta_n \\
& \pi_1, \pi_2 \in [-1, 1].
\end{aligned}$$

Proof of Theorem 4.
Algorithm 1 iterates at most n^{O(p/ε²)} times, since this is an upper bound on the number of multisets of size O(p/ε²). Furthermore, in each iteration the algorithm solves the convex program CP(u); this takes polynomial time. Given that p = log s, these observations establish the desired running-time bound.

Now, in order to prove the theorem we need to show that Algorithm 1 is (i) Sound: any mixed-strategy pair, (x, y), returned by the algorithm is an approximate Nash equilibrium; and (ii) Complete: the algorithm always returns a mixed-strategy pair.

Soundness:
Lemma 1 implies that any mixed-strategy pair returned by the algorithm is guaranteed to be an ε-Nash equilibrium. Specifically, say for some u the “if” condition in Step 7 is met. In addition, let x and y be the returned optimal solution of CP(u). Then,

|x^T (Cy − u)| ≤ ‖x‖_q ‖Cy − u‖_p   (Hölder's inequality)   (25)
             ≤ 1 × ε/2.   (26)

Note that for fixed u, CP(u) is a convex program. Specifically, given u ∈ R^n and matrix C ∈ R^{n×n}, for p ≥ 1 the function f(x) := ‖Cx − u‖_p is convex.

Algorithm 1 Algorithm for computing ε-Nash equilibrium in s-sparse games
Given payoff matrices A, B ∈ [−1, 1]^{n×n} and ε >
0; Return: an ε-Nash equilibrium of (A, B)
1: Write s to denote the sparsity of the game (A, B) and let p = log s. {Note that, by definition, s ≥
4; hence, p ≥ 2.}
2: Let U be the collection of all multisets of {1, 2, . . . , n} of cardinality at most κp/ε², where κ is a fixed constant.
3: Write C_i to denote the i-th column of matrix C = A + B, for i ∈ [n].
4: for all multisets S ∈ U do
5:   Set u = (1/|S|) Σ_{i∈S} C_i. {u is an |S|-uniform vector in the convex hull of the columns of C.}
6:   Solve convex program CP(u).
7:   if the objective function value of CP(u) is less than ε/2 then
8:     Return (x, y), where x and y form an optimal solution of CP(u).
9:   end if
10: end for

Here q = p/(p − 1) ≥
1, since p ≥
2. The second inequality follows from the fact that the objective function value of CP(u) is no more than ε/2 and ‖x‖_q ≤ ‖x‖_1 = 1. Since the returned x satisfies the feasibility constraints in CP(u), we have x^T u ≥ π₁ + π₂ − ε/
2. Therefore, x^T Cy ≥ x^T u − ε/2 ≥ π₁ + π₂ − ε. Overall, x and y satisfy the conditions in Lemma 1, and hence form an ε-Nash equilibrium.

Completeness:
It remains to show that the “if” condition in Step 7 is satisfied at least once (and hence the algorithm successfully returns a mixed-strategy pair (x, y)). Next we accomplish this task.

Write (x̂, ŷ) to denote a Nash equilibrium of the given game and let π̂₁ (π̂₂) be the payoff of the row (column) player under this equilibrium. Note that Cŷ lies in the convex hull of the columns of C. Furthermore, since the sparsity of the game is s, for p = log s we have

‖C_i‖_p ≤ 4   ∀ i ∈ [n].   (27)

This follows from the fact that the entries of matrix C lie between −2 and 2. Bounding the p-norm of any column i we get: ‖C_i‖_p ≤ (2^p s)^{1/p} = 2 s^{1/p} = 2 (2^{log s})^{1/p} = 4.

Therefore, for p = log s, we can apply Theorem 2 over the convex hull conv({C_i}_i) with γ ≤ 4. With μ = Cŷ, Theorem 2 implies that there exists an O(p/ε²)-uniform vector μ′ such that ‖Cŷ − μ′‖_p ≤ ε/2. Since μ′ is O(p/ε²)-uniform, at some point during its execution the algorithm (with an appropriate value of κ) will set u = μ′. Therefore, at least once the algorithm will consider a u that satisfies

‖Cŷ − u‖_p ≤ ε/2.   (28)

We show that in this case x̂, ŷ, π̂₁, and π̂₂ form a feasible solution of CP(u) that achieves an objective function value of no more than ε/
2. That is, the “if” condition in Step 7 is satisfied for this choice of u.

First of all, the fact that the objective function value is no more than ε/2 follows directly from inequality (28). In addition, since (x̂, ŷ) is a Nash equilibrium, using Theorem 5 we get

x̂^T Cŷ = π̂₁ + π̂₂.   (29)

Next we show that x̂^T u ≥ π̂₁ + π̂₂ − ε/
2. Consider the following bound:

|x̂^T (Cŷ − u)| ≤ ‖x̂‖_q ‖Cŷ − u‖_p   (Hölder's inequality)   (30)
              ≤ 1 × ε/2.   (31)

Again, q = p/(p − 1) ≥ 1 and hence ‖x̂‖_q ≤
1. Here, the second inequality now follows from our choice of u. Since x̂^T Cŷ = π̂₁ + π̂₂, we have x̂^T u ≥ π̂₁ + π̂₂ − ε/2. The remaining constraints of CP(u) are satisfied as well; this simply follows from the fact that x̂, ŷ, π̂₁, and π̂₂ form a feasible (in fact optimal) solution of (BP), see Theorem 5. Overall, we get that the “if” condition in Step 7 will be satisfied at least once, and this completes the proof.

Ideas developed in this section can address other notions of sparsity as well. Specifically, if there exist α, β ∈ R₊ and γ ∈ R such that the matrix αA + βB + γ1_{n×n} has column or row sparsity s, then our algorithm can be used to find an ε-Nash equilibrium of the game (A, B) in time n^{O(λ² log s / ε²)}; here, 1_{n×n} is the all-ones n × n matrix and λ := max{α, β, 1/α, 1/β}. This follows from the fact that a (min{α, β} ε)-Nash equilibrium of the game (αA, βB + γ1_{n×n}) is an ε-Nash equilibrium of the game (A, B).

Furthermore, in time n^{O(log s / ε²)}, we can compute ε-Nash equilibria of games in which both matrices A and B have column or row sparsity s. Note that this case is not a direct corollary of Theorem 4. In particular, if the columns of matrix A and the rows of matrix B are s-sparse, then it is not necessary that A + B has low column or row sparsity. But an approximate equilibrium of such a game can be computed by exhaustively searching for vectors, v and w, that are close (in the log s-norm distance) to Aŷ and x̂^T B respectively; here (x̂, ŷ) is a Nash equilibrium of (A, B). In this case, instead of CP(u), we need to solve a convex program that minimizes ‖Ay − v‖_{log s} + ‖B^T x − w‖_{log s} and has the following constraint: x^T v + w^T y ≥ π₁ + π₂ − ε/2. An argument analogous to the one above then yields an ε-Nash equilibrium of the game (A, B).

Remark 1.
Consider the class of games in which the p-norm of the columns of matrix C is a fixed constant. A simple modification of the arguments mentioned above shows that for such games an ε-Nash equilibrium can be computed in time n^{O(p/ε²)}.

Remark 2.
Algorithm 1 can be adapted to find an approximate Nash equilibrium with large social welfare (the total payoff of the players). Specifically, in order to determine whether there exists an approximate Nash equilibrium with social welfare more than α − ε, we include the constraint π₁ + π₂ ≥ α in CP(u). The time complexity of the algorithm stays the same, and then via a binary search over α we can find an approximate Nash equilibrium with near-optimal social welfare.

Remark 3.
In Algorithm 1, instead of the convex program CP(u), we can solve the linear program with objective min ‖Cy − u‖_∞ and constraints identical to CP(u). Algorithm 1 still finds an approximate Nash equilibrium, since ‖Cy − u‖_∞ ≤ ‖Cy − u‖_p and we have |x^T (Cy − u)| ≤ ‖x‖_1 ‖Cy − u‖_∞ ≤ 1 × ε/2 (the 1-norm and the ∞-norm are Hölder conjugates of each other). Solving a linear program, in place of a convex program, would lead to a polynomial improvement in the running time of the algorithm. But minimizing the p-norm of Cy − u remains useful in specific cases; in particular, it provides a better running-time bound when the game is guaranteed to have a “small probability” equilibrium. We detail this result in the following section.

Daskalakis and Papadimitriou [17] showed that there exists a PTAS for games that contain an equilibrium with small probability values, specifically values of order O(1/n). This result is somewhat surprising, since such small-probability equilibria have large support, of size Ω(n), and hence are not amenable to, say, exhaustive search. This section shows that if a game has an equilibrium with probability values O(1/m), for some 1 ≤ m ≤ n, then an approximate equilibrium can be computed in time n^{O(t/ε²)}, where t has a logarithmic dependence on s/m. Since column sparsity s is no more than n, we get back the result of [17] as a special case.

Definition 4 (Small Probability Equilibrium). A Nash equilibrium (x, y) is said to be m-small probability if all the entries of x and y are at most 1/m.

In [17] a PTAS is given for games that have an equilibrium with probability values at most δ/n, for some fixed constant δ ∈ (0, 1]; that is, games that have an (n/δ)-small probability equilibrium. Next we prove a result for general m-small probability equilibria.

Theorem 6. Let
A, B ∈ [−1, 1]^{n×n} be the payoff matrices of an s-sparse bimatrix game. If (A, B) contains an m-small probability Nash equilibrium, then an ε-Nash equilibrium of the game can be computed in time n^{O(t/ε²)}, where t = max{log(s/m), 2}.

Proof. Let the norm p = max{log(s/m), 2} and write q to denote the Hölder conjugate of p, i.e., q satisfies 1/p + 1/q = 1. To obtain this result Algorithm 1 is modified as follows: (i) use the updated value of p instead of the one specified in Step 1 of the algorithm; (ii) include the convex constraint ‖x‖_q ≤ m^{−1/p} in CP(u); (iii) in Step 7 use (ε/2) m^{1/p}, instead of ε/2, as the threshold for returning a solution.

In order to establish the theorem we prove that the modified algorithm is (i) Sound: any mixed-strategy pair, (x, y), returned by it is an approximate Nash equilibrium; and (ii) Complete: the algorithm always returns a mixed-strategy pair.

Soundness:
Below we show that if x and y are returned by this modified algorithm then they form a near-optimal solution of the bilinear program (BP) and, in particular, satisfy x^T Cy ≥ π₁ + π₂ − ε. Hence, Lemma 1 shows that any returned solution (x, y) is an ε-Nash equilibrium.

Say the algorithm returns (x, y) while considering vector u. Applying Hölder's inequality gives us:

|x^T (Cy − u)| ≤ ‖x‖_q ‖Cy − u‖_p   (32)
             ≤ m^{−1/p} · (ε/2) m^{1/p}   (33)
             = ε/2.   (34)

Here the second inequality uses the facts that x satisfies the feasibility constraint ‖x‖_q ≤ m^{−1/p} and that the objective function value of CP(u) is at most (ε/2) m^{1/p}. Since x is a feasible solution of CP(u), it satisfies x^T u ≥ π₁ + π₂ − ε/
2. Therefore, using inequality (34) we get x^T Cy ≥ π₁ + π₂ − ε, as required.

Completeness:
It remains to show that the “if” condition in Step 7 is satisfied at least once. To achieve this we prove that, for a particular u, an m-small probability Nash equilibrium forms a feasible solution of CP(u) and achieves an objective function value of at most (ε/2) m^{1/p}. Therefore, for this u the “if” condition in Step 7 will be met.

Recall that the columns of C are s-sparse and its entries are at most 2 in absolute value. Hence the p-norm of any column C_i satisfies ‖C_i‖_p ≤ (2^p s)^{1/p} = 2 s^{1/p}.

Let (x̂, ŷ) be an m-small probability Nash equilibrium of the game. Theorem 2, applied over the convex hull conv({C_i}_i) with γ ≤ 2 s^{1/p}, guarantees that there exists an O(p s^{2/p} / (ε² m^{2/p}))-uniform vector that is (ε/2) m^{1/p} close to Cŷ. Since p ≥ log(s/m), we have

s^{2/p} / m^{2/p} = (s/m)^{2/p}   (35)
               ≤ 4.   (36)

Therefore, there exists an O(p/ε²)-uniform vector that is (ε/2) m^{1/p} close to Cŷ. Such a vector, say u, will be selected by the algorithm in Step 4 at some point of time. Below we show that, for this u, the mixed strategies of the Nash equilibrium, x̂ and ŷ, are feasible and achieve the desired objective function value.

To establish the feasibility of x̂, we first upper bound its q-norm. The facts that its entries are at most 1/m and that q = p/(p − 1) ≥ 1 imply that

‖x̂‖_q ≤ (m · (1/m)^q)^{1/q}   (37)
      = m^{−1/p}.   (38)

Since (x̂, ŷ) is a Nash equilibrium, there exist payoffs π̂₁ and π̂₂ such that x̂^T Cŷ = π̂₁ + π̂₂ (Theorem 5). Also, x̂, ŷ, π̂₁, and π̂₂ are feasible with respect to (BP). Hence, the only constraint of CP(u) that we still need to verify is x̂^T u ≥ π̂₁ + π̂₂ − ε/
2. This follows from Hölder's inequality:

|x̂^T (Cŷ − u)| ≤ ‖x̂‖_q ‖Cŷ − u‖_p   (39)
              ≤ m^{−1/p} · (ε/2) m^{1/p}   (40)
              = ε/2.   (41)

Overall, for the above specified u, the “if” condition in Step 7 will be satisfied. This shows that the algorithm successfully returns an ε-Nash equilibrium.

The algorithm iterates at most n^{O(p/ε²)} times, since this is an upper bound on the number of multisets of size O(p/ε²). Furthermore, in each iteration the algorithm solves a convex program, which takes polynomial time. These observations establish the desired running-time bound and complete the proof.

This section and the next one present additive approximations for the normalized densest k-subgraph problem (NDkS) and the densest k-bipartite subgraph problem (DkBS) respectively. In NDkS we are given a simple graph G = (V, E) along with a size parameter k ≤ |V|, and the goal is to find a maximum-density subgraph containing exactly k vertices. Here, the density of a size-k subgraph S = (V_S, E_S) is defined to be ρ(S) := 2|E_S|/k². Note that in NDkS density is normalized to be at most one.

Next, we present a quadratic program, an ε-additive approximate solution of which can be used to efficiently find an ε-additive approximate solution of NDkS. Write A to denote the adjacency matrix of the given graph G and let n be the number of vertices in G. Define matrix C := A + I, where I is the n × n identity matrix.

max_x   x^T Cx      (QP)
subject to   x_i ≤ 1/k   ∀ i ∈ [n]
             x ∈ ∆_n

Write S* to denote an optimal solution of the given NDkS instance and let z* denote the optimal value of the quadratic program (QP). Given a solution x of (QP) that achieves an objective function value of z* − ε (i.e., x satisfies x^T Cx ≥ z* − ε), we show how to efficiently find a subgraph S that is an ε-additive approximate solution of NDkS, i.e., find a size-k subgraph S that satisfies ρ(S) ≥ ρ(S*) − ε. Towards this end, the following lemma serves as a useful tool.

Lemma 2.
Given a feasible solution, y, of the quadratic program (QP), we can find in polynomial time a feasible solution z that satisfies z^T Cz ≥ y^T Cy and, moreover, every component of z is either 0 or 1/k, i.e., z_i ∈ {0, 1/k} for each i ∈ [n].

Proof. Given feasible solution y, write M(y) := {i ∈ [n] | 0 < y_i < 1/k}. We iteratively update y to decrease the cardinality of M(y), and at the same time ensure that the objective function value (i.e., y^T Cy) does not decrease. Since |M(y)| ≤ n, we iterate at most n times. This will overall establish the lemma.

Note that each i ∈ [n] indexes both a component of y and a vertex of the graph G. Write γ_i to denote the total y-value of vertex i and its neighbors in G, i.e., γ_i := y_i + Σ_{j : (i,j) ∈ E} y_j. Since y ∈ ∆_n and the integer k >
1, the cardinality of the set M(y) is either 0 or strictly greater than one. If M(y) = ∅ then the stated claim follows simply by setting z = y. Otherwise, if |M(y)| ≥ 2, we select two indices i, j ∈ M(y) and change y_i and y_j such that the size of M(y) decreases by at least one. We select i, j ∈ M(y) as follows:

• If there exist vertices i, j ∈ M(y) that are not connected by an edge in G, then without loss of generality we assume that γ_i + y_i ≥ γ_j + y_j.
• If every pair of vertices i, j ∈ M(y) is connected by an edge, then without loss of generality we assume that γ_i ≥ γ_j.

Set δ := min{y_j, 1/k − y_i}. We update y_i ← y_i + δ and y_j ← y_j − δ; this update ensures that either y_j goes down to zero or y_i becomes equal to 1/k. Hence, the size of M(y) decreases by at least one.

Next we show that this change in y does not decrease the objective function value y^T Cy; thereby we get the stated claim.

Consider the case in which i and j are not connected via an edge. The other case, in which (i, j) ∈ E (and we have γ_i ≥ γ_j), follows along the same lines.

Before the update the following inequality holds: γ_i + y_i ≥ γ_j + y_j. For any δ ≥ 0, the change in objective function value is equal to (y_i + δ)(γ_i + δ) + (y_j − δ)(γ_j − δ) − (y_i γ_i + y_j γ_j). This quantity is equal to δ(y_i + γ_i − y_j − γ_j) + 2δ². Since y_i + γ_i − y_j − γ_j ≥ 0 and δ is nonnegative, we get that the update in y does not decrease the objective function value. This completes the proof.

Recall that ρ(S*) and z* denote the optimal values of the given NDkS instance and (QP) respectively. Using Lemma 2 we get the following proposition.

Proposition 1.
The optimal value of (QP) is equal to the optimal value of the NDkS instance plus 1/k, i.e., z* = ρ(S*) + 1/k.

Proof. We use S* to obtain a feasible solution, x̂, for (QP) as follows: for each i ∈ V(S*) set x̂_i = 1/k, and for every j ∉ V(S*) set x̂_j = 0. Here, V(S*) denotes the set of vertices of the subgraph S*.

By the definition of matrix C we get that x̂^T Cx̂ = ρ(S*) + 1/k. This implies that

z* ≥ ρ(S*) + 1/k.   (42)

Using Lemma 2 we can obtain an optimal solution x′ of (QP) such that the components of x′ are either 0 or 1/k. By definition, (x′)^T Cx′ = z*. Write S′ to denote the subgraph induced by the vertex subset Supp(x′). Note that z* = (x′)^T Cx′ = ρ(S′) + 1/k. Given that S* is an optimal solution of the NDkS instance, we have ρ(S*) ≥ ρ(S′). Hence, the following inequality holds:

ρ(S*) + 1/k ≥ z*.   (43)

Inequalities (42) and (43) imply the stated claim.

Finally, we establish the connection between (QP) and NDkS.

Theorem 7.
Given an ε-additive approximate solution of (QP), we can find an ε-additive approximate solution of NDkS in polynomial time.
Proof.
Given an ε-additive approximate solution of (QP), via Lemma 2, we can find an ε-additive approximate solution x̂ whose components are either 0 or 1/k. Write Ŝ to denote the subgraph induced by the vertex subset Supp(x̂); recall that each i ∈ [n] indexes both a component of x̂ and a vertex of the graph G.

Note that x̂^T Cx̂ = ρ(Ŝ) + 1/k. Since x̂^T Cx̂ ≥ z* − ε, the following inequality holds: ρ(Ŝ) + 1/k ≥ z* − ε. Using Proposition 1 we get that Ŝ is an ε-additive approximate solution of NDkS, i.e., ρ(Ŝ) ≥ ρ(S*) − ε.

In other words, in order to determine an approximate solution of NDkS it suffices to compute an approximate solution of (QP).

Note that if the maximum degree of the graph G is d, then the number of non-zero components in any column of the objective matrix C is no more than d + 1. This implies that, for i ∈ [n], we have ‖C_i‖_p ≤ (d + 1)^{1/p}. Here, C_i denotes the i-th column of C. Now, for p = log(d + 1), the following bound holds for all i ∈ [n]: ‖C_i‖_p ≤
2. Therefore, as in Algorithm 1, by enumerating all O(p/ε²)-uniform vectors in the convex hull of the columns of C, we can find an ε-approximate solution of (QP). This establishes the following theorem.

Theorem 8.
Let G be a graph with n vertices and maximum degree d. Then, an ε-additive approximation of NDkS over G can be determined in time n^{O(log d / ε²)}. In general, this gives us an additive approximation algorithm for NDkS that runs in quasi-polynomial time.
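Proposition 1's identity z* = ρ(S*) + 1/k can be checked by brute force on a small instance: by Lemma 2, the optimum of (QP) is attained at a {0, 1/k}-valued vector, so it suffices to enumerate size-k vertex subsets. The Python sketch below is illustrative only (it is not from the paper; the example graph, a triangle plus an isolated vertex, and the choice k = 3 are arbitrary):

```python
from fractions import Fraction
from itertools import combinations

def qp_value(C, x):
    """Evaluate the (QP) objective x^T C x."""
    n = len(x)
    return sum(x[i] * C[i][j] * x[j] for i in range(n) for j in range(n))

def check_prop1(adj, k):
    """Compare max x^T C x over {0, 1/k}-valued feasible x against
    max_S 2|E_S|/k^2 + 1/k, where S ranges over size-k vertex subsets
    and C = A + I. By Proposition 1 the two quantities coincide."""
    n = len(adj)
    C = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    best_qp = best_rhs = Fraction(-1)
    for S in combinations(range(n), k):
        x = [Fraction(1, k) if i in S else Fraction(0) for i in range(n)]
        best_qp = max(best_qp, qp_value(C, x))
        e = sum(adj[i][j] for i in S for j in S if i < j)        # |E_S|
        best_rhs = max(best_rhs, Fraction(2 * e, k * k) + Fraction(1, k))
    return best_qp, best_rhs

adj = [[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 0], [0, 0, 0, 0]]  # triangle + isolated vertex
zs, rhs = check_prop1(adj, 3)
print(zs == rhs, zs)  # True 1  (rho(S*) + 1/k = 2/3 + 1/3)
```

Exact rational arithmetic (via fractions.Fraction) is used so that the equality test is not clouded by floating-point error.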
In DkBS we are given a graph G = (V, E) along with a size parameter k ≤ |V|, and the goal is to find size-k vertex subsets, S and T, such that the density of edges between S and T is maximized. Specifically, the (bipartite) density of vertex subsets S and T is defined as follows:

ρ(S, T) := |E(S, T)| / (|S||T|),   (44)

where E(S, T) denotes the set of edges that connect S and T.

Let vertex subsets S* and T* form an optimal solution of the given DkBS instance. Hence, ρ(S*, T*) is equal to the optimal density. Next we state a bilinear program from [3] to approximate DkBS. Here, A denotes the adjacency matrix of the given graph G and n = |V|.

max_{x, y}   x^T Ay      (BP-DkBS)
subject to   x, y ∈ ∆_n
             x_i, y_i ≤ 1/k   ∀ i ∈ [n].

Note that optimizing (BP-DkBS) over x, with a fixed y, corresponds to solving a linear program. Therefore, for any fixed y there exists an optimal basic feasible x, and vice versa. In other words, for any feasible pair (x₀, y₀) we can find (x, y) such that x₀^T Ay₀ ≤ x^T Ay and all the components of x and y are either 0 or 1/k. This observation implies that the optimal value of (BP-DkBS) is equal to ρ(S*, T*). In addition, given an additive ε-approximate solution of (BP-DkBS), (x′, y′), we can efficiently determine an ε-approximate solution of DkBS. Specifically, we can assume without loss of generality that x′ and y′ are basic, and then for S′ := Supp(x′) and T′ := Supp(y′) we have ρ(S′, T′) ≥ ρ(S*, T*) − ε. In other words, in order to determine an approximate solution of DkBS it suffices to compute an approximate solution of (BP-DkBS).

Note that the column sparsity of the objective matrix in (BP-DkBS) (i.e., the column sparsity of the adjacency matrix A) is equal to the maximum degree of the given graph. Therefore, as outlined in the previous section, we can modify Algorithm 1 to obtain the following theorem.

Theorem 9.
Let G be a graph with n vertices and maximum degree d. Then, there exists an algorithm that runs in time n^{O(log d / ε²)} and computes a k × k-bipartite subgraph of density at least ρ(S*, T*) − ε.

This section extends Theorem 2 to address convex hulls of matrices. Here we also detail approximate versions of certain generalizations of Carathéodory's theorem; specifically, we focus on the colorful Carathéodory theorem and Tverberg's theorem.

First of all, note that a d × d matrix can be considered as a vector in R^{d²}, and hence by directly applying Theorem 2 to vectors in R^{d²} we get results for entrywise p-norms. Recall that for a d × d matrix Y the entrywise p-norm is defined as follows: ‖Y‖_p := (Σ_{i=1}^{d} Σ_{j=1}^{d} |Y_{i,j}|^p)^{1/p}.

In particular, we get an approximate version of the Birkhoff-von Neumann theorem by considering d × d matrices as vectors of size d² and applying Theorem 2 with norm p = log d. We know via the Birkhoff-von Neumann theorem that any d × d doubly stochastic matrix can be expressed as a convex combination of d × d permutation matrices (see, e.g., [5]). The following corollary shows that for every doubly stochastic matrix D there exists an ε-close (in the entrywise log d-norm, and hence in the entrywise ∞-norm) doubly stochastic matrix D′ that can be expressed as a convex combination of O(log d / ε²) permutation matrices.

We say that a doubly stochastic matrix D′ is k-uniform if there exists a size-k multiset Π of permutation matrices such that D′ = (1/k) Σ_{P ∈ Π} P.

Corollary 1.
For every d × d doubly stochastic matrix D there exists an O(log d / ε²)-uniform doubly stochastic matrix D′ such that max_{i,j} |D_{i,j} − D′_{i,j}| ≤ ε.

In addition to entrywise norms, we can establish an approximate version of Carathéodory's theorem for matrices under the Schatten p-norm. Write ‖Y‖_{S_p} to denote the Schatten p-norm of a d × d′ matrix Y, i.e., ‖Y‖_{S_p} := ‖σ(Y)‖_p, where σ(Y) is the vector of singular values of Y. Given a set of matrices Y = {Y₁, Y₂, . . . , Y_n} ⊂ R^{d×d′}, we say that a matrix M′ ∈ conv(Y) is k-uniform (implicitly, with respect to Y) if there exists a size-k multiset S of [n] such that M′ = (1/k) Σ_{i∈S} Y_i.

We can directly adapt the proof of Theorem 2 and, in particular, use the matrix version of the Khintchine inequality [31, 32] (instead of Theorem 1) to obtain the following result.

Theorem 10.
Given a set of matrices Y = {Y₁, Y₂, . . . , Y_n} ⊂ R^{d×d′} and ε > 0. For every matrix M ∈ conv(Y) and 2 ≤ p < ∞ there exists an O(pγ²/ε²)-uniform matrix M′ ∈ conv(Y) such that ‖M − M′‖_{S_p} ≤ ε. Here, γ := max_{Y ∈ Y} ‖Y‖_{S_p}.

The colorful Carathéodory theorem (see [4] and [27]) asserts that if the convex hulls of d + 1 sets (the color classes) X₁, X₂, . . . , X_{d+1} ⊂ R^d intersect, then every vector in the intersection μ ∈ ∩_i conv(X_i) can be expressed as a convex combination of vectors, each of which has a different color.

Theorem 11 (Colorful Carathéodory Theorem). Let X₁, X₂, . . . , X_{d+1} be d + 1 sets in R^d and let vector μ ∈ ∩_i conv(X_i). Then, there exist d + 1 vectors x₁, x₂, . . . , x_{d+1} such that x_i ∈ X_i for each i and μ ∈ conv({x₁, x₂, . . . , x_{d+1}}).

For a given collection of d + 1 sets (the color classes) X₁, X₂, . . . , X_{d+1} ⊂ R^d, a set R ⊂ R^d is called a rainbow if |R ∩ X_i| = 1 for all i ∈ [d + 1].

The colorful Carathéodory theorem guarantees the existence of a rainbow R = {x₁, x₂, . . . , x_{d+1}} whose convex hull contains the given vector μ. But determining a rainbow R with μ ∈ conv(R) in polynomial time remains an interesting open problem; see, e.g., [27]. Here we consider a natural approximate version of this question wherein the goal is to find a rainbow R′ whose convex hull is ε close to μ in the following sense: inf_{v ∈ conv(R′)} ‖μ − v‖_p ≤ ε. Below we show that, for appropriately scaled vectors, Theorem 2 can be used to efficiently determine such a rainbow R′.

Theorem 2 implies that there exists a vector μ′ that satisfies ‖μ − μ′‖_p ≤ ε and can be expressed as a convex combination of t = O(pγ²/ε²) vectors of R, for p ≥ 2; here γ := max_{x ∈ ∪_i X_i} ‖x‖_p. Let R′ be a size-t subset of R such that μ′ ∈ conv(R′). Write n = Σ_{i=1}^{d+1} |X_i| and note that we can exhaustively search for R′ in time (d+1 choose t) · n^{O(t)}.
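The existence statements above all rest on Theorem 2, whose proof is a probabilistic, Maurey-type sampling argument: draw t indices i.i.d. from the weights of the convex combination and average the corresponding vectors. The Python sketch below illustrates only this sampling step; the vector set (standard basis vectors), the weights, and the parameters are illustrative assumptions, not taken from the paper:

```python
import random

def sampled_uniform_vector(cols, weights, t, rng):
    """Average t columns drawn i.i.d. from `weights`; the result is a
    t-uniform vector in the convex hull of `cols` that, with high
    probability, is close in l_p to sum_i weights[i] * cols[i]."""
    d = len(cols[0])
    idx = rng.choices(range(len(cols)), weights=weights, k=t)
    return [sum(cols[i][r] for i in idx) / t for r in range(d)]

def lp_dist(u, v, p):
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)

d = 32
cols = [[1.0 if r == c else 0.0 for r in range(d)] for c in range(d)]  # e_1, ..., e_d
weights = [1.0 / d] * d
mu = [1.0 / d] * d            # the target convex combination sum_i weights[i] e_i
u = sampled_uniform_vector(cols, weights, t=500, rng=random.Random(0))
print(lp_dist(u, mu, 2))      # small: concentrates around sqrt((1 - 1/d)/t), about 0.04 here
```

In this basis-vector example the expected squared Euclidean distance is Σ_i w_i(1 − w_i)/t ≤ 1/t, independent of the dimension d, which is the dimension-free phenomenon the theorem captures.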
For a guessed R′ we can test whether its convex hull is close to μ (i.e., whether inf_{v ∈ conv(R′)} ‖μ − v‖_p ≤ ε) via a convex program. Moreover, after finding an appropriate subset R′ we can extend it to obtain a rainbow that is ε close to μ.

Another generalization of Carathéodory's theorem is Tverberg's theorem (see [35] and [27]), which is stated below.

Theorem 12 (Tverberg's Theorem). Any set of (r − 1)(d + 1) + 1 vectors X ⊂ R^d can be partitioned into r pairwise disjoint subsets X₁, X₂, . . . , X_r ⊆ X such that their convex hulls intersect: ∩_{i=1}^{r} conv(X_i) ≠ ∅.

Note that here the special case of r = 2 corresponds to Radon's theorem (see, e.g., [27]).

Next we consider an approximate version of Tverberg's theorem. At a high level, our result guarantees the existence of pairwise disjoint subsets, X′₁, . . . , X′_r, that have small cardinality and whose convex hulls, conv(X′₁), . . . , conv(X′_r), are “concurrently close” to each other. Formally:

Definition 5 (Concurrently ε close). Sets V₁, V₂, . . . , V_r ⊂ R^d are said to be concurrently ε close under the p-norm distance if there exists a vector μ ∈ R^d such that inf_{v ∈ conv(V_i)} ‖μ − v‖_p ≤ ε, for all i ∈ [r].

Tverberg's theorem together with Theorem 2 establish the following result.
Theorem 13.
Let norm p ∈ [2, ∞) and parameter ε > 0. Then, any set of (r − 1)(d + 1) + 1 vectors X ⊂ R^d can be partitioned into r pairwise disjoint subsets X′₁, X′₂, . . . , X′_r ⊆ X that are concurrently ε close under the p-norm distance and satisfy |X′_i| = O(pγ²/ε²), for all i ∈ [r]. Here, γ := max_{x ∈ X} ‖x‖_p.

Tverberg's theorem guarantees the existence of a partition with subsets X₁, X₂, . . . , X_r whose convex hulls intersect. It is an interesting open problem whether such a partition can be determined in polynomial time; see [27]. But note that Theorem 13 implies that for fixed r we can efficiently determine an approximate solution for this problem, i.e., determine a partition consisting of subsets that are concurrently ε close. In particular, with t = O(pγ²/ε²), we can exhaustively search for size-t subsets X′₁, X′₂, . . . , X′_r ⊆ X that are concurrently ε close under the p-norm distance. This search runs in time n^{O(rt)}. (A polynomial-time algorithm exists for testing whether given subsets X′₁, X′₂, . . . , X′_r are concurrently ε close under the p-norm distance. In particular, we can run the ellipsoid method with a separation oracle that solves the convex optimization problem min_{v ∈ conv(X′_i)} ‖u − v‖_p to test whether there exists a separating hyperplane for a proposed ellipsoid center u.) Finally, by arbitrarily assigning the vectors in X \ (∪_i X′_i) to the subsets, we can extend the disjoint subsets X′₁, X′₂, . . . , X′_r to obtain a partition of X that is concurrently ε close.

This section presents a lower bound which proves that, in general, ε-close (under the p-norm distance with p ∈ [2, ∞)) vectors cannot be expressed as a convex combination of fewer than (2ε)^{−p/(p−1)} vectors of the given set. This, in particular, implies that the 1/ε² dependence established in Theorem 2 is tight for the Euclidean case (p = 2).

To prove the lower bound we consider the convex hull of the standard basis vectors e₁, e₂, . . . , e_d ∈ R^d, i.e., consider the simplex ∆_d. Write ē ∈ ∆_d to denote the uniform convex combination of e₁, e₂, . . . , e_d: ē := (1/d, 1/d, . . . , 1/d)^T. Next we show that any vector that can be expressed as a convex combination of fewer than (2ε)^{−p/(p−1)} standard basis vectors is at least ε away from ē in the p-norm distance. This establishes the desired lower bound.

Note that (trivially) any vector in ∆_d can be expressed as a convex combination of d standard basis vectors; hence, to obtain a meaningful lower bound we need to assume that (2ε)^{−p/(p−1)} is small compared to d.

Proposition 2.
Let 2(2ε)^{−p/(p−1)} < d and norm p ≥ 2. Then, any vector u ∈ ∆_d that can be expressed as a convex combination of fewer than (2ε)^{−p/(p−1)} standard basis vectors must satisfy ‖ē − u‖_p > ε.

Proof. Write k = (2ε)^{−p/(p−1)}; the assumption states that k < d/2. Since ē is symmetric, we can assume without loss of generality that u is a convex combination of the vectors e₁, e₂, . . . , e_k. The optimal value of the following convex program lower bounds the p-norm distance between u and ē:

min_{α₁, . . . , α_k}   ‖ē − Σ_{i=1}^{k} α_i e_i‖_p
subject to   Σ_{i=1}^{k} α_i = 1
             α_i ≥ 0   ∀ i ∈ [k].

The K.K.T. conditions imply that the optimum is achieved by setting α_i = 1/k for all i ∈ [k]. Hence, the p-norm distance between ē and u is at least (k (1/k − 1/d)^p)^{1/p} = k^{−(p−1)/p} (1 − k/d). The choice of k gives k^{−(p−1)/p} = 2ε, and the assumption k < d/2 gives 1 − k/d > 1/2. Combining these bounds we get ‖ē − u‖_p > 2ε · (1/2) = ε, which is the desired claim.
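For the Euclidean case the quantities in the proof above have a closed form: when u places mass 1/k on k of the d coordinates, ‖ē − u‖₂ equals sqrt(1/k − 1/d) exactly, which dominates the lower bound k^{−1/2}(1 − k/d) used in the argument. A quick numeric check follows; the values of d and k are illustrative:

```python
def lp_dist(u, v, p):
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)

def uniform_on_first(k, d):
    """The optimal u from the K.K.T. step: mass 1/k on the first k coordinates."""
    return [1.0 / k] * k + [0.0] * (d - k)

d = 1024
e_bar = [1.0 / d] * d
for k in (4, 16, 64):
    u = uniform_on_first(k, d)
    exact = lp_dist(u, e_bar, 2)           # equals sqrt(1/k - 1/d)
    lower = k ** -0.5 * (1 - k / d)        # the bound used in the proof
    assert lower <= exact <= (1.0 / k) ** 0.5
    print(k, round(exact, 3))              # 4 0.499, then 16 0.248, then 64 0.121
```

The distances shrink like k^{−1/2}, matching the claim that Θ(1/ε²) basis vectors are necessary and sufficient for ε-accuracy in the 2-norm.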
Acknowledgements
The author thanks Federico Echenique, Katrina Ligett, Assaf Naor, Aviad Rubinstein, Anthony Man-Cho So, and Joel Tropp for helpful discussions and references. This research was supported by NSF grants CNS-0846025 and CCF-1101470, along with a Linde/SISL postdoctoral fellowship.
References [1] Noga Alon, Sanjeev Arora, Rajsekar Manokaran, Dana Moshkovitz, and OmriWeinstein. Inapproximability of densest κ -subgraph from average case hardness. , 2011.[2] Noga Alon, Troy Lee, and Adi Shraibman. The cover number of a matrix and its algo-rithmic applications. Approximation, Randomization, and Combinatorial Optimization (AP-PROX/RANDOM) , 2014. 253] Noga Alon, Troy Lee, Adi Shraibman, and Santosh Vempala. The approximate rank of amatrix and its algorithmic applications: approximate rank. In
Proceedings of the 45th annualACM symposium on Symposium on theory of computing , pages 675–684. ACM, 2013.[4] Imre B´ar´any. A generalization of carath´eodory’s theorem.
Discrete Mathematics , 40(2):141–152, 1982.[5] Alexander Barvinok.
A course in convexity , volume 54. American Mathematical Soc., 2002.[6] Aditya Bhaskara, Moses Charikar, Eden Chlamtac, Uriel Feige, and Aravindan Vijayaragha-van. Detecting high log-densities: an O ( n / ) approximation for densest k-subgraph. In Pro-ceedings of the 42nd ACM symposium on Theory of computing , pages 201–210. ACM, 2010.[7] Hartwig Bosse, Jaroslaw Byrka, and Evangelos Markakis. New algorithms for approximateNash equilibria in bimatrix games. In
Internet and Network Economics, pages 17–29. Springer, 2007.

[8] Jean Bourgain, Alain Pajor, S. J. Szarek, and N. Tomczak-Jaegermann. On the duality problem for entropy numbers of operators. In Geometric Aspects of Functional Analysis, pages 50–63. Springer, 1989.

[9] Mark Braverman, Young Kun Ko, and Omri Weinstein. Approximating the best Nash equilibrium in n^{o(log n)} time breaks the exponential time hypothesis. In ACM-SIAM Symposium on Discrete Algorithms (SODA), 2015.

[10] Bernd Carl. Inequalities of Bernstein-Jackson-type and the degree of compactness of operators in Banach spaces. In Annales de l'institut Fourier, volume 35, pages 79–118. Institut Fourier, 1985.

[11] Xi Chen, Xiaotie Deng, and Shang-Hua Teng. Sparse games are hard. In Internet and Network Economics, pages 262–273. Springer, 2006.

[12] Xi Chen, Xiaotie Deng, and Shang-Hua Teng. Settling the complexity of computing two-player Nash equilibria. Journal of the ACM (JACM), 56(3):14, 2009.

[13] Constantinos Daskalakis. On the complexity of approximating a Nash equilibrium. ACM Transactions on Algorithms (TALG), 9(3):23, 2013.

[14] Constantinos Daskalakis, Paul W. Goldberg, and Christos H. Papadimitriou. The complexity of computing a Nash equilibrium. SIAM Journal on Computing, 39(1):195–259, 2009.

[15] Constantinos Daskalakis, Aranyak Mehta, and Christos Papadimitriou. A note on approximate Nash equilibria. In Internet and Network Economics, pages 297–306. Springer, 2006.

[16] Constantinos Daskalakis, Aranyak Mehta, and Christos Papadimitriou. Progress in approximate Nash equilibria. In Proceedings of the 8th ACM Conference on Electronic Commerce, pages 355–358. ACM, 2007.

[17] Constantinos Daskalakis and Christos H. Papadimitriou. On oblivious PTASs for Nash equilibrium. In Proceedings of the 41st Annual ACM Symposium on Theory of Computing, pages 75–84. ACM, 2009.

[18] Tomas Feder, Hamid Nazerzadeh, and Amin Saberi. Approximating Nash equilibria using small-support strategies. In Proceedings of the 8th ACM Conference on Electronic Commerce, pages 352–354. ACM, 2007.

[19] D. J. H. Garling. Inequalities: A Journey into Linear Analysis, volume 19. Cambridge University Press, 2007.

[20] Elad Hazan and Robert Krauthgamer. How hard is it to approximate the best Nash equilibrium? SIAM Journal on Computing, 40(1):79–91, 2011.

[21] Ravi Kannan and Thorsten Theobald. Games of fixed rank: A hierarchy of bimatrix games. In Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1124–1132. Society for Industrial and Applied Mathematics, 2007.

[22] Spyros C. Kontogiannis, Panagiota N. Panagopoulou, and Paul G. Spirakis. Polynomial algorithms for approximating Nash equilibria of bimatrix games. In Internet and Network Economics, pages 286–296. Springer, 2006.

[23] Spyros C. Kontogiannis and Paul G. Spirakis. Efficient algorithms for constant well supported approximate equilibria in bimatrix games. In Automata, Languages and Programming, pages 595–606. Springer, 2007.

[24] Joram Lindenstrauss and Lior Tzafriri. Classical Banach Spaces. Springer Berlin-Heidelberg-New York, 1973.

[25] Richard J. Lipton, Evangelos Markakis, and Aranyak Mehta. Playing large games using simple strategies. In Proceedings of the 4th ACM Conference on Electronic Commerce, pages 36–41. ACM, 2003.

[26] Olvi L. Mangasarian and H. Stone. Two-person nonzero-sum games and quadratic programming. Journal of Mathematical Analysis and Applications, 9(3):348–355, 1964.

[27] Jiří Matoušek. Lectures on Discrete Geometry, volume 108. Springer New York, 2002.

[28] T. S. Motzkin and E. G. Straus. Maxima for graphs and a new proof of a theorem of Turán.
Canadian Journal of Mathematics, 17:533–540, 1965.

[29] Noam Nisan, Tim Roughgarden, Éva Tardos, and Vijay V. Vazirani. Algorithmic Game Theory. Cambridge University Press, 2007.

[30] Gilles Pisier. Remarques sur un résultat non publié de B. Maurey (French) [Remarks on an unpublished result of B. Maurey]. Séminaire Analyse fonctionnelle (dit "Maurey-Schwartz"), pages 1–12, 1981.

[31] Anthony Man-Cho So. Moment inequalities for sums of random matrices and their applications in optimization. Mathematical Programming, 130(1):125–151, 2011.

[32] Nicole Tomczak-Jaegermann. The moduli of smoothness and convexity and the Rademacher averages of the trace classes S_p (1 ≤ p < ∞). Studia Mathematica, 50(2):163–182, 1974.

[33] Haralampos Tsaknakis and Paul G. Spirakis. An optimization approach for approximate Nash equilibria. In Internet and Network Economics, pages 42–56. Springer, 2007.

[34] Haralampos Tsaknakis and Paul G. Spirakis. Practical and efficient approximations of Nash equilibria for win-lose games based on graph spectra. In Internet and Network Economics, pages 378–390. Springer, 2010.

[35] Helge Tverberg. A generalization of Radon's theorem.