Mixture Selection, Mechanism Design, and Signaling
Yu Cheng, Ho Yee Cheung, Shaddin Dughmi, Ehsan Emamjomeh-Zadeh, Li Han, Shang-Hua Teng
August 18, 2015
Abstract
We pose and study a fundamental algorithmic problem which we term mixture selection, arising as a building block in a number of game-theoretic applications: Given a function g from the n-dimensional hypercube to the bounded interval [−1, 1], and an n × m matrix A with bounded entries, maximize g(Ax) over x in the m-dimensional simplex. This problem arises naturally when one seeks to design a lottery over items for sale in an auction, or craft the posterior beliefs for agents in a Bayesian game through the provision of information (a.k.a. signaling).
We present an approximation algorithm for this problem when g simultaneously satisfies two "smoothness" properties: Lipschitz continuity with respect to the L∞ norm, and noise stability. The latter notion, which we define and cater to our setting, controls the degree to which low-probability — and possibly correlated — errors in the inputs of g can impact its output. The approximation guarantee of our algorithm degrades gracefully as a function of the Lipschitz continuity and noise stability of g. In particular, when g is both O(1)-Lipschitz continuous and O(1)-stable, we obtain an (additive) polynomial-time approximation scheme (PTAS) for mixture selection. We also show that neither assumption suffices by itself for an additive PTAS, and both assumptions together do not suffice for an additive fully polynomial-time approximation scheme (FPTAS).
We apply our algorithm for mixture selection to a number of different game-theoretic applications, focusing on problems from mechanism design and optimal signaling. In particular, we make progress on a number of open problems suggested in prior work by easily reducing them to mixture selection: we resolve an important special case of the small-menu lottery design problem posed by Dughmi, Han, and Nisan [DHN14]; we resolve the problem of revenue-maximizing signaling in Bayesian second-price auctions posed by Emek et al.
[EFG+12] and Miltersen and Sheffet [BMS12]; we design a quasipolynomial-time approximation scheme for the optimal signaling problem in normal form games suggested by Dughmi [Dug14]; and we design an approximation algorithm for the optimal signaling problem in the voting model of Alonso and Câmara [AC14].

* This work was supported by NSF grants CCF-1350900, CCF-1423618, CCF-0964481, CCF-1111270, and the last author's Simons Investigator Award from the Simons Foundation.
† University of Southern California, {yu.cheng.1, hoyeeche, shaddin, emamjome, li.han, shanghua}@usc.edu.

Introduction
Lotteries, beliefs, mixed strategies — all are distributions arising as important objects in game theory. It is unsurprising, therefore, that algorithmic game theory is rife with algorithmic problems which — implicitly or explicitly — optimize over the space of distributions, or equivalently the simplex. In this paper, we identify a family of algorithmic problems over the simplex which arise over and over in game theory. We term problems in this class mixture selection, and examine their computational complexity.
Given a function g from the solid n-dimensional hypercube to the bounded interval [−1, 1], and a positive integer m, we define the m-dimensional mixture selection problem for g as follows. The input to this problem is an n × m matrix A with bounded entries, and the objective is to compute x ∈ ∆_m maximizing g(Ax). It is natural to expect that the computational complexity of mixture selection depends crucially on the "complexity" of the function g. We therefore identify two "smoothness" parameters of the function g which control the extent to which mixture selection is tractable, and derive a simple approximation algorithm with guarantees degrading gracefully in those parameters. Moreover, we present evidence — in the form of hardness results — that smoothness in both senses is necessary for the kind of general results we obtain.
The first smoothness quantity is a familiar one, namely Lipschitz continuity in the L∞ metric. The second quantity, which we define and term noise stability, borrows ideas from related definitions of stability in other contexts (e.g. [KKL88, MOO10]), though is importantly different. Informally, a function g from the solid n-dimensional hypercube to the real numbers is β-noise stable (or β-stable for short) if the random corruption of an α-fraction of the n inputs to g, with no individual input disproportionately likely to be corrupted, does not decrease the output of g by more than αβ. We note that a Fourier-analytic notion of stability is closely related to ours — we elaborate on this connection in Section 9.
This paper lays out a framework for tackling mixture selection problems, and presents a number of applications in mechanism design and optimal signaling in games. Notably, we find that we resolve or make progress on a number of known open problems, and some new ones, using our framework.

Our Results
Our results for mixture selection can be viewed as generalizing the main insights of Lipton et al. [LMM03]. First, we show that when g is noise stable and Lipschitz continuous, and x ∈ ∆_m is arbitrary, there is a sparse vector x̃ for which g(Ax̃) is not much smaller than g(Ax). The proof of this fact proceeds by sampling from x and letting x̃ be the empirical distribution, as in [LMM03]. However, when g is sufficiently noise stable and Lipschitz continuous, we obtain a better tradeoff between the number of samples required and the error introduced into the objective than does [LMM03], and this is crucial for our applications. Our analysis bounds the expected difference between g(Ax) and g(Ax̃) as the sum of two terms: The first term represents the error in the output of g caused by the low-probability "large errors" in its n inputs, and the second term represents the error in the output of g introduced by the higher-probability "small errors" in its n inputs. The first term is bounded using noise stability, and the second is bounded using Lipschitz continuity.
Second, we instantiate the above insight algorithmically, as does [LMM03]. Specifically, our algorithm enumerates vectors x̃ of the desired sparsity in order to find an approximately optimal solution to our mixture selection problem. We note that our guarantees are all parametrized by the Lipschitz continuity c and the noise stability β of the function g. Most notably, we obtain an additive polynomial-time approximation scheme (PTAS) whenever both β and c are constants.
Third, we rule out certain natural extensions of our results assuming well-believed complexity-theoretic conjectures. We show that neither Lipschitz continuity nor noise stability alone suffices for an additive PTAS for mixture selection, and both together do not suffice for an additive fully polynomial-time approximation scheme (FPTAS). For a function which is O(1)-stable yet O(1)-Lipschitz continuous only in the L_1 metric, we show approximation hardness by a reduction from the NP-hard maximum independent set problem. For a function which is O(1)-Lipschitz in L∞ yet not O(1)-stable, we show approximation hardness by a reduction from the planted clique problem. Finally, for a function which is both O(1)-Lipschitz in L∞ and O(1)-stable, we rule out an additive FPTAS via a reduction from the maximum independent set problem.
Despite the simplicity of our framework, we find that it has powerful implications for problems in mechanism design and optimal signaling in games. We feature four natural applications in this paper, three of which resolve or partially resolve outstanding open problems from prior work:
1. Lottery design: Dughmi, Han, and Nisan [DHN14] examined one of the most basic problems in mechanism design: that of designing the revenue-maximizing multi-item auction for a single unit-demand buyer with valuation represented implicitly via a sampling oracle. They reduced this problem to a regularized variant of itself, namely optimally designing a small number of lottery-price pairs — a small menu — from which the buyer is allowed to choose. We apply our framework to resolve the special case of this problem with a single lottery — i.e., a menu of size 1. This follows from the Lipschitz continuity and noise stability of the function g^{(lottery)}_w(t) := max_p { p · Σ_{i=1}^n w_i · I[t_i ≥ p] } for an arbitrary weight vector w ∈ ∆_n, where I[E] is the indicator function for the event E.
2. Revenue-maximizing signaling in probabilistic second-price auctions: Emek et al. [EFG+12] and Miltersen and Sheffet [BMS12] considered signaling in the context of a probabilistic second-price auction. In particular, the attributes of the item for sale are unknown, and the auctioneer must decide what information to reveal in order to maximize his revenue in this auction. This is particularly relevant in advertising auctions, where items are impressions associated with demographics that are a-priori unknown to the advertisers bidding in the auction. Whereas both papers presented a polynomial-time algorithm for this problem when bidder types are fixed, the general problem was shown to be NP-hard and its approximability was left largely open. Using our framework, a PTAS for the general problem follows easily. We use the fact that the function max2, which simply returns the second largest entry of a vector, is Lipschitz continuous and noise stable.
3. Persuasion in voting: Alonso and Câmara [AC14] examine a simple election for selecting a binary outcome — say whether a ballot measure is passed — when voters are not fully informed of the consequences of the measure, and hence of their utilities. Each voter casts a Yes/No vote, and the measure passes if the fraction of Yes votes exceeds a certain pre-specified threshold. A principal — say a moderator of a political debate — can determine the protocol — or signaling scheme — through which information regarding the measure is gathered and shared with voters. We consider a principal concerned with maximizing the probability of the measure passing. [AC14] characterize the optimal signaling scheme and a number of its properties, though stop short of deriving an algorithm for optimal signaling. We design a multi-criteria PTAS for this problem using our framework. Along the way, we also design a bi-criteria PTAS for the related problem of maximizing the expected number of Yes votes in the election. For both results, we use the fact that the function g^{(vote-sum)}(t) = (1/n) |{i : t_i ≥ 0}| is noise stable and Lipschitz continuous in a bi-criteria sense.
4. Optimal signaling in normal form games: Dughmi [Dug14] examined the problem of optimal signaling in abstract normal form games, and ruled out an FPTAS even for two-player zero-sum games. The possibility of a PTAS for two-player zero-sum games, and of a QPTAS for general games with a constant number of players, was left open. We show that a bi-criteria QPTAS for normal-form games with a constant number of players follows from our framework, and applies to a large and natural class of objective functions. We use the fact that every function is O(n)-stable, and the fact that the function measuring the quality of equilibria satisfies a bi-criteria notion of Lipschitz continuity which we define.

Additional Discussion of Related Work
As previously described, our framework generalizes and refines the main insight of [LMM03]. The recent work of Barman [Bar15] is also similar in spirit; in particular, the approximate variant of Caratheodory's theorem employed in that paper can be viewed as a mixture selection problem with g(Ax) = −||Ax − Ax*||_p for a fixed vector x* and norm p ≥ 2.
Even though this function g is neither Lipschitz continuous in L∞ nor noise stable, Barman exhibits a PTAS under the assumption that the columns of A have small (i.e. constant) p-norm.

For a function g : [−1, 1]^n → [−1, 1]
and a positive integer m, we define the following optimization problem which we term m-dimensional mixture selection for g: given an n × m matrix A with entries in [−1, 1], find x in the m-dimensional simplex ∆_m maximizing f(x) := g(Ax). In this section, we present our notion of noise stability, and derive approximation algorithms for this problem when the function g is simultaneously noise stable and Lipschitz continuous with respect to the L∞ metric. Moreover, we show that neither requirement alone suffices for our results.
Our approximation guarantees will be additive — i.e., an ǫ-approximation algorithm for mixture selection outputs x ∈ ∆_m with f(x) ≥ max_{y ∈ ∆_m} f(y) − ǫ. To illustrate our techniques, we use the following function g^{(mid)} : [−1, 1]^n → [−1, 1]:

g^{(mid)}(t) = (1 / (⌈3n/4⌉ − ⌊n/4⌋)) · Σ_{i = ⌊n/4⌋ + 1}^{⌈3n/4⌉} t_{[i]},

where t_{[i]} denotes the i-th largest entry of t. Throughout the paper, we use t_i to denote the i-th entry of t, and use t_{[i]} to denote the i-th largest entry of t.
Though we present our framework for functions g : [−1, 1]^n → [−1, 1], it applies equally well to functions g : [0, 1]^n → [0, 1]. Our positive results concern functions g which are both noise stable and Lipschitz continuous with respect to the L∞ metric. We now formalize these two conditions.

Lipschitz Continuity
A function g : [−1, 1]^n → [−1, 1]
is c-Lipschitz continuous in L∞ — or c-Lipschitz for short — if and only if for all t, t′ in the domain of g, |g(t) − g(t′)| ≤ c ||t − t′||_∞. To illustrate, our example function g^{(mid)} is 1-Lipschitz. We note that Lipschitz continuity in L∞ is a stronger assumption than in any other L_p norm.

Noise Stability
Our notion of noise stability captures the following desirable property of a function g : [−1, 1]^n → [−1, 1]: if a small fraction of the inputs to g are corrupted, with no individual input disproportionately likely to be corrupted, then the output of g does not decrease by much in expectation. Such random corruption patterns are captured by our notion of a light distribution over subsets of [n], defined below.

Definition 2.1 (Light Distribution). Let D be a distribution supported on subsets of [n]. For α ∈ (0, 1), we say D is α-light if and only if the following holds for all i ∈ [n]: Pr_{R∼D}[i ∈ R] ≤ α.

In other words, a light distribution bounds the marginal probability of any individual element of [n]. When corrupted inputs follow a light distribution, no individual input is too likely to be corrupted. However, we note that our notion of light distribution allows arbitrary correlations between the corruption events of various inputs. We define a noise stable function as one which is robust, in an average sense, to corrupting a subset R of its n inputs when R follows a light distribution D. Our notion of robustness is one-sided: we only require that our function's output not decrease substantially in expectation. This one-sided guarantee suffices for all our applications, and is necessitated by some. We note that the light distribution D, as well as the (corrupted) inputs, are chosen adversarially. We make use of the following notation in our definition: Given vectors t, t′ ∈ [−1, 1]^n and a set R ⊆ [n], we say t′ ≈_R t if t_i = t′_i for all i ∉ R. In other words, if t′ ≈_R t, then t′ is a result of corrupting the entries of t corresponding to R.

Definition 2.2 (Noise Stability). Given a function g : [−1, 1]^n → [−1, 1] and a real number β ≥ 0, we say g is β-stable if and only if the following holds for all t ∈ [−1, 1]^n, α ∈ (0, 1), and α-light distributions D over subsets of [n]:

E_{R∼D}[ min { g(t′) : t′ ≈_R t } ] ≥ g(t) − αβ.

To illustrate this definition, we show that our example function g^{(mid)} is 4-stable. To see this, observe that changing k entries of the input to g^{(mid)} can decrease its output by at most 4k/n. When R is drawn from an α-light distribution and t is an arbitrary input, 4-stability therefore follows from the linearity of expectations:

E_{R∼D}[ min { g^{(mid)}(t′) : t′ ≈_R t } ] ≥ E_{R∼D}[ g^{(mid)}(t) − 4|R|/n ] ≥ g^{(mid)}(t) − 4α.
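As a quick numerical illustration, the following is a minimal sketch (ours, not from the paper) of g^{(mid)} and of the 4k/n bound above; NumPy is assumed, and the corruption pattern below is chosen purely for illustration.

```python
import numpy as np

def g_mid(t):
    """The example function g^(mid): average of the 'middle half' of the sorted
    entries of t, where entries are assumed to lie in [-1, 1]."""
    t = np.sort(np.asarray(t, dtype=float))[::-1]   # t[0] is the largest entry
    n = len(t)
    lo, hi = n // 4, -((-3 * n) // 4)               # floor(n/4), ceil(3n/4)
    return t[lo:hi].mean()

# Changing k entries moves each by at most 2 within [-1, 1], and the averaging
# window has about n/2 entries, so the output can drop by at most roughly 4k/n.
rng = np.random.default_rng(0)
t = rng.uniform(-1, 1, size=1000)
k = 50
corrupted = t.copy()
corrupted[np.argsort(t)[::-1][:k]] = -1.0           # crude adversarial corruption
print(g_mid(t) - g_mid(corrupted), 4 * k / len(t))  # observed drop vs. the 4k/n bound
```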
We note that every function g : [−1, 1]^n → [−1, 1]
is 2n-stable, which follows from the union bound. As a useful building block for proving some of our functions stable, we show that stable functions can be combined to yield other stable functions if composed with a convex, nondecreasing, and Lipschitz continuous function.

Proposition 2.3.
Fix β, c ≥ 0, and let g_1, g_2, . . . , g_k : [−1, 1]^n → [−1, 1] be β-stable functions. For every convex function h : [−1, 1]^k → [−1, 1] which is nondecreasing in each of its arguments and c-Lipschitz continuous in L∞, the function g(t) := h(g_1(t), . . . , g_k(t)) is (βc)-stable.

Proof. For all t ∈ [−1, 1]^n and all α-light distributions D,

E_{R∼D}[ min_{t′≈_R t} g(t′) ] = E_{R∼D}[ min_{t′≈_R t} h(g_1(t′), . . . , g_k(t′)) ]
  ≥ E_{R∼D}[ h( min_{t′≈_R t} g_1(t′), . . . , min_{t′≈_R t} g_k(t′) ) ]   (since h is nondecreasing)
  ≥ h( E_{R∼D}[min_{t′≈_R t} g_1(t′)], . . . , E_{R∼D}[min_{t′≈_R t} g_k(t′)] )   (Jensen's inequality)
  ≥ h( g_1(t) − αβ, . . . , g_k(t) − αβ )   (stability of each g_i)
  ≥ h( g_1(t), . . . , g_k(t) ) − αβc   (Lipschitz continuity of h)
  = g(t) − αβc.

As a consequence of the above proposition, a convex combination of β-stable functions is β-stable, and the point-wise maximum of β-stable functions is β-stable.

Consequences of Noise Stability and Lipschitz Continuity
We now state the two main results of our framework. Both results apply to functions g : [−1, 1]^n → [−1, 1]
which are simultaneously Lipschitz continuous and noise stable, and n × m matrices A with entries in [−1, 1]. Given x ∈ ∆_m and an integer s >
0, we view x as a probability distribution over [m], and use the random variable x̃ ∈ ∆_m to denote the empirical distribution of s i.i.d. samples from x. Formally, x̃ = (1/s) Σ_{i=1}^s e_{k_i}, where k_1, . . . , k_s ∈ [m] are drawn i.i.d. according to x, and e_j ∈ ∆_m denotes the j-th standard basis vector. Since x̃ is the average of s standard basis vectors, we say it is s-uniform.

Definition 2.4.
We refer to a distribution y ∈ ∆_m as s-uniform if and only if it is the average of a multiset of s standard basis vectors in m-dimensional space.

Our first result shows that when the number of samples s is chosen as a suitable function of the Lipschitz continuity and noise stability parameters, g(Ax̃) is not much smaller than g(Ax) in expectation over x̃. At a high level, we bound this difference as a sum of two error terms: one accounts for the effect of low-probability large errors in the inputs t̃ = Ax̃ to g, and the other accounts for the effect of higher-probability small errors in the inputs t̃. The former error term is bounded using noise stability, and the latter error term is bounded using Lipschitz continuity.

Theorem 2.5.
Let g : [−1, 1]^n → [−1, 1] be β-stable and c-Lipschitz in L∞, let A be an n × m matrix with entries in [−1, 1], let α, δ > 0, and let s ≥ 2 ln(2/α)/δ² be an integer. Fix a vector x ∈ ∆_m, and let the random variable x̃ denote the empirical distribution of s i.i.d. samples from probability distribution x. The following then holds:

E[g(Ax̃)] ≥ g(Ax) − αβ − cδ.

Proof. Denote t = Ax and t̃ = Ax̃. Note that t̃ is a random variable. Also note that t_i and t̃_i can be viewed as the mean and empirical mean, respectively, of a distribution supported on A_{i,1}, . . . , A_{i,m} ∈ [−1, 1]. We say the i-th entry of t is approximately preserved if |t_i − t̃_i| ≤ δ, and we say it is corrupted otherwise. Let R ⊆ [n] denote the set of corrupted entries. Hoeffding's inequality, and our choice of the number of samples s, imply that R follows an α-light distribution. Let t′ be such that (1) t′_i = t̃_i for i ∈ R, and (2) t′_i = t_i otherwise. Observe that t′ ≈_R t, and ||t′ − t̃||_∞ ≤ δ. We can now bound the expected difference between g(t) and g(t̃) as a sum of the error introduced by corrupted entries and the error introduced by the approximately preserved entries of t:

g(t) − E[g(t̃)] = E[g(t) − g(t′)] + E[g(t′) − g(t̃)] ≤ αβ + cδ.

Notice that if we fix the desired approximation error ǫ, the minimum number of samples s required in Theorem 2.5 to guarantee that E[g(Ax̃)] ≥ g(Ax) − ǫ is obtained by minimizing ⌈2 ln(2/α)/δ²⌉ over α, δ > 0 satisfying αβ + δc ≤ ǫ. Therefore, the required number of samples depends only on the error term ǫ, the noise stability parameter β, and the Lipschitz continuity parameter c; in particular, it is independent of n and m.
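For concreteness, the light-distribution claim in the proof follows from Hoeffding's inequality applied coordinate-wise: each sample of t̃_i lies in an interval of length 2, so for every i ∈ [n] we have Pr[|t̃_i − t_i| > δ] ≤ 2 exp(−sδ²/2) ≤ α whenever s ≥ 2 ln(2/α)/δ². Hence Pr[i ∈ R] ≤ α for every i, as required by Definition 2.1.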
As a corollary of Theorem 2.5, we derive the following algorithmic result.

Theorem 2.6. Let g : [−1, 1]^n → [−1, 1] be β-stable and c-Lipschitz, and let m > 0 be an integer. For every δ, α > 0, the m-dimensional mixture selection problem for g admits an (αβ + cδ)-approximation algorithm in the additive sense, with runtime n · m^{O(log(1/α)/δ²)} · T, where T denotes the time needed to evaluate g on a single input.

Proof. Let s ≥ 2 ln(2/α)/δ² be an integer. Our algorithm simply enumerates all s-uniform distributions, and outputs the one maximizing g(Ax). This takes time n · m^{O(s)} · T. The approximation guarantee follows from Theorem 2.5 and the probabilistic method.

As a consequence of Theorem 2.6, the mixture selection problem for g^{(mid)} admits a polynomial-time approximation scheme (PTAS) in the additive sense. The same holds for every function g which is O(1)-stable and O(1)-Lipschitz continuous. Specifically, by setting α = ǫ/(2β) and δ = ǫ/(2c), an ǫ-approximation algorithm runs in time n · m^{O(c² log(β/ǫ)/ǫ²)} · T. Interestingly, neither noise stability nor Lipschitz continuity alone suffices for such a PTAS, as we argue in the next subsection.
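The enumeration step in the proof is straightforward to spell out. The following is a minimal sketch (assuming NumPy, an explicitly given matrix A, and a callable objective g — the function name and interface are ours), with the choice of s left to the caller as dictated by Theorem 2.5.

```python
import itertools
import numpy as np

def mixture_selection_enumerate(g, A, s):
    """Try every s-uniform x in the m-dimensional simplex (an average of s
    standard basis vectors) and keep the best g(Ax); O(m^s) evaluations of g."""
    _, m = A.shape
    best_val, best_x = -np.inf, None
    for combo in itertools.combinations_with_replacement(range(m), s):
        x = np.zeros(m)
        for j in combo:
            x[j] += 1.0 / s
        val = g(A @ x)
        if val > best_val:
            best_val, best_x = val, x
    return best_x, best_val
```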
The Necessity of Both Noise Stability and Lipschitz Continuity
We now present evidence that both of our assumptions — noise stability and Lipschitz continuity — appear necessary for general positive results along the lines of those in Theorem 2.6.
1. Stability alone is not sufficient.
In Section 8.1, we define a function g^{(slope)} : [0, 1]^n → [0, 1] which is O(1)-stable. Moreover, g^{(slope)} is O(1)-Lipschitz with respect to the L_1 metric, which is a weaker property than Lipschitz continuity with respect to L∞. We show in Theorem 8.4 that there is a polynomial-time reduction from the maximum independent set problem on n-node graphs to n-dimensional mixture selection for g^{(slope)}. Moreover, the reduction precludes a polynomial-time ǫ-approximation algorithm in the additive sense for some constant ǫ > 0.
2. Lipschitz continuity alone is not sufficient.
One might hope to prove NP-hardness of mixture selection in the absence of stability. However, we are out of luck in this regard: since every function g : [−1, 1]^n → [−1, 1]
is 2n-stable, Theorem 2.6 implies a quasipolynomial-time approximation scheme in the additive sense whenever g is O(1)-Lipschitz. Nevertheless, we prove hardness of approximation assuming the planted clique conjecture ([Jer92] and [Kuč95]). More specifically, in Section 8.2 we exhibit a reduction from the planted k-clique problem to mixture selection for the 3-Lipschitz function g^{(clique)}_k(t) = t_{[k]} − t_{[k+1]} + t_{[n]}. When k = ω(log n) and A is the adjacency matrix of an n-node undirected graph G, we show that max_x g^{(clique)}_k(Ax) ≈ 1 if G contains a k-clique, and max_x g^{(clique)}_k(Ax) ≈ 1/2 with high probability if G is the Erdős–Rényi random graph G(n, 1/2).

A Bi-criteria Extension of the Framework
We have already shown that in the absence of Lipschitz continuity, one cannot hope for a PTAS in general. Motivated by two of our applications, namely
Optimal signaling in normal form games and
Persuasion in voting, we extend our framework to the design of approximation algorithms for mixture selection with a bi-criteria guarantee when the function in question is stable but not Lipschitz continuous. We first define a (δ, ρ)-relaxation of a function.
Definition 2.7.
Given two functions g, h : [−1, 1]^n → [−1, 1] and parameters δ, ρ ≥ 0, we say h is a (δ, ρ)-relaxation of g if for all t_1, t_2 ∈ [−1, 1]^n with ||t_1 − t_2||_∞ ≤ δ, h(t_1) ≥ g(t_2) − ρ.

In lieu of the Lipschitz continuity condition, we prove our bounds for a relaxation of the function.
Theorem 2.8.
Let g : [−1, 1]^n → [−1, 1] be β-stable, let A be an n × m matrix with entries in [−1, 1], let α > 0 and δ, ρ ≥ 0, and let s ≥ 2 ln(2/α)/δ² be an integer. Fix a vector x ∈ ∆_m, and let the random variable x̃ denote the empirical distribution of s i.i.d. samples from probability distribution x. The following then holds for any (δ, ρ)-relaxation h of g:

E[h(Ax̃)] ≥ g(Ax) − αβ − ρ.

Proof.
Because the proof is almost identical to the proof of Theorem 2.5, we only mention the necessary modifications. Again, let t = Ax, let t̃ = Ax̃, let R ⊆ [n] denote the set of corrupted inputs, and let t′ be such that t′_i = t̃_i for i ∈ R and t′_i = t_i otherwise. Then

g(t) − E[h(t̃)] = E[g(t) − g(t′)] + E[g(t′) − h(t̃)] ≤ αβ + E[g(t′) − h(t̃)] ≤ αβ + ρ,

where the first inequality follows from the noise stability of g, and the last inequality follows from the fact that h is a (δ, ρ)-relaxation of g.
Having replaced Theorem 2.5 by Theorem 2.8, a computational result similar to Theorem 2.6 can be inferred in the bi-criteria sense.

Lottery Design for Revenue Maximization
To illustrate the utility of our framework, we start with a simple but basic open problem in Bayesian mechanism design posed by Dughmi, Han, and Nisan [DHN14]. An instance of the lottery design problem is given by a valuation matrix A ∈ [0, 1]^{n×m} and n non-negative weights w_1, . . . , w_n with Σ_{i=1}^n w_i = 1. Here n denotes the number of buyer types, m denotes the number of items, and w represents a probability distribution over types. Each entry 0 ≤ A_{i,j} ≤ 1 denotes the value of item j to a buyer of type i. The goal is to design a single lottery-price pair (x, p), with x ∈ ∆_m and p ≥
0, so that the expected revenue of the auction which offers the lottery x over items at price p to a buyer with type drawn according to w is maximized. We assume the buyer is risk neutral, and therefore accepts the offer precisely if his type i satisfies A_i x ≥ p. Consequently, our goal is to choose (x, p) maximizing p · Σ_{i=1}^n (w_i · I[A_i x ≥ p]), where I[E] denotes the indicator function for the event E.
The lottery design problem is closely related to the general unit-demand single-buyer mechanism design problem considered in [DHN14], where the buyer's type is drawn from a common-knowledge prior distribution B given by a sampling oracle, and the buyer is to be presented with a menu consisting of several lottery-price pairs from which to choose. [DHN14] frame this mechanism design problem as a computational task of "learning" a good mechanism by sampling from B, and use the size of the menu as a regularization constraint in order to prevent over-fitting the mechanism to the sampled data. The problem of maximizing the expected revenue by using a menu of at most a given size is the main algorithmic question in [DHN14], and its computational complexity is left largely open. When constrained to a menu with a single lottery, the goal is to choose (x, p) maximizing the expected revenue p · Pr_{a∼B}[ax ≥ p].
Using our mixture selection framework, we first give an additive PTAS for the lottery design problem when the value distribution B is given explicitly by the matrix A and weights w as described above. We then extend our result to cases in which B can only be accessed through sampling, using fairly standard uniform convergence arguments. In Section 8.3 (Theorem 8.9), we rule out an additive FPTAS for this problem, and in doing so provide complexity-theoretic evidence that our PTAS — for both lottery design in particular and mixture selection for stable and Lipschitz-continuous functions more generally — is essentially the best we can hope for.
Given the number of buyer types n and weights w ∈ ∆_n, the lottery design problem is simply mixture selection for the function g^{(lottery)}_w : [0, 1]^n → [0, 1] defined by

g^{(lottery)}_w(t) = max_{p≥0} { p · Σ_{i=1}^n (w_i · I[t_i ≥ p]) }.    (3.1)

As the first step to applying our framework, we show that g^{(lottery)}_w is noise stable. The high-level idea is the following: if a subset of the inputs to g^{(lottery)}_w is corrupted, then in the worst case each such input exceeded the price p before corruption but not after corruption. This reduces the output of g^{(lottery)}_w by at most the total weight of corrupted inputs. When corrupted inputs are chosen according to an α-light distribution, their expected total weight is bounded by α.
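As a concrete illustration, evaluating the objective in (3.1) is simple because the maximizing price can be taken to be one of the entries of t. The following minimal sketch (assuming NumPy; the function name is ours) is what the enumeration of Theorem 2.6 would call with t = Ax̃:

```python
import numpy as np

def g_lottery(t, w):
    """Expected revenue of a single lottery priced optimally against buyer values
    t_1..t_n with type weights w (expression (3.1)): sweep candidate prices p = t_i."""
    t = np.asarray(t, dtype=float)
    w = np.asarray(w, dtype=float)
    best = 0.0
    for p in t:                              # an optimal price equals some t_i
        best = max(best, p * w[t >= p].sum())
    return best
```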
Lemma 3.1. The function g^{(lottery)}_w is 1-stable.

Proof. Let t ∈ [0, 1]^n be an arbitrary input to g^{(lottery)}_w. When t′ is obtained from t by corrupting the entries corresponding to R ⊆ [n], an event we denote by t′ ≈_R t, it is easy to see that g^{(lottery)}_w(t′) ≥ g^{(lottery)}_w(t) − w(R), where w(R) = Σ_{i∈R} w_i denotes the total weight of corrupted entries. Moreover, when R is a random variable drawn from an α-light distribution D, we can bound the expected loss: E[w(R)] = Σ_{i=1}^n Pr[i ∈ R] · w_i ≤ Σ_i α w_i = α. It follows that g^{(lottery)}_w is 1-stable:

E_{R∼D}[ min { g^{(lottery)}_w(t′) : t′ ≈_R t } ] ≥ g^{(lottery)}_w(t) − E_{R∼D}[w(R)] ≥ g^{(lottery)}_w(t) − α.

Next, we prove Lipschitz continuity. The high-level idea is the following: if all inputs to g^{(lottery)}_w decrease by δ, then we need only decrease the price p from expression (3.1) by δ. The (weighted) fraction of inputs exceeding the price is at least the same as before.

Lemma 3.2.
The function g^{(lottery)}_w is 1-Lipschitz continuous.

Proof. Consider t, t′ ∈ [0, 1]^n with ||t′ − t||_∞ ≤ δ. We prove that g^{(lottery)}_w(t′) ≥ g^{(lottery)}_w(t) − δ, and by symmetry it follows that g^{(lottery)}_w(t) ≥ g^{(lottery)}_w(t′) − δ. Let p be the optimal price for t — i.e. the maximizer of expression (3.1) — and define p′ = max{0, p − δ}. Whenever t_i ≥ p we have t′_i ≥ p′, and therefore Σ_{i=1}^n (w_i · I[t′_i ≥ p′]) ≥ Σ_{i=1}^n (w_i · I[t_i ≥ p]). It follows that g^{(lottery)}_w(t′) ≥ g^{(lottery)}_w(t) − δ.

Combining Lemmas 3.1 and 3.2 with Theorem 2.6 yields an additive PTAS for lottery design.

Theorem 3.3.
There is an additive PTAS for the lottery design problem when the valuation distribution is given explicitly by a matrix A ∈ [0, 1]^{n×m} and a weight vector w ∈ ∆_n.

So far, we have assumed that the buyer's type distribution is given explicitly. We now show how to extend our results to the sample oracle model. Specifically, we assume the buyer's type a ∈ [0, 1]^m is drawn from a distribution B given by a sampling oracle, and seek a randomized approximation scheme with runtime (and number of samples) polynomial in m for each desired approximation guarantee ǫ. To simplify exposition we assume B has finite support, though our results hold more generally. As usual, our goal is to choose a lottery x ∈ ∆_m and a price p ≥ 0 maximizing the expected revenue Rev_B(x, p) = p · Pr_{a∼B}[ax ≥ p].

Theorem 3.4.
There is an additive polynomial-time randomized approximation scheme (PRAS) for the lottery design problem in the sample oracle model.

Proof.
Recall that the PTAS in Theorem 3.3 optimizes over all s-uniform m-dimensional lotteries, where s depends only on the desired approximation guarantee ǫ >
0, and is bounded by a polynomial in 1/ǫ. In particular, s is independent of the number of types n, implying that the same approach of enumerating all s-uniform lotteries x̃ would succeed in the sample oracle model were it not for our inability to evaluate max_p Rev_B(x̃, p) exactly.
We overcome this difficulty by Monte Carlo sampling from B. Given n types a_1, . . . , a_n ∈ [0, 1]^m sampled from B and presented as the rows of a matrix A ∈ [0, 1]^{n×m}, we run the PTAS from Theorem 3.3 with approximation parameter ǫ on the empirical distribution given by A and uniform weights w_i = 1/n for all i. Taking n to be a suitable polynomial in m, 1/ǫ, and log(1/γ) — where γ > 0 is the desired failure probability — guarantees that |Rev_B(x̃, p) − p · (1/n) Σ_{i=1}^n I[A_i x̃ ≥ p]| ≤ ǫ simultaneously for all s-uniform lotteries x̃ and prices p, with probability at least 1 − γ. This follows from standard tail bounds and the union bound, coupled with a uniform convergence argument over prices p ∈ [0, 1]. Therefore, with probability at least 1 − γ our algorithm outputs a lottery-price pair whose expected revenue for a buyer drawn from B is within O(ǫ) of the optimal.

In the next few sections, we consider a number of Bayesian games in which a key parameter θ, the state of nature, in part determines the payoff structure of the game. We use Θ to denote the set of all states of nature, and assume θ ∈ Θ is drawn from a common-knowledge prior distribution which we denote by λ. In all our applications, we assume players a-priori know nothing about θ other than its prior distribution λ, and examine policies whereby a principal with access to the realized value of θ may commit to a policy of revealing information to the players regarding θ. This is often referred to as signaling (see e.g.
[EFG+12, BMS12, DIR14, Dug14]). We restrict our attention to symmetric signaling schemes, in which the principal must reveal the same information to all players in the game. Thus, a symmetric signaling scheme is given by a set Σ of signals, and a (possibly randomized) map ϕ from states of nature Θ to signals Σ. The goal of the principal, who is privy to confidential state-of-nature information, is to boost her own objective by using a signaling scheme ϕ that optimally affects the outcome of the game.
In this section, in addition to providing the technical background for signaling schemes, we use our framework to define an abstract signaling problem and characterize its approximation complexity. This abstract problem captures the essence of all signaling problems considered in this paper.
Let m = |Θ|. Abusing notation, we use ϕ(θ, σ) to denote the probability of announcing signal σ ∈ Σ conditioned on the state of nature being θ ∈ Θ. It is well known ([KG09, Dug14]) that signaling schemes are in one-to-one correspondence with convex decompositions of the prior distribution λ ∈ ∆_m: Formally, a signaling scheme ϕ : Θ → Σ corresponds to the convex decomposition λ = Σ_{σ∈Σ} ν_σ · µ_σ, where (1) ν_σ = Pr_{θ∼λ}[ϕ(θ) = σ] = Σ_{θ∈Θ} λ(θ) ϕ(θ, σ) is the probability of announcing signal σ, and (2) µ_σ(θ) = Pr_{θ∼λ}[θ | ϕ(θ) = σ] = λ(θ) ϕ(θ, σ)/ν_σ is the posterior belief distribution of θ conditioned on signal σ. The converse is also true: every convex decomposition of λ ∈ ∆_m corresponds to a signaling scheme. Alternatively, the reader can view a signaling scheme ϕ as the m × |Σ| matrix of pairwise probabilities ϕ(θ, σ) satisfying conditions (1) and (2) with respect to λ ∈ ∆_m.
Note that each posterior distribution µ ∈ ∆_m defines a Bayesian game, and the principal's utility depends on the outcome of the game. Given a suitable equilibrium concept and selection rule, we let f : ∆_m → R denote the principal's utility as a function of the posterior distribution µ. For example, in an auction game f(µ) may be the social welfare or the principal's revenue at the induced equilibrium, or any weighted combination of players' utilities, or something else entirely. The principal's objective as a function of the signaling scheme ϕ can be mathematically expressed by F(ϕ, λ) = Σ_σ ν_σ · f(µ_σ).
In this setup, the optimal choice of a signaling scheme is related to the concave envelope f^+ of the function f ([KG09, Dug14]); here f^+ is the point-wise lowest concave function h for which h(x) ≥ f(x) for all x in the domain, or equivalently, the hypograph of f^+ is the convex hull of the hypograph of f. Specifically, such a signaling scheme achieves Σ_σ ν_σ · f(µ_σ) = f^+(λ). Thus, there exists a signaling scheme with m + 1 signals that maximizes the principal's objective, by applying Caratheodory's theorem to the hypograph of f.
To connect to our mixture selection framework, we consider signaling problems in which the principal's utility f(µ) from a posterior distribution µ ∈ ∆_m can be written as g(Aµ) for a function g : [−1, 1]^n → [−1, 1]
and a matrix A ∈ [−1, 1]^{n×m}. As described in Section 4.1, a signaling scheme ϕ with signals Σ corresponds to a family of probability-posterior pairs {(ν_σ, µ_σ)}_{σ∈Σ} decomposing the prior λ ∈ ∆_m into a convex combination of posterior distributions (one per signal): λ = Σ_{σ∈Σ} ν_σ µ_σ. The objective of our signaling problem is then

F(ϕ) = Σ_{σ∈Σ} ν_σ f(µ_σ) = Σ_{σ∈Σ} ν_σ g(Aµ_σ).

We note that this signaling problem can alternatively be written as an (infinite-dimensional) linear program which searches over probability measures supported on ∆_m with expectation λ. The separation oracle for the dual of this linear program is a mixture selection problem. Whereas we do not use this infinite-dimensional formulation nor its dual directly, we nevertheless show that the same conditions — noise stability and Lipschitz continuity — on the function g which lead to an approximation scheme for mixture selection also lead to a similar approximation scheme for our signaling problem with f(µ) = g(Aµ).

Lemma 4.1. If g is β-stable and c-Lipschitz, then for any constants α, δ > 0, and for any integer s ≥ 2δ^{−2} ln(2/α), there exists a signaling scheme ϕ̃ for which every posterior distribution is s-uniform, and F(ϕ̃) ≥ OPT − (αβ + cδ), where OPT denotes the value of the optimal signaling scheme.

Proof. Let s ≥ 2δ^{−2} ln(2/α), and let τ ∈ [m^s] index all s-uniform posteriors, with µ̃_τ denoting the τ-th such posterior. For an arbitrary signaling scheme ϕ = (Σ, {(ν_σ, µ_σ)}_{σ∈Σ}), we show that each posterior µ_σ can be decomposed into s-uniform posteriors without degrading the objective by more than αβ + cδ; more formally:

1. µ_σ can be expressed as a convex combination of s-uniform posteriors as follows:

µ_σ = Σ_{τ∈[m^s]} ν̃_{σ,τ} µ̃_τ with ν̃_σ ∈ ∆_{m^s}.    (4.1)

2. The value of the objective function, i.e., g(Aµ_σ), is decreased by no more than αβ + cδ through this decomposition:

Σ_{τ∈[m^s]} ν̃_{σ,τ} · g(Aµ̃_τ) ≥ g(Aµ_σ) − (αβ + cδ).    (4.2)

To show this, fix a signal σ, and let µ̃ ∈ ∆_m be the empirical distribution of s i.i.d. samples from distribution µ_σ ∈ ∆_m. The vector µ̃ is itself a random variable supported on s-uniform posteriors, its expectation is µ_σ, and by Theorem 2.5 we have E[g(Aµ̃)] ≥ g(Aµ_σ) − (αβ + cδ). Therefore, by taking ν̃_{σ,τ} = Pr[µ̃ = µ̃_τ] for each τ ∈ [m^s] we get the desired decomposition of µ_σ.
The lemma follows by composing the decomposition ϕ with the decompositions of the posterior beliefs µ_σ to yield a signaling scheme ϕ̃ with only s-uniform posteriors and F(ϕ̃) ≥ F(ϕ) − (αβ + cδ). Specifically, the signals of ϕ̃ are Σ × [m^s], where signal (σ, τ) has probability ν_σ · ν̃_{σ,τ} and induces the posterior µ̃_τ. (Note, however, that we can also "merge" all signals with the same posterior µ̃_τ without loss.) Using Equations (4.1) and (4.2), it is easy to verify that this describes a valid signaling scheme with F(ϕ̃) ≥ F(ϕ) − (αβ + cδ).

Lemma 4.1 permits us to restrict attention to s-uniform posteriors without much loss in our objective. Since there are only m^s such posteriors, a simple linear program with m^s variables computes an approximately optimal signaling scheme.
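A minimal sketch of this linear program (LP (4.3) below) in code, assuming SciPy and NumPy, an explicitly given objective g and matrix A, and our own helper name and interface:

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def signaling_lp(g, A, prior, s):
    """Restrict to s-uniform posteriors mu_1..mu_M and solve
    max sum_j nu_j * g(A mu_j)  s.t.  sum_j nu_j * mu_j = prior,  nu >= 0."""
    _, m = A.shape
    posteriors = []
    for combo in itertools.combinations_with_replacement(range(m), s):
        mu = np.zeros(m)
        for j in combo:
            mu[j] += 1.0 / s
        posteriors.append(mu)
    values = np.array([g(A @ mu) for mu in posteriors])
    A_eq = np.column_stack(posteriors)        # m x M: columns are the posteriors
    res = linprog(-values, A_eq=A_eq, b_eq=np.asarray(prior, dtype=float),
                  bounds=[(0, None)] * len(posteriors))
    return posteriors, res.x, -res.fun        # support, signal probabilities, objective
```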
Theorem 4.2 (Polynomial-Time Signaling). If g is β-stable and c-Lipschitz, then for any constants α, δ > 0, there exists a deterministic algorithm that constructs a signaling scheme with objective value at least OPT − (αβ + cδ), where OPT is the value of the optimal signaling scheme. Moreover, the algorithm runs in time poly(m^{δ^{−2} ln(1/α)}) · n · T, where T denotes the time needed to evaluate g on a single input.

Proof. Let s be an integer with s ≥ 2δ^{−2} ln(2/α), denote M = m^s, and let {µ_1, · · · , µ_M} be the set of all s-uniform posteriors. Lemma 4.1 shows that restricting to s-uniform posteriors only introduces an αβ + cδ additive loss in the objective. Thus it suffices to compute the optimal signaling scheme supported only on s-uniform posteriors. This can be done using the following linear program:

maximize Σ_{j∈[M]} ν_j · g(Aµ_j)
subject to Σ_{j∈[M]} ν_j µ_j = λ
           ν ∈ ∆_M    (4.3)

Note µ_j is the j-th s-uniform posterior — the only variables in this LP are ν_1, . . . , ν_M.

Our proofs can be adapted to obtain a bi-criteria guarantee in the absence of Lipschitz continuity, as in Section 2. The following theorem follows easily, and we omit the details.

Theorem 4.3 (Polynomial-Time Signaling (Bi-criteria)). Let g, h : [−1, 1]^n → [−1, 1] be such that g is β-stable and h is a (δ, ρ)-relaxation of g, and let α > 0 be a parameter. There exists a deterministic algorithm which, when given as input a matrix A ∈ [−1, 1]^{n×m} and a prior distribution λ ∈ ∆_m, constructs a signaling scheme ϕ = {(ν_σ, µ_σ)}_{σ∈Σ} such that

Σ_{σ∈Σ} ν_σ h(Aµ_σ) ≥ OPT − αβ − ρ,

where OPT is the maximum of F(ϕ*) = Σ_{σ∈Σ*} ν*_σ g(Aµ*_σ) over signaling schemes ϕ* = {(ν*_σ, µ*_σ)}_{σ∈Σ*}. Moreover, the algorithm runs in time poly(m^{δ^{−2} ln(1/α)}) · n · T, where T denotes the time needed to evaluate h on a single input.

Remarks. We note that our proof suggests an extension of the result in Theorem 4.2 to cases in which f is given by a "black box" oracle, so long as we are promised that it is of the form f(µ) = g(Aµ). In this model the runtime of our algorithm does not depend on n, but instead depends on the cost of querying f. We also point out that even though we precompute the quality of all m^s posteriors, we can guarantee that our output signaling scheme uses at most m + 1 signals; this is because LP (4.3) has only m + 1 constraints, and therefore admits an optimal solution where at most m + 1 variables are non-zero.

We examine signaling in probabilistic second-price auctions, as considered by Emek et al. [EFG+12] and Miltersen and Sheffet [BMS12]. An instance of the problem is given by the following parameters:
- An integer n denoting the number of bidders. We index the players by the set [n] = {1, . . . , n}.
- An integer m denoting the number of states of nature. We index states of nature by the set Θ = {1, . . . , m}. Each θ ∈ Θ represents a possible instantiation of the item being sold.
- A common-knowledge prior distribution λ ∈ ∆_m on the states of nature.
- A common-knowledge prior distribution D on valuation matrices V ∈ [0, 1]^{n×m}, given either explicitly or as a "black-box" sampling oracle.
For a valuation matrix V, entry V_{i,j} denotes the value of player i for the item corresponding to state of nature j.
The game being played is the following: (a) The auctioneer first commits to a signaling scheme ϕ : Θ → Σ; (b) A state of nature θ ∈ Θ is drawn according to λ and revealed to the auctioneer but not the bidders; (c) The auctioneer reveals a public signal σ ∼ ϕ(θ) to all the bidders; (d) A valuation matrix V ∈ [0, 1]^{n×m} is drawn according to D, and each player i learns his value V_{i,j} for each potential item j; (e) Finally, a second-price auction for the item is run.
As an example, consider an auction for an umbrella: the state of nature θ can be the weather tomorrow, which determines the utility V_{i,θ} of an umbrella to player i. We assume that λ and D are independent. We also emphasize that a bidder knows nothing about θ other than its distribution λ and the public signal σ, and the auctioneer knows nothing about V besides its distribution D prior to running the auction.
We adopt the (unique) dominant-strategy truth-telling equilibrium as our solution concept. Specifically, given a signaling scheme ϕ : Θ → Σ and a signal σ ∈ Σ, in the subgame corresponding to σ it is a dominant strategy for player i to bid E_{θ∼λ}[V_{i,θ} | ϕ(θ) = σ] — his posterior expected value for the item conditioned on the received signal σ. Therefore the item goes to the player with maximum posterior expected value, at a price equal to the second-highest posterior expected value.
The algorithmic problem we consider is the one faced by the auctioneer in step (a) — namely computing an optimal signaling scheme — assuming the auctioneer looks to maximize expected revenue. It was shown in
[EFG+12, BMS12] that polynomial-time algorithms exist for several special cases of this problem. However, the general problem was shown to be NP-hard — specifically, no additive FPTAS exists unless P = NP. In this section, we resolve the approximation complexity of this basic signaling problem by giving an additive PTAS. We note that variations of this problem were considered in [GNS07, GD13], with different constraints on the signaling scheme — the results in these works are not directly relevant to our model.
Given a signaling scheme ϕ expressed as a decomposition {ν_σ, µ_σ}_{σ∈Σ} of the prior distribution λ, we can express the auctioneer's expected revenue as

Σ_{σ∈Σ} ν_σ E_{V∼D} max2(Vµ_σ),

where the function max2 returns the second largest entry of a given vector, i.e. max2(t) = t_{[2]}. To apply our main theorem, we need to show that the revenue in a subgame with posterior distribution µ ∈ ∆_m — namely E_{V∼D} max2(Vµ) — can be written in the form g(Wµ) for a matrix W. To facilitate our discussion we assume that the valuation distribution D has finite support size C, though this is without loss of generality. Imagine we form a large matrix W by stacking matrices in the support of D on top of each other. Formally, W = [V_1^T, V_2^T, · · · , V_C^T]^T, where V_i is the i-th matrix in the support of D. When matrix V_i is drawn from D, we take the second-highest bid from the rows of W corresponding to V_i (rows (i − 1) · n + 1 to i · n, where n is the number of players). For S ⊆ [nC] and t ∈ [0, 1]^{nC}, let max2_S(t) denote the second-highest value among entries of t indexed by S. Then we can write the auctioneer's expected revenue as g^{(rev)}(Wµ) =
E_{V∼D} max2_{S(V)}(Wµ), where S(V) is the set of rows in W corresponding to V.

Lemma 5.1 (Smooth and Stable Revenue). The function g^{(rev)}(t) = E_{V∼D} max2_{S(V)}(t) is 1-Lipschitz and 2-stable.

Proof. Because max2_S is 1-Lipschitz for a fixed set of indices S, it follows that g^{(rev)}, which is a convex combination of these 1-Lipschitz functions, is also 1-Lipschitz.
To show that g^{(rev)} is stable, we first show that the function max2 : [0, 1]^n → [0, 1]
is stable. Given t ∈ [0, 1]^n and a random set R ⊆ [n] drawn from an α-light distribution D, the union bound implies that R includes neither of the two largest entries of t with probability at least 1 − 2α. In this case, the value of max2 is not affected by corruption of the entries indexed by R. Hence

E_{R∼D}[ min { max2(t′) : t′ ≈_R t } ] ≥ (1 − 2α) · max2(t) + 2α · 0 ≥ max2(t) − 2α.

Therefore max2 is 2-stable, which implies that max2_S : [0, 1]^{nC} → [0, 1]
is also 2-stable for any fixed set of indices S. The function g^{(rev)} is a convex combination of functions of the form max2_S, and is therefore also 2-stable by Proposition 2.3.
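Concretely, when D is given explicitly, the objective can be evaluated directly from the support of D. The following is a minimal sketch (assuming NumPy; the function name and interface are ours, and each valuation matrix is assumed to have at least two rows):

```python
import numpy as np

def g_rev(mu, value_matrices, probs):
    """Expected second-price revenue at posterior mu: for each valuation matrix V
    in the finite support of D (with probability q), bidders bid their posterior
    expected values (V @ mu), and the item sells at the second-highest bid."""
    revenue = 0.0
    for V, q in zip(value_matrices, probs):
        bids = np.asarray(V, dtype=float) @ np.asarray(mu, dtype=float)
        revenue += q * np.sort(bids)[-2]          # max2: second-largest entry
    return revenue
```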
Theorem 5.2. The revenue-maximizing signaling problem in probabilistic second-price auctions admits an additive PTAS when the valuation distribution is given explicitly, and an additive PRAS when the valuation distribution is given by a sampling oracle.

Proof. Lemma 5.1 shows that the function g^{(rev)} is 2-stable and 1-Lipschitz. If the valuation distribution D is explicitly given with support size C, the function g^{(rev)} can be evaluated in poly(n, m, C) time. Then for any ǫ >
0, it follows from Theorem 4.2, by setting α = ǫ/4 and δ = ǫ/2, that we can compute a signaling scheme with expected revenue at least OPT − ǫ, in time poly(n, m^{ǫ^{−2} ln(1/ǫ)}, C).
If D is given via a sampling oracle, standard tail bounds and the union bound imply that C = Θ((s log m + log(1/γ))/ǫ²) samples from D suffice to estimate to within O(ǫ) the revenue associated with every s-uniform posterior in ∆_m, with success probability 1 − γ. Since revenue is O(1)-stable and O(1)-Lipschitz, Lemma 4.1 implies that we can restrict attention to signaling schemes with s-uniform posteriors for s = poly(1/ǫ). Proceeding as in Theorem 4.2, using the revenue estimates from Monte-Carlo sampling in lieu of exact values, we can construct a signaling scheme with revenue OPT − ǫ in time poly(n, m^{ǫ^{−2} ln(1/ǫ)}, log(1/γ)), with success probability 1 − γ.

In this section, we apply our mixture selection framework to natural signaling problems encountered in the context of social choice, as introduced by Alonso and Câmara [AC14]. Consider an election with two possible outcomes, 'Yes' and 'No'. For example, voters may need to choose whether to adopt a new law or social policy; board members of a company may need to decide whether to invest in a new project; and members of a jury must decide whether a defendant is declared guilty or not guilty. As in [AC14], we focus on the scenario in which voters have uncertainty regarding their utilities for the two possible outcomes (e.g., the risks and rewards of the new project). Specifically, voters' utilities are parameterized by an a-priori unknown state of nature θ drawn from a common-knowledge prior distribution. We adopt the perspective of a principal with access to the realization of θ, and looking to influence the outcome of the election by signaling.
Formally, we consider a voting setting with n voters and m states of nature. We index the voters by the set [n] = {1, . . . , n}, and states of nature by the set Θ = {1, . . . , m}. We assume voters' preferences are given by a matrix U ∈ [−1, 1]^{n×m}, where U_{i,j} denotes voter i's utility in the event of a 'Yes' outcome in state of nature j. Without loss of generality, we assume utilities are normalized so that each voter's utility for a 'No' outcome is 0 in each state of nature. A voter i who believes that the state of nature follows a distribution µ ∈ ∆_m has expected utility u(i, µ) = Σ_{j∈Θ} U_{i,j} µ_j for a 'Yes' outcome. In most voting systems with a binary outcome, including for example threshold voting rules, it is a dominant strategy to vote 'Yes' if the utility u(i, µ) is at least 0 and 'No' otherwise. For our approximation algorithms, we also allow implementation in approximate dominant strategies — i.e., we sometimes assume a voter votes 'Yes' if his utility u(i, µ) is at least −δ for a small parameter δ. (Such relaxations seem necessary for our results. Moreover, depending on the context, modes of intervention for shifting the votes of voters who are close to being indifferent may be realistic.) We assume that the state of nature θ ∈ Θ is drawn from a common prior λ ∈ ∆_m, and a principal with access to θ reveals a public signal σ prior to voters casting their votes. As usual, we adopt the perspective of a principal looking to commit to a signaling scheme ϕ : Θ → Σ, for some set of signals Σ.
Alonso and Câmara [AC14] consider a principal interested in maximizing the probability that at least 50% (or some given threshold) of the voters vote 'Yes', in expectation over states of nature. They characterize optimal signaling schemes analytically, though stop short of prescribing an algorithm for signaling. Theirs is the natural objective when the election employs a majority voting rule.
We now examine bi-criteria approximation algorithms for maximizing the expected number of 'Yes' votes. For our benchmark, we use the function g^{(vote-sum)}(t) := Σ_{i∈[n]} (1/n) I[t_i ≥ 0], where I[E] denotes the indicator function for event E. Assuming voters vote 'Yes' precisely when their posterior expected utility for a 'Yes' outcome is nonnegative, the number of 'Yes' votes when voters have preferences U ∈ [−1, 1]^{n×m} and posterior belief µ ∈ ∆_m equals g^{(vote-sum)}(Uµ). When the state of nature is distributed according to a common prior λ, and voters are informed according to a signaling scheme ϕ inducing a decomposition {µ_σ, ν_σ}_{σ∈Σ} of λ, the expected number of 'Yes' votes equals F^{(vote-sum)}(ϕ, U, λ) := Σ_{σ∈Σ} ν_σ g^{(vote-sum)}(Uµ_σ). We use OPT^{(vote-sum)}(U, λ) to denote the maximum value of F^{(vote-sum)}(ϕ, U, λ) over signaling schemes ϕ.
As the first step to apply our framework, we prove that g^{(vote-sum)} is stable.

Lemma 6.1.
The function g^{(vote-sum)} is 1-stable.

Proof. For each voter i ∈ [n], let g_i : [−1, 1]^n → {0, 1} be the function indicating whether voter i prefers the 'Yes' outcome, i.e., g_i(t) = I[t_i ≥ 0]. The function g_i is 1-stable, because as long as the i-th input t_i is not corrupted the output of g_i does not change. Therefore g^{(vote-sum)}(t) = (1/n) Σ_{i=1}^n g_i(t), being a convex combination of 1-stable functions, is 1-stable by Proposition 2.3.

Unfortunately, g^{(vote-sum)} is not O(1)-Lipschitz. We therefore employ the bi-criteria extension to our framework from Definition 2.7. Specifically, for a parameter δ >
0, we assume a voter votes 'Yes' as long as his expected utility from a 'Yes' outcome is at least −δ. Correspondingly, we define the relaxed function g^{(vote-sum)}_δ(t) := Σ_{i∈[n]} (1/n) I[t_i ≥ −δ]; the expected number of 'Yes' votes from a signaling scheme ϕ = {µ_σ, ν_σ}_{σ∈Σ} can analogously be written as F^{(vote-sum)}_δ(ϕ, U, λ) := Σ_{σ∈Σ} ν_σ g^{(vote-sum)}_δ(Uµ_σ).
It is easy to verify that g^{(vote-sum)}_δ is a (δ, 0)-relaxation of g^{(vote-sum)}; combining this fact with Theorem 4.3 yields a bi-criteria approximation scheme for the problem of maximizing the expected number of 'Yes' votes.
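Both the benchmark and its relaxation are one-liners in code; a minimal sketch (assuming NumPy; the names are ours) of the functions the bi-criteria scheme evaluates on t = Uµ:

```python
import numpy as np

def g_vote_sum(t):
    """Fraction of voters whose posterior expected utility for 'Yes' is nonnegative."""
    return float(np.mean(np.asarray(t, dtype=float) >= 0.0))

def g_vote_sum_relaxed(t, delta):
    """(delta, 0)-relaxation: voters within delta of indifference also vote 'Yes'."""
    return float(np.mean(np.asarray(t, dtype=float) >= -delta))
```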
Theorem 6.2. Let ǫ, δ > 0 be parameters, let U ∈ [−1, 1]^{n×m} describe the preferences of n voters in m states of nature, and let λ ∈ ∆_m be the prior over states of nature. There is an algorithm with runtime poly(m^{ln(1/ǫ)/δ²}, n) for computing a signaling scheme ϕ such that F^{(vote-sum)}_δ(ϕ, U, λ) ≥ OPT^{(vote-sum)}(U, λ) − ǫ.

Using the same techniques as in Section 3.2, we can extend this result to the case where the valuations of voters are drawn from a distribution given either explicitly or by a sampling oracle. We omit the details.

Maximizing Probability of a Majority Vote
We now sketch the necessary modifications when the principal is interested in maximizing the probability of a 'Yes' outcome, assuming a majority voting rule. We make two relaxations, which appear necessary for our framework: we assume a voter votes 'Yes' as long as his expected utility from a 'Yes' outcome is at least −δ, and assume that the 'Yes' outcome is attained when at least a (0.5 − δ) fraction of voters vote 'Yes'. Our benchmark will be the maximum probability of a 'Yes' outcome in the absence of these two relaxations.
We define our benchmark using the function g^{(vote-thresh)}(t) = I[g^{(vote-sum)}(t) ≥ 0.5],
which evaluates to 1 if at least half of its n inputs are nonnegative, and to 0 otherwise. This function is not O(1)-stable, so we work with a more stringent benchmark which is. Specifically, for a parameter δ >
0, we use the function g^{(vote-smooth-thresh)}_δ, which is pointwise greater than or equal to g^{(vote-thresh)}, defined as follows:

g^{(vote-smooth-thresh)}_δ(t) = (1/δ) · (g^{(vote-sum)}(t) − 0.5 + δ)   if g^{(vote-sum)}(t) ∈ [0.5 − δ, 0.5],
g^{(vote-smooth-thresh)}_δ(t) = g^{(vote-thresh)}(t)   otherwise.

Observe that g^{(vote-smooth-thresh)}_δ applies a continuous piecewise-linear function to the output of g^{(vote-sum)}. It is easy to verify that g^{(vote-smooth-thresh)}_δ is (1/δ)-stable, and upper bounds g^{(vote-thresh)}.
Finally, to measure the quality of our output we define the relaxed function g^{(vote-thresh)}_δ : [−1, 1]^n → {0, 1}, which outputs 1 if at least a (0.5 − δ) fraction of its inputs exceed −δ, and outputs 0 otherwise. By Definition 2.7, g^{(vote-thresh)}_δ is a (δ, 0)-relaxation of g^{(vote-smooth-thresh)}_δ (and, consequently, also of g^{(vote-thresh)}).
As usual, let F^{(vote-thresh)}(ϕ, U, λ) and F^{(vote-thresh)}_δ(ϕ, U, λ) denote the functions which evaluate the quality of a signaling scheme ϕ using g^{(vote-thresh)} and g^{(vote-thresh)}_δ, respectively. Moreover, let OPT^{(vote-thresh)}(U, λ) be the maximum value of F^{(vote-thresh)}(ϕ, U, λ) over signaling schemes ϕ.
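The smoothed benchmark is easy to state in code as well; a minimal sketch (assuming NumPy; the name is ours) of the ramp applied to the output of g^{(vote-sum)}:

```python
import numpy as np

def g_vote_smooth_thresh(t, delta):
    """Piecewise-linear surrogate for the majority indicator: 0 below a (0.5 - delta)
    fraction of 'Yes' votes, 1 above 0.5, and a ramp of slope 1/delta in between."""
    frac = np.mean(np.asarray(t, dtype=float) >= 0.0)    # g^(vote-sum)(t)
    return float(np.clip((frac - 0.5 + delta) / delta, 0.0, 1.0))
```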
We apply Theorem 4.3 to g^{(vote-thresh)}_δ and g^{(vote-smooth-thresh)}_δ, setting α = ǫδ, and use the fact that g^{(vote-smooth-thresh)}_δ upper bounds our true benchmark g^{(vote-thresh)}, to conclude the following.

Theorem 6.3. Let ǫ, δ > 0 be parameters, let U ∈ [−1, 1]^{n×m} describe the preferences of n voters in m states of nature, and let λ ∈ ∆_m be the prior over states of nature. There is an algorithm with runtime poly(m^{ln(1/(ǫδ))/δ²}, n) for computing a signaling scheme ϕ such that F^{(vote-thresh)}_δ(ϕ, U, λ) ≥ OPT^{(vote-thresh)}(U, λ) − ǫ.

Turning our attention away from signaling, we note that g^{(vote-sum)}(Ax) simply counts the number of satisfied inequalities in the system Ax ⪰
0. Mixture selection for g^{(vote-sum)} is therefore the problem of maximizing the number of satisfied inequalities over the simplex. Using our framework from Section 2, we obtain a bi-criteria PTAS for this problem. Moreover, using Monte-Carlo sampling, our bi-criteria PTAS extends to the model in which A is given implicitly; specifically, the rows of A correspond to the sample space of a distribution D over [−1, 1]^m, and are weighted accordingly. In this implicit model, we can think of mixture selection for g^{(vote-sum)} as the problem of finding x ∈ ∆_m which maximizes the probability that a · x ≥ 0 when a ∼ D.
Motivated by systems applications, Daskalakis et al.
Motivated by systems applications, Daskalakis et al. [DDD+14] consider a special case of this problem termed Fault-Tolerant Distributed Storage. Their problem is equivalent to mixture selection for g^{(vote-sum)} in the implicit model, with the additional restriction that D is a product distribution over binary vectors with marginal probabilities given explicitly. They present an additive EPTAS for this problem in a uni-criteria sense. Our framework relaxes their restrictions on D, at the cost of a bi-criteria guarantee and exponential dependence on the error parameters.

7 Signaling in Normal Form Games

We consider normal form games of incomplete information, in which payoffs are parameterized by a state of nature θ. A principal has access to the exact realization of θ, whereas the players initially share a prior belief on θ and form a posterior belief based on the information revealed by the principal. The goal of the principal is then to commit to revealing certain information about θ — i.e., a signaling scheme — to induce a favorable equilibrium over the resulting Bayesian subgames.

Signaling in normal form games has recently been examined from a complexity-theoretic perspective. Dughmi [Dug14] considered the special case of two-player zero-sum games, and examined the design of symmetric signaling schemes with the goal of maximizing the expected utility of one of the players. It was shown that no FPTAS is possible for the signaling problem for zero-sum games, assuming the planted clique conjecture. In this section, we complement the impossibility result of [Dug14] with a bi-criteria quasi-polynomial time approximation scheme (QPTAS) which applies to normal form games with a constant number of players, slightly relaxing both the equilibrium definition and the polynomial-time restriction. It remains open whether signaling for Bayesian zero-sum games admits a PTAS.

We make heavy use of tensors in describing multi-player games. Specifically, we focus on n-dimensional tensors of order k with entries in [−1, 1], where n is typically the number of strategies per player and k is the number of players. We think of these tensors as functions T : [n]^k → [−1, 1], mapping each s ∈ [n]^k to a number in [−1, 1]. Such a tensor T also naturally describes a multilinear map; overloading notation, given k vectors x_1, . . . , x_k ∈ R^n we write T(x_1, . . . , x_k) = Σ_{s_1,...,s_k ∈ [n]} ( T(s_1, . . . , s_k) · Π_{i=1}^k x_i(s_i) ). This is most natural when x_1, . . . , x_k ∈ ∆_n form a mixed strategy profile, in which case T(x_1, . . . , x_k) evaluates the expected value of T over pure strategy profiles drawn from (x_1, . . . , x_k).

A Bayesian normal form game is defined by the following parameters:

- An integer k denoting the number of players, indexed by the set [k] = {1, . . . , k}.
- An integer n bounding the number of pure strategies of each player. Without loss of generality, we assume each player has exactly n pure strategies, and index them by the set [n] = {1, . . . , n}.
- An integer m denoting the number of states of nature. We index states of nature by the set Θ = {1, . . . , m}, and use the variable θ to represent a state of nature.
- A common prior distribution λ ∈ ∆_m on states of nature.
- A family of payoff tensors A^θ_i : [n]^k → [−1, 1], one for each player i and state of nature θ, where A^θ_i(s_1, . . . , s_k) is the payoff to player i when the state of nature is θ and each player j plays strategy s_j.

Note that a game of complete information is the special case with m = 1 — i.e., the state of nature is fixed and known to all.
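As a concrete illustration of the multilinear evaluation T(x_1, . . . , x_k) defined above, the following Python sketch (illustrative only; the function names are ours) computes the expected payoff of a tensor under a mixed strategy profile, once by the explicit sum over pure profiles and once by successive contractions.

import itertools
import numpy as np

def eval_tensor(T, xs):
    # explicit sum: T(x_1, ..., x_k) = sum over pure profiles s of T(s) * prod_i x_i(s_i)
    total = 0.0
    for s in itertools.product(*[range(len(x)) for x in xs]):
        total += T[s] * np.prod([xs[i][s[i]] for i in range(len(xs))])
    return total

def eval_tensor_fast(T, xs):
    # same value, contracting one player's axis at a time (last axis first)
    out = np.asarray(T, dtype=float)
    for x in reversed(xs):
        out = out @ x
    return float(out)

A1 = np.array([[1.0, -1.0], [-1.0, 1.0]])            # a 2-player payoff tensor (matrix)
x1, x2 = np.array([0.5, 0.5]), np.array([0.7, 0.3])
print(eval_tensor(A1, [x1, x2]), eval_tensor_fast(A1, [x1, x2]))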
In a general Bayesian normal form game, absent any information about the state of nature beyond the prior λ, risk-neutral players will behave as in the complete information game E_{θ∼λ}[A^θ]. We consider signaling schemes which partially and symmetrically inform players by publicly announcing a signal σ, correlated with θ; this induces a common posterior belief on the state of nature for each value of σ. When players' posterior belief over θ is given by µ ∈ ∆_m, we use A^µ to denote the equivalent complete information game E_{θ∼µ}[A^θ]. As shorthand, we use A^µ_i(x_1, . . . , x_k) to denote E[A^θ_i(s_1, . . . , s_k)] when θ ∼ µ ∈ ∆_m and s_i ∼ x_i ∈ ∆_n. In the event that the state of nature is θ and players play the pure strategy profile s_1, . . . , s_k, we refer to the tuple (θ, s_1, . . . , s_k) as the state of play. For our result, we assume that a Bayesian game (A, λ) is represented explicitly as a vector λ ∈ ∆_m and a list of tensors {A^θ_i ∈ [−1, 1]^{n^k} : i ∈ [k], θ ∈ [m]}.

We adopt the approximate Nash equilibrium as our equilibrium concept. There are two variants.
Definition 7.1. Let ǫ ≥ 0. In a k-player, n-action normal form game with expected payoffs in [−1, 1] given by tensors A_1, . . . , A_k, a mixed strategy profile x_1, . . . , x_k ∈ ∆_n is an ǫ-Nash equilibrium (ǫ-NE) if A_i(x_1, . . . , x_k) ≥ A_i(t_i, x_{−i}) − ǫ for every player i and alternative pure strategy t_i ∈ [n].
Definition 7.2. Let ǫ ≥ 0. In a k-player, n-action normal form game with expected payoffs in [−1, 1] given by tensors A_1, . . . , A_k, a mixed strategy profile x_1, . . . , x_k ∈ ∆_n is an ǫ-well-supported Nash equilibrium (ǫ-WSNE) if A_i(s_i, x_{−i}) ≥ A_i(t_i, x_{−i}) − ǫ for every player i, every strategy s_i in the support of x_i, and every alternative pure strategy t_i ∈ [n].

Clearly, every ǫ-WSNE is also an ǫ-NE. When ǫ = 0, both correspond to the exact Nash equilibrium. Note that we omitted reference to the state of nature in the above definitions — in a subgame corresponding to posterior beliefs µ ∈ ∆_m, we naturally use the tensors A^µ_1, . . . , A^µ_k instead.
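The two definitions above differ only in which deviations are tested: an ǫ-NE compares against the mixed strategy actually played, while an ǫ-WSNE requires every pure strategy in the support to be ǫ-optimal. A minimal Python sketch of both tests (illustrative only; payoff tensors are numpy arrays indexed by pure profiles, and the helper names are ours) follows.

import numpy as np

def pure_vs_mixed(A_i, xs, i):
    # vector of expected payoffs A_i(t_i, x_{-i}) for each pure strategy t_i of player i
    out = np.asarray(A_i, dtype=float)
    for j in reversed(range(len(xs))):      # removing axes from the back keeps earlier indices stable
        if j != i:
            out = np.tensordot(out, xs[j], axes=(j, 0))
    return out

def is_eps_ne(A, xs, eps):
    # Definition 7.1: no player gains more than eps by any pure deviation
    for i, A_i in enumerate(A):
        v = pure_vs_mixed(A_i, xs, i)
        if v.max() > float(xs[i] @ v) + eps + 1e-12:
            return False
    return True

def is_eps_wsne(A, xs, eps):
    # Definition 7.2: every strategy in the support is eps-optimal against x_{-i}
    for i, A_i in enumerate(A):
        v = pure_vs_mixed(A_i, xs, i)
        support = np.nonzero(xs[i] > 0)[0]
        if np.any(v[support] < v.max() - eps - 1e-12):
            return False
    return True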
Fixing an equilibrium concept (NE, ǫ-NE, or ǫ-WSNE), a Bayesian game (A, λ), and a signaling scheme ϕ : Θ → Σ, an equilibrium selection rule distinguishes an equilibrium strategy profile (x^σ_1, . . . , x^σ_k) to be played in each subgame σ — we call the tuple X = {x^σ_i : σ ∈ Σ, i ∈ [k]} a Bayesian equilibrium of the game (A, λ) with signaling scheme ϕ. Together with the prior λ, the Bayesian equilibrium X induces a distribution Γ ∈ ∆_{Θ×[n]^k} over states of play — we refer to Γ as a distribution of play. This is analogous to implementation of allocation rules in traditional mechanism design.

Our results concern objectives which depend only on the state of play, and we seek to maximize the objective in expectation over the distribution of play. These include, but are not restricted to, the social welfare of the players, as well as weighted combinations of player utilities. Formally, our objective is described by a family of tensors F^θ : [n]^k → [−1, 1], one for each state of nature θ ∈ Θ. Equivalently, we may think of the objective as describing the payoffs of an additional player in the game — namely the principal. For a distribution µ over states of nature, we use F^µ = E_{θ∼µ} F^θ to denote the principal's expected utility in a subgame with posterior beliefs µ, as a function of the players' strategies.

For a signaling scheme ϕ and associated (approximate) equilibria X = {x^σ_i : σ ∈ Σ, i ∈ [k]}, our objective function can be written as F(ϕ, X) = E_{θ∼λ} E_{σ∼ϕ(θ)} E_{s⃗∼x^σ}[F(θ, s⃗)]. When ϕ corresponds to a convex decomposition {(µ_σ, ν_σ)}_{σ∈Σ} of the prior distribution, this can be equivalently written as F(ϕ, X) = Σ_{σ∈Σ} ν_σ F^{µ_σ}(x^σ). Let OPT = OPT(A, λ, F) denote the maximum of F(ϕ*, X*) over signaling schemes ϕ* and (exact) Nash equilibria X*. We seek a signaling scheme ϕ : Θ → Σ, as well as a Bayesian ǫ-NE (or ǫ-WSNE) X, such that F(ϕ, X) ≥ OPT − ǫ.

We will use the following lemma, which follows easily from the results of Lipton et al. [LMM03], to restrict attention to equilibria with small support.

Lemma 7.3. Let tensors A_1, . . . , A_k : [n]^k → [−1, 1] describe a k-player game of complete information with n pure strategies per player, and let F : [n]^k → [−1, 1] be a tensor describing an objective function on mixed strategies. Define the function r(ǫ) = Θ((k + 1) ln((k + 1)n)/ǫ²). For each ǫ > 0, integer s ≥ r(ǫ), and mixed strategy profile x = (x_1, . . . , x_k), there is a profile x̃ = (x̃_1, . . . , x̃_k) of s-uniform mixed strategies such that |A_i(x) − A_i(x̃)| ≤ ǫ for all players i, |F(x) − F(x̃)| ≤ ǫ, and if x is a Nash equilibrium of A then x̃ is an ǫ-equilibrium of A. This holds for both NE and WSNE.

Proof. We can think of the tensor F : [n]^k → [−1, 1] as describing the utility of an additional player in the game with a trivial strategy set. The rest follows from [LMM03, Theorem 2].
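The existence argument behind Lemma 7.3 is the standard sparsification idea of [LMM03]: replace each x_i by the empirical distribution of s independent samples from it. A minimal Python sketch of this sampling step (illustrative only; the parameter choices and names are ours) is:

import numpy as np

def s_uniform_profile(xs, s, seed=0):
    # each mixed strategy is replaced by the empirical distribution of s i.i.d. samples,
    # so every probability in the returned profile is a multiple of 1/s
    rng = np.random.default_rng(seed)
    return [rng.multinomial(s, x) / s for x in xs]

With s at least r(ǫ), a Hoeffding and union-bound argument shows that, with positive probability, the sampled profile changes every player's payoff, the objective F, and every deviation payoff by at most ǫ, which is exactly what the lemma asserts.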
We prove the following bi-criteria result.
Theorem 7.4.
Let ǫ > 0 denote an approximation parameter, let (A, λ) be a Bayesian normal form game with k = O(1) players, n actions, and m states of nature, and let F : [m] × [n]^k → [−1, 1] be an objective function given as a tensor. There is an algorithm with runtime poly(m^{ln(n/ǫ)/ǫ²}, n^{ln n/ǫ²}) which outputs a signaling scheme ϕ and corresponding Bayesian ǫ-equilibria X satisfying F(ϕ, X) ≥ OPT(A, λ, F) − ǫ. This holds for both approximate NE and approximate WSNE.

In other words, when the number of players is a constant we can in quasi-polynomial time approximate the optimal reward from signaling while losing an additive ǫ in the objective as well as in the incentive constraints, as compared to the optimal signaling scheme / Nash equilibrium combination.
Fix ǫ > 0. To prove this theorem, we define functions g and g_ǫ which each take as input a k-player n-action game of complete information B, given as payoff tensors B_1, . . . , B_k : [n]^k → [−1, 1], as well as an objective tensor G : [n]^k → [−1, 1], and output a number in [−1, 1]: g(B, G) = max{G(x) : x ∈ EQ(B)} and g_ǫ(B, G) = max{G(x) : x ∈ EQ_ǫ(B)}, where EQ(B) denotes the set of Nash equilibria of the game B, and EQ_ǫ(B) denotes the (non-empty) set of ⌈r(ǫ/4)⌉-uniform ǫ-Nash equilibria (or ǫ-WSNE), for r as given in Lemma 7.3. Recall that G(x) denotes evaluating the multilinear map described by tensor G at the mixed strategy profile x ∈ ∆_n^k.

Now suppose we fix a Bayesian game (A, λ) and objective tensor F as in the statement of Theorem 7.4. For a subgame with a posterior distribution µ ∈ ∆_m over states of nature, the principal's expected utility at the "best" Nash equilibrium of this subgame can be written as g(A^µ, F^µ). Similarly, the principal's expected utility at the "best" ⌈r(ǫ/4)⌉-uniform ǫ-NE (or ǫ-WSNE) can be written as g_ǫ(A^µ, F^µ). Observe that the input to both g and g_ǫ is a linear function of µ, as needed to apply the results in Section 4. For a signaling scheme ϕ corresponding to a decomposition λ = Σ_{σ∈Σ} ν_σ · µ_σ of the prior distribution λ into posterior distributions (see Section 4.1), we can write the principal's expected utility assuming the "best" Nash equilibrium as F(ϕ) = Σ_{σ∈Σ} ν_σ g(A^{µ_σ}, F^{µ_σ}), and assuming the "best" ⌈r(ǫ/4)⌉-uniform ǫ-equilibrium as F_ǫ(ϕ) = Σ_{σ∈Σ} ν_σ g_ǫ(A^{µ_σ}, F^{µ_σ}). We use OPT to denote the maximum value of F over all signaling schemes.

We prove Theorem 7.4 by exhibiting an algorithm for computing a signaling scheme ϕ such that F_ǫ(ϕ) ≥ OPT − ǫ. The proof hinges on two main lemmas.

Lemma 7.5.
The function g is 2(k + 1)n^k-stable.

Proof. As noted in Section 2, any function mapping the hypercube [−1, 1]^N to the interval [−1, 1] is 2N-stable. The function g is such a function with N = (k + 1)n^k.

Lemma 7.6.
The function g_ǫ is an (ǫ/4, ǫ/2)-relaxation of g.

Proof. Consider tensors G, G̃ : [n]^k → [−1, 1] with |G(s) − G̃(s)| ≤ ǫ/4 for all s ∈ [n]^k, and two k-player n-action games B = (B_1, . . . , B_k) and B̃ = (B̃_1, . . . , B̃_k) with |B_i(s) − B̃_i(s)| ≤ ǫ/4 for all i and all s ∈ [n]^k. It suffices to show that g_ǫ(B̃, G̃) ≥ g(B, G) − ǫ/2. Let x ∈ ∆_n^k be a Nash equilibrium of B for which G(x) = g(B, G). By Lemma 7.3, there is a profile x̃ of ⌈r(ǫ/4)⌉-uniform mixed strategies such that x̃ is an ǫ/4-equilibrium of B, and G(x̃) ≥ G(x) − ǫ/4. Since B̃ differs from B by at most ǫ/4 in each entry, x̃ is an ǫ-equilibrium of B̃, i.e., x̃ ∈ EQ_ǫ(B̃). Similarly, since G̃ differs from G by at most ǫ/4 in each entry, G̃(x̃) ≥ G(x̃) − ǫ/4 ≥ G(x) − ǫ/2. We conclude that g_ǫ(B̃, G̃) ≥ G̃(x̃) ≥ g(B, G) − ǫ/2.

We now conclude the proof of Theorem 7.4 by applying Theorem 4.3 with g, h = g_ǫ, and α = ǫ/((k + 1)n^k). The runtime is poly(m^{ln(1/α)/ǫ²}, (k + 1)n^k, T), where T is the time needed to evaluate g_ǫ (and compute the corresponding ⌈r(ǫ/4)⌉-uniform ǫ-equilibrium) on a given input. Recall that k = O(1) and α = ǫ/poly(n). Moreover, using brute-force enumeration of all ⌈r(ǫ/4)⌉-uniform mixed strategy profiles, we conclude that T is bounded by a polynomial in n^{ln n/ǫ²}. Therefore our total runtime is poly(m^{ln(n/ǫ)/ǫ²}, n^{ln n/ǫ²}), as needed.
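The brute-force evaluation of g_ǫ referred to above can be sketched as follows (illustrative Python only; it is meant to exhibit the n^{O(s)} enumeration behind the bound on T, not to be practical, and it takes the ǫ-WSNE test from earlier in this section as a parameter).

import itertools
import numpy as np

def s_uniform_strategies(n, s):
    # all distributions on [n] whose entries are multiples of 1/s
    for combo in itertools.combinations_with_replacement(range(n), s):
        x = np.zeros(n)
        for a in combo:
            x[a] += 1.0 / s
        yield x

def g_eps(B, G, s, eps, equilibrium_test):
    # max of the objective tensor G over all s-uniform profiles passing equilibrium_test(B, xs, eps)
    n, k = B[0].shape[0], len(B)
    strategies = list(s_uniform_strategies(n, s))
    best = -np.inf
    for xs in itertools.product(strategies, repeat=k):
        if equilibrium_test(B, list(xs), eps):
            val = np.asarray(G, dtype=float)
            for x in reversed(xs):
                val = val @ x
            best = max(best, float(val))
    return best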
Remarks

In the special case of two-player zero-sum games and a principal interested in maximizing one player's utility, as studied in [Dug14], our techniques lead to a more efficient approximation scheme and a uni-criteria guarantee. This is because the principal's payoff tensor G equals the payoff tensor B_1 of one of the players (say, player 1), and consequently the function g(B, G) = g(B, B_1) = max_x min_y x^⊺ B_1 y is n-stable and 2-Lipschitz. Its Lipschitz continuity follows from the fact that an ǫ-equilibrium of a zero-sum game leads to utilities within ǫ of the equilibrium utilities. Moreover, evaluating g now takes time T = poly(m, n). Theorem 4.2, instantiated with α = ǫ/n and δ = ǫ/4, leads to an algorithm with runtime poly(m^{ln(n/ǫ)/ǫ²}, n) which outputs a signaling scheme ϕ and corresponding Bayesian (exact) Nash equilibria X satisfying F(ϕ, X) ≥ OPT(A, λ, F) − ǫ.
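For the zero-sum special case just discussed, evaluating g(B, B_1) = max_x min_y x^⊺ B_1 y amounts to solving a single linear program. A minimal sketch (our own illustration, using scipy's LP solver) is:

import numpy as np
from scipy.optimize import linprog

def zero_sum_value(B1):
    # value of the zero-sum game with row-player payoff matrix B1:
    # maximize v subject to (x^T B1)_j >= v for every column j, with x in the simplex
    n_rows, n_cols = B1.shape
    c = np.zeros(n_rows + 1); c[-1] = -1.0                       # minimize -v
    A_ub = np.hstack([-B1.T, np.ones((n_cols, 1))])              # v - (B1^T x)_j <= 0
    b_ub = np.zeros(n_cols)
    A_eq = np.hstack([np.ones((1, n_rows)), np.zeros((1, 1))])   # sum_i x_i = 1
    b_eq = np.array([1.0])
    bounds = [(0, None)] * n_rows + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return float(res.x[-1])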
8 Hardness Results

In this section, we present hardness results which justify our assumptions, and exhibit the limitations of our techniques. Specifically, we show in Sections 8.1 and 8.2 that neither stability nor Lipschitz continuity alone suffices for an additive PTAS. In Section 8.3, we show that even in the presence of Lipschitz continuity and noise stability, obtaining an additive FPTAS would imply P = NP.

8.1 NP-Hardness in the Absence of Lipschitz Continuity
We now show that stability alone does not suffice for an additive PTAS for mixture selection, in general. First, we show that mixture selection for the 1-stable function g^{(vote-sum)}, presented in Section 6, does not admit a (uni-criteria) additive PTAS unless P = NP. Since g^{(vote-sum)} is not continuous in any metric, we drive the point home by exhibiting a "smoothed" function g^{(slope)} which is 1-stable and O(1)-Lipschitz with respect to L_1, but not O(1)-Lipschitz with respect to L_∞, and show that mixture selection for g^{(slope)} still does not admit an additive PTAS unless P = NP.

Both NP-hardness results share a similar reduction from the maximum independent set problem. We use a consequence of the result of [KS12], namely that there exists a constant ǫ such that it is NP-hard to approximate maximum independent set to within an additive error of ǫn, where n denotes the number of vertices.

Given an n-node undirected graph G, let OPT_IS = OPT_IS(G) be the size of its largest independent set. We define the n × n matrix A = A(G) as follows:

- Diagonal entries of A are all 1 − 1/n² (A_{i,i} = 1 − 1/n² for all 1 ≤ i ≤ n).
- When vertices i and j share an edge in G, both A_{i,j} and A_{j,i} are −1.
- All remaining entries of A, namely A_{i,j} for non-adjacent distinct vertices i and j, are −1/n².
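The construction can be made concrete in a few lines of Python (illustrative only; the matrix entries follow the definition just given, and the helper names are ours):

import numpy as np

def reduction_matrix(adj):
    # A(G) from a 0/1 adjacency matrix: diagonal 1 - 1/n^2, edges -1, non-edges -1/n^2
    n = adj.shape[0]
    A = np.full((n, n), -1.0 / n**2)
    A[adj == 1] = -1.0
    np.fill_diagonal(A, 1.0 - 1.0 / n**2)
    return A

def indicator_mixture(ind_set, n):
    # normalized indicator vector of an independent set, as used in Observation 8.1 below
    x = np.zeros(n)
    x[list(ind_set)] = 1.0 / len(ind_set)
    return x

adj = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]])    # an edge {0,1} plus an isolated vertex
A = reduction_matrix(adj)
print(A @ indicator_mixture({1, 2}, 3) >= 0)          # nonnegative exactly on the independent set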
We relate OPT_IS to convex combinations of the columns of A as follows.

Observation 8.1. Let I be an independent set of G with |I| = k. There exists x ∈ ∆_n such that k entries of Ax are at least 1/(2n), and all remaining entries are strictly negative.

Proof. Let x ∈ ∆_n be the normalized indicator vector of I — i.e., x_i = 1/k if i ∈ I and x_i = 0 otherwise. By construction, (Ax)_i = (1/k)(1 − 1/n² − (k − 1)/n²) = (1/k)(1 − k/n²) ≥ 1/(2n) whenever i ∈ I, and (Ax)_i ≤ −1/n² otherwise.

Observation 8.2. For any x ∈ ∆_n, the nonnegative entries of Ax correspond to an independent set of G. Consequently, Ax can have at most OPT_IS nonnegative entries.

Proof. Let t = Ax. Consider an edge {i, j} of the graph G, and without loss of generality assume that x_i ≥ x_j. If x_i = 0, then x_j = 0 as well, and t_i ≤ −1/n² < 0, since every entry of row i other than A_{i,i} is at most −1/n² and x places all of its mass on coordinates other than i; the same holds for t_j. Otherwise x_i > 0, and t_j ≤ (1 − 1/n²)x_j − x_i ≤ −x_i/n² < 0. Therefore, t_i and t_j cannot both be nonnegative. We conclude that the nonnegative coordinates of t correspond to an independent set of G.

Observations 8.1 and 8.2 imply that max_{x∈∆_n} g^{(vote-sum)}(Ax) = OPT_IS/n. Combined with the fact that obtaining an additive PTAS for the maximum independent set problem is NP-hard, we get the following theorem.

Theorem 8.3.
Mixture selection for the 1-stable function g^{(vote-sum)} admits no additive PTAS unless P = NP.

Noting that g^{(vote-sum)} is a discontinuous function, for emphasis we exhibit a function g^{(slope)} which is Lipschitz continuous in L_1 (but not in L_∞) and 1-noise stable, but for which the same impossibility result holds by an identical reduction. Informally, g^{(slope)} "smoothes" the threshold behavior of g^{(vote-sum)} as follows: each input t_i contributes 0 to g^{(slope)}(t) when t_i ≤ 0, contributes 1/n when t_i ≥ 1/(4n), and the contribution is a linear function of t_i, increasing from 0 to 1/n, for t_i ∈ [0, 1/(4n)]. Formally, we define g^{(slope)}(t) = Σ_{i=1}^n min{max{0, 4t_i}, 1/n}. Since each entry of t contributes at most 1/n to g^{(slope)}(t), it is easy to verify that g^{(slope)} is 1-stable. Moreover, since the partial derivatives of g^{(slope)}(t) are upper-bounded by 4, it is 4-Lipschitz continuous with respect to the L_1 metric. Observations 8.1 and 8.2 imply that max_{x∈∆_n} g^{(slope)}(Ax) = OPT_IS/n, ruling out an additive PTAS for mixture selection for g^{(slope)}.

Theorem 8.4.
The function g^{(slope)} is 1-stable and O(1)-Lipschitz with respect to L_1, and yet mixture selection for g^{(slope)} admits no additive PTAS unless P = NP.

8.2 Hardness in the Absence of Noise Stability

We now present evidence that Lipschitz continuity alone does not suffice for a PTAS for mixture selection. Recalling that a quasipolynomial-time algorithm follows from our framework whenever a function is O(1)-Lipschitz, we reduce from the planted clique problem — for which a quasipolynomial-time algorithm exists, and yet a polynomial-time algorithm is conjectured not to exist — rather than from an NP-hard problem.

In the planted clique problem, one must distinguish the n-node Erdős–Rényi random graph G(n, 1/2), in which each edge is included independently with probability 1/2, from the graph G(n, 1/2, k) formed by "planting" a clique in G(n, 1/2) at a randomly (or, equivalently, adversarially) chosen set of k nodes. This problem was first considered by Jerrum [Jer92] and Kučera [Kuč95], and has been the subject of a large body of work since. A quasi-polynomial time algorithm exists when k ≥ 2 log n, and the best polynomial-time algorithms only succeed for k = Ω(√n) (see e.g. [AKS98, DGGP11, FR10, CO10]). Several papers suggest that the problem is hard for k = o(√n) by ruling out natural classes of algorithmic approaches (e.g. [Jer92, FK03, FGR+13]).

Assumption 8.5.
For some function k = k(n) satisfying k = ω(log² n) and k = o(√n), there is no probabilistic polynomial-time algorithm that can distinguish between a random graph drawn from G(n, 1/2) and a random graph drawn from G(n, 1/2, k) with success probability 1 − o(1).

We let k = k(n) be as in Assumption 8.5, and consider mixture selection for the function g^{(clique)}_k : [0, 1]^n → [0, 1] with g^{(clique)}_k(t) = t_{[k]} − t_{[k+1]} + t_{[n]}, where t_{[i]} denotes the i'th largest entry of the vector t. It is easy to verify that g^{(clique)}_k is 3-Lipschitz with respect to the L_∞ metric, yet is not O(1)-stable.
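For concreteness, g^{(clique)}_k is the following few lines of Python (illustrative only):

import numpy as np

def g_clique(t, k):
    # t_[k] - t_[k+1] + t_[n], where t_[i] denotes the i-th largest entry of t
    srt = np.sort(np.asarray(t, dtype=float))[::-1]
    return srt[k - 1] - srt[k] + srt[-1]

On the normalized indicator of a planted k-clique the top k entries of Ax are close to 1 while the remaining entries concentrate near 1/2, so g_clique returns a value close to 1; this is the calculation carried out in Lemma 8.7 below.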
We prove the following theorem.

Theorem 8.6. Assumption 8.5 implies that there is no additive PTAS for mixture selection for g^{(clique)}_k.

To prove Theorem 8.6, we show that max_{x∈∆_n} g^{(clique)}_k(Ax) is arbitrarily close to 1 with high probability when A is the adjacency matrix of G ∼ G(n, 1/2, k), and is bounded away from 1 with high probability when A is the adjacency matrix of G ∼ G(n, 1/2). For convenience, and without loss of generality, we assume that both random graphs include each self-loop with probability 1/2 — i.e., diagonal entries of the adjacency matrix A are independent uniform draws from {0, 1} in both cases. Our argument is captured by the following two lemmas.

Lemma 8.7. Fix a constant ǫ > 0. Let G ∼ G(n, 1/2, k), and let A be its adjacency matrix. With probability 1 − o(1), there exists an x ∈ ∆_n such that g^{(clique)}_k(Ax) ≥ 1 − ǫ.

Proof. Let C denote the vertices of the planted k-clique. We set x_i = 1/k if i ∈ C and 0 otherwise. Let t = Ax. For i ∈ C, t_i ≥ 1 − 1/k. On the other hand, all other entries of t concentrate around 1/2 with high probability. For i ∉ C, t_i is simply the average of k independent Bernoulli random variables, by the definition of G(n, 1/2, k); using Hoeffding's inequality, we bound the probability that t_i deviates from its expectation 1/2 by more than a constant δ > 0, to be chosen later:

Pr[ |t_i − 1/2| > δ ] ≤ 2e^{−2δ²k}.

By the union bound, t_i ∈ [1/2 − δ, 1/2 + δ] simultaneously for all i ∉ C with probability at least 1 − 2^{log n − Ω(k)} = 1 − o(1). Thus t_{[k+1]} − t_{[n]} ≤ 2δ and g^{(clique)}_k(t) = t_{[k]} − (t_{[k+1]} − t_{[n]}) ≥ 1 − 1/k − 2δ with probability 1 − o(1). Choosing δ = ǫ/3, we conclude that g^{(clique)}_k(t) ≥ 1 − ǫ with probability 1 − o(1).

Lemma 8.8.
Fix a constant ǫ > 0. Let G ∼ G(n, 1/2), and let A be its adjacency matrix. With probability 1 − o(1), g^{(clique)}_k(Ax) ≤ 3/4 + ǫ for all x ∈ ∆_n.

Proof. Recall that g^{(clique)}_k is O(1)-Lipschitz and — like any other function from the hypercube to the bounded interval — O(n)-stable. If there exists x* such that g^{(clique)}_k(Ax*) ≥ 3/4 + ǫ, then Theorem 2.5 implies that there is an integer s = O(log n) and an s-uniform vector x̃ such that g^{(clique)}_k(Ax̃) > 3/4. There are n^s such vectors. We next show that for an arbitrary fixed vector x ∈ ∆_n the probability that g^{(clique)}_k(Ax) > 3/4 is at most 2^{−Ω(k)}. This would complete the proof by the union bound, since 1 − n^s · 2^{−Ω(k)} = 1 − o(1).

Fix x ∈ ∆_n, and let t = Ax. Define D as the distribution supported on [0, 1] which is sampled as follows: draw a uniformly at random from {0, 1}^n, and output a · x. Since A is the adjacency matrix of G ∼ G(n, 1/2), each entry t_i of t can be viewed as an independent draw from D. We exploit a key property of D in our proof, namely the fact that D is symmetric about 1/2. Formally, we mean that Pr_D[r] = Pr_D[1 − r] for all r ∈ [0, 1] in the support of D.

Symmetry of D implies that Pr_{r∼D}[r ≥ 1/2] = Pr_{r∼D}[r ≤ 1/2] ≥ 1/2. Recalling that k = o(n) and that the entries of t are independent draws from D, the Chernoff bound implies that the following holds with probability at least 1 − 2^{−Ω(n)}:

t_{[n]} ≤ 1/2 ≤ t_{[k+1]}.    (8.1)

If g^{(clique)}_k(t) > 3/4, then the following two conditions must hold:

1. t_{[k]} > 3/4, and
2. t_{[k+1]} − t_{[n]} < 1/4.

Condition 1 implies that the k largest entries of t are all at least 3/4. Furthermore, unless Inequality (8.1) is violated — an event with small probability 2^{−Ω(n)} — Condition 2 implies that the remaining entries of t are all strictly between 1/4 and 3/4. Let p denote Pr_{r∼D}[r ≤ 1/4], also equal to Pr_{r∼D}[r ≥ 3/4] by symmetry of D. The probability that k entries of t are at least 3/4 and all remaining entries are in (1/4, 3/4) is at most (n choose k) p^k (1 − 2p)^{n−k}, which is maximized at p = k/(2n), with maximum value 2^{−Ω(k)}. In summary, the probability that g^{(clique)}_k(Ax) > 3/4 is at most 2^{−Ω(k)} + 2^{−Ω(n)} = 2^{−Ω(k)}, as needed.

8.3 No Additive FPTAS for Lottery Design

Our last hardness proof rules out an additive FPTAS for the lottery design problem, a.k.a. mixture selection for the function g^{(lottery)}, as defined in Section 3. We restrict attention to the weight vector w assigning equal weight to all inputs to g^{(lottery)}, and therefore omit references to the weight vector for the remainder of this section.

Theorem 8.9.
The lottery design problem, a.k.a. mixture selection for g^{(lottery)}, admits no additive FPTAS unless P = NP.

Proof. The proof involves a reduction from the independent set problem which is very similar to the reduction in Section 8.1, so we only detail the necessary modifications. Given an n-node undirected graph G, we define an n × n matrix A = A(G) as in Section 8.1, though shifted and normalized so that its entries lie in [0, 1]: applying the map v ↦ (1 + v)/2 to every entry, diagonal entries become 1 − 1/(2n²), entries corresponding to edges become 0, and entries corresponding to non-adjacent distinct vertices become 1/2 − 1/(2n²). Observation 8.2 implies that for every x ∈ ∆_n, entries of Ax which are at least 1/2 correspond to an independent set of G. Observation 8.1 implies that there is a vector x* ∈ ∆_n so that Ax* has exactly OPT_IS(G) entries no less than p* := 1/2 + 1/(4n).

Our input to the lottery design problem will be a valuation matrix B with 8n² + n rows and n columns, obtained from A by adding 8n² "dummy" rows, each of which is (p*, . . . , p*). Setting a price of p* for the lottery x* results in expected revenue

r* := p* · (8n² + OPT_IS)/(8n² + n) = (1/2 + 1/(4n)) · (8n² + OPT_IS)/(8n² + n) > 1/2.

Therefore, r* lower-bounds max_{x∈∆_n} g^{(lottery)}(Bx). We claim that r* is in fact the maximum value of this mixture selection problem, and prove this by fixing an arbitrary lottery x ∈ ∆_n and conducting a simple case analysis on the associated price p:

- When p* < p ≤ 1, none of the "dummy" types purchase the lottery x, resulting in expected revenue at most n/(8n² + n) < 1/2 < r*.
- When 1/2 ≤ p ≤ p*, all dummy types purchase x, and the i'th non-dummy type purchases x only if (Ax)_i ≥ p ≥ 1/2. Since entries of Ax which are no less than 1/2 correspond to an independent set of G, at most 8n² + OPT_IS types purchase the lottery x, resulting in expected revenue at most p · (8n² + OPT_IS)/(8n² + n) ≤ r*.
- When 0 ≤ p < 1/2, the best case scenario is that all buyer types purchase the lottery at price p, yielding expected revenue at most p < 1/2 < r*.

Recalling that there exists a constant ǫ > 0 for which the maximum independent set cannot be approximated to within an additive error of ǫn in polynomial time unless P = NP, we conclude that mixture selection for g^{(lottery)} does not admit a polynomial-time additive ǫ'-approximation algorithm for any ǫ' = ǫ'(n) ≤ (1/2 + 1/(4n)) · ǫ/(8n + 1). This rules out an additive FPTAS.
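The objective being analyzed in this reduction is easy to evaluate directly. A minimal Python sketch (with uniform weights, as assumed in this section; the naive price search over the realized values is our own illustration) is:

import numpy as np

def revenue_at_price(values, p):
    # expected revenue from posting price p when buyer type i values the lottery at values[i]
    return p * float(np.mean(values >= p))

def g_lottery(B, x):
    # g^(lottery)(Bx): an optimal posted price can be taken among the realized values Bx
    values = B @ x
    return max(revenue_at_price(values, p) for p in values)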
9 Connection to Boolean Function Analysis
In Definition 2.2, stability of a function from the solid hypercube to the bounded interval was defined as robustness to corruption patterns which follow a light distribution. In this section, we exhibit a closely-related algebraic notion of stability, based on Fourier analysis of Boolean functions. Instead of using light distributions directly, we describe the effects of corruption on a function g at an input t in the solid hypercube by a Boolean function h_t. In particular, h_t takes as input a vector z ∈ {−1, 1}^n, interprets z_i = 1 as forbidding corruption of input t_i and z_i = −1 as permitting corruption of t_i, and considers the worst-case such corruption of t. Formally, denoting I_+(z) := {i ∈ [n] : z_i = 1} and I_−(z) := {i ∈ [n] : z_i = −1}, we define Boolean extensions of g at an input t.

Definition 9.1 (Boolean extension). Let g : [−1, 1]^n → [−1, 1], and let t ∈ [−1, 1]^n. A Boolean function h_t : {−1, 1}^n → [−1, 1] is called a Boolean extension of g at t if it satisfies:

h_t(1) = g(t)    (9.1)
h_t(z) ≤ min{ g(t′) : t′ ≈_{I_−(z)} t },    (9.2)

where t′ ≈_{I_−(z)} t means that t′ agrees with t on all coordinates outside I_−(z). Next we define algebraic stability for g, using the notion of the Fourier transform of a Boolean function (e.g., [O'D14]).

Definition 9.2 (Algebraic Stability). A function g : [−1, 1]^n → [−1, 1] is algebraically k-stable if for every t ∈ [−1, 1]^n there exists a Boolean extension h_t(z) at t such that:

ĥ_t(S) ≥ 0 for all S ⊆ [n]    (9.3)
ĥ_t(S) = 0 for all S such that |S| > k,    (9.4)

where ĥ_t(S) is the Fourier coefficient of h_t at S ⊆ [n].

In the parlance of Boolean function analysis, g is algebraically stable if for all t, the Fourier spectrum of h_t is both nonnegative and low-degree. We prove the following analogue of Theorem 2.5.

Theorem 9.3.
Let g : [−1, 1]^n → [−1, 1] be algebraically k-stable and c-Lipschitz in L_∞, let A be an n × m matrix with entries in [−1, 1], let δ, ǫ > 0, and let s ≥ (2/δ²) ln(4k/ǫ) be an integer. Fix a vector x ∈ ∆_m, and let the random variable x̃ denote the empirical distribution of s i.i.d. samples from the probability distribution x. The following then holds:

E_{x̃}[ g(Ax̃) ] ≥ g(Ax) − ǫ − cδ.    (9.5)

Proof sketch.
Let t = Ax. Consider the random variable t̃ = Ax̃, where x̃ is the (random) empirical distribution. Let E_i denote the event that |t̃_i − t_i| ≤ δ, i.e., the event that coordinate t_i is approximately preserved. Define a Boolean function p_D : {−1, 1}^n → [0, 1] by

p_D(z) := Pr_{x̃}[ (∩_{i∈I_+(z)} E_i) ∩ (∩_{i∈I_−(z)} Ē_i) ].    (9.6)

In particular, p_D(z) is the probability that only the coordinates in I_+(z) are approximately preserved. It is easy to see that Σ_{z∈{−1,1}^n} p_D(z) = 1 and Pr[E_i] = Pr[ |t̃_i − t_i| ≤ δ ] = Σ_{z : z_i = 1} p_D(z).

Next, we state some standard equalities for Boolean functions. Let 1 denote the all-one n-dimensional vector (1, 1, . . . , 1),
and let h_t be a Boolean extension of g at t as stated in Definition 9.2. Then

Σ_{z∈{−1,1}^n} h_t(z) · p_D(z) = 2^n Σ_{S⊆[n]} p̂_D(S) · ĥ_t(S)    (9.7)
Σ_{S⊆[n]} ĥ_t(S) = h_t(1).    (9.8)

By Definition 9.1, h_t(1) = g(t). Since g is c-Lipschitz in L_∞, we know:

E_{x̃}[ g(Ax̃) ] = E_{t̃}[ g(t̃) ] ≥ Σ_{z∈{−1,1}^n} h_t(z) · p_D(z) − cδ,    (9.9)

so it suffices to prove Σ_{S⊆[n]} ĥ_t(S) − 2^n Σ_{S⊆[n]} p̂_D(S) · ĥ_t(S) ≤ ǫ, which can be rewritten as:

Σ_{S : |S| ≤ k} (1 − 2^n p̂_D(S)) ĥ_t(S) + Σ_{S : |S| > k} (1 − 2^n p̂_D(S)) ĥ_t(S) ≤ ǫ.    (9.10)

As the latter term in (9.10) is zero by Definition 9.2, we only need to upper bound Σ_{S : |S| ≤ k} (1 − 2^n p̂_D(S)) ĥ_t(S). Using a simple union-bound argument, one can prove that with sample size s ≥ (2/δ²) ln(4k/ǫ), we have p̂_D(S) ≥ (1 − ǫ)/2^n for all S ⊆ [n] with |S| ≤ k. As ĥ_t(S) ≥ 0 for all S with |S| ≤ k, we have Σ_{S : |S| ≤ k} (1 − 2^n p̂_D(S)) ĥ_t(S) ≤ ǫ Σ_{S⊆[n]} ĥ_t(S) ≤ ǫ. Combined with equation (9.9), we have E[ g(Ax̃) ] ≥ g(t) − ǫ − cδ.

We observe that algebraic stability holds for the objective functions in some of our applications, and leave open whether it holds for all. Specifically, the relevant functions in lottery design, revenue-maximizing signaling, and one of our two voting problems are algebraically O(1)-stable, and therefore an additive PTAS for each of those applications follows from Theorem 9.3 just as it did from Theorem 2.5. We list the Boolean extension associated with each of these applications.

Lemma 9.4 (Lottery design). For a fixed price p, a Boolean extension of the objective function g^{(lottery)}_{w,p}(t) := p · Σ_{i=1}^n (w_i · I[t_i ≥ p]) at t is h^{(lottery)}_t(z) = p · Σ_{i∈[n]} w_i · ((z_i + 1)/2) · I[t_i ≥ p]. We enumerate over all prices p, up to a suitable discretization, in the algorithmic solution.

Lemma 9.5 (Persuasion in voting). A Boolean extension of g^{(vote-sum)} at t is h^{(vote-sum)}_t(z) = (1/n) Σ_{i∈[n]} I[t_i ≥ 0] · ((z_i + 1)/2).

Lemma 9.6 (Revenue maximization). A Boolean extension of the function max2 : [0, 1]^n → [0, 1] at t is h^{(max2)}_t(z) = t_j · ((z_i + 1)/2) · ((z_j + 1)/2), where i and j denote the indices of the largest and second-largest entries of t, respectively.
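The spectral conditions (9.3) and (9.4) can be checked by brute force for small n. The following Python sketch (illustrative only) computes the Fourier coefficients of the vote-sum extension from Lemma 9.5 and confirms that they are nonnegative and supported on sets of size at most one.

import itertools
import numpy as np

def h_vote_sum(t, z):
    # Boolean extension of g^(vote-sum) at t: kept coordinates (z_i = 1) contribute if t_i >= 0
    n = len(t)
    return sum((1.0 if t[i] >= 0 else 0.0) * (z[i] + 1) / 2 for i in range(n)) / n

def fourier_coefficients(h, n):
    cube = list(itertools.product([-1, 1], repeat=n))
    coeffs = {}
    for r in range(n + 1):
        for S in itertools.combinations(range(n), r):
            chi = lambda z, S=S: np.prod([z[i] for i in S]) if S else 1.0
            coeffs[S] = sum(h(z) * chi(z) for z in cube) / 2 ** n
    return coeffs

t = [0.4, -0.2, 0.1]
coeffs = fourier_coefficients(lambda z: h_vote_sum(t, z), len(t))
assert all(v >= -1e-12 for v in coeffs.values())                      # condition (9.3)
assert all(abs(v) < 1e-12 for S, v in coeffs.items() if len(S) > 1)   # condition (9.4) with k = 1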
References

[AAK+07] Noga Alon, Alexandr Andoni, Tali Kaufman, Kevin Matulef, Ronitt Rubinfeld, and Ning Xie. Testing k-wise and almost k-wise independence. In Proceedings of the 39th Annual ACM Symposium on Theory of Computing (STOC), pages 496–505, 2007.

[AC14] Ricardo Alonso and Odilon Câmara. Persuading voters. Manuscript, 2014.

[AKS98] Noga Alon, Michael Krivelevich, and Benny Sudakov. Finding a large hidden clique in a random graph. In Proceedings of the 9th ACM Symposium on Discrete Algorithms (SODA), pages 594–598. Society for Industrial and Applied Mathematics, 1998.

[Bar15] Siddharth Barman. Approximating Nash equilibria and dense bipartite subgraphs via an approximate version of Caratheodory's theorem. In Proceedings of the 47th Annual ACM Symposium on Theory of Computing (STOC), pages 361–369, 2015.

[BMS12] Peter Bro Miltersen and Or Sheffet. Send mixed signals: earn more, work less. In Proceedings of the 13th ACM Conference on Electronic Commerce (EC), pages 234–247, 2012.

[CO10] Amin Coja-Oghlan. Graph partitioning via adaptive spectral techniques. Combinatorics, Probability & Computing, 19(2):227, 2010.

[DDD+14] Constantinos Daskalakis, Anindya De, Ilias Diakonikolas, Ankur Moitra, and Rocco A. Servedio. A polynomial-time approximation scheme for fault-tolerant distributed storage. In Proceedings of the 25th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 628–644, 2014.

[DGGP11] Yael Dekel, Ori Gurel-Gurevich, and Yuval Peres. Finding hidden cliques in linear time with high probability. In ANALCO, pages 67–75. SIAM, 2011.

[DHN14] Shaddin Dughmi, Li Han, and Noam Nisan. Sampling and representation complexity of revenue maximization. In Proceedings of the 10th Conference on Web and Internet Economics (WINE), 2014.

[DIR14] Shaddin Dughmi, Nicole Immorlica, and Aaron Roth. Constrained signaling in auction design. In Proceedings of the 25th ACM Symposium on Discrete Algorithms (SODA), 2014.

[Dug14] Shaddin Dughmi. On the hardness of signaling. In Proceedings of the 55th IEEE Symposium on Foundations of Computer Science (FOCS), 2014.

[EFG+12] Yuval Emek, Michal Feldman, Iftah Gamzu, Renato Paes Leme, and Moshe Tennenholtz. Signaling schemes for revenue maximization. In Proceedings of the 13th ACM Conference on Electronic Commerce (EC), pages 514–531, 2012.

[FGR+13] Vitaly Feldman, Elena Grigorescu, Lev Reyzin, Santosh Vempala, and Ying Xiao. Statistical algorithms and a lower bound for detecting planted cliques. In Proceedings of the 45th Annual ACM Symposium on Theory of Computing (STOC), pages 655–664. ACM, 2013.

[FK03] Uriel Feige and Robert Krauthgamer. The probable value of the Lovász–Schrijver relaxations for maximum independent set. SIAM Journal on Computing, 32(2):345–370, 2003.

[FR10] Uriel Feige and Dorit Ron. Finding hidden cliques in linear time. DMTCS Proceedings, (01):189–204, 2010.

[GD13] Mingyu Guo and Argyrios Deligkas. Revenue maximization via hiding item attributes. In Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI), pages 157–163. AAAI Press, 2013.

[GNS07] A. Ghosh, H. Nazerzadeh, and M. Sundararajan. Computing optimal bundles for sponsored search. In Workshop on Internet and Network Economics (WINE), 2007.

[HK11] Elad Hazan and Robert Krauthgamer. How hard is it to approximate the best Nash equilibrium? SIAM Journal on Computing, 40(1):79–91, 2011.

[Jer92] Mark Jerrum. Large cliques elude the Metropolis process. Random Structures & Algorithms, 3(4):347–359, 1992.

[JP00] Ari Juels and Marcus Peinado. Hiding cliques for cryptographic security. Designs, Codes and Cryptography, 20(3):269–280, 2000.

[KG09] Emir Kamenica and Matthew Gentzkow. Bayesian persuasion. Technical report, National Bureau of Economic Research, 2009.

[KKL88] Jeff Kahn, Gil Kalai, and Nathan Linial. The influence of variables on Boolean functions. In Proceedings of the 29th Annual Symposium on Foundations of Computer Science (SFCS), pages 68–80, 1988.

[KS12] S. Khot and R. Saket. Hardness of finding independent sets in almost q-colorable graphs. In Proceedings of the 53rd IEEE Symposium on Foundations of Computer Science (FOCS), pages 380–389, 2012.

[Kuč95] Luděk Kučera. Expected complexity of graph partitioning problems. Discrete Applied Mathematics, 57(2):193–212, 1995.

[LMM03] Richard Lipton, Evangelos Markakis, and Aranyak Mehta. Playing large games using simple strategies. In Proceedings of the 4th ACM Conference on Electronic Commerce (EC), pages 36–41, 2003.

[MOO10] Elchanan Mossel, Ryan O'Donnell, and Krzysztof Oleszkiewicz. Noise stability of functions with low influences: Invariance and optimality. Annals of Mathematics, 171(1):295–341, 2010.

[MV09] Lorenz Minder and Dan Vilenchik. Small clique detection and approximate Nash equilibria. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pages 673–685. Springer, 2009.

[O'D14] Ryan O'Donnell. Analysis of Boolean Functions. Cambridge University Press, 2014.