Competing Prediction Algorithms
Omer Ben-Porat
Technion - Israel Institute of Technology, Haifa 32000, Israel. [email protected]
Moshe Tennenholtz
Technion - Israel Institute of Technology, Haifa 32000, Israel. [email protected]
Remark
An updated and significantly improved version of this paper was published in Economics and Computation 2019 under the name “Regression Equilibrium”, and is publicly available here: https://arxiv.org/abs/1905.02576 . Please refer to that version.
Abstract
Prediction is a well-studied machine learning task, and prediction algorithms are core ingredients in online products and services. Despite their centrality in the competition between online companies who offer prediction-based products, the strategic use of prediction algorithms remains unexplored. The goal of this paper is to examine strategic use of prediction algorithms. We introduce a novel game-theoretic setting that is based on the PAC learning framework, where each player (aka a prediction algorithm in competition) seeks to maximize the sum of points for which it produces an accurate prediction and the others do not. We show that algorithms aiming at generalization may wittingly mispredict some points to perform better than others in expectation. We analyze the empirical game, i.e. the game induced on a given sample, prove that it always possesses a pure Nash equilibrium, and show that every better-response learning process converges. Moreover, our learning-theoretic analysis suggests that players can, with high probability, learn an approximate pure Nash equilibrium for the whole population using a small number of samples.
Introduction

Prediction plays an important role in twenty-first century economics. An important example is the way online retailers advertise services and products tailored to each individual's predicted taste. Companies collect massive amounts of data and employ sophisticated machine learning algorithms to discover patterns and seek connections between different user groups. A company can offer customized products, relying on user properties and past interactions, to outperform the one-size-fits-all approach. For instance, after examining a sufficient number of users and the articles they read, media websites promote future articles predicted as having a high probability of satisfying a particular user.

For revenue-seeking companies, prediction is another tool that can be exploited to increase revenue. When companies’ products are alike, the chance that a user will select the product of a particular company decreases. In this case a company may purposely avoid offering the user this product, and offer an alternative one instead, in order to maximize the chances of having its product selected. Despite the intuitive clarity of the tradeoff above, and the enormous amount of work done on prediction in the machine learning and statistical learning communities, far too little attention has been paid to the study of prediction in the context of competition.

In this paper we introduce what is, to the best of our knowledge, a first-ever attempt to study how the selection of prediction algorithms is affected by strategic behavior in a competitive setting, using a game-theoretic lens. We consider a space of users, where each user is modeled as a triplet (x, y, t) of an instance, a label and a threshold, respectively. A user’s instance is a real vector that encodes his properties; the label is associated with his taste, and the threshold is the “distance” he is willing to accept between a proposed product and his taste.
Namely, the user associated with (x, y, t) embraces a customized product f(x) if ∣f(x) − y∣ is at most t. In such a case, the user is satisfied and willing to adopt the product. If a user is satisfied with several products (of several companies), he selects one uniformly at random. Indeed, the user model we adopt is aligned with the celebrated “satisficing” principle of Simon [18], and with other widely-accepted models in the literature on choice prediction, e.g. the model of selection based on small samples [3, 7].

Several players are equipped with infinite strategy spaces, or hypothesis classes in learning-theoretic terminology. A player’s strategy space models the possible predictive functions she can employ. Players compete for the users, and a player’s payoff is the expected number of users who select her offer. To model uncertainty w.r.t. the users’ taste, we use the PAC-learning framework of Valiant [19]. We assume the user distribution is unknown, but the players have access to a sequence of examples, containing instances, labels and thresholds, with which they should optimize their payoffs w.r.t. the unknown underlying user distribution.

From a machine learning perspective we now face the challenge of what would be a good prediction algorithm profile, i.e. a set of algorithms for the players such that no player would deviate from her algorithm assuming the others all stick to theirs. Indeed, such a profile of algorithms determines a pure Nash equilibrium (PNE) of prediction algorithms, a powerful solution concept which rarely exists in games. An important question in this regard is whether such a profile exists. An accompanying question is whether a learning dynamics in which players may change their prediction algorithms to better-respond to others would converge. Therefore, we ask:

● Does a PNE exist?
● Will the players be able to find it efficiently, with high probability, using better-response dynamics?

We prove that the answer to both questions is yes. We first show that when the capacity of each strategy space is bounded (i.e., finite pseudo-dimension), players can learn payoffs from samples. Namely, we show that the payoff function of each player converges uniformly over all possible strategy profiles (including the strategies of the other players). Thus, with high probability, a player’s payoff under any strategy profile is not too distant from her empirical payoff. Later, we show that an empirical PNE always exists, i.e., a PNE of the game induced on the empirical sample distribution. Moreover, we show that any learning dynamics in which players improve their payoff by more than a non-negligible quantity converges fast to an approximate PNE. Using the two latter results, we show an interesting property of the setting: the elementary idea of sampling and better-responding according to the empirical distribution until convergence leads to an approximate PNE of the game on the whole population. We analyze this learning process, and formalize the above intuition via an algorithm that runs in time polynomial in the instance parameters and returns an approximate PNE with high probability. Finally, we discuss the case of infinite capacities, and demonstrate that non-learnability can occur even if the user distribution is known to all players.
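The satisficing choice rule described above is easy to simulate directly. A minimal sketch, where the user's label, tolerance, and the three players' predictions are hypothetical numbers chosen for illustration:

```python
import random

def satisfied(prediction, y, t):
    # A user embraces a proposed prediction iff it is within his tolerance:
    # |f(x) - y| <= t (Simon's satisficing principle).
    return abs(prediction - y) <= t

def choose(predictions, y, t, rng):
    # The user selects uniformly at random among all players that satisfy
    # him; returns the chosen player's index, or None if no one does.
    candidates = [i for i, p in enumerate(predictions) if satisfied(p, y, t)]
    return rng.choice(candidates) if candidates else None

# Hypothetical user with label y = 1.0 and tolerance t = 0.2, facing three
# players predicting 0.9, 1.1 and 2.0: players 0 and 1 satisfy him,
# player 2 does not, so 0 and 1 should each be chosen about half the time.
rng = random.Random(0)
counts = [0, 0, 0]
for _ in range(10_000):
    winner = choose([0.9, 1.1, 2.0], y=1.0, t=0.2, rng=rng)
    if winner is not None:
        counts[winner] += 1
print(counts)  # roughly [5000, 5000, 0]
```

This is exactly the tie-breaking that makes a player's expected payoff on a user equal the reciprocal of the number of accurate players.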
Related work
The intersection of game theory and machine learning has grown rapidly in recent years. Sample-empowered mechanism design [16] is a fruitful line of research. For example, [6, 8, 14] reconsider auctions where the auctioneer can sample from bidder valuation functions, thereby relaxing the assumption of prior knowledge of the bidder valuation distribution [15]. Empirical distributions also play a key role in other lines of research [1, 2, 11], where e.g. [2] show how to obtain an approximate equilibrium by sampling any mixed equilibrium. The PAC-learning framework proposed by Valiant [19] has also been extended by Blum et al. [5], who consider a collaborative game where players attempt to learn the same underlying prediction function, but each player has her own distribution over the space. In their work each player can sample from her own distribution, and the goal is to use information sharing among the players to reduce the sample complexity.

Our work is inspired by Dueling Algorithms [10]. Immorlica et al. analyze an optimization problem from the perspective of competition, rather than from the point of view of a single optimizer.

(For ease of exposition, third-person singular pronouns are “he” for a user and “she” for a player.)
Our contribution
Our contribution is three-fold. First, we explicitly suggest that prediction algorithms, like other products on the market, are in competition. This novel view emphasizes the need for stability in prediction-based competition, similar to Hotelling’s stability in spatial competition [9]. Second, we introduce an extension of the PAC-learning framework for dealing with strategy profiles, each of which is a sequence of functions. We show a reduction from payoff maximization to loss minimization, which is later used to achieve bounds on the sample complexity for uniform convergence over the set of profiles. We also show that when players have approximate better-response oracles, they can learn an approximate PNE of the empirical game. The main technical contribution of this paper is an algorithm which, given ǫ, δ, samples a number of points polynomial in the game instance parameters, runs any ǫ-better-response dynamics, and returns an ǫ-PNE with probability of at least 1 − δ. Third, we consider games with at least one player with infinite pseudo-dimension. We show a game instance where each player could learn the best prediction function from her hypothesis class if she were alone in the game, but a PNE of the empirical game does not generalize. This inability to learn emphasizes that strategic behavior can introduce further challenges to the machine learning community.

The model

In this section we formalize the model. We begin with an informal introduction to elementary concepts in game theory and learning theory that are used throughout the paper.
Game theory
A non-cooperative game is composed of a set of players
N = {1, . . . , N}; a strategy space H_i for every player i; and a payoff function π_i ∶ H_1 × ⋯ × H_N → ℝ for every player i. The set H = H_1 × ⋯ × H_N contains all possible strategies, and a tuple of strategies h = (h_1, . . . , h_N) ∈ H is called a strategy profile, or simply a profile. We denote by h_{−i} the vector obtained by omitting the i-th component of h.

A strategy h′_i ∈ H_i is called a better response of player i with respect to a strategy profile h if π_i(h′_i, h_{−i}) > π_i(h). Similarly, h′_i is said to be an ǫ-better response of player i w.r.t. a strategy profile h if π_i(h′_i, h_{−i}) ≥ π_i(h) + ǫ, and a best response to h_{−i} if π_i(h′_i, h_{−i}) ≥ sup_{h_i ∈ H_i} π_i(h_i, h_{−i}). We say that a strategy profile h is a pure Nash equilibrium (herein denoted PNE) if every player plays a best response under h. We say that a strategy profile h is an ǫ-PNE if no player has an ǫ-better response under h, i.e. for every player i it holds that π_i(h) ≥ sup_{h′_i ∈ H_i} π_i(h′_i, h_{−i}) − ǫ.

Learning theory

Let F be a class of binary-valued functions, F ⊆ {0, 1}^X. Given a sequence S = (x_1, . . . , x_m) ∈ X^m, we denote the restriction of F to S by F ∩ S = {(f(x_1), . . . , f(x_m)) ∣ f ∈ F}. The growth function of F, denoted Π_F ∶ ℕ → ℕ, is defined as Π_F(m) = max_{S ∈ X^m} ∣F ∩ S∣. We say that F shatters S if ∣F ∩ S∣ = 2^{∣S∣}. The Vapnik-Chervonenkis dimension of a binary function class F is the cardinality of the largest set of points in X that can be shattered by F,

   VCdim(F) = max{m ∈ ℕ ∶ Π_F(m) = 2^m}.

Let H be a class of real-valued functions, H ⊆ ℝ^X. The restriction of H to S ∈ X^m is analogously defined, H ∩ S = {(h(x_1), . . . , h(x_m)) ∣ h ∈ H}. We say that H pseudo-shatters S if there exists r = (r_1, . . . , r_m) ∈ ℝ^m such that for every binary vector b = (b_1, . . . , b_m) ∈ {−1, 1}^m there exists h_b ∈ H such that for every i ∈ [m] it holds that sign(h_b(x_i) − r_i) = b_i. The pseudo-dimension of H is
the cardinality of the largest set of points in X that can be pseudo-shattered by H,

   Pdim(H) = max{m ∈ ℕ ∶ ∃S ∈ X^m such that S is pseudo-shattered by H}.

We consider a set of users who are interested in a product provided by a set of competing players. Each user is associated with a vector (x, y, t), where x is the instance; y is the label; and t is the threshold that the user is willing to accept.

The players offer customized products to the users. When a user associated with a vector (x, y, t) approaches player i, she produces a prediction h_i(x). If ∣h_i(x) − y∣ is at most t, the user associated with (x, y, t) will grant one monetary unit to player i. Alternatively, that user will move on to another player. We assume that users approach players according to the uniform distribution, although our model and results support any distribution over player orderings. Player i has a set of possible strategies (prediction algorithms) H_i, from which she has to decide which one to use. Each player aims to maximize her expected payoff, and will act strategically to do so.

Formally, the game is a tuple ⟨Z, D, N, (H_i)_{i∈N}⟩ such that

1. Z is the examples domain, Z = X × Y × T, where X ⊂ ℝ^n is the instance domain; Y ⊂ ℝ is the label domain; and T ⊂ ℝ_{≥0} is the tolerance domain.

2. D is a probability distribution over Z = X × Y × T.

3. N is the set of players, with ∣N∣ = N. A strategy of player i is an element of H_i ⊆ Y^X. The space of all strategy profiles is denoted by H = ⨉_{i=1}^N H_i.

4. For z = (x, y, t) and a function g ∶ X → Y, we define the indicator I(z, g) to be 1 if the distance between the value g predicts for x and the label y is at most t. Formally,

   I(z, g) = 1 if ∣g(x) − y∣ ≤ t, and 0 otherwise.
5. Given a strategy profile h = (h_1, . . . , h_N) with h_i ∈ H_i for i ∈ {1, . . . , N} and z = (x, y, t) ∈ Z, let

   w_i(z; h) = 1 / ∑_{i′=1}^N I(z, h_{i′}) if I(z, h_i) = 1, and 0 otherwise.

Note that w_i(z; h) represents the expected payoff of player i w.r.t. the user associated with z. The payoff of player i under h is the average over all users, and is defined by π_i(h) = E_{z∼D}[w_i(z; h)]. D is unknown to the players.

We assume players have access to a sequence of examples S. Given a game instance ⟨Z, D, N, (H_i)_{i∈N}⟩ and a sample S = {z_1, . . . , z_m}, we denote by ⟨Z, S ∼ D^m, N, (H_i)_{i∈N}⟩ the empirical game: the game over the same N, H, Z and the uniform distribution over the known S ∈ Z^m. We denote the payoff of player i in the empirical game by

   π_i^S(h) = E_{z∈S}[w_i(z; h)] = (1/m) ∑_{j=1}^m w_i(z_j; h).

When S is known from the context, we occasionally use the term empirical PNE to denote a PNE of the empirical game. Since the empirical game is a complete-information game, players can use the sample in order to optimize their payoffs.

The optimization problem of finding a best response in our model is intriguing in its own right and deserves future study. In this paper, we assume that each player i has a polynomial ǫ-better-response oracle. Namely, given a real number ǫ > 0, a strategy profile h and a sample S, we assume that each player i has an oracle that returns an ǫ-better response to h_{−i} if one exists, or answers false otherwise, and runs in time poly(1/ǫ, m, N).

Throughout this section we assume the pseudo-dimension of each H_i is finite, and we denote it by d_i, i.e. Pdim(H_i) = d_i < ∞. Our goal is to propose a generic method for finding an ǫ-PNE efficiently. The method is composed of two steps: first, it attains a sample of “sufficient” size. Afterwards, it runs an ǫ-better-response dynamics until convergence, and returns the obtained profile.
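The indicator I, the weights w_i, and the empirical payoffs π_i^S defined above can be computed directly. A minimal sketch, with a hypothetical two-player, three-user empirical game (the strategies h1, h2 and the sample values are illustrative):

```python
def indicator(z, g):
    # I(z, g) = 1 iff the prediction g(x) is within tolerance t of the label y.
    x, y, t = z
    return 1 if abs(g(x) - y) <= t else 0

def weights(z, profile):
    # w_i(z; h): the satisfied players split the user's monetary unit
    # uniformly, so w_i = 1/(number of accurate players) when i is accurate.
    hits = [indicator(z, h) for h in profile]
    total = sum(hits)
    return [hit / total if total > 0 else 0.0 for hit in hits]

def empirical_payoffs(sample, profile):
    # pi_i^S(h) = (1/m) * sum_j w_i(z_j; h).
    m = len(sample)
    pays = [0.0] * len(profile)
    for z in sample:
        for i, w in enumerate(weights(z, profile)):
            pays[i] += w / m
    return pays

# Three users (x, y, t); h1 is accurate on all of them, h2 only on the middle one.
S = [(0.0, 0.0, 0.1), (1.0, 1.0, 0.1), (2.0, 2.0, 0.1)]
h1 = lambda x: x
h2 = lambda x: 1.0
print(empirical_payoffs(S, [h1, h2]))  # -> [0.8333..., 0.1666...]
```

The two payoffs sum to 1 here because every user is satisfied by at least one player; in general the total can be smaller.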
The underlying idea is straightforward, but its analysis is non-trivial. In particular, we need to show two main claims:

• Given a sufficiently large sample S, the payoff of each player i in the empirical game is not too far from her payoff in the actual game, with high probability. This holds concurrently for all possible strategy profiles.

• An ǫ-PNE exists in every empirical game. Therefore, players can reach an ǫ-PNE of the empirical game fast, using their ǫ-better-response oracles.

These claims are made explicit in the forthcoming Subsections 3.1 and 3.2. We formalize the above discussion via Algorithm 1 in Subsection 3.3.

We now bound the probability (over all choices of S) of having player i’s payoff (for an arbitrary i) greater or less than its empirical counterpart by more than ǫ. Notice that the restriction of H_i to an arbitrary sample S, i.e. H_i ∩ S, may be of infinite size. Nevertheless, the payoff function concerns only the indicator function I, and not the real-valued predictions produced by functions in H_i; therefore, we analyze the following binary function class. Let F_i ∶ Z → {0, 1} be such that

   F_i  def=  {z ↦ I(z, h) ∣ h ∈ H_i}.   (1)

Notice that ∣F_i ∩ S∣ represents the effective size of H_i ∩ S with respect to the indicator function I. We already know that the pseudo-dimension of H_i is d_i. In Lemma 1 we relate the pseudo-dimension of H_i to the VC dimension of F_i.

Lemma 1.
VCdim(F_i) ≤ d_i.

Having established the connection between the growth rates of H_i and F_i, we can progress to bounding the growth of the payoff function class F (which we will define shortly). For ease of notation, denote I(z, h) = (I(z, h_1), . . . , I(z, h_N)). Similarly, let w(z; h) = (w_1(z; h), . . . , w_N(z; h)). Note that there is a bijection I(z, h) ↦ w(z; h), which divides I(z, h) by its norm if it is greater than zero, or leaves it as is otherwise. Formally, there is a bijection M, M ∶ {0, 1}^N → {0, 1/N, . . . , 1/2, 1}^N such that for every v ∈ {0, 1}^N,

   M(v) = v if ∥v∥ = 0, and v/∥v∥ otherwise.

(Notice that a best response can be found in constant time if H_i is of constant size. In addition, in the appendix we leverage the algorithm proposed in [4], and show that it can compute a best response within the set of linear predictors efficiently when the input dimension, denoted by n in the model above, is constant. We also discuss situations where a better response cannot be computed efficiently in Section 5, and present the applicability of our models for these cases as well.)

Consider the class F of functions from Z to {0, 1}^N, defined by

   F  def=  {z ↦ I(z, h) ∣ h ∈ H}.

Note that every element in F is a function from Z to {0, 1}^N. The restriction of F to a sample S is defined by F ∩ S = {(I(z_1, h), . . . , I(z_m, h)) ∣ h ∈ H}. Due to the aforementioned bijection, every element in
F ∩ S represents a distinct payoff vector of the empirical game; thus, bounding ∣F ∩ S∣ corresponds to bounding the number of distinct strategy profiles in the empirical game. Clearly,

   ∣F ∩ S∣ = ∏_{i=1}^N ∣F_i ∩ S∣.

The growth function of F, Π_F(m) = max_{S∈Z^m} ∣F ∩ S∣, is therefore bounded as follows.

Lemma 2. Π_F(m) ≤ (em)^{∑_{i=1}^N d_i}.

Next, we bound the probability of a player i’s payoff being “too far” from its empirical counterpart. The proof of Lemma 3 below goes along the path of Vapnik and Chervonenkis, introduced in [20]. Since in our case F is not a binary function class, a few modifications are needed.

Lemma 3.
Let m be a positive integer, and let ǫ > 0. It holds that

   Pr_{S∼D^m}(∃h ∶ ∣π_i(h) − π_i^S(h)∣ ≥ ǫ) ≤ Π_F(m) e^{−2ǫ²m}.

The following Theorem 1 bounds the probability that any player i has a difference greater than ǫ between her payoff and her empirical payoff (over the selection of a sample S), uniformly over all possible strategy profiles. This is done by simply applying the union bound to the bound already obtained in Lemma 3.

Theorem 1.
Let m be a positive integer, and let ǫ > 0. It holds that

   Pr_{S∼D^m}(∃i ∈ [N] ∶ sup_{h∈H} ∣π_i(h) − π_i^S(h)∣ ≥ ǫ) ≤ N (em)^{∑_{i=1}^N d_i} e^{−2ǫ²m}.   (2)

In the previous subsection we bounded the probability of a payoff vector being too far from its counterpart in the empirical game. Notice, however, that this result implies nothing about the existence of a PNE or an approximate PNE: for a fixed S, even if sup_{h∈H} ∣π_i(h) − π_i^S(h)∣ < ǫ holds for every i, a player may still have a beneficial deviation. Therefore, the results of the previous subsection are only meaningful if we show that a PNE exists in the empirical game, which is the goal of this subsection. We prove this existence using the notion of potential games [13].

A non-cooperative game is called a potential game if there exists a function Φ ∶ H → ℝ such that for every strategy profile h = (h_1, . . . , h_N) ∈ H and every i ∈ [N], whenever player i switches from h_i to a strategy h′_i ∈ H_i, the change in her payoff function equals the change in the potential function, i.e.

   Φ(h′_i, h_{−i}) − Φ(h_i, h_{−i}) = π_i(h′_i, h_{−i}) − π_i(h_i, h_{−i}).

Theorem 2 ([13, 17]). Every potential game with a finite strategy space possesses at least one PNE.
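The potential-game machinery can be made concrete on a toy empirical game. The sketch below uses a Rosenthal-style candidate potential, Φ^S(h) = (1/m) ∑_j ∑_{k=1}^{n_j} 1/k, where n_j is the number of players accurate on z_j (the paper proves its own potential in the appendix; this particular Φ does satisfy the exact-potential identity for the payoffs of our model, and the code checks it on every switch). The game itself — two players choosing constant predictors, four users — is entirely hypothetical:

```python
def indicator(z, g):
    # I(z, g) = 1 iff the prediction g(x) is within tolerance t of the label y.
    x, y, t = z
    return 1 if abs(g(x) - y) <= t else 0

def payoff(sample, profile, i):
    # pi_i^S(h): each user splits his monetary unit uniformly among
    # all players whose predictions satisfy him.
    total = 0.0
    for z in sample:
        hits = [indicator(z, h) for h in profile]
        if hits[i]:
            total += 1.0 / sum(hits)
    return total / len(sample)

def potential(sample, profile):
    # Rosenthal-style potential: Phi(h) = (1/m) sum_j sum_{k=1}^{n_j} 1/k.
    m = len(sample)
    phi = 0.0
    for z in sample:
        n = sum(indicator(z, h) for h in profile)
        phi += sum(1.0 / k for k in range(1, n + 1)) / m
    return phi

def better_response_dynamics(sample, spaces, profile, eps):
    # Repeatedly let some player switch to an eps-better response; every
    # switch raises Phi by the player's gain (>= eps), and Phi is bounded,
    # so the process must terminate.
    profile, steps = list(profile), 0
    improved = True
    while improved:
        improved = False
        for i, space in enumerate(spaces):
            base = payoff(sample, profile, i)
            for g in space:
                candidate = profile[:i] + [g] + profile[i + 1:]
                if payoff(sample, candidate, i) >= base + eps:
                    # exact-potential identity: payoff gain == potential gain
                    assert abs((payoff(sample, candidate, i) - base)
                               - (potential(sample, candidate) - potential(sample, profile))) < 1e-9
                    profile, steps, improved = candidate, steps + 1, True
                    break
            if improved:
                break
    return profile, steps

def const(c):
    return lambda x: c

# Four users (two with label 0, two with label 1, zero tolerance) and two
# players who may each commit to a constant prediction of 0 or 1.
S = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 1.0, 0.0), (3.0, 1.0, 0.0)]
spaces = [[const(0.0), const(1.0)], [const(0.0), const(1.0)]]
h, steps = better_response_dynamics(S, spaces, [spaces[0][0], spaces[1][0]], eps=0.1)
print([payoff(S, h, i) for i in range(2)])  # -> [0.5, 0.5]
```

Starting from both players predicting 0, one player switches to predicting 1 and the dynamics settle on a split of the users — a PNE of this toy empirical game.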
Obviously, in our setting the strategy space of a game instance ⟨Z, D, N, (H_i)_{i∈N}⟩ is typically infinite. Infinite potential games may also possess a PNE (as discussed in [13]), but in our case the distribution D is approximated from samples and the empirical game is finite, so no stronger claims are needed. Lemma 4 below shows that every empirical game is a potential game.

Lemma 4.
Every empirical game ⟨Z, S ∼ D^m, N, (H_i)_{i∈N}⟩ has a potential function.

As an immediate result of Theorem 2 and Lemma 4, we obtain the following corollary.

Algorithm 1:
Approximate PNE w.h.p. via better-response dynamics
Input: δ, ǫ ∈ (0, 1)
Output: a strategy profile h
   set m = m_{ǫ/4,δ}   // the minimal integer m satisfying Equation (3)
   sample S from D^m
   execute any ǫ/2-better-response dynamics on S until convergence, and obtain a strategy profile h that is an empirical ǫ/2-PNE
   return h

Corollary 1.
Every empirical game ⟨Z, S ∼ D^m, N, (H_i)_{i∈N}⟩ possesses at least one PNE.

After establishing the existence of a PNE in the empirical game, we are interested in the rate at which it can be “learnt”. More formally, we are interested in the convergence rate of the dynamics between the players, where at every step one player deviates to one of her ǫ-better responses. Such dynamics do not necessarily converge in general games, but do converge in potential games. By examining the specific potential function in our class of (empirical) games, we can also bound the number of steps until convergence.

Lemma 5.
Let ⟨Z, S ∼ D^m, N, (H_i)_{i∈N}⟩ be any empirical game instance. After at most O(log N / ǫ) iterations of any ǫ-better-response dynamics, an ǫ-PNE is obtained.

ǫ-PNE with high probability

In this subsection we leverage the results of the previous Subsections 3.1 and 3.2 to devise Algorithm 1, which runs in polynomial time and returns an approximate equilibrium with high probability. More precisely, we show that Algorithm 1 returns an ǫ-PNE with probability of at least 1 − δ, and has time complexity of poly(1/ǫ, m, N, log(1/δ), d). As in the previous subsections, we denote d = ∑_{i=1}^N d_i.

First, we bound the required sample size. Using standard algebraic manipulations on Equation (2), we obtain the following.

Lemma 6.
Let ǫ, δ ∈ (0, 1), and let

   m ≥ (d/ǫ²) log(d/ǫ²) + (d log e)/ǫ² + (1/ǫ²) log(N/δ).   (3)

With probability of at least 1 − δ over all possible samples S of size m, it holds that

   ∀i ∈ [N] ∶ sup_{h∈H} ∣π_i(h) − π_i^S(h)∣ < ǫ.

Given ǫ, δ, we denote by m_{ǫ,δ} the minimal integer m satisfying Equation (3). Lemma 6 shows that m_{ǫ,δ} = O((d/ǫ²) log(d/ǫ²) + (1/ǫ²) log(N/δ)) samples are enough to have all empirical payoff vectors ǫ-close to their theoretic counterparts coordinate-wise (i.e. in the L_∞ norm), with probability of at least 1 − δ. Next, we connect an approximate PNE in the empirical game to an approximate PNE in the (actual) game.

Lemma 7.
Let m ≥ m_{ǫ/4,δ}, and let h be an ǫ/2-PNE in ⟨Z, S ∼ D^m, N, (H_i)_{i∈N}⟩. Then h is an ǫ-PNE with probability of at least 1 − δ.

Recall that Lemma 5 ensures that any approximate better-response dynamics converges to an approximate PNE of the empirical game within O(log N / ǫ) iterations. In each such iteration a player calls her approximate better-response oracle, which is assumed to run in poly(1/ǫ, m, N) time. Altogether, given ǫ and δ, Algorithm 1 runs in poly(1/ǫ, N, log(1/δ), d) time, and returns an ǫ-PNE with probability of at least 1 − δ.

Learnability in games with infinite dimension
While Lemma 1 upper bounds VCdim(F_i) as a function of Pdim(H_i), it is fairly easy to show that VCdim(F_i) ≥ Pdim(H_i) (we prove this claim formally in the appendix). Therefore, if Pdim(H_i) is infinite, so is VCdim(F_i).

Classical results in learning theory suggest that if Pdim(H_i) = ∞, a best response on the sample may not generalize to an approximate best response w.h.p. To see this, imagine a “game” with one player, who seeks to maximize her payoff function. No Free Lunch theorems (see, e.g., [21]) imply that with a constant probability the player cannot get her payoff within a constant distance from the optimal payoff. We conclude that, in general games, if a player has a strategy space with an infinite pseudo-dimension, she may not be able to learn. However, in the presence of such a player, can other players with a finite pseudo-dimension learn an approximate best response?

One typically shows non-learnability by constructing two distributions and proving that, with constant probability, an agent cannot tell which distribution produced the sample she obtained. These two distributions are constructed to be distant enough from each other, so that the loss (or payoff, in our setting) is far from optimal by at least a constant. In our setting, however, players interact with each other, and player payoffs are a function of the whole strategy profile; thus, interesting phenomena occur even if the distribution D is known. In particular, Example 1 below demonstrates that in the infinite-dimension case, not every empirical PNE generalizes to an approximate PNE with high probability.

Example 1.
Let D be a density function over Z = [0, 1] × {0, 1} × {0} as follows:

   D(x, y, t) = 1 if 0 ≤ x < 1/2, y = 0, t = 0;  1 if 1/2 ≤ x ≤ 1, y = 1, t = 0;  and 0 otherwise.

In addition, for any finite-size subset S of Z in the support of D, denote

   h_{S→0}(x) = 0 if ∃y, t ∶ (x, y, t) ∈ S or 0 ≤ x < 1/2, and 1 otherwise;
   h_{S→1}(x) = 1 if ∃y, t ∶ (x, y, t) ∈ S or 1/2 ≤ x ≤ 1, and 0 otherwise.

In other words, h_{S→0} labels 0 every instance that appears in the sample S and every instance in the [0, 1/2) segment. On the other hand, h_{S→1} labels 1 every instance that appears in the sample S and every instance in the [1/2, 1] segment. Denote H = {h_{S→0} ∣ S ⊂ Z} ∪ {h_{S→1} ∣ S ⊂ Z}, and let H_2 = H_3 = {𝟙[0 ≤ x < 1/2], 𝟙[1/2 ≤ x ≤ 1]}, where 𝟙[·] denotes an indicator function.

In this three-player game, consider the profile h = (h_1, h_2, h_3) such that h_1 = h_{S→0} and h_2 = h_3 = 𝟙[1/2 ≤ x ≤ 1]. The payoffs under h are

   π_1^S(h) = (1/m) ∑_{j=1}^m (1 − y_j),   π_2^S(h) = π_3^S(h) = (1/(2m)) ∑_{j=1}^m y_j.

Observe that when (1/m) ∑_{j=1}^m y_j lies strictly between two appropriate constants, h is an empirical PNE, since no player can improve her payoff. Notice, however, that in the actual game π_3(h) falls short, by a constant, of π_3(𝟙[0 ≤ x < 1/2], h_{−3}). Since the event above occurs with constant probability over the choices of S once ∣S∣ is large enough (see the appendix), this empirical equilibrium does not generalize to an approximate PNE with the desired probability. This is true for any ǫ, δ ∈ (0, 1); thus, an empirical PNE is not generalized to an approximate PNE w.h.p.

Another interesting point is that in Example 1 each player could trivially find a strategy that maximizes her payoff if she were alone, since D is known. Indeed, this inability to generalize from samples follows solely from strategic behavior. Notice that if player 3 had knowledge of H, she could infer that her strategy under h is sub-optimal. However, knowledge of the strategy spaces of other players is a heavy assumption: the better-response dynamics we discussed in Subsection 3.2 only assumed that each player can compute a better response.

Discussion
As mentioned in Section 2.1, our analysis assumes players have better-response oracles. In fact, our model and results are valid in a much more general scenario, as described next. Consider the case where players only have heuristics for finding a better response. After running heuristic better-response dynamics and obtaining a strategy profile, the payoffs with respect to the whole population are guaranteed to be close to their empirical counterparts, w.h.p.; therefore, our analysis is still meaningful even if players cannot maximize their empirical payoffs efficiently, as the bounds on the required sample size obtained in Section 3, and the rate of convergence, are relevant for this case as well.

The reader may wonder about a variation of our model in which player payoffs are defined differently. For example, consider each user as granting one monetary unit to the player that offers the prediction closest to his label. This definition is in the spirit of Dueling Algorithms [10] and Best Response Regression [4]. Under this payoff function, and unlike our model, an empirical PNE does not necessarily exist. Nevertheless, we believe that examining and understanding these scenarios is fundamental to the analysis of competing prediction algorithms, and deserves future work.
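For a rough sense of the sample sizes involved, the bound of Section 3 can be read as an order-of-magnitude recipe. The sketch below treats m_{ǫ,δ} = O((d/ǫ²) log(d/ǫ²) + (1/ǫ²) log(N/δ)) literally, with illustrative absolute constants that are not the paper's:

```python
import math

def sample_size(d, N, eps, delta):
    # m = O((d/eps^2) * log(d/eps^2) + (1/eps^2) * log(N/delta));
    # the constants below are illustrative placeholders, not the paper's.
    a = d / eps ** 2
    return math.ceil(a * math.log(max(a, math.e)) + math.log(N / delta) / eps ** 2)

# e.g. three players whose pseudo-dimensions sum to d = 10:
m = sample_size(d=10, N=3, eps=0.1, delta=0.05)
print(m)  # several thousand samples
```

The bound is dominated by the capacity term (d/ǫ²) log(d/ǫ²); the number of players enters only logarithmically.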
Acknowledgments
This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement n° 740435).
References

[1] I. Althöfer. On sparse approximations to randomized strategies and convex combinations. Linear Algebra and its Applications, 199:339–355, 1994.
[2] Y. Babichenko, S. Barman, and R. Peretz. Empirical distribution of equilibrium play and its testing application. Mathematics of Operations Research, 42(1):15–29, 2016.
[3] G. Barron and I. Erev. Small feedback-based decisions and their limited correspondence to description-based decisions. Journal of Behavioral Decision Making, 16(3):215–233, 2003.
[4] O. Ben-Porat and M. Tennenholtz. Best response regression. In Advances in Neural Information Processing Systems, pages 1498–1507, 2017.
[5] A. Blum, N. Haghtalab, A. D. Procaccia, and M. Qiao. Collaborative PAC learning. In Advances in Neural Information Processing Systems, pages 2389–2398, 2017.
[6] R. Cole and T. Roughgarden. The sample complexity of revenue maximization. In Proceedings of the 46th Annual ACM Symposium on Theory of Computing, pages 243–252. ACM, 2014.
[7] I. Erev, E. Ert, A. E. Roth, E. Haruvy, S. M. Herzog, R. Hau, R. Hertwig, T. Stewart, R. West, and C. Lebiere. A choice prediction competition: Choices from experience and from description. Journal of Behavioral Decision Making, 23(1):15–47, 2010.
[8] Y. A. Gonczarowski and N. Nisan. Efficient empirical revenue maximization in single-parameter auction environments. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2017.
[9] H. Hotelling. Stability in competition. The Economic Journal, 39(153):41–57, 1929.
[10] N. Immorlica, A. T. Kalai, B. Lucier, A. Moitra, A. Postlewaite, and M. Tennenholtz. Dueling algorithms. In Proceedings of the 43rd Annual ACM Symposium on Theory of Computing, pages 215–224. ACM, 2011.
[11] R. J. Lipton, E. Markakis, and A. Mehta. Playing large games using simple strategies. In Proceedings of the 4th ACM Conference on Electronic Commerce, pages 36–41. ACM, 2003.
[12] Y. Mansour, A. Slivkins, and Z. S. Wu. Competing bandits: Learning under competition. In Innovations in Theoretical Computer Science (ITCS), pages 48:1–48:27, 2018.
[13] D. Monderer and L. S. Shapley. Potential games. Games and Economic Behavior, 14(1):124–143, 1996.
[14] J. H. Morgenstern and T. Roughgarden. On the pseudo-dimension of nearly optimal auctions. In Advances in Neural Information Processing Systems, pages 136–144, 2015.
[15] R. B. Myerson. Optimal auction design. Mathematics of Operations Research, 6(1):58–73, 1981.
[16] N. Nisan and A. Ronen. Algorithmic mechanism design. In Proceedings of the 31st Annual ACM Symposium on Theory of Computing, pages 129–140. ACM, 1999.
[17] R. W. Rosenthal. A class of games possessing pure-strategy Nash equilibria. International Journal of Game Theory, 2(1):65–67, 1973.
[18] H. A. Simon. Rational choice and the structure of the environment. Psychological Review, 63(2):129, 1956.
[19] L. G. Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134–1142, 1984.
[20] V. N. Vapnik and A. Y. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. In Measures of Complexity, pages 11–30. Springer, 2015.
[21] D. H. Wolpert and W. G. Macready. No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1):67–82, 1997.