Learning and Efficiency in Games with Dynamic Population
Thodoris Lykouris*
Department of Computer Science, Cornell University, Gates Hall, Ithaca, NY 14853, USA, [email protected]
Vasilis Syrgkanis
Microsoft Research, New York City, 641 Avenue of Americas, New York, NY 10011, [email protected]

Éva Tardos†

Department of Computer Science, Cornell University, Gates Hall, Ithaca, NY 14853, USA, [email protected]
We study the quality of outcomes in repeated games when the population of players is dynamically changing and participants use learning algorithms to adapt to the changing environment. Game theory classically considers Nash equilibria of one-shot games, while in practice many games are played repeatedly, and in such games players often use algorithmic tools to learn to play in the given environment. Learning in repeated games has only been studied when the population playing the game is stable over time.

We analyze efficiency of repeated games in dynamically changing environments, motivated by application domains such as packet routing and Internet ad-auctions. We prove that, in many classes of games, if players choose their strategies in a way that guarantees low adaptive regret, then high social welfare is ensured, even under very frequent changes. This result extends previous work, which showed high welfare for learning outcomes in stable environments. A main technical tool for our analysis is the existence of a solution to the welfare maximization problem that is both close to optimal and relatively stable over time. Such a solution serves as a benchmark in the efficiency analysis of learning outcomes. We show that such stable and near-optimal solutions exist for many problems, even in cases when the exact optimal solution can be very unstable. We develop direct techniques to show the existence of a stable solution in some classes of games. Further, we show that a sufficient condition for the existence of stable solutions is the existence of a differentially private algorithm for the welfare maximization problem. We demonstrate our techniques by focusing on three classes of games as examples: simultaneous item auctions, bandwidth allocation mechanisms and congestion games.
Key words : No-regret learning, price of anarchy, auctions, congestion games, differential privacy
History: First version: May 2015. Current version: November 2015. Preliminary version to appear at the 2016 SIAM Symposium on Discrete Algorithms, SODA'16.

∗ Work supported in part by NSF grant CCF-0910940 and ONR grant N00014-08-1-0031.

† Work supported in part by NSF grants CCF-0910940 and CCF-1215994, ONR grant N00014-08-1-0031, a Yahoo! Research Alliance Grant, and a Google Research Grant.
1. Introduction
The goal of this paper is to understand the quality of outcomes of games and simple mechanisms in a dynamic environment. The Internet allows for the repeated strategic interaction of many entities with constantly changing parameters and participants. Primary examples of such interactions include online advertising auction platforms, packet routing and allocation of cloud computing resources. Understanding whether the constant change in these strategic environments can severely damage the efficiency of the corresponding system, as compared to the hypothetical centralized optimum, is of prime importance, as these systems constitute the cornerstone of the online economy. For example, advertising provides close to 90% of Google's revenue (Google 2015).

Classical economic analysis of the interaction of strategic agents assumes that players reach a stable outcome where all players are mutually best-responding to each others' actions (or considers mechanisms that are dominant strategy solvable). Dynamic environments, with high volume interactions of small individual value or cost, such as packet routing or ad-auctions, are better modeled as repeated games with learning players. Nash equilibria of the one-shot game correspond to stable outcomes repeated in each iteration, where the players have no regret for their choice of strategies. Hence, analyzing the quality of outcomes in repeated games via the price of anarchy (Koutsoupias and Papadimitriou 1999) assumes that the repeated game reaches a stable, stationary outcome. Such an analysis of the price of anarchy of one-shot Nash equilibria has received considerable attention in the past few years in both the computer science and operations research communities, and in a plethora of application domains such as routing games (Roughgarden and Tardos 2002, Correa et al. 2003), bandwidth allocation (Johari and Tsitsiklis 2004), strategic supply among firms (Johari and Tsitsiklis 2011) and online ad-auctions (Caragiannis et al. 2015) (see e.g. Chapters 17 to 21 of (Nisan et al. 2007) for a survey).

A more attractive model of player behavior in such repeated environments is to assume players use a form of algorithmic learning. Modeling players as learners is especially appealing in online auctions, as individual auctions provide very little value, costing only a few cents to a few dollars each, so using experimentation to learn from the data is natural. Many advertisers use sophisticated optimization tools or services to optimize their bidding, such as Bluekai or AdRoll.

It is well known that in most games natural game play does not lead to equilibria, under any definition of "natural play" (see e.g. Chapter 7 of (Hart and Mas-Colell 2012)). In fact, results on polynomial time computability of Nash equilibria of general games are mostly negative: finding equilibria is computationally hard (see (Daskalakis 2009) for a survey). Even with computational concerns aside, the game that the participants are playing at each time-step, and the participants they are playing against, can change at any time without the players even realizing it or being able to form any distributional belief. Hence, even the concept of a Nash equilibrium is debatable in such an adversarially evolving setting, as the players don't even have the information necessary to calculate their expected utility at each time-step. Instead they observe their utility from the action they took, or from any alternative action they could have taken, only after the fact.
In such an evolving setting, players can base their actions on past experience. A particular class of learning behaviors, no-regret learning, emerged as a nice way to capture the intuition that players learn to play appropriate strategies over time without necessitating convergence to a stationary equilibrium. A stationary distribution that is also a no-regret learning outcome corresponds to a Nash equilibrium of the one-shot game, and in this sense, learning outcomes generalize Nash equilibria. More importantly, there are several simple and natural algorithms that achieve the no-regret property (e.g. regret matching (Hart and Mas-Colell 2000), multiplicative weight updates (Arora et al. 2012)). However, no-regret does not preclude the use of possibly much more sophisticated tools, including using the above learning algorithms with more complex benchmarks. Achieving small regret is a relatively simple expectation from bid optimization tools.

Blum et al. (2006, 2008) consider regret-minimization as a model of player behavior in repeated games, and study the average inefficiency of the outcome, coining the term price of total anarchy for the worst-case ratio between the optimal objective value and the average objective value when players use a no-regret algorithm. In a sequence of play, all players achieve the no-regret property if and only if the empirical distribution of strategy vectors is a coarse correlated equilibrium; hence the price of total anarchy is the ratio of the socially optimal welfare to the welfare at the worst coarse correlated equilibrium. Roughgarden (2009) observed that many of the Nash equilibrium price of anarchy bounds are shown via a proof technique he called smoothness, and such proofs easily extend also to show bounds on the quality of coarse correlated equilibria. Syrgkanis and Tardos (2013) extend smoothness to simple mechanisms, such as independent item auctions.

However, this learning outcome analysis is based on the strong assumption that the underlying environment and player population is stable. The reason for this requirement is easy to understand: with the game and the players stable, there is a fixed optimal solution, and a fixed strategy, that each player i would need to play (action $a_i^*$) as his or her part for achieving the optimum. To guarantee high social welfare via the smoothness approach, all we need is that each player i doesn't regret not playing this optimal action $a_i^*$. No-regret learning guarantees exactly this; player i will not regret any fixed strategy with hindsight, including strategy $a_i^*$. However, online environments are typically not stable.
In this paper, we study learning outcomes in games with a dynamically changing player population. As stated in (Fudenberg and Levine 1998, p. 4), the fact that players extrapolate across games they view as similar is an important reason learning has relevance in a real-world situation. A repeated game with an evolving population is exactly a setup where players are asked to play repeatedly in similar games. Rather than aiming to predict the exact outcome, our goal is to predict properties of outcomes, such as their efficiency, i.e. the price of anarchy.

In a changing game environment, we need a slightly stronger notion of regret minimization. No-regret learning aims to select strategies that do at least as well on average over a sequence of steps as the best single strategy would have done in hindsight. With the game environment and population changing, a single best strategy in hindsight gives a really weak benchmark. Players using good learning algorithms should be able to adapt to the changing environment, and such adaptation may be very useful with the population changing over time. For example, in the context of routing games, a player with many route options may want to adjust their route choices depending on which part of the network is more congested, or in auction games, a player may want to bid for items that are less in demand.

Hazan and Seshadhri (2007) formally introduced the stronger notion of adaptive regret that we will use, bounding the average regret over any sub-interval of steps $[\tau_1, \tau_2)$, compared to a single best action over this interval in hindsight. The study of adaptive learning goes back much further: the work of Lehrer (2003) and Blum and Mansour (2007) studied generalizations of adaptive regret prior to (Hazan and Seshadhri 2007). Clearly short intervals will result in relatively high regret with any learning algorithm, but adaptive learning algorithms guarantee for the player that the cumulative regret grows sub-linearly with the length of the interval. Most adaptive learning algorithms are constructed by modifying classical no-regret learning algorithms to stop relying too heavily on experience from the distant past. We believe that such adaptive learning is a better model of behavior when strategic agents (such as bidders in online auctions) use sophisticated optimization tools. The current best adaptive learning algorithm is a natural adaptation of the classical Hedge algorithm, AdaNormalHedge, due to Luo and Schapire (2015). With this framework in mind, we ask the following main question:

How much rate of change can a system admit to sustain approximate efficiency, when its participants are adaptive learners?

Our Results.
We show that in large classes of games, if players choose their strategies in a way that guarantees low adaptive regret, this ensures high social welfare, even under surprisingly high turnover. To model a changing environment we consider a dynamic player population where between every pair of iterations each player leaves independently with a (small) probability p and is replaced by an arbitrary new player, implying that in expectation a p fraction of the population is replaced. The independent departure probability models churn in player population caused by effects that are external to the game. We make no assumptions on the sequence of arriving players, which can be chosen in an adversarial way. We use independence of departures for simplicity of presentation, and most of our results carry over to any process where the departing players are also chosen adversarially, subject to a constraint on the number of per-step replacements. This model of the environment is simple enough to allow a clean analysis, and allows arbitrary worst case shifts in player populations.

We show that learning behavior ensures high social welfare in dynamic situations with high churn for four classes of games:

• In Section 4.1 we consider an item auction game with unit demand bidders. At each period the auction sells m different items, and the bidders have value for at most one item per-period. The value of player i for each item j is different and is denoted by $v_{ij}$. We consider a simple auction format: each item is auctioned independently (via a first or second price auction). We show that adaptive learning by players ensures high social welfare (i.e. price of anarchy close to 4), even when the probability p of player departure is close to a constant (independent of the number of items or players, and depending only on the range of values that players have).

• In Section 4.2, we consider bandwidth allocation: a unit of bandwidth is to be divided across the players of the game and each player i has a valuation function $v_i(x)$ for bandwidth x. We consider the proportional mechanism of Kelly (1997), analyzed in Johari and Tsitsiklis (2011), and show a price of anarchy close to 4 under mild assumptions on the utility functions and even with high player turnover.

• In Section 5.2 we prove that in large dynamic congestion games learning by players ensures low social cost even with a dynamically changing player population. For example, when the costs are a linear function of the congestion, we get a price of anarchy guarantee close to the 5/2 bound known for the static setting, even if a $1/\mathrm{polylog}(n)$ fraction of the n players are changing at each time-step.

• In Section 5.3 we consider auction games where bidders have gross substitute valuations. Extending the results of Section 4.1, we prove that in large dynamic markets, learning by players ensures high social welfare, i.e. price of anarchy close to 2, even if a $1/\mathrm{polylog}(n)$ fraction of the n players are changing at each time-step.

We achieve these results by developing a general technique (in Section 3) to show that in many games adaptive learners achieve high social welfare in dynamically changing environments. Our technique is based on the following three conditions:

1. All players are adaptive learners, i.e. they choose their strategies in a way that guarantees small adaptive regret on the outcome (for instance, using an adaptive learning algorithm).
In deriving concrete bounds, we assume that players use an adaptive learning algorithm with the best known bound of (Luo and Schapire 2015) or (Blum and Mansour 2007). Our results deteriorate gracefully with weaker assumptions on the regret of learning.

2. The game repeated at each step (called the stage game) needs to have low price of anarchy. In particular, we need that the game satisfies a slight strengthening of the (Roughgarden 2009) smoothness property (or the smooth mechanisms property of (Syrgkanis and Tardos 2013)), which is typically used to prove price of anarchy guarantees.

3. There exists a sequence of solutions for the underlying optimization problem that is approximately optimal, and where on average each player's part of the solution is stable, i.e. doesn't change much over time.

With our model of players leaving the game independently with probability p at each step, on average each player is expected to participate in 1/p rounds of the game, which turns out to be long enough to learn good strategies. On the other hand, players will experience dynamic population changes, and with no assumption on arriving players, they will need to adapt to the changing environment. With a player population of size n, and each player being replaced with turnover probability p, after each step we have np new players in expectation, so the population is constantly changing. We use an approximately optimal solution where each player's allocation is relatively stable as a benchmark for each player; a stable enough benchmark will allow adaptive learners to learn how to play at least as well as this solution. We will be interested in understanding what value of p is needed to guarantee high social welfare.

To apply the above outline to a game, we need to develop techniques for point 3 above: show that there exists a stable sequence of close to optimal solutions in our changing environment. We present two ways to achieve this stability. In Section 4, we consider solution sequences that are produced by greedy algorithms, where a turnover in the input has only local influence on the output. In Section 5, we consider solution sequences that are produced by differentially private algorithms, where a turnover in the input affects the whole output but only with a small probability.

Our first application, via the greedy algorithm approach, is the unit demand auction problem analyzed in Section 4.1. In a unit-demand auction, after a change in one player, we could recompute the optimal solution by an augmenting path algorithm. Unfortunately, a single augmenting path can change the assignment of many (or even all) players, and hence in no sense is the evolving optimal solution stable. Such major changes can happen even if the player valuations are all 0 or 1. We develop a greedy algorithm that finds stable solution sequences losing only a factor of 2 from the optimum value. To illustrate the idea, observe that in the special case of 0/1 values a greedy matching is essentially stable, and has size at least 1/2 of the optimal matching. In Section 4.1 we extend this idea beyond 0/1 values to the general unit-demand auction problem.
We use this algorithm to show that players using adaptive learning guarantee high social welfare in the item auction game with unit demand bidders even with a dynamically changing player population, allowing for a probability p of player departure that depends logarithmically on the range of values players have, and does not depend on the number of items or players.

Another application of the greedy algorithm approach is the bandwidth allocation problem (Section 4.2), where some bandwidth is divided across players with smooth concave valuation functions. Segmenting the bandwidth into small parts and viewing each segment as an item, we provide an almost optimal greedy approximation algorithm with similar stability guarantees as in the unit-demand auction setting.

In Section 5 we develop a general method for applying our framework via the use of differential privacy. Differential privacy has been developed by Dwork et al. (2006) for (approximately) answering queries on databases of private information, while protecting the privacy of the data. Consider a database of sensitive personal information (such as medical data). The framework of differential privacy has been developed to allow us to take advantage of the statistical information in the database without compromising the privacy of the individuals. A differentially private response to a database query is randomized, and it is required that if two databases differ only in the data related to one individual, the probability that the response differs is very small. In recent years many optimization problems have been shown to be solvable in a differentially private way (see the recent book of Dwork and Roth (2014)).

The requirement of differential privacy for a solution to an optimization problem is very close to what we need for our stable solution sequences: if there is a differentially private close to optimal solution, this immediately implies that the solution cannot change much as one person's data changes. We will be using a variant of the notion of differential privacy adapted to game theoretic environments, joint differential privacy (Kearns et al. 2014). Player i's share of any reasonable solution must depend on his/her own input, so a solution cannot be fully differentially private. Joint differential privacy fixes this discrepancy. In fact, the notion of marginal differential privacy of Kannan et al. (2014) seems even more appropriate, as it only requires that the output for each player j is differentially private in the data of other players. In order to take advantage of differential privacy in the context of dynamically changing games, we need to overcome an important technical difficulty: with the output of the differentially private algorithm randomized, the natural measure of change in a sequence of such outputs is the sum of the total variation distances between adjacent pairs of distributions. We need to turn the sequence of output distributions with low total variation distance into a distribution of stable output sequences. We do this in Section 5 for joint differential privacy. In Appendix EC.2 we show how to adapt our analysis to the weaker notion of marginal differential privacy.
We illustrate the differential privacy approach via two applications. In Section 5.2 we use the differentially private algorithm of Rogers et al. (2015) for congestion games to prove that in large dynamic congestion games adaptive learning by players guarantees low social cost even with a dynamically changing player population. In Section 5.3 we use the differentially private algorithms of Hsu et al. (2014) for matchings and allocations with gross substitute valuations to prove that in large dynamic markets adaptive learning by players guarantees high social welfare even with a dynamically changing player population. For simplicity of presentation, we focus on first price auctions in Sections 4.1 and 5.3, but our results apply also to second price auctions (assuming no overbidding) as well as any hybrids of the two auction formats. In this setting we show, roughly, that if we have a smoothness-based price of anarchy bound for the single-shot game then, in the dynamic population setting, the price of anarchy is ε close to the same bound assuming that $p = O(\epsilon/\mathrm{polylog}(n))$, as long as the market is large enough, in the sense that the supply of goods is large enough. The simultaneous first price auction gives a price of anarchy bound of 2. Thus even if approximately $n/\log(n)$ players are changing at each time-step, a constant inefficiency is guaranteed.

As a benchmark for the latter two results, it is interesting to consider a simpler model of dynamic player population, where the departure or arrival of a player is announced to all players. We expect np new players each step, so in expectation there will be 1/(np) steps with no change at all. If all the changes are announced, players could be expected to restart their learning algorithms after each change. If the stable period 1/(np) is long enough, we can use results for the price of total anarchy to guarantee high social welfare. Under standard no-regret learning algorithms each player will then have average regret approximately $O(\sqrt{np})$. Hence, if we want the regret in the system to be at most an ε fraction of the optimal welfare, and hence contribute only an ε fraction to the inefficiency, we would require that $p = O(\epsilon^2/n)$. In other words, the probability that any player changes in a period needs to be about $\epsilon^2/n$, which is a tiny rate of change for large n.

Our results are stronger than what is implied by this argument in two ways. First, we do not assume that change is announced; rather, we take advantage of the fact that players using learning algorithms can adjust to the changing environment even without the announcement of the change. More importantly, our results allow a probability of change much higher than required by the above argument. The resulting dynamic game will not have long periods with no change. Multiple players will be arriving and leaving at each step. We show that in many games, despite the constant change, there exists a good benchmark of the kind mentioned in the conditions above, where each player's individual solution or allocation is relatively stable. The rate of expected change np in our applications will turn out to be high, especially as the number of players increases. Roughly speaking, if we want the regret of the players to be an ε fraction of the optimal welfare, we will only require that $p = O(\mathrm{poly}(\epsilon)/\mathrm{polylog}(n))$, where the constants depend on several parameters of each game at hand, but importantly depend only logarithmically on the number of players.
Moreover, in some games we even give a bound that is independent of n. Hence, for any constant ε we allow almost a constant fraction of players to be changing at each period.

Further related work on dynamic games.
Dynamic games have a long history in economics, dynamical systems and operations research; see for example the survey books of Başar and Olsder (1998) and Van Long (2010). The classic approach of analyzing behavior in such dynamic games is to assume that players have prior beliefs about their competitors and that their behavior will constitute a perfect Bayesian equilibrium or refinements of it, such as the sequential equilibrium of Fudenberg and Tirole (1991) or the Markov perfect equilibrium of Maskin and Tirole (2001). The intractability of such equilibrium solution concepts, and the informational and rationality assumptions that they impose on the players, cast doubt on whether players in practice, and in complex game theoretic environments such as packet routing or internet ad-auctions, would behave as prescribed by such an equilibrium.

Equilibrium-like behavior might be more plausible in large game approximations (see e.g. recent work of Kalai and Shmaya (2015)). A natural approximation to equilibrium behavior in large game situations that has recently been extensively analyzed in economics and in operations research (and particularly, in auction settings) is that of the mean field equilibrium (Balseiro et al. 2015, Weintraub et al. 2006, 2008, Adlakha et al. 2015, Iyer et al. 2014, Adlakha and Johari 2013). However, even these large game approximations require the players to form almost correct beliefs about the competition and to exactly best-respond to these approximate large-market beliefs. Moreover, the approach requires that the environment either is stochastically stable or evolves in a known stochastic manner, and in most situations the mean field approach captures behavior at a stochastically stable state of the system. On the contrary, our dynamic model allows for adversarial changes and our analysis attempts to analyze even constantly evolving and never converging behavior. Moreover, our assumption that players invoke adaptive learning algorithms does not impose that players possess or form any beliefs about the competition. Most of the algorithms that achieve adaptive regret only require that the player is able to see the utility that each of his strategic options would have yielded, in retrospect, in past time-steps. Lastly, our approach also applies in small markets.

There is also a large literature on truthful mechanisms in a dynamic setting analogous to our dynamic player population model, where the goal is to truthfully implement a desired outcome with dynamically changing populations of users with private values. This line of work goes back to Parkes and Singh (2003) in the computer science literature, but has also been considered much earlier with queuing models by Dolan (1978). In more recent work, Cavallo et al. (2010) offers a generalized VCG mechanism in an environment very similar to the one we are considering, with
departures and arrivals, and also provides a nice overview of work on truthful mechanisms in a dynamic setting. For a more complete overview, the reader is referred to the survey on dynamic auctions by Bergemann and Said (2010).
Further related work on learning in games.
There is a large literature analyzing learning in games, dating back to the work on fictitious play by Brown (1951). For an overview of this area, the reader is referred to the books of Fudenberg and Levine (1998) and Cesa-Bianchi and Lugosi (2006). A standard notion of learning in games is that of no-regret learning. The notion of no-regret against the best fixed action in hindsight dates back to the work of Hannan (1957), and is also referred to as Hannan consistency. There are many learning algorithms achieving this guarantee, such as regret matching by Hart and Mas-Colell (2000) and multiplicative weights updates by Freund and Schapire (1997).
Related work on learning in dynamic environments.
The notion of no-regret learning against time-varying benchmarks, as opposed to fixed actions, traces back to Herbster and Warmuth (1998), who provided guarantees compared to the best sequence of k experts. The stronger notion of adaptive regret, i.e. having guarantees for every sub-interval, was formalized by Hazan and Seshadhri (2007), and near-optimal adaptive regret guarantees were achieved through a series of algorithms by Lehrer (2003), Blum and Mansour (2007), Cesa-Bianchi et al. (2012), and Luo and Schapire (2015). One important trait of these algorithms is that they display some sort of recency bias, in the sense that the influence of past steps decays as time goes by. Recent experimental evidence by Fudenberg and Peysakhovich (2014) suggests that humans display such forms of recency bias when making repeated decisions.

Competing against an adaptive benchmark has also been studied in the context of online convex optimization. Besbes et al. (2013) compare to a target function that is changing from step to step. In order to guarantee some stability across steps, they require that the total variation distance between subsequent target functions is bounded by some number. This is a way to capture the notion that subsequent rounds are not very different, related to our notion of turnover probability, which in expectation guarantees a similar stability bound on the number of changes per step.
2. Preliminaries
Games and mechanisms.
We will consider a game played repeatedly, where the population of players is drifting over time. Let G be an n-player normal form stage game and assume that game G is played repeatedly T times. Each player i who participates in a stage game has a strategy space $S_i$, with $\max_i |S_i| = N$, a type $v_i \in V_i$ and a cost function $c_i(s; v_i)$ that depends on the strategy profile $s \in \times_i S_i$ and on his type. We will denote with $C(s; v) = \sum_{i \in [n]} c_i(s; v_i)$ the social cost, where s is a strategy profile and v a type profile. We will also analyze the case when the stage game is a utility maximization mechanism M, which takes as input a strategy profile and outputs an allocation $X_i(s)$ for each player and a payment $P_i(s)$. We will assume that players have quasi-linear utility $u_i(s; v_i) = v_i(X_i(s)) - P_i(s)$ and the welfare is the sum of valuations (sum of utilities of bidders and revenue of the auctioneer): $W(s; v) = \sum_{i \in [n]} v_i(X_i(s))$.

In all the games that we study, the optimal social welfare problem can equivalently be defined as an optimization over a "feasible solution space" $\mathcal{X}^n$ which involves no incentives (e.g. in network congestion games it is the set of feasible integral flows, in a combinatorial auction setting it is the set of feasible partitions of items to bidders, in the bandwidth allocation setting it is the set of valid partitions of the bandwidth). We will overload the social cost and welfare notations, and for a feasible solution (or allocation) $x \in \mathcal{X}^n$ we will use $C(x; v)$ and $W(x; v)$ to denote the social cost or welfare of the solution. We denote the optimal social cost or welfare for a type profile v as $\mathrm{Opt}(v) = \min_{x \in \mathcal{X}^n} C(x; v)$ and $\mathrm{Opt}(v) = \max_{x \in \mathcal{X}^n} W(x; v)$ respectively.

Definition 2.1 (Repeated game/mechanism with dynamic population). A repeated game with dynamic population consists of a stage game G played for T time steps. Let $P^t$ denote the set of players at time t, where each player $i \in P^t$ has a private type $v_i^t$. After each step, every player independently exits the game with a (small) probability p > 0 and is replaced by an arbitrary new player. We denote such a repeated game with dynamic population by Γ = (G, T, p). Similarly, we denote with M = (M, T, p) a mechanism that is played T times with player replacement probability p.

Our model of dynamic population assumes that after each step every player independently exits the game with a probability p > 0, so each player is expected to play the game for 1/p rounds. To keep our model simple, we make the assumption that when a player exits, she is replaced by a new participant. This assumption guarantees that we will have exactly n players in each iteration, with a p fraction of the population changing each iteration in expectation. We make no assumption about the types of the newly arriving players, which can be picked adversarially. Most of our results could be extended to the case when the players that are being replaced are also chosen adversarially, subject to some constraint on the number of per-step replacements.

To simplify the notation, we will use player i to denote the current i-th player, where this player is replaced by a new i-th player with probability p each round. An alternate view of the dynamic player population is to think of players as changing types after each iteration with a small probability p. We will refer to such a change as player i switches or turns over.
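To make the churn model concrete, the following sketch simulates the departure process (the function and variable names are ours, purely for illustration): each of the n player slots independently turns over with probability p per step, so a departed player's expected lifetime is 1/p rounds and about np slots are replaced per step in expectation.

```python
import random

def simulate_turnover(n, p, T, seed=0):
    """Simulate the dynamic-population model: after each of T steps,
    every one of the n players independently exits with probability p
    and is replaced by a fresh player (here, a fresh integer id)."""
    rng = random.Random(seed)
    players = list(range(n))          # ids of the current players
    next_id = n
    lifetimes = {i: 0 for i in players}
    replacements_per_step = []
    for _ in range(T):
        replaced = 0
        for slot in range(n):
            lifetimes[players[slot]] += 1
            if rng.random() < p:      # the player in this slot turns over
                players[slot] = next_id
                lifetimes[next_id] = 0
                next_id += 1
                replaced += 1
        replacements_per_step.append(replaced)
    return replacements_per_step, lifetimes

# With n = 1000 and p = 0.01 we expect about np = 10 replacements per step,
# and departed players live about 1/p = 100 rounds on average.
steps, lifetimes = simulate_turnover(n=1000, p=0.01, T=500)
print(sum(steps) / len(steps))        # close to 10
```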
Basic notation. For any quantity x we will denote with $x^T$ the sequence $x^1, \ldots, x^T$. For instance, $v_i^T$ will denote the sequence of types of player i produced by the random choice of leaving players and by the choices of the adversary.

We will consider three special classes of games, two welfare-maximization mechanisms and one cost-minimization game:
First-price Auction Game.
The auction games we consider are defined by a set of m goods, where we will assume that each good has a supply of s identical copies in each iteration. We assume for simplicity of presentation that the supply of each item is identical. The players are buyers who repeatedly participate in item auctions to buy copies of the items. Each buyer wants at most one copy of each item. The type of a buyer i is her valuation over sets of items.

We will use $v_i^t(A)$ to denote the valuation of the i-th player in iteration t if he gets at least one copy of each item in set $A \subseteq [m]$. We will assume that valuations are non-negative and at most 1. Last, we will assume that, conditional on having a set S, the marginal value of a player for any extra item j, i.e. $v_i^t(\{j\} \cup S) - v_i^t(S)$, is either 0 or at least some constant ρ. Valuations over time are additive, which models perishable items, such as advertising opportunities, where a player will play to repeatedly win items in each period she is participating.

We will focus the presentation on first price item auctions, where players submit a bid on each item separately: if we have s copies of an item, the s highest bidders for the item get one copy each, and pay their bid (ties are broken arbitrarily). The bid on each item comes from some sufficiently fine, discrete bid space. Specifically, bids are multiples of δ·ρ for some small δ and lie in [0, 1].

In Section 4.1 we consider unit-demand bidders; there we use $v_i^t(j)$ to denote the value of an item j for buyer i at time t, so the player's value for a set A is $v_i^t(A) = \max_{j \in A} v_i^t(j)$. In this application, we will assume that players bid on at most one item at each iteration. Thus the number of strategies available to each player is $N = \frac{m}{\delta \cdot \rho}$.

In Section 5.3 we consider large markets of first price item auctions with players that have more complex valuations satisfying the gross substitute property. In this application, we will assume that players want at most d types of different items, i.e. for any set A: $v_i^t(A) = \max_{T \subseteq A: |T| = d} v_i^t(T)$, and we assume that they will bid on only d different auctions. Thus the number of strategies available to them is $N = \binom{m}{d} \left(\frac{1}{\delta \cdot \rho}\right)^d \leq \left(\frac{m}{\delta \cdot \rho}\right)^d$.
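To illustrate the stage mechanism, here is a minimal sketch of one round of simultaneous first price item auctions (the helper names are ours; the paper itself only specifies the rules): on each item, the s highest bidders win one copy each and pay their bids.

```python
def run_simultaneous_first_price(bids, supply):
    """One round of simultaneous first price item auctions.

    bids:   dict mapping item j -> list of (bidder_id, bid) pairs,
            where each bid is a multiple of delta * rho in [0, 1].
    supply: number s of identical copies of each item.

    The s highest bidders on each item win one copy each and pay
    their bid (ties broken here by bidder id, since the model allows
    arbitrary tie-breaking).
    """
    allocation = {}   # item -> list of winning bidder ids
    payments = {}     # bidder -> total payment this round
    for item, item_bids in bids.items():
        ranked = sorted(item_bids, key=lambda ib: (-ib[1], ib[0]))
        winners = ranked[:supply]
        allocation[item] = [bidder for bidder, _ in winners]
        for bidder, bid in winners:
            payments[bidder] = payments.get(bidder, 0.0) + bid
    return allocation, payments

# Unit-demand example: each bidder bids on a single item per round.
alloc, pay = run_simultaneous_first_price(
    {"a": [(1, 0.4), (2, 0.3)], "b": [(3, 0.2)]}, supply=1)
print(alloc, pay)   # {'a': [1], 'b': [3]} {1: 0.4, 3: 0.2}
```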
Proportional bandwidth allocation mechanism. The proportional bandwidth allocation mechanism, introduced by Kelly (1997) and first studied for price of anarchy by Johari and Tsitsiklis (2011), is defined by a bandwidth of B and a valuation function for each player which is concave in the bandwidth she receives. At every round, each player i submits a bid $b_i^t$, pays her bid and gets allocated bandwidth proportional to her bid, i.e. $x_i(b^t) = \frac{b_i^t}{\sum_j b_j^t}$.

In this setting the type of the player is her valuation function. We will use $v_i^t(x_i)$ to denote player i's valuation for bandwidth $x_i$. We will make the assumption that the slope of the valuation functions is lower bounded by some ρ > 0. Player i's utility will again be quasilinear, i.e. $u_i^t(b) = v_i^t(x_i(b^t)) - b_i^t$. Similarly as before, we will assume that bids are only multiples of ρδ for some δ > 0. Therefore the bidding space is sufficiently discrete and the number of strategies available is at most $N = \frac{1}{\rho \cdot \delta}$.
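A minimal sketch of one round of the proportional mechanism follows (the function names are ours): each player pays her bid and receives a bandwidth share proportional to it, and her utility is quasilinear.

```python
def proportional_allocation(bids, B=1.0):
    """Kelly's proportional bandwidth mechanism for one round:
    player i pays b_i and receives x_i = B * b_i / sum_j b_j."""
    total = sum(bids.values())
    if total == 0:
        return {i: 0.0 for i in bids}
    return {i: B * b / total for i, b in bids.items()}

def quasilinear_utility(value_fn, bids, i, B=1.0):
    """u_i(b) = v_i(x_i(b)) - b_i for a concave valuation value_fn."""
    x = proportional_allocation(bids, B)
    return value_fn(x[i]) - bids[i]

# Example with a concave valuation v(x) = x ** 0.5:
bids = {"i": 0.2, "j": 0.3}
print(proportional_allocation(bids))                       # i gets 0.4, j gets 0.6
print(quasilinear_utility(lambda x: x ** 0.5, bids, "i"))  # sqrt(0.4) - 0.2
```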
Atomic congestion game. In the atomic congestion game, we assume that we have a set of congestible elements E (and let m = |E|), where each element e has a latency function $\ell_e(x)$, or cost, that is monotone non-decreasing in the congestion x. Given some selection of sets $s_i \subseteq E$ for each player i, the congestion on an element e is the number of players that have selected it: $x_e(s) = |\{i : e \in s_i\}|$, and the cost of player i is then the sum $\sum_{e \in s_i} \ell_e(x_e(s))$.

A player's type $v_i^t$ determines the possible subsets of the element set she can select. For example, in the routing game on a graph, the type of a player i is a source-sink pair $(o_i, d_i)$, and her strategy is the choice of a path from $o_i$ to $d_i$ in the graph. We assume that a player's cost is infinity if her solution is not one of the sets allowed by her type. Thus the number of strategies available to the player is the number of $(o_i, d_i)$ paths in the graph, and thereby N is the maximum number of such possible paths across possible source-sink pairs.
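The cost structure just described is easy to state in code; the sketch below (names ours) computes each player's congestion cost from the chosen element sets.

```python
from collections import Counter

def congestion_costs(strategies, latency):
    """Costs in an atomic congestion game.

    strategies: dict player -> set of elements (e.g. edges of a path).
    latency:    dict element -> latency function of the congestion x_e(s).

    Returns dict player -> sum of ell_e(x_e(s)) over e in s_i.
    """
    # x_e(s) = |{i : e in s_i}|
    load = Counter(e for s_i in strategies.values() for e in s_i)
    return {i: sum(latency[e](load[e]) for e in s_i)
            for i, s_i in strategies.items()}

# Linear latencies ell_e(x) = x on a two-edge network:
lat = {"e1": lambda x: x, "e2": lambda x: x}
print(congestion_costs({"p1": {"e1"}, "p2": {"e1", "e2"}}, lat))
# p1 pays ell(2) = 2 on e1; p2 pays 2 + 1 = 3
```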
Adaptive Learning in Dynamic Environments. We use the notion of adaptive regret introduced by Hazan and Seshadhri (2007). We start by defining no-regret learning, and then consider adaptive regret. To formally define regret, and no-regret learning, we consider an arbitrary loss function. For a cost-game, we will think of the cost the player incurs as loss. For a utility game, we define the loss at each step as the difference between the maximum possible utility and the player's utility. Consider a player who has N possible choices, the N strategies that the player has to choose from. In defining regret, and no-regret learning, we are focusing on a single player, and hence we will temporarily drop the index i for the player from the notation. We use L(s, t) to denote the loss (cost or lost utility) of the player if she plays strategy s at time t. We can assume without loss of generality that L(s, t) is a value in [0, 1]. A no-regret learner aims to do at least as well as the best single strategy $s^*$ with hindsight. Formally, we say the regret of a strategy sequence $s^T$ is

$$\sum_{t=1}^{T} L(s^t, t) - \min_{s^*} \sum_{t=1}^{T} L(s^*, t)$$

Note that even with a stable set of players the value L(s, t) will vary over time, depending on the strategies chosen by other players. As mentioned in the Introduction, there are many simple algorithms (see, for example, (Arora et al. 2012)) that achieve regret $O(\sqrt{T})$ against any (adversarial) sequence of loss values L(s, t).

In dealing with changing environments, we will need a stronger assumption on the learning of the players: we need that the players adapt their strategies to the environment. We will use a notion of adaptive regret, regret over (long) intervals of time $[\tau_1, \tau_2)$, in addition to the regret of the whole sequence, defined by Hazan and Seshadhri (2007).

Definition 2.2 (Adaptive Regret).
The adaptive regret of strategy sequence $s^T$ in time frame $[\tau_1, \tau_2)$ is defined as:

$$R(\tau_1, \tau_2) = \max_{s^*} \sum_{t = \tau_1}^{\tau_2 - 1} \left( L(s^t, t) - L(s^*, t) \right)$$

Adaptive learning algorithms go back to the work of Lehrer (2003) and Blum and Mansour (2007), who considered more general notions of regret. We say that a player satisfies adaptive learning if her regret $R(\tau_1, \tau_2)$ can be bounded by a function that is $o(\tau_2 - \tau_1)$, that is, regret grows slower than linearly over time. Our results are affected by the quality of the learning algorithm players use, as with better learning we can tolerate higher turnover in the population of players. In the rest of the paper we will use the learning bounds of the recent work of Luo and Schapire (2015), who developed an adaptation of the classical Hedge algorithm, AdaNormalHedge, that achieves small regret on all intervals. An alternate algorithm with a bound of the same type was also given in (Blum and Mansour 2007).

Theorem 2.1 ((Luo and Schapire 2015)). Suppose a player uses AdaNormalHedge and selects strategy sequence $s^T$. For any time frame $[\tau_1, \tau_2)$, AdaNormalHedge achieves adaptive regret:

$$E(R(\tau_1, \tau_2)) \leq C_R \sqrt{(\tau_2 - \tau_1) \ln(N \tau_2)}$$

where N is the number of choices, $C_R$ is a small constant, and the loss is assumed to be in [0, 1] for all s and t.

In what follows, we will assume that all players in our repeated game use a learning algorithm with low adaptive regret, and will use $R_i(\tau_1, \tau_2)$ to denote the adaptive regret of player i over the period $[\tau_1, \tau_2]$. For simplicity of presentation, we will assume that, for some constant $C_R$, $E(R(\tau_1, \tau_2)) \leq C_R \sqrt{(\tau_2 - \tau_1) \ln(N \tau_2)}$ for all players and all time periods $[\tau_1, \tau_2)$. Throughout the paper, we will refer to this assumption as "the players use adaptive learning algorithms with constant $C_R$." Our results would smoothly degrade if we assumed only that players achieve adaptive regret that is some other sublinear concave function of the interval's length $(\tau_2 - \tau_1)$.
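Definition 2.2 can be evaluated directly from a table of realized losses; the sketch below (names ours) computes $R(\tau_1, \tau_2)$ for a played sequence against the best fixed strategy on that window. It is a brute-force evaluation of the definition, not an implementation of AdaNormalHedge.

```python
def adaptive_regret(losses, play, tau1, tau2):
    """Adaptive regret of the played sequence on the window [tau1, tau2).

    losses: losses[t][s] = L(s, t), the loss of strategy s at time t,
            assumed to lie in [0, 1].
    play:   play[t] = strategy s^t actually chosen at time t.

    R(tau1, tau2) = max_{s*} sum_{t=tau1}^{tau2-1} (L(s^t, t) - L(s*, t)).
    """
    strategies = losses[tau1].keys()
    incurred = sum(losses[t][play[t]] for t in range(tau1, tau2))
    best_fixed = min(sum(losses[t][s] for t in range(tau1, tau2))
                     for s in strategies)
    return incurred - best_fixed

losses = [{"a": 0.0, "b": 1.0}, {"a": 1.0, "b": 0.0}, {"a": 1.0, "b": 0.0}]
play = ["a", "a", "a"]
print(adaptive_regret(losses, play, 0, 3))  # incurred 2 vs. best fixed "b" (loss 1): regret 1
print(adaptive_regret(losses, play, 1, 3))  # on the sub-window, regret 2
```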
Solution-based Smoothness in Games and Mechanisms. Smooth games were introduced by Roughgarden (2009) as a general framework for bounding the price of anarchy in games. He also showed that smoothness based price of anarchy bounds extend to outcomes in repeated games when all players use no-regret learning.

We need a somewhat more general variant of smooth games, which compares the cost or utility resulting from a strategy choice to the social welfare of a specific solution, rather than comparing to the social optimum. For two strategy vectors s and $s^*$ we use $(s_i^*, s_{-i})$ to denote the vector where player i uses strategy $s_i^*$ and all other players j use their strategy $s_j$.

Definition 2.3 (Solution-based smooth game). A cost-minimization game G is (λ, µ)-smooth with respect to a solution x if, for some λ > 0 and µ < 1, for any type profile v, for each player i there is a strategy $s_i^* \in S_i$, depending on his type $v_i$ and her part of the solution $x_i$, such that for any strategy profile s

$$\sum_i c_i(s_i^*(v_i, x_i), s_{-i}; v_i) \leq \lambda C(x; v) + \mu C(s; v)$$

A game G is solution-based (λ, µ)-smooth if it is smooth with respect to any feasible solution $x \in \mathcal{X}^n$. Note that, when x is the optimal solution, we recover the traditional examples of smooth games, as the deviating strategy $s^*$ usually depends on other players' types through his part of the optimal solution $x_i^*(v)$. A game that is (λ, µ)-smooth with respect to the optimal solution $x^*(v)$ is (λ, µ)-smooth in the sense of (Roughgarden 2009), and the game has price of anarchy bounded by λ/(1 − µ); the average social cost of no-regret learning outcomes is also bounded by $\frac{\lambda}{1-\mu} \mathrm{Opt}$. More generally,
Theorem 2.2. If a game is (λ, µ)-smooth with respect to a solution x, then at any Nash equilibrium of the game, as well as at any no-regret learning outcome, the expected cost is at most $\frac{\lambda}{1-\mu} C(x; v)$.

Proof of Theorem 2.2. We include the proof for the case of pure Nash equilibria for completeness. Consider a strategy vector s that is a Nash equilibrium. At a Nash equilibrium, no player has regret for any alternate strategy, so in particular we get that $c_i(s_i^*(v_i, x_i), s_{-i}; v_i) \geq c_i(s; v_i)$ for all i. Adding up these inequalities and using the smoothness property, we get

$$C(s; v) = \sum_{i \in [n]} c_i(s; v_i) \leq \sum_i c_i(s_i^*(v_i, x_i), s_{-i}; v_i) \leq \lambda C(x; v) + \mu C(s; v) \quad (2.1)$$

The claimed bound follows by rearranging the terms. The proof extends to randomized equilibria by taking expectations, including the distribution resulting from no-regret learning in the limit. □

Syrgkanis and Tardos (2013) give a related definition for smooth mechanisms, assuming quasi-linear valuations for all players. Again, we define a mechanism smooth with respect to a solution x, and allow the choice of strategy $s^*$ to depend on the player's part of the solution $x_i$ and his type $v_i$. More formally, we will use the following definition.
Definition 2.4 (Solution-based smooth mechanism).
A mechanism M is (λ, µ)-smooth with respect to a solution x, for some λ, µ ≥ 0, if for any type profile v, for each player i there exists a deviating strategy $s_i^* \in S_i$, depending on $v_i$ and $x_i$, such that for all strategy vectors s,

$$\sum_i u_i(s_i^*(v_i, x_i), s_{-i}; v_i) \geq \lambda W(x; v) - \mu R(s),$$

where $R(s) = \sum_{i=1}^{n} P_i(s)$. M is a solution-based (λ, µ)-smooth mechanism if the latter holds for any feasible solution $x \in \mathcal{X}^n$.

Syrgkanis and Tardos (2013) proved that a (λ, µ)-smooth mechanism has price of anarchy bounded by max(µ, 1)/λ, and the average social welfare of no-regret learning outcomes is also at least $(\lambda / \max(\mu, 1)) \, \mathrm{Opt}(v)$. Analogously we get:

Theorem 2.3.
If a mechanism is (λ, µ)-smooth with respect to a solution x, then at any Nash equilibrium of the game, as well as at any no-regret learning outcome, the expected social welfare is at least $\frac{\lambda}{\max(\mu, 1)} W(x; v)$.

Differential privacy. Differential privacy has been developed for databases storing private information for a population. A database $D \in V^n$ is a vector of inputs, one for each player. Two databases are i-neighbors if they differ just in the i-th coordinate, i.e. differ only in the input of the i-th player. If two databases are i-neighbors for some i, they are called neighboring databases.

In the context of repeated games, every time a player leaves or arrives, the solution may change drastically. Instead of comparing the game outcomes to the socially optimal solution that changes with every player change, we will want to compare the outcome to a more stable but close to optimal solution. The notion of differential privacy offers a useful framework for this goal.

Dwork et al. (2006) define an algorithm as differentially private if one person's information has little influence on the outcome. In the setting of a game or mechanism, the outcome for player i clearly should depend on player i's input (her claimed valuation, or source-destination pair), so it cannot be differentially private. The notion of joint differential privacy was developed by Kearns et al. (2014) to adapt differential privacy to settings where the algorithm produces a set of n outcomes, one for each player. We use X to denote the set of possible outcomes for one player, so an algorithm in this context is a function $A: V^n \to X^n$. The algorithm is jointly differentially private if, for all players i, the output for all other players is differentially private in the input of player i. More formally,

Definition 2.5 ((Kearns et al. 2014)).
An algorithm $A: V^n \to X^n$ is (ε, δ)-jointly differentially private if for every i, for every pair of i-neighbors $D, D' \in V^n$, and for every subset of outputs $S \subseteq X^{n-1}$,

$$\Pr[A(D)_{-i} \in S] \leq \exp(\epsilon) \Pr[A(D')_{-i} \in S] + \delta$$

If δ = 0, we say that A is ε-jointly differentially private.

We will see that close to optimal and jointly private solutions, along with smoothness with respect to the sequence of solutions $x^t$, can be used to show the strength of learning outcomes in our setting. Over the last few years a number of algorithms have been developed that solve problems close to optimally in a differentially private way; see the recent book of Dwork and Roth (2014) for a survey. In this paper, we will take advantage of such algorithms, including the algorithms for solving matching problems (Hsu et al. 2014) and finding socially optimal routing (Rogers et al. 2015).
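As background for how such private algorithms are typically built, the sketch below shows the standard Laplace mechanism of Dwork et al. (2006) for a single numeric query; it is the generic ε-differentially private primitive, not the matching or routing algorithms of Hsu et al. (2014) or Rogers et al. (2015).

```python
import math
import random

def laplace_mechanism(query_value, sensitivity, epsilon, rng=random):
    """Standard epsilon-differentially private release of a numeric query:
    add Laplace noise with scale sensitivity / epsilon. If changing one
    player's data moves the true answer by at most `sensitivity`, then
    the output distribution changes by a factor of at most exp(epsilon)."""
    scale = sensitivity / epsilon
    u = rng.random() - 0.5                      # uniform on (-1/2, 1/2)
    # Inverse-CDF sample of Laplace(0, scale):
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return query_value + noise

# A count query ("how many players demand item j?") has sensitivity 1:
print(laplace_mechanism(query_value=42, sensitivity=1.0, epsilon=0.5))
```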
Marginal privacy. A recent work of Kannan et al. (2014) introduced the weaker notion of marginal differential privacy, also in the setting where the algorithm outputs a set of n outcomes, one for each player. A mechanism is marginally differentially private if the distribution of outcomes for any one player j is differentially private in the input of any other player i ≠ j, without requiring that the combined output of all players j ≠ i be differentially private in the i-th input. Our main results continue to hold even under this weaker notion of privacy. However, since no improved approximation algorithms are known under this notion for the settings that we study, we focus on joint privacy in the main part of the paper and present the extension in Appendix EC.2.
3. Price of Anarchy for Dynamic Games and Mechanisms
In this section we present our two main theorems, which follow the high level outline presented in Section 1. Specifically, we formalize the connection between adaptive learning, solution-based smoothness and the existence of approximately optimal and stable solution sequences. We give this connection both in the context of cost-minimization games and in the context of mechanisms. In the next section we give an application of the framework to unit-demand matching markets and bandwidth allocation, and in Section 5 we provide a more canonical approach towards producing stable sequences by connecting the problem to differential privacy, along with a way to relax the required stability notion.

Definition 3.1 (k-stable sequence). A randomized sequence of solutions $x^T = \{x^1, \ldots, x^T\}$ and types $v^T = \{v^1, \ldots, v^T\}$ is k-stable if the average (across players) expected number of changes in each individual player's solution or type is at most k, i.e., if $k_i(v_i^T, x_i^T)$ is the number of times that $x_i^t \neq x_i^{t+1}$ or $v_i^t \neq v_i^{t+1}$, then:

$$\frac{1}{n} \sum_{i=1}^{n} E\left[k_i\left(v_i^T, x_i^T\right)\right] \leq k$$

Theorem 3.1 (Main theorem for cost-minimization games). Consider a repeated cost game with dynamic population Γ = (G, T, p), such that the stage game G is solution-based (λ, µ)-smooth and costs are bounded in [0, 1]. Suppose that $v^T$ and $x^T$ is a k-stable sequence, such that $x^t$ is feasible (pointwise) and α-approximately (in-expectation) optimal for each t, i.e. $E[C(x^t; v^t)] \leq \alpha \cdot E[\mathrm{Opt}(v^t)]$. If players use an adaptive learning algorithm with constant $C_R$ then:

$$\sum_t E[C(s^t; v^t)] \leq \frac{\lambda \alpha}{1 - \mu} \sum_t E[\mathrm{Opt}(v^t)] + \frac{n}{1 - \mu} \cdot C_R \sqrt{T \cdot (k + 1) \cdot \ln(NT)}$$

An analogue of the theorem above holds for mechanisms too.
Theorem 3.2 (Main theorem for mechanisms). Consider a repeated mechanism with dynamic population M = (M, T, p), such that the stage mechanism M is solution-based (λ, µ)-smooth and utilities are bounded in [0, 1]. Suppose that $v^T$ and $x^T$ is a k-stable sequence, such that $x^t$ is feasible (pointwise) and α-approximately optimal (in-expectation) for each t, i.e. $\alpha \cdot E[W(x^t; v^t)] \geq E[\mathrm{Opt}(v^t)]$. If players use an adaptive learning algorithm with constant $C_R$ then:

$$\sum_t E[W(s^t; v^t)] \geq \frac{\lambda}{\alpha \max\{1, \mu\}} \sum_t E[\mathrm{Opt}(v^t)] - n \cdot C_R \sqrt{T \cdot (k + 1) \cdot \ln(NT)}$$

We also show an improved bound for some classes of mechanisms that satisfy a non-negative utility property, which we will use in our application in Section 4.1. For the case of simultaneous single-item first price auctions with unit-demand bidders, it leverages the fact that by bidding only on one item at a time, player utilities are guaranteed to be nonnegative at all times, and only a subset of the players (e.g. at most m in the case of an m item auction) are being allocated in any feasible allocation. Under these conditions, players with no item in the feasible allocation will have no regret against a deviating strategy that attempts to "win" the empty allocation. For a general mechanism M the required property is stated as follows:

Property 1. M has an empty allocation ∅ in the allocation space. Moreover, $u_i(s_i^*(v_i, \emptyset), s_{-i}) = 0$ and $u_i(s; v_i) \geq 0$ for all strategy profiles s.

Theorem 3.3 (Improved bound for mechanisms). Consider a repeated mechanism with dynamic population M = (M, T, p), such that the stage mechanism M is solution-based (λ, µ)-smooth, satisfies Property 1 and utilities are in [0, 1]. Assume that there exists a randomized sequence of solutions $x^T = \{x^1, \ldots, x^T\}$ and types $v^T = \{v^1, \ldots, v^T\}$, such that $x^t$ is feasible (pointwise) and α-approximately optimal (in-expectation) for each t, i.e. $\alpha \cdot E[W(x^t; v^t)] \geq E[\mathrm{Opt}(v^t)]$.

For each player i, let $\kappa_i(v_i^T, x_i^T)$ be the number of times that $x_i^t \neq x_i^{t+1}$ or ($x_i^t \neq \emptyset$ and $v_i^t \neq v_i^{t+1}$). (Observe that, unlike the definition of $k_i(v_i^T, x_i^T)$, $\kappa_i(v_i^T, x_i^T)$ does not account for changes in the type of players that are not currently allocated an item in solution $x_i^t$.) If the randomized sequence satisfies an analogue of k-stability:

$$\frac{1}{n} \sum_{i=1}^{n} E\left[\kappa_i\left(v_i^T, x_i^T\right)\right] \leq k \quad (3.1)$$

and players use an adaptive learning algorithm with constant $C_R$ then:

$$\sum_t E[W(s^t; v^t)] \geq \frac{\lambda}{\alpha \max\{1, \mu\}} \sum_t E[\mathrm{Opt}(v^t)] - C_R \sqrt{T \cdot m \cdot (k \cdot n + m) \cdot \ln(NT)}$$

where m is such that for any feasible allocation x, $|\{i : x_i \neq \emptyset\}| \leq m$.

Removing the dependence on T. In all the theorems of this section there is a logarithmic dependence of the average regret on the time horizon T. In the efficiency theorems throughout the paper, this will lead to requiring that the probability of change p be at most a quantity that is inversely proportional to log(T). As we want to think of T as a really large quantity, one might argue that this dependence makes the requirements on p very harsh. However, we note that this dependence on T is not essential and is kept only for simplicity of exposition. The quantity that should actually enter the regret bounds presented in this section is of the order of the expected lifespan of any player in the repeated game, which is of the order of 1/p.
Therefore the log(T) terms in the theorems of this section can be replaced by terms that are roughly O(log(1/p)). In Section EC.4 of the supplementary material we formalize this argument and provide a detailed proof of how to remove the dependence on T in all our theorems.
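To make Definition 3.1 concrete, the following sketch (names ours) measures the realized number of changes in a deterministic solution/type sequence; k-stability requires this average to be at most k in expectation over the randomness of the sequence.

```python
def stability_of_sequence(solutions, types):
    """Average per-player number of changes (Definition 3.1).

    solutions: solutions[t][i] = x_i^t, player i's part of the solution at t.
    types:     types[t][i]     = v_i^t, player i's type at t.

    Returns the smallest k for which this (deterministic) sequence is
    k-stable: (1/n) * sum_i k_i, where k_i counts the steps t with
    x_i^t != x_i^{t+1} or v_i^t != v_i^{t+1}.
    """
    T = len(solutions)
    n = len(solutions[0])
    changes = [sum(1 for t in range(T - 1)
                   if solutions[t][i] != solutions[t + 1][i]
                   or types[t][i] != types[t + 1][i])
               for i in range(n)]
    return sum(changes) / n

# Two players over three steps; only player 1's allocation ever changes:
xs = [["item_a", "item_b"], ["item_a", "item_c"], ["item_a", "item_c"]]
vs = [[0.9, 0.5]] * 3
print(stability_of_sequence(xs, vs))  # (0 + 1) / 2 = 0.5
```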
4. Stable Sequences via Greedy Algorithms
In this section we offer direct arguments to show the existence of stable solution sequences, and hence good efficiency results for games with dynamic population. We prove efficiency results for the case of matching markets with dynamic population and the case of proportional bandwidth allocation with dynamic population. Our method is based on a combination of using the greedy algorithm and rounding the input parameters.
4.1. Matching Markets with Unit-Demand Bidders

As a first application we focus on a repeated mechanism with dynamic population M = (M, T, p), where the stage mechanism is a simultaneous first price auction with unit-demand bidders (matching markets). To apply our improved theorem, Theorem 3.3, we need two things: i) that the mechanism is allocation-based (λ, µ)-smooth, and ii) that there exists a relatively stable sequence of approximately optimal solutions for the optimization problem. We start by showing that the mechanism is smooth.
Lemma 4.1 (Smoothness of simultaneous first price auction). The simultaneous first price mechanism where players are restricted to bid on at most d items and on each item submit a bid that is a multiple of δ · ρ, is a solution-based ((1−δ)/2, 1)-smooth mechanism, when players have submodular valuations such that all marginals are either 0 or at least ρ, and such that each player wants at most d items, i.e., v_i(S) = max_{T ⊆ S : |T| ≤ d} v_i(T).

To get a stable and approximately optimal allocation, we use a layered version of the greedy algorithm. The greedy matching algorithm considers item valuations v_i(j) in decreasing order and assigns item j to player i if, when v_i(j) is considered, neither item j nor player i is matched. To make this algorithm more stable we define the greedy-layered matching algorithm, which works as follows. Let ρ > 0 be the minimum possible non-zero valuation. For a parameter 0 < ǫ ≤ 1/3, we round each player's value down to the closest number of the form ρ(1 + ǫ)^ℓ for some integer ℓ, and run the greedy algorithm with these rounded values. It is well known that the greedy algorithm guarantees a solution that is within a factor of 2 of optimal. We lose an additional factor of (1 + ǫ) by working with the rounded values. The greedy algorithm will have many ties, and we will resolve ties in a way that makes the output stable.

Lemma 4.2 (Stability via the greedy algorithm). Consider a repeated matching market mechanism with dynamic population M = (M, T, p), with m items and n players, where ρ is the minimum possible non-zero valuation. Assuming T ≥ 1/p, the greedy-layered matching algorithm with parameter ǫ guarantees that W(x^t; v^t) ≥ \frac{1}{2(1+\epsilon)} Opt(v^t) for all t, and it can be implemented so that the average (over players) expected number of changes in the allocation sequence, or in the type of players who hold an item at the time of the change, is upper bounded by
\[ \frac{1}{n}\sum_{i=1}^n E\big[\kappa_i(v_i^T, x_i^T)\big] \le \frac{5 \cdot T \cdot m \cdot p \cdot \log_{1+\epsilon}(1/\rho)}{n} \quad (4.1) \]
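The following is a minimal illustrative sketch of the greedy-layered matching algorithm just analyzed. The layer function implements the rounding to the powers ρ(1+ǫ)^{ℓ−1}, and the incumbent-favoring tie-breaking within a layer hints at the stability-friendly rule of Lemma 4.2; the full reassignment process after arrivals and departures is omitted.

import math

def layer(value, rho, eps):
    # Layer of a value: highest l with value >= rho * (1 + eps)**(l - 1);
    # 0 encodes "no value". E.g., any value in [rho, rho*(1+eps)) is in layer 1.
    if value < rho:
        return 0
    return 1 + int(math.log(value / rho, 1 + eps))

def greedy_layered_matching(values, rho, eps, incumbents=None):
    # values[i][j]: player i's value for item j; incumbents: item -> player
    # from the previous step, favored when breaking ties within a layer.
    incumbents = incumbents or {}
    candidates = [(layer(v, rho, eps), incumbents.get(j) == i, i, j)
                  for i, row in enumerate(values)
                  for j, v in enumerate(row) if v >= rho]
    candidates.sort(reverse=True)  # by layer, then incumbents first
    matching, matched = {}, set()
    for lvl, _, i, j in candidates:
        if i not in matched and j not in matching:
            matching[j] = i
            matched.add(i)
    return matching

# Illustrative run: two players, two items.
# print(greedy_layered_matching([[0.9, 0.2], [0.8, 0.7]], rho=0.1, eps=1/3))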
Theorem 4.1 (Main theorem for matching markets). In the simultaneous first price auction mechanism with dynamic population and unit-demand bidders, if all players use adaptive learning algorithms with constant C_R and if T ≥ 1/p, we have:
\[ \sum_t E[W(s^t;v^t)] \ge \frac{1-\delta}{4(1+\epsilon)}\sum_t E[Opt(v^t)] - mT \cdot C_R\sqrt{6 \cdot p \cdot \log_{1+\epsilon}(1/\rho) \cdot \ln(NT)} \quad (4.2) \]
where N is the number of different strategies considered by a player. If in addition we assume that all items get allocated in each round for the minimum value of ρ, or that the average optimal welfare in each round is at least mρ, that is, \frac{1}{T}\sum_{t=1}^T E[Opt(v^t)] \ge m\rho, then we can also get a purely multiplicative bound:
\[ \sum_t E[W(s^t;v^t)] \ge \frac{1-\delta-\epsilon}{4(1+\epsilon)}\sum_t E[Opt(v^t)] \quad (4.3) \]
if the turnover probability p is at most C \cdot \frac{\rho^2\epsilon^2}{\ln(NT)} for C = \big(96(1+\epsilon)^2 (C_R)^2 \log_{1+\epsilon}(1/\rho)\big)^{-1}.

Remark 4.1. An interesting feature of Theorem 4.1 is that the probability p is independent of the number of players n and the number of items m, implying that the game can accommodate extremely high turnover in the player population, as the number of players increases, without losing in the quality of the outcome. The probability p required for the high quality solution needs to depend only on log_{1+ǫ}(1/ρ), log N and log T, where N is bounded by m/(δρ), and the dependence on T can be removed as presented in Section EC.4 of the supplementary material.

The high-level intuition for why the greedy algorithm can sustain such a rate of change is as follows. At any time-step the only players that incur any non-zero regret are the players to whom the greedy solution currently allocates some item. Since the optimal welfare is at least m · ρ, if we want the efficiency to be ǫ-close to what is implied by having absolutely no regret for the greedy-layered algorithm, we need the total regret in the system to be at most ǫ · m · ρ. In other words, we need the regret associated with each item to be at most ǫ · ρ. Since reassignments only move an item to players of strictly higher layers, we can view the lifetime of an item as decomposing into p · T cycles such that during each cycle the item transitions from level-1 players to level-log_{1+ǫ}(1/ρ) players. (This is not completely accurate, as players can leave to other items too, but it is a good approximation.) In other words, the lifetime of an item splits into roughly pT log_{1+ǫ}(1/ρ) stable allocation intervals, leading to average interval length (p log_{1+ǫ}(1/ρ))^{−1} and thereby average regret at most \sqrt{p \cdot \log_{1+\epsilon}(1/\rho)}. Since we want this regret to be at most ǫ · ρ, we get p ≤ \frac{\rho^2\epsilon^2}{\log_{1+\epsilon}(1/\rho)}, which is essentially the bound we have in Theorem 4.1.

As a second application we focus on a repeated mechanism with dynamic population M = (M, T, p) where the stage mechanism is the proportional bandwidth allocation mechanism. Recall the bandwidth sharing mechanism, where every player i submits and pays a bid b_i, and the available bandwidth B (which we assume is B = 1 for notational simplicity) is divided proportionally to the players' bids, so bidder i gets bandwidth x_i(b) = b_i / \sum_j b_j and pays b_i. We assume that the player's utility is quasilinear, so if the player's valuation function is v_i(x) for x amount of bandwidth, then the resulting utility is u_i(b) = v_i(x_i(b)) − b_i. Following Kelly (1997) and Johari and Tsitsiklis (2011), we will assume that the players' valuation functions v_i : [0, B] → R are increasing, concave and differentiable. Further, we will make some Lipschitz-style assumptions on the rate of change of the value functions. Concretely, we will assume the following:
1. Value functions v_i(x) are increasing, concave and twice differentiable, with v_i(0) = 0 and v_i(B) ≤ 1.
2. The gradient is lower bounded by ρ, i.e., ∀ i, x : v′_i(x) ≥ ρ.
3. The gradient is α-Lipschitz, i.e., ∀ i, x : |v″_i(x)| ≤ α.

Following a similar approach as in the previous section, we can derive an efficiency guarantee in this setting too.

Theorem 4.2 (Main theorem for bandwidth allocation). Consider the proportional bandwidth sharing game with dynamic population and with valuations satisfying the conditions listed above. If all players use adaptive learning algorithms with constant C_R and if T ≥ 1/p, then we have:
\[ \sum_t E[W(s^t;v^t)] \ge \frac{(2-\sqrt{3}-\epsilon)(1-\epsilon)}{1+\epsilon}\sum_t E[Opt(v^t)] \]
if the turnover probability p is at most C \cdot \frac{\rho^4\epsilon^4}{\alpha^2\ln(NT)} for C = \big(96(1-\epsilon)^{-2}(C_R)^2\log_{1+\epsilon}(\alpha/(\epsilon\rho^2(1-\epsilon)))\big)^{-1}.

The high-level outline of the proof consists of three lemmas.
• As a benchmark optimization problem, we consider the δ-segmented bandwidth allocation problem for some δ > 0, where all allocated bandwidths are integer multiples of δ. We show that the Lipschitz condition above ensures that for a small enough δ > 0, the segmented optimum is not much smaller than the true optimum.
Lemma 4.3.
The social welfare of the optimal δ-segmented solution approximates the optimum within a factor of (1 − ǫ) if δ ≤ \frac{2\epsilon\rho}{\alpha}.
• To get a stable and approximately optimal allocation for the δ-segmented bandwidth problem, we use a layered version of the greedy algorithm, similar to our greedy matching algorithm in Section 4.1 (a sketch follows this outline). We divide the bandwidth into segments of length δ. The greedy bandwidth allocation algorithm greedily allocates segments based on the marginal increase in the players' valuation functions. We will denote by v_{i,j} the marginal valuation that player i has for her j-th segment. Note that, due to the concavity of the valuation function, v_{i,j} is a non-increasing function of j and, due to the lower bound on the gradient, it is at least ρδ. The greedy algorithm is therefore optimal for the δ-segmented bandwidth problem. To make it more stable, similarly to the matching markets, we use a layered version of the valuation functions, where the layer of a marginal valuation v_{i,j} is the highest ℓ such that v_{i,j} ≥ ρδ(1+ǫ)^{ℓ−1}. We will use ℓ^t(j) to denote the layer to which the j-th most valued (in marginal values) segment was assigned at step t. We will again select the tie-breaking rule across marginal values of the same layer to facilitate stability, i.e., previous holders of segments are helped in the tie-breaks to keep the same number of segments as they had before. We show that the greedy layered algorithm for the δ-segmented bandwidth allocation problem finds a solution within a (1+ǫ) factor of the welfare of the optimal δ-segmented solution, and that the sequence of solutions found by this greedy algorithm is stable. Lemma 4.4.
Consider a repeated δ-segmented bandwidth allocation game with dynamic population M = (M, T, p) and n players. Assuming T ≥ 1/p, the greedy layered algorithm with parameter ǫ guarantees that W(x^t; v^t) ≥ \frac{1}{1+\epsilon} Opt(v^t) for all t, and it can be implemented so that the average (over players) expected number of changes in the allocation sequence, or in the type of players who hold a segment at the time of the change, is upper bounded by
\[ \frac{1}{n}\sum_{i=1}^n E\big[\kappa_i(v_i^T, x_i^T)\big] \le \frac{5 \cdot T \cdot (1/\delta) \cdot p \cdot \log_{1+\epsilon}(1/(\rho\delta))}{n} \]
• Finally, we need to show that the proportional sharing mechanism is smooth. Syrgkanis and Tardos (2013) showed that the mechanism is (2 − √3, 1)-smooth. We show that, for any ǫ > 0, the proportional allocation mechanism with discretized bids is (2 − √3 − ǫ, 1)-solution-based smooth with respect to any solution of the δ-segmented bandwidth allocation problem, using a discretized deviation. Lemma 4.5.
The proportional mechanism allowing only bids that are multiples of ζ = ǫδ is (2 − √3 − ǫ, 1)-solution-based smooth with respect to any δ-segmented allocation.

Combining these lemmas, we use Theorem 3.3 to get the claimed efficiency result.
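Before the proof, here is a minimal illustrative sketch of the two objects involved: the proportional sharing rule and the (non-layered) δ-segmented greedy benchmark. The concrete valuations and parameter choices are only for illustration, and the layered rounding and stable tie-breaking of Lemma 4.4 are omitted.

import heapq

def proportional_allocation(bids):
    # Proportional sharing: player i receives b_i / sum_j b_j of the bandwidth.
    total = sum(bids)
    return [b / total for b in bids] if total > 0 else [0.0] * len(bids)

def greedy_segmented(valuations, delta):
    # Allocate the 1/delta bandwidth segments greedily by marginal value
    # v_i(x + delta) - v_i(x); by concavity the marginals are non-increasing,
    # so this greedy is optimal for the delta-segmented problem.
    n, segments = len(valuations), round(1 / delta)
    x = [0.0] * n
    heap = [(-(v(delta) - v(0.0)), i) for i, v in enumerate(valuations)]
    heapq.heapify(heap)
    for _ in range(segments):
        neg_marginal, i = heapq.heappop(heap)
        x[i] += delta
        if x[i] + delta <= 1 + 1e-9:   # next marginal only if capacity remains
            v = valuations[i]
            heapq.heappush(heap, (-(v(x[i] + delta) - v(x[i])), i))
    return x

# Illustrative run with concave valuations v(x) = a * sqrt(x):
# import math
# vals = [lambda x, a=a: a * math.sqrt(x) for a in (1.0, 0.5)]
# print(greedy_segmented(vals, delta=0.25), proportional_allocation([0.3, 0.1]))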
Proof of Theorem 4.2.
From Lemmas 4.3, 4.4, 4.5 and Theorem 3.3, setting δ = \frac{2\epsilon\rho(1-\epsilon)}{\alpha}, we get that the aggregate social welfare of the proportional allocation bandwidth mechanism is a \frac{(2-\sqrt{3}-\epsilon)(1-\epsilon)}{1+\epsilon} fraction of the optimum. This is achieved for turnover probability p:
\[ p \le \frac{(\rho\delta)^2\epsilon^2}{96(1+\epsilon)^2(C_R)^2\log_{1+\epsilon}(1/(\rho\delta))\ln(NT)}. \]
Replacing δ, the result follows. □
5. Stable Sequences via Differential Privacy
In this section we formally connect joint differential privacy with the construction of stable sequences needed by our main Theorems 3.1 and 3.2. In Appendix EC.2 we offer a strengthening of these theorems that allows us to use marginal differential privacy. Differential privacy offers a general framework to find solutions that are close to optimal, yet more stable to changes in the input than the optimum itself. To guarantee privacy, the output of the algorithm is required to depend only minimally on any player's input. This is exactly what we need in our framework.
Theorem 5.1 (Stable sequences via privacy). Suppose there exists an algorithm A : V^n → ∆(X^n) that is (ǫ, δ)-jointly differentially private, takes as input a valuation profile v and outputs a distribution over solutions such that a sample from this distribution is feasible with probability 1 − β, and is α-approximately efficient in expectation (for 0 ≤ ǫ ≤ 1/2, α ≥ 1 and δ, β ≥ 0).
Consider a sample v^T from the distribution of valuations produced by the adversary in a repeated cost-minimization game with dynamic population Γ = (G, T, p). There exists a randomized sequence of solutions x^T for the sequence v^T, such that for each 1 ≤ t ≤ T, x^t conditional on v^t is an α-approximation to Opt(v^t) in expectation, and the joint randomized sequence (v^T, x^T) is pT(1 + n(2ǫ + 2β + δ))-stable (as in Definition 3.1).

We defer the proof of Theorem 5.1 to the next subsection. Combining Theorem 5.1 with Theorem 3.1 and Theorem 3.2, we immediately get the following corollary.
Corollary 5.1.
Consider a repeated cost game with dynamic population Γ = (G, T, p), such that the stage game G is allocation-based (λ, µ)-smooth and T ≥ 1/p. Assume that there exists an (ǫ, δ)-joint differentially private algorithm A : V^n → X^n with error parameter β that satisfies the conditions of Theorem 5.1. If all players use adaptive learning algorithms with constant C_R in the repeated game, then the overall cost of the solution is at most:
\[ \sum_t E[C(s^t;v^t)] \le \frac{\lambda\alpha}{1-\mu}\sum_t Opt(v^t) + \frac{nT}{1-\mu} \cdot C_R\sqrt{p\big(1 + n(2\epsilon + 2\beta + \delta)\big)\ln(NT)} \]
Similarly, for a mechanism we get:
\[ \sum_t E[W(s^t;v^t)] \ge \frac{\lambda\alpha}{\max\{1,\mu\}}\sum_t E[Opt(v^t)] - \frac{nT}{\max\{1,\mu\}} \cdot C_R\sqrt{p\big(1 + n(2\epsilon + 2\beta + \delta)\big)\ln(NT)} \]
We will use total variation distance to measure the distance between distributions. For two distributions µ and η on some finite probability space Ω the following are two equivalent versions of the total variation distance:
\[ d_{tv}(\mu, \eta) = \tfrac{1}{2}\|\mu - \eta\|_1 = \max_{A \subseteq \Omega}\big(\mu(A) - \eta(A)\big), \quad (5.1) \]
where in the 1-norm in the middle we think of µ and η as vectors of probabilities over the possible outcomes. Lemma 5.1.
Suppose that A : V^n → ∆(X^n) is an (ǫ, δ)-joint differentially private algorithm with failure probability β (for 0 ≤ ǫ ≤ 1/2 and δ, β ≥ 0) that takes as input a valuation profile v and outputs a distribution over feasible solutions σ. Let σ and σ′ be the algorithm's outputs on two inputs v and v′ that differ only in coordinate i. Then we can bound the total variation distance between σ_{−i} and σ′_{−i} by d_{tv}(σ_{−i}, σ′_{−i}) ≤ 2ǫ + δ.

Proof of Lemma 5.1. Condition (2.5) of joint differential privacy guarantees that if we let S ⊆ X^{n−1} be a subset of possible solutions for players other than i, and denote with σ_{−i}(S) and σ′_{−i}(S) the probability that the two distributions assign to S, then for any S: σ_{−i}(S) ≤ exp(ǫ)σ′_{−i}(S) + δ. Since ǫ ≤ 1/2, we can use the bound exp(ǫ) ≤ 1 + 2ǫ to get that σ_{−i}(S) − σ′_{−i}(S) ≤ 2ǫσ′_{−i}(S) + δ ≤ 2ǫ + δ. Thus by the second definition of the total variation distance in Equation (5.1) we get that d_{tv}(σ_{−i}, σ′_{−i}) ≤ 2ǫ + δ. □

To facilitate the proof we need a simple lemma from basic probability theory.
Lemma 5.2 (Coupling Lemma). Let µ and η be two probability measures over a finite set Ω. There is a coupling ω of (µ, η), such that if the random variable (X, Y) is distributed according to ω, then the marginal distribution of X is µ, the marginal distribution of Y is η, and Pr[X ≠ Y] = d_{tv}(µ, η).
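For concreteness, the following illustrative sketch constructs such a maximal coupling for two distributions on a small finite set; it is the standard textbook construction and is only meant to make Lemma 5.2 tangible.

import numpy as np

def total_variation(mu, eta):
    # d_tv(mu, eta) = 0.5 * ||mu - eta||_1 = max_A (mu(A) - eta(A)).
    return 0.5 * float(np.abs(np.asarray(mu) - np.asarray(eta)).sum())

def maximal_coupling(mu, eta, rng=None):
    # Sample (X, Y) with X ~ mu, Y ~ eta and Pr[X != Y] = d_tv(mu, eta):
    # with probability 1 - d_tv we draw X = Y from the normalized overlap
    # min(mu, eta); otherwise X and Y are drawn from the normalized
    # residuals, whose supports are disjoint.
    rng = np.random.default_rng(rng)
    mu, eta = np.asarray(mu, float), np.asarray(eta, float)
    overlap = np.minimum(mu, eta)
    d = 1.0 - overlap.sum()          # equals d_tv(mu, eta)
    if rng.random() >= d:            # stay together
        x = rng.choice(len(mu), p=overlap / overlap.sum())
        return x, x
    x = rng.choice(len(mu), p=(mu - overlap) / d)
    y = rng.choice(len(eta), p=(eta - overlap) / d)
    return x, y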
Proof of Theorem 5.1. Suppose that A : V^n → ∆(X^n) is an (ǫ, δ)-joint differentially private algorithm as described in the statement of the theorem. The differentially private algorithm fails with probability β. We will denote with σ the output distribution over solutions for an input v, where we use the optimal solution in the low-probability event that the algorithm fails. (Equivalently, A could be a randomized algorithm and σ its implicit distribution over solutions.)

Let σ^1, ..., σ^T be the sequence of distributions output by the private algorithm when run on a deterministic sequence of valuation profiles v^1, ..., v^T with the modification described in the paragraph above. To simplify the discussion we will assume that only one player changes valuation at each time-step t. Essentially we are breaking every transition from time-step t to t + 1 into many sequential transitions where only one player changes at every time step, and then deleting the solutions from the resulting sequence that correspond to the added steps. Thus the number of steps within this proof should be thought of as being equal to n · p · T in expectation.

By differential privacy we know that the total variation distance of two consecutive distributions, without the modification of replacing failures with the optimal solution, is at most 2ǫ + δ. Since, by the union bound, the probability that either of the two consecutive runs of the algorithm fails is at most 2β, we can show that the total variation distance of the modified output is at most 2ǫ + δ + 2β, i.e., for any t ∈ [T]: d_{tv}(σ^{t+1}_{−i}, σ^t_{−i}) ≤ 2ǫ + δ + 2β (see Lemma 5.3 for a formal proof).

We can turn the sequence of distributions σ^1, ..., σ^T into a distribution of sequences of allocations x^T by coupling the randomness used to select the solutions in different distributions σ^t. To do this, we take advantage of the coupling lemma from probability theory, Lemma 5.2. If at step t no player changes values, then σ^t = σ^{t+1}, and we select the same outcome from the two distributions, so we get P[x^t_{−i} ≠ x^{t+1}_{−i}] = 0.

Now consider a step in which a player i changes her private type v_i. We use Lemma 5.2 to couple x^{t+1}_{−i} and x^t_{−i} so that
\[ P[x^{t+1}_{-i} \ne x^t_{-i}] = d_{tv}(\sigma^{t+1}_{-i}, \sigma^t_{-i}) \le 2\epsilon + \delta + 2\beta. \quad (5.2) \]
(One can think of this as sampling x^{t+1} conditional on x^t, assuming the joint distribution of x^t and x^{t+1} is as prescribed by the coupling lemma applied to σ^t and σ^{t+1}. This is to address concerns that x^t is already coupled with x^{t−1} in the previous step.)
Note that this couples the i-th coordinates x^{t+1}_i and x^t_i in an arbitrary manner, which is fine, as we assumed that the valuation of player i changes at this step.

We have defined a probability distribution of sequences x^T for every fixed sequence of valuations v^T. We extend this definition to random sequences of valuations in the natural way, adding the distribution of valuations v^T.

We claim that the resulting random sequence of (valuation, solution) pairs satisfies the statement of the theorem: the α-approximation follows by the guarantees of the private algorithm and by the fact that we use the optimal solution when the algorithm fails. Next we argue about the stability of the sequence. Consider a player i, and the distribution of her sequence (v_i^T, x_i^T). In each step t her valuation v^t_i changes with probability p, contributing pT in expectation to the number of changes. In a step t when the value of some other player j ≠ i changes, we use (5.2) to bound the probability that x^t_i ≠ x^{t+1}_i by 2ǫ + δ + 2β. Thus any change in the value of some other player j contributes (2ǫ + 2β + δ) to the expectation of the number of changes for player i. The expected number of such changes in other values is (n − 1)pT over the sequence, showing that the sequence is pT + (n − 1)pT(2ǫ + 2β + δ) ≤ pT(1 + n(2ǫ + 2β + δ))-stable, as claimed. □ Lemma 5.3.
Let q and q′ be the outputs of an (ǫ, δ)-joint differentially private algorithm with failure probability β, on two valuation profiles v and v′ that differ only in coordinate i. Let σ and σ′ be the modified outputs, where the outcome is replaced with the optimal outcome when the algorithm fails. Then: d_{tv}(σ, σ′) ≤ 2ǫ + δ + 2β. Proof of Lemma 5.3.
Consider two coupled random variables y, y′ as implied by Lemma 5.2 applied to distributions q and q′, such that y ∼ q and y′ ∼ q′ and Pr[y ≠ y′] = d_{tv}(q, q′) ≤ 2ǫ + δ (by (ǫ, δ)-joint privacy). Now consider two other random variables x and x′, where x = y except for the cases where y is the outcome of a failure, in which case x is equal to the welfare-optimal outcome, and similarly for x′ and y′. Obviously x ∼ σ and x′ ∼ σ′, thus (x, x′) is a valid coupling for distributions σ and σ′. Thus if we show that Pr[x ≠ x′] ≤ 2ǫ + δ + 2β, then by the properties of total variation distance d_{tv}(σ, σ′) ≤ Pr[x ≠ x′] ≤ 2ǫ + δ + 2β, which is the property we want to show. Let fail be the event that either y or y′ is the outcome of a failed run of the algorithm. Then by the union bound Pr[fail] ≤ 2β. Thus we have:
\[ \Pr[x \ne x'] = \Pr[x \ne x' \mid \neg\mathrm{fail}]\Pr[\neg\mathrm{fail}] + \Pr[x \ne x' \mid \mathrm{fail}]\Pr[\mathrm{fail}] \le \Pr[x \ne x' \mid \neg\mathrm{fail}]\Pr[\neg\mathrm{fail}] + 2\beta \]
\[ = \Pr[y \ne y' \mid \neg\mathrm{fail}]\Pr[\neg\mathrm{fail}] + 2\beta \le \Pr[y \ne y'] + 2\beta \le d_{tv}(q, q') + 2\beta \le 2\epsilon + \delta + 2\beta \]
This completes the proof of the Lemma. □

Our first application of differential privacy is for the atomic congestion game with dynamic population, defined in Section 2. Rogers et al. (2015) give a jointly differentially private algorithm for finding an optimal solution in congestion games, called the Private gradient descent algorithm. They focus on routing games due to that paper's focus on tolls as mediators, but their algorithm works in full generality for any atomic congestion game.

We illustrate our technique with linear latencies ℓ_e(x) = a_e x + b_e. We assume latency is monotone increasing, i.e., a_e > 0 for all e ∈ E, and that b_e ≥ 0. The algorithm of Rogers et al. (2015) assumes that ℓ_e(x) ≤ 1 for all e. To achieve this we need to scale latencies by n max_e(a_e + b_e). This makes the functions γ-Lipschitz for γ = 1/n. For this case, the algorithm outputs an integer solution that satisfies (ǫ, δ)-joint differential privacy and has an error probability of β: for parameters ǫ, δ, β > 0, on input v, with probability 1 − β it returns a solution x with cost in expectation over the randomization of the algorithm
\[ E[C(x;v)] \le Opt(v) + m^{3/2}\sqrt{n/\epsilon} \cdot \mathrm{polylog}(1/\epsilon, 1/\delta, 1/\beta, n, m). \quad (5.3) \]
We can combine this differentially private algorithm with Corollary 5.1 for classes of latency functions ℓ(x) for which we have good smoothness properties. The class of linear latencies ℓ_e(x) = a_e x + b_e is (5/3, 1/3)-smooth. Lemma 5.4.
Congestion games with linear latencies ℓ_e(x) = a_e x + b_e for a_e, b_e ≥ 0 are (5/3, 1/3)-smooth with respect to any solution x.

Theorem 5.2 (Main theorem for large congestion games). Consider a repeated congestion game with dynamic population Γ = (G, T, p), such that T ≥ 1/p and the stage game G is an atomic congestion game with affine latency functions ℓ_e(x) = a_e x + b_e with a_e > 0 and b_e ≥ 0 for all e. For any η > 0, if all players use adaptive learning algorithms with constant C_R, then the overall expected cost is bounded by
\[ \sum_t E[C(s^t;v^t)] \le \frac{5}{2}(1+\eta)\sum_t Opt(v^t) \]
assuming the probability p of departures is at most C · η^4 · m^{−10} · (ln T)^{−1} for
\[ C = c_0 \cdot (C_R)^{-2} \cdot \Big(\frac{\min_e a_e}{\max_e(a_e+b_e)}\Big)^4 \cdot \big(\log^2(12m^2 n)\,\ln(3n)\big)^{-1}, \]
where c_0 > 0 is an absolute constant. Remark 5.1.
We note that the probability p depends mainly on the number of congestible elements m, and depends on n only in a polylogarithmic way. For large n, almost a constant fraction of the player population can turn over at each step. In Appendix EC.3 we generalize the bound to polynomial latency functions, and also give additive error results for congestion games with general latency functions.
Next we revisit the first price auction game, but consider a much broader class of valuations: we consider large markets with valuations that satisfy the gross substitutes property. Hsu et al. (2014) give a jointly differentially private algorithm to find a close-to-optimal allocation in markets where buyers have the gross substitutes property and there are enough copies of each item. This algorithm will allow us to derive good welfare guarantees for outcomes of adaptive learning in repeated auctions with dynamic population using Corollary 5.1. We will assume that the valuation functions satisfy the gross substitutes property, i.e., increasing prices outside a subset doesn't decrease the player's demand in the set.
Definition 5.1 (Gross-substitute valuation).
For a price vector p let p(A) = \sum_{j \in A} p_j denote the total price, and let ω(p) denote the player's most desirable sets of goods, that is, let ω(p) = argmax_A v(A) − p(A). The valuation satisfies the gross substitutes condition if for every pair of price vectors (p, p′) such that p_j ≤ p′_j for all items j, and for every set of goods S ∈ ω(p), if S′ ⊆ S satisfies p′_j = p_j for every j ∈ S′, then there is a set S* ∈ ω(p′) with S′ ⊆ S*.

We will make the following large market assumptions:
1. The number of items ms is large; in particular, ms ≥ cn for some constant c ≤ 1.
2. Every item contributes at least a ρ marginal gain in the welfare-optimal allocation. This implies immediately that the optimal social welfare is at least Opt_t ≥ ρms at each time t ∈ [T].
3. The players are interested in at most d types of items and want only one copy of each item (meaning that their value for any bundle A of items is equal to the maximum value among any subset of this bundle with cardinality at most d).

We will use the PAlloc algorithm from Hsu et al. (2014) as our benchmark for adaptive learning. The algorithm has two additional parameters α > 0 and β > 0; it is ǫ-jointly differentially private, that is, (ǫ, 0)-jointly differentially private, and with probability 1 − β it computes a feasible efficient allocation. Assuming the supply s is high enough, the social value of the allocation is at least Opt − α · max(ms, n) in expectation, where recall that ms is the total supply, as we have s copies of each of m different items. Concretely, with supply s we get
\[ \alpha = O\Big(\frac{1}{(s\epsilon)^{1/2}}\Big) \cdot \mathrm{polylog}(n, m, s, 1/\beta) \quad (5.4) \]
In order to be able to use this algorithm as a benchmark in Corollary 5.1, we need to show that it is an approximation algorithm with a small approximation factor.
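To make Definition 5.1 concrete, the following illustrative sketch implements a brute-force demand oracle ω(p) and checks the gross substitutes condition for a single pair of price vectors; the unit-demand valuation at the bottom is a standard example of a gross substitutes valuation.

from itertools import chain, combinations

def subsets(k):
    return chain.from_iterable(combinations(range(k), r) for r in range(k + 1))

def demand(v, prices):
    # omega(p): all bundles maximizing v(A) - p(A).
    best = max(v(set(A)) - sum(prices[j] for j in A) for A in subsets(len(prices)))
    return [set(A) for A in subsets(len(prices))
            if abs(v(set(A)) - sum(prices[j] for j in A) - best) < 1e-9]

def gross_substitutes_at(v, p, p2):
    # Check Definition 5.1 for one pair p <= p2 (componentwise): for every
    # S in omega(p), the items of S whose price did not increase must be
    # contained in some demanded set S* in omega(p2).
    assert all(x <= y for x, y in zip(p, p2))
    for S in demand(v, p):
        kept = {j for j in S if p[j] == p2[j]}
        if not any(kept <= S2 for S2 in demand(v, p2)):
            return False
    return True

# Unit-demand valuation: the value of a bundle is its best single item.
w = [3.0, 2.0, 1.0]
v = lambda A: max((w[j] for j in A), default=0.0)
# Raising the prices of items 1 and 2 does not make item 0 undemanded:
# print(gross_substitutes_at(v, [0.5, 0.5, 0.5], [0.5, 2.5, 2.5]))  # True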
Lemma 5.5. For every η > 0, when the players' valuations satisfy the gross substitutes assumption, the algorithm PAlloc with privacy parameter ǫ(n) can be used to output an allocation, w.p. 1 − β(n), that has social welfare at least (1 − η)Opt under the large market assumptions listed above, assuming in addition that
\[ \eta = O\Big(\frac{1}{\rho \cdot c \cdot (s \cdot \epsilon(n))^{1/2}}\Big) \cdot \mathrm{polylog}(n, m, s, 1/\beta(n)) \]
Theorem 5.3 (Main theorem for large markets). Consider a repeated large market mechanism with dynamic population Γ = (M, T, p), such that T ≥ 1/p, where the stage mechanism M is a solution-based (λ, µ)-smooth mechanism, the players have gross substitutes valuations and the market satisfies the large market assumptions. If all players use an adaptive learning algorithm with constant C_R, then the overall expected social welfare is at least:
\[ \sum_t E[W(x^t;v^t)] \ge \frac{\lambda}{\max(1,\mu)} \cdot (1-\eta)\sum_t Opt_t \]
if the probability p of a player leaving is
\[ p \le C \cdot \frac{\eta^4 \cdot \rho^4 \cdot c^5}{m \cdot \ln(NT)} \]
for C = Θ((polylog(n, m, s))^{−1}), where N is the number of different strategies each player is using, which is at most (m/(δ · ρ))^d when bids on each item are multiples of δ · ρ.

There are several mechanisms for this setting that are (λ, µ)-smooth. As we showed in Lemma 4.1, running simultaneous first price auctions for each type of good (as described in Section 2) results in a ((1−δ)/2, 1)-solution-based smooth mechanism.

A. Proofs of Main Results
A.1. Proofs from Section 3
Proof of Theorem 3.1.
Let s^{*,t}_i be the deviation s^*_i(v^t_i, x^t_i) defined by the smoothness property, and s^{*,T}_i the sequence of these deviations. Let K_i be the number of time steps at which s^{*,t}_i ≠ s^{*,t+1}_i, and r_i(s^{*,T}_i, s^T; v^T) the regret that player i has compared to selecting s^{*,t}_i at every step, i.e.:
\[ r_i(s^{*,T}_i, s^T; v^T) = \sum_{t=1}^T \big(c_i(s^t; v^t) - c_i(s^{*,t}_i, s^t_{-i}; v^t)\big). \quad (A.1) \]
For shorthand, we denote this with r^*_i in this proof. Observe that since s^{*,t}_i is uniquely determined by v^t_i and x^t_i, K_i is a random variable that is at most k_i(v_i^T, x_i^T), for each instantiation of the sequences v^T and x^T. For any period [τ_r, τ_{r+1}) during which the strategy s^{*,t}_i is fixed, adaptive learning guarantees that the player's regret against this strategy is bounded by
\[ R_i(\tau_r, \tau_{r+1}) \le C_R\sqrt{(\tau_{r+1} - \tau_r)\ln(NT)}, \quad (A.2) \]
Summing over the K_i + 1 periods in which the strategy is fixed and using the Cauchy-Schwarz inequality, we can bound the total regret of each player i:
\[ r^*_i \le C_R\sqrt{(K_i+1)\sum_{r=1}^{K_i+1}(\tau_{r+1}-\tau_r)\ln(NT)} = C_R\sqrt{(K_i+1)\,T\ln(NT)}. \quad (A.3) \]
Thus for each instance of x^T and v^T, we have:
\[ \sum_{t=1}^T c_i(s^t;v^t) = \sum_{t=1}^T c_i(s^{*,t}_i, s^t_{-i}; v^t) + r^*_i \le \sum_{t=1}^T c_i(s^{*,t}_i, s^t_{-i}; v^t) + C_R\sqrt{(K_i+1)\,T\ln(NT)}. \quad (A.4) \]
Adding over all players, and using the smoothness property, we get that
\[ \sum_t C(s^t;v^t) \le \lambda\sum_t C(x^t;v^t) + \mu\sum_t C(s^t;v^t) + \sum_i C_R\sqrt{(K_i+1)\,T\ln(NT)}. \]
By Cauchy-Schwarz, \sum_i \sqrt{(K_i+1)T\ln(NT)} \le \sqrt{n \cdot T \cdot \ln(NT) \cdot \sum_{i=1}^n (K_i+1)}. Taking expectation over the allocation and valuation sequence and using the α-approximate optimality and Jensen's inequality:
\[ \sum_t E[C(s^t;v^t)] \le \lambda\alpha\sum_t E[Opt(v^t)] + \mu\sum_t E[C(s^t;v^t)] + n \cdot C_R\sqrt{T\ln(NT)\Big(\frac{1}{n}\sum_{i=1}^n E[K_i+1]\Big)}. \]
By the k-stability of the sequence, we have that \sum_{i=1}^n E[K_i] \le k \cdot n. By re-arranging we get the claimed bound. □ Proof of Theorem 3.2.
The proof follows along similar lines as the proof of Theorem 3.1 and, for completeness, is given in Section EC.1 of the supplementary material. □
Proof of Theorem 3.3.
Let s^{*,T}_i, K_i and r_i(s^{*,T}_i, s^T; v^T) be defined exactly as in the proof of Theorem 3.2, including the shorthand r^*_i. For any period [τ_r, τ_{r+1}) during which the strategy s^{*,t}_i is fixed, adaptive learning guarantees that the player's regret against this strategy is bounded by
\[ R_i(\tau_r, \tau_{r+1}) \le C_R\sqrt{(\tau_{r+1}-\tau_r)\ln(NT)}. \]
Moreover, if in period r we have x^t_i = ∅, then by Property 1 we have that R_i(τ_r, τ_{r+1}) ≤ 0. Thus, if we denote with X_{i,r} the indicator of whether in period r, x^t_i ≠ ∅, we get:
\[ R_i(\tau_r, \tau_{r+1}) \le C_R\sqrt{X_{i,r}(\tau_{r+1}-\tau_r)\ln(NT)}. \]
Summing over the K_i + 1 periods in which the strategy is fixed and using the Cauchy-Schwarz inequality, we can bound the total regret of each player i:
\[ r^*_i \le \sum_{r=1}^{K_i+1} C_R\sqrt{X_{i,r}} \cdot \sqrt{X_{i,r}(\tau_{r+1}-\tau_r)\ln(NT)} \le C_R\sqrt{\sum_{r=1}^{K_i+1} X_{i,r}} \cdot \sqrt{\sum_{r=1}^{K_i+1} X_{i,r}(\tau_{r+1}-\tau_r)\ln(NT)} \]
Let Y^t_i = 1\{x^t_i \ne \emptyset\}. Then observe that:
\[ \sum_{r=1}^{K_i+1} X_{i,r}(\tau_{r+1}-\tau_r) = \sum_{t=1}^T Y^t_i. \]
Replacing in the previous inequality, summing over all players and using Cauchy-Schwarz:
\[ \sum_{i=1}^n r^*_i \le \sum_{i=1}^n C_R\sqrt{\sum_{r=1}^{K_i+1} X_{i,r}} \cdot \sqrt{\sum_{t=1}^T Y^t_i \ln(NT)} \le C_R \cdot \sqrt{\sum_{i=1}^n\sum_{r=1}^{K_i+1} X_{i,r}} \cdot \sqrt{\sum_{i=1}^n\sum_{t=1}^T Y^t_i \ln(NT)} \]
Since each x^t is a feasible allocation, \sum_{i=1}^n Y^t_i \le m. Hence \sum_{i=1}^n\sum_{t=1}^T Y^t_i \le mT. Moreover:
\[ \sum_{i=1}^n\sum_{r=1}^{K_i+1} X_{i,r} \le \sum_{i=1}^n\sum_{r=1}^{K_i} X_{i,r} + \sum_{i=1}^n X_{i,K_i+1} = \sum_{i=1}^n\sum_{r=1}^{K_i} X_{i,r} + \sum_{i=1}^n Y^T_i \le m + \sum_{i=1}^n\sum_{r=1}^{K_i} X_{i,r} \]
Now observe that for each instance of (v^T, x^T): \sum_{r=1}^{K_i} X_{i,r} \le \kappa_i(v^T, x^T), since the latter summation sums all changes in type or allocation, ranging from r = 1 to K_i, such that the allocation x^{\tau_r}_i in the period right before the r-th change is non-empty. This is at most the set of changes that are accounted for in κ_i(v^T, x^T). It is an inequality, as there could be an index r at which both a type and an allocation change, and the summation only accounts for it once while κ_i(v^T, x^T) counts it twice; or there could be changes where x^t_i ≠ x^{t+1}_i and x^t_i = ∅, which are not accounted for in the above, but are accounted for in κ_i(v^T, x^T). Combining all the above we get:
\[ \sum_{i=1}^n r^*_i \le C_R\sqrt{m + \sum_{i=1}^n \kappa_i(v^T, x^T)} \cdot \sqrt{mT\ln(NT)} \]
By the no-regret property of each player, for each instance of x^T and v^T, we have:
\[ \sum_{t=1}^T u_i(s^t;v^t) \ge \sum_{t=1}^T u_i(s^{*,t}_i, s^t_{-i}; v^t) - r^*_i \]
Adding over all players, and using the smoothness property and the bound on the sum of regrets, we get that
\[ \sum_t\sum_i u_i(s^t;v^t) \ge \lambda\sum_t W(x^t;v^t) - \mu\sum_t R(s^t) - C_R\sqrt{m + \sum_{i=1}^n \kappa_i(v^T, x^T)} \cdot \sqrt{mT\ln(NT)} \]
Taking expectation over the allocation and valuation sequence and using the α-approximate optimality and Jensen's inequality:
\[ \sum_t\sum_i E[u_i(s^t;v^t)] \ge \lambda\alpha\sum_t E[Opt(v^t)] - \mu\sum_t E[R(s^t)] - C_R\sqrt{m + \sum_{i=1}^n E[\kappa_i(v^T, x^T)]} \cdot \sqrt{mT\ln(NT)}. \]
By the analogue of k-stability of the sequence, as defined in Equation (3.1), we have that
\[ \sum_{i=1}^n E[\kappa_i(v^T, x^T)] \le k \cdot n. \]
By re-arranging and using the fact that W(s^t; v^t) = \sum_i u_i(s^t; v^t) + R(s^t) and that R(s^t) ≤ W(s^t; v^t) (since utilities are non-negative), we get the claimed bound. □

A.2. Proofs from Section 4.1
Proof of Lemma 4.1
The proof is similar to the proof of Syrgkanis and Tardos (2013) that the mechanism is (1/2, 1)-smooth; the restriction to bids that are multiples of δ · ρ costs an additional factor of (1 − δ) in the first smoothness parameter. □

Proof of Lemma 4.2.
The 2(1 + ǫ)-approximation result holds as we lose an approximation factor of 2 due to the greedy algorithm and another approximation factor of (1 + ǫ) due to the layers.

To show the stability, let ℓ(v_i(j)) be the highest ℓ such that v_i(j) ≥ ρ(1 + ǫ)^{ℓ−1}, i.e., the rounded version of v_i(j) is ρ(1 + ǫ)^{ℓ(v_i(j))−1}, which we call the layer of this value. For example, any value in the range [ρ, ρ(1 + ǫ)) is in layer 1. Let ℓ^t(j) denote ℓ(v_i(j)) if item j is assigned to player i at time t, and let ℓ^t(j) = 0 if item j is not assigned at time t. We will use the potential function
\[ \Phi(x^t) = \sum_j \ell^t(j) \]
to show stability. We will show that changes in assignments correspond to increases in the potential function, and that the potential function can only decrease due to departures.

When a player who was assigned item j leaves at time t, this immediately decreases the potential function by ℓ^t(j) ≤ log_{1+ǫ}(1/ρ). Next we see how to restore the layered greedy solution after a departure and after an arrival. We claim that each change in the solution corresponds to an increase in the potential function.

To get the desired stability, we will only reassign an item j from a player i to a different player i′ if ℓ(v_{i′}(j)) > ℓ(v_i(j)), that is, if the rounded value is higher. If this is the case, we say that i′ is eligible to be reassigned to item j, and similarly, we will say that player i is eligible to be moved from an item j to a different item j′ if ℓ(v_i(j′)) > ℓ(v_i(j)).

When a new player i arrives, we assign the player to her highest valued item j to which she is eligible to be assigned. This increases the potential function by at least one. Now the previous owner of item j has no allocation, and again we assign this player to her highest value item to which she is eligible to be reassigned, further increasing the potential function. We continue this process till a layered greedy solution is obtained.

After a player departs, the remaining solution may have an item j that is unassigned. We reassign item j to the eligible player i of highest value. This increases the potential function, but possibly leaves a different item, one that i used to have, unassigned. Again we assign this item to the eligible player of highest value for the item, further increasing the potential function. We continue this process till a layered greedy solution is obtained.

We have shown that each change in the assignment, other than player departures, increases the potential function Φ, allowing us to bound the expected number of changes. At each step t, each of the up to m players with an assigned item leaves with probability p, so the expected decrease in the potential function over the T steps of the algorithm is at most pmT log_{1+ǫ}(1/ρ). The potential function Φ is non-negative, integral, and is bounded by m log_{1+ǫ}(1/ρ). This implies that the expected increase in the potential function during the algorithm is at most m(1 + pT) log_{1+ǫ}(1/ρ). Since each change in the solution also increases the potential function by at least 1, the same expression also bounds the total number of changes in the allocation, and each such change affects at most two players. Thus the aggregate number of changes in allocation across players is at most 2m(1 + pT) log_{1+ǫ}(1/ρ).

Last, we also need to account for the departures (or changes in type) of players that are already allocated an item.
Since there are m such players in each iteration and each is replaced with probability p, there are mpT such changes in expectation. Thus the total number of changes in allocation, or changes in type of players that are allocated an item, is at most m(2 + 3pT) log_{1+ǫ}(1/ρ). The average change for a player is an n-th fraction of this, leading to the claimed bound using that T ≥ 1/p. □
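The reassignment process in the proof above can be mechanized; the following illustrative sketch replays the cascade of eligible reassignments after a departure and asserts that every reassignment strictly increases the potential Φ(x^t) = Σ_j ℓ^t(j). Data structures and tie-breaking are simplified relative to the proof.

def potential(assign, layers):
    # Phi(x) = sum over assigned items j of the layer of the holder's value.
    return sum(layers[i][j] for j, i in assign.items())

def restore_after_departure(assign, layers, departed):
    # assign: item -> player; layers[i][j]: layer of player i's value for j.
    # Free the departed player's items, then repeatedly hand each freed item
    # to the eligible player (strictly higher layer than her current item,
    # or unmatched) of highest layer, as in the proof of Lemma 4.2.
    freed = [j for j, i in assign.items() if i == departed]
    for j in freed:
        del assign[j]
    while freed:
        j = freed.pop()
        holder_of = {i: jj for jj, i in assign.items()}
        best, best_layer = None, 0
        for i, row in enumerate(layers):
            if i == departed or row[j] == 0:
                continue
            current = layers[i][holder_of[i]] if i in holder_of else 0
            if row[j] > current and row[j] > best_layer:
                best, best_layer = i, row[j]
        if best is None:
            continue
        before = potential(assign, layers)
        if best in holder_of:          # best gives up her old item
            freed.append(holder_of[best])
            del assign[holder_of[best]]
        assign[j] = best
        assert potential(assign, layers) > before   # each move raises Phi
    return assign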
Proof of Theorem 4.1. Apply Theorem 3.3, where x^T is the outcome of the greedy-layered mechanism, using the fact that the first price auction is ((1−δ)/2, 1)-solution-based smooth (Lemma 4.1) and the stability bound of Lemma 4.2:
\[ \sum_t E[W(s^t;v^t)] \ge \frac{1-\delta}{4(1+\epsilon)}\sum_t E[Opt(v^t)] - C_R\sqrt{T \cdot m \cdot \big(5 \cdot T \cdot m \cdot p \cdot \log_{1+\epsilon}(1/\rho) + m\big) \cdot \ln(NT)} \]
Using that pT > 1, we get the first claimed bound. To get the multiplicative bound, it suffices to upper bound the expected aggregate regret by \frac{\epsilon}{4(1+\epsilon)}\sum_t E[Opt(v^t)], which is at least \frac{\epsilon}{4(1+\epsilon)}\,Tm\rho, by the assumption that the average optimal welfare in each round is at least mρ. To show that this is true, what we need to prove is the following (using inequality (4.2)):
\[ mT \cdot C_R\sqrt{6 \cdot p \cdot \log_{1+\epsilon}(1/\rho) \cdot \ln(NT)} \le \frac{\epsilon}{4(1+\epsilon)}\,Tm\rho \]
which is true if
\[ p \le \frac{\rho^2\epsilon^2}{96(1+\epsilon)^2(C_R)^2\log_{1+\epsilon}(1/\rho)\ln(NT)}. \]
□
A.3. Proofs from Section 4.2
Proof of Lemma 4.3.
For a given value M, if we allocate bandwidth to players up to the point where their marginal value for bandwidth is M or more, i.e., setting x_i such that u′_i(x_i) = M whenever x_i > 0, and u′_i(0) ≤ M when x_i = 0, then the allocations {x_i} form the optimal solution for total bandwidth \sum_i x_i. The idea of the proof is to consider this optimal solution for a smaller bandwidth, and then round each allocation x_i up to the next multiple of δ. For a value M let x_i(M) = 0 if u′_i(0) < M, and otherwise set x_i(M) > 0 such that u′_i(x_i(M)) = M. So the optimal solution is the allocation x_i(M) for an M such that \sum_i x_i(M) = 1.

Now for an allocation x_i to player i let \hat{x}_i = ⌈x_i/δ⌉δ, the allocation rounded up to a multiple of δ. Let seg(M) = \sum_i \hat{x}_i(M). Clearly, seg(M) is a monotone decreasing function of M, and is right-continuous. Set M to be the minimum value such that seg(M) ≤ 1 (note that 0 ≤ M ≤ max_i u′_i(0)).

Now we consider the following segmented allocation: for each player i such that x_i(M) < \hat{x}_i(M), or such that x_i(M) = 0 and u′_i(0) < M, we set y_i = \hat{x}_i(M). For the remaining players we have that x_i(M) is an integer multiple of δ and u′_i(x_i(M)) = M. For these players we set y_i to be either x_i(M) or x_i(M) + δ, such that \sum_i y_i = 1. We note that such an allocation always exists: as seg(M′) > 1 for any M′ < M, there must be enough players with \hat{x}_i(M′) > \hat{x}_i(M), and using y_i = \hat{x}_i(M′) = x_i(M) + δ for a subset of these players can make the total exactly 1.

Now we claim that the segmented allocation {y_i} satisfies the claim of the lemma. Let z_i = y_i − x_i(M) be the additional allocation due to rounding, and let z = \sum_i z_i denote the total rounding used.

First note that the value of the optimum allocation is at most \sum_i u_i(x_i(M)) + zM. This is true, as at the allocation {x_i(M)} there is z amount of space left to be allocated, and all players have marginal utility at most M for additional space.

To bound a player's utility for her allocation y_i, we use the fact that the second derivative of the utility is at least −α, so we get that
\[ u_i(y_i) - u_i(x_i(M)) = \int_{x_i(M)}^{y_i} u_i'(\xi)\,d\xi \ge \int_0^{z_i} (M - \alpha\xi)\,d\xi = M z_i - \frac{\alpha}{2} z_i^2 \]
where the inequality uses the fact that u′_i(x_i(M)) = M for all players whose allocation was rounded, i.e., who have y_i > x_i(M).

To bound the utility of the segmented solution, we add the above bound for all players, and use that z_i ≤ δ for all i to get
\[ \sum_i u_i(y_i) \ge \sum_i u_i(x_i(M)) + M z - \frac{1}{2}\alpha z\delta \]
Now by the choice of the allocation x_i(M) we have that \sum_i u_i(x_i(M)) + M z ≥ M, and so we can bound the last term by
\[ \frac{1}{2}\alpha z\delta \le \frac{1}{2}\alpha\delta \le \epsilon M \le \epsilon\Big(\sum_i u_i(x_i(M)) + M z\Big) \]
using the bound on δ and the fact that the first derivative is at least ρ. Combining these bounds, we get the claimed overall bound
\[ Opt \le \sum_i u_i(x_i(M)) + M z \le \frac{1}{1-\epsilon}\sum_i u_i(y_i) \]
as claimed. □

Proof of Lemma 4.4.
The (1 + ǫ)-approximation holds, as at most this much is lost due to the layers, while the non-layered greedy algorithm would be optimal since the valuation functions are concave.

In order to prove stability we use the same potential function as in the matching markets:
\[ \Phi(x^t) = \sum_j \ell^t(j) \]
We will again show that, unless some player who holds bandwidth departs, changes in the allocation correspond to an equal increase in the potential function. Hence, we will show that a decrease in the potential function happens only due to the departures of current holders of bandwidth.

When a player j who is assigned m_j segments of bandwidth leaves, all her segments become free. Hence, this causes a decrease in the potential function of at most m_j log_{1+ǫ}(1/(ρδ)). Summing over all the players who hold segments, the expected decrease in the potential function is at most \sum_j p \cdot m_j \log_{1+\epsilon}(1/(\rho\delta)) = p \cdot (1/\delta) \cdot \log_{1+\epsilon}(1/(\rho\delta)). This is the same as the expected decrease in the potential function in matching markets, with a lower bound of ρδ instead of ρ, and 1/δ segments instead of m items.

When a player i arrives, she either gets assigned some segments or not. If she does not, then she does not affect the allocation at all. If she is assigned some segments, then, given the tie-breaking rule, her marginal value for each such segment lies in a strictly higher layer than that of the player who loses the segment on her account. Hence, the potential function increases at least by the number of segments she gets, and she causes no further changes in the allocation (for each segment she takes, she might affect the allocation of at most one player). The total increase is bounded by the decrease previously incurred by the potential function, hence there is a correspondence to the matching markets case.

The remainder of the proof follows exactly the same steps as the proof of Lemma 4.2, with ρδ instead of ρ, as this is the minimum value of one segment (correspondingly, item), and 1/δ as this is the number of segments (correspondingly, items). □
Proof of Lemma 4.5.
In (Syrgkanis and Tardos 2013), the proportional mechanism is proven (2 − √3, 1)-smooth. The smoothness deviation used there is a randomized bid B_i drawn from a distribution supported on [0, λv_i(x*_i)], where x*_i is the optimal allocation and λ is a carefully tuned parameter. If, instead of the optimal solution, we selected any other solution \hat{x}*, the same result would hold for solution-based smoothness. Letting B_i be i's realized deviating bid and b_i the bid played, this means that:
\[ E_B\Big[\sum_i u_i(B_i, b_{-i})\Big] \ge (2-\sqrt{3})\,W(\hat{x}^*) - \sum_j b_j \]
Recall that we defined the mechanisms with a discrete action space. Hence, we will consider only bids that are multiples of ζ. The deviating bid we will use is the rounding of the bid of Syrgkanis and Tardos (2013) to multiples of ζ; hence the deviating bid now will be \bar{B}_i = ⌈B_i/ζ⌉ · ζ. Summing over all players that hold items in \hat{x}*:
\[ E_{\bar B}\Big[\sum_i u_i(\bar B_i, b_{-i})\Big] = E_{\bar B}\Big[\sum_i \big(v_i(\bar B_i, b_{-i}) - \bar B_i\big)\Big] \ge E_B\Big[\sum_i \big(v_i(B_i, b_{-i}) - B_i - \zeta\big)\Big] \ge (2-\sqrt{3})\,W(\hat{x}^*) - \sum_j b_j - \sum_i \zeta = (2-\sqrt{3}-\epsilon)\,W(\hat{x}^*) - \sum_j b_j \]
The first inequality holds from the monotonicity of the valuation function and the discretization of bids. The second holds from the smoothness condition of the non-discretized version. The last equality holds replacing ζ = ǫδ and by the fact that, for any δ-segmented allocation x, the number of players that can hold segments is upper bounded by 1/δ (and the valuation function is upper bounded by 1). □

A.4. Proofs from Section 5.2
Proof of Theorem 5.2.
To use the jointly differentially private algorithm of Rogers et al. (2015) with a set of affine latency functions ℓ_e(x_e) = a_e x_e + b_e, we need to scale them by n max_e(a_e + b_e) to guarantee that ℓ_e(n) ≤ 1; the scaled latencies are γ-Lipschitz for γ = 1/n. We will use the jointly differentially private algorithm on the scaled problem, with privacy parameters ǫ(n), δ(n), and β(n) that will depend on the size of the population, and then rescale to the original costs, to get a solution with expected cost:
\[ E[C(x;v)] \le Opt(v) + 141\, n^{1/2} \cdot m \cdot (\max(n\gamma, m))^{1/2} \cdot \epsilon^{-1/2} \cdot \log\big(4m \cdot n \cdot \max(n\gamma, m) \cdot \epsilon/\beta\big) \cdot \sqrt{\ln(1/\delta)} \]
More precisely, the authors proved that they can find a fractional solution with cost at most Opt + R_1 + 4R_2, where R_1 and R_2 are error terms polynomial in m and nγ that depend on the privacy parameters as \sqrt{\ln(1/\delta)}/\epsilon and \log(1/\beta), and then lose an additional m\sqrt{n\ln(m/\beta)} to get the integral solution. This gives the upper bound displayed above, where γ = 1/n, and the polylog term is the actual expression in (5.3).

Corollary 5.1 expects an α-approximation algorithm, so we need to bound the approximation factor of this algorithm. To claim that it is a (1 + η/2)-approximation algorithm we need to guarantee that
\[ 141\, m^{3/2}\sqrt{n/\epsilon(n)}\,\log\big(4m^2 n\epsilon(n)/\beta(n)\big)\sqrt{\ln(1/\delta(n))} \cdot n\max_e(a_e+b_e) \le \frac{\eta}{2}\, Opt. \]
A simple lower bound on the optimal solution is Opt ≥ n · min_e a_e · (n/m) = \frac{n^2}{m}\min_e a_e, assuming all players congest at least one element. (To see this, consider the cost minimization problem where the latency function of every edge is replaced with \hat{\ell}(x) = x · min_e a_e. The value of the original cost minimization problem is at least the value of this new one. The social cost in the new problem is simply min_e a_e \cdot \sum_e x_e^2. Since each player congests at least one edge, the solution must satisfy the constraint \sum_e x_e \ge n. By the convexity and symmetry of the objective function, the relaxed problem achieves its minimum when all x_e are identical and equal to n/m, in which case the value is \frac{n^2}{m}\min_e a_e.) Using this lower bound, and rearranging terms, we can guarantee the desired approximation bound by assuming that
\[ n \ge \Big(m^{3/2}\sqrt{1/\epsilon(n)}\,\log\big(4m^2 n\epsilon(n)/\beta(n)\big)\sqrt{\ln(1/\delta(n))} \cdot \max_e(a_e+b_e) \cdot \frac{2m}{\eta\min_e a_e}\Big)^2 \quad (A.5) \]
To use this solution as a benchmark in Corollary 5.1, we need small enough ǫ(n) and δ(n), as each person leaving and arriving causes the benchmark solution to change for an O(ǫ(n) + β(n) + δ(n)) fraction of the population in expectation. We will let δ(n), β(n) = ǫ(n)/3 and set ǫ(n) as small as is allowed by Equation (A.5). Since ǫ(n)/β(n) = 3 and δ(n) = ǫ(n)/3, we need:
\[ \frac{\epsilon(n)}{\ln(3/\epsilon(n))} \ge \frac{1}{n}\Big(m^{3/2}\log(12m^2 n) \cdot \max_e(a_e+b_e) \cdot \frac{2m}{\eta\min_e a_e}\Big)^2 \]
Let f(n) = \Big(m^{3/2}\log(12m^2 n) \cdot \max_e(a_e+b_e) \cdot \frac{2m}{\eta\min_e a_e}\Big)^2 = O\Big(m^5\Big(\frac{\log(m^2 n)\max_e(a_e+b_e)}{\eta\min_e a_e}\Big)^2\Big), and observe that f(n) = poly(m, log(n)). The latter inequality is satisfied if we set
\[ \epsilon(n) = \frac{f(n)\ln(3n)}{n}. \]
(With this choice, ln(3/ǫ(n)) = ln(3n) − ln(f(n)) − ln ln(3n) ≤ ln(3n), so ǫ(n) = \frac{f(n)\ln(3n)}{n} \ge \frac{f(n)}{n}\ln(3/\epsilon(n)).) Moreover, with the latter parameters we also have that ǫ(n) + β(n) + δ(n) ≤ 2ǫ(n).

Now applying Corollary 5.1 to the problem scaled by m · n max_e(a_e + b_e), to guarantee the assumption ℓ_e(x) ≤ 1 and that the loss functions for every player are bounded by 1, and scaling back, we get that
\[ \sum_t E[C(s^t;v^t)] \le \frac{5}{2}\Big(1+\frac{\eta}{2}\Big)\sum_t Opt(v^t) + \frac{3}{2}\, C_R \cdot nT\sqrt{p\big(1 + 3n\epsilon(n)\big)\ln(NT)}\, \max_e(a_e+b_e) \cdot n \cdot m \]
To get the desired bound, we need to make sure that the additive error is bounded by a small multiple of Opt. Concretely, we need:
\[ \frac{3}{2}\, C_R \cdot nT\sqrt{p\big(1 + 3n\epsilon(n)\big)\ln(NT)}\,\max_e(a_e+b_e) \cdot n \cdot m \le \frac{5\eta}{4}\sum_t Opt(v^t). \]
Using again the \frac{n^2}{m}\min_e a_e \le Opt(v^t) lower bound for the cost in each step t, we will now show that we can guarantee this with the choice of p suggested in the theorem. With no loss of generality we can assume that 3ǫ(n)n > 1, so that it suffices to show:
\[ 3\, C_R \cdot nT\sqrt{p \cdot n\epsilon(n)\ln(NT)}\,\max_e(a_e+b_e) \cdot n \cdot m \le \frac{5\eta}{4} \cdot T \cdot \frac{n^2}{m}\min_e a_e. \]
Finally, we use that the number of player strategies N in a congestion game with m elements is clearly bounded by N ≤ 2^m, and hence ln(NT) ≤ 2m ln T. Using this fact, we can rearrange the above inequality, and guarantee the required inequality if
\[ p \le \Big(\frac{c_0}{(C_R)^2} \cdot \Big(\frac{\min_e a_e}{m^2\max_e(a_e+b_e)} \cdot \eta\Big)^2\Big) \cdot \frac{1}{\epsilon(n) \cdot n \cdot m\ln T} = \Big(\frac{c_0}{(C_R)^2} \cdot \Big(\frac{\min_e a_e}{m^2\max_e(a_e+b_e)} \cdot \eta\Big)^2\Big) \cdot \frac{1}{f(n)\ln(3n) \cdot m\ln T} \]
for an absolute constant c_0 > 0; substituting the expression for f(n) gives the bound on p claimed in the theorem. □
A.5. Proofs from Section 5.3
Proof of Lemma 5.5.
Algorithm PAlloc with parameter α finds, with probability 1 − β, a feasible solution with social welfare at least
\[ W(x;v) \ge Opt - \alpha \cdot \max(ms, n), \]
assuming (5.4) holds. We will use the ρms ≥ ρ · c · max{ms, n} lower bound on Opt, which follows from the first two large market assumptions. Now setting α = η · c · ρ, with c from the first large market assumption, we get that:
\[ W(x;v) \ge Opt - \alpha\max(ms, n) \ge (1-\eta)\,Opt \]
as required. (The algorithm assumes ms > n and gives an additive error bound of α · ms. If ms < n, we run PAlloc with m′ extra items such that (m′ + m)s = a for some a ∈ [n, n + s]. For all the extra items every player has valuation 0 and, by the way the algorithm works, no player gets an extra item in the algorithm's allocation. Applying the algorithm we get an error bound of α(m + m′)s ≤ α · (n + s) ≤ 2α · n.) For a given supply s, the bound from Equation (5.4) required is exactly the one claimed in the lemma. □

Proof of Theorem 5.3.
We apply Lemma 5.5 with ǫ(n) = β(n) chosen to satisfy the condition, i.e., set
\[ \epsilon(n) = \Theta\Big(\frac{1}{(\eta \cdot c \cdot \rho)^2 \cdot s}\Big) \cdot \mathrm{polylog}(n, m, s). \]
By Corollary 5.1 and Lemma 5.5, we have:
\[ \sum_t E[W(s^t;v^t)] \ge \frac{\lambda}{\max\{1,\mu\}}(1-\eta)\sum_t E[Opt(v^t)] - Tn \cdot C_R\sqrt{p\big(1 + 2n\epsilon(n)\big)\ln(NT)} \]
In order to lower bound the second term by \frac{\lambda}{\max(1,\mu)} \cdot \eta\sum_t Opt_t, we bound Opt_t ≥ ρms as before, and then it suffices to prove the following:
\[ Tn \cdot C_R \cdot \sqrt{p\big(1 + 2n\epsilon(n)\big)\ln(NT)} \le T \cdot \eta \cdot \rho ms \]
Using the assumption that ms ≥ cn, and rearranging terms, this is ensured by:
\[ C_R\sqrt{p\big(1 + 2n\epsilon(n)\big)\ln(NT)} \le \eta\,\rho\, c \]
Assuming wlog that n · ǫ(n) ≥ 1, it suffices that
\[ p \le \frac{(\eta \cdot \rho \cdot c)^2}{3(C_R)^2\ln(NT)} \cdot \big(\epsilon(n) \cdot n\big)^{-1} = \Theta\Big(\frac{(\eta \cdot \rho \cdot c)^4 \cdot s}{\ln(NT) \cdot n \cdot \mathrm{polylog}(n, m, s)}\Big). \]
Using the assumption that ms ≥ cn, this is implied by the condition of the theorem. □

Acknowledgments
We would like to thank Karthik Sridharan for pointing us to relevant adaptive regret learning literature.
References
Adlakha, Sachin, Ramesh Johari. 2013. Mean field equilibrium in dynamic games with strategic complementarities. Operations Research 61(4) 971–989.
Adlakha, Sachin, Ramesh Johari, Gabriel Y. Weintraub. 2015. Equilibria of dynamic games with many players: Existence, approximation, and market structure. Journal of Economic Theory.
Arora, Sanjeev, Elad Hazan, Satyen Kale. 2012. The multiplicative weights update method: A meta-algorithm and applications. Theory of Computing 8(1) 121–164.
Awerbuch, Baruch, Yossi Azar, Amir Epstein. 2013. The price of routing unsplittable flow. SIAM J. Comput. 42(1) 160–177. Preliminary version appeared in STOC '05.
Balseiro, Santiago R., Omar Besbes, Gabriel Y. Weintraub. 2015. Repeated auctions with budgets in ad exchanges: Approximations and design. Management Science 61(4) 864–884.
Başar, T., G. Olsder. 1998. Dynamic Noncooperative Game Theory, 2nd Edition. Society for Industrial and Applied Mathematics.
Bergemann, Dirk, Maher Said. 2010. Dynamic Auctions: A Survey. Tech. Rep. 1757R, Cowles Foundation for Research in Economics, Yale University.
Besbes, Omar, Yonatan Gur, Assaf J. Zeevi. 2013. Non-stationary stochastic optimization. CoRR abs/1307.5449.
Blum, Avrim, Eyal Even-Dar, Katrina Ligett. 2006. Routing without regret: On convergence to Nash equilibria of regret-minimizing algorithms in routing games. Proceedings of the Twenty-Fifth Annual ACM Symposium on Principles of Distributed Computing, PODC '06, 45–52.
Blum, Avrim, MohammadTaghi Hajiaghayi, Katrina Ligett, Aaron Roth. 2008. Regret minimization and the price of total anarchy. Proceedings of the Fortieth Annual ACM Symposium on Theory of Computing, STOC '08, 373–382.
Blum, Avrim, Yishay Mansour. 2007. From external to internal regret. J. Mach. Learn. Res. 8.
Brown, G. W. 1951. Iterative solutions of games by fictitious play. Activity Analysis of Production and Allocation. Wiley, 374–376.
Caragiannis, Ioannis, Christos Kaklamanis, Panagiotis Kanellopoulos, Maria Kyropoulou, Brendan Lucier, Renato Paes Leme, Éva Tardos. 2015. Bounding the inefficiency of outcomes in generalized second price auctions. Journal of Economic Theory 156 343–388.
Cavallo, Ruggiero, David C. Parkes, Satinder Singh. 2010. Efficient mechanisms with dynamic populations and dynamic types. Technical report, Harvard University.
Cesa-Bianchi, Nicolò, Pierre Gaillard, Gábor Lugosi, Gilles Stoltz. 2012. Mirror descent meets fixed share (and feels no regret). NIPS 2012, vol. 25, Lake Tahoe, United States.
Cesa-Bianchi, Nicolò, Gábor Lugosi. 2006. Prediction, Learning, and Games. Cambridge University Press, New York, NY, USA.
Christodoulou, George, Elias Koutsoupias. 2005. The price of anarchy of finite congestion games. Proceedings of the Thirty-Seventh Annual ACM Symposium on Theory of Computing, STOC '05, 67–73.
Correa, José R., A. S. Schulz, N. E. Stier-Moses. 2003. Selfish routing in capacitated networks. Mathematics of Operations Research.
Daskalakis, Constantinos. 2009. Nash equilibria: Complexity, symmetries, and approximation. Comput. Sci. Rev. 3(2) 87–100.
Dolan, Robert J. 1978. Incentive mechanisms for priority queuing problems. Bell Journal of Economics.
Dwork, Cynthia, Aaron Roth. 2014. The Algorithmic Foundations of Differential Privacy. Foundations and Trends in Theoretical Computer Science, Now Publishers.
Dwork, Cynthia, Frank McSherry, Kobbi Nissim, Adam Smith. 2006. Calibrating noise to sensitivity in private data analysis. Proceedings of the Third Conference on Theory of Cryptography, TCC '06, 265–284.
Freund, Yoav, Robert E. Schapire. 1997. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 55(1) 119–139.
Fudenberg, Drew, David K. Levine. 1998. The Theory of Learning in Games. MIT Press, Cambridge, MA, USA.
Fudenberg, Drew, Alexander Peysakhovich. 2014. Recency, records and recaps: Learning and non-equilibrium behavior in a simple decision problem. Proceedings of the Fifteenth ACM Conference on Economics and Computation, EC '14, 971–986.
Fudenberg, Drew, Jean Tirole. 1991. Perfect Bayesian equilibrium and sequential equilibrium. Journal of Economic Theory 53(2) 236–260.
Google, Inc. 2015. First quarter 2015 results. https://investor.google.com/pdf/2015Q1_google_earnings_release.pdf. Released: April 23, 2015.
Hannan, James. 1957. Approximation to Bayes risk in repeated plays. Contributions to the Theory of Games 3 97–139.
Hart, Sergiu, Andreu Mas-Colell. 2000. A simple adaptive procedure leading to correlated equilibrium. Econometrica 68(5) 1127–1150.
Hart, Sergiu, Andreu Mas-Colell. 2012. Simple Adaptive Strategies: From Regret-Matching to Uncoupled Dynamics. World Scientific Publishing Co., Inc., River Edge, NJ, USA.
Hazan, Elad, C. Seshadhri. 2007. Adaptive algorithms for online decision problems. Electronic Colloquium on Computational Complexity (ECCC) (088).
Herbster, Mark, Manfred K. Warmuth. 1998. Tracking the best expert. Mach. Learn. 32(2) 151–178.
Hsu, Justin, Zhiyi Huang, Aaron Roth, Tim Roughgarden, Zhiwei Steven Wu. 2014. Private matchings and allocations. Proceedings of the 46th Annual ACM Symposium on Theory of Computing, STOC '14, 21–30.
Iyer, Krishnamurthy, Ramesh Johari, Mukund Sundararajan. 2014. Mean field equilibria of dynamic auctions with learning. Management Science 60(12) 2949–2970.
Johari, Ramesh, John N. Tsitsiklis. 2004. Efficiency loss in a network resource allocation game. Mathematics of Operations Research 29(3) 407–435.
Johari, Ramesh, John N. Tsitsiklis. 2011. Parameterized supply function bidding: Equilibrium and efficiency. Operations Research 59(5) 1079–1089.
Kalai, Ehud, Eran Shmaya. 2015. Learning and stability in big uncertain games. Tech. rep.
Kannan, Sampath, Jamie Morgenstern, Ryan M. Rogers, Aaron Roth. 2014. Private Pareto optimal exchange. CoRR abs/1407.2641.
Kearns, Michael, Mallesh M. Pai, Aaron Roth, Jonathan Ullman. 2014. Mechanism design in large games: Incentives and privacy. Innovations in Theoretical Computer Science, ITCS '14, Princeton, NJ, USA, 403–410.
Kelly, Frank. 1997. Charging and rate control for elastic traffic. European Transactions on Telecommunications.
Koutsoupias, Elias, Christos Papadimitriou. 1999. Worst-case equilibria. Proceedings of the 16th Annual Conference on Theoretical Aspects of Computer Science, STACS '99, 404–413.
Lehmann, Benny, Daniel Lehmann, Noam Nisan. 2001. Combinatorial auctions with decreasing marginal utilities. Proceedings of the 3rd ACM Conference on Electronic Commerce, EC '01, 18–28.
Lehrer, Ehud. 2003. A wide range no-regret theorem. Games and Economic Behavior.
CoRR abs/1502.05934. URL http://arxiv.org/abs/1502.05934.
Maskin, Eric, J. Tirole. 2001. Markov perfect equilibrium, I: Observable actions. Journal of Economic Theory.
Nisan, Noam, Tim Roughgarden, Éva Tardos, Vijay V. Vazirani, eds. 2007. Algorithmic Game Theory. Cambridge University Press, New York, NY, USA.
Parkes, David C., Satinder P. Singh. 2003. An MDP-based approach to online mechanism design. NIPS.
Rogers, Ryan M., Aaron Roth, Jonathan Ullman, Zhiwei Steven Wu. 2015. Inducing approximately optimal flow using truthful mediators. Proceedings of the Sixteenth ACM Conference on Economics and Computation, EC '15, ACM.
Roughgarden, Tim. 2009. Intrinsic robustness of the price of anarchy. Proceedings of the Forty-First Annual ACM Symposium on Theory of Computing, STOC '09, 513–522.
Roughgarden, Tim, Éva Tardos. 2002. How bad is selfish routing? J. ACM 49(2) 236–259.
Syrgkanis, Vasilis, Éva Tardos. 2013. Composable and efficient mechanisms. Proceedings of the Forty-Fifth Annual ACM Symposium on Theory of Computing, STOC '13, 211–220.
Van Long, Ngo. 2010. A Survey of Dynamic Games in Economics, vol. 1. World Scientific.
Weintraub, Gabriel Y., C. Lanier Benkard, Benjamin Van Roy. 2008. Markov perfect industry dynamics with many firms. Econometrica 76(6) 1375–1411.
Weintraub, Gabriel Y., Lanier Benkard, Benjamin Van Roy. 2006. Oblivious equilibrium: A mean field approximation for large-scale dynamic games. Y. Weiss, B. Schölkopf, J.C. Platt, eds., Advances in Neural Information Processing Systems 18. MIT Press, 1489–1496.
Supplementary Material
EC.1. Proof of Theorem 3.2
Theorem 3.2. Consider a repeated mechanism with dynamic population $\mathcal{M} = (M, T, p)$, such that the stage mechanism $M$ is allocation-based $(\lambda, \mu)$-smooth. Suppose that $v^T$ and $x^T$ is a $k$-stable sequence, such that $x^t$ is feasible (pointwise) and $\alpha$-approximately optimal (in expectation) for each $t$, i.e. $\alpha \cdot \mathbb{E}[W(x^t; v^t)] \geq \mathbb{E}[\mathrm{Opt}(v^t)]$. If players use an adaptive learning algorithm with constant $C_R$ then:
$$\sum_t \mathbb{E}[W(s^t; v^t)] \;\geq\; \frac{\lambda}{\alpha \max\{1, \mu\}} \sum_t \mathbb{E}[\mathrm{Opt}(v^t)] \;-\; n \cdot C_R \sqrt{T (k+1) \ln(NT)}.$$

Proof of Theorem 3.2. Let $s_i^{*,T}$ be defined exactly as in the proof of Theorem 3.1, and let $r_i(s_i^{*,T}, s^T; v^T)$ be defined similarly as
$$r_i(s_i^{*,T}, s^T; v^T) = \sum_{t=1}^{T} \left( u_i(s_i^{*,t}, s_{-i}^t; v^t) - u_i(s^t; v^t) \right).$$
For shorthand, we denote this quantity by $r_i^*$ in this proof. Following exactly the same arguments as in the proof of Theorem 3.1, we can show that for each instance of $v^T$ and $x^T$:
$$r_i^* \leq C_R \sqrt{(K_i + 1)\, T \ln(NT)}.$$
We sum the latter inequality over all players and take expectations over $v^T$ and $x^T$. Then we apply the Cauchy-Schwarz and Jensen inequalities and the $k$-stability of the sequence, i.e. $\sum_i \mathbb{E}[K_i] \leq k \cdot n$:
$$\mathbb{E}\Big[\sum_i r_i^*\Big] \leq \mathbb{E}\Big[C_R \sum_i \sqrt{(K_i + 1) T \ln(NT)}\Big] \leq \mathbb{E}\Big[C_R \sqrt{n\, T \ln(NT) \sum_{i=1}^n (K_i + 1)}\Big] \leq C_R \sqrt{n\, T \ln(NT) \sum_{i=1}^n \big(\mathbb{E}[K_i] + 1\big)} \leq n \cdot C_R \sqrt{T \ln(NT) (k+1)}. \quad \text{(EC.1.1)}$$
By the definition of regret, for each instance of $x^T$ and $v^T$ we have:
$$\sum_{t=1}^T u_i(s^t; v^t) = \sum_{t=1}^T u_i(s_i^{*,t}, s_{-i}^t; v^t) - r_i^*.$$
Summing over all players and using the smooth mechanism property, we get that
$$\sum_t \sum_i u_i(s^t; v^t) \geq \lambda \sum_t W(x^t; v^t) - \mu \sum_t R(s^t) - \sum_i r_i^*.$$
By rearranging and using the fact that $W(s^t; v^t) = \sum_i u_i(s^t; v^t) + R(s^t)$:
$$\sum_t W(s^t; v^t) + (\mu - 1) \sum_t R(s^t) \geq \lambda \sum_t W(x^t; v^t) - \sum_i r_i^*.$$
Taking expectations over the allocation and valuation sequence, and using the $\alpha$-approximate optimality and Inequality (EC.1.1):
$$\sum_t \mathbb{E}[W(s^t; v^t)] + (\mu - 1) \sum_t \mathbb{E}[R(s^t)] \geq \frac{\lambda}{\alpha} \sum_t \mathbb{E}[\mathrm{Opt}(v^t)] - n \cdot C_R \sqrt{T \ln(NT) (k+1)}.$$
If $\mu \leq 1$, the theorem follows immediately, since revenue is non-negative. For the case $\mu > 1$, we will show that total revenue is approximately bounded from above by welfare. Specifically, we will show that:
$$\sum_t \mathbb{E}[R(s^t)] \leq \sum_t \mathbb{E}[W(s^t; v^t)] + n \cdot C_R \sqrt{T \ln(NT) (k+1)}.$$
Since $W(s^t; v^t) = \sum_i u_i(s^t; v^t) + R(s^t)$, the latter is equivalent to showing:
$$\sum_t \sum_i \mathbb{E}[u_i(s^t; v^t)] \geq - n \cdot C_R \sqrt{T \ln(NT) (k+1)}.$$
We use the fact that players can always play the empty strategy $\emptyset_i$ of exiting the mechanism and receiving zero utility. Thus it suffices to bound the expected average per-player regret with respect to this empty fixed strategy. Define $\emptyset_i^T$ as the sequence of fixed empty strategies and denote $r_i^{\emptyset} = r_i(\emptyset_i^T, s^T; v^T)$. Then, using the no-regret definition with respect to this empty strategy for each player $i$:
$$\sum_t u_i(s^t; v^t) = - r_i^{\emptyset}.$$
Hence, for what we want to show, it suffices that:
$$\sum_i \mathbb{E}\big[r_i^{\emptyset}\big] \leq n \cdot C_R \sqrt{T \ln(NT) (k+1)}. \quad \text{(EC.1.2)}$$
Observe that, since this strategy and the type of each player $i$ are fixed in the intervals defined by the changes accounted for in $k_i(v_i^T, x_i^T)$, by the exact same reasoning we used to bound $r_i^*$ we can also derive that for each instance of $v^T$ and $x^T$:
$$r_i^{\emptyset} \leq C_R \sqrt{(K_i + 1)\, T \ln(NT)},$$
and thereby, similarly to Inequality (EC.1.1), we get the desired property (EC.1.2). Hence, we get:
$$\mu \sum_t \mathbb{E}[W(s^t; v^t)] \geq \sum_t \mathbb{E}[W(s^t; v^t)] + (\mu - 1) \sum_t \mathbb{E}[R(s^t)] - (\mu - 1)\, n \cdot C_R \sqrt{T \ln(NT) (k+1)} \geq \frac{\lambda}{\alpha} \sum_t \mathbb{E}[\mathrm{Opt}(v^t)] - \mu\, n \cdot C_R \sqrt{T \ln(NT) (k+1)}.$$
Dividing through by $\mu$ yields the theorem. □
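The proof uses the learning algorithm only through the interval regret bound $r_i^* \leq C_R \sqrt{(K_i+1) T \ln(NT)}$. As a concrete, purely illustrative instance of a learner with tracking guarantees of this shape, the following Python sketch runs a Fixed-Share algorithm in the spirit of Herbster and Warmuth (1998) against a utility sequence whose best strategy changes twice; the horizon, learning rate, and simulated utilities are our own assumptions, not constructions from the paper.

```python
import numpy as np

# Illustrative only: a Fixed-Share learner, one of the simple algorithms with
# tracking/adaptive-regret guarantees of the kind the proof assumes.
rng = np.random.default_rng(0)
N, T = 10, 5000                      # N strategies, T rounds (made up)
eta = np.sqrt(np.log(N * T) / T)     # learning rate, matching the sqrt(...) scale
alpha = 1.0 / T                      # switching ("share") rate

utils = rng.uniform(size=(T, N))     # u_i in [0,1]; adversarial in the paper
# make a different strategy clearly best in each third of the horizon,
# mimicking K_i = 2 changes in the player's environment
for seg, best in enumerate([3, 7, 1]):
    utils[seg * T // 3:(seg + 1) * T // 3, best] += 0.5
utils = np.clip(utils, 0.0, 1.0)

w = np.full(N, 1.0 / N)
gain = 0.0
for t in range(T):
    gain += w @ utils[t]                         # learner's expected utility
    w = w * np.exp(eta * utils[t])               # Hedge step
    w = (1 - alpha) * w / w.sum() + alpha / N    # share step: hedges against changes

# regret against the best sequence of strategies with 2 switches
best_seq = sum(utils[s * T // 3:(s + 1) * T // 3].sum(axis=0).max() for s in range(3))
print(f"tracking regret: {best_seq - gain:.1f}  vs  "
      f"sqrt((K+1) T ln(NT)) = {np.sqrt(3 * T * np.log(N * T)):.1f}")
```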
EC.2. Stable Sequences via Marginal Privacy
Here we extend Corollary 5.1 to use a weaker form of privacy, marginal differential privacy, showing that results on marginal differential privacy would have sufficed for our main results from Section 5. This weaker form of privacy may make it easier to prove the existence of approximately optimal private solutions. We first state marginal privacy formally and then prove the extension of our results.
Definition EC.2.1 (Kannan et al. 2014). An algorithm $\mathcal{M} : \mathcal{C}^n \to \mathcal{G}^n$ is $(\epsilon, \delta)$-marginally differentially private if for every $i$, for every pair of $i$-neighbors $D, D' \in \mathcal{C}^n$, every other player $j \neq i$, and for every subset of outputs $S \subseteq \mathcal{G}$ for player $j$:
$$\Pr[\mathcal{M}(D)_j \in S] \leq \exp(\epsilon) \Pr[\mathcal{M}(D')_j \in S] + \delta.$$
If $\delta = 0$, we say that $\mathcal{M}$ is $\epsilon$-marginally differentially private.

As with joint privacy, we allow our algorithms to have a failure probability $\beta$, with which they either return a very inefficient solution or an infeasible solution.
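On finite spaces the definition can be checked mechanically: for a fixed $\epsilon$, the smallest admissible $\delta$ for player $j$ equals $\sum_g \max\{0, \Pr[\mathcal{M}(D)_j = g] - e^{\epsilon} \Pr[\mathcal{M}(D')_j = g]\}$, since the worst-case event $S$ collects exactly those outputs. The following sketch computes this quantity; the distributions are made-up stand-ins for the outputs of a mechanism on two $i$-neighboring inputs.

```python
import numpy as np
from itertools import product

def marginal_dp_delta(P, Q, eps, player_j, G):
    """Smallest delta such that Pr[M(D)_j in S] <= e^eps * Pr[M(D')_j in S] + delta
    for all S, for one player j. P, Q map outcome tuples in G^n to probabilities."""
    pj = np.zeros(len(G))
    qj = np.zeros(len(G))
    for outcome, prob in P.items():
        pj[G.index(outcome[player_j])] += prob   # marginal of player j under D
    for outcome, prob in Q.items():
        qj[G.index(outcome[player_j])] += prob   # marginal of player j under D'
    # worst-case event S* = {g : pj(g) > e^eps * qj(g)}
    return float(np.clip(pj - np.exp(eps) * qj, 0.0, None).sum())

G = [0, 1]                                       # toy per-player outcome space
n = 3
rng = np.random.default_rng(1)
outs = list(product(G, repeat=n))
P = dict(zip(outs, rng.dirichlet(np.ones(len(outs)))))   # output dist. on input D
Q = dict(zip(outs, rng.dirichlet(np.ones(len(outs)))))   # output dist. on input D'

eps = 0.5
for j in range(n):
    d = max(marginal_dp_delta(P, Q, eps, j, G), marginal_dp_delta(Q, P, eps, j, G))
    print(f"player {j}: smallest delta for eps={eps}: {d:.4f}")
```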
Theorem EC.2.1. Consider a repeated cost game with dynamic population $\Gamma = (G, T, p)$, such that the stage game $G$ is allocation-based $(\lambda, \mu)$-smooth and $T \geq 1/p$. Assume that there exists an $(\epsilon, \delta)$-marginally differentially private algorithm $A : \mathcal{V}^n \to \mathcal{X}^n$ with failure probability $\beta$ that satisfies the conditions of Theorem 5.1. If all players use adaptive learning in the repeated game, then the overall cost of the solution is at most:
$$\sum_t \mathbb{E}[C(s^t; v^t)] \leq \frac{\lambda \alpha}{1 - \mu} \sum_t \mathrm{Opt}(v^t) + \frac{nT}{1 - \mu}\, C_R \sqrt{2p\big(1 + n(2\epsilon + 2\beta + \delta)\big) \ln(NT)}.$$
Proof outline. The proof follows roughly the same outline as the proof of Corollary 5.1 (which used Lemma 5.1 and Theorem 3.1). The outline of the changes needed is as follows.
1. The notion of marginal privacy is not strong enough to allow the kind of global coupling offered by Theorem 5.1. Instead, we can couple the distributions $(v_i^T, x_i^T)$ separately for each player $i$, while ensuring that each sequence has expected number of changes in either her solution or type at most $p \cdot T\big(1 + n(2\epsilon + 2\beta + \delta)\big)$.
2. With no global coupling of solutions, we cannot directly use Theorem 3.1. Rather, we need to prove that the stable coupling of the distributions of each player's value and outcome individually is strong enough to reach the same conclusion.
We note that, while we can prove Theorems 3.1 and 3.2 without the need for global coupling, Theorem 3.3, which requires Property 1, does need the global coupling used there. □

We state the claims used by the two steps, and offer a sketch of how to modify the proofs used so far to prove the claims.
Lemma EC.2.1 (Stable sequences via marginal privacy). Suppose that there exists an algorithm $A : \mathcal{V}^n \to \mathcal{X}^n$ that is $(\epsilon, \delta)$-marginally differentially private, takes as input a valuation profile $v$, and outputs a distribution such that a sample from this distribution is feasible with probability $1 - \beta$ and is $\alpha$-approximately efficient in expectation (for $0 \leq \epsilon < 1/2$, $\alpha > 0$ and $\delta, \beta > 0$). Consider the sequence of valuations $v^T$ produced by the adversary in a repeated cost-minimization game with dynamic population $\Gamma = (G, p, T)$, and let $\sigma^T$ be the sequence of the resulting outcome distributions produced by algorithm $A$. Then there exists a randomized sequence of solutions $x_i^T$ for each player $i$, such that for each $1 \leq t \leq T$, conditional on $v^t$, for each $i$ the distribution of $(v_i^t, x_i^t)$ is the $i$-th marginal distribution of an $\alpha$-approximation to $\mathrm{Opt}(v^t)$, and the distribution of the sequences $(v_i^T, x_i^T)$ is such that the expected number of changes in $i$'s solution or type is at most $p \cdot T\big(1 + n(2\epsilon + 2\beta + \delta)\big)$ for each player $i$.

Proof of Lemma EC.2.1. This is an application of the coupling Lemma 5.2 to each distribution $\sigma_i$, where we use the optimal solution in the low-probability event that the marginally differentially private algorithm fails. Using the notation from the proof of Theorem 5.1, marginal privacy bounds the effect of a change in the valuation of a player $j \neq i$ on the distribution $\sigma_i$. Note that there is no requirement that the couplings are coordinated between the different coordinates, so the resulting distributions of sequences $(v_i^T, x_i^T)$ cannot be viewed as a distribution of global sequences $(v^T, x^T)$. □
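The coupling step rests on the standard fact that two distributions at total variation distance $d$ can be coupled so that the two samples differ with probability exactly $d$. Below is a minimal sketch of such a maximal coupling for finite distributions, with made-up inputs; this is the generic construction underlying arguments like Lemma 5.2, not the paper's own code.

```python
import numpy as np

def maximal_coupling(p, q, rng):
    """Draw (X, Y) with X ~ p, Y ~ q and Pr[X != Y] = TV(p, q)."""
    common = np.minimum(p, q)
    stay = common.sum()                      # = 1 - TV(p, q)
    if rng.uniform() < stay:
        x = rng.choice(len(p), p=common / stay)
        return x, x                          # samples agree
    # otherwise sample the disagreeing parts independently
    rp = (p - common) / (1 - stay)
    rq = (q - common) / (1 - stay)
    return rng.choice(len(p), p=rp), rng.choice(len(q), p=rq)

rng = np.random.default_rng(2)
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
tv = 0.5 * np.abs(p - q).sum()
draws = [maximal_coupling(p, q, rng) for _ in range(100_000)]
freq = np.mean([x != y for x, y in draws])
print(f"TV = {tv:.3f}, empirical Pr[X != Y] = {freq:.3f}")   # both ~0.1
```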
Next we prove the analog of Theorem 3.1, which will finish the proof of Theorem EC.2.1.

Theorem EC.2.2 (Improved main theorem for cost-minimization games). Consider a repeated cost game with dynamic population $\Gamma = (G, T, p)$, such that the stage game $G$ is allocation-based $(\lambda, \mu)$-smooth. Suppose $D^T$ is a sequence of solution distributions, such that the solution in $D^t$ has cost at most $\alpha$ times the minimum possible cost $\mathrm{Opt}(v^t)$ in expectation, and suppose the marginal distributions $D_i^T$ can be thought of as a randomized sequence of solutions $x_i^T$ for each player $i$, such that the distribution of the sequences $(v_i^T, x_i^T)$ has expected number of changes in $i$'s solution or type at most $k$. If players use adaptive learning algorithms with constant $C_R$ then:
$$\sum_t \mathbb{E}[C(s^t; v^t)] \leq \frac{\lambda \alpha}{1 - \mu} \sum_t \mathbb{E}[\mathrm{Opt}(v^t)] + \frac{n}{1 - \mu}\, C_R \sqrt{T (k+1) \ln(NT)}.$$
Proof of Theorem EC.2.2. We follow the outline of the proof of Theorem 3.1 up to equation (A.4). Then we take expectations of the resulting inequality to get
$$\sum_{t=1}^T \mathbb{E}\big[c_i(s^t; v^t)\big] \leq \sum_{t=1}^T \mathbb{E}\big[c_i(s_i^{*,t}, s_{-i}^t; v^t)\big] + C_R \sqrt{(k+1)\, T \ln(NT)}.$$
Adding over all players, and using the smoothness property, we get that
$$\sum_t \mathbb{E}\big[C(s^t; v^t)\big] \leq \lambda \sum_t \mathbb{E}\big[C(x^t; v^t)\big] + \mu \sum_t \mathbb{E}\big[C(s^t; v^t)\big] + n \cdot C_R \sqrt{(k+1)\, T \ln(NT)},$$
which finishes the proof. □
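The final rearrangement (moving the $\mu$ term to the left and dividing by $1 - \mu$) is elementary but worth seeing with numbers; the constants below are made up and only illustrate the step.

```python
# Tiny arithmetic illustration of the rearrangement step: from
#   sum_t E[C(s)] <= lam * sum_t E[C(x)] + mu * sum_t E[C(s)] + REG
# one gets   sum_t E[C(s)] <= (lam * sum_t E[C(x)] + REG) / (1 - mu).
lam, mu = 1.0, 0.25          # made-up smoothness constants with mu < 1
cost_x, reg = 100.0, 8.0     # made-up benchmark cost and total regret term
bound = (lam * cost_x + reg) / (1 - mu)
print(f"sum_t E[C(s^t; v^t)] <= {bound:.2f}")
```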
We can prove the analogous theorems for mechanisms as well.

Theorem EC.2.3 (Improved main theorem for mechanisms). Consider a repeated mechanism with dynamic population $\mathcal{M} = (M, T, p)$, such that the stage mechanism $M$ is allocation-based $(\lambda, \mu)$-smooth. Suppose $\sigma^T$ is a sequence of solution distributions, such that the solution in $\sigma^t$ has social welfare at least an $\alpha$ fraction of the maximum possible value $\mathrm{Opt}(v^t)$ in expectation, and suppose the marginal distributions $\sigma_i^T$ can be thought of as a randomized sequence of solutions $x_i^T$ for each player $i$, such that the distribution of the sequences $(v_i^T, x_i^T)$ has expected number of changes in $i$'s solution or type at most $k$ for each player $i$. If players use adaptive learning algorithms with constant $C_R$ then:
$$\sum_t \mathbb{E}[W(s^t; v^t)] \geq \frac{\lambda \alpha}{\max\{1, \mu\}} \sum_t \mathbb{E}[\mathrm{Opt}(v^t)] - n \cdot C_R \sqrt{T (k+1) \ln(NT)}.$$

Theorem EC.2.4. Consider a repeated mechanism with dynamic population $\mathcal{M} = (M, T, p)$, such that the stage mechanism $M$ is allocation-based $(\lambda, \mu)$-smooth and $T \geq 1/p$. Assume that there exists an $(\epsilon, \delta)$-marginally differentially private algorithm $A : \mathcal{V}^n \to \mathcal{X}^n$ with error parameter $\beta$ that satisfies the conditions of Theorem 5.1. If all players use adaptive learning with constant $C_R$ in the repeated mechanism, then the overall welfare of the solution is at least:
$$\sum_t \mathbb{E}[W(s^t; v^t)] \geq \frac{\lambda \alpha}{\max\{1, \mu\}} \sum_t \mathbb{E}[\mathrm{Opt}(v^t)] - nT \cdot C_R \sqrt{2p\big(1 + n(2\epsilon + 2\beta + \delta)\big) \ln(NT)}.$$

EC.3. Large Congestion Games with General Latencies
Considering congestion games more generally, Rogers et al. (2015) assume that the latency functions $\ell_e(x)$ satisfy the following conditions:
1. The functions $\ell_e(x)$ are non-decreasing, convex and twice differentiable.
2. The latency on each edge is bounded by 1, that is, $\ell_e(n) \leq 1$.
3. The functions are $\gamma$-Lipschitz, that is, $|\ell_e(x) - \ell_e(x')| \leq \gamma |x - x'|$, for some parameter $0 < \gamma < 1$.
Under these conditions, they give an algorithm that satisfies $(\epsilon, \delta)$ joint differential privacy, has an error probability of $\beta$ for parameters $\epsilon, \delta, \beta > 0$, and for player types $v$, with probability $1 - \beta$, returns a solution $x$ with close to minimum cost:
$$C(x; v) \leq \mathrm{Opt}(v) + 20\, m^{3/2}\, n\, \frac{\sqrt{\gamma}}{\sqrt{\epsilon}} \log\big(2 m n\, \gamma \epsilon(n) / \beta(n)\big) \sqrt{\ln\big(2 / \delta(n)\big)}.$$
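Conditions 1-3 are easy to verify for concrete latency families. The following sketch checks them for the toy family $\ell_e(x) = (x/n)^2$ (our own example, which satisfies them with $\gamma = 2/n$); it is only an illustration of the conditions, not of the private algorithm itself.

```python
import numpy as np

# Check of conditions 1-3 for the made-up family l_e(x) = (x/n)^2.
n = 100
xs = np.arange(n + 1)
vals = (xs / n) ** 2                        # latency on integer loads 0..n
diffs = np.diff(vals)

assert (diffs >= 0).all()                   # 1a: non-decreasing
assert (np.diff(diffs) >= -1e-12).all()     # 1b: convex (increasing increments)
assert vals[-1] <= 1.0                      # 2: l_e(n) <= 1
gamma = 2 / n
assert (np.abs(diffs) <= gamma).all()       # 3: gamma-Lipschitz on integer loads
print(f"l(x) = (x/n)^2 satisfies the conditions with gamma = {gamma}")
```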
Polynomial Latencies.
Using this algorithm, we can extend the result for linear congestion games in Section 5.2 to polynomial latency functions. Consider congestion games with latency functions that are polynomials of the form $\ell_e(x) = \sum_{j=0}^{d} a_{e,j} x^j$ with $a_{e,d} > 0$ and $a_{e,j} \geq 0$ for all $j$. More formally:

Theorem EC.3.1. Consider a repeated congestion game with dynamic population $\Gamma = (G, T, p)$, such that $T \geq 1/p$ and the stage game $G$ is an atomic $(\lambda, \mu)$ allocation-based smooth congestion game with polynomial latency functions $\ell_e(x) = \sum_{j=0}^{d} a_{e,j} x^j$ with $a_{e,d} > 0$ and $a_{e,j} \geq 0$ for all $e$ and $j \neq d$. For any $\eta > 0$, if all players use adaptive learning algorithms with constant $C_R$, then the overall expected cost is bounded by $\sum_t \mathbb{E}[C(s^t; v^t)] \leq \frac{\lambda}{1-\mu}(1+\eta) \sum_t \mathrm{Opt}(v^t)$, assuming the probability $p$ of departures is at most
$$C \cdot \eta^2 \cdot \big(d \cdot m^{d+6}\big)^{-1} \cdot \big(\ln(T)\big)^{-1}$$
for some
$$C = \Theta\!\left( \left( \frac{\min_e a_{e,d}}{\max_e \sum_j a_{e,j}} \right)^{2} \cdot (C_R)^{-2} \cdot \Big( \log^2(6 m^2 n)\, \log(3n)\, d \Big)^{-1} \right).$$
Proof of Theorem EC.3.1. The proof follows the same steps as the proof of Theorem 5.2; here we only spell out the places where the analysis differs. As there, let $\epsilon(n)$, $\delta(n)$ and $\beta(n)$ be the privacy parameters of the algorithm.

In order to make the latency function on each edge bounded by 1, as required by the algorithm, we need to scale the latency of each edge by an upper bound on it. As the upper bound we use $n^d \big(\max_e \sum_{j=0}^{d} a_{e,j}\big)$. Recall that for affine latencies this upper bound was $n \max_e (a_e + b_e)$, so here we are using its natural extension to polynomials of degree $d$.

This scaling down also makes the latencies $d/n$-Lipschitz, as required by the algorithm:
$$\frac{\ell_e(n) - \ell_e(n-1)}{n^d \big(\max_e \sum_{j=0}^{d} a_{e,j}\big)} \leq \frac{\big(n^d - (n-1)^d\big) \cdot \big(\max_e \sum_{j=0}^{d} a_{e,j}\big)}{n^d \big(\max_e \sum_{j=0}^{d} a_{e,j}\big)} \leq \frac{d \cdot n^{d-1}}{n^d} = \frac{d}{n}.$$

Similarly to the affine case, to claim that this is a $(1+\eta)$-approximation algorithm we need to guarantee that
$$141\, m^{3/2}\, \frac{\sqrt{nd}}{\sqrt{\epsilon(n)}} \log\big(2 m n d\, \epsilon(n) / \beta(n)\big) \sqrt{\ln\big(2 / \delta(n)\big)} \cdot n^d \Big(\max_e \sum_{j=0}^{d} a_{e,j}\Big) \leq \eta\, \mathrm{Opt}.$$

The lower bound we use for the optimum is $\mathrm{Opt} \geq n \min_e a_{e,d} \big(\frac{n}{m}\big)^d = \frac{n^{d+1}}{m^d} \min_e a_{e,d}$, again assuming that each player congests at least one element, and using the fact that all latency functions have degree $d$. Hence, the desired approximation bound is guaranteed for
$$n \geq \left( 141\, m^{3/2} \sqrt{d}\, \frac{\log\big(2 m n d\, \epsilon(n) / \beta(n)\big) \sqrt{\ln\big(2 / \delta(n)\big)}}{\sqrt{\epsilon(n)}} \cdot \frac{m^d \big(\max_e \sum_{j=0}^{d} a_{e,j}\big)}{\eta \min_e a_{e,d}} \right)^{2}.$$

The rest of the proof goes as the proof of Theorem 5.2, replacing $\epsilon(n)$ and the upper and lower bounds accordingly. □
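To make the scaling step concrete, the following sketch (with made-up coefficients) checks numerically that dividing degree-$d$ polynomial latencies by $n^d \big(\max_e \sum_j a_{e,j}\big)$ keeps them in $[0,1]$ and makes them $d/n$-Lipschitz on integer loads.

```python
import numpy as np

# Illustration of the scaling step: made-up polynomial latencies of degree d.
rng = np.random.default_rng(3)
n, m, d = 50, 5, 3
a = rng.uniform(0.1, 1.0, size=(m, d + 1))     # a[e, j] = coefficient of x^j

def latency(e, x):
    return sum(a[e, j] * x**j for j in range(d + 1))

scale = n**d * a.sum(axis=1).max()             # n^d * max_e sum_j a_{e,j}

for e in range(m):
    vals = np.array([latency(e, x) / scale for x in range(n + 1)])
    assert vals.max() <= 1.0                   # bounded by 1 after scaling
    steps = np.abs(np.diff(vals))              # scaled |l(x) - l(x-1)|
    assert steps.max() <= d / n + 1e-12        # d/n-Lipschitz on integer loads
print("scaled latencies are in [0,1] and d/n-Lipschitz")
```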
General Congestion Games. We can use the algorithm from the proof of Corollary 5.1 for general congestion games satisfying the conditions of Rogers et al. (2015), and we get the following theorem.
Theorem EC.3.2. Consider a repeated congestion game with dynamic population $\Gamma = (G, T, p)$, such that the stage game $G$ is allocation-based $(\lambda, \mu)$-smooth and $T \geq 1/p$. Assume the game satisfies the conditions above. For any parameters $\epsilon, \delta, \beta > 0$, if all players use adaptive learning algorithms with constant $C_R$ in the repeated game, then the overall cost of the solution is at most:
$$\sum_t \mathbb{E}[C(s^t; v^t)] \leq \frac{\lambda}{1 - \mu} \sum_t \mathrm{Opt}(v^t) + \frac{nmT}{1 - \mu}\, \tilde{O}\Big( \sqrt{2p\big(1 + n(2\epsilon + 2\beta + \delta)\big)} + \lambda\, m^{1/2} \gamma^{1/2} \epsilon^{-1/2} \Big),$$
where the $\tilde{O}$ hides a polylog term in $N, T, \epsilon, 1/\delta, 1/\beta, n, m$.

Proof of Theorem EC.3.2. A small technical difficulty in using the proof of Corollary 5.1 in a black-box form is that Corollary 5.1, as well as the main Theorem 3.1 used to prove it, are stated with multiplicative error bounds. However, using the additive error in the proof of Theorem 3.1, we get the following, where $v^t$ is the type vector of the players, $s^t$ is the strategy vector played at time $t$, and $x^t$ is the allocation that the differentially private algorithm generates. The assumption for congestion games was that each individual latency is bounded by 1. Dividing each latency function by $m$, the number of edges, to make the total latency bounded by 1 (or, equivalently, scaling down the error bounds from Corollary 5.1 by a factor of $m$), we get
$$\sum_t \mathbb{E}[C(s^t; v^t)] \leq \lambda \sum_t \mathbb{E}[C(x^t; v^t)] + \mu \sum_t \mathbb{E}[C(s^t; v^t)] + nmT\, C_R \sqrt{2p\big(1 + n(2\epsilon + 2\beta + \delta)\big) \ln(NT)}.$$
Adding the bound for the quality of the solution $x^t$, and rearranging terms, we get the claimed bound. □

EC.4. Removing the dependence on T

In our results presented so far, we have a logarithmic dependence on the total time $T$ the game is played. Here we show that, with a more careful analysis, this dependence is not needed.
Theorem EC.4.1. Under the assumptions of Theorem 3.1, the bound can be replaced by:
$$\sum_t \mathbb{E}[C(s^t; v^t)] \leq \frac{\lambda \alpha}{1 - \mu} \sum_t \mathbb{E}[\mathrm{Opt}(v^t)] + \frac{1}{1 - \mu} \cdot n \cdot C_R \sqrt{T (k+1) \ln\!\left( \frac{2N}{p} \ln\Big(\frac{n}{\kappa}\Big) \right)} + \frac{1}{1 - \mu} \cdot \kappa T$$
for all $\kappa \in (0, n/e)$.
Proof of Theorem EC.4.1. In the proof of Theorem 3.1, the dependence on the total time $T$ shows up in equation (A.2), bounding the regret of a player over time. The bound on regret is derived from Theorem 2.1 of Luo and Schapire (2015), where the regret over an interval of time $[\tau_1, \tau_2)$ is bounded with $\tau_2$ inside the logarithm. In equation (A.2) we used the upper bound $\tau_2 \leq T$ for all the regret terms.

If all players in our game live at most $T_{\max}$ steps, we can bound the total regret of the players in one position $i$ (using the shorthand $r_i^*$ from the proof of Theorem 3.1) as:
$$r_i^* \leq C_R \sqrt{(K_i + 1) \sum_{r=1}^{K_i + 1} (\tau_{r+1} - \tau_r) \ln(N T_{\max})} = C_R \sqrt{(K_i + 1)\, T \ln(N T_{\max})}.$$
With a high enough $T_{\max}$, only a very small fraction of the players will live more than $T_{\max}$ steps. To bound the overall regret without any assumption on how long players can live, we can bound the regret of such long-living players by 1 in each step.

Let $L_i^t$ denote the random event that at time $t$ player $i$ has been alive for more than $T_{\max}$ steps, for a value of $T_{\max}$ that we will set later. Let also $L_{i,t}$ be the indicator random variable of the event $L_i^t$. Following the proof of Theorem 3.1, and bounding the regret by 1 for each player $i$ at any step $t$ at which $L_i^t$ occurs, we get the following bound:
$$\sum_t \mathbb{E}[C(s^t; v^t)] \leq \frac{\lambda \alpha}{1 - \mu} \sum_t \mathbb{E}[\mathrm{Opt}(v^t)] + \frac{n}{1 - \mu}\, C_R \sqrt{T (k+1) \ln(N T_{\max})} + \frac{1}{1 - \mu}\, \mathbb{E}\Big[\sum_{i,t} L_{i,t}\Big].$$
To prove the theorem, we set $T_{\max} = \frac{2 \ln(n/\kappa)}{p}$, and we will show that this suffices to get $\mathbb{E}\big[\sum_{i,t} L_{i,t}\big] \leq \kappa T$, which finishes the proof.

To bound the expected value of the sum $\mathbb{E}[\sum_t L_{i,t}]$ for a given player $i$, divide the sequence of $T$ time steps into intervals $I_j$ of length $T_{\max}/2$. For any interval $I_j$, let $B_{i,j}$ denote the event that player $i$ doesn't change value throughout this interval, and note that the probability of this event is bounded by $\Pr[B_{i,j}] = (1-p)^{T_{\max}/2}$. Now note that, if $L_{i,t} = 1$, i.e. player $i$ has lived more than $T_{\max}$ steps at some time $t \in I_j$, there exists a contiguous sequence of at least one interval ending at $I_{j-1}$ throughout which player $i$ has not changed value. We will say that player $i$ at time $t$ is associated with the first interval in this sequence. Note that, with this process, every player $i$ at a time step $t$ with $L_{i,t} = 1$ is associated with at most one interval $I_j$ where a bad event occurs. Hence, $\mathbb{E}\big[\sum_{i,t} L_{i,t}\big]$ is at most the expected number of steps $t$ at which player $i$ is associated with an interval where a bad event occurred.

To get the claimed bound, we note the following facts:
• there are $n$ players (indices $i$) we need to consider,
• for each index $i$ we consider $2T/T_{\max}$ intervals,
• the probability that an interval is associated with one particular long-living player $i$ is bounded by $(1-p)^{T_{\max}/2}$,
• for every player index $i$, a bad event in an interval may incur an expected increase in $\mathbb{E}[\sum_t L_{i,t}]$ of at most the expected lifespan of the user after the interval, i.e. $(1-p) + (1-p)^2 + \cdots \leq 1/p$ (as every player $i$ has a probability $p$ at each step to turn over).

Combining these, we get the bound
$$\mathbb{E}\Big[\sum_{i,t} L_{i,t}\Big] \leq n \cdot \frac{2T}{T_{\max}} \cdot (1-p)^{T_{\max}/2} \cdot \frac{1}{p}.$$
Substituting $T_{\max}$ and using that $(1-p)^{1/p} \leq 1/e$, we get the following bound:
$$\mathbb{E}\Big[\sum_{i,t} L_{i,t}\Big] \leq n \cdot \frac{T p}{\ln(n/\kappa)} \cdot e^{-\ln(n/\kappa)} \cdot \frac{1}{p} = \frac{n \cdot T}{\ln(n/\kappa)} \cdot \frac{\kappa}{n} \leq \kappa T,$$
where the last inequality follows from the assumption that $\kappa \leq n/e$ and hence $\ln(n/\kappa) \geq 1$. □
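A quick Monte Carlo sanity check of this calculation (our own illustration, not part of the proof): with independent per-step turnover probability $p$, count the steps at which a player has been alive more than $T_{\max} = \frac{2}{p}\ln(n/\kappa)$ steps and compare against $\kappa T$.

```python
import numpy as np

# Monte Carlo check (illustrative) of E[sum_{i,t} L_{i,t}] <= kappa * T:
# each of n player slots turns over independently with probability p per step;
# L_{i,t} = 1 if the current occupant of slot i is older than T_max steps.
rng = np.random.default_rng(4)
n, p, T, kappa = 20, 0.01, 100_000, 0.5
T_max = int(np.ceil(2 * np.log(n / kappa) / p))

total = 0
for i in range(n):
    age = 0
    turnover = rng.uniform(size=T) < p
    for t in range(T):
        age = 0 if turnover[t] else age + 1
        total += age > T_max                    # contributes L_{i,t}

print(f"E[sum L]/T ~ {total / T:.4f}  vs  kappa = {kappa}")
```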
Corollary EC.4.1. In Theorem 5.2, it suffices to bound the probability of departures by
$$O\!\left( \left( \frac{\min_e a_e}{\max_e (a_e + b_e)} \cdot \eta \right)^{2} \cdot m^{-7} \cdot \left( \mathrm{polylog}\!\left( n, m, \eta, \frac{\min_e a_e}{\max_e (a_e + b_e)} \right) \right)^{-1} \right).$$

Proof of Corollary EC.4.1. From Theorem EC.4.1, by setting
$$\kappa = \frac{\eta}{2\, m \cdot n \cdot \max_e (a_e + b_e)} \cdot \frac{n^2}{m} \min_e a_e = \frac{\eta\, n \min_e a_e}{2\, m^2 \max_e (a_e + b_e)},$$
together with the conditions of Theorem 5.2, we get that the approximation guarantee in the theorem holds if the probability of departure $p$ is at most
$$O\!\left( \left( \frac{\min_e a_e}{\max_e (a_e + b_e)}\, \eta \right)^{2} \right) \cdot \big( m^{7} \log^2(m \cdot n) \ln(n) \big)^{-1} \cdot \left( \ln \frac{2 \ln(n/\kappa)}{p} \right)^{-1},$$
which essentially is derived by replacing $T$ with $\frac{2 \ln(n/\kappa)}{p}$ in the bound stated in Theorem 5.2. To observe that, note that if we do the analysis in the proof of Theorem 5.2 using $\eta/2$ in place of $\eta$ and replace $\kappa$ as described, then the first two terms of the RHS of Theorem EC.4.1, after rescaling back with $m \cdot n \cdot \max_e (a_e + b_e)$, can be upper bounded by $\frac{\lambda}{1-\mu}\big(1 + \frac{\eta}{2}\big)\, \mathbb{E}\big[\sum_t \mathrm{Opt}(v^t)\big]$ (given that $\lambda/(1-\mu) \geq 1$), while the third term is at most $\frac{\eta}{2}\, \mathbb{E}\big[\sum_t \mathrm{Opt}(v^t)\big]$ after rescaling, by our choice of $\kappa$ and by the lower bound on the optimum of $\frac{n^2}{m} \min_e a_e$.
Thus the requirement on the probability $p$ is of the form
$$p \leq \frac{A}{\log(B/p)} \quad \text{(EC.4.1)}$$
for
$$A = O\!\left( \left( \frac{\min_e a_e}{\max_e (a_e + b_e)}\, \eta \right)^{2} \cdot \big( m^{7} \log^2(m \cdot n) \ln(n) \big)^{-1} \right)$$
and $B = 2 \ln(n/\kappa) > 1$.

We argue that $p \leq \frac{A}{2 \log(2B/A)}$ implies Inequality (EC.4.1) and hence is a sufficient upper bound on the probability $p$. Observe that the function $g(p) = p \log(B/p)$ is monotone increasing in the region $p \in [0, B/e]$. Without loss of generality, in this analysis assume that $p < 1/e$; the latter monotonicity then holds in this range, since $B > 1$. Moreover, we might as well assume that $\frac{A}{2 \log(2B/A)} < 1/e$, since we can always assume that $A < 1/e$. Thus, if $p \leq \frac{A}{2 \log(2B/A)}$, then:
$$p \log(B/p) = g(p) \leq g\!\left( \frac{A}{2 \log(2B/A)} \right) = \frac{A}{2 \log(2B/A)} \log\!\left( \frac{2 B \log(2B/A)}{A} \right) = \frac{A}{2 \log(2B/A)} \left( \log\!\left( \frac{2B}{A} \right) + \log \log\!\left( \frac{2B}{A} \right) \right) \leq \frac{A}{2 \log(2B/A)}\, 2 \log\!\left( \frac{2B}{A} \right) = A,$$
which is exactly Inequality (EC.4.1). Thus we conclude that $p \leq \frac{A}{2 \log(2B/A)}$ suffices to get the efficiency guarantee we want. Replacing $A$ and $B$ in the latter gives an upper bound of the asymptotic form stated in the corollary, which concludes the proof. □
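A short numerical check of this implication (our own illustration, over a few admissible choices of $A$ and $B$):

```python
import numpy as np

# Check (illustrative): p = A / (2 * log(2B/A)) satisfies p * log(B/p) <= A.
for A in [1e-3, 1e-5, 0.3]:          # all below 1/e, as the proof assumes
    for B in [1.5, 10.0, 1e4]:       # B > 1, as the proof assumes
        p = A / (2 * np.log(2 * B / A))
        assert p * np.log(B / p) <= A + 1e-12, (A, B)
print("p <= A / (2 log(2B/A)) implies p log(B/p) <= A on all test points")
```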
Theorem EC.4.2. Under the assumptions of Theorem 3.2, the bound can be replaced by:
$$\sum_t \mathbb{E}[W(s^t; v^t)] \geq \frac{\lambda}{\alpha \max\{1, \mu\}} \sum_t \mathbb{E}[\mathrm{Opt}(v^t)] - n \cdot C_R \sqrt{T (k+1) \ln\!\left( \frac{2N}{p} \ln\Big(\frac{n}{\kappa}\Big) \right)} - \kappa T$$
for all $\kappa \in (0, n/e)$, where the term under the square root improves to $T \cdot m (k \cdot n + m) \ln\!\big( \frac{2N}{p} \ln(m/\kappa) \big)$ under Property 1.

Proof of Theorem EC.4.2. The proof of the first part of the theorem has the same steps as the proof of Theorem EC.4.1, hence we omit it. For the second part, the proof is also the same, albeit invoking the proof of Theorem 3.3 to replace $n$ with $m$. The main difference for the latter result is that, under Property 1, it suffices to set $T_{\max} = \frac{2}{p} \ln(m/\kappa)$, as in the second term we add at most $m \cdot \frac{T}{T_{\max}/2}$ summands. Hence, we can totally remove the dependence on $n$. □
Corollary EC.4.2. Theorem 5.3 continues to hold with an extra $\eta$ multiplicative loss in the welfare, even under the weaker requirement that the probability of departure is at most $O\!\left( \frac{\eta \cdot \rho \cdot c}{m \cdot \mathrm{polylog}(n, m, s, \eta, \rho, c, N)} \right)$, i.e. there is no dependence on $T$ at all in the upper bound.

Proof of Corollary EC.4.2. Similarly to the proof of Corollary EC.4.1, we set $A = \frac{\eta\, \rho\, c}{m\, \mathrm{polylog}(n, m, s)}$ and $B = N \ln\big(1/(\eta \cdot \rho \cdot c)\big)$. The claim then follows from the previous theorem by setting $\kappa = \eta \cdot \rho \cdot c \cdot n$. □
Corollary EC.4.3. Theorem 4.1 continues to hold with an extra $\epsilon$ multiplicative loss in the welfare, even under the weaker requirement that the probability of departure is at most $O\!\left( \frac{\rho\, \epsilon}{\log_{(1+\epsilon)}(1/\rho)\, \mathrm{polylog}(N, \rho, \epsilon)} \right)$, i.e. there is no dependence on $T$ at all in the upper bound.

Proof of Corollary EC.4.3. Again similarly to the proof of Corollary EC.4.1, we set $A = \frac{\rho\, \epsilon}{(1+\epsilon) \log_{(1+\epsilon)}(1/\rho)}$ and $B = N \ln\big(1/(\epsilon \rho)\big)$. The claim then follows from the previous theorem by setting $\kappa = \epsilon \cdot m \rho$. □
Corollary EC.4.4. Theorem 4.2 continues to hold with an extra $\epsilon$ multiplicative loss in the welfare, even under the weaker requirement that the probability of departure is at most $\frac{\rho\, \epsilon\, \alpha (1-\epsilon)}{\log_{(1+\epsilon)}\big(\alpha(1-\epsilon)/(\rho \epsilon)\big) \ln(NT)}$, i.e. there is no dependence on $T$ at all in the upper bound.

Proof of Corollary EC.4.4. Similarly to the proof of Corollary EC.4.3, we set $A = \rho\, \delta\, \epsilon\, \alpha (1-\epsilon)$ and $B = N \ln\big(1/(\epsilon \rho \delta)\big)$, where $\delta = \epsilon \rho \alpha (1-\epsilon)$. The claim then follows from the previous theorem by setting $\kappa = \epsilon \rho$. □

EC.5. Smoothness of First Price Auction with Discrete Bid Spaces
Lemma 4.1. The simultaneous first price mechanism where players are restricted to bid on at most $d$ items, and on each item submit a bid that is a multiple of $\delta \cdot \rho / 2$, is a solution-based $\big(\frac{1-\delta}{2}, 1\big)$-smooth mechanism, when players have submodular valuations such that all marginals are either $0$ or at least $\rho$, and such that each player wants at most $d$ items, i.e. $v_i(S) = \max_{T \subseteq S : |T| = d} v(T)$.

Proof of Lemma 4.1. Throughout, we denote by $\lfloor x \rfloor_{\theta}$ the largest multiple of $\theta$ that is less than or equal to $x$. Consider a valuation profile $v = (v_1, \ldots, v_n)$ for the $n$ players and a bid profile $b = (b_1, \ldots, b_n)$. Each valuation $v_i$ is submodular and thereby also falls into the class of XOS valuations (Lehmann et al. 2001), i.e. it can be expressed as a maximum over additive valuations. More formally, for some index set $\mathcal{L}_i$:
$$v_i(S) = \max_{\ell \in \mathcal{L}_i} \sum_{j \in S} a_{ij}^{\ell}.$$
Moreover, by the assumption that marginals are either $0$ or at least $\rho$, it can easily be shown that each $a_{ij}^{\ell}$ is either $0$ or at least $\rho$. Moreover, when the player has value for at most $d$ types of items, it can also be shown that for any $\ell \in \mathcal{L}_i$ at most $d$ of the $(a_{ij}^{\ell})_{j \in [m]}$ are non-zero.

Consider a feasible allocation $x = (x_1, \ldots, x_n)$ of the items to the bidders, where $x_i$ is the set of types of items allocated to player $i$ (the allocation is feasible if each item is never allocated more than its supply). Consider the following deviation $b_i^*(v_i, x_i)$ that is related to the valuation $v_i$ of player $i$ and to the allocation $x_i$: let $\ell^*(x_i) = \arg\max_{\ell \in \mathcal{L}_i} \sum_{j \in x_i} a_{ij}^{\ell}$. Then on each item $j \in x_i$ with $a_{ij}^{\ell^*(x_i)} > 0$, submit the bid $\big\lfloor \frac{1}{2} a_{ij}^{\ell^*(x_i)} \big\rfloor_{\delta \rho / 2}$; on each $j \notin x_i$, submit a zero bid. This submits at most $d$ non-zero bids.

Now we argue that these deviations imply the solution-based smoothness property. Let $p_j(b)$ be the lowest winning bid on item $j$ under bid profile $b$, and let $x_i^+ = \{j \in x_i : a_{ij}^{\ell^*(x_i)} > 0\}$. Observe that, for each $j \in x_i^+$, if $p_j(b) < \big\lfloor \frac{1}{2} a_{ij}^{\ell^*(x_i)} \big\rfloor_{\delta \rho / 2}$, the player wins item $j$ and pays $\big\lfloor \frac{1}{2} a_{ij}^{\ell^*(x_i)} \big\rfloor_{\delta \rho / 2}$. Thus we get:
$$u_i(b_i^*(v_i, x_i), b_{-i}; v_i) \geq \sum_{j \in x_i^+} \Big( a_{ij}^{\ell^*(x_i)} - \big\lfloor \tfrac{1}{2} a_{ij}^{\ell^*(x_i)} \big\rfloor_{\delta \rho / 2} \Big)\, \mathbb{1}\Big\{ p_j(b) < \big\lfloor \tfrac{1}{2} a_{ij}^{\ell^*(x_i)} \big\rfloor_{\delta \rho / 2} \Big\} \geq \sum_{j \in x_i^+} \big\lfloor \tfrac{1}{2} a_{ij}^{\ell^*(x_i)} \big\rfloor_{\delta \rho / 2}\, \mathbb{1}\Big\{ p_j(b) < \big\lfloor \tfrac{1}{2} a_{ij}^{\ell^*(x_i)} \big\rfloor_{\delta \rho / 2} \Big\} \geq \sum_{j \in x_i^+} \Big( \big\lfloor \tfrac{1}{2} a_{ij}^{\ell^*(x_i)} \big\rfloor_{\delta \rho / 2} - p_j(b) \Big) \geq \sum_{j \in x_i^+} \Big( \tfrac{1}{2} a_{ij}^{\ell^*(x_i)} - \tfrac{\delta \rho}{2} - p_j(b) \Big) \geq \frac{1 - \delta}{2} \sum_{j \in x_i^+} a_{ij}^{\ell^*(x_i)} - \sum_{j \in x_i^+} p_j(b) \geq \frac{1 - \delta}{2}\, v_i(x_i) - \sum_{j \in x_i} p_j(b).$$
Here the second inequality uses $a_{ij}^{\ell^*(x_i)} - \lfloor \frac{1}{2} a_{ij}^{\ell^*(x_i)} \rfloor_{\delta \rho / 2} \geq \frac{1}{2} a_{ij}^{\ell^*(x_i)} \geq \lfloor \frac{1}{2} a_{ij}^{\ell^*(x_i)} \rfloor_{\delta \rho / 2}$; the second-to-last inequality uses $\frac{\delta \rho}{2} \leq \frac{\delta}{2} a_{ij}^{\ell^*(x_i)}$, since every non-zero marginal is at least $\rho$; and the last inequality uses $v_i(x_i) = \sum_{j \in x_i^+} a_{ij}^{\ell^*(x_i)}$ together with $p_j(b) \geq 0$. Summing over all players and observing that $R(b) \geq \sum_i \sum_{j \in x_i} p_j(b)$, we get the lemma. □
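As a sanity check of the deviation argument (our own illustration, using the halved-and-rounded bids from the reconstruction of the proof above), the following sketch verifies the per-player inequality $u_i(b_i^*, b_{-i}; v_i) \geq \frac{1-\delta}{2} v_i(x_i) - \sum_{j \in x_i} p_j(b)$ on random instances with additive marginals.

```python
import numpy as np

# Illustrative check of the deviation argument in Lemma 4.1 for a single player:
# additive marginals a_j >= rho on the items in x_i, deviation bid floor(a_j / 2)
# on the delta*rho/2 grid, win iff the bid beats the lowest winning bid p_j,
# and pay the own bid (first price).
rng = np.random.default_rng(5)
rho, delta = 0.2, 0.3
grid = delta * rho / 2

def floor_to_grid(x):
    return np.floor(x / grid) * grid

for trial in range(10_000):
    k = rng.integers(1, 6)                      # |x_i| items with positive marginal
    a = rng.uniform(rho, 2.0, size=k)           # marginals, all >= rho
    p = rng.uniform(0.0, 2.5, size=k)           # lowest winning bids under b
    bid = floor_to_grid(a / 2)
    util = np.where(p < bid, a - bid, 0.0).sum()    # utility of the deviation
    lower = (1 - delta) / 2 * a.sum() - p.sum()
    assert util >= lower - 1e-9, (a, p)
print("u_i(b*_i, b_-i) >= (1-delta)/2 * v_i(x_i) - sum_j p_j(b) held on all trials")
```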