Distributionally Robust Games: f-Divergence and Learning
Dario Bauso∗, Jian Gao and Hamidou Tembine†

November 6, 2018
Abstract
In this paper we introduce the novel framework of distributionally robust games. These are multi-player games where each player models the state of nature using a worst-case distribution, also called an adversarial distribution. Thus each player's payoff depends on the other players' decisions and on the decision of a virtual player (nature) who selects an adversarial distribution of scenarios. This paper provides three main contributions. Firstly, the distributionally robust game is formulated using the statistical notion of f-divergence between two distributions, here represented by the adversarial distribution and the exact distribution. Secondly, the complexity of the problem is significantly reduced by means of triality theory. Thirdly, stochastic Bregman learning algorithms are proposed to speed up the computation of robust equilibria. Finally, the theoretical findings are illustrated in a convex setting and their limitations are tested with a non-convex non-concave function.

Keywords:
Robustness, distribution uncertainty, stochastic optimization, robust game

∗ Department of Automatic Control and Systems Engineering, The University of Sheffield, Mappin Street, Sheffield, S1 3JD, United Kingdom. [email protected]
† Learning & Game Theory Laboratory, New York University Abu Dhabi. [email protected]

Introduction
Games with payoff uncertainty refer to games where the outcome of a play is not known with certainty by the players. Such games are also called incomplete information games and can be formalized in different ways. Distribution-free models of incomplete information games, both with and without private information, are examined in [1, 2]. There, the players use a robust optimization approach to contend with the worst-case scenario payoff. The distribution-free approach relaxes the well-known Bayesian game model of Harsanyi. The limitation of the distribution-free model is that the uncertainty set has to be carefully designed, and in most cases such an approach leads to overly conservative and unrealistic scenarios.

Strategic learning has proven to be a powerful approach in stochastic games. In particular, its algorithmic nature is well suited to accommodate parallel and distributed information exchange and processing as well as hardware realizability. However, almost all existing learning approaches work well only for specific classes of games such as concave-convex zero-sum games, convex potential games, and some S-modular game problems with unimodal objective functions. For more general classes of games, convergence of strategic learning dynamics is still an open issue. In addition, learning algorithms for finding fixed points or equilibria for general classes of games still present several challenges. For polymatrix games with finite action spaces, there has been great progress, including Cournot adjustment, Brown-von Neumann-Nash dynamics, reinforcement learning, and combined learning (see [13] and the references therein). For continuous action spaces, however, only a handful of works are available. Evolutionary dynamics and revision protocols based on actions have been proposed [6, 9, 5]. The exploration of continuous actions takes too much time if the dynamics is based on individual actions or measurable subsets [3, 7, 8].
Moreover, the convergence time of these existing strategic learning algorithms (when convergence to a point or a limit cycle occurs) is unacceptably high even for potential games, and they often require strong assumptions such as bounded densities. The above-mentioned prior works do not consider the robust game setting. In [1, 2] a robust game framework is presented. The authors defined a distribution-free approach (by considering worst-case performance). However, the choice of the uncertainty set remains an important part of robust game modelling. In this work we are interested in learning in distributionally robust games under f-divergence.

We make several contributions in this paper. We introduce for the first time a novel game model, called the distributionally robust game. This game provides a new and original way of addressing game scenarios with incomplete information. For this game, we provide a rigorous definition of distributionally robust equilibrium. Distributionally robust games accommodate both finite and continuous action spaces. The relevance of formulating such a new game is that it relaxes the assumptions of Harsanyi's Bayesian games. Distributionally robust games differ from the distribution-free framework of Aghassi & Bertsimas in [1, 2] in that, in the distribution-free approach, the interval (or, more generally, the uncertainty set) needs to be known (learnable) by the decision-maker. In contrast, in the distributionally robust approach any alternative distribution within a divergence ball can be tested.

As a second contribution, we use a triality theory, which considerably reduces the curse of dimensionality of the problem. We prove the existence of equilibria in any such robust finite game under suitable conditions.

As a third contribution, we provide a computational method based on Bregman flows for approximately computing equilibria.
Such a computational method allows us to test the implementability of the approach on numerical examples. We introduce a class of distributionally robust games with continuous action spaces, for which a subset of equilibria can be computed using the Bregman algorithm. We show that the resulting iterative dynamics, which we call Bregman dynamics, is characterized by double exponential decay and convergence to distributionally robust equilibria.
The rest of the paper is structured as follows. In Section 2 we introduce distributionally robust games. Section 3 presents a learning algorithm for robust equilibria. Section 4 focuses on stochastic Bregman learning. Section 5 provides numerical results. Discussions on finite action spaces are presented in Section 6. Section 7 concludes the paper.
In this section, we first introduce the distribution uncertainty set and then formulate the distributionally robust game problem. We then define the distributionally robust equilibrium, discuss triality theory, and apply this theory to reduce the model via Lagrangian relaxation.
Let (Ω, F, m) be a probability space. Here m is a probability measure defined on (Ω, F). The distribution m of the state ω captures the probability of the different scenarios and of the corresponding performance obtained under each scenario for a fixed action profile. We assume that the exact distribution of the state is not available in general. We therefore propose an uncertainty/constraint set containing all distributions whose divergence from m is bounded from above by a scalar ρ. Such a constraint set takes the form

B_ρ(m) = { m̃ | ∫_Ω dm̃ = m̃(Ω) = 1, D_f(m̃ ‖ m) ≤ ρ },

where D_f is the so-called f-divergence from the probability measure m to m̃, defined as

D_f(m̃ ‖ m) = ∫_Ω f(dm̃/dm) dm − f(1).

Recall that for a convex (and proper) function f the Legendre-Fenchel duality (f*)* = f holds, where

f*(ξ) = sup_x [ ⟨x, ξ⟩ − f(x) ] = − inf_x [ f(x) − ⟨x, ξ⟩ ].  (1)

Each player j chooses a_j ∈ A_j to optimize the worst-case loss E_m̃ l_j(a, ω) subject to the constraint that the divergence D_f(m̃ ‖ m) ≤ ρ. This means that the worst-case loss is obtained under the assumption that a virtual player (nature) acts as a discriminator/attacker who modifies the distribution m into m̃ with an effort capacity that does not exceed ρ > 0. The robust stochastic optimization problem of player j, given (a_j′)_{j′≠j}, m and ρ, is

(P_j)  inf_{a_j ∈ A_j} sup_{m̃ ∈ B_ρ(m)} E_m̃ l_j(a, ω).  (2)

Throughout the paper we assume that the following hold. The measure m̃ is absolutely continuous with respect to m; it is not a given profile and may be deformed or falsified by the discriminator. The function l_j(·, ω) is proper and upper semi-continuous for m-almost all ω ∈ Ω. Either the domain A_j is a non-empty compact set or E_m̃ l_j(a, ω) is coercive.

Definition 1 (Distributionally Robust Game). The robust game G(m) involves

• the set of players J = {1, 2, . . . , n}, n ≥ 2;
• the decision space A_j of each player j, j ∈ J;
• the uncertainty set B_ρ(m), built from the probability distribution m on Ω and ρ > 0;
• the payoff function E_m̃ l_j(a, ω) of player j, j ∈ J.

With the above game in mind, we can introduce the following solution concept.
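As a concrete sanity check (not part of the formal development), the divergence ball B_ρ(m) can be evaluated numerically in the discrete case. The sketch below uses the KL-type generator f(x) = x log x − x, adopted later in the numerical section; the distributions and radius are illustrative.

```python
import numpy as np

# Sketch (illustrative): checking membership in the divergence ball B_rho(m)
# for discrete distributions, using the generator f(x) = x*log(x) - x, for
# which D_f(m_tilde || m) = sum_i m_i * f(L_i) - f(1) equals the KL divergence.

def f_kl(x):
    return x * np.log(x) - x

def f_divergence(m_tilde, m):
    L = m_tilde / m                          # likelihood ratio dm_tilde/dm
    return np.sum(m * f_kl(L)) - f_kl(1.0)

m = np.array([0.5, 0.3, 0.2])                # nominal distribution (illustrative)
m_tilde = np.array([0.4, 0.4, 0.2])          # candidate adversarial distribution
rho = 0.1

d = f_divergence(m_tilde, m)
in_ball = abs(m_tilde.sum() - 1.0) < 1e-12 and d <= rho
```

Note that D_f(m ‖ m) = 0, as expected, and that the normalization constraint m̃(Ω) = 1 is checked alongside the divergence bound.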
Definition 2 (Distributionally Robust Equilibrium). Let a*_j be the configuration of player j and a*_{−j} := (a*_k)_{k≠j}. A strategy profile a* = (a*_1, . . . , a*_n) satisfying

sup_{m̃ ∈ B_ρ(m)} E_m̃ l_j(a*, ω) ≤ sup_{m̃ ∈ B_ρ(m)} E_m̃ l_j(a_j, a*_{−j}, ω),

for every a_j ∈ A_j and every agent j, is said to be a distributionally robust pure Nash equilibrium of the game G(m).

2.3 From Duality to Triality Theory

We here streamline the basic idea of triality theory. To this purpose, consider uncoupled domains A_j, j ∈ J. For a general function l, one has

sup_{a_2 ∈ A_2} inf_{a_1 ∈ A_1} l(a_1, a_2) ≤ inf_{a_1 ∈ A_1} sup_{a_2 ∈ A_2} l(a_1, a_2),

and the difference

min_{a_1 ∈ A_1} max_{a_2 ∈ A_2} l(a_1, a_2) − max_{a_2 ∈ A_2} min_{a_1 ∈ A_1} l(a_1, a_2)

is the well-known duality gap. As is widely known in duality theory from Sion's theorem [11] (an extension of von Neumann's minimax theorem), equality holds, for example, for convex-concave functions, and the value is achieved by a saddle point in the case of non-empty convex compact domains. For a general function l : (a_1, a_2, a_3) ↦ l(a_1, a_2, a_3) one has

inf_{a_1 ∈ A_1} sup_{a_2 ∈ A_2} inf_{a_3 ∈ A_3} l(·) ≤ inf_{a_1 ∈ A_1, a_3 ∈ A_3} sup_{a_2 ∈ A_2} l(·),
sup_{a_1 ∈ A_1} inf_{a_2 ∈ A_2} sup_{a_3 ∈ A_3} l(·) ≥ sup_{a_1 ∈ A_1, a_3 ∈ A_3} inf_{a_2 ∈ A_2} l(·).

Proposition 1 (Triality). Let l : (a_1, a_2, a_3) ↦ l(a_1, a_2, a_3) ∈ R be a function defined on ∏_{i=1}^{3} A_i. Then, the following inequalities hold:

sup_{a_2} inf_{a_1, a_3} l(a_1, a_2, a_3) ≤ inf_{a_1} sup_{a_2} inf_{a_3} l(a_1, a_2, a_3) ≤ inf_{a_1, a_3} sup_{a_2} l(a_1, a_2, a_3),  (3)

and similarly

sup_{a_1, a_2} inf_{a_3} l(a_1, a_2, a_3) ≤ sup_{a_1} inf_{a_3} sup_{a_2} l(a_1, a_2, a_3) ≤ inf_{a_3} sup_{a_1, a_2} l(a_1, a_2, a_3).  (4)

Proof.
First we prove the sup-inf inequality. Define g(a_1, a_2) = inf_{a_3 ∈ A_3} l(a_1, a_2, a_3). Thus, for all a_1, a_2, a_3, one has g(a_1, a_2) ≤ l(a_1, a_2, a_3). It follows that, for any a_1, a_3,

sup_{a_2 ∈ A_2} g(a_1, a_2) ≤ sup_{a_2 ∈ A_2} l(a_1, a_2, a_3).

Using the definition of g, one obtains

sup_{a_2} inf_{a_3} l(a_1, a_2, a_3) ≤ sup_{a_2} l(a_1, a_2, a_3), ∀ a_1, a_3.

Taking the infimum in a_3 yields

sup_{a_2} inf_{a_3} l(a_1, a_2, a_3) ≤ inf_{a_3} sup_{a_2} l(a_1, a_2, a_3), ∀ a_1.  (5)

Now, for the variable a_1 we use two operations:

• Taking the infimum of inequality (5) in a_1 yields

inf_{a_1} sup_{a_2} inf_{a_3} l(a_1, a_2, a_3) ≤ inf_{a_1} inf_{a_3} sup_{a_2} l(a_1, a_2, a_3) = inf_{(a_1, a_3) ∈ A_1 × A_3} sup_{a_2} l(a_1, a_2, a_3),

which proves the second part of the inequalities (3). The first part of the inequalities (3) follows immediately from (5).

• Taking the supremum of inequality (5) in a_1 yields

sup_{(a_1, a_2) ∈ A_1 × A_2} inf_{a_3} l(a_1, a_2, a_3) ≤ sup_{a_1} inf_{a_3} sup_{a_2} l(a_1, a_2, a_3),

which proves the first part of the inequalities (4). The second part of the inequalities (4) follows immediately from (5). This completes the proof.

We use the above inequalities in the Lagrangian relaxation of the MaxMin robust game. Assume that a ↦ E_m̃ l_j(a, ω) is continuous for m-almost all ω. Then the functional F_j : m̃ ↦ inf_{a_j} E_m̃ l_j(a, ω) is Gateaux differentiable with derivative

F_{j,m}(m̂) = inf_{a_j ∈ A*_j(m)} E_m̂ l_j(a, ω),

where A*_j(m) = arg min_{a_j} E_m l_j(a, ω) is the best response under m. This derivative, which lives in the space of square-integrable measurable functions under m and is thus infinite-dimensional, does not facilitate the computation of the robust optimal strategy (a*_j, m̃*). Below we propose an equivalent problem that considerably reduces the curse of dimensionality of the problem.
In order to reduce the curse of dimensionality of the problem we use triality theory. The robust best-response problem of agent j is equivalent to

inf_{a_j} sup_{L ∈ L_ρ(m)} E_m [ l_j L ],  (6)

where L(ω) = (dm̃/dm)(ω) is the likelihood ratio and the set L_ρ(m) is

L_ρ(m) = { L | ∫_ω f(L(ω)) dm − f(1) ≤ ρ, ∫_ω L(ω) dm = 1 }.

We introduce the Lagrangian

l̃_j(a, L, λ, μ) = ∫_ω l_j(a, ω) L(ω) dm + λ ( ρ + f(1) − ∫_ω f(L(ω)) dm ) + μ ( 1 − ∫_ω L(ω) dm ),

where λ ≥ 0 and μ ∈ R. The problem solved by player j is

(P̃*_j)  inf_{a_j} sup_{L ∈ L_ρ(m)} inf_{λ ≥ 0, μ ∈ R} l̃_j(a, L, λ, μ).  (7)

A full understanding of problem (P̃*_j) requires a triality theory (not a duality theory), whose main principles were streamlined in Section 2.3. The underlying idea is that one can use a transformation of the last two terms to derive a finite-dimensional optimization problem. The Lagrangian l̃_j of agent j is clearly concave in L, convex in (λ, μ), and jointly semi-continuous. By the triality theory above, l̃_j : (a, L, λ, μ) ↦ l̃_j(a, L, λ, μ) satisfies the sup-inf inequality and one has

inf_{a_j} sup_{L ∈ L_ρ(m)} inf_{λ ≥ 0, μ ∈ R} l̃_j(·) ≤ inf_{a_j} inf_{λ ≥ 0, μ ∈ R} sup_{L ∈ L_ρ(m)} l̃_j(·).

In this case there is no gap in the second part of the optimization and the following equality holds:

inf_{a_j} sup_{L ∈ L_ρ(m)} inf_{λ ≥ 0, μ ∈ R} l̃_j(·) = inf_{a_j} inf_{λ ≥ 0, μ ∈ R} sup_{L ∈ L_ρ(m)} l̃_j(·).

The latter problem can be rewritten as

(P̃*_j)  inf_{a_j ∈ A_j, λ ≥ 0, μ ∈ R} [ sup_{L ∈ L_ρ(m)} l̃_j(a, L, λ, μ) ].  (8)

The Lagrangian takes the form

l̃_j = λ(ρ + f(1)) + μ + ∫ { L [ l_j − μ ] − λ f(L) } dm.

It follows that

sup_{L ∈ L_ρ(m)} l̃_j(a, L, λ, μ) = λ(ρ + f(1)) + μ + sup_L ∫ { L [ l_j − μ ] − λ f(L) } dm.  (9)

Introducing the Legendre-Fenchel transform of f and exchanging sup and ∫, one gets

sup_{L ∈ L_ρ(m)} l̃_j(·) = λ(ρ + f(1)) + μ + ∫ λ f*((l_j − μ)/λ) dm.
Since A_j × R_+ × R is a subset of a finite-dimensional vector space, it follows that the robust best-response problem of agent j is equivalent to the finite-dimensional stochastic optimization problem

(P*_j)  inf_{a_j ∈ A_j, λ ≥ 0, μ ∈ R} l*_j(a, λ, μ, m),
 l*_j(a, λ, μ, m) = λ(ρ + f(1)) + μ + ∫ λ f*((l_j − μ)/λ) dm = E_m h_j,  (10)

where h_j is the integrand cost λ(ρ + f(1)) + μ + λ f*((l_j − μ)/λ). We have converted the infinite-dimensional problem (P_j) into a finite-dimensional problem (P*_j). The above calculations culminate in the following result:
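The key step from (9) to (10) is the pointwise Fenchel exchange sup_L { L c − λ f(L) } = λ f*(c/λ). A quick grid check of this identity for the KL-type generator f(L) = L log L − L (for which f*(ξ) = e^ξ); the values of λ and c below are illustrative:

```python
import numpy as np

# Grid check (illustrative) of the pointwise Legendre-Fenchel step used in (9)-(10):
# for f(L) = L*log(L) - L one has sup_L [L*c - lam*f(L)] = lam*exp(c/lam),
# attained at the maximizer L* = exp(c/lam).
lam, c = 1.7, 0.9
L = np.linspace(1e-6, 20.0, 2_000_000)
inner = L * c - lam * (L * np.log(L) - L)
sup_grid = inner.max()
L_argmax = L[np.argmax(inner)]
closed_form = lam * np.exp(c / lam)
```

The grid supremum agrees with λ f*(c/λ) to high precision, and the maximizing L sits at e^{c/λ}, which is exactly the likelihood-ratio formula recovered in Proposition 2 below with c = l_j − μ.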
Proposition 2. If (a, λ*(a), μ*(a)) is a solution of (P*_j), then the optimal likelihood ratio L* is such that

∫_ω L* dm = 1,  f′(L*) = (l_j − μ*)/λ*.

This means that a_j and dm̃* = L* dm provide a solution of the original problem (P_j).

Proof.
Let (λ*(a), μ*(a)) be a solution of (P*_j) associated with the profile a. Then the optimal likelihood ratio L* is obtained by differentiating f* or by inverting the equation f′(L*) = (l_j − μ*)/λ*. As m̃ is a probability measure, and using the definition of L*, one gets dm̃*(ω) = L* dm(ω). It follows that (a*_j, L*) solves the original problem (P_j). Next we look at the existence of robust equilibria.
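Proposition 2 can be verified numerically on a discrete state space for the KL-type generator f(x) = x log x − x (so f′(x) = log x, f(1) = −1 and f*(ξ) = e^ξ): solve the reduced problem over (λ, μ) for a fixed action and recover L* = exp((l_j − μ*)/λ*). The nominal distribution and loss values below are illustrative, not from the paper.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative check of Proposition 2 with f(x) = x*log(x) - x: solve the
# reduced dual over (lambda, mu), then recover the optimal likelihood ratio
# L* = (f')^{-1}((l - mu*)/lambda*) = exp((l - mu*)/lambda*).
m = np.full(4, 0.25)                 # nominal discrete distribution (assumed)
l = np.array([1.0, 2.0, 3.0, 4.0])   # realized losses l_j(a, omega_k) (assumed)
rho = 0.1

def dual(params):
    s, mu = params
    lam = np.exp(s)                  # enforce lambda > 0
    # f(1) = -1 and f*(xi) = exp(xi) for the KL-type generator
    return lam * (rho - 1.0) + mu + lam * np.sum(m * np.exp((l - mu) / lam))

res = minimize(dual, x0=np.array([0.0, 2.0]), method="Nelder-Mead",
               options={"xatol": 1e-8, "fatol": 1e-10})
lam_star, mu_star = np.exp(res.x[0]), res.x[1]
L_star = np.exp((l - mu_star) / lam_star)    # optimal likelihood ratio

worst_case = np.sum(m * L_star * l)          # E under dm_tilde* = L* dm
```

At the optimum, L* integrates to one under m and the worst-case expectation matches the dual value, strictly exceeding the nominal expectation E_m l_j.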
As in classical game theory, sufficient conditions for the existence of a robust equilibrium can be obtained from standard fixed-point theory, which we recall next. Let the sets A_j be nonempty, compact and convex, and let the functions l*_j be continuous and such that, for any fixed (z_k)_{k≠j}, the function z_j ↦ l*_j(z, m) is quasi-convex for each j. Then there exists at least one distributionally robust pure equilibrium. This result can easily be extended to the coupled-action-constraint case, yielding robust generalized Nash equilibria.
Using

l*_j(a, λ, μ, m) = λ(ρ + f(1)) + μ + lim_{N_j → +∞} (1/N_j) Σ_{k=1}^{N_j} λ f*((l_j(·, ω_k) − μ)/λ),  ω_k ∼ m,

let

m_{N_j} = (1/N_j) Σ_{k=1}^{N_j} δ_{ω_k}

be the empirical measure of the state and define

ε_{N_j} = √N_j ( sup_{m̃ ∈ B_{ρ_{N_j}}(m_{N_j})} E_m̃ l_j − E_{m_{N_j}} l_j ) − √( N_j ρ_{N_j} var_{m_{N_j}}[l_j] ),

with sup_{N_j} N_j ρ_{N_j} < +∞. Then ε_{N_j} → 0 as N_j grows. The above result states that the robust performance captures the risk by considering the variance and not just the ergodic performance.

In this section we develop learning algorithms for (P*_j)_j. Consider the optimal control problem inf_{u ∈ U} ∫_0^T l̂(t, z, u) dt subject to ż = u. The maximum principle is a necessary condition of optimality when the underlying function is sufficiently smooth. The adjoint variable satisfies ṗ = −H_z = −l̂_z, and the optimal control optimizes the Hamiltonian H(z, p) = inf_{u ∈ U} { l̂ + p u }, i.e., the Legendre-Fenchel transform of −l̂ evaluated at −p. A closed-form expression of the optimal control can be obtained and is generically given by u* = H_p(z, p). A necessary condition for optimality says that H_{u*}(u − u*) ≥ 0 for all u ∈ U, where H_u denotes an element of the sub-differential of H. This variational inequality can be rewritten as

0 ≤ H_{u*}(u − u*) = [ l̂_{u*} + p ](u − u*),  (11)

for all u ∈ U. In particular, an interior solution u* solves p = −l̂_{u*}, and the adjoint equation becomes ṗ = (d/dt)(−l̂_{u*}) = −l̂_z(z, u*), which means that

(d/dt) l̂_ż = l̂_z(z, ż).

The latter equation is the Euler-Lagrange equation from the calculus of variations. Since the minimization is among all possible curves, this minimum principle may exhibit features that allow the investigation of faster time curves.
Let g : A → R be a differentiable, strictly convex function. The Bregman divergence [10] is the map d_g : A × relint(A) → R defined as

d_g(y, x) = g(y) − g(x) − ⟨g_x(x), y − x⟩,

where relint(A) denotes the relative interior of A.
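For concreteness, here is a minimal sketch (generators are standard, data illustrative) of two classical instances of the Bregman divergence: the quadratic generator recovers the squared Euclidean distance, and the negative entropy recovers the KL divergence between probability vectors.

```python
import numpy as np

# Sketch: Bregman divergence d_g(y, x) = g(y) - g(x) - <grad g(x), y - x>
# for two classical generators.

def bregman(g, grad_g, y, x):
    return g(y) - g(x) - np.dot(grad_g(x), y - x)

# g(x) = 0.5*||x||^2  ->  d_g(y, x) = 0.5*||y - x||^2
sq = lambda x: 0.5 * np.dot(x, x)
sq_grad = lambda x: x

# g(x) = sum_i x_i log x_i (negative entropy) -> d_g(y, x) = KL(y || x)
# whenever y and x are probability vectors
negent = lambda x: np.sum(x * np.log(x))
negent_grad = lambda x: np.log(x) + 1.0

y = np.array([0.2, 0.8])
x = np.array([0.5, 0.5])
d_euclid = bregman(sq, sq_grad, y, x)        # equals 0.5*||y - x||^2
d_kl = bregman(negent, negent_grad, y, x)    # equals KL(y || x)
```

Strict convexity of g guarantees d_g(y, x) ≥ 0 with equality iff y = x, which is what makes d_g usable as a Lyapunov-type discrepancy in the learning dynamics below.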
We investigate the equation (d/dt) l̂_u(z, u) = l̂_z(z, u) for a class of quantities of interest l̂. Let the family of Bregman-based Lagrangians be

l̂(z, u) = e^{α+γ} [ d_g(z + e^{−α} u, z) − e^{β} l*(z) ].

Proposition 3.
The Euler-Lagrange equation reduces to the following second-order differential system, for γ̇ = e^{α}:

z̈ + (e^{α} − α̇) ż + e^{2α+β} g_{zz}^{−1}(z + e^{−α} ż) l*_z(z) = 0.  (12)

Proof.
We start with the definition of the Bregman divergence. A simple computation shows that

∂_y d_g(y, x) = g_x(y) − g_x(x),  ∂_x d_g(y, x) = −g_{xx}(x)(y − x).

Write y = z + e^{−α} u. By differentiating the functional l̂ one gets

l̂_z = e^{α+γ} [ g_z(y) − g_z(z) − g_{zz}(z)(y − z) − e^{β} l*_z ],
l̂_u = e^{γ} [ g_z(z + e^{−α} u) − g_z(z) ].  (13)

It follows that

(d/dt) l̂_u = γ̇ e^{γ} [ g_z(z + e^{−α} u) − g_z(z) ] + e^{γ} [ g_{zz}(z + e^{−α} u)(ż − α̇ e^{−α} ż + e^{−α} z̈) − g_{zz}(z) ż ].  (14)

Setting (d/dt) l̂_u = l̂_z along u = ż, substituting y − z = e^{−α} ż and taking γ̇ = e^{α}, the terms in g_z(y) − g_z(z) and in g_{zz}(z) ż cancel, and one is left with

g_{zz}(z + e^{−α} ż)( ż − α̇ e^{−α} ż + e^{−α} z̈ ) = −e^{α+β} l*_z(z).  (15)

Multiplying by e^{α} g_{zz}^{−1}(z + e^{−α} ż), the Euler-Lagrange equation reduces to

z̈ + (e^{α} − α̇) ż + e^{2α+β} g_{zz}^{−1}(z + e^{−α} ż) l*_z(z) = 0,  (18)

which can be rewritten as

[ z̈ e^{−α} + (1 − α̇ e^{−α}) ż ] g_{zz}(z + e^{−α} ż) + e^{α+β} l*_z(z) = 0.  (19)

From the above, the Bregman algorithm reads

(d/dt) [ g_z(z + e^{−α} ż) ] = −e^{α+β} l*_z(z).  (20)

This completes the proof.

Note that the second-order system is easily converted into a first-order system by setting

y = z + e^{−α} u,  ż = u,  u̇ = z̈ = −(e^{α} − α̇) u − e^{2α+β} g_{zz}^{−1}(z + e^{−α} u) l*_z(z).

Definition 3.
We say that z ↦ l̃*(z) is a best-response pseudo-potential function for the distributionally robust game G(m) if arg min_{z_j} l̃*(z) ⊆ arg min_{z_j} l*_j(z) for every j. The Bregman algorithm is given by

(d/dt) [ g_z(z + e^{−α} ż) ] = −e^{α+β} l̃*_z(z),  z(0) = z_0,  (21)

where β(t) = β(0) + ∫_0^t e^{α(t′)} dt′, β(0) ≥ 0, and α is a time-dependent function.

Proposition 4. If l̃* is convex then

0 ≤ −l̃*(z*) + l̃*(z(t)) ≤ e^{−β(t)} c_0,

where c_0 = d_g(z*, z_0 + e^{−α(0)} ż_0) + e^{β(0)} [ −l̃*(z*) + l̃*(z(0)) ].

By choosing α(t) = t, one has β(t) = e^{t} up to a constant, and the error gap is −l̃*(z*) + l̃*(z(t)) ≤ e^{−e^{t}} c_0. It takes T_η = log log(c_0/η) time units to reach a neighborhood of the equilibrium payoff of z* with precision η > 0. This is faster than the Ishikawa-Nesterov algorithm, which needs O(1/√η), the gradient descent method, which needs O(1/η), no-regret dynamics, and black-box optimization, which needs O(c/η). Thus, the Bregman dynamics speeds up the learning and improves on classical methods, achieving a double exponential decay.

The proof of Proposition 4 is based on a careful construction of a generalized best-response pseudo-potential function using the Pontryagin maximum principle. It extends the framework developed in [4] to the context of strategic-form games. One then checks that the following function V is a Lyapunov function:

V(z*, z(t)) = d_g(z*, z(t) + e^{−α(t)} ż(t)) + e^{β(t)} [ −l̃*(z*) + l̃*(z(t)) ],

where z(t) is generated by the Bregman algorithm. Note that Proposition 4 does not require the strong convexity property often used in convergence proofs of gradient dynamics and Newton-based gradient methods. This is because the Bregman divergence is carefully designed to compensate for that part.
Table 1 summarizes the theoretical speedup advantages of Bregman algorithms over the state-of-the-art algorithms.

Algorithm                      | Accuracy        | Time to reach precision η
-------------------------------|-----------------|--------------------------
This paper                     | O(e^{−β(t)})    | O(β^{−1}(log(1/η)))
This paper (Bregman, α(t) = t) | O(e^{−e^{t}})   | O(log log(1/η))
Ishikawa-Nesterov              | O(1/t²)         | O(1/√η)
Conjugate/proximal gradient    | O(1/t)          | O(1/η)
Gradient descent               | O(1/t)          | O(1/η)
Regret minimization            | O(log t / t)    | -
Standard black-box             | O(1/√t)         | O(1/η²)

Table 1: Performance of the proposed Bregman algorithm compared to the classical ones, for a precision error within η > 0.

Proof of Proposition 4.
Let us define the function V as follows:

V(z, u, t, z*) = d_g(z*, z + e^{−α} u) + e^{β} [ −l*(z*) + l*(z) ].  (22)

The function V is nonnegative. Let us compute the time derivative of V over the path (z(t), u(t)) generated by the Bregman algorithm

(d/dt) [ g_z(z + e^{−α} ż) ] = −e^{α+β} l*_z(z).

One has

(d/dt) V(z(t), u(t), t, z*) = −⟨ (d/dt) g_z(z + e^{−α} u), z* − z − e^{−α} u ⟩ + β̇ e^{β} [ l*(z) − l*(z*) ] + e^{β} l*_z(z) ż.  (23)

Using the Bregman dynamics along u = ż, the terms e^{β} l*_z(z) ż and e^{α+β} l*_z(z) e^{−α} u cancel, and by adding and subtracting the same term one obtains

(d/dt) V(z(t), u(t), t, z*) = e^{α+β} l*_z(z)(z* − z) + β̇ e^{β} [ l*(z) − l*(z*) ]
 = e^{β} (e^{α} − β̇) l*_z(z)(z* − z) + β̇ e^{β} [ l*(z) − l*(z*) − l*_z(z)(z − z*) ].  (25)

By convexity of the function l*,

l*(z) − l*(z*) − l*_z(z)(z − z*) ≤ 0  and  l*_z(z)(z* − z) ≤ l*(z*) − l*(z) ≤ 0.

If e^{α} − β̇ ≥ 0, then

(d/dt) V(z(t), u(t), t, z*) ≤ 0.  (26)

Thus V is decreasing over the path of the Bregman algorithm whenever β̇ ≤ e^{α}. It follows that

e^{β} [ l*(z) − l*(z*) ] ≤ V(z, u, t, z*) ≤ V(z_0, u_0, 0, z*).
Then the global error satisfies

0 ≤ l*(z) − l*(z*) ≤ e^{−β} V(z_0, u_0, 0, z*),

with β̇ ≤ e^{α}, which shows an exponential convergence to z*. This completes the proof.
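To see the decay of Proposition 4 in action, here is a minimal numerical sketch (not the paper's experiment): take the quadratic generator g(z) = ½‖z‖², for which g_z is the identity and the first-order form of (21) becomes ẏ = −e^{α+β} ∇l̃*(z), ż = e^{α}(y − z) with y = z + e^{−α} ż, and integrate it by forward Euler for an illustrative quadratic objective.

```python
import numpy as np

# Forward-Euler integration of the first-order Bregman flow with the quadratic
# generator g(z) = 0.5*||z||^2 and the scaling alpha(t) = t, beta(t) = e^t - 1.
# The objective (gradient of 0.5*||z - z_star||^2) is an illustrative stand-in.

def bregman_flow(z0, z_star, T=2.0, dt=1e-4):
    z = z0.astype(float).copy()
    y = z.copy()                          # y = z + exp(-alpha)*dz/dt; start at rest
    steps = int(T / dt)
    for k in range(steps):
        t = k * dt
        ea = np.exp(t)                    # e^{alpha(t)}
        eb = np.exp(np.exp(t) - 1.0)      # e^{beta(t)}
        y = y - dt * ea * eb * (z - z_star)   # y' = -e^{alpha+beta} grad l(z)
        z = z + dt * ea * (y - z)             # z' = e^{alpha} (y - z)
    return z

z0 = np.array([3.0, -2.0])
z_star = np.zeros(2)
zT = bregman_flow(z0, z_star)
loss0 = 0.5 * np.dot(z0 - z_star, z0 - z_star)
lossT = 0.5 * np.dot(zT - z_star, zT - z_star)
```

With T = 2 the bound e^{−β(T)} c_0 already predicts a drop of more than two orders of magnitude in the optimality gap; the crude Euler discretization tracks this behavior qualitatively, oscillating with rapidly shrinking amplitude rather than descending monotonically.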
Very often the computation of the term ∇E_m h_j, with h_j = λ(ρ + f(1)) + μ + λ f*((l_j − μ)/λ), or of its partial derivatives, is challenging and depends on the structure of the distribution m. We now propose a swarm-based learning scheme to estimate the expected gradient and then insert it into the Bregman algorithm (21), leading to a particle swarm stochastic Bregman algorithm.
We propose a stochastic Bregman learning framework which is adjusted based only on the realized integrand h_j(z, ω_j) := λ(ρ + f(1)) + μ + λ f*((l_j − μ)/λ). The expected value of h_j is E_{ω∼m} h_j = l*_j. The stochastic Bregman dynamics reads

(d/dt) [ g_{j,z_j}(z + e^{−α} u) ] = −e^{α+β} h_{j,z_j}(z, ω_j)
 = −e^{α+β} [ E h_{j,z_j}(z, ω_j) + W_j ] = −e^{α+β} [ l*_{j,z_j}(z) + W_j(z, ω_j) ],  j ∈ {1, 2, . . . , n}.

The variable z = (a, λ, μ) is now a stochastic process because of the random term ω_j, where W_j(z) = h_{j,z_j}(z, ω_j) − E h_{j,z_j}(z, ω_j). The variance of W_j is high and non-vanishing because it is based on a single particle path discrepancy. We introduce in the following subsection a swarm of particles.

4.2 Swarm of particles

Let us associate with each agent j a swarm of virtual particles ω_{jk}. Then we have

E h_{j,z_j}(z, ·) = lim_{N → ∞} (1/N) Σ_{k=1}^{N} h_{j,z_j}(z, ω_{jk}).

The swarm-based stochastic Bregman dynamics reads

(d/dt) [ g_{z_j}(z + e^{−α} u) ] = −e^{α+β} (1/N) Σ_{k=1}^{N} h_{j,z_j}(z, ω_{jk}),  ω_{jk} ∼ m,  j ∈ {1, 2, . . . , n}.  (27)

This is a mean-field-type interacting system and can be seen as a control-dependent correlated-noise modification of the Bregman dynamics:

(d/dt) [ g_{z_j}(z + e^{−α} u) ] = −e^{α+β} [ l*_{j,z_j}(z) + ε_{j,N}(z, ω) ],  j ∈ {1, 2, . . . , n},

where ε_{j,N} = (1/N) Σ_{k=1}^{N} h_{j,z_j}(z, ω_{jk}) − l*_{j,z_j}(z) has zero mean and standard deviation

√(E[ε_{j,N}²]) = √( var[h_{j,z_j}(z, ·)] / N ).

For a realized ω, set

l*_{j,N} = (1/N) Σ_{k=1}^{N} h_j(z, ω_{jk}),  z*_{j,N} ∈ arg min_z (1/N) Σ_{k=1}^{N} h_j(z, ω_{jk}).

Then the particle swarm Bregman algorithm (27) gives as output a trajectory z_N(t) that satisfies

0 ≤ −l̃*_N(z*_N) + l̃*_N(z_N(t)) ≤ e^{−β(t)} c_{0,N},

where c_{0,N} := d_g(z*_N, z_0 + e^{−α(0)} ż_0) + e^{β(0)} [ −l̃*_N(z*_N) + l̃*_N(z_0) ]. This says that the N-swarm-per-player Bregman scheme provides a good approximation of the robust equilibrium.
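The effect of the swarm size N on the gradient noise ε_{j,N} can be seen in a small Monte Carlo sketch. The integrand and sampling distribution below are illustrative stand-ins, not the paper's h_j; the point is the 1/√N decay of the estimator's standard deviation.

```python
import numpy as np

# Swarm estimate of an expected gradient: E_omega[d/dz h(z, omega)] is replaced
# by an N-particle average. Here h(z, w) = 0.5*w*z^2, so grad_h = w*z and the
# exact expected gradient is E[w]*z (with w ~ Normal(1, 2), all illustrative).
rng = np.random.default_rng(0)

def swarm_gradient(z, N):
    omegas = rng.normal(loc=1.0, scale=2.0, size=N)   # particles omega_k ~ m
    return np.mean(omegas * z)

z = 2.0
exact = 1.0 * z                                        # E[omega] * z

trials = 400
std_small = np.std([swarm_gradient(z, 100) for _ in range(trials)])
std_large = np.std([swarm_gradient(z, 10_000) for _ in range(trials)])
ratio = std_small / std_large                          # expected to be near 10
```

Increasing N by a factor of 100 shrinks the estimator's standard deviation by roughly a factor of 10, consistent with the √(var/N) expression above; this is what justifies feeding the swarm average, rather than a single-particle gradient, into (27).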
To illustrate the particle swarm Bregman algorithm (27) we consider specific robust games with two agents and the discriminator/adversary. We choose

f(x) = x log x − x if x > 0,  f(0) = 0.  (28)

We compute f(1) = −1, f′(x) = log x and f″(x) = 1/x > 0. Hence f is convex on R_+. The Legendre-Fenchel transform of f yields f*(ξ) = e^{ξ}.

5.1 Best-response Pseudo-Potential Distributionally Robust Game

We set l_j(a, ω) = log(1 + ω_1 a_1 + ω_2 a_2), defined over R²_+. The integrand function h_j is

h_j = λ(ρ − 1) + μ + λ (1 + ω_1 a_1 + ω_2 a_2)^{1/λ} e^{−μ/λ}.

The random variable ω is distributed according to m, and we assume that ω has finite moments. The stochastic robust objective function l*_{j,N} is

l*_{j,N} = λ(ρ − 1) + μ + (λ/N) Σ_{k=1}^{N} (1 + ω_{1,k} a_1 + ω_{2,k} a_2)^{1/λ} e^{−μ/λ}.
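The closed form of h_j above follows directly from f*(ξ) = e^{ξ}, since λ e^{(l_j−μ)/λ} = λ e^{l_j/λ} e^{−μ/λ} and e^{l_j/λ} = (1 + ω_1 a_1 + ω_2 a_2)^{1/λ}. A one-line numerical check (all parameter values illustrative):

```python
import numpy as np

# Check that the generic integrand h_j = lam*(rho + f(1)) + mu + lam*f*((l_j - mu)/lam),
# with f(1) = -1 and f*(xi) = exp(xi), matches the closed form stated above.
rho, lam, mu = 0.1, 2.0, 0.3
a = np.array([1.5, 0.7])                 # action profile (illustrative)
w = np.array([0.4, 1.2])                 # realized state omega (illustrative)

l_j = np.log(1.0 + w @ a)
h_generic = lam * (rho - 1.0) + mu + lam * np.exp((l_j - mu) / lam)
h_closed = lam * (rho - 1.0) + mu + lam * (1.0 + w @ a) ** (1.0 / lam) * np.exp(-mu / lam)
```

The two expressions agree to machine precision, so the swarm average l*_{j,N} can be computed directly from the realized states without evaluating the logarithm and exponential separately.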
Figure 1: Gradient vs Bregman-based dynamics.

We illustrate the Bregman-based dynamics in Figure 1 with N = 1000 samples. We observe a rapid convergence to the robust equilibrium. The trajectory is not a descent, but the amplitude of the oscillations quickly decreases, with an acceptable convergence time that is 20 times better than that of the classical gradient dynamics.

We set l_j(a, ω) = log(20 + ω − sin(a_1) sin(a_2) √(a_1 a_2)). The function l_j has multiple local extrema, as illustrated in Figures 2 and 3. The objective function of agent j is non-convex and non-concave. The function l_j is chosen because it does not fulfill the conditions of Theorem 4. We observed that the Bregman algorithm behaves well even in this multimodal case, which opens the investigation of non-convex objective functions. The multimodal function has a robust equilibrium around (7. , . ). In Figure 4, the Bregman learning outcome changes from the distributionally robust Nash equilibrium.

Figure 2: The function l_j is non-convex, non-concave and has multiple local extrema.

We now discuss how the above framework can be used in the discrete action space case. We limit our exposition to finite sets. When A_j is finite (with two or more actions), it is not a convex set. We convexify it via the standard mixed-strategy approach: A_j is replaced by X_j, the simplex over A_j. The robust payoff of a pure action profile is replaced by the expected robust payoff, and one obtains the so-called mixed extension of the game. The existence of a distributionally robust mixed equilibrium follows from standard fixed-point theorems.

Figure 3: The particle swarm Bregman learning leads to robust Nash equilibria around (7. , .9); the equilibrium value is around 7.

Acknowledgment
This research is supported by the U.S. Air Force Office of Scientific Research under grant number FA9550-17-1-0259. This research was conducted while D. Bauso was visiting NYUAD.
We have introduced distributionally robust games with a continuous action space for each agent and a possible adversarial modification of the uncertainty. The problem is formulated using a notion of divergence between two measures: the modified measure and the exact measure associated with the uncertainty. Concerning the existence of robust solutions, additional difficulties arise if, in addition, a robustness condition or an adversarial control of the distribution is involved in the objective function. We have used triality theory to transform the objective function of each agent. This transformation considerably reduces the curse of dimensionality of the problem. Then, sufficient conditions for the existence of solutions are derived. We constructed a speedup learning algorithm based on the Bregman discrepancy. The methodology does not require strong convexity assumptions as in classical gradient algorithms. The convergence time is shown to be much faster than that of the current state-of-the-art algorithms developed for pseudo-potential games. Our future work aims to apply the approach to generative adversarial networks.
Figure 4: The Bregman learning outcome changes from the robust Nash equilibrium (7. , . ).