Distributionally Robust Games: f-Divergence and Learning
Dario Bauso∗, Jian Gao and Hamidou Tembine†

November 6, 2018
Abstract
In this paper we introduce the novel framework of distributionally robust games. These are multi-player games where each player models the state of nature using a worst-case distribution, also called an adversarial distribution. Thus each player's payoff depends on the other players' decisions and on the decision of a virtual player (nature) who selects an adversarial distribution of scenarios. This paper provides three main contributions. Firstly, the distributionally robust game is formulated using the statistical notion of f-divergence between two distributions, here represented by the adversarial distribution and the exact distribution. Secondly, the complexity of the problem is significantly reduced by means of triality theory. Thirdly, stochastic Bregman learning algorithms are proposed to speed up the computation of robust equilibria. Finally, the theoretical findings are illustrated in a convex setting and their limitations are tested with a non-convex non-concave function.

Keywords:
Robustness, distribution uncertainty, stochastic optimization, robust game

∗ Department of Automatic Control and Systems Engineering, The University of Sheffield, Mappin Street, Sheffield, S1 3JD, United Kingdom. [email protected]
† Learning & Game Theory Laboratory, New York University Abu Dhabi. [email protected]

Introduction
Games with payoff uncertainty refer to games where the outcome of a play is not known with certainty by the players. Such games are also called incomplete information games and can be formalized in different ways. Distribution-free models of incomplete information games, both with and without private information, are examined in [1, 2]. There, the players use a robust optimization approach to contend with the worst-case scenario payoff. The distribution-free approach relaxes the well-known Bayesian game model of Harsanyi. The limitation of the distribution-free model is that the uncertainty set has to be carefully designed, and in most cases such an approach leads to overly conservative and unrealistic scenarios.

Strategic learning has proven to be a powerful approach in stochastic games. In particular, its algorithmic nature is well suited to accommodate parallel and distributed information exchange and processing as well as hardware realizability. However, almost all existing learning approaches work well only for specific classes of games such as concave-convex zero-sum games, convex potential games, and some S-modular game problems with unimodal objective functions. For more general classes of games, convergence of strategic learning dynamics is still an open issue. In addition, learning algorithms for finding fixed points or equilibria for general classes of games still present several challenges. For polymatrix games with finite action spaces, there has been great progress, including Cournot adjustment, Brown-von Neumann-Nash dynamics, reinforcement learning, and combined learning (see [13] and the references therein). For continuous action spaces, however, only a handful of works are available. Evolutionary dynamics and revision protocols based on actions have been proposed [6, 9, 5]. The exploration of continuous actions takes too much time if the dynamics is based on individual actions or measurable subsets [3, 7, 8].
Moreover, the convergence time of these existing strategic learning algorithms (when convergence to a point or a limit cycle occurs) is unacceptably high even for potential games, and they often require strong assumptions such as bounded densities. The above-mentioned prior works do not consider the robust game setting. In [1, 2] a robust game framework is presented. The authors defined a distribution-free approach (by considering worst-case performance). However, the choice of the uncertainty set remains an important part of robust game modelling. In this work we are interested in learning in distributionally robust games under f-divergence.

We make several contributions in this paper. We introduce for the first time a novel game model, called the distributionally robust game. This game provides a new and original way of addressing game scenarios with incomplete information. For this game, we provide a rigorous definition of distributionally robust equilibrium. Distributionally robust games accommodate both finite and continuous action spaces. The relevance of formulating such a new game is that it relaxes the assumptions of Harsanyi's Bayesian games. Distributionally robust games differ from the distribution-free framework of Aghassi & Bertsimas in [1, 2] in that, in the distribution-free approach, the interval (or, more generally, the uncertainty set) needs to be known (learnable) by the decision-maker. In contrast, in the distributionally robust approach any alternative distribution within a divergence ball can be tested.

As a second contribution, we use a triality theory, which considerably reduces the curse of dimensionality of the problem. We prove the existence of equilibria in any such robust finite game under suitable conditions.

As a third contribution, we provide a computational method based on Bregman flows for approximately computing equilibria.
Such a computational method allows us to test the implementability of the approach on numerical examples. We introduce a class of distributionally robust games with continuous action spaces, for which a subset of equilibria can be computed using the Bregman algorithm. We show that the resulting iterative dynamics, which we call Bregman dynamics, is characterized by double exponential decay and convergence to distributionally robust equilibria.
The rest of the paper is structured as follows. In Section 2 we introduce distributionally robust games. Section 3 presents a learning algorithm for robust equilibria. Section 4 focuses on stochastic Bregman learning. Section 5 provides numerical results. Discussions on finite action spaces are presented in Section 6. Section 7 concludes the paper.
In this section, we first introduce the distribution uncertainty set and then formulate the distributionally robust game problem. We then define the distributionally robust equilibrium, discuss triality theory, and apply this theory to reduce the model via Lagrangian relaxation.
Let (Ω, F, m) be a probability space. Here m is a probability measure defined on (Ω, F). The distribution m of the state ω captures the probability of the different scenarios and of the corresponding performance obtained under each scenario for a fixed action profile. We assume that the exact distribution of the state is not available in general. We therefore propose an uncertainty/constraint set containing all distributions whose divergence from m is bounded from above by a scalar ρ. Such a constraint set takes the form

B_ρ(m) = { m̃ | ∫_Ω dm̃ = m̃(Ω) = 1, D_f(m̃ ‖ m) ≤ ρ },

where D_f is the so-called f-divergence from the probability measure m to m̃, defined as

D_f(m̃ ‖ m) = ∫_Ω f(dm̃/dm) dm − f(1).

Recall that for a convex (and proper) function f the Legendre-Fenchel duality (f*)* = f holds, where

f*(ξ) = sup_x [ ⟨x, ξ⟩ − f(x) ] = − inf_x [ f(x) − ⟨x, ξ⟩ ].  (1)

Each player j chooses a_j ∈ A_j to optimize the worst-case loss E_m̃ l_j(a, ω) subject to the constraint that the divergence D_f(m̃ ‖ m) ≤ ρ. This means that the worst-case loss is obtained under the assumption that a virtual player (nature) acts as a discriminator/attacker who modifies the distribution m into m̃ with an effort capacity that does not exceed ρ > 0. The robust stochastic optimization problem of player j, given (a_j′)_{j′≠j}, m and ρ, is

(P_j)  inf_{a_j ∈ A_j} sup_{m̃ ∈ B_ρ(m)} E_m̃ l_j(a, ω).  (2)

Throughout the paper we assume that the following hold. The measure m̃ is absolutely continuous with respect to m; it is not a given profile and may be deformed or falsified by the discriminator. The function l_j(·, ω) is proper and upper semi-continuous for m-almost all ω ∈ Ω. Either the domain A_j is a non-empty compact set or E_m̃ l_j(a, ω) is coercive.

Definition 1 (Distributionally Robust Game). The robust game G(m) involves

• the set of players J = {1, 2, . . . , n}, n ≥ 2;
• the decision space A_j of each player j, j ∈ J;
• the uncertainty set B_ρ(m), built from the probability distribution m on Ω and ρ > 0;
• the payoff function E_m̃ l_j(a, ω) of player j, j ∈ J.

With the above game in mind, we can introduce the following solution concept.
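As a concrete sanity check (not part of the formal development), the divergence ball B_ρ(m) can be evaluated numerically in the discrete case. The sketch below uses the KL-type generator f(x) = x log x − x, adopted later in the numerical section; the distributions and radius are illustrative.

```python
import numpy as np

# Sketch (illustrative): checking membership in the divergence ball B_rho(m)
# for discrete distributions, using the generator f(x) = x*log(x) - x, for
# which D_f(m_tilde || m) = sum_i m_i * f(L_i) - f(1) equals the KL divergence.

def f_kl(x):
    return x * np.log(x) - x

def f_divergence(m_tilde, m):
    L = m_tilde / m                          # likelihood ratio dm_tilde/dm
    return np.sum(m * f_kl(L)) - f_kl(1.0)

m = np.array([0.5, 0.3, 0.2])                # nominal distribution (illustrative)
m_tilde = np.array([0.4, 0.4, 0.2])          # candidate adversarial distribution
rho = 0.1

d = f_divergence(m_tilde, m)
in_ball = abs(m_tilde.sum() - 1.0) < 1e-12 and d <= rho
```

Note that D_f(m ‖ m) = 0, as expected, and that the normalization constraint m̃(Ω) = 1 is checked alongside the divergence bound.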
Definition 2 (Distributionally Robust Equilibrium). Let a*_j be the configuration of player j and a*_{−j} := (a*_k)_{k≠j}. A strategy profile a* = (a*_1, . . . , a*_n) satisfying

sup_{m̃ ∈ B_ρ(m)} E_m̃ l_j(a*, ω) ≤ sup_{m̃ ∈ B_ρ(m)} E_m̃ l_j(a_j, a*_{−j}, ω),

for every a_j ∈ A_j and every agent j, is said to be a distributionally robust pure Nash equilibrium of the game G(m).

2.3 From Duality to Triality Theory

We here streamline the basic idea of triality theory. To this purpose, consider uncoupled domains A_j, j ∈ J. For a general function l, one has

sup_{a_2 ∈ A_2} inf_{a_1 ∈ A_1} l(a_1, a_2) ≤ inf_{a_1 ∈ A_1} sup_{a_2 ∈ A_2} l(a_1, a_2),

and the difference

min_{a_1 ∈ A_1} max_{a_2 ∈ A_2} l(a_1, a_2) − max_{a_2 ∈ A_2} min_{a_1 ∈ A_1} l(a_1, a_2)

is the well-known duality gap. As is widely known in duality theory from Sion's theorem [11] (an extension of von Neumann's minimax theorem), equality holds, for example, for convex-concave functions, and the value is achieved by a saddle point in the case of non-empty convex compact domains. For a general function l : (a_1, a_2, a_3) ↦ l(a_1, a_2, a_3) one has

inf_{a_1 ∈ A_1} sup_{a_2 ∈ A_2} inf_{a_3 ∈ A_3} l(·) ≤ inf_{a_1 ∈ A_1, a_3 ∈ A_3} sup_{a_2 ∈ A_2} l(·),
sup_{a_1 ∈ A_1} inf_{a_2 ∈ A_2} sup_{a_3 ∈ A_3} l(·) ≥ sup_{a_1 ∈ A_1, a_3 ∈ A_3} inf_{a_2 ∈ A_2} l(·).

Proposition 1 (Triality). Let l : (a_1, a_2, a_3) ↦ l(a_1, a_2, a_3) ∈ R be a function defined on ∏_{i=1}^{3} A_i. Then, the following inequalities hold:

sup_{a_2} inf_{a_1, a_3} l(a_1, a_2, a_3) ≤ inf_{a_1} sup_{a_2} inf_{a_3} l(a_1, a_2, a_3) ≤ inf_{a_1, a_3} sup_{a_2} l(a_1, a_2, a_3),  (3)

and similarly

sup_{a_1, a_2} inf_{a_3} l(a_1, a_2, a_3) ≤ sup_{a_1} inf_{a_3} sup_{a_2} l(a_1, a_2, a_3) ≤ inf_{a_3} sup_{a_1, a_2} l(a_1, a_2, a_3).  (4)

Proof.
First we prove the sup-inf inequality. Define g(a_1, a_2) = inf_{a_3 ∈ A_3} l(a_1, a_2, a_3). Thus, for all a_1, a_2, a_3, one has g(a_1, a_2) ≤ l(a_1, a_2, a_3). It follows that, for any a_1, a_3,

sup_{a_2 ∈ A_2} g(a_1, a_2) ≤ sup_{a_2 ∈ A_2} l(a_1, a_2, a_3).

Using the definition of g, one obtains

sup_{a_2} inf_{a_3} l(a_1, a_2, a_3) ≤ sup_{a_2} l(a_1, a_2, a_3), ∀ a_1, a_3.

Taking the infimum in a_3 yields

sup_{a_2} inf_{a_3} l(a_1, a_2, a_3) ≤ inf_{a_3} sup_{a_2} l(a_1, a_2, a_3), ∀ a_1.  (5)

Now, for the variable a_1 we use two operations:

• Taking the infimum of inequality (5) in a_1 yields

inf_{a_1} sup_{a_2} inf_{a_3} l(a_1, a_2, a_3) ≤ inf_{a_1} inf_{a_3} sup_{a_2} l(a_1, a_2, a_3) = inf_{(a_1, a_3) ∈ A_1 × A_3} sup_{a_2} l(a_1, a_2, a_3),

which proves the second part of the inequalities (3). The first part of the inequalities (3) follows immediately from (5).

• Taking the supremum of inequality (5) in a_1 yields

sup_{(a_1, a_2) ∈ A_1 × A_2} inf_{a_3} l(a_1, a_2, a_3) ≤ sup_{a_1} inf_{a_3} sup_{a_2} l(a_1, a_2, a_3),

which proves the first part of the inequalities (4). The second part of the inequalities (4) follows immediately from (5). This completes the proof.

We use the above inequalities in the Lagrangian relaxation of the MaxMin robust game. Assume that a ↦ E_m̃ l_j(a, ω) is continuous for m-almost all ω. Then the functional F_j : m̃ ↦ inf_{a_j} E_m̃ l_j(a, ω) is Gateaux differentiable with derivative

F_{j,m}(m̂) = inf_{a_j ∈ A*_j(m)} E_m̂ l_j(a, ω),

where A*_j(m) = arg min_{a_j} E_m l_j(a, ω) is the best response under m. This derivative, which lives in the space of square-integrable measurable functions under m and is thus infinite-dimensional, does not facilitate the computation of the robust optimal strategy (a*_j, m̃*). Below we propose an equivalent problem that considerably reduces the curse of dimensionality of the problem.
In order to reduce the curse of dimensionality of the problem we use triality theory. The robust best-response problem of agent j is equivalent to

inf_{a_j} sup_{L ∈ L_ρ(m)} E_m [ l_j L ],  (6)

where L(ω) = (dm̃/dm)(ω) is the likelihood ratio and the set L_ρ(m) is

L_ρ(m) = { L | ∫_ω f(L(ω)) dm − f(1) ≤ ρ, ∫_ω L(ω) dm = 1 }.

We introduce the Lagrangian

l̃_j(a, L, λ, μ) = ∫_ω l_j(a, ω) L(ω) dm + λ ( ρ + f(1) − ∫_ω f(L(ω)) dm ) + μ ( 1 − ∫_ω L(ω) dm ),

where λ ≥ 0 and μ ∈ R. The problem solved by player j is

(P̃*_j)  inf_{a_j} sup_{L ∈ L_ρ(m)} inf_{λ ≥ 0, μ ∈ R} l̃_j(a, L, λ, μ).  (7)

A full understanding of problem (P̃*_j) requires a triality theory (not a duality theory), whose main principles were streamlined in Section 2.3. The underlying idea is that one can use a transformation of the last two terms to derive a finite-dimensional optimization problem. The Lagrangian l̃_j of agent j is clearly concave in L, convex in (λ, μ), and jointly semi-continuous. By the triality theory above, l̃_j : (a, L, λ, μ) ↦ l̃_j(a, L, λ, μ) satisfies the sup-inf inequality and one has

inf_{a_j} sup_{L ∈ L_ρ(m)} inf_{λ ≥ 0, μ ∈ R} l̃_j(·) ≤ inf_{a_j} inf_{λ ≥ 0, μ ∈ R} sup_{L ∈ L_ρ(m)} l̃_j(·).

In this case there is no gap in the second part of the optimization and the following equality holds:

inf_{a_j} sup_{L ∈ L_ρ(m)} inf_{λ ≥ 0, μ ∈ R} l̃_j(·) = inf_{a_j} inf_{λ ≥ 0, μ ∈ R} sup_{L ∈ L_ρ(m)} l̃_j(·).

The latter problem can be rewritten as

(P̃*_j)  inf_{a_j ∈ A_j, λ ≥ 0, μ ∈ R} [ sup_{L ∈ L_ρ(m)} l̃_j(a, L, λ, μ) ].  (8)

The Lagrangian takes the form

l̃_j = λ(ρ + f(1)) + μ + ∫ { L [ l_j − μ ] − λ f(L) } dm.

It follows that

sup_{L ∈ L_ρ(m)} l̃_j(a, L, λ, μ) = λ(ρ + f(1)) + μ + sup_L ∫ { L [ l_j − μ ] − λ f(L) } dm.  (9)

Introducing the Legendre-Fenchel transform of f and exchanging sup and ∫, one gets

sup_{L ∈ L_ρ(m)} l̃_j(·) = λ(ρ + f(1)) + μ + ∫ λ f*((l_j − μ)/λ) dm.
Since A_j × R_+ × R is a subset of a finite-dimensional vector space, it follows that the robust best-response problem of agent j is equivalent to the finite-dimensional stochastic optimization problem

(P*_j)  inf_{a_j ∈ A_j, λ ≥ 0, μ ∈ R} l*_j(a, λ, μ, m),
 l*_j(a, λ, μ, m) = λ(ρ + f(1)) + μ + ∫ λ f*((l_j − μ)/λ) dm = E_m h_j,  (10)

where h_j is the integrand cost λ(ρ + f(1)) + μ + λ f*((l_j − μ)/λ). We have converted the infinite-dimensional problem (P_j) into a finite-dimensional problem (P*_j). The above calculations culminate in the following result:
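The key step from (9) to (10) is the pointwise Fenchel exchange sup_L { L c − λ f(L) } = λ f*(c/λ). A quick grid check of this identity for the KL-type generator f(L) = L log L − L (for which f*(ξ) = e^ξ); the values of λ and c below are illustrative:

```python
import numpy as np

# Grid check (illustrative) of the pointwise Legendre-Fenchel step used in (9)-(10):
# for f(L) = L*log(L) - L one has sup_L [L*c - lam*f(L)] = lam*exp(c/lam),
# attained at the maximizer L* = exp(c/lam).
lam, c = 1.7, 0.9
L = np.linspace(1e-6, 20.0, 2_000_000)
inner = L * c - lam * (L * np.log(L) - L)
sup_grid = inner.max()
L_argmax = L[np.argmax(inner)]
closed_form = lam * np.exp(c / lam)
```

The grid supremum agrees with λ f*(c/λ) to high precision, and the maximizing L sits at e^{c/λ}, which is exactly the likelihood-ratio formula recovered in Proposition 2 below with c = l_j − μ.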
Proposition 2. If (a, λ*(a), μ*(a)) is a solution of (P*_j), then the optimal likelihood ratio L* is such that

∫_ω L* dm = 1,  f′(L*) = (l_j − μ*)/λ*.

This means that a_j and dm̃* = L* dm provide a solution of the original problem (P_j).

Proof.
Let (λ*(a), μ*(a)) be a solution of (P*_j) associated with the profile a. Then the optimal likelihood ratio L* is obtained by differentiating f* or by inverting the equation f′(L*) = (l_j − μ*)/λ*. As m̃ is a probability measure, and using the definition of L*, one gets dm̃*(ω) = L* dm(ω). It follows that (a*_j, L*) solves the original problem (P_j). Next we look at the existence of robust equilibria.
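Proposition 2 can be verified numerically on a discrete state space for the KL-type generator f(x) = x log x − x (so f′(x) = log x, f(1) = −1 and f*(ξ) = e^ξ): solve the reduced problem over (λ, μ) for a fixed action and recover L* = exp((l_j − μ*)/λ*). The nominal distribution and loss values below are illustrative, not from the paper.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative check of Proposition 2 with f(x) = x*log(x) - x: solve the
# reduced dual over (lambda, mu), then recover the optimal likelihood ratio
# L* = (f')^{-1}((l - mu*)/lambda*) = exp((l - mu*)/lambda*).
m = np.full(4, 0.25)                 # nominal discrete distribution (assumed)
l = np.array([1.0, 2.0, 3.0, 4.0])   # realized losses l_j(a, omega_k) (assumed)
rho = 0.1

def dual(params):
    s, mu = params
    lam = np.exp(s)                  # enforce lambda > 0
    # f(1) = -1 and f*(xi) = exp(xi) for the KL-type generator
    return lam * (rho - 1.0) + mu + lam * np.sum(m * np.exp((l - mu) / lam))

res = minimize(dual, x0=np.array([0.0, 2.0]), method="Nelder-Mead",
               options={"xatol": 1e-8, "fatol": 1e-10})
lam_star, mu_star = np.exp(res.x[0]), res.x[1]
L_star = np.exp((l - mu_star) / lam_star)    # optimal likelihood ratio

worst_case = np.sum(m * L_star * l)          # E under dm_tilde* = L* dm
```

At the optimum, L* integrates to one under m and the worst-case expectation matches the dual value, strictly exceeding the nominal expectation E_m l_j.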
As in classical game theory, sufficient conditions for the existence of a robust equilibrium can be obtained from standard fixed-point theory, which we recall next. Let the sets A_j be nonempty, compact and convex, and let the functions l*_j be continuous and such that, for any fixed (z_k)_{k≠j}, the function z_j ↦ l*_j(z, m) is quasi-convex for each j. Then there exists at least one distributionally robust pure equilibrium. This result can easily be extended to the coupled-action-constraint case, yielding robust generalized Nash equilibria.
Using

l*_j(a, λ, μ, m) = λ(ρ + f(1)) + μ + lim_{N_j → +∞} (1/N_j) Σ_{k=1}^{N_j} λ f*((l_j(·, ω_k) − μ)/λ),  ω_k ∼ m,

let

m_{N_j} = (1/N_j) Σ_{k=1}^{N_j} δ_{ω_k}

be the empirical measure of the state and define

ε_{N_j} = √N_j ( sup_{m̃ ∈ B_{ρ_{N_j}}(m_{N_j})} E_m̃ l_j − E_{m_{N_j}} l_j ) − √( N_j ρ_{N_j} var_{m_{N_j}}[l_j] ),

with sup_{N_j} N_j ρ_{N_j} < +∞. Then ε_{N_j} → 0 as N_j grows. The above result states that the robust performance captures the risk by considering the variance and not just the ergodic performance.

In this section we develop learning algorithms for (P*_j)_j. Consider the optimal control problem inf_{u ∈ U} ∫_0^T l̂(t, z, u) dt subject to ż = u. The maximum principle is a necessary condition of optimality when the underlying function is sufficiently smooth. The adjoint variable satisfies ṗ = −H_z = −l̂_z, and the optimal control optimizes the Hamiltonian H(z, p) = inf_{u ∈ U} { l̂ + p u }, i.e., the Legendre-Fenchel transform of −l̂ evaluated at −p. A closed-form expression of the optimal control can be obtained and is generically given by u* = H_p(z, p). A necessary condition for optimality says that H_{u*}(u − u*) ≥ 0 for all u ∈ U, where H_u denotes an element of the sub-differential of H. This variational inequality can be rewritten as

0 ≤ H_{u*}(u − u*) = [ l̂_{u*} + p ](u − u*),  (11)

for all u ∈ U. In particular, an interior solution u* solves p = −l̂_{u*}, and the adjoint equation becomes ṗ = (d/dt)(−l̂_{u*}) = −l̂_z(z, u*), which means that

(d/dt) l̂_ż = l̂_z(z, ż).

The latter equation is the Euler-Lagrange equation from the calculus of variations. Since the minimization is among all possible curves, this minimum principle may exhibit features that allow the investigation of faster time curves.
Let g : A → R be a differentiable, strictly convex function. The Bregman divergence [10] is the map d_g : A × relint(A) → R defined as

d_g(y, x) = g(y) − g(x) − ⟨g_x(x), y − x⟩,

where relint(A) denotes the relative interior of A.
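For concreteness, here is a minimal sketch (generators are standard, data illustrative) of two classical instances of the Bregman divergence: the quadratic generator recovers the squared Euclidean distance, and the negative entropy recovers the KL divergence between probability vectors.

```python
import numpy as np

# Sketch: Bregman divergence d_g(y, x) = g(y) - g(x) - <grad g(x), y - x>
# for two classical generators.

def bregman(g, grad_g, y, x):
    return g(y) - g(x) - np.dot(grad_g(x), y - x)

# g(x) = 0.5*||x||^2  ->  d_g(y, x) = 0.5*||y - x||^2
sq = lambda x: 0.5 * np.dot(x, x)
sq_grad = lambda x: x

# g(x) = sum_i x_i log x_i (negative entropy) -> d_g(y, x) = KL(y || x)
# whenever y and x are probability vectors
negent = lambda x: np.sum(x * np.log(x))
negent_grad = lambda x: np.log(x) + 1.0

y = np.array([0.2, 0.8])
x = np.array([0.5, 0.5])
d_euclid = bregman(sq, sq_grad, y, x)        # equals 0.5*||y - x||^2
d_kl = bregman(negent, negent_grad, y, x)    # equals KL(y || x)
```

Strict convexity of g guarantees d_g(y, x) ≥ 0 with equality iff y = x, which is what makes d_g usable as a Lyapunov-type discrepancy in the learning dynamics below.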
We investigate the equation (d/dt) l̂_u(z, u) = l̂_z(z, u) for a class of quantities of interest l̂. Let the family of Bregman-based Lagrangians be

l̂(z, u) = e^{α+γ} [ d_g(z + e^{−α} u, z) − e^{β} l*(z) ].

Proposition 3.
The Euler-Lagrange equation reduces to the following second-order differential system, for γ̇ = e^{α}:

z̈ + (e^{α} − α̇) ż + e^{2α+β} g_{zz}^{−1}(z + e^{−α} ż) l*_z(z) = 0.  (12)

Proof.
We start with the definition of the Bregman divergence. A simple computation shows that

∂_y d_g(y, x) = g_x(y) − g_x(x),  ∂_x d_g(y, x) = −g_{xx}(x)(y − x).

Write y = z + e^{−α} u. By differentiating the functional l̂ one gets

l̂_z = e^{α+γ} [ g_z(y) − g_z(z) − g_{zz}(z)(y − z) − e^{β} l*_z ],
l̂_u = e^{γ} [ g_z(z + e^{−α} u) − g_z(z) ].  (13)

It follows that

(d/dt) l̂_u = γ̇ e^{γ} [ g_z(z + e^{−α} u) − g_z(z) ] + e^{γ} [ g_{zz}(z + e^{−α} u)(ż − α̇ e^{−α} ż + e^{−α} z̈) − g_{zz}(z) ż ].  (14)

Setting (d/dt) l̂_u = l̂_z along u = ż, substituting y − z = e^{−α} ż and taking γ̇ = e^{α}, the terms in g_z(y) − g_z(z) and in g_{zz}(z) ż cancel, and one is left with

g_{zz}(z + e^{−α} ż)( ż − α̇ e^{−α} ż + e^{−α} z̈ ) = −e^{α+β} l*_z(z).  (15)

Multiplying by e^{α} g_{zz}^{−1}(z + e^{−α} ż), the Euler-Lagrange equation reduces to

z̈ + (e^{α} − α̇) ż + e^{2α+β} g_{zz}^{−1}(z + e^{−α} ż) l*_z(z) = 0,  (18)

which can be rewritten as

[ z̈ e^{−α} + (1 − α̇ e^{−α}) ż ] g_{zz}(z + e^{−α} ż) + e^{α+β} l*_z(z) = 0.  (19)

From the above, the Bregman algorithm reads

(d/dt) [ g_z(z + e^{−α} ż) ] = −e^{α+β} l*_z(z).  (20)

This completes the proof.

Note that the second-order system is easily converted into a first-order system by setting

y = z + e^{−α} u,  ż = u,  u̇ = z̈ = −(e^{α} − α̇) u − e^{2α+β} g_{zz}^{−1}(z + e^{−α} u) l*_z(z).

Definition 3.
We say that z ↦ l̃*(z) is a best-response pseudo-potential function for the distributionally robust game G(m) if arg min_{z_j} l̃*(z) ⊆ arg min_{z_j} l*_j(z) for every j. The Bregman algorithm is given by

(d/dt) [ g_z(z + e^{−α} ż) ] = −e^{α+β} l̃*_z(z),  z(0) = z_0,  (21)

where β(t) = β(0) + ∫_0^t e^{α(t′)} dt′, β(0) ≥ 0, and α is a time-dependent function.

Proposition 4. If l̃* is convex then

0 ≤ −l̃*(z*) + l̃*(z(t)) ≤ e^{−β(t)} c_0,

where c_0 = d_g(z*, z_0 + e^{−α(0)} ż_0) + e^{β(0)} [ −l̃*(z*) + l̃*(z(0)) ].

By choosing α(t) = t, one has β(t) = e^{t} up to a constant, and the error gap is −l̃*(z*) + l̃*(z(t)) ≤ e^{−e^{t}} c_0. It takes T_η = log log(c_0/η) time units to reach a neighborhood of the equilibrium payoff of z* with precision η > 0. This is faster than the Ishikawa-Nesterov algorithm, which needs O(1/√η), the gradient descent method, which needs O(1/η), no-regret dynamics, and black-box optimization, which needs O(c/η). Thus, the Bregman dynamics speeds up the learning and improves on classical methods, achieving a double exponential decay.

The proof of Proposition 4 is based on a careful construction of a generalized best-response pseudo-potential function using the Pontryagin maximum principle. It extends the framework developed in [4] to the context of strategic-form games. One then checks that the following function V is a Lyapunov function:

V(z*, z(t)) = d_g(z*, z(t) + e^{−α(t)} ż(t)) + e^{β(t)} [ −l̃*(z*) + l̃*(z(t)) ],

where z(t) is generated by the Bregman algorithm. Note that Proposition 4 does not require the strong convexity property often used in convergence proofs of gradient dynamics and Newton-based gradient methods. This is because the Bregman divergence is carefully designed to compensate for that part.
Table 1 summarizes the theoretical speedup advantages of Bregman algorithms over the state-of-the-art algorithms.

Algorithm                      | Accuracy        | Time to reach precision η
-------------------------------|-----------------|--------------------------
This paper                     | O(e^{−β(t)})    | O(β^{−1}(log(1/η)))
This paper (Bregman, α(t) = t) | O(e^{−e^{t}})   | O(log log(1/η))
Ishikawa-Nesterov              | O(1/t²)         | O(1/√η)
Conjugate/proximal gradient    | O(1/t)          | O(1/η)
Gradient descent               | O(1/t)          | O(1/η)
Regret minimization            | O(log t / t)    | -
Standard black-box             | O(1/√t)         | O(1/η²)

Table 1: Performance of the proposed Bregman algorithm compared to the classical ones, for a precision error within η > 0.

Proof of Proposition 4.
Let us define the function V as follows:

V(z, u, t, z*) = d_g(z*, z + e^{−α} u) + e^{β} [ −l*(z*) + l*(z) ].  (22)

The function V is nonnegative. Let us compute the time derivative of V over the path (z(t), u(t)) generated by the Bregman algorithm

(d/dt) [ g_z(z + e^{−α} ż) ] = −e^{α+β} l*_z(z).

One has

(d/dt) V(z(t), u(t), t, z*) = −⟨ (d/dt) g_z(z + e^{−α} u), z* − z − e^{−α} u ⟩ + β̇ e^{β} [ l*(z) − l*(z*) ] + e^{β} l*_z(z) ż.  (23)

Using the Bregman dynamics along u = ż, the terms e^{β} l*_z(z) ż and e^{α+β} l*_z(z) e^{−α} u cancel, and by adding and subtracting the same term one obtains

(d/dt) V(z(t), u(t), t, z*) = e^{α+β} l*_z(z)(z* − z) + β̇ e^{β} [ l*(z) − l*(z*) ]
 = e^{β} (e^{α} − β̇) l*_z(z)(z* − z) + β̇ e^{β} [ l*(z) − l*(z*) − l*_z(z)(z − z*) ].  (25)

By convexity of the function l*,

l*(z) − l*(z*) − l*_z(z)(z − z*) ≤ 0  and  l*_z(z)(z* − z) ≤ l*(z*) − l*(z) ≤ 0.

If e^{α} − β̇ ≥ 0, then

(d/dt) V(z(t), u(t), t, z*) ≤ 0.  (26)

Thus V is decreasing over the path of the Bregman algorithm whenever β̇ ≤ e^{α}. It follows that

e^{β} [ l*(z) − l*(z*) ] ≤ V(z, u, t, z*) ≤ V(z_0, u_0, 0, z*).
Then the global error satisfies

0 ≤ l*(z) − l*(z*) ≤ e^{−β} V(z_0, u_0, 0, z*),

with β̇ ≤ e^{α}, which shows an exponential convergence to z*. This completes the proof.
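To see the decay of Proposition 4 in action, here is a minimal numerical sketch (not the paper's experiment): take the quadratic generator g(z) = ½‖z‖², for which g_z is the identity and the first-order form of (21) becomes ẏ = −e^{α+β} ∇l̃*(z), ż = e^{α}(y − z) with y = z + e^{−α} ż, and integrate it by forward Euler for an illustrative quadratic objective.

```python
import numpy as np

# Forward-Euler integration of the first-order Bregman flow with the quadratic
# generator g(z) = 0.5*||z||^2 and the scaling alpha(t) = t, beta(t) = e^t - 1.
# The objective (gradient of 0.5*||z - z_star||^2) is an illustrative stand-in.

def bregman_flow(z0, z_star, T=2.0, dt=1e-4):
    z = z0.astype(float).copy()
    y = z.copy()                          # y = z + exp(-alpha)*dz/dt; start at rest
    steps = int(T / dt)
    for k in range(steps):
        t = k * dt
        ea = np.exp(t)                    # e^{alpha(t)}
        eb = np.exp(np.exp(t) - 1.0)      # e^{beta(t)}
        y = y - dt * ea * eb * (z - z_star)   # y' = -e^{alpha+beta} grad l(z)
        z = z + dt * ea * (y - z)             # z' = e^{alpha} (y - z)
    return z

z0 = np.array([3.0, -2.0])
z_star = np.zeros(2)
zT = bregman_flow(z0, z_star)
loss0 = 0.5 * np.dot(z0 - z_star, z0 - z_star)
lossT = 0.5 * np.dot(zT - z_star, zT - z_star)
```

With T = 2 the bound e^{−β(T)} c_0 already predicts a drop of more than two orders of magnitude in the optimality gap; the crude Euler discretization tracks this behavior qualitatively, oscillating with rapidly shrinking amplitude rather than descending monotonically.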
Very often the computation of the term ∇E_m h_j, with h_j = λ(ρ + f(1)) + μ + λ f*((l_j − μ)/λ), or of its partial derivatives, is challenging and depends on the structure of the distribution m. We now propose a swarm-based learning scheme to estimate the expected gradient and then insert it into the Bregman algorithm (21), leading to a particle swarm stochastic Bregman algorithm.
We propose a stochastic Bregman learning framework which is adjusted based only on the realized integrand h_j(z, ω_j) := λ(ρ + f(1)) + μ + λ f*((l_j − μ)/λ). The expected value of h_j is E_{ω∼m} h_j = l*_j. The stochastic Bregman dynamics reads

(d/dt) [ g_{j,z_j}(z + e^{−α} u) ] = −e^{α+β} h_{j,z_j}(z, ω_j)
 = −e^{α+β} [ E h_{j,z_j}(z, ω_j) + W_j ] = −e^{α+β} [ l*_{j,z_j}(z) + W_j(z, ω_j) ],  j ∈ {1, 2, . . . , n}.

The variable z = (a, λ, μ) is now a stochastic process because of the random term ω_j, where W_j(z) = h_{j,z_j}(z, ω_j) − E h_{j,z_j}(z, ω_j). The variance of W_j is high and non-vanishing because it is based on a single particle path discrepancy. We introduce in the following subsection a swarm of particles.

4.2 Swarm of particles

Let us associate with each agent j a swarm of virtual particles ω_{jk}. Then we have

E h_{j,z_j}(z, ·) = lim_{N → ∞} (1/N) Σ_{k=1}^{N} h_{j,z_j}(z, ω_{jk}).

The swarm-based stochastic Bregman dynamics reads

(d/dt) [ g_{z_j}(z + e^{−α} u) ] = −e^{α+β} (1/N) Σ_{k=1}^{N} h_{j,z_j}(z, ω_{jk}),  ω_{jk} ∼ m,  j ∈ {1, 2, . . . , n}.  (27)

This is a mean-field-type interacting system and can be seen as a control-dependent correlated-noise modification of the Bregman dynamics:

(d/dt) [ g_{z_j}(z + e^{−α} u) ] = −e^{α+β} [ l*_{j,z_j}(z) + ε_{j,N}(z, ω) ],  j ∈ {1, 2, . . . , n},

where ε_{j,N} = (1/N) Σ_{k=1}^{N} h_{j,z_j}(z, ω_{jk}) − l*_{j,z_j}(z) has zero mean and standard deviation

√(E[ε_{j,N}²]) = √( var[h_{j,z_j}(z, ·)] / N ).

For a realized ω, set

l*_{j,N} = (1/N) Σ_{k=1}^{N} h_j(z, ω_{jk}),  z*_{j,N} ∈ arg min_z (1/N) Σ_{k=1}^{N} h_j(z, ω_{jk}).

Then the particle swarm Bregman algorithm (27) gives as output a trajectory z_N(t) that satisfies

0 ≤ −l̃*_N(z*_N) + l̃*_N(z_N(t)) ≤ e^{−β(t)} c_{0,N},

where c_{0,N} := d_g(z*_N, z_0 + e^{−α(0)} ż_0) + e^{β(0)} [ −l̃*_N(z*_N) + l̃*_N(z_0) ]. This says that the N-swarm-per-player Bregman scheme provides a good approximation of the robust equilibrium.
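The effect of the swarm size N on the gradient noise ε_{j,N} can be seen in a small Monte Carlo sketch. The integrand and sampling distribution below are illustrative stand-ins, not the paper's h_j; the point is the 1/√N decay of the estimator's standard deviation.

```python
import numpy as np

# Swarm estimate of an expected gradient: E_omega[d/dz h(z, omega)] is replaced
# by an N-particle average. Here h(z, w) = 0.5*w*z^2, so grad_h = w*z and the
# exact expected gradient is E[w]*z (with w ~ Normal(1, 2), all illustrative).
rng = np.random.default_rng(0)

def swarm_gradient(z, N):
    omegas = rng.normal(loc=1.0, scale=2.0, size=N)   # particles omega_k ~ m
    return np.mean(omegas * z)

z = 2.0
exact = 1.0 * z                                        # E[omega] * z

trials = 400
std_small = np.std([swarm_gradient(z, 100) for _ in range(trials)])
std_large = np.std([swarm_gradient(z, 10_000) for _ in range(trials)])
ratio = std_small / std_large                          # expected to be near 10
```

Increasing N by a factor of 100 shrinks the estimator's standard deviation by roughly a factor of 10, consistent with the √(var/N) expression above; this is what justifies feeding the swarm average, rather than a single-particle gradient, into (27).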
To illustrate the particle swarm Bregman algorithm (27) we consider specific robust games with two agents and the discriminator/adversary. We choose

f(x) = x log x − x if x > 0,  f(0) = 0.  (28)

We compute f(1) = −1, f′(x) = log x and f″(x) = 1/x > 0. Hence f is convex on R_+. The Legendre-Fenchel transform of f yields f*(ξ) = e^{ξ}.

5.1 Best-response Pseudo-Potential Distributionally Robust Game

We set l_j(a, ω) = log(1 + ω_1 a_1 + ω_2 a_2), defined over R²_+. The integrand function h_j is

h_j = λ(ρ − 1) + μ + λ (1 + ω_1 a_1 + ω_2 a_2)^{1/λ} e^{−μ/λ}.

The random variable ω is distributed according to m, and we assume that ω has finite moments. The stochastic robust objective function l*_{j,N} is

l*_{j,N} = λ(ρ − 1) + μ + (λ/N) Σ_{k=1}^{N} (1 + ω_{1,k} a_1 + ω_{2,k} a_2)^{1/λ} e^{−μ/λ}.
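The closed form of h_j above follows directly from f*(ξ) = e^{ξ}, since λ e^{(l_j−μ)/λ} = λ e^{l_j/λ} e^{−μ/λ} and e^{l_j/λ} = (1 + ω_1 a_1 + ω_2 a_2)^{1/λ}. A one-line numerical check (all parameter values illustrative):

```python
import numpy as np

# Check that the generic integrand h_j = lam*(rho + f(1)) + mu + lam*f*((l_j - mu)/lam),
# with f(1) = -1 and f*(xi) = exp(xi), matches the closed form stated above.
rho, lam, mu = 0.1, 2.0, 0.3
a = np.array([1.5, 0.7])                 # action profile (illustrative)
w = np.array([0.4, 1.2])                 # realized state omega (illustrative)

l_j = np.log(1.0 + w @ a)
h_generic = lam * (rho - 1.0) + mu + lam * np.exp((l_j - mu) / lam)
h_closed = lam * (rho - 1.0) + mu + lam * (1.0 + w @ a) ** (1.0 / lam) * np.exp(-mu / lam)
```

The two expressions agree to machine precision, so the swarm average l*_{j,N} can be computed directly from the realized states without evaluating the logarithm and exponential separately.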
Figure 1: Gradient vs Bregman-based dynamics.

We illustrate the Bregman-based dynamics in Figure 1 with N = 1000 samples. We observe a rapid convergence to the robust equilibrium. The trajectory is not a descent, but the amplitude of the oscillations quickly decreases, with an acceptable convergence time that is 20 times better than that of the classical gradient dynamics.

We set l_j(a, ω) = log(20 + ω − sin(a_1) sin(a_2) √(a_1 a_2)). The function l_j has multiple local extrema, as illustrated in Figures 2 and 3. The objective function of agent j is non-convex and non-concave. The function l_j is chosen because it does not fulfill the conditions of Theorem 4. We observed that the Bregman algorithm behaves well even in this multimodal case, which opens the investigation of non-convex objective functions. The multimodal function has a robust equilibrium around (7. , . ). In Figure 4, the Bregman learning outcome changes from the distributionally robust Nash equilibrium.

Figure 2: The function l_j is non-convex, non-concave and has multiple local extrema.

We now discuss how the above framework can be used in the discrete action space case. We limit our exposition to finite sets. When A_j is finite (with two or more actions), it is not a convex set. We convexify it via the standard mixed-strategy approach: A_j is replaced by X_j, the simplex over A_j. The robust payoff of a pure action profile is replaced by the expected robust payoff, and one obtains the so-called mixed extension of the game. The existence of a distributionally robust mixed equilibrium follows from standard fixed-point theorems.

Figure 3: The particle swarm Bregman learning leads to robust Nash equilibria around (7. , .9); the equilibrium value is around 7.

Acknowledgment
This research is supported by the U.S. Air Force Office of Scientific Research under grant number FA9550-17-1-0259. This research was conducted while D. Bauso was visiting NYUAD.
We have introduced distributionally robust games with a continuous action space for each agent and a possible adversarial modification of the uncertainty. The problem is formulated using a notion of divergence between two measures: the modified measure and the exact measure associated with the uncertainty. Concerning the existence of robust solutions, additional difficulties arise if, in addition, a robustness condition or an adversarial control of the distribution is involved in the objective function. We have used triality theory to transform the objective function of each agent. This transformation considerably reduces the curse of dimensionality of the problem. Then, sufficient conditions for the existence of solutions are derived. We constructed a speedup learning algorithm based on the Bregman discrepancy. The methodology does not require strong convexity assumptions as in classical gradient algorithms. The convergence time is shown to be much faster than that of the current state-of-the-art algorithms developed for pseudo-potential games. Our future work aims to apply the approach to generative adversarial networks.
Figure 4: The Bregman learning outcome changes from the robust Nash equilibrium (7. , . ).