Convergence of Bayesian Nash Equilibrium in Infinite Bayesian Games under Discretization
arXiv [cs.GT]
Linan Huang and Quanyan Zhu
Department of Electrical and Computer Engineering, Tandon School of Engineering, New York University, Brooklyn, NY 11201 USA
Email: {lh2328, qz494}@nyu.edu

Abstract
We prove the existence of Bayesian Nash Equilibrium (BNE) of general-sum Bayesian games with continuous types and finite actions under the conditions that the utility functions and the prior type distributions are continuous with respect to the players' types. Moreover, there exists a sequence of discretized Bayesian games whose BNE strategies converge weakly to a BNE strategy of the infinite Bayesian game. Our proof establishes a connection between the equilibria of the infinite Bayesian game and those of finite approximations, which leads to an algorithm to construct $\varepsilon$-BNE of infinite Bayesian games by discretizing players' type spaces.

I. INTRODUCTION
Bayesian games [6] have found wide application in auctions [12], wireless networks [1], cybersecurity [7], [8], and robotic systems [9]. In these applications, it is natural to model the incomplete information, such as players' bids in auction theory, as a continuous random variable. However, the existing computational techniques are mainly for finite Bayesian games where the action and the type spaces are both finite. For Bayesian games with continuous types, the equilibrium is usually computed under restrictive assumptions. For example, [3] focuses on the single crossing condition, and the authors in [5] restrict the type distribution to be piecewise linear with some prior domain knowledge of a qualitative model. Iterative methods and learning have also been applied. The authors in [18] focus on piecewise uniform type distributions and payoffs that are linear functions of players' types and actions. They apply an iterated best response to compute the BNE. The authors in [17] restrict each player's utility to be independent of the other's types and develop a fictitious play algorithm to learn a pure-strategy equilibrium.

In this paper, we consider general Bayesian games with continuous types and prove the existence of BNE in these games. Compared with previous works (see e.g., [14], [4]) that prove the existence of BNE in infinite Bayesian games, we further prove that there exists a sequence of discretized Bayesian games whose BNE strategies converge weakly to a BNE strategy of the infinite Bayesian game. Our proof further implies an algorithm to approximate the BNE of infinite Bayesian games by discretization. The convergence of equilibrium strategies by discretization or sampling has been shown in complete information games with continuous actions [15], signaling games of certain classes [13], and infinite Bayesian Stackelberg games [11].
The authors in [2] define a new concept of constrained strategic equilibrium (CSE) for Bayesian games and propose sufficient conditions under which a sequence of CSEs converges toward a BNE. However, the convergence of BNE has not been shown in simultaneous-move Bayesian games of continuous types.

After a proper reformulation, we obtain BNE in its distributional form, which enables us to adopt the key idea from [15]. Following a similar argument in [15], our results in two-player general-sum infinite Bayesian games can be directly extended to the $N$-player case. Since there exists a one-to-one mapping from any set with the cardinality of the continuum to the unit interval $[0,1]$ (i.e., $\mathbb{R}^n$ and $[0,1]$ have the same cardinality), we can directly extend the convergence theorem to any compact joint type space of higher dimensions.

II. BAYESIAN GAMES WITH CONTINUOUS TYPES
We consider the following Bayesian game
$$\Gamma := \langle \mathcal{X}, \mathcal{Y}, \Theta_1 \times \Theta_2, b(\cdot), \{\bar{u}^{x,y}(\cdot), \bar{v}^{x,y}(\cdot)\}_{x\in\mathcal{X}, y\in\mathcal{Y}} \rangle$$
with a compact joint type space $\Theta_1 \times \Theta_2$ and two finite action spaces $\mathcal{X} := \{x_1, ..., x_L\}$ and $\mathcal{Y} := \{y_1, ..., y_H\}$; i.e., the first and the second player have $L$ and $H$ actions to choose from and simultaneously take actions $x \in \mathcal{X}$ and $y \in \mathcal{Y}$, respectively. The incomplete information of the game is represented by two single-dimensional continuous random variables $\tilde{\theta}_1 \in \Theta_1$, $\tilde{\theta}_2 \in \Theta_2$ whose joint distribution $b$ is assumed to be common knowledge and continuous over the joint type space $\Theta_1 \times \Theta_2$. We require the marginal distribution to be positive, i.e., $\bar{b}_i(\theta_i) := \int_{\Theta_j} b(\theta_i, \theta_j)\, d\theta_j > 0$, $\forall i \in \{1,2\}$, $\forall \theta_i \in \Theta_i$, and take $\Theta_1 = \Theta_2 = [0,1]$ without loss of generality. Player $i$ privately observes his type realization $\theta_i \in \Theta_i$ and knows that the other player $j$ has a type $\theta_j \in \Theta_j$ with a probability density of $b_i(\theta_j | \theta_i) := b(\theta_j, \theta_i) / \bar{b}_i(\theta_i) \in \mathbb{R}^+_0$. Then, $b_i$ is a valid conditional probability measure and we have $\int_0^1 b_i(\theta_j | \theta_i)\, d\theta_j = 1$, $\forall \theta_i \in \Theta_i$.

The utility functions $\bar{u}^{x,y}(\theta_1, \theta_2) \in \mathbb{R}^+_0$ and $\bar{v}^{x,y}(\theta_1, \theta_2) \in \mathbb{R}^+_0$ of the first and the second player, respectively, depend on players' actions $x \in \mathcal{X}$, $y \in \mathcal{Y}$, and types $\theta_1 \in \Theta_1$, $\theta_2 \in \Theta_2$. We further assume that both players' utility functions $\bar{u}^{x,y}(\theta_1, \theta_2)$ and $\bar{v}^{x,y}(\theta_1, \theta_2)$ are continuous over the joint type set $\Theta_1 \times \Theta_2$ for all actions $x \in \mathcal{X}$, $y \in \mathcal{Y}$. Since a continuous function on a compact metric space is bounded and uniformly continuous, we know that both players' utility functions are bounded and uniformly continuous over the joint type set.
Therefore, we can assume non-negative utility functions without loss of generality, as we can always add a sufficiently large constant, whose existence is guaranteed by the boundedness, to make them non-negative without any change to the equilibrium policy.

The behavioral strategies $\sigma_1 : \Theta_1 \mapsto \Delta\mathcal{X}$ and $\sigma_2 : \Theta_2 \mapsto \Delta\mathcal{Y}$ of the first and the second player, respectively, map each player's type to a distribution over his action space. In particular, we denote $\sigma_1(x | \theta_1) \in \mathbb{R}^+_0$ (resp. $\sigma_2(y | \theta_2) \in \mathbb{R}^+_0$) as the probability of player 1 (resp. player 2) taking action $x \in \mathcal{X}$ (resp. action $y \in \mathcal{Y}$) when his type is $\theta_1 \in \Theta_1$ (resp. $\theta_2 \in \Theta_2$). Obviously, we have $\sum_{x\in\mathcal{X}} \sigma_1(x | \theta_1) = 1$, $\forall \theta_1 \in \Theta_1$, and $\sum_{y\in\mathcal{Y}} \sigma_2(y | \theta_2) = 1$, $\forall \theta_2 \in \Theta_2$. Define the two players' expected utilities under any strategy pair $(\sigma_1, \sigma_2)$ as
$$r_1(\theta_1, \sigma_1, \sigma_2) := \int_0^1 b_1(\theta_2 | \theta_1) \sum_{x\in\mathcal{X}} \sigma_1(x | \theta_1) \sum_{y\in\mathcal{Y}} \sigma_2(y | \theta_2)\, \bar{u}^{x,y}(\theta_1, \theta_2)\, d\theta_2,$$
$$r_2(\theta_2, \sigma_1, \sigma_2) := \int_0^1 b_2(\theta_1 | \theta_2) \sum_{x\in\mathcal{X}} \sigma_1(x | \theta_1) \sum_{y\in\mathcal{Y}} \sigma_2(y | \theta_2)\, \bar{v}^{x,y}(\theta_1, \theta_2)\, d\theta_1. \tag{1}$$
For player $i$ of type $\theta_i \in \Theta_i$, his best response strategy $\sigma^*_i(\cdot | \theta_i)$ with respect to the other player's strategy $\sigma_j$ belongs to a set $B_i(\theta_i, \sigma_j)$, i.e.,
$$\sigma^*_i(\cdot | \theta_i) \in B_i(\theta_i, \sigma_j) := \arg\max_{\sigma_i(\cdot | \theta_i)} r_i(\theta_i, \sigma_i, \sigma_j). \tag{2}$$
For any given policy $\sigma_j$ of the other player $j$, player $i$'s best response set $B_i(\theta_i, \sigma_j)$ under type $\theta_i$ is nonempty and contains a pure policy, as shown in Lemma 1 for player 1. An analogous statement holds for player 2.

Lemma 1 (Pure Policy in Best Response Set). If the second player's strategy $\sigma_2$ is common knowledge, then player 1's best response set $B_1(\theta_1, \sigma_2)$ under any $\theta_1 \in \Theta_1$ contains the following pure policy:
$$\arg\max_{x\in\mathcal{X}} \int_0^1 b_1(\theta_2 | \theta_1) \sum_{y\in\mathcal{Y}} \sigma_2(y | \theta_2)\, \bar{u}^{x,y}(\theta_1, \theta_2)\, d\theta_2.$$

A strategy pair constitutes a BNE if the two strategies are best responses to each other, as defined below.
Definition 1 (Bayesian Nash Equilibrium). A strategy pair $(\sigma^*_1, \sigma^*_2)$ constitutes a BNE of the infinite Bayesian game $\Gamma$ if $\sigma^*_i(\cdot | \theta_i) \in B_i(\theta_i, \sigma^*_j)$, $\forall i, j \in \{1,2\}$, $i \neq j$, for almost every $\theta_1 \in \Theta_1$ and $\theta_2 \in \Theta_2$.

Footnote (on the joint type space): The joint type space refers to the Cartesian product (denoted as $\times$) of each player $i$'s type space $\Theta_i$. Since the joint type space is compact, each $\Theta_i$ has to be compact.

Footnote (on "almost every"): "Almost" in this context means that the probability of all types for which the strategy does not prescribe an optimal action is zero. For example, if player $i$'s strategies differ only at countably many points over $\Theta_i$, then they result in the same value of the Riemann integral in (1).

Since $\bar{b}_i(\theta_i) > 0$, $\forall \theta_i \in \Theta_i$, $\forall i \in \{1,2\}$, Lemma 2 below shows that we can compute a BNE strategy pair $(\sigma^*_1, \sigma^*_2)$ through the integral form in (3) and (4); i.e., no player has a profitable deviation after he knows his private type if and only if he does not benefit from any deviation before knowing his type [6].
$$\int_0^1\!\!\int_0^1 b(\theta_1, \theta_2) \sum_{x\in\mathcal{X}} \sigma^*_1(x | \theta_1) \sum_{y\in\mathcal{Y}} \sigma^*_2(y | \theta_2)\, \bar{u}^{x,y}(\theta_1, \theta_2)\, d\theta_1 d\theta_2 = \max_{\sigma_1} \int_0^1\!\!\int_0^1 b(\theta_1, \theta_2) \sum_{x\in\mathcal{X}} \sigma_1(x | \theta_1) \sum_{y\in\mathcal{Y}} \sigma^*_2(y | \theta_2)\, \bar{u}^{x,y}(\theta_1, \theta_2)\, d\theta_1 d\theta_2, \tag{3}$$
and
$$\int_0^1\!\!\int_0^1 b(\theta_1, \theta_2) \sum_{x\in\mathcal{X}} \sigma^*_1(x | \theta_1) \sum_{y\in\mathcal{Y}} \sigma^*_2(y | \theta_2)\, \bar{v}^{x,y}(\theta_1, \theta_2)\, d\theta_1 d\theta_2 = \max_{\sigma_2} \int_0^1\!\!\int_0^1 b(\theta_1, \theta_2) \sum_{x\in\mathcal{X}} \sigma^*_1(x | \theta_1) \sum_{y\in\mathcal{Y}} \sigma_2(y | \theta_2)\, \bar{v}^{x,y}(\theta_1, \theta_2)\, d\theta_1 d\theta_2. \tag{4}$$

Lemma 2 (BNE is equivalent to Nash Equilibrium). A strategy pair $(\sigma^*_1, \sigma^*_2)$ constitutes a BNE if and only if (3) and (4) hold.

Proof. The 'only if' part (necessity) is straightforward as (2) results in (3) and (4). To prove the 'if' part (sufficiency), we show that if $(\sigma^*_1, \sigma^*_2)$ is not a BNE as defined in Definition 1, then (3) and (4) cannot hold at the same time.
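As $(\sigma^*_1, \sigma^*_2)$ is not a BNE, there exist a measurable set $\hat{\Theta}_i \subseteq \Theta_i$ of positive probability and at least one player $i$ (assume the second player) who has a profitable deviation from $\sigma^*_2(\cdot | \theta_2)$ to an action $y_l \in \mathcal{Y}$ when $\theta_2 \in \hat{\Theta}_2$, i.e.,
$$\int_{\hat{\Theta}_2} \bar{b}_2(\theta_2) \left[ \int_0^1 b_2(\theta_1 | \theta_2) \sum_{x\in\mathcal{X}} \sigma^*_1(x | \theta_1)\, \bar{v}^{x,y_l}(\theta_1, \theta_2)\, d\theta_1 \right] d\theta_2 > \int_{\hat{\Theta}_2} \bar{b}_2(\theta_2) \left[ \int_0^1 b_2(\theta_1 | \theta_2) \sum_{x\in\mathcal{X}} \sigma^*_1(x | \theta_1) \sum_{y\in\mathcal{Y}} \sigma^*_2(y | \theta_2)\, \bar{v}^{x,y}(\theta_1, \theta_2)\, d\theta_1 \right] d\theta_2.$$
Consider a strategy $\hat{\sigma}_2$ where $\hat{\sigma}_2(y | \theta_2) = \mathbf{1}\{y = y_l\}$, $\forall y \in \mathcal{Y}$, $\theta_2 \in \hat{\Theta}_2$, and $\hat{\sigma}_2 = \sigma^*_2$, $\forall \theta_2 \notin \hat{\Theta}_2$; i.e., $\hat{\sigma}_2$ is identical to $\sigma^*_2$ except over the set $\hat{\Theta}_2$. Then, we know that
$$\int_0^1\!\!\int_0^1 b(\theta_1, \theta_2) \sum_{x\in\mathcal{X}} \sigma^*_1(x | \theta_1) \sum_{y\in\mathcal{Y}} \hat{\sigma}_2(y | \theta_2)\, \bar{v}^{x,y}(\theta_1, \theta_2)\, d\theta_1 d\theta_2$$
$$= \int_{\hat{\Theta}_2} \bar{b}_2(\theta_2) \left[ \int_0^1 b_2(\theta_1 | \theta_2) \sum_{x\in\mathcal{X}} \sigma^*_1(x | \theta_1) \sum_{y\in\mathcal{Y}} \hat{\sigma}_2(y | \theta_2)\, \bar{v}^{x,y}(\theta_1, \theta_2)\, d\theta_1 \right] d\theta_2 + \int_{\Theta_2 \setminus \hat{\Theta}_2} \bar{b}_2(\theta_2) \left[ \int_0^1 b_2(\theta_1 | \theta_2) \sum_{x\in\mathcal{X}} \sigma^*_1(x | \theta_1) \sum_{y\in\mathcal{Y}} \sigma^*_2(y | \theta_2)\, \bar{v}^{x,y}(\theta_1, \theta_2)\, d\theta_1 \right] d\theta_2$$
$$> \int_0^1\!\!\int_0^1 b(\theta_1, \theta_2) \sum_{x\in\mathcal{X}} \sigma^*_1(x | \theta_1) \sum_{y\in\mathcal{Y}} \sigma^*_2(y | \theta_2)\, \bar{v}^{x,y}(\theta_1, \theta_2)\, d\theta_1 d\theta_2,$$
which contradicts (4).

A. Equivalent Reformulation in Distributional Form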
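Since both integrands in (3) and (4) are non-negative, we can exchange the summation over actions and the integration over types according to Fubini's theorem. Define $u^{x,y}(\theta_1, \theta_2) := b(\theta_1, \theta_2)\, \bar{u}^{x,y}(\theta_1, \theta_2)$ and $v^{x,y}(\theta_1, \theta_2) := b(\theta_1, \theta_2)\, \bar{v}^{x,y}(\theta_1, \theta_2)$. Since a finite product of continuous functions is still continuous, $u^{x,y}$ and $v^{x,y}$ are both continuous over the joint set $\Theta_1 \times \Theta_2$. By assimilating the prior distribution of types into the players' utility functions, we can discretize the continuous type set uniformly, as shown in Section IV.

Footnote (on the profitable deviation in the proof of Lemma 2): Since the best response to any given policy contains a pure policy as shown in Lemma 1, we can restrict the profitable deviation to an action without loss of generality.

Let us represent the first player's behavioral strategy $\sigma_1(x | \theta_1)$ as a function of $\theta_1$ parameterized by action $x$, i.e., $f^x(\theta_1)$. Then we can define a non-decreasing bounded function $F^x(\theta_1) := \int_0^{\theta_1} f^x(\tilde{\theta}_1)\, d\tilde{\theta}_1$ of $\theta_1$ parameterized by action $x$. Since $\sum_{x\in\mathcal{X}} f^x(\theta_1) = 1$, $\forall \theta_1 \in \Theta_1$, and $f^x(\theta_1) \geq 0$, $\forall \theta_1$, $\forall x \in \mathcal{X}$, we obtain $\sum_{x\in\mathcal{X}} F^x(\theta_1) = \theta_1$, $\forall \theta_1 \in \Theta_1$, by Fubini's theorem. We use $\mathcal{F}^{\mathcal{X}}$ to denote the set of functions $F^{\mathcal{X}} := \{F^x\}_{x\in\mathcal{X}}$ that satisfy the above conditions. Similarly, we can represent the second player's strategy $\sigma_2(y | \theta_2)$ as $g^y(\theta_2)$ and define $G^y(\theta_2) := \int_0^{\theta_2} g^y(\tilde{\theta}_2)\, d\tilde{\theta}_2$ as the non-decreasing bounded function of $\theta_2$. Analogously, we have $G^y(0) = 0$ for any $y \in \mathcal{Y}$ and $\sum_{y\in\mathcal{Y}} G^y(\theta_2) = \theta_2$, $\forall \theta_2 \in \Theta_2$. We use $\mathcal{G}^{\mathcal{Y}}$ to denote the set of functions $G^{\mathcal{Y}} := \{G^y\}_{y\in\mathcal{Y}}$ that satisfy the above conditions.

Then, we can recast a BNE strategy pair $(F^{\mathcal{X}} \in \mathcal{F}^{\mathcal{X}}, G^{\mathcal{Y}} \in \mathcal{G}^{\mathcal{Y}})$ in the following distributional form, where a tilde marks the strategy over which the maximum is taken:
$$\sum_{x\in\mathcal{X}} \sum_{y\in\mathcal{Y}} \int_0^1\!\!\int_0^1 u^{x,y}(\theta_1, \theta_2)\, dF^x(\theta_1)\, dG^y(\theta_2) = \max_{\tilde{F}^{\mathcal{X}} \in \mathcal{F}^{\mathcal{X}}} \sum_{x\in\mathcal{X}} \sum_{y\in\mathcal{Y}} \int_0^1\!\!\int_0^1 u^{x,y}(\theta_1, \theta_2)\, d\tilde{F}^x(\theta_1)\, dG^y(\theta_2),$$
$$\sum_{x\in\mathcal{X}} \sum_{y\in\mathcal{Y}} \int_0^1\!\!\int_0^1 v^{x,y}(\theta_1, \theta_2)\, dF^x(\theta_1)\, dG^y(\theta_2) = \max_{\tilde{G}^{\mathcal{Y}} \in \mathcal{G}^{\mathcal{Y}}} \sum_{x\in\mathcal{X}} \sum_{y\in\mathcal{Y}} \int_0^1\!\!\int_0^1 v^{x,y}(\theta_1, \theta_2)\, dF^x(\theta_1)\, d\tilde{G}^y(\theta_2).$$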
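As a numerical illustration of the passage from a behavioral strategy $f^x$ to its cumulative form $F^x$, the following Python sketch integrates a strategy on a grid and checks the two properties stated above: each $F^x$ is non-decreasing and the components sum to $\theta_1$. The two-action strategy, the grid size, and the helper name `cumulative_strategy` are illustrative assumptions rather than part of the paper, and the integral is approximated with the trapezoid rule.

```python
def cumulative_strategy(f_values, grid):
    """Approximate F^x(theta) = integral of f^x from grid[0] to theta.

    f_values[l][m] is the probability of action x_l at type grid[m];
    the trapezoid rule is used on each grid cell (an assumed numerical scheme).
    """
    F = []
    for row in f_values:
        acc, out = 0.0, [0.0]  # F^x(0) = 0
        for m in range(1, len(grid)):
            acc += 0.5 * (row[m] + row[m - 1]) * (grid[m] - grid[m - 1])
            out.append(acc)
        F.append(out)
    return F


# Hypothetical two-action strategy on [0, 1]:
# play x_1 with probability theta and x_2 with probability 1 - theta.
M = 1000
grid = [m / M for m in range(M + 1)]
f = [[t for t in grid], [1.0 - t for t in grid]]

F = cumulative_strategy(f, grid)

# Largest violation of the constraint sum_x F^x(theta) = theta on the grid.
sum_gap = max(abs(F[0][m] + F[1][m] - grid[m]) for m in range(M + 1))
# Each component should be non-decreasing in theta.
monotone = all(F[l][m + 1] >= F[l][m] for l in range(2) for m in range(M))

print(sum_gap, monotone, F[0][-1])
```

For this particular strategy, $F^{x_1}(1) = \int_0^1 \theta\, d\theta = 1/2$, which the trapezoid rule reproduces up to floating-point error because the integrand is linear.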
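Due to the difficulty of computing an exact BNE, it is common to consider an approximate equilibrium, defined below.

Definition 2 ($\varepsilon$-BNE). A strategy pair $(F^{\mathcal{X}} \in \mathcal{F}^{\mathcal{X}}, G^{\mathcal{Y}} \in \mathcal{G}^{\mathcal{Y}})$ constitutes an $\varepsilon$-BNE if for all deviations $(\tilde{F}^{\mathcal{X}} \in \mathcal{F}^{\mathcal{X}}, \tilde{G}^{\mathcal{Y}} \in \mathcal{G}^{\mathcal{Y}})$, written here with a tilde, the following holds:
$$\sum_{x\in\mathcal{X}} \sum_{y\in\mathcal{Y}} \int_0^1\!\!\int_0^1 u^{x,y}(\theta_1, \theta_2)\, dF^x(\theta_1)\, dG^y(\theta_2) \geq \sum_{x\in\mathcal{X}} \sum_{y\in\mathcal{Y}} \int_0^1\!\!\int_0^1 u^{x,y}(\theta_1, \theta_2)\, d\tilde{F}^x(\theta_1)\, dG^y(\theta_2) - \varepsilon,$$
$$\sum_{x\in\mathcal{X}} \sum_{y\in\mathcal{Y}} \int_0^1\!\!\int_0^1 v^{x,y}(\theta_1, \theta_2)\, dF^x(\theta_1)\, dG^y(\theta_2) \geq \sum_{x\in\mathcal{X}} \sum_{y\in\mathcal{Y}} \int_0^1\!\!\int_0^1 v^{x,y}(\theta_1, \theta_2)\, dF^x(\theta_1)\, d\tilde{G}^y(\theta_2) - \varepsilon.$$

III. EXTENSION OF HELLY'S SELECTION THEOREM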
Based on the reformulation of BNE in Section II-A, the players' strategies $F^{\mathcal{X}}$ and $G^{\mathcal{Y}}$ become $L$- and $H$-dimensional vectors of constrained functions, respectively. Thus, we extend the original Helly's selection theorem in the following lemma to fit the vector of functions with constraints.

Lemma 3 (Convergence on Countable Set). Consider the finite set $\mathcal{X} := \{x_1, ..., x_L\}$ and a sequence of functions $\{F^{\mathcal{X}}_n\}_{n\in\mathbb{Z}^+}$, where $F^{\mathcal{X}}_n \in \mathcal{F}^{\mathcal{X}}$ for each $n \in \mathbb{Z}^+$. Let $\mathcal{D} := \{\theta^1, \theta^2, ...\}$ be any countable subset of $\Theta_1$. Then there is a subsequence of $\{F^{\mathcal{X}}_n\}_{n\in\mathbb{Z}^+}$, i.e., $\{F^{\mathcal{X}}_{n_k}\}_{k=1}^{\infty}$, such that $\bar{F}^x(\theta) := \lim_{k\to\infty} F^x_{n_k}(\theta)$ exists for any $\theta \in \mathcal{D}$, $x \in \mathcal{X}$. Moreover, the limit function $\bar{F}^{\mathcal{X}} \in \mathcal{F}^{\mathcal{X}}$.

Proof. With a little abuse of notation, the vector $F^{\mathcal{X}}_n(\theta^d) := [F^{x_1}_n(\theta^d), \cdots, F^{x_L}_n(\theta^d)]$ at $\theta^d \in \mathcal{D}$, $d \in \{1, 2, \cdots\}$, $\forall n \in \mathbb{Z}^+$, belongs to a subset of $\mathbb{R}^L$, i.e., $\mathcal{F}^{\mathcal{X}}(\theta^d)$, that is closed and bounded. Then the subset must be sequentially compact based on the Bolzano–Weierstrass theorem, and every sequence of points in this subset has a subsequence that converges to a point in the subset. Thus, there exists a subsequence $F^{\mathcal{X}}_{n^d_k}(\theta^d)$ that converges to $\bar{F}^{\mathcal{X}}(\theta^d) \in \mathcal{F}^{\mathcal{X}}(\theta^d)$. Then, we can apply the standard diagonalization argument to repeatedly extract a subsequence from a subsequence, so that there exists a final subsequence $n_k$ that makes $F^{\mathcal{X}}_{n_k}(\theta)$ converge to $\bar{F}^{\mathcal{X}}(\theta) \in \mathcal{F}^{\mathcal{X}}(\theta)$ for all $\theta \in \mathcal{D}$. Note that $\bar{F}^x$ is non-decreasing with respect to $\theta \in \mathcal{D}$ for each $x \in \mathcal{X}$ as the inequality is preserved in the limit; i.e., if $\theta^{d} < \theta^{d'} \in \mathcal{D}$, then $\bar{F}^x(\theta^{d}) = \lim_{k\to\infty} F^x_{n_k}(\theta^{d}) \leq \lim_{k\to\infty} F^x_{n_k}(\theta^{d'}) = \bar{F}^x(\theta^{d'})$.

Theorem 1 (Convergence on Compact Set). Consider
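$\Theta_1 := [0,1]$, the finite set $\mathcal{X} := \{x_1, ..., x_L\}$, and a sequence of functions $\{F^{\mathcal{X}}_n\}_{n\in\mathbb{Z}^+}$, where $F^{\mathcal{X}}_n \in \mathcal{F}^{\mathcal{X}}$ for each $n \in \mathbb{Z}^+$. Then, some subsequence of $\{F^{\mathcal{X}}_n\}_{n\in\mathbb{Z}^+}$, i.e., $\{F^{\mathcal{X}}_{n_k}\}_{k=1}^{\infty}$, converges point-wise to a non-decreasing bounded function $F^{\mathcal{X}} \in \mathcal{F}^{\mathcal{X}}$, i.e., $F^x(\theta) = \lim_{k\to\infty} F^x_{n_k}(\theta)$, $\forall x \in \mathcal{X}$, $\forall \theta \in \Theta_1$.

Proof. Let $\mathcal{D} := \mathbb{Q} \cap [0,1]$, where $\mathbb{Q}$ represents the set of rational numbers; then $\mathcal{D}$ is countable. Based on Lemma 3, there exists a subsequence $\{F^{\mathcal{X}}_{n_k}\}_{k=1}^{\infty}$ that converges to $\bar{F}^{\mathcal{X}} \in \mathcal{F}^{\mathcal{X}}$ for $\theta \in \mathcal{D}$. Next, we need to extend the function $\bar{F}^{\mathcal{X}}$ defined on the discrete set $\mathcal{D}$ to a function $F^{\mathcal{X}}$ defined over the continuous region by connecting the dots, i.e.,
$$F^x(\alpha) = \sup_{\beta \leq \alpha,\, \beta \in \mathcal{D}} \bar{F}^x(\beta), \quad \forall \alpha \in \Theta_1, \forall x \in \mathcal{X}.$$
Then, $F^{\mathcal{X}}(\theta) = \bar{F}^{\mathcal{X}}(\theta)$, $\forall \theta \in \mathcal{D}$, and $F^{\mathcal{X}} \in \mathcal{F}^{\mathcal{X}}$ is also element-wise non-decreasing with respect to $\theta \in \Theta_1$, as $\alpha < \gamma$ leads to
$$F^x(\alpha) = \sup_{\beta \leq \alpha,\, \beta \in \mathcal{D}} \bar{F}^x(\beta) \leq \sup_{\beta \leq \gamma,\, \beta \in \mathcal{D}} \bar{F}^x(\beta) = F^x(\gamma), \quad \forall x \in \mathcal{X}.$$
Note that by connecting discrete dots, $F^{\mathcal{X}}(\theta)$ is right continuous and there are countably many jumps at $\theta \in \mathcal{D}$.

We first show that for each $x \in \mathcal{X}$, if $F^x$ is continuous at $\alpha \in \Theta_1$, then there exists a subsequence $n'_k$ of the subsequence $n_k$ such that $F^x(\alpha) = \lim_{k\to\infty} F^x_{n'_k}(\alpha)$. For any $\alpha \in \Theta_1$, since $F^x$, $\forall x \in \mathcal{X}$, is continuous at $\alpha$, we can choose $p, q \in \mathcal{D}$ with $\alpha \in (p, q)$ such that $F^x(q) - F^x(p) < \epsilon/2$, $\forall x \in \mathcal{X}$. Owing to the convergence on the countable set $\mathcal{D}$, we can pick $k$ sufficiently large such that $F^x_{n_k}(p) \in (F^x(p) - \epsilon/2, F^x(p) + \epsilon/2)$ and $F^x_{n_k}(q) \in (F^x(q) - \epsilon/2, F^x(q) + \epsilon/2)$ for all $x \in \mathcal{X}$. Then,
$$F^x_{n_k}(\alpha) \leq F^x_{n_k}(q) < F^x(q) + \epsilon/2 < F^x(p) + \epsilon \leq F^x(\alpha) + \epsilon, \quad \forall x \in \mathcal{X}.$$
Analogously, we can also obtain $F^x_{n_k}(\alpha) > F^x(\alpha) - \epsilon$, $\forall x \in \mathcal{X}$, which together show the convergence at $\alpha$.

Second, we show the convergence at a discontinuous point $\beta \in \Theta_1$. Since $F^{\mathcal{X}}$ is element-wise non-decreasing, the set of discontinuities is at most countable based on Froda's theorem.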
Thus, Lemma 3 guarantees that we can select a further convergent subsequence $n'_k$ from the subsequence $n_k$ such that $F^x(\beta) = \lim_{k\to\infty} F^x_{n'_k}(\beta)$, $\forall x \in \mathcal{X}$. Combining the above two cases, we have found a convergent subsequence $n'_k$ over the entire set $\Theta_1$, i.e., $F^x(\theta) = \lim_{k\to\infty} F^x_{n'_k}(\theta)$, $\forall x \in \mathcal{X}$, $\forall \theta \in \Theta_1$.

Next, we extend Helly's second theorem to a product of sets in Theorem 2.

Theorem 2.
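Let $u^{x,y}(\theta_1, \theta_2)$ be continuous over the joint set $[\alpha, \beta] \times C$ for each action $x \in \mathcal{X}$, $y \in \mathcal{Y}$, where $C$ is compact and $[\alpha, \beta] \subseteq \Theta_1$. Then for each $y \in \mathcal{Y}$, $\sum_{x\in\mathcal{X}} \int_\alpha^\beta u^{x,y}(\theta_1, \theta_2)\, dF^x_n(\theta_1)$ converges to $\sum_{x\in\mathcal{X}} \int_\alpha^\beta u^{x,y}(\theta_1, \theta_2)\, dF^x(\theta_1)$ uniformly in $\theta_2$.

Proof. For any $\epsilon > 0$, since $u^{x,y}$ is uniformly continuous over the joint set, we can choose $\delta > 0$ such that
$$|u^{x,y}(\theta_1, \theta_2) - u^{x,y}(\theta'_1, \theta_2)| < \epsilon \tag{5}$$
for all $x \in \mathcal{X}$, $y \in \mathcal{Y}$, $\theta_2 \in C$, $|\theta_1 - \theta'_1| < \delta$. Choose $\alpha = \theta^1_1 < \theta^2_1 < \cdots < \theta^D_1 = \beta$ such that $F^x_n$ is continuous at each $\theta^d_1$, $d \in \{2, \cdots, D-1\}$, and $\theta^{d+1}_1 - \theta^d_1 < \delta$ for all $x \in \mathcal{X}$, which can be done as $F^x_n$ has at most countably many discontinuities over the set $[\alpha, \beta]$. Define $u^{x,y}_d(\theta_2) := \min_{\theta^d_1 \leq \theta_1 \leq \theta^{d+1}_1} u^{x,y}(\theta_1, \theta_2)$ and $S^{x,y}_n(\theta_2) := \sum_{d=1}^{D-1} \int_{\theta^d_1}^{\theta^{d+1}_1} u^{x,y}_d(\theta_2)\, dF^x_n(\theta_1)$, $\forall n \in \mathbb{Z}^+_0$. Then, we have
$$\int_\alpha^\beta u^{x,y}(\theta_1, \theta_2)\, dF^x_n(\theta_1) \geq S^{x,y}_n(\theta_2). \tag{6}$$
Now by (5) and the monotonicity of $F^x_n$, we have $S^{x,y}_n(\theta_2) \geq \int_\alpha^\beta u^{x,y}(\theta_1, \theta_2)\, dF^x_n(\theta_1) - \epsilon\,(F^x_n(\beta) - F^x_n(\alpha))$, $\forall x \in \mathcal{X}$, $y \in \mathcal{Y}$. Then, using (6) and the fact that $F^x_n(\beta) - F^x_n(\alpha) \leq 1$, $\forall x \in \mathcal{X}$, we have
$$\left| S^{x,y}_n(\theta_2) - \int_\alpha^\beta u^{x,y}(\theta_1, \theta_2)\, dF^x_n(\theta_1) \right| < \epsilon, \quad \forall x \in \mathcal{X}, y \in \mathcal{Y}. \tag{7}$$
Now, choose $N$ large enough such that, for each $d = 1, \cdots, D$, and each $n \geq N$,
$$|F^x_n(\theta^d_1) - F^x(\theta^d_1)| < \frac{\epsilon}{D}, \quad \forall x \in \mathcal{X}. \tag{8}$$
Then, we obtain $\int_{\theta^d_1}^{\theta^{d+1}_1} u^{x,y}_d(\theta_2)\, dF^x_n(\theta_1) = u^{x,y}_d(\theta_2)\,(F^x_n(\theta^{d+1}_1) - F^x_n(\theta^d_1))$, $\forall x \in \mathcal{X}$, $y \in \mathcal{Y}$, $\forall n \in \mathbb{Z}^+_0$, which, together with (8), gives us
$$\left| \int_{\theta^d_1}^{\theta^{d+1}_1} u^{x,y}_d(\theta_2)\, dF^x_n(\theta_1) - \int_{\theta^d_1}^{\theta^{d+1}_1} u^{x,y}_d(\theta_2)\, dF^x(\theta_1) \right| < \frac{\epsilon}{D}\, |u^{x,y}_d(\theta_2)|, \quad \forall x \in \mathcal{X}, y \in \mathcal{Y}.$$
Since $u^{x,y}$ is continuous over a compact set, there exists a finite upper bound $M$ for $|u^{x,y}_d(\theta_2)|$.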
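Therefore, writing $S^{x,y}(\theta_2)$ for the sum $S^{x,y}_n(\theta_2)$ with $F^x$ in place of $F^x_n$, we have
$$|S^{x,y}_n(\theta_2) - S^{x,y}(\theta_2)| < \sum_{d=1}^{D-1} \frac{\epsilon}{D}\, |u^{x,y}_d(\theta_2)| \leq \epsilon M. \tag{9}$$
Combining (7) and (9), we have that $\forall \theta_2 \in C$, $n \geq N$,
$$\left| \int_\alpha^\beta u^{x,y}(\theta_1, \theta_2)\, dF^x_n(\theta_1) - \int_\alpha^\beta u^{x,y}(\theta_1, \theta_2)\, dF^x(\theta_1) \right| < (2M+1)\,\epsilon, \quad \forall x \in \mathcal{X}, y \in \mathcal{Y},$$
and consequently,
$$\sum_{x\in\mathcal{X}} \left| \int_\alpha^\beta u^{x,y}(\theta_1, \theta_2)\, dF^x_n(\theta_1) - \int_\alpha^\beta u^{x,y}(\theta_1, \theta_2)\, dF^x(\theta_1) \right| < |\mathcal{X}| \cdot (2M+1)\,\epsilon, \quad \forall y \in \mathcal{Y}.$$
Since $\epsilon$ is arbitrary and its coefficient $|\mathcal{X}| \cdot (2M+1)$ is fixed for all $\theta_2 \in C$, the convergence is uniform in $\theta_2$ for each $y \in \mathcal{Y}$.

IV. DISCRETIZATION AND CONVERGENCE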
In this section, we provide a theoretical guarantee for approximating infinite Bayesian games by properly discretizing the type space and solving the resulting finite Bayesian games. The convergence of the BNE is guaranteed as long as the maximum length of the intervals under the discretization scheme goes to zero when the number of intervals goes to infinity. For simplicity, we adopt the following uniform discretization scheme. We can also adopt other deterministic schemes such as dichotomy or stochastic schemes such as sampling.

For any integer $n \geq 1$ and action pair $x \in \mathcal{X}$, $y \in \mathcal{Y}$, define the level-$n$ approximation of the two players' utility functions $u^{x,y}$, $v^{x,y}$ as two $n \times n$ matrices $[u^{x,y,n}_{i,j}]_{i,j\in\{1,\cdots,n\}}$, $[v^{x,y,n}_{i,j}]_{i,j\in\{1,\cdots,n\}}$, respectively, where the $(i,j)$ elements are
$$u^{x,y,n}_{i,j} = u^{x,y}\!\left(\tfrac{i}{n}, \tfrac{j}{n}\right), \quad v^{x,y,n}_{i,j} = v^{x,y}\!\left(\tfrac{i}{n}, \tfrac{j}{n}\right). \tag{10}$$
Then, the level-$n$ discretized version of the infinite Bayesian game $\Gamma$ is denoted as
$$\Gamma_n = \langle \mathcal{X}, \mathcal{Y}, \bar{\Theta}^n_1 \times \bar{\Theta}^n_2, b_n(\cdot), \{u^{x,y,n}_{i,j}, v^{x,y,n}_{i,j}\}^{i,j\in\{1,\cdots,n\}}_{x\in\mathcal{X}, y\in\mathcal{Y}} \rangle,$$
where the finite type set $\bar{\Theta}^n_i := \{\tfrac{1}{n}, \cdots, \tfrac{n}{n}\}$ contains $n$ discrete types of player $i$. Since we have assimilated the prior type distribution $b(\cdot)$ into the players' utility functions $u^{x,y}$, $v^{x,y}$, the prior distribution of the discrete types is $b_n(\tfrac{i}{n}, \tfrac{j}{n}) = \tfrac{1}{n^2}$, $\forall i, j \in \{1, \cdots, n\}$. Let $s^{\mathcal{X},n} := (s^{\mathcal{X},n}_1, \cdots, s^{\mathcal{X},n}_n)$ and $t^{\mathcal{Y},n} := (t^{\mathcal{Y},n}_1, \cdots, t^{\mathcal{Y},n}_n)$ be a BNE of the level-$n$ discretized Bayesian game $\Gamma_n$, where the elements of $s^{\mathcal{X},n}_i := [s^{x_1,n}_i, s^{x_2,n}_i, \cdots, s^{x_L,n}_i]$ and $t^{\mathcal{Y},n}_j := [t^{y_1,n}_j, t^{y_2,n}_j, \cdots, t^{y_H,n}_j]$ are all non-negative for all $i, j \in \{1, \cdots, n\}$ and each sum to 1, i.e., $\sum_{l=1}^{L} s^{x_l,n}_i = 1$, $\sum_{h=1}^{H} t^{y_h,n}_j = 1$, $\forall i, j \in \{1, \cdots, n\}$. The existence of behavioral strategy pairs $(s^{\mathcal{X},n}, t^{\mathcal{Y},n})$ is guaranteed [19] for any finite Bayesian game $\Gamma_n$.
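The uniform discretization in (10) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper name `discretize_utility`, the uniform prior density, and the payoff $\bar{u}(\theta_1, \theta_2) = \theta_1 \theta_2$ for a single fixed action pair are all hypothetical choices made only to have something concrete to evaluate.

```python
def discretize_utility(u, n):
    """Return the n-by-n matrix [u(i/n, j/n)] for i, j = 1..n, as in (10).

    Row index i-1 corresponds to player 1's type i/n and
    column index j-1 to player 2's type j/n.
    """
    return [[u(i / n, j / n) for j in range(1, n + 1)] for i in range(1, n + 1)]


def u_xy(theta1, theta2):
    """Hypothetical prior-weighted utility u^{x,y} = b * u_bar for one action pair."""
    prior_density = 1.0  # assumed uniform prior b on [0, 1]^2
    return prior_density * theta1 * theta2


level_4 = discretize_utility(u_xy, 4)

# Under the discrete uniform prior 1/n^2, the average of the matrix entries
# is a Riemann-sum approximation of the double integral of u^{x,y}.
n = 4
average = sum(sum(row) for row in level_4) / n**2

print(level_4[0][0], level_4[1][2], level_4[3][3], average)
```

In a full game, one such matrix would be built for every action pair $(x, y)$ and for both $u^{x,y}$ and $v^{x,y}$, giving the payoff data of the finite game $\Gamma_n$.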
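For any $x \in \mathcal{X}$, $y \in \mathcal{Y}$ and $n \in \mathbb{Z}^+$, define the non-decreasing right-continuous step functions
$$F^x_n(\theta_1) = \frac{1}{n} \sum_{i=1}^{\lfloor n\theta_1 \rfloor} s^{x,n}_i, \quad G^y_n(\theta_2) = \frac{1}{n} \sum_{j=1}^{\lfloor n\theta_2 \rfloor} t^{y,n}_j, \tag{11}$$
where $\lfloor n\theta \rfloor$ represents the greatest integer that is not greater than the value of $n\theta$. Obviously, $F^{\mathcal{X}}_n \in \mathcal{F}^{\mathcal{X}}$ and $G^{\mathcal{Y}}_n \in \mathcal{G}^{\mathcal{Y}}$ for any $n \in \mathbb{Z}^+$.

Since player 2 has $H$ possible actions, we can divide the entire type space into at most $H$ disjoint subsets, i.e., $\Theta_2 = \cup_{h=1}^{H} \Theta^h_2$, $\Theta^h_2 \cap \Theta^{h'}_2 = \emptyset$, $\forall h \neq h'$, where player 2 chooses to take action $y_h \in \mathcal{Y}$ when his type $\theta_2$ belongs to $\Theta^h_2$, i.e., $g^{y_h}(\theta_2) = \mathbf{1}\{\theta_2 \in \Theta^h_2\}$, $\forall h \in \{1, 2, ..., H\}$, $\forall \theta_2 \in \Theta_2$. Note that each subset $\Theta^h_2 \subseteq \Theta_2$, $h \in \{1, \cdots, H\}$, does not need to be connected and can be empty.

Lemma 4.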
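The function $\sum_{h\in\{1,\cdots,H\}} \int_{\Theta^h_2} v^{x,y_h}(\theta_1, \theta_2)\, d\theta_2$ is continuous over $\theta_1$ for any $x \in \mathcal{X}$.

Proof. Since $v^{x,y_h}(\theta_1, \theta_2)$ is continuous over the joint type space for any $x \in \mathcal{X}$, $y \in \mathcal{Y}$, we know that for any number $\epsilon > 0$, however small, there exists some number $\delta > 0$ such that for all $\theta_1 \in (\alpha - \delta, \alpha + \delta)$, $v^{x,y_h}(\theta_1, \theta_2) \in (v^{x,y_h}(\alpha, \theta_2) - \epsilon, v^{x,y_h}(\alpha, \theta_2) + \epsilon)$ for all $\theta_2 \in \Theta_2$. Based on the fact that $\sum_{h\in\{1,\cdots,H\}} \int_{\Theta^h_2} d\theta_2 \equiv 1$, we have
$$\sum_{h\in\{1,\cdots,H\}} \int_{\Theta^h_2} v^{x,y_h}(\alpha, \theta_2)\, d\theta_2 - \epsilon < \sum_{h\in\{1,\cdots,H\}} \int_{\Theta^h_2} v^{x,y_h}(\theta_1, \theta_2)\, d\theta_2 < \sum_{h\in\{1,\cdots,H\}} \int_{\Theta^h_2} v^{x,y_h}(\alpha, \theta_2)\, d\theta_2 + \epsilon,$$
which proves the continuity in $\theta_1$.

Now, we are ready to prove our main result of equilibrium convergence in Theorem 3.

Theorem 3 (Convergence of BNE by Discretization). An infinite Bayesian game $\Gamma$ has at least one BNE pair $(F^{\mathcal{X}} \in \mathcal{F}^{\mathcal{X}}, G^{\mathcal{Y}} \in \mathcal{G}^{\mathcal{Y}})$ in behavioral strategies. Moreover, there exists a sequence of discretized Bayesian games $\{\Gamma_{n_k}\}_{k\in\mathbb{Z}^+}$ such that $F^x(\theta_1) = \lim_{k\to\infty} F^x_{n_k}(\theta_1)$, $\forall \theta_1 \in \Theta_1$, $\forall x \in \mathcal{X}$, and $G^y(\theta_2) = \lim_{k\to\infty} G^y_{n_k}(\theta_2)$, $\forall \theta_2 \in \Theta_2$, $\forall y \in \mathcal{Y}$.

Proof. We prove the theorem by contradiction. According to Theorem 1, the sequence of mixed strategy pairs $(F^{\mathcal{X}}_n \in \mathcal{F}^{\mathcal{X}}, G^{\mathcal{Y}}_n \in \mathcal{G}^{\mathcal{Y}})$ has a subsequence $(F^{\mathcal{X}}_{n_k}, G^{\mathcal{Y}}_{n_k})$ that converges weakly to a pair of strategies $(F^{\mathcal{X}} \in \mathcal{F}^{\mathcal{X}}, G^{\mathcal{Y}} \in \mathcal{G}^{\mathcal{Y}})$. Suppose the strategy pair $(F^{\mathcal{X}}, G^{\mathcal{Y}})$ does not constitute a BNE. Then, at least one of the two strategies is not a best response against the other. We may assume that $G^{\mathcal{Y}}$ is not optimal against $F^{\mathcal{X}}$. The second player's expected utility under the strategy pair $(F^{\mathcal{X}}, G^{\mathcal{Y}})$ in $\Gamma$ is
$$w := \sum_{x\in\mathcal{X}} \sum_{y\in\mathcal{Y}} \int_0^1\!\!\int_0^1 v^{x,y}(\theta_1, \theta_2)\, dF^x(\theta_1)\, dG^y(\theta_2). \tag{12}$$
For each $n$, the second player's expected utility under the BNE of $\Gamma_n$ is
$$w_n := \frac{1}{n^2} \sum_{x\in\mathcal{X}} \sum_{y\in\mathcal{Y}} \sum_{i=1}^{n} \sum_{j=1}^{n} v^{x,y,n}_{i,j}\, s^{x,n}_i\, t^{y,n}_j = \sum_{x\in\mathcal{X}} \sum_{y\in\mathcal{Y}} \int_0^1\!\!\int_0^1 v^{x,y}(\theta_1, \theta_2)\, dF^x_n(\theta_1)\, dG^y_n(\theta_2). \tag{13}$$
Since $G^{\mathcal{Y}}$ is not an optimal response against $F^{\mathcal{X}}$ and Lemma 1 shows that the deviation can be a pure strategy without loss of generality, there exists a set division of $\Theta_2$, i.e., $\Theta^h_2$, $\forall h \in \{1, 2, ..., H\}$, such that the deviation strategy $\bar{g}^{y_h}(\theta_2) = \mathbf{1}\{\theta_2 \in \Theta^h_2\}$, $\forall h \in \{1, 2, ..., H\}$, $\forall \theta_2 \in \Theta_2$, achieves an expected utility larger than $w$. Then, there exists $\epsilon > 0$ such that
$$\sum_{h\in\{1,\cdots,H\}} \int_{\Theta^h_2} \sum_{x\in\mathcal{X}} \int_0^1 v^{x,y_h}(\theta_1, \theta_2)\, dF^x(\theta_1)\, d\theta_2 \geq w + 4\epsilon.$$
Based on the continuity result in Lemma 4 and the convergence result in Theorem 2, for the set division $\{\Theta^h_2\}_{h\in\{1,\cdots,H\}}$, there exists $K_1$ such that if $k \geq K_1$, we have
$$\sum_{x\in\mathcal{X}} \int_0^1 \left[ \sum_{h\in\{1,\cdots,H\}} \int_{\Theta^h_2} v^{x,y_h}(\theta_1, \theta_2)\, d\theta_2 \right] dF^x_{n_k}(\theta_1) > \sum_{x\in\mathcal{X}} \int_0^1 \left[ \sum_{h\in\{1,\cdots,H\}} \int_{\Theta^h_2} v^{x,y_h}(\theta_1, \theta_2)\, d\theta_2 \right] dF^x(\theta_1) - \epsilon,$$
which, by the definition of $F^x_{n_k}$ in (11) and the previous inequality, gives
$$\frac{1}{n_k} \sum_{x\in\mathcal{X}} \sum_{i=1}^{n_k} \left[ \sum_{h\in\{1,\cdots,H\}} \int_{\Theta^h_2} v^{x,y_h}\!\left(\tfrac{i}{n_k}, \theta_2\right) d\theta_2 \right] s^{x,n_k}_i > w + 3\epsilon. \tag{14}$$
Theorem 2 also guarantees that there exists $K_2$ such that if $k \geq K_2$,
$$\sum_{x\in\mathcal{X}} \int_0^1 v^{x,y}(\theta_1, \theta_2)\, dF^x_{n_k}(\theta_1) < \sum_{x\in\mathcal{X}} \int_0^1 v^{x,y}(\theta_1, \theta_2)\, dF^x(\theta_1) + \epsilon, \quad \forall \theta_2 \in [0,1], \forall y \in \mathcal{Y},$$
$$\sum_{y\in\mathcal{Y}} \int_0^1 v^{x,y}(\theta_1, \theta_2)\, dG^y_{n_k}(\theta_2) < \sum_{y\in\mathcal{Y}} \int_0^1 v^{x,y}(\theta_1, \theta_2)\, dG^y(\theta_2) + \epsilon, \quad \forall \theta_1 \in [0,1], \forall x \in \mathcal{X}.$$
Thus, we obtain $w_{n_k} < w + 2\epsilon$ and from (14), we have
$$\frac{1}{n_k} \sum_{x\in\mathcal{X}} \sum_{i=1}^{n_k} \left[ \sum_{h\in\{1,\cdots,H\}} \int_{\Theta^h_2} v^{x,y_h}\!\left(\tfrac{i}{n_k}, \theta_2\right) d\theta_2 \right] s^{x,n_k}_i > w_{n_k} + \epsilon. \tag{15}$$
Owing to the continuity of $v^{x,y_h}(\tfrac{i}{n_k}, \theta_2)$ over $\theta_2$, the integral $\int_{\Theta^h_2} v^{x,y_h}(\tfrac{i}{n_k}, \theta_2)\, d\theta_2$ is a well-defined Riemann integral. Since we discretize the entire type set $\Theta_2$ uniformly, the length of each sub-interval of the partition is $\tfrac{1}{n_k}$.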
Thus, there exists $K_3$ such that if $k \geq K_3$,
$$\frac{1}{n_k} \sum_{h\in\{1,\cdots,H\}} \sum_{\frac{j}{n_k} \in \Theta^h_2} v^{x,y_h}\!\left(\tfrac{i}{n_k}, \tfrac{j}{n_k}\right) > \sum_{h\in\{1,\cdots,H\}} \int_{\Theta^h_2} v^{x,y_h}\!\left(\tfrac{i}{n_k}, \theta_2\right) d\theta_2 - \epsilon,$$
and so
$$\frac{1}{n_k} \sum_{x\in\mathcal{X}} \sum_{i=1}^{n_k} \left[ \sum_{h\in\{1,\cdots,H\}} \sum_{\frac{j}{n_k} \in \Theta^h_2} v^{x,y_h}\!\left(\tfrac{i}{n_k}, \tfrac{j}{n_k}\right) \right] s^{x,n_k}_i > \sum_{x\in\mathcal{X}} \sum_{i=1}^{n_k} \left[ \sum_{h\in\{1,\cdots,H\}} \int_{\Theta^h_2} v^{x,y_h}\!\left(\tfrac{i}{n_k}, \theta_2\right) d\theta_2 \right] s^{x,n_k}_i - n_k \cdot \epsilon.$$
Finally, dividing by $n_k$ and combining with (15), we know that
$$\frac{1}{(n_k)^2} \sum_{x\in\mathcal{X}} \sum_{i=1}^{n_k} \left[ \sum_{h\in\{1,\cdots,H\}} \sum_{\frac{j}{n_k} \in \Theta^h_2} v^{x,y_h}\!\left(\tfrac{i}{n_k}, \tfrac{j}{n_k}\right) \right] s^{x,n_k}_i > w_{n_k},$$
which leads to a contradiction: $t^{\mathcal{Y},n_k}$ was assumed to be an optimal response against $s^{\mathcal{X},n_k}$ in the finite Bayesian game $\Gamma_{n_k}$, yet the second player achieves a higher expected utility against $s^{\mathcal{X},n_k}$ if he adopts the pure strategy $\bar{t}^{\mathcal{Y},n_k}$ whose $j$-th element $\bar{t}^{\mathcal{Y},n_k}_j$ satisfies $\bar{t}^{y_{h'},n_k}_j = \mathbf{1}\{h' = h\}$, $\forall h' \in \{1, \cdots, H\}$, if $\tfrac{j}{n_k} \in \Theta^h_2$. Therefore, $G^{\mathcal{Y}}$ is optimal against $F^{\mathcal{X}}$ and the strategy pair $(F^{\mathcal{X}}, G^{\mathcal{Y}})$ constitutes a BNE in behavioral strategies for the infinite Bayesian game $\Gamma$.

A. Algorithm to Compute $\varepsilon$-BNE of Infinite Bayesian Games

Although Theorem 3 proves the asymptotic convergence of BNE, there is no finite-step performance guarantee. There exist counterexamples (see e.g., [18]) where the finite approximation of an infinite game leads to misleading results. Due to this pathology, we construct Algorithm 1 as follows to check whether an $\varepsilon$-BNE has been reached at some finite level $n$.

Algorithm 1:
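Compute $\varepsilon$-BNE of infinite Bayesian game $\Gamma$

    Input the infinite Bayesian game $\Gamma$, the approximation accuracy $\varepsilon > 0$, and the maximum number of discretization $K$;
    Initialize the discretization level $n = 1$;
    while $n < K$ do
        Discretize $\Gamma$ via (10) to obtain $\Gamma_n$;
        Solve $\Gamma_n$ to obtain the equilibrium strategy pair $(s^{\mathcal{X},n}, t^{\mathcal{Y},n})$;
        Obtain the level-$n$ approximated strategy pair $(F^{\mathcal{X}}_n, G^{\mathcal{Y}}_n)$ for $\Gamma$ via (11);
        if $(F^{\mathcal{X}}_n, G^{\mathcal{Y}}_n)$ constitutes an $\varepsilon$-BNE of $\Gamma$ in Definition 2 then
            Terminate;
        $n := n + 1$;
    end
    Output the $\varepsilon$-BNE strategy $(F^{\mathcal{X}}_n, G^{\mathcal{Y}}_n)$ of the infinite Bayesian game $\Gamma$.

To compute the BNE of finite Bayesian games in the solving step of Algorithm 1, we can construct the following bilinear program $C^K$ (see Theorem 1 of [10]). Recall that the finite type set $\bar{\Theta}^n_i \subset \Theta_i$ contains the $n$ discrete types of player $i$.
$$[C^K]: \max_{\sigma_1, \sigma_2, s_1, s_2} \sum_{\theta_1 \in \bar{\Theta}^n_1} \alpha_1(\theta_1)\, s_1(\theta_1) + \sum_{\theta_1 \in \bar{\Theta}^n_1} \alpha_1(\theta_1)\, \mathbb{E}_{\theta_2 \sim b_1(\cdot|\theta_1),\, x \sim \sigma_1,\, y \sim \sigma_2}[\bar{u}^{x,y}(\theta_1, \theta_2)] + \sum_{\theta_2 \in \bar{\Theta}^n_2} \alpha_2(\theta_2)\, s_2(\theta_2) + \sum_{\theta_2 \in \bar{\Theta}^n_2} \alpha_2(\theta_2)\, \mathbb{E}_{\theta_1 \sim b_2(\cdot|\theta_2),\, x \sim \sigma_1,\, y \sim \sigma_2}[\bar{v}^{x,y}(\theta_1, \theta_2)]$$
subject to
$$(a)\ \mathbb{E}_{\theta_1 \sim b_2(\cdot|\theta_2),\, x \sim \sigma_1}[\bar{v}^{x,y}(\theta_1, \theta_2)] \leq -s_2(\theta_2), \quad \forall \theta_2 \in \bar{\Theta}^n_2, \forall y \in \mathcal{Y},$$
$$(b)\ \sum_{x\in\mathcal{X}} \sigma_1(x | \theta_1) = 1, \ \sigma_1(x | \theta_1) \geq 0, \quad \forall \theta_1 \in \bar{\Theta}^n_1,$$
$$(c)\ \mathbb{E}_{\theta_2 \sim b_1(\cdot|\theta_1),\, y \sim \sigma_2}[\bar{u}^{x,y}(\theta_1, \theta_2)] \leq -s_1(\theta_1), \quad \forall \theta_1 \in \bar{\Theta}^n_1, \forall x \in \mathcal{X},$$
$$(d)\ \sum_{y\in\mathcal{Y}} \sigma_2(y | \theta_2) = 1, \ \sigma_2(y | \theta_2) \geq 0, \quad \forall \theta_2 \in \bar{\Theta}^n_2. \tag{16}$$
Note that $\alpha_1(\theta_1)$, $\forall \theta_1 \in \bar{\Theta}^n_1$, and $\alpha_2(\theta_2)$, $\forall \theta_2 \in \bar{\Theta}^n_2$, are not decision variables and can be any strictly positive and finite numbers. Thus, we have the freedom to pick them properly to obtain a linear program rather than a bilinear program under certain conditions, as shown in Proposition 1.

Proposition 1 (Linear Program Reformulation). If there exists $m_i(\theta_i) > 0$, $\forall i \in \{1,2\}$, $\forall \theta_i \in \bar{\Theta}^n_i$, such that $m_2(\theta_2)\, \bar{u}^{x,y}(\theta_1, \theta_2) = -m_1(\theta_1)\, \bar{v}^{x,y}(\theta_1, \theta_2)$ holds for all $x \in \mathcal{X}$, $y \in \mathcal{Y}$, $\theta_1 \in \bar{\Theta}^n_1$, $\theta_2 \in \bar{\Theta}^n_2$, then we can pick $\alpha_i(\theta_i) = \bar{b}_i(\theta_i) / m_i(\theta_i) > 0$ to make $C^K$ a linear program.

Proof.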
It is straightforward to verify that the two bilinear terms always sum up to 0, i.e.,
$$\sum_{\theta_1 \in \bar{\Theta}^n_1} \alpha_1(\theta_1)\, \mathbb{E}_{\theta_2 \sim b_1(\cdot|\theta_1),\, x \sim \sigma_1,\, y \sim \sigma_2}[\bar{u}^{x,y}(\theta_1, \theta_2)] + \sum_{\theta_2 \in \bar{\Theta}^n_2} \alpha_2(\theta_2)\, \mathbb{E}_{\theta_1 \sim b_2(\cdot|\theta_2),\, x \sim \sigma_1,\, y \sim \sigma_2}[\bar{v}^{x,y}(\theta_1, \theta_2)] \equiv 0,$$
for all feasible strategy pairs $\sigma_1, \sigma_2$, if we choose $\alpha_i(\theta_i) = \bar{b}_i(\theta_i) / m_i(\theta_i)$.

Note that the condition $m_i(\theta_i) = 1$, $\forall i \in \{1,2\}$, $\forall \theta_i \in \bar{\Theta}^n_i$, results in a zero-sum finite Bayesian game. Then, we can recast $C^K$ as a linear program by picking $\alpha_i(\theta_i) = \bar{b}_i(\theta_i)$, $\forall \theta_i \in \bar{\Theta}^n_i$, which coincides with the existing result in [16].

REFERENCES

[1] Khajonpong Akkarajitsakul, Ekram Hossain, and Dusit Niyato. Distributed resource allocation in wireless networks under uncertainty and application of Bayesian game. IEEE Communications Magazine, 49(8):120–127, 2011.
[2] Olivier Armantier, Jean-Pierre Florens, and Jean-Francois Richard. Approximation of Nash equilibria in Bayesian games. Journal of Applied Econometrics, 23(7):965–981, 2008.
[3] Susan Athey. Single crossing properties and the existence of pure strategy equilibria in games of incomplete information. Econometrica, 69(4):861–889, 2001.
[4] Oriol Carbonell-Nicolau and Richard P McLean. On the existence of Nash equilibrium in Bayesian games. Mathematics of Operations Research, 43(1):100–129, 2018.
[5] Sam Ganzfried and Tuomas Sandholm. Computing equilibria by incorporating qualitative models. In Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems: Volume 1, AAMAS '10, pages 183–190. International Foundation for Autonomous Agents and Multiagent Systems, 2010.
[6] John C Harsanyi. Games with incomplete information played by "Bayesian" players, I–III. Part I. The basic model. Management Science, 14(3):159–182, 1967.
[7] Linan Huang and Quanyan Zhu. Analysis and computation of adaptive defense strategies against advanced persistent threats for cyber-physical systems. In International Conference on Decision and Game Theory for Security, pages 205–226. Springer, 2018.
[8] Linan Huang and Quanyan Zhu. Adaptive strategic cyber defense for advanced persistent threats in critical infrastructure networks. ACM SIGMETRICS Performance Evaluation Review, 46(2):52–56, 2019.
[9] Linan Huang and Quanyan Zhu. Dynamic games of asymmetric information for deceptive autonomous vehicles. arXiv preprint arXiv:1907.00459, 2019.
[10] Linan Huang and Quanyan Zhu. A dynamic games approach to proactive defense strategies against advanced persistent threats in cyber-physical systems. Computers & Security, 89:101660, 2020.
[11] Christopher Kiekintveld, Janusz Marecki, and Milind Tambe. Approximation methods for infinite Bayesian Stackelberg games: Modeling distributional payoff uncertainty. In The 10th International Conference on Autonomous Agents and Multiagent Systems - Volume 3, pages 1005–1012, 2011.
[12] Vijay Krishna. Auction Theory. Academic Press, 2009.
[13] Alejandro M Manelli. The convergence of equilibrium strategies of approximating signaling games. Economic Theory, 7(2):323–335, 1996.
[14] Paul R Milgrom and Robert J Weber. Distributional strategies for games with incomplete information. Mathematics of Operations Research, 10(4):619–632, 1985.
[15] Guillermo Owen. Existence of equilibrium pairs in continuous games. International Journal of Game Theory, 5(2):97–105, 1976.
[16] J-P Ponssard and Sylvain Sorin. The LP formulation of finite zero-sum games with incomplete information. International Journal of Game Theory, 9(2):99–105, 1980.
[17] Zinovi Rabinovich, Victor Naroditskiy, Enrico H Gerding, and Nicholas R Jennings. Computing pure Bayesian-Nash equilibria in games with finite actions and continuous types. Artificial Intelligence, 195:106–139, 2013.
[18] Daniel M Reeves and Michael P Wellman. Computing best-response strategies in infinite games of incomplete information. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, pages 470–478, 2004.
[19] Yoav Shoham and Kevin Leyton-Brown.